Converting a text file to an HTML file with Python

Asked 2025-04-18 13:04

I have a text file that contains:

JavaScript              0
/AA                     0
OpenAction              1
AcroForm                0
JBIG2Decode             0
RichMedia               0
Launch                  0
Colors>2^24             0
uri                     0

I wrote this code to convert the text file to HTML:

contents = open("C:\\Users\\Suleiman JK\\Desktop\\Static_hash\\test","r")
with open("suleiman.html", "w") as e:
    for lines in contents.readlines():
        e.write(lines + "<br>\n")

The problem is that in the generated HTML file the spacing between the two columns on each line is lost:

JavaScript 0
/AA 0
OpenAction 1
AcroForm 0
JBIG2Decode 0
RichMedia 0
Launch 0
Colors>2^24 0
uri 0

What should I do so that the output keeps the same spacing between the two columns as the text file?

5 Answers

I added a title, then loop over the file line by line and wrap each line in <tr> and <td> tags, which gives a single table with one column. There is no need for separate <tr></tr> and <td></td> tags for col1 and col2.

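Log snippet:

MUTHU PAGE

2019/08/19 19:59:25 MUTHUKUMAR_TIME_DATE,line: 118     INFO | Logger object created for: MUTHUKUMAR_APP_USER_SIGNUP_LOG
2019/08/19 19:59:25 MUTHUKUMAR_DB_USER_SIGN_UP,line: 48     INFO | ***** User SIGNUP page start *****
2019/08/19 19:59:25 MUTHUKUMAR_DB_USER_SIGN_UP,line: 49     INFO | Enter first name: [Alphabet character only allowed, minimum 3 character to maximum 20 chracter]

HTML page source: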

'''

<?xml version="1.0" encoding="utf-8"?>
<body>
 <table>
  <p>
   MUTHU PAGE
  </p>
  <tr>
   <td>
    2019/08/19 19:59:25 MUTHUKUMAR_TIME_DATE,line: 118     INFO | Logger object created for: MUTHUKUMAR_APP_USER_SIGNUP_LOG
   </td>
  </tr>
  <tr>
   <td>
    2019/08/19 19:59:25 MUTHUKUMAR_DB_USER_SIGN_UP,line: 48     INFO | ***** User SIGNUP page start *****
   </td>
  </tr>
  <tr>
   <td>
    2019/08/19 19:59:25 MUTHUKUMAR_DB_USER_SIGN_UP,line: 49     INFO | Enter first name: [Alphabet character only allowed, minimum 3 character to maximum 20 chracter]

'''

Code:

from bs4 import BeautifulSoup

# The 'xml' feature relies on the lxml package being installed.
soup = BeautifulSoup(features='xml')
body = soup.new_tag('body')
soup.insert(0, body)
table = soup.new_tag('table')
body.insert(0, table)

# Every backslash in the path is doubled; a bare "\x" is an invalid escape in a Python string.
with open('C:\\Users\\xxxxx\\Documents\\Latest_24_may_2019\\New_27_jun_2019\\DB\\log\\input.txt') as infile:
    # The title paragraph goes first inside the table.
    title_s = soup.new_tag('p')
    title_s.string = " MUTHU PAGE "
    table.insert(0, title_s)
    for line in infile:
        row = soup.new_tag('tr')
        # Splitting on '\n' and dropping empty strings removes the trailing newline.
        col1 = [each for each in line.split('\n') if each != '']
        for coltext in col1:
            col = soup.new_tag('td')
            col.string = coltext
            row.insert(0, col)
        # Append the row after everything already in the table.
        table.append(row)

with open('C:\\Users\\xxxx\\Documents\\Latest_24_may_2019\\New_27_jun_2019\\DB\\log\\output.html', 'w') as outfile:
    outfile.write(soup.prettify())

Answered 2025-04-18 by Python大师

You can use a standalone templating library such as mako or jinja. Here is an example using jinja:

from jinja2 import Template
c = '''<!doctype html>
<html>
<head>
    <title>My Title</title>
</head>
<body>
<table>
   <thead>
       <tr><th>Col 1</th><th>Col 2</th></tr>
   </thead>
   <tbody>
       {% for col1, col2 in lines %}
       <tr><td>{{ col1 }}</td><td>{{ col2 }}</td></tr>
       {% endfor %}
   </tbody>
</table>
</body>
</html>'''

t = Template(c)

lines = []

with open('yourfile.txt', 'r') as f:
    for line in f:
        lines.append(line.split())

with open('results.html', 'w') as f:
    f.write(t.render(lines=lines))

If you can't install jinja, here is an alternative:

header = '<!doctype html><html><head><title>My Title</title></head><body>'
body = '<table><thead><tr><th>Col 1</th><th>Col 2</th></tr></thead><tbody>'
footer = '</tbody></table></body></html>'

# File objects have no writeln() method, so use write() with an explicit newline.
with open('input.txt', 'r') as infile, open('output.html', 'w') as outfile:
    outfile.write(header + '\n')
    outfile.write(body + '\n')
    for line in infile:
        # Each line is expected to hold exactly two whitespace-separated columns.
        col1, col2 = line.rstrip().split()
        outfile.write('<tr><td>{}</td><td>{}</td></tr>\n'.format(col1, col2))
    outfile.write(footer)
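
One thing to watch with the loop above: `col1, col2 = line.rstrip().split()` raises a ValueError on blank lines, and on any line whose name contains a space. A minimal sketch of a more tolerant split follows. The helper name `parse_line` is made up for illustration, and it assumes the value is always the last whitespace-separated token on the line.

```python
def parse_line(line):
    """Split one input line into (name, value); return None for a blank line."""
    stripped = line.strip()
    if not stripped:
        return None
    # Split once from the right, so everything before the last token is the name.
    parts = stripped.rsplit(None, 1)
    if len(parts) != 2:
        raise ValueError("expected two columns, got: {!r}".format(line))
    return parts[0], parts[1]


for raw in ["JavaScript              0\n", "\n", "Colors>2^24             0\n"]:
    print(parse_line(raw))
```

In the loop you would call `parse_line(line)` and skip the row when it returns None.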

Answered 2025-04-18 by Python大师

This happens because HTML rendering collapses any run of whitespace into a single space. There are two ways to deal with it (and probably many others).

One is to mark the content as "preformatted text", that is, wrap it in <pre>...</pre> tags.

The other is to use a table (which is exactly what tables are for):

<table>
  <tr><td>JavaScript</td><td>0</td></tr>
  ...
</table>

Typing this by hand would be tedious, but generating it from your script is easy. Something like this should work:

with open("C:\\Users\\Suleiman JK\\Desktop\\Static_hash\\test", "r") as contents:
    with open("suleiman.html", "w") as e:
        e.write("<table>\n")
        for lines in contents:
            # The % operator needs a tuple, not the list that split() returns.
            e.write("<tr><td>%s</td><td>%s</td></tr>\n" % tuple(lines.split()))
        e.write("</table>\n")
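
A caveat with the table approach: one of the names in the question's file, `Colors>2^24`, contains a `>`, which is markup in HTML. Browsers usually tolerate it, but escaping each cell is safer. Here is a small sketch using the standard library's `html.escape`. The function name `rows_to_table` and the sample rows are only for illustration.

```python
import html


def rows_to_table(rows):
    """Build an HTML table string from (name, value) pairs, escaping each cell."""
    parts = ["<table>"]
    for name, value in rows:
        parts.append(
            "<tr><td>{}</td><td>{}</td></tr>".format(
                html.escape(name), html.escape(value)
            )
        )
    parts.append("</table>")
    return "\n".join(parts)


# Sample rows taken from the question's data.
sample = [("Colors>2^24", "0"), ("uri", "0")]
print(rows_to_table(sample))
```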

Answered 2025-04-18 by Python大师

What you are producing is HTML, so use BeautifulSoup to build it:

from bs4 import BeautifulSoup

soup = BeautifulSoup('', 'html.parser')  # name the parser explicitly to avoid the "no parser specified" warning
body = soup.new_tag('body')
soup.insert(0, body)
table = soup.new_tag('table')
body.insert(0, table)

with open('path/to/input/file.txt') as infile:
    for line in infile:
        row = soup.new_tag('tr')
        col1, col2 = line.split()
        for coltext in (col2, col1): # important that you reverse order
            col = soup.new_tag('td')
            col.string = coltext
            row.insert(0, col)
        table.insert(len(table.contents), row)

with open('path/to/output/file.html', 'w') as outfile:
    outfile.write(soup.prettify())

Answered 2025-04-18 by Python大师

Just add <pre> and </pre> tags in your code. That way the text is displayed with the same layout it has in the original text file.

with open("C:\\Users\\Suleiman JK\\Desktop\\Static_hash\\test", "r") as contents:
    with open("suleiman.html", "w") as e:
        # One <pre> block around the whole file keeps the original column spacing.
        e.write("<pre>\n")
        for lines in contents:
            e.write(lines)
        e.write("</pre>\n")
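
If the text may contain characters such as `<`, `>` or `&`, it is probably worth escaping it before it goes inside the <pre> block. Below is a minimal sketch of that idea as a function. The name `text_to_pre` is hypothetical, and it wraps the whole text in one <pre> block rather than one per line.

```python
import html


def text_to_pre(text):
    """Wrap text in one <pre> block, escaping HTML special characters."""
    # A newline right after the opening <pre> tag is ignored by HTML parsers.
    return "<pre>\n" + html.escape(text) + "</pre>\n"


sample = "JavaScript              0\nColors>2^24             0\n"
print(text_to_pre(sample))
```

You would read the input file with `read()` and write the returned string to the output file.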

Answered 2025-04-18 by Python大师
