Generating HTML from plain text with formatting marks in Python 3

Asked 2025-04-16 21:11

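I have written some Python 3 scripts that process a formatted text file and load its data into an SQLite database. That data is then used by a PHP application. The data in my text file carries formatting marks for things like bold and italics, but not in a form a browser can understand. The scheme looks like this: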

fi:xxxx        (italics on the word xxxx (turned off at the word break))
fi:{xxx…xxx}   (italics on the word or phrase in the curly brackets {})
fb:xxxx        (bold on the word xxxx (turned off at the word break))
fb:{xxx}       (bold on the word or phrase in the brackets {})
fv:xxxx        (bold on the word xxxx (turned off at the word break))
fv:{xxx…xxx}   (bold on the word or phrase in the brackets {})
fn:{xxx…xxx}   (no formatting)

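I want to turn each line of source text into two lines: first, a string in which the source formatting marks are replaced by HTML tags, and second, a string with all formatting marks removed. Every source line needs both a formatted and an unformatted line, even if it uses no formatting marks. A source line may contain several different (or identical) formatting marks, but no mark extends past the end of its line.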

1 Answer

To format the parts in curly brackets, you can do something like this:

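TAGS = {"fi": "i", "fb": "b", "fv": "b", "fn": ""}  # marker prefix -> HTML tag; fn means no formatting
while ":{" in text:
    index = text.find(":{")
    end = text.find("}", index)
    prefix = text[max(index - 2, 0):index]
    if end == -1 or prefix not in TAGS:
        break  # unknown or unterminated marker: stop instead of looping forever
    inner = text[index + 2:end]
    if TAGS[prefix]:  # wrap the phrase in <b>...</b> or <i>...</i>
        inner = "<{0}>{1}</{0}>".format(TAGS[prefix], inner)
    text = text[:index - 2] + inner + text[end + 1:]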

This code turns "other fb:{bold text} text" into "other <b>bold text</b> text".
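
The same brace substitution can also be sketched with a regular expression instead of index arithmetic. This is only a sketch: the names `convert_braces`, `BRACE_RE`, and `TAGS` are made up for illustration, and it assumes that brace phrases are never nested.

```python
import re

# Hypothetical mapping from marker prefix to HTML tag name, following the
# question's list (fv is described there as bold; fn means no formatting).
TAGS = {"fi": "i", "fb": "b", "fv": "b", "fn": None}

# Matches e.g. "fb:{bold text}"; group 1 is the prefix, group 2 the phrase.
BRACE_RE = re.compile(r"\b(fi|fb|fv|fn):\{([^{}]*)\}")


def convert_braces(text):
    """Replace every prefix:{phrase} marker with the matching HTML tags."""
    def repl(match):
        tag = TAGS[match.group(1)]
        phrase = match.group(2)
        if tag is None:
            return phrase  # fn: keep the phrase, add no tags
        return "<{0}>{1}</{0}>".format(tag, phrase)
    return BRACE_RE.sub(repl, text)


print(convert_braces("other fb:{bold text} text"))
# other <b>bold text</b> text
```

Because `re.sub` handles every match in one pass, no loop is needed and an unknown prefix is simply left untouched.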

Next, you can handle the single-word marks, which end at the next space:

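TAGS = {"fi": "i", "fb": "b", "fv": "b"}  # marker prefix -> HTML tag
words = text.split(" ")
for i, word in enumerate(words):
    prefix, sep, rest = word.partition(":")
    if sep and prefix in TAGS and not rest.startswith("{"):
        # Write back into the list; assigning to `word` alone would not change it.
        words[i] = "<{0}>{1}</{0}>".format(TAGS[prefix], rest)
text = " ".join(words)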

If you only want the plain text, replace tags such as "<b>" and "</b>" with the empty string "".
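
That tag stripping can be sketched as a single substitution. `strip_tags` and `TAG_RE` are hypothetical names, and the sketch assumes the only tags present are the `<b>` and `<i>` pairs inserted by the conversion, so a literal `<b>` in the source text would be removed as well.

```python
import re

# Only the tags that the conversion step inserts: <b>, </b>, <i>, </i>.
TAG_RE = re.compile(r"</?[bi]>")


def strip_tags(html):
    """Return the text with the inserted bold/italic tags removed."""
    return TAG_RE.sub("", html)


print(strip_tags("other <b>bold text</b> and <i>italic</i> text"))
# other bold text and italic text
```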

If the formatting never spans multiple lines, it is more efficient to read and convert the file line by line:

def convert(text):
    # Change text here.
    return text

# The with statement closes both files, even if convert() raises.
with open("file.txt", "r") as in_file, open("file.out", "w") as out_file:
    for line in in_file:
        out_file.write(convert(line))
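
To produce the two output lines the question asks for, one formatted and one with the formatting removed, `convert` could be filled in roughly as follows. This is a sketch under assumptions, not a tested solution for the real data: the regular expression, the `TAGS` mapping, and the helper names are made up for illustration, a single-word mark is assumed to end at the next whitespace, and the text is not HTML-escaped.

```python
import re

# Hypothetical mapping from marker prefix to HTML tag name (fn: no formatting).
TAGS = {"fi": "i", "fb": "b", "fv": "b", "fn": None}

# Either prefix:{phrase} (group 2) or prefix:word (group 3).
MARKER_RE = re.compile(r"\b(fi|fb|fv|fn):(?:\{([^{}]*)\}|([^\s{}]+))")


def _content(match):
    """Return the marked word or phrase without its marker."""
    return match.group(2) if match.group(2) is not None else match.group(3)


def to_html(line):
    """Replace each marker with the corresponding HTML tags."""
    def repl(match):
        tag = TAGS[match.group(1)]
        body = _content(match)
        return body if tag is None else "<{0}>{1}</{0}>".format(tag, body)
    return MARKER_RE.sub(repl, line)


def to_plain(line):
    """Drop the markers and keep only the marked text."""
    return MARKER_RE.sub(_content, line)


def convert(line):
    """Return two lines: the HTML version, then the unformatted version."""
    line = line.rstrip("\n")
    return to_html(line) + "\n" + to_plain(line) + "\n"


print(convert("an fi:italic word and fb:{a bold phrase}\n"), end="")
# an <i>italic</i> word and <b>a bold phrase</b>
# an italic word and a bold phrase
```

Deriving the plain line from the source markers, rather than stripping tags from the HTML line, leaves any literal angle brackets in the data alone. A line without markers comes out as two identical lines.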

Answered 2025-04-16 by Python大师
