Parsing text into paragraphs with Python (loop problem)

Posted 2024-04-19 17:48:24


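I am using the Google Sheets API and Python to generate HTML markup from data entered in a spreadsheet. Sometimes users enter long blocks of text in a single cell, and I would like to use Python to split that text into semantic paragraphs wherever a new line occurs.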

Using str.splitlines() and a for loop I can get it working conceptually, but only the first iteration of the loop is printed.

#!/usr/bin/python

#sample text from spreadsheet
text = """Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."""

#break intro text into paragraphs
def pgparse(text):
    #split at every new line
    lines = text.splitlines()
    #wrap lines in p tags
    for i in lines:
        return '<p>'+i+'</p>'

print(pgparse(text))

Result:

<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.</p>

Expected result:

<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.</p>
<p>It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>

2 answers
return '<p>'+i+'</p>'

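This line exits the function on the first pass through the loop, so only the first line is ever wrapped. Perhaps you want: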

def pgparse(text):
    result = []
    #split at every new line
    lines = text.splitlines()
    #wrap lines in p tags
    for i in lines:
        result.append('<p>'+i+'</p>')
    return result
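
Note that this version returns a list of strings rather than a single string, so the caller still has to join the pieces before printing them. A minimal sketch of that step follows; the two-line sample text is a shortened stand-in for the spreadsheet cell, not the original data:

```python
def pgparse(text):
    result = []
    # split at every new line and wrap each line in p tags
    for line in text.splitlines():
        result.append('<p>' + line + '</p>')
    return result

# shortened stand-in for the spreadsheet text (assumption, for illustration)
sample = "First paragraph.\nSecond paragraph."

paragraphs = pgparse(sample)   # a list with one string per input line
html = "\n".join(paragraphs)   # join the list into a single block of markup
print(html)
```

Printing the list directly would show Python's list representation (brackets and quotes) instead of clean markup, which is why the join is needed.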

You are only returning the first line, so your second line never gets wrapped. Try this:

#!/usr/bin/python

#sample text from spreadsheet
text = """Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."""

#break intro text into paragraphs
def pgparse(text):
    #split at every new line
    lines = text.splitlines()
    #wrap lines in p tags
    return "\n".join('<p>'+i+'</p>' for i in lines)

print(pgparse(text))

This wraps each line using a generator expression, then joins the results back together with \n.
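
Since the text comes from cells that users type into, it may also contain blank lines or characters such as & and <. The question does not ask for this, so treat the following as an optional sketch under that assumption: pgparse_safe is a hypothetical name, and it uses html.escape from the standard library to escape each line and skips lines that are empty after stripping whitespace.

```python
import html

def pgparse_safe(text):
    # Hypothetical variant of pgparse (not from the original answers):
    # skip blank lines and escape HTML-special characters in each line.
    return "\n".join(
        "<p>" + html.escape(line.strip()) + "</p>"
        for line in text.splitlines()
        if line.strip()
    )

# made-up sample containing a blank line and special characters
sample = "Tom & Jerry\n\n  a < b  "
escaped = pgparse_safe(sample)
print(escaped)
```

Whether blank lines should be dropped or kept as empty paragraphs depends on how the generated markup is used, so adjust the filter to suit.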
