In Python, how should I exclude lines of a file that contain any of a list of patterns?

Posted 2024-04-24 02:35:30


I have a file from which I want to remove every line that contains any of a set of patterns. Suppose the patterns are as follows:

lineRemovalPatterns = [
    "!DOCTYPE html",
    "<html",
    "<head",
    "<meta",
    "<title",
    "<link rel>",
    "</head>",
    "<body>",
    "</body>",
    "</html>"
]

How should I loop over the file and keep only the lines that contain none of these patterns? My current attempt is:

HTMLGitFileContent = ""
HTMLSVNFileName = "README_SVN.html"
# Loop over the lines of the HTML SVN file, building the resultant Git file
# content. If any of the line removal patterns are in a line, remove that
# line.
HTMLSVNFile = open(HTMLSVNFileName, "r")
for line in HTMLSVNFile:
    for lineRemovalPattern in lineRemovalPatterns:
        if lineRemovalPattern not in line:
            HTMLGitFileContent = HTMLGitFileContent + "\n" + line
            break

2 answers

You can use ^{} instead of lineRemovalPattern not in line to exclude lines that contain any of the substrings you want to remove.
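To see why the per-pattern `not in` test in the question keeps lines it should drop, here is a minimal sketch. The shortened pattern list, the sample line, and both helper names are hypothetical and only serve the illustration. The question's loop appends a line as soon as it meets one pattern that is absent, which is true for almost every line; the line should only be kept when every pattern is absent.

```python
# Shortened, hypothetical pattern list and sample line for illustration.
patterns = ["<html", "<head", "</html>"]
line = '<html lang="en">\n'


def kept_by_question_loop(line, patterns):
    """Mirror the question's loop: keep on the first pattern that is absent."""
    for pattern in patterns:
        if pattern not in line:
            return True  # corresponds to the append followed by break
    return False


def kept_by_any_test(line, patterns):
    """Keep the line only if none of the patterns occurs in it."""
    return not any(pattern in line for pattern in patterns)


print(kept_by_question_loop(line, patterns))  # True: "<head" is absent, so the line is kept
print(kept_by_any_test(line, patterns))       # False: "<html" is present, so the line is dropped
```

The question's loop also prepends "\n" to lines that already end in a newline, so the lines it keeps come out double-spaced.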

That said, I echo @doctorlove: a real DOM parser would probably serve you better. Don't go too far down this road!
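As a rough illustration of the parser-based route, the sketch below uses the standard library's event-based `html.parser.HTMLParser` (not a full DOM parser) to keep only the markup between `<body>` and `</body>`. The class name `BodyExtractor` and the function `extract_body` are hypothetical, and the sketch assumes the goal is to strip the document wrapper while keeping the body content. It drops comments and does not try to handle malformed HTML.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collect the markup that appears between <body> and </body>."""

    def __init__(self):
        # convert_charrefs=False so entities such as &amp; are passed through
        # to handle_entityref/handle_charref instead of being decoded.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are re-emitted as written.
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append(f"&#{name};")


def extract_body(html_text):
    """Return the content of the <body> element as a string."""
    parser = BodyExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Unlike substring matching on whole lines, this does not depend on each tag sitting on its own line.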

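The following approach negates the return value of the built-in function any, which is applied to a generator expression over the current line and the list of patterns: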

# Create a variable for resultant Git file content.
HTMLGitFileContent = ""
HTMLSVNFileName = "README_SVN.html"
HTMLGitFileName = "README.html"
# Loop over the lines of the HTML SVN file, building the resultant Git file
# content. If any of the line removal patterns are in a line, remove that
# line.
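# lineRemovalPatterns is the list defined in the question.
with open(HTMLSVNFileName, "r") as HTMLSVNFile:
    for line in HTMLSVNFile:
        if not any(pattern in line for pattern in lineRemovalPatterns):
            HTMLGitFileContent = HTMLGitFileContent + line
# Write the filtered content; the with block closes and flushes the file.
with open(HTMLGitFileName, "w") as HTMLGitFile:
    HTMLGitFile.write(HTMLGitFileContent)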
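The same filter can also be written as a small reusable function that streams lines from the input file to the output file instead of accumulating them in a string. This is only a sketch under that assumption; the function name `copy_without_matching_lines` and its parameter names are hypothetical and do not appear in the answer.

```python
def copy_without_matching_lines(src_path, dst_path, patterns):
    """Copy src_path to dst_path, skipping lines that contain any pattern."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        # writelines accepts any iterable of strings, so the generator
        # expression filters the lines lazily as they are read.
        dst.writelines(
            line
            for line in src
            if not any(pattern in line for pattern in patterns)
        )
```

With the names used above, the call would be copy_without_matching_lines("README_SVN.html", "README.html", lineRemovalPatterns). Both files are closed when the with block exits, even if an error occurs partway through.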
