Python: removing complete sentences from a novel-length string

Posted 2024-04-27 03:23:37


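I have pasted a novel into a text file. I want to remove every line that contains one of the following sentences, because they keep appearing at the top of each page (removing just their occurrences within those lines would also be fine):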

"Thermal Molecular Movement in , Order and Probability"

"Molecular and Ionic Interactions as the Basis for the Formation"

"Interfacial Phenomena and Membranes"

My first attempt was as follows:

mystring = file.read()
mystring=mystring.strip("Molecular Structure of Biological Systems")
mystring=mystring.strip("Thermal Molecular Movement in , Order and Probability")
mystring=mystring.strip("Molecular and Ionic Interactions as the Basis for the Formation")
mystring=mystring.strip("Interfacial Phenomena and Membranes")

new_file=open("no_refs.txt", "w")

new_file.write(mystring)

file.close()

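However, this had no effect on the output text file; the content was completely unchanged. I find this odd, because the following toy example works fine: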

>>> "Hello this is a sentence. Please read it".strip("Please read it")
'Hello this is a sentence.'

Since the approach above did not work, I tried the following:

file=open("novel.txt", "r")
mystring = file.readlines()
for lines in mystring:
    if "Thermal Molecular Movement in , Order and Probability" in lines:
        mystring.replace(lines, "")
    elif "Molecular and Ionic Interactions as the Basis for the Formation" in lines:
        mystring.replace(lines, "")
    elif "Interfacial Phenomena and Membranes" in lines:
        mystring.replace(lines, "")
    else:
        continue

new_file=open("no_refs.txt", "w")

new_file.write(mystring)
new_file.close()
file.close()

But with this attempt I get an error:

TypeError: expected a string or other character buffer object


1 answer
User
#1 · Posted 2024-04-27 03:23:37
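  • First, str.strip() only works on the start and end of the string, and it treats its argument as a set of characters to strip rather than as a substring. That is why it appears to work in your toy test, but it is not what you want here.
  • Second, you are calling replace on the list rather than on the current line (a list has no replace method, and you never assign the result of the replacement back). The TypeError itself comes from passing that list to write(), which expects a string.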
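To make the first point concrete, here is a small sketch contrasting str.strip() with str.replace(). The sample text is made up for illustration; only the header string and the toy sentence come from the question.

```python
header = "Interfacial Phenomena and Membranes"

# Made-up sample: the header sits in the middle of the text.
text = "First page.\n" + header + "\nSecond page."

# strip() removes leading/trailing characters that belong to the given
# character set; "F" and "." are not in the set, so nothing is removed.
unchanged = text.strip(header)

# replace() removes every occurrence of the exact substring.
replaced = text.replace(header, "")

# The toy example from the question only "works" because every trailing
# character happens to be one of the characters in "Please read it".
toy = "Hello this is a sentence. Please read it".strip("Please read it")

print(repr(unchanged))
print(repr(replaced))
print(repr(toy))
```

If the header should vanish from the middle of the text, replace() (or filtering whole lines, as shown further down) is the tool to reach for.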

Here is a fixed version that successfully removes the patterns from the lines:

# header sentences to remove from each line
patterns = ["Thermal Molecular Movement in , Order and Probability",
            "Molecular and Ionic Interactions as the Basis for the Formation",
            "Interfacial Phenomena and Membranes"]

with open("novel.txt", "r") as file:
    mystring = file.readlines()

for i, line in enumerate(mystring):
    for pattern in patterns:
        # replace() leaves the line untouched when the pattern is absent
        line = line.replace(pattern, "")
    # store the cleaned line back into the list
    mystring[i] = line

# print the processed lines
print("".join(mystring))

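Note the enumerate construct, which lets you iterate over both the values and their indices. Iterating over the values alone would let you find the patterns, but not modify them in the original list.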

Also note the with open construct, which closes the file as soon as the block is left.
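The remark about enumerate can be checked with a tiny sketch. The two-element list and the "HEADER" marker are hypothetical stand-ins for the lines of the novel.

```python
lines = ["keep me\n", "HEADER text\n"]

# Rebinding the loop variable only changes the local name, not the list.
for line in lines:
    line = line.replace("HEADER", "")
after_plain_loop = list(lines)  # snapshot: still contains "HEADER"

# Assigning through the index updates the list element itself.
for i, line in enumerate(lines):
    lines[i] = line.replace("HEADER", "")

print(after_plain_loop)
print(lines)
```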

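Here is a version that removes the lines containing the patterns entirely (be warned, there is some one-liner functional-programming stuff ahead):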

with open("novel.txt", "r") as file:
    mystring = file.readlines()
    pattern_list = ["Thermal Molecular Movement in , Order and Probability","Molecular and Ionic Interactions as the Basis for the Formation","Interfacial Phenomena and Membranes"]
    # keep only the lines that contain none of the patterns
    mystring = "".join(filter(lambda line: all(pattern not in line for pattern in pattern_list), mystring))
    # print the processed lines
    print(mystring)

Explanation: the list of lines is filtered by a condition: a line is kept only if none of the unwanted patterns appears in it.
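The versions above print their result, whereas the question wanted a cleaned file. A minimal sketch of that last step might look like the following. The function name drop_header_lines and the UTF-8 encoding are assumptions made for illustration, not part of the original answer.

```python
HEADERS = [
    "Thermal Molecular Movement in , Order and Probability",
    "Molecular and Ionic Interactions as the Basis for the Formation",
    "Interfacial Phenomena and Membranes",
]


def drop_header_lines(src_path, dst_path, patterns=HEADERS):
    """Copy src_path to dst_path, skipping lines that contain any pattern.

    Hypothetical helper; UTF-8 is assumed for both files.
    """
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # write the line only if none of the header patterns occurs in it
            if not any(pattern in line for pattern in patterns):
                dst.write(line)
```

With the file names from the question, the call would be drop_header_lines("novel.txt", "no_refs.txt"). Reading line by line avoids holding the whole novel in memory, though for a single book either approach should be fine.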
