Remove stop words from another file

thestop = open("stopwords1.txt", "r").readlines()

def remove_stop(stopwords):
    new = []
    new.append(open("helpme.txt","r").readlines())
    stop = []
    stop.append(stopwords)
    for word in stop[:]:
        new.remove(word)
    print(new)

remove_stop(thestop)

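3 Answers

User

Answer 1 · edited 2024-04-26 17:54:22

In your code, 'word' is a list. You are trying to remove an item that does not exist in 'new', so an error is raised. Replace the for loop with: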

for word in stop[:]: 
    for i in word:
        if i in new:
            new.remove(i)
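A minimal sketch of why the original loop fails and how the nested loop behaves, using in-memory lists in place of the two files (the sample words are made up for illustration). Note that the `if i in new` check can only match when `new` is a flat list of lines, so the sketch keeps it flat instead of wrapping it in another list.

```python
# Hypothetical sample data standing in for the contents of the two files.
stopwords = ["the\n", "a\n"]
lines = ["the\n", "cat\n", "a\n", "dog\n"]

# What the question's code builds: each list wrapped inside another list.
stop = [stopwords]
new_wrapped = [lines]

try:
    # stop[0] is the whole list of stop words, which is not an element of new_wrapped.
    new_wrapped.remove(stop[0])
    error = None
except ValueError as exc:
    error = exc

# Nested loop over the individual stop words, with `new` kept as a flat list.
new = list(lines)
for word in stop:
    for i in word:
        if i in new:
            new.remove(i)

print(error)
print(new)
```

`list.remove` deletes only the first matching element, so a line that appears several times would need a repeated removal or a list comprehension instead.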

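User

Answer 2 · edited 2024-04-26 17:54:22

Try printing the stop variable inside the remove_stop function; it should look something like [['stop word 1\n', 'stop word 2\n', ...]]. (readlines does not strip the newline characters.)

So the for loop sees only one element, which is the list of stop words, rather than the stop words themselves (the same is true of new).

This can be fixed by dropping the new and stop variables as they are and replacing them with: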

stop = stopwords

new = open("helpme.txt","r").read().split('\n')

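You also need to change thestop to open("stopwords1.txt", "r").read().split('\n') to get rid of the newline characters, or you can strip them after reading the file with readlines.

Finally, you need a nested loop, because you want to remove the stop words from every line. Your loop would look like this: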
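To make the difference between the reading styles concrete, here is a small sketch that writes a throwaway file and reads it back three ways. The file name and its contents are made up for illustration.

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stopwords_demo.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("the\na\nof\n")

    # readlines keeps the trailing newline on every entry.
    with open(path, "r") as f:
        with_newlines = f.readlines()

    # read().split('\n') drops the newlines but leaves an empty last entry
    # when the file ends with a newline.
    with open(path, "r") as f:
        split_lines = f.read().split("\n")

    # Stripping each line while iterating avoids both problems.
    with open(path, "r") as f:
        stripped = [line.strip() for line in f]

print(with_newlines)
print(split_lines)
print(stripped)
```

The empty string at the end of the split result is harmless with `str.replace`, but it is worth filtering out if the list is later used for membership tests.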

for i in range(0, len(new)):
    for j in range(0, len(stop)):
        new[i] = new[i].replace(stop[j], '')
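One caveat with this loop is that `str.replace` works on substrings, so a stop word is also cut out of longer words that contain it. The sketch below shows the effect on made-up sample lines and, as one possible alternative that is not part of the answer above, a regular expression that matches whole words only.

```python
import re

# Hypothetical sample data.
stop = ["the", "a"]
new = ["the cat sat on a mat", "there is a theme"]

# Substring replacement: "the" is also removed from "there" and "theme",
# and "a" is removed from inside "cat", "sat" and "mat".
replaced = []
for line in new:
    for word in stop:
        line = line.replace(word, "")
    replaced.append(line)

# Whole-word alternative: \b anchors the match at word boundaries.
pattern = re.compile(r"\b(?:" + "|".join(re.escape(w) for w in stop) + r")\b")
# split/join collapses the double spaces left behind by the removals.
whole_words = [" ".join(pattern.sub("", line).split()) for line in new]

print(replaced)
print(whole_words)
```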

User

Answer 3 · edited 2024-04-26 17:54:22

There is a lot you can improve in your code...

def remove_stop(stopwords):
    stopwords = set(stopwords) # It is faster to look up in a set!
    new = []

Open the file properly and use it as an iterator:

    with open("helpme.txt") as infile:
        for line in infile:

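For each line in the file, split the line into words. Keep only the words that are not in stopwords, and join the survivors back into a line. Append that line to the list of processed lines. Note that a word followed by punctuation will not be matched. Use NLTK to deal with punctuation.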

            line = ' '.join([word for word in line.split() 
                               if word not in stopwords])
            new.append(line)

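The last five lines could be written as a single line, but you do not have to go that far. Do not forget to return the list of cleaned lines!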

    return new
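Putting the pieces of this answer together, a complete version might look like the sketch below. The `path` parameter, the demo file, and the lowercasing are additions for illustration, and `str.strip(string.punctuation)` is used only as a lightweight stand-in for the NLTK-based punctuation handling the answer recommends.

```python
import os
import string
import tempfile


def remove_stop(stopwords, path):
    """Return the lines of the file at `path` with stop words removed.

    `path` is a parameter added for this sketch; the answer hard-codes the file name.
    """
    stopwords = set(stopwords)  # set membership tests are fast
    new = []
    with open(path, "r") as infile:
        for line in infile:
            kept = [
                word for word in line.split()
                # Compare a lowercased, punctuation-stripped copy of each word
                # (an assumption of this sketch, not part of the answer).
                if word.strip(string.punctuation).lower() not in stopwords
            ]
            new.append(" ".join(kept))
    return new


# Demo on a throwaway file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "helpme_demo.txt")
    with open(text_path, "w") as f:
        f.write("The cat sat on the mat.\nA dog, the end.\n")
    cleaned = remove_stop(["the", "a", "on"], text_path)

print(cleaned)
```

Stripping punctuation only for the comparison keeps the original punctuation on the surviving words, which may or may not be what you want.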
