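Replacing synonyms in a corpus using WordNet and NLTK - python
I want to write a simple Python script that uses the NLTK library to find and replace synonyms in a text file.
The code below gives me this error: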
Traceback (most recent call last):
  File "C:\Users\Nedim\Documents\sinon2.py", line 21, in <module>
    change(word)
  File "C:\Users\Nedim\Documents\sinon2.py", line 4, in change
    synonym = wn.synset(word + ".n.01").lemma_names
TypeError: can only concatenate list (not "str") to list
Here is the code:
from nltk.corpus import wordnet as wn

def change(word):
    synonym = wn.synset(word + ".n.01").lemma_names
    if word in synonym:
        filename = open("C:/Users/tester/Desktop/test.txt").read()
        writeSynonym = filename.replace(str(word), str(synonym[0]))
        f = open("C:/Users/tester/Desktop/test.txt", 'w')
        f.write(writeSynonym)
        f.close()

f = open("C:/Users/tester/Desktop/test.txt")
lines = f.readlines()
for i in range(len(lines)):
    word = lines[i].split()
    change(word)
2 Answers
1
Two things. First, you can change the part that reads the file to:
for line in open("C:/Users/tester/Desktop/test.txt"):
    word = line.split()
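Second, .split() returns a list of strings, whereas your change function appears to handle only a single word. That is what causes the error: your word is actually a list, so word + ".n.01" tries to concatenate a list with a string.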
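The failure can be reproduced without NLTK at all. Here is a minimal sketch; the sample line is made up for illustration and stands in for one line of test.txt:

```python
# A made-up line of text standing in for one line of test.txt.
line = "the dog chased the car"

words = line.split()           # split() returns a list of strings
print(type(words).__name__)    # list

error_message = None
try:
    # Same operation as word + ".n.01" inside change() when word is a list.
    words + ".n.01"
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Concatenation works once a single string is taken out of the list.
synset_name = words[1] + ".n.01"
print(synset_name)
```

Passing one element of the list at a time, rather than the list itself, is what makes the concatenation valid.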
If you want to process every word in that line, change it to:
for line in open("C:/Users/tester/Desktop/test.txt"):
    words = line.split()
    for word in words:
        change(word)
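Even with that loop fixed, the script reopens and rewrites the file once per word. A possible restructuring, offered only as a sketch, reads the file once, replaces whole words, and writes once. The names replace_synonyms, rewrite_file and lookup are hypothetical; lookup is any function that returns a replacement for a word, or None to leave it unchanged:

```python
import re


def replace_synonyms(text, lookup):
    """Replace each word in text with lookup(word) when that is not None.

    lookup is a hypothetical callable supplied by the caller; it is not
    part of NLTK.
    """
    def swap(match):
        word = match.group(0)
        synonym = lookup(word)
        return synonym if synonym else word

    # Matches runs of ASCII letters only, which is enough for a sketch.
    return re.sub(r"[A-Za-z]+", swap, text)


def rewrite_file(path, lookup):
    """Read the file once, replace synonyms, and write it back once."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(replace_synonyms(text, lookup))


# Usage with a hand-written table in place of a WordNet lookup.
table = {"car": "automobile"}
print(replace_synonyms("the car hit the carpet", table.get))
```

Matching whole words also avoids a side effect of calling str.replace on the entire file, which would change "car" inside "carpet" as well.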
2
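This is not particularly efficient, and it does not replace a word with a single synonym, because each word may have several. Instead it collects the candidates, and you can pick one from among them.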
from nltk.corpus import wordnet as wn
from nltk.corpus.reader.plaintext import PlaintextCorpusReader

corpus_root = 'C://Users//tester//Desktop//'
wordlists = PlaintextCorpusReader(corpus_root, '.*')

for word in wordlists.words('test.txt'):
    # Collect the lemma names of every synset the word belongs to.
    synonymList = set()
    wordNetSynset = wn.synsets(word)
    for synSet in wordNetSynset:
        # lemma_names is a method in NLTK 3.x; older releases exposed it as an attribute.
        for synWords in synSet.lemma_names():
            synonymList.add(synWords)
    print(synonymList)
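One simple way to choose among the candidates, shown here only as a sketch, is to take the first lemma name that differs from the word itself. The helpers pick_synonym and wordnet_synonym are hypothetical names, and the WordNet part assumes NLTK 3.x with the WordNet corpus installed:

```python
def pick_synonym(word, lemma_lists):
    """Return the first lemma name that differs from word, else None.

    lemma_lists is a list of lists of lemma names, one inner list per
    synset, in the order WordNet returned them.
    """
    for names in lemma_lists:
        for name in names:
            if name.lower() != word.lower():
                return name
    return None


def wordnet_synonym(word):
    """Look word up in WordNet and pick one synonym (assumes NLTK 3.x)."""
    # Imported here so pick_synonym can be used without NLTK installed.
    from nltk.corpus import wordnet as wn
    return pick_synonym(word, [s.lemma_names() for s in wn.synsets(word)])


# Hand-written lemma lists in place of a real WordNet lookup.
sample = [["car", "auto", "automobile"], ["car", "railcar"]]
print(pick_synonym("car", sample))
print(pick_synonym("car", [["car"]]))
```

Skipping names equal to the word matters because a synset's lemma list often starts with the word that was looked up. Multi-word lemmas come back with underscores (for example domestic_dog), so you may want to replace those with spaces before writing the result into text.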