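Python dictionaries and NLTK
I need to find the words in the brown_news corpus that occur with more than one part-of-speech tag, using only Python dictionaries. This is as far as I have got: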
import nltk
brown_news = nltk.corpus.brown.tagged_words(categories="news")
multi_tags = {}
for (word,tag) in brown_news:
. . .
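I know I need to put all the words and tags into a dictionary (called multi_tags in this example), using if-then statements. After that, do I need another step to filter out the words that have only one tag? Any help is much appreciated.
1 Answer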
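You can use a defaultdict to keep track of each word and the POS tags it occurs with. Note that the loop below runs over the whole Brown corpus; pass categories="news" to tagged_words() to restrict it to the news category, as in the question. Hope this helps: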
>>> from nltk.corpus import brown
>>> from collections import defaultdict
>>> word2pos = defaultdict(set)
>>> for i in brown.tagged_words():
...     word2pos[i[0]].add(i[1])
...
>>> for word in word2pos:
...     if len(word2pos[word]) > 1:
...         print word, word2pos[word]
[Output]:
consented set(['VBN', 'VBD'])
centered set(['VBN', 'VBD'])
conspicuously set(['QL', 'RB'])
injected set(['VBN', 'VBD'])
strung set(['VBN', 'VBD'])
ram set(['VB', 'NN'])
relatively set(['QL', 'RB'])
postgraduate set(['JJ', 'NN'])
rides set(['VBZ', 'NNS'])
glimpsed set(['VBN', 'VBD'])
Ogden set(['NP', 'NP-HL', 'NP-TL'])
Reports set(['VBZ', 'NNS', 'NNS-TL'])
audition set(['VB', 'NN'])
commanding set(['VBG', 'NN'])
glow set(['VB', 'VB-HL', 'NN'])
metal set(['NN-HL', 'NN'])
contacted set(['VBN', 'VBD'])
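Since the question asks for plain dictionaries, here is a rough Python 3 sketch of the same idea without defaultdict. The function name words_with_multiple_tags and the small sample list are made up for illustration; the values are still sets, so if sets are not allowed in your exercise you would need to swap in a list and check for membership before appending.

```python
def words_with_multiple_tags(tagged_words):
    """Return a dict of word -> set of tags, keeping only words with 2+ tags."""
    tags_by_word = {}
    for word, tag in tagged_words:
        # The if-statement replaces defaultdict: create the set on first sight.
        if word not in tags_by_word:
            tags_by_word[word] = set()
        tags_by_word[word].add(tag)
    # Second pass: drop every word that was only ever seen with one tag.
    return {word: tags for word, tags in tags_by_word.items() if len(tags) > 1}


# Hypothetical hand-made sample, standing in for the corpus.
sample = [("ram", "NN"), ("ram", "VB"), ("metal", "NN"), ("the", "AT"), ("the", "AT")]
multi_tags = words_with_multiple_tags(sample)
print(sorted(multi_tags))

# With the real corpus (requires the Brown corpus to be downloaded):
# import nltk
# brown_news = nltk.corpus.brown.tagged_words(categories="news")
# multi_tags = words_with_multiple_tags(brown_news)
```

So yes, a second filtering step is needed: the first loop only collects the tags, and the dictionary comprehension at the end removes the words that have a single tag.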