Order-insensitive bigram frequencies in Python

import nltk
from itertools import combinations
from nltk.collocations import BigramCollocationFinder

txt = open('txt file', 'r')
finder1 = BigramCollocationFinder.from_words(txt.read().split(), window_size=3)
finder1.apply_freq_filter(3)
bigram_measures = nltk.collocations.BigramAssocMeasures()
for k, v in sorted(list(combinations(set(finder1.ngram_fd.items()), 2)), key=lambda t: t[-1], reverse=True)[:10]:
    print(k, v)

1 answer

User

#1 · Posted 2024-06-16 15:01:44

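This looks like a case where you can use sets as keys in a Counter. Since an ordinary set is mutable and cannot be hashed, the key has to be a frozenset. As the documentation explains, sets are unordered containers, and a Counter is a dictionary specialized for counting how often objects occur in an iterable. It might look something like this: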

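from collections import Counter
from string import punctuation as punct

with open('txt file.txt') as txt:
    # Strip punctuation, then split on whitespace into a list of words
    doc = txt.read().translate(str.maketrans('', '', punct)).split()

c = Counter()

# A frozenset ignores order, so ('a', 'b') and ('b', 'a') share one key
c.update(frozenset((doc[i], doc[i+1])) for i in range(len(doc) - 1))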

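The with statement handles the file and closes it automatically when the block ends. Inside it, the text is read, stripped of punctuation, and split on whitespace (spaces, newlines, etc.) into a list of words. A Counter is then initialized and counts the unordered pairs of adjacent words in that list.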
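To get closer to what the question was after (a wider window, a minimum frequency of 3, and the top 10 pairs), the same idea can be wrapped in a small function. This is only a sketch: the function name `unordered_pair_counts`, its `window_size` parameter, and the sample sentence are made up for illustration, and the windowing only loosely mirrors NLTK's `window_size` by pairing each word with the `window_size - 1` words that follow it.

```python
from collections import Counter
from string import punctuation


def unordered_pair_counts(words, window_size=2):
    """Count word pairs regardless of order (hypothetical helper).

    Each word is paired with the next `window_size - 1` words.
    """
    counts = Counter()
    for i, left in enumerate(words):
        for right in words[i + 1 : i + window_size]:
            # frozenset is hashable and unordered, so it works as a key
            counts[frozenset((left, right))] += 1
    return counts


# Sample text, assumed here only to make the example self-contained
text = "the cat saw the dog. the dog saw the cat, and the cat ran"
words = text.translate(str.maketrans("", "", punctuation)).split()

counts = unordered_pair_counts(words, window_size=2)

# Keep pairs seen at least 3 times, then print up to 10 of the most frequent
frequent = Counter({pair: n for pair, n in counts.items() if n >= 3})
for pair, n in frequent.most_common(10):
    print(sorted(pair), n)
```

One thing to keep in mind: when a word is followed by itself, the frozenset collapses to a single element, so such pairs are counted under a one-word key. If that matters, a sorted tuple such as `tuple(sorted((left, right)))` could be used as the key instead.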
