How to feed nltk.FreqDist a large list that is split into parts or files

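    import nltk
    import cPickle as pickle  # Python 2; on Python 3 use `import pickle`
    import sys
    import os
    import itertools as it

    # `tokens` is the full list of word tokens, assumed to be defined earlier.
    # i takes the values 3, 6, ..., 33, so 11 chunks of three tokens are written.
    for no, i in enumerate(it.islice(it.count(), 3, 33+3, 3)):
        if no == 0:
            fil = tokens[0:i]
        else:
            # Note: this slice starts at i-2, so tokens[3] is not written to any chunk.
            fil = tokens[i-3+1:i+1]
        file_name = "/tmp/words/text" + str(no+1) + '.p'
        files = open(file_name, "wb")
        pickle.dump(fil, files)
        files.close()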

2 answers

User

Answer 1 · edited 2024-04-25 19:55:24

Try:

    from itertools import chain
    from nltk import FreqDist, word_tokenize

    FreqDist(chain(*[word_tokenize(line) for line in open('in.txt')]))
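A lazier variant of the one-liner above might look like the following sketch. It assumes Python 3 and NLTK 3, where `FreqDist` is a subclass of `collections.Counter`. `str.split` stands in for `word_tokenize` so the sketch runs without the Punkt tokenizer data, and the helper name `count_tokens` and the sample lines are hypothetical. Passing a generator to `chain.from_iterable` avoids building the intermediate list of per-line token lists.

```python
from itertools import chain

try:
    from nltk import FreqDist  # NLTK 3: FreqDist subclasses collections.Counter
except ImportError:
    # Assumption for illustration only: fall back to Counter if NLTK is missing.
    from collections import Counter as FreqDist


def count_tokens(lines, tokenize=str.split):
    """Count tokens across an iterable of lines, one line at a time.

    `tokenize` defaults to whitespace splitting; nltk.word_tokenize can be
    passed instead when the Punkt data is installed.
    """
    # The generator expression produces one token list per line on demand;
    # chain.from_iterable flattens those lists into a single token stream.
    return FreqDist(chain.from_iterable(tokenize(line) for line in lines))


# Hypothetical in-memory sample. An open file object works the same way,
# because iterating over a file yields its lines lazily.
sample_lines = ["the cat sat", "the dog sat", "the end"]
fd = count_tokens(sample_lines)
print(fd.most_common(2))  # [('the', 3), ('sat', 2)]
```

With a real file, `with open('in.txt') as f: fd = count_tokens(f)` would keep only one line's tokens in memory at a time, apart from the counts themselves.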


User

Answer 2 · edited 2024-04-25 19:55:24

I stored the following text in 11 pickle files:

    text = 'The European Union’s plan to send refugees fleeing Syria’s civil war back to Turkey en masse could be illegal, a top UN official has said, as concerns mounted that Greece,Greece2'

The directory is named words (path=/tmp/words) and contains 11 files named testo1, testo2, and so on. I have now found the right comprehension to achieve my goal.

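Everything seems to work now, but I am wondering whether this feeds FreqDist step by step, or whether it loads the whole list first and only then processes it. My idea is to load and process the files one at a time, rather than loading everything at once, in order to save memory.

Thanks again for your help.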
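On the memory question raised above: if the comprehension is a list comprehension (for instance one unpacked into `chain(*[...])`), every pickled list is loaded before counting starts, whereas a generator or an explicit loop holds only one file's tokens at a time. The following is a minimal sketch of the incremental approach, not the snippet from this answer. It assumes Python 3 and NLTK 3, where `FreqDist` subclasses `collections.Counter` and `update` adds counts. The helper names, the temporary directory, and the sample chunks are hypothetical.

```python
import os
import pickle
import tempfile

try:
    from nltk import FreqDist  # NLTK 3: FreqDist subclasses collections.Counter
except ImportError:
    # Assumption for illustration only: fall back to Counter if NLTK is missing.
    from collections import Counter as FreqDist


def iter_pickled_chunks(directory):
    """Yield the token list stored in each pickle file, one file at a time."""
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as handle:
            yield pickle.load(handle)


def freqdist_from_pickles(directory):
    """Build a FreqDist incrementally from every pickle file in `directory`."""
    fd = FreqDist()
    for chunk in iter_pickled_chunks(directory):
        # Only the current chunk is in memory here; the previous one can be
        # garbage-collected once its counts have been added.
        fd.update(chunk)
    return fd


# Hypothetical demo data written to a temporary directory instead of /tmp/words.
demo_dir = tempfile.mkdtemp()
chunks = [["The", "European", "Union"], ["Greece", "Greece", "The"]]
for number, chunk in enumerate(chunks, start=1):
    with open(os.path.join(demo_dir, "testo%d.p" % number), "wb") as handle:
        pickle.dump(chunk, handle)

fd = freqdist_from_pickles(demo_dir)
print(fd["The"], fd["Greece"])  # 2 2
```

The explicit loop makes the one-file-at-a-time behaviour easy to see. `FreqDist(chain.from_iterable(iter_pickled_chunks(directory)))` should give the same counts, since the generator is also consumed lazily.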
