How can I distribute an NLTK computation across multiple cores?
I am running a bigram analysis on the Enron dataset:
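import nltk
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from nltk.tokenize.punkt import PunktWordTokenizer

bigram_measures = BigramAssocMeasures()
words = []
# `messages` is the collection of Enron emails, defined elsewhere
for message in messages.find():
    for sentence in nltk.tokenize.sent_tokenize(message["body"]):
        words.extend(PunktWordTokenizer().tokenize(sentence))
finder = BigramCollocationFinder.from_words(words)
print(finder.nbest(bigram_measures.pmi, 20))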
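However, when I look at the output of "top", one core is fully loaded while the others sit idle. Is there a way to spread the computation across all the other cores? (This is running on Google Compute Engine.)
Output of top: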
Tasks: 117 total, 2 running, 115 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 7369132 total, 5303352 used, 2065780 free, 68752 buffers
KiB Swap: 0 total, 0 used, 0 free, 4747800 cached
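For illustration, here is a minimal sketch of one direction this could take, assuming the tokenization step is what dominates the runtime: split the message bodies across worker processes with the standard library's multiprocessing.Pool and merge the resulting word lists. The names tokenize_body and parallel_tokenize are hypothetical, and the regex tokenizer is only a stand-in so the sketch runs without NLTK data; in the real script it would be replaced by the sent_tokenize and word tokenizer calls above.

```python
import multiprocessing
import re

# Hypothetical stand-in tokenizer so the sketch runs without NLTK data;
# the real script would call sent_tokenize and the word tokenizer here.
_WORD_RE = re.compile(r"\w+")


def tokenize_body(body):
    """Return the list of word tokens found in one message body."""
    return _WORD_RE.findall(body)


def parallel_tokenize(bodies, processes=None, chunksize=64):
    """Tokenize message bodies in worker processes, preserving input order."""
    # The "fork" start method is Unix-only; assumed here because the
    # machine in question runs Linux.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        # pool.map returns results in the same order as the inputs
        per_message = pool.map(tokenize_body, bodies, chunksize=chunksize)
    # Flatten the per-message token lists into one word list
    return [word for words in per_message for word in words]


if __name__ == "__main__":
    sample = ["Please review the attached contract.", "The meeting is at noon."]
    print(parallel_tokenize(sample, processes=2))
```

The flattened list could then be passed to BigramCollocationFinder.from_words as before. In this sketch only the tokenization is spread over the cores; building the finder and scoring the bigrams would still run in a single process.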