Answer to the question: Create a Python dictionary from a text file and retrieve the count of each word


Anonymous, 1 day ago

Expertise: python, mysql, java

This sounds like a job for <code>collections.Counter</code>:
<pre><code>import collections

with open('gettysburg.txt') as f:
    # f.read() loads the whole file; split() breaks it on whitespace
    c = collections.Counter(f.read().split())

print("'Four' appears %d times" % c['Four'])
print("'the' appears %d times" % c['the'])
print("There are %d total words" % sum(c.values()))
print("The 5 most common words are", c.most_common(5))
</code></pre>
Result:
<pre class="lang-none prettyprint-override"><code>$ python foo.py
'Four' appears 1 times
'the' appears 9 times
There are 267 total words
The 5 most common words are [('that', 10), ('the', 9), ('to', 8), ('we', 8), ('a', 7)]
</code></pre>
<hr/>
Of course, this counts tokens that have punctuation attached (for example, a word followed by a comma or a dash) as separate words. It also treats "The" and "the" as different words. Finally, reading the entire file into memory at once can be a problem for very large files.
Here is a version that ignores punctuation and case and is more memory-efficient on large files:
<pre><code>import collections
import re

with open('gettysburg.txt') as f:
    # Iterate over the file one line at a time instead of reading it all;
    # the regex keeps only runs of letters (no digits or underscores)
    c = collections.Counter(
        word.lower()
        for line in f
        for word in re.findall(r'\b[^\W\d_]+\b', line))

print("'Four' appears %d times" % c['Four'])
print("'the' appears %d times" % c['the'])
print("There are %d total words" % sum(c.values()))
print("The 5 most common words are", c.most_common(5))
</code></pre>
Result:
<pre class="lang-none prettyprint-override"><code>$ python foo.py
'Four' appears 0 times
'the' appears 11 times
There are 271 total words
The 5 most common words are [('that', 13), ('the', 11), ('we', 10), ('to', 8), ('here', 8)]
</code></pre>
Because every word is now lowercased, the lookup <code>c['Four']</code> returns 0. Query <code>c['four']</code> instead.
References:
<ul>
<li><a href="https://docs.python.org/2/library/re.html" rel="nofollow noreferrer">https://docs.python.org/2/library/re.html</a></li>
<li><a href="https://docs.python.org/2/library/collections.html#collections.Counter" rel="nofollow noreferrer">https://docs.python.org/2/library/collections.html#collections.Counter</a></li>
<li><a href="https://stackoverflow.com/questions/5717886/extracting-whole-words">Extracting whole words</a></li>
</ul>
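The question asks for a dictionary, and `collections.Counter` is itself a subclass of `dict`, so the result above already is one. For comparison, the sketch below does the same case-insensitive count with a plain `dict` and `dict.get`. The helper name `count_words_plain` and the sample lines are hypothetical. The helper takes any iterable of lines, so an open file object works just as well as a list.

```python
import re

# Same pattern as the answer: runs of letters only (no digits or underscores)
WORD_RE = re.compile(r'\b[^\W\d_]+\b')


def count_words_plain(lines):
    """Count lowercased words using a plain dict (hypothetical helper)."""
    counts = {}
    for line in lines:
        for word in WORD_RE.findall(line):
            word = word.lower()
            # dict.get supplies 0 for words that have not been seen yet
            counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == '__main__':
    counts = count_words_plain(["The cat saw the dog.", "The dog ran!"])
    print(counts['the'])           # 3
    print(counts.get('bird', 0))   # 0; counts['bird'] would raise KeyError
```

To read the file, call `count_words_plain(f)` inside `with open('gettysburg.txt') as f:`. The main practical difference from `Counter` is that a missing key raises `KeyError` instead of returning 0, and there is no built-in `most_common`.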
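The second version saves memory because iterating over a file object yields one line at a time. The same streaming idea can be written explicitly with `Counter.update`, which adds counts from any iterable. This form is handy when text arrives in pieces. The minimal sketch below uses `io.StringIO` with made-up sample text as a stand-in for `gettysburg.txt`, so it runs without the file.

```python
import collections
import io
import re

WORD_RE = re.compile(r'\b[^\W\d_]+\b')

# Stand-in for open('gettysburg.txt'); any file-like object is iterated the same way
text = io.StringIO("The cat saw the dog.\nThe dog ran!\n")

c = collections.Counter()
for line in text:
    # Only the current line and the running totals are held in memory
    c.update(word.lower() for word in WORD_RE.findall(line))

print(c['the'])            # 3: keys are lowercase
print(c['The'])            # 0: Counter returns 0 for missing keys instead of raising
print(c.most_common(1))    # [('the', 3)]
```

Looking up a missing key on a `Counter` returns 0 without inserting that key, so checks like `c['The']` do not change the totals.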
