Finding the least common elements in a list

5 votes

5 answers

7393 views

Asked 2025-04-17 14:16

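I want to generate an ordered list of the least frequent words in a large body of text, with the least frequent word appearing first, along with a value showing how many times each word appears in the text.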

I scraped the text from some online journal articles, then simply split it and assigned the result:

article_one = """ large body of text """.split()
=> ['large', 'body', 'of', 'text']

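It seems a regex would be appropriate for the next steps, but I am new to programming. If the best answer includes a regex, could someone point me to a good regex tutorial other than pydoc?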

Tags: regex, text processing, data scraping, ordered list, word frequency, least common elements

5 Answers

To find the least common elements in a list, the documentation for the Counter class in the collections module gives this idiom:

c.most_common()[:-n-1:-1]       # n least common elements

So the code to find the single least common element in a list is

from collections import Counter
Counter( mylist ).most_common()[:-2:-1]

and the two least common elements are

from collections import Counter
Counter( mylist ).most_common()[:-3:-1]
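As a minimal sketch of how that slice behaves: `most_common()` with no argument returns every (element, count) pair ordered from most to least frequent, so slicing backwards from the end yields the rarest pairs first. The sample sentence and the helper name `n_least_common` are illustrative assumptions, not part of the original answer.

```python
from collections import Counter

# Hypothetical sample text, standing in for the scraped article
words = "the cat and the dog and the bird".split()
counts = Counter(words)


def n_least_common(counter, n):
    """Return the n least frequent (element, count) pairs, rarest first."""
    # most_common() is sorted by descending count; a negative step walks it
    # from the end, and the stop index -n-1 keeps exactly n pairs.
    return counter.most_common()[:-n - 1:-1]


least_two = n_least_common(counts, 2)
print(least_two)
```

Each pair carries the count alongside the word, which covers the "how many times it appears" part of the question. Which of several equally rare words comes first depends on tie ordering, so it is safer not to rely on it.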

python-3.x

Answered 2025-04-17 by Python大师


How about a shorter, simpler version using a defaultdict? Counter is nice, but it needs Python 2.7, whereas this works from Python 2.5 onward :)

import collections

counter = collections.defaultdict(int)
article_one = """ large body of text """

for word in article_one.split():
    counter[word] += 1

# Sort the (word, count) pairs by the reversed pair: count first, then word
print(sorted(counter.items(), key=lambda x: x[::-1]))
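The sort key is the least obvious part of this answer, so here is a small sketch of what it does. Each item is a `(word, count)` tuple and `x[::-1]` reverses it to `(count, word)`, so the list is ordered by ascending count with ties broken alphabetically. The longer sample string and the name `ranked` are assumptions added for illustration.

```python
from collections import defaultdict

# Hypothetical sample text with some repeated words
article_one = """ large body of text with a large body """

counter = defaultdict(int)  # missing words start at 0
for word in article_one.split():
    counter[word] += 1

# Key (count, word): rarest words first, alphabetical within equal counts
ranked = sorted(counter.items(), key=lambda x: x[::-1])

for word, count in ranked:
    print(word, count)
```

Because the tie-break is explicit in the key, the output order is deterministic, unlike approaches that depend on dictionary ordering.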

Answered 2025-04-17 by Python大师


Here is a ready-made answer taken from the official documentation, followed by a small helper for the least common case.

# From the official documentation ->>
>>> from collections import Counter
>>> # Tally occurrences of words in a list
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})
## ^^^^--- from the standard documentation.

>>> # Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]

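>>> import heapq
>>> from operator import itemgetter
>>> def least_common(adict, n=None):
...     # Without n, return every (word, count) pair, least frequent first
...     if n is None:
...         return sorted(adict.items(), key=itemgetter(1))
...     # Otherwise return only the n pairs with the smallest counts
...     return heapq.nsmallest(n, adict.items(), key=itemgetter(1))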

Obviously, you can adapt this to suit your needs :D
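For instance, one possible adaptation to the original question combines the regex tokenization with the helper. This is only a sketch: the sample sentence and the variable names are assumptions, and `\w+` is a simple tokenizer that splits on punctuation (so a word like "don't" becomes two tokens).

```python
import heapq
import re
from collections import Counter
from operator import itemgetter


def least_common(counts, n=None):
    """Return (word, count) pairs ordered from least to most frequent."""
    if n is None:
        return sorted(counts.items(), key=itemgetter(1))
    return heapq.nsmallest(n, counts.items(), key=itemgetter(1))


# Hypothetical sample text, standing in for the scraped article
text = "The cat saw the dog. The dog saw the bird!"

# Lowercase first so "The" and "the" are counted as the same word
words = re.findall(r"\w+", text.lower())
counts = Counter(words)

rarest = least_common(counts, 2)
print(rarest)
```

Compared with a plain `split()`, the regex strips punctuation from the tokens, so "dog." and "dog" are not counted separately.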

Answered 2025-04-17 by Python大师

