Finding common elements in a list

1 vote

7 answers

924 views

Asked 2025-04-16 03:26

Hi, this is a follow-up to an earlier post.

Given the following list:

['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']

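I want to count how many times each word that starts with a capital letter appears, and display the three most frequent ones.

I'm not interested in words that don't start with a capital letter.

If a word appears several times, sometimes capitalized and sometimes not, only the capitalized occurrences should be counted.

Here is my code so far: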

words = ""
for word in open('novel.txt', 'rU'):
    words += word
words = words.split(' ')
words = list(words)
words = ('\n'.join(words)).split('\n')

word_counter = {}

for word in words:
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1
popular_words = sorted(word_counter, key=word_counter.get, reverse=True)
top_3 = popular_words[:3]

matches = []

for i in range(3):
    print word_counter[top_3[i]], top_3[i]


7 Answers

Here are some additional comments:

words = ""
for word in open('novel.txt', 'rU'):
    words += word
words = words.split(' ')
words = list(words)
words = ('\n'.join(words)).split('\n')

can be replaced with:

text = open('novel.txt', 'rU').read() # read everything
wordlist = text.split() # split on all whitespace

But you haven't yet applied the "must start with a capital letter" requirement. Now is the time to add it:

capwordlist = (word for word in wordlist if word.istitle())

For a single word, istitle() is roughly equivalent to word[0].isupper() and word[1:].islower(). That means 'SO'.istitle() -> False.

That may work for you, but perhaps you only want word[0].isupper().
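To make the difference concrete, here is a small sketch that compares the two checks on a handful of sample words. The sample list and the helper name starts_upper are made up for illustration; the helper uses word[:1] rather than word[0] so that an empty string (like the last item in the question's list) does not raise an IndexError.

```python
# Compare str.istitle() with a plain first-letter check.
# `samples` and `starts_upper` are illustrative names, not from the question.
samples = ["Jellicle", "SO", "white,", "And", "McDonald", ""]


def starts_upper(word):
    # word[:1] is '' for an empty string, and ''.isupper() is False,
    # so this never raises IndexError.
    return word[:1].isupper()


for word in samples:
    print(repr(word), word.istitle(), starts_upper(word))
```

Words such as 'SO' and 'McDonald' are rejected by istitle() but accepted by the first-letter check, so which one fits depends on whether all-caps and mixed-case words should count.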

If you can't use collections.Counter (new in version 2.7), this part is fine as it is:

word_counter = {}

for word in capwordlist:
    if word in word_counter:
        word_counter[word] += 1
    else:
        word_counter[word] = 1
popular_words = sorted(word_counter, key=word_counter.get, reverse=True)
top_3 = popular_words[:3]

Otherwise it simply becomes:

from collections import Counter

word_counter = Counter(capwordlist)
top_3 = word_counter.most_common(3) # gives `word, count` pairs!

And this:

for i in range(3):
    print word_counter[top_3[i]], top_3[i]

can become:

for word in top_3:
    print word_counter[word], word
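Putting the pieces of this answer together, a minimal Python 3 sketch might look like the following. It assumes the first-letter check rather than istitle(), and it builds the word list from an inline string instead of novel.txt so that it runs on its own. Note that most_common() returns (word, count) pairs, so the loop unpacks them instead of indexing back into the counter.

```python
from collections import Counter

# Inline stand-in for the contents of novel.txt (an assumption for this sketch).
text = (
    "Jellicle Cats are black and white, Jellicle Cats are rather small; "
    "Jellicle Cats are merry and bright, And pleasant to hear when they caterwaul. "
    "Jellicle Cats have cheerful faces, Jellicle Cats have bright black eyes; "
    "They like to practise their airs and graces And wait for the Jellicle Moon to rise."
)

wordlist = text.split()  # split on all whitespace

# Keep only words whose first character is uppercase.
capwordlist = [word for word in wordlist if word[:1].isupper()]

word_counter = Counter(capwordlist)
top_3 = word_counter.most_common(3)  # list of (word, count) pairs

for word, count in top_3:
    print(count, word)
```

With this input the loop prints 6 Jellicle, 5 Cats and 2 And, one per line.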

Answered 2025-04-16 by Python大师



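# lst is the word list from the question.
# Sort (count, word) tuples so counts compare numerically, not as text.
counts = sorted((lst.count(w), w) for w in set(lst) if w.istitle())
print("\n".join("%d %s" % pair for pair in counts[-3:]))

Output:

2 And
5 Cats
6 Jellicle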
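One thing to watch when ranking by count this way is whether the sort sees numbers or formatted text. The sketch below uses made-up counts (not taken from the poem) where one count has two digits, and compares sorting formatted strings against sorting (count, word) tuples.

```python
# Hypothetical counts, chosen so that one of them has two digits.
counts = {"Cats": 5, "Jellicle": 12, "And": 2}

# Sorting formatted strings compares character by character, so "12 ..." sorts before "2 ...".
as_strings = sorted("%d %s" % (n, w) for w, n in counts.items())

# Sorting tuples compares the integer counts first.
as_tuples = sorted((n, w) for w, n in counts.items())

print(as_strings)
print(as_tuples)
```

With single-digit counts, as in the poem, both approaches give the same order; the difference only shows up once a count reaches 10.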

Answered 2025-04-16 by Python大师


#uncomment to produce the word file
##words = ['Jellicle', 'Cats', 'are', 'black', 'and', 'white,', 'Jellicle', 'Cats', 'are', 'rather', 'small;', 'Jellicle', 'Cats', 'are', 'merry', 'and', 'bright,', 'And', 'pleasant', 'to', 'hear', 'when', 'they', 'caterwaul.', 'Jellicle', 'Cats', 'have', 'cheerful', 'faces,', 'Jellicle', 'Cats', 'have', 'bright', 'black', 'eyes;', 'They', 'like', 'to', 'practise', 'their', 'airs', 'and', 'graces', 'And', 'wait', 'for', 'the', 'Jellicle', 'Moon', 'to', 'rise.', '']
##open('novel.txt','w').write('\n'.join(words))

import string
cap_words = [word.strip(string.punctuation) for word in open('novel.txt').read().split() if word.istitle()]
##print(cap_words) # debug
try:
    from collections import Counter # Python >= 2.7
    print('Counter')
    print(Counter(cap_words).most_common(3))
except ImportError:
    print('Normal dict')
    wordcount = dict()
    for word in cap_words:
        wordcount[word] = (wordcount[word] + 1
                           if word in wordcount
                           else 1)
    print(sorted(wordcount.items(), key = lambda x: x[1], reverse = True)[:3])

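I don't quite understand why you want 'rU' mode for handling different kinds of line endings. In general I would just open the file normally, as I did in the modified code above.
One more note: some of your words have punctuation attached, so I used strip() to clean them up.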
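As a quick illustration of what that strip() call does, here is a small sketch on a few tokens. The token list is made up for the example; string.punctuation is the standard-library constant of ASCII punctuation characters.

```python
import string

# Sample tokens (illustrative), some with punctuation attached.
tokens = ["white,", "small;", "caterwaul.", "Cats", "don't", "(Moon)"]

# strip() removes the given characters only from the two ends of each token,
# so the apostrophe inside "don't" is left alone.
cleaned = [t.strip(string.punctuation) for t in tokens]

print(cleaned)
```

Because only the ends are trimmed, 'white,' and 'white' end up counted as the same word, while punctuation inside a word is preserved.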

Answered 2025-04-16 by Python大师
