How to build a dictionary with multiple counts

Posted 2024-06-09 05:46:00

This is a hard question to phrase. Anyway, so far I have the following code:

#create the dictionary with the word profiles

        for u in unique:
            kw = u
            count_word = [i for i in temp for j in i.split() if j == kw]
            count_dict = {j: i.count(j) for i in count_word for j in i.split() if j != kw}
            print(kw)

            #format the dictionary
            for a, c in sorted(count_dict.items(), key=lambda x: x[0]):
                print('{}: {}'.format(a, c))
            print()

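This does exactly what I want, except that the unique word needs a count as well. In the example below, river is the unique word; the code loops over it and compares it against the temp list. The output looks like this: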

river (# This should be river: 4 not just river)
atlantic: 1
branch: 1
commonplace: 1
considering: 1
contrary: 1
country: 1
cover: 1
crookedest: 1
crow: 1
degrees: 1
delaware: 1
drainage-basin: 1
draws: 1
fly: 1
forty-five: 1
ground: 1
idaho: 1
journey: 1
longest: 1
longitude: 1
main: 1
miles: 1
missouri: 1
pacific: 1
part: 1
remarkable: 1
safe: 1
seaboard: 1
seems: 1
seventy-five: 1
six: 1
slope: 1
spread: 1
states: 1
supply: 1
territories: 1
twenty-eight: 1
uses: 1
vast: 1
water: 1
ways: 1
world: 1
world--four: 1

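It looks great and is exactly what I was after. But notice how river at the top is not counted? River appears 4 times in the text, so I need a counter for the unique word that reports river as 4 while still giving me the output below it.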

Here are the list (temp) and the set (unique) that I am using:

unique

{'longest', 'considering', 'receives', 'water', 'discharges', 'atlantic', 'austria', 'part', 'idaho', 'main', 'drainage-basin', 'st', 'twenty-five', 'seventy-five', 'slope--a', 'world--four', 'remarkable', 'rivers', 'country', 'crookedest', 'areas', 'ireland', 'fifty-four', 'portugal', 'valley', 'france', 'almost', 'branch', 'twenty-eight', 'fertile', 'england', 'crow', 'spread', 'italy', 'journey', 'germany', 'river', 'draws', 'exceptionally', 'scotland', 'fly', 'uses', 'supply', 'region', 'rhine', 'ground', 'thirty-eight', 'thames', 'pacific', 'degrees', 'mississippi', 'lawrence', 'six', 'cover', 'subordinate', 'flats', 'navigable', 'area', 'proper', 'states', 'safe', 'wide', 'territories', 'vast', 'hundreds', 'contrary', 'missouri', 'commonplace', 'gulf', 'worth', 'seaboard', 'steamboats', 'wales', 'turkey', 'combined', 'delaware', 'forty-five', 'carries', 'seems', 'reading', 'keels', 'longitude', 'spain', 'ways'}

temp

['mississippi worth reading about', ' commonplace river contrary ways remarkable', ' considering missouri main branch longest river world--four miles', ' seems safe crookedest river world part journey uses cover ground crow fly six seventy-five', ' discharges water st', ' lawrence twenty-five rhine thirty-eight thames', ' river vast drainage-basin draws water supply twenty-eight states territories delaware atlantic seaboard country idaho pacific slope spread forty-five degrees longitude', ' mississippi receives carries gulf water fifty-four subordinate rivers navigable steamboats hundreds navigable flats keels', ' area drainage-basin combined areas england wales scotland ireland france spain portugal germany austria italy turkey almost wide region fertile mississippi valley proper exceptionally so']

If you have any questions, feel free to ask.

Thank you.


3 answers
import collections

temp = ['mississippi worth reading about', ' commonplace river contrary ways remarkable', ' considering missouri main branch longest river world--four miles', ' seems safe crookedest river world part journey uses cover ground crow fly six seventy-five', ' discharges water st', ' lawrence twenty-five rhine thirty-eight thames', ' river vast drainage-basin draws water supply twenty-eight states territories delaware atlantic seaboard country idaho pacific slope spread forty-five degrees longitude', ' mississippi receives carries gulf water fifty-four subordinate rivers navigable steamboats hundreds navigable flats keels', ' area drainage-basin combined areas england wales scotland ireland france spain portugal germany austria italy turkey almost wide region fertile mississippi valley proper exceptionally so']
# join with a space so words are not glued together if an item lacks a leading space
one_big_string = " ".join(temp)

# split on whitespace and count every word
print(collections.Counter(one_big_string.split()))

Counter({'river': 4, 'mississippi': 3, 'water': 3, 'drainage-basin': 2, 'navigable': 2, 'worth': 1, 'reading': 1, 'about': 1, 'commonplace': 1, 'contrary': 1, 'ways': 1, 'remarkable': 1, 'considering': 1, 'missouri': 1, 'main': 1, 'branch': 1, 'longest': 1, 'world--four': 1, 'miles': 1, 'seems': 1, 'safe': 1, 'crookedest': 1, 'world': 1, 'part': 1, 'journey': 1, 'uses': 1, 'cover': 1, 'ground': 1, 'crow': 1, 'fly': 1, 'six': 1, 'seventy-five': 1, 'discharges': 1, 'st': 1, 'lawrence': 1, 'twenty-five': 1, 'rhine': 1, 'thirty-eight': 1, 'thames': 1, 'vast': 1, 'draws': 1, 'supply': 1, 'twenty-eight': 1, 'states': 1, 'territories': 1, 'delaware': 1, 'atlantic': 1, 'seaboard': 1, 'country': 1, 'idaho': 1, 'pacific': 1, 'slope': 1, 'spread': 1, 'forty-five': 1, 'degrees': 1, 'longitude': 1, 'receives': 1, 'carries': 1, 'gulf': 1, 'fifty-four': 1, 'subordinate': 1, 'rivers': 1, 'steamboats': 1, 'hundreds': 1, 'flats': 1, 'keels': 1, 'area': 1, 'combined': 1, 'areas': 1, 'england': 1, 'wales': 1, 'scotland': 1, 'ireland': 1, 'france': 1, 'spain': 1, 'portugal': 1, 'germany': 1, 'austria': 1, 'italy': 1, 'turkey': 1, 'almost': 1, 'wide': 1, 'region': 1, 'fertile': 1, 'valley': 1, 'proper': 1, 'exceptionally': 1, 'so': 1})
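If only the total for one word is needed, the Counter can simply be indexed. A minimal sketch, assuming a shortened three-line stand-in for temp (my own illustrative subset, not the full list from the question):

```python
import collections

# Shortened, illustrative subset of the question's temp list (assumption)
temp = ['mississippi worth reading about',
        ' commonplace river contrary ways remarkable',
        ' seems safe crookedest river world part journey']

counts = collections.Counter(" ".join(temp).split())

river_count = counts['river']      # total occurrences of 'river' in this subset
missing_count = counts['amazon']   # a Counter returns 0 for words it never saw
top = counts.most_common(1)        # the single most frequent word with its count

print(river_count, missing_count, top)
```

Indexing a missing word gives 0 instead of raising KeyError, which is handy when looping over a set of keywords that may not all occur in the text.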

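I only skimmed your solution (TL;DR), but it seems overcomplicated: it stores lines multiple times and then builds a dict comprehension that overwrites keys, so a word that occurs on several lines keeps only the count from the last of those lines (usually 1).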
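To see why the comprehension from the question never reports the keyword and can lose counts, here is a small sketch on a made-up two-line input. The input strings and the name lines_with_kw are illustrative assumptions; the comprehension itself has the same shape as the one in the question.

```python
# Made-up two-line input; 'rivers' is included on purpose (assumption)
temp = ['river water water', 'river rivers water']
kw = 'river'

# Keep only the lines that contain kw as a whole word
lines_with_kw = [line for line in temp if kw in line.split()]

# 1) `if j != kw` filters out the keyword itself, so it never gets a count
# 2) a repeated key is overwritten, so the value from the last line wins
count_dict = {j: i.count(j) for i in lines_with_kw for j in i.split() if j != kw}

# 3) str.count counts substrings, so 'river' also matches inside 'rivers'
substring_hits = temp[1].count('river')

print(count_dict)
print(substring_hits)
```

Here 'water' occurs three times in total but ends up with 1, because the second line overwrites the 2 computed for the first line.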

There is a short and fail-safe way: use good old collections.Counter, but only for certain words.

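To build this filtered counter, iterate over the words but filter them with your unique set (the set you built is ideal for filtering out unwanted words efficiently, so let's keep it):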

import collections

c = collections.Counter(word for line in temp for word in line.split() if word in unique)

Then print, sorted:

for word,count in sorted(c.items()):
    print("{}: {}".format(word,count))

This prints (excerpt):

...
reading: 1
receives: 1
region: 1
remarkable: 1
rhine: 1
river: 4
rivers: 1
safe: 1
scotland: 1
seaboard: 1
...
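If the goal is the layout from the question (the keyword with its own count on top, followed by the words that share a line with it), one possible sketch uses a second Counter for the neighbouring words. The function name word_profile and the shortened temp below are my own illustrative choices, not something from the question. Unlike the original comprehension, this adds up counts across lines instead of overwriting them.

```python
import collections

def word_profile(kw, lines):
    """Return (count of kw, Counter of the other words on lines containing kw)."""
    kw_count = 0
    neighbours = collections.Counter()
    for line in lines:
        words = line.split()
        hits = words.count(kw)  # list.count: whole-word matches only
        if hits:
            kw_count += hits
            neighbours.update(w for w in words if w != kw)
    return kw_count, neighbours

# Shortened, illustrative subset of the question's temp list (assumption)
temp = ['mississippi worth reading about',
        ' commonplace river contrary ways remarkable',
        ' river vast drainage-basin draws water supply']

kw_count, neighbours = word_profile('river', temp)
print('{}: {}'.format('river', kw_count))
for word, count in sorted(neighbours.items()):
    print('{}: {}'.format(word, count))
```

Calling word_profile once per word in unique would reproduce the per-word profiles, at the cost of one pass over temp per keyword.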
temp = ['mississippi worth reading about', ' commonplace river contrary ways remarkable', ' considering missouri main branch longest river world--four miles', ' seems safe crookedest river world part journey uses cover ground crow fly six seventy-five', ' discharges water st', ' lawrence twenty-five rhine thirty-eight thames', ' river vast drainage-basin draws water supply twenty-eight states territories delaware atlantic seaboard country idaho pacific slope spread forty-five degrees longitude', ' mississippi receives carries gulf water fifty-four subordinate rivers navigable steamboats hundreds navigable flats keels', ' area drainage-basin combined areas england wales scotland ireland france spain portugal germany austria italy turkey almost wide region fertile mississippi valley proper exceptionally so']
unique = {'longest', 'considering', 'receives', 'water', 'discharges', 'atlantic', 'austria', 'part', 'idaho', 'main', 'drainage-basin', 'st', 'twenty-five', 'seventy-five', 'slope--a', 'world--four', 'remarkable', 'rivers', 'country', 'crookedest', 'areas', 'ireland', 'fifty-four', 'portugal', 'valley', 'france', 'almost', 'branch', 'twenty-eight', 'fertile', 'england', 'crow', 'spread', 'italy', 'journey', 'germany', 'river', 'draws', 'exceptionally', 'scotland', 'fly', 'uses', 'supply', 'region', 'rhine', 'ground', 'thirty-eight', 'thames', 'pacific', 'degrees', 'mississippi', 'lawrence', 'six', 'cover', 'subordinate', 'flats', 'navigable', 'area', 'proper', 'states', 'safe', 'wide', 'territories', 'vast', 'hundreds', 'contrary', 'missouri', 'commonplace', 'gulf', 'worth', 'seaboard', 'steamboats', 'wales', 'turkey', 'combined', 'delaware', 'forty-five', 'carries', 'seems', 'reading', 'keels', 'longitude', 'spain', 'ways'}
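# start every unique word at 0, then count whole-word matches
words = dict.fromkeys(unique, 0)
for line in temp:  # renamed from `str` to avoid shadowing the built-in
    for w in line.split():
        if w in unique:
            words[w] += 1

# print the counts in alphabetical order
for a in sorted(words):
    print('{}: {}'.format(a, words[a]))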
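One difference worth noting: because every word starts at 0 here, words from unique that never occur in temp are still printed with a count of 0, whereas a plain filtered Counter leaves them out. If that zero-filled output is wanted together with Counter, a minimal sketch could look like this (the tiny temp and unique below are illustrative assumptions):

```python
import collections

# Illustrative subset; 'slope--a' is in unique but never occurs in temp (assumption)
temp = [' commonplace river contrary ways', ' river vast water supply']
unique = {'river', 'water', 'slope--a'}

# Start every unique word at 0, then add the real occurrences
counts = collections.Counter(dict.fromkeys(unique, 0))
counts.update(w for line in temp for w in line.split() if w in unique)

for word in sorted(counts):
    print('{}: {}'.format(word, counts[word]))
```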
