Calculate word frequencies for each user

Posted 2024-03-29 07:46:31


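I am trying to generate word-frequency statistics for each user, based on the reviews they wrote, like this:

User 1: word frequencies

User 2: word frequencies, and so on...

How can I do this?

Here I am trying to access each user's review text, but it gives me an error.

Please suggest an approach and some pseudocode.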

import json
from pprint import pprint

file = open('/Users/mack/Downloads/WKA/task/reviews.json','r')
content = file.read()
file = json.loads(content)

for eid, txt in file["id"]["text"]:
    print(eid, txt)

The JSON file is large and looks like this:

[
    {
       "id": 1,
       "text": "Bought this over a month ago and everything came like advertise. I got the purple cover and it looks wonderful. The outlet works just fine and charges my kindle without a problem. I also bought it on sale so it was $20 cheaper. Best. Deal. Ever. Love my kindle paperwhite (love being able to read in the dark too!) Also makes reading at work much easier than a traditional book. Thanks Amazon."
    },
    {
       "id": 2,
       "text": "Why three stars? Skip the next two paragraphs. Purchased the bundle on Black Friday - great price. The device works as advertised and I'm enjoying it. However, the lighting (even on max) is underwhelming. The features are handy and easy to use (i.e. dictionary, highlighting, bookmark, etc.) The case is attractive and sturdy enough, but the magnetic closure is rather weak. I suspect the case would open easily if the device were dropped.. In retrospect, I probably would have been dollars ahead to purchase a less expensive case separately rather than bundling. The reason for the three (3) stars? The promoted $15 credit towards purchase of ebook(s). After two unsuccessful attempts to redeem the credit and visiting with an Amazon rep, it appears the credit only works for Amazon digital/published books and is NOT applicable to third party publisher/sellers such as HarperCollins, Random House, Simon and Schuster, Penguin, Tyndale, Scholastic, Thomas Nelson, etc. etc. After respectfully telling the rep that this promotion seems very misleading and asking where I could find a list of authors and/or books for which the credit is applicable, he could offer no such list or database. He suggested finding an author on the Amazon ebook list, clicking on a title, putting the book into the order box and then noting the publisher in the order box. If it didn't say Amazon, I would know the credit could not be applied. I have since located several of my favorite writers and pulled up many of their ebooks. As I expected, NONE were available for purchase with the credit. ALL were published by major publishing houses. NONE were published by Amazon digital. I cannot imagine any prolific author of note not being affiliated with major publishing houses - which leaves the enticing ebook credit pretty much useless to me. The language in the Terms and Conditions seems vague at best regarding this restriction. This lack of clarity gives the consumer little, if any, pause regarding the use of the credit. After trying to use it, I felt like I had been scammed. I would NOT recommend purchasing the bundle - even on special pricing days like Black Friday. I feel like I simply gave $15 to Amazon and got virtually nothing in return. If I had it to do over again, I definitely would purchase the Paperwhite. I also would buy the Amazon charger and probably a less expensive case. (Even though I suspect a 5watt iPhone charger would work perfectly, I would still purchase the Amazon charger. In the event the device became problematic, the charger would be on the invoice thereby suggesting the device had been properly charged and disallowing refusal to repair or replace due to improper charging.) The device has been wonderful to use, the case is okay, haven't had to use the charger yet (impressive), but the $15 ebook credit seems virtually worthless."
    }
]

Input: each id and its corresponding text, as in the JSON

Output: each id and the counts of the words that occur in its text


1 answer
Anonymous user
#1 · Posted 2024-03-29 07:46:31

file = \
[
    {
       "id": 1,
       "text": "Bought this over a month ago and everything came like advertise. I got the purple cover and it looks wonderful. The outlet works just fine and charges my kindle without a problem. I also bought it on sale so it was $20 cheaper. Best. Deal. Ever. Love my kindle paperwhite (love being able to read in the dark too!) Also makes reading at work much easier than a traditional book. Thanks Amazon.",
    },
    {
       "id": 2,
       "text": "Why three stars? Skip the next two paragraphs. Purchased the bundle on Black Friday - great price. The device works as advertised and I'm enjoying it. However, the lighting (even on max) is underwhelming. The features are handy and easy to use (i.e. dictionary, highlighting, bookmark, etc.) The case is attractive and sturdy enough, but the magnetic closure is rather weak. I suspect the case would open easily if the device were dropped.. In retrospect, I probably would have been dollars ahead to purchase a less expensive case separately rather than bundling. The reason for the three (3) stars? The promoted $15 credit towards purchase of ebook(s). After two unsuccessful attempts to redeem the credit and visiting with an Amazon rep, it appears the credit only works for Amazon digital/published books and is NOT applicable to third party publisher/sellers such as HarperCollins, Random House, Simon and Schuster, Penguin, Tyndale, Scholastic, Thomas Nelson, etc. etc. After respectfully telling the rep that this promotion seems very misleading and asking where I could find a list of authors and/or books for which the credit is applicable, he could offer no such list or database. He suggested finding an author on the Amazon ebook list, clicking on a title, putting the book into the order box and then noting the publisher in the order box. If it didn't say Amazon, I would know the credit could not be applied. I have since located several of my favorite writers and pulled up many of their ebooks. As I expected, NONE were available for purchase with the credit. ALL were published by major publishing houses. NONE were published by Amazon digital. I cannot imagine any prolific author of note not being affiliated with major publishing houses - which leaves the enticing ebook credit pretty much useless to me. The language in the Terms and Conditions seems vague at best regarding this restriction. This lack of clarity gives the consumer little, if any, pause regarding the use of the credit. After trying to use it, I felt like I had been scammed. I would NOT recommend purchasing the bundle - even on special pricing days like Black Friday. I feel like I simply gave $15 to Amazon and got virtually nothing in return. If I had it to do over again, I definitely would purchase the Paperwhite. I also would buy the Amazon charger and probably a less expensive case. (Even though I suspect a 5watt iPhone charger would work perfectly, I would still purchase the Amazon charger. In the event the device became problematic, the charger would be on the invoice thereby suggesting the device had been properly charged and disallowing refusal to repair or replace due to improper charging.) The device has been wonderful to use, the case is okay, haven't had to use the charger yet (impressive), but the $15 ebook credit seems virtually worthless.",
    }
]

dict

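count = {}
for user in file:
    # one inner dict (word -> occurrences) per review id
    count[user['id']] = {}
    for word in user['text'].split():  # split on whitespace only
        # .get(word, 0) returns 0 the first time a word is seen
        count[user['id']][word] = count[user['id']].get(word, 0) + 1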

Output:

{1: {'work': 1, 'so': 1, 'like': 1, 'came': 1, 'and': 3, 'problem.': 1, 'over': 1, 'dark': 1, 'the': 2, 'just': 1, 'than': 1, 'Deal.': 1, 'being': 1, 'purple': 1, 'wonderful.': 1, 'reading': 1, 'my': 2, 'Also': 1, 'makes': 1, 'on': 1, 'Love': 1, '(love': 1, 'fine': 1, 'Ever.': 1, 'paperwhite': 1, 'Thanks': 1, 'to': 1, '$20': 1, 'bought': 1, 'book.': 1, 'at': 1, 'traditional': 1, 'read': 1, 'looks': 1, 'in': 1, 'cover': 1, 'kindle': 2, 'cheaper.': 1, 'too!)': 1, 'Best.': 1, 'works': 1, 'Amazon.': 1, 'The': 1, 'it': 3, 'easier': 1, 'this': 1, 'got': 1, 'sale': 1, 'outlet': 1, 'without': 1, 'also': 1, 'advertise.': 1, 'Bought': 1, 'much': 1, 'able': 1, 'everything': 1, 'I': 2, 'ago': 1, 'was': 1, 'a': 3, 'charges': 1, 'month': 1}, 2: {'repair': 1, 'many': 1, 'applied.': 1, 'noting': 1, 'respectfully': 1, 'expected,': 1, 'days': 1, 'several': 1, 'then': 1, 'best': 1, 'very': 1, 'being': 1, 'telling': 1, 'weak.': 1, 'clicking': 1, 'okay,': 1, 'any,': 1, 'got': 1, 'improper': 1, 'to': 12, 'trying': 1, 'use,': 1, 'if': 2, 'became': 1, 'closure': 1, 'is': 6, 'sturdy': 1, 'buy': 1, 'Nelson,': 1, 'features': 1, 'lighting': 1, 'After': 3, '(3)': 1, 'finding': 1, 'putting': 1, 'of': 7, 'unsuccessful': 1, 'say': 1, 'simply': 1, 'which': 2, 'device': 5, 'only': 1, 'attractive': 1, 'max)': 1, 'offer': 1, 'nothing': 1, 'lack': 1, 'Random': 1, 'pulled': 1, 'Paperwhite.': 1, 'this': 2, 'felt': 1, 'visiting': 1, 'appears': 1, 'publisher/sellers': 1, 'two': 2, 'ebooks.': 1, 'are': 1, 'major': 2, 'Tyndale,': 1, 'pretty': 1, 'clarity': 1, 'dollars': 1, 'Penguin,': 1, 'even': 1, 'enticing': 1, '(impressive),': 1, 'price.': 1, 'and': 13, 'over': 1, 'seems': 3, "didn't": 1, 'also': 1, 'order': 2, 'little,': 1, 'Amazon,': 1, 'reason': 1, 'have': 2, 'suggested': 1, 'digital.': 1, '(even': 1, 'redeem': 1, 'no': 1, 'pricing': 1, 'Simon': 1, 'pause': 1, 'cannot': 1, 'on': 6, 'publisher': 1, 'HarperCollins,': 1, 'yet': 1, 'Purchased': 1, 'consumer': 1, 'note': 1, 'attempts': 1, 'imagine': 1, 
'box': 1, 'suspect': 2, 'case.': 1, 'an': 2, 'author': 2, 'Skip': 1, 'much': 1, 'published': 2, 'charging.)': 1, 'be': 2, 'affiliated': 1, 'list,': 1, 'expensive': 2, 'digital/published': 1, 'leaves': 1, 'purchasing': 1, 'Why': 1, 'return.': 1, 'Conditions': 1, '5watt': 1, 'vague': 1, 'title,': 1, 'This': 1, 'If': 2, 'know': 1, 'do': 1, 'favorite': 1, 'invoice': 1, 'than': 1, 'Terms': 1, 'House,': 1, 'handy': 1, 'since': 1, 'In': 2, 'up': 1, 'charged': 1, 'definitely': 1, 'purchase': 5, 'like': 3, 'replace': 1, 'rep': 1, 'wonderful': 1, 'the': 35, 'enough,': 1, 'Friday': 1, 'find': 1, 'problematic,': 1, 'been': 4, 'applicable': 1, 'probably': 2, 'bundle': 2, 'open': 1, 'credit': 7, 'However,': 1, 'could': 3, 'paragraphs.': 1, 'As': 1, 'still': 1, 'but': 2, 'restriction.': 1, 'ahead': 1, 'NONE': 2, 'gave': 1, 'charger.': 1, 'language': 1, 'advertised': 1, 'database.': 1, 'again,': 1, 'bundling.': 1, 'dropped..': 1, 'work': 1, 'houses.': 1, 'and/or': 1, 'credit.': 2, 'authors': 1, 'great': 1, 'third': 1, 'he': 1, 'by': 2, 'has': 1, 'promotion': 1, 'dictionary,': 1, 'at': 1, 'works': 2, 'book': 1, 'though': 1, 'it': 3, 'useless': 1, 'it.': 1, 'writers': 1, 'refusal': 1, 'NOT': 2, 'as': 2, 'Schuster,': 1, 'less': 2, 'would': 9, 'I': 17, 'a': 5, 'their': 1, '(i.e.': 1, 'box.': 1, 'enjoying': 1, 'Amazon': 7, '$15': 3, 'separately': 1, 'it,': 1, 'promoted': 1, 'publishing': 2, 'with': 3, "haven't": 1, 'easy': 1, 'magnetic': 1, 'retrospect,': 1, 'ebook(s).': 1, 'Black': 2, 'special': 1, 'list': 2, 'scammed.': 1, 'charger': 4, 'rather': 2, 'located': 1, 'misleading': 1, 'asking': 1, '(Even': 1, 'feel': 1, 'Scholastic,': 1, 'such': 2, 'ebook': 3, 'into': 1, 'recommend': 1, 'Friday.': 1, 'towards': 1, 'Thomas': 1, 'easily': 1, 'gives': 1, 'properly': 1, 'case': 4, 'me.': 1, 'three': 2, 'etc.': 2, 'rep,': 1, 'next': 1, 'bookmark,': 1, 'etc.)': 1, 'my': 1, 'not': 2, 'were': 4, 'in': 3, 'suggesting': 1, 'disallowing': 1, 'iPhone': 1, 'party': 1, 'any': 1, 'where': 1, 
'perfectly,': 1, 'regarding': 2, 'applicable,': 1, 'underwhelming.': 1, '-': 3, 'virtually': 2, 'worthless.': 1, 'or': 2, 'had': 4, 'use': 4, 'highlighting,': 1, 'event': 1, 'He': 1, 'houses': 1, 'that': 1, 'for': 4, "I'm": 1, 'The': 7, 'available': 1, 'prolific': 1, 'stars?': 2, 'ALL': 1, 'thereby': 1, 'due': 1, 'books': 2}}

310 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
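The loop in the question fails because the top-level JSON value is a list of review objects, so file["id"] indexes a list with a string and raises a TypeError. A minimal sketch, assuming the file has the structure shown in the question, that reads it with json.load inside a with block, iterates the list, and applies the same dict counting as above. The helper names load_reviews and word_counts are hypothetical, and the two short reviews are fragments of the question's data used as a stand-in for the real file.

```python
import json


def load_reviews(path):
    """Read a JSON file whose top-level value is a list of {"id": ..., "text": ...} objects."""
    # "with" closes the file handle, unlike the bare open() in the question
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def word_counts(reviews):
    """Map each review id to a dict of whitespace-separated word -> occurrences."""
    count = {}
    for review in reviews:  # the top-level value is a list, so iterate over it
        per_id = count.setdefault(review["id"], {})
        for word in review["text"].split():
            per_id[word] = per_id.get(word, 0) + 1
    return count


# Reading the real file (path taken from the question):
# reviews = load_reviews('/Users/mack/Downloads/WKA/task/reviews.json')
reviews = [
    {"id": 1, "text": "Best. Deal. Ever. Love my kindle"},
    {"id": 2, "text": "The device has been wonderful to use, the case is okay"},
]

for eid, words in word_counts(reviews).items():
    # id, total number of words in the text, then the per-word counts
    print(eid, sum(words.values()), words)
```

Using setdefault means that if the same id appeared in several entries, its counts would be merged rather than overwritten, which differs slightly from the answer's loop.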


collections.Counter

from collections import Counter

count = {}
for user in file:
    count[user['id']] = Counter()
    for word in user['text'].split():
        count[user['id']][word] += 1

Output:

{1: Counter({'and': 3, 'it': 3, 'a': 3, 'my': 2, 'kindle': 2, 'I': 2, 'the': 2, 'charges': 1, 'dark': 1, 'reading': 1, 'purple': 1, 'being': 1, 'works': 1, 'outlet': 1, 'read': 1, 'too!)': 1, 'like': 1, 'wonderful.': 1, 'also': 1, 'The': 1, 'much': 1, 'sale': 1, 'paperwhite': 1, 'cover': 1, 'Thanks': 1, 'Best.': 1, 'came': 1, 'Deal.': 1, 'so': 1, 'Ever.': 1, 'ago': 1, 'advertise.': 1, '$20': 1, 'Amazon.': 1, 'bought': 1, 'problem.': 1, 'cheaper.': 1, 'got': 1, 'month': 1, 'work': 1, 'makes': 1, 'just': 1, 'than': 1, 'everything': 1, 'Also': 1, 'this': 1, 'fine': 1, 'able': 1, 'to': 1, 'without': 1, 'was': 1, 'in': 1, 'book.': 1, 'at': 1, 'Bought': 1, 'Love': 1, 'on': 1, 'over': 1, 'looks': 1, '(love': 1, 'traditional': 1, 'easier': 1}), 2: Counter({'the': 35, 'I': 17, 'and': 13, 'to': 12, 'would': 9, 'Amazon': 7, 'credit': 7, 'The': 7, 'of': 7, 'on': 6, 'is': 6, 'a': 5, 'device': 5, 'purchase': 5, 'use': 4, 'been': 4, 'charger': 4, 'case': 4, 'were': 4, 'for': 4, 'had': 4, 'like': 3, 'in': 3, 'it': 3, '-': 3, '$15': 3, 'ebook': 3, 'could': 3, 'seems': 3, 'with': 3, 'After': 3, 'published': 2, 'works': 2, 'two': 2, 'by': 2, 'books': 2, 'In': 2, 'rather': 2, 'or': 2, 'such': 2, 'not': 2, 'probably': 2, 'less': 2, 'be': 2, 'major': 2, 'author': 2, 'NOT': 2, 'which': 2, 'publishing': 2, 'etc.': 2, 'expensive': 2, 'NONE': 2, 'if': 2, 'bundle': 2, 'as': 2, 'have': 2, 'credit.': 2, 'virtually': 2, 'list': 2, 'three': 2, 'Black': 2, 'this': 2, 'an': 2, 'regarding': 2, 'stars?': 2, 'order': 2, 'If': 2, 'suspect': 2, 'but': 2, 'properly': 1, 'charging.)': 1, 'dollars': 1, 'underwhelming.': 1, 'located': 1, 'dropped..': 1, 'suggesting': 1, 'return.': 1, 'much': 1, 'Conditions': 1, 'charger.': 1, 'Scholastic,': 1, 'list,': 1, 'attempts': 1, 'note': 1, 'pause': 1, 'applicable,': 1, 'repair': 1, 'replace': 1, 'and/or': 1, 'box.': 1, 'He': 1, 'invoice': 1, 'clarity': 1, 'Thomas': 1, 'title,': 1, "I'm": 1, 'it,': 1, 'enticing': 1, 'separately': 1, 'event': 1, 'pulled': 1, 
'though': 1, 'Tyndale,': 1, 'several': 1, 'use,': 1, 'has': 1, 'noting': 1, 'promotion': 1, 'pretty': 1, 'suggested': 1, 'vague': 1, 'lack': 1, 'bundling.': 1, "haven't": 1, 'houses': 1, 'retrospect,': 1, 'clicking': 1, 'easy': 1, 'Amazon,': 1, 'Schuster,': 1, 'favorite': 1, 'reason': 1, 'many': 1, '(even': 1, 'applicable': 1, 'special': 1, 'iPhone': 1, 'prolific': 1, 'definitely': 1, 'my': 1, 'up': 1, 'wonderful': 1, 'are': 1, 'attractive': 1, 'case.': 1, 'it.': 1, 'redeem': 1, 'know': 1, 'digital/published': 1, 'great': 1, 'no': 1, 'any,': 1, 'As': 1, 'promoted': 1, 'respectfully': 1, 'rep': 1, 'telling': 1, 'ebooks.': 1, "didn't": 1, 'handy': 1, 'However,': 1, 'publisher/sellers': 1, 'disallowing': 1, 'price.': 1, 'perfectly,': 1, 'very': 1, 'worthless.': 1, 'into': 1, 'restriction.': 1, 'magnetic': 1, 'buy': 1, 'next': 1, 'HarperCollins,': 1, 'unsuccessful': 1, 'their': 1, 'find': 1, 'pricing': 1, 'Why': 1, 'language': 1, 'asking': 1, '(Even': 1, 'any': 1, 'imagine': 1, 'trying': 1, 'offer': 1, 'ebook(s).': 1, 'towards': 1, 'Random': 1, 'thereby': 1, 'Paperwhite.': 1, 'Simon': 1, 'third': 1, 'rep,': 1, 'Skip': 1, 'consumer': 1, 'finding': 1, 'affiliated': 1, 'cannot': 1, 'House,': 1, 'houses.': 1, 'say': 1, 'gave': 1, 'enjoying': 1, 'due': 1, 'etc.)': 1, '(impressive),': 1, 'publisher': 1, 'ALL': 1, 'became': 1, 'scammed.': 1, 'gives': 1, 'appears': 1, 'recommend': 1, 'improper': 1, 'problematic,': 1, 'Friday': 1, 'sturdy': 1, 'again,': 1, 'open': 1, 'expected,': 1, 'got': 1, 'dictionary,': 1, 'max)': 1, 'lighting': 1, 'Nelson,': 1, 'feel': 1, 'applied.': 1, 'yet': 1, 'party': 1, 'book': 1, 'enough,': 1, 'available': 1, 'purchasing': 1, 'okay,': 1, 'days': 1, 'bookmark,': 1, 'misleading': 1, 'where': 1, 'putting': 1, 'box': 1, '5watt': 1, 'Friday.': 1, 'felt': 1, 'ahead': 1, 'even': 1, 'authors': 1, 'leaves': 1, 'advertised': 1, 'easily': 1, 'visiting': 1, 'refusal': 1, 'me.': 1, 'Terms': 1, 'only': 1, 'digital.': 1, 'also': 1, 'he': 1, 'useless': 1, 'This': 1, 
'still': 1, 'then': 1, 'highlighting,': 1, 'do': 1, 'features': 1, 'Purchased': 1, 'closure': 1, 'database.': 1, 'Penguin,': 1, 'work': 1, 'best': 1, 'than': 1, 'paragraphs.': 1, 'since': 1, 'being': 1, 'that': 1, 'over': 1, 'charged': 1, 'nothing': 1, 'writers': 1, '(i.e.': 1, 'weak.': 1, 'at': 1, '(3)': 1, 'simply': 1, 'little,': 1})}

536 µs ± 858 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
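Because Counter accepts any iterable, the inner loop can be collapsed into Counter(user['text'].split()), and most_common() then lists the most frequent words first. Both versions above split on whitespace only, so their outputs count 'credit' and 'credit.', or 'The' and 'the', as different words. A minimal sketch with a hypothetical helper counts_per_id and an assumed tokenization rule (lowercase, then a regular expression that keeps letters, digits and apostrophes); whether that rule suits the data is a judgment call, since it also turns 'digital/published' into two words and '$15' into '15'.

```python
import re
from collections import Counter

# Assumed tokenization rule (not from the question): lowercase the text and keep
# runs of ASCII letters, digits and apostrophes, dropping punctuation like "." or "-".
WORD_RE = re.compile(r"[a-z0-9']+")


def counts_per_id(reviews, normalize=False):
    """Return {id: Counter(word -> occurrences)} for a list of review dicts."""
    result = {}
    for review in reviews:
        text = review["text"]
        words = WORD_RE.findall(text.lower()) if normalize else text.split()
        # Counter tallies the whole iterable at once, no per-word += 1 needed
        result[review["id"]] = Counter(words)
    return result


reviews = [
    {"id": 1, "text": "Love my kindle paperwhite (love being able to read in the dark too!)"},
    {"id": 2, "text": "Skip the next two paragraphs. Purchased the bundle on Black Friday - great price."},
]

raw = counts_per_id(reviews)
clean = counts_per_id(reviews, normalize=True)
print(raw[1]["Love"], raw[1]["(love"])  # 1 1  (counted as two different words)
print(clean[1]["love"])                 # 2    (merged after normalization)
print(clean[2].most_common(1))          # [('the', 2)]
```

Whether Counter(iterable) beats the dict.get loop on the full data can be checked by timing both versions, as the answer does.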
