Error when applying tokenize to JSON with Python

Posted 2024-04-16 20:58:47


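I have been following a tutorial that seems to be popular with other users, but I keep hitting an error that I have not been able to solve. I am using PyCharm and Python 3.6 with the code below. I appreciate your time and help, thank you.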

Code:

import json
from collections import Counter
import re
from nltk.corpus import stopwords
import string


with open(fname, 'r', newline='\r\n') as f:
    count_all = Counter()
    for line in f:

        tweet = json.loads(line)
        terms_stop = [term for term in preprocess(tweet['text']) if term not in stop]
        terms_single = set(terms_stop)

        terms_hash = [term for term in preprocess(tweet['text']) if term.startswith('#')]

The error I get is:

Traceback (most recent call last):
  File "C:/Users/Sukhivinder/PycharmProjects/mscProjectOne/sentimentJSONfile.py", line 50, in <module>
    tweet = json.loads(line)
  File "C:\Program Files\Python36\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Program Files\Python36\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files\Python36\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Data:

{"created_at":"Wed Nov 15 21:37:57 +0000 2017","id":930912780831678464,"id_str":"930912780831678464","text":"Greatest Brexit speech ever? LABOUR MP\u2019s address will make your neck hairs stand up https:\/\/t.co\/5G3uEELEll","source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":886309819245879296,"id_str":"886309819245879296","name":"HMS","screen_name":"HMS150446","location":"England, United Kingdom","url":null,"description":"Any R\/T is not an endorsement but a way of sharing articles received on my twitter feed which I think are interesting and want to share.","translator_type":"none","protected":false,"verified":false,"followers_count":491,"friends_count":1480,"listed_count":4,"favourites_count":12236,"statuses_count":27375,"created_at":"Sat Jul 15 19:41:42 +0000 
2017","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"F5F8FA","profile_background_image_url":"","profile_background_image_url_https":"","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/886676198155329538\/E8RsRDyz_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/886676198155329538\/E8RsRDyz_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/886309819245879296\/1500235099","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/5G3uEELEll","expanded_url":"https:\/\/www.express.co.uk\/news\/politics\/880048\/Brexit-speech-Labour-MP-Peter-Shore-EU","display_url":"express.co.uk\/news\/politics\/\u2026","indices":[84,107]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en","timestamp_ms":"1510781877130"}
{"created_at":"Wed Nov 15 21:37:57 +0000 2017","id":930912782056345600,"id_str":"930912782056345600","text":"RT @ThatTimWalker: This is a courageous journalist and a courageous newspaper. Would Mrs May have owned up to our Russian problem if i\u2026 ","source":"\u003ca href=\"http:\/\/twitter.com\/#!\/download\/ipad\" rel=\"nofollow\"\u003eTwitter for iPad\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":2687419047,"id_str":"2687419047","name":"Elizabeth#FBPE#StopBrexit","screen_name":"epcarruthers","location":"Edinburgh and Dormont","url":null,"description":"wife,mother, grandmother, gardener, LibDemocrat, and owner of 2 delightful cats.","translator_type":"none","protected":false,"verified":false,"followers_count":248,"friends_count":185,"listed_count":2,"favourites_count":18269,"statuses_count":29872,"created_at":"Tue Jul 08 15:24:35 +0000 
2014","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/715085957146525696\/edS1d2lF_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/715085957146525696\/edS1d2lF_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/2687419047\/1510238275","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Tue Nov 14 23:24:41 +0000 2017","id":930577252827516928,"id_str":"930577252827516928","text":"This is a courageous journalist and a courageous newspaper. 
Would Mrs May have owned up to our Russian problem if i\u2026 https:\/\/t.co\/U83rrcqahM","display_text_range":[0,140],"source":"\u003ca href=\"http:\/\/twitter.com\/download\/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":60606206,"id_str":"60606206","name":"Tim Walker","screen_name":"ThatTimWalker","location":"London","url":null,"description":"A point of view","translator_type":"none","protected":false,"verified":true,"followers_count":18870,"friends_count":887,"listed_count":214,"favourites_count":8392,"statuses_count":26689,"created_at":"Mon Jul 27 14:07:16 +0000 2009","utc_offset":0,"time_zone":"Casablanca","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1B95E0","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/914607099220496384\/dtzHhd2V_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/914607099220496384\/dtzHhd2V_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/60606206\/1398249447","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"quoted_status_id":930576544820547584,"quoted_status_id_str":"930576544820547584","quoted_status":{"created_at":"Tue Nov 14 23:21:52 +0000 
2017","id":930576544820547584,"id_str":"930576544820547584","text":"Playing catch-up on Brexit-Trump-Russia? My piece from May. Read it before Cambridge Analytica (of FBI &amp; Wikileaks\u2026 https:\/\/t.co\/pnUZeed6kq","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":722242009,"id_str":"722242009","name":"Carole Cadwalladr","screen_name":"carolecadwalla","location":null,"url":"https:\/\/www.theguardian.com\/profile\/carolecadwalladr","description":"Late adopter. Early giver-upper. Guardian & Observer writer.","translator_type":"none","protected":false,"verified":false,"followers_count":44945,"friends_count":1866,"listed_count":547,"favourites_count":660,"statuses_count":2899,"created_at":"Sat Jul 28 14:06:01 +0000 2012","utc_offset":3600,"time_zone":"Amsterdam","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/875727729525747717\/ZAIcCXFJ_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/875727729525747717\/ZAIcCXFJ_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/722242009\/1503701353","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"
is_quote_status":false,"extended_tweet":{"full_text":"Playing catch-up on Brexit-Trump-Russia? My piece from May. Read it before Cambridge Analytica (of FBI &amp; Wikileaks fame!) sues @guardian into oblivion. They're trying to shut this - me, us - down. \nhttps:\/\/t.co\/KKZUJ81NE9","display_text_range":[0,225],"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/KKZUJ81NE9","expanded_url":"https:\/\/www.theguardian.com\/technology\/2017\/may\/07\/the-great-british-brexit-robbery-hijacked-democracy?CMP=share_btn_tw","display_url":"theguardian.com\/technology\/201\u2026","indices":[202,225]}],"user_mentions":[{"screen_name":"guardian","name":"The Guardian","id":87818409,"id_str":"87818409","indices":[131,140]}],"symbols":[]}},"quote_count":169,"reply_count":113,"retweet_count":2694,"favorite_count":2551,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/pnUZeed6kq","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/930576544820547584","display_url":"twitter.com\/i\/web\/status\/9\u2026","indices":[120,143]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"is_quote_status":true,"extended_tweet":{"full_text":"This is a courageous journalist and a courageous newspaper. Would Mrs May have owned up to our Russian problem if it hadn't got into newspapers? I wonder. 
https:\/\/t.co\/Qj8AkSxnIx","display_text_range":[0,154],"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/Qj8AkSxnIx","expanded_url":"https:\/\/twitter.com\/carolecadwalla\/status\/930576544820547584","display_url":"twitter.com\/carolecadwalla\u2026","indices":[155,178]}],"user_mentions":[],"symbols":[]}},"quote_count":2,"reply_count":5,"retweet_count":206,"favorite_count":293,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/U83rrcqahM","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/930577252827516928","display_url":"twitter.com\/i\/web\/status\/9\u2026","indices":[117,140]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"quoted_status_id":930576544820547584,"quoted_status_id_str":"930576544820547584","quoted_status":{"created_at":"Tue Nov 14 23:21:52 +0000 2017","id":930576544820547584,"id_str":"930576544820547584","text":"Playing catch-up on Brexit-Trump-Russia? My piece from May. Read it before Cambridge Analytica (of FBI &amp; Wikileaks\u2026 https:\/\/t.co\/pnUZeed6kq","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":true,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":722242009,"id_str":"722242009","name":"Carole Cadwalladr","screen_name":"carolecadwalla","location":null,"url":"https:\/\/www.theguardian.com\/profile\/carolecadwalladr","description":"Late adopter. Early giver-upper. 
Guardian & Observer writer.","translator_type":"none","protected":false,"verified":false,"followers_count":44945,"friends_count":1866,"listed_count":547,"favourites_count":660,"statuses_count":2899,"created_at":"Sat Jul 28 14:06:01 +0000 2012","utc_offset":3600,"time_zone":"Amsterdam","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/875727729525747717\/ZAIcCXFJ_normal.jpg","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/875727729525747717\/ZAIcCXFJ_normal.jpg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/722242009\/1503701353","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"extended_tweet":{"full_text":"Playing catch-up on Brexit-Trump-Russia? My piece from May. Read it before Cambridge Analytica (of FBI &amp; Wikileaks fame!) sues @guardian into oblivion. They're trying to shut this - me, us - down. 
\nhttps:\/\/t.co\/KKZUJ81NE9","display_text_range":[0,225],"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/KKZUJ81NE9","expanded_url":"https:\/\/www.theguardian.com\/technology\/2017\/may\/07\/the-great-british-brexit-robbery-hijacked-democracy?CMP=share_btn_tw","display_url":"theguardian.com\/technology\/201\u2026","indices":[202,225]}],"user_mentions":[{"screen_name":"guardian","name":"The Guardian","id":87818409,"id_str":"87818409","indices":[131,140]}],"symbols":[]}},"quote_count":169,"reply_count":113,"retweet_count":2694,"favorite_count":2551,"entities":{"hashtags":[],"urls":[{"url":"https:\/\/t.co\/pnUZeed6kq","expanded_url":"https:\/\/twitter.com\/i\/web\/status\/930576544820547584","display_url":"twitter.com\/i\/web\/status\/9\u2026","indices":[120,143]}],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"en"},"is_quote_status":true,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"ThatTimWalker","name":"Tim Walker","id":60606206,"id_str":"60606206","indices":[3,17]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1510781877422"}

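I have added the full error message in response to a user's request. Because the post consists mostly of code, I had to add this extra text before the edit would be accepted.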


1 answer
Anonymous user
#1 · Posted 2024-04-16 20:58:47

TL;DR:

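For the open() function, use U in the mode string. Note that the U flag is deprecated in Python 3 and was removed in Python 3.11; in Python 3, text mode already uses universal newlines whenever the newline argument is omitted.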

What was the problem?

I changed your open() call to use Universal Newline Support. It went...

From:

with open(fname, 'r', newline='\r\n') as f:

To:

with open(fname, 'rU') as f:

This fixed the problem in my testing.
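
For readers on newer interpreters, here is a minimal sketch of the same idea without the U flag, assuming the file holds one JSON object per line. The helper name load_json_lines and the sample data are hypothetical and not part of the question's code. It relies on the default newline handling of open() in Python 3 and skips whitespace-only lines, which json.loads cannot parse.

```python
import json
import os
import tempfile


def load_json_lines(path):
    """Parse one JSON object per line, skipping blank lines.

    Hypothetical helper for illustration; not the asker's code.
    """
    records = []
    # Leaving out the newline argument keeps universal newlines enabled,
    # so \r, \n and \r\n all end a line.
    with open(path, 'r', encoding='utf-8') as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # a blank line is not valid JSON
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line instead of stopping the run.
                print(f"line {line_number}: {err}")
    return records


def _demo():
    # Made-up sample: CRLF line endings with an empty line in between.
    raw = b'{"text": "first #tag"}\r\n\r\n{"text": "second"}\r\n'
    with tempfile.NamedTemporaryFile(delete=False, suffix='.json') as tmp:
        tmp.write(raw)
    try:
        return load_json_lines(tmp.name)
    finally:
        os.remove(tmp.name)


demo_records = _demo()
print(demo_records)
```

Printing the line number on failure makes it easier to see which part of the file the parser rejects.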

What is Universal Newline Support?

PEP-278

This PEP discusses a way in which Python can support I/O on files which have a newline format that is not the native format on the platform, so that Python on each platform can read and import files with CR (Macintosh), LF (Unix) or CR LF (Windows) line endings.
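
To see the difference in practice, the sketch below writes a small file with mixed line endings and reads it back twice: once with the default newline handling and once with newline='\r\n' as in the question. The sample bytes are made up for illustration.

```python
import os
import tempfile

# Made-up sample containing CR, LF and CR LF line endings.
raw = b"one\rtwo\nthree\r\nfour"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
path = tmp.name

try:
    # Default (newline=None): every line-ending style ends a line
    # and is translated to '\n'.
    with open(path, 'r', encoding='ascii') as f:
        universal = f.readlines()

    # newline='\r\n': only CR LF ends a line, and nothing is translated.
    with open(path, 'r', encoding='ascii', newline='\r\n') as f:
        crlf_only = f.readlines()
finally:
    os.remove(path)

print(universal)  # ['one\n', 'two\n', 'three\n', 'four']
print(crlf_only)  # ['one\rtwo\nthree\r\n', 'four']
```

With newline='\r\n', lines that end in a bare CR or LF are not split, so a single "line" handed to json.loads may contain more or less than one JSON object.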
