UnicodeDecodeError when fetching Twitter data with Python

Asked 2025-04-30 07:38

I'm trying to fetch Twitter data for a specific Arabic keyword with the following code:

#imports
from tweepy import Stream
from tweepy import OAuthHandler 
from tweepy.streaming import StreamListener

#setting up the keys
consumer_key = '………….' 
consumer_secret = '…………….'
access_token = '…………..'
access_secret = '……...'

class TweetListener(StreamListener):
    # A listener handles tweets received from the stream.
    # This is a basic listener that just prints received tweets to standard output.

    def on_data(self, data):
        print (data)
        return True

    def on_error(self, status):
        print (status)

# printing all the tweets to the standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

stream = Stream(auth, TweetListener())
stream.filter(track=['سوريا'])

I get this error:

Traceback (most recent call last):
  File "/Users/Mona/Desktop/twitter.py", line 29, in <module>
    stream.filter(track=['سوريا'])
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 303, in filter
    encoded_track = [s.encode(encoding) for s in track]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd8 in position 0: ordinal not in range(128)

Any help would be appreciated!


1 Answer

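I looked through the tweepy source and found that one line in Stream appears to be the root of the problem, specifically in the filter method. When you call stream.filter(track=['سوريا']), Stream calls s.encode('utf-8') on each keyword, where s is 'سوريا' (in the source of filter you'll see that utf-8 is the default encoding). That is where the exception is raised. In Python 2, 'سوريا' without a u prefix is a byte string (str) holding the UTF-8 bytes of the word, not Unicode text, so before .encode('utf-8') can run, Python implicitly decodes those bytes with the default ascii codec. That decode fails on the very first byte, 0xd8, which is why you get a UnicodeDecodeError even though the code calls encode.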
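The hidden decode step can be reproduced in isolation. The sketch below assumes Python 3, where bytes and text are separate types, so the conversion Python 2 performs implicitly inside str.encode has to be written out: a plain 'سوريا' literal in a Python 2 source file saved as UTF-8 is the same byte sequence as 'سوريا'.encode('utf-8'), and Python 2 first runs the equivalent of .decode('ascii') on it. The helper name reproduce_py2_error is hypothetical.

```python
# Python 3 sketch: reproduce the implicit ASCII decode that Python 2
# performs when .encode() is called on a byte string (str).


def reproduce_py2_error(keyword):
    """Return the message Python 2 would raise for a byte-string keyword, or None."""
    # In a Python 2 file saved as UTF-8, 'سوريا' (no u prefix) is these raw bytes.
    raw = keyword.encode("utf-8")
    try:
        # Python 2's str.encode('utf-8') first decodes with the default
        # codec (ASCII); this line performs that hidden step explicitly.
        raw.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None  # pure-ASCII keywords decode without error


# 'س' (U+0633) is encoded in UTF-8 as 0xD8 0xB3, so the first byte is 0xd8.
print(hex("سوريا".encode("utf-8")[0]))
print(reproduce_py2_error("سوريا"))
print(reproduce_py2_error("syria"))
```

The second line printed matches the message in the traceback, and the ASCII-only keyword returns None, which is why English keywords never trigger this error.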

To fix this, pass a Unicode string instead:

 t = u"سوريا"
 stream.filter(track=[t])

(For clarity, I put your string into the variable t.)
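If keywords can arrive as either bytes or text (for example, read from a file or the command line), normalizing them to text before calling filter avoids the problem regardless of how each one was produced. This is a sketch assuming the byte input is UTF-8; to_text and encode_track are hypothetical helpers, and the final comprehension only mirrors the encoded_track line from the traceback rather than calling tweepy. It also runs on Python 2.7, where bytes is an alias for str. Note that in Python 2 a u"سوريا" literal additionally requires a # -*- coding: utf-8 -*- declaration at the top of the source file.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings with the given encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def encode_track(keywords, encoding="utf-8"):
    """Mirror tweepy's encoded_track step, but normalize every keyword to text first."""
    text_keywords = [to_text(k, encoding) for k in keywords]
    # Encoding text (not bytes) never triggers the implicit ASCII decode.
    return [s.encode(encoding) for s in text_keywords]


# Hypothetical mixed input: text, UTF-8 bytes, and a plain ASCII keyword.
mixed = ["سوريا", "سوريا".encode("utf-8"), "syria"]
encoded = encode_track(mixed)
print(encoded[0] == encoded[1])
```

Decoding at the boundary where data enters the program, and encoding only where it leaves, keeps every keyword in one type in between.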

Answered 2025-04-30 by Python大师
