Streaming with Tweepy: urllib3.exceptions.ProtocolError and a sequence of "list index out of range" errors

Posted 2024-04-18 18:49:40

The code I use to stream tweets (below) works fine for a few minutes, but after a while it starts printing an endless sequence of "list index out of range" errors.

from urllib3.exceptions import ProtocolError
from tweepy import Stream
from tweepy.auth import OAuthHandler
from tweepy.streaming import StreamListener
import time

ckey = 'your code here'
csecret = 'your code here'
atoken = 'your code here'
asecret = 'your code here'

class listener(StreamListener):

    def on_data(self, data):
        while True:
            try:
              #  print (data)
                tweet = data.split(',"text":"')[1].split('","')[0]
                tweet2 = data.split(',"screen_name":"')[1].split('","location')[0]
                print (tweet2,tweet)
                saveFile = open ('example.csv','a')
                saveFile.write('@')
                saveFile.write(tweet2)
                saveFile.write(';')
                saveFile.write(tweet)
                saveFile.write('\n')
                saveFile.close()
                return True

            except BaseException as e:
                print ('Failed on data', str(e))
                time.sleep(5)
            except ProtocolError:
                continue

    def on_error(self, status):
        print (status)

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=['example'])

This is what I get back after a few dozen tweets:

('Failed on data', 'list index out of range') 
('Failed on data', 'list index out of range')
('Failed on data', 'list index out of range') 
('Failed on data', 'list index out of range')
('Failed on data', 'list index out of range') 
('Failed on data', 'list index out of range')
[...]
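One likely reason the errors never stop, judging only from the code above: the while True loop in on_data retries the same data string after every failure. Not every message on the stream is a tweet; the Twitter v1.1 streaming API also delivered, for example, deletion notices ({"delete": ...}) and limit notices ({"limit": ...}) that have no "text" field. For such a message, data.split(',"text":"') returns a one-element list, [1] raises IndexError, the except branch prints the error and sleeps for 5 seconds, and the loop then retries identical input, which can never succeed. Because on_data never returns, the listener also stops consuming the stream. A minimal sketch of a fix, assuming Python 3 and v1.1-style tweet JSON with top-level text and user.screen_name fields, is to parse each message once with json.loads and skip anything that is not a tweet. The names parse_tweet, save_tweet and handle_message are hypothetical helpers, not part of Tweepy.

```python
import csv
import json


def parse_tweet(raw):
    """Return (screen_name, text) for a tweet message, or None for anything else.

    Assumes v1.1-style tweet JSON with top-level "text" and "user"."screen_name".
    Other stream messages (e.g. {"delete": ...} or {"limit": ...}) return None.
    """
    try:
        message = json.loads(raw)
    except ValueError:  # empty or malformed input
        return None
    if not isinstance(message, dict):
        return None
    user = message.get("user")
    text = message.get("text")
    if not isinstance(user, dict) or text is None:
        return None
    screen_name = user.get("screen_name")
    if screen_name is None:
        return None
    return screen_name, text


def save_tweet(path, screen_name, text):
    """Append one row; csv.writer quotes fields that contain ';' or newlines."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter=";").writerow(["@" + screen_name, text])


def handle_message(raw, path="example.csv"):
    """What on_data could do: parse once and never loop on the same input."""
    parsed = parse_tweet(raw)
    if parsed is None:
        return True  # not a tweet: skip it and keep the stream running
    save_tweet(path, *parsed)
    return True
```

Inside the listener, on_data(self, data) would then simply return handle_message(data). Writing through csv.writer with delimiter=";" also quotes tweets that themselves contain ";" or line breaks, which the manual saveFile.write calls would otherwise split across columns or rows.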

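Could this have something to do with the except ProtocolError: clause (ProtocolError comes from urllib3.exceptions)? I added it to work around a different error I had been getting before: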

urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 bytes read, 3650 more expected)', IncompleteRead(0 bytes read, 3650 more expected))
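The except ProtocolError: clause in on_data is unlikely to have any effect, for two reasons. Python checks except clauses in order and runs the first one that matches; every exception is a BaseException, so a ProtocolError clause placed after except BaseException is unreachable. Also, the try block in on_data only splits strings and writes to a file, so a network error such as IncompleteRead cannot be raised there; it is raised while the stream is being read, and therefore surfaces from twitterStream.filter(...). The sketch below only demonstrates the ordering rule; ProtocolError here is a hypothetical local stand-in class, used so the example runs without urllib3 installed.

```python
class ProtocolError(Exception):
    """Hypothetical stand-in for urllib3.exceptions.ProtocolError."""


def which_branch(exc):
    """Mirror the except-clause order used in on_data and report which one runs."""
    try:
        raise exc
    except BaseException:
        return "BaseException"
    except ProtocolError:  # unreachable: BaseException already matched
        return "ProtocolError"


def which_branch_reordered(exc):
    """Most specific clause first: ProtocolError now gets its own branch."""
    try:
        raise exc
    except ProtocolError:
        return "ProtocolError"
    except Exception:
        return "Exception"
```

Catching BaseException also traps KeyboardInterrupt and SystemExit, so except Exception is the usual catch-all when one is needed at all.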

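From what I've seen, this IncompleteRead error always appears when I'm handling a large number of tweets collected in a short time. I usually need the code to run for about 2 to 3 hours, so I keep having to restart it.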
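Since the error comes out of filter(), a common workaround is to wrap that call in a loop that catches the error and reconnects after a pause that grows with each failure, instead of restarting the script by hand. It may also help to keep on_data fast: the server may close a connection whose client reads too slowly, and the time.sleep(5) in the except branch blocks reading entirely. The helper below, run_with_reconnect, is a hypothetical, library-agnostic sketch: start_stream is any blocking callable, retry_on is the tuple of exception types to retry, and sleep is injectable so the logic can be tested without waiting.

```python
import time


def run_with_reconnect(start_stream, retry_on, max_attempts=10,
                       base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call start_stream() and call it again whenever one of retry_on is raised.

    The delay doubles after each failure, capped at max_delay. Returns the
    number of restarts once start_stream returns normally. If the
    max_attempts-th call also fails, that error is re-raised. Exceptions
    not listed in retry_on propagate immediately.
    """
    delay = base_delay
    restarts = 0
    for attempt in range(1, max_attempts + 1):
        try:
            start_stream()
            return restarts
        except retry_on as exc:
            if attempt == max_attempts:
                raise
            print("Stream dropped (%s); reconnecting in %.0f s" % (exc, delay))
            sleep(delay)
            delay = min(delay * 2, max_delay)
            restarts += 1
```

In the script above it would be called as run_with_reconnect(lambda: twitterStream.filter(track=['example']), (ProtocolError,)), with ProtocolError imported from urllib3.exceptions as before. This assumes the same Stream object can be started again after a dropped connection; if it cannot, create a new Stream inside the lambda.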

Any help would be appreciated; suggestions are certainly welcome.

