Getting all Twitter mentions with tweepy for users with millions of followers

try:
    while 1:
        for results in tweepy.Cursor(twitter_api.search, q="@celebrity_handle").items(9999999):
            item = (results.text).encode('utf-8').strip()
            wr.writerow([item, results.created_at])  # write to a csv (tweet, date)

1 answer

#1 · Posted 2024-04-28 08:06:26

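I would use the search API. I did something similar with the code below, and it appears to work exactly as expected. I ran it for one particular movie star and skimmed through 15,568 tweets; all of them appeared to be @mentions. (I pulled from their whole timeline.)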

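In your case, for a search you want to run on a schedule (say, daily), I would store the ID of the last mention you pulled for each user and set that value as sinceId each time you rerun the search.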
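One way to keep that per-user bookkeeping between runs is a small JSON file. This is only a rough sketch: the file name `since_ids.json` and the helper names are made up for illustration, and the tweet IDs would come from whatever your search loop collected.

```python
import json
import os

STATE_FILE = "since_ids.json"  # hypothetical file name, pick your own


def load_since_ids(path=STATE_FILE):
    """Return {handle: last_seen_tweet_id}, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_since_ids(since_ids, path=STATE_FILE):
    """Persist the mapping so the next run can pick up where this one stopped."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(since_ids, f)


def update_since_id(since_ids, handle, tweet_ids):
    """Record the newest ID seen for a handle; keep the old value if nothing newer arrived."""
    if tweet_ids:
        since_ids[handle] = max(max(tweet_ids), since_ids.get(handle, 0))
    return since_ids
```

Before each run you would look up `since_ids.get(handle)` and pass it as sinceId; after the run, call `update_since_id` with the IDs you downloaded and then `save_since_ids`.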

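As an aside, AppAuthHandler is much faster than OAuthHandler (app-only auth gets a higher search rate limit), and you don't need user authentication for this kind of data pull.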

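import jsonpickle  # used further down to serialize each tweet
import tweepy

# Written against tweepy 3.x; consumer_token and consumer_secret are your app's credentials.
auth = tweepy.AppAuthHandler(consumer_token, consumer_secret)
auth.secure = True
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)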

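searchQuery is what we're searching for (for example, the "@celebrity_handle" mention from the question). In your case I would make a list of usernames and iterate through all of them on each pass of the search run.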

retweet_filter = ' -filter:retweets'  # filters out retweets; the leading space keeps it separate from the handle

In each api.search call below, I would pass the following as the query parameter:

q=searchQuery+retweet_filter
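To make the "list of usernames" idea concrete, here is a minimal sketch of building one query string per handle. The helper name `build_query` and the second handle are hypothetical; the only Twitter-specific piece is the `-filter:retweets` search operator already used above.

```python
RETWEET_FILTER = "-filter:retweets"


def build_query(handle, exclude_retweets=True):
    """Build a search string for mentions of one handle, optionally dropping retweets."""
    if not handle.startswith("@"):
        handle = "@" + handle
    parts = [handle]
    if exclude_retweets:
        parts.append(RETWEET_FILTER)
    # Operators in a search query are separated by spaces.
    return " ".join(parts)


handles = ["@celebrity_handle", "another_handle"]  # hypothetical list of users
queries = {handle: build_query(handle) for handle in handles}
```

Each value in `queries` can then be passed as `q=` in the search loop, one handle per pass.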

The following code (and the API setup above) comes from this link:

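tweetsPerQry = 100  # this is the max the API permits
fName = 'tweets.txt'  # we'll store the tweets in a text file

# If results from a specific ID onwards are required, set sinceId to that ID.
# Otherwise default to no lower limit and go as far back as the API allows.
sinceId = None

# If only results below a specific ID are required, set max_id to that ID.
# Otherwise default to no upper limit and start from the most recent tweet
# matching the search query.
max_id = -1  # was -1L, a Python 2 long literal that is a syntax error in Python 3

# However many tweets you want to limit your collection to. How much storage space do you have?
maxTweets = 10000000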

tweetCount = 0
print("Downloading max {0} tweets".format(maxTweets))
with open(fName, 'w') as f:
    while tweetCount < maxTweets:
        try:
            if (max_id <= 0):
                if (not sinceId):
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry)
                else:
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                            since_id=sinceId)
            else:
                if (not sinceId):
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                            max_id=str(max_id - 1))
                else:
                    new_tweets = api.search(q=searchQuery, count=tweetsPerQry,
                                            max_id=str(max_id - 1),
                                            since_id=sinceId)
            if not new_tweets:
                print("No more tweets found")
                break
            for tweet in new_tweets:
                f.write(jsonpickle.encode(tweet._json, unpicklable=False) +
                        '\n')
            tweetCount += len(new_tweets)
            print("Downloaded {0} tweets".format(tweetCount))
            max_id = new_tweets[-1].id
        except tweepy.TweepError as e:
            # Just exit if any error
            print("some error : " + str(e))
            break

print ("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fName))
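Since the question wanted a CSV of (tweet, date), the saved file can be converted afterwards. The sketch below assumes each line of tweets.txt is one plain JSON object with the standard `id`, `text` and `created_at` fields, which is what `jsonpickle.encode(..., unpicklable=False)` should produce for `tweet._json`. The function name `tweets_to_csv` is made up for this example.

```python
import csv
import json


def tweets_to_csv(json_path, csv_path):
    """Write (text, created_at) rows to a CSV and return the largest tweet ID seen, or None."""
    newest_id = None
    with open(json_path, "r", encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["text", "created_at"])
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweet = json.loads(line)
            writer.writerow([tweet.get("text", ""), tweet.get("created_at", "")])
            tweet_id = tweet.get("id")
            if tweet_id is not None and (newest_id is None or tweet_id > newest_id):
                newest_id = tweet_id
    return newest_id
```

The returned ID is a natural candidate for sinceId on the next run, for example `newest = tweets_to_csv('tweets.txt', 'tweets.csv')`.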
