Getting the tweet text from a Selenium element for Twitter

browser.find_elements_by_css_selector("[data-testid=\"tweet\"]")  # works
browser.find_elements_by_xpath("/html/body/div[1]/div/div/div[2]/main/div/div/div/div[1]/div/div[2]/div/div/section/div/div/div/div/div/div/article/div/div/div/div[2]/div[2]/div[1]/div/div")  # works
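
For reference, once the elements are matched, the visible text of each one is available through the `.text` property of a Selenium WebElement. The sketch below is only an illustration: `tweet_texts` and `FakeElement` are made-up names, and the stand-in elements replace a live browser so the snippet runs on its own. Note that `.text` on a whole tweet container usually includes the author name and timestamp as well as the tweet body.

```python
def tweet_texts(elements):
    """Return the stripped, non-empty visible text of each element."""
    texts = []
    for el in elements:
        text = el.text.strip()  # WebElement.text is the rendered text of the element
        if text:
            texts.append(text)
    return texts


# With a real driver (Selenium 4 style) the elements would come from:
#   from selenium.webdriver.common.by import By
#   elements = browser.find_elements(By.CSS_SELECTOR, '[data-testid="tweet"]')
#   print(tweet_texts(elements))


class FakeElement:
    """Hypothetical stand-in for a WebElement, used only for this demo."""

    def __init__(self, text):
        self.text = text


demo = tweet_texts([FakeElement(" first tweet "), FakeElement(""), FakeElement("second tweet")])
```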

2 answers

User

#1 · edited 2024-04-20 07:41:09

You can scrape Twitter with Selenium, but using the Twitter API with tweepy is easier, faster, and more efficient. You can sign up for a developer account here: https://developer.twitter.com/en/docs

Once you have signed up, get your access keys and use tweepy as follows:

import datetime as dt

import pytz
import tweepy

# Connect to Twitter and authenticate your requests.
# TWapiKey, TWapiSecretKey, TWaccessToken and TWaccessTokenSecret are your own credentials.
auth = tweepy.OAuthHandler(TWapiKey, TWapiSecretKey)
auth.set_access_token(TWaccessToken, TWaccessTokenSecret)

# wait_on_rate_limit makes tweepy wait when the rate limit is reached, so Twitter does not block you.
# Note: this is written for tweepy 3.x. tweepy 4 removed wait_on_rate_limit_notify
# and renamed api.search to api.search_tweets.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

data = []  # creation times of the collected tweets

# tweepy.Cursor pages through the results of api.search:
#   q is the search term; result_type can be 'recent', 'popular' or 'mixed';
#   max_id/since_id are snowflake IDs (Twitter's time-ordered tweet IDs) that bound the search;
#   count is the maximum number of tweets returned per request.
# .items(500) stops after 500 tweets.
for tweet in tweepy.Cursor(api.search, q=YourSearchTerm, result_type='recent', max_id=snowFlakeCurrent, since_id=snowFlakeEnd, count=100).items(500):
    # Truncate the creation time to the minute and mark it as UTC.
    createdTime = tweet.created_at.strftime('%Y-%m-%d %H:%M')
    createdTime = dt.datetime.strptime(createdTime, '%Y-%m-%d %H:%M').replace(tzinfo=pytz.UTC)
    data.append(createdTime)

This code is a sample script that pulls 500 of the most recent tweets matching YourSearchTerm and appends each tweet's creation time to a list. The tweepy documentation is here: http://docs.tweepy.org/en/latest/
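
The two createdTime lines in the script format the timestamp to the minute and parse it back, which drops the seconds and then labels the result as UTC. A minimal sketch of the same idea using only the standard library is below. `to_utc_minute` is a hypothetical helper name, not part of tweepy. Unlike the script above, it converts timezone-aware inputs to UTC first, and it assumes naive inputs are already in UTC.

```python
from datetime import datetime, timedelta, timezone


def to_utc_minute(created_at):
    """Truncate a datetime to the minute and return it as a UTC-aware datetime."""
    if created_at.tzinfo is not None:
        # Aware datetime: convert the instant to UTC before truncating.
        created_at = created_at.astimezone(timezone.utc)
    # Naive datetime: assumed to be UTC already, so only the label is added.
    return created_at.replace(second=0, microsecond=0, tzinfo=timezone.utc)


naive_example = to_utc_minute(datetime(2024, 4, 20, 7, 41, 9))
aware_example = to_utc_minute(
    datetime(2024, 4, 20, 9, 41, 9, tzinfo=timezone(timedelta(hours=2)))
)
```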

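Every tweet pulled with tweepy.Cursor() has many attributes that you can select and append to a list, or process in some other way. Although it is possible to scrape Twitter with Selenium, it is really not recommended because it is very slow, whereas tweepy returns results in just a few seconds.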
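
As a rough illustration of picking attributes off each tweet, the sketch below copies a few fields into a plain dict. `tweet_to_row` is a hypothetical helper, and the `SimpleNamespace` object only imitates the shape of a tweepy Status (`id`, `text`, `created_at`, `user.screen_name`) so the snippet runs without API keys. It prefers `full_text` when present, which is an assumption about requests made with tweet_mode='extended'.

```python
from datetime import datetime
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Pick a few attributes off a tweet-like object and return them as a dict."""
    # full_text is only present in extended mode; fall back to text otherwise.
    text = getattr(tweet, "full_text", None) or tweet.text
    return {
        "id": tweet.id,
        "username": tweet.user.screen_name,
        "created_at": tweet.created_at,
        "text": text,
    }


# Made-up stand-in for one tweet returned by tweepy.Cursor(...).items()
sample = SimpleNamespace(
    id=1,
    text="hello world",
    created_at=datetime(2024, 4, 20, 7, 41),
    user=SimpleNamespace(screen_name="alice"),
)
row = tweet_to_row(sample)
```

Inside the real loop this would be something like `data.append(tweet_to_row(tweet))`.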

User

#2 · edited 2024-04-20 07:41:09

Applying for API access does not always succeed. I used Twint, which provides a quick way to scrape. In this example, the output is written to CSV.

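import twint


def search_twitter(terms, start_date, filename, lang):
    """Search Twitter for `terms` and write the matching tweets to `filename` as CSV."""
    c = twint.Config()
    c.Search = terms

    # Columns to keep in the CSV output
    c.Custom_csv = ["id", "user_id", "username", "tweet"]
    c.Output = filename
    c.Store_csv = True
    c.Lang = lang
    c.Since = start_date  # only tweets from this date onward

    twint.run.Search(c)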
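
To use the result afterwards, the CSV can be read back with the standard csv module. This is only a sketch: `read_tweets_csv` is a hypothetical helper, and it assumes the file is comma-delimited with a header row naming the four columns listed in c.Custom_csv. The temporary file with made-up data stands in for real Twint output.

```python
import csv
import os
import tempfile


def read_tweets_csv(path):
    """Return the rows of a CSV file as a list of dicts keyed by header name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Stand-in file with the same four columns (made-up data, not real Twint output).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", newline="", encoding="utf-8", delete=False
) as tmp:
    writer = csv.writer(tmp)
    writer.writerow(["id", "user_id", "username", "tweet"])
    writer.writerow(["1", "42", "alice", "hello, world"])
    sample_path = tmp.name

rows = read_tweets_csv(sample_path)
os.remove(sample_path)  # clean up the demo file
```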
