How do I print tweets from Twitter?

import urllib
import urllib.request
from bs4 import BeautifulSoup

theurl = "https://twitter.com/search?q=ghana%20and%20jollof&src=typed_query"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")

i = 1
for tweets in soup.findAll('div', {"class": "css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0"}):
    print(i)
    print(tweets.find('span').text)
    i = i + 1
    print(tweets)

1 answer

User

#1 · Posted on 2024-04-26 04:22:58

You should use the requests library. Your request is also missing a user-agent header, which Twitter appears to require.

Here is a working example:

import requests
from bs4 import BeautifulSoup

# without this you get strange responses
headers = {
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
}

# the correct way to pass the arguments
params = (
    ('q', 'ghana and jollof'),
    ('src', 'typed_query'),
)

r = requests.get('https://twitter.com/search', headers=headers, params=params)
soup = BeautifulSoup(r.content, 'html.parser')
allTweetsContainers = soup.findAll("div", {"class": "tweet"})

print(len(allTweetsContainers))
# all that remains is to parse the tweets one by one
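
If you would rather stay with urllib, as in the question, the same two fixes (properly encoded query parameters and a user-agent header) can be sketched with the standard library alone. This is only an illustration: `build_search_request` is a hypothetical helper name, and the request is constructed but never sent, so nothing here shows what Twitter actually returns.

```python
import urllib.parse
import urllib.request

# Same user-agent string as in the requests example above.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"
)


def build_search_request(query, base_url="https://twitter.com/search"):
    """Hypothetical helper: build (but do not send) a search request."""
    params = [("q", query), ("src", "typed_query")]
    # quote_via=quote encodes spaces as %20, matching the URL in the question.
    query_string = urllib.parse.urlencode(params, quote_via=urllib.parse.quote)
    return urllib.request.Request(
        f"{base_url}?{query_string}",
        headers={"User-Agent": USER_AGENT},
    )


req = build_search_request("ghana and jollof")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would then send it with the header attached, which the original `urlopen(theurl)` call did not do.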

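The catch is that each request loads only 20 tweets this way. To get the rest, you need to open the Network tab in your browser's developer tools and see how the page loads further tweets dynamically.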
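
To make the shape of that work concrete, here is a minimal sketch of a paging loop. It assumes, purely for illustration, that each response carries a cursor pointing at the next batch; the real mechanism has to be read off the Network tab. `collect_all`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names, and `fake_fetch` stands in for the real HTTP call.

```python
from typing import Callable, List, Optional, Tuple

# A page is (items, next_cursor); next_cursor is None on the last page.
Page = Tuple[List[str], Optional[str]]


def collect_all(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 50,
) -> List[str]:
    """Follow cursors until there are no more pages (or max_pages is hit)."""
    items: List[str] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        batch, cursor = fetch_page(cursor)
        items.extend(batch)
        if cursor is None or not batch:
            break
    return items


# Stand-in data instead of real network responses (assumption for the demo).
FAKE_PAGES = {
    None: (["tweet 1", "tweet 2"], "c1"),
    "c1": (["tweet 3"], None),
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return FAKE_PAGES[cursor]


all_tweets = collect_all(fake_fetch)
print(len(all_tweets))
```

The `max_pages` cap is a safety limit so that a cursor which never ends cannot loop forever.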

Doing this by hand is very tedious, though, so I strongly recommend using a dedicated library that handles it for you, such as https://github.com/twintproject/twint
