<p>Well, scrolling down seems to trigger an API call, which you can replicate with the <code>requests</code> module to load each page.</p>
<p>Here is an example for the <em>Latest news</em> section:</p>
<pre><code>import requests
from bs4 import BeautifulSoup

## Fetch one page of news from the AJAX endpoint that the infinite scroll calls
def getNews(page):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0',
        'Accept': 'text/html, */*; q=0.01',
        'Accept-Language': 'fr,fr-FR;q=0.8,en-US;q=0.5,en;q=0.3',
        'X-Requested-With': 'XMLHttpRequest',
        'Connection': 'keep-alive',
        'Pragma': 'no-cache',
        'Cache-Control': 'no-cache',
    }
    params = (
        ('page', page),
        ('slug', ''),
        ('taxo', ''),
    )
    response = requests.get('https://thenextweb.com/wp-content/themes/cyberdelia/ajax/partials/grid-pager.php', headers=headers, params=params)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return response.content

## Loop through the pages
for page in range(2):
    print("Page", page)
    # Pass the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
    soup = BeautifulSoup(getNews(page), 'html.parser')
    ## Some simple data processing
    for news in soup.find_all('li'):
        news_div = news.find('div', {'class': 'story-text'})
        # Skip list items that don't contain a story
        if news_div is None:
            continue
        print("News headline:", news_div.find('a').text.strip())
        print("News link:", news_div.find('a').get('href'))
        print("News extract:", news_div.find('p', {'class': 'story-chunk'}).text.strip())
        print("#" * 10)
    print()
</code></pre>
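<p>To check the parsing logic without hitting the site, the extraction loop can be pulled out into a function that takes raw HTML. This is a minimal sketch: <code>parse_stories</code> is a hypothetical name, and it assumes the markup the answer relies on (list items containing <code>div.story-text</code> with a link and a <code>p.story-chunk</code>). Unlike the loop above, it tolerates a missing link or extract instead of raising <code>AttributeError</code>.</p>
<pre><code>from bs4 import BeautifulSoup

def parse_stories(html):
    """Return a list of dicts (headline, link, extract) for one page of HTML.

    Assumes the markup used in the answer: each story is a list item
    containing div.story-text, with a link and a p.story-chunk inside.
    """
    soup = BeautifulSoup(html, 'html.parser')
    stories = []
    for news in soup.find_all('li'):
        news_div = news.find('div', {'class': 'story-text'})
        if news_div is None:  # list item without a story
            continue
        link = news_div.find('a')
        if link is None:  # no headline link, nothing useful to report
            continue
        chunk = news_div.find('p', {'class': 'story-chunk'})
        stories.append({
            'headline': link.text.strip(),
            'link': link.get('href'),
            # Use an empty extract rather than crashing if the paragraph is missing
            'extract': chunk.text.strip() if chunk is not None else '',
        })
    return stories
</code></pre>
<p>The page loop then reduces to <code>for story in parse_stories(getNews(page)): print(story)</code>, and the parser can be tested against saved HTML.</p>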
<p><strong>Output</strong>:</p>
<pre><code>Page 0
##########
News headline: Can AI convincingly answer existential questions?
News link: https://thenextweb.com/neural/2020/07/06/study-tests-whether-ai-can-convincingly-answer-existential-questions/
News extract: A new study has explored whether AI can provide more attractive answers to existential questions than history's most influential ...
##########
News headline: Here are the Xbox Series X games we think Microsoft will show off on July 23
News link: https://thenextweb.com/gaming/2020/07/06/xbox-series-x-games-microsoft-show-off-july-23/
News extract: Microsoft will be showing off its first-party Xbox Series X games at the end of the month. We can guess what we might be ...
##########
News headline: Uber buys Postmates for $2.65 billion — and traders are into it
News link: https://thenextweb.com/hardfork/2020/07/06/uber-stock-postmates-buyout-acquisition-billion/
News extract: Uber's $2.65 billion Postmates all-stock acquisition comes less than a month after talks to buy rival GrubHub fell through. ...
</code></pre>
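<p>If you want to inspect the endpoint by hand (for example in a browser), you can rebuild the URL that each iteration of the loop requests. <code>page_url</code> is a hypothetical helper, not part of the site or of <code>requests</code>; it uses the standard library's <code>urllib.parse.urlencode</code>, which for these simple string and integer values produces the same query string that <code>requests</code> builds from the <code>params</code> tuple. The endpoint is the one from the answer and may have changed since; the answer leaves <code>slug</code> and <code>taxo</code> empty, and what non-empty values do is not covered here.</p>
<pre><code>from urllib.parse import urlencode

# Endpoint copied from the answer; it may no longer be served by the site.
ENDPOINT = 'https://thenextweb.com/wp-content/themes/cyberdelia/ajax/partials/grid-pager.php'

def page_url(page, slug='', taxo=''):
    """Build the GET URL for one page, mirroring the params tuple in getNews()."""
    params = (('page', page), ('slug', slug), ('taxo', taxo))
    return ENDPOINT + '?' + urlencode(params)

# Print the URLs requested by the two iterations of the loop
for page in range(2):
    print(page_url(page))
</code></pre>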