<p>I answered a similar question <a href="https://stackoverflow.com/questions/56106040/unable-to-scrape-google-news-heading-via-their-class/66805472#66805472">here</a>.</p>
<p>Code (<em>I added two extra lines here to extract the article summary</em>):</p>
<pre><code>from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent so the request looks like it comes from a regular browser.
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

response = requests.get(
    'https://www.google.com/search?hl=en-US&q=best+coockie&tbm=nws&sxsrf=ALeKk009n7GZbzUhUpsMTt89rigSAluBsQ%3A1616683043826&ei=I6BcYP_OMeGlrgTAwLpA&oq=best+coockie&gs_l=psy-ab.3...325216.326993.0.327292.12.12.0.0.0.0.163.1250.2j9.11.0....0...1c.1.64.psy-ab..1.0.0....0.305S8ngx0uo',
    headers=headers)
html = response.text
soup = BeautifulSoup(html, 'lxml')

# Each news result sits in a div with class "dbsr".
# These class names reflect Google's markup when this was written and may change.
for headings in soup.find_all('div', class_='dbsr'):
    title = headings.find('div', class_='JheGif nDgy9d').text
    summary = headings.find('div', class_='Y3v8qd').text
    link = headings.a['href']
    print(title)
    print(summary)
    print(link)
    print()
</code></pre>
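<p>The search URL above carries several parameters copied from a browser session (for example <code>sxsrf</code>, <code>ei</code>, and <code>gs_l</code>). As a minimal sketch, assuming that <code>q</code>, <code>hl</code>, and <code>tbm=nws</code> are enough for a news search, the URL could be built from a dictionary instead. <code>build_news_url</code> is a hypothetical helper, not part of the original answer.</p>

```python
from urllib.parse import urlencode


def build_news_url(query, language="en-US"):
    """Build a Google News search URL (hypothetical helper for illustration).

    Assumption: only q, hl, and tbm are needed; the session-specific
    parameters in the original URL are left out.
    """
    params = {
        "q": query,       # search terms; spaces are encoded as "+"
        "hl": language,   # interface language
        "tbm": "nws",     # restrict the search to news results
    }
    return "https://www.google.com/search?" + urlencode(params)


url = build_news_url("best coockie")
print(url)
```

<p>The resulting string can be passed to <code>requests.get</code> together with the same <code>headers</code> dictionary.</p>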
<hr/>
<p>Alternatively, you can use the <a href="https://serpapi.com/news-results" rel="nofollow noreferrer">Google News Result API</a> from SerpApi.</p>
<p>Part of the JSON:</p>
<pre class="lang-json prettyprint-override"><code>"news_results": [
  {
    "position": 1,
    "link": "https://abc7chicago.com/eisenhower-expressway-crash-wrong-way-chicago-traffic/10456033/",
    "title": "Eisenhower Expressway crashes: 5 killed in separate I-290 wrong-way crashes in Chicago, Forest Park",
    "source": "WLS-TV",
    "date": "16 hours ago",
    "thumbnail": "https://serpapi.com/searches/606340870574f50571da7bfd/images/2f5ade266f837059c67526895fb3916f7518aefbb5215951bb79d83871345dedc741519fefe9c85a8abb834360552c65898af6461c5709de.jpeg"
  }
]
</code></pre>
<p>Code to integrate:</p>
<pre><code>import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "chicago",
    "tbm": "nws",  # restrict the search to news results
    "api_key": os.getenv("API_KEY"),  # read the API key from the environment
}

search = GoogleSearch(params)
results = search.get_dict()

# Print the snippet (article summary) of every news result.
for news_result in results["news_results"]:
    print(f"Article summary: {news_result['snippet']}\n")
</code></pre>
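<p>The JSON excerpt above shows no <code>snippet</code> field, so indexing with <code>news_result['snippet']</code> may raise a <code>KeyError</code> for some results. The sketch below reads the fields with <code>dict.get</code> instead. <code>summarize</code> and the placeholder strings are hypothetical, and the sample dictionary is a trimmed copy of the excerpt, not a live API response.</p>

```python
def summarize(results):
    """Return one line per news result, tolerating missing keys.

    Hypothetical helper: falls back to placeholder text when a field is absent.
    """
    lines = []
    for item in results.get("news_results", []):
        title = item.get("title", "(no title)")
        snippet = item.get("snippet", "(no snippet)")
        lines.append(f"{title}: {snippet}")
    return lines


# Trimmed copy of the JSON excerpt; note that it has no "snippet" key.
sample = {
    "news_results": [
        {
            "position": 1,
            "title": "Eisenhower Expressway crashes: 5 killed in separate "
                     "I-290 wrong-way crashes in Chicago, Forest Park",
            "source": "WLS-TV",
            "date": "16 hours ago",
        }
    ]
}

for line in summarize(sample):
    print(line)
```

<p>In real use, <code>summarize(results)</code> would take the dictionary returned by <code>search.get_dict()</code>.</p>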
<p>Output:</p>
<pre class="lang-none prettyprint-override"><code>Article summary: A Chicago-based marijuana cultivator and dispenser that has rapidly grown
into one of the nation's biggest pot firms is under federal ...
Article summary: With 2021 being a pivotal season for the Chicago Cubs and the direction of
the franchise, here are three bold predictions you may see play out ...
Article summary: The Chicago Blackhawks have lacked puck management. With a team with high
offensive upside in the Carolina Hurricanes, this cannot ...
Article summary: Chicago, IL - Lírica, Chicagos New Latin-American Inspired Restaurant and
Bar,
Article summary: A father of three is lucky to be alive after what he describes as a failed
carjacking that left him running for his life, and his car riddled with
bullets, ...
Article summary: Robservations on the media beat: VSiN, the Las Vegas-based sports
information network founded by a group of Chicago entrepreneurs in ...
Article summary: In the day's first reported shooting a man was shot about 2 a.m. in the
2700 block of South Karlov Avenue.
Article summary: Cameo, the Chicago-based startup that lets users buy video shout-outs from
celebrities, has banked $100 million in Series C funding which ...
Article summary: CHICAGO (CBS) — Although much of the contemporary discussion of COVID-19
center around rolling out the vaccines, there are still people ...
</code></pre>
<p><em>Disclaimer: I work for SerpApi.</em></p>