Can I accept or ignore the Google privacy notice when web scraping with BeautifulSoup?

Posted 2024-05-26 14:21:33


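When I run the following code from the console, I can't see the HTML of the Google News page. What I get instead is the HTML of the Google privacy notice (the page that starts with "Before you continue").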

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get("https://www.google.com/news", headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.prettify())

Is there a way to prevent the privacy notice from appearing?

Here is a snippet of what I get instead:

  <title>
   Before you continue
  </title>
  <meta content="initial-scale=1, maximum-scale=5, width=device-width" name="viewport"/>
  <link href="//www.google.com/favicon.ico" rel="shortcut icon"/>
 </head>
 <body>
  <div class="signin">
   <a class="button" href="https://accounts.google.com/ServiceLogin?hl=en-US&amp;continue=https://news.google.com/topics/CAAqBwgKMKHQ9Qowlc7cAg&amp;gae=cb-">
    Sign in
   </a>
  </div>
  <div class="box">
   <img alt="Google" height="28" src="//www.gstatic.com/images/branding/googlelogo/1x/googlelogo_color_68x28dp.png" srcset="//www.gstatic.com/images/branding/googlelogo/2x/googlelogo_color_68x28dp.png 2x" width="68"/>
   <div class="productLogoContainer">
    <img alt="" aria-hidden="true" class="image" height="100%" src="https://www.gstatic.com/ac/cb/scene_cookie_wall_search_v2.svg" width="100%"/>
   </div>

1 Answer
User
Reply #1 · Posted 2024-05-26 14:21:33

You can set the CONSENT cookie so that you don't get the "Before you continue" page:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}
cookies = {"CONSENT": "YES+cb.20210720-07-p0.en+FX+410"}
r = requests.get(
    "https://www.google.com/news", headers=headers, cookies=cookies
)
soup = BeautifulSoup(r.content, "html.parser")
print(soup.prettify())
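
To confirm whether the cookie actually worked, it can help to check if the response is still the consent page. Below is a minimal sketch that looks at the page title, assuming the consent page keeps the "Before you continue" title shown in the question. The helper name `is_consent_page` is hypothetical, and the title may well be localized for other languages, so treat the match string as an assumption to adjust.

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def is_consent_page(html: str) -> bool:
    """Return True if the HTML looks like the "Before you continue" page.

    Assumption: the consent page is identified by its English title,
    as seen in the question's output.
    """
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip().lower().startswith("before you continue")
```

With the code above, `is_consent_page(r.text)` would return True when the request was still redirected to the privacy notice, in which case the cookie value probably needs updating.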
