Scraping YouTube with BeautifulSoup

Posted 2024-03-29 11:24:32

I am using the following code to scrape YouTube search results:

import requests
from bs4 import BeautifulSoup

url = "https://www.youtube.com/results?search_query=python"
response = requests.get(url)
soup = BeautifulSoup(response.content,'html.parser')
for each in soup.find_all("a", class_="yt-simple-endpoint style-scope ytd-video-renderer"):
    print(each.get('href'))

But it prints nothing. What is wrong with this code?

1 answer
Anonymous user
#1 · Posted 2024-03-29 11:24:32

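BeautifulSoup is not the right tool for scraping YouTube. It only parses the HTML that the server sends back and does not execute JavaScript, while YouTube generates much of its content, including the search results, with JavaScript in the browser.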

You can easily verify this:

>>> import requests
>>> from bs4 import BeautifulSoup

>>> url = "https://www.youtube.com/results?search_query=python"
>>> response = requests.get(url)
>>> soup = BeautifulSoup(response.content,'html.parser')
>>> soup.find_all("a")
[<a href="//www.youtube.com/yt/about/en-GB/" slot="guide-links-primary" style="display: none;">About</a>, <a href="//www.youtube.com/yt/press/en-GB/" slot="guide-links-primary" style="display: none;">Press</a>, <a href="//www.youtube.com/yt/copyright/en-GB/" slot="guide-links-primary" style="display: none;">Copyright</a>, <a href="/t/contact_us" slot="guide-links-primary" style="display: none;">Contact us</a>, <a href="//www.youtube.com/yt/creators/en-GB/" slot="guide-links-primary" style="display: none;">Creators</a>, <a href="//www.youtube.com/yt/advertise/en-GB/" slot="guide-links-primary" style="display: none;">Advertise</a>, <a href="//www.youtube.com/yt/dev/en-GB/" slot="guide-links-primary" style="display: none;">Developers</a>, <a href="/t/terms" slot="guide-links-secondary" style="display: none;">Terms</a>, <a href="https://www.google.co.uk/intl/en-GB/policies/privacy/" slot="guide-links-secondary" style="display: none;">Privacy</a>, <a href="//www.youtube.com/yt/policyandsafety/en-GB/" slot="guide-links-secondary" style="display: none;">Policy and Safety</a>, <a href="/new" slot="guide-links-secondary" style="display: none;">Test new features</a>]

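(Note that the links you see in the screenshot are not in this list: the only `a` elements in the raw HTML are hidden guide links such as About, Press and Terms.)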
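To see the same effect without touching the network, here is a minimal sketch that runs BeautifulSoup on two hand-written HTML snippets. Both snippets and the `video_links` helper are hypothetical illustrations, not real YouTube markup: the first stands in for the server's initial HTML, the second for the page after JavaScript has filled in the results. BeautifulSoup can only search the string it is given, so it finds the links only in the second one.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the server's initial HTML: the results container
# is empty because YouTube's JavaScript fills it in later, in the browser.
STATIC_HTML = """
<html><body>
  <ytd-search><div id="contents"></div></ytd-search>
  <a href="/t/terms" slot="guide-links-secondary">Terms</a>
</body></html>
"""

# Hypothetical stand-in for the DOM after the browser has run the JavaScript.
RENDERED_HTML = """
<html><body>
  <ytd-video-renderer>
    <a class="yt-simple-endpoint style-scope ytd-video-renderer"
       href="/watch?v=example1">Video 1</a>
  </ytd-video-renderer>
  <ytd-video-renderer>
    <a class="yt-simple-endpoint style-scope ytd-video-renderer"
       href="/watch?v=example2">Video 2</a>
  </ytd-video-renderer>
</body></html>
"""

# Class names taken from the question's find_all call.
SELECTOR = "a.yt-simple-endpoint.style-scope.ytd-video-renderer"


def video_links(html):
    """Return the href of every <a> that carries all three classes."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get("href") for a in soup.select(SELECTOR)]


print(video_links(STATIC_HTML))    # []
print(video_links(RENDERED_HTML))  # ['/watch?v=example1', '/watch?v=example2']
```

The CSS selector matches the three classes in any order, whereas `class_="yt-simple-endpoint style-scope ytd-video-renderer"` compares the whole attribute value as one exact string. Either way, no selector can find elements that are not in the HTML.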

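For this you need a different approach; Selenium, which drives a real browser and therefore runs the page's JavaScript, may be a good choice. See this thread for details: Fetch all href link using selenium in python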
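A minimal Selenium 4 sketch of that approach, assuming Selenium and a local Chrome are installed. The CSS selector reuses the class names from the question and is an assumption about YouTube's current markup, which changes over time; `search_video_links` and `unique_watch_links` are hypothetical helper names. Selenium is imported inside the function so that the pure filtering helper can be used without it.

```python
from urllib.parse import quote_plus

# Class names copied from the question; an assumption to verify in the
# browser's developer tools, since YouTube's markup changes over time.
VIDEO_LINK_SELECTOR = "a.yt-simple-endpoint.style-scope.ytd-video-renderer"


def unique_watch_links(hrefs):
    """Keep only /watch?v= URLs, dropping None and duplicates, in order."""
    links, seen = [], set()
    for href in hrefs:
        if href and "/watch?v=" in href and href not in seen:
            seen.add(href)
            links.append(href)
    return links


def search_video_links(query, timeout=10):
    """Open the search page in Chrome, wait for JavaScript to render the
    results, and return the video links found in the rendered DOM."""
    # Imported here so unique_watch_links works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get("https://www.youtube.com/results?search_query=" + quote_plus(query))
        # Block until at least one matching link exists in the rendered page.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, VIDEO_LINK_SELECTOR))
        )
        elements = driver.find_elements(By.CSS_SELECTOR, VIDEO_LINK_SELECTOR)
        return unique_watch_links(el.get_attribute("href") for el in elements)
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling `search_video_links("python")` should open Chrome, wait up to `timeout` seconds for a result link (raising Selenium's `TimeoutException` if none appears), and return absolute URLs, because `get_attribute("href")` reports the resolved link rather than the relative `/watch?v=...` value. The `/watch?v=` filter deliberately drops other link types such as Shorts; loosen it if you need those.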
