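I'm using `requests.get()` to fetch YouTube pages and parse the HTML. When I print the output after passing in a link, one of the videos comes out glitched. I'm using BeautifulSoup to play the videos in order, but the glitched video shows up before all the other videos. Any suggestions or possible workarounds would be helpful.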
```python
response = requests.get(link + searched)
page = response.text
# print(page)
soup = BeautifulSoup(page, 'html.parser')
# print(soup)
search_results = soup.find_all('a', attrs={'class': 'yt-uix-tile-link'})
# print(search_results)
```
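The snippet builds the search URL with plain string concatenation (`link + searched`). A query that contains spaces or characters such as `&` is safer to percent-encode first. The sketch below uses the standard library's `quote_plus`; `SEARCH_BASE` and `build_search_url` are hypothetical names, and the base URL is simply the question's `link` value.

```python
from urllib.parse import quote_plus

# Hypothetical constant: the same base URL as the question's `link` variable.
SEARCH_BASE = "https://www.youtube.com/results?search_query="


def build_search_url(query: str) -> str:
    """Return the search URL with the query percent-encoded.

    quote_plus turns spaces into '+' and escapes characters such as '&',
    so the query cannot break the URL's query string.
    """
    return SEARCH_BASE + quote_plus(query)


if __name__ == "__main__":
    print(build_search_url("jid playlist"))
```

With `requests`, passing the query as `requests.get("https://www.youtube.com/results", params={"search_query": word})` performs the same encoding for you.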
Additional code for reference:
```python
import time

import pafy
import requests
import vlc
from bs4 import BeautifulSoup

link = "https://www.youtube.com/results?search_query="
youtube = "https://www.youtube.com"
word = "jid playlist"


def findlnks(searched):
    if '&list' not in searched:
        # A search query: fetch the results page and collect the video title links.
        response = requests.get(link + searched)
        page = response.text
        # print(page)
        soup = BeautifulSoup(page, 'html.parser')
        # print(soup)
        search_results = soup.find_all('a', attrs={'class': 'yt-uix-tile-link'})
        # print(search_results)
    else:
        # A playlist URL: fetch it and collect the links to the playlist's videos.
        response = requests.get(searched)
        page = response.text
        soup = BeautifulSoup(page, 'html.parser')
        # print(soup)
        search_results = soup.find_all('a', class_="spf-link playlist-video clearfix yt-uix-sessionlink spf-link")
        # print(search_results)
    return search_results
```
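If the out-of-order video is a duplicate link, or the links simply come back in a different order than the playlist, one option is to normalize the hrefs that `findlnks` returns before playing them. This is a sketch under an assumption not confirmed by the question: that each playlist entry's href looks like `/watch?v=VIDEO_ID&list=PLAYLIST_ID&index=N`. `order_playlist_hrefs` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def order_playlist_hrefs(hrefs):
    """Drop repeated videos and sort the rest by their 'index' query parameter.

    Assumes (hypothetically) hrefs like '/watch?v=ID&list=PL...&index=N'.
    Links without a numeric 'index' are kept after the indexed ones,
    in their original order.
    """
    seen = set()
    indexed, unindexed = [], []
    for position, href in enumerate(hrefs):
        query = parse_qs(urlparse(href).query)
        video_id = query.get("v", [href])[0]
        if video_id in seen:  # the same video linked more than once
            continue
        seen.add(video_id)
        index = query.get("index", [""])[0]
        if index.isdigit():
            # position breaks ties so equal indexes keep page order
            indexed.append((int(index), position, href))
        else:
            unindexed.append(href)
    indexed.sort()
    return [href for _, _, href in indexed] + unindexed
```

You would call it on the raw hrefs, for example `ordered = order_playlist_hrefs([a['href'] for a in playlist_results])`, and then build each URL as `youtube + href`.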
```python
if 'mix' in word or 'playlist' in word:
    total_results = findlnks(word)
    i = 0
    playlist_size = 0
    # while i < 10:
    #     print(total_results[i]['title'])
    #     i += 1

    # Skip search results until the first link that points to a playlist.
    while 'list' not in total_results[i]['href']:
        print(total_results[i]['href'])
        i = i + 1
    playlist_results = findlnks(youtube + total_results[i]['href'])

    # Note: after this loop playlist_size is 40, so playback below starts at the 41st link.
    while playlist_size < 40:
        # print(youtube + playlist_results[playlist_size]['href'])
        playlist_size = playlist_size + 1

    # Play the remaining playlist entries one after another
    # (stop at the end of the list instead of comparing against '\0').
    while playlist_size < len(playlist_results):
        url = youtube + playlist_results[playlist_size]['href']
        # print(url)
        video = pafy.new(url)
        best = video.getbest()
        playurl = best.url
        instance = vlc.Instance()
        player = instance.media_player_new()
        media = instance.media_new(playurl)
        media.get_mrl()
        player.set_media(media)
        player.play()
        playing = {1, 2, 3, 4}  # Opening, Buffering, Playing, Paused
        time.sleep(1)
        duration = player.get_length() / 1000
        mm, ss = divmod(duration, 60)
        # Wait until the current video stops before moving on to the next one.
        while True:
            state = player.get_state()
            if state not in playing:
                break
            time.sleep(0.5)  # poll instead of spinning at 100% CPU
        playlist_size = playlist_size + 1
```
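The `while True: ... continue` loop in the question checks the player state as fast as it can, which keeps one CPU core busy for the whole video. A minimal sketch of a polling wait that sleeps between checks is shown below. `wait_while_active` is a hypothetical helper; it takes any zero-argument callable, so with python-vlc you would pass the bound method `player.get_state`. The state numbers 1 to 4 are libVLC's Opening, Buffering, Playing and Paused, the same set the question uses.

```python
import time

# Opening, Buffering, Playing, Paused (libVLC state numbers used in the question).
ACTIVE_STATES = {1, 2, 3, 4}


def wait_while_active(get_state, active_states=ACTIVE_STATES, poll_interval=0.5):
    """Block until get_state() returns a value outside active_states.

    Sleeps poll_interval seconds between checks and returns the first
    inactive state it sees (for example Ended or Error).
    """
    while True:
        state = get_state()
        if state not in active_states:
            return state
        time.sleep(poll_interval)
```

In the playback loop this would replace the inner `while True` block with `wait_while_active(player.get_state)`.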
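YouTube loads much of its page content with AJAX (JavaScript). `requests` only downloads the initial HTML the server sends and does not run any JavaScript, so content that scripts fill in afterwards never appears in `response.text`. Consider using a headless browser to load the page from YouTube and then extracting the data from the rendered HTML. You can run Firefox in headless mode, or PhantomJS (which is no longer maintained).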
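Because AJAX content can arrive after the initial page load, it helps to wait until the data you need is actually in the rendered HTML before handing it to BeautifulSoup. This is a sketch, not Selenium's own waiting API: `rendered_html` is a hypothetical helper, `driver` is assumed to behave like a Selenium WebDriver (a `get(url)` method and a `page_source` attribute), and `marker` is a hypothetical string you expect in the rendered page, such as `'/watch?v='`.

```python
import time


def rendered_html(driver, url, marker, timeout=10.0, poll_interval=0.5):
    """Load url in a browser driver and return page_source once marker appears.

    `driver` is assumed to act like a Selenium WebDriver (get() and
    page_source). Raises TimeoutError if `marker` is not present in the
    rendered HTML within `timeout` seconds.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found in {url} after {timeout}s")
        time.sleep(poll_interval)
```

With Selenium 4 you could create the driver with `options = webdriver.FirefoxOptions()`, `options.add_argument("-headless")` and `driver = webdriver.Firefox(options=options)`, pass the returned HTML to `BeautifulSoup(html, 'html.parser')`, and call `driver.quit()` when done. Selenium's `WebDriverWait` is the more idiomatic way to wait for a specific element if you already know its selector.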
I found this tutorial, which may help: https://towardsdatascience.com/web-scraping-using-selenium-and-beautifulsoup-99195cd70a58