requests.get() does not return the text in the correct order when reading YouTube HTML

Posted 2024-04-25 09:18:08


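I am using requests.get() to fetch YouTube's HTML and parse it. When I print the output after passing in the link, one video is out of order. I am trying to use BeautifulSoup to play the videos in order, but the out-of-order video shows up before all the others. Any suggestions or potential workarounds would be helpful.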

    response = requests.get(link + searched)
    page = response.text
    #print(page)
    soup = BeautifulSoup(page, 'html.parser')
    #print(soup)
    search_results = soup.find_all('a', attrs={'class': 'yt-uix-tile-link'})
    #print(search_results)

Additional code for more context:

import pafy
import vlc
import requests
from bs4 import BeautifulSoup
import time
link="https://www.youtube.com/results?search_query="
youtube = "https://www.youtube.com"
word = "jid playlist"


def findlnks(searched):
    if '&list' not in searched:
        # Plain search query: fetch the search-results page
        response = requests.get(link + searched)
        page = response.text
        #print(page)
        soup = BeautifulSoup(page, 'html.parser')
        #print(soup)
        search_results = soup.find_all('a', attrs={'class': 'yt-uix-tile-link'})
        #print(search_results)
    else:
        # Playlist URL: fetch the playlist page itself
        response = requests.get(searched)
        page = response.text
        soup = BeautifulSoup(page, 'html.parser')
        #print(soup)
        search_results = soup.find_all('a', class_="spf-link playlist-video clearfix yt-uix-sessionlink spf-link")
        #print(search_results)
    return search_results

if 'mix' in word or 'playlist' in word:
    total_results = findlnks(word)
    i =0
    playlist_size=0
    #while i< 10:
        #print(total_results[i]['title'])
        #i+=1
    while 'list' not in (total_results[i])['href']:
        print(total_results[i]['href'])
        i = i + 1




    playlist_results=findlnks(youtube + ((total_results[i])['href']))

    while playlist_size<40:
        #print(youtube + (playlist_results[playlist_size])['href'])
        playlist_size = playlist_size +1

    # Python lists have no '\0' terminator, so stop at the end of the list
    while playlist_size < len(playlist_results):
        url = youtube + (playlist_results[playlist_size])['href']
        #print(url)
        video = pafy.new(url)
        best = video.getbest()
        playurl = best.url
        Instance = vlc.Instance()
        player = Instance.media_player_new()
        media = Instance.media_new(playurl)

        media.get_mrl()
        player.set_media(media)
        player.play()
        playing = set([1, 2, 3, 4])
        time.sleep(1)
        duration = player.get_length() / 1000
        mm, ss = divmod(duration, 60)

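        # Poll until playback leaves the states listed in `playing`
        while True:
            state = player.get_state()
            if state not in playing:
                break
            time.sleep(0.5)  # avoid a busy loop while waiting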
        playlist_size = playlist_size +1

1 Answer
User
#1 · Posted 2024-04-25 09:18:08

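YouTube loads much of its content with AJAX. The requests library only downloads the HTML the server sends initially and does not execute JavaScript, so it is a poor fit for AJAX-driven sites. Consider using a headless browser to fetch the HTML from YouTube and then extracting the data from that. You can run Firefox or PhantomJS in headless mode.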

I found this tutorial, which may help you: https://towardsdatascience.com/web-scraping-using-selenium-and-beautifulsoup-99195cd70a58

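from selenium import webdriver

# Firefox session
driver = webdriver.Firefox()
# videos_url is the YouTube page you want to load
driver.get(videos_url)
# Implicit wait in seconds for later element lookups;
# increase it if the page needs more time to load fully
driver.implicitly_wait(100)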
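
As a rough sketch of the approach described above (render the page in a headless browser, then pull the links out in document order), something like the following might work. The function names `fetch_rendered_html` and `hrefs_in_document_order` are hypothetical, and the `yt-uix-tile-link` class is taken from the question and may not match YouTube's current markup. The parsing step uses the standard library's `html.parser` instead of BeautifulSoup only to keep the sketch free of extra dependencies; `soup.find_all` on the rendered HTML should serve the same purpose.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url, css_selector, timeout=20):
    """Load url in headless Firefox and return the HTML after scripts have run."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # run Firefox without opening a window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Wait until at least one matching element exists in the rendered page.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser


class _LinkCollector(HTMLParser):
    """Collects href values of <a> tags carrying a given class, in document order."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if self.css_class in classes and href:
            self.hrefs.append(href)


def hrefs_in_document_order(html, css_class):
    """Return the matching links in the order they appear in the HTML."""
    collector = _LinkCollector(css_class)
    collector.feed(html)
    collector.close()
    return collector.hrefs
```

A call might look like `html = fetch_rendered_html(videos_url, "a.yt-uix-tile-link")` followed by `hrefs_in_document_order(html, "yt-uix-tile-link")`. Waiting for a specific element, rather than relying on a fixed delay, is one way to make sure the AJAX-loaded results are present before the page source is read.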
