How to get a video's src with BeautifulSoup in Python

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

with open('pages2crawl.txt', 'r') as inFile:
    lines = [line.rstrip() for line in inFile]

for page in lines:
    req = Request(page, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(urlopen(req).read(), 'html.parser')
    pages = soup.findAll('div', attrs={'class' : 'mejs__mediaelement'})
    for e in pages:
        video = e.find("video").get("src")
        if video.endswith("m3u8"):
            print(video)

2 Answers

User

#1 · Edited 2024-05-23 14:32:34

If you only need a simple script, a regular expression may be the easier route:

import re
import requests

url = "https://example.com/page"  # placeholder: replace with the page you want to scrape

s = requests.Session()  # start the session
data = s.get(url).text  # HTTP GET request; keep the raw HTML text

# Capture everything between src=' and .m3u8'/> (the dot is escaped so it matches literally)
vidlinks = re.findall(r"src='(.*?)\.m3u8'/>", data)
if vidlinks:
    print(vidlinks[0] + ".m3u8")  # print the first full link, extension included
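The pattern above only matches single-quoted attributes that are immediately followed by `/>`. As a rough sketch of a more tolerant variant, the snippet below accepts either quote style and an optional query string. `SAMPLE_HTML`, `M3U8_SRC`, and `find_m3u8_links` are hypothetical names introduced here for illustration, and the inline markup stands in for a downloaded page.

```python
import re

# Hypothetical sample markup standing in for the text of a downloaded page.
SAMPLE_HTML = """
<video src='https://example.com/a/stream.m3u8'/>
<video src="https://example.com/b/clip.mp4"></video>
<source src="https://example.com/c/playlist.m3u8?token=abc" type="application/x-mpegURL">
"""

# Match src="..." or src='...' where the value contains ".m3u8",
# optionally followed by a query string. \1 requires the same closing quote.
M3U8_SRC = re.compile(r"""src=(["'])([^"']+?\.m3u8(?:\?[^"']*)?)\1""")


def find_m3u8_links(html):
    """Return every src attribute value in html that points at an .m3u8 playlist."""
    return [match.group(2) for match in M3U8_SRC.finditer(html)]


print(find_m3u8_links(SAMPLE_HTML))
```

A regular expression like this is still fragile against unusual markup, so it is best suited to quick one-off scripts.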

User

#2 · Edited 2024-05-23 14:32:34

You can use the CSS selector source[type="application/x-mpegURL"] to extract the m3u8 (MPEG-URL) link, or source[type="video/mp4"] to extract the mp4 link:

import requests
from bs4 import BeautifulSoup

url = "https://www.loc.gov/item/2015669100/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

link_mpeg = soup.select_one('source[type="application/x-mpegURL"]')["src"]
link_mp4 = soup.select_one('source[type="video/mp4"]')["src"]
print(link_mpeg)
print(link_mp4)

Prints:

https://tile.loc.gov/streaming-services/iiif/service:afc:afc2010039:afc2010039_crhp0001:afc2010039_crhp0001_mv04/full/full/0/full/default.m3u8
https://tile.loc.gov/storage-services/service/afc/afc2010039/afc2010039_crhp0001/afc2010039_crhp0001_mv04.mp4
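Note that `select_one` returns `None` when nothing matches, so indexing with `["src"]` raises a `TypeError` on pages that have no such tag. Here is a minimal sketch that collects every candidate instead, covering both a `src` set directly on `<video>` and nested `<source>` children. The function name `collect_video_sources`, the sample markup, and the `example.com` URLs are assumptions made for illustration and are not part of the answers above.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup: one player with nested <source> tags, one <video> with a direct src.
SAMPLE_HTML = """
<div class="mejs__mediaelement">
  <video controls>
    <source src="/media/stream.m3u8" type="application/x-mpegURL">
    <source src="/media/clip.mp4" type="video/mp4">
  </video>
</div>
<video src="https://cdn.example.com/direct.m3u8"></video>
"""


def collect_video_sources(html, base_url=""):
    """Return (type, absolute_url) pairs for every video source found in html."""
    soup = BeautifulSoup(html, "html.parser")
    found = []
    # "video[src]" covers <video src=...>; "video source[src]" covers nested <source> tags.
    for tag in soup.select("video[src], video source[src]"):
        # tag.get("type") is None when the attribute is absent.
        found.append((tag.get("type"), urljoin(base_url, tag["src"])))
    return found


sources = collect_video_sources(SAMPLE_HTML, base_url="https://example.com/page/")
for mime_type, link in sources:
    if ".m3u8" in link:
        print(mime_type, link)
```

Because the selector only matches tags that actually carry a `src` attribute, the loop simply yields nothing on pages without video markup rather than raising.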
