How do I get a specific script tag?

Posted 2024-06-16 15:00:39


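I'm trying to build a download manager script in Python. The web page contains several script tags, and I want to isolate one specific script: the one that calls html5player.setVideoUrlHigh('https://*****');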

I don't know how to do this. I can get all of the script tags, but with this code I can't get the one that contains html5player.setVideoUrlHigh('https://*****');

Here is my Python code:

from urllib.request import urlopen
import re
from bs4 import BeautifulSoup
Url = '*****'
pg = urlopen(Url)
sp = BeautifulSoup(pg)
script_tag = sp.find_all('script')
# print(script_tag[1])
print(re.search("setVideoHLS\(\'(.*?)\'\)", script_tag).group(1))

The script tag I want to get is:

<script>
    logged_user = false;
    var static_id_cdn = 17;
    var html5player = new HTML5Player('html5video', '56420147');
    if (html5player) {
        html5player.setVideoTitle('passionate hotel room');
        html5player.setSponsors(false);
        html5player.setVideoUrlLow('https://*****');
        html5player.setVideoUrlHigh('https://******');
        html5player.setVideoHLS('https://****');
        html5player.setThumbUrl('https://**');
        html5player.setStaticDomain('***');
        html5player.setHttps();
        html5player.setCanUseHttps();
        document.getElementById('html5video').style.minHeight = '';
        html5player.initPlayer();
    }
</script>

How can I get the argument passed to the function html5player.setVideoUrlHigh('https://*****')?


1 answer
User
#1 · Posted 2024-06-16 15:00:39

You can use this code to get the text of the script tag:

import re
from bs4 import BeautifulSoup

html = """<script>    logged_user = false;
var static_id_cdn = 17;
var html5player = new HTML5Player('html5video', '56420147');
if (html5player) {
    html5player.setVideoTitle('passionate hotel room');
    html5player.setSponsors(false);
    html5player.setVideoUrlLow('https://*****');
    html5player.setVideoUrlHigh('https://******');
    html5player.setVideoHLS('https://****');
    html5player.setThumbUrl('https://**');
    html5player.setStaticDomain('***');
    html5player.setHttps();
    html5player.setCanUseHttps();
    document.getElementById('html5video').style.minHeight = '';
    html5player.initPlayer();
}</script>"""

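# Parse the string defined above (the variable is `html`, not `HTML`) with an explicit parser
soup = BeautifulSoup(html, "html.parser")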

txt = soup.script.get_text()
print(txt)

Output:

logged_user = false;
var static_id_cdn = 17;
var html5player = new HTML5Player('html5video', '56420147');
if (html5player) {
    html5player.setVideoTitle('passionate hotel room');
    html5player.setSponsors(false);
    html5player.setVideoUrlLow('https://*****');
    html5player.setVideoUrlHigh('https://******');
    html5player.setVideoHLS('https://****');
    html5player.setThumbUrl('https://**');
    html5player.setStaticDomain('***');
    html5player.setHttps();
    html5player.setCanUseHttps();
    document.getElementById('html5video').style.minHeight = '';
    html5player.initPlayer();
}
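
Printing the script text is only half of what the question asks for; the argument of setVideoUrlHigh still has to be pulled out of that text. A minimal sketch of that step using only the re module, assuming the argument is always wrapped in single quotes as in the question. The helper name extract_call_arg and the example.com URLs are hypothetical and used only for illustration.

```python
import re

# Sample script text shaped like the output above; the URLs are made-up placeholders.
script_text = """
var html5player = new HTML5Player('html5video', '56420147');
if (html5player) {
    html5player.setVideoUrlLow('https://example.com/low.mp4');
    html5player.setVideoUrlHigh('https://example.com/high.mp4');
    html5player.setVideoHLS('https://example.com/hls.m3u8');
}
"""


def extract_call_arg(text, method):
    """Return the single-quoted argument of html5player.<method>(...), or None."""
    # re.escape guards against special characters in the method name.
    pattern = r"html5player\." + re.escape(method) + r"\('(.*?)'\)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


high_url = extract_call_arg(script_text, "setVideoUrlHigh")
print(high_url)
```

Returning None when nothing matches avoids the AttributeError that calling .group(1) directly on a failed re.search would raise. In the answer's code, the same helper could be applied to txt.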

Edit:

import requests
import bs4
import re

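url = 'url'  # placeholder: replace with the address of the page
r = requests.get(url)
bs = bs4.BeautifulSoup(r.text, "html.parser")
scripts = bs.find_all('script')
src = scripts[7]  # the needed script is at index 7 on this page
# Raw string avoids invalid-escape warnings; the dot is escaped so it matches literally
print(re.search(r"html5player\.setVideoUrlHigh\('(.*?)'\)", str(src)).group(1))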
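
The hard-coded index 7 only holds for one particular page layout, and the code in the question fails for a related reason: re.search expects a string, while find_all returns a list-like ResultSet, so passing it directly raises a TypeError. A rough sketch of a position-independent alternative is to search each script's source in turn and stop at the first match. The function name find_high_url and the sample strings are hypothetical; the function takes plain strings so the example runs without a network request.

```python
import re

# Matches html5player.setVideoUrlHigh('...') and captures the quoted argument.
HIGH_URL_RE = re.compile(r"html5player\.setVideoUrlHigh\('(.*?)'\)")


def find_high_url(script_sources):
    """Scan script sources in order and return the first captured URL, or None."""
    for source in script_sources:
        match = HIGH_URL_RE.search(source)
        if match:
            return match.group(1)
    return None


# Stand-in for [str(tag) for tag in bs.find_all('script')]; the contents are made up.
sample_scripts = [
    "<script src='jquery.js'></script>",
    "<script>var x = 1;</script>",
    "<script>html5player.setVideoUrlHigh('https://example.com/high.mp4');</script>",
]
print(find_high_url(sample_scripts))
```

With the variables from the code above, the call would be find_high_url(str(tag) for tag in scripts), which keeps working if the site adds or removes script tags before the player script.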
