How do I find and decode URL-encoded strings in a site's HTML/JavaScript to scrape live odds from OddsPortal?

1 vote

1 answer

154 views

Asked 2025-04-14 15:35

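I am working on a project that scrapes live odds for individual matches from OddsPortal. The relevant page is https://www.oddsportal.com/inplay-odds/live-now/football/, and I have been following this helpful guide: https://github.com/jckkrr/Unlayering_Oddsportal.

My goal is to get the live odds data for each match, but I am having trouble reaching the URLs I need.

Using Python's requests library, I can fetch the list of all live matches from this URL: https://www.oddsportal.com/feed/livegames/liveOdds/0/0.dat?_=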

import requests

url = "https://www.oddsportal.com/feed/livegames/liveOdds/0/0.dat?_="
response = requests.get(url)
data = response.text

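The problem appears when I try to access the live odds for an individual match.

The odds are served from separate URLs with the following structure: https://fb.oddsportal.com/feed/match/1-1-{match_id_code}-1-2-{secondary_id_code}.dat

Here is a screenshot of a single live match page, together with its odds URL: https://www.oddsportal.com/feed/live-event/1-1-AsILkjnd-1-2-yjbd1.dat (once the match ends, this odds URL returns a 404 error)

In this example (from the screenshot), the first ID code, AsILkjnd, can be found in the live match list at https://www.oddsportal.com/feed/livegames/liveOdds/0/0.dat?_=

The second ID code, however, does not appear there, and it is not in the HTML of the individual match page either.

I am stuck on how to find and decode this second ID code.

It seems to be a URL-encoded string, something like %79%6a%39%64%39, and I believe it is hidden somewhere in the site's HTML or JavaScript.

So far I have not been able to find any of these encoded strings.

Can anyone help me find and decode these URL-encoded strings?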

Tags: javascript, data parsing, web scraping, HTML parsing, URL encoding, live odds, oddsportal, encoded strings

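1 Answer

Since secondary_id_code is not easy to find directly, it is probably loaded into the page dynamically by JavaScript. Sites like OddsPortal typically load their data this way, so fetching the page HTML alone may not give you everything a browser user sees. Here are some ways to approach the problem:

1. Analyze the network traffic

Open your browser's developer tools (usually F12, or right-click and choose "Inspect") and switch to the "Network" tab.
Reload the page and watch the XHR (XMLHttpRequest) or Fetch requests that fire after the initial page load. These requests usually fetch the dynamic content, and the URL of the odds request should contain your secondary_id_code.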
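
Once the traffic is captured, it can help to export it from the Network tab as a HAR file and search it offline. The following is a minimal sketch under a few assumptions: the regular expression is modeled only on the two feed URLs quoted in the question, and `FEED_URL_RE` and `extract_feed_ids` are hypothetical names, not part of any OddsPortal API. The `log` / `entries` / `request` / `url` keys follow the standard HAR layout.

```python
import re

# Pattern modeled on the feed URLs quoted in the question; the exact path
# segments are an assumption and may differ on the live site.
FEED_URL_RE = re.compile(
    r"/feed/(?:match|live-event)/1-1-([A-Za-z0-9]+)-1-2-([A-Za-z0-9%]+)\.dat"
)


def extract_feed_ids(har: dict) -> list[tuple[str, str]]:
    """Return (match_id_code, secondary_id_code) pairs found in a HAR capture."""
    pairs = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        match = FEED_URL_RE.search(url)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


# Hand-written HAR fragment that reuses the URL from the question.
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://www.oddsportal.com/inplay-odds/live-now/football/"}},
            {"request": {"url": "https://www.oddsportal.com/feed/live-event/1-1-AsILkjnd-1-2-yjbd1.dat"}},
        ]
    }
}

print(extract_feed_ids(sample_har))
```

For a real capture, you would load the exported file with `json.load` and pass the resulting dictionary to `extract_feed_ids`.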

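2. Use Selenium or a similar tool

Because secondary_id_code is probably loaded dynamically, consider using Selenium, a tool that automates a web browser. Selenium executes JavaScript the way a real browser does, which gives you access to the dynamically loaded data.
Here is a simple way to reach dynamic content with Selenium: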


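    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    
    # Path to your WebDriver (e.g., ChromeDriver)
    driver_path = '/path/to/your/chromedriver'
    
    # URL of the live matches page
    url = 'https://www.oddsportal.com/inplay-odds/live-now/football/'
    
    # Initialize the WebDriver and open the URL.
    # Selenium 4 takes the driver path through a Service object; recent
    # releases no longer accept the old executable_path argument.
    driver = webdriver.Chrome(service=Service(driver_path))
    driver.get(url)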
    
    # You may need to wait for the page to load dynamically loaded content
    # For this, Selenium provides explicit and implicit waits
    
    # Now, you can search the DOM for the `secondary_id_code` as it would be rendered in a browser
    # For example, finding an element that contains the code, or observing AJAX requests that might contain it
    # This could involve analyzing the page's JavaScript or observing network requests, as mentioned earlier
    
    # Always remember to close the WebDriver
    driver.quit()
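
After the page has rendered, the text returned by `driver.page_source` (read before calling `driver.quit()`) could be scanned for runs of percent-encoded bytes. This is only a sketch: `find_encoded_strings` is a hypothetical helper, the minimum run length of three escapes is an arbitrary choice to skip stray `%20`-style sequences, and nothing here confirms that OddsPortal actually embeds the code this way.

```python
import re
from urllib.parse import unquote

# Three or more consecutive %XX escapes, e.g. %79%6a%39%64%39.
PERCENT_RUN_RE = re.compile(r"(?:%[0-9A-Fa-f]{2}){3,}")


def find_encoded_strings(page_source: str) -> list[tuple[str, str]]:
    """Return (encoded, decoded) pairs for each percent-encoded run in the text."""
    return [
        (m.group(0), unquote(m.group(0)))
        for m in PERCENT_RUN_RE.finditer(page_source)
    ]


# Stand-in for driver.page_source, using the string from the question.
sample_source = '<script>var id = unescape("%79%6a%39%64%39");</script>'
print(find_encoded_strings(sample_source))
```

Any candidates this turns up still need to be checked against the request URLs seen in the Network tab.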

3. Decode secondary_id_code

If you find secondary_id_code but it is URL-encoded (for example %79%6a%39%64%39), you can decode it with Python's urllib.parse.unquote() function:


    from urllib.parse import unquote
    
    encoded_str = '%79%6a%39%64%39'
    decoded_str = unquote(encoded_str)
    print(decoded_str)  # prints: yj9d9
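
With the decoded value in hand, the odds URL can be assembled from the template given in the question. A minimal sketch follows, assuming that template is still what the site uses; `FEED_TEMPLATE` and `build_odds_url` are illustrative names, not anything OddsPortal provides.

```python
from urllib.parse import unquote

# URL structure as quoted in the question; treat it as an assumption.
FEED_TEMPLATE = (
    "https://fb.oddsportal.com/feed/match/"
    "1-1-{match_id_code}-1-2-{secondary_id_code}.dat"
)


def build_odds_url(match_id_code: str, encoded_secondary_id: str) -> str:
    """Decode the secondary ID and fill both codes into the feed template."""
    secondary_id_code = unquote(encoded_secondary_id)
    return FEED_TEMPLATE.format(
        match_id_code=match_id_code,
        secondary_id_code=secondary_id_code,
    )


print(build_odds_url("AsILkjnd", "%79%6a%39%64%39"))
```

As the question notes, a request to the resulting URL returns 404 once the match has ended, so that status should be handled as a normal case.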

Answered 2025-04-14 by Python大师
