Python: Accessing sessionStorage with requests

1 answer

User

#1 · Posted 2024-06-10 11:01:47

IIUC: the following code was written to extract a sessionStorage property value from a web page into a Python dict.

import json
import re

import requests
from bs4 import BeautifulSoup as bs

# Setup.
site = 'http://www.some-site.com/page'
# Raw string, so regex escapes such as \s and \( reach the regex engine intact.
exp = r'^\s*sessionStorage\.setItem\(.*JSON\.stringify\((?P<content>{.*})\)\)'

r = requests.get(site)
if r.status_code == 200:
    # Name the parser explicitly to avoid bs4's "no parser specified" warning.
    soup = bs(r.text, 'html.parser')
    # Extract all <script> tags from the full HTML.
    scripts = soup.find_all('script')
    # Keep the inline scripts that mention sessionStorage
    # (s.string is None for external or empty <script> tags).
    script = [s.string for s in scripts if s.string and 'sessionStorage' in s.string]
    if script:
        # Use regex (with a named capture group) to extract the JSON data.
        # re.match anchors at the start of the script text.
        m = re.match(exp, script[0])
        if m:
            content = m['content']
            # Convert scraped JSON data to a dict.
            data = json.loads(content)

Note: the regex pattern may need to be modified to suit your specific use case.
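As one illustration of adapting the pattern, here is a minimal sketch that also captures the storage key and tolerates other statements before the call. The sample script body, the key name `userInfo`, and the helper `extract_session_items` are all hypothetical; a real page may need a different pattern.

```python
import json
import re

# Hypothetical inline <script> body; the key name and payload are made up.
SAMPLE_SCRIPT = """
    var ready = true;
    sessionStorage.setItem('userInfo', JSON.stringify({"name": "alice", "roles": ["admin", "dev"]}));
"""

# re.MULTILINE lets ^ match at the start of any line, so the call does not
# have to be the first statement in the script.
PATTERN = re.compile(
    r'^\s*sessionStorage\.setItem\(\s*[\'"](?P<key>[^\'"]+)[\'"]\s*,'
    r'\s*JSON\.stringify\((?P<content>\{.*\})\)\s*\)',
    re.MULTILINE,
)


def extract_session_items(script_text):
    """Return {storage key: parsed JSON} for each matching setItem call."""
    items = {}
    for m in PATTERN.finditer(script_text):
        items[m['key']] = json.loads(m['content'])
    return items


items = extract_session_items(SAMPLE_SCRIPT)
print(items)
```

Because `.` does not match newlines here, this sketch assumes the JSON literal sits on a single line; a payload spread over several lines would need `re.DOTALL` or a different approach.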

TL;DR (background):

I came across this question while looking for a more elegant solution than the code above.

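In my case, I am writing unit tests for a site and need to get a sessionStorage property from a specific web page to test that it contains the expected elements. Because the data is in JSON format, this code extracts the JSON data and converts it to a Python dict for inspection.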
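A check of that kind might look roughly like the sketch below. It assumes the JSON text has already been scraped; `SCRAPED_CONTENT`, `EXPECTED_KEYS`, and the test class name are hypothetical placeholders, not values from the actual site.

```python
import json
import unittest

# Hypothetical JSON text, standing in for the string captured from the page.
SCRAPED_CONTENT = '{"userId": 42, "theme": "dark", "features": ["beta"]}'

# Hypothetical keys the page is expected to store in sessionStorage.
EXPECTED_KEYS = {"userId", "theme", "features"}


class SessionStorageTest(unittest.TestCase):
    def setUp(self):
        # Convert the scraped JSON text to a dict before each test.
        self.data = json.loads(SCRAPED_CONTENT)

    def test_expected_keys_present(self):
        missing = EXPECTED_KEYS - set(self.data)
        self.assertEqual(missing, set(), f"missing keys: {missing}")

    def test_features_is_list(self):
        self.assertIsInstance(self.data["features"], list)
```

Saved in a test module, this can be run with `python -m unittest`.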
