Python requests library: extracting separate JSON and HTML responses from a POST request

1 answer

Community user

#1 · Posted 2024-04-26 22:04:43

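This is probably because the JavaScript that the page sends to the browser makes a request to an API to fetch the JSON data about the movies.

You can try sending the request directly to their API (see Edit 2), parse the HTML with a library such as Beautiful Soup, or use a dedicated scraping library for Python. I have had good experience with scrapy. For crawling many pages it is considerably faster than plain requests, since it fetches pages concurrently.

Edit:

If the page uses dynamically loaded content (which I think is the case here), requests alone will not see it, and you have to use selenium with a headless browser (PhantomJS in the original answer) instead. Here is an example: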

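from bs4 import BeautifulSoup
from selenium import webdriver

url = "your url"

# webdriver.PhantomJS() was removed in Selenium 4, so headless Chrome is used here instead
options = webdriver.ChromeOptions()
options.add_argument("--headless")
browser = webdriver.Chrome(options=options)
try:
    browser.get(url)            # the browser runs the page's JavaScript
    html = browser.page_source  # HTML after the dynamic content has been rendered
finally:
    browser.quit()              # always close the browser process

soup = BeautifulSoup(html, 'lxml')

# Then parse the html code here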
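
As a rough illustration of the parsing step, here is a minimal sketch. The markup, the `movie` class, and the field names are made up for the example; on a real page you would inspect the HTML to find the right tags and classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for browser.page_source
html = """
<ul id="movies">
  <li class="movie"><a href="/m/1">Example A</a></li>
  <li class="movie"><a href="/m/2">Example B</a></li>
</ul>
"""

# 'html.parser' ships with Python; 'lxml' works the same way if it is installed
soup = BeautifulSoup(html, "html.parser")

movies = []
for item in soup.find_all("li", class_="movie"):
    link = item.find("a")  # first <a> inside each list item
    movies.append({"title": link.get_text(strip=True), "href": link["href"]})

print(movies)
```

The same loop works on the `soup` object built from the selenium page source above.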

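Alternatively, you can load the dynamic content with scrapy.

If you want to get into scraping seriously, I recommend the latter. It takes more time to learn, but it is the better solution.

Edit 2:

To make a request directly to their API, you only need to replicate the request you see the page making. In Google Chrome, open the developer tools, click the request in the Network tab, and go to "Headers" to see its details:

After that, you just replicate the request with the requests library: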

import requests
import json

url = 'http://paste.the.url/?here='

response = requests.get(url)

content = response.content

# in my case content was a byte string
# (it looks like b'data' instead of 'data' when you print it)
# if this is your case, convert it to a string, like so

content_string = content.decode()

content_json = json.loads(content_string)

# do whatever you like with the data
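
To make the decoding step concrete without touching the network, here is a small sketch. The payload and the helper name `parse_json_body` are hypothetical; the bytes literal stands in for `response.content`.

```python
import json

# Hypothetical raw body, standing in for response.content
content = (
    b'{"page": 1, "movies": ['
    b'{"title": "Example A", "type": 3}, '
    b'{"title": "Example B", "type": 3}]}'
)


def parse_json_body(raw):
    """Decode a bytes or str body into Python objects."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")  # assumes the API sends UTF-8
    return json.loads(raw)


data = parse_json_body(content)
titles = [movie["title"] for movie in data["movies"]]
print(titles)
```

With requests, `response.json()` performs the same decoding in a single call.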

You can modify the url as needed. For example, if it looks like http://api.movies.com/?page=1&movietype=3, you can change movietype=3 to movietype=2 to get a different type of movie, and so on.
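
Rather than editing the query string by hand, the parameter swap can be sketched with the standard library. The helper `with_query_param` is a hypothetical name for this example, and the URL is the illustrative one from the text, not a real endpoint.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_query_param(url, key, value):
    """Return url with the query parameter `key` set to `value`."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query[key] = [str(value)]  # replace any existing values for this key
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


base_url = "http://api.movies.com/?page=1&movietype=3"
new_url = with_query_param(base_url, "movietype", 2)
print(new_url)
```

If you are already using requests, passing a dictionary through the `params` argument of `requests.get` builds the query string for you in a similar way.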
