How do I use XPath to get data from a JavaScript website?

import lxml.html
import lxml.etree
import requests

link = 'http://www.inquirer.net/'
res = requests.get(link)
r = res.content
html_content = lxml.html.fromstring(r)
root = html_content.xpath('//*[@id="tgs3_info"]/h2')
print(root)

2 Answers

User

Answer #1 · edited 2024-06-16 14:22:57

Building on Xurasky's solution, which avoids the 403 error:

import lxml.html
import lxml.etree
from urllib.request import Request, urlopen

req = Request('http://www.inquirer.net/', headers={'User-Agent': 'Mozilla/5.0'})
r = urlopen(req).read()
html_content = lxml.html.fromstring(r)
root = html_content.xpath('//*[@id="tgs3_info"]/h2')
for a in root:
    print(a.text_content())
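
To try the loop above without touching the network, here is a minimal sketch that runs the same XPath against an inline HTML snippet. The markup and the helper name `headline_texts` are made up for illustration; only the id `tgs3_info` and the XPath expression come from the question.

```python
import lxml.html

# Hypothetical markup for illustration; only the id "tgs3_info" is from the question.
SAMPLE_HTML = b"""
<html><body>
  <div id="tgs3_info">
    <h2>First <em>headline</em></h2>
    <h2>Second headline</h2>
  </div>
</body></html>
"""


def headline_texts(html_bytes, xpath='//*[@id="tgs3_info"]/h2'):
    """Return the stripped text of every element matched by the XPath."""
    tree = lxml.html.fromstring(html_bytes)
    # xpath() returns a list of elements; text_content() flattens nested tags
    # such as <em> into one string.
    return [el.text_content().strip() for el in tree.xpath(xpath)]


if __name__ == "__main__":
    for text in headline_texts(SAMPLE_HTML):
        print(text)
```

An empty list here means the XPath matched nothing in the HTML that was parsed, which is worth checking before assuming the request itself failed.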


User

Answer #2 · edited 2024-06-16 14:22:57

I believe you are getting urllib.error.HTTPError: HTTP Error 403: Forbidden.

You can use:

import lxml.html
import lxml.etree
from urllib.request import Request, urlopen

req = Request('http://www.inquirer.net/', headers={'User-Agent': 'Mozilla/5.0'})
res = urlopen(req).read()
html_content = lxml.html.fromstring(res)
root = html_content.xpath('//*[@id="tgs3_info"]/h2')
print(root)
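
If the server may still refuse the request, one option is to catch the error instead of letting the script crash. The following is only a sketch: `build_request`, `fetch`, and the `opener` parameter are hypothetical names introduced here, and sending a browser-like User-Agent is assumed to be what this site checks, which may not hold for every site.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, opener=urlopen):
    """Return the response body as bytes, or None on an HTTP error status.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without network access.
    """
    try:
        with opener(build_request(url)) as resp:
            return resp.read()
    except HTTPError as err:
        # A 403 Forbidden response ends up here.
        print(f"HTTP {err.code} for {url}: {err.reason}")
        return None


if __name__ == "__main__":
    body = fetch("http://www.inquirer.net/")
    print("no body received" if body is None else f"{len(body)} bytes received")
```

The returned bytes can be passed to lxml.html.fromstring exactly as in the code above.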
