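I am trying to scrape the following page: https://www.amazon.de/s?k=lego+7134&__mk_nl_NL=amazon&ref=nb_sb_noss_1
`requests.get` retrieves the full page source, but when I try to parse it with Beautiful Soup, I get an empty list `[]`.
I have tried setting the encoding, using Chromium, requests-html, different parsers, replacing the beginning of the code, and so on. Unfortunately, nothing seems to work.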
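An empty result from Beautiful Soup often just means the element being searched for is not in the HTML that was fetched. One quick check is to look at the `<title>` of the returned page and compare it with what a browser shows. Below is a minimal sketch of that check. It uses only the standard library's `html.parser.HTMLParser` in place of Beautiful Soup, and the `page_title` helper is hypothetical, not part of any library:

```python
from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._seen_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        # Only capture the first <title>, ignoring e.g. later SVG titles
        if tag == "title" and not self._seen_title:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._seen_title = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html_text):
    """Return the stripped text of the page's first <title>, or "" if none."""
    parser = _TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip()


if __name__ == "__main__":
    print(page_title("<html><head><title> Example page </title></head></html>"))
```

Inside `make_soup_am`, calling `print(page_title(pageHTML))` right after the request would show which page actually came back.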
```python
import random

import requests
from bs4 import BeautifulSoup as soup

url = "https://www.amazon.de/s?k=lego+7134&__mk_nl_NL=amazon&ref=nb_sb_noss_1"

userAgentList = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
    'Mozilla/5.0 (Windows NT 5.1; rv:36.0) Gecko/20100101 Firefox/36.0',
]

proxyList = [
    'xxx.x.x.xxx:8080',
    'xx.xx.xx.xx:3128',
]


def make_soup_am(url):
    print(url)
    s = requests.Session()
    # requests expects proxies as a dict mapping scheme -> proxy URL, not a list
    proxy = random.choice(proxyList)
    s.proxies = {'http': 'http://' + proxy, 'https': 'http://' + proxy}
    # Send a randomly chosen User-Agent with the request
    headers = {'User-Agent': random.choice(userAgentList)}
    pageHTML = s.get(url, headers=headers).text
    # Parse with the lxml parser (requires the lxml package to be installed)
    pageSoup = soup(pageHTML, features='lxml')
    return pageSoup


pageSoup = make_soup_am(url)
```
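In `requests`, proxies are given as a dict that maps a URL scheme (`'http'`, `'https'`) to a proxy URL, not as a list of `host:port` strings. Below is a minimal sketch that turns one random entry from a list like `proxyList` into that shape. The `build_proxies` helper is hypothetical, and it assumes every entry is a plain HTTP proxy written as `host:port`:

```python
import random


def build_proxies(proxy_list):
    """Return a requests-style proxies dict built from one random 'host:port' entry.

    Hypothetical helper: assumes each entry is a plain HTTP proxy, so the same
    http:// URL is used for both plain-HTTP and HTTPS (tunnelled) requests.
    """
    if not proxy_list:
        raise ValueError("proxy_list is empty")
    proxy = random.choice(proxy_list)
    proxy_url = "http://" + proxy
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    # Placeholder address from the TEST-NET range, not a real proxy
    print(build_proxies(["203.0.113.5:8080"]))
    # prints {'http': 'http://203.0.113.5:8080', 'https': 'http://203.0.113.5:8080'}
```

The resulting dict can either be assigned to `s.proxies` or passed per request, e.g. `s.get(url, headers=headers, proxies=build_proxies(proxyList))`.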
Does anyone have an idea?
Thanks in advance,
Tom