requests.get returns the page source, but bs4 gives an empty list

Posted 2024-04-25 04:23:48


I'm trying to parse the following page: https://www.amazon.de/s?k=lego+7134&__mk_nl_NL=amazon&ref=nb_sb_noss_1

requests.get retrieves the full page source, but when I try to parse it with Beautiful Soup, I get an empty list [].
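One way to narrow down why the search comes back empty is to look at the raw HTML string before parsing it at all. The sketch below is a hypothetical helper, not part of the original code. The marker strings ("captcha", "robot check") are assumptions about what an anti-bot interstitial page might contain, not a documented Amazon behaviour, and expected_fragment stands for any literal text (for example a class name) you expect the target elements to carry in the source.

```python
# Hypothetical helper: decide *why* a BeautifulSoup search came back empty,
# using only the raw HTML string returned by requests (no parsing needed).

# Assumed markers of an anti-bot / CAPTCHA page; illustrative guesses only.
BOT_CHECK_MARKERS = ("captcha", "robot check")


def diagnose_empty_result(page_html: str, expected_fragment: str) -> str:
    """Return a short label explaining an empty find_all() result.

    expected_fragment is any literal text the wanted elements should
    contain in the raw source, e.g. a class name (hypothetical here).
    """
    lowered = page_html.lower()
    if any(marker in lowered for marker in BOT_CHECK_MARKERS):
        # The server most likely answered with a CAPTCHA page instead of results
        return "bot-check"
    if expected_fragment not in page_html:
        # Not in the source: possibly rendered by JavaScript or a different page variant
        return "not-in-source"
    # The text is present, so the selector or the find call is the likely culprit
    return "selector-issue"


if __name__ == "__main__":
    sample = "<html><title>Robot Check</title></html>"
    print(diagnose_empty_result(sample, "result-item"))  # bot-check
```

If the helper reports "bot-check", switching parsers will probably not help, because the search results were never sent; "not-in-source" points towards JavaScript-rendered content or a different page being served.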

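I've tried changing the encoding, using Chromium, requests-html, different parsers, replacing the beginning of the code, and so on. Unfortunately, nothing seems to work.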

import random

import requests
from bs4 import BeautifulSoup as soup

url = "https://www.amazon.de/s?k=lego+7134&__mk_nl_NL=amazon&ref=nb_sb_noss_1"

userAgentList = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36',
    'Mozilla/5.0 (Windows NT 5.1; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
    'Mozilla/5.0 (Windows NT 5.1; rv:36.0) Gecko/20100101 Firefox/36.0',
]

# Placeholder "host:port" proxy addresses (masked in the original post)
proxyList = [
    'xxx.x.x.xxx:8080',
    'xx.xx.xx.xx:3128',
]

def make_soup_am(url):
    print(url)
    s = requests.Session()
    # requests expects a dict mapping URL scheme -> proxy URL, not a list
    proxy = 'http://' + random.choice(proxyList)
    s.proxies = {'http': proxy, 'https': proxy}
    # Pick a random User-Agent for each call
    headers = {'User-Agent': random.choice(userAgentList)}
    pageHTML = s.get(url, headers=headers).text
    # features='lxml' requires the lxml package to be installed
    pageSoup = soup(pageHTML, features='lxml')
    return pageSoup

pageSoup = make_soup_am(url)

Does anyone have an idea?

Thanks in advance,

Tom

