Python & beautifulsoup4/Selenium can't get data from kicksusa.com?

Posted 2024-04-26 05:10:32


I'm trying to scrape data from kicksusa.com and I've run into some problems.

When I try a basic BS4 approach like this (the imports are copied and pasted from the main program, which uses all of them):

import requests
import csv
import io
import os
import re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from datetime import datetime
from bs4 import BeautifulSoup

data1 = requests.get('https://www.kicksusa.com/')
soup1 = BeautifulSoup(data1.text, 'html.parser')

button = soup1.find('span', attrs={'class': 'shop-btn'}).text.strip()
print(button)

The result is "None", which suggests to me that the data is hidden behind JS. So I tried Selenium instead, like this:

^{pr2}$

I get "Unable to locate element".

Does anyone know how to scrape this site with BS4 or Selenium? Thanks in advance!


3 Answers

Please try the code below. It should return the text of your button. Hope this helps.

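from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
# Chrome switches need the leading "--"
options.add_argument('--headless')
options.add_argument('--start-maximized')
options.add_argument('--disable-browser-side-navigation')
options.add_argument('--window-size=1920,1080')
# Recent Selenium 4 releases take options= instead of chrome_options=
driver = webdriver.Chrome(options=options)
driver.get('https://www.kicksusa.com/')
# find_element_by_css_selector was removed in Selenium 4; use find_element(By.CSS_SELECTOR, ...)
button = driver.find_element(By.CSS_SELECTOR, 'span.shop-btn')
print(driver.execute_script('return arguments[0].innerHTML', button))
driver.quit()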

For the links that appear twice, you can use the CSS selector below to restrict the match to the first link of each pair:

#products-grid .item [href]:first-child

^{pr2}$
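To illustrate what the `:first-child` part of that selector does, here is a minimal sketch with BeautifulSoup. The HTML snippet is made up for the example (the real markup on kicksusa.com may well differ); it only assumes that each `.item` contains two links to the same product.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each product item links to the same page twice
# (once around the image, once around the name).
SAMPLE_HTML = """
<ul id="products-grid">
  <li class="item"><a href="/p/1">image</a><a href="/p/1">Shoe 1</a></li>
  <li class="item"><a href="/p/2">image</a><a href="/p/2">Shoe 2</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Without :first-child every link is matched, so each product appears twice.
all_hrefs = [a["href"] for a in soup.select("#products-grid .item [href]")]

# With :first-child only the first link inside each item is kept.
hrefs = [a["href"] for a in soup.select("#products-grid .item [href]:first-child")]

print(all_hrefs)
print(hrefs)
```

The same selector string can be passed to Selenium's CSS selector lookup once the page has actually loaded.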

The problem is that you are being detected as a bot and get a response like this:

<html style="height:100%">
    <head>
        <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
        <meta name="format-detection" content="telephone=no">
        <meta name="viewport" content="initial-scale=1.0">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <script type="text/javascript" src="/_Incapsula_Resource?SWJIYLWA=719d34d31c8e3a6e6fffd425f7e032f3"></script>
    </head>
    <body style="margin:0px;height:100%">
    <iframe src="/_Incapsula_Resource?CWUDNSAI=20&xinfo=5-36224256-0%200NNN%20RT%281552245394179%20277%29%20q%280%20-1%20-1%200%29%20r%280%20-1%29%20B15%2811%2c110765%2c0%29%20U2&incident_id=314001710050302156-195663432827669173&edet=15&cinfo=0b000000"
            frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula
        incident ID: 314001710050302156-195663432827669173
    </iframe>
    </body>
</html>
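If it helps, a small check like the one below can tell you whether you got the block page rather than the real one. It is only a heuristic sketch based on the text of the response above ("Incapsula incident ID"); the function name is made up, and the wording of the block page could change.

```python
import re

# Matches the "Incapsula incident ID" text from the block page,
# allowing for the line break and indentation between the words.
_BLOCK_PATTERN = re.compile(r"Incapsula\s+incident\s+ID", re.IGNORECASE)


def looks_like_incapsula_block(html):
    """Return True if the HTML looks like the Incapsula block page (heuristic)."""
    return _BLOCK_PATTERN.search(html) is not None
```

For example, `looks_like_incapsula_block(data1.text)` on the response from the question would show whether `soup1.find(...)` failed because of the block page.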

Requests and BeautifulSoup

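If you want to use Requests and BeautifulSoup, open your browser's developer tools, copy the visid_incap_ cookie and the other Incapsula cookie sent with it from the request headers, and use them in your own request: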

^{pr2}$
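As a rough sketch of that idea: paste the whole `Cookie:` header value from the dev tools, split it into a dict, and hand it to `requests.get`. The helper names here are made up, the cookie names and values you paste will be specific to your own browser session, and sending a browser-like `User-Agent` is my assumption rather than something confirmed to be required.

```python
def parse_cookie_header(raw_header):
    """Turn a raw 'Cookie:' header value ("a=1; b=2") into a dict."""
    cookies = {}
    for part in raw_header.split(";"):
        # Split on the first "=" only, since cookie values may contain "=".
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def fetch_with_cookies(url, raw_cookie_header, user_agent):
    """Fetch url with the copied cookies (hypothetical helper)."""
    # Imported here so parse_cookie_header works without requests installed.
    import requests

    response = requests.get(
        url,
        cookies=parse_cookie_header(raw_cookie_header),
        headers={"User-Agent": user_agent},  # assumption: mimic the browser
        timeout=10,
    )
    response.raise_for_status()
    return response.text
```

The returned HTML can then go into `BeautifulSoup(html, 'html.parser')` as in the question. The cookies expire, so expect to copy fresh ones from time to time.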

When you run Selenium, you sometimes get the same response.

Reloading the page worked for me. Try the following code:

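from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.kicksusa.com/')

# The block page contains <META NAME="ROBOTS" ...>; if it is present, reload once
if driver.find_elements(By.CSS_SELECTOR, '[name=ROBOTS]'):
    driver.get('https://www.kicksusa.com/')

# find_elements_by_css_selector was removed in Selenium 4; use find_elements(By.CSS_SELECTOR, ...)
shop_buttons = driver.find_elements(By.CSS_SELECTOR, 'span.shop-btn')
for button in shop_buttons:
    print(button.text)

driver.quit()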
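A single reload may not always be enough, so one option is to wrap the "load, check, reload" step in a small bounded retry loop. This is just a sketch: `fetch_with_reload`, `load_page` and `is_blocked` are names I made up, and the attempt count and delay are arbitrary.

```python
import time


def fetch_with_reload(load_page, is_blocked, max_attempts=3, delay_seconds=2.0):
    """Call load_page() until is_blocked(result) is False or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        page = load_page()
        if not is_blocked(page):
            return page
        # Wait a little before reloading, except after the final attempt.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(f"still blocked after {max_attempts} attempts")


# Hypothetical usage with the Selenium driver from the code above:
#
# def load():
#     driver.get('https://www.kicksusa.com/')
#     return driver.page_source
#
# html = fetch_with_reload(load, lambda src: 'NAME="ROBOTS"' in src)
```

Keeping the loader and the block check as plain callables means the same loop works for either the Selenium or the Requests approach.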
