Unable to load the web page https://www.riachuelo.com.br/feminino/colecao-feminino using Selenium and Python

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from fake_useragent import UserAgent

URL = "https://www.riachuelo.com.br/feminino/colecao-feminino"
options = Options()
ua = UserAgent()
userAgent = ua.random
options.add_argument(f'user-agent={userAgent}')
driver = webdriver.Chrome(chrome_options=options,executable_path=r"C:\Program Files (x86)\chromedriver.exe")
driver.get(URL)

1 answer

User

#1 · Posted on 2024-06-10 05:53:43

I ran your use case with Selenium to load the web page at https://www.riachuelo.com.br/feminino/colecao-feminino as follows:

from selenium import webdriver

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.riachuelo.com.br/feminino/colecao-feminino')
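
The snippet above uses the Selenium 3 call style. In Selenium 4.10 and later, webdriver.Chrome() no longer accepts executable_path, so the same setup would look roughly like the sketch below. The helper name build_driver and the constants are only illustrative, and the driver path is the one from the snippet above; adjust it to your machine.

```python
# Sketch for Selenium 4.x: the driver location is passed through a Service object.
CHROMEDRIVER_PATH = r"C:\WebDrivers\chromedriver.exe"  # path from the answer; adjust as needed
URL = "https://www.riachuelo.com.br/feminino/colecao-feminino"


def build_driver(driver_path=CHROMEDRIVER_PATH):
    """Create a Chrome driver with the same options as the snippet above."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    return webdriver.Chrome(service=Service(executable_path=driver_path), options=options)


# Usage (requires Chrome, a matching chromedriver, and Selenium 4):
# driver = build_driver()
# driver.get(URL)
# driver.quit()
```

This only changes how the driver is constructed; it does not change how the site responds to automated browsers.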

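Consistent with your observation, I hit the same roadblock: the web page never loaded.

Analysis

When you inspect the DOM tree of the web page, you will find that some <iframe> and <script> tags reference the keyword dist. For example: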

src="https://dtbot.directtalk.com.br/1.0/staticbot/dist/js/../index.html#!/?token=c243ce95-db6c-4ab6-9f2b-bf60d69c2d3d&widget=true&top=40&text=Alguma%20d%C3%BAvida%3F&textcolor=ffffff&bgcolor=4E1D3A&from=bottomRigth"
<script id="dtbot-script" src="https://dtbot.directtalk.com.br/1.0/staticbot/dist/js/dtbot.js?token=c243ce95-db6c-4ab6-9f2b-bf60d69c2d3d&widget=true&top=40&text=Alguma%20d%C3%BAvida%3F&textcolor=ffffff&bgcolor=4E1D3A&from=bottomRigth"></script>

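This clearly indicates that the website is protected by the bot management service provider Distil Networks, and that the navigation by ChromeDriver is detected and subsequently blocked.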
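
The DOM check described above can be sketched with the standard library alone. The helper names DistSrcFinder and find_dist_sources are hypothetical, and the sample markup is a shortened version of the two tags quoted above (query strings dropped) plus one unrelated script. With Selenium you would pass driver.page_source instead. Note that this is a plain substring match, so a hit only flags a tag for closer inspection.

```python
from html.parser import HTMLParser


class DistSrcFinder(HTMLParser):
    """Collect (tag, src) pairs for <iframe>/<script> tags whose src contains a keyword."""

    def __init__(self, keyword="dist"):
        super().__init__()
        self.keyword = keyword
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("iframe", "script"):
            return
        # Attribute values can be None (e.g. a bare attribute), so fall back to "".
        src = dict(attrs).get("src") or ""
        if self.keyword in src:
            self.matches.append((tag, src))


def find_dist_sources(html, keyword="dist"):
    """Return every iframe/script src in the given HTML that mentions the keyword."""
    finder = DistSrcFinder(keyword)
    finder.feed(html)
    return finder.matches


# Shortened sample markup; not the full page.
sample = (
    '<iframe src="https://dtbot.directtalk.com.br/1.0/staticbot/dist/js/../index.html"></iframe>'
    '<script id="dtbot-script" src="https://dtbot.directtalk.com.br/1.0/staticbot/dist/js/dtbot.js"></script>'
    '<script src="https://example.com/app.js"></script>'
)

for tag, src in find_dist_sources(sample):
    print(tag, src)
```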

Distil

According to the article There Really Is Something About Distil.it...:

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Furthermore:

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".

