UnexpectedTagNameException: Message: Select only works on <select> elements, not on <div> error while selecting a country using Selenium and Python

Posted 2024-04-23 15:08:28

I want to scrape an interactive scatter plot. Before doing so I want to change the country, but I am unable to select the country I want.

Website link: https://vizhub.healthdata.org/tobacco/

The error is:

---------------------------------------------------------------------------
UnexpectedTagNameException                Traceback (most recent call last)
<ipython-input-46-fa99f3a62600> in <module>
      1 cc=driver.find_element_by_xpath('//*[@id="location_id"]')
----> 2 select_English=Select(cc)
      3 #select_English.select_by_visible_text('English (GB)')

/Applications/anaconda3/lib/python3.7/site-packages/selenium/webdriver/support/select.py in __init__(self, webelement)
     37             raise UnexpectedTagNameException(
     38                 "Select only works on <select> elements, not on <%s>" %
---> 39                 webelement.tag_name)
     40         self._el = webelement
     41         multi = self._el.get_attribute("multiple")

UnexpectedTagNameException: Message: Select only works on <select> elements, not on <div>

I copied the XPath from the website, and it suggested that the element is a select element.

Here is my code:

import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium.common.exceptions import NoSuchElementException
chrome_optionsme = Options()
chrome_optionsme.add_argument("--incognito")
chrome_optionsme.add_argument("--window-size=1920x1080")
driver = webdriver.Chrome(options=chrome_optionsme, 
                          executable_path="path/chromedriver")
url="https://vizhub.healthdata.org/tobacco/"
driver.get(url)
driver.find_element_by_xpath('//*[@id="data"]').click()
cc=driver.find_element_by_xpath('//*[@id="location_id"]')
select_English=Select(cc)

Secondly, how can I extract the data behind the interactive chart using Selenium?


2 Answers

The select element is only a placeholder. Clicks and other events are handled by other elements.

Perhaps you can bypass the HTML entirely. The following returns a fairly large JSON:

import json

import requests

CHINA = 6  # location_id for China
url = 'https://vizhub.healthdata.org/tobacco/php/getCountryData.php'

# POST the location_id as form data, then decode the JSON body.
r = requests.post(url, data={'location_id': CHINA})
d = json.loads(r.content)
print(d)
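
Because the response is described as fairly large, it can help to look at its shape before printing everything. The following is a minimal sketch: describe is a hypothetical helper, and the sample dictionary is made-up stand-in data, not the real structure returned by the endpoint.

```python
def describe(obj, max_depth=2, depth=0):
    """Return lines summarising the nested structure of decoded JSON."""
    indent = "  " * depth
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            if depth + 1 < max_depth:
                lines.extend(describe(value, max_depth, depth + 1))
    elif isinstance(obj, list):
        lines.append(f"{indent}[{len(obj)} items]")
        # Only the first item is inspected, assuming the list is homogeneous.
        if obj and depth + 1 < max_depth:
            lines.extend(describe(obj[0], max_depth, depth + 1))
    return lines


# Made-up stand-in for the decoded response `d`; the real keys are unknown.
sample = {"location_id": 6, "rows": [{"year": 1990, "value": 0.5}]}
print("\n".join(describe(sample)))
```

With the real response, describe(d) would take the place of describe(sample).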

This error message...

UnexpectedTagNameException: Message: Select only works on <select> elements, not on <div>

...means that your program invoked Select on a <div> element, whereas Select works only with <select> tags.
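
One way to avoid this exception is to check the tag name before wrapping an element in Select. The following is a minimal sketch: is_native_select and FakeElement are hypothetical names introduced for illustration, and tag_name is the WebElement attribute that appears in the traceback.

```python
def is_native_select(element):
    """True only if the element is a real <select>, which Select requires."""
    return element.tag_name.lower() == "select"


class FakeElement:
    """Stand-in for a WebElement so the check can be shown without a browser."""

    def __init__(self, tag_name):
        self.tag_name = tag_name


print(is_native_select(FakeElement("div")))     # a custom dropdown widget
print(is_native_select(FakeElement("select")))  # a native dropdown
```

With a live driver, the same check would be applied to the element returned by the find call, and Select would be used only when the check passes.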


Solution

The country dropdown on the website https://vizhub.healthdata.org/tobacco/ can be located with either of the following:

  • Using css_selector

    a.select2-choice>span
    
  • Using xpath

    //a[@class='select2-choice']/span
    

Snapshot: [image of the country dropdown]

So, to select China from the country dropdown, a possible solution would be:

driver.get("https://vizhub.healthdata.org/tobacco/")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//img[@id='data']"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@class='select2-choice']/span"))).click()
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//ul[@class='select2-results']//div[@class='select2-result-label' and text()='China']"))).click()
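
The same steps can be generalised to other countries by building the option XPath from the visible label. The following is a minimal sketch: xpath_literal, country_option_xpath, and select_country are hypothetical helpers, the locators are the ones quoted in this answer, and Selenium is imported inside select_country so that the XPath helpers can be tried without a browser.

```python
def xpath_literal(text):
    """Quote text for XPath 1.0, which has no escape character."""
    if "'" not in text:
        return f"'{text}'"
    if '"' not in text:
        return f'"{text}"'
    # Mixed quotes: join the apostrophe-free pieces with concat().
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def country_option_xpath(country):
    """XPath of the select2 result whose label is exactly `country`."""
    return (
        "//ul[@class='select2-results']"
        "//div[@class='select2-result-label' and text()="
        f"{xpath_literal(country)}]"
    )


def select_country(driver, country, timeout=20):
    """Open the dropdown and click the entry for `country`."""
    # Imported lazily; these are the same modules the question already uses.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    wait.until(EC.element_to_be_clickable(
        (By.XPATH, "//a[@class='select2-choice']/span"))).click()
    wait.until(EC.element_to_be_clickable(
        (By.XPATH, country_option_xpath(country)))).click()


print(country_option_xpath("China"))
```

Calling select_country(driver, "China") after the data tab has been clicked would then perform the last two steps of the snippet above. Labels containing an apostrophe are wrapped in double quotes, since XPath 1.0 string literals cannot escape quotes.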

tl; dr

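However, the solution did not work on my end. When I inspected the webpage's DOM tree further, I found that some <script> tags refer to JavaScript files whose URLs contain the keyword dist. For example: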

  • <script type="text/javascript" src="https://unpkg.com/react-dom@15.6.2/dist/react-dom.min.js"></script>
  • <script type="text/javascript" src="https://unpkg.com/ihme-ui@0.34.0/dist/ihme-ui.js"></script>
  • <script type="text/javascript" src="https://unpkg.com/clipboard@1.7.1/dist/clipboard.min.js"></script>
  • <link type="text/css" rel="stylesheet" href="https://unpkg.com/ihme-ui@0.34.0/dist/ihme-ui.css">
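
To see where dist sits in these URLs, each one can be split into its parts. The following is a minimal sketch, assuming unpkg's URL layout of unpkg.com/:package@:version/:file; parse_unpkg_url is a hypothetical helper written for this illustration.

```python
from urllib.parse import urlparse


def parse_unpkg_url(url):
    """Split an unpkg URL into (package, version, file path)."""
    segments = urlparse(url).path.lstrip("/").split("/")
    # Scoped packages such as @scope/name@1.0.0 span two path segments.
    head = 2 if segments[0].startswith("@") else 1
    spec = "/".join(segments[:head])
    at = spec.rfind("@")
    if at > 0:
        package, version = spec[:at], spec[at + 1:]
    else:
        package, version = spec, ""  # no version pinned in the URL
    return package, version, "/".join(segments[head:])


urls = [
    "https://unpkg.com/react-dom@15.6.2/dist/react-dom.min.js",
    "https://unpkg.com/ihme-ui@0.34.0/dist/ihme-ui.js",
    "https://unpkg.com/clipboard@1.7.1/dist/clipboard.min.js",
    "https://unpkg.com/ihme-ui@0.34.0/dist/ihme-ui.css",
]
for url in urls:
    print(parse_unpkg_url(url))
```

In each case dist is the first directory of the file path inside the package.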

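This suggests that the website may be protected by the bot management service provider Distil Networks, in which case the navigation of the browsing context launched by the Selenium-driven ChromeDriver is detected and subsequently blocked. Note, however, that dist is also the conventional name of a package's distribution directory in unpkg.com URLs, so these paths alone are not conclusive.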


Distil

According to the article:

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".

