Scraping a website using Selenium with Firefox and Python

# -*- coding: utf-8 -*-
import mechanize
from bs4 import BeautifulSoup
import re
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import NoAlertPresentException
import unittest, time, re

#-----------selenium part(ignored)----------------#
browser = webdriver.Chrome()  # Start a local browser session (Chrome here)
browser.get("http://fortune.com/fortune500/")
time.sleep(1)  # Let the page load, will be added to the API

# Locate the industry <select> element by its name attribute
industry_button = browser.find_element(By.NAME, 'filters[Industry]')
print(industry_button)

# Count and print every <option> element inside the select
count = 0
industry_value = industry_button.find_elements(By.TAG_NAME, 'option')
for number in industry_value:
    count += 1
    print(number)
print(count)

2 Answers

User

#1 · edited 2024-05-16 19:41:30

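Selenium makes it easy to work with select -> option HTML structures: there is a Select class that provides an easy-to-use interface. For example, .options lists all available options. For each option, you can use .text to get its text content, or .get_attribute('value') to get the value of its value attribute.

Also, wait explicitly for the toggle button to appear instead of calling time.sleep().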

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select


browser = webdriver.Chrome()
browser.get("http://fortune.com/fortune500/")

# toggle
toggle = WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, "filter-toggle")))
toggle.click()

# get all options
industry_button = Select(browser.find_element(By.NAME, "filters[Industry]"))
for option in industry_button.options:
    print(option.text)

This prints the text of each option in the industry dropdown, one per line.
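
To illustrate why an explicit wait is preferable to a fixed time.sleep(), here is a rough pure-Python sketch of the polling idea behind WebDriverWait(...).until(...). The wait_until helper and its parameter names are hypothetical and are not Selenium's actual implementation; the sketch only shows that a polling wait returns as soon as the condition holds and fails once the timeout is reached.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper mimicking the idea of an explicit wait:
    return as soon as the condition holds, raise after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Condition met: stop waiting immediately.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        # Not ready yet: pause briefly before polling again.
        time.sleep(poll_interval)
```

With time.sleep(1) the script always pauses for the full second and still fails if the page needs longer. A polling wait adapts to the actual load time, which is the behaviour the WebDriverWait call above relies on.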

User

#2 · edited 2024-05-16 19:41:30

.text returns the element's text content, not its value. What you want is the value attribute.

Probably something along these lines:

element.get_attribute('value')
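
As a small sketch of how this could be applied to the whole dropdown, the helper below collects both the text and the value attribute of every option. The name option_pairs is made up for illustration. It assumes it is given a Selenium Select object, or anything else exposing .options whose items have .text and .get_attribute().

```python
def option_pairs(select):
    """Return a list of (text, value attribute) tuples, one per <option>.

    `select` is assumed to behave like selenium's Select: it exposes
    `.options`, and each option has `.text` and `.get_attribute(name)`.
    """
    return [(option.text, option.get_attribute("value"))
            for option in select.options]


# Usage with a live session (assumes `browser` has already loaded the page):
#   from selenium.webdriver.common.by import By
#   from selenium.webdriver.support.ui import Select
#   select = Select(browser.find_element(By.NAME, "filters[Industry]"))
#   for text, value in option_pairs(select):
#       print(text, value)
```

Returning plain tuples keeps the scraped data independent of the browser session, so it can still be used after the driver has been closed.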
