Unable to extract values from a table with Python Selenium

from selenium import webdriver
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome('/Users/Administrador/Documents/chromedriver')
main_url = 'https://www.justiciacordoba.gob.ar/Estatico/JEL/Escrutinios/ReportesEleccion20190512/default.html'
driver.get(main_url)

# This works perfectly
driver.switch_to.frame("topFrame")
dropdown_secciones = driver.find_element_by_xpath('./html/body/table/tbody/tr/td/table/tbody/tr[2]/td/table/tbody/tr[1]/td[2]/select')
select_box_secciones = Select(dropdown_secciones)
select_box_secciones.select_by_value("1|308")
dropdown_circuitos = driver.find_element_by_xpath('//*[@id="cmbCircuitos"]')
select_box_circuitos = Select(dropdown_circuitos)
select_box_circuitos.select_by_index(1)
mostrar_click = driver.find_element_by_xpath('/html/body/table/tbody/tr/td/table/tbody/tr[3]/td/div/input[1]')
mostrar_click.click()
driver.switch_to.default_content()
driver.switch_to.frame('mainFrame')

# This doesn't work. No element is found.
for r in range(8,35): #from row 8 up to row 35
    for c in range(3,7): #starting in column 3 up to column 7
        value = driver.find_element_by_xpath('/html/body/table/tbody/tr["+str(r)+"]/td["+str(c)+"]').text
        print(value)

2 answers

User

Answer 1 · edited 2024-04-26 04:06:42

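I took a look at the table. The problem is that some rows may have empty columns (try fetching tr[9] and you'll see what I mean).

Also, as you may already know, among the rows that do contain the 3 values, some have fewer td elements than others.

So you can keep range(8, 35), but within it use the following XPath, which returns only the columns that actually contain text (use find_elements to get a list):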

for r in range(8, 35):
    # td[text()] keeps only the cells that contain a text node
    columns = driver.find_elements_by_xpath('/html/body/table/tbody/tr[{0}]/td[text()]'.format(r))

    # Empty rows return an empty list, so nothing is printed for them
    for column in columns:
        print(column.text)

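This XPath should always return 3 elements, one for each column that contains text. As mentioned, some rows will be empty, but those are easy to handle because the list of columns will have length 0.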
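One detail worth spelling out: in the question's loop the XPath is written as one single-quoted literal, so `"+str(r)+"` and `"+str(c)+"` are never evaluated by Python. The browser receives them verbatim, and XPath treats a non-empty string predicate as true, so the expression never narrows down to row r or column c. The snippet above fixes this with str.format; a minimal sketch of the same idea with f-strings follows (the helpers `cell_xpath` and `text_cells_xpath` are hypothetical names, not part of Selenium):

```python
def cell_xpath(row, col):
    # Hypothetical helper: XPath of the <td> at 1-based position (row, col)
    return f"/html/body/table/tbody/tr[{row}]/td[{col}]"


def text_cells_xpath(row):
    # Hypothetical helper: only the <td> cells of the row that contain a text node
    return f"/html/body/table/tbody/tr[{row}]/td[text()]"


# The question's string: the "+str(r)+" parts sit inside a single-quoted
# literal, so Python never evaluates them and the browser receives them as-is.
broken = '/html/body/table/tbody/tr["+str(r)+"]/td["+str(c)+"]'

for r in range(8, 10):
    for c in range(3, 5):
        print(cell_xpath(r, c))
print(text_cells_xpath(9))
```

Each generated string can be passed straight to find_element or find_elements.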

User

Answer 2 · edited 2024-04-26 04:06:42

Here is a piece of code you may find useful.

In my opinion, parsing the HTML with BeautifulSoup works better than doing it through XPath operations.

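The idea behind the code below is:

Once we switch to "mainFrame", wait for the "table" to appear
For each "tr" element, find all the "td" elements inside it that have a "class" attribute (these are the ones that hold the data)
If there are exactly 3 such "td" elements, collect that data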
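The steps above can also be sketched without BeautifulSoup. This version swaps in the standard library's html.parser so it runs with no third-party packages; it is not the answerer's code. The class `ClassedCellParser`, the helper `extract_rows` and the sample markup are hypothetical, and the sketch assumes flat (non-nested) tables whose cells have explicit closing tags:

```python
from html.parser import HTMLParser


class ClassedCellParser(HTMLParser):
    """Collects, per <tr>, the text of the <td> cells that carry a class attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []    # one list of cell texts per <tr>
        self._row = None  # cells of the <tr> currently being parsed
        self._cell = None # text chunks of the classed <td> currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None and any(name == "class" for name, _ in attrs):
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a classed <td> (e.g. whitespace between tags) is ignored
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    parser = ClassedCellParser()
    parser.feed(html)
    parser.close()
    # Keep rows with exactly three classed cells and drop the header row
    return [row for row in parser.rows if len(row) == 3 and row[0] != "Nº"]


# Hypothetical sample markup shaped like the results table
SAMPLE_HTML = """<table>
<tr><td class="h">Nº</td><td class="h">Partido</td><td class="h">Votos</td></tr>
<tr><td class="c">P22</td><td class="c">PARTIDO HUMANISTA</td><td class="c">117</td></tr>
<tr><td></td><td></td></tr>
<tr><td class="c">A500</td><td class="c">CORDOBA CAMBIA</td><td class="c">2.999</td></tr>
</table>"""

print(extract_rows(SAMPLE_HTML))
```

In a Selenium session you would feed it driver.page_source after switching to mainFrame. BeautifulSoup is more forgiving of omitted closing tags and nested tables, which is one reason to prefer it on real-world pages.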

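Advantages of this approach:

You don't need to know exactly how long it takes for the table to appear
You don't need to know the exact number of rows, or the start and end indexes

In other words, the approach is entirely relative rather than tied to absolute positions.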
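The first advantage relies on polling, but the `while` loop in the code below has no upper bound, so it spins forever if the table never loads. A minimal sketch of a polling helper with a timeout follows; `wait_until` is a hypothetical name. Inside a real Selenium session the built-in equivalent, `WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "table")))`, is usually the better choice.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.1):
    """Hypothetical helper: poll condition() until it is truthy or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Usage in the script would look like:
# wait_until(lambda: 'table' in driver.page_source, timeout=15)
```

Returning the condition's value lets the same helper hand back whatever was found, which mirrors how WebDriverWait.until returns the located element.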

from selenium.webdriver.support.ui import Select
from selenium import webdriver
from bs4 import BeautifulSoup
import time

url = "https://www.justiciacordoba.gob.ar/Estatico/JEL/Escrutinios/ReportesEleccion20190512/default.html"

driver = webdriver.Chrome("C:\\path\\to\\chromedriver.exe")
driver.get(url)

driver.switch_to.frame("topFrame")

select_box_secciones = Select(driver.find_element_by_id('cmbSecciones'))
select_box_circuitos = Select(driver.find_element_by_id('cmbCircuitos'))
mostrar = driver.find_element_by_id('cmdMostrar')

select_box_secciones.select_by_value("1|308")
select_box_circuitos.select_by_index(1)
mostrar.click()

driver.switch_to.default_content()
driver.switch_to.frame('mainFrame')

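# Poll until the frame's HTML mentions a table (no timeout: loops forever if it never loads)
while 'table' not in driver.page_source:
    time.sleep(0.1)

soup = BeautifulSoup(driver.page_source, "html.parser")
for tr in soup.find('table').find_all('tr'):
    # Only the <td> cells that carry a class attribute hold data
    row = tr.find_all('td', class_=True)

    # Keep 3-cell rows and skip the header row whose first cell is 'Nº'
    if len(row) == 3 and row[0].text != 'Nº':
        data = [td.text for td in row]
        print(data)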

driver.quit()

The output of the script above is:

['P22', 'PARTIDO HUMANISTA', '117']
['A500', 'CORDOBA CAMBIA', '2.999']
['P217', 'ENCUENTRO VECINAL CÓRDOBA', '786']
['20', 'UNIÓN DEL CENTRO DEMOCRÁTICO (U.CE.DE.)', '21']
['3', 'UNIÓN CÍVICA RADICAL', '1.053']
['A830', 'FRENTE DE IZQUIERDA Y DE LOS TRABAJADORES', '611']
['P238', 'MOVIMIENTO AVANZADA SOCIALISTA', '35']
['A601', 'HACEMOS POR CORDOBA', '4.059']
['P57', 'MOVIMIENTO DE ACCIÓN VECINAL', '31']
['P191', 'VECINALISMO INDEPENDIENTE', '152']
['P200', 'PARTIDO UNION CIUDADANA', '135']
['A300', 'MST - NUEVA IZQUIERDA', '329']
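The vote counts in this output use a period as the thousands separator ('2.999' is 2999 votes), so int('2.999') raises ValueError and float('2.999') silently returns 2.999. A minimal sketch for turning them into integers, assuming the cells never carry decimal fractions (`parse_votes` is a hypothetical helper):

```python
def parse_votes(text):
    # Hypothetical helper: '2.999' -> 2999 (period used as thousands separator)
    return int(text.strip().replace(".", ""))


# A few rows copied from the output above
rows = [
    ['P22', 'PARTIDO HUMANISTA', '117'],
    ['A500', 'CORDOBA CAMBIA', '2.999'],
    ['P217', 'ENCUENTRO VECINAL CÓRDOBA', '786'],
]

totals = {name: parse_votes(votes) for _, name, votes in rows}
print(totals)
print(sum(totals.values()))
```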
