Getting the text of all links

Posted 2024-04-24 22:03:58


I'm trying to get the text of an `a` that sits inside an `li` inside a `div`, roughly div -> ul -> li -> a, but I'm not getting there. With `.find` I can print the first item, but when I change `.find` to `findAll` the console returns an error:

"ResultSet object has no attribute `'%s'`. You're probably treating a list of items like a single item. Did you call `find_all()` when you meant to call `find()`?" 
`% key` AttributeError: `ResultSet` object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call `find_all()` when you meant to call `find()`?

My code so far:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

url = "http://amoraosromances.blogspot.com/"
page = urlopen(url)
soup = BeautifulSoup(page, 'lxml')

for div in soup.findAll('div', class_='widget Label', id='Label2'):
    a = div.findAll('a')
    print(a.text)  # AttributeError is raised here
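If I'm reading the error right, `findAll` returns a ResultSet (a list of tags), so `.text` would have to be read from each element rather than from the list itself. A minimal sketch of what I think the loop should look like:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "http://amoraosromances.blogspot.com/"
soup = BeautifulSoup(urlopen(url), 'lxml')

for div in soup.findAll('div', class_='widget Label', id='Label2'):
    # iterate over the ResultSet instead of calling .text on it
    for a in div.findAll('a'):
        print(a.text)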

1 Answer
Community user
#1 · Posted 2024-04-24 22:03:58

If you are not tied to BeautifulSoup, you can do the same thing with Selenium WebDriver.

I tried this code to get what you want:

from selenium import webdriver
import time

chrome_path = 'path_to_chromedriver_exe'  # placeholder: point this at your chromedriver binary

driver = webdriver.Chrome(chrome_path)
driver.maximize_window()

driver.get('http://amoraosromances.blogspot.com/')
time.sleep(5)  # give the label widget time to render

# the <ul> inside the Label2 widget holds one <li> per tag link
parent_element = driver.find_element_by_css_selector('div#Label2.widget.Label > div > ul')
child_elements = parent_element.find_elements_by_tag_name('li')

for i in child_elements:
    print(i.text)

driver.quit()

This gives output like:

ABANDONADO NO ALTAR (8)
ACIDENTE (97)
ADOLESCENTE (23)
ADORÁVEL PRISIONEIRA FANFIC (1)
ADULTÉRIO (26)
AEROMOÇA (3)
AGENCIA DE CASAMENTO (3)
AMANTE (74)
....

If you want to set up Selenium for Chrome, you can use this link to get started with Selenium.
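Note that newer Selenium releases (4.x) removed the find_element_by_* helpers; the same lookup is written with By locators, and recent versions can resolve the chromedriver binary on their own. A rough equivalent sketch, assuming a Selenium 4 install:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()  # Selenium 4.6+ can locate chromedriver via Selenium Manager
driver.maximize_window()
driver.get('http://amoraosromances.blogspot.com/')
time.sleep(5)  # crude wait for the label widget to render

# same CSS selector as above, expressed with By locators
parent_element = driver.find_element(By.CSS_SELECTOR, 'div#Label2.widget.Label > div > ul')
child_elements = parent_element.find_elements(By.TAG_NAME, 'li')

for i in child_elements:
    print(i.text)

driver.quit()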
