How can BeautifulSoup get the <li> items from a div class and its <ul>, when the ul has no class name or ID?

Posted 2024-04-28 21:56:25


The input data looks like the following. It contains multiple ul tags, which I want to scrape with Python BeautifulSoup.

<div class="column one-second"><p></p> <ul> <li>Commercial automobile</li> <li>Excess liability</li> <li>General liability</li> <li>Inland marine (cargo)</li> </ul> <p></p></div> <div class="column one-second"><p></p> <ul> <li>Professional Liability</li> <li>Property</li> <li>Workers’ compensation</li> </ul> <p></p></div>

To get the list items from the `ul` tags using the Beautiful Soup library, I tried the following, but none of it worked:

    amusements_soup.find_all('li', attrs={'id': 'menu-item-16'})


    amusements_soup.find_all('div',{'class':'column one-second'})


    ul = amusements_soup.find("h2", text="Services & Solutions").find_next_sibling("ul")

Expected output:

> Commercial automobile
> 
> Excess liability
> 
> General liability
>
> Inland marine 
>
> Professional Liability
> 
> Workers’ compensation

2 answers

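The same result can be had with a class selector and a type selector joined by a descendant combinator, inside a list comprehension: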

    results = [item.text for item in amusements_soup.select('.one-second li')]
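
To make the one-liner above easier to try, here is a minimal self-contained sketch. The markup is a shortened copy of the HTML from the question, the variable names `html`, `soup` and `strict` are only illustrative, and it assumes a reasonably recent `bs4` release whose `select()` accepts standard CSS selectors.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (two items per list).
html = (
    '<div class="column one-second"><p></p> <ul> '
    '<li>Commercial automobile</li> <li>Excess liability</li> </ul> <p></p></div> '
    '<div class="column one-second"><p></p> <ul> '
    '<li>Professional Liability</li> <li>Property</li> </ul> <p></p></div>'
)
soup = BeautifulSoup(html, "html.parser")

# ".one-second li": every <li> that is a descendant of an element
# carrying the class "one-second". The <ul> needs no class or id.
results = [item.get_text(strip=True) for item in soup.select(".one-second li")]

# A stricter variant: require both classes on the div and a direct
# div > ul > li nesting, so <li> elements nested deeper are not picked up.
strict = [item.get_text(strip=True)
          for item in soup.select("div.column.one-second > ul > li")]

print(results)
print(strict)
```

With this markup both selectors match the same four items; the stricter form only matters if the divs also contain nested lists you want to skip.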

Assuming amusements_soup contains the HTML you posted, this should work:

    from bs4 import BeautifulSoup

    page = '<div class="column one-second"><p></p> <ul> <li>Commercial automobile</li> <li>Excess liability</li> <li>General liability</li> <li>Inland marine (cargo)</li> </ul> <p></p></div> <div class="column one-second"><p></p> <ul> <li>Professional Liability</li> <li>Property</li> <li>Workers’ compensation</li> </ul> <p></p></div>'
    amusements_soup = BeautifulSoup(page, "html.parser")
    # Find each div whose class attribute is exactly "column one-second" ...
    for item in amusements_soup.find_all('div', {'class': 'column one-second'}):
        # ... then collect every <li> nested anywhere inside that div.
        sub_items = item.find_all('li')
        for sub_item in sub_items:
            print(sub_item.text)

Output:

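    Commercial automobile
    Excess liability
    General liability
    Inland marine (cargo)
    Professional Liability
    Property
    Workers’ compensation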

If this does not work for you, check whether amusements_soup really contains what you think it does.
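
One way to run that check is sketched below. The helper `describe_soup` is hypothetical (it is not part of BeautifulSoup), and the selector it counts is simply the one used in the other answer. The idea is to confirm that the parsed document contains the divs at all before blaming the selector.

```python
from bs4 import BeautifulSoup

def describe_soup(soup, selector=".one-second li"):
    """Hypothetical helper: summarise what the parsed document contains."""
    return {
        "type": type(soup).__name__,                # should be "BeautifulSoup"
        "div_count": len(soup.find_all("div")),     # 0 suggests the wrong page was fetched
        "match_count": len(soup.select(selector)),  # 0 suggests the selector does not fit
    }

# A tiny stand-in for the real page, using the class names from the question.
page = '<div class="column one-second"><ul><li>Property</li></ul></div>'
amusements_soup = BeautifulSoup(page, "html.parser")
summary = describe_soup(amusements_soup)
print(summary)

# A page without the expected markup, for comparison.
empty = describe_soup(BeautifulSoup("<html><body></body></html>", "html.parser"))
print(empty)
```

If `div_count` is zero on the real page, the HTML that was downloaded probably differs from what the browser shows (for example, content rendered by JavaScript), and printing `amusements_soup.prettify()` is a quick way to see what was actually parsed.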
