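BeautifulSoup: getting the first child

User

#1 · edited 2024-05-14 09:59:25

With a modern version of bs4 (4.7.1 or later) you have access to the :first-child CSS pseudo-selector. Nice and descriptive.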

from bs4 import BeautifulSoup as bs

html = '''
<div class="cities"> 
       <div id="3232"> London </div>
       <div id="131"> York </div>
  </div>
  '''
soup = bs(html, 'lxml') #or 'html.parser'
first_children = [i.text for i in soup.select('.cities div:first-child')]
print(first_children)
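
If a single element is wanted rather than a list, one possible variation is to combine select_one with the child combinator (>), so that only direct children of .cities are considered. This is a minimal sketch: the variable names and the choice of 'html.parser' are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = '''
<div class="cities">
    <div id="3232"> London </div>
    <div id="131"> York </div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# select_one returns the first matching element, or None if nothing matches.
first_city = soup.select_one('.cities > div:first-child')

# get_text(strip=True) drops the padding whitespace around the city name.
first_city_name = first_city.get_text(strip=True) if first_city is not None else None
print(first_city_name)
```

Checking for None before reading the text keeps the snippet safe when the container is empty or missing.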

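User

#2 · edited 2024-05-14 09:59:25

The currently accepted answer gets all the cities, whereas the question only asks for the first one.

If you only need the first child, you can take advantage of the fact that .children returns an iterator rather than a list. An iterator produces its items on the fly, and since we only need its first element, the remaining city elements never have to be generated (which saves time).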

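# nsoup is assumed to be the already-parsed BeautifulSoup object
for div in nsoup.find_all(class_='cities'):
    # next() pulls only the first item; None is the default if there are no children
    first_child = next(div.children, None)
    if first_child is not None:
        print(first_child.string.strip())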
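
Because .children yields text nodes as well as tags, the first item can be a whitespace string when the HTML is indented or spread over several lines. A hedged variant that skips non-tag nodes while staying lazy might look like the following; first_tag_child is a hypothetical helper name, not a bs4 API, and the sample HTML is repeated only to keep the sketch self-contained.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = '''
<div class="cities">
    <div id="3232"> London </div>
    <div id="131"> York </div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')


def first_tag_child(tag):
    """Return the first child that is a Tag, or None (hypothetical helper)."""
    # The generator is consumed lazily, so iteration stops at the first Tag.
    return next((child for child in tag.children if isinstance(child, Tag)), None)


first_names = []
for div in soup.find_all(class_='cities'):
    first_child = first_tag_child(div)
    if first_child is not None:
        first_names.append(first_child.get_text(strip=True))
print(first_names)
```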

User

#3 · edited 2024-05-14 09:59:25

.children returns an iterator.

# nsoup is assumed to be the already-parsed BeautifulSoup object
for div in nsoup.find_all(class_='cities'):
    for childdiv in div.find_all('div'):
        print(childdiv.string)  # London, York

The AttributeError is raised because non-tag nodes such as '\n' are among the .children. Just use an appropriate child selector to find the specific div.
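
To see the mix of node types, it can help to list what .children actually contains. The sketch below (using 'html.parser' and illustrative variable names, both assumptions) shows text nodes sitting between the tags, then uses find_all with recursive=False to pick up only the direct child div elements.

```python
from bs4 import BeautifulSoup

html = '<div class="cities">\n<div id="3232"> London </div>\n<div id="131"> York </div>\n</div>'
soup = BeautifulSoup(html, 'html.parser')
cities = soup.find(class_='cities')

# .children mixes text nodes (the '\n' between tags) with Tag objects.
node_types = [type(child).__name__ for child in cities.children]
print(node_types)

# recursive=False restricts the search to direct children; only Tags can match.
child_ids = [child['id'] for child in cities.find_all('div', recursive=False)]
print(child_ids)
```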

(More edits) I can't reproduce your exception. Here is what I did:

In [137]: print foo.prettify()
<div class="cities">
 <div id="3232">
  London
 </div>
 <div id="131">
  York
 </div>
</div>

In [138]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string
   .....: 
 London 
 York 

In [139]: for div in foo.find_all(class_ = 'cities'):
   .....:     for childdiv in div.find_all('div'):
   .....:         print childdiv.string, childdiv['id']
   .....: 
 London  3232
 York  131
