Selecting the second child in Beautiful Soup with soup.select?

19 votes

3 answers

25457 views

Asked 2025-04-18 13:07

I have:

<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>

Now, given that I already have the h2 tag, what is the simplest way to get "Peter" here? I tried:

soup.select("#names > p:nth-child(1)")

But this raises a NotImplementedError for nth-child:

NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.

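So I don't really understand what is going on. The other option is to fetch all the 'p' children and hard-index [1], but that risks an index-out-of-range error, which would mean wrapping every attempt to get Peter in try/except, and that seems a bit silly.

Is there a way to select nth-child with the soup.select() function?

Edit: Replacing nth-child with nth-of-type seems to have solved the problem, so the correct code is: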

soup.select("#names > p:nth-of-type(1)")

I don't quite understand why it rejects nth-child, but nth-child and nth-of-type appear to return the same result.

error handling data extraction web scraping beautiful soup html parsing css selectors nth-child nth-of-type

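3 Answers

Beautiful Soup 4.7.0 (released in early 2019) supports most selectors, including :nth-child.

Starting with version 4.7.0, Beautiful Soup supports most CSS4 selectors through the SoupSieve project. If you installed Beautiful Soup with pip, SoupSieve is installed automatically, so no extra steps are needed.

So, if you upgrade your version: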

pip install -U beautifulsoup4

you can use almost every selector you are likely to need, including nth-child.
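To confirm that the upgrade took effect, a quick check along these lines may help. This is only a sketch: `supports_nth_child` is a hypothetical helper name, and it assumes that an unsupported pseudo-class raises NotImplementedError, as in the error message quoted in the question.

```python
import bs4
from bs4 import BeautifulSoup


def supports_nth_child() -> bool:
    """Return True if the installed Beautiful Soup accepts :nth-child.

    Hypothetical helper: assumes older versions raise NotImplementedError,
    as shown in the question.
    """
    soup = BeautifulSoup("<div><p>a</p><p>b</p></div>", "html.parser")
    try:
        soup.select("p:nth-child(2)")
    except NotImplementedError:
        return False
    return True


print("bs4 version:", bs4.__version__)
print("nth-child supported:", supports_nth_child())
```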

Note, however, that in your input HTML the #names h2 tag does not actually have any child elements:

<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>

There are only 3 elements here, and they are all siblings, so

#names > p:nth-child(1)

will not match anything, not even in CSS or JavaScript.
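This can be checked directly on the markup from the question. The sketch below assumes Beautiful Soup 4.7.0 or later, so that :nth-child is accepted; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")

# The <p> elements are siblings of the h2, not children,
# so the child combinator ">" finds nothing under #names.
children_of_h2 = soup.select("#names > p")
first_child = soup.select("#names > p:nth-child(1)")

print(children_of_h2)  # []
print(first_child)     # []
```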

If the #names element had <p> elements as children, your selector would work, up to a point:

from bs4 import BeautifulSoup

# Here the <p> elements really are children of #names
html = '''
<div id='names'>
    <p>John</p>
    <p>Peter</p>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.select("#names > p:nth-child(1)"))

Output:

[<p>John</p>]

Of course, John's <p> is the first child of #names. If you want Peter, use :nth-child(2).
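For completeness, here is the same snippet with :nth-child(2). It is a minimal sketch that assumes Beautiful Soup 4.7.0 or later and the same wrapping div as above.

```python
from bs4 import BeautifulSoup

html = """
<div id='names'>
    <p>John</p>
    <p>Peter</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Select the <p> that is the second element child of #names
second = soup.select("#names > p:nth-child(2)")
print(second)  # [<p>Peter</p>]
```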

If the elements are all adjacent siblings, you can use + to select the next sibling:

from bs4 import BeautifulSoup

# The original markup: h2 and both <p> elements are siblings
html = '''
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
'''
soup = BeautifulSoup(html, 'html.parser')
print(soup.select("#names + p + p"))

Output:

[<p>Peter</p>]
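On the question's worry about wrapping every lookup in try/except: `select_one` returns the first match or None, so a missing second paragraph can be handled with a plain None check instead of catching IndexError. A minimal sketch, where `second_paragraph_after_names` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup


def second_paragraph_after_names(html: str):
    """Return the text of the second <p> directly following #names, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("#names + p + p")
    return tag.get_text() if tag is not None else None


full = "<h2 id='names'>Names</h2><p>John</p><p>Peter</p>"
short = "<h2 id='names'>Names</h2><p>John</p>"

print(second_paragraph_after_names(full))   # Peter
print(second_paragraph_after_names(short))  # None
```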

Answered 2025-04-18 by Python大师


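'nth-child' is simply not implemented in beautifulsoup4 (as of this writing); there is no code for it in the Beautiful Soup source. The author deliberately raises 'NotImplementedError' to make that explicit.

Given the HTML in your question, you are not actually looking for a child of h2#names.

What you really want is the second adjacent sibling. I am no expert on CSS selectors, but I found that this works: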

soup.select("#names + p + p")
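Since the question starts from an h2 tag that is already in hand, another option is to walk its siblings with `find_next_siblings`. This is a sketch rather than a drop-in equivalent: unlike the + combinator, it does not require the <p> elements to be directly adjacent to the h2.

```python
from bs4 import BeautifulSoup

html = """
<h2 id='names'>Names</h2>
<p>John</p>
<p>Peter</p>
"""
soup = BeautifulSoup(html, "html.parser")
h2 = soup.find(id="names")

# Collect at most two following <p> siblings, then pick the second if present
paragraphs = h2.find_next_siblings("p", limit=2)
peter = paragraphs[1] if len(paragraphs) > 1 else None
print(peter)  # <p>Peter</p>
```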

Answered 2025-04-18 by Python大师


Posting your edit as an answer so that others can find it more easily:

Use nth-of-type instead of nth-child:

soup.select("#names > p:nth-of-type(1)")
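The two pseudo-classes are not interchangeable once a parent has children of mixed types: nth-of-type counts only siblings with the same tag name, while nth-child counts every element child. The sketch below assumes a hypothetical wrapping div (on the question's original markup the > combinator matches nothing) and Beautiful Soup 4.7.0 or later for :nth-child.

```python
from bs4 import BeautifulSoup

html = """
<div id='names'>
    <h2>Names</h2>
    <p>John</p>
    <p>Peter</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Second <p> among the <p> children of #names
by_type = soup.select("#names > p:nth-of-type(2)")
# The <p> that is the second element child of #names (the h2 is the first)
by_child = soup.select("#names > p:nth-child(2)")

print(by_type)   # [<p>Peter</p>]
print(by_child)  # [<p>John</p>]
```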

Answered 2025-04-18 by Python大师
