Equivalent of the contains() selector in BeautifulSoup/Python

1 vote

3 answers

2154 views

Asked 2025-04-17 08:40

With jQuery selectors, you can select a <div> that contains the text "John" by using $("div:contains('John')"), which matches the second <div> element below.

<div>Bill</div>
<div>John</div>
<div>Joe</div>

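How can I do the same thing in Python with Beautiful Soup or some other Python module?

I just watched a talk on web scraping in which the speaker mentioned that lxml supports CSS selectors. Do I have to use lxml for this, or is Beautiful Soup alone enough?

Background: I am asking because I need to process a scraped web page.

lxml, web scraping, web crawler, beautifulsoup, CSS selectors, text selection, jQuery selectors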

3 answers

Beautiful Soup now supports the :contains selector!

If you want to find a div that contains the text John, try this:

from bs4 import BeautifulSoup

html = """
<div>Bill</div>
<div>John</div>
<div>Joe</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first element that matches the CSS selector
print(soup.select_one("div:contains('John')"))
# <div>John</div>

Note: when working with CSS selectors, use .select_one() instead of .find(), and .select() instead of .find_all().
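As a hedged illustration of the note above, the sketch below uses select() to collect every match rather than only the first. It assumes Beautiful Soup 4 with Soup Sieve 2.1 or later, where the pseudo-class is also spelled :-soup-contains() and the bare :contains() spelling is deprecated. The sample HTML (including the extra "Johnny" entry) and the variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Illustrative markup, not taken from a real page
html = """
<div>Bill</div>
<div>John</div>
<div>Joe</div>
<div>Johnny</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every element matching the CSS selector,
# whereas select_one() returns only the first match (or None).
matches = soup.select("div:-soup-contains('John')")
names = [tag.get_text() for tag in matches]
print(names)

# When nothing matches, select() gives an empty list and select_one() gives None
print(soup.select("div:-soup-contains('Alice')"))
print(soup.select_one("div:-soup-contains('Alice')"))
```

The match is a case-sensitive substring test, so "Johnny" is selected along with "John".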

Answered 2025-04-17 by Python大师

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
... <div>Bill</div>
... <div>John</div>
... <div>Joe</div>
... """, "html.parser")
>>> # equality: the tag's text is exactly 'John'
>>> [tag for tag in soup.find_all('div') if tag.text == 'John']
[<div>John</div>]
>>> # containment: 'John' appears somewhere in the tag's text
>>> [tag for tag in soup.find_all('div') if 'John' in tag.text]
[<div>John</div>]

Answered 2025-04-17 by Python大师

Here is a more concise approach using the BeautifulSoup library:

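>>> # Beautiful Soup 4: string= (formerly text=) returns tags whose .string matches
>>> soup('div', string='John')
[<div>John</div>]
>>> import re
>>> soup('div', string=re.compile('Jo'))
[<div>John</div>, <div>Joe</div>]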

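Calling soup() is the same as calling soup.findAll() (spelled soup.find_all() in Beautiful Soup 4). You can pass a string, a regular expression, or an arbitrary function to select what you need.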
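The three kinds of filter mentioned above can be sketched side by side. This is a minimal example assuming Beautiful Soup 4, where the keyword is string= and a function passed to find_all() is called once per tag. The helper name is_div_containing_john is hypothetical and exists only for this illustration.

```python
import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<div>Bill</div><div>John</div><div>Joe</div>", "html.parser"
)

def is_div_containing_john(tag):
    # Hypothetical helper: find_all() calls it for each tag and keeps
    # the tags for which it returns True
    return tag.name == "div" and "John" in tag.get_text()

by_string = soup.find_all("div", string="John")             # exact string
by_regex = soup.find_all("div", string=re.compile("^Jo"))   # regular expression
by_function = soup.find_all(is_div_containing_john)         # arbitrary function

print(by_string)
print(by_regex)
print(by_function)
```

The function form is the closest analogue to jQuery's :contains(), because it can test for a substring rather than an exact or pattern match on the tag's single string.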

In your case, ElementTree from the standard library is enough:

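# cElementTree and getiterator() were removed in Python 3.9;
# ElementTree and iter() are the current equivalents
from xml.etree import ElementTree as etree

xml = """
    <div>Bill</div>
    <div>John</div>
    <div>Joe</div>
"""
# Wrap the fragment in a single root element so it parses as well-formed XML
root = etree.fromstring("<root>%s</root>" % xml)
for div in root.iter('div'):
    # div.text is None for an empty element, so guard before the substring test
    if div.text and "John" in div.text:
        print(etree.tostring(div, encoding="unicode"))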
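One limitation worth illustrating: div.text only covers the text before the first child element, so a name wrapped in nested markup would be missed. The sketch below is one possible way to handle that with the standard library alone, using itertext() to gather text from descendants as well. The helper name elements_containing and the sample fragment are assumptions made for this example, and the input must be well-formed XML.

```python
from xml.etree import ElementTree as etree

def elements_containing(root, tag, needle):
    """Return every <tag> element whose own or nested text contains needle."""
    return [
        el for el in root.iter(tag)
        # itertext() walks the element and its descendants, so text inside
        # child elements such as <b> is included in the check
        if needle in "".join(el.itertext())
    ]

# Illustrative fragment with the name nested inside a <b> element
fragment = "<div>Bill</div><div>Hi <b>John</b>!</div><div>Joe</div>"
root = etree.fromstring("<root>%s</root>" % fragment)

found = elements_containing(root, "div", "John")
print([etree.tostring(el, encoding="unicode") for el in found])
```

For real-world HTML that is not well-formed, an HTML parser such as Beautiful Soup or lxml is the safer choice.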

Answered 2025-04-17 by Python大师
