Find different strings and return the containing tag in BeautifulSoup

User

Reply 1 · edited 2024-05-23 17:17:56

The simplest way to do this kind of complex matching is to write a function that performs the match, and pass that function as the value of the text argument.

def must_contain_all(*strings):
    def must_contain(markup):
        return markup is not None and all(s in markup for s in strings)
    return must_contain
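The factory can be tried on plain values before it is handed to BeautifulSoup. This is only a minimal sketch: the sample strings are made up for the demo, and it assumes nothing more than that the returned callable receives either a string or None.

```python
def must_contain_all(*strings):
    """Build a predicate that is true only if every string occurs in markup."""
    def must_contain(markup):
        # The None guard mirrors the check in the function above.
        return markup is not None and all(s in markup for s in strings)
    return must_contain


matcher = must_contain_all("world", "puzzle")

both = matcher("Who in the world am I? Ah, that's the great puzzle.")
only_one = matcher("the world would go around a great deal faster")
missing = matcher(None)
print(both, only_one, missing)
```

Because the word list is captured in the closure, the same factory can build as many independent matchers as needed.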

Now you can get the matching strings:

print(soup.find_all(text=must_contain_all("world", "puzzle")))

To get the tags that contain the strings, use the .parent attribute:

print([text.parent for text in soup.find_all(text=must_contain_all("world", "puzzle"))])
# [<p>Who in the world am I? Ah, that's the great puzzle.</p>]
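Put together as one script, the approach might look like the sketch below. It assumes Python 3, the bs4 package, and the built-in "html.parser" backend. The sample HTML is borrowed from the second reply, and the call uses `string=`, the newer name for the `text=` argument.

```python
from bs4 import BeautifulSoup

html = """
<p>If everybody minded their own business, the world would go around a great deal faster than it does.</p>
<p>Who in the world am I? Ah, that's the great puzzle.</p>
"""


def must_contain_all(*strings):
    def must_contain(markup):
        return markup is not None and all(s in markup for s in strings)
    return must_contain


soup = BeautifulSoup(html, "html.parser")

# `string` is the current name of the argument; older releases call it `text`.
matches = soup.find_all(string=must_contain_all("world", "puzzle"))

# Each match is a text node; .parent is the tag that directly contains it.
parents = [s.parent for s in matches]
print(parents)
```

Only the second paragraph contains both words, so only its tag ends up in `parents`.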

User

Reply 2 · edited 2024-05-23 17:17:56

You may want to consider using lxml instead of BeautifulSoup. lxml lets you find elements by XPath:

With this boilerplate setup:

import lxml.html as LH
import re

html = """
<p>
If everybody minded their own business, the world would go around a great deal faster than it does.
</p>

<p>
Who in the world am I? Ah, that's the great puzzle.
</p>
"""

doc = LH.fromstring(html)

This finds the text in all <p> tags that contain the string world:

print(doc.xpath('//p[contains(text(),"world")]/text()'))

This finds all the text in all <p> tags that contain both world and puzzle:

print(doc.xpath('//p[contains(text(),"world") and contains(text(),"puzzle")]/text()'))
["\nWho in the world am I? Ah, that's the great puzzle.\n"]
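One caveat, shown here as a hedged sketch rather than as part of the original answer: in XPath 1.0, `contains(text(), ...)` tests only the first text node of the element, so a paragraph with inline markup can be missed. Testing the element's whole string value with `.` avoids that. The sample HTML and the `<b>` tag are invented for the illustration.

```python
import lxml.html as LH

html = """
<div>
<p>Who in the <b>world</b> am I? Ah, that's the great puzzle.</p>
<p>Who in the world am I? Ah, that's the great puzzle.</p>
</div>
"""
doc = LH.fromstring(html)

# text() yields a node-set; contains() converts it using its first node only.
first_text_only = doc.xpath(
    '//p[contains(text(),"world") and contains(text(),"puzzle")]'
)

# "." is the full string value of <p>, including text inside child tags.
whole_content = doc.xpath(
    '//p[contains(.,"world") and contains(.,"puzzle")]'
)

print(len(first_text_only), len(whole_content))
```

The first query skips the paragraph whose first text node ends before the `<b>` tag, while the second query matches both paragraphs.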

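User

Reply 3 · edited 2024-05-23 17:17:56

But the most efficient way is probably:

import re
# text="world" would match only nodes equal to "world"; a regex matches substrings.
len(set(soup.find_all(text=re.compile("world")))
    & set(soup.find_all(text=re.compile("book")))
    & set(soup.find_all(text=re.compile("puzzle"))))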
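A runnable version of the set-intersection idea might look like this. It is a sketch under assumptions: Python 3 with bs4, sample HTML borrowed from the second reply, and a word list reduced to "world" and "puzzle" because the sample has no "book". Each word is compiled as a regex so that it matches as a substring.

```python
import re
from bs4 import BeautifulSoup

html = """
<p>If everybody minded their own business, the world would go around a great deal faster than it does.</p>
<p>Who in the world am I? Ah, that's the great puzzle.</p>
"""
soup = BeautifulSoup(html, "html.parser")

words = ["world", "puzzle"]

# One set of matching text nodes per word; re.escape keeps each word literal.
per_word = [set(soup.find_all(string=re.compile(re.escape(w)))) for w in words]

# Text nodes that appear in every set contain all of the words.
common = set.intersection(*per_word)
print(len(common), [s.parent.name for s in common])
```

This walks the document once per word. Because text nodes compare by their string value, identical strings in different tags would likely collapse into a single set entry.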
