Using BeautifulSoup to find an HTML tag that contains certain text

User

Reply 1 · edited 2024-05-15 23:27:31

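# BeautifulSoup 3 import and Python 2 syntax; under bs4 the import is `from bs4 import BeautifulSoup`
from BeautifulSoup import BeautifulSoup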
import re

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""

soup = BeautifulSoup(html_text)


# With only text= given, each result is a NavigableString; .parent is the enclosing tag.
for elem in soup(text=re.compile(r' #\S{11}')):
    print elem.parent

Prints:

<h2>this is cool #12345678901</h2>
<h1>foo #126666678901</h1>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
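
A note on why the 12-character IDs match a pattern that asks for 11 characters: BeautifulSoup applies a compiled pattern with its search() method, and the pattern is unanchored, so a string matches as soon as it contains a space, a '#', and at least 11 non-whitespace characters. The sketch below uses only the standard re module; the fourth sample string and the stricter `strict` pattern are hypothetical additions for illustration, not part of the answer.

```python
import re

# Same pattern as in the answer: a space, '#', then 11 non-whitespace characters.
pattern = re.compile(r' #\S{11}')

samples = [
    "this is cool #12345678901",   # 11 characters after '#'
    "this is nothing",             # no '#' at all
    "foo #126666678901",           # 12 characters after '#'
    "this is short #1234567890",   # only 10 characters after '#' (hypothetical sample)
]

# search() looks for the pattern anywhere in the string, so extra trailing
# characters do not prevent a match.
matches = [s for s in samples if pattern.search(s)]
print(matches)

# Hypothetical stricter variant: exactly 11 digits, then end of string.
strict = re.compile(r' #\d{11}$')
strict_matches = [s for s in samples if strict.search(s)]
print(strict_matches)
```

If only exact-length IDs should match, anchoring the pattern as in `strict` is one option; whether that is wanted depends on the data.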

User

Reply 2 · edited 2024-05-15 23:27:31

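When text= is used as a search criterion, BeautifulSoup search operations return [a list of] BeautifulSoup.NavigableString objects, whereas in other cases they return BeautifulSoup.Tag objects. Inspect the object's __dict__ to see which attributes are available to you. Of these attributes, parent is preferable to previous because of changes in BS4.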

from BeautifulSoup import BeautifulSoup
from pprint import pprint
import re

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""

soup = BeautifulSoup(html_text)

# Even though the OP was not looking for 'cool', it's more understandable to work with item zero.
pattern = re.compile(r'cool')

pprint(soup.find(text=pattern).__dict__)
#>> {'next': u'\n',
#>>  'nextSibling': None,
#>>  'parent': <h2>this is cool #12345678901</h2>,
#>>  'previous': <h2>this is cool #12345678901</h2>,
#>>  'previousSibling': None}

print soup.find('h2')
#>> <h2>this is cool #12345678901</h2>
print soup.find('h2', text=pattern)
#>> this is cool #12345678901
print soup.find('h2', text=pattern).parent
#>> <h2>this is cool #12345678901</h2>
print soup.find('h2', text=pattern) == soup.find('h2')
#>> False
print soup.find('h2', text=pattern) == soup.find('h2').text
#>> True
print soup.find('h2', text=pattern).parent == soup.find('h2')
#>> True
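
For comparison, here is a minimal sketch of the same checks on Beautiful Soup 4 and Python 3. It assumes bs4 4.4.0 or later is installed (where string= is the current name of the text= argument) and uses the built-in html.parser. It illustrates the distinction described above: a search by text alone yields a NavigableString, while under bs4 adding a tag name yields the Tag itself.

```python
import re
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

html_text = """
<h2>this is cool #12345678901</h2>
<h2>this is nothing</h2>
<h2>this is interesting #126666678901</h2>
<h2>this is blah #124445678901</h2>
"""

soup = BeautifulSoup(html_text, "html.parser")
pattern = re.compile(r'cool')

# Text-only search: the result is the string node, not the tag.
text_node = soup.find(string=pattern)
print(type(text_node).__name__, repr(text_node))

# .parent walks up from the string node to the enclosing <h2>.
print(text_node.parent)

# Tag name plus string: bs4 returns the tag whose .string matches.
tag = soup.find('h2', string=pattern)
print(type(tag).__name__, tag)

# Both routes end at the same <h2> object in the tree.
print(tag is text_node.parent)
```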

User

Reply 3 · edited 2024-05-15 23:27:31

With bs4 (Beautiful Soup 4), the OP's attempt works exactly as expected:

import re
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h2> this is cool #12345678901 </h2>", "html.parser")
soup('h2', text=re.compile(r' #\S{11}'))

returns [<h2> this is cool #12345678901 </h2>].
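
One caveat worth sketching: when a tag name is combined with text= (or string=), bs4 compares the pattern against the tag's .string, which is None when the tag has more than one child, for example when part of the heading is wrapped in <b>. A possible workaround, offered here as an assumption rather than something taken from the answers, is to pass a function to find_all and test get_text() instead. The helper name `h2_with_id` and the sample markup are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the first heading contains nested markup.
html_text = """
<h2>this is <b>cool</b> #12345678901</h2>
<h2>this is nothing</h2>
<h1>foo #126666678901</h1>
"""

pattern = re.compile(r' #\S{11}')
soup = BeautifulSoup(html_text, "html.parser")

def h2_with_id(tag):
    """Return True for <h2> tags whose combined text matches the pattern."""
    return tag.name == 'h2' and pattern.search(tag.get_text()) is not None

# The first <h2> has three children (text, <b>, text), so .string is None.
print(soup.h2.string)

# find_all accepts a function that receives each tag and returns True or False.
found = soup.find_all(h2_with_id)
print(found)
```

Because get_text() concatenates all descendant strings, this matches headings regardless of inline markup, at the cost of calling the function on every tag in the document.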
