Getting multiple blocks of text between tags

<div class="left_panel"> <h4>Header1</h4> block of text that I want. <br /> <br /> another block of text that I want. <br /> <br /> still more text that I want. <br /> <br /> <p> </p> <h4>Header2</h4>

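2 answers

User

Answer 1 · edited 2024-05-19 17:02:57

I don't understand why you pass soup as a parameter but never use it.

If you had used the correct soup instance, that error would not occur. findAllNext(h4) returns <h4>Header1</h4> and the other h4 header, and applying nextSibling to each of them returns its text sibling, which are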

block of text that I want.

and the text node that follows the other header, in your case.
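To make the point above concrete, here is a minimal sketch of that lookup. It uses the modern spellings `find_all_next` and `next_sibling` for `findAllNext` and `nextSibling`. The variable names, the starting element (`panel`), and the `trailing text.` after Header2 are assumptions added for illustration; the question does not show what follows the second header.

```python
from bs4 import BeautifulSoup

# Shortened markup from the question; "trailing text." is a made-up
# placeholder for whatever follows Header2 in the real page.
html = (
    '<div class="left_panel"><h4>Header1</h4> block of text that I want. '
    '<br /> <br /> another block of text that I want. <br />'
    '<h4>Header2</h4> trailing text.</div>'
)

soup = BeautifulSoup(html, "html.parser")
panel = soup.find("div", class_="left_panel")

# find_all_next walks every element after `panel` in document order,
# so it yields both h4 headers.
headers = panel.find_all_next("h4")

# next_sibling gives only the node immediately after each header,
# which is why only the first block of text is reached this way.
first_texts = [str(h.next_sibling).strip() for h in headers]
print(first_texts)
```

Because `next_sibling` stops at the first node, the second and third blocks of text after Header1 are not reached by this approach.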

User

Answer 2 · edited 2024-05-19 17:02:57

Find the first header and iterate over .next_siblings until you reach the other header:

from bs4 import BeautifulSoup

data = """
<div class="left_panel">
    <h4>Header1</h4>
      block of text that I want.
    <br />
    <br />
      another block of text that I want.
    <br />
    <br />
      still more text that I want.
    <br />
    <br />
      <p>&nbsp;</p>
    <h4>Header2</h4>
</div>
"""

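# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, 'html.parser')
header1 = soup.find('h4', string='Header1')
for item in header1.next_siblings:
    # Default to None so nodes without a tag name never raise AttributeError
    if getattr(item, 'name', None) == 'h4' and item.text == 'Header2':
        break

    print(item)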

Update (collecting the text between the two h4 tags):

^{pr2}$
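The code for this update is not preserved in the text above, so what follows is only a sketch of one way to do the collection, not the answerer's original code. It assumes the same `data` markup as before; the `blocks` list and the choice to skip whitespace-only strings are assumptions.

```python
from bs4 import BeautifulSoup, NavigableString

data = """
<div class="left_panel">
    <h4>Header1</h4>
      block of text that I want.
    <br />
    <br />
      another block of text that I want.
    <br />
    <br />
      still more text that I want.
    <br />
    <br />
      <p>&nbsp;</p>
    <h4>Header2</h4>
</div>
"""

soup = BeautifulSoup(data, "html.parser")
header1 = soup.find("h4", string="Header1")

blocks = []
for item in header1.next_siblings:
    # Stop as soon as the next header is reached.
    if getattr(item, "name", None) == "h4":
        break
    # Keep bare text nodes only; tags such as <br> and <p> are skipped.
    if isinstance(item, NavigableString):
        text = item.strip()
        if text:  # ignore whitespace-only strings between tags
            blocks.append(text)

print(blocks)
```

Checking `isinstance(item, NavigableString)` rather than calling `get_text()` on every sibling keeps the contents of the `<p>` element out of the result.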
