How do I use BeautifulSoup to select tags based on their children and siblings?

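<span class="displaytext">
<p><i>OBAMA</i>Obama's first quotes</p>
<p>More quotes from Obama</p>
<p>Some more Obama quotes</p>
<p><i>Moderator</i>Moderator's quotes</p>
<p>Some more quotes</p>
<p><i>ROMNEY</i>Romney's quotes</p>
<p>More quotes from Romney</p>
<p>Some more Romney quotes</p>
</span>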

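from bs4 import BeautifulSoup

input = '''<span class="displaytext">
<p><i>OBAMA</i>Obama's first quotes</p>
<p>More quotes from Obama</p>
<p>Some more Obama quotes</p>
<p><i>Moderator</i>Moderator's quotes</p>
<p>Some more quotes</p>
<p><i>ROMNEY</i>Romney's quotes</p>
<p>More quotes from Romney</p>
<p>Some more Romney quotes</p>
</span>'''

soup = BeautifulSoup(input, "html.parser")
debate_text = soup.find("span", {"class": "displaytext"})
president_quotes = debate_text.find_all("i", string="OBAMA")
for i in president_quotes:
    # next_siblings only walks the nodes inside the same <p> as this <i>,
    # so this prints "Obama's first quotes" and nothing from the later <p> tags.
    siblings = i.next_siblings
    for sibling in siblings:
        print(sibling)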

2 answers

User

Answer 1 · edited 2024-06-06 19:49:31

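I think a solution along the lines of a finite state machine would work here: walk through the <p> tags in order, switch a flag on when an <i>OBAMA</i> label appears, and switch it off when another speaker's <i> label appears. Like this: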

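from bs4 import BeautifulSoup

soup = BeautifulSoup(input, 'lxml')  # `input` is the HTML string from the question
debate_text = soup.find("span", {"class": "displaytext"})
obama_is_on = False  # state: is Obama currently speaking?
obama_tags = []
for p in debate_text("p"):  # calling a tag is shorthand for find_all("p")
    # `'OBAMA' in p.i` checks the direct children of the <i>, i.e. its text
    if p.i and 'OBAMA' in p.i:
        # assuming <i> is used only to indicate speaker
        obama_is_on = True
    if p.i and 'OBAMA' not in p.i:
        # another speaker takes over: switch the state off
        obama_is_on = False
        continue
    if obama_is_on:
        obama_tags.append(p)
print(obama_tags)

Output:
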
[<p>
<i>OBAMA</i>Obama's first quotes
        </p>, <p>More quotes from Obama</p>, <p>Some more Obama quotes</p>]
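The same state-machine idea generalizes from a single on/off flag to a "current speaker" variable, so every speaker's quotes can be collected in one pass. The sketch below is self-contained and rests on assumptions: `HTML` is a sample modeled on the question's markup (its exact whitespace is a guess), `quotes_by_speaker` is a hypothetical helper name, and it uses the built-in "html.parser" so lxml is not required.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Sample markup modeled on the question (assumption: exact whitespace may differ).
HTML = """<span class="displaytext">
<p><i>OBAMA</i>Obama's first quotes</p>
<p>More quotes from Obama</p>
<p>Some more Obama quotes</p>
<p><i>Moderator</i>Moderator's quotes</p>
<p>Some more quotes</p>
<p><i>ROMNEY</i>Romney's quotes</p>
<p>More quotes from Romney</p>
<p>Some more Romney quotes</p>
</span>"""


def quotes_by_speaker(html):
    """Hypothetical helper: group <p> texts by the most recent <i> speaker label.

    The state is the current speaker: a <p> containing an <i> switches it,
    and every <p> is attributed to whoever is speaking at that point.
    """
    soup = BeautifulSoup(html, "html.parser")
    debate_text = soup.find("span", class_="displaytext")
    quotes = defaultdict(list)
    speaker = None  # no speaker label seen yet
    for p in debate_text.find_all("p"):
        if p.i is not None:
            # State transition: a new speaker starts talking.
            speaker = p.i.get_text(strip=True)
            # Drop the label from this local tree so only the quote text remains.
            p.i.extract()
        if speaker is not None:
            quotes[speaker].append(p.get_text(strip=True))
    return dict(quotes)


result = quotes_by_speaker(HTML)
print(result["OBAMA"])
```

Keeping the speaker name as the state, instead of a boolean per speaker, means a new speaker needs no extra code; the trade-off is that every <p> before the first <i> label is skipped.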

User

Answer 2 · edited 2024-06-06 19:49:31

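The other Obama quotes are siblings of the <p>, not of the <i>, so you need to look at the siblings of the <i>'s parent. While looping over those siblings, you can stop as soon as one contains an <i> (the next speaker's label). Like this: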

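# president_quotes comes from the question's code: the <i> tags whose text is "OBAMA"
for i in president_quotes:
    print(i.next_sibling)  # the text right after <i>OBAMA</i> in the same <p>
    siblings = i.parent.find_next_siblings('p')  # the <p> tags after this one
    for sibling in siblings:
        if sibling.find("i"):
            break  # an <i> means another speaker starts here
        print(sibling.string)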

Output:

Obama's first quotes

More quotes from Obama
Some more Obama quotes
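Wrapped into a function, this sibling walk can be reused for any speaker. This is a sketch under assumptions: the `HTML` string is a reconstruction of the question's markup and `quotes_for` is a hypothetical helper name. Instead of printing, it collects stripped strings into a list, and it uses `get_text(strip=True)` rather than `.string`, because `.string` returns `None` when a <p> has more than one child.

```python
from bs4 import BeautifulSoup

# Sample markup reconstructed from the question (assumption).
HTML = """<span class="displaytext">
<p><i>OBAMA</i>Obama's first quotes</p>
<p>More quotes from Obama</p>
<p>Some more Obama quotes</p>
<p><i>Moderator</i>Moderator's quotes</p>
<p>Some more quotes</p>
<p><i>ROMNEY</i>Romney's quotes</p>
<p>More quotes from Romney</p>
<p>Some more Romney quotes</p>
</span>"""


def quotes_for(soup, speaker):
    """Hypothetical helper: collect every quote attributed to `speaker`."""
    quotes = []
    for label in soup.find_all("i", string=speaker):
        # Text that follows the <i> label inside the same <p>.
        first = label.next_sibling
        if isinstance(first, str) and first.strip():
            quotes.append(first.strip())
        # Later quotes live in the <p> siblings of the label's parent.
        for sibling in label.parent.find_next_siblings("p"):
            if sibling.find("i"):
                break  # an <i> label means another speaker takes over
            quotes.append(sibling.get_text(strip=True))
    return quotes


soup = BeautifulSoup(HTML, "html.parser")
obama = quotes_for(soup, "OBAMA")
print(obama)
```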
