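At first glance, it seemed natural to me that .next_sibling and .previous_sibling would return sibling tags. But when I played with them today, I got a NavigableString such as "\n" instead. After checking the documentation carefully, I noticed this: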
In real documents, the .next_sibling or .previous_sibling of a tag will usually be a string containing whitespace. Going back to the “three sisters” document:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
You might think that the .next_sibling of the first <a> tag would be the second <a> tag. But actually, it’s a string: the comma and newline that separate the first <a> tag from the second:
link = soup.a
link
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
link.next_sibling
# u',\n'
The second <a> tag is actually the .next_sibling of the comma:
link.next_sibling.next_sibling
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
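The behaviour quoted above can be reproduced with a small self-contained snippet. This is only a minimal sketch, assuming Beautiful Soup 4 is installed and using the built-in html.parser. The markup is a made-up variant of the three sisters fragment with only newlines between the tags, which is the case that yields the "\n" string mentioned at the top.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical fragment: three links separated only by newlines.
html = (
    '<p><a id="link1">Elsie</a>\n'
    '<a id="link2">Lacie</a>\n'
    '<a id="link3">Tillie</a></p>'
)
soup = BeautifulSoup(html, "html.parser")

first = soup.a
whitespace = first.next_sibling            # the text node between the two tags
second = first.next_sibling.next_sibling   # the next actual tag

print(repr(whitespace))
print(isinstance(whitespace, NavigableString))
print(second["id"])
```

The whitespace between tags is part of the parse tree, so it shows up as a sibling node just like a tag does.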
Why does it behave this way?
See page 16 of the documentation.
I hope that answers your question.
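The .next_sibling property is meant for fine-grained navigation of an HTML document. It can do things that CSS selectors cannot: selectors can pick out tags, but not the strings between tags. For example, no CSS selector can select the string SELECT THIS in:
<p>some text</p>SELECT THIS<p>some text</p>
If you want to search for sibling tags, use the find_next_sibling() method instead. You can also mimic the behaviour of .next_sibling by passing the text=True argument to find_next_sibling().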
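The difference between the three lookups can be sketched as follows. This is a minimal example rather than a definitive recipe. It assumes Beautiful Soup 4.4 or later, where the argument is spelled string (text is the older name for the same argument), and the built-in html.parser. The wrapping <div> is an assumption added only so the fragment has a single parent.

```python
from bs4 import BeautifulSoup

# The fragment from the answer, wrapped in a <div> for illustration.
html = "<div><p>some text</p>SELECT THIS<p>some text</p></div>"
soup = BeautifulSoup(html, "html.parser")

first_p = soup.p

# .next_sibling returns whatever node comes next, here the bare string.
raw_next = first_p.next_sibling

# find_next_sibling() with a tag name skips strings and returns the next <p>.
next_tag = first_p.find_next_sibling("p")

# string=True restricts the search to text nodes (text=True in older releases).
next_string = first_p.find_next_sibling(string=True)

print(repr(raw_next))
print(next_tag)
print(repr(next_string))
```

Note that find_next_sibling(string=True) skips over any tags until it reaches a string, so it matches .next_sibling only when the node immediately after the tag is itself a string.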