Unable to create a proper selector to parse a certain string

1 answer

Anonymous user

#1 · Posted 2024-04-27 04:53:35

For the specific example in this question, the best answer is:

for item in root.cssselect(".expected-content"):
    # The wanted string is the tail of the last child (the <span>), not of <a> itself.
    print(item[-1].tail)
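This relies on lxml's ElementTree-style text model: .text holds the text before an element's first child, and .tail holds the text that follows an element's closing tag. A minimal sketch on the question's markup (the variable names a and span are illustrative) shows where each string ends up:

```python
from lxml.html import fromstring

html_elem = """
<a class="expected-content" href="/4570/I-wanna-be-scraped-alone">
    <span class="undesirable-content">I shouldn't be parsed</span>
    I wanna be scraped alone
</a>
"""

# For a fragment with one top-level element, fromstring() returns that element.
a = fromstring(html_elem)
span = a[-1]  # the last (and only) child of <a>

print(repr(span.text))  # text inside <span>: the part to skip
print(repr(span.tail))  # text after </span>, still inside <a>: the part we want
print(repr(a.tail))     # text after </a>: outside the link entirely
```

In other words, the wanted string is the tail of the <span>, not of the <a>.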

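In lxml, element.tail is the text that follows that element's closing tag, so the tail of the last child is the text after the last child. However, this does not work if the desired text sits before or between child nodes. A more robust solution is therefore: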

Use item.text_content(). The documentation describes it as follows:

Returns the text content of the element, including the text content of its children, with no markup.
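Applied directly to the question's <a> element, text_content() therefore also returns the unwanted <span> text. A short sketch, assuming the same markup as the question:

```python
from lxml.html import fromstring

html_elem = """
<a class="expected-content" href="/4570/I-wanna-be-scraped-alone">
    <span class="undesirable-content">I shouldn't be parsed</span>
    I wanna be scraped alone
</a>
"""

a = fromstring(html_elem)
full = a.text_content()

# text_content() concatenates all descendant text, so both strings appear.
print(" ".join(full.split()))  # → I shouldn't be parsed I wanna be scraped alone
```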

So if you don't want the children's text, remove the children first:

from lxml.html import fromstring

html_elem="""
<a class="expected-content" href="/4570/I-wanna-be-scraped-alone">
    <span class="undesirable-content">I shouldn't be parsed</span>
    I wanna be scraped alone
</a>
"""

root = fromstring(html_elem)
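for item in root.cssselect(".expected-content"):  # requires the "cssselect" package
    # Iterate over a copy, because the loop removes children from item.
    for child in list(item):
        # drop_tree() removes the child and its text but keeps its tail,
        # so "I wanna be scraped alone" remains inside item.
        child.drop_tree()
    print(item.text_content())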

Note that this example also returns some surrounding whitespace; calling .strip() on the result removes it.
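If modifying the tree is undesirable, one alternative (a different technique from the one above, sketched under the assumption that only the element's own text nodes are wanted) is to join the element's .text with the .tail of each child. This also covers text placed before or between children and normalizes the whitespace. The helper name own_text and the extra markup are hypothetical:

```python
from lxml.html import fromstring


def own_text(element):
    """Return the text directly inside element, skipping its children's text."""
    # element.text is the text before the first child; each child's tail is
    # the text after that child's closing tag, still inside element.
    parts = [element.text or ""] + [child.tail or "" for child in element]
    # Join with spaces so words are not glued together across a removed child,
    # then collapse runs of whitespace (including the markup's indentation).
    return " ".join(" ".join(parts).split())


html_elem = """
<a class="expected-content" href="/4570/I-wanna-be-scraped-alone">
    Before <span class="undesirable-content">I shouldn't be parsed</span>
    between <b>skip me</b> after
</a>
"""

a = fromstring(html_elem)
print(own_text(a))  # → Before between after
```

Because the tree is left untouched, the same element can still be queried for its children afterwards.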
