lxml XPath expression to select all text under a given child node, including its descendants

<node1>
    <text title='book'>
        <div chapter='0'>
            <div id='theNode'>
                <p xml:id="40">
                    A House that has:
                    <p xml:id="45">- a window;</p>
                    <p xml:id="46">- a door</p>
                    <p xml:id="46">- a door</p>
                    its a beuatiful house
                </p>
            </div>
        </div>
    </text>
</node1>

2 answers

User

#1 · Edited 2024-04-20 02:06:13

Another option:

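from lxml import etree

# XML_content holds the XML document as a string (defined in full in the next answer).
XML_tree = etree.fromstring(XML_content)
# Collect every non-blank text node under <text title="book">, stripping each piece.
text = [el.strip() for el in XML_tree.xpath('//text()[ancestor::text[@title="book"]][normalize-space()]')]
print(" ".join(text))   # all pieces on one line
print("\n".join(text))  # one piece per line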

Output:

A House that has: - a window; - a door - a door its a beuatiful house
A House that has:
- a window;
- a door
- a door
its a beuatiful house
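
If only the text under one specific element is wanted, the same idea can be written relative to that element. The following is a minimal sketch, assuming the target is the div with id='theNode' from the question and borrowing the x-prefixed xml:id values from the other answer; the names xml_content, node, parts and joined are illustrative only.

```python
from lxml import etree

# Sample document from the question, with x-prefixed xml:id values (an assumption
# made here so that every xml:id is a valid, unique name).
xml_content = (
    "<node1><text title='book'><div chapter='0'><div id='theNode'>"
    '<p xml:id="x40"> A House that has: '
    '<p xml:id="x45">- a window;</p> '
    '<p xml:id="x46">- a door</p> '
    '<p xml:id="x47">- a door</p> '
    "its a beuatiful house </p>"
    "</div></div></text></node1>"
)

tree = etree.fromstring(xml_content)

# Pick the starting element first, then query relative to it with ".//".
node = tree.xpath('//div[@id="theNode"]')[0]

# ".//text()" walks every descendant text node; the [normalize-space()] predicate
# drops nodes that contain only whitespace.
parts = [t.strip() for t in node.xpath('.//text()[normalize-space()]')]

joined = " ".join(parts)
print(joined)
```

Starting from an element and using a relative path keeps the selection limited to that subtree, which is usually easier to read than filtering all text nodes by ancestor.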

User

#2 · Edited 2024-04-20 02:06:13

Try using string() or normalize-space():

from lxml import etree

XML_content = """
<node1>
    <text title='book'>
       <div chapter='0'>
          <div id='theNode'>
              <p xml:id="x40">
               A House that has:
                   <p xml:id="x45">- a window;</p>
                   <p xml:id="x46">- a door</p>
                   <p xml:id="x47">- a door</p>
               its a beuatiful house
               </p>
          </div>
       </div>
    </text>
</node1>
"""

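XML_tree = etree.fromstring(XML_content)
# string() returns the concatenated text of the first matching node, whitespace intact.
text = XML_tree.xpath('string(//text[@title="book"]/div/div/p)')
# normalize-space() does the same but trims the ends and collapses whitespace runs:
# text = XML_tree.xpath('normalize-space(//text[@title="book"]/div/div/p)')
print(text)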

Output with string():


               A House that has:
                   - a window;
                   - a door
                   - a door
               its a beuatiful house

Output with normalize-space():

A House that has: - a window; - a door - a door its a beuatiful house
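
For comparison, lxml elements also provide itertext(), which yields the same text pieces without an XPath string function. The sketch below is only an illustration under assumptions: it reuses the sample document, the variable names are made up, and " ".join(raw.split()) is used as a stand-in for normalize-space(). The two agree for this input, but str.split() also splits on some Unicode whitespace that normalize-space() leaves alone.

```python
from lxml import etree

xml_content = """
<node1>
    <text title='book'>
       <div chapter='0'>
          <div id='theNode'>
              <p xml:id="x40">
               A House that has:
                   <p xml:id="x45">- a window;</p>
                   <p xml:id="x46">- a door</p>
                   <p xml:id="x47">- a door</p>
               its a beuatiful house
               </p>
          </div>
       </div>
    </text>
</node1>
"""

tree = etree.fromstring(xml_content)
p = tree.find(".//div[@id='theNode']/p")

# itertext() yields the element's text and the text/tail of every descendant,
# in document order; joining the pieces gives the same value as XPath string().
raw = "".join(p.itertext())

# Collapse whitespace runs and trim the ends in plain Python.
collapsed = " ".join(raw.split())

# The XPath versions, for comparison.
via_string = tree.xpath('string(//div[@id="theNode"]/p)')
via_normalize = tree.xpath('normalize-space(//div[@id="theNode"]/p)')

print(raw == via_string)
print(collapsed == via_normalize)
print(collapsed)
```

itertext() can be convenient when the element is already in hand, while the XPath functions keep the selection and the text extraction in one expression.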
