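I need to check every word in all of the text() nodes of an XML file. I use the XPath //text() to select the text nodes and a regex to select the words. If a word is in a set of keywords, I need to replace it with something and update the XML.
Normally an element's text is set with .text, but .text only changes the first child text node. In a mixed content element, the other text nodes are actually the .tail of the preceding sibling element.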
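To make the .text versus .tail split concrete, here is a minimal sketch using lxml. The tiny document is a hypothetical stand-in, not the input from this question.

```python
from lxml import etree

# Hypothetical miniature mixed-content element (not the question's input).
para = etree.fromstring("<para>one <b>two</b> three</para>")

print(repr(para.text))     # text before the first child element
print(repr(para[0].text))  # text inside <b>
print(repr(para[0].tail))  # text after </b>, stored on <b> itself

# Assigning .text replaces only the leading text node;
# the text after </b> is untouched because it lives in <b>.tail.
para.text = "ONE "
result = etree.tostring(para, encoding="unicode")
print(result)
```

Here the trailing " three" survives the assignment, which is why setting .text alone cannot reach every text node.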
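How do I update all of the text nodes?
In the simplified example below, I am just trying to wrap the matched keywords in square brackets.
Input XML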
<doc>
<para>I think the only card she has <gotcha>is the</gotcha> Lorem card. We have so many things that we have to do
better... and certainly ipsum is one of them. When other <gotcha>websites</gotcha> give you text, they're not
sending the best. They're not sending you, they're <gotcha>sending words</gotcha> that have lots of problems
and they're <gotcha>bringing</gotcha> those problems with us. They're bringing mistakes. They're bringing
misspellings. They're typists… And some, <gotcha>I assume</gotcha>, are good words.</para>
</doc>
Desired output
<doc>
<para>I think [the] only card she has <gotcha>[is] [the]</gotcha> Lorem card. We have so many things that we have to do
better... and certainly [ipsum] [is] one of them. When other <gotcha>websites</gotcha> give you text, they're not
sending [the] [best]. They're not sending you, they're <gotcha>sending words</gotcha> that have lots of [problems]
and they're <gotcha>bringing</gotcha> those [problems] with us. They're bringing [mistakes]. They're bringing
misspellings. They're typists… And some, <gotcha>I assume</gotcha>, are good words.</para>
</doc>
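I found the key to this solution in the documentation: Using XPath to find text
Specifically, the is_text and is_tail properties of _ElementUnicodeResult. Using these properties, I can tell whether I need to update the .text or the .tail property of the parent _Element.
This is a little tricky to understand at first, because when you call getparent() on a text node (_ElementUnicodeResult) that is a tail, the preceding sibling is returned as the parent, not the actual parent.
Example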
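As a small illustration of the point about getparent(), the sketch below lists each text node together with the element lxml reports as its parent and the two flags. The sample document is hypothetical and chosen only to keep the result short.

```python
from lxml import etree

# Hypothetical miniature document, used only to show the flags.
para = etree.fromstring("<para>one <b>two</b> three</para>")

rows = []
for node in para.xpath("//text()"):
    # For a tail node, getparent() is the element the tail hangs off
    # (the preceding sibling), not the enclosing <para>.
    rows.append((str(node), node.getparent().tag, node.is_text, node.is_tail))

for row in rows:
    print(row)
```

The last text node, " three", reports b as its parent with is_tail set, even though it sits inside para in the serialized XML.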
Python
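The original code did not survive on this page, so what follows is a reconstruction sketch of the approach described above, not the author's exact code. The keyword set is an assumption inferred from the bracketed words in the desired output, and WORD, bracket_keywords and mark_keywords are hypothetical names. Matching is case-sensitive here, which is also an assumption.

```python
import re
from lxml import etree

# Assumed keyword set, inferred from the bracketed words in the desired output.
KEYWORDS = {"the", "is", "ipsum", "best", "problems", "mistakes"}

# Hypothetical word pattern; the regex used in the question is not shown.
WORD = re.compile(r"\w+")


def bracket_keywords(text):
    """Wrap every whole word found in KEYWORDS in square brackets."""
    def repl(match):
        word = match.group(0)
        return f"[{word}]" if word in KEYWORDS else word
    return WORD.sub(repl, text)


def mark_keywords(root):
    """Rewrite every text node, writing back to .text or .tail as needed."""
    # xpath() returns a plain list, so assigning while looping is safe.
    for node in root.xpath("//text()"):
        new_text = bracket_keywords(node)
        if new_text == node:
            continue
        owner = node.getparent()  # for a tail, this is the preceding sibling
        if node.is_text:
            owner.text = new_text
        elif node.is_tail:
            owner.tail = new_text
    return root


# Shortened excerpt of the input XML from the question.
xml = ("<doc><para>I think the only card she has "
       "<gotcha>is the</gotcha> Lorem card.</para></doc>")
root = mark_keywords(etree.fromstring(xml))
result = etree.tostring(root, encoding="unicode")
print(result)
```

Because each text node carries its own is_text or is_tail flag, the loop never needs to work out where the node sits in the tree; it only asks which attribute of the reported parent to overwrite.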
Output (dumped to console)