I want to search for a keyword in the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 /home/pisenberg/grobid/grobid-0.6.1/grobid-home/schemas/xsd/Grobid.xsd"
xmlns:xlink="http://www.w3.org/1999/xlink">
<text xml:lang="en">
<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>text before ref<ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b46">47,</ref><ref type="bibr" target="#b66">67]</ref>text after ref</p></div>
</body>
</text>
</TEI>
My code:
from lxml import etree
import os
import csv
from shutil import copyfile
import pandas as pd
teins = {'tei': 'http://www.tei-c.org/ns/1.0'}  # info on the xml structure
searchterm = "before"  # put your search term in lowercase
filepath = "./test.xml"
with open(filepath, 'r', encoding='utf8') as file:
    try:
        tree = etree.parse(file)
        root = etree.XML(etree.tostring(tree))
        textNode = root.find(".//tei:text", teins)
        for elem in textNode.iter():
            if elem.text:
                if searchterm.lower() in elem.text.lower():
                    print(elem.text)
    except Exception as e:  # work on python 3.x
        print(str(e))
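If I search for "before", I get a result: it prints the text containing "before". However, if I search for "after", nothing is printed.
I think textNode.iter() cannot reach the text that sits inside the <p> tag but after the <ref> tags.
Does anyone know how to solve this?
Any help would be greatly appreciated.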
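One likely explanation, offered as a sketch rather than a definitive answer: in the ElementTree data model that lxml follows, an element's .text only holds the string before its first child. Text that follows a child's closing tag is stored on that child as .tail, so "text after ref" belongs to the last <ref> element and not to <p>. The example below checks both attributes. The names find_matches, TEI_NS and SAMPLE are made up for illustration, SAMPLE is a reduced copy of the file from the question, and the fallback import is only there so the sketch also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Reduced copy of the sample document from the question (hypothetical name).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<text xml:lang="en">
<body>
<div><p>text before ref<ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b46">47,</ref><ref type="bibr" target="#b66">67]</ref>text after ref</p></div>
</body>
</text>
</TEI>"""


def find_matches(root, searchterm):
    """Return every .text or .tail string under tei:text that contains searchterm."""
    needle = searchterm.lower()
    text_node = root.find(".//tei:text", TEI_NS)
    matches = []
    for elem in text_node.iter():
        # .text is the string before the first child element;
        # .tail is the string that follows this element's closing tag.
        for chunk in (elem.text, elem.tail):
            if isinstance(chunk, str) and needle in chunk.lower():
                matches.append(chunk)
    return matches


root = etree.fromstring(SAMPLE)
print(find_matches(root, "before"))  # ['text before ref']
print(find_matches(root, "after"))   # ['text after ref']
```

If a match should be allowed to span the <ref> boundaries, joining the pieces with "".join(p.itertext()) for each <p> element and searching the joined string may be a better fit, since itertext() yields text and tail strings in document order.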