How do I parse an unstructured XML file with Python?

Posted 2024-05-13 19:16:20


How can I parse an unstructured XML file? I need to use ElementTree to get the data inside the patient tag and the title.

<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:hl7-org:v3 CDA.xsd">
<templateId root="2.16.840.1.113883.10.20.22.1.1"/>
<id extension="4b78219a-1d02-4e7c-9870-dc7ce3b8a8fb" root="1.2.840.113619.21.1.3214775361124994304.5.1"/>
<code code="34133-9" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="Summarization of episode note"/>
<title>Summary</title>
<effectiveTime value="20170919160921ddfdsdsdsd31-0400"/>
<confidentialityCode code="N" codeSystem="2.16.840.dwdwddsd1.113883.5.25"/>
<recordTarget>
<patientRole><id extension="0" root="1.2.840.113619.21.1.3214775361124994304.2.1.1.2"/>
<addr use="HP"><streetAddressLine>addd2 </streetAddressLine><city>fgfgrtt</city><state>tr</state><postalCode>121213434</postalCode><country>rere</country></addr>
<patient>
<name><given>fname</given><family>lname</family></name>
<administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1" displayName="Female"/>
<birthTime value="19501025"/>
<maritalStatusCode code="M" codeSystem="2434.16.840.1.143434313883.5.2" displayName="M"/>
<languageCommunication>
<languageCode code="eng"/>
<proficiencyLevelCode nullFlavor="NI"/>
<preferenceInd value="true"/>
</languageCommunication>
</patient>

I want the given name, family name, gender and title.


2 answers

Use BeautifulSoup (bs4) with the lxml parser library to scrape the XML data.

from bs4 import BeautifulSoup

xml_data = '''<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:hl7-org:v3 CDA.xsd">
<templateId root="2.16.840.1.113883.10.20.22.1.1"/>
<id extension="4b78219a-1d02-4e7c-9870-dc7ce3b8a8fb" root="1.2.840.113619.21.1.3214775361124994304.5.1"/>
<code code="34133-9" codeSystem="2.16.840.1.113883.6.1" codeSystemName="LOINC" displayName="Summarization of episode note"/>
<title>Summary</title>
<effectiveTime value="20170919160921ddfdsdsdsd31-0400"/>
<confidentialityCode code="N" codeSystem="2.16.840.dwdwddsd1.113883.5.25"/>
<recordTarget>
<patientRole><id extension="0" root="1.2.840.113619.21.1.3214775361124994304.2.1.1.2"/>
<addr use="HP"><streetAddressLine>addd2 </streetAddressLine><city>fgfgrtt</city><state>tr</state><postalCode>121213434</postalCode><country>rere</country></addr>
<patient>
<name><given>fname</given><family>lname</family></name>
<administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1" displayName="Female"/>
<birthTime value="19501025"/>
<maritalStatusCode code="M" codeSystem="2434.16.840.1.143434313883.5.2" displayName="M"/>
<languageCommunication>
<languageCode code="eng"/>
<proficiencyLevelCode nullFlavor="NI"/>
<preferenceInd value="true"/>
</languageCommunication>
</patient>'''


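# The "lxml" feature selects lxml's HTML parser, which lowercases tag and
# attribute names; that is why "administrativegendercode" and "displayname"
# are looked up in lower case below.
soup = BeautifulSoup(xml_data, "lxml")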

title = soup.find("title")
print(title.text.strip())

patient = soup.find("patient")
given = patient.find("given").text.strip()
family = patient.find("family").text.strip()
gender = patient.find("administrativegendercode")['displayname'].strip()

print(given)
print(family)
print(gender)

Output:

Summary
fname
lname
Female

Install the library dependencies:

pip3 install beautifulsoup4==4.7.1
pip3 install lxml==4.3.3

Alternatively, you can simply use lxml. Here is the tutorial I used: https://lxml.de/tutorial.html. It should look something like this:

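from lxml import etree

# xml_data is the document from the question; it must be complete (all closing tags present)
root = etree.fromstring(xml_data.encode("utf-8"))  # bytes, because of the encoding declaration
ns = {"hl7": "urn:hl7-org:v3"}  # prefix for the document's default namespace
print(root.findtext("hl7:title", namespaces=ns))
patient = root.find(".//hl7:patient", namespaces=ns)
print(patient.findtext("hl7:name/hl7:given", namespaces=ns))
print(patient.findtext("hl7:name/hl7:family", namespaces=ns))
print(patient.find("hl7:administrativeGenderCode", namespaces=ns).get("displayName"))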
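Since the question asks for ElementTree specifically, here is a minimal sketch using the standard-library xml.etree.ElementTree. The function name extract_patient_info and the trimmed SAMPLE document are illustrative only, not taken from either answer. The key detail is the default namespace urn:hl7-org:v3 declared on ClinicalDocument: every unprefixed element belongs to it, so a plain root.find("title") returns None unless a namespace map is supplied. The sample also closes the patientRole, recordTarget and ClinicalDocument tags, which the snippet in the question leaves open; ElementTree raises ParseError on a truncated document.

```python
import xml.etree.ElementTree as ET

# Prefix of our choosing mapped to the default namespace of the CDA document.
# ElementTree stores tags as "{urn:hl7-org:v3}title", "{urn:hl7-org:v3}patient", etc.
NS = {"hl7": "urn:hl7-org:v3"}

# Hypothetical, trimmed version of the question's document, with all tags closed.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<ClinicalDocument xmlns="urn:hl7-org:v3">
  <title>Summary</title>
  <recordTarget>
    <patientRole>
      <patient>
        <name><given>fname</given><family>lname</family></name>
        <administrativeGenderCode code="F" codeSystem="2.16.840.1.113883.5.1" displayName="Female"/>
        <birthTime value="19501025"/>
      </patient>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""


def extract_patient_info(xml_text):
    """Return title, given name, family name and gender from a CDA document.

    Assumes the document contains a recordTarget/patientRole/patient element.
    """
    # Encode to bytes so the parser honours the XML encoding declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    patient = root.find("hl7:recordTarget/hl7:patientRole/hl7:patient", namespaces=NS)
    gender = patient.find("hl7:administrativeGenderCode", namespaces=NS)
    return {
        "title": root.findtext("hl7:title", namespaces=NS),
        "given": patient.findtext("hl7:name/hl7:given", namespaces=NS),
        "family": patient.findtext("hl7:name/hl7:family", namespaces=NS),
        # displayName is an attribute, not element text
        "gender": gender.get("displayName") if gender is not None else None,
    }


if __name__ == "__main__":
    info = extract_patient_info(SAMPLE)
    print(info["title"], info["given"], info["family"], info["gender"])
    # prints: Summary fname lname Female
```

To read from a file rather than a string, ET.parse("document.xml").getroot() returns the same root element (the filename is a placeholder). On Python 3.8 and later, ElementTree paths also accept the "{*}" wildcard, e.g. root.find(".//{*}patient"), if you prefer not to hard-code the namespace URI.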
