I am trying to parse an XML file with Python. I need to get the XML data as a DataFrame.
import pandas as pd
import xml.etree.ElementTree as et

def parse_XML(xml_file, df_cols):
    xtree = et.parse(xml_file)
    xroot = xtree.getroot()
    rows = []
    for node in xroot:
        res = []
        res.append(node.attrib.get(df_cols[0]))
        for el in df_cols[1:]:
            if node is not None and node.find(el) is not None:
                res.append(node.find(el).text)
            else:
                res.append(None)
        rows.append({df_cols[i]: res[i]
                     for i, _ in enumerate(df_cols)})
    out_df = pd.DataFrame(rows, columns=df_cols)
    return out_df

parse_XML('/Users/newuser/Desktop/TESTRATP/arrets.xml', ["Name","gml"])
But I am getting the following DataFrame:
   Name   gml
0  None  None
1  None  None
2  None  None
My XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<PublicationDelivery version="1.09:FR-NETEX_ARRET-2.1-1.0" xmlns="http://www.netex.org.uk/netex" xmlns:core="http://www.govtalk.gov.uk/core" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:ifopt="http://www.ifopt.org.uk/ifopt" xmlns:siri="http://www.siri.org.uk/siri" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.netex.org.uk/netex">
  <PublicationTimestamp>2020-08-05T06:00:01+00:00</PublicationTimestamp>
  <ParticipantRef>transport.data.gouv.fr</ParticipantRef>
  <dataObjects>
    <GeneralFrame id="FR:GeneralFrame:NETEX_ARRET:" version="any">
      <members>
        <Quay id="FR:Quay:zenbus_StopPoint_SP_351400003_LOC:" version="any">
          <Name>ST FELICIEN - Centre</Name>
          <Centroid>
            <Location>
              <gml:pos srsName="EPSG:2154">828054.2068251468 6444393.512041969</gml:pos>
            </Location>
          </Centroid>
          <TransportMode>bus</TransportMode>
        </Quay>
        <Quay id="FR:Quay:zenbus_StopPoint_SP_361350004_LOC:" version="any">
          <Name>ST FELICIEN - Chemin de Juny</Name>
          <Centroid>
            <Location>
              <gml:pos srsName="EPSG:2154">828747.3172982805 6445226.100290826</gml:pos>
            </Location>
          </Centroid>
          <TransportMode>bus</TransportMode>
        </Quay>
        <Quay id="FR:Quay:zenbus_StopPoint_SP_343500005_LOC:" version="any">
          <Name>ST FELICIEN - Darone</Name>
          <Centroid>
            <Location>
              <gml:pos srsName="EPSG:2154">829036.2709757038 6444724.878001894</gml:pos>
            </Location>
          </Centroid>
          <TransportMode>bus</TransportMode>
        </Quay>
        <Quay id="FR:Quay:zenbus_StopPoint_SP_359440004_LOC:" version="any">
          <Name>ST FELICIEN - Col de Fontayes</Name>
          <Centroid>
            <Location>
              <gml:pos srsName="EPSG:2154">829504.7993360173 6445490.57188837</gml:pos>
            </Location>
          </Centroid>
          <TransportMode>bus</TransportMode>
        </Quay>
      </members>
    </GeneralFrame>
  </dataObjects>
</PublicationDelivery>
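I have included only a small part of the XML file here. I cannot post the complete file because it exceeds Stack Overflow's character limit. I would like to know why I get the output above; I cannot see where my mistake is. I am new to this. Can anyone help? Thanks, everyone.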
My approach is to avoid XML parsing altogether and go straight to pandas, using xmlplain to generate JSON from the XML.
Output
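An alternative solution that reads the XML directly
The list is simply the sequence of tags you want to navigate through to reach the "root". The column names need to be cleaned up afterwards.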
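For reference, here is a minimal sketch of reading the file directly with ElementTree. The root element declares a default namespace (http://www.netex.org.uk/netex), so every tag is namespace-qualified and a plain find("Name") matches nothing. In addition, the loop in the question iterates over the root's direct children (PublicationTimestamp, ParticipantRef, dataObjects) and not over the Quay elements, which would explain the three rows of None. The function name quays_to_rows, the "netex" prefix, the "pos" column name, and the trimmed SAMPLE string are assumptions made for illustration; the namespace URIs are copied from the question's file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the root element of the question's file.
# The "netex" prefix is an arbitrary label chosen for this sketch.
NS = {
    "netex": "http://www.netex.org.uk/netex",
    "gml": "http://www.opengis.net/gml/3.2",
}

# Trimmed stand-in for arrets.xml (one Quay only), used for illustration.
SAMPLE = """
<PublicationDelivery xmlns="http://www.netex.org.uk/netex"
                     xmlns:gml="http://www.opengis.net/gml/3.2">
  <dataObjects><GeneralFrame><members>
    <Quay id="FR:Quay:zenbus_StopPoint_SP_351400003_LOC:" version="any">
      <Name>ST FELICIEN - Centre</Name>
      <Centroid><Location>
        <gml:pos srsName="EPSG:2154">828054.2068251468 6444393.512041969</gml:pos>
      </Location></Centroid>
      <TransportMode>bus</TransportMode>
    </Quay>
  </members></GeneralFrame></dataObjects>
</PublicationDelivery>
"""


def quays_to_rows(root):
    """Return one dict per Quay element (hypothetical helper)."""
    rows = []
    # ".//" searches at any depth, so Quay elements nested under
    # dataObjects/GeneralFrame/members are found.
    for quay in root.iterfind(".//netex:Quay", NS):
        rows.append({
            "id": quay.get("id"),  # unprefixed attributes carry no namespace
            "Name": quay.findtext("netex:Name", namespaces=NS),
            "pos": quay.findtext(".//gml:pos", namespaces=NS),
        })
    return rows


root = ET.fromstring(SAMPLE)
rows = quays_to_rows(root)
```

For the real file, ET.parse(xml_file).getroot() would take the place of ET.fromstring(SAMPLE), and with pandas installed, pd.DataFrame(rows) should then give one row per Quay.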