XML parsing in Python 3.x similar to plistlib?
I have some GPS data stored in a .tcx file. The file is really just XML (the beginning of the file is shown below).
<?xml version="1.0" encoding="utf-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tp1="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpx="http://www.topografix.com/GPX/1/1" xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd">
<Activities>
<Activity Sport="Other">
<Id>2012-01-17T11:44:35Z</Id>
<Lap StartTime="2012-01-17T11:44:35Z">
<TotalTimeSeconds>0</TotalTimeSeconds>
<DistanceMeters>0</DistanceMeters>
<Calories>0</Calories>
<Intensity>Active</Intensity>
<TriggerMethod>Manual</TriggerMethod>
<Track>
<Trackpoint>
<Time>2012-01-17T11:44:35Z</Time>
<Position>
<LatitudeDegrees>59.920211518183351</LatitudeDegrees>
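The closest thing I have worked with before is Apple's .plist files, which use a similar format, except that the information is held inside a <dictionary> tag, as far as I remember.
The following code gives me nested dictionaries...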
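import plistlib

# plistlib.readPlist() was removed in Python 3.9; plistlib.load() takes a binary file object
with open('/Users/name/Documents/file.plist', 'rb') as f:
    pl = plistlib.load(f)
for sub_dict in pl:
    print(sub_dict['keyA'])
    print(sub_dict['keyD'])
    print(sub_dict['keyE'])
    print(sub_dict['keyG'])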
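I know about xml.dom.minidom, etree, and lxml, but I am having trouble working out how to get the same kind of output that the plistlib module gives above.
My end goal is to be able to merge certain keys from the two data sets. One step at a time...
Edit -----------------
I have found something that works: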
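from xml.dom.minidom import parse

doc = parse('/Users/name/Documents/GPS/gps.tcx')
# getElementsByTagName returns every matching element in document order
lat = doc.getElementsByTagName("LatitudeDegrees")
time = doc.getElementsByTagName("Time")
for x in lat:
    # firstChild is the text node holding the element's value
    print(x.firstChild.data)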
1 Answer
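I had to add closing tags to the XML you posted so that the lxml parser could parse it properly. Once that is done, the time and latitude can be extracted by calling doc.xpath.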
import lxml.etree as ET
content='''<?xml version="1.0" encoding="utf-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:tp1="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpx="http://www.topografix.com/GPX/1/1" xsi:schemaLocation="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2 http://www.garmin.com/xmlschemas/TrainingCenterDatabasev2.xsd">
<Activities>
<Activity Sport="Other">
<Id>2012-01-17T11:44:35Z</Id>
<Lap StartTime="2012-01-17T11:44:35Z">
<TotalTimeSeconds>0</TotalTimeSeconds>
<DistanceMeters>0</DistanceMeters>
<Calories>0</Calories>
<Intensity>Active</Intensity>
<TriggerMethod>Manual</TriggerMethod>
<Track>
<Trackpoint>
<Time>2012-01-17T11:44:35Z</Time>
<Position>
<LatitudeDegrees>59.920211518183351</LatitudeDegrees>
</Position>
</Trackpoint>
</Track>
</Lap>
</Activity>
</Activities>
</TrainingCenterDatabase>
'''
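# lxml rejects a str that carries an XML encoding declaration, so pass bytes
doc = ET.fromstring(content.encode('utf-8'))
ns = {'ns': 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'}
for trackpoint in doc.xpath('//ns:Trackpoint', namespaces=ns):
    # The union selects Time and LatitudeDegrees; results come back in document order
    print(trackpoint.xpath('(ns:Time|ns:Position/ns:LatitudeDegrees)/text()', namespaces=ns))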
which yields
['2012-01-17T11:44:35Z', '59.920211518183351']
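If what you want is closer to what plistlib gives you (a list of dictionaries you can index by key), one possible approach is to build a dict for each Trackpoint yourself. The following is only a minimal sketch, with some assumptions: it uses the standard library's xml.etree.ElementTree instead of lxml, the helper name trackpoints_to_dicts is made up for illustration, the sample XML is a trimmed version of your file, and only the Time and LatitudeDegrees fields are collected.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the root element of the TCX file.
NS = {'ns': 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'}

# Trimmed sample for illustration; a real file could be loaded with
# ET.parse(path).getroot() instead.
SAMPLE = b'''<?xml version="1.0" encoding="utf-8"?>
<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
<Activities><Activity Sport="Other"><Lap StartTime="2012-01-17T11:44:35Z"><Track>
<Trackpoint>
<Time>2012-01-17T11:44:35Z</Time>
<Position>
<LatitudeDegrees>59.920211518183351</LatitudeDegrees>
</Position>
</Trackpoint>
</Track></Lap></Activity></Activities>
</TrainingCenterDatabase>
'''


def trackpoints_to_dicts(root):
    """Return one dict per <Trackpoint> (hypothetical helper, not a library API)."""
    rows = []
    for tp in root.iterfind('.//ns:Trackpoint', NS):
        rows.append({
            # findtext returns None when the element is missing
            'Time': tp.findtext('ns:Time', namespaces=NS),
            'LatitudeDegrees': tp.findtext('ns:Position/ns:LatitudeDegrees', namespaces=NS),
        })
    return rows


points = trackpoints_to_dicts(ET.fromstring(SAMPLE))
for point in points:
    print(point['Time'], point['LatitudeDegrees'])
```

The values stay as strings here. Since each row is a plain dict, merging with the plist data later could be done by matching on a shared key such as the timestamp, but that depends on what the two data sets actually have in common.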