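Python XML extraction loop
I have a script that I think is nearly done. I found a simple way to write it, but I can't figure out how to turn it into a loop.
I'm extracting data from an XML file that looks like this: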
<Trackpoint>
<Time>2012-01-17T11:44:35Z</Time>
<Position>
<LatitudeDegrees>51.920211518183351</LatitudeDegrees>
<LongitudeDegrees>26.706042898818851</LongitudeDegrees>
</Position>
<AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
<Time>2012-01-17T11:45:21Z</Time>
<Position>
<LatitudeDegrees>51.920243117958307</LatitudeDegrees>
<LongitudeDegrees>26.706140967085958</LongitudeDegrees>
</Position>
<AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
I can get, for example, the latitudes with this code:
from xml.dom.minidom import parse
doc = parse('/Users/name/Documents/GPS/gps.tcx')
lat = doc.getElementsByTagName("LatitudeDegrees")
time = doc.getElementsByTagName("Time")
trackpoint = doc.getElementsByTagName("Trackpoint")
for x in lat:
    print(x.firstChild.data)
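But I want to get the latitude, longitude and time together, in order.
I guess I need to use
for x in trackpoint:
but the implementation I've come up with so far looks like this: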
lon = doc.getElementsByTagName("LongitudeDegrees")  # longitude nodes, used below
count = 0
n = len(trackpoint)
while count < n:
    print(time[count].firstChild.data)
    print(lat[count].firstChild.data)
    print(lon[count].firstChild.data)
    count += 1
Does anyone have an idea? I feel like I'm just missing something very simple!
3 Answers
0
Perhaps you're looking for zip:
import xml.dom.minidom as minidom
import os
doc = minidom.parse(os.path.expanduser('~/test/gps.tcx'))
latitudes = doc.getElementsByTagName("LatitudeDegrees")
longitudes = doc.getElementsByTagName("LongitudeDegrees")
time = doc.getElementsByTagName("Time")
trackpoint = doc.getElementsByTagName("Trackpoint")
for t, lat, lon in zip(time, latitudes, longitudes):
    print(t.firstChild.data, lat.firstChild.data, lon.firstChild.data)
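One caveat with pairing separate lists, whether by index as in the question or with zip: it only works if every Trackpoint contains all three elements. The sketch below assumes a hypothetical file in which the first Trackpoint has no Position child. zip then quietly stops at the shortest list and pairs the first timestamp with the second point's coordinates, without raising any error.

```python
from xml.dom.minidom import parseString

# Hypothetical sample: the first Trackpoint has no <Position> child
# (an assumption for illustration, e.g. a point recorded without a GPS fix).
SAMPLE = """<root>
<Trackpoint>
<Time>2012-01-17T11:44:35Z</Time>
<AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
<Time>2012-01-17T11:45:21Z</Time>
<Position>
<LatitudeDegrees>51.920243117958307</LatitudeDegrees>
<LongitudeDegrees>26.706140967085958</LongitudeDegrees>
</Position>
</Trackpoint>
</root>"""

doc = parseString(SAMPLE)
times = doc.getElementsByTagName("Time")             # 2 nodes
lats = doc.getElementsByTagName("LatitudeDegrees")   # 1 node
lons = doc.getElementsByTagName("LongitudeDegrees")  # 1 node

# zip() stops at the shortest input, so this yields a single tuple that
# pairs the FIRST timestamp with the SECOND point's coordinates.
pairs = []
for t, la, lo in zip(times, lats, lons):
    pairs.append((t.firstChild.data, la.firstChild.data, lo.firstChild.data))
print(pairs)
```

Looking up the children of each Trackpoint separately, as the other answers do, avoids this kind of silent misalignment.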
2
I usually find ElementTree easier to understand and simpler to use for parsing XML files. For example, you can read the latitudes in three lines of code:
import xml.etree.ElementTree as etree
s="""<root>
<Trackpoint>
<Time>2012-01-17T11:44:35Z</Time>
<Position>
<LatitudeDegrees>51.920211518183351</LatitudeDegrees>
<LongitudeDegrees>26.706042898818851</LongitudeDegrees>
</Position>
<AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
<Time>2012-01-17T11:45:21Z</Time>
<Position>
<LatitudeDegrees>51.920243117958307</LatitudeDegrees>
<LongitudeDegrees>26.706140967085958</LongitudeDegrees>
</Position>
<AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
</root>
"""
root = etree.fromstring(s)
for point in root:
    print(point.find('Position/LatitudeDegrees').text)
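The string above has no namespaces. A real .tcx export, as far as I know, declares a default namespace on its root element (assumed below to be http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2; check the xmlns attribute of your own file). ElementTree folds that namespace into every tag name, so an unprefixed path such as 'Position/LatitudeDegrees' finds nothing. minidom's getElementsByTagName matches the literal tag name, so the minidom answers should not be affected in the same way. A sketch using the namespaces argument that findall, iterfind and findtext accept, on a hypothetical trimmed-down document:

```python
import xml.etree.ElementTree as etree

# Assumption: the default namespace a real .tcx file declares on its root.
NS = {"tcx": "http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"}

# Hypothetical, trimmed-down TCX document for illustration only.
s = """<TrainingCenterDatabase xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
<Trackpoint>
<Time>2012-01-17T11:44:35Z</Time>
<Position>
<LatitudeDegrees>51.920211518183351</LatitudeDegrees>
<LongitudeDegrees>26.706042898818851</LongitudeDegrees>
</Position>
</Trackpoint>
</TrainingCenterDatabase>"""

root = etree.fromstring(s)

# Without a prefix this matches nothing: the parsed tag is
# '{http://www.garmin.com/...}Trackpoint', not plain 'Trackpoint'.
plain = root.findall(".//Trackpoint")

# Map the 'tcx' prefix through the namespaces argument instead.
lats = []
for tp in root.iterfind(".//tcx:Trackpoint", namespaces=NS):
    lats.append(tp.findtext("tcx:Position/tcx:LatitudeDegrees", namespaces=NS))
print(len(plain), lats)
```

The ".//" prefix searches at any depth, since in a full file the Trackpoint elements are nested several levels below the root.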
Suppose you want to convert each point into a dictionary:
varnames = [
    ('Position/LatitudeDegrees', 'lat'),
    ('Position/LongitudeDegrees', 'lon'),
    ('Time', 'time'),
    ('AltitudeMeters', 'alt')
]
points = []
for pointelem in etree.fromstring(s):
    point = {}
    for tag, varname in varnames:
        point[varname] = pointelem.find(tag).text
    points.append(point)
import pprint
pprint.pprint(points)
Output:
[{'alt': '-43.6026611328125',
'lat': '51.920211518183351',
'lon': '26.706042898818851',
'time': '2012-01-17T11:44:35Z'},
{'alt': '-43.6026611328125',
'lat': '51.920243117958307',
'lon': '26.706140967085958',
'time': '2012-01-17T11:45:21Z'}]
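All values above are strings, and find(tag).text raises AttributeError when a Trackpoint lacks one of the children. A variant of the same loop, sketched under a few assumptions: it uses Element.findtext, which returns None when the path matches nothing; the converters are an illustrative choice; and the time format assumes timestamps without fractional seconds, like the ones in the question. The second point in the sample is hypothetical and deliberately incomplete.

```python
import xml.etree.ElementTree as etree
from datetime import datetime

s = """<root>
<Trackpoint>
<Time>2012-01-17T11:44:35Z</Time>
<Position>
<LatitudeDegrees>51.920211518183351</LatitudeDegrees>
<LongitudeDegrees>26.706042898818851</LongitudeDegrees>
</Position>
<AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
<Time>2012-01-17T11:45:21Z</Time>
</Trackpoint>
</root>"""

def parse_time(value):
    # Assumes 'YYYY-MM-DDTHH:MM:SSZ'; the 'Z' is matched literally.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

# (path, key, converter) triples
fields = [
    ('Position/LatitudeDegrees', 'lat', float),
    ('Position/LongitudeDegrees', 'lon', float),
    ('Time', 'time', parse_time),
    ('AltitudeMeters', 'alt', float),
]

points = []
for pointelem in etree.fromstring(s):
    point = {}
    for path, key, convert in fields:
        # findtext() returns None instead of raising if the child is missing.
        text = pointelem.findtext(path)
        point[key] = convert(text) if text is not None else None
    points.append(point)
print(points)
```

strptime returns a naive datetime here; the trailing Z is only matched, not interpreted as UTC.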
4
First, find all the Trackpoint elements and loop over them. Then, inside the loop, look up the child elements you want for each Trackpoint:
from xml.dom.minidom import parse
doc = parse('in.tcx')
trackpoints = doc.getElementsByTagName("Trackpoint")
result = []
elements = ('Time', 'LatitudeDegrees', 'LongitudeDegrees')
for tp in trackpoints:
    obj = {}
    for el in elements:
        obj[el] = tp.getElementsByTagName(el)[0].firstChild.data
    result.append(obj)
print(result)
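The [0] in that loop raises IndexError if a Trackpoint lacks one of the requested elements. A minimal sketch with a small helper (child_text is a hypothetical name, not part of minidom) that returns None instead; because getElementsByTagName searches all descendants, it still finds LatitudeDegrees inside Position. The sample data is hypothetical, with the first point missing its coordinates.

```python
from xml.dom.minidom import parseString

def child_text(node, tag):
    """Return the text of the first descendant <tag> of node, or None."""
    found = node.getElementsByTagName(tag)
    if not found or found[0].firstChild is None:
        return None
    return found[0].firstChild.data

SAMPLE = """<root>
<Trackpoint><Time>2012-01-17T11:44:35Z</Time></Trackpoint>
<Trackpoint><Time>2012-01-17T11:45:21Z</Time>
<Position><LatitudeDegrees>51.920243117958307</LatitudeDegrees>
<LongitudeDegrees>26.706140967085958</LongitudeDegrees></Position>
</Trackpoint>
</root>"""

doc = parseString(SAMPLE)
elements = ('Time', 'LatitudeDegrees', 'LongitudeDegrees')
result = []
for tp in doc.getElementsByTagName("Trackpoint"):
    obj = {}
    for el in elements:
        obj[el] = child_text(tp, el)  # None when the element is absent
    result.append(obj)
print(result)
```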