Handling missing elements in an XML document

Asked 2025-04-18 00:20

I have some XML data, part of which looks like this:

<osgb:departedMember>
<osgb:DepartedFeature fid='osgb4000000024942964'>
<osgb:boundedBy>
<gml:Box srsName='osgb:BNG'>
<gml:coordinates>188992.575,55981.029 188992.575,55981.029</gml:coordinates>
</gml:Box>
</osgb:boundedBy>
<osgb:theme>Road Network</osgb:theme>
<osgb:reasonForDeparture>Deleted</osgb:reasonForDeparture>
<osgb:deletionDate>2014-02-19</osgb:deletionDate>
</osgb:DepartedFeature>
</osgb:departedMember>

I am parsing it like this:

# Collect every departedMember element under the document root.
departedmembers = doc_root.findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}departedMember')
for departedmember in departedmembers:
    findWhat = '{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}DepartedFeature'
    fid = int(departedmember.find(findWhat).attrib['fid'].replace('osgb', ''))
    theme = departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}theme')[0].text
    reason = departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}reasonForDeparture')[0].text
    date = departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}deletionDate')[0].text

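Sometimes the reason, the date, or both are absent: the elements themselves are missing, not merely empty. The XSD allows this, but I get an attribute error when I try to read the text of an element that does not exist. To handle it, I wrapped the lookups for the reason and the date in try/except blocks, like this: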

try:
    date=departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}deletionDate')[0].text
except:
    pass

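This works, but I don't like using except/pass this way, so I am wondering whether there is a better way to parse a document like this, given that some of its elements are optional.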

2 Answers

2

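Right. The problem is not the search method but how you reference the result when no element is found. findall returns an empty list when nothing matches, so indexing it with [0] fails. You could write your code like this: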

results = departedmember[0].findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}deletionDate')

if results:
    date = results[0].text
else:
    # No matching element was found;
    # handle that case however you like, for example:
    date = None

5

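Since you are only interested in the first element that findall returns, you can replace findall(x)[0] with find(x), which returns None when nothing matches. And if you want to avoid the try/except construct, you can use a conditional (ternary) expression.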

departedmembers = doc_root.findall('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}departedMember')
for departedmember in departedmembers:
    ...
    date = departedmember[0].find('{http://www.ordnancesurvey.co.uk/xml/namespaces/osgb}deletionDate')
    # Compare with "is None": set date to None if the element was not found, otherwise take its text.
    date = None if date is None else date.text
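
A related option, offered only as a sketch: ElementTree's findtext(match, default, namespaces) does the lookup and the fallback in one call, and a prefix-to-URI mapping avoids repeating the long namespace string. The wrapper element osgb:FeatureCollection in the sample, the NS mapping name, the parse_departed helper, and the dictionary keys are all assumptions made for this illustration; only the osgb namespace URI and the element names come from the question.

```python
import xml.etree.ElementTree as ET

# Prefix map passed to find/findall/findtext (the URI is the one used in the question).
NS = {"osgb": "http://www.ordnancesurvey.co.uk/xml/namespaces/osgb"}

# Hypothetical sample: the wrapper element is assumed, and deletionDate is
# deliberately left out to exercise the missing-element case.
SAMPLE = """<osgb:FeatureCollection xmlns:osgb="http://www.ordnancesurvey.co.uk/xml/namespaces/osgb">
  <osgb:departedMember>
    <osgb:DepartedFeature fid="osgb4000000024942964">
      <osgb:theme>Road Network</osgb:theme>
      <osgb:reasonForDeparture>Deleted</osgb:reasonForDeparture>
    </osgb:DepartedFeature>
  </osgb:departedMember>
</osgb:FeatureCollection>"""


def parse_departed(doc_root):
    """Return one dict per departed feature; optional fields become None when absent."""
    records = []
    for member in doc_root.findall("osgb:departedMember", NS):
        feature = member.find("osgb:DepartedFeature", NS)
        if feature is None:
            # Skip members that carry no DepartedFeature child.
            continue
        records.append({
            "fid": int(feature.attrib["fid"].replace("osgb", "")),
            # findtext returns the default when the element is missing.
            "theme": feature.findtext("osgb:theme", default=None, namespaces=NS),
            "reason": feature.findtext("osgb:reasonForDeparture", default=None, namespaces=NS),
            "date": feature.findtext("osgb:deletionDate", default=None, namespaces=NS),
        })
    return records


records = parse_departed(ET.fromstring(SAMPLE))
```

Note that findtext returns an empty string, not the default, when the element exists but has no text, so a missing element and an empty one stay distinguishable.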
