I have an XML file:
<uniprot created="2010-12-20">
<entry dataset="abc">
<references id="1">
<title>first references</title>
<author>
<person name="Mr. A"/>
<person name="Mr. B"/>
<person name="Mr. C"/>
</author>
<scope> scope 1 for id 1 </scope>
<scope> scope 2 for id 1 </scope>
<scope> scope 3 for id 1 </scope>
</references>
<references id="2">
<title>Second references</title>
<author>
<person name="Mr. D"/>
<person name="Mr. E"/>
<person name="Mr. F"/>
</author>
<scope> scope 1 for id 2 </scope>
<scope> scope 2 for id 2 </scope>
<scope> scope 3 for id 2 </scope>
</references>
<references id="3">
<title>third references</title>
<author>
<person name="Mr. G"/>
<person name="Mr. H"/>
<person name="Mr. I"/>
</author>
<scope> scope 1 for id 3 </scope>
<scope> scope 2 for id 3 </scope>
<scope> scope 3 for id 3 </scope>
</references>
<references id="4">
<title>fourth references</title>
<author>
<person name="Mr. J"/>
<person name="Mr. K"/>
<person name="Mr. L"/>
</author>
<scope> scope 1 for id 4 </scope>
<scope> scope 2 for id 4 </scope>
<scope> scope 3 for id 4 </scope>
</references>
</entry>
</uniprot>
I want all the references in this XML printed in a specific format. Expected output:
First Reference
Mr A, Mr B, Mr C
Scope 1 for id 1, Scope 2 for id 1, Scope 3 for id 1
Second Reference
Mr D, Mr E, Mr F
Scope 1 for id 2, Scope 2 for id 2, Scope 3 for id 2
Third Reference
Mr G, Mr H, Mr I
Scope 1 for id 3, Scope 2 for id 3, Scope 3 for id 3
Fourth Reference
Mr J, Mr K, Mr L
Scope 1 for id 4, Scope 2 for id 4, Scope 3 for id 4
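I have written some code and can get the title values in the right format, but I cannot get the author information, in particular grouped per entry.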
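import xml.etree.ElementTree as ET

document = ET.parse("recipe.xml")
root = document.getroot()
title = []
author = []
scope = []
for i in root.iter('title'):
    title.append(i.text)
# <author> has no text of its own: the names live in the name attribute
# of its <person> children, so j.text holds only whitespace here.
for j in root.iter('author'):
    author.append(j.text)
for k in root.iter('scope'):
    scope.append(k.text)
# zip() pairs one title with one author and one scope, so the
# per-reference grouping of scopes is lost.
for i, j, k in zip(title, author, scope):
    print(i, j, k)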
Use lxml and XPath for this:
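A minimal sketch of that approach, not necessarily the answerer's exact code. It assumes an abridged copy of the XML inlined as `SAMPLE` (the question reads `recipe.xml` instead), and the helper name `format_references` is made up for illustration. It sticks to the path expressions that `findall` accepts, so it also runs on the standard-library ElementTree if lxml is not installed; with lxml you could equally write `ref.xpath("author/person/@name")`.

```python
try:
    from lxml import etree  # the library the answer suggests
except ImportError:
    # Assumption: fall back to the stdlib, whose findall() accepts the same paths
    import xml.etree.ElementTree as etree

# Abridged copy of the question's XML (two references only)
SAMPLE = """<uniprot created="2010-12-20">
<entry dataset="abc">
<references id="1">
<title>first references</title>
<author>
<person name="Mr. A"/>
<person name="Mr. B"/>
</author>
<scope> scope 1 for id 1 </scope>
<scope> scope 2 for id 1 </scope>
</references>
<references id="2">
<title>Second references</title>
<author>
<person name="Mr. D"/>
</author>
<scope> scope 1 for id 2 </scope>
</references>
</entry>
</uniprot>"""


def format_references(root):
    """Return three lines per <references>: title, authors, scopes."""
    lines = []
    # Walk one <references> element at a time so its children stay grouped
    for ref in root.findall(".//references"):
        title = ref.findtext("title", default="").strip()
        # Author names are attributes on <person>, not element text
        authors = [p.get("name") for p in ref.findall("author/person")]
        scopes = [(s.text or "").strip() for s in ref.findall("scope")]
        lines.append(title)
        lines.append(", ".join(authors))
        lines.append(", ".join(scopes))
    return lines


root = etree.fromstring(SAMPLE)
output_lines = format_references(root)
print("\n".join(output_lines))
```

The key difference from the code in the question is that the loop runs over each `references` element and queries relative to it, instead of collecting every title, author, and scope in the document into three unrelated lists.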
Output:
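Because the authors' names are stored in the name attribute of the person tags, we can also use a dict to store the data for each reference, so that you end up with a list of dicts, one per reference.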
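As a rough sketch of that idea, the snippet below builds one dict per `references` element. It uses the standard-library ElementTree that the question already imports; the key names (`id`, `title`, `authors`, `scopes`), the helper name `references_to_dicts`, and the abridged inline `SAMPLE` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the question's XML (two references only)
SAMPLE = """<uniprot created="2010-12-20">
<entry dataset="abc">
<references id="1">
<title>first references</title>
<author>
<person name="Mr. A"/>
<person name="Mr. B"/>
</author>
<scope> scope 1 for id 1 </scope>
<scope> scope 2 for id 1 </scope>
</references>
<references id="2">
<title>Second references</title>
<author>
<person name="Mr. D"/>
</author>
<scope> scope 1 for id 2 </scope>
</references>
</entry>
</uniprot>"""


def references_to_dicts(root):
    """Collect each <references> element into its own dict."""
    records = []
    for ref in root.iter("references"):
        records.append({
            "id": ref.get("id"),
            "title": ref.findtext("title", default="").strip(),
            # Names come from the name attribute of each <person>
            "authors": [p.get("name") for p in ref.iter("person")],
            "scopes": [(s.text or "").strip() for s in ref.findall("scope")],
        })
    return records


records = references_to_dicts(ET.fromstring(SAMPLE))
for rec in records:
    print(rec["title"])
    print(", ".join(rec["authors"]))
    print(", ".join(rec["scopes"]))
```

Keeping the data in dicts separates parsing from printing, so the same list can be reformatted, filtered by `id`, or serialized later without touching the XML again.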