Retrieving XML data in Python

<uniprot created="2010-12-20">
  <entry dataset="abc">
    <references id="1">
      <title>first references</title>
      <author>
        <person name="Mr. A"/>
        <person name="Mr. B"/>
        <person name="Mr. C"/>
      </author>
      <scope> scope 1 for id 1 </scope>
      <scope> scope 2 for id 1 </scope>
      <scope> scope 2 for id 1 </scope>
    </references>
    <references id="2">
      <title>Second references</title>
      <author>
        <person name="Mr. D"/>
        <person name="Mr. E"/>
        <person name="Mr. F"/>
      </author>
      <scope> scope 1 for id 2 </scope>
      <scope> scope 2 for id 2 </scope>
      <scope> scope 3 for id 2 </scope>
    </references>
    <references id="3">
      <title>third references</title>
      <author>
        <person name="Mr. G"/>
        <person name="Mr. H"/>
        <person name="Mr. I"/>
      </author>
      <scope> scope 1 for id 3 </scope>
      <scope> scope 2 for id 3 </scope>
      <scope> scope 3 for id 3 </scope>
    </references>
    <references id="4">
      <title>fourth references</title>
      <author>
        <person name="Mr. J"/>
        <person name="Mr. K"/>
        <person name="Mr. L"/>
      </author>
      <scope> scope 1 for id 4 </scope>
      <scope> scope 2 for id 4 </scope>
      <scope> scope 3 for id 4 </scope>
    </references>
  </entry>
</uniprot>

Desired output:

First Reference
Mr A, Mr B, Mr C
Scope 1 for id 1, Scope 2 for id 1, Scope 3 for id 1

Second Reference
Mr D, Mr E, Mr F
Scope 1 for id 2, Scope 2 for id 2, Scope 3 for id 2

Third Reference
Mr G, Mr H, Mr I
Scope 1 for id 3, Scope 2 for id 3, Scope 3 for id 3

Fourth Reference
Mr J, Mr K, Mr L
Scope 1 for id 4, Scope 2 for id 4, Scope 3 for id 4

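Current code:

import xml.etree.ElementTree as ET

document = ET.parse("recipe.xml")
root = document.getroot()

title = []
author = []
scope = []
# iter() replaces getiterator(), which was removed in Python 3.9
for i in root.iter('title'):
    title.append(i.text)
# <author> contains only whitespace text; the names are in <person name="...">
for j in root.iter('author'):
    author.append(j.text)
# 12 scopes vs. 4 titles: zip() below stops after the first 4 scopes
for k in root.iter('scope'):
    scope.append(k.text)

for i, j, k in zip(title, author, scope):
    print(i, j, k)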
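Why the author column comes out blank: an <author> element has no text of its own, only the whitespace before its first <person> child, and each name sits in a name attribute. A minimal demonstration, assuming a trimmed copy of the XML above (one reference, two authors) embedded as a string:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML, kept inline so the snippet runs alone.
xml_text = """<uniprot><entry dataset="abc">
<references id="1"><title>first references</title>
<author> <person name="Mr. A"/> <person name="Mr. B"/> </author>
<scope> scope 1 for id 1 </scope> <scope> scope 2 for id 1 </scope>
</references></entry></uniprot>"""

root = ET.fromstring(xml_text)
author = root.find(".//author")

# .text is only the whitespace before the first <person>, not the names.
print(repr(author.text))

# The names live in the name attribute of each <person> child.
names = [p.get("name") for p in author.findall("person")]
print(names)
```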

2 answers

Anonymous user

Answer #1 · edited 2024-05-29 02:18:17

Use lxml and XPath for this:

from lxml.etree import fromstring

# x holds the parsed XML; here it is read from the question's recipe.xml
with open("recipe.xml", "rb") as f:
    x = fromstring(f.read())

def print_references(ref_node):
    # XPath can return attribute values directly
    authors = " ".join(ref_node.xpath('author/person/@name'))
    scope = ", ".join([t.text for t in ref_node.xpath('scope')])
    # first id attribute of this <references> element, or None if missing
    ref = next(iter(ref_node.xpath('@id')), None)
    print("{} Reference\n{}\n{}\n".format(ref, authors, scope.lstrip()))

references = x.xpath('//references')
for ref in references:
    print_references(ref)

Output:

1 Reference
Mr. A Mr. B Mr. C
scope 1 for id 1 ,  scope 2 for id 1 ,  scope 2 for id 1

2 Reference
Mr. D Mr. E Mr. F
scope 1 for id 2 ,  scope 2 for id 2 ,  scope 3 for id 2

3 Reference
Mr. G Mr. H Mr. I
scope 1 for id 3 ,  scope 2 for id 3 ,  scope 3 for id 3

4 Reference
Mr. J Mr. K Mr. L
scope 1 for id 4 ,  scope 2 for id 4 ,  scope 3 for id 4
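If lxml is not installed, a similar per-reference loop can be written with the standard library's xml.etree.ElementTree. A minimal sketch, assuming the question's XML (trimmed to two references and embedded as a string); unlike the answer above it prints the title rather than the id and strips the spaces around each scope, so the result is close to, but not identical to, the desired output:

```python
import xml.etree.ElementTree as ET

# Two references copied from the question; in practice you would use
# ET.parse("recipe.xml").getroot() instead of an inline string.
xml_text = """<uniprot created="2010-12-20"><entry dataset="abc">
<references id="1"><title>first references</title>
<author> <person name="Mr. A"/> <person name="Mr. B"/> <person name="Mr. C"/> </author>
<scope> scope 1 for id 1 </scope> <scope> scope 2 for id 1 </scope>
</references>
<references id="2"><title>Second references</title>
<author> <person name="Mr. D"/> <person name="Mr. E"/> <person name="Mr. F"/> </author>
<scope> scope 1 for id 2 </scope> <scope> scope 2 for id 2 </scope>
</references>
</entry></uniprot>"""


def format_reference(ref):
    """Build a three-line summary for one <references> element."""
    title = ref.findtext("title", default="").strip()
    # ElementTree's limited XPath cannot return attribute values,
    # so read the name attribute from each <person> element.
    authors = ", ".join(p.get("name") for p in ref.findall("author/person"))
    # strip() removes the spaces surrounding each scope's text.
    scopes = ", ".join(s.text.strip() for s in ref.findall("scope"))
    return "\n".join([title, authors, scopes])


root = ET.fromstring(xml_text)
blocks = [format_reference(ref) for ref in root.iter("references")]
print("\n\n".join(blocks))
```

Searching from each <references> element (instead of collecting flat lists from the whole tree) keeps every title, author list, and scope list grouped together.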

Anonymous user

Answer #2 · edited 2024-05-29 02:18:17

Because the author names are stored in the name attribute of the person tag, we can also store each reference's data in a dict, like this:

# root comes from the question's code: ET.parse("recipe.xml").getroot()
references = []
for ref in root.iter('references'):
    reference = {
        'title': ref.findtext('title'),
        # search only inside this <references> element, not the whole document
        'authors': [person.get('name') for person in ref.iter('person')],
        # strip() drops the spaces around each scope's text
        'scopes': [scope.text.strip() for scope in ref.findall('scope')],
    }
    references.append(reference)

You will end up with a list like this:

[
    {
        'title': 'Something',
        'authors': [
            'Author 1',
            'Author 2',
        ],
        'scopes': [
            'scope 1',
            'scope 2',
        ]
    }
]
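Once the data is in that list-of-dicts shape, producing the three-line layout the question asks for is a plain loop. A minimal sketch, assuming a list hard-coded from the first reference in the question's XML (so it runs on its own); format_reference is a hypothetical helper name, not part of any library:

```python
# A list in the shape shown above, filled in by hand from the question's XML.
references = [
    {
        "title": "first references",
        "authors": ["Mr. A", "Mr. B", "Mr. C"],
        "scopes": ["scope 1 for id 1", "scope 2 for id 1", "scope 2 for id 1"],
    },
]


def format_reference(reference):
    """Render one reference dict as title, authors, and scopes on three lines."""
    return "\n".join([
        reference["title"],
        ", ".join(reference["authors"]),
        ", ".join(reference["scopes"]),
    ])


# Separate consecutive references with a blank line.
output = "\n\n".join(format_reference(r) for r in references)
print(output)
```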
