Parsing a Google Profiles XML feed

Question

I'm trying to parse the XML that a Google API returns for its profiles. The XML looks something like this:

<ns0:feed ns1:etag="W/&quot;Dk8BQ3o8eCt7I2A9WhRUE0g.&quot;">
  <ns0:updated>2012-01-23T21:40:52.470Z</ns0:updated>
  <ns0:category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#profile"/>
  <ns0:id>domain.com</ns0:id>
  <ns0:generator uri="http://www.google.com/m8/feeds" version="1.0">Contacts</ns0:generator>
  <ns0:author>
    <ns0:name>domain.com</ns0:name>
  </ns0:author>
  <ns0:link href="http://www.google.com/" rel="alternate" type="text/html"/>
  <ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full" rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml"/>
  <ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full/batch" rel="http://schemas.google.com/g/2005#batch" type="application/atom+xml"/>
  <ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full?max-results=300" rel="self" type="application/atom+xml"/>
  <ns2:startIndex>1</ns2:startIndex>
  <ns2:itemsPerPage>300</ns2:itemsPerPage>
  <ns0:entry ns1:etag="&quot;URRaQR4KTit7I2A4&quot;">
    <ns0:category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#profile"/>
    <ns0:id>http://www.google.com/m8/feeds/profiles/domain/domain.com/full/pname</ns0:id>
    <ns1:name>
      <ns1:familyName>Name</ns1:familyName>
      <ns1:fullName>Persobn Name</ns1:fullName>
      <ns1:givenName>Robert</ns1:givenName>
    </ns1:name>
    <ns0:updated>2012-01-23T21:40:52.597Z</ns0:updated>
    <ns1:organization primary="true" rel="http://schemas.google.com/g/2005#work">
      <ns1:orgTitle>JobField</ns1:orgTitle>
      <ns1:orgDepartment>DepartmentField</ns1:orgDepartment>
      <ns1:orgName>CompanyField</ns1:orgName>
    </ns1:organization>
    <ns3:status indexed="true"/>
    <ns0:title>Person Name</ns0:title>
    <ns0:link href="https://www.google.com/m8/feeds/photos/profile/domain.com/pname" rel="http://schemas.google.com/contacts/2008/rel#photo" type="image/*"/>
    <ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full/pname" rel="self" type="application/atom+xml"/>
    <ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full/pname" rel="edit" type="application/atom+xml"/>
    <ns1:email address="pname@gapps.domain.com" rel="http://schemas.google.com/g/2005#other"/>
    <ns1:email address="pname@domain.com" primary="true" rel="http://schemas.google.com/g/2005#other"/>
    <ns4:edited>2012-01-23T21:40:52.597Z</ns4:edited>
  </ns0:entry>

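I only need the name fields and the fields under the organization element. My question is how to do this properly. I have never parsed XML before, and I have seen people mention Element Tree, Stone Soup, sax, and so on. The code I have so far is: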

import xml.dom.minidom

def explore_children(nodelist, inset):
    """Recursively print the tag name of every element, indented by depth."""
    for subnode in nodelist:
        if subnode.nodeType == subnode.ELEMENT_NODE:
            which = subnode.tagName
            called = ""  # in case it's not an img or title
            if which == "img":
                called = subnode.getAttribute("name")
            if which == "title":
                called = subnode.getAttribute("text")
            print(inset + which + " " + called)
            explore_children(subnode.childNodes, "  " + inset)
        # Text nodes are deliberately ignored.

with open("c:\\python27\\junk.xml", "r") as fh:
    doc = xml.dom.minidom.parse(fh)
explore_children(doc.childNodes, "")

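This prints the name of every element to the console, along with anything that has a name or text attribute. What I would like is to get all the name and organization text for a record on one line, but I'm completely lost with it. Any help is much appreciated.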
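
To make the goal concrete, here is a minimal sketch of the "one line per record" output using xml.etree.ElementTree from the standard library. It is not a confirmed solution. The sample above does not show its xmlns declarations, so the two namespace URIs below are assumptions (the usual Atom and Google Data URIs), and the names entry_to_line, profile_lines, and SAMPLE are hypothetical helpers made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs: the sample omits its xmlns declarations, so these
# are the usual Atom and Google Data ("gd") URIs, not values from the feed.
ATOM = "{http://www.w3.org/2005/Atom}"
GD = "{http://schemas.google.com/g/2005}"

# Child elements of gd:name and gd:organization to report, in output order.
NAME_FIELDS = ("givenName", "familyName", "fullName")
ORG_FIELDS = ("orgTitle", "orgDepartment", "orgName")


def entry_to_line(entry):
    """Join the name and organization texts of one atom:entry with tabs."""
    values = []
    for parent, fields in (("name", NAME_FIELDS), ("organization", ORG_FIELDS)):
        for field in fields:
            # findtext returns the default when the element is missing.
            text = entry.findtext(GD + parent + "/" + GD + field, default="")
            values.append(text.strip())
    return "\t".join(values)


def profile_lines(root):
    """Return one line per atom:entry found under the feed root."""
    return [entry_to_line(entry) for entry in root.iter(ATOM + "entry")]


# Hypothetical, trimmed-down feed used only to exercise the functions.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <entry>
    <gd:name>
      <gd:familyName>Name</gd:familyName>
      <gd:fullName>Person Name</gd:fullName>
      <gd:givenName>Robert</gd:givenName>
    </gd:name>
    <gd:organization primary="true">
      <gd:orgTitle>JobField</gd:orgTitle>
      <gd:orgDepartment>DepartmentField</gd:orgDepartment>
      <gd:orgName>CompanyField</gd:orgName>
    </gd:organization>
  </entry>
</feed>"""

if __name__ == "__main__":
    for line in profile_lines(ET.fromstring(SAMPLE)):
        print(line)
```

For a feed saved to disk, the root element would come from ET.parse(path).getroot() instead of ET.fromstring. If the real feed declares different namespace URIs, the two constants at the top would need to match them.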

