Parsing a Google profiles XML feed
I'm trying to parse the XML that the Google API returns for user profiles. The XML looks roughly like this:
<ns0:feed ns1:etag="W/&quot;Dk8BQ3o8eCt7I2A9WhRUE0g.&quot;">
<ns0:updated>2012-01-23T21:40:52.470Z</ns0:updated>
<ns0:category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#profile"/>
<ns0:id>domain.com</ns0:id>
<ns0:generator uri="http://www.google.com/m8/feeds" version="1.0">Contacts</ns0:generator>
<ns0:author>
<ns0:name>domain.com</ns0:name>
</ns0:author>
<ns0:link href="http://www.google.com/" rel="alternate" type="text/html"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full" rel="http://schemas.google.com/g/2005#feed" type="application/atom+xml"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full/batch" rel="http://schemas.google.com/g/2005#batch" type="application/atom+xml"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full?max-results=300" rel="self" type="application/atom+xml"/>
<ns2:startIndex>1</ns2:startIndex>
<ns2:itemsPerPage>300</ns2:itemsPerPage>
<ns0:entry ns1:etag="&quot;URRaQR4KTit7I2A4&quot;">
<ns0:category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/contact/2008#profile"/>
<ns0:id>http://www.google.com/m8/feeds/profiles/domain/domain.com/full/pname</ns0:id>
<ns1:name>
<ns1:familyName>Name</ns1:familyName>
<ns1:fullName>Person Name</ns1:fullName>
<ns1:givenName>Robert</ns1:givenName>
</ns1:name>
<ns0:updated>2012-01-23T21:40:52.597Z</ns0:updated>
<ns1:organization primary="true" rel="http://schemas.google.com/g/2005#work">
<ns1:orgTitle>JobField</ns1:orgTitle>
<ns1:orgDepartment>DepartmentField</ns1:orgDepartment>
<ns1:orgName>CompanyField</ns1:orgName>
</ns1:organization>
<ns3:status indexed="true"/>
<ns0:title>Person Name</ns0:title>
<ns0:link href="https://www.google.com/m8/feeds/photos/profile/domain.com/pname" rel="http://schemas.google.com/contacts/2008/rel#photo" type="image/*"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full/pname" rel="self" type="application/atom+xml"/>
<ns0:link href="https://www.google.com/m8/feeds/profiles/domain/domain.com/full/pname" rel="edit" type="application/atom+xml"/>
<ns1:email address="pname@gapps.domain.com" rel="http://schemas.google.com/g/2005#other"/>
<ns1:email address="pname@domain.com" primary="true" rel="http://schemas.google.com/g/2005#other"/>
<ns4:edited>2012-01-23T21:40:52.597Z</ns4:edited>
</ns0:entry>
</ns0:feed>
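The `ns0:`/`ns1:` prefixes in this dump look like the placeholder prefixes ElementTree invents (`ns0`, `ns1`, ...) when it re-serializes elements from namespaces it has no registered prefix for. The `xmlns` declarations that bind those prefixes aren't shown, so a namespace-aware parser will reject the text exactly as pasted. A quick check with the standard library, assuming `ns0` stood for the Atom namespace (the URI is an assumption, not something the dump states):

```python
import xml.etree.ElementTree as ET

# The pasted dump uses the ns0: prefix without ever declaring it.
UNBOUND = "<ns0:feed><ns0:id>domain.com</ns0:id></ns0:feed>"

# Same fragment with the prefix bound. The Atom URI is an assumption
# about what ns0 stood for in the real feed.
BOUND = ('<ns0:feed xmlns:ns0="http://www.w3.org/2005/Atom">'
         '<ns0:id>domain.com</ns0:id></ns0:feed>')

rejected = False
try:
    ET.fromstring(UNBOUND)
except ET.ParseError as err:
    rejected = True  # expat reports an "unbound prefix" error
    print("rejected:", err)

root = ET.fromstring(BOUND)
# Tags come back as {namespace-uri}localname, not with the ns0: prefix.
print(root.tag)
print(root.find("{http://www.w3.org/2005/Atom}id").text)
```

This is also why lookups should go by namespace URI plus local name rather than by the `ns0:`-style prefix, which is only a label chosen at serialization time.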
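I only need the name fields and the fields under the organization element. My question is how to do this properly. I've never parsed XML before, and I've seen people mention ElementTree, Beautiful Soup, SAX, and so on. The code I have so far is: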
import xml.dom.minidom

def explore_children(nodelist, inset):
    # Recursively print each element's tag name, indented by depth
    for subnode in nodelist:
        if subnode.nodeType == subnode.ELEMENT_NODE:
            which = subnode.tagName
            called = ""  # in case it's not an img or title
            if which == "img":
                called = subnode.getAttribute("name")
            if which == "title":
                called = subnode.getAttribute("text")
            print(inset + which + " " + called)
            explore_children(subnode.childNodes, " " + inset)
        if subnode.nodeType == subnode.TEXT_NODE:
            pass  # text content is currently ignored

with open("c:\\python27\\junk.xml", "r") as fh:
    doc = xml.dom.minidom.parse(fh)
explore_children(doc.childNodes, "")
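This prints every element's tag name to the console, followed by the name or text attribute of any img or title element. What I want is to get all the name and organization text for each entry onto a single line, but I'm completely lost on how to do that. Any help is much appreciated.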
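Staying with minidom, here is a minimal sketch of the one-line-per-entry output: look elements up by namespace URI and local name with `getElementsByTagNameNS`, then join each element's text-node children, which is exactly the part `explore_children` skips. The `ATOM_NS`/`GD_NS` URIs are assumptions (the standard Atom and Google Data namespaces that `ns0`/`ns1` most likely stood for), and `SAMPLE_FEED`, `FIELDS`, `child_text`, and `entry_lines` are hypothetical names:

```python
from xml.dom import minidom

# Assumed namespace URIs for the ns0 (Atom) and ns1 (gd) prefixes in the dump.
ATOM_NS = "http://www.w3.org/2005/Atom"
GD_NS = "http://schemas.google.com/g/2005"

# Hypothetical trimmed feed with one entry, declaring its namespaces.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <entry>
    <gd:name>
      <gd:familyName>Name</gd:familyName>
      <gd:fullName>Person Name</gd:fullName>
      <gd:givenName>Robert</gd:givenName>
    </gd:name>
    <gd:organization primary="true" rel="http://schemas.google.com/g/2005#work">
      <gd:orgTitle>JobField</gd:orgTitle>
      <gd:orgDepartment>DepartmentField</gd:orgDepartment>
      <gd:orgName>CompanyField</gd:orgName>
    </gd:organization>
  </entry>
</feed>"""

# gd: local names to pull from each entry, in output order.
FIELDS = ("fullName", "givenName", "familyName",
          "orgTitle", "orgDepartment", "orgName")


def child_text(parent, local_name):
    """Return the text of the first gd:<local_name> below parent, or ''."""
    nodes = parent.getElementsByTagNameNS(GD_NS, local_name)
    if not nodes:
        return ""
    # Concatenate the element's text nodes (the part explore_children skips).
    return "".join(node.data for node in nodes[0].childNodes
                   if node.nodeType == node.TEXT_NODE).strip()


def entry_lines(xml_text):
    """Build one comma-separated line of name/organization fields per entry."""
    doc = minidom.parseString(xml_text)
    return [", ".join(child_text(entry, name) for name in FIELDS)
            for entry in doc.getElementsByTagNameNS(ATOM_NS, "entry")]


for line in entry_lines(SAMPLE_FEED):
    print(line)
```

For the sample entry this prints `Person Name, Robert, Name, JobField, DepartmentField, CompanyField`. A missing field becomes an empty string, so every line keeps the same columns.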
1 Answer
You don't need to do this by hand; you can use Google's gdata library:
pip install gdata
Then:
>>> from gdata.contacts import client
>>> gd_client = client.ContactsClient(source='YOUR_APPLICATION_NAME', domain='place.com')
>>> profile = gd_client.GetProfile('https://www.google.com/m8/feeds/profiles/domain/place.com/full/pname')
For more examples, see http://code.google.com/googleapps/domain/profiles/developers_guide.html (use the Python tab in each example).
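If you already have the feed XML and would rather not pull in gdata, ElementTree (one of the options mentioned in the question) can extract the fields directly once you give it a prefix-to-URI map. A minimal sketch; the `NS` URIs are assumed to be the standard Atom and Google Data namespaces behind the dump's `ns0`/`ns1` prefixes, and `SAMPLE_FEED`, `FIELDS`, and `profile_rows` are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Assumed URIs for the ns0 (Atom) and ns1 (gd) prefixes in the question's dump.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "gd": "http://schemas.google.com/g/2005",
}

# Hypothetical trimmed feed containing one profile entry.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:gd="http://schemas.google.com/g/2005">
  <entry>
    <gd:name>
      <gd:familyName>Name</gd:familyName>
      <gd:fullName>Person Name</gd:fullName>
      <gd:givenName>Robert</gd:givenName>
    </gd:name>
    <gd:organization primary="true" rel="http://schemas.google.com/g/2005#work">
      <gd:orgTitle>JobField</gd:orgTitle>
      <gd:orgDepartment>DepartmentField</gd:orgDepartment>
      <gd:orgName>CompanyField</gd:orgName>
    </gd:organization>
  </entry>
</feed>"""

# Paths (relative to each <entry>) of the fields to extract, in output order.
FIELDS = [
    "gd:name/gd:fullName",
    "gd:name/gd:givenName",
    "gd:name/gd:familyName",
    "gd:organization/gd:orgTitle",
    "gd:organization/gd:orgDepartment",
    "gd:organization/gd:orgName",
]


def profile_rows(xml_text):
    """Yield one tuple of name/organization strings per <entry>."""
    root = ET.fromstring(xml_text)
    for entry in root.findall("atom:entry", NS):
        # findtext returns the default ("") when a path matches nothing.
        yield tuple(entry.findtext(path, default="", namespaces=NS)
                    for path in FIELDS)


rows = list(profile_rows(SAMPLE_FEED))
for row in rows:
    print(" | ".join(row))
```

If an entry has several `gd:organization` elements, this takes the first match; filtering on the `primary` attribute would be one possible refinement.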