Parsing a text/csv file containing XML entries in Python

"<entry xmlns=""http://www.w3.org/2005/Atom"" xmlns:gnip=""http://www.gnip.com/schemas/2010""> <id>tag:search.twitter.com,2005:157796632933576704</id> <published>2012-01-13T12:10:23+00:00</published> <updated>2012-01-13T12:10:23+00:00</updated> <summary type=""html"">RT @sprice54: If you rearrange the words ""Debit card"" you can spell ""Bad Credit""</summary> <link rel=""alternate"" type=""text/html"" href=""http://twitter.com/GCordivari/statuses/157796632933576704""/> <source> <link rel=""self"" type=""application/json"" href=""https://stream.twitter.com/1/statuses/filter.json""/> <title>Twitter - Stream - Track</title> <updated>2012-01-13T12:10:25Z</updated> </source> <service:provider xmlns:service=""http://activitystrea.ms/service-provider""> <name>Twitter</name> <uri>http://www.twitter.com/</uri> <icon/> </service:provider> <contributor> <name>Steve Price</name> <uri>http://www.twitter.com/sprice54</uri> </contributor> <link rel=""via"" type=""text/html"" href=""http://twitter.com/sprice54/statuses/157748462321012736""/> <title>George Cordivari shared: Steve Price posted a note on Twitter</title> <category term=""StatusShared"" label=""Status Shared""/> <category term=""NoteShared"" label=""Note Shared""/> <activity:verb xmlns:activity=""http://activitystrea.ms/spec/1.0/"">http://activitystrea.ms/schema/1.0/share</activity:verb> <activity:object xmlns:activity=""http://activitystrea.ms/spec/1.0/""> <activity:object-type>http://activitystrea.ms/schema/1.0/note</activity:object-type> <id>object:search.twitter.com,2005:157796632933576704</id> <content type=""html"">RT @sprice54: If you rearrange the words ""Debit card"" you can spell ""Bad Credit""</content> <link rel=""alternate"" type=""text/html"" href=""http://twitter.com/GCordivari/statuses/157796632933576704""/> </activity:object> <author> <name>George Cordivari</name> <uri>http://www.twitter.com/GCordivari</uri> </author> <activity:author xmlns:activity=""http://activitystrea.ms/spec/1.0/""> 
<activity:object-type>http://activitystrea.ms/schema/1.0/person</activity:object-type> <gnip:friends xmlns:gnip=""http://www.gnip.com/schemas/2010"" followersCount=""37"" followingCount=""61""/> <link rel=""alternate"" type=""text/html"" length=""0"" href=""http://www.twitter.com/GCordivari""/> <link rel=""avatar"" href=""http://a0.twimg.com/profile_images/1670548060/274805_1268643462_1179159089_n_normal.jpg""/> <id>http://www.twitter.com/GCordivari</id> </activity:author> <activity:actor xmlns:activity=""http://activitystrea.ms/spec/1.0/""> <activity:object-type>http://activitystrea.ms/schema/1.0/person</activity:object-type> <gnip:friends xmlns:gnip=""http://www.gnip.com/schemas/2010"" followersCount=""37"" followingCount=""61""/> <gnip:stats xmlns:gnip=""http://www.gnip.com/schemas/2010"" activityCount=""370"" upstreamId=""id:twitter.com:427031045""/> <link rel=""alternate"" type=""text/html"" length=""0"" href=""http://www.twitter.com/GCordivari""/> <link rel=""avatar"" href=""http://a0.twimg.com/profile_images/1670548060/274805_1268643462_1179159089_n_normal.jpg""/> <id>http://www.twitter.com/GCordivari</id> <os:location xmlns:os=""http://ns.opensocial.org/2008/opensocial"">Drexel Hell</os:location> <os:aboutMe xmlns:os=""http://ns.opensocial.org/2008/opensocial"">This is the way I live. #CirocInMyCupIDGAF #CloudNine #FollowMeLikeTheLeader </os:aboutMe> </activity:actor> <gnip:twitter_entities xmlns:gnip=""http://www.gnip.com/schemas/2010""> <user_mentions> <user_mention start=""3"" end=""12""> <id>255347428</id> <name>Steve Price</name> <screen_name>sprice54</screen_name> </user_mention> </user_mentions> </gnip:twitter_entities> <gnip:matching_rules> <gnip:matching_rule rel=""inferred"">""debit card""</gnip:matching_rule> </gnip:matching_rules> </entry>"

3 Answers

User

Answer 1 · Edited 2024-05-17 19:46:09

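Python has many very good XML parsing tools. BeautifulSoup is very popular because it has a simple API: http://www.crummy.com/software/BeautifulSoup/doc/

lxml is a very good library for very fast XML parsing, but it requires libxml2.

There are many tutorials online that explain the basics of parsing XML with Python step by step, for example: http://www.learningpython.com/2008/05/07/elegant-xml-parsing-using-the-elementtree-module/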
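
As a rough illustration of the ElementTree basics such tutorials cover, here is a minimal sketch using only the standard library. It assumes the XML field has already been pulled out of the CSV with its doubled quotes restored; the `SAMPLE` string is a trimmed, illustrative copy of the entry in the question, and the names `NS`, `entry_id` and `contributor_names` are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the entry from the question
# (doubled quotes already turned back into single quotes).
SAMPLE = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    '<id>tag:search.twitter.com,2005:157796632933576704</id>'
    '<published>2012-01-13T12:10:23+00:00</published>'
    '<contributor><name>Steve Price</name>'
    '<uri>http://www.twitter.com/sprice54</uri></contributor>'
    '</entry>'
)

# The entry uses the Atom default namespace, so lookups must name it.
NS = {'atom': 'http://www.w3.org/2005/Atom'}

entry = ET.fromstring(SAMPLE)

# Text of a single child element.
entry_id = entry.findtext('atom:id', namespaces=NS)

# Text of every <name> nested under a <contributor>.
contributor_names = [
    name.text for name in entry.findall('atom:contributor/atom:name', NS)
]

print(entry_id)
print(contributor_names)
```

Forgetting the namespace is the usual stumbling block here: a plain `entry.find('id')` returns `None` for this document because the element's full name includes the Atom namespace.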

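User

Answer 2 · Edited 2024-05-17 19:46:09

Below is an example, adapted from the docs, of how you can extract all elements with a given name, such as contributor, and export them to a new XML document.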

import csv
import xml.dom.minidom as minidom

# paths to the input csv/xml file and the output file
inputPath = '/path/to/xml.csv'
outputPath = '/path/to/contributors.xml'

# create a new xml document and top level element
impl = minidom.getDOMImplementation()
newxml = impl.createDocument(None, 'contributors', None)
top = newxml.documentElement

# let the csv module split the fields: splitting on ',' by hand breaks on
# commas and newlines inside the quoted xml, and csv also turns the doubled
# double quotes ("") back into single ones, so no manual replace is needed
with open(inputPath, newline='') as xml_csv:
    for row in csv.reader(xml_csv):
        for fldxml in row:
            # skip fields that do not look like xml
            if not fldxml.lstrip().startswith('<'):
                continue

            # parse the xml data from each field and
            # find all contributor elements in it
            dom = minidom.parseString(fldxml)
            contributors = dom.getElementsByTagName('contributor')

            # copy each contributor into the new xml document
            for contributor in contributors:
                top.appendChild(newxml.importNode(contributor, True))

# write out the new xml contributors document in pretty XML
with open(outputPath, 'w') as outxml:
    outxml.write(newxml.toprettyxml())

User

Answer 3 · Edited 2024-05-17 19:46:09

Use the csv module to parse the CSV and the ElementTree module to parse the XML fields.

If the XML data is RSS-compatible, take a look at feedparser.
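
A minimal sketch of that combination might look like the following. It assumes each XML entry sits in its own quoted CSV field with inner quotes doubled, as in the question; the helper `iter_entries`, the `ATOM` mapping and the two-row `sample_csv` are hypothetical names and data made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace map for the Atom elements used in the question's data.
ATOM = {'atom': 'http://www.w3.org/2005/Atom'}


def iter_entries(csv_file):
    """Yield one parsed element for every CSV field that looks like XML."""
    for row in csv.reader(csv_file):
        for field in row:
            # csv.reader has already un-doubled the quotes in each field.
            if field.lstrip().startswith('<'):
                yield ET.fromstring(field)


# Hypothetical two-row file standing in for the real export.
sample_csv = io.StringIO(
    '"<entry xmlns=""http://www.w3.org/2005/Atom""><id>tag:a,2005:1</id>'
    '<summary type=""html"">say ""hi""</summary></entry>"\r\n'
    '"<entry xmlns=""http://www.w3.org/2005/Atom""><id>tag:b,2005:2</id>'
    '<summary type=""html"">second</summary></entry>"\r\n'
)

# Collect (id, summary) pairs from every entry.
summaries = [
    (e.findtext('atom:id', namespaces=ATOM),
     e.findtext('atom:summary', namespaces=ATOM))
    for e in iter_entries(sample_csv)
]
print(summaries)
```

With a real file, open it with `open(path, newline='')` and pass the file object to `iter_entries`, so that the csv module can handle line breaks inside quoted fields itself.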
