How to get XML without the root node in Python
Given the following data:
<rdf:RDF
xmlns="http://purl.org/rss/1.0/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
xmlns:dc="http://purl.org/dc/elements/1.
<channel rdf:about="http://www.gmanews.tv/">
<title>GMANews.TV</title>
<description> GMA News.tv bring you the latest news from GMA News teams and highlights of your favorite shows. Subscribe now and stay up-to-date with GMA News.tv.</description>
<link>http://www.gmanews.tv/</link>
</channel>
<item rdf:about="http://www.gmanews.tv/story/232365/world/magnitude-59-quake-hits-chilean-coast-no-damage">
<dc:format>text/html</dc:format>
<dc:date>2011-09-14T16:39:22+08:00</dc:date>
<dc:source>http://www.gmanews.tv/story/232365/world/magnitude-59-quake-hits-chilean-coast-no-damage </dc:source>
<title><![CDATA[Magnitude-5.9 quake hits Chilean coast, no damage]]></title>
<link>http://www.gmanews.tv/story/232365/world/magnitude-59-quake-hits-chilean-coast-no-damage </link>
<description><![CDATA[SANTIAGO - A magnitude 5.9 quake hit just off the coast of central Chile early on Wednesday, but the state emergency office said there were no reports of damage.]]></description>
</item>
<item rdf:about="http://www.gmanews.tv/story/232362/nation/house-minority-blames-pnoys-advisers-for-legal-setbacks">
<dc:format>text/html</dc:format>
<dc:date>2011-09-14T16:04:51+08:00</dc:date>
<dc:source>http://www.gmanews.tv/story/232362/nation/house-minority-blames-pnoys-advisers-for-legal-setbacks </dc:source>
<title><![CDATA[House minority blames PNoy's advisers for legal 'setbacks']]></title>
<link>http://www.gmanews.tv/story/232362/nation/house-minority-blames-pnoys-advisers-for-legal-setbacks </link>
<description><![CDATA[Members of the opposition at the House of Representatives on Wednesday blamed President Benigno Aquino III's advisers for the various legal "setbacks" suffered by his administration and advised him to consider replacing some of his advisers.]]></description>
</item>
<item rdf:about="http://www.gmanews.tv/story/232356/nation/ex-sharia-judge-20-others-may-testify-in-poll-fraud-probe">
<dc:format>text/html</dc:format>
<dc:date>2011-09-14T15:19:45+08:00</dc:date>
<dc:source>http://www.gmanews.tv/story/232356/nation/ex-sharia-judge-20-others-may-testify-in-poll-fraud-probe </dc:source>
<title><![CDATA[Ex-Shari'a judge, 20 others may testify in poll fraud probe]]></title>
<link>http://www.gmanews.tv/story/232356/nation/ex-sharia-judge-20-others-may-testify-in-poll-fraud-probe </link>
<description><![CDATA[The former Shari'a court judge who claimed to have helped Gloria Macapagal-Arroyo cheat in the 2004 presidential elections and at least 20 others may serve as witnesses in the joint investigation by the Commission on Elections and Department of Justice on the alleged poll fraud, Comelec chief Sixto Brillantes Jr. said Wednesday.]]></description>
</item>
</rdf:RDF>
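Now I want to get everything inside the <item> tags. This is probably simple, but I'm still new to Python. I'm not sure how to parse the RDF and then pull out all of the <item> elements.
Additional note: I can't use any third-party libraries, because my script has to run on an embedded system.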
2 Answers
1
Since third-party libraries are off the table, here is the same thing as the lxml approach, done with ElementTree, which ships with Python:
from xml.etree import ElementTree as etree

# parse() accepts a filename directly, so no explicit open() is needed
document = etree.parse('your-example-xml.rdf')
root = document.getroot()

# ElementTree addresses namespaced names as '{namespace-uri}localname'
ns_purl = 'http://purl.org/rss/1.0/'
ns_rdf = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'

for item in root.findall('{%s}item' % ns_purl):
    print(item.attrib.get('{%s}about' % ns_rdf))
    print(item.find('{%s}description' % ns_purl).text)
    print()
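If the goal is everything inside each <item> rather than just the description, one possible variation is to walk the children of every item and collect them into a dictionary. This is only a sketch under a few assumptions: the helper names `local_name` and `items_as_dicts` are made up for illustration, the sample feed is a trimmed-down stand-in with example.com URLs, and the `dc` namespace URI (cut off in the question) is assumed to be the usual Dublin Core 1.1 one.

```python
import xml.etree.ElementTree as ET

RSS_NS = "http://purl.org/rss/1.0/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Stand-in data in the same shape as the feed in the question.
# The dc namespace URI is an assumption (it is truncated in the question).
SAMPLE = """<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel rdf:about="http://example.com/">
    <title>Example feed</title>
  </channel>
  <item rdf:about="http://example.com/story/1">
    <dc:format>text/html</dc:format>
    <title><![CDATA[First story]]></title>
    <link>http://example.com/story/1 </link>
  </item>
  <item rdf:about="http://example.com/story/2">
    <dc:format>text/html</dc:format>
    <title><![CDATA[Second story]]></title>
    <link>http://example.com/story/2 </link>
  </item>
</rdf:RDF>"""


def local_name(tag):
    """Drop the '{namespace-uri}' part that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def items_as_dicts(xml_text):
    """Return one dict per <item>, keyed by the local name of each child."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.findall("{%s}item" % RSS_NS):
        record = {"about": item.get("{%s}about" % RDF_NS)}
        for child in item:
            # Note: children from different namespaces that share a local
            # name would overwrite each other in this simple version.
            record[local_name(child.tag)] = (child.text or "").strip()
        records.append(record)
    return records


for record in items_as_dicts(SAMPLE):
    print(record)
```

CDATA sections come back as ordinary text, and the `.strip()` removes the trailing space that the feed leaves inside <link> and <dc:source>.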
2
lxml is a great tool for pretty much anything XML-related. For example, with the XML sample you posted:
from lxml import etree

document = etree.parse('your-example-xml.rdf')
root = document.getroot()

# Namespace shortcuts
ns = root.nsmap.get(None)    # default namespace (RSS 1.0)
rdf = root.nsmap.get('rdf')

for item in root.xpath('purl:item', namespaces={'purl': ns}):
    print(item.attrib.get('{%s}about' % rdf))
    # xpath() returns a list of matching text nodes
    print(item.xpath('purl:description/text()', namespaces={'purl': ns}))
    print()
That said, if all you want is to parse RDF, there may be dedicated RDF libraries you could use instead.
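For what it's worth, the prefix-map style used with lxml above is also available in the standard library, which matters given the no-third-party constraint in the question: ElementTree's `findall` and `findtext` accept a `namespaces` dictionary. The following is a rough sketch, not a drop-in replacement; the function name `describe_items`, the `rss` prefix, and the sample data are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Prefixes here are local to this script; they do not have to match
# the prefixes used inside the document.
NAMESPACES = {
    "rss": "http://purl.org/rss/1.0/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Stand-in data; the second item deliberately has no <description>.
SAMPLE = """<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <item rdf:about="http://example.com/story/1">
    <description><![CDATA[First description]]></description>
  </item>
  <item rdf:about="http://example.com/story/2">
    <title>No description here</title>
  </item>
</rdf:RDF>"""


def describe_items(xml_text):
    """Return (about, description) pairs for every <item> in the feed."""
    root = ET.fromstring(xml_text)
    pairs = []
    for item in root.findall("rss:item", NAMESPACES):
        # Attribute lookups still need the '{uri}name' form.
        about = item.get("{%s}about" % NAMESPACES["rdf"])
        # findtext() falls back to the default instead of raising
        # when the child element is missing.
        description = item.findtext(
            "rss:description", default="", namespaces=NAMESPACES
        )
        pairs.append((about, description))
    return pairs


for about, description in describe_items(SAMPLE):
    print(about)
    print(description)
    print()
```

Using `findtext` with a default also sidesteps the `AttributeError` that `item.find(...).text` raises when an item has no matching child.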