Parsing prefixed XML tags? xml.etree.ElementTree

Asked 2025-04-17 05:18

I can read tags, except when the tag has a prefix. I couldn't find a related question on Stack Overflow.

I need to read media:content. I tried image = node.find("media:content").

This is my RSS input:

<channel>
  <title>Popular  Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
  <item> ... </item>
</channel>

I can read a sibling tag, title:

from xml.etree import ElementTree
with open('cache1.rss', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.findall('.//channel/item'):
    title =  node.find("title").text

I've been reading the documentation, but I'm stuck on the part about prefixes.

2 Answers

media is an XML namespace prefix, and it has to be declared earlier in the document with xmlns:media="...". For how to define XML namespaces for XPath expressions in lxml, see http://lxml.de/xpathxslt.html#namespaces-and-prefixes.
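To illustrate the point about declaring the prefix, here is a small sketch using the standard library's ElementTree instead of lxml. It shows that an undeclared prefix is rejected at parse time, and that once the prefix is bound, find() can take a prefix-to-URI mapping so "media:content" works. The URI http://search.yahoo.com/mrss/ (the Media RSS namespace) is an assumption for illustration; what matters is whatever URI your own feed puts in xmlns:media.

```python
import xml.etree.ElementTree as ET

# Undeclared prefix: the parser rejects "media:" because it is bound to no URI.
undeclared = '<item><media:content url="http://foo.com/1.jpg"/></item>'
try:
    ET.fromstring(undeclared)
except ET.ParseError as err:
    print("ParseError:", err)

# Same fragment with the prefix declared via xmlns:media.
# Assumption: the Media RSS URI; use the one your feed actually declares.
MEDIA_NS = "http://search.yahoo.com/mrss/"
declared = (
    f'<item xmlns:media="{MEDIA_NS}">'
    '<media:content url="http://foo.com/1.jpg"/>'
    '</item>'
)
item = ET.fromstring(declared)

# ElementTree stores the tag as "{URI}localname"; the prefix itself is discarded.
content = item[0]
print(content.tag)  # {http://search.yahoo.com/mrss/}content

# find() accepts a prefix -> URI mapping, so "media:content" resolves again.
ns = {"media": MEDIA_NS}
print(item.find("media:content", ns).get("url"))  # http://foo.com/1.jpg
```

The prefix in the mapping is your own choice; only the URI has to match the document.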

Answered 2025-04-17 by Python大师

Here is an example of using XML namespaces with ElementTree:

>>> from xml.etree import ElementTree
>>> x = '''\
<channel xmlns:media="http://www.w3.org/TR/html4/">
  <title>Popular  Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
  <item> ... </item>
</channel>
'''
>>> node = ElementTree.fromstring(x)
>>> # Namespaced tags are matched by their expanded "{URI}localname" form
>>> for elem in node.findall('item/{http://www.w3.org/TR/html4/}category'):
...     print(elem.text)
...
photography/misc
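Applied to the original question (reading media:content and its attributes), a minimal sketch could look like the following. It assumes the feed's root is an rss element binding media to the Media RSS namespace http://search.yahoo.com/mrss/; check your feed's xmlns:media attribute and substitute that URI. The last part uses the "{*}" wildcard, which needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Assumption: the feed binds "media" to the Media RSS namespace.
RSS = """\
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
  <title>Popular  Photography in the last 1 week</title>
  <item>
    <title>foo</title>
    <media:category label="Miscellaneous">photography/misc</media:category>
    <media:content url="http://foo.com/1.jpg" height="375" width="500" medium="image"/>
  </item>
</channel>
</rss>
"""

NS = {"media": "http://search.yahoo.com/mrss/"}

root = ET.fromstring(RSS)
images = []
for item in root.findall("./channel/item"):
    title = item.find("title").text
    # Prefixed lookup: "media:content" is expanded through the NS mapping.
    content = item.find("media:content", NS)
    if content is not None:
        images.append((title, content.get("url"), int(content.get("width"))))

print(images)  # [('foo', 'http://foo.com/1.jpg', 500)]

# Python 3.8+: "{*}" matches the local name in any namespace, no URI needed.
urls = [c.get("url") for c in root.iterfind(".//{*}content")]
print(urls)  # ['http://foo.com/1.jpg']
```

When reading from a file, ET.parse('cache1.rss').getroot() can replace ET.fromstring(RSS). The explicit mapping is stricter; the wildcard is convenient when you don't want to hardcode the URI.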

Answered 2025-04-17 by Python大师
