How do I get specific elements from XML data?

Asked 2025-04-16 19:17

I have some code that fetches XML data:

import cStringIO
import pycurl
from xml.etree import ElementTree

_API_KEY = 'my api key'
_ima = '/the/path/to/a/image'

sock = cStringIO.StringIO()

upl = pycurl.Curl()

values = [
            ("key", _API_KEY),
            ("image", (upl.FORM_FILE, _ima))]

upl.setopt(upl.URL, "http://api.imgur.com/2/upload.xml")
upl.setopt(upl.HTTPPOST, values)
upl.setopt(upl.WRITEFUNCTION, sock.write)
upl.perform()
upl.close()
xmldata = sock.getvalue()
#print xmldata
sock.close()

The data I get back looks like this:

<?xml version="1.0" encoding="utf-8"?>
<upload><image><name></name><title></title><caption></caption><hash>dxPGi</hash><deletehash>kj2XOt4DC13juUW</deletehash><datetime>2011-06-10 02:59:26</datetime><type>image/png</type><animated>false</animated><width>1024</width><height>768</height><size>172863</size><views>0</views><bandwidth>0</bandwidth></image><links><original>https://i.stack.imgur.com/dxPGi.png</original><imgur_page>http://imgur.com/dxPGi</imgur_page><delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page><small_square>https://i.stack.imgur.com/dxPGis.jpg</small_square><large_thumbnail>https://i.stack.imgur.com/dxPGil.jpg</large_thumbnail></links></upload>

Now, following this answer, I want to extract some specific values from the data.

This is my attempt:

tree = ElementTree.fromstring(xmldata)
url = tree.findtext('original')
webpage = tree.findtext('imgur_page')
delpage = tree.findtext('delete_page')

print 'Url: ' + str(url)
print 'Pagina: ' + str(webpage)
print 'Link de borrado: ' + str(delpage)

If I try adding .text to get the text, I get an AttributeError:

Traceback (most recent call last):
  File "<pyshell#28>", line 27, in <module>
    url = tree.find('original').text
AttributeError: 'NoneType' object has no attribute 'text'

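I can't find anything about this ElementTree attribute in the Python help. How do I get only the text, and not the object?

I found some information about getting the text string here, but when I try it I get a TypeError: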

Traceback (most recent call last): 
  File "<pyshell#32>", line 34, in <module>
    print 'Url: ' + url
TypeError: cannot concatenate 'str' and 'NoneType' objects

If I print 'Url: ' + str(url) instead, there is no error, but the result is shown as None.

How can I get the url, webpage, and delete_page data from this XML?

1 Answer

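Your find() call looks only for a direct child of the tree's top-level element whose tag is original; it does not look at tags further down the tree. In your XML, original sits inside links, one level below the root, so find() returns None and findtext() returns None as well. To match an element tagged original at any depth in the tree, you can use the following (find() returns the first match):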

url = tree.find('.//original').text

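The matching rules for ElementTree's find() method are listed in the table on this page: http://effbot.org/zone/element-xpath.htm

For //, it says:

Selects all subelements, on all levels beneath the current element (search the entire subtree). For example, ".//egg" selects all "egg" elements in the entire tree.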
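
To make the difference between the path forms concrete, here is a minimal sketch. It is written for Python 3 rather than the Python 2 used elsewhere in this thread, and it uses a shortened copy of the XML from the question; the variable names are only illustrative.

```python
from xml.etree import ElementTree

# Shortened copy of the response from the question (assumption: same nesting).
xmldata = (
    '<upload>'
    '<image><hash>dxPGi</hash></image>'
    '<links>'
    '<original>https://i.stack.imgur.com/dxPGi.png</original>'
    '<imgur_page>http://imgur.com/dxPGi</imgur_page>'
    '</links>'
    '</upload>'
)

tree = ElementTree.fromstring(xmldata)  # tree is the <upload> element

# A bare tag name only matches direct children of <upload>, so this is None.
direct = tree.find('original')

# An explicit path walks down one level at a time.
by_path = tree.find('links/original')

# './/' searches the whole subtree; find() returns the first match.
anywhere = tree.find('.//original')

print(direct)
print(by_path.text)
print(anywhere.text)
```

If you know the structure of the response, the explicit path 'links/original' is the stricter choice; './/original' is more forgiving if the nesting changes.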

Addendum: here is some test code for you. It uses the XML sample string you posted, which I just ran through XML Tidy in TextMate to make it more readable:

from xml.etree import ElementTree
xmldata = '''<?xml version="1.0" encoding="utf-8"?>
<upload>
    <image>
        <name/>
        <title/>
        <caption/>
        <hash>dxPGi</hash>
        <deletehash>kj2XOt4DC13juUW</deletehash>
        <datetime>2011-06-10 02:59:26</datetime>
        <type>image/png</type>
        <animated>false</animated>
        <width>1024</width>
        <height>768</height>
        <size>172863</size>
        <views>0</views>
        <bandwidth>0</bandwidth>
    </image>
    <links>
        <original>https://i.stack.imgur.com/dxPGi.png</original>
        <imgur_page>http://imgur.com/dxPGi</imgur_page>
        <delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page>
        <small_square>https://i.stack.imgur.com/dxPGis.jpg</small_square>
        <large_thumbnail>https://i.stack.imgur.com/dxPGil.jpg</large_thumbnail>
    </links>
</upload>'''
tree = ElementTree.fromstring(xmldata)
print tree.find('.//original').text

On my machine (OS X, running Python 2.6.1), the output is:

Ian-Cs-MacBook-Pro:tmp ian$ python test.py 
https://i.stack.imgur.com/dxPGi.png
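
For the three values the question asks about, the same idea can be wrapped in a small function. This is only a sketch, written for Python 3: extract_links is a hypothetical helper name, not part of ElementTree, and the sample string is a shortened copy of the XML above.

```python
from xml.etree import ElementTree


def extract_links(xmldata, names=('original', 'imgur_page', 'delete_page')):
    """Return {tag: text} for each tag name, searching the whole tree.

    Hypothetical helper for illustration. Tags that are not found map to None.
    """
    tree = ElementTree.fromstring(xmldata)
    # findtext() returns the text of the first match, or None if nothing matches.
    return {name: tree.findtext('.//' + name) for name in names}


# Shortened copy of the XML from the question.
sample = (
    '<upload><links>'
    '<original>https://i.stack.imgur.com/dxPGi.png</original>'
    '<imgur_page>http://imgur.com/dxPGi</imgur_page>'
    '<delete_page>http://imgur.com/delete/kj2XOt4DC13juUW</delete_page>'
    '</links></upload>'
)

links = extract_links(sample)
for name, value in links.items():
    # str() keeps the concatenation safe even if a value is None.
    print(name + ': ' + str(value))
```

Using findtext() instead of find().text avoids the AttributeError from the question when a tag is missing, at the cost of having to check for None yourself.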

Answered 2025-04-16 by Python大师
