GAE Python LXML - XMLSyntaxError: Specification mandate value for attribute object

0 votes
1 answer
6813 views
Asked 2025-04-17 14:36

I'm using Google App Engine with Python and want to fetch a gzip-compressed XML file and parse it with lxml's iterparse. Following the example on lxml.de, I wrote the following code:

import gzip, base64, StringIO
from lxml import etree
from google.appengine.ext import webapp
from google.appengine.api.urlfetch import fetch

class Catalog(webapp.RequestHandler):
    user = xxx
    password = yyy
    catalog = fetch('url',
                    headers={"Authorization":
                             "Basic %s" % base64.b64encode(user + ':' + password)}, deadline=600)
    items = etree.iterparse(StringIO.StringIO(catalog), tag='product')

    for _, element in items:
        print('%s -- %s' % (element.findtext('name'), element[1].text))
        element.clear()

When I run this code, I get the following error:

for _, element in coupons:
File "iterparse.pxi", line 491, in lxml.etree.iterparse.__next__ (src/lxml\lxml.etree.c:98565)
File "iterparse.pxi", line 543, in lxml.etree.iterparse._read_more_events (src/lxml\lxml.etree.c:99086)
File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml\lxml.etree.c:74791)
XMLSyntaxError: Specification mandate value for attribute object, line 1, column 53

What does this error mean? My guess is that the XML file is malformed, but I don't know how to track down the problem. Any help is much appreciated!

1 Answer

2

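The fix was to handle the fetch and the decompression differently, enable asynchronous requests, and switch to the webapp2 framework. After these changes the problem went away :)

The important difference is that the response body (catalogResponse.content) is decompressed with gzip.GzipFile before it is passed to iterparse. The original code passed the fetch result object itself to StringIO, so lxml was most likely parsing the object's string form (something like "<google.appengine.api.urlfetch._URLFetchResult object at 0x...>") rather than the XML. That would explain the complaint about an attribute named "object" without a value on line 1.

Here is the code: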

from google.appengine.api.urlfetch import fetch
import gzip, webapp2, base64, StringIO, datetime
from credentials import CJCredentials
from lxml import etree

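class Catalog(webapp2.RequestHandler):
    def get(self):
        user = xxx
        password = yyy
        url = 'some_url'

        # Fetch the gzip-compressed catalog using HTTP Basic authentication.
        catalogResponse = fetch(url, headers={
            "Authorization": "Basic %s" % base64.b64encode(user + ':' + password)
        }, deadline=10000000)

        # Wrap the raw response body in a file-like object and decompress it.
        f = StringIO.StringIO(catalogResponse.content)
        c = gzip.GzipFile(fileobj=f)
        content = c.read()

        xml = StringIO.StringIO(content)

        # Yield each <product> element as soon as it has been fully parsed.
        tree = etree.iterparse(xml, tag='product')

        for event, element in tree:
            # lxml elements have no .name attribute; read the <name> child instead.
            print(element.findtext('name'))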
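
For anyone who wants to try the decompress-then-parse step outside App Engine, here is a minimal sketch. It makes several assumptions that are not in the code above: it targets Python 3 (io.BytesIO instead of StringIO), it uses the standard library's xml.etree.ElementTree as a stand-in for lxml so it runs without extra dependencies, and FakeResponse plus the sample catalog are made up for illustration. With lxml, etree.iterparse(stream, tag='product') would do the tag filtering that the if statement does here.

```python
import gzip
import io
import xml.etree.ElementTree as ET  # stand-in for lxml.etree in this sketch


class FakeResponse(object):
    """Hypothetical stand-in for the object returned by urlfetch.fetch()."""

    def __init__(self, content):
        self.content = content  # raw, still gzip-compressed response body


def product_names(response):
    """Decompress response.content and return the <name> text of each <product>."""
    compressed = io.BytesIO(response.content)
    names = []
    with gzip.GzipFile(fileobj=compressed) as stream:
        # iterparse reads from the decompressed stream incrementally.
        for _, element in ET.iterparse(stream, events=("end",)):
            if element.tag == "product":
                names.append(element.findtext("name"))
                element.clear()  # drop children that have already been processed
    return names


# Build a small gzip-compressed catalog in memory (made-up sample data).
sample_xml = (
    b"<catalog>"
    b"<product><name>A</name></product>"
    b"<product><name>B</name></product>"
    b"</catalog>"
)
response = FakeResponse(gzip.compress(sample_xml))

print(product_names(response))  # ['A', 'B']

# Passing the response object itself instead of response.content fails,
# because its string form looks like "<...FakeResponse object at 0x...>".
try:
    ET.fromstring(str(response))
except ET.ParseError as exc:
    print("parse failed:", exc)
```

The last few lines reproduce the kind of failure from the question: the object's string form starts with "<" and contains a bare word "object", so an XML parser treats it as a tag with a valueless attribute and gives up on line 1.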
