Unable to use lxml.etree with Python 2.7 on Google App Engine

6 votes
4 answers
7473 views
Asked 2025-04-17 06:20

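I have been trying to use html5lib together with lxml on Google App Engine with Python 2.7. When I run the code below, however, I get the error "NameError: global name 'etree' is not defined". Does this mean lxml.etree cannot be used on Google App Engine, or am I missing something?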

app.yaml

application: testsite
version: 1
runtime: python27
api_version: 1
threadsafe: false

handlers:
- url: /.*
  script: index.py   

libraries:
- name: lxml
  version: "2.3"  # I thought this would allow me to use lxml.etree

index.py

from google.appengine.ext import webapp  # needed for webapp.WSGIApplication below
from testhandler import TestHandler
application = webapp.WSGIApplication([('/', TestHandler)], debug=True)

testhandler.py

import urllib2
import html5lib
from html5lib import treebuilders
try:
    from lxml import etree
    print("running with lxml.etree")
except ImportError:
    try:
        # Python 2.5
        import xml.etree.cElementTree as etree
        print("running with cElementTree on Python 2.5+")
    except ImportError:
        try:
            # Python 2.5
            import xml.etree.ElementTree as etree
            print("running with ElementTree on Python 2.5+")
        except ImportError:
            try:
                # normal cElementTree install
                import cElementTree as etree
                print("running with cElementTree")
            except ImportError:
                try:
                    # normal ElementTree install
                    import elementtree.ElementTree as etree
                    print("running with ElementTree")
                except ImportError:
                    print("Failed to import ElementTree from any known place")

from google.appengine.ext import webapp

class TestHandler(webapp.RequestHandler):
    def get(self):
        f = urllib2.urlopen("http://www.yahoo.com/").read()
        doc = html5lib.parse(f, treebuilder='lxml')
        elems = doc.xpath("//*[local-name() = 'a']")
        self.response.out.write(len(elems))

Error message

running with cElementTree on Python 2.5+
Status: 500 Internal Server Error
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Expires: Fri, 01 Jan 1990 00:00:00 GMT
Content-Length: 769

<pre>Traceback (most recent call last):
  File &quot;/usr/local/bin/google_appengine/google/appengine/ext/webapp/_webapp25.py&quot;, line 701, in __call__
    handler.get(*groups)
  File &quot;/home/test/testhandler.py&quot;, line 38, in get
    parser = html5lib.HTMLParser(tree= treebuilders.getTreeBuilder('lxml'))
  File &quot;/home/test/html5lib/html5parser.py&quot;, line 68, in __init__
    self.tree = tree(namespaceHTMLElements)
  File &quot;/home/test/html5lib/treebuilders/etree_lxml.py&quot;, line 176, in __init__
    builder = etree_builders.getETreeModule(etree, fullTree=fullTree)
NameError: global name 'etree' is not defined
</pre>

Additional notes

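In fact, I tried several ways of creating a document object, and none of them worked. In one attempt I used the import "from lxml.html import document_fromstring", which also failed with an error.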

Traceback (most recent call last):
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4143, in _HandleRequest
    self._Dispatch(dispatcher, self.rfile, outfile, env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 4049, in _Dispatch
    base_env_dict=env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 616, in Dispatch
    base_env_dict=base_env_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3120, in Dispatch
    self._module_dict)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 3024, in ExecuteCGI
    reset_modules = exec_script(handler_path, cgi_path, hook)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2887, in ExecuteOrImportScript
    exec module_code in script_module.__dict__
  File "/home/yoo/eclipse_workspace/website_checker/src/index.py", line 5, in <module>
    from handlers.updatecheck import UpdateCheckHandler
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
    return self.FindAndLoadModule(submodule, fullname, search_path)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
    description)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
    description)
  File "/home/test/updatecheck.py", line 4, in <module>
    from lxml.html import document_fromstring
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2503, in load_module
    return self.FindAndLoadModule(submodule, fullname, search_path)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2375, in FindAndLoadModule
    description)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 1538, in Decorate
    return func(self, *args, **kwargs)
  File "/usr/local/bin/google_appengine/google/appengine/tools/dev_appserver.py", line 2318, in LoadModuleRestricted
    description)
  File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 12, in <module>
    from lxml import etree
ImportError: cannot import name etree

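Judging from the error message, App Engine for some reason will not let me load the etree module. I wanted to use lxml's XPath support, but I don't have much time to work out what is going on, and I don't know Python well enough. So I decided to try the 'simpletree' approach instead.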

f = urllib2.urlopen("http://www.yahoo.com/").read()
p = html5lib.HTMLParser()
doc = p.parse(f)
# do something with doc.childNodes
self.response.out.write(len(doc.childNodes))  

This approach isn't great, but at least it worked on Google App Engine when I tested it.

4 Answers

0

Try adding the following at the very top of your test handler:

import lxml
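
A bare import like the one above is easiest to interpret when the real ImportError is printed instead of being swallowed by a fallback chain. The following is only a sketch: check_import is a hypothetical helper name, not part of lxml or the App Engine SDK, and it assumes you run it from the command line with the same Python interpreter that runs dev_appserver.py.

```python
import importlib


def check_import(name):
    """Try to import `name`; return (True, file location) or (False, error text)."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        # Keep the original message so the actual cause stays visible.
        return False, str(exc)
    return True, getattr(module, "__file__", "<built-in>")


# Check the package and its compiled submodule separately.
for name in ("lxml", "lxml.etree"):
    ok, detail = check_import(name)
    print("%-12s %s: %s" % (name, "OK" if ok else "FAILED", detail))
```

If lxml imports but lxml.etree does not, the package directory was found while its compiled part could not be loaded, which would be consistent with the "cannot import name etree" traceback in the question.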

1

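On Windows I ran into this problem because the Python 2.7 installation does not include the lxml library. You can install it with the easy_install script, but that requires compiling from source, which caused me some trouble.

I found a thread on Google Groups that helped me solve the problem: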

https://groups.google.com/forum/?fromgroups=#!topic/comp.lang.python/Q8YeOIbn5Ds

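However, if you want to avoid compiling it yourself, you can install a prebuilt version, for example from this site: http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml

Just download the installer from the site above and run the *.exe file; it installs everything that is needed.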
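
After installing, it may be worth confirming which lxml version the interpreter actually picks up, since the app.yaml in the question asks for "2.3". This is a minimal sketch, assuming lxml exposes its version as the tuple etree.LXML_VERSION; lxml_version and PINNED are illustrative names introduced here.

```python
PINNED = (2, 3)  # major.minor requested under `libraries:` in app.yaml


def lxml_version():
    """Return lxml's version tuple, or None if lxml cannot be imported."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return tuple(etree.LXML_VERSION)


version = lxml_version()
if version is None:
    print("lxml is not importable from this interpreter")
elif version[:2] != PINNED:
    print("local lxml %s differs from the pinned 2.3" % ".".join(map(str, version)))
else:
    print("local lxml matches the pinned 2.3")
```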

1

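Have you installed lxml locally? I got the same error before: the import failed. You can download lxml here: http://pypi.python.org/pypi/lxml/

It's nice that lxml can be used with GAE. However, there is still very little documentation or examples on this.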
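
As one possible example: the fallback chain in the question silently switches to cElementTree while the handler still asks html5lib for the 'lxml' treebuilder, so the two can disagree. The sketch below picks the treebuilder name and a matching link counter together. pick_treebuilder and count_links are hypothetical names, and the html5lib call is shown only in comments, on the assumption that html5lib.parse accepts treebuilder="etree" as well as "lxml".

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"


def pick_treebuilder():
    """Return (treebuilder name, link counter) for the best library available."""
    try:
        from lxml import etree  # only checking that it imports
    except ImportError:
        # Standard-library ElementTree has no xpath(); use a namespaced findall.
        def count_links(doc):
            return len(doc.findall(".//{%s}a" % XHTML_NS))
        return "etree", count_links

    def count_links(doc):
        # Same XPath expression as in the question's handler.
        return len(doc.xpath("//*[local-name() = 'a']"))
    return "lxml", count_links


# Possible use inside the handler (assumes html5lib is bundled with the app):
#   name, count_links = pick_treebuilder()
#   doc = html5lib.parse(f, treebuilder=name)
#   self.response.out.write(str(count_links(doc)))
```

With this arrangement the handler keeps working on machines without lxml, at the cost of losing full XPath there.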
