Why do I get so many warnings and errors when using IMDbPY?

4 votes
1 answer
2187 views
Asked 2025-04-16 13:55

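I'm using IMDbPY to fetch data from IMDb. The results are correct and everything seems to work, but one thing bothers me: whatever I do, I get warnings. The output itself is fine, but it always shows up after a long run of warnings and, sometimes, errors.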

For example, the following code should print Pulp Fiction (1994):

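import imdb

db = imdb.IMDb()
# Take the first search hit, then fetch its full details.
movie_obj = db.search_movie('pulp fiction')[0]
db.update(movie_obj)
# print() with a single argument works on both Python 2 and Python 3.
print(movie_obj['long imdb canonical title'])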

It does print that, but only after the following warnings and errors:

2011-03-18 00:33:11,490 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:459: unable to use "lxml": No module named lxml.html
2011-03-18 00:33:11,507 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:450: falling back to "beautifulsoup"
2011-03-18 00:33:13,483 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:459: unable to use "lxml": No module named lxml.html
2011-03-18 00:33:13,483 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:450: falling back to "beautifulsoup"
2011-03-18 00:33:15,137 ERROR [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:566: DOMHTMLMovieParser: caught exception extracting XPath "//div[@id='tn15title']//span[starts-with(text(), 'TV series')]"
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\imdb\parser\http\utils.py", line 555, in xpath
    xpath_result = element.xpath(path)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\etree.py", line 57, in xpath
    return path.apply(node)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\bsoupxpath.py", line 113, in apply
    nodes = step.apply(nodes)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\bsoupxpath.py", line 287, in apply
    found = filter(checker, found)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\bsoupxpath.py", line 331, in __call__
    return self.__filter(node)
  File "C:\Python27\lib\site-packages\imdb\parser\http\bsouplxml\bsoupxpath.py", line 360, in __starts_with
    first = node.contents[0]
IndexError: list index out of range
2011-03-18 00:33:16,785 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:459: unable to use "lxml": No module named lxml.html
2011-03-18 00:33:16,785 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:450: falling back to "beautifulsoup"
2011-03-18 00:33:16,849 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:459: unable to use "lxml": No module named lxml.html
2011-03-18 00:33:16,849 WARNING [imdbpy.parser.http.domparser] C:\Python27\lib\site-packages\imdb\parser\http\utils.py:450: falling back to "beautifulsoup"

Why does this happen? Am I doing something wrong?

1 Answer

2

The log states the problem quite plainly:

unable to use "lxml": No module named lxml.html
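
The log pairs each "unable to use" warning with a "falling back to" warning, which suggests a try-one-backend-then-the-next import pattern that runs each time a parser is set up. Below is a minimal sketch of that general pattern. It is not IMDbPY's actual code: the logger name, the function name pick_backend and the module names passed to it are all hypothetical.

```python
import importlib
import logging

logging.basicConfig(format="%(levelname)s [%(name)s] %(message)s")
log = logging.getLogger("example.parser")  # hypothetical logger name


def pick_backend(candidates):
    """Return the first importable module name, warning about each one skipped."""
    for i, name in enumerate(candidates):
        try:
            importlib.import_module(name)
            return name
        except ImportError as exc:
            # The preferred backend is missing: report it and move on.
            log.warning('unable to use "%s": %s', name, exc)
            if i + 1 < len(candidates):
                log.warning('falling back to "%s"', candidates[i + 1])
    raise ImportError("no usable backend among %r" % (candidates,))
```

Under this pattern, calling pick_backend with a missing first choice emits one pair of warnings per call, which would match the repeated pairs in the log above.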

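You can check whether the module is available like this:

  1. In a terminal or command prompt, type python and run it.
  2. Post the first line of the output (for example, "Python 2.6.6 (r266...").
  3. In the interactive prompt, type import lxml
  4. Then try import lxml.html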
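
The same check can also be run as a small script instead of typing the steps by hand. This is only a sketch: the helper name can_import is made up for illustration, and it should be run with the same interpreter that runs the IMDbPY code.

```python
import importlib
import sys


def can_import(module_name):
    """Return True if module_name imports cleanly, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Step 2: show which interpreter version is actually running.
    print(sys.version)
    # Steps 3 and 4: try the package, then the submodule named in the warning.
    for name in ("lxml", "lxml.html"):
        print("%s: %s" % (name, "ok" if can_import(name) else "missing"))
```

If either line reports "missing", lxml is not importable from that interpreter, which is the condition the warning describes.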

For me, it looks like this:

blender@desktop:~$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) 
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxml
>>> import lxml.html
>>> 

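I have the module installed, so there is no output: both imports succeeded. If either import raises ImportError instead, lxml is not installed for that interpreter, which is exactly what the warning reports.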
