urlgrabber error

Asked 2025-04-18 13:17

The code below is meant to fetch an image from a URL, but it fails. For some reason it raises a KeyboardInterrupt, which crashes my script even when I wrap the call in a try/except.

The question is: why does it fail, given that the URL exists?

>>> import urlgrabber
>>> urlgrabber.urlgrab('http://upload.wikimedia.org/wikipedia/en/thumb/e/e0/Passion_Flower.JPG/220px-Passion_Flower.JPG', filename='/home/eran/a.tmp', timeout = 2, retry = 2, reget = 'simple')

This code produces the following traceback:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1098, in _hdr_retrieve
    self.size = int(length)
ValueError: invalid literal for int() with base 10: 'Age, Content-Length, Date, X-Cache, X-Varnish\r\n'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1098, in _hdr_retrieve
    self.size = int(length)
ValueError: invalid literal for int() with base 10: 'Age, Content-Length, Date, X-Cache, X-Varnish\r\n'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 612, in urlgrab
    return default_grabber.urlgrab(url, filename, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 976, in urlgrab
    return self._retry(opts, retryfunc, url, filename)
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 880, in _retry
    r = apply(func, (opts,) + args, {})
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 962, in retryfunc
    fo = PyCurlFileObject(url, filename, opts)
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1056, in __init__
    self._do_open()
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1308, in _do_open
    self._do_grab()
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1438, in _do_grab
    self._do_perform()
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1244, in _do_perform
    raise KeyboardInterrupt
KeyboardInterrupt

1 Answer


Why not use the requests library? I think it is simpler and does what you need. You can install it with:

pip install requests

Then the code is:

>>> import requests
>>> r = requests.get('http://upload.wikimedia.org/wikipedia/en/thumb/e/e0/Passion_Flower.JPG/220px-Passion_Flower.JPG')
>>> if r.status_code == 200:
...     # r.content is bytes, so open the file in binary mode
...     with open('/tmp/flower.jpg', 'wb') as f:
...         f.write(r.content)
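The question also passed a timeout, which a plain requests.get call does not have by default. Below is a minimal sketch of the same download with a timeout, an HTTP status check, and a streamed binary write. The function name download_image, its session parameter, and the 5-second default are assumptions made for illustration, not part of the original answer.

```python
import requests


def download_image(url, dest_path, timeout=5, session=None):
    """Download url to dest_path and return the number of bytes written.

    `session` is a hypothetical hook: any object with a requests-style
    get() method. When it is None, the requests module itself is used.
    """
    http = session if session is not None else requests
    # stream=True avoids loading the whole body into memory at once.
    response = http.get(url, timeout=timeout, stream=True)
    # Raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()

    written = 0
    # Binary mode: image data is bytes, not text.
    with open(dest_path, "wb") as out:
        for chunk in response.iter_content(chunk_size=8192):
            out.write(chunk)
            written += len(chunk)
    return written
```

Calling download_image with the URL from the question and '/tmp/flower.jpg' would write the file and return its size. Network failures and timeouts surface as subclasses of requests.RequestException, which an ordinary try/except can catch.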

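On why the try/except did not help: KeyboardInterrupt derives from BaseException, not Exception, so a handler written as "except Exception:" never sees it. The last frames of the traceback show urlgrabber's own _do_perform raising it after the ValueError in the header callback, so nobody actually pressed Ctrl-C. The sketch below assumes the wrapper in the question used "except Exception" (the question does not show it). fake_grab and guarded are hypothetical names standing in for the real call.

```python
def fake_grab():
    # Hypothetical stand-in for the urlgrabber call in the question:
    # it raises the same exception type that ends the traceback.
    raise KeyboardInterrupt


def guarded(func):
    """Run func and report which handler, if any, caught its exception."""
    try:
        func()
    except Exception:
        return "caught by 'except Exception'"
    except KeyboardInterrupt:
        return "caught by 'except KeyboardInterrupt'"
    return "no exception"


print(issubclass(KeyboardInterrupt, Exception))  # False
print(guarded(fake_grab))  # caught by 'except KeyboardInterrupt'
```

Catching KeyboardInterrupt explicitly around the urlgrabber call would keep the script alive, but it would also swallow a real Ctrl-C during that call, so it is a workaround and not a fix for the header parsing error.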