urlgrabber error
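The code below is meant to fetch an image from a URL, but it fails. For some reason it raises a KeyboardInterrupt, which crashes my script even when I wrap the call in a try/except block.
Why does it fail when the URL clearly exists?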
>>> import urlgrabber
>>> urlgrabber.urlgrab('http://upload.wikimedia.org/wikipedia/en/thumb/e/e0/Passion_Flower.JPG/220px-Passion_Flower.JPG', filename='/home/eran/a.tmp', timeout = 2, retry = 2, reget = 'simple')
This produces the following traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1098, in _hdr_retrieve
    self.size = int(length)
ValueError: invalid literal for int() with base 10: 'Age, Content-Length, Date, X-Cache, X-Varnish\r\n'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1098, in _hdr_retrieve
    self.size = int(length)
ValueError: invalid literal for int() with base 10: 'Age, Content-Length, Date, X-Cache, X-Varnish\r\n'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 612, in urlgrab
    return default_grabber.urlgrab(url, filename, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 976, in urlgrab
    return self._retry(opts, retryfunc, url, filename)
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 880, in _retry
    r = apply(func, (opts,) + args, {})
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 962, in retryfunc
    fo = PyCurlFileObject(url, filename, opts)
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1056, in __init__
    self._do_open()
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1308, in _do_open
    self._do_grab()
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1438, in _do_grab
    self._do_perform()
  File "/usr/local/lib/python2.7/dist-packages/urlgrabber/grabber.py", line 1244, in _do_perform
    raise KeyboardInterrupt
KeyboardInterrupt
1 Answer
Why not use the requests library? I think it is simpler and does what you need. You can install it with:
pip install requests
Then the code is:
>>> import requests
>>> r = requests.get('http://upload.wikimedia.org/wikipedia/en/thumb/e/e0/Passion_Flower.JPG/220px-Passion_Flower.JPG')
>>> if r.status_code == 200:
...     # r.content is bytes, so the file must be opened in binary mode
...     with open('/tmp/flower.jpg', 'wb') as f:
...         f.write(r.content)
...
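As for why wrapping the call in try/except did not help: one likely explanation, assuming the handler was written as `except Exception:`, is that KeyboardInterrupt inherits from BaseException rather than Exception, so that clause never sees it. The sketch below shows this with a hypothetical stand-in, `fake_urlgrab`, which only raises the same exception type as the last line of the traceback; it does not call urlgrabber, and `attempt` is likewise a name invented for illustration.

```python
def fake_urlgrab():
    # Hypothetical stand-in for urlgrabber.urlgrab: raises the same
    # exception type that ends the traceback in the question.
    raise KeyboardInterrupt


def attempt(handler):
    """Report whether an `except handler:` clause catches the exception."""
    try:
        try:
            fake_urlgrab()
        except handler:
            return "caught"
    except BaseException:
        # Reached only when the inner except clause did not match.
        return "escaped"


# KeyboardInterrupt sits outside the Exception branch of the hierarchy.
print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(KeyboardInterrupt, BaseException))  # True

print(attempt(Exception))          # escaped
print(attempt(KeyboardInterrupt))  # caught
```

If you have to keep urlgrabber, naming KeyboardInterrupt explicitly in the except clause around that one call should stop the crash, at the cost of also swallowing a real Ctrl-C pressed during the download.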