/usr/bin/scrapy fails to start
Scrapy no longer starts. I ran a yum update today, and afterwards it stopped working. I tried uninstalling Scrapy and reinstalling it with pip, but that didn't fix it.
The Scrapy version is 0.22.2.
$ uname -a
Linux localhost.localdomain 3.13.9-100.fc19.x86_64 #1 SMP Fri Apr 4 00:51:59 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ python -V
Python 2.7.5
$ scrapy
Traceback (most recent call last):
File "/usr/bin/scrapy", line 4, in <module>
execute()
File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 122, in execute
cmds = _get_commands_dict(settings, inproject)
File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 46, in _get_commands_dict
cmds = _get_commands_from_module('scrapy.commands', inproject)
File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 29, in _get_commands_from_module
for cmd in _iter_command_classes(module):
File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 20, in _iter_command_classes
for module in walk_modules(module_name):
File "/usr/lib/python2.7/site-packages/scrapy/utils/misc.py", line 68, in walk_modules
submod = import_module(fullpath)
File "/usr/lib64/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/usr/lib/python2.7/site-packages/scrapy/commands/deploy.py", line 14, in <module>
from w3lib.form import encode_multipart
File "/usr/lib/python2.7/site-packages/w3lib-1.5-py2.7.egg/w3lib/form.py", line 3, in <module>
if six.PY2:
AttributeError: 'module' object has no attribute 'PY2'
six.PY2?
$ python
Python 2.7.5 (default, Nov 12 2013, 16:18:42)
[GCC 4.8.2 20131017 (Red Hat 4.8.2-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import six
>>> six.PY2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'PY2'
>>> six.PY3
False
>>>
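As far as I can tell, six.PY2 only appeared in six 1.4.0; 1.3.0 defines just PY3, which matches what the session above shows. The flags themselves are trivial; roughly this (my sketch, not the actual six source):
# roughly how six computes these flags; 1.4.0 added PY2, 1.3.0 ships only PY3,
# which is why six.PY2 raises AttributeError above
import sys
PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] == 3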
After removing the reference to six.PY2 from /usr/lib/python2.7/site-packages/w3lib-1.5-py2.7.egg/w3lib/form.py, Scrapy starts again. I changed:
import six
if six.PY2:
    from cStringIO import StringIO as BytesIO
else:
    from io import BytesIO
to:
import six
from cStringIO import StringIO as BytesIO
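In hindsight, a less invasive patch would tolerate both old and new six, something like this sketch (untested against the rest of w3lib):
import six

# fall back when six predates 1.4.0 and has no PY2 flag
if getattr(six, 'PY2', not six.PY3):
    from cStringIO import StringIO as BytesIO
else:
    from io import BytesIO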
However, then running scrapy crawl MySpider failed again:
$ scrapy crawl MySpider
Starting domain scrape
2014-04-10 19:43:39-0400 [scrapy] INFO: Scrapy 0.22.2 started (bot: scrapy_myspider)
2014-04-10 19:43:39-0400 [scrapy] INFO: Optional features available: ssl, http11, boto
2014-04-10 19:43:39-0400 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'scrapy_myspider.spiders', 'CLOSESPIDER_TIMEOUT': 40, 'SPIDER_MODULES': ['scrapy_myspider.spiders'], 'LOG_LEVEL': 'INFO', 'RETRY_ENABLED': False, 'HTTPCACHE_DIR': '/tmp/scrapy_cache', 'HTTPCACHE_ENABLED': True, 'RETRY_TIMES': 1, 'BOT_NAME': 'scrapy_myspider', 'AJAXCRAWL_ENABLED': True, 'CONCURRENT_ITEMS': 400, 'COOKIES_ENABLED': False, 'DOWNLOAD_TIMEOUT': 14}
2014-04-10 19:43:40-0400 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-04-10 19:43:41-0400 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, DefaultHeadersMiddleware, AjaxCrawlMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, ChunkedTransferMiddleware, DownloaderStats, HttpCacheMiddleware
2014-04-10 19:43:41-0400 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-04-10 19:43:41-0400 [scrapy] INFO: Enabled item pipelines:
2014-04-10 19:43:41-0400 [MySpider] INFO: Spider opened
2014-04-10 19:43:41-0400 [MySpider] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-04-10 19:43:41-0400 [MySpider] ERROR: Obtaining request from start requests
Traceback (most recent call last):
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1192, in run
self.mainLoop()
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 1201, in mainLoop
self.runUntilCurrent()
File "/usr/lib64/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/lib/python2.7/site-packages/scrapy/utils/reactor.py", line 41, in __call__
return self._func(*self._a, **self._kw)
--- <exception caught here> ---
File "/usr/lib/python2.7/site-packages/scrapy/core/engine.py", line 111, in _next_request
request = next(slot.start_requests)
File "/usr/lib/python2.7/site-packages/scrapy/spider.py", line 50, in start_requests
yield self.make_requests_from_url(url)
File "/usr/lib/python2.7/site-packages/scrapy/spider.py", line 53, in make_requests_from_url
return Request(url, dont_filter=True)
File "/usr/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 26, in __init__
self._set_url(url)
File "/usr/lib/python2.7/site-packages/scrapy/http/request/__init__.py", line 52, in _set_url
self._url = escape_ajax(safe_url_string(url))
File "/usr/lib/python2.7/site-packages/w3lib-1.5-py2.7.egg/w3lib/url.py", line 52, in safe_url_string
return moves.urllib.parse.quote(s, _safe_chars)
exceptions.AttributeError: '_MovedItems' object has no attribute 'urllib'
2014-04-10 19:43:41-0400 [MySpider] INFO: Closing spider (finished)
2014-04-10 19:43:41-0400 [MySpider] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
'finish_time': datetime.datetime(2014, 4, 10, 23, 43, 41, 120645),
'log_count/ERROR': 1,
'log_count/INFO': 7,
'start_time': datetime.datetime(2014, 4, 10, 23, 43, 41, 114721)}
2014-04-10 19:43:41-0400 [MySpider] INFO: Spider closed (finished)
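This looks like the same stale six: if I'm not mistaken, six.moves.urllib was also only added around six 1.4, so the attribute lookup fails on 1.3.0. It reproduces outside Scrapy (illustrative):
# raises AttributeError on six 1.3.0; prints 'a%20b' on six >= 1.4
from six import moves
print(moves.urllib.parse.quote('a b', ''))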
Does anyone have an idea where to start digging? Is this a yum update problem, or a Python problem? :D
Additional info
$ pip freeze
...
six==1.6.1
...
$ python
>>> import six
>>> six.__file__
'/usr/lib/python2.7/site-packages/six.pyc'
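So pip's metadata claims 1.6.1, but the interpreter imports the copy under /usr/lib/python2.7/site-packages, which (see the yum output below) is 1.3.0. A quick check of both sides of the mismatch (a sketch; which metadata entry wins depends on what is left on disk):
# compare the six that actually imports with what the package metadata claims
import six, pkg_resources
print('imported: %s from %s' % (getattr(six, '__version__', '?'), six.__file__))
print('metadata: %s' % pkg_resources.get_distribution('six').version)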
$ yumdb info python-six
Loaded plugins: langpacks, refresh-packagekit
python-six-1.3.0-1.fc19.noarch
checksum_data = [**redacted**]
checksum_type = sha256
command_line = install python-bugzilla python-requests python-urllib3 python-six
from_repo = fedora
from_repo_revision = 1372417620
from_repo_timestamp = 1372419845
installed_by = 1000
origin_url = http://fedora.mirror.constant.com/linux/releases/19/Everything/x86_64/os/Packages/p/python-six-1.3.0-1.fc19.noarch.rpm
reason = user
releasever = 19
var_uuid = b2714b4a-0654-4c5c-8405-80724410fdde
$ yum info python-six
Loaded plugins: langpacks, refresh-packagekit
Installed Packages
Name : python-six
Arch : noarch
Version : 1.3.0
Release : 1.fc19
Size : 50 k
Repo : installed
From repo : fedora
Summary : Python 2 and 3 compatibility utilities
URL : http://pypi.python.org/pypi/six/
License : MIT
Description : python-six provides simple utilities for wrapping over differences between
: Python 2 and Python 3.
:
: This is the Python 2 build of the module.
More info
$ repoquery -lq python-six
/usr/lib/python2.7/site-packages/six-1.3.0-py2.7.egg-info
/usr/lib/python2.7/site-packages/six.py
/usr/lib/python2.7/site-packages/six.pyc
/usr/lib/python2.7/site-packages/six.pyo
/usr/share/doc/python-six-1.3.0
/usr/share/doc/python-six-1.3.0/LICENSE
/usr/share/doc/python-six-1.3.0/README
/usr/share/doc/python-six-1.3.0/index.rst
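So the RPM owns /usr/lib/python2.7/site-packages/six.py, the very path pip installs to, which would explain how the yum update clobbered a newer pip-installed six with 1.3.0. The winning copy can be located without even importing it (illustrative, Python 2's imp module):
# find the six that "import six" would load, without executing it
import imp
f, path, desc = imp.find_module('six')
if f:
    f.close()
print(path)   # here: /usr/lib/python2.7/site-packages/six.py, the RPM's copy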
Solved?
I did the following:
$ wget http://bitbucket.org/ianb/virtualenv/raw/tip/virtualenv.py
$ python virtualenv.py ~/venv/base
$ echo 'source ~/venv/base/bin/activate' >> ~/.bash_profile
Logged out of the Gnome session, then logged back in.
$ pip install --user scrapy
$ scrapy
Now it runs fine.
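Presumably this works because the virtualenv and the --user site are searched before /usr/lib/python2.7/site-packages, so the freshly installed six shadows the RPM's 1.3.0. To double-check with the venv's interpreter (a sketch; paths depend on the venv created above):
# run with ~/venv/base/bin/python: both paths should point into the venv
import sys, six
print(sys.prefix)
print(six.__file__)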
Solved x2, see below.
1 Answer
Solved
Uninstalled scrapy:
$ sudo pip uninstall scrapy
Removed the source ~/venv/base/bin/activate line from ~/.bash_profile.
Then I got the following error:
$ pip install --user scrapy
The temporary folder for building (/tmp/pip-build-dave) is not owned by your user!
pip will not work until the temporary folder is either deleted or owned by your user account.
Traceback (most recent call last):
File "/usr/bin/pip", line 9, in <module>
load_entry_point('pip==1.3.1', 'console_scripts', 'pip')()
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 351, in load_entry_point
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2363, in load_entry_point
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2088, in load
File "/usr/lib/python2.7/site-packages/pip/__init__.py", line 9, in <module>
from pip.util import get_installed_distributions, get_prog
File "/usr/lib/python2.7/site-packages/pip/util.py", line 15, in <module>
from pip.locations import site_packages, running_under_virtualenv, virtualenv_no_global
File "/usr/lib/python2.7/site-packages/pip/locations.py", line 64, in <module>
build_prefix = _get_build_prefix()
File "/usr/lib/python2.7/site-packages/pip/locations.py", line 54, in _get_build_prefix
raise pip.exceptions.InstallationError(msg)
pip.exceptions.InstallationError: The temporary folder for building (/tmp/pip-build-dave) is not owned by your user!
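The /tmp/pip-build-dave directory was presumably left behind, owned by root, by the earlier sudo pip run, and pip 1.3.1 refuses to reuse a build directory it does not own, as the message says. A quick ownership check along those lines (illustrative):
# see who owns pip's per-user build directory (path from the error above)
import os, pwd
st = os.stat('/tmp/pip-build-dave')
print(pwd.getpwuid(st.st_uid).pw_name)   # 'root' if it was created under sudo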
So...
$ sudo rm -rf /tmp/pip-build-dave
...
$ pip install --user scrapy
$ scrapy
It works now! And thanks to cdunklau and ε in #python on Freenode for the help! =)