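I have successfully created a spider with Scrapy that exports to CSV and downloads images into the images/full folder.
Now I want to clean up after the crawl: pull the files into a zip archive and delete both the "full" folder and the CSV.
This is how I did it: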
parser_attributes.py:
    # -*- coding: utf-8 -*-
    # interpret attributes
    def gender(i):
        switcher={
            'damen & herren' : 1,
            'herren, unisex' : 1,
            'unisex' : 1,
            'damen' : 2,
            'herren' : 6
        }
        for k, v in switcher.items():
            if k.lower() in i.lower():
                return v
        return "Invalid: " + i
test.py:
[...]
Traceback:
    scrapy crawl test -o csv/181201_test.csv -t csv
    Traceback (most recent call last):
      File "/usr/local/bin/scrapy", line 11, in <module>
        sys.exit(execute())
      File "/usr/local/lib/python3.7/site-packages/scrapy/cmdline.py", line 149, in execute
        cmd.crawler_process = CrawlerProcess(settings)
      File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 249, in __init__
        super(CrawlerProcess, self).__init__(settings)
      File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 137, in __init__
        self.spider_loader = _get_spider_loader(settings)
      File "/usr/local/lib/python3.7/site-packages/scrapy/crawler.py", line 336, in _get_spider_loader
        return loader_cls.from_settings(settings.frozencopy())
      File "/usr/local/lib/python3.7/site-packages/scrapy/spiderloader.py", line 61, in from_settings
        return cls(settings)
      File "/usr/local/lib/python3.7/site-packages/scrapy/spiderloader.py", line 25, in __init__
        self._load_all_spiders()
      File "/usr/local/lib/python3.7/site-packages/scrapy/spiderloader.py", line 47, in _load_all_spiders
        for module in walk_modules(name):
      File "/usr/local/lib/python3.7/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
        submod = import_module(fullpath)
      File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/importlib/__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
      File "<frozen importlib._bootstrap>", line 983, in _find_and_load
      File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 728, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/Users/user/test_crawl/bid/bid/spiders/test.py", line 96, in <module>
        cleanup('test')
      File "/Users/user/test_crawl/bid/bid/spiders/test.py", line 84, in cleanup
        shutil.make_archive(filename, 'zip', imagefolder)
      File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/shutil.py", line 792, in make_archive
        os.chdir(root_dir)
    FileNotFoundError: [Errno 2] No such file or directory: '/Users/user/test_crawl/bid/images/full'
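There are two main ways to run a Scrapy spider: with the `scrapy crawl` command, or from a script.
Your code tries to mix the two, which doesn't work. As the traceback shows, `cleanup('test')` is called at module level in test.py, so it runs while Scrapy is still importing the spider module, before any crawl has started and before images/full exists.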
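As a rough illustration of running the spider from a script and cleaning up only once the crawl is done, here is a minimal sketch. The function names `cleanup` and `main` and the archive name are hypothetical; the paths are taken from the command and traceback in the question. It assumes the script lives in the Scrapy project root, that the CSV feed is configured in the project settings (the `-o` flag is not available here), and that only the images go into the archive, since the question does not say whether the CSV should be archived as well.

```python
import os
import shutil


def cleanup(image_dir, csv_path, archive_base):
    """Zip image_dir into archive_base + '.zip', then delete image_dir and csv_path."""
    archive = shutil.make_archive(archive_base, "zip", image_dir)
    shutil.rmtree(image_dir)
    if os.path.exists(csv_path):
        os.remove(csv_path)
    return archive


def main():
    # Imported lazily so that cleanup() stays usable without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl("test")  # spider name, as in `scrapy crawl test`
    process.start()        # blocks until the crawl has finished

    # Only reached after the crawl, so images/full exists if anything was downloaded.
    cleanup("images/full", "csv/181201_test.csv", "181201_test")
```

Calling `main()` at the end of such a script and starting it with `python` instead of `scrapy crawl` keeps the cleanup out of the spider module, so nothing runs at import time.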
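I can think of two ways to do what you want:
1. [...]
2. [...] handle the zipping/deleting logic in the `close_spider` method

The former is probably simpler, but the latter avoids the need to zip and delete the files after the crawl process has finished.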
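The `close_spider` approach could look roughly like the sketch below. The class name `ZipCleanupPipeline` and its constructor parameters are hypothetical; the sketch assumes it is registered as an item pipeline through the `ITEM_PIPELINES` setting, since Scrapy calls a pipeline's `close_spider(spider)` method when the spider closes. It deliberately leaves the CSV alone, on the assumption that the feed exporter may not have finished writing it when this method runs.

```python
import os
import shutil


class ZipCleanupPipeline:
    """Hypothetical item pipeline: archive downloaded images when the spider closes."""

    def __init__(self, image_dir="images/full", archive_dir="."):
        self.image_dir = image_dir
        self.archive_dir = archive_dir

    def process_item(self, item, spider):
        # Items pass through unchanged; this pipeline only acts on shutdown.
        return item

    def close_spider(self, spider):
        # Called by Scrapy once the spider has finished crawling.
        if not os.path.isdir(self.image_dir):
            return  # nothing was downloaded, so there is nothing to archive
        base = os.path.join(self.archive_dir, spider.name)
        shutil.make_archive(base, "zip", self.image_dir)  # writes <spider name>.zip
        shutil.rmtree(self.image_dir)
```

With this in place the spider can still be started with `scrapy crawl test -o csv/181201_test.csv -t csv`; the default paths are relative to the directory the command is run from.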
You just need to use the correct absolute import path.
[...]
But the same could probably be achieved with: