Scrapy crawl runs the wrong spider

Published 2024-04-26 07:09:50


In *scrapy crawl [spider-name] error*, the OP says:

In the spider folder of my project I have two spiders named spider1 and spider2. Now when I write the command scrapy crawl spider1 in my root project folder it calls spider2.py instead of spider1.py. When I delete spider2.py from my project, it calls spider1.py.

I have experienced exactly the same behavior and used exactly the same workaround. The responses to that OP all boiled down to deleting all the .pyc files.

I have cleaned spider1.pyc, spider2.pyc and __init__.pyc. Now when I run scrapy crawl spider1 in the root folder of my project it actually runs spider2.py, but a spider1.pyc file is generated instead of spider2.pyc.

I have seen this behavior too.
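The .pyc cleanup that those answers recommend can be scripted rather than done by hand. A minimal sketch, assuming you run it from the project root (the `clean_pyc` helper and its default path are mine, not from any answer):

```python
from pathlib import Path

def clean_pyc(root="."):
    """Delete every stale .pyc bytecode file under root, including
    those hidden inside __pycache__ directories."""
    removed = []
    for pyc in Path(root).rglob("*.pyc"):
        pyc.unlink()                 # remove the compiled bytecode file
        removed.append(str(pyc))
    return removed
```

Running this from the project root before `scrapy crawl` rules out stale bytecode as the culprit.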

But the docs say nothing about any of these problems or workarounds: https://doc.scrapy.org/en/latest/intro/tutorial.html

"name: identifies the Spider. It must be unique within a project, that is, you can't set the same name for different Spiders."

https://doc.scrapy.org/en/1.0/topics/spiders.html#scrapy.spiders.Spider "name: A string which defines the name for this spider. The spider name is how the spider is located (and instantiated) by Scrapy, so it must be unique. However, nothing prevents you from instantiating more than one instance of the same spider. This is the most important spider attribute and it's required."

That makes sense: the name is how Scrapy knows which spider to run. But it isn't working, so what's missing? Thanks.
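The docs' point is that the `name` attribute, not the file name, is the lookup key. A rough pure-Python analogy of how a loader can map names to spider classes (the classes and `build_registry` helper here are illustrative stand-ins, not real Scrapy code):

```python
class Spider1:
    name = "spider1"

class Spider2:
    name = "spider2"

def build_registry(spider_classes):
    """Map each class's `name` attribute to the class itself, rejecting
    duplicates -- mirroring the rule that names must be unique per project."""
    registry = {}
    for cls in spider_classes:
        if cls.name in registry:
            raise ValueError("duplicate spider name: %s" % cls.name)
        registry[cls.name] = cls
    return registry

registry = build_registry([Spider1, Spider2])
# `scrapy crawl spider1` conceptually resolves the argument like this:
assert registry["spider1"] is Spider1
```

If names were not unique, the lookup would be ambiguous, which is why the docs make uniqueness a hard requirement.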

Edit: Okay, it happened again. Here is my traceback:

(aishah) malikarumi@Tetuoan2:~/Projects/aishah/acquire$ scrapy crawl crawl_h4
Traceback (most recent call last):
  File "/home/malikarumi/Projects/aishah/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/home/malikarumi/Projects/aishah/lib/python3.5/site-packages/scrapy/cmdline.py", line 141, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/home/malikarumi/Projects/aishah/lib/python3.5/site-packages/scrapy/crawler.py", line 238, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/home/malikarumi/Projects/aishah/lib/python3.5/site-packages/scrapy/crawler.py", line 129, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/home/malikarumi/Projects/aishah/lib/python3.5/site-packages/scrapy/crawler.py", line 325, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/home/malikarumi/Projects/aishah/lib/python3.5/site-packages/scrapy/spiderloader.py", line 33, in from_settings
    return cls(settings)
  File "/home/malikarumi/Projects/aishah/lib/python3.5/site-packages/scrapy/spiderloader.py", line 20, in __init__
    self._load_all_spiders()
  File "/home/malikarumi/Projects/aishah/lib/python3.5/site-packages/scrapy/spiderloader.py", line 28, in _load_all_spiders
    for module in walk_modules(name):
  File "/home/malikarumi/Projects/aishah/lib/python3.5/site-packages/scrapy/utils/misc.py", line 71, in walk_modules
    submod = import_module(fullpath)
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 661, in exec_module
  File "<frozen importlib._bootstrap_external>", line 767, in get_code
  File "<frozen importlib._bootstrap_external>", line 727, in source_to_code
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/malikarumi/Projects/aishah/acquire/acquire/spiders/crawl_h3.py", line 19
    (follow=True, callback='parse_item'),))
               ^
SyntaxError: invalid syntax

Please note: I called crawl_h4 and got crawl_h3. I left crawl_h3 as it was, syntax error included, so that I'd have something to compare against when refactoring. The syntax error is not in crawl_h4.

Settings are unchanged from the defaults. The docs also say: "Arguments provided in the command line are the ones that take most precedence, overriding any other options. You can explicitly override one (or more) settings using the -s (or --set) command line option." https://doc.scrapy.org/en/latest/topics/settings.html#topics-settings

I see a line in the traceback about frozencopy. The docs discuss using this to make settings immutable: https://doc.scrapy.org/en/latest/topics/api.html. I don't know what its use case is, but I didn't choose it, and I don't know how to un-choose it if that is the problem.
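For what it's worth, `frozencopy()` is not something you opt into: Scrapy freezes a copy of the settings before handing it to the spider loader so nothing can mutate them mid-run, and it is unrelated to which spider gets picked. Conceptually it behaves like this rough sketch (a pure-Python analogy, not Scrapy's actual class):

```python
class FrozenSettings(dict):
    """Rough analogy only: a mapping that refuses modification after creation,
    loosely like what Scrapy's Settings.frozencopy() hands to the spider loader."""

    def __setitem__(self, key, value):
        # A real frozen Settings object rejects all mutation, not just item
        # assignment; one method is enough to illustrate the idea.
        raise TypeError("Trying to modify an immutable Settings object")

settings = FrozenSettings({"BOT_NAME": "acquire"})  # "acquire" taken from the project path above
```

Reads succeed as usual; any attempt to write raises, which is exactly the point of freezing.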


1 Answer

#1 · Posted 2024-04-26 07:09:50

You get a syntax error from one of your spiders even though that spider isn't the one being run. I assume Scrapy compiles all of your spiders even when you only want to run one of them. Just because it catches an error in another spider doesn't mean it isn't running the spider you called. I've had a similar experience: Scrapy caught errors in a spider I wasn't trying to run, but it still ran the one I wanted. Fix your syntax error, then verify which spider actually ran by some independent means, such as a print statement or scraped data that differs from the other spider's.
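The answer's claim matches the traceback: `_load_all_spiders` calls `walk_modules`, which imports every module in the spiders package, so a syntax error in any one file aborts loading before the requested spider is even reached. A self-contained sketch of that behavior, using `compile()` on throwaway files as a stand-in for the import (the temp files and `load_all` helper are mine):

```python
import pathlib
import tempfile

def load_all(spider_dir):
    """Mimic walk_modules(): touch every module in the spiders package.
    Compiling stands in for importing; the first bad file aborts the walk."""
    for path in sorted(pathlib.Path(spider_dir).glob("*.py")):
        compile(path.read_text(), str(path), "exec")  # raises SyntaxError on bad source

# Two throwaway "spiders" in a temp folder (names echo the question, contents are made up):
d = pathlib.Path(tempfile.mkdtemp())
(d / "crawl_h3.py").write_text("(follow=True, callback='parse_item'),))\n")  # broken, like the traceback
(d / "crawl_h4.py").write_text("name = 'crawl_h4'\n")                        # perfectly fine

try:
    load_all(d)
except SyntaxError:
    print("crawl_h3's error surfaces even though crawl_h4 was the target")
```

This is why deleting or fixing crawl_h3.py makes the `scrapy crawl crawl_h4` failure disappear: the error was never about which spider was selected, only about what had to be imported first.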
