Cannot import the scrapy module as a library

0 votes

1 answer

675 views

Asked 2025-04-18 03:03

I'm trying to run a spider from a Python script, following the Scrapy documentation: http://doc.scrapy.org/en/latest/topics/practices.html

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from testspiders.spiders.followall import FollowAllSpider
from scrapy.utils.project import get_project_settings

spider = FollowAllSpider(domain='scrapinghub.com')
settings = get_project_settings()
crawler = Crawler(settings)
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here until the spider_closed signal was sent

But Python simply cannot import the module. The error looks roughly like this:

Traceback (most recent call last):
...
    from scrapy.crawler import Crawler
  File "aappp/scrapy.py", line 1, in <module>
ImportError: No module named crawler

The FAQ in the Scrapy documentation touches on this problem briefly, but it didn't help me much.


1 Answer

Have you tried this?

from scrapy.project import crawler

(This is how it is done at http://doc.scrapy.org/en/latest/faq.html; it looks like they have already answered your question there.)
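One more thing that may be worth checking: the traceback in the question shows the import passing through `aappp/scrapy.py`, which suggests (an assumption, going only by that line) that a local file named scrapy.py is being picked up instead of the installed package. A minimal sketch for checking where a module name resolves; the helper names `locate_module` and `local_shadow` are made up for illustration and are not part of Scrapy:

```python
import importlib.util
import os


def locate_module(name):
    """Return the file a top-level module name resolves to, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def local_shadow(name, directory="."):
    """Return the path of a <name>.py file in directory, or None if there is none.

    Such a file can take precedence over an installed package of the same name
    when the script is run from that directory.
    """
    candidate = os.path.join(directory, name + ".py")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    print("scrapy resolves to:", locate_module("scrapy"))
    print("local file that could shadow it:", local_shadow("scrapy"))
```

If the first line prints a path inside your own project rather than site-packages, renaming that file (and deleting any leftover scrapy.pyc next to it) would be the thing to try.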

The FAQ also describes a newer approach and says the previous one is deprecated:

“This way of accessing the crawler object is deprecated; the code should be ported to use the from_crawler class method, for example:

class SomeExtension(object):

    @classmethod
    def from_crawler(cls, crawler):
        o = cls()
        o.crawler = crawler
        return o

”
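To see the from_crawler pattern run on its own, here is a small sketch that does not need Scrapy installed. `FakeCrawler` and the `BOT_NAME` value are hypothetical stand-ins for the crawler object that Scrapy would normally pass in; only the `from_crawler` body mirrors the FAQ snippet.

```python
class FakeCrawler(object):
    """Hypothetical stand-in for the crawler object Scrapy would supply."""

    def __init__(self, settings):
        self.settings = settings


class SomeExtension(object):

    def __init__(self):
        self.crawler = None

    @classmethod
    def from_crawler(cls, crawler):
        # Build the instance, then keep a reference to the crawler on it
        o = cls()
        o.crawler = crawler
        return o


# The framework (here, our stand-in) hands the crawler to the class method
ext = SomeExtension.from_crawler(FakeCrawler({"BOT_NAME": "demo"}))
print(ext.crawler.settings["BOT_NAME"])
```

The point of the pattern is that the extension receives the crawler as an argument rather than importing a global one.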

Answered 2025-04-18 by Python大师
