Running Scrapy from a Python script

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log
from testspiders.spiders.followall import FollowAllSpider

spider = FollowAllSpider(domain='scrapinghub.com')
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here

2 answers

User

Answer 1 · edited 2024-04-25 00:19:40

Just import your spider and pass it to crawler.crawl(), like this:

from testspiders.spiders.spider_a import MySpider

spider = MySpider()
crawler.crawl(spider)  # crawler is the Crawler created in the script above

User

Answer 2 · edited 2024-04-25 00:19:40

In Scrapy 0.19.x (it may also work with older versions), you can do the following:

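from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings
from testspiders.spiders.followall import FollowAllSpider

spider = FollowAllSpider(domain='scrapinghub.com')
settings = get_project_settings()  # use the project's settings.py
crawler = Crawler(settings)
# stop the reactor when the spider closes, so reactor.run() returns
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here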
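The `signals.connect(reactor.stop, ...)` line is what lets the script exit: without it, nothing stops the reactor and `reactor.run()` never returns. In Scrapy 1.0 and later, `scrapy.crawler.CrawlerProcess` sets up logging and starts the reactor itself, stops it once all scheduled crawls finish, and its `crawl()` takes the spider class rather than an instance. A minimal sketch, assuming a newer Scrapy and the same `testspiders` project on the import path; `run_followall` is a hypothetical helper name:

```python
def run_followall(domain: str = "scrapinghub.com") -> None:
    """Run FollowAllSpider once and return when the crawl finishes.

    Sketch for Scrapy >= 1.0. Imports are deferred so this module can be
    imported even where Scrapy or the testspiders project is not installed.
    """
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from testspiders.spiders.followall import FollowAllSpider

    # CrawlerProcess configures logging and owns the Twisted reactor,
    # so no log.start() or reactor.run() is needed here.
    process = CrawlerProcess(get_project_settings())
    # Pass the spider *class*; keyword arguments go to its constructor.
    process.crawl(FollowAllSpider, domain=domain)
    # Blocks until every scheduled crawl is done, then stops the reactor.
    process.start()
```

Call `run_followall()` from the script's entry point. The Twisted reactor cannot be restarted within one process, so schedule every spider with `process.crawl()` before the single `process.start()` call.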

You can even run the command directly from a script:

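A minimal sketch of invoking the `scrapy crawl` command from Python, assuming Scrapy's `scrapy.cmdline.execute()`, which takes an argv-style list (program name first) and calls `sys.exit()` when the command finishes, so it should be the last thing the script does. `followall` is assumed to be the spider's `name`, the script is assumed to run inside the Scrapy project directory (`crawl` is a project-only command), and `crawl_argv` / `run_from_cmdline` are hypothetical helpers:

```python
def crawl_argv(spider_name: str) -> list[str]:
    """Build the argv list that `scrapy crawl <spider_name>` would receive."""
    # argv[0] is the program name; execute() reads the command after it.
    return ["scrapy", "crawl", spider_name]


def run_from_cmdline(spider_name: str = "followall") -> None:
    """Run a project spider as if typed at the shell (hypothetical helper)."""
    from scrapy import cmdline  # deferred so this file imports without Scrapy

    # execute() runs the command and then calls sys.exit(), so any code
    # placed after this call will not run.
    cmdline.execute(crawl_argv(spider_name))
```

`crawl_argv("followall")` produces the same list as `"scrapy crawl followall".split()`; building it explicitly avoids surprises if a spider argument ever contains spaces.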

See my answer here. I changed the official documentation, so your crawler now uses your project settings and can produce output.
