How to stop the reactor after both spiders have finished
I have this code, and even after both spiders have finished, the program is still running.
#!C:\Python27\python.exe
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from carrefour.spiders.tesco import TescoSpider
from carrefour.spiders.carr import CarrSpider
from scrapy.utils.project import get_project_settings
import threading
import time

def tescofcn():
    tescoSpider = TescoSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(tescoSpider)
    crawler.start()

def carrfcn():
    carrSpider = CarrSpider()
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.configure()
    crawler.crawl(carrSpider)
    crawler.start()

# start each crawler from its own thread, then run the shared reactor
t1 = threading.Thread(target=tescofcn)
t2 = threading.Thread(target=carrfcn)
t1.start()
t2.start()
log.start()
reactor.run()
I tried inserting this into both functions:

crawler.signals.connect(reactor.stop, signal=signals.spider_closed)

but the result was that the faster spider stopped the reactor for both, and the slower spider was killed before it had finished.
1 Answer
You can create a function that checks the list of running spiders and connect it to signals.spider_closed.
from scrapy.utils.trackref import iter_all

def close_reactor_if_no_spiders():
    # iter_all('Spider') yields every spider instance that is still alive
    running_spiders = [spider for spider in iter_all('Spider')]
    if not running_spiders:
        reactor.stop()

crawler.signals.connect(close_reactor_if_no_spiders, signal=signals.spider_closed)
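For completeness, here is one way this could be wired into the script from the question (a minimal sketch, assuming the same old-style Crawler API the question uses; the run_spider helper is hypothetical, not from the original post):

import threading
from twisted.internet import reactor
from scrapy import signals
from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings
from scrapy.utils.trackref import iter_all
from carrefour.spiders.tesco import TescoSpider
from carrefour.spiders.carr import CarrSpider

def close_reactor_if_no_spiders():
    # stop the reactor only once no live spider objects remain
    if not list(iter_all('Spider')):
        reactor.stop()

def run_spider(spider_cls):
    # hypothetical helper replacing tescofcn/carrfcn: every crawler
    # connects the shared callback before it starts
    crawler = Crawler(get_project_settings())
    crawler.signals.connect(close_reactor_if_no_spiders,
                            signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider_cls())
    crawler.start()

threading.Thread(target=run_spider, args=(TescoSpider,)).start()
threading.Thread(target=run_spider, args=(CarrSpider,)).start()
reactor.run()

This way every crawler registers the same callback, and the reactor is only stopped after the last spider closes.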
That said, I would still recommend using scrapyd to manage the running of multiple spiders.
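With scrapyd, each spider runs as a separately managed job, so there is no shared reactor to stop by hand. A minimal sketch of scheduling both spiders through scrapyd's schedule.json endpoint (assuming a default scrapyd instance on localhost:6800, a deployed project named carrefour, and spider name attributes 'tesco' and 'carr', which are guesses based on the module paths above):

import requests

for spider in ('tesco', 'carr'):
    # schedule.json queues one job per call; scrapyd starts and stops it
    resp = requests.post('http://localhost:6800/schedule.json',
                         data={'project': 'carrefour', 'spider': spider})
    print(resp.json())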