<p>A better solution (if you have several spiders) is to discover the spiders dynamically and run them one after another.</p>
<pre><code>from scrapy import spiderloader
from scrapy.crawler import CrawlerRunner
from scrapy.utils import project
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks

settings = project.get_project_settings()
runner = CrawlerRunner(settings)

@inlineCallbacks
def crawl():
    # Discover every spider registered in the project, load its class,
    # and crawl them sequentially before stopping the reactor.
    spider_loader = spiderloader.SpiderLoader.from_settings(settings)
    spiders = spider_loader.list()
    classes = [spider_loader.load(name) for name in spiders]
    for my_spider in classes:
        yield runner.crawl(my_spider)
    reactor.stop()

crawl()
reactor.run()
</code></pre>
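<p>The discovery step that <code>SpiderLoader</code> performs (scan a namespace, collect spider classes, index them by their <code>name</code> attribute) can be sketched with the standard library alone. The spider classes below are hypothetical stand-ins, not part of Scrapy; in a real project they would live in the modules named by the <code>SPIDER_MODULES</code> setting.</p>

```python
import inspect

# Hypothetical stand-ins for real spider classes.
class BaseSpider:
    name = None

class NewsSpider(BaseSpider):
    name = "news"

class ShopSpider(BaseSpider):
    name = "shop"

def list_spiders(namespace):
    """Mimic SpiderLoader.list()/load(): collect every BaseSpider
    subclass in the namespace, keyed by its `name` attribute."""
    found = {}
    for obj in namespace.values():
        if inspect.isclass(obj) and issubclass(obj, BaseSpider) and obj is not BaseSpider:
            found[obj.name] = obj
    return found

spiders = list_spiders(globals())
for name in sorted(spiders):
    print("would run spider %s" % name)  # prints "news" then "shop"
```

<p>Scrapy's real loader does the same walk over the configured spider modules, which is why the snippet above can crawl spiders it never imports explicitly.</p>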
<p><strong>(Second solution):</strong>
Because <code>spiders.list()</code> is deprecated as of Scrapy 1.4, Yuda's solution should be converted to something like:</p>
<pre><code>from scrapy import spiderloader
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
process = CrawlerProcess(settings)
spider_loader = spiderloader.SpiderLoader.from_settings(settings)
for spider_name in spider_loader.list():
    print("Running spider %s" % spider_name)
    process.crawl(spider_name)
process.start()
</code></pre>