<p>The preferred way to run a Scrapy application as a script is described in the <a href="https://docs.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script" rel="nofollow noreferrer">docs</a>.<br/>
You can use the built-in <a href="https://docs.scrapy.org/en/latest/topics/feed-exports.html" rel="nofollow noreferrer">feed exporters</a>.<br/>
In your case the solution looks like this (for Scrapy version 2.1):</p>
<pre><code>import scrapy
from scrapy.crawler import CrawlerProcess


class TCSpider(scrapy.Spider):
    name = "techcrunch"

    def start_requests(self):
        urls = [
            "https://techcrunch.com/",
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        SET_SELECTOR = ".post-block__title"
        for data in response.css(SET_SELECTOR):
            TITLE_SELECTOR = "a ::text"
            URL_SELECTOR = "a ::attr(href)"
            yield {
                'title': data.css(TITLE_SELECTOR).extract_first(),
                'url': data.css(URL_SELECTOR).extract_first(),
            }


process = CrawlerProcess(settings={
    "FEEDS": {
        "items.json": {"format": "json"},
        # "items.jl": {"format": "jsonlines"},
    },
})
process.crawl(TCSpider)
process.start()
</code></pre>
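<p>Once <code>process.start()</code> returns, the <code>FEEDS</code> setting will have written the yielded items to <code>items.json</code> as a JSON array, which you can read back with the standard <code>json</code> module. A minimal sketch of that post-crawl step (the sample data here is hypothetical, written out only so the snippet is self-contained; in practice the file is produced by the crawl above):</p>
<pre><code>import json

# Hypothetical stand-in for the feed file the crawl would produce.
sample_items = [
    {"title": "Example post", "url": "https://techcrunch.com/example"},
]
with open("items.json", "w") as f:
    json.dump(sample_items, f)

# Reading the exported feed back in after the crawl.
with open("items.json") as f:
    items = json.load(f)

print(items[0]["title"])
</code></pre>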