Based on https://github.com/scrapy/quotesbot/blob/master/quotesbot/spiders/toscrapexpath.py: data is not passed along when using yield Request

# -*- coding: utf-8 -*-
import scrapy
from quotesbot.items import MyItems
from scrapy import Request


class ToScrapeSpiderXPath(scrapy.Spider):
    name = 'toscrape-xpath'
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        item = MyItems()
        for quote in response.xpath('//div[@class="quote"]'):
            item['tinfo'] = quote.xpath('./span[@class="text"]/text()').extract_first()
            yield item

But then when I modify the code as below:

# -*- coding: utf-8 -*-
import scrapy
from quotesbot.items import MyItems
from scrapy import Request


class ToScrapeSpiderXPath(scrapy.Spider):
    name = 'toscrape-xpath'
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        item = MyItems()
        for quote in response.xpath('//div[@class="quote"]'):
            item['tinfo'] = quote.xpath('./span[@class="text"]/text()').extract_first()
            yield Request("http://quotes.toscrape.com/", callback=self.parse2, meta={'item': item})

    def parse2(self, response):
        item = response.meta['item']
        yield item

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class QuotesbotItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pass


class MyItems(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    tinfo = scrapy.Field()
    pass

1 answer

User

Answer #1 · posted 2024-04-24 11:03:23

Your spider's logic is confused:

def parse(self, response):
    for quote in response.xpath('//div[@class="quote"]'):
        yield Request("http://quotes.toscrape.com/",
                      callback=self.parse2, meta={'item': item})

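For every quote you find on quotes.toscrape.com, you schedule another request to the same web page? What happens is that these newly scheduled requests get filtered out by Scrapy's duplicate request filter.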
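Conceptually, the duplicate filter keeps a set of fingerprints of requests it has already scheduled: a request whose fingerprint is already in the set is dropped, unless it was created with `dont_filter=True`. The toy model below is only a sketch of that idea. It assumes a fingerprint made of just the HTTP method and the exact URL string (Scrapy's real fingerprint also covers the request body and canonicalises the URL), and `ToyDupeFilter` and `schedule` are hypothetical names, not Scrapy APIs.

```python
class ToyDupeFilter:
    """Hypothetical, simplified duplicate filter (not Scrapy's RFPDupeFilter)."""

    def __init__(self):
        # Assumed fingerprint: a (method, url) tuple. Scrapy's real fingerprint
        # also hashes the body and canonicalises the URL.
        self.seen = set()

    def request_seen(self, method, url):
        """Return True if this request was seen before; otherwise record it."""
        fingerprint = (method.upper(), url)
        if fingerprint in self.seen:
            return True
        self.seen.add(fingerprint)
        return False


def schedule(requests, dupefilter):
    """Return the (method, url) pairs that survive the filter, in order.

    Each request is a (method, url, dont_filter) tuple. A request with
    dont_filter=True skips the filter check entirely.
    """
    scheduled = []
    for method, url, dont_filter in requests:
        if not dont_filter and dupefilter.request_seen(method, url):
            continue  # dropped as a duplicate
        scheduled.append((method, url))
    return scheduled


# Ten quotes on the page -> ten identical follow-up requests from parse().
follow_ups = [("GET", "http://quotes.toscrape.com/", False)] * 10
print(len(schedule(follow_ups, ToyDupeFilter())))  # 1: the other nine are dropped

forced = [("GET", "http://quotes.toscrape.com/", True)] * 10
print(len(schedule(forced, ToyDupeFilter())))  # 10: dont_filter bypasses the filter
```

In this simplified model the very first follow-up still passes because nothing has recorded that URL yet; the point is that every repeat of an already-seen request is discarded. In real Scrapy, passing `dont_filter=True` to `scrapy.Request` is the way to opt a single request out of the duplicate filter.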

Maybe you should just yield the items here instead:

def parse(self, response):
    for quote in response.xpath('//div[@class="quote"]'):
        item = MyItems()
        item['tinfo'] = quote.xpath('./span[@class="text"]/text()').extract_first()
        yield item
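Creating the item inside the loop, as above, also matters on its own. In the question's code a single `MyItems()` object is created before the loop and overwritten on every iteration, so anything that still holds a reference to it after the loop has moved on (such as the `meta` dict of a request that is downloaded later) sees only the last value written. A minimal sketch of the difference, assuming plain dicts as a stand-in for `scrapy.Item` (which supports the same `item['field'] = value` assignment); the function names are hypothetical:

```python
def collect_shared(texts):
    """One object created before the loop and mutated each time (question's pattern)."""
    item = {}
    kept = []
    for text in texts:
        item["tinfo"] = text
        kept.append(item)  # every entry is a reference to the same dict
    return kept


def collect_fresh(texts):
    """A new object per iteration (the pattern in the parse() above)."""
    kept = []
    for text in texts:
        item = {"tinfo": text}  # fresh dict, so earlier entries are untouched
        kept.append(item)
    return kept


quotes = ["first quote", "second quote", "third quote"]
print(collect_shared(quotes))
# [{'tinfo': 'third quote'}, {'tinfo': 'third quote'}, {'tinfo': 'third quote'}]
print(collect_fresh(quotes))
# [{'tinfo': 'first quote'}, {'tinfo': 'second quote'}, {'tinfo': 'third quote'}]
```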

To illustrate why your current spider does nothing, see this diagram:
