Recursive crawling

class BoxSpider(scrapy.Spider):
    name = "mag"
    start_urls = [
        "http://www.example.com/index.html"
    ]

    def secondPage(self, response):
        secondPageItem = CinemasItem()
        secondPageItem['trailer'] = 'trailer'
        secondPageItem['synopsis'] = 'synopsis'
        yield secondPageItem

    def parse(self, response):
        for sel in response.xpath('//*[@id="conteudoInternas"]/ul/li'):
            item = CinemasItem()
            item['title'] = 'title'
            item['room'] = 'room'
            item['mclass'] = 'mclass'
            item['minAge'] = 'minAge'
            item['cover'] = 'cover'
            item['sessions'] = 'sessions'

            secondUrl = sel.xpath('p[1]/a/@href').extract()[0]

            yield item
            yield scrapy.Request(url=secondUrl, callback=self.secondPage)

1 answer

User

#1 · Posted on 2024-04-19 17:50:41

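You need to pass the item instantiated in parse() to the secondPage callback inside the request's meta, and yield the item only once, from secondPage, after all of its fields are filled in: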

def parse(self, response):
    for sel in response.xpath('//*[@id="conteudoInternas"]/ul/li'):
        item = CinemasItem()
        item['title'] = 'title'
        item['room'] = 'room'
        item['mclass'] = 'mclass'
        item['minAge'] = 'minAge'
        item['cover'] = 'cover'
        item['sessions'] = 'sessions'

        secondUrl = sel.xpath('p[1]/a/@href').extract()[0]

        # see: we are passing the item inside the meta
        yield scrapy.Request(url=secondUrl, meta={'item': item}, callback=self.secondPage)

def secondPage(self, response):
    # see: we are getting the item from meta
    item = response.meta['item']

    item['trailer'] = 'trailer'
    item['synopsis'] = 'synopsis'
    yield item

See also:

Passing additional data to callback functions.
