Why is my custom callback not called, while parse is called, when I yield a Request?

0 votes
1 answer
1018 views
Asked 2025-04-18 05:18

I want to follow the pagination links on this web page, so I wrote the following spider:

pageNav.py:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request

class pageNaviSpider(Spider):
    name = 'navi'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        self.parseLink(response)

    def parseLink(self, response):
        print 'response from: ', response.url
        sel = Selector(response)

        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            yield Request(url, callback=self.parseLink) 

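The spider above does not work: parse is called, but parseLink never runs. However, another spider I wrote, shown below, works fine. I don't understand why. Any suggestions?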

pageNav2.py:

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request

class pageNaviSpider(Spider):
    name = 'navi2'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        sel = Selector(response)

        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            yield Request(url, callback=self.parseLink)

1 Answer

3

You should change this part:

def parse(self, response):
    print 'response from: ', response.url
    self.parseLink(response)

to this:

def parse(self, response):
    print 'response from: ', response.url
    for item in self.parseLink(response):
        yield item

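A function with no return or yield statement returns None. In the original parse, self.parseLink(response) only creates a generator object, which is then discarded without ever being iterated, so the body of parseLink never runs and Scrapy receives no requests to schedule.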
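The behaviour can be reproduced without Scrapy. The following is a minimal sketch in plain Python 3 (the question's code is Python 2). The names parse_link, parse_broken, parse_fixed and the calls list are hypothetical stand-ins for the spider methods, not part of Scrapy, and a plain string stands in for the response and for the Request objects.

```python
# Plain-Python stand-in for the spider; no Scrapy required.
# All names here are illustrative assumptions, not Scrapy APIs.
calls = []  # records which function bodies actually ran


def parse_link(response):
    # Generator function: its body runs only when the result is iterated.
    calls.append("parse_link")
    yield "request for " + response


def parse_broken(response):
    # Mirrors the question: the generator is created and then discarded.
    calls.append("parse_broken")
    parse_link(response)  # never iterated, so parse_link's body never runs


def parse_fixed(response):
    # Mirrors the answer: iterate the generator and re-yield each item.
    calls.append("parse_fixed")
    for item in parse_link(response):
        yield item


broken_result = parse_broken("page1")       # implicit return value
fixed_result = list(parse_fixed("page1"))   # list() plays the role of the consumer

print(broken_result)  # None
print(fixed_result)   # ['request for page1']
print(calls)          # ['parse_broken', 'parse_fixed', 'parse_link']
```

Note that "parse_link" is recorded only once, during the fixed call. Two other spellings of the fix should behave the same way here: `return self.parseLink(response)`, since a callback may return any iterable of requests, or, on Python 3.3 and later, `yield from self.parseLink(response)`.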
