Why is my custom callback not called when I yield a Request, while the parse method is called?
I want to follow the pagination links on this page, so I wrote the following code:
pageNav.py:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request


class pageNaviSpider(Spider):
    name = 'navi'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        self.parseLink(response)

    def parseLink(self, response):
        print 'response from: ', response.url
        sel = Selector(response)
        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            yield Request(url, callback=self.parseLink)
The code above does not work. However, I wrote another spider, shown below, and it works fine. I don't understand why. Any suggestions?
pageNav2.py:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request


class pageNaviSpider(Spider):
    name = 'navi2'
    start_urls = ['https://itunes.apple.com/us/genre/ios-books/id6018?mt=8&letter=A&page=1#page']

    def parse(self, response):
        print 'response from: ', response.url
        sel = Selector(response)
        for url in sel.xpath("//a[@class='paginate-more']/@href").extract():
            # This class has no parseLink method, so the callback must be parse.
            yield Request(url, callback=self.parse)
Answer:
You should change this part:
def parse(self, response):
    print 'response from: ', response.url
    self.parseLink(response)
to this:
def parse(self, response):
    print 'response from: ', response.url
    for item in self.parseLink(response):
        yield item
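Because parseLink contains a yield statement, it is a generator function: calling self.parseLink(response) only creates a generator object and runs none of its body, not even its print. The Requests are produced only when something iterates over that generator. In your original parse nothing does, and since a function with no return/yield statement returns None, Scrapy receives nothing to schedule. The loop above iterates over the generator and re-yields each Request, so Scrapy gets them. Your second spider works because its parse method yields the Requests directly.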
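This behaviour is plain Python generator semantics and does not depend on Scrapy. The sketch below uses hypothetical stand-in functions (parse_link, broken_parse, fixed_parse) in place of spider methods, plus a calls list that records when the generator body actually runs, and prints what each version hands back to its caller.

```python
calls = []  # records each time parse_link's body actually executes


def parse_link(url):
    # Contains `yield`, so this is a generator function: calling it only
    # builds a generator object; the body runs when the generator is iterated.
    calls.append("parse_link body ran for " + url)
    for next_url in (url + "?page=2", url + "?page=3"):
        yield next_url


def broken_parse(url):
    # Calls the generator function but never iterates the result and has no
    # return/yield statement, so it returns None (like the question's parse).
    parse_link(url)


def fixed_parse(url):
    # Iterates the generator and re-yields each value (the answer's fix).
    for item in parse_link(url):
        yield item


result = broken_parse("start")
print(result)  # None
print(calls)  # []  (parse_link's body never executed)

print(list(fixed_parse("start")))  # ['start?page=2', 'start?page=3']
print(calls)  # ['parse_link body ran for start']
```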