Scrapy does not call the Request() callback

3 votes

2 answers

3295 views

Asked 2025-04-17 20:01

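I'm writing a recursive parsing script with Scrapy, but Request() never calls the callback I want, suppose_to_parse(), nor any other function I pass as the callback argument. I've tried several variations, but none of them worked. Where should I start looking?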

from scrapy.http import Request
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector



class joomler(BaseSpider):
    name = "scrapy"
    allowed_domains = ["scrapy.org"]
    start_urls = ["http://blog.scrapy.org/"]


    def parse(self, response):
        print "Working... "+response.url
        hxs = HtmlXPathSelector(response)
        for link in hxs.select('//a/@href').extract():
            if not link.startswith('http://') and not link.startswith('#'):
               url=""
               url=(self.start_urls[0]+link).replace('//','/')
               print url
               yield Request(url, callback=self.suppose_to_parse)


    def suppose_to_parse(self, response):
        print "asdasd"
        print response.url

callback web-crawler scrapy recursive-parsing

2 Answers

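I'm no expert, but I tried your code, and I don't think the problem is with the request itself; the generated links look broken. If you put a few links into a list, iterate over them, and yield a Request with the callback for each one, it works fine.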
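To illustrate why the generated links look broken, here is a minimal sketch (Python 3 syntax, standard library only, not taken from the original post). It applies the same expression as the question to a hypothetical relative link, `/about`, and compares it with `urllib.parse.urljoin`. The helper names `naive_join` and `safe_join` are made up for this example.

```python
from urllib.parse import urljoin, urlparse

BASE = "http://blog.scrapy.org/"  # same start URL as in the question


def naive_join(base, link):
    # Same expression as in the question: the replace() also collapses
    # the "//" that follows "http:", so the scheme separator is damaged.
    return (base + link).replace('//', '/')


def safe_join(base, link):
    # urljoin resolves a relative link against the base URL
    # without touching the "http://" prefix.
    return urljoin(base, link)


broken = naive_join(BASE, "/about")  # "/about" is a hypothetical link
fixed = safe_join(BASE, "/about")

print(broken)                   # http:/blog.scrapy.org/about
print(urlparse(broken).netloc)  # empty string: no host is recognized
print(fixed)                    # http://blog.scrapy.org/about
```

With the host missing from the URL, the request most likely never produces a response to hand to the callback, which would match the behavior described in the question.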

Answered 2025-04-17 by Python大师

Move the yield outside the if statement:

for link in hxs.select('//a/@href').extract():
    url = link
    if not link.startswith('http://') and not link.startswith('#'):
        url = (self.start_urls[0] + link).replace('//','/')

    print url
    yield Request(url, callback=self.suppose_to_parse)
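Note that the loop above still builds relative links with `.replace('//','/')`, which also collapses the `//` after `http:`. As a hedged alternative, the sketch below (Python 3, standard library only) keeps the same loop shape but resolves each link with `urljoin`. The function name `iter_urls` and the sample links are hypothetical, and skipping `#` links is an assumption of this sketch rather than part of the answer. In a spider, each yielded URL would be wrapped as `Request(url, callback=self.suppose_to_parse)`.

```python
from urllib.parse import urljoin


def iter_urls(base_url, hrefs):
    """Yield an absolute URL for every usable href.

    The yield sits outside any per-link condition, so absolute and
    relative links are both emitted.
    """
    for link in hrefs:
        if link.startswith('#'):
            # Fragment-only links point back at the same page; skip them.
            continue
        # Absolute links pass through unchanged; relative ones are
        # resolved against base_url.
        yield urljoin(base_url, link)


# Hypothetical hrefs as they might come out of '//a/@href'
sample = ["/a", "#top", "http://scrapy.org/x", "b/c"]
urls = list(iter_urls("http://blog.scrapy.org/", sample))
for url in urls:
    print(url)
```

Recent Scrapy versions also provide `response.urljoin(link)`, which resolves a link against the URL of the response being parsed.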

Answered 2025-04-17 by Python大师
