Generating absolute paths from relative URLs in a CrawlSpider

Posted on 2024-04-26 12:27:24


I need to generate absolute URLs from these relative URLs. I tried using process_links, but to no avail. Any suggestions?

# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class FfySpider(CrawlSpider):
    name = 'FFy'
    allowed_domains = ['cartoon3rbi.net']
    start_urls = ['https://www.cartoon3rbi.net/cats-pages-1.html/']

    rules = (
        Rule(LinkExtractor(restrict_xpaths='//div[@class="cartoon_cat_name"]'), process_links='make_absolute_path',
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {

            'name': response.xpath('//div[@class="cartoon_eps_name"]/a/text()[2]').extract(),
        }

    def make_absolute_path(self, links):
        for link in links:
            url = 'https://www.cartoon3rbi.net/' + link
            return url

1 Answer

User · #1 · Posted on 2024-04-26 12:27:24

From the Scrapy documentation: https://docs.scrapy.org/en/latest/topics/spiders.html#crawling-rules

process_links is a callable, or a string (in which case a method from the spider object with that name will be used) which will be called for each list of links extracted from each response using the specified link_extractor. This is mainly used for filtering purposes.

I think the function named by process_links is called with the list of extracted links and should return a list (or generator) of links. Note that each item in that list is a scrapy.link.Link object rather than a plain URL string, so the absolute URL has to be built from its url attribute:

    def make_absolute_path(self, links):
        for link in links:
            # each item is a scrapy.link.Link object, not a string,
            # so build the absolute URL from its url attribute
            link.url = 'https://www.cartoon3rbi.net/' + link.url.lstrip('/')
            yield link
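
If you also want the absolute URL of each episode link in the scraped item itself, here is a minimal sketch of the callback (assuming the <a> elements under cartoon_eps_name carry an href attribute, which the question's snippet does not show) using response.urljoin, which resolves a possibly relative href against the URL of the current response:

    def parse_item(self, response):
        for a in response.xpath('//div[@class="cartoon_eps_name"]/a'):
            yield {
                'name': a.xpath('text()[2]').extract_first(),
                # response.urljoin resolves the href against the current page URL,
                # so it works whether the link is relative or already absolute
                'url': response.urljoin(a.xpath('@href').extract_first()),
            }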
