Python Scrapy: extracting names with a CSS selector does not work

import scrapy


class BloreSpider(scrapy.Spider):
    name = 'blore'
    start_urls = ['http://www.engineering.careers360.com/search/college/bangalore']

    def parse(self, response):
        for quote in response.css('div.title'):
            yield {
                'author': quote.xpath('.//a/text()').extract_first(),
            }

        next_page = response.css('li.pager-next a::attr("href")').extract_first()
        if next_page:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)

1 answer

User

#1 · Posted 2024-06-09 03:28:17

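The XPath needs to be relative to your quote node. In other words, you need to add a . before the //. Without the leading dot, //a searches the whole document, so every iteration returns the same first link.

Try this: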

def parse(self, response):
    for quote in response.css('div.title'):
        yield {
            #'author': quote.xpath('//a/text()').extract_first(),
            #                       ^
            'author': quote.xpath('.//a/text()').extract_first(),
        }

    next_page = response.css('li.pager-next a::attr("href")').extract_first()
    # if next_page is not None:
    if next_page:  # you can also just do this
        next_page = response.urljoin(next_page)
        yield scrapy.Request(next_page, callback=self.parse)

Edit: Looking at the log you provided, it seems you are getting a 404 when trying to retrieve robots.txt. Try setting ROBOTSTXT_OBEY = False in settings.py.
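
As a sketch, the line in the project's settings.py would look like this. Note that Scrapy spells the setting ROBOTSTXT_OBEY, with no underscore between ROBOTS and TXT. Whether ignoring robots.txt is appropriate for this site is something you need to decide yourself.

```python
# settings.py
# Do not fetch or honor robots.txt before crawling.
ROBOTSTXT_OBEY = False
```

If you would rather not change it for the whole project, the same key can also go in the spider's custom_settings class attribute, for example custom_settings = {'ROBOTSTXT_OBEY': False}.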
