Scraping a Japanese website with Scrapy, but the output file contains no data

import scrapy


class suumotest(scrapy.Spider):
    name = "testsecond"
    start_urls = [
        'https://suumo.jp/jj/chintai/ichiran/FR301FC005/?tc=0401303&tc=0401304&ar=010&bs=040'
    ]

    def parse(self, response):
        # for following property link
        for href in response.css('.property_inner-title+a::attr(href)').extract():
            yield scrapy.Request(response.urljoin(href), callback=self.parse_info)

    # defining parser to extract data
    def parse_info(self, response):
        def extract_with_css(query):
            return response.css(query).extract_first().strip()

        yield {
            'Title': extract_with_css('h1.section_title::text'),
            'Fee': extract_with_css('td.detailinfo-col--01 span.detailvalue-item-accent::text'),
            'Fee Descrition': extract_with_css('td.detailinfo-col--01 span.detailvalue-item-text::text'),
            'Prop Description': extract_with_css('td.detailinfo-col--03::text'),
            'Prop Address': extract_with_css('td.detailinfo-col--04::text'),
        }

1 answer

Anonymous user

Answer #1 · Posted 2024-06-17 11:12:30

The first CSS selector in the parse method is wrong:

response.css('.property_inner-title+a::attr(href)').extract()

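The + is the mistake here. In CSS, + is the adjacent-sibling combinator, so it only matches an <a> that directly follows the .property_inner-title element. A space is the descendant combinator and matches an <a> nested anywhere inside it. Just replace the + with a space, for example: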

response.css('.property_inner-title a::attr(href)').extract()

Another problem is in the extract_with_css() function you defined:

def parse_info(self, response):
    def extract_with_css(query):
        return response.css(query).extract_first().strip()

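The problem here is that when nothing matches, extract_first() returns None by default. .strip() is a method of str, so calling it on None raises an AttributeError, and the item for that page is never yielded.
To fix this, pass an empty string as the default value to extract_first():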

def parse_info(self, response):
    def extract_with_css(query):
        return response.css(query).extract_first('').strip()
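To see why the empty-string default matters without running a crawl, here is a minimal pure-Python sketch. The names first_or_default, extract_unsafe and extract_safe are hypothetical; first_or_default only imitates how extract_first() behaves (first match, or the given default, None if none is given), with a plain list of strings standing in for Scrapy's SelectorList.

```python
from typing import Optional, Sequence


def first_or_default(values: Sequence[str], default: Optional[str] = None) -> Optional[str]:
    """Hypothetical stand-in for SelectorList.extract_first(default):
    return the first match, or `default` when there are no matches."""
    return values[0] if values else default


def extract_unsafe(values: Sequence[str]) -> str:
    # Mirrors the original helper: with no matches this calls None.strip(),
    # which raises AttributeError.
    return first_or_default(values).strip()


def extract_safe(values: Sequence[str]) -> str:
    # Mirrors the fixed helper: the '' default keeps .strip() safe.
    return first_or_default(values, '').strip()


if __name__ == "__main__":
    print(repr(extract_safe(['  Title text \n'])))  # 'Title text'
    print(repr(extract_safe([])))                    # ''
    try:
        extract_unsafe([])
    except AttributeError as exc:
        print("unsafe version failed:", exc)
```

In newer Scrapy/parsel versions, .get(default='') is the equivalent of .extract_first(''), so response.css(query).get(default='').strip() works the same way.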
