Python scraper returns the same response for every different URL request

Posted 2024-04-25 18:16:39


I am building a very simple scraper, but I am making some silly mistake somewhere that I cannot find.

In the parse callback, which loops over all the products on a product-listing page, I end up getting the same response data for every URL I request.

I am adding the code below; please help.

def parse(self, response): 
    item = {}
    count = 0
    for single in response.xpath('//div[@class="_3O0U0u"]/div'):
        count+=1
        # print(count)
        item['data_id'] = single.xpath('.//@data-id').extract_first()
        item['price'] = single.xpath('.//div[@class="_1vC4OE"]/text()').extract_first()
        item['url'] = single.xpath('.//div[@class="_1UoZlX"]/a[@class="_31qSD5"]/@href').extract_first()
        if not item['url']:
            item['url'] = single.xpath('.//div[@class="_3liAhj _1R0K0g"]/a[@class="Zhf2z-"]/@href').extract_first()
        #print(item)
        if item['url']:
            yield scrapy.Request('https://www.somewebsite.com' + item['url'], callback = self.get_product_detail, priority = 1, meta={'item': item})
            # break

    next_page = response.xpath('//div[@class="_2zg3yZ"]/nav/a[@class="_3fVaIS"]/span[contains(text(),"Next")]/parent::a/@href').extract_first()
    if next_page:
        next_page = 'https://www.somewebsite.com' + next_page
        yield scrapy.Request(next_page, callback=self.parse, priority=1)

def get_product_detail(self, response):
    dict_item = response.meta['item']
    sku = dict_item['data_id']
    print('dict SKU ======== ', sku)
