Error when accessing internal URLs

Published 2024-04-25 06:27:33


I have a URL in the `start_urls` list, as shown below:

start_urls = [
    'https://www.ebay.com/sch/tp_peacesports/m.html?_nkw=&_armrs=1&_ipg=&_from='
]

def parse(self, response):
    shop_title = self.getShopTitle(response)
    sell_count = self.getSellCount(response)
    self.shopParser(response, shop_title, sell_count)

def shopParser(self, response, shop_title, sell_count):
    items = EbayItem()
    items['shop_title'] = shop_title
    items['sell_count'] = sell_count
    if sell_count > 0:
        item_links = response.xpath('//ul[@id="ListViewInner"]/li/h3/a/@href').extract()
        for link in item_links:
            items['item_price'] = response.xpath('//span[@itemprop="price"]/text()').extract_first()

    yield items

Now, in the `for` loop inside `shopParser()`, I have the individual item links, and for each of them I need a new response instead of the original response from `start_urls`. How can I do that?


1 Answer

#1 · Posted 2024-04-25 06:27:33

You need to issue a request to each new page, otherwise you will never receive its HTML. Try the following:

import scrapy

def parse(self, response):
    # On follow-up requests these values arrive via meta; on the first
    # request they are computed from the page itself.
    shop_title = response.meta.get('shop_title', self.getShopTitle(response))
    sell_count = response.meta.get('sell_count', self.getSellCount(response))

    # your item-parsing logic goes here
    if sell_count > 0:
        item_links = response.xpath('//ul[@id="ListViewInner"]/li/h3/a/@href').extract()
        # yield requests to the item pages
        for link in item_links:
            yield scrapy.Request(response.urljoin(link), meta={'shop_title': shop_title, 'sell_count': sell_count})

These new requests will also be handled by the `parse` function. If you need different handling for the item pages, you can pass a different `callback` instead.
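For reference, `response.urljoin(link)` resolves each extracted href against the current page's URL, following the same rules as the standard library's `urllib.parse.urljoin`. A minimal sketch (the item paths below are made up for illustration):

```python
from urllib.parse import urljoin

# The shop page URL from start_urls.
base = 'https://www.ebay.com/sch/tp_peacesports/m.html?_nkw=&_armrs=1&_ipg=&_from='

# An already-absolute href passes through unchanged.
absolute = urljoin(base, 'https://www.ebay.com/itm/1234567890')

# A root-relative href is resolved against the shop page's host.
relative = urljoin(base, '/itm/1234567890')

print(absolute)  # https://www.ebay.com/itm/1234567890
print(relative)  # https://www.ebay.com/itm/1234567890
```

So whether eBay emits absolute or relative links in `@href`, the request URL handed to `scrapy.Request` ends up fully qualified.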
