Error when accessing internal URLs

Published 2024-04-25 06:27:33


I have a URL in the `start_urls` list, as shown below:

start_urls = [
    'https://www.ebay.com/sch/tp_peacesports/m.html?_nkw=&_armrs=1&_ipg=&_from='
]

def parse(self, response):
    shop_title = self.getShopTitle(response)
    sell_count = self.getSellCount(response)
    self.shopParser(response, shop_title, sell_count)

def shopParser(self, response, shop_title, sell_count):
    items = EbayItem()
    items['shop_title'] = shop_title
    items['sell_count'] = sell_count
    if sell_count > 0:
        item_links = response.xpath('//ul[@id="ListViewInner"]/li/h3/a/@href').extract()
        for link in item_links:
            items['item_price'] = response.xpath('//span[@itemprop="price"]/text()').extract_first()

    yield items

Now, in the `for` loop inside `shopParser()`, I have the individual item links, and for each of them I need a new response instead of the original response from `start_urls`. How can I do that?


1 Answer

#1 · Posted 2024-04-25 06:27:33

You need to issue a request to each new page, otherwise you will never receive its HTML. Try the following:

import scrapy

def parse(self, response):
    # On follow-up requests these values arrive via meta; on the first
    # request they are computed from the page itself.
    shop_title = response.meta.get('shop_title', self.getShopTitle(response))
    sell_count = response.meta.get('sell_count', self.getSellCount(response))

    # your item-parsing logic goes here
    if sell_count > 0:
        item_links = response.xpath('//ul[@id="ListViewInner"]/li/h3/a/@href').extract()
        # yield requests to the item pages
        for link in item_links:
            yield scrapy.Request(response.urljoin(link), meta={'shop_title': shop_title, 'sell_count': sell_count})

These new requests will also be handled by the `parse` function. If you need different handling for the item pages, you can pass a different `callback` instead.
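For reference, `response.urljoin(link)` resolves each extracted href against the current page's URL, following the same rules as the standard library's `urllib.parse.urljoin`. A minimal sketch (the item paths below are made up for illustration):

```python
from urllib.parse import urljoin

# The shop page URL from start_urls.
base = 'https://www.ebay.com/sch/tp_peacesports/m.html?_nkw=&_armrs=1&_ipg=&_from='

# An already-absolute href passes through unchanged.
absolute = urljoin(base, 'https://www.ebay.com/itm/1234567890')

# A root-relative href is resolved against the shop page's host.
relative = urljoin(base, '/itm/1234567890')

print(absolute)  # https://www.ebay.com/itm/1234567890
print(relative)  # https://www.ebay.com/itm/1234567890
```

So whether eBay emits absolute or relative links in `@href`, the request URL handed to `scrapy.Request` ends up fully qualified.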
