Getting KeyError: 'link' when trying to scrape email addresses from TripAdvisor

import scrapy
from scrapy import Request


class RestaurantSpider(scrapy.Spider):
    name = 'restaurant'
    start_urls = [
        'https://www.tripadvisor.com.my/Restaurants-g298570-Kuala_Lumpur_Wilayah_Persekutuan.html#EATERY_OVERVIEW_BOX'
    ]

    def parse(self, response):
        listings = response.xpath(
            '//div[@class="restaurants-list-ListCell__cellContainer--2mpJS"]')
        for listing in listings:
            link = listing.xpath(
                './/a[@class="restaurants-list-ListCell__restaurantName--2aSdo"]/@href').extract_first()
            text = listing.xpath(
                './/a[@class="restaurants-list-ListCell__restaurantName--2aSdo"]/text()').extract_first()
            yield scrapy.Request(
                url=response.urljoin(link),
                callback=self.parse_listing,
                meta={'Link': link, 'Text': text}
            )

        next_urls = response.xpath(
            '//*[@class="nav next rndBtn ui_button primary taLnk"]/@href').extract()
        for next_url in next_urls:
            yield scrapy.Request(response.urljoin(next_url), callback=self.parse)

    def parse_listing(self, response):
        link = response.meta['link']
        text = response.meta['text']
        email = response.xpath(
            '//a[contains(@href, "mailto")]/@href').extract_first()
        yield {'Link': link, 'Text': text, 'Email': email}

2 Answers

User

#1 · Edited 2024-04-27 00:35:33

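In parse() you defined meta={'Link': link, 'Text': text}, but in parse_listing() you look the values up with the lowercase keys link and text. Dictionary keys are case-sensitive, so response.meta['link'] raises the KeyError. Your XPaths are also brittle, because they match the full auto-generated class names exactly.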
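The mismatch can be reproduced without Scrapy. The following is a minimal sketch that uses a plain dict as a stand-in for response.meta (an assumption made only for illustration); the helper name lookup and the sample values are hypothetical.

```python
# A plain dict standing in for response.meta (assumption for illustration only).
meta = {'Link': '/Restaurant_Review-example.html', 'Text': 'Example Cafe'}


def lookup(mapping, key):
    """Return mapping[key], or a short message if the key is missing."""
    try:
        return mapping[key]
    except KeyError as exc:
        # str() of a KeyError is the repr of the missing key.
        return f"KeyError: {exc}"


print(lookup(meta, 'link'))  # lowercase key was never stored
print(lookup(meta, 'Link'))  # matches the key used when the dict was built
```

Whichever spelling is chosen, the key written in parse() and the key read in parse_listing() have to be identical.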

Try the following to get it working:

import scrapy


class RestaurantSpider(scrapy.Spider):
    name = 'restaurant'

    start_urls = [
        'https://www.tripadvisor.com.my/Restaurants-g298570-Kuala_Lumpur_Wilayah_Persekutuan.html#EATERY_OVERVIEW_BOX'
    ]

    def parse(self, response):
        for listing in response.xpath('//div[contains(@class,"__cellContainer--")]'):
            link = listing.xpath('.//a[contains(@class,"__restaurantName--")]/@href').get()
            text = listing.xpath('.//a[contains(@class,"__restaurantName--")]/text()').get()
            complete_url = response.urljoin(link)
            yield scrapy.Request(
                url=complete_url,
                callback=self.parse_listing,
                meta={'link': complete_url, 'text': text}
            )

        next_url = response.xpath('//*[contains(@class,"pagination")]/*[contains(@class,"next")]/@href').get()
        if next_url:
            yield scrapy.Request(response.urljoin(next_url), callback=self.parse)

    def parse_listing(self, response):
        link = response.meta['link']
        text = response.meta['text']
        email = response.xpath('//a[contains(@href, "mailto:")]/@href').get()
        yield {'Link': link, 'Text': text, 'Email': email}
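Note that the XPath in parse_listing() returns the raw href, so the yielded value still carries the mailto: prefix and possibly a query string. A possible way to reduce it to the bare address is sketched below using only the standard library; the helper name clean_mailto and the sample addresses are hypothetical and not part of the original answer.

```python
from urllib.parse import unquote, urlsplit


def clean_mailto(href):
    """Return the bare address from a mailto: href, or None if there is none."""
    if not href:
        return None
    parts = urlsplit(href.strip())
    if parts.scheme != 'mailto':
        return None
    # The path holds the address; any ?subject=... part lands in parts.query.
    return unquote(parts.path) or None


print(clean_mailto('mailto:info@example.com?subject=Hello'))
print(clean_mailto(None))
```

In the spider this could be applied as email = clean_mailto(email) just before the yield, so listings without a mailto link still produce an item with Email set to None.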

User

#2 · Edited 2024-04-27 00:35:33

Replace "link" with "href"

I couldn't reproduce your code, but it doesn't look like there is a link attribute, so grab "href" instead:

<a href="/Restaurant_Review-g298570-d15211507-Reviews-Vintage_1988_Cafe-Kuala_Lumpur_Wilayah_Persekutuan.html" class="restaurants-list-ListCell__restaurantName--2aSdo" target="_blank">Vintage 1988 Cafe</a>


link = response.meta['href']
