Scraping an RSS feed (Google Trends)

Posted 2024-06-08 00:32:21


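I am using Scrapy's XMLFeedSpider on an RSS feed (Google Trends, url=https://trends.google.cl/trends/trendingsearches/daily/rss?geo=CL), but I am running into problems with some of the tags, especially the ones prefixed with ht:. I get an error on the ht: tags with this spider: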

from scrapy.spiders import XMLFeedSpider


class RssGoogleTrends(XMLFeedSpider):

    name = 'Google'
    # Scrapy reads `allowed_domains` (plural) and expects bare domain names, not URLs
    allowed_domains = ['trends.google.com']
    start_urls = ['https://trends.google.com/trends/trendingsearches/daily/rss?geo=CL']

    itertag = 'item'

    def parse_node(self, response, node):
        self.logger.info('Hi, this is a <%s> node!: %s', self.itertag, ''.join(node.getall()))

        item = {}
        item['id'] = node.xpath('title/text()').get()
        item['link'] = node.xpath('link/text()').get()                # XPath for link
        item['description'] = node.xpath('description/text()').get()  # XPath for description
        item['pubDate'] = node.xpath('pubDate/text()').get()
        # The reported error comes from this lookup: ht: is an XML namespace prefix
        item['approx_traffic'] = node.xpath('ht:approx_traffic/text()').get()
        print(item)
        return item

Thanks for your time.
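
A likely cause is that ht: is an XML namespace prefix, and a prefixed XPath lookup only resolves when the prefix is mapped to the namespace URI declared on the feed's root element. The sketch below illustrates that mechanism with the standard library rather than Scrapy, so it runs without extra dependencies. The sample feed, the `HT_URI` value, and the `extract_traffic` helper are all assumptions made for illustration; the real URI should be copied from the `xmlns:ht` attribute of the actual feed.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; copy the real value from the feed's xmlns:ht attribute.
HT_URI = "https://trends.google.com/trends/trendingsearches/daily"

# Hypothetical sample shaped like one entry of the RSS feed.
SAMPLE_FEED = f"""<rss version="2.0" xmlns:ht="{HT_URI}">
  <channel>
    <item>
      <title>example term</title>
      <ht:approx_traffic>20,000+</ht:approx_traffic>
    </item>
  </channel>
</rss>"""


def extract_traffic(xml_text, ht_uri=HT_URI):
    """Return the ht:approx_traffic text of every <item>, or None where absent."""
    root = ET.fromstring(xml_text)
    # The prefix used in the query is resolved through this mapping,
    # not through the prefix that happens to appear in the document.
    namespaces = {"ht": ht_uri}
    return [
        item.findtext("ht:approx_traffic", namespaces=namespaces)
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    print(extract_traffic(SAMPLE_FEED))
```

In Scrapy, the same mapping can be supplied through the spider's `namespaces` class attribute, which takes a list of `(prefix, uri)` tuples, for example `namespaces = [('ht', HT_URI)]` with the URI taken from the feed. If the mapped URI does not match the one in the document, the lookup simply finds nothing.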
