Scraping an RSS feed (Google Trends)

Posted 2024-06-08 00:32:21


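I am using Scrapy's XMLFeedSpider to scrape an RSS feed (Google Trends, url=https://trends.google.cl/trends/trendingsearches/daily/rss?geo=CL), but I am running into problems with some of the tags, especially the ones with the ht: prefix. I get an error on the ht tags. This is my spider: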

from scrapy.spiders import XMLFeedSpider


class RssGoogleTrends(XMLFeedSpider):

    name = 'Google'
    allowed_domains = ['trends.google.com']  # attribute name is plural and takes bare domains, not URLs
    start_urls = ['https://trends.google.com/trends/trendingsearches/daily/rss?geo=CL']

    itertag = 'item'
    # Bind the ht: prefix so XPath can resolve it. The URI below is an assumption;
    # copy the exact value from the xmlns:ht attribute on the feed's <rss> element.
    namespaces = [('ht', 'https://trends.google.com/trends/trendingsearches/daily')]

    def parse_node(self, response, node):
        self.logger.info('Hi, this is a <%s> node!: %s', self.itertag, ''.join(node.getall()))

        item = {}
        item['id'] = node.xpath('title/text()').get()
        item['link'] = node.xpath('link/text()').get()  # XPath for the link
        item['description'] = node.xpath('description/text()').get()  # XPath for the description
        item['pubDate'] = node.xpath('pubDate/text()').get()
        item['approx_traffic'] = node.xpath('ht:approx_traffic/text()').get()  # relies on the ht namespace above
        return item
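
A likely reason the ht: tags fail is that an XPath prefix only resolves once it is bound to a namespace URI. The sketch below shows that binding in isolation using the standard library's xml.etree.ElementTree rather than Scrapy. The sample feed, the namespace URI, and the helper name extract_items are all assumptions made for illustration; the real URI should be read from the xmlns:ht attribute of the actual feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like one RSS item; not real feed data.
# The namespace URI is an assumption and must match the feed's xmlns:ht value.
SAMPLE_FEED = """<rss version="2.0" xmlns:ht="https://trends.google.com/trends/trendingsearches/daily">
  <channel>
    <item>
      <title>example search</title>
      <ht:approx_traffic>20,000+</ht:approx_traffic>
    </item>
  </channel>
</rss>"""

# Maps the prefix used in our queries to the namespace URI declared in the document.
NAMESPACES = {"ht": "https://trends.google.com/trends/trendingsearches/daily"}


def extract_items(xml_text, namespaces=NAMESPACES):
    """Return one dict per <item>, resolving ht: through the prefix map."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # Un-prefixed tags need no namespace mapping.
            "title": item.findtext("title"),
            # Prefixed tags are looked up via the namespaces mapping.
            "approx_traffic": item.findtext("ht:approx_traffic", namespaces=namespaces),
        })
    return items


if __name__ == "__main__":
    print(extract_items(SAMPLE_FEED))
```

In Scrapy, the equivalent binding is the namespaces class attribute of XMLFeedSpider, a list of (prefix, uri) tuples that the spider registers on each node before parse_node is called.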

Thanks for your time.

Tags: text https self node google link extract description

0 answers

No answers yet.
