Removing child nodes from a selector

Asked 2025-04-17 21:53

I'm building a project with Scrapy to scrape specific data from a web page.

items = sel.xpath('//div[@class="productTiles cf"]/ul').extract()
for item in items:
    price = sel.xpath('//ul/li[@class="productPrice"]/span/span[@class="salePrice"]').extract()
    print price

This produces the following result:

u'<span class="salePrice">$20.43\xa0<span class="reducedFrom">$40.95</span></span>',
u'<span class="salePrice">$20.93\xa0<span class="reducedFrom">$40.95</span></span>'

All I want is the sale prices, such as 20.43 and 20.93, without the other tags and data. Any help would be appreciated.

data-cleaning  web-parsing  data-scraping  scrapy

2 Answers

span[@class="salePrice"] selects the whole span element, child elements included, so extract() returns its full markup.

Appending /text() should select only the text directly inside the outer span, leaving out the nested reducedFrom span:

sel.xpath('//ul/li[@class="productPrice"]/span/span[@class="salePrice"]/text()').extract()[0]
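
The difference between direct text and descendant text can be checked without a Scrapy project. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of Scrapy's selectors, on hypothetical markup modeled on the output shown in the question. Here span.text plays the role of /text() (the text before the first child element) and itertext() plays the role of //text() (the text of the span and all its descendants); the names HTML, direct_text and all_text are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modeled on the question's output.
# &#160; is the non-breaking space that shows up as \xa0.
HTML = """
<ul>
  <li class="productPrice"><span><span class="salePrice">$20.43&#160;<span class="reducedFrom">$40.95</span></span></span></li>
  <li class="productPrice"><span><span class="salePrice">$20.93&#160;<span class="reducedFrom">$40.95</span></span></span></li>
</ul>
""".strip()

root = ET.fromstring(HTML)
spans = root.findall(".//li[@class='productPrice']/span/span[@class='salePrice']")

# Direct text only: what the span holds before its first child element.
direct_text = [span.text for span in spans]

# Text of the span and of every descendant, nested reducedFrom span included.
all_text = [list(span.itertext()) for span in spans]

print(direct_text)
print(all_text)
```

In this sketch direct_text holds only the sale prices, while all_text also picks up the $40.95 from the nested span, which is why the single-slash form is the one that matches the question's goal.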

Answered 2025-04-17 by Python大师

It looks like this is the solution:

//ul/li[@class="productPrice"]/span/span[@class="salePrice"]/text()

It grabs the text of just the element I want, like this:

u'$20.43\xa0', u'$20.93\xa0'

Now I only need to clean up the result by stripping the extra characters at the end. If anyone has a better approach, I'd love to see it.
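
One way to do that cleanup is sketched below. It assumes every price looks like the two values above (a currency sign, digits with an optional decimal part, then a trailing non-breaking space) and has no thousands separators; parse_price is a hypothetical helper name, not part of Scrapy.

```python
import re
from decimal import Decimal


def parse_price(raw):
    # Pull out the first run of digits with an optional decimal part,
    # ignoring the currency sign and the trailing non-breaking space.
    match = re.search(r"\d+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError("no number found in %r" % raw)
    return Decimal(match.group())


raw_prices = [u"$20.43\xa0", u"$20.93\xa0"]
prices = [parse_price(p) for p in raw_prices]
print(prices)
```

Decimal is used here rather than float so that currency amounts keep their exact decimal value. Inside Scrapy, the selector's .re() method can apply a similar pattern directly to the selected text.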

Answered 2025-04-17 by Python大师