How to scrape text contained between different tags using Scrapy

Published 2024-03-29 13:47:11


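I am trying to scrape the product description from this link, but how do I get the whole text, including the text inside the nested tags? This is the hxs call I am using: hxs.select('//div[@class="overview"]/div/text()').extract() and this is the original HTML: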

These classic sneakers from
<b>Puma</b>
are best known for their neat and simple design. These basketball shoes are crafted by novel tooling that brings the sleek retro sneaker look. The pair is equipped with a
<b>leather and synthetic upper.</b>
A vulcanized non-slip rubber sole that is
<b>abrasion resistant ensures good traction.</b>

If I use the hxs call shown above, I get:

[output listing not preserved in the source]

What I want is:

These classic sneakers from Puma are best known for their neat and simple design. These
 basketball shoes are crafted by novel tooling that brings the sleek retro sneaker look. The pair is equipped with a leather and synthetic upper.A vulcanized non-slip rubber sole 
that is abrasion resistant ensures good traction.

As you can see, the text between the <b> tags is missing. Can you tell me how to extract the whole text from the page?
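
A likely cause is that the XPath step `/text()` selects only text nodes that are direct children of the div, so text inside `<b>` is skipped. One common approach is to select all descendant text nodes instead, for example `hxs.select('//div[@class="overview"]/div//text()').extract()`, and then join the pieces. The sketch below illustrates the same distinction using the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selector (an assumption made so the example runs without Scrapy). The markup is a trimmed, hypothetical copy of the HTML from the question, and `direct_text` and `all_text` are illustrative helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down copy of the markup from the question.
HTML = """
<div class="overview"><div>These classic sneakers from
<b>Puma</b>
are best known for their neat and simple design. The pair is equipped with a
<b>leather and synthetic upper.</b>
</div></div>
"""


def direct_text(elem):
    """Text nodes that are direct children of elem (comparable to XPath text())."""
    parts = [elem.text or ""]
    # Text that follows each child element still belongs to the parent.
    parts.extend(child.tail or "" for child in elem)
    return parts


def all_text(elem):
    """All text under elem (comparable to XPath .//text()), whitespace-normalized."""
    return " ".join("".join(elem.itertext()).split())


root = ET.fromstring(HTML)
inner = root.find("div")  # the inner <div> that holds the description

print(direct_text(inner))  # the pieces without the <b> contents
print(all_text(inner))     # the full description as one string
```

In Scrapy itself, the equivalent of `all_text` would be joining the list returned by the `//text()` query, for example with `" ".join(...)` followed by whitespace cleanup.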

