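I am trying to scrape the product description from this link. How do I extract the whole text, including the text inside the nested tags? This is my hxs selector: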
hxs.select('//div[@class="overview"]/div/text()').extract()
But the original HTML is:
These classic sneakers from
<b>Puma</b>
are best known for their neat and simple design. These basketball shoes are crafted by novel tooling that brings the sleek retro sneaker look. The pair is equipped with a
<b>leather and synthetic upper.</b>
A vulcanized non-slip rubber sole that is
<b>abrasion resistant ensures good traction.</b>
If I use the hxs selector above, the result contains only the text outside the <b> tags.
What I want is:
These classic sneakers from Puma are best known for their neat and simple design. These
basketball shoes are crafted by novel tooling that brings the sleek retro sneaker look. The pair is equipped with a leather and synthetic upper.A vulcanized non-slip rubber sole
that is abrasion resistant ensures good traction.
As you can see, the text inside the <b> tags is lost. Can you tell me how to extract the whole text from the page?
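Try getting the whole content of the tag instead of only its text nodes.
Then you can use a regex to strip the tags from it, or keep them if they are not a problem.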
A regex that matches the tags is enough for that.
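A minimal sketch of the tag-stripping step, using only the standard library. The pattern `<[^>]+>` and the helper name `strip_tags` are assumptions made for illustration, not necessarily what the answer had in mind. The sample string is copied from the question; in a spider it would presumably come from something like `hxs.select('//div[@class="overview"]/div').extract()[0]`, that is, the same XPath without the trailing `/text()`.

```python
import re

# Markup copied from the question. In a real spider this string is assumed to
# come from the selector without the trailing /text() step.
html = """These classic sneakers from
<b>Puma</b>
are best known for their neat and simple design. These basketball shoes are crafted by novel tooling that brings the sleek retro sneaker look. The pair is equipped with a
<b>leather and synthetic upper.</b>
A vulcanized non-slip rubber sole that is
<b>abrasion resistant ensures good traction.</b>"""

# Assumed pattern: anything that starts with "<" and runs to the next ">".
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Remove tags, then collapse runs of whitespace into single spaces."""
    text = TAG_RE.sub("", fragment)
    return " ".join(text.split())


description = strip_tags(html)
print(description)
```

A regex like this is fine for simple, well-behaved markup such as the fragment above, but it does not decode entities such as `&amp;` and can misfire on a `>` inside an attribute value. If that matters, another option is to select the descendant text nodes directly with an XPath ending in `//text()` and join the results.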