How do I extract nested text in Scrapy?

Posted 2024-06-02 06:53:25


I am trying to use Scrapy to extract a brand description from this page: http://us.asos.com/hope-and-ivy/hope-ivy-dotty-mesh-midi-dress-with-ruffle-detail/prd/8663409?clr=black&cid=2623&pgesize=36&pge=0&totalstyles=627&gridsize=3&gridrow=1&gridcolumn=1

The HTML element looks like this:

<div class="brand-description">
  <h4>Brand</h4>
  <span>"Prom queens and wedding guests, claim the best-dressed title in "
    <a href="/Women/A-To-Z-Of-Brands/Hope-And-Ivy/Cat/pgecategory.aspx?cid=21368">
      <strong>"Hope and Ivy's"</strong>
    </a> 
    "occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses."
  </span>
</div>

The result I want is:

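"Prom queens and wedding guests, claim the best-dressed title in Hope and Ivy's occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses."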

I tried this:

[code snippet not preserved in the source]

However, the list of text I get back is missing the part inside the "strong" tag, i.e. "Hope and Ivy's":

['Prom queens and wedding guests, claim the best-dressed title in ',  ' occasion-ready collection. Shop its notice-me styles for hand-painted florals, Bardot necklines and figure-flattering pencil dresses.']

My question is: can I get the plain text regardless of the "href" link markup?


1 answer

User

#1 · Posted 2024-06-02 06:53:25

You may still need some post-processing, but this is probably the best you can do:

response.xpath('normalize-space(//div[@class="brand-description"]/span)').extract_first()

This gives you:

[output not preserved in the source]
