How do I extract a dynamic substring from each string in a list in Python?

Posted 2024-04-25 09:48:47


I have a list of strings scraped from the web, and I want to extract the 'href' from each of them:

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>

For example, I want to loop over the list and dynamically extract

/red-wine

from

<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>

Thanks!


2 answers

You can do this with lxml. Something like this:

from lxml import html
import requests

response = requests.get('<your url>')
tree = html.fromstring(response.text)
# "subnav__item" is the class of the <li>; the href sits on the <a> inside it
hrefs = tree.xpath('//li[@class="subnav__item"]/a/@href')

This should give you all the hrefs under the class "subnav__item".
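Since the question starts from a list of already-scraped strings rather than a URL, the same lxml approach can be sketched without any network request. This assumes the strings are held in a list (called `items` here, a name chosen for illustration) and that each string is a single `<li>` fragment like the ones in the question:

```python
from lxml import html

# The scraped strings from the question; the list name "items" is an assumption.
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>',
]

hrefs = []
for item in items:
    fragment = html.fromstring(item)  # parse one HTML fragment
    # Collect the href attribute of every <a> inside this fragment
    hrefs.extend(fragment.xpath('.//a/@href'))

print(hrefs)
```

Parsing each string separately keeps the result in the same order as the input list.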

You can also use Beautiful Soup to get the text you need:

from bs4 import BeautifulSoup

data = '\
<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>\
<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>'
soup = BeautifulSoup(data, "html.parser")

# Find every <a> tag; its attributes can be read like dictionary keys
links = soup.find_all('a')
for link in links:
    print(link['href'])

Output:

/red-wine
/white-wine
/rose-wine
/fine-wine
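If installing a third-party parser is not an option, a similar result can probably be had with the standard library's `html.parser` module. This is a different technique from the two answers, shown only as a minimal sketch; the names `HrefCollector`, `extract_hrefs` and `items` are hypothetical and not from the original thread:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(strings):
    """Feed each scraped string to the parser and return all hrefs found."""
    parser = HrefCollector()
    for s in strings:
        parser.feed(s)
    parser.close()
    return parser.hrefs


# The scraped strings from the question; the list name is an assumption.
items = [
    '<li class="subnav__item"><a class="subnav__link " href="/red-wine">Red Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/white-wine">White Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/rose-wine">Rosé Wine</a></li>',
    '<li class="subnav__item"><a class="subnav__link " href="/fine-wine">Fine Wine</a></li>',
]

for href in extract_hrefs(items):
    print(href)
```

Using a real HTML parser rather than string slicing or a regular expression means the extraction does not depend on the exact attribute order or spacing in each string.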
