How do I get the a href link under a div class?

Posted 2024-04-20 06:00:45

I am trying to scrape the href attribute from the links on a page, but I end up with [] as the output.

The HTML is:

<div class="style__width-100p___2woP5 style__flex-row___m8FHw">
  <div class="style__product-card___1gbex style__card___3eL67 style__raised___3MFEA style__white-bg___10nDR style__overflow-hidden___2maTX">
   <a href="/drugs/augmentin-625-duo-tablet-138629" target="_blank" class="button-text style__flex-row___2AKyf style__flex-1___A_qoj style__product-name___HASYw">
  </div>
  </div>

The code I use for scraping:

links = [a['href'] for a in soup.find_all('div', attrs={'class': 'style__width-100p___2woP5 style__flex-row___m8FHw'})]
print(links)

The output I expect is:

/drugs/augmentin-625-duo-tablet-138629

2 Answers

Is this what you are looking for?

from bs4 import BeautifulSoup

sample = """
<div class="style__width-100p___2woP5 style__flex-row___m8FHw">
  <div class="style__product-card___1gbex style__card___3eL67 style__raised___3MFEA style__white-bg___10nDR style__overflow-hidden___2maTX">
   <a href="/drugs/augmentin-625-duo-tablet-138629" target="_blank" class="button-text style__flex-row___2AKyf style__flex-1___A_qoj style__product-name___HASYw">
  </div>
  </div>
"""

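# Parse the sample, then collect the outer divs by their exact class string
divs = BeautifulSoup(sample, "html.parser").find_all('div', attrs={'class': 'style__width-100p___2woP5 style__flex-row___m8FHw'})
# Take the href of the first <a> inside each matching div
links = [i.find("a")["href"] for i in divs]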

for link in links:
    print(link)

Output:

/drugs/augmentin-625-duo-tablet-138629
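
One caveat with the snippet above: find("a") returns None when a matching div contains no link, so indexing ["href"] on it raises a TypeError. A minimal sketch of a more defensive variant follows. The second div and the link text in the sample are hypothetical, added only to show the guard.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second div has no <a>, so an unguarded
# div.find("a")["href"] would fail on it.
sample = """
<div class="style__width-100p___2woP5 style__flex-row___m8FHw">
  <a href="/drugs/augmentin-625-duo-tablet-138629" target="_blank">Augmentin</a>
</div>
<div class="style__width-100p___2woP5 style__flex-row___m8FHw">
  <span>no link here</span>
</div>
"""

soup = BeautifulSoup(sample, "html.parser")
divs = soup.find_all(
    "div",
    attrs={"class": "style__width-100p___2woP5 style__flex-row___m8FHw"},
)

links = []
for div in divs:
    a = div.find("a", href=True)  # only match <a> tags that carry an href
    if a is not None:             # skip divs without a usable link
        links.append(a["href"])

print(links)
```

The href=True filter also skips anchors that have no href attribute, so a["href"] cannot raise a KeyError here.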

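You are trying to read href from the div itself rather than from the a tag inside it. To get every link inside the div you want, you can use the following: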

from bs4 import BeautifulSoup

div_tag = """
    <div class="style__width-100p___2woP5 style__flex-row___m8FHw">
        <div class="style__product-card___1gbex style__card___3eL67 style__raised___3MFEA style__white-bg___10nDR style__overflow-hidden___2maTX">
            <a href="/drugs/augmentin-625-duo-tablet-138629" target="_blank" class="button-text style__flex-row___2AKyf style__flex-1___A_qoj style__product-name___HASYw">
        </div>
    </div>
"""

soup = BeautifulSoup(div_tag, features="html.parser")
for div in soup.find_all("div", "style__width-100p___2woP5 style__flex-row___m8FHw"):
    for a in div.find_all("a"):
        print(a["href"])
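
The same lookup can also be written as a CSS selector with select(). The sketch below is an alternative, not part of the original answer. Passing the full class string to find_all matches only when the class attribute is exactly that string, whereas a selector such as div.x.y matches the classes in any order. The sample reverses the class order and uses a hypothetical link text to show this.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: same classes as in the question, listed in reverse order.
html = """
<div class="style__flex-row___m8FHw style__width-100p___2woP5">
  <div class="style__product-card___1gbex">
    <a href="/drugs/augmentin-625-duo-tablet-138629" target="_blank">Augmentin</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# div carrying both classes (any order), then any descendant <a> with an href
selector = "div.style__width-100p___2woP5.style__flex-row___m8FHw a[href]"
hrefs = [a["href"] for a in soup.select(selector)]

print(hrefs)
```

With the class order reversed, the exact-string find_all call from the answers would return no divs for this sample, while the selector still finds the link.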
