BeautifulSoup question: how do I get the right link by matching a tag's exact text?

Posted 2024-05-29 04:15:20


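I want the link that follows "S-1", not the one that follows "S-1/A". I tried ".find_all(lambda tag: tag.name == 'td' and tag.get() == ['S-1'])" and ".select('td.S-1')", but neither returned the link. Any help would be appreciated.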

Here is the relevant page source:

    <tr>
        <td>ADVANCE FINANCIAL BANCORP</td>
        <td>S-1/A</td>
        <td>10/31/1996</td>
        <td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a>
        </td>
    </tr>

    <tr>
        <td>ADVANCE FINANCIAL BANCORP</td>
        <td>S-1</td>
        <td>9/27/1996</td>
        <td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a>
        </td>
    </tr>


Here is a link to the full page source:

https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials


1 answer
User
Answer 1 · posted 2024-05-29 04:15:20

Try this:

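    from bs4 import BeautifulSoup
    import requests

    def getlink(url):
        # Download the page and parse it (the 'html5lib' parser must be installed).
        response = requests.get(url)
        mainpage = BeautifulSoup(response.text, 'html5lib')
        # Pick the second table with class "marginB10px" and collect its links.
        table = mainpage.find_all('table', attrs={"class": "marginB10px"})
        links = table[1].find_all('a')
        # Selects by position: the second link, not the row whose text is "S-1".
        return links[1].get('href')

    link = getlink('https://www.nasdaq.com/markets/ipos/company/advance-financial-bancorp-5492-13046?tab=financials')
    # The href is relative, so prepend the site root.
    mainlink = 'https://www.nasdaq.com'
    link = mainlink + link
    print(link)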

Output:

https://www.nasdaq.com/markets/ipos/filing.ashx?filingid=921318
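
The code above picks the link by position, so it would return the wrong row if the order of the filings changed. As a rough sketch of matching on the cell text instead, the snippet below compares each cell's text with "S-1" exactly, so "S-1/A" is not picked up. It runs against the two rows quoted in the question rather than the live page. The function name find_filing_links, the HTML and BASE_URL constants, and the choice of the built-in "html.parser" are assumptions made for this illustration, not part of the original answer.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Sample rows copied from the page source quoted in the question.
HTML = """
<table>
<tr>
    <td>ADVANCE FINANCIAL BANCORP</td>
    <td>S-1/A</td>
    <td>10/31/1996</td>
    <td><a id="two_column_main_content_rpt_filings_fil_view_0" href="/markets/ipos/filing.ashx?filingid=1567309" target="_blank">Filing</a>
    </td>
</tr>
<tr>
    <td>ADVANCE FINANCIAL BANCORP</td>
    <td>S-1</td>
    <td>9/27/1996</td>
    <td><a id="two_column_main_content_rpt_filings_fil_view_1" href="/markets/ipos/filing.ashx?filingid=921318" target="_blank">Filing</a>
    </td>
</tr>
</table>
"""

# Assumed site root, used to turn the relative hrefs into absolute URLs.
BASE_URL = "https://www.nasdaq.com"


def find_filing_links(html, form_type, base_url=BASE_URL):
    """Return absolute links from rows containing a cell whose text equals form_type."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for row in soup.find_all("tr"):
        # Strip whitespace so cells such as "  S-1 " still compare equal.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # List membership is an exact comparison: "S-1/A" does not match "S-1".
        if form_type in cells:
            anchor = row.find("a", href=True)
            if anchor is not None:
                links.append(urljoin(base_url, anchor["href"]))
    return links


s1_links = find_filing_links(HTML, "S-1")
print(s1_links)
```

As for the two attempts in the question: ".select('td.S-1')" is a CSS class selector, so it looks for td elements with class="S-1", and the cells here have no class attribute. "tag.get()" reads an attribute and needs an attribute name as its argument, so it cannot be used to compare the text inside the tag.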
