Getting the section heading for each scraped section

Posted 2024-05-29 02:50:17


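I'm scraping a bank's earnings release website to get the PDF links published each quarter. Each quarterly section has a heading (for example, First Quarter 2020). I have no trouble getting the reports, but I also want the corresponding section heading for each report. Here is what I have so far: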

import requests
from bs4 import BeautifulSoup
import urllib3


def scraper():

    urllib3.disable_warnings()

    FITBurl = "https://ir.53.com/annual-and-quarterly-results"

    FITBr = requests.get(url=FITBurl, verify=False)

    FITBsoup = BeautifulSoup(FITBr.text,'html.parser')

    #finds quarter name
    quarter = FITBsoup.find_all("h4")[0].text 
    print(quarter)  # I want this respective 'quarter' name for each earnings release

    mylist = {}
    for items in FITBsoup.find_all("div", class_="filefield-file"): #add [0:5] to the end to just get the latest
        for x in items.select("a"):
            title = x.text.strip()
            name = x['title'][:-4]
            if title == 'Quarterly Earnings Release':
                link = x['href']
                print(f'{title} {name}: {link}') # ideally quarter name would replace 'name'
                mylist[name] = link

scraper()

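I tried adding it inside my loop, but that just produces all of the output for every "h4" heading, which isn't right. This seems like it should be simple, but it really has me stumped. Any suggestions?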


1 answer
User
#1 · Posted 2024-05-29 02:50:17

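Select each quarter's container with the CSS selector below, then iterate over the containers so that the heading and the links are looked up within the same section.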

import requests
from bs4 import BeautifulSoup
import urllib3


def scraper():

    # verify=False skips TLS certificate checks; silence the resulting warning
    urllib3.disable_warnings()

    FITBurl = "https://ir.53.com/annual-and-quarterly-results"

    FITBr = requests.get(url=FITBurl, verify=False)

    FITBsoup = BeautifulSoup(FITBr.text, 'html.parser')

    mylist = {}  # quarter heading -> earnings release link
    # each '.view-inner-wrapper' is treated as one quarter's section
    for item in FITBsoup.select('.view-inner-wrapper'):
        # finds quarter name within this section only
        heading = item.select_one("h4")
        if heading is None:
            continue  # section without a heading; nothing to label
        quarter = heading.text.strip()
        print(quarter)

        for items in item.find_all("div", class_="filefield-file"):  # add [0:5] to the end to just get the latest
            for x in items.select("a"):
                title = x.text.strip()
                name = x['title'][:-4]  # drop the last four characters (presumably '.pdf')
                if title == 'Quarterly Earnings Release':
                    link = x['href']
                    print('{} {}: {}'.format(title, name, link))
                    mylist[quarter] = link  # keyed by the quarter heading

    return mylist


scraper()
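
If a page does not wrap each quarter in its own container, a similar pairing can be made by walking back from each link to the nearest preceding "h4". The following is a minimal offline sketch of that alternative, not what the answer above does. SAMPLE_HTML is hypothetical markup invented for illustration (only the "filefield-file" class and the link text are taken from the question), and releases_by_quarter is a made-up helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: headings and file links are siblings,
# with no per-quarter wrapper element around them.
SAMPLE_HTML = """
<h4>First Quarter 2020</h4>
<div class="filefield-file">
  <a href="/files/q1_2020.pdf" title="q1_2020.pdf">Quarterly Earnings Release</a>
</div>
<div class="filefield-file">
  <a href="/files/q1_2020_slides.pdf" title="q1_2020_slides.pdf">Presentation</a>
</div>
<h4>Fourth Quarter 2019</h4>
<div class="filefield-file">
  <a href="/files/q4_2019.pdf" title="q4_2019.pdf">Quarterly Earnings Release</a>
</div>
"""


def releases_by_quarter(html):
    """Map each quarter heading to its 'Quarterly Earnings Release' link."""
    soup = BeautifulSoup(html, "html.parser")
    releases = {}
    for anchor in soup.select("div.filefield-file a"):
        if anchor.get_text(strip=True) != "Quarterly Earnings Release":
            continue
        # find_previous searches backwards in document order for the nearest <h4>.
        heading = anchor.find_previous("h4")
        if heading is None:
            continue  # link appears before any heading; skip it
        releases[heading.get_text(strip=True)] = anchor["href"]
    return releases


if __name__ == "__main__":
    for quarter, link in releases_by_quarter(SAMPLE_HTML).items():
        print(f"{quarter}: {link}")
```

Scoping the search to a container, as in the answer, is usually the more robust choice when such a container exists. The backwards search relies only on document order, so it may mislabel links if an unrelated "h4" sits between a section heading and its files.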

Output

First Quarter 2020
Quarterly Earnings Release Q1_2020_Earnings_Release_-_Final: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/Q1_2020_Earnings_Release_-_Final.pdf
Fourth Quarter 2019
Quarterly Earnings Release Q4_2019_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/Q4_2019_Earnings_Release.pdf
Third Quarter 2019
Quarterly Earnings Release FITB_3Q19_Quarterly_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_3Q19_Quarterly_Earnings_Release.pdf
Second Quarter 2019
Quarterly Earnings Release FITB_2Q19_Quarterly_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_2Q19_Quarterly_Earnings_Release.pdf
First Quarter 2019
Quarterly Earnings Release FITB_1Q19_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_1Q19_Earnings_Release.pdf
Fourth Quarter 2018
Quarterly Earnings Release 4Q18_Fifth_Third_Bancorp_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/4Q18_Fifth_Third_Bancorp_Earnings_Release.pdf
Third Quarter 2018
Quarterly Earnings Release Fifth_Third_Bancorp_3Q18_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/Fifth_Third_Bancorp_3Q18_Earnings_Release.pdf
Second Quarter 2018
Quarterly Earnings Release FITB_2Q18_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_2Q18_Earnings_Release.pdf
First Quarter 2018
Quarterly Earnings Release 1Q18_FITB_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/1Q18_FITB_Earnings_Release.pdf
Fourth Quarter 2017
Quarterly Earnings Release 4Q17_FITB_Earnings_Release_1: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/4Q17_FITB_Earnings_Release_1.pdf
Third Quarter 2017
Quarterly Earnings Release FITB_3Q17_Earnings_Release_1: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_3Q17_Earnings_Release_1.pdf
Second Quarter 2017
Quarterly Earnings Release Fifth_Third_Bancorp_-_2Q17_-_Earnings_Release_1: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/Fifth_Third_Bancorp_-_2Q17_-_Earnings_Release_1.pdf
First Quarter 2017
Quarterly Earnings Release 1Q17_Earnings_Release_report_: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/1Q17_Earnings_Release_report__0.pdf
Fourth Quarter 2016
Quarterly Earnings Release Fifth_Third_Bancorp_4Q16_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/Fifth_Third_Bancorp_4Q16_Earnings_Release.pdf
Third Quarter 2016
Quarterly Earnings Release 3Q16_Earnings_Release_FINAL: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/3Q16_Earnings_Release_FINAL.pdf
Second Quarter 2016
Quarterly Earnings Release Earnings_Release_2Q16_FINAL: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/Earnings_Release_2Q16_FINAL.pdf
First Quarter 2016
Quarterly Earnings Release FITB_1Q16_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_1Q16_Earnings_Release.pdf
Fourth Quarter 2015
Quarterly Earnings Release 4Q15_Earnings_Release_Final: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/4Q15_Earnings_Release_Final.pdf
Third Quarter 2015
Quarterly Earnings Release 3Q15_Earnings_Release_Final: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/3Q15_Earnings_Release_Final.pdf
Second Quarter 2015
Quarterly Earnings Release 2Q15_Earnings_Release_Final: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/2Q15_Earnings_Release_Final.pdf
First Quarter 2015
Quarterly Earnings Release FITB_1Q15_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_1Q15_Earnings_Release_0.pdf
Fourth Quarter 2014
Quarterly Earnings Release Earnings_Release_4Q14: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/Earnings_Release_4Q14.pdf
Third Quarter 2014
Quarterly Earnings Release 3Q14_FITB_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/3Q14_FITB_Earnings_Release.pdf
Second Quarter 2014
Quarterly Earnings Release 2Q14_Earnings_Release_FINAL: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/2Q14_Earnings_Release_FINAL.pdf
First Quarter 2014
Quarterly Earnings Release 1Q14_Earnings_Release_Final_with_Back_Tables1: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/1Q14_Earnings_Release_Final_with_Back_Tables1.pdf
Fourth Quarter 2013
Quarterly Earnings Release FITB_Q414_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_Q414_Earnings_Release.pdf
Third Quarter 2013
Quarterly Earnings Release FITB_Release_3Q13_FINAL_with_Tables: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_Release_3Q13_FINAL_with_Tables.pdf
Second Quarter 2013
Quarterly Earnings Release 2Q13EarningsReleaseFINAL2: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/2Q13EarningsReleaseFINAL2.pdf
First Quarter 2013
Quarterly Earnings Release FITB_1Q13_Earnings_Release: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB_1Q13_Earnings_Release.pdf
Fourth Quarter 2012
Quarterly Earnings Release 4Q12_Earnings_Release_Final1: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/4Q12_Earnings_Release_Final1.pdf
Third Quarter 2012
Quarterly Earnings Release 3Q12_Earnings_Release_FINAL: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/3Q12_Earnings_Release_FINAL_0.pdf
Second Quarter 2012
Quarterly Earnings Release 2Q12_Earnings_Release_final: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/2Q12_Earnings_Release_final.pdf
First Quarter 2012
Quarterly Earnings Release FITB-1Q12_Earnings_Release_FINAL2: https://ir.53.com/sites/53-e.investorhq.businesswire.com/files/report/additional/FITB-1Q12_Earnings_Release_FINAL2.pdf
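
The headings in this output all follow an "<Ordinal> Quarter <Year>" pattern, while the file names follow no single convention, which is one reason the heading makes a better label. As a rough sketch, assuming every heading keeps that pattern, it could be turned into a compact, sortable key. The names quarter_key and quarter_label are hypothetical helpers, not part of the original answer.

```python
import re

# Ordinal words as they appear in the headings of the output above.
ORDINALS = {"First": 1, "Second": 2, "Third": 3, "Fourth": 4}
HEADING_RE = re.compile(r"^(First|Second|Third|Fourth) Quarter (\d{4})$")


def quarter_key(heading):
    """Return (year, quarter) for a heading like 'First Quarter 2020'.

    Returns None when the heading does not match the assumed pattern.
    """
    match = HEADING_RE.match(heading.strip())
    if match is None:
        return None
    return int(match.group(2)), ORDINALS[match.group(1)]


def quarter_label(heading):
    """Return a compact label such as '2020Q1', or None if unparseable."""
    key = quarter_key(heading)
    if key is None:
        return None
    year, quarter = key
    return f"{year}Q{quarter}"


if __name__ == "__main__":
    headings = ["Fourth Quarter 2019", "First Quarter 2020", "Third Quarter 2019"]
    # Sorting on the (year, quarter) tuple gives chronological order.
    for heading in sorted(headings, key=quarter_key):
        print(quarter_label(heading), heading)
```

Because the key is a (year, quarter) tuple, sorting on it orders the releases chronologically regardless of the order in which the page lists them.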
