How can I speed up my process?

Posted 2024-04-26 02:15:57

I wrote a script that scrapes data for a list of stocks from the web. The scraper has to pull data from two separate pages, so each stock symbol requires scraping two different pages. If I run the process on a list of 1,000 items, it takes about 30 minutes to complete. That's not terrible, and I can set it and forget it, but I'm wondering whether there's a way to speed it up. Maybe store the data and write it all at the end instead of writing on every loop? Any other ideas are welcome.

import requests
from bs4 import BeautifulSoup  # bs4, not the legacy BeautifulSoup 3 package
from progressbar import ProgressBar
import csv

symbols = {'AMBTQ','AABA','AAOI','AAPL','AAWC','ABEC','ABQQ','ACFN','ACIA','ACIW','ACLS'}
pbar = ProgressBar()

with open('industrials.csv', "ab") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    writer.writerow(['Symbol','5 Yr EPS','EPS TTM'])
    for s in pbar(symbols):
        try:
            url1 = 'https://research.tdameritrade.com/grid/public/research/stocks/fundamentals?symbol='
            full1 = url1 + s
            response1 = requests.get(full1)
            html1 = response1.content
            soup1 = BeautifulSoup(html1, 'html.parser')  # name a parser explicitly to avoid the bs4 warning

            # find() returns a single Tag (or None), so there is no need to
            # loop over its children; grab the label text directly
            hist_div = soup1.find("div", {"data-module-name": "HistoricGrowthAndShareDetailModule"})
            EPS5yr = hist_div.find('label').text

        except Exception:
            EPS5yr = 'Bad Data'

        try:
            url2 = 'https://research.tdameritrade.com/grid/public/research/stocks/summary?symbol='
            full2 = url2 + s
            response2 = requests.get(full2)
            html2 = response2.content
            soup2 = BeautifulSoup(html2, 'html.parser')

            summary_div = soup2.find("div", {"data-module-name": "StockSummaryModule"})
            # index 11 assumes EPS (TTM) is the twelfth <dd> in the module
            EPSttm = summary_div.findAll("dd")[11].text

        except Exception:
            EPSttm = 'Bad Data'

        writer.writerow([s,EPS5yr,EPSttm])
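
The run time here is dominated by waiting on HTTP responses, not by the CSV writes, so the biggest win usually comes from overlapping the network waits. Below is a minimal sketch of both ideas: fetching symbols concurrently with the standard-library ThreadPoolExecutor, and buffering the rows to write the file once at the end (the "write at the end" idea from the question). The URLs and selectors are taken from the code above; the worker count of 10 is an arbitrary assumption to tune.

import concurrent.futures
import csv

import requests
from bs4 import BeautifulSoup

FUNDAMENTALS_URL = 'https://research.tdameritrade.com/grid/public/research/stocks/fundamentals?symbol='
SUMMARY_URL = 'https://research.tdameritrade.com/grid/public/research/stocks/summary?symbol='

symbols = ['AMBTQ', 'AABA', 'AAOI', 'AAPL', 'AAWC', 'ABEC',
           'ABQQ', 'ACFN', 'ACIA', 'ACIW', 'ACLS']

def scrape_symbol(s):
    # Fetch both pages for one symbol and return one CSV row.
    try:
        soup = BeautifulSoup(requests.get(FUNDAMENTALS_URL + s).content, 'html.parser')
        hist_div = soup.find('div', {'data-module-name': 'HistoricGrowthAndShareDetailModule'})
        eps_5yr = hist_div.find('label').text
    except Exception:
        eps_5yr = 'Bad Data'
    try:
        soup = BeautifulSoup(requests.get(SUMMARY_URL + s).content, 'html.parser')
        summary_div = soup.find('div', {'data-module-name': 'StockSummaryModule'})
        eps_ttm = summary_div.findAll('dd')[11].text
    except Exception:
        eps_ttm = 'Bad Data'
    return [s, eps_5yr, eps_ttm]

# The fetches are independent, so a small thread pool overlaps the network
# waits. 10 workers is a guess; tune it and stay polite to the server.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    rows = list(pool.map(scrape_symbol, symbols))

# Buffer everything and write the file once at the end. This helps a little,
# but far less than the thread pool, since the network is the bottleneck.
with open('industrials.csv', 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['Symbol', '5 Yr EPS', 'EPS TTM'])
    writer.writerows(rows)

pool.map preserves the input order, so the rows come back in the same order as the symbol list even though the fetches complete out of order.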
