使用PythonyFinance多线程下载雅虎股票历史记录

2024-06-07 10:13:40 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在尝试下载一系列股票的历史数据,并将它们导出到一个csv文件中。我可以把它作为一个for循环,但当股票行情在1000年代时,这是非常缓慢的。我正在尝试多线程处理这个过程,但我不断得到许多不同的错误。有时它只会下载1个文件,有时会下载2或3个,有时甚至会下载6个,但决不会超过这一点。我猜这与6核12线程处理器有关,但我真的不知道

import csv
import os
import yfinance as yf
import pandas as pd
from threading import Thread

ticker_list = []

with open('tickers.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    name = None
    for row in reader:
        if row[0]:
            ticker_list.append(row[0])

start_date = '2019-03-03'
end_date = '2020-03-04'

data = pd.DataFrame()

def y_hist(i):
    ticker = ticker_list[i]
    data = yf.download(ticker, start=start_date, end=end_date, group_by="ticker")
    data.to_csv('yhist/' + ticker + '.csv', sep=',', encoding='utf-8')

threads = []

for i in range(os.cpu_count()):
    print('registering thread %d' % i)
    threads.append(Thread(target=y_hist,args=(i,)))

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print('done')

这是一个csv的示例文件,带有足够的代码来测试这一点ticker.csv

以下是我阅读并使用代码的页面,旨在实现这一目标:

multithreading-to-scrape-yahoo-finance

Engineer Man threads

an-introduction-to-asynchronous-programming-in-python

这是一个带有输出的简化版本,可能有助于澄清问题

import os
import pandas as pd
import yfinance as yf
from threading import Thread

ticker_list = ['IBM','MSFT','QQQ','SPY','FB','XLV','XLF','XLK','XLE','GTHX','IYR','ONE','ROG','OLED','GLD']

def y_hist():
    for ticker in ticker_list:
        print(ticker)

threads = []

for i in range(os.cpu_count()):
    threads.append(Thread(target=y_hist))

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

输出:

IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
OLEDIBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
IBM
GLD
MSFT
ROG
OLED
GLD

QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
IBM
MSFT
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
IBM
MSFT
QQQ
SPY
FB
XLV
XLF
XLK
XLE
GTHX
IYR
ONE
ROG
OLED
GLD
GLD

Tags: fboneibmtickerspymsftxleqqq
1条回答
网友
1楼 · 发布于 2024-06-07 10:13:40

虽然这并不能直接修复我的坏代码,但它是一个可以得到相同结果的解决方案。它使用yfinance内置的多线程功能。不幸的是,我仍然不知道为什么原始代码不能工作,并且仍然希望得到反馈。与此同时,如果有人正在寻找解决同一问题的办法,这将起作用

import csv
import os
import yfinance as yf
import pandas as pd
import time
start = time.time()

ticker_list = []

with open('tickers.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    name = None
    for row in reader:
        if row[0]:
            ticker_list.append(row[0])

data = yf.download(
        tickers = ticker_list,
        period = '1y',
        interval = '1d',
        group_by = 'ticker',
        auto_adjust = False,
        prepost = False,
        threads = True,
        proxy = None
    )

data = data.T

for ticker in ticker_list:
    data.loc[(ticker,),].T.to_csv('yhist/' + ticker + '.csv', sep=',', encoding='utf-8')

print('It took', time.time()-start, 'seconds.')

运行400个报价器列表的时间:

线程设置为True时

[**********************************100%*************************************已完成400件中的400件

这花了23.420897006988525秒

线程设置为False时

[**********************************100%*************************************已完成400件中的400件

这花了133.77732181549072秒

相关问题 更多 >

    热门问题