Python/Pandas: how to store FX tick data for analysis

Posted 2024-05-15 02:06:49


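I'm new to programming, Python, and Pandas, so I hope this isn't a silly question.

I downloaded some FX data from here. One month of data is about 50 million rows, in CSV format, covering all pairs.

Eventually I'd like to be able to test a strategy across multiple timeframes and instruments.

Here is the code I'm using: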

import pandas as pd

file_address = '/Users/Oliver/PyCharm/FX_app/test_data/EURUSD_test.csv'

# convert_string_to_datetime is my own helper, defined elsewhere (not shown here)
df = pd.read_csv( file_address,
                  names       = ['Symbol', 'Date_Time', 'Bid', 'Ask'],
                  index_col   = 1,
                  parse_dates = True,
                  converters  = { 'Date_Time': convert_string_to_datetime }
                  )

#                                                               a non-PEP8 format,
#                                                               kept for didactic purposes

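Except on a truncated test file, this read takes a very long time.

  • Is there a way to make Pandas read the file faster?
  • Is there a limit to the amount of data Pandas can reasonably handle?

Any help would be greatly appreciated.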


1 answer
User
#1 · posted 2024-05-15 02:06:49

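I've played with tick-by-tick data for cash equities (the top 30% most liquid stocks, over 5 million records per day). Here is my strategy for handling the file-reading problem, using chunksize and HDF5.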

import pandas as pd

# this is your FX file path
file_path = '/home/Jian/Downloads/EURUSD-2015-05.csv'
# read 10,000 rows per chunk; read_csv returns a lazy iterator here, so this call itself is very fast
file_reader = pd.read_csv(file_path, header=None, names=['Symbol', 'Date_time', 'Bid', 'Ask'], index_col=['Date_time'], parse_dates=['Date_time'], chunksize=10000)

# create your HDF5 file at any path you like, with compression level 5 (0-9, 9 is the strongest)
# note: HDFStore requires the PyTables package
Jian_h5 = '/media/Primary Disk/Jian_Python_Data_Storage.h5'
h5_file = pd.HDFStore(Jian_h5, complevel=5, complib='blosc')

# then write all records into the HDF5 file
# this will take a while, but the point is that the file can be reused across IPython sessions
# enumerate(..., start=1) numbers the chunks from 1, so the printed counter matches the chunk being written
for i, chunk in enumerate(file_reader, start=1):
    h5_file.append('fx_tick_data', chunk, complevel=5, complib='blosc')
    print('Writing Chunk no.{}'.format(i))

Writing Chunk no.1
Writing Chunk no.2
Writing Chunk no.3
Writing Chunk no.4
...
Writing Chunk no.424
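The chunked-read pattern above can be tried without the real file or HDF5. The following is only a minimal sketch on a small in-memory sample: the five rows are made up, and the timestamp format '%Y%m%d %H:%M:%S.%f' is an assumption about how the CSV stores its dates. It also parses the timestamps with one vectorized pd.to_datetime call per chunk, which is often faster than a per-row converter function like the one in the question.

```python
import io
import pandas as pd

# Hypothetical sample in the same four-column layout as the FX file.
# The rows and the timestamp format are assumptions for illustration only.
sample_csv = io.StringIO(
    "EUR/USD,20150501 00:00:00.017,1.1211,1.1212\n"
    "EUR/USD,20150501 00:00:00.079,1.1212,1.1212\n"
    "EUR/USD,20150501 00:00:00.210,1.1212,1.1213\n"
    "EUR/USD,20150501 00:00:00.891,1.1212,1.1213\n"
    "EUR/USD,20150501 00:00:05.179,1.1212,1.1213\n"
)

# With chunksize set, read_csv returns an iterator that yields DataFrames lazily.
reader = pd.read_csv(
    sample_csv,
    header=None,
    names=['Symbol', 'Date_time', 'Bid', 'Ask'],
    chunksize=2,
)

parts = []
n_chunks = 0
for n_chunks, chunk in enumerate(reader, start=1):
    # Parse the whole column at once with an explicit format,
    # instead of calling a Python function on every row.
    chunk['Date_time'] = pd.to_datetime(chunk['Date_time'],
                                        format='%Y%m%d %H:%M:%S.%f')
    parts.append(chunk.set_index('Date_time'))

# Concatenating is fine for a small sample; for a full month of ticks,
# appending each chunk to an on-disk store keeps memory use bounded.
ticks = pd.concat(parts)
print(n_chunks, len(ticks))
```

Five rows read two at a time give three chunks, the last one shorter than the others.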


# check your hdf5 file, all 4,237,535 records are there
h5_file

Out[8]: 
<class 'pandas.io.pytables.HDFStore'>
File path: /media/Primary Disk/Jian_Python_Data_Storage.h5
/fx_tick_data            frame_table  (typ->appendable,nrows->4237535,ncols->3,indexers->[index])

# close file IO
h5_file.close()

# the advantage is that after you close your current session,
# you can still read the file very quickly when you open another session

# reopen your IPython session
Jian_h5 = '/media/Primary Disk/Jian_Python_Data_Storage.h5'
h5_file = pd.HDFStore(Jian_h5)  

%time fx_df = h5_file['fx_tick_data']
CPU times: user 1.93 s, sys: 439 ms, total: 2.37 s
Wall time: 2.37 s

Out[12]: 
                             Symbol     Bid     Ask
Date_time                                          
2015-05-01 00:00:00.017000  EUR/USD  1.1211  1.1212
2015-05-01 00:00:00.079000  EUR/USD  1.1212  1.1212
2015-05-01 00:00:00.210000  EUR/USD  1.1212  1.1213
2015-05-01 00:00:00.891000  EUR/USD  1.1212  1.1213
2015-05-01 00:00:05.179000  EUR/USD  1.1212  1.1213
2015-05-01 00:00:06.257000  EUR/USD  1.1212  1.1213
2015-05-01 00:00:09.195000  EUR/USD  1.1212  1.1213
2015-05-01 00:00:09.242000  EUR/USD  1.1212  1.1212
2015-05-01 00:00:09.257000  EUR/USD  1.1211  1.1212
2015-05-01 00:00:09.311000  EUR/USD  1.1211  1.1212
2015-05-01 00:00:09.538000  EUR/USD  1.1211  1.1212
2015-05-01 00:00:14.177000  EUR/USD  1.1211  1.1212
2015-05-01 00:00:14.238000  EUR/USD  1.1211  1.1212
2015-05-01 00:00:15.886000  EUR/USD  1.1211  1.1212
2015-05-01 00:00:17.122000  EUR/USD  1.1211  1.1212
...                             ...     ...     ...
2015-05-31 23:59:45.054000  EUR/USD  1.0958  1.0959
2015-05-31 23:59:45.063000  EUR/USD  1.0958  1.0958
2015-05-31 23:59:45.065000  EUR/USD  1.0958  1.0958
2015-05-31 23:59:45.073000  EUR/USD  1.0958  1.0958
2015-05-31 23:59:45.076000  EUR/USD  1.0958  1.0958
2015-05-31 23:59:45.210000  EUR/USD  1.0957  1.0958
2015-05-31 23:59:45.308000  EUR/USD  1.0957  1.0958
2015-05-31 23:59:45.806000  EUR/USD  1.0957  1.0958
2015-05-31 23:59:45.809000  EUR/USD  1.0957  1.0958
2015-05-31 23:59:45.909000  EUR/USD  1.0957  1.0958
2015-05-31 23:59:46.316000  EUR/USD  1.0957  1.0958
2015-05-31 23:59:46.527000  EUR/USD  1.0957  1.0958
2015-05-31 23:59:47.711000  EUR/USD  1.0957  1.0958
2015-05-31 23:59:51.721000  EUR/USD  1.0957  1.0958
2015-05-31 23:59:57.063000  EUR/USD  1.0957  1.0958

[4237535 rows x 3 columns]

Not bad: it takes only about 2 seconds to read the whole file back from HDF5 in a later session.
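Since the question mentions testing across multiple timeframes, here is a rough sketch of one way to turn a loaded tick frame into bars with resample. It is not part of the original answer: the tick values below are synthetic, and the Mid column and the to_bars helper are hypothetical names introduced only for this illustration.

```python
import pandas as pd

# Synthetic ticks standing in for the frame loaded from HDF5 (values are made up).
index = pd.to_datetime([
    '2015-05-01 00:00:00.017',
    '2015-05-01 00:00:30.500',
    '2015-05-01 00:01:10.000',
    '2015-05-01 00:04:59.999',
    '2015-05-01 00:05:00.001',
])
ticks = pd.DataFrame(
    {'Bid': [1.1211, 1.1213, 1.1210, 1.1212, 1.1214],
     'Ask': [1.1212, 1.1214, 1.1211, 1.1213, 1.1215]},
    index=index,
)

# Mid price as a single series to build bars from (an illustrative choice).
ticks['Mid'] = (ticks['Bid'] + ticks['Ask']) / 2


def to_bars(frame, rule):
    """Return OHLC bars of the mid price plus a tick count per bar.

    Intervals that contain no ticks are dropped.
    """
    bars = frame['Mid'].resample(rule).ohlc()
    bars['ticks'] = frame['Mid'].resample(rule).count()
    return bars[bars['ticks'] > 0]


bars_1min = to_bars(ticks, '1min')
bars_5min = to_bars(ticks, '5min')
print(bars_5min)
```

The same tick frame feeds every timeframe, so only the resample rule changes from one test to the next.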
