Writing a formatted binary file from a Pandas DataFrame

import numpy as np
import pandas as pd
input_file_name = 'test.hst'
input_file = open(input_file_name, 'rb')
header = input_file.read(96)
dt_header = np.dtype([('version', 'i4'), ('copyright', 'S64'), ('symbol', 'S12'),
                      ('period', 'i4'), ('digits', 'i4'), ('timesign', 'i4'),
                      ('last_sync', 'i4')])
header = np.frombuffer(header, dt_header)  # np.fromstring is deprecated for binary data
dt_records = np.dtype([('ctm', 'i4'), ('open', 'f8'), ('low', 'f8'),
                       ('high', 'f8'), ('close', 'f8'), ('volume', 'f8')])
records = np.fromfile(input_file, dt_records)
input_file.close()
df_records = pd.DataFrame(records)
# Now, do some changes in the individual values of df_records
# and then write it back to a binary file
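Before writing anything back, it can help to confirm that the dtypes describe exactly the byte counts the code assumes. A quick check (it only inspects the dtypes from the question; it does not validate them against any official .hst specification):

```python
import numpy as np

# Same layouts as in the question; NumPy packs structured dtypes without padding by default
dt_header = np.dtype([('version', 'i4'), ('copyright', 'S64'), ('symbol', 'S12'),
                      ('period', 'i4'), ('digits', 'i4'), ('timesign', 'i4'),
                      ('last_sync', 'i4')])
dt_records = np.dtype([('ctm', 'i4'), ('open', 'f8'), ('low', 'f8'),
                       ('high', 'f8'), ('close', 'f8'), ('volume', 'f8')])

# 4 + 64 + 12 + 4 * 4 = 96 bytes, matching input_file.read(96)
print(dt_header.itemsize)   # 96
# 4 + 5 * 8 = 44 bytes per record
print(dt_records.itemsize)  # 44
```

If either number disagrees with the real file layout, every record after the header will be read (and written) misaligned.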

2 answers

Anonymous user

#1 · edited 2024-05-26 11:12:00

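Pandas now offers a wide variety of formats that are more robust than tofile() (a NumPy ndarray method). tofile() is best suited to quick file storage when you do not expect the file to be read on another machine, because that machine may use a different endianness (big- vs little-endian), and tofile() stores only raw bytes with no byte-order or dtype information.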
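If you do use tofile(), one way to sidestep the endianness problem is to spell out the byte order in the dtype. A minimal sketch with a made-up two-field dtype (the field names and values are illustrative only):

```python
import os
import tempfile
import numpy as np

# '<' pins little-endian byte order, so the bytes on disk are the same on any machine.
# Plain 'i4'/'f8' would use the native byte order of whichever machine runs the code.
dt_le = np.dtype([('ctm', '<i4'), ('close', '<f8')])

data = np.array([(1, 1.5), (2, 2.5)], dtype=dt_le)
print(data.tobytes()[:4])   # b'\x01\x00\x00\x00' (the integer 1, little-endian)

path = os.path.join(tempfile.mkdtemp(), 'records.bin')
data.tofile(path)           # raw bytes only: no header, no dtype information

# The reader must supply the same dtype; tofile() does not store it
back = np.fromfile(path, dtype=dt_le)
print(back['close'])        # [1.5 2.5]
```

The reader still has to know the dtype out of band; the explicit `<` only guarantees that both sides agree on byte order.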

Format Type   Data Description       Reader           Writer
text          CSV                    read_csv         to_csv
text          JSON                   read_json        to_json
text          HTML                   read_html        to_html
text          Local clipboard        read_clipboard   to_clipboard
binary        MS Excel               read_excel       to_excel
binary        HDF5 Format            read_hdf         to_hdf
binary        Feather Format         read_feather     to_feather
binary        Parquet Format         read_parquet     to_parquet
binary        Msgpack                read_msgpack     to_msgpack     (removed in pandas 1.0)
binary        Stata                  read_stata       to_stata
binary        SAS                    read_sas         (none)
binary        Python Pickle Format   read_pickle      to_pickle
SQL           SQL                    read_sql         to_sql
SQL           Google Big Query       read_gbq         to_gbq
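As an illustration of one binary writer from the table, here is a round trip through to_pickle. It is chosen only because it needs nothing beyond pandas (to_parquet and to_feather need pyarrow, to_hdf needs PyTables). The column names mirror the question's records; the values are made up:

```python
import os
import tempfile
import numpy as np
import pandas as pd

# Made-up rows with the same columns as the question's dt_records
df = pd.DataFrame({
    'ctm': np.array([1000, 1060], dtype='int32'),
    'open': [1.10, 1.11],
    'low': [1.09, 1.10],
    'high': [1.12, 1.13],
    'close': [1.11, 1.12],
    'volume': [10.0, 12.0],
})

path = os.path.join(tempfile.mkdtemp(), 'records.pkl')
df.to_pickle(path)              # stores values together with dtypes and column names
df2 = pd.read_pickle(path)

print(df2.dtypes['ctm'])        # int32
print(df2.equals(df))           # True
```

Unlike tofile(), the dtypes travel with the file. Note that pickle files should only be loaded from trusted sources, and this does not produce the original fixed-layout .hst format.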

I currently use HDF5, but if I were on Amazon I would use Parquet.

Example using to_hdf:

# to_hdf/read_hdf require the PyTables package ("tables")
df.to_hdf('tmp.hdf', key='df', mode='w')
df2 = pd.read_hdf('tmp.hdf', key='df')

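However, HDF5 may not be well suited to long-term archiving because it is fairly complex: it has a 150-page specification and only a single 300,000-line C implementation.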

Anonymous user

#2 · edited 2024-05-26 11:12:00

I'm not sure whether the DataFrame is a view or a copy of the array, but assuming it is a copy, you can use the to_records method of the DataFrame.

This returns a record array, which you can then write to disk with tofile.

For example:

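df_records = pd.DataFrame(records)
# do some stuff
# index=False keeps the DataFrame index out of the record dtype,
# so each record keeps the same layout as dt_records
new_recarray = df_records.to_records(index=False)
# tofile writes raw bytes with no .npy header, so np.load cannot read it back
new_recarray.tofile("myfile.bin")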

The data is written as packed bytes, in the layout described by the recarray's dtype.
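Putting the pieces together, here is a minimal end-to-end sketch. It builds a small synthetic test.hst (hypothetical data) using the header and record dtypes from the question, which are not checked against any official .hst specification. It then changes one column in the DataFrame and writes the original header bytes followed by the modified records. It passes index=False to to_records so the index does not become an extra field, and casts back to dt_records in case an edit changed a column's dtype:

```python
import os
import tempfile
import numpy as np
import pandas as pd

dt_header = np.dtype([('version', 'i4'), ('copyright', 'S64'), ('symbol', 'S12'),
                      ('period', 'i4'), ('digits', 'i4'), ('timesign', 'i4'),
                      ('last_sync', 'i4')])
dt_records = np.dtype([('ctm', 'i4'), ('open', 'f8'), ('low', 'f8'),
                       ('high', 'f8'), ('close', 'f8'), ('volume', 'f8')])

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'test.hst')   # hypothetical input, built here for the demo
dst = os.path.join(tmp, 'out.hst')

# Build a synthetic input file: one header followed by two records
header = np.zeros(1, dtype=dt_header)
header['symbol'] = b'EURUSD'
rows = np.array([(1000, 1.1, 1.0, 1.2, 1.15, 10.0),
                 (1060, 1.15, 1.1, 1.3, 1.25, 12.0)], dtype=dt_records)
with open(src, 'wb') as f:
    header.tofile(f)
    rows.tofile(f)

# Read it back as in the question: raw header bytes, then the records
with open(src, 'rb') as f:
    raw_header = f.read(dt_header.itemsize)
    records = np.fromfile(f, dtype=dt_records)

df_records = pd.DataFrame(records)
df_records['volume'] *= 2             # "do some stuff"

# Write the untouched header bytes, then the modified records
with open(dst, 'wb') as f:
    f.write(raw_header)
    df_records.to_records(index=False).astype(dt_records).tofile(f)

# Verify by reading the new file with the same dtypes
with open(dst, 'rb') as f:
    new_header = np.frombuffer(f.read(dt_header.itemsize), dtype=dt_header)
    new_records = np.fromfile(f, dtype=dt_records)

print(new_header['symbol'][0])        # b'EURUSD'
print(new_records['volume'])          # [20. 24.]
```

Keeping the header as raw bytes avoids re-encoding fields you did not change; if you need to edit header values, parse it with np.frombuffer, copy it, modify the copy, and write it with tofile instead.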
