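<p>Pandas now offers <a href="https://pandas.pydata.org/pandas-docs/stable/io.html#io-perf" rel="nofollow noreferrer">a wide variety of formats</a> that are more stable than tofile(). tofile() is best for quick file storage where you do not expect the file to be used on a different machine, since the data there may have a different endianness (big-/little-endian).</p>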
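<p>To illustrate the endianness caveat, here is a minimal sketch using NumPy's tofile() and fromfile(). The array contents and the file name raw.bin are made up for the example. It shows that the file holds raw bytes only, so the reader has to supply the dtype and shape, and that reading with the opposite byte order silently yields different values.</p>

```python
import os
import tempfile

import numpy as np

# Illustrative data; any small integer array would do.
arr = np.arange(6, dtype=np.int32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "raw.bin")  # hypothetical file name
    arr.tofile(path)  # writes raw bytes: no dtype, shape, or byte order is stored
    size_in_bytes = os.path.getsize(path)

    # The reader must know the dtype and shape in advance.
    restored = np.fromfile(path, dtype=np.int32).reshape(2, 3)

    # Simulate a machine with the opposite byte order: same bytes, other values.
    swapped_dtype = np.dtype(np.int32).newbyteorder()
    misread = np.fromfile(path, dtype=swapped_dtype)

print(size_in_bytes)
print(restored)
print(misread)
```

<p>No error is raised in the misread case, which is why a self-describing format is the safer choice when files move between machines.</p>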
<pre><code>Format Type  Data Description     Reader          Writer
text         CSV                  read_csv        to_csv
text         JSON                 read_json       to_json
text         HTML                 read_html       to_html
text         Local clipboard      read_clipboard  to_clipboard
binary       MS Excel             read_excel      to_excel
binary       HDF5 Format          read_hdf        to_hdf
binary       Feather Format       read_feather    to_feather
binary       Parquet Format       read_parquet    to_parquet
binary       Msgpack              read_msgpack    to_msgpack
binary       Stata                read_stata      to_stata
binary       SAS                  read_sas
binary       Python Pickle Format read_pickle     to_pickle
SQL          SQL                  read_sql        to_sql
SQL          Google Big Query     read_gbq        to_gbq
</code></pre>
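<p>Each row in the table pairs a reader with a writer. As a rough sketch of that round-trip pattern, the snippet below uses to_pickle/read_pickle and to_csv/read_csv, chosen only because they need no optional dependencies. The column names and file names are made up for the example.</p>

```python
import os
import tempfile

import pandas as pd

# Illustrative frame with an integer column and a datetime column.
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "when": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "df.pkl")  # hypothetical file names
    csv_path = os.path.join(tmp, "df.csv")

    # Binary format: dtypes are stored with the data.
    df.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)

    # Text format: dtypes are re-inferred when reading.
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)
    from_csv_parsed = pd.read_csv(csv_path, parse_dates=["when"])

print(from_pickle.dtypes)
print(from_csv.dtypes)
print(from_csv_parsed.dtypes)
```

<p>The pickle round-trip returns the datetime column unchanged, while the plain CSV read brings it back as text unless parse_dates is given. Differences like this are worth checking before settling on a format.</p>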
<p>I'm currently using HDF5, but if I were on Amazon I would use Parquet.</p>
<p>An example using <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_hdf.html" rel="nofollow noreferrer">to_hdf</a>:</p>
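<pre><code>import pandas as pd  # to_hdf/read_hdf also need the PyTables package installed

df.to_hdf('tmp.hdf', key='df', mode='w')  # df is an existing DataFrame; mode='w' overwrites the file
df2 = pd.read_hdf('tmp.hdf', key='df')    # read the same key back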
</code></pre>
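<p>However, the HDF5 format may not be suitable for long-term archiving because it is <a href="https://cyrille.rossant.net/moving-away-hdf5/" rel="nofollow noreferrer">fairly complex</a>. It has a 150-page specification and only one implementation, written in about 300,000 lines of C.</p>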