Loading and saving a pandas.DataFrame between Python 2 and Python 3: pickle protocol issues

7 votes
3 answers
6608 views
Asked 2025-04-17 14:04

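I haven't figured out how to load and save a pickled pandas DataFrame between Python 2 and Python 3. pickle has a "protocol" option, which I tried without success, so I'm hoping someone can give me a simple pointer. Here is the code that produces the errors: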

python2.7

>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a2')
>>> a = pandas.DataFrame.load('a2')
>>> a = pandas.DataFrame.load('a3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python2.7/site-packages/pandas-0.10.1-py2.7-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
ValueError: unsupported pickle protocol: 3

python3

>>> import pandas; from pylab import *
>>> a = pandas.DataFrame(randn(10,10))
>>> a.save('a3')
>>> a = pandas.DataFrame.load('a3')
>>> a = pandas.DataFrame.load('a2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/generic.py", line 30, in load
    return com.load(path)
  File "/usr/local/lib/python3.3/site-packages/pandas-0.10.1-py3.3-linux-x86_64.egg/pandas/core/common.py", line 1107, in load
    return pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf4 in position 0: ordinal not in range(128)

Or is it too optimistic to expect pickle to work across Python versions?

3 Answers

1

You can override the highest protocol available in the pickle package:

import pickle as pkl
import pandas as pd
if __name__ == '__main__':
    # this constant is defined in pickle.py in the pickle package:
    pkl.HIGHEST_PROTOCOL = 2
    # 'foo.pkl' was saved in pickle protocol 4
    df = pd.read_pickle(r"C:\temp\foo.pkl")

    # 'foo_protocol_2' will be saved in pickle protocol 2 
    # and can be read in pandas with Python 2
    df.to_pickle(r"C:\temp\foo_protocol_2.pkl")

This is admittedly not an elegant solution, but it gets the job done without modifying the pandas code directly.

Update: I found that newer versions of pandas let you specify the pickle protocol in the .to_pickle function: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_pickle.html
DataFrame.to_pickle(path, compression='infer', protocol=4)
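To confirm which protocol a file was actually written with, you can look at its first two bytes. The sketch below uses only the standard library; `pickle_protocol` is a hypothetical helper name, and a plain dict stands in for the DataFrame (an assumption made so the example runs without pandas).

```python
import pickle


def pickle_protocol(data):
    """Return the protocol number recorded in a pickle byte string.

    Protocol 2 and later start with the PROTO opcode (0x80) followed by
    one byte holding the protocol number. Protocols 0 and 1 carry no such
    header, so None is returned for them.
    """
    if len(data) >= 2 and data[0:1] == b"\x80":
        return ord(data[1:2])
    return None


# Stand-in for a DataFrame; any picklable object behaves the same way here.
payload = {"a": [1.0, 2.0], "b": [3.0, 4.0]}

as_protocol_2 = pickle.dumps(payload, protocol=2)
as_highest = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

print(pickle_protocol(as_protocol_2))  # protocol requested explicitly
print(pickle_protocol(as_highest))     # highest protocol of this interpreter
```

A file whose header reports a protocol above 2 is what triggers the "unsupported pickle protocol" error in Python 2.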

1

If you are using pandas.DataFrame.to_pickle(), you can modify the source code as follows so that the pickle protocol can be set:

1) In the source file /pandas/io/pickle.py (before editing, copy the original to /pandas/io/pickle.py.ori), find the following lines:

def to_pickle(obj, path):

pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)

Change them to:

def to_pickle(obj, path, protocol=pkl.HIGHEST_PROTOCOL):

pkl.dump(obj, f, protocol=protocol)

2) In the source file /pandas/core/generic.py (again, copy the original to /pandas/core/generic.py.ori before editing), find the following lines:

def to_pickle(self, path):

return to_pickle(self, path)

Change them to:

def to_pickle(self, path, protocol=None):

return to_pickle(self, path, protocol)

3) If your Python kernel is running, restart it, then save your DataFrame with any available pickle protocol (0, 1, 2, 3, 4):

# Python 2.x can read this
df.to_pickle('my_dataframe.pck', protocol=2)

# protocol=None makes pickle use its default protocol (3 or higher on Python 3), Python 2.x can not read this
df.to_pickle('my_dataframe.pck')

4) After upgrading pandas, repeat steps 1 and 2.

5) (Optional) Ask the developers to add this feature to an official release (without these modifications, your code will fail in other Python environments).

Have a nice day!
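If patching the installed pandas sources is not an option, a small wrapper around the standard pickle module gives the same control over the protocol. This is only a sketch: `save_with_protocol` is a hypothetical helper, and the dict below stands in for a DataFrame so the example runs without pandas.

```python
import os
import pickle
import tempfile


def save_with_protocol(obj, path, protocol=2):
    """Pickle obj to path with an explicit protocol (hypothetical helper)."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=protocol)


# Stand-in for a DataFrame; a real one would be passed the same way.
frame_like = {"col": [1, 2, 3]}

# Write into a throwaway directory so the example leaves no stray files behind.
out_path = os.path.join(tempfile.mkdtemp(), "my_dataframe.pck")
save_with_protocol(frame_like, out_path, protocol=2)

# Read the file back to check the round trip.
with open(out_path, "rb") as f:
    restored = pickle.load(f)
```

Because nothing inside the pandas package is edited, this survives a pandas upgrade without repeating steps 1 and 2.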

8

I ran into the same problem. In Python 3, you can use the following function to change the pickle protocol of the saved DataFrame file:

import pickle
def change_pickle_protocol(filepath,protocol=2):
    with open(filepath,'rb') as f:
        obj = pickle.load(f)
    with open(filepath,'wb') as f:
        pickle.dump(obj,f,protocol=protocol)

After that you can open the file in Python 2 without problems.
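One caveat with rewriting the file in place: opening it with 'wb' truncates it before the new pickle is written, so a failure during the dump would lose the data. A possible variant, sketched here with a hypothetical function name, writes to a temporary file in the same directory and swaps it in only after the dump succeeds. It assumes Python 3.3+ for `os.replace`.

```python
import os
import pickle
import tempfile


def change_pickle_protocol_safely(filepath, protocol=2):
    """Rewrite a pickle file with another protocol, keeping the original
    intact unless the new file was written successfully."""
    with open(filepath, "rb") as f:
        obj = pickle.load(f)

    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(filepath))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp, protocol=protocol)
        os.replace(tmp_path, filepath)  # swap in the converted file
    except BaseException:
        # Clean up the partial temporary file and leave the original as is.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Usage: create a file with the interpreter's highest protocol, then convert it.
demo_path = os.path.join(tempfile.mkdtemp(), "frame.pkl")
with open(demo_path, "wb") as f:
    pickle.dump({"x": [1, 2, 3]}, f, protocol=pickle.HIGHEST_PROTOCOL)

change_pickle_protocol_safely(demo_path, protocol=2)
```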
