msgpack in pandas is supposed to be a replacement for pickle.
This is a lightweight portable binary format, similar to binary JSON, that is highly space efficient, and provides good performance both on the writing (serialization), and reading (deserialization).
However, in my timings its performance does not seem to measure up to pickle's: both writing and reading are slower.
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(10000, 100))
>>> %timeit df.to_pickle('test.p')
10 loops, best of 3: 22.4 ms per loop
>>> %timeit df.to_msgpack('test.msg')
10 loops, best of 3: 36.4 ms per loop
>>> %timeit pd.read_pickle('test.p')
100 loops, best of 3: 10.5 ms per loop
>>> %timeit pd.read_msgpack('test.msg')
10 loops, best of 3: 24.6 ms per loop
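The timings above rely on IPython's %timeit. Outside IPython, a similar measurement can be sketched with the standard-library timeit module. This is only a rough stand-in: it times pickle alone, on a plain list of lists rather than a DataFrame, so that it runs without pandas or msgpack installed. The names rows, payload, write_s and read_s are illustrative and do not come from the question.

```python
import pickle
import random
import timeit

# Stand-in for the DataFrame above: 1,000 rows of 100 random floats.
random.seed(0)
rows = [[random.gauss(0.0, 1.0) for _ in range(100)] for _ in range(1000)]

# Serialize once so the read benchmark has something to load.
payload = pickle.dumps(rows)

# Best of 3 repeats, 10 calls per repeat, reported per call.
write_s = min(timeit.repeat(lambda: pickle.dumps(rows), number=10, repeat=3)) / 10
read_s = min(timeit.repeat(lambda: pickle.loads(payload), number=10, repeat=3)) / 10

print(f"pickle write: {write_s * 1e3:.2f} ms per call")
print(f"pickle read:  {read_s * 1e3:.2f} ms per call")
```

The absolute numbers depend on the machine and the Python version, so they are not directly comparable with the figures in the question.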
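Question: given pickle's potential security issues, what benefits does msgpack offer over pickle? Is pickle still the preferred way to serialize data, or are there better alternatives available now?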
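For context on the security concern: unpickling can call arbitrary callables named in the byte stream, which is why the Python documentation warns against loading pickles from untrusted sources. A minimal sketch of the mechanism, using a hypothetical Payload class and a harmless expression:

```python
import pickle


class Payload:
    """Hypothetical class whose pickle tells the loader to call a function."""

    def __reduce__(self):
        # pickle stores (callable, args); pickle.loads() then calls callable(*args).
        # The expression here is harmless, but any importable callable could be named.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())
result = pickle.loads(data)  # eval("6 * 7") runs during loading

print(result)  # 42
```

The loaded value is the result of the call, not a Payload instance, which shows that the byte stream rather than the reader decides what gets executed.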
Pickle is better suited to the following cases:
(... protocol=)
(... cloudpickle)
MsgPack is better suited to the following cases:
As @Jeff mentioned above, this blogpost may be of interest.
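The protocol= mentioned in the pickle list above is the argument to pickle.dump and pickle.dumps that selects the version of the serialization format. As a rough illustration (not a measurement from the original answer), the sketch below pickles the same list of floats with the oldest protocol and with the newest one available, then compares the sizes. The variable names are illustrative.

```python
import pickle
import random

random.seed(0)
values = [random.random() for _ in range(1000)]

# Protocol 0 is the original text-based format; floats are written as decimal text.
old = pickle.dumps(values, protocol=0)

# Newer protocols are binary and store each float as 8 raw bytes plus an opcode.
new = pickle.dumps(values, protocol=pickle.HIGHEST_PROTOCOL)

print(f"protocol 0: {len(old)} bytes")
print(f"protocol {pickle.HIGHEST_PROTOCOL}: {len(new)} bytes")

# Both encodings round-trip to the same data.
assert pickle.loads(old) == pickle.loads(new) == values
```

For numeric data the binary protocols produce a noticeably smaller payload, which is presumably why the answer ties pickle's advantage on numerical data to the choice of protocol.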