How to concatenate pandas DataFrames without copying the data?

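import numpy as np
import pandas as pd
from pprint import pprint as pp  # assumption: pp is a pretty-printer; print() works too

dates = pd.date_range('2013-01-01', periods=6)  # assumption: daily index, matching the output below

arr = np.random.randn(12).reshape(6, 2)
df = pd.DataFrame(arr, columns=('VALE5', 'PETR4'), index=dates)
arr2 = np.random.randn(12).reshape(6, 2)
# note: df2 is built from arr, not arr2, which is why the B columns below repeat the A columns
df2 = pd.DataFrame(arr, columns=('AMBV3', 'BBDC4'), index=dates)
df_concat = pd.concat(dict(A=df, B=df2), axis=1)
pp(df)
pp(df_concat)
arr[0, 0] = 9999999.99  # the change shows up in df but not in df_concat
pp(df)
pp(df_concat)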

In [56]: pp(df)
               VALE5     PETR4
2013-01-01 -0.557180  0.170073
2013-01-02 -0.975797  0.763136
2013-01-03 -0.913254  1.042521
2013-01-04 -1.973013 -2.069460
2013-01-05 -1.259005  1.448442
2013-01-06 -0.323640  0.024857

In [57]: pp(df_concat)
                   A                   B
               VALE5     PETR4     AMBV3     BBDC4
2013-01-01 -0.557180  0.170073 -0.557180  0.170073
2013-01-02 -0.975797  0.763136 -0.975797  0.763136
2013-01-03 -0.913254  1.042521 -0.913254  1.042521
2013-01-04 -1.973013 -2.069460 -1.973013 -2.069460
2013-01-05 -1.259005  1.448442 -1.259005  1.448442
2013-01-06 -0.323640  0.024857 -0.323640  0.024857

In [58]: arr[0, 0] = 9999999.99

In [59]: pp(df)
                     VALE5     PETR4
2013-01-01  9999999.990000  0.170073
2013-01-02       -0.975797  0.763136
2013-01-03       -0.913254  1.042521
2013-01-04       -1.973013 -2.069460
2013-01-05       -1.259005  1.448442
2013-01-06       -0.323640  0.024857

In [60]: pp(df_concat)
                   A                   B
               VALE5     PETR4     AMBV3     BBDC4
2013-01-01 -0.557180  0.170073 -0.557180  0.170073
2013-01-02 -0.975797  0.763136 -0.975797  0.763136
2013-01-03 -0.913254  1.042521 -0.913254  1.042521
2013-01-04 -1.973013 -2.069460 -1.973013 -2.069460
2013-01-05 -1.259005  1.448442 -1.259005  1.448442
2013-01-06 -0.323640  0.024857 -0.323640  0.024857

1 answer

Answer #1 · posted 2024-04-26 11:04:33

You can't (at least not easily). When you call concat, it ends up calling np.concatenate.
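A minimal NumPy-only sketch of the effect seen in the question, assuming the copy happens at the np.concatenate level as stated above. The arrays left, right and joined are illustrative stand-ins for the frames' underlying data, not anything pandas exposes by those names:

```python
import numpy as np

# Two independent input arrays (stand-ins for the two frames' data)
left = np.zeros((3, 2))
right = np.ones((3, 2))

# np.concatenate writes its result into a freshly allocated array
joined = np.concatenate([left, right], axis=1)

print(np.shares_memory(joined, left), np.shares_memory(joined, right))  # False False

# Mutating an input afterwards therefore does not show up in the result,
# which mirrors arr[0, 0] changing df but not df_concat
left[0, 0] = 9999999.99
print(joined[0, 0])  # 0.0
```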

See this answer explaining why you can't concatenate arrays without copying. The short version is that the input arrays are not guaranteed to be contiguous in memory.
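Even when two pieces do happen to sit back to back in memory, np.concatenate does not try to exploit that. This sketch splits one array into two adjacent views (first and second are illustrative names) and reads their start addresses from the standard `__array_interface__` attribute:

```python
import numpy as np

a = np.arange(10)
first, second = a[:5], a[5:]  # two views into a single buffer

# Each view is a start address plus strides into a's buffer
start_first = first.__array_interface__["data"][0]
start_second = second.__array_interface__["data"][0]

# Here the halves really are adjacent in memory...
print(start_first + first.nbytes == start_second)  # True

# ...but np.concatenate still allocates a new array and copies into it
joined = np.concatenate([first, second])
print(np.shares_memory(joined, a))  # False
```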

Here's a simple example:

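import numpy as np

a = np.random.rand(2, 10)
x, y = a  # unpacking the rows gives two views into a
z = np.vstack((x, y))  # vstack always builds a new array
print('x.base is a and y.base is a ==', x.base is a and y.base is a)
print('x.base is z or y.base is z ==', x.base is z or y.base is z)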

Output:

x.base is a and y.base is a == True
x.base is z or y.base is z == False

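Even though x and y share the same base, namely a, concatenate (and therefore vstack) cannot assume that they do, because in general it has to concatenate arrays with arbitrary strides.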
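To illustrate "arbitrary strides", this sketch (assuming int64 elements so the byte strides are predictable; x, y and z are illustrative) concatenates two views of the same base that have different strides. The result is a new, contiguous array with one uniform stride:

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(2, 6)
x = a[0, ::2]  # every other element of row 0: stride 16 bytes
y = a[1]       # all of row 1: stride 8 bytes

print(x.strides, y.strides)  # (16,) (8,)

# np.concatenate accepts inputs with unrelated layouts and copies them
# into one new buffer with a single, regular stride
z = np.concatenate((x, y))
print(z.strides, z.flags["C_CONTIGUOUS"])  # (8,) True
print(z)  # [ 0  2  4  6  7  8  9 10 11]
```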

You can easily create two arrays that share the same memory but have different strides, like this:

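import numpy as np

a = np.arange(10)  # the output below assumes the default integer type is int64
b = a[::2]  # every other element; shares a's buffer
print(a.strides)
print(b.strides)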

Output:

(8,)
(16,)
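The strides are byte steps: element i of a 1-D array sits at byte offset i * strides[0] from the array's start. This sketch, assuming int64 elements as in the output above, rebuilds b by hand with `numpy.lib.stride_tricks.as_strided` over a's buffer (rebuilt is an illustrative name):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10, dtype=np.int64)
b = a[::2]

# Stepping 2 * itemsize bytes per element visits a[0], a[2], a[4], ...
rebuilt = as_strided(a, shape=(5,), strides=(2 * a.itemsize,))

print(np.array_equal(rebuilt, b))  # True
print(np.shares_memory(b, a))      # True
```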

That's why the following happens:

In [214]: a = arange(10)

In [215]: a[::2].view(int16)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-215-0366fadb1128> in <module>()
----> 1 a[::2].view(int16)

ValueError: new type not compatible with array.

In [216]: a[::2].copy().view(int16)
Out[216]: array([0, 0, 0, 0, 2, 0, 0, 0, 4, 0, 0, 0, 6, 0, 0, 0, 8, 0, 0, 0], dtype=int16)
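The same session as a self-contained script, assuming int64 input; the names strided, refused and packed are illustrative. The exact error text differs between NumPy versions, so it is printed rather than hard-coded:

```python
import numpy as np

a = np.arange(10, dtype=np.int64)
strided = a[::2]

# The strided view is not contiguous, so NumPy refuses to reinterpret
# its bytes as a smaller dtype
print(strided.flags["C_CONTIGUOUS"])  # False
refused = False
try:
    strided.view(np.int16)
except ValueError as exc:
    refused = True
    print("ValueError:", exc)

# Copying first yields a contiguous buffer whose bytes can be reinterpreted:
# each int64 becomes four int16 values
packed = strided.copy().view(np.int16)
print(packed.shape)  # (20,)
```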

EDIT: Using pd.merge(df1, df2, copy=False) (or df1.merge(df2, copy=False)) when df1.dtype != df2.dtype will not make a copy. Otherwise, a copy is made.
