Question about the copy() method

Posted 2024-04-25 11:54:11


import pandas as pd

df1 = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]})
df2 = df1.copy()
df1.loc[0, 'A'] = '111'  # modify the 1st element of column A
print(df1)
print(df2)

Modifying df1 does not modify df2. That is what I expected, because I used copy().

s1 = pd.Series([[1, 2], [3, 4]])
s2 = s1.copy()
s1[0][0] = 0  # modify the 1st element of the list [1, 2]
print(s1)
print(s2)

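But why does s2 change in this case as well? I did not expect s2 to change at all, because I created it with copy(), yet modifying s1 unexpectedly modified s2 too. I don't understand why.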


2 answers

This happens because the pd.Series has dtype=object, so copying it really only copies a bunch of references to Python objects. Observe:

In [1]: import pandas as pd

In [2]: s1=pd.Series([[1,2],[3,4]])
   ...:

In [3]: s1
Out[3]:
0    [1, 2]
1    [3, 4]
dtype: object

In [4]: s1.dtype
Out[4]: dtype('O')

Because list objects are mutable, the operation:

s1[0][0]=0

modifies the list in place.

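This behavior is a kind of "shallow copy". It is usually not a problem with pandas data structures: normally you will be using numeric dtypes, where the shallow/deep distinction does not apply, or, if you do use object dtype, you will be storing Python string objects, which are immutable.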
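As a small illustration of the point above, the following sketch (the variable names are made up for this example) copies a numeric Series and a Series of strings. In both cases, assigning to an element of the original leaves the copy untouched, because the assignment replaces a value rather than mutating a shared object.

```python
import pandas as pd

# Numeric data: copy() duplicates the underlying values.
nums = pd.Series([1, 2, 3])
nums_copy = nums.copy()
nums.iloc[0] = 100
print(nums_copy.iloc[0])  # 1

# Strings are immutable: assigning to a slot rebinds it to a new str,
# it never changes the string object the copy still refers to.
words = pd.Series(["aaa", "bbb"])
words_copy = words.copy()
words.iloc[0] = "111"
print(words_copy.iloc[0])  # aaa
```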

Note that pandas containers have a different notion of a deep copy. The .copy method defaults to deep=True, but as the documentation says:

When deep=True (default), a new object will be created with a copy of the calling object's data and indices. Modifications to the data or indices of the copy will not be reflected in the original object (see notes below).

When deep=False, a new object will be created without copying the calling object's data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa). ... When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).
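The standard-library distinction that the documentation refers to can be seen on a plain nested list, without pandas. This is only a sketch of copy.copy versus copy.deepcopy. By analogy, the element-level behavior of Series.copy() on a Series of lists resembles the shallow case here.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = copy.copy(original)    # new outer list, same inner list objects
deep = copy.deepcopy(original)   # new outer list and new inner lists

original[0][0] = 0               # mutate an inner list in place

print(shallow[0])                 # [0, 2]
print(deep[0])                    # [1, 2]
print(shallow[0] is original[0])  # True
print(deep[0] is original[0])     # False
```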

Again, this is because pandas is designed for numeric dtypes, with some built-in support for str objects. A pd.Series of list objects is really odd, and it is not a good use case for pd.Series.

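When you copy the s1 object, it really does create a new, independent Series object and bind it to s2, just as you expected. However, the two lists inside the s1 Series object are not duplicated along with the Series. Only the references to them are copied.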

See here for the difference between a Python reference and a Python object.

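Put simply, a Python variable is not the same thing as the actual Python object. Variables (such as s1 and s2) are just references that point to the memory location where the actual object lives.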
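A minimal plain-Python sketch of that idea (the names a, b, and c are illustrative only): binding a second name to a list does not create a second list, whereas building a new list from it does.

```python
a = [1, 2]
b = a          # a second name for the same list object
c = list(a)    # a new list object with equal contents

b.append(3)    # mutates the one object that both a and b refer to

print(a)               # [1, 2, 3]
print(c)               # [1, 2]
print(a is b, a is c)  # True False
```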

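Because the original Series object s1 contains two list references rather than two list objects, only the references to the inner lists were copied (not the list objects themselves).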

import pandas as pd

s1 = pd.Series([[1, 2], [3, 4]])
# The object referenced by variable "s1" has a memory address
print("s1:", hex(id(s1)))
s2 = s1.copy()
# The object referenced by variable "s2" has a different memory address
print("s2:", hex(id(s2)))
# However, when you copied "s1", the list items inside it
# only had their references copied.
# So "s1[0]" and "s2[0]" are simply references to the same object
print("s1[0]:", hex(id(s1[0])))
print("s2[0]:", hex(id(s2[0])))

Output:

s1: 0x7fcdf5678898 # A different address from s2
s2: 0x7fcddee25240 # A different address from s1
s1[0]: 0x7fcdddf9f6c8 # The same address for the first list
s2[0]: 0x7fcdddf9f6c8 # The same address for the first list

@juanpa.arrivillaga's answer is correct: you need a deep copy of the lists themselves.
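One possible way to get that, sketched here as an assumption rather than an official pandas recipe, is to deep-copy each element with copy.deepcopy and build a new Series on the same index. Per the documentation quoted above, .copy(deep=True) alone does not copy the contained Python objects recursively.

```python
import copy

import pandas as pd

s1 = pd.Series([[1, 2], [3, 4]])

# Deep-copy every element, then rebuild a Series with the same index.
s2 = pd.Series([copy.deepcopy(v) for v in s1], index=s1.index)

s1.iloc[0][0] = 0   # mutate the first list of s1 in place

print(s1.iloc[0])   # [0, 2]
print(s2.iloc[0])   # [1, 2]
```

This costs one Python-level copy per element, so it is only reasonable for small Series. As the first answer notes, storing lists in a Series is not a good use case to begin with.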
