What is a good way to prevent changes from being applied to the original DataFrame?

Posted 2024-05-23 17:42:35


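I am trying to pass a DataFrame through a series of commands (to prepare a set of arguments for a function). However, when I assign one DataFrame to another, the two seem to become the same thing. In other words, after I assign the DataFrame to a new name, every change I make is also applied to the original DataFrame. What is a good way to keep the original DataFrame in its original state, so that it can be handed to other commands for different changes?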

See the example below.

# Merge several dataframes
import pandas as pd
from functools import reduce

df1 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'eTIV': [1.12, 2.22, 3.43, 5.43], })
df2 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Ear_Vol': [5, 6, 7, 8]})
df3 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Nose': [1, 2, 3, 5], })
df4 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Eye_Vol': [1, 2, 3, 5], })
df5 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Finger': [1.3, 2.123, 3.4, 5.5], })

dfs = [df1, df2, df3, df4,df5]

df_final = reduce(lambda left,right: pd.merge(left,right,on='ID'), dfs)

df_final

       ID  eTIV  Ear_Vol  Nose  Eye_Vol  Finger
0    Mary  1.12        5     1        1   1.300
1    Mike  2.22        6     2        2   2.123
2   Barry  3.43        7     3        3   3.400
3  Scotty  5.43        8     5        5   5.500

Assigning the DataFrame to a different name and manipulating it:

df = df_final
df_raw = df
df_raw.columns = df_raw.columns.str.replace(r"_Vol", "_Vol_Raw")
df_raw = pd.DataFrame(data = df_raw, columns= df_raw.columns)

The new DataFrame (as expected):

df_raw
       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500

For some reason, the original DataFrame was changed as well (why does the assignment here change the original DataFrame?):

df

       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500

2 Answers

If you want to copy the DataFrame and create a new object, use DataFrame.copy():

# Merge several dataframes
import pandas as pd
from functools import reduce
df1 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'eTIV': [1.12, 2.22, 3.43, 5.43], })
df2 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Ear_Vol': [5, 6, 7, 8]})
df3 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Nose': [1, 2, 3, 5], })
df4 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Eye_Vol': [1, 2, 3, 5], })
df5 = pd.DataFrame({'ID': ['Mary', 'Mike', 'Barry', 'Scotty'],'Finger': [1.3, 2.123, 3.4, 5.5], })

dfs = [df1, df2, df3, df4,df5]

df_final = reduce(lambda left,right: pd.merge(left,right,on='ID'), dfs)

df_final
df = df_final

print(df is df_final) #Prints True. They are both the same dataframe.

df_raw = df.copy() #Modified

print (df is df_raw) #Prints False. the copy method created a copy of the underlying dataframe object.
df_raw.columns = df_raw.columns.str.replace(r"_Vol", "_Vol_Raw")
df_raw = pd.DataFrame(data = df_raw, columns= df_raw.columns)
print(df_raw)
print(df) #No longer affected by df_raw

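Plain assignment behaves the way you saw because, in Python, names refer to values. An assignment only gives you two labels that both point to the same underlying DataFrame object. So when the object is modified, every name bound to it reflects the change. Good further reading here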
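The name-binding behaviour described above is not specific to pandas. As a minimal sketch using a plain Python list (the variable names `original`, `alias`, and `snapshot` are made up for illustration and do not appear in the question):

```python
# Assignment binds a second name to the same object; it does not copy.
original = ["Ear_Vol", "Eye_Vol"]
alias = original            # no copy: both names refer to one list
snapshot = list(original)   # a new list holding the same items

alias[0] = "Ear_Vol_Raw"    # mutate the object through the alias

print(alias is original)     # True: same object
print(snapshot is original)  # False: independent object
print(original)              # ['Ear_Vol_Raw', 'Eye_Vol']
print(snapshot)              # ['Ear_Vol', 'Eye_Vol']
```

The same reasoning applies to `df_raw = df` in the question: setting `df_raw.columns` modifies the one object that both names share.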

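If you want to copy and rename the columns, you can do both in a single step with rename, which returns a new DataFrame and, by default, copies the underlying data: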

df_raw = df.rename(axis='columns', mapper=lambda s: s.replace(r"_Vol", "_Vol_Raw"))

print(df)
print(df_raw)

Output

       ID  eTIV  Ear_Vol  Nose  Eye_Vol  Finger
0    Mary  1.12        5     1        1   1.300
1    Mike  2.22        6     2        2   2.123
2   Barry  3.43        7     3        3   3.400
3  Scotty  5.43        8     5        5   5.500
       ID  eTIV  Ear_Vol_Raw  Nose  Eye_Vol_Raw  Finger
0    Mary  1.12            5     1            1   1.300
1    Mike  2.22            6     2            2   2.123
2   Barry  3.43            7     3            3   3.400
3  Scotty  5.43            8     5            5   5.500
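Since the question is about passing one DataFrame through several commands, one possible pattern is to have each step work on its own copy. The sketch below assumes a hypothetical helper named `add_raw_suffix` (not part of pandas or of the original answer) and a shortened two-row version of the data:

```python
import pandas as pd


def add_raw_suffix(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a renamed copy and leave `frame` untouched."""
    out = frame.copy()  # independent object, so the caller's DataFrame is safe
    # regex=False treats "_Vol" as a literal substring
    out.columns = out.columns.str.replace("_Vol", "_Vol_Raw", regex=False)
    return out


df = pd.DataFrame({"ID": ["Mary", "Mike"], "Ear_Vol": [5, 6], "Eye_Vol": [1, 2]})
df_raw = add_raw_suffix(df)

print(list(df.columns))      # ['ID', 'Ear_Vol', 'Eye_Vol']
print(list(df_raw.columns))  # ['ID', 'Ear_Vol_Raw', 'Eye_Vol_Raw']
```

Because the copy is made inside the function, the same `df` can be handed to other helpers afterwards and each one starts from the unmodified data.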
