merge raises an error: 'DataFrame' objects are mutable, thus they cannot be hashed

Posted 2024-05-23 15:30:29


I have a DataFrame, dfCM, that was created from another DataFrame, dfdict[dfCM], and then processed as follows:

  1. Unwanted rows were deleted
  2. Unwanted columns were deleted
  3. New columns were added

I now need to add the deleted columns from dfdict[dfCM] back to dfCM. Note that dfdict[dfCM] is stored in a dictionary of DataFrames.

I have run similar merge commands many times elsewhere in my code, but now I get this error: 'DataFrame' objects are mutable, thus they cannot be hashed

#add back deleted dfCM columns 
dfCM = pd.merge(dfCM, dfdict[dfCM], on=['ClaimID'], how = 'left', suffixes = ('', '_cm')) 
#remove duplicate columns
dfCM.filter(like='_cm',axis=1)

This is what dfCM looks like (there are more rows and columns):

index ClaimID                 MeasCode  MeasAppType
0     MCE-2019-02-02-068-01     CLA48   AR  
1     MCE-2019-02-066-01        CLA48   AR  
2     MCE-2019-02-066-01B       CLA48   AR  
3     MCE-2019-02-066-02        CLB50   AR  
4     MCE-2019-02-066-02B       CLB50   AR  
5     MCE-2019-02-067-01        CLB51   AR  

Here is a screenshot of dfdict:

[Image: what dfdict looks like]

This is what dfdict[dfCM] looks like (there are more rows and columns):

index   ClaimID                 MeasAppType  MeasDesc
0       BAY-2019_C&S_19Q1       AR           Attic insulation; Domestic hot water heater/boiler; 
1       BAY-2019_COM_19Q1       AR           Attic insulation; Domestic hot water heater/boiler; 
2       BAY-2019_Com_Q2         NR           This record is not a project
3       BAY-2019_CS_Q2          NR           This record is not a project
4       BAY-2019_EM&V_19Q1      AR           Attic insulation; Domestic hot water heater/boiler; 

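I can get the merge to work by renaming every column in dfdict[dfCM], as shown below. But this is not ideal, because I can no longer tell the duplicate columns added to dfCM apart from the unique ones, so I cannot delete the duplicates.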

    #add back deleted dfCM columns
    dfdict['dfCM'] = dfdict['dfCM'].add_suffix('_cm') #identified columns from dfCL
    dfCM = pd.merge(dfCM, dfdict['dfCM'], left_on='ClaimID', right_on='ClaimID_cm', how = 'left', suffixes = ('', '_cm'))

Is there a better way to solve this? Thanks.


1 answer
User
#1 · Posted 2024-05-23 15:30:29

You need to explain how dfdict was created, because you are trying to use a DataFrame as a dictionary key, which is not possible:

import pandas as pd
df1 = pd.DataFrame()
df2 = pd.DataFrame()
dfdict = {df1: 1, df2: 2}
Traceback (most recent call last):
  File "/Users/dgolding/PycharmProjects/team-general-wikis/venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-3207e8fd0e73>", line 1, in <module>
    {df1: 1, df2: 2}
  File "/Users/dgolding/PycharmProjects/team-general-wikis/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 1887, in __hash__
    " hashed".format(self.__class__.__name__)
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

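Perhaps your dictionary keys are actually strings holding the DataFrame variable names? In that case, you get this error when you try to use a DataFrame as the key to look up a value: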

dfdict = {"df1": df1, "df2": df2}
dfdict[df1]
Traceback (most recent call last):
  File "/Users/dgolding/PycharmProjects/team-general-wikis/venv/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-825e4ae2577b>", line 1, in <module>
    dfdict[df1]
  File "/Users/dgolding/PycharmProjects/team-general-wikis/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 1887, in __hash__
    " hashed".format(self.__class__.__name__)
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
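
For contrast, here is a minimal sketch that catches the failing lookup and then shows the string-key lookup working. The frame contents are made up for illustration, and the exact wording of the TypeError message may differ between pandas versions.

```python
import pandas as pd

# Made-up one-row frame, only for illustration.
df1 = pd.DataFrame({"ClaimID": ["A-01"]})
dfdict = {"df1": df1}  # string keys, DataFrame values

try:
    dfdict[df1]  # a dict lookup has to hash the key first, and DataFrames are unhashable
    lookup_failed = False
except TypeError as exc:
    lookup_failed = True
    print(f"TypeError: {exc}")

# Looking the frame up by its string key returns the stored object itself.
same_object = dfdict["df1"] is df1
```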

Perhaps you meant to do this: dfdict["dfCM"]
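
If the keys really are strings, the merge from the question could look roughly like the sketch below. The data is a made-up stand-in for the asker's frames, and the names `original`, `merged` and `duplicate_cols` are hypothetical. Note that `DataFrame.filter` only returns a selection and does not remove anything, so the suffixed duplicates are dropped explicitly here.

```python
import pandas as pd

# Made-up stand-in for the original frame stored in the dictionary.
original = pd.DataFrame({
    "ClaimID": ["A-01", "A-02", "A-03"],
    "MeasAppType": ["AR", "AR", "NR"],
    "MeasDesc": ["Attic insulation", "Boiler", "Not a project"],
})
dfdict = {"dfCM": original}  # keys are strings, values are DataFrames

# Processed frame: one row and one column removed, one new column added.
dfCM = original.loc[original["MeasAppType"] == "AR", ["ClaimID", "MeasAppType"]].copy()
dfCM["MeasCode"] = ["CLA48", "CLB50"]

# Look the original frame up by its string key, not by the DataFrame itself.
merged = pd.merge(dfCM, dfdict["dfCM"], on="ClaimID", how="left", suffixes=("", "_cm"))

# Columns that exist in both frames come back with the "_cm" suffix; drop them.
duplicate_cols = [col for col in merged.columns if col.endswith("_cm")]
merged = merged.drop(columns=duplicate_cols)

print(merged.columns.tolist())
```

With the empty left suffix, the columns already in dfCM keep their names and only the right-hand duplicates get "_cm", which is what makes them easy to pick out afterwards. Another option would be to select only the missing columns (plus ClaimID) from dfdict["dfCM"] before merging, for example with `Index.difference`, so that no duplicates are created in the first place.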
