Concatenation inside a loop does not update the DataFrame

Posted 2024-03-28 16:46:49


for i in [train1,test1]:
    df_dummies = pd.get_dummies(i['Name'], prefix='Name',dummy_na=False)
    #print(df_dummies.head())
    #i.drop('Name',1,inplace=True)
    i = pd.concat([i,df_dummies],axis=1)
    print(i.head())

Output:

       PassengerId  Pclass  Name  Sex   Age  SibSp  Parch   Ticket     Fare  \
0          892       3   Mr.    1  34.5      0      0   330911   7.8292   
1          893       3  Mrs.    0  47.0      1      0   363272   7.0000   
2          894       2   Mr.    1  62.0      0      0   240276   9.6875   
3          895       3   Mr.    1  27.0      0      0   315154   8.6625   
4          896       3  Mrs.    0  22.0      1      1  3101298  12.2875   

   Embarked  Name_Dr.  Name_Master.  Name_Miss.  Name_Mr.  Name_Mrs.  \
0         2         0             0           0         1          0   
1         0         0             0           0         0          1   
2         2         0             0           0         1          0   
3         0         0             0           0         1          0   
4         0         0             0           0         0          1   

   Name_Rev.  Name_other  
0          0           0  
1          0           0  
2          0           0  
3          0           0  
4          0           0 

But when I check again outside the for loop, the dummy variables are not there:

print(test1.head())

Output:

       PassengerId  Pclass  Name  Sex   Age  SibSp  Parch   Ticket     Fare  \
0          892       3   Mr.    1  34.5      0      0   330911   7.8292   
1          893       3  Mrs.    0  47.0      1      0   363272   7.0000   
2          894       2   Mr.    1  62.0      0      0   240276   9.6875   
3          895       3   Mr.    1  27.0      0      0   315154   8.6625   
4          896       3  Mrs.    0  22.0      1      1  3101298  12.2875   

   Embarked  
0         2  
1         0  
2         2  
3         0  
4         0  

I am clearly missing something here. Please help me find the mistake; I suspect it has to do with DataFrame copies versus references.


2 Answers

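I think you need to assign each result back into the list of DataFrames. Your solution does not work because concat returns a new DataFrame, so i = pd.concat(...) only rebinds the loop variable i and leaves train1 and test1 untouched.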

L = [train1,test1]

for i, df in enumerate(L):
    df_dummies = pd.get_dummies(df['Name'], prefix='Name',dummy_na=False)
    #print(df_dummies.head())
    #df.drop('Name', axis=1, inplace=True)
    # store the new DataFrame back in the list; concat does not modify df
    L[i] = pd.concat([df,df_dummies],axis=1)


print (L[0])
   PassengerId  Pclass  Name  Sex   Age  SibSp  Parch   Ticket     Fare  \
0          892       3   Mr.    1  34.5      0      0   330911   7.8292   
1          893       3  Mrs.    0  47.0      1      0   363272   7.0000   
2          894       2   Mr.    1  62.0      0      0   240276   9.6875   
3          895       3   Mr.    1  27.0      0      0   315154   8.6625   
4          896       3  Mrs.    0  22.0      1      1  3101298  12.2875   

   Name_Mr.  Name_Mrs.  
0         1          0  
1         0          1  
2         1          0  
3         1          0  
4         0          1  
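
To make the difference concrete, here is a minimal sketch that contrasts rebinding the loop variable with assigning into the list by index. The two small frames are hypothetical stand-ins for train1 and test1 (only a Name and an Age column are assumed), so the printed columns will not match the Titanic data above.

```python
import pandas as pd

# Hypothetical stand-ins for train1 and test1
train1 = pd.DataFrame({"Name": ["Mr.", "Mrs."], "Age": [34.5, 47.0]})
test1 = pd.DataFrame({"Name": ["Mr.", "Miss."], "Age": [62.0, 27.0]})

# Rebinding the loop variable: the concatenated frame is only reachable
# through `frame` and is discarded on the next iteration.
for frame in [train1, test1]:
    frame = pd.concat([frame, pd.get_dummies(frame["Name"], prefix="Name")], axis=1)

print(list(train1.columns))  # train1 itself was never modified

# Assigning by index: the list slot now refers to the new frame.
frames = [train1, test1]
for idx, frame in enumerate(frames):
    frames[idx] = pd.concat([frame, pd.get_dummies(frame["Name"], prefix="Name")], axis=1)

print(list(frames[0].columns))  # includes the Name_* indicator columns
```

Note that even in the second loop, the names train1 and test1 still refer to the original frames; only the list entries are new.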

You can use a list comprehension instead.

df_list = [pd.concat([x, pd.get_dummies(x['Name'], prefix='Name',dummy_na=False)], axis=1)
                                                           for x in [train1, test1]]

df_list[0]

   PassengerId  Pclass  Name  Sex   Age  SibSp  Parch   Ticket     Fare  \
0          892       3   Mr.    1  34.5      0      0   330911   7.8292   
1          893       3  Mrs.    0  47.0      1      0   363272   7.0000   
2          894       2   Mr.    1  62.0      0      0   240276   9.6875   
3          895       3   Mr.    1  27.0      0      0   315154   8.6625   
4          896       3  Mrs.    0  22.0      1      1  3101298  12.2875   

   Embarked  Name_Mr.  Name_Mrs.  
0         2         1          0  
1         0         0          1  
2         2         1          0  
3         0         1          0  
4         0         0          1 
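
If the goal is for the names train1 and test1 themselves to carry the new columns, one option is to unpack the comprehension result back into those names. The sketch below assumes this is what is wanted; add_name_dummies is a hypothetical helper introduced only for illustration, and the two small frames again stand in for the real data.

```python
import pandas as pd


def add_name_dummies(frame):
    """Hypothetical helper: return a new frame with Name_* indicator columns appended."""
    dummies = pd.get_dummies(frame["Name"], prefix="Name", dummy_na=False)
    return pd.concat([frame, dummies], axis=1)


# Hypothetical stand-ins for train1 and test1
train1 = pd.DataFrame({"Name": ["Mr.", "Mrs."], "Age": [34.5, 47.0]})
test1 = pd.DataFrame({"Name": ["Mr.", "Miss."], "Age": [62.0, 27.0]})

# Build the new frames, then rebind both names in one statement
train1, test1 = [add_name_dummies(x) for x in (train1, test1)]

print(test1.columns.tolist())
```

Because each frame gets dummies only for the titles it actually contains, the two results can end up with different Name_* columns, as the question's own output suggests.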
