Pandas: multiple yes/no dummy variables

Posted 2024-04-25 00:19:24


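I have a data frame with several categorical variables that I need to convert to dummy variables. Gender and region (4 types) are easy to handle with pd.get_dummies. However, I also have several variables that are yes/no. How can I make the yes/no dummy columns include the variable name? For example, the "married" variable would become married_yes and {}?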

Here is my current code and a screenshot of the first five rows:

genderdummy=pd.get_dummies(bank_df['gender'])
regiondummy=pd.get_dummies(bank_df['region'])
marrieddummy=pd.get_dummies(bank_df['married'])
cardummy=pd.get_dummies(bank_df['car'])
savingsdummy=pd.get_dummies(bank_df['savings_acct'])
currentdummy=pd.get_dummies(bank_df['current_acct'])
mortgagedummy=pd.get_dummies(bank_df['mortgage'])
pepdummy=pd.get_dummies(bank_df['pep'])
newdata_df=pd.concat([genderdummy,regiondummy,marrieddummy,cardummy,savingsdummy,currentdummy,mortgagedummy,pepdummy], axis=1)
newdata_df.head()

[Screenshot: first five rows of newdata_df]

So, following the suggestion, this is what I have now:

^{pr2}$

[Screenshot]


2 Answers

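If you change your approach slightly, pandas does this for you automatically. You just need to call pd.get_dummies on a DataFrame rather than on a Series; when it is given a DataFrame, each dummy column is prefixed with the name of its source column.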

import numpy as np
import pandas as pd

# Define sample data and columns for dummy variables
df = pd.DataFrame(np.random.choice(['yes', 'no'], size=(6, 3)), columns=['gender', 'region', 'married'])
dummy_vars = ['gender', 'married']

# Create dummy variables
pd.get_dummies(df[dummy_vars])

   gender_no  gender_yes  married_no  married_yes
0          0           1           1            0
1          1           0           0            1
2          0           1           1            0
3          1           0           1            0
4          1           0           1            0
5          0           1           1            0
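Depending on the pandas version, the dummy columns may print as True/False rather than 0/1; as far as I know, pandas 2.0 and later default to a boolean dtype. A minimal sketch, assuming that behavior and using a small fixed sample frame of my own (not the random data above), requests integer columns through the dtype parameter:

```python
import pandas as pd

# Hypothetical fixed sample (not from the original post) so the result is reproducible
df = pd.DataFrame({
    "married": ["yes", "no", "yes"],
    "car": ["no", "yes", "no"],
})

# dtype=int asks for 0/1 integer columns instead of the version-dependent default
dummies = pd.get_dummies(df[["married", "car"]], dtype=int)
print(dummies)
```

The column names still come from the source columns (married_no, married_yes, car_no, car_yes); only the value type changes.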

Alternatively, you can set the prefixes explicitly with the prefix parameter:

^{pr2}$
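A minimal sketch of explicit prefixing, assuming a small made-up frame: the prefix parameter accepts a string, a list, or a dict that maps column names to prefixes. The data and the has_car prefix are my own illustrations, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question
df = pd.DataFrame({
    "married": ["yes", "no", "yes"],
    "car": ["no", "no", "yes"],
})

# A dict maps each source column to the prefix its dummy columns should carry
explicit = pd.get_dummies(df, prefix={"married": "married", "car": "has_car"})
print(list(explicit.columns))
```

A dict is handy when a column name is not the label you want in the output; for the plain case, the default prefixes already match the column names.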

Update:

With your variables, it should look like this:

genderdummy = pd.get_dummies(bank_df['gender'])
regiondummy = pd.get_dummies(bank_df['region'])
dummy_vars = ['married', 'car', 'savings_acct', 'current_acct', 'mortgage', 'pep']
other_dummies = pd.get_dummies(bank_df[dummy_vars])
newdata_df = pd.concat([genderdummy, regiondummy, other_dummies], axis=1)
newdata_df.head()

Note that dummy_vars is just a list of column names in bank_df.
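If gender and region should carry their source names as well, one option is to pass every categorical column through a single call. The sketch below assumes a made-up stand-in for bank_df (the values and the age column are invented) and uses the columns parameter of pd.get_dummies, which encodes only the listed columns:

```python
import pandas as pd

# Hypothetical stand-in for bank_df; all values are invented for illustration
bank_df = pd.DataFrame({
    "age": [48, 40, 51],
    "gender": ["FEMALE", "MALE", "FEMALE"],
    "region": ["INNER_CITY", "TOWN", "RURAL"],
    "married": ["NO", "YES", "YES"],
    "pep": ["YES", "NO", "NO"],
})

categorical = ["gender", "region", "married", "pep"]

# columns= encodes only the listed columns and leaves the others (here: age) unchanged
newdata_df = pd.get_dummies(bank_df, columns=categorical)
print(newdata_df.columns.tolist())
```

This avoids the separate pd.concat step, at the cost of gender and region columns being named gender_... and region_... rather than by their bare category values.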

Use the prefix parameter of pandas.get_dummies():

import pandas as pd

df = pd.DataFrame({'text':['cat', 'dog','cat','dog']})
df = pd.get_dummies(df['text'], prefix='text')
print(df)

Output:

^{pr2}$
