Mapping categorical values across multiple columns in Pandas

Posted 2024-04-19 05:13:49


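Suppose I have a dataframe with three columns of categorical data. I want to convert each combination of values in those three columns into a single value and map it back onto the original dataframe. I know this is possible for a single column, but what about multiple columns?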

Example: go from this

>>> import pandas as pd
>>> df = pd.DataFrame({'id':['0', '1', '2', '3','4'],
...                    'x':['tall', 'short', 'tall', 'short', 'tall'],
...                    'y':['fat', 'thin', 'thin', 'fat', 'fat'],
...                    'z':['male', 'female', 'female', 'male', 'male']},
...                    dtype='category')

>>> df
  id      x     y       z
0  0   tall   fat    male
1  1  short  thin  female
2  2   tall  thin  female
3  3  short   fat    male
4  4   tall   fat    male

to this, by mapping columns x, y and z:

>>> df
  id      x     y       z  map
0  0   tall   fat    male    0
1  1  short  thin  female    1
2  2   tall  thin  female    2
3  3  short   fat    male    3
4  4   tall   fat    male    0

1 answer
User
#1 · Posted 2024-04-19 05:13:49

This is what groupby().ngroup() is for:

df['map'] = df.groupby(['x','y','z'], sort=False).ngroup()
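For reference, here is a self-contained sketch of the ngroup approach on the question's data. The observed=True argument is not part of the original answer. It is assumed here so that only combinations that actually occur are numbered, and so that recent pandas versions do not warn about categorical group keys.

```python
import pandas as pd

# Same data as in the question; every column, including id, is categorical.
df = pd.DataFrame({'id': ['0', '1', '2', '3', '4'],
                   'x': ['tall', 'short', 'tall', 'short', 'tall'],
                   'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
                   'z': ['male', 'female', 'female', 'male', 'male']},
                  dtype='category')

# sort=False numbers the groups in order of first appearance.
# observed=True is an assumption added for this sketch (see the note above).
df['map'] = df.groupby(['x', 'y', 'z'], sort=False, observed=True).ngroup()

print(df)
```

Rows 0 and 4 share the combination (tall, fat, male), so they receive the same group number.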

Alternatively, if your data is of string type, you can join the columns with some special character and use the single-column approach:

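# astype(str) lets this work on category columns too; the '&' separator
# keeps ('ab', 'c') distinct from ('a', 'bc')
df['map'] = pd.factorize(df[['x','y','z']].astype(str).agg('&'.join, axis=1))[0]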

Output:

   id      x     y       z  map
0   0   tall   fat    male    0
1   1  short  thin  female    1
2   2   tall  thin  female    2
3   3  short   fat    male    3
4   4   tall   fat    male    0
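The string-joining idea can be wrapped in a small helper, sketched below. The name combo_codes and its sep parameter are hypothetical and not from the original answer. The sketch converts the columns to str first so it also works on category columns, and it shows why a separator is worth keeping: without one, two different rows can collapse into the same joined string.

```python
import pandas as pd

def combo_codes(frame, cols, sep='&'):
    """Hypothetical helper: one integer code per distinct combination of cols."""
    # Convert to plain strings (category columns cannot be concatenated
    # directly), then join each row's values with the separator.
    joined = frame[cols].astype(str).agg(sep.join, axis=1)
    # factorize assigns codes in order of first appearance.
    codes, _ = pd.factorize(joined)
    return codes

df = pd.DataFrame({'id': ['0', '1', '2', '3', '4'],
                   'x': ['tall', 'short', 'tall', 'short', 'tall'],
                   'y': ['fat', 'thin', 'thin', 'fat', 'fat'],
                   'z': ['male', 'female', 'female', 'male', 'male']},
                  dtype='category')
df['map'] = combo_codes(df, ['x', 'y', 'z'])

# Two rows that differ but concatenate to the same text without a separator.
ambiguous = pd.DataFrame({'a': ['ab', 'a'], 'b': ['c', 'bc']})
with_sep = combo_codes(ambiguous, ['a', 'b'])             # rows stay distinct
without_sep = combo_codes(ambiguous, ['a', 'b'], sep='')  # rows collide as 'abc'
```

The separator only helps if it cannot occur inside the values themselves; if that cannot be guaranteed, the groupby().ngroup() approach avoids the problem entirely.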
