How to convert a (not-one) hot encoding into a single column with multiple values per row

>>> import pandas as pd
>>> example_input = pd.DataFrame({"one" : [0,1,0,1,0], "two" : [0,0,0,0,0], "three" : [1,1,1,1,0], "four" : [1,1,0,0,0] })
>>> print(example_input)
   one  two  three  four
0    0    0      1     1
1    1    0      1     1
2    0    0      1     0
3    1    0      1     0
4    0    0      0     0
>>> desired_output = pd.DataFrame(["three, four", "one, three, four", "three", "one, three", ""])
>>> print(desired_output)
                  0
0       three, four
1  one, three, four
2             three
3        one, three
4

2 Answers

User

#1 · edited 2024-04-27 14:55:31

Here is a solution that uses a Python list comprehension to iterate over each row:

import pandas as pd

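def reverse_hot_encoding(df, sep=', '):
    # Treat every non-zero entry as "label present".
    df = df.astype(bool)
    # For each row, keep the column names whose flag is True and join them.
    labels = [sep.join(df.columns[row]) for _, row in df.iterrows()]
    return pd.Series(labels)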

if __name__ == '__main__':
    example_input = pd.DataFrame({"one"   : [0,1,0,1,0], 
                                  "two"   : [0,0,0,0,0],
                                  "three" : [1,1,1,1,0],
                                  "four"  : [1,1,0,0,0]
                                  })
    print(reverse_hot_encoding(example_input))

Here is the output:

0         three, four
1    one, three, four
2               three
3          one, three
4                    
dtype: object
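
A variation on the row-wise idea, offered as a minimal sketch rather than as part of the original answer: it loops over a plain NumPy boolean array instead of calling `iterrows()`, which avoids building a Series for every row. The function name `reverse_hot_encoding_np` is hypothetical, and the sketch assumes the column names are strings and that every non-zero entry means "label present".

```python
import pandas as pd

def reverse_hot_encoding_np(df, sep=", "):
    # Hypothetical helper, not from the original answer.
    cols = df.columns.to_numpy()          # column labels as a NumPy array
    flags = df.to_numpy(dtype=bool)       # one boolean row per DataFrame row
    # Boolean-mask the labels for each row, then join what remains.
    labels = [sep.join(cols[mask]) for mask in flags]
    return pd.Series(labels, index=df.index)

example_input = pd.DataFrame({"one":   [0, 1, 0, 1, 0],
                              "two":   [0, 0, 0, 0, 0],
                              "three": [1, 1, 1, 1, 0],
                              "four":  [1, 1, 0, 0, 0]})
result_np = reverse_hot_encoding_np(example_input)
print(result_np)
```

Unlike the version above, this one keeps the original index of the DataFrame on the returned Series.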

User

#2 · edited 2024-04-27 14:55:31

You can use DataFrame.dot, which is much faster than iterating over all the rows of the DataFrame:

df.dot(df.columns + ', ').str.rstrip(', ')

0         three, four
1    one, three, four
2               three
3          one, three
4                    
dtype: object
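
Why the dot product works here: multiplying a Python string by 0 gives an empty string and by 1 gives the string itself, and the row-wise sum then concatenates the pieces. The following is a minimal sketch of that idea, assuming the frame contains only 0/1 values and string column names. The helper name `decode_with_dot` and its `sep` parameter are illustrative and do not come from the original answer. It trims the trailing separator by slicing rather than with `rstrip`, because `rstrip(', ')` strips any trailing commas and spaces, which would also clip a column name that itself ends in one of those characters.

```python
import pandas as pd

def decode_with_dot(df, sep=", "):
    # Hypothetical helper; assumes df holds only 0/1 flags.
    labels = df.columns + sep             # e.g. "one, ", "two, ", ...
    # 0 * "one, " -> "" and 1 * "one, " -> "one, "; the dot product
    # sums (concatenates) these strings across each row.
    joined = df.astype(int).dot(labels)
    # Every non-empty result ends with exactly one separator; drop it.
    return joined.str[:-len(sep)] if sep else joined

example_input = pd.DataFrame({"one":   [0, 1, 0, 1, 0],
                              "two":   [0, 0, 0, 0, 0],
                              "three": [1, 1, 1, 1, 0],
                              "four":  [1, 1, 0, 0, 0]})
result_dot = decode_with_dot(example_input)
print(result_dot)
```

Values larger than 1 would repeat a label (2 * "one, " is "one, one, "), so counts should be converted to 0/1 flags before using this approach.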
