Pandas DataFrame: replace the unique values in each column

2 answers

Anonymous user

Answer #1 · edited 2024-05-29 06:50:10

I would do it like this:

In [184]: ['a','b','c'] + df.apply(lambda x: pd.factorize(x)[0]).astype(str)
Out[184]:
  col1 col2 col3
0   a0   b0   c0
1   a1   b1   c1
2   a2   b2   c2
3   a0   b0   c0
4   a2   b2   c1
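A self-contained version of the session above, for anyone who wants to run it. The answer does not show how `df` was built; this sketch assumes the sample data defined in the second answer, which reproduces the output shown. `pd.factorize` numbers each column's values in order of first appearance, and adding a list to a DataFrame broadcasts the list across the columns.

```python
import pandas as pd

# Assumed sample data (taken from the second answer; the first answer does not define df)
df = pd.DataFrame({'col1': ['Aba', 'bab', 'ccc', 'Aba', 'ccc'],
                   'col2': ['xxx', 'bhh', 'kkk', 'xxx', 'kkk'],
                   'col3': ['yyy', 'jjj', 'lll', 'yyy', 'jjj']})

# factorize each column: codes follow the order in which values first appear
codes = df.apply(lambda x: pd.factorize(x)[0]).astype(str)

# the list is aligned with the columns, so 'a' is prepended to col1,
# 'b' to col2 and 'c' to col3
res = ['a', 'b', 'c'] + codes
print(res)
```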

A more general approach:

(The code for this approach was not preserved.)
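One possible way to generalize the idea, as a sketch under assumptions rather than the answer's own code: instead of hardcoding `['a','b','c']`, generate one prefix per column from `string.ascii_lowercase` by position, or accept a custom list. The helper name `label_unique_values` is hypothetical, and the default prefixes only cover up to 26 columns.

```python
import pandas as pd
from string import ascii_lowercase


def label_unique_values(df, prefixes=None):
    """Hypothetical helper: replace each value with '<prefix><code>'.

    Codes come from pd.factorize (order of first appearance). By default the
    columns get 'a', 'b', 'c', ... by position (assumes at most 26 columns).
    """
    if prefixes is None:
        prefixes = ascii_lowercase[:df.shape[1]]
    if len(prefixes) != df.shape[1]:
        raise ValueError("need exactly one prefix per column")
    return pd.DataFrame(
        {col: [f"{p}{code}" for code in pd.factorize(df[col])[0]]
         for col, p in zip(df.columns, prefixes)},
        index=df.index,
    )


# Usage with assumed sample data
df = pd.DataFrame({'col1': ['Aba', 'bab', 'ccc', 'Aba', 'ccc'],
                   'col2': ['xxx', 'bhh', 'kkk', 'xxx', 'kkk']})
out = label_unique_values(df)
custom = label_unique_values(df, ['x', 'y'])
print(out)
```

Building each column with a comprehension keeps the original index and column order, and passing `prefixes` explicitly avoids the 26-column limit.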

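Anonymous user

Answer #2 · edited 2024-05-29 06:50:10

Here is a NumPy solution. It should be fast, since a list comprehension is usually faster than apply + lambda.

Source for generating the alphabet range: Alphabet range python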

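import numpy as np
import pandas as pd
from string import ascii_lowercase

df = pd.DataFrame({'col1': {0: 'Aba', 1: 'bab', 2: 'ccc', 3: 'Aba', 4: 'ccc'},
                   'col2': {0: 'xxx', 1: 'bhh', 2: 'kkk', 3: 'xxx', 4: 'kkk'},
                   'col3': {0: 'yyy', 1: 'jjj', 2: 'lll', 3: 'yyy', 4: 'jjj'}})

a = df.values
# np.unique(..., return_inverse=True)[1] maps each value to the position of its
# (sorted) unique value; stack the per-column codes and transpose back to rows
f = np.array([np.unique(a[:, i], return_inverse=True)[1] for i in range(a.shape[1])]).T

# prefix the codes with 'a', 'b', 'c', ... (one letter per column)
res = list(ascii_lowercase[:a.shape[1]]) + \
      pd.DataFrame(f.astype(str), columns=df.columns)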

#   col1 col2 col3
# 0   a0   b2   c2
# 1   a1   b0   c0
# 2   a2   b1   c1
# 3   a0   b2   c2
# 4   a2   b1   c0
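The labels here differ from the first answer's output (row 0 of `col2` is `b2` here but `b0` there) because `np.unique(..., return_inverse=True)` numbers values in sorted order, whereas `pd.factorize` by default numbers them in order of first appearance. A small check using `col2` from the sample data: passing `sort=True` to `pd.factorize` gives the same codes as `np.unique`.

```python
import numpy as np
import pandas as pd

col2 = pd.Series(['xxx', 'bhh', 'kkk', 'xxx', 'kkk'])

# np.unique sorts the unique values: bhh -> 0, kkk -> 1, xxx -> 2
unique_codes = np.unique(col2.to_numpy(), return_inverse=True)[1]

# pd.factorize numbers values by first appearance: xxx -> 0, bhh -> 1, kkk -> 2
first_seen_codes = pd.factorize(col2)[0]

# with sort=True, factorize uses sorted order, matching np.unique
sorted_codes = pd.factorize(col2, sort=True)[0]

print(unique_codes.tolist())      # [2, 0, 1, 2, 1]
print(first_seen_codes.tolist())  # [0, 1, 2, 0, 2]
print(sorted_codes.tolist())      # [2, 0, 1, 2, 1]
```

Either ordering gives a valid one-to-one labeling; which to use depends on whether the labels should reflect sort order or the order the values appear in the data.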
