Rename values in a DataFrame column based on lists

2 answers

User

Answer 1 · edited 2024-05-16 10:47:48

Use a double (nested) np.where:

lis1=['A']
lis2=['S','O']

df['col2'] = np.where(df.col2.isin(lis1),'PC',
             np.where(df.col2.isin(lis2),'Ln','others'))

print (df)
   col1    col2
0     1      PC
1     2      PC
2     3      Ln
3     4      Ln
4     5      Ln
5     6  others
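
To run the snippet above end to end, a minimal self-contained sketch might look like the following. The input DataFrame is an assumption: it copies the sample data shown in the second answer (col2 = A, A, S, O, S, P), and the imports are added for completeness.

```python
import numpy as np
import pandas as pd

# Assumed sample input, matching the data shown in the second answer
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': list('AASOSP')})

lis1 = ['A']
lis2 = ['S', 'O']

# Outer where: values found in lis1 become 'PC'.
# Inner where: of the remaining values, those in lis2 become 'Ln',
# and everything else becomes 'others'.
df['col2'] = np.where(df['col2'].isin(lis1), 'PC',
             np.where(df['col2'].isin(lis2), 'Ln', 'others'))

print(df)
```

Because the outer condition is checked first, a value that appeared in both lists would end up as 'PC'.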

Timings:

#[60000 rows x 2 columns]
df = pd.concat([df]*10000).reset_index(drop=True)

In [257]: %timeit np.where(df.col2.isin(lis1),'PC',np.where(df.col2.isin(lis2),'Ln','others'))
100 loops, best of 3: 8.15 ms per loop

In [258]: %timeit in1d_based(df, lis1, lis2)
100 loops, best of 3: 4.98 ms per loop

User

Answer 2 · edited 2024-05-16 10:47:48

Here's one approach -

a = df.col2.values
df.col2 = np.take(['others','PC','Ln'], np.in1d(a,lis1) + 2*np.in1d(a,lis2))

Step-by-step sample run -

# Input dataframe
In [206]: df
Out[206]: 
   col1 col2
0     1    A
1     2    A
2     3    S
3     4    O
4     5    S
5     6    P

# Extract out col2 values
In [207]: a = df.col2.values

# Form an indexing array based on where we have matches in lis1 or lis2 or neither
In [208]: idx = np.in1d(a,lis1) + 2*np.in1d(a,lis2)

In [209]: idx
Out[209]: array([1, 1, 2, 2, 2, 0])

# Index into a list of new strings with those indices
In [210]: newvals = np.take(['others','PC','Ln'], idx)

In [211]: newvals
Out[211]: 
array(['PC', 'PC', 'Ln', 'Ln', 'Ln', 'others'], 
      dtype='|S6')

# Finally assign those into col2
In [212]: df.col2 = newvals

In [213]: df
Out[213]: 
   col1    col2
0     1      PC
1     2      PC
2     3      Ln
3     4      Ln
4     5      Ln
5     6  others
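
The steps above can be condensed into one runnable sketch. It uses np.isin rather than np.in1d, on the assumption that a recent NumPy is installed (to my knowledge np.in1d is deprecated as of NumPy 2.0 in favour of np.isin). The helper name relabel and the overlap check are hypothetical additions, not part of the original answer. The index arithmetic only works if lis1 and lis2 do not overlap: a value present in both would get index 3, which is out of range for the three-label list.

```python
import numpy as np
import pandas as pd

def relabel(values, lis1, lis2, labels=('others', 'PC', 'Ln')):
    """Hypothetical helper: map values to labels via an integer index array."""
    in1 = np.isin(values, lis1)
    in2 = np.isin(values, lis2)
    # Guard the assumption that the two lists are disjoint
    if np.any(in1 & in2):
        raise ValueError("lis1 and lis2 must not overlap")
    # False/True act as 0/1, so idx is 0 (neither), 1 (lis1) or 2 (lis2)
    idx = in1.astype(int) + 2 * in2.astype(int)
    return np.take(labels, idx)

# Assumed sample input, same as in the sample run above
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': list('AASOSP')})
df['col2'] = relabel(df['col2'].to_numpy(), ['A'], ['S', 'O'])
print(df)
```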

Runtime test -

In [251]: df=pd.DataFrame({'col1':[1,2,3,4,5,6], 'col2':list('AASOSP')})

In [252]: df = pd.concat([df]*10000).reset_index(drop=True)

In [253]: lis1
Out[253]: ['A']

In [254]: lis2
Out[254]: ['S', 'O']

In [255]: def in1d_based(df, lis1, lis2):
     ...:     a = df.col2.values
     ...:     return np.take(['others','PC','Ln'], np.in1d(a,lis1) + 2*np.in1d(a,lis2))
     ...: 

# @jezrael's soln
In [256]: %timeit np.where(df.col2.isin(lis1),'PC', np.where(df.col2.isin(lis2),'Ln','others'))
100 loops, best of 3: 3.78 ms per loop

In [257]: %timeit in1d_based(df, lis1, lis2)
1000 loops, best of 3: 1.89 ms per loop
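
Outside IPython, the same comparison can be sketched with the standard timeit module. This is only an approximation of the %timeit runs above: the function names where_based and isin_based are hypothetical, np.isin stands in for np.in1d, and absolute numbers depend on hardware and library versions, so no expected timings are given.

```python
import timeit
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': list('AASOSP')})
df = pd.concat([df] * 10000).reset_index(drop=True)  # 60000 rows
lis1, lis2 = ['A'], ['S', 'O']

def where_based(df, lis1, lis2):
    # Nested np.where, as in the first answer
    return np.where(df['col2'].isin(lis1), 'PC',
                    np.where(df['col2'].isin(lis2), 'Ln', 'others'))

def isin_based(df, lis1, lis2):
    # Index-array approach, with np.isin substituted for np.in1d
    a = df['col2'].to_numpy()
    return np.take(['others', 'PC', 'Ln'],
                   np.isin(a, lis1) + 2 * np.isin(a, lis2))

# Both approaches should produce identical labels
same = np.array_equal(where_based(df, lis1, lis2), isin_based(df, lis1, lis2))
print("same result:", same)

for fn in (where_based, isin_based):
    # Best of 3 repeats, 20 calls each, reported per call in milliseconds
    best = min(timeit.repeat(lambda: fn(df, lis1, lis2), number=20, repeat=3))
    print(f"{fn.__name__}: {best / 20 * 1e3:.2f} ms per loop")
```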
