Renaming values in a DataFrame column based on lists

Published 2024-05-16 10:47:48

I have a DataFrame like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': list('AASOSP')})
df

I have two lists:

lis1=['A']
lis2=['S','O']

I need to replace the values in col2 depending on whether they appear in lis1 or lis2, so I tried np.where, like this:

df['col2'] = np.where(df.col2.isin(lis1),'PC',df.col2.isin(lis2),'Ln','others')

But this raises an error:

TypeError: function takes at most 3 arguments (5 given)

Any suggestions are much appreciated!

Ultimately, my goal is to replace the values in col2 so that the DataFrame looks like this:

   col1    col2
0     1      PC
1     2      PC
2     3      Ln
3     4      Ln
4     5      Ln
5     6  others

2 answers

Use a nested np.where, since each call accepts only a condition and two values:

lis1=['A']
lis2=['S','O']

df['col2'] = np.where(df.col2.isin(lis1),'PC',
             np.where(df.col2.isin(lis2),'Ln','others'))

print (df)
   col1    col2
0     1      PC
1     2      PC
2     3      Ln
3     4      Ln
4     5      Ln
5     6  others
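As a side note, the same three-way labeling can also be written without nesting. The sketch below uses np.select, which is a different technique from the nested np.where in this answer; it assumes the same df, lis1 and lis2 as in the question, and the names conditions and choices are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': list('AASOSP')})
lis1 = ['A']
lis2 = ['S', 'O']

# Conditions are evaluated in order; for each row the first match wins.
conditions = [df['col2'].isin(lis1), df['col2'].isin(lis2)]
choices = ['PC', 'Ln']

# Rows that match neither condition receive the default label.
df['col2'] = np.select(conditions, choices, default='others')
print(df)
```

This form tends to scale more readably when a third or fourth list is added, because each new case is one more entry in both lists rather than another level of nesting.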

Timings:

#[60000 rows x 2 columns]
df = pd.concat([df]*10000).reset_index(drop=True)

In [257]: %timeit np.where(df.col2.isin(lis1),'PC',np.where(df.col2.isin(lis2),'Ln','others'))
100 loops, best of 3: 8.15 ms per loop

In [258]: %timeit in1d_based(df, lis1, lis2)
100 loops, best of 3: 4.98 ms per loop

Here's one approach -

a = df.col2.values
df.col2 = np.take(['others','PC','Ln'], np.in1d(a,lis1) + 2*np.in1d(a,lis2))

Step-by-step sample run -

# Input dataframe
In [206]: df
Out[206]: 
   col1 col2
0     1    A
1     2    A
2     3    S
3     4    O
4     5    S
5     6    P

# Extract out col2 values
In [207]: a = df.col2.values

# Form an indexing array based on where we have matches in lis1 or lis2 or neither
In [208]: idx = np.in1d(a,lis1) + 2*np.in1d(a,lis2)

In [209]: idx
Out[209]: array([1, 1, 2, 2, 2, 0])

# Index into a list of new strings with those indices
In [210]: newvals = np.take(['others','PC','Ln'], idx)

In [211]: newvals
Out[211]: 
array(['PC', 'PC', 'Ln', 'Ln', 'Ln', 'others'], 
      dtype='|S6')

# Finally assign those into col2
In [212]: df.col2 = newvals

In [213]: df
Out[213]: 
   col1    col2
0     1      PC
1     2      PC
2     3      Ln
3     4      Ln
4     5      Ln
5     6  others
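The sample run above can also be written as a self-contained script. This is only a sketch under two assumptions: np.isin is used in place of np.in1d (newer NumPy releases deprecate the latter), and lis1 and lis2 are disjoint. If a value appeared in both lists, the index would be 1 + 2 = 3, which is out of range for the three labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': list('AASOSP')})
lis1 = ['A']
lis2 = ['S', 'O']

a = df['col2'].to_numpy()

# Each boolean mask is cast to 0/1, so the sum encodes the category:
# 0 = in neither list, 1 = in lis1, 2 = in lis2.
# Assumption: the lists do not overlap (an overlap would produce index 3).
idx = np.isin(a, lis1).astype(int) + 2 * np.isin(a, lis2).astype(int)

# Position in this array must line up with the encoding above.
labels = np.array(['others', 'PC', 'Ln'])
df['col2'] = labels[idx]
print(df)
```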

Runtime test -

In [251]: df=pd.DataFrame({'col1':[1,2,3,4,5,6], 'col2':list('AASOSP')})

In [252]: df = pd.concat([df]*10000).reset_index(drop=True)

In [253]: lis1
Out[253]: ['A']

In [254]: lis2
Out[254]: ['S', 'O']

In [255]: def in1d_based(df, lis1, lis2):
     ...:     a = df.col2.values
     ...:     return np.take(['others','PC','Ln'], np.in1d(a,lis1) + 2*np.in1d(a,lis2))
     ...: 

# @jezrael's soln
In [256]: %timeit np.where(df.col2.isin(lis1),'PC', np.where(df.col2.isin(lis2),'Ln','others'))
100 loops, best of 3: 3.78 ms per loop

In [257]: %timeit in1d_based(df, lis1, lis2)
1000 loops, best of 3: 1.89 ms per loop
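Outside IPython, a comparable measurement can be made with the standard-library timeit module. The following is a rough sketch rather than a rigorous benchmark: the function names nested_where and isin_based are hypothetical, np.isin stands in for np.in1d, and the absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6], 'col2': list('AASOSP')})
df = pd.concat([df] * 10000).reset_index(drop=True)  # 60000 rows
lis1 = ['A']
lis2 = ['S', 'O']


def nested_where(df, lis1, lis2):
    # Nested np.where, as in the first answer.
    return np.where(df.col2.isin(lis1), 'PC',
                    np.where(df.col2.isin(lis2), 'Ln', 'others'))


def isin_based(df, lis1, lis2):
    # Index-arithmetic approach; assumes lis1 and lis2 are disjoint.
    a = df.col2.to_numpy()
    idx = np.isin(a, lis1).astype(int) + 2 * np.isin(a, lis2).astype(int)
    return np.array(['others', 'PC', 'Ln'])[idx]


runs = 10
# Take the best of three repeats and report the time per call.
t_where = min(timeit.repeat(lambda: nested_where(df, lis1, lis2),
                            number=runs, repeat=3)) / runs
t_isin = min(timeit.repeat(lambda: isin_based(df, lis1, lis2),
                           number=runs, repeat=3)) / runs

print(f"nested np.where: {t_where * 1e3:.2f} ms per call")
print(f"isin-based:      {t_isin * 1e3:.2f} ms per call")
```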
