Pandas: comparing a value with the previous row under a filter condition

+----+-------------+---------------+----------+
|    | table_num   | name          |   salary |
|----+-------------+---------------+----------|
|  0 | 001234      | John Johnson  |     1200 |
|  1 | 001234      | John Johnson  |     1000 |
|  2 | 001235      | John Johnson  |     1000 |
|  3 | 001235      | John Johnson  |     1200 |
|  4 | 001235      | John Johnson  |     1000 |
|  5 | 001235      | Steve Stevens |     1000 |
|  6 | 001236      | Steve Stevens |     1200 |
|  7 | 001236      | Steve Stevens |     1200 |
|  8 | 001236      | Steve Stevens |     1200 |
+----+-------------+---------------+----------+

# giving us only unique combination of 'table_num' and 'name'
# since there can be same 'table_num' for different 'name'
# and same names with different 'table_num' appears sometimes
names_df = df[['table_num', 'name']].drop_duplicates()

# then extracting particular name and table_num from Series
for i in range(len(names_df)):  ### Bottleneck of whole script ###
    t = names_df.iloc[i,[0,1]][0]
    n = names_df.iloc[i,[0,1]][1]
    # using shift() and lambda to check if there difference between two rows
    diff_sal = (df[(df['table_num']==t) & ((df['name']==n))]['salary']
                - df[(df['table_num']==t) & ((df['name']==n))]['salary'].shift(1)
               ).apply(lambda x: 1 if x>0 else (-1 if x<0 else 0))
    df.loc[diff_sal.index, 'inc'] = diff_sal.values

import pandas as pd

df = pd.DataFrame({
    'table_num': ['001234', '001234', '001235', '001235', '001235',
                  '001235', '001236', '001236', '001236'],
    'name': ['John Johnson', 'John Johnson', 'John Johnson', 'John Johnson',
             'John Johnson', 'Steve Stevens', 'Steve Stevens', 'Steve Stevens',
             'Steve Stevens'],
    'salary': [1200., 1000., 1000., 1200., 1000., 1000., 1200., 1200., 1200.],
})

+----+-------------+---------------+----------+-------+
|    | table_num   | name          |   salary |   inc |
|----+-------------+---------------+----------+-------|
|  0 | 001234      | John Johnson  |     1200 |     0 |
|  1 | 001234      | John Johnson  |     1000 |    -1 |
|  2 | 001235      | John Johnson  |     1000 |     0 |
|  3 | 001235      | John Johnson  |     1200 |     1 |
|  4 | 001235      | John Johnson  |     1000 |    -1 |
|  5 | 001235      | Steve Stevens |     1000 |     0 |
|  6 | 001236      | Steve Stevens |     1200 |     0 |
|  7 | 001236      | Steve Stevens |     1200 |     0 |
|  8 | 001236      | Steve Stevens |     1200 |     0 |
+----+-------------+---------------+----------+-------+

3 Answers

User

Answer 1 · edited 2024-06-10 18:54:14

Use DataFrameGroupBy.diff() together with numpy.sign(), then convert the result to integers:

import numpy as np

df['new'] = np.sign(df.groupby(['table_num', 'name'])['salary'].diff().fillna(0)).astype(int)
print (df)
   table_num           name  salary  new
0       1234   John Johnson    1200    0
1       1234   John Johnson    1000   -1
2       1235   John Johnson    1000    0
3       1235   John Johnson    1200    1
4       1235   John Johnson    1000   -1
5       1235  Steve Stevens    1000    0
6       1236  Steve Stevens    1200    0
7       1236  Steve Stevens    1200    0
8       1236  Steve Stevens    1200    0
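
To make the steps in this answer easier to follow, here is a minimal self-contained sketch that rebuilds the question's sample data and splits the one-liner into stages. The intermediate name step and the output column name inc are only illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question (table_num kept as zero-padded strings)
df = pd.DataFrame({
    'table_num': ['001234', '001234', '001235', '001235', '001235',
                  '001235', '001236', '001236', '001236'],
    'name': ['John Johnson'] * 5 + ['Steve Stevens'] * 4,
    'salary': [1200., 1000., 1000., 1200., 1000., 1000., 1200., 1200., 1200.],
})

# Row-to-row difference inside each (table_num, name) group.
# The first row of every group has no predecessor, so it becomes NaN.
step = df.groupby(['table_num', 'name'])['salary'].diff()

# Treat "no previous row" as "no change", then keep only the sign.
df['inc'] = np.sign(step.fillna(0)).astype(int)

print(df)
```

Because diff() is computed per group, no explicit loop over the unique (table_num, name) pairs is needed.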

User

Answer 2 · edited 2024-06-10 18:54:14

Use groupby() together with diff():

df['inc'] = df.groupby(['table_num', 'name'])['salary'].diff().fillna(0.0)
df.loc[df['inc'] > 0.0, 'inc'] = 1.0
df.loc[df['inc'] < 0.0, 'inc'] = -1.0
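
Note that the three lines above leave inc as a float column (0.0, 1.0, -1.0). If integers are wanted, as in the question's expected table, one possible sketch is to wrap the same steps in a small function and cast at the end. The function name add_direction and its parameters are hypothetical, chosen only for this illustration.

```python
import pandas as pd

def add_direction(frame, keys, value_col, out_col='inc'):
    """Hypothetical helper: -1 / 0 / 1 for decrease / no change / increase
    of value_col relative to the previous row of the same group."""
    result = frame.copy()  # leave the caller's DataFrame untouched
    result[out_col] = result.groupby(keys)[value_col].diff().fillna(0.0)
    result.loc[result[out_col] > 0.0, out_col] = 1.0
    result.loc[result[out_col] < 0.0, out_col] = -1.0
    result[out_col] = result[out_col].astype(int)  # float flags -> integers
    return result

# Small sample in the shape of the question's data
sample = pd.DataFrame({
    'table_num': ['001234', '001234', '001235', '001235'],
    'name': ['John Johnson'] * 4,
    'salary': [1200., 1000., 1000., 1200.],
})

out = add_direction(sample, ['table_num', 'name'], 'salary')
print(out)
```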

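User

Answer 3 · edited 2024-06-10 18:54:14

shift() is the right tool, but you should try to avoid loops. Here we can take advantage of the power of groupby() and transform(). Check the pandas docs.

In your case, you can write: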

# Group by both keys so that a change of table_num also starts a new comparison
df.assign(inc=lambda d: d.groupby(['table_num', 'name'])
                         .salary
                         .transform(lambda s: s - s.shift(1))
                         .apply(lambda v: 1 if v > 0 else (-1 if v < 0 else 0))
          )

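Note that assign() returns a new DataFrame with the added inc column; df itself is unchanged unless you assign the result back to it.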
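
As a variation on this answer, the Python-level lambdas can probably be avoided altogether by calling shift() directly on the grouped column. The sketch below is one way to do that under the assumption that rows should be compared only within the same (table_num, name) pair, as the question's expected table suggests. It also shows how the result differs when grouping by name alone; the variable names prev, delta, result and by_name are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'table_num': ['001234', '001234', '001235', '001235', '001235',
                  '001235', '001236', '001236', '001236'],
    'name': ['John Johnson'] * 5 + ['Steve Stevens'] * 4,
    'salary': [1200., 1000., 1000., 1200., 1000., 1000., 1200., 1200., 1200.],
})

keys = ['table_num', 'name']

# Previous salary within the same group (NaN for the first row of a group)
prev = df.groupby(keys)['salary'].shift(1)

# Difference to the previous row; a missing predecessor counts as "no change"
delta = (df['salary'] - prev).fillna(0)

# assign() returns a new DataFrame and leaves df as it was
result = df.assign(inc=np.sign(delta).astype(int))
print(result)

# For comparison: grouping by name only compares row 6 with row 5,
# even though they belong to different table_num values
by_name = np.sign(df.groupby('name')['salary'].diff().fillna(0)).astype(int)
print(by_name.tolist())
```

With name-only grouping, row 6 is flagged as an increase, which does not match the expected table in the question; that is the reason both keys are used here.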
