Update a DataFrame with new data while keeping the existing ID numb

import numpy as np
import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4], 'gp':['a','a','b','b'], 'meta':['one','two','three','four'], 'matchvar':['wwww','w ww w','xxxx','xyxx'], 'match':[np.nan,'yes',np.nan,'no']})

for g in df.groupby(['gp']): print(g[1])

   id gp   meta matchvar match
0   1  a    one     wwww   NaN
1   2  a    two   w ww w   yes
   id gp   meta matchvar match
2   3  b  three     xxxx   NaN
3   4  b   four     xyxx    no

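2 Answers

User

Answer 1 · edited 2024-05-23 20:12:03

Since you confirmed in the comments that every group has 2 rows, you can try this logic: create a mask m that separates the groups containing 'no' from the groups containing 'yes'. For the 'yes' groups, shift id down by one row and keep only the last row of each group with drop_duplicates, then use concat to put that result back together with the untouched 'no' groups.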

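# True for every row of a group that contains a 'no'
m = df.match.eq('no').groupby(df.gp).transform('any')
# shift ids down one row, keep the groups without 'no', take each group's last row
df_yes = (df.assign(id=df.id.shift(fill_value=0))[~m]
            .drop_duplicates('gp', keep='last'))
# append the untouched 'no' groups
df_final = pd.concat([df_yes, df[m]])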

Out[108]:
   id gp   meta matchvar match
1   1  a    two   w ww w   yes
2   3  b  three     xxxx   NaN
3   4  b   four     xyxx    no
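
To make the intermediate steps easier to follow, here is a self-contained sketch that rebuilds the question's df and prints the mask and the shifted ids before combining them. The name shifted is introduced here only for illustration and is not part of the answer above; the sketch also relies on the same assumption of exactly 2 rows per group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'gp': ['a', 'a', 'b', 'b'],
    'meta': ['one', 'two', 'three', 'four'],
    'matchvar': ['wwww', 'w ww w', 'xxxx', 'xyxx'],
    'match': [np.nan, 'yes', np.nan, 'no'],
})

# True for every row of a group that contains at least one 'no'
m = df['match'].eq('no').groupby(df['gp']).transform('any')
print(m.tolist())              # [False, False, True, True]

# ids moved down one row, so each row carries the previous row's id
# ("shifted" is a hypothetical helper name used only in this sketch)
shifted = df.assign(id=df['id'].shift(fill_value=0))
print(shifted['id'].tolist())  # [0, 1, 2, 3]

# groups without 'no': keep the last row, which now holds the first row's id
df_yes = shifted[~m].drop_duplicates('gp', keep='last')

# groups with 'no' are appended unchanged
df_final = pd.concat([df_yes, df[m]])
print(df_final['id'].tolist())  # [1, 3, 4]
```

Because shift is called with fill_value=0, no NaN is introduced and the id column stays an integer column.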

User

Answer 2 · edited 2024-05-23 20:12:03

Following your logic, and using only vectorized methods to keep the code efficient, we can do the following:

mask_yes = df['match'].eq('yes') # array with True for rows with 'yes'
mask_no = df['match'].eq('no')   # array with True for rows with 'no'

# if the row is 'yes', get the shifted id, else the original id
df['id'] = np.where(mask_yes, df['id'].shift(), df['id']) 

# if a group has 'no' mark all rows as True so we can keep the whole group
mask = df.assign(indicator=mask_no).groupby('gp')['indicator'].transform('any')
# filter on groups with 'no' or only the row 'yes'
df = df[mask | mask_yes]

    id gp   meta matchvar match
1  1.0  a    two   w ww w   yes
2  3.0  b  three     xxxx   NaN
3  4.0  b   four     xyxx    no
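
The ids in the output above are floats because a plain shift() puts NaN in the first row, which upcasts the column. If integer ids matter, one possible variation (a sketch, not part of the answer above) is to pass fill_value=0 to shift. This assumes the very first row is never a 'yes' row, since it would otherwise receive the placeholder id 0. The name result is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'gp': ['a', 'a', 'b', 'b'],
    'meta': ['one', 'two', 'three', 'four'],
    'matchvar': ['wwww', 'w ww w', 'xxxx', 'xyxx'],
    'match': [np.nan, 'yes', np.nan, 'no'],
})

mask_yes = df['match'].eq('yes')  # True for rows with 'yes'
mask_no = df['match'].eq('no')    # True for rows with 'no'

# fill_value=0 avoids the NaN that a plain shift() would insert,
# so np.where returns an integer array instead of floats
df['id'] = np.where(mask_yes, df['id'].shift(fill_value=0), df['id'])

# True for every row of a group that contains a 'no'
mask = mask_no.groupby(df['gp']).transform('any')

# keep whole 'no' groups, plus the 'yes' rows of the other groups
# ("result" is a hypothetical name used only in this sketch)
result = df[mask | mask_yes]
print(result['id'].tolist())  # [1, 3, 4]
```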
