Get the first non-null value after a groupby

Posted 2024-05-14 05:49:36


After running a groupby, I want to return the first non-null value of the utm_source column from each group.

This is the code I wrote:

file[file['steps'] == 'Sign-ups'].sort_values(by=['ts']).groupby('anonymous_id')['utm_source'].apply(lambda x: x.first_valid_index())

This is what it seems to return:

anonymous_id
00003df1-be12-47b8-b3b8-d01c84a22fdf           NaN
00009cc0-279f-4ccf-aea4-f6af1f2bb75a           NaN
0000a6a0-00bc-475f-a9e5-9dcbb4309e78           NaN
0000c906-7060-4521-8090-9cd600b08974         638.0
0000c924-5959-4e2d-8757-0d10f96ca462           NaN
0000dc27-292c-4676-8a1b-4977f2ad1577         275.0
0000df7e-2579-4071-8aa5-814ab294bf9a         419.0

I'm not quite sure what the values associated with each anonymous_id are.

Here is a sample of my data:

{'anonymous_id': {0: '0000f8ea-3aa6-4423-9247-1d9580d378e1',
  1: '00015d49-2cd8-41b1-bbe7-6aedbefdb098',
  2: '0002226e-26a4-4f55-9578-2eff2999de7e',
  3: '00022b83-240e-4ef9-aaad-ac84064bb902',
  4: '00022b83-240e-4ef9-aaad-ac84064bb902'},
 'ts': {0: '2018-04-11 06:59:20.206000',
  1: '2019-05-18 05:59:11.874000',
  2: '2018-09-10 18:19:25.260000',
  3: '2017-10-11 08:20:18.092000',
  4: '2017-10-11 08:20:31.466000'},
 'utm_source': {0: nan, 1: 'facebook', 2: 'facebook', 3: nan, 4: nan},
 'rank': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2},
 'steps': {0: 'Sign-ups', 1: nan, 2: nan, 3: nan, 4: nan}}

So for each anonymous_id, I want to return the first utm_source associated with it (chronologically, sorted by the ts column).


Tags: id, source, facebook, nan, steps, file, utm, groupby
1 answer
User
#1 · Posted 2024-05-14 05:49:36

So for each anonymous_id I would return the first (chronological, sorted by the ts column) utm_source associated with the anon_id

IIUC, you can drop the null values first, then group by anonymous_id and take the first value:

df.sort_values('ts').dropna(subset=['utm_source']).groupby('anonymous_id')['utm_source'].first()

Output for the sample data:

anonymous_id
00015d49-2cd8-41b1-bbe7-6aedbefdb098    facebook
0002226e-26a4-4f55-9578-2eff2999de7e    facebook
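
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It assumes the sample data from the question is loaded into a DataFrame named df (only the three relevant columns are rebuilt here), and the names first_source and first_label are mine, chosen for illustration. It also runs the question's original attempt side by side: first_valid_index() returns the row label of the first non-null entry (or None when the group has none), which is most likely why the question's output shows numbers such as 638.0 instead of a utm_source.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (nan written as np.nan).
df = pd.DataFrame({
    "anonymous_id": [
        "0000f8ea-3aa6-4423-9247-1d9580d378e1",
        "00015d49-2cd8-41b1-bbe7-6aedbefdb098",
        "0002226e-26a4-4f55-9578-2eff2999de7e",
        "00022b83-240e-4ef9-aaad-ac84064bb902",
        "00022b83-240e-4ef9-aaad-ac84064bb902",
    ],
    "ts": [
        "2018-04-11 06:59:20.206000",
        "2019-05-18 05:59:11.874000",
        "2018-09-10 18:19:25.260000",
        "2017-10-11 08:20:18.092000",
        "2017-10-11 08:20:31.466000",
    ],
    "utm_source": [np.nan, "facebook", "facebook", np.nan, np.nan],
})

# Approach from the answer: sort by time, drop rows without a utm_source,
# then take the first remaining value per anonymous_id.
first_source = (
    df.sort_values("ts")
      .dropna(subset=["utm_source"])
      .groupby("anonymous_id")["utm_source"]
      .first()
)
print(first_source)

# Approach from the question: first_valid_index() gives the row *label*
# of the first non-null entry in each group, not the value itself.
first_label = (
    df.sort_values("ts")
      .groupby("anonymous_id")["utm_source"]
      .apply(lambda s: s.first_valid_index())
)
print(first_label)
```

Note that with dropna, ids that never have a utm_source disappear from the result entirely. If you need them kept, you would have to reindex the result against the full list of ids afterwards.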
