Using multiple idxmin() and idxmax() in one aggregation for multi-indexing

> DT = data.table(id=c(1,1,1,2,2,2,2,3,3,3), col1=c(1,3,5,2,5,3,6,3,67,7), col2=c(4,6,8,3,65,3,5,4,4,7), col3=c(34,64,53,5,6,2,4,6,4,67))
> DT
    id col1 col2 col3
 1:  1    1    4   34
 2:  1    3    6   64
 3:  1    5    8   53
 4:  2    2    3    5
 5:  2    5   65    6
 6:  2    3    3    2
 7:  2    6    5    4
 8:  3    3    4    6
 9:  3   67    4    4
10:  3    7    7   67
> DT_agg = DT[, .(agg1 = col1[which.max(col2)]
                , agg2 = col2[which.min(col3)]
                , agg3 = col1[which.max(col3)])
              , by = id]
> DT_agg
   id agg1 agg2 agg3
1:  1    5    4    3
2:  2    5    3    5
3:  3    7    4    7

DF = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,3],
                   'col1': [1,3,5,2,5,3,6,3,67,7],
                   'col2': [4,6,8,3,65,3,5,4,4,7],
                   'col3': [34,64,53,5,6,2,4,6,4,67]})
DF
Out[70]:
   id  col1  col2  col3
0   1     1     4    34
1   1     3     6    64
2   1     5     8    53
3   2     2     3     5
4   2     5    65     6
5   2     3     3     2
6   2     6     5     4
7   3     3     4     6
8   3    67     4     4
9   3     7     7    67

3 answers

User

#1 · edited 2024-04-26 11:32:44

You can try this:

DF.groupby('id').agg(agg1=('col1',lambda x:x[DF.loc[x.index,'col2'].idxmax()]),
                     agg2 = ('col2',lambda x:x[DF.loc[x.index,'col3'].idxmin()]),
                     agg3 = ('col1',lambda x:x[DF.loc[x.index,'col3'].idxmax()]))

    agg1  agg2  agg3
id
1      5     4     3
2      5     3     5
3      7     4     7
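The lambdas work because each `x` passed to them is one group's slice of the aggregated column and keeps the original row labels, so `x.index` can be used to fetch the same rows of another column from `DF`. A minimal sketch of that lookup for a single group (id == 2), assuming the question's `DF`; the intermediate names `x`, `keys` and `label` are only for illustration:

```python
import pandas as pd

DF = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    'col1': [1, 3, 5, 2, 5, 3, 6, 3, 67, 7],
    'col2': [4, 6, 8, 3, 65, 3, 5, 4, 4, 7],
    'col3': [34, 64, 53, 5, 6, 2, 4, 6, 4, 67],
})

# What the first lambda receives for id == 2: col1 with original labels 3..6
x = DF.groupby('id')['col1'].get_group(2)

# The same rows of col2, looked up in the full frame via those labels
keys = DF.loc[x.index, 'col2']

# idxmax() returns a row label (not a position), which is also valid for x
label = keys.idxmax()

print(label, x.loc[label])
```

Here `label` is the row where col2 is largest within the group, and `x.loc[label]` is the col1 value on that row, which is what `agg1` reports for id 2.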

User

#2 · edited 2024-04-26 11:32:44

How about the tidyverse way in Python:

>>> from datar.all import f, tibble, group_by, which_max, which_min, summarise
>>> 
>>> DF = tibble(
...     id=[1,1,1,2,2,2,2,3,3,3], 
...     col1=[1,3,5,2,5,3,6,3,67,7],
...     col2=[4,6,8,3,65,3,5,4,4,7], 
...     col3=[34,64,53,5,6,2,4,6,4,67]
... )
>>> 
>>> DF >> group_by(f.id) >> summarise(
...     agg1=f.col1[which_max(f.col2)],
...     agg2=f.col2[which_min(f.col3)],
...     agg3=f.col1[which_max(f.col3)]
... )
       id    agg1    agg2    agg3
  <int64> <int64> <int64> <int64>
0       1       5       4       3
1       2       5       3       5
2       3       7       4       7

I am the author of the datar package. Feel free to submit an issue if you have any questions.

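User

#3 · edited 2024-04-26 11:32:44

I played around with this question, mainly to see whether I could speed up the original solution. Anonymous functions have a way of eroding speed.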

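# DF is the frame from the question
grp = DF.groupby("id")

# idxmax()/idxmin() give one row label per group; use them to index the value columns
pd.DataFrame({"col1": DF.col1[grp.col2.idxmax()].array,
              "col2": DF.col2[grp.col3.idxmin()].array,
              "col3": DF.col1[grp.col3.idxmax()].array},
             index=grp.indices)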

    col1    col2    col3
1   5       4       3
2   5       3       5
3   7       4       7

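That is a speed-up of about 3x. Your mileage may vary. As for conciseness, no: I think the original solution is probably the most concise. R's data.table excels at being both concise and fast.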
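A rough way to check the speed-up on your own data is to time both versions with `timeit`. The sketch below assumes the question's `DF`; the function names `with_lambdas` and `with_group_idx` are made up for illustration, and the timings it prints will differ by machine, data size and pandas version.

```python
import timeit
import pandas as pd

DF = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3],
    'col1': [1, 3, 5, 2, 5, 3, 6, 3, 67, 7],
    'col2': [4, 6, 8, 3, 65, 3, 5, 4, 4, 7],
    'col3': [34, 64, 53, 5, 6, 2, 4, 6, 4, 67],
})

def with_lambdas():
    # Named aggregation with one Python-level lambda call per group and column
    return DF.groupby('id').agg(
        agg1=('col1', lambda x: x.loc[DF.loc[x.index, 'col2'].idxmax()]),
        agg2=('col2', lambda x: x.loc[DF.loc[x.index, 'col3'].idxmin()]),
        agg3=('col1', lambda x: x.loc[DF.loc[x.index, 'col3'].idxmax()]),
    )

def with_group_idx():
    # Compute the row labels once per key column, then index the value columns
    grp = DF.groupby('id')
    max_col2 = grp['col2'].idxmax()
    min_col3 = grp['col3'].idxmin()
    max_col3 = grp['col3'].idxmax()
    return pd.DataFrame({
        'agg1': DF.loc[max_col2, 'col1'].to_numpy(),
        'agg2': DF.loc[min_col3, 'col2'].to_numpy(),
        'agg3': DF.loc[max_col3, 'col1'].to_numpy(),
    }, index=max_col2.index)

a = with_lambdas()
b = with_group_idx()
same = bool((a.to_numpy() == b.to_numpy()).all())  # both should give identical values

t_lambda = timeit.timeit(with_lambdas, number=50)
t_idx = timeit.timeit(with_group_idx, number=50)
print(f"same result: {same}")
print(f"lambdas:   {t_lambda:.4f} s for 50 runs")
print(f"group idx: {t_idx:.4f} s for 50 runs")
```

On a ten-row frame the fixed overhead dominates, so the ratio is more meaningful on a larger frame with many groups.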
