Add a column to a DataFrame that holds a groupby count of a different column

1 answer

User

#1 · Posted 2024-05-14 19:33:45

Use `groupby.transform`:

df['count'] = df.groupby('mother_ID')['hatchling_masses_g'].transform('count')

Note the difference between `groupby.count` and `groupby.transform('count')`.

Sample data:

import numpy as np
import pandas as pd

np.random.seed(5)
df = pd.DataFrame({
    'mother_ID': np.random.choice(['a', 'b'], 10),
    'hatchling_masses_g': np.random.randint(1, 100, 10)
})

  mother_ID  hatchling_masses_g
0         b                  63
1         a                  28
2         b                  31
3         b                  81
4         a                   8
5         a                  77
6         a                  16
7         b                  54
8         a                  81
9         a                  28

`groupby.count`

counts = df.groupby('mother_ID')['hatchling_masses_g'].count()

mother_ID
a    6
b    4
Name: hatchling_masses_g, dtype: int64

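Note that the result has only two rows, one per group. The DataFrame has 10 rows, so when the result is assigned back, pandas cannot align the two and fills the new column with NaN to mark the missing data: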

df['count'] = counts

  mother_ID  hatchling_masses_g  count
0         b                  63    NaN
1         a                  28    NaN
2         b                  31    NaN
3         b                  81    NaN
4         a                   8    NaN
5         a                  77    NaN
6         a                  16    NaN
7         b                  54    NaN
8         a                  81    NaN
9         a                  28    NaN

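Assignment aligns on index labels. The counts are indexed by 'a' and 'b', while the DataFrame is indexed by 0 through 9. No labels match, so every row is filled with NaN.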
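The index mismatch can be checked directly. The following is a minimal sketch on a small hand-written frame (the values are illustrative, not the sample data above). It also shows `Series.map` as one possible way to look up each row's group key in the counts; that is an alternative added here, not part of the original answer.

```python
import pandas as pd

# Small hand-written frame (illustrative values) with the default 0..n-1 index.
df = pd.DataFrame({
    'mother_ID': ['b', 'a', 'b', 'a', 'a'],
    'hatchling_masses_g': [63, 28, 31, 8, 77],
})

counts = df.groupby('mother_ID')['hatchling_masses_g'].count()

# counts is indexed by the group keys, df by integer labels.
print(counts.index.tolist())  # ['a', 'b']
print(df.index.tolist())      # [0, 1, 2, 3, 4]

# Column assignment aligns on index labels. Reindexing counts to df's index
# shows what that alignment produces: no label matches, so everything is NaN.
misaligned = counts.reindex(df.index)
print(misaligned.isna().all())  # True

# Looking up each row's group key in counts gives the intended values.
mapped = df['mother_ID'].map(counts)
print(mapped.tolist())  # [2, 3, 2, 3, 3]
```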

`groupby.transform('count')`

transform, on the other hand, broadcasts each group's count to every row of that group:

counts = df.groupby('mother_ID')['hatchling_masses_g'].transform('count')

counts:

0    4
1    6
2    4
3    4
4    6
5    6
6    6
7    4
8    6
9    6
Name: hatchling_masses_g, dtype: int64

Note that 10 rows are produced, one for each row of the DataFrame.

This assigns back to the DataFrame cleanly, because the indexes align:

df['count'] = counts

  mother_ID  hatchling_masses_g  count
0         b                  63      4
1         a                  28      6
2         b                  31      4
3         b                  81      4
4         a                   8      6
5         a                  77      6
6         a                  16      6
7         b                  54      4
8         a                  81      6
9         a                  28      6
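One detail worth keeping in mind: 'count' counts only non-null values in the selected column, whereas 'size' counts all rows in the group. The sample data above has no missing values, so the two would agree there. The sketch below uses a small hand-written frame with NaN values (illustrative data, not from the original answer) to show where they differ.

```python
import numpy as np
import pandas as pd

# Illustrative frame with missing masses in both groups.
df = pd.DataFrame({
    'mother_ID': ['a', 'a', 'b', 'b', 'b'],
    'hatchling_masses_g': [10.0, np.nan, 5.0, 7.0, np.nan],
})

grouped = df.groupby('mother_ID')['hatchling_masses_g']

# 'count' ignores NaN: group a has 1 non-null value, group b has 2.
non_null = grouped.transform('count')
print(non_null.tolist())  # [1, 1, 2, 2, 2]

# 'size' counts rows regardless of NaN: group a has 2 rows, group b has 3.
group_size = grouped.transform('size')
print(group_size.tolist())  # [2, 2, 3, 3, 3]
```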

If needed, you can compute the counts with `groupby.count` and then `join` them back to the DataFrame on the group key:

counts = df.groupby('mother_ID')['hatchling_masses_g'].count().rename('count')
df = df.join(counts, on='mother_ID')

counts:

mother_ID
a    6
b    4
Name: count, dtype: int64

df:

  mother_ID  hatchling_masses_g  count
0         b                  63      4
1         a                  28      6
2         b                  31      4
3         b                  81      4
4         a                   8      6
5         a                  77      6
6         a                  16      6
7         b                  54      4
8         a                  81      6
9         a                  28      6
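As a quick sanity check, the transform route and the count-then-join route should yield the same column. A minimal sketch on a small hand-written frame (illustrative values; the variable names `via_transform` and `via_join` are made up for this example):

```python
import pandas as pd

# Illustrative frame: group a has 3 rows, group b has 2.
df = pd.DataFrame({
    'mother_ID': ['b', 'a', 'b', 'a', 'a'],
    'hatchling_masses_g': [63, 28, 31, 8, 77],
})

# Route 1: transform broadcasts the per-group count to every row.
via_transform = df.groupby('mother_ID')['hatchling_masses_g'].transform('count')

# Route 2: aggregate to one row per group, then join on the group key.
counts = df.groupby('mother_ID')['hatchling_masses_g'].count().rename('count')
via_join = df.join(counts, on='mother_ID')['count']

print(via_transform.tolist())  # [2, 3, 2, 3, 3]
print(via_join.tolist())       # [2, 3, 2, 3, 3]

# Compare values only; the two Series carry different names.
same = bool((via_transform.to_numpy() == via_join.to_numpy()).all())
print(same)  # True
```

The join route keeps the aggregated `counts` Series around, which can be convenient if the per-group table is needed elsewhere; transform is shorter when only the new column is wanted.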
