How to add a row to each group with pandas groupby?

Published 2024-04-20 01:08:08


I want to add a new row above the first row of each group. My original DataFrame is:

import pandas as pd

df = pd.DataFrame({
    'ID': ['James', 'James', 'James','Max', 'Max', 'Max', 'Max','Park','Tom', 'Tom', 'Tom', 'Tom','Wong'],
    'From_num': [78, 420, 'Started', 298, 36, 298, 'Started', 'Started', 60, 520, 99, 'Started', 'Started'],
    'To_num': [96, 78, 420, 36, 78, 36, 298, 311, 150, 520, 78, 99, 39],
    'Date': ['2020-05-12', '2020-02-02', '2019-06-18',
             '2019-06-20', '2019-01-30', '2018-10-23',
             '2018-08-29', '2020-05-21', '2019-11-22',
             '2019-08-26', '2018-12-11', '2018-10-09', '2019-02-01']})

which looks like this:

      ID From_num  To_num        Date
0   James       78      96  2020-05-12
1   James      420      78  2020-02-02
2   James  Started     420  2019-06-18
3     Max      298      36  2019-06-20
4     Max       36      78  2019-01-30
5     Max      298      36  2018-10-23
6     Max  Started     298  2018-08-29
7    Park  Started     311  2020-05-21
8     Tom       60     150  2019-11-22
9     Tom      520     520  2019-08-26
10    Tom       99      78  2018-12-11
11    Tom  Started      99  2018-10-09
12   Wong  Started      39  2019-02-01

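For each person ("ID"), I want to insert a duplicate of the group's first row directly above it. The new row's values in the "ID", "From_num" and "To_num" columns should be the same as the old first row, but its "Date" should be the old first row's date plus one day. For example, for James the new row is "James", "78", "96", "2020-05-13". The same applies to every other ID, so my expected result is: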

       ID From_num  To_num        Date
0   James       78      96  2020-05-13  # row added, Date + 1
1   James       78      96  2020-05-12
2   James      420      78  2020-02-02
3   James  Started     420  2019-06-18
4     Max      298      36  2019-06-21  # row added, Date + 1
5     Max      298      36  2019-06-20
6     Max       36      78  2019-01-30
7     Max      298      36  2018-10-23
8     Max  Started     298  2018-08-29
9    Park  Started     311  2020-05-22  # row added, Date + 1
10   Park  Started     311  2020-05-21
11    Tom       60     150  2019-11-23  # Row added, Date + 1
12    Tom       60     150  2019-11-22
13    Tom      520     520  2019-08-26
14    Tom       99      78  2018-12-11
15    Tom  Started      99  2018-10-09
16   Wong  Started      39  2019-02-02  # row added, Date + 1
17   Wong  Started      39  2019-02-01
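The rule for a single group can be checked in isolation. The sketch below uses only James's three rows from the question (the `james` and `james_out` names are illustrative, not from the original). It copies the first row, shifts its date by one day and puts it on top.

```python
import pandas as pd

# Illustrative single group: James's rows from the question
james = pd.DataFrame({
    'ID': ['James', 'James', 'James'],
    'From_num': [78, 420, 'Started'],
    'To_num': [96, 78, 420],
    'Date': pd.to_datetime(['2020-05-12', '2020-02-02', '2019-06-18']),
})

# Copy the first row (double brackets keep it a one-row DataFrame)
new_row = james.iloc[[0]].copy()
# Same ID/From_num/To_num, but the date is one day later
new_row['Date'] = new_row['Date'] + pd.Timedelta(days=1)

# Put the new row on top and renumber the index 0..n
james_out = pd.concat([new_row, james], ignore_index=True)
print(james_out)
```

Doing this for every ID is exactly the per-group operation the question asks for.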

I wrote some loops with conditions, but they are slow. If you have any good ideas, please help. Thanks a lot.


1 answer
User
Answer 1 · posted 2024-04-20 01:08:08

Let's try this. We prepend a shifted copy of the first row to each group, as follows:

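import pandas as pd

def augment_group(group):
    # Copy the first row (double brackets keep it a DataFrame; .copy() avoids SettingWithCopyWarning)
    first_row = group.iloc[[0]].copy()
    first_row['Date'] += pd.Timedelta(days=1)
    # DataFrame.append was removed in pandas 2.0, so prepend the row with pd.concat
    return pd.concat([first_row, group])

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
out = (df.groupby('ID', group_keys=False)
         .apply(augment_group)
         .reset_index(drop=True))
print(out)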

       ID From_num  To_num       Date
0   James       78      96 2020-05-13
1   James       78      96 2020-05-12
2   James      420      78 2020-02-02
3   James  Started     420 2019-06-18
4     Max      298      36 2019-06-21
5     Max      298      36 2019-06-20
6     Max       36      78 2019-01-30
7     Max      298      36 2018-10-23
8     Max  Started     298 2018-08-29
9    Park  Started     311 2020-05-22
10   Park  Started     311 2020-05-21
11    Tom       60     150 2019-11-23
12    Tom       60     150 2019-11-22
13    Tom      520     520 2019-08-26
14    Tom       99      78 2018-12-11
15    Tom  Started      99 2018-10-09
16   Wong  Started      39 2019-02-02
17   Wong  Started      39 2019-02-01
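Since `apply` still calls a Python function once per group, a vectorized sketch (not part of the original answer) may scale better on large frames; no timing was measured here. It takes each group's first row with `groupby(...).head(1)`, shifts its date, and interleaves the new rows using a sort key. It assumes, as in the question, that each ID's rows are contiguous and that `df` has a default 0..n-1 index; `_key` is a hypothetical helper column.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['James', 'James', 'James', 'Max', 'Max', 'Max', 'Max', 'Park',
           'Tom', 'Tom', 'Tom', 'Tom', 'Wong'],
    'From_num': [78, 420, 'Started', 298, 36, 298, 'Started', 'Started',
                 60, 520, 99, 'Started', 'Started'],
    'To_num': [96, 78, 420, 36, 78, 36, 298, 311, 150, 520, 78, 99, 39],
    'Date': ['2020-05-12', '2020-02-02', '2019-06-18', '2019-06-20',
             '2019-01-30', '2018-10-23', '2018-08-29', '2020-05-21',
             '2019-11-22', '2019-08-26', '2018-12-11', '2018-10-09',
             '2019-02-01']})
df['Date'] = pd.to_datetime(df['Date'])
df = df.reset_index(drop=True)  # assumption: positional 0..n-1 index

# First row of each ID, keeping its original index label (position)
firsts = df.groupby('ID', sort=False).head(1)

# New rows: same values, Date + 1 day, sort key just before the original first row
new_rows = firsts.assign(Date=firsts['Date'] + pd.Timedelta(days=1),
                         _key=firsts.index.to_numpy() - 0.5)

out = (pd.concat([new_rows, df.assign(_key=df.index.to_numpy())])
         .sort_values('_key', kind='stable')  # interleave new rows into place
         .drop(columns='_key')
         .reset_index(drop=True))
print(out)
```

Unlike `groupby('ID')` with the default `sort=True`, this keeps groups in their original order rather than sorting them by ID; for the question's data both orders coincide.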

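That said, I agree with @Joran Beasley's comment: this feels a bit like an XY problem. Maybe try to explain the underlying problem you are trying to solve, rather than asking how to implement what you think the solution is.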
