How can I create a new column in Python by calculating over a specific time range?

ID  col1        col2
A   2018/07/01     3
A   2018/08/01     5
A   2018/10/01    10
B   2018/07/01     4
B   2018/10/01     7
B   2019/01/01     9
B   2019/04/01    12
C   2018/07/01     6
C   2018/09/01     5
C   2018/10/01     7

ID  col1        col2  col3
A   2018/07/01     3    -7
A   2018/08/01     5   NaN
A   2018/10/01    10   NaN
B   2018/07/01     4    -3
B   2018/10/01     7    -2
B   2019/01/01     9    -3
B   2019/04/01    12   NaN
C   2018/07/01     6    -1
C   2018/09/01     5   NaN
C   2018/10/01     7   NaN
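Reading the two tables together, col3 appears to be col2 minus the col2 value of the same ID exactly three months later, with NaN where no such row exists. The sketch below rebuilds the sample data and computes that with a plain self-merge. This is an illustration of the inferred rule, not one of the methods from the answers; the names `later`, `col2_later`, and `out` are made up for the example.

```python
import pandas as pd

# Sample data from the question, with col1 parsed as datetimes
df = pd.DataFrame({
    "ID": list("AAABBBBCCC"),
    "col1": pd.to_datetime([
        "2018/07/01", "2018/08/01", "2018/10/01",
        "2018/07/01", "2018/10/01", "2019/01/01", "2019/04/01",
        "2018/07/01", "2018/09/01", "2018/10/01",
    ]),
    "col2": [3, 5, 10, 4, 7, 9, 12, 6, 5, 7],
})

# Lookup table: every row re-keyed to the date three months earlier,
# so a row dated 2018-10-01 becomes the match for 2018-07-01
later = (
    df.assign(col1=df["col1"] - pd.DateOffset(months=3))
      .rename(columns={"col2": "col2_later"})
)

# A left merge keeps every original row; rows with no match get NaN
out = df.merge(later, on=["ID", "col1"], how="left")
out["col3"] = out["col2"] - out["col2_later"]
out = out.drop(columns="col2_later")
print(out)
```

The merge assumes each (ID, col1) pair is unique, which holds for the sample data.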

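3 Answers

User

Answer 1 · Edited 2024-04-18 06:01:18

In theory shift(freq='-3M') should make this easy, but for some reason it does not work with month-start dates. So we can do the following: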

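import pandas as pd

# assumes df['col1'] is already datetime, e.g. df['col1'] = pd.to_datetime(df['col1'])
# month-end date closing the three-month window (2018-07-01 -> 2018-09-30)
df['col1e'] = df.col1 + pd.DateOffset(months=3) - pd.DateOffset(days=1)

# move each col2 value back by three month ends
# ('M' is the month-end alias; newer pandas versions spell it 'ME')
new_df = df.set_index('col1e').col2.shift(freq='-3M').reset_index(name='col3')

# copy the ID values (row order is unchanged, so they line up)
new_df['ID'] = df['ID'].values

# merge: attach the value found three months later, NaN if there is none
df = df.merge(new_df, on=['col1e','ID'],how='left')

# final result
df['col3'] = df['col2'] - df['col3']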

Output:

  ID       col1  col2      col1e  col3
0  A 2018-07-01     3 2018-09-30  -7.0
1  A 2018-08-01     5 2018-10-31   NaN
2  A 2018-10-01    10 2018-12-31   NaN
3  B 2018-07-01     4 2018-09-30  -3.0
4  B 2018-10-01     7 2018-12-31  -2.0
5  B 2019-01-01     9 2019-03-31  -3.0
6  B 2019-04-01    12 2019-06-30   NaN
7  C 2018-07-01     6 2018-09-30  -1.0
8  C 2018-09-01     5 2018-11-30   NaN
9  C 2018-10-01     7 2018-12-31   NaN
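A likely explanation for the month-start problem mentioned above is that the 'M' alias is anchored at month end, so shifting a first-of-month date by it lands on a month-end date rather than on the first of an earlier month. The small sketch below shows the difference using the offset classes directly; the variable names are only for illustration.

```python
import pandas as pd

ts = pd.Timestamp("2018-10-01")

# Month-end anchored offset (what the 'M' alias stands for)
shifted_month_end = ts - pd.offsets.MonthEnd(3)

# Month-start anchored offset (the 'MS' alias)
shifted_month_start = ts - pd.offsets.MonthBegin(3)

print(shifted_month_end)    # 2018-07-31 00:00:00
print(shifted_month_start)  # 2018-07-01 00:00:00
```

This is why the answer first converts the dates to month ends before shifting. Shifting with a month-start offset would be another option to try.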

User

Answer 2 · Edited 2024-04-18 06:01:18

You can use reindex within each group of the DataFrame (no resampling and no shifting):

def get_col2(grp):
    return grp.set_index("col1").reindex(grp["date2"],axis="index")["col2"]      

df["col3"]= df.assign(date2=df["col1"]+pd.offsets.MonthBegin(3)).groupby("ID").apply(get_col2).values

df["col3"]= df["col2"]-df["col3"]

Output:

  ID       col1  col2  col3
0  A 2018-07-01     3  -7.0
1  A 2018-08-01     5   NaN
2  A 2018-10-01    10   NaN
3  B 2018-07-01     4  -3.0
4  B 2018-10-01     7  -2.0
5  B 2019-01-01     9  -3.0
6  B 2019-04-01    12   NaN
7  C 2018-07-01     6  -1.0
8  C 2018-09-01     5   NaN
9  C 2018-10-01     7   NaN
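To see what the reindex step does, here is a minimal sketch on group A alone. It looks up, for each row, the col2 value at the date three months later and gets NaN where that date is absent. The names `grp`, `target`, and `looked_up` are made up for this example.

```python
import pandas as pd

# Group A from the question
grp = pd.DataFrame({
    "col1": pd.to_datetime(["2018-07-01", "2018-08-01", "2018-10-01"]),
    "col2": [3, 5, 10],
})

# Dates to look up: three month starts after each row's date
target = grp["col1"] + pd.offsets.MonthBegin(3)

# reindex returns the value at each target date, or NaN if it is missing
looked_up = grp.set_index("col1")["col2"].reindex(target)

# Subtract positionally, since looked_up is indexed by the target dates
col3 = grp["col2"].to_numpy() - looked_up.to_numpy()
print(col3)
```

reindex needs a unique index, so this relies on dates not repeating within an ID.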

User

Answer 3 · Edited 2024-04-18 06:01:18

Group by ID, set the date column as the index, and resample to month-start frequency. Then shift by -3.

Input:

def func(df):
    df = df.set_index(df.col1).resample('MS').asfreq()
    df['col3'] = df.col2 - df.col2.shift(-3)

    # Clean Up DataFrame        
    df = df.reset_index(0, drop=True).reset_index(drop=True).dropna(how='all')

    return df

df = pd.read_clipboard()
df.col1 = pd.to_datetime(df.col1)
group = df.groupby('ID', as_index=False)

df = group.apply(func).reset_index(drop=True)

Output:

|    | ID | col1       | col2 | col3 |
|----|----|------------|------|------|
| 0  | A  | 2018-07-01 | 3.0  | -7.0 |
| 1  | A  | 2018-08-01 | 5.0  | NaN  |
| 2  | A  | 2018-10-01 | 10.0 | NaN  |
| 3  | B  | 2018-07-01 | 4.0  | -3.0 |
| 4  | B  | 2018-10-01 | 7.0  | -2.0 |
| 5  | B  | 2019-01-01 | 9.0  | -3.0 |
| 6  | B  | 2019-04-01 | 12.0 | NaN  |
| 7  | C  | 2018-07-01 | 6.0  | -1.0 |
| 8  | C  | 2018-09-01 | 5.0  | NaN  |
| 9  | C  | 2018-10-01 | 7.0  | NaN  |
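The resampling idea is easier to follow on a single group. The sketch below takes group B as a Series, fills in the missing months with resample('MS').asfreq(), and then a positional shift(-3) corresponds to exactly three months. The names `s`, `monthly`, and `diff` are only for illustration.

```python
import pandas as pd

# Group B from the question, indexed by date
s = pd.Series(
    [4, 7, 9, 12],
    index=pd.to_datetime(["2018-07-01", "2018-10-01", "2019-01-01", "2019-04-01"]),
)

# One row per month start; months without data become NaN
monthly = s.resample("MS").asfreq()

# With a regular monthly index, shifting by -3 rows means "three months later"
diff = (monthly - monthly.shift(-3)).reindex(s.index)
print(diff)
```

The final reindex keeps only the original dates, which plays the same role as the dropna cleanup in the answer.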
