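How to get a continuous date index based on the first index value in Pandas
I have a DataFrame whose index jumps at some point from 2024-03-03 back to 2023-02-25. I would like to replace the wrong part (2023...) with the logical continuation of the correct dates.
Example: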
2024-02-23 -5.60000
2024-02-24 -13.00000
2024-02-25 -27.20000
2024-02-26 -4.20000
2024-02-27 -11.20000
2024-02-28 -14.73625
2024-02-29 -19.37000
2024-03-01 -16.89000
2024-03-02 -5.97000
2024-03-03 -1.30000
2023-02-25 -35.40000
2023-02-26 -28.70000
2023-02-27 -26.40000
2023-02-28 -15.40000
2023-03-01 -14.10000
2023-03-02 -11.20000
2023-03-03 -21.00000
2023-03-04 -17.00000
2023-03-05 -17.60000
2023-03-06 -6.70000
How can I do this in a clean, Pythonic way?
1 Answer
Assume this is the original DataFrame, with the dates as the index and a single column named 'Values':
Values
2024-02-23 -5.60000
2024-02-24 -13.00000
2024-02-25 -27.20000
2024-02-26 -4.20000
2024-02-27 -11.20000
2024-02-28 -14.73625
2024-02-29 -19.37000
2024-03-01 -16.89000
2024-03-02 -5.97000
2024-03-03 -1.30000
2023-02-25 -35.40000
2023-02-26 -28.70000
2023-02-27 -26.40000
2023-02-28 -15.40000
2023-03-01 -14.10000
2023-03-02 -11.20000
2023-03-03 -21.00000
2023-03-04 -17.00000
2023-03-05 -17.60000
2023-03-06 -6.70000
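You can use the following approach:
import pandas as pd

# assumes the index is already datetime64; if it holds strings, convert it first with pd.to_datetime
# reset the index and name it 'date' (the names= argument requires pandas >= 1.5)
df = df.reset_index(names="date")
# find the first wrong date: the row whose step from the previous date is
# smaller than the preceding step (here, a large negative jump after +1 day)
wrong_date_idx = df[df["date"].diff() < df["date"].shift().diff()]["date"].index
# replace that date and every date after it with NaT
df.loc[df.index >= wrong_date_idx[0], "date"] = pd.NaT
# fill the missing dates with the logical extension of the correct dates
df.loc[df["date"].isna(), "date"] = pd.date_range(
    start=df["date"].dropna().iloc[-1] + pd.DateOffset(days=1),
    periods=df["date"].isna().sum(),
    freq="D",
)
# set 'date' as the index and remove its name to keep the original format
df = df.set_index("date")
df.index.name = None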
Output:
Values
2024-02-23 -5.60000
2024-02-24 -13.00000
2024-02-25 -27.20000
2024-02-26 -4.20000
2024-02-27 -11.20000
2024-02-28 -14.73625
2024-02-29 -19.37000
2024-03-01 -16.89000
2024-03-02 -5.97000
2024-03-03 -1.30000
2024-03-04 -35.40000
2024-03-05 -28.70000
2024-03-06 -26.40000
2024-03-07 -15.40000
2024-03-08 -14.10000
2024-03-09 -11.20000
2024-03-10 -21.00000
2024-03-11 -17.00000
2024-03-12 -17.60000
2024-03-13 -6.70000
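If the data is known to be strictly daily with no gaps, so that only the first index value needs to be trusted, a shorter alternative is to rebuild the whole index from that first value. The following is a minimal sketch under that assumption, not the method above; the helper name rebuild_daily_index and the five-row sample are hypothetical and only mirror the question's data.

```python
import pandas as pd


def rebuild_daily_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is a daily range starting at df.index[0].

    Assumption: every row is exactly one day after the previous one, so the
    existing index values after the first can be discarded entirely.
    """
    out = df.copy()
    out.index = pd.date_range(start=out.index[0], periods=len(out), freq="D")
    return out


# Hypothetical sample: the dates jump back to 2023 after 2024-03-03.
idx = pd.to_datetime(
    ["2024-03-01", "2024-03-02", "2024-03-03", "2023-02-25", "2023-02-26"]
)
df = pd.DataFrame({"Values": [-16.89, -5.97, -1.30, -35.40, -28.70]}, index=idx)

fixed = rebuild_daily_index(df)
print(fixed)
```

Unlike the approach above, this overwrites the correct dates as well, so it would silently hide any genuine gap in the leading part of the index. When such gaps are possible, detecting the break point first is the safer choice.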