How to get a continuous date index in Pandas based on the first index value

Asked 2025-04-13 17:13

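I have a DataFrame whose index jumps from 2024-03-03 to 2023-02-25 partway through. I want to replace the wrong part (the 2023 dates) with the continuation of the correct dates.

Example: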

2024-02-23    -5.60000
2024-02-24   -13.00000
2024-02-25   -27.20000
2024-02-26    -4.20000
2024-02-27   -11.20000
2024-02-28   -14.73625
2024-02-29   -19.37000
2024-03-01   -16.89000
2024-03-02    -5.97000
2024-03-03    -1.30000
2023-02-25   -35.40000
2023-02-26   -28.70000
2023-02-27   -26.40000
2023-02-28   -15.40000
2023-03-01   -14.10000
2023-03-02   -11.20000
2023-03-03   -21.00000
2023-03-04   -17.00000
2023-03-05   -17.60000
2023-03-06    -6.70000

How can I do this in a clean, Pythonic way?

1 Answer

Assuming this is the original DataFrame, with the dates as the index and a single column named 'Values':

              Values
2024-02-23  -5.60000
2024-02-24 -13.00000
2024-02-25 -27.20000
2024-02-26  -4.20000
2024-02-27 -11.20000
2024-02-28 -14.73625
2024-02-29 -19.37000
2024-03-01 -16.89000
2024-03-02  -5.97000
2024-03-03  -1.30000
2023-02-25 -35.40000
2023-02-26 -28.70000
2023-02-27 -26.40000
2023-02-28 -15.40000
2023-03-01 -14.10000
2023-03-02 -11.20000
2023-03-03 -21.00000
2023-03-04 -17.00000
2023-03-05 -17.60000
2023-03-06  -6.70000
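
To try this out, the table above can be rebuilt roughly as follows. This is only a sketch for reproducing the example: the variable names `values`, `good`, and `bad` are illustrative and not from the original post, and it assumes the index is a `DatetimeIndex` rather than strings.

```python
import pandas as pd

# Values copied from the table above: the first 10 rows carry the correct
# 2024 dates, the last 10 carry the wrong 2023 dates.
values = [
    -5.6, -13.0, -27.2, -4.2, -11.2, -14.73625, -19.37, -16.89, -5.97, -1.3,
    -35.4, -28.7, -26.4, -15.4, -14.1, -11.2, -21.0, -17.0, -17.6, -6.7,
]

good = pd.date_range("2024-02-23", periods=10, freq="D")  # ends 2024-03-03
bad = pd.date_range("2023-02-25", periods=10, freq="D")   # ends 2023-03-06

# Concatenate the two date blocks into one (non-monotonic) index.
df = pd.DataFrame({"Values": values}, index=good.append(bad))
```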

You can use the following approach:

import pandas as pd

# reset the index into a column named 'date' (the `names` argument needs pandas >= 1.5)
df = df.reset_index(names="date")

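# find the first wrong date: the row whose step from the previous row is smaller
# than the step before it (a backward jump instead of +1 day),
# then replace that row and everything after it with NaT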
wrong_date_idx = df[df["date"].diff() < df["date"].shift().diff()]["date"].index
df.loc[df.index >= wrong_date_idx[0], "date"] = pd.NaT

# fill the missing dates with the logical extension of the correct dates
df.loc[df["date"].isna(), "date"] = pd.date_range(
    start=df["date"].dropna().iloc[-1] + pd.DateOffset(days=1),
    periods=df["date"].isna().sum(),
    freq="D",
)

# set 'date' as the index and remove name to keep original format
df = df.set_index("date")
df.index.name = None

Result:

              Values
2024-02-23  -5.60000
2024-02-24 -13.00000
2024-02-25 -27.20000
2024-02-26  -4.20000
2024-02-27 -11.20000
2024-02-28 -14.73625
2024-02-29 -19.37000
2024-03-01 -16.89000
2024-03-02  -5.97000
2024-03-03  -1.30000
2024-03-04 -35.40000
2024-03-05 -28.70000
2024-03-06 -26.40000
2024-03-07 -15.40000
2024-03-08 -14.10000
2024-03-09 -11.20000
2024-03-10 -21.00000
2024-03-11 -17.00000
2024-03-12 -17.60000
2024-03-13  -6.70000
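
If every row is meant to be exactly one day after the previous one, a shorter alternative might be to ignore the existing dates after the first and regenerate the whole index from it. The sketch below is not part of the answer above; `continue_dates_from_first` is a hypothetical helper name, and it assumes the first index value is correct and that the correct part has no gaps (otherwise those gaps would be silently closed).

```python
import pandas as pd


def continue_dates_from_first(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is a daily range starting at df.index[0].

    Assumption: one row per calendar day, and the first index value is right.
    """
    new_index = pd.date_range(start=df.index[0], periods=len(df), freq="D")
    return df.set_axis(new_index, axis=0)


# Small demo frame with the same kind of break as in the question.
demo = pd.DataFrame(
    {"Values": [-1.3, -35.4, -28.7]},
    index=pd.to_datetime(["2024-03-03", "2023-02-25", "2023-02-26"]),
)
fixed = continue_dates_from_first(demo)
```

Here `fixed` keeps the three values in order, indexed by 2024-03-03, 2024-03-04 and 2024-03-05. The trade-off compared with the answer above is that this version never inspects where the break is, so it only fits data that is strictly daily from the first row onward.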
