Expanding a DataFrame over date ranges

Posted 2024-05-08 12:09:05

Consider data containing employer-employee links, each with a start and an end date.

   employer  employee      start        end
0         0         0 2007-01-01 2007-12-31
1         1        86 2007-01-01 2007-12-31
2         1        63 2007-06-01 2007-12-31
3         1        93 2007-01-01 2007-12-31
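
To reproduce the example, the frame above could be built roughly as follows. This is a minimal sketch: the original post does not show how df was created, so the construction and the assumption that start and end are datetime64 columns are illustrative.

```python
import pandas as pd

# Sample data mirroring the table above (assumed construction,
# not shown in the original post).
df = pd.DataFrame({
    "employer": [0, 1, 1, 1],
    "employee": [0, 86, 63, 93],
    "start": pd.to_datetime(
        ["2007-01-01", "2007-01-01", "2007-06-01", "2007-01-01"]
    ),
    "end": pd.to_datetime(["2007-12-31"] * 4),
})

print(df)
```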

Now I want to "spread" these dates, i.e. create one observation for every month between start and end. I thought this would work:

def extend(x):
    index = pd.date_range(start=x['start'], end=x['end'], freq='M')
    df = pd.DataFrame([x.values], index=index, columns=x.index)
    return df

long = df.apply(extend, axis=1)

However, the result contains only the column labels:

>>> long.head()
Out[245]: 
   employer  employee  start  end
0  employer  employee  start  end
1  employer  employee  start  end

Yet when I test the function on the first row, it works:

>>> extend(df.iloc[0])
Out[246]: 
            employer  employee      start        end
2007-01-31         0         0 2007-01-01 2007-12-31
2007-02-28         0         0 2007-01-01 2007-12-31
2007-03-31         0         0 2007-01-01 2007-12-31
(...)

What am I doing wrong? Or is there a better way? My final goal is to get the same output as above, but in the format employer employee month year.


1 answer

Answer #1 · posted 2024-05-08 12:09:05

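I think the problem is that apply expects the result to have the same number of rows as the input, whereas extend returns a multi-row DataFrame for each input row.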

You can do it with iterrows and a list comprehension without changing your code much:

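import pandas as pd

def extend(x):
    # Month-end dates between this row's start and end
    index = pd.date_range(start=x['start'], end=x['end'], freq='M')
    # Repeat the row's values for every generated date
    df = pd.DataFrame([x.values], index=index, columns=x.index)
    return df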

>>> new = pd.concat([extend(x) for _,x in df.iterrows()])
>>> new

            employer  employee      start        end
2007-01-31         0         0 2007-01-01 2007-12-31
2007-02-28         0         0 2007-01-01 2007-12-31
2007-03-31         0         0 2007-01-01 2007-12-31
2007-04-30         0         0 2007-01-01 2007-12-31
2007-05-31         0         0 2007-01-01 2007-12-31
2007-06-30         0         0 2007-01-01 2007-12-31
2007-07-31         0         0 2007-01-01 2007-12-31
2007-08-31         0         0 2007-01-01 2007-12-31
2007-09-30         0         0 2007-01-01 2007-12-31
2007-10-31         0         0 2007-01-01 2007-12-31
2007-11-30         0         0 2007-01-01 2007-12-31
2007-12-31         0         0 2007-01-01 2007-12-31
2007-01-31         1        86 2007-01-01 2007-12-31
2007-02-28         1        86 2007-01-01 2007-12-31
2007-03-31         1        86 2007-01-01 2007-12-31
2007-04-30         1        86 2007-01-01 2007-12-31
2007-05-31         1        86 2007-01-01 2007-12-31
2007-06-30         1        86 2007-01-01 2007-12-31
2007-07-31         1        86 2007-01-01 2007-12-31
2007-08-31         1        86 2007-01-01 2007-12-31
2007-09-30         1        86 2007-01-01 2007-12-31
2007-10-31         1        86 2007-01-01 2007-12-31
2007-11-30         1        86 2007-01-01 2007-12-31
2007-12-31         1        86 2007-01-01 2007-12-31
2007-06-30         1        63 2007-06-01 2007-12-31
2007-07-31         1        63 2007-06-01 2007-12-31
2007-08-31         1        63 2007-06-01 2007-12-31
2007-09-30         1        63 2007-06-01 2007-12-31
2007-10-31         1        63 2007-06-01 2007-12-31
2007-11-30         1        63 2007-06-01 2007-12-31
2007-12-31         1        63 2007-06-01 2007-12-31
2007-01-31         1        93 2007-01-01 2007-12-31
2007-02-28         1        93 2007-01-01 2007-12-31
2007-03-31         1        93 2007-01-01 2007-12-31
2007-04-30         1        93 2007-01-01 2007-12-31
2007-05-31         1        93 2007-01-01 2007-12-31
2007-06-30         1        93 2007-01-01 2007-12-31
2007-07-31         1        93 2007-01-01 2007-12-31
2007-08-31         1        93 2007-01-01 2007-12-31
2007-09-30         1        93 2007-01-01 2007-12-31
2007-10-31         1        93 2007-01-01 2007-12-31
2007-11-30         1        93 2007-01-01 2007-12-31
2007-12-31         1        93 2007-01-01 2007-12-31

You can also do it with groupby/apply, which is more flexible. Something like the following:

def extend(x):
    x = x.iloc[0,:]
    dates = pd.date_range(start=x['start'], end=x['end'], freq='M')
    return pd.DataFrame(dates,columns=['date'])

>>> long = df.groupby(['employer','employee'])[['start','end']].apply(extend)
>>> long

                           date
employer employee
0        0        0  2007-01-31
                  1  2007-02-28
                  2  2007-03-31
                  3  2007-04-30
                  4  2007-05-31
                  5  2007-06-30
                  6  2007-07-31
                  7  2007-08-31
                  8  2007-09-30
                  9  2007-10-31
                  10 2007-11-30
                  11 2007-12-31
1        63       0  2007-06-30
                  1  2007-07-31
                  2  2007-08-31
                  3  2007-09-30
                  4  2007-10-31
                  5  2007-11-30
                  6  2007-12-31
         86       0  2007-01-31
                  1  2007-02-28
                  2  2007-03-31
                  3  2007-04-30
                  4  2007-05-31
                  5  2007-06-30
                  6  2007-07-31
                  7  2007-08-31
                  8  2007-09-30
                  9  2007-10-31
                  10 2007-11-30
                  11 2007-12-31
         93       0  2007-01-31
                  1  2007-02-28
                  2  2007-03-31
                  3  2007-04-30
                  4  2007-05-31
                  5  2007-06-30
                  6  2007-07-31
                  7  2007-08-31
                  8  2007-09-30
                  9  2007-10-31
                  10 2007-11-30
                  11 2007-12-31
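
To reach the employer employee month year layout requested in the question, the groupby result can be flattened into ordinary columns. The sketch below is one possible adaptation, not the original answer's code: the sample frame is rebuilt under the assumption that start and end are datetime columns, the helper name extend_month_year is hypothetical, and pd.offsets.MonthEnd() is used in place of the 'M' alias because newer pandas releases deprecate that alias for date_range.

```python
import pandas as pd

# Assumed reconstruction of the sample data from the question.
df = pd.DataFrame({
    "employer": [0, 1, 1, 1],
    "employee": [0, 86, 63, 93],
    "start": pd.to_datetime(
        ["2007-01-01", "2007-01-01", "2007-06-01", "2007-01-01"]
    ),
    "end": pd.to_datetime(["2007-12-31"] * 4),
})


def extend_month_year(group):
    # Hypothetical helper: assumes one row per (employer, employee) pair.
    row = group.iloc[0]
    dates = pd.date_range(
        start=row["start"], end=row["end"], freq=pd.offsets.MonthEnd()
    )
    # Keep only the calendar month and year of each month-end date.
    return pd.DataFrame({"month": dates.month, "year": dates.year})


long = (
    df.groupby(["employer", "employee"])[["start", "end"]]
    .apply(extend_month_year)
    # Move the group keys out of the index and into columns.
    .reset_index(level=["employer", "employee"])
    # Discard the per-group counter left over in the index.
    .reset_index(drop=True)
)

print(long.head())
```

Because groupby sorts its keys by default, the rows come back ordered by employer and employee rather than in the original row order.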

Alternatively, you can iterate over the rows and concat the results, as in the first approach.
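
A further option that avoids both apply and an explicit concat is DataFrame.explode. This technique is not part of the original answer; the sketch assumes a pandas version that provides explode and rebuilds the sample data with datetime start and end columns. Each row first receives a list of monthly periods, and explode then turns every list element into its own row.

```python
import pandas as pd

# Assumed reconstruction of the sample data from the question.
df = pd.DataFrame({
    "employer": [0, 1, 1, 1],
    "employee": [0, 86, 63, 93],
    "start": pd.to_datetime(
        ["2007-01-01", "2007-01-01", "2007-06-01", "2007-01-01"]
    ),
    "end": pd.to_datetime(["2007-12-31"] * 4),
})

# One list of monthly Period objects per row, covering start..end.
periods = pd.Series(
    [
        list(pd.period_range(start, end, freq="M"))
        for start, end in zip(df["start"], df["end"])
    ],
    index=df.index,
)

# explode() emits one output row per list element and repeats the
# other columns; the original index is then replaced by a fresh one.
long = df.assign(period=periods).explode("period").reset_index(drop=True)

# Split each period into the month and year columns the question asks for.
long["month"] = [p.month for p in long["period"]]
long["year"] = [p.year for p in long["period"]]
long = long[["employer", "employee", "month", "year"]]

print(long.head())
```

Unlike the groupby version, this keeps the rows in the order of the original frame and does not require the (employer, employee) pairs to be unique.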