Merging the rows of specified columns into a single cell (a tuple)

Posted 2024-04-30 01:14:26


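Suppose I have a table like the one below. For example, the row under id 2 (where id is nan) only holds that id's history of status and date values: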

 *id*          *status*            *date*         *Type*
 2             dissolved           2016/03/19     T1
 nan           active              NaT            nan
 3             dissolved           2016/03/19     T3
 nan           active              2012/03/16     nan
 4             in liquidation      2017/03/19     T2
 nan           dissolved           NaT            nan

What I want to do is combine the rows by id. For example, for the first id (id = 2) I would get:

 *id*          *status*               *date*                 *Type*
 2             [dissolved, active]    [2016/03/19, None]     T1

I tried:

 data.groupby(['id']).agg(lambda x: tuple(x)).applymap(list).reset_index() 

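But this sets the id column to [2, nan], whereas I only want 2. How can I fix this? I only want the status and date values in list format, not every column turned into a tuple.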

Input df:

import pandas as pd

df = pd.DataFrame([[2, "dissolved", "2016/03/19", "T1"],
                   [float("nan"), "active", "NaT", float("nan")],
                   [3, "dissolved", "2016/03/19", "T3"],
                   [float("nan"), "active", "2012/03/16", float("nan")],
                   [4, "in liquidation", "2017/03/19", "T2"],
                   [float("nan"), "dissolved", "NaT", float("nan")]],
                  columns=["id", "status", "date", "Type"])

1 answer
User
#1 · Posted 2024-04-30 01:14:26

Try the following:

import pandas as pd

df = pd.DataFrame([[2, "dissolved", "2016/03/19", "T1"],
                   [float("nan"), "active", "NaT", float("nan")],
                   [3, "dissolved", "2016/03/19", "T3"],
                   [float("nan"), "active", "2012/03/16", float("nan")],
                   [4, "in liquidation", "2017/03/19", "T2"],
                   [float("nan"), "dissolved", "NaT", float("nan")]],
                  columns=["id", "status", "date", "Type"])
# Fill the missing id and Type values from the row above
df = df.ffill()
# Append a separator so the strings can be concatenated per group
df["status"] = df["status"] + ","
df["date"] = df["date"] + ","
# Summing strings within each group concatenates them
df2 = df.groupby(["id", "Type"]).sum()
# Split on the separator and drop the trailing empty string
df2["status"] = df2["status"].apply(lambda x: x.split(",")[:-1])
df2["date"] = df2["date"].apply(lambda x: x.split(",")[:-1])
df2

Depending on your dataset, you may need to adjust this a little. It produces the following output:

          status                        date
id  Type        
2.0 T1   [dissolved, active]           [2016/03/19, NaT]
3.0 T3   [dissolved, active]           [2016/03/19, 2012/03/16]
4.0 T2   [in liquidation, dissolved]   [2017/03/19, NaT]
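As a possible alternative to the string concatenation trick above, here is a minimal sketch that collects the values straight into lists with `groupby(...).agg`. It is only a sketch under a few assumptions: the helper name `to_list` is made up for illustration, the missing dates are assumed to be the literal string "NaT" as in the question's input df, and every history row is assumed to belong to the nearest id above it.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [2, "dissolved", "2016/03/19", "T1"],
        [float("nan"), "active", "NaT", float("nan")],
        [3, "dissolved", "2016/03/19", "T3"],
        [float("nan"), "active", "2012/03/16", float("nan")],
        [4, "in liquidation", "2017/03/19", "T2"],
        [float("nan"), "dissolved", "NaT", float("nan")],
    ],
    columns=["id", "status", "date", "Type"],
)

# Fill the missing id/Type values from the row above, so that each
# history row ends up in the same group as the row that owns it.
filled = df.copy()
filled[["id", "Type"]] = filled[["id", "Type"]].ffill()
# After filling there are no NaN ids left, so the column can be cast to int.
filled["id"] = filled["id"].astype(int)


def to_list(values):
    """Hypothetical helper: gather a group's values into a list,
    replacing the placeholder string "NaT" with None."""
    return [None if v == "NaT" else v for v in values]


# Only status and date become lists; id and Type stay scalar group keys.
result = (
    filled.groupby(["id", "Type"], as_index=False)
    .agg({"status": list, "date": to_list})
)
print(result)
```

With this approach id and Type stay as ordinary columns (id as an integer 2 rather than 2.0), and only status and date hold lists. If your real data stores missing dates as actual NaT/NaN values rather than the string "NaT", the comparison inside `to_list` would need to change, for example to a `pd.isna` check.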
