How to create a nested array of arrays in a dataframe column

Posted 2024-06-08 00:56:05


I have a dataframe (df) that looks like this:

Input

ShipID                                                                             CustomerCode  
['USWPR04-20210429-S-00001', 'USWPR04-20210429-S-00002','USWPR04-20210429-S-00006']    USWPR04
['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002']                               MSLPR04

I need to create a new column named df['LinkID'] that holds a nested array built from the columns above.

Output

df['LinkID']

[{ "shipID": "USWPR04-20210429-S-00001", "customerCode": "USWPR04", "shipNumber": "20210429-S-00001" },
 { "shipID": "USWPR04-20210429-S-00002", "customerCode": "USWPR04", "shipNumber": "20210429-S-00002" },
 { "shipID": "USWPR04-20210429-S-00006", "customerCode": "USWPR04", "shipNumber": "20210429-S-00006" }]

[{ "shipID": "MSLPR04-20210429-S-00001", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00001" },
 { "shipID": "MSLPR04-20210429-S-00002", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00002" }]

Final dataframe output

ShipID                                                                             CustomerCode   link
['USWPR04-20210429-S-00001', 'USWPR04-20210429-S-00002','USWPR04-20210429-S-00006']    USWPR04    [{ "shipID": "USWPR04-20210429-S-00001", "customerCode": "USWPR04", "shipNumber": "20210429-S-00001" },{ "shipID": "USWPR04-20210429-S-00002", "customerCode": "USWPR04", "shipNumber": "20210429-S-00002" },{ "shipID": "USWPR04-20210429-S-00006", "customerCode": "USWPR04", "shipNumber": "20210429-S-00006" }]
['MSLPR04-20210429-S-00001', 'MSLPR04-20210429-S-00002']                               MSLPR04    [{ "shipID": "MSLPR04-20210429-S-00001", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00001" },{ "shipID": "MSLPR04-20210429-S-00002", "customerCode": "MSLPR04", "shipNumber": "20210429-S-00002" }]

How can I do this?


Tags: data, df, link, array, shipnumber, linkid, customercode, shipid
1 answer
User
#1 · Posted 2024-06-08 00:56:05

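Updated answer:

Steps:

  1. If needed, use eval to convert the ShipID strings into lists
  2. Explode the dataframe on ShipID
  3. Use the .str.split method to extract shipNumber
  4. Use to_dict('records') and load the result back into the dataframe
  5. Use groupby and agg with list to convert it back to the original structure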
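
import pandas as pd

# If ShipID holds strings rather than lists, parse them first:
# df.ShipID = df.ShipID.apply(eval)
df2 = df.explode('ShipID')  # one row per ship ID, original index repeated
df2['shipNumber'] = df2.ShipID.str.split('-', n=1).str[-1]  # text after the first hyphen
df2['link'] = df2.to_dict('records')  # one dict per exploded row, in row order
df['link'] = df2.groupby(df2.index)['link'].agg(list)  # regroup by original row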
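
For reference, here is a self-contained sketch of the same steps that also renames the keys to the camelCase form shown in the question's desired output. The sample data is reconstructed from the question, and the helper name build_links is only an illustrative assumption, not part of the original answer. If ShipID is stored as strings rather than real lists, it would need to be parsed first (for example with ast.literal_eval) before calling explode.

```python
import pandas as pd

# Sample data reconstructed from the question's input table.
df = pd.DataFrame({
    "ShipID": [
        ["USWPR04-20210429-S-00001", "USWPR04-20210429-S-00002",
         "USWPR04-20210429-S-00006"],
        ["MSLPR04-20210429-S-00001", "MSLPR04-20210429-S-00002"],
    ],
    "CustomerCode": ["USWPR04", "MSLPR04"],
})


def build_links(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: return one list of dicts per original row."""
    # One row per ship ID; the original row label is repeated in the index.
    flat = frame.explode("ShipID")
    # Everything after the first hyphen is treated as the ship number.
    flat["shipNumber"] = flat["ShipID"].str.split("-", n=1).str[-1]
    # Rename to the camelCase keys used in the desired output.
    flat = flat.rename(columns={"ShipID": "shipID",
                                "CustomerCode": "customerCode"})
    cols = ["shipID", "customerCode", "shipNumber"]
    # Keep the (duplicated) index so the dicts can be regrouped per row.
    records = pd.Series(flat[cols].to_dict("records"), index=flat.index)
    return records.groupby(level=0).agg(list)


df["link"] = build_links(df)
print(df["link"].iloc[1])
```

Building the records as a Series that shares the exploded frame's index keeps each dict tied to the row it came from, so the final groupby on the index puts every dict back with its original row.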
