Pandas: Creating a new DataFrame from data extracted from strings in an existing DataFrame

Posted 2024-04-20 12:06:54


Here is a DataFrame with sample data:

import pandas as pd

df = pd.DataFrame({'KEY': ['1','2','3'], 'RECORD': ['1','1','1'], 'SERIAL': ['1470','2321','300'], 'REMARKS': ['FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU','I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I DON\'T LIKE FRUIT[CANTALOPE,HONEYDEW]', 'THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234']})


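I need to extract the fruits into a new DataFrame, with each fruit associated with its KEY, RECORD, and SERIAL. The finished result should look like this: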

df = pd.DataFrame({'KEY': ['1','1','1','2','2','2','2','2','3','3','3'], 'RECORD': ['1','1','1','1','1','1','1','1','1','1','1'], 'SERIAL': ['1470','1470','1470','2321','2321','2321','2321','2321','300','300','300'], 'FRUIT': ['APPLES','ORANGES','PEARS','BANANAS','CHERRIES','GRAPES','CANTALOPE','HONEYDEW','LEMONS','ORANGES','GRAPEFRUIT'], 'CODE': ['null','null','null','null','null','null','null','null','1234','1234','1234']})


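From the research I have done, it looks like I could use str.split and/or str.extract, but I am not sure how to match each fruit with its KEY, RECORD, and SERIAL. On top of that, the last record also contains "@ 1234". That value needs to be extracted as well and matched with the three fruits listed before it.

I assume the first step is to extract the fruits, which should be easy since they are all strung together.

Any suggestions on how to approach this?

Thanks.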


1 answer
User
#1 · Posted 2024-04-20 12:06:54

Try this:

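# Capture the text inside the first [...] of each remark and split it on commas
df['FruitList'] = df['REMARKS'].str.extract(r'\[(.+?)\]', expand=False).str.split(',')
# Capture the digits that follow "@ " (NaN where there is no code)
df['CODE'] = df['REMARKS'].str.extract(r'@\s(\d+)', expand=False)
# One row per fruit; the other columns are repeated
df.explode('FruitList')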

Output:

  KEY RECORD SERIAL                                            REMARKS   FruitList  CODE
0   1      1   1470     FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU      APPLES   NaN
0   1      1   1470     FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU     ORANGES   NaN
0   1      1   1470     FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU       PEARS   NaN
1   2      1   2321  I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I D...     BANANAS   NaN
1   2      1   2321  I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I D...    CHERRIES   NaN
1   2      1   2321  I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I D...      GRAPES   NaN
2   3      1    300   THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234      LEMONS  1234
2   3      1    300   THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234     ORANGES  1234
2   3      1    300   THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234  GRAPEFRUIT  1234

If you like, you can also drop the REMARKS column:

df.explode('FruitList').drop('REMARKS', axis=1)

Output:

  KEY RECORD SERIAL   FruitList  CODE
0   1      1   1470      APPLES   NaN
0   1      1   1470     ORANGES   NaN
0   1      1   1470       PEARS   NaN
1   2      1   2321     BANANAS   NaN
1   2      1   2321    CHERRIES   NaN
1   2      1   2321      GRAPES   NaN
2   3      1    300      LEMONS  1234
2   3      1    300     ORANGES  1234
2   3      1    300  GRAPEFRUIT  1234
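
Note that the output above has no rows for CANTALOPE or HONEYDEW: str.extract returns only the first match per row, so the second FRUIT[...] group in the remark for KEY 2 is skipped. The following is a minimal sketch of one way to pick up every bracketed group, assuming the same sample data as in the question. The helper name extract_fruits and the column name FRUIT are illustrative choices, and the code pattern is loosened to `@\s*` on the assumption that the space after "@" may be absent.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'KEY': ['1', '2', '3'],
    'RECORD': ['1', '1', '1'],
    'SERIAL': ['1470', '2321', '300'],
    'REMARKS': [
        'FRUIT[APPLES,ORANGES,PEARS] IS HEALTHY FOR YOU',
        "I LIKE FRUIT[BANANAS,CHERRIES,GRAPES], BUT I DON'T LIKE FRUIT[CANTALOPE,HONEYDEW]",
        'THERE IS FRUIT[LEMONS,ORANGES,GRAPEFRUIT] @ 1234',
    ],
})


def extract_fruits(remark):
    """Return every fruit found in any [...] group of a remark (hypothetical helper)."""
    groups = re.findall(r'\[(.+?)\]', remark)  # all bracketed groups, not just the first
    return [fruit for group in groups for fruit in group.split(',')]


result = (
    df.assign(
        FRUIT=df['REMARKS'].map(extract_fruits),
        # Digits after "@", with optional whitespace; NaN when no code is present
        CODE=df['REMARKS'].str.extract(r'@\s*(\d+)', expand=False),
    )
    .drop(columns='REMARKS')
    .explode('FRUIT')          # one row per fruit
    .reset_index(drop=True)    # replace the repeated index labels with 0..n-1
)
print(result)
```

Rows without a code keep NaN in CODE rather than the string 'null' shown in the question; something like result['CODE'].fillna('null') could be applied afterwards if the literal string is wanted.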
