How do I create a new column of zipped list items from two separate columns in a DataFrame?

Posted 2024-05-15 15:43:41

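This question follows on from one I asked earlier, "Pandas groupby make two columns lists separately". This time I want to create a new column in which each value is a list of tuples, formed by zipping together the values from two other columns. For example: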

# Original DataFrame
      fruit      sport                       weather
0     apple      [baseball, basketball]      [sunny, windy]
1     banana     [swimming, hockey]          [cloudy, windy]
2     orange     [football]                  [sunny]


# Desired DataFrame
      fruit      sport                       weather             pairs
0     apple      [baseball, basketball]      [sunny, windy]      [(baseball, sunny), (basketball, windy)]
1     banana     [swimming, hockey]          [cloudy, windy]     [(swimming, cloudy), (hockey, windy)]
2     orange     [football]                  [sunny]             [(football, sunny)]

I tried the following code, but it gives a different result:

df['pairs'] = list(zip(df['sport'], df['weather']))

# Output DataFrame
      fruit      sport                       weather             pairs
0     apple      [baseball, basketball]      [sunny, windy]      ([baseball, basketball], [sunny, windy])
1     banana     [swimming, hockey]          [cloudy, windy]     ([swimming, hockey], [cloudy, windy])
2     orange     [football]                  [sunny]             ([football], [sunny])

As you can see, it is the "opposite" of what I want. What is the right way to do this? Thanks in advance.


3 Answers

Use apply over axis=1 with zip:

df['pairs'] = df.apply(lambda x: list(zip(x['sport'], x['weather'])), axis=1)
    fruit                   sport          weather                                     pairs
0   apple  [baseball, basketball]   [sunny, windy]  [(baseball, sunny), (basketball, windy)]
1  banana      [swimming, hockey]  [cloudy, windy]     [(swimming, cloudy), (hockey, windy)]
2  orange              [football]          [sunny]                       [(football, sunny)]
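
For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame is rebuilt by hand from the table in the question, so the exact construction is an assumption about how the original data was stored (plain Python lists in each cell).

```python
import pandas as pd

# Sample data reconstructed from the question (assumed: each cell holds a Python list)
df = pd.DataFrame({
    "fruit": ["apple", "banana", "orange"],
    "sport": [["baseball", "basketball"], ["swimming", "hockey"], ["football"]],
    "weather": [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]],
})

# axis=1 hands each row to the lambda as a Series, so row["sport"] and
# row["weather"] are that row's two lists; zip pairs them element by element.
df["pairs"] = df.apply(lambda row: list(zip(row["sport"], row["weather"])), axis=1)

print(df["pairs"].tolist())
```

Row-wise apply calls the lambda once per row in Python, so on large frames the list-comprehension variants in the other answers may well be faster.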

You can take advantage of the fact that map, when given several iterables, behaves like a built-in zip, and do the following:

df['pairs'] = [list(x) for x in map(zip, df['sport'], df['weather'])]
print(df)

Output

    fruit  ...                                     pairs
0   apple  ...  [(baseball, sunny), (basketball, windy)]
1  banana  ...     [(swimming, cloudy), (hockey, windy)]
2  orange  ...                       [(football, sunny)]

[3 rows x 4 columns]

Or you can use itertuples:

df['pairs'] = [list(zip(*x)) for x in df[['sport', 'weather']].itertuples(index=False)]
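
As a quick sanity check, the sketch below runs both variants on a hand-built copy of the question's data (the DataFrame construction and the names via_map and via_itertuples are assumptions for illustration) and confirms they produce the same lists.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed: each cell holds a Python list)
df = pd.DataFrame({
    "fruit": ["apple", "banana", "orange"],
    "sport": [["baseball", "basketball"], ["swimming", "hockey"], ["football"]],
    "weather": [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]],
})

# map(zip, a, b) calls zip(a_i, b_i) for each pair of cells, one row at a time
via_map = [list(pairs) for pairs in map(zip, df["sport"], df["weather"])]

# itertuples(index=False) yields one (sport, weather) tuple per row;
# zip(*row) unpacks it into zip(sport_list, weather_list)
via_itertuples = [
    list(zip(*row)) for row in df[["sport", "weather"]].itertuples(index=False)
]

df["pairs"] = via_map
print(via_map == via_itertuples)
```

Both are plain list comprehensions over the column values, so which one to use is mostly a matter of taste.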

I think you are missing another list(zip()):

df['pairs'] = list(list(zip(a,b)) for a,b in zip(df['sport'], df['weather']))

Output:

    fruit    sport                       weather              pairs
 0  apple    ['baseball', 'basketball']  ['sunny', 'windy']   [('baseball', 'sunny'), ('basketball', 'windy')]
 1  banana   ['swimming', 'hockey']      ['cloudy', 'windy']  [('swimming', 'cloudy'), ('hockey', 'windy')]
 2  orange   ['football']                ['sunny']            [('football', 'sunny')]
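
To see why the second zip is needed, here is a small sketch using plain Python lists as stand-ins for the two columns (the names sport, weather, row_level and element_level are just for illustration). A single zip over the columns only pairs up the rows, so each result is a tuple of two whole lists; the inner zip is what pairs the elements within a row.

```python
# Plain lists standing in for df['sport'] and df['weather']
sport = [["baseball", "basketball"], ["swimming", "hockey"], ["football"]]
weather = [["sunny", "windy"], ["cloudy", "windy"], ["sunny"]]

# One zip: pairs row i of sport with row i of weather (whole lists)
row_level = list(zip(sport, weather))
print(row_level[0])

# Two zips: the outer one walks the rows, the inner one pairs the elements
element_level = [list(zip(a, b)) for a, b in zip(sport, weather)]
print(element_level[0])
```

Note that zip stops at the shorter input, so if a row's two lists ever differ in length the extra items are silently dropped.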
