Conditionally concatenating aggregated columns from different DataFrames into a new DataFrame

In [22]: arrays = [np.array(['A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2']),
   ....:           np.array(['B1', 'B1', 'B2', 'B2', 'B1', 'B1', 'B2', 'B2']),
   ....:           np.array(['C1', 'C2', 'C1', 'C2', 'C1', 'C2', 'C1', 'C2'])]

In [23]: df1 = pd.DataFrame(np.random.randint(10, size=(8, 4)), index=arrays)

In [24]: df1
Out[24]: 
          0  1  2  3
A1 B1 C1  2  7  3  4
      C2  6  2  1  7
   B2 C1  3  3  5  6
      C2  9  6  3  6
A2 B1 C1  7  8  0  6
      C2  6  3  1  6
   B2 C1  9  3  8  2
      C2  7  1  2  8

In [25]: df2 = pd.DataFrame(np.random.randint(10, size=(8, 4)), index=arrays)

In [26]: df2
Out[26]: 
          0  1  2  3
A1 B1 C1  7  2  5  2
      C2  0  2  9  0
   B2 C1  2  2  6  9
      C2  4  6  3  8
A2 B1 C1  7  1  5  1
      C2  6  2  2  6
   B2 C1  5  8  1  6
      C2  7  4  8  0

2 answers

User

Answer 1 · edited 2024-05-15 02:44:45

I managed to implement the solution I wanted:

In [55]: df = pd.DataFrame()
In [56]: for t, n in [(df1, 'df1'), (df2, 'df2')]:
   ....:     t['nth'] = np.where(t.index.get_level_values(0).to_series().str.contains('1').values, t[2], t[3])
   ....:     df[n, 'max'] = t[0].groupby(level=[0, 1]).max()
   ....:     # reset_index() is required since nth() doesn't reduce number of index levels
   ....:     df[n, 'nth'] = t['nth'].groupby(level=[0, 1]).nth(0).reset_index(level=2, drop=True)
In [57]: df
Out[57]: 
       (df1, max)  (df1, nth)  (df2, max)  (df2, nth)
A1 B1           8           1           7           0
   B2           6           3           9           3
A2 B1           7           2           7           3
   B2           8           2           6           7

In [58]: df.columns = pd.MultiIndex.from_tuples(df.columns)
In [59]: df
Out[59]: 
      df1     df2    
      max nth max nth
A1 B1   8   1   7   0
   B2   6   3   9   3
A2 B1   7   2   7   3
   B2   8   2   6   7
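
For anyone who wants to run this end to end, here is a minimal sketch of the same idea written without the loop-and-assign pattern. It is not the code from the answer above: the `summarize` helper is a hypothetical name, the seed is only there for reproducibility, and `first()` is swapped in for `nth(0)` on the assumption that the data has no missing values (so the first non-null value is also the first row). Because `first()` returns one row per group keyed by the group levels, no `reset_index()` is needed, and passing a dict to `pd.concat` builds the two-level columns directly.

```python
import numpy as np
import pandas as pd

# Fixed seed so the sketch is reproducible (the original post used unseeded values).
rng = np.random.default_rng(0)
arrays = [
    np.array(['A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2']),
    np.array(['B1', 'B1', 'B2', 'B2', 'B1', 'B1', 'B2', 'B2']),
    np.array(['C1', 'C2', 'C1', 'C2', 'C1', 'C2', 'C1', 'C2']),
]
df1 = pd.DataFrame(rng.integers(10, size=(8, 4)), index=arrays)
df2 = pd.DataFrame(rng.integers(10, size=(8, 4)), index=arrays)


def summarize(frame):
    """Hypothetical helper: for each (level 0, level 1) group, return the max of
    column 0 and the first row's value taken from column 2 when the level-0
    label contains '1', otherwise from column 3."""
    has_one = frame.index.get_level_values(0).str.contains('1')
    picked = pd.Series(np.where(has_one, frame[2], frame[3]), index=frame.index)
    grouped_max = frame[0].groupby(level=[0, 1]).max()
    # first() is indexed by the group keys only, so the third level is dropped.
    grouped_first = picked.groupby(level=[0, 1]).first()
    return pd.DataFrame({'max': grouped_max, 'nth': grouped_first})


# The dict keys become the outer level of the column MultiIndex.
result = pd.concat({'df1': summarize(df1), 'df2': summarize(df2)}, axis=1)
print(result)
```

Unlike the loop above, this version does not add an `nth` column to `df1` and `df2` as a side effect.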

User

Answer 2 · edited 2024-05-15 02:44:45

Here is my starting point (same code as yours, different random values):

          0  1  2  3
A1 B1 C1  3  4  1  6
      C2  6  3  4  5
   B2 C1  8  3  5  1
      C2  8  5  1  6
A2 B1 C1  8  7  0  6
      C2  5  1  4  7
   B2 C1  3  1  8  5
      C2  7  1  7  8

df[0] = df.groupby(level=[0,1])[0].transform('max')  # write each group's max back to every row of the group

          0  1  2  3
A1 B1 C1  6  4  1  6
      C2  6  3  4  5
   B2 C1  8  3  5  1
      C2  8  5  1  6
A2 B1 C1  8  7  0  6
      C2  8  1  4  7
   B2 C1  7  1  8  5
      C2  7  1  7  8
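
To see why every row in a group now shows the same value, it may help to compare `transform` with a plain aggregation. The sketch below rebuilds only column 0 from the starting frame above (the index is reconstructed with `MultiIndex.from_product`, which is an assumption about how the labels line up) and contrasts the two calls.

```python
import pandas as pd

# Column 0 of the starting frame, with the same three-level index.
idx = pd.MultiIndex.from_product([['A1', 'A2'], ['B1', 'B2'], ['C1', 'C2']])
df = pd.DataFrame({0: [3, 6, 8, 8, 8, 5, 3, 7]}, index=idx)

# Aggregation: one value per (level 0, level 1) group.
per_group = df.groupby(level=[0, 1])[0].max()

# Transform: the group value is repeated for every original row.
broadcast = df.groupby(level=[0, 1])[0].transform('max')

print(per_group.tolist())   # [6, 8, 8, 7]
print(broadcast.tolist())   # [6, 6, 8, 8, 8, 8, 7, 7]
```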

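I couldn't find a way to check for '1' directly in the first index level, so I just turned that level into a column with reset_index, after which using the string methods on it is fairly easy.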

df['one'] = df.reset_index().level_0.str.contains('1').values
df['nth'] = np.where( df.one, df[2], df[3] )

          0  1  2  3    one  nth
A1 B1 C1  6  4  1  6   True    1
      C2  6  3  4  5   True    4
   B2 C1  8  3  5  1   True    5
      C2  8  5  1  6   True    1
A2 B1 C1  8  7  0  6  False    6
      C2  8  1  4  7  False    7
   B2 C1  7  1  8  5  False    5
      C2  7  1  7  8  False    8
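
As a possible shortcut, the first level can also be tested without going through `reset_index`, since `Index` objects expose the same `.str` accessor as Series. The sketch below uses columns 2 and 3 from the frame above; the index is rebuilt with `MultiIndex.from_product`, which is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Columns 2 and 3 of the frame shown above.
idx = pd.MultiIndex.from_product([['A1', 'A2'], ['B1', 'B2'], ['C1', 'C2']])
df = pd.DataFrame(
    {2: [1, 4, 5, 1, 0, 4, 8, 7], 3: [6, 5, 1, 6, 6, 7, 5, 8]},
    index=idx,
)

# Boolean mask: True where the level-0 label contains '1'.
has_one = df.index.get_level_values(0).str.contains('1')

# Take column 2 where the mask is True, column 3 otherwise.
df['nth'] = np.where(has_one, df[2], df[3])
print(df['nth'].tolist())  # [1, 4, 5, 1, 6, 7, 5, 8]
```

This skips the helper `one` column entirely, at the cost of keeping the mask in a separate variable.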

Now to clean things up (some of this could have been done earlier, but I think it's clearer to wait until the end and do it all together):

df.iloc[0::2,[0,-1]].reset_index(level=2,drop=True).rename(columns={0:'max'})

       max  nth
A1 B1    6    1
   B2    8    5
A2 B1    8    6
   B2    7    5

I'm not sure whether you're also asking about concat, but that part is simple:

pd.concat( [df1,df2], axis=1)
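
One thing to watch for: both frames use the column labels 0 to 3, so a plain column-wise concat produces duplicate labels. A small sketch of one way around that, using the `keys` argument to tag each frame's columns (only the first two rows of the question's `df1` and `df2` are used here, to keep it short):

```python
import pandas as pd

# First two rows of df1 and df2 from the question.
idx = pd.MultiIndex.from_tuples([('A1', 'B1', 'C1'), ('A1', 'B1', 'C2')])
df1 = pd.DataFrame([[2, 7, 3, 4], [6, 2, 1, 7]], index=idx)
df2 = pd.DataFrame([[7, 2, 5, 2], [0, 2, 9, 0]], index=idx)

# Without keys, the labels 0..3 appear twice.
plain = pd.concat([df1, df2], axis=1)
print(plain.columns.tolist())       # [0, 1, 2, 3, 0, 1, 2, 3]

# With keys, each source frame gets its own outer column level.
keyed = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])
print(keyed.columns.tolist()[:2])   # [('df1', 0), ('df1', 1)]
print(keyed[('df2', 2)].tolist())   # [5, 9]
```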
