Pandas (Python): add a new column to a DataFrame that depends on its row values and on aggregated values from another DataFrame

Posted 2024-04-28 20:21:39


I'm new to Python and pandas, so this may be a silly question.

Problem:

I have two DataFrames, say df1 and df2, where

df1 looks like:

```
   treatment1 treatment2     value           comparision    test          adjustment  statsig   p_value
0   Treatment    Control  0.795953     Treatment:Control  t-test  Benjamini-Hochberg    False  0.795953
1  Treatment2    Control  0.795953    Treatment2:Control  t-test  Benjamini-Hochberg    False  0.795953
2  Treatment2  Treatment  0.795953  Treatment2:Treatment  t-test  Benjamini-Hochberg    False  0.795953
```

and df2 looks like:

```
     group_type  metric
0     Treatment    31.0
1    Treatment2    83.0
2     Treatment    51.0
3     Treatment    20.0
4       Control    41.0
..          ...     ...
336  Treatment3    35.0
337  Treatment3     9.0
338  Treatment3    35.0
339  Treatment3     9.0
340  Treatment3    35.0
```

I want to add a column mean_percentage_lift to df1, where

mean_percentage_lift = (mean(treatment1) / mean(treatment2) - 1) * 100

where `treatment1` and `treatment2` can be any of `[Treatment, Control, Treatment2]`, and `mean(x)` is the mean of `metric` over the rows of df2 whose `group_type` is `x`.

My approach:

I'm using the DataFrame `assign` method:

```python
df1.assign(mean_percentage_lift = lambda dataframe: lift_mean_percentage(df2, dataframe['treatment1'], dataframe['treatment2']))
```

where `lift_mean_percentage` is defined as:

```python
def lift_mean_percentage(df, treatment1, treatment2):
    treatment1_data = df[df[group_type_col] == treatment1]
    treatment2_data = df[df[group_type_col] == treatment2]
    mean1 = treatment1_data['metric'].mean()
    mean2 = treatment2_data['metric'].mean()
    return (mean1/mean2 -1) * 100
```

But I get the error `Can only compare identically-labeled Series objects` on the line `treatment1_data = df[df[group_type_col] == treatment1]`. Am I doing something wrong? Is there an alternative?
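A likely cause, sketched below on the assumption that `group_type_col` holds `'group_type'`: `DataFrame.assign` calls the lambda once with the whole DataFrame, not once per row. That makes `dataframe['treatment1']` a Series indexed like df1, not a single string. Comparing `df2['group_type']` to that Series with `==` requires both Series to have identical index labels, and df2 has many more rows than df1. The small frames here are hypothetical stand-ins for the question's data.

```python
import pandas as pd

# Hypothetical stand-ins for the question's df1 and df2 (only the columns needed here).
df1 = pd.DataFrame({
    "treatment1": ["Treatment", "Treatment2", "Treatment2"],
    "treatment2": ["Control", "Control", "Treatment"],
})
df2 = pd.DataFrame({
    "group_type": ["Treatment", "Treatment2", "Treatment", "Treatment", "Control"],
    "metric": [31.0, 83.0, 51.0, 20.0, 41.0],
})

# assign() passes the whole DataFrame to the lambda, so this argument is a Series.
treatment1 = df1["treatment1"]
type_name = type(treatment1).__name__
print(type_name)

# Series == Series needs matching index labels: df2 has 0..4, df1 has 0..2.
try:
    mask = df2["group_type"] == treatment1
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Comparing against a single string, such as `df2["group_type"] == "Control"`, works. So the fix is to pass one pair of labels at a time rather than whole columns.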


1 answer

Answer 1 (anonymous user) · posted 2024-04-28 20:21:39

For the DataFrame df2:

```
   group_type   metric
0   Treatment   31.0
1   Treatment2  83.0
2   Treatment   51.0
3   Treatment   20.0
4   Control     41.0
5   Treatment3  35.0
6   Treatment3  9.0
7   Treatment   35.0
8   Treatment3  9.0
9   Control     5.0
```

you can try:

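```python
def lift_mean_percentage(df, T1, T2):
    # Mean of 'metric' over the rows whose group_type equals T1 (resp. T2)
    treatment1 = df.loc[df['group_type'] == T1, 'metric'].mean()
    treatment2 = df.loc[df['group_type'] == T2, 'metric'].mean()
    # Percentage lift of T1's mean over T2's mean
    return (treatment1 / treatment2 - 1) * 100
```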
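The function above returns one number for one pair of groups. One way to fill `mean_percentage_lift` for every row of df1 is to keep the question's `assign` call but have the callable use `DataFrame.apply(..., axis=1)`. That way the function receives one row's labels (plain strings) at a time. This is a sketch, not part of the original answer. It uses the sample df2 above and a trimmed, hypothetical df1 with only the two treatment columns.

```python
import pandas as pd

# df2 as shown in the answer's sample table
df2 = pd.DataFrame({
    "group_type": ["Treatment", "Treatment2", "Treatment", "Treatment", "Control",
                   "Treatment3", "Treatment3", "Treatment", "Treatment3", "Control"],
    "metric": [31.0, 83.0, 51.0, 20.0, 41.0, 35.0, 9.0, 35.0, 9.0, 5.0],
})

# Hypothetical trimmed df1: only the columns the lift depends on.
df1 = pd.DataFrame({
    "treatment1": ["Treatment", "Treatment2", "Treatment2"],
    "treatment2": ["Control", "Control", "Treatment"],
})

def lift_mean_percentage(df, T1, T2):
    treatment1 = df.loc[df["group_type"] == T1, "metric"].mean()
    treatment2 = df.loc[df["group_type"] == T2, "metric"].mean()
    return (treatment1 / treatment2 - 1) * 100

# apply(axis=1) calls the inner lambda once per row, so row["treatment1"] is a str.
result = df1.assign(
    mean_percentage_lift=lambda d: d.apply(
        lambda row: lift_mean_percentage(df2, row["treatment1"], row["treatment2"]),
        axis=1,
    )
)
print(result)
```

Row-wise `apply` re-filters df2 for each row of df1. That is fine for a handful of comparisons like these.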

Running:

```python
lift_mean_percentage(df2,'Treatment2','Control')
```

gives:

```
260.8695652173913
```
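The formula only needs one mean per group, so another option (not from the original answer) avoids per-row filtering. Compute all group means once with `groupby`, then look them up for both columns of df1 with `Series.map`. This sketch again uses the sample df2 above and a hypothetical two-column df1. A label that does not appear in df2 maps to NaN, so its lift is NaN too.

```python
import pandas as pd

# df2 as shown in the answer's sample table
df2 = pd.DataFrame({
    "group_type": ["Treatment", "Treatment2", "Treatment", "Treatment", "Control",
                   "Treatment3", "Treatment3", "Treatment", "Treatment3", "Control"],
    "metric": [31.0, 83.0, 51.0, 20.0, 41.0, 35.0, 9.0, 35.0, 9.0, 5.0],
})

# Hypothetical trimmed df1: only the columns the lift depends on.
df1 = pd.DataFrame({
    "treatment1": ["Treatment", "Treatment2", "Treatment2"],
    "treatment2": ["Control", "Control", "Treatment"],
})

# One pass over df2: a Series of mean metric, indexed by group_type.
group_means = df2.groupby("group_type")["metric"].mean()

# map() with a Series looks each label up in group_means' index.
mean1 = df1["treatment1"].map(group_means)
mean2 = df1["treatment2"].map(group_means)
df1["mean_percentage_lift"] = (mean1 / mean2 - 1) * 100
print(df1)
```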
