Get the value in one column that corresponds to the minimum of another column, for each subset of rows

Posted 2024-04-19 12:14:46


Apologies if the question isn't entirely clear. I do, however, have some example code showing the desired input and output (see below).

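I have a (large) DataFrame and want to select the minimum value in pval1 together with its corresponding lag. I also want to select the minimum value in pval2 together with its corresponding lag. I want to do this for each pair of variables, i.e. (A, B), (A, C) and (B, D). Each pair appears multiple times in the dataset.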

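I've tried several approaches to get the output I want, but I seem to be missing something in the logic and I'm not sure what. Any help would be greatly appreciated.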

Thanks to everyone who helps.

The DataFrame looks like this:

import pandas as pd

myxdf = pd.DataFrame({
    'pval1': [0.01,0.2,0.001,0.3,0.0003,0.05,1,0.002,0.2],
    'pval2': [0.3,0.02,0.002,0.9,0.001,0.002,0.10,0.93,0.00001],
    'lag': [1,2,3,1,2,3,1,2,3],
    'var1': ['A','A','A','A','A','A','B','B','B'],
    'var2': ['B','B','B','C','C','C','D','D','D']
})

myxdf

The ideal output for the example above should look like this (note the new lagp1 and lagp2 columns):

myxdf2 = pd.DataFrame({
    'pval1': [0.0010,0.0003,0.002],
    'pval2' : [0.002,0.001,0.00001],
    'lagp1': [3,2,2],
    'lagp2': [3,2,3],
    'var1': ['A','A','B'],
    'var2': ['B','C','D']
})

myxdf2
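Before grouping, it may help to pin down the lookup for a single pair. The following is an illustrative sketch (not part of the original question) for the pair (A, B): filter its rows, use `Series.idxmin` to get the row label of the smallest p-value, and read `lag` from that same row with `.at`. Repeating this for every (var1, var2) pair is what a groupby needs to automate.

```python
import pandas as pd

# Sample data from the question
myxdf = pd.DataFrame({
    'pval1': [0.01, 0.2, 0.001, 0.3, 0.0003, 0.05, 1, 0.002, 0.2],
    'pval2': [0.3, 0.02, 0.002, 0.9, 0.001, 0.002, 0.10, 0.93, 0.00001],
    'lag': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'var1': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'var2': ['B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'],
})

# Keep only the rows for the pair (A, B)
sub = myxdf[(myxdf['var1'] == 'A') & (myxdf['var2'] == 'B')]

# idxmin gives the row label of the smallest value; .at reads one cell by label
row1 = sub['pval1'].idxmin()
row2 = sub['pval2'].idxmin()
print(sub.at[row1, 'pval1'], sub.at[row1, 'lag'])  # 0.001 3
print(sub.at[row2, 'pval2'], sub.at[row2, 'lag'])  # 0.002 3
```

These two rows match the first row of the desired output (pval1 0.001 at lag 3, pval2 0.002 at lag 3).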

1 answer
User
Answer 1 · posted 2024-04-19 12:14:46

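I believe you need idxmin to get the index labels of the minimum values. Use those labels to select the rows, rename the columns, and join the pieces with concat: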

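import pandas as pd

# myxdf is the DataFrame from the question.
# For each (var1, var2) pair: the row label of the minimum pval1 and of the minimum pval2
df = myxdf.groupby(['var1','var2'])[['pval1', 'pval2']].idxmin()

# Pull the rows holding each minimum; rename 'lag' so both lags can sit side by side
df1 = myxdf.loc[df['pval1'], ['pval1','lag']].rename(columns={'lag':'lagp1'})
df2 = myxdf.loc[df['pval2'], ['pval2','lag','var1','var2']].rename(columns={'lag':'lagp2'})

# Both pieces have one row per pair in the same (sorted group) order, so drop the
# original row labels and join them column-wise
df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
cols = ['pval1', 'pval2', 'lagp1', 'lagp2', 'var1', 'var2']
df = df[cols]
print(df)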
    pval1    pval2  lagp1  lagp2 var1 var2
0  0.0010  0.00200      3      3    A    B
1  0.0003  0.00100      2      2    A    C
2  0.0020  0.00001      2      3    B    D
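A variant of the same idxmin idea, offered as a hedged sketch rather than the answerer's code: wrap the lookup in a small helper (the name `min_with_lag` and its parameters are hypothetical) that loops over any number of p-value columns, so a frame with an extra pval3 column would need no additional code. It assumes each group has at least one non-missing value in every p-value column, since there is no minimum to locate in an all-NaN group.

```python
import pandas as pd

# Sample data from the question
myxdf = pd.DataFrame({
    'pval1': [0.01, 0.2, 0.001, 0.3, 0.0003, 0.05, 1, 0.002, 0.2],
    'pval2': [0.3, 0.02, 0.002, 0.9, 0.001, 0.002, 0.10, 0.93, 0.00001],
    'lag': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'var1': ['A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'var2': ['B', 'B', 'B', 'C', 'C', 'C', 'D', 'D', 'D'],
})


def min_with_lag(df, keys, pcols, lag_col='lag'):
    """Hypothetical helper: for each group defined by `keys`, return the minimum
    of every column in `pcols` and the lag found on the row of that minimum."""
    # One row per group; each cell is the row label where that column is smallest
    idx = df.groupby(keys)[pcols].idxmin()
    out = pd.DataFrame(index=idx.index)
    for i, col in enumerate(pcols, start=1):
        rows = idx[col]
        # .to_numpy() drops the original row labels so values line up by position
        out[col] = df.loc[rows, col].to_numpy()
        out[f'lagp{i}'] = df.loc[rows, lag_col].to_numpy()
    # Turn the (var1, var2) group keys back into ordinary columns
    return out.reset_index()


result = min_with_lag(myxdf, ['var1', 'var2'], ['pval1', 'pval2'])
# Reorder to match the layout requested in the question
result = result[['pval1', 'pval2', 'lagp1', 'lagp2', 'var1', 'var2']]
print(result)
```

The helper pairs each p-value column with `lagp1`, `lagp2`, ... by position, which is why the final line reorders the columns to the question's layout.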
