Computing separate correlations, grouped by column value

Published 2024-03-29 06:27:46


Given two DataFrames:

A = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[1,2,3,3,2,1]})
B = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[4,3,2,2,3,4]})

A

  one   two
0   a   1
1   a   2
2   a   3
3   b   3
4   b   2
5   b   1

B

    one two
0   a   4
1   a   3
2   a   2
3   b   2
4   b   3
5   b   4

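How can I compute the correlations A[A['one']=='a']['two'].corr(B[B['one']=='a']['two']) and A[A['one']=='b']['two'].corr(B[B['one']=='b']['two']) at the same time? The end goal is to plot the correlation as a function of the values 'a' and 'b' in column 'one', i.e.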

  corr
a  -1.0
b  -1.0

2 Answers

One way is to iterate over the groups of both frames:

x, y = A.groupby('one'), B.groupby('one')

# For each group in A, correlate its 'two' column with the matching group in B
res = {name: grp['two'].corr(y.get_group(name)['two']) for name, grp in x}

pd.DataFrame(list(res.items()))
#   0  1
#0  a -1
#1  b -1
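
To get the layout asked for in the question (group labels as the index and a single corr column), the dictionary from this answer can be wrapped in a Series instead. This is only a minimal sketch: the names grouped_b and out are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd

A = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [1, 2, 3, 3, 2, 1]})
B = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [4, 3, 2, 2, 3, 4]})

grouped_b = B.groupby('one')

# Series.corr aligns on the index, so this relies on matching rows in A and B
# sharing the same index labels (true for the example data).
res = {
    name: grp['two'].corr(grouped_b.get_group(name)['two'])
    for name, grp in A.groupby('one')
}

# Group labels become the index, the correlations become the 'corr' column.
out = pd.Series(res, name='corr').to_frame()
out.index.name = 'one'
print(out)
```

With matplotlib installed, a frame shaped like this could then be plotted per group, for example with out.plot.bar().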
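
Another approach is to index both frames by 'one', concatenate them side by side, and compute the correlation per group:

import pandas as pd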

A = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[1,2,3,3,2,1]})
B = pd.DataFrame({'one':['a','a','a','b','b','b'], 'two':[4,3,2,2,3,4]})

A = A.set_index('one').sort_index()
B = B.set_index('one').sort_index()
# Both frames must have the same number of rows for each value of 'one',
# so they can be concatenated side by side
df = pd.concat([A, B], keys=['A', 'B'], axis=1)

def cal_corr(group):
    return pd.Series({'corr': group.A.corrwith(group.B).values[0]})

df.groupby(level='one').apply(cal_corr)

Out[211]: 
     corr
one      
a      -1
b      -1
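
The side-by-side pairing used above can also be sketched with a merge on the group label plus the position of each row within its group. This is an alternative under stated assumptions, not the answer's own method: the helper column pos and the names A_pos, B_pos, merged and corr_by_group are hypothetical, and rows are assumed to correspond by their order within each group.

```python
import pandas as pd

A = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [1, 2, 3, 3, 2, 1]})
B = pd.DataFrame({'one': ['a', 'a', 'a', 'b', 'b', 'b'], 'two': [4, 3, 2, 2, 3, 4]})

# 'pos' numbers the rows inside each group: 0, 1, 2, ...
A_pos = A.assign(pos=A.groupby('one').cumcount())
B_pos = B.assign(pos=B.groupby('one').cumcount())

# Pair the n-th row of each group in A with the n-th row of the same group in B.
merged = A_pos.merge(B_pos, on=['one', 'pos'], suffixes=('_A', '_B'))

# One correlation per group, returned as a frame with a single 'corr' column.
corr_by_group = (
    merged.groupby('one')[['two_A', 'two_B']]
    .apply(lambda g: g['two_A'].corr(g['two_B']))
    .rename('corr')
    .to_frame()
)
print(corr_by_group)
```

Because the pairing is done by position rather than by the original index, this variant does not require the two frames to carry identical index labels, only the same row order within each group.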
