How can I test significance in a for loop with rpy2?

Posted 2024-04-19 16:20:17


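I am trying to run t-tests on several variables in a pandas DataFrame using R (via the rpy2 package). I use the magic functions in a Jupyter notebook to let Python interact with R, and the interaction works except inside a loop.

Here is the DataFrame: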

df.head()
Out[60]: 
              ID Category  Num Vert_Horizon Description  Fem_Valence_Mean  \
0  Animals_001_h  Animals    1            h  Dead Stork              2.40   
1  Animals_002_v  Animals    2            v        Lion              6.31   
2  Animals_003_h  Animals    3            h       Snake              5.14   
3  Animals_004_v  Animals    4            v        Wolf              4.55   
4  Animals_005_h  Animals    5            h         Bat              5.29   

   Fem_Valence_SD  Fem_Av/Ap_Mean  Fem_Av/Ap_SD  Arousal_Mean       ...        \
0            1.30            3.03          1.47          6.72       ...         
1            2.19            5.96          2.24          6.69       ...         
2            1.19            5.14          1.75          5.34       ...         
3            1.87            4.82          2.27          6.84       ...         
4            1.56            4.61          1.81          5.50       ...         

   Luminance  Contrast  JPEG_size80   LABL   LABA   LABB  Entropy  \
0     126.05     68.45       263028  51.75  -0.39  16.93     7.86   
1     123.41     32.34       250208  52.39  10.63  30.30     6.71   
2     135.28     59.92       190887  55.45   0.25   4.41     7.83   
3     122.15     75.10       282350  49.84   3.82   1.36     7.69   
4     131.81     59.77       329325  54.26  -0.34  -0.95     7.82   

   Classification  valence_median_split  temp_selection  
0                           Low_Valence             OUT  
1                          High_Valence             NaN  
2                           Low_Valence             OUT  
3                           Low_Valence             OUT  
4                           Low_Valence             OUT  

[5 rows x 35 columns]

Here is what I tried:

%Rpush df

Variables = 'All_Valence_Mean', 'Male_Valence_Mean', 'Fem_Valence_Mean'

for var in Variables:
    %R var + '_Sig' <- t.test(var ~ valence_median_split, data = df, var.equal = TRUE)

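I am trying to save each result in a variable named after var with the string '_Sig' appended. That part is not essential; what I really want is for the command to recognize var as one of the variables in the Variables list.

Here is the error I get: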

Error in model.frame.default(formula = var ~ valence_median_split, data = df) : 
  invalid type (list) for variable 'var'

Error in model.frame.default(formula = var ~ valence_median_split, data = df) : 
  invalid type (list) for variable 'var'

Error in model.frame.default(formula = var ~ valence_median_split, data = df) : 
  invalid type (list) for variable 'var'
/anaconda3/lib/python3.7/site-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Error in model.frame.default(formula = var ~ valence_median_split, data = df) : 
  invalid type (list) for variable 'var'

  warnings.warn(x, RRuntimeWarning)

1 Answer

Answer 1 · posted 2024-04-19 16:20:17

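If you are more comfortable with R, push as much of the logic as possible into R. For example, the following cell stores the results in results, which you can then access from Python in later notebook cells.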

%%R -i df -o results

Variables <- c("All_Valence_Mean", "Male_Valence_Mean",
               "Fem_Valence_Mean")
results <- list()

for (var in Variables) {
    results[[paste0(var, '_Sig')]] <- t.test(
        as.formula(paste(var, '~ valence_median_split')),
        data = df, var.equal = TRUE)
}

If you are more comfortable with Python, do as much as possible in Python:

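from rpy2.robjects import Formula
from rpy2.robjects.packages import importr

# Load R's stats package so that t.test is callable as stats.t_test
stats = importr('stats')

Variables = ('All_Valence_Mean', 'Male_Valence_Mean',
             'Fem_Valence_Mean')
results = dict()

# df must be convertible to an R data.frame (e.g. with rpy2's pandas conversion enabled)
for var in Variables:
    # Build the formula text in Python so that R sees the real column name
    results['%s_Sig' % var] = stats.t_test(
        Formula('%s ~ valence_median_split' % var),
        data=df, var_equal=True)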
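
Both versions rest on the same step: the formula text is assembled from the column name before R evaluates it, whereas the loop in the question hands R the literal name var (as the error message's formula = var ~ valence_median_split shows). A minimal sketch of that step in plain Python, with no rpy2 calls so it can be checked without R; the helper name formula_text is hypothetical and the column names are the ones from the question:

```python
# Column names are taken from the question; formula_text is a hypothetical helper.
VARIABLES = ('All_Valence_Mean', 'Male_Valence_Mean', 'Fem_Valence_Mean')
GROUP = 'valence_median_split'


def formula_text(response, group=GROUP):
    """Return the R formula text 'response ~ group' for one column."""
    return '%s ~ %s' % (response, group)


# Map each result key (column name + '_Sig') to the formula text R should receive.
formulas = {'%s_Sig' % name: formula_text(name) for name in VARIABLES}

for key, text in formulas.items():
    print(key, '->', text)
```

Each of these strings is what would be passed to Formula(...) in the loop above; printing them first is a cheap way to confirm that the column name, not the word var, ends up in the formula.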
