Plotting p-values on a scatter plot with statsmodels (pandas/matplotlib)

Posted 2024-05-26 16:28:04


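I need help adding p-values to my figures, but I have three problems:
1) Whenever I use statsmodels to compute p-values, I get two of them: one for the "Intercept" and one for the y variable (the variable I want to plot).
2) I am using a loop to create several figures at once.
3) I don't know how to pick out the specific p-value I want to plot, because when I print the p-values it shows them for every figure I am making.
Here is my code, in case you want to see what I mean by the two p-values: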

###(this is sample data in case you are trying to recreate the code)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
import statsmodels.api as sm

dpm=pd.DataFrame({'pm10_3135_2018':[30,34,32,44,45,46,59,54,59,30],
'nox_3135(ppb)':[20,29,27,31,33,33,34,23,32,31],
'CO_3135(ppm)':[0.8,0.9,0.1,0.2,0.5,0.5,0.7,0.8,0.9,0.3],
'O3_mda8_3135':[42,45,47,51,52,52,57,67,69,70],
'pm25_3135_2018':[6,7,6,7,4,5,2,11,9,18]})

##PM2.5 vs variables - whole year

dpm = dpm.reset_index()

x = [dpm.pm10_3135_2018,dpm['nox_3135(ppb)'],dpm['CO_3135(ppm)'],dpm.O3_mda8_3135]
y = dpm.pm25_3135_2018
xlab = ["PM10 (ug/m^3)", "NOx (ppb)", "CO (ppm)", "O3 MDA8 (ppb)"]
fnames = ['NOMR2_PM10vsPM25_yr_2018.png','NOMR2_NOxvsPM25_yr_2018.png','NOMR2_COvsPM25_yr_2018.png','NOMR2_O3vsPM25_yr_2018.png']

for xcol,lab,fname in zip(x,xlab,fnames):

    correlation_matrix1 = np.corrcoef(xcol, y)
    correlation_xy1 = correlation_matrix1[0,1]
    R2_1 = correlation_xy1**2
    m, b = np.polyfit(xcol,y,1)
    equation = 'y = ' + str(round(m,4)) + 'x' ' + ' + str(round(b,4))
    R2 = '$R^2$ =' + str(round(R2_1,3))
    fig, ax = plt.subplots()
    ax.plot(xcol, y, color='xkcd:red',linestyle='None',marker='o')
    ax.set_xlabel(lab,fontsize=15)
    ax.set_ylabel('PM2.5 (ug/m^3)',fontsize=15)
    ax.set_ylim(0,)
    ax.set_xlim(0,)
    plt.text(0.75, 0.65, equation, horizontalalignment='center',
             verticalalignment='center',
             transform=ax.transAxes)
    plt.text(0.7, 0.6, R2, horizontalalignment='center',
         verticalalignment='center',
         transform=ax.transAxes)
    model = smf.ols('xcol ~ y', data=dpm).fit()
    print(model.summary())
    print(model.pvalues)

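For the next part of the code I have the following, but I need a way to pull the y-variable p-value out of the statsmodels results, store it in a new variable P, and then write P on the plot, and I don't know how to do that. (Disclaimer: this is not my actual data, so there is not much correlation between the data points, but the process is the same.)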

plt.text(0.7, 0.55, P, horizontalalignment='center',
     verticalalignment='center',
     transform=ax.transAxes)

fig.tight_layout()
#plt.savefig(fname)

1 answer

Answer 1 · posted 2024-05-26 16:28:04

model.pvalues is a pandas Series (you can confirm this with type(model.pvalues)), so to extract the p-value for y you only need:

model.pvalues['y']
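A minimal, self-contained sketch of the point above, using two columns of the question's sample data renamed to `pm10` and `pm25` for brevity (the renaming is my assumption, not the asker's code). It shows that the Series holds one entry per model term, `Intercept` plus the predictor, so the unwanted intercept p-value is simply never selected:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Two columns of the question's sample data, renamed for brevity
df = pd.DataFrame({
    "pm10": [30, 34, 32, 44, 45, 46, 59, 54, 59, 30],
    "pm25": [6, 7, 6, 7, 4, 5, 2, 11, 9, 18],
})

# Simple linear regression: pm25 = b0 + b1 * pm10
model = smf.ols("pm25 ~ pm10", data=df).fit()

# pvalues is a pandas Series indexed by term name
print(type(model.pvalues))        # <class 'pandas.core.series.Series'>
print(list(model.pvalues.index))  # ['Intercept', 'pm10']

# Keep only the slope's p-value; the intercept's entry is ignored
P = model.pvalues["pm10"]
print(P)
```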

To add the p-value to the plot, you can add:

print(model.pvalues)
plt.text(0.7, 0.8, "y p-value: %.2f" % (model.pvalues['y']), horizontalalignment='center',
         verticalalignment='center',
         transform=ax.transAxes)

Here I added a short label ("y p-value: ...") in front of the number to make the plot clearer.
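One caveat with `%.2f`: very small p-values are printed as `0.00`. A possible alternative, sketched here with a hypothetical helper `format_p` and an arbitrary 0.001 cutoff (both my assumptions, not part of the answer), writes an inequality instead. It uses `ax.text`, which places text on a specific axes the same way `plt.text` does on the current one, and a placeholder p-value rather than a fitted model:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def format_p(p, threshold=0.001):
    """Hypothetical helper: readable p-value label.

    '%.2f' would show tiny p-values as 0.00, so values below
    `threshold` are reported as an inequality instead.
    """
    if p < threshold:
        return f"p < {threshold:g}"
    return f"p = {p:.3f}"


fig, ax = plt.subplots()
ax.plot([30, 34, 32, 44], [6, 7, 6, 7], linestyle="None", marker="o")

# Axes coordinates: (0, 0) is the bottom-left and (1, 1) the top-right of the plot area
p_value = 0.0234  # placeholder; in the real loop this would be model.pvalues[...]
ax.text(0.7, 0.8, format_p(p_value),
        horizontalalignment="center", verticalalignment="center",
        transform=ax.transAxes)
fig.tight_layout()
```

Because the position is given in axes coordinates, the label stays in the same spot on every figure regardless of each plot's data range.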

Here is the complete loop:

for xcol,lab,fname in zip(x,xlab,fnames):

    correlation_matrix1 = np.corrcoef(xcol, y)
    correlation_xy1 = correlation_matrix1[0,1]
    R2_1 = correlation_xy1**2
    m, b = np.polyfit(xcol,y,1)
    equation = 'y = ' + str(round(m,4)) + 'x' ' + ' + str(round(b,4))
    R2 = '$R^2$ =' + str(round(R2_1,3))
    fig, ax = plt.subplots()
    ax.plot(xcol, y, color='xkcd:red',linestyle='None',marker='o')
    ax.set_xlabel(lab,fontsize=15)
    ax.set_ylabel('PM2.5 (ug/m^3)',fontsize=15)
    ax.set_ylim(0,)
    ax.set_xlim(0,)
    plt.text(0.75, 0.65, equation, horizontalalignment='center',
             verticalalignment='center',
             transform=ax.transAxes)
    plt.text(0.7, 0.6, R2, horizontalalignment='center',
         verticalalignment='center',
         transform=ax.transAxes)
    model = smf.ols('xcol ~ y', data=dpm).fit()
    print(model.summary())
    print(model.pvalues)

    #added code:
    plt.text(0.7, 0.8, "y p-value: %.2f" % (model.pvalues['y']), horizontalalignment='center',
             verticalalignment='center',
             transform=ax.transAxes)
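For the third problem in the question (keeping the p-values of the different figures apart), one option is to store each predictor's slope p-value under its column name instead of only printing it. This is a sketch under my own assumptions, not code from the question: it copies each predictor into a temporary DataFrame with generic column names `x` and `y`, so that names such as `nox_3135(ppb)` never have to appear inside a formula, and it regresses PM2.5 on each predictor:

```python
import pandas as pd
import statsmodels.formula.api as smf

# The question's sample data
dpm = pd.DataFrame({
    'pm10_3135_2018': [30, 34, 32, 44, 45, 46, 59, 54, 59, 30],
    'nox_3135(ppb)': [20, 29, 27, 31, 33, 33, 34, 23, 32, 31],
    'CO_3135(ppm)': [0.8, 0.9, 0.1, 0.2, 0.5, 0.5, 0.7, 0.8, 0.9, 0.3],
    'O3_mda8_3135': [42, 45, 47, 51, 52, 52, 57, 67, 69, 70],
    'pm25_3135_2018': [6, 7, 6, 7, 4, 5, 2, 11, 9, 18],
})

predictors = ['pm10_3135_2018', 'nox_3135(ppb)', 'CO_3135(ppm)', 'O3_mda8_3135']
slope_p = {}  # one p-value per figure, keyed by predictor column

for col in predictors:
    # Generic column names avoid quoting awkward names like 'nox_3135(ppb)' in the formula
    tmp = pd.DataFrame({'x': dpm[col], 'y': dpm['pm25_3135_2018']})
    model = smf.ols('y ~ x', data=tmp).fit()
    slope_p[col] = model.pvalues['x']  # slope only; the intercept's p-value is dropped

slope_p = pd.Series(slope_p)
print(slope_p.round(4))
```

Inside the plotting loop, `slope_p[col]` then gives the single number for that figure, and the whole Series can be printed once at the end instead of once per plot.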

Also, if I am reading your code, your comments and standard statistical practice correctly, your formula should be

model = smf.ols('y ~ xcol', data=dpm).fit()

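In that case you want the p-value of the x variable, so change the code above to use model.pvalues['xcol']. The key is the string 'xcol', because statsmodels names each term after the text written in the formula, not after the Series object it refers to.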
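A quick check of what changes when the formula is flipped, sketched on the PM10/PM2.5 sample columns renamed to `x` and `y` (my assumption, for brevity). In simple linear regression the slope's t-statistic equals r·sqrt(n−2)/sqrt(1−r²) in either direction, so the slope p-value and R² are identical for `y ~ x` and `x ~ y`; the coefficients, however, are not. Only `y ~ x` matches the `np.polyfit(xcol, y, 1)` line used for the equation text, which is why that direction is the consistent one:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    'x': [30, 34, 32, 44, 45, 46, 59, 54, 59, 30],  # PM10 from the sample data
    'y': [6, 7, 6, 7, 4, 5, 2, 11, 9, 18],          # PM2.5 from the sample data
})

forward = smf.ols('y ~ x', data=df).fit()  # PM2.5 explained by PM10 (the suggested formula)
reverse = smf.ols('x ~ y', data=df).fit()  # the direction used in the question

# Slope p-values and R^2 agree in both directions
print(forward.pvalues['x'], reverse.pvalues['y'])
print(forward.rsquared, reverse.rsquared)

# The fitted slope only matches np.polyfit(x, y, 1) for the y ~ x model
m, b = np.polyfit(df['x'], df['y'], 1)
print(forward.params['x'], m)
```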
