Problem with y-axis values being limited to 1 when using seaborn

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

# Read in the backscatter csv file as a data frame
df_lakearea = pd.read_csv('lake_area.csv')

figure(num=None, figsize=(8, 6), dpi=300, facecolor='w', edgecolor='k')

# Control aesthetics
sns.set()
# White grid background, width of grid line and series line
sns.set(style="whitegrid", rc={"grid.linewidth": 0.2, "lines.linewidth": 0.5})
sns.set_context(font_scale=0.5)  # Scale of font

# Use seaborn pointplot function to plot the lake area.
# pd.melt converts the wide-form data frame to long-form.
lakearea_plot = sns.pointplot(x="variable", y="value", data=pd.melt(df_lakearea),
                              color='maroon', linestyles=["-"], join=True, capsize=0.2)

# Rotate the x axis labels so that they are readable
plt.setp(lakearea_plot.get_xticklabels(), rotation=20)

params = {'mathtext.default': 'regular'}
plt.rcParams.update(params)
lakearea_plot.set(xlabel='', ylabel='Area $(km^2)$')
lakearea_plot.tick_params(labelsize=8)  # Control the label size

1 answer

User

#1 · posted 2024-04-23 21:27:50

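First, when you draw a categorical point plot in seaborn, the y values (the numeric variable) are aggregated to the mean of each category. Let's demonstrate with one of seaborn's built-in datasets.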

import seaborn as sns

df = sns.load_dataset('tips')
sns.pointplot(x='day', y='tip', data=df)

In this plot, you can see that the y value for Thur is about 2.8. That is because the mean of the tips on Thur is about 2.8. We can verify this with:

df.groupby('day').tip.mean()

[Out]:
day
Thur    2.771452
Fri     2.734737
Sat     2.993103
Sun     3.255132
Name: tip, dtype: float64

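Second, you may also notice that Fri has a wider confidence interval (CI) than the other groups. In this kind of line plot, the CI describes the uncertainty of the estimated mean, so its width is driven largely by the sample size and does not show the distribution of the data itself. We can verify this with: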

df.day.value_counts()

[Out]:
Sat     87
Sun     76
Thur    62
Fri     19
Name: day, dtype: int64

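As you can see, our dataset has only 19 observations for Fri. We are therefore less confident in our estimate (the mean) for that group than for the others, which is why its CI is wider.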
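To make the sample-size effect concrete, here is a minimal sketch of a percentile bootstrap CI for the mean, written with the standard library only. It assumes seaborn's default error bars are a 95% bootstrap interval with 1000 resamples; the helper name `bootstrap_ci_width` and the toy data are made up for illustration and are not part of seaborn. Both samples have exactly the same spread and differ only in size.

```python
import random
import statistics


def bootstrap_ci_width(values, n_boot=1000, level=95, seed=0):
    """Width of a percentile-bootstrap CI for the mean of `values`.

    Hypothetical helper: an approximation of the idea behind seaborn's
    error bars, not seaborn's own implementation.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record each resample's mean
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Indices of the lower and upper percentiles of the bootstrap means
    lo_idx = int(n_boot * (100 - level) / 200)
    hi_idx = int(n_boot * (100 + level) / 200) - 1
    return boot_means[hi_idx] - boot_means[lo_idx]


# Same values repeated, so the spread is identical; only the size differs
small_sample = [1, 2, 3, 4, 5] * 4    # 20 observations
large_sample = [1, 2, 3, 4, 5] * 20   # 100 observations

width_small = bootstrap_ci_width(small_sample)
width_large = bootstrap_ci_width(large_sample)
print(f"CI width with 20 points:  {width_small:.3f}")
print(f"CI width with 100 points: {width_large:.3f}")
```

The smaller sample should give a noticeably wider interval, which matches what the plot shows for Fri.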

Here is another example:

sns.regplot(x='total_bill', y='tip', data=df)

You can see that the CI is much wider around a total_bill of 50, because there are only a few data points there.

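So you should check whether the mean of each group in your data falls within your y-axis limits, and whether the CIs reflect the number of data points in each group.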
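One possible way to run that check is sketched below. The data frame is a hypothetical stand-in for lake_area.csv (the real file is not shown in the question), and `groups_outside_ylim` is an illustrative helper, not a pandas or seaborn function.

```python
import pandas as pd


def groups_outside_ylim(long_df, group_col, value_col, ymin, ymax):
    """Return the per-group means that fall outside [ymin, ymax]."""
    means = long_df.groupby(group_col)[value_col].mean()
    return means[(means < ymin) | (means > ymax)]


# Hypothetical wide-form data standing in for lake_area.csv
wide = pd.DataFrame({
    "2015": [0.4, 0.6],
    "2016": [1.8, 2.2],
    "2017": [0.9, 0.7],
})
long_df = pd.melt(wide)  # long form with columns "variable" and "value"

# Means that a y-axis limited to [0, 1] would cut off
outside = groups_outside_ylim(long_df, "variable", "value", ymin=0, ymax=1)
print(outside)

# Number of observations behind each mean, to judge the CI widths
print(long_df.groupby("variable")["value"].count())
```

Any group listed in `outside` would be plotted beyond the chosen limits, so the limits (or the data) need another look before reading anything into the plot.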
