Calculating and plotting the 95% range of data on a scatter plot with Python

2 answers

User

Answer 1 · edited 2024-04-20 10:42:59

The relationship between the actual commute duration and the predicted duration should be linear, so I can use quantile regression:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Import the commute data (columns used below: 'prediction' and 'duration')
commutes = pd.read_csv('https://raw.githubusercontent.com/blokeley/commutes/master/commutes.csv')

# Create the quantile regression model
model = smf.quantreg('duration ~ prediction', commutes)

# Create a list of quantiles to calculate
quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]

# Create a list of fits
fits = [model.fit(q=q) for q in quantiles]

# Create a new figure and axes
figure, axes = plt.subplots()

# Plot the scatter of data points
x = commutes['prediction']
axes.scatter(x, commutes['duration'], alpha=0.4)

# Create an array of predictions from the minimum to maximum to create the regression line
_x = np.linspace(x.min(), x.max())

for index, quantile in enumerate(quantiles):
    # Plot the quantile lines
    _y = fits[index].params['prediction'] * _x + fits[index].params['Intercept']
    axes.plot(_x, _y, label=quantile)

# Plot the line of perfect prediction (actual == predicted) as a dashed green line
axes.plot(_x, _x, 'g--', label='Perfect prediction')
axes.legend()
axes.set_xlabel('Predicted duration (minutes)')
axes.set_ylabel('Actual duration (minutes)');

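This produces a scatter plot of the data with the five quantile lines and the line of perfect prediction drawn over it.

Many thanks to my colleague Philip for the quantile regression tip.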
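Since the question asks for a 95% range, one way to adapt the approach above is to fit only the 0.025 and 0.975 quantiles and treat the region between the two lines as the band. The following is a minimal sketch, not the original author's code: it uses synthetic data in place of the commutes CSV, and the name commutes_demo and the noise parameters are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the commute data (an assumption, not the real CSV)
rng = np.random.default_rng(0)
n = 500
prediction = rng.uniform(20, 60, size=n)
commutes_demo = pd.DataFrame({
    'prediction': prediction,
    'duration': prediction + rng.normal(0, 3, size=n),
})

# Fit the two quantile lines that bound the central 95% of the data
model = smf.quantreg('duration ~ prediction', commutes_demo)
lower_fit = model.fit(q=0.025)
upper_fit = model.fit(q=0.975)

# Evaluate both lines at every observed prediction
lower = lower_fit.predict(commutes_demo)
upper = upper_fit.predict(commutes_demo)

# Share of observations that fall between the two lines
inside = (commutes_demo['duration'] >= lower) & (commutes_demo['duration'] <= upper)
coverage = inside.mean()
print(f"Share of points inside the band: {coverage:.3f}")
```

To shade the band on an existing plot, axes.fill_between can be called with the x values sorted in ascending order and the two lines evaluated at those x values.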

User

Answer 2 · edited 2024-04-20 10:42:59

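You could fit your data to a Gaussian distribution. About 95% of the results fall within roughly 2 standard deviations (2 sigma) of the mean; 3 sigma covers about 99.7%.

Look into the normal distribution.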
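As a rough illustration of the Gaussian approach, the sketch below fits an ordinary least-squares line and then takes the fitted value plus or minus 1.96 standard deviations of the residuals as the 95% band. It assumes the residuals are approximately normal with constant spread, which may not hold for real commute data; the synthetic data and all variable names are assumptions made for illustration.

```python
import numpy as np

# Synthetic stand-in for the commute data (an assumption, not the real CSV)
rng = np.random.default_rng(0)
prediction = rng.uniform(20, 60, size=2000)
duration = 1.1 * prediction + 2 + rng.normal(0, 4, size=2000)

# Ordinary least-squares line: duration ~ slope * prediction + intercept
slope, intercept = np.polyfit(prediction, duration, deg=1)
fitted = slope * prediction + intercept

# Standard deviation of the residuals (ddof=2 for the two fitted parameters)
sigma = np.std(duration - fitted, ddof=2)

# For a normal distribution, mean +/- 1.96 sigma covers about 95%
z = 1.96
lower = fitted - z * sigma
upper = fitted + z * sigma

# Share of observations that fall inside the band
coverage = np.mean((duration >= lower) & (duration <= upper))
print(f"Share of points inside the band: {coverage:.3f}")
```

Unlike quantile regression, this band has the same width everywhere, so it is a poor fit if the scatter widens or narrows as the prediction grows.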
