Plotly: How to plot a regression line using Plotly and Plotly Express?

2 Answers

User

#1 · edited 2024-06-01 00:50:59

Plotly also ships with a native wrapper around statsmodels for plotting linear and non-linear trendlines:

Quoting from their docs: https://plotly.com/python/linear-fits/


import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()

User

#2 · edited 2024-06-01 00:50:59

Update 1:

Plotly Express now handles both long- and wide-format data with ease (yours is the latter), so all you need to plot a regression line is:

fig = px.scatter(df, x='X', y='Y', trendline="ols")

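A complete code snippet for wide data appears further down in this answer.

If you want the regression line to stand out, you can set its color directly: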

fig.data[1].line.color = 'red'

You can access regression parameters such as alpha (the intercept) and beta (the slope) through:

model = px.get_trendline_results(fig)
alpha = model.iloc[0]["px_fit_results"].params[0]
beta = model.iloc[0]["px_fit_results"].params[1]

You can even request a non-linear fit through:

fig = px.scatter(df, x='X', y='Y', trendline="lowess")

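So what about long-format data? This is where Plotly Express shows some of its real power. Taking the built-in dataset px.data.gapminder as an example, you can get a separate trendline for each group of countries by specifying color="continent":

Complete snippet for long format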

import plotly.express as px

df = px.data.gapminder().query("year == 2007")
fig = px.scatter(df, x="gdpPercap", y="lifeExp", color="continent", trendline="lowess")
fig.show()

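If you want more flexibility in model choice and output, you can refer to my original answer further down. But first, here is a complete snippet for the examples at the start of this answer:

Complete snippet for wide data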

import plotly.express as px
import pandas as pd
import numpy as np
# note: trendline="ols" / trendline="lowess" require statsmodels to be installed

# data
np.random.seed(123)
numdays=20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})

# figure with regression
# fig = px.scatter(df, x='X', y='Y', trendline="ols")
fig = px.scatter(df, x='X', y='Y', trendline="lowess")

# make the regression line stand out
fig.data[1].line.color = 'red'

# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

fig.show()

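Original answer:

For regression analysis I like to use statsmodels.api or sklearn.linear_model. I also like to keep both the data and the regression results in one pandas DataFrame. Here is one way to do what you are looking for in a clean and organized manner:

[Figure: plot produced with sklearn or statsmodels]

Code using sklearn: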

from sklearn.linear_model import LinearRegression
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# data
np.random.seed(123)
numdays=20

X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
df = pd.DataFrame({'X': X, 'Y':Y})

# regression: sklearn expects a 2-D feature matrix, hence df[['X']]
reg = LinearRegression().fit(df[['X']], df['Y'])
df['bestfit'] = reg.predict(df[['X']])

# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))

# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

fig.show()
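
If you also need the numbers behind the sklearn line, the fitted estimator exposes them directly. A minimal sketch using the same synthetic data (the variable names `slope`, `intercept` and `r2` are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same synthetic data as above.
np.random.seed(123)
numdays = 20
X = (np.random.randint(low=-20, high=20, size=numdays).cumsum() + 100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum() + 100).tolist()
df = pd.DataFrame({"X": X, "Y": Y})

reg = LinearRegression().fit(df[["X"]], df["Y"])

slope = reg.coef_[0]                   # one coefficient per feature
intercept = reg.intercept_
r2 = reg.score(df[["X"]], df["Y"])     # coefficient of determination

print(f"Y = {intercept:.3f} + {slope:.3f} * X  (R^2 = {r2:.3f})")
```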

Code using statsmodels:

import plotly.graph_objects as go
import statsmodels.api as sm
import pandas as pd
import numpy as np

# data
np.random.seed(123)
numdays=20

X = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()
Y = (np.random.randint(low=-20, high=20, size=numdays).cumsum()+100).tolist()

df = pd.DataFrame({'X': X, 'Y':Y})

# regression
df['bestfit'] = sm.OLS(df['Y'],sm.add_constant(df['X'])).fit().fittedvalues

# plotly figure setup
fig=go.Figure()
fig.add_trace(go.Scatter(name='X vs Y', x=df['X'], y=df['Y'].values, mode='markers'))
fig.add_trace(go.Scatter(name='line of best fit', x=X, y=df['bestfit'], mode='lines'))


# plotly figure layout
fig.update_layout(xaxis_title = 'X', yaxis_title = 'Y')

fig.show()
