How can I highlight weekends without looping through the dataset?

Posted 2024-05-19 19:28:51


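I am trying to plot three different time series DataFrames (about 60,000 records each) with plotly, while highlighting weekends (and working hours) with a different background color.

Is there a way to do this without iterating over the whole dataset, as described in this solution? That approach may work, but its performance on large datasets is likely to be poor.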


2 answers

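I would consider using make_subplots and attaching a go.Scatter trace to the secondary y-axis, so that a filled trace rather than shapes acts as the background color marking the weekends.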


With nperiods = 2000 in the snippet below, %%timeit on my system returns:

162 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The approach I originally suggested, using fig.add_shape(), is considerably slower:

49.2 s ± 2.18 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
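If rectangles are still preferred over a filled trace, one option is to build one shape per weekend (not per row) and hand the whole list to the layout in a single call. The following is only a sketch under that assumption: weekend_rectangles is a hypothetical helper name, the fill color is copied from the code in this answer, and no timing has been measured for it.

```python
import pandas as pd


def weekend_rectangles(dates, fillcolor="rgba(99, 110, 250, 0.2)"):
    """Return one plotly layout-shape dict per weekend covered by `dates`.

    Hypothetical helper: only the first and last timestamp of `dates`
    are used, so the cost depends on the number of weekends, not rows.
    """
    dates = pd.DatetimeIndex(dates)
    first, last = dates.min(), dates.max()

    # Calendar days spanning the data; start one day early so that a
    # series beginning on a Sunday still gets its weekend shaded.
    days = pd.date_range(first.normalize() - pd.Timedelta(days=1),
                         last.normalize(), freq="D")
    saturdays = days[days.weekday == 5]

    shapes = []
    for start in saturdays:
        # Saturday 00:00 to Monday 00:00, clipped to the data range.
        x0 = max(start, first)
        x1 = min(start + pd.Timedelta(days=2), last)
        shapes.append(dict(
            type="rect", xref="x", yref="paper",
            x0=x0.isoformat(), x1=x1.isoformat(), y0=0, y1=1,
            fillcolor=fillcolor, line=dict(width=0), layer="below",
        ))
    return shapes


# Intended use (requires plotly and an existing figure `fig`):
# fig.update_layout(shapes=weekend_rectangles(df["date"]))
```

This still loops, but once per weekend in the date range rather than once per record.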

Complete code for the secondary y-axis approach:

# %%timeit
# imports

import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
import datetime
from plotly.subplots import make_subplots

pd.set_option('display.max_rows', None)

# data sample
cols = ['signal']
nperiods = 2000
np.random.seed(12)
df = pd.DataFrame(np.random.randint(-2, 2, size=(nperiods, len(cols))),
                  columns=cols)
datelist = pd.date_range(datetime.datetime(2020, 1, 1).strftime('%Y-%m-%d'),periods=nperiods).tolist()
df['date'] = datelist 
df = df.set_index(['date'])
df.index = pd.to_datetime(df.index)
df.iloc[0] = 0
df = df.cumsum().reset_index()
df['signal'] = df['signal'] + 100

# %%timeit
df['weekend'] = np.where((df.date.dt.weekday == 5) | (df.date.dt.weekday == 6), 1, 0 )

fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(go.Scatter(x=df['date'], y=df.weekend, fill = 'tonexty', showlegend = False), secondary_y=True)

fig.update_traces(line_shape = 'hv',
                  line_color = 'rgba(0,0,0,0)',
                  fillcolor = 'rgba(99, 110, 250, 0.2)',
                  row = 1, col = 1)
fig.update_xaxes(showgrid=False)#, gridwidth=1, gridcolor='rgba(0,0,255,0.1)')
fig.update_layout(yaxis2_range=[-0,0.1], yaxis2_showgrid=False,  yaxis2_tickfont_color = 'rgba(0,0,0,0)')
fig.add_trace(go.Scatter(x=df['date'], y = df.signal, line_color = 'blue'), secondary_y = False)

fig.update_layout()

fig.show()
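The question also mentions working hours. The same 0/1 flag idea can be extended to intraday data with vectorized datetime accessors, again without looping over rows. This is a minimal sketch, not part of the original answer: add_time_flags is a hypothetical helper, and the 09:00 to 17:00, Monday to Friday window is an assumption that should be adapted to the data.

```python
import pandas as pd


def add_time_flags(df, date_col="date", work_start=9, work_end=17):
    """Return a copy of `df` with 0/1 'weekend' and 'working' columns.

    Assumption: working hours are work_start <= hour < work_end on
    Monday to Friday. Both flags are computed column-wise.
    """
    out = df.copy()
    ts = out[date_col]

    is_weekend = ts.dt.weekday >= 5  # Saturday = 5, Sunday = 6
    is_work_hour = (ts.dt.hour >= work_start) & (ts.dt.hour < work_end)

    out["weekend"] = is_weekend.astype(int)
    out["working"] = (~is_weekend & is_work_hour).astype(int)
    return out


# Example: three days of hourly data starting on a Friday.
hourly = pd.DataFrame(
    {"date": pd.date_range("2021-01-01", periods=72, freq="h")}
)
flagged = add_time_flags(hourly)
```

Each flag column could then be drawn as its own filled go.Scatter trace on the secondary y-axis, with a different fillcolor per flag.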

You can use a filled area chart to highlight all weekends at once, with no loop and without creating multiple shapes. See the code below for an example.

import pandas as pd
import numpy as np
import plotly.graph_objects as go

# generate a time series
df = pd.DataFrame({
    'date': pd.date_range(start='2021-01-01', periods=18, freq='D'),
    'value': 100 * np.cumsum(np.random.normal(loc=0.01, scale=0.005, size=18))
})

# define the y-axis limits
ymin, ymax = df['value'].min() - 5, df['value'].max() + 5

# create an auxiliary time series for highlighting the weekends, equal
# to "ymax" on Saturday and Sunday, and to "ymin" on the other days
df['weekend'] = np.where(df['date'].dt.day_name().isin(['Saturday', 'Sunday']), ymax, ymin)

# define the figure layout
layout = dict(
    plot_bgcolor='white',
    paper_bgcolor='white',
    margin=dict(t=5, b=5, l=5, r=5, pad=0),
    yaxis=dict(
        range=[ymin, ymax],  # fix the y-axis limits
        tickfont=dict(size=6),
        linecolor='#000000',
        color='#000000',
        showgrid=False,
        mirror=True
    ),
    xaxis=dict(
        type='date',
        tickformat='%d-%b-%Y (%a)',
        tickfont=dict(size=6),
        nticks=20,
        linecolor='#000000',
        color='#000000',
        ticks='outside',
        mirror=True
    ),
)

# add the figure traces
data = []

# plot the weekends as a filled area chart
data.append(
    go.Scatter(
        x=df['date'],
        y=df['weekend'],
        fill='tonext',
        fillcolor='#d9d9d9',
        mode='lines',
        line=dict(width=0, shape='hvh'),
        showlegend=False,
        hoverinfo='skip',  # hide hover on the background trace (None leaves the default)
    )
)

# plot the time series as a line chart
data.append(
    go.Scatter(
        x=df['date'],
        y=df['value'],
        mode='lines+markers',
        marker=dict(size=4, color='#cc503e'),
        line=dict(width=1, color='#cc503e'),
        showlegend=False,
    )
)

# create the figure
fig = go.Figure(data=data, layout=layout)

# save the figure
fig.write_image('figure.png', scale=2, width=500, height=300)
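For the three large intraday DataFrames in the question, the background trace does not have to share the x values of the data: one point per calendar day is enough when the line uses a step shape. The sketch below rests on that assumption. daily_weekend_band is a hypothetical helper, and the column name 'date' follows the examples above.

```python
import numpy as np
import pandas as pd


def daily_weekend_band(frames, date_col="date", low=0.0, high=1.0):
    """Build one small auxiliary series covering several DataFrames.

    Returns one row per calendar day (at midnight), equal to `high`
    on Saturday and Sunday and to `low` otherwise.
    """
    first = min(f[date_col].min() for f in frames).normalize()
    # Extend to the midnight after the last record so the final day
    # is fully covered.
    last = max(f[date_col].max() for f in frames).normalize() + pd.Timedelta(days=1)

    days = pd.date_range(first, last, freq="D")
    return pd.DataFrame({
        "date": days,
        "weekend": np.where(days.weekday >= 5, high, low),
    })


# Example: three hourly series with overlapping date ranges.
a = pd.DataFrame({"date": pd.date_range("2021-01-01", periods=48, freq="h")})
b = pd.DataFrame({"date": pd.date_range("2021-01-02", periods=48, freq="h")})
c = pd.DataFrame({"date": pd.date_range("2021-01-03", periods=48, freq="h")})
band = daily_weekend_band([a, b, c])
```

The band can be plotted like the weekend trace above. With line shape 'hv' each value holds until the next midnight, so the shaded area runs from Saturday 00:00 to Monday 00:00 regardless of how densely the data is sampled.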
