Plotting outliers with matplotlib and seaborn

Posted 2024-06-16 10:18:08


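I ran anomaly detection on entrance sensor data from a shopping mall. I want to create one plot per entrance and highlight the outliers (rows marked True in the outlier column of the DataFrame).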

Here is a small sample of the data for two entrances, spanning six days:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame({"date": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
                   "mall": ["Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1", "Mall1"],
                   "entrance": ["West", "West","West","West","West", "West", "East", "East", "East", "East", "East", "East"],
                   "in": [132, 140, 163, 142, 133, 150, 240, 250, 233, 234, 2000, 222],
                   "outlier": [False, False, False, False, False, False, False, False, False, False, True, False]})

To create several plots (the full data has 20 entrances), I came across lmplot in seaborn:

sns.set_theme(style="darkgrid")
for i, group in df.groupby('entrance'):
    sns.lmplot(x="date", y="in", data=group, fit_reg=False, hue = "entrance")
    #pseudo code
    #for the rows that have an outlier (outlier = True) create a red dot for that observation
plt.show()

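I want to accomplish two things here:

  1. A line plot instead of a scatter plot. I had no success using sns.lineplot to create a separate plot for each entrance, since lmplot seemed better suited to that.
  2. In each entrance plot, I want to show which observations are outliers, preferably as red dots. I sketched this as pseudocode in my plotting attempt above.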

1 answer
User
#1 · Posted 2024-06-16 10:18:08
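  • seaborn.lmplot is a figure-level function that returns a FacetGrid, which I think makes it harder to use in this case. Calling the axes-level sns.lineplot and sns.scatterplot inside the loop is simpler: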
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# group by the column name (not a one-element list) so that i is a plain string
for i, group in df.groupby('entrance'):

    # plot all the values as a lineplot
    sns.lineplot(x="date", y="in", data=group)
    
    # select the data when outlier is True and plot it
    data_t = group[group.outlier]
    # color= applies one color to every point; zorder draws the dots above the line
    sns.scatterplot(x="date", y="in", data=data_t, color='r', zorder=3)

    # add a title using the value from the groupby
    plt.title(f'Entrance: {i}')
    
    # show the plot here, not outside the loop
    plt.show()


Alternative

  • This option lets you set the number of columns and rows in the figure.
import math

# specify the number of columns to plot
ncols = 2

# determine the number of rows, even if there's an odd number of unique entrances
nrows = math.ceil(len(df.entrance.unique()) / ncols)

fig, axes = plt.subplots(ncols=ncols, nrows=nrows, figsize=(16, 16))

# extract the axes into an nx1 array, which is easier to index with idx.
axes = axes.ravel()

# group by the column name (not a one-element list) so that i is a plain string
for idx, (i, group) in enumerate(df.groupby('entrance')):

    # plot all the values as a lineplot
    sns.lineplot(x="date", y="in", data=group, ax=axes[idx])
    
    # select the data when outlier is True and plot it
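    data_t = group[group.outlier]
    # color= applies one color to every point; zorder draws the dots above the line
    sns.scatterplot(x="date", y="in", data=data_t, color='r', zorder=3, ax=axes[idx])
    axes[idx].set_title(f'Entrance: {i}')

# show the whole grid once, after all subplots are drawn
plt.tight_layout()
plt.show()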
