Plotting histograms of a DataFrame GroupBy Series

data = pd.read_csv('insurance.csv')

   age     sex     bmi  children smoker     region      charges
0   19  female  27.900         0    yes  southwest  16884.92400
1   18    male  33.770         1     no  southeast   1725.55230
2   28    male  33.000         3     no  southeast   4449.46200
3   33    male  22.705         0     no  northwest  21984.47061
4   32    male  28.880         0     no  northwest   3866.85520

data.groupby('sex').region.hist()

1 answer

Anonymous user · Answer 1 · posted 2024-05-16 13:05:25

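To generate a histogram of each column, grouped by sex, do the following:
- 'children' and 'smoker' look different because their values are discrete, with only 6 and 2 unique values respectively
- data.groupby('sex').hist(layout=(1, 4), figsize=(12, 4), ec='k', grid=False) will produce a separate figure for each group, but there is no easy way to add a title to each one
Producing the correct visualization often involves reshaping the data for the plotting API
Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.2, seaborn 0.11.2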
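To see why the discrete columns look different, here is a small sketch on a hypothetical toy frame (made-up values, not the real insurance.csv). DataFrame.hist uses 10 equal-width bins by default, the same default as np.histogram, so a column with only two distinct values populates just the first and last bins and leaves the rest empty.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for insurance.csv (values are made up)
toy = pd.DataFrame({
    'bmi': [27.9, 33.77, 33.0, 22.705, 28.88, 25.74],
    'children': [0, 1, 3, 0, 0, 5],
    'smoker': [1, 0, 0, 0, 0, 1],
})

# Number of distinct values per column: continuous 'bmi' vs discrete 'children'/'smoker'
print(toy.nunique())

# 10 equal-width bins, matching the default used by DataFrame.hist
counts, edges = np.histogram(toy['smoker'], bins=10)

# Only the first bin (0) and the last bin (1) are populated; the 8 in between are empty
print(counts)
print(np.count_nonzero(counts))
```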

import pandas as pd

# load data
data = pd.read_csv('insurance.csv')

# convert smoker from a string to an int; DataFrame.hist skips object (string) dtype columns
data.smoker = data.smoker.map({'no': 0, 'yes': 1})

# group each column by sex; data.groupby(['sex', 'region']) is also an option
for gender, df in data.groupby('sex'):

    # plot a hist for each column
    axes = df.hist(layout=(1, 5), figsize=(15, 4), ec='k', grid=False)

    # extract the figure object from the array of axes
    fig = axes[0][0].get_figure()

    # add the gender as the title
    fig.suptitle(gender)
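Because insurance.csv is not included here, the following sketch runs the same loop on a toy DataFrame built from the five rows shown in the question plus one made-up female row (an assumption so both groups have data). It uses the headless Agg backend and keeps each figure in a dict so you can check that every group gets its own figure with the sex as the title.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import pandas as pd

# Toy stand-in for insurance.csv: the five rows from the question plus one made-up row
data = pd.DataFrame({
    'age': [19, 18, 28, 33, 32, 45],
    'sex': ['female', 'male', 'male', 'male', 'male', 'female'],
    'bmi': [27.9, 33.77, 33.0, 22.705, 28.88, 25.74],
    'children': [0, 1, 3, 0, 0, 2],
    'smoker': ['yes', 'no', 'no', 'no', 'no', 'yes'],
    'region': ['southwest', 'southeast', 'southeast', 'northwest', 'northwest', 'northeast'],
    'charges': [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552, 8240.5896],
})

# Same conversion as in the answer so 'smoker' becomes numeric and gets plotted
data['smoker'] = data['smoker'].map({'no': 0, 'yes': 1})

figures = {}  # one figure per group, keyed by sex
for gender, df in data.groupby('sex'):
    # DataFrame.hist drops the string columns 'sex' and 'region', leaving 5 panels
    axes = df.hist(layout=(1, 5), figsize=(15, 4), ec='k', grid=False)
    fig = axes[0][0].get_figure()
    fig.suptitle(gender)
    figures[gender] = fig

for gender, fig in figures.items():
    # the suptitle is stored as a figure-level Text artist in fig.texts
    print(gender, len(fig.axes), [t.get_text() for t in fig.texts])
```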

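Regarding data.groupby('sex').region.hist() in the OP: this is a count plot showing the number of each sex per region; it is not a histogram
pd.crosstab computes a frequency table of the factors by default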

ax = pd.crosstab(data.region, data.sex).plot(kind='bar', rot=0)
ax.legend(title='gender', bbox_to_anchor=(1, 1.02), loc='upper left')
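To make the frequency-table claim concrete, here is a sketch on hypothetical toy sex/region pairs (made-up values). With no values or aggfunc, pd.crosstab counts each region/sex combination, which gives the same numbers as counting with groupby(...).size() and pivoting sex into the columns with unstack.

```python
import pandas as pd

# Hypothetical toy data: only the two factors being counted (values are made up)
data = pd.DataFrame({
    'sex': ['female', 'male', 'male', 'male', 'male', 'female'],
    'region': ['southwest', 'southeast', 'southeast', 'northwest', 'northwest', 'southwest'],
})

# Frequency table: rows are regions, columns are sexes, cells are counts
table = pd.crosstab(data.region, data.sex)
print(table)

# Equivalent counts via groupby; unstack moves 'sex' to the columns, missing pairs become 0
counts = data.groupby(['region', 'sex']).size().unstack(fill_value=0)
print(counts)
```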

Using seaborn.displot

This requires converting the data from wide to long form with pandas.DataFrame.melt

import pandas as pd
import seaborn as sns

data = pd.read_csv('insurance.csv')
data.smoker = data.smoker.map({'no': 0, 'yes': 1})

# convert the dataframe from a wide to long form
df = data.melt(id_vars=['sex', 'region'])

# plot
p = sns.displot(data=df, kind='hist', x='value', col='variable', row='region', hue='sex',
                multiple='dodge', common_bins=False, facet_kws={'sharey': False, 'sharex': False})
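The melt step is what lets displot facet on col='variable' and plot x='value'. This sketch uses a hypothetical two-row frame with a made-up subset of the insurance.csv columns to show the reshaping: every column not listed in id_vars is stacked into 'variable'/'value' pairs, while the id columns are repeated for each pair.

```python
import pandas as pd

# Hypothetical two-row frame in the same wide layout as insurance.csv (subset of columns)
wide = pd.DataFrame({
    'sex': ['female', 'male'],
    'region': ['southwest', 'southeast'],
    'age': [19, 18],
    'bmi': [27.9, 33.77],
})

# Non-id columns ('age', 'bmi') are stacked into 'variable'/'value' rows
long = wide.melt(id_vars=['sex', 'region'])
print(long)
```

Each original row becomes one row per measured column, so 2 rows times 2 measured columns gives 4 rows in long form.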

