Subplots of bar charts for grouped value counts

Posted 2024-05-15 09:45:15


My table looks something like this:

YEAR    RESPONSIBLE DISTRICT
2014    01 - PARIS
2014    01 - PARIS
2014    01 - PARIS
2014    01 - PARIS
2014    01 - PARIS
... ... ...
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO
2017    15 - SAN ANTONIO

After I wrote

g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()

I got the following:

YEAR         RESPONSIBLE DISTRICT
2014         05 - LUBBOCK            12312
             15 - SAN ANTONIO        10457
             18 - DALLAS              9885
             04 - AMARILLO            9617
             08 - ABILENE             8730
                                     ...  
2020         21 - PHARR               5645
             25 - CHILDRESS           5625
             20 - BEAUMONT            5560
             22 - LAREDO              5034
             24 - EL PASO             4620

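I have 25 districts in total. Now I want to create 25 subplots, one per district. For each subplot I want the years 2014-2020 on the x-axis and the value counts on the y-axis. How can I do this?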


3 answers
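  • The correct way to do this using only pandas is to shape the dataframe with pandas.DataFrame.pivot, and then plot it with pandas.DataFrame.plot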

Imports & Data

import pandas as pd
import numpy as np  # for test data
import seaborn as sns  # only for seaborn option

# test data
np.random.seed(365)
rows = 100000
data = {'YEAR': np.random.choice(range(2014, 2021), size=rows),
        'RESPONSIBLE DISTRICT': np.random.choice(['05 - LUBBOCK', '15 - SAN ANTONIO', '18 - DALLAS', '04 - AMARILLO', '08 - ABILENE', '21 - PHARR', '25 - CHILDRESS', '20 - BEAUMONT', '22 - LAREDO', '24 - EL PASO'], size=rows)}
df = pd.DataFrame(data)

# get the value count of each district by year and pivot the shape
dfp = df.value_counts(subset=['YEAR', 'RESPONSIBLE DISTRICT']).reset_index(name='VC').pivot(index='YEAR', columns='RESPONSIBLE DISTRICT', values='VC')

# display(dfp)
RESPONSIBLE DISTRICT  04 - AMARILLO  05 - LUBBOCK  08 - ABILENE  15 - SAN ANTONIO  18 - DALLAS  20 - BEAUMONT  21 - PHARR  22 - LAREDO  24 - EL PASO  25 - CHILDRESS
YEAR                                                                                                                                                                
2014                           1407          1406          1485              1456         1392           1456        1499         1458          1394            1452
2015                           1436          1423          1428              1441         1395           1400        1423         1442          1375            1399
2016                           1480          1381          1393              1415         1446           1442        1414         1435          1452            1454
2017                           1422          1388          1485              1447         1404           1401        1413         1470          1424            1426
2018                           1479          1424          1384              1450         1390           1384        1445         1435          1478            1386
2019                           1387          1317          1379              1457         1457           1476        1447         1459          1451            1406
2020                           1462          1452          1454              1448         1441           1428        1411         1407          1402            1445
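The same wide layout can probably also be reached from the grouped Series `g` in the question by unstacking the district level. The following is only a minimal sketch on a tiny made-up sample; the names `df_demo`, `g_demo` and `wide_demo` are hypothetical and not part of the original answer.

```python
import pandas as pd

# Tiny made-up sample, for illustration only
df_demo = pd.DataFrame({
    'YEAR': [2014, 2014, 2014, 2015, 2015, 2015],
    'RESPONSIBLE DISTRICT': ['01 - PARIS', '01 - PARIS', '15 - SAN ANTONIO',
                             '01 - PARIS', '15 - SAN ANTONIO', '15 - SAN ANTONIO'],
})

# Same grouping as in the question: a Series indexed by (YEAR, RESPONSIBLE DISTRICT)
g_demo = df_demo.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()

# Move the district level into the columns; year/district pairs with no rows become 0
wide_demo = g_demo.unstack(level='RESPONSIBLE DISTRICT', fill_value=0)
print(wide_demo)
```

The result has one row per year and one column per district, which is the shape that the subplots=True plotting call expects.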

pandas.DataFrame.plot

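  • Use kind='line' if a line plot is preferred
# plot the dataframe; with subplots=True this returns an array of Axes (one per column), not a Figure
axes = dfp.plot(kind='bar', subplots=True, layout=(5, 5), figsize=(20, 20), legend=False)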
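As a rough illustration of the kind='line' variant and of what the plotting call returns, here is a small sketch on made-up data. The frame `demo`, its column names and the 2 x 2 layout are assumptions chosen for the example; the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption so this runs headless
import numpy as np
import pandas as pd

# Made-up wide table: one row per year, one column per (hypothetical) district
demo = pd.DataFrame(
    np.arange(28).reshape(7, 4),
    index=pd.Index(range(2014, 2021), name='YEAR'),
    columns=['A', 'B', 'C', 'D'],
)

# With subplots=True and a layout, pandas returns a 2D array of Axes
axes = demo.plot(kind='line', subplots=True, layout=(2, 2), sharey=True,
                 figsize=(8, 6), legend=False, title=list(demo.columns))

# The Figure is reachable from any of the Axes
fig = axes.flat[0].figure
fig.tight_layout()
```

Passing a list to title gives each subplot its own heading, which is useful for line plots once the legend is switched off.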


seaborn.catplot

  • seaborn is a high-level API for matplotlib
  • This is the easiest way, because the dataframe does not need to be reshaped
p = sns.catplot(kind='count', data=df, col='RESPONSIBLE DISTRICT', col_wrap=5, x='YEAR', height=3.5)
p.set_titles(row_template='{row_name}', col_template='{col_name}')  # shortens the titles


Is this what you expect?

import matplotlib.pyplot as plt

# g is the grouped value-counts Series from the question
fig, axs = plt.subplots(5, 5, sharex=True, sharey=True, figsize=(15, 15))
# one subplot per district; this draws a line per district (use ax.bar for bars)
for ax, (district, sr) in zip(axs.flat, g.groupby('RESPONSIBLE DISTRICT')):
    ax.set_title(district)
    ax.plot(sr.index.get_level_values('YEAR'), sr.values)
fig.tight_layout()

plt.show()


This should work:

import matplotlib.pyplot as plt
import pandas as pd

# df is the dataframe from the question
g = df.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()

# 5 x 5 grid: one Axes per district
fig, axs = plt.subplots(5, 5, constrained_layout=True)

for ax, (district, dfi) in zip(axs.ravel(), g.groupby('RESPONSIBLE DISTRICT')):
    x = dfi.index.get_level_values('YEAR').values  # years on the x-axis
    y = dfi.values                                 # value counts on the y-axis
    ax.bar(x, y)
    ax.set_title(district)

plt.show()
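Two details may be worth handling in a loop like the ones above: a district with no rows in some year simply has no entry for that year, and a grid with more cells than districts leaves empty axes. The sketch below is one possible way to deal with both, on a small made-up sample; `demo`, `counts`, `filled` and the 2 x 2 grid are hypothetical names and sizes, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up sample: 3 districts, 3 years, with some year/district pairs missing
demo = pd.DataFrame({
    'YEAR': [2014, 2014, 2015, 2015, 2016, 2016, 2016],
    'RESPONSIBLE DISTRICT': ['01 - PARIS', '15 - SAN ANTONIO', '01 - PARIS', '01 - PARIS',
                             '15 - SAN ANTONIO', '18 - DALLAS', '18 - DALLAS'],
})

counts = demo.groupby('YEAR')['RESPONSIBLE DISTRICT'].value_counts()
all_years = sorted(demo['YEAR'].unique())
groups = list(counts.groupby(level='RESPONSIBLE DISTRICT'))

fig, axs = plt.subplots(2, 2, sharex=True, sharey=True, constrained_layout=True)
filled = {}  # per-district counts after filling missing years, kept for inspection
for ax, (district, sr) in zip(axs.ravel(), groups):
    # keep only the YEAR level as index and insert 0 for years without rows
    by_year = sr.droplevel('RESPONSIBLE DISTRICT').reindex(all_years, fill_value=0)
    filled[district] = by_year.tolist()
    ax.bar(by_year.index, by_year.values)
    ax.set_xticks(all_years)
    ax.set_title(district)

# hide the grid cells that have no district assigned
for ax in axs.ravel()[len(groups):]:
    ax.set_visible(False)
```

With 25 districts and a 5 x 5 grid the second loop does nothing, so it can be left in place safely.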
