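Pandas and Matplotlib: percentage vaccinated by country, and a bar chart of the preferred vaccine in a specific country selected from a drop-down list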

Published 2024-05-23 23:09:40


This is the dataset:

      location        date             vaccine  total_vaccinations
0      Austria  2021-01-08     Johnson&Johnson                   0
1      Austria  2021-01-08             Moderna                   0
2      Austria  2021-01-08  Oxford/AstraZeneca                   0
3      Austria  2021-01-08     Pfizer/BioNTech               30938
4      Austria  2021-01-15     Johnson&Johnson                   0
...        ...         ...                 ...                 ...
8633   Uruguay  2021-07-05     Pfizer/BioNTech             1024793
8634   Uruguay  2021-07-05             Sinovac             3045997
8635   Uruguay  2021-07-06  Oxford/AstraZeneca               43245
8636   Uruguay  2021-07-06     Pfizer/BioNTech             1038942
8637   Uruguay  2021-07-06             Sinovac             3079853

8638 rows × 4 columns

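I am working in a Jupyter notebook and need:

  1. The percentage of people vaccinated, by country
  2. A bar chart of the preferred vaccine in a specific country, chosen with a drop-down menu (an interactive plotting widget)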

2 answers
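  • You can get COVID data, including the overall figures, from OWID
  • This appears to be where you sourced your by-manufacturer data
  • The manufacturer data can be joined to the overall COVID data so that all of the attributes you mention are available
  • Plotly is used so that hiding and showing traces is interactive
  • Note: not many countries publish data by manufacturer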
import requests, io
import pandas as pd

# get data by manufacturer
dfm = pd.read_csv(io.StringIO(
    requests.get("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations-by-manufacturer.csv").text))

# get all COVID data
dfall = pd.read_csv(io.StringIO(
    requests.get("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv").text))

# join the two datasets and pivot manufacturer data into one column per vaccine. NB not all countries publish this data...
dfv = (
    dfall.set_index(["location", "date"])
    .join(
        dfm.set_index(["location", "date", "vaccine"])
        .unstack("vaccine")
        .droplevel(0, 1),
        how="inner",
    )
    .reset_index()
)

# filter to latest data only
dfplot = (
    dfv.sort_values(["iso_code", "date"])
    .groupby("iso_code", as_index=False)
    .last()
    .sort_values("people_fully_vaccinated_per_hundred", ascending=False)
)

import plotly.express as px
import plotly.graph_objects as go

# use plotly so it's interactive. rebase doses given per vaccine as a fraction of population
fig = px.bar(
    dfplot.assign(
        **{c: dfplot[c] / dfplot["population"] for c in dfm["vaccine"].unique()}
    ),
    x="location",
    y=dfm["vaccine"].unique(),
)
# add a line of people fully vaccinated
fig.add_trace(
    go.Scatter(
        x=dfplot["location"],
        y=dfplot["people_fully_vaccinated_per_hundred"] / 100,
        name="Fully vaccinated",
        mode="lines",
        line={"color": "purple", "width": 4},
    )
)
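
The reshaping step in the join above (set_index, unstack, droplevel) may be easier to follow on a small frame. The following is a minimal sketch on made-up numbers; `dfm_toy` and `wide` are illustrative names that do not appear in the answer.

```python
import pandas as pd

# Toy stand-in for the by-manufacturer data (values are made up for illustration)
dfm_toy = pd.DataFrame(
    {
        "location": ["Austria", "Austria", "Uruguay", "Uruguay"],
        "date": ["2021-07-06"] * 4,
        "vaccine": ["Moderna", "Pfizer/BioNTech", "Pfizer/BioNTech", "Sinovac"],
        "total_vaccinations": [10, 30, 20, 60],
    }
)

# Long -> wide: one row per (location, date), one column per vaccine
wide = (
    dfm_toy.set_index(["location", "date", "vaccine"])
    .unstack("vaccine")      # columns become (total_vaccinations, <vaccine>)
    .droplevel(0, axis=1)    # drop the outer "total_vaccinations" level
)
print(wide)
```

Vaccines a country does not report show up as NaN in the wide frame, which is why the join in the answer uses how="inner" and keeps only locations that publish manufacturer data.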


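Update

  • The original requirement asked for the percentage of people vaccinated. That has been dropped, as per the comments
  • The requirement was effectively restated as an interactive dashboard, so Dash is used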
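from jupyter_dash import JupyterDash
# NB: since Dash 2.0 the next three are also available as
# `from dash import dcc, html, dash_table`; the standalone packages are deprecated
import dash_core_components as dcc
import dash_html_components as html
import dash_table
import dash_bootstrap_components as dbc
from dash.dependencies import Input, Output, State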
import requests, io
import pandas as pd
import plotly.express as px

# get data by manufacturer
dfm = pd.read_csv(io.StringIO(
    requests.get("https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations-by-manufacturer.csv").text))


def buildTab(col="location"):
    # one selectable table row per unique value of the column
    dfc = pd.DataFrame({col: dfm[col].unique()})
    return dash_table.DataTable(
        id=col,
        columns=[{"name": c, "id": c} for c in dfc.columns],
        data=dfc.to_dict("records"),
        row_selectable="multi",
        style_header={"fontWeight": "bold"},
        style_as_list_view=True,
        css=[{"selector": ".dash-spreadsheet tr", "rule": "height: 5px;"}],
    )

# Build App
app = JupyterDash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
app.layout = html.Div(
    [
        dbc.Row(
            [
                dbc.Col(
                    buildTab(col="location"),
                    width=3,
                    style={"height": "20vh", "overflow-y": "auto"},
                ),
                dbc.Col(
                    buildTab(col="vaccine"),
                    width=3,
                    style={"height": "20vh", "overflow-y": "auto"},
                ),
            ],
        ),
        html.Div(id="graphs"),
    ],
    style={
        "font-family": "Arial",
        "font-size": "0.9em",
    },
)

@app.callback(
    Output(component_id="graphs", component_property="children"),
    Input("location", "selected_rows"),
    Input("vaccine", "selected_rows"),
    State("location", "data"),
    State("vaccine", "data"),
)
def updateGraphs(selected_location, selected_vaccine, location, vaccine):
    # dfm is only read here, so no `global` declaration is needed
    if selected_location and selected_vaccine:
        d = dfm.merge(
            pd.DataFrame(location).iloc[selected_location], on="location", how="inner"
        ).merge(pd.DataFrame(vaccine).iloc[selected_vaccine], on="vaccine", how="inner")
        return dcc.Graph(
            figure=px.bar(
                d.sort_values(["location", "vaccine", "date"])
                .groupby(["location", "vaccine"], as_index=False)
                .last(),
                x="location",
                y="total_vaccinations",
                color="vaccine",
            )
        )
    else:
        return None

# Run app and display result inline in the notebook
app.run_server(mode="inline")
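
The selection logic inside the callback can be tried out without running Dash. Below is a minimal sketch, assuming the same column names as the dataset in the question; `latest_totals` and `toy` are hypothetical names introduced here, and the rows are copied from the question's sample.

```python
import pandas as pd

def latest_totals(df, locations, vaccines):
    """Return the most recent row per (location, vaccine) for the selection."""
    picked = df[df["location"].isin(locations) & df["vaccine"].isin(vaccines)]
    return (
        picked.sort_values(["location", "vaccine", "date"])
        .groupby(["location", "vaccine"], as_index=False)
        .last()
    )

# A few rows taken from the sample shown in the question
toy = pd.DataFrame(
    {
        "location": ["Austria", "Uruguay", "Uruguay", "Uruguay", "Uruguay", "Uruguay"],
        "date": ["2021-01-08", "2021-07-05", "2021-07-05",
                 "2021-07-06", "2021-07-06", "2021-07-06"],
        "vaccine": ["Pfizer/BioNTech", "Pfizer/BioNTech", "Sinovac",
                    "Oxford/AstraZeneca", "Pfizer/BioNTech", "Sinovac"],
        "total_vaccinations": [30938, 1024793, 3045997, 43245, 1038942, 3079853],
    }
)

result = latest_totals(toy, ["Uruguay"], ["Pfizer/BioNTech", "Sinovac"])
print(result)
```

The returned frame has one row per selected location and vaccine, which is the shape px.bar expects with x="location", y="total_vaccinations" and color="vaccine".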

I can help with the percentage by country, but not with the bar chart part. You can use groupby, merge and some arithmetic to get the numbers you need:

import pandas as pd

df = pd.DataFrame({'location': ['Austria', 'Austria', 'Austria', 'Austria'],
                   'vaccine': ['Moderna', 'Johnson&Johnson', 'Moderna', 'Johnson&Johnson'],
                   'total_vaccinations': [1, 2, 3, 4]})

# df_tcv = df_total_by_country_by_vaccine
df_tcv = df.groupby(['location', 'vaccine'], as_index=False)['total_vaccinations'].sum()

df_total_by_country = df_tcv.groupby('location', as_index=False)['total_vaccinations'].sum()
df_total_by_country = df_total_by_country.rename(columns={'total_vaccinations': 'location_total'})

df_tcv = df_tcv.merge(df_total_by_country, on='location', how='left')

df_tcv['pct_vac_by_c'] = df_tcv['total_vaccinations'] / df_tcv['location_total']

This gives df_tcv:

  location          vaccine  total_vaccinations  location_total  pct_vac_by_c
0  Austria  Johnson&Johnson                   6              10           0.6
1  Austria          Moderna                   4              10           0.4
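
The sample in the question suggests that total_vaccinations is a running total per date (the Uruguay figures grow from 2021-07-05 to 2021-07-06), so summing across dates on the real data would overcount. The following sketch assumes that reading: it keeps the latest row per location and vaccine, then computes each vaccine's share of doses with groupby and transform instead of a merge, and picks the vaccine with the largest share. The names `latest`, `share` and `preferred` are illustrative, and the rows are copied from the question's sample.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "location": ["Austria"] * 4 + ["Uruguay"] * 5,
        "date": ["2021-01-08"] * 4 + ["2021-07-05"] * 2 + ["2021-07-06"] * 3,
        "vaccine": ["Johnson&Johnson", "Moderna", "Oxford/AstraZeneca", "Pfizer/BioNTech",
                    "Pfizer/BioNTech", "Sinovac",
                    "Oxford/AstraZeneca", "Pfizer/BioNTech", "Sinovac"],
        "total_vaccinations": [0, 0, 0, 30938,
                               1024793, 3045997,
                               43245, 1038942, 3079853],
    }
)

# Keep only the most recent (assumed cumulative) figure per location and vaccine
latest = (
    df.sort_values("date")
    .groupby(["location", "vaccine"], as_index=False)
    .last()
)

# Share of each vaccine within its country's total doses
country_total = latest.groupby("location")["total_vaccinations"].transform("sum")
latest["share"] = latest["total_vaccinations"] / country_total

# Vaccine with the largest share in each country
preferred = (
    latest.loc[latest.groupby("location")["share"].idxmax()]
    .set_index("location")["vaccine"]
)
print(preferred)
```

Note that this is a share of doses administered, not a percentage of the population; the latter needs a population figure such as the one in the OWID overall dataset used in the first answer.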
