Create a Python function that looks up a set of cancer code values in a column and returns the top 10 codes by number of deaths

Posted 2024-05-15 06:11:20


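I'm stuck on a problem: I have a large dataset containing various causes of death. I want to filter by cause of death on certain codes (more than 100 in total), each made of a letter and a number (e.g. F58), so that I can get the total number of deaths for each disease and rank them to find the top 10 by combined death total.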

I've posted a sample of the data, a csv file exported from Excel. Could you point me in the right direction for solving this?

Mortality data example


3 Answers

This counts the number of deaths for each combination of code and sex, then sorts the groups by death count in descending order.

# Count rows for each (code, sex) pair; assumes one row per death
deaths = df.groupby(['code', 'sex']).size().reset_index(name='deaths')

# sort_values returns a new frame, so assign the result
deaths = deaths.sort_values(by='deaths', ascending=False)
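
As a minimal sketch of how this could look end to end, assuming each row of the data is one death record and the columns are named `code` and `sex` (the toy values below are invented purely for illustration):

```python
import pandas as pd

# Toy records, one row per death; the values are made up for illustration
df = pd.DataFrame({
    "code": ["C34", "C34", "C34", "C50", "C50", "F58", "C34"],
    "sex":  ["M", "M", "M", "F", "F", "M", "F"],
})

# Count rows per (code, sex) pair and name the count column 'deaths'
deaths = df.groupby(["code", "sex"]).size().reset_index(name="deaths")

# Sort descending and keep at most the ten largest groups
top10 = deaths.sort_values(by="deaths", ascending=False).head(10)
print(top10)
```

If the real file already carries a pre-aggregated death count per row, summing that column with `.sum()` instead of calling `.size()` would probably be the analogous step.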

Here are two examples that may help:

import pandas as pd

# I will create a data frame from a dictionary for this example
dict_df = {
    "Code": ["A","B","C","D","C","B","B","B","A","A"],
    "Age":  [14, 16, 17, 4, 15, 16, 8, 10, 90, 99],
    "Sex":  [0, 1, 1, 1, 0, 0, 0, 0, 0, 1]
}

data = pd.DataFrame.from_dict(dict_df)

# Group by column code
data_bycode = data.groupby(["Code"]).size()

# Sort data_bycode in decreasing order
data_bycode.sort_values(ascending = False, inplace = True) 
data_bycode

Another approach is to extract the column of interest and use Counter from collections:

from collections import Counter

# Collect data into a list
codes = data["Code"].tolist()

# Get frequencies with Counter and convert the result to a dict
freq_codes = dict(Counter(codes))

# Get a dictionary to create a data frame with columns Code and Count
dict_df = {"Code": [], "Count": []}
for key, value in freq_codes.items():
    dict_df["Code"].append(key)
    dict_df["Count"].append(value)

# Create df from dictionary
df = pd.DataFrame.from_dict(dict_df)
# Sort values in df
df.sort_values(by="Count", ascending=False, inplace=True)  # "by" is needed here because df has more than one column
df
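
The manual loop over the Counter can likely be shortened, since `Counter.most_common(n)` already returns the `n` most frequent items in descending order. A minimal sketch, reusing the toy `Code` values from the first example (the names `top_codes` and `top_df` are illustrative only):

```python
from collections import Counter

import pandas as pd

codes = ["A", "B", "C", "D", "C", "B", "B", "B", "A", "A"]

# most_common(10) returns up to ten (code, count) pairs, highest count first
top_codes = Counter(codes).most_common(10)

# Build a two-column data frame directly from the list of pairs
top_df = pd.DataFrame(top_codes, columns=["Code", "Count"])
print(top_df)
```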

I hope this is useful :)

The first step is to create a list of the codes you're looking for, then use a mask to filter your dataframe on it:

code_list = ['F58']  # add as many as you want

# Filter original dataframe on the codes
new_df = old_df[old_df['Code'].isin(code_list)]

Then, it sounds like what you want is to group the data by cause of death and total up the number of deaths for each cause:

# This groups codes and counts how many occurrences fall into that group
top_ten = new_df.groupby(by='Code').count()  

You can then use sort_values() to order the dataframe from high to low, and slice it to keep the top ten.
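
Putting the filter, count, sort, and slice steps together, one possible sketch of the function the question asks for is shown below. It assumes one row per death and a column named `Code`; the function name `top_causes`, its parameters, and the sample data are hypothetical and not taken from the original dataset.

```python
import pandas as pd

def top_causes(df, codes, code_col="Code", n=10):
    """Return the n most frequent codes among `codes`, with their row counts."""
    # Keep only rows whose code is in the list of interest
    filtered = df[df[code_col].isin(codes)]
    # value_counts() counts rows per code and sorts descending by default
    counts = filtered[code_col].value_counts().head(n)
    # Turn the Series into a two-column data frame
    return counts.rename_axis(code_col).reset_index(name="Deaths")

# Hypothetical sample data; X99 is not in the code list and gets filtered out
old_df = pd.DataFrame({"Code": ["F58", "C34", "C34", "X99", "C50", "C34", "C50"]})
result = top_causes(old_df, ["F58", "C34", "C50"])
print(result)
```

If each row instead holds a pre-aggregated death count, grouping by the code column, summing that count column, and calling `nlargest(n)` on the result would be the equivalent approach.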

Hope this helps.

Update: I tried this on a toy sample on my machine, and the result was as follows: (screenshot not preserved)
