Tools for working with ICD codes and comorbidities
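Tools for working with ICD codes and comorbidities. The project is inspired by the R package icd and is a simple Python implementation of some of its basic functionality. It has been benchmarked on large datasets (tens of millions of rows) across a variety of ICD code manipulation tasks.
If you are interested in contributing to this repository, feel free to send me an email.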
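Usage
Basic usage covers two very common tasks when working with ICD code data.
- Transforming a dataset from long format to wide format
- Mapping ICD codes to known comorbidities
Long to wide transformation
Data often arrives in a long format, with a key for the individual, such as person_id, and many claims (claim_id) belonging to that person.
For example: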
claim_id | person_id | icd_cd_1 | icd_cd_2 | icd_cd_3 |
---|---|---|---|---|
001 | A | code_6 | code_2 | |
002 | A | code_8 | | |
003 | A | code_3 | code_2 | code_6 |
004 | B | code_1 | | |
005 | B | code_2 | code_3 | |
006 | C | code_4 | code_2 | code_5 |
For easier processing, the table has to be converted to a more collapsed version, with one row per person.
For example:
person_id | icd_cd_1 | icd_cd_2 | icd_cd_3 | icd_cd_4 |
---|---|---|---|---|
A | code_6 | code_2 | code_8 | code_3 |
B | code_1 | code_2 | code_3 | |
C | code_4 | code_2 | code_5 | |
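
To make the target shape concrete, the collapse can be sketched in plain Python. This is only an illustration of the idea using the example rows above, not the library's implementation; the name collapse_claims and the tuple layout of the rows are assumptions made for this sketch.

```python
from collections import OrderedDict

# Long-format rows from the first table: (claim_id, person_id, codes on the claim).
# Empty strings stand for empty cells.
claims = [
    ("001", "A", ["code_6", "code_2", ""]),
    ("002", "A", ["code_8", "", ""]),
    ("003", "A", ["code_3", "code_2", "code_6"]),
    ("004", "B", ["code_1", "", ""]),
    ("005", "B", ["code_2", "code_3", ""]),
    ("006", "C", ["code_4", "code_2", "code_5"]),
]


def collapse_claims(rows):
    """Collect each person's unique, non-empty codes in first-seen order."""
    codes_by_person = OrderedDict()
    for _claim_id, person_id, codes in rows:
        seen = codes_by_person.setdefault(person_id, [])
        for code in codes:
            if code and code not in seen:
                seen.append(code)
    return codes_by_person


collapsed = collapse_claims(claims)
for person_id, codes in collapsed.items():
    print(person_id, codes)
```

Each person ends up with a single list of codes, which corresponds to one row of the collapsed table; shorter lists are what produce the empty trailing cells.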
To accomplish this, simply use the function long_to_short_transformation:

    import pandas as pd
    import icd

    # Long-format input: one row per claim, empty strings for missing codes.
    data = {
        "person_id": [1, 1, 1, 2, 2, 3],
        "dx_1": ["F11", "E40", "", "F32", "C77", "G10"],
        "dx_2": ["F1P", "E400", "", "F322", "C737", ""],
    }
    df = pd.DataFrame.from_dict(data)
    icd.long_to_short_transformation(df, "person_id", ["dx_1", "dx_2"])

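Here df is your pandas DataFrame, "person_id" is the column to roll up on, and ["dx_1", "dx_2"] is the list of columns containing ICD codes.
Note that the ICD columns must be passed as a list even if there is only one of them. In addition, NaN values must be replaced with empty strings ("") before calling the function.
The function returns a new DataFrame indexed by person id, with a person id column and as many unique ICD columns as needed, named icd_0, icd_1, ..., icd_n.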
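
The two input requirements above can be illustrated with a small plain-Python sketch. The helpers as_column_list and blank_if_missing are hypothetical names introduced here for illustration; they are not part of icd.

```python
import math


def as_column_list(columns):
    """Wrap a single column name in a list; copy other iterables into a list."""
    if isinstance(columns, str):
        return [columns]
    return list(columns)


def blank_if_missing(value):
    """Return "" for None or NaN, otherwise return the value unchanged."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return value


# A single ICD column still has to be handed over as a list.
icd_columns = as_column_list("dx_1")

# Missing values become empty strings before the codes are processed.
raw_codes = ["F11", float("nan"), None, "E40"]
clean_codes = [blank_if_missing(code) for code in raw_codes]
```

With a pandas DataFrame, the same cleanup is usually a one-liner such as df[icd_columns] = df[icd_columns].fillna("").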
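Mapping ICD codes to known comorbidities
The second task is mapping these ICD codes to comorbidities. For this, use the function icd_to_comorbidities. It takes a table of the format: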
person_id | icd_cd_1 | icd_cd_2 | icd_cd_3 | icd_cd_4 |
---|---|---|---|---|
A | code_6 | code_2 | code_8 | code_3 |
B | code_1 | code_2 | code_3 | |
C | code_4 | code_2 | code_5 | |
and produces the format:
person_id | comorb_1 | comorb_2 | comorb_3 | comorb_4 |
---|---|---|---|---|
A | True | False | True | True |
B | False | True | False | False |
C | False | False | False | False |
Which comorbidity columns appear depends on the mapping being used.
An example call looks like this:
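
As a rough illustration of what such a mapping step involves, here is a plain-Python sketch. It assumes that a code counts as a match when it starts with one of the codes listed for a comorbidity; this prefix rule is an assumption of the sketch, and the library's actual matching rule is not described here. The names example_mapping and flag_comorbidities are hypothetical.

```python
# Hypothetical, abbreviated mapping: comorbidity name -> list of ICD code prefixes.
example_mapping = {
    "metastatic_carcinoma": ["C77", "C78", "C79", "C80"],
    "aids_hiv": ["B20", "B21", "B22", "B24"],
}

# Wide-format input: person id -> that person's ICD codes ("" marks an empty cell).
codes_by_person = {
    1: ["F1P", "F11", "", "E400", "E40"],
    2: ["F322", "C77", "C737", "F32", ""],
    3: ["", "G10", "", "", ""],
}


def flag_comorbidities(codes_by_person, mapping):
    """Return {person: {comorbidity: bool}}, assuming prefix matching."""
    flags = {}
    for person_id, codes in codes_by_person.items():
        flags[person_id] = {
            # str.startswith accepts a tuple of candidate prefixes.
            name: any(code.startswith(tuple(prefixes)) for code in codes if code)
            for name, prefixes in mapping.items()
        }
    return flags


flags = flag_comorbidities(codes_by_person, example_mapping)
for person_id, person_flags in flags.items():
    print(person_id, person_flags)
```

Under this assumption, only person 2 is flagged, and only for metastatic_carcinoma (through the code C77).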

    import pandas as pd
    import icd

    # Wide-format input: one row per person, empty strings for missing codes.
    df = pd.DataFrame.from_dict({
        'icd_0': {1: 'F1P', 2: 'F322', 3: ''},
        'icd_1': {1: 'F11', 2: 'C77', 3: 'G10'},
        'icd_2': {1: '', 2: 'C737', 3: ''},
        'icd_3': {1: 'E400', 2: 'F32', 3: ''},
        'icd_4': {1: 'E40', 2: '', 3: ''},
        'person_id': {1: 1, 2: 2, 3: 3},
    })
    icd.icd_to_comorbidities(df, "person_id", ["icd_0", "icd_1", "icd_2", "icd_3", "icd_4"])

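The default mapping is quan_elixhauser10, which is Quan's transcription of the original Elixhauser ICD-9 comorbidities, as described in the following paper.
Optionally, you can provide a mapping keyword argument, for example: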

    icd.icd_to_comorbidities(df, "person_id", ["icd_0", "icd_1", "icd_2", "icd_3", "icd_4"], mapping="quan_elixhauser10")

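The currently supported mappings are the default "quan_elixhauser10" mapping and the "charlson10" mapping, which is referenced in the same paper mentioned above. You can also find them in the SAS code here.
To create a custom comorbidity mapping, pass a dict as the mapping argument instead of one of the supported keyword strings. The dict must follow this format: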

    # Keys are comorbidity names; values are lists of ICD codes.
    custom_mapping = {
        "paraplegia_and_hemiplegia": [
            'G81', 'G82', 'G041', 'G114', 'G801', 'G802',
            'G830', 'G831', 'G832', 'G833', 'G834', 'G839',
        ],
        "renal_disease": [
            'N18', 'N19', 'N052', 'N053', 'N054', 'N055', 'N056', 'N057',
            'N250', 'I120', 'I131', 'N032', 'N033', 'N034', 'N035', 'N036',
            'N037', 'Z490', 'Z491', 'Z492', 'Z940', 'Z992',
        ],
        "cancer": [
            'C00', 'C01', 'C02', 'C03', 'C04', 'C05', 'C06', 'C07', 'C08', 'C09',
            'C10', 'C11', 'C12', 'C13', 'C14', 'C15', 'C16', 'C17', 'C18', 'C19',
            'C20', 'C21', 'C22', 'C23', 'C24', 'C25', 'C26',
            'C30', 'C31', 'C32', 'C33', 'C34', 'C37', 'C38', 'C39',
            'C40', 'C41', 'C43', 'C45', 'C46', 'C47', 'C48', 'C49',
            'C50', 'C51', 'C52', 'C53', 'C54', 'C55', 'C56', 'C57', 'C58',
            'C60', 'C61', 'C62', 'C63', 'C64', 'C65', 'C66', 'C67', 'C68', 'C69',
            'C70', 'C71', 'C72', 'C73', 'C74', 'C75', 'C76',
            'C81', 'C82', 'C83', 'C84', 'C85', 'C88',
            'C90', 'C91', 'C92', 'C93', 'C94', 'C95', 'C96', 'C97',
        ],
        "moderate_or_sever_liver_disease": [
            'K704', 'K711', 'K721', 'K729', 'K765', 'K766', 'K767',
            'I850', 'I859', 'I864', 'I982',
        ],
        "metastitic_carcinoma": ['C77', 'C78', 'C79', 'C80'],
        "aids_hiv": ['B20', 'B21', 'B22', 'B24'],
    }
    icd.icd_to_comorbidities(df, "person_id", ["icd_0", "icd_1", "icd_2", "icd_3", "icd_4"], mapping=custom_mapping)

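The function above returns a new DataFrame with the person_id values as the index, a column named after whatever "person_id" string was passed in, and one column per comorbidity filled with True or False.
Compatibility
icd currently supports Python 3.4, 3.5, and 3.6.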
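
Because a custom mapping is just a dict of names to code lists, it can be worth checking its shape before passing it in. The sketch below is a hypothetical helper, validate_mapping, written for this illustration; it is not part of icd, and the checks reflect only the format described above.

```python
def validate_mapping(mapping):
    """Raise ValueError unless mapping looks like {str: [non-empty str, ...]}."""
    if not isinstance(mapping, dict) or not mapping:
        raise ValueError("mapping must be a non-empty dict")
    for name, codes in mapping.items():
        if not isinstance(name, str):
            raise ValueError("comorbidity names must be strings: %r" % (name,))
        if not isinstance(codes, list) or not codes:
            raise ValueError("%s must map to a non-empty list of codes" % name)
        for code in codes:
            if not isinstance(code, str) or not code:
                raise ValueError("%s contains an invalid code: %r" % (name, code))
    return mapping


# A well-formed entry passes through unchanged.
good = {"aids_hiv": ["B20", "B21", "B22", "B24"]}
validate_mapping(good)

# A bare string instead of a list is a common slip and is rejected.
try:
    validate_mapping({"aids_hiv": "B20"})
    bare_string_rejected = False
except ValueError:
    bare_string_rejected = True
```

The messages use %-formatting rather than f-strings so the sketch also runs on Python 3.4 and 3.5.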