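I want to group DataFrame entries into two-year intervals, joining column values with the separator "#" and joining entries that fall into the same interval with the separator ";".
I previously did this by iterating through the years and creating a new DataFrame, but it was rather messy; I would prefer a vectorized solution.
Example input: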
   dx_code  patient_id  dx_name                           year
0  427.31   Z324563     Atrial fibrillation (CMS/HCC)     2012
1  H53.9    Z324563     Visual disturbance                2014
2  725      Z324563     Polymyalgia rheumatica (CMS/HCC)  2009
3  725      Z324563     Polymyalgia rheumatica (CMS/HCC)  2011
4  None     Z273652     Disorder of bone and cartilage    2004
5  272.0    Z273652     Pure hypercholesterolemia         2006
6  729.81   Z273652     Swelling of limb                  2012
7  446.5    Z273652     Giant cell arteritis (CMS/HCC)    2010
8  725      Z273652     Polymyalgia rheumatica (CMS/HCC)  2011
Example output:
   patient_id  2004–2005_dx                         \
0  Z324563     None
1  Z273652     None#Disorder of bone and cartilage

   2006–2007_dx                     2008–2009_dx                          \
0  None                             725#Polymyalgia rheumatica (CMS/HCC)
1  272.0#Pure hypercholesterolemia  None

   2010–2011_dx                                                           \
0  725#Polymyalgia rheumatica (CMS/HCC)
1  446.5#Giant cell arteritis (CMS/HCC); 725#Polymyalgia rheumatica (CMS/HCC)

   2012–2013_dx                          2014_dx                   \
0  427.31#Atrial fibrillation (CMS/HCC)  H53.9#Visual disturbance
1  729.81#Swelling of limb               None

   unknown_time_dx
0  None
1  None
Following this answer, I have the following code:
(
    self.data.groupby(["patient_id", pd.Grouper(freq="2Y", key="date")])
    .sum()
    .unstack(fill_value="")
)
Its output is as follows:
            dx_code                                                         \
date        2004-12-31  2006-12-31  2010-12-31  2012-12-31  2014-12-31
patient_id
Z273652     0           272.0       446.5       729.81725
Z324563                             725         427.31725   H53.9

            dx_name
date        2004-12-31                      2006-12-31                 2010-12-31                        2012-12-31                                         2014-12-31
patient_id
Z273652     Disorder of bone and cartilage  Pure hypercholesterolemia  Giant cell arteritis (CMS/HCC)    Swelling of limbPolymyalgia rheumatica (CMS/HCC)
Z324563                                                                Polymyalgia rheumatica (CMS/HCC)  Atrial fibrillation (CMS/HCC)Polymyalgia rheum...  Visual disturbance
However, I can't figure out how to combine the column values within each of the two groups.
OK, let's create the starting DataFrame:
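The code that originally followed this step did not survive, so what follows is a reconstruction from the example input in the question rather than the answerer's own code. The variable name `df` is an assumption; `None` stands in for the missing `dx_code`, and the codes are kept as strings because values such as `H53.9` are not numeric.

```python
import pandas as pd

# Rows copied from the example input in the question.
df = pd.DataFrame(
    {
        "dx_code": ["427.31", "H53.9", "725", "725", None,
                    "272.0", "729.81", "446.5", "725"],
        "patient_id": ["Z324563"] * 4 + ["Z273652"] * 5,
        "dx_name": [
            "Atrial fibrillation (CMS/HCC)",
            "Visual disturbance",
            "Polymyalgia rheumatica (CMS/HCC)",
            "Polymyalgia rheumatica (CMS/HCC)",
            "Disorder of bone and cartilage",
            "Pure hypercholesterolemia",
            "Swelling of limb",
            "Giant cell arteritis (CMS/HCC)",
            "Polymyalgia rheumatica (CMS/HCC)",
        ],
        "year": [2012, 2014, 2009, 2011, 2004, 2006, 2012, 2010, 2011],
    }
)
print(df)
```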
Now, define the bins, so that each year falls into a two-year interval:
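One way to sketch the binning step is with `pd.cut`, using left-closed intervals that start at the earliest year. This is an assumption about how the bins were built, not the original code; the names `edges`, `labels` and `period` are illustrative, and the labels use a plain ASCII hyphen.

```python
import pandas as pd

# The year column from the example input.
years = pd.Series([2012, 2014, 2009, 2011, 2004, 2006, 2012, 2010, 2011], name="year")

first, last = int(years.min()), int(years.max())

# Bin edges every two years, closed on the left: [2004, 2006), [2006, 2008), ...
# The final edge is last + 1 so that the latest year is still covered.
edges = list(range(first, last + 1, 2)) + [last + 1]

# One label per interval; a trailing one-year interval gets a single-year label.
labels = [
    f"{lo}-{hi - 1}_dx" if hi - 1 > lo else f"{lo}_dx"
    for lo, hi in zip(edges[:-1], edges[1:])
]

period = pd.cut(years, bins=edges, right=False, labels=labels)
print(period.tolist())
```

With `right=False` each interval includes its first year and excludes the next edge, so 2004 and 2005 share a bin while 2006 starts a new one.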
Let's join the dx_code and dx_name columns:
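A minimal sketch of this step, assuming the codes are stored as strings and that a missing code should appear as the literal text `None`, as it does in the question's expected output. The column name `dx` is an assumption.

```python
import pandas as pd

# Two rows from the example input, one of them with a missing code.
df = pd.DataFrame(
    {
        "dx_code": ["446.5", None],
        "dx_name": ["Giant cell arteritis (CMS/HCC)", "Disorder of bone and cartilage"],
    }
)

# Replace the missing code with the text "None", then concatenate element-wise.
df["dx"] = df["dx_code"].fillna("None") + "#" + df["dx_name"]
print(df["dx"].tolist())
```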
Finally, we use pivot_table and look at the result:
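Putting the steps together, the pivot can be sketched as below. This is again a reconstruction under assumptions: the names `df`, `period`, `dx` and `wide` are illustrative, `"; ".join` is passed as the aggregation function to concatenate entries that share an interval, and rows with a missing year (the `unknown_time_dx` column in the expected output) are not handled.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "dx_code": ["427.31", "H53.9", "725", "725", None,
                    "272.0", "729.81", "446.5", "725"],
        "patient_id": ["Z324563"] * 4 + ["Z273652"] * 5,
        "dx_name": [
            "Atrial fibrillation (CMS/HCC)",
            "Visual disturbance",
            "Polymyalgia rheumatica (CMS/HCC)",
            "Polymyalgia rheumatica (CMS/HCC)",
            "Disorder of bone and cartilage",
            "Pure hypercholesterolemia",
            "Swelling of limb",
            "Giant cell arteritis (CMS/HCC)",
            "Polymyalgia rheumatica (CMS/HCC)",
        ],
        "year": [2012, 2014, 2009, 2011, 2004, 2006, 2012, 2010, 2011],
    }
)

# Two-year bins, closed on the left, starting at the earliest year.
first, last = int(df["year"].min()), int(df["year"].max())
edges = list(range(first, last + 1, 2)) + [last + 1]
labels = [
    f"{lo}-{hi - 1}_dx" if hi - 1 > lo else f"{lo}_dx"
    for lo, hi in zip(edges[:-1], edges[1:])
]
# Convert the categorical result to plain strings to keep the pivot simple.
df["period"] = pd.cut(df["year"], bins=edges, right=False, labels=labels).astype(str)

# "code#name" per row; a missing code becomes the text "None".
df["dx"] = df["dx_code"].fillna("None") + "#" + df["dx_name"]

# One row per patient, one column per interval; entries within the same
# interval are joined with "; " in their original row order.
wide = df.pivot_table(
    index="patient_id",
    columns="period",
    values="dx",
    aggfunc="; ".join,
)
print(wide.to_string())
```

Intervals with no entries for a patient come out as NaN; `wide.fillna("None")` would mimic the expected output if that is preferred.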