编辑1
@Serge answer可用于我的测试数据,但由于真实数据中存在错误而失败
start.csv
id;index;level1;level2;level3;strate;value
;1;Recruitement;;Recruitement start date;Total;2021-07-01
;1;Recruitement;;Recruitement start date;Ivory Coast;2021-07-01
;1;Recruitement;;Recruitement start date;Madagascar;2021-07-01
;1;Recruitement;;Recruitement start date;South Africa;2021-07-01
;2;Recruitement;;Sex (Woman), %;Total;47.0
;2;Recruitement;;Sex (Woman), %;Ivory Coast;40.0
;2;Recruitement;;Sex (Woman), %;Madagascar;40.0
;2;Recruitement;;Sex (Woman), %;South Africa;60.0
5-Total;3;Follow up;;D7 visits: N performed;Total;4 # -> LINE TO MERGE WITH...
5-Ivory Coast;3;Follow up;;D7 visits: N performed;Ivory Coast;1
5-Madagascar;3;Follow up;;D7 visits: N performed;Madagascar;2
5-South Africa;3;Follow up;;D7 visits: N performed;South Africa;1
5-Total;4;Follow up;;D7 visits: N expected;Total;12 #.... THIS LINE
5-Ivory Coast;4;Follow up;;D7 visits: N expected;Ivory Coast;4
5-Madagascar;4;Follow up;;D7 visits: N expected;Madagascar;4
5-South Africa;4;Follow up;;D7 visits: N expected;South Africa;4
;6;Recruitement;;Age, median;Total;35.0
;6;Recruitement;;Age, median;Ivory Coast;35.0
;6;Recruitement;;Age, median;Madagascar;31.0
;6;Recruitement;;Age, median;South Africa;36.0
;7;Recruitement;;Age, IQR;Total;(26.5-48.5)
;7;Recruitement;;Age, IQR;Ivory Coast;(26.0-48.0)
;7;Recruitement;;Age, IQR;Madagascar;(26.0-49.0)
;7;Recruitement;;Age, IQR;South Africa;(28.0-40.0)
;8;Recruitement;;Randomized all, N;Total;15
;8;Recruitement;;Randomized all, N;Ivory Coast;5
;8;Recruitement;;Randomized all, N;Madagascar;5
;8;Recruitement;;Randomized all, N;South Africa;5
15-Total;9;Recruitement;;Included severity stage 1, N;Total;1
15-South Africa;9;Recruitement;;Included severity stage 1, N;South Africa;1
16-Total;10;Recruitement;;Included severity stage 2, N;Total;8
16-Ivory Coast;10;Recruitement;;Included severity stage 2, N;Ivory Coast;4
16-Madagascar;10;Recruitement;;Included severity stage 2, N;Madagascar;1
16-South Africa;10;Recruitement;;Included severity stage 2, N;South Africa;3
17-Total;11;Recruitement;;Included severity stage 3, N;Total;6
17-Ivory Coast;11;Recruitement;;Included severity stage 3, N;Ivory Coast;1
17-Madagascar;11;Recruitement;;Included severity stage 3, N;Madagascar;4
17-South Africa;11;Recruitement;;Included severity stage 3, N;South Africa;1
15-Total;12;Recruitement;;Included severity stage 1, %;Total;7.0
15-Ivory Coast;12;Recruitement;;Included severity stage 1, %;Ivory Coast;nan
15-Madagascar;12;Recruitement;;Included severity stage 1, %;Madagascar;nan
15-South Africa;12;Recruitement;;Included severity stage 1, %;South Africa;20.0
16-Total;13;Recruitement;;Included severity stage 2, %;Total;53.0
16-Ivory Coast;13;Recruitement;;Included severity stage 2, %;Ivory Coast;80.0
16-Madagascar;13;Recruitement;;Included severity stage 2, %;Madagascar;20.0
16-South Africa;13;Recruitement;;Included severity stage 2, %;South Africa;60.0
17-Total;14;Recruitement;;Included severity stage 3, %;Total;40.0
17-Ivory Coast;14;Recruitement;;Included severity stage 3, %;Ivory Coast;20.0
17-Madagascar;14;Recruitement;;Included severity stage 3, %;Madagascar;80.0
17-South Africa;14;Recruitement;;Included severity stage 3, %;South Africa;20.0
我想合并具有相同id的csv行,并对该行的列进行一些处理
start.csv
index;level1;level2;level3;strate;value;id
3;Follow up;;D7 visits: N performed;Total;2;1
4;Follow up;;D7 visits: N expected;Total;6;1
3;Follow up;;D7 visits: N performed;Ivory Coast;1;2
4;Follow up;;D7 visits: N expected;Ivory Coast;4;2
3;Follow up;;D7 visits: N performed;Madagascar;1;3
4;Follow up;;D7 visits: N expected;Madagascar;2;3
我要合并具有相同id的行,如下所示:
end.csv
index;level1;level2;level3;strate;value;id
3;Follow up;;D7 visits: N performed/expected %;Total;0.33;1
4;Follow up;;D7 visits: N performed/expected %;Ivory Coast;0.75;2
4;Follow up;;D7 visits: N performed/expected %;IMadagascar;0.50;3
我可能需要使用groupby('id')并为列处理值应用函数,但是
df.replace('D7 visits: N expected','D7 visits: N performed/expected %').groupby("id",as_index=False,sort=False).last()
输出
index;level1;level2;level3;strate;value;id
3;Follow up;;D7 visits: N performed/expected %;Total;6;1
4;Follow up;;D7 visits: N performed/expected %;Ivory Coast;4;2
4;Follow up;;D7 visits: N performed/expected %;IMadagascar;2;3
在这里,数据帧的旋转可能会有所帮助:
应生成预期的csv文件
如果您想直接查看数据工作,请参阅“从1开始”部分。作为一个比我回答得早的人,您应该使用reforme-wide(在python:pivot中),但在我的例子中,我将数据本身进行了透视,并将其放入新的变量中 (
new=pd.pivot(data,index=['id','level1','strate','level2'],values=['level3','value'],columns='no')
)(*另外:在您输入整个示例数据后,我只需编辑一点代码.)
我还在Stata中添加了相同的工作。处理数据会容易得多
相关问题 更多 >
编程相关推荐