Iterative imputation in Python

Posted 2024-03-29 02:10:01


I have a dataset with missing values that I want to impute, using the StdDev/Mean of the existing data within each feature, for each country, over time.

I want to build a loop/iteration, using groupby with a lambda or a for loop over the groups, that iteratively fills in the missing values.

Current DF:

(the real data has far more than 3 years, 3 countries, and 3 features)

[country, feature, year, value]

USA  A  1995  8
USA  B  1995  NaN
USA  C  1995  326
USA  A  1996  14
USA  B  1996  42
USA  C  1996  NaN
USA  A  1997  NaN
USA  B  1997  50
USA  C  1997  400

CHN  A  1995  6
CHN  B  1995  34
CHN  C  1995  NaN

CHN  A  1996  NaN
CHN  B  1996  NaN
CHN  C  1996  381

CHN  A  1997  23
CHN  B  1997  54
CHN  C  1997  412

grp = df.groupby(['country', 'feature'])

for (country, feature), group in grp:
    ...  # ???? some iteration that imputes the NaNs in this group ????

Expected output would return the df with the NaN values now imputed as the StdDev values for each country, with respect to each feature.

Not the StdDev of all the features/all the countries combined as a whole.

Thanks for any input.
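
One possible approach, sketched here with pandas, avoids an explicit loop: groupby(...).transform computes a statistic per (country, feature) group and aligns it back to the original rows, so fillna can use it directly. The helper name impute_by_group and its stat parameter are made up for illustration, and the sketch assumes the grouping column is called 'feature' as in the column list above. Whether the mean or the standard deviation is the right fill value is left open; the helper accepts either.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (long format: one row per country/feature/year).
rows = [
    ("USA", "A", 1995, 8), ("USA", "B", 1995, np.nan), ("USA", "C", 1995, 326),
    ("USA", "A", 1996, 14), ("USA", "B", 1996, 42), ("USA", "C", 1996, np.nan),
    ("USA", "A", 1997, np.nan), ("USA", "B", 1997, 50), ("USA", "C", 1997, 400),
    ("CHN", "A", 1995, 6), ("CHN", "B", 1995, 34), ("CHN", "C", 1995, np.nan),
    ("CHN", "A", 1996, np.nan), ("CHN", "B", 1996, np.nan), ("CHN", "C", 1996, 381),
    ("CHN", "A", 1997, 23), ("CHN", "B", 1997, 54), ("CHN", "C", 1997, 412),
]
df = pd.DataFrame(rows, columns=["country", "feature", "year", "value"])


def impute_by_group(frame, stat="mean"):
    """Hypothetical helper: fill NaNs in 'value' with a per-(country, feature) statistic."""
    out = frame.copy()
    # transform() returns a Series aligned to the original rows, so every row
    # gets the statistic of its own group; NaNs are ignored when computing it.
    group_stat = out.groupby(["country", "feature"])["value"].transform(stat)
    out["value"] = out["value"].fillna(group_stat)
    return out


imputed_mean = impute_by_group(df, "mean")  # fill with the group mean
imputed_std = impute_by_group(df, "std")    # fill with the group sample std (ddof=1)
```

Note that a group with fewer than two observed values has no sample standard deviation, so its NaNs would stay NaN with stat="std"; a group with no observed values stays NaN for either statistic.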

