Filling blanks in Pandas using the averages of the other columns

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['A'] = [1, 2, 3, 4, 5, np.nan, 7, 8, 9]
df['B'] = [7, 9, np.nan, 13, 15, 17, 19, 21, 23]
df['C'] = [-5, 0, 5, 10, np.nan, 20, 25, 30, 35]
print(df)

colstd = df.std(axis=0)
rowstd = df.std(axis=1)
colavg = df.mean(axis=0)
rowavg = df.mean(axis=1)

for idx, row in df.iterrows():
    for col in df.columns:
        if pd.isna(df.loc[idx, col]):
            # Column mean, shifted by the row's average z-score times the column std.
            # A single .loc[idx, col] call avoids chained assignment, which may write to a copy.
            df.loc[idx, col] = colavg[col] + colstd[col] * np.nanmean((row - colavg) / colstd)

print(df)

2 answers

User

#1 · Edited 2024-06-16 11:38:08

You can try DataFrame.apply with a lambda function. The point is to avoid iterating over the DataFrame by hand with iterrows():

(df.apply(lambda row: pd.Series([colavg[col] +  
                                   colstd[col] * 
                                   np.nanmean((row - colavg)/colstd) 
                                   if pd.isna(row[col]) 
                                   else row[col] for col in row.index],index=row.index), 
          axis=1))

Output:

          A          B          C
0  1.000000   7.000000  -5.000000
1  2.000000   9.000000   0.000000
2  3.000000  11.755997   5.000000
3  4.000000  13.000000  10.000000
4  5.000000  15.000000  14.665627
5  5.756524  17.000000  20.000000
6  7.000000  19.000000  25.000000
7  8.000000  21.000000  30.000000
8  9.000000  23.000000  35.000000
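
To see where a filled value such as 11.755997 comes from, the calculation for a single cell can be spelled out. This is a minimal sketch that rebuilds the question's data and fills only row 2 of column B; the names `row_z` and `filled_b` are introduced here for illustration and do not appear in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, np.nan, 7, 8, 9],
    "B": [7, 9, np.nan, 13, 15, 17, 19, 21, 23],
    "C": [-5, 0, 5, 10, np.nan, 20, 25, 30, 35],
})

colavg = df.mean(axis=0)  # column means, NaN skipped
colstd = df.std(axis=0)   # column sample standard deviations (ddof=1)

row = df.loc[2]                # A=3, B=NaN, C=5
z = (row - colavg) / colstd    # z-score per column; stays NaN for B
row_z = z.mean()               # average z-score of the known cells (NaN skipped)

# Place the missing B value at the same relative position within column B.
filled_b = colavg["B"] + colstd["B"] * row_z
print(filled_b)
```

The result agrees with the 11.755997 shown for row 2 in the output above: the row sits about 0.66 standard deviations below average in A and C, so B is filled about 0.66 of its own standard deviations below its mean of 15.5.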

Here is a timing comparison of the two approaches:

%%timeit
for idx, row in df.iterrows():
    for col in df.columns:
        if pd.isna(df.loc[idx, col]):
            df.loc[idx, col] = colavg[col] + colstd[col] * np.nanmean((row - colavg) / colstd)

5.39 ms ± 401 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
x=(df.apply(lambda row: pd.Series([colavg[col] +  
                                   colstd[col] * 
                                   np.nanmean((row - colavg)/colstd) 
                                   if pd.isna(row[col]) 
                                   else row[col] for col in row.index],index=row.index), 
          axis=1))

2.68 ms ± 398 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
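
`%%timeit` is an IPython magic, so the cells above only run in IPython or Jupyter. Below is a rough sketch of the same comparison with the standard-library `timeit` module. The helpers `fill_loop` and `fill_apply` are hypothetical names written for this sketch, and `fill_apply` is a simplified variant of the lambda above rather than a copy of it. Each call works on a fresh copy, on the assumption that a loop which fills `df` in place would find nothing left to fill after the first repetition. Absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, np.nan, 7, 8, 9],
    "B": [7, 9, np.nan, 13, 15, 17, 19, 21, 23],
    "C": [-5, 0, 5, 10, np.nan, 20, 25, 30, 35],
})
colavg = df.mean(axis=0)
colstd = df.std(axis=0)


def fill_loop(frame):
    """Explicit iteration with iterrows(), writing into a copy."""
    out = frame.copy()
    for idx, row in frame.iterrows():
        row_z = ((row - colavg) / colstd).mean()  # NaN skipped
        for col in frame.columns:
            if pd.isna(row[col]):
                out.loc[idx, col] = colavg[col] + colstd[col] * row_z
    return out


def fill_apply(frame):
    """Row-wise DataFrame.apply; returns a new frame."""
    def fill_row(row):
        row_z = ((row - colavg) / colstd).mean()
        # Keep known cells, take the computed value where the cell is NaN.
        return row.where(row.notna(), colavg + colstd * row_z)
    return frame.apply(fill_row, axis=1)


runs = 20
loop_s = timeit.timeit(lambda: fill_loop(df), number=runs) / runs
apply_s = timeit.timeit(lambda: fill_apply(df), number=runs) / runs
print(f"loop:  {loop_s * 1e3:.2f} ms per call")
print(f"apply: {apply_s * 1e3:.2f} ms per call")
```

Before comparing speed, it is worth confirming that both functions return the same frame, for example with `np.allclose(fill_loop(df), fill_apply(df))`.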

User

#2 · Edited 2024-06-16 11:38:08

Just do the same thing with vectorized operations. There may be a simpler way; I only tried to follow your logic:

colstd = df.std(axis=0)
rowstd = df.std(axis=1)
colavg = df.mean(axis=0)
rowavg = df.mean(axis=1)
fill = colavg.values+colstd.values*np.array([df.sub(colavg).div(colstd).mean(axis=1).values]*3).T
df.where(~df.isna(), fill)

Output:

          A          B          C
0  1.000000   7.000000  -5.000000
1  2.000000   9.000000   0.000000
2  3.000000  11.755997   5.000000
3  4.000000  13.000000  10.000000
4  5.000000  15.000000  14.665627
5  5.756524  17.000000  20.000000
6  7.000000  19.000000  25.000000
7  8.000000  21.000000  30.000000
8  9.000000  23.000000  35.000000
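
The `*3` in the snippet above ties it to a frame with exactly three columns. One possible way to drop that assumption is an outer product, sketched below on the question's data. `row_z` and `result` are names chosen for this sketch, not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, np.nan, 7, 8, 9],
    "B": [7, 9, np.nan, 13, 15, 17, 19, 21, 23],
    "C": [-5, 0, 5, 10, np.nan, 20, 25, 30, 35],
})
colavg = df.mean(axis=0)
colstd = df.std(axis=0)

# Mean z-score of the known cells in each row (NaN skipped by default).
row_z = df.sub(colavg).div(colstd).mean(axis=1)

# fill[i, j] = colavg[j] + colstd[j] * row_z[i], for any number of columns.
fill = colavg.to_numpy() + np.outer(row_z.to_numpy(), colstd.to_numpy())

# Keep existing values; use the computed fill only where df is NaN.
result = df.where(df.notna(), fill)
print(result)
```

`DataFrame.where` returns a new frame, so `df` itself still contains the NaN values afterwards.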

NB: I have never seen this kind of transformation before. Could you explain it in more detail?

User

#3 · Edited 2024-06-16 11:38:08

If you just want to fill with the column mean, use this:

df.fillna(df.mean(axis=0))
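
Note that this is plain column-mean imputation, which is not the same as the row-adjusted fill in the question. A small sketch on the question's data shows the difference; `mean_filled` is a name used only here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, np.nan, 7, 8, 9],
    "B": [7, 9, np.nan, 13, 15, 17, 19, 21, 23],
    "C": [-5, 0, 5, 10, np.nan, 20, 25, 30, 35],
})

# Each NaN is replaced by the mean of its own column, ignoring the rest of the row.
mean_filled = df.fillna(df.mean(axis=0))

print(mean_filled.loc[2, "B"])  # column mean of B: 15.5
print(mean_filled.loc[5, "A"])  # column mean of A: 4.875
print(mean_filled.loc[4, "C"])  # column mean of C: 15.0
```

With the question's method, row 2 of B becomes 11.755997 rather than 15.5, because that row lies below average in its other columns.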
