Replace missing values with the mean of the two preceding and two following values

0 votes

2 answers

39 views

Asked 2025-04-14 15:22

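I'm new here and have been struggling with a piece of code recently, so I'd like to ask whether anyone can help.

As the title says, I want to replace the missing values in an Excel sheet with the mean of the two preceding and the two following values.

The sheet has only two columns and looks like this:

A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...

B: 4, 7, 3, 1, missing value, 5, 30, 14, 27, ...

So for the first missing value, I want the mean of [3, 1] and [5, 30].

I'm using Python. Which is more appropriate here, df.fillna or df.replace? Does anyone know how to write code like this?

I'd really appreciate any help!

I already asked ChatGPT, but the function it gave me didn't work. I also tried searching online but couldn't find a similar question...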

Tags: data cleaning, data analysis, data preprocessing, missing-value handling, pandas, statistical methods, Excel, mean imputation

2 Answers

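@ChrisbBacon, that line goes in place of the whole block: df = pd.DataFrame({ 'A': list(range(1, 11)), 'B': [4, 7, 3, 1, np.nan, 5, 30, 14, 27, np.nan] }). The first option builds the DataFrame by hand with pandas. The second option builds it by reading an xlsx file.

Another way to solve this is: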

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, None, 4, 5, 6, None, 8, 9],
                   "B": [4, 7, 3, 1, None, 5, 30, 14, 27]})

# np.where returns a tuple of two arrays: the row positions and the column positions of the missing values
np_positions = np.where(pd.isnull(df))
# pair them up into one (row, column) tuple per missing value
empty_values = [(row, col) for row, col in zip(*np_positions)]

# for every missing value
for empty in empty_values:
    # take the row and column
    pos_x, pos_y = empty
    # mean of the (up to) two values above; max() keeps the slice start from going negative near the top
    above = df.iloc[max(pos_x - 2, 0):pos_x, pos_y].mean()
    # mean of the (up to) two values below
    below = df.iloc[pos_x + 1:pos_x + 3, pos_y].mean()
    # write the average of the two means into the missing cell
    df.iloc[pos_x, pos_y] = (above + below) / 2
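
One thing to watch with the loop above: it writes each fill back into df as it goes, so when two gaps in the same column sit within two rows of each other, the later fill is computed partly from the earlier filled value. If that is not wanted, here is a minimal sketch that reads neighbours from an untouched copy instead. The helper name fill_from_original and its parameter n are hypothetical, not part of pandas, and the sketch assumes all columns are numeric.

```python
import numpy as np
import pandas as pd


def fill_from_original(df: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Fill each NaN with the average of the mean of the n values above it
    and the mean of the n values below it, reading only original values."""
    source = df.astype(float)   # read-only view of the original data
    result = source.copy()      # fills are written here, never read back
    rows, cols = np.where(source.isna())
    for r, c in zip(rows, cols):
        # max() keeps the slice start from going negative near the top
        above = source.iloc[max(r - n, 0):r, c].mean()
        below = source.iloc[r + 1:r + 1 + n, c].mean()
        result.iloc[r, c] = (above + below) / 2
    return result


df = pd.DataFrame({"A": [1, 2, None, 4, 5, 6, None, 8, 9],
                   "B": [4, 7, 3, 1, None, 5, 30, 14, 27]})
filled = fill_from_original(df)
print(filled)
```

Because Series.mean() skips NaN by default, a gap whose neighbour is also missing is filled from whichever original values remain on that side. A gap with no values at all on one side (first or last row) stays NaN.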

Answered 2025-04-14 by Python大师

One possible solution:

import pandas as pd
import numpy as np

# Example DataFrame (comment this when reading from xlsx file)
df = pd.DataFrame({
    'A': list(range(1, 11)),
    'B': [4, 7, 3, 1, np.nan, 5, 30, 14, 27, np.nan]
})

# (Uncomment when reading from xlsx file)
# df = pd.read_excel('file_name.xlsx')

# For each row in column B, average the mean of the previous 2 values and the mean of the next 2 values
# shift(1).rolling(2) covers rows i-2 and i-1; shift(-2).rolling(2) covers rows i+1 and i+2
prev_mean = df['B'].shift(1).rolling(2).mean()
next_mean = df['B'].shift(-2).rolling(2).mean()
df['B'] = df['B'].fillna((prev_mean + next_mean) / 2)

print(df)

Output:

    A      B
0   1   4.00
1   2   7.00
2   3   3.00
3   4   1.00
4   5   9.75
5   6   5.00
6   7  30.00
7   8  14.00
8   9  27.00
9  10    NaN
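
Row 9 stays NaN in the output above because it has no following values, so the "next two" mean cannot be computed. If filling from a partial neighbourhood is acceptable, one option is to let rolling() accept shorter windows with min_periods=1 and to average only the sides that exist. This is a sketch under that assumption, not necessarily what the question wants. The helper name fill_from_neighbors is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_from_neighbors(s: pd.Series, n: int = 2) -> pd.Series:
    """Fill NaN with the average of the mean of the previous n values and
    the mean of the next n values, using whatever neighbours are available."""
    # mean of up to n values before each row
    prev_mean = s.shift(1).rolling(n, min_periods=1).mean()
    # mean of up to n values after each row: reverse, repeat, reverse back
    next_mean = s[::-1].shift(1).rolling(n, min_periods=1).mean()[::-1]
    # row-wise mean of the two side means; a side that is entirely missing is skipped
    estimate = pd.concat([prev_mean, next_mean], axis=1).mean(axis=1)
    return s.fillna(estimate)


s = pd.Series([4, 7, 3, 1, np.nan, 5, 30, 14, 27, np.nan])
filled = fill_from_neighbors(s)
print(filled)
```

With this sample data, row 4 is still filled from [3, 1] and [5, 30], while row 9 is filled from the mean of its two preceding values only. When one side has fewer than n values, the result is an average of the two side means and no longer equals the plain mean of all available neighbours.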

Answered 2025-04-14 by Python大师
