Calculate the mean over n elements of a column, repeating the calculation at regular intervals within the column

code  scale  year  week    a    b    c
1111     -5  2017    15   68   68   19
1111     -4  2017    16   30   95   24
1111     -3  2017    17   21   15   94
1111     -2  2017    18   67   30   16
1111     -1  2017    19   10   13   13
1111      0  2017    20   26   22   18
1111      1  2017    21  NaN  NaN  NaN
1111      2  2017    22  NaN  NaN  NaN
1111      3  2017    23  NaN  NaN  NaN
1111      4  2017    24  NaN  NaN  NaN
1111      5  2017    25  NaN  NaN  NaN
1111      6  2017    26  NaN  NaN  NaN
2222     -5  2017    15   13   19   21
2222     -4  2017    16   24   95   23
2222     -3  2017    17   22   32   76
2222     -2  2017    18   21   30   12
2222     -1  2017    19   15   55   17
2222      0  2017    20   23   22   23
2222      1  2017    21  NaN  NaN  NaN
2222      2  2017    22  NaN  NaN  NaN
2222      3  2017    23  NaN  NaN  NaN
2222      4  2017    24  NaN  NaN  NaN
2222      5  2017    25  NaN  NaN  NaN
2222      6  2017    26  NaN  NaN  NaN
....

import numpy as np
import pandas as pd

# data is your dataframe name
column_list = list(data.columns.values)[4:]
for column_name in column_list:
    column = data[column_name].values  # pandas Series converted to a NumPy array
    for index in range(0, column.shape[0]):  # iterate over the entries in the column
        if np.isnan(column[index]):
            column[index] = np.nanmean(column.take(range(index-5, index+1), mode='wrap'))

code  scale  year  week    a    b    c
1111     -5  2017    15   68   68   19
1111     -4  2017    16   30   95   24
1111     -3  2017    17   21   15   94
1111     -2  2017    18   67   30   16
1111     -1  2017    19   10   13   13
1111      0  2017    20   26   22   18
1111      1  2017    21   37   41   31
1111      2  2017    22   32   36   33
1111      3  2017    23   32   26   34
1111      4  2017    24   34   28   24
1111      5  2017    25   28   28   25
1111      6  2017    26   32   30   27
2222     -5  2017    15   13   19   21
2222     -4  2017    16   24   95   23
2222     -3  2017    17   22   32   76
2222     -2  2017    18   21   30   12
2222     -1  2017    19   15   55   17
2222      0  2017    20   23   22   23
2222      1  2017    21   20   42   29
2222      2  2017    22   21   46   30
2222      3  2017    23   20   38   31
2222      4  2017    24   20   39   24
2222      5  2017    25   20   40   26
2222      6  2017    26   21   38   27
...
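Reading the expected output, each missing value appears to be the mean of the six rows before it, with earlier filled values feeding into later ones. The six-row window is inferred from the sample rather than stated anywhere, so the following is only a quick check of that reading for column a of code 1111:

```python
import numpy as np

# Known values of column a for code 1111 (scale -5 to 0), copied from the sample.
values = [68.0, 30.0, 21.0, 67.0, 10.0, 26.0]

WINDOW = 6  # assumed window length, inferred from the expected output

# Append six projected values; each one averages the previous WINDOW entries,
# so earlier projections feed into later ones.
for _ in range(6):
    values.append(float(np.mean(values[-WINDOW:])))

filled = values[6:]
print([round(v) for v in filled])
```

Rounded to whole numbers, this reproduces the a column of the expected output for scale 1 to 6.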

2 answers

User

#1 · edited 2024-04-19 20:14:31

Assuming your data looks the same as the sample you provided, you can do this:

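import numpy as np

# Columns that may need filling: everything after code, scale, year, week.
colSelector = df.columns.values[4:]

for index, row in df.iterrows():
    if np.isnan(row[4:].values).any():
        # Names of the columns that are NaN in this row.
        col = colSelector[np.isnan(row[4:].values)]
        # .loc slices include both endpoints; mean() skips the NaN at `index`,
        # so this averages the six rows before it.
        df.loc[index, col] = np.round(df.loc[index-6:index, col].mean(), 0)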

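I assumed there may be more columns to average than just a, b and c, but this works either way. Also, instead of iterating over every column, we can use boolean indexing to find the NaN values and select the columns to average, which eliminates the first loop.

Note: if only columns a to c should be averaged and there is data after those columns that should not be, change every [4:] to [4:7].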
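The boolean-indexing idea in this answer could be pushed one step further: build a mask of the rows that contain NaN once, then visit only those rows. This is a rough sketch under several assumptions: the frame has a default integer index, rows are ordered by code and week, each code has at least six known rows before its first gap, and the small frame and column list below are stand-ins made up for illustration.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data: one code, six known rows, three gaps.
df = pd.DataFrame({
    "code": [1111] * 9,
    "scale": list(range(-5, 4)),
    "a": [68, 30, 21, 67, 10, 26, np.nan, np.nan, np.nan],
    "b": [68, 95, 15, 30, 13, 22, np.nan, np.nan, np.nan],
})
cols = ["a", "b"]  # columns to fill (hard-coded here for illustration)

# Boolean mask: True for rows that still need a value.
needs_fill = df[cols].isna().any(axis=1)

for idx in df.index[needs_fill]:
    # .loc slicing is label-based and inclusive, so this selects the six rows
    # immediately before idx, including any rows filled on earlier iterations.
    window_mean = df.loc[idx - 6: idx - 1, cols].mean()
    df.loc[idx, cols] = window_mean.to_numpy()

print(df)
```

The values are left unrounded here; wrap the mean in np.round if whole numbers are wanted, as in the answer's code.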

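User

#2 · edited 2024-04-19 20:14:31

What we need is a way to project a moving average forward. I may be wrong, but I don't think pandas has a built-in function that handles this, even though it does implement ewma() and rolling_mean() (spelled .ewm().mean() and .rolling().mean() in current pandas). Recursion makes sense here because it doesn't go very deep.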

import numpy as np


def moving_average(data, window, periods_forward):
    """Append `periods_forward` rows to `data`, each the column-wise mean
    of the last `window` rows (including rows appended earlier)."""

    # A 1-D array has no second axis; ask the caller to reshape it.
    if data.ndim < 2:
        raise ValueError(
            "Data shape %s found. If there is only one sample please reshape "
            "the data using .reshape(-1, 1)." % (data.shape,))

    # Base case: Kill the recursion once we've created enough forward looks.
    if periods_forward == 0:
        return data

    data = np.concatenate([data, data[-window:, :].mean(axis=0).reshape(1, -1)])
    return moving_average(data, window, periods_forward - 1)


# Reset values in the dataframe.
columns = ['a', 'b', 'c']
for code in df.code.unique():
    df.loc[df.code == code, columns] = moving_average(
        df.loc[df.code == code, columns].dropna().values, window=6, periods_forward=6)
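
For comparison, the same forward projection can be written as a plain loop, which avoids both the recursion and the repeated np.concatenate calls. This is only a sketch and not the answerer's code: the name project_forward and its parameters are made up for illustration, and it assumes a 2-D array of known values.

```python
import numpy as np


def project_forward(data, window, periods_forward):
    """Append `periods_forward` rows, each the column-wise mean of the
    previous `window` rows (including rows appended earlier)."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2:
        raise ValueError("expected a 2-D array; use .reshape(-1, 1) for one column")

    n = data.shape[0]
    # Preallocate the full result and copy the known rows in.
    out = np.empty((n + periods_forward, data.shape[1]))
    out[:n] = data

    for i in range(n, n + periods_forward):
        # Average the `window` rows directly above row i.
        out[i] = out[max(i - window, 0):i].mean(axis=0)
    return out


# Known rows for code 1111 (columns a, b, c) from the question's sample.
known = np.array([
    [68, 68, 19],
    [30, 95, 24],
    [21, 15, 94],
    [67, 30, 16],
    [10, 13, 13],
    [26, 22, 18],
])
result = project_forward(known, window=6, periods_forward=6)
print(result[6:])
```

It can be dropped into the per-code loop above in place of moving_average, since it takes the same three arguments and returns an array of the same shape.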
