Assigning the last valid value of a column

Posted 2024-04-25 00:29:22


I have a large DataFrame in which I compute a mean under a condition. I need to replace each NaN with the last valid value for that city.

I tried df['Mean3big'].fillna(method='ffill', inplace=True), but I get wrong values because it does not take the city into account.

df  = pd.DataFrame([["Gothenburg", "2018", 1.5, 2.3, 107],
["Gothenburg", 2018, 1.3, 3.3, 10],
["Gothenburg", 2018, 2.2, 2.3, 20],
["Gothenburg", 2018, 1.5, 2.1, 30],
["Gothenburg", 2018, 2.5, 2.3, 20],
["Malmo", 2018, 1.6, 2.3, 10],
["Gothenburg", 2018, 1.9, 2.8, 10],
["Malmo", 2018, 0.7, 4.3, 30],
["Gothenburg", 2018, 1.7, 3.2, 40],
["Malmo", 2018, 1.0, 3.3, 40],
["Gothenburg", 2018, 3.7, 2.3, 10],
["Malmo", 2018, 1.0, 2.9, 112],
["Gothenburg", 2018, 2.7, 2.3, 20],
["Gothenburg", 2019, 1.3, 3.3, 10],
["Gothenburg", 2019, 1.2, 2.3, 20],
["Gothenburg", 2019, 1.6, 2.1, 10],
["Gothenburg", 2019, 1.8, 2.3, 10],
["Malmo", 2019, 1.6, 1.3, 20],
["Gothenburg", 2019, 1.9, 2.8, 30]])

df.columns = ['City', 'Year', 'Val1', 'Val2', 'Val3']
df["Mean3big"] = round(df.groupby(['City', "Year"])['Val3'].transform(lambda x: x.expanding().mean().shift()).where(df['Val1'] > 1.6), 2)

My result:

          City  Year  Val1  Val2  Val3  Mean3big
0   Gothenburg  2018   1.5   2.3   107       NaN
1   Gothenburg  2018   1.3   3.3    10       NaN
2   Gothenburg  2018   2.2   2.3    20     10.00
3   Gothenburg  2018   1.5   2.1    30       NaN
4   Gothenburg  2018   2.5   2.3    20     20.00
5        Malmo  2018   1.6   2.3    10       NaN
6   Gothenburg  2018   1.9   2.8    10     20.00
7        Malmo  2018   0.7   4.3    30       NaN
8   Gothenburg  2018   1.7   3.2    40     18.00
9        Malmo  2018   1.0   3.3    40       NaN
10  Gothenburg  2018   3.7   2.3    10     21.67
11       Malmo  2018   1.0   2.9   112       NaN
12  Gothenburg  2018   2.7   2.3    20     20.00
13  Gothenburg  2019   1.3   3.3    10       NaN
14  Gothenburg  2019   1.2   2.3    20       NaN
15  Gothenburg  2019   1.6   2.1    10       NaN
16  Gothenburg  2019   1.8   2.3    10     13.33
17       Malmo  2019   1.6   1.3    20       NaN
18  Gothenburg  2019   1.9   2.8    30     12.50

I want Mean3big in row 3 to hold the last valid value for the city "Gothenburg", which is 10. NaN is fine in rows 0 and 1, because there is no earlier valid value.

Row 7 should be 20, which is the last valid value for "Malmo". NaN is fine in row 5, since there is no earlier valid value, and so on.


1 answer
User
#1 · Posted 2024-04-25 00:29:22

I didn't quite follow your last sentence, but maybe try this:

import pandas as pd

df = pd.DataFrame(
    [
        ["Gothenburg", "2018", 1.5, 2.3, 107],
        ["Gothenburg", 2018, 1.3, 3.3, 10],
        ["Gothenburg", 2018, 2.2, 2.3, 20],
        ["Gothenburg", 2018, 1.5, 2.1, 30],
        ["Gothenburg", 2018, 2.5, 2.3, 20],
        ["Malmo", 2018, 1.6, 2.3, 10],
        ["Gothenburg", 2018, 1.9, 2.8, 10],
        ["Malmo", 2018, 0.7, 4.3, 30],
        ["Gothenburg", 2018, 1.7, 3.2, 40],
        ["Malmo", 2018, 1.0, 3.3, 40],
        ["Gothenburg", 2018, 3.7, 2.3, 10],
        ["Malmo", 2018, 1.0, 2.9, 112],
        ["Gothenburg", 2018, 2.7, 2.3, 20],
        ["Gothenburg", 2019, 1.3, 3.3, 10],
        ["Gothenburg", 2019, 1.2, 2.3, 20],
        ["Gothenburg", 2019, 1.6, 2.1, 10],
        ["Gothenburg", 2019, 1.8, 2.3, 10],
        ["Malmo", 2019, 1.6, 1.3, 20],
        ["Gothenburg", 2019, 1.9, 2.8, 30],
    ]
)

df.columns = ['City', 'Year', 'Val1', 'Val2', 'Val3']
df["Mean3big"] = round(
    df.groupby(['City', "Year"])['Val3']
    .transform(lambda x: x.expanding().mean().shift())
    .where(df['Val1'] > 1.6),
    2,
)
print(df)

valids = {}  # last valid Mean3big seen so far, keyed by city
for index, row in df.iterrows():
    if pd.isna(row['Mean3big']):
        # NaN: fill from this city's last valid value, if there is one
        if row['City'] in valids:
            df.at[index, 'Mean3big'] = valids[row['City']]
    else:
        # valid value: remember it for later rows of the same city
        valids[row['City']] = row['Mean3big']

print(df)

This is a single pass over the rows, so the time complexity is O(n).
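As an alternative to the explicit loop, pandas can forward fill within each group. The following is only a minimal sketch, assuming the goal is "last valid value per City" regardless of Year, which is what the loop does. To stay short, the frame here reproduces only the City and Mean3big columns from the result table in the question rather than recomputing them.

```python
import numpy as np
import pandas as pd

nan = np.nan

# City and Mean3big copied from the question's result table (rows 0 to 18)
cities = (
    ["Gothenburg"] * 5
    + ["Malmo", "Gothenburg"] * 4
    + ["Gothenburg"] * 4
    + ["Malmo", "Gothenburg"]
)
mean3big = [nan, nan, 10.00, nan, 20.00, nan, 20.00, nan, 18.00, nan,
            21.67, nan, 20.00, nan, nan, nan, 13.33, nan, 12.50]
df = pd.DataFrame({"City": cities, "Mean3big": mean3big})

# Forward fill inside each city; the result keeps the original row order
filled = df.groupby("City")["Mean3big"].ffill()

df["Mean3big"] = filled
print(df)
```

With this data, row 3 becomes 10.0 and rows 13 to 15 become 20.0, while rows 0 and 1 stay NaN. All Malmo rows also stay NaN, because Malmo has no valid Mean3big value anywhere in the table. Note that pandas 2.1 and later deprecate the method argument of fillna in favor of ffill(), so the grouped ffill() form also avoids that warning.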
