How do I pass values through only where the data has x consecutive non-null values?

Posted 2024-04-20 10:07:59


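I have a 60-year time series of monthly temperature anomaly data. I only want to pass through the temperature values where the anomaly stays above 0.5 for 6 or more consecutive months. Replacing the values < 0.5 with NaN is easy, but I'm not sure how to also replace values that are > 0.5 but sit in a run of only 2 or 3 consecutive values above 0.5. Snippet below: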

time = [1950.04167, 1950.125  , 1950.20833, 1950.29167, 1950.375  ,
       1950.45833, 1950.54167, 1950.625  , 1950.70833, 1950.79167,
       1950.875  , 1950.95833, 1951.04167, 1951.125  , 1951.20833,
       1951.29167, 1951.375  , 1951.45833, 1951.54167, 1951.625  ,
       1951.70833, 1951.79167, 1951.875  , 1951.95833, 1952.04167,
       1952.125  , 1952.20833, 1952.29167, 1952.375  , 1952.45833,
       1952.54167, 1952.625  , 1952.70833, 1952.79167, 1952.875  ,
       1952.95833, 1953.04167, 1953.125  , 1953.20833, 1953.29167,
       1953.375  , 1953.45833, 1953.54167, 1953.625  , 1953.70833,
       1953.79167, 1953.875  , 1953.95833, 1954.04167, 1954.125  ,
       1954.20833, 1954.29167, 1954.375  , 1954.45833, 1954.54167,
       1954.625  , 1954.70833, 1954.79167, 1954.875  , 1954.95833]


sst = [-1.67623 , -1.685853, -1.69083 , -1.61898 , -1.40235 ,
       -1.097773, -0.835867, -0.718727, -0.694087, -0.785423,
       -0.9312  , -1.01925 , -0.8868  , -0.48022 , -0.007597,
        0.448647,  0.66546 ,  0.852427, 0.922443,  1.14481 ,
        1.291153,  1.338903,  0.993053,  0.68006, 0.493597,
        0.500197,  0.528363,  0.515583,  0.418493,  0.168387,
       -0.003403,  0.033933,  0.15759 ,  0.113847,  0.019967,
        0.111413, 0.372967,  0.623067,  0.763903,  0.909743,
        0.990287,  1.01288 , 0.969407,  0.985817,  0.982607,
        1.01244 ,  1.039917,  1.11755, 1.044333,  0.799593,
        0.3769  ,  0.105033, -0.070743, -0.281483, -0.59861,
        -0.875743, -0.88768 , -0.642517, -0.548043, -0.547057]


import pandas as pd

series = pd.Series(index=time, data=sst)
# Keeps every value >= 0.5, regardless of how long the run is
greater = series.where(cond=(series >= 0.5))

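For example, I'd like to "pass through" the SST values for the time spans 1951.375 to 1951.95833 and 1953.125 to 1954.125, which contain 8 and 13 consecutive SST values greater than 0.5 respectively, but replace with NaN the SST values for 1952.125 to 1952.29167, where only 3 consecutive values are > 0.5.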

Any suggestions? TIA


1 answer
User
#1 · Posted 2024-04-20 10:07:59

You can use series.groupby(series.le(0.5).cumsum()) to find the lengths of the runs of values > 0.5, then use .apply() to replace the values in runs that are too short.

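Each group starts at a value <= 0.5, so .groupby ends up lumping that value in with the run that follows it, and a group is one element longer than its run. To keep only runs of 6 or more, we replace groups shorter than 7 entirely with np.nan, and in the remaining groups replace just the first value with np.nan. This relies on the series starting with a value <= 0.5, as it does here.

In [61]: (
    series
    # group_keys=False keeps the original index on the result
    .groupby(series.le(0.5).cumsum(), group_keys=False)
    # a group is one leading value <= 0.5 plus its run, so a run of 6+ means len(x) >= 7
    .apply(lambda x: pd.Series(np.nan if len(x) < 7 else [np.nan] + list(x)[1:], x.index))
)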
Out[61]:
1950.04167         NaN
1950.12500         NaN
1950.20833         NaN
1950.29167         NaN
1950.37500         NaN
1950.45833         NaN
1950.54167         NaN
1950.62500         NaN
1950.70833         NaN
1950.79167         NaN
1950.87500         NaN
1950.95833         NaN
1951.04167         NaN
1951.12500         NaN
1951.20833         NaN
1951.29167         NaN
1951.37500    0.665460
1951.45833    0.852427
1951.54167    0.922443
1951.62500    1.144810
1951.70833    1.291153
1951.79167    1.338903
1951.87500    0.993053
1951.95833    0.680060
1952.04167         NaN
1952.12500         NaN
1952.20833         NaN
1952.29167         NaN
1952.37500         NaN
1952.45833         NaN
1952.54167         NaN
1952.62500         NaN
1952.70833         NaN
1952.79167         NaN
1952.87500         NaN
1952.95833         NaN
1953.04167         NaN
1953.12500    0.623067
1953.20833    0.763903
1953.29167    0.909743
1953.37500    0.990287
1953.45833    1.012880
1953.54167    0.969407
1953.62500    0.985817
1953.70833    0.982607
1953.79167    1.012440
1953.87500    1.039917
1953.95833    1.117550
1954.04167    1.044333
1954.12500    0.799593
1954.20833         NaN
1954.29167         NaN
1954.37500         NaN
1954.45833         NaN
1954.54167         NaN
1954.62500         NaN
1954.70833         NaN
1954.79167         NaN
1954.87500         NaN
1954.95833         NaN
dtype: float64
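
As a possible alternative to the grouped .apply() above, the same idea can be sketched without a Python-level lambda by labelling each run and broadcasting its length back onto its elements. This is only a sketch: the function name keep_long_runs and its parameters threshold and min_run are hypothetical names chosen for illustration, and it uses the strict > 0.5 comparison from the answer (the question's snippet uses >=).

```python
import pandas as pd


def keep_long_runs(series, threshold=0.5, min_run=6):
    """Keep values that sit in a run of at least `min_run` consecutive
    values above `threshold`; everything else becomes NaN.
    (Hypothetical helper, not part of pandas.)"""
    above = series.gt(threshold)
    # A new run id starts wherever `above` flips between True and False.
    run_id = above.ne(above.shift()).cumsum()
    # Length of the run that each element belongs to.
    run_len = above.groupby(run_id).transform("size")
    return series.where(above & (run_len >= min_run))


# Small demo: one run of 2 and one run of 3 values above 0.5.
demo = pd.Series([0.1, 0.6, 0.7, 0.2, 0.8, 0.9, 1.0, 0.3])
result = keep_long_runs(demo, threshold=0.5, min_run=3)
print(result)
```

For the series in the question, keep_long_runs(series, threshold=0.5, min_run=6) would be the corresponding call. Because runs are labelled directly, this version does not depend on the series starting with a value <= 0.5.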
