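Accessing neighboring rows in a pandas DataFrame
I am trying to compute the local maxima and minima of a series of values: if the value in the current row is greater than or equal to both the previous and the next row's value, or less than or equal to both, set minmax to the current value; otherwise set it to NaN (not a number). Is there a more elegant way to do this than the following?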
import pandas as pd
import numpy as np

rng = pd.date_range('1/1/2014', periods=10, freq='5min')
s = pd.Series([1, 2, 3, 2, 1, 2, 3, 5, 7, 4], index=rng)
df = pd.DataFrame(s, columns=['val'])
df.index.name = "dt"
df['minmax'] = np.nan  # np.NaN was removed in NumPy 2.0

for i in range(len(df.index)):
    # The first and last rows have only one neighbor, so skip them.
    if i == 0:
        continue
    if i == len(df.index) - 1:
        continue
    cur = df['val'].iloc[i]
    prev = df['val'].iloc[i - 1]
    nxt = df['val'].iloc[i + 1]
    # Local maximum: not smaller than either neighbor.
    if cur >= prev and cur >= nxt:
        df.loc[df.index[i], 'minmax'] = cur  # avoids chained assignment
        continue
    # Local minimum: not larger than either neighbor.
    if cur <= prev and cur <= nxt:
        df.loc[df.index[i], 'minmax'] = cur
        continue

print(df)
The result is:
                     val  minmax
dt
2014-01-01 00:00:00    1     NaN
2014-01-01 00:05:00    2     NaN
2014-01-01 00:10:00    3       3
2014-01-01 00:15:00    2     NaN
2014-01-01 00:20:00    1       1
2014-01-01 00:25:00    2     NaN
2014-01-01 00:30:00    3     NaN
2014-01-01 00:35:00    5     NaN
2014-01-01 00:40:00    7       7
2014-01-01 00:45:00    4     NaN
1 Answer
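We can use shift and where to decide which values to assign. Importantly, when comparing Series element-wise we need the bitwise operators & and | rather than the keywords and/or, and each comparison must be wrapped in parentheses because & and | bind more tightly than < and >. shift returns a Series or DataFrame moved down by 1 row (the default) or by the number of periods you pass; a negative number moves it up.
With where, we pass a boolean condition; the second argument, NaN, is the value assigned wherever the condition is False.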
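To make the two building blocks concrete, here is a minimal sketch on a small toy Series (the data and the variable names prev_vals, next_vals, is_peak and peaks are made up for illustration and are not part of the question):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 2, 5])

prev_vals = s.shift(1)    # previous row's value; the first row becomes NaN
next_vals = s.shift(-1)   # next row's value; the last row becomes NaN

# Element-wise comparison: combine with & / | and parenthesise each side.
# Any comparison against NaN is False, so the edge rows never qualify.
is_peak = (s > prev_vals) & (s > next_vals)

# where() keeps values where the mask is True and writes NaN elsewhere.
peaks = s.where(is_peak, np.nan)

print(peaks)
```

Only the value 3 at position 1 survives here; the final 5 has no following row, so its comparison against NaN is False.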
In [81]:
df['minmax'] = df['val'].where(
    ((df['val'] < df['val'].shift(1)) & (df['val'] < df['val'].shift(-1)))
    | ((df['val'] > df['val'].shift(1)) & (df['val'] > df['val'].shift(-1))),
    np.nan)
df
Out[81]:
                     val  minmax
dt
2014-01-01 00:00:00    1     NaN
2014-01-01 00:05:00    2     NaN
2014-01-01 00:10:00    3       3
2014-01-01 00:15:00    2     NaN
2014-01-01 00:20:00    1       1
2014-01-01 00:25:00    2     NaN
2014-01-01 00:30:00    3     NaN
2014-01-01 00:35:00    5     NaN
2014-01-01 00:40:00    7       7
2014-01-01 00:45:00    4     NaN
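One difference worth noting: the loop in the question compares with >= and <=, while the expression above uses strict < and >. On this data there are no ties, so both give the same result, but on plateaus they differ. A possible way to package both behaviours is sketched below; the function name local_extrema and its strict parameter are hypothetical, chosen only for this example.

```python
import pandas as pd


def local_extrema(values: pd.Series, strict: bool = True) -> pd.Series:
    """Keep values that are local maxima or minima; NaN elsewhere."""
    prev_vals = values.shift(1)
    next_vals = values.shift(-1)
    if strict:
        is_max = (values > prev_vals) & (values > next_vals)
        is_min = (values < prev_vals) & (values < next_vals)
    else:
        # Non-strict comparison, as in the question's loop: ties count.
        is_max = (values >= prev_vals) & (values >= next_vals)
        is_min = (values <= prev_vals) & (values <= next_vals)
    # where() fills with NaN by default where the mask is False.
    return values.where(is_max | is_min)


rng = pd.date_range('1/1/2014', periods=10, freq='5min')
df = pd.DataFrame({'val': [1, 2, 3, 2, 1, 2, 3, 5, 7, 4]}, index=rng)
df.index.name = 'dt'
df['minmax'] = local_extrema(df['val'])
print(df)
```

The first and last rows always come out as NaN, because comparing against the NaN introduced by shift is False; this matches the loop, which skips those rows explicitly.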