Python Pandas time series: how to find the longest run of values greater than a given value

Asked 2025-04-18 11:11

How can I find the longest consecutive run in a time series?

For example, I have a DataFrame like this:

index      Value 
1-1-2012   10
1-2-2012   14
1-3-2012   15
1-4-2012   8
1-5-2012   7
1-6-2012   16
1-7-2012   17
1-8-2012   18

Now I want to find the longest consecutive run: in this example, that is the run from 1-6-2012 to 1-8-2012, which contains 3 data points.

Thanks!
Anya

1 Answer

This approach is a bit clunky, but it gets the job done. Since you didn't specify the "particular value" mentioned in the title, I picked 12.

import pandas as pd

time_index = pd.date_range(start='2012-01-01', end='2012-08-01', freq='MS')
data = [10, 14, 15, 8, 7, 16, 17, 18]
df = pd.DataFrame({'vals': data, 't_indices': time_index})

threshold = 12
df['tag'] = df.vals > threshold

# make another DF to hold info about each region
regs_above_thresh = pd.DataFrame()

# first row of a consecutive region is a True preceded by a False in tags
# (fill_value=False keeps the shifted column boolean, so ~ negates it correctly)
regs_above_thresh['start_idx'] = \
    df.index[df['tag'] & ~df['tag'].shift(1, fill_value=False)]

# last row of a consecutive region is a True followed by a False in tags
regs_above_thresh['end_idx'] = \
    df.index[df['tag'] & ~df['tag'].shift(-1, fill_value=False)]

# how long is each region, in rows (end and start are both inclusive)
regs_above_thresh['spans'] = \
    regs_above_thresh['end_idx'] - regs_above_thresh['start_idx'] + 1

# index label of the region with the longest span
max_idx = regs_above_thresh['spans'].idxmax()

# we can get the start and end points of the longest region from the original dataframe
# (.ix was removed from pandas; .loc does the label-based lookup here)
df.loc[regs_above_thresh.loc[max_idx, ['start_idx', 'end_idx']].values]

The trick for finding the consecutive regions comes from this solution by behzad.nouri.
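
For comparison, here is a more compact sketch of the same idea. It is not the code from the answer above: it swaps the start/end bookkeeping for a run label built with a cumulative sum, and the names `longest_run_above` and `run_id` are made up for illustration. It assumes the series has a unique index and that "greater than" means strictly greater.

```python
import pandas as pd

def longest_run_above(values: pd.Series, threshold: float) -> pd.Series:
    """Return the longest consecutive slice of `values` strictly above `threshold`.

    Hypothetical helper for illustration; returns an empty Series
    if no value exceeds the threshold.
    """
    above = values > threshold
    if not above.any():
        return values.iloc[0:0]
    # A new run starts wherever `above` differs from the previous row,
    # so the cumulative sum of those change points labels each run.
    run_id = (above != above.shift()).cumsum()
    # Count rows per run, looking only at rows that are above the threshold.
    run_lengths = above[above].groupby(run_id[above]).size()
    # idxmax picks the earliest run when several share the maximum length.
    return values[run_id == run_lengths.idxmax()]

dates = pd.date_range(start='2012-01-01', periods=8, freq='MS')
series = pd.Series([10, 14, 15, 8, 7, 16, 17, 18], index=dates)
longest = longest_run_above(series, threshold=12)
print(longest)
```

With the sample data and a threshold of 12, `longest` holds the three rows from 2012-06-01 through 2012-08-01, and `len(longest)` gives the length of the run.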
