Pandas - event separation - .iloc iteritem()?

0 votes

1 answer

1040 views

Asked 2025-04-18 18:01

I have a file called sample_data.txt that contains some data.

Precision= Waterdrops

2009-11-17 14:00:00,4.9,
2009-11-17 14:30:00,6.1,
2009-11-17 15:00:00,5.3,
2009-11-17 15:30:00,3.3,
2009-11-17 16:00:00,4.9,

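I need to split the data up: keep the values greater than zero and identify changes (events) separated by a time interval of more than 2 hours. So far I have written: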

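import numpy as np
import pandas as pd

file_path  = 'sample_data.txt'
# enumerate starts at 2, so this skips through the line after the 'Precision=' header
df = pd.read_csv(file_path, skiprows = [num for (num,line) in enumerate(open(file_path),2) if 'Precision=' in line][0],
                 parse_dates =  True,index_col = 0,header= None, sep =',',
                 names = ['meteo', 'empty'])
df['date'] = df.index
df = df.drop(['empty'], axis=1)
df = df[df.meteo>20]
# Time elapsed since the previous retained row
df['diff'] = df.date-df.date.shift(1)
# A gap of more than 2 hours starts a new section
df['sections'] = (df['diff'] > np.timedelta64(2, "h")).astype(int).cumsum()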

From the code above, I get:

                   meteo    date                diff       sections
2009-12-15 12:00:00 23.8    2009-12-15 12:00:00 NaT         0
2009-12-15 13:00:00 23.0    2009-12-15 13:00:00 01:00:00    0
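
For readers who want to reproduce the segmentation step without the data file, here is a minimal sketch on made-up timestamps. The `demo` frame and its values are assumptions for illustration, not the asker's data, and `pd.Timedelta(hours=2)` stands in for `np.timedelta64(2, "h")`.

```python
import pandas as pd

# Hypothetical readings: three samples, a gap of more than 2 hours, then two more.
times = pd.to_datetime([
    "2009-12-15 12:00", "2009-12-15 13:00", "2009-12-15 14:00",
    "2009-12-15 18:00", "2009-12-15 19:00",
])
demo = pd.DataFrame({"meteo": [23.8, 23.0, 21.5, 25.1, 22.4]}, index=times)
demo["date"] = demo.index

# Time elapsed since the previous row (NaT for the first row).
demo["diff"] = demo["date"].diff()

# A gap longer than 2 hours starts a new section; cumsum turns the flags into ids.
demo["sections"] = (demo["diff"] > pd.Timedelta(hours=2)).astype(int).cumsum()

print(demo[["meteo", "diff", "sections"]])
```

The first row compares `NaT` against the threshold, which evaluates to False, so it lands in section 0 together with the rows that follow it.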

If I use:

df.date.iloc[[0, -1]].reset_index(drop=True)

I get:

0   2009-12-15 12:00:00
1   2012-12-05 16:00:00
Name: date, dtype: datetime64[ns]

These are the start and end dates of my sample_data.txt.

How can I apply .iloc[[0, -1]].reset_index(drop=True) to each df['sections'] category?

I tried using .apply:

def f(s):
    return s.iloc[[0, -1]].reset_index(drop=True)

df.groupby(df['sections']).apply(f)

The result is: IndexError: positional indexers are out-of-bounds

Tags: data processing, data analysis, pandas, dataframe, time series, data filtering, event detection, indexing

1 Answer

I don't quite understand why you need the detour through reset_index(drop=True). My approach would be a bit simpler. Starting from

df

   sections       meteo      date      diff
0         0  2009-12-15  12:00:00       NaT
1         0  2009-12-15  13:00:00  01:00:00
0         1  2009-12-15  12:00:00       NaT
1         1  2009-12-15  13:00:00  01:00:00

you can do the following (after sorting with sort_values(['sections', 'date']) so that iloc[[0, -1]] really picks the start and the end; otherwise just use min() and max()):

def f(s):
    return s.iloc[[0, -1]]['date']
df.groupby('sections').apply(f)

date             0         1
sections                    
0         12:00:00  13:00:00
1         12:00:00  13:00:00
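
The sorting caveat can be made concrete with a small sketch. The `demo` frame, the helper name `first_last`, and the `start`/`end` column labels are all assumptions for illustration. The rows are deliberately shuffled, so the positional first and last entries are only meaningful after sorting.

```python
import pandas as pd

# Hypothetical frame with a precomputed 'sections' column, deliberately unsorted.
demo = pd.DataFrame({
    "sections": [1, 0, 0, 1, 0],
    "date": pd.to_datetime([
        "2009-12-15 19:00", "2009-12-15 13:00", "2009-12-15 12:00",
        "2009-12-15 18:00", "2009-12-15 14:00",
    ]),
})

def first_last(dates):
    # Positional first and last value of one group's 'date' column.
    return dates.iloc[[0, -1]].reset_index(drop=True)

# Sort first so that position 0 is the earliest row and -1 the latest.
ordered = demo.sort_values(["sections", "date"])

# apply returns a Series indexed by (sections, 0/1); unstack makes it wide.
bounds = ordered.groupby("sections")["date"].apply(first_last).unstack()
bounds.columns = ["start", "end"]

print(bounds)
```

Selecting the `date` column before calling `apply` means the helper receives a Series rather than the whole group frame, which keeps the result to the two timestamps of interest.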

Or, as an even simpler approach:

df.groupby('sections')['date'].agg(['max', 'min'])
               max       min
sections                    
0         13:00:00  12:00:00
1         13:00:00  12:00:00
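
As a possible extension of the aggregation idea, named aggregation can label the two columns directly and make the event duration a one-line subtraction. This is a sketch on made-up data: the `demo` frame and the `start`, `end`, and `duration` names are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical frame with a precomputed 'sections' column.
demo = pd.DataFrame({
    "sections": [0, 0, 0, 1, 1],
    "date": pd.to_datetime([
        "2009-12-15 12:00", "2009-12-15 13:00", "2009-12-15 14:00",
        "2009-12-15 18:00", "2009-12-15 19:00",
    ]),
})

# Named aggregation: one row per section, earliest and latest timestamp.
events = demo.groupby("sections")["date"].agg(start="min", end="max")

# Length of each event, as a Timedelta.
events["duration"] = events["end"] - events["start"]

print(events)
```

Because `min` and `max` do not depend on row order, this variant needs no prior sort.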

Answered 2025-04-18 by Python大师
