Identifying index values of non-consecutive zeros

DATE         VAL
2007-06-26    0.000000
2007-06-27    0.000000
2007-06-28    0.000000
2007-06-29   -0.006408
2007-07-02    0.000000
2007-07-03    0.000000
2007-07-04   -0.000003
2007-07-05    0.000000
2007-07-06    0.000000
2007-07-09    0.000000
2007-07-10   -0.018858
2007-07-11   -0.015624
2007-07-12    0.000000
2007-07-13    0.000000
2007-07-16   -0.008562
2007-07-17   -0.006587

START       END         DAYS  MIN
2007-06-29  2007-06-29  1     -0.006408
2007-07-04  2007-07-04  1     -0.000003
2007-07-10  2007-07-11  2     -0.018858
2007-07-16  2007-07-17  2     -0.008562
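None of the answers below construct `df`, so here is a minimal sketch that rebuilds the sample data for trying them out. It assumes DATE is a regular datetime column (the second answer assumes it is the index, so call `df.set_index('DATE')` first if you want to mirror that setup).

```python
import pandas as pd

# Sample data copied from the question
dates = [
    "2007-06-26", "2007-06-27", "2007-06-28", "2007-06-29",
    "2007-07-02", "2007-07-03", "2007-07-04", "2007-07-05",
    "2007-07-06", "2007-07-09", "2007-07-10", "2007-07-11",
    "2007-07-12", "2007-07-13", "2007-07-16", "2007-07-17",
]
vals = [
    0.0, 0.0, 0.0, -0.006408,
    0.0, 0.0, -0.000003, 0.0,
    0.0, 0.0, -0.018858, -0.015624,
    0.0, 0.0, -0.008562, -0.006587,
]

# Assumption: DATE is kept as a datetime column rather than the index
df = pd.DataFrame({"DATE": pd.to_datetime(dates), "VAL": vals})
print(df)
```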

3 Answers

User

#1 · edited 2024-04-23 06:24:20

This one uses logic similar to the original solution (Allen's), but with fewer `apply` calls. I'm not sure how the performance compares.

# a new group begins when previous value is 0, but current is negative
df['NEW_GROUP'] = df['VAL'].shift(1) == 0
df['NEW_GROUP'] &= df['VAL'] < 0

# Group by the running count of new groups seen so far, which gives each run its group number.
# Directly return a Series from `apply` to obviate further transformations
print(df.loc[df['VAL'] < 0]
        .groupby(df['NEW_GROUP'].cumsum())
        .apply(lambda x: pd.Series([x.DATE.iloc[0], x.DATE.iloc[-1], x.VAL.min(), len(x)],
                        index=['START','END','MIN','DAYS'])))

Output:

          START      END         MIN         DAYS
NEW_GROUP                                      
1         2007-06-29 2007-06-29 -0.006408     1
2         2007-07-04 2007-07-04 -0.000003     1
3         2007-07-10 2007-07-11 -0.018858     2
4         2007-07-16 2007-07-17 -0.008562     2
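To see why the `shift` comparison plus `cumsum` numbers the runs, here is a small sketch on toy data. The frame `toy` and the names `starts`, `group_no`, and `negative_groups` are made up for illustration and are not part of the answer.

```python
import pandas as pd

# Hypothetical toy data, not the question's series
toy = pd.DataFrame({"VAL": [0.0, -0.1, 0.0, 0.0, -0.2, -0.3]})

# A run starts where the previous value is 0 and the current one is negative
starts = toy["VAL"].shift(1).eq(0) & toy["VAL"].lt(0)

# The running count of run starts is the group number of every later row
group_no = starts.cumsum()

# Only the negative rows matter for the final aggregation
negative_groups = group_no[toy["VAL"] < 0]

print(starts.tolist())           # [False, True, False, False, True, False]
print(negative_groups.tolist())  # [1, 2, 2]
```

The zero rows also carry a group number, which is why the answer filters on `df['VAL'] < 0` before grouping.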

User

#2 · edited 2024-04-23 06:24:20

Solution for pandas 0.25+ using named aggregation:

#convert DatetimeIndex to column
df = df.reset_index()
#boolean mask of rows where VAL equals 0
m = df['VAL'].eq(0)
#label consecutive runs, then keep only the labels of non-zero rows by inverting the mask with ~
g = m.ne(m.shift()).cumsum()[~m]
#aggregate by group
df1 = df.groupby(g).agg(START=('DATE','first'),
                        END=('DATE','last'),
                        DAYS= ('DATE', 'size'),
                        MIN=('VAL','min')).reset_index(drop=True)
print (df1)
       START        END  DAYS       MIN
0 2007-06-29 2007-06-29     1 -0.006408
1 2007-07-04 2007-07-04     1 -0.000003
2 2007-07-10 2007-07-11     2 -0.018858
3 2007-07-16 2007-07-17     2 -0.008562

Solution for pandas < 0.25: pass a dictionary to agg and set the new column names at the end:

df = df.reset_index()
m = df['VAL'].eq(0)
g = m.ne(m.shift()).cumsum()[~m]

df1 = df.groupby(g).agg({'DATE':['first','last','size'], 'VAL':'min'}).reset_index(drop=True)
df1.columns = ['START','END','DAYS','MIN']
print (df1)
       START        END  DAYS       MIN
0 2007-06-29 2007-06-29     1 -0.006408
1 2007-07-04 2007-07-04     1 -0.000003
2 2007-07-10 2007-07-11     2 -0.018858
3 2007-07-16 2007-07-17     2 -0.008562
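The grouping key `m.ne(m.shift()).cumsum()[~m]` is the core of both variants above. A minimal sketch of what each step produces, using a made-up series `vals` (the name `run_id` is also just for illustration):

```python
import pandas as pd

# Hypothetical toy series, not the question's data
vals = pd.Series([0.0, 0.0, -0.5, 0.0, -0.2, -0.3, 0.0])

m = vals.eq(0)                     # True on zero rows
run_id = m.ne(m.shift()).cumsum()  # increases by 1 each time m flips
g = run_id[~m]                     # keep the run ids of non-zero rows only

print(run_id.tolist())  # [1, 1, 2, 3, 4, 4, 5]
print(g.to_dict())      # {2: 2, 4: 4, 5: 4}
```

Because `g` only has labels for the non-zero rows, `df.groupby(g)` aligns on the index and leaves the zero rows out of every group, so no separate filtering step is needed.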

User

#3 · edited 2024-04-23 06:24:20

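First create a flag that finds the non-zero records and assigns each consecutive run to the same group, then group by that flag and compute the required attributes.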

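import numpy as np
import pandas as pd

(
    # Flag zero (non-negative) rows with 1; negative rows start as NaN
    df.assign(Flag = np.where(df.VAL.ge(0), 1, np.nan))
    # Negative rows take the running count of zero rows seen so far as their group id
    .assign(Flag = lambda x: x.Flag.fillna(x.Flag.cumsum().ffill()))
    # Keep only negative rows; filtering on Flag.ne(1) would also drop a run preceded by exactly one zero row
    .loc[lambda x: x.VAL.lt(0)]
    .groupby('Flag')
    .apply(lambda x: [x.DATE.iloc[0], x.DATE.iloc[-1], len(x), x.VAL.min()])
    .apply(pd.Series)
    # set_axis returns a new object; the inplace argument was removed in pandas 2.0
    .set_axis(['START','END','DAYS','MIN'], axis=1)
)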


        START       END         DAYS    MIN
Flag                
3.0     2007-06-29  2007-06-29  1   -0.006408
5.0     2007-07-04  2007-07-04  1   -0.000003
8.0     2007-07-10  2007-07-11  2   -0.018858
10.0    2007-07-16  2007-07-17  2   -0.008562
