pandas DataFrame: find the longest run of consecutive rows that meet a condition

Posted 2024-03-29 06:05:22


Below is a pandas DataFrame named "df":

             A
2015-05-01  True
2015-05-02  True
2015-05-03  False
2015-05-04  False
2015-05-05  False
2015-05-06  False
2015-05-07  True
2015-05-08  False
2015-05-09  False

I want to return a slice covering the longest run of consecutive rows in which column 'A' reads False. Can this be done?


2 answers

You can use cumsum to detect changes in column A, because booleans can be summed in Python.

# Test data
import pandas as pd

df = pd.DataFrame([True, True, False, False, False, False, True, False, False],
                  index=pd.to_datetime(['2015-05-01', '2015-05-02', '2015-05-03',
                                        '2015-05-04', '2015-05-05', '2015-05-06',
                                        '2015-05-07', '2015-05-08', '2015-05-09']),
                  columns=['A'])

# We have to ensure that the index is sorted
df.sort_index(inplace=True)
# Resetting the index to create a column
df.reset_index(inplace=True)

# Grouping by the cumsum and counting the number of dates and getting their min and max
df = df.groupby(df['A'].cumsum()).agg(
    {'index': ['count', 'min', 'max']})

# Removing useless column level
df.columns = df.columns.droplevel()

print(df)
#    count        min        max
# A                             
# 1      1 2015-05-01 2015-05-01
# 2      5 2015-05-02 2015-05-06
# 3      3 2015-05-07 2015-05-09

# Getting the max
df[df['count']==df['count'].max()]

#    count        min        max
# A                             
# 2      5 2015-05-02 2015-05-06
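
To see what grouping key `df['A'].cumsum()` produces on this data, here is a small sketch. The variable names `a` and `labels` are assumptions chosen for illustration.

```python
import pandas as pd

# Same boolean values as column A in the question.
a = pd.Series([True, True, False, False, False, False, True, False, False])

# cumsum treats True as 1 and False as 0, so the running total
# only increases on rows where the value is True.
labels = a.cumsum().tolist()
print(labels)
```

Each True row starts a new label, and any False rows that follow it share that label. That is why the groups above have sizes 1, 5 and 3.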

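Sorry to bring back an old post, but I noticed something off in Romain's answer: the counts are wrong, which makes the result inaccurate. The count column should have 4 entries, [2, 4, 1, 2], with a maximum of 4.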
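
The expected run lengths can be checked independently of pandas. A minimal sketch using `itertools.groupby` from the standard library, with `values`, `runs` and `run_lengths` as illustrative names:

```python
from itertools import groupby

# Column A from the question, in date order.
values = [True, True, False, False, False, False, True, False, False]

# itertools.groupby yields one group per run of equal adjacent values.
runs = [(key, len(list(group))) for key, group in groupby(values)]
run_lengths = [length for _, length in runs]

print(runs)
print(run_lengths)
```

The longest run here is the second one, four consecutive False values.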

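To demonstrate the problem, I've broken it down a bit (df is the same as in the accepted answer above). You can see that the resulting groups are incorrect: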

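# sort (without inplace=True, sort_index returns a new DataFrame)
dfS = df.sort_index()
# reset (the dates become a column named 'index')
dfSR = dfS.reset_index()
# group on the cumsum of A, taken from the same frame being grouped
dfG = dfSR.groupby(dfSR['A'].cumsum())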

# show resulting groups
for group in dfG: print(group)

# (1,        index     A
# 0 2015-05-01  True)
# (2,        index      A
# 1 2015-05-02   True
# 2 2015-05-03  False
# 3 2015-05-04  False
# 4 2015-05-05  False
# 5 2015-05-06  False)
# (3,        index      A
# 6 2015-05-07   True
# 7 2015-05-08  False
# 8 2015-05-09  False)

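Thanks to DSM's answer here, and of course Romain's answer, combining the techniques from the two posts gives the answer. They are already explained in the posts they come from, so I'll go straight to the code below.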

[code block not preserved in the source]
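
A minimal sketch of such a combination, not the author's original code. It assumes the technique attributed to DSM is the common idiom of comparing a column with its shifted self and taking the cumsum, and it reuses the count/min/max aggregation from the first answer. The names `run_id`, `grouped`, `run_stats` and `longest` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [True, True, False, False, False, False, True, False, False]},
    index=pd.to_datetime(["2015-05-01", "2015-05-02", "2015-05-03",
                          "2015-05-04", "2015-05-05", "2015-05-06",
                          "2015-05-07", "2015-05-08", "2015-05-09"]),
)

# Sort by date, then move the dates into a column named 'index'.
df = df.sort_index().reset_index()

# A new run starts wherever A differs from the previous row,
# so the cumsum of those change points labels each run 1, 2, 3, ...
run_id = (df["A"] != df["A"].shift()).cumsum()
grouped = df.groupby(run_id)

# Show the resulting groups.
for key, group in grouped:
    print(key)
    print(group)

# Length, first date and last date of each run.
run_stats = grouped["index"].agg(["count", "min", "max"])
print(run_stats)

# The run(s) with the largest count.
longest = run_stats[run_stats["count"] == run_stats["count"].max()]
print(longest)
```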

Output:

(1,        index     A
0 2015-05-01  True
1 2015-05-02  True)
(2,        index      A
2 2015-05-03  False
3 2015-05-04  False
4 2015-05-05  False
5 2015-05-06  False)
(3,        index     A
6 2015-05-07  True)
(4,        index      A
7 2015-05-08  False
8 2015-05-09  False)

    count        min        max
A                             
1      2 2015-05-01 2015-05-02
2      4 2015-05-03 2015-05-06
3      1 2015-05-07 2015-05-07
4      2 2015-05-08 2015-05-09 

   count        min        max
A                             
2      4 2015-05-03 2015-05-06
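
The question asks for a slice of the rows themselves, restricted to runs where A is False, whereas picking the largest count could in general select a run of True values. A hedged sketch of one way to do that, using the same compare-with-shift labelling; `run_id`, `false_ids`, `false_run_sizes` and `longest_false_run` are illustrative names, and the code assumes the column contains at least one False row.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [True, True, False, False, False, False, True, False, False]},
    index=pd.to_datetime(["2015-05-01", "2015-05-02", "2015-05-03",
                          "2015-05-04", "2015-05-05", "2015-05-06",
                          "2015-05-07", "2015-05-08", "2015-05-09"]),
).sort_index()

# Label each run of equal consecutive values with an increasing integer.
run_id = (df["A"] != df["A"].shift()).cumsum()

# Keep only the labels of rows where A is False, then count rows per run.
false_ids = run_id[~df["A"]]
false_run_sizes = false_ids.groupby(false_ids).size()

# idxmax returns the first label with the largest size,
# so the earliest run is chosen if several runs tie.
longest_false_run = df[run_id == false_run_sizes.idxmax()]
print(longest_false_run)
```

The result keeps the original date index, so it can be used directly as the requested slice.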
