pandas DataFrame: finding the longest run of consecutive rows that meet a condition

2 answers

User

#1 · edited 2024-05-16 01:15:39

You can use cumsum to detect changes in column A, since booleans can be summed in Python.
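As a quick illustration of what summing booleans looks like, here is a minimal sketch on a throwaway Series (the name `flags` is made up for this example and is not part of the answer's data):

```python
import pandas as pd

# Hypothetical toy data, only to show how cumsum treats booleans
flags = pd.Series([True, True, False, False, True])

# True counts as 1 and False as 0, so the running total
# goes up by one on every True row and stays flat on False rows
running_total = flags.cumsum()
print(running_total.tolist())  # [1, 2, 2, 2, 3]
```

Note that the total only increases on True rows, so every False row ends up with the same label as the most recent True row.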

import pandas as pd

# Test data
df = pd.DataFrame([True, True, False, False, False, False, True, False, False],
                  index=pd.to_datetime(['2015-05-01', '2015-05-02', '2015-05-03',
                                        '2015-05-04', '2015-05-05', '2015-05-06',
                                        '2015-05-07', '2015-05-08', '2015-05-09']),
                  columns=['A'])

# We have to ensure that the index is sorted
df.sort_index(inplace=True)
# Resetting the index to create a column
df.reset_index(inplace=True)

# Grouping by the cumsum and counting the number of dates and getting their min and max
df = df.groupby(df['A'].cumsum()).agg(
    {'index': ['count', 'min', 'max']})

# Removing useless column level
df.columns = df.columns.droplevel()

print(df)
#    count        min        max
# A                             
# 1      1 2015-05-01 2015-05-01
# 2      5 2015-05-02 2015-05-06
# 3      3 2015-05-07 2015-05-09

# Getting the max
df[df['count']==df['count'].max()]

#    count        min        max
# A                             
# 2      5 2015-05-02 2015-05-06

User

#2 · edited 2024-05-16 01:15:39

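Sorry to revive an old thread, but I noticed something off in 罗曼's answer: the counts are wrong, which makes the result inaccurate. The count column should have 4 entries, [2, 4, 1, 2], with a maximum of 4.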
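The expected run lengths can be checked without pandas. The following is a small sketch that uses `itertools.groupby` from the standard library on the same nine values as the test data; the variable names are made up for this example:

```python
from itertools import groupby

# Same values as column A in the test data
values = [True, True, False, False, False, False, True, False, False]

# groupby yields one group per run of equal adjacent values,
# so the length of each group is the length of that run
run_lengths = [len(list(run)) for _, run in groupby(values)]

print(run_lengths)       # [2, 4, 1, 2]
print(max(run_lengths))  # 4
```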

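To demonstrate the problem, I have broken it down a bit (df is the same as in the accepted answer above). You can see that the resulting groups are incorrect: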

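# df is the original test data, before any reset_index or groupby
# sort (without inplace=True, which would return None)
dfS = df.sort_index()
# reset: the dates become a regular column named 'index'
dfSR = dfS.reset_index()
# group on the running sum of A, as in the answer above
dfG = dfSR.groupby(dfSR['A'].cumsum())

# show resulting groups
for group in dfG: print(group)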

# (1,        index     A
# 0 2015-05-01  True)
# (2,        index      A
# 1 2015-05-02   True
# 2 2015-05-03  False
# 3 2015-05-04  False
# 4 2015-05-05  False
# 5 2015-05-06  False)
# (3,        index      A
# 6 2015-05-07   True
# 7 2015-05-08  False
# 8 2015-05-09  False)

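Thanks to DSM's answer here, and of course to 罗曼's answer, combining the techniques from the two posts gives the solution. Both are already explained in the posts they come from, so I will go straight to the code below.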

[code block missing from the source]
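The code for this step did not survive in this copy of the post. What follows is a minimal sketch of one way to combine the two techniques, not the author's original code. It assumes that the technique borrowed from DSM is the common `shift`/`cumsum` idiom for labelling runs of equal values; the names `run_id`, `summary` and `longest` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    [True, True, False, False, False, False, True, False, False],
    index=pd.to_datetime(['2015-05-01', '2015-05-02', '2015-05-03',
                          '2015-05-04', '2015-05-05', '2015-05-06',
                          '2015-05-07', '2015-05-08', '2015-05-09']),
    columns=['A'])

# Sort by date, then turn the dates into a regular column named 'index'
df = df.sort_index().reset_index()

# Assumed idiom: a new run starts wherever A differs from the previous row,
# so the running sum of those change points gives one label per run
run_id = (df['A'] != df['A'].shift()).cumsum()

# Show the resulting groups
dfG = df.groupby(run_id)
for group in dfG:
    print(group)

# Count the rows in each run and get its first and last date
summary = dfG.agg({'index': ['count', 'min', 'max']})
summary.columns = summary.columns.droplevel()
print(summary)

# Keep the longest run (or runs, if several tie)
longest = summary[summary['count'] == summary['count'].max()]
print(longest)
```

Grouping on change points rather than on the running sum of A itself is what keeps a True row from being lumped together with the False rows that follow it.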

Output:

(1,        index     A
0 2015-05-01  True
1 2015-05-02  True)
(2,        index      A
2 2015-05-03  False
3 2015-05-04  False
4 2015-05-05  False
5 2015-05-06  False)
(3,        index     A
6 2015-05-07  True)
(4,        index      A
7 2015-05-08  False
8 2015-05-09  False)

    count        min        max
A                             
1      2 2015-05-01 2015-05-02
2      4 2015-05-03 2015-05-06
3      1 2015-05-07 2015-05-07
4      2 2015-05-08 2015-05-09 

   count        min        max
A                             
2      4 2015-05-03 2015-05-06
