import pandas as pd

# Test data
df = pd.DataFrame([True, True, False, False, False, False, True, False, False],
                  index=pd.to_datetime(['2015-05-01', '2015-05-02', '2015-05-03',
                                        '2015-05-04', '2015-05-05', '2015-05-06',
                                        '2015-05-07', '2015-05-08', '2015-05-09']),
                  columns=['A'])
# We have to ensure that the index is sorted
df.sort_index(inplace=True)
# Resetting the index to create a column
df.reset_index(inplace=True)
# Grouping by the cumsum and counting the number of dates and getting their min and max
df = df.groupby(df['A'].cumsum()).agg(
{'index': ['count', 'min', 'max']})
# Removing useless column level
df.columns = df.columns.droplevel()
print(df)
# count min max
# A
# 1 1 2015-05-01 2015-05-01
# 2 5 2015-05-02 2015-05-06
# 3 3 2015-05-07 2015-05-09
# Keep the group(s) with the largest count
print(df[df['count'] == df['count'].max()])
# count min max
# A
# 2 5 2015-05-02 2015-05-06
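To see exactly which rows end up in each group under the key df['A'].cumsum(), here is a minimal sketch that prints the key next to the data. It rebuilds the same test data as a standalone Series; the names s, key and sizes are illustrative. Because True adds 1 and False adds 0, the key only increases on True rows, so each group is one True followed by any False values after it.

```python
import pandas as pd

# Same test data as above, rebuilt as a standalone boolean Series named "A".
idx = pd.date_range("2015-05-01", "2015-05-09", freq="D")
s = pd.Series([True, True, False, False, False, False, True, False, False],
              index=idx, name="A")

# True counts as 1 and False as 0, so the running total only increases on True rows.
key = s.cumsum()
print(pd.DataFrame({"A": s, "key": key}))

# Rows per group under this key: one True plus any False values that follow it.
sizes = key.value_counts().sort_index()
print(sizes.tolist())  # [1, 5, 3]
```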
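You can use cumsum to detect changes in column A, because booleans can be summed in Python.

Sorry to bring back an old post, but I noticed that Roman's answer is slightly off: the counts are incorrect, which leads to an inaccurate result. The count column should contain 4 entries, [2, 4, 1, 2], with a maximum of 4.

To demonstrate the problem, I've broken it down a bit (df is the same as in the accepted answer above). You can see that the resulting groups are incorrect.

Thanks to DSM's answer here, and of course Roman's answer, combining the techniques from both posts gives the answer. Both techniques are already explained in the posts they come from, so I'll let the code do the talking.

Output:

(1,        index     A
0 2015-05-01  True
1 2015-05-02  True)
(2,        index      A
2 2015-05-03  False
3 2015-05-04  False
4 2015-05-05  False
5 2015-05-06  False)
(3,        index     A
6 2015-05-07  True)
(4,        index      A
7 2015-05-08  False
8 2015-05-09  False)

   count        min        max
A
1      2 2015-05-01 2015-05-02
2      4 2015-05-03 2015-05-06
3      1 2015-05-07 2015-05-07
4      2 2015-05-08 2015-05-09

   count        min        max
A
2      4 2015-05-03 2015-05-06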
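A minimal sketch of the corrected grouping, assuming the common change-detection idiom: a new run starts wherever A differs from the previous row (found by comparing with shift()), and cumsum() over those change flags gives every run of identical values its own id. This is not necessarily the code the answer used, but it reproduces the four groups and the counts [2, 4, 1, 2] reported for this answer. The names run_id, runs, longest and longest_false, and the extra value column, are illustrative additions.

```python
import pandas as pd

# Same test data as in the question: boolean column A on a daily DatetimeIndex.
idx = pd.date_range("2015-05-01", "2015-05-09", freq="D")
df = pd.DataFrame({"A": [True, True, False, False, False, False, True, False, False]},
                  index=idx)
df = df.sort_index()

# Flag rows where A differs from the previous row; shift() puts NaN in the first
# row, so that row is always flagged. cumsum() then numbers the runs 1, 2, 3, ...
run_id = (df["A"] != df["A"].shift()).cumsum().rename("run")

# Describe each run by its value, length and first/last date.
runs = (
    df.assign(date=df.index)
      .groupby(run_id)
      .agg(value=("A", "first"), count=("A", "count"),
           min=("date", "min"), max=("date", "max"))
)
print(runs)

# Longest run of either value, and longest run of False values only.
longest = runs[runs["count"] == runs["count"].max()]
longest_false = runs[runs["value"].eq(False)].nlargest(1, "count")
print(longest)
print(longest_false)
```

Keeping the value column is a design choice: it lets you restrict the search to runs of False (longest_false) rather than taking the longest run of either value, which is what filtering on count alone does.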