<p>You can use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html" rel="nofollow noreferrer"><code>shift</code></a> to compare each row's end time with the next row's start time, and set the flag to null wherever the two are equal:</p>
<pre><code>df['flag'] = df['start_time'].shift(-1)
df.loc[df['end_time'] == df['flag'], 'flag'] = pd.NaT
print(df)
start_time end_time duration id flag
0 2020-01-01 00:00:00 2020-01-01 00:30:00 30 A NaT
1 2020-01-01 00:30:00 2020-01-01 01:00:00 30 B NaT
2 2020-01-01 01:00:00 2020-01-01 01:30:00 30 C NaT
3 2020-01-01 01:30:00 2020-01-01 02:00:00 30 D 2020-01-04 05:00:00
4 2020-01-04 05:00:00 2020-01-04 05:30:00 30 E NaT
5 2020-01-04 05:30:00 2020-01-04 06:00:00 30 F NaT
6 2020-01-04 06:00:00 2020-01-04 06:30:00 30 G NaT
7 2020-01-04 06:30:00 2020-01-04 07:00:00 30 H 2020-01-04 20:30:00
8 2020-01-04 20:30:00 2020-01-04 21:00:00 30 I NaT
</code></pre>
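<p>For readers who want to reproduce this step, here is a minimal sketch. The sample frame is reconstructed from the printed output above, so its construction is an assumption rather than the asker's original code; <code>gap_rows</code> is a hypothetical helper name used only for illustration.</p>

```python
import pandas as pd

# Sample data rebuilt from the printed output (an assumption, not the original source).
start = pd.to_datetime([
    "2020-01-01 00:00", "2020-01-01 00:30", "2020-01-01 01:00", "2020-01-01 01:30",
    "2020-01-04 05:00", "2020-01-04 05:30", "2020-01-04 06:00", "2020-01-04 06:30",
    "2020-01-04 20:30",
])
df = pd.DataFrame({
    "start_time": start,
    "end_time": start + pd.Timedelta(minutes=30),
    "duration": 30,
    "id": list("ABCDEFGHI"),
})

# Pull the next row's start time onto the current row.
df["flag"] = df["start_time"].shift(-1)
# Where this row ends exactly when the next one starts, there is no gap: clear the flag.
df.loc[df["end_time"] == df["flag"], "flag"] = pd.NaT

# Rows that still carry a flag are the ones followed by a gap.
gap_rows = df.index[df["flag"].notna()].tolist()
print(gap_rows)
```

<p>Only the rows that are followed by a gap keep a non-null flag. The last row is always null after the shift, because there is no following row to take a start time from.</p>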
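<p>Then use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.bfill.html" rel="nofollow noreferrer"><code>bfill</code></a> to backfill those nulls with the start time that broke the contiguity condition. The last row has nothing after it to backfill from, so its null has to be filled manually with any value that does not collide with an existing flag:</p>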
<pre><code>df['flag'] = df['flag'].bfill().fillna(df['end_time'].iloc[-2])
print(df)
start_time end_time duration id flag
0 2020-01-01 00:00:00 2020-01-01 00:30:00 30 A 2020-01-04 05:00:00
1 2020-01-01 00:30:00 2020-01-01 01:00:00 30 B 2020-01-04 05:00:00
2 2020-01-01 01:00:00 2020-01-01 01:30:00 30 C 2020-01-04 05:00:00
3 2020-01-01 01:30:00 2020-01-01 02:00:00 30 D 2020-01-04 05:00:00
4 2020-01-04 05:00:00 2020-01-04 05:30:00 30 E 2020-01-04 20:30:00
5 2020-01-04 05:30:00 2020-01-04 06:00:00 30 F 2020-01-04 20:30:00
6 2020-01-04 06:00:00 2020-01-04 06:30:00 30 G 2020-01-04 20:30:00
7 2020-01-04 06:30:00 2020-01-04 07:00:00 30 H 2020-01-04 20:30:00
8 2020-01-04 20:30:00 2020-01-04 21:00:00 30 I 2020-01-04 07:00:00
</code></pre>
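<p>The reason the manual fill is needed can be seen on a toy series. This is only an illustrative sketch: the values and the <code>sentinel</code> name are made up for the example.</p>

```python
import pandas as pd

# Toy flag column: one real flag, with nulls before and after it.
flag = pd.Series(pd.to_datetime([None, "2020-01-04 05:00", None, None]))

# bfill copies the next non-null value backwards, so only the leading null is filled.
filled = flag.bfill()
trailing_nulls = int(filled.isna().sum())

# The trailing nulls have no later value to copy, so they get a placeholder.
sentinel = pd.Timestamp("2020-01-04 07:00")  # hypothetical placeholder value
filled = filled.fillna(sentinel)
print(filled)
```

<p>Any placeholder works as long as it differs from every real flag, since the column is only used as a grouping key afterwards.</p>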
<p>Now group on the flag and aggregate, as <a href="https://stackoverflow.com/a/59900869/12777044">ansev</a> suggested:</p>
<pre><code>df = df.groupby('flag').agg({'start_time':'first','end_time':'last','duration':'sum','id':'first'}).reset_index(drop=True)
print(df)
start_time end_time duration id
0 2020-01-01 00:00:00 2020-01-01 02:00:00 120 A
1 2020-01-04 20:30:00 2020-01-04 21:00:00 30 I
2 2020-01-04 05:00:00 2020-01-04 07:00:00 120 E
</code></pre>
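<p>Note that the merged rows above are not in chronological order: <code>groupby</code> sorts by the group key by default, and the manually filled flag of the last row (07:00) sorts before 20:30. If the original order matters, one option is to pass <code>sort=False</code>, which keeps groups in order of first appearance. The sketch below runs the whole pipeline on sample data reconstructed from the printed output (an assumption); <code>merged</code> is a hypothetical variable name.</p>

```python
import pandas as pd

# Three contiguous runs of 30-minute intervals, rebuilt from the printed output.
start = pd.date_range("2020-01-01 00:00", periods=4, freq="30min").append([
    pd.date_range("2020-01-04 05:00", periods=4, freq="30min"),
    pd.date_range("2020-01-04 20:30", periods=1, freq="30min"),
])
df = pd.DataFrame({
    "start_time": start,
    "end_time": start + pd.Timedelta(minutes=30),
    "duration": 30,
    "id": list("ABCDEFGHI"),
})

# Same flagging steps as in the answer.
df["flag"] = df["start_time"].shift(-1)
df.loc[df["end_time"] == df["flag"], "flag"] = pd.NaT
df["flag"] = df["flag"].bfill().fillna(df["end_time"].iloc[-2])

# sort=False keeps the groups in the order they first appear in the frame.
merged = (
    df.groupby("flag", sort=False)
      .agg({"start_time": "first", "end_time": "last", "duration": "sum", "id": "first"})
      .reset_index(drop=True)
)
print(merged)
```

<p>An alternative would be to sort the result by <code>start_time</code> after aggregating, which does not depend on the choice of placeholder value.</p>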