<p>Create the data frame:</p>
<pre><code>import pandas as pd

df = pd.DataFrame({
    'train': [1, 1, 1, 2, 1, 2],
    'station': [1000, 1001, 1001, 1000, 1002, 1003],
    'time': pd.to_datetime(['20200525 13:30:00',
                            '20200525 13:45:00',
                            '20200525 13:50:00',
                            '20200525 13:35:00',
                            '20200525 14:10:00',
                            '20200525 14:00:00']),
    'mvt': [10, -1, 2, 20, 0, 0],
}, columns=['train', 'station', 'time', 'mvt'])
</code></pre>
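<p>Rank the movements by time within each (train, station) pair, so that pairs with one movement can be told apart from pairs with two. Then use the rank to reshape the data frame:</p>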
<pre><code>df['rank'] = df.groupby(['train', 'station'])['time'].rank().astype(int)

# reshape the data frame - 'rank' becomes part of the column label
x = (df.set_index(['train', 'station', 'rank'])
       .unstack(level='rank')
       .reset_index())

# find rows that have a time with rank=2 ...
mask = x.loc[:, ('time', 2)].notna()

# ... and replace time-1 with time-2 (keep only the later time)
x.loc[mask, ('time', 1)] = x.loc[mask, ('time', 2)]

# drop time-2
x = x.drop(columns=('time', 2))

# rename the columns
x.columns = ['train', 'station', 'time', 'mvt_x', 'mvt_y']
print(x)
#    train  station                time  mvt_x  mvt_y
# 0      1     1000 2020-05-25 13:30:00   10.0    NaN
# 1      1     1001 2020-05-25 13:50:00   -1.0    2.0
# 2      1     1002 2020-05-25 14:10:00    0.0    NaN
# 3      2     1000 2020-05-25 13:35:00   20.0    NaN
# 4      2     1003 2020-05-25 14:00:00    0.0    NaN
</code></pre>
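<p>To see what the rank and the reshape produce before any columns are dropped or renamed, the intermediate steps can be inspected on their own. This is a minimal sketch that rebuilds the same sample data, prints the <code>rank</code> column, and lists the column labels created by <code>unstack</code>; the variable name <code>wide</code> is only illustrative. It assumes every (train, station) pair has at most two movements and that no two movements of the same pair share a timestamp (with the default <code>method='average'</code>, a tie would rank as 1.5 and <code>astype(int)</code> would truncate it to 1).</p>
<pre><code>import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'train': [1, 1, 1, 2, 1, 2],
    'station': [1000, 1001, 1001, 1000, 1002, 1003],
    'time': pd.to_datetime(['20200525 13:30:00', '20200525 13:45:00',
                            '20200525 13:50:00', '20200525 13:35:00',
                            '20200525 14:10:00', '20200525 14:00:00']),
    'mvt': [10, -1, 2, 20, 0, 0],
})

# Rank each movement by time within its (train, station) pair:
# 1 = earlier movement, 2 = later movement (only if the pair has two).
# Assumption: at most two movements per pair and no tied timestamps.
df['rank'] = df.groupby(['train', 'station'])['time'].rank().astype(int)
print(df['rank'].tolist())  # [1, 1, 2, 1, 1, 1]

# Unstacking on 'rank' turns every (original column, rank) combination
# into its own column; one row remains per (train, station) pair.
wide = df.set_index(['train', 'station', 'rank']).unstack(level='rank')
print(wide.columns.tolist())
# [('time', 1), ('time', 2), ('mvt', 1), ('mvt', 2)]
</code></pre>
<p>Only the (1, 1001) pair has a rank-2 movement, which is why it is the only row with a non-missing <code>mvt_y</code> in the final output.</p>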