<h3>Setup</h3>
<pre><code>import numpy as np
import pandas as pd

# Start with Valdi_Bo's expanded example data
data = [['1', 51, np.nan], ['1', 52, np.nan], ['1', 53, np.nan],
        ['2', 61, np.nan], ['2', 62, np.nan], ['2', 63, np.nan],
        ['3', np.nan, 1], ['3', np.nan, np.nan], ['3', np.nan, np.nan],
        ['4', np.nan, 2], ['4', np.nan, np.nan], ['4', np.nan, np.nan]]
df = pd.DataFrame(data, columns=['Day', 'Data', 'repeat_tag'])
# Convert Day to integer data type
df['Day'] = df['Day'].astype(int)
# Spread repeat_tag values into all rows of tagged day
df['repeat_tag'] = df.groupby('Day')['repeat_tag'].ffill()
</code></pre>
<h3>Solution</h3>
<pre><code># Within each day, assign a number to each row
df['obs'] = df.groupby('Day').cumcount()
# Self-join: pair each tagged row with the same-numbered row
# of the day named in its repeat_tag
filler = (pd.merge(df, df,
                   left_on=['repeat_tag', 'obs'],
                   right_on=['Day', 'obs'])
            .set_index(['Day_x', 'obs'])['Data_y'])
# Fill missing data
df = df.set_index(['Day', 'obs'])
df.loc[df['Data'].isnull(), 'Data'] = filler
df = df.reset_index()
</code></pre>
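<p>The same steps can be wrapped in a function. The sketch below is a variation rather than the code above: it uses a left join plus <code>np.where</code> instead of index alignment, and the name <code>fill_from_tagged_day</code> is hypothetical. It assumes the columns <code>Day</code>, <code>Data</code> and <code>repeat_tag</code>, an integer <code>Day</code>, and that a tagged day has no more rows than the day it repeats.</p>

```python
import numpy as np
import pandas as pd


def fill_from_tagged_day(df):
    """Hypothetical helper: fill missing Data from the day named in repeat_tag."""
    out = df.copy()
    # Spread each day's tag to all of its rows
    out['repeat_tag'] = out.groupby('Day')['repeat_tag'].ffill()
    # Number the rows within each day
    out['obs'] = out.groupby('Day').cumcount()
    # Lookup table keyed by (Day, obs); Day is renamed so it lines up with repeat_tag
    source = out[['Day', 'obs', 'Data']].rename(
        columns={'Day': 'repeat_tag', 'Data': 'source_data'})
    # Cast the key to float so both sides of the join share a dtype
    source['repeat_tag'] = source['repeat_tag'].astype(float)
    # Left join keeps every row of `out` in its original order;
    # validate= raises if a (Day, obs) pair is not unique in the lookup table
    merged = out.merge(source, on=['repeat_tag', 'obs'],
                       how='left', validate='many_to_one')
    # Only overwrite rows whose Data is missing
    out['Data'] = np.where(out['Data'].isna(),
                           merged['source_data'].to_numpy(),
                           out['Data'])
    return out


# Small illustrative frame (made-up values): day 2 repeats day 1
demo = pd.DataFrame({
    'Day': [1, 1, 2, 2],
    'Data': [51.0, 52.0, np.nan, np.nan],
    'repeat_tag': [np.nan, np.nan, 1, np.nan],
})
filled = fill_from_tagged_day(demo)
print(filled['Data'].tolist())
```

<p>Rows without a tag find no match in the join, so their <code>Data</code> is left untouched.</p>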
<h3>Result</h3>
<pre><code>df
    Day  obs  Data  repeat_tag
0     1    0  51.0         NaN
1     1    1  52.0         NaN
2     1    2  53.0         NaN
3     2    0  61.0         NaN
4     2    1  62.0         NaN
5     2    2  63.0         NaN
6     3    0  51.0         1.0
7     3    1  52.0         1.0
8     3    2  53.0         1.0
9     4    0  61.0         2.0
10    4    1  62.0         2.0
11    4    2  63.0         2.0
</code></pre>