<p>This isn't a very pandas-like approach, but if I understand your goal correctly, it does get you the result you want:</p>
<pre><code># Assumes `df` and `columns` come from the question, with columns in the
# order shown in the output below: ['pdate', 'station', 'ptype', 'train'].

# a dict for unique filtered records
filtered_records = {}

def unique_key(row):
    return '%s-%s-%d' % (row[columns[0]], row[columns[1]], row[columns[3]])

# populate a map of unique dt, train, station records
for index, row in df.iterrows():
    key = unique_key(row)
    val = filtered_records.get(key)
    if val is None:
        filtered_records[key] = row[columns[2]]
    else:
        # if there's both a 1 and a 2 record, declare the record a 6
        if val * row[columns[2]] == 2:
            filtered_records[key] = 6

# helper function for apply
def update_row_ptype(row):
    val = filtered_records[unique_key(row)]
    return val if val == 6 else row[columns[2]]

# update the dataframe with invalid detected entries from the dict
df[columns[2]] = df.apply(update_row_ptype, axis=1)
# drop them
df.drop(df[df[columns[2]] == 6].index, inplace=True)
print(df)
</code></pre>
<p>Output:</p>
<pre><code> pdate station ptype train
0 2019-06-20 12:28:00 05123 2 8888
3 2019-06-20 13:35:00 35478 2 1234
5 2019-06-20 14:22:00 98765 1 8888
</code></pre>
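<p>For comparison, here is a minimal sketch of a more pandas-style version of the same idea: flag every (pdate, station, train) group that contains both a ptype of 1 and a ptype of 2, then drop the flagged rows. The sample frame is hypothetical (it is not the data from the question), and the column names are assumed from the output above. It treats any group containing both values as a conflict, which may differ slightly from the loop when other ptype values are mixed in.</p>

```python
import pandas as pd

# Hypothetical sample data; column names are assumed from the output above.
df = pd.DataFrame({
    "pdate": pd.to_datetime([
        "2019-06-20 12:28:00",
        "2019-06-20 13:28:00",
        "2019-06-20 13:28:00",
        "2019-06-20 13:35:00",
        "2019-06-20 14:22:00",
    ]),
    "station": ["05123", "05123", "05123", "35478", "98765"],
    "ptype": [2, 1, 2, 2, 1],
    "train": [8888, 1234, 1234, 1234, 8888],
})

key_cols = ["pdate", "station", "train"]

# Per-row flags: is this row a ptype 1, is it a ptype 2?
flags = pd.DataFrame({
    "has_1": df["ptype"].eq(1),
    "has_2": df["ptype"].eq(2),
})

# Broadcast "does my group contain a 1 / a 2" back onto every row.
keys = [df[c] for c in key_cols]
group_flags = flags.groupby(keys).transform("any")

# A group is in conflict when it contains both a 1 and a 2.
conflict = group_flags.all(axis=1)

# Keep only the rows whose group is not in conflict.
result = df[~conflict]
print(result)
```

<p>With this sample, rows 1 and 2 share the same key and carry ptype 1 and 2, so both are dropped and rows 0, 3 and 4 remain. Unlike the loop version, this does not modify <code>df</code> in place or write the sentinel value 6 into the ptype column.</p>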