<p>I think you need <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html" rel="nofollow"><code>groupby</code></a> and <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html" rel="nofollow"><code>sort_values</code></a>, followed by <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html" rel="nofollow"><code>drop_duplicates</code></a> with the parameter <code>keep='first'</code>:</p>
<pre><code>print(df)
   IDnumber  Subid  Subsubid  Date  Originaldataindicator
0         a      1         x  2006                    NaN
1         a      1         x  2007                    NaN
2         a      1         x  2008                    NaN
3         a      1         x  2008                      1
4         a      1         x  2008                    NaN

# Sort inside each group; NaN sorts last, so a flagged row moves to the top of its group
df = (df.groupby(['IDnumber', 'Subid', 'Subsubid', 'Date'])
        .apply(lambda x: x.sort_values('Originaldataindicator'))
        .reset_index(drop=True))
print(df)
   IDnumber  Subid  Subsubid  Date  Originaldataindicator
0         a      1         x  2006                    NaN
1         a      1         x  2007                    NaN
2         a      1         x  2008                      1
3         a      1         x  2008                    NaN
4         a      1         x  2008                    NaN

# Keep only the first row of each key combination
df1 = df.drop_duplicates(subset=['IDnumber', 'Subid', 'Subsubid', 'Date'], keep='first')
print(df1)
   IDnumber  Subid  Subsubid  Date  Originaldataindicator
0         a      1         x  2006                    NaN
1         a      1         x  2007                    NaN
2         a      1         x  2008                      1
</code></pre>
<p>Or modify the DataFrame directly with <code>inplace=True</code>:</p>
<pre><code>df.drop_duplicates(subset=['IDnumber', 'Subid', 'Subsubid', 'Date'], keep='first', inplace=True)
print(df)
   IDnumber  Subid  Subsubid  Date  Originaldataindicator
0         a      1         x  2006                    NaN
1         a      1         x  2007                    NaN
2         a      1         x  2008                      1
</code></pre>
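<p>The sort-then-deduplicate steps above can also be sketched as one self-contained Python 3 script. This is only a sketch under a few assumptions: the sample frame is rebuilt by hand to mirror the data shown, the names <code>keys</code>, <code>ordered</code> and <code>result</code> are made up for illustration, and the <code>groupby</code>/<code>apply</code> step is swapped for a single <code>sort_values</code> over the key columns plus the indicator, which should give the same ordering within each group.</p>

```python
import numpy as np
import pandas as pd

# Hand-built sample mirroring the frame shown above (illustrative only).
df = pd.DataFrame({
    'IDnumber': ['a'] * 5,
    'Subid': [1] * 5,
    'Subsubid': ['x'] * 5,
    'Date': [2006, 2007, 2008, 2008, 2008],
    'Originaldataindicator': [np.nan, np.nan, np.nan, 1, np.nan],
})

# Hypothetical helper name for the columns that define a duplicate.
keys = ['IDnumber', 'Subid', 'Subsubid', 'Date']

# sort_values places NaN last by default (na_position='last'), so within
# each key combination a flagged row sorts ahead of the unflagged ones.
ordered = df.sort_values(keys + ['Originaldataindicator'])

# keep='first' then retains the flagged row wherever one exists.
result = ordered.drop_duplicates(subset=keys, keep='first').reset_index(drop=True)
print(result)
```

<p>Sorting on the keys and the indicator together avoids the per-group <code>apply</code>, which may be worth considering on larger frames.</p>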
<p>If the column <code>Originaldataindicator</code> can hold several non-null values per group, use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.duplicated.html" rel="nofollow"><code>duplicated</code></a> (you may want to pass all of the columns <code>IDnumber</code>, <code>Subid</code>, <code>Subsubid</code>, <code>Date</code> to it) together with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isnull.html" rel="nofollow"><code>isnull</code></a>:</p>
<pre><code>print(df)
   IDnumber  Subid  Subsubid  Date  Originaldataindicator
0         a      1         x  2006                    NaN
1         a      1         x  2007                    NaN
2         a      1         x  2008                    NaN
3         a      1         x  2008                      1
4         a      1         x  2008                      1

# Drop rows that are duplicated on Date AND have a missing indicator
print(df[~(df.duplicated('Date', keep=False) & pd.isnull(df['Originaldataindicator']))])
   IDnumber  Subid  Subsubid  Date  Originaldataindicator
0         a      1         x  2006                    NaN
1         a      1         x  2007                    NaN
3         a      1         x  2008                      1
4         a      1         x  2008                      1
</code></pre>
<p>How the conditions work:</p>
<pre><code># True for every row whose Date occurs more than once
print(df.duplicated('Date', keep=False))
0    False
1    False
2     True
3     True
4     True
dtype: bool

# True where the indicator is missing
print(pd.isnull(df['Originaldataindicator']))
0     True
1     True
2     True
3    False
4    False
Name: Originaldataindicator, dtype: bool

# Final mask: keep everything except duplicated rows with a missing indicator
print(~(df.duplicated('Date', keep=False) & pd.isnull(df['Originaldataindicator'])))
0     True
1     True
2    False
3     True
4     True
dtype: bool
</code></pre>
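<p>The same mask can be built on all four key columns instead of <code>Date</code> alone, as suggested above. The following is a minimal Python 3 sketch, assuming a hand-built copy of the sample frame; the names <code>keys</code>, <code>is_dup</code>, <code>is_missing</code> and <code>filtered</code> are hypothetical and exist only for this illustration.</p>

```python
import numpy as np
import pandas as pd

# Hand-built sample mirroring the second frame shown above (illustrative only).
df = pd.DataFrame({
    'IDnumber': ['a'] * 5,
    'Subid': [1] * 5,
    'Subsubid': ['x'] * 5,
    'Date': [2006, 2007, 2008, 2008, 2008],
    'Originaldataindicator': [np.nan, np.nan, np.nan, 1, 1],
})

# Hypothetical helper name for the columns that define a duplicate.
keys = ['IDnumber', 'Subid', 'Subsubid', 'Date']

# keep=False marks every member of a duplicated group, not just the repeats.
is_dup = df.duplicated(subset=keys, keep=False)

# True where the indicator is missing.
is_missing = df['Originaldataindicator'].isnull()

# Keep a row unless it is both duplicated and missing its indicator.
filtered = df[~(is_dup & is_missing)]
print(filtered)
```

<p>One thing to keep in mind with this mask: a duplicated group in which every row has a missing indicator would be removed entirely, so it fits best when each duplicated group has at least one flagged row.</p>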