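<p>Using <code>filter</code> with a lambda function here slows things down considerably, because the lambda is called in Python once per group. You can speed it up by removing it and using the vectorized groupby <code>any</code> instead.</p>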
<hr/>
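<pre><code># Group once, then use the vectorized any() on each boolean column
u = coinc.groupby('id')
# True for ids that have at least one True in both temp1 and temp2
m = u.temp1.any() & u.temp2.any()
# Keep the rows whose id passed the mask, and only the id column
res = coinc.loc[coinc.id.isin(m[m].index), ['id']]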
</code></pre>
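<p>As a minimal sketch of what the mask does, here is the same approach on a tiny frame. The data below is made up purely for illustration, and <code>keep_ids</code> is a name introduced here, not part of the original question.</p>

```python
import pandas as pd

# Hypothetical toy data: only id 1 has a True in both temp1 and temp2
coinc = pd.DataFrame({
    'id':    [1, 1, 2, 2, 3],
    'temp1': [True, False, True, True, False],
    'temp2': [False, True, False, False, True],
})

u = coinc.groupby('id')
# One boolean per id: does the group contain a True in each column?
m = u.temp1.any() & u.temp2.any()
# m[m] keeps only the ids where the mask is True
keep_ids = m[m].index
res = coinc.loc[coinc.id.isin(keep_ids), ['id']]
print(res)
```

<p>Only the two rows with <code>id == 1</code> survive, since id 2 never has <code>temp2</code> set and id 3 never has <code>temp1</code> set.</p>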
<hr/>
<p>Comparing the two approaches on a larger frame:</p>
<pre><code>import numpy as np
import pandas as pd

a = np.random.randint(1, 1000, 100_000)
b = np.random.randint(0, 2, 100_000, dtype=bool)
c = ~b
coinc = pd.DataFrame({'id': a, 'temp1': b, 'temp2': c})
In [295]: %%timeit
...: u = coinc.groupby('id')
...: m = u.temp1.any() & u.temp2.any()
...: res = coinc.loc[coinc.id.isin(m[m].index), ['id']]
...:
13.5 ms ± 476 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [296]: %%timeit
...: grouped = coinc.groupby('id')
...: final = grouped.filter(lambda x: ( x['temp2'].any() and x['temp1'].any()))
...: lanif = final.drop(['temp1','temp2'],axis = 1 )
...:
527 ms ± 7.91 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
</code></pre>
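<p>A rough way to see where the time goes is to count how often the callback passed to <code>filter</code> runs. The sketch below uses made-up data and a hypothetical helper, <code>both_any</code>, that stands in for the lambda. It is an illustration under those assumptions, not a benchmark.</p>

```python
import pandas as pd

# Hypothetical toy data: ids 1 and 3 satisfy both conditions, id 2 does not
coinc = pd.DataFrame({
    'id':    [1, 1, 2, 2, 3, 3],
    'temp1': [True, False, True, True, False, True],
    'temp2': [False, True, False, False, True, True],
})

calls = []  # one entry is appended per invocation of the callback

def both_any(group):
    # Stand-in for the lambda: runs in Python for each group
    calls.append(1)
    return bool(group['temp1'].any() and group['temp2'].any())

filtered = coinc.groupby('id').filter(both_any).drop(['temp1', 'temp2'], axis=1)

# Vectorized alternative: no per-group Python callback
u = coinc.groupby('id')
m = u.temp1.any() & u.temp2.any()
vectorized = coinc.loc[coinc.id.isin(m[m].index), ['id']]

print(len(calls), filtered['id'].tolist(), vectorized['id'].tolist())
```

<p>The callback runs at least once per distinct id, so with roughly a thousand ids in the benchmark above, that per-group Python overhead adds up, while the vectorized version does the same work in a few pandas calls.</p>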
<hr/>
<pre><code>np.array_equal(res.values, lanif.values)
</code></pre>
<p/>
<pre><code>True
</code></pre>