<h2>Brute-force (naive) answer</h2>
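<p>This answer focuses on the mundane question of how to use pandas (and numpy) to find a solution. From an algorithmic-complexity standpoint it is naive: it evaluates every possible combination of columns, so it runs in <code>O(2^n)</code>, where <code>n</code> is the number of criterion columns. In other words, it is <a href="https://en.wikipedia.org/wiki/Time_complexity#Exponential_time" rel="nofollow noreferrer">Exponential time</a>.</p>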
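<p>To make that growth concrete, here is a small sketch (not part of the original answer) that counts the subsets a brute-force search has to evaluate. The helper name <code>count_combos</code> and the generated column names are made up for illustration; the enumeration uses <code>itertools.combinations</code> over every subset size.</p>
<pre class="lang-py prettyprint-override"><code>from itertools import combinations


def count_combos(n_cols):
    # hypothetical column names Crit1..CritN, used only for counting
    cols = [f"Crit{i + 1}" for i in range(n_cols)]
    # one subset per (size, combination), from size 0 (no filter) to size n_cols
    return sum(1 for k in range(n_cols + 1) for _ in combinations(cols, k))


for n in (5, 10, 15):
    # the enumerated count matches 2 ** n: 32, 1024, 32768
    print(n, count_combos(n), 2 ** n)
</code></pre>
<p>Each extra column doubles the work, which is why this approach is only practical for a small number of criteria.</p>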
<p>See <a href="https://stackoverflow.com/a/65394551/758174">my other answer</a> for a beam-search solution with median time <code>O(n^2)</code>.</p>
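<p>In this naive approach, we will:</p>
<ol>
<li>enumerate all combinations of columns (excluding <code>Profit</code>),</li>
<li>define a <code>filtered</code> function that filters the dataframe according to a given set of columns,</li>
<li>define a <code>metrics</code> function that returns a tuple <code>(sum(Profit), posCount, -negCount)</code>,</li>
<li>compute the metrics for all combinations and assemble them into a <code>df</code>,</li>
<li>sort that <code>df</code> by the metrics tuple.</li>
</ol>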
<pre class="lang-py prettyprint-override"><code>from itertools import combinations

import numpy as np
import pandas as pd


def metrics(s):
    # returns three quantities on a Series s: sum, poscount, -negcount
    return s.sum(), (s > 0).sum(), -(s < 0).sum()


def filtered(df, combo):
    # given a combo (a set of columns), filter the df to keep
    # the rows where all those columns are True.
    # list(combo): recent pandas versions reject a set as an indexer
    mask = np.all(df[list(combo)], axis=1)
    return df.loc[mask]


def brute_force_all(df):
    """
    Return all brute-force solutions. O(2^n).
    """
    # get all combinations of columns (except for 'Profit')
    crit_cols = [k for k in df.columns if k != 'Profit']
    combos = [set(combo) for n in range(0, len(crit_cols) + 1)
              for combo in combinations(crit_cols, n)]
    # assemble a df made of metrics and colset
    res = pd.DataFrame([
        metrics(filtered(df, combo)['Profit']) + (combo,)
        for combo in combos
    ], columns='total poscount negcount colset'.split())
    # finally, sort to expose the "best" result first
    res = res.sort_values(['total', 'poscount', 'negcount'], ascending=False)
    res = res.reset_index(drop=True)
    return res
</code></pre>
<p>Example on your data:</p>
<pre><code>    total  poscount  negcount  colset
0     165         3        -3  {}
9     129         2        -1  {Crit5, Crit1}
20    129         2        -1  {Crit5, Crit1, Crit3}
5     124         2        -2  {Crit5}
14    124         2        -2  {Crit5, Crit3}
...
29      0         0         0  {Crit5, Crit2, Crit4, Crit3}
30      0         0         0  {Crit2, Crit4, Crit5, Crit1, Crit3}
7     -70         0        -1  {Crit1, Crit4}
12    -70         0        -1  {Crit3, Crit4}
18    -70         0        -1  {Crit3, Crit1, Crit4}
</code></pre>
<p>Details:</p>
<p>To understand the code above, it helps to inspect some of the quantities computed along the way. For example:</p>
<pre class="lang-py prettyprint-override"><code>>>> combos
[set(),
{'Crit1'},
...
{'Crit5'},
{'Crit1', 'Crit2'},
...
{'Crit4', 'Crit5'},
{'Crit1', 'Crit2', 'Crit3'},
...
{'Crit3', 'Crit4', 'Crit5'},
{'Crit1', 'Crit2', 'Crit3', 'Crit4'},
...
{'Crit2', 'Crit3', 'Crit4', 'Crit5'},
{'Crit1', 'Crit2', 'Crit3', 'Crit4', 'Crit5'}]
# metrics on the unfiltered (whole) data:
>>> metrics(data['Profit'])
(165, 3, -3)
# data filtered where Crit2 and Crit3 are True:
>>> filtered(data, {'Crit2', 'Crit3'})
   Profit  Crit1  Crit2  Crit3  Crit4  Crit5
3      40   True   True   True  False   True
4      -5  False   True   True  False   True
# metrics on the above:
>>> metrics(filtered(data, {'Crit2', 'Crit3'})['Profit'])
(35, 1, -1)
</code></pre>
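<p>As a sanity check on the pandas version, the same enumerate, filter, and score loop can be sketched with the standard library only. This is an illustration, not the original answer's code: the rows in <code>ROWS</code>, the criterion names <code>A</code> and <code>B</code>, and the function <code>brute_force_plain</code> are all hypothetical and are not the asker's data.</p>
<pre class="lang-py prettyprint-override"><code>from itertools import combinations

# Hypothetical rows (NOT the asker's data): a profit plus boolean criteria.
ROWS = [
    {"Profit": 10, "A": True, "B": True},
    {"Profit": -4, "A": True, "B": False},
    {"Profit": 7, "A": False, "B": True},
    {"Profit": -2, "A": False, "B": False},
]


def metrics(profits):
    # same tuple as in the answer: (sum, poscount, -negcount)
    return (sum(profits),
            sum(1 for p in profits if p > 0),
            -sum(1 for p in profits if p < 0))


def brute_force_plain(rows, target="Profit"):
    crit_cols = [k for k in rows[0] if k != target]
    results = []
    for n in range(len(crit_cols) + 1):
        for combo in combinations(crit_cols, n):
            # keep the profits of rows where every column in combo is True;
            # the empty combo keeps all rows
            kept = [r[target] for r in rows if all(r[c] for c in combo)]
            results.append(metrics(kept) + (combo,))
    # sort on the three metrics only, best first
    return sorted(results, key=lambda t: t[:3], reverse=True)


best = brute_force_plain(ROWS)
for row in best:
    print(row)
</code></pre>
<p>On these made-up rows the top entry is the filter on <code>B</code> alone, with metrics <code>(17, 2, 0)</code>, ahead of the unfiltered data at <code>(11, 2, -2)</code>. Sorting tuples in descending order gives the same precedence as the <code>sort_values</code> call: total first, then the positive count, then the negated negative count.</p>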