基于多个条件查找大型lis的所有组合

2024-04-19 21:57:21 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在计算一个幻想自行车比赛的最佳队伍。我有一个csv文件,其中包含176名自行车手,他们的球队,他们得分的数额和价格,把他们放在我的团队。我正在努力寻找得分最高的16人车队。在

适用于任何团队组成的规则是:

  • 这支队伍的总费用不能超过100英镑。在
  • 同一队的骑手不能超过4人。在

下面是我的csv文件的简短摘录。在

THOMAS Geraint,Team INEOS,142,13
SAGAN Peter,BORA - hansgrohe,522,11.5
GROENEWEGEN Dylan,Team Jumbo-Visma,205,11
FUGLSANG Jakob,Astana Pro Team,46,10
BERNAL Egan,Team INEOS,110,10
BARDET Romain,AG2R La Mondiale,21,9.5
QUINTANA Nairo,Movistar Team,58,9.5
YATES Adam,Mitchelton-Scott,40,9.5
VIVIANI Elia,Deceuninck - Quick Step,273,9.5
YATES Simon,Mitchelton-Scott,13,9
EWAN Caleb,Lotto Soudal,13,9

解决这个问题的最简单的方法是生成一个所有可能的团队组合的列表,然后应用规则排除不符合上述规则的团队。在这之后,很容易计算出每支球队的总得分,并选出得分最高的一支。理论上,可以通过下面的代码生成可用团队的列表。在

^{pr2}$

然而,将176名自行车运动员的所有组合计算成16人一组,这对我的电脑来说只是too many calculations的事情,尽管this问题的答案意味着其他的事情。我做了一些研究,但没有找到任何方法来应用上述条件,而不必迭代每个选项。在

有没有办法找到一个最佳的团队,要么通过使上述方法发挥作用,要么使用另一种方法?在

编辑: 在文本中,可以找到完整的CSV文件here


Tags: 文件csv方法列表规则自行车团队事情
2条回答

我为你的问题补充一个答案:

The CSV I posted was actually modified, my original one also contains a list for each rider with their score for each stage. This list looks like this [0, 40, 13, 0, 2, 55, 1, 17, 0, 14]. I am trying to find the team that performs the best overall. So I have a pool of 16 cyclists, from which the score of 10 cyclists counts towards the score of each day. The scores for each day are then summed to get a total score. The purpose is to get this final total score as high as possible.

如果你认为我应该编辑我的第一篇文章,请让我知道,我认为这样更清楚,因为我的第一篇文章相当密集,回答了最初的问题。在

我们引入一个新变量:

Zik = 1 if cyclist i is selected and is one of the top 10 in your team on day k

您需要添加这些约束以链接变量Zik和Xi(如果未选择cyclist i,即Xi=0,则变量Zik不能为1)

^{pr2}$

每天选择10名骑车人的限制条件如下:

For all k, sum(Zik, 1<=i<=n_cyclists) <= 10

最后,你的目标可以这样写:

Maximize sum(pik * Xi * Zik, 1<=i<=n_cyclists, 1 <= k <= n_days)
# where pik = nb_points of cyclist i at day k

这是思考的部分。这样写的目标不是线性的(注意两个变量X和Z之间的乘法)。幸运的是,这两个二进制文件都有,而且有一个技巧可以将这个公式转换成它的线性形式。在

enter image description here

让我们再次引入新的变量,比如(Lik = Xi * Zik)来线性化目标。在

目标现在可以这样写,并且是线性的:

Maximize sum(pik * Lik, 1<=i<=n_cyclists, 1 <= k <= n_days)
# where pik = nb_points of cyclist i at day k

现在我们需要添加这些约束,使Lik等于Xi * Zik

For all i,k : Xi + Zik - 1 <= Lik
For all i,k : Lik <= 1/2 * (Xi + Zik)

好吧。这就是数学的美,你可以用线性方程组来模拟很多事情。我提出了先进的概念,如果你第一眼看不懂是正常的。在


我在这个file上模拟了每天的分数列。在

下面是解决新问题的Python代码:

import ast
from ortools.linear_solver import pywraplp
import pandas as pd


solver = pywraplp.Solver('cyclist', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
cyclist_df = pd.read_csv('cyclists_day.csv')
cyclist_df['Punten_day'] = cyclist_df['Punten_day'].apply(ast.literal_eval)

# Variables
variables_name = {}
variables_team = {}
variables_name_per_day = {}
variables_linear = {}

for _, row in cyclist_df.iterrows():
    variables_name[row['Naam']] = solver.IntVar(0, 1, 'x_{}'.format(row['Naam']))
    if row['Ploeg'] not in variables_team:
        variables_team[row['Ploeg']] = solver.IntVar(0, solver.infinity(), 'y_{}'.format(row['Ploeg']))

    for k in range(10):
        variables_name_per_day[(row['Naam'], k)] = solver.IntVar(0, 1, 'z_{}_{}'.format(row['Naam'], k))
        variables_linear[(row['Naam'], k)] = solver.IntVar(0, 1, 'l_{}_{}'.format(row['Naam'], k))

# Link cyclist <-> team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, solver.infinity())
    constraint.SetCoefficient(var, 1)
    for cyclist in cyclist_df[cyclist_df.Ploeg == team]['Naam']:
        constraint.SetCoefficient(variables_name[cyclist], -1)

# Max 4 cyclist per team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, 4)
    constraint.SetCoefficient(var, 1)

# Max cyclists
constraint_max_cyclists = solver.Constraint(16, 16)
for cyclist in variables_name.values():
    constraint_max_cyclists.SetCoefficient(cyclist, 1)

# Max cost
constraint_max_cost = solver.Constraint(0, 100)
for _, row in cyclist_df.iterrows():
    constraint_max_cost.SetCoefficient(variables_name[row['Naam']], row['Waarde'])

# Link Zik and Xi
for name, cyclist in variables_name.items():
    constraint_link_cyclist_day = solver.Constraint(-solver.infinity(), 0)
    constraint_link_cyclist_day.SetCoefficient(cyclist, - 10)
    for k in range(10):
        constraint_link_cyclist_day.SetCoefficient(variables_name_per_day[name, k], 1)

# Min/Max 10 cyclists per day
for k in range(10):
    constraint_cyclist_per_day = solver.Constraint(10, 10)
    for name in cyclist_df.Naam:
        constraint_cyclist_per_day.SetCoefficient(variables_name_per_day[name, k], 1)

# Linearization constraints 
for name, cyclist in variables_name.items():
    for k in range(10):
        constraint_linearization1 = solver.Constraint(-solver.infinity(), 1)
        constraint_linearization2 = solver.Constraint(-solver.infinity(), 0)

        constraint_linearization1.SetCoefficient(cyclist, 1)
        constraint_linearization1.SetCoefficient(variables_name_per_day[name, k], 1)
        constraint_linearization1.SetCoefficient(variables_linear[name, k], -1)

        constraint_linearization2.SetCoefficient(cyclist, -1/2)
        constraint_linearization2.SetCoefficient(variables_name_per_day[name, k], -1/2)
        constraint_linearization2.SetCoefficient(variables_linear[name, k], 1)

# Objective 
objective = solver.Objective()
objective.SetMaximization()

for _, row in cyclist_df.iterrows():
    for k in range(10):
        objective.SetCoefficient(variables_linear[row['Naam'], k], row['Punten_day'][k])

solver.Solve()

chosen_cyclists = [key for key, variable in variables_name.items() if variable.solution_value() > 0.5]

print('\n'.join(chosen_cyclists))

for k in range(10):
    print('\nDay {} :'.format(k + 1))
    chosen_cyclists_day = [name for (name, day), variable in variables_name_per_day.items() 
                       if (day == k and variable.solution_value() > 0.5)]
    assert len(chosen_cyclists_day) == 10
    assert all(chosen_cyclists_day[i] in chosen_cyclists for i in range(10))
    print('\n'.join(chosen_cyclists_day))

结果如下:

您的团队:

SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
BENOOT Tiesj
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús
MEURISSE Xandro
GRELLIER Fabien

每天选择骑自行车的人

Day 1 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús

Day 2 :
SAGAN Peter
ALAPHILIPPE Julian
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
NIZZOLO Giacomo
MEURISSE Xandro

Day 3 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
MATTHEWS Michael
TRENTIN Matteo
VAN AVERMAET Greg
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús

Day 4 :
SAGAN Peter
VIVIANI Elia
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús

Day 5 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
CICCONE Giulio
HERRADA Jesús

Day 6 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
TRENTIN Matteo
COLBRELLI Sonny
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike

Day 7 :
SAGAN Peter
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
COLBRELLI Sonny
VAN AVERMAET Greg
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús
MEURISSE Xandro

Day 8 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
MATTHEWS Michael
STUYVEN Jasper
TEUNISSEN Mike
HERRADA Jesús
NIZZOLO Giacomo
MEURISSE Xandro

Day 9 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
ALAPHILIPPE Julian
PINOT Thibaut
TRENTIN Matteo
COLBRELLI Sonny
VAN AVERMAET Greg
TEUNISSEN Mike
HERRADA Jesús

Day 10 :
SAGAN Peter
GROENEWEGEN Dylan
VIVIANI Elia
PINOT Thibaut
COLBRELLI Sonny
STUYVEN Jasper
CICCONE Giulio
TEUNISSEN Mike
HERRADA Jesús
NIZZOLO Giacomo

让我们比较答案1和答案2的结果print(solver.Objective().Value())

第一个模型得到3738.0,第二个模型得到3129.087388325567。该值较低,因为每个阶段只选择10个自行车手,而不是16个。在

如果保留第一个解并使用新的评分方法,我们得到3122.9477585307413

我们可以认为第一个模型已经足够好了:我们不必引入新的变量/约束,模型保持简单,我们得到的解决方案几乎和复杂模型一样好。有时不需要100%的精确性,通过一些近似值可以更容易、更快地求解模型。在

这是一个经典的operations research问题。在

有大量的算法可以找到一个最优的(或者仅仅是一个非常好的,取决于算法)解决方案:

  • 混合整数规划
  • 元启发式
  • 约束规划
  • 。。。在

下面的代码将使用MIP、或工具库和默认解算器COIN-OR找到最佳解决方案:

from ortools.linear_solver import pywraplp
import pandas as pd


solver = pywraplp.Solver('cyclist', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)    
cyclist_df = pd.read_csv('cyclists.csv')

# Variables

variables_name = {}
variables_team = {}

for _, row in cyclist_df.iterrows():
    variables_name[row['Naam']] = solver.IntVar(0, 1, 'x_{}'.format(row['Naam']))
    if row['Ploeg'] not in variables_team:
        variables_team[row['Ploeg']] = solver.IntVar(0, solver.infinity(), 'y_{}'.format(row['Ploeg']))

# Constraints

# Link cyclist <-> team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, solver.infinity())
    constraint.SetCoefficient(var, 1)
    for cyclist in cyclist_df[cyclist_df.Ploeg == team]['Naam']:
        constraint.SetCoefficient(variables_name[cyclist], -1)

# Max 4 cyclist per team
for team, var in variables_team.items():
    constraint = solver.Constraint(0, 4)
    constraint.SetCoefficient(var, 1)

# Max cyclists
constraint_max_cyclists = solver.Constraint(16, 16)
for cyclist in variables_name.values():
    constraint_max_cyclists.SetCoefficient(cyclist, 1)

# Max cost
constraint_max_cost = solver.Constraint(0, 100)
for _, row in cyclist_df.iterrows():
    constraint_max_cost.SetCoefficient(variables_name[row['Naam']], row['Waarde'])    

# Objective 
objective = solver.Objective()
objective.SetMaximization()

for _, row in cyclist_df.iterrows():
    objective.SetCoefficient(variables_name[row['Naam']], row['Punten totaal:'])

# Solve and retrieve solution     
solver.Solve()

chosen_cyclists = [key for key, variable in variables_name.items() if variable.solution_value() > 0.5]

print(cyclist_df[cyclist_df.Naam.isin(chosen_cyclists)])

印刷品:

^{pr2}$

这个代码如何解决这个问题?正如@KyleParsons所说,它看起来像背包问题,可以用整数规划建模。在

让我们定义变量Xi (0 <= i <= nb_cyclists)Yj (0 <= j <= nb_teams)。在

Xi = 1 if cyclist n°i is chosen, =0 otherwise

Yj = n where n is the number of cyclists chosen within team j

要定义这些变量之间的关系,可以对这些约束进行建模:

# Link cyclist <-> team
For all j, Yj >= sum(Xi, for all i where Xi is part of team j)

若要为每个车队最多选择4名自行车手,请创建以下约束:

# Max 4 cyclist per team
For all j, Yj <= 4

要选择16名自行车手,以下是相关的约束:

# Min 16 cyclists 
sum(Xi, 1<=i<=nb_cyclists) >= 16
# Max 16 cyclists 
sum(Xi, 1<=i<=nb_cyclists) <= 16

成本约束:

# Max cost 
sum(ci * Xi, 1<=i<=n_cyclists) <= 100 
# where ci = cost of cyclist i

然后你可以最大化

# Objective
max sum(pi * Xi, 1<=i<=n_cyclists)
# where pi = nb_points of cyclist i

请注意,我们使用线性目标和线性不等式约束对问题进行建模。如果Xi和Yj是连续变量,这个问题将是polynomial(线性规划),可以使用以下方法解决:

因为这些变量都是整数(整数规划或混合整数规划),这个问题被称为NP_complete类的一部分(除非你是genious),否则无法使用多项式解来解决。像COIN-OR这样的解算器使用复杂的Branch & Bound或{a10}方法来有效地求解它们。ortools为在python中使用COIN提供了一个很好的包装器。这些工具是免费和开源的。在

所有这些方法的优点是在不迭代所有可能解的情况下找到一个最优解(并大大减少了组合运算)。在

相关问题 更多 >