按等级对加权值进行分组和排序

2024-05-14 15:49:40 发布

您现在位置:Python中文网/ 问答频道 /正文

我有由几个值组成的数据。每个值都有一个正权重和一个秩。我想返回这些值的子列表,以便通过增加列表中的秩来添加值,直到它们的权重之和最小可能大于或等于给定目标,并附加一个约束,即如果添加值,则必须添加所有具有相同秩的值,ie值由排名组添加到列表中

我写了下面的算法,速度很慢,不容易阅读。输入是一个元组列表(我们可以假设它已经按秩排序);每个元组包含秩、值和权重。需要返回的只是保留值的列表及其总重量。该算法按秩对数据进行分组,然后计算出要添加多少秩以达到适当的累积权重。有什么加速和改进的建议吗

对于输入列表[(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]和目标权重0.47,它返回子列表['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']和保持权重0.49138273560337803

weightGoal = 0.47 # Target weight
    
# Example of data; each tuple has (rank, value and weight)
individualData = [(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]
print("Input", individualData, weightGoal)

# Determine cumulative weight for each value.
vCumulWeight = [0]
for ind in individualData: vCumulWeight.append(vCumulWeight[-1] + ind[-1])
vCumulWeight.pop(0)

# To group by rank    
ranks = list(set([v[0] for v in individualData])) # Unique values of the rank
ranks.sort(reverse=False) # Sort the list by rank
rankWeights = [] # Total weight of values with same rank
rankValues = [] # Values with same rank

# Group points by rank
for r in ranks:
    rankW, rankV = 0, []
    for ind in individualData:
        if r == ind[0]:
            rankW += ind[-1]
            rankV.append(ind[1])
    rankWeights.append(rankW)
    rankValues.append(rankV)
    
# Get cumulative weight of each group
rankCumulWeight = [0]
for op in rankWeights: rankCumulWeight.append(rankCumulWeight[-1] + op)
rankCumulWeight.pop(0)

# Rank all values and all groups
fullRankingVals, fullRankingGroups, rnk, prevLen, prevCP = [], [], 0, 1, 0
for i, (o, pr, ov, cp) in enumerate(zip(ranks, rankWeights, rankValues, rankCumulWeight)):
    fullRankingGroups.append((i+1, ov, o, pr, cp, ((cp <= weightGoal) or (prevCP < weightGoal))))
    rnk += prevLen
    for v in ov: fullRankingVals.append((rnk, v, o, pr/len(ov), cp, ((cp <= weightGoal) or (prevCP < weightGoal))))
    prevLen = len(ov)
    prevCP = cp

print("============ Check by value")
for v in fullRankingVals: print(v)
print("============ Check by group")
for v in fullRankingGroups: print(v)
print("============")

## Actual acceptance
tmpAccepted = [v for v in fullRankingVals if (v[-1])]
acceptedValues = [v[1] for v in tmpAccepted]
accdeptedWeight = max(v[-2] for v in tmpAccepted)
print("Result", acceptedValues, accdeptedWeight)

Tags: in列表forbycp权重ovprint
2条回答

我是否可以建议您采取可能的最简单的方法,只需分小块解决问题

让我们将其分解为3个小步骤,以便您了解实际发生的情况:

  1. 使用rank作为键按升序对数据进行排序
  2. 将数据添加到我们的storage,直到达到限制weightGoal
  3. 如果到达weightGoal,如果value与我们添加到storage的上一个相同,则添加到我们的storage

现在,让我们对上述流程进行编码:

weightGoal = 0.47 # Target weight
    
# Example of data; each tuple has (rank, value and weight)
individualData = [(1, 'a', 0.10131488885239891), (1, 'b', 0.05189220940446056), (1, 'c', 0.05233086643211586), (4, 'd', 0.019651238401109394), (5, 'e', 0.11439063810128784), (6, 'f', 0.003242047085661822), (6, 'g', 0.03093925229983648), (8, 'h', 0.09737202015393008), (8, 'i', 0.02024957487257707), (10, 'j', 0.08454098571100034), (11, 'k', 0.18524611312188527), (11, 'l', 0.07110968996322396), (11, 'm', 0.023236464232769254), (14, 'n', 0.14448401136774303)]

# step 1
individualData.sort(key=lambda x: x[0])

storage = 0
last = None
for data in individualData:
    if storage < weightGoal:  # step 2
        last = data
        storage += data[2]
    elif last and data[1] == last[1]:  # step 3
        storage += data[2]
        # no need to change `last` here unless you really want to
    else:  # termination clause
        break

现在,最后一个summation存储在storage变量中,您还可以使用last变量知道最后添加的元素

如果有任何歧义,请告诉我

Pandas一起:

import pandas as pd

df = pd.DataFrame(l, columns=['rank', 'value', 'weight'])
df = df.groupby('rank').agg({'value': list, 'weight': sum})

m = ~((df.weight.cumsum() > 0.47).shift(
    fill_value=False))

result = df[m]['value'].explode().to_list()
weight = df[m]['weight'].cumsum().iloc[-1]

相关问题 更多 >

    热门问题