我试图将assign
unique
中的值pandas
df
分配给特定的个人。在
对于下面的df
,[Area]
和{unique
值,这些值是不同的作业。这些值将分配给个人,总体目标是尽可能少地使用个人。在
诀窍在于这些值是不断开始和结束的,并且持续时间不同。任何一次分配给个人的最多unique
值是3。[On]
显示当前[Place]和[Area]的唯一值的数量。在
所以这为我需要多少个人提供了一个具体的指导。e、 g.3unique
取值1=1人,6个唯一值on=2人
我不能做一个groupby
语句,其中我assign
第一个3 unique values
到{unique
值到individual 2
等等
我设想的是,当unique
值大于3时,我想先将[Area]
中的值分组,然后合并剩余的值。因此,请将assign
中的[Area]
中的值相同(最多3个)。然后,如果存在_leftover_
值(<;3),则应尽可能将它们组合成一组3个。在
我设想这项工作的方式是:通过一个hour
展望未来。对于每一个新的row
,script
应该看到有多少个值是[On]
(这提供了需要多少个个体的指示)。其中unique
值是>;3,它们应该是assigned
乘以{
对于下面的df
,出现在[Place]
和{unique
值的数量在1-6之间变化。所以我们不应该有超过2个人assigned
。当unique
的值是>;3时,它应该首先由[Area]
赋值。剩余的值应与其他值小于3unique
的个体合并。在
为大df道歉。这是我复制问题的唯一方法!
import pandas as pd
import numpy as np
from collections import Counter
d = ({
'Time' : ['8:03:00','8:17:00','8:20:00','8:33:00','8:47:00','8:48:00','9:03:00','9:15:00','9:18:00','9:33:00','9:45:00','9:48:00','10:03:00','10:15:00','10:15:00','10:15:00','10:18:00','10:32:00','10:33:00','10:39:00','10:43:00','10:48:00','10:50:00','11:03:00','11:03:00','11:07:00','11:25:00','11:27:00','11:42:00','11:48:00','11:51:00','11:57:00','12:00:00','12:08:00','12:15:00','12:17:00','12:25:00','12:30:00','12:35:00','12:39:00','12:47:00','12:52:00','12:55:00','13:00:00','13:03:00','13:07:00','13:12:00','13:15:00','13:22:00','13:27:00','13:27:00'],
'Area' : ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','B','A','B','A','A','A','A','B','A','A','B','B','A','B','C','A','B','C','C','A','B','C','C','B','A','C','B','C','C','A','C','B','C','C','A','C'],
'Place' : ['House 1','House 2','House 3','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 2','House 1','House 3','House 4','House 1','House 2','House 1','House 1','House 4','House 3','House 2','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 1','House 4','House 1','House 2','House 1','House 4','House 1','House 1','House 2','House 1','House 4','House 1','House 1','House 3','House 2','House 4','House 1','House 2','House 4','House 1','House 4','House 2'],
'On' : ['1','2','3','3','3','3','3','3','3','3','3','3','3','3','4','5','5','5','5','5','5','4','3','3','3','2','2','2','2','3','3','3','4','4','4','4','4','4','4','4','4','4','4','4','4','4','5','6','6','6','6'],
'Person' : ['Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 1','Person 2','Person 3','Person 1','Person 3','Person 1','Person 2','Person 1','Person 1','Person 3','Person 1','Person 2','Person 3','Person 3','Person 2','Person 3','Person 4','Person 2','Person 3','Person 4','Person 4','Person 2','Person 3','Person 4','Person 4','Person 3','Person 2','Person 4','Person 3','Person 4','Person 4','Person 2','Person 4','Person 3','Person 5','Person 4','Person 2','Person 4'],
})
df = pd.DataFrame(data=d)
def getAssignedPeople(df, areasPerPerson):
areas = df['Area'].values
places = df['Place'].values
times = pd.to_datetime(df['Time']).values
maxPerson = np.ceil(areas.size / float(areasPerPerson)) - 1
assignmentCount = Counter()
assignedPeople = []
assignedPlaces = {}
heldPeople = {}
heldAreas = {}
holdAvailable = True
person = 0
# search for repeated areas. Mark them if the next repeat occurs within an hour
ixrep = np.argmax(np.triu(areas.reshape(-1, 1)==areas, k=1), axis=1)
holds = np.zeros(areas.size, dtype=bool)
holds[ixrep.nonzero()] = (times[ixrep[ixrep.nonzero()]] - times[ixrep.nonzero()]) < np.timedelta64(1, 'h')
for area,place,hold in zip(areas, places, holds):
if (area, place) in assignedPlaces:
# this unique (area, place) has already been assigned to someone
assignedPeople.append(assignedPlaces[(area, place)])
continue
if assignmentCount[person] >= areasPerPerson:
# the current person is already assigned to enough areas, move on to the next
a = heldPeople.pop(person, None)
heldAreas.pop(a, None)
person += 1
if area in heldAreas:
# assign to the person held in this area
p = heldAreas.pop(area)
heldPeople.pop(p)
else:
# get the first non-held person. If we need to hold in this area,
# also make sure the person has at least 2 free assignment slots,
# though if it's the last person assign to them anyway
p = person
while p in heldPeople or (hold and holdAvailable and (areasPerPerson - assignmentCount[p] < 2)) and not p==maxPerson:
p += 1
assignmentCount.update([p])
assignedPlaces[(area, place)] = p
assignedPeople.append(p)
if hold:
if p==maxPerson:
# mark that there are no more people available to perform holds
holdAvailable = False
# this area recurrs in an hour, mark that the person should be held here
heldPeople[p] = area
heldAreas[area] = p
return assignedPeople
def allocatePeople(df, areasPerPerson=3):
assignedPeople = getAssignedPeople(df, areasPerPerson=areasPerPerson)
df = df.copy()
df.loc[:,'Person'] = df['Person'].unique()[assignedPeople]
return df
print(allocatePeople(df))
输出:
^{pr2}$预期输出和我认为应该分配的原因的注释:
There's a live version of this answer online that you can try for yourself.
问题
您看到的bug是由于您的问题(又一个)有趣的边缘情况。在
6th
作业期间,代码将person 2
分配给(A, House 4)
。然后它看到区域A
在一小时内重复,因此它在该区域中保持person 2
。这使得person 2
不可用于下一个作业,该作业位于B
区域中。在但是,没有理由为了在}中保留{},因为区域和地点的唯一组合{}已经被分配给
(A, House 1)
中发生的作业而在区域{person 1
。在解决方案
这个问题可以通过在决定何时把人关在某个区域时只考虑区域和地点的独特组合来解决。只有几行代码需要更改。在
首先,我们构建一个与唯一(区域、地点)对相对应的区域列表:
然后我们只需在标识holds的代码的第一行中用}:
^{pr2}$unqareas
代替{完整列表/测试
输出:
相关问题 更多 >
编程相关推荐