Comparing column values of two DataFrames day by day
I have the following two DataFrames:
Box  box_cap  size  Preference
  1       16  1200           1
  2       16  1550           2
  3       15  1300           3
The other one is:
Day  Capacity
  1        23
  2        24
The output DataFrame I need is:
Day  Box  box_cap
  1    1       16
  1    2        7
  2    2        9
  2    3       15
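This DataFrame distributes the balls across days according to each day's capacity. The Preference number determines the order in which boxes are placed: box 1 has the highest preference, so it is handled first. For example, day 1 has a capacity of 23, so it takes all 16 from box 1 and 7 from box 2; the remaining 9 from box 2 carry over to day 2. I am trying to achieve this as follows: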
for i in df.index:
    tun = df['Tundish'][i]
    heat = df['Heat'][i]
    width = df['Width'][i]
    pref = df['Preference'][i]
    tuntobegiven = []
    for j in dfday.index:
        day = dfday['Day'][j]
        cap = dfday['Capacity'][j]
        caps = cap - heat
        if caps >= 0:
            tuntobegiven.append((tun, day))
But I have not been able to get it to work. The data shown here is dummy data, and there can be many boxes.
1 Answer
In my opinion, the simplest approach is a plain loop, optionally combined with numba to improve performance:
import pandas as pd
from numba import jit

def allocate(df, dfday):
    @jit(nopython=True)  # optional
    def compute(boxes, capacities):
        box_idx = 0
        cap_idx = 0
        out = []
        # walk through boxes and days in parallel until one runs out
        while (cap_idx < len(capacities)) and (box_idx < len(boxes)):
            # move as much as both the current box and the current day allow
            take = min(boxes[box_idx], capacities[cap_idx])
            boxes[box_idx] -= take
            capacities[cap_idx] -= take
            out.append((cap_idx, box_idx, take))
            # advance to the next day once this day is full
            if not capacities[cap_idx]:
                cap_idx += 1
            # advance to the next box once this box is empty
            if not boxes[box_idx]:
                box_idx += 1
        return out

    # ensure inputs are sorted by Preference/Day
    df = df.sort_values(by='Preference', ignore_index=True)
    dfday = dfday.sort_values(by='Day', ignore_index=True)

    # run allocation (on copies, since compute modifies its inputs in place)
    out = pd.DataFrame(compute(df['box_cap'].to_numpy(copy=True),
                               dfday['Capacity'].to_numpy(copy=True)),
                       columns=['Day', 'Box', 'box_cap'])

    # convert positional indices to actual Day/Box values
    out['Day'] = dfday['Day'].to_numpy()[out['Day']]
    out['Box'] = df['Box'].to_numpy()[out['Box']]
    return out

out = allocate(df, dfday)
Output:
   Day  Box  box_cap
0    1    1       16
1    1    2        7
2    2    2        9
3    2    3       15
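To check the greedy logic by hand without pandas or numba, here is a minimal plain-Python sketch of the same loop. The function name `allocate_rows` and the list-of-tuples input format are assumptions made for illustration; they are not part of the answer above. It also assumes the boxes are already sorted by Preference and the days by Day. Unlike the loop above, this sketch skips rows where nothing is moved (for example an empty box).

```python
def allocate_rows(boxes, days):
    """Greedy allocation sketch (hypothetical helper, not from the answer).

    boxes: list of (box_id, box_cap), already sorted by preference.
    days:  list of (day, capacity), already sorted by day.
    Returns a list of (day, box_id, amount) rows.
    """
    remaining_box = [cap for _, cap in boxes]
    remaining_day = [cap for _, cap in days]
    box_idx = day_idx = 0
    rows = []
    while box_idx < len(boxes) and day_idx < len(days):
        # move as much as the current box and the current day both allow
        take = min(remaining_box[box_idx], remaining_day[day_idx])
        remaining_box[box_idx] -= take
        remaining_day[day_idx] -= take
        if take:  # skip zero-sized rows from empty boxes or days
            rows.append((days[day_idx][0], boxes[box_idx][0], take))
        if remaining_day[day_idx] == 0:  # day is full, go to the next day
            day_idx += 1
        if remaining_box[box_idx] == 0:  # box is empty, go to the next box
            box_idx += 1
    return rows


# sample data from the question: (Box, box_cap) and (Day, Capacity)
rows = allocate_rows([(1, 16), (2, 16), (3, 15)], [(1, 23), (2, 24)])
print(rows)  # [(1, 1, 16), (1, 2, 7), (2, 2, 9), (2, 3, 15)]
```

The resulting rows match the expected output from the question. If total box content exceeds total capacity, the leftover simply does not appear in the result, so it may be worth comparing the sums when that case matters.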