# Get index of (shop, previous month, day).
# This will serve as a unique index to look up prev. month sale.
prev = pd.concat((df.shop, df.month - 1, df.day), axis=1)
# Unfortunately need to convert to list of tuples for MultiIndexing
prev = pd.MultiIndex.from_arrays(prev.values.T)
# old: [tuple(i) for i in prev.values]
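# Now look up each prev. month sale. Use .reindex rather than .loc:
# since pandas 1.0, .loc raises a KeyError when any key is missing,
# whereas .reindex fills the missing rows with NaN.
sale_prev_month = df.set_index(['shop', 'month', 'day']).reindex(prev)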
# And finally just concat rather than merge/join operation
# because we want to ignore index & mimic a left join.
df = pd.concat((df, sale_prev_month.reset_index(drop=True)), axis=1)
   shop  month  day  sale  sale
0     1      7    1    10   8.0
1     1      6    1     8   9.0
2     1      5    1     9   NaN
3     2      7    1    10   8.0
4     2      6    1     8   9.0
5     2      5    1     9   NaN
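The code above is a solution using .concat(), set_index(), and .loc[] (on pandas 1.0 and later, use .reindex() for the lookup, because .loc[] raises a KeyError on missing labels). Your new column will be float rather than int, because of the presence of NaNs.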
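For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is reconstructed from the output shown above, so treat it as an assumption about the original data. The column name sale_prev_month is my own choice to avoid two columns called sale, and .reindex() stands in for .loc[] so that missing keys become NaN instead of raising.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    "shop":  [1, 1, 1, 2, 2, 2],
    "month": [7, 6, 5, 7, 6, 5],
    "day":   [1, 1, 1, 1, 1, 1],
    "sale":  [10, 8, 9, 10, 8, 9],
})

# Key of the row to look up: same shop and day, previous month.
prev = pd.MultiIndex.from_arrays([df["shop"], df["month"] - 1, df["day"]])

# Index sales by (shop, month, day), then pull the rows named by `prev`.
# reindex leaves NaN where no previous-month row exists.
sale_prev_month = (
    df.set_index(["shop", "month", "day"])["sale"]
      .reindex(prev)
      .rename("sale_prev_month")   # hypothetical name, not from the original
      .reset_index(drop=True)      # drop the lookup key so rows line up by position
)

# Positional concat mimics a left join onto the original frame.
result = pd.concat([df, sale_prev_month], axis=1)
print(result)
print(result["sale_prev_month"].dtype)  # float, because NaN cannot live in an int column
```

If you need integers back, filling or dropping the NaN rows first is the simplest route; which one is right depends on what a missing previous month should mean in your data.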
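Update: an attempt using dask
I don't use dask every day, so this may be woefully sub-par. It tries to work around the fact that dask does not implement pandas' MultiIndex: you can concatenate your three existing index columns into a single string column and do the lookup on that.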
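The string-key idea can be sketched as follows. This is written in plain pandas so it runs without dask installed, and it is not the original dask code. The helper make_key, the "_" separator, the column names key, prev_key and sale_prev_month, and the sample data are all my own assumptions. Porting it to a dask DataFrame is left untested here.

```python
import pandas as pd

# Sample data (assumed, matching the output shown earlier).
df = pd.DataFrame({
    "shop":  [1, 1, 1, 2, 2, 2],
    "month": [7, 6, 5, 7, 6, 5],
    "day":   [1, 1, 1, 1, 1, 1],
    "sale":  [10, 8, 9, 10, 8, 9],
})

def make_key(shop, month, day):
    """Join three integer columns into one string key (hypothetical helper)."""
    return shop.astype(str) + "_" + month.astype(str) + "_" + day.astype(str)

# One flat string key per row replaces the three-level MultiIndex.
df["key"] = make_key(df["shop"], df["month"], df["day"])
# Key of the row we want to fetch: same shop and day, previous month.
df["prev_key"] = make_key(df["shop"], df["month"] - 1, df["day"])

# Series mapping key -> sale; keys must be unique for the lookup to work.
lookup = df.set_index("key")["sale"]

# Keys with no match (no previous month on record) come back as NaN.
df["sale_prev_month"] = df["prev_key"].map(lookup)

print(df[["shop", "month", "day", "sale", "sale_prev_month"]])
```

The separator matters: without it, (1, 11, 1) and (11, 1, 1) would both become "1111" and collide.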