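I'm trying to apply various filters to a new df. Where possible, I'd like to use vectorized operations instead of looping over every row (which is what I'm doing now, and it's unacceptably slow). However, some of the filters aren't straightforward: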
import pandas as pd

productids = [
'01t0J00000HcoqpQAB', '01t0J00000HcoqnQAB', '01t0J00000HcoqyQAB',
'01t0J00000Hcor3QAB', '01t0J00000Hcor5QAB', '01t0J00000Hcor6QAB',
'01t0J00000Hcor9QAB', '01t0J00000HcorCQAR', '01t0J00000HcorGQAR',
'01t0J00000IDGAOQA5'
]
previous_products = [
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'}
]
df_test = pd.DataFrame({'productids': productids, 'previous_products': previous_products}, index=range(len(productids)))
df_test
Here is the filter I'm trying to apply:
df_test.productids.isin(df_test.previous_products)
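One thing worth noting about this attempt: `Series.isin` does not compare row by row. It checks each value against the whole collection passed in, regardless of position, so with a column of sets the candidate values are the sets themselves rather than their contents. A toy example with plain strings (hypothetical data, not from the question) shows the position-independence:

```python
import pandas as pd

# Two Series of the same length; a row-aligned equality check would give [False, False].
left = pd.Series(['x', 'y'])
right = pd.Series(['y', 'z'])

# isin asks "does this value appear anywhere in `right`?", ignoring row position.
result = left.isin(right)
print(result.tolist())  # [False, True]
```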
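The logic behind this filter is that I need to know whether the id in column 1 is contained in the set of ids in column 2 on the same row. Column 2 is the output of a separate set of functions that compute each client's previous products. What I'm doing now is roughly this: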
for i, row in df_test.iterrows():
    # Row-wise check: is this row's product id in this row's set of previous products?
    if row['productids'] in row['previous_products']:
        pass  # do more stuff
    else:
        pass  # do different stuff
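A common way to drop `iterrows()` here, sketched on a hypothetical three-row frame with the same two columns: build a boolean mask with a list comprehension over `zip` of the two columns, then split the frame with that mask instead of branching inside the loop. This is still a Python-level loop, but it avoids constructing a Series for every row. It assumes the "more stuff" / "different stuff" steps can be rewritten as operations on a whole subset of rows.

```python
import pandas as pd

# Hypothetical stand-in for df_test with the same column layout.
df = pd.DataFrame({
    'productids': ['a', 'b', 'c'],
    'previous_products': [{'a', 'x'}, {'x'}, set()],
})

# Pair each id with the set on the same row and test membership.
mask = pd.Series(
    [pid in prev for pid, prev in zip(df['productids'], df['previous_products'])],
    index=df.index,
)

# Replace the if/else branches with two boolean-indexed subsets.
in_previous = df[mask]        # rows for the "do more stuff" branch
not_in_previous = df[~mask]   # rows for the "do different stuff" branch
print(mask.tolist())  # [True, False, False]
```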
The problem is that as the df grows, the loop takes a very long time to finish.
Any suggestions?
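A pandas-level alternative, sketched under the assumption of pandas 0.25 or later (where `Series.explode` was added; its documentation states it explodes sets as well as lists): explode each set into its own row, compare against the product id repeated for that row, then collapse back with `groupby(...).any()`. The frame below is hypothetical. Exploding multiplies the row count by the set sizes, so whether this beats the list comprehension depends on the data and is worth timing.

```python
import pandas as pd

# Hypothetical stand-in for df_test with the same column layout.
df = pd.DataFrame({
    'productids': ['a', 'b', 'c'],
    'previous_products': [{'a', 'x'}, {'x'}, set()],
})

# One row per (original row, previous product); empty sets become NaN.
exploded = df['previous_products'].explode()

# Repeat each row's product id so it lines up with the exploded rows.
# The target index has duplicate labels, which reindex allows because df's index is unique.
ids = df['productids'].reindex(exploded.index)

# Element-wise comparison, then one boolean per original row.
mask = (exploded == ids).groupby(level=0).any()
print(mask.tolist())  # [True, False, False]
```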