How to vectorize a comparison between two columns

Posted 2024-06-02 06:17:00


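I am trying to apply various filters to a new df. If possible, I would like to use vectorized operations rather than looping over every row (which is what I do now, and it is unacceptably slow). However, some of the filters are not straightforward.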

import pandas as pd

productids = [
    '01t0J00000HcoqpQAB', '01t0J00000HcoqnQAB', '01t0J00000HcoqyQAB',
    '01t0J00000Hcor3QAB', '01t0J00000Hcor5QAB', '01t0J00000Hcor6QAB',
    '01t0J00000Hcor9QAB', '01t0J00000HcorCQAR', '01t0J00000HcorGQAR',
    '01t0J00000IDGAOQA5'
]

previous_products = [
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'},
{'01t0J00000Hcor3QAB', '01t0J00000IDGAOQA5', '01t0J00000Hcor5QAB', '01t0J00000HcoqyQAB', '01t0J00000Hcor9QAB', '01t0J00000HcorGQAR', '01t0J00000Hcor6QAB', '01t0J00000HcorCQAR', '01t0J00000HcoqnQAB'}
]

df_test = pd.DataFrame({'productids': productids, 'previous_products': previous_products}, index=range(len(productids)))

df_test

Here is the filter I am trying to apply:

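# isin tests each id against the column's values as a whole, not against the set in the same row
df_test.productids.isin(df_test.previous_products)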

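The logic behind this is that I need to know whether the id in column 1 is present in the set of ids in column 2. Column 2 is the result of another group of functions that compute each customer's previous products. What I do now looks roughly like this: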

for i, row in df_test.iterrows():
    if row['productids'] in row['previous_products']:
        pass  # do more stuff
    else:
        pass  # do different stuff

The problem is that as the df grows, the loop takes a very long time to finish.

Any other suggestions?


1 answer
User
#1 · Posted 2024-06-02 06:17:00
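import pandas as pd

df1 = pd.DataFrame([[1, 4], [2, 5], [3, 6]], columns=['col1', 'col2'])
df2 = pd.DataFrame([1, 2], columns=['lookup_col'])

# Inner merge keeps only the rows of df1 whose col1 value appears in lookup_col
df_merge = df1.merge(df2, left_on='col1', right_on='lookup_col')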
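
The merge above filters against one flat lookup column. The question's case is slightly different, since each id has to be tested against the set stored in its own row. One way that might be handled is sketched below. It is only a sketch under assumptions: the frame `df` and its ids are made up to stand in for `df_test`, and the names `mask`, `seen_before` and `not_seen` are hypothetical. It is not vectorized in the NumPy sense, because the sets are Python objects, but it avoids building a Series per row the way `iterrows` does, which is usually where most of the time goes.

```python
import pandas as pd

# Small stand-in for df_test from the question (ids here are made up).
df = pd.DataFrame({
    'productids': ['a', 'b', 'c', 'd'],
    'previous_products': [{'a', 'x'}, {'x', 'y'}, {'c'}, set()],
})

# Row-wise membership test without iterrows: zip walks both columns in
# lockstep, and each `in` check against a set is a hash lookup.
mask = pd.Series(
    [pid in prev for pid, prev in zip(df['productids'], df['previous_products'])],
    index=df.index,
)

# The boolean mask takes the place of the if/else inside the loop.
seen_before = df[mask]   # rows whose id is in that row's set
not_seen = df[~mask]     # rows whose id is not
```

Each branch of the original loop can then work on `seen_before` and `not_seen` as whole frames, provided the "more stuff" and "different stuff" steps can themselves be written as column operations.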
