Computing the top 10 products by priority for a given query in Python

Posted 2024-05-29 01:36:58


Suppose we are given a DataFrame like this:

                       Query  Productid  priority
index
0                        3ds    2125233  0.018946
1                        rca    2009324  0.027599
2                       nook    1517163  0.009443
3                        rca    2877125  0.012054
4                        rca    2877134  0.005557
5              flatscreentvs    2416092  0.011961
6                    macbook    3108172  0.010459
7                        3ds    2264036  0.165948
8                        rca    8280834  0.004006
9                 memorycard    2740208  0.013744
10               acpowercord    2584273  0.006865
11                zaggiphone    1230537  0.136073
12            watchthethrone    3168067  0.104679
13     remotecontrolextender    7997055  0.113058
14                 camcorder    2009041  0.017809
15                       3ds    1988047  0.031711
16                       3ds    1686079  0.043783
17        wirelessheadphones    3770439  0.014714
18        wirelessheadphones    2602403  0.008525
19                 samsung40    2126065  0.018066

I want to find the top 2 product IDs for a given query, ranked by priority.

For example, if query=3ds, the top 2 products should be:

1. 1988047 
2. 1686079 

2 answers

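This is the pandas equivalent of Oracle's row_number() analytic function: sort by priority in descending order, number the rows within each Query group, and keep the rows numbered 1 and 2: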

In [172]: df.assign(rn=df.sort_values('priority', ascending=False).groupby('Query').cumcount() + 1).query('rn < 3').sort_values(['Query','rn'])
Out[172]:
                       Query  Productid  priority  rn
index
7                        3ds    2264036  0.165948   1
16                       3ds    1686079  0.043783   2
10               acpowercord    2584273  0.006865   1
14                 camcorder    2009041  0.017809   1
5              flatscreentvs    2416092  0.011961   1
6                    macbook    3108172  0.010459   1
9                 memorycard    2740208  0.013744   1
2                       nook    1517163  0.009443   1
1                        rca    2009324  0.027599   1
3                        rca    2877125  0.012054   2
13     remotecontrolextender    7997055  0.113058   1
19                 samsung40    2126065  0.018066   1
12            watchthethrone    3168067  0.104679   1
17        wirelessheadphones    3770439  0.014714   1
18        wirelessheadphones    2602403  0.008525   2
11                zaggiphone    1230537  0.136073   1

To show only the Productid values for a selected Query:

In [180]: (df.assign(rn=df.sort_values('priority', ascending=False).groupby('Query').cumcount() + 1)
   .....:    .query('Query=="3ds" and rn < 3')['Productid']
   .....: )
Out[180]:
index
7     2264036
16    1686079
Name: Productid, dtype: int64
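
The row-numbering approach above can be sketched as a standalone script. This is a minimal sketch, not the answerer's exact code: the helper name `top_n_per_query` and its parameter `n` are hypothetical, and the DataFrame holds only the "3ds" and "rca" rows from the question.

```python
import pandas as pd

# Subset of the question's sample data (only the "3ds" and "rca" rows).
df = pd.DataFrame({
    "Query": ["3ds", "rca", "rca", "rca", "3ds", "rca", "3ds", "3ds"],
    "Productid": [2125233, 2009324, 2877125, 2877134,
                  2264036, 8280834, 1988047, 1686079],
    "priority": [0.018946, 0.027599, 0.012054, 0.005557,
                 0.165948, 0.004006, 0.031711, 0.043783],
})


def top_n_per_query(frame, n=2):
    """Hypothetical helper: keep the n highest-priority rows of each Query."""
    # Sort so that the highest priority comes first.
    ranked = frame.sort_values("priority", ascending=False)
    # cumcount() numbers rows 0, 1, 2, ... inside each group in sorted order;
    # adding 1 gives a row number like SQL's row_number().
    ranked = ranked.assign(rn=ranked.groupby("Query").cumcount() + 1)
    return ranked[ranked["rn"] <= n].sort_values(["Query", "rn"])


top2 = top_n_per_query(df, n=2)
top_3ds = top2.loc[top2["Query"] == "3ds", "Productid"].tolist()
print(top_3ds)  # [2264036, 1686079]
```

Passing `n=10` would give the top 10 per query mentioned in the title; groups with fewer rows simply return all of their rows.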

IIUC, use:

print(df.set_index('Productid').groupby('Query')['priority'].nlargest(2).reset_index())
                    Query  Productid  priority
0                     3ds    2264036  0.165948
1                     3ds    1686079  0.043783
2             acpowercord    2584273  0.006865
3               camcorder    2009041  0.017809
4           flatscreentvs    2416092  0.011961
5                 macbook    3108172  0.010459
6              memorycard    2740208  0.013744
7                    nook    1517163  0.009443
8                     rca    2009324  0.027599
9                     rca    2877125  0.012054
10  remotecontrolextender    7997055  0.113058
11              samsung40    2126065  0.018066
12         watchthethrone    3168067  0.104679
13     wirelessheadphones    3770439  0.014714
14     wirelessheadphones    2602403  0.008525
15             zaggiphone    1230537  0.136073
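
When only one query is of interest, it may be simpler to filter first and then take the largest priorities. The following is a minimal sketch under that assumption: `top_products` is a hypothetical helper, not part of either answer, and the DataFrame again holds only the "3ds" and "rca" rows from the question.

```python
import pandas as pd

# Subset of the question's sample data (only the "3ds" and "rca" rows).
df = pd.DataFrame({
    "Query": ["3ds", "rca", "rca", "rca", "3ds", "rca", "3ds", "3ds"],
    "Productid": [2125233, 2009324, 2877125, 2877134,
                  2264036, 8280834, 1988047, 1686079],
    "priority": [0.018946, 0.027599, 0.012054, 0.005557,
                 0.165948, 0.004006, 0.031711, 0.043783],
})


def top_products(frame, query, n=2):
    """Hypothetical helper: Productid of the n highest-priority rows for one query."""
    matches = frame[frame["Query"] == query]
    # DataFrame.nlargest returns the n rows with the largest values
    # in the given column, ordered from largest to smallest.
    return matches.nlargest(n, "priority")["Productid"].tolist()


print(top_products(df, "3ds"))        # [2264036, 1686079]
print(top_products(df, "rca", n=10))  # [2009324, 2877125, 2877134, 8280834]
```

Asking for more rows than a query has (as with `n=10` for "rca" here) returns every matching row in descending priority order.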
