How do I filter the numbers (n > 3) in a DataFrame column of lists?

Posted 2024-04-25 14:16:18


   movie_id       user_id        rating
0         1  [5, 2, 1, 6]  [4, 4, 5, 4]
1         2        [5, 1]        [3, 3]
2         3           [1]           [4]
3         4           [1]           [3]
4         5           [1]           [3]
5         6           [1]           [5]
6         7        [6, 1]        [2, 4]
7         8        [1, 6]        [1, 4]
8         9        [1, 6]        [5, 4]

I'm trying to get, for each row, the count of numbers in "rating" that are greater than 3. For example, [4, 4, 5, 5] => 4 and [3, 3] => 0.

Here is what I have done so far:

from collections import Counter

appr = df.copy()

# Count how often each rating value occurs in every row's list
appr['approval'] = appr['rating'].map(Counter)
appr

This outputs:

   movie_id       user_id        rating      approval
0         1  [5, 2, 1, 6]  [4, 4, 5, 4]  {4: 3, 5: 1}
1         2        [5, 1]        [3, 3]        {3: 2}
2         3           [1]           [4]        {4: 1}
3         4           [1]           [3]        {3: 1}
4         5           [1]           [3]        {3: 1}
5         6           [1]           [5]        {5: 1}
6         7        [6, 1]        [2, 4]  {2: 1, 4: 1}
7         8        [1, 6]        [1, 4]  {1: 1, 4: 1}
8         9        [1, 6]        [5, 4]  {5: 1, 4: 1}

My goal is to filter out the numbers in each row of "rating" that are not greater than 3, and sum the occurrences of those that remain:

   movie_id       user_id        rating      approval  appr_sum
0         1  [5, 2, 1, 6]  [4, 4, 5, 4]  {4: 3, 5: 1}         4
1         2        [5, 1]        [3, 3]        {3: 2}         0
2         3           [1]           [4]        {4: 1}         1
3         4           [1]           [3]        {3: 1}         0
4         5           [1]           [3]        {3: 1}         0
5         6           [1]           [5]        {5: 1}         1
6         7        [6, 1]        [2, 4]  {2: 1, 4: 1}         1
7         8        [1, 6]        [1, 4]  {1: 1, 4: 1}         1
8         9        [1, 6]        [5, 4]  {5: 1, 4: 1}         2

I tried:

s = appr['rating'].map

t = [x for x in s if x > 3]
t

But this raises TypeError: 'method' object is not iterable, and even if this part of the code were correct, it would not sum the occurrences.


3 answers

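A better approach is to avoid keeping lists inside a Series. Instead, you can:

  1. Expand the Series of lists into additional columns.
  2. Expand the Series of lists into multiple rows.

Both options allow vectorized computation. Taking the first option: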

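# One column per list position (0rat, 1rat, ...); shorter lists are padded with NaN
rats = pd.DataFrame(df.pop('rating').values.tolist()).add_suffix('rat')
# NaN > 3 is False, so the padding is never counted; sum the True values per row
appr = appr.join(rats).assign(appr_sum=rats.gt(3).sum(axis=1))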

Use a nested list comprehension with filtering and sum:

appr['appr_sum'] = [sum(v for k, v in x.items() if k > 3) for x in appr['approval']]
print(appr)
   movie_id       user_id        rating      approval  appr_sum
0         1  [5, 2, 1, 6]  [4, 4, 5, 4]  {4: 3, 5: 1}         4
1         2        [5, 1]        [3, 3]        {3: 2}         0
2         3           [1]           [4]        {4: 1}         1
3         4           [1]           [3]        {3: 1}         0
4         5           [1]           [3]        {3: 1}         0
5         6           [1]           [5]        {5: 1}         1
6         7        [6, 1]        [2, 4]  {2: 1, 4: 1}         1
7         8        [1, 6]        [1, 4]  {1: 1, 4: 1}         1
8         9        [1, 6]        [5, 4]  {5: 1, 4: 1}         2
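
To see what the comprehension does for a single row, here is a small walk-through on the first row's ratings, using only the standard library. The names `row_counts` and `row_sum` are illustrative and do not appear in the answer.

```python
from collections import Counter

# Ratings of the first row in the question
row_counts = Counter([4, 4, 5, 4])  # maps each rating to its number of occurrences

# Keep only the ratings (keys) greater than 3 and add up their occurrence counts
row_sum = sum(v for k, v in row_counts.items() if k > 3)
print(row_sum)
```

A row such as [3, 3] has no key greater than 3, so the inner generator is empty and `sum` returns 0.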

Your expression does not work because you are not iterating over the Series correctly. A simpler approach is:

import pandas as pd

df = pd.DataFrame({'A': [1, 3, 4]})

a = [x for _, x in df.iterrows() if x['A'] > 3]
print(a)

> [A    4
  Name: 2, dtype: int64]
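
Applied to the question's own data, the same idea might look like the sketch below. It first reproduces the error (`s.map` without parentheses is a bound method, not the Series itself) and then iterates over the Series directly. The sample Series and the names `kept` and `counts` are assumptions made for illustration.

```python
import pandas as pd

# A few rating lists in the same shape as the question's "rating" column
s = pd.Series([[4, 4, 5, 4], [3, 3], [2, 4]])

# s.map is a method object, so trying to loop over it raises TypeError
try:
    [x for x in s.map]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Iterating over the Series itself yields each row's list
kept = [[x for x in row if x > 3] for row in s]

# The count per row is the length of each filtered list
counts = [len(row) for row in kept]
print(counts)
```

Each element of the Series is a list, so the comparison `x > 3` has to happen on the list's items, which is why a second, inner loop is needed.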
