Drop DataFrame rows that do not have the longest list

Posted 2024-04-25 00:32:22


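My search skills must be failing me, because this has to be a common problem. I have a DataFrame with a column of nested lists, and I am trying to drop every row that does not hold the longest list in its group: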

import pandas as pd

df = pd.DataFrame(data = [["a", "b", "c", ["d", "e"]],
                          ["a", "b", "c", ["e"]],
                          ["l", "m", "n", ["o"]]],
                  columns = ["c1", "c2", "c3", "c4"])

# max compares the lists element by element, not by length ~ this is wrong
df.groupby(by=["c1", "c2", "c3"])["c4"].apply(max)
c1  c2  c3
a   b   c        [e]
l   m   n        [o]
Name: c4, dtype: object

# len returns an int (here, the number of rows in each group) ~ but using an int to match back to a row isn't guaranteed
df.groupby(by=["c1", "c2", "c3"])["c4"].apply(len)
c1  c2  c3
a   b   c     2
l   m   n     1
Name: c4, dtype: int64

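The rows have to be grouped first, because the three columns together make up a unique key, and I need the longest list for each key. The lists also differ in length from group to group: for most groups the longest list has 1 element, for others it can have up to 5. The end result should be a new DataFrame like this: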

c1  c2  c3  c4
a   b   c   ["d", "e"]
l   m   n   ["o"]

1 answer
User
#1 · Posted 2024-04-25 00:32:22

How about this:

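import pandas as pd

df = pd.DataFrame(data =[["a", "b", "c", ["d", "e"]],
                         ["a", "b", "c", ["e"]],
                         ["l", "m", "n", ["o"]]],
                  columns = ["c1", "c2", "c3", "c4"])

# helper column: length of each list in c4
df['len'] = df['c4'].apply(len)

# keep the rows whose list length equals the maximum length within their group
max_groups = df[df.groupby(['c1', 'c2', 'c3'])['len'].transform('max') == df['len']]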

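We add an extra column holding the length of each list in c4, then filter the DataFrame down to the rows whose c4 length equals the maximum c4 length within their group. If several rows in a group tie for the longest list, all of them are kept. max_groups comes back as: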

  c1 c2 c3      c4  len
0  a  b  c  [d, e]    2
2  l  m  n     [o]    1
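
If the helper column is not wanted in the result, a variation along these lines might work. This is only a sketch under the same sample data: the names `keys`, `lengths`, `with_ties` and `one_per_group` are made up for illustration, and it assumes every value in c4 is a list.

```python
import pandas as pd

df = pd.DataFrame(
    data=[["a", "b", "c", ["d", "e"]],
          ["a", "b", "c", ["e"]],
          ["l", "m", "n", ["o"]]],
    columns=["c1", "c2", "c3", "c4"],
)

keys = ["c1", "c2", "c3"]          # hypothetical name for the grouping columns
group_cols = [df[k] for k in keys]

# List lengths kept in a separate Series, so df itself gets no extra column.
lengths = df["c4"].str.len()

# Variant 1: keep every row whose list is as long as the longest in its group
# (rows that tie for the longest list all survive).
mask = lengths == lengths.groupby(group_cols).transform("max")
with_ties = df[mask].reset_index(drop=True)

# Variant 2: keep exactly one row per group, the first one with the longest list.
first_longest = lengths.groupby(group_cols).idxmax().tolist()
one_per_group = df.loc[first_longest].reset_index(drop=True)

print(with_ties)
print(one_per_group)
```

Which variant fits depends on whether ties within a group should be kept or collapsed to a single row; for the sample data both give the same two rows.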
