Removing words that occur fewer than a certain number of times from a pandas Series using lambda - Q&A - Python中文网

Published 2024-04-29 09:09:32

I have a large Series of 41,000 rows in a DataFrame df:

column1                                   column2     column2
content in not below like this amsterdam  nan         sport
massive create non-programming question   nan         religion

I want to remove every word in column1 that occurs 5 times or fewer, so that the DataFrame df looks like this:

column1                                   column2     column2
content amsterdam                         nan         sport
massive create non-programming question   nan         religion

Can anyone help me?

My initial attempt was:

df['column1'] = df['column1'].apply(filter(lambda x : (x, df['column1'].count < 4)), set(df['column1']))

But the error message I get is:

TypeError: filter expected 2 arguments, got 1

1 Answer

User

#1 · Posted 2024-04-29 09:09:32
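The TypeError most likely comes from the parentheses in the attempt: filter( closes right after the lambda, so filter receives only one argument, and set(df['column1']) ends up being passed to apply instead. A minimal sketch that reproduces the message outside pandas, using a made-up word list (an assumption for illustration):

```python
# Made-up word list (assumption), only used to reproduce the message.
words = ["content", "in", "not"]

try:
    # Only one argument, as in the attempt: the iterable is missing.
    filter(lambda w: len(w) > 2)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# With the iterable as the second argument, filter works as intended.
kept = list(filter(lambda w: len(w) > 2, words))
print(kept)
```

Fixing the parentheses alone would still not give the desired result, because df['column1'].count is a method that is never called and would not count individual words anyway.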

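It is probably best to use a named function, since a single lambda would be too complex and not very clean.

A regex splits each element into a list of words. A Counter built over the whole column records how many times each word occurs, and filter then keeps only the words that occur more than 5 times. Note that the regex also splits hyphenated words such as non-programming into two words.

import re
from collections import Counter

# Count every word across the whole column (assumes all cells are strings)
word_counts = Counter(re.findall(r"\w+", " ".join(df["column1"])))

def remove_five_or_less(line):
    # Replace non-word characters with spaces, then split into words
    word_list = re.sub(r"[^\w]", " ", line["column1"]).split()
    # Keep only the words that occur more than 5 times in the column
    filtered_list = filter(lambda x: word_counts[x] > 5, word_list)
    return " ".join(filtered_list)

df["column1"] = df.apply(remove_five_or_less, axis=1)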
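To check the idea by hand on a few rows, here is a minimal sketch rather than a definitive implementation. The helper name drop_rare_words, its min_count parameter and the sample rows are all assumptions made for illustration. It splits on whitespace instead of using the regex, so hyphenated words stay intact, and it treats non-string entries such as NaN as empty.

```python
from collections import Counter


def drop_rare_words(texts, min_count):
    """Keep only words occurring at least `min_count` times across `texts`.

    Hypothetical helper for illustration; words are split on whitespace,
    and non-string entries (such as NaN) are treated as empty.
    """
    # Tokenize every row once so counting and filtering see the same words.
    rows = [text.split() if isinstance(text, str) else [] for text in texts]
    # Count occurrences over the whole collection, not per row.
    counts = Counter(word for row in rows for word in row)
    # Rebuild each row from the words that are frequent enough.
    return [
        " ".join(word for word in row if counts[word] >= min_count)
        for row in rows
    ]


# Made-up sample rows (assumption), small enough to check by hand.
sample = [
    "content in amsterdam",
    "content amsterdam",
    "massive non-programming content",
]
cleaned = drop_rare_words(sample, min_count=2)
print(cleaned)  # ['content amsterdam', 'content amsterdam', 'content']
```

On the DataFrame from the question, this would presumably be called as df["column1"] = drop_rare_words(df["column1"], min_count=6), since words that occur 5 times or fewer should be removed.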
