Replace does not work for multiple string replacements when each row is a list

Posted 2024-03-28 16:11:13


I am trying to build a function that replaces http, https, com, and www in my DataFrame.

df

content                                                       Col2  Col3   Col4
[www,roger, that,com, http, great, hi, www]                   89     78     40
[http, https,www,roger, http, for,com, http, you, bye, www]   93     94     30
and so on... there are 30,000 rows

Note that each row of the content column in my dataset is a list.

Defining the function

def replace(df):
    for row in df:
        for index, item in enumerate(row):
            # create string *and update row*
            row[index] = item.replace("www", " ")
            row[index] = item.replace("http", " ")
            row[index] = item.replace("https", " ")
            row[index] = item.replace("com", " ")
    return df

Calling the function

df['content']=replace(df['content'])

The problem is that www gets replaced, but http, https, and com do not. What am I doing wrong?


1 answer
User
#1 · Posted 2024-03-28 16:11:13

You can apply a simple list comprehension to the column:

rep = ['http', 'https', 'www', 'com']
df['col2'] = df['col1'].apply(lambda x: [i for i in x if i not in rep])

                                            col1                      col2
0  [www, roger, that, com, http, great, hi, www]  [roger, that, great, hi]
1                 [http, https, www, roger, for]              [roger, for]

Sample data

import pandas as pd

cl=[["www","roger", "that","com", "http", "great", "hi", "www"],
    ["http", "https", "www","roger","for"]]

df = pd.DataFrame({'col1': cl})
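
As for what goes wrong in the loop from the question: each of the four assignments calls `replace` on the unmodified `item`, so every assignment overwrites the previous one and only the last replacement survives. Below is a minimal sketch of a version in which the replacements accumulate. It assumes the goal is to replace the substrings with a space, as in the question; the names `UNWANTED` and `strip_substrings` are hypothetical and not part of the original post.

```python
# Hypothetical helper illustrating chained replacements (not from the original post).
# "https" is listed before "http" so the longer token is handled first.
UNWANTED = ["https", "http", "www", "com"]

def strip_substrings(items):
    """Return a new list with every unwanted substring replaced by a space."""
    cleaned = []
    for item in items:
        for token in UNWANTED:
            # Reassign item so each replacement builds on the previous one.
            item = item.replace(token, " ")
        cleaned.append(item)
    return cleaned

print(strip_substrings(["http", "https", "www", "roger", "for"]))
# → [' ', ' ', ' ', 'roger', 'for']
```

With the DataFrame from the question, this could be applied as `df['content'] = df['content'].apply(strip_substrings)`. Keep in mind that substring replacement also changes longer words that happen to contain a token (for example, "welcome" contains "com"), whereas the list comprehension above removes only exact matches.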
