How to replace null values in columns with special characters in their names

Posted 2024-05-21 03:13:01


I have a DataFrame with the following column names:

Column (Name)     Column Name 2   Column3   Column (4)
NULL                 NULL             C3       100
22                    C44            C55       NULL
2                      C5            C11       13

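I want to replace the null values in the subset Column (Name) and Column (4) with the mean and the minimum, respectively. How can I do this? The values in Column (Name) and Column (4) are numeric.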

 df['Column (Name)']=df['Column (Name)'].fillna(df['Column (Name)'].mean())
 df['Column (4)']=df['Column (4)'].fillna(df['Column (4)'].min())

I get the following error:

TypeError: can only concatenate str (not "int") to str

Expected output:

 Column (Name)     Column Name 2   Column3   Column (4)
    12                 NULL            C3        100
    22                  C44           C55        13
    2                    C5              C11       13

3 answers

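This error is raised when you try to concatenate a string and an integer. Concatenation only works when both operands have the same type. Try converting the integer to a string with str().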
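A minimal sketch of how this message can arise, assuming the column holds a mix of text such as "NULL" and integers. The list `values` and the helper `try_sum` are hypothetical and only illustrate left-to-right addition over mixed types; they are not taken from the question.

```python
# Hypothetical values standing in for a column that mixes text and numbers.
values = ["NULL", 22, 2]


def try_sum(items):
    """Add items left to right, the way a naive sum over mixed objects would."""
    total = items[0]
    for item in items[1:]:
        total = total + item  # "NULL" + 22 raises TypeError
    return total


try:
    try_sum(values)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

For computing a mean or minimum, the useful conversion is likely the other direction: turning the text values into numbers rather than the numbers into text.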

Actually, I get no error when I run your code. Please compare your dtypes with the ones my code produces.

import io
import pandas as pd

Read your data:

df = pd.read_csv(io.StringIO("""
Column (Name)     Column Name 2   Column3   Column (4)
NULL                 NULL             C3       100
22                    C44            C55       NULL
2                      C5            C11       13
"""), sep=r"\s\s+", engine="python")

Check the dtypes:

df.dtypes

Column (Name)    float64
Column Name 2     object
Column3           object
Column (4)       float64
dtype: object

Code that fills in the mean and the minimum:

df['Column (Name)']=df['Column (Name)'].fillna(df['Column (Name)'].mean())
df['Column (4)']=df['Column (4)'].fillna(df['Column (4)'].min())

The fill values are 12.0 and 13.0.

Your error means that the columns contain some non-numeric values.

Check whether the columns are numeric with df.dtypes, and convert them if they are not:

print(df.dtypes)

Then you can check which values are invalid:

print (df.loc[pd.to_numeric(df['Column (Name)'], errors='coerce').isna(), 'Column (Name)'])
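The check above also lists entries that were already missing, since those are NaN after conversion as well. A small sketch of one way to isolate only the values that were present but not parseable, assuming a hypothetical Series `s` that mixes a text marker, a genuinely missing value, and numbers:

```python
import pandas as pd

# Hypothetical column: one text marker, one genuinely missing value, two numbers.
s = pd.Series(["NULL", None, 22, 2], name="Column (Name)")

# Unparseable entries become NaN instead of raising.
converted = pd.to_numeric(s, errors="coerce")

# True only where a value was present but could not be parsed as a number.
bad_mask = converted.isna() & s.notna()
bad_values = s[bad_mask].tolist()
print(bad_values)
```

Here only the text marker is reported; the entry that was missing from the start is left out.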

Finally, convert to numeric:

df['Column (Name)'] = pd.to_numeric(df['Column (Name)'], errors='coerce')
df['Column (4)'] = pd.to_numeric(df['Column (4)'], errors='coerce')

Or, if you want to convert several columns at once:

cols = ['Column (Name)','Column (4)']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

Then use your own solution:

df['Column (Name)']=df['Column (Name)'].fillna(df['Column (Name)'].mean())
df['Column (4)']=df['Column (4)'].fillna(df['Column (4)'].min())
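Putting the conversion and the fill together, here is a self-contained sketch. It assumes the original frame stores "NULL" as text next to integers, which is a guess at the asker's data rather than something the question confirms; the literal frame below is hypothetical.

```python
import pandas as pd

# Hypothetical frame mirroring the question: "NULL" is stored as text,
# so both numeric-looking columns start out with dtype object.
df = pd.DataFrame({
    "Column (Name)": ["NULL", 22, 2],
    "Column Name 2": ["NULL", "C44", "C5"],
    "Column3": ["C3", "C55", "C11"],
    "Column (4)": [100, "NULL", 13],
})

cols = ["Column (Name)", "Column (4)"]
# Anything that cannot be parsed as a number becomes NaN.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

# Fill the first column with its mean and the second with its minimum.
df["Column (Name)"] = df["Column (Name)"].fillna(df["Column (Name)"].mean())
df["Column (4)"] = df["Column (4)"].fillna(df["Column (4)"].min())
print(df)
```

The special characters in the column names need no escaping here, because bracket indexing treats the name as an ordinary string.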

Or you can use DataFrame.agg:

df = df.fillna(df.agg({'Column (Name)':'mean', 'Column (4)':'min'}))
print (df)
   Column (Name) Column Name 2 Column3  Column (4)
0           12.0           NaN      C3       100.0
1           22.0           C44     C55        13.0
2            2.0            C5     C11        13.0
