Applying a function to specific columns of a DataFrame

3 answers

User

Answer 1 · edited 2024-04-19 20:52:36

Do not use apply. Instead, use .loc to set the values on a subset. The top case is a simple mapping.

m = train.Age.isnull()
d = {1: 38, 2: 30, 3: 25}

train.loc[m, 'Age'] = train.loc[m, 'Pclass'].map(d)
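
The snippet above assumes a train frame already exists. As a minimal self-contained sketch, here is the same idea on a small made-up frame (the sample values are hypothetical, only the column names and the mapping come from the answer):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's `train` frame.
train = pd.DataFrame({'Age': [22.0, np.nan, np.nan, np.nan],
                      'Pclass': [3, 1, 2, 3]})

m = train['Age'].isnull()      # boolean mask: rows where Age is missing
d = {1: 38, 2: 30, 3: 25}      # Pclass -> replacement age

# Only the masked rows are touched; row 0 keeps its original Age.
train.loc[m, 'Age'] = train.loc[m, 'Pclass'].map(d)
print(train)
```

Note that a Pclass value missing from d would map to NaN, so such a row would simply stay null.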

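For the bottom case, because of the else clause, we can use np.select. It works by building a list of conditions that follows the order of the if / elif / else logic. We then provide a list of choices, and for each row the choice belonging to the first True condition is picked. Since your logic is nested, we first need to un-nest it so that it logically reads as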

if age is null and pclass == 1
elif age is null and pclass == 2
elif age is null 
else

Sample data

import pandas as pd
import numpy as np

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'Age': [50, 60, 70, np.nan, np.nan, np.nan, np.nan],
                   'Pclass': [1, 1, 1, 1, 2, np.nan, 1]})
#    Age  Pclass
#0  50.0     1.0
#1  60.0     1.0
#2  70.0     1.0
#3   NaN     1.0
#4   NaN     2.0
#5   NaN     NaN
#6   NaN     1.0

m = df.Age.isnull()
conds = [m & df.Pclass.eq(1),
         m & df.Pclass.eq(2),
         m]
choices = [37, 29, 24]

df['Age'] = np.select(conds, choices, default=df.Age)
                                      # |
                                      # Takes care of else, i.e. Age not null
print(df)
#    Age  Pclass
#0  50.0     1.0
#1  60.0     1.0
#2  70.0     1.0
#3  37.0     1.0
#4  29.0     2.0
#5  24.0     NaN
#6  37.0     1.0

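User

Answer 2 · edited 2024-04-19 20:52:36

Welcome to Python. As for your question: especially when you are just starting out, sometimes the quickest way is to open a new IPython notebook and try it: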

In [1]: import pandas as pd
   ...: def function(x):
   ...:     return x+1
   ...:
   ...: df = pd.DataFrame({'values':range(10)})
   ...: print(df)
   ...:
   values
0       0
1       1
2       2
3       3
4       4
5       5
6       6
7       7
8       8
9       9

In [2]: print(df.apply(function))
   values
0       1
1       2
2       3
3       4
4       5
5       6
6       7
7       8
8       9
9      10

In your question, cols is the value of each row that you loop over.
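
To see what such a row argument looks like, here is a small sketch. It assumes the function is applied with axis=1 (the original question is not shown here); the frame and the helper name show are made up for illustration:

```python
import pandas as pd

# Hypothetical two-row frame using the column names from the question.
df = pd.DataFrame({'Age': [22.0, 35.0], 'Pclass': [3, 1]})

seen = []  # records what apply hands to the function

def show(cols):
    # With axis=1, `cols` is a Series holding one row;
    # its index labels are the column names.
    seen.append(list(cols.index))
    return cols['Age'] + cols['Pclass']

result = df.apply(show, axis=1)
print(seen[0])
print(result)
```

So inside the function you can look values up by column name, e.g. cols['Age'].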

User

Answer 3 · edited 2024-04-19 20:52:36

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

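When you use the apply method on a pandas DataFrame, the function you pass is called once for each column (or once for each row, depending on the axis parameter, which defaults to 0, the column axis). So your function must take a parameter for the row or column that apply will pass to it.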
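
A quick sketch of the difference between the two axis settings, using a made-up frame (the column names a and b are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# Default axis=0: the function receives each column as a Series.
per_column = df.apply(lambda s: s.sum())

# axis=1: the function receives each row as a Series.
per_row = df.apply(lambda s: s.sum(), axis=1)

print(per_column)  # one result per column, indexed by column name
print(per_row)     # one result per row, indexed like df
```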

def include_mean():
    if pd.isnull('Age'):
        if 'Pclass'==1:
            return 38
        elif 'Pclass'==2:
            return 30
        elif 'Pclass'==3:
            return 25
        else: return 'Age'

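This has several problems.

'Pclass'==1: this is guaranteed to be False, because you are comparing a string ('Pclass') with an integer (1), and they can never be equal. What you want to compare is the value of the Pclass entry of the row, which you can retrieve by indexing: col["Pclass"], or col[1] if Pclass is the second column.
If pd.isnull('Age') is False, the function returns None. Since the string 'Age' is not null, that is always the case. When you execute d.apply(include_mean()), include_mean is called first, returns None, and that value is then passed to apply. But apply expects a callable (e.g. a function).
In the else clause you return the string 'Age'. That would mean your DataFrame ends up with the value 'Age' in some cells.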

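Your second example fixes these problems: the impute age function now takes a parameter for the row (cols), the values of the Age and Pclass columns are looked up and compared, and you pass the function to the apply method without calling it.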
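
Since the second example itself is not reproduced here, the following is only a sketch of what a function with those three fixes might look like. The name fill_age and the sample frame are hypothetical; the replacement ages are the ones from include_mean above:

```python
import numpy as np
import pandas as pd

def fill_age(row):
    # `row` is the Series that apply passes in when axis=1.
    age, pclass = row['Age'], row['Pclass']
    if pd.isnull(age):
        if pclass == 1:
            return 38
        elif pclass == 2:
            return 30
        elif pclass == 3:
            return 25
    # Age present (or Pclass not 1/2/3): return the value itself,
    # not the string 'Age'.
    return age

# Hypothetical sample data.
df = pd.DataFrame({'Age': [50.0, np.nan, np.nan, np.nan],
                   'Pclass': [1, 1, 2, 3]})

# Pass the function itself (no parentheses) and apply it row by row.
df['Age'] = df.apply(fill_age, axis=1)
print(df)
```

For larger frames the .loc or np.select approaches from the first answer will usually be faster, since apply with axis=1 calls the Python function once per row.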
