TypeError: float() argument must be a string or a number, not 'function' - Python/Sklearn

Posted 2024-05-23 19:45:16


Below is a code snippet from a program named Flights.py:

...
#Load the dataset
df = dataset
df.isnull().any()
df = df.fillna(lambda x: x.median())

# Define X and Y
X = df.iloc[:, 2:124].values
y = df.iloc[:, 136].values
X_tolist = X.tolist()

# Splitting the dataset into the Training set and Test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

The second-to-last line raises the following error:

Traceback (most recent call last):

  File "<ipython-input-14-d4add2ccf5ab>", line 3, in <module>
    X_train = sc.fit_transform(X_train)

  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/base.py", line 494, in fit_transform
    return self.fit(X, **fit_params).transform(X)

  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 560, in fit
    return self.partial_fit(X, y)

  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 583, in partial_fit
    estimator=self, dtype=FLOAT_DTYPES)

  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)

TypeError: float() argument must be a string or a number, not 'function'

My DataFrame df has shape (22587, 138).

I looked at the following question for inspiration:

TypeError: float() argument must be a string or a number, not 'method' in Geocoder

I tried the following adjustment:

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train.as_matrix)
X_test = sc.transform(X_test.as_matrix)

which results in the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'as_matrix'

I am currently at a loss as to how to scan the DataFrame and find or convert the offending entries.


3 Answers
df = df.fillna(lambda x: x.median())

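This is not a valid way to use fillna. It expects a literal value, or a mapping from column to literal value. It does not apply the function you provide; instead, every NA cell is simply set to the function object itself. That function is what your estimator then tries to convert to a float.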

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
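To address the "how do I find the offending entries" part of the question, here is a minimal sketch on a made-up two-column frame (the data and the names bad, is_func, restored and fixed are illustrative, not from the original program). It assumes fillna stores the lambda in each NaN cell, as described above; it then builds a boolean mask of cells holding a callable, turns those cells back into NaN, and fills them with the column medians.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the real dataset.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# fillna does not call the lambda; it stores it in every NaN cell.
bad = df.fillna(lambda x: x.median())

# Boolean mask: True wherever a cell holds a callable instead of a number.
is_func = bad.apply(lambda col: col.map(lambda v: callable(v)))

print(is_func.sum())              # offending cells per column
print(bad[is_func.any(axis=1)])   # rows with at least one function

# Turn the offending cells back into NaN and restore a float dtype.
restored = bad.mask(is_func).astype(float)

# Fill with the per-column medians (a Series, not a function).
fixed = restored.fillna(restored.median())
print(fixed)
```

If the original frame is still available, it is simpler to skip the repair step and call fillna with the medians directly.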

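As noted in a related answer, fillna is not designed to work with a callback. If you pass one, it is treated as a literal fill value, which means your NaNs are replaced with the lambda itself: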

df

      col1  col2  col3  col4
row1  65.0    24  47.0   NaN
row2  33.0    48   NaN  89.0
row3   NaN    34  67.0   NaN
row4  24.0    12  52.0  17.0

df.fillna(lambda x: x.median())

                                    col1  col2  \
row1                                  65    24   
row2                                  33    48   
row3  <function <lambda> at 0x10bc47730>    34   
row4                                  24    12   

                                    col3                                col4  
row1                                  47  <function <lambda> at 0x10bc47730>  
row2  <function <lambda> at 0x10bc47730>                                  89  
row3                                  67  <function <lambda> at 0x10bc47730>  
row4                                  52                                  17 

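If you are trying to fill with the median, the solution is to compute the per-column medians (df.median() returns a Series indexed by column) and pass that to fillna: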

df
      col1  col2  col3  col4
row1  65.0    24  47.0   NaN
row2  33.0    48   NaN  89.0
row3   NaN    34  67.0   NaN
row4  24.0    12  52.0  17.0

df = df.fillna(df.median())  # fillna returns a new frame, so assign it back
df
      col1  col2  col3  col4
row1  65.0    24  47.0  53.0
row2  33.0    48  52.0  89.0
row3  33.0    34  67.0  53.0
row4  24.0    12  52.0  17.0
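The example above can be reproduced with the short sketch below. The frame is rebuilt from the values shown in this answer; the names medians, filled and X are illustrative. It also checks that the filled frame converts cleanly to a float array, which is the conversion that failed in the question's traceback.

```python
import numpy as np
import pandas as pd

# Same values as the example frame in this answer.
df = pd.DataFrame(
    {
        "col1": [65.0, 33.0, np.nan, 24.0],
        "col2": [24, 48, 34, 12],
        "col3": [47.0, np.nan, 67.0, 52.0],
        "col4": [np.nan, 89.0, np.nan, 17.0],
    },
    index=["row1", "row2", "row3", "row4"],
)

medians = df.median()         # one median per column, NaN values skipped
filled = df.fillna(medians)   # new frame; df itself is left unchanged

# Every cell is now a real number, so a float conversion succeeds.
X = filled.to_numpy(dtype=float)

print(medians)
print(filled)
print(X.dtype, X.shape)
```

Because fillna returns a new object, the original df still contains its NaNs unless the result is assigned back.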

I ran into the same problem when using df = df.fillna(lambda x: x.median()). My solution was to put real values, rather than the "function", into the DataFrame:

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

I create a DataFrame with 10 rows and 3 columns that contains some NaN values:

df = pd.DataFrame(np.random.randint(100, size=(10, 3)).astype(float))  # float dtype so NaN can be assigned
df.iloc[3:5,0] = np.nan
df.iloc[4:6,1] = np.nan
df.iloc[5:8,2] = np.nan

Assign silly column labels for later use:

df.columns=['Number_of_Holy_Hand_Grenades_of_Antioch', 'Number_of_knight_fleeings', 'Number_of_rabbits_of_Caerbannog']

print(df.isnull().any())  # True for each column that contains NaN

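For each column, accessed by its label, we fill all NaN values with the median computed from that column. The same approach works with mean() and similar statistics.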

for i in df.columns:     # use df.columns[w:] if the first w columns are descriptive rather than numeric
    df[i] = df[i].fillna(df[i].median())
print(df.isnull().any())

Now df has its NaN values replaced by the column medians:

print(df)

For example, you can then do:

X = df.to_numpy()  # .ix was removed in pandas 1.0; df.values also works
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

This did not work with df = df.fillna(lambda x: x.median()). We can now pass df on to later steps because every value is a real number rather than a function. That is the opposite of what happens with any approach that passes a lambda to DataFrame.fillna().
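When the frame also has descriptive, non-numeric columns, one possible variant of the loop above is to select the numeric columns by dtype instead of by position. This is a sketch on made-up data; the column names label, x and y and the variable num_cols are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up frame with one descriptive column and two numeric columns.
df = pd.DataFrame(
    {
        "label": ["a", "b", "c", "d"],
        "x": [1.0, np.nan, 3.0, 5.0],
        "y": [np.nan, 10.0, 20.0, np.nan],
    }
)

# Pick the numeric columns by dtype rather than by position.
num_cols = df.select_dtypes(include="number").columns

# Fill NaN in those columns only, using each column's own median.
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

print(df)
print(df.isnull().any())
```

The descriptive column is left untouched, so it would still need to be dropped or encoded before the data goes to StandardScaler.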
