Inconsistent output of np.std when passed as a function argument

1. The dataset:

First, take the iris dataset as an example. I scale one column of the data (column index 1) as follows.

from sklearn import datasets
import numpy as np
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X_train = iris.data[:, [1]]  # X_train is column index 1 (the second column) of the iris data
sc = StandardScaler()
sc.fit(X_train)  # fit StandardScaler to X_train (computes its mean and scale)

2. The problem: without changing the default ddof = 0, I get different results from np.std!

import sys

import pandas as pd
import sklearn  # needed for sklearn.__version__ below

print("The mean and std(sample std) of X_train is :")
print(pd.DataFrame(X_train).apply([np.mean, np.std], axis=0), "\n")

print("The std(population std) of X_train is :")
print(pd.DataFrame(X_train).apply(np.std, axis=0), "\n")

print("The std(population std) of X_train is :", "{0:.6f}".format(sc.scale_[0]), "\n")

print("Python version:", sys.version,
      "\npandas version:", pd.__version__,
      "\nsklearn version:", sklearn.__version__)

Output:

The mean and std(sample std) of X_train is :
             0
mean  3.057333
std   0.435866

The std(population std) of X_train is :
0    0.434411
dtype: float64

The std(population std) of X_train is : 0.434411

Python version: 3.7.1 (default, Dec 10 2018, 22:54:23) [MSC v.1915 64 bit (AMD64)]
pandas version: 0.23.4
sklearn version: 0.20.1

From the results above, pd.DataFrame(X_train).apply([np.mean,np.std],axis = 0) gives the sample std 0.435866, while pd.DataFrame(X_train).apply(np.std,axis = 0) gives the population std 0.434411.
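
For reference, the two values differ only in the divisor: the population std divides the sum of squared deviations by N (ddof = 0), the sample std by N - 1 (ddof = 1). The sketch below computes both by hand on a small made-up array (the array `x` is an assumption for illustration, not the iris data) and then checks that the two numbers printed in the question are related by the factor sqrt(N / (N - 1)) with N = 150, the number of rows in iris.

```python
import numpy as np

# Made-up sample values, used only to illustrate the two formulas.
x = np.array([4.0, 5.0, 6.0, 9.0])
n = x.size

# Sum of squared deviations from the mean.
ss = ((x - x.mean()) ** 2).sum()

pop_std = np.sqrt(ss / n)           # divisor N      (ddof=0, np.std default)
sample_std = np.sqrt(ss / (n - 1))  # divisor N - 1  (ddof=1)

print(pop_std, np.std(x))
print(sample_std, np.std(x, ddof=1))

# The two values reported in the question (rounded to 6 decimals).
reported_pop, reported_sample = 0.434411, 0.435866
rescaled = reported_pop * np.sqrt(150 / 149)  # iris has 150 rows
print(rescaled, reported_sample)
```

So the two numbers in the question describe the same data and differ only in the ddof that was effectively used.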

2 Answers

User

#1 · edited 2024-04-19 09:36:47

Can you replace the following

print(pd.DataFrame(X_train).apply(np.std,axis = 0),"\n")

with this:

print(pd.DataFrame(X_train).apply([np.std],axis = 0),"\n")

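User

#2 · edited 2024-04-19 09:36:47

The reason for this behavior lies in the (perhaps inelegant) way .apply() is evaluated on a Series. If you have a look at the source code, you will find the following lines: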

if isinstance(func, (list, dict)):
    return self.aggregate(func, *args, **kwds)

This means: if you call apply([func]), the result can differ from apply(func)! As for np.std, I suggest using the built-in df.std() method or df.describe().
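
To make the two code paths concrete, here is a toy model of that dispatch. `toy_apply` is a hypothetical helper written for this illustration; it is not pandas code and only mimics the `isinstance(func, (list, dict))` branch quoted above. A list is routed to a reduction over the whole sequence, while a single callable is called on each element.

```python
import numpy as np

def toy_apply(values, func):
    """Hypothetical stand-in for the dispatch described above; not pandas code."""
    if isinstance(func, (list, dict)):
        # List or dict input: each function reduces the whole sequence once.
        funcs = func.values() if isinstance(func, dict) else func
        return {f.__name__: float(f(values)) for f in funcs}
    # Single callable: the function is called on every element separately.
    return [float(func(v)) for v in values]

data = [4, 5]
elementwise = toy_apply(data, np.std)   # std of a single number is 0.0
aggregated = toy_apply(data, [np.std])  # std over the whole list

print(elementwise)
print(aggregated)
```

Note that this toy calls plain np.std in both branches, so the aggregated value uses ddof = 0. In the pandas version from the question, the list path returned the ddof = 1 value instead, as the output there shows; the sketch only illustrates that the two call forms take different routes.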

You can try the following code to see what works and what doesn't:

import numpy as np
import pandas as pd

print(10*"-","Showing ddof impact",10*"-")

print(np.std([4,5], ddof=0)) # 0.5      ## N   (population's standard deviation)
print(np.std([4,5], ddof=1)) # 0.707... ## N-1 (sample standard deviation)

x = pd.Series([4,5])

print(10*"-","calling builtin .std() on Series",10*"-")
print(x.std(ddof=0)) # 0.5
print(x.std()) # 0.707

df=pd.DataFrame([[4,5],[5,6]], columns=['A', 'B'])

print(10*"-","calling builtin .std() on DF",10*"-")

print(df["A"].std(ddof=0))# 0.5
print(df["B"].std(ddof=0))# 0.5
print(df["A"].std())# 0.707
print(df["B"].std())# 0.707

print(10*"-","applying np.std to whole DF",10*"-")
print(df.apply(np.std,ddof=0)) # A = 0.5,  B = 0.5
print(df.apply(np.std,ddof=1)) # A = 0.707 B = 0.707

# print(10*"-","applying [np.std] to whole DF won't work",10*"-")
# print(df.apply([np.std],axis=0,ddof=0)) ## this won't work
# print(df.apply([np.std],axis=0,ddof=1)) ## this won't work

print(10*"-","applying [np.std] to DF columns",10*"-")
print(df["A"].apply([np.std])) # 0.707
print(df["A"].apply([np.std],ddof=1)) # 0.707

print(10*"-","applying np.std to DF columns",10*"-")
print(df["A"].apply(np.std)) # 0: 0, 1: 0  WHOOPS!! np.std is applied to each element separately
print(30*"-")

You can also pass your own function to apply to see what is going on:

def myFun(a):
    print(type(a))
    return np.std(a,ddof=0)

print("> 0",20*"-")    
print(x.apply(myFun))
print("> 1",20*"-","## <- only this will be applied to the Series!")
print(df.apply(myFun))
print("> 2",20*"-","## <- this will be applied to each Int!")
print(df.apply([myFun]))
print("> 3",20*"-")
print(df["A"].apply(myFun))
print("> 4",20*"-")
print(df["A"].apply([myFun]))
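
As a short sketch of the recommendation to use the built-in methods: passing ddof explicitly to df.std() removes the ambiguity about which definition is used. The DataFrame below reuses the small example values from the code above; the variable names are chosen for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [4, 5], "B": [5, 6]})

sample_std = df.std()            # pandas default: ddof=1
population_std = df.std(ddof=0)  # explicit ddof=0, same definition as np.std's default

print(sample_std)
print(population_std)

# describe() reports the ddof=1 (sample) standard deviation in its "std" row.
described_std = df.describe().loc["std"]
print(described_std)

# Cross-check against NumPy on the raw values of one column.
numpy_pop = np.std(df["A"].to_numpy())
print(numpy_pop)
```

Stating ddof at the call site keeps the result independent of how a particular apply or aggregate path happens to resolve np.std.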
