Python error: len() of unsized object when using statsmodels with a single row of data

Question

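I am using weighted least squares regression (WLS) from statsmodels, and everything works fine when I have many data points. However, when I try to run WLS on a single sample from the dataset, I seem to run into a problem with numpy arrays.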

What I mean is: if I have a dataset X that is a 2D array with many rows, WLS runs fine. But if I try to run it on just one row, it fails. The code below illustrates this:

import sys
from sklearn.externals.six.moves import xrange
from sklearn.metrics import accuracy_score
import pylab as pl
from sklearn.externals.six.moves import zip
import numpy as np
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std

# this is my dataset X, with 10 rows
X = np.array([[1,2,3],[1,2,3],[4,5,6],[1,2,3],[4,5,6],[1,2,3],[1,2,3],[4,5,6],[4,5,6],[1,2,3]])
# this is my response vector, y, also with 10 rows
y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
# weights, 10 rows
weights = np.array([ 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1, 0.1 , 0.1 ])

# the line below, using all 10 rows of X, gives no errors but is commented out
# mod_wls = sm.WLS(y, X, weights)
# and this is the line I need, which is giving errors:
mod_wls = sm.WLS(np.array(y[0]), np.array([X[0]]),np.array([weights[0]]))

The last line above was originally just mod_wls = sm.WLS(y[0], X[0], weights[0])

But that gave errors such as object of type 'numpy.float64' has no len(), so I turned the arguments into arrays. Now I always get this error:

Traceback (most recent call last):
  File "C:\Users\app\Documents\Python Scripts\test.py", line 53, in <module>
    mod_wls = sm.WLS(np.array(y[0]), np.array([X[0]]),np.array([weights[0]]))
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 383, in __init__
    weights=weights, hasconst=hasconst)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\regression\linear_model.py", line 79, in __init__
    super(RegressionModel, self).__init__(endog, exog, **kwargs)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\model.py", line 136, in __init__
    super(LikelihoodModel, self).__init__(endog, exog, **kwargs)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\model.py", line 52, in __init__
    self.data = handle_data(endog, exog, missing, hasconst, **kwargs)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 401, in handle_data
    return klass(endog, exog=exog, missing=missing, hasconst=hasconst, **kwargs)
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 78, in __init__
    self._check_integrity()
  File "C:\Users\app\Anaconda\lib\site-packages\statsmodels\base\data.py", line 249, in _check_integrity
    print len(self.endog)
TypeError: len() of unsized object

To figure out what was wrong with the lengths, I ran this:

print "y size: "
print len(np.array([y[0]]))
print "X size"
print len(np.array([X[0]]))
print "weights size"
print len(np.array([weights[0]]))

and got this output:

y size: 
1
X size
1
weights size
1

Then I tried this:

print "x shape"
print X[0].shape
print "y shape"
print y[0].shape

The output was:

x shape
(3L,)
y shape
()
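The empty shape reported for y[0] is worth a closer look. The following is a minimal sketch, using only numpy and the same y as above, of how np.array(y[0]) differs from np.array([y[0]]). The names scalar_like, one_elem and len_error are illustrative only and do not appear in the original code.

```python
import numpy as np

y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])

# Wrapping a scalar directly yields a 0-dimensional array with shape ().
scalar_like = np.array(y[0])
# Wrapping it in a list first yields a 1-dimensional array with shape (1,).
one_elem = np.array([y[0]])

print(scalar_like.shape, scalar_like.ndim)
print(one_elem.shape, len(one_elem))

# len() is only defined for arrays with at least one dimension,
# so calling it on the 0-d array raises TypeError.
try:
    len(scalar_like)
    len_error = None
except TypeError as exc:
    len_error = str(exc)
print(len_error)
```

If this reading is right, the sm.WLS call passes np.array(y[0]) as endog, while the length check printed np.array([y[0]]), so the two are not the same object.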

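Line 249 of data.py, which the error points to, is inside the following function. I added a bunch of size-printing statements to it to see what was going on: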

def _check_integrity(self):
    if self.exog is not None:
        print "exog size: " 
        print len(self.exog)            
        print "endog size"
        print len(self.endog) # <-- this, and the line below are causing the error
        if len(self.exog) != len(self.endog):
            raise ValueError("endog and exog matrices are different sizes")

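It looks like len(self.endog) is the problem. Yet when I print len(np.array([y[0]])), the output is 1. Somehow, once y goes into the _check_integrity function and becomes endog, it behaves differently... or is something else going on?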

What should I do? The algorithm I am using really does need to run WLS on each row of X separately.
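One possible approach for the per-row case, sketched here under assumptions, is to slice each array with i:i+1 instead of indexing with i. Slicing keeps the leading axis, so every piece still has a length of 1. The helper single_row is hypothetical, and the sketch uses only numpy, so it does not call statsmodels itself.

```python
import numpy as np

X = np.array([[1, 2, 3], [1, 2, 3], [4, 5, 6]])
y = np.array([1, 1, 0])
weights = np.array([0.1, 0.1, 0.1])


def single_row(endog, exog, w, i):
    """Hypothetical helper: return observation i with the leading axis kept."""
    # endog[i] would be a scalar; endog[i:i + 1] is a 1-d array of length 1.
    return endog[i:i + 1], exog[i:i + 1], w[i:i + 1]


for i in range(len(X)):
    endog_i, exog_i, w_i = single_row(y, X, weights, i)
    # All three pieces now support len() and agree on the number of rows.
    print(i, endog_i.shape, exog_i.shape, w_i.shape)
```

The sliced arrays could then be passed on as sm.WLS(endog_i, exog_i, weights=w_i). Note that a single observation with three regressors does not determine the coefficients uniquely, so any resulting fit should be interpreted with care.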

