Numpy hstack - "ValueError: all the input arrays must have same number of dimensions" - but they do

#reading in test/train data for TF-IDF
traindata = list(np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,2])
testdata = list(np.array(p.read_csv('FinalTestCSVFin.csv', delimiter=";"))[:,2])

#reading in labels for training
y = np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,-2]

#reading in single integer column to join
AlexaTrainData = p.read_csv('FinalCSVFin.csv', delimiter=";")[["alexarank"]]
AlexaTestData = p.read_csv('FinalTestCSVFin.csv', delimiter=";")[["alexarank"]]
AllAlexaAndGoogleInfo = AlexaTestData.append(AlexaTrainData)

tfv = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode', analyzer='word',token_pattern=r'\w{1,}',ngram_range=(1, 2), use_idf=1,smooth_idf=1,sublinear_tf=1) #tf-idf object
rd = lm.LogisticRegression(penalty='l2', dual=True, tol=0.0001, C=1, fit_intercept=True, intercept_scaling=1.0, class_weight=None, random_state=None) #Classifier
X_all = traindata + testdata #adding test and train data to put into tf-idf
lentrain = len(traindata) #find length of train data
tfv.fit(X_all) #fit tf-idf on all our text
X_all = tfv.transform(X_all) #transform it
X = X_all[:lentrain] #reduce to size of training set
AllAlexaAndGoogleInfo = AllAlexaAndGoogleInfo[:lentrain] #reduce to size of training set
X_test = X_all[lentrain:] #reduce to size of training set

#printing debug info, output below :
print "X.shape => " + str(X.shape)
print "AllAlexaAndGoogleInfo.shape => " + str(AllAlexaAndGoogleInfo.shape)
print "X_all.shape => " + str(X_all.shape)

#line we get error on
X = np.hstack((X, AllAlexaAndGoogleInfo))

X.shape => (7395, 238377)
AllAlexaAndGoogleInfo.shape => (7395, 1)
X_all.shape => (10566, 238377)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-2b310887b5e4> in <module>()
     31 print "X_all.shape => " + str(X_all.shape)
     32 #X = np.column_stack((X, AllAlexaAndGoogleInfo))
---> 33 X = np.hstack((X, AllAlexaAndGoogleInfo))
     34 sc = preprocessing.StandardScaler().fit(X)
     35 X = sc.transform(X)

C:\Users\Simon\Anaconda\lib\site-packages\numpy\core\shape_base.pyc in hstack(tup)
    271     # As a special case, dimension 0 of 1-dimensional arrays is "horizontal"
    272     if arrs[0].ndim == 1:
--> 273         return _nx.concatenate(arrs, 0)
    274     else:
    275         return _nx.concatenate(arrs, 1)

ValueError: all the input arrays must have same number of dimensions

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-640ef6dd335d> in <module>()
---> 36 X = np.column_stack((X, AllAlexaAndGoogleInfo))
     37 sc = preprocessing.StandardScaler().fit(X)
     38 X = sc.transform(X)

C:\Users\Simon\Anaconda\lib\site-packages\numpy\lib\shape_base.pyc in column_stack(tup)
    294         arr = array(arr,copy=False,subok=True,ndmin=2).T
    295     arrays.append(arr)
--> 296     return _nx.concatenate(arrays,1)
    297
    298 def dstack(tup):

ValueError: all the input array dimensions except for the concatenation axis must match exactly

3 Answers

User

Answer 1 · edited 2024-05-14 00:39:44

Use .column_stack, like this:

X = np.column_stack((X, AllAlexaAndGoogleInfo))

From the docs:

Take a sequence of 1-D arrays and stack them as columns to make a single 2-D array. 2-D arrays are stacked as-is, just like with hstack.
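To make the quoted behavior concrete, here is a small sketch with dense arrays only; the array names and values are made up for illustration and are written for Python 3. Note that the question's second traceback shows np.column_stack failing on the actual data, so this covers plain NumPy inputs, not the sparse X from the question.

```python
import numpy as np

# Hypothetical dense inputs: one 1-D array and one 2-D array with 3 rows each
a = np.array([1, 2, 3])                  # shape (3,)
b = np.array([[4, 5], [6, 7], [8, 9]])   # shape (3, 2)

# The 1-D array becomes a column; the 2-D array is stacked as-is
stacked = np.column_stack((a, b))
print(stacked.shape)
```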

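User

Answer 2 · edited 2024-05-14 00:39:44

Since X is a sparse array, use scipy.sparse.hstack to join the arrays instead of numpy.hstack. In my opinion, the error message is somewhat misleading here.

This minimal example illustrates the situation: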

import numpy as np
from scipy import sparse

X = sparse.rand(10, 10000)
xt = np.random.random((10, 1))
print 'X shape:', X.shape
print 'xt shape:', xt.shape
print 'Stacked shape:', np.hstack((X,xt)).shape
#print 'Stacked shape:', sparse.hstack((X,xt)).shape #This works

Given the following output

X shape: (10, 10000)
xt shape: (10, 1)

one might expect the hstack call on the following line to work, but in fact it raises this error:

ValueError: all the input arrays must have same number of dimensions

So, whenever one of the arrays to stack is sparse, use scipy.sparse.hstack.
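As a rough Python 3 sketch of that fix, the snippet below stacks a sparse matrix with a dense single column. The random matrices are stand-ins (assumptions) for the TF-IDF output and the extra column in the question.

```python
import numpy as np
from scipy import sparse

# Sparse feature matrix, standing in for the TF-IDF output (assumed data)
X = sparse.random(10, 10000, density=0.01, format="csr", random_state=0)

# Dense single-column feature, standing in for the extra integer column
xt = np.random.default_rng(0).random((10, 1))

# scipy.sparse.hstack accepts a mix of sparse and dense blocks
stacked = sparse.hstack((X, xt), format="csr")
print("Stacked shape:", stacked.shape)
```

The result stays sparse, which matters for a matrix with hundreds of thousands of columns like the one in the question.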

In fact, I already answered this in a comment on your other question, where you mentioned that a different error message came up:

TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))

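First, AllAlexaAndGoogleInfo has no dtype of its own, because it is a DataFrame. To get its underlying numpy array, use AllAlexaAndGoogleInfo.values, then check that array's dtype. According to the error message, the dtype is object, which means the column probably contains non-numeric elements such as strings.

Here is a minimal example that reproduces the situation: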

X = sparse.rand(100, 10000)
xt = np.random.random((100, 1))
xt = xt.astype('object') # Comment this to fix the error
print 'X:', X.shape, X.dtype
print 'xt:', xt.shape, xt.dtype
print 'Stacked shape:', sparse.hstack((X,xt)).shape

Error message:

TypeError: no supported conversion for types: (dtype('float64'), dtype('O'))

So, before stacking, check whether AllAlexaAndGoogleInfo contains any non-numeric values and fix them.
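One possible way to do that check and repair is sketched below in Python 3. The DataFrame contents, the "unknown" entry, and the choice to fill bad values with 0 are all assumptions for illustration; the real column may need a different cleanup.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical column with one non-numeric entry, which forces dtype object
df = pd.DataFrame({"alexarank": [12, 340, "unknown", 7]})
print(df["alexarank"].dtype)

# Coerce to numbers; entries that cannot be parsed become NaN,
# which are then replaced with a placeholder value (0 is an arbitrary choice)
col = pd.to_numeric(df["alexarank"], errors="coerce").fillna(0)
extra = col.to_numpy(dtype=np.float64).reshape(-1, 1)

# With a numeric dtype, the sparse stack no longer raises the TypeError
X = sparse.random(4, 50, density=0.1, format="csr", random_state=0)
stacked = sparse.hstack((X, extra), format="csr")
print(stacked.shape)
```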

User

Answer 3 · edited 2024-05-14 00:39:44

Try:

X = np.hstack((X, AllAlexaAndGoogleInfo.values))

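I don't have a working pandas installation, so I can't test this. However, the DataFrame documentation describes values as the "Numpy representation of NDFrame". np.hstack is a numpy function, so it knows nothing about the internal structure of a DataFrame.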
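A small Python 3 sketch of what .values returns, using made-up data: the single-column DataFrame becomes a 2-D ndarray, which np.hstack can join with another dense array. The left-hand array here is dense by assumption; with a sparse X, as in the question, the previous answer's scipy.sparse.hstack is still needed.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column DataFrame, like the "alexarank" selection
df = pd.DataFrame({"alexarank": [12, 340, 7]})

arr = df.values              # plain 2-D ndarray with shape (3, 1)
dense = np.zeros((3, 2))     # stand-in dense feature matrix (assumed)

stacked = np.hstack((dense, arr))
print(type(arr).__name__, arr.shape, stacked.shape)
```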
