Numpy hstack-“ValueError:所有输入数组必须具有相同的维数”-但是它们问题的回答

Numpy hstack-“ValueError:所有输入数组必须具有相同的维数”-但是它们

回答此问题可获得 20 贡献值，回答如果被采纳可获得 50 分。

我正在尝试加入两个numpy数组。在其中一个例子中，我在一列文本上运行TF-IDF之后拥有一组列/特性。另一个是一个列/特性，它是一个整数。所以我读了一列训练和测试数据，在上面运行TF-IDF，然后我想添加另一个整数列，因为我认为这将帮助我的分类器更准确地了解它的行为。 不幸的是，当我试图运行<code>hstack</code>将这个列添加到另一个numpy数组时，标题中出现了错误。 这是我的代码： <pre><code> #reading in test/train data for TF-IDF traindata = list(np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,2]) testdata = list(np.array(p.read_csv('FinalTestCSVFin.csv', delimiter=";"))[:,2]) #reading in labels for training y = np.array(p.read_csv('FinalCSVFin.csv', delimiter=";"))[:,-2] #reading in single integer column to join AlexaTrainData = p.read_csv('FinalCSVFin.csv', delimiter=";")[["alexarank"]] AlexaTestData = p.read_csv('FinalTestCSVFin.csv', delimiter=";")[["alexarank"]] AllAlexaAndGoogleInfo = AlexaTestData.<a href="https://www.cnpython.com/list/append" class="inner-link">append</a>(AlexaTrainData) tfv = TfidfVectorizer(min_df=3, max_features=None, strip_accents='unicode', analyzer='word',token_pattern=r'\w{1,}',ngram_range=(1, 2), use_idf=1,smooth_idf=1,sublinear_tf=1) #tf-idf object rd = lm.LogisticRegression(penalty='l2', dual=True, tol=0.0001, C=1, fit_intercept=True, intercept_scaling=1.0, class_weight=None, random_state=None) #Classifier X_all = traindata + testdata #adding test and train data to put into tf-idf lentrain = len(traindata) #find length of train data tfv.fit(X_all) #fit tf-idf on all our text X_all = tfv.transform(X_all) #transform it X = X_all[:lentrain] #reduce to size of training set AllAlexaAndGoogleInfo = AllAlexaAndGoogleInfo[:lentrain] #reduce to size of training set X_test = X_all[lentrain:] #reduce to size of training set #printing debug info, output below : print "X.shape => " + str(X.shape) print "AllAlexaAndGoogleInfo.shape => " + str(AllAlexaAndGoogleInfo.shape) print "X_all.shape => " + str(X_all.shape) #line we get error on X = np.hstack((X, AllAlexaAndGoogleInfo)) </code></pre> 以下是输出和错误消息： <pre><code>X.shape => (7395, 238377) AllAlexaAndGoogleInfo.shape => (7395, 1) X_all.shape => (10566, 238377) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-12-2b310887b5e4> in <module>() 31 print "X_all.shape => " + str(X_all.shape) 32 #X = np.column_stack((X, AllAlexaAndGoogleInfo)) ---> 33 X = np.hstack((X, AllAlexaAndGoogleInfo)) 34 sc = preprocessing.StandardScaler().fit(X) 35 X = sc.transform(X) C:\Users\Simon\Anaconda\lib\site-packages\numpy\core\shape_base.pyc in hstack(tup) 271 # As a special case, dimension 0 of 1-dimensional arrays is "horizontal" 272 if arrs[0].ndim == 1: --> 273 return _nx.concatenate(arrs, 0) 274 else: 275 return _nx.concatenate(arrs, 1) ValueError: all the input arrays must have same number of dimensions </code></pre> 是什么导致了我的问题？我该怎么解决？据我所知，我应该能够加入这些列？我误解了什么？ 谢谢你。 编辑： 使用下面答案中的方法将得到以下错误： <pre><code>--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-16-640ef6dd335d> in <module>() ---> 36 X = np.column_stack((X, AllAlexaAndGoogleInfo)) 37 sc = preprocessing.StandardScaler().fit(X) 38 X = sc.transform(X) C:\Users\Simon\Anaconda\lib\site-packages\numpy\lib\shape_base.pyc in column_stack(tup) 294 arr = array(arr,copy=False,subok=True,ndmin=2).T 295 arrays.append(arr) --> 296 return _nx.concatenate(arrays,1) 297 298 def dstack(tup): ValueError: all the input array dimensions except for the concatenation axis must match exactly </code></pre> 有趣的是，我试图打印X的<code>dtype</code>，结果很好： <pre><code>X.dtype => float64 </code></pre> 但是，尝试按如下方式打印<code>AllAlexaAndGoogleInfo</code>的数据类型： <pre><code>print "AllAlexaAndGoogleInfo.dtype => " + str(AllAlexaAndGoogleInfo.dtype) </code></pre> 产生： <pre><code>'DataFrame' object has no attribute 'dtype' </code></pre>

0 条评论
分类：Python问答

默认排序时间排序

1 个回答

匿名 1天前

　擅长：python、mysql、java

Numpy hstack-“ValueError:所有输入数组必须具有相同的维数”-但是它们

1 个回答

相关Python问题