CountVectorizer: AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Asked on 2025-04-27 13:45

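I have a one-dimensional array in which each element is a long string. I want to use scikit-learn's CountVectorizer to turn this text data into numeric vectors, but I get the following error: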

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Each element of mealarray is a long string, and there are 5,000 samples in total. I tried to vectorize them like this:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # ngram_range=(1, 1) is the default
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)

Full traceback:

File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'

4 Answers

I ran into the same error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

To fix it, I did the following:

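1. First, check the array's shape with: name_of_array1.shape
2. If the output is (n, 1), flatten the 2-D array into a 1-D array with flatten(): flat_array = name_of_array1.flatten()
3. Now CountVectorizer() works, because it expects a flat sequence of strings, one string per document.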
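The three steps above can be sketched as follows. This is a minimal illustration, assuming NumPy and scikit-learn are installed; the array contents are made-up sample strings standing in for name_of_array1, and stop_words='english' simply mirrors the question's settings.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for name_of_array1: three documents stored as a
# column vector, i.e. shape (3, 1) instead of (3,).
name_of_array1 = np.array([
    ["chicken rice with green beans"],
    ["beef stew with carrots"],
    ["rice pudding with cinnamon"],
])

# Step 1: check the shape.
print(name_of_array1.shape)  # (3, 1)

# Step 2: flatten the (n, 1) array into a 1-D array of strings.
flat_array = name_of_array1.flatten()
print(flat_array.shape)  # (3,)

# Step 3: every element is now a string, so CountVectorizer can lowercase
# and tokenize each document.
vectorizer = CountVectorizer(stop_words="english")
data = vectorizer.fit_transform(flat_array)
print(data.shape[0])  # 3, one row per document
```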

Answered on 2025-04-27 by Python大师

A better solution is to explicitly select a pandas Series and pass that to CountVectorizer():

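>>> from sklearn.feature_extraction.text import CountVectorizer
>>> count_vect = CountVectorizer()
>>> tex = df4['Text']
>>> type(tex)
<class 'pandas.core.series.Series'>
>>> X_train_counts = count_vect.fit_transform(tex)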

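The following does not work, because it returns a DataFrame rather than a Series (iterating over a DataFrame yields its column labels, not its rows):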

>>> tex2 = df4.iloc[0:, [11]]
>>> type(tex2)
<class 'pandas.core.frame.DataFrame'>
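A minimal sketch of the Series/DataFrame difference, assuming pandas and scikit-learn are installed. Here df4 is a made-up three-row frame with a single "Text" column, so the positional column 11 from the answer is replaced by the label "Text". The squeeze(axis=1) conversion at the end is one possible way to turn a one-column DataFrame back into a Series; it is not part of the original answer.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for df4 with one text column named "Text".
df4 = pd.DataFrame({"Text": ["grilled salmon and rice", "tofu stir fry", "salmon salad"]})

tex = df4["Text"]     # a single label selects a Series
tex2 = df4[["Text"]]  # a list of labels selects a one-column DataFrame

# Iterating a Series yields its values (the documents)...
print(list(tex))   # ['grilled salmon and rice', 'tofu stir fry', 'salmon salad']
# ...but iterating a DataFrame yields its column labels.
print(list(tex2))  # ['Text']

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(tex)
print(X_train_counts.shape[0])  # 3, one row per document

# A one-column DataFrame can be squeezed back into a Series first.
X_from_frame = count_vect.fit_transform(tex2.squeeze(axis=1))
print(X_from_frame.shape[0])  # 3
```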

Answered on 2025-04-27 by Python大师

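Check the shape of mealarray. If the argument you pass to fit_transform is an array of strings, it must be one-dimensional, i.e. mealarray.shape should be of the form (n,). If mealarray has shape (n, 1), for example, iterating over it yields length-1 arrays instead of strings, and you get the "has no attribute 'lower'" error.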

You could try something like:

data = vectorizer.fit_transform(mealarray.ravel())
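The traceback in the question ends in strip_accents(x.lower()), so every document CountVectorizer receives has to be a string. The NumPy-only sketch below uses a made-up two-row array in place of mealarray to show what iterating an (n, 1) array actually yields and what ravel() changes.

```python
import numpy as np

# Hypothetical (n, 1) array of strings, like a mis-shaped mealarray.
mealarray = np.array([["pasta with tomato sauce"], ["lentil soup"]])

# Iterating a 2-D array yields its rows, each a length-1 ndarray,
# and ndarray has no .lower() method.
first = next(iter(mealarray))
print(type(first).__name__, hasattr(first, "lower"))  # ndarray False

# ravel() returns a 1-D array whose elements are strings.
flat = mealarray.ravel()
print(flat.shape)                  # (2,)
print(hasattr(flat[0], "lower"))   # True
```

ravel() returns a view when it can, while flatten() from the first answer always returns a copy; either one produces the (n,) shape that fit_transform expects.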

Answered on 2025-04-27 by Python大师

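I found the answer to my own question. In short, CountVectorizer takes a list of strings (the text contents) as its argument, not an array. That solved my problem.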
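If you take this route with an (n, 1) array, note that calling .tolist() directly gives a list of one-element lists, and a list has no .lower() method either. A minimal sketch, using a made-up array in place of mealarray, of flattening before converting:

```python
import numpy as np

# Hypothetical (n, 1) array standing in for mealarray.
mealarray = np.array([["pasta with tomato sauce"], ["lentil soup"]])

# tolist() on a 2-D array keeps the nesting.
nested = mealarray.tolist()
print(nested)  # [['pasta with tomato sauce'], ['lentil soup']]

# Flatten first to get a plain list of Python strings.
docs = mealarray.ravel().tolist()
print(docs)  # ['pasta with tomato sauce', 'lentil soup']
```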

Answered on 2025-04-27 by Python大师
