"Expected 2D array, got 1D array instead" error when computing LSA

Posted 2024-04-20 03:27:23


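I'm writing preprocessing functions for LSA (latent semantic analysis) in natural language processing. All my other functions, such as tfidf and remove_stopwords, pass the unit tests I wrote for them. However, when I test the LSA function, it keeps giving me the following error: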

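"Expected 2D array, got 1D array instead: array=['I ate dinner at Olive Garden' 'we are buying a house' 'I did not eat dinner at Olive Garden' 'our neighbors are buying a house']. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample."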

Here is the code for my LSA function and the test code:

import pandas as pd
import nltk
import string
import sklearn
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.feature_extraction.text import TfidfVectorizer

def LSA(data, tfidf = True, remove_stopwords=True):
    # done with stop word removal and tf-idf weighting keeping the 100 most common concepts
    text = data.iloc[:,-1] #isolate text column
    
     
    #Define the LSA function
    vectors = sklearn.decomposition.TruncatedSVD(n_components = 2, algorithm = 'randomized', n_iter = 100, random_state = 100)

    vectors.fit(text.tolist())
    svd_matrix = vectors.fit_transform(text.tolist())
    svd_matrix = Normalizer(copy=False).fit_transform(text.tolist())

    dense = svd_matrix.todense()
    denselist = dense.tolist()
    
    data["cleaned_vectorized_document"] = denselist
    return data

And here is the test code I'm using that throws the error:

p = pd.DataFrame({'two':[1,2,3,4],'test':['I ate dinner at Olive Garden', 'we are buying a house',
'I did not eat dinner at Olive Garden', 'our neighbors are buying a house']})

print(LSA(p))
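The message comes from scikit-learn's input validation: TruncatedSVD (and Normalizer) expect a 2D numeric matrix, but the function passes the raw text column, which NumPy turns into a 1D array of strings. The small reproduction below, a sketch assuming a reasonably recent scikit-learn, shows the raw strings being rejected while the document-term matrix produced by TfidfVectorizer is accepted. The variable names are illustrative only. NumPy also prints string arrays without commas between items, which is why the array inside the error message looks comma-less.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "I ate dinner at Olive Garden",
    "we are buying a house",
    "I did not eat dinner at Olive Garden",
    "our neighbors are buying a house",
]

svd = TruncatedSVD(n_components=2, random_state=100)

# Raw strings: scikit-learn converts the list to a 1D array of text and rejects it.
try:
    svd.fit_transform(docs)
    raw_error = None
except ValueError as exc:
    raw_error = str(exc)
print(raw_error)

# Vectorizing first gives a 2D sparse matrix of shape (n_documents, n_terms),
# which TruncatedSVD accepts.
tfidf = TfidfVectorizer().fit_transform(docs)
reduced = svd.fit_transform(tfidf)
print(reduced.shape)  # (4, 2)
```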

1 answer
Anonymous user
#1 · Posted 2024-04-20 03:27:23

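I'm not sure whether this is your problem, but your array is missing commas between the items, which would at least raise the following error: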

ValueError: arrays must all be same length
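Some background on why a missing comma leads to a length error: Python concatenates adjacent string literals, so dropping a comma silently merges two items and shortens the list, and pandas then refuses to build a DataFrame from columns of unequal length. The sketch below introduces the missing comma deliberately; the exact wording of the pandas message varies between versions.

```python
import pandas as pd

# No comma after 'we are buying a house': Python joins the two adjacent
# string literals into one item, so this list has 3 items instead of 4.
texts = ['I ate dinner at Olive Garden', 'we are buying a house'
         'I did not eat dinner at Olive Garden', 'our neighbors are buying a house']
print(len(texts))  # 3

try:
    pd.DataFrame({'two': [1, 2, 3, 4], 'test': texts})
    length_error = None
except ValueError as exc:
    length_error = str(exc)
print(length_error)
```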

Try this instead:

p = pd.DataFrame({'two':[1,2,3,4],'test':['I ate dinner at Olive Garden', 'we are buying a house', 'I did not eat dinner at Olive Garden', 'our neighbors are buying a house']})
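Separately from the DataFrame construction, the "Expected 2D array" error itself points at the LSA function handing raw text to TruncatedSVD and Normalizer. Below is a minimal sketch of a version that vectorizes first. It assumes scikit-learn's TfidfVectorizer is an acceptable stand-in for the question's own tfidf and remove_stopwords helpers (which aren't shown), and the extra n_components parameter is a hypothetical addition for illustration. Note that TruncatedSVD.fit_transform already returns a dense NumPy array, so no .todense() call is needed.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import Normalizer


def LSA(data, tfidf=True, remove_stopwords=True, n_components=2):
    """Return a copy of `data` with an LSA vector for each document in its last column."""
    text = data.iloc[:, -1].astype(str)  # isolate the text column

    # Step 1: turn the raw strings into a 2D numeric document-term matrix.
    # use_idf=False falls back to plain (L2-normalized) term frequencies.
    vectorizer = TfidfVectorizer(
        stop_words="english" if remove_stopwords else None,
        use_idf=tfidf,
    )
    doc_term = vectorizer.fit_transform(text)

    # Step 2: project the documents onto `n_components` latent concepts.
    svd = TruncatedSVD(n_components=n_components, algorithm="randomized",
                       n_iter=100, random_state=100)
    svd_matrix = svd.fit_transform(doc_term)  # dense array, shape (n_docs, n_components)

    # Step 3: scale each document vector (row) to unit L2 length.
    svd_matrix = Normalizer(copy=False).fit_transform(svd_matrix)

    data = data.copy()  # avoid modifying the caller's DataFrame
    data["cleaned_vectorized_document"] = svd_matrix.tolist()
    return data


p = pd.DataFrame({'two': [1, 2, 3, 4],
                  'test': ['I ate dinner at Olive Garden', 'we are buying a house',
                           'I did not eat dinner at Olive Garden',
                           'our neighbors are buying a house']})
result = LSA(p)
print(result)
```

Returning a copy is a design choice; drop the data.copy() line if the original in-place behavior is preferred.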
