PCA and RandomizedLasso errors

Posted 2024-04-24 22:13:52


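I have two .csv files containing tweets and a classification for each tweet: pos, neg, or neutral. The class column holds the classification and the text column holds the tweet.

Here is my code: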

def prediction():
    print("Reading files...")

    #Will learn from this data set.
    train = file2SentencesArray('twitter-sanders-apple3')

    #Test dataset.
    test = file2SentencesArray('twitter-sanders-apple2')
    print("Complete!")

    print("Cleaning sentences...")
    #cleanSenteces will remove html, stop words and some characters.
    cleanTrainSentences = cleanSentences(train["text"])
    cleanTestSentences = cleanSentences(test["text"])
    print("Complete!...")

    print("Fiting sentences...")
    vectorizer = CountVectorizer(analyzer="word", tokenizer=None, preprocessor=None, stop_words=None, max_features=5000)
    trainDataFeatures = vectorizer.fit_transform(cleanTrainSentences)
    np.asarray(trainDataFeatures)

    testDataFeatures = vectorizer.transform(cleanTestSentences)
    np.asarray(testDataFeatures)

    #Getting error here.
    randomized_lasso = RandomizedLasso()
    randomized_lasso.fit_transform(trainDataFeatures, testDataFeatures)
    trainDataFeatures = randomized_lasso.transform(trainDataFeatures)

    #and here.
    #pca = decomposition.PCA(n_components=2)
    #pca.fit_transform(trainDataFeatures)
    #trainDataFeatures = pca.transform(trainDataFeatures)
    print("Complete!")

    print("Predicting...")
    forest = RandomForestClassifier(n_estimators=100)
    forest = forest.fit(trainDataFeatures, train["class"])
    result = forest.predict(testDataFeatures)
    print("Complete...")

    return result

Both RandomizedLasso and PCA raise exceptions:

PCA: PCA does not support sparse input.

RandomizedLasso: bad input shape

My trainDataFeatures looks like this:

(0, 573)   1
(0, 1411)  2
(0, 2748)  1
(0, 1073)  1
(1, 126)   1
(2, 1203)  1

1 answer
User
#1 · Posted 2024-04-24 22:13:52

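Neither PCA nor RandomizedLasso is getting input in the right format. CountVectorizer returns a SciPy sparse matrix, and the two np.asarray calls discard their results, so both estimators still receive sparse input. Replace the following two lines and try again.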

np.asarray(trainDataFeatures)
np.asarray(testDataFeatures)
# replace the above two lines with these
trainDataFeatures = trainDataFeatures.toarray()
testDataFeatures = testDataFeatures.toarray()
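
To show the fix end to end, here is a minimal sketch of the PCA path with dense input. The sentences and labels are made-up stand-ins for the cleaned tweets, and the snake_case variable names are my own. It also applies the fitted PCA to the test features, which the question's code does not do, on the assumption that the classifier should see the same reduced columns at fit and predict time.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-ins for the cleaned tweets and their classes.
train_sentences = [
    "apple phone great",
    "battery bad phone",
    "screen great battery",
    "apple store bad",
]
train_labels = ["pos", "neg", "pos", "neg"]
test_sentences = ["great apple screen", "bad battery"]

vectorizer = CountVectorizer(analyzer="word", max_features=5000)

# fit_transform/transform return SciPy sparse matrices;
# .toarray() converts them to dense NumPy arrays, and the result is kept.
train_features = vectorizer.fit_transform(train_sentences).toarray()
test_features = vectorizer.transform(test_sentences).toarray()

# Fit PCA on the training data only, then apply the same projection to both sets.
pca = PCA(n_components=2)
train_reduced = pca.fit_transform(train_features)
test_reduced = pca.transform(test_features)

# Train and predict on the reduced features (same number of columns in both).
forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(train_reduced, train_labels)
predictions = forest.predict(test_reduced)
print(predictions)
```

On the RandomizedLasso side, the second positional argument of fit_transform is the target y, so passing testDataFeatures there is a likely source of the "bad input shape" error regardless of sparsity. If the dense array is too large to hold in memory, TruncatedSVD from sklearn.decomposition accepts sparse input and may be worth trying in place of PCA.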
