I get this ValueError when I try to run my code

Posted 2024-06-07 19:47:32


I am trying to run the following code, but I get this error and I don't know why. ValueError: X has 6 features per sample; expecting 2613

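import sqlite3

import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df1 = pd.read_csv('train_set.csv',encoding='latin-1')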
df1.columns = df1.columns.str.strip()
con = sqlite3.connect("TrainSet.db")
df1.to_sql("Table1",con)
con.close()

text = []
for i in df1['SentimentText']:
    text.append(i)
for i in range(len(text)):
    text[i] = word_tokenize(text[i].lower())

stpwrds = stopwords.words('english')
stpwrds.extend(['.',',','-','_','&','!','@','*',')','(',':','/',';'])
stpwrds = set(stpwrds)

for i in range(len(text)):
    text[i] = list(set(text[i]) - stpwrds)

lemmatizer = WordNetLemmatizer()
for i in range(len(text)):
    for j in range(len(text[i])):
        text[i][j] = lemmatizer.lemmatize(text[i][j], pos='v')

for i in range(len(text)):
    text[i] = ' '.join(text[i])

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(text)
X = matrix[:7000]
Y = np.array(df1['Sentiment'][:7000])

f='such horrible movie, never gonna watching it again!!!'
f=word_tokenize(f.lower())
# drop stop words and punctuation (stpwrds is the set built above)
f = [w for w in f if w not in stpwrds]

lemmatizer = WordNetLemmatizer()
# rebuild the list so the lemmatized tokens are actually kept
f = [lemmatizer.lemmatize(w, pos='v') for w in f]

f= ' '.join(f)

g=vect.fit_transform([f])
g=g.toarray()

X_train,X_test,Y_train,Y_test = train_test_split(X.toarray(),Y)  # <--- MemoryError
lr = LogisticRegression()
lr.fit(X_train,Y_train)
Y_pred = lr.predict(g)  # <--- ValueError: X has 6 features per sample; expecting 2613

Can anyone help me resolve this error? I am also getting a MemoryError when running train_test_split.
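
A minimal sketch of one likely cause and fix, assuming scikit-learn is installed and using hypothetical toy data (`train_texts`, `train_labels`, `new_text`) in place of `train_set.csv`. The model was trained on the columns produced by the fitted `vectorizer` (2613 in the question), while `g` comes from a separate `fit_transform` call on a single sentence, which learns a new vocabulary of only 6 words. Calling `transform` on the already fitted vectorizer should keep the column count consistent. The sketch also passes the sparse matrix straight to `train_test_split` and `fit` instead of calling `X.toarray()`, which may avoid the MemoryError.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the preprocessed training texts and labels.
train_texts = [
    "love great movie",
    "horrible movie never watch",
    "great fun watch",
    "horrible boring waste",
]
train_labels = [1, 0, 1, 0]

# Fit the vectorizer once; this fixes the vocabulary (the feature columns).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_texts)  # SciPy sparse matrix

# Split the sparse matrix directly; no dense copy via toarray() is made.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, train_labels, random_state=0
)

lr = LogisticRegression()
lr.fit(X_train, Y_train)  # LogisticRegression accepts sparse input

new_text = "such horrible movie never gonna watch"

# Fitting a fresh vectorizer on one sentence learns only that sentence's words.
refit_features = TfidfVectorizer().fit_transform([new_text])

# Reusing the fitted vectorizer maps the sentence onto the training vocabulary.
new_features = vectorizer.transform([new_text])

print("training columns:", X.shape[1])
print("columns after re-fitting:", refit_features.shape[1])
print("columns after transform:", new_features.shape[1])

prediction = lr.predict(new_features)  # shapes now match, so no ValueError
```

Words that never appeared in the training texts (here `such` and `gonna`) are simply ignored by `transform`. Whether the sparse route is enough to stay within memory depends on the machine and the real data size, so treat it as a suggestion rather than a guaranteed cure.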

