Data loading for a neural network in Python

Posted 2021-04-11 23:13:19


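I have to process two text files containing hotel reviews. Next to each review is a value indicating whether it is a truthful or a deceptive review. To load the training and test sets, I have this code: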

import csv

# Each line is "review<TAB>label"; the label marks the review as truthful or deceptive.
x_train = list()
y_train = list()
with open('TRAINING_ALL.txt', encoding='utf-8', newline='') as infile:
    reader = csv.reader(infile, delimiter='\t')
    for row in reader:
        x_train.append(row[0])       # review text (a str)
        y_train.append(int(row[1]))  # integer label

x_test = list()
y_test = list()
with open('TEST_ALL.txt', encoding='utf-8', newline='') as infile:
    reader = csv.reader(infile, delimiter='\t')
    for row in reader:
        x_test.append(row[0])
        y_test.append(int(row[1]))
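A possible hardening of this loader, as a minimal sketch assuming every line is exactly `review<TAB>label` with an integer label. `load_tsv` is a hypothetical helper name. `quoting=csv.QUOTE_NONE` keeps a `"` inside a review from being read as a field quote, and the field-count check makes a malformed line fail loudly instead of silently shifting text into the label column.

```python
import csv


def load_tsv(path):
    """Read 'review<TAB>label' lines into (texts, labels).

    Assumes exactly one tab per line and an integer label in the second column.
    QUOTE_NONE stops csv from treating '"' inside a review as a quote character.
    """
    texts, labels = [], []
    with open(path, encoding="utf-8", newline="") as infile:
        reader = csv.reader(infile, delimiter="\t", quoting=csv.QUOTE_NONE)
        for lineno, row in enumerate(reader, start=1):
            if len(row) != 2:
                # A blank line or an extra tab ends up here instead of corrupting the data.
                raise ValueError(f"line {lineno}: expected 2 fields, got {len(row)}")
            texts.append(row[0])
            labels.append(int(row[1]))
    return texts, labels
```

Usage would be `x_train, y_train = load_tsv('TRAINING_ALL.txt')` and likewise for the test file.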

Then I want to classify the reviews with a neural network. However, I'm stuck on the data loading part:

(code not preserved)

I get:

Loading data...
480 train sequences
320 test sequences
Pad sequences (samples x time)

So far so good: it reads the correct number of sequences. Then the error is:

ValueError: invalid literal for int() with base 10: "ould take a quick dip in the pool. I toured the hotel as my niece is planning her wedding and just so happens to live close to the hotel. The ' Chagall Ballroom ', was elegant enough for such an occa
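The message is consistent with the review text reaching the padding step as a plain string. Assuming `pad_sequences` keeps its default `truncating="pre"`, it keeps the last `maxlen` items of each sequence; for a `str` those items are characters, which would explain why the quoted text starts mid-word ("ould take..."), and casting that slice to an integer then fails. The sketch below reproduces the mechanism with plain `int()`; it assumes numpy's integer cast fails the same way, and the review text is made up.

```python
def truncate_pre(seq, maxlen):
    # Like pad_sequences' default truncating="pre": keep only the last `maxlen` items.
    return seq[-maxlen:]


def conversion_error(text, maxlen):
    """Return the truncated 'sequence' and the error raised when it is cast to int."""
    tail = truncate_pre(text, maxlen)  # slicing a str gives a str, not a list of ints
    try:
        int(tail)  # assumption: numpy's int32 cast of a str ends in this same failure
    except ValueError as err:
        return tail, str(err)
    return tail, None


review = "We would take a quick dip in the pool every morning."
tail, message = conversion_error(review, 20)
print(repr(tail))
print(message)
```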

What is the correct input for this code?

Note that the original code, which worked, looked like this (it loaded the IMDB dataset):

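from keras.datasets import imdb
from keras.preprocessing import sequence

# Defined earlier in the original script; these values are illustrative only.
max_features = 20000  # number of most frequent words kept by load_data
maxlen = 80           # every review is padded/truncated to this many word indices

print('Loading data...')
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)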
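Here `imdb.load_data` already returns each review as a list of integer word indices, which is the input `sequence.pad_sequences` is built for. A pure-Python sketch of its default behaviour (pad with 0 at the front, truncate from the front) shows the expected input and the resulting samples x time layout; `pad_pre` and the sample indices are hypothetical, not real IMDB data.

```python
def pad_pre(sequences, maxlen, value=0):
    """Pure-Python sketch of pad_sequences' defaults (padding="pre", truncating="pre").

    Each element of `sequences` is expected to be a list of integer word indices.
    """
    padded = []
    for seq in sequences:
        seq = list(seq)
        seq = seq[max(len(seq) - maxlen, 0):]               # keep the last maxlen indices
        padded.append([value] * (maxlen - len(seq)) + seq)  # left-pad to exactly maxlen
    return padded


rows = pad_pre([[1, 14, 22, 16], [1, 194]], maxlen=3)
print(rows)
```

Every row comes out with exactly `maxlen` integers, which is what gives `x_train.shape` its `(samples, maxlen)` form.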

Maybe x_train and x_test are not in the right format?
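One way to produce the IMDB-style format is to turn each review string into a list of integer indices before padding. A minimal sketch, assuming a simple lowercase word tokenizer and a vocabulary built from the training reviews only. `build_vocab` and `texts_to_indices` are hypothetical helpers, and reserving index 0 for padding and 1 for unknown words is a convention chosen here, not something taken from the question. Keras' own `Tokenizer` with `texts_to_sequences` is an alternative that serves the same purpose.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")  # assumed tokenizer: lowercase letters and apostrophes


def build_vocab(texts, num_words):
    """Map the most frequent training words to indices 2..num_words-1.

    Index 0 is left free for padding and 1 stands for out-of-vocabulary words.
    """
    counts = Counter(w for t in texts for w in WORD_RE.findall(t.lower()))
    most_common = counts.most_common(max(num_words - 2, 0))
    return {word: i + 2 for i, (word, _) in enumerate(most_common)}


def texts_to_indices(texts, vocab):
    """Turn each review string into a list of ints; unknown words become 1."""
    return [[vocab.get(w, 1) for w in WORD_RE.findall(t.lower())] for t in texts]


x_train_text = ["The pool was great", "The room was dirty"]
vocab = build_vocab(x_train_text, num_words=6)
x_train_ids = texts_to_indices(x_train_text, vocab)
print(x_train_ids)
```

The same `vocab` would be applied to `x_test` so both sets share indices; the resulting lists can then go through `sequence.pad_sequences(..., maxlen=maxlen)` as in the IMDB version, with `max_features` equal to `num_words`.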