TensorFlow data format requirements

Posted 2024-05-29 01:57:58

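I'm a bit confused about how to read data into TensorFlow. I'm trying to build an LSTM and use tf.nn.embedding_lookup to look up the vector representations, but I can't get it to run.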

My data currently looks like this:

Out[494]: 
   sentiment                                      glove_indexes
0          0  [574305, 1294, 939107, 657375, 571132, 1013429...
1          0                                           [500519]
2          4                                    [560941, 93286]
3          0  [972036, 569274, 478483, 1051901, 684125, 6482...
4          0  [156951, 572457, 465860, 132739, 284963, 11483...

I also have a dictionary, glove_ids, that I can index with these values to get the vector representation of each word.

I thought I could just call

embed = tf.nn.embedding_lookup(glove_ids, inputs_data)

to get the vector representations, but this doesn't work. Can anyone help me get this set up?
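
For context, tf.nn.embedding_lookup(params, ids) gathers rows from a tensor (or a list of tensors), so a Python dictionary is unlikely to be accepted as params. Below is a minimal sketch of one possible workaround, assuming glove_ids maps each integer index to a fixed-length vector. A tiny made-up dictionary stands in for it, and the names row_of, embedding_matrix and to_rows are hypothetical. The sparse GloVe indexes are remapped to compact row numbers, the vectors are stacked into a matrix, and NumPy indexing stands in for the TensorFlow gather.

```python
import numpy as np

# Hypothetical stand-in for the asker's glove_ids: GloVe index -> vector.
embed_size = 3
glove_ids = {
    574305: [0.1, 0.2, 0.3],
    1294: [0.4, 0.5, 0.6],
    500519: [0.7, 0.8, 0.9],
}

# Row 0 is reserved for padding; real words get rows 1..N.
row_of = {idx: row for row, idx in enumerate(sorted(glove_ids), start=1)}

embedding_matrix = np.zeros((len(glove_ids) + 1, embed_size), dtype=np.float32)
for idx, row in row_of.items():
    embedding_matrix[row] = glove_ids[idx]


def to_rows(glove_indexes, max_len):
    """Map GloVe indexes to matrix rows, left-padded with 0 to max_len."""
    rows = [row_of[i] for i in glove_indexes][-max_len:]
    return [0] * (max_len - len(rows)) + rows


batch_ids = np.array([to_rows([574305, 1294], 4), to_rows([500519], 4)])
# Integer-array indexing gathers one row per id, like an embedding lookup.
embedded = embedding_matrix[batch_ids]
print(embedded.shape)  # (2, 4, 3)
```

In TensorFlow 1.x the same matrix could presumably be passed as params, with ids fed through an integer placeholder of shape [None, maxSeqLength], so that the lookup happens inside the graph.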

Edit: I tried a workaround that also doesn't work. I'm just hoping for some guidance on how to solve this...

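I now have epoch_x_train as a vector of length 18, which is my maximum sequence length in words, and each entry in epoch_x_train has length 25, which is the embedding length. I believe this is correct and that every word has the right embedding. getTrainBatch randomly pulls new data into the model for fitting. I get the error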

ValueError: setting an array element with a sequence.


def getTrainBatch():
    labels = []
    arr = np.zeros([batch_size , maxSeqLength])
    for i in range(batch_size ):
        num = randint(0,len(train_dat))
        labels.append(y_train[num])
        arr[i] = x_train[num]
    return arr, labels

def my_lookup(dat):
    new = []
    for i in range(len(dat)):
        temp = []
        for j in range(len(dat[i])):
            if dat[i][j] == 0:
                temp.append(list(np.zeros(maxSeqLength)))
            else:
                temp.append(glove_ids[dat[i][j]])
        new.append(temp)
    return new


maxSeqLength = 18
x_train = train_dat['glove_indexes']
x_train = np.array(x_train)
x_train = sequence.pad_sequences(x_train, maxlen=maxSeqLength)

y_train = train_dat['sentiment']
y_train = np.where(y_train == 4, 1, 0)
y_train = np.array(y_train)

lstm_size = 256
batch_size = 500
learning_rate = 0.001
embed_size = GloVeEncodingsSize
n_outputs = 2



X = tf.placeholder(tf.float32, [None, embed_size, maxSeqLength])
Y = tf.placeholder(tf.int32, [None])

basic_cell = tf.contrib.rnn.BasicRNNCell(num_units = lstm_size)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype = tf.float32)

logits = tf.layers.dense(states, n_outputs)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y,logits=logits)

loss = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
correct = tf.nn.in_top_k(logits, Y, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

init = tf.global_variables_initializer()

n_epochs = 100



with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        epoch_x_train, epoch_y_train = getTrainBatch()
        epoch_x_train = my_lookup(epoch_x_train)

        sess.run(training_op, feed_dict={X: epoch_x_train, Y: epoch_y_train})
        acc_train = accuracy.eval(feed_dict={X: epoch_x_train, Y: epoch_y_train})
        print(epoch, "Train accuracy:", acc_train)
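
A likely cause of the ValueError, judging from my_lookup above: padding positions receive a zero vector of length maxSeqLength (18), while real words receive a vector of the embedding length (25), so the nested list is ragged and cannot be converted into one rectangular array. The sketch below pre-allocates the output at (batch, maxSeqLength, embed_size), which keeps every row the same width. It assumes glove_ids maps an index to a vector of length embed_size; the toy dictionary, the small sizes and the name lookup_batch are illustrative only.

```python
import numpy as np

max_seq_length = 4   # 18 in the question
embed_size = 3       # 25 in the question

# Toy stand-in for glove_ids (index -> vector of length embed_size).
glove_ids = {7: [1.0, 1.0, 1.0], 9: [2.0, 2.0, 2.0]}


def lookup_batch(batch_ids, table, dim):
    """Return a float32 array of shape (batch, time, dim); id 0 stays all zeros."""
    batch_ids = np.asarray(batch_ids, dtype=np.int64)
    out = np.zeros(batch_ids.shape + (dim,), dtype=np.float32)
    for i, row in enumerate(batch_ids):
        for j, idx in enumerate(row):
            if idx != 0:  # 0 is the padding value
                out[i, j] = table[int(idx)]
    return out


# Float ids, as produced when rows are copied into an np.zeros array.
padded = np.array([[0, 0, 7, 9], [0, 0, 0, 7]], dtype=np.float64)
batch = lookup_batch(padded, glove_ids, embed_size)
print(batch.shape)  # (2, 4, 3)
```

With this layout the placeholder would presumably be declared as [None, maxSeqLength, embed_size] rather than [None, embed_size, maxSeqLength], since tf.nn.dynamic_rnn by default expects inputs shaped [batch_size, max_time, ...]. Separately, if randint here is random.randint, its upper bound is inclusive, so len(train_dat) itself could be drawn as an index.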

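Edit again: From more googling, the error seems to come from the feed_dict, but I don't see why it is wrong. I have tried giving y in [1,0] format to represent a response, and also using just 1 or 0 for each x_train.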
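
On the label format: tf.nn.sparse_softmax_cross_entropy_with_logits takes integer class ids of shape [batch_size], which matches the Y placeholder in the code above, while one-hot rows such as [1, 0] belong with the non-sparse tf.nn.softmax_cross_entropy_with_logits. A small NumPy sketch of the two encodings, using made-up sentiment values:

```python
import numpy as np

sentiment = np.array([0, 0, 4, 0, 4])  # made-up raw labels (0 or 4)

# Sparse encoding: one integer class id per example, shape (batch,).
y_sparse = np.where(sentiment == 4, 1, 0).astype(np.int32)

# One-hot encoding: one row per example, shape (batch, n_classes).
n_outputs = 2
y_one_hot = np.eye(n_outputs, dtype=np.float32)[y_sparse]

print(y_sparse.shape, y_one_hot.shape)  # (5,) (5, 2)
```

Either encoding is rectangular, so the labels on their own seem an unlikely source of a "setting an array element with a sequence" error.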

Full error message

Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-77-7960c1e2188b>", line 12, in <module>
    sess.run(training_op, feed_dict={X: np.array(epoch_x_train), Y: np.array(epoch_y_train)})
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
    run_metadata_ptr)
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1089, in _run
    np_val = np.asarray(subfeed_val, dtype=subfeed_dtype)
  File "C:\ProgramData\Anaconda3\envs\py35\lib\site-packages\numpy\core\numeric.py", line 531, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
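
The traceback ends in np.asarray(subfeed_val, dtype=subfeed_dtype), which suggests that a value fed for a placeholder could not be packed into a single rectangular array. One way to narrow this down before calling sess.run is to list the distinct inner lengths of the nested list. The helper name inner_lengths and the sample data below are hypothetical.

```python
def inner_lengths(batch):
    """Collect the distinct vector lengths found at the innermost level."""
    return sorted({len(vec) for seq in batch for vec in seq})


# Sample with mixed inner lengths: one vector of length 4 among
# vectors of length 3 (compare 18 and 25 in the question).
ragged = [
    [[0.0] * 4, [0.1, 0.2, 0.3]],
    [[0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
]
print(inner_lengths(ragged))  # [3, 4]
```

More than one distinct length means the batch is ragged and the conversion to an array will fail.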
