Model training seems to be stuck?

Posted 2024-04-26 23:53:58

I'm running TensorFlow 1.5 on Windows 10. I'm using the TensorFlow-Slim implementation of the Inception V4 network from its GitHub page, loading their pre-trained weights and adding my own layers at the end to classify 120 different objects. My training dataset is roughly 10,000 images, each 299*299*3. Below is the complete code, apart from the lines containing the module imports and dataset paths.

tf.logging.set_verbosity(tf.logging.INFO)
with slim.arg_scope(inception_blocks_v4.inception_v4_arg_scope()):
    X_input = tf.placeholder(tf.float32, shape = (None, image_size, image_size, 3))
    Y_label = tf.placeholder(tf.float32, shape = (None, num_classes))

    targets = convert_to_onehot(labels_dir, no_of_features = num_classes)
    targets = tf.convert_to_tensor(targets, dtype = tf.float32)

    Images = [] #TO STORE THE RESIZED IMAGES IN THE FORM OF LIST TO PASS IT TO tf.train.batch()
    images = glob.glob(images_file_path)
    i = 0
    for my_img in images:
        image = mpimg.imread(my_img)[:, :, :3]
        image = tf.convert_to_tensor(image, dtype = tf.float32)
        Images.append(image)

    logits, end_points = inception_blocks_v4.inception_v4(inputs = X_input, num_classes = pre_num_classes, is_training = True, create_aux_logits= False)
    pretrained_weights = slim.assign_from_checkpoint_fn(ckpt_dir, slim.get_model_variables('InceptionV4'))
    with tf.Session() as sess:
        pretrained_weights(sess)

    #MY LAYERS, add bias as well
    my_layer = slim.fully_connected(logits, 560, activation_fn=tf.nn.relu, scope='myLayer1',
                                    weights_initializer=tf.truncated_normal_initializer(stddev=0.001),
                                    weights_regularizer=slim.l2_regularizer(0.00005),
                                    biases_initializer=tf.truncated_normal_initializer(stddev=0.001),
                                    biases_regularizer=slim.l2_regularizer(0.00005))
    my_layer = slim.dropout(my_layer, keep_prob=0.6, scope='myLayer2')
    my_layer = slim.fully_connected(my_layer, num_classes, activation_fn=tf.nn.relu, scope='myLayer3',
                                    weights_initializer=tf.truncated_normal_initializer(stddev=0.001),
                                    weights_regularizer=slim.l2_regularizer(0.00005),
                                    biases_initializer=tf.truncated_normal_initializer(stddev=0.001),
                                    biases_regularizer=slim.l2_regularizer(0.00005))
    my_layer_logits = slim.fully_connected(my_layer, num_classes, activation_fn=None, scope='myLayer4')

    loss = tf.losses.softmax_cross_entropy(onehot_labels = Y_label, logits = my_layer_logits)  

    optimizer = tf.train.AdamOptimizer(learning_rate=0.0001) 
    train_op = optimizer.minimize(loss)
    batch_size = 8
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for i in range(100):
            images, labels = tf.train.batch([Images, targets], batch_size=batch_size,
                                            num_threads=1, capacity=4 * batch_size, enqueue_many=True)
            print (images)   #To check their shape
            print (labels)
            train_op.run(feed_dict = {X_input:images.eval(session = sess) ,Y_label:labels.eval(session = sess)})
            print (i)
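
For context on the checkpoint step above: in TF 1.x, variable values live inside a session, so weights restored in one tf.Session do not carry over into a session opened later, and the later sess.run(tf.global_variables_initializer()) would reset them anyway. A minimal sketch of the usual TF-Slim pattern, reusing ckpt_dir from the code above, restores inside the training session after initialization:

    # Sketch: restore pretrained weights in the SAME session that runs training.
    init_fn = slim.assign_from_checkpoint_fn(ckpt_dir, slim.get_model_variables('InceptionV4'))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # initialize all variables first
        init_fn(sess)  # then overwrite the InceptionV4 variables with the checkpoint values
        # ... run the training loop here, in this same session ...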

I use the print(i) statement to keep track of how many epochs have completed. After running the script for more than 3 hours, not even one epoch of training had finished. It seems to be stuck at the train_op.run() step. I don't know what the problem is.
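
For what it's worth, in TF 1.x, tf.train.batch builds a queue that is only filled once queue runners are started; evaluating the batched tensors without calling tf.train.start_queue_runners blocks indefinitely, which would match a hang at this step. Below is a minimal sketch of the queue-runner pattern, under the assumption that Images is stacked into a single [N, 299, 299, 3] tensor so that enqueue_many=True can slice it along the first dimension; the batching op is built once, outside the loop, since re-creating it every iteration also keeps growing the graph:

    # Sketch: queue-runner pattern for tf.train.batch (TF 1.x).
    images_tensor = tf.stack(Images)  # assumption: list of [299, 299, 3] tensors -> one [N, 299, 299, 3] tensor
    images_batch, labels_batch = tf.train.batch([images_tensor, targets], batch_size=batch_size,
                                                num_threads=1, capacity=4 * batch_size, enqueue_many=True)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)  # start filling the queue
        try:
            for step in range(100):
                imgs, lbls = sess.run([images_batch, labels_batch])  # dequeue one batch
                sess.run(train_op, feed_dict={X_input: imgs, Y_label: lbls})
        finally:
            coord.request_stop()
            coord.join(threads)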


Tags: image, layer, size, my, tf, batch, train, num