How to read data into TensorFlow batches from an example queue?

import tensorflow as tf  # TensorFlow 1.x queue-based input API

def read_my_file_format(filename_queue):
    # tf.SomeReader, tf.some_decoder and some_processing are placeholders
    # (taken from the TensorFlow documentation) for a concrete reader,
    # decoder and preprocessing step.
    reader = tf.SomeReader()
    key, record_string = reader.read(filename_queue)
    example, label = tf.some_decoder(record_string)
    processed_example = some_processing(example)
    return processed_example, label

def input_pipeline(filenames, batch_size, num_epochs=None):
    filename_queue = tf.train.string_input_producer(
        filenames, num_epochs=num_epochs, shuffle=True)
    example, label = read_my_file_format(filename_queue)
    # min_after_dequeue defines how big a buffer we will randomly sample
    # from -- bigger means better shuffling but slower start up and more
    # memory used.
    # capacity must be larger than min_after_dequeue and the amount larger
    # determines the maximum we will prefetch.  Recommendation:
    # min_after_dequeue + (num_threads + a small safety margin) * batch_size
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * batch_size
    example_batch, label_batch = tf.train.shuffle_batch(
        [example, label], batch_size=batch_size, capacity=capacity,
        min_after_dequeue=min_after_dequeue)
    return example_batch, label_batch
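To make the `min_after_dequeue`/`capacity` comments in `input_pipeline` concrete, here is a pure-Python model of the buffering they describe. It is a simplified sketch for illustration only, not TensorFlow's implementation: the names `recommended_capacity` and `shuffle_batches` are hypothetical, and the model ignores threads and the blocking behaviour of the real queue.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def recommended_capacity(min_after_dequeue: int, num_threads: int,
                         batch_size: int, safety_margin: int = 1) -> int:
    """Rule of thumb from the comment in input_pipeline()."""
    return min_after_dequeue + (num_threads + safety_margin) * batch_size


def shuffle_batches(examples: Iterable[T], batch_size: int,
                    min_after_dequeue: int, seed: int = 0) -> Iterator[List[T]]:
    """Yield batches drawn at random from a buffer of pending examples.

    A batch is drawn only once the buffer holds at least
    min_after_dequeue + batch_size examples, so a bigger buffer mixes the
    data better but needs a longer warm-up and more memory.
    """
    rng = random.Random(seed)
    buffer: List[T] = []
    for example in examples:
        buffer.append(example)
        if len(buffer) >= min_after_dequeue + batch_size:
            yield [_pop_random(buffer, rng) for _ in range(batch_size)]
    # Input exhausted: drain the remaining full batches, roughly what
    # happens once the real queue is closed.
    while len(buffer) >= batch_size:
        yield [_pop_random(buffer, rng) for _ in range(batch_size)]


def _pop_random(buffer: List[T], rng: random.Random) -> T:
    # Swap a random element to the end, then pop it (O(1) removal).
    i = rng.randrange(len(buffer))
    buffer[i], buffer[-1] = buffer[-1], buffer[i]
    return buffer.pop()
```

For example, `recommended_capacity(10000, num_threads=2, batch_size=32)` gives 10000 + 3 * 32 = 10096, i.e. the `min_after_dequeue + 3 * batch_size` used above corresponds to two reader threads plus a one-batch safety margin.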

with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)

1 answer

Anonymous user

#1 · Posted 2024-05-16 19:26:45

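If you want this input pipeline to work, you need to add an asynchronous queue mechanism that generates batches of examples. This is done by creating a tf.RandomShuffleQueue or a tf.FIFOQueue and inserting the JPEG images that have already been read, decoded and preprocessed.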
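Conceptually this is a producer/consumer queue. As a framework-free illustration (not how TensorFlow implements `tf.FIFOQueue`), the sketch below uses Python's `threading` and a bounded `queue.Queue`: a background thread plays the role of the queue runner that reads, decodes and preprocesses examples, and the training loop dequeues fixed-size batches. `load_and_preprocess` is a hypothetical callback standing in for reading and decoding one JPEG.

```python
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel the producer enqueues when it runs out of input


def start_producer(filenames: Iterable[str],
                   load_and_preprocess: Callable[[str], object],
                   example_queue: "queue.Queue[object]") -> threading.Thread:
    """Start a daemon thread that fills example_queue in the background."""
    def run() -> None:
        for name in filenames:
            # put() blocks while the queue is full, like a queue at capacity.
            example_queue.put(load_and_preprocess(name))
        example_queue.put(_END)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def dequeue_batch(example_queue: "queue.Queue[object]",
                  batch_size: int) -> List[object]:
    """Block until batch_size examples are available; [] after end of input."""
    batch: List[object] = []
    while len(batch) < batch_size:
        item = example_queue.get()
        if item is _END:
            example_queue.put(_END)  # keep the sentinel for later calls
            # Drop the incomplete final batch, similar to tf.train.batch
            # with its default allow_smaller_final_batch=False.
            return []
        batch.append(item)
    return batch
```

Usage: create `q = queue.Queue(maxsize=64)`, call `start_producer(filenames, load_and_preprocess, q)`, then call `dequeue_batch(q, batch_size)` in the training loop instead of `mnist.train.next_batch(batch_size)` until it returns an empty list. The bounded queue is what lets reading and decoding overlap with training without unbounded memory use.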

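You can use the convenience functions tf.train.shuffle_batch_join or tf.train.batch_join, which create the queue and the threads that run it for you. Here is a simple example of what this looks like. Note that this code is untested: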

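import tensorflow as tf  # TensorFlow 1.x queue-based input API

# Let's assume there is a queue that maintains a list of all filenames,
# called 'filename_queue', e.g. created with
# tf.train.string_input_producer(filenames, shuffle=True).
reader = tf.WholeFileReader()
_, file_buffer = reader.read(filename_queue)

# Decode the JPEG image. The batching functions need a fully defined
# shape, so resize to a fixed size (224x224 is an arbitrary choice).
image = tf.image.decode_jpeg(file_buffer, channels=3)
image = tf.image.resize_images(image, [224, 224])

# Generate batches of images of this size.
batch_size = 32

# Depends on the number of files and the training speed.
min_queue_examples = batch_size * 100
# shuffle_batch_join expects a list of tensor lists (one per reader);
# here a single reader produces a single tensor.
images_batch = tf.train.shuffle_batch_join(
  [[image]],
  batch_size=batch_size,
  capacity=min_queue_examples + 3 * batch_size,
  min_after_dequeue=min_queue_examples)

# Run your network on this batch of images
# (my_inference is a placeholder for your model).
predictions = my_inference(images_batch)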

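Depending on how far you need to scale your job, you may need to run several independent threads that read, decode and preprocess images and dump them into the example queue. A complete example of such a pipeline is provided in the Inception/ImageNet model. Take a look at batch_inputs: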
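Running several independent reader threads mostly means giving each thread its own slice of the input and letting all of them feed one shared queue. The plain-Python sketch below is only an analogy for what `batch_inputs` does with TensorFlow queues: `parallel_examples` and the `decode` callback (standing in for read/decode/preprocess) are hypothetical, the filenames are split round-robin across `num_threads` workers, and the consumer stops after seeing one end-of-input sentinel per worker. Error handling inside the workers is omitted.

```python
import queue
import threading
from typing import Callable, Iterator, Sequence

_DONE = object()  # each worker enqueues one of these when its shard is finished


def parallel_examples(filenames: Sequence[str],
                      decode: Callable[[str], object],
                      num_threads: int = 4,
                      capacity: int = 100) -> Iterator[object]:
    """Yield examples decoded by num_threads worker threads.

    All workers feed one bounded queue holding at most `capacity` examples.
    The output order depends on thread scheduling, so it is not deterministic.
    """
    example_queue: "queue.Queue[object]" = queue.Queue(maxsize=capacity)

    def worker(shard: Sequence[str]) -> None:
        for name in shard:
            example_queue.put(decode(name))  # blocks while the queue is full
        example_queue.put(_DONE)

    # Round-robin split: thread k handles filenames k, k + n, k + 2n, ...
    shards = [filenames[k::num_threads] for k in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(shard,), daemon=True)
               for shard in shards]
    for thread in threads:
        thread.start()

    finished = 0
    while finished < num_threads:
        item = example_queue.get()
        if item is _DONE:
            finished += 1
        else:
            yield item
    for thread in threads:
        thread.join()
```

Because the decoding work is spread over several threads while a single consumer drains the queue, the slow read/decode step can keep up with training; the trade-off is that example order is no longer fixed, which is usually acceptable since the data is shuffled anyway.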

https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L407

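Finally, if you are working with more than O(1000) JPEG images, keep in mind that reading thousands of small files individually is very inefficient and will slow down your training considerably.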

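A more robust and faster solution is to convert your image dataset into sharded TFRecords of Example protos. Here is a fully worked script for converting the ImageNet dataset into such a format, and here is a set of instructions for running a generic version of this preprocessing script on an arbitrary directory of JPEG images.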
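The point of sharding is to pack many small images into a few large files, so training performs a handful of sequential reads instead of thousands of small file opens. The sketch below shows only that packing idea in plain Python, using a simplified length-prefixed format (an 8-byte little-endian length followed by the payload). It is not the TFRecord format, which also stores CRC checksums and is normally written with `tf.python_io.TFRecordWriter` and `tf.train.Example` protos; `write_shards`, `read_records` and the `train-00000-of-00004` naming are illustrative assumptions.

```python
import os
import struct
from typing import Iterator, List, Sequence

_LEN = struct.Struct("<Q")  # 8-byte little-endian unsigned length prefix


def write_shards(payloads: Sequence[bytes], out_dir: str, prefix: str,
                 num_shards: int) -> List[str]:
    """Write payloads round-robin into num_shards files; return their paths."""
    paths = []
    for shard in range(num_shards):
        name = f"{prefix}-{shard:05d}-of-{num_shards:05d}"
        path = os.path.join(out_dir, name)
        with open(path, "wb") as f:
            for payload in payloads[shard::num_shards]:
                f.write(_LEN.pack(len(payload)))  # length prefix
                f.write(payload)                  # raw bytes, e.g. one JPEG
        paths.append(path)
    return paths


def read_records(path: str) -> Iterator[bytes]:
    """Yield the payloads stored in one shard, in the order they were written."""
    with open(path, "rb") as f:
        while True:
            header = f.read(_LEN.size)
            if not header:
                return  # clean end of file
            if len(header) != _LEN.size:
                raise ValueError(f"truncated record header in {path}")
            (length,) = _LEN.unpack(header)
            payload = f.read(length)
            if len(payload) != length:
                raise ValueError(f"truncated record payload in {path}")
            yield payload
```

A reader pipeline would then shuffle the list of shard paths and stream records from each shard, instead of opening one file per image.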
