HDF5 broadcast error: Can't broadcast (3, 2048, 1, 1) -> (4, 2048, 1, 1)

shape = (dataset_length, 2048, 1, 1)
all_shape = (dataset_length, 6144, 1, 1)
labels_shape = (dataset_length)
batch_shape = (1,)
path = args.HDF5_dataset + f'{phase}.hdf5'
#hdf5_file = h5py.File(path, mode='w')
with h5py.File(path, mode='a') as hdf5_file:
    array_40 = hdf5_file.create_dataset(
        f'{phase}_40x_arrays', shape, maxshape=(None, 2048, 1, 1)
    )
    array_labels = hdf5_file.create_dataset(
        f'{phase}_labels', labels_shape, maxshape=(None), dtype=string_type
    )
    array_batch_idx = hdf5_file.create_dataset(
        f'{phase}_batch_idx', data=np.array([-1, ])
    )
    hdf5_file.close()

# either new or checkpointed file exists
# load file and create references to existing h5 datasets
with h5py.File(path, mode='r+') as hdf5_file:
    array_40 = hdf5_file[f'{phase}_40x_arrays']
    array_labels = hdf5_file[f'{phase}_labels']
    array_batch_idx = hdf5_file[f'{phase}_batch_idx']
    batch_idx = int(array_batch_idx[0]+1)
    print("Batch ID is restarting from {}".format(batch_idx))
    # make sure shuffling is false for sampler to work and in case you restart
    dataloaders_dict = torch.utils.data.DataLoader(
        datasets_dict, batch_size=args.batch_size,
        sampler=SequentialSampler2(datasets_dict, batch_idx, args.batch_size),
        drop_last=True, num_workers=args.num_workers, shuffle=False)
    for i, (inputs40x, paths40x, labels) in enumerate(dataloaders_dict):
        print(f'Batch ID: {batch_idx}')
        inputs40x = inputs40x.to(device)
        labels = labels.to(device)
        paths = paths40x
        x40 = resnet(inputs40x)  # torch.Size([1, 2048, 1, 1]) batch, feats, 1l, 1l
        array_40[...] = x40.cpu()
        array_labels[batch_idx, ...] = labels[:].cpu()
        array_batch_idx[:,...] = batch_idx
        batch_idx +=1
        hdf5_file.flush()

1 Answer

User

#1 · Posted 2024-04-26 07:09:18

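I think you are confused about how the maxshape=() parameter works. It sets the maximum size the dataset can reach in each dimension. The first dataset dimension is set to dataset_length at creation, and maxshape[0]=None lets it grow without limit. The second dataset dimension has size args.batch_size at creation. You specified the same size in maxshape, so that dimension cannot be increased.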
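To make the maxshape behaviour concrete, here is a minimal sketch. It assumes h5py and NumPy are installed; the file name `maxshape_demo.h5` and the dataset name `features` are made up for illustration, and the file lives only in memory (`driver="core"` with `backing_store=False`).

```python
import h5py
import numpy as np

# In-memory HDF5 file, so the sketch leaves nothing on disk.
with h5py.File("maxshape_demo.h5", "w", driver="core", backing_store=False) as f:
    # 4 rows at creation; axis 0 may grow without limit, the other axes are fixed.
    dset = f.create_dataset(
        "features", shape=(4, 2048, 1, 1), maxshape=(None, 2048, 1, 1), dtype="f4"
    )

    dset.resize(7, axis=0)  # allowed: maxshape[0] is None
    grown_shape = dset.shape

    try:
        dset.resize(4096, axis=1)  # rejected: axis 1 is capped at 2048
        resize_failed = False
    except Exception:
        # The exact exception type is not relied on here.
        resize_failed = True

print(grown_shape)    # (7, 2048, 1, 1)
print(resize_failed)  # True
```

Only the axes whose maxshape entry is None (or larger than the current size) can be resized; every other axis keeps its creation size.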

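Your example confuses me. It sounds like you are trying to write data to the dataset incrementally, args.batch_size rows/instances at a time. Your example has 51 rows/instances of data, and you want to write them in batches of args.batch_size=4. With 51 rows, you can write the first 48 rows (0-3, 4-7 … 44-47) and are then left with the remaining 3 rows. Couldn't you solve this by adding a counter (call it nrows_left) and changing the batch size argument to min(args.batch_size, nrows_left)? That seems like the simplest solution to me.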

Without more information, I can't write a complete example.
I will try to illustrate what I mean below:

# args.batch_size = 4
# img_data is assumed to be an array of shape (dataset_length, 2048, 1, 1)
shape = (dataset_length, 2048, 1, 1)
array_40 = hdf5_file.create_dataset(
           f'{phase}_40x_arrays', shape, maxshape=(None, 2048, 1, 1))
nrows_left = dataset_length
rcnt = 0  # index of the next row to write
loopcnt = dataset_length // args.batch_size  # integer division so range() accepts it
if dataset_length % args.batch_size != 0:
    loopcnt += 1  # one extra pass for the partial final batch
for loop in range(loopcnt):
    nload = min(nrows_left, args.batch_size)
    array_40[rcnt:rcnt+nload] = img_data[rcnt:rcnt+nload]
    rcnt += nload
    nrows_left -= nload
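
For anyone who wants to run the idea end to end, the following is a self-contained sketch of the same approach, not the asker's actual pipeline. The helper `write_in_batches`, the random-free stand-in array `img_data`, and the in-memory file `batch_demo.h5` are all hypothetical, and the feature axis is shrunk from 2048 to 8 to keep the example light. It assumes h5py and NumPy are installed.

```python
import h5py
import numpy as np


def write_in_batches(dset, data, batch_size):
    """Copy `data` into `dset` in row batches; the last batch may be shorter."""
    n_rows = data.shape[0]
    batch_sizes = []
    for start in range(0, n_rows, batch_size):
        # Clamp the end of the slice so the final batch holds only the leftover rows.
        stop = min(start + batch_size, n_rows)
        dset[start:stop] = data[start:stop]
        batch_sizes.append(stop - start)
    return batch_sizes


# Hypothetical stand-in for the extracted features: 51 rows, 8 features each.
img_data = np.arange(51 * 8, dtype="f4").reshape(51, 8, 1, 1)

with h5py.File("batch_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "features", shape=img_data.shape, maxshape=(None, 8, 1, 1), dtype="f4"
    )
    sizes = write_in_batches(dset, img_data, batch_size=4)
    # Read everything back to check that the rows landed where expected.
    round_trip_ok = bool(np.array_equal(dset[...], img_data))

print(len(sizes), sizes[-1])  # 13 3
print(round_trip_ok)          # True
```

Because both sides of each assignment use the same `start:stop` slice, the source and destination shapes always match, so the 3-row final batch is never broadcast against a 4-row selection.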
