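Unfolding a tensor containing images into patches
I have a batch of 4 single-channel images of size h x w = 180 x 320. I want to split each image into a series of smaller patches of shape h_p x w_p, producing a tensor of shape 4 x p x h_p x w_p. If h is not divisible by h_p, or w is not divisible by w_p, the images should be zero-padded at the borders. I tried the following to achieve this: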
import torch
tensor = torch.randn(4, 180, 320)
patch_size = (64, 64)  # h_p = w_p = 64
unfold = torch.nn.Unfold(kernel_size=patch_size, stride=patch_size, padding=0)
unfolded = unfold(tensor)
print(unfolded.shape)
It prints:
torch.Size([16384, 10])
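What am I missing here?
P.S.:
I think I have found the answer myself and have posted it below. I have not fully evaluated it yet, so please let me know if you see a problem with the approach or think it performs poorly.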
1 Answer
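I have an input of shape [batch_size, height, width] = [4, 180, 320]. I want to split it into p smaller patches of shape h_p x w_p, giving a tensor of shape 4 x p x h_p x w_p. To cover all h x w = 180 x 320 elements with patches of size h_p x w_p = 64 x 64, I need p = 3 x 5 = 15 patches: ceil(180 / 64) = 3 rows of patches and 320 / 64 = 5 columns, which means the height has to be padded from 180 to 3 x 64 = 192.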
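The same arithmetic can be worked out for any image and patch size. The following is a minimal sketch, not part of the original code; the helper name `patch_grid` is hypothetical, and it assumes the padding should be split as evenly as possible between the two sides. The resulting tuple is in the (left, right, top, bottom) order that `torch.nn.functional.pad` expects for the last two dimensions.

```python
import math


def patch_grid(h, w, h_p, w_p):
    """Return (rows, cols, pad) needed to tile an h x w image with h_p x w_p patches.

    pad is (left, right, top, bottom), split as evenly as possible.
    """
    rows = math.ceil(h / h_p)   # patches needed vertically
    cols = math.ceil(w / w_p)   # patches needed horizontally
    pad_h = rows * h_p - h      # total rows of padding
    pad_w = cols * w_p - w      # total columns of padding
    pad = (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)
    return rows, cols, pad


rows, cols, pad = patch_grid(180, 320, 64, 64)
print(rows, cols, rows * cols)  # 3 5 15
print(pad)                      # (0, 0, 6, 6)
```

For 180 x 320 images and 64 x 64 patches this gives 12 rows of vertical padding in total and none horizontally.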
So I pad 6 rows at the top and 6 rows at the bottom. The rest of the code is explained in the comments:
import torch
import torch.nn.functional as F

patch_size = (64, 64)
input = torch.randn(4, 180, 320)
# Pad 6 rows on top and 6 on the bottom (12 rows in total) so that the
# frame becomes 192 x 320 and fits 3 kernels of size 64 x 64 vertically
input = F.pad(input, pad=(0, 0, 6, 6))
print(input.shape)  # [4, 192, 320]
# add an extra dimension for the single channel
input = input.unsqueeze(1)
print(input.shape)  # [4, 1, 192, 320]
# unfold with both kernel size and stride equal to 64 x 64
unfold = torch.nn.Unfold(kernel_size=patch_size, stride=patch_size)
unfolded = unfold(input)
print(unfolded.shape)  # [4, 4096, 15]
# 4 = batch size
# 4096 = 64 x 64 elements in one patch
# 15 = number of 64 x 64 patches that fit in a 192 x 320 frame
# move the patch index in front of the pixel index; reshaping
# [4, 4096, 15] directly would mix pixels from different patches
unfolded = unfolded.permute(0, 2, 1)  # [4, 15, 4096]
# reshape the result to the desired size
# size(0) = 4 = batch size
# -1 infers p, the number of patches (15 by the calculation above)
# *patch_size = 64 x 64
unfolded = unfolded.reshape(unfolded.size(0), -1, *patch_size)
print(unfolded.shape)  # [4, 15, 64, 64]
This prints the expected shapes:
torch.Size([4, 192, 320])
torch.Size([4, 1, 192, 320])
torch.Size([4, 4096, 15])
torch.Size([4, 15, 64, 64])
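Matching shapes do not by themselves show that each patch holds a contiguous block of the image, so it may be worth checking the contents too. `Unfold` returns a tensor of shape [N, C * h_p * w_p, L] with the patch index L in the last dimension, so that axis has to be moved in front of the pixel axis before reshaping. Below is a minimal sketch of such a check, assuming the input is already padded to a multiple of the patch size; the helper name `extract_patches` is hypothetical, and `F.unfold` is used as the functional form of `torch.nn.Unfold`.

```python
import torch
import torch.nn.functional as F


def extract_patches(images, patch_size):
    """Split [N, H, W] images into [N, p, h_p, w_p] non-overlapping patches.

    Assumes H and W are already multiples of the patch size.
    """
    h_p, w_p = patch_size
    # unfold expects [N, C, H, W], so add a singleton channel dimension
    cols = F.unfold(images.unsqueeze(1), kernel_size=patch_size, stride=patch_size)
    # cols is [N, h_p * w_p, p]; put the patch index before the pixel index
    return cols.permute(0, 2, 1).reshape(images.size(0), -1, h_p, w_p)


# Use distinct values so that misplaced pixels are detectable
images = torch.arange(4 * 192 * 320, dtype=torch.float32).reshape(4, 192, 320)
patches = extract_patches(images, (64, 64))
print(patches.shape)

# Patch k should equal the block at patch-row k // 5, patch-column k % 5
for k in range(patches.size(1)):
    r, c = divmod(k, 5)
    expected = images[:, r * 64:(r + 1) * 64, c * 64:(c + 1) * 64]
    assert torch.equal(patches[:, k], expected)
```

The patches come out in row-major order: the first 5 patches are the top row of the image from left to right, the next 5 the middle row, and so on.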