PixelRNN PyTorch implementation

Posted 2024-03-28 11:16:34


I am trying to implement PixelRNN in PyTorch, but I can't seem to find any documentation on it. The main components of PixelRNN are the Row LSTM and the Diagonal BiLSTM, so I am looking for some code for these algorithms to better understand what they are doing. Specifically, I am confused about how these algorithms compute one row and one diagonal at a time, respectively. Any help would be greatly appreciated.


Tags: code, documentation, algorithms, pixel, bidirectional, pytorch, lstm, rnn
1 Answer

User
#1 · Posted 2024-03-28 11:16:34

Summary

Here is a partial, in-progress implementation (in TensorFlow):

https://github.com/carpedm20/pixel-rnn-tensorflow

And here is a write-up of Google DeepMind's PixelRNN paper:

https://towardsdatascience.com/summary-of-pixelrnn-by-google-deepmind-7-min-read-938d9871d6d9


Row LSTM

From the linked DeepMind write-up:

The hidden state of a pixel, red in the image below, is based on the "memory" of the triangular three pixels before it. Because they are in a row, we can compute them in parallel, speeding up computation. We sacrifice some context information (using more history or memory) in exchange for this parallel computation and faster training.

[Image: a pixel (red) and the triangular context of three pixels above it that feeds its hidden state]

The actual implementation relies on several other optimizations and is quite involved. From the original paper:

The computation proceeds as follows. An LSTM layer has an input-to-state component and a recurrent state-to-state component that together determine the four gates inside the LSTM core. To enhance parallelization in the Row LSTM the input-to-state component is first computed for the entire two-dimensional input map; for this a k × 1 convolution is used to follow the row-wise orientation of the LSTM itself. The convolution is masked to include only the valid context (see Section 3.4) and produces a tensor of size 4h × n × n, representing the four gate vectors for each position in the input map, where h is the number of output feature maps. To compute one step of the state-to-state component of the LSTM layer, one is given the previous hidden and cell states h_{i−1} and c_{i−1}, each of size h × n × 1. The new hidden and cell states h_i, c_i are obtained as follows:

[o_i, f_i, i_i, g_i] = σ(K^{ss} ⊛ h_{i−1} + K^{is} ⊛ x_i)
c_i = f_i ⊙ c_{i−1} + i_i ⊙ g_i
h_i = o_i ⊙ tanh(c_i)

where x_i of size h × n × 1 is row i of the input map, ⊛ represents the convolution operation and ⊙ the elementwise multiplication. The weights K^{ss} and K^{is} are the kernel weights for the state-to-state and the input-to-state components, where the latter is precomputed as described above. In the case of the output, forget and input gates o_i, f_i and i_i, the activation σ is the logistic sigmoid function, whereas for the content gate g_i, σ is the tanh function. Each step computes at once the new state for an entire row of the input map.
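Since you asked for PyTorch, here is a minimal sketch of the recurrence above. The class name RowLSTM, the choice of k = 3, and the hidden size are my own assumptions, and the causal masking of the input-to-state convolution (the paper's Section 3.4) is omitted for brevity; treat this as an illustration of the computation pattern, not a faithful reimplementation of the paper.

```python
import torch
import torch.nn as nn

class RowLSTM(nn.Module):
    def __init__(self, in_channels, hidden_channels, k=3):
        super().__init__()
        self.h = hidden_channels
        # Input-to-state: precomputed for the entire map with a 1D conv
        # along each row, producing the 4 gate pre-activations (4h maps).
        # NOTE: the paper additionally masks this convolution; omitted here.
        self.input_to_state = nn.Conv2d(
            in_channels, 4 * hidden_channels,
            kernel_size=(1, k), padding=(0, k // 2))
        # State-to-state: a 3-tap conv over the previous row's hidden state,
        # so position j sees positions j-1, j, j+1 of the row above --
        # this is what creates the triangular context in the figure.
        self.state_to_state = nn.Conv1d(
            hidden_channels, 4 * hidden_channels,
            kernel_size=k, padding=k // 2)

    def forward(self, x):
        # x: (batch, in_channels, n, n)
        b, _, n_rows, n_cols = x.shape
        gates_is = self.input_to_state(x)        # (b, 4h, n, n), all at once
        h_prev = x.new_zeros(b, self.h, n_cols)
        c_prev = x.new_zeros(b, self.h, n_cols)
        hs = []
        for i in range(n_rows):                  # sequential over rows only
            gates = gates_is[:, :, i] + self.state_to_state(h_prev)
            o, f, i_g, g = torch.chunk(gates, 4, dim=1)
            o, f, i_g = torch.sigmoid(o), torch.sigmoid(f), torch.sigmoid(i_g)
            g = torch.tanh(g)
            c_prev = f * c_prev + i_g * g        # c_i = f ⊙ c_{i-1} + i ⊙ g
            h_prev = o * torch.tanh(c_prev)      # h_i = o ⊙ tanh(c_i)
            hs.append(h_prev)
        return torch.stack(hs, dim=2)            # (b, h, n, n)
```

Calling RowLSTM(3, 32)(torch.randn(1, 3, 8, 8)) returns a (1, 32, 8, 8) map. Note that the loop runs over rows only, while every pixel within a row is computed in parallel, which is exactly the trade-off the quote above describes.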

The Diagonal BiLSTM was developed to exploit the speedup of parallelization without sacrificing as much context information. A node in the Diagonal BiLSTM looks to its left and above it; since those nodes in turn looked to their left and above, the conditional probability of a given node depends, in a sense, on all of its ancestors. Otherwise, the architecture is very similar. From the DeepMind write-up:

[Image: the diagonal context of a pixel in the Diagonal BiLSTM]
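The part that usually confuses people is how a "diagonal" can be processed in one step at all. The trick in the paper is to skew the input map: row i is shifted right by i positions, which turns every diagonal of the original map into a column of an n × (2n − 1) map. The LSTM then sweeps over the skewed map column by column (one diagonal per step, all positions within the diagonal in parallel), and the result is unskewed afterwards. Here is a minimal sketch of that reindexing, with my own function names skew and unskew:

```python
import torch

def skew(x):
    # x: (batch, channels, n, n). Shift row i right by i positions so the
    # diagonals of the original map become columns of an n x (2n - 1) map.
    b, c, n, _ = x.shape
    out = x.new_zeros(b, c, n, 2 * n - 1)
    for i in range(n):
        out[:, :, i, i:i + n] = x[:, :, i]
    return out

def unskew(x):
    # Inverse of skew: read row i back out of columns i .. i + n - 1.
    b, c, n, _ = x.shape
    out = x.new_zeros(b, c, n, n)
    for i in range(n):
        out[:, :, i] = x[:, :, i, i:i + n]
    return out

x = torch.arange(16.).reshape(1, 1, 4, 4)
print(skew(x)[0, 0])
# Each column of the skewed map now holds one diagonal of x, so a
# column-by-column LSTM visits pixels in a valid (left-and-above) order.
```

In the skewed coordinates, the state-to-state convolution only needs a 2 × 1 kernel: the previous column of the skewed map already contains both the pixel above and the pixel to the left of each position in the original map. Running one such sweep left-to-right and another right-to-left (and combining them) gives the "bidirectional" part of the name.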
