Why do the trainable variables in the actor have no gradients?

Posted 2024-06-16 10:21:30


I implemented DDPG myself in TensorFlow and ran into a mysterious bug. I have spent several days thinking about it and still have no answer.

I define the actor loss as

actor_loss = - tf.reduce_mean(self.critic_with_actor.Q)

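where self.critic_with_actor.Q is the output of the critic, which takes the action produced by the actor as one of its inputs. The problem is that the actor somehow receives no gradients. Here is a snapshot of the relevant TensorBoard info: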

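where Tanh is the actor's output tensor, i.e. the action chosen by the actor. BiasAdd is the input tensor to Tanh, and the other tensors are just the trainable variables in the actor. As you can see, Tanh has a gradient, but the others do not. Here is the architecture of my main actor-critic network: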

where critic_1 corresponds to self.critic_with_actor in the code, and it shares variables with critic.
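For reference, the gradient path that the loss above is expected to produce can be sketched without TensorFlow. This is only a toy illustration under stated assumptions, not the code from the question: the one-parameter-pair actor `a = tanh(w * s + b)`, the linear critic `Q(s, a) = q_s * s + q_a * a`, and every name and number below are hypothetical. It shows that when the action really feeds the critic, the chain rule gives the actor's variables a nonzero gradient, and it cross-checks that against finite differences.

```python
import math

# Hypothetical toy parameters; none of these names come from the original code.
ACTOR = {"w": 0.5, "b": -0.1}       # actor:  a = tanh(w * s + b)
CRITIC = {"q_s": 0.3, "q_a": 0.8}   # critic: Q(s, a) = q_s * s + q_a * a
STATES = [-1.0, 0.2, 1.5]           # a small batch of scalar states


def actor(s, params):
    """Action chosen by the toy actor for state s."""
    return math.tanh(params["w"] * s + params["b"])


def critic(s, a):
    """Q-value of the toy critic for state s and action a."""
    return CRITIC["q_s"] * s + CRITIC["q_a"] * a


def actor_loss(params):
    """Same form as the loss in the question: -mean(Q(s, actor(s)))."""
    return -sum(critic(s, actor(s, params)) for s in STATES) / len(STATES)


def analytic_grads(params):
    """Chain rule: dL/dtheta = -mean(dQ/da * da/dpre * dpre/dtheta)."""
    grad_w = 0.0
    grad_b = 0.0
    for s in STATES:
        a = actor(s, params)
        dq_da = CRITIC["q_a"]      # gradient arriving at the Tanh output
        da_dpre = 1.0 - a * a      # derivative of tanh at the BiasAdd output
        grad_w -= dq_da * da_dpre * s
        grad_b -= dq_da * da_dpre
    n = len(STATES)
    return {"w": grad_w / n, "b": grad_b / n}


def numeric_grads(params, eps=1e-6):
    """Central finite differences, used only to cross-check the chain rule."""
    grads = {}
    for name in params:
        up = dict(params, **{name: params[name] + eps})
        down = dict(params, **{name: params[name] - eps})
        grads[name] = (actor_loss(up) - actor_loss(down)) / (2 * eps)
    return grads


if __name__ == "__main__":
    print("analytic:", analytic_grads(ACTOR))
    print("numeric: ", numeric_grads(ACTOR))
```

In this sketch the gradient reaches `w` and `b` only because it passes from the Tanh output back through the pre-activation. It does not diagnose the bug in the question; it only makes explicit which path the missing gradients would normally travel.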

