<p>Looking at the <a href="http://devdocs.io/tensorflow~python/tf/global_variables_initializer" rel="nofollow noreferrer">documentation</a>, <code>init = tf.global_variables_initializer()</code> is the same as <code>init = tf.variables_initializer(tf.global_variables())</code>.</p>
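<p>For reference, a minimal sketch of that equivalence (the full example at the bottom uses the longer spelling):</p>
<pre><code>init = tf.global_variables_initializer()
# per the docs, the line above is shorthand for:
init = tf.variables_initializer(tf.global_variables())
</code></pre>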
<p><a href="http://devdocs.io/tensorflow~python/tf/train/adamoptimizer" rel="nofollow noreferrer">^{<cd3>}</a>需要初始化一些内部变量(平均值统计等)</p>
<pre><code><tf.Variable 'beta1_power:0' shape=() dtype=float32_ref>
<tf.Variable 'beta2_power:0' shape=() dtype=float32_ref>
<tf.Variable 'x/Adam:0' shape=(2, 1) dtype=float32_ref> # 1st moment vector
<tf.Variable 'x/Adam_1:0' shape=(2, 1) dtype=float32_ref> # 2nd moment vector
</code></pre>
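<p>If you want to inspect the per-variable state Adam creates, here is a small sketch using the TF1 slot API (assuming the <code>x</code> and <code>cost_function</code> from the example further below; <code>beta1_power</code>/<code>beta2_power</code> are separate variables, not slots):</p>
<pre><code>opt = tf.train.AdamOptimizer(0.1)
optimize_op = opt.minimize(cost_function)
print(opt.get_slot_names())  # ['m', 'v'] -- the two moment vectors
print(opt.get_slot(x, 'm'))  # <tf.Variable 'x/Adam:0' shape=(2, 1) dtype=float32_ref>
print(opt.get_slot(x, 'v'))  # <tf.Variable 'x/Adam_1:0' shape=(2, 1) dtype=float32_ref>
</code></pre>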
<p><a href="http://devdocs.io/tensorflow~python/tf/train/adamoptimizer" rel="nofollow noreferrer">documentation</a>告诉您如何应用更新。在</p>
<p>In contrast, the vanilla gradient descent optimizer <a href="http://devdocs.io/tensorflow~python/tf/train/gradientdescentoptimizer" rel="nofollow noreferrer"><code>GradientDescentOptimizer</code></a> does not rely on any variables. That is the difference.
Now, before <a href="http://devdocs.io/tensorflow~python/tf/train/adamoptimizer" rel="nofollow noreferrer"><code>AdamOptimizer</code></a> can use its variables, these variables need to be initialized at some point.</p>
<p>要创建初始化所有所需变量的操作<code>init</code>,此操作<code>init</code>需要知道运行程序所需的变量。因此,它需要放在</em><a href="http://devdocs.io/tensorflow~python/tf/train/adamoptimizer" rel="nofollow noreferrer">^{<cd3>}</a>之后。在</p>
<p>If you place <code>init = tf.global_variables_initializer()</code> <em>before</em> <a href="http://devdocs.io/tensorflow~python/tf/train/adamoptimizer" rel="nofollow noreferrer"><code>tf.train.AdamOptimizer</code></a>, as in</p>
<pre><code># ...
init = tf.global_variables_initializer()
# ...
... = tf.train.AdamOptimizer(0.1).minimize(cost_function)
</code></pre>
<p>you will get</p>
<pre><code>Attempting to use uninitialized value beta1_power
</code></pre>
<p>This tells you that <a href="http://devdocs.io/tensorflow~python/tf/train/adamoptimizer" rel="nofollow noreferrer"><code>AdamOptimizer</code></a> tried to access <code><tf.Variable 'beta1_power:0' shape=() dtype=float32_ref></code>, which has not been initialized yet.</p>
<p>Hence,</p>
<pre><code># ...
... = tf.train.AdamOptimizer(0.1).minimize(cost_function)
# ...
init = tf.global_variables_initializer()
</code></pre>
<p>is the only correct order. You can check which variables exist at any given point by placing</p>
<pre><code>for variable in tf.global_variables():
    print(variable)
</code></pre>
<p>at that spot in your source code.</p>
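<p>At runtime you can also ask the session directly which variables are still uninitialized; a minimal sketch using <code>tf.report_uninitialized_variables()</code>:</p>
<pre><code>with tf.Session() as sess:
    # prints the names of all variables that have not been initialized yet
    print(sess.run(tf.report_uninitialized_variables()))
</code></pre>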
<p>Consider the example of minimizing the quadratic function <code>0.5*x'Ax - b'x + c</code>. In TensorFlow:</p>
<pre><code>import tensorflow as tf
import numpy as np

x = tf.Variable(np.random.rand(2, 1), dtype=tf.float32, name="x")
# b and A are constants -- we make clear that we are not going to optimize them
b = tf.constant([[5], [6]], dtype=tf.float32, name="b")
A = tf.constant([[9, 2], [2, 10]], dtype=tf.float32, name="A")
cost_function = 0.5 * tf.matmul(tf.matmul(tf.transpose(x), A), x) - tf.matmul(tf.transpose(b), x) + 42

for variable in tf.global_variables():
    print('before ADAM: global_variables_initializer would init {}'.format(variable))

optimize_op = tf.train.AdamOptimizer(0.1).minimize(cost_function)

for variable in tf.global_variables():
    print('after ADAM: global_variables_initializer would init {}'.format(variable))

init_op = tf.variables_initializer(tf.global_variables())
with tf.Session() as sess:
    sess.run(init_op)
    for i in range(5):
        loss, _ = sess.run([cost_function, optimize_op])
        print(loss)
</code></pre>
<p>The output is</p>
<pre><code>before ADAM: global_variables_initializer would init <tf.Variable 'x:0' shape=(2, 1) dtype=float32_ref>
after ADAM: global_variables_initializer would init <tf.Variable 'x:0' shape=(2, 1) dtype=float32_ref>
after ADAM: global_variables_initializer would init <tf.Variable 'beta1_power:0' shape=() dtype=float32_ref>
after ADAM: global_variables_initializer would init <tf.Variable 'beta2_power:0' shape=() dtype=float32_ref>
after ADAM: global_variables_initializer would init <tf.Variable 'x/Adam:0' shape=(2, 1) dtype=float32_ref>
after ADAM: global_variables_initializer would init <tf.Variable 'x/Adam_1:0' shape=(2, 1) dtype=float32_ref>
</code></pre>
<p>So when you place <code>tf.global_variables_initializer()</code> before defining ADAM via <code>tf.train.AdamOptimizer</code>, it cannot see the variables that ADAM requires. When using the <code>GradientDescentOptimizer</code> instead, the output is</p>
<pre><code>before ADAM: global_variables_initializer would init <tf.Variable 'x:0' shape=(2, 1) dtype=float32_ref>
after ADAM: global_variables_initializer would init <tf.Variable 'x:0' shape=(2, 1) dtype=float32_ref>
</code></pre>
<p>So nothing changes before and after defining the optimizer.</p>
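<p>You can verify this yourself with a small sketch: swap the optimizer line in the example above and re-run the inspection loop; the vanilla optimizer adds no variables of its own.</p>
<pre><code>optimize_op = tf.train.GradientDescentOptimizer(0.1).minimize(cost_function)
for variable in tf.global_variables():
    print(variable)  # only <tf.Variable 'x:0' shape=(2, 1) dtype=float32_ref>
</code></pre>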