Custom activation function may have gradient problems

Posted on 2024-04-26 18:16:14


I need a custom activation function with the following formula:

S_λ(f) = f,  if |f| > λ
S_λ(f) = 0,  otherwise

Here is how I implemented it with tensorflow:

import tensorflow as tf

sess = tf.Session()

def s_lamda_activation(f, lam):
    # keep f where f > lam: relu zeroes everything else, and f/positive
    # restores the original value; the resulting 0/0 NaNs are replaced with zeros
    positive = tf.nn.relu(f - lam)
    positive = positive * (f / positive)
    positive = tf.where(tf.is_nan(positive), tf.zeros_like(positive), positive)
    # same trick for the f < -lam side
    negative = tf.nn.relu((-f) - lam)
    negative = negative * (f / negative)
    negative = tf.where(tf.is_nan(negative), tf.zeros_like(negative), negative)
    return positive + negative


a = tf.constant([[1,2,3,4,5,10,-10,14,-20],[-100,-2,-3,-4,-5,-10,10,-14,-20]], dtype=tf.float32)
a = s_lamda_activation(a, 5)
print(sess.run(a))

Output:

[[   0.    0.    0.    0.    0.   10.  -10.   14.  -20.]
 [-100.    0.    0.    0.    0.  -10.   10.  -14.  -20.]]

However, tf.where seems to cause gradient problems, and with this implementation the loss does not decrease.

So I removed the tf.where and changed the code to:

import tensorflow as tf

sess = tf.Session()

def s_lamda_activation(f, lam):
    # no NaNs here, but every surviving value is shrunk toward zero by lam
    positive = tf.nn.relu(f - lam)
    negative = tf.nn.relu((-f) - lam)
    return positive - negative


a = tf.constant([[1,2,3,4,5,10,-10,14,-20],[-100,-2,-3,-4,-5,-10,10,-14,-20]], dtype=tf.float32)
a = s_lamda_activation(a, 5)
print(sess.run(a))

Output:

[[  0.   0.   0.   0.   0.   5.  -5.   9. -15.]
 [-95.   0.   0.   0.   0.  -5.   5.  -9. -15.]]

This implementation works fine and the loss decreases as expected, but it is not the same activation function as the one described above: every value that passes the threshold is shifted toward zero by λ instead of being kept unchanged (see the closed-form comparison below). Any suggestions on how to implement the original function correctly and efficiently? And does tf.where really cause gradient problems?
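
For reference, this is my reading of the two versions written out as formulas:

second implementation: relu(f - λ) - relu(-f - λ) = sign(f) * max(|f| - λ, 0)   (soft thresholding)
original target:       S_λ(f) = f if |f| > λ, else 0                            (hard thresholding)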

Thank you very much for your help!


1 Answer

The problem is that you are not using tf.where() correctly to implement this activation function. You can inspect the gradients with tf.gradients, as shown below:

import tensorflow as tf

...

result = s_lamda_activation(a, 5)
grad = tf.gradients(result,a)

with tf.Session() as sess:
    print(sess.run(result))
    print(sess.run(grad))

Output:

[[   0.    0.    0.    0.    0.   10.  -10.   14.  -20.]
 [-100.    0.    0.    0.    0.  -10.   10.  -14.  -20.]]
[array([[nan, nan, nan, nan, nan, nan, nan, nan, nan],
       [nan, nan, nan, nan, nan, nan, nan, nan, nan]], dtype=float32)]
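
The root cause is the 0/0 in f/positive: tf.where only masks the NaN in the forward value, while backprop still multiplies the upstream gradient by the NaN coming out of the division, and 0 * nan is still nan. Here is a minimal toy sketch of that pattern (not your original code, just the isolated effect):

import tensorflow as tf

x = tf.constant([0.0, 2.0])
y = x * (x / x)                                   # 0/0 -> NaN at x = 0
z = tf.where(tf.is_nan(y), tf.zeros_like(y), y)   # forward value looks fine
grad = tf.gradients(z, x)

with tf.Session() as sess:
    print(sess.run(z))     # [0. 2.]
    print(sess.run(grad))  # [array([nan, 1.], dtype=float32)] -- the NaN survives in the gradient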

The correct usage avoids creating any NaN in the first place and simply selects between f and zeros:

import tensorflow as tf

def s_lamda_activation(f, lam):
    # both branches are NaN-free, so the gradient is well defined everywhere
    return tf.where(tf.greater(tf.abs(f), lam), f, tf.zeros_like(f))

a = tf.constant([[1,2,3,4,5,10,-10,14,-20],[-100,-2,-3,-4,-5,-10,10,-14,-20]], dtype=tf.float32)

result = s_lamda_activation(a, 5)
grad = tf.gradients(result,a)

with tf.Session() as sess:
    print(sess.run(result))
    print(sess.run(grad))

[[   0.    0.    0.    0.    0.   10.  -10.   14.  -20.]
 [-100.    0.    0.    0.    0.  -10.   10.  -14.  -20.]]
[array([[0., 0., 0., 0., 0., 1., 1., 1., 1.],
       [1., 0., 0., 0., 0., 1., 1., 1., 1.]], dtype=float32)]
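
For what it's worth, if you are on TF 2.x (eager execution) rather than the 1.x Session API used above, a minimal equivalent sketch with tf.GradientTape might look like this (same activation, just a different API; not part of the original answer):

import tensorflow as tf  # assuming TF 2.x with eager execution

def s_lamda_activation(f, lam):
    # pass f through where |f| > lam, zero elsewhere; both branches are NaN-free
    return tf.where(tf.abs(f) > lam, f, tf.zeros_like(f))

a = tf.constant([[1., 2., 3., 4., 5., 10., -10., 14., -20.],
                 [-100., -2., -3., -4., -5., -10., 10., -14., -20.]])

with tf.GradientTape() as tape:
    tape.watch(a)                        # a is a constant, so watch it explicitly
    result = s_lamda_activation(a, 5.0)

print(result.numpy())
print(tape.gradient(result, a).numpy())  # 1 where |a| > 5, 0 elsewhere -- no NaNs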
