How to implement the Softmax function in Python

3 answers

User

Answer 1 · edited 2024-05-14 23:42:30

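(Hmm... there is a lot of confusion here, in both the question and the answers...)

First, the two solutions (yours and the suggested one) are not equivalent; they only happen to be equivalent in the special case of 1-D score arrays. You would have discovered this had you also tried the 2-D score array from the example provided with the Udacity quiz.

Results-wise, the only actual difference between the two solutions is the axis=0 argument. To see that this is the case, let's compare your solution (your_softmax) with one whose only difference is the axis argument: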

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# correct solution:
def softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

As I said, for a 1-D score array the results are indeed identical:

scores = [3.0, 1.0, 0.2]
print(your_softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
print(softmax(scores))
# [ 0.8360188   0.11314284  0.05083836]
your_softmax(scores) == softmax(scores)
# array([ True,  True,  True], dtype=bool)

However, here are the results for the 2-D score array given as a test example in the Udacity quiz:

scores2D = np.array([[1, 2, 3, 6],
                     [2, 4, 5, 6],
                     [3, 8, 7, 6]])

print(your_softmax(scores2D))
# [[  4.89907947e-04   1.33170787e-03   3.61995731e-03   7.27087861e-02]
#  [  1.33170787e-03   9.84006416e-03   2.67480676e-02   7.27087861e-02]
#  [  3.61995731e-03   5.37249300e-01   1.97642972e-01   7.27087861e-02]]

print(softmax(scores2D))
# [[ 0.09003057  0.00242826  0.01587624  0.33333333]
#  [ 0.24472847  0.01794253  0.11731043  0.33333333]
#  [ 0.66524096  0.97962921  0.86681333  0.33333333]]

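The results are different: the second one is identical to the result expected in the Udacity quiz, where every column sums to 1, which is not the case for the first (wrong) result.

So all the fuss was actually about an implementation detail: the axis argument. According to the numpy.sum documentation: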

The default, axis=None, will sum all of the elements of the input array

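Here, by contrast, we want to sum over the rows (that is, down each column), hence axis=0. For a 1-D array, the sum of the (only) row and the sum of all the elements happen to be identical, which is why you get identical results in that case.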
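To make the effect of the axis argument concrete, here is a minimal sketch on a small array. The array `a` and the variable names are made up for illustration and are not part of the quiz.

```python
import numpy as np

# Small 2-D array: 2 rows, 3 columns (values chosen only for illustration).
a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = a.sum()              # axis=None: a single scalar over all elements
per_column = a.sum(axis=0)   # collapses the row index: one sum per column
per_row = a.sum(axis=1)      # collapses the column index: one sum per row

print(total)       # 21
print(per_column)  # [5 7 9]
print(per_row)     # [ 6 15]
```

With axis=0 the division in softmax normalizes each column separately, which is why the columns of the second result sum to 1.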

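The axis issue aside, your implementation (i.e. your choice to subtract the max first) is actually better than the suggested solution! In fact, it is the recommended way of implementing the softmax function. The justification is numerical stability, which some of the other answers also point out.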
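The stability argument can be illustrated with a small sketch. The helper names `naive_softmax` and `shifted_softmax` and the input `big` are assumptions introduced here for comparison; they do not come from the quiz or the original answers.

```python
import numpy as np

def naive_softmax(x):
    """Softmax without the max shift (hypothetical helper, for comparison only)."""
    e_x = np.exp(x)
    return e_x / e_x.sum()

def shifted_softmax(x):
    """Softmax with the max subtracted first, as in the question's version."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

big = np.array([1000.0, 1001.0, 1002.0])

# np.exp(1000.0) overflows to inf, so the naive version computes inf / inf = nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_softmax(big)

# After the shift the largest exponent is exp(0) = 1, so nothing overflows.
stable = shifted_softmax(big)

print(np.isnan(naive).all())  # True
print(stable)                 # roughly [0.090, 0.245, 0.665]
```

The shifted version returns the same values as softmax of [1, 2, 3], since only the differences between scores matter.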

User

Answer 2 · edited 2024-05-14 23:42:30

Both are correct, but yours is preferable from the point of view of numerical stability.

You start with

e ^ (x - max(x)) / sum(e ^ (x - max(x)))

By using the fact that a^(b - c) = (a^b)/(a^c), we get

= e ^ x / (e ^ max(x) * sum(e ^ x / e ^ max(x)))

= e ^ x / sum(e ^ x)

which is what the other answer says. You could replace max(x) with any constant and it would cancel out in the same way.
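The cancellation can also be checked numerically. This is only a sketch: `softmax_shifted_by` is a hypothetical helper, and the constant -1.7 is an arbitrary choice.

```python
import numpy as np

def softmax_shifted_by(x, c):
    """Softmax computed after subtracting an arbitrary constant c (hypothetical helper)."""
    e_x = np.exp(x - c)
    return e_x / e_x.sum()

x = np.array([3.0, 1.0, 0.2])

reference = softmax_shifted_by(x, 0.0)        # plain e^x / sum(e^x)
with_max = softmax_shifted_by(x, np.max(x))   # the max-shifted form
with_other = softmax_shifted_by(x, -1.7)      # any other constant cancels as well

print(np.allclose(reference, with_max))    # True
print(np.allclose(reference, with_other))  # True
```

The results agree up to floating-point rounding; the max is simply the choice of constant that keeps every exponent at or below zero.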

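User

Answer 3 · edited 2024-05-14 23:42:30

So, this is really a comment on desertnaut's answer, but I can't comment on it yet because of my reputation. As he pointed out, your version is only correct if your input consists of a single sample. If your input consists of several samples, it is wrong. However, desertnaut's solution is also wrong. The problem is that he uses a 1-dimensional input in one case and a 2-dimensional input in the other. Let me show you.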

import numpy as np

# your solution:
def your_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

# desertnaut solution (copied from his answer): 
def desertnaut_softmax(x):
    """Compute softmax values for each sets of scores in x."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0) # only difference

# my (correct) solution:
def softmax(z):
    assert len(z.shape) == 2
    s = np.max(z, axis=1)
    s = s[:, np.newaxis] # necessary step to do broadcasting
    e_x = np.exp(z - s)
    div = np.sum(e_x, axis=1)
    div = div[:, np.newaxis] # ditto
    return e_x / div

Let's take desertnaut's example:

x1 = np.array([[1, 2, 3, 6]]) # notice that we put the data into 2 dimensions(!)

This is the output:

your_softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

desertnaut_softmax(x1)
array([[ 1.,  1.,  1.,  1.]])

softmax(x1)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

You can see that desertnaut's version fails in this situation. (It would not fail if the input were one-dimensional, like np.array([1, 2, 3, 6]).)
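Why the all-ones result appears can be seen by looking at the intermediate sum. The following is a small sketch that reuses the x1 input from above; the variable name `col_sums` is introduced here for illustration.

```python
import numpy as np

x1 = np.array([[1, 2, 3, 6]])   # shape (1, 4): one sample, four scores
e_x = np.exp(x1 - np.max(x1))

# With a single row, each "column sum" is just the one entry in that column.
col_sums = e_x.sum(axis=0)      # shape (4,)

print(np.allclose(col_sums, e_x[0]))  # True
print(e_x / col_sums)                 # every entry divided by itself: all ones
```

So with axis=0 a batch containing one sample is normalized across samples rather than across the scores of that sample.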

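Now let's use 3 samples, since that is the reason we use a 2-dimensional input in the first place. The following x2 is not the same as the one in desertnaut's example.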

x2 = np.array([[1, 2, 3, 6],  # sample 1
               [2, 4, 5, 6],  # sample 2
               [1, 2, 3, 6]]) # sample 1 again(!)

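This input consists of a batch with 3 samples, but sample one and sample three are identical. We now expect 3 rows of softmax activations, where the first row should be the same as the third and also the same as our activation of x1!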

your_softmax(x2)
array([[ 0.00183535,  0.00498899,  0.01356148,  0.27238963],
       [ 0.00498899,  0.03686393,  0.10020655,  0.27238963],
       [ 0.00183535,  0.00498899,  0.01356148,  0.27238963]])


desertnaut_softmax(x2)
array([[ 0.21194156,  0.10650698,  0.10650698,  0.33333333],
       [ 0.57611688,  0.78698604,  0.78698604,  0.33333333],
       [ 0.21194156,  0.10650698,  0.10650698,  0.33333333]])

softmax(x2)
array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047],
       [ 0.01203764,  0.08894682,  0.24178252,  0.65723302],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

I hope you can see that this is only the case with my solution.

softmax(x1) == softmax(x2)[0]
array([[ True,  True,  True,  True]], dtype=bool)

softmax(x1) == softmax(x2)[2]
array([[ True,  True,  True,  True]], dtype=bool)
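As a possible variation (a sketch, not part of any of the answers), the two np.newaxis steps can be replaced with keepdims=True, and using the last axis lets the same function accept both a single 1-D sample and a 2-D batch. The name `softmax_last_axis` is hypothetical.

```python
import numpy as np

def softmax_last_axis(z):
    """Softmax over the last axis; accepts a 1-D sample or a 2-D batch of samples."""
    z = np.asarray(z, dtype=float)
    # keepdims=True keeps the reduced axis with length 1, so broadcasting works as is.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    e_z = np.exp(shifted)
    return e_z / np.sum(e_z, axis=-1, keepdims=True)

single = softmax_last_axis([1, 2, 3, 6])
batch = softmax_last_axis([[1, 2, 3, 6],
                           [2, 4, 5, 6],
                           [1, 2, 3, 6]])

print(np.allclose(batch[0], single))      # True
print(np.allclose(batch[2], single))      # True
print(np.allclose(batch.sum(axis=1), 1))  # True
```

Here each row is treated as one sample, which matches the convention of the softmax(z) function in this answer.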

Additionally, here are the results of TensorFlow's softmax implementation:

import tensorflow as tf
import numpy as np

# Note: this uses the TensorFlow 1.x API (placeholders and sessions).
batch = np.asarray([[1,2,3,6],[2,4,5,6],[1,2,3,6]])
x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.nn.softmax(x)  # softmax over the last dimension, i.e. per row
sess = tf.Session()
sess.run(y, feed_dict={x: batch})

The result is:

array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037045],
       [ 0.01203764,  0.08894681,  0.24178252,  0.657233  ],
       [ 0.00626879,  0.01704033,  0.04632042,  0.93037045]], dtype=float32)
