Softmax function in Python

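From Udacity's deep learning class, the softmax of y_i is simply the exponential of y_i divided by the sum of the exponentials of the whole Y vector:

$$S(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$$

where S(y_i) is the softmax function of y_i, e is the exponential, and j runs over the columns of the input vector Y.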

I have tried the following:

    import numpy as np

    def softmax(x):
        """Compute softmax values for each sets of scores in x."""
        e_x = np.exp(x - np.max(x))
        return e_x / e_x.sum()

    scores = [3.0, 1.0, 0.2]
    print(softmax(scores))

which returns:

    [ 0.8360188   0.11314284  0.05083836]

But the suggested solution was:

    def softmax(x):
        """Compute softmax values for each sets of scores in x."""
        return np.exp(x) / np.sum(np.exp(x), axis=0)

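It produces the same output as the first implementation, even though the first implementation explicitly subtracts the max from each column and then divides by the sum.

Can someone show mathematically why? Is one correct and the other one wrong?

Are the implementations similar in terms of code and time complexity? Which one is more efficient?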

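They are both correct, but yours is preferred from the point of view of numerical stability.

You start with

    e ^ (x - max(x)) / sum(e ^ (x - max(x)))

By using the fact that a ^ (b - c) = (a ^ b) / (a ^ c), we have

    = e ^ x / (e ^ max(x) * sum(e ^ x / e ^ max(x)))

    = e ^ x / sum(e ^ x)

which is what the other answer says. You could replace max(x) with any constant and it would cancel out.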
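The cancellation can also be checked numerically. The following is only a minimal sketch: `softmax_shifted` and its parameter `c` are hypothetical names introduced for this illustration, and the constant -7.5 is arbitrary.

```python
import numpy as np

def softmax_shifted(x, c):
    """Softmax of a 1-D array, computed after subtracting a constant c."""
    x = np.asarray(x, dtype=float)
    e_x = np.exp(x - c)
    return e_x / e_x.sum()

scores = [3.0, 1.0, 0.2]

reference = softmax_shifted(scores, 0.0)          # no shift: exp(x) / sum(exp(x))
with_max = softmax_shifted(scores, max(scores))   # shift by max(x)
with_other = softmax_shifted(scores, -7.5)        # shift by an arbitrary constant

# All three agree up to floating-point rounding.
print(np.allclose(reference, with_max))
print(np.allclose(reference, with_other))
```

Any shift gives the same probabilities for inputs of this size; max(x) is chosen because it keeps every exponent at or below zero.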

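(Well... much confusion here, both in the question and in the answers.)

To start with, the two solutions (i.e. yours and the suggested one) are not equivalent. They happen to be equivalent only in the special case of 1-D score arrays. You would have discovered this if you had also tried the 2-D score array from the example provided in the Udacity quiz.

Results-wise, the only actual difference between the two solutions is the axis=0 argument. To see that this is the case, let's try your solution (your_softmax) and one where the only difference is the axis argument:

    import numpy as np

    # your solution:
    def your_softmax(x):
        """Compute softmax values for each sets of scores in x."""
        e_x = np.exp(x - np.max(x))
        return e_x / e_x.sum()

    # correct solution:
    def softmax(x):
        """Compute softmax values for each sets of scores in x."""
        e_x = np.exp(x - np.max(x))
        return e_x / e_x.sum(axis=0)  # only difference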

As I said, for a 1-D score array the results are indeed identical:

    scores = [3.0, 1.0, 0.2]
    print(your_softmax(scores))
    # [ 0.8360188   0.11314284  0.05083836]
    print(softmax(scores))
    # [ 0.8360188   0.11314284  0.05083836]
    your_softmax(scores) == softmax(scores)
    # array([ True,  True,  True], dtype=bool)

Nevertheless, here are the results for the 2-D score array given as a test example in the Udacity quiz:

    scores2D = np.array([[1, 2, 3, 6],
                         [2, 4, 5, 6],
                         [3, 8, 7, 6]])

    print(your_softmax(scores2D))
    # [[  4.89907947e-04   1.33170787e-03   3.61995731e-03   7.27087861e-02]
    #  [  1.33170787e-03   9.84006416e-03   2.67480676e-02   7.27087861e-02]
    #  [  3.61995731e-03   5.37249300e-01   1.97642972e-01   7.27087861e-02]]

    print(softmax(scores2D))
    # [[ 0.09003057  0.00242826  0.01587624  0.33333333]
    #  [ 0.24472847  0.01794253  0.11731043  0.33333333]
    #  [ 0.66524096  0.97962921  0.86681333  0.33333333]]

The results are different: the second one is identical to the result expected in the Udacity quiz, where every column sums to 1, which is not the case for the first (wrong) result.

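So, all the fuss was actually about an implementation detail: the axis argument. According to the numpy.sum documentation:

The default, axis=None, will sum all of the elements of the input array

whereas here we want to sum over the rows, i.e. down each column, hence axis=0. For a 1-D array, the sum along the only axis and the sum of all the elements happen to be identical, hence your identical results in that case.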
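A small sketch of how the axis argument of np.sum behaves may help here. The array `a` is made up for illustration and is not taken from the quiz.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = np.sum(a)             # axis=None: a single scalar over all elements
col_sums = np.sum(a, axis=0)  # collapses axis 0 (the rows): one sum per column
row_sums = np.sum(a, axis=1)  # collapses axis 1 (the columns): one sum per row

print(total)     # 21
print(col_sums)  # column sums: 5, 7, 9
print(row_sums)  # row sums: 6, 15
```

Dividing by `col_sums` therefore normalizes each column separately, while dividing by `total` normalizes the whole array at once.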

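The axis issue aside, your implementation (i.e. your choice to subtract the max first) is actually better than the suggested solution! In fact, it is the recommended way of implementing the softmax function. The justification is numerical stability, as some of the other answers also point out.

I would say that while both are correct mathematically, implementation-wise the first one is better. When computing softmax, the intermediate values may become very large, and dividing two large numbers can be numerically unstable. These notes (from Stanford) mention a normalization trick, which is essentially what you are doing.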

So, this is really a comment on desertnaut's answer, but I can't comment on it yet because of my reputation. As he pointed out, your version is only correct if your input consists of a single sample. If your input consists of several samples, it is wrong. However, desertnaut's solution is also wrong. The problem is that in one case he takes a 1-dimensional input and in the other a 2-dimensional input. Let me show you:

    import numpy as np

    # your solution:
    def your_softmax(x):
        """Compute softmax values for each sets of scores in x."""
        e_x = np.exp(x - np.max(x))
        return e_x / e_x.sum()

    # desertnaut solution (copied from his answer):
    def desertnaut_softmax(x):
        """Compute softmax values for each sets of scores in x."""
        e_x = np.exp(x - np.max(x))
        return e_x / e_x.sum(axis=0)  # only difference

    # my (correct) solution:
    def softmax(z):
        assert len(z.shape) == 2
        s = np.max(z, axis=1)
        s = s[:, np.newaxis]  # necessary step to do broadcasting
        e_x = np.exp(z - s)
        div = np.sum(e_x, axis=1)
        div = div[:, np.newaxis]  # ditto
        return e_x / div

Let's take desertnaut's example:

    x1 = np.array([[1, 2, 3, 6]])  # notice that we put the data into 2 dimensions(!)

This is the output:

    your_softmax(x1)
    array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

    desertnaut_softmax(x1)
    array([[ 1.,  1.,  1.,  1.]])

    softmax(x1)
    array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

You can see that desertnaut's version fails in this situation. (It would not if the input were just one-dimensional, like np.array([1, 2, 3, 6]).)

Let's now use 3 samples, since that is the reason why we use a 2-dimensional input. The following x2 is not the same as the one from desertnaut's example.

    x2 = np.array([[1, 2, 3, 6],   # sample 1
                   [2, 4, 5, 6],   # sample 2
                   [1, 2, 3, 6]])  # sample 1 again(!)

This input consists of a batch with 3 samples, but samples one and three are identical. We now expect 3 rows of softmax activations, where the first should be the same as the third and also the same as our activation of x1!

    your_softmax(x2)
    array([[ 0.00183535,  0.00498899,  0.01356148,  0.27238963],
           [ 0.00498899,  0.03686393,  0.10020655,  0.27238963],
           [ 0.00183535,  0.00498899,  0.01356148,  0.27238963]])

    desertnaut_softmax(x2)
    array([[ 0.21194156,  0.10650698,  0.10650698,  0.33333333],
           [ 0.57611688,  0.78698604,  0.78698604,  0.33333333],
           [ 0.21194156,  0.10650698,  0.10650698,  0.33333333]])

    softmax(x2)
    array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037047],
           [ 0.01203764,  0.08894682,  0.24178252,  0.65723302],
           [ 0.00626879,  0.01704033,  0.04632042,  0.93037047]])

I hope you can see that this is only the case with my solution.

    softmax(x1) == softmax(x2)[0]
    array([[ True,  True,  True,  True]], dtype=bool)

    softmax(x1) == softmax(x2)[2]
    array([[ True,  True,  True,  True]], dtype=bool)
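The per-row computation can also be written with keepdims=True, which keeps the reduced axis as size 1 and so replaces the explicit np.newaxis steps. This is only a sketch, assuming each row is one sample; `softmax_rows` is a hypothetical name, not a function from any of the answers.

```python
import numpy as np

def softmax_rows(z):
    """Softmax over the last axis; each row of a 2-D batch is one sample."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z, axis=-1, keepdims=True)  # subtract each row's own max
    e_z = np.exp(shifted)
    return e_z / np.sum(e_z, axis=-1, keepdims=True)  # normalize each row separately

x1 = np.array([[1, 2, 3, 6]])
x2 = np.array([[1, 2, 3, 6],
               [2, 4, 5, 6],
               [1, 2, 3, 6]])

out = softmax_rows(x2)
print(np.allclose(out.sum(axis=1), 1.0))         # every row sums to 1
print(np.allclose(out[0], out[2]))               # identical samples give identical rows
print(np.allclose(out[0], softmax_rows(x1)[0]))  # and match the single-sample result
```

Because the max and the sum are both taken per row, the result for one sample does not depend on which other samples share the batch.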

Additionally, here are the results of TensorFlow's softmax implementation:

    # TensorFlow 1.x API (placeholders and sessions)
    import tensorflow as tf
    import numpy as np

    batch = np.asarray([[1, 2, 3, 6], [2, 4, 5, 6], [1, 2, 3, 6]])
    x = tf.placeholder(tf.float32, shape=[None, 4])
    y = tf.nn.softmax(x)
    init = tf.initialize_all_variables()
    sess = tf.Session()
    sess.run(y, feed_dict={x: batch})

And the result:

    array([[ 0.00626879,  0.01704033,  0.04632042,  0.93037045],
           [ 0.01203764,  0.08894681,  0.24178252,  0.657233  ],
           [ 0.00626879,  0.01704033,  0.04632042,  0.93037045]], dtype=float32)

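Here you can find out why they used - max.

From there:

"When you're writing code for computing the Softmax function in practice, the intermediate terms may be very large due to the exponentials. Dividing large numbers can be numerically unstable, so it is important to use a normalization trick."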

I wrote a function applying the softmax over any axis:

    import numpy as np

    def softmax(X, theta=1.0, axis=None):
        """
        Compute the softmax of each element along an axis of X.

        Parameters
        ----------
        X: ND-Array. Probably should be floats.
        theta (optional): float parameter, used as a multiplier
            prior to exponentiation. Default = 1.0
        axis (optional): axis to compute values along. Default is the
            first non-singleton axis.

        Returns an array the same size as X. The result will sum to 1
        along the specified axis.
        """

        # make X at least 2d
        y = np.atleast_2d(X)

        # find axis
        if axis is None:
            axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

        # multiply y against the theta parameter
        y = y * float(theta)

        # subtract the max for numerical stability
        y = y - np.expand_dims(np.max(y, axis=axis), axis)

        # exponentiate y
        y = np.exp(y)

        # take the sum along the specified axis
        ax_sum = np.expand_dims(np.sum(y, axis=axis), axis)

        # finally: divide elementwise
        p = y / ax_sum

        # flatten if X was 1D
        if len(X.shape) == 1:
            p = p.flatten()

        return p

Subtracting the max, as described by other users, is the best practice. I wrote a detailed post about it here.

sklearn also offers an implementation of softmax:

    from sklearn.utils.extmath import softmax
    import numpy as np

    x = np.array([[0.50839931, 0.49767588, 0.51260159]])
    softmax(x)

    # output
    array([[ 0.3340521 ,  0.33048906,  0.33545884]])

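From a mathematical point of view, both sides are equal. Let m = max(x). Your softmax function then returns a vector whose i-th coordinate is equal to

$$\frac{e^{x_i - m}}{\sum_j e^{x_j - m}} = \frac{e^{x_i}\, e^{-m}}{\sum_j e^{x_j}\, e^{-m}} = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

Notice that this works for any m, because e^m != 0 for every number m (even a complex one).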

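From a computational complexity point of view they are also equivalent: both run in O(n) time, where n is the size of the vector.
From a numerical stability point of view, the first solution (the one that subtracts the max) is preferred, because e^x grows very fast and overflows even for moderately large x (above roughly 709 in float64). Subtracting the maximum value gets rid of this overflow. To experience this in practice, try feeding x = np.array([1000, 5]) into both of your functions. One will return the correct probabilities, the other will overflow and produce nan.
Unrelated to the question, but your solution works only for vectors (the Udacity quiz wants you to calculate it for matrices as well). To fix it, you need to use sum(axis=0).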
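The experiment suggested above can be run as follows. This is a sketch only: `softmax_naive` and `softmax_stable` are names chosen here for the two variants under discussion, and np.errstate is used merely to silence the warnings NumPy would otherwise print.

```python
import numpy as np

def softmax_naive(x):
    """exp(x) / sum(exp(x)) with no shift."""
    x = np.asarray(x, dtype=float)
    return np.exp(x) / np.sum(np.exp(x))

def softmax_stable(x):
    """Same formula, but with max(x) subtracted before exponentiating."""
    x = np.asarray(x, dtype=float)
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

x = np.array([1000, 5])

# exp(1000) overflows to inf in float64, and inf / inf evaluates to nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = softmax_naive(x)

stable = softmax_stable(x)

print(np.isnan(naive).any())  # True: the naive version produced a nan
print(stable)                 # essentially all of the probability on the first entry
```

In the stable version the largest exponent is exactly 0, so the largest term is exp(0) = 1 and nothing can overflow. The tiny terms may underflow to 0, which is harmless here.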

A more concise version is:

    import numpy as np

    def softmax(x):
        return np.exp(x) / np.exp(x).sum(axis=0)

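I would like to add a little more understanding of the problem. Subtracting the max of the array is correct here. But if you run the code from the other post, you will find that it does not give the right answer when the array is 2-D or higher-dimensional.

Here are some suggestions:

To get the max, take it along the x-axis; you will get a 1-D array.
Reshape your max array so that it broadcasts against the original shape.
Use np.exp to get the exponential values.
Use np.sum along the same axis.
Divide to get the final result.

Follow these steps and you will get the correct answer in a vectorized way. Since this is related to college homework, I cannot post the exact code here, but I am happy to give more suggestions if something is unclear.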

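To maintain numerical stability, max(x) should be subtracted. The following is the code for the softmax function:

    import numpy as np

    def softmax(x):
        # work on a float copy so the caller's array is not modified in place
        x = np.array(x, dtype=float)
        if len(x.shape) > 1:
            # 2-D input: normalize each row separately
            tmp = np.max(x, axis=1)
            x -= tmp.reshape((x.shape[0], 1))
            x = np.exp(x)
            tmp = np.sum(x, axis=1)
            x /= tmp.reshape((x.shape[0], 1))
        else:
            # 1-D input: normalize over the whole vector
            tmp = np.max(x)
            x -= tmp
            x = np.exp(x)
            tmp = np.sum(x)
            x /= tmp
        return x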
