Tasks

  • Implement a vectorized loss function
  • Implement the vectorized gradient computation
  • Verify the analytic gradient against a numerical gradient
  • Use a validation set to tune the hyperparameters
  • Optimize with SGD
  • Visualize the learned weights

Theory

Softmax loss function

Let W be the weight matrix, of size D×C; x the input, of size 1×D; and b the bias, of size 1×C. The model's output is then:
xW + b

a 1×C vector S holding the sample's score for each class. In the actual implementation the bias b is folded into W, so one only needs to append a 1 to the input x (a small sketch of this trick follows below). This function is usually written f(x, W) and is called the score function.
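A minimal illustration of the bias trick (a self-contained numpy sketch with made-up sizes; none of these variable names come from the assignment code):

import numpy as np

D, C = 4, 3
x = np.random.randn(1, D)          # one input sample, 1 x D
W = np.random.randn(D, C)          # weight matrix, D x C
b = np.random.randn(1, C)          # bias, 1 x C

scores = x.dot(W) + b              # scores with an explicit bias term

# fold the bias into the weights: append a 1 to x and add b as an extra row of W
x_ext = np.hstack([x, np.ones((1, 1))])   # 1 x (D+1)
W_ext = np.vstack([W, b])                 # (D+1) x C
scores_ext = x_ext.dot(W_ext)

print(np.allclose(scores, scores_ext))    # True: the two formulations agree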
The softmax function turns these scores into class probabilities: for class k,

p_k = exp(s_k) / Σ_j exp(s_j)

where s_k is the k-th entry of the score vector S, and the loss on a sample with correct label y is L = -log(p_y).

Gradient derivation

Working through the loss computation with the chain rule, the gradient of the loss with respect to the weights follows fairly directly. I made quite a few small mistakes here, e.g. forgetting that the derivative of the exponential function is the exponential itself rather than 1. The full derivation is not written out, but a short sketch is given below.
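For reference, a sketch of the derivation in LaTeX notation (s is the score vector of a single sample x_i, p the vector of softmax probabilities, and y_i the correct label; this is the standard softmax derivation rather than anything specific to this code):

L_i = -\log p_{y_i}, \qquad p_j = \frac{e^{s_j}}{\sum_k e^{s_k}}

\frac{\partial L_i}{\partial s_j} = p_j - \mathbb{1}[j = y_i]

\frac{\partial L_i}{\partial W} = x_i^{\top}\,(p - \mathbb{1}_{y_i})

where \mathbb{1}_{y_i} is the one-hot row vector for the correct class. The last line is exactly what the vectorized code below computes as X.T.dot(p - ind).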

Some references also point out a numerical stability issue: the exponentials can become very large, and ratios of huge numbers carry large errors. In practice, since adding a constant to every score leaves the softmax unchanged, each entry of the score vector S is shifted down by the maximum entry before exponentiating, which avoids the problem (a small demonstration follows).
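A small standalone demonstration of why the shift matters (plain numpy, not assignment code):

import numpy as np

s = np.array([1000.0, 1001.0, 1002.0])       # large scores: the naive softmax overflows
naive = np.exp(s) / np.sum(np.exp(s))         # exp(1000) is inf, so this gives [nan nan nan]
shifted = s - np.max(s)                       # subtracting the max leaves the softmax unchanged
stable = np.exp(shifted) / np.sum(np.exp(shifted))
print(naive)    # [nan nan nan] (with overflow warnings)
print(stable)   # [0.09003057 0.24472847 0.66524096]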

Code implementation

Softmax loss and gradient

Loop implementation

  loss = 0.0                 # accumulated loss (initialized here so the snippet is self-contained)
  dW = np.zeros_like(W)      # accumulated gradient, same shape as W
  num_classes = W.shape[1]   # number of classes C, from the shape of W
  num_train = X.shape[0]     # number of training samples N
  for i in range(num_train):
    scores = X[i].dot(W)              # class scores for sample i
    scores = scores - np.max(scores)  # subtract the max so the exponentials cannot overflow
    scores_exp = np.exp(scores)       # exponentiate the shifted scores
    ds_w = np.repeat(X[i], num_classes).reshape(-1, num_classes)  # d(scores)/dW: each column is X[i]
    scores_exp_sum = np.sum(scores_exp)
    pk = scores_exp[y[i]] / scores_exp_sum  # softmax probability of the correct class
    loss += -np.log(pk)                     # accumulate the per-sample loss
    dl_s = np.zeros(W.shape)                # d(loss)/d(scores), broadcast down each column
    for j in range(num_classes):
      if j == y[i]:
        dl_s[:, j] = pk - 1                 # the correct-class column has a different derivative
      else:
        dl_s[:, j] = scores_exp[j] / scores_exp_sum
    dW_i = ds_w * dl_s                      # chain rule: elementwise product gives this sample's dW
    dW += dW_i
  loss /= num_train
  dW /= num_train
  loss += reg * np.sum(W * W)   # add L2 regularization to the loss
  dW += W * 2 * reg             # and its gradient

Vectorized implementation

This implementation is quite neat and worth studying carefully.

  num_classes = W.shape[1]
  num_train = X.shape[0]
  scores = X.dot(W)                                    # N x C matrix of class scores
  scores = scores - np.max(scores, 1, keepdims=True)   # shift each row by its max for numerical stability
  scores_exp = np.exp(scores)
  sum_s = np.sum(scores_exp, 1, keepdims=True)
  p = scores_exp / sum_s                               # N x C matrix of softmax probabilities
  loss = np.sum(-np.log(p[np.arange(num_train), y]))   # sum of -log(probability of the correct class)

  ind = np.zeros_like(p)
  ind[np.arange(num_train), y] = 1                     # one-hot encoding of the correct labels
  dW = X.T.dot(p - ind)                                # dL/dW = X^T (p - one_hot(y))

  loss /= num_train
  dW /= num_train
  loss += reg * np.sum(W * W)
  dW += W * 2 * reg
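In the assignment skeleton these two snippets are the bodies of softmax_loss_naive and softmax_loss_vectorized in cs231n/classifiers/softmax.py; a quick sanity check that they agree might look like this (X_dev and y_dev stand for the small development subset built in the notebook, so their exact names are an assumption here):

import numpy as np
from cs231n.classifiers.softmax import softmax_loss_naive, softmax_loss_vectorized

W = np.random.randn(3073, 10) * 0.0001    # small random weights, CIFAR-10 shapes
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.000005)
loss_vec, grad_vec = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)
print('loss difference:', abs(loss_naive - loss_vec))                        # should be ~0
print('gradient difference:', np.linalg.norm(grad_naive - grad_vec, 'fro'))  # should be ~0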

Results

numerical: 1.597347 analytic: 1.597346, relative error: 4.468400e-08
numerical: -0.827809 analytic: -0.827809, relative error: 3.977940e-08
numerical: 1.958358 analytic: 1.958358, relative error: 2.840188e-08
numerical: 2.962181 analytic: 2.962180, relative error: 1.708981e-08
numerical: 0.226635 analytic: 0.226635, relative error: 1.232015e-07
numerical: -4.365886 analytic: -4.365886, relative error: 1.991957e-08
numerical: -1.552392 analytic: -1.552392, relative error: 1.729607e-08
numerical: -0.602468 analytic: -0.602468, relative error: 8.903067e-08
numerical: -1.708499 analytic: -1.708499, relative error: 9.391237e-09
numerical: -0.255113 analytic: -0.255113, relative error: 3.232163e-07
numerical: 0.010474 analytic: 0.010474, relative error: 6.247893e-06
numerical: 1.042362 analytic: 1.042362, relative error: 2.746680e-08
numerical: -1.611347 analytic: -1.611348, relative error: 2.727858e-08
numerical: 0.825958 analytic: 0.825958, relative error: 3.310199e-08
numerical: 0.484694 analytic: 0.484694, relative error: 1.599890e-07
numerical: -3.753818 analytic: -3.753818, relative error: 2.689649e-08
numerical: 0.550515 analytic: 0.550515, relative error: 1.091313e-07
numerical: -0.880654 analytic: -0.880654, relative error: 3.377934e-08
numerical: 1.652611 analytic: 1.652611, relative error: 2.442218e-08
numerical: 1.257297 analytic: 1.257297, relative error: 5.101717e-08

As the output shows, the analytic gradient is extremely close to the numerical gradient.
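For reference, the numbers above come from a sparse numerical gradient check: sample a handful of random coordinates of W, estimate the partial derivative there with a centered difference, and compare it with the analytic gradient. A minimal sketch of such a check (my own simplified version; loss_fn is a placeholder for a function that returns only the loss, and the exact relative-error formula used by the assignment's grad_check_sparse may differ slightly):

import numpy as np

def check_gradient(loss_fn, W, grad_analytic, num_checks=10, h=1e-5):
    # loss_fn(W) returns the scalar loss; grad_analytic is dLoss/dW from the code above
    for _ in range(num_checks):
        idx = tuple(np.random.randint(d) for d in W.shape)   # random coordinate of W
        old = W[idx]
        W[idx] = old + h
        fxph = loss_fn(W)                                    # loss at W + h * e_idx
        W[idx] = old - h
        fxmh = loss_fn(W)                                    # loss at W - h * e_idx
        W[idx] = old                                         # restore the entry
        grad_numerical = (fxph - fxmh) / (2 * h)             # centered difference estimate
        rel_error = (abs(grad_numerical - grad_analytic[idx])
                     / max(1e-12, abs(grad_numerical) + abs(grad_analytic[idx])))
        print('numerical: %f analytic: %f, relative error: %e'
              % (grad_numerical, grad_analytic[idx], rel_error))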

SGD training

Code implementation

# Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
from cs231n.classifiers import Softmax
results = {}
best_val = -1
best_softmax = None
learning_rates = [1e-7, 5e-7]
regularization_strengths = [2.5e4, 5e4]

################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength. #
# This should be identical to the validation that you did for the SVM; save    #
# the best trained softmax classifier in best_softmax.                         #
################################################################################
for lr in learning_rates:
    for reg in regularization_strengths:
        softmax = Softmax()
        loss_hist = softmax.train(X_train, y_train, lr, reg,
                      num_iters=500, verbose=True)
        y_train_pred = softmax.predict(X_train)
        acc_tr = np.mean(y_train == y_train_pred)
        y_val_pred = softmax.predict(X_val)
        acc_val = np.mean(y_val == y_val_pred)
        results[(lr, reg)] = (acc_tr, acc_val)
        if best_val < acc_val:
            best_val = acc_val
            best_softmax = softmax
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
    
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (
                lr, reg, train_accuracy, val_accuracy))
    
print('best validation accuracy achieved during cross-validation: %f' % best_val)

Results

iteration 0 / 500: loss 776.772172
iteration 100 / 500: loss 285.134480
iteration 200 / 500: loss 105.757440
iteration 300 / 500: loss 39.981648
iteration 400 / 500: loss 15.989359
iteration 0 / 500: loss 1551.039707
iteration 100 / 500: loss 208.879550
iteration 200 / 500: loss 29.723389
iteration 300 / 500: loss 5.831771
iteration 400 / 500: loss 2.657849
iteration 0 / 500: loss 780.461502
iteration 100 / 500: loss 6.960413
iteration 200 / 500: loss 2.141252
iteration 300 / 500: loss 2.089460
iteration 400 / 500: loss 2.140546
iteration 0 / 500: loss 1551.688270
iteration 100 / 500: loss 2.223032
iteration 200 / 500: loss 2.113767
iteration 300 / 500: loss 2.133740
iteration 400 / 500: loss 2.122119
lr 1.000000e-07 reg 2.500000e+04 train accuracy: 0.314816 val accuracy: 0.334000
lr 1.000000e-07 reg 5.000000e+04 train accuracy: 0.313510 val accuracy: 0.331000
lr 5.000000e-07 reg 2.500000e+04 train accuracy: 0.325122 val accuracy: 0.342000
lr 5.000000e-07 reg 5.000000e+04 train accuracy: 0.290408 val accuracy: 0.302000
best validation accuracy achieved during cross-validation: 0.342000
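
The last item in the task list, visualizing the learned weights, can be done roughly as follows (a sketch assuming best_softmax holds the classifier selected above, with W of shape 3073 x 10 whose last row is the bias, and using the usual CIFAR-10 class names):

import numpy as np
import matplotlib.pyplot as plt

w = best_softmax.W[:-1, :]          # strip the bias row, leaving 3072 x 10
w = w.reshape(32, 32, 3, 10)        # back into image shape, one template per class
w_min, w_max = np.min(w), np.max(w)

classes = ['plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
for i in range(10):
    plt.subplot(2, 5, i + 1)
    # rescale each template to 0..255 so it can be displayed as an image
    wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)
    plt.imshow(wimg.astype('uint8'))
    plt.axis('off')
    plt.title(classes[i])
plt.show()

Each template tends to look like a blurred average of that class's training images.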