Adam optimization of the gradient-descent update doesn't seem to work for logistic regression

Posted 2024-06-01 01:43:31


Hello, I am learning machine learning from first principles, so I coded logistic regression with back-prop from scratch using numpy and calculus. Updating the derivatives with an exponentially weighted average (momentum) works for me, but RMSProp and Adam do not: the cost never goes down. What am I doing wrong?

The main Adam block is this:

# momentum
VW = beta*VW + (1-beta)*dW
Vb = beta*Vb + (1-beta)*db
# rmsprop
SW = beta2*SW + (1-beta2)*dW**2
Sb = beta2*Sb + (1-beta2)*db**2
# update weights TODO: Adam doesn't work
W -= learning_rate*VW/(np.sqrt(SW)+epsilon)
b -= learning_rate*Vb/(np.sqrt(Sb)+epsilon)
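
For comparison, the Adam update from the original paper (Kingma & Ba) also bias-corrects both moment estimates before the step, which my snippet skips. A sketch of that form, reusing the variable names from my code (t being the 1-based iteration count inside the training loop), would be:

# Adam with bias correction (sketch; i is the loop counter from the training loop)
t = i + 1
VW_hat = VW/(1-beta**t)   # bias-corrected first moment
Vb_hat = Vb/(1-beta**t)
SW_hat = SW/(1-beta2**t)  # bias-corrected second moment
Sb_hat = Sb/(1-beta2**t)
W -= learning_rate*VW_hat/(np.sqrt(SW_hat)+epsilon)
b -= learning_rate*Vb_hat/(np.sqrt(Sb_hat)+epsilon)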

The full code looks like this:

import numpy as np
import matplotlib.pyplot as plt
import sklearn.datasets
import sklearn.metrics

# load the breast cancer dataset
X, y = sklearn.datasets.load_breast_cancer(return_X_y=True, as_frame=False)
# scaling input
X = (X-np.mean(X,0))/np.std(X,0)
# avoid rank 1 vector
y = y.reshape(len(y),1)

# stat
m = X.shape[0]
n = X.shape[1]

# hyper parameters
num_iter = 20000
learning_rate = 1e-6 
beta = 0.9
beta2 = 0.999
epsilon = 1e-8

# init
np.random.seed(42)
W = np.random.randn(n,1)
b = np.random.randn(1)
VW = np.zeros((n,1))
Vb = np.zeros(1)
SW = np.zeros((n,1))
Sb = np.zeros(1)


for i in range(num_iter):
    # forward

    Z = X.dot(W) + b # shape (m, 1): linear scores

    # sigmoid
    A = 1/(1+np.exp(-Z))
    # categorical cross-entropy
    # cost = -np.sum(y*np.log(A))/m

    # binary classification cost
    j = (-y*np.log(A)- (1-y)*np.log(1-A)).sum()*(1/m)
    
    if i % 1000 == 999:
        print(i, j)
    
    # backward

    # derivatives of the cost j
    dA = (A-y)/(A*(1-A)) # dj/dA (not used below)
    dZ = A-y             # dj/dZ: sigmoid and cross-entropy derivatives combined
    
    dW = X.transpose().dot(dZ)
    db = dZ.sum()
    # momentum
    VW = beta*VW + (1-beta)*dW
    Vb = beta*Vb + (1-beta)*db
    # rmsprop
    SW = beta2*SW + (1-beta2)*dW**2
    Sb = beta2*Sb + (1-beta2)*db**2
    # update weights TODO: Adam doesn't work
    W -= learning_rate*VW/(np.sqrt(SW)+epsilon)
    b -= learning_rate*Vb/(np.sqrt(Sb)+epsilon)

print(sklearn.metrics.classification_report(y,np.round(A),target_names=['benign','malignant']))

The result is that, for this particular problem, RMSProp/Adam take far longer to converge than plain gradient descent. Is my implementation correct?
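
To see how big the actual parameter steps are under the two rules, one could log the step norms inside the training loop (a minimal probe sketch, not part of the training code; it reuses dW, VW, SW, learning_rate and epsilon from the code above):

# probe: size of a plain gradient-descent step vs. the Adam-style step applied above
gd_step   = learning_rate*dW                        # step a vanilla gradient-descent update would take
adam_step = learning_rate*VW/(np.sqrt(SW)+epsilon)  # step the code above actually applies
if i % 1000 == 999:
    print(i, np.linalg.norm(gd_step), np.linalg.norm(adam_step))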

