# These are the libraries that will be used for this lab.
import torch
import torch.nn as nn
import matplotlib.pylab as plt
import numpy as np
torch.manual_seed(0)
Momentum
Objective for this Notebook
- Learn Saddle Points, Local Minima, and Noise
Table of Contents
In this lab, you will deal with several problems associated with optimization and see how momentum can improve your results.
- Saddle Points
- Local Minima
- Noise
- Practice
Estimated Time Needed: 25 min
Preparation
The libraries imported at the top of this notebook are all you need for this lab.
This function will plot a cubic function and the parameter values obtained via Gradient Descent.
# Plot the cubic
def plot_cubic(w, optimizer):
    LOSS = []
    # Evaluate the loss function over a range of parameter values
    W = torch.arange(-4, 4, 0.1)
    for w.state_dict()['linear.weight'][0] in W:
        LOSS.append(cubic(w(torch.tensor([[1.0]]))).item())
    # Reset the parameter to its starting value
    w.state_dict()['linear.weight'][0] = 4.0
    n_epochs = 10
    parameter = []
    loss_list = []
    # Run gradient descent for n_epochs iterations
    for n in range(n_epochs):
        optimizer.zero_grad()
        loss = cubic(w(torch.tensor([[1.0]])))
        loss_list.append(loss.item())
        parameter.append(w.state_dict()['linear.weight'][0].detach().data.item())
        loss.backward()
        optimizer.step()
    # Plot the visited parameter values on top of the objective function
    plt.plot(parameter, loss_list, 'ro', label='parameter values')
    plt.plot(W.numpy(), LOSS, label='objective function')
    plt.xlabel('w')
    plt.ylabel('l(w)')
    plt.legend()
This function will plot a fourth order function and the parameter values obtained via Gradient Descent. You can also add Gaussian noise with a standard deviation determined by the parameter std.
# Plot the fourth order function and the parameter values
def plot_fourth_order(w, optimizer, std=0, color='r', paramlabel='parameter values', objfun=True):
    W = torch.arange(-4, 6, 0.1)
    LOSS = []
    # Evaluate the loss function over a range of parameter values
    for w.state_dict()['linear.weight'][0] in W:
        LOSS.append(fourth_order(w(torch.tensor([[1.0]]))).item())
    # Reset the parameter to its starting value
    w.state_dict()['linear.weight'][0] = 6
    n_epochs = 100
    parameter = []
    loss_list = []
    # Run gradient descent for n_epochs iterations, adding Gaussian noise to the loss
    for n in range(n_epochs):
        optimizer.zero_grad()
        loss = fourth_order(w(torch.tensor([[1.0]]))) + std * torch.randn(1, 1)
        loss_list.append(loss.item())
        parameter.append(w.state_dict()['linear.weight'][0].detach().data.item())
        loss.backward()
        optimizer.step()
    # Plotting
    if objfun:
        plt.plot(W.numpy(), LOSS, label='objective function')
    plt.plot(parameter, loss_list, 'ro', label=paramlabel, color=color)
    plt.xlabel('w')
    plt.ylabel('l(w)')
    plt.legend()
This is a custom module. It will behave like a single parameter value. We do it this way so we can use PyTorch’s built-in optimizers.
# Create a linear model
class one_param(nn.Module):
    # Constructor
    def __init__(self, input_size, output_size):
        super(one_param, self).__init__()
        self.linear = nn.Linear(input_size, output_size, bias=False)
    # Prediction
    def forward(self, x):
        yhat = self.linear(x)
        return yhat
We create an object w; when we call the object with an input of one, it will behave like an individual parameter value, i.e. w(1) is analogous to \(w\).
# Create a one_param object
w = one_param(1, 1)
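As a quick sanity check (this cell is not part of the original lab), calling the object with an input of 1.0 simply returns the current weight, since the model has no bias:

# Hypothetical check: with bias disabled, the output for input 1.0 equals the weight itself
print(w.state_dict()['linear.weight'])   # the current weight w
print(w(torch.tensor([[1.0]])))          # yhat = w * 1.0, i.e. the same value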
Saddle Points
Let’s create a cubic function with a saddle point.
# Define a function to output a cubic
def cubic(yhat):
    out = yhat ** 3
    return out
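For reference, this objective is \(l(w) = w^3\). Its derivative \(3w^2\) is zero at \(w = 0\) even though \(w = 0\) is neither a minimum nor a maximum, so the gradient, and therefore each gradient descent step, shrinks toward zero as the parameter approaches that point.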
We create an optimizer with no momentum term.
# Create an optimizer without momentum
optimizer = torch.optim.SGD(w.parameters(), lr=0.01, momentum=0)
We run several iterations of stochastic gradient descent and plot the results. We see the parameter values get stuck in the saddle point.
# Plot the model
plot_cubic(w, optimizer)
We create an optimizer with a momentum term of 0.9.
# Create an optimizer with momentum
optimizer = torch.optim.SGD(w.parameters(), lr=0.01, momentum=0.9)
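For reference, with dampening and weight decay left at their defaults, PyTorch’s SGD with momentum keeps a velocity buffer and updates the parameter roughly as \(v_{t+1} = \mu v_t + g_{t+1}\) and \(w_{t+1} = w_t - \eta\, v_{t+1}\), where \(\mu = 0.9\) is the momentum, \(\eta\) is the learning rate, and \(g_{t+1}\) is the current gradient. The accumulated velocity is what keeps the parameter moving through flat regions such as the saddle point.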
We run several iterations of stochastic gradient descent with momentum and plot the results. We see the parameter values do not get stuck in the saddle point.
# Plot the model
plot_cubic(w, optimizer)
Local Minima
In this section, we will create a fourth-order polynomial with a local minimum at 4 and a global minimum at -2. We will then see how the momentum parameter affects convergence to the global minimum. The fourth-order polynomial is given by:
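\( l(w) = 2w^4 - 9w^3 - 21w^2 + 88w + 48 \)
(written out here from the coefficients used in the code cell below)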
# Create a function to calculate the fourth order polynomial
def fourth_order(yhat):
    out = torch.mean(2 * (yhat ** 4) - 9 * (yhat ** 3) - 21 * (yhat ** 2) + 88 * yhat + 48)
    return out
We create an optimizer with no momentum term. We run several iterations of stochastic gradient descent and plot the results. We see the parameter values get stuck in the local minimum.
# Make the prediction without momentum
optimizer = torch.optim.SGD(w.parameters(), lr=0.001)
plot_fourth_order(w, optimizer)
We create an optimizer with a momentum term of 0.9. We run several iterations of stochastic gradient descent and plot the results. We see the parameter values reach a global minimum.
# Make the prediction with momentum
optimizer = torch.optim.SGD(w.parameters(), lr=0.001, momentum=0.9)
plot_fourth_order(w, optimizer)
Noise
In this section, we will create a fourth-order polynomial with a local minimum at 4 and a global minimum at -2, but we will add noise to the function when the gradient is calculated. We will then see how the momentum parameter affects convergence to the global minimum.
With no momentum, we get stuck in a local minimum.
# Make the prediction without momentum when there is noise
optimizer = torch.optim.SGD(w.parameters(), lr=0.001)
plot_fourth_order(w, optimizer, std=10)
With momentum, we get to the global minimum.
# Make the prediction with momentum when there is noise
optimizer = torch.optim.SGD(w.parameters(), lr=0.001, momentum=0.9)
plot_fourth_order(w, optimizer, std=10)
Practice
Create two SGD objects with a learning rate of 0.001. Use the default momentum parameter value for one and a value of 0.9 for the second. Use the function plot_fourth_order with std=100 to plot the steps of each. Make sure you run the function in two independent cells.
# Practice: Create two SGD optimizer with lr = 0.001, and one without momentum and the other with momentum = 0.9. Plot the result out.
# Type your code here
Double-click here for the solution.
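For reference, one possible solution, sketched with the plot_fourth_order helper defined above (run each pair of lines in its own cell):

# Possible solution (sketch): SGD without momentum, noisy objective with std = 100
optimizer = torch.optim.SGD(w.parameters(), lr=0.001)
plot_fourth_order(w, optimizer, std=100)

# Possible solution (sketch): SGD with momentum = 0.9, noisy objective with std = 100
optimizer = torch.optim.SGD(w.parameters(), lr=0.001, momentum=0.9)
plot_fourth_order(w, optimizer, std=100)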