SGD
Linear Regression 1D: Training Two Parameters with Stochastic Gradient Descent (SGD)
Objective
- How to use SGD (Stochastic Gradient Descent) to train the model.
Table of Contents
In this lab, you will practice training a model by using Stochastic Gradient Descent.
- Make Some Data
- Create the Model and Cost Function (Total Loss)
- Train the Model: Batch Gradient Descent
- Train the Model: Stochastic Gradient Descent
- Train the Model: Stochastic Gradient Descent with DataLoader
Estimated Time Needed: 30 min
Preparation
We'll need the following libraries:
# These are the libraries we are going to use in the lab.
import torch
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d
from dataidea_science.plots import plot_error_surfaces
The class plot_error_surfaces is just to help you visualize the data space and the parameter space during training; it has nothing to do with PyTorch.
Make Some Data
Set random seed:
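A minimal seeding call (the seed value 1 is an assumption; any fixed integer gives reproducible results), which returns the Generator shown below:
# Set the random seed for reproducibility (the seed value 1 is an assumption)
torch.manual_seed(1)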
<torch._C.Generator at 0x79c1633e7c10>
Generate values from -3 to 3 that create a line with a slope of 1 and a bias of -1. This is the line that you need to estimate. Add some noise to the data:
# Set up the true line and the noisy simulated data
X = torch.arange(-3, 3, 0.1).view(-1, 1)
f = 1 * X - 1
Y = f + 0.1 * torch.randn(X.size())
Plot the results:
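A plotting cell along these lines (marker and label choices are assumptions) draws the noisy points and the line to be estimated:
# Plot the noisy data points and the underlying line
plt.plot(X.numpy(), Y.numpy(), 'rx', label = 'Y')
plt.plot(X.numpy(), f.numpy(), label = 'f')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()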
Create the Model and Cost Function (Total Loss)
Define the forward function:
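A minimal sketch, assuming the linear model yhat = w * x + b with the parameters w and b created in the training section below:
# The forward function returns the prediction w * x + b
def forward(x):
    return w * x + b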
Define the cost or criterion function (MSE):
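A minimal sketch, assuming the standard mean squared error:
# MSE: the mean of the squared differences between prediction and target
def criterion(yhat, y):
    return torch.mean((yhat - y) ** 2)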
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
# Create plot_error_surfaces for viewing the data
get_surface = plot_error_surfaces(15, 13, X, Y, 30)
Train the Model: Batch Gradient Descent
Create the model parameters w and b by setting the argument requires_grad to True, because the system must learn them.
# Define the parameters w, b for y = wx + b
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
Set the learning rate to 0.1 and create an empty list LOSS_BGD for storing the loss at each iteration.
# Define the learning rate and create an empty list for storing the loss at each iteration
lr = 0.1
LOSS_BGD = []
Define the train_model function for training the model:
# The function for training the model
def train_model(iter):
# Loop
for epoch in range(iter):
# make a prediction
Yhat = forward(X)
# calculate the loss
loss = criterion(Yhat, Y)
# Section for plotting
get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
get_surface.plot_ps()
# store the loss in the list LOSS_BGD
        LOSS_BGD.append(loss.tolist())
# backward pass: compute gradient of the loss with respect to all the learnable parameters
loss.backward()
# update parameters slope and bias
w.data = w.data - lr * w.grad.data
b.data = b.data - lr * b.grad.data
        # zero the gradients before running the next backward pass
w.grad.data.zero_()
b.grad.data.zero_()
Run 10 epochs of batch gradient descent. (Known bug: the data space plot is one iteration ahead of the parameter space plot.)
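Assuming the call mirrors the text above:
# Run 10 epochs of batch gradient descent
train_model(10)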
Train the Model: Stochastic Gradient Descent
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
# Create plot_error_surfaces for viewing the data
get_surface = plot_error_surfaces(15, 13, X, Y, 30, go = False)
Define the train_model_SGD function for training the model:
# The function for training the model
LOSS_SGD = []
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
def train_model_SGD(iter):
# Loop
for epoch in range(iter):
        # SGD is an approximation of our true total loss/cost; in this line of code we calculate our true loss/cost and store it
Yhat = forward(X)
# store the loss
LOSS_SGD.append(criterion(Yhat, Y).tolist())
for x, y in zip(X, Y):
            # make a prediction
yhat = forward(x)
# calculate the loss
loss = criterion(yhat, y)
# Section for plotting
get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
# backward pass: compute gradient of the loss with respect to all the learnable parameters
loss.backward()
# update parameters slope and bias
w.data = w.data - lr * w.grad.data
b.data = b.data - lr * b.grad.data
            # zero the gradients before running the next backward pass
w.grad.data.zero_()
b.grad.data.zero_()
        # plot surface and data space after each epoch
get_surface.plot_ps()
Run 10 epochs of stochastic gradient descent. (Known bug: the data space plot is one iteration ahead of the parameter space plot.)
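Assuming the call mirrors the text above:
# Run 10 epochs of stochastic gradient descent
train_model_SGD(10)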
Compare the losses of batch gradient descent and SGD.
# Plot out the LOSS_BGD and LOSS_SGD
plt.plot(LOSS_BGD,label = "Batch Gradient Descent")
plt.plot(LOSS_SGD,label = "Stochastic Gradient Descent")
plt.xlabel('epoch')
plt.ylabel('Cost/ total loss')
plt.legend()
plt.show()
SGD with Dataset and DataLoader
Import the module for building a dataset class:
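Both Dataset and DataLoader live in torch.utils.data:
# Import the Dataset and DataLoader classes
from torch.utils.data import Dataset, DataLoader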
Create a dataset class:
# Dataset Class
class Data(Dataset):
# Constructor
def __init__(self):
self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
self.y = 1 * self.x - 1
self.len = self.x.shape[0]
# Getter
def __getitem__(self,index):
return self.x[index], self.y[index]
# Return the length
def __len__(self):
return self.len
Create a dataset object and check the length of the dataset.
# Create the dataset and check the length
dataset = Data()
print("The length of dataset: ", len(dataset))
The length of dataset: 60
Obtain the first training point:
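Indexing the dataset calls __getitem__; a cell such as the following (the print formatting is an assumption) produces the output below:
# Obtain the first training point
x, y = dataset[0]
print("(", x, ",", y, ")")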
( tensor([-3.]) , tensor([-4.]) )
Similarly, obtain the first three training points:
# Print the first 3 point
x, y = dataset[0:3]
print("The first 3 x: ", x)
print("The first 3 y: ", y)
The first 3 x: tensor([[-3.0000],
[-2.9000],
[-2.8000]])
The first 3 y: tensor([[-4.0000],
[-3.9000],
[-3.8000]])
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
# Create plot_error_surfaces for viewing the data
get_surface = plot_error_surfaces(15, 13, X, Y, 30, go = False)
Create a DataLoader object by using the constructor:
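A minimal sketch, assuming a batch size of 1 so that each update uses a single sample (true SGD):
# Create a DataLoader that yields one sample at a time
trainloader = DataLoader(dataset = dataset, batch_size = 1)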
Define the train_model_DataLoader function for training the model:
# The function for training the model
w = torch.tensor(-15.0,requires_grad=True)
b = torch.tensor(-10.0,requires_grad=True)
LOSS_Loader = []
def train_model_DataLoader(epochs):
# Loop
for epoch in range(epochs):
        # SGD is an approximation of our true total loss/cost; in this line of code we calculate our true loss/cost and store it
Yhat = forward(X)
# store the loss
LOSS_Loader.append(criterion(Yhat, Y).tolist())
for x, y in trainloader:
# make a prediction
yhat = forward(x)
# calculate the loss
loss = criterion(yhat, y)
# Section for plotting
get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
# Backward pass: compute gradient of the loss with respect to all the learnable parameters
loss.backward()
            # Update the slope and bias parameters
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data
# Clear gradients
w.grad.data.zero_()
b.grad.data.zero_()
        # plot surface and data space after each epoch
get_surface.plot_ps()
Run 10 epochs of stochastic gradient descent with the DataLoader. (Known bug: the data space plot is one iteration ahead of the parameter space plot.)
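Assuming the call mirrors the text above:
# Run 10 epochs of SGD with the DataLoader
train_model_DataLoader(10)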
Compare the losses of batch gradient descent and SGD. Note that SGD converges to a minimum faster, that is, its loss decreases faster.
# Plot the LOSS_BGD and LOSS_Loader
plt.plot(LOSS_BGD,label="Batch Gradient Descent")
plt.plot(LOSS_Loader,label="Stochastic Gradient Descent with DataLoader")
plt.xlabel('epoch')
plt.ylabel('Cost/ total loss')
plt.legend()
plt.show()
Practice
For practice, try to use SGD with DataLoader to train the model with 10 iterations. Store the total loss in LOSS. We are going to use it in the next question.
# Practice: Use SGD with trainloader to train model and store the total loss in LOSS
LOSS = []
w = torch.tensor(-12.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
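One possible solution, mirroring train_model_DataLoader above (this sketch is an assumption, not the official solution; the helper name my_train_model is hypothetical, and it reuses trainloader, forward, criterion, and lr from earlier):
# Practice solution sketch (assumed): SGD with the DataLoader, storing the total loss in LOSS
def my_train_model(epochs):
    for epoch in range(epochs):
        # record the true total loss at the start of each epoch
        Yhat = forward(X)
        LOSS.append(criterion(Yhat, Y).tolist())
        for x, y in trainloader:
            yhat = forward(x)
            loss = criterion(yhat, y)
            loss.backward()
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data
            w.grad.data.zero_()
            b.grad.data.zero_()

my_train_model(10)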
Plot the total loss
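A minimal plotting cell (label wording is an assumption):
# Plot the total loss stored in LOSS
plt.plot(LOSS, label = "Total Loss")
plt.xlabel('epoch')
plt.ylabel('Cost/ total loss')
plt.legend()
plt.show()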
About the Author:
Hi, my name is Juma Shafara. I am a Data Scientist and Instructor at DATAIDEA. I have taught hundreds of people Programming, Data Analysis, and Machine Learning.
I also enjoy developing innovative algorithms and models that can drive insights and value.
I regularly share some content that I find useful throughout my learning/teaching journey to simplify concepts in Machine Learning, Mathematics, Programming, and related topics on my website jumashafara.dataidea.org.
Besides this technical stuff, I enjoy watching soccer and movies and reading mystery books.