SGD

Linear Regression 1D: Training Two Parameters with Stochastic Gradient Descent (SGD)
Objective
- How to use SGD (Stochastic Gradient Descent) to train a model.
In this lab, you will practice training a model by using Stochastic Gradient Descent.
Table of Contents
- Make Some Data
- Create the Model and Cost Function (Total Loss)
- Train the Model: Batch Gradient Descent
- Train the Model: Stochastic Gradient Descent
- Train the Model: Stochastic Gradient Descent with DataLoader
Estimated Time Needed: 30 min
Preparation
We'll need the following libraries:
# These are the libraries we are going to use in the lab.
import torch
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits import mplot3d
from dataidea_science.plots import plot_error_surfaces
The class plot_error_surfaces is just to help you visualize the data space and the parameter space during training and has nothing to do with PyTorch.
Make Some Data
Set random seed:
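The seed-setting cell itself is not shown above; the following is a minimal sketch, assuming a seed value of 1 (the seed value is an assumption, and the generator object printed below is the return value of this call):
# Set the random seed for reproducibility (the value 1 is an assumed choice)
torch.manual_seed(1)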
<torch._C.Generator at 0x79c1633e7c10>
Generate values from -3 to 3 that create a line with a slope of 1 and a bias of -1. This is the line that you need to estimate. Add some noise to the data:
# Setup the actual data and simulated data
X = torch.arange(-3, 3, 0.1).view(-1, 1)
f = 1 * X - 1
Y = f + 0.1 * torch.randn(X.size())
Plot the results:
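The plotting cell is omitted here; a minimal sketch that shows the noisy data together with the underlying line:
# Plot the noisy data Y and the true line f
plt.plot(X.numpy(), Y.numpy(), 'rx', label = 'Y (noisy data)')
plt.plot(X.numpy(), f.numpy(), label = 'f (true line)')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()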
Create the Model and Cost Function (Total Loss)
Define the forward function:
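The function cell is not shown; a minimal sketch of the usual forward pass for y = wx + b (it reads the parameters w and b defined later, so it must be called after they exist):
# Prediction function: yhat = w * x + b
def forward(x):
    return w * x + b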
Define the cost or criterion function (MSE):
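Likewise, a minimal sketch of the mean squared error criterion used as the cost:
# Mean squared error between the prediction and the target
def criterion(yhat, y):
    return torch.mean((yhat - y) ** 2)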
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
# Create plot_error_surfaces for viewing the data
get_surface = plot_error_surfaces(15, 13, X, Y, 30)


Train the Model: Batch Gradient Descent
Create the model parameters w and b with the argument requires_grad set to True, because these are the values the model must learn.
# Define the parameters w, b for y = wx + b
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
Set the learning rate to 0.1 and create an empty list LOSS_BGD for storing the loss at each iteration.
# Define learning rate and create an empty list for containing the loss for each iteration.
lr = 0.1
LOSS_BGD = []
Define the train_model function for training the model.
# The function for training the model
def train_model(iter):

    # Loop over the epochs
    for epoch in range(iter):

        # make a prediction on the whole data set
        Yhat = forward(X)

        # calculate the loss
        loss = criterion(Yhat, Y)

        # section for plotting
        get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
        get_surface.plot_ps()

        # store the loss in the list LOSS_BGD (as a plain number so it can be plotted later)
        LOSS_BGD.append(loss.tolist())

        # backward pass: compute the gradient of the loss with respect to all the learnable parameters
        loss.backward()

        # update the parameters (slope and bias)
        w.data = w.data - lr * w.grad.data
        b.data = b.data - lr * b.grad.data

        # zero the gradients before the next backward pass
        w.grad.data.zero_()
        b.grad.data.zero_()
Run 10 epochs of batch gradient descent. (Known quirk: the data-space plot is one iteration ahead of the parameter-space plot.)
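The cell that runs the training loop is not shown; a minimal sketch:
# Train the model with 10 epochs of batch gradient descent
train_model(10)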
Train the Model: Stochastic Gradient Descent
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
# Create plot_error_surfaces for viewing the data
get_surface = plot_error_surfaces(15, 13, X, Y, 30, go = False)
Define the train_model_SGD function for training the model.
# The function for training the model
LOSS_SGD = []
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)

def train_model_SGD(iter):

    # Loop over the epochs
    for epoch in range(iter):

        # SGD is an approximation of our true total loss/cost; in this line of code we calculate our true loss/cost and store it
        Yhat = forward(X)

        # store the loss
        LOSS_SGD.append(criterion(Yhat, Y).tolist())

        for x, y in zip(X, Y):

            # make a prediction on a single sample
            yhat = forward(x)

            # calculate the loss
            loss = criterion(yhat, y)

            # section for plotting
            get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())

            # backward pass: compute the gradient of the loss with respect to all the learnable parameters
            loss.backward()

            # update the parameters (slope and bias)
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data

            # zero the gradients before the next backward pass
            w.grad.data.zero_()
            b.grad.data.zero_()

        # plot the surface and data space after each epoch
        get_surface.plot_ps()
Run 10 epochs of stochastic gradient descent. (Known quirk: the data-space plot is one iteration ahead of the parameter-space plot.)
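Again, the cell that runs the training loop is not shown; a minimal sketch:
# Train the model with 10 epochs of stochastic gradient descent
train_model_SGD(10)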
Compare the loss of batch gradient descent and SGD:
# Plot out the LOSS_BGD and LOSS_SGD
plt.plot(LOSS_BGD, label = "Batch Gradient Descent")
plt.plot(LOSS_SGD,label = "Stochastic Gradient Descent")
plt.xlabel('epoch')
plt.ylabel('Cost/ total loss')
plt.legend()
plt.show()

Train the Model: Stochastic Gradient Descent with Dataset and DataLoader
Import the module for building a dataset class:
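The import cell is not shown; both Dataset and DataLoader live in torch.utils.data:
# Import the Dataset and DataLoader classes from PyTorch
from torch.utils.data import Dataset, DataLoader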
Create a dataset class:
# Dataset Class
class Data(Dataset):

    # Constructor
    def __init__(self):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.y = 1 * self.x - 1
        self.len = self.x.shape[0]

    # Getter
    def __getitem__(self, index):
        return self.x[index], self.y[index]

    # Return the length
    def __len__(self):
        return self.len
Create a dataset object and check the length of the dataset.
# Create the dataset and check the length
dataset = Data()
print("The length of dataset: ", len(dataset))
The length of dataset: 60
Obtain the first training point:
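The cell is not shown; a minimal sketch that indexes the dataset through its __getitem__ method (the printed tuple below is its output):
# Print the first training point (x, y)
x, y = dataset[0]
print("(", x, ",", y, ")")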
( tensor([-3.]) , tensor([-4.]) )
Similarly, obtain the first three training points:
# Print the first 3 points
x, y = dataset[0:3]
print("The first 3 x: ", x)
print("The first 3 y: ", y)
The first 3 x: tensor([[-3.0000],
[-2.9000],
[-2.8000]])
The first 3 y: tensor([[-4.0000],
[-3.9000],
[-3.8000]])
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
# Create plot_error_surfaces for viewing the data
get_surface = plot_error_surfaces(15, 13, X, Y, 30, go = False)
Create a DataLoader object by using the constructor:
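The constructor call is not shown; a minimal sketch, assuming a batch size of 1 so that the loop below performs one update per sample:
# Create a DataLoader that yields one sample at a time (batch_size = 1 is an assumption)
trainloader = DataLoader(dataset = dataset, batch_size = 1)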
Define the train_model_DataLoader function for training the model.
# The function for training the model
w = torch.tensor(-15.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
LOSS_Loader = []

def train_model_DataLoader(epochs):

    # Loop over the epochs
    for epoch in range(epochs):

        # SGD is an approximation of our true total loss/cost; in this line of code we calculate our true loss/cost and store it
        Yhat = forward(X)

        # store the loss
        LOSS_Loader.append(criterion(Yhat, Y).tolist())

        for x, y in trainloader:

            # make a prediction on a single batch
            yhat = forward(x)

            # calculate the loss
            loss = criterion(yhat, y)

            # section for plotting
            get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())

            # backward pass: compute the gradient of the loss with respect to all the learnable parameters
            loss.backward()

            # update the parameters (slope and bias)
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data

            # clear the gradients before the next backward pass
            w.grad.data.zero_()
            b.grad.data.zero_()

        # plot the surface and data space after each epoch
        get_surface.plot_ps()
Run 10 epochs of stochastic gradient descent with the DataLoader. (Known quirk: the data-space plot is one iteration ahead of the parameter-space plot.)
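As before, the cell that runs the training loop is not shown; a minimal sketch:
# Train the model with 10 epochs of SGD using the DataLoader
train_model_DataLoader(10)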
Compare the loss of batch gradient descent and SGD. Note that SGD converges to a minimum faster, that is, the loss decreases faster.
# Plot the LOSS_BGD and LOSS_Loader
plt.plot(LOSS_BGD, label = "Batch Gradient Descent")
plt.plot(LOSS_Loader,label="Stochastic Gradient Descent with DataLoader")
plt.xlabel('epoch')
plt.ylabel('Cost/ total loss')
plt.legend()
plt.show()

Practice
For practice, try to use SGD with the DataLoader to train the model for 10 iterations. Store the total loss in LOSS; we will use it in the next question.
# Practice: Use SGD with trainloader to train model and store the total loss in LOSS
LOSS = []
w = torch.tensor(-12.0, requires_grad = True)
b = torch.tensor(-10.0, requires_grad = True)
Double-click here for the solution.
Plot the total loss
Double-click here for the solution.
About the Author:
Hi, my name is Juma Shafara. I am a Data Scientist and Instructor at DATAIDEA. I have taught hundreds of people Programming, Data Analysis and Machine Learning.
I also enjoy developing innovative algorithms and models that can drive insights and value.
I regularly share some content that I find useful throughout my learning/teaching journey to simplify concepts in Machine Learning, Mathematics, Programming, and related topics on my website jumashafara.dataidea.org.
Besides the technical stuff, I enjoy watching soccer and movies, and reading mystery books.