Mini-Batch GD
author: Juma Shafara
date: "2024-08-08"
title: Training Two Parameter Mini-Batch Gradient Descent
keywords: [Training Two Parameter, Mini-Batch Gradient Descent, Training Two Parameter Mini-Batch Gradient Descent]
description: In this lab, you will practice training a model by using Mini-Batch Gradient Descent.

Linear Regression 1D: Training Two Parameter Mini-Batch Gradient Descent
Objective
- How to use Mini-Batch Gradient Descent to train a model.
Table of Contents
In this Lab, you will practice training a model by using Mini-Batch Gradient Descent.
- Make Some Data
- Create the Model and Cost Function (Total Loss)
- Train the Model: Batch Gradient Descent
- Train the Model: Stochastic Gradient Descent with Dataset DataLoader
- Train the Model: Mini-Batch Gradient Descent: Batch Size Equals 5
- Train the Model: Mini-Batch Gradient Descent: Batch Size Equals 10
Estimated Time Needed: 30 min
Preparation
We'll need the following libraries:
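The import cell is not shown in this extract; a minimal version, assuming the lab plots with matplotlib (the 3D toolkit is used by the error-surface helper), would be:

# These are the libraries we will use in this lab
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d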
The class plot_error_surfaces is just to help you visualize the data space and the parameter space during training and has nothing to do with PyTorch.
Make Some Data
Import PyTorch and set random seed:
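For example:

# Import PyTorch and set the random seed so results are reproducible
import torch
torch.manual_seed(1)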
Generate values from -3 to 3 that create a line with a slope of 1 and a bias of -1. This is the line that you need to estimate. Add some noise to the data:
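A sketch of this step; the noise scale of 0.1 is an assumption, and the original lab's value may differ:

# Create a line with slope 1 and bias -1, then add Gaussian noise
X = torch.arange(-3, 3, 0.1).view(-1, 1)
f = 1 * X - 1
Y = f + 0.1 * torch.randn(X.size())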
Plot the results:
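Building on the sketch above:

# Plot the noisy samples against the underlying line
plt.plot(X.numpy(), Y.numpy(), 'rx', label='y')
plt.plot(X.numpy(), f.numpy(), label='f')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()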
Create the Model and Cost Function (Total Loss)
Define the forward function:
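The two-parameter model predicts with a slope w and a bias b:

# Prediction function of the two-parameter linear model
def forward(x):
    return w * x + b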
Define the cost or criterion function:
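Assuming the total loss is the mean squared error, as is standard for this lab:

# MSE cost function
def criterion(yhat, y):
    return torch.mean((yhat - y) ** 2)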
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
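The helper class is not defined in this extract, so the constructor arguments below (parameter ranges and sample count) are assumptions:

# Create a plot_error_surfaces object; arguments are assumed to be
# the w range, the b range, the data, and the number of surface samples
get_surface = plot_error_surfaces(15, 13, X, Y, 30)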
Train the Model: Batch Gradient Descent (BGD)
Define train_model_BGD function.
# Define the parameters, learning rate, and loss list
w = torch.tensor(-15.0, requires_grad=True)
b = torch.tensor(-10.0, requires_grad=True)
lr = 0.1
LOSS_BGD = []

# Define the function for training the model with batch gradient descent
def train_model_BGD(epochs):
    for epoch in range(epochs):
        # Make a prediction on the whole training set
        Yhat = forward(X)
        # Compute the total loss and record it
        loss = criterion(Yhat, Y)
        LOSS_BGD.append(loss.tolist())
        get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
        get_surface.plot_ps()
        # Compute the gradients of the loss with respect to w and b
        loss.backward()
        # Update the parameters and zero the gradients
        w.data = w.data - lr * w.grad.data
        b.data = b.data - lr * b.grad.data
        w.grad.data.zero_()
        b.grad.data.zero_()
Run 10 epochs of batch gradient descent. (Known bug: the data space plot is one iteration ahead of the parameter space plot.)
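For example:

train_model_BGD(10)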
Train the Model: Stochastic Gradient Descent (SGD) with Dataset DataLoader
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
Import the Dataset and DataLoader classes:
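For example:

# Import the Dataset and DataLoader classes from PyTorch
from torch.utils.data import Dataset, DataLoader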
Create a Data class:
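A minimal Dataset subclass producing the same noisy line as above; the internals here are an assumption consistent with the earlier data sketch:

# Dataset class for the noisy line data
class Data(Dataset):
    def __init__(self):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.y = 1 * self.x - 1 + 0.1 * torch.randn(self.x.size())
        self.len = self.x.shape[0]

    # Return one (x, y) sample
    def __getitem__(self, index):
        return self.x[index], self.y[index]

    # Return the number of samples
    def __len__(self):
        return self.len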
Create a dataset object and a dataloader object:
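For stochastic gradient descent, the loader yields one sample at a time:

# Create a dataset object and a dataloader with batch size 1
dataset = Data()
trainloader = DataLoader(dataset=dataset, batch_size=1)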
Define train_model_SGD function for training the model.
# Define the parameters, loss list, and learning rate
w = torch.tensor(-15.0, requires_grad=True)
b = torch.tensor(-10.0, requires_grad=True)
LOSS_SGD = []
lr = 0.1

# Define train_model_SGD function
def train_model_SGD(epochs):
    for epoch in range(epochs):
        # Record the total loss on the whole training set for this epoch
        Yhat = forward(X)
        get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), criterion(Yhat, Y).tolist())
        get_surface.plot_ps()
        LOSS_SGD.append(criterion(forward(X), Y).tolist())
        # Update the parameters one sample at a time
        for x, y in trainloader:
            yhat = forward(x)
            loss = criterion(yhat, y)
            get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
            loss.backward()
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data
            w.grad.data.zero_()
            b.grad.data.zero_()
        # Plot the surface and data space after each epoch
        get_surface.plot_ps()
Run 10 epochs of stochastic gradient descent. (Known bug: the data space plot is one iteration ahead of the parameter space plot.)
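For example:

train_model_SGD(10)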
Train the Model: Mini-Batch Gradient Descent: Batch Size Equals 5
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
Create a Data object and a DataLoader object where the batch size equals 5:
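For example:

# Create a dataset object and a dataloader with batch size 5
dataset = Data()
trainloader = DataLoader(dataset=dataset, batch_size=5)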
Define train_model_Mini5 function to train the model.
# Define the parameters, loss list, and learning rate
w = torch.tensor(-15.0, requires_grad=True)
b = torch.tensor(-10.0, requires_grad=True)
LOSS_MINI5 = []
lr = 0.1

# Define train_model_Mini5 function
def train_model_Mini5(epochs):
    for epoch in range(epochs):
        # Record the total loss on the whole training set for this epoch
        Yhat = forward(X)
        get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), criterion(Yhat, Y).tolist())
        get_surface.plot_ps()
        LOSS_MINI5.append(criterion(forward(X), Y).tolist())
        # Update the parameters one mini-batch of 5 samples at a time
        for x, y in trainloader:
            yhat = forward(x)
            loss = criterion(yhat, y)
            get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
            loss.backward()
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data
            w.grad.data.zero_()
            b.grad.data.zero_()
Run 10 epochs of mini-batch gradient descent. (Known bug: the data space plot is one iteration ahead of the parameter space plot.)
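For example:

train_model_Mini5(10)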
Train the Model: Mini-Batch Gradient Descent: Batch Size Equals 10
Create a plot_error_surfaces object to visualize the data space and the parameter space during training:
Create a Data object and a DataLoader object where the batch size equals 10:
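For example:

# Create a dataset object and a dataloader with batch size 10
dataset = Data()
trainloader = DataLoader(dataset=dataset, batch_size=10)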
Define train_model_Mini10 function for training the model.
# Define the parameters, loss list, and learning rate
w = torch.tensor(-15.0, requires_grad=True)
b = torch.tensor(-10.0, requires_grad=True)
LOSS_MINI10 = []
lr = 0.1

# Define train_model_Mini10 function
def train_model_Mini10(epochs):
    for epoch in range(epochs):
        # Record the total loss on the whole training set for this epoch
        Yhat = forward(X)
        get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), criterion(Yhat, Y).tolist())
        get_surface.plot_ps()
        LOSS_MINI10.append(criterion(forward(X), Y).tolist())
        # Update the parameters one mini-batch of 10 samples at a time
        for x, y in trainloader:
            yhat = forward(x)
            loss = criterion(yhat, y)
            get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
            loss.backward()
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data
            w.grad.data.zero_()
            b.grad.data.zero_()
Run 10 epochs of mini-batch gradient descent. (Known bug: the data space plot is one iteration ahead of the parameter space plot.)
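For example:

train_model_Mini10(10)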
Plot the loss for each epoch:
# Plot the loss per epoch for each method
plt.plot(LOSS_BGD, label="Batch Gradient Descent")
plt.plot(LOSS_SGD, label="Stochastic Gradient Descent")
plt.plot(LOSS_MINI5, label="Mini-Batch Gradient Descent, Batch size: 5")
plt.plot(LOSS_MINI10, label="Mini-Batch Gradient Descent, Batch size: 10")
plt.xlabel('epoch')
plt.ylabel('Cost/total loss')
plt.legend()
plt.show()
Practice
Perform mini-batch gradient descent with a batch size of 20. Store the total loss for each epoch in the list LOSS_MINI20.
# Practice: Perform mini-batch gradient descent with a batch size of 20
dataset = Data()
trainloader = DataLoader(dataset=dataset, batch_size=20)

w = torch.tensor(-15.0, requires_grad=True)
b = torch.tensor(-10.0, requires_grad=True)
LOSS_MINI20 = []
lr = 0.1

def my_train_model(epochs):
    for epoch in range(epochs):
        # Record the total loss on the whole training set for this epoch
        Yhat = forward(X)
        get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), criterion(Yhat, Y).tolist())
        get_surface.plot_ps()
        LOSS_MINI20.append(criterion(forward(X), Y).tolist())
        # Update the parameters one mini-batch of 20 samples at a time
        for x, y in trainloader:
            yhat = forward(x)
            loss = criterion(yhat, y)
            get_surface.set_para_loss(w.data.tolist(), b.data.tolist(), loss.tolist())
            loss.backward()
            w.data = w.data - lr * w.grad.data
            b.data = b.data - lr * b.grad.data
            w.grad.data.zero_()
            b.grad.data.zero_()

# Run 10 epochs with batch size 20
my_train_model(10)
Plot a graph that shows the LOSS results for all the methods.
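A sketch of one possible solution, assuming all five loss lists have been populated by the training runs above:

# Plot the loss per epoch for all five methods
plt.plot(LOSS_BGD, label="Batch Gradient Descent")
plt.plot(LOSS_SGD, label="Stochastic Gradient Descent")
plt.plot(LOSS_MINI5, label="Mini-Batch Gradient Descent, Batch size: 5")
plt.plot(LOSS_MINI10, label="Mini-Batch Gradient Descent, Batch size: 10")
plt.plot(LOSS_MINI20, label="Mini-Batch Gradient Descent, Batch size: 20")
plt.xlabel('epoch')
plt.ylabel('Cost/total loss')
plt.legend()
plt.show()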
About the Author:
Hi, my name is Juma Shafara. I am a Data Scientist and Instructor at DATAIDEA, and I have taught hundreds of people Programming, Data Analysis, and Machine Learning.
I also enjoy developing innovative algorithms and models that can drive insights and value.
I regularly share content that I find useful throughout my learning/teaching journey to simplify concepts in Machine Learning, Mathematics, Programming, and related topics on my website, jumashafara.dataidea.org.
Besides the technical stuff, I enjoy watching soccer, watching movies, and reading mystery books.