Programming for Data Science

    Test Uniform, Default and He Initialization on MNIST Dataset with ReLU Activation

    Objective for this Notebook

    1. Learn how to define several neural networks, a criterion function, and an optimizer.
    2. Test Uniform, Default, and He Initialization.

    Table of Contents

    In this lab, you will test Uniform Initialization, Default Initialization, and He Initialization on the MNIST dataset with ReLU activation.

    • Neural Network Module and Training Function
    • Make Some Data
    • Define Several Neural Network, Criterion function, Optimizer
    • Test Uniform, Default and He Initialization
    • Analyze Results

    Estimated Time Needed: 25 min


    Preparation

    We'll need the following libraries:

    In [ ]:
    # Import the libraries we need to use in this lab
    
    # Uncomment the following line to install the torchvision library
    # !mamba install -y torchvision
    
    import torch 
    import torch.nn as nn
    import torchvision.transforms as transforms
    import torchvision.datasets as dsets
    import torch.nn.functional as F
    import matplotlib.pylab as plt
    import numpy as np
    
    torch.manual_seed(0)
    

    Neural Network Module and Training Function

    Define the neural network module or class with He Initialization

    In [ ]:
    # Define the class for neural network model with He Initialization
    
    class Net_He(nn.Module):
        
        # Constructor
        def __init__(self, Layers):
            super(Net_He, self).__init__()
            self.hidden = nn.ModuleList()
    
            for input_size, output_size in zip(Layers, Layers[1:]):
                linear = nn.Linear(input_size, output_size)
                torch.nn.init.kaiming_uniform_(linear.weight, nonlinearity='relu')
                self.hidden.append(linear)
    
        # Prediction
        def forward(self, x):
            L = len(self.hidden)
            for (l, linear_transform) in zip(range(L), self.hidden):
                if l < L - 1:
                    x = F.relu(linear_transform(x))
                else:
                    x = linear_transform(x)
            return x
    
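    As a quick sanity check on the math behind `kaiming_uniform_`: for a layer with `fan_in` inputs and the ReLU gain of sqrt(2), the weights are drawn from U(-b, b) with b = sqrt(3) * sqrt(2) / sqrt(fan_in) = sqrt(6 / fan_in). A minimal NumPy sketch of that bound (the helper names here are illustrative, not part of the lab):

    ```python
    import numpy as np

    def he_uniform_bound(fan_in):
        # He/Kaiming uniform: ReLU gain is sqrt(2), std = gain / sqrt(fan_in),
        # and the uniform bound is sqrt(3) * std = sqrt(6 / fan_in)
        return np.sqrt(6.0 / fan_in)

    def he_uniform_init(rng, fan_in, fan_out):
        bound = he_uniform_bound(fan_in)
        return rng.uniform(-bound, bound, size=(fan_out, fan_in))

    rng = np.random.default_rng(0)
    W = he_uniform_init(rng, fan_in=28 * 28, fan_out=100)
    print(round(he_uniform_bound(28 * 28), 4))  # sqrt(6/784) ≈ 0.0875
    print(bool(np.abs(W).max() <= he_uniform_bound(28 * 28)))  # True
    ```

    Note how the bound shrinks as `fan_in` grows, which is what keeps the ReLU pre-activations from growing with layer width.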

    Define the neural network class with Uniform Initialization

    In [ ]:
    # Define the class for neural network model with Uniform Initialization
    
    class Net_Uniform(nn.Module):
        
        # Constructor
        def __init__(self, Layers):
            super(Net_Uniform, self).__init__()
            self.hidden = nn.ModuleList()
    
            for input_size, output_size in zip(Layers, Layers[1:]):
            linear = nn.Linear(input_size, output_size)
                linear.weight.data.uniform_(0, 1)
                self.hidden.append(linear)
        
        # Prediction
        def forward(self, x):
            L = len(self.hidden)
            for (l, linear_transform) in zip(range(L), self.hidden):
                if l < L - 1:
                    x = F.relu(linear_transform(x))
                else:
                    x = linear_transform(x)
                    
            return x
    
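    To see why U(0, 1) weights are a poor choice, note that they have mean 0.5 and variance 1/12, so pre-activations grow with layer width, while He-uniform weights are zero-mean with variance 2 / fan_in. A rough NumPy illustration of the two scales (the sizes here are arbitrary):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    fan_in = 28 * 28
    x = rng.standard_normal((1000, fan_in))  # unit-variance inputs

    # U(0, 1) weights: pre-activation std grows like sqrt(fan_in / 3)
    W_uniform = rng.uniform(0.0, 1.0, size=(fan_in, 100))

    # He-uniform weights: pre-activation std stays near sqrt(2)
    bound = np.sqrt(6.0 / fan_in)
    W_he = rng.uniform(-bound, bound, size=(fan_in, 100))

    print(round(float((x @ W_uniform).std()), 1))  # roughly sqrt(784/3) ≈ 16
    print(round(float((x @ W_he).std()), 1))       # roughly sqrt(2) ≈ 1.4
    ```

    With the signal scale roughly ten times larger per layer, the U(0, 1) network saturates quickly, which is what the training curves at the end of the lab show.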

    Define the neural network class with PyTorch Default Initialization

    In [ ]:
    # Define the class for neural network model with PyTorch Default Initialization
    
    class Net(nn.Module):
        
        # Constructor
        def __init__(self, Layers):
            super(Net, self).__init__()
            self.hidden = nn.ModuleList()
    
            for input_size, output_size in zip(Layers, Layers[1:]):
                linear = nn.Linear(input_size, output_size)
                self.hidden.append(linear)
            
        def forward(self, x):
            L = len(self.hidden)
            for (l, linear_transform) in zip(range(L), self.hidden):
                if l < L - 1:
                    x = F.relu(linear_transform(x))
                else:
                    x = linear_transform(x)
                    
            return x
    

    Define a function to train the model. The function returns a Python dictionary that stores the training loss and the accuracy on the validation data.

    In [ ]:
    # Define the function to train the model
    
    def train(model, criterion, train_loader, validation_loader, optimizer, epochs=100):
        loss_accuracy = {'training_loss': [], 'validation_accuracy': []}
    
        for epoch in range(epochs):
            for x, y in train_loader:
                optimizer.zero_grad()
                z = model(x.view(-1, 28 * 28))
                loss = criterion(z, y)
                loss.backward()
                optimizer.step()
                loss_accuracy['training_loss'].append(loss.item())
    
            # Evaluate accuracy on the validation set after each epoch
            correct = 0
            for x, y in validation_loader:
                yhat = model(x.view(-1, 28 * 28))
                _, label = torch.max(yhat, 1)
                correct += (label == y).sum().item()
            accuracy = 100 * (correct / len(validation_loader.dataset))
            loss_accuracy['validation_accuracy'].append(accuracy)
    
        return loss_accuracy
    
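    The validation step above takes the argmax over the class logits and counts matches against the labels. A standalone NumPy sketch of that step (the logits and labels here are made up for illustration):

    ```python
    import numpy as np

    # Predictions are the argmax over class logits; accuracy is the
    # fraction of predictions that match the labels
    yhat = np.array([[0.1, 2.0, -1.0],
                     [1.5, 0.2, 0.3],
                     [0.0, 0.1, 0.9]])
    y = np.array([1, 0, 2])

    label = yhat.argmax(axis=1)           # same role as torch.max(yhat, 1)
    accuracy = 100 * (label == y).mean()
    print(accuracy)  # 100.0
    ```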

    Make Some Data

    Load the training dataset by setting the parameter train to True, and convert the images to tensors by passing a transform object in the argument transform.

    In [ ]:
    # Create the training dataset
    
    train_dataset = dsets.MNIST(root='./data', train=True, download=True, transform=transforms.ToTensor())
    

    Load the validation dataset by setting the parameter train to False, and convert the images to tensors by passing a transform object in the argument transform.

    In [ ]:
    # Create the validation dataset
    
    validation_dataset = dsets.MNIST(root='./data', train=False, download=True, transform=transforms.ToTensor())
    

    Create the training and validation data loader objects.

    In [ ]:
    # Create the data loader for training and validation
    
    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=2000, shuffle=True)
    validation_loader = torch.utils.data.DataLoader(dataset=validation_dataset, batch_size=5000, shuffle=False)
    

    Define the Neural Network, Criterion Function, and Optimizer, and Train the Model

    Create the criterion function

    In [ ]:
    # Create the criterion function
    
    criterion = nn.CrossEntropyLoss()
    
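    For a single example, `nn.CrossEntropyLoss` takes raw logits and a class index and returns the negative log-softmax of the logits at that index. A small NumPy sketch of that computation (the helper name is illustrative):

    ```python
    import numpy as np

    def cross_entropy(logits, target):
        # Negative log-softmax of the logits at the target class index,
        # computed with a max-shift for numerical stability
        z = logits - logits.max()
        log_softmax = z - np.log(np.exp(z).sum())
        return -log_softmax[target]

    loss = cross_entropy(np.array([2.0, 1.0, 0.1]), target=0)
    print(round(float(loss), 3))  # ≈ 0.417
    ```

    With all-equal logits the loss reduces to log of the number of classes, which is a handy baseline: an untrained 10-class MNIST model should start near log(10) ≈ 2.3.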

    Create a list that contains the layer sizes

    In [ ]:
    # Create the parameters
    
    input_dim = 28 * 28
    output_dim = 10
    layers = [input_dim, 100, 200, 100, output_dim]
    
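    Each consecutive pair in the list becomes one `nn.Linear(in, out)` layer with `in * out` weights plus `out` biases. A small hypothetical helper (not part of the lab) to count the parameters this list implies:

    ```python
    # Hypothetical helper: total learnable parameters implied by the
    # layer-size list, since nn.Linear(i, o) has i*o weights + o biases
    def count_parameters(layers):
        return sum(i * o + o for i, o in zip(layers, layers[1:]))

    layers = [28 * 28, 100, 200, 100, 10]
    print(count_parameters(layers))  # 119810
    ```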

    Test PyTorch Default Initialization, He Initialization and Uniform Initialization

    Train the network using PyTorch Default Initialization

    In [ ]:
    # Train the model with the default initialization
    
    model = Net(layers)
    learning_rate = 0.01
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    training_results = train(model, criterion, train_loader,validation_loader, optimizer, epochs=30)
    

    Train the network using He Initialization

    In [ ]:
    # Train the model with the He initialization
    
    model_He = Net_He(layers)
    optimizer = torch.optim.SGD(model_He.parameters(), lr=learning_rate)
    training_results_He = train(model_He, criterion, train_loader, validation_loader, optimizer, epochs=30)
    

    Train the network using Uniform Initialization

    In [ ]:
    # Train the model with the Uniform initialization
    
    model_Uniform = Net_Uniform(layers)
    optimizer = torch.optim.SGD(model_Uniform.parameters(), lr=learning_rate)
    training_results_Uniform = train(model_Uniform, criterion, train_loader, validation_loader, optimizer, epochs=30)
    

    Analyze Results

    Compare the training loss for each initialization

    In [ ]:
    # Plot the training loss
    
    plt.plot(training_results_He['training_loss'], label='He')
    plt.plot(training_results['training_loss'], label='Default')
    plt.plot(training_results_Uniform['training_loss'], label='Uniform')
    plt.ylabel('loss')
    plt.xlabel('iteration')
    plt.title('training loss vs iterations')
    plt.legend()
    plt.show()
    

    Compare the validation accuracy for each model

    In [ ]:
    # Plot the accuracy
    
    plt.plot(training_results_He['validation_accuracy'], label='He')
    plt.plot(training_results['validation_accuracy'], label='Default')
    plt.plot(training_results_Uniform['validation_accuracy'], label='Uniform')
    plt.ylabel('validation accuracy')
    plt.xlabel('epochs')
    plt.legend()
    plt.show()
    


    June 3, 2025

    © 2025 DATAIDEA. All rights reserved. Built with ❤️ by Juma Shafara.