Simple Datasets
title: Simple Dataset
author: Juma Shafara
date: "2023-09"
date-modified: "2024-09-02"
description: In this lab, you will construct a basic dataset by using PyTorch and learn how to apply basic transformations to it.
keywords: [ custom dataset classes in pytorch, Simple dataset, Transforms, Compose, ]

Objective
- How to create a dataset in PyTorch.
- How to perform transformations on the dataset.
In this lab, you will construct a basic dataset by using PyTorch and learn how to apply basic transformations to it.
Estimated Time Needed: 30 min
Preparation
The following libraries are used in this lab. torch.manual_seed() seeds PyTorch's random number generator, so any random values come out the same every time the notebook is rerun.
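The import cell itself is not shown in this copy of the lab. The block below is a minimal sketch of what it likely contained: torch and the Dataset base class (torchvision is imported later, in the Compose section). The seed value 1 is an arbitrary assumption. The block also shows what seeding buys you: re-seeding reproduces the same random draws.

```python
# Libraries used in this lab (assumed; the original import cell is not shown)
import torch
from torch.utils.data import Dataset

# Seed the global random number generator (the value 1 is an arbitrary choice)
torch.manual_seed(1)
first_draw = torch.rand(3)

# Re-seeding with the same value reproduces exactly the same "random" numbers
torch.manual_seed(1)
second_draw = torch.rand(3)

print(torch.equal(first_draw, second_draw))  # True
```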
Simple dataset
Let us try to create our own dataset class.
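# Import the libraries this class needs
import torch
from torch.utils.data import Dataset

# Define class for dataset
class toy_set(Dataset):

    # Constructor with default values: every x row is [2, 2], every y row is [1]
    def __init__(self, length = 10, transform = None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    # Getter: return the (x, y) pair at index, transformed if a transform was given
    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    # Get length: the number of samples in the dataset
    def __len__(self):
        return self.len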
Now, let us create our toy_set object, and find out the value at index 1 and the length of the initial dataset.
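Here is a minimal sketch of this step. toy_set is restated with its imports so the block runs on its own:

```python
import torch
from torch.utils.data import Dataset


class toy_set(Dataset):
    # Same toy dataset as above: x rows are [2, 2], y rows are [1]
    def __init__(self, length=10, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


data_set = toy_set()
print("Value at index 1:", data_set[1])  # (tensor([2., 2.]), tensor([1.]))
print("Length:", len(data_set))          # 10
```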
A toy_set object supports the same indexing syntax as a list, and the built-in len function works on it. This comes from the two special methods we defined: __getitem__(self, index) controls indexing, and __len__(self) controls what len returns.
Now, let us print out the first 3 elements and assign them to x and y:
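This sketch restates toy_set so the block is self-contained. It then reads the first 3 elements and unpacks each one into x and y:

```python
import torch
from torch.utils.data import Dataset


class toy_set(Dataset):
    # Same toy dataset as above: x rows are [2, 2], y rows are [1]
    def __init__(self, length=10, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


data_set = toy_set()

# Each element is an (x, y) tuple, so it can be unpacked directly
for i in range(3):
    x, y = data_set[i]
    print("Index:", i, "x:", x, "y:", y)
```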
The dataset object is iterable, so we can loop over it directly. Because toy_set defines __getitem__, a for loop requests indices 0, 1, 2, and so on, stopping when an index is out of range.
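Here is a short sketch of looping over the dataset directly (toy_set restated so the block runs on its own). toy_set defines no __iter__ method, so Python falls back to calling __getitem__ with 0, 1, 2, and so on. The loop ends when tensor indexing raises IndexError, which happens at index 10 because the underlying tensors have 10 rows. The loop itself never calls __len__.

```python
import torch
from torch.utils.data import Dataset


class toy_set(Dataset):
    # Same toy dataset as above: x rows are [2, 2], y rows are [1]
    def __init__(self, length=10, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


data_set = toy_set()

count = 0
for x, y in data_set:
    print("x:", x, "y:", y)
    count += 1

print("Elements visited:", count)  # 10
```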
An existing dataset
For learning purposes, we will use a simple dataset called music from the dataidea package. It has two features, age and gender, and one outcome variable, genre.
We can create a custom class for this dataset as demonstrated below.
Now let's create a MusicDataset object and use its methods to access some of the data and information about it.
Let's have a look at the first 5 rows
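The MusicDataset cell and the dataidea loading call are not shown in this copy. Below is a minimal sketch under stated assumptions. It assumes the data is already in a pandas DataFrame with columns age, gender and genre, and that gender is numerically encoded. How the DataFrame is obtained from dataidea is not shown. The rows below are made-up placeholders, not the real music data.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class MusicDataset(Dataset):
    # Assumed sketch: wraps a DataFrame with feature columns 'age', 'gender'
    # and target column 'genre'
    def __init__(self, df):
        # Features as a float tensor of shape (n_rows, 2); assumes gender is numeric
        self.x = torch.tensor(df[["age", "gender"]].values, dtype=torch.float32)
        # Genre labels kept as Python strings (tensors cannot hold strings)
        self.y = df["genre"].tolist()
        self.len = len(df)

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return self.len


# Placeholder rows for illustration only; NOT the actual dataidea music data
df = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30],
    "gender": [1, 1, 0, 0, 1, 0],
    "genre": ["A", "B", "C", "A", "B", "C"],
})

music_data = MusicDataset(df)
print("Number of rows:", len(music_data))

# Look at the first 5 rows through the dataset's own indexing
for i in range(5):
    features, genre = music_data[i]
    print(i, features, genre)
```

pandas' own df.head() would show the same first 5 rows as a table. The loop above goes through the dataset's __getitem__ instead.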
Practice
Try to create a toy_set object with length 50. Print out the length of your object.
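One possible solution, with toy_set restated so the block runs on its own. The length argument sets both the number of rows in the tensors and the value returned by len:

```python
import torch
from torch.utils.data import Dataset


class toy_set(Dataset):
    # Same toy dataset as above: x rows are [2, 2], y rows are [1]
    def __init__(self, length=10, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


my_data_set = toy_set(length=50)
print("Length:", len(my_data_set))  # 50
```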
Transforms
You can also create a class for transforming the data. In this case, we will try to add 1 to x and multiply y by 2:
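The transform class's code cell is not shown here. This is a sketch of one way to write it. The parameter names addx and muly and their defaults are assumptions. A transform is simply a callable object: __call__ receives an (x, y) sample and returns a new one.

```python
import torch


class add_mult(object):
    # Assumed reconstruction: add addx to x and multiply y by muly
    def __init__(self, addx=1, muly=2):
        self.addx = addx
        self.muly = muly

    def __call__(self, sample):
        x, y = sample
        x = x + self.addx
        y = y * self.muly
        return x, y


# Try it on a single sample shaped like one toy_set element
sample = (2 * torch.ones(2), torch.ones(1))
x_, y_ = add_mult()(sample)
print(x_, y_)  # tensor([3., 3.]) tensor([2.])
```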
Now, create a transform object:
Assign the outputs of the original dataset to x and y. Then, apply the transform object to the same sample and store the outputs as x_ and y_, respectively:
As a result, 1 has been added to each element of x and y has been multiplied by 2: [2, 2] + 1 = [3, 3] and [1] x 2 = [2].
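A self-contained sketch of this step follows. It restates toy_set and the assumed add_mult class (hypothetical addx/muly parameters, since the original cell is not shown), creates the transform object a_m, and applies it to the sample at index 0:

```python
import torch
from torch.utils.data import Dataset


class toy_set(Dataset):
    # Same toy dataset as above: x rows are [2, 2], y rows are [1]
    def __init__(self, length=10, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


class add_mult(object):
    # Assumed reconstruction: add addx to x and multiply y by muly
    def __init__(self, addx=1, muly=2):
        self.addx = addx
        self.muly = muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly


data_set = toy_set()
a_m = add_mult()

# Original sample at index 0
x, y = data_set[0]
# The same sample after the transform
x_, y_ = a_m(data_set[0])
print("Original x:", x, "Original y:", y)
print("Transformed x_:", x_, "Transformed y_:", y_)
```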
Can we apply the transform automatically every time we create a new toy_set object? Yes. Remember that the toy_set constructor has the parameter transform = None. When we create a new object, we can pass the transform object to this parameter, as the following code demonstrates.
This stores the a_m transform in cust_data_set, and __getitem__ applies it to every element it returns. Let us print out the first 10 elements of cust_data_set to check that a_m has been applied.
The result is the same as the previous method.
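This sketch passes the transform to the constructor (toy_set and the assumed add_mult class are restated so it runs on its own). It prints the first 10 elements:

```python
import torch
from torch.utils.data import Dataset


class toy_set(Dataset):
    # Same toy dataset as above: x rows are [2, 2], y rows are [1]
    def __init__(self, length=10, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


class add_mult(object):
    # Assumed reconstruction: add addx to x and multiply y by muly
    def __init__(self, addx=1, muly=2):
        self.addx = addx
        self.muly = muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly


a_m = add_mult()
cust_data_set = toy_set(transform=a_m)

for i in range(10):
    x_, y_ = cust_data_set[i]
    print("Index:", i, "Transformed x_:", x_, "Transformed y_:", y_)
```

The stored tensors cust_data_set.x and cust_data_set.y are never modified. The transform runs each time an element is fetched, so the raw values are still [2, 2] and [1].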
Compose
You can compose multiple transforms on the dataset object. First, import transforms from torchvision:
Then, create a new transform class that multiplies each of the elements by 100:
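Here is a sketch of such a class. Its original cell is not shown; the parameter name mult and the default of 100 are assumptions. It follows the same callable pattern as add_mult:

```python
import torch


class mult(object):
    # Assumed reconstruction: multiply both x and y by the same factor
    def __init__(self, mult=100):
        self.mult = mult

    def __call__(self, sample):
        x, y = sample
        return x * self.mult, y * self.mult


# Try it on a single sample shaped like one toy_set element
x_m, y_m = mult()((2 * torch.ones(2), torch.ones(1)))
print(x_m, y_m)  # tensor([200., 200.]) tensor([100.])
```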
Now let us try to combine the transforms add_mult and mult.
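Here is a sketch of the combination using torchvision's transforms.Compose, applied to a single sample. add_mult and mult are the assumed reconstructions from above. If torchvision is not installed, the block falls back to a minimal stand-in class that, like Compose, calls each transform in list order. The fallback is ours, not part of the lab.

```python
import torch

try:
    from torchvision import transforms
    Compose = transforms.Compose
except ImportError:
    # Fallback only if torchvision is missing: apply each transform in list order
    class Compose(object):
        def __init__(self, transforms):
            self.transforms = transforms

        def __call__(self, sample):
            for t in self.transforms:
                sample = t(sample)
            return sample


class add_mult(object):
    # Assumed reconstruction: add addx to x and multiply y by muly
    def __init__(self, addx=1, muly=2):
        self.addx = addx
        self.muly = muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly


class mult(object):
    # Assumed reconstruction: multiply both x and y by the same factor
    def __init__(self, mult=100):
        self.mult = mult

    def __call__(self, sample):
        x, y = sample
        return x * self.mult, y * self.mult


# add_mult runs first, then mult receives its output
data_transform = Compose([add_mult(), mult()])
x_co, y_co = data_transform((2 * torch.ones(2), torch.ones(1)))
print(x_co, y_co)  # tensor([300., 300.]) tensor([200.])
```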
The new Compose object applies the transforms sequentially, in the order they are listed, so each transform receives the output of the previous one, as shown in this figure:

Now we can pass the new Compose object (the combination of add_mult() and mult()) to the constructor when creating a toy_set object.
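Here is a self-contained sketch of building compose_data_set. It restates toy_set, the assumed add_mult and mult classes, and the same torchvision import with a minimal fallback if torchvision is missing:

```python
import torch
from torch.utils.data import Dataset

try:
    from torchvision import transforms
    Compose = transforms.Compose
except ImportError:
    # Fallback only if torchvision is missing: apply each transform in list order
    class Compose(object):
        def __init__(self, transforms):
            self.transforms = transforms

        def __call__(self, sample):
            for t in self.transforms:
                sample = t(sample)
            return sample


class toy_set(Dataset):
    # Same toy dataset as above: x rows are [2, 2], y rows are [1]
    def __init__(self, length=10, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


class add_mult(object):
    # Assumed reconstruction: add addx to x and multiply y by muly
    def __init__(self, addx=1, muly=2):
        self.addx, self.muly = addx, muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly


class mult(object):
    # Assumed reconstruction: multiply both x and y by the same factor
    def __init__(self, mult=100):
        self.mult = mult

    def __call__(self, sample):
        x, y = sample
        return x * self.mult, y * self.mult


data_transform = Compose([add_mult(), mult()])
compose_data_set = toy_set(transform=data_transform)

x_co, y_co = compose_data_set[0]
print("Compose transformed x_co:", x_co, "y_co:", y_co)
```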
Let us print out the first 3 elements in different toy_set datasets in order to compare the output after different transforms have been applied:
# Use loop to print out first 3 elements in dataset
for i in range(3):
    x, y = data_set[i]
    print('Index: ', i, 'Original x: ', x, 'Original y: ', y)
    x_, y_ = cust_data_set[i]
    print('Index: ', i, 'Transformed x_:', x_, 'Transformed y_:', y_)
    x_co, y_co = compose_data_set[i]
    print('Index: ', i, 'Compose Transformed x_co: ', x_co, 'Compose Transformed y_co: ', y_co)
Let us see what happened at index 0. The original value of x is [2, 2], and the original value of y is [1]. Applying only add_mult() to the original dataset gave x = [3, 3] and y = [2]. After applying both add_mult() and mult(), x is [300, 300] and y is [200]. The calculation equivalent to the composed transform is x = ([2, 2] + 1) x 100 = [300, 300] and y = ([1] x 2) x 100 = [200].
Practice
Try to combine mult() and add_mult() so that mult() is executed first, and apply the result to a new toy_set dataset. Print out the first 3 elements of the transformed dataset.
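One possible solution, reusing the assumed add_mult and mult reconstructions and the same torchvision import with a fallback. Listing mult() first reverses the order. x becomes [2, 2] x 100 + 1 = [201, 201], while y is still [1] x 100 x 2 = [200], because both steps only multiply y.

```python
import torch
from torch.utils.data import Dataset

try:
    from torchvision import transforms
    Compose = transforms.Compose
except ImportError:
    # Fallback only if torchvision is missing: apply each transform in list order
    class Compose(object):
        def __init__(self, transforms):
            self.transforms = transforms

        def __call__(self, sample):
            for t in self.transforms:
                sample = t(sample)
            return sample


class toy_set(Dataset):
    # Same toy dataset as above: x rows are [2, 2], y rows are [1]
    def __init__(self, length=10, transform=None):
        self.len = length
        self.x = 2 * torch.ones(length, 2)
        self.y = torch.ones(length, 1)
        self.transform = transform

    def __getitem__(self, index):
        sample = self.x[index], self.y[index]
        if self.transform:
            sample = self.transform(sample)
        return sample

    def __len__(self):
        return self.len


class add_mult(object):
    # Assumed reconstruction: add addx to x and multiply y by muly
    def __init__(self, addx=1, muly=2):
        self.addx, self.muly = addx, muly

    def __call__(self, sample):
        x, y = sample
        return x + self.addx, y * self.muly


class mult(object):
    # Assumed reconstruction: multiply both x and y by the same factor
    def __init__(self, mult=100):
        self.mult = mult

    def __call__(self, sample):
        x, y = sample
        return x * self.mult, y * self.mult


# mult runs first this time, then add_mult
reversed_transform = Compose([mult(), add_mult()])
reversed_data_set = toy_set(transform=reversed_transform)

for i in range(3):
    x_r, y_r = reversed_data_set[i]
    print("Index:", i, "x:", x_r, "y:", y_r)
```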