import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
Classification Metrics Practice
machine learning, machine learning classification, machine learning classification metrics, decision trees, python, precision, recall, f1 score, weighted, accuracy, linear regression
In this notebook, we’ll walk through the process of building and evaluating a decision tree classifier using Scikit-Learn. We’ll use the Iris dataset for demonstration and then provide an exercise to apply the same steps to the Wine dataset.
Importing Necessary Libraries
First, we import the necessary libraries for data manipulation and loading the dataset.
- numpy and pandas are imported for data manipulation.
- load_iris from sklearn.datasets is imported to load the Iris dataset.
Loading the Iris Dataset
iris = load_iris()
The Iris dataset is loaded and stored in the variable iris.
Displaying Dataset Description
For a better understanding of the dataset, we can uncomment the following line to print the description of the Iris dataset.
## uncomment and run to read the data description
# print(iris['DESCR'])
Extracting Features and Target Variables
X = iris['data']
y = iris['target']
- X contains the feature data (sepal length, sepal width, petal length, petal width).
- y contains the target data (class labels: 0, 1, 2).
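To make the two arrays easier to inspect, the sketch below puts them in a pandas DataFrame labelled with the names stored in the dataset. The variable name df and the column name species are illustrative choices, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
X = iris["data"]    # shape (150, 4): one row per flower, one column per measurement
y = iris["target"]  # shape (150,): integer class labels 0, 1, 2

# Label the columns with the feature names stored in the dataset
df = pd.DataFrame(X, columns=iris["feature_names"])

# Map each integer label to its species name (hypothetical column name "species")
df["species"] = iris["target_names"][y]

print(df.head())
print(df["species"].value_counts())
```

Viewing the data this way makes it clear which class each label refers to before any model is trained.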
Importing Train-Test Split Function
from sklearn.model_selection import train_test_split
train_test_split is imported to split the data into training and testing sets.
Splitting the Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
The dataset is split into training (70%) and testing (30%) sets.
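Because the call above does not fix a random seed, the split (and every score computed from it) changes from run to run. The following is a minimal sketch of a reproducible variant; random_state=42 and stratify=y are assumptions added here for illustration and are not used in the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris["data"], iris["target"]

# random_state fixes the shuffle so the same rows land in each set every run.
# stratify=y keeps the class proportions the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_test))  # number of test samples per class
```

With stratification, each of the three classes contributes the same number of samples to the test set, which is not guaranteed by a plain random split.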
Importing Decision Tree Classifier
Next, we import the Decision Tree classifier from Scikit-Learn.
from sklearn.tree import DecisionTreeClassifier
Initializing the Classifier
We create an instance of the Decision Tree classifier.
classifier = DecisionTreeClassifier()
Training the Classifier
We train the classifier using the training data.
classifier.fit(X_train, y_train)
DecisionTreeClassifier()
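Once fitted, the tree can be inspected to see what it learned. The sketch below is one possible way to do this, assuming a fixed random_state of 42 (not used in the original notebook) so that the fitted tree is the same on every run.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], test_size=0.3, random_state=42
)

classifier = DecisionTreeClassifier(random_state=42)
classifier.fit(X_train, y_train)

# Size of the fitted tree
print("depth:", classifier.get_depth())
print("leaves:", classifier.get_n_leaves())

# Relative contribution of each feature to the splits (sums to 1)
for name, importance in zip(iris["feature_names"], classifier.feature_importances_):
    print(f"{name}: {importance:.3f}")

# Plain-text view of the learned decision rules
print(export_text(classifier, feature_names=list(iris["feature_names"])))
```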
Making Predictions
We then make predictions on the test data using the model's predict() method.
preds = classifier.predict(X_test)
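predict() returns one class label per test row. As a sketch of how those labels relate to the model's class probabilities, the example below also calls predict_proba(); the fixed random_state values are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], test_size=0.3, random_state=42
)

classifier = DecisionTreeClassifier(random_state=42)
classifier.fit(X_train, y_train)

preds = classifier.predict(X_test)        # one label (0, 1 or 2) per test row
proba = classifier.predict_proba(X_test)  # one probability per class per test row

print(preds[:5])
print(proba[:5])
```

Each predicted label is the class with the highest probability in the corresponding row of proba.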
Importing Metrics for Evaluation
To evaluate our model, we import various metrics from Scikit-Learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
Calculating Accuracy
Accuracy refers to the proportion of correctly predicted instances out of the total instances.
accuracy_score(y_test, preds)
0.9777777777777777
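The definition can be checked by hand. The sketch below rebuilds label arrays from the per-class counts reported for this run in the confusion matrix later in the notebook (the order of the samples is an assumption and does not affect the metric), then compares a manual calculation with accuracy_score.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Rebuilt from the reported counts: 17 samples of class 0, 16 of class 1,
# 12 of class 2, with one class 1 sample predicted as class 2.
y_true = np.array([0] * 17 + [1] * 16 + [2] * 12)
y_pred = np.array([0] * 17 + [1] * 15 + [2] * 1 + [2] * 12)

# Accuracy = number of correct predictions / total number of predictions
manual_accuracy = (y_true == y_pred).mean()
sk_accuracy = accuracy_score(y_true, y_pred)

print(manual_accuracy, sk_accuracy)
```

Both values equal 44/45, since 44 of the 45 test samples are classified correctly.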
Calculating Precision
Precision is the ratio of correctly predicted positive observations to the total predicted positives.
precision_score(y_test, preds, average='weighted')
0.9794871794871796
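With three classes, each class is treated in turn as the "positive" class, and average='weighted' combines the per-class values using the number of true samples in each class (the support) as weights. The sketch below illustrates this on label arrays rebuilt from the counts reported for this run; the sample order is an assumption.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([0] * 17 + [1] * 16 + [2] * 12)
y_pred = np.array([0] * 17 + [1] * 15 + [2] * 1 + [2] * 12)

# average=None returns one precision value per class
per_class = precision_score(y_true, y_pred, average=None)

# Support = number of true samples of each class
support = np.bincount(y_true)

# Weighted average of the per-class values, computed by hand
manual_weighted = np.average(per_class, weights=support)
sk_weighted = precision_score(y_true, y_pred, average="weighted")

print(per_class)
print(manual_weighted, sk_weighted)
```

Only class 2 has a precision below 1, because 13 samples were predicted as class 2 and 12 of them were correct.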
Calculating Recall
Recall is the ratio of correctly predicted positive observations to all the actual positives.
recall_score(y_test, preds, average='weighted')
0.9777777777777777
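The weighted recall above is identical to the accuracy computed earlier. This is not a coincidence: weighting each class's recall by its support adds up all correct predictions and divides by the total. A small sketch, again using label arrays rebuilt from the counts reported for this run (sample order assumed):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

y_true = np.array([0] * 17 + [1] * 16 + [2] * 12)
y_pred = np.array([0] * 17 + [1] * 15 + [2] * 1 + [2] * 12)

# One recall value per class: correct predictions / true samples of that class
per_class = recall_score(y_true, y_pred, average=None)

weighted_recall = recall_score(y_true, y_pred, average="weighted")
accuracy = accuracy_score(y_true, y_pred)

print(per_class)
print(weighted_recall, accuracy)
```

Class 1 has a recall of 15/16 because one of its 16 samples was assigned to another class.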
Calculating F1 Score
The F1 score is the harmonic mean of precision and recall.
f1_score(y_test, preds, average='weighted')
0.977863799283154
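For the weighted F1 score, the harmonic mean is taken per class first and the per-class values are then averaged by support. It is not the harmonic mean of the weighted precision and weighted recall. The sketch below shows the calculation on label arrays rebuilt from the counts reported for this run (sample order assumed).

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = np.array([0] * 17 + [1] * 16 + [2] * 12)
y_pred = np.array([0] * 17 + [1] * 15 + [2] * 1 + [2] * 12)

precision = precision_score(y_true, y_pred, average=None)
recall = recall_score(y_true, y_pred, average=None)

# Harmonic mean of precision and recall, one value per class
f1_per_class = 2 * precision * recall / (precision + recall)

# Average the per-class F1 values, weighted by the number of true samples
support = np.bincount(y_true)
manual_weighted_f1 = np.average(f1_per_class, weights=support)
sk_weighted_f1 = f1_score(y_true, y_pred, average="weighted")

print(f1_per_class)
print(manual_weighted_f1, sk_weighted_f1)
```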
Displaying the Classification Report
We can print the classification report, which provides precision, recall, F1-score, and support for each class.
from sklearn.metrics import classification_report
report = classification_report(y_test, preds)
print(report)
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        17
           1       1.00      0.94      0.97        16
           2       0.92      1.00      0.96        12
    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45
In this run, class 0 is classified perfectly, while class 1 has a recall of 0.94 and class 2 has a precision of 0.92. This pattern is consistent with a single class 1 sample being misclassified as class 2.
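The report can also be returned as a dictionary, which is convenient when individual values are needed in code. The sketch below uses label arrays rebuilt from the counts reported for this run (sample order assumed) and passes the Iris species names as target_names so that the rows are labelled by species instead of by integer.

```python
import numpy as np
from sklearn.metrics import classification_report

y_true = np.array([0] * 17 + [1] * 16 + [2] * 12)
y_pred = np.array([0] * 17 + [1] * 15 + [2] * 1 + [2] * 12)

# Species names of the Iris dataset, in label order 0, 1, 2
species = ["setosa", "versicolor", "virginica"]

# Text version with readable row labels
print(classification_report(y_true, y_pred, target_names=species))

# Dictionary version for programmatic access
report = classification_report(
    y_true, y_pred, target_names=species, output_dict=True
)
print(report["versicolor"]["recall"])
print(report["weighted avg"]["f1-score"])
```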
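Displaying the Confusion Matrix
A confusion matrix counts, for each actual class (rows), how many samples were predicted as each class (columns).
from sklearn.metrics import confusion_matrix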
conf_matrix = confusion_matrix(y_test, preds)
conf_matrix = pd.DataFrame(conf_matrix, index=[0, 1, 2], columns=[0, 1, 2])
# print("Confusion Matrix:\n", conf_matrix)
conf_matrix
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 17 | 0 | 0 |
| 1 | 0 | 15 | 1 |
| 2 | 0 | 0 | 12 |
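All of the metrics above can be recovered from this matrix alone. The following sketch copies the counts shown above into a NumPy array (rows are actual classes, columns are predicted classes, as returned by confusion_matrix) and derives per-class precision, per-class recall, and accuracy.

```python
import numpy as np

# Counts copied from the confusion matrix above
cm = np.array([
    [17, 0, 0],
    [0, 15, 1],
    [0, 0, 12],
])

# Correct predictions for each class sit on the diagonal
true_positives = np.diag(cm)

# Recall: correct predictions / actual samples of the class (row sums)
recall = true_positives / cm.sum(axis=1)

# Precision: correct predictions / samples predicted as the class (column sums)
precision = true_positives / cm.sum(axis=0)

# Accuracy: all correct predictions / all samples
accuracy = true_positives.sum() / cm.sum()

print("recall:", recall)
print("precision:", precision)
print("accuracy:", accuracy)
```

The single off-diagonal entry (row 1, column 2) is the one class 1 sample predicted as class 2, which is why only the recall of class 1 and the precision of class 2 fall below 1.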
Exercise:
Perform the steps above using the Wine dataset from Scikit-Learn, which can be loaded with load_wine from sklearn.datasets.
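As a starting point only, the sketch below loads the Wine dataset and prints its basic shape. The remaining steps (splitting, training, predicting, and evaluating) are left for the exercise.

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()
X = wine["data"]    # chemical measurements for each wine sample
y = wine["target"]  # integer class labels

print(X.shape)             # (number of samples, number of features)
print(wine["feature_names"])
print(np.bincount(y))      # number of samples in each class
```

Unlike Iris, the classes in this dataset are not equally sized, so the difference between macro and weighted averages is worth watching in the classification report.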