import numpy as np
import matplotlib.pyplot as plt
Maths and Statistics
Linear Algebra for Data Science
We'll cover essential linear algebra concepts (vectors, matrices, and matrix operations) with Python code examples using NumPy.
1. Vectors and Matrices
Vector: A vector is an ordered list of numbers, which can be represented as a row or column.
# Creating a vector
vector = np.array([3, 4])
print("Vector:", vector)
Vector: [3 4]
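The definition above mentions row and column vectors. A 1-D NumPy array such as `vector` has no row or column orientation of its own, so this minimal sketch makes the orientation explicit with `reshape` (one common convention, not the only one):

```python
import numpy as np

vector = np.array([3, 4])              # 1-D array, shape (2,)

row_vector = vector.reshape(1, -1)     # 1 row, 2 columns -> shape (1, 2)
col_vector = vector.reshape(-1, 1)     # 2 rows, 1 column -> shape (2, 1)

print("1-D shape:", vector.shape)
print("Row vector shape:", row_vector.shape)
print("Column vector shape:", col_vector.shape)

# Transposing a row vector gives the corresponding column vector
print("Row.T equals column:", np.array_equal(row_vector.T, col_vector))
```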
Matrix: A matrix is a two-dimensional array of numbers.
# Creating a matrix
matrix = np.array([[1, 2], [3, 4]])
print("Matrix:\n", matrix)
Matrix:
[[1 2]
[3 4]]
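Because a matrix is two-dimensional, NumPy describes it by its shape (rows, columns) and indexes elements as `[row, column]`, both zero-based. A short sketch on the same matrix:

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])

print("Shape (rows, columns):", matrix.shape)  # (2, 2)
print("Number of dimensions:", matrix.ndim)    # 2
print("Row 1, column 0:", matrix[1, 0])        # 3
print("First row:", matrix[0])                 # [1 2]
print("Second column:", matrix[:, 1])          # [2 4]
```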
Applications of Vectors and Matrices
- Representing physical quantities like force, velocity, and acceleration
- Describing geometric shapes and transformations
- Organizing and modeling data in data science
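As one concrete case of a matrix describing a geometric transformation, the sketch below uses the standard 2-D rotation matrix (an illustrative example chosen here, not part of the original notes). Multiplying a vector by it rotates the vector counterclockwise by the angle `theta`:

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees counterclockwise
rotation = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])

v = np.array([1.0, 0.0])      # unit vector along the x-axis
rotated = rotation @ v        # matrix-vector product

# Round away tiny floating-point error (cos(pi/2) is about 6e-17, not exactly 0)
print("Rotated vector:", np.round(rotated, 10))  # [0. 1.]
print("Length preserved:", np.isclose(np.linalg.norm(rotated), 1.0))
```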
2. Matrix Operations
Addition and Subtraction
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Matrix A:\n", A)
print("Matrix B:\n", B)
Matrix A:
[[1 2]
[3 4]]
Matrix B:
[[5 6]
[7 8]]
# Element-wise addition
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = A + B
print("A + B:\n", C)
A + B:
[[ 6 8]
[10 12]]
# Element-wise subtraction
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
D = A - B
print("A - B:\n", D)
A - B:
[[-4 -4]
[-4 -4]]
Matrix Multiplication
# Element-wise multiplication
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
E = A * B
print("Element-wise A * B:\n", E)
Element-wise A * B:
[[ 5 12]
[21 32]]
# Dot product (Matrix multiplication)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
F = np.dot(A, B)  # equivalent to A @ B for 2-D arrays
print("Dot Product A @ B:\n", F)
Dot Product A @ B:
[[19 22]
[43 50]]
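Each entry of the matrix product is the dot product of a row of A with a column of B; for example, F[0, 0] = 1·5 + 2·7 = 19. The sketch below spells out that rule with plain loops. It is for illustration only; in practice `A @ B` or `np.dot` is far faster.

```python
import numpy as np

def matmul_loops(A, B):
    """Naive matrix product: entry (i, j) = sum over k of A[i, k] * B[k, j]."""
    n_rows, n_inner = A.shape
    n_inner_b, n_cols = B.shape
    if n_inner != n_inner_b:
        raise ValueError("Inner dimensions must match")
    result = np.zeros((n_rows, n_cols), dtype=np.result_type(A, B))
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                result[i, j] += A[i, k] * B[k, j]
    return result

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

F_loops = matmul_loops(A, B)
print(F_loops)                            # same values as F above
print(np.array_equal(F_loops, A @ B))     # True
print(np.array_equal(A @ B, B @ A))       # False: the order of factors matters
```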
Transpose of a Matrix
A = np.array([[1, 2], [3, 4]])
G = A.T  # rows become columns
print("Transpose of A:\n", G)
Transpose of A:
[[1 3]
[2 4]]
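A useful identity linking transpose and multiplication is (AB)ᵀ = BᵀAᵀ, with the order of the factors reversed. A quick numerical check with the same matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

lhs = (A @ B).T
rhs = B.T @ A.T
print(np.array_equal(lhs, rhs))   # True

# Transposing twice returns the original matrix
print(np.array_equal(A.T.T, A))   # True
```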
Topics for further research:
- Inverse Matrix
- Determinant
- Eigenvalues and Eigenvectors
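As a starting point for these topics, NumPy's `np.linalg` module provides `det`, `inv`, and `eig`. A minimal sketch on the matrix A used above:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]], dtype=float)

# Determinant: for a 2x2 matrix, ad - bc = 1*4 - 2*3 = -2
det_A = np.linalg.det(A)

# The inverse exists because det(A) != 0; A @ A_inv should be the identity
A_inv = np.linalg.inv(A)

# Eigen-decomposition: each column v of eigvecs satisfies A @ v = lambda * v
eigvals, eigvecs = np.linalg.eig(A)

print("det(A):", round(det_A, 10))
print("A_inv:\n", A_inv)
print("A @ A_inv is identity:", np.allclose(A @ A_inv, np.eye(2)))
print("Eigenvalues:", eigvals)
for lam, v in zip(eigvals, eigvecs.T):
    print("A @ v == lambda * v:", np.allclose(A @ v, lam * v))
```

A useful sanity check: the eigenvalues sum to the trace of A (1 + 4 = 5) and multiply to its determinant (-2).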
Derivatives and Gradients in Calculus
A derivative measures the instantaneous rate of change of a function with respect to its input; for example, the derivative of x² is 2x.
from sympy import symbols, diff
# Define symbol x for differentiation
x = symbols('x')
f = x**2 + 3*x + 2
# Get the derivative of f with respect to x
f_derivative = diff(f, x)
print("Derivative of f(x) = x^2 + 3x + 2 is:", f_derivative)
Derivative of f(x) = x^2 + 3x + 2 is: 2*x + 3
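The symbolic result can be sanity-checked numerically. This sketch uses a central finite difference, (f(x + h) − f(x − h)) / 2h, which approximates f′(x) for small h; the step size h = 1e-5 is an arbitrary choice:

```python
from sympy import symbols, diff, lambdify

x = symbols('x')
f = x**2 + 3*x + 2
f_derivative = diff(f, x)            # 2*x + 3

# Convert the symbolic expressions into plain numeric functions
f_num = lambdify(x, f)
df_num = lambdify(x, f_derivative)

def central_difference(func, x0, h=1e-5):
    """Approximate func'(x0) with a central finite difference."""
    return (func(x0 + h) - func(x0 - h)) / (2 * h)

x0 = 1.0
print("Symbolic derivative at x=1:", df_num(x0))   # 5.0
print("Finite-difference estimate:", central_difference(f_num, x0))
```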
Gradient
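The gradient of a function of several variables, ∇f = (∂f/∂x₁, …, ∂f/∂xₙ), is the vector of its partial derivatives. At any given point it points in the direction of steepest increase of the function, and its length gives the rate of increase in that direction.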
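A minimal SymPy sketch of computing a gradient, using an example function f(x, y) = x² + 3xy + y² chosen here for illustration:

```python
from sympy import symbols, diff

x, y = symbols('x y')
f = x**2 + 3*x*y + y**2

# Gradient = vector of partial derivatives (df/dx, df/dy)
gradient = [diff(f, var) for var in (x, y)]
print("Gradient:", gradient)                   # [2*x + 3*y, 3*x + 2*y]

# Evaluate the gradient at the point (1, 2)
point = {x: 1, y: 2}
grad_at_point = [g.subs(point) for g in gradient]
print("Gradient at (1, 2):", grad_at_point)    # [8, 7]
```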
Applications of Gradient
- Optimization algorithms
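Gradient-based optimization uses the same idea in reverse: to minimize a function, repeatedly take a small step against the gradient. A minimal gradient descent sketch, assuming the example function f(x, y) = (x − 3)² + (y + 1)² (minimum at (3, −1)) and a hand-picked learning rate of 0.1; both are illustrative choices:

```python
import numpy as np

def f(p):
    x, y = p
    return (x - 3) ** 2 + (y + 1) ** 2

def grad_f(p):
    # Analytic gradient: (df/dx, df/dy) = (2(x - 3), 2(y + 1))
    x, y = p
    return np.array([2 * (x - 3), 2 * (y + 1)])

def gradient_descent(grad, start, learning_rate=0.1, n_steps=100):
    p = np.array(start, dtype=float)
    for _ in range(n_steps):
        p = p - learning_rate * grad(p)  # step in the direction of steepest decrease
    return p

minimum = gradient_descent(grad_f, start=[0.0, 0.0])
print("Approximate minimum:", minimum)   # close to [3, -1]
print("f at minimum:", f(minimum))       # close to 0
```

With this learning rate, each step shrinks the distance to the minimum by a factor of 0.8, so 100 steps are plenty; too large a rate (here, above 1.0) would make the iterates diverge.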
Probability Distributions
A probability distribution describes how likely each possible value of a random variable is. Here, we explore three common ones: uniform, normal, and binomial.
Uniform Distribution
In a uniform distribution, all values within a range are equally likely.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
# Discrete uniform distribution for a fair coin (2 outcomes: heads or tails)
outcomes = ['Heads', 'Tails']
probabilities = [0.5, 0.5]

# Plotting
plt.bar(outcomes, probabilities, color=['blue', 'orange'])
plt.title('Fair Coin Toss Distribution')
plt.xlabel('Outcome')
plt.ylabel('Probability')
plt.ylim(0, 1)
plt.show()
Examples of uniform distributions
- Rolling a fair die
- Flipping a fair coin
- Drawing a card from a well-shuffled deck
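The fair-die example can be sketched both exactly and by simulation. This assumes `scipy.stats.randint`, whose upper bound is exclusive, so `randint(1, 7)` covers faces 1 to 6; the simulation uses NumPy's random generator with an arbitrary seed:

```python
import numpy as np
from scipy.stats import randint

die = randint(1, 7)             # discrete uniform on {1, 2, 3, 4, 5, 6}
print("P(X = 3):", die.pmf(3))  # 1/6, about 0.1667
print("Mean:", die.mean())      # (1 + 6) / 2 = 3.5

# Simulate 60,000 rolls; each face should appear about 1/6 of the time
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=60_000)   # high end is exclusive
faces, counts = np.unique(rolls, return_counts=True)
for face, freq in zip(faces, counts / rolls.size):
    print(f"Face {face}: {freq:.3f}")
```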
Normal Distribution
A normal (Gaussian) distribution is a continuous, bell-shaped distribution that is symmetric about its mean μ; its spread is set by the standard deviation σ.
from scipy.stats import norm
# Simulating Nobel Prize winner ages (mean = 60, std dev = 10)
mu, sigma = 60, 10
ages = np.random.normal(mu, sigma, 1000)  # Generate 1000 random ages

# Plotting the histogram of ages
plt.hist(ages, bins=30, density=True, alpha=0.6, color='g', label='Histogram of Ages')

# Plot the normal distribution PDF
x = np.linspace(ages.min(), ages.max(), 100)
y = norm.pdf(x, mu, sigma)
plt.plot(x, y, 'r-', label='Normal Distribution')

plt.title('Age Distribution of Nobel Prize Winners')
plt.xlabel('Age')
plt.ylabel('Probability Density')
plt.legend()
plt.show()
Examples of (approximately) normal distributions
- Heights of people
- IQ scores
- Measurement errors
- Stock price fluctuations (only roughly; real returns tend to have heavier tails than a normal distribution)
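Probabilities for a normal distribution are areas under the PDF, which `scipy.stats.norm.cdf` computes. A sketch using the same illustrative parameters as above (mean 60, standard deviation 10) that recovers the familiar 68-95-99.7 rule:

```python
from scipy.stats import norm

mu, sigma = 60, 10

# P(a < X < b) = CDF(b) - CDF(a)
probs = {}
for k in (1, 2, 3):
    lower, upper = mu - k * sigma, mu + k * sigma
    probs[k] = norm.cdf(upper, mu, sigma) - norm.cdf(lower, mu, sigma)
    print(f"P({lower} < X < {upper}) = {probs[k]:.4f}")
# Prints approximately 0.6827, 0.9545, 0.9973
```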
Binomial Distribution
The binomial distribution models the number of successes in n independent trials, each with the same probability of success p.
from scipy.stats import binom
# Parameters
n, p = 10, 0.5  # Number of trials and probability of success (fair coin)
x = np.arange(0, n + 1)  # Possible number of heads (successes)
y = binom.pmf(x, n, p)  # Binomial PMF (probability mass function)

# Plotting the binomial distribution
plt.bar(x, y, color='blue', alpha=0.7)
plt.title('Binomial Distribution for Fair Coin Toss (n=10, p=0.5)')
plt.xlabel('Number of Heads')
plt.ylabel('Probability')
plt.xticks(np.arange(0, n + 1))
plt.ylim(0, 0.3)  # Limiting y-axis for better visualization
plt.show()
Examples of binomial distributions
- Number of heads in coin flips
- Number of defective items in a batch
- Number of customers who click on an ad
- Number of students who pass an exam
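Each bar in the binomial plot is given by the PMF P(X = k) = C(n, k) · pᵏ · (1 − p)ⁿ⁻ᵏ. A sketch that computes it directly with Python's `math.comb` and compares the result with `binom.pmf`:

```python
from math import comb

import numpy as np
from scipy.stats import binom

def binomial_pmf(k, n, p):
    """P(X = k): ways to choose which k of n trials succeed, times their probability."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
manual = np.array([binomial_pmf(k, n, p) for k in range(n + 1)])
scipy_pmf = binom.pmf(np.arange(n + 1), n, p)

print("P(X = 5) =", manual[5])                           # 252/1024 = 0.24609375
print("Matches scipy:", np.allclose(manual, scipy_pmf))  # True
print("Sum of probabilities:", manual.sum())             # 1.0
```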
Expectation and Variance
The expectation (mean) of a discrete random variable X is E[X] = Σ x · P(X = x), summed over all values x that X can take.
Variance measures the spread around the mean: Var(X) = E[(X − E[X])²] = E[X²] − (E[X])².
# Example with a discrete random variable
values = np.array([1, 2, 3, 4, 5])
probs = np.array([0.1, 0.2, 0.3, 0.2, 0.2])  # Probabilities sum to 1
expectation = np.sum(values * probs)
variance = np.sum((values**2) * probs) - expectation**2
print(f"Expectation (E[X]): {expectation}")
print(f"Variance (Var[X]): {variance}")

# For a normal distribution
mean, std_dev = 5, 2
expectation_norm = mean
variance_norm = std_dev ** 2
print(f"Normal Distribution - Expectation: {expectation_norm}, Variance: {variance_norm}")
Expectation (E[X]): 3.2
Variance (Var[X]): 1.5599999999999987
Normal Distribution - Expectation: 5, Variance: 4
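The formulas can be checked by simulation: draw many samples from the same discrete distribution and compare the sample mean and variance with E[X] = 3.2 and Var(X) = 1.56. A sketch using `numpy.random.Generator.choice` with its `p` argument; the seed and sample size are arbitrary:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])
probs = np.array([0.1, 0.2, 0.3, 0.2, 0.2])

rng = np.random.default_rng(42)
samples = rng.choice(values, size=200_000, p=probs)

print("Sample mean:", samples.mean())      # close to 3.2
print("Sample variance:", samples.var())   # close to 1.56

# The definitional form E[(X - E[X])^2] gives the same variance
mean_exact = np.sum(values * probs)
var_central = np.sum((values - mean_exact) ** 2 * probs)
print("E[(X - E[X])^2]:", var_central)     # about 1.56
```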
Applications of Expectation and Variance
- Portfolio management
- Insurance risk assessment
- Machine Learning model performance
- Hypothesis testing
Statistics in Data Science
This notebook covers Descriptive Statistics and Regression Analysis with examples in Python.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
Descriptive Statistics
# Descriptive Statistics
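data = [12, 15, 14, 10, 8, 10, 12, 15, 18, 20, 20, 21, 19, 18]
print(f"Dataset: {data}")
mean_value = np.mean(data)
median_value = np.median(data)
# stats.mode returns one value; when several values tie for most frequent,
# it reports the smallest of them
mode_value = stats.mode(data).mode
# np.std computes the population standard deviation (ddof=0) by default
std_dev = np.std(data)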
print(f"Mean: {mean_value}")
print(f"Median: {median_value}")
print(f"Mode: {mode_value}")
print(f"Standard Deviation: {std_dev}")
Dataset: [12, 15, 14, 10, 8, 10, 12, 15, 18, 20, 20, 21, 19, 18]
Mean: 15.142857142857142
Median: 15.0
Mode: 10
Standard Deviation: 4.120630029101703
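Two details of this output are worth checking. This dataset is multimodal: 10, 12, 15, 18, and 20 each appear twice, so the single reported mode hides the others. Also, the sample standard deviation divides by n − 1 (`ddof=1`) rather than n. A sketch that computes both versions and lists every mode with `pandas.Series.mode`:

```python
import numpy as np
import pandas as pd

data = [12, 15, 14, 10, 8, 10, 12, 15, 18, 20, 20, 21, 19, 18]

population_std = np.std(data)          # divides by n
sample_std = np.std(data, ddof=1)      # divides by n - 1
print(f"Population std: {population_std:.4f}")   # 4.1206
print(f"Sample std:     {sample_std:.4f}")       # 4.2762

all_modes = pd.Series(data).mode().tolist()
print("All modes:", all_modes)                   # [10, 12, 15, 18, 20]
```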
Regression Analysis
# Regression Analysis
np.random.seed(42)
X = np.random.rand(100, 1) * 10  # Independent variable
y = 2.5 * X + np.random.randn(100, 1) * 2 + 5  # Dependent variable with noise

# Splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Applying Linear Regression
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_pred
# Model Performance
print(f"Intercept: {model.intercept_[0]}")
print(f"Slope: {model.coef_[0][0]}")
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
print(f"R-squared Score: {r2_score(y_test, y_pred)}")
# Plotting the Regression Line
plt.scatter(X_test, y_test, color='blue', label='Actual Data')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Regression Line')
plt.title("Linear Regression Example")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()
Intercept: 5.285826638917127
Slope: 2.419729462992111
Mean Squared Error: 2.6147980548680128
R-squared Score: 0.9545718935323326
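For simple linear regression, the fitted line is the ordinary least-squares solution, which can also be computed without scikit-learn. This sketch regenerates the same synthetic data and solves for the intercept and slope with `np.linalg.lstsq` on the full dataset (not the training split), so its numbers will differ slightly from the model above while staying near the true values 5 and 2.5:

```python
import numpy as np

np.random.seed(42)
X = np.random.rand(100, 1) * 10
y = 2.5 * X + np.random.randn(100, 1) * 2 + 5

# Design matrix with a column of ones for the intercept term
X_design = np.hstack([np.ones_like(X), X])

# Least squares: minimize ||X_design @ beta - y||^2
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slope = beta.ravel()
print(f"Intercept: {intercept:.3f}, Slope: {slope:.3f}")

# Cross-check with a degree-1 polynomial fit (returns [slope, intercept])
slope_pf, intercept_pf = np.polyfit(X.ravel(), y.ravel(), 1)
print("Matches polyfit:", np.allclose([slope, intercept], [slope_pf, intercept_pf]))
```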
Applications of Regression Analysis
- Predictive modeling
- Time series forecasting
- Causal inference (only under additional assumptions; a regression fit alone shows association, not causation)
- Feature engineering