
NumPy Crash Course

Meta Data
title: Numpy Crash Course
author: Juma Shafara
date: "2024-01"
date-modified: "2025-12-27"
description: This crash course will teach you the basics and intermediate concepts of the Numpy Library
keywords: [numpy, data types, array mathematics, aggregate functions, Subsetting, Slicing, Indexing]

Photo by DATAIDEA

Objective

In this lesson, you will learn what you need to know to get moving with NumPy.

What is Numpy

  • NumPy is a Python package used for scientific computing
  • NumPy provides arrays, which are faster and more capable alternatives to traditional Python lists
  • An array is a group of elements; a standard NumPy array requires all of its elements to have the same data type

Why NumPy?

NumPy is the foundation of most Python data libraries such as:

  • Pandas
  • SciPy
  • Scikit-learn
  • TensorFlow / PyTorch

It is fast because:

  • It uses C under the hood
  • It avoids Python loops using vectorization
# Python list
py_list = [1, 2, 3]
py_list * 2   # duplicates list
[1, 2, 3, 1, 2, 3]
# NumPy array
import numpy as np
np_arr = np.array([1, 2, 3])
np_arr * 2    # element-wise multiplication
array([2, 4, 6])
## Uncomment and run this cell to install numpy
# !pip install numpy

Inspecting our arrays

To use NumPy, we first import it (it must be installed for this to work).

# import numpy module
import numpy as np

We can check which version we are using through the __version__ attribute.

# checking the numpy version
np.__version__
'2.3.4'

NumPy gives us a more powerful alternative to the Python list: a data structure called the ndarray. We create it using the array() function from numpy.

# creating a numpy array
num_arr = np.array([1, 2, 3, 4])

The object created by array() is an ndarray. We can confirm this by checking the type of the object using type().

# Checking type of object
type(num_arr)
numpy.ndarray

Data Types

The table below describes some of the most common data types we use in numpy

Data Type     Description
int64         Signed 64-bit integer
float64       Double-precision floating point
complex128    Complex numbers
bool          Boolean values
object        Python objects
str_          Fixed-length strings
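As a minimal sketch of how these data types show up in practice, the example below lets NumPy infer a dtype from the values and then requests one explicitly through the dtype argument of np.array(). The variable names (floats, flags, small_ints, as_complex) are made up for illustration.

```python
import numpy as np

# NumPy infers the dtype from the values it is given
floats = np.array([1.0, 2.5, 3.0])
flags = np.array([True, False, True])

# The dtype argument requests a specific data type instead
small_ints = np.array([1, 2, 3], dtype=np.int8)
as_complex = np.array([1, 2, 3], dtype=np.complex128)

print(floats.dtype)      # float64
print(flags.dtype)       # bool
print(small_ints.dtype)  # int8
print(as_complex.dtype)  # complex128
```

The default integer type is not printed here because it can depend on the platform and NumPy version.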

Dimensions:

A dimension is a direction, or axis, along which data is organized in an array. The number of dimensions is the number of axes (levels of nesting) the array has, for example 1 for a flat list of values, 2 for a matrix, and 3 for a stack of matrices. We find the number of dimensions of an array using the ndim attribute.

# finding the number of dimensions
num_arr.ndim
1

Shape:

The shape is a tuple giving the size of the array along each dimension. We can check the shape of a NumPy array using the shape attribute, as demonstrated below.

# shape of array
num_arr.shape
(4,)

Length

In NumPy, the length refers to the size of the first axis (dimension) of an array, which is the number of elements along that axis. We can use Python's built-in len() function to find the length.

# length of the first axis (for a 1D array, the number of elements)
len(num_arr)
4

Size

Size in NumPy refers to the total number of elements in an array across all dimensions. We can get the size of a NumPy array using the size attribute.

# another way to get the number of elements
num_arr.size
4
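For a 1D array, len() and size give the same number, so the difference only shows up with more dimensions. The sketch below uses a hypothetical 2 x 3 array named grid to show the two side by side.

```python
import numpy as np

# A hypothetical 2D array with 2 rows and 3 columns
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(len(grid))   # 2 -> size of the first axis (number of rows)
print(grid.size)   # 6 -> total number of elements
print(grid.shape)  # (2, 3)
```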

Data Type(dtype)

dtype in NumPy refers to the data type of the elements stored in an array, such as int, float, bool, etc.

# finding data type of array elements
print(num_arr.dtype.name)
int64

Converting Array Data Types

We can use the astype() method to convert an array from one data type to another. Note that converting floats to integers drops the fractional part rather than rounding.

# converting an array
float_arr = np.array([1.2, 3.5, 7.0])

# use astype() to convert to a specific data type
int_arr = float_arr.astype(int)

print(f'Array: {float_arr}, Data Type: {float_arr.dtype}')
print(f'Array: {int_arr}, Data Type: {int_arr.dtype}')
Array: [1.2 3.5 7. ], Data Type: float64
Array: [1 3 7], Data Type: int64
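In the output above, 3.5 became 3, because astype(int) truncates toward zero. If rounding is what you want, one option is to round first and convert afterwards. This is a small sketch with made-up values; note that np.round() rounds halves to the nearest even number.

```python
import numpy as np

values = np.array([1.2, 3.5, -7.9])  # illustrative values

# astype(int) drops the fractional part (truncates toward zero)
truncated = values.astype(int)

# Rounding first, then converting, gives the nearest integers
rounded = np.round(values).astype(int)

print(truncated)  # [ 1  3 -7]
print(rounded)    # [ 1  4 -8]
```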

Ask for help

np.info(np.ndarray.shape)
Tuple of array dimensions.

The shape property is usually used to get the current shape of an array,
but may also be used to reshape the array in-place by assigning a tuple of
array dimensions to it.  As with `numpy.reshape`, one of the new shape
dimensions can be -1, in which case its value is inferred from the size of
the array and the remaining dimensions. Reshaping an array in-place will
fail if a copy is required.

.. warning::

    Setting ``arr.shape`` is discouraged and may be deprecated in the
    future.  Using `ndarray.reshape` is the preferred approach.

Examples
--------
>>> import numpy as np
>>> x = np.array([1, 2, 3, 4])
>>> x.shape
(4,)
>>> y = np.zeros((2, 3, 4))
>>> y.shape
(2, 3, 4)
>>> y.shape = (3, 8)
>>> y
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
>>> y.shape = (3, 6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot reshape array of size 24 into shape (3,6)
>>> np.zeros((4,2))[::2].shape = (-1,)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: Incompatible shape for in-place modification. Use
`.reshape()` to make a copy with the desired shape.

See Also
--------
numpy.shape : Equivalent getter function.
numpy.reshape : Function similar to setting ``shape``.
ndarray.reshape : Method similar to setting ``shape``.

# in IPython or Jupyter, the ? prefix shows the same documentation
?np.ndarray.shape

Quick Array Inspection Cheatsheet

Attribute     Meaning
ndim          Number of dimensions
shape         Size along each dimension
size          Total number of elements
dtype         Data type of elements

arr = np.array([[1, 2, 3], [4, 5, 6]])

print('Dimensions:', arr.ndim)
print('Shape:', arr.shape)
print('Size:', arr.size)
print('Dtype:', arr.dtype)
Dimensions: 2
Shape: (2, 3)
Size: 6
Dtype: int64

Broadcasting

Broadcasting allows NumPy to perform element-wise operations on arrays of different, but compatible, shapes.

arr = np.array([1, 2, 3])
arr + 10
array([11, 12, 13])

Explanation:

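  • The scalar is conceptually stretched to match the array's shape
  • NumPy does not actually build the stretched copy, so no extra memory is used for it

This is why an expression like arr + 10 works without writing a loop.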
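Broadcasting is not limited to scalars. As a rough sketch, the example below adds a 1D row and a column-shaped array to a 2D array, and shows what happens when the shapes are not compatible. The names matrix, row, and col are made up for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)
col = np.array([[100], [200]])             # shape (2, 1)

# row is stretched across both rows of matrix
plus_row = matrix + row
print(plus_row)
# [[11 22 33]
#  [14 25 36]]

# col is stretched across all three columns of matrix
plus_col = matrix + col
print(plus_col)
# [[101 102 103]
#  [204 205 206]]

# Shapes (2, 3) and (2,) do not line up, so NumPy raises an error
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print('Cannot broadcast:', err)
```

Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or one of them is 1.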

Array mathematics

NumPy has out-of-the-box tools to help us perform some important mathematical operations.

Arithmetic Operations

Arithmetic operations in NumPy are element-wise operations like addition, subtraction, multiplication, and division that can be performed directly between arrays or between an array and a scalar.

# creating arrays
array1 = np.array([1, 4, 6, 7])
array2 = np.array([3, 5, 3, 1])
# subtract
difference1 = array2 - array1
print('difference1 =', difference1)

# another way
difference2 = np.subtract(array2, array1)
print('difference2 =', difference2)
difference1 = [ 2  1 -3 -6]
difference2 = [ 2  1 -3 -6]

As we may notice, numpy does element-wise operations for ordinary arithmetic operations

# sum
summation1 = array1 + array2
print('summation1 =', summation1)

# another way
summation2 = np.add(array1, array2)
print('summation2 =', summation2)
summation1 = [4 9 9 8]
summation2 = [4 9 9 8]
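Multiplication and division follow the same pattern as the subtraction and addition shown above. The sketch below reuses the values of array1 and array2 so it can run on its own; the result names (product, quotient, doubled) are made up for illustration.

```python
import numpy as np

array1 = np.array([1, 4, 6, 7])
array2 = np.array([3, 5, 3, 1])

# element-wise multiplication (np.multiply does the same)
product = array1 * array2
print('product =', product)  # product = [ 3 20 18  7]

# element-wise division always produces floats (np.divide does the same)
quotient = array1 / array2
print('quotient =', quotient)

# an array combined with a scalar
doubled = array1 * 2
print('doubled =', doubled)  # doubled = [ 2  8 12 14]
```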

Trigonometric operations

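Trigonometric operations in NumPy are functions like np.sin(), np.cos(), and np.tan() that perform element-wise trigonometric calculations on arrays, with angles given in radians. Other element-wise math functions work the same way, such as np.log(), which computes the natural logarithm.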

# sin
print('sin(array1) =', np.sin(array1))
# cos
print('cos(array1) =', np.cos(array1))
# natural logarithm
print('log(array1) =', np.log(array1))
sin(array1) = [ 0.84147098 -0.7568025  -0.2794155   0.6569866 ]
cos(array1) = [ 0.54030231 -0.65364362  0.96017029  0.75390225]
log(array1) = [0.         1.38629436 1.79175947 1.94591015]

# dot product
array1.dot(array2)
np.int64(48)

The dot() function:

  • Performs a dot product for 1D arrays
  • Performs matrix multiplication for 2D arrays
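The 2D case can be sketched as follows, using two small made-up matrices a and b. The @ operator and np.matmul() are other ways NumPy offers to multiply matrices, and for 2D arrays they give the same result as dot(). The plain * operator is shown for contrast, since it multiplies element by element.

```python
import numpy as np

# Two hypothetical 2 x 2 matrices
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Matrix multiplication, three equivalent spellings for 2D arrays
with_dot = a.dot(b)
with_operator = a @ b
with_matmul = np.matmul(a, b)

print(with_dot)
# [[19 22]
#  [43 50]]

# Element-wise multiplication is a different operation
elementwise = a * b
print(elementwise)
# [[ 5 12]
#  [21 32]]
```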

Research:

another way to dot matrices (arrays)

Comparison

In NumPy, comparison operators perform element-wise comparisons on arrays and return boolean arrays of the same shape, where each element indicates True or False based on the corresponding element-wise comparison.

array1 == array2
array([False, False, False, False])
array1 > 3
array([False,  True,  True,  True])

Aggregate functions

NumPy provides several aggregate functions that summarize the elements of an array. By default most of them work over all elements and return a single value; given an axis argument, they summarize along that axis instead.

# array sum
array_sum = array1.sum(axis=0)
print('Sum: ', array_sum)
Sum:  18

# average value
mean = array1.mean()
print('Mean: ', mean)
Mean:  4.5

# minimum value
minimum = array1.min()
print('Minimum: ', minimum)
Minimum:  1

# maximum value
maximum = array1.max()
print('Maximum: ', maximum)
Maximum:  7

# correlation coefficient matrix (the off-diagonal entries are the correlation between array1 and array2)
correlation_coefficient = np.corrcoef(array1, array2)
print('Correlation Coefficient: ', correlation_coefficient)
Correlation Coefficient:  [[ 1.         -0.46291005]
 [-0.46291005  1.        ]]

# standard deviation
standard_deviation = np.std(array1)
print('Standard Deviation: ', standard_deviation)
Standard Deviation:  2.29128784747792
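On a 1D array, axis=0 is the only axis, so the axis argument makes no visible difference. A small sketch with a hypothetical 2D array named table shows how the axis changes the result: axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row).

```python
import numpy as np

# A hypothetical table with 2 rows and 3 columns
table = np.array([[1, 2, 3], [4, 5, 6]])

total = table.sum()              # all elements
column_sums = table.sum(axis=0)  # one value per column
row_sums = table.sum(axis=1)     # one value per row
row_max = table.max(axis=1)      # largest value in each row

print(total)        # 21
print(column_sums)  # [5 7 9]
print(row_sums)     # [ 6 15]
print(row_max)      # [3 6]
```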

Research:

copying arrays (you might meet view(), copy())

Subsetting, Slicing and Indexing

Indexing is the technique we use to access individual elements in an array. Index 0 represents the first element, 1 represents the second element, and so on.

Slicing is used to access a range of elements using two indexes separated by a colon, i.e. [start:end]. The first index is the start of the range and the second is the end; the element at the end index itself is not included.

# Creating numpy arrays of different dimensions
# 1D array
arr1 = np.array([1, 4, 6, 7])
print('Array1 (1D): \n', arr1)
Array1 (1D): 
 [1 4 6 7]

# 2D array
arr2 = np.array([[1.5, 2, 3], [4, 5, 6]])
print('Array2 (2D): \n', arr2)
Array2 (2D): 
 [[1.5 2.  3. ]
 [4.  5.  6. ]]

# 3D array
arr3 = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                 [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
print('Array3 (3D): \n', arr3)
Array3 (3D): 
 [[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]]

# find the shape of each array
print('Array1 (1D):', arr1.shape)
print('Array2 (2D):', arr2.shape)
print('Array3 (3D):', arr3.shape)
Array1 (1D): (4,)
Array2 (2D): (2, 3)
Array3 (3D): (2, 3, 3)

Indexing

# accessing items in a 1D array
arr1[2]
np.int64(6)
# accessing items in 2D array
arr2[1, 2]
np.float64(6.0)
# accessing in a 3D array
arr3[0, 1, 2]
np.int64(6)

Slicing

# slicing 1D array
arr1[0:3]
array([1, 4, 6])
# slicing a 2D array
arr2[1, 1:]
# row index = 1
# column index from 1 to end
array([5., 6.])
# picking rows out of a 3D array and joining them
first = arr3[0, 2]
second = arr3[1, 0]

np.concatenate((first, second))
array([ 7,  8,  9, 10, 11, 12])
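A few more slicing patterns are sketched below, as an illustration rather than a complete list. The example recreates arr1 and arr2 with the same values so it can run on its own; omitting start or end, using a step ([start:end:step]), and slicing both axes of a 2D array are standard NumPy (and Python) slicing features.

```python
import numpy as np

arr1 = np.array([1, 4, 6, 7])
arr2 = np.array([[1.5, 2, 3], [4, 5, 6]])

# omitting start means "from the beginning"
first_two = arr1[:2]
print(first_two)    # [1 4]

# a third number is the step: every second element
every_other = arr1[::2]
print(every_other)  # [1 6]

# negative indexes count from the end
last = arr1[-1]
print(last)         # 7

# all rows, first column of a 2D array
first_column = arr2[:, 0]
print(first_column)  # [1.5 4. ]

# rows 0 to 1, columns 1 to 2 (a 2 x 2 block)
block = arr2[0:2, 1:3]
print(block)
# [[2. 3.]
#  [5. 6.]]
```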

Boolean Indexing

Boolean indexing in NumPy allows you to select elements from an array based on a boolean condition or a boolean array of the same shape. The elements corresponding to True values in the boolean array/condition are selected, while those corresponding to False are discarded.

# boolean indexing
arr1[arr1 < 5]
array([1, 4])
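Conditions can also be combined. The sketch below assumes the same values as arr1 and uses NumPy's element-wise operators & (and), | (or), and ~ (not); each condition needs its own parentheses, and Python's plain `and`/`or` keywords do not work on arrays. The name mask is made up for illustration.

```python
import numpy as np

arr1 = np.array([1, 4, 6, 7])

# a boolean array, True where both conditions hold
mask = (arr1 > 1) & (arr1 < 7)
print(mask)        # [False  True  True False]

# keep only the elements where the mask is True
inside = arr1[mask]
print(inside)      # [4 6]

# "or" keeps elements that satisfy either condition
outside = arr1[(arr1 < 4) | (arr1 > 6)]
print(outside)     # [1 7]

# ~ inverts a mask
inverted = arr1[~mask]
print(inverted)    # [1 7]
```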

Research:

Fancy Indexing

Array manipulation

NumPy provides a wide range of functions that allow you to change the shape, dimensions, and structure of arrays to suit your needs.

print(arr2)
[[1.5 2.  3. ]
 [4.  5.  6. ]]

# transpose
arr2_transpose1 = np.transpose(arr2)
print('Transpose1: \n', arr2_transpose1)
Transpose1: 
 [[1.5 4. ]
 [2.  5. ]
 [3.  6. ]]

# another way
arr2_transpose2 = arr2.T
print('Transpose2: \n', arr2_transpose2)
Transpose2: 
 [[1.5 4. ]
 [2.  5. ]
 [3.  6. ]]

# combining arrays
first = arr3[0, 2]
second = arr3[1, 0]

np.concatenate((first, second))
array([ 7,  8,  9, 10, 11, 12])
test_arr1 = np.array([[7, 8, 9], [10, 11, 12]])
test_arr2 = np.array([[1, 2, 3], [4, 5, 6]])

np.concatenate((test_arr1, test_arr2), axis=1)
array([[ 7,  8,  9,  1,  2,  3],
       [10, 11, 12,  4,  5,  6]])
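With axis=1 the arrays above were joined side by side. For comparison, here is a short sketch of the same two arrays joined with axis=0, which is also the default for np.concatenate() and stacks them on top of each other. The result names are made up for illustration.

```python
import numpy as np

test_arr1 = np.array([[7, 8, 9], [10, 11, 12]])
test_arr2 = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 (the default): rows of the second array go below the first
stacked_rows = np.concatenate((test_arr1, test_arr2), axis=0)
print(stacked_rows)
# [[ 7  8  9]
#  [10 11 12]
#  [ 1  2  3]
#  [ 4  5  6]]
print(stacked_rows.shape)  # (4, 3)

# axis=1: columns of the second array go to the right of the first
side_by_side = np.concatenate((test_arr1, test_arr2), axis=1)
print(side_by_side.shape)  # (2, 6)
```

The arrays must have matching sizes along every axis except the one being joined.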

Homework

  1. Create an array of 10 numbers
  2. Remove the last element
  3. Reshape it into a 3x3 matrix
  4. Find the mean of each column

Research:

Adding/Removing Elements

  • resize()
  • append()
  • insert()
  • delete()

Changing array shape

  • ravel()
  • reshape()

# stacking
# np.vstack((a,b))
# np.hstack((a,b))
# np.column_stack((a,b))
# np.c_[a, b]
# splitting arrays
# np.hsplit()
# np.vsplit()
