Customer Analysis
title: Customer Analysis
keywords: [Customer Analysis, Data Visualization, Matplotlib, Pandas, Opendatasets]
description: In this notebook, I want to observe any trends related to customers.
author: Juma Shafara
date: "2024-07-08"
In this project, I look at customer data pulled from GitHub and create some visuals in a Jupyter notebook to observe any trends related to customers.
Downloading the Dataset
First, we download our dataset from GitHub.
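The download step itself is not shown in this notebook. As a minimal sketch, assuming the CSV is reachable at a direct URL, the file could be fetched with the standard library. The helper `download_file` is a hypothetical name, and the URL in the comment is a placeholder rather than the real location of the data.

```python
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str) -> Path:
    """Download `url` to `dest`, skipping the download if the file already exists.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    path = Path(dest)
    if not path.exists():
        urllib.request.urlretrieve(url, str(path))
    return path


# Placeholder URL: substitute the raw GitHub link of the dataset before running.
# download_file("https://raw.githubusercontent.com/<user>/<repo>/main/Marketing_Analysis.csv",
#               "Marketing_Analysis.csv")
```

Skipping the download when the file already exists keeps repeated notebook runs from fetching the same data again.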
Data Preparation and Cleaning
We load our dataset into a data frame and examine it for incorrect, inconsistent, or invalid entries. We handle other cleaning steps as necessary.
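The kind of checks described above can be sketched on a small made-up data frame. The values below are invented for illustration, and the specific checks (missing values, negative salaries, a "jobedu" value without a comma) are assumptions about what "invalid" might mean for this dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; all values are made up.
toy = pd.DataFrame({
    "age": [25, 41, np.nan, 37],
    "salary": [50000, 60000, 55000, -1],
    "jobedu": ["admin,secondary", "technician,tertiary", "services,primary", "management"],
})

# Count missing values per column.
missing_per_column = toy.isnull().sum()

# Flag rows with an impossible (negative) salary.
invalid_salary = toy[toy["salary"] < 0]

# Flag rows whose "jobedu" value lacks the expected "job,education" form.
malformed_jobedu = toy[~toy["jobedu"].str.contains(",", na=False)]

print(missing_per_column)
print(invalid_salary)
print(malformed_jobedu)
```

Running the same checks on the real `data` frame after loading it would show which columns need cleaning.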
Cleaning the Data
Here we fix some of the columns and rows to make the data easier to use.
# Import the libraries used in this notebook.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Read the file, skipping the first two rows, which hold no useful data.
data = pd.read_csv("Marketing_Analysis.csv", skiprows=2)
# Print the head of the data frame.
data.head()
# Drop the customer ID column, which is not needed for the analysis.
data.drop('customerid', axis=1, inplace=True)
# Extract job and education into new columns from the "jobedu" column.
data['job']= data["jobedu"].apply(lambda x: x.split(",")[0])
data['education']= data["jobedu"].apply(lambda x: x.split(",")[1])
# Drop the "jobedu" column from the dataframe.
data.drop('jobedu', axis = 1, inplace = True)
# Print a random sample of five rows.
data.sample(n=5)
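The lambda-based split above raises an `IndexError` on any "jobedu" value that has no comma. As an alternative sketch, assuming such rows could exist in the data, pandas' vectorised `str.split` leaves the education missing for those rows. The toy values below are made up.

```python
import pandas as pd

# Toy stand-in for the "jobedu" column; the last row has no education part.
toy = pd.DataFrame({"jobedu": ["management,tertiary", "technician,secondary", "retired"]})

# Split once on the first comma; rows without a comma get a missing education.
parts = toy["jobedu"].str.split(",", n=1, expand=True)
toy["job"] = parts[0]
toy["education"] = parts[1]

# Drop the combined column now that both parts are extracted.
toy = toy.drop(columns="jobedu")
print(toy)
```

Whether missing education values should then be dropped or filled depends on the analysis and is left open here.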
Exploratory Analysis and Visualization
Now we apply some data manipulation steps and explore some of the findings through the use of visuals. Hopefully we can then gain some useful insights from our data.
What kind of employment is most common in our data?
What is the distribution of education levels?
How do balances vary with age?
Which variables correlate with balance?
What are the salary range and average for each response type?
Which marital status has the highest response rate?
What combination of education and marital status has the largest response rate?
What is the average salary for each age group in the data?
# Bin ages into labelled groups. right=False makes each bin left-closed,
# e.g. [18, 30), so an age of 30 falls in '30-39', matching the labels.
bins = [18, 30, 40, 50, 60, 70, 120]
labels = ['18-29', '30-39', '40-49', '50-59', '60-69', '70+']
data['agerange'] = pd.cut(data['age'], bins=bins, labels=labels, right=False)
# Plot the bar graph of average salary per age group.
data.groupby('agerange', observed=False)['salary'].mean().plot.bar()
plt.title('Average Salary per Age Group', fontsize=12)
plt.show()
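The questions about employment and response rates can be approached with the same group-and-aggregate pattern. The following is a minimal sketch on made-up rows, so its numbers say nothing about the real customers. The column names `marital` and `response`, and the "yes"/"no" coding of responses, are assumptions about the dataset.

```python
import pandas as pd

# Toy data; values are invented and the column names are assumed.
toy = pd.DataFrame({
    "job": ["technician", "management", "technician", "blue-collar", "technician", "management"],
    "marital": ["single", "married", "married", "single", "single", "married"],
    "education": ["tertiary", "secondary", "secondary", "tertiary", "tertiary", "primary"],
    "response": ["yes", "no", "no", "yes", "no", "no"],
})

# Most common employment: share of each job category.
job_share = toy["job"].value_counts(normalize=True)

# Response rate: fraction of "yes" responses within each group.
toy["responded"] = (toy["response"] == "yes").astype(int)
rate_by_marital = toy.groupby("marital")["responded"].mean()
rate_by_edu_marital = (
    toy.groupby(["education", "marital"])["responded"]
    .mean()
    .sort_values(ascending=False)
)

print(job_share)
print(rate_by_marital)
print(rate_by_edu_marital)
```

Appending `.plot.bar()` to any of these results draws the corresponding bar chart, as in the age group example above.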
Let us save and upload our work to Jovian before finishing up.
Conclusions
From the visuals above, we can say the following:
- Approx. 60% of our customers work in technician, management, or blue-collar jobs.
- Half are high school graduates, and fewer than a third have higher education.
- For people under 65, the balance is typically between 0 and 20,000. For those over 65, the typical range is 0 to 10,000.
- The heatmap suggests that age correlates more strongly with balance than salary does.
- Response rate is highest for single, highly educated individuals and lowest for married, less educated individuals.
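The correlation comparison behind the heatmap can be reproduced numerically. This is a minimal sketch on invented numbers, chosen only to show the mechanics. The column name `balance` is an assumption, and the output does not reflect the real data.

```python
import pandas as pd

# Toy numeric data; values are made up for illustration.
toy = pd.DataFrame({
    "age": [25, 35, 45, 55, 65],
    "salary": [60000, 20000, 100000, 50000, 55000],
    "balance": [500, 1500, 2400, 3600, 4500],
})

# Pairwise Pearson correlations between all numeric columns.
corr = toy.corr()

# Correlation of every other column with balance, strongest first.
balance_corr = corr["balance"].drop("balance").sort_values(ascending=False)
print(balance_corr)
```

Passing `corr` to `sns.heatmap(corr, annot=True)` draws the matrix as a heatmap.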