Data Exploration and Cleaning Exercise

Photo by DATAIDEA
  1. Load the demo.xlsx dataset
# your solution
  1. Rename the columns as suggested below

    Old name           New name
    Age                age
    Gender             gender
    Marital Status     marital_status
    Address            address
    Income             income
    Income Category    income_category
    Job Category       job_category
# your solution
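
Hint: one way to rename columns in pandas is to pass an old-to-new mapping to `DataFrame.rename`. The sketch below uses a made-up two-column frame called `toy` (not the demo.xlsx data) and only two of the names from the table, so treat it as an illustration of the pattern rather than the full answer.

```python
import pandas as pd

# Toy frame with two of the original column names (the values are made up).
toy = pd.DataFrame({"Age": [25, 40], "Marital Status": ["married", "single"]})

# Map old names to new names; columns missing from the mapping stay unchanged.
renamed = toy.rename(columns={"Age": "age", "Marital Status": "marital_status"})
print(list(renamed.columns))
```

`rename` returns a new DataFrame, so the result has to be assigned back to a variable to keep the change.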
  1. Display all the columns in the dataset
# your solution
  1. Display some basic statistics about the numeric variables in the dataset
# your solution
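
Hint: `DataFrame.describe()` summarises numeric columns by default (count, mean, standard deviation, quartiles, min and max). A minimal sketch on a made-up frame, where `toy` and its values are assumptions for illustration only:

```python
import pandas as pd

# Made-up data; the real dataset has more columns and rows.
toy = pd.DataFrame({"age": [20, 30, 40], "gender": ["m", "f", "f"]})

# With default arguments, only the numeric column "age" is summarised.
summary = toy.describe()
print(summary)
```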
  1. Display some basic statistics about the categorical variables in the dataset
# your solution
  1. What are the unique values in the gender column?
# your solution
  1. Fix any problems you observe in the gender column, and briefly explain why and how you fixed them
# your solution
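
Hint: inconsistent labels in a categorical column are often a matter of stray whitespace, mixed case, or abbreviations. The values below are invented to show the pattern; the actual problems in the gender column may differ, so inspect the unique values first.

```python
import pandas as pd

# Made-up labels with the kinds of inconsistency a survey column can have.
gender = pd.Series(["Male", "male ", "F", "female", "m"])

# Strip whitespace and lower-case everything, then map abbreviations
# to a single label each (replace matches whole values here).
cleaned = gender.str.strip().str.lower().replace({"m": "male", "f": "female"})
print(cleaned.unique())
```

Normalising first and mapping second keeps the mapping dictionary short, because each label only needs to be listed in one spelling.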
  1. How many observations have ‘no answer’ for marital status?
# your solution
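
Hint: comparing a column to a value gives a boolean Series, and summing it counts the matches. The sketch uses a made-up Series, and the exact spelling of the label in the real data is an assumption to verify.

```python
import pandas as pd

# Made-up responses; check the exact label used in the real column.
marital_status = pd.Series(["married", "no answer", "single", "no answer"])

# True counts as 1 when summed, so this is the number of matching rows.
n_no_answer = (marital_status == "no answer").sum()
print(n_no_answer)
```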
  1. Write code that returns only the numeric variables in the dataset
# your solution
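
Hint: `DataFrame.select_dtypes` filters columns by data type. A minimal sketch on a made-up frame (`toy` is an assumption, not the exercise dataset):

```python
import pandas as pd

# Made-up frame with two numeric columns and one text column.
toy = pd.DataFrame(
    {"age": [25, 40], "income": [31.0, 120.5], "gender": ["m", "f"]}
)

# "number" matches integer and float columns alike.
numeric_only = toy.select_dtypes(include="number")
print(list(numeric_only.columns))
```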
  1. Are there any missing values in the dataset?
# your solution
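
Hint: `isna()` marks missing cells, and summing the result gives a per-column count. The frame below is made up for illustration.

```python
import pandas as pd

# Made-up frame with one missing value in "age" and one in "gender".
toy = pd.DataFrame(
    {
        "age": [25, None, 40],
        "gender": ["m", "f", None],
        "income": [31.0, 52.0, 120.5],
    }
)

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Single yes/no answer for the whole frame.
any_missing = toy.isna().any().any()
print(missing_per_column)
print(any_missing)
```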
  1. Are there any outliers in the income variable?
# your solution
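
Hint: one common convention (not the only one) flags values more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to made-up incomes; a box plot is a visual version of the same idea.

```python
import pandas as pd

# Made-up incomes with one obviously extreme value.
income = pd.Series([20, 25, 30, 35, 40, 45, 500])

# Quartiles and interquartile range.
q1, q3 = income.quantile(0.25), income.quantile(0.75)
iqr = q3 - q1

# Fences of the 1.5 * IQR rule.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the values outside the fences.
outliers = income[(income < lower) | (income > upper)]
print(outliers.tolist())
```

Whether a flagged value is an error or a genuine high earner is a judgment call that the rule itself cannot make.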
  1. Investigate the relationship between age and income
# your solution
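
Hint: a correlation coefficient is one quick numeric summary of how two variables move together, and a scatter plot is a natural complement. The data below is made up and perfectly linear on purpose, so the coefficient comes out at essentially 1; real data will be noisier.

```python
import pandas as pd

# Made-up, perfectly linear data for illustration only.
toy = pd.DataFrame({"age": [20, 30, 40, 50], "income": [15, 30, 45, 60]})

# Pearson correlation between the two columns (the default method).
r = toy["age"].corr(toy["income"])
print(round(r, 3))
```

Pearson correlation only measures linear association, so a low value does not rule out a curved relationship.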
  1. How many people earn more than 300 units?
# your solution
  1. What data type is the marital_status column?
# your solution
  1. Create dummy variables for gender
# your solution
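
Hint: `pd.get_dummies` turns each category into its own indicator column. A minimal sketch on a made-up frame (`toy` and its values are assumptions):

```python
import pandas as pd

# Made-up frame; gender is assumed to be already cleaned.
toy = pd.DataFrame({"gender": ["male", "female", "female"], "age": [25, 40, 31]})

# Encode only the gender column; other columns pass through unchanged.
dummies = pd.get_dummies(toy, columns=["gender"])
print(list(dummies.columns))
```

Passing `drop_first=True` drops one of the indicator columns, which is useful when the dummies feed a model that cannot handle perfectly collinear inputs.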

END
