Handling Missing Data Quiz

Author: Juma Shafara

Published: March 1, 2024

Keywords: Handling Missing Data Quiz, Handling Missing Data

Questions:

1. Which of the following is a common method for handling missing data?

    1. Deleting rows with missing values
    1. Using the mean to fill missing values
    1. Using a machine learning algorithm to predict missing values
    1. All of the above

2. What is the term used for removing rows or columns that contain missing data?

    1. Imputation
    1. Deletion
    1. Interpolation
    1. Normalization

3. Which imputation method replaces missing values with the mean, median, or mode?

    1. Random sampling imputation
    1. Regression imputation
    1. Central tendency imputation
    1. K-nearest neighbors imputation

4. What is a potential drawback of deleting rows with missing data?

    1. It is computationally expensive.
    1. It can lead to biased results.
    1. It always improves model accuracy.
    1. It requires complex algorithms.

5. Which technique involves predicting missing values from models fitted to the other available data?

    1. Listwise deletion
    1. Pairwise deletion
    1. Multiple imputation
    1. Hot deck imputation

6. Which of the following is NOT a method for handling missing data in time series analysis?

    1. Forward fill
    1. Backward fill
    1. Interpolation
    1. Cross-validation

7. In the context of handling missing data, what does ‘MCAR’ stand for?

    1. Missing Completely at Random
    1. Missing Conditional on Available Rows
    1. Missing Characteristic Attribute Reduction
    1. Missing Completely Available Records

8. Which method is most suitable for handling missing data when the data is ‘MAR’ (Missing At Random)?

    1. Listwise deletion
    1. Multiple imputation
    1. Mean imputation
    1. Mode imputation

9. Which Python library is widely used for data manipulation and handling missing data?

    1. NumPy
    1. Pandas
    1. SciPy
    1. Matplotlib

10. Which of the following is a disadvantage of using mean imputation?

    1. It is computationally intensive.
    1. It can distort the variance of the data.
    1. It requires labeled data.
    1. It is only applicable to categorical data.

Answers:

1. All of the above
2. Deletion
3. Central tendency imputation
4. It can lead to biased results.
5. Multiple imputation
6. Cross-validation
7. Missing Completely at Random
8. Multiple imputation
9. Pandas
10. It can distort the variance of the data.
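
Several of the answers above can be tried out in a few lines of Pandas. The snippet below is a minimal sketch, not part of the original quiz: the series `s` and its values are made up for illustration, and it only shows deletion, mean imputation, and the time series fills from question 6, along with the variance effect from question 10.

```python
import pandas as pd

# Hypothetical toy series with two missing readings (values invented for illustration).
s = pd.Series([10.0, float("nan"), 14.0, float("nan"), 18.0])

# Deletion (question 2): drop the missing entries entirely.
dropped = s.dropna()

# Central tendency imputation (question 3): fill gaps with the mean of the observed values.
mean_filled = s.fillna(s.mean())

# Time series fills (question 6).
forward = s.ffill()              # carry the last observed value forward
backward = s.bfill()             # pull the next observed value backward
interpolated = s.interpolate()   # linear interpolation between neighbours

# Mean imputation keeps the mean but shrinks the sample variance (question 10).
print(dropped.var(), mean_filled.var())  # 16.0 8.0
```

In this toy case the variance halves after mean imputation, because the filled values sit exactly at the mean and add no spread. On real data the size of the effect depends on how many values are missing.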
