
The Efficiency of Multiple Imputation and Maximum Likelihood Methods for Estimating Missing Values


Affiliations
1 Department of Statistics and Operations Research, North West University Mafikeng Campus, South Africa
 


Authors

Tlhalitshi Volition Montshiwa
Department of Statistics and Operations Research, North West University Mafikeng Campus, South Africa
Ntebo Moroke
Department of Statistics and Operations Research, North West University Mafikeng Campus, South Africa
Elias Munapo
Department of Statistics and Operations Research, North West University Mafikeng Campus, South Africa

Abstract


Objectives: This study investigated the efficiency of Multiple Imputation (MI) and Maximum Likelihood (ML) methods for estimating missing values. The aim was to use the findings to make recommendations for future studies on how missing data imputation affects the accuracy of the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).

Methods: The complete dataset (with no missing values) used in this study was collected in 2010/11 through the Income and Expenditure Survey (IES) and had 25,328 observations. Missing data were generated by randomly deleting 10%, 20%, 30%, 40% and 50% of the values from the complete dataset. The missing values in each of the five resulting datasets were imputed using MI and ML methods. Absolute errors of the AIC and BIC from multiple regression analysis were then computed for each dataset, and these absolute errors were compared across the missing value imputation methods.

Findings: AIC and BIC were more accurate when missing values were handled by Full Information Maximum Likelihood (FIML), one of the ML algorithms, than by the other methods, provided that 10% of the data were missing. For all datasets, AIC and BIC were least accurate when missing values were imputed by Expectation Maximisation (EM), another ML algorithm. AIC and BIC also became more accurate as the rate of missingness increased, provided that the missing values were imputed using either of the MI algorithms, Fully Conditional Specification (FCS) or Markov Chain Monte Carlo (MCMC).

Application: When the rate of missingness is small (at most 10%), FIML should be used to handle missing data if AIC and BIC are to be used. When the rate of missingness is high (at least 40%), FCS and MCMC should be preferred over EM.
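The Methods steps above can be illustrated with a minimal Python sketch. It is not the authors' code and rests on several assumptions: the data are hypothetical and synthetic (not the IES data), values are deleted completely at random from one predictor only, and the imputation step is a simple stochastic regression imputation repeated m times. That is a simplified stand-in for the FCS and MCMC algorithms, not either of them, and EM and FIML are not implemented. The sketch also assumes that the absolute error is measured against the AIC and BIC of the complete dataset and that the m imputed-data values are averaged. It uses the Gaussian log-likelihood ln L of the fitted regression, with AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k counts the regression coefficients plus the error variance.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def ols(X, y):
    """Least-squares fit via the normal equations; returns (coefficients, RSS)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    return beta, rss


def aic_bic(X, y):
    """Gaussian-likelihood AIC and BIC of a multiple regression of y on X."""
    n = len(y)
    _, rss = ols(X, y)
    k = len(X[0]) + 1  # regression coefficients plus the error variance
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * loglik, k * math.log(n) - 2 * loglik


def make_data(n, rng):
    """Hypothetical synthetic data (NOT the IES): y depends on x1 and a correlated x2."""
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [0.6 * a + rng.gauss(0, 0.8) for a in x1]
    y = [1 + 2 * a - 0.5 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return x1, x2, y


def delete_at_random(values, rate, rng):
    """Return a copy with a random fraction `rate` of entries set to None (MCAR)."""
    out = list(values)
    for i in rng.sample(range(len(values)), round(rate * len(values))):
        out[i] = None
    return out


def impute_stochastic(x1, x2_miss, y, rng):
    """One stochastic regression imputation of x2 from x1 and y (simplified stand-in, not FCS/MCMC)."""
    obs = [i for i, v in enumerate(x2_miss) if v is not None]
    beta, rss = ols([[1.0, x1[i], y[i]] for i in obs], [x2_miss[i] for i in obs])
    sd = math.sqrt(rss / (len(obs) - len(beta)))  # residual SD used for the random draw
    return [v if v is not None
            else beta[0] + beta[1] * x1[i] + beta[2] * y[i] + rng.gauss(0, sd)
            for i, v in enumerate(x2_miss)]


def absolute_errors(rate, m=5, n=2000, seed=1):
    """|mean AIC over m imputations - complete-data AIC|, and the same for BIC."""
    rng = random.Random(seed)
    x1, x2, y = make_data(n, rng)  # same complete data for every rate (same seed)

    def design(col):
        return [[1.0, u, v] for u, v in zip(x1, col)]

    aic_true, bic_true = aic_bic(design(x2), y)
    x2_miss = delete_at_random(x2, rate, rng)
    fits = [aic_bic(design(impute_stochastic(x1, x2_miss, y, rng)), y) for _ in range(m)]
    aic_bar = sum(a for a, _ in fits) / m
    bic_bar = sum(b for _, b in fits) / m
    return abs(aic_bar - aic_true), abs(bic_bar - bic_true)


for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    err_aic, err_bic = absolute_errors(rate)
    print(f"{rate:.0%} missing: |dAIC| = {err_aic:.2f}, |dBIC| = {err_bic:.2f}")
```

Because every fit in this sketch uses the same n and k, the penalty terms cancel in the differences and the AIC and BIC errors coincide; the two errors can only diverge when the number of observations or parameters differs between the imputed-data and complete-data fits.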

Keywords


Maximum Likelihood Imputation, Multiple Imputation, AIC, BIC



DOI: https://doi.org/10.17485/ijst%2F2018%2Fv11i16%2F173391