Abstract:
Missing data is a pervasive challenge across diverse datasets, often arising from human error, system faults, or respondent non-response. Failing to address missing data can lead to inaccurate analytical results, as incomplete data sequences introduce bias and distort the distribution of the resulting data. Over the past decade, deep learning methods, particularly Recurrent Neural Networks (RNNs), have been employed to tackle this problem. This study comprehensively evaluates recent RNN-based methods for missing data imputation, focusing on their strengths and weaknesses to provide a detailed picture of the current landscape. A systematic literature review was conducted on RNN-based data imputation methods, covering research articles from 2013 to 2023 identified in the SCOPUS database. Of 363 relevant studies, 70 were selected as primary articles. The findings highlight that Long Short-Term Memory (LSTM) is the most widely adopted RNN method for data imputation, owing to its adaptability in processing sequences of varying lengths compared with Gated Recurrent Units (GRUs) and other hybrid methods. Performance metrics such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Area Under the Receiver Operating Characteristic Curve (AU-ROC), Mean Squared Error (MSE), and Mean Relative Error (MRE) are commonly used to evaluate these models. Future development of more robust RNN-based imputation methods that integrate optimization algorithms, such as Particle Swarm Optimization (PSO) and Stochastic Gradient Descent (SGD), could further enhance imputation accuracy and reliability.