Abstract:
Diabetes is a chronic, incurable disease with adverse effects on health, so it must be detected early to limit further damage to the body. This study examines the use of machine learning to predict diabetes from factors such as glucose level, blood pressure, skin fold thickness, and insulin. Specifically, it assesses the potential of Support Vector Machine (SVM), Logistic Regression, and a proposed Ensemble Model for diabetes prediction.
To this end, a dataset containing the fundamental medical features of a general population of patients was used. Pre-processing comprised handling missing data, data normalization, and feature extraction, with the aim of improving the performance of the proposed model. All models were trained and evaluated using k-fold cross-validation to obtain more reliable performance estimates.
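As an illustration of the pre-processing and validation steps described above, the following is a minimal sketch rather than the study's actual implementation. It assumes scikit-learn, a synthetic stand-in dataset with four numeric features, median imputation for missing values, standardization as the normalization step, and five stratified folds; none of these specific choices are stated in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient dataset (assumption: 4 numeric features).
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1, random_state=0
)

# Blank out roughly 5% of the entries to mimic missing measurements.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Imputation and scaling sit inside the pipeline, so they are re-fitted on
# the training part of every fold and never see the held-out part.
model = make_pipeline(
    SimpleImputer(strategy="median"),   # assumed missing-data strategy
    StandardScaler(),                   # assumed normalization method
    LogisticRegression(max_iter=1000),
)

# Assumed k = 5; stratification keeps the class ratio similar in each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Keeping the pre-processing inside the pipeline is a design choice that avoids leaking statistics from the validation fold into training. The same pattern would apply to an SVM or an ensemble by swapping the final estimator.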
Model performance was compared using accuracy, precision, recall, and F1-score. The evaluation shows that the proposed Ensemble Model is the most appropriate of the three, since it achieves higher accuracy than both the Support Vector Machine and Logistic Regression models.
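For reference, the four evaluation metrics named above can be computed from the counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and chosen only for illustration, and they are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives and true negatives. Ratios with a zero denominator
    are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative counts only (not taken from the study).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening setting such as diabetes detection, recall is often examined alongside accuracy, because a false negative corresponds to a missed case.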
These results indicate that the proposed Ensemble Model is effective for diabetes prediction and support the use of data mining to improve health care delivery systems.