Abstract:
Diabetes is often called the “silent killer,” claiming the lives of millions of people worldwide. Many factors contribute to diabetes,
including elevated glucose, cholesterol, and systolic blood pressure (BP), as well as advancing age; these four are considered its
primary causes. The central challenge is predicting the illness early, so that treatment can begin as soon as diabetes is discovered.
This is difficult because tens of features may contribute to diabetes. This study proposes a model that combines data mining and
Machine Learning (ML) algorithms to predict whether a person is likely to develop diabetes in the future. The prediction draws on
two complementary datasets: one dataset is used to reconfirm the other in order to make the prediction more accurate.
This is achieved with a hybrid k-means-PCA model and by selecting the highest-weighted features among those that commonly cause diabetes.
The selected features are then used to train the ML algorithms, whose accuracy indicates the quality of the prediction model. Simulation
results show that the number of patients predicted to be diabetic increased from 53 in the original datasets to 142 after applying the
proposed model. The results also show that the Random Forest model achieves the highest accuracy among the ML models evaluated,
reaching 95.2%.
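
The abstract names the building blocks (PCA, k-means, weight-based feature selection, Random Forest) but not how they are wired together. The following is only a minimal sketch of one plausible arrangement, assuming scikit-learn and NumPy. The synthetic data, the weighting rule (absolute PCA loadings weighted by explained variance), the choice of three features, and the function name run_pipeline are all illustrative assumptions, not the study's datasets or method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

FEATURES = ["glucose", "cholesterol", "systolic_bp", "age"]


def run_pipeline(seed=0, n=400, n_top=3):
    """Hypothetical PCA -> k-means -> feature weighting -> Random Forest sketch."""
    rng = np.random.default_rng(seed)

    # Synthetic stand-in data (NOT the study's datasets): two groups that
    # differ in the four features named in the abstract.
    means = np.array([[95, 180, 118, 40], [160, 240, 145, 58]], dtype=float)
    stds = np.array([12, 25, 10, 9], dtype=float)
    group = rng.integers(0, 2, size=n)
    X = means[group] + rng.normal(size=(n, 4)) * stds

    # Standardize, project onto two principal components, then cluster.
    Xs = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2, random_state=seed).fit(Xs)
    Z = pca.transform(Xs)
    clusters = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(Z)

    # Assumed weighting rule: absolute loadings weighted by the share of
    # variance each component explains. Keep the n_top heaviest features.
    weights = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    top = np.argsort(weights)[::-1][:n_top]

    # Train a Random Forest to reproduce the cluster labels from the
    # selected features only. Cluster ids (0/1) are arbitrary labels.
    X_tr, X_te, y_tr, y_te = train_test_split(
        Xs[:, top], clusters, test_size=0.25, random_state=seed, stratify=clusters
    )
    rf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, y_tr)
    return weights, top, clusters, rf.score(X_te, y_te)


if __name__ == "__main__":
    weights, top, clusters, accuracy = run_pipeline()
    print("selected features:", [FEATURES[i] for i in top])
    print("held-out accuracy:", round(accuracy, 3))
```

In this sketch the cluster assignments play the role of labels for the classifier. With real data, the labels, the number of clusters, and the number of retained features would have to come from the datasets and the study's own procedure.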