dc.contributor.author | TARIQ, ARHAM | |
dc.contributor.author | NIAZ, YASIR | |
dc.contributor.author | AMIN, AHMAD | |
dc.date.accessioned | 2021-08-03T09:58:31Z | |
dc.date.available | 2021-08-03T09:58:31Z | |
dc.date.issued | 2021-08-03 | |
dc.identifier.issn | 2210-142X | |
dc.identifier.uri | https://journal.uob.edu.bh:443/handle/123456789/4394 | |
dc.description.abstract | Over the last two decades, the rate of education has risen steadily across the globe. Predicting student performance has therefore become an emerging research area within educational data mining. Previous studies have noted that most available educational datasets have a small sample size. Such datasets offer fewer opportunities for generalization, which makes them difficult to analyze. Previous approaches use noise filtering, data balancing, or GAN-based oversampling, or rely mostly on classifier performance. This paper proposes an improved model that optimizes classifier performance, removes the adverse effects of noisy instances, and improves data balancing. The proposed model is based on CTGAN (Conditional Tabular Generative Model) and NCC (Nearest Centroid Classifier), combined with the data balancing algorithm SMOTE-IPF (Iterative-Partitioning Filter), to increase dataset size while keeping the data balanced and to minimize the negative effect of noisy data points. Finally, for prediction, six classifiers, Random Forest (RT), Gradient Boosting (GB), CAT Boost (CT), Extra Tree (ET), KNN, and AdaBoost (AB), are hyperparameter-tuned, and a stacked ensemble of the best of them is created. A detailed analysis of the results shows that the proposed model outperforms previous approaches by 2-2.5% in terms of accuracy and ROC. | en_US |
dc.language.iso | en | en_US |
dc.publisher | University of Bahrain | en_US |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Low Sample Educational Datasets | en_US |
dc.subject | Conditional Tabular Generative Model | en_US |
dc.subject | Students' academic performance | en_US |
dc.subject | Educational Data Mining | en_US |
dc.subject | SMOTE-IPF | en_US |
dc.title | Systematic Approach for Re-Sampling and Prediction of Low Sample Educational Datasets | en_US |
dc.identifier.doi | https://dx.doi.org/10.12785/ijcds/120196 | en_US |
dc.contributor.authorcountry | Pakistan | en_US |
dc.contributor.authorcountry | Pakistan | en_US |
dc.contributor.authorcountry | Pakistan | en_US |
dc.contributor.authoraffiliation | University of Lahore, Lahore | en_US |
dc.contributor.authoraffiliation | University of Lahore, Lahore | en_US |
dc.contributor.authoraffiliation | University of Lahore, Lahore | en_US |
dc.source.title | International Journal of Computing and Digital Systems | en_US |
dc.abbreviatedsourcetitle | IJCDS | en_US |
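
The abstract names SMOTE-style balancing and a Nearest Centroid Classifier among its building blocks. The sketch below illustrates only those two generic ideas on a tiny, made-up two-feature dataset; it is not the authors' pipeline. CTGAN, the Iterative-Partitioning Filter, hyperparameter tuning, and stacking are all omitted, and the function names (`smote_like`, `centroids`, `predict`), the "pass"/"fail" labels, and the data points are hypothetical.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority points
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nb)))
    return synthetic


def centroids(points, labels):
    """Return the mean feature vector of each class."""
    groups = {}
    for p, label in zip(points, labels):
        groups.setdefault(label, []).append(p)
    return {
        label: tuple(sum(col) / len(ps) for col in zip(*ps))
        for label, ps in groups.items()
    }


def predict(cents, x):
    """Assign x to the class whose centroid is closest (nearest centroid rule)."""
    return min(cents, key=lambda label: math.dist(cents[label], x))


# Hypothetical toy data: six majority-class rows, three minority-class rows.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1)]
minority = [(2.0, 2.0), (2.2, 1.9), (1.9, 2.3)]

# Oversample the minority class until both classes have the same size.
new_points = smote_like(minority, n_new=len(majority) - len(minority))
X = majority + minority + new_points
y = ["pass"] * len(majority) + ["fail"] * (len(minority) + len(new_points))

cents = centroids(X, y)
print(predict(cents, (2.1, 2.0)), predict(cents, (0.1, 0.1)))
```

Because every synthetic point is a convex combination of two existing minority points, the oversampled class stays inside the region the original minority rows already occupy; a noise filter such as IPF would normally be applied afterwards to discard questionable rows.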