University of Bahrain
Scientific Journals

Systematic Approach for Re-Sampling and Prediction of Low Sample Educational Datasets

dc.contributor.author TARIQ, ARHAM
dc.contributor.author NIAZ, YASIR
dc.contributor.author AMIN, AHMAD
dc.date.accessioned 2021-08-03T09:58:31Z
dc.date.available 2021-08-03T09:58:31Z
dc.date.issued 2021-08-03
dc.identifier.issn 2210-142X
dc.identifier.uri https://journal.uob.edu.bh:443/handle/123456789/4394
dc.description.abstract Over the last two decades, the rate of education has increased steadily across the globe. Predicting student performance has therefore become an emerging research area within educational data mining. Previous studies have noted that most available educational datasets have a low sample size. Such datasets offer fewer opportunities for generalization, which makes them difficult to analyze. Previous approaches use noise filtering, data balancing, or GAN-based oversampling, or rely mostly on classifier performance. This paper proposes an improved model that optimizes classifier performance, removes the adverse effects of noisy instances, and improves data balancing. The proposed model combines CTGAN (Conditional Tabular Generative Model) and NCC (Nearest Centroid Classifier) with the data balancing algorithm SMOTE-IPF (Iterative-Partitioning Filter) to increase dataset size while keeping the classes balanced and minimizing the negative effect of noisy data points. Finally, for prediction, six classifiers, Random Forest (RT), Gradient Boosting (GB), CatBoost (CT), Extra Trees (ET), KNN, and AdaBoost (AB), are hyperparameter-tuned, and a stacked ensemble of the best of them is created. A detailed analysis of the results shows that the proposed model outperforms previous approaches by 2-2.5% in terms of accuracy and ROC. en_US
dc.language.iso en en_US
dc.publisher University of Bahrain en_US
dc.rights Attribution-NonCommercial-NoDerivatives 4.0 International *
dc.rights.uri http://creativecommons.org/licenses/by-nc-nd/4.0/ *
dc.subject Low Sample Educational Datasets en_US
dc.subject Conditional Tabular Generative Model en_US
dc.subject Students academic performance en_US
dc.subject Educational Data Mining en_US
dc.subject SMOTE-IPF en_US
dc.title Systematic Approach for Re-Sampling and Prediction of Low Sample Educational Datasets en_US
dc.identifier.doi https://dx.doi.org/10.12785/ijcds/120196 en_US
dc.contributor.authorcountry Pakistan en_US
dc.contributor.authorcountry Pakistan en_US
dc.contributor.authorcountry Pakistan en_US
dc.contributor.authoraffiliation University of Lahore, Lahore en_US
dc.contributor.authoraffiliation University of Lahore, Lahore en_US
dc.contributor.authoraffiliation University of Lahore, Lahore en_US
dc.source.title International Journal of Computing and Digital Systems en_US
dc.abbreviatedsourcetitle IJCDS en_US
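
The abstract describes growing a small dataset while keeping its classes balanced and screening out noisy points. The sketch below is a simplified illustration of that idea only, not the authors' pipeline: it uses plain SMOTE-style interpolation and a nearest-centroid check as a stand-in for the noise filter, and it omits CTGAN, the Iterative-Partitioning Filter, and the stacked ensemble. The function names (smote_like, nearest_centroid_label) and the toy data are hypothetical.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE's core idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def nearest_centroid_label(x, centroids):
    """Return the label whose class centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Toy data (hypothetical): two features per student; label 1 is the minority class.
majority = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 1.0], [1.0, 0.8]]
minority = [[3.0, 3.1], [3.2, 2.9], [2.9, 3.0]]

# Oversample the minority class up to the size of the majority class.
new_points = smote_like(minority, n_new=len(majority) - len(minority))

# Simplified noise check (assumption, not IPF): keep only synthetic points
# that a nearest-centroid rule assigns to the minority class.
centroids = {0: centroid(majority), 1: centroid(minority)}
kept = [p for p in new_points if nearest_centroid_label(p, centroids) == 1]
balanced_minority = minority + kept
```

Because every synthetic point lies on a segment between two real minority samples, the nearest-centroid check keeps all of them in this toy case; on real, overlapping classes it would discard points that drift toward the other class.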



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International
