A Parallel Approach of Cascade Modelling Using MPI4Py on Imbalanced Dataset

Suprapto; Wahyono; Rokhman, Nur; Dharma Adhinata, Faisal

DOI: http://dx.doi.org/10.12785/ijcds/150191

Journal: International Journal of Computing and Digital Systems, Volume 15, Issue 01

ISSN: 2210-142X

Date: 2024-03-10

Abstract:

Machine learning plays a central role in assigning data to classes based on their features. Classification becomes difficult, however, when the dataset is imbalanced, because the model tends to be biased towards the majority class. This research proposes a cascade architecture and a parallel architecture for the training process: the cascade aims to improve accuracy over non-cascade classifiers, and parallel training aims to improve speed over sequential training. The Support Vector Machine (SVM) and Random Forest methods are evaluated. The results show that the SVM with the Radial Basis Function (RBF) kernel increases accuracy by 1.25% over non-cascade classifiers. In addition, using Message Passing Interface for Python (MPI4Py) to distribute training across multiple cores or nodes makes training up to 3.57 times faster than sequential training. These findings underscore the effectiveness of cascade modelling combined with parallel processing in improving both the accuracy and the efficiency of classification on imbalanced data.
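
The abstract does not say how the training work is divided among processes, so the following is only a minimal sketch of one common MPI4Py pattern, assuming a data-parallel split: the root process scatters chunks of the training set, each process trains on its own chunk, and the root gathers the resulting models. The names `split_chunks`, `train_local`, `run` and `TOY_ROWS` are hypothetical, and `train_local` is a placeholder (per-class feature means) standing in for the SVM or Random Forest training used in the paper. If `mpi4py` is not installed, the sketch falls back to a single process.

```python
# Illustrative sketch only; not the authors' implementation.

def split_chunks(rows, n_parts):
    """Split rows into n_parts contiguous chunks of nearly equal size."""
    base, extra = divmod(len(rows), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def train_local(chunk):
    """Placeholder for model training: returns the mean feature vector per class."""
    sums, counts = {}, {}
    for features, label in chunk:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def run(rows):
    """Scatter chunks from rank 0, train locally, gather the models on rank 0."""
    try:
        from mpi4py import MPI
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    except ImportError:
        comm, rank, size = None, 0, 1  # assumption: serial fallback without mpi4py

    # Only the root process prepares the list of chunks (one per process).
    chunks = split_chunks(rows, size) if rank == 0 else None
    if comm is None:
        return [train_local(chunks[0])]

    chunk = comm.scatter(chunks, root=0)   # each rank receives one chunk
    model = train_local(chunk)             # independent training on every rank
    return comm.gather(model, root=0)      # list of models on rank 0, None elsewhere


# Toy imbalanced data (hypothetical): six rows of class 0, two rows of class 1.
TOY_ROWS = [
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.0, 0.3], 0),
    ([0.3, 0.0], 0), ([0.2, 0.2], 0), ([0.1, 0.1], 0),
    ([0.9, 0.8], 1), ([0.8, 0.9], 1),
]

if __name__ == "__main__":
    models = run(TOY_ROWS)
    if models is not None:
        print(len(models), "local model(s) gathered on the root process")
```

Saved under a hypothetical name such as `sketch.py`, this could be launched with `mpiexec -n 4 python sketch.py`. A contiguous split like this one can leave some chunks without any minority-class rows, which is one reason a real pipeline for imbalanced data would need a more careful partitioning strategy.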
