Abstract:
Machine learning is widely used to categorize data into classes based on their features. Classification becomes difficult, however,
when the dataset is imbalanced, because the model tends to be biased towards the majority class. This
research proposes a cascade and parallel architecture for the training process, aiming to improve accuracy over non-cascade training and speed over sequential training. It evaluates the performance of the SVM and Random Forest methods. The
research finds that, in the cascade architecture, the Support Vector Machine (SVM) method with the Radial Basis Function (RBF) kernel increases
accuracy by 1.25% over non-cascade classifiers. In addition, using Message Passing Interface for Python (MPI4Py) to distribute the training
process across multiple cores or nodes makes training up to 3.57 times
faster than sequential training. These findings underscore the effectiveness of combining cascade and parallel processing in enhancing both the accuracy and
efficiency of classification tasks in imbalanced data scenarios.