Abstract:
This research examines class imbalance in the classification of chest X-ray (CXR) images from the TBX11K dataset by
applying the Random Forest (RF) and XGBoost (XGB) methods with and without the Synthetic Minority Over-sampling Technique
(SMOTE). The objective is to assess the impact of SMOTE on the performance of both models in classifying TBX11K CXR images.
The use of SMOTE aims to increase the number of minority class samples (TB positive) to mitigate the imbalance with the majority
class samples (TB negative). Each model is evaluated using the same metrics for comparison: accuracy, precision, recall, and
F1 score. The experimental results indicate that applying SMOTE to the RF and XGB models is
effective in addressing class imbalance in the dataset. The RF model without SMOTE achieves an accuracy of approximately
93.33%, while the RF model with SMOTE achieves an accuracy of 92.72%. The XGB model without SMOTE attains an accuracy of
94.11%, whereas the XGB model with SMOTE achieves an accuracy of 94.33%. Although SMOTE slightly lowers the test accuracy of
the RF model, the balance between precision and recall remains high. Overall, the XGB model with SMOTE is
the optimal model for identifying rarely occurring positive cases, while the RF model without SMOTE is the optimal model for
situations where overall accuracy is most critical.
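
As an illustration of the two ideas summarized above, the following is a minimal sketch, not the implementation used in this study. It assumes feature vectors have already been extracted from the images, and it uses only the Python standard library. The function names smote_sketch and binary_metrics, the parameter k, and the toy data are hypothetical and were chosen for this example. The first function shows the core SMOTE step (a synthetic minority sample is placed on the segment between a minority sample and one of its nearest minority neighbours); the second computes the four metrics named above for the positive class.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Toy data (hypothetical, not taken from TBX11K): three minority points in 2-D.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sketch(minority, n_new=5)

# A classifier that always predicts "negative" on a 9:1 imbalanced toy set.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

The toy classifier at the end reaches high accuracy while missing every positive case, which is why accuracy alone is not sufficient on an imbalanced dataset and why precision, recall, and F1 score are reported alongside it. In practice a maintained implementation such as the SMOTE class in the imbalanced-learn package would normally be preferred over this sketch, with resampling applied to the training split only.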