Text Classification on Cybercrime Cases From News Articles Using Supervised Learning

Farhan, Muhammad; Mutalib, Sofianita; Yusof Darus, Mohamad; Ismail, Azlan; Mokayed, Hamam; Abdul-Rahman, Shuzlina; Nizam, Muhamad

doi:XXXXXX

International Journal of Computing and Digital Systems (Preprint)

dc.contributor.author	Farhan, Muhammad
dc.contributor.author	Mutalib, Sofianita
dc.contributor.author	Yusof Darus, Mohamad
dc.contributor.author	Ismail, Azlan
dc.contributor.author	Mokayed, Hamam
dc.contributor.author	Abdul-Rahman, Shuzlina
dc.contributor.author	Nizam, Muhamad
dc.date.accessioned	2024-07-11T11:39:44Z
dc.date.available	2024-07-11T11:39:44Z
dc.date.issued	2024-07-11
dc.identifier.uri	https://journal.uob.edu.bh:443/handle/123456789/5807
dc.description.abstract	Cybercrime cases in Malaysia have increased, especially since the pandemic. The nation has created numerous strategic plans, including the Malaysia Cyber Security Strategy (MCSS), which set a baseline for countering cybercrime. One of its pillars is Enhancing Capacity and Capability Building, Awareness, and Education. To raise awareness effectively, the taxonomy of cybercrime must be easy for citizens to understand. This project studies the classification of news postings using supervised models that ease the classification of cybercrime types. Five supervised models, combined with two feature extractors, were examined. The models were evaluated using percentage splits of 70:30 and 80:20, based on accuracy, F1-measure, and precision. Random Forest with the TF-IDF feature extractor produced the best result, achieving an accuracy of 94.01% and standing out for its precision. Naïve Bayes with the Word2vec feature extractor performed the least effectively, with an accuracy of 73.48%. The research also analyzed the textual data by examining word frequency and interpreting topics for the class labels Cybercrime Type 1 and Cybercrime Type 2. Topics for each class of cybercrime news were uncovered using Latent Dirichlet Allocation and interpreted using ChatGPT. The analysis and the classification results are visualized in a Power BI dashboard to aid comprehension. Future research could narrow the scope of the data to local Malay news for more targeted insights.	en_US
dc.language.iso	en_US	en_US
dc.publisher	University of Bahrain	en_US
dc.subject	Article News	en_US
dc.subject	Cybercrime	en_US
dc.subject	Machine Learning	en_US
dc.subject	Text Classification	en_US
dc.subject	Topic Identification	en_US
dc.title	Text Classification on Cybercrime Cases From News Articles Using Supervised Learning	en_US
dc.identifier.doi	XXXXXX
dc.volume	17	en_US
dc.issue	1	en_US
dc.pagestart	1	en_US
dc.pageend	11	en_US
dc.contributor.authorcountry	40450 Shah Alam, Selangor, Malaysia	en_US
dc.contributor.authorcountry	Luleå, Sweden	en_US
dc.contributor.authoraffiliation	School of Computing Sciences, College of Computing, Informatics and Mathematics, Universiti Teknologi MARA	en_US
dc.contributor.authoraffiliation	Institute of Big Data Analytics and Artificial Intelligence, Universiti Teknologi MARA	en_US
dc.contributor.authoraffiliation	Department of Computer Science, Electrical and Space Engineering, Luleå tekniska universitet	en_US
dc.source.title	International Journal of Computing and Digital Systems	en_US
dc.abbreviatedsourcetitle	IJCDS	en_US