Phishing Website Classification using Machine Learning with Different Datasets

BOUIJIJ, Habiba; BERQIA, Amine

DOI: http://dx.doi.org/10.12785/ijcds/1501115

International Journal of Computing and Digital Systems (UOB Journals), Volume 15, Issue 01

ISSN: 2210-142X

Date: 2024-05-01

Abstract:

Classifying phishing websites by analyzing their URLs is a technique used to strengthen systems that detect malicious websites. However, phishing sites have grown more sophisticated, which makes proactive detection harder. This article focuses on applying deep learning models and machine learning techniques, combined with lexical analysis of URLs, to the classification, detection, and preventive mitigation of phishing websites. Our study evaluates a selection of commonly used machine learning algorithms, specifically Random Forest, K-Nearest Neighbors, Support Vector Machines, Gradient Boosting, Decision Tree, Bagging, AdaBoost, and ExtraTree, as well as a deep neural network model. To assess the effectiveness of these algorithms and models, we run our analysis on two distinct URL datasets, one from 2016 and the other from 2021. Through lexical analysis, we extract significant features from the URLs and then calculate the accuracy of each algorithm on both datasets. Our results show that some algorithms reach accuracy scores of up to 99% on the 2016 dataset. However, this score drops below 91% on the dataset collected in 2021.
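
To make the idea of lexical URL analysis concrete, the following is a minimal sketch of how numeric features might be derived from a URL string alone, without fetching the page. The abstract does not list the features the authors extract, so the function name `lexical_features` and every feature below are illustrative assumptions, not the paper's actual feature set.

```python
import re
from urllib.parse import urlparse

# Matches a dotted-quad host such as "192.168.0.1" (no range check on octets).
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Return a dict of simple lexical features for one URL.

    Hypothetical feature set chosen for illustration only; the
    paper's own features are not given in the abstract.
    """
    # urlparse only fills in the host when a scheme is present,
    # so assume plain http for bare URLs like "example.com/login".
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    host_is_ip = bool(IPV4_RE.match(host))

    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parsed.path),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        # "@" can hide the real host: http://trusted.com@evil.example/
        "has_at_symbol": int("@" in url),
        "host_is_ipv4": int(host_is_ip),
        # Labels beyond "name.tld"; not meaningful for raw IP hosts.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "uses_https": int(parsed.scheme == "https"),
    }
```

Each URL in a dataset would be mapped to such a feature vector, and the resulting table could then be passed to any of the classifiers named above. Measuring accuracy separately on an older and a newer dataset, as the study does, shows how well features of this kind hold up as phishing URLs evolve.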
