Abstract:
Classifying phishing websites by analyzing their URLs is a technique used to enhance the capabilities of
systems designed to detect malicious websites. However, the evolution of phishing sites has allowed them to achieve higher levels of
sophistication, making proactive detection more complex. This article focuses on the use of deep
learning models and machine learning techniques, combined with lexical analysis of URLs, to support the classification, detection, and
preventive mitigation of phishing websites. Our study evaluates a selection of commonly used machine learning
algorithms, specifically Random Forest, K-Nearest Neighbors, Support Vector Machines, Gradient Boosting, Decision Tree, Bagging,
AdaBoost, and ExtraTree, as well as a deep neural network model. To assess the effectiveness of these algorithms and models, we
conduct our analysis using two distinct URL datasets, one from 2016 and the other from 2021. Through lexical analysis, we extract
significant features from the URLs and then calculate the accuracy of each algorithm on both datasets. Our results show that some
algorithms achieve accuracy of up to 99% on the 2016 dataset. However, accuracy drops below 91% on the dataset
collected in 2021.
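
To illustrate what lexical analysis of a URL can look like in practice, the following is a minimal sketch in Python. The function name lexical_features and the particular features it computes (lengths, character counts, IP-address host, path depth) are assumptions chosen for illustration; they are not the feature set used in this study.

```python
import re
from urllib.parse import urlparse


def lexical_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    The feature set is hypothetical and only shows the general idea:
    every value is derived from the URL text alone, without fetching
    the page or querying any external service.
    """
    # urlparse only fills in the host when a scheme is present,
    # so assume plain http for scheme-less inputs.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(c.isdigit() for c in url),
        "has_at_symbol": "@" in url,
        # Rough dotted-quad check; does not validate octet ranges.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "path_depth": len([p for p in parsed.path.split("/") if p]),
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    for example in ("http://192.168.0.1/login/verify.php",
                    "https://secure-login.example.com/account"):
        print(example, lexical_features(example))
```

Feature dictionaries of this kind can be stacked into a numeric matrix and passed to any of the classifiers listed above, with accuracy then computed separately for each dataset.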