The applicability of a hybrid framework for automated phishing detection

Cybersecurity; Cyberthreat intelligence; Cyberthreats; Fraud detection; Hybrid framework; Machine learning; Phishing; Phishing detection; Privacy and security; Security management; General Computer Science; Law

Abstract :

Phishing attacks are a critical and escalating cybersecurity threat in the modern digital landscape. As cybercriminals continually adapt their techniques, automated phishing detection systems have become essential for safeguarding Internet users. However, many current systems rely on single-analysis models, which makes them vulnerable to sophisticated bypass attempts by attackers. This research investigates the potential of hybrid approaches, which combine multiple models to enhance both the robustness and the effectiveness of phishing detection. It highlights the limitations of existing hybrid models, which focus primarily on effectiveness while ignoring broader applicability. To address these gaps, we introduce a novel framework explicitly designed for real-world applicability, which lays the foundation for practical and robust phishing detection architectures. We develop a proof of concept to evaluate its effectiveness, robustness, and detection speed. Additionally, we introduce a methodology for simulating bypass attacks on single-analysis base models. Our experiments demonstrate that the proposed hybrid framework outperforms individual models, displaying higher effectiveness, robustness against bypass attempts, and real-time detection capabilities. Our proof of concept achieves an accuracy of 97.44%, outperforming the current state-of-the-art approach while requiring less computational time. The results provide insights into the multifaceted factors of hybrid models, extending beyond mere effectiveness, and emphasize the importance of holistic applicability in hybrid approaches to address the critical need for robust defenses against phishing attacks.

Disciplines :

Computer science

Author, co-author :

van Geest, R.J. ; Eindhoven University of Technology, Jheronimus Academy of Data Science, Netherlands

Cascavilla, G. ; Eindhoven University of Technology, Jheronimus Academy of Data Science, Netherlands

HULSTIJN, Joris ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)

Zannone, N. ; Eindhoven University of Technology, Netherlands

External co-authors :

yes

Language :

English

Publication date :

April 2024

Journal title :

Computers and Security

ISSN :

0167-4048

Publisher :

Elsevier Ltd

Volume :

139

Pages :

103736

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

https://api.elsevier.com/content/article/PII:S0167404824000373?httpAccept=text/xml

Available on ORBilu :

since 06 March 2024
