Doctoral thesis (Dissertations and theses)
DNS and Semantic Analysis for Phishing Detection
Marchal, Samuel





Keywords :
phishing detection; DNS monitoring; semantic analysis; URL lexical analysis; Internet security; machine learning
Abstract :
[en] Phishing is a modern form of swindle that targets users of electronic communications and aims to persuade them to perform actions for another’s benefit. The miscreants who carry out this activity are called phishers; they use their powers of persuasion to craft socially engineered messages that deceive gullible victims. A popular example of phishing is the theft of web service account login information or credit card information using fake websites or spoofed emails. However, phishing attacks are performed through several means and pursue several goals, which makes the fight against phishing harder. Despite the efforts made to eliminate this threat, phishing remains a concerning problem, since the financial damage it causes is increasing over time. Moreover, the perceived inevitability of falling victim to phishing erodes trust among users and threatens the use of electronic means of communication. Existing solutions to cope with phishing attacks are not adapted to their short lifetime or to the variety of means used to perform them, which makes these solutions inefficient. Crowd-verified blacklists, email content analysis techniques and web page content analysis techniques have not succeeded in reversing the growing impact of phishing. None of these solutions meets the essential requirements for an efficient phishing protection technique: speed, coverage, reliability and usability. Observing that phishing attacks rely mostly on social engineering and that most phishing vectors leverage links expressed as domain names and URLs, we introduce new solutions to cope with phishing. These solutions rely on the lexical and semantic analysis of the composition of domain names and URLs. Both of these resource pointers are created and obfuscated by phishers to trap their victims.
Hence, we demonstrate in this document that phishing domain names and URLs present similarities in their lexical and semantic composition that differ from the composition of legitimate domain names and URLs. We use this characteristic to build models representing the composition of phishing URLs and domain names using machine learning techniques and natural language processing models. The resulting models are used for several applications, such as the identification of phishing domain names and phishing URLs, the rating of phishing URLs and the prediction of domain names used in phishing attacks. All the introduced techniques are assessed on ground truth data and prove efficient, meeting the speed, coverage and reliability requirements. This document shows that lexical and semantic analysis can be applied to domain names and URLs and that this application is relevant for detecting phishing attacks.
Research center :
Interdisciplinary Centre for Security, Reliability and Trust
Disciplines :
Computer science
Author, co-author :
Marchal, Samuel ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Language :
Title :
DNS and Semantic Analysis for Phishing Detection
Alternative titles :
[fr] Analyse du DNS et Analyse Sémantique pour la Détection de l’Hameçonnage
Defense date :
22 June 2015
Number of pages :
Institution :
Unilu - University of Luxembourg, Luxembourg, Luxembourg
Degree :
Promotor :
Engel, Thomas 
Festor, Olivier
President :
Jury member :
Godart, Claude
Filiol, Eric
Totel, Eric
Gurbani, Vijay
State, Radu  
Funders :
FNR - Fonds National de la Recherche [LU]
Available on ORBilu :
since 24 June 2015



