PhishStorm: Detecting Phishing With Streaming Analytics

Marchal, Samuel; François, Jérôme; State, Radu; Engel, Thomas

doi:10.1109/TNSM.2014.2377295

Article (Scientific journals)

2014 • In IEEE Transactions on Network and Service Management, 11 (December), p. 458-471

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/10993/20053

Files

Full Text

phishStorm-revised.pdf

Author preprint (2.3 MB)

Details

Keywords :

Big Data; Phishing Detection; Machine Learning; Mining and Statistical Methods; Search Engine Query Data; URL rating; Word Relatedness

Abstract :

Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due to the short lifetime of phishing Web sites, making recent approaches relying on real-time or proactive phishing URL detection techniques more appropriate. In this paper, we introduce PhishStorm, an automated phishing detection system that can analyze in real time any URL in order to identify potential phishing sites. PhishStorm can interface with any email server or HTTP proxy. We argue that phishing URLs usually have few relationships between the part of the URL that must be registered (low-level domain) and the remaining part of the URL (upper-level domain, path, query). We show in this paper that experimental evidence supports this observation and can be used to detect phishing sites. For this purpose, we define the new concept of intra-URL relatedness and evaluate it using features extracted from words that compose a URL based on query data from Google and Yahoo search engines. These features are then used in machine-learning-based classification to detect phishing URLs from a real dataset. Our technique is assessed on 96 018 phishing and legitimate URLs that result in a correct classification rate of 94.91% with only 1.44% false positives. An extension for a URL phishingness rating system exhibiting high confidence rate (> 99%) is proposed. We discuss in this paper efficient implementation patterns that allow real-time analytics using Big Data architectures such as STORM and advanced data structures based on the Bloom filter.
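The split the abstract describes, between the registered part of a URL and everything else, can be illustrated with a small sketch. This is not the authors' implementation: the function names `split_url_words` and `jaccard` are hypothetical, the registered domain is naively taken to be the last two host labels (a real system would need a public-suffix list), and plain word overlap stands in for the search-engine-based relatedness features, which the abstract does not specify in detail.

```python
import re
from urllib.parse import urlsplit


def _words(text):
    """Lowercase alphabetic tokens longer than two characters."""
    return {w for w in re.split(r"[^a-z]+", text.lower()) if len(w) > 2}


def split_url_words(url):
    """Return (registered_words, remaining_words) for a URL.

    Assumption: the registered domain is the last two host labels,
    e.g. "example.net". This is wrong for suffixes such as "co.uk".
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    labels = (parts.hostname or "").split(".")
    registered_label = labels[-2] if len(labels) >= 2 else labels[0]
    upper_labels = labels[:-2]  # subdomains above the registered domain
    registered = _words(registered_label)
    remaining = _words(" ".join(upper_labels + [parts.path, parts.query]))
    return registered, remaining


def jaccard(a, b):
    """Word-set overlap, used here only as a stand-in for relatedness."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


phish = "http://paypal.com.secure-login.example.net/paypal/signin?user=abc"
legit = "https://www.paypal.com/paypal/signin"
phish_score = jaccard(*split_url_words(phish))
legit_score = jaccard(*split_url_words(legit))
```

In this toy pair the phishing-style URL places the brand name only outside the registered domain, so the two word sets share nothing, while the legitimate URL repeats its registered name in the path. A literal overlap measure like this misses semantically related but different words, which is the gap that relatedness features derived from search query data are meant to cover.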

Research center :

Interdisciplinary Centre for Security, Reliability and Trust

Disciplines :

Computer science

Author, co-author :

Marchal, Samuel ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

François, Jérôme ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

State, Radu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)

Engel, Thomas ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)

Language :

English

Title :

PhishStorm: Detecting Phishing With Streaming Analytics

Publication date :

December 2014

Journal title :

IEEE Transactions on Network and Service Management

ISSN :

1932-4537

Publisher :

IEEE Communications Society, New York, United States - New York

Volume :

11

Issue :

December

Pages :

458-471

Peer reviewed :

Peer Reviewed verified by ORBi

Funders :

FNR - Fonds National de la Recherche [LU]

Available on ORBilu :

since 16 February 2015

Statistics

Number of views

580 (6 by Unilu)

Number of downloads

3425 (6 by Unilu)

Scopus® citations

153

Scopus® citations without self-citations

148

WoS™ citations

106