![]() Marchal, Samuel ![]() Doctoral thesis (2015) Phishing is a kind of modern swindles that targets electronic communications users and aims to persuade them to perform actions for a another’s benefit. Miscreants performing this activity are named ... [more ▼] Phishing is a kind of modern swindles that targets electronic communications users and aims to persuade them to perform actions for a another’s benefit. Miscreants performing this activity are named phishers and employ their power of persuasion to tailor socially engineered messages able to deceive their gullible victims. A popular example of phishing activities is the stealing of web services account login information or credit card information using fake websites or spoofed emails. However, several means are used to perform phishing attacks and several goals are sought, which harden the fight against phishing. Despite the forces engaged to get rid of this threat, phishing remains a concerning problem since the financial damage it causes is increasing overtime. Moreover, the perceived fatality about being a victim of phishing erodes the trust among users and threaten the use of electronic means as way of communicating. Existing solutions to cope with phishing attacks are not adapted to their short lifetime and the variety of means used to perform them, making them inefficient. Crowd verified blacklists, emails content analysis techniques or web page content analysis techniques did not succeed to reverse the increasing trend presented by phishing consequences. None of these solutions present the essential requirements that must meet a phishing protection technique to be efficient and which are speed, coverage, reliability and usability. Stating that phishing attacks rely mostly on social engineering and that most phishing vectors leverage directing links represented by domain names and URLs, we introduce new solutions to cope with phishing. These solutions rely on the lexical and semantic analysis of the composition of domain names and URLs. Both of these resource pointers are created and obfuscated by phishers to trap their victims. Hence, we demonstrate in this document that phishing do- main names and URLs present similarities in their lexical and semantic composition that are different form legitimate domain names and URLs composition. We use this characteristic to build models representing the composition of phishing URLs and domain names using machine learning techniques and natural language processing models. The built models are used for several applications such as the identification of phishing domain names and phishing URLs, the rating of phishing URLs and the prediction of domain names used in phishing attacks. All the introduced techniques are assessed on ground truth data and show their efficiency by meeting speed, coverage and reliability requirements. This document shows that the use of lexical and semantic analysis can be applied to domain names and URLs and that this application is relevant to detect phishing attacks. [less ▲] Detailed reference viewed: 405 (12 UL)![]() Marchal, Samuel ![]() ![]() ![]() in IEEE Transactions on Network and Service Management (2014), 11(December), 458-471 Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due ... [more ▼] Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due to the short lifetime of phishing Web sites, making recent approaches relying on real-time or proactive phishing URL detection techniques more appropriate. In this paper, we introduce PhishStorm, an automated phishing detection system that can analyze in real time any URL in order to identify potential phishing sites. PhishStorm can interface with any email server or HTTP proxy. We argue that phishing URLs usually have few relationships between the part of the URL that must be registered (low-level domain) and the remaining part of the URL (upper-level domain, path, query). We show in this paper that experimental evidence supports this observation and can be used to detect phishing sites. For this purpose, we define the new concept of intra-URL relatedness and evaluate it using features extracted from words that compose a URL based on query data from Google and Yahoo search engines. These features are then used in machine-learning-based classification to detect phishing URLs from a real dataset. Our technique is assessed on 96 018 phishing and legitimate URLs that result in a correct classification rate of 94.91% with only 1.44% false positives. An extension for a URL phishingness rating system exhibiting high confidence rate ( $>$ 99%) is proposed. We discuss in this paper efficient implementation patterns that allow real-time analytics using Big Data architectures such as STORM and advanced data structures based on the Bloom filter. [less ▲] Detailed reference viewed: 650 (5 UL)![]() Marchal, Samuel ![]() ![]() ![]() in Proceedings of the 10th International Conference on Network and Service Management (2014, November) Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due ... [more ▼] Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due to the short lifetime of phishing Web sites, making recent approaches relying on real-time or proactive phishing URLs detection techniques more appropriate. In this paper we introduce PhishScore, an automated real-time phishing detection system. We observed that phishing URLs usually have few relationships between the part of the URL that must be registered (upper level domain) and the remaining part of the URL (low level domain, path, query). Hence, we define this concept as intra-URL relatedness and evaluate it using features extracted from words that compose a URL based on query data from Google and Yahoo search engines. These features are then used in machine learning based classification to detect phishing URLs from a real dataset. [less ▲] Detailed reference viewed: 367 (10 UL)![]() Marchal, Samuel ![]() ![]() in Proceedings of the 3rd IEEE Congress on Big Data (2014, July) Network traffic is a rich source of information for security monitoring. However the increasing volume of data to treat raises issues, rendering holistic analysis of network traffic difficult. In this ... [more ▼] Network traffic is a rich source of information for security monitoring. However the increasing volume of data to treat raises issues, rendering holistic analysis of network traffic difficult. In this paper we propose a solution to cope with the tremendous amount of data to analyse for security monitoring perspectives. We introduce an architecture dedicated to security monitoring of local enterprise networks. The application domain of such a system is mainly network intrusion detection and prevention, but can be used as well for forensic analysis. This architecture integrates two systems, one dedicated to scalable distributed data storage and management and the other dedicated to data exploitation. DNS data, NetFlow records, HTTP traffic and honeypot data are mined and correlated in a distributed system that leverages state of the art big data solution. Data correlation schemes are proposed and their performance are evaluated against several well-known big data framework including Hadoop and Spark. [less ▲] Detailed reference viewed: 581 (14 UL)![]() Jerome, Quentin ![]() ![]() ![]() in Proceedings of the sixth International Workshop on Autonomous and Spontaneous Security, RHUL, Egham, U.K., 12th-13th September 2013 (2013, September 13) In this paper we introduce an efficient application for malicious PDF detection: ADEPT. With targeted attacks rising over the recent past, exploring a new detection and mitigation paradigm becomes ... [more ▼] In this paper we introduce an efficient application for malicious PDF detection: ADEPT. With targeted attacks rising over the recent past, exploring a new detection and mitigation paradigm becomes mandatory. The use of malicious PDF files that exploit vulnerabilities in well-known PDF readers has become a popular vector for targeted at- tacks, for which few efficient approaches exist. Although simple in theory, parsing followed by analysis of such files is resource-intensive and may even be impossible due to several obfuscation and reader-specific artifacts. Our paper describes a new approach for detecting such malicious payloads that leverages machine learning techniques and an efficient feature selection mechanism for rapidly detecting anomalies. We assess our approach on a large selection of malicious files and report the experimental performance results for the developed prototype. [less ▲] Detailed reference viewed: 994 (6 UL)![]() Marchal, Samuel ![]() ![]() ![]() in Proceedings of the IEEE International Workshop on Information Forensics and Security (2012, December) In network level forensics, Domain Name Service (DNS) is a rich source of information. This paper describes a new approach to mine DNS data for forensic purposes. We propose a new technique that leverages ... [more ▼] In network level forensics, Domain Name Service (DNS) is a rich source of information. This paper describes a new approach to mine DNS data for forensic purposes. We propose a new technique that leverages semantic and natural language processing tools in order to analyze large volumes of DNS data. The main research novelty consists in detecting malicious and dangerous domain names by evaluating the semantic similarity with already known names. This process can provide valuable information for reconstructing network and user activities. We show the efficiency of the method on experimental real datasets gathered from a national passive DNS system. [less ▲] Detailed reference viewed: 244 (3 UL)![]() Marchal, Samuel ![]() ![]() ![]() in Proceedings of the 15th International Symposium on Research in Attacks, Intrusions and Defenses, Amsterdam 12-14 September 2012 (2012, September) Phishing is an important security issue to the Internet, which has a significant economic impact. The main solution to counteract this threat is currently reactive blacklisting; however, as phishing ... [more ▼] Phishing is an important security issue to the Internet, which has a significant economic impact. The main solution to counteract this threat is currently reactive blacklisting; however, as phishing attacks are mainly performed over short periods of time, reactive methods are too slow. As a result, new approaches to early identify malicious websites are needed. In this paper a new proactive discovery of phishing related domain names is introduced. We mainly focus on the automated detec- tion of possible domain registrations for malicious activities. We leverage techniques coming from natural language modelling in order to build pro- active blacklists. The entries in this list are built using language models and vocabularies encountered in phishing related activities - “secure”, “banking”, brand names, etc. Once a pro-active blacklist is created, ongoing and daily monitoring of only these domains can lead to the efficient detection of phishing web sites. [less ▲] Detailed reference viewed: 170 (1 UL)![]() Marchal, Samuel ![]() ![]() in 6th IFIP WG 6.6 International Conference on Autonomous Infrastructure, Management, and Security, Luxembourg, June 4-8 2012 (2012, June) In this paper we present an architecture for large scale DNS monitoring. The analysis of DNS traffic is becoming of first importance currently, as it allows to monitor the main part of the interactions on ... [more ▼] In this paper we present an architecture for large scale DNS monitoring. The analysis of DNS traffic is becoming of first importance currently, as it allows to monitor the main part of the interactions on the Internet. DNS traffic can reveal anomalies such as worm infected hosts, botnets or spam participating hosts. The efficiency and the speed of detection of such anomalies rely on the capacity of DNS monitoring system to treat quickly huge quantity of data. We propose a system that leverages distributed processing and storage facilities. [less ▲] Detailed reference viewed: 129 (2 UL)![]() Marchal, Samuel ![]() ![]() ![]() in Proceedings of the 11th International IFIP TC 6 Networking Conference, Prague, Czech Republic, May 21-25 2012 (2012, May) The DNS structure discloses useful information about the organization and the operation of an enterprise network, which can be used for designing attacks as well as monitoring domains supporting malicious ... [more ▼] The DNS structure discloses useful information about the organization and the operation of an enterprise network, which can be used for designing attacks as well as monitoring domains supporting malicious activities. Thus, this paper introduces a new method for exploring the DNS domains. Although our previous work described a tool to generate existing DNS names accurately in order to probe a domain automatically, the approach is extended by leveraging semantic analysis of domain names. In particular, the semantic distributional similarity and relatedness of sub-domains are considered as well as sequential patterns. The evaluation shows that the discovery is highly improved while the overhead remains low, comparing with non semantic DNS probing tools including ours and others. [less ▲] Detailed reference viewed: 157 (0 UL)![]() Marchal, Samuel ![]() ![]() ![]() in IEEE/IFIP Network Operations and Management Symposium (2012, April) We present a monitoring approach and the supporting software architecture for passive DNS traffic. Monitoring DNS traffic can reveal essential network and system level activity profiles. Worm infected and ... [more ▼] We present a monitoring approach and the supporting software architecture for passive DNS traffic. Monitoring DNS traffic can reveal essential network and system level activity profiles. Worm infected and botnet participating hosts can be identified and malicious backdoor communications can be detected. Any passive DNS monitoring solution needs to address several challenges that range from architectural approaches for dealing with large volumes of data up to specific Data Mining approaches for this purpose. We describe a framework that leverages state of the art distributed processing facilities with clustering techniques in order to detect anomalies in both online and offline DNS traffic. This framework entitled DSNSM is implemented and operational on several networks. We validate the framework against two large trace sets. [less ▲] Detailed reference viewed: 232 (2 UL) |
||