References of "State, Radu 50003137"
Peer Reviewed
Empirical assessment of machine learning-based malware detectors for Android: Measuring the Gap between In-the-Lab and In-the-Wild Validation Scenarios
Allix, Kevin UL; Bissyande, Tegawendé François D Assise UL; Jerome, Quentin UL et al

in Empirical Software Engineering (2014)

To address the issue of malware detection through large sets of applications, researchers have recently started to investigate the capabilities of machine-learning techniques for proposing effective approaches. So far, several promising results have been recorded in the literature, with many approaches being assessed in what we call in-the-lab validation scenarios. This paper revisits the purpose of malware detection to discuss whether such in-the-lab validation scenarios provide reliable indications of the performance of malware detectors in real-world settings, i.e., in the wild. To this end, we have devised several machine learning classifiers that rely on a set of features built from applications’ control-flow graphs (CFGs). We use a sizeable dataset of over 50,000 Android applications collected from sources where state-of-the-art approaches have selected their data. We show that, in the lab, our approach outperforms existing machine learning-based approaches. However, this high performance does not translate into high performance in the wild. The performance gap we observed (F-measures dropping from over 0.9 in the lab to below 0.1 in the wild) raises one important question: how do state-of-the-art approaches perform in the wild?
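
The F-measure gap mentioned in this abstract can be made concrete with a minimal sketch. The confusion counts below are hypothetical and chosen only to land in the two regimes the abstract describes; they are not results from the paper.

```python
def f_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (assumption): a detector that looks strong in the lab...
lab_score = f_measure(true_positives=95, false_positives=5, false_negatives=5)
# ...and the same kind of detector overwhelmed by false alarms in the wild.
wild_score = f_measure(true_positives=5, false_positives=95, false_negatives=45)

print(f"in the lab: {lab_score:.2f}, in the wild: {wild_score:.2f}")
```

Because the F-measure is a harmonic mean, a collapse in precision alone is enough to pull the score close to zero even when some malware is still caught.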

Peer Reviewed
PhishStorm: Detecting Phishing With Streaming Analytics
Marchal, Samuel UL; François, Jérôme UL; State, Radu UL et al

in IEEE Transactions on Network and Service Management (2014), 11(December), 458-471

Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due to the short lifetime of phishing Web sites, making recent approaches relying on real-time or proactive phishing URL detection techniques more appropriate. In this paper, we introduce PhishStorm, an automated phishing detection system that can analyze any URL in real time in order to identify potential phishing sites. PhishStorm can interface with any email server or HTTP proxy. We argue that phishing URLs usually have few relationships between the part of the URL that must be registered (low-level domain) and the remaining part of the URL (upper-level domain, path, query). We show in this paper that experimental evidence supports this observation and can be used to detect phishing sites. For this purpose, we define the new concept of intra-URL relatedness and evaluate it using features extracted from the words that compose a URL, based on query data from the Google and Yahoo search engines. These features are then used in machine-learning-based classification to detect phishing URLs from a real dataset. Our technique is assessed on 96,018 phishing and legitimate URLs, resulting in a correct classification rate of 94.91% with only 1.44% false positives. An extension for a URL phishingness rating system exhibiting a high confidence rate (> 99%) is proposed. We discuss in this paper efficient implementation patterns that allow real-time analytics using Big Data architectures such as STORM and advanced data structures based on the Bloom filter.
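
The abstract names the Bloom filter as a building block but does not describe how it is used. The following is a minimal, generic Bloom filter sketch, not the PhishStorm implementation; the class name, sizes, hash construction, and example words are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits[position] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[position] for position in self._positions(item))


# Hypothetical usage: remember words already seen in processed URLs.
seen_words = BloomFilter()
for word in ("secure", "banking", "login"):
    seen_words.add(word)

print("banking" in seen_words)
```

A structure of this kind trades a small false-positive rate for constant memory, which is one plausible reason it suits streaming workloads.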

Peer Reviewed
PhishScore: Hacking Phishers' Minds
Marchal, Samuel UL; François, Jérôme UL; State, Radu UL et al

in Proceedings of the 10th International Conference on Network and Service Management (2014, November)

Despite the growth of prevention techniques, phishing remains an important threat since the principal countermeasures in use are still based on reactive URL blacklisting. This technique is inefficient due to the short lifetime of phishing Web sites, making recent approaches relying on real-time or proactive phishing URL detection techniques more appropriate. In this paper we introduce PhishScore, an automated real-time phishing detection system. We observed that phishing URLs usually have few relationships between the part of the URL that must be registered (upper level domain) and the remaining part of the URL (low level domain, path, query). Hence, we define this concept as intra-URL relatedness and evaluate it using features extracted from the words that compose a URL, based on query data from the Google and Yahoo search engines. These features are then used in machine-learning-based classification to detect phishing URLs from a real dataset.

Peer Reviewed
A Big Data Architecture for Large Scale Security Monitoring
Marchal, Samuel UL; Jiang, Xiuyan; State, Radu UL et al

in Proceedings of the 3rd IEEE Congress on Big Data (2014, July)

Network traffic is a rich source of information for security monitoring. However, the increasing volume of data to process raises issues, rendering holistic analysis of network traffic difficult. In this paper we propose a solution to cope with the tremendous amount of data to analyse for security monitoring purposes. We introduce an architecture dedicated to security monitoring of local enterprise networks. The application domain of such a system is mainly network intrusion detection and prevention, but it can also be used for forensic analysis. This architecture integrates two systems, one dedicated to scalable distributed data storage and management and the other dedicated to data exploitation. DNS data, NetFlow records, HTTP traffic and honeypot data are mined and correlated in a distributed system that leverages state-of-the-art big data solutions. Data correlation schemes are proposed and their performance is evaluated against several well-known big data frameworks, including Hadoop and Spark.

Peer Reviewed
A Forensic Analysis of Android Malware -- How is Malware Written and How It Could Be Detected?
Allix, Kevin UL; Jerome, Quentin UL; Bissyande, Tegawendé François D Assise UL et al

in Proceedings of the 2014 IEEE 38th Annual Computer Software and Applications Conference (2014, July)

We consider in this paper the analysis of a large set of malware and benign applications from the Android ecosystem. Although a large body of research work has dealt with Android malware in recent years, none has addressed it from a forensic point of view. After collecting over 500,000 applications from user markets and research repositories, we perform an analysis that yields valuable insights into the writing process of Android malware. This study also explores some strange artifacts in the datasets, and the divergent capabilities of state-of-the-art antivirus products to recognize and define malware. We further highlight some major weaknesses in how the criminal community uses and understands Android security, and show some patterns in their operational flow. Finally, using insights from this analysis, we build a naive malware detection scheme that could complement existing antivirus software.

Peer Reviewed
Using opcode-sequences to detect malicious Android applications
Jerome, Quentin UL; Allix, Kevin UL; State, Radu UL et al

in IEEE International Conference on Communications, ICC 2014, Sydney Australia, June 10-14, 2014 (2014, June)

Recently, the Android platform has seen its number of malicious applications increase sharply. Motivated by the easy application submission process and the number of alternative market places for distributing Android applications, rogue authors are constantly developing new malicious programs. While current anti-virus software mainly relies on signature detection, the issue of alternative malware detection has to be addressed. In this paper, we present a feature-based detection mechanism relying on opcode-sequences combined with machine learning techniques. We assess our tool both on a reference dataset known as Genome Project and on a wider sample of 40,000 applications retrieved from the Google Play Store.

Peer Reviewed
Large-scale Machine Learning-based Malware Detection: Confronting the "10-fold Cross Validation" Scheme with Reality
Allix, Kevin UL; Bissyande, Tegawendé François D Assise UL; Jerome, Quentin UL et al

in Proceedings of the 4th ACM Conference on Data and Application Security and Privacy (2014, March)

To address the issue of malware detection, researchers have recently started to investigate the capabilities of machine-learning techniques for proposing effective approaches. Several promising results have been recorded in the literature, with many approaches being assessed with the common “10-Fold cross validation” scheme. This paper revisits the purpose of malware detection to discuss the adequacy of the “10-Fold” scheme for validating techniques that may not perform well in reality. To this end, we have devised several machine learning classifiers that rely on a novel set of features built from applications’ CFGs. We use a sizeable dataset of over 50,000 Android applications collected from sources where state-of-the-art approaches have selected their data. We show that our approach outperforms existing machine learning-based approaches. However, this high performance on usual-size datasets does not translate into high performance in the wild.

Correctness of source code extension for fault detection in openflow based networks
Hermann, Frank UL; Hommes, Stefan UL; State, Radu UL et al

Report (2014)

Software Defined Networks using OpenFlow have to provide a reliable way to detect network faults and attacks. This technical report presents a formal analysis of correctness for an automated code extension technique used to extend OpenFlow networks with a logging mechanism that is used for the detection of faults and attacks. As presented in a companion paper, we applied the code extension techniques to a framework that can extend controller programs transparently, making possible on-line fault management and debugging as well as off-line and forensic analysis.

Peer Reviewed
Implications and Detection of DoS Attacks in OpenFlow-based Networks
Hommes, Stefan UL; State, Radu UL; Engel, Thomas UL

in 2014 IEEE Global Communications Conference (2014)

In this paper, we address the potential of centralised network monitoring based on Software-Defined Networking with OpenFlow. Because the flow table can store only a limited number of entries and is therefore vulnerable, we discuss and show the implications of a DoS attack on a testbed consisting of OpenFlow-enabled network devices. Such an attack can be detected by analysing variations in the logical topology, using techniques from information theory that can run as a network service on the network controller.

Peer Reviewed
Automated Source Code Extension for Debugging of OpenFlow based Networks
Hommes, Stefan UL; Hermann, Frank UL; State, Radu UL et al

in Proc. 9th International Conference on Network and Service Management (CNSM) (2013, October)

Software-Defined Networks using OpenFlow have to provide a reliable way to detect network faults in operational environments. Since the functionality of such networks is mainly based on the installed software, tools are required in order to detect software bugs. Moreover, network debugging might be necessary in order to detect faults that occurred on the network devices. To support such activities, existing controller programs must be extended with the relevant functionality. In this paper we propose a framework that can modify controller programs transparently by using graph transformation, making possible online fault management through logging of network parameters in a NoSQL database. The latter acts as a storage system for flow entries and their respective parameters, which can be leveraged to detect network anomalies or to perform forensic analysis.

Peer Reviewed
Identifying abnormal pattern in cellular communication flows
Goergen, David UL; Mendiratta, Veena; State, Radu UL et al

in Proceedings of IPTComm 2013 (2013, October)

Analyzing communication flows on the network can help to improve the overall quality it provides to its users and allow operators to detect abnormal patterns and react accordingly. In this paper we consider the analysis of large volumes of cellular communication records. We propose a method that detects abnormal communication events in call data record volumes from a country-level data set. We detect patterns by calculating a weighted average using a sliding window with a fixed period and correlate the results with actual events happening at that time. We are able to successfully detect several events using a data set provided by a mobile phone operator, and suggest examples of future uses of the outcome, such as real-time pattern detection and possible visualisation for mobile phone operators.
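
The sliding-window weighted average mentioned in this abstract can be sketched as follows. The abstract does not give the weights, window length, or decision rule, so the linear weights, the window of 4, the 1.5x threshold, and the sample counts are all assumptions for illustration.

```python
from typing import List, Sequence


def weighted_average(values: Sequence[float]) -> float:
    """Average with linearly increasing weights, so recent samples count more."""
    weights = range(1, len(values) + 1)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def flag_abnormal(counts: Sequence[float], window: int = 4, ratio: float = 1.5) -> List[int]:
    """Return indices whose value exceeds `ratio` times the preceding window's baseline."""
    flagged = []
    for t in range(window, len(counts)):
        baseline = weighted_average(counts[t - window:t])
        if baseline > 0 and counts[t] > ratio * baseline:
            flagged.append(t)
    return flagged


# Hypothetical hourly call-record volumes with one burst at index 5.
hourly_calls = [100, 102, 98, 101, 99, 250, 100, 97]
abnormal_hours = flag_abnormal(hourly_calls)
print(abnormal_hours)  # → [5]
```

Note that the burst itself inflates the baseline for the following windows, which is why a fixed-period window needs a carefully chosen length in practice.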

Peer Reviewed
Classification of Log Files with Limited Labeled Data
Hommes, Stefan UL; State, Radu UL; Engel, Thomas UL

in Proceedings of IPTComm 2013 (2013, October)

We address the problem of anomaly detection in log files that consist of a huge number of records. In order to achieve this task, we demonstrate label propagation as a semi-supervised learning technique. The strength of this approach lies in the small amount of labelled data that is needed to label the remaining data. This is an advantage since labelling data requires human expertise, which comes at a high cost and becomes infeasible for big datasets. Even though our approach is generally applicable, we focus on the detection of anomalous records in firewall log files. This requires a separation of records into windows, which are compared using different distance functions to determine their similarity. Afterwards, we apply label propagation to label a complete dataset in only a limited number of iterations. We demonstrate our approach on a realistic dataset from an ISP.

Peer Reviewed
Advanced Detection Tool for PDF Threats
Jerome, Quentin UL; Marchal, Samuel UL; State, Radu UL et al

in Proceedings of the sixth International Workshop on Autonomous and Spontaneous Security, RHUL, Egham, U.K., 12th-13th September 2013 (2013, September 13)

In this paper we introduce an efficient application for malicious PDF detection: ADEPT. With targeted attacks rising over the recent past, exploring a new detection and mitigation paradigm becomes mandatory. The use of malicious PDF files that exploit vulnerabilities in well-known PDF readers has become a popular vector for targeted attacks, for which few efficient approaches exist. Although simple in theory, parsing followed by analysis of such files is resource-intensive and may even be impossible due to several obfuscation and reader-specific artifacts. Our paper describes a new approach for detecting such malicious payloads that leverages machine learning techniques and an efficient feature selection mechanism for rapidly detecting anomalies. We assess our approach on a large selection of malicious files and report the experimental performance results for the developed prototype.

Aggregating large-scale measurements for Application Layer Traffic Optimization (ALTO) Protocol
Goergen, David UL; State, Radu UL; Gurbani, Vijay

Scientific Conference (2013, July)

Analyzing and aggregating large-scale broadband measurements is essential to study trends and derive network analytics. These trends and analyses could be made available through well-defined protocols such as the Application Layer Traffic Optimization (ALTO) protocol. However, ALTO requires network information to be distilled and abstracted in the form of a network map and a cost map. We describe our methodology for analyzing the United States Federal Communications Commission’s (FCC) Measuring Broadband America (MBA) dataset to derive the required topology and cost maps suitable for consumption by an ALTO server.

Peer Reviewed
ASMATRA: Ranking ASs Providing Transit Service to Malware Hosters
Wagner, Cynthia UL; François, Jérôme UL; State, Radu UL et al

in IFIP/IEEE International Symposium on Integrated Network Management IM2013 (2013)

The Internet has grown into an enormous network offering a variety of services, which are spread over a multitude of domains. BGP routing and Autonomous Systems (ASs) are the key components for maintaining high connectivity in the Internet. Unfortunately, Internet Service Providers (ISPs) operating ASs host not only normal users and content, but also malicious content used by attackers for spreading malware, hosting phishing websites or performing any kind of fraudulent activity. Practical analysis shows that such malware-providing ASs avoid being de-peered by hiding behind other ASs, which do not host the malware themselves but simply provide transit service for malware. This paper presents a new method for detecting ASs that provide transit service for malware hosters without being malicious themselves. A formal definition of the problem and the metrics are determined by using the AS graph. The PageRank algorithm is applied to improve the scalability and the completeness of the approach. The method is assessed on real and publicly available datasets, showing promising results.
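
As a rough illustration of how PageRank concentrates score on nodes that many others point to, here is a plain power-iteration sketch on a toy graph. It is textbook PageRank, not the ASMATRA metric; the AS names, the edge direction (from a hoster towards the AS assumed to carry its traffic), and the parameters are assumptions.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain PageRank by power iteration; every edge target must be a key of `graph`."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical AS graph: two hosters reach the Internet through one transit AS.
as_graph = {
    "AS_hoster_1": ["AS_transit"],
    "AS_hoster_2": ["AS_transit"],
    "AS_transit": ["AS_upstream"],
    "AS_upstream": [],
}
scores = pagerank(as_graph)
print(max(scores, key=scores.get))
```

In this toy graph the transit AS scores higher than either hoster, which matches the intuition that score flows along the edges towards shared providers.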

Peer Reviewed
Semantic based DNS Forensics
Marchal, Samuel UL; François, Jérôme UL; State, Radu UL et al

in Proceedings of the IEEE International Workshop on Information Forensics and Security (2012, December)

In network-level forensics, the Domain Name System (DNS) is a rich source of information. This paper describes a new approach to mine DNS data for forensic purposes. We propose a new technique that leverages semantic and natural language processing tools in order to analyze large volumes of DNS data. The main research novelty consists in detecting malicious and dangerous domain names by evaluating their semantic similarity with already known names. This process can provide valuable information for reconstructing network and user activities. We show the efficiency of the method on real experimental datasets gathered from a national passive DNS system.

Peer Reviewed
Proactive Discovery of Phishing Related Domain Names
Marchal, Samuel UL; François, Jérôme UL; State, Radu UL et al

in Proceedings of the 15th International Symposium on Research in Attacks, Intrusions and Defenses, Amsterdam 12-14 September 2012 (2012, September)

Phishing is an important security issue for the Internet and has a significant economic impact. The main solution to counteract this threat is currently reactive blacklisting; however, as phishing attacks are mainly performed over short periods of time, reactive methods are too slow. As a result, new approaches to identify malicious websites early are needed. In this paper a new approach to the proactive discovery of phishing related domain names is introduced. We mainly focus on the automated detection of possible domain registrations for malicious activities. We leverage techniques coming from natural language modelling in order to build proactive blacklists. The entries in this list are built using language models and vocabularies encountered in phishing related activities: “secure”, “banking”, brand names, etc. Once a proactive blacklist is created, ongoing and daily monitoring of only these domains can lead to the efficient detection of phishing web sites.

Peer Reviewed
A Distance-Based Method to Detect Anomalous Attributes in Log Files
Hommes, Stefan UL; State, Radu UL; Engel, Thomas UL

in Proceedings of IEEE/IFIP NOMS 2012 (2012, April)

Dealing with large volumes of logs is like the proverbial needle-in-the-haystack problem. Finding relevant events that might be associated with an incident, or analysing operational logs in real time, is extremely difficult when the underlying data volume is huge and when no explicit misuse model exists. While domain-specific knowledge and human expertise may be useful in analysing log data, automated approaches for detecting anomalies and tracking incidents are the only viable solutions when confronted with large volumes of data. In this paper we address the issue of automated log analysis and consider more specifically the case of ISP-provided firewall logs. We leverage approaches derived from statistical process control and information theory in order to track potential incidents and detect suspicious network activity.

Peer Reviewed
DNSSM: A large-scale Passive DNS Security Monitoring Framework
Marchal, Samuel UL; François, Jérôme UL; Wagner, Cynthia UL et al

in IEEE/IFIP Network Operations and Management Symposium (2012, April)

We present a monitoring approach and the supporting software architecture for passive DNS traffic. Monitoring DNS traffic can reveal essential network and system level activity profiles. Worm-infected and botnet-participating hosts can be identified and malicious backdoor communications can be detected. Any passive DNS monitoring solution needs to address several challenges that range from architectural approaches for dealing with large volumes of data to specific data mining approaches for this purpose. We describe a framework that combines state-of-the-art distributed processing facilities with clustering techniques in order to detect anomalies in both online and offline DNS traffic. This framework, entitled DNSSM, is implemented and operational on several networks. We validate the framework against two large trace sets.
