References of "Jimenez, Matthieu 50002043"
TUNA: TUning Naturalness-based Analysis
Jimenez, Matthieu; Cordy, Maxime; Le Traon, Yves et al

in 34th IEEE International Conference on Software Maintenance and Evolution, Madrid, Spain, 26-28 September 2018 (2018, September 26)

Natural language processing techniques, in particular n-gram models, have been applied successfully to facilitate a number of software engineering tasks. However, in our related ICSME ’18 paper, we have shown that the conclusions of a study can drastically change with respect to how the code is tokenized and how the used n-gram model is parameterized. These choices are thus of utmost importance, and one must make them carefully. To show this and allow the community to benefit from our work, we have developed TUNA (TUning Naturalness-based Analysis), a Java software artifact to perform naturalness-based analyses of source code. To the best of our knowledge, TUNA is the first open-source, end-to-end toolchain to carry out source code analyses based on naturalness.
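
TUNA itself is a Java artifact; purely as an illustration of the underlying idea (not of TUNA's actual API), a naturalness-based analysis can be sketched as a token-level n-gram model whose cross-entropy flags "unnatural" code. Everything below, including the file name, is hypothetical:

    import math
    from collections import Counter

    def to_ngrams(tokens, n):
        """All contiguous n-grams of a token sequence, start-padded with <s>."""
        padded = ["<s>"] * (n - 1) + tokens
        return [tuple(padded[i:i + n]) for i in range(len(tokens))]

    class NGramModel:
        """Token-level n-gram model with add-one smoothing (illustration only;
        the related ICSME '18 paper recommends Modified Kneser-Ney)."""
        def __init__(self, n=4):
            self.n = n
            self.ngram_counts = Counter()
            self.context_counts = Counter()
            self.vocab = set()

        def fit(self, corpus):  # corpus: list of token lists
            for tokens in corpus:
                self.vocab.update(tokens)
                for gram in to_ngrams(tokens, self.n):
                    self.ngram_counts[gram] += 1
                    self.context_counts[gram[:-1]] += 1

        def cross_entropy(self, tokens):
            """Average negative log2 probability; lower = more 'natural'."""
            v = len(self.vocab) + 1  # +1 accounts for unseen tokens
            grams = to_ngrams(tokens, self.n)
            logp = sum(math.log2((self.ngram_counts[g] + 1)
                                 / (self.context_counts[g[:-1]] + v))
                       for g in grams)
            return -logp / len(grams)

    # Hypothetical usage: train on a project, score one line of code.
    corpus = [line.split() for line in open("Project.java")]
    model = NGramModel(n=4)
    model.fit(corpus)
    print(model.cross_entropy("if ( x == null ) return ;".split()))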

On the impact of tokenizer and parameters on N-gram based Code Analysis
Jimenez, Matthieu; Cordy, Maxime; Le Traon, Yves et al

Scientific Conference (2018, September)

Recent research shows that language models, such as n-gram models, are useful for a wide variety of software engineering tasks, e.g., code completion, bug identification, code summarisation, etc. However, such models require setting numerous parameters appropriately. Moreover, the different ways one can read code essentially yield different models (based on the different sequences of tokens). In this paper, we focus on n-gram models and evaluate how the choice of tokenizer, smoothing technique, unknown threshold and n values impacts the predictive ability of these models. Thus, we compare multiple tokenizers and sets of different parameters (smoothing, unknown threshold and n values) with the aim of identifying the most appropriate combinations. Our results show that the Modified Kneser-Ney smoothing technique performs best, while the best n values depend on the choice of the tokenizer, with values of 4 or 5 offering a good trade-off between entropy and computation time. Interestingly, we find that tokenizers treating the code as simple text are the most robust ones. Finally, we demonstrate that the differences between the tokenizers are of practical importance and have the potential to change the conclusions of a given experiment.
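
The experimental setup can be pictured with a short script (a sketch under assumptions, not the authors' code): two tokenizers and several n values are compared by model entropy, with NLTK's interpolated Kneser-Ney standing in for the Modified Kneser-Ney smoothing the paper recommends. The corpus file is a placeholder:

    import re
    from nltk.lm import KneserNeyInterpolated
    from nltk.lm.preprocessing import padded_everygram_pipeline, pad_both_ends
    from nltk.util import ngrams

    def text_tokenizer(line):
        # treats code as plain text: whitespace-separated words
        return line.split()

    def code_tokenizer(line):
        # crude lexer: separates identifiers, numbers and single symbols
        return re.findall(r"[A-Za-z_]\w*|\d+|\S", line)

    lines = open("corpus.java").read().splitlines()  # placeholder corpus

    for tokenize in (text_tokenizer, code_tokenizer):
        for n in (3, 4, 5):
            sents = [tokenize(l) for l in lines if l.strip()]
            train, vocab = padded_everygram_pipeline(n, sents)
            lm = KneserNeyInterpolated(n)
            lm.fit(train, vocab)
            # Entropy on a sample of the training data; a real study would
            # use a held-out test set with explicit <UNK> handling.
            sample = [g for s in sents[:100]
                      for g in ngrams(pad_both_ends(s, n=n), n)]
            print(f"{tokenize.__name__}  n={n}  entropy={lm.entropy(sample):.2f}")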

Enabling the Continuous Analysis of Security Vulnerabilities with VulData7
Jimenez, Matthieu; Le Traon, Yves; Papadakis, Mike

in IEEE International Working Conference on Source Code Analysis and Manipulation (2018)

Analyzing Complex Data in Motion at Scale with Temporal Graphs
Hartmann, Thomas; Fouquet, François; Jimenez, Matthieu et al

in Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering (2017, July)

Modern analytics solutions succeed in understanding and predicting phenomena in a large diversity of software systems, from social networks to Internet-of-Things platforms. This success challenges analytics algorithms to deal with more and more complex data, which can be structured as graphs and evolve over time. However, the underlying data storage systems that support large-scale data analytics, such as time-series or graph databases, fail to accommodate both dimensions, which limits, for example, the integration of more advanced analyses taking into account the history of complex graphs. This paper therefore introduces a formal and practical definition of temporal graphs. Temporal graphs provide a compact representation of time-evolving graphs that can be used to analyze complex data in motion. In particular, we demonstrate with our open-source implementation, named GREYCAT, that the performance of temporal graphs allows analytics solutions to deal with rapidly evolving large-scale graphs.
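
The temporal-graph concept can be illustrated independently of GREYCAT (the snippet below is not its API): each node attribute keeps a time-ordered history, and a read at time t resolves to the latest write at or before t, so past states of the graph remain queryable.

    import bisect

    class TemporalNode:
        """A node whose attributes (edges can be handled the same way) are
        versioned in time. Minimal illustration, not the GREYCAT API."""
        def __init__(self):
            self._history = {}  # attribute -> ([timestamps], [values])

        def set(self, attr, t, value):
            times, values = self._history.setdefault(attr, ([], []))
            i = bisect.bisect_left(times, t)
            times.insert(i, t)
            values.insert(i, value)

        def get(self, attr, t):
            """Value of attr as of time t (latest write with timestamp <= t)."""
            times, values = self._history.get(attr, ([], []))
            i = bisect.bisect_right(times, t) - 1
            return values[i] if i >= 0 else None

    # Usage: a sensor node whose 'load' evolves over time.
    node = TemporalNode()
    node.set("load", t=10, value=0.3)
    node.set("load", t=20, value=0.9)
    assert node.get("load", t=15) == 0.3  # state of the graph at time 15
    assert node.get("load", t=25) == 0.9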

On the Naturalness of Mutants
Jimenez, Matthieu; Cordy, Maxime; Kintis, Marinos et al

E-print/Working paper (2017)

An Empirical Analysis of Vulnerabilities in OpenSSL and the Linux Kernel
Jimenez, Matthieu; Papadakis, Mike; Le Traon, Yves

in 2016 Asia-Pacific Software Engineering Conference (APSEC) (2016, December)

Vulnerabilities are one of the main concerns faced by practitioners when working with security-critical applications. Unfortunately, developers and security teams, even experienced ones, fail to identify many of them, with severe consequences. Vulnerabilities are hard to discover since they appear in various forms, are caused by many different issues, and their identification requires an attacker’s mindset. In this paper, we aim at increasing the understanding of vulnerabilities by investigating their characteristics on two major open-source software systems, i.e., the Linux kernel and OpenSSL. In particular, we seek to analyse and build a profile for vulnerable code, which can ultimately help researchers in building automated approaches like vulnerability prediction models. Thus, we examine the location, criticality and category of vulnerable code along with its relation to software metrics. To do so, we collect more than 2,200 vulnerable files accounting for 863 vulnerabilities and compute more than 35 software metrics. Our results indicate that while 9 Common Weakness Enumeration (CWE) types of vulnerabilities are prevalent, only 3 of them are critical in OpenSSL and 2 of them in the Linux kernel. They also indicate that different types of vulnerabilities have different characteristics, i.e., metric profiles, and that vulnerabilities of the same type have different profiles in the two projects we examined. We also found that the file structure of the projects can provide useful information related to the vulnerabilities. Overall, our results demonstrate the need for project-specific approaches that focus on specific types of vulnerabilities.
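
To give a flavour of the metric extraction involved (this is not the study's tooling, and the paper computes a much richer metric set), a few crude file-level metrics for C code might look like this; the example path is hypothetical:

    import re

    def c_file_metrics(path):
        """Very rough file-level metrics for a C source file (sketch only)."""
        src = open(path, errors="replace").read()
        lines = src.splitlines()
        return {
            "loc": len(lines),
            "blank_lines": sum(1 for l in lines if not l.strip()),
            "preprocessor": sum(1 for l in lines if l.lstrip().startswith("#")),
            # branch points as a crude proxy for cyclomatic complexity
            "branches": len(re.findall(r"\b(?:if|for|while|case)\b|&&|\|\|", src)),
        }

    # Hypothetical usage on one of the studied code bases:
    print(c_file_metrics("linux/crypto/api.c"))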

Vulnerability Prediction Models: A case study on the Linux Kernel
Jimenez, Matthieu; Papadakis, Mike; Le Traon, Yves

in 16th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2016, Raleigh, US, October 2-3, 2016 (2016, October)

To assist the vulnerability identification process, researchers have proposed prediction models that highlight (for inspection) the parts of a system most likely to be vulnerable. In this paper we aim at a reliable replication and comparison of the main vulnerability prediction models. Thus, we seek to determine their effectiveness, i.e., their ability to distinguish between vulnerable and non-vulnerable components, in the context of the Linux Kernel, under different scenarios. To achieve the above-mentioned aims, we mined vulnerabilities reported in the National Vulnerability Database and created a large dataset with all vulnerable components of Linux from 2005 to 2016. Based on this, we then built and evaluated the prediction models. We observe that an approach based on the header files included and on function calls performs best when aiming at future vulnerabilities, while text mining is the best technique when aiming at random instances. We also found that models based on code metrics perform poorly. We show that in the context of the Linux kernel, vulnerability prediction models can be superior to random selection and relatively precise. Thus, we conclude that practitioners have a valuable tool for prioritizing their security inspection efforts.
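
A minimal sketch of the text-mining flavour of such a model, assuming a labelled corpus of file contents is already available (load_dataset is a hypothetical helper; the paper's models, features and evaluation scenarios are richer):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    # files: list of source-file contents; labels: 1 = vulnerable, 0 = not.
    files, labels = load_dataset()  # hypothetical helper

    # "Text mining": represent each file as a bag of code tokens.
    vectorizer = CountVectorizer(token_pattern=r"\w+|[^\w\s]", lowercase=False)
    X = vectorizer.fit_transform(files)

    # A random split corresponds to the paper's "random instances" scenario;
    # the "future vulnerabilities" scenario would split by time instead.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=0)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    print(classification_report(y_te, clf.predict(X_te)))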

Profiling Android Vulnerabilities
Jimenez, Matthieu; Papadakis, Mike; Bissyande, Tegawendé François D Assise et al

in 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS 2016) (2016, August)

In widely used mobile operating systems a single vulnerability can threaten the security and privacy of billions of users. Therefore, identifying vulnerabilities and fortifying software systems requires constant attention and effort. However, this is costly and it is almost impossible to analyse an entire code base. Thus, it is necessary to prioritize efforts towards the most likely vulnerable areas. A first step in identifying these areas is to profile vulnerabilities based on previously reported ones. To investigate this, we performed a manual analysis of Android vulnerabilities, as reported in the National Vulnerability Database for the period 2008 to 2014. In our analysis, we identified a comprehensive list of issues leading to Android vulnerabilities. We also point out characteristics of the locations where vulnerabilities reside, the complexity of these locations and the complexity to fix the vulnerabilities. To enable future research, we make available all of our data.
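
The first tallying step of such a profiling effort can be pictured over a locally downloaded NVD JSON feed (a sketch only: the study itself predates this feed version and relied on manual analysis; field names follow the NVD 1.1 JSON schema):

    import json
    from collections import Counter

    # Tally CWE categories of Android-related CVEs in one yearly feed,
    # e.g. nvdcve-1.1-2014.json downloaded from the NVD.
    cwes = Counter()
    with open("nvdcve-1.1-2014.json") as f:
        for item in json.load(f)["CVE_Items"]:
            desc = item["cve"]["description"]["description_data"][0]["value"]
            if "android" not in desc.lower():
                continue
            for pt in item["cve"]["problemtype"]["problemtype_data"]:
                for d in pt["description"]:
                    cwes[d["value"]] += 1  # e.g. "CWE-119"

    for cwe, count in cwes.most_common(10):
        print(cwe, count)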
