Results 21-23 of 23.
Bookmark and Share    
Full Text
Peer Reviewed
See detailOn the impact of tokenizer and parameters on N-gram based Code Analysis
Jimenez, Matthieu UL; Cordy, Maxime UL; Le Traon, Yves UL et al

Scientific Conference (2018, September)

Recent research shows that language models, such as n-gram models, are useful at a wide variety of software engineering tasks, e.g., code completion, bug identification, code summarisation, etc. However ... [more ▼]

Recent research shows that language models, such as n-gram models, are useful at a wide variety of software engineering tasks, e.g., code completion, bug identification, code summarisation, etc. However, such models require the appropriate set of numerous parameters. Moreover, the different ways one can read code essentially yield different models (based on the different sequences of tokens). In this paper, we focus on n- gram models and evaluate how the use of tokenizers, smoothing, unknown threshold and n values impact the predicting ability of these models. Thus, we compare the use of multiple tokenizers and sets of different parameters (smoothing, unknown threshold and n values) with the aim of identifying the most appropriate combinations. Our results show that the Modified Kneser-Ney smoothing technique performs best, while n values are depended on the choice of the tokenizer, with values 4 or 5 offering a good trade-off between entropy and computation time. Interestingly, we find that tokenizers treating the code as simple text are the most robust ones. Finally, we demonstrate that the differences between the tokenizers are of practical importance and have the potential of changing the conclusions of a given experiment. [less ▲]

Detailed reference viewed: 184 (12 UL)
Full Text
Peer Reviewed
See detailEnabling lock-free concurrent workers over temporal graphs composed of multiple time-series
Fouquet, Francois; Hartmann, Thomas UL; Mosser, Sébastien et al

in 33rd Annual ACM Symposium on Applied Computing (SAC'18) (2018, April)

Time series are commonly used to store temporal data, e.g., sensor measurements. However, when it comes to complex analytics and learning tasks, these measurements have to be combined with structural ... [more ▼]

Time series are commonly used to store temporal data, e.g., sensor measurements. However, when it comes to complex analytics and learning tasks, these measurements have to be combined with structural context data. Temporal graphs, connecting multiple time- series, have proven to be very suitable to organize such data and ultimately empower analytic algorithms. Computationally intensive tasks often need to be distributed and parallelized among different workers. For tasks that cannot be split into independent parts, several workers have to concurrently read and update these shared temporal graphs. This leads to inconsistency risks, especially in the case of frequent updates. Distributed locks can mitigate these risks but come with a very high-performance cost. In this paper, we present a lock-free approach allowing to concurrently modify temporal graphs. Our approach is based on a composition operator able to do online reconciliation of concurrent modifications of temporal graphs. We evaluate the efficiency and scalability of our approach compared to lock-based approaches. [less ▲]

Detailed reference viewed: 109 (4 UL)
Full Text
See detailOn the Naturalness of Mutants
Jimenez, Matthieu UL; Cordy, Maxime UL; Kintis, Marinos UL et al

E-print/Working paper (2017)

Detailed reference viewed: 119 (13 UL)