Natural language processing techniques, in particular n-gram models, have been applied successfully to facilitate a number of software engineering tasks. However, in our related ICSME ’18 paper, we have shown that the conclusions of a study can change drastically depending on how the code is tokenized and how the n-gram model is parameterized. These choices are therefore of utmost importance and must be made carefully. To demonstrate this, and to allow the community to benefit from our work, we have developed TUNA (TUning Naturalness-based Analysis), a Java software artifact for performing naturalness-based analyses of source code. To the best of our knowledge, TUNA is the first open-source, end-to-end toolchain for carrying out source code analyses based on naturalness.
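The sketch below is a minimal, hypothetical Java illustration of why tokenization and parameterization matter. It is not TUNA's implementation or API. It scores naturalness as the average cross-entropy (bits per token) of a snippet under an n-gram model, in the sense of Hindle et al. The class and method names are assumptions chosen for brevity, and so are the regex tokenizer with its optional camelCase splitting and the add-k smoothing. They likely differ from the tokenizers and smoothing methods TUNA actually offers; its reference list points to JavaParser and the Kyoto language modeling toolkit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not TUNA's API: an n-gram model over code tokens with
 * add-k (Lidstone) smoothing, scoring a snippet by its average cross-entropy.
 */
final class NgramSketch {
    // Assumed tokenizer rule: runs of word characters, or single punctuation marks.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\s\\w]");

    private final int n;    // model order, e.g. 3 for trigrams
    private final double k; // add-k smoothing constant (a simple stand-in for real smoothing)
    private final Map<String, Integer> ngramCounts = new HashMap<>();
    private final Map<String, Integer> contextCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    NgramSketch(int n, double k) {
        if (n < 1 || k <= 0) {
            throw new IllegalArgumentException("need n >= 1 and k > 0");
        }
        this.n = n;
        this.k = k;
    }

    /** Hypothetical tokenizer; optionally splits camelCase identifiers into subtokens. */
    static List<String> tokenize(String code, boolean splitCamelCase) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            String t = m.group();
            if (splitCamelCase) {
                tokens.addAll(Arrays.asList(t.split("(?<=[a-z0-9])(?=[A-Z])")));
            } else {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Prepends n-1 start markers and appends one end marker. */
    private List<String> pad(List<String> tokens) {
        List<String> padded = new ArrayList<>();
        for (int i = 0; i < n - 1; i++) {
            padded.add("<s>");
        }
        padded.addAll(tokens);
        padded.add("</s>");
        return padded;
    }

    /** Counts every n-gram and its (n-1)-token context in the training tokens. */
    void train(List<String> tokens) {
        List<String> p = pad(tokens);
        vocabulary.addAll(p);
        for (int i = n - 1; i < p.size(); i++) {
            String context = String.join(" ", p.subList(i - n + 1, i));
            ngramCounts.merge(context + " " + p.get(i), 1, Integer::sum);
            contextCounts.merge(context, 1, Integer::sum);
        }
    }

    /** Average -log2 probability per predicted token; lower means more "natural". */
    double crossEntropy(List<String> tokens) {
        List<String> p = pad(tokens);
        int v = vocabulary.size() + 1; // +1 reserves probability mass for unseen tokens
        double bits = 0.0;
        for (int i = n - 1; i < p.size(); i++) {
            String context = String.join(" ", p.subList(i - n + 1, i));
            double num = ngramCounts.getOrDefault(context + " " + p.get(i), 0) + k;
            double den = contextCounts.getOrDefault(context, 0) + k * v;
            bits -= Math.log(num / den) / Math.log(2);
        }
        return bits / (p.size() - n + 1); // divide by the number of predicted tokens
    }

    public static void main(String[] args) {
        String train = "int count = getValue(); count = count + 1; return count;";
        String test = "int total = getValue(); return total;";
        // Score the same test snippet under different tokenizer and order choices.
        for (boolean split : new boolean[] {false, true}) {
            for (int order = 2; order <= 3; order++) {
                NgramSketch model = new NgramSketch(order, 0.5);
                model.train(tokenize(train, split));
                System.out.printf("splitCamelCase=%b n=%d: %.3f bits/token%n",
                        split, order, model.crossEntropy(tokenize(test, split)));
            }
        }
    }
}
```

Comparing the printed values shows how the score of one and the same snippet can shift with the tokenizer and the order n. A study built on such scores inherits that sensitivity.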
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Security Design and Validation Research Group (SerVal)
Disciplines :
Computer science
Author, co-author :
Jimenez, Matthieu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
Cordy, Maxime ; Facultés Universitaires Notre-Dame de la Paix - Namur - FUNDP
Le Traon, Yves ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
Papadakis, Mike ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > Computer Science and Communications Research Unit (CSC)
External co-authors :
yes
Language :
English
Title :
TUNA: TUning Naturalness-based Analysis
Publication date :
26 September 2018
Event name :
34th IEEE International Conference on Software Maintenance and Evolution (ICSME'18)
Event place :
Madrid, Spain
Event date :
26-28 September 2018
Audience :
International
Main work title :
34th IEEE International Conference on Software Maintenance and Evolution, Madrid, Spain, 26-28 September 2018
References :
A. Hindle, E. T. Barr, Z. Su, M. Gabel, and P. Devanbu, "On the naturalness of software," in Proceedings of ICSE '12. Piscataway, NJ, USA: IEEE Press, 2012, pp. 837-847.
B. Ray, V. Hellendoorn, S. Godhane, Z. Tu, A. Bacchelli, and P. Devanbu, "On the 'naturalness' of buggy code," in Proceedings of ICSE '16. New York, NY, USA: ACM, 2016, pp. 428-439.
M. Allamanis, E. T. Barr, P. Devanbu, and C. Sutton, "A survey of machine learning for big code and naturalness," CoRR, vol. abs/1709.06182, 2017.
M. Jimenez, M. Cordy, Y. Le Traon, and M. Papadakis, "On the impact of tokenizer and parameters on n-gram based code analysis," in Proceedings of ICSME '18, 2018.
JavaParser. (2017) Java parser github. [Online]. Available: https://github.com/javaparser/javaparser
G. Neubig. (2017) Kyoto language modeling toolkit. [Online]. Available: https://github.com/neubig/kylm