Natural language processing techniques, in particular n-gram models, have been applied successfully to facilitate a number of software engineering tasks. However, in our related ICSME ’18 paper, we showed that the conclusions of a study can change drastically depending on how the code is tokenized and how the n-gram model is parameterized. These choices are therefore of utmost importance and must be made carefully. To demonstrate this and to allow the community to benefit from our work, we have developed TUNA (TUning Naturalness-based Analysis), a Java software artifact for performing naturalness-based analyses of source code. To the best of our knowledge, TUNA is the first open-source, end-to-end toolchain for carrying out source code analyses based on naturalness.
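To illustrate why tokenization and model parameters matter, here is a minimal Python sketch. It is not TUNA's code or API: the two tokenizers, the keyword list, the toy corpus, and the use of add-one (Laplace) smoothing are all assumptions chosen to keep the example short. It scores the same line of code with n-gram models built from two different token streams and three values of n.

```python
import math
import re
from collections import Counter

# Small, deliberately incomplete keyword list (assumption for this sketch).
KEYWORDS = {"int", "return", "if", "else", "for", "while", "class", "void"}


def tokenize_words(code):
    """Split code into identifiers/keywords, integer literals, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def tokenize_abstract(code):
    """Same split, but every non-keyword identifier becomes the placeholder <ID>."""
    return [
        "<ID>" if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS else tok
        for tok in tokenize_words(code)
    ]


def train_ngram(tokens, n):
    """Count n-grams and their (n-1)-token contexts over a token list."""
    padded = ["<s>"] * (n - 1) + tokens
    ngrams = Counter(tuple(padded[i:i + n]) for i in range(len(tokens)))
    contexts = Counter(tuple(padded[i:i + n - 1]) for i in range(len(tokens)))
    vocab = set(tokens) | {"<unk>"}
    return ngrams, contexts, vocab


def cross_entropy(model, tokens, n):
    """Average negative log2 probability per token, with add-one smoothing."""
    ngrams, contexts, vocab = model
    padded = ["<s>"] * (n - 1) + tokens
    total = 0.0
    for i in range(len(tokens)):
        gram = tuple(padded[i:i + n])
        prob = (ngrams[gram] + 1) / (contexts[gram[:-1]] + len(vocab))
        total -= math.log2(prob)
    return total / len(tokens)


# Toy training corpus and test line (hypothetical data).
corpus = "int a = 0 ; int b = 1 ; int c = a + b ;"
line = "int total = a + b ;"

for name, tokenize in [("words", tokenize_words), ("abstract", tokenize_abstract)]:
    for n in (1, 2, 3):
        model = train_ngram(tokenize(corpus), n)
        score = cross_entropy(model, tokenize(line), n)
        print(name, n, round(score, 3))
```

In this toy setting with n = 2, the identifier-abstracting tokenizer assigns the test line a lower cross-entropy than the word-level tokenizer, because the unseen identifier `total` is no longer a surprise to the model. The same line can therefore look more or less "natural" depending only on preprocessing and on n.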
Research centre:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Security Design and Validation Research Group (SerVal)
Disciplines:
Computer science
Author, co-author:
JIMENEZ, Matthieu ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
CORDY, Maxime ; Facultés Universitaires Notre-Dame de la Paix - Namur - FUNDP
LE TRAON, Yves ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
PAPADAKIS, Mike ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Computer Science and Communications Research Unit (CSC)
External co-authors:
yes
Document language:
English
Title:
TUNA: TUning Naturalness-based Analysis
Publication date:
26 September 2018
Event name:
34th IEEE International Conference on Software Maintenance and Evolution (ICSME'18)
Event location:
Madrid, Spain
Event date:
26-28 September 2018
Event scope:
International
Main work title:
34th IEEE International Conference on Software Maintenance and Evolution, Madrid, Spain, 26-28 September 2018
A. Hindle, E. T. Barr, Z. Su, M. Gabel, and P. Devanbu, "On the naturalness of software," in Proceedings of ICSE '12. Piscataway, NJ, USA: IEEE Press, 2012, pp. 837-847.
B. Ray, V. Hellendoorn, S. Godhane, Z. Tu, A. Bacchelli, and P. Devanbu, "On the 'naturalness' of buggy code," in Proceedings of ICSE '16. New York, NY, USA: ACM, 2016, pp. 428-439.
M. Allamanis, E. T. Barr, P. Devanbu, and C. Sutton, "A survey of machine learning for big code and naturalness," CoRR, vol. abs/1709.06182, 2017.
M. Jimenez, M. Cordy, Y. Le Traon, and M. Papadakis, "On the impact of tokenizer and parameters on n-gram based code analysis," in Proceedings of ICSME '18, 2018.
JavaParser. (2017) JavaParser GitHub repository. [Online]. Available: https://github.com/javaparser/javaparser
G. Neubig. (2017) Kyoto language modeling toolkit. [Online]. Available: https://github.com/neubig/kylm