Code refactoring; inconsistent method names; deep learning; code embedding
Abstract:
To ensure code readability and facilitate software maintenance, program methods must be named properly. In particular, method names must be consistent with the corresponding method implementations. Debugging method names remains an important topic in the literature, where various approaches analyze commonalities among method names in a large dataset to detect inconsistent method names and suggest better ones. We note that the state of the art does not analyze the implemented code itself to assess consistency. We thus propose a novel automated approach to debugging method names based on the analysis of consistency between method names and method code. The approach leverages deep feature representation techniques adapted to the nature of each artifact. Experimental results on over 2.1 million Java methods show that we can achieve an improvement of up to 15 percentage points over the state of the art, establishing a record performance of 67.9% F1-measure in identifying inconsistent method names. We further demonstrate that our approach yields up to 25% accuracy in suggesting full names, while the state of the art lags far behind at 1.1% accuracy. Finally, we report on our success in fixing 66 inconsistent method names in a live study on projects in the wild.
Research center:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Security Design and Validation Research Group (SerVal)
Disciplines:
Computer science
Author, co-author:
LIU, Kui ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
KIM, Kisub ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
KOYUNCU, Anil ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SnT)
Kim, Suntae
LE TRAON, Yves ; University of Luxembourg > Faculty of Science, Technology and Communication (FSTC) > Computer Science and Communications Research Unit (CSC)
External co-authors:
yes
Document language:
English
Title:
Learning to Spot and Refactor Inconsistent Method Names
Publication date:
May 2019
Event name:
41st ACM/IEEE International Conference on Software Engineering
Event date:
from 25-05-2019 to 31-05-2019
Event scope:
International
Title of the main work:
41st ACM/IEEE International Conference on Software Engineering (ICSE)
Publisher:
IEEE, Montreal, Canada
Peer reviewed:
Peer reviewed
FnR project:
FNR10449467 - Automatic Bug Fix Recommendation: Improving Software Repair And Reducing Time-to-fix Delays In Software Development Projects, 2015 (01/02/2016-31/01/2019) - Tegawendé François D'assise Bissyandé