Abstract :
[en] Inconsistent identifiers make it difficult for developers to understand source code.
In particular, large software systems written by several developers can be vulnerable to identifier
inconsistency. Unfortunately, it is not easy to detect inconsistent identifiers that are
already used in source code. Although several techniques have been proposed to address this
issue, many of them raise false alarms because they do not accept
domain words and idiom identifiers that are widely used in programming practice. This paper
proposes an approach to detecting inconsistent identifiers based on a custom code dictionary.
It first automatically builds a Code Dictionary from the existing API documents of popular
Java projects by using a Natural Language Processing (NLP) parser. This dictionary records
domain words with dominant part-of-speech (POS) and idiom identifiers. This set of domain
words and idioms can improve the accuracy when detecting inconsistencies by reducing false
alarms. The approach then takes a target program and detects inconsistent identifiers of the
program by leveraging the Code Dictionary. We provide CodeAmigo, a GUI-based tool that supports
our approach. We evaluated the approach on seven Java-based open-source and proprietary
projects. The results show that the approach detects inconsistent
identifiers with 85.4% precision and 83.59% recall. In addition, we conducted an
interview with developers who used our approach, and the interview confirmed that inconsistent
identifiers frequently and inevitably occur in most software projects. The interviewees
then stated that our approach can help to better detect inconsistent identifiers that would have
been missed through manual detection.
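
The two-phase idea described above (build a dictionary of domain words with their dominant POS, then check a target program's identifiers against it) can be illustrated with a toy sketch. This is not the paper's implementation or CodeAmigo's API: the function names, the hand-written POS tags, the idiom set, and the conflict rule are all assumptions made for illustration, and a real pipeline would obtain the tags from an NLP parser run over API documents.

```python
import re
from collections import Counter, defaultdict


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def build_dictionary(tagged_words):
    """Map each word to its most frequent (dominant) POS tag.

    tagged_words is an iterable of (word, pos) pairs; here they are
    hand-written stand-ins for the output of an NLP parser.
    """
    counts = defaultdict(Counter)
    for word, pos in tagged_words:
        counts[word.lower()][pos] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def find_pos_conflicts(identifier, observed_pos, dictionary, idioms):
    """Return (word, observed, dominant) for words used against their dominant POS.

    observed_pos holds one hypothetical tag per word of the identifier.
    Idiom identifiers are accepted as-is, which avoids false alarms.
    """
    if identifier in idioms:
        return []
    conflicts = []
    for word, pos in zip(split_identifier(identifier), observed_pos):
        dominant = dictionary.get(word)
        if dominant is not None and dominant != pos:
            conflicts.append((word, pos, dominant))
    return conflicts


# Hypothetical tagged occurrences, as if extracted from API documents.
tagged = [
    ("abort", "verb"), ("abort", "verb"), ("abort", "noun"),
    ("request", "noun"), ("request", "noun"), ("request", "verb"),
]
dictionary = build_dictionary(tagged)
idioms = {"i", "idx", "tmp"}  # assumed idiom identifiers

consistent = find_pos_conflicts("abortRequest", ["verb", "noun"], dictionary, idioms)
flagged = find_pos_conflicts("requestAbort", ["noun", "noun"], dictionary, idioms)
```

In this sketch `abortRequest` passes, while `requestAbort` is flagged because `abort` appears as a noun although its dominant POS in the toy dictionary is a verb.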