![]() ; ; Liu, Kui ![]() in 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, Hawaii 9-12 March 2021 (2021, March 10) The literature of Automated Program Repair is largely dominated by approaches that leverage test suites not only to expose bugs but also to validate the generated patches. Unfortunately, beyond the widely ... [more ▼] The literature of Automated Program Repair is largely dominated by approaches that leverage test suites not only to expose bugs but also to validate the generated patches. Unfortunately, beyond the widely-discussed concern that test suites are an imperfect oracle because they can be incomplete, they can include tests that are flaky. A flaky test is one that can be passed or failed by a program in a non-deterministic way. Such tests are generally carefully removed from the repair benchmarks. In practice, however, flaky tests are available test suite of software repositories. To the best of our knowledge, no study has discussed this threat to validity for evaluation of program repair. In this work, we highlight this threat and further investigate the impact of flaky tests by reverting their removal from the Defects4J benchmark. Our study aims to characterize the impact of flaky tests for localizing bugs and the eventual influence on the repair performance. Among other insights, we find that (1) although flaky tests are few (≈0.3%) of total tests, they affect experiments related to a large proportion (98.9%) of Defects4J real-world faults; (2) most flaky tests (98%) actually provide deterministic results under specific environment configurations (with the jdk version influencing the results); (3) flaky tests drastically hinder the effectiveness of spectrum-based fault localization (e.g., the rankings of 90 bugs drop down while none of the bugs obtains better location results compared with results achieved without flaky tests); and (4) the repairability of APR tools is greatly affected by the presence of flaky tests (e.g., 10 state of the art APR tools can now fix significantly fewer bugs than when the benchmark is manually curated to remove flaky tests). Given that the detection of flaky tests is still nascent, we call for the program repair community to relax the artificial assumption that the test suite is free from flaky tests. One direction that we propose is to consider developing strategies where patches that partially-fix bugs are considered worthwhile: a patch may make the program pass some test cases but fail some (which may actually be the flaky ones). [less ▲] Detailed reference viewed: 31 (0 UL)![]() Liu, Kui ![]() ![]() in 42nd ACM/IEEE International Conference on Software Engineering (ICSE) (2020, May) Test-based automated program repair has been a prolific field of research in software engineering in the last decade. Many approaches have indeed been proposed, which leverage test suites as a weak, but ... [more ▼] Test-based automated program repair has been a prolific field of research in software engineering in the last decade. Many approaches have indeed been proposed, which leverage test suites as a weak, but affordable, approximation to program specifications. Although the literature regularly sets new records on the number of benchmark bugs that can be fixed, several studies increasingly raise concerns about the limitations and biases of state-of-the-art approaches. For example, the correctness of generated patches has been questioned in a number of studies, while other researchers pointed out that evaluation schemes may be misleading with respect to the processing of fault localization results. Nevertheless, there is little work addressing the efficiency of patch generation, with regard to the practicality of program repair. In this paper, we fill this gap in the literature, by providing an extensive review on the efficiency of test suite based program repair. Our objective is to assess the number of generated patch candidates, since this information is correlated to (1) the strategy to traverse the search space efficiently in order to select sensical repair attempts, (2) the strategy to minimize the test effort for identifying a plausible patch, (3) as well as the strategy to prioritize the generation of a correct patch. To that end, we perform a large-scale empirical study on the efficiency, in terms of quantity of generated patch candidates of the 16 open-source repair tools for Java programs. The experiments are carefully conducted under the same fault localization configurations to limit biases. Eventually, among other findings, we note that: (1) many irrelevant patch candidates are generated by changing wrong code locations; (2) however, if the search space is carefully triaged, fault localization noise has little impact on patch generation efficiency; (3) yet, current template-based repair systems, which are known to be most effective in fixing a large number of bugs, are actually least efficient as they tend to generate majoritarily irrelevant patch candidates. [less ▲] Detailed reference viewed: 260 (18 UL)![]() Koyuncu, Anil ![]() ![]() ![]() in Empirical Software Engineering (2020) Patching is a common activity in software development. It is generally performed on a source code base to address bugs or add new functionalities. In this context, given the recurrence of bugs across ... [more ▼] Patching is a common activity in software development. It is generally performed on a source code base to address bugs or add new functionalities. In this context, given the recurrence of bugs across projects, the associated similar patches can be leveraged to extract generic fix actions. While the literature includes various approaches leveraging similarity among patches to guide program repair, these approaches often do not yield fix patterns that are tractable and reusable as actionable input to APR systems. In this paper, we propose a systematic and automated approach to mining relevant and actionable fix patterns based on an iterative clustering strategy applied to atomic changes within patches. The goal of FixMiner is thus to infer separate and reusable fix patterns that can be leveraged in other patch generation systems. Our technique, FixMiner, leverages Rich Edit Script which is a specialized tree structure of the edit scripts that captures the ASTlevel context of the code changes. FixMiner uses different tree representations of Rich Edit Scripts for each round of clustering to identify similar changes. These are abstract syntax trees, edit actions trees, and code context trees. We have evaluated FixMiner on thousands of software patches collected from open source projects. Preliminary results show that we are able to mine accurate patterns, efficiently exploiting change information in Rich Edit Scripts. We further integrated the mined patterns to an automated program repair prototype, PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J benchmark. Beyond this quantitative performance, we show that the mined fix patterns are sufficiently relevant to produce patches with a high probability of correctness: 81% of PARFixMiner’s generated plausible patches are correct. [less ▲] Detailed reference viewed: 139 (7 UL)![]() Tian, Haoye ![]() ![]() ![]() in Tian, Haoye (Ed.) 35th IEEE/ACM International Conference on Automated Software Engineering, September 21-25, 2020, Melbourne, Australia (2020) A large body of the literature of automated program repair develops approaches where patches are generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect ... [more ▼] A large body of the literature of automated program repair develops approaches where patches are generated to be validated against an oracle (e.g., a test suite). Because such an oracle can be imperfect, the generated patches, although validated by the oracle, may actually be incorrect. While the state of the art explore research directions that require dynamic information or rely on manually-crafted heuristics, we study the benefit of learning code representations to learn deep features that may encode the properties of patch correctness. Our work mainly investigates different representation learning approaches for code changes to derive embeddings that are amenable to similarity computations. We report on findings based on embeddings produced by pre-trained and re-trained neural networks. Experimental results demonstrate the potential of embeddings to empower learning algorithms in reasoning about patch correctness: a machine learning predictor with BERT transformer-based embeddings... [less ▲] Detailed reference viewed: 114 (34 UL)![]() Liu, Kui ![]() Doctoral thesis (2019) Error-free software is a myth. Debugging thus accounts for a significant portion of software maintenance and absorbs a large part of software cost. In particular, the manual task of fixing bugs is tedious ... [more ▼] Error-free software is a myth. Debugging thus accounts for a significant portion of software maintenance and absorbs a large part of software cost. In particular, the manual task of fixing bugs is tedious, error- prone and time-consuming. In the last decade, automatic bug-fixing, also referred to as automated program repair (APR) has boomed as a promising endeavor of software engineering towards alleviating developers’ burden. Several potentially promising techniques have been proposed making APR an increasingly prominent topic in both the research and practice communities. In production, APR will drastically reduce time-to-fix delays and limit downtime. In a development cycle, APR can help suggest changes to accelerate debugging. As an emergent domain, however, program repair has many open problems that the community is still exploring. Our work contributes to this momentum on two angles: the repair of programs for functionality bugs, and the repair of programs for method naming issues. The thesis starts with highlighting findings on key empirical studies that we have performed to inform future repair approaches. Then, we focus on template-based program repair scenarios and explore deep learning models for inferring accurate and relevant patterns. Finally, we integrate these patterns into APR pipelines, which yield the state of the art repair tools. The dissertation includes the following contributions: • Real-world Patch Study: Existing APR studies have shown that the state-of-the-art techniques in automated repair tend to generate patches only for a small number of bugs even with quality issues (e.g., incorrect behavior and nonsensical changes). To improve APR techniques, the community should deepen its knowledge on repair actions from real-world patches since most of the techniques rely on patches written by human developers. However, previous investigations on real-world patches are limited to statement level that is not sufficiently fine-grained to build this knowledge. This dissertation starts with deepening this knowledge via a systematic and fine-grained study of real-world Java program bug fixes. • Fault Localization Impact: Existing test-suite-based APR systems are highly dependent on the performance of the fault localization (FL) technique that is the process of the widely studied APR pipeline. However, APR systems generally focus on the patch generation, but tend to use similar but different strategies for fault localization. To assess the impact of FL on APR, we identify and investigate a practical bias caused by the FL step in a repair pipeline. We propose to highlight the different FL configurations used in the literature, and their impact on APR systems when applied to the real bugs. Then, we explore the performance variations that can be achieved by “tweaking” the FL step. • Fix Pattern Mining: Fix patterns (a.k.a. fix templates) have been studied in various APR scenarios. Particularly, fix patterns have been widely used in different APR systems. To date, fix pattern mining is mainly studied in three ways: manually summarization, transformation inferring and code change action statistics. In this dissertation, we explore mining fix patterns for static bugs leveraging deep learning and clustering algorithms. • Avatar: Fix pattern based patch generation is a promising direction in the APR community. Notably, it has been demonstrated to produce more acceptable and correct patches than the patches obtained with mutation operators through genetic programming. The performance of fix pattern based APR systems, however, depends on the fix ingredients mined from commit changes in development histories. Unfortunately, collecting a reliable set of bug fixes in repositories can be challenging. We propose to investigate the possibility in an APR scenario of leveraging code changes that address violations by static bug detection tools. To that end, we build the Avatar APR system, which exploits fix patterns of static analysis violations as ingredients for patch generation. • TBar: Fix patterns are widely used in patch generation of APR, however, the repair performance of a single fix pattern is not studied. We revisit the performance of template-based APR to build comprehensive knowledge about the effectiveness of fix patterns, and to highlight the importance of complementary steps such as fault localization or donor code retrieval. To that end, we first investigate the literature to collect, summarize and label recurrently-used fix patterns. Based on the investigation, we build TBar, a straightforward APR tool that systematically attempts to apply these fix patterns to program bugs. We thoroughly evaluate TBar on the Defects4J benchmark. In particular, we assess the actual qualitative and quantitative diversity of fix patterns, as well as their effectiveness in yielding plausible or correct patches. • Debugging Method Names: Except the issues about semantic/static bugs in programs, we note that how to debug inconsistent method names automatically is important to improve program quality. In this dissertation, we propose a deep learning based approach to spotting and refactoring inconsistent method names in programs. [less ▲] Detailed reference viewed: 377 (20 UL)![]() Koyuncu, Anil ![]() ![]() ![]() in ESEC/FSE 2019 Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (2019, August) Issue tracking systems are commonly used in modern software development for collecting feedback from users and developers. An ultimate automation target of software maintenance is then the systematization ... [more ▼] Issue tracking systems are commonly used in modern software development for collecting feedback from users and developers. An ultimate automation target of software maintenance is then the systematization of patch generation for user-reported bugs. Although this ambition is aligned with the momentum of automated program repair, the literature has, so far, mostly focused on generate-and- validate setups where fault localization and patch generation are driven by a well-defined test suite. On the one hand, however, the common (yet strong) assumption on the existence of relevant test cases does not hold in practice for most development settings: many bugs are reported without the available test suite being able to reveal them. On the other hand, for many projects, the number of bug reports generally outstrips the resources available to triage them. Towards increasing the adoption of patch generation tools by practitioners, we investigate a new repair pipeline, iFixR, driven by bug reports: (1) bug reports are fed to an IR-based fault localizer; (2) patches are generated from fix patterns and validated via regression testing; (3) a prioritized list of generated patches is proposed to developers. We evaluate iFixR on the Defects4J dataset, which we enriched (i.e., faults are linked to bug reports) and carefully-reorganized (i.e., the timeline of test-cases is naturally split). iFixR generates genuine/plausible patches for 21/44 Defects4J faults with its IR-based fault localizer. iFixR accurately places a genuine/plausible patch among its top-5 recommendation for 8/13 of these faults (without using future test cases in generation-and-validation). [less ▲] Detailed reference viewed: 185 (19 UL)![]() Liu, Kui ![]() ![]() in 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) (2019, July) We revisit the performance of template-based APR to build com-prehensive knowledge about the effectiveness of fix patterns, andto highlight the importance of complementary steps such as faultlocalization ... [more ▼] We revisit the performance of template-based APR to build com-prehensive knowledge about the effectiveness of fix patterns, andto highlight the importance of complementary steps such as faultlocalization or donor code retrieval. To that end, we first investi-gate the literature to collect, summarize and label recurrently-usedfix patterns. Based on the investigation, we buildTBar, a straight-forward APR tool that systematically attempts to apply these fixpatterns to program bugs. We thoroughly evaluateTBaron the De-fects4J benchmark. In particular, we assess the actual qualitative andquantitative diversity of fix patterns, as well as their effectivenessin yielding plausible or correct patches. Eventually, we find that,assuming a perfect fault localization,TBarcorrectly/plausibly fixes74/101 bugs. Replicating a standard and practical pipeline of APRassessment, we demonstrate thatTBarcorrectly fixes 43 bugs fromDefects4J, an unprecedented performance in the literature (includ-ing all approaches, i.e., template-based, stochastic mutation-basedor synthesis-based APR). [less ▲] Detailed reference viewed: 159 (11 UL)![]() Liu, Kui ![]() ![]() in 41st ACM/IEEE International Conference on Software Engineering (ICSE) (2019, May) To ensure code readability and facilitate software maintenance, program methods must be named properly. In particular, method names must be consistent with the corresponding method implementations ... [more ▼] To ensure code readability and facilitate software maintenance, program methods must be named properly. In particular, method names must be consistent with the corresponding method implementations. Debugging method names remains an important topic in the literature, where various approaches analyze commonalities among method names in a large dataset to detect inconsistent method names and suggest better ones. We note that the state-of-the-art does not analyze the implemented code itself to assess consistency. We thus propose a novel automated approach to debugging method names based on the analysis of consistency between method names and method code. The approach leverages deep feature representation techniques adapted to the nature of each artifact. Experimental results on over 2.1 million Java methods show that we can achieve up to 15 percentage points improvement over the state-of-the-art, establishing a record performance of 67.9% F1-measure in identifying inconsistent method names. We further demonstrate that our approach yields up to 25% accuracy in suggesting full names, while the state-of-the-art lags far behind at 1.1% accuracy. Finally, we report on our success in fixing 66 inconsistent method names in a live study on projects in the wild. [less ▲] Detailed reference viewed: 361 (21 UL)![]() Liu, Kui ![]() ![]() ![]() in The 12th IEEE International Conference on Software Testing, Verification and Validation (ICST-2019) (2019, April 24) Properly benchmarking Automated Program Repair (APR) systems should contribute to the development and adoption of the research outputs by practitioners. To that end, the research community must ensure ... [more ▼] Properly benchmarking Automated Program Repair (APR) systems should contribute to the development and adoption of the research outputs by practitioners. To that end, the research community must ensure that it reaches significant milestones by reliably comparing state-of-the-art tools for a better understanding of their strengths and weaknesses. In this work, we identify and investigate a practical bias caused by the fault localization (FL) step in a repair pipeline. We propose to highlight the different fault localization configurations used in the literature, and their impact on APR systems when applied to the Defects4J benchmark. Then, we explore the performance variations that can be achieved by "tweaking'' the FL step. Eventually, we expect to create a new momentum for (1) full disclosure of APR experimental procedures with respect to FL, (2) realistic expectations of repairing bugs in Defects4J, as well as (3) reliable performance comparison among the state-of-the-art APR systems, and against the baseline performance results of our thoroughly assessed kPAR repair tool. Our main findings include: (a) only a subset of Defects4J bugs can be currently localized by commonly-used FL techniques; (b) current practice of comparing state-of-the-art APR systems (i.e., counting the number of fixed bugs) is potentially misleading due to the bias of FL configurations; and (c) APR authors do not properly qualify their performance achievement with respect to the different tuning parameters implemented in APR systems. [less ▲] Detailed reference viewed: 251 (18 UL)![]() Liu, Kui ![]() ![]() in The 26th IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER-2019) (2019, February 24) Fix pattern-based patch generation is a promising direction in Automated Program Repair (APR). Notably, it has been demonstrated to produce more acceptable and correct patches than the patches obtained ... [more ▼] Fix pattern-based patch generation is a promising direction in Automated Program Repair (APR). Notably, it has been demonstrated to produce more acceptable and correct patches than the patches obtained with mutation operators through genetic programming. The performance of pattern-based APR systems, however, depends on the fix ingredients mined from fix changes in development histories. Unfortunately, collecting a reliable set of bug fixes in repositories can be challenging. In this paper, we propose to investigate the possibility in an APR scenario of leveraging code changes that address violations by static bug detection tools. To that end, we build the AVATAR APR system, which exploits fix patterns of static analysis violations as ingredients for patch generation. Evaluated on the Defects4J benchmark, we show that, assuming a perfect localization of faults, AVATAR can generate correct patches to fix 34/39 bugs. We further find that AVATAR yields performance metrics that are comparable to that of the closely-related approaches in the literature. While AVATAR outperforms many of the state-of-the-art pattern-based APR systems, it is mostly complementary to current approaches. Overall, our study highlights the relevance of static bug finding tools as indirect contributors of fix ingredients for addressing code defects identified with functional test cases. [less ▲] Detailed reference viewed: 226 (18 UL)![]() Liu, Kui ![]() ![]() ![]() in 25th Asia-Pacific Software Engineering Conference (APSEC) (2018, December 07) Automated program repair (APR) has extensively been developed by leveraging search-based techniques, in which fix ingredients are explored and identified in different granularities from a specific search ... [more ▼] Automated program repair (APR) has extensively been developed by leveraging search-based techniques, in which fix ingredients are explored and identified in different granularities from a specific search space. State-of-the approaches often find fix ingredients by using mutation operators or leveraging manually-crafted templates. We argue that the fix ingredients can be searched in an online mode, leveraging code search techniques to find potentially-fixed versions of buggy code fragments from which repair actions can be extracted. In this study, we present an APR tool, LSRepair, that automatically explores code repositories to search for fix ingredients at the method-level granularity with three strategies of similar code search. Our preliminary evaluation shows that code search can drive a faster fix process (some bugs are fixed in a few seconds). LSRepair helps repair 19 bugs from the Defects4J benchmark successfully. We expect our approach to open new directions for fixing multiple-lines bugs. [less ▲] Detailed reference viewed: 305 (27 UL)![]() Kong, Pingfan ![]() ![]() in IEEE Transactions on Reliability (2018) Automated testing of Android apps is essential for app users, app developers and market maintainer communities alike. Given the widespread adoption of Android and the specificities of its development ... [more ▼] Automated testing of Android apps is essential for app users, app developers and market maintainer communities alike. Given the widespread adoption of Android and the specificities of its development model, the literature has proposed various testing approaches for ensuring that not only functional requirements but also non-functional requirements are satisfied. In this paper, we aim at providing a clear overview of the state-of-the-art works around the topic of Android app testing, in an attempt to highlight the main trends, pinpoint the main methodologies applied and enumerate the challenges faced by the Android testing approaches as well as the directions where the community effort is still needed. To this end, we conduct a Systematic Literature Review (SLR) during which we eventually identified 103 relevant research papers published in leading conferences and journals until 2016. Our thorough examination of the relevant literature has led to several findings and highlighted the challenges that Android testing researchers should strive to address in the future. After that, we further propose a few concrete research directions where testing approaches are needed to solve recurrent issues in app updates, continuous increases of app sizes, as well as the Android ecosystem fragmentation. [less ▲] Detailed reference viewed: 243 (33 UL)![]() Liu, Kui ![]() ![]() ![]() in 34th IEEE International Conference on Software Maintenance and Evolution (ICSME) (2018, September) Bug fixing is a time-consuming and tedious task. To reduce the manual efforts in bug fixing, researchers have presented automated approaches to software repair. Unfortunately, recent studies have shown ... [more ▼] Bug fixing is a time-consuming and tedious task. To reduce the manual efforts in bug fixing, researchers have presented automated approaches to software repair. Unfortunately, recent studies have shown that the state-of-the-art techniques in automated repair tend to generate patches only for a small number of bugs even with quality issues (e.g., incorrect behavior and nonsensical changes). To improve automated program repair (APR) techniques, the community should deepen its knowledge on repair actions from real-world patches since most of the techniques rely on patches written by human developers. Previous investigations on real-world patches are limited to statement level that is not sufficiently fine-grained to build this knowledge. In this work, we contribute to building this knowledge via a systematic and fine-grained study of 16,450 bug fix commits from seven Java open-source projects. We find that there are opportunities for APR techniques to improve their effectiveness by looking at code elements that have not yet been investigated. We also discuss nine insights into tuning automated repair tools. For example, a small number of statement and expression types are recurrently impacted by real-world patches, and expression-level granularity could reduce search space of finding fix ingredients, where previous studies never explored. [less ▲] Detailed reference viewed: 215 (24 UL)![]() Liu, Kui ![]() ![]() in IEEE Transactions on Software Engineering (2018) Several static analysis tools, such as Splint or FindBugs, have been proposed to the software development community to help detect security vulnerabilities or bad programming practices. However, the ... [more ▼] Several static analysis tools, such as Splint or FindBugs, have been proposed to the software development community to help detect security vulnerabilities or bad programming practices. However, the adoption of these tools is hindered by their high false positive rates. If the false positive rate is too high, developers may get acclimated to violation reports from these tools, causing concrete and severe bugs being overlooked. Fortunately, some violations are actually addressed and resolved by developers. We claim that those violations that are recurrently fixed are likely to be true positives, and an automated approach can learn to repair similar unseen violations. However, there is lack of a systematic way to investigate the distributions on existing violations and fixed ones in the wild, that can provide insights into prioritizing violations for developers, and an effective way to mine code and fix patterns which can help developers easily understand the reasons of leading violations and how to fix them. In this paper, we first collect and track a large number of fixed and unfixed violations across revisions of software. The empirical analyses reveal that there are discrepancies in the distributions of violations that are detected and those that are fixed, in terms of occurrences, spread and categories, which can provide insights into prioritizing violations. To automatically identify patterns in violations and their fixes, we propose an approach that utilizes convolutional neural networks to learn features and clustering to regroup similar instances. We then evaluate the usefulness of the identified fix patterns by applying them to unfixed violations. The results show that developers will accept and merge a majority (69/116) of fixes generated from the inferred fix patterns. It is also noteworthy that the yielded patterns are applicable to four real bugs in the Defects4J major benchmark for software testing and automated repair. [less ▲] Detailed reference viewed: 173 (7 UL) |
||