References of "Ma, Wei 018127791B"
Next Generation Mutation Testing: Continuous, Predictive, and ML-enabled
Ma, Wei UL

Doctoral thesis (2022)


Software has become an essential part of human life: it substantially improves production and enriches our lives. However, flaws in software can lead to tragedies, e.g., the failure of the Mariner 1 spacecraft in 1962. Modern software systems differ considerably from earlier ones, and the issue becomes even more severe as system complexity grows and Artificial Intelligence (AI) models are integrated into software (e.g., the Tesla Deaths Report). Testing such modern software systems is challenging: driven by new requirements, software systems evolve and change frequently, and AI models suffer from non-determinism. The non-determinism of AI models is related to many factors, e.g., optimization algorithms, numerical problems, the labelling threshold, data of the same object collected under different conditions, or changes in the backend libraries. Many new testing techniques have emerged to guarantee the trustworthiness of modern software systems. Coverage-based testing is an early technique for testing Deep Learning (DL) systems by statistically analyzing neuron values, e.g., Neuron Coverage (NC). In recent years, Mutation Testing has drawn much attention. Coverage-based testing metrics can be misleading and are easily fooled by tests generated merely to execute code lines and satisfy coverage requirements: a test suite with one hundred percent coverage may detect no flaw in the software. In contrast, Mutation Testing is a robust approach to approximating the quality of a test suite. It assesses and improves a test suite by checking whether the suite detects artificial defects introduced through many small code perturbations (i.e., mutants). Because each perturbation is usually tiny, the behaviour of a mutant is likely to lie on the border between correctness and incorrectness.
Through mutation testing, the border behaviour of the subject under test can be explored thoroughly, which leads to high-quality software. The technique has been generalized to test software systems that integrate DL models, e.g., image classification systems and autonomous driving systems. However, applying Mutation Testing faces obstacles; the main challenge is that it is resource-intensive, which makes it ill-suited to modern software development, where the code evolves daily. This dissertation studies how to apply Mutation Testing to modern software systems, exploring and exploiting the interplay between Mutation Testing and AI algorithms: AI algorithms can make Mutation Testing practical for modern software systems, and, conversely, Mutation Testing can effectively test modern software that integrates DL models. First, this dissertation adapts Mutation Testing to modern software development practice, namely Continuous Integration (CI). Most software development teams employ CI as a pipeline in which changes happen frequently. Adopting Mutation Testing in CI is problematic because of its cost; at the same time, traditional Mutation Testing is not a good test metric for code changes, as it is designed for the whole program. We adapt Mutation Testing to test program changes by proposing commit-relevant mutants. This type of mutant affects the changed program behaviours and represents commit-relevant test requirements. We validate our proposal on C and Java benchmarks; the experimental results indicate that commit-relevant mutants effectively enhance code-change testing. Second, building on that work, we introduce MuDelta, a machine-learning approach that identifies commit-relevant mutants, i.e., mutants that interact with the code change.
MuDelta uses manually designed features that require expert knowledge, combining static code characteristics into its data features. Our evaluation results indicate that commit-based mutation testing is suitable and promising for evolving software systems. Third, this dissertation proposes GraphCode2Vec, a new approach to learning general-purpose code representations. Code embedding is a keystone in applying machine learning to several Software Engineering (SE) tasks; its goal is to extract universal features automatically. Recent works use natural language models to embed code into vector representations. GraphCode2Vec instead considers program syntax and semantics simultaneously by combining code analysis with Graph Neural Networks (GNNs). We evaluate our approach on the mutation testing task and three other tasks (method name prediction, solution classification, and overfitted patch classification); GraphCode2Vec is better than or comparable to state-of-the-art code embedding models. We also perform an ablation study and a probing analysis to gain insights into GraphCode2Vec. Finally, this dissertation studies Mutation Testing for selecting test data for deep learning systems. Since DL systems play an essential role in many fields, their safety takes centre stage. Such DL systems differ considerably from traditional software systems, and existing testing techniques are insufficient to guarantee their reliability. DL systems usually require extensive data for learning, so selecting data for training and testing them matters: a good dataset helps DL models achieve good performance. Several metrics exist to guide the selection of data for testing DL systems. We compare a set of test selection metrics for DL systems; our results show that uncertainty-based metrics are competent at identifying misclassified data.
These metrics also improve classification accuracy faster when retraining DL systems. In summary, this dissertation shows how Mutation Testing can be used in the artificial intelligence era. The first, second, and third contributions concern Mutation Testing helping modern software testing in CI; the fourth contribution is a study on selecting training and testing data for DL systems. Mutation Testing is an excellent technique for testing modern software systems, and, at the same time, AI algorithms can alleviate its main practical challenge by reducing resource cost.
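To make the core idea concrete, here is a minimal, hypothetical sketch of mutation testing (it is not the thesis tooling): each mutant is a small perturbation of an original function, and the quality of a test suite is approximated by the fraction of mutants it "kills", i.e., for which some test observes a behavioural difference. The function `is_adult`, the mutants, and the test suite are all invented for illustration.

```python
def is_adult(age):          # original subject under test
    return age >= 18

# Hand-crafted mutants, each perturbing one operator or constant.
mutants = [
    lambda age: age > 18,   # >= mutated to >  (boundary mutant)
    lambda age: age >= 17,  # constant 18 mutated to 17
    lambda age: age <= 18,  # >= mutated to <=
]

# A test suite as (input, expected output) pairs.
test_suite = [(18, True), (25, True), (10, False)]

def mutation_score(original, mutants, tests):
    """Fraction of mutants killed: a mutant is killed when at least one
    test input makes it behave differently from the original."""
    killed = 0
    for mutant in mutants:
        if any(mutant(x) != original(x) for x, _ in tests):
            killed += 1
    return killed / len(mutants)

print(mutation_score(is_adult, mutants, test_suite))  # → 0.666... ; the
# constant mutant (17) survives because no test exercises ages near 17
```

The surviving mutant is exactly the kind of signal a coverage metric misses: every test line is executed, yet the suite is blind to a defect at the boundary.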

Peer Reviewed
Towards Exploring the Limitations of Active Learning: An Empirical Study
Hu, Qiang UL; Guo, Yuejun UL; Cordy, Maxime UL et al

in The 36th IEEE/ACM International Conference on Automated Software Engineering (2021)


Deep neural networks (DNNs) are being increasingly deployed as integral parts of software systems. However, due to the complex interconnections among hidden layers and the massive number of hyperparameters, DNNs must be trained on a large number of labeled inputs, which calls for extensive human effort to collect and label data. To alleviate this growing demand, a surge of recent studies has proposed different metrics for selecting a small yet informative dataset for model training. These works have demonstrated that DNN models can achieve competitive performance using a carefully selected small set of data. However, the literature lacks a proper investigation of the limitations of data selection metrics, which is crucial for applying them in practice. In this paper, we fill this gap and conduct an extensive empirical study to explore the limits of selection metrics. Our study involves 15 selection metrics evaluated over 5 datasets (2 image classification tasks and 3 text classification tasks), 10 DNN architectures, and 20 labeling budgets (ratios of training data being labeled). Our findings reveal that, while selection metrics are usually effective in producing accurate models, they may induce a loss of model robustness (against adversarial examples) and of resilience to compression. Overall, we demonstrate the existence of a trade-off between labeling effort and different model qualities. This paves the way for future research on devising selection metrics that consider multiple quality criteria.
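One widely used family of selection metrics ranks unlabeled inputs by the model's predictive uncertainty, e.g., the entropy of the softmax output. The following is a minimal sketch of such an entropy-based budgeted selection, with invented softmax outputs; it illustrates the idea only and is not the study's implementation of any of the 15 metrics.

```python
import math

def entropy(probs):
    """Predictive entropy of one softmax output; higher = more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_most_uncertain(softmax_outputs, budget):
    """Return indices of the `budget` inputs the model is least sure about."""
    ranked = sorted(range(len(softmax_outputs)),
                    key=lambda i: entropy(softmax_outputs[i]),
                    reverse=True)
    return ranked[:budget]

# Hypothetical softmax outputs of a 3-class model on four unlabeled inputs.
outputs = [
    [0.98, 0.01, 0.01],   # confident prediction
    [0.40, 0.35, 0.25],   # uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],   # near-uniform: most uncertain
]

print(select_most_uncertain(outputs, budget=2))  # → [3, 1]
```

Only the selected inputs would then be sent for human labeling, which is how such metrics trade labeling effort against model quality.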
