Article (Scientific journals)
Active Code Learning: Benchmarking Sample-Efficient Training of Code Models
HU, Qiang; GUO, Yuejun; Xie, Xiaofei et al.
2024. In IEEE Transactions on Software Engineering, 50 (5), p. 1080 - 1095
Peer Reviewed verified by ORBi
 

Files


Full Text
Active_Code_Learning_Benchmarking_Sample-Efficient_Training_of_Code_Models.pdf
Author postprint (1.36 MB)
Details



Keywords :
Active learning; benchmark; empirical analysis; machine learning for code; Code; Features extraction; Labelings; Machine-learning; Task analysis; Training data; Software; Codes; Data models; Training; Feature extraction; Labeling
Abstract :
[en] The costly human effort required to prepare training data for machine learning (ML) models hinders their practical development and use in software engineering (ML4Code), especially for teams with limited budgets. Efficiently training models of code with less human effort has therefore become an emergent problem. Active learning addresses this issue by allowing developers to train a model on a reduced amount of data while still reaching the desired performance, and it has been well studied in the computer vision and natural language processing domains. Unfortunately, no existing work explores the effectiveness of active learning for code models. In this paper, we bridge this gap by building the first benchmark to study this critical problem: active code learning. Specifically, we collect 11 acquisition functions (which are used for data selection in active learning) from existing works and adapt them for code-related tasks. Then, we conduct an empirical study to check whether these acquisition functions maintain their performance on code data. The results demonstrate that feature selection strongly affects active learning and that using output vectors to select data is the best choice. For the code summarization task, active code learning is ineffective, producing models with a gap of over 29.64% compared to the expected performance. Furthermore, we explore future directions of active code learning with an exploratory study. We propose to replace distance calculation methods with evaluation metrics and find a correlation between these evaluation-based distance methods and the performance of code models.
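As an illustration of what an acquisition function operating on output vectors can look like, here is a minimal sketch of entropy-based uncertainty sampling. This is a generic, widely used strategy; the abstract does not list the 11 functions studied, so its inclusion in the benchmark is an assumption. The names `entropy`, `select_by_entropy`, and `unlabeled_outputs` are hypothetical, and the probability values are made-up placeholders rather than data from the paper.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(outputs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` samples whose outputs have the highest entropy."""
    ranked = sorted(
        range(len(outputs)),
        key=lambda i: entropy(outputs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical softmax outputs of a code model for four unlabeled snippets
# (two classes, e.g. a binary classification task on code).
unlabeled_outputs = [
    [0.98, 0.02],  # confident prediction, low entropy
    [0.55, 0.45],
    [0.70, 0.30],
    [0.50, 0.50],  # maximally uncertain, highest entropy
]

# Pick the two samples the model is least certain about; these would be
# sent to a human for labeling and added to the training set.
selected = select_by_entropy(unlabeled_outputs, budget=2)
print(selected)
```

In an active learning loop, the selected indices would be labeled, the model retrained, and the selection repeated until the labeling budget is spent.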
Disciplines :
Computer science
Author, co-author :
HU, Qiang ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON
GUO, Yuejun ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON ; Luxembourg Institute of Science and Technology, Belval, Luxembourg
Xie, Xiaofei ; Singapore Management University, Singapore
CORDY, Maxime ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Ma, Lei ; University of Tokyo, Tokyo, Japan ; University of Alberta, Edmonton, Canada
PAPADAKIS, Mike ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Traon, Yves Le ; University of Luxembourg, Belval, Luxembourg
External co-authors :
yes
Language :
English
Title :
Active Code Learning: Benchmarking Sample-Efficient Training of Code Models
Publication date :
May 2024
Journal title :
IEEE Transactions on Software Engineering
ISSN :
0098-5589
eISSN :
1939-3520
Publisher :
Institute of Electrical and Electronics Engineers Inc.
Volume :
50
Issue :
5
Pages :
1080 - 1095
Peer reviewed :
Peer Reviewed verified by ORBi
Funders :
European Union’s Horizon Research and Innovation Programme
Project LAZARUS
Luxembourg National Research Funds
Available on ORBilu :
since 06 January 2025

Statistics


Number of views
89 (4 by Unilu)
Number of downloads
48 (1 by Unilu)

Scopus citations®
6
Scopus citations® without self-citations
6
OpenCitations
0
OpenAlex citations
5
WoS citations
5
