Active Code Learning: Benchmarking Sample-Efficient Training of Code Models

Active learning; Benchmark; Empirical analysis; Machine learning for code; Code; Feature extraction; Labeling; Task analysis; Training data; Software; Data models; Training

Abstract :

The costly human effort required to prepare training data for machine learning (ML) models hinders their practical development and use in software engineering (ML4Code), especially for teams with limited budgets. Efficiently training models of code with less human effort has therefore become an emergent problem. Active learning is a technique that addresses this issue: it allows developers to train a model on a reduced amount of labeled data while still reaching the desired performance, and it has been well studied in the computer vision and natural language processing domains. Unfortunately, no prior work explores the effectiveness of active learning for code models. In this paper, we bridge this gap by building the first benchmark to study this critical problem, active code learning. Specifically, we collect 11 acquisition functions (which are used for data selection in active learning) from existing works and adapt them to code-related tasks. We then conduct an empirical study to check whether these acquisition functions maintain their performance on code data. The results demonstrate that feature selection highly affects active learning and that using output vectors to select data is the best choice. For the code summarization task, active code learning is ineffective: it produces models with a gap of over 29.64% compared to the expected performance. Furthermore, we explore future directions of active code learning with an exploratory study. We propose replacing distance calculation methods with evaluation metrics and find a correlation between these evaluation-based distance methods and the performance of code models.
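
As an illustration of what an acquisition function over output vectors can look like, the following minimal sketch ranks unlabeled samples by the entropy of the model's predicted class probabilities and selects the most uncertain ones for labeling. Entropy-based selection is a common uncertainty heuristic in active learning; it is used here only as an assumed example and is not claimed to be one of the 11 functions benchmarked in the paper. The function names and the probability values are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_by_entropy(outputs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` unlabeled samples with the highest entropy."""
    ranked = sorted(
        range(len(outputs)),
        key=lambda i: entropy(outputs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical softmax outputs of a code model on four unlabeled programs.
unlabeled_outputs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
]

# Indices of the two samples that would be sent for labeling.
selected = select_by_entropy(unlabeled_outputs, budget=2)
print(selected)
```

In an active learning loop, the selected samples would be labeled, added to the training set, and the model retrained before the next selection round.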

Disciplines :

Computer science

Author, co-author :

HU, Qiang ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON

GUO, Yuejun ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust > SerVal > Team Yves LE TRAON ; Luxembourg Institute of Science and Technology, Belval, Luxembourg

Xie, Xiaofei ; Singapore Management University, Singapore

CORDY, Maxime ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

Ma, Lei ; University of Tokyo, Tokyo, Japan ; University of Alberta, Edmonton, Canada

PAPADAKIS, Mike ; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal

Le Traon, Yves ; University of Luxembourg, Belval, Luxembourg

External co-authors :

yes

Language :

English

Title :

Active Code Learning: Benchmarking Sample-Efficient Training of Code Models

Publication date :

May 2024

Journal title :

IEEE Transactions on Software Engineering

ISSN :

0098-5589

eISSN :

1939-3520

Publisher :

Institute of Electrical and Electronics Engineers Inc.

Volume :

Issue :

Pages :

1080 - 1095

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

http://xplorestaging.ieee.org/ielx7/32/10531114/10471610.pdf?arnumber=10471610

Funders :

European Union’s Horizon Research and Innovation Programme
Project LAZARUS
Luxembourg National Research Funds

Available on ORBilu :

since 06 January 2025
