Reference : GraphCode2Vec: generic code embedding via lexical and program dependence analyses
Scientific congresses, symposiums and conference proceedings : Paper published in a book
Engineering, computing & technology : Computer science
http://hdl.handle.net/10993/53862
GraphCode2Vec: generic code embedding via lexical and program dependence analyses
English
Ma, Wei [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal]
Zhao, Mengjie [LMU Munich, Germany]
Soremekun, Ezekiel [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal]
Hu, Qiang [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal]
Zhang, Jie M. [University College London, United Kingdom]
Papadakis, Mike [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)]
Cordy, Maxime [University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal]
Xie, Xiaofei [Singapore Management University, Singapore]
Le Traon, Yves [University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Life Sciences and Medicine (DLSM)]
22-May-2022
Proceedings of the 19th International Conference on Mining Software Repositories
524–536
Yes
19th International Conference on Mining Software Repositories
May 23–24, 2022
[en] code embedding ; code representation ; code analysis
[en] Code embedding is a keystone in the application of machine learning on several Software Engineering (SE) tasks. To effectively support a plethora of SE tasks, the embedding needs to capture program syntax and semantics in a way that is generic. To this end, we propose the first self-supervised pre-training approach (called GraphCode2Vec) which produces task-agnostic embedding of lexical and program dependence features. GraphCode2Vec achieves this via a synergistic combination of code analysis and Graph Neural Networks. GraphCode2Vec is generic, it allows pre-training, and it is applicable to several SE downstream tasks. We evaluate the effectiveness of GraphCode2Vec on four (4) tasks (method name prediction, solution classification, mutation testing and overfitted patch classification), and compare it with four (4) similarly generic code embedding baselines (Code2Seq, Code2Vec, CodeBERT, GraphCodeBERT) and seven (7) task-specific, learning-based methods. In particular, GraphCode2Vec is more effective than both generic and task-specific learning-based baselines. It is also complementary and comparable to GraphCodeBERT (a larger and more complex model). We also demonstrate through a probing and ablation study that GraphCode2Vec learns lexical and program dependence features and that self-supervised pre-training improves effectiveness.
10.1145/3524842.3528456

File(s) associated to this reference

Fulltext file(s):

MSR22.pdf (Version: Publisher postprint; Size: 901.94 kB; Access: Open access)