Paper published in a book (Colloquia, congresses, scientific conferences and proceedings)
GraphCode2Vec: generic code embedding via lexical and program dependence analyses
MA, Wei; Zhao, Mengjie; SOREMEKUN, Ezekiel et al.
2022. In Proceedings of the 19th International Conference on Mining Software Repositories
Peer reviewed
 

Documents


Full text
MSR22.pdf
Publisher postprint (923.58 kB)

All documents in ORBilu are protected by a user license.




Details



Keywords:
code embedding; code representation; code analysis
Abstract:
Code embedding is a keystone in the application of machine learning to several Software Engineering (SE) tasks. To effectively support a plethora of SE tasks, the embedding needs to capture program syntax and semantics in a way that is generic. To this end, we propose the first self-supervised pre-training approach (called GraphCode2Vec) which produces task-agnostic embedding of lexical and program dependence features. GraphCode2Vec achieves this via a synergistic combination of code analysis and Graph Neural Networks. GraphCode2Vec is generic: it allows pre-training, and it is applicable to several SE downstream tasks. We evaluate the effectiveness of GraphCode2Vec on four (4) tasks (method name prediction, solution classification, mutation testing and overfitted patch classification), and compare it with four (4) similarly generic code embedding baselines (Code2Seq, Code2Vec, CodeBERT, GraphCodeBERT) and seven (7) task-specific, learning-based methods. In particular, GraphCode2Vec is more effective than both generic and task-specific learning-based baselines. It is also complementary and comparable to GraphCodeBERT (a larger and more complex model). We also demonstrate through a probing and ablation study that GraphCode2Vec learns lexical and program dependence features and that self-supervised pre-training improves effectiveness.
Disciplines:
Computer science
Author, co-author:
MA, Wei ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Zhao, Mengjie;  LMU Munich, Germany
SOREMEKUN, Ezekiel ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
HU, Qiang ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Zhang, Jie M.;  University College London, United Kingdom
PAPADAKIS, Mike ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
CORDY, Maxime  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Xie, Xiaofei;  Singapore Management University, Singapore
Le Traon, Yves;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Life Sciences and Medicine (DLSM)
External co-authors:
Yes
Document language:
English
Title:
GraphCode2Vec: generic code embedding via lexical and program dependence analyses
Publication date:
22 May 2022
Event name:
19th International Conference on Mining Software Repositories
Event date:
May 23–24, 2022
Title of the main work:
Proceedings of the 19th International Conference on Mining Software Repositories
Pages:
524–536
Peer reviewed:
Peer reviewed
Available on ORBilu:
since 16 January 2023

Statistics

Number of views
205 (including 7 from Unilu)
Number of downloads
190 (including 9 from Unilu)

Scopus® citations
24
Scopus® citations without self-citations
20
OpenCitations
1
OpenAlex citations
25
