Paper published in a book (Colloquia, congresses, scientific conferences and proceedings)
Assessing the Generalizability of code2vec Token Embeddings
Kang, Hong Jin; BISSYANDE, Tegawendé François D Assise; David, Lo
2019. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering
Peer reviewed
 

Documents


Full text
ase19-code2vec.pdf
Author preprint (250.45 kB)

All documents in ORBilu are protected by a user license.




Details



Keywords:
Code Embeddings; Distributed Representations; Big Code
Abstract:
Many Natural Language Processing (NLP) tasks, such as sentiment analysis or syntactic parsing, have benefited from the development of word embedding models. In particular, regardless of the training algorithm, the learned embeddings have often been shown to generalize to different NLP tasks. In contrast, despite recent momentum on word embeddings for source code, the literature lacks evidence of their generalizability beyond the example task they were trained for. In this experience paper, we identify 3 potential downstream tasks, namely code comment generation, code authorship identification, and code clone detection, that source code token embedding models can be applied to. We empirically assess a recently proposed code token embedding model, namely code2vec’s token embeddings. Code2vec was trained on the task of predicting method names, and while there is potential for using the vectors it learns on other tasks, this has not been explored in the literature. Therefore, we fill this gap by focusing on its generalizability for the tasks we have identified. Ultimately, we show that source code token embeddings cannot be readily leveraged for the downstream tasks. Our experiments even show that our attempts to use them do not result in any improvements over less sophisticated methods. We call for more research into effective and general use of code embeddings.
Disciplines:
Computer science
Author, co-author:
Kang, Hong Jin; Singapore Management University > SIS
BISSYANDE, Tegawendé François D Assise; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT)
David, Lo
External co-authors:
yes
Document language:
English
Title:
Assessing the Generalizability of code2vec Token Embeddings
Publication/release date:
November 2019
Event name:
34th IEEE/ACM International Conference on Automated Software Engineering
Event location:
San Diego, California, United States
Event date:
10 November 2019 to 15 November 2019
Event scope:
International
Title of the main work:
Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering
Pagination:
1-12
Peer reviewed:
Peer reviewed
Focus Area:
Security, Reliability and Trust
Available on ORBilu:
since 23 January 2020

Statistics


Number of views
182 (of which 8 Unilu)
Number of downloads
185 (of which 6 Unilu)

Scopus® citations
62
Scopus® citations without self-citations
56
OpenAlex citations
74
