Paper published in a book (Colloquia, congresses, scientific conferences and proceedings)
CodeGrid: A Grid Representation of Code
KABORE, Abdoul Kader; Barr, Earl T.; KLEIN, Jacques et al.
2023. In Just, Rene (Ed.), ISSTA 2023 - Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis
Peer reviewed
 

Documents


Full text
3597926.3598141.pdf
Author postprint (773.31 kB), Creative Commons Attribution license

All documents in ORBilu are protected by a user license.




Details



Keywords:
Code TypeSetting; Spatial-Aware Neural Network; Abstract Syntax Trees; Applications of AI; Code representation; Neural networks; Performance; Rich structure; State of the art; Structural information; Artificial Intelligence; Software
Abstract:
Code representation is a key step in the application of AI in software engineering. Generic NLP representations are effective but do not exploit all the rich structure inherent to code. Recent work has focused on extracting abstract syntax trees (ASTs) and integrating their structural information into code representations. These AST-enhanced representations advanced the state of the art and accelerated new applications of AI to software engineering. ASTs, however, neglect important aspects of code structure, notably control and data flow, leaving some potentially relevant code signal unexploited. For example, purely image-based representations perform nearly as well as AST-based representations, even though they must first learn even to recognize tokens, let alone their semantics. This result, from prior work, is strong evidence that these new code representations can still be improved; it also raises the question of just what signal image-based approaches are exploiting. We answer this question. We show that code is spatial and exploit this fact to propose CodeGrid, a new representation that embeds tokens into a grid that preserves code layout. Unlike some of the existing state of the art, CodeGrid is agnostic to the downstream task: whether that task is generation or classification, CodeGrid can complement the learning algorithm with spatial signal. For example, we show that CNNs, which are inherently spatially aware models, can exploit CodeGrid outputs to effectively tackle fundamental software engineering tasks, such as code classification, code clone detection and vulnerability detection. PixelCNN leverages CodeGrid's grid representations to achieve code completion. Through extensive experiments, we validate our spatial code hypothesis, quantifying model performance as we vary the degree to which the representation preserves the grid.
To demonstrate its generality, we show that CodeGrid augments models, improving their performance on a range of tasks. On clone detection, CodeGrid improves ASTNN's F1 score by 3.3%.
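The abstract describes embedding tokens into a grid that preserves code layout. Below is a minimal Python sketch of that general idea under our own assumptions; it is not the authors' implementation. The function `code_to_grid`, the regex tokenizer, the 4-space tab expansion and the padding id 0 are all hypothetical choices made for illustration: each token's vocabulary id is written into the cell at its starting (line, column), and every other cell stays empty.

```python
import re
from typing import Dict, List

# Hypothetical tokenizer: runs of word characters, or single punctuation characters.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

PAD = 0  # hypothetical id for an empty grid cell


def code_to_grid(source: str, vocab: Dict[str, int]) -> List[List[int]]:
    """Place each token's id at the (line, column) where the token starts.

    Cells that are not the start of a token keep the PAD id, so indentation
    and alignment survive as blank space in the 2D grid. `vocab` is grown
    in place; ids start at 1 because 0 is reserved for PAD.
    """
    # Assumption: tabs count as 4 columns.
    lines = [line.expandtabs(4) for line in source.splitlines()]
    width = max((len(line) for line in lines), default=0)
    grid = [[PAD] * width for _ in lines]
    for row, line in enumerate(lines):
        for match in TOKEN_RE.finditer(line):
            token = match.group()
            if token not in vocab:
                vocab[token] = len(vocab) + 1
            grid[row][match.start()] = vocab[token]
    return grid


if __name__ == "__main__":
    vocab: Dict[str, int] = {}
    for row in code_to_grid("def f(x):\n    return x + 1\n", vocab):
        print(row)
```

In this sketch the indented `return` line begins with four PAD cells, which is the kind of spatial signal a CNN could pick up. How the paper itself sizes the grid or handles tokens that span several columns is not stated in this record.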
Disciplines:
Computer science
Author, co-author:
KABORE, Abdoul Kader; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
Barr, Earl T.; University College London, United Kingdom; Google DeepMind
KLEIN, Jacques; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
Bissyandé, Tegawendé F.; University of Luxembourg, Luxembourg
External co-authors:
yes
Document language:
English
Title:
CodeGrid: A Grid Representation of Code
Publication/release date:
12 July 2023
Event name:
Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis
Event location:
Seattle, USA
Event dates:
17 to 21 July 2023
Event scope:
International
Title of the main work:
ISSTA 2023 - Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis
Scientific editor:
Just, Rene
Publisher:
Association for Computing Machinery, Inc
ISBN/EAN:
9798400702211
Peer reviewed:
Peer reviewed
Funding bodies:
ACM SIGSOFT
AITO
Funding (details):
This work was partly supported (1) by the Luxembourg National Research Fund (FNR) - NERVE project, ref. 14591304, (2) by the Luxembourg Ministry of Foreign and European Affairs through their Digital4Development (D4D) portfolio under project LuxWAyS and (3) by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Project NATURAL - grant agreement N° 949014).
Available on ORBilu:
since 22 November 2023

Statistics


Number of views
113 (including 3 from Unilu)
Number of downloads
40 (including 2 from Unilu)

Scopus® citations: 2
Scopus® citations excluding self-citations: 2
OpenAlex citations: 2
