Paper published in a book (colloquia, congresses, scientific conferences and proceedings)
On the Creation of Representative Samples of Software Repositories
Gorostidi, June; AIT-MIMOUNE FONOLLA, Adem; CABOT, Jordi et al.
2024. In Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2024
Peer reviewed
 

Documents


Full text
esem24-main.pdf
Author preprint (664.76 kB), Creative Commons license: Attribution, NonCommercial, ShareAlike

All documents in ORBilu are protected by a user license.




Details



Keywords:
Empirical Studies; Repositories; Sampling; Coding platform; Data sampling; Empirical Software Engineering; Empirical studies; Mining software; Repository; Representative sample; Software project; Software repositories; Source data; Computer Science Applications; Software
Abstract:
Software repositories are one of the sources of data in Empirical Software Engineering, primarily in the Mining Software Repositories field, which aims to extract knowledge from the dynamics and practice of software projects. With the emergence of social coding platforms such as GitHub, researchers now have access to millions of software repositories to use as source data for their studies. With this massive amount of data, sampling techniques are needed to create more manageable datasets. The creation of these datasets is a crucial step, and researchers have to carefully select the repositories to create samples that are representative according to a set of variables of interest. However, current sampling methods are often based on random selection or rely on variables that may not be related to the research study (e.g., popularity or activity). In this paper, we present a methodology for creating representative samples of software repositories, where such representativeness is properly aligned with both the characteristics of the population of repositories and the requirements of the empirical study. We illustrate our approach with use cases based on Hugging Face repositories.
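As a rough illustration of what sampling "according to a set of variables of interest" can mean, as opposed to plain random selection, the sketch below draws a proportional stratified sample: repositories are grouped by one variable, and each group receives a share of the sample proportional to its size in the population. This is a generic textbook technique, not the methodology proposed in the paper. The `stratified_sample` helper, the dictionary-based repository records, and the use of the repository type (model, dataset, space) as the stratification variable are all hypothetical choices made for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(repos, key, n, seed=0):
    """Draw a proportional stratified sample of exactly n repositories.

    Hypothetical helper: `repos` is a list of records, `key` maps a record
    to its stratum (the variable of interest), `seed` fixes the random draw.
    """
    if not 0 <= n <= len(repos):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)

    # Group the population into strata by the variable of interest.
    strata = defaultdict(list)
    for repo in repos:
        strata[key(repo)].append(repo)

    # Integer quota per stratum (floor of n * size / total) plus the remainder,
    # computed with exact integer arithmetic to avoid float rounding issues.
    total = len(repos)
    quotas, remainders = {}, {}
    for k, members in strata.items():
        quotas[k], remainders[k] = divmod(n * len(members), total)

    # Hand the leftover slots to the strata with the largest remainders
    # (largest-remainder method) so the quotas add up to exactly n.
    leftover = n - sum(quotas.values())
    for k in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[k] += 1

    # Simple random sampling without replacement inside each stratum.
    sample = []
    for k, members in strata.items():
        sample.extend(rng.sample(members, quotas[k]))
    return sample


# Hypothetical population: 60 model, 30 dataset and 10 space repositories.
repos = (
    [{"id": f"m{i}", "type": "model"} for i in range(60)]
    + [{"id": f"d{i}", "type": "dataset"} for i in range(30)]
    + [{"id": f"s{i}", "type": "space"} for i in range(10)]
)
sample = stratified_sample(repos, key=lambda r: r["type"], n=10)
print(Counter(r["type"] for r in sample))
# Counter({'model': 6, 'dataset': 3, 'space': 1})
```

With a pure random draw of 10 repositories, the small "space" group could easily be missing from the sample; proportional allocation guarantees each group appears in line with its population share. A real study would also need to justify the sample size and choose stratification variables that match its research questions, which is the alignment the abstract emphasizes.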
Research centre:
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > Other
Disciplines:
Computer science
Authors and co-authors:
Gorostidi, June ;  Universitat Oberta de Catalunya (UOC), IN3, Barcelona, Spain
AIT-MIMOUNE FONOLLA, Adem  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PI Cabot
CABOT, Jordi  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > PI Cabot ; Luxembourg Institute of Science and Technology (LIST), Esch-sur-Alzette, Luxembourg
Canovas Izquierdo, Javier Luis ;  Universitat Oberta de Catalunya (UOC), IN3, Barcelona, Spain
External co-authors:
yes
Document language:
English
Title:
On the Creation of Representative Samples of Software Repositories
Publication/release date:
24 October 2024
Event name:
Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement
Event location:
Barcelona, Spain
Event dates:
24 to 25 October 2024
Title of the main work:
Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2024
Publisher:
IEEE Computer Society
ISBN/EAN:
9798400710476
Peer reviewed:
Peer reviewed
FNR project:
FNR16544475 - Better Smart Software Faster (Besser) - An Intelligent Low-code Infrastructure For Smart Software, 2020 (01/01/2022-...) - Jordi Cabot
Research project title:
U-AGR-7344 - P20/IS/16544475/BESSER/Cabot - CABOT Jordi
Funding bodies:
MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR
FNR - Luxembourg National Research Fund
Funding (details):
This work is part of the project TED2021-130331B-I00 funded by MCIN/AEI/10.13039/501100011033 and European Union NextGenerationEU/PRTR; and BESSER, funded by the Luxembourg National Research Fund (FNR) PEARL program, grant agreement 16544475.
Available on ORBilu:
since 07 January 2025
