Paper published in a book (Scientific congresses, symposiums and conference proceedings)
Revisiting the Non-Determinism of Code Generation by the GPT-3.5 Large Language Model
Sawadogo, Salimata; Sabane, Aminata; Kafando, Rodrique et al.
2025. In Proceedings - 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025
Peer reviewed
 

Files :
First_paper.pdf (Full Text, author preprint, 613.11 kB)

Details



Keywords :
Code Generation; LLMs; Large Language Models; Non-Determinism; Tree of Thoughts; Deterministic Behavior; High Degree of Variability; Software Engineering Research; Thought Process; Hardware and Architecture; Software; Safety, Risk, Reliability and Quality
Abstract :
[en] Despite recent advancements in Large Language Models (LLMs) for code generation, their inherent non-determinism remains a significant obstacle for reliable and reproducible software engineering research. Prior work has highlighted the high degree of variability in LLM-generated code, even when prompted with identical inputs. This non-deterministic behavior can undermine the validity of scientific conclusions drawn from LLM-based experiments. This paper showcases the Tree of Thoughts (ToT) prompting strategy as a promising alternative for improving the predictability and quality of code generation results. By guiding the LLM through a structured thought process, ToT aims to reduce the randomness inherent in the generation process and improve the consistency of the output. Our experiments on the GPT-3.5 Turbo model, using 829 code generation problems from the CodeContests, APPS (Automated Programming Progress Standard), and HumanEval benchmarks, demonstrate a substantial reduction in non-determinism compared to previous findings. Specifically, we observed a significant decrease in the number of coding tasks that produced inconsistent outputs across multiple requests. Nevertheless, we show that the reduction in semantic variability was less pronounced for HumanEval (69%), indicating challenges unique to this dataset that are not fully mitigated by ToT.
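As a rough illustration of the measurement described in the abstract, probing non-determinism amounts to issuing one identical code-generation request several times and comparing the replies. The following minimal Python sketch assumes the official openai client and a hypothetical prompt; it is not the authors' pipeline, and it counts only syntactically distinct outputs, a cheap proxy for the test-execution-based semantic checks a study like this requires.

# Minimal sketch (illustrative, not the paper's code): repeat an
# identical request to GPT-3.5 Turbo and count distinct answers.
# Assumes the official `openai` Python client (v1 API) and an
# OPENAI_API_KEY in the environment; PROMPT is a hypothetical task.
from collections import Counter

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = "Write a Python function that returns the n-th Fibonacci number."

def sample_outputs(prompt: str, n_requests: int = 5) -> list[str]:
    """Issue the same prompt n_requests times and collect the replies."""
    outputs = []
    for _ in range(n_requests):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        outputs.append(resp.choices[0].message.content.strip())
    return outputs

outputs = sample_outputs(PROMPT)
distinct = Counter(outputs)
print(f"{len(distinct)} syntactically distinct outputs out of {len(outputs)} requests")

Note that string equality over-counts variability: two syntactically different completions may still be semantically equivalent, which is why judging the semantic variability reported in the abstract requires executing the candidate programs against the benchmarks' test cases.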
Disciplines :
Computer science
Author, co-author :
Sawadogo, Salimata;  Université Joseph Ki-Zerbo, Centre d'Excellence en IA (CITADEL), Ouagadougou, Burkina Faso
Sabane, Aminata;  Université Joseph Ki-Zerbo, Centre d'Excellence en IA (CITADEL), Ouagadougou, Burkina Faso
Kafando, Rodrique;  Université Virtuelle du Burkina Faso, Centre d'Excellence en IA (CITADEL), Ouagadougou, Burkina Faso
Kabore, Abdoul Kader;  Université du Luxembourg, Centre d'Excellence en IA (CITADEL), Ouagadougou, Burkina Faso
Bissyandé, Tegawendé; University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > TruX
External co-authors :
yes
Language :
English
Title :
Revisiting the Non-Determinism of Code Generation by the GPT-3.5 Large Language Model
Publication date :
04 March 2025
Event name :
2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
Event organizer :
IEEE
Event date :
04 March 2025 - 07 March 2025
Audience :
International
Main work title :
Proceedings - 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025
Publisher :
Institute of Electrical and Electronics Engineers Inc.
ISBN/EAN :
9798331535100
Pages :
36-44
Peer reviewed :
Peer reviewed
Available on ORBilu :
since 15 December 2025


