Doctoral thesis (Dissertations and theses)
Scaling up Uniform Random Sampling
ZEYEN, Olivier
2025
 

Files


Full Text
thesis.pdf
Author postprint (9.03 MB) Creative Commons License - Attribution
Download

All documents in ORBilu are protected by a user license.

Send to



Details



Keywords :
Uniform Random Sampling; Knowledge Compilation; Model Counting; Propositional Logic
Abstract :
[en] Despite its NP-completeness, the Boolean satisfiability problem (SAT) gave birth to highly efficient tools that can find solutions to a Boolean formula. Boolean formulae can succinctly represent vast, constrained search spaces, such as the configuration options of variability-intensive systems like the Linux kernel. Due to the sheer size of these spaces, exhaustive exploration is typically infeasible. As a result, most testing approaches rely on sampling a subset of solutions for analysis. A desirable property of such samples is uniformity: each solution should get the same selection probability. This property motivated the design of uniform random samplers, relying on SAT solvers or model counters, and achieving different trade-offs between uniformity and scalability. In this thesis, we contribute to a deeper understanding of uniform random sampling complexity through several advances. First, we introduce a novel, efficient parallel algorithm to compute the number of equivalence classes in Boolean formulae, a structural metric that strongly correlates with sampling time and memory usage. Leveraging these correlations, we develop a classifier that accurately predicts whether sampling will exceed computational budgets. Second, we deepen our understanding of sampling complexity by exploring synthetic formulae and phase transitions. We investigate how phase transitions can explain the practical complexity of sampling. Our results, computed on 11,409 synthetic formulae and 4656 real-world formulae, show that phase transitions occur in both uniform random sampling and SAT-solving, but at a different clause-to-variable ratio than for SAT tasks. We further reveal that low formula modularity is correlated with a higher uniform random sampling time. Overall, our work contributes to a principled understanding of uniform random sampling complexity. Third, we develop a statistical testing framework to support the evaluation and development of new uniform random sampling algorithms. While assessing the uniformity of a sampler shares similarities with testing pseudo-random number generators (PRNGs), key differences make the problem more challenging: sampling is significantly slower, and the space of valid solutions is often highly constrained by the input formula. As a result, traditional PRNG testing methods are insufficient for this domain. To address this, we introduce a suite of five statistical tests specifically designed for evaluating uniformity in constrained sampling scenarios. We apply these tests to seven existing samplers, showing their effectiveness and diagnostic power. Additionally, we demonstrate the influence of the Boolean formula given as input to the samplers under test on the test results. Finally, we introduce a divide-and-conquer approach to knowledge compilation (KC). At the time of writing, knowledge compilation is one of the most effective methods to achieve uniform random sampling. However, KC still fails to scale on some Boolean formulae, including some representing the variability of large configurable systems. Concretely, our DivKC algorithm decomposes a large Boolean formula into two smaller ones, which we can easily compile into the d-DNNF form. When evaluated on a diversified benchmark of 4,656 formulae, DivKC compiles 114 formulae out of the 672 formulae that were previously out of reach for the \dfour state-of-the-art d-DNNF compiler. We then show how to leverage DivKC decompositions to build an approximate model counter and a uniform random sampler.
Research center :
Interdisciplinary Centre for Security, Reliability and Trust (SnT) > SerVal - Security, Reasoning & Validation
Disciplines :
Computer science
Author, co-author :
ZEYEN, Olivier ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Language :
English
Title :
Scaling up Uniform Random Sampling
Defense date :
23 September 2025
Institution :
Unilu - University of Luxembourg [Faculty of Science, Technology and Medicine], Luxembourg, Luxembourg
Degree :
Docteur en Informatique (DIP_DOC_0006_B)
Promotor :
CORDY, Maxime  ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
President :
PAPADAKIS, Michail ;  University of Luxembourg > Interdisciplinary Centre for Security, Reliability and Trust (SNT) > SerVal
Jury member :
NAVET, Nicolas ;  University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Computer Science (DCS)
Lagniez, Jean-Marie;  Université d'Artois
Perrouin, Gilles;  UNamur - Université de Namur
FnR Project :
FNR15077233 - Scaling Up Variability - Scaling Up Variability, 2020 (01/09/2021-31/08/2025) - Maxime Cordy
Funders :
FNR - Fonds National de la Recherche Luxembourg
Funding number :
AFR Grant 17047437
Funding text :
This research was funded in whole or in part by the Luxembourg National Research Fund (FNR). Maxime Cordy and Olivier Zeyen are supported by FNR Luxembourg (grants INTER/FNRS/20/15077233/Scaling Up Variability/Cordy, C23/IS/18177547/VARIANCE, and AFR Grant 17047437)
Available on ORBilu :
since 23 October 2025

Statistics


Number of views
54 (7 by Unilu)
Number of downloads
25 (2 by Unilu)

Bibliography


Similar publications



Contact ORBilu