Baraud, Yannick, E-print/Working paper (2022)
We solve the problem of estimating the distribution of presumed i.i.d. observations for the total variation loss. Our approach is based on density models and is versatile enough to cope with many different ones, including some density models for which the Maximum Likelihood Estimator (MLE for short) does not exist. We mainly illustrate the properties of our estimator on models of densities on the line that satisfy a shape constraint. We show that it possesses some similar optimality properties, with regard to some global rates of convergence, as the MLE does when it exists. It also enjoys some adaptation properties with respect to some specific target densities in the model for which our estimator is proven to converge at parametric rate. More important is the fact that our estimator is robust, not only with respect to model misspecification, but also to contamination, the presence of outliers among the dataset and the equidistribution assumption. This means that the estimator performs almost as well as if the data were i.i.d. with density $p$ in a situation where these data are only independent and most of their marginals are close enough in total variation to a distribution with density $p$. We also show that our estimator converges to the average density of the data, when this density belongs to the model, even when none of the marginal densities belongs to it. Our main result on the risk of the estimator takes the form of an exponential deviation inequality which is non-asymptotic and involves explicit numerical constants. We deduce from it several global rates of convergence, including some bounds for the minimax $L_1$-risks over the sets of concave and log-concave densities.
These bounds derive from some specific results on the approximation of densities which are monotone, convex, concave and log-concave. Such results may be of independent interest.

Baraud, Yannick, E-print/Working paper (2021)
In the Bayes paradigm and for a given loss function, we propose the construction of a new type of posterior distributions, that extends the classical Bayes one, for estimating the law of an $n$-sample. The loss functions we have in mind are based on the total variation and Hellinger distances as well as some $L_j$-ones. We prove that, with a probability close to one, this new posterior distribution concentrates its mass in a neighbourhood of the law of the data, for the chosen loss function, provided that this law belongs to the support of the prior or, at least, lies close enough to it. We therefore establish that the new posterior distribution enjoys some robustness properties with respect to a possible misspecification of the prior, or more precisely, its support. For the total variation and squared Hellinger losses, we also show that the posterior distribution keeps its concentration properties when the data are only independent, hence not necessarily i.i.d., provided that most of their marginals or the average of these are close enough to some probability distribution around which the prior puts enough mass. The posterior distribution is therefore also stable with respect to the equidistribution assumption. We illustrate these results by several applications. We consider the problems of estimating a location parameter or both the location and the scale of a density in a nonparametric framework.
Finally, we also tackle the problem of estimating a density, with the squared Hellinger loss, in a high-dimensional parametric model under some sparsity conditions. The results established in this paper are non-asymptotic and provide, as much as possible, explicit constants.

Baraud, Yannick, in Probability Theory and Related Fields (2021), 180(3), 799-846
We consider the problem of estimating the joint distribution of $n$ independent random variables. Given a loss function and a family of candidate probabilities, that we shall call a model, we aim at designing an estimator with values in our model that possesses good estimation properties not only when the distribution of the data belongs to the model but also when it lies close enough to it. The losses we have in mind are the total variation, Hellinger, Wasserstein and $L_p$-distances to name a few. We show that the risk of our estimator can be bounded by the sum of an approximation term that accounts for the loss between the true distribution and the model and a complexity term that corresponds to the bound we would get if this distribution did belong to the model. Our results hold under mild assumptions on the true distribution of the data and are based on exponential deviation inequalities that are non-asymptotic and involve explicit constants. Interestingly, when the model reduces to two distinct probabilities, our procedure results in a robust test whose errors of first and second kinds only depend on the losses between the true distribution and the two tested probabilities.

Baraud, Yannick, E-print/Working paper (2020)

Baraud, Yannick, E-print/Working paper (2020)
The aim of this paper is to provide a confidence interval on the number of persons infected by COVID-19 within the population from the number of deaths reported in the hospitals and the mortality rate (that is assumed to be known).

Baraud, Yannick, in Annals of Statistics (2020)
We observe $n$ independent random variables with joint distribution $P$ and pretend that they are i.i.d. with some common density $s$ (with respect to a known measure $\mu$) that we wish to estimate. We consider a density model $S$ for $s$ that we endow with a prior distribution $\pi$ (with support in $S$) and build a robust alternative to the classical Bayes posterior distribution which possesses similar concentration properties around $s$ whenever the data are truly i.i.d. and their density $s$ belongs to the model $S$. Furthermore, in this case, the Hellinger distance between the classical and the robust posterior distributions tends to 0, as the number of observations tends to infinity, under suitable assumptions on the model and the prior. However, unlike what happens with the classical Bayes posterior distribution, we show that the concentration properties of this new posterior distribution are still preserved when the model is misspecified or when the data are not i.i.d.
but the marginal densities of their joint distribution are close enough in Hellinger distance to the model S.

Baraud, Yannick, in Journal de la Société Française de Statistique (2019), 160(3)
Baraud, Yannick, in Annals of Statistics (2018), 46(6B), 3767-3804
Baraud, Yannick, in Inventiones Mathematicae (2017), 207(2), 425-517
Baraud, Yannick, in Journal de la Société Française de Statistique (2017), 158(3), 1-26
Baraud, Yannick, in Electronic Journal of Statistics (2016), 10(2), 1709-1728
Baraud, Yannick, in Stochastic Processes and Their Applications (2016), 126(12), 3888-3912
Baraud, Yannick, in Ann. Inst. Henri Poincaré Probab. Stat. (2014), 50(1), 285-314
Baraud, Yannick, in Ann. Inst. Henri Poincaré Probab. Stat. (2014), 50(3), 1092-1119
Baraud, Yannick, in Confluentes Mathematici (2013), 5(1), 3-21
Baraud, Yannick, in Probab. Theory Related Fields (2011), 151(1-2), 353-401
Baraud, Yannick, in Bernoulli (2010), 16(4), 1064-1085
Baraud, Yannick, in Annals of Statistics (2009), 37(2), 630-672
Baraud, Yannick, in Probab. Theory Related Fields (2009), 143(1-2), 239-284
Baraud, Yannick, in Annals of Statistics (2005), 33(1), 214-257