Keywords :
approximate Markov chain Monte Carlo; control variates; goodness-of-fit testing; likelihood ratio; maximum likelihood estimator; prior sensitivity; sample quality; Stein’s method; variational inference; Statistics and Probability; Mathematics (all); Statistics, Probability and Uncertainty; General Mathematics
Abstract :
[en] Stein’s method compares probability distributions through the study of a class of linear operators called Stein operators. While mainly studied in probability and used to underpin theoretical statistics, Stein’s method has led to significant advances in computational statistics in recent years. The goal of this survey is to bring together some of these recent developments and, in doing so, to stimulate further research into the successful field of Stein’s method and statistics. The topics we discuss include tools to benchmark and compare sampling methods such as approximate Markov chain Monte Carlo, deterministic alternatives to sampling methods, control variate techniques, parameter estimation and goodness-of-fit testing.
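As a standard illustration of the Stein operators mentioned in the abstract (added for orientation; this worked example is the canonical textbook case, not text from the published abstract): for the standard normal distribution, the classical Stein operator acts on sufficiently smooth functions f as
\[
\mathcal{A}f(x) = f'(x) - x\,f(x), \qquad \mathbb{E}[\mathcal{A}f(X)] = 0 \ \text{for all suitable } f \iff X \sim N(0,1).
\]
Bounding quantities of the form \(\sup_{f} |\mathbb{E}[\mathcal{A}f(X_n)]|\) over a suitable class of test functions then quantifies the discrepancy between the law of \(X_n\) and the standard normal; this mechanism underlies the sample-quality benchmarks and goodness-of-fit tests surveyed in the article.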
Disciplines :
Mathematics
Author, co-author :
Anastasiou, Andreas; Department of Mathematics and Statistics, University of Cyprus, Nicosia, Cyprus
Barp, Alessandro; University of Cambridge, Engineering Dept, Cambridge, United Kingdom
Briol, François-Xavier; University College London, London, United Kingdom
Ebner, Bruno; Institute of Stochastics, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Gaunt, Robert E.; The University of Manchester, Manchester, United Kingdom
Ghaderinezhad, Fatemeh; The Gradient Building, Brussels, Belgium
Gorham, Jackson; Data Scientist, Whisper.ai, Inc., United States
Gretton, Arthur; Gatsby Computational Neuroscience Unit, University College London, Sainsbury Wellcome Centre, London, United Kingdom
Ley, Christophe; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Mathematics (DMATH)
Liu, Qiang; The University of Texas at Austin, Austin, United States
Mackey, Lester; Microsoft Research New England, Cambridge, United States
Oates, Chris J.; Newcastle University, United Kingdom
Reinert, Gesine; University of Oxford, Department of Statistics, Oxford, United Kingdom
Swan, Yvik; Université Libre de Bruxelles, Department of Mathematics, Brussels, Belgium
Title :
Stein’s Method Meets Computational Statistics: A Review of Some Recent Developments
Publication date :
2023
Journal title :
Statistical Science: A Review Journal of the Institute of Mathematical Statistics
ISSN :
0883-4237
Publisher :
Institute of Mathematical Statistics
Volume :
38
Issue :
1
Pages :
120 - 139
Peer reviewed :
Peer reviewed
Funding text :
AA was supported by a start-up grant from the University of Cyprus. AB was supported by the UK Defence Science and Technology Laboratory (Dstl) and the Engineering and Physical Sciences Research Council (EPSRC) under the grant EP/R018413/2. FXB and CJO were supported by the Lloyd’s Register Foundation Programme on Data-Centric Engineering and The Alan Turing Institute under the EPSRC grant EP/N510129/1. AG was supported by the Gatsby Charitable Foundation. RG was supported by a Dame Kathleen Ollerenshaw Research Fellowship. FG and CL were supported by a BOF Starting Grant of Ghent University. QL was supported in part by NSF CAREER No. 1846421. GR was supported in part by EP/T018445/1 and EP/R018472/1. YS was supported in part by CDR/OL J.0197.20 from FRS-FNRS.