Maximum Likelihood Estimation Under Constraints: Singularities and Random Critical Points

Bayesian; Bayesian posterior consistency; Constraints; Critical points; Datapoints; Empirical likelihood; Maximum-likelihood estimation; Misspecification; Random polynomials; Singularity; Testing

Abstract :

We investigate semi-parametric maximum likelihood estimation under constraints on summary statistics. The procedure yields a discrete probability distribution supported on the data points that maximizes the likelihood among all distributions supported on the data points and satisfying the specified constraints (called estimating equations). The resulting distribution approximates the underlying population distribution. The study of such empirical likelihood estimation originates from the seminal work of Owen [1], [2]. We investigate this procedure in the setting of misspecified (or biased) constraints, i.e., when the null hypothesis is not true. We establish that the behavior of the optimal weight distribution under such misspecification differs markedly from its behavior under the null, i.e., when the estimating equations are correctly specified (or unbiased). This is manifested by certain “singularities” in the optimal distribution that are not observed under the null. Furthermore, we establish an anomalous behavior of the log-likelihood-based Wilks’ statistic, which, unlike under the null, does not exhibit a chi-squared limit. In the Bayesian setting, we establish the posterior consistency of procedures based on these ideas, where an empirical likelihood, rather than a parametric likelihood, is used to define the posterior distribution. In particular, we show that this posterior, as a random probability measure, converges rapidly, with explicit convergence guarantees, to the delta measure at the true parameter value. We also illustrate implications of our results in diverse settings, such as degeneracies in exponential random graph models (ERGM) for random networks [3], [4], empirical procedures where the constraints are themselves estimated from data [5], and procedures based on approximate Bayesian computation [6], [7].
A novel feature of our work is the connection of the likelihood maximization problem to critical points of random polynomials. This yields the mass of the singular weight in the optimal weight distribution as the leading term in a canonical expansion of a critical point of a random polynomial. Our work unveils the possibility that similar techniques based on random polynomials could be effective in analyzing a wide class of problems in related areas.
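As an illustration of the constrained maximization described in the abstract, the following is a minimal sketch for the simplest case, a single mean constraint on scalar data. It is not taken from the paper. It uses the standard Lagrangian form of the empirical likelihood weights, w_i = 1 / (n (1 + λ (x_i − μ))), with λ found by bisection. The function names `el_weights` and `wilks_statistic`, the tolerance values, and the toy data are assumptions chosen for this example.

```python
import math


def el_weights(x, mu, tol=1e-12, max_iter=200):
    """Empirical likelihood weights for the constraint sum_i w_i * x_i = mu.

    Hypothetical helper for illustration: maximizes sum_i log(w_i) subject to
    sum_i w_i = 1 and sum_i w_i * (x_i - mu) = 0, using the Lagrangian form
    w_i = 1 / (n * (1 + lam * z_i)) with z_i = x_i - mu.
    """
    z = [xi - mu for xi in x]
    n = len(z)
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        raise ValueError("mu must lie strictly inside the range of the data")

    # All weights are positive only for lam in (-1/z_max, -1/z_min).
    lo, hi = -1.0 / z_max, -1.0 / z_min
    shrink = 1e-10 * (hi - lo)  # stay strictly inside the open interval
    lo, hi = lo + shrink, hi - shrink

    def g(lam):
        # Stationarity condition; strictly decreasing in lam on (lo, hi).
        return sum(zi / (1.0 + lam * zi) for zi in z)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break

    lam = 0.5 * (lo + hi)
    w = [1.0 / (n * (1.0 + lam * zi)) for zi in z]
    return w, lam


def wilks_statistic(w):
    """Log-likelihood ratio statistic -2 * sum_i log(n * w_i)."""
    n = len(w)
    return -2.0 * sum(math.log(n * wi) for wi in w)


data = [1.0, 2.0, 4.0, 7.0]          # toy data, sample mean 3.5
w_null, _ = el_weights(data, 3.5)    # constraint matches the sample mean
w_edge, _ = el_weights(data, 6.9)    # constraint pushed toward the largest point
```

In this toy setting, a constraint equal to the sample mean leaves the weights uniform, while a constraint near the edge of the data range forces most of the mass onto the extreme observation. This is only a finite-sample caricature of weight concentration and should not be read as the paper's result on singularities.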

Disciplines :

Mathematics

Author, co-author :

Ghosh, Subhroshekhar

Chaudhuri, Sanjay

GANGOPADHYAY, Ujan ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Mathematics (DMATH)

External co-authors :

yes

Language :

English

Title :

Maximum Likelihood Estimation Under Constraints: Singularities and Random Critical Points

Publication date :

22 September 2023

Journal title :

IEEE Transactions on Information Theory

ISSN :

0018-9448

eISSN :

1557-9654

Publisher :

Institute of Electrical and Electronics Engineers Inc.

Volume :

Issue :

Pages :

7976-7997

Peer reviewed :

Peer Reviewed verified by ORBi

Additional URL :

http://xplorestaging.ieee.org/ielx7/18/10328924/10261459.pdf?arnumber=10261459

Funders :

Singapore Ministry of Education
University of Nebraska-Lincoln

Available on ORBilu :

since 29 November 2023

Statistics

Number of views

61 (8 by Unilu)

Number of downloads

3 (3 by Unilu)

Bibliography

A. B. Owen, “Empirical likelihood ratio confidence intervals for a single functional,” Biometrika, vol. 75, no. 2, pp. 237–249, 1988, doi: 10.1093/biomet/75.2.237.
A. B. Owen, Empirical Likelihood. London, U.K.: Chapman & Hall, 2001.
S. Chatterjee and P. Diaconis, “Estimating and understanding exponential random graph models,” Ann. Statist., vol. 41, no. 5, pp. 2428–2461, Oct. 2013.
S. Mukherjee, “Degeneracy in sparse ERGMs with functions of degrees as sufficient statistics,” Bernoulli, vol. 26, no. 2, pp. 1016–1043, May 2020.
N. L. Hjort, I. W. McKeague, and I. Van Keilegom, “Extending the scope of empirical likelihood,” Ann. Statist., vol. 37, no. 3, pp. 1079–1111, Jun. 2009.
S. Chaudhuri, S. Ghosh, D. J. Nott, and K. Cuc Pham, “On a variational approximation based empirical likelihood ABC method,” 2020, arXiv:2011.07721.
T. DiCiccio, P. Hall, and J. Romano, “Empirical likelihood is Bartlett-correctable,” Ann. Statist., vol. 19, no. 2, pp. 1053–1061, Jun. 1991.
J. Qin and J. Lawless, “Empirical likelihood and general estimating equations,” Ann. Statist., vol. 22, no. 1, pp. 300–325, Mar. 1994.
P. Hall and A. B. Owen, “Empirical likelihood confidence bands in density estimation,” J. Comput. Graph. Statist., vol. 2, no. 3, pp. 273–289, 1993, doi: 10.2307/1390646.
P. A. Mykland, “Bartlett identities and large deviations in likelihood theory,” Ann. Statist., vol. 27, no. 3, pp. 1105–1117, Jun. 1999.
M. Grendár and G. Judge, “Asymptotic equivalence of empirical likelihood and Bayesian MAP,” Ann. Statist., vol. 37, no. 5A, pp. 2445–2457, Oct. 2009.
Y. Kitamura, A. Santos, and A. M. Shaikh, “On the asymptotic optimality of empirical likelihood for testing moment restrictions,” Econometrica, vol. 80, no. 1, pp. 413–423, 2012. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.3982/ECTA8773
S. A. Murphy, “Likelihood ratio-based confidence intervals in survival analysis,” J. Amer. Stat. Assoc., vol. 90, no. 432, pp. 1399–1405, Dec. 1995. [Online]. Available: http://links.jstor.org/sici?sici=01621459(199512)90:432<1399:LRCIIS>2.0.CO;2-E&origin=MSN
S. A. Murphy and A. W. van der Vaart, “Semiparametric likelihood ratio inference,” Ann. Statist., vol. 25, no. 4, pp. 1471–1509, Aug. 1997.
G. Li, “Nonparametric likelihood ratio estimation of probabilities for truncated data,” J. Amer. Stat. Assoc., vol. 90, no. 431, pp. 997–1003, Sep. 1995. [Online]. Available: http://links.jstor.org/sici?sici=01621459(199509)90:431<997:NLREOP>2.0.CO;2-A&origin=MSN
H. Peng and A. Schick, “Empirical likelihood approach to goodness of fit testing,” Bernoulli, vol. 19, no. 3, pp. 954–981, Aug. 2013.
J. Chang, C. Y. Tang, and T. T. Wu, “A new scope of penalized empirical likelihood with high-dimensional estimating equations,” Ann. Statist., vol. 46, no. 6B, pp. 3185–3216, Dec. 2018.
S. X. Chen and I. Van Keilegom, “A review on empirical likelihood methods for regression,” TEST, vol. 18, no. 3, pp. 415–447, Nov. 2009.
E. T. Jaynes, “Information theory and statistical mechanics,” Phys. Rev., vol. 106, no. 4, pp. 620–630, May 1957.
E. T. Jaynes, “Information theory and statistical mechanics. II,” Phys. Rev., vol. 108, no. 2, pp. 171–190, Oct. 1957.
T. A. B. Snijders, P. E. Pattison, G. L. Robins, and M. S. Handcock, “New specifications for exponential random graph models,” Sociol. Methodol., vol. 36, no. 1, pp. 99–153, Aug. 2006.
G. Robins, P. Pattison, Y. Kalish, and D. Lusher, “An introduction to exponential random graph (p*) models for social networks,” Social Netw., vol. 29, no. 2, pp. 173–191, May 2007.
S. Horvát, É. Czabarka, and Z. Toroczkai, “Reducing degeneracy in maximum entropy models of networks,” Phys. Rev. Lett., vol. 114, no. 15, Apr. 2015, Art. no. 158701.
I. Fellows and M. Handcock, “Removing phase transitions from Gibbs measures,” in Proc. 20th Int. Conf. Artif. Intell. Statist., vol. 54, A. Singh and J. Zhu, Eds., 2017, pp. 289–297. [Online]. Available: https://proceedings.mlr.press/v54/fellows17a.html
N. Lazar, “An evaluation of the power and conditionality properties of empirical likelihood,” Biometrika, vol. 85, no. 3, pp. 523–534, Sep. 1998.
N. A. Lazar, “Bayesian empirical likelihood,” Biometrika, vol. 90, no. 2, pp. 319–326, Jun. 2003.
S. M. Schennach, “Bayesian exponentially tilted empirical likelihood,” Biometrika, vol. 92, no. 1, pp. 31–46, Mar. 2005.
K.-T. Fang and R. Mukerjee, “Expected lengths of confidence intervals based on empirical discrepancy statistics,” Biometrika, vol. 92, no. 2, pp. 499–503, Jun. 2005.
K.-T. Fang and R. Mukerjee, “Empirical-type likelihoods allowing posterior credible sets with frequentist validity: Higher-order asymptotics,” Biometrika, vol. 93, no. 3, pp. 723–733, Sep. 2006.
S. Chaudhuri, D. Mondal, and T. Yin, “Hamiltonian Monte Carlo sampling in Bayesian empirical likelihood computation,” J. Roy. Stat. Soc. Ser. B, Stat. Methodol., vol. 79, no. 1, pp. 293–320, Jan. 2017.
S. Chib, M. Shin, and A. Simoni, “Bayesian estimation and comparison of moment condition models,” J. Amer. Stat. Assoc., vol. 113, no. 524, pp. 1656–1668, Oct. 2018.
X. Zhong and M. Ghosh, “Higher-order properties of Bayesian empirical likelihood,” Electron. J. Statist., vol. 10, no. 2, pp. 3011–3044, Jan. 2016.
A. Vexler, L. Zou, and A. D. Hutson, “The empirical likelihood prior applied to bias reduction of general estimating equations,” Comput. Statist. Data Anal., vol. 138, pp. 96–106, Oct. 2019, doi: 10.1016/j.csda.2019.04.001.
A. Yuan and B. Clarke, “Reference priors for empirical likelihoods,” in Frontiers of Statistical Decision Making and Bayesian Analysis, H. James, O. Berger, M. H. Chen, P. Müller, D. Sun, K. Ye, and D. K. Dey, Eds. New York, NY, USA: Springer, 2010, pp. 56–68. [Online]. Available: https://books.google.com.sg/books?id=uOwFwrD8wDoC
S. Chaudhuri and M. Ghosh, “Empirical likelihood for small area estimation,” Biometrika, vol. 98, no. 2, pp. 473–480, Jun. 2011.
A. T. Porter, S. H. Holan, and C. K. Wikle, “Bayesian semiparametric hierarchical empirical likelihood spatial models,” J. Stat. Planning Inference, vol. 165, pp. 78–90, Oct. 2015, doi: 10.1016/j.jspi.2015.04.002.
Y. Yang and X. He, “Bayesian empirical likelihood for quantile regression,” Ann. Statist., vol. 40, no. 2, pp. 1102–1131, Apr. 2012.
K. L. Mengersen, P. Pudlo, and C. P. Robert, “Bayesian computation via empirical likelihood,” Proc. Nat. Acad. Sci. USA, vol. 110, no. 4, pp. 1321–1326, Jan. 2013.
A. Auffinger, G. Ben Arous, and J. Černý, “Random matrices and complexity of spin glasses,” Commun. Pure Appl. Math., vol. 66, no. 2, pp. 165–201, Feb. 2013. [Online]. Available: https://onlinelibrary.wiley.com/doi/10.1002/cpa.21422
A. Auffinger and G. Ben Arous, “Complexity of random smooth functions on the high-dimensional sphere,” Ann. Probab., vol. 41, no. 6, pp. 4214–4247, Nov. 2013.
N. Cressie and T. R. Read, “Multinomial goodness-of-fit tests,” J. Roy. Stat. Soc., B, Methodol., vol. 46, no. 3, pp. 440–464, 1984.
G. Judge and R. Mittelhammer, “Implications of the Cressie-Read family of additive divergences for information recovery,” Entropy, vol. 14, no. 12, pp. 2427–2438, Dec. 2012.
Y. Kitamura, “Asymptotic optimality of empirical likelihood for testing moment restrictions,” Econometrica, vol. 69, no. 6, pp. 1661–1672, Nov. 2001.
P. Deheuvels, “On the influence of the extremes of an I.I.D. sequence on the maximal spacings,” Ann. Probab., vol. 14, no. 1, pp. 194–208, Jan. 1986. [Online]. Available: http://links.jstor.org/sici?sici=00911798(198601)14:1<194:OTIOTE>2.0.CO;2-A&origin=MSN
H. N. Nagaraja, “Record values and extreme value distributions,” J. Appl. Probab., vol. 19, no. 1, pp. 233–239, Mar. 1982.
R. E. Welsch, “A convergence theorem for extreme values from Gaussian sequences,” Ann. Probab., vol. 1, no. 3, pp. 398–404, Jun. 1973.
R.-D. Reiss, Approximate Distributions of Order Statistics (Springer Series in Statistics). New York, NY, USA: Springer, 1989. [Online]. Available: http://link.springer.com/10.1007/978-1-4613-9620-8
M. R. Leadbetter, G. Lindgren, and H. Rootzén, Extremes and Related Properties of Random Sequences and Processes. New York, NY, USA: Springer, 1983.
H. N. Nagaraja, K. Bharath, and F. Zhang, “Spacings around an order statistic,” Ann. Inst. Stat. Math., vol. 67, no. 3, pp. 515–540, Jun. 2015.
S. I. Resnick, Extreme Values, Regular Variation, and Point Processes (Springer Series in Operations Research and Financial Engineering). New York, NY, USA: Springer, 1987. [Online]. Available: http://link.springer.com/10.1007/978-0-387-75953-1
S. Kotz, T. J. Kozubowski, and K. Podgórski, The Laplace Distribution and Generalizations. Boston, MA, USA: Birkhäuser, 2001. [Online]. Available: http://link.springer.com/10.1007/978-1-4612-0173-1
A. Yuan, J. Xu, and G. Zheng, “On empirical likelihood statistical functions,” J. Econometrics, vol. 178, pp. 613–623, Jan. 2014, doi: 10.1016/j.jeconom.2013.08.037.
J. F. Monahan and D. D. Boos, “Proper likelihoods for Bayesian analysis,” Biometrika, vol. 79, no. 2, pp. 271–278, 1992.
L. E. Dubins and D. A. Freedman, “Invariant probabilities for certain Markov processes,” Ann. Math. Statist., vol. 37, no. 4, pp. 837–848, Aug. 1966.
R. H. Berk, “Correction notes: Correction to limiting behavior of posterior distributions when the model is incorrect,” Ann. Math. Statist., vol. 37, no. 3, pp. 745–746, Jun. 1966.
B. J. K. Kleijn and A. W. van der Vaart, “Misspecification in infinite-dimensional Bayesian statistics,” Ann. Statist., vol. 34, no. 2, pp. 837–877, Apr. 2006.
D. Fudenberg, G. Romanyuk, and P. Strack, “Active learning with a misspecified prior,” Theor. Econ., vol. 12, no. 3, pp. 1155–1189, Sep. 2017.
B. Shiffman and S. Zelditch, “Random polynomials of high degree and Levy concentration of measure,” Asian J. Math., vol. 7, no. 4, pp. 627–646, 2003. [Online]. Available: http://www.intlpress.com/site/pub/pages/journals/items/ajm/content/vols/0007/0004/a011/
L. F. Kozachenko and N. N. Leonenko, “Sample estimate of the entropy of a random vector,” Problemy Peredachi Inform., vol. 23, no. 2, pp. 9–16, Oct. 1987.
T. B. Berrett, R. J. Samworth, and M. Yuan, “Efficient multivariate entropy estimation via k-nearest neighbour distances,” Ann. Statist., vol. 47, no. 1, pp. 288–318, 2019, doi: 10.1214/18-AOS1688.
A. Röllin, “Kolmogorov bounds for the normal approximation of the number of triangles in the Erdős–Rényi random graph,” Probab. Eng. Inf. Sci., vol. 36, no. 3, pp. 747–773, 2022.
N. Privault and G. Serafin, “Normal approximation for sums of weighted U-statistics—Application to Kolmogorov bounds in random subgraph counting,” Bernoulli, vol. 26, no. 1, pp. 587–615, 2020, doi: 10.3150/19-BEJ1141.
Z.-S. Zhang, “Berry–Esseen bounds for generalized U-statistics,” Electron. J. Probab., vol. 27, pp. 1–36, Jan. 2022, doi: 10.1214/22-EJP860.