We investigate the problem of estimating the structure of a weighted network from repeated measurements of a Gaussian graphical model (GGM) on the network. In this vein, we consider GGMs whose covariance structures align with the geometry of the weighted network on which they are based. Such GGMs have long been of interest in statistical physics, where they are referred to as the Gaussian free field (GFF); in recent years they have also attracted considerable interest in machine learning and theoretical computer science. In this work, we propose a novel estimator of the weighted network (equivalently, of its Laplacian) from repeated measurements of a GFF on the network, based on the Fourier-analytic properties of the Gaussian distribution. In this pursuit, our approach exploits complex-valued statistics constructed from the observed data, which are of interest in their own right. We demonstrate the effectiveness of our estimator with concrete recovery guarantees and bounds on the required sample complexity. In particular, we show that the proposed statistic achieves the parametric rate of estimation for fixed network size. In the setting of networks growing with the sample size, our results show that for Erdős–Rényi random graphs G(d, p) above the connectivity threshold, network recovery takes place with high probability as soon as the sample size n satisfies n ≫ d⁴ log d · p⁻².
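The abstract only alludes to the Fourier-analytic idea, but its core ingredient — reading off the covariance of a Gaussian field from the characteristic function E exp(i⟨t, X⟩) = exp(−tᵀΣt/2) and inverting to obtain a (regularized) Laplacian — can be sketched concretely. The following is an illustrative toy, not the paper's estimator: the 3-node graph, the mass-term regularization, and all constants are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy graph on 3 nodes: Laplacian L, plus a mass term so the
# precision matrix K is invertible (a "massive" Gaussian free field).
L = np.array([[ 2.0, -1.0, -1.0],
              [-1.0,  1.5, -0.5],
              [-1.0, -0.5,  1.5]])
K = L + 0.5 * np.eye(3)          # regularized Laplacian = precision matrix
Sigma = np.linalg.inv(K)         # covariance of the Gaussian field

# Repeated measurements of the field.
n = 500_000
X = rng.multivariate_normal(np.zeros(3), Sigma, size=n)

def quad_form_est(t):
    """Estimate t' Sigma t via the empirical characteristic function:
    for centered Gaussian X, E exp(i<t, X>) = exp(-t' Sigma t / 2)."""
    phi = np.mean(np.exp(1j * X @ t))
    return -2.0 * np.log(np.abs(phi))

s, d = 0.5, 3
e = np.eye(d)
Sigma_hat = np.zeros((d, d))
for j in range(d):
    Sigma_hat[j, j] = quad_form_est(s * e[j]) / s**2
for j in range(d):
    for k in range(j + 1, d):
        # quadratic form at s*(e_j + e_k) equals s^2 (S_jj + S_kk + 2 S_jk)
        q = quad_form_est(s * (e[j] + e[k])) / s**2
        Sigma_hat[j, k] = Sigma_hat[k, j] = (q - Sigma_hat[j, j] - Sigma_hat[k, k]) / 2

K_hat = np.linalg.inv(Sigma_hat)  # recovered (regularized) Laplacian
print(np.round(K_hat, 2))
```

With half a million samples the entrywise error in Σ̂ is small enough that inverting it recovers the regularized Laplacian to within a few percent; the paper's contribution is a sharper estimator with explicit finite-sample guarantees in place of this naive plug-in inversion.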
Disciplines:
Mathematics
Author, co-author:
Ghosh, Subhro; Department of Mathematics, National University of Singapore, Singapore, Singapore
Mukherjee, Soumendu Sundar; Theoretical Statistics and Mathematics Unit, Indian Statistical Institute, Calcutta, India
Tran, Hoang-Son; Department of Mathematics, National University of Singapore, Singapore, Singapore
Gangopadhyay, Ujan; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Mathematics (DMATH); Department of Mathematics, National University of Singapore, Singapore, Singapore
External co-authors:
yes
Document language:
English
Title:
Learning Networks from Gaussian Graphical Models and Gaussian Free Fields
S.G. was supported in part by the MOE Grants R-146-000-250-133, R-146-000-312-114 and MOE-T2EP20121-0013. S.S.M. was partially supported by an INSPIRE Research Grant (DST/INSPIRE/04/2018/002193) from the Department of Science and Technology, Government of India, and a Start-Up Grant from the Indian Statistical Institute, Kolkata. H.S.T. was supported by the NUS Research Scholarship. We thank Satya Majumdar for helpful suggestions.