Directional statistics; High-dimensional data; Location tests; Principal component analysis; Rotationally symmetric distributions; Spherical mean; Statistics and Probability; Numerical Analysis; Statistics, Probability and Uncertainty
Abstract :
This paper mainly focuses on one of the most classical testing problems in directional statistics, namely the spherical location problem, which consists of testing the null hypothesis H0: θ = θ0 under which the (rotational) symmetry center θ is equal to a given value θ0. The most classical procedure for this problem is the so-called Watson test, which is based on the sample mean of the observations. This test enjoys many desirable properties, but its asymptotic theory requires the sample size n to be large compared to the dimension p. This is a severe limitation, since more and more problems nowadays involve high-dimensional directional data (e.g., in genetics or text mining). In the present work, we derive the asymptotic null distribution of the Watson statistic as both n and p go to infinity. This reveals that (i) the Watson test is robust against high dimensionality, and that (ii) it allows for (n, p)-asymptotic results that are universal, in the sense that p may go to infinity arbitrarily fast (or slowly) as a function of n. Turning to Euclidean data, we show that our results also lead to a test for the null hypothesis that the covariance matrix of a high-dimensional multinormal distribution has a "θ0-spiked" structure. Finally, Monte Carlo studies corroborate our asymptotic results and briefly explore non-null rejection frequencies.
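The abstract describes the Watson test as being based on the sample mean. The Python below is a minimal sketch under stated assumptions, not the authors' code. It assumes the textbook form of the Watson statistic, W_n = n(p − 1) X̄'(I_p − θ0θ0')X̄ / (1 − n⁻¹ Σ_i (X_i'θ0)²), which is asymptotically χ²_{p−1} for fixed p. It also assumes the high-dimensional version is standardized as (W_n − (p − 1)) / √(2(p − 1)), the centring and scaling of a χ²_{p−1} variable, and compared with N(0, 1) quantiles. The exact standardization should be checked against the paper's main theorem. All function names, the projected-normal sampler, and the simulation settings are hypothetical illustrations.

```python
import math
import random
from statistics import NormalDist


def watson_statistic(xs, theta0):
    """Watson statistic for H0: theta = theta0 (assumed textbook form).

    xs     -- list of n unit vectors in R^p (each a list of floats)
    theta0 -- hypothesised unit location vector (list of p floats)
    """
    n, p = len(xs), len(theta0)
    # Sample mean of the observations.
    mean = [sum(x[k] for x in xs) / n for k in range(p)]
    # Component of the mean orthogonal to theta0: (I_p - theta0 theta0') mean.
    c = sum(m * t for m, t in zip(mean, theta0))
    proj = [m - c * t for m, t in zip(mean, theta0)]
    num = sum(v * v for v in proj)
    # Scale estimate: 1 - (1/n) * sum_i (x_i' theta0)^2.
    den = 1.0 - sum(sum(a * t for a, t in zip(x, theta0)) ** 2 for x in xs) / n
    return n * (p - 1) * num / den


def standardized_watson(xs, theta0):
    """Centre and scale W_n like a chi-square variable with p - 1 degrees of freedom."""
    q = len(theta0) - 1
    return (watson_statistic(xs, theta0) - q) / math.sqrt(2 * q)


def projected_normal(p, mu, rng):
    """One draw of Z/||Z|| with Z ~ N(mu * e_1, I_p).

    This law is rotationally symmetric about e_1, so H0 holds for theta0 = e_1.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(p)]
    z[0] += mu
    r = math.sqrt(sum(v * v for v in z))
    return [v / r for v in z]


def null_rejection_rate(n, p, reps=300, mu=2.0, seed=0):
    """Monte Carlo size of the test rejecting when the standardized statistic
    exceeds the upper 5% quantile of N(0, 1)."""
    rng = random.Random(seed)
    theta0 = [1.0] + [0.0] * (p - 1)  # e_1
    crit = NormalDist().inv_cdf(0.95)
    rejections = 0
    for _ in range(reps):
        xs = [projected_normal(p, mu, rng) for _ in range(n)]
        if standardized_watson(xs, theta0) > crit:
            rejections += 1
    return rejections / reps
```

For instance, `null_rejection_rate(n=40, p=100)` puts p well above n, where the classical fixed-p χ²_{p−1} theory gives no guarantee. If the high-dimensional approximation described in the abstract is accurate at these sizes, one would expect a value in the vicinity of 0.05. The projected normal is used only because it is easy to sample. A von Mises-Fisher sampler would be a more conventional choice of rotationally symmetric null.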
Disciplines :
Mathematics
Author, co-author :
Ley, Christophe; University of Luxembourg, Faculty of Science, Technology and Medicine (FSTM), Department of Mathematics (DMATH); Département de Mathématique and ECARES, Université Libre de Bruxelles, Bruxelles, Belgium
Paindaveine, Davy; Département de Mathématique and ECARES, Université Libre de Bruxelles, Bruxelles, Belgium
Verdebout, Thomas; EQUIPPE and INRIA, Université Lille III, Domaine Universitaire du Pont de Bois, Villeneuve d'Ascq Cedex, France
External co-authors :
yes
Language :
English
Title :
High-dimensional tests for spherical location and spiked covariance
Funders :
Fonds National de la Recherche Scientifique, Communauté Française de Belgique; Communauté Française de Belgique; Belgian government
Funding text :
Christophe Ley thanks the Fonds National de la Recherche Scientifique, Communauté Française de Belgique, for support via a Mandat de Chargé de Recherche. Davy Paindaveine’s research is supported by an A.R.C. contract from the Communauté Française de Belgique and by the IAP research network grant no. P7/06 of the Belgian government (Belgian Science Policy). All three authors would like to thank the Associate Editor and an anonymous referee for their comments that led to a significant improvement of the paper.
References :
Banerjee A., Dhillon I., Ghosh J., Sra S. Generative model-based clustering of directional data. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2003, 19-28.
Banerjee A., Dhillon I., Ghosh J., Sra S. Clustering on the unit hypersphere using von Mises-Fisher distributions. J. Mach. Learn. Res. 2005, 6:1345-1382.
Banerjee A., Ghosh J. Frequency sensitive competitive learning for clustering on high-dimensional hyperspheres. Proceedings International Joint Conference on Neural Networks 2002, 1590-1595.
Banerjee A., Ghosh J. Frequency sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres. IEEE Trans. Neural Netw. 2004, 15:702-719.
Baricz A., Ponnusamy S. On Turán type inequalities for modified Bessel functions. Proc. Amer. Math. Soc. 2013, 141:523-532.
Billingsley P. Probability and Measure 1995, Wiley, New York, Chichester. third ed.
Cai T., Fan J., Jiang T. Distributions of angles in random packing on spheres. J. Mach. Learn. Res. 2013, 14:1837-1864.
Cai T., Jiang T. Phase transition in limiting distributions of coherence of high-dimensional random matrices. J. Multivariate Anal. 2012, 107:24-39.
Chen S., Qin Y. A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 2010, 38:808-835.
Chen S.X., Zhang L.-X., Zhong P.-S. Tests for high-dimensional covariance matrices. J. Amer. Statist. Assoc. 2010, 105:810-819.
Dryden I.L. Statistical analysis on high-dimensional spheres and shape spaces. Ann. Statist. 2005, 33:1643-1665.
Heyde C.C., Brown B.M. On the departure from normality of a certain class of martingales. Ann. Math. Stat. 1970, 41:2161-2165.
Jiang T., Yang F. Central limit theorems for classical likelihood ratio tests for high-dimensional normal distributions. Ann. Statist. 2013, 41:2029-2074.
Johnstone I.M. On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 2001, 29:295-327.
Ledoit O., Wolf M. Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Ann. Statist. 2002, 30:1081-1102.
Li J., Chen S.X. Two sample tests for high-dimensional covariance matrices. Ann. Statist. 2012, 40:908-940.
Onatski A., Moreira M., Hallin M. Asymptotic power of sphericity tests for high-dimensional data. Ann. Statist. 2013, 41:1204-1231.
Paindaveine D., Verdebout T. On high-dimensional sign tests. Bernoulli, in press.
Paindaveine D., Verdebout T. Optimal rank-based tests for the location parameter of a rotationally symmetric distribution on the hypersphere. Mathematical Statistics and Limit Theorems: Festschrift in Honor of Paul Deheuvels 2015, 243-264. Springer. M. Hallin, D. Mason, D. Pfeifer, J. Steinebach (Eds.).
Saw J.G. A family of distributions on the m-sphere and some hypothesis tests. Biometrika 1978, 65:69-73.
Schott J. Some high-dimensional tests for a one-way MANOVA. J. Multivariate Anal. 2007, 98:1825-1839.
Srivastava M.S., Fujikoshi Y. Multivariate analysis of variance with fewer observations than the dimension. J. Multivariate Anal. 2006, 97:1927-1940.
Srivastava M.S., Katayama S., Kano Y. A two sample test in high dimensional data. J. Multivariate Anal. 2013, 114:349-358.
Srivastava M.S., Kubokawa T. Tests for multivariate analysis of variance in high dimension under non-normality. J. Multivariate Anal. 2013, 115:204-216.
Stam A.J. Limit theorems for uniform distributions on spheres in high-dimensional Euclidean spaces. J. Appl. Probab. 1982, 19:221-228.
Watson G.N. A Treatise on the Theory of Bessel Functions 1944, Cambridge University Press. second ed.
Watson G.S. Limit theorems on high-dimensional spheres and Stiefel manifolds. Studies in Econometrics, Time Series, and Multivariate Statistics 1983, 559-570. Academic Press, New York. S. Karlin, T. Amemiya, L.A. Goodman (Eds.).
Watson G.S. Statistics on Spheres 1983, Wiley, New York.
Watson G.S. The Langevin distribution on high dimensional spheres. J. Appl. Stat. 1988, 15:123-130.