Keywords :
circular statistics; copulas; directional statistics; finite mixtures; heavy tails; skewness; transformation approach; Statistics and Probability; Statistics, Probability and Uncertainty
Abstract :
Probability distributions are the building blocks of statistical modeling and inference. It is therefore of the utmost importance to know which distribution to use in which circumstances, as wrong choices will inevitably entail a biased analysis. In this article, we focus on circumstances involving complex data and describe the most popular flexible models for these settings. Specifically, we consider multivariate skew and heavy-tailed data, circular data, toroidal data, and cylindrical data. We illustrate the strength of flexible models on the basis of concrete examples and discuss major applications and challenges.
Disciplines :
Mathematics
Author, co-author :
Ley, Christophe; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Mathematics (DMATH) ; Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
Babić, Slađana; Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium ; Vlerick Business School, Brussels, Belgium
Craens, Domien; Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Gent, Belgium
External co-authors :
yes
Language :
English
Title :
Flexible models for complex data with applications