Social Psychology; Sociology and Political Science
Abstract :
[en] A look at the psychology literature reveals that researchers still seem to encounter difficulties in coping with multivariate outliers. Multivariate outliers can severely distort the estimation of population parameters. Detecting multivariate outliers is mainly disregarded or done by using the basic Mahalanobis distance. However, that indicator uses the multivariate sample mean and covariance matrix that are particularly sensitive to outliers. Hence, this method is problematic. We highlight the disadvantages of the basic Mahalanobis distance and argue instead in favor of a robust Mahalanobis distance. In particular, we present a variant based on the Minimum Covariance Determinant (MCD), a more robust procedure that is easy to implement. Using Monte Carlo simulations of bivariate sample distributions varying in size (ns = 20, 100, 500) and population correlation coefficient (ρ = .10, .30, .50), we demonstrate the detrimental impact of outliers on parameter estimation and show the superiority of the MCD over the Mahalanobis distance. We also make recommendations for deciding whether to include vs. exclude outliers. Finally, we provide the procedures for calculating this indicator in R and SPSS software.
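The contrast drawn in the abstract between the basic and the robust Mahalanobis distance can be illustrated with a small sketch. It is not the authors' R or SPSS procedure: it is a standard-library Python illustration on made-up bivariate data, and the function names, the subset size h = 8 out of n = 10, and the .99 chi-square cutoff are assumptions chosen for the example. The MCD is found here by exhaustive search over all subsets, which is only feasible for tiny samples, and no consistency or small-sample correction is applied.

```python
from itertools import combinations
from math import log


def mean_and_cov(points):
    """Sample mean and unbiased covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def determinant(cov):
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2


def squared_distances(points, center, cov):
    """Squared Mahalanobis distance of each point, using the 2x2 inverse."""
    sxx, sxy, syy = cov
    det = determinant(cov)
    result = []
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        result.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return result


def exact_mcd(points, h):
    """Mean and covariance of the h-subset with the smallest covariance
    determinant (brute force; an assumption suited to tiny samples only)."""
    candidates = (mean_and_cov(subset) for subset in combinations(points, h))
    return min(candidates, key=lambda mc: determinant(mc[1]))


# Hypothetical data: eight points near the line y = x, then two outliers.
data = [(1, 1.2), (2, 1.8), (3, 3.1), (4, 3.9), (5, 5.2),
        (6, 5.8), (7, 7.1), (8, 7.9), (9, 1.0), (10, 0.5)]

# .99 quantile of the chi-square distribution with 2 degrees of freedom,
# which has the closed form -2 * ln(1 - p).
cutoff = -2 * log(1 - 0.99)

# Basic distance: center and scatter estimated from all points.
classical = squared_distances(data, *mean_and_cov(data))
# Robust distance: center and scatter estimated from the MCD subset.
robust = squared_distances(data, *exact_mcd(data, h=8))

classical_flags = [i for i, d in enumerate(classical) if d > cutoff]
robust_flags = [i for i, d in enumerate(robust) if d > cutoff]

print("flagged by basic distance: ", classical_flags)
print("flagged by robust distance:", robust_flags)
```

With n = 10, a squared Mahalanobis distance based on the full-sample covariance cannot exceed (n - 1)²/n = 8.1, which is below the cutoff of about 9.21, so the basic distance flags nothing here. The MCD-based distance flags the two outlying points.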
Disciplines :
Mathematics; Social & behavioral sciences, psychology: Multidisciplinary, general & others
Author, co-author :
Leys, Christophe; Université libre de Bruxelles, Centre de Recherche en Psychologie Sociale et Interculturelle, Belgium
KLEIN, Olivier ; University of Luxembourg ; Université libre de Bruxelles, Centre de Recherche en Psychologie Sociale et Interculturelle, Belgium
Dominicy, Yves; Université libre de Bruxelles, Solvay Brussels School of Economics and Management, ECARES, Belgium
LEY, Christophe ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Mathematics (DMATH) ; Ghent University, Department of Applied Mathematics, Computer Science and Statistics, Belgium
External co-authors :
yes
Language :
English
Title :
Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance