Mahalanobis distance; Median absolute deviation; Minimum covariance determinant; Outliers; Preregistration; Robust detection; Social Psychology
Abstract :
[en] Researchers often lack knowledge about how to deal with outliers when analyzing their data. Even more frequently, researchers do not pre-specify how they plan to manage outliers. In this paper we aim to improve research practices by outlining what you need to know about outliers. We start by providing a functional definition of outliers. We then lay down an appropriate nomenclature/classification of outliers. This nomenclature is used to understand what kinds of outliers can be encountered and serves as a guideline to make appropriate decisions regarding the conservation, deletion, or recoding of outliers. These decisions might impact the validity of statistical inferences as well as the reproducibility of our experiments. To be able to make informed decisions about outliers you first need proper detection tools. We remind readers why the most common outlier detection methods are problematic and recommend the use of the median absolute deviation to detect univariate outliers, and of the Mahalanobis-MCD distance to detect multivariate outliers. An R package was created that can be used to easily perform these detection tests. Finally, we promote the use of pre-registration to avoid flexibility in data analysis when handling outliers.
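As an illustration of the univariate recommendation in the abstract, the following is a minimal Python sketch of outlier flagging based on the median absolute deviation. It is not the R package mentioned above. The function name `mad_outliers`, the cutoff of 3, and the scaling constant 1.4826 (which makes the MAD comparable to a standard deviation under normality) are assumptions chosen for the example; the abstract itself does not fix these values.

```python
from statistics import median


def mad_outliers(values, threshold=3.0, scale=1.4826):
    """Flag values whose distance from the median exceeds threshold * MAD.

    `threshold` and `scale` are illustrative defaults (assumptions),
    not values prescribed by the abstract.
    """
    center = median(values)
    # Scaled median of the absolute deviations around the median.
    mad = scale * median([abs(v - center) for v in values])
    if mad == 0:
        # Design choice for this sketch: with zero spread, flag nothing.
        return [False] * len(values)
    return [abs(v - center) / mad > threshold for v in values]


data = [1, 3, 3, 6, 8, 10, 10, 1000]
flags = mad_outliers(data)
print([v for v, is_out in zip(data, flags) if is_out])
```

In this toy data set the median is 7 and the scaled MAD is about 5.19, so only the value 1000 lies more than three MADs from the median. In keeping with the abstract's emphasis on pre-registration, the cutoff would be fixed before the data are inspected.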
Leys, Christophe ; Université Libre de Bruxelles, Service of Analysis of the Data (SAD), Bruxelles, Belgium
Delacre, Marie; Université Libre de Bruxelles, Service of Analysis of the Data (SAD), Bruxelles, Belgium
Mora, Youri L.; Université Libre de Bruxelles, Service of Analysis of the Data (SAD), Bruxelles, Belgium
Lakens, Daniël; Eindhoven University of Technology, Human Technology Interaction Group, Eindhoven, Netherlands
LEY, Christophe ; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Mathematics (DMATH) ; Universiteit Gent, Department of Applied Mathematics, Computer Science and Statistics, Gent, Belgium
External co-authors :
yes
Language :
English
Title :
How to classify, detect, and manage univariate and multivariate outliers, with emphasis on pre-registration
Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ: Lawrence Erlbaum Associates.
Aguinis, H., Gottfredson, R. K., & Joo, H. (2013). Best-practice recommendations for defining, identifying, and handling outliers. Organizational Research Methods, 16(2), 270–301. DOI: https://doi.org/10.1177/1094428112470848
Antonovsky, A. (1987). Unraveling the mystery of health. How people manage stress and stay well. San Francisco: Jossey-Bass Publishers.
Bakker, M., & Wicherts, J. M. (2014). Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: The power of alternatives and recommendations. Psychological Methods, 19(3), 409–427. DOI: https://doi.org/10.1037/met0000014
Chang, C. L., McAleer, M., & Wong, W. K. (2018). Big data, computational science, economics, finance, marketing, management, and psychology: Connections. Journal of Risk and Financial Management, 11(1), 15. DOI: https://doi.org/10.3390/jrfm11010015
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cousineau, D., & Chartier, S. (2010). Outliers detection and treatment: A review. International Journal of Psychological Research, 3(1), 58–67. DOI: https://doi.org/10.21500/20112084.844
Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974). The Hopkins Symptom Checklist (HSCL): A self-report symptom inventory. Behavioral Science, 19(1), 1–15. DOI: https://doi.org/10.1002/bs.3830190102
Donoho, D. L., & Huber, P. J. (1983). The notion of breakdown point. In P. J. Bickel, K. Doksum, & J. L. Hodges (Eds.), A Festschrift for Erich L. Lehmann. California: Wadsworth.
Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. New York: Chapman & Hall. DOI: https://doi.org/10.1007/978-1-4899-4541-9
Hall, P. (1986). On the bootstrap and confidence intervals. The Annals of Statistics, 14(4). DOI: https://doi.org/10.1214/aos/1176350168
Howell, D. (1997). Statistical methods for psychology. Boston, Massachusetts: Duxbury Press.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. DOI: https://doi.org/10.1177/0956797611430953
Klein, O., Hardwicke, T. E., Aust, F., Breuer, J., Danielsson, H., Mohr, A. H., Frank, M. C., et al. (2018). A practical guide for transparency in psychological science. Collabra: Psychology, 4(1), 20. DOI: https://doi.org/10.1525/collabra.158
Kline, R. B. (2015). Principles and practice of structural equation modeling. London: Guilford Publications.
Leys, C., Klein, O., Dominicy, Y., & Ley, C. (2018). Detecting multivariate outliers: Use a robust variant of the Mahalanobis distance. Journal of Experimental Social Psychology, 74, 150–156. DOI: https://doi.org/10.1016/j.jesp.2017.09.011
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4), 764–766. DOI: https://doi.org/10.1016/j.jesp.2013.03.013
Leys, C., & Schumann, S. (2010). A nonparametric method to analyze interactions: The adjusted rank transform test. Journal of Experimental Social Psychology, 46(4), 684–688. DOI: https://doi.org/10.1016/j.jesp.2010.02.007
Mahalanobis, P. C. (1930). On tests and measures of groups divergence, theoretical formulae. International Journal of the Asiatic Society of Bengal, 26, 541–588.
McClelland, G. H. (2000). Nasty data: Unruly, ill-mannered observations can ruin your analysis. In H. T. Reis, & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 393–411). New York, NY: Cambridge University Press.
McGuire, W. J. (1997). Creative hypothesis generating in psychology: Some useful heuristics. Annual Review of Psychology, 48, 1–30. DOI: https://doi.org/10.1146/annurev.psych.48.1.1
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. DOI: https://doi.org/10.1073/pnas.1708274114
Saltelli, A., Chan, K., & Scott, E. M. (2000). Sensitivity analysis (Vol. 1). New York: Wiley.
Sheskin, D. J. (2004). Handbook of parametric and nonparametric statistical procedures. Boca Raton, FL: CRC Press. DOI: https://doi.org/10.4324/9780203489536
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. DOI: https://doi.org/10.1177/0956797611417632
Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712. DOI: https://doi.org/10.1177/1745691616658637
Tabachnick, B. G., & Fidell, L. S. (2013). Using multivariate statistics (6th ed.). Boston: Pearson.
Tukey, J. W., & McLaughlin, D. H. (1963). Less vulnerable confidence and significance procedures for location based on a single sample: Trimming/winsorization 1. Sankhyā: The Indian Journal of Statistics, Series A, 25(3), 331–352.
van’t Veer, A. E., & Giner-Sorolla, R. (2016). Pre-registration in social psychology—A discussion and suggested template. Journal of Experimental Social Psychology, 67, 2–12. DOI: https://doi.org/10.1016/j.jesp.2016.03.004
Wicherts, J. M., Veldkamp, C. L., Augusteijn, H. E., Bakker, M., Van Aert, R., & Van Assen, M. A. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7(1832), 1–12. DOI: https://doi.org/10.3389/fpsyg.2016.01832
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122. DOI: https://doi.org/10.1177/1745691617693393