References

American Psychological Association. 2001. Publication Manual of the American Psychological Association. 5th Edition.

Baguley, Thom. 2004. “Understanding statistical power in the context of applied research.” Applied Ergonomics 35 (2): 73–80. https://doi.org/10.1016/j.apergo.2004.01.002.

———. 2009. “Standardized or simple effect size: what should be reported?” British Journal of Psychology 100 (3): 603–17. https://doi.org/10.1348/000712608X377117.

Bakeman, Roger. 2005. “Recommended Effect Size Statistics for Repeated Measures Designs.” Behavior Research Methods.

Cockburn, Andy, Carl Gutwin, and Alan Dix. 2018. “HARK No More: On the Preregistration of CHI Experiments.” ACM.

Cohen, Jacob. 1977. “The t Test for Means.” In Statistical Power Analysis for the Behavioral Sciences, Revised Ed, 19–74. Academic Press. https://doi.org/10.1016/B978-0-12-179060-8.50007-4.

———. 1988. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum Associates.

———. 1994. “The Earth Is Round (P<.05).” American Psychologist 49 (12). American Psychological Association: 997. http://ist-socrates.berkeley.edu/~maccoun/PP279_Cohen1.pdf.

Cumming, Geoff. 2013. Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. Routledge.

———. 2014. “The New Statistics: Why and How.” Psychological Science 25 (1): 7–29. https://doi.org/10.1177/0956797613504966.

Cummings, Peter. 2011. “Arguments for and Against Standardized Mean Differences (Effect Sizes).” Archives of Pediatrics & Adolescent Medicine 165 (7): 592. https://doi.org/10.1001/archpediatrics.2011.97.

Dixon, Peter. 2003. “The P-Value Fallacy and How to Avoid It.” Canadian Journal of Experimental Psychology/Revue Canadienne de Psychologie Expérimentale 57 (3). Canadian Psychological Association: 189. https://www.ncbi.nlm.nih.gov/pubmed/14596477.

Dragicevic, Pierre. 2016. “Fair Statistical Communication in HCI.” In Modern Statistical Methods for HCI, 291–330. Springer. https://hal.inria.fr/hal-01377894/document.

Earp, Brian D, and David Trafimow. 2015. “Replication, Falsification, and the Crisis of Confidence in Social Psychology.” Frontiers in Psychology 6. Frontiers Media SA. https://www.frontiersin.org/articles/10.3389/fpsyg.2015.00621/full.

Ehrenberg, ASC. 1977. “Rudiments of Numeracy.” Journal of the Royal Statistical Society. Series A (General). JSTOR, 277–97. http://www1.maths.leeds.ac.uk/~sta6ajb/math1910/p4.pdf.

Fisher, Ronald. 1955. “Statistical Methods and Scientific Induction.” Journal of the Royal Statistical Society. Series B (Methodological). JSTOR, 69–78. http://www.ssnpstudents.com/wp/wp-content/uploads/2015/02/Fisher-1955.pdf.

Gelman, Andrew. 2017. “Ethics and Statistics: Honesty and Transparency Are Not Enough.” Chance 30 (1). Taylor & Francis: 37–39. http://www.stat.columbia.edu/~gelman/research/published/ChanceEthics14.pdf.

Gelman, Andrew, and Eric Loken. 2013. “The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is No ‘Fishing Expedition’ or ‘P-Hacking’ and the Research Hypothesis Was Posited Ahead of Time.” Department of Statistics, Columbia University. http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf.

Gelman, Andrew, Cristian Pasarica, and Rahul Dodhia. 2002. “Let’s Practice What We Preach: Turning Tables into Graphs.” The American Statistician 56 (2). Taylor & Francis: 121–30. https://pdfs.semanticscholar.org/202c/fec06a87fc96d3d56b6ad2ba4237b3fde141.pdf.

Gigerenzer, Gerd. 2004. “Mindless Statistics.” The Journal of Socio-Economics 33 (5). Elsevier: 587–606. http://pubman.mpdl.mpg.de/pubman/item/escidoc:2101336/component/escidoc:2101335/GG_Mindless_2004.pdf.

Gigerenzer, Gerd, and Julian N Marewski. 2015. “Surrogate Science: The Idol of a Universal Method for Scientific Inference.” Journal of Management 41 (2). Sage Publications: 421–40. http://www.dcscience.net/Gigerenzer-Journal-of-Management-2015.pdf.

Giner-Sorolla, Roger. 2012. “Science or Art? How Aesthetic Standards Grease the Way Through the Publication Bottleneck but Undermine Science.” Perspectives on Psychological Science 7 (6). Sage Publications: 562–71. http://journals.sagepub.com/doi/full/10.1177/1745691612457576.

Ioannidis, John PA. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8). Public Library of Science: e124. http://robotics.cs.tamu.edu/RSS2015NegativeResults/pmed.0020124.pdf.

Kampenes, Vigdis By, Tore Dybå, Jo E. Hannay, and Dag I.K. Sjøberg. 2007. “A Systematic Review of Effect Size in Software Engineering Experiments.” Information and Software Technology 49 (11): 1073–86. https://doi.org/10.1016/j.infsof.2007.02.015.

Kaptein, Maurits, and Judy Robertson. 2012. “Rethinking Statistical Analysis Methods for CHI.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1105–14. ACM. http://judyrobertson.typepad.com/files/chi2012_submission_final.pdf.

Kastellec, Jonathan P, and Eduardo L Leoni. 2007. “Using Graphs Instead of Tables in Political Science.” Perspectives on Politics 5 (4). Cambridge Univ Press: 755–71.

Kay, Matthew, Gregory L Nelson, and Eric B Hekler. 2016. “Researcher-Centered Design of Statistics: Why Bayesian Statistics Better Fit the Culture and Incentives of HCI.” In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 4521–32. ACM. http://www.mjskay.com/papers/chi_2016_bayes.pdf.

Kerr, Norbert L. 1998. “HARKing: Hypothesizing After the Results Are Known.” Personality and Social Psychology Review 2 (3). Sage Publications: 196–217. http://www.socialrelationslab.com/uploads/1/8/9/6/18966149/harkingkerr1998.pdf.

Kirby, Kris N, and Daniel Gerlanc. 2013. “BootES: An R Package for Bootstrap Confidence Intervals on Effect Sizes.” Behavior Research Methods 45 (4). Springer: 905–27. http://web.williams.edu/Psychology/Faculty/Kirby/bootes-kirby-gerlanc-in-press.pdf.

Kruschke, John K, and Torrin M Liddell. 2017. “The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective.” Psychonomic Bulletin & Review. Springer, 1–29. https://osf.io/ksfyr/download?format=pdf.

Lenth, Russell V. 2001. “Some practical guidelines for effective sample size determination.” The American Statistician 55 (3): 187–93. https://doi.org/10.1198/000313001317098149.

Loftus, Geoffrey R. 1993. “A Picture Is Worth a Thousand P Values: On the Irrelevance of Hypothesis Testing in the Microcomputer Age.” Behavior Research Methods, Instruments, & Computers 25 (2). Springer: 250–56. https://faculty.washington.edu/gloftus/Research/Publications/Manuscript.pdf/Loftus%20p-values%201993.pdf.

Norman, Geoff. 2010. “Likert Scales, Levels of Measurement and the ‘Laws’ of Statistics.” Advances in Health Sciences Education 15 (5). Springer: 625–32. https://pdfs.semanticscholar.org/6dc0/0756ab722370b815df1223f4044dd63841a8.pdf.

Nosek, Brian A, Charles R Ebersole, Alexander DeHaven, and David Mellor. 2017. “The Preregistration Revolution.” Open Science Framework. https://osf.io/2dxu5/download?format=pdf.

Olejnik, Stephen, and James Algina. 2003. “Generalized Eta and Omega Squared Statistics: Measures of Effect Size for Some Common Research Designs.” Psychological Methods.

Rosenthal, Robert. 1991. Meta-Analytic Procedures for Social Research. Vol. 6. Sage.

Sauro, Jeff, and James R. Lewis. 2010. “Average task times in usability tests.” Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI ’10. https://doi.org/10.1145/1753326.1753679.

Simmons, Joseph P, Leif D Nelson, and Uri Simonsohn. 2011. “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22 (11). Sage Publications: 1359–66. http://opim.wharton.upenn.edu/DPlab/papers/publishedPapers/Simmons_2011_False-Positive%20Psychology.pdf.

“Statistical Dances: Why No Statistical Analysis Is Reliable and What to Do About It.” 2017. https://tinyurl.com/gricad-dance.

Stewart-Oaten, Allan. 1995. “Rules and Judgments in Statistics: Three Examples.” Ecology 76 (6). Wiley Online Library: 2001–9. http://onlinelibrary.wiley.com/doi/10.2307/1940736/full.

Taylor, John. 1997. Introduction to Error Analysis, the Study of Uncertainties in Physical Measurements. University Science Books.

“Transparent Statistics Website.” 2017. http://transparentstatistics.org/.

Tukey, John W. 1977. “Exploratory Data Analysis.” Reading, Mass.

Wierdsma, A. 2013. “What Is Wrong with Tests of Normality?” http://tinyurl.com/normality-wrong.

Wilkinson, Leland. 1999. “Statistical Methods in Psychology Journals: Guidelines and Explanations.” American Psychologist 54 (8). American Psychological Association: 594.

Wilson, Max L, Wendy Mackay, Ed Chi, Michael Bernstein, Dan Russell, and Harold Thimbleby. 2011. “RepliCHI - CHI Should Be Replicating and Validating Results More: Discuss.” In CHI’11 Extended Abstracts on Human Factors in Computing Systems, 463–66. ACM. https://hal.inria.fr/file/index/docid/1000423/filename/RepliCHI-panel-2011.pdf.