There is a marvelous paper in PLOS Biology about p-hacking:
P-hacking is changing the analysis until it yields a significant p-value; the paper calls it "selective reporting" or "inflation bias" and defines it very well:
It [p-hacking] occurs when researchers try out several statistical analyses and/or data eligibility specifications and then selectively report those that produce significant results [12–15]. [citations from original paper:
- 12. Brodeur A, Le M, Sangnier M, Zylberberg Y (2012) Star Wars: The empirics strike back. Paris School of Economics Working Paper 2012. http://ssrn.com/abstract=2089580.
- 13. Cumming G (2014) The new statistics: Why and how. Psychol Sci 25: 7–29. doi: 10.1177/0956797613504966. pmid:24220629
- 14. Simmons JP, Nelson LD, Simonsohn U (2011) False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci 22: 1359–1366. doi: 10.1177/0956797611417632. pmid:22006061
- 15. Gadbury GL, Allison DB (2014) Inappropriate fiddling with statistical analyses to obtain a desirable p-value: Tests to detect its presence in published literature. PLoS ONE 7: e46363. doi: 10.1371/journal.pone.0046363]
What I really, really liked was the list of all the ways people commit the sins of p-hacking. It reminds me of the prayers said at Rosh Hashanah and Yom Kippur, when one asks forgiveness for the sins one has committed during the year. For the statistical sin I have committed by:
- conducting analyses midway through experiments to decide whether to continue collecting data [15,16];
- recording many response variables and deciding which to report postanalysis [16,17];
- deciding whether to include or drop outliers postanalysis;
- excluding, combining, or splitting treatment groups postanalysis;
- including or excluding covariates postanalysis;
- stopping data exploration if an analysis yields a significant p-value [18,19].
[see original paper for all the citations]
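The first sin on that list, peeking at the data midway to decide whether to keep collecting, is easy to demonstrate by simulation. Here is a minimal sketch (my own illustration, not from the paper; all names, sample sizes, and thresholds are assumptions) showing that "optional stopping" inflates the false-positive rate even when the true effect is exactly zero:

```python
import math
import random
import statistics

def t_test_p(sample):
    """Two-sided one-sample test of mean 0, using a normal
    approximation to the t distribution (fine for n >= 20)."""
    n = len(sample)
    t = statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

def one_experiment(peek, n_start=20, n_max=100, step=10, alpha=0.05):
    """Simulate one null experiment (true effect = 0).
    peek=False: collect all n_max observations, test once.
    peek=True: test after every `step` new observations and stop
    as soon as p < alpha -- p-hacking via optional stopping."""
    data = [random.gauss(0, 1) for _ in range(n_start)]
    if not peek:
        data += [random.gauss(0, 1) for _ in range(n_max - n_start)]
        return t_test_p(data) < alpha
    while True:
        if t_test_p(data) < alpha:
            return True   # "significant" -- stop and report
        if len(data) >= n_max:
            return False  # gave up at the sample-size cap
        data += [random.gauss(0, 1) for _ in range(step)]

random.seed(1)
trials = 2000
honest = sum(one_experiment(peek=False) for _ in range(trials)) / trials
hacked = sum(one_experiment(peek=True) for _ in range(trials)) / trials
print(f"false-positive rate, fixed n:  {honest:.3f}")  # close to alpha
print(f"false-positive rate, peeking:  {hacked:.3f}")  # well above alpha
```

Each interim look is another chance for noise to cross the 0.05 line, so the honest experimenter holds the nominal error rate while the peeking one roughly doubles or triples it. This is why preregistered stopping rules (or sequential designs with corrected thresholds) matter.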
As someone originally trained in statistics and data analysis (before biology), I think the movement to reconsider how we use statistics and NHST (null hypothesis significance testing), and to pay more attention to effect sizes, is absolutely critical to Doing Good Science.
My lab group talked about these sins, er, problems today during journal club. I think there's a lot to be said, but each deserves its own post.