Anyone who does animal-based research, or heck, human-based research, has measurement error as a close and intimate companion. Variation is a fact of life. [Aside, one of the best organized volumes to help you think about variation is this book by Hall and Hallgrimsson, Variation: A Central Concept in Biology. You can download the PDFs. I am sorry that it is Elsevier. But also buy a cheap version here. The first chapter explains why variation is an important concept in all of biology, and why it is not just "noise".]
Amid the brouhaha about reproducibility in the social sciences, one voice of reason that I follow is Andrew Gelman, who often has interesting & informative things to say about data analysis that actually help me in my work. Eric Loken (whom I do not know, or know of) and Gelman had an article in Science titled "Measurement error and the replication crisis".
I was drawn to it because of the sub-title: "The assumption that measurement error always reduces effect sizes is false." I think effect size is one of the most neglected concepts in biomedical/neuroscience/evolutionary studies. When I got asked to give an "honorary" (hahah) talk at one of my home societies, I did not talk about the exact science of what I do. They are clinical. I am animal models, and I by and large get neither traction nor respect for my work from those folks. So I talked about significance and the difference between clinical significance and statistical significance and what the heck effect size means in general. When the editor in chief of the society's journal did stand up and say "I don't care, I won't publish without p-values", I knew I had actually scored and said something reasonable.
Back to Loken & Gelman. Measurement error is often used as the (pick your socially insensitive alternative metaphor, if you wish) scapegoat of studies with large variation amongst subjects. One of the ideas they take on in this article is the common belief that if you can find an effect size of meaning (there are ways to measure this), and your data are noisy, that means that the effect size is probably better, stronger and more apt to be "true", because you are seeing the effect size despite the noise.
Yet, L&G are saying that no, this is not the case. In fact, noisy data can make the effect size seem bigger (better?) than might be "true". I keep putting true in quotes, and using it as shorthand for what we want to know (and what is it that we want to know?). Do not bust my chops over this word. Here. Later, maybe.
They are explicit as to why this bias exists:
It seems intuitive that producing a result under challenging circumstances makes it all the more impressive.
They have two reasons that this is not valid.
The first is what they term researcher degrees of freedom. These are the decisions that any scientist makes from design through final analysis. All of these choices, given our human nature, tend to bias us towards significance. The second is that statistical significance is not necessarily a good indicator of the effect, especially if the sample size is small. The article contains citations to studies supporting these points and Gelman blogged about it here. Part of their argument is that in small samples with noise, the standard errors will be higher, so only estimates that happen to come out large will reach statistical significance, and the effects that do get reported will be magnified. When I first read through these, it was confusing. I will come back to this, and try and explain better, in a subsequent post.
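The second point is easier to see with a toy simulation. What follows is only a sketch under my own assumptions, not anything taken from the article: a small true effect of 0.1, samples of 20, a one-sample t-test, and two made-up noise levels (0.2 and 2.0). The function names are hypothetical. It keeps only the "significant" results and asks how much bigger they look than the true effect.

```python
import math
import random
import statistics

TRUE_EFFECT = 0.1    # assumed small true effect (arbitrary units)
SAMPLE_SIZE = 20     # assumed small sample
T_CRIT_DF19 = 2.093  # two-sided 5% critical value of t with 19 degrees of freedom


def significant_estimates(noise_sd, n_sims=5000, seed=1):
    """Simulate many small studies; return only the estimates that reach p < .05."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_sims):
        sample = [rng.gauss(TRUE_EFFECT, noise_sd) for _ in range(SAMPLE_SIZE)]
        estimate = statistics.fmean(sample)
        std_error = statistics.stdev(sample) / math.sqrt(SAMPLE_SIZE)
        # One-sample t-test against zero: significant if |t| exceeds the critical value.
        if abs(estimate) > T_CRIT_DF19 * std_error:
            kept.append(estimate)
    return kept


def exaggeration(estimates):
    """Average size of the significant estimates relative to the true effect."""
    return statistics.fmean([abs(e) for e in estimates]) / TRUE_EFFECT


quiet = significant_estimates(noise_sd=0.2)  # precise measurements
noisy = significant_estimates(noise_sd=2.0)  # noisy measurements

exaggeration_quiet = exaggeration(quiet)
exaggeration_noisy = exaggeration(noisy)

print(f"quiet data: {len(quiet)} significant, exaggeration x{exaggeration_quiet:.1f}")
print(f"noisy data: {len(noisy)} significant, exaggeration x{exaggeration_noisy:.1f}")
```

With the noisy setting, an estimate has to be several times the true effect just to clear the significance threshold, so every significant result overstates the effect. Some of them also have the wrong sign, which the absolute value here hides.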
Part of what is important, IMO, is advice on what to do to avoid this. Some parts of noise are unavoidable, so what can be done to offset the issue?
One of the papers cited, Simmons, Nelson & Simonsohn, had actual concrete steps for authors. Here are their "Requirements for authors":
1. Authors must decide the rule for terminating data collection before data collection begins and report this rule in the article.
This is a level of self-discipline that is hard to add to one's work. For me, sometimes, it feels unrealistic. I'm still figuring out exactly what it is that can be measured. I've got ideas and plans and designs going into it, but quite often things fail, and fail spectacularly, both in physical disaster as well as intellectual/data/design disaster. Still, it is worth trying to do, and worth adding to the work.
2. Authors must collect at least 20 observations per cell or else provide a compelling cost-of-data-collection justification.
Now we get back to my diatribes on what your unit of analysis is. Is it the three bunnies you watch, the 10 trials per bunny, or the 40 hops per trial per bunny?
3. Authors must list all variables collected in a study.
This one is easy. Why aren't we doing it? Because we sometimes don't know what can be extracted from the data. If you've filmed your bunny hopping, there is a lot more than frequency of footfall that you could measure.
4. Authors must report all experimental conditions, including failed manipulations.
Again easy. Again why not? Hmmm..
5. If observations are eliminated, authors must also report what the statistical results are if those observations are included.
There is part of me that says: you have to be told to do this? Nu?
6. If an analysis includes a covariate, authors must report the statistical results of the analysis without the covariate.
Yes. But this is one we often don't think about, because that covariate often makes the results stronger. Why report lesser results? Yet, there is a good point here.
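To see why requirement 1 (deciding the stopping rule before collecting data) matters, here is a minimal sketch under my own assumptions; it is not code from the paper. There is no true effect at all, the noise standard deviation is taken as known (so a simple z-test will do), and the hypothetical researcher peeks at the data every 5 observations from n = 10 up to n = 50, stopping the first time p < .05.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(sample, sd=1.0):
    """Two-sided p-value for 'mean is zero', assuming the noise sd is known."""
    n = len(sample)
    z = (sum(sample) / n) / (sd / math.sqrt(n))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rates(n_sims=4000, max_n=50, first_look=10, step=5,
                         alpha=0.05, seed=2):
    """Compare a fixed stopping rule with peeking, when the true effect is zero."""
    rng = random.Random(seed)
    looks = range(first_look, max_n + 1, step)  # sample sizes at which we peek
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0.0, 1.0) for _ in range(max_n)]
        # Fixed rule: test once, after all max_n observations are in.
        if z_test_p_value(data) < alpha:
            fixed_hits += 1
        # Peeking: declare success the first time any interim look is significant.
        if any(z_test_p_value(data[:n]) < alpha for n in looks):
            peeking_hits += 1
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed_rate, peeking_rate = false_positive_rates()
print(f"false positives, fixed n:  {fixed_rate:.3f}")
print(f"false positives, peeking:  {peeking_rate:.3f}")
```

The fixed rule should land near the nominal 5%, while peeking comes out higher, because every extra look is another chance for noise alone to cross the threshold.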
I think this stuff is very important, and wanted to add more, but this has already gotten quite long. So, plow through, and I will try and pick up on this again. Soon. Real Soon.