Archive for the 'data analysis' category

Significance, p-values, and what to publish

Feb 26 2015 Published by under data analysis, ng, publishing, statistics, Uncategorized

I gave a Big Talk once on the difference between statistical significance and clinical significance. It was poorly received by the editor of the society's journal. He told me that now he was going to be inundated with papers that say "Dr. Theron sez I don't need no stinkin' numbers". Actually he didn't quote The Treasure of the Sierra Madre, but the implication was there.

My simple point was that effect size is as important as significance. I've talked about this some before. Something can be statistically significant (because of pseudo-replication, oversampling, etc.) while the effect size is trivial, or smaller than your resolution of measurement. For a (made-up) example, say you find a difference in a variable measuring "age at disease onset" between males and females, but that difference is on the order of days, i.e., males at 60 years, 2 months and 3 days, females at 60 years, 2 months and 4 days. What does that mean in any kind of measure of effect size, other than indicating there is probably some other bias lurking in your data? This one is obvious because we have an intuitive sense of what the data mean. Other problems are often less obvious.
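
To put numbers on that kind of made-up example, here is a minimal sketch in Python (simulated data with invented parameters, not from any real study) of how a very large sample can make a one-day difference "statistically significant" while the effect size, measured here as Cohen's d, stays essentially zero.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Invented numbers: age at onset in days, means one day apart,
# roughly six months of spread, and a very large sample per group.
n = 2_000_000
males = rng.normal(loc=21979.0, scale=180.0, size=n)    # ~60 yr, 2 mo, 3 d
females = rng.normal(loc=21980.0, scale=180.0, size=n)  # one day later

t_stat, p_value = stats.ttest_ind(males, females)

# Cohen's d: mean difference scaled by the pooled standard deviation
pooled_sd = np.sqrt((males.var(ddof=1) + females.var(ddof=1)) / 2)
d = (females.mean() - males.mean()) / pooled_sd

print(f"p-value   = {p_value:.3g}")  # tiny, i.e. "significant"
print(f"Cohen's d = {d:.4f}")        # effect size is essentially zero
```

Run as written, the p-value comes out minuscule while d sits near 0.006: "significant," and meaningless.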

Now comes the journal Basic and Applied Social Psychology, which is banning significance testing. Specifically, they have banned the null hypothesis significance testing procedure (NHSTP). Steve Novella, who writes at the excellent Science-Based Medicine blog, has a great discussion. He links to a video by Geoff Cumming that is similar to what I showed in the talk I gave.

All of this is well and good. Many Good Thinking People are cheering. But most of us publish in journals that require estimates of significance. My clinical society judges abstracts (which are severely limited in number, for both talks and posters) based on some numskull criteria, which include "testable hypothesis" and "testing of hypothesis". They have not yet learned that horrible research comes in many flavors, and that one rule (must have a p-value) does not a good piece of research make.

Another person whose view I respect is Andrew Gelman, who has a good but short post on this. Here's the close of his post:

 Actually, I think standard errors, p-values, and confidence intervals can be very helpful in research when considered as convenient parts of a data analysis .... The problem comes when they’re considered as the culmination of the analysis, as if “p less than .05” represents some kind of proof of something. I do like the idea of requiring that research claims stand on their own without requiring the (often spurious) support of p-values.

The bottom line always is: understand your data. Go back to what the effect size means. What are the data saying to you? What is the story? Use statistics to support what you do, but if they become the science of your research, you are in trouble.
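
In that spirit, here is a minimal sketch (simulated, made-up numbers) of using a confidence interval the way Gelman suggests: as a convenient part of the analysis that reports the estimated difference and its uncertainty, rather than just whether p crossed .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two made-up groups of 40 measurements each
control = rng.normal(loc=10.0, scale=2.0, size=40)
treated = rng.normal(loc=11.0, scale=2.0, size=40)

# Report the estimated difference with a 95% confidence interval
diff = treated.mean() - control.mean()
se = np.sqrt(control.var(ddof=1) / control.size
             + treated.var(ddof=1) / treated.size)
dof = control.size + treated.size - 2  # pooled df; fine for a sketch
lo, hi = stats.t.interval(0.95, dof, loc=diff, scale=se)

print(f"estimated difference = {diff:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The interval carries the same information a p-value would, plus the size and direction of the effect, which is the part worth interpreting.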

3 responses so far

The Importance and Neglect of Variation - part 1

Nov 25 2014 Published by under data analysis, statistics, Uncategorized

Within science, my old thesis advisor used to say, there are A-Sciences and B-Sciences. The names came from the Uni's classification of undergraduate requirements. Other than the label "B" (as opposed to, say, "the more important science that gets neglected"), he felt this was A Good Thing. In short, A-sciences depended on rules, laws, and invariants. Chemistry. If you take hydrogen and oxygen, and know the pressure and temperature, you will know the phase state of the result. The stuff of the 17th-century Enlightenment.

On the other hand, evolution, ecology, and astronomy, he said, were statistical sciences. It wasn't a matter of a single law or paradigm; it was a matter of which outcomes occurred most often. For the evolutionary biologists out there, this is old hat. Darwin depended on the existence of variation to drive evolution. Survival of the fittest implies a more fit and a less fit organism.

Of course, B-science was neglected. That was part of its glory. A-science with its bald white men in lab coats (back then), as compared to bearded white men in plaid flannel shirts with mountain climbing boots in the middle of the city (in case a mountain did appear in the middle of the city), was the dominant paradigm. We were the subversives. Current funding trends support that perspective, fueled by a justification for improving health.

So it is no big shock that an NPR story titled "You might be surprised when you take your temperature" maintains that the magic number of 98.6°F is not so magic and that "baseline normal temperatures differ from person to person and from day to day". One of the two points in the article is a listing of temperatures that might mean something clinically (above 101.5°F is a serious fever, and multiple organ failure happens at 107°F). The other, of course, is that normal is not a set number. Variation gets its due in a backhanded way.

I find the lack of attention to variation to range between irritating and bad science. I remember, during my postdoc, a neuroscientist saying something along the lines of "I measure single neurons" and "variation is irrelevant because each neuron has an outcome". With my ill-equipped intellectual armamentarium at the time, I tried to point out that there were many single neurons, and asked how he knew that the one he measured was representative of all the neurons. And what about differences among neurons? I could not yet articulate the value of concepts of variation vs. central location to all of science, but that's where I was heading, even then. Sometimes what is important is not whether two things differ in central value (the mean, the median), but in how they vary around that value. The next post will look at that in some detail.
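
As a teaser for that next post, here is a minimal sketch (again with invented, simulated data) of two groups that share the same central value but differ in how they vary around it; a test of location and a test of spread (Levene's test, used here purely for illustration) are asking different questions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Same mean, very different spread (illustrative numbers only)
tight = rng.normal(loc=50.0, scale=1.0, size=500)
loose = rng.normal(loc=50.0, scale=10.0, size=500)

# Welch's t-test compares central location...
t_stat, t_p = stats.ttest_ind(tight, loose, equal_var=False)

# ...while Levene's test compares variability around the center.
lev_stat, lev_p = stats.levene(tight, loose)

print(f"means:    {tight.mean():.2f} vs {loose.mean():.2f}")
print(f"std devs: {tight.std(ddof=1):.2f} vs {loose.std(ddof=1):.2f}")
print(f"t-test p = {t_p:.3f}, Levene p = {lev_p:.3g}")
```

Looking only at the means would declare these groups "the same"; looking at the variation tells the more interesting story.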

Meantime, an interesting (technical) book on variation is Variation: A Central Concept in Biology (2005), Benedikt Hallgrímsson and Brian K. Hall (editors), ISBN-10: 0120887770, ISBN-13: 978-0120887774. Online ($$$) version here. Review here.

Summary: Darwin's theory of evolution by natural selection was based on the observation that there is variation between individuals within the same species. This fundamental observation is a central concept in evolutionary biology. However, variation is only rarely treated directly. It has remained peripheral to the study of mechanisms of evolutionary change. The explosion of knowledge in genetics and developmental biology, and the ongoing synthesis of evolutionary and developmental biology, has made it possible to study the factors that limit, enhance, or structure variation at the level of an animal's physical appearance and behavior. Knowledge of the significance of variability is crucial to this emerging synthesis. This volume situates the role of variability within this broad framework, bringing variation back to the center of the evolutionary stage.

2 responses so far