I gave a Big Talk once on the difference between statistical significance and clinical significance. It was poorly received by the editor of the society's journal. He told me he was now going to be inundated with papers saying "Dr. Theron sez I don't need no stinkin' numbers." He didn't actually quote The Treasure of the Sierra Madre, but the implication was there.

My simple point was that effect size is as important as significance. I've talked about this before. Something can be significant (because of pseudo-replication, oversampling, etc.) while the effect size is trivial, or smaller than your resolution of measurement. For a (made-up) example, say you find a difference in a variable measuring "age at disease onset" between males and females, but the difference is on the order of days: males, 60 years, 2 months and 3 days; females, 60 years, 2 months and 4 days. What does that mean, by any measure of effect size, other than that there is probably some other bias lurking in your data? This one is obvious because we have an intuitive sense of what the data mean. Other problems are often less obvious.
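The mechanism behind this is easy to demonstrate in a simulation. The sketch below uses made-up numbers (a standardized difference of 0.02, i.e., 2% of one standard deviation, and 200,000 subjects per group) purely for illustration: with a big enough sample, even a negligible true difference produces a tiny p-value, while the effect size stays trivial.

```python
# Illustrative simulation (made-up numbers): a trivially small true
# difference becomes "statistically significant" once n is huge.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200_000                           # subjects per group (oversampling)
males = rng.normal(0.00, 1.0, n)      # standardized "age at onset"
females = rng.normal(0.02, 1.0, n)    # true difference: 2% of one SD

t, p = stats.ttest_ind(males, females)

# Cohen's d: mean difference divided by the pooled standard deviation
pooled_sd = np.sqrt((males.var(ddof=1) + females.var(ddof=1)) / 2)
d = (females.mean() - males.mean()) / pooled_sd

print(f"p = {p:.2e}")   # far below .05
print(f"d = {d:.3f}")   # a negligible effect size
```

The p-value says "this difference is probably not zero"; the effect size says "and it is too small to matter." Both statements are true at the same time.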

Now comes the journal Basic and Applied Social Psychology, which is banning significance testing. Specifically, they have banned the null hypothesis significance testing procedure (NHSTP). Steve Novella, who writes at the excellent Science-Based Medicine blog, has a great discussion. He links to a video by Geoff Cumming that is similar to what I showed in the talk I gave.

All of this is well and good. Many Good Thinking People are cheering. But most of us publish in journals that require estimates of significance. My clinical society judges abstracts (both talks and posters are severely limited in number) on some numskull criteria, which include "testable hypothesis" and "testing of hypothesis". They have not yet learned that horrible research comes in many flavors, and that one rule (must have p-value) does not a good piece of research make.

Another person whose views I respect is Andrew Gelman, who has a short but good post on this. Here's the close of his post:

> Actually, I think standard errors, p-values, and confidence intervals can be very helpful in research when considered as convenient parts of a data analysis .... The problem comes when they're considered as the culmination of the analysis, as if "p less than .05" represents some kind of proof of something. I do like the idea of requiring that research claims stand on their own without requiring the (often spurious) support of p-values.
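Gelman's point, that confidence intervals are useful as parts of an analysis rather than its culmination, amounts to reporting the effect with its uncertainty instead of a bare p-value. A minimal sketch, using made-up measurements, of what that report looks like:

```python
# Sketch with made-up data: report the estimated effect and an
# approximate 95% confidence interval, not just "p < .05".
import math
import statistics as st

control = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
treatment = [12.4, 12.6, 12.1, 12.8, 12.5, 12.3, 12.7, 12.2]

diff = st.mean(treatment) - st.mean(control)
se = math.sqrt(st.variance(control) / len(control)
               + st.variance(treatment) / len(treatment))

# Normal approximation; with n = 8 per group a t critical value
# would widen this somewhat, but the idea is the same.
lo, hi = diff - 1.96 * se, diff + 1.96 * se
print(f"difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A reader of that one line learns the direction, the magnitude, and the precision of the effect; a reader of "p = .01" learns none of those.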

The bottom line always is: understand your data. Go back to what the effect size *means*. What are the data saying to you? What is the story? Use statistics to support what you do, but if they become the science of your research, you are in trouble.

“If your experiment needs statistics, you ought to have done a better experiment.”

-- Rutherford
