Second part of what are errors. Other reason for repost: I find the idea of Type I and Type II to be a useful construct for many other things in life. There are lots of ways to make mistakes, and one can just say "ah a mistake". But understanding the *implications* of errors calls for a finer taxonomy of what is wrong.

Type 2 errors are a bit harder to wrap one’s mind around. Around which to wrap one’s mind? Whatever.

Type 1 errors are when you reject the null hypothesis (you think: the red fish are swimming at a different speed than the blue fish) and you are wrong (in reality: the red fish swim at the same speed as the blue fish). The significance level α, the cutoff you hold your p-value to, tells you how often you will make this mistake when the null really is true (5% of the time at α=.05). Also known as a false positive.

Type 2 errors are often called “the sad error” because it’s “nope, can’t reject the null” when, in reality, those damn red fish are swimming at a different speed. Sad because you would really like to find a difference. Also known as a false negative.

[This of course raises a whole ‘nuther specter of what it means to “really like” a result. And, if you listen to the grant mavens, you should never have a hypothesis for which there is a sad outcome, i.e. accepting a boring null. Having a question whose answer, right or wrong, gives you something very interesting, if not spectacular, is one hallmark of good science].

Anyway, type 2 errors are related to power. If α is the significance level (how likely you are to reject the null when it is actually true), then β is how likely you are to accept the null when it is actually false. 1 − β is termed The Power of the test. This is the power to which your IRB or IACUC or study section refers when they say “please do a power analysis to justify your sample size”.
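To see α and β as long-run error rates, here is a minimal simulation sketch. It assumes several things this post does not specify: normally distributed swim speeds, a known shared sd of 4 cm/sec, 63 fish per group, and a simple two-sided z-test. The helper names `rejects_null` and `rejection_rate` are made up for illustration.

```python
import random
from statistics import NormalDist, mean


def rejects_null(red, blue, sd, alpha=0.05):
    """Two-sided two-sample z-test, assuming a known sd shared by both groups."""
    se = sd * (1 / len(red) + 1 / len(blue)) ** 0.5
    z = (mean(red) - mean(blue)) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit


def rejection_rate(true_diff, sd, n, alpha=0.05, trials=2000, seed=1):
    """Fraction of simulated experiments in which the null gets rejected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # 10 cm/sec is an arbitrary baseline speed; only the difference matters.
        red = [rng.gauss(10.0 + true_diff, sd) for _ in range(n)]
        blue = [rng.gauss(10.0, sd) for _ in range(n)]
        hits += rejects_null(red, blue, sd, alpha)
    return hits / trials


# Null is true (no difference): every rejection is a Type 1 error.
type1_rate = rejection_rate(true_diff=0.0, sd=4.0, n=63)

# Null is false (reds are 2 cm/sec faster): every rejection is a correct call.
power = rejection_rate(true_diff=2.0, sd=4.0, n=63)
type2_rate = 1 - power

print(f"Type 1 rate: {type1_rate:.3f}")
print(f"Power:       {power:.3f}")
print(f"Type 2 rate: {type2_rate:.3f}")
```

Under these assumptions the Type 1 rate should hover near α and the power near .80, with some wobble from one simulation run to the next.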

α, β, and sample size are linked, together with “effect size” (a measure often referred to as δ, delta), in one lovely relationship, so that if you know any three, you can calculate the fourth. This is the heart of power analysis. You pick values for α and for power, 1 − β (which itself can be fraught with difficulties), often .05 and .80 (that is, β = .20), although *there is nothing magical, special or blessed about these levels.*
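One common way to write that relationship down, as a sketch rather than the formula any particular program uses: for a two-sided comparison of two group means, assuming equal group sizes, a shared standard deviation σ, and a normal approximation, the sample size per group is roughly

$$
n \approx \frac{2\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\left(\delta/\sigma\right)^{2}}
$$

Here δ is the raw difference between the group means, δ/σ is the scaled effect, and $z_q$ is the q-th quantile of the standard normal distribution. Programs that work from the t distribution will give slightly larger numbers.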

The next step is to calculate the effect size, another bag of worms that is worthy of its own post. The effect size is the difference you expect to see, the slope you expect to see, the whatever value you calculate that would reach significance in your eyes. But, I hear you cry, how the hell do I know this, if that is *the whole damn point of the experiment/project*? Aha. You use preliminary data. You can use data from a similar experiment. You can even use data from the literature. You can guess. For our fish: we expect a difference of at least 1 cm/sec between the reds and the blues. Else it doesn’t matter. Now, the raw effect size is usually not a very good measure, because you haven’t included a sense of the variation. A difference of 2 cm/sec can mean very different things: if the standard deviation in each group is about .005 cm/sec, then 2 cm/sec is a honking big difference, but if the sd is about 4 cm/sec, then a difference of 2 cm/sec is not worth thinking about. To calculate sample size, some programs ask for the variation (SE or sd) as well as the effect size; others just ask for a scaled effect (usually diff/sd).
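The scaling is just a division, sketched here with the two standard deviations from the fish example. The function name `scaled_effect` is made up for illustration, and it assumes both groups share the same sd.

```python
def scaled_effect(diff, sd):
    """Raw difference divided by the within-group standard deviation."""
    return diff / sd


tight = scaled_effect(diff=2.0, sd=0.005)  # very little scatter between fish
noisy = scaled_effect(diff=2.0, sd=4.0)    # lots of scatter between fish

print(f"sd = .005 cm/sec -> scaled effect of about {tight:.0f}")
print(f"sd = 4 cm/sec    -> scaled effect of {noisy:.1f}")
```

Same 2 cm/sec difference, but one is hundreds of standard deviations wide and the other is half of one.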

Then, you take all these numbers, push them through a program or your favorite stats whiz, and get out The Sample Size (which can be plopped into your grant or protocol).
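If you are curious what such a program is doing, here is a rough sketch using only the Python standard library. It assumes a two-sided comparison of two equal-sized groups with a shared sd and uses a normal approximation, so a t-based package may ask for a fish or two more. The function name `sample_size_per_group` is made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(diff, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_(power)) / (diff/sd))**2
    """
    d = diff / sd  # scaled effect
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Fish example: 2 cm/sec difference, sd of 4 cm/sec, alpha = .05, power = .80
print(sample_size_per_group(diff=2.0, sd=4.0))  # 63 fish per group

# Halving the difference you care about roughly quadruples the fish you need.
print(sample_size_per_group(diff=1.0, sd=4.0))  # 252 fish per group
```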

The bottom line here: this is not magic. It is not words from the Gods. This is not beyond the ability of any (and I do mean any) scientist to think about. It’s just one way, a very good way, of thinking about your data.
