Being statistically rational about generation

Apr 10 2017

Well, once again, some Gen-X-er had to say something snarky about generations on the tweets.

I've gotten tired of arguing the stereotypes. I've gotten tired of pointing out the flaws in the statistics.

This time the fight, argument, er discussion had to do with "where do you draw the lines betwixt generations?"

This fight reminded me of one of the things I used to teach when I taught data analysis to biology grad students. Say you've got two variables, oh call them X and Y. You are interested in the relationship between X and Y, and maybe even have some hypotheses about what that is. You've measured both, and let's even say you have reason to want to call Y the response (or dependent) variable and X the carrier (or independent) variable. X looks a little "grouped". There seem to be some "natural" breaks.

Point the first: those breaks are unlikely to be natural. If this is data you set out to collect (copepods in the river, rabbits hopping on the bank), you may have, unintentionally, not on purpose, whatever, introduced that bias. Departures from random sampling are rampant and insidious.

So you plot the data and those breaks may or may not still be there. The temptation to turn "X", a continuous variable, be it distance on your transect, size of bunny lower limb or whatever, into a categorical variable is huge. Your slope of Y on X is small and the significance is marginal (p of .06 or so). The relationship is there. You know it. But it's as weak as that cup of coffee you stole from the pot before it was finished brewing.

So you take X and turn it into categories. Every 10 m or 100 m on the transect. Reorganize the data and push it through an ANOVA. And lo! Your eyes were opened, and there is a Significant Relationship. Victory is within your grasp.
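For the code-minded, here is a minimal sketch of that temptation: the same data run once as a regression on the continuous X and once as an ANOVA on binned X. Everything in it is an assumption made up for illustration (the simulated transect, the 25 m bin width, the function names), and the p-values come from a simple permutation test rather than F tables. The point is not which p comes out smaller on a given run. It is that the two analyses are different tests, and picking between them after peeking at the data is the problem.

```python
import random

def regression_f(x, y):
    """F statistic for the slope of y on x (1 and n-2 degrees of freedom)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    ss_reg = sxy ** 2 / sxx
    ss_err = syy - ss_reg
    return ss_reg / (ss_err / (n - 2))

def anova_f(groups, y):
    """One-way ANOVA F statistic for y split by group label."""
    n = len(y)
    grand = sum(y) / n
    by_group = {}
    for g, yi in zip(groups, y):
        by_group.setdefault(g, []).append(yi)
    k = len(by_group)
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in by_group.items())
    ss_within = sum((yi - means[g]) ** 2 for g, v in by_group.items() for yi in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_p(stat, x, y, n_perm=2000, seed=1):
    """Share of shuffles of y whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = stat(x, y)
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if stat(x, y_shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Simulated transect (an assumption): distance in metres, weak linear effect.
rng = random.Random(42)
distance = [rng.uniform(0, 100) for _ in range(60)]
response = [0.01 * d + rng.gauss(0, 1) for d in distance]

bin_width = 25  # arbitrary choice, which is exactly the problem
bins = [int(d // bin_width) for d in distance]

p_regression = permutation_p(regression_f, distance, response)
p_anova = permutation_p(anova_f, bins, response)
print(f"regression on distance: p = {p_regression:.3f}")
print(f"ANOVA on {bin_width} m bins: p = {p_anova:.3f}")
```

Change `bin_width` a few times and watch the ANOVA p-value move around. Each of those is a fork in the path, and reporting only the one that "worked" is not an honest test.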

Point the second: What is bad here? You've violated many assumptions:

In an ANOVA, membership in category is not random. The difference between random and fixed predictor or categorical variables in a linear model is critical and frequently overlooked. Understanding the difference separates out the JV from the varsity, Serena Williams from my sweetie's tennis group, the undergraduate from the postdoc. The calculations are different, the implications are different, and the conclusions you can reach are different. What are these differences? Well, if you still believe in inferential statistics, you cannot make inferences about individual levels of a random variable (i.e., is group 1 different from group 2?). Further, when you test, you can't just take the ratio MSR/MSE to arrive at an F-stat. Small technical point, but one that influences that measure of significance.

Other points: did you check within-group variation? I thought not. It's supposed to be equal across groups, and if you divvied up a continuous variable, it's not likely to be.
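Checking is not hard. A rough sketch, again with made-up data: compute the variance of Y inside each bin and look at how far apart the largest and smallest are. The simulated noise below is deliberately built to grow with distance (an assumption for illustration, not a claim about any real transect), and `variance_by_bin` is just a name I picked.

```python
import random
import statistics

def variance_by_bin(x, y, bin_width):
    """Sample variance of y inside each bin of x (bins with fewer than 2 points are skipped)."""
    by_bin = {}
    for xi, yi in zip(x, y):
        by_bin.setdefault(int(xi // bin_width), []).append(yi)
    return {b: statistics.variance(v) for b, v in sorted(by_bin.items()) if len(v) >= 2}

rng = random.Random(7)
distance = [rng.uniform(0, 100) for _ in range(80)]
# Assumption for illustration: the spread of the response grows with distance.
response = [0.01 * d + rng.gauss(0, 0.5 + 0.02 * d) for d in distance]

variances = variance_by_bin(distance, response, bin_width=25)
for b, v in variances.items():
    print(f"bin {b}: variance = {v:.2f}")

ratio = max(variances.values()) / min(variances.values())
print(f"largest / smallest variance = {ratio:.1f}")
```

A ratio well above 1 is a hint that the equal-variance assumption deserves a proper look before anyone trusts the F-stat.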

You can call yourself a fire hydrant, but it doesn't make you one. This *was* a continuous variable, and your groups are not "real" biologically, and if they are, the group variable will be random, not fixed.

So where does it leave our stalwart warriors of GenX?

Well, "generations" is the imposition of categories on a continuous variable. This problem is evident when people start arguing about which year the boundary falls on. It's a continuous variable that one is trying to parse into categories. And then! Treat the categories like they belong to a fixed variable, and not a continuous one. Are people born in 1963 more "like" those born in 1968 than they are like those born in 1948? Probably in many ways. There is a continuous distribution, with many traits, response variables and overall "philosophy" or culture changing relatively gradually over time.
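A toy sketch of the boundary problem, using the three birth years above. The two cut years are hypothetical, picked only to show that moving the line changes who gets lumped with whom, while the distance in years between people stays exactly what it was.

```python
from bisect import bisect_right

def label(year, cuts):
    """Index of the bin a birth year falls in, given sorted cut years."""
    return bisect_right(cuts, year)

years = [1948, 1963, 1968]
scheme_a = [1965]  # hypothetical boundary
scheme_b = [1960]  # equally hypothetical boundary

for name, cuts in [("A", scheme_a), ("B", scheme_b)]:
    labels = {y: label(y, cuts) for y in years}
    print(f"scheme {name}: {labels}")

# The continuous view does not care where anyone drew the line.
gap_to_1968 = abs(1963 - 1968)
gap_to_1948 = abs(1963 - 1948)
print(f"1963 is {gap_to_1968} years from 1968 and {gap_to_1948} years from 1948")
```

Under scheme A, 1963 shares a bin with 1948. Under scheme B, it shares one with 1968. Same people, same birth years, different "generation".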

The idea of a generation is a convenient trope for the glossy news machines, and the bitter folks trying to blame their problems on others. Yes, there are people who did BAD things and took alz teh grants muney. There are historical trends, there are patterns. Finding those patterns may or may not be useful to either advancing a hypothesis (somewhat likely) or effecting change (much less likely, but tremendously satisfying). Knowing history is a good thing.

There are bad people in every generation. There are BSD and Big Dogs and kindly hearts who mentor you. If you start classifying by this construct we call "generation" you are likely to make mistakes about who and what people are, because you are judging them on their age.

Things *have* changed. There is no doubt of that. There are many, many things that are much harder, right now, for young scientists. There are also some things that are easier. Some of these changes came from active, evil, and selfish things done by older people that have downstream impacts on younger people. It may be very satisfying to demonize a group, and blame them for all your problems. But it won't solve the problems. It won't turn members of that group into your allies to help solve the problems.






