
You can imagine my surprise, and the load that fell from my shoulders, when I found out that those conclusions were pretty much bullshit. Who would have thought that a statistics class at 8:30 in the morning would provide the means for such a catharsis? My dilemma and the conclusions I drew from it are a classic example of the tension that haunts data analytics: correlation versus causation. Without drilling too deep into the math, the former says that a mathematical relationship can be found between two or more things, and the latter means that one of them actually influences the other (as in: cause and effect). It's always worth considering both aspects, as we shall see.
Getting back to my dramatic experience with the men's magazine, I had mixed up correlation and causality. I'm not implying that the magazine or the researchers stated anything wrong. They probably went out and asked 50 men how they slept, how big their bellybuttons were, and how successful they were in their jobs. Then they jotted the answers down, arranged them in a spreadsheet, et voilà: a noteworthy headline. The problem here is that they left half of the math out of their statements. Unfortunately, many papers do that, and our heuristic drive pushes us into drawing our own conclusions. Quite often we end up with fallacies and memorize them. Apart from tabloids, business, politics, and medicine are particularly prone to these misconceptions; like Petri dishes for bacteria, they provide fertile soil.
Now let's take it a notch deeper and figure out what happens. We should consider a correlation analysis merely as a tool, like a knife or, even better, like a thermometer. When I was a boy, my mother used a thermometer to figure out if I had a fever. She crammed it into whatever body opening was needed and could tell after a while whether I had a fever and whether it was rising or subsiding. A correlation analysis works quite similarly. It shows whether there is a mathematical relationship between two or more things ("variables"), and it shows whether this relationship is positive (one goes up, the other does too) or negative (one goes up, the other goes down). The analysis doesn't reveal, however, what caused this observation. Correspondingly, a thermometer just shows whether or not you have a fever. It reveals that something is wrong within your body but doesn't show you which virus or bacterium you caught. By the same token, such tools have their limits: what about diseases that come without an abnormally high body temperature?
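To make the thermometer a bit more tangible, here is a minimal sketch of what such an analysis computes. It assumes the common Pearson coefficient (the post itself doesn't say which measure the researchers used), and all the numbers are made up purely for illustration. The result ranges from -1 (one goes up, the other goes down) to +1 (both go up together).

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How strongly the two series deviate from their means together.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical survey answers from five men (invented data).
hours_slept = [5, 6, 7, 8, 9]
coffees_per_day = [5, 4, 3, 2, 1]   # falls as sleep rises
job_rating = [2, 4, 5, 7, 9]        # rises as sleep rises

r_neg = pearson(hours_slept, coffees_per_day)
r_pos = pearson(hours_slept, job_rating)
print(f"sleep vs. coffee:     r = {r_neg:.2f}")
print(f"sleep vs. job rating: r = {r_pos:.2f}")
```

Note what the function never sees: anything about why the numbers move together. It only reads the temperature.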
So we have an analysis whose results sometimes put us on the right track, yet at other times leave us exactly where we started: in a mystic fog of numbers. What can we do to counter this calamity? After some research [1] I found four pitfalls, probably not an exhaustive list but a fair start, that one should look at when dealing with this issue. I will go through them, referring repeatedly to one example with two variables: fires and firemen. Let's say we have observed the number of fires and firemen in some region, and our correlation/thermometer analysis showed a "significant" relationship between the two. Just ignore the "significant" if you don't like it; it's for people who have already dug deeper into statistics. Now, when inferring cause and effect from our results, we must watch out for the following aspects:
1. Atypical group:
If we, for instance, observed firemen in the city and want to draw conclusions from that for the whole country, we may have looked at an unusual group. Another case in point is the size of the numbers: did we look at hundreds of fires and firemen and contrast them with millions? Avoid comparing apples to bananas.
2. Neglected variable:
Maybe there's a government program or a great firemen's school, or... well... how about their fantastic parties, which could, for example, attract more firemen? There are some rural areas in Germany that have a splendid fire brigade but almost no fires to extinguish. Some look to me more like get-togethers under the umbrella of the brigade. Watch out for influences that you didn't take into account.
3. Automatic effects:
This is a sibling of number 2. We often forget how values change over time. Maybe you took your survey in a region that boomed over the last 10 years, so the number of fires went up and probably the number of firemen, too. They could be neatly aligned because of the boom rather than because of any relation to each other. Time often brings automatic effects that give us a correlation without a real relation.
4. Causal direction:
I love this one. Maybe it's not the fires causing the number of firemen to rise. What if it's the other way around? Sounds stupid? Well, there are definitely cases in which firemen turned out to be arsonists. Out of boredom? For training? Who knows. The question here is: who actually influences whom?
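The automatic effect from pitfall 3 can be made visible with a small simulation. This is only a sketch under invented assumptions: a hypothetical town that grows steadily for 40 years, with fires and firemen each depending on town size plus their own independent random noise, so by construction neither influences the other. Comparing year-on-year changes instead of raw levels is one simple way to take the shared trend out; it is not the only one.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def changes(series):
    """Year-on-year differences, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

random.seed(42)  # fixed seed so the run is repeatable

# Invented boom town: population grows by 2,000 people every year.
population = [50_000 + 2_000 * year for year in range(40)]

# Both series follow town size plus independent noise (assumed rates).
fires = [0.002 * p + random.gauss(0, 5) for p in population]
firemen = [0.001 * p + random.gauss(0, 3) for p in population]

r_raw = pearson(fires, firemen)
r_changes = pearson(changes(fires), changes(firemen))
print(f"raw yearly levels:    r = {r_raw:.2f}")
print(f"year-on-year changes: r = {r_changes:.2f}")
```

The raw levels should come out strongly correlated simply because both ride the same boom, while the correlation of the yearly changes should be much weaker. The neglected variable (town size) did all the work.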
These pitfalls, and additional aspects like them, should be drummed into each of us because they influence our decisions on a daily basis. Just imagine a business situation in which we got the causal direction wrong, or a policy designed according to an analysis that left out a crucial variable. Many politicians exploit these pitfalls in their quest to appeal to ordinary people (populism): "growing numbers of immigrants and rising lawbreaking go hand in hand in our town," they say, so "let's get rid of the immigrants." As shown above, often the town simply prospered over time, and that gave both more immigrants and more burglars a reason to try their luck. Myriad examples can also be drawn from medicine. Just think of the tug of war between, say, acupuncturists or healers and conventional doctors. Add to that the inexplicable placebo effect and you have a perfect recipe for a clusterfuck.
Still, I think there is hope. If we challenge these things with the right attitude, we will probably end up learning and knowing more about the situation. Then we should be able to take the right stand. My math professor used to say: "It's not that hard, but it's not that easy either." I guess that's the spirit to go for.
[1] Cf. Yule, "Why do we sometimes get Nonsense-Correlations between Time-Series?", Journal of the Royal Statistical Society, 1926; Walter Krämer, "So lügt man mit Statistik" ["How to lie with statistics" (translated)], Piper, 2000; "Teaching methods, an alternative vote", The Economist, May 2011; "Getting the story right, What does the rise (and fall) of commodity prices imply for investors and the economy?", ibidem.
