# Spurious Correlations

#### NobleSavage

##### Veteran Member
US spending on science, space, and technology
correlates with Suicides by hanging, strangulation and suffocation

Number of people who drowned by falling into a swimming pool
correlates with Number of films Nicolas Cage appeared in

Divorce rate in Maine correlates with
Per capita consumption of margarine (US)

And many more at http://tylervigen.com/

#### Petrel

##### Member
I read a novel some years ago wherein a judge banned peanut butter from his home, because there was a statistical link between the consumption of peanut butter and the commission of violent crime.

The link was: they serve a lot of peanut butter in prisons because it's cheap.
Every one of those graphs, and many similar graphs, might have a third and/or fourth factor that connects them.
Or could be sheer coincidence.
(Assuming they're accurate in the first place.)

#### doubtingt

##### Senior Member
It is no coincidence that all the examples have time as the unit of analysis. The most glaring and seemingly silly spurious correlations tend to arise when 2 variables are examined for their change over time, since time is inherently confounded with every variable in the universe that changes in a systematic, non-random way over time. Causality moves forward through time (fuck you, Stephen Hawking), such that the value of any quantitative variable at Time X will be a function of its value at Time X-1. Thus, even for variables that fluctuate in cycles over time, within any narrow window of time that variable is very likely to be systematically going up or going down. That will make it correlate either positively or negatively with every other variable in the universe that is also going up or down during that same time span.
This is why the most uninformative correlations, and the ones most likely to be silly and meaningless, tend to be between 2 variables correlated in a linear way over time, and the shorter the time span, the more meaningless. Complex non-linear correlations, where the variables go up and down together, are much less likely by chance. The Nick Cage example is nonlinear, so you can find them, but you have to hunt for such relationships. A priori there was a low chance that Nick Cage movies would correlate in such a pattern with the number of pool deaths. Someone just took tons of variables and found this pair that worked. In contrast, a priori there is a very good chance that divorce rate in Maine and margarine consumption would correlate in a linear way, either positively or negatively.
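To make the trend argument concrete, here is a minimal simulation sketch. It assumes made-up data: each "variable" is a straight-line trend over 10 years plus random noise, with the direction of the trend chosen at random, and the two series in each pair are generated independently. None of the numbers come from the graphs in this thread. It compares the correlation of the raw series with the correlation of their year-to-year changes, which removes the shared dependence on time.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def trending_series(rng, n_years, slope, noise_sd):
    """Hypothetical yearly variable: a linear trend plus Gaussian noise."""
    return [slope * t + rng.gauss(0, noise_sd) for t in range(n_years)]


def first_differences(xs):
    """Year-to-year changes, which strip out a linear time trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_years, trials = 10, 2000
raw_abs_r, diff_abs_r = [], []

for _ in range(trials):
    # Two unrelated variables, each trending up or down at random.
    a = trending_series(rng, n_years, slope=rng.choice([-1, 1]), noise_sd=1.0)
    b = trending_series(rng, n_years, slope=rng.choice([-1, 1]), noise_sd=1.0)
    raw_abs_r.append(abs(pearson(a, b)))
    diff_abs_r.append(abs(pearson(first_differences(a), first_differences(b))))

median_raw = statistics.median(raw_abs_r)
median_diff = statistics.median(diff_abs_r)
print(f"median |r|, raw series:          {median_raw:.2f}")
print(f"median |r|, year-to-year change: {median_diff:.2f}")
```

Under these assumptions the raw series correlate strongly (positively or negatively) in nearly every trial even though nothing links them, while the year-to-year changes mostly do not.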

Notice that this time problem doesn't exist when the unit of analysis is not time, such as the correlation between IQ and belief in God among a sample of people (the person is the unit of analysis rather than time). Unlike units of time, persons do not each belong in a specific place relative to one another on the X axis: people cannot be arranged in any meaningful order unless you have another variable on which to order them. That is why graphs for these kinds of correlations have the 2 variables of interest on the X and Y axes, and "person" is not represented at all on the graph. This eliminates most of the variables in the universe as potential sources of correlation. Anything that is only related to these variables because it also changes over time will not cause a correlation at the person level. When time is not your unit of analysis, the source of the correlation is limited to some other variable that systematically varies between whatever your units of analysis are. That leaves 3 options: X is causing Y, Y is causing X, and/or some variable Z is causing both X and Y. Variable Z causing X and Y tends to be a less "spurious" relationship, because it means that the correlated variables have a common causal factor that varies between people. Often this can be quite theoretically meaningful, even though it isn't a direct causal connection between X and Y.
Back to IQ and theism: even if lower IQ doesn't cause theism, many of the probable Z variables are still interesting and tell us something about the nature of theism and/or IQ. For example, maybe parental income creates social environments that promote theism and also affects education in ways that affect IQ performance. That still tells us something about the psychology of theism and what enables it, plus how IQ is hampered by some of the same factors.
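The "Z causing both X and Y" case can be sketched with simulated data. The example below is an illustration under stated assumptions, not an analysis of real IQ or theism data: X and Y are each generated as Z plus independent noise, so there is no direct link between them, and the helper names (`residuals_after`, `r_xy_given_z`) are made up for this sketch. Removing the part of X and Y that Z explains should leave almost no correlation behind.

```python
import random
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals_after(zs, ys):
    """What is left of ys after a least-squares line on zs is subtracted."""
    mz, my = statistics.fmean(zs), statistics.fmean(ys)
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]


rng = random.Random(1)  # fixed seed so the run is repeatable
n_people = 5000

# Hypothetical common cause Z, varying between people.
z = [rng.gauss(0, 1) for _ in range(n_people)]
# X and Y each depend on Z plus their own noise; neither affects the other.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
# Partial correlation: correlate what remains of X and Y once Z is removed.
r_xy_given_z = pearson(residuals_after(z, x), residuals_after(z, y))

print(f"r(X, Y)               = {r_xy:.2f}")
print(f"r(X, Y) controlling Z = {r_xy_given_z:.2f}")
```

In this setup X and Y correlate at the person level only through Z, which is why the correlation largely vanishes once Z is controlled for. In real data Z is often unmeasured, so this check is not always available.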

A final note is that a good way to show that a correlation is meaningful, and likely due to a relatively direct causal relation, is to show that the correlation emerges at many different levels of analysis, such as between individual persons, between states, between groups, between nations, and over time. Each unit of analysis has its own confounding factors, and these differ from one unit to the next, so a correlation that holds under all units of analysis can't be due to confounds that only apply to some of them.

#### hurtinbuckaroo

##### Veteran Member
The Super Bowl/Dow Jones correlation was my favorite. Between 1967 and 1997, the conference that won correlated with the direction of the DJIA 28 out of 31 times (up when an NFC team wins, down when an AFC team wins).
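For a rough sense of how unusual 28 out of 31 is, here is a back-of-the-envelope sketch. It assumes each year is an independent 50/50 coin flip, which is a simplification (the DJIA has risen in more years than it has fallen), and the count of 100,000 candidate indicators is an arbitrary number picked for illustration, not a figure from this thread.

```python
from math import comb

n_years, n_hits = 31, 28

# Chance that one pre-chosen coin-flip indicator matches at least 28 of 31 years.
p_single = sum(comb(n_years, k) for k in range(n_hits, n_years + 1)) / 2 ** n_years


def chance_of_at_least_one(p, n_candidates):
    """Chance that at least one of n independent candidates does this well."""
    return 1 - (1 - p) ** n_candidates


# Hypothetical number of unrelated indicators someone might sift through.
n_candidates = 100_000
p_any = chance_of_at_least_one(p_single, n_candidates)

print(f"one indicator chosen in advance: {p_single:.2e}")
print(f"best of {n_candidates} candidates: {p_any:.2f}")
```

A single indicator picked in advance would be very unlikely to do this well by luck, but the odds change a great deal once many candidate indicators are searched after the fact, which is the same hunting problem as the Nick Cage example.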