Rethinking Causation in Economics

Unlearning Economics
Jul 25, 2020 · 8 min read

A while ago Josh Mason asked on Twitter whether we need to understand mechanisms to determine causation, which sparked an interesting debate. I'm of the view that mechanisms are an important and underrated way of thinking about causality in economics. Understanding mechanisms entails taking a granular approach to disentangling causality from context. This contrasts with the probabilistic approach to causality, which conceives of causation as exogenous variation in the independent variable. I will emphasise, however, that there's no reason both (and other) approaches to causality shouldn't be part of the conversation.

If economists think there may be a relationship between some independent variable X and some dependent variable Y, they usually proceed by inducing random variation in X and seeing what happens to Y. The random variation rules out the other possibilities which could explain a correlation between X and Y, such as reverse causality from Y to X, or a third variable Z which causes both X and Y. An alternative view is that if we have a correlation between X and Y, we can also establish causality by empirically demonstrating the existence of the mechanism M which links X and Y. In social science, historical explanations and case studies, including detailed descriptive statistics, can get at the mechanisms underlying a particular causal story.
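To make the first, probabilistic logic concrete, here is a minimal simulation sketch. It is my own illustration rather than anything from the Twitter debate, and all the coefficients are made up: a confounder Z manufactures a correlation between X and Y even though X has no true effect, and randomising X removes it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounded world: Z drives both X and Y, but X has NO true effect on Y.
Z = rng.normal(size=n)
X_obs = Z + rng.normal(size=n)
Y_obs = 2 * Z + rng.normal(size=n)

# Randomised world: X is assigned independently of everything else.
X_rand = rng.normal(size=n)
Y_rand = 2 * Z + rng.normal(size=n)  # X still has no true effect on Y

def slope(x, y):
    """OLS slope of y on x."""
    return np.polyfit(x, y, 1)[0]

print(f"observational estimate: {slope(X_obs, Y_obs):.2f}")   # ~1.0, spurious
print(f"randomised estimate:    {slope(X_rand, Y_rand):.2f}") # ~0.0, correct
```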

Economists arguably sometimes use this approach, but as far as I'm aware they just haven't articulated it in the same way. Consider a classic example, the gender pay gap, where we try to measure the (negative) effect of being a woman on pay. As all Scott Cunningham fans know, we can show this most easily using a Directed Acyclic Graph (DAG):

A Directed Acyclic Graph (DAG) for the factors causing women's earnings

To explain: we have income y, and we want to know the reduction in income caused by being female, F. Income is partly caused by occupation o, partly by ability A, and partly by discrimination d. But because discrimination against women acts on both occupation and income, if you control for occupation in a regression of income on gender you are effectively controlling away discrimination. The result is that even if women do face discrimination and do earn less than men, controls can eliminate the effect of gender on wages or even make it reverse sign! On the other hand, it's not 100% clear that not controlling for occupation is always the right thing to do: it wouldn't be if discrimination didn't exist and occupation were the result of free choices, or of male versus female differences in ability A. We're stuck.
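The sign reversal is easy to reproduce in a simulation, loosely in the spirit of Cunningham's example; the coefficients below are my own illustrative assumptions, not taken from any study. Discrimination d lowers both occupation o and income y, unobserved ability A raises both, and the true direct effect of d on y is -1.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

F = rng.integers(0, 2, size=n).astype(float)  # 1 = female
A = rng.normal(size=n)                        # unobserved ability
d = F                                         # discrimination hits women only
o = 2 * A - 2 * d + rng.normal(size=n)        # d pushes women into worse jobs
y = o + 2 * A - d + rng.normal(size=n)        # o and A raise pay, d lowers it

def coef_on_F(*regressors):
    """OLS of y on an intercept plus the regressors; returns the coefficient on F."""
    X = np.column_stack([np.ones(n), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"y ~ F         : {coef_on_F(F):+.2f}")       # ~ -3.0, total effect
print(f"y ~ F + o     : {coef_on_F(F, o):+.2f}")    # ~ +0.6, sign flips!
print(f"y ~ F + o + A : {coef_on_F(F, o, A):+.2f}") # ~ -1.0, true direct effect
```

Note that controlling for occupation doesn't merely shrink the coefficient on F: because occupation is also caused by unobserved ability, conditioning on it opens a non-causal path between discrimination and ability, and the sign flips. Only the infeasible regression that controls for ability as well recovers the direct effect.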

One way to settle the question of whether women face discrimination is to establish that d exists using alternative methods. And this is what many studies have done, including the famous CV audit studies, which showed that for applicants who are identical in all characteristics but gender, women are less likely to receive job offers, are paid less when they do, and are judged as less competent on a number of other subjective characteristics. Additional research has used experiments, as well as in-depth qualitative studies of women's experiences at work, to establish that women are discriminated against. Combined with the fact that women earn less than men no matter how you measure it, this indicates that the gender pay gap is due to discrimination, and it doesn't bump up against the problems above (it's worth mentioning that the effect sizes in these studies are pretty large, too).

There are definite advantages to this approach over more statistically minded analysis. Even in the best quantitative empirical work in social science, your results are often driven by an untestable assumption (think common trends for difference-in-differences, or exogeneity for instrumental variables; see the sketch after this paragraph). In contrast, the establishment of a mechanism deals directly with reality and 'what happened'. Mapping from experiments to reality is trickier in social science than in physical science, but if we have a wide range of field and lab experiments showing the same thing, as we do with discrimination against women, we can be more certain the mechanism is reliable.
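Here is a sketch of how one of those untestable assumptions can quietly drive the result. The numbers are invented for illustration: the true treatment effect is zero, but the treated group trends upward faster, and difference-in-differences reads the broken common-trends assumption as an effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# True treatment effect is ZERO; the treated group simply trends up faster.
pre_control  = rng.normal(0.0, 1, n)
post_control = rng.normal(1.0, 1, n)  # control group drifts up by 1.0
pre_treated  = rng.normal(0.0, 1, n)
post_treated = rng.normal(1.5, 1, n)  # treated group drifts up by 1.5

did = ((post_treated.mean() - pre_treated.mean())
       - (post_control.mean() - pre_control.mean()))
print(f"DiD 'effect': {did:.2f}")  # ~0.50, entirely an artefact of the trends
```

With a single pre-treatment period, nothing in the data distinguishes this bias from a genuine effect; the common-trends assumption has to be argued for rather than tested.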

Another advantage of the mechanistic approach is that it works better with nonlinear systems, of which the economy is surely one example. As Blair Fix has neatly summarised, the Acyclic in Directed Acyclic Graph rules out the feedback loops that characterise nonlinear systems; in contrast, once we've shown a mechanism we can be pretty confident it exists whether or not the system is linear. Take the example of climate change: it was actually in the mid-19th century that Eunice Newton Foote established, through a simple lab experiment, that the 'greenhouse gas' mechanism exists, and correctly predicted that on a larger scale it would lead to global warming. In this case rising CO2 levels from industrialisation, rising global temperatures, and a concrete physical mechanism gave us good reason to believe in anthropogenic global warming, even if they were far from the full picture.

This approach is also closer to the practice of the real scientists in epidemiology, who have recently come into the public eye for obvious reasons. The famous (albeit slightly dated) Bradford Hill criteria contain nine components, one of which is a 'plausible biological mechanism' which links X and Y. Three epidemiologists have recently claimed that pluralism is needed in the approach to causality. They note that in the 1950s and 60s, observational data about the simultaneous increases in both smoking and lung cancer were combined with laboratory evidence about the effect of tobacco smoke on lung cells. They argue that the modern approach to causal inference would not have caught this, and so may not always be the best way to think about causal claims. As they put it, "a piece of evidence which, on its own, is very poor evidence for causality, might be a keystone of a larger structure that makes a very strong case for causality".

Murder Trial Causality

It is not always straightforward to isolate a particular mechanism in the way we would like, since causality in the real world can be difficult to prove when controlled studies are not possible. One example is a murder trial, where prosecutors often aim to establish both a motive and an opportunity. There is no quantitative method which can establish whether or not the defendant 'caused' the death, so we ask where they were and when, what they were doing, whether they had reason to kill the victim, and so on. Finding out about motives and details, and making comparisons, is how most people intuitively think about causality. The famous sociologist Max Weber viewed close comparison as a substitute for counterfactuals, and a version of this approach is also common among historians, who've been said to focus on "the facts ma'am, just the facts".

According to an excellent paper by Morck and Yeung, "though not proof of causation, correlation is a smoking gun; and history can often supply sufficient circumstantial evidence to convict." One of my favourite historical examples is whether the atom bomb was the reason the Japanese surrendered in World War 2. While I'm no expert, this Foreign Policy article convincingly makes the case that it wasn't the A-bomb, given that key meetings happened before the news about Hiroshima had travelled; that the leadership needed to preserve the legitimacy of the emperor by making the defeat seem fantastical to the population; and that Hiroshima was not more devastated than other Japanese cities which had suffered non-nuclear bombing, making the A-bomb seem less special to the authorities. The article argues instead that the Japanese surrendered because of the Soviet Union's entry into the war, which matches the facts better.

To see how this approach may aid the empirical revolution in economics, consider an interesting paper which followed two Randomised Controlled Trials (RCTs) from a qualitative perspective. As is often the case, the RCTs, which gave women in two different locations in South Asia assets such as livestock and poultry, were beset by practical issues including unsuccessful randomisation, dropouts, substitutions, and spillovers. The qualitative approach gave the researchers a finer-grained picture of what happened to beneficiaries of the program, including why it was and wasn't successful in different circumstances. Issues such as cooperation within the household, religion, access to alternatives, and various aptitudes affected how well the program worked.

There was agreement between the quantitative and qualitative approaches that the program worked better in Bengal than in Sindh, an encouraging congruence. Less encouragingly, the approaches differed on who was driving this: for the qualitative researchers it was the worst off among the villagers; for the quantitative researchers it was the best off. The authors suggest the quantitative researchers are wrong and that their results were driven by the faulty randomisation mentioned earlier: the wealthy were more likely to stay on the program, and this would have biased the difference between the treatment and control groups (a bias sketched in code below). Based on this detail they are able to suggest methods for improving the effectiveness of such programs in the future.
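That attrition story is easy to reproduce. The following sketch uses my own invented numbers, not the paper's data: randomisation is clean at baseline and the true effect is +1 for everyone, but wealthier participants are more likely to stay in the treated arm, so a comparison of those who stayed overstates the effect.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

# Clean randomisation at baseline; the true treatment effect is +1 for all.
wealth  = rng.normal(size=n)
treated = rng.integers(0, 2, size=n).astype(bool)
outcome = 2 * wealth + treated + rng.normal(size=n)

# Wealthier participants are likelier to STAY in the treated arm,
# while attrition in the control arm is unrelated to wealth.
stay_prob = np.where(treated, 1 / (1 + np.exp(-2 * wealth)), 0.5)
stayed = rng.random(n) < stay_prob

estimate = (outcome[treated & stayed].mean()
            - outcome[~treated & stayed].mean())
print(f"true effect: +1.00, estimate among stayers: {estimate:+.2f}")  # ~ +2.2
```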

At the very least, this qualitative approach helps us to understand why an estimated quantitative effect is the way it is, and unmasks heterogeneities that would have been difficult to see with only quantitative data; in other words, it gets directly at mechanisms. At best, qualitative methods throw into doubt inferences made from naïve use of quantitative methods by highlighting inconsistencies between the estimates and 'what happened'. I've nothing against using both qualitative and quantitative methods (their congruence is, for me, the true 'gold standard'), but in case of disagreement between the two, the qualitative approaches typically have the edge, as they can explain things at a level of detail quantitative researchers cannot. It is a well-established critique of RCTs, for instance, that they can tell us whether or not something worked but not why.

There are limitations to this approach. This article by Julian Reiss points out that a mechanism can be present without probabilistic causality, for instance if two opposing mechanisms cancel each other out at the population level (sketched in code below). Furthermore, as with hidden statistical assumptions, one can omit (or be unaware of) parts of the qualitative story which don't fit one's narrative. For instance, blogger/sunglass wearer Pseudoerasmus' approach to economic history follows the methodology I favour, and he has persuaded me of many things over the years. But in his article disputing the narrative that the US destabilised Chile in the 1970s, he fails to include the fact that the US dumped its copper holdings (Chile's main export), which played a role in destabilising the economy. Still, I find this approach to teasing out the details more straightforward and transparent than the quantitative approach, which almost always seems to hinge on just introducing more assumptions.
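Reiss's cancellation point can be made concrete with a toy simulation (my own numbers, chosen so the two paths offset exactly): X raises Y through a mediator M and lowers it directly, leaving no population-level correlation even though both mechanisms are real.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

X = rng.integers(0, 2, size=n).astype(float)
M = X + rng.normal(size=n)      # positive mechanism: X -> M -> Y
Y = M - X + rng.normal(size=n)  # offsetting direct path: X -> Y

# No association at the population level...
print(f"corr(X, Y) = {np.corrcoef(X, Y)[0, 1]:+.3f}")  # ~ +0.000

# ...yet both mechanisms are recoverable once the mediator is observed.
b = np.linalg.lstsq(np.column_stack([np.ones(n), X, M]), Y, rcond=None)[0]
print(f"direct effect of X: {b[1]:+.2f}, effect of M: {b[2]:+.2f}")  # -1, +1
```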

In the interests of synthesis, I will point out that natural experiments, field experiments, difference-in-differences, and regression discontinuities are the methods closest to my approach, since they often rely on close comparative examinations of particular cases. The aforementioned Reiss makes the case for causal pluralism more strongly, arguing that the different approaches to causality are inherently incompatible, and that any one type of 'causality' can be present without the others. Whatever the case, there is certainly room for expanding our understanding of causality in economics, lest we risk turning out like that guy in XKCD 552.

Addendum: Pseudo unsurprisingly had some issues with the above claim that the US selling its copper reserves destabilised Chile, citing a 1971 letter from Kissinger to Nixon in which Kissinger advised against the sale and warned it would have only small effects, and pointing to the first recorded sales of numerous metals in 1973 (after the Chilean economy had tanked). I think he's right: an excellent example of how fruitful the methodology outlined in this post can be!
