Counterfactual analysis as fatalism

‘Many counterfactual analyses are based, explicitly or implicitly, on an attitude that I term fatalism. This considers the various potential responses \(Y_{i}(u)\), when treatment \(i\) is applied to unit \(u\), as predetermined attributes of unit \(u\), waiting only to be uncovered by suitable experimentation. (It is implicit that the unit \(u\) and its properties and propensities exist independently of, and are unaffected by, any treatment that may be applied.) Note that because each unit label \(u\) is regarded as individual and unrepeatable, there is never any possibility of empirically testing this assumption of fatalism, which thus can be categorized as metaphysical.’

– Dawid, A. P. (2000, pp. 412-413) [Causal inference without counterfactuals. Journal of the American Statistical Association, 95, 407–424].
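Dawid’s point about testability can be made concrete with a toy sketch. The two tables below are invented for illustration: four hypothetical units, a binary treatment and a binary outcome. They assign different fatalistic potential outcomes \(Y_0(u), Y_1(u)\) to the same units, yet they produce exactly the same observed data.

```python
# Two invented "fatalistic" worlds: each unit u carries fixed potential outcomes (Y0(u), Y1(u)).
world_a = {"u1": (0, 1), "u2": (1, 1), "u3": (0, 1), "u4": (1, 0)}
world_b = {"u1": (1, 1), "u2": (0, 1), "u3": (0, 0), "u4": (1, 1)}

# The treatment each unit actually received (1 = treated, 0 = not treated).
treatment = {"u1": 1, "u2": 1, "u3": 0, "u4": 0}


def observe(world, treatment):
    """Only the potential outcome under the treatment actually received is ever seen."""
    return {u: outcomes[treatment[u]] for u, outcomes in world.items()}


def unit_effects(world):
    """Individual effects Y1(u) - Y0(u): defined in each world, but never observed."""
    return {u: y1 - y0 for u, (y0, y1) in world.items()}


print(observe(world_a, treatment) == observe(world_b, treatment))  # True: identical data
print(unit_effects(world_a) == unit_effects(world_b))              # False: different effects
```

Both worlds fit everything that was measured. On Dawid’s argument, choosing between them is not something data from these units could settle.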

Counterfactual talk as nonsense

‘We all indulge, in anger and regret, in counterfactual talk: “If they had not operated, John would be alive today”; “If I had not said that, she would not have left me”; “If I had chosen a different publisher, my book on causality without counterfactuals would have sold 10,000 copies.” The more fortunate among us have someone to remind us that we are talking nonsense.’

– Shafer, G. (2000, p. 442) [Causal inference without counterfactuals: Comment. Journal of the American Statistical Association, 95, 438–442].


Time for counterfactuals

I have just discovered Scriven’s stimulating (if grim) challenge to a counterfactual understanding of causation (see the debate recorded in Cook et al., 2010, p. 108):

“The classic example of this is the guy who has jumped off the top of a skyscraper and as he passes the 44th floor somebody shoots him through the head with a .357 magnum. Well, it’s clear enough that the shooter killed him but it’s clearly not true that he would not have died if the shooter hadn’t shot him; so the counterfactual condition does not apply, so it can’t be an essential part of the meaning of cause.”

I love this example because it illustrates a common form of programme effect and summarises the human condition – all in a couple of sentences! Let’s reshape it into an analogous example that extends the timeline by a couple of decades:

“A 60-year-old guy chooses not to get a Covid vaccine. A few months later, he gets Covid and dies. Average male life expectancy is about 80 years.”

(I guess jumping is analogous to being born!)

By the end of the second sentence, I reason that if he had got the vaccine, he probably wouldn’t have died. By the end of the third sentence, I am reminded of the finiteness of life. So, the vaccine wouldn’t have prevented death, only postponed it, just as the absence of a gunshot wouldn’t have saved the man in the skyscraper example. How can we think about this using counterfactuals?

In a programme evaluation, it is common to gather data at a series of fixed time points, for instance a few weeks, months, and, if you are lucky, years after baseline. We are often happy to see improvement even if it doesn’t endure. For instance, if I take a painkiller, I don’t expect its effects to persist forevermore. If a vaccine extends life by two decades, that’s rather helpful. Programme effects are defined at each time point.

To make sense of the original example, we need to add in time. There are three key time points:

  1. Jumping (T0).
  2. Mid-flight after the gunshot (T1).
  3. Hitting the ground (T2).

When considering counterfactuals, the world may be different at each of these times, e.g., at T0 the main character might have decided to take the lift.

Here are counterfactuals that make time explicit:

  • If the guy hadn’t jumped at T0, then he wouldn’t have hit the ground at T2.
  • If the guy hadn’t jumped at T0, then he wouldn’t have been shot with the magnum and killed at T1.
  • If the guy had jumped, but hadn’t been shot by the magnum, he would still have been alive at T1 but not at T2.

To assign truth values or probabilities to each of these requires a model of some description, e.g., a causal Bayesian network, which formalises your understanding of the intentions and actions of the characters in the text – something like the DAG below, with conditional probabilities filled in appropriately.

So, for instance, the probability of being dead at T2 given jumping at T0 is high, assuming you haven’t added variables about parachutes. What happens mid-flight governs T1 outcomes. Alternatively, you could just use informal intuition. Exercise for the reader: give it a go.
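Here is a rough illustration of how a model can assign truth values to the three counterfactuals, as a deterministic toy version in Python. Everything in it is an assumption made for illustration, not the model behind this post. Every probability is 0 or 1, the shooter only fires at a falling man, and either the gunshot or hitting the ground is fatal. Each counterfactual is evaluated by intervening on the model (replacing an equation with a fixed value) and solving it again.

```python
def solve(interventions=None):
    """Toy time-indexed model of the skyscraper story; `interventions` overrides equations."""
    do = interventions or {}
    v = {}
    v["Jump"] = do.get("Jump", 1)                      # actual world: he jumps at T0
    v["Shot"] = do.get("Shot", v["Jump"])              # assumed: shooter only fires at a falling man
    v["DeadT1"] = do.get("DeadT1", v["Shot"])          # the gunshot kills him mid-flight (T1)
    v["DeadT2"] = do.get("DeadT2", max(v["DeadT1"], v["Jump"]))  # hitting the ground is fatal (T2)
    return v


actual = solve()
no_jump = solve({"Jump": 0})
jump_no_shot = solve({"Jump": 1, "Shot": 0})

# 1. Had he not jumped at T0, he would not be dead at T2.
cf1 = no_jump["DeadT2"] == 0
# 2. Had he not jumped at T0, he would not have been shot and killed at T1.
cf2 = not (no_jump["Shot"] and no_jump["DeadT1"])
# 3. Had he jumped but not been shot, he would be alive at T1 but dead at T2.
cf3 = jump_no_shot["DeadT1"] == 0 and jump_no_shot["DeadT2"] == 1

print(actual)
print(cf1, cf2, cf3)
```

A causal Bayesian network would replace these 0/1 equations with conditional probabilities, for example to allow for a parachute.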

Using the Halpern-Pearl definitions of causality on this model (Halpern, 2016), jumping caused death at both T1 and T2. The shooting caused death at T1 but not T2. (R code here – proper explanation to be completed, but you could try this companion blog post and the citation therein.)

Back then to the vaccine example, the counterfactuals rewrite to something like:

  • If the guy hadn’t been born at T0, then he wouldn’t have died at T2.
  • If the guy hadn’t been born at T0, then he couldn’t have chosen not to get a vaccine and died at T1.
  • If the guy had been born, but had decided to get the vaccine, he would still have been alive at T1 aged 60, but possibly not at T2 aged 80.
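A similarly hedged toy sketch shows what it means for a programme effect to be defined at each time point. The ages are assumptions for illustration. Unvaccinated, the man dies of Covid at about 60.5. Vaccinated, he lives to about 80. The effect at a time point is the vaccinated outcome minus the unvaccinated outcome at that point, with alive = 1 and dead = 0.

```python
def alive(age, vaccinated):
    """Toy assumption: death from Covid at 60.5 if unvaccinated, otherwise death at 80."""
    death_age = 80 if vaccinated else 60.5
    return int(age < death_age)


def effect_at(age):
    """Programme effect at one time point: vaccinated outcome minus unvaccinated outcome."""
    return alive(age, vaccinated=True) - alive(age, vaccinated=False)


time_points = {"T1": 60.75, "T2": 81}  # a few months after the decision, and two decades on
for label, age in time_points.items():
    print(label, age, effect_at(age))
```

In this toy model the vaccine has an effect of 1 at T1 and 0 at T2, mirroring the gunshot in the skyscraper example.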

References

Cook, T. D., Scriven, M., Coryn, C. L. S., & Evergreen, S. D. H. (2010). Contemporary Thinking About Causation in Evaluation: A Dialogue With Tom Cook and Michael Scriven. American Journal of Evaluation, 31(1), 105–117.

Halpern, J. Y. (2016). Actual causality. MIT Press.

Actual causes: two examples using the updated Halpern-Pearl definition

Halpern (2015) provides three variants of the Halpern-Pearl definitions of actual causation. I’m trying to get my head around the formalism, which is elegant, concise, and precise, but tedious to use in practice, so I wrote an R script to do the sums. This blog post is not self-contained – you will need to read the original paper for an introduction to the model. However, it works through two examples, which may help if you’re also struggling with the paper.

The second (“updated”) definition of an actual cause asserts that \(\vec{X} = \vec{x}\) is a cause of \(\varphi\) in \((M,\vec{u})\) iff the following conditions hold:

AC1 \((M,\vec{u}) \models (\vec{X} = \vec{x}) \land \varphi\).

This says, if \(\vec{X} = \vec{x}\) is an actual cause of \(\varphi\), then they both hold in the actual world, \((M,\vec{u})\). Note, for this condition, we are just looking at the model and not doing anything to it.

AC2 There is a partition of the endogenous variables in \(M\) into \(\vec{Z} \supseteq \vec{X}\) and \(\vec{W}\) and there are settings \(\vec{x}'\) and \(\vec{w}\) such that

(a) \((M,\vec{u}) \models [ \vec{X} \leftarrow \vec{x}', \vec{W} \leftarrow \vec{w}] \neg \varphi\).

So, we’re trying to show that undoing the cause, i.e., setting \(\vec{X}\) to \(\vec{x}' \ne \vec{x}\), prevents the effect. We are allowed to set \(\vec{W}\) however we like to show this, whilst leaving \(\vec{Z}-\vec{X}\) free to do whatever the model tells these variables to do.

(b) If \((M,\vec{u}) \models \vec{Z} = \vec{z^{\star}}\), for some \(\vec{z^{\star}}\), then for all \(\vec{W}' \subseteq \vec{W}\) and \(\vec{Z}' \subseteq \vec{Z}-\vec{X}\),
\((M,\vec{u}) \models [ \vec{X} \leftarrow \vec{x}, \vec{W}' \leftarrow \vec{w}', \vec{Z}' \leftarrow \vec{z^{\star}}] \varphi\),
where \(\vec{w}'\) is the restriction of \(\vec{w}\) to \(\vec{W}'\), and \(\vec{z^{\star}}\) is likewise restricted to \(\vec{Z}'\).

This says, trigger the cause (unlike AC1, we aren’t just looking to see whether it holds) and check that it still leads to the effect for every subset of \(\vec{Z}-\vec{X}\) held at its actual-world values and every subset of \(\vec{W}\) held at the values we found for AC2(a). Note that we are setting those variables, rather than just observing them.

AC3 \(\vec{X}\) is minimal: no strict subset of \(\vec{X}\), set to its values in \(\vec{x}\), satisfies AC1 and AC2.

This says, there’s no superfluous stuff in \(\vec{X}\). Under AC3, your taking a painkiller and waving a magic wand doesn’t cause your headache to disappear if the painkiller works without the wand.
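The definition is mechanical enough to check by brute force. Below is a minimal Python sketch. It is not the R script mentioned above, and it assumes three things: all variables are binary, the model is recursive with its equations listed in causal order, and the context \(\vec{u}\) is held fixed and folded into the equations. It demonstrates AC3 on the painkiller and wand example, using the hypothetical variable names Painkiller, Wand and HeadacheGone.

```python
from itertools import chain, combinations, product


def subsets(xs):
    """All subsets of xs, from the empty tuple up to xs itself."""
    xs = list(xs)
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))


def solve(model, order, do):
    """Evaluate equations in causal order; intervened variables keep their set values."""
    values = {}
    for var in order:
        values[var] = do[var] if var in do else model[var](values)
    return values


def ac2(model, order, cause, phi):
    """AC2 of the updated definition, by exhaustive search over partitions and settings."""
    actual = solve(model, order, {})
    X = list(cause)
    x = tuple(cause[v] for v in X)
    rest = [v for v in order if v not in cause]
    for W in subsets(rest):
        z_minus_x = [v for v in rest if v not in W]
        for x_alt in product([0, 1], repeat=len(X)):
            if x_alt == x:
                continue
            for w in product([0, 1], repeat=len(W)):
                w_set = dict(zip(W, w))
                # AC2(a): X <- x' and W <- w must make phi false.
                if phi(solve(model, order, {**dict(zip(X, x_alt)), **w_set})):
                    continue
                # AC2(b): with X <- x, phi must survive every W' (at w) and Z' (at actual values).
                if all(
                    phi(solve(model, order, {**cause,
                                             **{v: w_set[v] for v in w_sub},
                                             **{v: actual[v] for v in z_sub}}))
                    for w_sub in subsets(W)
                    for z_sub in subsets(z_minus_x)
                ):
                    return True
    return False


def is_actual_cause(model, order, cause, phi):
    actual = solve(model, order, {})
    # AC1: the cause and the effect both hold in the actual world.
    if not (all(actual[v] == val for v, val in cause.items()) and phi(actual)):
        return False
    if not ac2(model, order, cause, phi):
        return False
    # AC3: no strict, non-empty part of the cause also satisfies AC2
    # (AC1 holds for every part automatically once it holds for the whole).
    for part in subsets(cause):
        if 0 < len(part) < len(cause) and ac2(model, order, {v: cause[v] for v in part}, phi):
            return False
    return True


# Hypothetical model: the painkiller works and the wand does nothing.
painkiller = {
    "Painkiller": lambda v: 1,
    "Wand": lambda v: 1,
    "HeadacheGone": lambda v: v["Painkiller"],
}
order = ["Painkiller", "Wand", "HeadacheGone"]


def gone(v):
    return v["HeadacheGone"] == 1


print(is_actual_cause(painkiller, order, {"Painkiller": 1}, gone))             # True
print(is_actual_cause(painkiller, order, {"Painkiller": 1, "Wand": 1}, gone))  # False (AC3)
```

The search is exponential in the number of variables. That is fine for toy models like the ones in this post, but it will not scale much further.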

Example 1: an (actual) actual cause

Let’s give it a go with an overdetermined scenario (lightly edited from Halpern) in which Alice and Bob both lob bricks at a glasshouse and smash the glass. Define

\(\mathit{AliceThrow} = 1\)
\(\mathit{BobThrow} = 1\)
\(\mathit{GlassBreaks} = \mathit{max}(\mathit{AliceThrow},\mathit{BobThrow})\)

So, if either Alice or Bob (or both) hit the glasshouse, then the glass breaks. Strictly speaking, I should have set up one or more exogenous variables, \(\vec{u}\), that define the context and then defined \(\mathit{AliceThrow}\) and \(\mathit{BobThrow}\) in terms of \(\vec{u}\). It works fine to skip that step, as I have here, since I’m holding \(\vec{u}\) constant anyway.

Is \(\mathit{AliceThrow} = 1\) an actual cause of \(\mathit{GlassBreaks} = 1\)?

AC1 holds since \((M,\vec{u}) \models \mathit{AliceThrow} = 1 \land \mathit{GlassBreaks} = 1\). The first conjunct comes directly from one of the model equations and none of the functions change it. Spelling out the second conjunct,

\(\mathit{GlassBreaks} = \mathit{max}(\mathit{AliceThrow},\mathit{BobThrow})\)
\(= \mathit{max}(1, 1)\)
\(= 1\)

For AC2, we need to find a partition of the endogenous variables such that AC2(a) and AC2(b) hold. Try \(\vec{Z} = \{ \mathit{AliceThrow}, \mathit{GlassBreaks} \}\) and \(\vec{W}= \{ \mathit{BobThrow} \}\).

AC2(a) holds since \((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 0, \mathit{BobThrow} \leftarrow 0] \mathit{GlassBreaks} = 0\).

For AC2(b), we begin with \(\vec{Z} = \{ \mathit{AliceThrow}, \mathit{GlassBreaks} \}\) and the settings as per the unchanged model, so

\((M,\vec{u}) \models \mathit{AliceThrow} = 1 \land \mathit{GlassBreaks} = 1\).

We need to check that for all \(\vec{W}' \subseteq \vec{W}\) and \(\vec{Z}' \subseteq \vec{Z}-\vec{X}\),
\((M,\vec{u}) \models [ \vec{X} \leftarrow \vec{x}, \vec{W}' \leftarrow \vec{w}', \vec{Z}' \leftarrow \vec{z^{\star}}] \varphi\).

Here are the combinations and \(\varphi \equiv \mathit{GlassBreaks} = 1\) holds for all of them:

\((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1, \mathit{GlassBreaks} \leftarrow 1, \mathit{BobThrow} \leftarrow 0 ] \varphi\)
\((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1, \mathit{BobThrow} \leftarrow 0 ] \varphi\)
\((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1, \mathit{GlassBreaks} \leftarrow 1 ] \varphi\)
\((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1 ] \varphi \)

(The first and third are rather trivially true, since they set \(\mathit{GlassBreaks}\) to 1 directly. As far as I understand, however, they still have to be checked given the definition.)

AC3 is easy since the cause only has one variable, so there’s nothing superfluous.
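The AC2(a) check and the four AC2(b) checks above can be reproduced mechanically. This minimal Python sketch hard-codes the glasshouse model and the partition \(\vec{Z} = \{\mathit{AliceThrow}, \mathit{GlassBreaks}\}\), \(\vec{W} = \{\mathit{BobThrow}\}\), with \(\vec{w}\) given by \(\mathit{BobThrow} = 0\). The function and variable names are illustrative only.

```python
from itertools import chain, combinations


def glass_breaks(do):
    """Solve the glasshouse model under interventions `do`; return phi: GlassBreaks = 1."""
    alice = do.get("AliceThrow", 1)
    bob = do.get("BobThrow", 1)
    glass = do.get("GlassBreaks", max(alice, bob))
    return glass == 1


def subsets(xs):
    """All subsets of xs, from the empty tuple to xs itself."""
    return list(chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))


# AC2(a): X <- 0 (AliceThrow) and W <- 0 (BobThrow) must make phi false.
ac2a = not glass_breaks({"AliceThrow": 0, "BobThrow": 0})

# AC2(b): X <- 1; each W' subset of {BobThrow} at w = 0; each Z' subset of
# Z - X = {GlassBreaks} at its actual value z* = 1.
checks = []
for w_sub in subsets(["BobThrow"]):
    for z_sub in subsets(["GlassBreaks"]):
        do = {"AliceThrow": 1, **{v: 0 for v in w_sub}, **{v: 1 for v in z_sub}}
        checks.append((do, glass_breaks(do)))

for do, holds in checks:
    print(do, holds)
print("AC2(a):", ac2a, "AC2(b):", all(holds for _, holds in checks))
```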

Example 2: not an actual cause

Now let’s try an example that isn’t an actual cause: the glass breaking causes Alice to throw the brick. It’s obviously false; however, it wasn’t clear to me exactly where it would fail until I worked through this…

AC1 holds since in the actual world, \(\mathit{GlassBreaks} = 1\) and \(\mathit{AliceThrow} = 1\) hold.

The function definitions don’t provide a way for a change in \(\mathit{GlassBreaks}\) to change \(\mathit{AliceThrow}\), so the only apparent way to make \(\mathit{AliceThrow} = 0\) for AC2(a) is through \(\vec{W}\). Therefore, use the partition \(\vec{W} = \{\mathit{AliceThrow}\}\) and \(\vec{Z} = \{\mathit{GlassBreaks}, \mathit{BobThrow}\}\).

Now for AC2(a), we can easily get \(\mathit{AliceThrow} = 0\) as required, since we can do what we like with \(\vec{W}\). It doesn’t help when we move on to AC2(b), because taking \(\vec{W}' = \vec{W}\) there means holding \(\mathit{AliceThrow} = 0\), which is the negation of what we want. The same is the case for the other partition that includes \(\mathit{AliceThrow}\) in \(\vec{W}\), i.e., \(\vec{W} = \{ \mathit{AliceThrow}, \mathit{BobThrow} \}\).

So, the broken glass does not cause Alice to throw a brick. The setup we needed to get through AC2(a) set us up to fail AC2(b).
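The same argument can be checked exhaustively. This Python sketch uses the same illustrative glasshouse model, with cause \(\mathit{GlassBreaks} = 1\) and \(\varphi \equiv \mathit{AliceThrow} = 1\). It tries every choice of \(\vec{W}\) and every binary setting \(\vec{w}\), and records which settings pass AC2(a) and which also pass AC2(b).

```python
from itertools import chain, combinations, product


def solve(do):
    """Glasshouse model: both throws are 1 unless intervened on."""
    v = {}
    v["AliceThrow"] = do.get("AliceThrow", 1)
    v["BobThrow"] = do.get("BobThrow", 1)
    v["GlassBreaks"] = do.get("GlassBreaks", max(v["AliceThrow"], v["BobThrow"]))
    return v


def phi(v):
    """Candidate effect: AliceThrow = 1."""
    return v["AliceThrow"] == 1


def subsets(xs):
    """All subsets of xs, from the empty tuple to xs itself."""
    return list(chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1)))


actual = solve({})
others = ["AliceThrow", "BobThrow"]  # endogenous variables other than X = GlassBreaks
passes_a, passes_b = [], []
for W in subsets(others):
    z_minus_x = [v for v in others if v not in W]
    for w in product([0, 1], repeat=len(W)):
        w_set = dict(zip(W, w))
        # AC2(a): GlassBreaks <- 0 together with W <- w must make phi false.
        if phi(solve({"GlassBreaks": 0, **w_set})):
            continue
        passes_a.append(w_set)
        # AC2(b): GlassBreaks <- 1 with every W' (at w) and Z' (at actual values) must keep phi.
        if all(
            phi(solve({"GlassBreaks": 1,
                       **{v: w_set[v] for v in w_sub},
                       **{v: actual[v] for v in z_sub}}))
            for w_sub in subsets(W)
            for z_sub in subsets(z_minus_x)
        ):
            passes_b.append(w_set)

print("passes AC2(a):", passes_a)
print("also passes AC2(b):", passes_b)
```

Every setting that passes AC2(a) sets \(\mathit{AliceThrow} = 0\), and the list for AC2(b) comes out empty.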

References

Halpern, J. Y. (2015). A Modification of the Halpern-Pearl Definition of Causality. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), 3022–3033.

See also this companion blog post.

Asking counterfactuals to think about controversial policies

“As an anecdote: one of us lives in a 1930s neighbourhood. The local municipality proposed converting a road connecting the neighbourhood to other neighbourhoods from a two-direction road to a one-way road. This would apply to motorized traffic only, not to bicycles. Many people protested because of the detour they would have to take by car. We asked the counterfactual question: suppose the city would have introduced the one-way road decades ago, would they then heavily support a policy to make it a two-direction road? Several people we asked did not know, the main reason for not supporting that change being that it would result in more traffic, noise and pollution, and a reduction in safety.”

– Van Wee et al. (2023, p. 84)

Reference

Van Wee, B., Annema, J. A., & Van Barneveld, S. (2023). Controversial policies: Growing support after implementation. A discussion paper. Transport Policy, 139, 79–86.

What is a counterfactual?

What’s a counterfactual? Philosophers love the example, “If Oswald hadn’t killed Kennedy, someone else would have”. More generally, \(Y\) would be \(y\) had \(X\) been \(x\) in situation \(U = u\) (Judea Pearl’s 2011 rendering).
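In Pearl’s structural account, that sentence has a precise reading. Given a causal model \(M\), let \(M_x\) be the submodel in which the equation for \(X\) is replaced by the constant \(X = x\). The counterfactual is then the value of \(Y\) obtained by solving \(M_x\) in situation \(U = u\):

\[ Y_x(u) \triangleq Y_{M_x}(u), \]

and “\(Y\) would be \(y\) had \(X\) been \(x\) in situation \(U = u\)” is the claim \(Y_x(u) = y\).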

References

Pearl, J. (2011). The structural theory of causation. In P. McKay Illari, F. Russo, & J. Williamson (Eds.), Causality in the Sciences (pp. 697–727). Oxford University Press.

It’s all theory-based and counterfactual

Two of my favourite articles on evaluation are Cook’s (2000) argument that all impact evaluations, RCTs included, are theory-based and Reichardt’s (2022) argument that there’s always a counterfactual, if not explicitly articulated then not far beneath the surface. I think both arguments are irrefutable, but how we can build on theirs and others’ work to improve evaluation commissioning and delivery seems a formidable challenge given the fiercely defended dichotomies in the field.

If all impact evaluation really is theory-based, then it’s clear there’s huge variation in the quality of theories and theorising. If all impact evaluation depends on counterfactuals, then there is huge variation in how compelling the evidence is for the counterfactual outcomes, particularly when there is no obvious comparison group.

Clarifying these kinds of distinctions is, I think, important for improving evaluations and the public services and other programmes they evaluate.

References

Cook, T. D. (2000). The false choice between theory-based evaluation and experimentation. In A. Petrosino, P. J. Rogers, T. A. Huebner, & T. A. Hacsi (Eds.), New Directions for Evaluation: Program Theory in Evaluation: Challenges and Opportunities (pp. 27–34). Jossey-Bass.

Reichardt, C. S. (2022). The Counterfactual Definition of a Program Effect. American Journal of Evaluation, 43(2), 158–174.

Counterfactual evaluation

Consider the following two sentences:

(1) Alex’s train left 2 minutes before they arrived at the platform.

(2) If Alex had arrived at the platform 10 minutes earlier, then they probably would have caught their train.

Is the counterfactual in sentence 2 true or false, or can’t you tell because you didn’t run an RCT?

I reckoned that the counterfactual is true. I reasoned that Alex probably missed the train because they were late, so turning up earlier would have fixed that.

I could think of other possible outcomes, but they became increasingly contrived and far from the (albeit minimal) evidence provided. For instance, it is conceivable that if Alex had arrived earlier, they would have believed they had time to pop to Pret for a coffee – and missed the train again.
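Pearl’s three steps for evaluating a counterfactual are abduction, action and prediction. They can be sketched for sentence (2) under a deliberately simple model, which is an assumption rather than anything stated above. Alex catches the train if and only if they reach the platform no later than it departs, and the departure time does not depend on when Alex arrives. Times are in minutes on an arbitrary clock.

```python
def caught(arrival, departure):
    """Assumed mechanism: Alex catches the train iff they arrive no later than it departs."""
    return arrival <= departure


# Evidence from sentence (1): Alex arrived at minute 0, two minutes after the train left.
observed_arrival = 0

# Abduction: recover the background fact consistent with the evidence (the departure time).
departure = observed_arrival - 2

# Action: intervene on arrival, moving it 10 minutes earlier; departure is left untouched.
counterfactual_arrival = observed_arrival - 10

# Prediction: re-run the mechanism in the modified world.
print(caught(observed_arrival, departure))        # False, consistent with sentence (1)
print(caught(counterfactual_arrival, departure))  # True, so sentence (2) holds in this model
```

This model has no mechanism linking an earlier arrival to a detour (no Pret), so the counterfactual comes out true with certainty. The “probably” in sentence (2) leaves room for exactly the kind of mechanism this sketch omits.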

Seven ways to estimate a counterfactual

Experimental and quasi-experimental evaluations usually define a programme effect as the difference between (a) the actual outcome following a social programme and (b) an estimate of what the outcome would have been without the programme – the counterfactual outcome. (The latter might be a competing programme or some genre of “business as usual”.)

It is also usually argued that qualitative or so-called “theory-based” approaches to evaluation are not counterfactual evaluations. Reichardt (2022) adds to a slowly accumulating body of work that challenges this and argues that any approach to evaluation can be understood in counterfactual terms.

Reichardt provides seven examples of evaluation approaches, quantitative and qualitative, and explains how a counterfactual analysis is relevant:

  1. Comparisons Across Participants. RCTs and friends. The comparison group is used to estimate the counterfactual. (Note: the comparison group is not the counterfactual. A comparison group is factual.)
  2. Before-After Comparisons. The baseline score is often treated as the counterfactual outcome (though it probably isn’t one, e.g., due to regression to the mean).
  3. What-If Assessments. Asking participants to reflect on a counterfactual like, “How would you have felt without the programme?” Participants provide the estimate of the counterfactual, the evaluators use it to estimate the effect.
  4. Just-Tell-Me Assessments. Cites Copestake (2014): “If we are interested in finding out whether particular men, women or children are less hungry as a result of some action it seems common-sense just to ask them.” In this case participants may be construed as carrying out the “What-If” assessment of the previous point and using this to work out the programme effect themselves.
  5. Direct Observation. Simply seeing the causal effect rather than inferring it. An example given is tapping a car brake and seeing the effect. Not sure I buy this one, and neither does Reichardt. Whatever it is, I agree a counterfactual of some sort is needed (and inferred): you need a theory to explain what would have happened had you not tapped the brake.
  6. Theories-of-Change Assessments. Contribution analysis and realist evaluation are offered as examples. The gist is, despite what proponents of these approaches claim, to use a theory of change to work out whether the programme is responsible for or “contributes to” outcomes, you need to use the theory of change to think about the counterfactual. I’ve blogged about realist evaluation and contribution analysis elsewhere and their definitions of a causal effect.
  7. The Modus Operandi (MO) Method. The evaluator looks for evidence of traces or tell-tales that the programme worked. Not sure I quite get how this differs from theory-of-change assessments. Maybe it doesn’t. It sounds like potentially another way to evidence the causal chains in a theory of change.
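The regression-to-the-mean caveat in item 2 is easy to see in a small simulation. This Python sketch rests on made-up assumptions, not on anything in Reichardt (2022). Each score is a stable ability plus normally distributed measurement error. Only people scoring below a cutoff at baseline are enrolled, enrollees are randomly split into programme and control groups, and by default the true programme effect is zero.

```python
import random
from statistics import mean


def simulate(n=100_000, cutoff=-1.0, true_effect=0.0, seed=1):
    """Return (before-after estimate, comparison-group estimate) of a programme effect."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)             # stable trait, never measured directly
        baseline = ability + rng.gauss(0, 1)  # noisy baseline measurement
        if baseline >= cutoff:                # only low scorers are enrolled
            continue
        in_programme = rng.random() < 0.5     # random allocation among enrollees
        effect = true_effect if in_programme else 0.0
        followup = ability + effect + rng.gauss(0, 1)  # fresh measurement noise
        (treated if in_programme else control).append((baseline, followup))
    # Before-after: treat each participant's baseline score as their counterfactual outcome.
    before_after = mean(f - b for b, f in treated)
    # Comparison across participants: use the control group to estimate the counterfactual.
    comparison = mean(f for _, f in treated) - mean(f for _, f in control)
    return before_after, comparison


before_after, comparison = simulate()
print(f"before-after: {before_after:.2f}, comparison group: {comparison:.2f}")
```

With a true effect of zero, the before-after estimate comes out well above zero, because low baseline scorers drift back up when remeasured. The comparison-group estimate stays close to zero. If `true_effect` is set to a nonzero value, both estimates shift by that amount, so the before-after estimate still carries the regression-to-the-mean bias.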

The conclusion:

“I suspect there is no viable alternative to the counterfactual definition of an effect and that when the counterfactual definition is not given explicitly, it is being used implicitly. […] Of course, evaluators are free to use an alternative to the counterfactual definition of a program effect, if an adequate alternative can be found. But if an alternative definition is used, evaluators should explicitly describe that alternative definition and forthrightly demonstrate how their definition undergirds their methodology […].”

I like four of the seven, as kinds of evidence used to infer the counterfactual outcome. I also propose a fifth: evaluator opinion.

  1. Comparisons Across Participants.
  2. Before-After Comparisons.
  3. What-If Assessments.
  4. Just-Tell-Me Assessments.
  5. Evaluator opinion.

The What-If and Just-Tell-Me assessments could involve subject experts rather than only a programme’s beneficiaries, which would affect how those assessments are interpreted, particularly if the experts have a vested interest. To me, the Theory-of-Change Assessment in Reichardt’s original could be carried out with the help of one or more of these five. They are all ways to justify causal links (via mediating or intermediate variables), not just to evaluate outcomes, and they help assess the validity of a theory of change. Readers may not find them all equally compelling, though, particularly the last.

References

Copestake, J. (2014). Credible impact evaluation in complex contexts: Confirmatory and exploratory approaches. Evaluation, 20(4), 412–427.

Reichardt, C. S. (2022). The Counterfactual Definition of a Program Effect. American Journal of Evaluation, 43(2), 158–174.