The Neyman–Rubin causal model (see, e.g., Rubin, 2008) has the following elements:
- Units, physical entities somewhere/somewhen in spacetime such as someone in Camden Town, London, on a Thursday eve.
- Two or more interventions, where one is often considered a “control”, e.g., cognitive behavioural therapy (CBT) as usual for anxiety, and another is considered a “treatment”, e.g., a new chat bot app to alleviate anxiety. The “control” does not have to be (and almost certainly cannot be) “nothing”.
- Potential outcomes, which represent outcomes following each intervention (e.g., following treatment and following control) for every unit. Alas, only one potential outcome is realised and observed for a unit, depending on which intervention they actually received. This is what makes causal inference such a challenge.
- Zero or more pre-intervention covariates, which are measured for all units.
- The causal effect is the difference in potential outcomes between two interventions for a unit, e.g., in levels of anxiety for someone following CBT and following the app intervention. It is impossible to obtain the causal effect for an individual unit since only one potential outcome can be realised.
- The assignment mechanism is the conditional probability distribution of being in an intervention group, given covariates and potential outcomes. For randomised experiments, the potential outcomes have no influence on the assignment probability. This assignment mechanism also explains which potential outcomes are realised and which are missing data.
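To make these elements concrete, here is a minimal sketch in Python using simulated data. The anxiety scores, the field names (`y_cbt`, `y_app`, and so on) and the sample size are all made up for illustration; in a real study, both potential outcomes are never available for the same unit.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical table of potential outcomes: an anxiety score under each
# intervention for every unit. Only a simulation can fill in both columns.
units = []
for i in range(6):
    y_cbt = random.randint(5, 15)           # outcome if given CBT ("control")
    y_app = y_cbt + random.randint(-4, 2)   # outcome if given the app ("treatment")
    units.append({"id": i, "y_cbt": y_cbt, "y_app": y_app})

# Assignment mechanism: a fair coin toss that ignores the potential outcomes.
for u in units:
    u["assigned"] = random.choice(["cbt", "app"])
    # Only the potential outcome for the assigned intervention is realised.
    u["observed"] = u["y_app"] if u["assigned"] == "app" else u["y_cbt"]
    # What the analyst sees: the other potential outcome is missing (None).
    u["seen_y_cbt"] = u["y_cbt"] if u["assigned"] == "cbt" else None
    u["seen_y_app"] = u["y_app"] if u["assigned"] == "app" else None

for u in units:
    print(u["id"], u["assigned"], u["seen_y_cbt"], u["seen_y_app"])
```

Each printed row has exactly one missing value, which is the missing data problem the assignment mechanism describes.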
Although the causal effect cannot be obtained for individual units, various average causal effects can be estimated if particular assumptions hold, e.g.,
- Sample average treatment effect on the treated (SATT or SATET), which is the mean difference in a pair of potential outcomes (e.g., anxiety following the app minus anxiety following CBT) for those in a sample who were exposed to the “treatment” (e.g., the app).
- Sample average treatment effect (SATE), which is the mean difference between a pair of potential outcomes for everyone in a sample.
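In symbols, the two sample averages can be written as follows. The notation is an assumption chosen for illustration rather than taken from the text: $Y_i(1)$ and $Y_i(0)$ are unit $i$'s potential outcomes under treatment and control, $W_i \in \{0, 1\}$ indicates whether the unit received the treatment, $n$ is the sample size, and $n_1 = \sum_i W_i$ is the number treated.

```latex
\[
\mathrm{SATE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i(1) - Y_i(0) \bigr),
\qquad
\mathrm{SATT} = \frac{1}{n_1} \sum_{i \,:\, W_i = 1} \bigl( Y_i(1) - Y_i(0) \bigr).
\]
```

Both averages involve, for every unit, one potential outcome that is never observed, which is why they have to be estimated rather than computed directly.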
How does this work?
Suppose we run a randomised trial where people are assigned to either CBT or app based on the outcome of a coin toss. Of each participant’s two potential outcomes, we observe only one, depending on which group they were assigned to. But since we randomised, we know the missing data mechanism. It turns out that under a coin toss randomised trial, a good estimate of the sample average treatment effect is simply the difference between the mean observed outcome for those assigned to the app and the mean observed outcome for those assigned to CBT.
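A small simulation can illustrate this. The sketch below invents both potential outcomes for each participant (something only a simulation allows), tosses a coin to assign groups, and compares the difference in observed means with the true sample average treatment effect. The distributions, effect size and variable names are assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the illustration is reproducible

# Simulated potential outcomes: hypothetical anxiety scores, lower is better.
n = 2000
y_cbt = [random.gauss(10, 2) for _ in range(n)]
# The app's effect varies from person to person, averaging about -1.
y_app = [y + random.gauss(-1, 1) for y in y_cbt]

# True SATE: computable here only because both potential outcomes were simulated.
true_sate = mean(a - c for a, c in zip(y_app, y_cbt))

# Coin-toss assignment, independent of the potential outcomes.
gets_app = [random.random() < 0.5 for _ in range(n)]

# Observed outcomes: one potential outcome per participant.
obs_app = [a for a, w in zip(y_app, gets_app) if w]
obs_cbt = [c for c, w in zip(y_cbt, gets_app) if not w]

# Difference-in-means estimate of the SATE.
estimate = mean(obs_app) - mean(obs_cbt)
print(f"true SATE: {true_sate:.2f}, difference in means: {estimate:.2f}")
```

With a sample this large the two numbers should be close; rerunning with a much smaller `n` shows how far a single trial's estimate can drift from the true value.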
We can also calculate p-values in a variety of ways. One is to assume a null hypothesis of no difference between potential outcomes in the treatment and control conditions, i.e., the potential outcomes are identical for each participant but may vary between participants. Under this particular “sharp” null, we do not have a missing data problem since we can just use whatever outcome was observed for each participant to fill in the blank for the unobserved potential outcome. Since we know the assignment mechanism, it is possible to work out the distribution of possible mean differences under the null by enumerating all possible random assignments to groups and calculating the mean difference between treatment and control for each (in practice there may be too many, but we can approximate by taking a random subset). The p-value is then the proportion of this null distribution that is at least as large as the mean difference actually observed.
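The procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name `randomisation_p_value`, its parameters and the example data are hypothetical, it approximates the null distribution with random draws instead of full enumeration, and it skips draws that leave a group empty, since the mean difference is undefined for them.

```python
import random
from statistics import mean

def randomisation_p_value(outcomes, gets_app, n_draws=5000, seed=0):
    """Approximate one-sided p-value under the sharp null of no effect for anyone."""
    rng = random.Random(seed)

    def mean_diff(assignment):
        app = [y for y, w in zip(outcomes, assignment) if w]
        cbt = [y for y, w in zip(outcomes, assignment) if not w]
        if not app or not cbt:
            return None  # undefined when a group is empty
        return mean(app) - mean(cbt)

    observed = mean_diff(gets_app)
    null_diffs = []
    while len(null_diffs) < n_draws:
        # Re-run the coin toss. Under the sharp null each participant's outcome
        # is the same whichever group they land in, so outcomes stay fixed.
        draw = [rng.random() < 0.5 for _ in outcomes]
        d = mean_diff(draw)
        if d is not None:
            null_diffs.append(d)
    # Proportion of null differences at least as large as the observed one.
    p = sum(d >= observed for d in null_diffs) / len(null_diffs)
    return observed, p

# Hypothetical data: reduction in anxiety score for 12 participants.
outcomes = [4, 6, 5, 7, 8, 6, 2, 3, 1, 4, 2, 3]
gets_app = [True] * 6 + [False] * 6
observed, p = randomisation_p_value(outcomes, gets_app)
print(f"observed difference: {observed}, approximate p-value: {p}")
```

This follows the one-sided "observed or larger" comparison described above; a two-sided version would compare absolute values of the differences instead.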
What’s lovely about this potential outcomes approach is that it’s a simple starting point for thinking about a variety of ways of evaluating the impact of interventions, though working out the consequences, e.g., standard errors for estimators, may be non-trivial.
References
Rubin, D. B. (2008). For objective causal inference, design trumps analysis. Annals of Applied Statistics, 2(3), 808–840.