‘The intervening mechanism evaluation approach assesses whether the causal assumptions underlying a program are functioning as stakeholders had projected (Chen, 1990). […] It is not always labeled in the same way by those who apply it. Some evaluators have referred to it as “theory of change evaluation” (Connell, Kubisch, Schorr, & Weiss, 1995) or “theory-based evaluation” (Rogers, Hacsi, Petrosino, & Huebner, 2000; Weiss, 1997).’
Excerpts from Lind (1753), with help on the ye olde English from others who have quoted him (Hughes, 1975; Bartholomew, 2002; Weber & De Vreese, 2005).
Lind’s study is sometimes presented as an RCT, but it’s not clear how his patients were assigned to groups, just that the cases “were as similar as I could have them” (see discussion in Weber & De Vreese, 2005). Bartholomew (2002) argues that Lind was convinced scurvy was a disease of the digestive system and warns against quoting the positive outcomes for oranges and lemons (and cider) out of the broader context of Lind’s other work.
Here’s what Lind said he did:
“On the 20th May, 1747, I took twelve patients in the scurvy on board the Salisbury at sea. Their cases were as similar as I could have them. They all in general had putrid gums, the spots and lassitude, with weakness of their knees. They lay together in one place, being a proper apartment for the sick in the fore-hold; and had one diet in common to all, viz., water gruel sweetened with sugar in the morning; fresh mutton broth often times for dinner; at other times puddings, boiled biscuit with sugar etc.; and for supper barley, raisins, rice and currants, sago and wine, or the like.”
Groups (n = 2 in each):
- “ordered each a quart of cyder a day”
- “twenty five gutts of elixir vitriol three times a day upon an empty stomach, using a gargle strongly acidulated with it for their mouths.”
- “two spoonfuls of vinegar three times a day upon an empty stomach”
- “a course of sea water”
- “two oranges and one lemon given them every day. These they eat with greediness”
- “The two remaining patients took the bigness of a nutmeg three times a day of an electuary recommended by an hospital surgeon made of garlic, mustard seed, rad. raphan., balsam of Peru and gum myrrh, using for common drink barley water well acidulated with tamarinds, by a decoction of which, with the addition of cremor tartar, they were gently purged three or four times during the course”
Excerpt from the study outcomes:
- “The consequence was that the most sudden and visible good effects were perceived from the use of the oranges and lemons; one of those who had taken them, being at the end of six days fit for duty”
- “Next to the oranges, I thought the cyder had the best effects”
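Lind doesn’t say how he allocated patients to treatments. For contrast, here is a minimal sketch of what random allocation of twelve patients to six arms of two could look like. The arm labels are my shorthand for his treatments and the patient numbers and seed are hypothetical. Shuffling and then dealing out consecutive pairs makes each of the 12!/(2!)^6 = 7,484,400 possible allocations equally likely.

```python
import random

# Shorthand labels for Lind's six treatments (my abbreviations, not his wording)
ARMS = ["cider", "elixir vitriol", "vinegar", "sea water",
        "oranges and lemons", "electuary"]


def allocate(patient_ids, arms, per_arm, seed=None):
    """Randomly allocate patients to arms, exactly `per_arm` patients per arm."""
    patients = list(patient_ids)
    if len(patients) != len(arms) * per_arm:
        raise ValueError("number of patients must equal arms * per_arm")
    rng = random.Random(seed)        # seeded generator so the allocation is reproducible
    rng.shuffle(patients)            # uniformly random permutation of the patients
    # Deal consecutive blocks of the shuffled list to each arm in turn
    return {arm: patients[i * per_arm:(i + 1) * per_arm]
            for i, arm in enumerate(arms)}


allocation = allocate(range(1, 13), ARMS, per_arm=2, seed=1747)
for arm, patients in allocation.items():
    print(f"{arm}: patients {patients}")
```

Randomisation wouldn’t fix the other problem, though: with two patients per arm, chance alone could easily reorder the treatments.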
Bartholomew, M. (2002). James Lind’s Treatise of the Scurvy (1753). Postgraduate Medical Journal, 78, 695–696.
Hughes, R. E. (1975). James Lind and the cure of Scurvy: An experimental approach. Medical History, 19(4), 342–351.
Weber, E., & De Vreese, L. (2005). The causes and cures of scurvy. How modern was James Lind’s methodology? Logic and Logical Philosophy, 14(1), 55–67.
Some of the social research and evaluation papers I encounter include declarations of the authors’ metaphysical stance: social constructionist, realist (critical or otherwise), phenomenologist – and sometimes a dig at positivism. This is one way research and researchers are classified. Clearly there are different kinds of research; however, might it be easiest to see the differences in terms of research goals rather than jargon-heavy isms? Here are three examples of goals, to try to explore what I mean.
Evoke empathy. If you can’t have a chat with someone then the next best way to empathise with them is via a rich description by or about them. There is a bucket-load of pretentiousness in the literature (search for “thick description” to find some). But skip over this and there are wonderful works that are simply stories. Biographies you read which make you long to meet the subject. Film documentaries, though not fitting easily into traditional research output, are another. Anthologies gathering expressions of people’s lived experience without a researcher filter. “Interpretative Phenomenological Analyses” manage to include stories too, though with more metaphysics.
Classify. This may be the classification of perspectives, attitudes, experiences, processes, organisations, or other stuff-that-happens in society. For example: social class, personality, experiences people have in psychological therapy, political orientation, emotional experiences. The goal here is to develop patterns, whether from thematic analysis of interview responses, latent class analysis of answers on Likert scales, or some other kind of data and analysis. There’s no escaping theory, articulated and debated or unarticulated and unchallenged, when doing this.
Predict. Do people occupying a particular social class location tend to experience some mental health difficulties more often than others? Does your personality predict the kinds of books you like to read? Do particular events predict an emotion you will feel? Other predictions concern the impact of interventions of various kinds (broadly construed). What would happen if you funded national access to cognitive behavioural therapy or universal basic income? Theory matters here too, usually involving a story or model of why variables relate to each other. Prediction can be statistical or may involve gathering expert opinion (expertise by lived experience or by profession).
These goals cannot be straightforwardly mapped onto quantitative and qualitative data and analysis. As a colleague and I wrote (Fugard & Potts, 2016):
“Some qualitative research develops what looks like a taxonomy of experiences or phenomena. Much of this isn’t even framed as qualitative. Take for example Gray’s highly-cited work classifying type 1 and type 2 synapses. His labelled photos of cortex slices illustrate beautifully the role of subjectivity in qualitative analysis and there are clear questions about generalisability. Some qualitative analyses use statistical models of quantitative data, for example latent class analyses showing the different patterns of change in psychological therapies.”
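Latent class analysis comes up twice above. Here is a minimal sketch of how it works for binary items (say, Likert answers dichotomised into agree/not agree), fitted by the expectation-maximisation (EM) algorithm. Each person belongs to one of K unobserved classes, and within a class the items are assumed independent. The function name and the simulated data are hypothetical. Ordinal Likert items would need a multinomial version, and a real analysis would use several random starts, compare numbers of classes, and probably use dedicated software.

```python
import numpy as np


def lca_binary(X, n_classes, n_iter=200, seed=0):
    """Fit a latent class model to binary data X (people x items) by EM.

    Returns class weights, P(item = 1 | class), posterior class
    memberships, and the log-likelihood recorded at each iteration.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    weights = np.full(n_classes, 1.0 / n_classes)           # P(class)
    probs = rng.uniform(0.25, 0.75, size=(n_classes, m))    # random starting values
    history = []
    for _ in range(n_iter):
        # E-step: log P(x_i, class k), assuming items independent within a class
        log_joint = (np.log(weights)
                     + X @ np.log(probs).T
                     + (1 - X) @ np.log(1 - probs).T)        # shape (n, n_classes)
        log_lik_i = np.logaddexp.reduce(log_joint, axis=1)   # log P(x_i)
        post = np.exp(log_joint - log_lik_i[:, None])        # P(class k | x_i)
        history.append(log_lik_i.sum())
        # M-step: re-estimate class sizes and item probabilities from posteriors
        weights = post.mean(axis=0)
        probs = (post.T @ X) / post.sum(axis=0)[:, None]
        probs = np.clip(probs, 1e-6, 1 - 1e-6)               # keep the logs finite
    return weights, probs, post, history


# Hypothetical data: 1,000 people, 5 agree/disagree items, two classes (30% / 70%)
rng = np.random.default_rng(42)
true_probs = np.array([[0.9, 0.8, 0.85, 0.9, 0.8],
                       [0.1, 0.2, 0.15, 0.1, 0.2]])
in_first = rng.random(1000) < 0.3
X = (rng.random((1000, 5)) < true_probs[np.where(in_first, 0, 1)]).astype(float)

weights, probs, post, history = lca_binary(X, n_classes=2)
print("class sizes:", np.round(weights, 2))
print("P(agree | class):", np.round(probs, 2))
```

The classes come out in arbitrary order (label switching), so which one is “class 1” has to be read off the item probabilities, which is where the theory, articulated or not, creeps back in.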
What I personally want to see, as an avid reader of research, is a summary of the theory – topic-specific, substantive theory rather than metaphysical – that researchers had before launching into gathering data; how they plan to analyse the data; and what they think about the theory when they have finished. Ideally I also want to know something about the politics driving the research, whether expressed in terms of conflicts of interest or the authors’ position on inequity or oppression investigated in a study. Reflections on ontological realism and epistemic relativity – less so.
One of my favourite examples of pointless debate is published in the 300-page The Positivist Dispute in German Sociology. A big chunk of the debate is people attacking Popper’s positivism. Popper’s withering response notes that he wrote a book criticising positivism:
“This is an old misunderstanding created and perpetuated by people who know of my work only at second-hand: owing to the tolerant attitude adopted by some members of the Vienna Circle, my book, Logik der Forschung [The Logic of Scientific Discovery], in which I criticized this positivist Circle from a realist and anti-positivist point of view, was published in a series of books edited by Moritz Schlick and Philipp Frank, two leading members of the Circle; and those who judge books by their covers (or by their editors) created the myth that I had been a member of the Vienna Circle, and a positivist. Nobody who has read that book (or any other book of mine) would agree – unless indeed he believed in the myth to start with, in which case he may of course find evidence to support his belief.” (Popper, 1976, p. 290)
Popper, K. R. (1976). Reason or revolution? In T. W. Adorno, H. Albert, R. Dahrendorf, J. Habermas, H. Pilot, & K. R. Popper, The Positivist Dispute in German Sociology (pp. 288–300). Heinemann Educational Books.
Huey Chen (1990) solved many issues that are still endlessly discussed in evaluation, e.g., the role of stakeholder theories versus social science theories and the different ways theories can be tested. Here’s a useful summary of core elements of a theory-driven approach (Coryn et al., 2011, Table 1, p. 205):
1. Theory-driven evaluations/evaluators should formulate a plausible program theory
a. Formulate program theory from existing theory and research (e.g., social science theory)
b. Formulate program theory from implicit theory (e.g., stakeholder theory)
c. Formulate program theory from observation of the program in operation/exploratory research (e.g., emergent theory)
d. Formulate program theory from a combination of any of the above (i.e., mixed/integrated theory)
2. Theory-driven evaluations/evaluators should formulate and prioritize evaluation questions around a program theory
a. Formulate evaluation questions around program theory
b. Prioritize evaluation questions
3. Program theory should be used to guide planning, design, and execution of the evaluation under consideration of relevant contingencies
a. Design, plan, and conduct evaluation around a plausible program theory
b. Design, plan, and conduct evaluation considering relevant contingencies (e.g., time, budget, and use)
c. Determine whether evaluation is to be tailored (i.e., only part of the program theory) or comprehensive
4. Theory-driven evaluations/evaluators should measure constructs postulated in program theory
a. Measure process constructs postulated in program theory
b. Measure outcome constructs postulated in program theory
c. Measure contextual constructs postulated in program theory
5. Theory-driven evaluations/evaluators should identify breakdowns, side effects, determine program effectiveness (or efficacy), and explain cause-and-effect associations between theoretical constructs
a. Identify breakdowns, if they exist (e.g., poor implementation, unsuitable context, and theory failure)
b. Identify anticipated (and unanticipated), unintended outcomes (both positive and negative) not postulated by program theory
c. Describe cause-and-effect associations between theoretical constructs (i.e., causal description)
d. Explain cause-and-effect associations between theoretical constructs (i.e., causal explanation)
i. Explain differences in direction and/or strength of relationship between program and outcomes attributable to moderating factors/variables
ii. Explain the extent to which one construct (e.g., intermediate outcome) accounts for/mediates the relationship between other constructs
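Items 5.d.i and 5.d.ii are moderation and mediation. Here is a minimal sketch using ordinary least squares on simulated data, assuming linear relationships and no unmeasured confounding of mediator and outcome. The indirect effect is estimated as the product of the program-to-mediator and mediator-to-outcome coefficients; moderation is a program × context interaction term. All variable names and effect sizes are hypothetical.

```python
import numpy as np


def ols(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Simulated data; all names and effect sizes are hypothetical
rng = np.random.default_rng(2011)
n = 5000
program = rng.integers(0, 2, n).astype(float)      # 1 = received the program
context = rng.normal(size=n)                       # contextual moderator
mediator = 0.5 * program + rng.normal(size=n)      # intermediate outcome
outcome = (0.2 * program + 0.4 * mediator
           + 0.3 * program * context + rng.normal(size=n))

# 5.d.ii mediation: product-of-coefficients estimate of the indirect effect
a = ols(mediator, program)[1]                      # program -> mediator
_, direct, b = ols(outcome, program, mediator)     # direct path and mediator -> outcome
indirect = a * b
total = ols(outcome, program)[1]
print(f"total {total:.2f} = direct {direct:.2f} + indirect {indirect:.2f}")

# 5.d.i moderation: does the program's effect vary with context?
coef = ols(outcome, program, context, program * context, mediator)
print(f"program x context interaction: {coef[3]:.2f}")
```

With OLS and a single mediator, the total effect splits exactly into direct plus indirect (c = c′ + ab). Confidence intervals, e.g., by bootstrapping, are left out to keep the sketch short.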
Chen, H. T. (1990). Theory-driven evaluations. Thousand Oaks, CA: Sage.
Coryn, C. L. S., Noakes, L. A., Westine, C. D., & Schröter, D. C. (2011). A systematic review of theory-driven evaluation practice from 1990 to 2009. American Journal of Evaluation, 32(2), 199–226. https://doi.org/10.1177/1098214010389321
“Analysis of the sequences of data envisioned in TBE [theory-based evaluation] presents many challenges. The basic task is to see how well the evidence matches the theories that were posited. Path analysis is conceptually compatible with TBE and has been used by evaluators (Murray and Smith 1979; Smith 1990), but the recurrent problem is that important variables may be overlooked, the model is incomplete, and hence the results can be misleading. Structural equation modeling through LISREL techniques holds much promise, but it has been used only on a limited scale in evaluation.”
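The worry about overlooked variables is easy to demonstrate. In this hypothetical simulation, an unmeasured variable (think motivation) drives both the intermediate and final outcomes, while the mediator itself has no effect on the outcome. A path model that omits the unmeasured variable finds a sizeable mediator-to-outcome path anyway; adding it back recovers the truth. The names and numbers are made up for illustration.

```python
import numpy as np


def path_coefficients(y, *predictors):
    """OLS slopes (intercept dropped) for one equation of a recursive path model."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


rng = np.random.default_rng(1997)
n = 10_000
program = rng.integers(0, 2, n).astype(float)          # randomised program
unmeasured = rng.normal(size=n)                         # hypothetical omitted variable
mediator = 0.5 * program + 0.8 * unmeasured + rng.normal(size=n)
outcome = 0.3 * program + 0.8 * unmeasured + rng.normal(size=n)  # mediator has NO effect

# Incomplete path model: program -> mediator -> outcome, unmeasured variable left out
a = path_coefficients(mediator, program)[0]
direct_naive, b_naive = path_coefficients(outcome, program, mediator)

# Complete model, only possible here because the simulation knows the omitted variable
direct_full, b_full, _ = path_coefficients(outcome, program, mediator, unmeasured)

print(f"incomplete: direct {direct_naive:.2f}, mediator path {b_naive:.2f}, "
      f"indirect {a * b_naive:.2f}")
print(f"complete:   direct {direct_full:.2f}, mediator path {b_full:.2f}")
```

Under these made-up parameters the incomplete model’s mediator path should land near 0.39 rather than 0, and its direct path near 0.1 rather than 0.3, even though the program itself was randomised.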
‘We all indulge, in anger and regret, in counterfactual talk: “If they had not operated, John would be alive today”; “If I had not said that, she would not have left me”; “If I had chosen a different publisher, my book on causality without counterfactuals would have sold 10,000 copies.” The more fortunate among us have someone to remind us that we are talking nonsense.’
– Shafer, G. (2000, p. 442) [Causal inference without counterfactuals: Comment. Journal of the American Statistical Association, 95, 438–442].
Lovely collection of examples of the following ways of visualising a programme theory:
- Logic model
- Logical framework
- Theory of change
- Context-mechanism-outcome configuration
- Causal loop diagram
- Stock and flow diagram
- Concept map
- Network map
- Path model
- Nested/Hybrid model
Also includes links to tools for reasoning about the representations (where they have some genre of formal semantics).
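Stock and flow diagrams are one of the representations with a formal semantics: each stock is the accumulation of its inflows minus its outflows, i.e., a differential equation. Here is a minimal sketch that simulates a single stock with Euler steps, using a hypothetical waiting list with a constant number of referrals in and a fixed fraction of the list seen each week. The function name and numbers are made up.

```python
def simulate_stock(initial, inflow, outflow_rate, dt, steps):
    """Euler integration of dS/dt = inflow - outflow_rate * S.

    Returns the stock value at each time step, starting with `initial`.
    """
    stock = initial
    trajectory = [stock]
    for _ in range(steps):
        outflow = outflow_rate * stock          # flow out depends on the current stock
        stock += (inflow - outflow) * dt        # stock accumulates the net flow
        trajectory.append(stock)
    return trajectory


# Hypothetical numbers: 40 referrals per week, 10% of the waiting list seen per week
levels = simulate_stock(initial=100.0, inflow=40.0, outflow_rate=0.1, dt=0.25, steps=400)
print(f"after 100 weeks: {levels[-1]:.1f} (equilibrium = inflow / outflow_rate = 400)")
```

The waiting list settles where inflow equals outflow, at inflow / outflow_rate = 400 people. System dynamics tools do the same bookkeeping for many interconnected stocks, often with a choice of integration method.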
A potentially useful one-sentence(!) intervention for making a case to run a statistical matching evaluation rather than a randomised controlled trial:
“Matching can be thought of as a technique for finding approximately ideal experimental data hidden within an observational data set.”
– King, G., & Nielsen, R. (2019, p. 442) [Why Propensity Scores Should Not Be Used for Matching. Political Analysis, 27(4), 435–454]
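To make the quote concrete: King and Nielsen argue against propensity scores, so this minimal sketch matches directly on covariates instead, pairing each treated unit with its nearest untreated unit by Mahalanobis distance (with replacement) and estimating the average effect on the treated. The data are simulated with a known effect of 2 and with treatment more likely at high covariate values, so the naive difference in means is biased. The function name is hypothetical, and a real analysis would also need balance checks, calipers and standard errors.

```python
import numpy as np


def mahalanobis_match(X_treated, X_control):
    """Index of the nearest control for each treated unit (1:1, with replacement),
    by Mahalanobis distance using the pooled covariance of the covariates."""
    pooled = np.vstack([X_treated, X_control])
    inv_cov = np.linalg.inv(np.cov(pooled, rowvar=False))
    diffs = X_treated[:, None, :] - X_control[None, :, :]       # (n_treated, n_control, p)
    dist2 = np.einsum("tcp,pq,tcq->tc", diffs, inv_cov, diffs)  # squared distances
    return dist2.argmin(axis=1)


# Simulated observational data: two hypothetical covariates, true effect of 2
rng = np.random.default_rng(2019)
n = 1000
X = rng.normal(size=(n, 2))
p_treat = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1])))   # treatment more likely at high covariate values
treated = rng.random(n) < p_treat
y = 2.0 * treated + X[:, 0] + X[:, 1] + rng.normal(size=n)

naive = y[treated].mean() - y[~treated].mean()      # biased by selection on covariates
matches = mahalanobis_match(X[treated], X[~treated])
att = np.mean(y[treated] - y[~treated][matches])    # matched estimate of effect on the treated
print(f"naive difference = {naive:.2f}, matched estimate = {att:.2f}")
```

Matching only helps with covariates that were measured. Here selection depends on nothing else, which is exactly the assumption that has to be argued for in a real evaluation.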