Terminology of programme theory in evaluation

This tickles me (Funnell & Rogers, 2011, pp. 23–24):

Over the years, many different terms have been used to describe the approach to evaluation that is based on a “plausible and sensible model of how the program is supposed to work” (Bickman, 1987b):

      • Chains of reasoning (Torvatn, 1999)
      • Causal chain (Hall and O’Day, 1971)
      • Causal map (Montibeller and Belton, 2006)
      • Impact pathway (Douthwaite et al., 2003)
      • Intervention framework (Ministry of Health, NZ, 2002)
      • Intervention logic (Nagarajan and Vanheukelen, 1997)
      • Intervention theory (Argyris, 1970; Fishbein et al., 2001)
      • Logic model (Rogers, 2004)
      • Logical framework (logframe) (Practical Concepts, 1979)
      • Mental model (Senge, 1990)
      • Outcomes hierarchy (Lenne and Cleland, 1987; Funnell, 1990, 1997)
      • Outcomes line
      • Performance framework (Montague, 1998; McDonald and Teather, 1997)
      • Program logic (Lenne and Cleland, 1987; Funnell, 1990, 1997)
      • Program theory (Bickman, 1990)
      • Program theory-driven evaluation science (Donaldson, 2005)
      • Reasoning map
      • Results chain
      • Theory of action (Patton, 1997; Schorr, 1997)
      • Theory of change (Weiss, 1998)
      • Theory-based evaluation (Weiss, 1972; Fitz-Gibbon and Morris, 1975)
      • Theory-driven evaluation (Chen and Rossi, 1983)


Funnell, S. C., & Rogers, P. J. (2011). Purposeful Program Theory: Effective Use of Theories of Change and Logic Models. Jossey-Bass.

From metaphysics to goals in social research and evaluation

Some of the social research and evaluation papers I encounter include declarations of the authors’ metaphysical stance: social constructionist, realist (critical or otherwise), phenomenologist – and sometimes a dig at positivism. This is one way research and researchers are classified. Clearly there are different kinds of research; however, might it be easiest to see the differences in terms of research goals rather than jargon-heavy isms? Here are three examples of goals, to try to explore what I mean.

Evoke empathy. If you can’t have a chat with someone, then the next best way to empathise with them is via a rich description by or about them. There is a bucket-load of pretentiousness in the literature (search for “thick description” to find some), but skip over this and there are wonderful works that are simply stories: biographies that make you long to meet the subject; film documentaries, though they don’t fit easily into traditional research output; anthologies gathering expressions of people’s lived experience without a researcher filter. “Interpretative Phenomenological Analyses” manage to include stories too, though with more metaphysics.

Classify. This may be the classification of perspectives, attitudes, experiences, processes, organisations, or other stuff-that-happens in society. For example: social class, personality, experiences people have in psychological therapy, political orientation, emotional experiences. The goal here is to identify patterns, whether from thematic analysis of interview responses, latent class analysis of answers on Likert scales, or some other kind of data and analysis. There’s no escaping theory, articulated and debated or unarticulated and unchallenged, when doing this.

Predict. Do people occupying a particular social class location tend to experience some mental health difficulties more often than others? Does your personality predict the kinds of books you like to read? Do particular events predict an emotion you will feel? Other predictions concern the impact of interventions of various kinds (broadly construed): what would happen if you funded national access to cognitive behavioural therapy, or a universal basic income? Theory matters here too, usually involving a story or model of why variables relate to each other. Prediction can be statistical or may involve gathering expert opinion (expert by lived experience or profession).

These goals cannot be straightforwardly mapped onto quantitative and qualitative data and analysis. As a colleague and I wrote (Fugard & Potts, 2016):

“Some qualitative research develops what looks like a taxonomy of experiences or phenomena. Much of this isn’t even framed as qualitative. Take for example Gray’s highly-cited work classifying type 1 and type 2 synapses. His labelled photos of cortex slices illustrate beautifully the role of subjectivity in qualitative analysis and there are clear questions about generalisability. Some qualitative analyses use statistical models of quantitative data, for example latent class analyses showing the different patterns of change in psychological therapies.”

What I personally want to see, as an avid reader of research, is a summary of the theory – topic-specific, substantive theory rather than metaphysical – that researchers had before launching into gathering data; how they plan to analyse the data; and what they think about the theory once they have finished. Ideally I also want to know something about the politics driving the research, whether expressed in terms of conflicts of interest or the authors’ position on inequity or oppression investigated in a study. Reflections on ontological realism and epistemic relativity – less so.

Core elements in theory-driven evaluation

Huey Chen (1990) solved many issues that are still endlessly discussed in evaluation, e.g., the role of stakeholder theories versus social science theories and the different ways theories can be tested. Here’s a useful summary of core elements of a theory-driven approach (Coryn et al., 2011, Table 1, p. 205):

1. Theory-driven evaluations/evaluators should formulate a plausible program theory

a. Formulate program theory from existing theory and research (e.g., social science theory)

b. Formulate program theory from implicit theory (e.g., stakeholder theory)

c. Formulate program theory from observation of the program in operation/exploratory research (e.g., emergent theory)

d. Formulate program theory from a combination of any of the above (i.e., mixed/integrated theory)

2. Theory-driven evaluations/evaluators should formulate and prioritize evaluation questions around a program theory

a. Formulate evaluation questions around program theory

b. Prioritize evaluation questions

3. Program theory should be used to guide planning, design, and execution of the evaluation under consideration of relevant contingencies

a. Design, plan, and conduct evaluation around a plausible program theory

b. Design, plan, and conduct evaluation considering relevant contingencies (e.g., time, budget, and use)

c. Determine whether evaluation is to be tailored (i.e., only part of the program theory) or comprehensive

4. Theory-driven evaluations/evaluators should measure constructs postulated in program theory

a. Measure process constructs postulated in program theory

b. Measure outcome constructs postulated in program theory

c. Measure contextual constructs postulated in program theory

5. Theory-driven evaluations/evaluators should identify breakdowns, side effects, determine program effectiveness (or efficacy), and explain cause-and-effect associations between theoretical constructs

a. Identify breakdowns, if they exist (e.g., poor implementation, unsuitable context, and theory failure)

b. Identify anticipated (and unanticipated), unintended outcomes (both positive and negative) not postulated by program theory

c. Describe cause-and-effect associations between theoretical constructs (i.e., causal description)

d. Explain cause-and-effect associations between theoretical constructs (i.e., causal explanation)

i. Explain differences in direction and/or strength of relationship between program and outcomes attributable to moderating factors/variables

ii. Explain the extent to which one construct (e.g., intermediate outcome) accounts for/mediates the relationship between other constructs
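Item 5d(ii), where an intermediate outcome mediates the relationship between programme and final outcome, can be sketched with two ordinary regressions on simulated data. This is a hypothetical illustration, not from Coryn et al.; the variable names and path coefficients are invented.

```python
import numpy as np

# Hypothetical mediation sketch: programme exposure (X) shifts an
# intermediate outcome (M), which in turn shifts the final outcome (Y).
# The simulated "true" theory has no direct X -> Y path.
rng = np.random.default_rng(42)
n = 10_000
x = rng.normal(size=n)             # programme exposure
m = 0.5 * x + rng.normal(size=n)   # intermediate outcome
y = 0.7 * m + rng.normal(size=n)   # final outcome

def ols(dep, *preds):
    """Least-squares coefficients for dep on an intercept plus predictors."""
    X = np.column_stack([np.ones(len(dep)), *preds])
    return np.linalg.lstsq(X, dep, rcond=None)[0]

a = ols(m, x)[1]           # X -> M path (close to 0.5)
coef = ols(y, m, x)
b, c = coef[1], coef[2]    # M -> Y path (close to 0.7), direct X -> Y path (close to 0)
total = ols(y, x)[1]       # total effect of X on Y

print(f"indirect effect a*b = {a * b:.2f}")  # close to 0.5 * 0.7 = 0.35
print(f"direct effect c     = {c:.2f}")
print(f"total effect        = {total:.2f}")  # decomposes exactly as a*b + c
```

The exact decomposition of the total effect into indirect (a×b) plus direct (c) parts is a standard property of OLS fitted on the same sample; in a real evaluation one would also want uncertainty estimates and a defence of the causal ordering.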


Chen, H. T. (1990). Theory-driven evaluations. Thousand Oaks, CA: Sage.

Coryn, C. L. S., Noakes, L. A., Westine, C. D., & Schröter, D. C. (2011). A systematic review of theory-driven evaluation practice from 1990 to 2009. American Journal of Evaluation, 32(2), 199–226. https://doi.org/10.1177/1098214010389321

“Path analysis is conceptually compatible with TBE”

“Analysis of the sequences of data envisioned in TBE [theory-based evaluation] presents many challenges. The basic task is to see how well the evidence matches the theories that were posited. Path analysis is conceptually compatible with TBE and has been used by evaluators (Murray and Smith 1979; Smith 1990), but the recurrent problem is that important variables may be overlooked, the model is incomplete, and hence the results can be misleading. Structural equation modeling through LISREL techniques holds much promise, but it has been used only on a limited scale in evaluation.”

– Weiss, C. H. (1997). How can theory-based evaluation make greater headway? Evaluation Review, 21(4), 501–524 (quote from p. 512).
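Weiss’s worry that overlooked variables make path-analytic results misleading is easy to demonstrate with simulated data. This is a hypothetical sketch with invented coefficients, not an analysis from the paper.

```python
import numpy as np

# Hypothetical sketch: an unmeasured confounder (U) drives both the
# programme measure (X) and the outcome (Y). The true effect of X on Y
# is zero, but a path model that omits U estimates a sizeable "effect".
rng = np.random.default_rng(1)
n = 50_000
u = rng.normal(size=n)             # unmeasured confounder
x = 0.8 * u + rng.normal(size=n)   # programme measure
y = 0.8 * u + rng.normal(size=n)   # outcome; X plays no causal role

def slope(dep, *preds):
    """OLS coefficient on the first predictor, with an intercept."""
    X = np.column_stack([np.ones(len(dep)), *preds])
    return np.linalg.lstsq(X, dep, rcond=None)[0][1]

naive = slope(y, x)        # U omitted: spuriously far from zero
adjusted = slope(y, x, u)  # U included: close to the true zero effect

print(f"X -> Y path omitting U:  {naive:.2f}")
print(f"X -> Y path adjusting U: {adjusted:.2f}")
```

The model with U omitted is internally consistent and fits its own data; nothing in the estimation flags that a variable is missing, which is exactly Weiss’s point.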

Visualising programme theories

A lovely collection of examples of the following ways of visualising a programme theory:

  1. Logic model
  2. Logical framework
  3. Theory of change
  4. Context-mechanism-outcome configuration
  5. Causal loop diagram
  6. Stock and flow diagram
  7. Concept map
  8. Network map
  9. Path model
  10. Nested/Hybrid model

Also includes links to tools for reasoning about the representations (where they have some genre of formal semantics).



Lemire, S., Porowski, A., & Mumma, K. (2023). How We Model Matters: Visualizing Program Theories. Abt Associates.

Carol Fitz-Gibbon (1938–2017), author of the first description of theory-based evaluation, on the importance of RCTs

“[…] I produced the first description of theory based evaluation […]. The point of theory based evaluation is to see, firstly, to what extent the theory is being implemented and, secondly, if the predicted outcomes then follow. It is particularly useful as an interim measure of implementation when the outcomes cannot be measured until much later. But most (if not all) theories in social science are only sets of persuasively stated hypotheses that provide a temporary source of guidance. In order to see if the hypotheses can become theories one must measure the extent to which the predicted outcomes are achieved. This requires randomised controlled trials. Even then the important point is to establish the direction and magnitude of the causal relation, not the theory. Many theories can often fit the same data.”

Fitz-Gibbon, C. T. (2002). Researching outcomes of educational interventions. BMJ, 324(7346), 1155.

It’s all theory-based and counterfactual

Two of my favourite articles on #evaluation are Cook’s (2000) argument that all impact evaluations, RCTs included, are theory-based, and Reichardt’s (2022) argument that there is always a counterfactual, if not explicitly articulated then not far beneath the surface. I think both arguments are irrefutable, but building on their work and others’ to improve evaluation commissioning and delivery seems a formidable challenge, given the fiercely defended dichotomies in the field.

If all impact evaluation really is theory-based then it’s clear there’s huge variation in the quality of theories and theorising. If all impact evaluation depends on counterfactuals then there is huge variation in how compelling the evidence is for the counterfactual outcomes, particularly when there is no obvious comparison group.

Clarifying these kinds of distinctions is, I think, important for improving evaluations and the public services and other programmes they evaluate.


Cook, T. D. (2000). The false choice between theory-based evaluation and experimentation. In A. Petrosino, P. J. Rogers, T. A. Huebner, & T. A. Hacsi (Eds.), New directions in evaluation: Program Theory in Evaluation: Challenges and Opportunities (pp. 27–34). Jossey-Bass.

Reichardt, C. S. (2022). The Counterfactual Definition of a Program Effect. American Journal of Evaluation, 43(2), 158–174.

A cynical view of SEMs

It is all too common for a box and arrow diagram to be cobbled together in an afternoon and christened a “theory of change”. One formalised version of such a diagram is a structural equation model (SEM), the arrows of which are annotated with coefficients estimated using data. Here is John Fox (2002) on SEM and informal boxology:

“A cynical view of SEMs is that their popularity in the social sciences reflects the legitimacy that the models appear to lend to causal interpretation of observational data, when in fact such interpretation is no less problematic than for other kinds of regression models applied to observational data. A more charitable interpretation is that SEMs are close to the kind of informal thinking about causal relationships that is common in social-science theorizing, and that, therefore, these models facilitate translating such theories into data analysis.”


Fox, J. (2002). Structural Equation Models: Appendix to An R and S-PLUS Companion to Applied Regression. Last corrected 2006.

Beautiful friendships have been jeopardised

This is an amusing opening to a paper on face validity, by Mosier (1947):

“Face validity is a term that is bandied about in the field of test construction until it seems about to become a part of accepted terminology. The frequency of its use and the emotional reaction which it arouses – ranging almost from contempt to highest approbation – make it desirable to examine its meaning more closely. When a single term variously conveys high praise or strong condemnation, one suspects either ambiguity of meaning or contradictory postulates among those using the term. The tendency has been, I believe, to assume unaccepted premises rather than ambiguity, and beautiful friendships have been jeopardized when a chance remark about face validity has classed the speaker among the infidels.”

I think dozens of beautiful friendships have been jeopardized by loose talk about randomised controlled trials, theory-based evaluation, realism, and positivism, among many others. I’ve just seen yet another piece arguing that you wouldn’t evaluate a parachute with an RCT and I can’t even.


Mosier, C. I. (1947). A Critical Examination of the Concepts of Face Validity. Educational and Psychological Measurement, 7(2), 191–205.

Counterfactual evaluation

Consider the following two sentences:

(1) Alex’s train left 2 minutes before they arrived at the platform.

(2) If Alex had arrived at the platform 10 minutes earlier, then they probably would have caught their train.

Is the counterfactual in sentence 2 true or false, or can’t you tell because you didn’t run an RCT?

I reckoned that the counterfactual is true. I reasoned that Alex probably missed the train because they were late, so turning up earlier would have fixed that.

I could think of other possible outcomes, but they became increasingly contrived and far from the (albeit minimal) evidence provided. For instance, it is conceivable that if Alex arrived earlier, they would have believed they had time to pop to Pret for a coffee – and missed the train again.