What works for whom and in what contexts

“It is sometimes argued that we need rich qualitative data in order to find out not ‘what works’ but for whom and in what contexts. Anyone familiar with the design of experiments will agree. That question is answered by factorial designs with interaction terms… although main effects predominate in most educational datasets. If that last sentence seems like an unfamiliar concept, blame whoever taught you research methods and please make sure that your students are familiar with procedures that will underpin the thousands of controlled trials that education needs if it is to know that rather than simply guessing or asserting why.”

– Carol Taylor Fitz-Gibbon (2002). Knowing Why and Knowing That. Paper presented to the European Evaluation Conference in Seville, Spain. Emphasis and ellipsis in original.
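Fitz-Gibbon's point – that "for whom and in what contexts" is the interaction term of a factorial design – can be sketched in a few lines. This is a toy simulation with invented effect sizes, not an analysis of any real trial; it just shows where the "for whom" question lives in the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated 2x2 factorial trial: an intervention (0/1) crossed with a
# context factor (0/1), e.g. school type. All effect sizes are invented
# purely for illustration.
n = 2000
treat = rng.integers(0, 2, n)    # randomised intervention
context = rng.integers(0, 2, n)  # the "for whom / in what context" factor
# Outcome: main effect of treatment 0.5, plus an interaction of 0.4,
# i.e. the intervention works better when context == 1.
y = 0.5 * treat + 0.4 * treat * context + rng.normal(0, 1, n)

# Ordinary least squares with an interaction term: y ~ treat * context
X = np.column_stack([np.ones(n), treat, context, treat * context])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
for name, b in zip(["intercept", "treat", "context", "treat:context"], coef):
    print(f"{name:>14}: {b: .2f}")
```

The `treat:context` coefficient is the answer to "for whom": a nonzero estimate says the programme's effect differs by context, while (as Fitz-Gibbon notes) it is often the main effects that dominate.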

The first definition of something named “theory-based evaluation”

“A theory-based evaluation of a program is one in which the selection of program features to evaluate is determined by an explicit conceptualization of the program in terms of a theory […] which attempts to explain how the program produces the desired effects. The theory might be psychological […] or social psychological […] or philosophical […]. The essential characteristic is that the theory points out a causal relationship between a process A and an outcome B.”

– Carol Taylor Fitz-Gibbon and Lynn Lyons Morris (1975)


Fitz-Gibbon, C. T., & Morris, L. L. (1975). Theory-based evaluation. Evaluation Comment, 5(1), 1–4. Reprinted in Fitz-Gibbon, C. T., & Morris, L. L. (1996). Theory-based evaluation. Evaluation Practice, 17(2), 177–184.

Worthen, B. R. (1996). Editor’s Note: The Origins of Theory-Based Evaluation. Evaluation Practice, 17(2), 169–171. This comment traces the path back to Fitz-Gibbon and Morris (1975).

The kind of theory of theory-driven evaluation

“… the kind of theory we have in mind is not the global conceptual schemes of the grand theorists, but much more prosaic theories that are concerned with how human organizations work and how social problems are generated. It advances evaluation practice very little to adopt one or another of current global theories in attacking, say, the problem of juvenile delinquency, but it does help a great deal to understand the authority structure in schools and the mechanisms of peer group influence and parental discipline in designing and evaluating a program that is supposed to reduce disciplinary problems in schools. Nor are we advocating an approach that rests exclusively on proven theoretical schema that have received wide acclaim in published social science literatures. What we are strongly advocating is the necessity for theorizing, for constructing plausible and defensible models of how programs can be expected to work before evaluating them. Indeed the theory-driven perspective is closer to what econometricians call ‘model specification’ than are more complicated and more abstract and general theories.”

Chen, H.-T., & Rossi, P. H. (1983, p. 285). Evaluating With Sense: The Theory-Driven Approach. Evaluation Review, 7(3), 283–302.


Intervening mechanism evaluation

‘The intervening mechanism evaluation approach assesses whether the causal assumptions underlying a program are functioning as stakeholders had projected (Chen, 1990). […] It is not always labeled in the same way by those who apply it. Some evaluators have referred to it as “theory of change evaluation” (Connell, Kubisch, Schorr, & Weiss, 1995) or “theory-based evaluation” (Rogers, Hasci, Petrosino, & Huebner, 2000; Weiss, 1997).’

Chen, H. T. (2015, p. 312). Practical Program Evaluation: Theory-Driven Evaluation and the Integrated Evaluation Perspective. SAGE Publications Ltd.

Terminology of programme theory in evaluation

This tickles me (Funnell & Rogers, 2011, pp. 23-24):

Over the years, many different terms have been used to describe the approach to evaluation that is based on a “plausible and sensible model of how the program is supposed to work” (Bickman, 1987b):

      • Chains of reasoning (Torvatn, 1999)
      • Causal chain (Hall and O’Day, 1971)
      • Causal map (Montibeller and Belton, 2006)
      • Impact pathway (Douthwaite et al., 2003)
      • Intervention framework (Ministry of Health, NZ 2002)
      • Intervention logic (Nagarajan and Vanheukelen, 1997)
      • Intervention theory (Argyris, 1970; Fishbein et al., 2001)
      • Logic model (Rogers, 2004)
      • Logical framework (logframe) (Practical Concepts, 1979)
      • Mental model (Senge, 1990)
      • Outcomes hierarchy (Lenne and Cleland, 1987; Funnell, 1990, 1997)
      • Outcomes line
      • Performance framework (Montague, 1998; McDonald and Teather, 1997)
      • Program logic (Lenne and Cleland, 1987; Funnell, 1990, 1997)
      • Program theory (Bickman, 1990)
      • Program theory-driven evaluation science (Donaldson, 2005)
      • Reasoning map
      • Results chain
      • Theory of action (Patton, 1997; Schorr, 1997)
      • Theory of change (Weiss, 1998)
      • Theory-based evaluation (Weiss, 1972; Fitz-Gibbon and Morris, 1975)
      • Theory-driven evaluation (Chen and Rossi, 1983)


Funnell, S. C., & Rogers, P. J. (2011). Purposeful Program Theory: Effective Use of Theories of Change and Logic Models. Jossey-Bass.

“Reality is real”

Some of the social research and evaluation papers I encounter include declarations of the authors’ metaphysical stance: social constructionist, realist (critical or otherwise), phenomenologist – and sometimes a dig at positivism. This is one way research and researchers are classified. Clearly there are different kinds of research; however, might it be easiest to see the differences in terms of research goals rather than jargon-heavy isms? Here are three examples of goals, to try to explore what I mean.

Evoke empathy. If you can’t have a chat with someone, then the next best way to empathise with them is via a rich description by or about them. There is a bucket-load of pretentiousness in the literature (search for “thick description” to find some), but skip over this and there are wonderful works that are simply stories: biographies that make you long to meet their subject; film documentaries, though they don’t fit easily into traditional research outputs; anthologies gathering expressions of people’s lived experience without a researcher filter. “Interpretative Phenomenological Analyses” manage to include stories too, though with more metaphysics.

Classify. This may be the classification of perspectives, attitudes, experiences, processes, organisations, or other stuff-that-happens in society. For example: social class, personality, experiences people have in psychological therapy, political orientation, emotional experiences. The goal here is to identify patterns, whether from thematic analysis of interview responses, latent class analysis of answers on Likert scales, or some other kind of data and analysis. There’s no escaping theory – articulated and debated, or unarticulated and unchallenged – when doing this.

Predict. Do people occupying a particular social class location tend to experience some mental health difficulties more often than others? Does your personality predict the kinds of books you like to read? Do particular events predict an emotion you will feel? Other predictions concern the impact of interventions of various kinds (broadly construed): what would happen if you funded national access to cognitive behavioural therapy, or a universal basic income? Theory matters here too, usually involving a story or model of why variables relate to each other. Prediction can be statistical or can draw on expert opinion (expertise by lived experience or by profession).

These goals cannot be straightforwardly mapped onto quantitative and qualitative data and analysis. As a colleague and I wrote (Fugard & Potts, 2016):

“Some qualitative research develops what looks like a taxonomy of experiences or phenomena. Much of this isn’t even framed as qualitative. Take for example Gray’s highly-cited work classifying type 1 and type 2 synapses. His labelled photos of cortex slices illustrate beautifully the role of subjectivity in qualitative analysis and there are clear questions about generalisability. Some qualitative analyses use statistical models of quantitative data, for example latent class analyses showing the different patterns of change in psychological therapies.”

What I personally want to see, as an avid reader of research, is a summary of the theory – topic-specific, substantive theory rather than metaphysics – that researchers held before launching into gathering data; how they planned to analyse the data; and what they made of the theory once they had finished. Ideally I also want to know something about the politics driving the research, whether expressed as conflicts of interest or as the authors’ position on any inequity or oppression investigated in the study. Reflections on ontological realism and epistemic relativity – less so.

Core elements in theory-driven evaluation

Huey Chen (1990) solved many issues that are still endlessly discussed in evaluation, e.g., the role of stakeholder theories versus social science theories and the different ways theories can be tested. Here’s a useful summary of core elements of a theory-driven approach (Coryn et al., 2011, Table 1, p. 205):

1. Theory-driven evaluations/evaluators should formulate a plausible program theory

a. Formulate program theory from existing theory and research (e.g., social science theory)

b. Formulate program theory from implicit theory (e.g., stakeholder theory)

c. Formulate program theory from observation of the program in operation/exploratory research (e.g., emergent theory)

d. Formulate program theory from a combination of any of the above (i.e., mixed/integrated theory)

2. Theory-driven evaluations/evaluators should formulate and prioritize evaluation questions around a program theory

a. Formulate evaluation questions around program theory

b. Prioritize evaluation questions

3. Program theory should be used to guide planning, design, and execution of the evaluation under consideration of relevant contingencies

a. Design, plan, and conduct evaluation around a plausible program theory

b. Design, plan, and conduct evaluation considering relevant contingencies (e.g., time, budget, and use)

c. Determine whether evaluation is to be tailored (i.e., only part of the program theory) or comprehensive

4. Theory-driven evaluations/evaluators should measure constructs postulated in program theory

a. Measure process constructs postulated in program theory

b. Measure outcome constructs postulated in program theory

c. Measure contextual constructs postulated in program theory

5. Theory-driven evaluations/evaluators should identify breakdowns, side effects, determine program effectiveness (or efficacy), and explain cause-and-effect associations between theoretical constructs

a. Identify breakdowns, if they exist (e.g., poor implementation, unsuitable context, and theory failure)

b. Identify anticipated (and unanticipated), unintended outcomes (both positive and negative) not postulated by program theory

c. Describe cause-and-effect associations between theoretical constructs (i.e., causal description)

d. Explain cause-and-effect associations between theoretical constructs (i.e., causal explanation)

i. Explain differences in direction and/or strength of relationship between program and outcomes attributable to moderating factors/variables

ii. Explain the extent to which one construct (e.g., intermediate outcome) accounts for/mediates the relationship between other constructs
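Element 5d(ii) – testing whether an intermediate outcome accounts for the programme–outcome relationship – can be sketched as a simple mediation analysis. This is a hypothetical illustration with invented numbers (a numpy-only product-of-coefficients sketch), not a recipe from Coryn et al. or Chen.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data-generating model: the programme raises a mediator M
# by a = 0.6, M raises the outcome Y by b = 0.5, and there is no
# direct programme -> Y effect.
n = 5000
programme = rng.integers(0, 2, n)
m = 0.6 * programme + rng.normal(0, 1, n)
y = 0.5 * m + rng.normal(0, 1, n)

def ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols(programme, m)[1]                                  # programme -> M
b, direct = ols(np.column_stack([m, programme]), y)[1:3]  # M -> Y, and direct path
print(f"a (programme -> M): {a:.2f}")
print(f"b (M -> Y):         {b:.2f}")
print(f"direct effect:      {direct:.2f}")
print(f"indirect (a*b):     {a * b:.2f}")
```

The product a*b estimates the indirect (mediated) effect; a direct effect near zero is consistent with M fully mediating the programme's impact, which is exactly the kind of causal explanation (as opposed to mere description) that element 5 asks for.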


Chen, H. T. (1990). Theory-driven evaluations. Thousand Oaks, CA: Sage.

Coryn, C. L. S., Noakes, L. A., Westine, C. D., & Schröter, D. C. (2011). A systematic review of theory-driven evaluation practice from 1990 to 2009. American Journal of Evaluation, 32(2), 199–226. https://doi.org/10.1177/1098214010389321

“Path analysis is conceptually compatible with TBE”

“Analysis of the sequences of data envisioned in TBE [theory-based evaluation] presents many challenges. The basic task is to see how well the evidence matches the theories that were posited. Path analysis is conceptually compatible with TBE and has been used by evaluators (Murray and Smith 1979; Smith 1990), but the recurrent problem is that important variables may be overlooked, the model is incomplete, and hence the results can be misleading. Structural equation modeling through LISREL techniques holds much promise, but it has been used only on a limited scale in evaluation.”

– Weiss, C. H. (1997, p. 512). How can theory-based evaluation make greater headway? Evaluation Review, 21(4), 501–524.
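Weiss's "recurrent problem" – that an incomplete path model can mislead – is easy to demonstrate. In this invented simulation (numpy only, numbers made up for illustration), a confounder C drives both the mediator M and the outcome Y; the true M → Y path is zero, yet a path model that omits C estimates it as substantial.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented model: an omitted variable C drives both M and Y;
# Y does not depend on M at all.
n = 10000
c = rng.normal(0, 1, n)            # the overlooked variable
m = 0.8 * c + rng.normal(0, 1, n)
y = 0.8 * c + rng.normal(0, 1, n)  # depends on C only, not on M

def ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

biased = ols(m, y)[1]                          # M -> Y path, C omitted
adjusted = ols(np.column_stack([m, c]), y)[1]  # M -> Y path, C included
print(f"M -> Y omitting C:      {biased:.2f}")    # spuriously nonzero
print(f"M -> Y adjusting for C: {adjusted:.2f}")  # near the true value, 0
```

The biased estimate would look like support for the posited mechanism even though none exists, which is Weiss's point about why path-analytic tests of programme theory depend on getting the variable set right.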

Visualising programme theories

A lovely collection of examples of the following ways of visualising a programme theory:

  1. Logic model
  2. Logical framework
  3. Theory of change
  4. Context-mechanism-outcome configuration
  5. Causal loop diagram
  6. Stock and flow diagram
  7. Concept map
  8. Network map
  9. Path model
  10. Nested/Hybrid model

It also includes links to tools for reasoning about the representations (where they have some form of formal semantics).



Lemire, S., Porowski, A., & Mumma, K. (2023). How We Model Matters: Visualizing Program Theories. Abt Associates.

How theory-infused forms of evaluation proliferate

In case you’re wondering why we’re blessed with a multitude of terms for evaluations that use theory in some shape or fashion – theory-oriented evaluation, theory-based evaluation, theory-driven evaluation, program theory evaluation, intervening mechanism evaluation, theoretically relevant evaluation research, and program theory-driven evaluation science (Donaldson, 2022, p. 9) – the answer is in an XKCD comic.


Donaldson, S. I. (2022). Introduction to Theory-Driven Program Evaluation (2nd ed.). Routledge.