What I think’s wrong with adult mental health Payment by Results (PbR)

(Usual disclaimer: these are my personal views, etc.)

Here’s a simple guide to PbR for some background.

In adult mental health in England there is a collection of “clusters” characterizing mental health service users who (it is hoped) have similar levels of need. These will eventually be linked to tariffs – price (which hopefully relates to cost) – and used by CCGs when they commission services. Key to the approach is a questionnaire which asks clinicians to rate problems (e.g., “Problems associated with hallucinations and delusions”) and their severity, and an algorithm mapping these to the clusters. Some more detail is available over there.

I’m not convinced by the approach. Here’s why:

  1. The model used to link score profiles to clusters has a large number of predictors (1,204), which makes “overfitting” likely, i.e., its predictions are unlikely to generalise beyond the sample on which it was developed. At its worst there are only around 1.5 cases per predictor.
  2. There is evidence that clinicians disagree with the cluster predictions. Investigations around this have seemingly ignored the fact that there is additional information in cluster descriptions such as an ICD-10 clinical diagnosis, for instance “Likely to include F60 Personality disorder”. This information is not part of the scores used as an input to the algorithm which assigns clusters. Without understanding how clinicians use this information it is not possible to improve the approach.
  3. The methodology used to validate the model is circular. Clinicians were trained in an algorithm to choose a cluster on the basis of clinician-completed questionnaire scores. They followed this process with service users in routine practice, first completing a questionnaire, then recording the cluster chosen. A statistical approach was used to model the relationship between scores and clusters chosen. The end result is a statistical model predicting what the clinicians were initially trained to do. The validation method used is to look at correlations between what clinicians (trained in an algorithm) do and what a computer (which relearned the algorithm) does. This is circular.
  4. The clusters are supposed to characterise patients with similar needs, e.g., in terms of duration and complexity of interventions. Is there any evidence that they do? Seemingly not, but I hope I’m wrong. It’s clearly essential that clusters actually do this, since PbR is supposed to be used for commissioning services and deciding how much services get paid. It is crucial to look at the variation in costs within each cluster, not just the averages.
  5. The questionnaire (“tool”) used for deciding how much services get paid has also been proposed as an outcomes measure. This is despite the fact that the proposed approach, derived from a factor analysis, is psychometrically poor. The proposers recognise this (page 30): “it has been well established within the literature that the HoNOS is not typically associated with a high level of internal consistency due to its original intended purpose of being a scale with independent items encompassing a variety of health related problems.” Their factor analysis confirms this. Goodhart’s law suggests that even if the psychometrics were fine, the measure would cease to measure what it claims to measure once linked to costs.
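
To make the circularity in point 3 a bit more concrete, here is a toy simulation in Python. Everything in it is made up for illustration and none of it comes from the actual PbR work: the allocation rule (`trained_rule`), the three-item questionnaire, the 10% rate at which clinicians depart from the rule, and the costs, which I have deliberately generated independently of the scores and clusters as a worst case. The “statistical model” is just a lookup of the most common cluster for each score profile, standing in for whatever model was really used.

```python
import random
from collections import Counter, defaultdict


def trained_rule(scores):
    """Hypothetical rule taught to clinicians: choose the cluster whose
    index matches the highest-scoring item (first item wins ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def simulate_cases(n_cases=4000, n_items=3, error_rate=0.1, seed=0):
    """Return (scores, cluster, cost) tuples for simulated service users."""
    rng = random.Random(seed)
    cases = []
    for _ in range(n_cases):
        scores = tuple(rng.randint(0, 4) for _ in range(n_items))
        cluster = trained_rule(scores)
        if rng.random() < error_rate:
            # Occasional departure from the trained rule.
            cluster = rng.randrange(n_items)
        # ASSUMPTION: cost is drawn independently of scores and cluster.
        cost = rng.gauss(1000, 300)
        cases.append((scores, cluster, cost))
    return cases


def fit_model(cases):
    """Stand-in 'statistical model': the most common cluster chosen for
    each score profile, with the overall modal cluster as a fallback."""
    votes = defaultdict(Counter)
    overall = Counter()
    for scores, cluster, _ in cases:
        votes[scores][cluster] += 1
        overall[cluster] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    fallback = overall.most_common(1)[0][0]
    return lambda scores: table.get(scores, fallback)


def agreement(model, cases):
    """Proportion of cases where the model matches the clinician."""
    hits = sum(model(scores) == cluster for scores, cluster, _ in cases)
    return hits / len(cases)


def mean_cost_by_cluster(cases):
    """Average simulated cost within each cluster."""
    totals = defaultdict(float)
    counts = Counter()
    for _, cluster, cost in cases:
        totals[cluster] += cost
        counts[cluster] += 1
    return {k: totals[k] / counts[k] for k in sorted(counts)}


cases = simulate_cases()
train, test = cases[:2000], cases[2000:]
model = fit_model(train)
model_agreement = agreement(model, test)
cluster_costs = mean_cost_by_cluster(cases)

print(f"model vs clinician agreement: {model_agreement:.2f}")
print("mean cost by cluster:", {k: round(v) for k, v in cluster_costs.items()})
```

In this sketch the model agrees with the simulated clinicians most of the time, simply because it has relearned the rule they were trained to follow, while the clusters say nothing about cost. High agreement of this kind cannot tell us whether clusters capture need. Whether real clusters relate to real costs is an empirical question that the simulation cannot answer.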

I don’t think it has to be like this but it all makes very depressing reading.