Exploring the validity of sentiment analysis in psychotherapy

What happens if you apply the Multilingual Language Model Toolkit for Sentiment Analysis (XLM-T), “a transformer-based NLP model derived from the Cross-lingual Language Model based on RoBERTa”, to psychotherapy transcripts? Eberhardt et al. (2024) investigate. Here’s a slightly simplified Table 1, showing correlations between positive and negative sentiment and a patient-reported emotions scale. Green gives between-patient correlations and pink within-patient across sessions. Note the wide confidence intervals.
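To make the between/within distinction concrete, here is a minimal sketch in Python. The data are made up for illustration and are not from Eberhardt et al. The function names are hypothetical, and the paper's own estimation method is not reproduced here. The sketch assumes the simplest possible decomposition: the between-patient correlation uses each patient's mean across sessions, and the within-patient correlation uses session scores after subtracting each patient's own mean.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def between_within(data):
    """data maps a patient id to a list of (sentiment, emotion) pairs, one per session."""
    mean_x, mean_y, dev_x, dev_y = [], [], [], []
    for sessions in data.values():
        xs = [sentiment for sentiment, _ in sessions]
        ys = [emotion for _, emotion in sessions]
        mx, my = mean(xs), mean(ys)
        mean_x.append(mx)                    # patient-level averages
        mean_y.append(my)
        dev_x.extend(x - mx for x in xs)     # session deviations from own average
        dev_y.extend(y - my for y in ys)
    return pearson(mean_x, mean_y), pearson(dev_x, dev_y)


# Made-up scores: three hypothetical patients, three sessions each.
toy = {
    "A": [(1, 3), (2, 2), (3, 1)],
    "B": [(4, 6), (5, 5), (6, 4)],
    "C": [(7, 9), (8, 8), (9, 7)],
}
between, within = between_within(toy)
print(between, within)
```

In this toy example patients with higher average sentiment also have higher average emotion scores, yet within each patient the two move in opposite directions across sessions, so the two correlations need not even share a sign.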

Eberhardt, S. T., Schaffrath, J., Moggia, D., Schwartz, B., Jaehde, M., Rubel, J. A., Baur, T., André, E., & Lutz, W. (2024). Decoding emotions: Exploring the validity of sentiment analysis in psychotherapy. Psychotherapy Research.

Components and delivery formats of therapy for chronic insomnia

Interesting meta-analysis by Furukawa et al. (in press) of 241 trials, aiming to work out what components of therapy lead to better outcomes for people with chronic insomnia.


Furukawa Y., Sakata M., Yamamoto R., et al. Components and Delivery Formats of Cognitive Behavioral Therapy for Chronic Insomnia in Adults: A Systematic Review and Component Network Meta-Analysis. JAMA Psychiatry. Published online January 17, 2024.

Intent-to-fail treatments

Westen et al. (2004, p. 651) on “intent-to-fail” conditions in randomised controlled trials comparing two interventions with each other:

‘Researchers should also exercise caution in labeling control treatments not constructed to maximize their efficacy ([…] what might be called intent-to-fail conditions) with brand names that are readily confused with genuine treatments and create sleeper effects in the literature. For example […] to test the efficacy of CBT for bulimia, Garner et al. (1993) developed a treatment they called supportive–expressive therapy, an abbreviated treatment described as nondirective and psychodynamically inspired, in which clinicians were forbidden to discuss the target symptoms with the patient and were instead instructed to reflect them back to the patient. Such a practice is not in fact characteristic of psychodynamic therapies for eating disorders (e.g., Bruch, 1973) and is analogous to a researcher creating a cognitive therapy comparison condition in which the therapist is instructed to say, “That’s irrational,” every time a patient tries to discuss the symptom.’


Westen, D., Novotny, C. M., & Thompson-Brenner, H. (2004). The Empirical Status of Empirically Supported Psychotherapies: Assumptions, Findings, and Reporting in Controlled Clinical Trials. Psychological Bulletin, 130(4), 631–663.

Comment on Peter Kinderman’s blog post

(Peter’s blog post.)

Peter Kinderman seems to be arguing that it doesn’t matter if an experience is classified as resulting from disease, illness, disorder, or a response to circumstance (genetically mediated or otherwise). People who have “obvious and quantifiable needs” should get the help they need with social challenges which may have led to the difficulties in the first place. They should have someone to talk to so they can make sense of what has happened. Removing the category of illness doesn’t remove distress, doesn’t mean people shouldn’t be helped. This makes a lot of sense.

Much has been said about the problems with diagnostic categories and with naïve reification to biological entities. You have disease D if and only if you have symptoms S1, S2, … Sn. Why do you have those symptoms? Why of course it’s because you have disease D. I think we can safely conclude, along with many others, that this is circular. An argument that we “need” diagnoses to care for people is unconvincing.

Should we completely throw away what has been collected in diagnostic tomes? I don’t think we should.

One complaint about DSM and ICD is that they cover all aspects of human experience. Most of us can find a diagnosis in there, especially if interpreting the descriptions broadly. But in many ways this is a strength — when naïve reification is eliminated. Denny Borsboom, Angelique Cramer and others have done important work extracting the individual complaints (e.g., loss of interest, thinking about suicide, fatigue, muscle tension) which make up diagnoses and modelling how they relate to each other (Borsboom, Cramer, Schmittmann, Epskamp, & Waldorp, 2011; Borsboom & Cramer, 2013). The individual descriptions and their interrelationships might gain in meaning when stripped of their diagnostic group.

Describing the sorts of situations people find themselves in and how they feel is crucial for conducting research and helping build up evidence for what works. When is talking therapy helpful? When might it make more sense for people to work four days a week rather than five? When should a focus be on interpersonal problems and who should be involved in sessions?

DSM-5 includes a chapter on “Other conditions that may be a focus of clinical attention” (American Psychiatric Association, 2013, pp. 715–727). It’s brief, making up only about 2% of the book, and should be expanded. However, it seems relevant to a psychosocial approach and could perhaps be combined with other descriptions of predicaments and problems. Example problems include:

  • High expressed emotion level within family
  • Spouse or partner violence
  • Inadequate housing
  • Discord with neighbour, lodger, or landlord
  • Problem related to current military deployment status
  • Academic or education problem
  • Social exclusion or rejection
  • Insufficient social insurance or welfare support

So, “DSM” is not synonymous with “biological”. There is again plenty to be built upon, despite its problems.

Kinderman argues that practitioners “can offer practical help, negotiate social benefits (which could be financial support, negotiated time off work, or deferred studies, for example), or offer psychological or emotional support.” It was great to see specific examples. Medication also likely has a place, especially when the mechanisms of action are conceptualized in a drug-centred way rather than keeping up the pretense that they cure a disease (Moncrieff & Cohen, 2005). I think we all should be doing more to elaborate how a meaningful psychosocial approach can work in practice.


American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC.

Borsboom, D., & Cramer, A. O. J. (2013). Network analysis: an integrative approach to the structure of psychopathology. Annual Review of Clinical Psychology, 9, 91–121. doi:10.1146/annurev-clinpsy-050212-185608

Borsboom, D., Cramer, A. O. J., Schmittmann, V. D., Epskamp, S., & Waldorp, L. J. (2011). The small world of psychopathology. PLoS ONE, 6(11), e27407. doi:10.1371/journal.pone.0027407

Moncrieff, J., & Cohen, D. (2005). Rethinking models of psychotropic drug action. Psychotherapy and Psychosomatics, 74, 145–153. doi:10.1159/000083999

Worrying developments in NHS England mental health outcomes monitoring

Mental health service users hope that the therapeutic interventions they receive will help them feel better. Randomised controlled trials are one important way to test whether a therapy “works”; however, they don’t reveal how interventions are experienced in routine care. This has led to routine outcomes monitoring which uses questionnaires to ask service users and clinicians to rate symptoms and other relevant information before, during, and after treatment. Outcomes monitoring has been used by NHS services for some years, for example through the Child Outcomes Research Consortium and Improving Access to Psychological Therapies (IAPT). It is, however, controversial. Ros Mayo (2010, p. 63) for example argues that:

“The application of oversimplified questions requiring tick-box answers … are driven by short-term and superficial policies and management techniques, largely incorporated from industry and the financial sector and primarily concerned with speed, change, results, cost effectiveness – turnover and minimising human contact and time involvement… They have nothing to do with human engagement…”

And yet, outcomes monitoring could be better than bureaucracy. There is emerging evidence that providing regular progress feedback to clinicians improves outcomes, especially when questionnaires are completed by service users. Intuitively this seems to make sense: people could sometimes reveal more about how they feel on paper than they can orally face-to-face. Items used in IAPT include:

  • “How often have you been bothered by… not being able to stop or control worrying?”
  • “Heart palpitations bother me when I am around people”

They also ask directly about the care received, for example:

  • “Did staff listen to you and treat your concerns seriously?”
  • “Did you feel involved in making choices about your treatment and care?”

Responses to these items could help clinicians understand to what extent services are helping service users. Also using standardised questionnaires means that expected progress curves can be developed (for example see work by Lambert and colleagues), so clinicians can see, for example, if progress is slower than would be expected given initial assessment and, if warranted, try a different approach.

It’s early days for outcomes monitoring but the above examples suggest that it could be a promising approach. However, closer examination shows that there are clear problems with how questionnaires are being used in practice, and I think NHS services in England are being asked to implement actively damaging approaches to outcomes monitoring.

Problem 1. Use of an unreliable measure

Suppose you wish to develop a rating scale for quality of life or how distressed you are so that you can monitor progress over time with a summary “score”. As a bare minimum requirement, all of the questions for one topic should be related to each other: the items should be “internally consistent”. This questionnaire, call it the General Stuff Questionnaire, probably wouldn’t do very well:


  1. How often do you sing in the shower?
  2. What height are you?
  3. How far do you live from the nearest park?
  4. What’s your favourite number?
  5. How often do you go dancing?

You might learn interesting things from some of the individual answers, but summing all the answers together is unlikely to be revealing. This questionnaire, call it the Reliable Feelings Questionnaire, would fare better:


  1. How do you feel? (0 is terrible, 10 fantastic)
  2. How do you feel? (0 is terrible, 10 fantastic)
  3. How do you feel? (0 is terrible, 10 fantastic)
  4. How do you feel? (0 is terrible, 10 fantastic)
  5. And finally… how do you feel? (0 is terrible, 10 fantastic)

However, you might wonder if questions 2 to 5 add anything. There are many ways to test the internal consistency of questionnaires, using the answers that people give. One is to use a formula called Cronbach’s alpha, which usually gives answers between 0 and 1. Higher, say around 0.8, is better. Too close to 1 suggests redundancy in questions, as would be likely for the Reliable Feelings Questionnaire above.
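For the curious, here is a minimal sketch of the standard Cronbach’s alpha calculation in Python. The function name and both toy datasets are made up for illustration; nothing here comes from the Mental Health Clustering Tool data. With k items, alpha is k / (k − 1) multiplied by one minus the ratio of the summed item variances to the variance of respondents’ total scores.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one row per respondent, each row a list of item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))                      # one column per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Made-up answers where everyone repeats the same rating five times,
# as one might expect when the same question is asked five times over.
redundant = [[3] * 5, [7] * 5, [5] * 5, [9] * 5]

# Made-up answers to two items that are unrelated to each other.
unrelated = [[0, 0], [0, 1], [1, 0], [1, 1]]

alpha_redundant = cronbach_alpha(redundant)
alpha_unrelated = cronbach_alpha(unrelated)
print(alpha_redundant, alpha_unrelated)
```

The fully redundant items give an alpha of 1 and the unrelated items give an alpha of 0, which are the two extremes the example questionnaires are meant to illustrate.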

In England, it is now recommended to use a “Mental Health Clustering Tool” to evaluate outcomes (see section 7.1 of recent guidelines). This is a questionnaire completed by clinicians covering areas such as hallucinations, delusions, depression, and relationship difficulties. The questionnaire suffers from a very basic problem: it’s not internally consistent. This has been discovered by the very people who proposed the approach (see p.30 of their report): “As a general guideline, alpha values of 0.70 or above are indicative of a reasonable level of consistency”. Their results are: 0.44, 0.58, 0.63, 0.57 – conspicuously smaller than 0.70. The authors also refer to previous studies explaining that this would always be the case, due to “its original intended purpose of being a scale with independent items” (p. 30). So, by design, it’s closer to the General Stuff Questionnaire above: a mixed bag of independent questions with low reliability.

Problem 2. Proposals to link outcomes to payment

Given evidence that collecting regular feedback might improve the quality of care people receive, it may be a good idea that the IAPT programme includes regular progress monitoring. IAPT uses service user completed questionnaires, which could in principle provide information clinicians might not otherwise have learned. There is, however, another potential difficulty over and above that of the quality of questionnaires used, and that is how external influences such as “Payment by Results” (PbR) initiatives can change for the worse how data is gathered and used. And PbR initiatives are beginning to be used in practice. The IAPT webpage notes, “An outcome based payment and pricing system is being developed for IAPT services. This is unique as other systems of PbR are activity or needs based.” Initial pilot results were “encouraging,” says the web page, and another pilot is currently running.

The idea with this proposal is that the more improvement shown by service users, as partly determined by outcomes scores, the more money service providers would receive. This is a worry as linking measures to targets has a tendency to cause the measures to stop measuring what it is hoped that they measure. For instance, targets on ambulance response times have led to statistically unlikely peaks at exactly the target, suggesting that times have been changed. A national phonics screen has a statistically unlikely peak just at the cutoff score, suggesting that teachers have rounded marks up where they fell just below the cutoff. The effect has been around for such a long time that it has a name, Goodhart’s law:

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”
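A crude way to look for the kind of peak described above is to compare how many scores sit exactly at a cutoff with how many sit just below it. The sketch below uses made-up marks and a hypothetical pass mark; the function name is also hypothetical. A real analysis would compare the whole distribution against what would be expected without a target.

```python
from collections import Counter


def counts_around_cutoff(scores, cutoff):
    """Return (number of scores just below the cutoff, number exactly at it)."""
    counts = Counter(scores)
    return counts[cutoff - 1], counts[cutoff]


# Made-up marks with a hypothetical pass mark of 32.
marks = [28, 29, 30, 30, 31, 32, 32, 32, 32, 32, 32, 33, 33, 34, 35]
below, at = counts_around_cutoff(marks, 32)
print(below, at)
```

Here one mark falls just below the pass mark and six fall exactly on it. A gap that large between neighbouring scores is the sort of pattern that would prompt a closer look, though on its own it proves nothing.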

Faced with funding cuts, how many NHS managers in overstretched services will be forced to “game” performance-based payment systems to ensure their service survives? It’s not hard to do so. For example, people who drop out of therapy tend to do so because they didn’t think it was helping. It can be easy to justify not asking people who leave therapy to complete questionnaires. Those who stay in therapy may be the ones with higher scores (see Clark, 2011, p. 321). Therefore, the missing data from those whom therapy was not helping could lead to a false picture of how well a service is working for service users. It is difficult to see how any data gathered that has been subject to these difficulties could tell clinicians or service providers anything helpful about their services or the wellbeing of those who use them.
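The effect of this kind of missing data can be sketched with a few made-up numbers. The improvement scores below are invented, and the example assumes that people who drop out improved less than those who stayed. In a real service the dropouts’ scores would simply be absent, which is exactly the problem.

```python
from statistics import mean

# Made-up clients: (improvement score, completed the final questionnaire?)
clients = [
    (8, True), (6, True), (7, True), (5, True),
    (1, False), (0, False), (2, False), (-1, False),
]

# What the service would report if only completers are asked.
completers = mean(score for score, done in clients if done)

# What the average would be if everyone's outcome were known.
everyone = mean(score for score, _ in clients)

print(completers, everyone)
```

With these numbers the reported average improvement is 6.5, while the average across everyone who started therapy is 3.5.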

Concluding thoughts

I think it is possible that statistically reliable, high quality questionnaires could be helpful in clinical practice, if thoughtfully used and explained to service users. Perhaps such questionnaires could be thought of as a bit like blood pressure readings; few patients feel reduced to numbers when told their results, and it’s obvious that more information is required to formulate a treatment if something is awry. However, using unreliable measures – especially when the developers know they are unreliable – is unacceptable for a national mental health programme. Moreover, linking questionnaire scores to payment raises even more complex ethical issues: there is a risk that the bureaucratic burden of questionnaires for service users would increase; also poorer treatment and financial decisions could result because those decisions would be made on the basis of low quality, unreliable data. Mental health services need to do much better than that, for the sake of everyone’s wellbeing.

Thanks very much to Martha Pollard and Justine McMahon for helpful comments.

Andy Fugard is a Lecturer in Research Methods and Statistics at the Department of Clinical, Educational and Health Psychology, University College London. His research investigates psychological therapies for children and young people: are they effective in routine practice and what moderates effectiveness. He is also interested in policy around practice-based evidence and recently became a member of the NHS England/Monitor Quality and Cost Benchmarking Advisory Group.

Why can psychological therapy be helpful?

Research explaining how therapy might help is saturated with pretentious jargon, e.g., invoking “transference”, “extinction”, heightening access to “cognitive–emotional structures and processes”, “reconfiguring intersubjective relationship networks” (see over here for more).

Could simpler explanations be provided? Here are some quick thoughts, inspired by literature, discussing with people, and engaging myself as a client in therapy:

  • You know the therapist is there to listen to you – they’re paid to do so – so there’s less need to worry about their thoughts and feelings. One can and is encouraged to talk at length about oneself. This can feel liberating whereas in other settings it might feel selfish or self-indulgent.
  • The therapist keeps track of topics within and across sessions. This can be important for recognising patterns and maintaining focus, whilst allowing time to tell stories, meandering around past experiences, to see where they lead.
  • The therapist has knowledge (e.g., through literature, supervisory meetings, and conversations with other clients) of a range of people who may have had similar feelings and experiences. So although we’re all unique, it can also be helpful to know that others have faced and survived similar struggles – especially if we learn what they tried and what helped.
  • Drawing on this knowledge, the therapist can conjecture what might be going on. This, perhaps, works best if the conjectures are courageous (so a step or two away from what the client says) – and tentative, so it’s possible to disagree.
  • There can be an opportunity for practice, for instance of activities or conversations which are distressing. Practicing is a good way to learn.
  • Related, there’s a regular structure and progress monitoring (verbally, with a diary, or using questionnaires). Self-reflection becomes routine and constrained in time, like (this might be a bit crude but bear with me) a psychological analogue of flossing one’s teeth.
  • (Idea from Clare) “… daring to talk about things never spoken of before with someone who demonstrates compassion and acceptance; helpful because allows us to face things in ourselves that scare us and develop less harsh ways of responding to ourselves”
  • The therapist has more distance from situations having an impact on someone than friends might have so, e.g., alternative explanations for interpersonal disputes can more easily be provided.
  • It’s easier for a therapist to be courageous in interactions and suggestions than for a friend as – if all goes wrong – it’s easier for the client to drop out of the therapeutic relationship without long-term consequences (e.g., there’s no loss of friendship).
  • Telling your story to a therapist gives you an audience who is missing all of the context of your life. Most of the context can feel obvious, until you start to tell your story. Storytelling requires explaining the context, making it explicit. For instance who are the people in your life? Why did you and others say and do the things they did? Perhaps this act of storytelling and making the context explicit also makes it easier to become aware of and find solutions.

Book review: High-quality psychotherapy research, by Areán and Kraemer (2013)

Randomised controlled trials (RCTs) are great, the gold standard of empirical research. The only thing better than RCTs are systematic reviews of lots and lots of RCTs. (So the story goes.) The reader may have noticed that RCTs evaluating CBT for psychosis have been vigorously debated for many months after a review was published in the British Journal of Psychiatry (Jauhar et al., 2014). Maybe not everyone agrees that RCTs are great (disclosure: I have analysed a couple), but I think it’s fair to say they are unavoidable whether you are trying to design or demolish them.

High-quality psychotherapy research by Patricia Areán and Helena Chmura Kraemer sets out to be a “practical, step-by-step guide” to designing and running RCTs. So why bother with an RCT? Observational trials, the authors explain, might involve studying participants who choose one of two or more interventions of interest by simply observing how they get on. This is problematic as differences in outcomes might be due to whatever factors led to them ending up receiving an intervention rather than the effect it had. RCTs use randomisation to overcome this problem so that people differ only in terms of the intervention received. That’s about it for the “why”: don’t expect debate on the epistemology.

The book’s strengths emerge as it develops: it catalogues issues that should worry study investigators and the authors draw on their own experience to offer hints. The Delphi consensus-building approach is introduced to solve the problem of developing an intervention manual and examples are given of how to word a letter asking for feedback on the proposed result. Randomisation techniques are introduced including horror stories of how they have gone wrong and invalidated RCTs. Ideas are provided for control groups, e.g., waiting list, usual care, and “gold standard” controls, and their strengths and drawbacks. The importance of not using pilot study results to determine sample size choices is explained. Guidance is provided on the people required; for example you need three or more therapists, at least two research assistants in case one takes ill, and a good statistician amongst other people. The Appendix includes a sample budget justification. All practical advice.

The text runs to under 200 pages so this could never be a comprehensive guide to all aspects of RCTs. What this book does do well is provide a systematic menu of options and ideas for things to consider. It might possibly give some ideas of what to demolish too, should you be so inclined, but this book is really only for those who are already sold on RCTs and want to get on with the seemingly painful task of designing and running one.


Areán, P. A., & Kraemer, H. C. (2013). High-quality psychotherapy research: from conception to piloting to national trials. Oxford University Press.

Jauhar, S., McKenna, P. J., Radua, J., Fung, E., Salvador, R., & Laws, K. R. (2014). Cognitive-behavioural therapy for the symptoms of schizophrenia: systematic review and meta-analysis with examination of potential bias. The British Journal of Psychiatry, 204, 20–29. doi:10.1192/bjp.bp.112.116285


Book review: Power, interest and psychology, by David Smail (2005)

Power, interest and psychology argues that psychotherapists need to take seriously how the social forces of interest and power affect how we all – therapists and clients alike – think, feel, and behave. The main targets of the book are what Smail believes to be the over-ambition and limited reach of therapists’ actions: warmth and empathy, debating beliefs, the “transference” – exploring and making explicit how the relationship between therapist and client might mirror relationships outside the room. The various therapeutic techniques, he argues, are dwarfed by the harsh social environment outside the therapy room. I read this book with interest as a (non-clinical, academic) lecturer who works with many kinds of psychotherapists and counsellors.

Smail rejects interventions which assume that insight leads to change, that we have individual will power which therapy can encourage, and that conscious thoughts accessible in therapy precede action. But what about clients who show improvement during the first few sessions of therapies which use these forms of intervention? He argues (pp. 24–25) that

“such initial gains tend not to last… Rather like tender plants that thrive only in a greenhouse, it seems that people find that there is still a cold and hostile world waiting for them at the end of their therapy sessions…”

The exceptions cited are clients who are young, attractive, verbal, intelligent, and successful – people who tend to be privileged by society. There is some research support for his clinical experience, for instance showing that cognitive ability positively correlates with outcomes (e.g., Mathiassen et al., 2012). A counterargument is evidence showing that “early responders” tend to sustain better outcomes at long-term follow-up (Haas, Hill, Lambert, & Morrell, 2002; Lambert, 2005). However, these correlational studies are open to attack: perhaps the “early response” just signals existing social and material resources which were easily activated by therapy (friends, family, money, etc.).

Therapy, Smail argues, tries to boost the perception of clients’ power to change, when in reality it is actual power that clients often need: power over material resources, finances; control in the workplace; personal characteristics such as confidence and intellect; a good home and family life; a love life; and an active social life (Hagan & Smail, 1997). These are areas which often cannot be influenced by talk in the clinic.

So why has individual therapy grown so popular? Smail argues – and emphasises that it’s nothing to be ashamed of – that therapists rely on income to put food on the table and pay the rent, just like their clients. He illustrates with the example of Sigmund Freud (p. 3) who wrote that

“My mood also depends very strongly on my earnings… I have come to know the helplessness of poverty and continually fear it.”

Freud, he argues, changed his theories so as not to blame clients’ parents since they paid the bills. Smail also argues that there is a great mysticism around therapy (p. 8): “rituals of therapeutic cure… bear a strong resemblance to the spells and incantations of sorcerers”, with practitioners rarely explaining in plain language to clients how their techniques supposedly work. Together these interests help sustain psychotherapy.

Is it really true that therapists can only intervene in the room with the individual client? Couple therapy takes the first step beyond the individual by bringing a romantic partner into the room, and there is evidence it helps with relationship problems (Snyder, Castellani, & Whisman, 2006). Child and adolescent mental health services frequently intervene in the family (Carr, 2009). Multi-family therapy (Asen & Scholz, 2010) brings communities of people into a room and encourages families to help each other as the therapists gradually “decentralise” themselves. There is an awareness of the importance of the systems around people suffering distress.

Another path outside the clinic is via “homework”, such as practicing social skills, which is (ideally) jointly agreed and set in a range of different types of therapies (Ronan & Kazantzis, 2006). Outcomes are better when therapies include homework than when no homework is included (Kazantzis, Whittington, & Dattilio, 2010). Smail, however, no doubt would argue that each of these interventions is limited when there are more material challenges at work such as poverty. What then would the homework consist of? Get a job? Make more money?

“The world is in a bloody mess,” concludes the book, “and even though I know, as do many others, what it would look like if it weren’t, I have no more viable idea than anyone else how to get there.” But there are constructive ideas in this text. Awareness that the causes of many of our actions are a mystery can be positive, for example in terms of accepting that social power flows through us and we shouldn’t blame ourselves for our situation or how we feel. A rich analysis is provided of the sources of this social power. The positive and convincing argument of the book is that the main hope of exercising power is through cooperation with others on all levels from friendship through to political activism. Indeed there is some evidence that activists who “advocate a social or political cause” tend to experience more positive emotions than non-activists (Klar & Kasser, 2009).

To what extent broader societal processes are within the scope of psychotherapy will no doubt continue to be debated. But whatever the scope, Smail suggests (p. 84) that the “appropriate role for therapeutic psychology is to record, celebrate and wonder at the extraordinary diversity of human character” – which sounds to me like a valuable starting point for therapeutic research and practice.


Asen, E., & Scholz, M. (2010). Multi-family therapy: concept and techniques. Hove: Routledge.

Carr, A. (2009). The effectiveness of family therapy and systemic interventions for child-focused problems. Journal of Family Therapy, 31, 3–45.

Haas, E., Hill, R. D., Lambert, M. J., & Morrell, B. (2002). Do early responders to psychotherapy maintain treatment gains? Journal of Clinical Psychology, 58, 1157–72. doi:10.1002/jclp.10044

Hagan, T., & Smail, D. (1997). Power-Mapping I. Background and Basic Methodology. Journal of Community & Applied Social Psychology, 7, 257–267.

Kazantzis, N., Whittington, C., & Dattilio, F. (2010). Meta-Analysis of Homework Effects in Cognitive and Behavioral Therapy: A Replication and Extension. Clinical Psychology: Science and Practice, 17, 144–156. doi:10.1111/j.1468-2850.2010.01204.x

Klar, M., & Kasser, T. (2009). Some Benefits of Being an Activist: Measuring Activism and Its Role in Psychological Well-Being. Political Psychology, 30(5), 755–777. doi:10.1111/j.1467-9221.2009.00724.x

Lambert, M. J. (2005). Early response in psychotherapy: further evidence for the importance of common factors rather than “placebo effects”. Journal of Clinical Psychology, 61(7), 855–69. doi:10.1002/jclp.20130

Mathiassen, B., Brøndbo, P. H., Waterloo, K., Martinussen, M., Eriksen, M., Hanssen-Bauer, K., & Kvernmo, S. (2012). IQ as a moderator of outcome in severity of children’s mental health status after treatment in outpatient clinics. Child and Adolescent Psychiatry and Mental Health, 6(22), 1–7. doi:10.1186/1753-2000-6-22

Ronan, K. R., & Kazantzis, N. (2006). The use of between-session (homework) activities in psychotherapy: Conclusions from the Journal of Psychotherapy. Journal of Psychotherapy Integration, 16, 254–259. doi:10.1037/1053-0479.16.2.254

Smail, D. (2005). Power, interest and psychology: elements of a social materialist understanding of distress. Ross-on-Wye: PCCS Books.

Snyder, D. K., Castellani, A. M., & Whisman, M. A. (2006). Current status and future directions in couple therapy. Annual Review of Psychology, 57, 317–44. doi:10.1146/annurev.psych.56.091103.070154

Lightly edited 3 Feb 2019