RCT of the ZOE nutrition app – and critical analysis

Abstract: “Large variability exists in people’s responses to foods. However, the efficacy of personalized dietary advice for health remains understudied. We compared a personalized dietary program (PDP) versus general advice (control) on cardiometabolic health using a randomized clinical trial. The PDP used food characteristics, individual postprandial glucose and triglyceride (TG) responses to foods, microbiomes and health history, to produce personalized food scores in an 18-week app-based program. The control group received standard care dietary advice (US Department of Agriculture Guidelines for Americans, 2020–2025) using online resources, check-ins, video lessons and a leaflet. Primary outcomes were serum low-density lipoprotein cholesterol and TG concentrations at baseline and at 18 weeks. Participants (n = 347), aged 41–70 years and generally representative of the average US population, were randomized to the PDP (n = 177) or control (n = 170). Intention-to-treat analysis (n = 347) between groups showed significant reduction in TGs (mean difference = −0.13 mmol l−1; log-transformed 95% confidence interval = −0.07 to −0.01, P = 0.016). Changes in low-density lipoprotein cholesterol were not significant. There were improvements in secondary outcomes, including body weight, waist circumference, HbA1c, diet quality and microbiome (beta-diversity) (P < 0.05), particularly in highly adherent PDP participants. However, blood pressure, insulin, glucose, C-peptide, apolipoprotein A1 and B, and postprandial TGs did not differ between groups. No serious intervention-related adverse events were reported. Following a personalized diet led to some improvements in cardiometabolic health compared to standard dietary advice. ClinicalTrials.gov registration: NCT05273268.”

Bermingham, K. M., Linenberg, I., Polidori, L., Asnicar, F., Arrè, A., Wolf, J., Badri, F., Bernard, H., Capdevila, J., Bulsiewicz, W. J., Gardner, C. D., Ordovas, J. M., Davies, R., Hadjigeorgiou, G., Hall, W. L., Delahanty, L. M., Valdes, A. M., Segata, N., Spector, T. D., & Berry, S. E. (2024). Effects of a personalized nutrition program on cardiometabolic health: A randomized controlled trial. Nature Medicine.

And here is a blog post that provides an in-depth critique. The key issue is the control group and how small a role the specific elements of ZOE play, as illustrated in a figure in that post.


The Turing Way handbook to reproducible, ethical and collaborative data science

“The Turing Way project is open source, open collaboration, and community-driven. We involve and support a diverse community of contributors to make data science accessible, comprehensible and effective for everyone. Our goal is to provide all the information that researchers and data scientists in academia, industry and the public sector need to ensure that the projects they work on are easy to reproduce and reuse.”

Yonder

Let’s not replace “impact evaluation” with “contribution analysis”

Giel Ton has written an interesting blog post arguing that we should shift from talking about “impact evaluation” to “contribution analysis”, in the form devised by Mayne. Ton defines contribution analysis as following this process:

“make a good theory of change, identify key assumptions in this theory of change, and focus M&E and research on these key assumptions.”

My first thought was, this definition is remarkably broad! It’s the same as for any theory-based approach (or theory-driven – evaluation is awash with synonyms) where you start with a theory of change (ToC) and test and refine it. See, e.g., what Fitz-Gibbon and Morris (1975), Chen and Rossi (1980), and many others were proposing before Mayne. They all criticise “black box” approaches that lob methods at a programme before stopping to think what it might do and how, so I wondered what makes Ton’s (and/or Mayne’s) proposal different to these broad umbrella approaches that include all methods, mixed, blended, interwoven, shaken, or stirred – so long as a ToC is used throughout.

One recurring issue is people endlessly rocking up with yet another panacea: “Behold! ACME Programme™ will finally sort out your social problem!” Effect size, 0.1 SDs, if you’re lucky. A piece by Thomas Delahais (2023) helped clarify for me what’s different about the contribution analysis approach and how it helps address the panacea phenomenon: alternative explanations of change are treated as being as important as the new programme being investigated. That’s a fun challenge, for all evaluation approaches, qual, quant and RCTs included. For instance, we would design statistical analyses to tell us something about mechanisms that are involved in a range of activities in and around a new programme. We would explore how a new programme interacts with existing activities. These ideas sound very sensible to me – and are often done through implementation and process evaluation. But taking seriously the broader context and alternative explanations of change is much broader than contribution analysis. We might call the activity something like “evaluation”.

References

Chen, H.-T., & Rossi, P. H. (1980). The Multi-Goal, Theory-Driven Approach to Evaluation: A Model Linking Basic and Applied Social Science. Social Forces, 59, 106–122.

Delahais, T. (2023). Contribution Analysis. LIEPP Methods Brief, 44.

Fitz-Gibbon, C. T., & Morris, L. L. (1975). Theory-based evaluation. Evaluation Comment, 5(1), 1–4. Reprinted in Fitz-Gibbon, C. T., & Morris, L. L. (1996). Theory-based evaluation. Evaluation Practice, 17(2), 177–184.

Hedges’ g for multilevel models in R {lmeInfo}

This package looks useful (for {nlme} not {lme4}).

“Provides analytic derivatives and information matrices for fitted linear mixed effects (lme) models and generalized least squares (gls) models estimated using lme() (from package ‘nlme’) and gls() (from package ‘nlme’), respectively. The package includes functions for estimating the sampling variance-covariance of variance component parameters using the inverse Fisher information. The variance components include the parameters of the random effects structure (for lme models), the variance structure, and the correlation structure. The expected and average forms of the Fisher information matrix are used in the calculations, and models estimated by full maximum likelihood or restricted maximum likelihood are supported. The package also includes a function for estimating standardized mean difference effect sizes (Pustejovsky, Hedges, and Shadish (2014) <doi:10.3102/1076998614547577>) based on fitted lme or gls models.”
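For orientation, here is a minimal sketch of how I understand the workflow, assuming the g_mlm() interface described in the package vignette (the data, constants and effect size below are made up purely for illustration – check the documentation before relying on this):

library(nlme)
library(lmeInfo)

# Toy two-level data: pupils nested in schools, treatment assigned at school level.
set.seed(2024)
n_schools <- 20; n_pupils <- 15
d <- data.frame(
  school = factor(rep(1:n_schools, each = n_pupils)),
  treat  = rep(c(0, 1), each = (n_schools / 2) * n_pupils)
)
school_effect <- rnorm(n_schools, sd = 0.4)
d$outcome <- 0.3 * d$treat + school_effect[as.integer(d$school)] + rnorm(nrow(d))

# Random-intercept model fitted with nlme::lme()
fit <- lme(outcome ~ treat, random = ~ 1 | school, data = d)

# Hedges' g-type standardised mean difference (Pustejovsky et al., 2014):
# p_const picks out the treatment coefficient for the numerator; r_const says which
# variance components (random intercept + residual, here) are summed for the denominator.
g_mlm(fit, p_const = c(0, 1), r_const = c(1, 1))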

How Many Imputations Do You Need? {howManyImputations}

“When performing multiple imputations, while 5-10 imputations are sufficient for obtaining point estimates, a larger number of imputations are needed for proper standard error estimates. This package allows you to calculate how many imputations are needed, following the work of von Hippel (2020).”

Useful example here.
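And a minimal sketch of the two-stage workflow as I understand it, assuming how_many_imputations() accepts a mice “mira” object and a cv argument (see the package documentation for the definitive interface):

library(mice)
library(howManyImputations)

# Stage 1: a pilot run with a modest number of imputations.
imp_pilot <- mice(nhanes, m = 20, printFlag = FALSE, seed = 123)
fits <- with(imp_pilot, lm(bmi ~ age + chl))

# Stage 2: von Hippel's (2020) two-stage quadratic rule – how many imputations are
# needed so that standard errors would barely change if you re-imputed?
# (Here targeting a coefficient of variation of about 5%.)
how_many_imputations(fits, cv = 0.05)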

References

Von Hippel, P. T. (2020). How many imputations do you need? A two-stage calculation using a quadratic rule. Sociological Methods & Research, 49(3), 699-718.

Why does everyone love a good RCT?

The individual treatment effect is defined as an individual’s potential outcome under treatment minus their potential outcome under control. This within-participant difference cannot be directly measured since only one of the two potential outcomes is realised depending on whether the participant was exposed to treatment or control.

Everyone loves a good randomised controlled trial because the mean outcome of people who were exposed to treatment minus the mean outcome of people who were exposed to control – a between-participant difference – is an unbiased estimator of the mean of within-participant individual treatment effects.

I’ve coded up a simulation in R over here to illustrate how they work. Note in particular the importance of confidence intervals!
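For a flavour of the logic, here is a toy potential-outcomes simulation (an illustrative sketch only, not the code linked above):

set.seed(42)
n  <- 1000
y0 <- rnorm(n, mean = 10, sd = 2)         # potential outcome under control
y1 <- y0 + rnorm(n, mean = 1.5, sd = 1)   # potential outcome under treatment
mean(y1 - y0)                             # mean individual treatment effect (never fully observable)

treat <- rbinom(n, 1, 0.5)                # randomise
y_obs <- ifelse(treat == 1, y1, y0)       # only one potential outcome is realised per person

# The between-participant difference in means estimates the mean within-participant
# effect, and the t-test supplies a confidence interval around it.
t.test(y_obs[treat == 1], y_obs[treat == 0])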

A comment about a question

How often have you been at an event where someone from the audience wants to ask a question and insists that they don’t need a microphone because they have a loud voice – only to be reminded that they still require one? Well, I’m at a national conference and I have a question (not a comment). It’s concise, should take a few seconds to say, and hopefully open enough that anyone in the panel can pick it up. Fortunately, once upon a time I took a course on live sound engineering, so I understand how you’re supposed to hold a mic. I’ll make this brief with no fuss.

A couple of mics are passed around the audience. Someone else asks a question before me. After a brief wait, it’s my turn. I hold the microphone as I have learned you should, the correct distance from my mouth and pointing towards it. I begin to speak.

What I hadn’t considered was that I was sitting near the back of a large hall, and the PA system speakers were positioned at the front. My mic technique had optimised the volume of my speech, which I heard back after a short, reverberant delay. This booming, reverberated clone of me made it difficult to concentrate. Additionally, it heightened my awareness of the Julian Clary dimension to my voice, triggering thoughts about cisgendered and heterosexual (cishet) norms of professionalism and which types of voices tend to be taken seriously. I felt that mine had squarely landed on the not-seriously end of the spectrum. All of these thoughts were happening at once; I was only about a sentence in.

So, I tried to adjust the volume by holding the microphone further from my mouth, then a little closer, and then further again. I noticed people dotted around the room, whom I’d been speaking to earlier over coffee, had turned to listen. The panel appeared to be straining to hear me. One panel member asked me to repeat the question, which, by then, I had reduced to a short sentence – simply wanting the moment to be over. That panel member gave a good answer. Another asked, “Can we just take another question, please?”

I’ve recounted this experience in various ways: sometimes aiming for a giggle (taking poetic licence with the Clary elements and in-out mic adjustments), other times focusing more on the frustrations of dominant professional norms. And yet, I don’t have a conclusion. I’d be eager to hear your thoughts.