Individual differences (continued)

“I am surprised that the author has used this data set. In my lab, when we collect data with such large individual differences, we refer to the data as ‘junk’. We then redesign our stimuli and/or experimental procedures, and run a new experiment. The junk data never appear in publications”

—An anonymous reviewer in 2005, commenting on research that sought to model individual differences in cognition.

From the intro to Navarro, D. J.; Griffiths, T. L.; Steyvers, M. & Lee, M. D. Modeling individual differences using Dirichlet processes. Journal of Mathematical Psychology, 2006, 50, 101-122

Maximum likelihood

From an old post to the R mailing list by Peter Dalgaard:

“the possibility of finding maximum likelihood estimates by writing up the likelihood and maximizing it is often overlooked…”

R really is magical sometimes. Suppose you want to fit a distribution \(M\) to data \(x_1, \ldots, x_N\). All you need is to maximise the likelihood \(\prod_{i=1}^N P(x_i | M)\), or equivalently the log-likelihood \(\sum_{i=1}^N \log P(x_i | M)\). Here’s an example of fitting a Gaussian, starting from a deliberately spoiled version of a fairly good first guess…

> x = rnorm(1000, 100, 15)
> f = function(p) -2*sum(dnorm(x, p[1], p[2], log=TRUE))
> optim(c(mean(x)-50, sd(x)+15), f)
$par
[1] 100  15

$value
[1] 8193

$counts
function gradient 
      69       NA 

$convergence
[1] 0


(Well, actually f is −2 times the log-likelihood, since optim minimises rather than maximises.) Now to have a look at your estimate:

hist(x, probability=T)
curve(dnorm(x,100,15), min(x), max(x), add=T)
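The same recipe works anywhere you can evaluate the log-likelihood. Here is a rough Python sketch of the idea (names and the crude grid search are mine; in practice you would hand the function to a real optimiser such as scipy.optimize.minimize, just as optim does the minimising in R):

```python
import math
import random

random.seed(1)
x = [random.gauss(100, 15) for _ in range(1000)]

# Sufficient statistics, so each likelihood evaluation is O(1)
n, sx, sxx = len(x), sum(x), sum(xi * xi for xi in x)

def neg2loglik(mu, sigma):
    """-2 * Gaussian log-likelihood: the analogue of
    -2*sum(dnorm(x, mu, sigma, log=TRUE))."""
    ss = sxx - 2 * mu * sx + n * mu * mu          # sum of (x_i - mu)^2
    return 2 * (n * math.log(sigma)
                + 0.5 * n * math.log(2 * math.pi)
                + ss / (2 * sigma * sigma))

# Crude grid search over (mu, sigma), standing in for optim()
mu_hat, sigma_hat = min(
    ((90 + 0.05 * i, 10 + 0.05 * j) for i in range(400) for j in range(200)),
    key=lambda p: neg2loglik(*p))
```

The estimates land next to the closed-form answers (the sample mean, and the square root of the mean squared deviation), which is the whole point: write down the likelihood, hand it to a minimiser, done.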

It’s funny how the same names keep popping up…

I first heard of Per Martin-Löf through his work in intuitionistic logic, which turned out to be important in computer science (see Nordström, Petersson, and Smith, 1990). His name has popped up again (Martin-Löf, 1973), this time in the context of his conditional likelihood ratio test, apparently used by Item Response Theory folk to assess whether two groups of items test the same ability (see Wainer et al., 1980). Small world.


Martin-Löf, P. (1973). Statistiska modeller [Statistical models]. Anteckningar från seminarier läsåret 1969–1970, utarbetade av Rolf Sundberg. Obetydligt ändrat nytryck, oktober 1973 (photocopied manuscript). Institutet för Försäkringsmatematik och Matematisk Statistik vid Stockholms Universitet.

Bengt Nordström, Kent Petersson, and Jan M. Smith. (1990). Programming in Martin-Löf’s Type Theory. Oxford University Press.

Howard Wainer, Anne Morgan, and Jan-Eric Gustafsson (1980). A Review of Estimation Procedures for the Rasch Model with an Eye toward Longish Tests. Journal of Educational Statistics, 5, 35–64.

Some advice on factor analysis from the 60s

“What are the alternatives to factor (or component) analysis if one has a correlation matrix whose analysis one cannot escape? There is only one alternative method of analysing a correlation matrix which needs to be mentioned, and that is to LOOK AT IT.”

“Quite the best alternative to factor analysis is to avoid being saddled with the analysis of a correlation matrix in the first place. (Just to collect a lot of people, to measure them all on a lot of variables, and then to compute a correlation matrix is, after all, not a very advanced way of investigating anything.)”

From Andrew S. C. Ehrenberg (1962). Some Questions About Factor Analysis. The Statistician, 12(3), 191-208

13 ways to look at (Galton-Pearson) correlation

Found this paper while having a nosy around for different ways of correlating non-Gaussian variables: Joseph Lee Rodgers and W. Alan Nicewander (1988). Thirteen Ways to Look at the Correlation Coefficient. The American Statistician, 42(1), 59-66.

Therein you’ll find details of the history (apparently Gauss got there first, but didn’t care about the special case of bivariate correlation); a range of examples of how to get the coefficient (e.g., standardised covariance, standardised regression slope, a geometric interpretation in “person space”, the balloon rule). Also a nice reminder that, in terms of the maths, the dichotomy between experimental and observational analysis is false: the difference lies in interpretation. Still many people seem to think that ANOVA is for experiments and regression is for observational studies (or that SEM magically deals with causation in observational studies).
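Two of those routes to the coefficient are easy to check numerically. A quick Python sketch with made-up data (all names mine), showing that the standardised covariance and the standardised regression slope are the same number:

```python
import math
import random

random.seed(2)
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.6 * xi + random.gauss(0, 1) for xi in x]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / n)
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / n)
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

r_std_cov = cov / (sx * sy)      # r as standardised covariance
slope = cov / sx ** 2            # OLS slope of y on x
r_std_slope = slope * sx / sy    # the same slope after standardising both variables
```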

All amusing stuff.

A couple of properties of correlation

Spotted these in Langford, E.; Schwertman, N. & Owens, M. (2001) [Is the Property of Being Positively Correlated Transitive? The American Statistician, 55, 322-325.]

1. Let U, V, and W be independent random variables. Define X = U+V, Y = V+W, and Z = W−U. Then the correlation between X and Y is positive, Y and Z is positive, but the correlation between X and Z is negative.

It’s easy to see why. X and Y are both V plus different, uncorrelated noise terms. Y and Z have W in common, again with different noise terms. X and Z, however, share U with opposite signs: X is U plus some noise, and Z is −U plus some noise which is uncorrelated with the noise in X.
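A quick simulation (Python; the setup simply follows the construction above with standard Gaussian components) makes the signs visible:

```python
import random

random.seed(3)
n = 100_000
u = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(n)]

x = [ui + vi for ui, vi in zip(u, v)]   # X = U + V
y = [vi + wi for vi, wi in zip(v, w)]   # Y = V + W
z = [wi - ui for wi, ui in zip(w, u)]   # Z = W - U

def corr(a, b):
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / m
    va = sum((ai - ma) ** 2 for ai in a) / m
    vb = sum((bi - mb) ** 2 for bi in b) / m
    return cov / (va * vb) ** 0.5

# With unit-variance components the population values are
# corr(X, Y) = 1/2, corr(Y, Z) = 1/2, corr(X, Z) = -1/2
```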

2. If X, Y, and Z are random variables, and X and Y are correlated (call the coefficient \(r_1\)), Y and Z are correlated (\(r_2\)), and \(r_1^2 + r_2^2 > 1\), then X and Z are positively correlated.
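A sketch of why property 2 holds (my reconstruction, assuming \(r_1\) and \(r_2\) have the same sign): because the \(3 \times 3\) correlation matrix of \((X, Y, Z)\) must be positive semidefinite, the third correlation is bounded below,

\[ r_{XZ} \ge r_1 r_2 - \sqrt{(1 - r_1^2)(1 - r_2^2)}. \]

If \(r_1^2 + r_2^2 > 1\) then \(r_1^2 > 1 - r_2^2\) and \(r_2^2 > 1 - r_1^2\); multiplying these gives \(r_1^2 r_2^2 > (1 - r_1^2)(1 - r_2^2)\), so the right-hand side of the bound is strictly positive.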

Mediation analysis

Glad it’s not just me…

… mediation: a crucial issue in causal inference and a difficult issue to think about. The usual rhetorical options here are:

– Blithe acceptance of structural equation models (of the form, “we ran the analysis and found that A mediates the effects of X on Y”)

– Blanket dismissal (of the form, “estimating mediation requires uncheckable assumptions, so we won’t do it”)

– Claims of technological wizardry (of the form, “with our new method you can estimate mediation from observational data”)

For example, in our book, Jennifer and I illustrate that regression estimates of mediation make strong assumptions, and we vaguely suggest that something better might come along. We don’t provide any solutions or even much guidance.

This is from a blog posting by Andrew Gelman. He links to a paper which purports to solve the problem, but it looks Hard.

Different notions of “effect size”

Tired of people equating “effect size” with “standardised measure of effect size”? Here’s an antidote, thanks to Shinichi Nakagawa and Innes C. Cuthill (2007). [Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol. Rev. (2007), 82, pp. 591–605.]

They review the different meanings of “effect size”:

  • “Firstly, effect size can mean a statistic which estimates the magnitude of an effect (e.g. mean difference, regression coefficient, Cohen’s d, correlation coefficient). We refer to this as an ‘effect statistic’ (it is sometimes called an effect size measurement or index).
  • “Secondly, it also means the actual values calculated from certain effect statistics (e.g. mean difference = 30 or r = 0.7; in most cases, ‘effect size’ means this, or is written as ‘effect size value’).
  • “The third meaning is a relevant interpretation of an estimated magnitude of an effect from the effect statistics. This is sometimes referred to as the biological importance of the effect, or the practical and clinical importance in social and medical sciences.”
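The distinction between the first two senses is easy to make concrete. A small Python illustration with invented numbers (two made-up groups measured in seconds): the unstandardised statistic keeps its units, the standardised one does not.

```python
import math

# Invented data: times in seconds for two groups
a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
b = [10.8, 11.1, 10.2, 11.5, 10.9, 10.6]

def mean(v):
    return sum(v) / len(v)

def var(v):  # unbiased sample variance
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

# Unstandardised effect statistic: mean difference, still in seconds
diff = mean(a) - mean(b)

# Standardised effect statistic: Cohen's d, unit-free
pooled_sd = math.sqrt(((len(a) - 1) * var(a) + (len(b) - 1) * var(b))
                      / (len(a) + len(b) - 2))
d = diff / pooled_sd
```

Here diff is an effect statistic in the first sense; the particular numbers it and d come out at are effect sizes in the second sense; whether a difference of that many seconds matters is the third.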

They argue in favour of confidence intervals, as these “are not simply a tool for NHST [significance testing], but show a range of probable effect size estimates with a given confidence.”

They also cite Wilkinson, L & The Task Force on Statistical Inference (1999) [Statistical methods in psychology journals. American Psychologist 54, 594–604]:

“our focus on these two standardised effect statistics does not mean priority of standardised effect statistics (r or d) over unstandardised effect statistics (regression coefficient or mean difference) and other effect statistics (e.g. odds ratio, relative risk and risk difference). If the original units of measurement are meaningful, the presentation of unstandardised effect statistics is preferable over that of standardised effect statistics (Wilkinson & the Task Force on Statistical Inference, 1999).”

Good stuff, this.

Group versus individual statistical predictions

Lovely article in the Guardian by Christine Evans-Pughe on how making statistical predictions about individuals is exceedingly tricky. It reports work by Hart, Michie, and Cooke (2007). Abstract of the paper:

BACKGROUND: Actuarial risk assessment instruments (ARAIs) estimate the probability that individuals will engage in future violence. AIMS: To evaluate the ‘margins of error’ at the group and individual level for risk estimates made using ARAIs. METHOD: An established statistical method was used to construct 95% CI for group and individual risk estimates made using two popular ARAIs. RESULTS: The 95% CI were large for risk estimates at the group level; at the individual level, they were so high as to render risk estimates virtually meaningless. CONCLUSIONS: The ARAIs cannot be used to estimate an individual’s risk for future violence with any reasonable degree of certainty and should be used with great caution or not at all. In theory, reasonably precise group estimates could be made using ARAIs if developers used very large construction samples and if the tests included few score categories with extreme risk estimates.

Hart, S.D., Michie, C., & Cooke D.J. (2007) Precision of actuarial risk assessment instruments: Evaluating the ‘margins of error’ of group v. individual predictions of violence. British Journal of Psychiatry, 190, s60-s65.

What’s the difference between fixed and random effects?

Gelman (2005, p. 21) to the rescue.

We prefer to sidestep the overloaded terms “fixed” and “random” with a cleaner distinction […]. We define effects (or coefficients) in a multilevel model as constant if they are identical for all groups in a population and varying if they are allowed to differ from group to group.

Gelman A. (2005). Analysis of variance—why it is more important than ever. Annals of Statistics, 33(1), 1–53