Growth curve analysis (GCA) is a multilevel regression technique designed for analysis of time course or longitudinal data. A major advantage of this approach is that it can be used to simultaneously analyze both group-level effects (e.g., experimental manipulations) and individual-level effects (i.e., individual differences). We have been using this method for several years, particularly in the context of visual world paradigm (VWP) eye tracking data and learning curves, though it can be applied to any time course data.
Mirman, D. (2014). Growth Curve Analysis and Visualization Using R. Chapman and Hall / CRC.
Mirman, D., Dixon, J.A., & Magnuson, J.S. (2008). Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language, 59(4), 475-494.
The basics are summarized below. Detailed tutorial materials are available on the LCDL github page. All of the analyses are conducted using the R package lme4
and the graphs generated using the ggplot2
package. For more information about R, check out our R Resources page.
Growth curve analysis provides a way to address those challenges by explicitly modeling change over time and quantifying both group-level and individual-level differences. To specify a growth curve model, you need to decide on three key components:
To see how these components come together, consider the ChickWeight
data set (part of the default R installation), which has data from an experiment on the effect of diet on early growth of chicks. We start out with just a "base" model of chick growth allowing for individual variability in weight (in technical terms, a random intercept for each Chick: (1 | Chick)
):
> m.base <- lmer(weight ~ Time + (1 | Chick), data=ChickWeight, REML=F)
To this, we can add fixed effects of diet on the intercept (i.e., a constant difference in weights among chicks randomly assigned to different diets):
> m.0 <- lmer(weight ~ Time + Diet + (1 | Chick), data=ChickWeight, REML=F)
and on the slope (i.e., effects of diet on the rate of growth):
> m.1 <- lmer(weight ~ Time * Diet + (1 | Chick), data=ChickWeight, REML=F)
We can visually examine the effect of adding these terms by plotting the model fits against the observed data. The observed data are on the left, the m.0 (intercept model) fit is in the middle, and the m.1 (linear model) fit is on the right:
To statistically evaluate the effects of adding these terms, we can examine the change in the goodness of fit (log likelihood) through model comparisons:
> anova(m.base, m.0, m.1)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m.base 4 5630.3 5647.8 -2811.2 5622.3
m.0 7 5619.2 5649.7 -2802.6 5605.2 17.143 3 0.0006603 ***
m.1 10 5508.0 5551.6 -2744.0 5488.0 117.184 3 < 2.2e-16 ***
The critical statistic is a chi-square with degrees of freedom equal to the number of parameters added. Since there were 4 diets, each effect of Diet added 3 parameters to the model (diet 1 is considered the reference condition by default and parameters are estimated for each of the 3 other diets relative to this baseline). The print command will give us the actual parameter estimates:
>
coef(summary(m.1))
Estimate Std. Error t value
(Intercept) 31.5080726 5.9112888 5.33015279
Time 6.7130158 0.2573086 26.08935872
Diet2 -2.8744770 10.1918475 -0.28203690
Diet3 -13.2577473 10.1918475 -1.30081885
Diet4 -0.3982789 10.2006418 -0.03904449
Time:Diet2 1.8961205 0.4267194 4.44348327
Time:Diet3 4.7098552 0.4267194 11.03735892
Time:Diet4 2.9494975 0.4322552 6.82350985
Of course, in addition to different baseline weights, the chicks might have other (unmeasured) individual properties that affect their growth rate. To capture this, we can add a linear effect on time of Chick to the random effects, and we can use model comparisons to examine whether this effect improved model fit:
> m.t1 <- lmer(weight ~ Time * Diet + (1 + Time | Chick), data=ChickWeight, REML=F)
> anova(m.1, m.t1)
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
m.1 10 5508.0 5551.6 -2744.0 5488.0
m.t1 12 4824.2 4876.5 -2400.1 4800.2 687.78 2 < 2.2e-16 ***
> coef(summary(m.t1))
Estimate Std. Error t value
(Intercept) 33.654115 2.8023251 12.009354
Time 6.279857 0.7303555 8.598357
Diet2 -5.020519 4.8072243 -1.044370
Diet3 -15.403790 4.8072243 -3.204300
Diet4 -1.747533 4.8145791 -0.362967
Time:Diet2 2.329279 1.2507982 1.862234
Time:Diet3 5.143014 1.2507982 4.111785
Time:Diet4 3.252804 1.2515485 2.599023
Adding the random slopes did improve model fit. As a general rule, the model should include all random effects that are licensed by the design, that is, all the ones that could vary across participants (Barr et al., 2013). Notice that adding the random slope effect caused slight changes in the fixed effect parameter estimates because some of the variance was now captured by that random effect. This is why Barr et al. recommend using the "maximal" random effect structure -- even when a random effect does not improve model fit, it can still affect the fixed effect estimates and excluding it can elevate the false positive rate.
Often, our data are not perfectly straight, so we want to capture that curvature with higher-order polynomial terms (time squared, time cubed, etc.). Because our time variable is usually only positive, natural polynomials are correlated, so the estimated parameters will be interdependent. Given a particular time range and a maximum polynomial order, we can transform the polynomial time vectors to make them independent, that is, "orthogonal". This is illustrated in the figure below for linear and quadratic time in the range 1-10. Since orthogonal polynomial time terms are independent, the parameter estimates can be interpreted independently.
With orthogonal polynomials, the intercept term reflects the average overall curve height, rather than the height at the left edge of the time window, so if you are interested in differences at the very beginning of the time window, you may be better off sticking with natural polynomials.
There are two rules of thumb for reporting growth curve analysis results:
Target fixation proportions for two within-participants conditions (high vs. low word frequency) modeled with third-order orthogonal polynomials.
Individual differences in cohort and rhyme competition among people with aphasia and age-matched controls (Mirman et al., 2011), modeled with fourth-order orthogonal polynomials.
Individual differences in the time course of activation of function and thematic semantic knowledge in 17 participants with left hemisphere stroke (Kalenine, Mirman, & Buxbaum, 2012).
Novel word learning in two between-participants conditions (high vs. low transitional probability; based on Mirman et al., 2008) modeled using second-order orthogonal polynomial.
Additional worked examples and explanations are included in the growth curve analysis workshop slides.
I highly recommend the ggplot2 package for plotting (if you are not already familiar with it, check out our R Resources page). To start, plot the target fixation data discussed above:
ggplot(TargetFix, aes(Time, meanFix, color=Condition)) +
stat_summary(fun.data=mean_se, geom="pointrange") +
labs(y="Fixation Proportion", x="Time since word onset (ms)") +
theme_bw() + scale_color_manual(values=c("red", "blue"))
You can now add the model fits by adding another stat_summary with the y-variable mapped to the predicted values in the lmer model object, which can be accessed using the fitted()
function:
last_plot() + stat_summary(aes(y=fitted(m.full)), fun.y=mean, geom="line")
The other aesthetics (mappings) are automatically inherited (e.g., mapping of color to Condition), so the plot will automatically remain consistent. This is particularly useful for multi-panel plots as in the cohort and rhyme competition example from above. When the model was fit to a data subset, you may want to make sure to align the data and the model-predicted values by creating a new data frame that has just the data subset and add the model-predicted values to that data frame:
dat <- subset(CohortRhyme, Type == "Cohort")
dat$mfit <- fitted(cohort.group)
ggplot(dat, aes(Time, FixProp, color=Object)) +
facet_wrap(~ Group) + theme_bw() +
stat_summary(fun=mean, geom="point") +
stat_summary(aes(y=mfit), fun.y=mean, geom="line") +
labs(y="Fixation Proportion", x="Time since word onset (ms)") +
scale_color_manual(values=c("red", "blue"))