Contact

Growth Curve Analysis


Overview

Growth curve analysis (GCA) is a multilevel regression technique designed for analysis of time series data. A major advantage of this approach is that it can be used to simultaneously analyze both group-level effects (e.g., experimental manipulations) and individual-level effects (i.e., individual differences). We have been using this method for several years, particularly in the context of visual world paradigm (VWP) eye tracking data, though it can be applied to any time series data. Our first article describing this method was:

Mirman, D., Dixon, J.A., & Magnuson, J.S. (2008). Statistical and computational models of the visual world paradigm: Growth curves and individual differences. Journal of Memory and Language, 59(4), 475-494.

Our original GCA page provides the example data sets and analysis code used for that article, along with (old) tutorial materials. Since that original report, we have improved the method in various ways. This "new and improved" version of growth curve analysis is described here. All of the analyses are conducted using the R package lme4 and the graphs generated using the ggplot2 package. For more information about R, check out our R Resources page.

Key challenges in analyzing time series data

  1. Using separate analyses for individual time bins or time windows creates a trade-off between power (more data in each bin) and temporal resolution (smaller time bins) and introduces experimenter bias in selection of time bins/windows.
  2. Statistical thresholding (i.e., p < 0.05 is significant but p > 0.05 is not) creates false discretization of continuous processes.
  3. There is no clear way to quantify individual differences, which are an important source of constraints for theories of cognition.

The big picture

Growth curve analysis provides a way to address those challenges by explicitly modeling change over time and quantifying both group-level and individual-level differences. To specify a growth curve model, you need to decide on three key components:
  1. The functional form: the overall shape of the data. What family of mathematical functions are you going to use to model the data? For growth curve analysis (and multilevel regression in general) the functional form needs to be "dynamically consistent", meaning that the model of the average is equal to the average of the individual models (see our technical report on dynamic consistency for more information). Polynomial models (starting from linear and going up through quadratic, cubic, quartic, etc.) satisfy this property and are able to capture any data shape, so they are a good option. Unfortunately, polynomials are not very good at capturing flat asymptotes and extrapolation (i.e., predicting what will happen outside the observed time window) is generally not possible. If you are not very interested in asymptotic portions of your data and don't plan on making out-of-time-window prediction, then a polynomial functional form is probably best for you. 
  2. The fixed effects: What are your group-level predictors? Usually these are the experimental manipulations like word frequency, stimulus relatedness, etc. They can be continuous or categorical and they can refer to either the stimuli (e.g., word frequency) or the participants (e.g., age, short-term memory span, lesion location, etc.).
  3. The random effects: the randomly sampled observations over which you plan to generalize. Typically these are either participants or items. By capturing individual variability, the random effects provide another way to quantify individual differences.

A simple example

To see how these components come together, consider the ChickWeight data set (part of the default R installation), which has data from an experiment on the effect of diet on early growth of chicks. We start out with just a "base" model of chick growth allowing for individual variability in weight (in technical terms, a random intercept for each Chick: (1 | Chick) )
:

> m.base <- lmer(weight ~ Time + (1 | Chick), data=ChickWeight, REML=F)

To this, we can add fixed effects of diet on the intercept (i.e., a constant difference in weights among chicks randomly assigned to different diets):
> m.0 <- lmer(weight ~ Time + Diet + (1 | Chick), data=ChickWeight, REML=F)

and on the slope (i.e., effects of diet on the rate of growth):
> m.1 <- lmer(weight ~ Time * Diet + (1 | Chick), data=ChickWeight, REML=F)

We can visually examine the effect of adding these terms by plotting the model fits against the observed data. The observed data are on the left, the m.0 (intercept model) fit is in the middle, and the m.1 (linear model) fit is on the right:

To statistically evaluate the effects of adding these terms, we can examine the change in the goodness of fit (log likelihood) through model comparisons:
> anova(m.base, m.0, m.1)
       Df    AIC    BIC  logLik   Chisq Chi Df Pr(>Chisq)    
m.base  4 5630.3 5647.8 -2811.2                              
m.0     7 5619.2 5649.7 -2802.6  17.143      3  0.0006603 ***
m.1    10 5508.0 5551.6 -2744.0 117.184      3  < 2.2e-16 ***
The critical statistic is a chi-square with degrees of freedom equal to the number of parameters added. Since there were 4 diets, each effect of Diet added 3 parameters to the model (diet 1 is considered the reference condition by default and parameters are estimated for each of the 3 other diets relative to this baseline). The print command will give us the actual parameter estimates:
> print(m.1, corr=F)
Fixed effects:
            Estimate Std. Error t value
(Intercept)  31.5081     5.9108   5.331
Time          6.7130     0.2573  26.089
Diet2        -2.8745    10.1910  -0.282
Diet3       -13.2577    10.1910  -1.301
Diet4        -0.3983    10.1998  -0.039
Time:Diet2    1.8961     0.4267   4.443
Time:Diet3    4.7099     0.4267  11.037
Time:Diet4    2.9495     0.4323   6.824

Of course, in addition to different baseline weights, the chicks might have other (unmeasured) individual properties that affect their growth rate. To capture this, we can add a linear effect on time of Chick to the random effects, and we can use model comparisons to examine whether this effect improved model fit:
m.t1 <- lmer(weight ~ Time * Diet + (1 + Time | Chick), data=ChickWeight, REML=F)
> anova(m.1, m.t1)
     Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)    
m.1  10 5508.0 5551.6 -2744.0                             
m.t1 12 4824.2 4876.5 -2400.1 687.78      2  < 2.2e-16 ***
> print(m.t1, corr=F)
            Estimate Std. Error t value
(Intercept)  33.6541     2.8023  12.009
Time          6.2799     0.7304   8.598
Diet2        -5.0205     4.8072  -1.044
Diet3       -15.4038     4.8072  -3.204
Diet4        -1.7476     4.8145  -0.363
Time:Diet2    2.3293     1.2509   1.862
Time:Diet3    5.1430     1.2509   4.112
Time:Diet4    3.2528     1.2516   2.599
Adding the random slopes did improve model fit. As a general rule, the model should include all random effects that are licensed by the data, that is, all the ones that improve model fit. Notice also that adding the random slope effect caused slight changes in the fixed effect parameter estimates because some of the variance was now captured by that random effect.

Applying GCA to VWP data

Orthogonal polynomials

Often, our data are not perfectly straight, so we want to capture that curvature with higher-order polynomial terms (time squared, time cubed, etc.). Because our time variable is usually only positive, natural polynomials are correlated, so the estimated parameters will be interdependent. Given a particular time range and a maximum polynomial order, we can transform the polynomial time vectors to make them independent, that is, "orthogonal". This is illustrated in the figure below for linear and quadratic time in the range 1-10. Since orthogonal polynomial time terms are independent, the parameter estimates can be interpreted independently.
With orthogonal polynomials, the intercept term reflects the average overall curve height, rather than the height at the left edge of the time window, so if you are interested in differences at the very beginning of the time window, you may be better off sticking with natural polynomials.

A complete example

For this example, we will consider the simple target fixation data shown on the right from a VWP experiment with a word frequency manipulation (High frequency words recognized faster than Low frequency words). Note that since Condition (High vs. Low frequency) was manipulated within-participants, we will want to include nested random effects of Condition with Subject.

The starting data frame is shown below. Subject is the the subject ID, Time is time (in ms) from word onset, timeBin identifies the 100ms time bin corresponding to this time point (convenient for aligning orthogonal time), Condition is the word frequency condition, and meanFix is the mean proportion of fixations to the target object.

> summary(TargetFix)
    Subject        Time         timeBin     Condition    meanFix       
 708    :16   Min.   : 300   Min.   :1.00   High:80   Min.   :0.02857  
 712    :16   1st Qu.: 475   1st Qu.:2.75   Low :80   1st Qu.:0.29003  
 715    :16   Median : 650   Median :4.50             Median :0.47141  
 720    :16   Mean   : 650   Mean   :4.50             Mean   :0.45637  
 722    :16   3rd Qu.: 825   3rd Qu.:6.25             3rd Qu.:0.63889  
 725    :16   Max.   :1000   Max.   :8.00             Max.   :0.83333  
 (Other):64   
 
The first step is to create a third-order polynomial in the range of timeBin.
> t <- poly((unique(TargetFix$timeBin)), 3)
The next step is to create variables ot1, ot2, ot3 corresponding to the orthogonal polynomial time terms and populate their values with the timeBin-appropriate orthogonal polynomial values:
> TargetFix[,paste("ot", 1:3, sep="")] <- t[TargetFix$timeBin, 1:3]

Since this is a simple case with just one within-subjects fixed effect that has only two levels, we can skip to the full model and examine its parameter estimates:
> m.full <- lmer(meanFix ~ (ot1+ot2+ot3)*Condition + (ot1+ot2+ot3 | Subject) + (ot1+ot2 | Subject:Condition), data=TargetFix, REML=F)
> print(m.full, corr=F)

Fixed effects:
                 Estimate Std. Error t value
(Intercept)       0.48160    0.01394   34.55
ot1               0.49961    0.04550   10.98
ot2              -0.06546    0.02557   -2.56
ot3              -0.07269    0.01948   -3.73
ConditionLow     -0.05045    0.01848   -2.73
ot1:ConditionLow -0.01252    0.04717   -0.27
ot2:ConditionLow  0.11010    0.03566    3.09
ot3:ConditionLow  0.00210    0.02190    0.10

Notice that the parameter estimates do not have p-values. There are good reasons for that (see this FAQ for more information), but this is cold comfort to most experimental psychologists, who need to report p-values. The quick and easy solution is to assume that, because we have relatively many observations, the t-distribution converges to the z-distribution, so we can use a normal approximation:
> coefs <- data.frame(summary(m.full)@coefs) 
> coefs$p <- format.pval(2*(1-pnorm(abs(coefs$t.value))), digits=2, eps=0.0001) #also make the p-values a bit more readable
> coefs
                     Estimate Std..Error     t.value       p
(Intercept)       0.481597222 0.01394008 34.54766633 < 1e-04
ot1               0.499606081 0.04550720 10.97861540 < 1e-04
ot2              -0.065462584 0.02556626 -2.56050645 0.01045
ot3              -0.072687479 0.01947952 -3.73148267 0.00019
ConditionLow     -0.050453750 0.01848495 -2.72944992 0.00634
ot1:ConditionLow -0.012523804 0.04717894 -0.26545329 0.79066
ot2:ConditionLow  0.110103220 0.03565713  3.08783144 0.00202
ot3:ConditionLow  0.002099561 0.02189866  0.09587624 0.92362
Another approach is to selectively remove individual effects of Condition and use model comparisons to evaluate each effect. This method would have produced very similar results:
                                 Chisq Df Pr(>Chisq)
ConditionLow         6.5532 1 0.01047 *
ot1:ConditionLow 0.0703 1 0.7909
ot2:ConditionLow 7.7542 1 0.005359 **
ot3:ConditionLow 0.0092 1 0.9236

The results confirm that fixation proportion was lower overall for the Low condition than the High condition (Condition effect on the intercept - ConditionLow - had a statistically significant negative estimate) and there was a difference in curvature between the conditions (statistically significant effect of condition on the quadratic term: ot2:ConditionLow)

Reporting results

There are two rules of thumb for reporting growth curve analysis results:
  1. Clearly describe each of the three key components of the model: the functional form (third-order orthogonal polynomial), the fixed effects (effect of Condition on all time terms), and the random effects (effect of Subject on each of the time terms and nested effects of Subject-by-Condition on each of the time terms except the cubic). Depending on the circumstances and complexity of the model, you may want to include additional information about the factors and why they were included or not. It's also a good idea to report which method was used for computing p-values.
  2. For key findings, report parameter estimates and standard errors along with significance tests. In some cases the model comparison is going to be enough, but for key findings, the readers should want to see the parameter estimates. The parameter estimate standard errors are critical for interpreting the estimates, so those should be reported as well. The t-values are not critical to report (they are just Estimate divided by the Std Error, so they can always be computed from the reported estimates and standard errors). If there are many estimated parameters, it may be a good idea to focus the main text discussion on the most important ones and report the full set in a table or appendix.
Here is how we might report the results from the target fixation example above:
Growth curve analysis (Mirman et al., 2008) was used to analyze the target gaze data from 300ms to 1000ms after word onset. The overall time course of target fixations was modeled with a third-order (cubic) orthogonal polynomial and fixed effects of Condition (Low vs. High frequency; within-participants) on all time terms. The model also included participant random effects on all time terms and participant-by-condition random effects on all time terms except the cubic (estimating random effects is “expensive” in terms of the number of observation required, so this cubic term was excluded because it tends to capture less-relevant effects in the tails). There was a significant effect of Condition on the intercept term, indicating lower overall target fixation proportions for the Low condition relative to the High condition (Estimate = -0.0505, SE = 0.0185, p < 0.01). There was also a significant effect on the quadratic term, indicating shallower curvature - slower word recognition - in the Low condition relative to the High condition (Estimate = 0.110, SE = 0.0357, p < 0.01). All other effects of Condition were not significant (see Table 1 for full results).

Another example

Here is another example based on data from an experiment examining the link between statistical learning and word learning (Mirman et al., 2008). In this experiment the critical manipulation was between-participants and the data shape is a little different (requiring only a second-order polynomial). 

Analyzing individual differences

Here we will consider cohort and rhyme competition data from 5 participants with Broca's aphasia, 3 participants with Wernicke's aphasia, and 12 age-matched controls (Mirman et al., 2011). The group means are shown below, with cohort competition in the top row and rhyme competition in the bottom row. Visually, it seems that the participants with Wernicke's exhibited more cohort competition and less rhyme competition, whereas participants with Broca's aphasia exhibited the opposite pattern.
The data frame looks like this:
> summary(CohortRhyme)
     subjID          Group          Time         timeBin             Object        Type        FixProp      
 101    :  80   Control :960   Min.   :  50   Min.   : 1.00   Competitor:800   Cohort:800   Min.   :0.0000  
 102    :  80   Broca   :400   1st Qu.: 525   1st Qu.: 5.75   Unrelated :800   Rhyme :800   1st Qu.:0.0000  
 103    :  80   Wernicke:240   Median :1000   Median :10.50   Target    :  0                Median :0.0417  
 104    :  80                  Mean   :1000   Mean   :10.50                                 Mean   :0.0665  
 105    :  80                  3rd Qu.:1475   3rd Qu.:15.25                                 3rd Qu.:0.1000  
 106    :  80                  Max.   :1950   Max.   :20.00                                 Max.   :0.5833

We'll use fourth-order polynomials to capture the rise and fall of the competition curves:
> t <- poly(unique(CohortRhyme$timeBin), 4)
> CohortRhyme[,paste("ot", 1:4, sep="")] <- t[CohortRhyme$timeBin, 1:4]

When the individual differences are "experiment external", such as aphasia subtype (or continuous variables like age), the best option is to add the individual difference variable as a fixed effect to the model. So we start with a base model of the overall competition effect (Object fixed effect) and add a fixed effect of Group. It would be a good idea to build up the Group effects gradually, but we'll skip to the full model to keep this example more brief:
> cohort.base <- lmer(FixProp ~ (ot1+ot2+ot3+ot4)*Object + (1+ot1+ot2+ot3+ot4 | subjID) + (1+ot1+ot2 | subjID:Object), data=subset(CohortRhyme, Type == "Cohort"), REML=F)
> cohort.group <- lmer(FixProp ~ (ot1+ot2+ot3+ot4)*Object*Group + (1+ot1+ot2+ot3+ot4 | subjID) + (1+ot1+ot2 | subjID:Object), data=subset(CohortRhyme, Type == "Cohort"), REML=F)
> anova(cohort.base, cohort.group)
             Df     AIC     BIC logLik  Chisq Chi Df Pr(>Chisq)    
cohort.base  32 -2277.6 -2127.7 1170.8                             
cohort.group 52 -2303.4 -2059.8 1203.7 65.794     20  8.717e-07 ***
Using the same approach as before to get normal-approximation p-values, we can see that this overall Group difference is due to the Wernicke's aphasia group differing from the Control (baseline reference) group. We're just going to look at the effects up to the quadratic term because the higher order terms are just capturing differences in the tails. Note that the parameter estimates are for the Unrelated object relative to the Related object, so a negative estimate for the Intercept term (e.g.,
ObjectUnrelated:GroupWernicke) corresponds to a larger competition effect (i.e., for the Wernicke's group compared to the Control group, the Unrelated object has an even lower intercept than the Competitor does).
                                      Estimate  Std. Error     t value          p
ObjectUnrelated:GroupBroca         0.007168833 0.017405094   0.4118813  0.6804264
ObjectUnrelated:GroupWernicke     -0.049674167 0.021106776  -2.3534701  0.0185991
ot1:ObjectUnrelated:GroupBroca    -0.047473215 0.060789929  -0.7809388  0.4348385
ot1:ObjectUnrelated:GroupWernicke  0.009277717 0.073718617   0.1258531  0.8998482
ot2:ObjectUnrelated:GroupBroca     0.044092440 0.058924682   0.7482847  0.4542884
ot2:ObjectUnrelated:GroupWernicke  0.279205510 0.071456673   3.9073399 9.3318e-05 

The same steps for the rhyme competition data reveal that only the Broca's group differed from controls:
> rhyme.base <- lmer(FixProp ~ (ot1+ot2+ot3+ot4)*Object + (1+ot1+ot2+ot3+ot4 | subjID) + (1+ot1+ot2 | subjID:Object), data=subset(CohortRhyme, Type == "Rhyme"), REML=F)
> rhyme.group <- lmer(FixProp ~ (ot1+ot2+ot3+ot4)*Object*Group + (1+ot1+ot2+ot3+ot4 | subjID) + (1+ot1+ot2 | subjID:Object), data=subset(CohortRhyme, Type == "Rhyme"), REML=F)
anova(rhyme.base, rhyme.group)

            Df     AIC     BIC logLik  Chisq Chi Df Pr(>Chisq)  
rhyme.base  32 -2269.6 -2119.7 1166.8                            
rhyme.group 52 -2265.2 -2021.6 1184.6 35.597 20 0.01715 *

> coefs.rhyme <- as.data.frame(summary(rhyme.group)@coefs)
> coefs.rhyme$p <- format.pval(2*(1-pnorm(abs(coefs.rhyme[,"t value"]))))

                                      Estimate  Std. Error     t value          p
ObjectUnrelated:GroupBroca -0.035564333 0.015762534 -2.25625741 0.0240545 ObjectUnrelated:GroupWernicke 0.017755000 0.019114880 0.92885753 0.3529629 ot1:ObjectUnrelated:GroupBroca 0.122036460 0.064624392 1.88839624 0.0589728 ot1:ObjectUnrelated:GroupWernicke -0.128382147 0.078368587 -1.63818376 0.1013834 ot2:ObjectUnrelated:GroupBroca -0.006877810 0.055991778 -0.12283607 0.9022369 ot2:ObjectUnrelated:GroupWernicke -0.011219074 0.067900004 -0.16522935 0.8687635

So far, we have established a double dissociation -- one group has larger cohort competition, the other group has larger rhyme competition -- but we can also ask whether there is an association between the effects. That is, do participants with larger cohort effects tend to show smaller rhyme effects? To test this kind of "experiment internal" individual differences, we can use the random effects to estimate each participants effect size and then test the correlation between the effect sizes.

The random effects are the systematic deviations from the "mean" pattern predicted by the fixed effects. So we'll use the random effects from the base model, which did not distinguish between participant groups so we can get individual effect sizes relative to the overall mean (otherwise we'd be looking at individual effect sizes relative to the sub-group mean). The key function for extracting random effects is ranef and in our case it will be a list with two elements, because we had a random effect by subjID and a random effect by subjID:Object. We're interested in the latter, because it is the one that will capture individual differences in competition effect size. In particular, we'll want to subtract the random effect estimate for the Unrelated object from the estimate for the Competitor for each subject to get that subject's individual competition effect size. Extracting the random effects is relatively easy (e.g., ranef(cohort.base)$'subjID:Object'), but there is a moderate amount of data manipulation involved in getting them into a convenient form. There are many ways to do the data manipulation, here is my way:
#get cohort effect sizes
> blup.cohort <- data.frame(colsplit(row.names(ranef(cohort.base)$'subjID:Object'), ":", c("Subject", "Object")), ranef(cohort.base)$'subjID:Object')
> ES.coh <- ddply(blup.cohort, .(Subject), summarize, Intercept = X.Intercept.[Object=="Competitor"] - X.Intercept.[Object=="Unrelated"], Quadratic = ot2[Object=="Competitor"] - ot2[Object=="Unrelated"])

#get rhyme effect sizes
> blup.rhyme <- data.frame(colsplit(row.names(ranef(rhyme.base)$'subjID:Object'), ":", c("Subject", "Object")), ranef(rhyme.base)$'subjID:Object')
> ES.rhy <- ddply(blup.rhyme, .(Subject), summarize, Rhyme.Intercept = X.Intercept.[Object=="Competitor"] - X.Intercept.[Object=="Unrelated"])

#combine
> ES <- merge(ES.coh, ES.rhy, by="Subject")
> group <- unique(subset(CohortRhyme, select=c(subjID, Group))) #get group assignments from original data frame
> ES <- merge(ES, group, by.x="Subject", by.y="subjID")

 Now we can examine a scatterplot of the individual cohort and rhyme competition effect sizes (effects on the intercept, shown on the right) and compute correlations across the complete sample and just for the participants with aphasia:
> cor.test(ES$Intercept, ES$Rhyme.Intercept)
cor.test(ES$Intercept[ES$Group != "Control"], ES$Rhyme.Intercept[ES$Group != "Control"])

The correlation is significant for participants with aphasia (r = -0.86, p = 0.006), but not for control participants (r = 0.34, p > 0.25). This suggests that there may be a single mechanism behind this pattern, in contrast to the standard separable components interpretation of a double dissociation. 

Another example

Here is another example based on data from an experiment that examined the time course of activation of function and thematic semantic knowledge in 17 participants with left hemisphere stroke (Kalenine, Mirman, & Buxbaum, 2012).

Visualizing growth curve models: Under construction


Č
Ċ
ď
Dan Mirman,
Sep 6, 2012, 12:12 PM
Ċ
ď
Dan Mirman,
Jun 18, 2012, 4:56 AM
Ċ
ď
Dan Mirman,
Aug 10, 2012, 10:24 AM
Ċ
ď
Dan Mirman,
Jul 26, 2012, 12:30 PM
Comments