 Research article
 Open Access
 Published:
Modelling subjectspecific childhood growth using linear mixedeffect models with cubic regression splines
Emerging Themes in Epidemiology volume 13, Article number: 1 (2016)
Abstract
Background
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, welldefined longitudinal models are essential to characterize both population and subjectspecific growth. Linear mixedeffect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subjectspecific growth, velocity and acceleration.
Methods
We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subjectspecific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines visàvis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life.
Results
Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixedeffect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population and individuallevel growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation.
Conclusions
Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for nonstatisticians using linear mixedeffect models.
Background
Childhood growth is a cornerstone of pediatric research and many centuries of work have been undertaken to understand and model how children grow [1–5]. In studies of childhood growth, anthropometric data are often collected at multiple time points to describe growth in a population, evaluate the role of exposures on growth, investigate the effects of an intervention, and assess individual growth in clinical practice [6–11]. Height is commonly monitored longitudinally as a marker of chronic malnutrition; however, estimation and prediction of subjectspecific height curves with age, as in the study we consider here, can present several methodological challenges to researchers.
Crosssectional studies are an attractive option for surveillance because of their feasibility and costeffectiveness in populations, but this approach for growth monitoring has several inherent limitations. For example, they can be confounded by secular trends, such as selective mortality that leads to perceived improved growth at older ages due to the better health of the survivor population [12]. Additionally, crosssectional growth data may display large skewness and kurtosis and may exhibit substantial heteroskesdasticity, and marginal analyses to describe population trajectories require transformations to normality, weighting or both to achieve an adequate fit [13]. While longitudinal data may also suffer from the same problems, linear mixed effects models naturally take skewness, kurtosis, and even heteroskedasticity into account, making transformations not necessary. The utility of transformation techniques remains controversial. Indeed, while transformations may lead to a better fit of Gaussian models, they require a priori knowledge of the data structure. The flexible BoxCox transformation family of distributions [14, 15] can be used, but it may fail when data are clustered. Moreover, interpretation of transformed data is problematic, and producing predictions at the subject and population level is not straightforward.
Longitudinal studies provide a more realistic view of child growth, in which a cohort of children are monitored over time and repeated measurements of height are collected. Longitudinal studies provide information about individual growth patterns, and allow the estimation and analysis of growth velocity and acceleration at either the individual or population level. However, longitudinal growth data has complex characteristics that need to be accounted for including: withinsubject clustering of growth observations, heterogeneity of individual baseline and dynamic growth characteristics, and autocorrelation within individuals.
Linear mixedeffect models combine the components of fixed effects, random effects, and repeated measurements in a single unified approach [16, 17]. Analysis of longitudinal data using mixed effects models does not require the same assumptions as a crosssectional study and may not require transformations. The use of linear mixedeffect models, while widely described in the statistical literature, has been only slowly adopted by applied researchers. This was due in part to the limited availability of userfriendly software tools; this is now rapidly changing with many commercial and open source software providing fitting capabilities for increasingly complex mixed effects models [18]. To address this problem, ensure reproducibility of methods, and provide a wider dissemination, we provide examples of statistical code in an opensource platform for fitting models described in this paper. Another reason for the slow adoption of linear mixedeffect models is their complexity relative to standard regression models and the modeling structures necessary to capture nonlinear trajectories. These elements require a higher level of statistical and computational expertise [19]. Utilizing longitudinal length/height data from a child cohort in Peru, we describe a natural and intuitive stepwise approach to the development of a linear mixedeffect model with cubic splines for the analysis of longitudinal childhood growth in height. We then derive individual height velocity and acceleration curves from the longitudinal model based on height. Our objective is to describe the use of these methods to analyze height and height velocity data in a way that can be easily used by an applied researcher. While we have a preference to use the raw measurements of growth, methods introduced here extend directly to Zscores relative to the WHO standard.
Methods
Study setting
The study was conducted in Las Pampas de San Juan Miraflores and Nuevo Paraíso, both periurban shanty towns with high population density located on the southern edge of Lima city in Peru. The shanty towns had approximately 40,000 residents of whom 25 percent are under 5 years of age. These communities are described in detail elsewhere [10, 11].
Study participants
A simple census was conducted to find pregnant women and children less than 3 months of age. Eligible newborns and pregnant women were randomly selected from the census and invited to participate in the study. Only one newborn was recruited per household. Written informed consent was required from parents or guardians before enrollment. Exclusion criteria for the study were: severe disease that required hospitalization, severe or chronic conditions, child of a multiple pregnancy, birth weight less than 1500 g and plans to move to another community within the study time. Our analysis excluded data of children whose followup ended before they reached one year of age, those who were followed up for less than 60 days during the first 6 months of life, or followed up for less than 120 days during the first year of life.
Study design
We conducted a birth cohort study between May 2007 and February 2011. Child growth data presented in this paper is a part of a larger longitudinal study conducted in Brazil and Peru. The objective of the cohort study was to assess if infection with Helicobacter pylori increases the risk of diarrhea and in turn adversely affects growth in children less than 2 years of age [20]. For the purposes of illustration, this study utilizes longitudinal height data from Peru only.
Outcomes
We measured anthropometrics weekly until the child was 3 months of age, every 2 weeks between three and 11 months of age, and once monthly thereafter for the remainder of followup. Both height and recumbent length (supine length) was measured using a wooden length platform and sliding headfootboard (Shorr stadiometers, Shorr Productions, Olney, Maryland). We followed calibration procedures as per manufacturer instructions. Height or length were measured to the nearest 0.1 cm. Anthropometric standardization was conducted at the beginning of the study, and once yearly thereafter. Outcomes of this study were height in centimeters and height velocity in centimeters per month. Child length at birth was obtained from the perinatal care booklet given to mothers from the health care providers. Children less than 2 years of age were measured in a horizontal position (recumbent length), whereas children aged 2 years and older were measured in a vertical position (height). Length (or height) was measured using a wooden length platform and sliding footboard (or headboard) to the nearest 0.1 cm.
Height velocity was calculated in two ways. First, we calculated empirical height velocity by subtracting previous height from current height and dividing by the time gap between both measurements:
Second, we computed an estimated height velocity using the first time derivative of the longitudinal height models using standard methods (Table 1).
Biostatistical methods
The primary aim of our analysis was to model height and height velocity. We included the following predictors in our models: age in months, an indicator variable for greater than 24 months to account for unit differences in length vs. height measurement methods, and sex. To model the nonlinear relationship between age and height over time, we used smooth, flexible functions known as cubic regression splines. While there are several forms of regression splines that can be used to model nonlinear relationships between a predictor (i.e., age) and an outcome (i.e., height), we chose to use cubic regression splines because they are simple to construct and understand [21]. We purposely varied the number and positions of the interval knots in several of our examples to demonstrate that our models are not affected by these changes. As mentioned, derivation of different types of splines to calculate height velocity is straightforward (Table 1). Since other investigators [6] have proposed the use of linear splines to model child growth because of their ease of use, in this paper, we compared adequacy of both estimation and prediction between cubic and linear regression splines with variations in the number and positions of the knots. We compared cubic and linear regression spline models using an insample (i.e., estimation) mean square error (MSE) and outofsample (i.e., prediction) MSE using standard methods. For outofsample prediction, we used 80 % of the data for training and 20 % of the data for validation. For each individual growth curve, we randomly sampled 80 % of the data values and used them to construct the model fit. The validation data consisting of 20 % of the data was then used to generate predicted height values. The predicted height and observed height were used to compute subjectspecific prediction MSEs.
As described in detail below, the following statistical methods were used in the model: a cubic regression spline with knots at 3, 6, 12, 18, 24 and 40 months, random intercepts and slopes to capture the heterogeneity in growth curves, and a firstorder continuous autoregressive error [CAR(1)] to capture residual serial correlation that arises from repeated measurements of the same child [22–24]. At each step of model building, we conducted exploratory analysis in which we visualized standardized residuals with age, a sample variogram of the residuals [25], and pairwise scatterplots of residuals at different time points to assess goodnessoffit. We limited the number of internal knots for our statistical models between three and six because this number is likely sufficient to model the inflexion of linear/height growth curves in children under 5 years of age. Moreover, we also tested slight variations in the numbers and positioning of knots to determine if the model fit was affected. We conducted our statistical analyses in STATA 11 (StataCorp LP, College Station, TX, USA) and R (http://www.rproject.org). Examples of statistical code in R are provided to ensure reproducibility and improve dissemination of methods (Additional file 1).
Research ethics
This study was approved by the internal review boards of A.B. PRISMA (Lima, Peru), Johns Hopkins University (Baltimore, USA), and the European Union Ethics Committee.
Results
Baseline characteristics
We included a total of 215 children in this analysis. The initial cohort started with 304 children: however, 11 did not have available anthropometric data and 78 were lost to follow up before 1 year of age. Average followup was 34 months (range 12–45 months). Average number of observations per children was 50, giving a total of 10,838 observations. Of 215 children, 109 were girls (51.0 %).
Initial observations
We conducted exploratory data analysis as the first step to gain insight into the structure and nature of the data. In exploratory analysis for longitudinal data, we obtained an understanding of the nonlinear relationship of height with age but also the heterogeneity in growth curves for the study population (Fig. 1). Using a spaghetti plot of empirical height velocity, we found that it declined with age and the change was steepest at the youngest ages. In addition, in longitudinal growth data with repeated height measures, it is important to evaluate the serial autocorrelation, or the relationship between these measures. The variogram is a useful tool, as it evaluates autocorrelation by comparing the half square difference between each pair of heights within an individual to the time gap associated with each pair of heights. A nonparametric smooth fit is added to the variogram to describe the pattern. The exponential nonparametric smooth fit indicates that data exhibit both strong serial autocorrelation and heterogeneity (Fig. 1).
Building a longitudinal growth model
Our exploratory data analysis highlighted several challenges that need to be addressed in modeling growth data. Specifically, we had to account for the shape of the curve, the clustering of values, the heterogeneity among individuals, and the individual effects of serial correlation. To emphasize the role of each of these factors, we used combinations of random effects and serial correlation components in a linear model (Table 2). The steps to build the final model are described below.
Ordinary least squares model
Ordinary least squares (OLS) estimates growth parameters by minimizing the sum of the squared vertical distances between the observed and predicted responses. It provides efficient and valid predictions with the following assumptions: height can be explained by a linear combination of predictors, values of height are determined independently of each other, height has a normal distribution at any particular age, and height has equal variances at any particular age.
Unfortunately, growth data violates all of these assumptions. First, growth does not follow a linear pattern with age. This can be addressed through the use cubic regression spline to model the curvature with age. However, there are other concerns that cannot be as easily adjusted by OLS. Several height measurements were taken from each child; however, these serial measurements of height for the same child are not independent of each other. This is confirmed by a variogram of the residuals from the OLS (Fig. 2, panel a). Next, the height data did not follow a normal distribution as per the Shapiro–Wilk test (p < 0.001), the Mardia skewness test (p < 0.001), and the Mardia Kurtosis test (p < 0.001). Finally, a display of residuals versus age revealed a heteroskedastic distribution implying that the variance of the error was not constant across ages (Fig. 2, panel d). Heteroskedasticity was confirmed with the Breusch–Pagan test (p < 0.001). OLS model is therefore insufficient to model longitudinal growth.
Linear mixedeffect model without repeated measurements
The OLS model indicated that additional modeling components are necessary to account for individuallevel clustering and residual autocorrelation. Linear mixedeffect models allow for nonindependence and clustering by describing both between and within individual differences. We added random intercepts and slopes to the OLS model described above. Specifically, we added the parameters of the variance–covariance matrix to the fitted model (Table 2). Our model assumes that the variance–covariance of the random effects is unstructured.
At first, we only included random intercepts (b_{0i}). The model assumes that the random intercepts are normally distributed with mean 0 and variance g_{11}. We found that this parameter was statistically significant with a variance (g_{11}) of 6.31 (95 % CI 5.21–7.62; p < 0.001). This result suggests that the growth of individual children diverge from the population prediction by a shift in the intercept; in other words, individual growth is shifted up or down but parallel to the population prediction.
Then, in a second step, we included both random intercepts and random slopes (b_{1i}). The variance of the intercept (g_{11}) was 3.60 (95 % CI 2.97–4.36; p < 0.001), the range is 5.27 cm to 3.57 cm and the mean is 0.0056 cm. In addition, the variance of the slope (g_{22}) was 0.0091 cm/month (95 % CI 0.0075–0.011; p < 0.001), the range is −0.27 cm/month to 0.24 cm/month and the mean is 0.00077 cm/month (Table 3). Following the same reasoning as for random intercepts, the random slopes represent the individual variability in growth velocity around the population prediction: some individual children grow faster or slower than the population. A statistically significant variance, close to 0, as in this case, means that the shift of the individual growth velocity from the population is statistically supported but it is small. Therefore, the data exhibit less heterogeneity in the random slope than in the random intercept, indicating that the main source of difference between trajectories is encapsulated in their birth height. The story could be different in other growth cohorts or on other outcomes different than growth such as blood pressure.
In our data, it was suitable to assume that the random effects follow an unstructured matrix because the random intercepts and random slopes have different variance estimates; this flexibility confers individuality to each child’s pattern of growth. The covariance (g _{ 12 }) describes how the random intercept varies with the random slope. The covariance for this data set was 0.019 (95 % CI −0.0059 to 0.044; p < 0.001; Table 3). A positive covariance, as what is shown from these data, suggested that children with greater individual intercept tend to have a greater rate of growth.
The unexplained or residual variance in this model is 0.60 (95 % CI 0.59–0.62). The introduction of random effects reduced the residual variance by an order of magnitude from 7.34 to 0.60, implying that individual effects explain a considerable portion of the outcome variation.
To evaluate the fit of our model, we visualized standardized residuals with age (Fig. 2, panel e). The linear mixedeffect model eliminated heteroskedasticity of residuals. The mixed model assumes errors are normal and conditionally independently distributed with mean zero and common variance. However, the estimated residuals did not appear randomly distributed; instead, they show symmetry in wider or narrower areas across the horizontal axis. Additionally, the variogram of the residuals still displays a degree of autocorrelation (see Fig. 2, panel b). The correlation in the residuals was confirmed by plotting the residual versus the residual of the previous observation within a child (Fig. 3). Taken together, a mixed model was able to address the nonindependence and clustering that OLS could not, and in doing so account for a greater proportion of unexplained variance and reduce the heteroskedasticity. However, it still had limitations in addressing the serial correlation from repeated measures in growth data.
Linear mixedeffect model with CAR (1) error for repeated measurements
From the exploratory data analysis, we noted that serial height measurements within children were autocorrelated. We used a CAR(1) error to capture serial correlation in our statistical model. This structure assumes that errors are correlated and the degree of correlation is greater in those closer with age than in those further apart. The third model adds a new parameter of correlation ρ = 0.66 (95 % CI 0.64–0.68; p < 0.001). To evaluate the fit of this model, we visualized residuals with age (Fig. 2, panel f). The autocorrelation was further reduced as evidenced by the variogram of the residuals and by the lack of relationship in scatter plots of residuals versus the previous observation’s residuals (Fig. 3). Both the random intercept and the random slope were statistically significant and followed the same growth patterns as before (Table 3). The covariance was 0.036 (95 % CI 0.013–0.059; p < 0.001). The unexplained variance was 0.81, it remained low and was similar to the unexplained variance of the linear mixedeffect model. Since there can be sex differences in child growth, we also evaluated the contribution of sex as a covariate. If all the variables remained constant, boys were on average 1.5 cm (95 % CI 0.99–2.01) taller than girls during the follow up period (Table 3).
Cubic versus linear regression splines
Cubic regression splines were superior to linear regression splines in both estimation and prediction at each modeling step, and even when varying the number and position of knots. In Table 4, we report the values of AIC and BIC for OLS, LME, and LME with CAR(1) models even when the number of knots and their positions are varied. All results indicate that mixed effects models with cubic splines by far outperform the other models considered and that cubic splines outperform linear splines for every level of model complexity. AIC and BIC results show that cubic splines outperform linear splines even when cubic splines use three knots and linear splines use five knots. This is probably because the growth curvature is better captured by a cubic function than by multiple piecewise linear splines. Equally importantly, however, is the important reduction in AIC and BIC noted when a CAR(1) error structure was incorporated into the regression model.
Cubic regression splines models were also better at both estimation and prediction than were linear regression splines. Using three knots (at 3, 10, and 29 months) we obtained a median subjectspecific estimation MSEs of 0.65 for linear regression splines and 0.51 for cubic regression splines (Fig. 4). A Kolmogorov–Smirnov test comparing the MSE distributions indicates that the two MSE distributions were different (D = 0.186, p = 0.001). Differences in the median subjectspecific estimation MSE were similar even when the number of knots were increased, or their positions were varied (data not shown). Specifically, the outofsample prediction MSE was 0.86 for linear regression splines and 0.80 for cubic regression splines. These findings suggest that objective of inference in child growth problems should be primarily the curve and only secondarily the coefficients.
Longitudinal models for height velocity and acceleration
With an appropriate model for height, the model for height velocity using a cubic regression spline is simply obtained from its first derivative (Table 1). Similarly, growth acceleration can be obtained by taking the derivative of the height velocity (formulas not shown). As a result of the derivation, the height velocity model loses the indicator variable of a difference between length and height. Also, sex is not included because our model assumes there is no interaction between sex and age. In this height velocity model, the random slope component is represented by \({\hat{\text{b}}}_{{1{\text{i}}}}\).
An interesting characteristic of the height velocity and acceleration curves from a cubic regression model is that they are continuous and make biological sense. In contrast, a linear regression spline would assume that growth velocity is piecewise constant within knots but discontinuous across knots. Moreover, growth acceleration curve is zero. Both these assumptions of linear spline approaches contradict scientific knowledge about growth trajectories and the data in our application. For example, in Fig. 5 we estimated growth velocity (top panels) and acceleration (bottom panels) for three subjects in the study using linear (left panels) and cubic (right panels) splines with three knots (at 3, 10, and 29 months.) A different number of knots and knot locations would result in slightly different plots, but with the same qualitative interpretation. The linear spline plot assumes that growth velocity is piecewise constant between the knots, an assumption that once highlighted is very difficult to accept from a scientific perspective. In contrast, the cubic spline estimate of the velocity curve is much more aligned with the current knowledge of human growth with the velocity curve being continuous and smooth. Even more striking, the acceleration curve estimated using linear splines is always zero. In contrast, cubic regression splines estimate a negative acceleration, which is especially large in absolute terms in the first part of the curve. This corresponds to the obvious patterns we observe in growth curves: they have a slightly concave shape, with concavity much more pronounced immediately after birth (indicating deceleration of growth). Interestingly, the cubic spline estimator gets much closer to zero after month 10, but continues to be negative, which may indicate continuous concavity of the function, though much subtler.
Interpretation of the final model
By displaying the individual growth of three children at three different percentiles and their predicted values, we can see how our model effectively reflects individual and population growth patterns (Fig. 6). Differences between children appear to be largely from shifts in intercepts, with minimal differences in growth rates. This is further supported in the height velocity model in which the three individuals have almost the same rate of growth, only differing by a minimal vertical shift (Fig. 6). Therefore, the last model was able to effectively predict both population growth and differences between and within individuals in a population of Peruvian children from a periurban shanty town.
Discussion
The final statistical model for the prediction of height was obtained by applying a linear mixedeffect model with random intercepts and slopes, and a firstorder continuous autocorrelation structure for the error term. To account for the curvature of growth with age, we used cubic regression splines. To describe this modeling process, we build longitudinal growth data using a stepwise process. At each step, we examined standardized residuals with age, the sample variogram of the residuals and pairwise scatterplots of residuals at different time points to assess goodnessoffit. We also compared estimation and prediction of cubic regression spline models visàvis linear regression splines, and found that the former outperform the latter. This suggests that inference of child growth problems should focus on differences in growth curves based on the regression parameters rather than on direct interpretation of the regression parameters themselves.
While a sizeable literature on modeling child growth data already exists [1–11], it should be recognized that estimation and, more importantly, prediction of subjectspecific growth trajectories continues to be a problem under active methodological development and much attention needs to be given to the various choices and refinements to address the wide variety of applications. Our modeling framework adds to the literature in at least three directions. First, we are interested in fitting and predicting both at the population and subjectspecific level, whereas previous related papers focus exclusively on populationlevel parameters [6–11]. In large studies with more than 100 children, estimating the population level parameters is relatively easy and straightforward and there is not a lot of need for modeling the subjectspecific deviations. However, subjectspecific estimation and predictions is more complicated and requires a higher level of technical detail. Second, cubic regression splines provide a better fit than linear splines to nonlinear growth curves when the number of knots is small, as proposed by us and others. This happens because cubic regression splines capture better the nonlinear trajectories between knots and are especially wellsuited for modeling child growth immediately after birth when both acceleration and velocity are the greatest. Second, cubic regression splines assume that the velocity of growth in children is continuous and smooth whereas linear splines assume that velocity is constant in a small number of intervals, an assumption that is biologically difficult to support. Moreover, growth acceleration is a piecewise linear and continuous function when data are modeled as cubic splines, whereas a linear spline approach assumes that growth acceleration is exactly zero. Our results indicate that, for this growth data set, cubic splines outperforms linear splines when the same number and location of knots is used, and these findings are consistent with previously published work [26, 27]. In general, we recommend conducting applicationspecific simulations to assess which model performs better. Third, modeling residuals as a continuous autoregressive process is a fundamentally important feature of unbalanced data that needs to be taken carefully into account.
Linear mixedeffect models are an advantageous and appropriate statistical method for longitudinal growth in children under 5 years of age. Linear mixed models not only provide information about the population prediction of growth but also give insight on individual patterns of growth through the random component. Our analysis also points to the importance of adequately accounting for autocorrelation of repeated measurements. Inclusion of cubic regression splines when compared to linear splines to model the curvature of growth is supported by our data because more measurements were taken when growth was faster. Also, linear mixed models allow for the analysis of unbalanced data with ease, e.g. subjects with a different number of observations and observations measured at different points in time. From a methodological and logistical perspective, this is an advantage because sometimes it is hard to ensure that subjects do not miss any visits and that they are measured at the exact scheduled time. Linear mixedeffect models can account for some bias because subjects with complete follow up time might differ from subjects lost during the follow up period [28]. In addition, linear mixedeffect models support several variance–covariance matrices and structure of the residuals, allowing flexibility for various types of data with different levels of clustering, the inclusion of timevarying and timeinvariant covariates and an efficient method to account for repeated measurements [17, 18].
A limitation of linear mixedeffect model approach presented in this paper is that not all the statistical software packages support the generation of variograms for longitudinal data, and may therefore require programming. Our analysis has some additional shortcomings. First, our inferences are based on the analysis of a single dataset. Second, the population under study consisted of a rather homogenous group of children with a high number of height measurements in early childhood. We acknowledge that there may be differences in statistical modeling approaches in study samples that exhibit more heterogeneity and have fewer measurements during early childhood. Third, our analytical approach requires the assumption that growth data are missing at random, in which case mixed effects models are very robust. If outcome data were not missing at random (e.g. the child was sick for a long period of time), then this can lead to larger prediction errors due to subjectspecific biases. Finally, we did not include an interaction between sex and age to simplify our illustration of the statistical principles of our analysis; however, this model may not be realistic for human growth or perhaps for other longitudinal outcomes.
In conclusion, we present a stepwise approach to developing longitudinal growth models using linear mixedeffect models that account for random effects and serial autocorrelation with cubic regression splines to capture the nonlinear relationship between age and growth in height. As compared to other approaches, this modeling approach is simpler, direct and does not require multiple steps of transformation, analyzing age intervals separately or estimating LMS parameters. The growth velocity model obtained from the derivative eliminates measurement error directly from the growth without the need of assuming negative increments as no growth. Moreover, models based on cubic regression splines outperform those based on linear regression splines, suggesting that inferences should be based on differences in growth curves based on parameters and not on the interpretation of the parameters themselves. Therefore, researchers who seek to model longitudinal child growth in their investigations would benefit from using these steps for mixed modeling in future analyses.
Abbreviations
 CAR(1):

continuous autoregression of order 1
 OLS:

ordinary least squares
 TPS:

truncated polynomial spline
References
 1.
Quetelet A. Sur l’homme et le développement de ses facultés, ou Essai de physique sociale. Paris: Bachelier, imprimeurlibraire, quai des Augustins, no. 55, 1835.
 2.
Roberts C. The Physical development and proportions of the human body. St. George’s Hosp Rep. London, vol 8, 1874–76. p 1–48.
 3.
Boas F. Observations on the growth of children. Science. 1930;72:44–8.
 4.
Falkner F, Tanner JM. Human growth: a comprehensive treatise. New York: Plenum Press; 1986.
 5.
Tanner JM. A history of the study of human growth. Cambridge: Cambridge University Press; 1981.
 6.
Howe LD, Tilling K, Matijasevich A, Petherick ES, Santos AC, Fairley L, Wright J4, Santos IS, Barros AJ, Martin RM, Kramer MS, Bogdanovich N, Matush L, Barros H, Lawlor DA. Linear spline multilevel models for summarizing childhood growth trajectories: a guide to their application using examples from five birth cohorts. Stat Methods Med Res 2013 [Epub ahead of print].
 7.
Fairley L, Petherick ES, Howe LD, Tilling K, Cameron N, Lawlor DA, West J, Wright J. Describing differences in weight and length growth trajectories between white and Pakistani infants in the UK: analysis of the Born in Bradford birth cohort study using multilevel linear spline models. Arch Dis Child. 2013;98:274–9.
 8.
Tilling K, Davies N, Windmeijer F, Kramer MS, Bogdanovich N, Matush L, Patel R, Smith GD, BenShlomo Y, Martin RM. Promotion of Breastfeeding Intervention Trial (PROBIT) study group. Is infant weight associated with childhood blood pressure? Analysis of the Promotion of Breastfeeding Intervention Trial (PROBIT) cohort. Int J Epidemiol. 2011;40:1227–37.
 9.
Roth DE, Perumal N, A Al Mahmud, Baqui AH. Maternal vitamin D3 supplementation during the third trimester of pregnancy: effects on infant growth in a longitudinal followup study in Bangladesh. J Pediatr. 2013;163:1605–11.
 10.
Checkley W, Epstein LD, Gilman RH, Black RE, Cabrera L, Sterling CR. Effects of Cryptosporidium parvum infection in Peruvian children: growth faltering and subsequent catchup growth. Am J Epidemiol. 1998;148:497–506.
 11.
Checkley W, Epstein LD, Gilman RH, Cabrera L, Black RE. Effects of acute diarrhea on linear growth in Peruvian children. Am J Epidemiol. 2003;157:166–75.
 12.
Louis TA, Robins J, Dockery DW, Spiro A 3rd, Ware JH. Explaining discrepancies between longitudinal and crosssectional models. J Chronic Dis. 1986;39(10):831–9.
 13.
Borghi E, de Onis M, Garza C, Van den Broeck J, Frongillo EA, GrummerStrawn L, et al. Construction of the World Health Organization child growth standards: selection of methods for attained growth curves. Stat Med. 2006;25:247–65.
 14.
Rigby RA, Stasinopoulos DM. Generalized additive models for location, scale and shape. Applied Stat. 2005;54:507–54.
 15.
Rigby RA, Stasinopoulos DM. Smooth centile curves for skew and kurtotic data modelled using the BoxCox power exponential distribution. Stat Med. 2004;23:3053–76.
 16.
Goldstein H. Efficient statistical modelling of longitudinal data. Ann Hum Biol. 1986;13:129–41.
 17.
Laird NM, Ware JH. Randomeffects models for longitudinal data. Biometrics. 1982;38:963–74.
 18.
Cnaan A, Laird NM, Slasor P. Using the general linear mixed model to analyse unbalanced repeated measures and longitudinal data. Stat Med. 1997;16:2349–80.
 19.
Naumova EN, Must A, Laird NM. Tutorial in biostatistics: evaluating the impact of ‘critical periods’ in longitudinal studies of growth using piecewise mixed effects models. Int J Epidemiol. 2001;30:1332–41.
 20.
Jaganath D, Saito M, Gilman RH, Queiroz DM, Rocha GA, Cama V, Cabrera L, Kelleher D, Windle HJ, Crabtree JE, Checkley W. First detected Helicobacter pylori infection in infancy modifies the association between diarrheal disease and childhood growth in Peru. Helicobacter. 2014;19:272–9.
 21.
Crainiceanu CM, Ruppert D, Wand MP. Bayesian analysis for penalized spline regression using Win BUGS. J Stat Softw. 2005;14:1–24.
 22.
Funatogawa I, Funatogawa T, Ohashi Y. An autoregressive linear mixed effects model for the analysis of longitudinal data which show profiles approaching asymptotes. Stat Med. 2007;26:2113–30.
 23.
Jones RH, BoadiBoateng F. Unequally spaced longitudinal data with AR(1) serial correlation. Biometrics. 1991;47:161–75.
 24.
Ugrinowitsch C, Fellingham GW, Ricard MD. Limitations of ordinary least squares models in analyzing repeated measures data. Med Sci Sports Exerc. 2004;36:2144–8.
 25.
Diggle PJ. An approach to the analysis of repeated measurements. Biometrics. 1988;44:959–71.
 26.
Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Cambridge: Cambridge University Press; 2003.
 27.
Wood S. Generalized additive models: an introduction with R. New York: Chapman & Hall; 2006.
 28.
Laird NM. Missing data in longitudinal studies. Stat Med. 1988;7:305–15.
 29.
Cressie N. Statistics for spatial data. Hoboken: Wiley; 1993.
Authors’ contributions
LM: Developed the analytical plan, drafted the initial manuscript, and approved the final manuscript as submitted. AI: Developed and applied mixed effects model software, conducted simulations, provided results, and contributed to manuscript development and responses to reviewers. MS: Coordinated and supervised data collection of the original study in Peru, critically reviewed the manuscript and approved the final manuscript as submitted. CC: Critically reviewed and provided input to develop the linear mixedeffect models, critically reviewed the manuscript and approved the final manuscript as submitted. DJ: Critically reviewed the manuscript, contributed to writing and approved the final manuscript as submitted. RHG, JC, DK, VC: Involved in study design and conduct, critically reviewed the manuscript and approved the final manuscript as submitted. LC: Coordinated and supervised data collection of the original study at the site at Peru. WC: Conceptualized and designed the original study and developed the analytical plan, critically reviewed the manuscript, and approved the final manuscript as submitted. All authors read and approved the final manuscript.
Acknowledgements
This study was funded by the European Union CONTENT Framework VI project INCOCT2006032136. We are grateful to the community of Pampas de San Juan de Miraflores for their longterm collaboration.
Competing interests
The authors declare that they have no competing interests.
Funding sources
Data collection was supported by the Sixth Framework Programme of the European Union, Project CONTENT (INCODEV3032136). William Checkley and Ciprian Crainiceanu were further supported by a Bill & Melinda Gates Foundation Grant (OPP1114097).
Financial disclosure statement
The authors have no financial relationships relevant to this article to disclose.
Author information
Affiliations
Corresponding author
Additional information
Laura M. Grajeda and Andrada Ivanescu indicates joint first authorship
Additional file
Additional file 1.
Examples of statistical code in R.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Grajeda, L.M., Ivanescu, A., Saito, M. et al. Modelling subjectspecific childhood growth using linear mixedeffect models with cubic regression splines. Emerg Themes Epidemiol 13, 1 (2016). https://doi.org/10.1186/s1298201500383
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s1298201500383
Keywords
 Longitudinal studies
 Body Height
 Child development
 Growth
 Linear Models