Longitudinal Data

Prof. Sam Berchuck

Mar 06, 2025

Review of last lecture

  • During our last lecture, we introduced correlated (or dependent) data sources.

  • We discussed the idea of accounting for dependencies within a group using group-specific parameters.

  • We introduced the random intercept model and studied the induced correlation (forced to be positive) in the marginal model.

  • Today we will look at longitudinal data and introduce a simple model that accounts for group-level changes.

Longitudinal Data

Repeated measurements taken over time from the same subjects. Examples include:

  • Monitor Disease Progression: Track how diseases evolve, such as diabetes or glaucoma.

  • Evaluate Treatments: Understand how interventions work over time.

  • Personalized Health Insights: Capture individual health trajectories for personalized care.

  • Study Long-Term Effects: Evaluate the long-term outcomes of medical treatments or behaviors.

Example: Glaucoma Disease Progression

Imagine we are tracking mean deviation (MD, dB), a key measure of visual field loss in glaucoma patients, over time.

  • Multiple measurements of MD for each patient across several years.

  • We’re interested in glaucoma progression, which is defined as the rate of change in MD over time (dB/year).

  • Define \(Y_{it}\) as the MD value for eye \(i\) (\(i = 1,\ldots,n\)) at visit \(t\) (\(t = 1,\ldots,n_i\)), and let \(X_{it}\) denote the time of each observation, with \(X_{i1} = 0\) at baseline.

Rotterdam data

Treating Eyes Separately

We can model each eye separately using OLS (this is a form of longitudinal analysis!). For \(t = 1,\ldots,n_i\), the model is:

\[Y_{it} = \beta_{0i} + X_{it} \beta_{1i} + \epsilon_{it}, \quad \epsilon_{it} \stackrel{iid}{\sim} N(0, \sigma_i^2).\]

Where:

  • \(\beta_{0i}\) is the intercept for eye \(i\).

  • \(\beta_{1i}\) is the slope for eye \(i\) (i.e., disease progression).

  • \(\sigma_i^2\) is the residual error variance for eye \(i\).
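Concretely, the per-eye fits can be computed with lm(); a minimal sketch, assuming the glaucoma_longitudinal data frame (columns eye_id, mean_deviation, time) introduced later in the lecture:

# fit a separate OLS regression for each eye
fits <- lapply(split(glaucoma_longitudinal, glaucoma_longitudinal$eye_id),
               function(d) lm(mean_deviation ~ time, data = d))
# per-eye estimates of beta_0i (intercept) and beta_1i (slope, dB/year)
coefs <- t(sapply(fits, coef))
head(coefs)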

OLS regression

Treating Eyes Separately

  • Fitting OLS separately allows each eye to have its own intercept and slope, which is consistent with the data-generating process.

  • However, this can lead to eye-specific intercepts and slopes that are not realistic (consider OLS regression with very few data points).

  • Estimating eye-specific intercepts and slopes within the context of the whole study sample should shrink extreme values toward the population average.

Subject-specific intercepts and slopes

For \(i = 1,\ldots,n\) and \(t=1,\ldots,n_i\), we can write the model:

\[\begin{aligned} Y_{it} &= \beta_{0i} + X_{it} \beta_{1i} + \epsilon_{it}, \quad \epsilon_{it} \stackrel{iid}{\sim} N(0, \sigma^2),\\ \beta_{0i} &= \beta_0 + \theta_{0i},\\ \beta_{1i} &= \beta_1 + \theta_{1i}. \end{aligned}\]

Population Parameters:

  • \(\beta_0\) is the population intercept (i.e., average MD value in the population at time zero).

  • \(\beta_1\) is the population slope (i.e., average disease progression).

  • \(\sigma^2\) is the population residual error variance.

Subject-specific intercepts and slopes

For \(i = 1,\ldots,n\) and \(t=1,\ldots,n_i\), we can write the model:

\[\begin{aligned} Y_{it} &= \beta_{0i} + X_{it} \beta_{1i} + \epsilon_{it}, \quad \epsilon_{it} \stackrel{iid}{\sim} N(0, \sigma^2),\\ \beta_{0i} &= \beta_0 + \theta_{0i},\\ \beta_{1i} &= \beta_1 + \theta_{1i}. \end{aligned}\]

Subject-Specific Parameters:

  • \(\theta_{0i}\) is the subject-specific deviation from the intercept for eye \(i\).

  • \(\theta_{1i}\) is the subject-specific deviation from the slope for eye \(i\).

Subject-specific intercepts and slopes

For \(i = 1,\ldots,n\) and \(t=1,\ldots,n_i\), we can write the model:

\[\begin{aligned} Y_{it} &= \beta_{0i} + X_{it} \beta_{1i} + \epsilon_{it}, \quad \epsilon_{it} \stackrel{iid}{\sim} N(0, \sigma^2),\\ \beta_{0i} &= \beta_0 + \theta_{0i},\\ \beta_{1i} &= \beta_1 + \theta_{1i}. \end{aligned}\]

Key Advantage:

  • This model defines subject-specific estimates of \(\beta_{0i}\) and \(\beta_{1i}\) relative to the population average, preventing overfitting and making the estimates more stable.

  • Shrinks subject-specific parameters toward the population average, as illustrated in the simulation sketch below.
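To make this concrete, here is a minimal simulation sketch of the data-generating process; all parameter values below are illustrative assumptions:

set.seed(1)
n <- 50; n_i <- 8                          # eyes and visits per eye (assumed)
beta0 <- -8; beta1 <- -0.1; sigma <- 1.3   # population parameters (assumed)
tau0 <- 2; tau1 <- 0.4                     # between-eye SDs (assumed)
theta0 <- rnorm(n, 0, tau0)                # subject-specific intercept deviations
theta1 <- rnorm(n, 0, tau1)                # subject-specific slope deviations
X <- matrix(rep(0:(n_i - 1), each = n), nrow = n)  # yearly visit times
Y <- (beta0 + theta0) + (beta1 + theta1) * X +     # Y[i, t] follows the model above
  matrix(rnorm(n * n_i, 0, sigma), n, n_i)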

Linear Mixed Model

The subject-specific intercept and slope model is a special case of the linear mixed model (LMM). For \(i = 1,\ldots,n\), the LMM is defined as:

\[\mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \boldsymbol{\theta}_i + \boldsymbol{\epsilon}_i, \quad \boldsymbol{\epsilon}_i \stackrel{ind}{\sim} N_{n_i}(\mathbf{0}_{n_i}, \sigma^2 \mathbf{I}_{n_i}).\]

  • \(\mathbf{Y}_i = (Y_{i1},\ldots,Y_{in_i})\) are subject-level observations.

  • \(Y_{it}\) is the \(t\)th observation in subject \(i\).

  • \(\boldsymbol{\epsilon}_i = (\epsilon_{i1},\ldots,\epsilon_{in_i})\), such that \(\epsilon_{it} \stackrel{iid}{\sim} N(0,\sigma^2)\).

Linear Mixed Model

For \(i = 1,\ldots,n\), the linear mixed model (LMM) is given by:

\[\mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \boldsymbol{\theta}_i + \boldsymbol{\epsilon}_i, \quad \boldsymbol{\epsilon}_i \stackrel{ind}{\sim} N_{n_i}(\mathbf{0}_{n_i}, \sigma^2 \mathbf{I}_{n_i}).\]

  • \(\mathbf{X}_i\) is an \((n_i \times p)\)-dimensional matrix with row \(\mathbf{x}_{it}\) (intercept is incorporated).

  • \(\mathbf{x}_{it}\) contains variables that are assumed to relate to the outcome only at the population level.

  • \(p\) is the number of population-level variables.

Linear Mixed Model

For \(i = 1,\ldots,n\), the linear mixed model (LMM) is given by:

\[\mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \boldsymbol{\theta}_i + \boldsymbol{\epsilon}_i, \quad \boldsymbol{\epsilon}_i \stackrel{ind}{\sim} N_{n_i}(\mathbf{0}_{n_i}, \sigma^2 \mathbf{I}_{n_i}).\]

  • \(\mathbf{Z}_i\) is an \((n_i \times q)\)-dimensional matrix with row \(\mathbf{z}_{it}\) (intercept is incorporated).

  • \(\mathbf{z}_{it}\) contains variables whose effects are assumed to vary at the subject level.

  • \(q\) is the number of subject-level variables.

Linear Mixed Model

For \(i = 1,\ldots,n\), the linear mixed model (LMM) is given by:

\[\mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \boldsymbol{\theta}_i + \boldsymbol{\epsilon}_i, \quad \boldsymbol{\epsilon}_i \stackrel{ind}{\sim} N_{n_i}(\mathbf{0}_{n_i}, \sigma^2 \mathbf{I}_{n_i}).\]

  • \(\boldsymbol{\beta}\) is a \(p\)-dimensional vector of population-level parameters (or fixed effects).

  • \(\boldsymbol{\theta}_i\) is a \(q\)-dimensional vector of group-level parameters (or random effects).

  • \(\sigma^2\) is a population-level parameter that measures residual error.

Recover the Random Intercept Model

For \(i = 1,\ldots,n\), the linear mixed model (LMM) is given by:

\[\mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \boldsymbol{\theta}_i + \boldsymbol{\epsilon}_i, \quad \boldsymbol{\epsilon}_i \stackrel{ind}{\sim} N_{n_i}(\mathbf{0}_{n_i}, \sigma^2 \mathbf{I}_{n_i}).\]

Suppose that \(\mathbf{z}_{it} = 1\) for all \(i,t\). Then we get

\[\begin{aligned} Y_{it} &= \mathbf{x}_{it} \boldsymbol{\beta} + \mathbf{z}_{it}\boldsymbol{\theta}_{i} + \epsilon_{it}, \quad \epsilon_{it} \stackrel{iid}{\sim} N(0,\sigma^2)\\ &= \mathbf{x}_{it} \boldsymbol{\beta} + \theta_{i} + \epsilon_{it}. \end{aligned}\]

  • The LMM is a generalization of the random intercept model.

Random Slope and Intercept Model

For \(i = 1,\ldots,n\), the linear mixed model (LMM) is given by:

\[\mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \boldsymbol{\theta}_i + \boldsymbol{\epsilon}_i, \quad \boldsymbol{\epsilon}_i \stackrel{ind}{\sim} N_{n_i}(\mathbf{0}_{n_i}, \sigma^2 \mathbf{I}_{n_i}).\]

Suppose that \(\mathbf{x}_{it} = \mathbf{z}_{it} = (1, X_{it})\), such that \(p = q = 2\). Then,

\[\begin{aligned} Y_{it} &= \mathbf{x}_{it} \boldsymbol{\beta} + \mathbf{z}_{it}\boldsymbol{\theta}_{i} + \epsilon_{it}, \quad \epsilon_{it} \stackrel{iid}{\sim} N(0,\sigma^2)\\ &= \beta_0 + \beta_1 X_{it} + \theta_{0i} + \theta_{1i} X_{it} + \epsilon_{it}\\ &= (\beta_0 + \theta_{0i}) + (\beta_1 + \theta_{1i}) X_{it} + \epsilon_{it}. \end{aligned}\]

where \(\boldsymbol{\beta} = (\beta_0,\beta_1)\) and \(\boldsymbol{\theta}_i = (\theta_{0i},\theta_{1i})\).

Prior Specification

One choice could be to specify independent priors for the subject-specific intercepts and slopes:

\[\begin{aligned} \theta_{0i} &\stackrel{iid}{\sim} N(0, \tau_0^2)\\ \theta_{1i} &\stackrel{iid}{\sim} N(0, \tau_1^2). \end{aligned}\]

  • This is the same assumption we made last lecture, where we assume a normal distribution centered at zero with some variance that reflects variability across subjects.

  • This assumption is often an oversimplification. For example, in glaucoma progression we often expect that an eye with a higher baseline MD will have a more negative slope (i.e., a negative correlation between intercepts and slopes).

Prior Specification

We can instead model the subject-specific parameters as correlated using a bivariate normal distribution. Define \(\boldsymbol{\theta}_i = (\theta_{0i},\theta_{1i})^\top\) and then \(\boldsymbol{\theta}_i \stackrel{iid}{\sim} N_2(\mathbf{0}_2,\boldsymbol{\Sigma})\), where

\[\boldsymbol{\Sigma} = \begin{bmatrix} \tau_{0}^2 & \tau_{01}\\ \tau_{01} & \tau_1^2\\ \end{bmatrix}.\]

  • \(\tau_{01} = \rho \tau_0 \tau_1\).

  • \(\rho\) is the correlation between the subject-specific intercepts and slopes.

Let’s talk about efficient ways to generate multivariate random variables!

Generating Multivariate Normal RNGs

Suppose we would like to generate samples of a random variable \(\mathbf{x}_i \stackrel{iid}{\sim} N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})\).

To sample efficiently, we can decompose the covariance structure:

\[\begin{aligned} \boldsymbol{\Sigma} &= \begin{bmatrix} \tau_{0}^2 & \rho \tau_0 \tau_1\\ \rho \tau_0 \tau_1 & \tau_1^2\\ \end{bmatrix}\\ &= \begin{bmatrix} \tau_{0} & 0\\ 0 & \tau_1\\ \end{bmatrix} \begin{bmatrix} 1 & \rho\\ \rho & 1\\ \end{bmatrix} \begin{bmatrix} \tau_{0} & 0\\ 0 & \tau_1\\ \end{bmatrix}\\ &= \mathbf{D} \boldsymbol{\Phi} \mathbf{D}. \end{aligned}\]

  • \(\mathbf{D}\) is a diagonal matrix with the standard deviations on the diagonal.

  • \(\boldsymbol{\Phi}\) is the correlation matrix.

Generating Multivariate Normal RNGs

We can further decompose the covariance by computing the Cholesky decomposition of the correlation matrix:

\[\begin{aligned} \boldsymbol{\Sigma} &= \mathbf{D} \boldsymbol{\Phi} \mathbf{D}\\ &= \mathbf{D} \mathbf{L} \mathbf{L}^\top \mathbf{D}, \end{aligned}\] where \(\mathbf{L}\) is the lower triangular Cholesky decomposition for \(\boldsymbol{\Phi}\), such that \(\boldsymbol{\Phi} = \mathbf{L} \mathbf{L}^\top\).

Generating Multivariate Normal RNGs

We can generate samples \(\mathbf{x}_i \stackrel{iid}{\sim} N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})\) using the following approach:

\[\mathbf{x}_i = \boldsymbol{\mu} + \mathbf{D} \mathbf{L} \mathbf{z}_i,\]

where \(\mathbf{z}_i = (z_{0i},z_{1i})^\top\) with \(z_{0i}, z_{1i} \stackrel{iid}{\sim} N(0,1)\), so that \(\mathbb{E}[\mathbf{z}_i] = \mathbf{0}_2\) and \(\mathbb{C}(\mathbf{z}_i) = \mathbf{I}_2\).

\[\begin{aligned} \mathbb{E}[\boldsymbol{\mu} + \mathbf{D}\mathbf{L}\mathbf{z}_i] &= \boldsymbol{\mu} + \mathbf{D}\mathbf{L}\mathbb{E}[\mathbf{z}_i] = \boldsymbol{\mu}\\ \mathbb{C}(\boldsymbol{\mu} + \mathbf{D}\mathbf{L}\mathbf{z}_i) &= \mathbf{D}\mathbf{L}\mathbb{C}(\mathbf{z}_i)\left(\mathbf{D}\mathbf{L}\right)^\top \\ &= \mathbf{D}\mathbf{L}\mathbf{L}^\top\mathbf{D}\\ &=\boldsymbol{\Sigma}. \end{aligned}\]

Generating Multivariate Normal RNGs

# target covariance and mean
Sigma <- matrix(c(3, 1, 1, 3), nrow = 2, ncol = 2, byrow = TRUE)
mu <- matrix(c(2, 5), ncol = 1)
# D: diagonal matrix of standard deviations
D <- diag(sqrt(diag(Sigma)))
# Phi: correlation matrix; L: its lower-triangular Cholesky factor
Phi <- cov2cor(Sigma)
L <- t(chol(Phi))
# draw standard normal z and transform: x = mu + D L z
n_samples <- 1000
z <- matrix(rnorm(n_samples * 2), nrow = 2, ncol = n_samples)
mu_mat <- matrix(rep(mu, n_samples), nrow = 2, ncol = n_samples)
X <- mu_mat + D %*% L %*% z
# check sample moments against mu and Sigma
apply(X, 1, mean)
[1] 1.946052 4.950226
cov(t(X))
          [,1]      [,2]
[1,] 3.0612065 0.9283299
[2,] 0.9283299 2.9567120

Conditional specification

For the conditional specification, we can write the model at the observation level, \(Y_{it}\). This is because, conditional on \(\boldsymbol{\theta}_i\), \(Y_{it}\) and \(Y_{it'}\) are independent.

For \(i\) (\(i = 1,\ldots,n\)) and \(t\) (\(t = 1,\ldots, n_i\)), the model is:

\[\begin{aligned} Y_{it} | \boldsymbol{\Omega}, \boldsymbol{\theta}_i &\stackrel{ind}{\sim} N\left((\beta_{0} + \theta_{0i}) + (\beta_1 + \theta_{1i}) X_{it}, \sigma^2\right),\\ \boldsymbol{\theta}_i | \boldsymbol{\Sigma} &\stackrel{iid}{\sim} N_2(\mathbf{0}_2,\boldsymbol{\Sigma}),\\ \boldsymbol{\Omega} &\sim f(\boldsymbol{\Omega}), \end{aligned}\]

where \(\boldsymbol{\Omega} = (\beta_0, \beta_1, \sigma^2, \boldsymbol{\Sigma})\).

Conditional Specification

  • Moments for the Conditional Model:

\[\begin{aligned} \mathbb{E}[Y_{it} | \boldsymbol{\Omega},\boldsymbol{\theta}_i] &= (\beta_0 + \theta_{0i}) + (\beta_1 + \theta_{1i}) X_{it}\\ \mathbb{V}(Y_{it} | \boldsymbol{\Omega},\boldsymbol{\theta}_i) &= \sigma^2\\ \mathbb{C}(Y_{it}, Y_{jt'} | \boldsymbol{\Omega},\boldsymbol{\theta}_i,\boldsymbol{\theta}_{j}) &= 0,\quad (i,t) \neq (j,t') \end{aligned}\]

Conditional Specification

  • Define \(\mathbf{Y}_i = (Y_{i1},\ldots,Y_{in_i})\) and \(\mathbf{Y} = (\mathbf{Y}_1,\ldots,\mathbf{Y}_n)\).

  • The posterior for the conditional model can be written as:

\[\begin{aligned} f(\boldsymbol{\Omega}, \boldsymbol{\theta} | \mathbf{Y}) &\propto f(\mathbf{Y}, \boldsymbol{\Omega}, \boldsymbol{\theta})\\ &= f(\mathbf{Y} | \boldsymbol{\Omega}, \boldsymbol{\theta}) f(\boldsymbol{\theta} | \boldsymbol{\Omega})f(\boldsymbol{\Omega})\\ &= \prod_{i=1}^n \prod_{t = 1}^{n_i} f(Y_{it} | \boldsymbol{\Omega}, \boldsymbol{\theta}) \prod_{i=1}^n f(\boldsymbol{\theta}_i | \boldsymbol{\Sigma}) f(\boldsymbol{\Omega}), \end{aligned}\] where \(\boldsymbol{\theta} = (\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_n)\).

Marginal Specification

The LMM model is given by:

\[\mathbf{Y}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \boldsymbol{\theta}_i + \boldsymbol{\epsilon}_i, \quad \boldsymbol{\epsilon}_i \stackrel{ind}{\sim} N_{n_i}(\mathbf{0}_{n_i}, \sigma^2 \mathbf{I}_{n_i}).\]

  • Moments for the Marginal Model:

\[\begin{aligned} \mathbb{E}[\mathbf{Y}_{i} | \boldsymbol{\Omega}] &= \mathbf{X}_i\boldsymbol{\beta}\\ \mathbb{V}(\mathbf{Y}_{i} | \boldsymbol{\Omega}) &= \mathbf{Z}_i \boldsymbol{\Sigma} \mathbf{Z}_i^\top + \sigma^2 \mathbf{I}_{n_i} = \boldsymbol{\Upsilon}_i\\ \mathbb{C}(\mathbf{Y}_{i}, \mathbf{Y}_{i'} | \boldsymbol{\Omega}) &= \mathbf{0}_{n_i \times n_i},\quad i \neq i'. \end{aligned}\]

Marginal Specification

For \(i = 1,\ldots,n\), \[\begin{aligned} \mathbf{Y}_{i} | \boldsymbol{\Omega} &\stackrel{ind}{\sim} N(\mathbf{X}_i\boldsymbol{\beta},\boldsymbol{\Upsilon}_i)\\ \boldsymbol{\Omega} &\sim f(\boldsymbol{\Omega}), \end{aligned}\] where \(\boldsymbol{\Omega} = (\boldsymbol{\beta},\sigma,\boldsymbol{\Sigma})\) are the population parameters.

Recovering the Subject-Specific Parameters

  • We can still recover the \(\boldsymbol{\theta}_i\) when we fit the marginal model; we need only compute \(f(\boldsymbol{\theta}_i | \mathbf{Y}_i,\boldsymbol{\Omega})\) for each \(i\).

  • We can obtain this full conditional by specifying the joint distribution,

\[f\left(\begin{bmatrix} \mathbf{Y}_i\\ \boldsymbol{\theta}_i \end{bmatrix} \Bigg| \boldsymbol{\Omega}\right) = N\left(\begin{bmatrix} \mathbf{X}_i \boldsymbol{\beta} \\ \mathbf{0}_{q} \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Upsilon}_i & \mathbf{Z}_i\boldsymbol{\Sigma}\\ \boldsymbol{\Sigma} \mathbf{Z}_i^\top & \boldsymbol{\Sigma} \end{bmatrix}\right).\]

We can then use the conditional distribution of a multivariate normal to find \(f(\boldsymbol{\theta}_i | \mathbf{Y}_i, \boldsymbol{\Omega}) = N(\mathbb{E}_{\boldsymbol{\theta}_i},\mathbb{V}_{\boldsymbol{\theta}_i})\), where

\[\begin{aligned} \mathbb{E}_{\boldsymbol{\theta}_i} &= \mathbf{0}_{q} + \boldsymbol{\Sigma} \mathbf{Z}_i^\top \boldsymbol{\Upsilon}_i^{-1} (\mathbf{Y}_i - \mathbf{X}_i \boldsymbol{\beta})\\ \mathbb{V}_{\boldsymbol{\theta}_i} &= \boldsymbol{\Sigma} - \boldsymbol{\Sigma} \mathbf{Z}_i^\top \boldsymbol{\Upsilon}_i^{-1} \mathbf{Z}_i\boldsymbol{\Sigma}. \end{aligned}\]
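A small numerical sketch of these formulas for a single subject; all inputs below are illustrative assumptions:

Sigma <- matrix(c(1, -0.5, -0.5, 1), 2, 2)   # assumed Cov(theta_i)
sigma2 <- 2                                   # assumed residual variance
beta <- c(-8, -0.1)                           # assumed population parameters
X_i <- cbind(1, c(0, 1, 2))                   # three visits; here Z_i = X_i
Z_i <- X_i
Y_i <- c(-8.2, -8.5, -8.9)                    # assumed observations
Upsilon_i <- Z_i %*% Sigma %*% t(Z_i) + sigma2 * diag(3)
M <- Sigma %*% t(Z_i) %*% solve(Upsilon_i)
E_theta_i <- M %*% (Y_i - X_i %*% beta)       # posterior mean of theta_i
V_theta_i <- Sigma - M %*% Z_i %*% Sigma      # posterior covariance of theta_i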

Marginal Specification

The induced correlation structure is harder to build intuition for, but we can shed some light by studying the scalar forms of the variance and covariance:

\[\begin{aligned} \mathbb{V}(Y_{it}| \boldsymbol{\Omega}) &= \tau_0^2 + 2 \tau_{01} X_{it} + \tau_1^2 X_{it}^2 + \sigma^2,\\ \mathbb{C}(Y_{it}, Y_{it'} | \boldsymbol{\Omega}) &= \tau_0^2 + \tau_{01} (X_{it} + X_{it'}) + \tau_1^2 X_{it} X_{it'}. \end{aligned}\]
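We can verify these scalar forms against the matrix form \(\mathbf{Z}_i \boldsymbol{\Sigma} \mathbf{Z}_i^\top + \sigma^2 \mathbf{I}_{n_i}\) with a quick numerical check (assumed values):

tau0 <- 1; tau1 <- 1; rho <- 0.5; sigma2 <- 2  # assumed values
tau01 <- rho * tau0 * tau1
X <- c(0.5, 2)                                 # two visit times
Z <- cbind(1, X)
Sigma <- matrix(c(tau0^2, tau01, tau01, tau1^2), 2, 2)
Upsilon <- Z %*% Sigma %*% t(Z) + sigma2 * diag(2)
Upsilon[1, 2]                                  # covariance, matrix form
tau0^2 + tau01 * (X[1] + X[2]) + tau1^2 * X[1] * X[2]  # scalar form; identical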

Visualizing the dependency

\(\tau_0 = 1,\tau_1 = 1,\sigma^2 =2, \rho = 0.5\)

Visualizing the dependency

\(\tau_0 = 1,\tau_1 = 1,\sigma^2 =2, \rho = -0.5\)

Marginal Specification

  • The posterior for the marginal model can be written as:

\[\begin{aligned} f(\boldsymbol{\Omega} | \mathbf{Y}) &\propto f(\mathbf{Y}, \boldsymbol{\Omega})\\ &= f(\mathbf{Y} | \boldsymbol{\Omega})f(\boldsymbol{\Omega})\\ &= \prod_{i=1}^n f(\mathbf{Y}_{i} | \boldsymbol{\Omega}) f(\boldsymbol{\Omega}). \end{aligned}\]

Specifying a Prior Distribution for \(\boldsymbol{\Omega}\)

  • We must set a prior for \(f(\boldsymbol{\Omega}) = f(\boldsymbol{\beta}) f(\sigma) f(\boldsymbol{\Sigma})\).

  • We can place standard priors on \(\boldsymbol{\beta}\) and \(\sigma\).

  • \(\boldsymbol{\Sigma}\) is a covariance (i.e., positive definite matrix), so we must be careful here.

  • It is natural to place a prior on the decomposition, \(\boldsymbol{\Sigma} = \mathbf{D} \mathbf{L} \mathbf{L}^\top \mathbf{D}\).

    • For each of the standard deviations \((\tau_0,\tau_1)\) we can place standard priors for scales (e.g., half-normal).

    • For \(\mathbf{L}\) we can place a Lewandowski-Kurowicka-Joe (LKJ) prior, \(\mathbf{L} \sim \text{LKJ}(\eta)\).

What Does the LKJ Prior Do?

  • The LKJ prior allows you to model the correlation structure in a flexible and non-informative way.

  • It is defined by a single parameter, \(\eta > 0\), which controls the concentration of the prior.

    • When \(\eta = 1\), it is an uninformative prior (i.e., uniform over correlation matrices).

    • When \(\eta > 1\), the prior favors weaker correlations (mass concentrates near the identity matrix).

    • When \(\eta < 1\), the prior favors stronger correlations (mass pushed toward \(\pm 1\)).

  • When \(q=2\), \(\eta = 1\) is equivalent to \(\rho \sim \text{Uniform}(-1,1)\).
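For \(q = 2\) the implied prior on \(\rho\) is easy to visualize: under \(\text{LKJ}(\eta)\) the marginal of the correlation is a Beta(\(\eta,\eta\)) distribution rescaled to \((-1,1)\), a standard result for the \(2 \times 2\) case. A short sketch:

eta_vals <- c(0.5, 1, 4)
rho_draws <- lapply(eta_vals, function(e) 2 * rbeta(1e4, e, e) - 1)
sapply(rho_draws, sd)   # spread of rho shrinks as eta grows
hist(rho_draws[[2]])    # eta = 1: uniform on (-1, 1)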

LKJ Prior Formula

  • The LKJ prior on the Cholesky factor \(\mathbf{L}\) of a correlation matrix is given by:

\[f(\mathbf{L} | \eta) \propto \prod_{j = 2}^q L_{jj}^{q-j+2\eta-2}.\]

Where:

  • \(\eta\) is the concentration parameter.

  • \(q\) is the size of the correlation matrix.

  • \(L_{jj}\) is the \(j\)th diagonal element of \(\mathbf{L}\).

LKJ Prior in Stan

parameters {
  cholesky_factor_corr[2] L;  // Cholesky factor of correlation matrix
}
model {
  L ~ lkj_corr_cholesky(eta);
}

Stan code for independent intercept and slope

// lmm-independent.stan
data {
  int<lower = 1> n;
  int<lower = 1> N;
  vector[N] Time;
  vector[N] MD;
  int<lower = 1, upper = n> Ids[N];
}
parameters {
  real beta0;
  real beta1;
  real<lower = 0> sigma;
  vector[n] z0;
  vector[n] z1;
  real<lower = 0> tau0;
  real<lower = 0> tau1;
}
transformed parameters {
  vector[n] theta0;
  vector[n] theta1;
  theta0 = tau0 * z0;
  theta1 = tau1 * z1;
}
model {
  // likelihood
  vector[N] mu;
  for (i in 1:N) {
    mu[i] = (beta0 + theta0[Ids[i]]) + (beta1 + theta1[Ids[i]]) * Time[i];
  }
  target += normal_lpdf(MD | mu, sigma);
  // subject-specific parameters
  target += std_normal_lpdf(z0);
  target += std_normal_lpdf(z1);
  // population parameters
  target += normal_lpdf(beta0 | 0, 3);
  target += normal_lpdf(beta1 | 0, 3);
  target += normal_lpdf(sigma | 0, 3);
  target += normal_lpdf(tau0 | 0, 3);
  target += normal_lpdf(tau1 | 0, 3);
}
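A fitting call for this model, analogous to the calls used below for the conditional and marginal models (a sketch; assumes glaucoma_longitudinal and n_eyes are defined as in the later examples):

stan_data_ind <- list(
  n = n_eyes,
  N = nrow(glaucoma_longitudinal),
  Time = glaucoma_longitudinal$time,
  MD = glaucoma_longitudinal$mean_deviation,
  Ids = glaucoma_longitudinal$eye_id
)
lmm_independent <- stan_model(file = "lmm-independent.stan")
fit_lmm_independent <- sampling(lmm_independent, stan_data_ind, iter = 5000)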

Stan code for conditional LMM

// lmm-conditional.stan
data {
  int<lower = 1> N;
  int<lower = 1> n;
  int<lower = 1> p;
  int<lower = 1> q;
  matrix[N, p] X;
  matrix[N, q] Z;
  vector[N] Y;
  int<lower = 1, upper = n> Ids[N];
  real<lower = 0> eta;
}
parameters {
  vector[p] beta;
  real<lower = 0> sigma;
  matrix[q, n] z;
  vector<lower = 0>[q] tau;
  cholesky_factor_corr[q] L;
}
transformed parameters {
  matrix[q, n] theta;
  theta = diag_pre_multiply(tau, L) * z;
}
model {
  // likelihood
  vector[N] mu;
  for (i in 1:N) {
    mu[i] = X[i, ] * beta + Z[i, ] * theta[, Ids[i]];
  }
  target += normal_lpdf(Y | mu, sigma);
  // subject-specific parameter
  target += std_normal_lpdf(to_vector(z));
  // population parameters
  target += normal_lpdf(beta | 0, 3);
  target += normal_lpdf(sigma | 0, 3);
  target += normal_lpdf(tau | 0, 3);
  target += lkj_corr_cholesky_lpdf(L | eta);
}
generated quantities {
  corr_matrix[q] Phi = L * transpose(L);
  real rho = Phi[1, 2];
  vector[n] subject_intercepts = beta[1] + to_vector(theta[1, ]);
  vector[n] subject_slopes = beta[2] + to_vector(theta[2, ]);
  vector[N] Y_pred;
  vector[N] log_lik;
  vector[N] mu;
  for (i in 1:N) {
    mu[i] = X[i, ] * beta + Z[i, ] * theta[, Ids[i]];
    Y_pred[i] = normal_rng(mu[i], sigma);
    log_lik[i] = normal_lpdf(Y[i] | mu[i], sigma);
  }
}

Stan code for marginal LMM

Because eyes have different numbers of visits, the marginal likelihood must be evaluated subject by subject using a ragged data structure: observations are stacked into long vectors, and segment() extracts each subject's block.

// lmm-marginal.stan
data {
  int<lower = 1> N;
  int<lower = 1> n;
  int<lower = 1> p;
  int<lower = 1> q;
  matrix[N, p] X;
  matrix[N, q] Z;
  vector[N] Y;
  int<lower = 1> n_is[n];
  real<lower = 0> eta;
}
parameters {
  vector[p] beta;
  real<lower = 0> sigma;
  vector<lower = 0>[q] tau;
  cholesky_factor_corr[q] L;
}
transformed parameters {
  cov_matrix[q] Sigma;
  Sigma = diag_pre_multiply(tau, L) * transpose(diag_pre_multiply(tau, L));
}
model {
  // evaluate the likelihood for the marginal model using ragged data structure
  int pos;
  pos = 1;
  for (i in 1:n) {
    int n_i = n_is[i];
    vector[n_i] Y_i = segment(Y, pos, n_i);
    matrix[n_i, p] X_i;
    matrix[n_i, q] Z_i;
    for (j in 1:p) X_i[, j] = segment(X[, j], pos, n_i);
    for (j in 1:q) Z_i[, j] = segment(Z[, j], pos, n_i);
    vector[n_i] mu_i = X_i * beta;
    matrix[n_i, n_i] Upsilon_i = (sigma * sigma) * diag_matrix(rep_vector(1.0, n_i)) + Z_i * Sigma * transpose(Z_i);
    target += multi_normal_lpdf(Y_i | mu_i, Upsilon_i);
    pos = pos + n_i;
  }
  // population parameters
  target += normal_lpdf(beta | 0, 3);
  target += normal_lpdf(sigma | 0, 3);
  target += normal_lpdf(tau | 0, 3);
  target += lkj_corr_cholesky_lpdf(L | eta);
}
generated quantities {
  corr_matrix[q] Phi = L * transpose(L);
  real rho = Phi[1, 2];
  matrix[q, n] theta;
  int pos;
  pos = 1;
  for (i in 1:n) {
    int n_i = n_is[i];
    vector[n_i] Y_i = segment(Y, pos, n_i);
    matrix[n_i, p] X_i;
    matrix[n_i, q] Z_i;
    for (j in 1:p) X_i[, j] = segment(X[, j], pos, n_i);
    for (j in 1:q) Z_i[, j] = segment(Z[, j], pos, n_i);
    vector[n_i] mu_i = X_i * beta;
    matrix[q, n_i] M = Sigma * transpose(Z_i) * inverse_spd(Z_i * Sigma * transpose(Z_i) + (sigma * sigma) * diag_matrix(rep_vector(1.0, n_i)));
    vector[q] mean_theta_i = M * (Y_i - mu_i);
    matrix[q, q] cov_theta_i = Sigma - M * Z_i * Sigma;
    theta[, i] = multi_normal_rng(mean_theta_i, cov_theta_i);
    pos = pos + n_i;
  }
  vector[n] subject_intercepts = beta[1] + to_vector(theta[1, ]);
  vector[n] subject_slopes = beta[2] + to_vector(theta[2, ]);
}

Glaucoma Disease Progression Data

glaucoma_longitudinal <- readRDS("glaucoma_longitudinal.rds")
head(glaucoma_longitudinal)
  pat_id eye_id mean_deviation      time      age      iop
1      1      1          -7.69 0.0000000 51.55616 10.87303
2      1      1          -9.95 0.5753425 51.55616 10.87303
3      1      1          -9.58 1.0547945 51.55616 10.87303
4      1      1          -9.53 1.5726027 51.55616 10.87303
5      1      1          -9.18 2.0136986 51.55616 10.87303
6      1      1          -9.63 2.5671233 51.55616 10.87303
length(unique(glaucoma_longitudinal$eye_id))
[1] 278
nrow(glaucoma_longitudinal)
[1] 4863

Fitting the Conditional Model in Stan

X <- model.matrix(~ time, data = glaucoma_longitudinal)
n_eyes <- length(unique(glaucoma_longitudinal$eye_id))
stan_data <- list(
  N = nrow(glaucoma_longitudinal),
  n = n_eyes,
  p = ncol(X),
  q = ncol(X),
  X = X,
  Z = X,
  Y = glaucoma_longitudinal$mean_deviation,
  Ids = glaucoma_longitudinal$eye_id,
  eta = 1
)
lmm_conditional <- stan_model(file = "lmm-conditional.stan")
fit_lmm_conditional <- sampling(lmm_conditional, stan_data, iter = 5000, pars = c("z", "theta"), include = FALSE)

Assessing Convergence

traceplot(fit_lmm_conditional, pars = c("beta", "sigma", "tau", "rho"))

Assessing Convergence

library(bayesplot)
bayesplot::mcmc_acf(fit_lmm_conditional, regex_pars = c("beta", "sigma", "tau", "rho"))

Posterior Summaries

print(fit_lmm_conditional, pars = c("beta", "sigma", "tau", "rho"))
Inference for Stan model: anon_model.
4 chains, each with iter=5000; warmup=2500; thin=1; 
post-warmup draws per chain=2500, total post-warmup draws=10000.

         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1] -8.19    0.04 0.47 -9.15 -8.50 -8.19 -7.86 -7.34   178 1.03
beta[2] -0.10    0.00 0.03 -0.16 -0.12 -0.10 -0.08 -0.05   778 1.00
sigma    1.27    0.00 0.01  1.24  1.26  1.26  1.27  1.29 15569 1.00
tau[1]   8.07    0.02 0.34  7.46  7.83  8.05  8.29  8.79   364 1.01
tau[2]   0.44    0.00 0.02  0.40  0.42  0.44  0.45  0.48  2064 1.00
rho     -0.27    0.00 0.06 -0.38 -0.31 -0.27 -0.23 -0.16   949 1.00

Samples were drawn using NUTS(diag_e) at Tue Mar  4 15:18:54 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Fitting the Marginal Model in Stan

X <- model.matrix(~ time, data = glaucoma_longitudinal)
stan_data <- list(
  N = nrow(glaucoma_longitudinal),
  n = n_eyes,
  p = ncol(X),
  q = ncol(X),
  X = X,
  Z = X,
  Y = glaucoma_longitudinal$mean_deviation,
  n_is = as.numeric(table(glaucoma_longitudinal$eye_id)),
  eta = 1
)
lmm_marginal <- stan_model(file = "lmm-marginal.stan")
fit_lmm_marginal <- sampling(lmm_marginal, stan_data, iter = 5000)

Assessing Convergence

traceplot(fit_lmm_marginal, pars = c("beta", "sigma", "tau", "rho"))

Assessing Convergence

library(bayesplot)
bayesplot::mcmc_acf(fit_lmm_marginal, regex_pars = c("beta", "sigma", "tau", "rho"))

Posterior Summaries

print(fit_lmm_marginal, pars = c("beta", "sigma", "tau", "rho"))
Inference for Stan model: anon_model.
4 chains, each with iter=5000; warmup=2500; thin=1; 
post-warmup draws per chain=2500, total post-warmup draws=10000.

         mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
beta[1] -8.23       0 0.47 -9.16 -8.55 -8.23 -7.92 -7.31 12896    1
beta[2] -0.10       0 0.03 -0.16 -0.12 -0.10 -0.08 -0.05 13230    1
sigma    1.27       0 0.01  1.24  1.26  1.26  1.27  1.29 13937    1
tau[1]   8.05       0 0.34  7.42  7.82  8.04  8.27  8.75 14264    1
tau[2]   0.44       0 0.02  0.40  0.42  0.44  0.45  0.48 13997    1
rho     -0.27       0 0.06 -0.38 -0.31 -0.28 -0.24 -0.16 12487    1

Samples were drawn using NUTS(diag_e) at Tue Mar  4 11:31:23 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Comparing LMM versus OLS

Comparing LMM versus OLS

Prepare for next class

  • Work on HW 04, which will be assigned soon. It’s not due until March 25.

  • Enjoy your spring break!