Recall we worked with a dataset on women aged 15-49 sampled from the 2013-14 Democratic Republic of Congo (DRC) Demographic and Health Survey. Variables are:
loc_id: location id (i.e. survey cluster).
hemoglobin: hemoglobin level (g/dL).
anemia: anemia classifications.
age: age in years.
urban: urban vs. rural.
LATNUM: latitude.
LONGNUM: longitude.
Motivating dataset
| loc_id | hemoglobin | anemia | age | urban | LATNUM | LONGNUM |
|---|---|---|---|---|---|---|
| 1 | 12.5 | not anemic | 28 | rural | 0.220128 | 21.79508 |
| 1 | 12.6 | not anemic | 42 | rural | 0.220128 | 21.79508 |
| 1 | 13.3 | not anemic | 15 | rural | 0.220128 | 21.79508 |
| 1 | 12.9 | not anemic | 28 | rural | 0.220128 | 21.79508 |
| 1 | 10.4 | mild | 32 | rural | 0.220128 | 21.79508 |
| 1 | 12.2 | not anemic | 42 | rural | 0.220128 | 21.79508 |
Modeling goals:
Learn the associations of age and urbanicity with hemoglobin, accounting for unmeasured spatial confounders.
Create a predicted map of hemoglobin across the spatial surface, controlling for age and urbanicity, with uncertainty quantification.
Map of Sud-Kivu province
Last time, we focused on one province with ~500 observations at ~30 locations.
Map of the DRC
Today we will extend the analysis to the full dataset with ~8,600 observations at ~500 locations.
Modeling
Data objects:
- $i = 1, \dots, L$ indexes unique locations.
- $j = 1, \dots, n_i$ indexes individuals at location $i$.
- $y_{ij}$ denotes the observation of individual $j$ at location $i$.
- $x_{ij} \in \mathbb{R}^p$, where $p$ is the number of predictors (excluding the intercept).
Modeling
Population parameters:
- $\beta_0$ is the intercept.
- $\beta \in \mathbb{R}^p$ is the vector of regression coefficients.
- $\sigma^2$ is the overall residual error variance (nugget).

Location-specific parameters:

- $s_i$ denotes the coordinates of location $i$.
- $w_i = w(s_i)$ denotes the spatial intercept at location $i$.
Location-specific notation
- $y_i = (y_{i1}, \dots, y_{i n_i})^\top$.
- $X_i$ is an $n_i \times p$ dimensional matrix with rows $x_{ij}^\top$.
- $w = (w_1, \dots, w_L)^\top$.
Full data notation
- $y = (y_1^\top, \dots, y_L^\top)^\top$, with $n = \sum_{i=1}^L n_i$.
- $X$ stacks $X_1, \dots, X_L$.
- $\epsilon = (\epsilon_{11}, \dots, \epsilon_{L n_L})^\top$ stacks the individual-level errors.
- $A$ is an $n \times L$ dimensional block-diagonal binary matrix. Each row contains a single 1 in the column that corresponds to the location of that observation.
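The incidence matrix $A$ can be constructed directly from `loc_id`. A minimal sketch in Python (the toy `loc_id` values are illustrative, not from the survey):

```python
import numpy as np

# Hypothetical mini version of the data: loc_id for n = 6 individuals
# at L = 3 unique locations.
loc_id = np.array([1, 1, 2, 2, 2, 3])

# Map loc_id to 0-based column indices.
_, col = np.unique(loc_id, return_inverse=True)
n, L = len(loc_id), col.max() + 1

# A is the n x L binary incidence matrix: A[j, i] = 1 iff individual j
# belongs to location i, so (A w)[j] picks out the spatial intercept w_i.
A = np.zeros((n, L))
A[np.arange(n), col] = 1.0
```

With rows sorted by location, $A$ has the block structure described above, and $Aw$ assigns each individual the spatial intercept of their location.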
Modeling
We specify the following model:
$$y = \beta_0 1_n + X \beta + A w + \epsilon, \quad \epsilon \sim \mathrm{N}(0, \sigma^2 I_n),$$
with priors

- $w \mid \alpha, \rho \sim \mathrm{N}(0, K)$, where $K_{il} = k(s_i, s_l)$ and $k$ is the Matérn 3/2 covariance function with magnitude $\alpha$ and length scale $\rho$,
- a weakly informative normal prior on $\beta_0$. This is the intercept after centering $X$,
- weakly informative priors on $\beta$, $\sigma$, $\alpha$, and $\rho$.
Computational issues with GP
Effectively, the prior for $w$ is $w \sim \mathrm{N}(0, K)$ with $K_{il} = k(\lVert s_i - s_l \rVert)$, where the Matérn 3/2 kernel is an isotropic covariance function,
$$k(d) = \alpha^2 \left(1 + \frac{\sqrt{3}\, d}{\rho}\right) \exp\left(-\frac{\sqrt{3}\, d}{\rho}\right).$$
This is not scalable because we need to invert an $L \times L$ dense covariance matrix at each MCMC iteration, which requires $O(L^3)$ floating point operations (flops) and $O(L^2)$ memory.
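To make the cost concrete, here is a Python sketch of building the dense Matérn 3/2 covariance over ~500 locations and the $O(L^3)$ factorization that MCMC would repeat at every iteration (the coordinates and hyperparameter values are made up for illustration):

```python
import numpy as np

def matern32(coords, alpha, rho):
    """Dense Matern 3/2 covariance: k(d) = alpha^2 (1 + sqrt(3) d / rho) exp(-sqrt(3) d / rho)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    u = np.sqrt(3.0) * d / rho
    return alpha**2 * (1.0 + u) * np.exp(-u)

rng = np.random.default_rng(0)
L = 500                                  # ~500 unique locations, as in the full DRC data
coords = rng.uniform(size=(L, 2))        # made-up coordinates for illustration
K = matern32(coords, alpha=1.0, rho=0.2)

# Each MCMC iteration must factorize the dense L x L matrix:
# O(L^3) flops and O(L^2) memory, repeated thousands of times.
chol = np.linalg.cholesky(K + 1e-4 * np.eye(L))  # small jitter for numerical stability
```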
Scalable GP methods overview
The computational issues motivated exploration in scalable GP methods. Existing scalable methods broadly fall under two categories.
Sparsity methods
- sparsity in $K$, e.g., covariance tapering (@furrer2006covariance).
- sparsity in $K^{-1}$, e.g., the Vecchia approximation (@vecchia1988estimation) and the nearest-neighbor GP (@datta2016hierarchical).
Low-rank methods
- approximate $w$ on a low-dimensional subspace.
- e.g., process convolution (@higdon2002space), inducing point methods (@snelson2005sparse).
Hilbert space method for GP
@solin2020hilbert introduced a Hilbert space method for reduced-rank Gaussian process regression (HSGP).
@riutort2023practical discussed how to practically implement HSGP.
Tutorial code is available in several probabilistic programming languages.
We want to make predictions for $\tilde{y}$, the observations at $\tilde{L}$ new locations. Define $\tilde{s} = (\tilde{s}_1, \dots, \tilde{s}_{\tilde{L}})$ and $\tilde{w} = (w(\tilde{s}_1), \dots, w(\tilde{s}_{\tilde{L}}))^\top$. Recall:

- Likelihood: $p(\tilde{y} \mid \tilde{w}, \beta_0, \beta, \sigma)$ – remains the same as for GP
- Kriging: $p(\tilde{w} \mid w, \alpha, \rho)$ – we will focus on this next
- Posterior distribution: $p(w, \beta_0, \beta, \sigma, \alpha, \rho \mid y)$ – we have just discussed
Kriging
Recall that under the GP prior,
$$\begin{pmatrix} w \\ \tilde{w} \end{pmatrix} \sim \mathrm{N}\left(0, \begin{pmatrix} K & K_{w\tilde{w}} \\ K_{\tilde{w}w} & K_{\tilde{w}} \end{pmatrix}\right),$$
where $K$ is the covariance of $w$, $K_{\tilde{w}}$ is the covariance of $\tilde{w}$, and $K_{w\tilde{w}} = K_{\tilde{w}w}^\top$ is the cross-covariance matrix between $w$ and $\tilde{w}$.
Therefore, by properties of the multivariate normal,
$$\tilde{w} \mid w \sim \mathrm{N}\left(K_{\tilde{w}w} K^{-1} w,\; K_{\tilde{w}} - K_{\tilde{w}w} K^{-1} K_{w\tilde{w}}\right).$$
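The conditional-normal formulas above can be checked numerically. A Python sketch with simulated locations and assumed hyperparameter values (not the DRC data):

```python
import numpy as np

def matern32(d, alpha, rho):
    """Matern 3/2 kernel as a function of distance d."""
    u = np.sqrt(3.0) * d / rho
    return alpha**2 * (1.0 + u) * np.exp(-u)

def cross_dist(a, b):
    """Pairwise Euclidean distances between rows of a and rows of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

rng = np.random.default_rng(1)
s = rng.uniform(size=(40, 2))        # observed locations (simulated)
s_new = rng.uniform(size=(5, 2))     # new locations for prediction
alpha, rho = 1.0, 0.3                # assumed hyperparameter values

K = matern32(cross_dist(s, s), alpha, rho)              # Cov(w)
K_new = matern32(cross_dist(s_new, s_new), alpha, rho)  # Cov(w_tilde)
K_cross = matern32(cross_dist(s_new, s), alpha, rho)    # Cov(w_tilde, w)

# One draw of w from its GP prior (jitter keeps the Cholesky stable).
w = np.linalg.cholesky(K + 1e-9 * np.eye(40)) @ rng.standard_normal(40)

# Kriging: w_tilde | w ~ N(K_cross K^{-1} w, K_new - K_cross K^{-1} K_cross^T)
Kinv_w = np.linalg.solve(K, w)
cond_mean = K_cross @ Kinv_w
cond_cov = K_new - K_cross @ np.linalg.solve(K, K_cross.T)
```

Kriging at an already-observed location reproduces $w_i$ exactly with zero conditional variance, a useful sanity check on the implementation.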
Kriging under HSGP
Under HSGP, $K \approx \Phi \Delta \Phi^\top$, $K_{\tilde{w}w} \approx \tilde{\Phi} \Delta \Phi^\top$, and $K_{\tilde{w}} \approx \tilde{\Phi} \Delta \tilde{\Phi}^\top$, where $\tilde{\Phi}$ is the feature matrix for the new locations. Therefore, approximately,

Claim: $\tilde{w} \mid w = \tilde{\Phi} \Delta \Phi^\top \left(\Phi \Delta \Phi^\top\right)^{-} w$ with probability 1, where $A^{-}$ denotes a generalized inverse of a matrix $A$ such that $A A^{-} A = A$.
Kriging under HSGP
Claim: $\tilde{w} \mid w = \tilde{\Phi} \Delta \Phi^\top \left(\Phi \Delta \Phi^\top\right)^{-} w$ with probability 1.

Sketch proof below, see details in class:

- By properties of the multivariate normal, $\tilde{w} \mid w \sim \mathrm{N}\left(\tilde{\Phi}\Delta\Phi^\top(\Phi\Delta\Phi^\top)^{-}w,\; \tilde{\Phi}\Delta\tilde{\Phi}^\top - \tilde{\Phi}\Delta\Phi^\top(\Phi\Delta\Phi^\top)^{-}\Phi\Delta\tilde{\Phi}^\top\right).$
- Show that if $\Phi$ has full column rank, which is true under HSGP, then for any matrix $B$ of proper dimension, $B \Delta \Phi^\top (\Phi\Delta\Phi^\top)^{-} \Phi \Delta = B \Delta$. Therefore the conditional covariance is $\tilde{\Phi}\Delta\tilde{\Phi}^\top - \tilde{\Phi}\Delta\tilde{\Phi}^\top = 0$.
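The key identity can also be verified numerically: the Moore-Penrose pseudoinverse is one valid generalized inverse, and taking $B = I$ gives $\Delta \Phi^\top (\Phi\Delta\Phi^\top)^{-} \Phi \Delta = \Delta$ when $\Phi$ has full column rank. A sketch with random matrices (all dimensions and values arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n_loc, m = 30, 8
Phi = rng.standard_normal((n_loc, m))      # feature matrix, full column rank (a.s.)
Delta = np.diag(rng.uniform(0.5, 2.0, m))  # diagonal matrix of spectral densities

A = Phi @ Delta @ Phi.T                    # rank m < n_loc, hence singular
G = np.linalg.pinv(A)                      # Moore-Penrose pseudoinverse satisfies A G A = A

# Identity used in the proof, with B = I: Delta Phi^T G Phi Delta = Delta
lhs = Delta @ Phi.T @ G @ Phi @ Delta
```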
Kriging under HSGP
Under the reparameterized model, $w = \Phi \Delta^{1/2} z$ for $z \sim \mathrm{N}(0, I_m)$. Therefore $\tilde{w} = \tilde{\Phi} \Delta^{1/2} z$.
During MCMC sampling, we can obtain posterior predictive samples for $\tilde{w}$ through posterior samples of $z$ and $(\alpha, \rho)$, which determine $\Delta$. Let superscript $(t)$ denote the $t$th posterior sample:
$$\tilde{w}^{(t)} = \tilde{\Phi} \left(\Delta^{(t)}\right)^{1/2} z^{(t)}.$$
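A sketch of this posterior predictive computation in Python, using a 1-D HSGP for brevity; the basis functions and Matérn 3/2 spectral density follow the standard HSGP construction, while the "posterior draws" of $(z, \alpha, \rho)$ are stand-ins for illustration (in practice they come out of the MCMC fit):

```python
import numpy as np

def hsgp_basis(x, m, L_box):
    """1-D HSGP eigenfunctions on [-L_box, L_box]."""
    j = np.arange(1, m + 1)
    return np.sqrt(1.0 / L_box) * np.sin(np.pi * j * (x[:, None] + L_box) / (2.0 * L_box))

def spd_matern32(omega, alpha, rho):
    """Power spectral density of the 1-D Matern 3/2 kernel."""
    return 12.0 * np.sqrt(3.0) * alpha**2 / rho**3 * (3.0 / rho**2 + omega**2) ** -2

m, L_box = 30, 2.0
x_new = np.linspace(-1.0, 1.0, 50)                   # new locations
Phi_new = hsgp_basis(x_new, m, L_box)                # tilde_Phi
omega = np.arange(1, m + 1) * np.pi / (2.0 * L_box)  # square roots of the eigenvalues

rng = np.random.default_rng(3)
T = 2000
w_new_draws = np.empty((T, x_new.size))
for t in range(T):
    # Stand-in "posterior draws" of z, alpha, rho.
    z = rng.standard_normal(m)
    alpha, rho = 1.0, 0.5
    delta_sqrt = np.sqrt(spd_matern32(omega, alpha, rho))
    w_new_draws[t] = Phi_new @ (delta_sqrt * z)  # w_tilde^(t) = tilde_Phi (Delta^(t))^(1/2) z^(t)
```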
Kriging under HSGP – alternative view
Under the reparameterized model, there is another (perhaps more intuitive) way to recognize the kriging distribution under HSGP.
We model $w = \Phi \Delta^{1/2} z$, where $z \sim \mathrm{N}(0, I_m)$ is treated as the unknown parameter. Therefore, for kriging, $\tilde{w} = \tilde{\Phi} \Delta^{1/2} z$.
HSGP kriging in stan
Kriging under HSGP can be easily implemented in stan. Recall the HSGP approximation applies:

- for covariance functions which admit a power spectral density $S(\cdot)$.
- on a box $\Omega \subset \mathbb{R}^D$.
- with $m = m_1 \times \cdots \times m_D$ basis functions.
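These three ingredients can be sketched outside stan as well. The Python code below (1-D, with assumed values of $\alpha$, $\rho$, the box half-width, and $m$) builds the basis and spectral density and checks that $\Phi \Delta \Phi^\top$ approximates the exact Matérn 3/2 covariance in the interior of the box:

```python
import numpy as np

def hsgp_basis(x, m, L_box):
    """Laplacian eigenfunctions on the box [-L_box, L_box] (1-D)."""
    j = np.arange(1, m + 1)
    return np.sqrt(1.0 / L_box) * np.sin(np.pi * j * (x[:, None] + L_box) / (2.0 * L_box))

def spd_matern32(omega, alpha, rho):
    """Power spectral density of the 1-D Matern 3/2 kernel."""
    return 12.0 * np.sqrt(3.0) * alpha**2 / rho**3 * (3.0 / rho**2 + omega**2) ** -2

alpha, rho, L_box, m = 1.0, 0.4, 2.0, 40
x = np.linspace(-1.0, 1.0, 60)           # interior points, away from the box boundary
Phi = hsgp_basis(x, m, L_box)
Delta = spd_matern32(np.arange(1, m + 1) * np.pi / (2.0 * L_box), alpha, rho)

K_hsgp = Phi @ (Delta[:, None] * Phi.T)  # low-rank approximation Phi Delta Phi^T

# Exact Matern 3/2 covariance for comparison.
d = np.abs(x[:, None] - x[None, :])
u = np.sqrt(3.0) * d / rho
K_exact = alpha**2 * (1.0 + u) * np.exp(-u)
max_err = np.abs(K_hsgp - K_exact).max()
```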
We have talked about:
why HSGP is scalable.
how to do posterior sampling and posterior predictive sampling in stan.
HSGP parameters
@solin2020hilbert showed that the HSGP approximation can be made arbitrarily accurate as the number of basis functions $m$ and the box $\Omega$ increase.
But how to choose:
- the size of the box $\Omega$.
- the number of basis functions $m$.
HSGP approximation box
Due to the design of HSGP, the approximation is less accurate near the boundaries of $\Omega$.
Suppose all the coordinates are centered. Let $S_1, \dots, S_D$ be such that $\prod_{d=1}^D [-S_d, S_d]$ is the smallest box which contains all observed and prediction locations. We should at least ensure $\Omega \supseteq \prod_{d=1}^D [-S_d, S_d]$.
We want the box to be large enough to ensure good boundary accuracy. Let $c_d \ge 1$ be boundary factors; we consider $\Omega = \prod_{d=1}^D [-c_d S_d, c_d S_d]$.
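A minimal sketch of this box construction (random stand-in coordinates; the boundary factor value is an assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
coords = rng.uniform(-5.0, 5.0, size=(500, 2))  # stand-in for (LONGNUM, LATNUM)

centered = coords - coords.mean(axis=0)  # center each coordinate
S = np.abs(centered).max(axis=0)         # smallest box: [-S_1, S_1] x [-S_2, S_2]
c = np.array([1.5, 1.5])                 # boundary factors c_d >= 1 (assumed values)
half_widths = c * S                      # HSGP box: [-c_d S_d, c_d S_d]
```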
HSGP approximation box
How much the approximation accuracy deteriorates towards the boundaries depends on smoothness of the true surface.
E.g., the smaller the length scale $\rho$, the less smooth the surface, and the smaller the box that can be used for the same level of boundary accuracy.
HSGP approximation box
The larger the box,
the more basis functions we need for the same level of overall accuracy,
hence higher run time.
HSGP basis functions
The total number of basis functions is $m = \prod_{d=1}^D m_d$, i.e., we need to decide on the $m_d$'s, the number of basis functions for each dimension.
The higher the $m_d$'s, the better the overall approximation accuracy, and the higher the runtime.
$m$ scales exponentially in the dimension $D$, hence the HSGP computational complexity also scales exponentially in $D$. Therefore HSGP is only recommended for $D \le 3$, or at most $D = 4$.
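A quick illustration of the exponential growth (the per-dimension basis count is an arbitrary choice):

```python
# Tensor-product basis: m = m_1 * ... * m_D grows exponentially in D.
m_per_dim = 15  # arbitrary per-dimension basis count
total = {D: m_per_dim**D for D in (1, 2, 3, 4)}
# e.g. total[2] = 225 basis functions in 2-D, but total[4] = 50625 in 4-D
```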
Relationship between $c$ and $\rho$
@riutort2023practical used simulations with a squared exponential covariance function to investigate the relationship between $c$ and $\rho$.
Notice that $c$ needs to be above a certain minimum value to achieve a reasonable boundary accuracy.
Relationship between $c$, $\rho$ and $m$
To summarize:
- The boundary factor $c$ needs to be above a minimum value; otherwise the HSGP approximation is poor no matter how large $m$ is.
- As $\rho$ increases, the surface is smoother, and $c$ needs to increase to retain boundary accuracy.
- As $\rho$ decreases, the surface is less smooth, and $m$ needs to increase to retain overall accuracy.
- As $c$ increases, $m$ needs to increase to retain overall accuracy.
- As $m$ increases, run time increases. Hence we want to minimize $c$ and $m$ while maintaining a certain accuracy level.
And we do not know $\rho$. So how do we choose $c$ and $m$?
Prepare for next class
Work on HW 05 which is due Apr 8
Complete reading to prepare for Thursday’s lecture