12.4 Inference with additive models

While models of this type provide very flexible and visually informative descriptions of the data, it is also necessary to consider how models can be compared and inferences drawn. Although we are outside the strict realm of a standard linear model, as a result of the smoothness constraints, we generally proceed by analogy with the linear model.

For an additive model, the residual sum-of-squares is defined in the usual way as \[ \mbox{RSS} = \sum_{i=1}^n (y_i - \hat{y}_i)^2, \] where \(\hat{y}_i\) denotes the fitted value, produced by evaluating the additive model at the observation \(x_i\). Because the fitted values are linear in the responses, \(\hat{y} = Sy\), where \(S\) denotes the projection matrix discussed earlier, the residual sum-of-squares can be written as the quadratic form \[ \mbox{RSS} = (y - Sy)^T (y - Sy) = y^T (I-S)^T (I-S) y . \] The approximate degrees of freedom for error can then be defined as \[ \mbox{df} = \mbox{tr}\{(I-S)^T (I-S)\} , \] which reduces to the familiar \(n - p\) when \(S\) is the hat matrix of a full-rank linear model with \(p\) parameters.
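As a rough numerical illustration of these definitions, the sketch below builds a smoothing matrix for simulated data and evaluates the residual sum-of-squares and the approximate error degrees of freedom. The data, the B-spline basis with a difference penalty, and the value of `lambda` are all assumptions made for the illustration; they are not the smoother used elsewhere in this chapter.

```r
library(splines)  # distributed with base R

set.seed(1)
n <- 100
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)   # simulated data (assumption)

# A simple penalised B-spline smoother: S = B (B'B + lambda D'D)^{-1} B'
B      <- bs(x, df = 20, intercept = TRUE)
D      <- diff(diag(ncol(B)), differences = 2)  # second-order difference penalty
lambda <- 1                                     # arbitrary smoothing parameter
S      <- B %*% solve(crossprod(B) + lambda * crossprod(D), t(B))

I_S <- diag(n) - S

# RSS computed directly from the fitted values and as a quadratic form
rss_direct    <- sum((y - S %*% y)^2)
rss_quadratic <- drop(t(y) %*% t(I_S) %*% I_S %*% y)

# Approximate degrees of freedom for error: tr{(I - S)^T (I - S)}
df_error <- sum(diag(t(I_S) %*% I_S))
```

The two versions of the residual sum-of-squares agree up to rounding error, and `df_error` lies between 0 and `n`, decreasing as `lambda` is reduced and the fit becomes more flexible.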

Writing \(\mbox{RSS}_1\), \(\mbox{df}_1\) for the more flexible of two models and \(\mbox{RSS}_2\), \(\mbox{df}_2\) for the simpler one, the comparison of the two models can be expressed quantitatively in \[ F = \frac{(\mbox{RSS}_2-\mbox{RSS}_1) / (\mbox{df}_2-\mbox{df}_1)} {\mbox{RSS}_1 / \mbox{df}_1} , \label{eq:add_F} \] by analogy with the \(F\)-statistic used to compare linear models. Unfortunately, this analogy does not extend to distributional calculations and no general expression for the distribution of the \(F\)-statistic is available. However, Hastie & Tibshirani (1990; sections 3.9 and 6.8) suggest that at least some approximate guidance can be given by referring the observed nonparametric \(F\)-statistic to an \(F\) distribution with \((\mbox{df}_2-\mbox{df}_1)\) and \(\mbox{df}_1\) degrees of freedom.
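The calculation can be sketched in a few lines of R. The function name `approx_f_test` and the numbers passed to it are hypothetical, chosen only to show the arithmetic; the p-value it returns is the approximate guidance described above, not an exact result.

```r
# Approximate F comparison of two smooth models.
# rss1, df1: residual sum-of-squares and error df of the more flexible model
# rss2, df2: the same quantities for the simpler model
approx_f_test <- function(rss1, df1, rss2, df2) {
  f_stat <- ((rss2 - rss1) / (df2 - df1)) / (rss1 / df1)
  # Refer to an F distribution with (df2 - df1) and df1 degrees of freedom
  p_val  <- pf(f_stat, df2 - df1, df1, lower.tail = FALSE)
  list(F = f_stat, p.value = p_val)
}

# Illustrative (made-up) values, not taken from any fitted model
res <- approx_f_test(rss1 = 8, df1 = 90, rss2 = 10, df2 = 95)
res$F        # (2 / 5) / (8 / 90) = 4.5
res$p.value
```

Because the error degrees of freedom of a smoother are generally not integers, it is convenient that `pf` accepts non-integer degrees of freedom.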

A different approach is to examine whether particular groups of coefficients in the regression spline, for example those associated with the building blocks for a particular term in the additive model, might all be zero. The details of this are comprehensively discussed in Wood (2017) and this is the approach implemented in mgcv.
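The idea of testing a group of coefficients can be illustrated with a deliberately simplified sketch: an unpenalised regression spline fitted by `lm`, with a Wald-type statistic for the block of coefficients belonging to one term. This is an assumption-laden stand-in, not the procedure in mgcv, which has to allow for the effect of penalisation on the coefficients and their reference degrees of freedom. The data and variable names are simulated and hypothetical.

```r
library(splines)  # distributed with base R

set.seed(2)
n  <- 200
x1 <- runif(n)
x2 <- runif(n)
y  <- sin(2 * pi * x1) + rnorm(n, sd = 0.3)   # x2 has no true effect

# Additive model built from unpenalised B-spline bases
fit <- lm(y ~ bs(x1, df = 5) + bs(x2, df = 5))

# Coefficients (and their covariance matrix) for the x2 term only
idx <- grep("x2", names(coef(fit)))
b   <- coef(fit)[idx]
V   <- vcov(fit)[idx, idx]

# Wald-type statistic for H0: all x2 coefficients are zero
wald_F <- drop(t(b) %*% solve(V, b)) / length(b)
p_val  <- pf(wald_F, length(b), df.residual(fit), lower.tail = FALSE)
```

In this unpenalised setting the statistic coincides with the usual \(F\)-test comparing the models with and without the `x2` term, which gives a useful check on the calculation.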

The reef data provide a simple illustration of how model comparisons may be made. The table below indicates that both Latitude and Longitude show significant effects on the catch score.

summary(trawl.model)$s.table
##                   edf   Ref.df         F     p-value
## s(Latitude)  1.000000 1.000000  9.648493 0.003879748
## s(Longitude) 7.023069 8.064042 26.768506 0.000000000

Is the additive model sufficient or do we need an interaction term (which would simply create a smooth surface over Latitude and Longitude simultaneously)? We can examine that by adding a ti() term, which represents the interaction separately from the two main effects. The evidence for its presence is not convincing: the p-value for the interaction term is approximately 0.37.

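# Select the observations from Year 0 and Zone 1
ind          <- (trawl$Year == 0 & trawl$Zone == 1)
# Main-effect smooths plus a tensor product interaction term
trawl.model1 <- gam(Score1 ~ s(Latitude) + s(Longitude) + ti(Latitude, Longitude),
                    data = trawl, subset = ind)
# Approximate significance of each smooth term
summary(trawl.model1)$s.table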
##                             edf   Ref.df         F   p-value
## s(Latitude)            1.000000 1.000000  6.213004 0.0196678
## s(Longitude)           7.799202 8.444384 23.363779 0.0000000
## ti(Latitude,Longitude) 6.972077 8.728649  1.154443 0.3665392
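The same comparison can also be framed as a test between two nested fits. The sketch below uses simulated data with hypothetical variables `x`, `z` and `y`, constructed with no true interaction, rather than the reef data. It fits the additive model with and without a `ti()` term and compares them with `anova()`, whose p-values for `gam` fits are approximate in the same sense as the \(F\) comparison described earlier.

```r
library(mgcv)

set.seed(3)
n   <- 300
dat <- data.frame(x = runif(n), z = runif(n))
# Additive truth with no interaction (an assumption of this illustration)
dat$y <- sin(2 * pi * dat$x) + (dat$z - 0.5)^2 + rnorm(n, sd = 0.3)

m0 <- gam(y ~ s(x) + s(z), data = dat)              # additive model
m1 <- gam(y ~ s(x) + s(z) + ti(x, z), data = dat)   # adds the interaction

# Term-by-term approximate tests, as in the tables above
tab <- summary(m1)$s.table

# Approximate F comparison of the two nested fits
cmp <- anova(m0, m1, test = "F")
```

Inspecting `tab` and `cmp` gives two views of the same question. When the estimated interaction is shrunk close to zero, the difference in degrees of freedom between the fits can be very small, so the term-by-term table is usually the more stable guide.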