AMAP assessment methodology

Assessment methodology

Modelling mercury time series

Time series of mercury concentrations are assessed by first log transforming the concentration measurements and then fitting a mixed model of the form:

response: log concentration
fixed: f(year)
random: year + sample + analytical

The fixed effects model describes the systematic changes in log concentrations over time (year). The form of f(year) depends on the number of years of data. With 5-6 years of data, log concentrations are assumed to change linearly over time:

\(\text{f(year)} = \mu + \beta\text{year}\)

With 7+ years of data, more complex patterns of change are considered and the log concentrations are allowed to change smoothly over time:

\(\text{f(year) = s(year)}\)

with the amount of smoothing determined by the data (see later). Only time series with 5+ years of data are considered.

The random effects model describes the sources of random variation in the log concentrations. It has three components:

year: random variation in log concentration between years. Here, year is treated as a categorical variable
sample: random variation in log concentration between samples within years. When there is only one sample each year, this term is omitted and implicitly subsumed into the between-year variation
analytical: random variation inherent in the chemical measurement process. This is assumed known and derived from the the ‘uncertainties’ reported with the data. Specifically, if \(u_i\), \(i = 1...n\), are the uncertainties associated with concentrations \(c_i\) (expressed as the standard deviations of the concentration measurements), then the standard deviations of the log concentration measurements \(\text{log } c_i\) are taken to be \(u_i/c_i\). Measurements with \(u_i > c_i\) (i.e. an analytical coefficient of variation of more than 100%) are omitted from the time series.

The linear mixed model is fitted by maximum likelihood assuming each of the random effects are independent and normally distributed (on the log concentration scale)¹.

Log concentrations are allowed to change smoothly over time when there are 7+ years of data. The amount of smoothing is chosen by fitting a series of models with different degrees of freedom (df) and selecting the model with the lowest Akaike’s Information Criterion corrected for small sample size (AICc)². Effectively, the data determine the amount of smoothing, with AICc providing an appropriate balance between model fit and model parsimony³. With 7-9 years of data, a linear model (a special case of a smoother with 1 df) and a smoother (thin plate regression spline) on 2 df are fitted, with the selected model that with the lower AICc. With 10-14 years of data, a linear model and smoothers on 2 and 3 df are fitted and compared. With 15+ years of data, a linear model and smoothers on 2, 3, and 4 df are fitted and compared.

Modifications to the methodology are required when there are ‘less-than’ measurements, i.e. concentrations reported as below the detection limit, and missing uncertainties, i.e. the analytical variability associated with some of the concentration measurements was not reported.

Assessing temporal trends

Temporal trends are assessed using the selected mixed model. When the model is linear (i.e. when there are 5-6 years of data, or if there are 7+ years of data and no evidence of nonlinearity), the statistical significance of the temporal trend is obtained from a likelihood ratio test⁴ that compares the fit of the linear model

\(\text{f(year)} = \mu + \beta\text{year}\)

with the fit of a model in which there is no trend

\(\text{f(year)} = \mu\)

The summary map shows a downward or upward trend if the trend is significant at the 5% significance level.

When a smooth model has been selected, a plot of the fitted model is needed to understand the overall pattern of change. (This is available in the Assessment plot or Trend with data tabs in the html report file obtained by clicking on the time series in the summary map.) The summary map focusses on just one aspect of the change over time: the change in concentration in the most recent twenty monitoring years; i.e. between 1999 and 2018 (the assessment only includes data up to 2018). For this, the fitted value of the smoother in 2018 is compared to the fitted value in 1999 using a t-test⁵, with significance assessed at the 5% level. The correlation between the two fitted values is accounted for by the t-test. If the time series does not extend to 2018, then the fitted value in the last monitoring year is used instead. Similarly, if the time series starts after 1999, the fitted value in the first monitoring year is used.

Post-hoc power calculations

The performance of each time series was characterised by its power to detect a log-linear trend in concentration. Several metrics were calculated, all based on two-sided tests at the 5% significance level.

the percentage annual increase in concentration detectable by the monitoring programme with 80% power
as above, but assuming any years with no data had also been sampled
the percentage annual increase in concentration detectable with 80% power given 10 years of annual monitoring and variability typical of the time series
the number of years of annual monitoring with variability typical of the time series required to detect an annual 5% increase in concentration with 80% power
the power of the monitoring programme to detect an annual 5% increase in concentration
as above, but assuming any years with no data had also been sampled
the power to detect an annual 5% increase in concentration given 10 years of annual monitoring and variability typical of the time series.

Note that the power to detect an annual 5% increase in concentration is equivalent to the power to detect an annual 4.8% decrease in concentration. The asymmetry arises because of the log transformation.

¹ Such models cannot be readily fitted in the R statistical environment becuase the \(\text{analytical}\) variance is assumed know. Instead, the likelihood is maximised directly using the optim function. Ideally, the models should be fitted by restricted maximum likelihood (apart from when being used for likelihood ratio tests), but this has not been implemented yet. ↩

² AICc is a model selection criterion that gives greater protection against overfitting than AIC when the sample size is small. For contaminant time series, small sample sizes correspond to few years of data. AICc is not formally defined for mixed models, but the usual definition is adapted to give a sensible criterion for the models considered here. The usual definition of AICc is

\[\text{AICc = - 2 log likelihood} + 2 k n / (n - k - 1)\]

where \(n\) is the sample size and \(k\) is the number of parameters in the model. For a contaminant time series, the natural definition of the sample size is the number of years of data, \(N\), say. The number of parameters in the number of fixed effects parameters, \(k_\text{fixed}\), plus the number of (unknown) variance parameters, \(k_\text{random}\). For example, the linear model has \(k_\text{fixed}\) = 2 and \(k_\text{random}\) = 2 (or 1 if the sample variance component is subsumed into the year variance component). This suggests using

\[\text{AICc = - 2 log likelihood} + 2 (k_\text{fixed} + k_\text{random}) N / (N - k_\text{fixed} - k_\text{random} - 1)\]

However, the denominator now overly penalises models because the ‘sample size’ is the number of years and, whilst subtracting \(k_\text{random}\) correctly corrects for the year variance component, it also corrects for the sample variance component which measures within-year variation. (Indeed, the denominator = 0 if \(N\) = 5 and the linear model is fitted.) It therefore makes sense to take \(k_\text{random}\) in the denominator to be 1, corresponding to the year variance component, giving

\[\text{AICc = - 2 log likelihood} + 2 (k_\text{fixed} + k_\text{random}) N / (N - k_\text{fixed} - 2)\]

The denominator is now analogous to that used in a linear model with a single normally distributed error term.↩

³ Methods for estimating the smoothing degrees of freedom as part of the fitting process, for example by treating the amount of smoothing as an extra variance component, are available for several classes of models. However, such methods are not implemented in R for the case when the residual variance (the \(\text{analytical}\) variance) is known. This is a topic for future development. ↩

⁴ These tests have a type 1 error that is larger than the nominal value. For example, tests conducted at the 5% significance level will find ‘significant’ trends in more than 5% of time series, even when there are no trends. Using the standard error of the estimate of \(\beta\) from a restricted maximum likelihood fit of the linear model would be one way to improve the situation. Better still would be to use the Kenward Roger modification of F tests for linear mixed models (Kenward MG & Roger JH, 1997; Small Sample Inference for Fixed Effects from Restricted Maximum Likelihood, Biometrics 53: 983-997). ↩

⁵ The test is based on a t-distribution with \(N\) - \(k_\text{fixed}\) - 1 degrees of freedom.
↩