AMAP: dealing with less-thans

Overview

Measurements reported as below the detection limit are often known as ‘less-than’ measurements. Less-thans are examples of left-censored data. If there are not too many less-thans, the same contaminant time series models can still be fitted, provided the likelihood is adjusted accordingly. Further refinements are required to prevent over-fitting if there are many less-thans or if the less-thans are unevenly distributed across the time series. When most of the data are less-thans, trend detection is not feasible.

Adjustments to the likelihood

The likelihood when some measurements are less-thans is straightforward when there is only one measurement each year, because the measurements are then (assumed to be) statistically independent. Let \(y_i\) be the logarithm of the reported concentration in year \(t_i\), \(i = 1, \ldots, N\) (for a less-than, the logarithm of the reported limit), and let \(A\) be {\(i\): \(y_i\) is a non-censored measurement} and \(\bar A\) be {\(i\): \(y_i\) is a less-than}. The likelihood of the data is then:

\[\prod_{i \in A} \frac 1 {\omega_i} \phi \left( \frac {y_i - \text f(t_i)} {\omega _i} \right) \prod_{i \in \bar A} \Phi \left( \frac {y_i - \text f(t_i)} {\omega _i} \right) \]

where \(\phi\) is the density function of a standard normal distribution (with zero mean and unit variance), \(\Phi\) is the corresponding cumulative distribution function, \(\text f(t_i)\) is the expected value of \(y_i\), and \(\omega _i\) is the standard deviation of \(y_i\) given by:

\[\omega _i^2 = \sigma_\text{year}^2 + \sigma_\text{sample}^2 + \sigma_{\text{analytical},i}^2\]

where \(\sigma_\text{year}\), \(\sigma_\text{sample}\), \(\sigma_{\text{analytical},i}\) are the between-year, between-sample and analytical standard deviations respectively. Note that the analytical standard deviations are measurement specific and are based on the uncertainties reported with the data.
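As a minimal sketch, the one-measurement-per-year log-likelihood can be written in Python using only the standard library. The function and argument names are illustrative assumptions, not part of any AMAP software, and for a less-than the value passed in `y` is assumed to be the logarithm of the reported limit.

```python
import math


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def log_lik_one_per_year(y, censored, f, sd_year, sd_sample, sd_analytical):
    """Log-likelihood for a series with one measurement per year.

    y             -- log concentrations y_i (log reported limit for less-thans)
    censored      -- True where y_i is a less-than
    f             -- expected values f(t_i)
    sd_analytical -- measurement-specific analytical standard deviations
    """
    total = 0.0
    for y_i, cens, f_i, sd_a in zip(y, censored, f, sd_analytical):
        # omega_i^2 = sigma_year^2 + sigma_sample^2 + sigma_analytical,i^2
        omega = math.sqrt(sd_year**2 + sd_sample**2 + sd_a**2)
        u = (y_i - f_i) / omega
        if cens:
            total += math.log(norm_cdf(u))          # P(Y_i <= y_i)
        else:
            total += math.log(norm_pdf(u) / omega)  # density of y_i
    return total
```

For less-thans far below the fitted curve, `norm_cdf` can underflow to zero; a dedicated log-CDF routine would be numerically safer in practice.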

The likelihood is more complicated when there are several measurements in a year, because measurements in the same year share a common between-year effect and are therefore dependent. Extending the previous notation, let \(y_{ij}\) be the logarithm of the \(j\)th reported concentration in year \(t_i\), and let \(A_i\) be {\(j\): \(y_{ij}\) is a non-censored measurement} and \(\bar A_i\) be {\(j\): \(y_{ij}\) is a less-than}. Then the likelihood of the data is

\[\prod_i \int_{-\infty}^\infty \frac 1 {\sigma_\text{year}} \phi \left( \frac {z - \text f(t_i)} {\sigma _\text{year}} \right) \prod_{j \in A_i} \frac 1 {\omega_{ij}} \phi \left( \frac {y_{ij} - z} {\omega_{ij}} \right) \prod_{j \in \bar A_i} \Phi \left( \frac {y_{ij} - z} {\omega_{ij}} \right) \text dz\]

where \(\omega _{ij}\) is the within-year standard deviation of \(y_{ij}\) given by:

\[\omega _{ij}^2 = \sigma_\text{sample}^2 + \sigma_{\text{analytical},ij}^2\]
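The integral over the year effect \(z\) generally has no closed form. The sketch below approximates it with the trapezoidal rule on a fixed grid; this quadrature choice, the grid settings and all names are illustrative assumptions rather than the method used in the AMAP assessments. The density of \(z\) is normalised by \(1/\sigma_\text{year}\) so that it integrates to one, and the log-likelihood of the whole series would be the sum over years of the logarithms of these contributions.

```python
import math


def norm_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal cumulative distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def year_likelihood(y, censored, f_t, sd_year, omega, n_grid=2001, width=10.0):
    """Likelihood contribution of one year with several measurements.

    y        -- log concentrations y_ij in the year (log limit for less-thans)
    censored -- True where y_ij is a less-than
    f_t      -- expected value f(t_i) of the year effect
    sd_year  -- between-year standard deviation
    omega    -- within-year standard deviations omega_ij
    The integral over the year effect z is approximated with the trapezoidal
    rule on [f_t - width * sd_year, f_t + width * sd_year].
    """
    lower = f_t - width * sd_year
    h = 2.0 * width * sd_year / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        z = lower + k * h
        # Normal density of the year effect z, including the 1/sd_year factor
        g = norm_pdf((z - f_t) / sd_year) / sd_year
        for y_ij, cens, om in zip(y, censored, omega):
            u = (y_ij - z) / om
            # Less-than: probability below the limit; otherwise: density
            g *= norm_cdf(u) if cens else norm_pdf(u) / om
        # Trapezoidal weights: half weight at the two end points
        total += (0.5 if k in (0, n_grid - 1) else 1.0) * g
    return total * h
```

A useful check: with a single measurement in a year, the contribution reduces to the one-measurement-per-year formula with \(\omega_i^2 = \sigma_\text{year}^2 + \omega_{ij}^2\).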

Refinements

Less-than measurements contain less information about changes in concentration over time than non-censored measurements. Therefore, the form of \(\text{f}(t)\) fitted to the data is based on \(N_+\), the number of years of data with at least one non-censored measurement, rather than \(N\), the total number of years of data. Specifically:

\(N_+ \leq 4\): no model is fitted and the time series is not considered further
\(5 \leq N_+ \leq 6\): linear model \(\text f(t) = \mu + \beta t\)
\(N_+ \geq 7\): smooth model \(\text f(t) = \text s(t)\); smoothers on 2 degrees of freedom (df) are considered when \(7 \leq N_+ \leq 9\), smoothers on 2 and 3 df when \(10 \leq N_+ \leq 14\), and smoothers on 2, 3 and 4 df when \(N_+ \geq 15\).
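The selection rules above amount to a small lookup on \(N_+\). This is a sketch; the function names and the data representation (a mapping from year to a list of flags, `True` for a less-than) are assumptions made for illustration.

```python
def count_n_plus(data):
    """N_+: number of years with at least one non-censored measurement.

    data -- hypothetical mapping {year: [is_less_than, ...]}
    """
    return sum(1 for flags in data.values() if not all(flags))


def candidate_models(n_plus):
    """Candidate forms of f(t) for a given N_+; None means no model is fitted."""
    if n_plus <= 4:
        return None                      # series not considered further
    if n_plus <= 6:
        return ["linear"]                # f(t) = mu + beta * t
    if n_plus <= 9:
        return ["smooth, 2 df"]
    if n_plus <= 14:
        return ["smooth, 2 df", "smooth, 3 df"]
    return ["smooth, 2 df", "smooth, 3 df", "smooth, 4 df"]
```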

For consistency, \(N_+\) is also used instead of \(N\) in the calculation of AICc and residual degrees of freedom.
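For reference, the standard small-sample AICc with the sample size set to \(N_+\) is sketched below. Which parameters `k` should count (for example, whether variance components are included) is not specified here and is left as an assumption, as is taking the residual degrees of freedom to be \(N_+\) minus the degrees of freedom of the fitted \(\text f(t)\).

```python
def aicc(log_lik, k, n_plus):
    """AICc with the sample size taken to be N_+.

    log_lik -- maximised log-likelihood
    k       -- number of estimated parameters (what to count is an assumption)
    n_plus  -- number of years with at least one non-censored measurement
    """
    if n_plus - k - 1 <= 0:
        raise ValueError("AICc is undefined unless N_+ > k + 1")
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n_plus - k - 1)


def residual_df(n_plus, model_df):
    """Residual degrees of freedom, assumed to be N_+ minus the df of f(t)."""
    return n_plus - model_df
```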

When \(N_+\) is relatively small compared to \(N\), the model fits can become environmentally implausible, particularly if there are changes in the limit of detection over time, or if a linear or smooth model is fitted and the years at the start and end of the time series only have less-than measurements. To protect against this behaviour, three additional constraints are placed on the time series.

The time series is truncated from the left (i.e. early years are omitted) until \(N_+ \geq N/2\). For example, if there are twenty years of data (each with a single measurement) and the measurements in years 2, 11, 13, 16, 17, and 19 are non-censored, then the time series assessed comprises the data from years 11 through 20.
The first year of the time series is taken to be the first year with a non-censored measurement (i.e. all earlier years, which only contain less-thans, are omitted). For example, if there are ten years of data and the measurements in years 3, 4, 6, 8, 9, and 10 are non-censored, then the time series assessed comprises the data from years 3 through 10.
If the measurements in the most recent year(s) of the time series are all less-thans, then the expected concentration in the most recent year(s) is assumed to be constant. Specifically, if \(t_\text{last}\) is the last year with a non-censored measurement, then \(\text f(t)\) is adjusted to:

\[\text f(t) = \begin{cases} \mu + \beta t, & \text{if } t < t_\text{last} \\ \mu + \beta t_\text{last}, & \text{if } t \geq t_\text{last} \end{cases}\]

for the linear model and similarly for the smooth model.
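The three constraints can be sketched as follows, assuming the series is given as the sorted years with data together with a flag marking years that have at least one non-censored measurement. All names are hypothetical; the example reproduces the twenty-year case from the first constraint.

```python
def truncate_series(years, has_noncensored):
    """Apply the first two constraints and return the retained years."""
    years = list(years)
    flags = list(has_noncensored)
    n = len(years)
    start = 0
    # Constraint 1: omit early years until N_+ >= N / 2
    while start < n and 2 * sum(flags[start:]) < n - start:
        start += 1
    # Constraint 2: begin at the first year with a non-censored measurement
    while start < n and not flags[start]:
        start += 1
    return years[start:]


def flatten_after_last(f, t_last):
    """Constraint 3: hold f(t) constant at f(t_last) for t >= t_last."""
    def f_adjusted(t):
        return f(t) if t < t_last else f(t_last)
    return f_adjusted


def linear(t):
    """Hypothetical fitted linear model mu + beta * t."""
    return 1.0 + 0.5 * t


# Twenty single-measurement years; non-censored in years 2, 11, 13, 16, 17, 19
years = list(range(1, 21))
flags = [y in {2, 11, 13, 16, 17, 19} for y in years]
kept = truncate_series(years, flags)                    # years 11 through 20
t_last = max(y for y, ok in zip(years, flags) if ok)    # 19
f = flatten_after_last(linear, t_last)                  # f(20) equals f(19)
```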
