Detecting differences in density estimates

Author

Centre for Research into Ecological and Environmental Modelling
University of St Andrews

Published

March 22, 2024

Demonstration

Comparing two density estimates

Estimating difference in density

Perhaps your study intends not only to estimate density in two strata of your study area, as is the case with the minke whale survey in the Southern Ocean; but also to make inference regarding the magnitude of the difference in densities. That inference could take the form of a significance test or inference based upon an estimated difference and its confidence interval.

Aside

When I use the term stratum, I could be referring to two geographic strata in your study area. More broadly, strata in an analysis could be two different years of surveys of the same study area, or two species within a study area in a single survey. Depending upon how strata are defined, inferences can be made about differences across space, across time, or between species.

I have written a function that computes test statistics and confidence intervals for the difference in estimated densities derived from distance sampling surveys. This function is not part of the Distance package, but is provided in the files associated with Practical 7. As explained in the Appendix below, the nature of the computation depends upon whether the two estimates being compared share a common detection function. As you can imagine, if the estimates share a common detection function, then there is a dependency between the estimates which must be taken into account.

As an example of the comparison of two density estimates, you can perform the calculations upon the minke whale survey data. Notice I have modified the survey data to add observed size of each minke whale group, such that the data to analyse for the density contrast is a superset of the data you analysed in Practical 7 (all else is the same, I’ve simply added a column size for each detection). Hence, the code snippet below begins by reading the data from a CSV file I also include in your RStudio/Cloud workspace for Practical 7.

First, detection functions are fitted to the data set. In the same way as in Practical 7, we fit a pooled detection function, the result placed in the object hazard.pooled. We also split the data by Region.Label and fit separate detection functions to each stratum, resulting in two additional fitted objects: ideal.hr and marginal.hr.

library(Distance)
whales <- read.csv("https://raw.githubusercontent.com/erex/Oct-Quarto/main/Pr7/stratify_examp.csv")
hazard.pooled <- ds(data=whales, key="hr", truncation = 1.5)
ideal.hr <- ds(data=whales[whales$Region.Label=="Ideal",], key="hr", truncation=1.5)
marginal.hr <- ds(data=whales[whales$Region.Label=="Marginal",], key="hr", truncation=1.5)

Code below sources the new function from the file density.difference.ds.r and makes two calls to the function density.difference.ds

source("density.difference.ds.r")
density.difference.ds(hazard.pooled)

  n.detect  cv.ER line.length n.transects group.size group.size.cv nparm.detect
1       39 0.2323         484          13     2.1538        0.1397            2
2       49 0.3724        1370          12     2.3265        0.2303            2
   D.hat D.hat.cv     f0   f0cv
1 0.0929   0.2916 1.0711 0.1073
2 0.0446   0.4508 1.0711 0.1073
  D1.minus.D2 SE.difference t.statistic df.t.stat P.value     LCB    UCB
1      0.0484        0.0402      1.2048   49.2919   0.234 -0.0323 0.1291

density.difference.ds(ideal.hr, marginal.hr)

  n.detect  cv.ER line.length n.transects group.size group.size.cv nparm.detect
1       39 0.2323         484          13     2.1538        0.1397            2
2       49 0.3724        1370          12     2.3265        0.2303            2
   D.hat D.hat.cv     f0   f0cv
1 0.1167   0.3182 1.3450 0.1338
2 0.0365   0.4451 0.8781 0.1315
  D1.minus.D2 SE.difference t.statistic df.t.stat P.value     LCB    UCB
1      0.0802        0.0405      1.9779   27.6636   0.058 -0.0029 0.1633

Output interpretation

The first two lines of output echo much of the input information: number of detections, encounter rate CV, line length, number of transects, average group size, CV of average groups size, number of parameters in the detection function. Also echoed are the stratum-specific density estimates and their CV along with the estimate of the \(f(0)\) parameter of the detection function and its CV.

The last line of output shows the difference of estimated densities and its standard error. Following this is the test statistic which is distributed as a t-statistic (Equation 2) and its associated degrees of freedom (Equation 3) and an associated significance level for a two-tailed test. The final pair of values are the bounds of the confidence interval on the estimated difference computed from Equation 5.

Note: When a pooled detection function is used, the difference is relatively small (~0.05) and non-significant. When separate detection functions are used, the magnitude of the estimated difference increases (~0.08), with a standard error of roughly the same magnitude. Quite a few degrees of freedom are lost by having to estimate two additional parameters from separate detection functions. The total width of the confidence interval changes little (0.1662 for separate detection functions versus 0.1614 for pooled detection function). However because of the shift in the estimated difference from 0.0484 to 0.0802, the significance level shifts from 0.234 to 0.058.

Lesson for survey design: The uncertainty in all of the density estimates come from encounter rate variability. If the purpose of the study was to demonstrate a difference in density between these strata, a better survey design, with more than 13 and 12 transects would have been required to produce better estimates of stratum-specific encounter rate uncertainty.

Footnote

What if I used stratum as a covariate in the detection function?

At the time Buckland et al. (2001) was published, the use of covariates in the detection function was in early stages of development; hence that case was not considered when testing for differences between density estimates were described. Consequently, the appendix below does not address this situation.

But code exists for addressing assessing differences in density between strata when stratum is a covariate in the detection function. The code along with two examples of its use, are presented in our case study website.

Appendix: equations describing computation of estimated density differences

This is Section 3.6.5 of Buckland et al. (2001)

Frequently, we wish to draw inference on change in density over time, or difference in density between habitats. We consider here simple comparisons between two density estimates.

Consider two density estimates, \(\hat{D}_{1}\) and \(\hat{D}_{2}.\) Suppose first that they are independently estimated. We then estimate the difference in density by \(\hat{D}_{1}-\hat{D}_{2}\) with variance \[ \widehat{\operatorname{var}}\left(\hat{D}_{1}-\hat{D}_{2}\right)=\widehat{\operatorname{var}}\left(\hat{D}_{1}\right)+\widehat{\operatorname{var}}\left(\hat{D}_{2}\right) \tag{1}\] Distance provides approximate degrees of freedom \(\mathrm{df}_{1}\) for \(\hat{D}_{1}\) and \(\mathrm{df}_{2}\) for \(\hat{D}_{2},\) based on Satterthwaite’s approximation. We can use these to obtain an approximate \(t\)-statistic:
\[ T=\frac{\left(\hat{D}_{1}-\hat{D}_{2}\right)-\left(D_{1}-D_{2}\right)}{\sqrt{\widehat{\operatorname{var}}\left(\hat{D}_{1}-\hat{D}_{2}\right)}} \sim t_{\mathrm{df}} \tag{2}\]

where \[ \mathrm{d} \mathrm{f} \simeq \frac{\left\{\widehat{v a r}\left(\hat{D}_{1}\right)+\widehat{v a r}\left(\hat{D}_{2}\right)\right\}^{2}}{\left\{\widehat{v a r}\left(\hat{D}_{1}\right)\right\}^{2} / \mathrm{df}_{1}+\left\{\widehat{v a r}\left(\hat{D}_{2}\right)\right\}^{2} / \mathrm{df}_{2}} \tag{3}\]

Provided df are around 30 or more, the simpler \(z\)-statistic provides a good approximation: \[ Z=\frac{\left(\hat{D}_{1}-\hat{D}_{2}\right)-\left(D_{1}-D_{2}\right)}{\sqrt{\widehat{\operatorname{var}}\left(\hat{D}_{1}-\hat{D}_{2}\right)}} \sim N(0,1) \tag{4}\] For either statistic, we can test the null hypothesis \(H_{0}: D_{1}=D_{2}\) by substituting \(D_{1}-D_{2}=0\) in Equation 2 or Equation 4, and looking at the resulting value in \(t\)-tables or \(z\)-tables. Approximate \(100(1-2 \alpha) \%\) confidence limits for \(\left(D_{1}-D_{2}\right)\) are given by \[ \left(\hat{D}_{1}-\hat{D}_{2}\right) \pm t_{\mathrm{df}}(\alpha) \sqrt{\widehat{\operatorname{var}}\left(\hat{D}_{1}-\hat{D}_{2}\right)} \tag{5}\]

for df \(<30,\) or \[ \left(\hat{D}_{1}-\hat{D}_{2}\right) \pm z(\alpha) \sqrt{\widehat{\operatorname{var}}\left(\hat{D}_{1}-\hat{D}_{2}\right)} \tag{6}\] otherwise. Often, a single detection function is fitted to pooled data, so that \[ \hat{D}_{1}=\frac{n_{1} \hat{f}(0) \hat{E}_{1}(s)}{2 L_{1}} \quad \text { and } \quad \hat{D}_{2}=\frac{n_{2} \hat{f}(0) \hat{E}_{2}(s)}{2 L_{2}} \tag{7}\] (where the terms \(\hat{E}_{i}(s)\) are omitted if the objects are not in clusters). Because \(\hat{f}(0)\) appears in both equations, we can no longer assume that \(\hat{D}_{1}\) and \(\hat{D}_{2}\) are independent. Instead, we can write \[ \hat{D}_{i}=\hat{M}_{i} \hat{f}(0), \quad i=1,2 \tag{8}\] where \[ \hat{M}_{i}=\frac{n_{i} \hat{E}_{i}(s)}{2 L_{i}} \tag{9}\] As the \(M_{i}\) are independently estimated, and are assumed to be independent of \(\hat{f}(0),\) we can now find the variance of \(\hat{D}_{1}-\hat{D}_{2}\) using the delta method: \[ \hat{D}_{1}-\hat{D}_{2}=\left(\hat{M}_{1}-\hat{M}_{2}\right) \hat{f}(0) \tag{10}\] so that \[ \begin{aligned} \widehat{\operatorname{var}}\left(\hat{D}_{1}-\hat{D}_{2}\right) &=\left(\hat{D}_{1}-\hat{D}_{2}\right)^{2}\left[\frac{\widehat{\operatorname{var}}\left(\hat{M}_{1}-\hat{M}_{2}\right)}{\left(\hat{M}_{1}-\hat{M}_{2}\right)^{2}}+\frac{\widehat{\operatorname{var}}\{\hat{f}(0)\}}{\{\hat{f}(0)\}^{2}}\right] \\ &=\{\hat{f}(0)\}^{2} \widehat{\operatorname{var}}\left(\hat{M}_{1}-\hat{M}_{2}\right)+\left(\hat{M}_{1}-\hat{M}_{2}\right)^{2} \widehat{var}\{\hat{f}(0)\} \end{aligned} \tag{11}\] where \[ \widehat{\operatorname{var}}\left(\hat{M}_{1}-\hat{M}_{2}\right)=\widehat{\operatorname{var}}\left(\hat{M}_{1}\right)+\widehat{\operatorname{var}}\left(\hat{M}_{2}\right) \tag{12}\] and \[ \widehat{\operatorname{var}}\left(\hat{M}_{i}\right)=\hat{M}_{i}^{2}\left[\frac{\widehat{\operatorname{var}}\left(n_{i}\right)}{n_{i}^{2}}+\frac{\widehat{\operatorname{var}}\left\{\hat{E}_{i}(s)\right\}}{\left\{\hat{E}_{i}(s)\right\}^{2}}\right], \quad i=1,2 \tag{13}\] Note that the second form of Equation 11 still applies when \(\left(\hat{M}_{1}-\hat{M}_{2}\right)=0,\) whereas the first form leads to a ratio of zero over zero. Inference can now proceed as before, either with additional applications of Satterthwaite’s approximation in conjunction with Equation 2 if an approximate \(t\)-statistic is required, or more usually, by straightforward application of Equation 4.

References

Buckland, S. T., Anderson, D. R., Burnham, K. P., Laake, J. L., Borchers, D. L., & Thomas, L. (2001). Introduction to distance sampling: Estimating abundance of biological populations. Oxford, New York: Oxford University Press.