Size bias—how large is the problem?

Author

Centre for Research into Ecological and Environmental Modelling
University of St Andrews

Published

October 16, 2024

Supplement

Correcting size bias via size as a covariate. When does it matter?

Size bias in distance sampling surveys

As shown in the lecture, if detectability is a function not only of distance but also of group size (big groups are easier to see than small groups), then groups in the sample are likely to be larger than groups in the entire population. Consequently, when the density of groups is scaled up to the density of individuals using

\[\hat{D}_{indiv} = \hat{D}_{groups} \times \overline{size}_{group},\]

\(\hat{D}_{indiv}\) is overestimated.
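
One way to see why the sample mean is too large is to write the expected size of a detected group in terms of the detection probability. The notation here is introduced only for illustration and is not from the lecture: \(s\) is group size, \(f(s)\) is its distribution in the population, and \(P(s)\) is the probability (averaged over distance) that a group of size \(s\) is detected.

\[E[s \mid \text{detected}] = \frac{\sum_s s\, f(s)\, P(s)}{\sum_s f(s)\, P(s)} = E[s] + \frac{\operatorname{Cov}\big(s, P(s)\big)}{E[P(s)]}\]

If \(P(s)\) increases with \(s\), the covariance is positive, so the expected size of detected groups exceeds the population mean \(E[s]\). The excess tends to grow as group size becomes more variable.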

A resolution to this problem is to explicitly model the probability of detection as a function of group size using size as a covariate in the detection function. I will demonstrate two applications: one where group size variability is small and one where group size variability is large. I will use simulation (where the answer is known) to demonstrate.

The necessary syntax to include covariates, group size in this instance, in the detection function is:

library(Distance)  # provides ds()
a.covariate <- ds(my.data, transect="line", key="hn", formula=~size)
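
As a rough sketch of how this call might be used, the block below fits the same half-normal model with and without the size covariate and compares the two fits. The data frame is a hypothetical stand-in for `my.data`, generated by a hand-rolled simulation in base R; the strip half-width, the detection parameters and the number of groups are all assumptions made for illustration and are not the settings used in this supplement.

```r
library(Distance)

set.seed(1)

# Hypothetical stand-in for my.data: one row per detected group
n.groups <- 2000
size     <- rpois(n.groups, lambda = 10)        # group sizes (assumed distribution)
distance <- runif(n.groups, 0, 100)             # perpendicular distances, half-width 100 (assumed)
sigma    <- exp(log(20) + 0.05 * size)          # assumed half-normal scale, increasing with size
seen     <- runif(n.groups) < exp(-distance^2 / (2 * sigma^2))
my.data  <- data.frame(distance = distance[seen], size = size[seen])

# Same key function, without and with the group size covariate
no.covariate <- ds(my.data, transect = "line", key = "hn", formula = ~1)
a.covariate  <- ds(my.data, transect = "line", key = "hn", formula = ~size)

AIC(no.covariate, a.covariate)   # lower AIC indicates the better-supported model
summary(a.covariate)             # includes the estimated coefficient for size
```

Because this data frame carries no survey region or effort information, the fits estimate only the detection function. Abundance estimates need the usual region, transect and effort columns as well.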

Example 1: Perhaps a terrestrial ungulate

Here animals occur in small herds. The distribution of herd size is Poisson with a mean herd size of 10.

Figure 1: Group size distribution for animals in small groups

You can see it is very rare for herds to exceed a size of twice the mean.

I’ll create a population with this distribution of herd size and a true number of herds of 200; hence the true number of individuals in the population is 2000.
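
The figures quoted for this population can be checked with a few lines of base R. This is only a sketch: the seed and variable names are arbitrary, and the code used to build the population for this supplement may differ.

```r
set.seed(2024)

n.herds   <- 200    # true number of herds (from the text)
mean.size <- 10     # Poisson mean herd size (from the text)

# Probability that a herd is larger than twice the mean size
p.large <- ppois(2 * mean.size, lambda = mean.size, lower.tail = FALSE)

# One realisation of the herd sizes; the expected total is n.herds * mean.size
herd.size <- rpois(n.herds, lambda = mean.size)

c(p.large = p.large, expected.N = n.herds * mean.size, realised.N = sum(herd.size))
```

The tail probability comes out well under 1%, in line with the remark that herds rarely exceed twice the mean. Any single realisation of 200 Poisson herd sizes will sum to a value near 2000, not exactly 2000.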

Do we have the tell-tale sign of size bias–missing small groups at large distances?

Perhaps small groups at large distances are missed; include group size in the detection function.
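
A simple numerical check for this tell-tale sign is to compare the mean size of detected groups across distance bands. The sketch below uses a hand-rolled simulation with assumed detection parameters and far more groups than the 200 herds in this example, purely so that the pattern is easy to see; it is not the simulation behind the figures here.

```r
set.seed(7)

# Hypothetical population (assumed values throughout)
n.groups <- 5000
size     <- rpois(n.groups, lambda = 10)
distance <- runif(n.groups, 0, 100)             # perpendicular distance, half-width 100

# Assumed detection model: half-normal with scale increasing with group size
sigma    <- exp(log(20) + 0.05 * size)
detected <- runif(n.groups) < exp(-distance^2 / (2 * sigma^2))
obs      <- data.frame(distance = distance[detected], size = size[detected])

# Mean detected group size in three distance bands; size bias shows up
# as a larger mean in the farthest band
band <- cut(obs$distance, breaks = c(0, 33, 67, 100), include.lowest = TRUE)
mean.by.band <- tapply(obs$size, band, mean)
mean.by.band
```

A scatterplot of `obs$size` against `obs$distance` gives the same information graphically: the lower right corner (small groups, large distances) is sparsely populated.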

Analysis including group size covariate

The distribution of computed average group size is centred on the true size of 10, and there was no problem with fitting a detection function. The estimated number of individuals, averaged over the simulations, was 2003.05, a bias of 0.2%.
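
The quoted bias appears to be percent relative bias, that is, the difference between the mean estimate and the truth expressed as a percentage of the truth. This definition is an assumption, although it reproduces the quoted figure:

\[\text{bias} = 100 \times \frac{\overline{\hat{N}}_{indiv} - N_{indiv}}{N_{indiv}} = 100 \times \frac{2003.05 - 2000}{2000} \approx 0.15\%,\]

which rounds to the 0.2% given above.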

Analysis without group size covariate

As a comparison, what happens if we don’t include size as a covariate in our detection function?

The distribution of computed average group sizes is shown above. We would expect an overestimate of mean group size because small groups at large distances are missing from our sample, but that effect is small in this instance. As a consequence, the average \(\hat{N}_{indiv}\) across all simulations is 2002.71, a bias of 0.1%.

Example 2: Possible dolphin pods or seabird rafts

I use a different distribution to mimic the group size distribution. A lognormal distribution (you heard about it during the precision lecture) is like a normal distribution that has had its right tail pulled out. In all other respects the survey is the same (same design, etc.).

The median of this distribution is 12 (not far from 10 in the previous example), but because of the right tail, the mean is 22.5. This changes the true number of individuals in the population (for the same 200 groups) to 4493.
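
For a lognormal distribution the median is \(e^{\mu}\) and the mean is \(e^{\mu + \sigma^2/2}\), so the two quoted values determine the parameters on the log scale. The sketch below recovers them from the rounded figures in the text; the resulting `sdlog` is therefore only an implied, approximate value, not a setting reported for this supplement.

```r
median.size <- 12      # from the text
mean.size   <- 22.5    # from the text (rounded)

# Lognormal parameters implied by the median and the mean
meanlog <- log(median.size)
sdlog   <- sqrt(2 * log(mean.size / median.size))

c(meanlog = meanlog, sdlog = sdlog)
exp(meanlog + sdlog^2 / 2)     # recovers the mean
200 * mean.size                # expected total for 200 groups
```

The last line gives 4500, slightly above the 4493 quoted above. The difference is presumably due to rounding of the mean.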

How about “missingness” of small groups at large distances?

Analysis with covariate

When including size as a covariate, estimates of average group size are not affected (figure above). Likewise, mean \(\hat{N}_{indiv}\) is effectively unbiased: 4440.5, a bias of -1.2%.

Analysis without the covariate

Now mean \(\hat{N}_{indiv}\) is considerably biased: 5357.69, a bias of 19.3%.

Take home message

When variability in group size is small for your study animal, size bias is unlikely to cause a problem: even if small groups at large distances are missed, the average size in the detected sample differs little from the average size in the population. However, when group size variation is large, the average size in the sample can be considerably larger than the average group size in the population, inducing positive bias in the estimated number of individuals. In those situations, include group size as a covariate when modelling the detection function.
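
The contrast between the two examples can be reproduced qualitatively with a small base R simulation. This is a minimal sketch under stated assumptions: the function `size_bias_ratio`, the half-normal detection model, its parameters `beta0` and `beta1`, the half-width `w` and the lognormal `sdlog` of 1.12 are all illustrative choices, not the settings used to produce the results above. Lognormal sizes are left as continuous values for simplicity.

```r
set.seed(42)

# Ratio of mean detected group size to mean population group size, under a
# hand-rolled half-normal detection model whose scale increases with size
size_bias_ratio <- function(sizes, w = 100, beta0 = log(20), beta1 = 0.02) {
  distance <- runif(length(sizes), 0, w)         # perpendicular distances in the strip
  sigma    <- exp(beta0 + beta1 * sizes)         # larger groups are visible further away
  detected <- runif(length(sizes)) < exp(-distance^2 / (2 * sigma^2))
  mean(sizes[detected]) / mean(sizes)
}

n <- 1e5   # many groups, so that the two ratios are stable
ratio.poisson   <- size_bias_ratio(rpois(n, lambda = 10))
ratio.lognormal <- size_bias_ratio(rlnorm(n, meanlog = log(12), sdlog = 1.12))

c(poisson = ratio.poisson, lognormal = ratio.lognormal)
```

A ratio near 1 means the detected sample represents the population well. With the same detection model applied to both populations, the ratio for the highly variable lognormal sizes is clearly further above 1 than the ratio for the Poisson sizes.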