### Disease presence within an individual subject

If the diagnostic test used is not a gold standard, then on observing a positive test result for a given subject, the key question is how likely is it that this subject is truly disease positive? This is the *PPV* (equation 3) and its value may depend on many things, not least the true prevalence of disease within the population from which the particular subject has its provenance.

To show the considerable public health ramifications that failing to account for imperfect diagnostic accuracy can have, we present a very simple but real epidemiological example based on a recent legal case in the UK[33]. Consider a farm with 118 cattle undergoing routine surveillance for bovine tuberculosis (bTB) using the comparative intra-dermal skin test in the north east of England, where bTB is rarely seen. One animal has a positive test result. The skin test has a sensitivity of ≈ 0.78 and a specificity of ≈ 0.999[34]. With an apparent prevalence of 1/118, the true herd prevalence *Π* can be estimated using equation (1) at ≈ 0.0096. A secondary blood test for bTB, based on an interferon gamma assay (IFNg), is also available. Because of the ongoing epidemic of bTB in the UK, DEFRA (Department for Environment, Food and Rural Affairs) has a policy of testing all animals with IFNg on a farm where bTB is confirmed, provided the farm is in an area of the UK where bTB does not usually occur[35]. Should one of the remaining 117 animals on the farm test positive with this secondary test, a natural question is: what is the probability that the animal has tuberculosis (i.e. the PPV of the secondary test)? The sensitivity and specificity of the interferon gamma test are reported as 0.909 and 0.965 respectively[36]. In this case we therefore have *Π* = 0.0096, *S*_{e} = 0.909 and *S*_{p} = 0.965, and hence a PPV of 0.201. The probability that such a positive result is a false positive is therefore 0.799. This may be one reason why many cattle giving a positive IFNg test result, originating from such low endemic districts, have no evidence of infection at post mortem[35]. This simple example demonstrates the dangers of incorrectly treating an imperfect diagnostic test as error free, or equivalently interpreting apparent prevalence as true prevalence.
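The arithmetic in the bTB example can be reproduced with a few lines of Python. This is only a sketch: equations (1) and (3) are not shown in this section, so the code assumes they take the standard forms, namely the Rogan-Gladen correction for true prevalence and the Bayes' theorem expression for the PPV. The function names are illustrative.

```python
def true_prevalence(apparent, se, sp):
    """Assumed form of equation (1): Rogan-Gladen correction of apparent prevalence."""
    return (apparent + sp - 1.0) / (se + sp - 1.0)


def ppv(prevalence, se, sp):
    """Assumed form of equation (3): P(disease positive | test positive)."""
    true_pos = se * prevalence
    false_pos = (1.0 - sp) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Skin test: one reactor out of 118 animals, Se = 0.78, Sp = 0.999
herd_prev = true_prevalence(1 / 118, se=0.78, sp=0.999)

# Secondary IFNg test: Se = 0.909, Sp = 0.965
ifng_ppv = ppv(herd_prev, se=0.909, sp=0.965)

print(f"estimated true herd prevalence: {herd_prev:.4f}")
print(f"PPV of a positive IFNg result:  {ifng_ppv:.3f}")
print(f"P(false positive | positive):   {1 - ifng_ppv:.3f}")
```

Under these assumed forms the sketch gives values that agree, to the precision quoted, with the 0.0096, 0.201 and 0.799 reported in the text.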

Assessing the disease status of any individual subject is, to a greater or lesser extent, probabilistic in nature. As the above example highlights, however, when dealing with imperfect tests and diseases with low prevalence, the chance of observing a false positive result can be appreciable even with extremely specific diagnostic tests. It is therefore essential to always estimate the *PPV*.

### Disease prevalence within populations of subjects

One of the most widely cited and foundational articles on the analysis of data from imperfect diagnostic tests is by Hui and Walter[8], who are credited with deriving rules for study designs which allow the sensitivity and specificity of imperfect diagnostic tests, and the associated true prevalence of disease, to be estimated.

In short, using multiple imperfect diagnostic tests and one or more independent populations of animals, with differing prevalences, provides sufficient information to allow all model parameters to be estimated. Consider two examples. In the first, one population of 100 subjects is tested for disease, each individual is tested using three different diagnostic tests (of uncertain accuracy), and it is assumed that the tests provide (biologically) independent results. This study design provides seven degrees of freedom (seven independent pieces of information): the counts of how many subjects out of 100 have each test pattern, e.g. suppose 15 individuals have $({T}_{1}^{+},{T}_{2}^{+},{T}_{3}^{-})$, that is, positive results for test 1 and test 2 but negative for test 3. There are eight possible patterns, but since the total number of subjects is fixed there are only seven independent counts. This study design has seven parameters which need to be estimated: *Π*, ${S}_{{e}_{1}}$, ${S}_{{p}_{1}}$, ${S}_{{e}_{2}}$, ${S}_{{p}_{2}}$, ${S}_{{e}_{3}}$, ${S}_{{p}_{3}}$, namely the true prevalence plus the sensitivity and specificity of each test. Hence, we have seven parameters and seven degrees of freedom, and therefore each parameter can be estimated since, generally speaking, one degree of freedom is required to estimate each parameter in a model.

For a second example consider two independent populations (with assumed different prevalences), where each subject is tested using two imperfect (and assumed independent) tests. This time we have six parameters to estimate: *Π*_{1}, *Π*_{2}, ${S}_{{e}_{1}}$, ${S}_{{p}_{1}}$, ${S}_{{e}_{2}}$, ${S}_{{p}_{2}}$, the prevalence in each population and the sensitivity and specificity of each test. In each population we have three degrees of freedom, i.e. three independent counts from the four possible test patterns, $({T}_{1}^{+},{T}_{2}^{-})$ etc., giving six in total, so again all the required parameters can be estimated.
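The counting argument used in these two examples can be written as a small helper. This is a sketch only: the function name and its `n_cov` argument (the number of extra covariance terms) are illustrative, and having at least as many degrees of freedom as parameters is a necessary condition for estimation by maximum likelihood, not a guarantee of identifiability.

```python
def count_dof_and_params(n_tests, n_pops, n_cov=0):
    """Return (degrees of freedom, number of parameters) for a study design.

    Each population contributes 2**n_tests test patterns, one of which is
    fixed by the population total. Parameters are one prevalence per
    population, a sensitivity and specificity per test, plus any
    covariance terms (n_cov, hypothetical argument).
    """
    dof = n_pops * (2 ** n_tests - 1)
    params = n_pops + 2 * n_tests + n_cov
    return dof, params


designs = {
    "three tests, one population": (3, 1),
    "two tests, two populations": (2, 2),
    "one test, one population": (1, 1),
}

for label, (n_tests, n_pops) in designs.items():
    dof, params = count_dof_and_params(n_tests, n_pops)
    verdict = "enough" if dof >= params else "not enough"
    print(f"{label}: {dof} degrees of freedom, {params} parameters ({verdict})")
```

The first two designs give seven and six degrees of freedom against seven and six parameters respectively, while a single test in a single population leaves one degree of freedom for three parameters.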

In summary, by adding further population groups and/or further tests, the unknown diagnostic sensitivity and specificity, and the true disease prevalence, can be estimated. A number of important caveats apply to these study designs. In particular, it may be unreasonable to assume that the diagnostic tests used will be independent, as they may share a similar biological basis.

The Hui and Walter “rules” apply only to maximum likelihood estimation of the model parameters. Technically speaking, these criteria ensure that the model is identifiable, i.e. that each parameter in the model can be uniquely estimated given only the observed data. See[37] for a detailed examination of identifiability in respect of models for imperfect diagnostic tests. It should also be noted that the Hui and Walter approach can perform very poorly in situations where the prevalences across different populations are similar[38]. When using a Bayesian approach the situation is more flexible, as the use of prior information can allow all model parameters to be readily estimated[11]. For example, the model in equation (6) does not meet the Hui and Walter rules, as it has only one test and one population, and its three parameters cannot be estimated uniquely using maximum likelihood. They can, however, be readily estimated in a Bayesian context provided sufficient prior information is available (subject to some technical caveats such as the use of a proper prior, see[28]).

#### Correlated Diagnostic Tests

In Hui and Walter it is assumed that the diagnostic tests being used are conditionally independent, e.g. given a known positive sample then

$\begin{array}{rl}P({T}_{1}^{+},{T}_{2}^{+}\mid {D}^{+}) & = P({T}_{1}^{+}\mid {D}^{+})\,P({T}_{2}^{+}\mid {D}^{+})\\ & = {S}_{{e}_{1}}{S}_{{e}_{2}}.\end{array}$

(7)

This is only tenable, however, if the tests are based on different biology, for example gross pathology and PCR; otherwise it is difficult to justify that each test provides independent evidence in support of the presence (or otherwise) of disease. Developing models which can incorporate dependence between test results comprises a large body of work, with one of the first examples being[9], followed by many others (e.g.[12, 13, 39, 40]). The impact of assuming conditional independence between tests, or indeed assuming a particular dependency structure, is of crucial importance in such analyses[6, 27, 41] and we return to this later. There are a number of different ways to incorporate adjustment for correlation between tests; following[13], the basic idea is as follows:

$\begin{array}{rl}P({T}_{1}^{+},{T}_{2}^{+}\mid {D}^{+}) & = P({T}_{1}^{+}\mid {D}^{+})\,P({T}_{2}^{+}\mid {D}^{+})+\mathrm{cov}_{s}\\ & = {S}_{{e}_{1}}{S}_{{e}_{2}}+\mathrm{cov}_{s},\end{array}$

(8)

where, compared to equation (7), an additional parameter is introduced whose purpose is simply to provide a numerical adjustment to ensure that the conditional probability $P\left({T}_{1}^{+},{T}_{2}^{+}\mid {D}^{+}\right)$ is no longer equal to the product of the test sensitivities. The statistical price for introducing covariance terms (e.g. $\mathrm{cov}_{s}$) is that each one of these requires a degree of freedom in the study design, and so additional populations and/or tests are required. How best to utilize the degrees of freedom available in any given study design is crucial in selecting an appropriate statistical model. Degrees of freedom can be “saved” by fixing or collapsing parameters in the model, for example by assuming that one or more of the tests are 100% specific or that two tests may have approximately the same specificity but different sensitivities.
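To make the role of the covariance term concrete, the sketch below builds the four joint test-pattern probabilities for two tests among truly diseased subjects. It assumes the usual additive parameterisation in which the covariance is added to the concordant patterns and subtracted from the discordant ones; the bounds on the covariance follow from requiring every cell probability to be non-negative. The function name and the numerical values are hypothetical and are not taken from the text.

```python
def diseased_cell_probs(se1, se2, cov_s=0.0):
    """Joint probabilities of (T1, T2) results given D+, with covariance cov_s."""
    # Bounds implied by requiring all four probabilities to be >= 0
    lower = -min(se1 * se2, (1 - se1) * (1 - se2))
    upper = min(se1, se2) - se1 * se2
    if not lower <= cov_s <= upper:
        raise ValueError(f"cov_s must lie in [{lower:.4f}, {upper:.4f}]")
    return {
        ("+", "+"): se1 * se2 + cov_s,
        ("+", "-"): se1 * (1 - se2) - cov_s,
        ("-", "+"): (1 - se1) * se2 - cov_s,
        ("-", "-"): (1 - se1) * (1 - se2) + cov_s,
    }


# Hypothetical sensitivities and covariance, for illustration only
independent = diseased_cell_probs(0.8, 0.7)
dependent = diseased_cell_probs(0.8, 0.7, cov_s=0.05)

for pattern in independent:
    print(pattern, round(independent[pattern], 3), round(dependent[pattern], 3))
```

Note that the covariance changes the joint probabilities but leaves each test's marginal sensitivity unchanged, which is why it has to be estimated as a separate parameter.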

We now present a brief empirical example comprising multiple (three) imperfect and potentially correlated tests. All the JAGS (and R) code, and the data necessary to conduct this example, along with detailed instructions, can be found in the computing appendix (together with several other related examples). Consider the situation where we have one population of 200 subjects, each of which is tested once with three different diagnostic tests. We find that the (mean) apparent prevalence is 44% (88/200) and we wish to estimate the true prevalence. In terms of prior knowledge based on known biology and expert experience of the assays involved, we assume that the specificity of the third test is perfect (100%), and use prior Beta distributions of *Be*(9,1) for the specificity of each of the first and second tests, i.e. a mean of 90% accuracy and 2.5% and 97.5% quantiles of approximately 66.4% and 99.7% respectively. Non-informative priors, e.g. *Be*(1,1), are used for all other parameters. In other words, we are fairly confident that the specificity of the first and second tests will be reasonably good, but we are not sure exactly how good. We have no other evidence with which to incorporate prior knowledge into the modelling in respect of the other parameters. We also cannot discount (on biological grounds) covariance between the tests, and explicitly include a term in our model for covariance between the second and third tests when the subject truly has disease. Given the data, our prior assumptions (distributions) and our model structure, i.e. a multinomial model parameterised as three tests, one population and one covariance term, we can then use JAGS to produce an estimate of the true prevalence. We find using this particular model formulation that the mean true prevalence is 36.2% (see the computing appendix for detailed parameter estimates). It is also of some note that if we were to assume that all these tests were conditionally independent (i.e. no covariance terms), then our mean estimate of the true prevalence drops to 18.3% (this example is also in the computing appendix). This highlights the crucial importance of model selection (as discussed later), and that it is essential to consider different covariance structures between tests, and then choose that which is most supported by the observed data.
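The summary figures quoted for the *Be*(9,1) prior can be checked without any statistical library, because a Beta(a, 1) distribution has the closed-form cumulative distribution function F(x) = x^a. The sketch below uses that fact; the function names are illustrative and it covers only the special case where the second shape parameter equals one.

```python
def beta_a1_mean(a):
    """Mean of a Beta(a, 1) distribution."""
    return a / (a + 1.0)


def beta_a1_quantile(p, a):
    """Quantile of Beta(a, 1): invert the CDF F(x) = x**a."""
    return p ** (1.0 / a)


a = 9.0
prior_mean = beta_a1_mean(a)
q_low = beta_a1_quantile(0.025, a)
q_high = beta_a1_quantile(0.975, a)

print(f"prior mean: {prior_mean:.3f}")
print(f"2.5% and 97.5% quantiles: {q_low:.3f}, {q_high:.3f}")
```

This reproduces the prior mean of 90% and the approximate 66.4% and 99.7% quantiles stated for the specificity priors of the first and second tests.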

### Disease prevalence across multiple population groups

In disease surveillance the objective is typically wider than estimating disease prevalence or diagnostic accuracy in respect of one or more independent population groups; instead, estimates are desired across a large number of groups. This is particularly true when considering populations of food animals, where a main question of interest is the prevalence of disease in the national herd rather than on an individual farm. If multiple test results were available per subject/animal (which is uncommon due to the very considerable resources required), then such studies could be analyzed using the one population multiple test design (e.g.[27]). When considering populations structured into groups (i.e. farms or herds in the case of livestock), issues such as within-group correlation effects may need to be taken into account. In particular, what is typically desired is an estimate of the distribution of within-group (e.g. herd) disease prevalences based on observations from some random subset of individual groups.

In[19] a veterinary case study is presented utilizing a hierarchical model involving multiple herds and two conditionally independent tests, where the goal is to estimate the distribution of within-herd prevalences across many herds. A discussion of herd level testing in the absence of a gold standard diagnostic can also be found in[7]. A particularly important study design is one in which only a single imperfect diagnostic test is used across many population groups. Such studies can be analyzed in various ways, for example using a hierarchical modelling approach such as the beta-binomial technique presented in[7], or alternatively using finite mixture modelling, which seeks to identify distinct prevalence cohorts within a population[42]. While these are mathematically rather sophisticated models, they are little more difficult to code in JAGS than other simpler models. Another way to estimate the distribution of true prevalence across many population groups when only a single imperfect diagnostic test is available is to exploit laboratory replicates, as this can greatly increase the amount of data available in a study. Some care is required, however, as replicates from the same subject will likely be correlated[43].
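The data structure that such hierarchical models target can be illustrated with a small forward simulation. This is not the model of[7] or[42], only a sketch of the generating process they assume in broad terms: each herd has its own true prevalence drawn from a Beta distribution, and a single imperfect test is applied to every animal. All parameter values, and the function names, are hypothetical.

```python
import random


def p_test_positive(prevalence, se, sp):
    """Probability that a randomly chosen animal in the herd tests positive."""
    return se * prevalence + (1.0 - sp) * (1.0 - prevalence)


def simulate_herds(n_herds, herd_size, alpha, beta, se, sp, seed=1):
    """Draw a true prevalence per herd, then the number of test positives."""
    rng = random.Random(seed)
    herds = []
    for _ in range(n_herds):
        prevalence = rng.betavariate(alpha, beta)  # true within-herd prevalence
        p_pos = p_test_positive(prevalence, se, sp)
        # Binomial draw: each animal tests positive independently with p_pos
        positives = sum(rng.random() < p_pos for _ in range(herd_size))
        herds.append((prevalence, positives))
    return herds


# Hypothetical settings: 5 herds of 50 animals, Beta(1, 9) prevalences
herds = simulate_herds(n_herds=5, herd_size=50, alpha=1.0, beta=9.0, se=0.9, sp=0.95)
for prevalence, positives in herds:
    print(f"true prevalence {prevalence:.3f}, test positives {positives}/50")
```

In a real analysis only the counts of test positives would be observed, and the aim would be to recover the distribution of the hidden within-herd prevalences.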