Mitigation of biases in estimating hazard ratios under non-sensitive and non-specific observation of outcomes–applications to influenza vaccine effectiveness

Background: Non-sensitive and non-specific observation of outcomes in time-to-event data affects event counts as well as the risk sets, thereby biasing the estimation of hazard ratios. We investigate how imperfect observation of incident events affects the estimation of vaccine effectiveness based on hazard ratios.

Methods: Imperfect time-to-event data contain two classes of events: a portion of the true events of interest, and false-positive events mistakenly recorded as events of interest. We develop an estimation method that uses a weighted partial likelihood and probabilistic deletion of false-positive events, assuming the sensitivity and the false-positive rate are known. The performance of the method is evaluated using simulated and Finnish register data.

Results: The novel method enables unbiased semiparametric estimation of hazard ratios from imperfect time-to-event data. Small false-positive rates can be approximated as zero without inducing bias. The method is robust to misspecification of the sensitivity as long as the ratio of the sensitivity in the vaccinated and the unvaccinated is specified correctly and the cumulative risk of the true event is small.

Conclusions: The weighted partial likelihood can be used to adjust for outcome measurement errors in the estimation of hazard ratios and effectiveness but requires specifying the sensitivity and the false-positive rate. In the absence of exact information about these parameters, the method works as a tool for assessing the potential magnitude of bias given a range of likely parameter values.


Introduction
Outcome measurement errors are common in epidemiological studies and may bias the estimated effects of exposures or interventions on health outcomes. When a binary outcome such as presence/absence of infection is measured with error, the problem is called outcome misclassification [1]. The impact of outcome misclassification on estimation of risk ratios has been studied thoroughly [2][3][4]. Nevertheless, the same lessons cannot be readily adopted when estimating hazard ratios from time-to-event data because imperfectly observed event times do not only affect event counts but may also bias the at-risk times and thus the risk set sizes.
A particular problem arises when estimating vaccine effectiveness as the relative reduction in the infection hazard. If infection-induced immunity reduces or removes the risk of subsequent infection with the same pathogen, non-sensitive measurement of infection inflates the risk set. For example, influenza is likely to immunise the human host at least temporarily, and in practice not all infections in a large population are recorded. Moreover, false-positive records may occur due to imperfect specificity of diagnostic procedures.
Yang et al. [5] addressed estimation of vaccine effectiveness under non-specific observation of influenza infection using a subset of acute respiratory infections as a validation set on disease aetiology. An expectation-maximisation algorithm was developed to account for the uncertainty in the aetiology of infections outside the validation set [5]. Although the validation data carried information on the specificity, perfect sensitivity was assumed.
Meier et al. [6] focused on detection of chronic outcomes such as human immunodeficiency virus infection, which if initially missed can still be detected by later testing. A full-likelihood approach was developed to estimate the hazard ratio under repeated usage of an imperfect laboratory test, based on a proportional hazards (PH) model in discrete time and assuming the test sensitivity and specificity are known [6]. However, this method cannot be applied under imperfect observation of incident events, such as influenza infection, which by standard laboratory tests can only be detected up to one week after symptom onset [7].
The role of non-sensitive and non-specific observation of incident infection outcomes on the estimation of hazard ratios has thus not been fully covered in previous literature. We here study how outcome measurement errors affect the estimation of vaccine effectiveness based on hazard ratios. We modify the standard partial likelihood under the PH model [8] to adjust for outcome measurement errors in time-to-event data, assuming the sensitivity and the false-positive rate are known. We explore the magnitude of bias when the measurement errors are not corrected for and evaluate the robustness of effectiveness estimates to misspecification of the sensitivity and the false-positive rate. We implement the new method in R [9] (see Additional file 1: R script) and use simulated and Finnish register data to show its performance. Our work is motivated by the Finnish policy of estimating influenza vaccine effectiveness each season from register data [10], which do not include influenza-negative test results and thus do not allow for a retrospective design such as the widely used test-negative design [11]. Therefore, we here focus exclusively on cohort studies.

True and false-positive events
We consider the sensitivity of outcome measurement as the conditional probability for the true event of interest being recorded in the data. When the sensitivity is less than 1, some true events may not be recorded. Additionally, other events may be mistakenly recorded as events of interest. Such false-positive records make outcome measurement non-specific. Here, the true event means influenza infection while false-positive events are any other (non-influenza) events incorrectly recorded as influenza.
For any one subject, let the hazard of the true event at time $t$ be $\lambda(t)$. We assume that the true event occurs at most once during the study period and, if it occurs, is recorded with sensitivity $se$. False-positive events may occur repeatedly at rate $\kappa(t)$. Originally, the data may thus comprise more than one event per subject. In the study, however, based on the assumption that the true event is unique, each subject's follow-up is set to end at his/her first recorded event or censoring, whichever occurs first. The data under study therefore comprise at most one recorded event per subject (Fig. 1). The event's status as true or false positive is indistinguishable by observation.
If the true event occurs but is not recorded, the subject's follow-up continues beyond the true event time, erroneously lengthening the at-risk time in the study. By contrast, should a false-positive event occur before the true event, the subject's follow-up ends prematurely and the true event time is not part of the data under study (Fig. 1). This also shows that minor violations of the above assumption of uniqueness would not compromise the study.

Fig. 1 Occurrence of true and false-positive influenza events. The figure shows the eight possible paths of events for a study subject during the study period (influenza season). The true event is depicted either by a white circle if it was recorded or by a crossed circle if it was not recorded. False-positive events are depicted by black circles. Although false positives may occur repeatedly, the figure shows only the first of these if any. The subject's true time at risk during the study period is marked by a solid line. In the study, the subject's follow-up (dashed line) ends at the time of the first recorded event, which is highlighted by a square around the event-defining circle, or at the time of censoring (vertical bar). Although recorded, the true event is not part of the data under study if there is a preceding false-positive event. The true at-risk time is then underestimated (Paths 2, 5 and 8). By contrast, Paths 6 and 7 show scenarios in which the true at-risk time is overestimated.

True and observed survival functions
The true survival function $S(t)$ is the probability to escape the true event beyond time $t$. The observed survival function $\tilde{S}(t)$ is the probability to avoid detection of the true event and the occurrence of any false-positive events beyond $t$. Assuming constant sensitivity ($se$) and false-positive rate ($\kappa$), the relationship between the survival functions is (cf. Additional file 2: Web Appendix)

$$\tilde{S}(t) = \bigl\{1 - se\,[1 - S(t)]\bigr\} \cdot \exp(-\kappa t). \tag{1}$$

The first right-hand-side term is the complement probability of the true event having occurred and been recorded by $t$. The second term is the probability of no false-positive events having occurred by $t$.
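The relationship between the two survival functions can be illustrated numerically. The following is a minimal Python sketch, assuming the product form described by the two terms above, i.e. observed survival = [1 − se·(1 − S(t))]·exp(−κt). The function name and the input values are illustrative choices, not taken from the authors' R script.

```python
import math


def observed_survival(true_survival, se, kappa, t):
    """Probability of no recorded event by time t.

    true_survival : S(t), probability of escaping the true event up to t
    se            : sensitivity (probability that a true event is recorded)
    kappa         : constant false-positive rate per unit of time
    """
    # Complement of "true event occurred AND was recorded by t"
    no_recorded_true_event = 1.0 - se * (1.0 - true_survival)
    # No false-positive event by t (constant rate kappa)
    no_false_positive = math.exp(-kappa * t)
    return no_recorded_true_event * no_false_positive


# Illustrative inputs: true cumulative risk 0.25 by day 196,
# sensitivity 0.04, false-positive rate 1e-6 per person-day.
s_obs = observed_survival(true_survival=0.75, se=0.04, kappa=1e-6, t=196)
print(f"observed survival at day 196: {s_obs:.4f}")
```

With perfect sensitivity and no false positives the function returns the true survival unchanged, which is a quick sanity check on the form assumed here.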

True and observed hazards
Given expression (1), the relation between the hazard $\tilde{\lambda}(t)$ of the recorded event (observed hazard) and the hazard $\lambda(t)$ of the true event (true hazard) follows (cf. Additional file 2: Web Appendix):

$$\tilde{\lambda}(t) = se \cdot w(t) \cdot \lambda(t) + \kappa, \tag{2}$$

where the weight $w(t)$ is defined as

$$w(t) = \frac{S(t)}{1 - se\,[1 - S(t)]}.$$

The observed hazard is thus the sum of the hazard of recording the true event and the false-positive rate. The sensitivity $se$ accounts for the possibility of not recording the true event. The weight $w(t)$ equals the ratio of the true survival probability and the survival probability observed in absence of false positives and adjusts for the fact that a true but unrecorded event may have already removed the subject from the study's risk set before time $t$. It holds that $w(t) \le 1$.
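As a plausibility check, the hazard relation can be compared with a numerical derivative of the observed survival function. The sketch below assumes a constant true hazard (so that S(t) = exp(−λt)), the weight w(t) = S(t) / [1 − se·(1 − S(t))], and observed survival of the form [1 − se·(1 − S(t))]·exp(−κt). All function names and parameter values are illustrative assumptions.

```python
import math


def weight(true_survival, se):
    """w(t): true survival divided by the survival observed without false positives."""
    return true_survival / (1.0 - se * (1.0 - true_survival))


def observed_hazard(true_hazard, true_survival, se, kappa):
    """Observed hazard: hazard of recording the true event plus false-positive rate."""
    return se * weight(true_survival, se) * true_hazard + kappa


def observed_survival(t, lam, se, kappa):
    """Observed survival under a constant true hazard lam (illustrative assumption)."""
    s = math.exp(-lam * t)
    return (1.0 - se * (1.0 - s)) * math.exp(-kappa * t)


def numerical_observed_hazard(t, lam, se, kappa, h=0.01):
    """Central-difference estimate of -d/dt log(observed survival)."""
    upper = math.log(observed_survival(t + h, lam, se, kappa))
    lower = math.log(observed_survival(t - h, lam, se, kappa))
    return -(upper - lower) / (2.0 * h)


lam, se, kappa, t = 0.0015, 0.04, 1e-6, 100.0
closed_form = observed_hazard(lam, math.exp(-lam * t), se, kappa)
numeric = numerical_observed_hazard(t, lam, se, kappa)
print(closed_form, numeric)  # the two values should agree closely
```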

Vaccine effectiveness
We compare the true hazards between two groups defined by vaccination as binary exposure. Specifically, $\lambda_0(t)$ and $\lambda_1(t)$ denote the true hazards for unvaccinated and vaccinated subjects, respectively, as functions of time since season onset. The estimand of interest is vaccine effectiveness ($VE$) defined as the relative reduction in the infection hazard [12]:

$$VE = 1 - \frac{\lambda_1(t)}{\lambda_0(t)}.$$

In this paper, the true hazards in unvaccinated and vaccinated subjects are assumed to be proportional over time so that the $VE$ estimand is constant. If $\kappa = 0$, it follows from (2) that

$$\tilde{\lambda}_v(t) = se_v \, w_v(t) \, (1 - VE)^{v} \, \lambda_0(t), \tag{3}$$

where

$$w_v(t) = \frac{S_v(t)}{1 - se_v\,[1 - S_v(t)]}$$

and $v = 0$ (unvaccinated) or 1 (vaccinated).
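A short worked step may help to see what an unadjusted analysis targets. Assuming $\kappa = 0$ and observed hazards of the form $\tilde{\lambda}_v(t) = se_v\, w_v(t)\,(1 - VE)^v \lambda_0(t)$, the ratio of the observed hazards is

```latex
\frac{\tilde{\lambda}_1(t)}{\tilde{\lambda}_0(t)}
  = \frac{se_1\, w_1(t)}{se_0\, w_0(t)}\,(1 - VE)
  \;\approx\; \frac{se_1}{se_0}\,(1 - VE)
  \qquad \text{when } S_0(t) \approx S_1(t) \approx 1,
  \text{ so that } w_0(t) \approx w_1(t) \approx 1 .
```

Under this rare-outcome approximation, only the ratio $se_1/se_0$ enters the observed hazard ratio, and with $se_0 = se_1$ the observed ratio is close to $1 - VE$. When the cumulative risk is not small, the weights no longer cancel: $w_v(t)$ increases with $S_v(t)$, so with $se_0 = se_1$ and $VE > 0$ the vaccinated have $w_1(t) \ge w_0(t)$ and the observed hazard ratio exceeds $1 - VE$, i.e. an unadjusted analysis tends to understate $VE$.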

Weighted partial likelihood under imperfect sensitivity in absence of false positives
The data under study comprise $n$ recorded events in a cohort of unvaccinated and vaccinated subjects. The event times of the $n$ cases are $t_1 < t_2 < \dots < t_n$. Let $\tilde{N}_0(t_i)$ and $\tilde{N}_1(t_i)$ denote the numbers of unvaccinated and vaccinated subjects in the risk set at $t_i$. Of note, a subject's vaccination status may change over time [10].
Under the PH assumption, the hazard ratio and thus $VE$ can be estimated from complete and perfectly measured time-to-event data by maximising the standard partial likelihood [8]. Here, we adjust the partial likelihood to allow estimation of $VE$ under imperfect sensitivity. When $se_0$ and $se_1$ are known, the partial likelihood of $VE$ is

$$L(VE) = \prod_{i=1}^{n} L_i(VE), \tag{4}$$

where $L_i(VE)$ is the conditional probability for the event occurring to case $i$ given the risk set at $t_i$, and $v_i$ is 0 if case $i$ is unvaccinated and 1 otherwise. Using (3), $L_i(VE)$ is obtained as

$$L_i(VE) = \frac{se_{v_i}\, w_{v_i}(t_i)\, (1 - VE)^{v_i}}{\tilde{N}_0(t_i)\, se_0\, w_0(t_i) + \tilde{N}_1(t_i)\, se_1\, w_1(t_i)\, (1 - VE)}. \tag{5}$$

Unlike the standard partial likelihood, (5) depends on weights $w_0(t)$ and $w_1(t)$, which correct for the too large risk set following from imperfect sensitivity. Using the Kaplan-Meier estimate $\hat{\tilde{S}}_v(t)$ for $\tilde{S}_v(t; se_v, \kappa = 0)$ leads to plug-in weights $\hat{w}_0(t)$ and $\hat{w}_1(t)$ (cf. Additional file 2: Web Appendix):

$$\hat{w}_v(t) = \frac{\hat{\tilde{S}}_v(t) - (1 - se_v)}{se_v\, \hat{\tilde{S}}_v(t)}, \qquad v = 0, 1. \tag{6}$$

$VE$ is estimated by maximising (4) and its standard error ($SE$) can be obtained using the Fisher information. If $se_0 = se_1 = 1$, (4) simplifies to the standard partial likelihood.
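The weighted partial likelihood can be sketched in a few lines of Python. This is an illustration only, not the authors' R implementation. It assumes that each case contributes se_v·w_v(t_i)·(1 − VE)^v divided by the same quantity summed over the risk set, and that the plug-in weight is obtained by solving the κ = 0 survival relation for S(t). The event tuple layout, the function names and the golden-section maximiser are all choices made here; standard errors are omitted.

```python
import math


def plug_in_weight(obs_survival, se):
    """Plug-in weight from an estimate of the observed survival (kappa = 0)."""
    return (obs_survival - (1.0 - se)) / (se * obs_survival)


def log_partial_likelihood(ve, events, se0, se1):
    """Weighted log partial likelihood.

    events: iterable of (v, n0, n1, w0, w1) per recorded event, where v is the
    case's vaccination status, n0/n1 the risk-set sizes and w0/w1 the weights
    at the event time (hypothetical layout for this sketch).
    """
    theta = 1.0 - ve  # hazard ratio
    total = 0.0
    for v, n0, n1, w0, w1 in events:
        a0 = se0 * w0
        a1 = se1 * w1 * theta
        total += math.log(a1 if v == 1 else a0) - math.log(n0 * a0 + n1 * a1)
    return total


def estimate_ve(events, se0, se1, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximise over beta = log(1 - VE) by golden-section search.

    The log partial likelihood is concave in beta, so the search is valid.
    """
    def f(beta):
        return log_partial_likelihood(1.0 - math.exp(beta), events, se0, se1)

    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return 1.0 - math.exp((a + b) / 2.0)


# Toy data: 10 unvaccinated and 5 vaccinated cases, equal risk sets, weights 1.
toy = [(0, 1000, 1000, 1.0, 1.0)] * 10 + [(1, 1000, 1000, 1.0, 1.0)] * 5
print(estimate_ve(toy, se0=1.0, se1=1.0))    # close to 0.5
print(estimate_ve(toy, se0=0.05, se1=0.03))  # lower, since se1 < se0
```

In the toy example the maximiser has the closed form 1 − VE = (d1·se0)/(d0·se1), where d0 and d1 are the case counts, which makes the dependence on the sensitivity ratio explicit.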

Probabilistic deletion of false-positive events
If false-positive events occur, i.e. if $\kappa > 0$, semiparametric estimation using the weighted partial likelihood is not directly applicable because the true hazard $\lambda_0(t)$ no longer cancels out of expression (5). We propose an approach that retains only a portion of the $n$ recorded events by approximating the time-varying probability of the recorded event being a true event.
The probability that an event observed at time $t_i$ is a true event is given by the ratio of the hazard of recording the true event to the observed hazard. This probability is (cf. Additional file 2: Web Appendix)

$$p_v(t_i) = \frac{se_v\, w_v(t_i)\, \lambda_v(t_i)}{\tilde{\lambda}_v(t_i)} = 1 - \frac{\kappa}{\tilde{\lambda}_v(t_i)}. \tag{7}$$

We suggest approximating $\tilde{\lambda}_v(t_i)$ over a short time window ($\Delta t_i$) centred around $t_i$ as the number of events observed ($D_{v,i}$) per person-time ($\tilde{N}_v(t_i) \cdot \Delta t_i$) and, hence, an approximation to $p_v(t_i)$ is given by

$$\hat{p}_v(t_i) = 1 - \frac{\kappa\, \tilde{N}_v(t_i)\, \Delta t_i}{D_{v,i}}. \tag{8}$$

Subsequently, any event observed at $t_i$ is retained in the data with probability $\hat{p}_v(t_i)$, corresponding to censoring events at each $t_i$ with probability $1 - \hat{p}_v(t_i)$. In analogy to multiple imputation, the above procedure is repeated a number of times to produce replicate data sets. Each resulting set of time-to-event data is analysed as in absence of false positives using the weighted partial likelihood. At the end, the $VE$ estimates are pooled taking into account the within- and the between-imputation variability [13].
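The deletion step can be sketched as follows. The code assumes that the retention probability is one minus the ratio of κ to the windowed estimate of the observed hazard (events per person-time). Truncating the result to [0, 1] is a safeguard added here and is not stated in the text; function names and data layout are hypothetical.

```python
import random


def retention_probability(kappa, n_events, n_at_risk, window):
    """Approximate probability that an event recorded in the window is a true event.

    n_events  : events observed in the window (at least 1, as the window is
                centred on an observed event)
    n_at_risk : risk-set size at the event time
    window    : window length in the same time unit as kappa
    """
    observed_hazard = n_events / (n_at_risk * window)
    p = 1.0 - kappa / observed_hazard
    return min(1.0, max(0.0, p))  # truncation is an assumption of this sketch


def thin_events(event_probs, rng):
    """Keep each event with its retention probability; otherwise treat it as censored.

    event_probs: list of (event_id, p). Returns {event_id: True if retained}.
    """
    return {event_id: rng.random() < p for event_id, p in event_probs}


def make_replicates(event_probs, n_replicates, seed=0):
    """Repeat the random deletion to obtain replicate data sets."""
    rng = random.Random(seed)
    return [thin_events(event_probs, rng) for _ in range(n_replicates)]


# Illustrative numbers: 43 events in a 7-day window among 100,000 at risk,
# false-positive rate 1e-5 per person-day.
p = retention_probability(kappa=1e-5, n_events=43, n_at_risk=100_000, window=7)
replicates = make_replicates([("case-1", p), ("case-2", p)], n_replicates=10)
print(round(p, 3), len(replicates))
```

Each replicate would then be analysed with the weighted partial likelihood as if no false positives were present, and the replicate-specific estimates pooled.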

Set-up
We conducted a simulation study to assess the performance of the proposed methods. Briefly, true event times were simulated according to hazards $\lambda_0(t)$ (unvaccinated) and $(1 - VE) \cdot \lambda_0(t)$ (vaccinated), where $\lambda_0(t)$ mimicked the force of infection in a Susceptible-Infected-Removed epidemic [14] with cumulative risk of 0.25 (alternatively 0.81) over a 196-day influenza season (cf. Additional file 2: Web Appendix). Two separate cohorts were considered, comprising 50,000 (30% vaccinated at season onset) and 1,000,000 (50% vaccinated) individuals, corresponding to the cohort sizes of Finnish children and elderly, respectively [10,15,16]. $VE$ was 10%, 30%, 50%, 70% or 90%. For each individual, observed true events were realised by retaining simulated true events with sensitivities $se_0$ (unvaccinated) and $se_1$ (vaccinated). Values $se_0 = se_1 = 0.04$ were based on a Finnish study of the 2009/10 influenza season [17]. Alternatively, values $se_0 = 0.05$ and $se_1 = 0.03$ were employed to investigate differential sensitivity. A false-positive event time was sampled from the exponential distribution with rates corresponding to 2% or 16% of all recorded events in the unvaccinated being false-positive. The smaller of the observed true and false-positive event times was used as the recorded event time for the individual.
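The data-generating steps can be sketched for a single individual as below. For brevity the sketch replaces the epidemic-shaped baseline hazard by a constant hazard calibrated to the same cumulative risk. That simplification, the function names and the seed are assumptions of this illustration and do not reproduce the authors' simulation code.

```python
import math
import random

SEASON_DAYS = 196


def constant_hazard_for_risk(cumulative_risk, days=SEASON_DAYS):
    """Constant hazard giving the stated cumulative risk over the season."""
    return -math.log(1.0 - cumulative_risk) / days


def simulate_subject(vaccinated, ve, base_hazard, se0, se1, kappa, rng):
    """Return (follow-up time, event recorded?) for one individual."""
    hazard = base_hazard * ((1.0 - ve) if vaccinated else 1.0)
    true_time = rng.expovariate(hazard) if hazard > 0 else math.inf
    # The true event enters the data only with probability se
    se = se1 if vaccinated else se0
    recorded_true_time = true_time if rng.random() < se else math.inf
    # First false-positive event, exponential with rate kappa
    false_positive_time = rng.expovariate(kappa) if kappa > 0 else math.inf
    # The earlier of the two is the recorded event; censor at season end
    event_time = min(recorded_true_time, false_positive_time)
    if event_time <= SEASON_DAYS:
        return event_time, True
    return float(SEASON_DAYS), False


rng = random.Random(2024)
lam0 = constant_hazard_for_risk(0.25)
cohort = [simulate_subject(i % 2 == 1, 0.5, lam0, 0.04, 0.04, 1e-6, rng)
          for i in range(10_000)]
print(sum(recorded for _, recorded in cohort), "recorded events")
```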
For each setting (cohort size, $VE$, $se_0$, $se_1$ and $\kappa$), $10^4$ repeated datasets were simulated. For each dataset, ten random subsets were created by retaining events with probability $p(t)$ as in (8). Adjusted $VE$ estimates were computed with the same values of $se_0$, $se_1$ and $\kappa$ as used in simulation. In addition, naïve $VE$ estimates were obtained by incorrectly assuming perfect sensitivity ($se_0 = se_1 = 1$) and/or absence of false positives ($\kappa = 0$). Finally, the ten dataset-specific estimates were pooled, resulting in $10^4$ estimates of $VE$ and $SE$ per setting.
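Pooling of replicate-specific estimates can be sketched with the standard combining rules for multiple imputation (often called Rubin's rules). Whether the authors pool on the VE scale or on the log hazard ratio scale is not stated here, so the scale, the function name and the example numbers are assumptions of this sketch.

```python
import math


def pool_estimates(estimates, standard_errors):
    """Combine replicate-specific estimates and standard errors.

    Returns (pooled estimate, pooled standard error), where the total variance
    is the mean within-replicate variance plus (1 + 1/m) times the
    between-replicate variance.
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(se ** 2 for se in standard_errors) / m
    between = sum((est - pooled) ** 2 for est in estimates) / (m - 1)
    total_variance = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total_variance)


# Illustrative values for ten replicate data sets
ve_hat = [0.48, 0.51, 0.50, 0.49, 0.52, 0.50, 0.47, 0.53, 0.50, 0.50]
se_hat = [0.03] * 10
print(pool_estimates(ve_hat, se_hat))
```

When all replicates give the same estimate, the between-replicate term vanishes and the pooled standard error equals the within-replicate one.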
We report the bias as the difference between the mean of the $VE$ estimates ($\overline{VE}$) and the true $VE$. We compare the mean of the $SE$ estimates ($\overline{SE}$) with the empirical standard error of the $VE$ estimates ($SE_{VE}$). The estimation error ($\sqrt{MSE_{VE}}$) is assessed as the root-mean-squared error between the $VE$ estimates and the true $VE$. The empirical coverage probability of the 95% confidence interval (CI) was estimated as the percentage of $10^4$ CIs that included the true $VE$.

Tables 1 and 2 show the adjusted and naïve $VE$ estimates under non-differential sensitivity ($se_0 = se_1 = 0.04$) and differential sensitivity ($se_0 = 0.05$, $se_1 = 0.03$), respectively, with $\kappa = 0$ and cumulative risk of 0.25 in the unvaccinated. Table 3 and Additional file 2 cover the high cumulative risk of 0.81. The adjusted $VE$ estimates are unbiased. Therefore, the estimation error is equal to the standard error ($\sqrt{MSE_{VE}} = SE_{VE}$). Because the uncertainty in the plug-in weights is not taken into account, the standard errors are slightly underestimated ($\overline{SE} < SE_{VE}$), leading to smaller than nominal CI coverage probabilities. Under non-differential sensitivity, the naïve $VE$ estimates underestimate the true $VE$. The bias is stronger when the cumulative risk is high (0.81). When the cumulative risk is small (0.25), the estimation errors in the naïve and adjusted estimates are similar. However, as standard errors may be small, even slight biases can lead to poor CI coverage. Under differential sensitivity ($se_0 > se_1$) and small cumulative risk, the naïve $VE$ estimates overestimate the true $VE$. The estimation error in the naïve estimates exceeds that in the adjusted estimates, indicating that the estimation is not robust to gross misspecification of $se_0$ and $se_1$. The error attenuates when the cumulative risk is high.

Table 4 and Additional file 2: Table S2 (see Additional file 2: Web Appendix) show the adjusted and naïve $VE$ estimates under non-differential sensitivity ($se_0 = se_1 = 0.04$) and differential sensitivity ($se_0 = 0.05$, $se_1 = 0.03$), respectively, with cumulative risk of 0.25 and false-positive proportion of 2% among the unvaccinated. In general, the results correspond to the situation without false positives. The adjusted $VE$ estimates are essentially unbiased. As the naïve $VE$ estimates do not differ much between settings with and without false positives, a false-positive rate corresponding to a false-positive proportion of 2% among the unvaccinated does not materially affect the estimation. However, under the higher false-positive proportion (16%), the naïve estimates are biased whereas the adjusted estimation performs well (Table 5).

Table 1 Estimates of vaccine effectiveness ($VE$) under non-differentially imperfect sensitivity and small cumulative risk of infection in absence of false-positive events
Mean of the vaccine effectiveness estimates ($\overline{VE}$), mean of the standard error estimates ($\overline{SE}$), standard error of the vaccine effectiveness estimates ($SE_{VE}$), root-mean-squared error of the vaccine effectiveness estimates ($\sqrt{MSE_{VE}}$), bias in percentage points, and empirical coverage probability (Cov) of the 95% confidence intervals when estimating vaccine effectiveness from $10^4$ repeated data sets under non-differential sensitivity ($se_0 = se_1$) of 0.04 and a cumulative risk of 0.25 in the unvaccinated in absence of false-positive events. Naïve estimation was conducted under the incorrect assumption of perfect sensitivity ($se_0 = se_1 = 1$).

Table 2 Estimates of vaccine effectiveness ($VE$) under differential sensitivity and small cumulative risk of infection in absence of false-positive events
Mean of the vaccine effectiveness estimates ($\overline{VE}$), mean of the standard error estimates ($\overline{SE}$), standard error of the vaccine effectiveness estimates ($SE_{VE}$), root-mean-squared error of the vaccine effectiveness estimates ($\sqrt{MSE_{VE}}$), bias in percentage points, and empirical coverage probability (Cov) of the 95% confidence intervals when estimating vaccine effectiveness from $10^4$ repeated data sets under differential sensitivity of 0.05 ($se_0$) and 0.03 ($se_1$) and a cumulative risk of 0.25 in the unvaccinated in absence of false-positive events. Naïve estimation was conducted under the incorrect assumption of perfect sensitivity ($se_0 = se_1 = 1$).
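The performance measures reported in the simulation tables (bias, mean standard error, empirical standard error, root-mean-squared error and CI coverage) can be computed as in the following sketch. It assumes symmetric Wald-type 95% intervals on the VE scale, which may differ from how the authors constructed their intervals; the function name and the toy inputs are illustrative.

```python
import math


def performance(estimates, standard_errors, true_ve, z=1.96):
    """Summarise repeated VE estimates against the known true VE."""
    n = len(estimates)
    mean_ve = sum(estimates) / n
    mean_se = sum(standard_errors) / n
    # Empirical standard error: spread of the estimates around their mean
    empirical_se = math.sqrt(sum((e - mean_ve) ** 2 for e in estimates) / (n - 1))
    # Root-mean-squared error: spread of the estimates around the truth
    rmse = math.sqrt(sum((e - true_ve) ** 2 for e in estimates) / n)
    # Share of Wald-type intervals (assumed form) that contain the true VE
    covered = sum(1 for e, s in zip(estimates, standard_errors)
                  if e - z * s <= true_ve <= e + z * s)
    return {
        "mean_ve": mean_ve,
        "bias": mean_ve - true_ve,
        "mean_se": mean_se,
        "empirical_se": empirical_se,
        "rmse": rmse,
        "coverage": covered / n,
    }


summary = performance([0.48, 0.52, 0.50, 0.47, 0.53], [0.03] * 5, true_ve=0.50)
print(summary)
```

For an unbiased estimator the root-mean-squared error and the empirical standard error nearly coincide, which is the pattern the text describes for the adjusted estimates.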

Influenza vaccine effectiveness in the Finnish elderly
This section presents estimates of influenza vaccine effectiveness in the Finnish elderly in 2016/17, a season dominated by influenza subtype A/H3N2. Briefly, a nationwide cohort of individuals aged 65 years and above was monitored through the season (196 days), using data collected as part of healthcare routines. The outcome was laboratory-confirmed influenza, which is a non-sensitive measurement of influenza infection as not everyone seeks healthcare or is swabbed. The register-based cohort study design is described in detail elsewhere [10]. For simplicity, we here focus on outcome measurement errors, assuming absence of other sources of bias such as exposure measurement errors or confounding. The cohort totalled 1,160,986 individuals, of whom 47% were vaccinated during the season. There were 8389 recorded events, of which 3346 occurred in vaccinated individuals. Unlike in the simulation study, $VE$, $se_0$, $se_1$ and $\kappa$ were unknown. The sensitivities ($se_0$, $se_1$) were set at 0.04 (cf. Shubin et al. [17]). The false-positive rate ($\kappa$) was deemed very small and thus approximated as 0.

Table 3 Estimates of vaccine effectiveness ($VE$) under non-differentially imperfect sensitivity and high cumulative risk of infection in absence of false-positive events
Mean of the vaccine effectiveness estimates ($\overline{VE}$), mean of the standard error estimates ($\overline{SE}$), standard error of the vaccine effectiveness estimates ($SE_{VE}$), root-mean-squared error of the vaccine effectiveness estimates ($\sqrt{MSE_{VE}}$), bias in percentage points, and empirical coverage probability (Cov) of the 95% confidence intervals when estimating vaccine effectiveness from $10^4$ repeated data sets under non-differential sensitivity ($se_0 = se_1$) of 0.04 and a cumulative risk of 0.81 in the unvaccinated in absence of false-positive events. Naïve estimation was conducted under the incorrect assumption of perfect sensitivity ($se_0 = se_1 = 1$).

If infections in vaccinated individuals were assumed to be less likely detected ($se_1/se_0 < 1$), the $VE$ estimates were smaller than 23%, and vice versa. For example, if $se_1 = 0.04$ and $se_0 = 0.05$, $VE$ was 2% with 95% CI from -3% to 6%, indicating that vaccination may not have been effective.

Discussion
Motivated by the Finnish policy of evaluating influenza vaccine effectiveness from register data, we developed a weighted partial likelihood approach with probabilistic deletion of false positives to adjust for outcome measurement errors. A simulation study demonstrated that the new method enables unbiased estimation of hazard ratios from time-to-event data when the underlying sensitivity of outcome measurement and the false-positive rate are known. In practice, false-positive rates that are small in relation to the true hazard can be approximated as zero without inducing bias. Moreover, the analysis of empirical data showed that the method is robust to misspecification of the sensitivity parameters as long as their ratio ($se_1/se_0$) is set correctly and the cumulative risk of the true event is small.

We assumed the influenza vaccine's mode of action is "leaky", i.e. that vaccination provides only partial protection [12,18,19]. The appropriate measure of effectiveness is therefore the relative reduction in the infection hazard, assumed here to be constant over one influenza season. When estimating effectiveness based on the risk ratio, it has previously been shown that bias is determined by the ratio of the two sensitivity parameters [4]. We demonstrated that the same applies when effectiveness is based on the hazard ratio, but only if the cumulative risk of the outcome is small, i.e. if the outcome is rare so that the risk set is largely unaffected by the occurrence of events.

Table 4 Estimates of vaccine effectiveness ($VE$) under non-differentially imperfect sensitivity, small cumulative risk of infection and low rate of false-positive events
Mean of the vaccine effectiveness estimates ($\overline{VE}$), mean of the standard error estimates ($\overline{SE}$), standard error of the vaccine effectiveness estimates ($SE_{VE}$), root-mean-squared error of the vaccine effectiveness estimates ($\sqrt{MSE_{VE}}$), bias in percentage points, and empirical coverage probability (Cov) of the 95% confidence intervals when estimating vaccine effectiveness from $10^4$ repeated data sets given non-differential sensitivity ($se_0 = se_1$) of 0.04 and a cumulative risk of 0.25 in the unvaccinated. The false-positive events occurred at rate $\kappa = 10^{-6}$ (per person-day), corresponding to a false-positive proportion of 2% among the unvaccinated. Naïve estimation was conducted under the incorrect assumptions of perfect sensitivity ($se_0 = se_1 = 1$) and absence of false positives ($\kappa = 0$).

Table 5 Estimates of vaccine effectiveness ($VE$) under non-differentially imperfect sensitivity, small cumulative risk of infection and high rate of false-positive events
Mean of the vaccine effectiveness estimates ($\overline{VE}$), mean of the standard error estimates ($\overline{SE}$), standard error of the vaccine effectiveness estimates ($SE_{VE}$), root-mean-squared error of the vaccine effectiveness estimates ($\sqrt{MSE_{VE}}$), bias in percentage points, and empirical coverage probability (Cov) of the 95% confidence intervals when estimating vaccine effectiveness from $10^4$ repeated data sets given non-differential sensitivity ($se_0 = se_1$) of 0.04 and a cumulative risk of 0.25 in the unvaccinated. The false-positive events occurred at rate $\kappa = 10^{-5}$ (per person-day), corresponding to a false-positive proportion of 16% among the unvaccinated. Naïve estimation was conducted under the incorrect assumptions of perfect sensitivity ($se_0 = se_1 = 1$) and absence of false positives ($\kappa = 0$).

Unlike in studies that exclusively refer to sensitivity as performance of a utilised laboratory test (e.g. [2]), register-based studies (e.g. [20]) use sensitivity in a broader sense as resulting from recording accuracy, healthcare …

… and the corresponding estimates of the true survival functions ($\hat{S}_0(t)$, $\hat{S}_1(t)$) based on (1) assuming non-differential sensitivity ($se_0$, $se_1$) of 0.04 and absence of false-positive events. The estimated cumulative risks ($1 - \hat{S}_0(t)$, $1 - \hat{S}_1(t)$) at $t = 196$ (days) were 0.20 and 0.16. B: The linear relation between the log-log transformed survival functions $\hat{S}_0(t)$ and $\hat{S}_1(t)$ supports the proportional hazards assumption (cf. Additional file 2: Web Appendix). C: Time evolution of vaccine effectiveness estimates (solid line) and pointwise 95% confidence intervals (dashed lines) based on (4). D: Dependence of vaccine effectiveness estimates at $t = 196$ (days) on the assumed values of $se_0$ (symbols) and ratio $se_1/se_0$ (horizontal axis) based on (4). The plot area has been restricted, showing only non-negative vaccine effectiveness estimates. For the full range see Additional file 2: Figure S1 (see Additional file 2: Web Appendix).