Emerging Themes in Epidemiology
Methodological Issues in Estimating Survival in Patients with Multiple Primary Cancers: an Application to Women with Breast Cancer as a First Tumour

Background: Comparing survival of patients with a single tumour and patients with multiple primaries poses different methodological problems. In population-based studies, where we cannot rely on detailed clinical information, the issue is disentangling the share of survival probability attributable to the first and to the second cancer, and their compounded effect. We examined three hypotheses: A) the survival probability since the first tumour does not change with the occurrence of a second tumour; B) the probability of surviving a tumour does not change with the presence of a previous primary; C) the probabilities of surviving two subsequent primary tumours are independent (additivity hypothesis on mortality rates).


Introduction
The improvement in patient survival for the vast majority of neoplasms has led to a substantial increase in the probability of developing subsequent primary tumours. However, the study of multiple primary tumours on a population basis poses many additional problems. There is, indeed, a problem of differential diagnosis when it comes to distinguishing between local and distant metastases, recurrences and the onset of a truly new lesion. Classifications may also vary, leading to substantial differences in rates. For example, the Surveillance Epidemiology and End Results (SEER) rules [1] differ substantially from those adopted by the International Agency for Research on Cancer (IARC) [2].
Furthermore, survival of patients with multiple tumours has been neglected in population-based analyses, where they are usually list-wise deleted, or analysed for the first tumour occurrence only [3,4]. Only recently have two studies [5,6] reconsidered this exclusion policy. By contrast, in clinical series survival of patients with multiple tumours is usually defined clinically and the specific cause of death is assessed accordingly. However, in population studies and in series from cancer registries, clinical information on patient follow-up is often unavailable and assessment of cause of death is based only on death certificates, which are often liable to gross misclassification. Heinävaara et al. [7] proposed to estimate the differential amount due to the first or second tumour with a statistical parametric model. Their application dealt with patients with two primary breast cancers, where the question of disentangling the cancer-specific survival due to the first or the second tumour is more difficult, also from a clinical point of view. In the case of a subsequent primary cancer of a different origin the question is apparently simpler, although not yet investigated on a population basis.
The following questions can be raised: whether the overall survival of patients has decreased because of the interaction between the two cancers, or if it has been left substantially unchanged in comparison to those with one cancer only, or even increased. For example, active surveillance and care due to the first cancer can lead to earlier diagnosis of subsequent cancers and therefore to a longer survival (or a longer lead time). However, before studying the possible effect of surveillance and other prognostic factors (which was not the aim of this study), we should focus on the correct measurement of survival, which is our research objective.
To achieve this, we had to face several methodological challenges. First, we had to fix the zero reference time (the time from which follow-up starts). Second, a person can die only once, so the background death rate is confounded in the follow-up information after the diagnosis of the second primary; it is therefore crucial to use models able to suitably describe a situation of competing risks. Third, in order to make inferences, for each model we had to define the correct expected survival based on the appropriate comparison group.
We focused our attention on the following questions:
A. Does the survival probability of a patient with a second primary tumour differ from that of patients with only the first type of tumour?
B. Does the survival probability of a patient with a second primary tumour differ from that of patients with only the second type of tumour?
If a difference in survival is found in some of the previous situations, a third more fundamental question arises.
C. Are the probabilities of surviving two subsequent primary tumours independent?
Studying survival probabilities in terms of the underlying hazard of death, the question can be rephrased as follows: is the mortality rate after a second tumour simply the sum of the two intensities (additivity hypothesis), or do the mortality rates combine according to a different functional law?
This paper aims at answering these questions for women with breast cancer and a subsequent primary tumour, paying particular attention to the conditional survival probability due to the time elapsed between the two malignancies.

Statistical analysis
To correctly define the probability of surviving conditional on being alive up to the occurrence of a second tumour, we started by writing questions A, B and C as hypotheses in terms of the mortality hazard. We defined:
• $\lambda_A(t)$: mortality rate for the population with two tumours at time $t$ from the occurrence of the first tumour;
• $\lambda_B(t)$: mortality rate for the population with two tumours at time $t$ from the occurrence of the second tumour;
• $\lambda_{C,\alpha}(t)$: mortality rate at time $t$ from the occurrence of the second tumour for the population with a second tumour, given that they already survived a time interval $\alpha$.
These rates can be broken down into components in which, for $i = 1, 2$, $\lambda_{i|0}$ is the specific mortality rate at time $t$ from the occurrence of tumour $i$ for the population with only that tumour, and $\lambda_0$ is the general mortality.
We assumed that $\lambda_{1|0}$, $\lambda_{2|0}$ and $\lambda_0$ were known from previous studies on mortality and survival in populations with the first type of tumour only, with the second type of tumour only, and in the general population, respectively.
We observed that $\lambda_{1|2}(t)$ was the possible difference in mortality rate in patients with a tumour of type 1 followed by a tumour of type 2 with respect to that of patients with a tumour of type 1 only, measured from the occurrence of tumour 1; $\lambda_{2|1}(t)$ was the possible difference in mortality rate in patients with a tumour of type 1 followed by a tumour of type 2 with respect to that of patients with a tumour of type 2 only, measured from the occurrence of tumour 2.
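The display equations for this decomposition did not survive in the present text. One reading that is consistent with the definitions given here is sketched below; it is an assumption, not the authors' exact formulation. In particular, the interaction term $\delta_\alpha(t)$ in the third line is a hypothetical symbol introduced only for illustration, and the time argument of $\lambda_0$ is left unspecified as in the text.

```latex
\begin{align*}
\lambda_A(t) &= \lambda_0 + \lambda_{1|0}(t) + \lambda_{1|2}(t) \\
\lambda_B(t) &= \lambda_0 + \lambda_{2|0}(t) + \lambda_{2|1}(t) \\
\lambda_{C,\alpha}(t) &= \lambda_0 + \lambda_{1|0}(t + \alpha) + \lambda_{2|0}(t) + \delta_\alpha(t)
\end{align*}
```

Under this reading, hypotheses A and B would correspond to $\lambda_{1|2}(t) = 0$ and $\lambda_{2|1}(t) = 0$ respectively, and the additivity hypothesis C to $\delta_\alpha(t) = 0$.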
Questions A, B, and C can thus be written as hypotheses on these rates.
Occurrence probabilities conditional on different events (occurrence of a second cancer, death) in each time interval can be estimated with the Aalen-Johansen [8] (AJ) method in the framework of a Markov process, as described later. Once we obtained these conditional probabilities, we calculated the number of expected deaths by sex, age, calendar period and follow-up time, under the different hypotheses A, B and C. From a practical point of view, we calculated the expected deaths in a similar way to that used to calculate the denominator of relative survival [9].
For example, consider a woman diagnosed with breast cancer at 62 who developed a rectal cancer after two years and survived for an additional period of five years. For the two years elapsed with breast cancer only, we associated the expected probability of dying of a patient with a breast cancer that occurred at the same age. Subsequently, we associated an expected probability of dying with breast and/or with rectal cancer for the following years, taking into consideration the ageing of the patient (i.e. using the annual probability of dying according to the age of the patient, from age 64 to age 69). How the expected number of deaths (or the expected probability of dying) is calculated for the joint period, when both tumours are present, depends on which of the three hypotheses we are testing. If we consider hypothesis A, we do not add the probability of dying with the rectal cancer. If we test hypothesis B, we do not add the probability of dying associated with breast cancer for the first period. Finally, if we test hypothesis C (additive hypothesis), we sum the two underlying mortality hazards during the second period. Expected probabilities were derived from analyses of the cohort of patients with only one incident cancer included in the cancer registry's data.
For the interested reader, we now explain in detail how we calculated expected probabilities. Since several states are involved, we resorted to the theory of Markov models [8]. In a Markov process individuals can belong to a finite set of states and move from one state to another with a probability that possibly depends on time. The main hypothesis is that the probability of moving from state i to state j at time t depends on i, j and t only, and not on the previous history of the individual.
We constructed a simple model with three states:
1 first tumour
2 second tumour
3 death after a first (but not a second) tumour
where 2 and 3 are absorbing states and the possible moves are 1 → 2 and 1 → 3.
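The bookkeeping in the worked example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the annual probabilities are made up, the function and variable names are hypothetical, and the hazard-scale sum used for hypothesis C (with general mortality subtracted once so that it is not counted twice) is one possible way to "sum the two underlying mortality hazards", not necessarily the authors' exact calculation.

```python
import math


def to_hazard(q):
    """Convert an annual probability of dying into a constant yearly hazard."""
    return -math.log(1.0 - q)


def to_prob(h):
    """Convert a yearly hazard back into an annual probability of dying."""
    return 1.0 - math.exp(-h)


def expected_annual_probabilities(q_first, q_second, q_general, gap, hypothesis):
    """Annual expected probabilities of dying for one patient.

    q_first   -- annual probabilities for patients with the first tumour only,
                 indexed by year since the first diagnosis
    q_second  -- same for patients with the second tumour only, indexed by
                 year since the second diagnosis
    q_general -- general-population probabilities for the years after the
                 second diagnosis
    gap       -- whole years between the two diagnoses
    """
    years_after_second = len(q_second)
    if hypothesis == "A":
        # First tumour only, counted from the first diagnosis.
        return list(q_first[:gap + years_after_second])
    if hypothesis == "B":
        # Second tumour only, counted from the second diagnosis.
        return list(q_second)
    if hypothesis == "C":
        # First tumour alone during the gap, then summed hazards; general
        # mortality is subtracted once so that it is not counted twice.
        out = list(q_first[:gap])
        for k in range(years_after_second):
            h = (to_hazard(q_first[gap + k]) + to_hazard(q_second[k])
                 - to_hazard(q_general[k]))
            out.append(to_prob(h))
        return out
    raise ValueError("hypothesis must be 'A', 'B' or 'C'")


# Made-up annual probabilities for the woman in the example above.
breast = [0.03] * 7    # seven years after breast cancer (from age 62)
rectal = [0.10] * 5    # five years after rectal cancer (from age 64)
general = [0.01] * 5   # general population, from age 64

for hyp in "ABC":
    probs = expected_annual_probabilities(breast, rectal, general, 2, hyp)
    print(hyp, [round(q, 3) for q in probs])
```

In a real analysis the three input vectors would come from the single-cancer cohort and from life tables, stratified by age, sex and calendar period.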
Since our data showed right censoring, transition probabilities P ij (s, t) from state i to state j, in the time interval (s, t) were calculated using Aalen-Johansen (AJ) estimators [8].
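As an illustration of the estimator for this three-state model, the following sketch computes Aalen-Johansen transition probabilities from time 0 for right-censored data. It is a minimal hand-written version under simplifying assumptions (no age classes, probabilities taken from s = 0 only, subjects censored at an event time still counted at risk at that time); the function name and the state coding are hypothetical and the data are made up.

```python
def aalen_johansen(times, states, horizon):
    """Aalen-Johansen estimates P11(0, t), P12(0, t), P13(0, t) at t = horizon.

    times  -- observed time for each subject
    states -- 0 = censored, 2 = second tumour, 3 = death without second tumour
    """
    event_times = sorted(
        {t for t, s in zip(times, states) if s != 0 and t <= horizon}
    )
    p11 = 1.0                 # probability of still being in state 1
    p = {2: 0.0, 3: 0.0}      # cumulative probabilities of moving to 2 or 3
    for u in event_times:
        at_risk = sum(1 for t in times if t >= u)
        for j in (2, 3):
            d_j = sum(1 for t, s in zip(times, states) if t == u and s == j)
            # Increment uses the state-1 probability just before time u.
            p[j] += p11 * d_j / at_risk
        d_all = sum(1 for t, s in zip(times, states) if t == u and s != 0)
        p11 *= 1.0 - d_all / at_risk
    return p11, p[2], p[3]


# Five made-up subjects followed for up to four years.
p11, p12, p13 = aalen_johansen([1, 2, 2, 3, 4], [2, 3, 0, 2, 0], horizon=4)
print(p11, p12, p13)
```

The three probabilities sum to one at every horizon, which is a quick sanity check on any implementation of this estimator.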
The procedure we adopted included age standardisation, and precisely:
• For each age class $k$ we calculated the AJ estimator $P_{ijk}(s, t)$. We let $N_k$ be the number of subjects in class $k$ at time 0 and we set a weight $w_k = N_k / N$, where $N$ equals the sum of the $N_k$'s.
• We defined the standardised estimator as:
$$P_{ij}^{\mathrm{stand}}(s, t) = \sum_k w_k \, P_{ijk}(s, t)$$
• It is reasonable to assume that weights are deterministic (fixed) variables; under this assumption we have:
$$\mathrm{Var}\left(P_{12}^{\mathrm{stand}}(s, t)\right) = \sum_k w_k^2 \cdot \mathrm{Var}\left(P_{12k}(s, t)\right)$$
Then, from the probabilities previously calculated with AJ estimators, it was possible to compare observed mortality with the mortality expected under the hypothesis of no interaction between the two tumours, that is, the mortality expected if the two tumours acted independently. As a consequence, the number of expected deaths is the sum of the deaths due to mortality for both tumours acting separately. We calculated the number of expected deaths considering for each patient $j$ the time of occurrence of the first primary malignancy $T_{1j}$, the time of occurrence of the second primary malignancy $T_{2j}$ and, most importantly, the time interval between the occurrence of the two tumours $\alpha_j = T_{2j} - T_{1j}$. Each patient, after a time interval $t_2$ since the inception of the second tumour, has a probability $p_{2j}(t_2)$ of dying from the second tumour or general mortality equal to that of the general population of patients with only that type of tumour, according to her/his age, sex, calendar period of diagnosis and follow-up time. In addition, that patient has a probability $p_{1j}(t_2 + \alpha_j)$ of dying at the $(t_2 + \alpha_j)$ time interval from the first tumour or general mortality, again equal to that of the general population of patients with that type of tumour only, according to her/his age, sex, calendar period of diagnosis and follow-up time.
We set $\tilde{p}_{2j}(t_2) = p_{2j}(t_2) \cdot (1 - p_{0j})$, where $p_{0j}$ is the general mortality of subject $j$ according to her/his age, sex and calendar period of diagnosis, taken from the life tables of the general population. Thus, we can say that $\tilde{p}_{2j}(t_2)$ is the specific mortality for the second tumour.
Therefore, within the cohort of patients with two malignancies, at the $t_2$ time interval since the second tumour, under the hypothesis of no interaction between the mortality forces of the two tumours, we expect the following number of deaths:
$$\sum_j \left[ p_{1j}(t_2 + \alpha_j) + \tilde{p}_{2j}(t_2) \left(1 - p_{1j}(t_2 + \alpha_j)\right) \right]$$
where the probability of dying from the second tumour $\tilde{p}_{2j}(t_2)$ is corrected by the probability of surviving the first tumour and general mortality, $1 - p_{1j}(t_2 + \alpha_j)$.
Since the output of these calculations was the number of expected deaths, we then compared it with the observed number in a ratio similar to the well-known Standardised Mortality Ratio:
$$\mathrm{SMR}_{AJ} = \frac{\text{observed number of deaths}}{\text{expected number of deaths}}$$
We used the term $\mathrm{SMR}_{AJ}$ because it is quite similar to the standard "SMR": it is the ratio of observed to expected deaths; the expected deaths were calculated as a sum over age groups; and, finally, it is similar to the indirect method of age standardisation since, as the standard, we applied the mortality rates of the cohort of patients with only one tumour.
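A small numerical sketch of this ratio follows. It assumes that each patient contributes an expected count of p1 + p2·(1 − p0)·(1 − p1) at a single time interval, which is a reading of the description above rather than a verified transcription of the authors' formula. The function names and the probabilities are hypothetical.

```python
def expected_deaths(patients):
    """Expected deaths under no interaction between the two tumours.

    patients -- iterable of (p1, p2, p0) tuples for one time interval, where
                p1 = probability of dying from the first tumour or general
                     mortality, evaluated at time t2 + alpha since the first
                     tumour,
                p2 = probability of dying from the second tumour or general
                     mortality at time t2 since the second tumour,
                p0 = general-population probability of dying.
    """
    total = 0.0
    for p1, p2, p0 in patients:
        p2_specific = p2 * (1.0 - p0)            # specific to second tumour
        total += p1 + p2_specific * (1.0 - p1)   # corrected for surviving p1
    return total


def smr_aj(observed, patients):
    """Ratio of observed to expected deaths."""
    return observed / expected_deaths(patients)


# Two made-up patients and one observed death.
cohort = [(0.10, 0.20, 0.05), (0.05, 0.30, 0.02)]
print(expected_deaths(cohort))
print(smr_aj(1, cohort))
```

A ratio below 1 would indicate fewer deaths than expected under the hypothesis being tested, and a ratio above 1 more deaths than expected.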

Patients
We

Results
We identified 9233 women with breast cancer in Turin from 1985 to 1998. A total of 563 cases were excluded because they were identified from Death Certificate Only (DCO), had synchronous primary tumours (same day of diagnosis for the two tumours) or had more than two tumours, leaving 8670 cases for analysis. From this cohort, 436 second (metachronous) primary tumours developed during the prolonged follow-up period (1985-2002). The results under the three hypotheses are shown in Figure 1, where cumulative numbers of deaths are displayed over years of follow-up. It can be seen that in all graphs hypothesis C (additivity) tends to overestimate the actual observed trend and hypothesis A strongly underestimates it, while hypothesis B is the closest to the observed data, especially during the first years of follow-up.

Discussion
The dramatic improvement in cancer survival during the last decades in Western countries brought with it a new health threat: the development of second primary cancers in survivors. An editorial in CEBP by David Alberts clearly stated that 'Second cancers are killing us!' [11]. However, although several studies on the risk of multiple primary cancers have been undertaken [12], the rate at which first and second, or higher-order, cancers are killing us remains neglected. In clinical studies, when reliable information is available, it is often possible to understand whether the pathological conditions linked to a specific cancer affected patient survival and to what extent. However, at a population level this is often not feasible because of the lack of clinical information or cause of death. Even when cancer-specific causes of death are available, they are subject to various degrees of misclassification, hindering a reliable estimate of cancer-specific survival. In the main population-based statistics on cancer survival available worldwide (Eurocare [3] and SEER [4]) subsequent cancers were excluded: only the first occurring cancer was analysed, or all the subjects with multiple cancers were deleted from the analysis. Although this strategy has recently been reconsidered [5,6], it was supposed to allow for more comparable results across registries with different back-up information, and therefore with a different ability to identify those cancers that occurred in prevalent cases. However, we believe that the problem deserves more attention, also because of its implications for the management and care of such patients. Indeed, a wider availability of effective cancer treatments has prolonged patient survival, thereby increasing the possibility of developing another cancer.
Studies of the occurrence of multiple tumours and their association have shown that the higher susceptibility to subsequent malignancies can possibly be due to an unfavourable genetic pattern or to common exogenous risk factors [13,14]. Multiple cancer survival is also a stimulating topic of study, but has received less attention. Recently, an analysis of the SEER data on multiple tumours following breast cancer [15] showed that women aged 20-29 years at the time of breast cancer diagnosis had a worse 10-year survival compared with women with breast cancer only, while there were no differences in the 5-year survival. However, in that analysis the time elapsed until the second cancer occurrence was not taken into account.
Before investigating the factors influencing survival for patients with multiple tumours, we believe that it is essential to have a correct measurement of survival that takes into account the conditional probabilities of surviving given the different timing of the occurrence of the primary cancers. We proposed a method that assigns the correct number of expected events according to the different components of mortality due to each type of cancer. The proposed method is useful only for correctly stating the prediction of mortality probabilities; it cannot explain the causes of the different mortality probabilities.
The expected number of deaths was calculated taking into account the exact time spent at risk of dying from one or another cancer by age classes and calendar period, using conditional probabilities estimated by the AJ estimator from a simple one-way Markov process with two absorbing states. Such an approach was recommended since it allowed a better control of the probabilities of events arising from different states. In the model referring to hypothesis A, we calculated the expected number of deaths due to the first occurring cancer starting from its time of occurrence. This model is similar to model 2 proposed by Heinävaara and colleagues [7] in the absence of cancer-specific cause of death. We wrote the model's parameters in terms of risk excess (hazard rate), rather than estimating the specific mortality rates. While survival of patients with a second primary tumour was comparable to or higher than that of patients with breast cancer only during the first years, it declined rapidly, at a higher rate than in the reference group, after five years of follow-up. This effect was explained by the fact that those patients had survived an extra amount of time (a median of five years) before developing the subsequent cancer. Indeed, results from hypothesis A showed an increased cumulative mortality only at ten years for women with two cancers when compared to those with breast cancer only, as found in the study of Raymond and Hogue [15].
Figure 1. Cumulative number of deaths following different hypotheses for women with a second cancer after breast cancer: (i) all tumours (ii) colorectal cancer (iii) corpus uteri cancer.
The second model (hypothesis B) was built with the same structure as model A, calculating the expected number of deaths due to the second occurring cancer starting from its time of occurrence. However, the change in the baseline population and the shift in the time zero reference made the hazard rates not comparable. Indeed, for a proper comparison with patients with the second type of cancer only, we set the starting time at the diagnosis of the second cancer. In this case, survival was comparable at 1 and 5 years of follow-up to that of patients with one type of cancer only, while it was slightly shorter at 10 years. In summary, results from hypothesis B showed no extra mortality compared to patients with only one cancer of the same type, and the observed and expected numbers of deaths agreed closely over the years of observation.
We then addressed the question of evaluating the possible extra mortality due to the combined effects of the two primary neoplasms, testing whether the mortality of women with two cancers was the sum of the baseline mortality rates of breast and other cancers (additivity hypothesis C). It clearly emerged that observed cumulative mortality was lower than expected under the additivity assumption, with a statistically significant difference in the case of all cancers and colon-rectum after 10 years of follow-up. The agreement of a specific model with the observed data was therefore useful for obtaining further hints about the underlying mechanisms. In our study, the lower than expected mortality can be explained by the fact that the second cancer can be at a less advanced stage and therefore have a better prognosis, since a subsequent cancer is usually diagnosed because of closer clinical surveillance due to the first cancer. It is clear that women with breast cancer and a subsequent cancer survive less than women with breast cancer only, but their survival is not always decreased as much as it would be if the forces of mortality worked together in an additive way.
The study has some possible limitations. First of all, the method of correction is based on observed rates (mortality rates measured in the cohort of patients with only one tumour) that, when based on small numbers, can be unstable. Second, this method, being inherently non-parametric, does not give information on the underlying incidence/mortality competing laws. Third, in calculating the expected number of deaths a possible bias could have been introduced, depending on the number of patients who emigrated outside the Cancer Registry's area. In this case, information on life status was still available and collected, but we did not know whether the patient had developed a subsequent cancer when resident in another area. During the study period, about 8% of the women classified with breast cancer only emigrated. Their median time of emigration was 6.5 years since the breast cancer diagnosis. As a consequence, considering that the median time for developing a second primary cancer was about 5 years, the detection bias should be very limited. Finally, the method has so far been tested only on a limited set of data: patients with breast cancer as a first primary tumour. As few studies are available on this topic, more research is needed, with larger samples and including clinical data (e.g. stage at presentation, hormone receptor status), therapies (e.g. tamoxifen), information on follow-up circumstances, and modality of diagnosis.
In conclusion, we showed that the presented approach for calculating conditional probabilities was correct when dealing with situations, as with multiple tumours, where competing causes of death can bias the results of survival probabilities. We also pointed out how shifted reference times can be considered in correctly comparing survival. In addition, departure from the expected additive model can give hints about the direction in which to investigate further.

Authors' contributions
SR conceived the idea for the study. SR, FR and LT planned and designed the research. FR developed the statistical models. LT revised the mathematical foundations of the proposed model. FR and LT analysed the data. SR and RZ wrote the first draft of the manuscript. RZ coordinated this project. All authors edited and approved the final version of the manuscript. The corresponding author has final responsibility to submit for publication. Preliminary results were presented by Fulvio Ricceri at the GRELL meeting 2006 in Palma de Mallorca, where they were awarded the "Enrico Anglesio Prize" offered by the "Anglesio/Moroni Foundation" of Turin, Italy. We thank the researchers and professors of the Me.Ri.Ma. group of the University of Turin (Department of Mathematics) who shared their ideas with us and gave us their time and comments.