Development and application of a dynamic prediction model for esophageal cancer
Original Article


Kunpeng Du1,2#^, Lixian Li3#, Qi Wang2, Jingwen Zou4, Zhongjian Yu1,2, Jiqiang Li5, Yanfang Zheng1,2^

1Affiliated Cancer Hospital & Institute of Guangzhou Medical University, Guangzhou, China; 2Oncology Center, Zhujiang Hospital of Southern Medical University, Guangzhou, China; 3Department of Medical Matters, Puning People’s Hospital, Puning, China; 4Department of Liver Surgery, Sun Yat-sen University Cancer Center, Guangzhou, China; 5Department of Radiation Oncology, Oncology Center, Zhujiang Hospital of the Southern Medical University, Guangzhou, China

Contributions: (I) Conception and design: K Du, L Li, Y Zheng; (II) Administrative support: None; (III) Provision of study materials or patients: Z Yu, Q Wang; (IV) Collection and assembly of data: K Du, J Zou; (V) Data analysis and interpretation: L Li; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work.

^ORCID: Kunpeng Du, 0000-0002-0684-7291; Yanfang Zheng, 0000-0002-5591-6425.

Correspondence to: Jiqiang Li. Department of Radiation Oncology, Oncology Center, Zhujiang Hospital of the Southern Medical University, Guangzhou, China. Email: ljq821028@126.com; Yanfang Zheng. Affiliated Cancer Hospital & Institute of Guangzhou Medical University, Guangzhou, China. Email: 18665000236@163.com.

Background: Current prediction models of esophageal cancer (EC) are limited to predicting at a specific time point and ignore changes over time in the hazard ratios of predictive variables, known as time-varying effects. Our study aimed to investigate variables with time-varying effects in EC and to develop a prediction model that can update the predicted 5-year dynamic overall survival (DOS) probability during the follow-up period.

Methods: First, clinicopathological information and survival data of 4,541 patients with EC diagnosed between 2007 and 2011 were obtained from the Surveillance, Epidemiology, and End Results (SEER) database for modeling. Second, the time-varying effects of the variables were assessed, and the dynamic prediction model was developed based on the proportional baselines landmark supermodel (PBLS).

Results: Age at diagnosis, sex, location of the primary tumor, histological type, chemotherapy, surgery, and T stage showed significant time-varying effects on overall survival. The prediction model was validated in an internal SEER validation cohort and in a Chinese patient cohort, achieving 3-year areas under the curve (AUCs) of 0.733 (internal validation) and 0.864 (external validation). The heuristic shrinkage factor was 0.995. Finally, several illustrative cases were selected to map patients' 5-year DOS curves and to demonstrate the impact of the time-varying effects of different variables on survival.

Conclusions: Overall, our results suggest that the presence of time-varying effects makes it important to update the predicted survival probability during the follow-up period. This prediction model can help doctors make more individualized treatment decisions based on a dynamic assessment of patient prognosis.

Keywords: Esophageal cancer (EC); cancer prognosis; the proportional baselines landmark supermodel; dynamic prediction; Surveillance, Epidemiology, and End Results (SEER) database


Submitted Aug 13, 2021. Accepted for publication Oct 20, 2021.

doi: 10.21037/atm-21-4964


Introduction

Esophageal cancer (EC) is a malignant tumor with an extremely poor prognosis, with a 5-year overall survival (OS) rate of approximately 20% (1). The pathological types of EC differ greatly across countries in their male-to-female ratio, time trends, geographic patterns, and primary risk factors (2-4). These diverse characteristics can interact with survival outcomes, making it difficult to estimate individual prognosis. There is therefore an urgent need for accurate survival prediction tools that account for patient heterogeneity, to help clinicians predict individual survival and make treatment recommendations. A recent systematic review identified at least 15 prediction models for EC patients published between January 1, 2000 and February 6, 2017 (5). However, most of these models were developed using the Cox proportional hazards model, which assumes that the hazard ratio of each covariate does not change with time (6). Several studies have found that some prognostic variables exhibit time-varying effects on the outcome, so that their association with mortality risk changes over time (7-12). Predictions from such models could therefore be misleading when covariates have time-varying effects.

In addition, currently available prediction models cannot answer a practical question in EC care. A patient may be most interested in the probability of surviving (or dying) within "w" years after a cancer diagnosis, often phrased as "How long will I live?" or "What is the probability of being alive 'w' years from now?" These questions are asked not only at diagnosis but also at any follow-up (FU) visit. Most existing prediction models ignore this issue because they were built only on the patient's baseline status at diagnosis or at a specific time during treatment, and they yield a single 3- or 5-year survival rate. They cannot calculate survival probabilities at different time points and therefore cannot reflect how a patient's survival probability changes during follow-up. To address this, van Houwelingen et al. proposed a dynamic prediction model based on the proportional baselines landmark supermodel (PBLS), which accounts for time-varying effects and updates survival probabilities over time (13,14). The predicted 5-year OS probability obtained in this way is known as the 5-year dynamic overall survival (DOS). In previous research, we compared the PBLS with the Cox proportional hazards model for constructing a cervical cancer prediction model in the presence of time-varying effects, and found that the PBLS was preferable for predicting a patient's w-year dynamic survival rate (15). To the best of our knowledge, no dynamic prediction model based on the PBLS has previously been developed for patients with EC.

The aim of this research was to explore covariates with time-varying effects in EC and to develop a broadly applicable and accurate prediction model that can dynamically predict survival probabilities for patients with EC throughout the follow-up period. To this end, a prediction model with time-varying effects was developed and internally validated using the Surveillance, Epidemiology, and End Results (SEER) database, and an independent Chinese EC patient cohort was used for external validation. The resulting model predicts an individual patient's 5-year survival probability at different prediction time points up to 5 years after EC diagnosis. Specific patient examples are used to illustrate how predicted survival probabilities vary during follow-up and how the model can assist clinicians in their medical practice. Compared with previous models, the main innovation of ours is that it accounts for variables with time-varying effects, which enables it to dynamically predict survival probabilities at different time points during the follow-up period. We present the following article in accordance with the TRIPOD reporting checklist (available at https://dx.doi.org/10.21037/atm-21-4964).


Methods

Data source

The SEER database is a population-based cancer database that covers approximately 28% of the U.S. population. Patient information, including demographics, clinical characteristics, pathological features, treatment, and survival data, was downloaded from the SEER 18 Regs Custom Data (with additional treatment), November 2018 Submission (1975–2016), using SEER*Stat version 8.3.6. Data on 19,362 patients with microscopically confirmed EC diagnosed between January 2007 and December 2011 were extracted. Only histology codes for squamous cell carcinoma (ICD-O-3 histology codes: 8000-8046, 8051-8131, 8148-8157, 8230-8249, 8508, 8510-8513, 8560-8570, 8575, 8950, 8980-8981) and adenocarcinoma (codes: 8050, 8140-8147, 8160-8162, 8170-8175, 8180-8221, 8250-8507, 8514-8551, 8576, 8940-8941, 8140-8573) were included in the research.

Cohort selection

Baseline patient- and tumor-specific factors considered for the model were age at diagnosis, marital status, race, sex, histological type, primary tumor site, grade, T stage, N stage, M stage, surgery of the primary site, radiation, and chemotherapy; survival months and vital status were extracted to define the outcome. The study period was restricted to 2007–2011, during which tumors were staged according to the sixth edition of the American Joint Committee on Cancer (AJCC) Tumor Node Metastasis (TNM) staging criteria.

The exclusion criteria were as follows: (I) diagnosis not confirmed by positive histology (N=904); (II) EC was not the first malignant primary tumor (N=4,992); (III) reporting source was an autopsy, hospice, death certificate, or nursing home (N=204); (IV) unknown marital status (N=648); (V) unknown or American Indian/Alaska Native race (N=87); (VI) primary tumor site code C15.1 or C15.9 (N=1,484); (VII) histologic type other than adenocarcinoma or squamous cell carcinoma (N=26); (VIII) missing histological grade (N=1,832); (IX) unknown (N=1,869) or T0 (N=4) T stage; (X) no specific N and M stages (N=384); (XI) surgery primary site code 10–27 (local tumor excision, N=23); (XII) surgery primary site code 90 (NOS, N=23) or 99 (unknown, N=9); (XIII) radiation coded as radioisotopes (N=1); and (XIV) unknown survival time or survival of <3 months (N=1,302). Finally, 5,423 patients were eligible for inclusion in this study. These patients were randomly divided into a training cohort (N=4,541) and an internal validation cohort (N=882) at a ratio of approximately 5:1. The screening process is presented in detail in Figure 1.

Figure 1 Flow chart of patient selection from the SEER database. Data on 19,362 patients with esophageal cancer diagnosed between 2007 and 2011 were downloaded from the SEER database. After screening, 5,423 eligible patients were included in this study and randomly divided into a training cohort (N=4,541) and an internal validation cohort (N=882) at a ratio of approximately 5:1. SEER, the Surveillance, Epidemiology, and End Results database.

The eligible data were defined, integrated, and grouped. First, patients were divided by age into five groups: <50, 50–59, 60–69, 70–79, and ≥80 years. Patients who were separated, divorced, single (never married), or widowed at diagnosis were combined into the unmarried group, and married patients (including common-law marriages) formed the married group. Tumor sites were divided into four groups: upper third of the esophagus (C15.0, C15.3), middle third of the esophagus (C15.4), lower third of the esophagus (C15.2, C15.5), and overlapping lesion of the esophagus (C15.8). Patients were grouped into radiotherapy and no radiotherapy/unknown groups according to their radiotherapy treatment. Because the surgery of primary site variable records whether the patient had undergone surgery and at which site, patients were divided into surgery (codes 30–80) and no surgery (code 0) groups.
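The grouping rules above can be expressed as a few small recoding functions. This is a minimal sketch under stated assumptions: the function names, the lowercase marital-status strings, and the dictionary of topography codes are hypothetical illustrations, not SEER*Stat variable names or the authors' code.

```python
# Hypothetical recoding helpers mirroring the grouping rules described above.
# Input field names and string values are illustrative assumptions, not SEER codes.

def age_group(age: int) -> str:
    """Five age bands: <50, 50-59, 60-69, 70-79 and 80+ (80 or older)."""
    if age < 50:
        return "<50"
    if age < 60:
        return "50-59"
    if age < 70:
        return "60-69"
    if age < 80:
        return "70-79"
    return "80+"

MARRIED = {"married", "common-law"}
UNMARRIED = {"separated", "divorced", "single (never married)", "widowed"}

def marital_group(status: str) -> str:
    s = status.strip().lower()
    if s in MARRIED:
        return "married"
    if s in UNMARRIED:
        return "unmarried"
    raise ValueError(f"unexpected marital status: {status!r}")  # unknown was excluded

# ICD-O-3 topography -> site group; C15.1 and C15.9 were excluded upstream
SITE_GROUPS = {
    "C15.0": "upper", "C15.3": "upper",
    "C15.4": "middle",
    "C15.2": "lower", "C15.5": "lower",
    "C15.8": "overlapping",
}

def surgery_group(code: int) -> str:
    """Surgery of primary site: 0 -> no surgery, 30-80 -> surgery."""
    if code == 0:
        return "no surgery"
    if 30 <= code <= 80:
        return "surgery"
    raise ValueError(f"surgery code {code} falls in an excluded range")  # 10-27, 90, 99
```

Raising an error for excluded codes, rather than silently mapping them, is a design choice that makes any record that slipped through the exclusion criteria visible.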

A retrospective Chinese patient cohort of 99 EC patients from the Zhujiang Hospital of the Southern Medical University (Guangzhou, Guangdong Province, China) between January 2004 and September 2010 was used to externally validate the dynamic prediction model. The inclusion and exclusion criteria were identical to the screening criteria applied to the SEER database. All procedures performed in this study involving human participants were in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Medical Ethics Committee of Zhujiang Hospital, Southern Medical University, Guangzhou, China (ethics committee approval number: 2020-KY-001-01). Individual consent for this retrospective analysis was waived. The use of SEER data does not require additional informed consent because patient privacy is protected by the SEER cancer registries.

Statistical analysis

All-cause mortality (death from any cause) served as the primary endpoint in this study. Survival time was measured in years from the date of diagnosis until (I) the date of death, (II) the date last known to be alive, or (III) December 31, 2016, whichever came first.
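The endpoint definition translates into a (time, event) pair per patient. The sketch below is illustrative only: the function name is hypothetical, and converting days to years with 365.25 days per year is an assumption, since the text does not state the conversion used.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_CUTOFF = date(2016, 12, 31)  # end of follow-up stated in the text

def overall_survival(diagnosis: date,
                     death: Optional[date],
                     last_alive: Optional[date],
                     cutoff: date = STUDY_CUTOFF) -> Tuple[float, int]:
    """Return (survival time in years, event) for the all-cause mortality endpoint.

    event = 1 for a death on or before the cutoff; otherwise 0 (censored at the
    last date known alive, truncated at the cutoff).
    """
    if death is not None and death <= cutoff:
        end, event = death, 1
    else:
        end = min(last_alive or cutoff, cutoff)
        event = 0
    years = (end - diagnosis).days / 365.25  # assumed average year length
    return years, event
```

A death recorded after December 31, 2016 is treated as censored at the cutoff, which is what "whichever came first" implies.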

Categorical data were presented as frequencies (percentages). Kaplan-Meier curves of OS were compared using the log-rank test. Univariable and multivariable Cox proportional hazards (PH) models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs). The PH assumption was checked using the Grambsch-Therneau test. A PBLS (for more details, see Supplemental method I) was established to obtain the 5-year DOS. First, the prediction window was fixed at w=5, where w is the horizon in the patient's question "What is the probability of being alive w years from now?", which may be asked at any prediction time point $s \in [s_1, s_L]$. Next, landmark prediction time points were selected every 3 months between 0 and 5 years after the diagnosis of EC (see the blue circles and the yellow parts in Figure S1), each rescaled to the interval [0, 1]:

$$\left\{ s_l = \frac{s}{s_L - s_1} = \frac{s}{5},\ l = 1, 2, \ldots, 21 \right\} = \{s_1, s_2, s_3, \ldots, s_{20}, s_{21}\} = \left\{ \frac{0}{5}, \frac{0.25}{5}, \frac{0.5}{5}, \ldots, \frac{4.75}{5}, \frac{5}{5} \right\}$$

A Cox PH model for 5-year OS at a specific s was then estimated on the subset of patients who were still alive at s, with administrative censoring at s + w (see the red circles and the blue parts in Figure S1). A super prediction data set (for construction, see Supplemental method II) was then built by stacking all of these landmark subsets (the number of patients in each landmark subset is shown in Figure S2). The following model was constructed for our study, where $\beta(s) = \gamma_0 + \gamma_1 s + \gamma_2 s^2$ and $\theta(s) = \theta_1 s + \theta_2 s^2$:

$$h(t \mid Z, s, w) = h_0(t)\exp\{Z\beta(s) + \theta(s)\}, \quad s \le t \le s + w$$
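The landmarking and stacking steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Patient` record, the function names, and the column names are hypothetical. At each landmark s (every 0.25 years from 0 to 5), only patients still under follow-up at s are kept, follow-up is administratively censored at s + w, and the rescaled time s/5 is stored so that the Z × s and Z × s² interaction columns of β(s), plus the s and s² terms of θ(s), can be added. One common way to fit the resulting stacked data in R is `survival::coxph` on counting-process data, `Surv(tstart, tstop, event)`, with `cluster(pid)` for robust standard errors; the authors' exact call is not reported.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Patient:
    pid: int
    time: float             # years from diagnosis to death or censoring
    event: int              # 1 = death, 0 = censored
    covs: Dict[str, float]  # numeric (dummy-coded) baseline covariates

def landmark_times(start: float = 0.0, end: float = 5.0, step: float = 0.25) -> List[float]:
    n = int(round((end - start) / step))
    return [start + i * step for i in range(n + 1)]  # 21 points: 0, 0.25, ..., 5

def build_super_dataset(patients: List[Patient], w: float = 5.0,
                        s_max: float = 5.0, step: float = 0.25) -> List[dict]:
    rows = []
    for s in landmark_times(0.0, s_max, step):
        for p in patients:
            if p.time <= s:  # no longer at risk at landmark s: not in this subset
                continue
            horizon = s + w
            rows.append({
                "pid": p.pid,
                "landmark": s,
                "s_scaled": s / s_max,                     # s_l = s / 5
                "tstart": s,
                "tstop": min(p.time, horizon),             # administrative censoring at s + w
                "event": p.event if p.time <= horizon else 0,
                **p.covs,
            })
    return rows

def add_interactions(rows: List[dict], names: List[str]) -> List[dict]:
    """Add Z*s and Z*s^2 columns for beta(s), plus s and s^2 for theta(s)."""
    for r in rows:
        x = r["s_scaled"]
        r["s"], r["s2"] = x, x * x
        for n in names:
            r[n + ":s"] = r[n] * x
            r[n + ":s2"] = r[n] * x * x
    return rows
```

The same patient appears in many landmark subsets, which is why a clustered (robust) variance estimator is usually paired with the stacked fit.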

A backward selection procedure was then used to select covariates with time-varying effects in two steps. First, all interactions between the covariates and quadratic s ($Z \times s^2$) were tested, and non-significant terms were removed. Second, interactions between the covariates and linear s ($Z \times s$) were tested, and only significant effects were retained. The w-year dynamic HR at different time points could then be calculated using the following equation:

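$$\mathrm{HR}_w(s_l) = \exp(\gamma_0 + \gamma_1 s_l + \gamma_2 s_l^2) = \exp\{\gamma_0 + \gamma_1 \times (s/5) + \gamma_2 \times (s/5)^2\}, \quad s \in [0, 5]$$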
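The two-step backward selection can be sketched as plain control flow around a model-fitting routine. This is an assumption-laden illustration: `fit` is a hypothetical callable that refits the landmark supermodel with a given set of interaction terms and returns one joint p value per (covariate, power of s), so a multi-level covariate such as T stage is tested as a whole. Retaining a linear term whenever its quadratic term survives is an assumed hierarchy rule. It is consistent with Table 2, where age keeps a non-significant s term next to a significant s² term, but the text does not state the rule explicitly.

```python
from typing import Callable, Dict, List, Set, Tuple

Term = Tuple[str, int]  # (covariate name, power of s): 1 -> Z x s, 2 -> Z x s^2
# Hypothetical fitting callable: terms -> {term: joint p value}; not a real library API.
Fit = Callable[[Set[Term]], Dict[Term, float]]

def select_time_varying(covariates: List[str], fit: Fit, alpha: float = 0.05) -> Set[Term]:
    # Step 1: start from all Z x s and Z x s^2 terms; drop non-significant s^2 terms.
    terms: Set[Term] = {(c, p) for c in covariates for p in (1, 2)}
    pvals = fit(terms)
    terms = {t for t in terms if t[1] == 1 or pvals[t] < alpha}
    # Step 2: refit and test the linear Z x s terms; keep a linear term if it is
    # significant or (assumed hierarchy rule) if its s^2 term was retained.
    pvals = fit(terms)
    keep: Set[Term] = set()
    for c, p in terms:
        if p == 2 or (c, 2) in terms or pvals[(c, 1)] < alpha:
            keep.add((c, p))
    return keep
```

Passing the fitting step in as a function keeps the selection logic independent of whichever Cox implementation is used to fit the stacked data.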

The performance of the model was evaluated in terms of both discrimination and calibration. Discrimination, the model's ability to distinguish patients who died from those who survived, was evaluated using the area under the curve (AUC). Calibration was evaluated using the heuristic shrinkage factor. All analyses were performed using R software (version 3.6.1) (https://www.r-project.org/), and the significance level was set at 0.05.
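The two performance measures can be illustrated with small helpers. The heuristic shrinkage factor below follows the van Houwelingen and le Cessie form, (model likelihood-ratio χ² − degrees of freedom) / model χ², where values near 1 suggest little overfitting. The text does not report the χ² or degrees of freedom, so any inputs are placeholders. The AUC function is a deliberately simplified cumulative/dynamic AUC at a horizon t: cases died by t, controls survived beyond t, and patients censored before t are simply dropped, with no inverse-probability-of-censoring weighting. It is not necessarily the estimator the authors used, which the text does not specify.

```python
from typing import Sequence

def heuristic_shrinkage(lr_chi2: float, df: int) -> float:
    """van Houwelingen-le Cessie heuristic shrinkage: (chi2 - df) / chi2."""
    if lr_chi2 <= 0:
        raise ValueError("model likelihood-ratio chi-square must be positive")
    return (lr_chi2 - df) / lr_chi2

def naive_cd_auc(time: Sequence[float], event: Sequence[int],
                 risk: Sequence[float], t: float) -> float:
    """Simplified time-dependent AUC at horizon t (no censoring weights)."""
    cases = [r for ti, ei, r in zip(time, event, risk) if ti <= t and ei == 1]
    controls = [r for ti, r in zip(time, risk) if ti > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at t")
    score = 0.0
    for rc in cases:
        for rk in controls:
            if rc > rk:
                score += 1.0   # concordant pair: higher risk for the case
            elif rc == rk:
                score += 0.5   # tied risk scores count half
    return score / (len(cases) * len(controls))
```

With heavy censoring, the dropped patients can bias this naive estimate, which is why weighted estimators are normally preferred in practice.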


Results

Patient characteristics

A total of 5,423 EC patients from the SEER database were included in the analyses: 4,541 randomly assigned patients formed the training cohort for model development, and 882 patients formed the internal validation cohort. The median follow-up time for the training cohort was 16.00 (95% CI: 15.31–16.69) months (range, 4–119 months), and the 3- and 5-year survival rates of the training cohort were 27.99% (95% CI: 26.71–29.32%) and 20.28% (95% CI: 19.14–21.49%), respectively. The median follow-up time for the internal validation cohort was 18.00 (95% CI: 16.19–19.81) months (range, 4–119 months), and its 3- and 5-year survival rates were 28.53% (95% CI: 25.70–31.68%) and 21.01% (95% CI: 18.46–23.90%), respectively. A total of 99 patients from a Chinese patient cohort formed the external validation cohort. Their median follow-up time was 13.00 (95% CI: 10.94–15.07) months (range, 2–98 months), and the 3- and 5-year survival rates of this cohort were 15.15% (95% CI: 9.51–24.15%) and 9.26% (95% CI: 4.78–17.96%), respectively. The OS curves of the three cohorts are shown in Figure 2. The baseline demographics and tumor characteristics of the included patients are presented in Table 1. The Kaplan-Meier survival curves (Figure S3) for age at diagnosis, sex, T stage, chemotherapy, and radiotherapy crossed, suggesting non-proportional hazards. Moreover, the Cox PH model (Table S1) did not satisfy the PH assumption that the HR is constant over time. We therefore used the PBLS to obtain the 5-year DOS.
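Survival rates at fixed times, such as the 3- and 5-year rates above, are read off Kaplan-Meier curves. The following is a minimal product-limit sketch without confidence intervals; the function names are illustrative, and the CIs reported in the text would require an additional variance formula (for example, Greenwood's).

```python
from typing import List, Sequence, Tuple

def kaplan_meier(time: Sequence[float], event: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: returns (t, S(t)) at each distinct death time."""
    data = sorted(zip(time, event))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # group all subjects sharing this time; deaths and censorings at the
        # same time are both counted in the risk set at t (usual convention)
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 1:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= deaths + censored
    return curve

def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """S(t) read off the right-continuous step curve (1.0 before the first death)."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s
```

For example, `survival_at(kaplan_meier(times, events), 3.0)` would give an estimated 3-year survival rate when times are measured in years.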

Figure 2 Overall survival curves of esophageal cancer patients in three cohorts. The red line shows the overall survival of patients in the SEER training cohort, and the corresponding red shaded area is the confidence interval. The green line shows the overall survival of patients in the SEER internal validation cohort, and the corresponding green shaded area is the confidence interval. The blue line shows the overall survival of patients in the external validation cohort from Zhujiang Hospital, and the corresponding blue shaded area is the confidence interval. The number of patients at risk over time in each cohort is shown in the corresponding color at the bottom of the figure. SEER, the Surveillance, Epidemiology, and End Results database.

Table 1

Baseline characteristics of cohort patients

Characteristics Training cohort (N=4,541) Internal validation (N=882) External validation (N=99)
n % n % n %
Age
   <50 years 416 9.16 79 8.96 1 1.01
   50–59 years 1,105 24.33 206 23.36 42 42.42
   60–69 years 1,572 34.62 311 35.26 33 33.33
   70–79 years 1,008 22.20 190 21.54 13 13.13
   80+ years 440 9.69 96 10.88 2 2.02
Sex
   Male 823 18.12 136 15.42 80 80.81
   Female 3,718 81.88 746 84.58 19 19.19
Marriage status
   Married 2,807 61.81 561 63.61 99 100.00
   Unmarried 1,734 38.19 321 36.39 0 0.00
Race
   Asian or Pacific islander 205 4.51 35 3.97 99 100.00
   Black 423 9.32 74 8.39 0 0.00
   White 3,913 86.17 773 87.64 0 0.00
Histological types
   Squamous cell carcinoma 1,409 31.03 260 29.48 91 91.92
   Adenocarcinoma 3,132 68.97 622 70.52 8 8.08
Primary tumor site
   Upper third of esophagus 289 6.36 44 4.99 10 10.10
   Middle third of esophagus 703 15.48 126 14.29 40 40.40
   Lower third of esophagus 3,375 74.32 679 76.98 20 20.20
   Overlapping lesion 174 3.83 33 3.74 29 29.29
Grade
   Grade I 261 5.75 45 5.10 6 6.06
   Grade II 1,926 42.41 380 43.08 62 62.63
   Grade III/grade IV 2,354 51.84 457 51.81 31 31.31
T stage
   T1 1,366 30.08 240 27.21 3 3.03
   T2 607 13.37 127 14.40 11 11.11
   T3 1,970 43.38 412 46.71 63 63.64
   T4 598 13.17 103 11.68 22 22.22
N stage
   N0 1,953 43.01 392 44.44 39 39.39
   N1 2,588 56.99 490 55.56 60 60.61
M stage
   M0 3,302 72.72 658 74.60 79 79.80
   M1 1,239 27.28 224 25.40 20 20.20
Surgery
   Yes 1,795 39.53 352 39.91 68 68.69
   No/unknown 2,746 60.47 530 60.09 31 31.31
Chemotherapy
   Yes 3,520 77.52 670 75.96 69 69.70
   No/unknown 1,021 22.48 212 24.04 30 30.30
Radiation
   Yes 3,185 70.14 597 67.69 43 43.43
   No/unknown 1,356 29.86 285 32.31 56 56.57

Variables with time-constant and time-varying effects

Regression coefficients and HRs with 95% CIs for the variables included in the model are presented in Table 2 and Figure 3, which also identify the variables with time-constant and time-varying effects on the 5-year DOS. Marital status, race, grade, N stage, M stage, and radiotherapy had time-constant effects (Table 2, Figure 3): their HRs were the same at every time point during the follow-up period (Figure 3B,3C,3G,3I,3J,3M). For instance, the HR for unmarried versus married patients was 1.156 (95% CI: 1.061–1.260) at the time of EC diagnosis (0 years) and remained 1.156 throughout the following 5 years, consistent with a time-constant effect (Figure 3B).

Table 2

Covariates with time-constant and time-varying effects in the dynamic prediction model

Covariates Regression coefficient Hazard ratio (95% CI) P value
Covariates with time-constant effects
   Marriage status (ref: married)
    Unmarried 0.145 1.156 1.061–1.260 0.001
   Race (ref: White)
    Black 0.172 1.187 1.024–1.377 0.023
    Asian or Pacific Islander −0.195 0.823 0.675–1.004 0.055
   Grade (ref: grade I)
    Grade II 0.187 1.206 1.005–1.447 0.044
    Grade III/grade IV 0.353 1.424 1.187–1.708 <0.001
   N stage (ref: N0)
    N1 0.238 1.269 1.162–1.386 <0.001
   M stage (ref: M0)
    M1 0.391 1.479 1.332–1.642 <0.001
   Radiation (ref: yes)
    No/unknown −0.019 0.981 0.880–1.094 0.735
Covariates with time-varying effects
   Age at diagnosis (ref: per 10 years)
    Constant
      Age 0.075 1.078 1.042–1.116 <0.001
    Time-varying effect
      Age (s) −0.111 0.895 0.715–1.121 0.334
      Age (s2) 0.450 1.568 1.161–2.119 0.003
   Sex (ref: female)
    Constant
      Male 0.175 1.192 1.066–1.332 0.002
    Time-varying effect
      Male (s) 0.664 1.943 1.287–2.935 0.002
   Histological types (ref: adenocarcinoma)
    Constant
      Squamous cell carcinoma −0.020 0.980 0.879–1.092 0.715
    Time-varying effect
      Squamous cell carcinoma (s) −0.815 0.443 0.257–0.764 0.003
      Squamous cell carcinoma (s2) 0.892 2.441 1.331–4.476 0.004
   Primary tumor site (ref: lower)
    Constant
      Upper −0.076 0.926 0.779–1.101 0.387
      Middle 0.018 1.018 0.894–1.159 0.788
      Overlapping 0.296 1.344 1.110–1.628 0.002
    Time-varying effect
      Upper (s) 0.238 1.269 0.660–2.439 0.475
      Middle (s) 0.563 1.756 1.170–2.636 0.007
      Overlapping (s) −0.087 0.917 0.355–2.366 0.857
   Chemotherapy (ref: yes)
    Constant
      No/unknown 0.216 1.241 1.107–1.391 <0.001
    Time-varying effect
      No/unknown (s) −1.379 0.252 0.138–0.460 <0.001
      No/unknown (s2) 1.135 3.111 1.589–6.093 0.001
   Surgery (ref: yes)
    Constant
      No 0.863 2.370 2.152–2.611 <0.001
    Time-varying effect
      No (s) −0.738 0.478 0.346–0.661 <0.001
   T stage (ref: T1)
    Constant
      T2 −0.057 0.945 0.829–1.076 0.393
      T3 0.199 1.220 1.103–1.349 <0.001
      T4 0.300 1.350 1.184–1.539 <0.001
    Time-varying effect
      T2 (s) 0.598 1.818 1.191–2.776 0.006
      T3 (s) 0.282 1.326 0.917–1.919 0.134
      T4 (s) 0.417 1.517 0.888–2.592 0.127
   Prediction time (ref: years since start of diagnosis)
    s 2.648 14.128 2.861–69.756 0.001
    s2 −4.016 0.018 0.004–0.074 <0.001
Figure 3 HRs with 95% confidence intervals in the dynamic prediction by the PBLS model. The colored curves show how each variable's HR changes over time, and the colored shaded areas indicate the confidence interval of the HR. The HRs of age at diagnosis, sex, primary tumor site, histologic type, T stage, surgery, and chemotherapy changed at each successive prediction time point (s). (A) Time-varying HR for age. Red: per 10-year range/earlier 10-year range. (D) Time-varying HR for sex. Red: male/female. (E) Time-varying HR for tumor primary site. Pink: upper third of esophagus/lower third of esophagus; green: middle third of esophagus/lower third of esophagus; orange: overlapping site of esophagus/lower third of esophagus. (F) Time-varying HR for histologic type. Red: ESCC/EAC. (H) Time-varying HR for AJCC T stage. Pink: T2/T1; green: T3/T1; orange: T4/T1. (K) Time-varying HR for surgery. Red: no surgery/surgery. (L) Time-varying HR for chemotherapy. Red: no chemotherapy/chemotherapy. (B,C,G,I,J,M) HRs of marriage status, race, grade, N stage, M stage, and radiotherapy were constant regardless of time point during the follow-up period, indicating time-constant effects. PBLS, proportional baselines landmark supermodel; ESCC, esophageal squamous cell carcinoma; EAC, esophageal adenocarcinoma; AJCC, American Joint Committee on Cancer; HR, hazard ratio.

In contrast, age at diagnosis, sex, primary tumor site, histologic type, AJCC T stage, surgery, and chemotherapy showed significant time-varying effects on the 5-year DOS, with HRs that changed at each successive s (Figure 3A,3D,3E,3F,3H,3K,3L). For example, at s = 0 (the time of diagnosis), the HR for a patient without chemotherapy compared with a patient with chemotherapy (Yes) was 1.241, calculated from the Table 2 coefficients as $\mathrm{HR}_5(0) = \exp\{0.216 - 1.379 \times (0/5) + 1.135 \times (0/5)^2\} = 1.241$. This value decreased to 0.986 after 1 year of follow-up ($\mathrm{HR}_5(1) = \exp\{0.216 - 1.379 \times (1/5) + 1.135 \times (1/5)^2\} = 0.986$) and to 0.816 after 3 years, and then rose to 0.972 after 5 years (Figure 3L). The HRs of age, sex, primary tumor site, histologic type, surgery, and AJCC T stage likewise varied over time.
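The worked chemotherapy example can be reproduced directly from the dynamic HR formula in the Methods. The sketch below uses the Table 2 coefficients for chemotherapy (no/unknown vs. yes: constant 0.216, s term −1.379, s² term 1.135); the function name `dynamic_hr` is illustrative, and prediction time s is rescaled by 5 years as in the text.

```python
import math

def dynamic_hr(g0: float, g1: float = 0.0, g2: float = 0.0,
               s: float = 0.0, s_max: float = 5.0) -> float:
    """w-year dynamic HR: exp(g0 + g1*(s/5) + g2*(s/5)^2) at prediction time s (years)."""
    x = s / s_max
    return math.exp(g0 + g1 * x + g2 * x * x)

# Chemotherapy (no/unknown vs. yes), coefficients from Table 2
CHEMO = (0.216, -1.379, 1.135)
for s in (0, 1, 3, 5):
    print(s, round(dynamic_hr(*CHEMO, s=s), 3))
# prints, one pair per line: 0 1.241, 1 0.986, 3 0.816, 5 0.972
```

Covariates with only a linear time-varying term, such as surgery (0.863 and −0.738 in Table 2), use the same function with `g2` left at 0.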

Internal model validation

The heuristic shrinkage factor was 0.995; a value this close to 1 indicates minimal overfitting and good model calibration. Discriminatory accuracy was assessed with the AUC in the SEER internal validation cohort, giving values of 0.763 (95% CI: 0.745–0.78), 0.746 (95% CI: 0.732–0.760), and 0.733 (95% CI: 0.720–0.745) at 1, 2, and 3 years, respectively, and in the training cohort itself (apparent performance), giving values of 0.784 (95% CI: 0.776–0.791), 0.767 (95% CI: 0.761–0.773), and 0.757 (95% CI: 0.752–0.762) at 1, 2, and 3 years, respectively (Figure 4). Both sets of values reflect satisfactory accuracy.

Figure 4 AUC of the dynamic prediction model. The AUC for 3-year OS was 0.757 (95% CI: 0.752–0.762) in the SEER training cohort (red line). The AUC for 3-year OS was 0.733 (95% CI: 0.720–0.745) in the SEER internal validation cohort (green line). The AUC for 3-year OS was 0.864 (95% CI: 0.825–0.902) in the external validation cohort (blue line). The AUCs for the dynamic prediction model verified by the training set, validation set, and Zhujiang hospital external validation all indicated a good model discriminatory accuracy. SEER, Surveillance, Epidemiology, and End Results; AUC, area under curve; OS, overall survival; CI, confidence interval.

External model validation

A retrospective Chinese patient cohort consisting of 99 EC patients from Zhujiang Hospital of the Southern Medical University (Guangzhou, Guangdong Province, China) between January 2004 and September 2010 was used for external model validation. Discriminatory accuracy was assessed with the AUC, giving values of 0.865 (95% CI: 0.811–0.919), 0.871 (95% CI: 0.827–0.914), and 0.864 (95% CI: 0.825–0.902) at 1, 2, and 3 years, respectively (Figure 4).

Model application

The most important function of the dynamic prediction model is to portray intuitively how a patient's survival probability changes over time, in order to support clinical decision-making. We selected 14 patients as examples to demonstrate the impact of the time-varying effects of different variables on survival and to map their 5-year DOS curves (Figure 5).

Figure 5 Application of the model: changes in the 5-year dynamic survival estimates in 14 example patients. Five-year survival probability estimates for seven pairs of example patients (14 in total), illustrating the impact of the time-varying effects of different variables on survival. (A) Married, White, male, lower third of the esophagus, EAC, grade III, T3N1M0, no surgery, chemotherapy, and radiotherapy; (B) 58 years, unmarried, White, middle third of the esophagus, EAC, grade II, T2N1M0, surgery, chemotherapy, and no radiotherapy; (C) 44 years, married, Asian, female, ESCC, grade I, T1N0M0, surgery, no chemotherapy, and no radiotherapy; (D) 72 years, unmarried, Black, male, overlapping lesion of the esophagus, grade II, T4N1M1, no surgery, chemotherapy, and no radiotherapy; (E) 60 years, married, White, male, upper third of the esophagus, ESCC, grade I, TxN1M1, no surgery, chemotherapy, and no radiotherapy; (F) 55 years, married, Asian, male, middle third of the esophagus, ESCC, grade II, T1N0M0, no chemotherapy, and no radiotherapy; (G) 55 years, married, White, female, lower third of the esophagus, EAC, grade I, T1N1M0, surgery, and no radiotherapy. EAC, esophageal adenocarcinoma; ESCC, esophageal squamous cell carcinoma; AJCC, American Joint Committee on Cancer.

For instance, clinicians often face the question of whether a patient with early EC should receive adjuvant chemotherapy after esophagectomy. In this case, clinicians can use the dynamic survival prediction model to map survival curves under the alternative treatment options to inform decision-making. Figure 5G displays the 5-year survival probabilities for a 55-year-old married White female patient with esophageal adenocarcinoma in the lower third of the esophagus, staged T1N1M0, and treated with esophagectomy. The g1 line shows the 5-year survival probability for this patient if she receives adjuvant chemotherapy after esophagectomy, whereas the g2 line shows her survival probability without adjuvant chemotherapy. The model predicts a higher survival probability with postoperative adjuvant chemotherapy in the early follow-up phase (prediction time point 0–1 years), but a lower survival probability later in the follow-up period. This example shows how the model can assist doctors in developing individualized treatment strategies for patients.

The latest National Comprehensive Cancer Network (NCCN) clinical practice guidelines for EC recommend radical surgery for patients with early EC. However, many patients refuse surgery for various reasons. In such cases, the dynamic survival curves from this prediction model can be used for patient education. For example, Figure 5F shows a 55-year-old married Asian male patient with squamous cell carcinoma in the middle third of the esophagus, staged T1N0M0. The f1 line shows the 5-year survival probability for this patient if he undergoes radical surgery, whereas the f2 line shows his survival probability without radical surgery. The 5-year dynamic survival curves suggest that, in the early follow-up phase, the predicted 5-year survival rate after radical surgery is markedly higher than after refusing surgery. Although the gap narrows over time, this still underscores the importance of radical surgery for patients with early EC. Several additional examples are shown in Figure 5 but are not described in the text; detailed patient information is provided in Table S2.

In clinical practice, this model can be used to predict the 5-year survival probabilities of EC patients at different time points during the follow-up period, and to map patient-specific dynamic survival curves that assist clinicians in their practice.
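Producing one point on a patient-specific DOS curve amounts to evaluating the PBLS at a landmark s. Under the model in the Methods, the predicted probability of surviving to s + w given survival to s is exp(−[H0(s + w) − H0(s)] · exp(Zβ(s) + θ(s))), where H0 is the baseline cumulative hazard on the time-since-diagnosis scale. The sketch below is illustrative only: the paper does not report H0, so the step-function values in any call are hypothetical, and the function and argument names are assumptions. Comparing two curves such as g1 and g2 in Figure 5G corresponds to evaluating this function with one treatment dummy switched between 0 and 1 across a grid of s values.

```python
import bisect
import math
from typing import Dict, Sequence, Tuple

def step_value(times: Sequence[float], values: Sequence[float], t: float) -> float:
    """Right-continuous step function: value at the last jump time <= t (0 before)."""
    i = bisect.bisect_right(times, t)
    return values[i - 1] if i > 0 else 0.0

def dynamic_survival(z: Dict[str, float],
                     coefs: Dict[str, Tuple[float, float, float]],
                     theta: Tuple[float, float],
                     h0_times: Sequence[float], h0_values: Sequence[float],
                     s: float, w: float = 5.0, s_max: float = 5.0) -> float:
    """Predicted P(alive at s + w | alive at s) under the PBLS (hypothetical inputs)."""
    x = s / s_max                                   # rescaled prediction time s/5
    lp = theta[0] * x + theta[1] * x * x            # theta(s)
    for name, value in z.items():
        g0, g1, g2 = coefs[name]                    # beta(s) = g0 + g1*x + g2*x^2
        lp += value * (g0 + g1 * x + g2 * x * x)
    d_h0 = step_value(h0_times, h0_values, s + w) - step_value(h0_times, h0_values, s)
    return math.exp(-d_h0 * math.exp(lp))
```

Evaluating the function at s = 0, 0.25, ..., 5 for a fixed covariate profile traces out the 5-year DOS curve for that patient.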


Discussion

To the best of our knowledge, few prediction models for EC can dynamically predict 5-year OS at a specific time point during follow-up after diagnosis. The main highlight of this model is that it accounts for prognostic variables with time-varying effects, namely age, sex, primary tumor site, histologic type, chemotherapy, surgery, and AJCC T stage. Incorporating these time-varying effects allows the model to adjust the HRs of the prognostic variables, and thereby the patient's predicted survival probability, at different time points. Most importantly, the model can predict a patient's 5-year survival probability at any prediction time point during follow-up, which makes its predictions more accurate and more practical.

Several EC prediction models already exist, differing in format, prognostic covariates, target populations, and predictive purpose. For example, Eil et al. created a web-based prediction tool to estimate the OS of patients treated with esophagectomy alone or with neoadjuvant chemoradiotherapy followed by esophagectomy (16). Its covariates were sex, T and N classification, histology, the total number of lymph nodes examined, and treatment. Cao et al. used the population-based SEER database to construct a nomogram predicting survival after esophagectomy (17), incorporating age at diagnosis, race, histological type, tumor site and size, grade, T category, N category, and the number of retrieved lymph nodes. Custodio et al. developed a survival prediction model for Caucasian patients with advanced esophagogastric adenocarcinoma receiving first-line chemotherapy (18). Tang et al. developed a model predicting cancer-specific survival for patients initially diagnosed with metastatic EC (mEC) (19), filling a gap in survival prediction for this population.

Numerous studies have shown that some prognostic variables exhibit time-varying effects, meaning that their HRs change over the course of long-term follow-up. These variables include age at diagnosis (20,21), tumor size (8,21,22), lymph node status (8,21), tumor stage (23), histological grade (8,9,24), hormone receptor status (9,24), tumor biomarker level (10), drug exposure, and chemotherapy (21,25). In addition, Fontein et al. demonstrated that high-risk N stage (N2/3), human epidermal growth factor receptor 2 (HER2) positivity, and locoregional recurrence all have time-varying effects in postmenopausal patients with endocrine-sensitive breast cancer, and they developed a dynamic prediction model that can update predictions at different time points (26). Rueten-Budde et al. found that surgical margin and tumor histology have significant time-varying effects on OS and developed a dynamic prediction model for patients with high-grade extremity soft tissue sarcoma (27). However, none of the EC prediction models described above accounts for the possibility that some predictive variables have time-varying effects, and none can update its predictions during follow-up. To date, no EC prediction model both incorporates variables with time-varying effects and dynamically predicts patients' survival probabilities at different time points.

The present study examined how the effects of predictive variables in EC change over time and found significant time-varying effects of age, sex, primary tumor site, histologic type, chemotherapy, surgery, and T stage on OS. We then developed a prediction model based on the PBLS that accounts for these time-varying effects. Many studies have shown that the prognostic effect of age on survival changes during long-term follow-up, which agrees with our finding of a time-varying effect of age (20,21). Several studies have also shown time-varying effects of chemotherapy, consistent with our findings (21,25). To our knowledge, time-varying effects of the remaining five prognostic factors have not previously been reported in EC, and they deserve further investigation. Taken together, these accumulating and interacting time-varying effects change the risk of death of EC patients over time, which is what makes dynamic prediction of survival probabilities possible. Compared with other 'static' prediction models, the main advantage of this model is that it is the first in EC to account for variables with time-varying effects, which enables it to dynamically predict and update survival probabilities at different time points.
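For reference, a proportional baselines landmark supermodel of the kind described by van Houwelingen and Putter (13,14) can be written as below, where s is the landmark time, w the prediction window (here 5 years), x the covariate vector, and H_0 the cumulative baseline hazard. The landmark functions g_1(s) = s/w and g_2(s) = (s/w)^2 are a common choice and are assumed here; this section does not state which functions were used.

```latex
\begin{aligned}
h(t \mid \mathbf{x}, s) &= h_0(t)\,\exp\{\mathbf{x}^{\top}\boldsymbol{\beta}(s) + \theta(s)\}, \qquad s \le t \le s + w,\\
\boldsymbol{\beta}(s) &= \boldsymbol{\beta}_0 + \boldsymbol{\beta}_1 g_1(s) + \boldsymbol{\beta}_2 g_2(s),\\
\theta(s) &= \theta_1 g_1(s) + \theta_2 g_2(s),\\
S(s + w \mid s, \mathbf{x}) &= \exp\bigl[-\{H_0(s + w) - H_0(s)\}\,\exp\{\mathbf{x}^{\top}\boldsymbol{\beta}(s) + \theta(s)\}\bigr].
\end{aligned}
```

Here \boldsymbol{\beta}_1 and \boldsymbol{\beta}_2 carry the time-varying effects (a covariate whose entries in both are zero keeps a constant HR), and \theta(s) lets the baseline hazard differ proportionally across landmarks, which is where the name "proportional baselines" comes from. Because g_1(0) = g_2(0) = 0, the model reduces to \boldsymbol{\beta}(0) = \boldsymbol{\beta}_0 and \theta(0) = 0 at diagnosis.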

Owing to its ability to predict survival probabilities dynamically, this model can play an important role in practice. As mentioned above, when deciding whether a patient with T1N1M0 EC will benefit from postoperative adjuvant chemotherapy, clinicians can use the dynamic prediction model to calculate the 5-year DOS with and without chemotherapy at different time points during follow-up and choose the treatment based on the predicted results. In summary, by updating survival probabilities over time, this dynamic prediction model can make predictions more accurate and can assist clinicians in patient counseling, individualized treatment decision-making, and treatment risk evaluation.

This study has several limitations. The main ones are the retrospective nature of the SEER data and the large amount of missing clinicopathological registration information. In addition, because the model was built on retrospective data, much of the registered information follows earlier classification systems, such as the AJCC TNM 6th edition staging criteria, which differ slightly from those currently used in clinical practice. Moreover, some important variables that may alter patient prognosis during follow-up, such as locoregional and distant recurrence, were not included because this information is not recorded in the SEER database. For the same reason, treatment-related variables such as induction chemoradiotherapy, the quality of esophagectomy, surgical margins, and degree of response to therapy, as well as comorbidity information, were unavailable, which further limits the model. Finally, the model would benefit from being presented in an easy-to-use format, such as a nomogram, web-based calculator, or mobile application.


Conclusions

This study identified variables with time-varying effects in EC and developed a prediction model that predicts survival probabilities at different time points during follow-up. The dynamic prediction model can continuously update a patient's residual risk of death and track changes in survival, thereby assisting clinicians in selecting individualized therapy. This study also underscores the value of using prediction models for clinical guidance not only at diagnosis but also throughout follow-up.


Acknowledgments

Funding: This work was supported by grants from the National Natural Science Foundation of China (No. 81974434), the Natural Science Foundation of Guangdong Province (No. 2020A0505100038), the Science and Technology Program of Guangzhou City (No. 201907010037), the Affiliated Cancer Hospital & Institute of Guangzhou Medical University (No. 2020-YZ-01), and the Clinical Key Specialty Construction Project of Guangzhou Medical University (No. YYPT202017).


Footnote

Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://dx.doi.org/10.21037/atm-21-4964

Data Sharing Statement: Available at https://dx.doi.org/10.21037/atm-21-4964

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://dx.doi.org/10.21037/atm-21-4964). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All procedures performed in this study involving human participants were in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Medical Ethics Committee of Zhujiang Hospital, Southern Medical University, Guangzhou, China (ethics committee approval number: 2020-KY-001-01). Individual consent for this retrospective analysis was waived. Using the SEER data did not require additional informed consent as patient privacy information is protected by the SEER cancer registries.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Jemal A, Ward EM, Johnson CJ, et al. Annual Report to the Nation on the Status of Cancer, 1975-2014, Featuring Survival. J Natl Cancer Inst 2017;109:djx030.
  2. Coleman HG, Xie SH, Lagergren J. The Epidemiology of Esophageal Adenocarcinoma. Gastroenterology 2018;154:390-405.
  3. Abnet CC, Arnold M, Wei WQ. Epidemiology of Esophageal Squamous Cell Carcinoma. Gastroenterology 2018;154:360-73.
  4. Cook MB, Chow WH, Devesa SS. Oesophageal cancer incidence in the United States by race, sex, and histologic type, 1977-2005. Br J Cancer 2009;101:855-9.
  5. van den Boorn HG, Engelhardt EG, van Kleef J, et al. Prediction models for patients with esophageal or gastric cancer: A systematic review and meta-analysis. PLoS One 2018;13:e0192310.
  6. Cox DR. Regression models and life-tables. J R Stat Soc Series B 1972;34:187-220.
  7. Fisher LD, Lin DY. Time-dependent covariates in the Cox proportional-hazards regression model. Annu Rev Public Health 1999;20:145-57.
  8. Warwick J, Tabàr L, Vitak B, et al. Time-dependent effects on survival in breast carcinoma: results of 20 years of follow-up from the Swedish Two-County Study. Cancer 2004;100:1331-6.
  9. Baulies S, Belin L, Mallon P, et al. Time-varying effect and long-term survival analysis in breast cancer patients treated with neoadjuvant chemotherapy. Br J Cancer 2015;113:30-6.
  10. Chang C, Chiang AJ, Wang HC, et al. Evaluation of the Time-Varying Effect of Prognostic Factors on Survival in Ovarian Cancer. Ann Surg Oncol 2015;22:3976-80.
  11. Rakovitch E, Sutradhar R, Hallett M, et al. The time-varying effect of radiotherapy after breast-conserving surgery for DCIS. Breast Cancer Res Treat 2019;178:221-30.
  12. Rogoz B, Houzé de l'Aulnoit A, Duhamel A, et al. Thirty-Year Trends of Survival and Time-Varying Effects of Prognostic Factors in Patients With Metastatic Breast Cancer-A Single Institution Experience. Clin Breast Cancer 2018;18:246-53.
  13. van Houwelingen HC. Dynamic prediction by landmarking in event history analysis. Scand J Stat 2007;34:70-85.
  14. van Houwelingen HC, Putter H. Dynamic Prediction in Clinical Survival Analysis. Boca Raton: CRC Press; 2012.
  15. Li L, Yang Z, Hou Y, et al. Moving beyond the Cox proportional hazards model in survival data analysis: a cervical cancer study. BMJ Open 2020;10:e033965.
  16. Eil R, Diggs BS, Wang SJ, et al. Nomogram for predicting the benefit of neoadjuvant chemoradiotherapy for patients with esophageal cancer: a SEER-Medicare analysis. Cancer 2014;120:492-8.
  17. Cao J, Yuan P, Wang L, et al. Clinical Nomogram for Predicting Survival of Esophageal Cancer Patients after Esophagectomy. Sci Rep 2016;6:26684.
  18. Custodio A, Carmona-Bayonas A, Jiménez-Fonseca P, et al. Nomogram-based prediction of survival in patients with advanced oesophagogastric adenocarcinoma receiving first-line chemotherapy: a multicenter prospective study in the era of trastuzumab. Br J Cancer 2017;116:1526-35.
  19. Tang X, Zhou X, Li Y, et al. A Novel Nomogram and Risk Classification System Predicting the Cancer-Specific Survival of Patients with Initially Diagnosed Metastatic Esophageal Cancer: A SEER-Based Study. Ann Surg Oncol 2019;26:321-8.
  20. Jørgensen TL, Teiblum S, Paludan M, et al. Significance of age and comorbidity on treatment modality, treatment adherence, and prognosis in elderly ovarian cancer patients. Gynecol Oncol 2012;127:367-74.
  21. Tanis E, van de Velde CJ, Bartelink H, et al. Locoregional recurrence after breast-conserving therapy remains an independent prognostic factor even after an event free interval of 10 years in early stage breast cancer. Eur J Cancer 2012;48:1751-6.
  22. Sauerbrei W, Royston P, Look M. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J 2007;49:453-73.
  23. Bolard P, Quantin C, Esteve J, et al. Modelling time-dependent hazard ratios in relative survival: application to colon cancer. J Clin Epidemiol 2001;54:986-96.
  24. Bellera CA, MacGrogan G, Debled M, et al. Variables with time-varying effects and the Cox model: some statistical concepts illustrated with a prognostic factor study in breast cancer. BMC Med Res Methodol 2010;10:20.
  25. Cormier JN, Huang X, Xing Y, et al. Cohort analysis of patients with localized, high-risk, extremity soft tissue sarcoma treated at two cancer centers: chemotherapy-associated outcomes. J Clin Oncol 2004;22:4567-74.
  26. Fontein DBY, Klinten Grand M, Nortier JWR, et al. Dynamic prediction in breast cancer: proving feasibility in clinical practice using the TEAM trial. Ann Oncol 2015;26:1254-62.
  27. Rueten-Budde AJ, van Praag VM, van de Sande MAJ, et al. Dynamic prediction of overall survival for patients with high-grade extremity soft tissue sarcoma. Surg Oncol 2018;27:695-701.

(English Language Editor: A. Kassem)

Cite this article as: Du K, Li L, Wang Q, Zou J, Yu Z, Li J, Zheng Y. Development and application of a dynamic prediction model for esophageal cancer. Ann Transl Med 2021;9(20):1546. doi: 10.21037/atm-21-4964
