Development and validation of a nomogram for predicting overall survival of patients with cancer of unknown primary: a real-world data analysis
Introduction
Cancer of unknown primary (CUP), also known as occult primary, is a group of metastatic malignancies whose anatomical primary site cannot be detected after a complete clinical evaluation (1). The reported incidence of CUP ranges from 2% to 7.8% of all cancers (2-4). Nevertheless, CUP ranks third to sixth among causes of cancer death (5,6), despite recent advances in diagnostic pathology and genomic approaches. About 20% of patients, identified by favorable clinicopathological features, may benefit from locoregional treatments or specific first-line therapies, whereas the remaining patients (around 80%) still lack standard anticancer regimens (4,7,8).
Because of its metastatic and highly heterogeneous nature, CUP has a variable prognosis and lacks a standard staging system to guide prognostication and treatment planning. Several studies have investigated prognostic models for CUP; however, they were developed in small populations and did not provide precise computation of survival probability. A more accurate and feasible prediction model is still needed to estimate individual life expectancy in the current clinical setting. In addition, the benefits of different treatments in patients with CUP have rarely been evaluated in large populations.
In this study, we aimed to characterize the epidemiology of CUP, including the association between survival and different treatment regimens, and to build the first prognostic nomogram for CUP, with internal validation, based on a real-world analysis of patients in the Surveillance, Epidemiology, and End Results (SEER) (www.seer.cancer.gov) database. We present the following article in accordance with the TRIPOD reporting checklist (available at http://dx.doi.org/10.21037/atm-20-4826).
Methods
Data source and patient population
Patients for this study were identified in the SEER 18 Database (November 2018 submission, 1975–2016 varying) (9) using SEER*Stat software (version 8.3.6). Because information on combined metastasis was only available in SEER from 2010 onward, only patients diagnosed between 2010 and 2016 were included. The inclusion criteria were: (I) histological diagnosis of CUP (primary site documented as ICD-O-3 C80.9); (II) age ≥18 years; and (III) availability of active follow-up data. Patients with hematological malignancies (ICD-O-3 Hist/behavior, malignant, codes 9590/3–9989/3) were excluded, as were patients with a survival time of 0 months, to eliminate cases diagnosed at autopsy alone. In total, 19,543 patients met the selection criteria.
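The selection criteria above can be expressed as a single record filter. The following is a minimal sketch, not the authors' SEER*Stat session: `SeerRecord` and its field names are hypothetical stand-ins for a flattened case listing, and the diagnostic-confirmation check behind "histologically diagnosed" is not modeled.

```python
from dataclasses import dataclass


@dataclass
class SeerRecord:
    """Hypothetical flattened SEER case record (field names are illustrative)."""
    site_code: str          # ICD-O-3 primary site, e.g. "C80.9"
    histology: int          # 4-digit ICD-O-3 histology code
    behavior: int           # ICD-O-3 behavior code (3 = malignant)
    age: int                # age at diagnosis, years
    year_of_diagnosis: int
    survival_months: int
    active_followup: bool


def is_eligible(rec: SeerRecord) -> bool:
    """Apply the inclusion/exclusion criteria described in the Methods."""
    if rec.site_code != "C80.9":                    # (I) CUP site only
        return False
    if not 2010 <= rec.year_of_diagnosis <= 2016:   # metastasis fields exist from 2010
        return False
    if rec.age < 18:                                # (II) adults only
        return False
    if not rec.active_followup:                     # (III) active follow-up required
        return False
    if rec.behavior == 3 and 9590 <= rec.histology <= 9989:  # hematological malignancies
        return False
    if rec.survival_months == 0:                    # drop 0-month survival (e.g. autopsy only)
        return False
    return True
```

Applied to a list of records, `cohort = [r for r in records if is_eligible(r)]` yields the study population under these assumptions.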
All eligible patients with available data on metastatic status at the bone, brain, liver, and lung (the only four metastatic sites recorded in the database) were included in the nomogram study to optimize its generalizability. We split the nomogram study population into a training set and a validation set by year of diagnosis (temporal validation), which is considered a stronger design than random splitting for assessing model performance when only a single data set is available (10). Ultimately, 3,347 cases with complete data on all study variables were included for developing and validating the nomogram: patients diagnosed between 2010 and 2015 were used for model development (training set, N=2,286), and those diagnosed in 2016 were used for model evaluation (validation set, N=1,061).
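The complete-case restriction and temporal split described above can be sketched as two small functions. This is an illustrative sketch only; the dictionary keys (such as `year_of_diagnosis`) are hypothetical column names, and missing values are assumed to be stored as `None`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[object]]


def complete_cases(records: List[Record], variables: Sequence[str]) -> List[Record]:
    """Keep only records with a non-missing value for every study variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


def temporal_split(records: List[Record], last_training_year: int = 2015,
                   year_key: str = "year_of_diagnosis") -> Tuple[List[Record], List[Record]]:
    """Training = diagnosed up to last_training_year; validation = diagnosed later."""
    training = [r for r in records if r[year_key] <= last_training_year]
    validation = [r for r in records if r[year_key] > last_training_year]
    return training, validation
```

Splitting on calendar year rather than at random means the validation set comes from a later period than the training set, which is the property the cited TRIPOD guidance values.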
Study variables and outcomes
The following demographic and clinical information was extracted for the included patients: patient ID, year of diagnosis, sex, race, age at diagnosis, insurance status, marital status at diagnosis, histology (ICD-O-3 Hist/behavior, malignant), diagnostic confirmation, surgical procedure, radiotherapy, chemotherapy, combined metastasis at diagnosis (at the bone, brain, liver, and/or lung), survival months, and vital status (study cutoff used). Variables were interpreted according to the SEER variable dictionary and coding manuals. Following the European Society for Medical Oncology (ESMO) guideline (11) and the ICD-O-3 SEER Site/Histology Validation List, histological groups were classified as squamous cell carcinomas (SCC) (ICD-O-3 8050–8089), neuroendocrine carcinomas (ICD-O-3 8013/3, 8041/3, 8153/3, 824), carcinomas not otherwise specified (NOS) (ICD-O-3 801 except for 8013/3, 802, 803, 8046/3), adenocarcinomas (ICD-O-3 814, 8160/3, 819, 820, 825–855), undifferentiated malignant neoplasms (ICD-O-3 800), and other types (malignancies not classified as carcinomas). Although metastasis information in the database was limited, metastatic status at the bone, brain, liver, and/or lung is generally regarded as representative of a patient’s metastatic burden. Two new variables were defined to summarize metastatic status: “metastatic status at major viscera” (brain/liver/lung) and “the number of metastatic organs” (counting only the bone, brain, liver, and lung).
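The histology grouping and the two derived metastasis variables can be sketched as follows. This is a minimal sketch assuming 4-digit ICD-O-3 histology codes with malignant behavior; a three-digit entry such as "824" is read as the whole 8240–8249 block. Sending every unlisted code to "other" is an assumption of this sketch, since the text defines that group only as non-carcinomas.

```python
from typing import Tuple


def histology_group(code: int) -> str:
    """Map a 4-digit ICD-O-3 histology code (malignant, /3) to a study group."""
    stem = code // 10  # three-digit stem, e.g. 8246 -> 824
    # Neuroendocrine is checked first because 8013 also falls in the 801x NOS stem.
    if code in (8013, 8041, 8153) or stem == 824:
        return "neuroendocrine carcinoma"
    if 8050 <= code <= 8089:
        return "SCC"
    if code == 8046 or stem in (801, 802, 803):
        return "carcinoma NOS"
    if stem in (814, 819, 820) or code == 8160 or 825 <= stem <= 855:
        return "adenocarcinoma"
    if stem == 800:
        return "undifferentiated malignant neoplasm"
    return "other"  # assumption: every code not listed above


def metastasis_variables(bone: bool, brain: bool, liver: bool, lung: bool) -> Tuple[bool, int]:
    """Derive the two summary variables from the four recorded metastatic sites."""
    major_viscera = brain or liver or lung
    n_organs = sum((bone, brain, liver, lung))  # 0-4; other sites are not recorded
    return major_viscera, n_organs
```

Checking neuroendocrine codes before the NOS stems matters because 8013/3 would otherwise be absorbed into the 801 carcinoma NOS block.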
Surgical procedures included non-primary surgery to CUP, other regional sites, distant lymph node(s), other distant sites, or any combination of the latter three procedures. Radiotherapy modalities included beam radiation, radioactive implants, radioisotopes, and other unspecified methods. Chemotherapy regimens were not detailed for patients with CUP in the database.
Survival months were calculated as “FLOOR ((endpoint-date of diagnosis)/days in a month)”. We chose 6- and 9-month overall survival (OS) as the endpoints. The event for OS was death from any cause; patients alive at last follow-up were censored.
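The survival-months formula and the OS event indicator can be written out directly. In this sketch the month length of 365.24/12 days is an assumption, not a value stated in the text, and the `"dead"`/`"alive"` strings are a hypothetical coding of vital status.

```python
import math
from datetime import date

# Assumption: average month length of 365.24 / 12 days; confirm the exact
# constant against the SEER survival-months documentation before reuse.
DAYS_PER_MONTH = 365.24 / 12


def survival_months(diagnosis: date, endpoint: date,
                    days_per_month: float = DAYS_PER_MONTH) -> int:
    """FLOOR((endpoint - date of diagnosis) / days in a month)."""
    return math.floor((endpoint - diagnosis).days / days_per_month)


def os_event(vital_status: str) -> int:
    """OS event indicator: 1 = died (any cause), 0 = alive and censored at last follow-up."""
    return 1 if vital_status.lower() == "dead" else 0
```

Because of the floor, any patient followed for less than one full month gets 0 survival months, which is why the 0-month exclusion removes autopsy-only cases.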
Statistical analysis
OS was estimated using the Kaplan-Meier method, and differences between groups were assessed with the log-rank test. Univariate Cox regression analyses were performed to evaluate the association between the study variables and OS (12). The benefits of different treatment options were further evaluated through univariate analyses in subgroups of patients with or without metastasis (at the bone, brain, liver, and/or lung). Candidate predictors for the nomogram were selected based on both clinical and statistical considerations. Potential covariates were first selected for clinical relevance based on our literature review, including sex, race, age at diagnosis, insurance status, marital status at diagnosis, histological type, surgery, radiotherapy, chemotherapy, metastatic status at major viscera, and the number of metastatic organs. Except for age (the only continuous variable) and the number of metastatic organs (the only ordinal variable), all candidates were categorical variables. Collinearity diagnostics were performed before further variable selection. The martingale residual test was performed to confirm a linear relationship between the continuous predictor and the outcome (13). Predictors were then selected through backward elimination in a Cox proportional hazards model using likelihood ratio tests, following published statistical recommendations (14,15). The nomogram was constructed from the resulting multivariate model. In the nomogram, each factor is assigned a score, and the scores sum to a total score corresponding to the predicted survival probability, from which an individual’s 6- and 9-month OS probabilities can be obtained. The Cochran–Armitage test for trend was performed for the ordinal variable by treating it as a continuous variable in the above model. Hazard ratios (HRs) with 95% confidence intervals (CIs) were calculated for each predictor.
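The Kaplan-Meier (product-limit) estimate used for OS, and the median OS read off it, can be sketched in plain Python. This is an illustration of the estimator only; the study used SPSS and R, and this sketch omits confidence intervals and the log-rank test.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: returns (t, S(t)) at each distinct event time.

    events[i] is 1 for death and 0 for censoring. Subjects censored at t are
    kept in the risk set at t, the usual convention.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest event time with S(t) <= 0.5; None if the curve never reaches 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

With survival times in months, `median_survival(kaplan_meier(months, events))` gives the kind of median OS figure reported in the Results.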
The performance of the nomogram was evaluated in terms of discrimination, using Harrell’s concordance index (C-index), and calibration, using 6- and 9-month OS calibration plots for the training and validation sets (10). The C-index ranges from 0.5 (no better than chance) to 1.0 (perfect discrimination), with higher values indicating better discrimination (16). Validation in both cohorts was performed using bootstrap resampling.
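Harrell's C-index can be computed by counting comparable patient pairs. The sketch below is a simplified O(n²) version assuming that a higher risk score (for example, a higher total nomogram score) means a higher predicted risk of death; it skips pairs with tied survival times, which full implementations handle more carefully.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i died and t_i < t_j; it is
    concordant when i also has the higher predicted risk. Tied risk scores
    count as 1/2. Pairs with tied times are skipped in this simplified version.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A bootstrap interval can be obtained by resampling patients with replacement and recomputing the index on each resample; the number of resamples used in the study is not stated here.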
Based on each patient’s total nomogram score, all patients were stratified into low-, intermediate-, and high-risk groups. Risk stratification was also performed separately for the training and validation cohorts.
All statistical analyses were performed using IBM SPSS Statistics 26.0 and R 3.6.3 with the packages rms, MASS, Hmisc, formula, survival, and ggplot2. A two-tailed P value less than 0.05 was considered statistically significant.
This study was performed in accordance with the Declaration of Helsinki (as was revised in 2013).
Results
Demographics, clinicopathological characteristics, and treatment options
The demographics, clinicopathological characteristics, and treatment options of patients with CUP are shown in Table 1. The median OS of all included patients was 6.0 (95% CI: 5.8–6.2) months. Adenocarcinoma was the most common histological type. Among patients with available metastatic information, the liver was the most common site of combined metastasis at diagnosis among the four recorded organs, followed by the bone, lung, and brain.
Among the 19,543 patients with CUP, only 34.2% received chemotherapy, and even fewer received radiotherapy (24.3%) or non-primary surgery (14.4%). In the subgroup analysis (Figure 1), surgical procedures to CUP were significantly associated with better survival both in patients with and in patients without metastasis at the bone, brain, liver, and/or lung (P value <0.001), with greater benefit in patients without metastasis (HR =2.495, 95% CI: 2.026–3.073). Radiotherapy was associated with a better outcome only in patients without metastasis at these four sites (P value <0.001). Chemotherapy was associated with benefit in both groups (P value <0.001).
Predictors of OS and nomogram construction
The univariate analysis of potential prognostic factors is shown in Table 2. Marital status at diagnosis was a significant covariate in the univariate analysis but was removed from the multivariate Cox model by backward elimination. Patients with bone, brain, liver, or lung metastasis all had significantly inferior survival compared with those without (all P values <0.001). Additionally, patients with metastasis at the brain, liver, or lung had an even poorer prognosis than those without metastasis at these sites (HR =1.857, 95% CI: 1.688–2.042, P value <0.001).
The covariates incorporated in the final nomogram were sex, age, histological type, surgery, radiotherapy, chemotherapy, and the number of metastatic organs. The HR of each variable in the multivariate Cox regression analysis is shown in Table 2. In the multivariate analysis, female sex and younger age at diagnosis were associated with a better prognosis. SCC had a significantly better prognosis than carcinomas NOS and adenocarcinomas. More metastatic organs were associated with a significantly higher risk of mortality (P value for trend <0.001). The nomogram is illustrated in Figure 2, and its point assignments are listed in Tables 3-5.
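How per-predictor points add up to a survival probability can be sketched for a Cox-based nomogram. This sketch assumes a common construction (used, for example, by R's rms package), not necessarily the exact scaling behind Figure 2: the predictor with the widest range of linear-predictor contributions spans 0 to 100 points, and predicted survival is S(t | x) = S0(t)^exp(LP). All coefficients and the baseline survival in the usage example are hypothetical placeholders, not the study's estimates, and continuous age would have to be tabulated at chosen values to fit this level-based form.

```python
import math
from typing import Mapping

# Maps predictor -> level -> beta * x, that level's contribution to the linear predictor.
Contributions = Mapping[str, Mapping[str, float]]


def points_scale(contrib: Contributions) -> float:
    """Points per unit of linear predictor: the widest-range predictor spans 0-100."""
    widest = max(max(levels.values()) - min(levels.values()) for levels in contrib.values())
    return 100.0 / widest


def total_points(contrib: Contributions, patient: Mapping[str, str]) -> float:
    """Sum each predictor's points, measured from that predictor's lowest level."""
    scale = points_scale(contrib)
    return sum((levels[patient[name]] - min(levels.values())) * scale
               for name, levels in contrib.items())


def survival_from_points(contrib: Contributions, points: float, s0_t: float) -> float:
    """Convert total points back to the linear predictor, then S(t) = S0(t) ** exp(LP).

    s0_t is the baseline survival at time t for LP = 0 (a hypothetical input here).
    """
    lp = sum(min(levels.values()) for levels in contrib.values()) + points / points_scale(contrib)
    return s0_t ** math.exp(lp)
```

For instance, with hypothetical contributions `{"sex": {"female": 0.0, "male": 0.2}, "chemotherapy": {"yes": -0.5, "no": 0.0}}`, a male patient without chemotherapy scores 40 + 100 = 140 points, which maps back to LP = 0.2.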
Nomogram validation
Harrell’s C-index of the nomogram was 0.705 (95% CI: 0.692–0.717) for the training cohort and 0.727 (95% CI: 0.703–0.752) for the validation cohort, indicating good discrimination. The calibration plots for 6- and 9-month OS (Figure 3) showed good agreement between predicted and observed values in both cohorts.
Performance of the nomogram in risk stratification
All patients included in the nomogram study were stratified into three groups: low risk (33.3%; total score <140), intermediate risk (35.7%; 140≤ total score <180), and high risk (31.0%; total score ≥180) (Figure 4). In the overall nomogram population, the median OS of the low-, intermediate-, and high-risk groups was 28.0 (95% CI: 22.1–33.9) months, 6.0 (95% CI: 5.3–6.7) months, and 2.0 (95% CI: 1.8–2.2) months, respectively. The survival curves showed that risk stratification based on the nomogram clearly separated patient outcomes (log-rank test, P value <0.001).
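The risk grouping above is a simple thresholding of the total nomogram score at the reported cutoffs of 140 and 180 points. A minimal sketch follows; the helper that tallies group proportions is illustrative and not part of the original analysis.

```python
from collections import Counter
from typing import Dict, Iterable


def risk_group(total_score: float, low_cut: float = 140.0, high_cut: float = 180.0) -> str:
    """Assign a risk group using the total-score cutoffs reported in the Results."""
    if total_score < low_cut:
        return "low"
    if total_score < high_cut:
        return "intermediate"
    return "high"


def group_proportions(scores: Iterable[float]) -> Dict[str, float]:
    """Share of patients in each risk group."""
    counts = Counter(risk_group(s) for s in scores)
    n = sum(counts.values())
    return {g: counts[g] / n for g in ("low", "intermediate", "high")}
```

Note that a score of exactly 140 falls into the intermediate group and exactly 180 into the high-risk group, matching the inequalities in the text.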
Discussion
As a group of cancers with high heterogeneity and generally unsatisfactory outcomes, CUP calls for a more practical and accurate model of individual prognosis in the era of precision medicine. In this real-world study of a large population, we constructed the first prognostic nomogram for CUP. With good discrimination, this nomogram provides a quantitative tool for predicting the survival of individual patients with CUP. Additionally, risk stratification based on this nomogram distinguished between different risk groups, which may assist clinicians in identifying high-risk patients and in treatment planning.
Based on the SEER database, our study showed a median OS of 6.0 (95% CI: 5.8–6.2) months for the whole study cohort with CUP. This result was within the range reported in previous studies, which varied from 3 to 14.2 months depending on the characteristics of the study populations. However, this median survival still indicates a relatively poor prognosis for our study cohort according to recent clinical guidelines (17,18).
Several studies have examined the prognostic factors of CUP over the past decades. Parameters with prognostic value reported in the literature include age, sex, smoking history, performance status (PS), histology, site of metastasis, tumor location, number of metastases, liver metastasis, symptoms, multiple comorbidities, prolonged QT interval, lactate dehydrogenase (LDH), alkaline phosphatase (ALP), albumin (ALB), leukocytosis, lymphopenia, treatment, and socioeconomic factors (3,19-24). Consistent with these reports, our study identified male sex and older age as unfavorable features, and both factors were included in the final nomogram. Petrakis et al. further pointed out that biological rather than chronological age should be regarded as the main predictor (3). However, because PS was unavailable in the database, patients’ functional organ reserve could only be partially reflected by age and metastatic status in this study. Regarding histological type, our study confirmed that SCC and neuroendocrine carcinoma carried a better prognosis than adenocarcinoma, in agreement with previous evidence (20-22). Although the number of metastatic organs is limited by the unknown status of other sites, surveillance of these major organs largely reflects patients’ organ reserve and metastatic burden.
According to the National Comprehensive Cancer Network (NCCN) clinical guidelines, chemotherapy is recommended for symptomatic patients (PS score of 1–2) or asymptomatic patients (PS score of 0) with aggressive cancer, and the regimen depends largely on the histological type (18). Prognostic models could offer more accurate risk classification by integrating multiple factors. Based on risk stratification, low-risk patients are more likely to be candidates for treatment with curative intent, whereas high-risk patients should be offered options such as low-toxicity chemotherapy and palliative care. Several prognostic models have been developed previously. In 1995, van der Gaast et al. established a prognostic model including only PS and ALP, based on a cohort of 79 patients (25). This model included only single-center patients with poorly differentiated adenocarcinoma or undifferentiated carcinoma and was not validated in external datasets. Culine et al. later developed and validated a model with PS score and serum LDH level in a population of 150 patients, which was limited to patients with carcinoma (26). Ponce Lorenzo et al. developed a prognostic model based on PS and the presence or absence of liver metastasis (19). This study was also conducted in a single center, with 100 patients, and lacked independent validation. A more recent prognostic algorithm put forward by Petrakis et al. incorporated leukocytosis, clinicopathologic CUP subgroup, and PS as predictors. They used classification and regression analysis in a population of 311 cases and tested its validity in a randomly split subset of their study cohort (3). Random-split validation is regarded as relatively weak according to the TRIPOD statement (10), and one of the predictors, clinicopathological subgroup, has limitations in clinical application.
All of the above prognostic models were established in small populations and lacked a quantitative method for computing survival probability and evaluating model performance. In addition, because they all served as tools for risk assessment before therapy, they did not consider the impact of treatment. Our nomogram incorporated surgery, radiotherapy, and chemotherapy as predictors, and all three were associated with significantly improved OS in the multivariate analysis. However, subgroup analysis indicated that radiotherapy may not bring a survival benefit to patients with metastasis. In contrast, subgroup analysis suggested a benefit of non-primary surgery even in the metastatic setting. Meanwhile, this real-world analysis showed that non-primary surgery was used in only a small proportion of patients (14.4%). Therefore, this study suggests that non-primary surgery to CUP could be considered a potential option for highly selected patients, while other factors such as PS, organ reserve, personal preference, tolerance of different therapies, and cost-effectiveness should also be taken into account.
This study also had several limitations. Because of data unavailability, the prognostic role of important clinical factors, such as PS, metastatic status in other organs, differentiation status, and biochemical and hematological indicators such as LDH, ALP, and ALB levels, could not be assessed. The analysis of treatment efficacy was also limited by the unknown sequence of treatment and disease progression, unknown regimen details, and unknown responses to individual therapies. In addition, our nomogram was established in a U.S. population and has not been validated in any external datasets; its predictive and discriminative abilities need to be further assessed in contemporary patients from other registries or institutions. All of these limitations should be considered before widespread application, and further investigation is needed to improve the current prognostic nomogram for CUP.
Conclusions
We developed and validated the first nomogram for predicting the individual survival of patients with CUP based on a real-world study of a large population. The covariates included in the nomogram were sex, age, histological type, surgery, radiotherapy, chemotherapy, and the number of metastatic organs. The nomogram demonstrated good predictive and discriminative performance, with a C-index of 0.705 (95% CI: 0.692–0.717) for the training cohort and 0.727 (95% CI: 0.703–0.752) for the validation cohort. Further validation of this nomogram in independent cohorts is needed before its widespread application.
Acknowledgments
Funding: This study was supported by the National Science and Technology Major Project (2020ZX09201-013); National Natural Science Foundation of China (grant No. 82072915); the Shanghai Municipal Science and Technology Commission Guidance Project, China (contract No. 18411967800); research grant from Shanghai Hospital Development Center (grant No. SHDC12018X03); and CSCO-ROCHE Cancer Research Fund 2019 (grant No. Y-2019Roche-171). We acknowledge the above funders for their financial support.
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at http://dx.doi.org/10.21037/atm-20-4826
Peer Review File: Available at http://dx.doi.org/10.21037/atm-20-4826
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at http://dx.doi.org/10.21037/atm-20-4826). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was performed in accordance with the Declaration of Helsinki (as was revised in 2013). Access was permitted by SEER to the database, which is available for public use. Therefore, consent and ethics approval were not required.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
1. Varadhachary GR, Raber MN. Cancer of unknown primary site. N Engl J Med 2014;371:757-65.
2. Pavlidis N, Briasoulis E, Hainsworth J, et al. Diagnostic and therapeutic management of cancer of an unknown primary. Eur J Cancer 2003;39:1990-2005.
3. Petrakis D, Pentheroudakis G, Voulgaris E, et al. Prognostication in cancer of unknown primary (CUP): development of a prognostic algorithm in 311 cases and review of the literature. Cancer Treat Rev 2013;39:701-8.
4. Pavlidis N, Pentheroudakis G. Cancer of unknown primary site. Lancet 2012;379:1428-35.
5. Hemminki K, Bevier M, Hemminki A, et al. Survival in cancer of unknown primary site: population-based analysis by site and histology. Ann Oncol 2012;23:1854-63.
6. Jones W, Allardice G, Scott I, et al. Cancers of unknown primary diagnosed during hospitalization: a population-based study. BMC Cancer 2017;17:85.
7. Greco FA. Cancer of unknown primary site: still an entity, a biological mystery and a metastatic model. Nat Rev Cancer 2014;14:3-4.
8. Economopoulou P, Pentheroudakis G. Cancer of unknown primary: time to put the pieces of the puzzle together? Lancet Oncol 2016;17:1339-40.
9. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat Database: Incidence - SEER 18 Regs Custom Data (with additional treatment fields), Nov 2018 Sub (1975-2016 varying) - Linked To County Attributes - Total U.S., 1969-2017 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2019, based on the November 2018 submission. Available online: www.seer.cancer.gov
10. Collins GS, Reitsma JB, Altman DG, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55-63. Erratum in: Ann Intern Med. 2015 Apr 21;162(8):600. doi: 10.7326/L15-0078-4.
11. Fizazi K, Greco FA, Pavlidis N, et al. Cancers of unknown primary site: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol 2015;26 Suppl 5:v133-8.
12. Iasonos A, Schrag D, Raj GV, et al. How to build and interpret a nomogram for cancer prognosis. J Clin Oncol 2008;26:1364-70.
13. Lin DY, Wei LJ, Ying Z. Checking the Cox Model with Cumulative Sums of Martingale-Based Residuals. Biometrika 1993;80:557-72.
14. Heinze G, Wallisch C, Dunkler D. Variable selection - A review and recommendations for the practicing statistician. Biom J 2018;60:431-49.
15. Mantel N. Why Stepdown Procedures in Variable Selection. Technometrics 1970;12:621-5.
16. Harrell FE Jr, Lee KL, Mark DB. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361-87.
17. Losa F, Soler G, Casado A, et al. SEOM clinical guideline on unknown primary cancer (2017). Clin Transl Oncol 2018;20:89-96.
18. NCCN. Clinical Practice Guidelines in Oncology. Occult Primary (Cancer of Unknown Primary [CUP]), Version 1.2020. 2019. Available online: https://www.nccn.org/professionals/physician_gls/pdf/occult.pdf
19. Ponce Lorenzo J, Segura Huerta A, Díaz Beveridge R, et al. Carcinoma of unknown primary site: development in a single institution of a prognostic model based on clinical and serum variables. Clin Transl Oncol 2007;9:452-8.
20. Abbruzzese JL, Abbruzzese MC, Hess KR, et al. Unknown primary carcinoma: natural history and prognostic factors in 657 consecutive patients. J Clin Oncol 1994;12:1272-80.
21. Fernandez-Cotarelo MJ, Guerra-Vales JM, Colina F, et al. Prognostic factors in cancer of unknown primary site. Tumori 2010;96:111-6.
22. Hess KR, Abbruzzese MC, Lenzi R, et al. Classification and regression tree analysis of 1000 consecutive patients with unknown primary carcinoma. Clin Cancer Res 1999;5:3403-10.
23. Lu HJ, Chen KW, Tzeng CH, et al. Evaluation of prognosis for carcinoma of unknown origin in elderly patients. Oncology 2012;83:24-30.
24. Grajales-Álvarez R, Martin-Aguilar A, Silva JA, et al. ECOG is as independent predictor of the response to chemotherapy, overall survival and progression-free survival in carcinoma of unknown primary site. Mol Clin Oncol 2017;6:643-50.
25. van der Gaast A, Verweij J, Planting AS, et al. Simple prognostic model to predict survival in patients with undifferentiated carcinoma of unknown primary site. J Clin Oncol 1995;13:1720-5.
26. Culine S, Kramar A, Saghatchian M, et al. Development and validation of a prognostic model to predict the length of survival in patients with carcinomas of an unknown primary site. J Clin Oncol 2002;20:4679-83.