Developing and validating an integrated gross tumor volume (GTV)-TNM stratification system for supplementing unresectable locally advanced non-small cell lung cancer treated with concurrent chemoradiotherapy

Purpose The gross tumor volume (GTV) could be an independent prognostic factor for unresectable locally advanced non-small cell lung cancer (LANSCLC). We aimed to develop and validate a novel integrated GTV-TNM stratification system to supplement LANSCLC sub-staging in patients treated with concurrent chemoradiotherapy (CCRT). Methods We performed a retrospective review of 340 patients with unresectable LANSCLC receiving definitive CCRT. All included patients were divided into two randomized cohorts. Then the Kaplan–Meier method and Cox regression were calculated to access the prognostic value of the integrated GTV-TNM stratification system, which was further validated by the area under the receiver operating characteristic curve (AUC) score and F1-score. Results The optimal outcome-based GTV cut-off values (70 and 180 cm3) of the modeling cohort were used to determine each patient’s integrated GTV-TNM stratum in the whole cohort. Our results indicated that a lower integrated GTV-TNM stratum could had better overall survival and progression-free survival (all P < 0.001), which was recognized as an independent prognostic factor. Also, its prognostic value was robust in both the modeling and validation cohorts. Furthermore, the prognostic validity of the integrated GTV-TNM stratification system was validated by significantly improved AUC score (0.636 vs. 0.570, P = 0.027) and F1-score (0.655 vs. 0.615, P < 0.001), compared with TNM stage. Conclusions We proposed a novel integrated GTV-TNM stratification system to supplement unresectable LANSCLC sub-staging due to its prognostic value independent of TNM stage and other clinical characteristics, suggesting that it could be considered in individual treatment decision-making process.


Introduction
Lung cancer remains the most common cancer worldwide with the highest cancer-related mortality [1] and non-small cell lung cancer (NSCLC) accounts for approximately 85% of all these cases. The standard treatment of NSCLC depends primarily on the TNM staging system, which has been regarded as the single most prognostic factor in predicting survival outcomes of patients with lung cancer [2]. For patients with early stages diseases, surgical resection is recommended to be the primary therapy owing to its curative effect. The 5-year overall survival (OS) of these patients with complete resection ranges from 29 to nearly 90% [3,4]. However, nearly 30% of patients with NSCLC have unresectable, locally advanced, stage III diseases, which nominated definitive concurrent chemoradiotherapy (CCRT) as principal treatment modality.
As is well known, stage III NSCLC represents a heterogeneous population, even within the same TNM stage following radical CCRT, there are large variations in survival outcome, suggesting that further pretreatment assessment beyond TNM stage may optimize treatment strategies and then improve clinical outcome. Several potentially important prognostic factors, specific for lung cancer treated with radiation therapy, have been identified, including performance status, weight loss, age, gender and histology [5]. In addition, tumor volume is a promising prognostic indicator revealed in numerous studies, which has been recognized in the carcinoma of head and neck, esophageal cancer, melanoma, breast cancer and cervical carcinoma treated with radiotherapy [6][7][8][9][10]. Previous radiobiological studies [11][12][13][14] have demonstrated that a larger tumor is related to more proliferation, hypoxia, and radio-resistance, which may compromise therapeutic effects. However, prognostic value of tumor volume in locally advanced non-small cell lung cancer (LANSCLC) has not yet been widely investigated.
It is generally considered that the current TNM staging system still has limitations, especially in the era of modern radiation technique. Simultaneous modulated accelerated radiation therapy (SMART) is widely used in LANSCLC and allows for precise dose delivery to the irradiated target, so that dose escalation can be achieved to improved disease control without excessive toxicity. The classical TNM staging depends mainly on operability, which focuses more on tumor size and location, and lacks quantitative volumetric evaluation of overall tumor burden. To date it has been difficult in a way to quantify tumor burden directly and accurately, which may restrict its application in clinical practice. With the advent of computerized planning systems for gross tumor target contouring and treatment planning in radiotherapy, data on gross tumor volume (GTV) can be availably and systematically calculated. Thus, we launched a study to investigate prognostic value of GTV and TNM staging in unresectable LANSCLC treated with definitive CCRT, and then develop and validate an integrated GTV-TNM stratification system for clinical practice.

Data collection
Consecutive patients diagnosed with unresectable LAN-SCLC who were treated with definitive CCRT at our institution between September 2011 and September 2018 were retrospectively screened. The entry criteria were as follows: (1) pathologically confirmed NSCLC; (2) inoperable stage III disease because of bulky primary disease (T3 or T4), mediastinal lymph node involvement (unresectable N2 or N3), or contraindication for surgical resection; (3) treated with radical radiotherapy (total radiation does ≥ 60 Gy) with concurrent chemotherapy; (4) without a history of prior chest radiotherapy; (5) Eastern Cooperative Oncology Group (ECOG) performance status score 0 to 2; (6) follow-up no less than 6 months since the start of radiotherapy unless death or tumor progression was documented. Those who met the above inclusion criteria in our two ongoing randomized clinical trials were enrolled in this analysis. All included patients were randomly divided, with a ratio of 2:1, into a training group and a validation group. Each patient received physical examination, electrocardiogram, pulmonary function test, laboratory tests, computed tomography (CT) scans of the chest and upper abdomen, brain magnetic resonance imaging (MRI), bone scan and/or positron emission tomography/computed tomography (PET/ CT) and pathological biopsy. Tumor Staging was re-classified according to the TNM staging system proposed by the American Joint Committee on Cancer (8th edition) based on clinical work-ups.

Conclusions:
We proposed a novel integrated GTV-TNM stratification system to supplement unresectable LANSCLC sub-staging due to its prognostic value independent of TNM stage and other clinical characteristics, suggesting that it could be considered in individual treatment decision-making process. Keywords: Gross tumor volume, The integrated GTV-TNM stratification system, Locally advanced non-small cell lung cancer, Concurrent chemoradiotherapy

Radical radiotherapy and GTV contouring
Patients were positioned supine and immobilized in a vacuum cradle. Using 4-dimensional computed tomography (4DCT) (Brilliance CT Big Bore, Philips), stimulation CT data sets scanning from the Atlas level to the second lumbar vertebra level with 5 mm thickness slices were obtained in 10 respiratory phases, and a maximum intensity projection (MIP) data set was constructed, which were then exported to the planning system (Monaco planning system, Elekta Medical Systems, Stockholm, Sweden) for target contouring and treatment planning. GTV, clinical target volume (CTV), planning target volume (PTV), and organs at risk (OARs) were delineated. The GTV included the primary tumor and positive regional lymph nodes, which were defined as nodes with a short-axis diameter no less than 1 cm on CT scan, or with high fluorodeoxyglucose uptake on PET/CT scan, or pathologically approval by mediastinoscopy or endobronchial ultrasound-transbronchial needle aspiration (EBUS-TBNA). For patients receiving neoadjuvant chemotherapy, the GTV was defined as the post-chemotherapy volume of initially determined primary tumor and positive regional lymph nodes. GTV-lung and GTV-node was contoured using pulmonary or mediastinal window CT settings, respectively. The CTV was defined as a 0.6 cm margin around GTV-lung, involved lymph node region and 1-2 elective stations. PTV-GTV and PTV-CTV was produced by expanding GTV and CTV with a 0.6 cm margin in all directions, respectively. Lungs, esophagus, spinal cord and heart were delineated as OARs. All contours were reviewed by a senior physician. All patients received SMART, using a 6-8 MV photon beam. A radiation dose of 60-70 Gy (2.0-3.0 Gy per fraction) was delivered to PTV-GTV, and 45-50 Gy (1.8-2.6 Gy per fraction) to PTV-CTV, in 24-33 fractions. At least 95% of PTV received 95% of prescribed dose. Dose constraints on the OARs were as follow: V20 < 35% for lungs; mean lung dose < 1 Gy; maximum dose of esophagus < 66 Gy; maximum dose of spinal cord < 46 Gy; V30 < 30% for heart. Throughout the course of radiotherapy, weekly cone beam computed tomography (CBCT) was acquired before radiotherapy delivery for verification of the position of the tumor and OARs.

Concurrent chemotherapy
Platinum-based double agents were administered for concurrent chemotherapy in most patients. The regimens included docetaxel/ paclitaxel/ pemetrexed/ etoposide plus platinum, weekly or every 3 weeks. Several patients in clinical trial received triple agents as Nimotuzumab included. In patients with intolerance to double agents, single agent included pemetrexed, taxel or tyrosine kinase inhibitor (TKI) was allowed.

Follow-up and treatment response assessments
Each patient received a chest and upper abdomen CT scan every 3 months for the first 2 years after completion of CCRT, and subsequently every 6 months until tumor progression or death, while brain MRI was required every 6 months. PET/CT scan, bone scan, and biopsy were recommended if clinically suspected of progression. The responses to therapy were assessed by an independent radiation oncologist and confirmed by a senior physician at 1-2 months after radiotherapy based on Response Evaluation Criteria in Solid Tumors version 1.1. Disease progression was documented according to clinical, radiographic, or pathological evidence, and the first failure patterns were recorded. Disease recurrence at primary tumor site or local-regional lymph node was considered as locoregional recurrence, and all other sites of recurrence or metastases were defined as distant metastasis. OS and progression-free survival (PFS) were defined as the time from the initiation of radiotherapy to death, and the first occurrence of disease progression or death, respectively. The last follow up ended at May 1st, 2020. Therapeutic toxicities were recorded according to the National Cancer Institute Common Toxicity Criteria (version 4.0).

GTV risk group and the integrated GTV-TNM stratification system
The data on GTV were determined automatically from treatment planning system (TPS) using Zeus Cloud TPS V1.0 (Tongdiao, SuZhou, China). An X-tile analysis (Yale University, New Haven, CT, USA) provided the optimal outcome-based cut-off points to categorize patients into low GTV risk group (I), moderate GTV risk group (II) and high GTV risk group (III) in both the modeling and validation cohorts [15]. In order to optimize a novel integrated GTV-TNM stratification system, patients were classified into 9 subgroups: Group G1-IIIA (stage IIIA with GTV risk group I); Group G2-IIIA (stage IIIA with GTV risk group II); Group G3-IIIA (stage IIIA with GTV risk group III); Group G1-IIIB(stage IIIB with GTV risk group I); Group G2-IIIB (stage IIIB with GTV risk group II); Group G3-IIIB (stage IIIB with GTV risk group III); Group G1-IIIC (stage IIIC with GTV risk group I); Group G2-IIIC (stage IIIC with GTV risk group II); and Group G3-IIIC (stage IIIC with GTV risk group III). An ordered list of subgroups was constructed, relative to the best prognostic subgroup (Table 1). Several integrated stratification systems had been developed by combining similar subgroups [16,17], and a final novel integrated GTV-TNM stratification system comprising three stratums was brought up, due to its statistical characteristics in the training cohort.

Statistical analysis
All eligible patients were randomized into a training cohort and a validation cohort with a ratio of 2:1. The distribution differences of categorical variables were examined with the Fisher's exact test. Kaplan-Meier method was applied for survival analyses, which were compared by log-rank test (two-sided). Then those factors with a P value < 0.1 in the univariate analysis were incorporated into the Cox proportional hazards model to perform multivariate analysis for OS. A receiver operating characteristics (ROC) curve was produced for the integrated GTV-TNM stratification system, GTV risk group and TNM stage, and the area under ROC curve (AUC) was applied to assess the prognostic validity of these three different systems. Furthermore, bootstrap method was used to validate the predictions, which was done by randomly choosing 20% of the whole cohort for validation, and repeating 100 times to obtain a distribution of the prediction performance. For performance grading, F1 scores were calculated using Eq. 1-3 below: where TP is the true positive rate, FP is the false positive rate and FN is the false negative rate.

Baseline characteristics
A total of 340 eligible patients were included in analysis. They were randomly assigned into a training group comprising 227 patients and a validation group with 113 patients. In the whole cohort, there were 64 females and 276 males with the median age of 58 (range from 28-81) years. There were 97 (28.5%) patients diagnosed with stage IIIA disease, 172 (50.6%) with stage IIIB and 71 (20.9%) with stage IIIC. For all patients, the median GTV volume was 101.0 (range, 9.1-664.3) cm 3 . In the training cohort, the optimal cutoff values of GTV in terms of OS were 71.2 cm 3 and 177.2 cm 3 , which were determined by X-tile program. For the ease of clinical practice, we selected 70 cm 3 and 180 cm 3 as the uniform cutoff points in order to define patients into low, moderate and high GTV risk groups. GTV risk group I, II and III were defined as < 70 cm 3 , 70-180 cm 3 and > 180 cm 3 , respectively. Docetaxel and platinum was the most commonly used regimen of concurrent chemotherapy (70.9%, 241/340). Most patients (254/340, 74.7%) received neoadjuvant chemotherapy before definitive CCRT, and 21.8% patients (74/340) underwent adjuvant chemotherapy. All clinic-pathologic characteristics were similarly distributed between the training and validation groups ( Table 2).

Survival outcomes and tumor response
The median follow-up was 28.9 (range, 1.5-103.4) months in all patients and 46.6 (range, 6.7-103.4) months in event-free patients. Our analysis depicted a median estimated OS of 44.7 months in all patients, 45.5 months in the training group, and 38.1 months in the validation group. The 3-year and 5-year OS rate was 59.9% and 44.1% in the training set, compared with 52.9% and 38.0% in the validation set (P = 0.283), respectively. The median estimated PFS was 12.1, 13.0, and 10.4 months in the whole cohort, training cohort and validation cohort. The 1-year and 2-year PFS rate was 52.7% and 31.9% in the training group, versus 44.9% and 25.6% in the validation group (P = 0.181), respectively.
In the training cohort, 10 patients had complete remission (CR), 154 had partial remission (PR), 49 had stable disease (SD) and 14 had progressive disease (PD). In the validation cohort, 6 patients had CR, 82 had PR, 18 had SD and 7 had PD. The objective response rate (ORR) was 74.1%, 72.2% and 77.9% in the whole, training and validation cohort, respectively (Additional file 1).

First failure patterns and the prognosis of PFS
With a median follow-up of 28.9 months, a total of 238 patients exhibited failure, but 9 patients had no record of failure pattern. The most common type of first failure pattern is distant metastasis (58.0%, 192/331), while the rate of locoregional recurrence was 39.9% (132/331). Meanwhile, 25.1% (83/331) patients had synchronous failure experiencing distant metastasis and locoregional recurrence at the same time ( Table 5). The integrated GTV-TNM stratification system had significant association with failure (P = 0.018) and distant metastasis (P = 0.023), and Stratum C had a higher risk to experience failure and distant metastasis.
In the prediction of PFS, univariate analysis showed that patients with higher stratum in the integrated GTV-TNM stratification system tended to have impaired PFS in both the training group (1-year PFS, Stratum A vs B vs C, 67.5% vs 58.8% vs 31.3%, P < 0.001) (Fig. 1d) and validation group (1-year PFS, Stratum A vs B vs C, 65.4% vs 46.6% vs 25.0%, P < 0.001) (Fig. 1h). Even though the prognosis failed to reach statistical significance when Stratum A vs Stratum B (P = 0.097 and P = 0.176, respectively), the integrated GTV-TNM stratification system could be a potential strong prognostic factor for PFS.

Fig. 2
The prognostic validity of the integrated GTV-TNM stratification system, GTV risk group and TNM stage. a The ROC curve comparing these three stratification systems. b F1 score distributions acquired with bootstrap method. Colors indicate validations using Stratum B and Stratum C as predictors for survival outcome. Stars indicate significance between two groups Table 5 The distribution of first failure patterns among patients with the integrated GTV-TNM stratification system in the whole cohort a 9 patients had no record of failure pattern, but their deaths were documented, thus a total of 331 patients were enrolled in this analysis of first failure patterns GTV, gross tumor volume

Discussion
Our results demonstrate that the prognostic stratification system integrated GTV with TNM staging has provided greater predictive value for survival outcomes in unresectable LANSCLC patients receiving definitive CCRT. To our best knowledge, it is the first study to propose an integrated GTV-TNM stratification system and depict its prognostic value, with the largest sample size to date. The current study used the optimal outcome-based GTV cutoff points (70 and 180 cm 3 ) of the modeling cohort to determine each patient's GTV risk group. Our results suggested that GTV was a strong prognostic factor in unresectable LANSCLC, which were consistent with previous studies [18][19][20]. Then we established a novel integrated GTV-TNM stratification system, which was examined in the modeling and validation cohorts using Kaplan-Meier method and Cox regression. The univariate analysis indicated that the integrated GTV-TNM stratification system significantly predicted OS in the training cohort (3-year OS, Stratum A vs B, 78.4% vs 63.7%, P = 0.024; Stratum B vs C, 63.7% vs 38.1%, P = 0.002; Stratum A vs C, 78.4% vs 38.1%, P < 0.001). Multivariate Cox regression analysis showed an increased risk of death in Stratum B (HR, 1.87; 95% CI 1.07-3.27; P = 0.028) and Stratum C (HR, 3.66; 95% CI 2.07-6.45; P < 0.001) compared with Stratum A. In the prediction of PFS, patients with higher stratum in the integrated GTV-TNM stratification system tended to have impaired PFS (P < 0.001). Such statistical significance could achieve in the validation group equally. The results were robust in these two randomized sets, suggesting that the integrated GTV-TNM stratification system may be able to generalize to the general population.
Several clinical and pathological characteristics have been considered as prognostic factors for survival outcomes. Recently multiple published studies have demonstrated that pretreatment tumor burden measured by PET/CT scan was highly correlated with treatment outcomes in LANSCLC patients receiving definitive CCRT [21][22][23]. Bradley et al. [18] concluded that GTV determined by three-dimensional conformal radiotherapy (3DCRT) planning had great prognostic value for long-term survival and local control. In the secondary analysis of the Radiation Therapy Oncology Group (RTOG) 93-11, Werner-Wasik et al. [19] reported that an increasing GTV was strongly associated with poor OS and PFS. In addition, Basaki et al. [20] suggested that GTV included primary tumor volume and positive lymph nodes, could provide better prognostic value for survival outcomes than TNM stage alone, which was consistent with several published reports [24,25] and the present study. Moreover, the relationship between tumor volume and tumor control probability for NSCLC, breast tumor, head and neck cancer, malignant melanoma, and cervical carcinoma had been extensively investigated by clinical data as well as radiobiologic models [26][27][28][29]. However, the majority of those studies were conducted retrospectively, and several published data didn't support the prognostic value of tumor volume [30][31][32]. Ball et al. [33] launched a multicenter prospective observational study to investigate the prognostic value of primary tumor volume in stage I-III NSCLC treated by definitive radiotherapy. They concluded that primary tumor volume failed to provide additional prognostic information after adjusting for the effects of the T and N stages. This may indicate that making prognosis with tumor volume alone is not adequate. Our results suggested GTV was an independent prognostic factor. The AUC the prediction of for OS was 0.636 for the integrated GTV-TNM stratification system, versus 0.570 for TNM stage (P = 0.027) and 0.605 for GTV risk group (P = 0.033), which implied tumor volume alone was not sufficient for prediction of LANSCLC patients after CCRT.
Therefore, TNM staging for LANSCLC remains useful for classifying the extent of spread of cancer, while GTV appears to be a crucial factor for CCRT. Although TNM staging is more commonly regarded as a classification system of anatomical extent of tumor, it provides great details of the spread route of lung cancer. In TNM staging, the "T" classification comprises the size, location and extent of the primary tumor, while the "N" classification represents lymph node involvement, indicating whether or not the lymph nodes are infiltrated and the involved lymph node region and station. It could perfectly illustrate the relationship between primary tumor and involved lymph nodes in LANSCLC. It would be more practical to integrate the GTV information into the current TNM classification system than GTV risk group or TNM staging alone in LANSCLC patients treated with CCRT. The integration would effectively guide the clinical management and develop better treatment approaches for this subgroup of patients. Other than that, intensitymodulated radiotherapy (IMRT), especially SMART technique, has become the leading radiotherapy technique of lung cancer substituting for 3DCRT. SMART technique was used to deliver different prescription doses to GTV and CTV simultaneously, which afforded dose escalation to GTV and sparing of surrounding tissues. Furthermore, treatment planning in radiotherapy is currently conducted on specialized computerized planning system, which could enable exact location, definition and quantification of tumor volume. The ROC curve results suggested that the prognostic validity could be increased by integrating the current TNM staging system with GTV. Analysis using the ROC curve is typically more suitable for models using continuous predictor for survival outcome, whereas in our study, the predictor used is discrete (either Stratum B or Stratum C). Thus we employ a more suitable index, i.e. F1-score, which makes full use of precision and recall (or sensitivity) to evaluate the model performance, and is routinely used in the field of computer science. In our study, F1-score was significantly higher in the integrated GTV-TNM stratification system and indicated a superior prognostic value comparable to TNM stage and GTV risk group alone in LANSCLC patients treated with definitive CCRT.
It was well recognized that consolidation chemotherapy failed to yield significant survival benefit for unresectable LANSCLC [34][35][36][37][38]. However, based on the series results of the phase III PACIFIC study and other phase II trials [39][40][41], consolidation immunotherapy following CCRT significantly improve PFS and OS with acceptable toxicities in these patients, establishing a new standard of care. In PACIFIC trial, the PFS curve of durvalumab group plateaued at a proportion surviving of 40%, indicating that certain subpopulation could achieve a potential cure. In addition, compared with placebo, a higher incidence of pneumonitis or radiation pneumonitis of any grade occurred in patients receiving durvalumab (33.9% vs. 24.8%), but the rate of grade 3 or 4 was similar (3.4% vs. 2.6%). To be noticed, Asian patients seemed to have a higher incidence of any grade pneumonitis (73.6%) and severe pneumonitis (5.6%). Identifying the high-risk subgroup early is essential for the decision of consolidation immunotherapy, based on the concept of individualized treatment. Our results indicated that patients with the integrated GTV-TNM stratification system Stratum C had a higher risk to develop severe pneumonitis. Therefore, because of the lung injury following large-volume irradiation, consolidation immunotherapy might not be safely administered in Stratum C patients, which deserves further investigation.
Honestly, there were several limitations existed in our retrospective study. Firstly, it was a study comprised patients from 2 clinical trials and daily clinical practice, thus the concurrent chemotherapy regimens were determined by specific protocols and individual patient management. Secondly, the cut-off values for GTV risk group and the integrated GTV-TNM stratum were identified from a single center. Despite the distribution of TNM stages in our analysis seems to be consistent to the global sample [16], larger external databases are warranted to validate the optimal cut-off values for the general population of LANSCLC patients.

Conclusions
We proposed a novel integrated GTV-TNM stratification system to supplement unresectable LANSCLC substaging due to its prognostic value independent of TNM stage and other clinical characteristics, suggesting that it could be considered in individual treatment decisionmaking process.

Additional file 1. Tumor response.
Additional file 2. Multivariate analysis of the prognosis of the integrated GTV-TNM stratification system in the training cohort and validation cohort.