Development and validation of a model for temporal lobe necrosis for nasopharyngeal carcinoma patients with intensity modulated radiation therapy

Purpose To develop and validate a quantitative complication model of temporal lobe necrosis (TLN), and to analyze the effect of clinical and dosimetric factors on TLN. Patients and methods The prediction model was developed in a training cohort of 256 nasopharyngeal carcinoma (NPC) patients treated from January 2009 to December 2009. Dosimetric and clinical factors were extracted for model building. Dosimetric factors included the maximum dose, minimum dose, mean dose, dose covering a specific volume and dose to a percentage volume. Clinical factors included age, gender, T/N-stage, overall stage, diabetes and hypertension. A LASSO (least absolute shrinkage and selection operator) regression model was used for feature selection and prediction model building. A testing cohort of 493 consecutive patients treated from January 2010 to December 2010 was used for model validation. The performance of the prediction model was assessed with respect to its calibration and discrimination. Results The prediction model, which consisted of two dosimetric features (D0.5cc and D10), was significantly associated with TLN status (P < .001 for both training and testing cohorts). None of the clinical factors showed direct predictive value. The model showed good discrimination, with a C-index of 0.685 (95% CI: 0.6048–0.765) on the testing set, and a consistent trend in calibration on the testing set. Conclusion This study presents a prediction model that can be conveniently used to facilitate the individualized prediction of TLN in patients with NPC. Clinical factors had no direct impact on TLN. Electronic supplementary material The online version of this article (10.1186/s13014-019-1250-z) contains supplementary material, which is available to authorized users.


Introduction
In patients with nasopharyngeal carcinoma (NPC), an impressive local control rate has been achieved, and as a result, late toxicities in long-term survivors after radiation therapy have become an important concern. Temporal lobe necrosis (TLN) is one major complication observed in NPC patients after radiation therapy. Because the temporal lobes (TLs) lie anatomically close to the nasopharynx, they are prone to receiving high doses of radiation, which causes cerebral tissue damage. Typical symptoms of TLN include dizziness, lethargy, debilitation, personality change, pressure symptoms, and epileptic attacks. Further cognitive decline can irreversibly impair patients' quality of life.
Intensity-modulated radiation therapy (IMRT) improves the physical dose distribution compared to two-dimensional RT (2DRT), and with this dose improvement, the corresponding incidence of TLN decreases significantly [1,2]. Indeed, the incidence of TLN after IMRT reported by different centers varies from 3 to 14% [3-8]. The TL dose limits suggested by QUANTEC, Dmax ≤ 60 Gy and V65Gy ≤ 1%, are commonly accepted in clinical use. However, the QUANTEC limits are based on past experience from the 3DCRT or 2DRT era and may not be suitable for IMRT. A retrospective analysis performed by Su et al. [9] revealed that IMRT with Dmax < 68 Gy or D1cc < 58 Gy for the TL is relatively safe. Their study covered a series of dosimetric factors. However, a follow-up study by Su et al. indicated that rV40 and aV40 (the relative and absolute volumes receiving a dose higher than 40 Gy) could better predict TLN [10]. Sun et al. suggested a dose limit of D0.5cc < 69 Gy based on an analysis of 506 patients [11]. A case-control study performed by Zhou found that both a focal high dose and a moderate dose to a large volume should be considered [12].
Most previous studies focused on the dose only, and clinical factors have seldom been discussed. In clinical practice, the overall condition of the patient should be taken into consideration. Although several previous publications have looked at dosimetric and patient factors in determining a safe dose to the temporal lobes during IMRT for NPC, there has been no comprehensive and quantitative analysis.
The purpose of this study was to generate a model to predict TLN occurrence using clinical and dosimetric factors, and to assess the model performance on a testing data set.

Methods
The workflow of this study is presented in Fig. 1. Patients were divided into a training set and a testing set based on their treatment date. A LASSO modeling method was used to build a prediction model, and the model performance was evaluated by the receiver operating characteristic (ROC) curve and the calibration curve.

Patients
From January 2009 to December 2010, 749 NPC patients receiving IMRT in our center were included in the study. All the patients completed the full course of radiotherapy and were free of metastasis. Patients were followed up after radiotherapy every 3 months for the first 2 years, every 6 months from year 2 to year 5, and annually thereafter. The last follow-up date was July 2015. After a median follow-up of 48.8 months (range 3.5-75.1 months), 38 out of 749 (5.07%) patients were diagnosed with TLN based on magnetic resonance imaging (MRI). Twenty-six of the patients had a unilateral lesion, and 12 had bilateral lesions. TLN grading was performed according to the RTOG/EORTC late radiation morbidity scoring schema. Most of the TLN cases (32 out of 38) were of grade 1 (mild symptoms). Two cases were grade 2, and four cases were grade 3.
To extract precise dose parameters, the TLs were re-delineated in all cases in this study. Contouring of the TL followed the recommendation of Sun et al. [13] and included the hippocampus, parahippocampal gyrus and uncus. DVH curves of the TLs were exported from the original treatment plans on the Pinnacle (Pinnacle 3; Philips Corp, Fitchburg, WI) treatment planning system (TPS). Dose parameters including the maximum dose (Dmax), minimum dose (Dmin), mean dose (Dmean), dose covering a specific volume (Dvs), and dose to a percentage volume (Dvp) were derived from the exported DVH curves. Clinical factors included age, gender, T/N-stage, overall stage, diabetes and hypertension. Clinical characteristics were retrospectively reviewed from a clinical database under the approval of the Institutional Review Board. All patients were staged according to the 7th American Joint Committee on Cancer/Union for International Cancer Control staging system.
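As an illustration of how parameters such as D0.5cc (a Dvs value) and D10 (a Dvp value) can be read from an exported cumulative DVH, the following Python sketch interpolates linearly between DVH points. It is only a sketch under stated assumptions: the DVH is assumed to be a list of (dose in cGy, volume in cc) pairs sorted by increasing dose, the function names are hypothetical, and the example curve is invented rather than taken from the study data or the Pinnacle export format.

```python
def dose_at_volume(dvh, target_cc):
    """Dose (cGy) received by at least `target_cc` of the structure.

    `dvh` is a cumulative DVH: (dose_cGy, volume_cc) points sorted by
    increasing dose, with volume non-increasing (assumed input format).
    """
    segments = list(zip(dvh, dvh[1:]))
    # Scan from the high-dose end so that flat parts of the curve resolve
    # to the highest dose still covering the requested volume.
    for (d0, v0), (d1, v1) in reversed(segments):
        if v0 > v1 and v1 <= target_cc <= v0:
            frac = (v0 - target_cc) / (v0 - v1)
            return d0 + frac * (d1 - d0)
    raise ValueError("target volume is outside the DVH range")


def dose_at_percent_volume(dvh, percent):
    """Dose (cGy) received by at least `percent` % of the total volume."""
    total_cc = dvh[0][1]  # volume at the lowest dose point is the full structure
    return dose_at_volume(dvh, total_cc * percent / 100.0)


# Invented example curve for an 80 cc temporal lobe (not study data).
example_dvh = [(0, 80.0), (2000, 80.0), (4000, 40.0),
               (6000, 8.0), (7000, 0.5), (7200, 0.0)]

d05cc = dose_at_volume(example_dvh, 0.5)        # D0.5cc
d10 = dose_at_percent_volume(example_dvh, 10)   # D10
dmax = dose_at_volume(example_dvh, 0.0)         # Dmax
print(d05cc, d10, dmax)
```

Scanning from the high-dose end is a design choice for this sketch; a clinical implementation would also need to handle the bin width and volume normalization of the actual exported DVH.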

Treatment protocol
Target delineation followed the definitions of the ICRU 50 and 62 reports. The gross tumor volume (GTV) covered the primary tumor and metastatic lymph nodes. The clinical target volume (CTV) covered the entire nasopharynx, parapharyngeal space, clivus, the base of skull, pterygoid fossa, the posterior half of the ethmoidal sinus, inferior sphenoid sinus, and the posterior edge of the nasal cavity and maxillary sinuses. The planning target volume (PTV) extended 3-5 mm around the GTV or CTV. A total dose of 66 Gy in 30 fractions was delivered to T1 and T2 tumors and 70.4 Gy in 32 fractions to T3 and T4 tumors via the simultaneous integrated boost-IMRT (SIB-IMRT) technique. Sixty-six Gy in 30-32 fractions was delivered to the metastatic lymph nodes. For high-risk and low-risk CTV, total doses of 60 and 54 Gy were delivered in 30-32 fractions, respectively.

Statistics and modeling
All statistical analyses were performed per temporal lobe, so the sample size was double the number of patients. A univariate analysis was performed to analyze the impact of each feature. For continuous variables, logistic regression was used. For binary variables, a chi-square test was used. Spearman correlation coefficients were calculated for all dosimetric and clinical factors to analyze the relations between factors.
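The Spearman coefficient mentioned above is the Pearson correlation of the ranks of two variables. The following Python sketch shows the calculation with average ranks for ties. The function names and the per-lobe values are hypothetical and serve only as an example; the study itself used R.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical per-lobe values (cGy and T stage), not study data.
dmax = [6500, 7000, 7200, 6800, 7400]
d05cc = [6000, 6600, 6900, 6500, 7100]
t_stage = [1, 3, 4, 2, 4]

print(spearman(dmax, d05cc))    # perfectly monotone pair
print(spearman(t_stage, dmax))  # tied T stages share an average rank
```

A coefficient close to 1 between two dose parameters, as in the first pair, is the kind of collinearity that motivates a penalized selection method.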
The patient data were divided by treatment date (2010-01-01). Patients treated before this date comprised the training set (256 patients), and those treated afterwards comprised the testing set (493 patients). The two lobes were analyzed separately, so the sample sizes were doubled. This time point was chosen because the number of events was almost equal in the training set (40) and the testing set (39). The different TLN probability in the training and testing sets may be caused by the difference in follow-up time (median follow-up: training 55.6 months, testing 38.1 months). We believe that using the cohort with the longer follow-up as the training set was more appropriate.
A cross-validated LASSO was used in model development. The predictive power of the model was assessed on the testing set by the area under the curve (AUC) and the calibration curve. Briefly, the training set was randomly split into 5 parts (5 folds). Four parts were used to fit the model, and the remaining part was used to find the best lambda value in the penalized logistic regression model. After the best lambda value was selected, the final model was fitted on the whole training set. More modeling details are described in Additional file 1: Supplement A.
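The cross-validation procedure described above can be sketched as follows. This is a minimal illustration and not the implementation used in the study: it is written in Python rather than R, it fits the L1-penalized logistic regression by proximal gradient descent instead of the coordinate descent used by 'glmnet', it scores each lambda by held-out log-loss, and the toy data, lambda grid, step size and function names are all assumptions.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink the value towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def predict(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_l1_logistic(X, y, lam, steps=300, lr=0.5):
    """Minimise mean log-loss + lam * ||w||_1 (the intercept is not penalised)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        errors = [predict(w, b, xi) - yi for xi, yi in zip(X, y)]
        grad_b = sum(errors) / n
        grad_w = [sum(e * xi[j] for e, xi in zip(errors, X)) / n for j in range(p)]
        b -= lr * grad_b
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def log_loss(w, b, X, y, eps=1e-12):
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, b, xi), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(y)


def select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest mean held-out log-loss over k folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    best_lam, best_loss = None, float("inf")
    for lam in lambdas:
        losses = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in idx if i not in held]
            w, b = fit_l1_logistic([X[i] for i in train], [y[i] for i in train], lam)
            losses.append(log_loss(w, b, [X[i] for i in held_out],
                                   [y[i] for i in held_out]))
        mean_loss = sum(losses) / k
        if mean_loss < best_loss:
            best_lam, best_loss = lam, mean_loss
    return best_lam


# Toy data (not study data): feature 0 carries signal, feature 1 is noise.
X = [[-1.95 + 0.1 * i, 1.0 if i % 2 == 0 else -1.0] for i in range(40)]
y = [1 if row[0] > 0.5 else 0 for row in X]
y[10], y[30] = 1, 0  # two flipped labels so the classes are not separable

lambda_grid = [0.3, 0.1, 0.03, 0.01]
best = select_lambda(X, y, lambda_grid)
w, b = fit_l1_logistic(X, y, best)  # final fit on the whole training set
print(best, w, b)
```

With a sufficiently large lambda every coefficient is shrunk to exactly zero, which is the mechanism by which LASSO performs feature selection.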
In addition, to evaluate the bias arising from data splitting, a 3-fold cross-validation was used to separate the data into training and testing sets, and the same processing was applied to each split. The details and results of this evaluation are described in Additional file 1: Supplement B.
To analyze the predictive power of diagnosis, treatment and dose features, four models were developed with the same strategy: diagnosis features only, treatment features only, diagnosis and treatment features, and dose features only. Diagnosis features included age, gender, T/N-stage, overall stage, diabetes and hypertension. The treatment feature was chemotherapy. Dose features contained all dosimetric factors.
Data preprocessing and analysis were performed using the R language. The package 'glmnet' was used for modeling and validation.

Patient characteristics and univariate analysis
The patients' characteristics and the results of the univariate analysis are shown in Table 1.

Feature selection and prediction model building
Twenty-seven features were reduced to 2 predictors with nonzero coefficients in the LASSO logistic regression model, on the basis of the 256 patients in the training cohort (Fig. 2). The final selected features were D0.5cc and D10. These features were presented in the prediction model formula. The unit for D0.5cc and D10 was cGy.

Validation of prediction model
The receiver operating characteristic curve and the calibration curve for the testing set are presented in Fig. 3. The AUC for the testing set was 0.6849 (95% CI: 0.6048-0.765).
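For a binary outcome, the AUC (equivalently the C-index) is the probability that a randomly chosen lobe with TLN receives a higher predicted risk than a randomly chosen lobe without TLN, with ties counted as one half. The following Python sketch computes it by direct pair counting. The function name and the example values are hypothetical, and the confidence interval calculation is not covered here.

```python
def auc(scores, labels):
    """AUC as the fraction of concordant (event, non-event) pairs."""
    events = [s for s, l in zip(scores, labels) if l == 1]
    non_events = [s for s, l in zip(scores, labels) if l == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and observed TLN status (1 = TLN), not study data.
predicted = [0.02, 0.04, 0.08, 0.12, 0.03, 0.15]
observed = [0, 0, 1, 0, 0, 1]
print(auc(predicted, observed))  # 7 of 8 pairs are concordant -> 0.875
```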
Although there is a deviation between the ideal line and the calibration curve, which is discussed later, the trend of the prediction is consistent with the observations. The alternative dataset splits gave similar results. Details are provided in Additional file 1: Supplement B.
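A grouped calibration check of this kind compares the mean predicted risk with the observed event rate within each risk group. The following Python sketch uses the 5% and 10% cut points given in the Fig. 3 legend. The function name and the example values are hypothetical, and the handling of values exactly at a cut point is an assumption.

```python
def calibration_by_group(probabilities, outcomes, cuts=(0.05, 0.10)):
    """Return {group: (mean predicted risk, observed event rate)}."""
    groups = {"low risk": [], "mid risk": [], "high risk": []}
    for p, y in zip(probabilities, outcomes):
        if p < cuts[0]:
            name = "low risk"
        elif p <= cuts[1]:
            name = "mid risk"
        else:
            name = "high risk"
        groups[name].append((p, y))
    table = {}
    for name, rows in groups.items():
        if rows:  # skip empty groups to avoid dividing by zero
            mean_predicted = sum(p for p, _ in rows) / len(rows)
            observed_rate = sum(y for _, y in rows) / len(rows)
            table[name] = (mean_predicted, observed_rate)
    return table


# Hypothetical predictions and outcomes (1 = TLN), not study data.
probabilities = [0.02, 0.04, 0.06, 0.08, 0.12, 0.20]
outcomes = [0, 0, 0, 1, 0, 1]
for group, (predicted, observed) in calibration_by_group(probabilities, outcomes).items():
    print(group, round(predicted, 3), round(observed, 3))
```

A well-calibrated model gives pairs that lie close to the ideal line, where the predicted risk equals the observed rate.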

Prediction power analysis
The prediction power analysis is presented in Table 2. Without dose features, the other features provided a prediction power of less than 0.55. The most powerful predictive feature among them was the T stage, because the T stage is correlated with dose and prescription. When the T stage and dose are analyzed together, the T stage is removed by the LASSO model. More details are described in Additional file 1: Supplement D.

Clinical use
To facilitate clinical use, a probability map based on the two dose factors is presented in Fig. 4. Two regions are delineated by our model. Dose combinations in the first region have a TLN probability of less than 5%, and those in the second region have a TLN probability of less than 10%.
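A two-factor logistic model of this kind can be evaluated for any pair of dose values, which is how such a probability map is filled in. The Python sketch below illustrates the calculation only. The intercept and coefficients are HYPOTHETICAL placeholders chosen for the example and are not the fitted values of this study; the function names are also assumptions, and the 5% and 10% cut points follow the risk groups in the Fig. 3 legend.

```python
import math

# HYPOTHETICAL placeholder parameters for illustration only.
# They are NOT the coefficients of the model reported in this study.
INTERCEPT = -9.0
COEF_D05CC = 0.0006  # per cGy of D0.5cc
COEF_D10 = 0.0004    # per cGy of D10


def tln_probability(d05cc_cgy, d10_cgy):
    """Logistic model: probability from a linear combination of two doses."""
    z = INTERCEPT + COEF_D05CC * d05cc_cgy + COEF_D10 * d10_cgy
    return 1.0 / (1.0 + math.exp(-z))


def risk_group(probability):
    """Map a predicted probability to the low / mid / high risk groups."""
    if probability < 0.05:
        return "low risk"
    if probability <= 0.10:
        return "mid risk"
    return "high risk"


# Evaluate a few dose pairs (cGy), as one would for each cell of a map.
for d05cc, d10 in [(6000, 4000), (7000, 5000), (7500, 6000)]:
    p = tln_probability(d05cc, d10)
    print(d05cc, d10, round(p, 3), risk_group(p))
```

Replacing the placeholders with the fitted coefficients and evaluating the function over a grid of D0.5cc and D10 values would reproduce the structure of a map like Fig. 4.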

Discussion
In this study, both clinical and dosimetric factors were evaluated. A robust modeling method was used to obtain the final model. To our knowledge, this study is the first attempt to derive a quantitative complication model for TLN. This is a logical extension of the previous statistical studies.
The results show that only physical dose parameters were reliable factors for the prediction of TLN. Although the clinical factor T-stage is strongly correlated [...]
Fig. 2 Feature selection using the least absolute shrinkage and selection operator (LASSO) binary logistic regression model. Tuning parameter (λ) selection in LASSO used 10-fold cross-validation via minimum criteria. The area under the receiver operating characteristic curve (AUC) was plotted versus log(λ). The red dotted lines were drawn at the optimal values by using minimum criteria. The best AUC is 0.6787 with a standard deviation of 0.05.
Fig. 3 The receiver operating characteristic curve and calibration curve for the testing set. a The receiver operating characteristic curve with AUC 0.6849 (95% CI: 0.6048-0.765). b The calibration curve. 'Low risk' is a TLN risk of less than 5%; 'mid risk' is a TLN risk between 5 and 10%; 'high risk' is a TLN risk greater than 10%.
Diabetes is accompanied by a biological change in the microvascular environment. Its enhancement of TLN supports the hypothesis put forward by Belka that vessel damage is a factor inducing cerebral toxicity [14]. Because of the limited number of patients, further investigation is needed to draw a statistically persuasive conclusion. Concurrent chemotherapy showed a similar pattern, and induction or adjuvant chemotherapy showed no positive effect. These results are consistent with those obtained in Lee's study [15].
Dosimetric parameters are strongly correlated with each other. In variable selection, collinearity causes competition among predictors and makes the choice among them arbitrary [16]. The prediction variables were selected via LASSO [17]. LASSO is another approach that can be used to select highly correlated variables for which strongly correlated predictors tend to be in and out together [18]. More advanced modeling strategies, such as random forests, are also alternatives and may achieve more powerful predictions. However, considering the accessibility required for clinical purposes, they may not be a first choice.
Previous studies reached different conclusions regarding dosimetric variables. In addition to the different specific cutoff points, whether a focal high dose [9,11,17] or a dose to a relatively large volume [10,12] is more important is also a critical difference. The factors behind these differences could include TL delineation, treatment strategy, and follow-up period. The posterior and superior margins of the TL are not precisely shown on CT images, resulting in variance in the dose description of the TL. In our study, although many dosimetric factors were correlated with TLN, the final selected features were D0.5cc and D10. This result suggests that a hot spot and the dose to a small region (10% of the TL) have the greatest effect on TLN.
The low incidence of TLN makes clinical control difficult. After a median follow-up of 48.8 months, some of the patients who are currently necrosis-free will likely show symptoms of TLN in the future. With an extended follow-up period, more persuasive findings may be obtained. This is the main reason for the difference in TLN incidence between our training and testing sets, and it also causes the deviation of the calibration curve. Additionally, although necrosis occurs only within a small region, the dose administered to the entire TL is usually analyzed. A compromise approach is to define a 'sub-TL' ROI that delineates only the anterior TL, because nearly all cases of TLN occur in the anterior and lower parts of the TL. Applying this approach might generate a clearer TLN nomogram.

Conclusion
This study presents a prediction model that can be conveniently used to facilitate the individualized prediction of TLN in patients with NPC. Clinical factors had no direct impact on TLN.