Radiomics for prediction of radiation-induced lung injury and oncologic outcome after robotic stereotactic body radiotherapy of lung cancer: results from two independent institutions

To generate and validate state-of-the-art radiomics models for prediction of radiation-induced lung injury and oncologic outcome in non-small cell lung cancer (NSCLC) patients treated with robotic stereotactic body radiation therapy (SBRT). Radiomics models were generated from the planning CT images of 110 patients with primary, inoperable stage I/IIa NSCLC who were treated with robotic SBRT using a risk-adapted fractionation scheme at the University Hospital Cologne (training cohort). In total, 199 uncorrelated radiomic features fulfilling the standards of the Image Biomarker Standardization Initiative (IBSI) were extracted from the outlined gross tumor volume (GTV). Regularized models (Coxnet and Gradient Boost) for the development of local lung fibrosis (LF), local tumor control (LC), disease-free survival (DFS) and overall survival (OS) were built from either clinical/ dosimetric variables, radiomics features or a combination thereof and validated in a comparable cohort of 71 patients treated by robotic SBRT at the Radiosurgery Center in Northern Germany (test cohort). Oncologic outcome did not differ significantly between the two cohorts (OS at 36 months 56% vs. 43%, p = 0.065; median DFS 25 months vs. 23 months, p = 0.43; LC at 36 months 90% vs. 93%, p = 0.197). Local lung fibrosis developed in 33% vs. 35% of the patients (p = 0.75), all events were observed within 36 months. In the training cohort, radiomics models were able to predict OS, DFS and LC (concordance index 0.77–0.99, p < 0.005), but failed to generalize to the test cohort. In opposite, models for the development of lung fibrosis could be generated from both clinical/dosimetric factors and radiomic features or combinations thereof, which were both predictive in the training set (concordance index 0.71– 0.79, p < 0.005) and in the test set (concordance index 0.59–0.66, p < 0.05). The best performing model included 4 clinical/dosimetric variables (GTV-Dmean, PTV-D95%, Lung-D1ml, age) and 7 radiomic features (concordance index 0.66, p < 0.03). Despite the obvious difficulties in generalizing predictive models for oncologic outcome and toxicity, this analysis shows that carefully designed radiomics models for prediction of local lung fibrosis after SBRT of early stage lung cancer perform well across different institutions.


Introduction
Stereotactic body radiation therapy (SBRT) is an effective therapy for early-stage, node-negative, medically inoperable non-small cell lung cancer (NSCLC). Dosefractionation schemes usually depend on tumor size and location and have been largely standardized by current guidelines [1][2][3][4]. However, after irradiation, about 10-15% of the tumors will recur locally, up to 50% of the patients will experience systemic disease progression despite PET-based staging before SBRT [5], and 25-30% of the patients will develop radiation-induced lung injury (RILI) on follow-up chest imaging. Apart from an established dose-response relationship for local control [6], dosimetric and clinical factors have only shown limited capability in predicting these events [7][8][9][10][11][12][13][14].
The availability of open-source software solutions allows the extraction of standardized radiomic features and generation of complex, non-linear models which are able to account for complex interactions between features and have the potential to achieve high performance. The primary objective of the present study was to build a model for the development of radiationinduced lung injury by use of state-of-the-art feature extraction and machine-learning algorithms in order to determine the extra value of imaging tumor biomarkers when used in addition to dosimetric and clinical factors in a cohort of patients with NSCLC treated by robotic SBRT. Secondary objectives of the study were the development of models for local control, disease free survival and overall survival. The models were trained on data from one institution and tested on a cohort from a separate institution that treated patients based on similar inclusion criteria and fractionation schemes. This work extends an earlier single institution report [36].

Patients, treatment and follow-up
Two cohorts of patients with stage I/IIa NSCLC (according to staging classification of the Union for International Cancer Control [UICC], 8th edition) who underwent definitive robotic SBRT were retrospectively analyzed. The first cohort comprised 110 patients treated at the University Hospital of Cologne, Germany and was used for identification of clinical, dosimetric and image-derived parameters to predict local control (LC), overall survival (OS), disease free survival (DFS) and occurrence of local lung fibrosis (LF) as a manifestation of radiation-induced lung injury after SBRT. This cohort had already been analyzed in a previous study using a simpler radiomics approach that only included first-level features and a small set of 5 texture features from the Gray-Level Co-occurrence Matrix (GLCM) without wavelet-filtering [36] and served as the training data set. A second cohort of 71 patients was treated at the Radiosurgery Center Northern Germany, Guestrow, and was used as test set (in machine learning terminology) for the predictive power of the models developed in the training set.
In both cohorts, patients suffering from a peripheral T1/2 (UICC 8) NSCLC without lymph node metastases who were either medically inoperable or refused resection were treated solely by means of the Cyberknife R system (Accuray, Sunnyvale, USA) without concomitant therapy using a risk-adapted fractionation scheme (peripheral T1 tumors 3 × 13-18 Gy, T1 tumors with broad contact to the chest wall and T2 tumors 5 × 10-11 Gy, near-central or true central tumors 8 × 6-7.5 Gy). The dose was calculated using a Monte Carlo dose calculation algorithm (Multiplan 4.5, Accuray, Sunnyvale, USA) and the prescribed dose was referred to the 65-70% isodose in most cases. The GTV was manually outlined for clinical use on the planning CT, and the PTV was generated by adding a margin of 3-4 mm ( Table 1). A set of volumetric and dosimetric parameters was extracted from the planning system including GTV (gross tumor volume), PTV (planning target volume), GTV-D max (maximal dose in GTV), GTV-D mean (mean dose in GTV), GTV-D 95% (dose achieved in 95% of the GTV), PTV-D 95% (dose achieved in 95% of the PTV) and lung doses Lung-D 1ml , Lung-D 10ml , Lung-D 50ml , Lung-D 100ml [13,[37][38][39][40], see Table 1. According to the different dose/ fractionation schemes, Conclusion: Despite the obvious difficulties in generalizing predictive models for oncologic outcome and toxicity, this analysis shows that carefully designed radiomics models for prediction of local lung fibrosis after SBRT of early stage lung cancer perform well across different institutions.
the doses covered a wide range in both datasets. The GTV-D max which is reported as a point dose from the Cyberknife system might have been biased upwards due to noise induced by the Monte Carlo algorithm, but showed a close correlation with the GTV-D mean in both sets (Pearson correlation coefficient 0.98 and 0.89), indicating that this bias was small. The cohorts also contained 12 (8) patients with local stage T1/2 tumors who had been successfully treated for oligo-metastatic disease, and who were free from tumor activity besides the primary tumor. Patient characteristics and treatment parameters are shown in Table 1. All patients had the (3D) planning CT performed under breath-hold conditions which was used for both treatment planning and radiomics image analysis ( Table 2).
Clinical and radiological follow-up including chest CT scans was scheduled at 3 and 6 months after radiotherapy and every 6 months thereafter. A local recurrence was assumed if the irradiated lesion showed a solid core that  increased by at least 25% compared to the last follow-up and exhibited further growth. The first occurrence of diffuse or patchy consolidation, diffuse or patchy ground glass opacity or modified or mass like consolidation in the lung tissue adjacent to the tumor was regarded as radiation induced lung injury (termed local fibrosis, LF, see Fig. 1) and recorded as an event with regard to the time interval to the date of first irradiation [41,42]. Lung tissue changes smaller than the original tumor, scar-like patterns distant to the tumor and lung toxicities without clear spatial or temporal relation to radiotherapy (early acute pneumonia, late acute pneumonia, pneumonitis spatially not correlated to the PTV) were not considered. In cases where a growing lesion could not be differentiated from local fibrosis, an FDG-PET-CT scan or a biopsy was performed in order to confirm or reject the diagnosis of a local recurrence.

Image processing and feature extraction
Image processing was performed using Python 3.6.7 (Python Software Foundation, Beaverton, Oregon, USA). The original DICOM data containing manual delineations of the gross tumor volume (GTV) and anatomical image data were restored from the Cyberknife R archive and subsequently used to extract the target volumes for radiomic analysis. For all further image processing, the software package pyradiomics 2.0.1 [43] was used that allows the extraction of standardized features which were defined by the IBSI (Image Biomarker Standardization Initiative) [44]. Preprocessing included resampling of the CT images and masks to isotropic voxels of 1 mm 3 by the standard procedures of pyradiomics (B-spline interpolation for the CT images and nearest-neighbour interpolation for the binary masks) and removal of all voxels with Hounsfield units (HU) below (− 400) HU and above 1000 HU from the volume which were assumed to represent normal lung and bony tissue unintentionally included in the GTV by manual segmentation. Radiomic features were calculated based on the original image and after wavelet filtering, yielding eight additional image types based on the application of wavelet-based high-pass or low-pass filters to each of the three dimensions. In addition to 14 features descriptive of the target's shape, 93 features were calculated for each of the nine image types, resulting in a total of 851 radiomic features.

Model development and statistical analysis
All model development was performed on the training cohort and the model parameters were optimized using cross-validation schemes. First, the primary set of radiomics features was reduced by identifying and removing linearly correlated features with a Pearson correlation coefficient > 0.90. Out of the 851 extracted features, 652 were found to be highly linearly correlated and removed from the analysis. The remaining 199 features were z-normalized and used to develop predictive models for each of the four endpoints: LC, OS, DFS and occurrence of local lung fibrosis (LF) after SBRT. Two different models implemented in the scikit-survival package for Python [45] were applied. The (linear) Cox Proportional Hazard model was used in conjunction with elastic net regularization of the feature coefficients (Coxnet, [46]) by means of a grid search for the optimal penalty parameter (alpha) with 10-times repeated fivefold random cross validation within the training set. Thus, the algorithm tries to reduce as many feature coefficients as possible to zero, and to achieve good predictions from the resulting smaller selection of features in the 10 × 5 validation data sets each comprising 20% of the training data. The model with the best cross-validated average training performance was then re-evaluated both in the complete training and the test set, where the feature values of the test set were subjected to the z-transformation parameterized from the training set. In order to allow for complex nonlinear relationships between feature values and treatment outcome, a gradient-boosted ensemble of 100 regression trees (Gradient Boost, [47]) with the partial likelihood loss of the Cox's proportional hazards model used as the loss function was also chosen as a model. In the training set, the main parameters of the model (learning rate, dropout rate and subsampling rate) were optimized by a 10-times repeated fivefold grid search with crossvalidation. Thus, an intermediate model that returns a list of features and their importance was generated. The list was then used to build models with increasing number of features, by starting with the one feature with the largest importance and successively adding the next important ones. At each step, the model was retrained and crossvalidated (10-times repeated fivefold cross validation) in the training set. Typically, the cross-validated performance increased initially, but decreased by including more and more features due to overfitting. As before, the model with the optimal cross-validated performance was applied to the feature-normalized training and test sets. In addition to the radiomics features, the following continuous clinical and dosimetric variables were analyzed in univariate Cox regression models with respect to their potential impact on any of the endpoints in order to select them as predictive features: GTV, PTV, GTV-D max , GTV-D mean , GTV-D 95% , PTV-D 95% , Lung-D 1ml , Lung-D 10ml , Lung-D 50ml , Lung-D 100ml , tumor diameter, age and Charlson Comorbidity Score. Categorical clinical and treatment related factors were investigated using the Kaplan-Meier method and survival estimates were compared using two-sided log rank tests. These included: For prediction of radiation-induced lung injury, Coxnet and Gradient Boost models were computed using only clinical/dosimetric features, only imaging features or both types of features. In case of the default settings for the Coxnet model used here, a baseline survival function is not calculated and the predictions are risk scores of arbitrary scale, while the scores of the Gradient Boost models built with the partial likelihood loss of the Cox's proportional hazards model used as the loss function can be interpreted as log hazard ratios. The performance of any model was evaluated in the test set by means of the concordance index and the significance level of the predicted risk score when used as a continuous variable in a univariate Cox regression (p Cox ). For purposes of illustration, the risk scores were dichotomized by their median and the Kaplan-Meier curves for the resulting low-and high risk groups were depicted. All statistical analyses were performed with the Lifelines python package (version 0.25.10, https:// doi. org/ 10. 5281/ zenodo. 45794 31) and cross-checked by SPSS (vs. 24, Armonk, NY, USA). A p-value of < 0.05 was considered significant. The complete workflow is depicted in Fig. 2.

Clinical outcome
The outcome in terms of the analyzed clinical endpoints did not differ significantly between the two cohorts (Fig. 3). Overall survival at 36 months amounted to 56% versus 43%, p = 0.065), median DFS was 25 months versus 23 months, p = 0.43 and local control rates at 36 months were 90% vs. 93%, p = 0.197). In the training set, none of the clinical and dosimetric factors had a significant Fig. 2 Workflow for generating and validating the developed models influence on these endpoints. Local lung fibrosis developed in 33% vs. 35% of the patients (p = 0.75), all events were observed within 36 months after irradiation. Three dosimetric factors (GTV mean , PTV-D 95% , Lung-D 1ml ) and the patient's age had a significant impact (p < 0.05) on the development of local lung fibrosis with an increase in hazard of approximately 6% per Gy and per year of age.

Radiomics models for OS, DFS and local tumor control
As shown in Table 3, the Coxnet model failed to substantially reduce the optimal feature number for these endpoints. Although highly predictive models (CCI 0.77-0.94, p Cox < 0.005) were such found in the training set, these models showed poor cross-validation (CCI 0.52-0.54) and test set (CCI 0.36-0.49) performance. By application of the Gradient Boost model, a substantial feature

Clinical/dosimetric, radiomics and combined models for development of lung fibrosis
Using the 4 identified clinical/dosimetric variables, both the Coxnet and Gradient Boost algorithms selected 3 of them (age, GTV mean and Lung-D 1ml ) for building predictive models from the training set (CCI 0.71/0.73) which in case of the Coxnet also achieved a CCI of 0.65 (p Cox = 0.04) in the test set (Table 4). Feature selection was also successful in both radiomics models where the Gradient Boost Model had a CCI of 0.59 (p Cox = 0.02) using two features (Maximal Correlation Coefficient of the GLCM extracted from images filtered by two different wavelet kernels); this model could not be further improved by including any clinical/dosimetric variables. The best combined model resulted from the Coxnet that selected the 4 clinical/dosimetric variables and another 7 radiomics features and had a CCI of 0.66 (p Cox = 0.03) in the test set (Fig. 4). The two radiomics features with the largest coefficients in this model were the Large Area Emphasis from the wavelet-filtered gray-level size zone matrix (GLSZM) and, as in the radiomics Gradient Boost model, the GLCM Maximal Correlation Coefficient from the wavelet-filtered images.

Summary of findings
In the present analysis, two cohorts of early-stage lung cancer patients treated with robotic stereotactic body radiotherapy at two different institutions were investigated. Although slightly different fractionation schedules were applied, oncologic outcome in terms of local tumor control, disease-free survival and overall survival were well comparable. Importantly, the frequency and time course of development of radiation-induced local lung injury was also similar in the two cohorts. Radiomics analysis based on a selected set of standardized features and state-of-the-art modelling in the training cohort resulted in models for prediction of radiationinduced local lung injury that performed well also in the test cohort. The predictive ability of the radiomics models resembled that of a model from a selection of clinical/dosimetric variables, but the marginally best performance was achieved in a model that combined a small number of clinical/dosimetric and radiomics features. However, the models for the endpoints of oncologic outcome (OS, DFS, local control) failed to generalize to the test cohort.

Prediction of local radiation-induced lung injury
To the best of our knowledge, this is the first report that aimed at generating machine-learning models for the development of local lung injury from the GTV after lung SBRT [36]. Radiation-induced local lung injury that finally develops into local lung fibrosis is a typical event after lung SBRT, although it remains asymptomatic in most cases. It is probably triggered by the release of inflammatory cytokines such as TGF-ß from the tumor  which subsequently initiate an immunological response [48,49]. At first sight, it seems far from obvious how a texture pattern detectable by radiomics could predict for this event. However, an association between a pre-therapeutic radiomics feature (LoG standard deviation) with the TGF-ß signaling pathway has recently been observed, and in the same report, a radiomics score was correlated with the amount of tumor infiltration by T-lymphocytes [50]. The view that image features correlate with the presence of immune-competent cells in lung tumor tissue is also supported by the observation that lung tumors characterized by low CT intensity and high CT heterogeneity exhibited a high CD3 (T-lymphocyte) infiltration, suggestive of an activated immune state [51]. Interestingly, the most predictive features found in the present analysis were related to the heterogeneity of the lung tumor, as the GLSZM Large Area Emphasis is a measure of the distribution of large area size zones where a greater value is indicative of more larger size zones and more coarse textures, and the Maximal Correlation Coefficient of the GLCM is a measure of complexity of the texture resulting in values approaching unity for flat, homogenous regions. In the present report, one of the radiomics models slightly improved the pure clinical/ dosimetric model (Coxnet) for the development of local lung fibrosis while the other radiomics model (Gradient Boost) did not benefit from the potential inclusion of the clinical/dosimetric features. In a comparable approach that has been applied for prediction of radiation pneumonitis from features of the total lung tissue in lung cancer patients treated with intensity-modulated radiotherapy (IMRT), the radiomics features also only slightly improved the predictive value of the model when added to clinical and dosimetric factors [47]. Interestingly, the inhomogeneous dose distribution usually generated by robotic radiosurgery and volumetric arc therapy has itself been analyzed with respect to dose distribution patterns ("dosiomics") which in turn have been found to predict the incidence of radiation pneumonitis [52]. Thus, a more comprehensive model of radiation-induced lung injury could probably be built from incorporating texture analysis of the tumor, a shell [53,54] comprising the adjacent lung tissue and the dose distribution.

Prediction of local control, disease-free survival and overall survival
Although the two cohorts resembled each other in terms of oncologic outcome, the radiomics models did not generalize from the training to the test cohort with respect to these endpoints. In case of the Coxnet model that utilizes a simple, linear combination of features values, a substantial reduction of the feature space was not possible even during cross-validation within the training set itself. This is probably due to the fact that in the present cohort of inoperable patients with small lung tumors, local control could be achieved in most such that only a small number of events was available for training, and DFS and OS were largely independent from single radiomic properties of the tumors and confounded by other factors. Such, the predictive models for these endpoints were only found by chance and due to overfitting by using almost all features available. The Coxnet model used an ensemble of regression trees that can account for complicated, nonlinear relationships between features and time to survival or other endpoints. Much higher cross-validation scores were found from a lower number of features, but the principle obstacles of relating confounded or low eventpopulated endpoints to a small set of radiomic features in an independent test set remained. As discussed above, the relation between selected radiomic tumor features, dose distribution and development of local lung fibrosis seemed be much stronger such that the type of the applied model became less important.
A compilation of recent studies on the impact of radiomics features on oncologic outcome for lung cancer patients after SBRT is presented in Table 5. Most of the studies applied single institution cross-validation or validation by test sets from the same institution and were able to predict local tumor recurrence, regional/nodal recurrence, distant failures and overall survival with a moderate accuracy. Of note, one report failed to observe features predictive of local recurrence [24]. Only in a minority of series were the results validated in test sets from independent institutions. In a large study from the Cleveland Clinic (Ohio, USA), a convolutional neural network (CNN) was trained to predict local recurrence in a group of > 900 lung cancer patients treated by SBRT. The stratification resulted in two groups with highly significant different risk for recurrence in both the training and test set [55]. Also in another study where a CNN was applied to both CT and PET images, a highly accurate classification of survival probability was achieved in an independent data set [21]. These results suggest that the application of CNN's that learn the relevant features for time-dependent oncologic predictions may be more effective than training models on predefined features [56,57].

Limitations of the present study
The present study, although based on the results of two independent cohorts, probably still lacks a sufficient number of patients and events needed for an informative analysis of the interaction between dosimetric parameters and radiologic tissue characteristics for prediction of local events (recurrence, local lung fibrosis) after SBRT of NSCLC. Also, the classification of local lung injury and tumor control is purely image-based and remains somewhat ambiguous, as tissue specimens are rarely available following SBRT. Differences in therapeutic strategies for detecting and treating metastases may have prevented the creation of a general radiomics-based model for prediction of DFS and OS.

Conclusion
The present analysis provides evidence that radiomics analysis can, in principle, be used for prediction of local lung injury after SBRT of NSCLC in independent data sets and as such complements existing results on the successful prediction of other oncologic endpoints in this setting.