Intensity modulated proton therapy compared to volumetric modulated arc therapy in the irradiation of young female patients with hodgkin’s lymphoma. Assessment of risk of toxicity and secondary cancer induction

Background To investigate the role of intensity modulated proton therapy (IMPT) compared to volumetric modulated arc therapy (VMAT) for advanced supradiaphragmatic Hodgkin’s lymphoma (HL) in young female patients by assessing dosimetric features and modelling the risk of treatment related complications and radiation-induced secondary malignancies. Methods A group of 20 cases (planned according to the involved-site approach) were retrospectively investigated in a comparative planning study. Intensity modulated proton plans (IMPT) were compared to VMAT RapidArc plans (RA). Estimates of toxicity were derived from normal tissue complication probability (NTCP) calculations with either the Lyman or the Poisson models for a number of endpoints. Estimates of the risk of secondary cancer induction were determined for lungs, breasts, esophagus and thyroid. A simple model-based selection strategy was considered as a feasibility proof for the individualized selection of patients suitable for proton therapy. Results IMPT and VMAT plans resulted equivalent in terms of target dose distributions, both were capable to ensure high coverage and homogeneity. In terms of conformality, IMPT resulted ~ 10% better than RA plans. Concerning organs at risk, IMPT data presented a systematic improvement (highly significant) over RA for all organs, particularly in the dose range up to 20Gy. This lead to a composite average reduction of NTCP of 2.90 ± 2.24 and a reduction of 0.26 ± 0.22 in the relative risk of cardiac failures. The excess absolute risk per 10,000 patients-years of secondary cancer induction was reduced, with IMPT, of 9.1 ± 3.2, 7.2 ± 3.7 for breast and lung compared to RA. The gain in EAR for thyroid and esophagus was lower than 1. Depending on the arbitrary thresholds applied, the selection rate for proton treatment would have ranged from 5 to 75%. Conclusion In relation to young female patients with advanced supradiaphragmatic HL, IMPT can in general offer improved dose-volume sparing of organs at risk leading to an anticipated lower risk of early or late treatment related toxicities. This would reflect also in significantly lower risk of secondary malignancies induction compared to advanced photon based techniques. Depending on the selection thresholds and with all the limits of a non-validated and very basic model, it can be anticipated that a significant fraction of patients might be suitable for proton treatments if all the risk factors would be accounted for.


(Continued from previous page)
Conclusion: In relation to young female patients with advanced supradiaphragmatic HL, IMPT can in general offer improved dose-volume sparing of organs at risk leading to an anticipated lower risk of early or late treatment related toxicities. This would reflect also in significantly lower risk of secondary malignancies induction compared to advanced photon based techniques. Depending on the selection thresholds and with all the limits of a nonvalidated and very basic model, it can be anticipated that a significant fraction of patients might be suitable for proton treatments if all the risk factors would be accounted for.

Background
The role of proton therapy in the radiation treatment of Hodgkn's lymphoma (HL) patients as been investigated by many groups and various reviews and consensus publications summarized the status of the art [1][2][3].
The HL subcommittee of the particle therapy cooperative group (PTCOG) published in 2017 an evidencebased critical review [2] aiming to summarize the rationale for proton therapy based on i) the late morbidity data/models, ii) the dosimetric literature showing the potential of protons over photons and iii) the limited clinical literature. In this review, seconday cancers were identified as the primary cause of death among the long term survivors with particular concern for breast and lung. Cardiovascular morbidity/mortality was then identified as the primary cause of nonmalignant deaths/disease followed by pulmonary late effects (pneumonitis) and endocrinopathies. Dosimetric data provided evidence of the expected superiority of proton based plans over photon based plans for a large number of organs at risk while providing equivalent or superior potential for target coverage. Concerning the clinical studies, no proton-related high grade toxicity (late or acute) were reported with the caveat of sometimes short follow-up in the analysed cohorts. In general, the use of more conformal proton techniques was not associated to marginal recurrences with the promis to guarantee the excellent cure rates derived from photon series. As a recommendation, the PTCOG subcommittee advised that proton therapy should be reasonably considered for HL treatments with the recommendation for the development of a model-based approach for the selection of patients suitable for proton therapy.
More recently, the International lymphoma radiation oncology group published a consensus recommendation in the form of guidelines for proton therapy in adults with mediastinal HL [3]. The report recommended the need to demonstrate (with calculations from comparative plans) the benefit expected from protons. The authors recommended a documented medical necessity (including estimates of the risks for radiogenic late effects) and raised awareness about the increased complexity of proton planning, with special attention to the uncertainty management. The presence of uncertainty deriving from the trade-off between technique and anatomy (range and calibration uncertainties) suggested to treat patients under deep inspiration breath hold to improve organs at risk sparing and to apply robust optimization methods to mitigate the technical uncertainties. The guidelines included a valuable summary to evidence-based acceptable dose-volume constraints for heart structures, breasts, lungs and thyroid which were considered as an input also in the present study.
Due to the favorable physical properties of protons, the dose distributions of intensity modulated proton plans should inherently lead to an increased sparing of organs at risk and to the possibility of dose escalation compared to photon based treatments (when clinically appropriate). Nevertheless, in a resource-limited world, it is necessary to validate the added value of the anticipated dosimetric advantages for radiation toxicity, eventually in synergy with the possibility to prevent long term radiation induced effects (e.g. carcinogenesis). Simple methods based on the investigation of physical dose distributions or simple complication models, mostly at planning level, have been discussed but could only provide general indications and not individualized selection criteria [4][5][6][7][8]. The best approach to validation would be the execution of properly sized and defined clinical trials but this could be limited by ethical considerations, the time needed to obtain the results and the costs of execution. Alternative methods include the development of model-based validation strategies. This concept is founded on the assumption that predictive models for radiation-induced side effects can be developed and applied to estimate the risk of normal tissue complications (on multivariate level) and of other complications like secondary cancer induction for each patient. Based on the model predictions it would be possible to select patients for a given technique among a group of competitive solutions. There is a growing consensus toward the application of this strategies and in some countries (e.g. the Netherlands) this approach will be a pre-requisite for enrollment of patients into proton treatments [9,10].
The practical implementation of the model-based approach requires well designed data collection in order to fit the dose-effect relationships into multivariate models and to-date only few examples of validated models exist [11][12][13][14]. Considering the long life expectancy of the HL patients, the models used for treatment technique selection should incorporate also the risk of secondary cancer induction and the number of years lost due to any kind of late effects [15]. Up to-date no similar tools are available for HL.
Based on the paucity of evidence about this topic, the primary aim of this study was to investigate, at planning level, the relative merit of IMPT versus VMAT for young female patients with supradiaphragmatic HL. In particular, the focus was set to: i) the expected gain in a number of selected dose-volume metrics; ii) the estimation of the risk of complications for relevant organs at risk, and iii) the estimation of the risk of secondary malignancy induction for breast, lung or thyroid. Plans were designed with robust optimization methods to account for the recommendations from [3].
The secondary aim was, in the absence of validated model-based criteria for the assignment of patients to IMPT or VMAT treatments, to appraise with a simplified approach the frequency of indications to proton treatment. The selection criteria were based on simple thresholds applied to composite risks of toxicity and/or secondary cancer induction. Scope of this was to perform a very preliminar verification of the selection power under the assumption that not all patients might necessarily require proton treatments.

Patients selection, contouring and dose prescription
A cohort of 20 young female patients with stage IIA-B supradiaphragmatic HL were selected for this retrospective planning study. The cases were selected from most recent patients in the clinical database fulfilling the criteria above. The selection spanned back to 2016. All patients datasets were used for proton and photon planning.
The involved site radiation therapy concept was applied for the study. The clinical target volume (CTV) was defined as the involved lymph nodes region at diagnosis. The planning target volume (PTV) was generated adding an isotropic margin of 8 mm to the CTV (cropped 4 mm inside the skin surface). The following organs at risk (OAR) were segmented and considered: the breasts and the lungs, heart, the left anterior descending coronary (LAD), the left ventricle (LV), the esophagus, the thyroid and the spinal cord. For the LAD a planning risk volume was considered adding a margin of 4 mm. LAD and LV structures were contoured with the support of a senior diagnostic radiologist on the conventional planning CT datasets. All segmentations were executed on standard contrast-free planning CT acquired in free breathing mode.
The dose prescription was 30Gy in 15 daily fractions as in the clinical routine. For the planning comparison purposes, all the plans were normalized to 100% as the mean dose to the CTV. The dose-volume planning aims used for the optimization and assessment are listed in detail in Tables 1 and 2 for each structure. For the OARs the values were derived from [3].

Photon planning
The photon plans were designed and optimized according to the volumetric modulated arc therapy in its Rapi-dArc (RA). Flattening filter free photon beams (beam quality of 6MV) from a TrueBeam linear accelerator (Varian Medical Systems, Palo Alto, USA) were used for the study. Optimization was performed using the Eclipse ---Chapet [19] nr not reported treatment planning system Photon Optimiser algorithm and the final dose calculation with the Acuros-XB engine with a calculation grid of 2.5 mm (version 15.5). All plans were optimised according to a class solution consisting of two full arcs with collimator angle set to 10-350°. An additional third arc with collimator angle set to 90°was added if necessary to improve dose distributions. This latter was eventually split into 2 sub-arcs overlapping along the cranio-caudal axis if the field length would be longer than 15 cm.

Proton planning
The ProBeam proton system (Varian Medical systems, Palo Alto, USA) was used as a source of beam data for the optimization and calculation of intensity modulated proton therapy (IMPT) plans using the beam spot scanning technique. The dose distribution optimization was performed using the nonlinear universal Proton Optimiser (NUPO, v15.5) [20] while for the final dose calculation, the Proton Convolution Superposition algorithm (v15.5) was used using a grid size of 2.5 mm and a constant relative biological effectiveness RBE of 1.1. All patients were planned with a standardised class solution with one anterior field and two posterior-oblique beams arranged with individualized gantry angles according to the target position aiming to minimize the healthy tissue involvement [5].
A robust optimization technique was applied in order to account for setup and range uncertainties considering ±4 mm shifts in the isocentre along each axis and ± 3% in beam range. The robust optimization was run to result in the solution which minimizes the trade-offs due to the applied uncertainties to the dose-volume constraints in the cost function as discussed in [21]; robust plans were optimised by applying the uncertainties to the CTVs.

Quantitative assessment of dose-volume metrics
The mean dose and a variety of V x and D x parameters (V x represents the volume receiving at least an x level of dose (in % or in Gy) and D x is the minimum dose that covers an x fraction of volume (in % or in cm 3 ) [22] were derived from the dose volume histograms (DVH) and used as quantitative metrics.
For the target, the homogeneity index (HI) was scored to measure the variance of the dose. HI was defined as HI = (D 5% -D 95% )/D mean . The dose conformality was scored with the Conformity Index, CI95%, defined as the ratio between the patient volume receiving at least 95% of the prescribed dose and the PTV volume. The average DVHs were computed, for each structure and each cohort, with a dose binning resolution of 0.02Gy. Proton doses are reported in Cobalt equivalent Gy (corrected for the 1.1 RBE factor).
The significance of the observed differences was determined by means of the Wilcoxon matched-paired signed-rank test which is appropriate to assess the significance of the discrepancies between two matched pair of variables, not normally distributed in small samples. The threshold for statistical significance was set at p < 0.05. The SPSS package version 22 (IBM Corporation) was used for the study.

Modelling the risk of toxicity and secondary cancer induction
Normal tissue complication probabilities (NTCP) for selected endpoints were estimated for the lungs (pneumonitis), the esophagus (esophagitis) and the whole heart (death). The lungs were jointly considered as a single structure. Two models were used for the computation: the Lyman-Kutcher-Burman [23] and the seriality (Poisson-LQ) model [24]. The parameters used for the computation are summarized in Table 1 together with associated uncertainties as derived from the original publications. Data were retrived from: Gagliardi [16] for cardiac mortality (from the analysis of breast patients treated with photons); Seppenvolde [17] for pneumonitis (from data of breast cancer, malignant lymphoma and non-small-cell lung cancer patients); Moiseenko [18] for symptomatic pneumonitis (from malignant thymoma patients); Chapet [19] for esophagitis (from non-small-cell Other endpoints, included in first instance but leading to null NTCP estimation were not included in this report. The relative risk (RR) estimation for LAD and LV failure/disease was performed according to the linear model proposed by van Nimwegen [25,26] for Hodgkin lymphoma patients (following the observations of Darby [27] for breast cancer patients) who correlated coronary heart disease to the mean dose to the heart. The excess relative risk was fixed to 7.4 and 9.0% per Gy, respectively for LAD and LV. The choice of applying the van Nimwegen model using as input the mean dose to the LAD is somehow arbitrary in the absence of definitive data on sensitivity of substructures but is consistent with other similar investigations such as, e.g., Levis et al. [28].
Composite values for NTCP and RR predictions were also computed and reported. For NTCP, the composite prediction was defined as the sum of NTCP for esophagitis, pneumonitis (all three estimates) and cardiac mortality (Composite 1). For RR the composite resulted in the sum of the individual RR for LAD and left ventricule (Composite 2). The composite estimate shall be considered as a worst case scenario where all the possible toxicities concur simultaneously with the same weighting factor.
Concerning the risk of secondary malignancy induction, the excess absolute risk (EAR) for a whole specific organ (org) was estimated according to the methods described in [29]. In brief, it is expressed by: where V T is the total organ volume. The sum is over all the bins of the differential DVH, V(D i ) is the absolute volume receiving a dose D i . Δ is the slope of cancer induction based on the atomic bomb survivors' data [30] corrected for the age distribution. The values used in this analysis were: 4.98, 3.78, 3.2 and 0.4 for breast, lung, esophagus and thyroid. RED(D) is the dose response, which has been modeled using different approaches to fit the Hodgkin's patient's data group [30]. The organ equivalent dose (OED): was introduced as the dose in Gray, which, when uniformly distributed across the organ, causes the same radiation-induced cancer incidence.
Different models were published to calculate OED. In the present study, the so-called full model was adopted [31].
Where R is the parameter accounting for repopulation and/or repair and models the ability of the tissue to recover between two dose fractions (R = 0 means no recovery, R = 1 full recovery). This model fully includes all the biological aspects of cell killing, repopulation/repair, and fractionation. The used values for R were 0.62, 0.83, 0.846, 0 for breast, lung, esophagus and thyroid cancer induction, respectively [30][31][32]. In the case of thyroid, dose-response curves show no repair and are bellshaped [32].
Also for the EAR a composite value was defined as the sum of the EAR per each organ with the same worst case scenario approach used for NTCP and RR.

Model based selection of the appropriate technique
An attempt to determine selection criteria for the potential assignment of patients to either photon based or proton based treatments, a simple model based approach was derived from NTCP, RR and EAR estimates.
In the absence of validated models for lymphoma patients, arbitrary thresholds were defined for each of the composite estimates as follows: NTCP composite: thresholds set at either 8% or 5%; RR composite: threshold set to 0.25 or 0.10. For EAR the threshold on the composite value was set to either 15 or 10.
Given the definition of the composite estimates, this model based selection aims to detect the maximum possible benefit of one technique over the other in the rare case of all toxicities manifesting simultaneously. More sophisticated models, based on clinical data, should be developed in the case of clinical use.
The model based selection would then require that a patient would have been assigned to the IMPT treatment according to possible 3 scenarios (from more selective to more inclusive): Strong criteria: estimates concomitantly in excess of all the three higher thresholds; Intermediate criteria: estimates concomitantly in excess of all the three lower thresholds; Weak criteria: estimates in excess of at least 1 high or 2 low thresholds.
It is absolutely obvious that, given the small sample size of the study, the approach suggested here cannot be considered more than a feasibility aiming to pave the path for more appropriate and comprehensive models in view of any clinical applicability [9][10][11][12][13][14][15]. Fig. 1 illustrates the dose distributions for one representative case for the RA and IMPT plans in two axial and the coronal and sagittal planes. The colorwash was set from 1 to 33Gy. Fig. 2 presents the average dose volume histograms for the two techniques for the CTV, the PTV and the main organs at risk considered in the study. In front to an equivalence for the target volumes, the IMPT robust plans presented an average large sparing effect over the dose range 0-15Gy for most of the organs and over the entire range for the spine and thyroid. In the case of the esophagus the sparing effect of protons is less remarkable since this OAR is either partially included or very proximal to the target. For LV protons showed the sparing effect up to 5Gy while for higher doses no average difference was seen with respect to photons.

Dosimetric comparison
This qualitative overview reflects the quantitative analysis of the DVHs summarized in Table 2 for CTV and PTV and in Table 3 for the OARs. Regarding target volumes the only statistically significant differences were marked for the standard deviation in the case of the CTV (0.1Gy difference) and for the near-to-maximum dose D 2% for the PTV (0.3Gy, i.e. 1% difference). In both cases the clinical relevance of such a difference seems to be negligible. Contrarily to the target volumes, for all the OARs statistically significant differences were observed for most of the relevant dose-volume metrics and near to significant (p < 0.1) for the remaining.
When the analysis is restricted to the metrics having explicit planning aims, IMPT plans respected the constraints in all cases while RA plans resulted slightly suboptimal for the mean dose to the breasts, to the LAD (0.7Gy higher than the aim in both cases).
The dose conformity was similar for RA and IMPT (1 0% better for IMPT) but, as inherently expected, the dose bath resulted largely reduced with Protons. The V 10Gy metric resulted 75% higher for RA than for IMPT. Table 4 reports the the RR estimates for the LAD and LV as well as the NTCP estimates for the two techniques and the various endpoints considered in the analysis and for the composite estimates. The uncertainties reported are relative to the inter-patient variance. The uncertainty on the NTCP predictions due to the uncertainty on the model's parameters are not included here but were determined to be of the same order of the discrepancy observed between photons and protons.

Toxicity and secondary cancer risk estimates
Based on these results, the findings from proton plans are suggestive of a remarkable sparing effect with a highly significant average reduction of the RR of 0.13 for both LAD and LV.
In the case of NTCP, for all the endpoints the IMPT plans resulted in highly significant reduction of the risk or toxicity compared to RA although in this case the average risk for each individual endpoint would be relatively small, lower then 5%. This statement is correct under the working assumption that the model-related uncertainties cancel out in the computation of the NTCP difference (Δ(RA-IMPT)). Otherwise, the reported differences, at individual level, would not be Fig. 1 An example of the dose distribution for the proton and photon plans in axial, coronal and sagittal planes. The colorwash was set from 1 to 33Gy significant. The average composite gain with protons in NTCP risk resulted < 3%, and < 0.3 for RR.
The analysis of the EAR (expressed in cases per 10,000 patient-years) showed a systematic and highly significant expected benefit for IMPT with respect to RA as summarized in Table 5. In particular, the average EAR gain (defined as the EAR for photonsthe EAR for protons) resulted 9.1 and 7.2 for the breasts and the lungs, respectively. The estimated EAR gain did not reach 1 case per 10,000 patient-years for the thyroid and the oesophagus so, although statistically significant, this cannot be considered of clinical relevance.

Model based selection of the appropriate technique
Based on the arbitrary thresholds, the model based selection of patients simulated in this study would have lead to: 1 (5%) patient only eligible to proton therapy in the case of the strong, 4 (20%) patients in the case of the intermediate and 15 (75%) patients in the case of application of the weak criteria.

Discussion
The present study compared IMPT and RA plans for 20 HL patients. Prior to discuss the findings of the study, it is fundamental to acknowledge the limit of the sample size. Although it is common practice of in-silico planning studies to select small groups of patients, it is obvious that this approach does not allot to accurately capture all the risk factors due to the large uncertainties caused by contouring and organ motion. The role of a planning study is to identify the potential of a technique (compared to other references) and to pave the path to well-defined clinical investigations. The analysis of dosevolume metrics demonstrated a systematic incremental sparing of all OARs achievable with protons over photons. These results are in line with other planning evaluations [4][5][6]. Existing data from literature confirm the low toxicity profile expected from protons. Hoppe [33] reported the early outcome for HL patients treated by consolidative proton therapy. In a cohort of 138 patients (pediatric and adults), the 3-year relapse free survival was 92% with no grade 3 radiation-related toxicities. Similarly, König [7] reported about the treatment of 20 patients. Homogeneity index resulted 1.04, (coherently with the data from our study) with a 1-2Gy reduction observed in the mean dose to the breast compared to photon based intensity modulation plans (~3.3Gy in our evaluation) and 3-4Gy for the dose to the heart (~1.7 in our group). After a median follow up time of 32 months, the relapse free survival was 95.5% and no toxicity greater than 1-2 was observed. These results are a direct confirmation of the low incidence of severe complications in HL treatments but, in the absence of long follow-up time it does not fully clarify the long term morbidity/mortality risks.
The dosimetric potential advantage is translated into an anticipated reduction of risk of toxicity. Any estimate of normal tissue complication is based on set of parameters derived from the analysis of clinical dataset, frequently small and possibly obtained from treatments not strictly consistent with those under investigation. This relatively arbitrary choice of parameters shall be considered if absolute estimates are to be appraised. In addition, all model parameters are affected by inherent uncertainties (often large) which are reflected into the predictions. This can be seen as an implicit bias in the calculations which can be mitigated expressing the results in relative terms, as ratios or differences as we did in this study. This working hypothesis is used in the present study. Otherwise, due to the large uncertainty on the parameters concurring to the computation of NTCP (see Table 1), the discrepancy between the NTCP predictions for photon and proton plans, would be of the same order of the model-related uncertainty cancelling the significance of the differences at a patient-perpatient level. This issue should be mitigated only by an improved determination, with small variance, of the parameters in the NTCP model; a fact of major relevance for model-based selection of the treatment to select for the patients. The use of non-parametric tests to determine the potential significance of the discrepancies can accounts for the small sample sizes. This was evaluated with NTCP for the lungs, the whole heart and the oesophagus with different endpoints and different models using evidence-based input parameters for the calculations. A more recent approach was adopted for the estimation of the relative risk of morbidity for the heart substructures, namely the left ventricular chamber and the left anterior descending coronary. In this case the Nimwegen-Darby models [25][26][27] were applied and also in this case treatments with IMPT have the potential to reduce the risk on average. Levis [28] applied the same models for RR in a comparison among two variants of VMAT. They found that RR for the LV was 1.3 ± 0.6 while RR for the coronaries (all together) was 2.0 ± 0.6 in comparison with the 1.26 ± 0.20 and 1.41 ± RA RapidArc, IMPT Intensity modulated proton therapy, Prv planning risk volume, D x% dose received by at least (maximum) x% of the volume, V xGy volume receiving x Gy 0.34 for the RA plans in our study. This demonstrates the consistency of VMAT data among different institutions with different planning environments. It is important to notice, as limiting factors, that the van Nimvegen model vas developed correlating the coronary disease to the mean heart dose and not to the sub-structure dose. In addition, the data presented in those studies were derived from photon treatments and therefore the models might not be strictly applicable also to protons. In the absence of definitive published data, the assumptions made, although arbitrary, would allow an appraisal of the relative risks. The RR estimates for LAD and LV (and heart substructures in general) are biased by at least 2 major limits: i) the identification of these structures onto planning CT datasets and ii) the motion induced by heart beat. The use of the planning risk volume concept is an effective mitigation for the motion uncertainties of the coronaries (as applied in this study). The possibility to identify the heart sub-structures on normal (contrast free) treatment planning CT scans is instead quite limited. As a consequence of the challenges in segmenting the LAD on contrast-free planning CT (as in our study) it is obvious that the results presented shall be associated to an uncertainty which is potentially severe and certainly hard to quantify. A possible mitigation of both issues might be the advisable use of cardiac scanning  (with the need of registration against the planning CT and the associated extra costs). Alternatively, Levis [28] proposed to use atlas-based contouring guidelines (as defined by Feng [34]). This approach might facilitate the taks, particularly if dedicated cardiac scanning would not be available.
The calculations for the excess absolute risk of secondary cancer induction were performed by means of the socalled full model as described in the methods. This accounts for repair, repopulation and fractionation mechanisms. For the thyroid, epidemiologic data suggest the absence of any repair mechanism (as shown as an example by Bhatti [35]) so the parameter R is set to 0. Mathematically, the limit for R approaching R of the EAR function reduces it to the bell shaped form. In practice, EAR resulted quite large for breast and lung and strongly reduced with IMPT. EAR for lung was in absolute terms large also for protons (15.3 ± 8.0) because of the extensive dosimetric involvement of these organs for supradiaphragmatic HL patients. On the contrary, the thyroid and the oesophagus resulted in lower absolute EARs (in the range from 1 to 3.5 in average) with modest improvement offered by protons. In summary, lung and breast confirmed to be the structures at highest risk and, for (young) female patients, the risk reduction (a sparing of~72% from the RA level) expected from protons is likely the strongest benefit derived from the data analysed.
Given the large variability of target locations and extension, although all classified as supradiaphragmatic cases, the level of protection of each OAR, in particular the heart sub-structures, presents a large variance in all considered metrics (dose-volume parameters, NTCP, RR and EAR) confirming that, a case by case assessment of the appropriate treatment technique should be performed with a model-based methodology.
The absence of validated multivariate models applicable for HL patients prevented to implement the selection strategies suggested by Langendijk and realized by Rwigema or Cheng [9][10][11]13] for head and neck or Prayongrat [14] for liver. This is a clear need and intensive investigations shall be performed in the near future in this area to provide reliable model. A simple attempt to mimic the process was introduce in this study. In practice thresholds with different selection power (weaker or stronger) were applied to composite risk estimates (NTCP, RR and EAR) to determine scenarios of decreasing selectivity. Without the claim of generalization, we demonstrated that, even with quite relaxed criteria, not all patients would result eligible for proton therapy (up to 75% in the most inclusive scenario) while, given the anyhow high quality of VMAT plans, if highly restrictive criteria would be applied, only a minority of the cases (eventually of the order of 5%) might result eligible. It is obvious that this speculation should be considered with caution but it is suggestive/ confirmative of the fact that: i) selection is needed in a resource-limited health economic environment and ii) also HL patients can benefit from proton therapy, particularly for the groups at highest risk as young females (or pediatrics).
The IMPT plans used for this analysis were obtained by means of a robust optimization engine using 3% uncertainty in the range calibration and 4 mm in the isocenter position. The thresholds are somehow arbitrary but realistic from routine clinical practice. The use of robust optimization is coherent with the guidelines of the International Lymphoma Radiation Oncology group [3].
Although not fully consolidated in clinical practice, the use of robust optimization should be seen as an essential ingredient of the planning receipt.
Even if dosimetric studies shown that pencil beam scanning can be superior to other techniques [36] the same studies demonstrated that the quality of plans is sensitive to motion/positioning related issues which can be accounted for or mitigated by means of the robust optimization methods.
Motion due to respiration can be further mitigated by respiratory gating, and in this perspective, deep inspiration breath hold (DIBH, a practice largely consolidated in photon treatments in the breast or chest districts) can provide additional benefit in terms or dose reduction to various OARs. The use of DIBH is also part of the guidelines from the international lymphoma group. Everett [6] showed how DIBH could improve lung dosimetry but might had little impact on cardiac involvement in mediastinal lymphoma treatments. Similarly Edvardsson [8] demonstrated the benefit for lung and the limited value for breasts (with large variations depending on the individual case for the heart structures). In the present study, the CT datasets used for the planning were acquired in free breathing (due to the current clinical practice for HL in our institute) and it can be anticipated that all the dosimetric findings for lung might be further improved according to the above indications. Nevertheless the benefit would be consistently present in both RA and IMPT plans and therefore [5] the relative comparison between the two approaches would not be substantially different. It is clear that in the case of future clinical investigations, DIBH should be included as a prerequisite. The heartbeat is a challenging factor that could affect the dose delivery uncertainty. There are today no tools allowing any sort of gating or tracking following the heartbeat, contrarily to what happen with the respiration. In the supradiaphragmatic region, heartbeat changes the anatomy with a frequency higher than, and not syncronyzed with the dose delivery, generating an unforeseen and unintended interplay effect, especially for spot delivery in IMPT treatments.

Conclusion
In relation to young female patients with advanced supradiaphragmatic HL, IMPT can offer an improved dose-volume sparing of organs at risk leading to an anticipated lower risk of early or late treatment related toxicities. This would reflect also in significantly lower risk of secondary malignancies induction compared to advanced photon based techniques. Depending on the selection thresholds and with all the limits of a nonvalidated and vary basic model, it can be anticipated that a significant fraction of patients might be suitable for proton treatments if all the risk factors would be accounted for.