Clinical implications in the use of the PBC algorithm versus the AAA by comparison of different NTCP models/parameters

Purpose: To retrospectively analyse 3D clinical treatment plans in order to investigate, qualitatively, the possible clinical consequences of using the Pencil Beam Convolution (PBC) algorithm versus the Anisotropic Analytical Algorithm (AAA).

Methods: The 3D dose distributions of 80 treatment plans at four different tumour sites, produced using the PBC algorithm, were recalculated using the AAA with the same number of monitor units provided by PBC and clinically delivered to each patient. The consequences of the difference for the dose-effect relations for normal tissue injury were studied by comparing different Normal Tissue Complication Probability (NTCP) model/parameters extracted from a review of published studies. In this study the AAA dose calculation is considered as benchmark data. The paired Student t-test was used for statistical comparison of all results obtained with the two algorithms.

Results: In the prostate plans, the AAA predicted a lower NTCP value (NTCP AAA) for the risk of late rectal bleeding for each of the seven combinations of NTCP parameters; the maximum mean decrease was 2.2%. In the head-and-neck treatments, each combination of parameters used for the risk of xerostomia from irradiation of the parotid glands gave a lower NTCP AAA, which varied from 12.8% (sd = 3.0%) to 57.5% (sd = 4.0%), whereas the NTCP calculated with the PBC algorithm (NTCP PBC) ranged from 15.2% (sd = 2.7%) to 63.8% (sd = 3.8%), depending on the combination of parameters used; the differences were statistically significant. The NTCP AAA for the risk of radiation pneumonitis in the lung treatments was also lower than the NTCP PBC for each of the eight sets of NTCP parameters; the maximum mean decrease was 4.5%. A mean increase of 4.3% was found when the NTCP AAA was calculated with the parameters evaluated from dose distributions calculated by a convolution-superposition (CS) algorithm. A markedly different pattern was observed for the risk of developing pneumonitis following breast treatments: the AAA predicted a higher NTCP value. The mean NTCP AAA varied from 0.2% (sd = 0.1%) to 2.1% (sd = 0.3%), while the mean NTCP PBC varied from 0.1% (sd = 0.0%) to 1.8% (sd = 0.2%), depending on the chosen parameter set.

Conclusions: When the original PBC treatment plans were recalculated using the AAA with the same number of monitor units provided by PBC, the NTCP AAA was lower than the NTCP PBC, except for the breast treatments. The NTCP is strongly affected by the wide-ranging values of radiobiological parameters.


Background
As a result of the increased sophistication of treatment techniques and delivery methods, the accuracy of highly conformal radiotherapy has improved rapidly in recent years. However, more demanding modern treatment techniques require better modeling of treatment beams and more sophisticated modeling in the presence of inhomogeneities in order to guarantee accuracy in the calculation of the dose distribution. In clinical routine, calculations of dose to the tumor are performed by commercial treatment planning systems (TPS), and the majority of these systems include dose calculation algorithms with a limited ability to account for the effects of electron transport [1]. The Pencil Beam Convolution (PBC) algorithm is commonly used in clinical practice. However, it is well known that it has shortcomings in the presence of inhomogeneities, particularly in those regions where charged particle equilibrium does not exist [2][3][4][5]. The introduction of convolution-superposition (CS) algorithms, which better account for electron transport, has enabled improved calculation of the dose distribution, principally in the absence of electronic equilibrium [6][7][8][9][10][11]. The Anisotropic Analytical Algorithm (AAA), a 3D pencil-beam kernel-based superposition algorithm, is implemented in the Eclipse TPS (Varian Medical Systems) [12]. The AAA includes separately modeled contributions from three sources: primary photons, extra-focal photons and contaminating electrons; each of these has an associated fluence, an energy deposition density function and a scatter kernel. Inhomogeneities are better accounted for when the AAA is used, and its higher accuracy compared to Eclipse's PBC algorithm is well established [13][14][15][16].
Monte Carlo (MC) simulation is considered to be a gold standard in dose calculation, and it is therefore used to evaluate other dose calculation algorithms. Sterpin et al. [17] investigated the accuracy of the AAA in two studies. First, the AAA was compared both with MC and with measurements in an inhomogeneous phantom. Second, the AAA and MC were compared on four Intensity Modulated Radiation Therapy (IMRT) treatment plans in the presence of inhomogeneous tissue. They showed good agreement between the AAA and MC and evaluated the improved accuracy of the AAA compared to the PBC algorithm.
Fogliata et al. [18] showed that the Collapsed Cone (CC) algorithm and the AAA were highly consistent with the MC method when the impact of photon dose calculation algorithms on the expected dose distribution in lungs under different respiratory phases was investigated. PBC proved to be severely deficient in these calculations, particularly for cases where specific respiratory phases (e.g. deep inspiration breath hold) were assumed for treatment.
In the study of Bragg et al. [19], the AAA, compared to the PBC algorithm, was not found to significantly alter the quality of IMRT treatment plans for prostate, parotid or nasopharynx. However, its more accurate modeling of lateral electron transport revealed significant increases in the volume of PTV being underdosed in IMRT non-small cell lung cancer (NSCLC) treatment plans.
Nielsen et al. [20] investigated the differences in calculated dose distributions and NTCP values between six different dose calculation algorithms for NSCLC treatments. The study showed that the calculated NTCP values for pneumonitis were more sensitive to the choice of algorithm than the mean lung dose and V 20, which are commonly used for plan evaluation. Furthermore, NTCP for the lungs was calculated using two different model parameter sets; within each dose calculation algorithm, large differences were found between the calculated NTCP values.
The aim of our study was the retrospective analysis of 3D clinical treatment plans to investigate qualitative, possible clinical consequences of the use of PBC versus AAA, which was considered as benchmark data. The 3D dose distributions of 80 treatment plans at four different tumor sites, produced using PBC algorithm, were recalculated using AAA and the same number of monitor units provided by PBC and clinically delivered to each patient. Similarly to the study of Nielsen et al., in addition to the information relating to the normal tissue/target dosimetry and the Tumor Control Probability (TCP), the comparison was performed investigating the consequences on the dose-effect relations for normal tissue injury comparing different Normal Tissue Complication Probability (NTCP) model parameters extracted from a review of published studies.
Several authors have proposed studies to compute NTCP from a survey on clinical tolerance data [21,22]. After the first parameterization of dose-volume effects reported by Burman et al. [23] and based on the experience from the 2D radiotherapy era, the availability of 3D dose-volume information strongly increased the amount of quantitative data. Consequently much effort was dedicated both to the proposal of reliable dose-volume constraints that demonstrated capability to reduce toxicity, and the development of normal tissue complication probability (NTCP) models properly fitting clinical data. However NTCP models have been re-evaluated using 3D dose data calculated by rather simple computation algorithms; as it is difficult to quantify the clinical consequence of approximate dose calculations, the value of the reported NTCP parameters remains questionable. De Jaeger et al. [24] compared the results of dose calculations in lung tissue using an approximate dose calculation algorithm (the equivalent-pathlength model, EPL) with calculations for a convolution-superposition (CS) algorithm, and the consequences with respect to the estimation of normal lung tissue injury in a group of patients with NSCLC was evaluated [25]. The study demonstrated that when more accurate dose data is available, a reevaluation of NTCP model parameters is necessary to avoid NTCP being grossly over or underestimated.
The degree to which the NTCP is affected by the wide-ranging values of radiobiological parameters is shown in this study. At present, the evaluation of the optimal values of the radiobiological parameters is difficult; absolute NTCP values are not reliable enough to be considered for evaluating a treatment plan. However, the NTCP values have the attractive feature of condensing the whole dose distribution throughout the organ of interest into a single value and, along with the dose-volume parameters, are useful tools for comparing rival plans or for defining dose escalation strategies.

Methods
Patient data, treatment planning and delivery technique
3D clinical treatment plans of 80 patients were reviewed for this study. Twenty patients were irradiated for left breast cancer with two tangential fields. Twenty were treated for non-small cell lung cancer (NSCLC) by three-dimensional conformal radiation therapy (3DCRT); as the PTV locations varied widely, the beam angles were adjusted for each individual patient to meet both the set dose-volume constraints for Organs At Risk (OAR) and acceptable dose distributions. The tumor was situated in the middle or lower lobe; eight plans required four fields and all others used five fields. Twenty patients were irradiated with intensity-modulated radiotherapy for head-and-neck cancer using a seven coplanar field arrangement [26]. Finally, 20 prostate cancer 3DCRT treatment plans (five coplanar field technique: 0°, 45°, 90°, 270°, 315°) were re-evaluated.
The target volumes were defined in accordance with the 1993 International Commission on Radiation Units and Measurements Report 50 (ICRU Report 50). The gross tumor volumes (GTV) included all known gross disease as determined by imaging and clinical findings. GTVs were expanded to yield corresponding clinical target volumes (CTVs) according to clinical assessment in each case.
For breast cancer, the CTV was glandular breast tissue and the PTV was generated by expanding the CTV by 0.7 cm isotropically, except in the direction of the skin surface. For lung cancer, the stage was IIIA/IIIB (T3) with a broad scope of disease; the GTV was similar to the CTV. The radiation oncologist identified the GTV, and the CTV was delineated using a margin of 0.3 cm. A margin of 0.7 cm for middle lobe tumors and 1.0 cm for tumors in the lower lobes was added to create the PTV. For head-and-neck cancer, the margins were adjusted to 1.0 cm beyond the GTV to obtain the CTV; the CTV was expanded symmetrically by 0.3 cm in all directions to account for patient setup and motion within the thermoplastic mask. For prostate cancer the CTV was considered to be the prostate plus seminal vesicles; the planning target volume (PTV) was obtained by expanding the CTV in 3D by 1.0 cm, reduced to 0.7 cm at the prostate-rectum interface to avoid excessive rectal wall involvement.
All patients, except for five head-and-neck cases, were treated with one fraction per day, 5 days a week, with a fraction dose of 2 Gy at the ICRU reference point [27]. Five head-and-neck treatments received 69.96 Gy to PTV1 and 59.40 Gy to PTV2 with a simultaneous integrated boost in 33 fractions; the remaining ones received 70.0 Gy, according to clinical risk. The prescription doses for lung, prostate and breast cancer were 60.0, 76.0 and 50.0 Gy, respectively.
The treatment plans were developed using Eclipse 8.1 TPS (Treatment Planning System); the dose distributions of the clinical treatment plans initially performed using the PBC algorithm were recalculated with AAA using the same number of monitor units provided by PBC.
The paired Student t-test was used for statistical comparison of all results obtained from the use of the two algorithms. All tests were two-tailed with a p value of < 0.05 considered the threshold for statistical significance.
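As a rough illustration of the statistical comparison described above, the sketch below computes the paired t statistic for two sets of per-patient values and compares it with the tabulated two-tailed critical value for 19 degrees of freedom (20 plans per tumour site). The function name, the example numbers and the use of a critical value instead of an exact p value are assumptions made for this illustration; they are not taken from the study.

```python
import math
from statistics import mean, stdev

# Two-tailed critical value of Student's t for alpha = 0.05 and 19 degrees of
# freedom (20 paired plans per tumour site), taken from standard t tables.
T_CRIT_DF19 = 2.093


def paired_t_statistic(x, y):
    """Return (t, degrees_of_freedom) for the paired Student t-test."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must be paired and contain at least two values")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical NTCP values (%) for 20 patients; NOT data from the study.
    ntcp_pbc = [12.0 + 0.5 * i for i in range(20)]
    ntcp_aaa = [v - 1.0 - 0.1 * (i % 3) for i, v in enumerate(ntcp_pbc)]
    t, df = paired_t_statistic(ntcp_aaa, ntcp_pbc)
    print(f"t = {t:.2f}, df = {df}, significant = {abs(t) > T_CRIT_DF19}")
```

A statistics package would normally be used to obtain the exact p value; the critical-value comparison is used here only to keep the sketch free of external dependencies.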
For the validation of both the algorithms implemented in the TPS, the tests, the analyses, and the acceptability criteria were in large part based on the report of the AAPM Report 55 [28], and other documents such as the technical report by IAEA [29] were consulted. For the AAA, the outcomes of some tests were comparable to those provided by Van Esch et al. [14].

Dose analysis
For the PTV we evaluated D 95% , D 2% dose levels on the dose volume histogram (DVH) above which lay 95% and 2% of the volume of the PTV; they were used as a surrogate for dose minimum and dose maximum, respectively. The mean dose to the PTV was also considered.
To describe the degree of PTV dose inhomogeneity, the Inhomogeneity Index (II) was used, calculated as II = (D 2% − D 95%)/D median; II is equal to 0 if no intra-target inhomogeneity is observed.
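The dose levels and the index defined above can be read from a differential DVH as in the following minimal sketch. The DVH representation (a list of `(dose_gy, volume)` bins), the function names and the absence of interpolation between bins are assumptions of this example, not a description of the software used in the study.

```python
def dose_at_volume(dvh, percent):
    """Dose (Gy) above which `percent` % of the structure volume lies.

    dvh is a differential DVH given as (dose_gy, volume) bins; the result is
    resolved only to the bin dose (no interpolation between bins).
    """
    total = sum(vol for _, vol in dvh)
    target = total * percent / 100.0
    cumulative = 0.0
    # Walk from the hottest bin downwards, accumulating volume.
    for dose, vol in sorted(dvh, key=lambda b: b[0], reverse=True):
        cumulative += vol
        if cumulative >= target - 1e-12:
            return dose
    return min(dose for dose, _ in dvh)


def inhomogeneity_index(dvh):
    """II = (D2% - D95%) / Dmedian, with Dmedian taken as D50%."""
    d2 = dose_at_volume(dvh, 2.0)
    d95 = dose_at_volume(dvh, 95.0)
    d_median = dose_at_volume(dvh, 50.0)
    return (d2 - d95) / d_median


# Hypothetical PTV histogram (dose in Gy, volume in % of the PTV).
example_ptv = [(58.0, 5), (59.0, 15), (60.0, 60), (61.0, 18), (62.0, 2)]
print(inhomogeneity_index(example_ptv))
```

With a 25 cGy export step, the bin-level resolution of this approach is at most 0.25 Gy per dose level.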
For breast treatments, D 15% , D 2% and the mean dose, D mean , to the heart and homolateral lung were assessed; D 2% , D 20% , D 60% , V 20 (relative volume of the lung receiving at least 20.0 Gy) and D mean to the lung-CTV for NSCLC treatments were considered; mean parotid glands dose for head-and-neck treatment; finally D 2% , D 50% , D 95% and D mean to the rectum and D 2% , D 50% , D 80% to femoral heads for prostate treatments were recorded.
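The two remaining quantities in this list, the mean dose and V 20, follow directly from the differential DVH. The sketch below shows one way to compute them; the bin format and function names are hypothetical and the numbers are invented for illustration.

```python
def mean_dose(dvh):
    """Volume-weighted mean dose (Gy) of a differential DVH of (dose_gy, volume) bins."""
    total = sum(vol for _, vol in dvh)
    return sum(dose * vol for dose, vol in dvh) / total


def volume_receiving_at_least(dvh, threshold_gy):
    """Relative volume (%) receiving at least threshold_gy, e.g. V20 for 20.0 Gy."""
    total = sum(vol for _, vol in dvh)
    hot = sum(vol for dose, vol in dvh if dose >= threshold_gy)
    return 100.0 * hot / total


# Hypothetical lung histogram (dose in Gy, volume in % of the organ).
example_lung = [(5.0, 50), (15.0, 20), (25.0, 20), (45.0, 10)]
print(mean_dose(example_lung), volume_receiving_at_least(example_lung, 20.0))
```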
The NTCP and TCP calculations were performed with in-house software. The dose in Eclipse was calculated using the minimum available grid size, 0.25 cm, and the step size for differential DVH export was set to 25 cGy as a compromise between the time needed for the TCP/NTCP calculation and the accuracy of the computed values.
Six different combinations of NTCP model/parameters were available for the heart: three quantifying the risk of pericarditis and three the excess risk of cardiac mortality. For the lung we applied eight different model parameter sets relating to the development of pneumonitis. Some of the NTCP parameter sets used (e.g. De Jaeger et al. [24]) were derived for the lung-GTV volume, and applying them to the lung-CTV volume would generally be incorrect, as any difference between the PBC and AAA dose distributions is most noticeable at the lung/tumour interface. However, because the tumor was at an advanced stage and the GTV was similar to the CTV, lung-GTV and lung-CTV were only slightly different. Applying the NTCP model parameters for lung-GTV to the lung-CTV may therefore be acceptable.
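The two NTCP models compared in this work, the Lyman-Kutcher-Burman (LKB) model and the Relative Seriality (RS) model, can be evaluated on a differential DVH as sketched below, using their commonly published formulations. This is an illustrative sketch only: the function names and the DVH format are assumptions, no fraction-size correction is applied, and the parameter values in the example are placeholders rather than any of the published sets used in the study.

```python
import math


def lkb_ntcp(dvh, td50, m, n):
    """LKB model: probit of the generalized EUD of (dose_gy, volume) bins."""
    total = sum(vol for _, vol in dvh)
    # Generalized equivalent uniform dose with volume-effect parameter n.
    geud = sum((vol / total) * dose ** (1.0 / n) for dose, vol in dvh) ** n
    t = (geud - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def relative_seriality_ntcp(dvh, d50, gamma, s):
    """RS model with a Poisson dose-response for each uniformly irradiated bin."""
    total = sum(vol for _, vol in dvh)
    product = 1.0
    for dose, vol in dvh:
        # Response probability if the whole organ received this bin dose.
        p = 2.0 ** (-math.exp(math.e * gamma * (1.0 - dose / d50)))
        product *= (1.0 - p ** s) ** (vol / total)
    return (1.0 - product) ** (1.0 / s)


# Hypothetical lung histogram and placeholder parameters, for illustration only.
example_lung = [(5.0, 50), (15.0, 20), (25.0, 20), (45.0, 10)]
print(lkb_ntcp(example_lung, td50=30.0, m=0.4, n=1.0))
print(relative_seriality_ntcp(example_lung, d50=30.0, gamma=1.0, s=0.05))
```

Both functions return 0.5 for a uniform dose equal to the 50% tolerance dose, which is a convenient sanity check before applying a published parameter set.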
To quantify the risk of xerostomia from irradiation of the parotid glands we used three different combinations of NTCP model/parameters. Seven sets of parameters were used for the prediction of late rectal bleeding in the patients treated for prostate cancer, and only one set of parameters for necrosis of the femoral heads. The applied parameters are listed in Tables 1, 2
Using the LQ model, the TCP was also calculated from the DVH of the CTV. For head-and-neck, breast and lung tumors an α/β ratio of 10 Gy was used. At present there is much debate about the value of α/β for prostate cancer. Modeling studies suggest that α/β could be as low as 1.49 Gy, while other studies show higher α/β [32][33][34][35][36][37]. Because valid data were not available, we decided to carry out our investigation with two extreme values: α/β ratios of 1.49 Gy [32] and 10 Gy [37] were chosen. Hereafter these sets of parameters are referred to as α/β_1.49 and α/β_10, respectively.
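The text states only that the TCP was derived from the CTV DVH with the LQ model. One common way of doing this is a Poisson TCP with LQ cell survival, sketched below under stated assumptions: uniform clonogen density, no repopulation, the same number of fractions for every voxel, and hypothetical values for `alpha` and `clonogens`. It is not necessarily the formulation used by the authors.

```python
import math


def poisson_tcp(dvh, n_fractions, alpha, alpha_beta, clonogens):
    """Poisson TCP from a differential DVH of (total_dose_gy, volume) bins.

    alpha is in 1/Gy, alpha_beta in Gy, clonogens is the initial number of
    clonogenic cells in the whole structure (assumed uniformly distributed).
    """
    total = sum(vol for _, vol in dvh)
    surviving = 0.0
    for dose, vol in dvh:
        dose_per_fraction = dose / n_fractions
        # LQ surviving fraction after n_fractions of dose_per_fraction each.
        sf = math.exp(-alpha * dose * (1.0 + dose_per_fraction / alpha_beta))
        surviving += clonogens * (vol / total) * sf
    return math.exp(-surviving)


# Hypothetical CTV histogram and radiobiological values, for illustration only.
example_ctv = [(74.0, 10), (76.0, 80), (77.0, 10)]
for ab in (1.49, 10.0):
    print(ab, poisson_tcp(example_ctv, 38, alpha=0.15, alpha_beta=ab, clonogens=1e7))
```

Evaluating the same DVH with both α/β values shows how the assumed ratio alone changes the computed TCP.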

Results
The results of the comparison of the treatment plans as calculated by two algorithms, PBC and AAA, are summarized in Tables 6, 7, 8, and 9 for breast, lung, head-and-neck and prostate treatment respectively.
Subsequently, NTCP calculated with the AAA and PBC algorithm are referred to as NTCP AAA and NTCP PBC , respectively; the NTCP values less than 0.1% are assumed to be zero.

The breast treatment
When AAA was used, the maximum percentage difference was −3.3% for D 95% and an increase of 2.0% for II was found (Table 6). The poorer coverage of the PTV was reflected in the TCP, which was significantly lower when the AAA was used: the mean value was 77.3% (sd = 7.7%) for AAA and 85.1% (sd = 4.3%) for PBC (p < 0.001). For the ipsilateral lung, while the mean D 2% decreased when the AAA was applied, the mean D 15% and D mean increased by 3.0 Gy and 1.8 Gy respectively; the mean NTCP AAA values were higher than NTCP PBC (Figures 1 and 2). The mean NTCP AAA varied from 0.2% (sd = 0.1%) to 2.1% (sd = 0.3%), while the mean NTCP PBC varied from 0.1% (sd = 0.0%) to 1.8% (sd = 0.2%), depending on the chosen parameter set.
A mean increase of 0.1% was observed in the NTCP AAA value when it was estimated with the De Jaeger et al. [24] parameters re-evaluated using a convolution-superposition (CS) algorithm for the dose calculation. When the Seriality Model was applied, the NTCP value estimated with the Seppenwoolde et al. [38] parameters was lower than that predicted by the LKB model.
Following left breast cancer treatments, the risk of excess cardiac mortality was found to be low; there was no statistically significant difference between the AAA and PBC. The risk for developing pericarditis was zero for all the considered NTCP parameters. These results might be due to little cardiac tissue having been exposed to radiation beam, but on the other hand it is also necessary to consider that presently there are insufficient clinical data on the dose-response characteristics of cardiac tissue on which to base reliable estimates of radiobiological parameters.

The lung treatment
Changes were observed specifically in the PTV coverage; there was a mean percentage difference of about −3% for the D 95% , D 2% and D mean when the AAA was used and as a result, a mean decrease of 8.3% for TCP was observed ( Table 7).
For the normal lung dose parameters, the AAA predicted a mean reduction of 3.6 Gy for D 2% , while the differences for D mean and D 20% were not statistically significant. D 60% and V 20% were found to be slightly higher when calculated with the AAA than had been predicted by the PBC algorithm. The resultant NTCP AAA values were lower than the NTCP PBC (Figure 3). For both Seppenwoolde et al. [38] and Emami et al. [39] / Burman et al. [23] parameters, the comparison of NTCP values as calculated by two models, LKB and RS, involved higher values if the LKB model was applied.
When De Jaeger et al. [24] parameters were used we found a mean increase of 4.3%: 14.9% (sd=7.2%) versus 10.6% (sd=4.8%), if the NTCP AAA was calculated with the parameters evaluated from dose distribution based on a CS algorithm.

The head-and-neck treatment
The percentage differences between AAA and PBC for D 95%, D 2% and D mean were −3.7%, −2.9% and −3.4% respectively, and the differences were statistically significant (Table 8). A reduction of 4% was observed in the mean TCP with the use of the AAA.
The mean NTCP AAA values for parotid gland toxicity were lower than the NTCP PBC values. Furthermore, the NTCP value obtained using the Emami et al. [39]/Burman et al. [23] parameters was lower than the value predicted by the other sets of NTCP parameters. Using the Eisbruch et al. [41] parameters, the risk of a decrease in the salivary flow to 25% of the pre-treatment flow at 1 year post treatment was much higher (see Table 8) than the risk calculated with the Roesink et al. [42] parameters, which considered the same endpoint.

The prostate treatment
This type of treatment showed a markedly different pattern. The present study showed no clinically significant differences for any of the evaluated PTV dose parameters (Table 9). The type of algorithm did not affect the II (Figure 4). Across the two algorithms, the TCP was within 2.0%. Regarding the rectum, the AAA predicted slightly, though statistically significantly, higher values for the dose parameters, except D 2%. On the other hand, when the PBC was used, a small percentage of the rectum volume was exposed to a higher dose than had been obtained with the AAA, and the NTCP PBC was higher than the NTCP AAA (Figure 5). The tissue-bone interfaces were encountered by the lateral fields (90°, 270°); a slight shift of the femoral heads DVH curve towards higher doses was found when the AAA was used (Figure 5). Consequently, the only available combination of parameters (Emami et al. [39] / Burman et al. [23]) predicted a slightly higher risk of necrosis, although of limited clinical significance.

Discussion
Since the AAA is considered to be a more accurate dose computation algorithm, the comparison between the AAA and PBC dose distributions through the analyzed dose indices provides an indication of the difference between the dose predicted by the PBC and that actually delivered. In our study the consequences of this difference for the dose-effect relations for normal tissue injury were analyzed by comparing different NTCP model/parameters. The influence of the low density lung in proximity to the target volume was the factor largely responsible for the differences between the algorithms' dose distributions for the breast plans. When AAA was used, the increase of II was the result of the broadening of the penumbra, which is well modeled by the AAA; the TCP was significantly lower.
With a more accurate algorithm the radiation in the lung was clearly transported further away; when the AAA was used, the mean D 2% to the lung decreased by 8.1%. This is in accordance with the findings of Knöös et al. [16], who in their comparison of dose calculation algorithms divided the algorithms into two groups based on how changes in electron transport were accounted for; for the breast plans, they found that the D 5% to the adjacent lung decreased by 9.5% or more when accurate algorithms were used. While the mean D 2% decreased when the AAA was applied, the mean D 15% and D mean increased; as a result, the mean NTCP AAA values were higher, because the available NTCP parameter sets describe the lung as having a predominantly parallel architecture for the analyzed endpoint. The incidence of radiation pneumonitis predicted by both the Emami et al. [39] / Burman et al. [23] and the Emami et al. [39] / Ågren-Cronqvist [40] parameters was zero. It is worth remembering that the Emami et al. [39] study was derived from the clinical experience of the 2D planning era, without any individually assessed dose-volume information, when most external radiation therapy was delivered with opposing fields and the normal tissue was irradiated with a fairly uniform dose fraction size.
It would be interesting to know the NTCP values for the parameters estimated from Emami's study. However more reliable parameterizations, fitting radiobiological models to clinical and dosimetric data, have been published and have to be considered to predict the risk of normal tissue toxicity.
In the lung treatment, the extremely inhomogeneous lung region resulted in the major difference of the two algorithms' abilities to account for inhomogeneities on the final dose distribution. The factor responsible was the widening of the penumbra, in fact looking closer at the isodoses in proximity to the target revealed that they had a larger separation when the more accurate AAA was applied and was worse for those treatments where a large amount of lung tissue is involved in the PTV. These results are in agreement with the report of Bragg et al. [13], who found that lung plans generated the most significant issues in PTV coverage. When the AAA was used, a reduction for D 2% was observed, while the differences for D mean and D 20% were not statistically significant, contrary to what was observed in the breast plans. The NTCP AAA values were lower than the NTCP PBC . The highest NTCP values and the maximum differences between NTCP AAA and NTCP PBC were found for Emami et al. [39] parameters. One can see from Table 2 how much these parameters are markedly different than the other parameterizations, whether for the LKB model or for the Seriality model.
The results obtained show how the use of NTCP parameters based on more accurate dose calculation should be recommended to avoid underestimating the calculated values of NTCP. When De Jaeger et al. [24] parameters were used the underestimate was not significant when the lung tissue volume involved was small, such as for breast treatments, but it was relevant when the volume was major such as for lung treatments; we found a mean increase of 4.3%, when the NTCP AAA was calculated with the parameters evaluated from dose distribution based on a CS algorithm.
In the head-and-neck treatments, as every beam passed through a region of low density material, the resulting dose distributions of the two algorithms were dissimilar. A reduction of the mean TCP and of the mean NTCP values, quantifying parotid gland toxicity, was observed with the use of the AAA.

Figure 1 Comparison of NTCP for risk of developing pneumonitis following breast treatment computed with the AAA (ordinate) and the PBC algorithm (abscissa) for NTCP models/parameters sets from Table 2. Each symbol represents data of an individual patient. The dotted line indicates the line of identity.
Moreover, using the Eisbruch et al. [41] parameters, the NTCP value was much higher than the risk calculated with the Roesink et al. [42] parameters, which considered the same endpoint. This can be taken as a warning to radiation oncologists: before introducing a predictive model into clinical practice, it is necessary to assess whether its predictions "make sense" in regard to that clinic's treatment plans and experience [43,44].
The dose distribution across the two algorithms was found to be very similar in the prostate plans; no clinically significant differences for any of the evaluated PTV dose parameters were observed. Such small differences between the dose distributions were found because only two of the five beams used for these treatments were such that there was involvement of heterogeneities.
When the PBC was used, a small percentage of the rectum volume was exposed to a higher dose than had been obtained with the AAA, and the NTCP PBC was higher than the NTCP AAA. This is consistent with a predominantly serial architecture of the rectum for the analyzed endpoint. The effect of the variability of the NTCP parameters on the NTCP value is shown in Table 9. The major differences between the two algorithms were found for those sets of parameters with the n-parameter closer to 0; the highest NTCP values were obtained with the parameters proposed by Tucker et al. [45] and Rancati et al. [46]. It is interesting to note how a slight difference in the m-value and D 50 -value between the Tucker et al. [45] and Söhn et al. [47] parameters resulted in an important difference in the corresponding NTCP values.
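The sensitivity to the m and D 50 values noted above can be made concrete with the LKB probit function evaluated at a fixed generalized EUD. The sketch below uses two invented parameter pairs that differ only slightly; they are hypothetical and are not the Tucker et al. or Söhn et al. values.

```python
import math


def lkb_ntcp_from_geud(geud, td50, m):
    """LKB NTCP for a given generalized EUD (Gy), 50% tolerance dose and slope m."""
    t = (geud - td50) / (m * td50)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Hypothetical rectal gEUD and two parameter pairs differing by 2 Gy and 0.02.
geud = 60.0
set_a = {"td50": 78.0, "m": 0.14}
set_b = {"td50": 80.0, "m": 0.12}
print(lkb_ntcp_from_geud(geud, **set_a), lkb_ntcp_from_geud(geud, **set_b))
```

Because the dose of interest lies on the steep lower tail of the sigmoid, a small shift of its position or slope changes the computed NTCP by a large relative factor.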
The comparison of the two algorithms in the present study is in accordance with the literature; the differences are of minor clinical significance in many situations such as for prostate treatments and probably for other lesions in the pelvic area. The adoption of the AAA into clinical treatment planning practice requires one to fully understand its effect and its potential consequences so as to re-evaluate an assessment of dose-effect relationships and of parameters used in treatment planning decisions [48]. Similarly the introduction of a predictive model into clinical practice has to be prudent as it is necessary to assess if it is based on calculations and treatments similar to those for which the NTCP has to be calculated. The results found in this study show how the NTCP is strongly affected by the wide-ranging values of radiobiological parameters and the differences between the dose distributions of the two tested algorithms yield statistically significant differences in the NTCP values.

Conclusions
In this study, we investigated, qualitatively, the possible clinical consequences of the use of PBC versus AAA (keeping the same number of monitor units provided by PBC and clinically delivered to each patient) by comparing different NTCP model/parameters. As a general result, the NTCP AAA was lower than the NTCP PBC, except for the breast treatments.
The difference in NTCP between PBC and AAA treatment plans could be clinically significant and it may be the subject of a future prospective study.
Moreover, we have observed how strongly the NTCP value depends on the choice of radiobiological parameters. Radiobiological modeling can play an important role in high quality radiotherapy; however, uncritical reliance on model results may compromise treatment outcomes and patient safety. It is important to use NTCP parameter sets based on calculations and treatments similar to those for which the NTCP has to be calculated; additionally, it is necessary to improve models and obtain more robust radiobiological parameters [49][50][51][52][53][54][55][56].