Dose calculation algorithms
Earlier dose calculation algorithms of types (A) and (B), such as pencil beam, convolution/superposition and the analytical anisotropic algorithm (AAA), are based on the D(w,m) mode [8, 9]. The D(w,m) computed by these algorithms can be converted back to D(m,m). In contrast, the Acuros XB (AXB) algorithm, of type (C), solves the linear Boltzmann transport equation (LBTE) and provides both modes, D(w,m) and D(m,m). In type (C), the absolute dose in each voxel is calculated from the computed electron angular fluence, the macroscopic electron energy deposition cross sections and the material density of the voxel. Among these algorithms, AXB shows the best agreement between measured and calculated doses and is the closest to full Monte Carlo (MC) simulations [10,11,12,13,14].
The D(m,m) mode has only recently become available in treatment planning. The dose distribution calculated with D(m,m) can therefore be compared with the reference distribution calculated with D(w,m), for which the dose limits are well established and used in radiation oncology.
Clinical example
The following analyses of lung cancer radiotherapy data give an overview of how the QA process may be used to evaluate real differences between treatment plans calculated with different dose calculation algorithms. The recent move from D(w,m) to D(m,m) was used as an example of a change of dose calculation model. It represents a real decision-making situation in radiotherapy, as well as the future transition for all radiotherapy departments seeking to improve their treatment plans and to approach the true DVH.
The magnitude of the dose differences depends on the type of algorithm transition and on the reference algorithm. The methods described in this study have been applied to lung cancer treated with photon beams, as examples of transitions (e.g., moving from pencil beam convolution (PBC) with no heterogeneity correction (PBCNC) to PBC with the modified Batho density correction method (PBCMB), or moving from AAA to AXB D(m,m)).
Quality assurance method
Normalization methods to compare dose calculation algorithms
To compare different dose calculation algorithms, all dosimetric data are calculated on a single set of images for a given patient, regardless of the number of algorithms to compare.
The QA requirements for the clinical validation of a new dose calculation algorithm can be summarized as follows [15]:

a three-dimensional conformal radiotherapy (3DCRT) plan is initially generated for each chosen case to deliver the prescription dose (Dpr) with the best possible conformation. This is the reference Plan 1. 3DCRT is a convenient technique for evaluating the real impact of a change of dose calculation algorithm on the monitor units (MUs) and the dose distribution, because it keeps the technical parameters to a minimum, in contrast to the more complicated intensity-modulated radiation therapy (IMRT) technique. Plan 1 should be normalized at the isocentre (Diso), defined as closely corresponding to the center of the planning target volume (PTV). The Dpr should cover 95% of the PTV, giving a realistic treatment plan that meets the radiotherapy goal: maximizing the dose to the PTV while minimizing the dose to the organs at risk (OARs).

the test plan, Plan 2, uses exactly the same beams as Plan 1, recalculated for each field with the new algorithm, for the same Dpr as Plan 1.

a complementary plan, Plan 3, is generated using the same MUs as the reference plan with the same beam arrangements. Plan 3 thus shows the dose distribution of the former treatments (Plan 1) as recalculated with the new algorithm.

field sizes and shapes should be identical in all plans, using the beam's-eye-view projection of the PTV or gross tumor volume (GTV).
There are different modes of dose prescription, the most popular being either the Dpr at the isocentre (Diso), as recommended by International Commission on Radiation Units & Measurements (ICRU) reports 50, 62 and 83 [16,17,18], or requiring that the entire PTV is covered by at least 95% of the Dpr, or that 95% of the PTV receives at least the Dpr (D95% = Dpr), etc. Under the above conditions, the maximum dose within the target could range between about 95% and 105% of the Dpr. Any mode of dose prescription is compatible with the procedure described here. Fig. 1 shows the successive generation of Plans 1, 2 and 3 for each patient case.
QA procedure
Fig. 2 summarizes this QA method for measuring and assessing the dosimetric shift introduced by a new dose calculation algorithm, including dosimetric analysis, gamma indices (γ), and radiobiological and statistical analyses.
Delivered dose
The MUs can be used as a QA tool to compare and validate photon dose calculation algorithms. The MUs from the former (reference) algorithm can be reused to recalculate the delivered dose (DD), in Plan 3, at the reference point, Diso. The dose difference, ΔDiso, for the Diso recalculated with the new algorithm depends on the magnitude of ΔMUs (between Plans 1 and 2):

If ΔMUs > 0, i.e. the MUs from the reference Plan 1 exceed the MUs from the tested Plan 2, the Diso will be higher in Plan 3 than in the reference Plan 1.

If ΔMUs < 0, i.e. the MUs from the reference Plan 1 are lower than the MUs from the tested Plan 2, the Diso will be lower in Plan 3 than in the reference Plan 1.
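The link between ΔMUs and the recalculated Diso can be illustrated with a minimal sketch. It assumes that, for a given field and a given algorithm, the dose at the isocentre is proportional to the MUs of that field. The function name and all numerical values are hypothetical and are not taken from the study.

```python
def plan3_isocentre_dose(field_doses_plan2, mus_plan1, mus_plan2):
    """Estimate Diso in Plan 3 (new algorithm, MUs of the reference Plan 1).

    field_doses_plan2: per-field dose at the isocentre in Plan 2 (Gy)
    mus_plan1, mus_plan2: per-field MUs in Plan 1 and Plan 2
    Assumption: for a given field and algorithm, dose is proportional to MUs.
    """
    if not (len(field_doses_plan2) == len(mus_plan1) == len(mus_plan2)):
        raise ValueError("one entry per field is required in every list")
    return sum(
        dose * mu_ref / mu_new
        for dose, mu_ref, mu_new in zip(field_doses_plan2, mus_plan1, mus_plan2)
    )


# Hypothetical three-field plan: Plan 2 delivers Dpr = 2.0 Gy at the isocentre.
doses_plan2 = [0.8, 0.6, 0.6]      # Gy per field, new algorithm
mus_ref = [105.0, 84.0, 84.0]      # Plan 1 (reference algorithm)
mus_new = [100.0, 80.0, 80.0]      # Plan 2 (new algorithm)

diso_plan3 = plan3_isocentre_dose(doses_plan2, mus_ref, mus_new)
delta_mus = sum(mus_ref) - sum(mus_new)
print(f"Delta MUs = {delta_mus:+.1f}, Diso in Plan 3 = {diso_plan3:.3f} Gy")
```

In this made-up case ΔMUs is positive, so the recalculated Diso in Plan 3 exceeds the 2.0 Gy prescribed in Plan 1, in line with the first rule above.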
Dose volume histograms (DVH) indices
The QA process should be performed for each cancer site, for both the target and the OARs. Anatomical regions with the most heterogeneous tissue densities are also the most prone to dosimetric shifts. The DVH should be recalculated with the new dose calculation algorithm, first using the same Dpr (Plan 2) and then using the same MUs as the former algorithm (Plan 3), as mentioned above. The beam arrangements, geometry and rotation should be the same in all plans, without any supplementary optimization. The most important parameters are the near-minimum dose (D98%), the near-maximum dose (D2%) and the mean dose (Dmean). In addition, dose-volume indices such as the percent volume receiving at least 95% of the prescription dose (V95%) and D95%, as well as quality indices, are recommended. The change of algorithm results in different DVH parameters, which can have a significant impact on quality indices. Higher or lower calculated doses translate into overestimation or underestimation of the delivered doses and thus influence the TCP/NTCP values. The D95% for the PTV should be as close as possible to the Dpr, in order to avoid under-irradiation of the tumor. On the other hand, a higher D95% would predict a higher TCP value.
The D95% could be used as an indicator to readjust the Dpr and correlate with TCP values [19]:
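As an illustration, the DVH indices listed above can be extracted from the voxel doses of a structure as in the following sketch. It assumes voxels of equal volume and takes Dx% as the minimum dose received by the hottest x% of the structure. The function names and the dose values are hypothetical.

```python
import math


def dose_at_volume(doses, volume_percent):
    """Dx%: minimum dose received by the hottest x% of the structure."""
    ordered = sorted(doses, reverse=True)
    # round() guards against floating-point noise before taking the ceiling
    k = max(1, math.ceil(round(volume_percent * len(ordered) / 100.0, 9)))
    return ordered[k - 1]


def volume_at_dose(doses, dose_threshold):
    """Percent volume receiving at least dose_threshold."""
    return 100.0 * sum(d >= dose_threshold for d in doses) / len(doses)


def dvh_indices(doses, dpr):
    """Collect the indices discussed in the text for one structure."""
    return {
        "D98%": dose_at_volume(doses, 98),
        "D95%": dose_at_volume(doses, 95),
        "D2%": dose_at_volume(doses, 2),
        "Dmean": sum(doses) / len(doses),
        "V95%": volume_at_dose(doses, 0.95 * dpr),
    }


# Hypothetical PTV of 100 equal-volume voxels, doses in Gy, Dpr = 60 Gy.
ptv_doses = [56.05 + 0.1 * i for i in range(100)]
indices = dvh_indices(ptv_doses, dpr=60.0)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Running the same function on the doses exported for Plan 1, Plan 2 and Plan 3 would give directly comparable index sets for the reference and tested algorithms.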
Gamma analysis
The γ index is a very useful tool for comparing measured and calculated dose distributions in situations where the measurement uncertainties introduce a mix of positional and dosimetric uncertainties. This tool combines two criteria: the dose difference in percent (%) and the distance-to-agreement (DTA) in millimeters (mm). An ellipse is used to determine the acceptance region, with γ ≤ 1 representing fulfillment of the criteria [20]. Since γ analysis generates a value for every point in a distribution, this value contains information about the magnitude of any disagreement in dose and DTA between two planning algorithms. Thus, to make an overall comparison, a novel approach using 2D or 3D γ analysis has been proposed. The utility of γ for comparing the results of two planning algorithms has been demonstrated in several works [21, 22]. For γ analysis, the Digital Imaging and Communications in Medicine (DICOM) data, including the dose distributions from the reference and tested algorithms for each patient, should be exported from the TPS. The results per treatment plan can be calculated by considering all pixels for a specific patient in the axial, sagittal and coronal planes. The results are displayed using γ maps and a cumulative pixels-γ histogram (PγH).
The γ maps show the pixels with γ > 1 that are out of tolerance, indicating overestimated or underestimated doses. The healthy tissues located around the target volumes can then be distinguished. Superimposing the γ map on the computed tomography (CT) scan provides the anatomical information, showing in color where the dose differences are located and helping the radiation oncologist in decision-making. The PγH indicates the fraction of pixels with γ ≤ 1. We considered the dose distributions from both algorithms to be similar if at least 95% of pixels or voxels passed the γ criteria with γ ≤ 1.
It is worth noting that other techniques more or less similar to γ, such as the delta envelope, can also be used to compare dose distributions. However, caution should be exercised when comparing a dose distribution from a former algorithm with one computed by MC simulation, since the statistical noise in MC dose distributions can lead to an overestimated or underestimated average γ value or γ passing rate [23,24,25].
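The γ computation and the 95% passing criterion can be sketched on a one-dimensional dose profile as follows. This is a simplified illustration, not the implementation used in the study: the 3%/3 mm criteria, the global normalization to the maximum reference dose, the absence of interpolation between grid points and the synthetic profiles are all assumptions.

```python
import math


def gamma_1d(positions, dose_ref, dose_eval, dose_crit_percent=3.0, dta_mm=3.0):
    """Global gamma index for each reference point of a 1D profile.

    Assumptions: dose criterion expressed as a percentage of the maximum
    reference dose, brute-force search over grid points, no interpolation.
    """
    dose_tol = dose_crit_percent / 100.0 * max(dose_ref)
    gammas = []
    for x_ref, d_ref in zip(positions, dose_ref):
        gammas.append(min(
            math.hypot((x_ev - x_ref) / dta_mm, (d_ev - d_ref) / dose_tol)
            for x_ev, d_ev in zip(positions, dose_eval)
        ))
    return gammas


def passing_rate(gammas):
    """Percentage of points with gamma <= 1 (one point of the PγH)."""
    return 100.0 * sum(g <= 1.0 for g in gammas) / len(gammas)


# Synthetic profiles: the tested algorithm returns doses 2% higher everywhere.
positions = [float(x) for x in range(41)]                      # mm
dose_ref = [100.0 * math.exp(-((x - 20.0) / 10.0) ** 2) for x in positions]
dose_new = [1.02 * d for d in dose_ref]

gammas = gamma_1d(positions, dose_ref, dose_new)
rate = passing_rate(gammas)
print(f"gamma passing rate: {rate:.1f}%")
print("distributions similar" if rate >= 95.0 else "distributions differ")
```

The same ellipse-based distance extends to 2D planes or 3D volumes by adding the extra spatial coordinates to the distance term.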
Radiobiological analysis
The DVHs for the target and the OARs can be used to determine the TCP and NTCP, respectively, for a treatment plan with a specific Dpr. The most important parameter that correlates with the TCP is the Dpr, translated by the TPS into DD through the MUs. However, when the dose calculation algorithm is changed, the dose distribution changes and it would be hard to obtain exactly the same TCP and NTCP values as with the reference algorithm. In this context, to correlate the real DD with the Dpr, the equivalent uniform dose (EUD) concept was shown to be a useful indicator for comparing dose distributions from different algorithms, for the target volume and OARs [26].
According to Niemierko’s model, EUD is defined as [27, 28]:
$$ EUD={\left(\sum \limits_i{v}_i{D}_i^a\right)}^{1/a} $$
(1)
where (v_{i}) is the fractional organ volume receiving a dose (D_{i}) and (a) is a tissue-specific parameter that describes the volume effect. Values of (a) are reported in the literature for some tissues, but one limitation of the EUD's applicability is that tissue-specific parameters such as (a) are not readily available for all of them.
The TCP and NTCP could be calculated as:
$$ TCP=\frac{1}{1+{\left(\frac{TCD_{50}}{EUD}\right)}^{4{\gamma}_{50}}} $$
(2)
$$ NTCP=\frac{1}{1+{\left(\frac{TD_{50}}{EUD}\right)}^{4{\gamma}_{50}}} $$
(3)
where TCD_{50} is the dose required to control 50% of the tumors when they are homogeneously irradiated, TD_{50} is the tolerance dose for a 50% complication rate of the normal organ, and (γ_{50}) describes the slope of the dose-response curve.
As shown in eq. 1, the EUD concept combines the dose distribution with a radiobiological parameter (a) and reflects the biological properties of the tumors and organs. The parameter (a) has a negative value for tumors, and a ≥ 1 for OARs. The values (a = 1/n) for OARs can be taken from the Lyman-Kutcher-Burman (LKB) model [29, 30].
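Equations 1 to 3 can be evaluated directly from a differential DVH, as in the minimal sketch below. The DVH bins and the radiobiological parameters (a, TCD_{50}, TD_{50}, γ_{50}) are illustrative placeholders chosen for the example, not clinical values from this work.

```python
def eud(volumes, doses, a):
    """Eq. 1: EUD = (sum_i v_i * D_i**a) ** (1/a).

    volumes are normalized here so that the fractional volumes sum to 1.
    Doses must be > 0 when a is negative.
    """
    total = sum(volumes)
    return sum(v / total * d ** a for v, d in zip(volumes, doses)) ** (1.0 / a)


def tcp(eud_value, tcd50, gamma50):
    """Eq. 2: tumor control probability."""
    return 1.0 / (1.0 + (tcd50 / eud_value) ** (4.0 * gamma50))


def ntcp(eud_value, td50, gamma50):
    """Eq. 3: normal tissue complication probability."""
    return 1.0 / (1.0 + (td50 / eud_value) ** (4.0 * gamma50))


# Hypothetical differential DVHs (fractional volume, dose in Gy).
target_volumes, target_doses = [0.1, 0.8, 0.1], [58.0, 60.0, 62.0]
oar_volumes, oar_doses = [0.5, 0.3, 0.2], [5.0, 15.0, 30.0]

# Placeholder parameters, for illustration only.
eud_target = eud(target_volumes, target_doses, a=-10.0)
eud_oar = eud(oar_volumes, oar_doses, a=1.0)

print(f"target EUD = {eud_target:.2f} Gy, "
      f"TCP = {tcp(eud_target, tcd50=50.0, gamma50=2.0):.3f}")
print(f"OAR EUD = {eud_oar:.2f} Gy, "
      f"NTCP = {ntcp(eud_oar, td50=24.5, gamma50=2.0):.3f}")
```

Evaluating these functions on the DVHs of the reference and tested plans gives one EUD, TCP and NTCP value per algorithm, which can then be compared patient by patient.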
By definition, D98% < EUD < D2%:

when a < 1, for target volumes (e.g., a = −10 for the lung), the model gives more weight to the low-dose region and the EUD approaches D98%;

when a = 1, for parallel organs that exhibit a large volume effect, such as the lung, the EUD becomes Dmean and the NTCP value thus depends on Dmean;

when a > 1, for serial organs such as the spinal cord, the model gives more weight to the high-dose region to penalize hot spots and the EUD becomes close to D2%.
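These limiting cases follow from eq. 1. As a brief sketch, assuming that the fractional volumes sum to one and that all (D_{i}) are positive, eq. 1 is a generalized (power) mean of the voxel doses, so that:
$$ a=1:\quad EUD=\sum \limits_i{v}_i{D}_i={D}_{mean} $$
$$ \lim_{a\to -\infty } EUD=\min_i{D}_i,\qquad \lim_{a\to +\infty } EUD=\max_i{D}_i $$
Because a power mean does not decrease as (a) increases, the EUD always lies between the minimum and the maximum dose of the structure, for which D98% and D2% serve as near-minimum and near-maximum surrogates.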
If the parameter (a) cannot be calibrated for the calculation of the EUD, a confidence interval around the calculated EUD can be estimated by calculating its lower and upper bounds, using a = 0.5–3.0 for parallel organs and a = 4.0–15.0 for serial organs [31].
To obtain a TCP or NTCP equal to 50%, which is the most sensitive part of the sigmoid dose-response curve, the TCD_{50} or TD_{50}, respectively, should be equal to the EUD derived from the DVH. To avoid the uncertainties associated with TCP and NTCP models run with obsolete radiobiological parameters, Chaikh et al. (2016) proposed using the EUD concept to validate new dose calculation algorithms from a radiobiological perspective. Consequently, the EUD resulting from a given treatment, taken as the reference, could be the gold standard for obtaining the desired TCP or NTCP values, since they depend on the EUD. In addition, it could be used as an objective for optimization [26].
Overall, if the new algorithm yields a lower EUD for the target, the target will be under-irradiated compared with the reference plan. This might produce unexpected recurrences. Since the expected local control is associated with the Dpr, the EUD of the target provides essential information about the real delivered dose, which should be very close to the Dpr. On the other hand, the EUD for an OAR should be much lower than TD_{50}, as a 50% rate of severe complications is usually not acceptable.
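A minimal sketch of this bounding procedure is given below. It relies on the fact that eq. 1 does not decrease when (a) increases, so evaluating the EUD at the two ends of the range gives the lower and upper bounds. The constant names and the DVH values are hypothetical; the ranges are those quoted above.

```python
def eud(volumes, doses, a):
    """Eq. 1 with fractional volumes normalized to sum to 1."""
    total = sum(volumes)
    return sum(v / total * d ** a for v, d in zip(volumes, doses)) ** (1.0 / a)


def eud_interval(volumes, doses, a_range):
    """Lower and upper EUD bounds for a range of the parameter a.

    The EUD is a power mean of the doses and therefore non-decreasing
    in a, so the bounds are reached at the ends of the range.
    """
    a_low, a_high = a_range
    return eud(volumes, doses, a_low), eud(volumes, doses, a_high)


PARALLEL_A_RANGE = (0.5, 3.0)    # range quoted in the text for parallel organs
SERIAL_A_RANGE = (4.0, 15.0)     # range quoted in the text for serial organs

# Hypothetical OAR differential DVH (fractional volume, dose in Gy).
oar_volumes, oar_doses = [0.5, 0.3, 0.2], [5.0, 15.0, 30.0]

parallel_low, parallel_high = eud_interval(oar_volumes, oar_doses, PARALLEL_A_RANGE)
serial_low, serial_high = eud_interval(oar_volumes, oar_doses, SERIAL_A_RANGE)
print(f"parallel organ: {parallel_low:.2f} to {parallel_high:.2f} Gy")
print(f"serial organ:   {serial_low:.2f} to {serial_high:.2f} Gy")
```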
Statistical methods
Since the same CT scan is used for each patient to generate the different treatment plans and the dose is simply recalculated with the new algorithm, the dosimetric data from the reference plan and from the tested plans are paired, excluding any anatomical variation. Thus, the Wilcoxon signed-rank test can be used, and it is able to provide a reliable p-value with a very small number of cases. In addition, the statistical correlation between the data can be evaluated using Spearman's correlation coefficient (ρ). More recently, Chaikh et al. (2016) proposed a bootstrap simulation method to estimate the minimal number of cases needed to observe a significant difference with p < 0.05. The method uses random samples of size (n), drawn iteratively with replacement from the original data set of (m) cases. For every n, the mean p-value across 1000 random samples is computed using the Wilcoxon signed-rank test. The mean p-values can then be plotted as a function of (n), up to (n = m), showing the variation of the p-value with (n) [32].
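The bootstrap procedure can be sketched with the Python standard library alone. The exact two-sided Wilcoxon signed-rank p-value below is a compact re-implementation written for this illustration (zero differences discarded, average ranks for ties), not the code used in the cited work. The function names and the dose values are hypothetical.

```python
import random


def wilcoxon_signed_rank_p(x, y):
    """Exact two-sided p-value of the Wilcoxon signed-rank test (paired data)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n          # ranks multiplied by 2, so tied ranks stay integers
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2              # 2 * mean of ranks i+1..j+1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total = sum(ranks2)
    # Null distribution of W+: each rank is positive with probability 1/2.
    counts = [1] + [0] * total
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    tail = min(sum(counts[: w_plus + 1]), sum(counts[w_plus:]))
    return min(1.0, 2.0 * tail / 2 ** n)


def p_value_curve(ref, new, n_min=3, n_boot=1000, seed=0):
    """Mean p-value over n_boot resamples (with replacement) for each size n."""
    rng = random.Random(seed)
    m = len(ref)
    curve = {}
    for n in range(n_min, m + 1):
        total = 0.0
        for _ in range(n_boot):
            idx = [rng.randrange(m) for _ in range(n)]
            total += wilcoxon_signed_rank_p([ref[i] for i in idx],
                                            [new[i] for i in idx])
        curve[n] = total / n_boot
    return curve


# Hypothetical PTV mean doses (Gy) for m = 8 patients.
ref_doses = [60.1, 59.8, 60.4, 60.0, 59.9, 60.3, 60.2, 59.7]
new_doses = [58.9, 58.7, 59.6, 58.8, 60.1, 59.0, 58.9, 58.5]

curve = p_value_curve(ref_doses, new_doses, n_boot=300)  # 1000 in the text
min_cases = next((n for n in sorted(curve) if curve[n] < 0.05), None)
for n in sorted(curve):
    print(f"n = {n}: mean p = {curve[n]:.3f}")
print("minimal number of cases with mean p < 0.05:", min_cases)
```

The example call uses 300 resamples only to keep the run short; the default of 1000 matches the number quoted in the text.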
Medical decision: suggested Dpr adjustments
The final objective is to propose an approach, already tested in our department for lung radiotherapy, to check whether or not the Dpr should be readjusted when the dose calculation algorithm is changed. If there is a statistically significant difference in dose calculation, with p < 0.05, the Dpr could be readjusted. The objective is mainly to keep the NTCP value unchanged. A significant difference means that, with 95% confidence, a difference exists between the former and newer algorithms. To support the medical decision, a quantitative evaluation could be carried out using dosimetric analysis, 2D or 3D global analysis based on the γ criteria, and radiobiological analysis based on the EUD concept. Fig. 3 shows a suggested principle for the medical decision concerning the modification of the Dpr when moving to a new dose calculation algorithm.