To propose adding index of achievement (IOA) to IMRT QA process

Background: In intensity modulated radiation therapy (IMRT) quality assurance (QA), evaluating a QA result with a pass/fail strategy under an acceptance criterion provides little information on how good the plan is in an absolute sense. In this study, we suggest adding an index system, previously developed for the dose painting technique, to the current IMRT QA process for a better understanding of QA results.
Methods: The index system consists of three indices: index of achievement (IOA), index of hotness (IOH) and index of coldness (IOC). As its name indicates, IOA measures the level of agreement, whereas IOH and IOC measure the magnitude of overdose and underdose, respectively. A systematic analysis was performed with three 1-dimensional hypothetical dose distributions to investigate the characteristics of the index system. The feasibility of the system was also assessed with clinical volumetric modulated arc therapy (VMAT) QA cases from 8 head & neck and 5 prostate patients. In both simulation studies, known amounts of error were intentionally introduced into each dose distribution. Furthermore, we applied the proposed system to compare calculated with measured data for a total of 60 patients (30 head & neck and 30 prostate cases). QA analysis was performed using both the index system and the gamma method, and the results were compared.
Results: While the gamma evaluation showed limited sensitivity in evaluating QA results, depending on the tolerance criteria used, the proposed indices tended to distinguish plans better in terms of the amount of error. Hotness and coldness relative to the prescribed dose in the plan could be evaluated quantitatively by the indices.
Conclusions: The proposed index system provides information with which IMRT QA results can be better evaluated, especially when gamma pass rates are identical or similar among multiple plans. In addition, the independence of the index system from acceptance criteria would help clear communication among readers of published articles and researchers in multi-institutional studies.


Background
Patient specific quality assurance (QA) in intensity modulated radiation therapy (IMRT) is important to verify the accuracy of dose calculation and delivery. IMRT QA is commonly accomplished by comparing a calculated dose distribution with an actually measured dose distribution [1,2].
Many reports have been published on quantifiable indices for IMRT QA evaluation and on their performance in various situations [2-10]. Presently, a well-accepted approach is to count how many measurement points fall within a preset criterion [3,4,11]. Most criteria are based on dose difference (DD), distance-to-agreement (DTA), or both. The gamma index is similar in principle, but it uses a criterion that combines dose difference and DTA into a single parameter [12]. The gamma method has clear advantages. Dealing with one quantity (i.e., the gamma index) is more straightforward than dealing with two. In addition, judging whether an IMRT QA result is acceptable from the number of points that pass under the given criteria is simple and convenient; thus, the gamma index method has been adopted in many clinics.
In the pass rate-based approach, however, it is difficult to estimate the absolute matching quality between measurement and calculation for each plan, because the pass rate can vary significantly depending on how the acceptance criterion is chosen [13-15]. This can be problematic when a reader or reviewer tries to understand the matching quality of plans reported in publications or submitted for review in multi-institutional clinical trials. For instance, if institution A requires a pass rate over 90% under a DD/DTA criterion of 2%/2 mm and institution B requires a pass rate over 95% under a criterion of 3%/3 mm, it is not easy to judge which institution achieves better QA results overall. Therefore, it would be beneficial to have an additional system that supplements the current method by reporting the matching quality of IMRT plans independently of the preset criteria.
Recently, the index of achievement (IOA) system, a plan quality evaluation approach under the dose-painting paradigm, was introduced; it takes point-by-point relative dose differences into account to provide simple indices [16]. The concept of IOA rests on the first principle of direct dose difference. It is therefore simple but effective in the dose-painting strategy, for which the typical homogeneity index, a popular measure in conventional therapy, does not work at all.
Although the IOA method was developed for plan evaluation in the planning stage, we believe it can be used in the QA stage as well. The only difference is that, in the QA stage, the comparison is made between measured dose and planned dose instead of between planned dose and prescribed dose. Therefore, in this study, we applied the IOA method to QA evaluation in IMRT and investigated its feasibility for compensating for the limitations of the current method. The IOA approach includes two more indices, index of hotness (IOH) and index of coldness (IOC), which are also used here to measure the overall levels of overdose and underdose. For both hypothetical dose profiles and actual IMRT planning dose distributions, the characteristics of the resulting index values were analyzed and compared with those of gamma evaluation.

Formula
Three indices in this approach (IOA, IOH and IOC) are expressed by Eqs. (1), (2) and (3), respectively, where $D_{ref,i}$ and $D_{eva,i}$ are the reference and evaluation doses of the ith voxel, $N$ is the total number of voxels, $B_i$ is the binary factor of the ith voxel, which allows each voxel to be included in or excluded from the calculation of an index, and $D_{ref,max}$ is the maximum of the reference doses. Regarding $B_i$, we used three sub-binary factors, $b_{i,TH}$, $b_{i,H}$ and $b_{i,C}$, which take into account the dose threshold, hotness and coldness, respectively. $B_i$ can be a product of several $b_i$ values, each of which corresponds to a specific condition.
With these factors, Eqs. (1), (2) and (3) can be rewritten in terms of the threshold, where $D_{ref,TH}$ is the threshold dose, defined as 'TH'% of the maximum reference dose, and $N_{TH}$ is the number of voxels having a dose not smaller than $D_{ref,TH}$. Note that each index carries the subscript 'TH' to indicate that voxels with less than 'TH'% of the maximum dose in the reference plan are excluded from the index calculation. In addition, as defined, IOH is obtained in the region where evaluation doses are higher than reference doses, and IOC in the region where they are lower. Each index is a single value and becomes '1' in the ideal case, that is, when the evaluation dose distribution perfectly matches the reference. Note that for non-ideal cases, the value of IOC is always smaller than '1' while those of IOA and IOH are larger than '1'.
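The roles of the threshold, hotness and coldness factors can be illustrated with a short Python sketch. The functional forms used here are an assumption and are not taken from Eqs. (1) to (3): IOA is assumed to be one plus the mean absolute relative dose difference, IOH one plus the mean relative excess over hot voxels, and IOC one minus the mean relative deficit over cold voxels. These forms were chosen only because they reproduce the properties stated in the text (all indices equal 1 for a perfect match, IOA and IOH above 1, IOC below 1). The function name `dose_indices` and its parameters are hypothetical.

```python
def dose_indices(ref, eva, threshold_pct=10.0, normalization="global"):
    """Return (IOA, IOH, IOC) for two dose lists sampled on the same grid.

    ASSUMPTION: the index forms below are illustrative stand-ins, not the
    paper's Eqs. (1)-(3); they only reproduce the properties stated in the text.
    """
    if len(ref) != len(eva):
        raise ValueError("ref and eva must be sampled on the same grid")
    if normalization not in ("global", "local"):
        raise ValueError("normalization must be 'global' or 'local'")

    ref_max = max(ref)
    d_th = ref_max * threshold_pct / 100.0  # threshold dose D_ref,TH

    abs_dev, hot_dev, cold_dev = [], [], []
    for d_ref, d_eva in zip(ref, eva):
        # Global: normalize to the maximum reference dose; local: to this voxel.
        norm = ref_max if normalization == "global" else d_ref
        if d_ref < d_th or norm <= 0:
            continue  # b_TH = 0: voxel excluded by the dose threshold
        rel = (d_eva - d_ref) / norm
        abs_dev.append(abs(rel))
        if rel > 0:
            hot_dev.append(rel)   # b_H = 1: evaluation dose above reference
        elif rel < 0:
            cold_dev.append(rel)  # b_C = 1: evaluation dose below reference

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    ioa = 1.0 + mean(abs_dev)
    ioh = 1.0 + mean(hot_dev)
    ioc = 1.0 + mean(cold_dev)  # cold deviations are negative, so IOC <= 1
    return ioa, ioh, ioc


if __name__ == "__main__":
    reference = [100.0, 100.0, 50.0, 5.0]   # last voxel is below the 10% threshold
    evaluated = [103.0, 97.0, 50.0, 9.0]
    print(dose_indices(reference, evaluated))
```

Under these assumed forms, flipping the sign of every dose error leaves IOA unchanged while IOH and IOC swap roles, which matches the symmetric and asymmetric behaviour the indices are described as having.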
The index system performs a point-by-point calculation at identical grid points (i.e., the ith voxel) of the reference and evaluation distributions. Intuitively, the total number of voxels to be evaluated (i.e., the whole domain of i) is best determined by whichever of the two distributions has the smaller number of available points. However, when the locations of available points do not exactly match between the reference and evaluation distributions, values can be interpolated onto a common set of grid points and the indices calculated on those points, which is the approach used in this study.
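Resampling two distributions onto a common grid can be sketched as follows. This is a minimal 1-D illustration that assumes linear interpolation; the text does not state which interpolation method was used, and the helper names `common_grid` and `resample_linear` are hypothetical.

```python
from bisect import bisect_right


def common_grid(start, stop, step=1.0):
    """Grid points from start to stop (inclusive) at a fixed spacing."""
    n_steps = int(round((stop - start) / step))
    return [start + k * step for k in range(n_steps + 1)]


def resample_linear(positions, doses, grid):
    """Linearly interpolate doses sampled at ascending positions onto grid."""
    resampled = []
    for x in grid:
        if x < positions[0] or x > positions[-1]:
            raise ValueError("grid point lies outside the sampled range")
        j = bisect_right(positions, x)  # first index with positions[j] > x
        if j == len(positions):         # x coincides with the last sample
            resampled.append(doses[-1])
            continue
        x0, x1 = positions[j - 1], positions[j]
        weight = (x - x0) / (x1 - x0)
        resampled.append(doses[j - 1] + weight * (doses[j] - doses[j - 1]))
    return resampled


if __name__ == "__main__":
    # Samples at 2.5 mm spacing resampled onto a 1 mm grid.
    grid = common_grid(0.0, 5.0, 1.0)
    print(resample_linear([0.0, 2.5, 5.0], [0.0, 50.0, 100.0], grid))
```

Once the reference and evaluation distributions share the same grid, the indices can be computed voxel by voxel without any spatial search.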

Systematic study with 1-D hypothetical dose distribution
A systematic analysis was performed on three one-dimensional (1-D) normalized dose distributions using MATLAB (The MathWorks, Inc., Natick, MA), as shown in Fig. 1 (a, b and c). As can be seen, (a) and (b) have the same flat-region size but different penumbrae, while (a) and (c) have the same penumbra but different flat-region sizes. For convenience, we refer to dose distributions (a), (b) and (c) as Model A, Model B and Model C, respectively. For the systematic analysis, simulated evaluation distributions were generated from the reference distributions by modifying them in both magnitude and location. For the magnitude change, 0 to 3% of the maximum reference dose was added at 1% intervals. For the location change, lateral displacements of 0 to 3 mm were applied at 1 mm intervals. The resolution used for Models A, B and C was 1 mm. Figure 1 (d) illustrates a simulated evaluation dose distribution (dotted plot) together with the reference of Model B (solid plot); the simulated distribution contains errors of +3% of the maximum reference dose in magnitude and +3 mm in location. For each simulated case, IOA was calculated and compared with gamma evaluation results under 1%/1 mm, 2%/2 mm, 3%/3 mm and 4%/4 mm DD/DTA acceptance criteria for both global and local normalization.
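The simulation procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the MATLAB code used in the study: the trapezoidal profile shape, the function names, and the brute-force global gamma search on the 1 mm grid (with no sub-grid interpolation) are all hypothetical choices.

```python
def trapezoid_profile(n_points, flat_half_width, penumbra_width, center):
    """Normalized 1-D dose on a 1 mm grid: flat top with linear penumbrae."""
    profile = []
    for x in range(n_points):
        distance = abs(x - center)
        if distance <= flat_half_width:
            profile.append(1.0)
        elif distance < flat_half_width + penumbra_width:
            profile.append(1.0 - (distance - flat_half_width) / penumbra_width)
        else:
            profile.append(0.0)
    return profile


def apply_errors(ref, dose_error_pct, shift_mm):
    """Shift by whole 1 mm grid steps, then add a percentage of the maximum dose."""
    offset = max(ref) * dose_error_pct / 100.0
    last = len(ref) - 1
    # Edge values are repeated where the shift would read outside the profile.
    shifted = [ref[min(max(i - shift_mm, 0), last)] for i in range(len(ref))]
    return [dose + offset for dose in shifted]


def gamma_pass_rate(ref, eva, dd_pct, dta_mm, threshold_pct=10.0, spacing_mm=1.0):
    """Global gamma pass rate in percent, searching every evaluation point."""
    ref_max = max(ref)
    dd = ref_max * dd_pct / 100.0
    passed = total = 0
    for i, d_ref in enumerate(ref):
        if d_ref < ref_max * threshold_pct / 100.0:
            continue  # below the low-dose threshold
        total += 1
        gamma_sq = min(
            ((j - i) * spacing_mm / dta_mm) ** 2 + ((d_eva - d_ref) / dd) ** 2
            for j, d_eva in enumerate(eva)
        )
        if gamma_sq <= 1.0 + 1e-9:  # small tolerance for floating-point round-off
            passed += 1
    return 100.0 * passed / total


if __name__ == "__main__":
    reference = trapezoid_profile(101, flat_half_width=20, penumbra_width=10, center=50)
    evaluated = apply_errors(reference, dose_error_pct=2, shift_mm=2)
    for criterion in (1, 2, 3, 4):
        rate = gamma_pass_rate(reference, evaluated, criterion, criterion)
        print(f"{criterion}%/{criterion} mm: {rate:.1f}%")
```

In this sketch, `shift_mm` must be an integer because displacements are applied as whole grid steps, mirroring the 1 mm resolution of the models.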

2-D systematic study using clinical cases
In order to investigate the feasibility of the proposed indices, a total of 13 clinical VMAT QA cases (8 head and neck and 5 prostate) were used, with known errors intentionally introduced into each dose distribution. For each simulation, dose distributions were interpolated to a 1 mm grid so that intentional spatial errors could easily be applied at 1 mm intervals. Under both global and local normalization, all 3 indices (i.e., IOA, IOH and IOC) were obtained and compared with gamma evaluation results. For gamma evaluation, 4 different DD/DTA criteria (1%/1 mm, 2%/2 mm, 3%/3 mm and 4%/4 mm) were considered (i.e., a total of 4992 gamma evaluations). As commonly adopted, a threshold of 10% of the maximum dose was applied in this study. Figure 2 shows examples of normalized dose difference for one of the head and neck cases when intentional spatial displacements from 1 to 3 mm are applied along the (a-c) horizontal or (d-f) longitudinal direction. All values were normalized to the maximum dose difference within the same group [i.e., the (a-c) horizontal group or the (d-f) longitudinal group].
Table 3 The 1-D systematic study results of the gamma method and the IOA for the local normalization
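The total of 4992 gamma evaluations can be checked with a short calculation. The breakdown below is an inference rather than something stated in this form in the text: it assumes 13 cases, 48 error situations per case (dose errors from −3 to +3% and displacements from −3 to +3 mm in unit steps, excluding the error-free combination), 4 criteria and 2 normalizations. The function name is hypothetical.

```python
from itertools import product


def count_gamma_evaluations(n_cases=13, n_criteria=4, n_normalizations=2):
    """Count gamma evaluations under the assumed breakdown described above."""
    dose_errors_pct = range(-3, 4)  # -3% ... +3% in 1% steps
    shifts_mm = range(-3, 4)        # -3 mm ... +3 mm in 1 mm steps
    situations = [
        (dose, shift)
        for dose, shift in product(dose_errors_pct, shifts_mm)
        if (dose, shift) != (0, 0)  # the error-free combination is not an error case
    ]
    return len(situations), n_cases * len(situations) * n_criteria * n_normalizations


if __name__ == "__main__":
    per_case, total = count_gamma_evaluations()
    print(per_case, total)
```

With these assumptions the count is 13 × 48 × 4 × 2, which agrees with the reported total.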
Application for comparing calculated with measured data for clinical cases
The proposed method was applied to a total of 60 cases (30 from head & neck and another 30 from prostate patients) under IRB approval (Seoul National Bundang Hospital Protocol ID #B-1711-432-108). The calculated and measured data were based on VMAT QA cases using the PDIP and electronic portal imaging device (EPID) dosimetry. Acquisition conditions for calculated and measured data are shown in Table 1. For each case, the calculated and measured dose distributions were interpolated to a 1 mm grid for consistent evaluation regardless of acquisition conditions. IOA values under both global and local normalization were obtained and compared with gamma evaluation results. For gamma evaluation, 4 different DD/DTA criteria (1%/1 mm, 2%/2 mm, 3%/3 mm and 4%/4 mm) were considered. Also, a threshold of 10% of the maximum dose was applied in this study.

Results
Systematic study with 1-D hypothetical dose distribution
Table 2 shows the results of the 1-D systematic study (i.e., IOA values and gamma pass rates under 1%/1 mm, 2%/2 mm, 3%/3 mm and 4%/4 mm acceptance criteria for global normalization). For convenience, each case was ranked by IOA (i.e., in order of achievement) within the group it belongs to, starting with '1' for the best case. The IOA values differed across most cases and increased gradually with the amount of error, demonstrating strong distinguishability of QA results. In contrast, every simulated case with up to 2%/2 mm intended error showed a 100% gamma pass rate under the 3%/3 mm criterion for all 3 dose distributions. Simulations of 3%/0 mm and 0%/3 mm also showed a 100% gamma pass rate.
With local normalization, as shown in Table 3, most cases showed a 100% pass rate under the same condition, except for cases with 2% or more dose error in models A and C. The gamma method therefore cannot distinguish these simulated cases from one another in terms of quality in such situations. For the other cases, the pass rate varied from about 99% to 0%, showing a certain ability to discriminate when the amount of error is relatively large.
When the same spatial displacement is applied, larger errors are expected with model B than with model A because of the steeper dose gradients in the penumbra regions. While the gamma method does not show such a difference, the IOA values demonstrate it clearly (e.g., 1.037 for model A vs. 1.081 for model B with intended 0% and 3 mm error; see Tables 2 and 3). Because of this ability to differentiate QA results, the IOA method made it possible to place all the cases in order of overall uncertainty in each model (see the ranks from 1 to 16 indicated in Tables 2 and 3). Also note that these ranks are totally independent of the gamma acceptance criterion.
Tables 4 and 5 show the calculation results of the proposed indices (i.e., IOA, IOH and IOC values) and gamma evaluation (i.e., pass rate) under 1%/1 mm, 2%/2 mm, 3%/3 mm and 4%/4 mm criteria for one of the head & neck cases, using global and local normalization, respectively. In each example, dose errors ranged from −3 to +3% (of the maximum in the reference) and spatial displacements ranged from −3 mm to +3 mm in the lateral direction, resulting in a total of 48 erroneous situations. As can be seen, the values of IOA, IOH and IOC showed noticeable and reasonable variations from case to case, implying that the proposed index system was capable of differentiating QA results. It is worth noting that the IOA values are symmetric between positive and negative dose errors of the same magnitude (e.g., +3% vs. −3% intended error), as expected from the definition of IOA. However, both the IOH and IOC values varied asymmetrically and provided additional information for deciding whether the measured dose was hot or cold. In Table 4, the smallest IOC was 0.936 (with −3%/−3 mm intended error) and the largest IOH was 1.054 (with +3%/+3 mm intended error). The IOA values at those two largest intended error situations were 1.06 and 1.059, respectively.
The gamma pass rate became significantly low with large errors (i.e., when at least 3% dose error or 3 mm displacement error was involved) and reached a minimum of 66.4% under the 3%/3 mm criterion in the case of −3%/+3 mm intended error. However, it stayed at 100% in 28 out of 48 situations, indicating that its capability of differentiating QA results depended significantly on the acceptance criteria in many situations. Figures 3 and 4 show the scatter plots of the IOA values vs. the gamma pass rates under 4 different DD/DTA acceptance criteria (i.e., 1%/1 mm, 2%/2 mm, 3%/3 mm and 4%/4 mm), using global and local normalization, respectively. In every case, the gamma pass rates tended to decrease as the IOA values increased. Although this correlation seemed stronger under tighter gamma evaluation criteria in general, the highest correlation was obtained under the DD/DTA criterion of 2%/2 mm based on the regression analysis (R-square, p < 0.01).
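A regression of gamma pass rate against IOA, as summarized by R-square above, can be sketched with ordinary least squares. The function name `linear_fit_r2` is hypothetical, the values in the usage lines are made-up placeholders rather than data from this study, and the p-value reported in the text is not computed here.

```python
def linear_fit_r2(x, y):
    """Return (slope, intercept, R-square) of a least-squares line y = a*x + b."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each take more than one distinct value")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, R-square equals the squared Pearson correlation.
    r_square = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_square


if __name__ == "__main__":
    # Placeholder values for illustration only (not results from the study).
    ioa_values = [1.005, 1.010, 1.020, 1.030, 1.045]
    pass_rates = [100.0, 99.0, 95.5, 90.0, 80.0]
    print(linear_fit_r2(ioa_values, pass_rates))
```

A negative slope corresponds to the trend described in the text, with pass rates falling as IOA rises.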

2-D systematic study using clinical cases
Application to compare calculated with measured data for clinical cases
Figure 5 shows the scatter plots of the IOA values vs. the gamma pass rates under 4 different DD/DTA acceptance criteria (i.e., 1%/1 mm, 2%/2 mm, 3%/3 mm and 4%/4 mm). Under every gamma criterion, the gamma pass rates tended to decrease as the IOA values increased. However, the correlation between IOA and gamma pass rate under the 3%/3 mm and 4%/4 mm criteria seemed relatively weak, which is understandable given the limited sensitivity of the gamma method under looser criteria. In Fig. 5 (a and c), the IOA values were smaller than 1.03 in all of the cases used in this study. With such results, it is not unreasonable to estimate that the overall global dose uncertainty was less than 3% in every case.

Discussion
The proposed single index method is simple and intuitively easy to implement as an additional tool in IMRT QA for evaluating differences between planned and measured dose distributions. The index system is fully based on point-by-point comparison and deals with dose difference directly. The quantity that is directly relevant to clinical outcome is dose; spatial uncertainty by itself (e.g., DTA) therefore cannot provide the necessary information directly and needs to be converted to dose uncertainty to be more meaningful. Any approach that uses spatial information directly without converting it to dose information, including the gamma method, is subject to this limitation. The proposed IOA, IOH, and IOC are not intended to replace the existing gamma evaluation methods. However, it would be useful to estimate a range of index values that is reasonably acceptable in common practice. With global normalization, it was found in Fig. 5 (a and c) that most cases (i.e., 57 out of 60) with an IOA value of less than 1.025 showed a pass rate of 90% or higher under the 2%/2 mm global gamma test. Based on this observation, the value of 1.025 could be a good reference. Note that, in principle, 1.025 implies that the overall dose difference of a plan is about 2.5%. Considering the definition (i.e., index of achievement), it would make more sense to use only local normalization in the IOA calculation. However, global normalization is common in IMRT QA, so we included it as well. Therefore, using the IOA, it is possible to express dose difference either absolutely, relative to a reference value, or relatively, point by point. For qualitative assessment of IMRT QA, the pass rate-based gamma evaluation method has been widely adopted as an essential technique in clinical practice, and its application has been expanded from simple 2-D to 3-D and even to 4-D [17-24].
Recently, however, several publications have reported limitations of the gamma method [10,25-32]. As illustrated in Tables 2 and 3, for example, its ability to finely differentiate plans in terms of quality depends on how the acceptance criterion is chosen. Comparing the results for two model dose distributions, A with a less steep penumbra and B with a steeper penumbra, the index system made it possible to tell them apart, because the IOA values differed by a relatively large amount in B; this was not easy with the gamma method, because the gamma pass rates were the same in many situations. Similar behavior can be observed in Figs. 3 and 4. The scatter plots do not follow a continuous trend from IOA = 1.0. Instead, the points initially stay at a 100% pass rate up to a certain IOA value specific to the given DD/DTA tolerance criterion, then suddenly drop to lower pass rates and follow a continuous trend from there. Such non-continuous regions are the range in which the gamma method is insensitive for finely discerning QA results. Regardless of which DD/DTA criterion is used, the index system provides the same values of IOA, IOH and IOC. In addition, those values are proportional to the amount of error. In Figs. 3 and 4, the IOA not only showed relatively robust correlations with the gamma pass rates in a certain range but also illustrated that it could complement the part that the pass rate of the gamma analysis cannot explain. Therefore, we believe the proposed index system can add value to the current gamma method by providing information that is often lost due to the acceptance criteria approach. Figure 6 shows the average rank maps of QA results based on the (a) gamma pass rate under the criterion of 3%/3 mm, (b) IOA, (c) IOH and (d) IOC from all of the head and neck cases.
As described above, the gamma method gave a 100% pass rate in 28 out of 48 situations and was therefore insensitive in evaluating QA results (these 28 cases were all ranked '1' and could not be distinguished). The index values, however, were more sensitive and allowed ranks to be assigned in finer steps. Figure 7 shows ranking profiles measured along the '0 mm' displacement line in Fig. 6 (a-d). As expected, the IOH and IOC values provided rank maps that properly reflect overdose (i.e., hotness) and underdose (i.e., coldness), respectively. In Fig. 6, it is worth noting that the rank values vary more abruptly along the y-axis (i.e., the axis of intended spatial error) than along the x-axis (i.e., the axis of intended dose error). This trend indicates that spatial displacement has more impact on the QA result than dose perturbation in the case studied. In general, however, such a difference in importance is often not fairly taken into account in the gamma method. When a DD/DTA criterion of 2%/2 mm is chosen, for example, a 1% dose error and a 1 mm spatial error contribute equally to the gamma value by the definition of gamma, regardless of their true importance. We believe this is the most serious limitation of the gamma method, and the index system proposed in this study is able to compensate for it to a certain extent.
Fig. 6 Example of rank maps for all of the head and neck cases: (a) 3%/3 mm gamma evaluation, (b) IOA, (c) IOH, and (d) IOC for the global normalization. Each map consists of the average rank of QA results for each amount of dose difference from −3 to +3% in 1% intervals along the lateral axis and/or spatial displacement in 1 mm intervals along the longitudinal axis. Note that no change of level among results indicates insensitivity, meaning that it cannot be determined which plan is better.
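The effect of tied pass rates on ranking can be illustrated with a small sketch. Dense ranking (tied values share a rank and the next distinct value takes the next integer) is an assumption, since the text does not state how ties were handled beyond ranking all 100% cases as '1'. The function name and the values in the usage lines are hypothetical placeholders, not results from the study.

```python
def dense_rank(scores, higher_is_better):
    """Rank scores with 1 as the best; tied scores share the same rank."""
    ordered = sorted(set(scores), reverse=higher_is_better)
    rank_of = {score: position + 1 for position, score in enumerate(ordered)}
    return [rank_of[score] for score in scores]


if __name__ == "__main__":
    # Placeholder values for four simulated error situations.
    gamma_pass_rates = [100.0, 100.0, 100.0, 98.2]  # higher is better
    ioa_values = [1.004, 1.011, 1.019, 1.032]       # closer to 1 (lower) is better
    print(dense_rank(gamma_pass_rates, higher_is_better=True))
    print(dense_rank(ioa_values, higher_is_better=False))
```

With these placeholder values, the first three situations share rank 1 under the gamma pass rate but receive distinct ranks under IOA, which is the kind of finer ordering the rank maps are meant to show.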
Recently, Steers et al. reported that the optimal acceptance criterion in arbitrary situations is closely related to the dose threshold selected in a gamma analysis [33]. Thus, it would be useful to systematically investigate the characteristics of the proposed indices according to the level of dose threshold, in addition to other variables such as acceptance criteria, dose distribution grid size and interpolation method.
A collapsed dose matrix is obtained in the case of Portal Dosimetry-based QA for VMAT. Such a collapsed dose matrix cannot mimic actual dose delivery and cannot be considered 'real'. However, this is a limitation of the current Portal Dosimetry-based QA method in terms of 'what to evaluate', not 'how to evaluate'. The proposed method concerns 'how to evaluate' rather than 'what to evaluate' and is independent of which QA technique is used. From the viewpoint of index calculation, for instance, there is no difference between a realistic static-beam dose matrix and a collapsed dose matrix.

Conclusions
We have proposed adding an index system to the current IMRT QA process for a better understanding of IMRT QA results and performed a systematic simulation study to evaluate the feasibility of the proposed method. The simulation study, which included both hypothetical 1-D and clinical 2-D dose distributions, demonstrated that the method provides indices that are independent of acceptance criteria and that allow the matching quality of each plan with measurement to be evaluated.
Based on these findings, the independence of the method from acceptance criteria will also help clear communication among readers of published articles and researchers in multi-institutional studies. We believe this method can compensate for some of the limitations of the gamma-based QA method by providing valuable information that is often lost in the current approach.