Evaluation of a highly refined prediction model in knowledge-based volumetric modulated arc therapy planning for cervical cancer

Background and purpose: To explore whether a highly refined dose-volume histogram (DVH) prediction model can improve the accuracy and reliability of knowledge-based volumetric modulated arc therapy (VMAT) planning for cervical cancer. Methods and materials: The proposed model was refined repeatedly through progressive training until the training set grew from an initial 25 prior plans to 100 cases. The estimated DVHs derived from the prediction models of different training runs were compared in 35 new cervical cancer patients to analyze the effect of this interactive plan and model evolution method. The reliability and efficiency of knowledge-based planning (KBP) using this highly refined model in improving the consistency and quality of the VMAT plans were also evaluated. Results: The prediction ability, in terms of normal tissue sparing, was reinforced as the number of refinements increased. With enhanced prediction accuracy, more than 60% of the automatic plans generated with the final refined model (AP-6; 22/35) could be directly approved for clinical treatment without any manual revision. The plan quality scores for clinically approved plans (CPs) and manual plans (MPs) were on average 89.02 ± 4.83 and 86.48 ± 3.92, respectively (p < 0.001). Knowledge-based planning significantly reduced the Dmean and V18 Gy for the kidneys (L/R) and the Dmean, V30 Gy, and V40 Gy for the bladder, rectum, and femoral heads (L/R). Conclusion: The proposed model evolution method provides a practical way for KBP to enhance its prediction ability with minimal human intervention. This highly refined prediction model can better guide KBP in improving the consistency and quality of VMAT plans.


Introduction
Volumetric modulated arc therapy (VMAT) followed by intracavitary brachytherapy has become one of the major treatment modalities for cervical cancer [1][2][3][4]. However, developing an appropriate VMAT plan remains a real challenge, since inverse VMAT planning is in essence still a trial-and-error procedure. The planner has to manually set the starting optimization objectives for the tumor target as well as for each organ of interest, taking into account the patient anatomy, the linac performance, the prescription doses and the organ dose tolerance limits. This makes VMAT planning operator- and experience-dependent, as objectives that are too "easy" may lead to a suboptimal plan while objectives that are too hard may cause suboptimal trade-offs. Several authors have reported head & neck and prostate cases in which the organs at risk (OARs) were over-irradiated due to suboptimal treatment plans [5][6][7]. To address this issue, knowledge-based planning (KBP) has attracted growing interest. KBP utilizes prior knowledge to predict what kind of dose distribution is achievable and hence automatically generates patient-specific optimization objectives for each OAR according to the estimated dose-volume histograms (DVHs). Various KBP methods have been developed [8][9][10][11][12][13][14]; among them, RapidPlan™ is the first commercial software that has been put into clinical use. Previously published works have demonstrated its usefulness in improving plan quality and planning efficiency for tumors of the head & neck, prostate and rectum [15][16][17][18][19].
One of the major concerns about the use of KBP is the quality of the plan database, which may determine the degree of accuracy that a prediction model can offer. It has already been shown that current estimates can only fulfil "clinically acceptable" criteria rather than "optimal" or "near optimal" standards, because the database plans may not all possess optimal dose distributions [5,9]. Some researchers have had each prior plan re-optimized by a group of experts to guarantee high quality [20][21][22]. This is tremendously labor intensive and time consuming, especially when a large number of training samples are used. To improve the predictive accuracy more efficiently, Appenzoller et al. introduced a refinement method that takes the estimated DVHs as a reference to exclude suboptimal plans from the training cohort and repeats the modeling process on the remaining training dataset [11]. Wang et al. demonstrated that both the prediction model and its constituent plans could be significantly improved after two runs of closed-loop refinement [22]. More recently, a refined model has been applied in an ongoing multi-institutional clinical trial as a quality assurance tool, highlighting its great potential for accurate DVH prediction [21]. Nevertheless, since it has been suggested that the quality of the database can be improved over time by using the KBP method [5,9,13], the prediction model should arguably also be retrained on a regular basis to maintain its predictive accuracy. We therefore set out to develop a progressive training strategy aimed at creating a highly refined KBP model with minimal human intervention.
Another issue often encountered in model generation is how many patient plans are needed to build a particular prediction model. The manufacturer recommends a minimum of 20 plans for RapidPlan model creation, but emphasizes that adding further plans usually helps create a more robust model [23]. A recent study concluded that the minimum sample size needed to accurately train KBP models for prostate cancer depends on the specific model and endpoint to be predicted, and recommended a sample size greater than 75 [24]. Hence it is of primary importance to determine an appropriate number of training samples when establishing the prediction model for cervical cancer, in order to maintain its accuracy and robustness.
In this study, we present our experience in the application of KBP in VMAT treatment of cervical cancer with special attention to the above issues. A highly refined DVH prediction model was built for VMAT treatment of cervical cancer, which underwent a total of six runs of refining through progressive training until the training set size increased up to 100 cases. The proposed model evolution method was assessed in 35 new cervical cancer cases. The reliability and efficiency of KBP using this highly refined model in improving the consistency and quality of the VMAT plans were analyzed.

Database
A total of 100 patients with stage IA-IVB cervical cancer treated by pelvic VMAT were retrospectively reviewed. All patients were immobilized in the supine position. The CT images were acquired on a CT simulator with a 3-mm slice thickness and 3-mm spacing. The gross target volume (GTV) included all grossly enlarged lymph nodes with a short diameter of ≥ 1 cm and regional metastatic lymph nodes identified on imaging or determined by PET/CT. The clinical target volume (CTV) included the cervix, whole uterus, parametrium, upper part of the vagina, and pelvic lymphatic drainage area (common, internal, and external iliac; obturator; and presacral). Inguinal lymph nodes were included if involvement of the lower one-third of the vagina was observed. In patients with common iliac metastatic lymph nodes, para-aortic irradiation was administered. The planning gross target volume (PGTVnd) was generated by adding a 5-mm margin to the GTV, and the planning clinical target volume (PCTV) was generated by adding a 6-mm margin to the CTV in all directions except the anterior direction, where a 10-mm margin was used. Dual-arc VMAT plans were designed using the Varian Eclipse treatment planning system (Varian Medical Systems, Palo Alto, CA), with two coplanar full arcs in which the gantry rotated counterclockwise from 179° to 181° and clockwise from 181° to 179°. The prescription dose was 60 Gy in 25 fractions to the PGTVnd and 45 Gy in 25 fractions to the PCTV. The planning goals for the tumor targets and the dose constraints for the OARs are detailed in Table 1. Recent follow-up indicated that all patients had favorable prognoses, with neither severe late toxicity nor treatment failure (local recurrence/distant metastasis).

Model building and evolution
The prediction model was generated automatically for the pelvic organs of interest, specifically for each OAR, based on the parameterization of the structure sets and dose matrices of the prior plans in the training set. The built-in proprietary algorithm of RapidPlan™ (version 13.5, Varian Medical Systems, Palo Alto, CA) is largely inspired by the methodology described by Yuan et al. [25].
In this study, an in-house model evolution strategy was developed to enhance the prediction ability of the model by progressively upgrading the database with new higher-quality plans and re-training the model. The model was initially built using 25 clinically approved VMAT plans for cervical cancer (model C0), which was the minimum number of treatment plans suggested by the product specialist. A closed-loop refinement process was then conducted, in which relatively suboptimal plans were identified by comparing the estimated DVHs with the planned DVHs. Unlike previous studies [5,9], these suboptimal plans were not excluded from the database; instead, they were re-optimized under the guidance of the estimated DVHs to further spare the OARs and then returned to the training dataset. This resulted in a refined model C1, which was preliminarily applied in the clinic (1) to guide the planning/re-planning process toward better OAR sparing and (2) to serve as a self-checking tool for assessing plan quality. In this way, VMAT plans with quality superior to the prediction were identified and added to the database, and the model was re-trained on a monthly basis. Within 6 months, the model underwent five further runs of refinement, generating models C2 to C6, with the training set size increased to 100 cases. The model evolution process is illustrated in Fig. 1.
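The selection step of one refinement run can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the RapidPlan workflow itself: the `Plan` class, the `predict` callable, and the use of a single mean dose per OAR in place of a full DVH comparison are hypothetical simplifications, and the model training performed inside RapidPlan is not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Plan:
    """Hypothetical plan record: achieved mean dose (Gy) per OAR."""
    patient_id: str
    oar_mean_dose: Dict[str, float]


# Hypothetical stand-in for a trained model: returns estimated mean dose per OAR.
Predictor = Callable[[Plan], Dict[str, float]]


def is_superior_to_prediction(plan: Plan, estimate: Dict[str, float]) -> bool:
    """Assumed criterion: every OAR dose is at or below its estimate."""
    return all(plan.oar_mean_dose[oar] <= dose for oar, dose in estimate.items())


def refinement_run(training_set: List[Plan], new_plans: List[Plan],
                   predict: Predictor) -> List[Plan]:
    """Return the training set extended with the plans that beat the prediction."""
    accepted = [p for p in new_plans if is_superior_to_prediction(p, predict(p))]
    return training_set + accepted


if __name__ == "__main__":
    # Toy predictor that returns the same estimate for every patient.
    def toy_predict(plan: Plan) -> Dict[str, float]:
        return {"bladder": 35.0, "rectum": 38.0}

    database = [Plan("P001", {"bladder": 36.0, "rectum": 39.0})]
    candidates = [
        Plan("P002", {"bladder": 33.5, "rectum": 37.0}),  # at or below both estimates
        Plan("P003", {"bladder": 36.2, "rectum": 37.5}),  # bladder dose above estimate
    ]
    database = refinement_run(database, candidates, toy_predict)
    print([p.patient_id for p in database])
```

In this sketch the model acts as a gate on the database: only plans that outperform the current estimate are added before the next re-training, which mirrors the self-checking role described above.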

Dosimetric evaluation
The proposed model evolution method was assessed in 35 new cervical cancer cases. For comparison, three kinds of VMAT plans were developed for each patient: (1) Automatic plan (AP): created automatically with a single click on the "optimization" button and no other human intervention; (2) Manual plan (MP): designed independently by a qualified planner in the traditional trial-and-error way; (3) Clinically approved plan (CP): created based on the AP but, unlike the AP, with manual adjustments permitted afterwards. The CPs were regarded as the reference standard in our plan comparison.
The prediction ability of the refined models at different stages was evaluated by comparing the estimated DVHs with the dose distributions finally achieved (i.e., the DVHs derived from the CP, taken here as the reference). This was done by assessing how closely the predicted values approximated the reference values at given dosimetric endpoints. For a given OAR, the absolute difference between the estimated dose and the reference dose was calculated for every model, and the models were ranked from the smallest difference to the largest, with 6 points for the first, 5 points for the second, 4 points for the third, and so on. A total of 35 cases were evaluated, and the average score of each model was obtained for each OAR. The maximum possible score was 6 points. This scoring method was introduced mainly to minimize the impact of individual cases on the global results.
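The rank-based scoring described above can be sketched in Python. This is an illustrative reading of the method rather than the authors' implementation: the model labels, the dose values, and the handling of ties (broken here by input order) are assumptions, and the top score of 6 points follows from assuming six models are ranked per case.

```python
from typing import Dict, List, Tuple


def rank_scores(estimated: Dict[str, float], reference: float) -> Dict[str, int]:
    """Score the models for one case and one OAR endpoint.

    The model with the smallest absolute error gets N points (N = number of
    models), the next gets N - 1, and so on down to 1 point.
    """
    errors = {model: abs(dose - reference) for model, dose in estimated.items()}
    ordered = sorted(errors, key=lambda model: errors[model])  # ties keep input order
    top = len(ordered)
    return {model: top - position for position, model in enumerate(ordered)}


def average_scores(cases: List[Tuple[Dict[str, float], float]]) -> Dict[str, float]:
    """Average the per-case points of each model over all evaluated cases."""
    totals: Dict[str, float] = {}
    for estimated, reference in cases:
        for model, points in rank_scores(estimated, reference).items():
            totals[model] = totals.get(model, 0.0) + points
    return {model: total / len(cases) for model, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical estimated bladder mean doses (Gy) from six models for one patient.
    estimates = {"C1": 30.0, "C2": 28.5, "C3": 27.8, "C4": 27.0, "C5": 26.4, "C6": 25.9}
    print(rank_scores(estimates, reference=25.5))
```

Because each case contributes a bounded number of points regardless of how large its dose differences are, a single outlying patient cannot dominate the average, which is the stated motivation for the scoring scheme.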
To evaluate the usefulness of such a highly refined model in VMAT planning, a dosimetric comparison was conducted between the APs generated by using model C1 and model C6, respectively. The CPs generated based on model C6 were also compared with the MPs with respect to the target coverage, the OAR sparing and the planning [...]
Table 1 Plan quality evaluation criteria. The planning goals for tumor target and dose constraints for OARs were listed
Fig. 1 The diagram of the proposed model evolution process. The prediction model functions as a self-checking tool, ensuring that only new plans with quality higher than past plans can be added to the training dataset. The developed model underwent a total of six runs of refinements with training set size increased from initial 25 plans up to 100 cases
The dosimetric indices used for target dose evaluation include the dose coverage, the CI and the HI. The conformity index (CI) [26] was calculated by:
$$\mathrm{CI} = \frac{V_{T,\mathrm{ref}}}{V_{T}} \times \frac{V_{T,\mathrm{ref}}}{V_{\mathrm{ref}}}$$
where $V_{T,\mathrm{ref}}$ refers to the volume of the target covered by the reference isodose (here 95% isodose), $V_{T}$ is the target volume, and $V_{\mathrm{ref}}$ is the volume of the reference isodose (i.e., 98% isodose).
The homogeneity index (HI) was computed from Dx%, the absorbed dose received by x% of the target volume [27].
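The two target indices can be illustrated with a short Python sketch. Both formulas are assumptions here: the CI is taken in the common product form (V_Tref / V_T) × (V_Tref / V_ref), which matches the three volumes defined above, and the HI is taken in the ICRU Report 83 form (D2% − D98%) / D50%, which may differ from the definition the authors used. The function names and the equal-voxel-volume simplification are hypothetical.

```python
import math
from typing import Sequence


def conformity_index(v_tref: float, v_t: float, v_ref: float) -> float:
    """Assumed form: CI = (V_Tref / V_T) * (V_Tref / V_ref); 1.0 is ideal."""
    if v_t <= 0 or v_ref <= 0:
        raise ValueError("target and reference isodose volumes must be positive")
    return (v_tref / v_t) * (v_tref / v_ref)


def dose_at_volume(doses: Sequence[float], x_percent: float) -> float:
    """D_x%: lowest dose within the hottest x% of voxels (equal voxel volumes assumed)."""
    if not doses:
        raise ValueError("dose list is empty")
    ordered = sorted(doses, reverse=True)
    count = max(1, math.ceil(len(ordered) * x_percent / 100.0))
    return ordered[count - 1]


def homogeneity_index_icru83(doses: Sequence[float]) -> float:
    """Assumed ICRU 83 form: HI = (D2% - D98%) / D50%; 0.0 is perfectly homogeneous."""
    d2 = dose_at_volume(doses, 2)
    d98 = dose_at_volume(doses, 98)
    d50 = dose_at_volume(doses, 50)
    return (d2 - d98) / d50


if __name__ == "__main__":
    # Hypothetical volumes in cm^3: covered target, whole target, reference isodose.
    print(round(conformity_index(v_tref=950.0, v_t=1000.0, v_ref=1100.0), 3))
```

Under these assumed definitions, a CI closer to 1 and an HI closer to 0 indicate a more conformal and more homogeneous target dose, respectively.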
The dosimetric indices for the OARs were selected according to their radiobiological properties. The average dose (Dmean) was computed for parallel organs, while the maximum dose (Dmax) was recorded for serial organs such as the spinal cord. Other dosimetric indices collected include V18 Gy for the kidneys (L/R) and V30 Gy, V40 Gy and V50 Gy for the bladder, rectum, and femoral heads (L/R).
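As a minimal sketch of how these OAR indices are obtained, the Python functions below compute Dmean, Dmax and VxGy from a list of voxel doses within one organ. The equal-voxel-volume assumption, the function names, and the sample values are hypothetical; VxGy is expressed here as the percentage of the organ volume receiving at least x Gy.

```python
from typing import Sequence


def mean_dose(doses: Sequence[float]) -> float:
    """Dmean: average dose over the organ voxels (equal voxel volumes assumed)."""
    if not doses:
        raise ValueError("dose list is empty")
    return sum(doses) / len(doses)


def max_dose(doses: Sequence[float]) -> float:
    """Dmax: highest voxel dose in the organ."""
    if not doses:
        raise ValueError("dose list is empty")
    return max(doses)


def volume_at_dose(doses: Sequence[float], threshold_gy: float) -> float:
    """VxGy: percentage of organ voxels receiving at least threshold_gy."""
    if not doses:
        raise ValueError("dose list is empty")
    covered = sum(1 for dose in doses if dose >= threshold_gy)
    return 100.0 * covered / len(doses)


if __name__ == "__main__":
    # Hypothetical bladder voxel doses in Gy.
    bladder = [12.0, 25.5, 31.0, 38.2, 41.7, 44.9, 46.3, 29.4]
    print(mean_dose(bladder), max_dose(bladder))
    print(volume_at_dose(bladder, 30.0), volume_at_dose(bladder, 40.0))
```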
To quantify the difference between plans, an assessment tool, the Plan Quality Metric (PQM), was introduced [28,29]. Penalty points were assigned to the tumor target and each OAR according to the priority of the dose optimization objectives. The built-in dosimetric endpoints were determined with reference to our institutional protocols and the RTOG 0921 guideline [30]. The scoring details are listed in Table 1.

Statistical analysis
All statistical analyses were performed with SPSS software (version 20, SPSS Inc, Chicago, IL). Analysis of variance (ANOVA) was applied when the normality (and homogeneity of variance) assumptions were satisfied; otherwise, the Wilcoxon signed-rank test was used. The significance level was set at 0.05.

Results
Figure 2 plots the predicted doses, the actual doses, and the scores of the different models for a given organ in the 35 cervical cancer patients. The predictive accuracy tended to improve as the number of refinements increased, in terms of how closely the predicted doses approximated the actual values. This can be seen more clearly in the scoring curve, which minimizes the impact of individual cases on the global results by introducing the weighted scores. The predictions of model C1 were relatively poor; most of them were ranked at the bottom and received the lowest score. Model C5 obtained a score close to 5 points in most cases, while model C6 provided the best estimate of the actual doses among the refined models of different stages. The scores for model C6 were all above 5.5 points for the various tested OARs (the full mark is 6 points). It seems that the prediction ability approaches the limits of current planning skills (i.e., the best-effort plan) after five to six runs of re-training, when the number of training samples increases to about 75 to 100 cases.
The refined models C1 and C6 were applied in automatic KBP for cervical cancer. Compared with the APs created by model C1 (AP-1), the APs created by model C6 (AP-6) showed advantages in handling the trade-offs between target coverage and OAR dose (Table 2). The proportion of AP-6 plans that directly satisfied the clinical requirements without any manual revision was 22/35, compared with 16/35 for AP-1. The average plan quality scores were 85.61 ± 6.78 for AP-6 and 83.92 ± 6.86 for AP-1 (p = 0.013).
The dosimetric results of CPs vs. MPs are given in Table 3. Both sets of VMAT plans achieved a dose coverage of V60 Gy higher than 99% for PGTVnd and PCTV. Compared with MPs, CPs exhibited lower V110% (p < 0.001) and better CI (p = 0.001) for PCTV at a slight sacrifice of target dose coverage (p = 0.011) and minimum dose Dmin (p = 0.002). However, Dmin was greater than 93% of the prescription dose in all cases for both kinds of treatment plans. The average plan quality scores for the tumor target (PGTVnd plus PCTV) were 43.39 ± 4.04 for CPs and 42.23 ± 3.47 for MPs (p = 0.011) (Table 4).
As for the radiation dose to the OARs, CPs significantly reduced almost all the dosimetric indices compared with MPs, except for the bladder V50 Gy (%) and the rectum V50 Gy (%) (Table 3). The overall quality assessment gave mean scores of 89.02 ± 4.83 for CPs and 86.48 ± 3.92 for MPs (p < 0.001) (Table 4).

Discussion
Although pelvic VMAT is increasingly used for the treatment of cervical cancer, designing an appropriate VMAT plan remains a challenge. The major difficulty is that the planner usually does not know what kind of dose distribution is achievable for each OAR. Because planning time is limited, especially in busy centers, the planner may not have enough opportunity to adjust the dose distributions repeatedly and explore whether better results are possible. This tends to lead to suboptimal plans. It has been shown by us and others [31,32] that quite a few clinical plans have room for improvement. Therefore, developing a way to improve the consistency and quality of VMAT plans has become a top priority.
The KBP model can provide estimated DVHs based on prior knowledge, helping direct the planner's efforts towards an achievable high-quality plan. The prediction ability, in terms of OAR sparing, was observed to improve for most OARs as the number of refinements increased. This may result from the joint effect of the increased number of training samples and the improved quality of the plan database. Currently, few studies have addressed the training sample size required for a particular KBP model. The manufacturer's specialists suggested a minimum of 20 plans for RapidPlan model creation, but they also emphasized that adding further plans usually helps create a more robust model [23]. Meanwhile, a recent study reported that although only 20 samples were needed to predict the rectum DVH, a sample size greater than 75 was recommended to train the KBP model [24]. This is why we started training with 25 samples and finally increased the sample size to 100 cases. Our experiments again support the advantages of a large training sample size in establishing the prediction model. By continuously updating the database with new plans of higher quality than before, the quality of the database was improved over time in a systematic way, which in turn improved the KBP model. The prediction model that had undergone several runs of progressive training was found to provide better estimates of the final dose distribution.
With enhanced prediction ability, the highly refined model has shown its advantage in capturing actual clinical practice during knowledge-based VMAT planning for cervical cancer. More than 60% of AP-6 plans could be directly approved for clinical treatment. The primary reason for the failure of automatic planning is insufficient coverage of the PCTV by the prescription dose in the regions where the PCTV overlaps the rectum and the bladder. In our practice of pelvic radiotherapy, high dose coverage of the tumor target tends to be improved preferentially to ensure local tumor control, as long as the radiation doses to critical organs do not exceed the dose tolerance limits. However, RapidPlan by default takes the lower bound of the DVH estimate range as the optimization objective in an attempt to maximize OAR sparing. This, on the other hand, may lead to underdosing of the adjacent tumor target. Adding a 3-mm ring structure outside the PCTV to allow for the high-dose fall-off helps improve the success rate of automatic planning. More research is warranted.
Compared with the conventional trial-and-error planning method, our results demonstrated that the novel KBP method could enhance the quality of treatment planning in terms of better OAR protection. Radiation-induced acute and chronic toxicities, including small bowel obstruction, enteritis, proctitis, and radiation cystitis, are serious concerns directly related to quality of life. It has been reported that the incidence rates of grade 3 or higher complications in 83 patients treated with pelvic IMRT plus high dose rate brachytherapy are 2.4% and 3.6% for the rectum and the bladder, respectively [1]. By applying KBP, the dosimetric indices of the rectum and the bladder, such as Dmean, V40 Gy and V30 Gy, were all significantly reduced (p < 0.001) under approximately similar target dose distributions. This may contribute to the fact that the incidence of late toxicities at our institution appeared to be lower than that reported in previous studies [3]. Moreover, the KBP method was found to help standardize patient treatment, making treatment results less operator- and experience-dependent. In dosimetric comparisons, even CPs designed by a junior planner can achieve dose distributions comparable to MPs. Notably, the time spent on a KBP plan is much shorter than that for a manual plan, even if manual revision is required.
Table 2 Dosimetric comparison of AP-1 vs. AP-6. The statistical results between AP-1 scores and AP-6 scores were also given. AP refers to the fully automatic plan, which was created by only one click at the "optimization" button with no other human intervention. AP-1 and AP-6 were fully automatic plans generated by using model C1 and model C6, respectively.

Conclusion
The proposed model evolution method not only utilizes the KBP model to guide the planning process, but also uses it as a self-checking tool to identify high-quality plans, providing a practical way to enhance the prediction ability with minimal human intervention. Our results indicate that this highly refined prediction model can better guide KBP in improving the consistency and quality of VMAT plans. The method described here is general and can be applied to other cancer sites. However, in order to satisfy the diverse needs of clinical practice, it is recommended that each institution establish its own model using this refinement method.