Dosimetric impact of inter-observer variability for 3D conformal radiotherapy and volumetric modulated arc therapy: the rectal tumor target definition case

Background: To assess the dosimetric effect of inter-observer variability in target definition for 3D-conformal RT (3DCRT) and volumetric modulated arc therapy by RapidArc (RA) in rectal cancer treatment. Methods: Ten patients with rectal cancer treated with neo-adjuvant RT were randomly selected from the internal database. Four radiation oncologists independently contoured the clinical target volume (CTV) in blind mode. The planning target volume (PTV) was defined as the CTV plus a 7 mm margin in all three directions. Shared guidelines giving general criteria for contouring the rectal target were then introduced, and the four radiation oncologists delineated new CTVs following them. For each patient, the intersection (I) and union (U) volumes of the six possible pairs of contours were calculated, both for the contours drawn before and for those drawn after the guidelines. The Agreement Index (AI = I/U) was calculated pre and post guidelines. Two RT plans (one 3DCRT plan using 3–4 fields and one RA plan using a single modulated arc) were optimized on each radiation oncologist's PTV. For each plan, the PTV volume receiving at least 95% of the prescribed dose (PTV V95%) was calculated for both the target and the non-target PTVs. Results: The inter-operator AI was 0.57 pre-guidelines and increased to 0.69 post-guidelines. The maximum volume difference between the CTV pairs drawn for each patient decreased from 380 ± 147 cm³ to 137 ± 83 cm³ after the introduction of the guidelines. For 3DCRT, the mean non-target PTV V95% was 93.7 ± 9.2% before and 96.6 ± 4.9% after the introduction of the guidelines; for RA the increase was larger, from 86.5 ± 13.8% (pre) to 94.5 ± 7.5% (post). OARs were better spared with the VMAT technique, while the differences between pre- and post-guidelines plans were not relevant for either technique.
Conclusions: Inter-observer contouring variability affects PTV coverage. The introduction of guidelines increases dosimetric consistency for both techniques, with greater improvement for the RA technique.


Background
Modern radiation therapy techniques with inverse planning optimization can achieve optimal dose painting of virtually any desired volume. In this context, accurate target delineation is vital to ensure that the target is not under-treated and to limit the dose to surrounding normal tissues. For this purpose, recent reports have recommended establishing a consensus on target definition and have stressed the importance of specific educational interventions on target contouring.
Pre-operative chemoradiotherapy has become a widely accepted treatment for locally advanced rectal cancer. Locally advanced rectal cancer treated with neo-adjuvant chemoradiation therapy is expected: a) to respond, with tumour down-staging in about half of patients [1]; b) to achieve better local control than the adjuvant approach, as shown in a phase III study [2]. New radiotherapy technologies, such as intensity modulated radiotherapy (IMRT) and, more recently, volumetric modulated arc therapy (VMAT), achieve highly conformal dose distributions on the target volume while sparing the adjacent healthy tissues (HT) and organs at risk (OAR). Several studies of patients receiving pelvic irradiation for rectal or anal cancer have shown that IMRT and VMAT are dosimetrically superior to other conformal techniques in protecting normal tissue close to the target [3]. Roberton et al. [4] showed a clear dose-volume relationship between bowel irradiation and acute grade 3 diarrhoea and suggested reducing as much as possible the OAR volume involved in the preoperative irradiation of rectal cancer. A contouring methodology shared by the group is therefore fundamental, as highlighted by many works on rectal cancer [5][6][7][8].
In this paper we investigated the dosimetric impact of introducing educational interventions in the delineation of the rectal target. Participating radiation oncologists delineated target contours before and after the introduction of shared guidelines. The aim of the study was to evaluate and compare the dosimetric effects of target contouring variability for 3D-conformal RT (3DCRT) and RapidArc (RA) techniques. Plans were optimized for each target delineated by the radiation oncologists. The primary endpoint was the dosimetric coverage of the other radiation oncologists' targets defined on the same patient. The secondary endpoint was the OAR doses for the two techniques. As a preliminary step, the inter-observer contouring variability among the group's radiation oncologists was evaluated before and after the introduction of the shared guidelines.

Patient selection
Ten patients (seven males and three females) with pathologically proven, locally advanced rectal cancer treated with neo-adjuvant RT with curative intent were included in the present analysis. Patients were randomly selected from the internal database; to avoid possible contouring biases, the patients' names were hidden and replaced with a progressive number. Computed tomography (CT) datasets were acquired with 3-mm slice thickness on a 16-slice CT system during free breathing. Patients were in prone position with arms raised above the neck and were immobilized with Belly-Board devices to displace the small-bowel loops anteriorly as much as possible.
Four radiation oncologists were involved in this study. Each was asked to contour the clinical target volume (CTV) for each of the ten patients in blind mode (i.e. without seeing the contours of the other oncologists involved in the study). Our institute's reference radiation oncologist for rectal cancer then established a consensus-based guideline on CTV delineation, in order to share general criteria for contouring the rectal target. After at least one month, the same four radiation oncologists contoured the ten targets following the guidelines, again in blind mode (i.e. they could see neither the other physicians' contours nor their own previous ones).
The planning target volume (PTV) was defined by adding a uniform three-dimensional 7 mm margin to the CTV.
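A uniform margin of this kind can be sketched as a morphological dilation of the CTV voxel mask by a 7 mm ball. The sketch below is illustrative only and rests on stated assumptions: contours are represented as Python sets of integer voxel indices, the voxel spacing is supplied by the caller (the 2.5 × 2.5 × 3 mm grid in the example is hypothetical), and the function names `ball_offsets` and `expand_margin` are hypothetical. The margin tool of the treatment planning system may use a different algorithm.

```python
import math

def ball_offsets(margin_mm, spacing_mm):
    """Voxel index offsets whose physical distance from the origin is <= margin_mm."""
    sx, sy, sz = spacing_mm
    # Largest index offset along each axis that can still lie inside the ball;
    # the small epsilon guards against floating-point truncation (e.g. 0.9 / 0.3).
    rx, ry, rz = (math.floor(margin_mm / s + 1e-9) for s in spacing_mm)
    offsets = []
    for dx in range(-rx, rx + 1):
        for dy in range(-ry, ry + 1):
            for dz in range(-rz, rz + 1):
                dist2 = (dx * sx) ** 2 + (dy * sy) ** 2 + (dz * sz) ** 2
                if dist2 <= margin_mm ** 2 + 1e-9:
                    offsets.append((dx, dy, dz))
    return offsets

def expand_margin(ctv, margin_mm, spacing_mm):
    """Uniform 3D expansion (dilation) of a voxel set by margin_mm."""
    offsets = ball_offsets(margin_mm, spacing_mm)
    return {(x + dx, y + dy, z + dz) for (x, y, z) in ctv for (dx, dy, dz) in offsets}

# Hypothetical example: one CTV voxel expanded by 7 mm on a 2.5 x 2.5 x 3 mm grid.
ptv = expand_margin({(0, 0, 0)}, 7.0, (2.5, 2.5, 3.0))
print(len(ptv))
```

Working with physical distances rather than index distances keeps the margin isotropic even when the slice thickness differs from the in-plane pixel size.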

Target definition guidelines
This educational intervention included a formal guideline, available online in our department, and an initial teaching session involving all physicians taking part in this study. The CTV had to include the entire mesorectum, the presacral and internal iliac nodal regions, and the gross tumor with cranial and caudal margins of at least 2 cm. Criteria for CTV delineation strictly followed the guidelines of Roels et al. [6]. Since RTOG atlases are commonly used in our department through online links at the contouring workstations, participants were carefully informed of the differences between our criteria and those in the RTOG paper by Myerson et al. [5].

Planning techniques
For each patient, 8 plans (4 3DCRT and 4 RA) were optimized before the introduction of the guidelines and another 8 after it. Each PTV drawn by the four physicians before and after the guidelines introduction served as the target of one of the 16 plans optimized for each patient. A standard protocol was adopted for all plans: the prescription was 50.4 Gy to the mean PTV dose in daily fractions of 1.8 Gy. For all PTVs, plans aimed to achieve V95% > 95% (at least 95% of the PTV volume covered by 95% of the prescribed dose) and a maximum dose (D2%, as defined in ICRU 83) lower than 107%. Bowel (defined as the entire peritoneal cavity), bladder and femoral heads were considered as OARs. The mean dose, the maximum dose (D2%) and appropriate VxGy values (volume receiving at least x Gy) were scored. Planning objectives for OARs were as follows: bowel V45Gy < 80 cm³ and V50Gy < 20 cm³; for the bladder, no hotspot was allowed inside, D30% < 35 Gy and the mean dose objective was < 45 Gy; femoral heads D2% < 47 Gy [3]. The planning objectives for HT were not numerically formalised; the strategy was to minimise HT involvement.
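The DVH quantities and numerical objectives above can be sketched in a few lines. This is a minimal illustration, not the Eclipse implementation: it assumes each structure's dose is available as a flat list of per-voxel doses in Gy on equal-sized voxels, computes Dx% as the minimum dose received by the hottest x% of the volume, and uses hypothetical function names. The qualitative bladder rule (no hotspot) and the HT strategy are not encoded.

```python
import math

PRESCRIPTION_GY = 50.4
FRACTION_GY = 1.8
N_FRACTIONS = round(PRESCRIPTION_GY / FRACTION_GY)  # 50.4 / 1.8 -> 28 fractions

def v_pct(doses_gy, threshold_gy):
    """VxGy as a percentage: share of voxels receiving at least threshold_gy."""
    return 100.0 * sum(d >= threshold_gy for d in doses_gy) / len(doses_gy)

def v_cc(doses_gy, threshold_gy, voxel_cc):
    """VxGy as an absolute volume in cm^3 (equal-sized voxels assumed)."""
    return voxel_cc * sum(d >= threshold_gy for d in doses_gy)

def d_pct(doses_gy, volume_pct):
    """Dx%: minimum dose received by the hottest volume_pct % of voxels (e.g. D2%)."""
    ranked = sorted(doses_gy, reverse=True)
    k = max(1, math.ceil(volume_pct * len(ranked) / 100.0))
    return ranked[k - 1]

def mean_gy(doses_gy):
    return sum(doses_gy) / len(doses_gy)

def check_objectives(ptv, bowel, bladder, femoral_heads, voxel_cc):
    """Pass/fail for the numerical objectives listed in the text."""
    presc = PRESCRIPTION_GY
    return {
        "PTV V95% > 95%": v_pct(ptv, 0.95 * presc) > 95.0,
        "PTV D2% < 107%": d_pct(ptv, 2) < 1.07 * presc,
        "bowel V45Gy < 80 cc": v_cc(bowel, 45.0, voxel_cc) < 80.0,
        "bowel V50Gy < 20 cc": v_cc(bowel, 50.0, voxel_cc) < 20.0,
        "bladder D30% < 35 Gy": d_pct(bladder, 30) < 35.0,
        "bladder mean < 45 Gy": mean_gy(bladder) < 45.0,
        "femoral heads D2% < 47 Gy": d_pct(femoral_heads, 2) < 47.0,
    }
```

In practice the TPS reads these values from the cumulative DVH; the voxel-list form simply makes the definitions explicit.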
The 3DCRT plans were designed according to our institute's practice with three fields (one posterior and two wedged laterals) or four fields (posterior, anterior and two wedged laterals) at 6 MV or 18 MV. The beam arrangement was chosen to obtain the best solution for the target shape. Conformal field shaping used a static MLC, with a 5 mm MLC margin in the lateral direction and 7 mm in the cranial-caudal direction. The RA plans consisted of a single 360° arc at 6 MV and were optimized starting from a common dose volume histogram (DVH) objective template. All plans were normalized to the mean dose of the target PTV (i.e. 100% at the target mean). Both techniques were optimized with the Varian Eclipse treatment planning system (version 8.9) for a Varian 2100-DHX linac equipped with a Millennium MLC (leaf width at isocentre of 5 mm in the central 20 cm of the field and 10 mm in the outer 2 × 10 cm; leaf transmission 1.7%). All dose distributions were computed with the Anisotropic Analytical Algorithm (AAA) implemented in Eclipse, on a 2.5 mm calculation grid.
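Normalizing a plan to the target mean amounts to one global scale factor applied to the whole dose distribution. A minimal sketch, assuming the dose is held as a dictionary keyed by voxel index and using the hypothetical name `normalize_to_target_mean`:

```python
def normalize_to_target_mean(dose_gy, target_voxels, prescription_gy=50.4):
    """Rescale a whole dose map so that the mean dose over target_voxels equals prescription_gy."""
    target_mean = sum(dose_gy[v] for v in target_voxels) / len(target_voxels)
    scale = prescription_gy / target_mean
    # Every voxel (target, non-target PTVs and OARs alike) is scaled by the same factor.
    return {voxel: d * scale for voxel, d in dose_gy.items()}
```

Because the factor is global, the non-target PTVs and OARs are rescaled together with the target, which is why their coverage depends on which physician's PTV defined the normalization.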

Data analysis
First, the contours were evaluated geometrically. For each patient, the CTV volumes were measured and the variation was calculated as the maximum volume difference between any two of the four CTVs (one per physician) drawn on the same patient. The percentage volume variation for each patient's target was defined as $\Delta = 100 \times (V_{\max} - V_{\min})/V_{\mathrm{mean}}$, and the ratio $V_{\max}/V_{\min}$ was also reported. These definitions describe the deviation independently of the absolute volumes. To quantify inter-observer contouring variability, the intersection (I) and union (U) volumes of the six possible pairs of contours were calculated for each patient, and this was repeated for the contours drawn after the guidelines. The Agreement Index $\mathrm{AI}_{ij} = \frac{V_i \cap V_j}{V_i \cup V_j}$, where $V_i$ and $V_j$ are the volumes delineated by the i-th and j-th physicians, was calculated for each target and for all possible pairs of contours, pre and post guidelines introduction.
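The geometric metrics defined above (maximum volume difference, Δ, Vmax/Vmin and the pairwise AI) can be sketched as follows. This is an illustrative sketch under stated assumptions: each contour is a set of voxel indices (the voxel volume cancels in the AI ratio), volumes are passed in cm³, and the function names are hypothetical.

```python
from itertools import combinations

def volume_stats(volumes_cc):
    """Maximum difference, Delta (%) and Vmax/Vmin for one patient's CTV volumes."""
    v_max, v_min = max(volumes_cc), min(volumes_cc)
    v_mean = sum(volumes_cc) / len(volumes_cc)
    return {
        "max_diff_cc": v_max - v_min,
        "delta_pct": 100.0 * (v_max - v_min) / v_mean,
        "ratio": v_max / v_min,
    }

def agreement_index(a, b):
    """AI = |A intersect B| / |A union B| for two contours given as sets of voxel indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_ai(contours):
    """AI for every pair of observers (4 observers -> 6 pairs)."""
    return {
        (i, j): agreement_index(contours[i], contours[j])
        for i, j in combinations(range(len(contours)), 2)
    }
```

A per-patient agreement value is then the mean over the six pairs, e.g. `sum(pairwise_ai(c).values()) / 6`; the same `agreement_index` applied to one physician's pre- and post-guidelines contours gives an intra-observer AI.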
Plans were evaluated quantitatively by means of DVHs. Target coverage was quantified by the PTV V95%, i.e. the fraction of the PTV volume receiving at least 95% of the prescribed dose (50.4 Gy). This parameter was evaluated separately for the target PTV (i.e. the PTV on which the plan was optimized) and for the non-target PTVs (see Figure 1), in order to assess the dosimetric impact of target definition uncertainty for both techniques. The analysis was performed before and after the introduction of the guidelines to evaluate a possible dosimetric improvement.
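The target/non-target split can be sketched as a cross-evaluation: each plan, optimized on physician i's PTV, is scored on every physician's PTV for the same patient. A minimal sketch, assuming per-voxel dose maps stored as dictionaries (voxels absent from a map are treated as receiving 0 Gy) and PTVs stored as sets of voxel keys; the function names are hypothetical.

```python
def v95_pct(dose_gy, ptv, prescription_gy=50.4):
    """Percentage of PTV voxels receiving at least 95% of the prescription."""
    threshold = 0.95 * prescription_gy
    covered = sum(1 for v in ptv if dose_gy.get(v, 0.0) >= threshold)
    return 100.0 * covered / len(ptv)

def cross_coverage(plan_doses, ptvs, prescription_gy=50.4):
    """Split V95% into target (plan i on PTV i) and non-target (plan i on PTV j != i) values."""
    target, non_target = [], []
    for i, dose in enumerate(plan_doses):
        for j, ptv in enumerate(ptvs):
            value = v95_pct(dose, ptv, prescription_gy)
            (target if i == j else non_target).append(value)
    return target, non_target
```

With four physicians per patient this yields 4 target and 12 non-target V95% values per technique and per phase (pre or post guidelines).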

Study design and statistical analysis
The present study was performed as part of the internal quality process for improving RT practice. Ten CT scans were considered a representative sample of the procedure. Contours and plans were compared with the Wilcoxon matched-pairs signed-rank test, a nonparametric test for paired data that does not assume a normal distribution. The threshold for statistical significance was set at p < 0.05. The analysis was performed with Statistica 6.0 (Vigonza, Italy).
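With ten paired observations the Wilcoxon signed-rank test can be computed exactly by enumerating all 2^10 sign assignments. The sketch below is not the Statistica implementation used in the study; it is a standard-library illustration that assumes zero differences are discarded, tied absolute differences receive average ranks, and the two-sided p-value is the share of sign assignments at least as far from the null mean as the observed W+. The function names are hypothetical, and the enumeration is practical only for small samples (roughly n ≤ 20).

```python
from itertools import product

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def wilcoxon_exact(before, after):
    """Exact two-sided Wilcoxon signed-rank test; returns (W+, p)."""
    diffs = [b - a for a, b in zip(before, after) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    null_mean = sum(ranks) / 2
    observed = abs(w_plus - null_mean)
    extreme = 0
    for signs in product((0, 1), repeat=n):  # all 2^n sign assignments
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - null_mean) >= observed - 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** n
```

For example, `before` and `after` could be the ten per-patient mean AI values pre and post guidelines.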

Contouring inter-observer variability
A total of 80 contours were generated and analyzed. Each contour was superimposed on the original CT images. An example of target contouring drawn before and after the introduction of guidelines by the four radiation oncologists is shown in Figure 2.
Tables 1 and 2 report the analysis of the CTV volumes contoured before and after the guidelines introduction, from which the mean inter-operator variability in CTV contouring was evaluated. For the pre-guidelines contours, the mean maximum volume difference between CTVs was 380 cm³, ranging from 682 cm³ (patient 2) to 117 cm³ (patient 7); for the post-guidelines contours it decreased to 137 cm³, ranging from 283 cm³ (patient 10) to 31 cm³ (patient 4). The ratio of the largest to the smallest contoured volume was 1.79 before and 1.27 after the introduction of the guidelines. The inter-operator AI increased from 0.57 to 0.69 with the introduction of the guidelines. The intra-observer AI between each physician's pre- and post-guidelines contours was 0.74, with a significant reduction in target volume.
The most relevant discrepancy in target definition concerned the bilateral inclusion of the external iliac nodes. These differences influenced the anterior-posterior extent of the target, while no relevant differences (<1 cm) were found in the cranial-caudal direction. Figure 3 shows representative color-wash dose distributions obtained with both 3DCRT and RA techniques on the same patient. Plans on the left were optimized on the PTV defined by physician 1, while plans on the right were optimized on the target drawn by physician 2 before the introduction of the guidelines. The target PTVs were fully covered with both 3D-CRT and RA, although the RA technique allows better dose sculpting on the target and a lower dose to the neighbouring HT. This dose sculpting, however, causes underdosage of the non-target PTV (i.e. the PTV delineated by the other physician on the same patient) in both plans (see arrows in Figure 3). Conversely, 3D-CRT, with the classical box approach, spares the neighbouring tissue less but gives better coverage of the non-target PTVs, causing underdosage in only one case. Figure 4 shows the PTVs drawn by physicians 1 and 2 (the same as in Figure 3) after the introduction of the guidelines, together with the color-wash dose distributions. As a consequence of the shared guidelines, the contours, and thus the dose distributions, appear more similar; in this case only a small area of the non-target PTV was not covered with the RA approach, while complete coverage was achieved with the 3DCRT technique. Table 3 reports the systematic DVH analysis for the 3DCRT and RA techniques before and after the guidelines introduction. Data in the table are normalized to the prescription dose (100% corresponds to 50.4 Gy). The target PTV always met the coverage objective (95% of the volume receiving 95% of the prescribed dose).
For the non-target PTVs, by contrast, the mean volume receiving 95% of the prescribed dose was 93.7 ± 9.2% before and 96.6 ± 4.9% after the introduction of the guidelines for 3DCRT; for RA plans the increase was larger, from 86.5 ± 13.8% (pre) to 94.5 ± 7.5% (post). Furthermore, the percentage of plans with acceptable non-target PTV coverage (i.e. V95% ≥ 95%) rose from 62% to 73% (+11 percentage points) for 3D-CRT, while for VMAT plans the increase was 22 percentage points (from 41% to 63%) (see Figure 5).
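The acceptability rate and the cumulative curve of Figure 5 follow from the list of non-target V95% values. A minimal sketch with hypothetical function names; the values used to exercise it are illustrative, not study data.

```python
def pct_plans_at_least(v95_values, level_pct):
    """Percentage of plans whose non-target PTV V95% is >= level_pct."""
    return 100.0 * sum(v >= level_pct for v in v95_values) / len(v95_values)

def cumulative_histogram(v95_values, levels=range(50, 101, 5)):
    """Figure-5-style curve: coverage level -> % of plans reaching at least that level."""
    return {level: pct_plans_at_least(v95_values, level) for level in levels}

def change_pp(before_pct, after_pct):
    """Change in percentage points, e.g. 41% -> 63% gives +22."""
    return after_pct - before_pct
```

Reporting the change in percentage points rather than as a relative percentage avoids ambiguity: a rise from 41% to 63% is +22 percentage points but a relative increase of about 54%.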

Target coverage and dose homogeneity
For OARs, the results were as follows: for the bladder, V40Gy was 41.1 ± 24.8% for 3D pre, 23.7 ± 20.1% for RA pre, 29.7 ± 18.7% for 3D post and 16.0 ± 10.7% for RA post; the mean doses to the right and left femoral heads were almost identical: 32 Gy for 3D pre, 30 Gy for 3D post and 19 Gy for RA both pre and post guidelines.
The mean values of MU/Gy for 3D plans were 179.2 ± 17.2 and 180.4 ± 6.8 respectively for pre and post guidelines, while for RA plans these values were 198.6 ± 26.2 and 187.3 ± 18.3 for pre and post guidelines plans respectively.

Discussion
This work addresses the quantification and improvement of the precision and accuracy of RT treatments through an interdisciplinary approach, as summarized by Yorke et al. in the anniversary paper on the role of medical physicists in improving geometric aspects of treatment accuracy and precision [9]. Imprecise localization of internal anatomy, tissue inhomogeneities, voluntary and involuntary patient motion, and other kinds of human-induced uncertainties can lead to inaccuracies much greater than the 1-2% uncertainty of the usual absolute dose calibration. In this report, the dosimetric consequences of inter-observer variability in target contouring were evaluated for different techniques. The rectal tumor case was chosen because its target is representatively challenging to define and because its concave shape is well suited to intensity modulated techniques.
Inter-observer variability in target volume delineation has been shown to be one of the major contributors to the overall uncertainty in radiation treatment planning [7,8,10]. Accurate target delineation is extremely important to ensure that the CTV is not under-treated and to limit the dose to surrounding normal tissue. Despite the well-known consequences of geometric inaccuracy in target volume delineation [11][12][13], variability in target delineation has been demonstrated in several studies and for various anatomic tumor sites [13]. In non-small cell lung cancer, for example, Steenbakkers et al. [14] reported GTV sizes ranging from 36 cm³ to 129 cm³ (ratio 3.6, average 69 cm³), while van Sornsen de Koste et al. [15] found that the average GTV of the main tumor of a cT2N2M0 lung cancer, as determined by 16 radiation oncologists, was 13.6 cm³ (SD 5.2 cm³, median 12.3 cm³, range 8.3-26.9 cm³). For rectal cancer, CTV delineation shows great variability in the literature. Fuller et al. [7] analyzed a set of patients very similar to ours in terms of tumor stage and found CTV volumes ranging from 590 cm³ to 820 cm³; this result is comparable with our findings (CTV range 499 cm³ to 994 cm³). The impact of these uncertainties should be re-evaluated whenever a new treatment delivery modality is introduced into clinical practice; for RA this evaluation has already been performed from other points of view [16,17]. This is particularly important because greater precision and conformality of the dose distribution usually make it more sensitive to geometric uncertainties.
In this study we evaluated the dosimetric impact of introducing shared guidelines for contouring the rectal cancer target. The project was part of an internal process of risk analysis in RT [18]. A total of 80 contours were generated and analysed as a prerequisite for the dosimetric analysis; indeed, the dosimetric analysis is meaningful only once the contouring variability has been verified to be consistent with data reported in recent literature on larger populations. In detail, AI was calculated pre and post guidelines by pairing the contours of the various oncologists.   variation was analyzed volumetrically using the conformation number (CN, where CN = 1 equals total agreement). This research showed that a consensus atlas led to a significant increase in inter-observer agreement, with CN increasing from 0.58 to 0.69. Myerson et al. [5] found something similar using kappa statistics as a measure of agreement between participants: without any protocol, the mean κ was 0.49. Comparable results were also found for other sites. In a multi-institutional study by Van Mourick et al. [19], a conformity index (CIvm), corresponding to the AI reported in this study, was determined per patient and per observer pair by dividing the common volume by the encompassing volume (CIvm = 0 indicates no overlap between the two observers, whereas CIvm = 1 indicates perfect overlap). CIvm increased from 0.3 to 0.8 with the introduction of contouring guidelines. This result is comparable with that of Batumalai et al. [20], who reported a mean concordance index of 0.81 using a contour reference guide for the delineation of the breast target. A similar result (mean concordance index of 0.87) was reported by Struikmans et al. [21] for the same site. The increase in inter-observer concordance found in this research, as well as in the literature, indicates that the use of a contouring protocol may help decrease inter-observer variability.
Moreover, the reduction in mean CTV volume after the introduction of the guidelines (649 cm³ vs 595 cm³) may be due to greater confidence in contouring, which avoids excessively conservative contours; for example, the guidelines reduced the uncertainty about including the external iliac nodes, as can be seen in Figures 2 and 3. This allows further OAR sparing, since the irradiated volume is reduced. Having verified the consistency of our contouring results, we focused on the dosimetric uncertainty due to inter-observer contouring variability for both 3DCRT and VMAT by RA. Foppiano et al. [8] investigated the impact of inter-observer variability on rectal tumor volume and its consequences on DVH analysis in order to define reliable constraints for 3D conformal RT. In our series, after the introduction of the guidelines the mean non-target PTV V95% increased for both techniques and, at the same time, its standard deviation decreased by about 50%. The improvement in PTV coverage was 3% for 3DCRT and 8% for the RA technique.
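These relative figures follow directly from the mean ± SD values of the non-target PTV V95% reported in the Results; as a worked check (coverage changes in percentage points of V95%, SD reductions as fractions):

```latex
\begin{aligned}
\Delta \bar{V}_{95\%}^{\mathrm{3DCRT}} &= 96.6 - 93.7 = 2.9 \approx 3, &
\Delta \bar{V}_{95\%}^{\mathrm{RA}} &= 94.5 - 86.5 = 8.0,\\
1 - \frac{\sigma_{\mathrm{post}}}{\sigma_{\mathrm{pre}}}\Big|_{\mathrm{3DCRT}} &= 1 - \frac{4.9}{9.2} \approx 0.47, &
1 - \frac{\sigma_{\mathrm{post}}}{\sigma_{\mathrm{pre}}}\Big|_{\mathrm{RA}} &= 1 - \frac{7.5}{13.8} \approx 0.46.
\end{aligned}
```

Both standard deviations therefore drop by slightly less than half, consistent with the approximate 50% reduction stated above.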
Moreover, the dosimetric data showed that RA reliably reproduces the dosimetric quality of conventional conformal plans, with some observable improvements: better treatment conformality, fewer hot spots inside the target volume, lower involvement of OARs such as the femoral heads, and an overall reduction of HT involvement. This also confirms that normal tissue can often be better protected with IMRT and VMAT than with other conformal RT techniques, as already demonstrated by other dosimetric investigations in patients receiving pelvic radiation for anal or rectal cancer [22][23][24][25] and in other anatomical regions [26][27][28][29].
While the potential for normal tissue sparing, thanks to the higher dose conformation, is one of the motivations for moving to RA at this site, identifying the correct target and achieving good target coverage remain the primary objectives and become even more important. The DVH evaluation of the non-target PTVs for all optimized plans showed possible underdosage and hot spots that made some plans unacceptable. In this setting, the importance of reducing the uncertainty in target delineation as much as possible is evident, and the introduction of shared guidelines is a key intervention. In our study, more than 50% of the RA plans produced an underdosage of the non-target PTV, and this rate decreased by more than 20 percentage points with the introduction of the guidelines. For a 3D technique this is less crucial, since the absence of modulation and dose sculpting yields more robust target coverage despite the large PTV variability. Therefore, introducing RA (as well as other modulated techniques) into clinical practice requires precise target delineation, because the inverse optimization used in RA conforms the dose to the contoured target.
A limitation of this study is that CT was the only imaging modality used to determine the tumor target. Modern imaging techniques, such as MRI, endoscopic ultrasound and PET, could add useful information. As confirmed by different studies [12,30,31], PET-CT or MRI matching may reduce inter-clinician variations, irrespective of the introduction of guidelines. Another possible limitation of this study is the number of observers used for delineation, although the optimal number of observers required in such studies remains unknown. The current study included ten patients' CT scans and four observers, comparable with the study by Batumalai et al. [20], who used four observers and ten patients. By contrast, Fuller et al. [7] and Foppiano et al. [8], for example, reached similar results in target volume contouring for rectal irradiation with 17 observers and 4 patients, and with 14 observers and a single patient's CT scan, respectively.

Figure 5 Cumulative histogram of the percentage of plans achieving a given non-target PTV volume fraction covered by the 95% isodose. For example, for 3D pre, 83% of the plans had at least 85% of the non-target PTV volume covered by the 95% isodose.

Conclusions
The introduction of guidelines considerably reduces inter-observer variability in neo-adjuvant rectal cancer CTV delineation. In 3DCRT, minimizing contouring inter-observer variability improves the dosimetric consistency of the plans, but the lower dose conformation makes this less crucial than in modulated techniques, where it is of primary importance. The introduction of shared guidelines is thus a necessary prerequisite when treating rectal cancer with modulated techniques, in order to avoid severe target misses.