Quality assurance for the EORTC 22071–26071 study: dummy run prospective analysis

Purpose: The phase III 22071–26071 trial was designed to evaluate the addition of panitumumab to adjuvant chemotherapy plus intensity modulated radiotherapy (IMRT) in locally advanced resected squamous cell head and neck cancer. We report the results of the dummy run (DR) performed to detect deviations from protocol guidelines.
Methods and Materials: DR datasets consisting of target volumes, organs at risk (OAR) and treatment plans were digitally uploaded, then compared with reference contours and protocol guidelines by six central reviewers. Summary statistics and analyses of potential correlations between delineations and plan characteristics were performed.
Results: Of 23 datasets, 20 (87.0%) GTVs were evaluated as acceptable/borderline, along with 13 (56.5%) CTVs and 10 (43.5%) PTVs. All PTV dose requirements were met by 73.9% of cases. Dose constraints were met for 65.2-100% of mandatory OARs. Statistically significant correlations were observed between the subjective acceptability of contours and the ability to meet dose constraints for all OARs (p ≤ 0.01) except for the parotids and spinal cord. Ipsilateral parotid doses correlated significantly with CTV and PTV volumes (p ≤ 0.05).
Conclusions: The observed wide variations in treatment planning, despite strict guidelines, confirm the complexity of the development and quality assurance of IMRT-based multicentre studies for head and neck cancer.
Electronic supplementary material: The online version of this article (doi:10.1186/s13014-014-0248-9) contains supplementary material, which is available to authorized users.


Background
The management of locally advanced head and neck squamous cell carcinoma (HNSCC) involves increasingly complex combined modality approaches. After primary surgery, conventionally fractionated adjuvant radiotherapy (RT), commonly delivered to a total dose of 64-66 Gy, reduces the locoregional recurrence rate by at least half, which has translated into a survival benefit [1-8]. The addition of concurrent cisplatin has been investigated in two major randomized trials, with significant improvements in progression-free survival [6,7] and overall survival (OS) [6] in the whole study population, and in the subgroup of patients with extracapsular extension (ECE) and/or surgical margins <5 mm [9]. Additionally, blockade of the epidermal growth factor receptor (EGFR) with cetuximab reduces the likelihood of disease progression and increases 3-year OS by 10% without enhancing typical RT side effects [10]. The open-label, multicentre randomized phase III EORTC 22071 trial was designed to determine whether adding the EGFR inhibitor panitumumab concurrently to adjuvant chemoradiotherapy (CRT) would significantly prolong disease-free survival (DFS) in macroscopically completely resected HNSCC (ClinicalTrials.gov NCT01142414). Eligible patients had surgically resected non-metastatic squamous cell carcinoma of the hypopharynx, oropharynx, larynx or oral cavity, stage pT1-2 node positive, or any pT3-4 (UICC 6th Edition) at high risk of locoregional recurrence based on one or more of the following: R0 resection with surgical margins <5 mm, R1 resection (margin <1 mm) or ECE.
Variations in compliance with protocol RT delivery in multicentre studies decrease tumour control, increase RT toxicity and may negatively impact survival [11-19]. An extensive quality assurance (QA) program increases interinstitutional consistency and familiarizes participating sites with EORTC procedures [16]. Centres undergoing pre-trial accreditation are better prepared to comply with protocol requirements [17], since QA ensures participating institutions can adhere to protocol instructions, including adequate contouring of volumes [18]. In a secondary analysis of available Radiation Oncology Group dummy run (DR) cases over two decades, institutions which previously completed a DR were significantly more likely to be successful at other trials' QA [19].
Planned RT QA procedures for EORTC 22071 included completion of a trial-specific DR as well as a complex dosimetry check (phantom irradiation). However, in spring 2011, the RTOG reported that adding concurrent cetuximab to CRT did not benefit patients with locally advanced unresected HNSCC [20]. Therefore, testing the same concept in the adjuvant setting with a different investigational agent was not considered likely to be beneficial. As a consequence, the trial was suspended in July 2011 prior to accrual of its first patient. Despite closure of the trial, we analyzed completed DR datasets to assess compliance with protocol guidelines.

DR procedure
The DR case history (Online Additional file 1) reflected a 51-year-old female with a pT2N1M0 left lateral tongue HNSCC post-R1 resection (Figure 1). CT simulation imaging and preoperative CT and MRI scans were downloaded from the EORTC by participating institutions and loaded into the local treatment planning systems (TPS). Target volumes and organs at risk (OAR) were to be defined and a protocol-compliant treatment plan generated. Subsequently, the planning CT in DICOM format, plus the structure set, 3D dose matrix and RT plan (DICOM-RT format) were uploaded to the EORTC Quality Assurance in Radiotherapy office via a web-based tool. After evaluation for data integrity, datasets were transmitted to the Image-Guided Therapy Quality Assurance Centre (http://atc.wustl.edu) for review by the trial QA team in relation to the master contours. Master contours had been constructed independently by four expert head and neck radiation oncologists according to protocol guidelines (WB, JL, SN, CS), after which two meetings were held to reach consensus.

Radiotherapy
The protocol recommended that the diagnostic CT be co-registered with the postoperative planning CT to facilitate definition of the preoperative extent of primary tumour ("GTV-pt"). A low risk volume (CTV52.8 Gy), and areas at intermediate (CTV59.4 Gy) and high risk (CTV66 Gy) of harbouring microscopic disease were to be defined (Table 1). Volume selection guidelines were outlined in the protocol, based on Gregoire et al. [21]. Although sites were given the option to encompass the entire LN level containing the LN with ECE for trial patients, for the DR patient, the preoperative extent of the LN with ECE was considered to be clearly identifiable. Therefore, only the reconstructed LN with ECE should have been included. For all CTVs, 3D margin expansion was to be anatomically adapted to avoid structures not at risk for microscopic disease (e.g. air cavity, bone). Three PTVs were to be generated using a recommended margin of 5 mm; 3 mm was allowed if using advanced position verification procedures. Required optimization structures called 'PTV-Exacts' were equivalent to the respective PTVs collapsed inside the external body contour by 5 mm to avoid extension into the build-up region. A simultaneous integrated boost (SIB) technique with 6-10 MV photons was required for all patients receiving intensity modulated RT (IMRT). Prescription dose to the PTV1-Exact was 52.8 Gy (referred to throughout as "PTV52.8 Gy-Exact"), PTV2-Exact was 59.4 Gy ("PTV59.4 Gy-Exact"), and PTV3-Exact was 66.0 Gy ("PTV66 Gy-Exact"), each delivered in 33 fractions; at least 95% of the prescribed dose was to cover 95% of the PTV-Exacts.
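The coverage requirement stated above (at least 95% of the prescribed dose covering 95% of each PTV-Exact) can be illustrated with a minimal sketch. It assumes each PTV-Exact is available as a list of per-voxel doses from equally sized voxels, which simplifies how a TPS derives a DVH. The function names are hypothetical and are not taken from the protocol or from any reviewer software.

```python
import math

# Prescription doses per PTV-Exact, as stated in the protocol text (Gy).
PRESCRIPTIONS_GY = {
    "PTV52.8 Gy-Exact": 52.8,
    "PTV59.4 Gy-Exact": 59.4,
    "PTV66 Gy-Exact": 66.0,
}


def dose_at_volume(voxel_doses_gy, volume_percent):
    """Return D_x%: the minimum dose received by the hottest x% of voxels.

    Assumption: all voxels have the same volume, so counting voxels is
    equivalent to integrating volume.
    """
    if not voxel_doses_gy:
        raise ValueError("at least one voxel dose is required")
    ranked = sorted(voxel_doses_gy, reverse=True)
    n_covered = max(1, math.ceil(len(ranked) * volume_percent / 100.0))
    return ranked[n_covered - 1]


def meets_coverage(voxel_doses_gy, prescription_gy):
    """Check that D95% is at least 95% of the prescribed dose."""
    return dose_at_volume(voxel_doses_gy, 95.0) >= 0.95 * prescription_gy
```

For the PTV66 Gy-Exact, this check passes when D95% is at least 0.95 × 66.0 = 62.7 Gy. The grading thresholds used by the reviewers (acceptable, minor or major deviation) are given in Online Additional file 1 and are not reproduced here.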

Organs at risk
Delineation of the brainstem, spinal cord (equal to the osseous borders of vertebral canal), and parotid glands was mandatory. All other OARs were considered optional. Formal planning volume at risk margins were not utilized.

Central review
Four radiation oncologists and two medical physicists participated in the central review procedure. Reviewers evaluated volume selection and delineation in relation to master contours (Figure 1), as well as treatment planning parameters, OAR dosimetry, dose distributions and dose-volume histograms (DVH). Deviations were graded as acceptable, borderline, or unacceptable based on the definitions in Table 1, taking into account all available information along with ICRU recommendations [22]. For example, the final grade of the CTVs considered the degree of compliance of the GTV. Where reviewers' judgments were discrepant, the grade was assigned by either WB or JL. Based on all reviewer comments, an overall grade was given to each DR dataset by WB or JL.

Statistical analysis
Descriptive statistics were compiled as proportions for categorical variables, and averages (standard deviations) for normally distributed continuous variables. Recalculated DVHs were used to assess dosimetric parameters. Fisher's exact test evaluated the correlation between two categorical variables with cell count <5. The Spearman correlation coefficient explored correlations between characteristics of target volumes and OAR doses. The independent t-test compared OAR doses delivered to volumes which did versus did not meet protocol constraints. A p value of ≤0.05 was considered statistically significant. All analyses were conducted using IBM SPSS version 19. The mean virtual radius was obtained for all CTVs by calculating the radius of a sphere with the same volume as the corresponding CTV.
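The mean virtual radius follows directly from the volume of a sphere, V = (4/3)πr³, so r = (3V / 4π)^(1/3). The sketch below is illustrative only: the function names are hypothetical, and it assumes volumes are reported in cubic centimetres so that radii come out in centimetres.

```python
import math


def virtual_radius_cm(volume_cc):
    """Radius (cm) of the sphere whose volume equals volume_cc (cm^3)."""
    if volume_cc < 0:
        raise ValueError("volume must be non-negative")
    return (3.0 * volume_cc / (4.0 * math.pi)) ** (1.0 / 3.0)


def radius_difference_mm(site_volume_cc, master_volume_cc):
    """Difference in virtual radius (mm) between a site and master contour.

    A positive value means the site's contour is larger than the master.
    """
    delta_cm = virtual_radius_cm(site_volume_cc) - virtual_radius_cm(master_volume_cc)
    return 10.0 * delta_cm
```

Because the radius scales with the cube root of the volume, a large relative difference in volume corresponds to a much smaller relative difference in virtual radius. This measure compares sizes only and says nothing about where two contours differ.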

Volume delineation
GTV-pt contours were evaluated as acceptable in 9/23 (39.1%) cases, borderline in 11 (47.8%) and unacceptable in three (13.0%). The specific LN with ECE was not reconstructed by 10/23 sites, but this was not by itself sufficient to downgrade the GTV if it was otherwise considered acceptable. Examples of submitted datasets with contours graded overall as minor or major deviation are shown (panels D-F). Overall, 13.0% of sites' target volumes (global evaluation of GTV, CTVs and PTVs) were acceptable, 43.5% were graded as minor deviation and 43.5% as major deviation.

Dosimetry
D98% and D95% constraints were used to evaluate whether sufficient RT dose was prescribed to the respective volumes (Online Additional file 1). Seventeen (73.9%) sites met both PTV52.8 Gy-Exact dose constraints. For the D98% constraint, there were six major deviations, and for the D95%, one minor and five major deviations. Twenty (87.0%) datasets met both PTV59.4 Gy-Exact constraints; three had major violations for both D98% and D95%. Based on the D98% of the PTV66 Gy-Exact, 21 were acceptable, with one minor and one major deviation. In terms of the D95%, 22 were acceptable, with one major violation. Twenty-one (91.3%) DR datasets met both PTV66 Gy-Exact constraints. There was no statistical correlation between PTV contour evaluation and whether the site met PTV52.8 Gy-Exact, PTV59.4 Gy-Exact, or PTV66 Gy-Exact constraints (all p ≥ 0.74). Nine median PTV66 Gy-Exact doses exceeded 66 Gy by >2%. Whether the PTV66 Gy-Exact was evaluated as acceptable, borderline or unacceptable did not predict whether the median PTV66 Gy dose exceeded 66 Gy by 2% (p = 0.86). No plan exceeded a dose of 72.6 Gy to a volume larger than 1.8 cc outside the PTV66 Gy-Exact. In three datasets, optimization was probably performed on PTVs not collapsed inside the body contour. In one, a second non-IMRT plan used for treatment of the low neck was not submitted, rendering the plan not fully evaluable.
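Two of the checks reported above are simple threshold tests and can be sketched as follows. The numerical limits (66 Gy, 2%, 72.6 Gy, 1.8 cc) are taken from the text; note that 72.6 Gy equals 110% of 66 Gy. The voxel-list representation, the uniform voxel volume and the function names are assumptions made for illustration.

```python
import statistics

PRESCRIBED_HIGH_DOSE_GY = 66.0   # PTV66 Gy-Exact prescription
HOT_SPOT_DOSE_GY = 72.6          # dose level quoted in the text
HOT_SPOT_VOLUME_LIMIT_CC = 1.8   # volume limit quoted in the text


def median_exceeds_tolerance(ptv_voxel_doses_gy, tolerance=0.02):
    """True if the median PTV dose exceeds the prescription by more than 2%."""
    limit_gy = PRESCRIBED_HIGH_DOSE_GY * (1.0 + tolerance)
    return statistics.median(ptv_voxel_doses_gy) > limit_gy


def hot_volume_cc(outside_ptv_voxel_doses_gy, voxel_volume_cc):
    """Volume (cc) outside the PTV receiving more than the hot-spot dose.

    Assumption: every voxel has the same volume, voxel_volume_cc.
    """
    n_hot = sum(1 for dose in outside_ptv_voxel_doses_gy if dose > HOT_SPOT_DOSE_GY)
    return n_hot * voxel_volume_cc


def hot_spot_acceptable(outside_ptv_voxel_doses_gy, voxel_volume_cc):
    """True if the hot volume outside the PTV stays within the 1.8 cc limit."""
    return hot_volume_cc(outside_ptv_voxel_doses_gy, voxel_volume_cc) <= HOT_SPOT_VOLUME_LIMIT_CC
```

With these limits, a median PTV66 Gy-Exact dose above 66 × 1.02 = 67.32 Gy would be flagged, as was the case for nine datasets in the DR.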

Organs at risk
Consensus contour evaluations are displayed subdivided by whether the OAR met dose constraints (Figure 5). There was no statistically significant correlation between the volume of any CTV or PTV and ability to meet parotid dose constraints (all p > 0.20), or between parotid contour acceptability and meeting constraints (p = 0.17). Mean dose to the right parotid versus PTV volumes is shown in the online Additional file 1. The dose limit was met by one third of the sites that contoured the (optional) right submandibular gland, although this gland was often included in the low dose PTV and was therefore difficult to spare optimally. Recommended doses for both the oral cavity and larynx excluded overlap with PTV-Exacts. There were statistically significant correlations between contour grades and whether the site was able to meet respective dose constraints for the brainstem, right submandibular gland, oral cavity, and larynx (all p ≤ 0.004). Volumes of CTV59.4 Gy, CTV66 Gy and PTV59.4 Gy correlated significantly with left parotid D50 (all p ≤ 0.047), and volumes of CTV59.4 Gy and PTV59.4 Gy correlated significantly with left parotid mean dose (both p ≤ 0.043). Average doses were significantly higher for OARs which did not meet constraints versus those that did (p ≤ 0.04) (Figure 6).
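The volume-versus-dose correlations reported above were computed with the Spearman coefficient in IBM SPSS. As an illustration of what that coefficient measures, the following standard-library sketch ranks both variables (using average ranks for ties) and takes the Pearson correlation of the ranks. It is a plain re-implementation under that textbook definition, not the study's analysis code, and it does not compute p values.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)
```

In this setting, `x` would hold one target volume per submitted dataset (for example CTV59.4 Gy in cc) and `y` the corresponding left parotid D50. Because only ranks are used, the coefficient captures any monotonic relationship and is not affected by a single outlying volume as strongly as a Pearson correlation on the raw values.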

Discussion
Twenty of twenty-three GTVs, 13 CTVs and 10 PTVs were evaluated as acceptable or borderline, and three CTVs excluded part of the preoperative extent of tumour. Overall, 73.9%, 87.0% and 91.3% of PTV52.8 Gy-, PTV59.4 Gy- and PTV66 Gy-Exacts, respectively, met all dose constraints. Although the overall degree of variation in target volume delineation was high, and centers' volumes tended to be larger than the master contours (Figure 4) with the exception of elective nodal volumes, deviations were smaller in the high risk volumes (GTV-pt and CTV66 Gy). The mean distance of the GTV-pt contour from the master contour was, with the exception of one center, below 2.5 mm based on the mean virtual radius (Figure 4). Dosimetric data can be considered a proxy for the biologic effects of protocol treatment, had it been delivered [23], and since most locoregional recurrences occur in the high risk volume, this variation between centers would be expected to have a low impact on locoregional tumor control. The main causes of poor compliance with the guidelines were that the delineation of the nodal levels and the CTV margins on the GTV-pt were not performed entirely according to protocol. Similar to the current study, investigators from the PARSPORT multicenter randomized trial were asked to delineate volumes as per protocol, which were then compared centrally with a master set [24]. For three submissions, there were no significant differences between submitted and reference volumes; there were small discrepancies in four, attributed to the learning curve and inter-observer variability. Large differences arising from lack of adherence to the trial guidelines were found in the remaining three, representing exclusion of specific anatomical areas or nodal levels. Planners also created PTVs based on CTV and OAR volumes provided [24]. Eight centres achieved the required dose constraints, with the other two within 2%. Cord and brainstem tolerances were not exceeded in any plan.
The average mean contralateral parotid dose was 25.5 Gy. All contralateral parotid doses <24 Gy were delivered with dynamic IMRT using 5 mm width leaves. The authors concluded that differences in parotid sparing were likely due to MLC leaf width rather than delivery technique; this information is not available for datasets in the present study [24]. A Swiss national DR-type study was undertaken to try to explain the reasons behind large discrepancies in target volume delineation [25]. Radiation oncologists from 11 centres received a CD containing a CT scan and MRI sequences for a 72-year-old man with a well-differentiated T3N0 base of tongue HNSCC. The authors reported increased GTV homogeneity when more precise radiological imaging was available (contrast, thinner CT slices, multiple modalities). PTVs were more homogeneously defined, partially compensating for relative inconsistencies in GTV and CTV contours. The authors concluded that the main reasons for observed differences were variable interpretation of protocol instructions and ICRU definitions, particularly the CTV, and a compensation effect of the PTV where a clinical margin was subconsciously added [25]. In our study, median volumes of CTV52.8 Gy, CTV59.4 Gy and CTV66 Gy were on average 7%, 32%, and 24% larger than the respective master contours, suggesting that local investigators also tended to incorporate a safety margin.
Nelms et al. examined variation in OAR contours in a patient with oropharyngeal cancer to quantify interclinician variability as well as changes in IMRT dosimetry due strictly to OAR differences [26]. Investigators were provided with a CT dataset that included a precontoured GTV, three CTVs, and three PTVs. OAR definition was left to the discretion of local investigators; major variations were seen in resulting OAR sizes and shapes. The dosimetric impact of variation in OAR contours was estimated by overlying the reference OARs onto each site's optimized dose grid. Reported dose differences depended on the degree of contour variation and the plan's dose gradients with smaller differences seen with increasing OAR accuracy [26]. We provided multimodality imaging (preoperative CT and MRI) and references to OAR atlases to attempt to address this potential source of interobserver variation.
The ability to meet a dose-volume constraint is also dependent on the proximity of an OAR to a high-dose PTV or steep dose gradient [18,23,25-27]. The requirement to meet a single dose-volume constraint, considered a limitation in previous IMRT comparison studies [28], was avoided by provision of up to three constraints per structure in this trial. This allows greater flexibility in how dose objectives can be met [18], although it may increase plan heterogeneity.
A modern protocol should leave freedom in RT technique but clearly describe target volumes, dose homogeneity requirements, dose prescription, relative priorities, and dose-volume constraints to unambiguously specified OARs [29]. This study had a few limitations. A clearer distinction between requirements and recommendations for OAR contours may have decreased the number of violations, since absence of a mandatory structure resulted in an 'unacceptable' rating. Although both CT and MRI were provided for the DR patient, 18F-FDG PET was not utilized. Preoperative PET images may have helped delineation of the GTV and therefore increased inter-observer agreement [30,31]. Regular progress meetings for clinicians using the same TPS and hardware for structured knowledge transfer, as suggested by Clark et al. [24], were not performed. In evaluating acceptability of target volumes, a reference must be defined. It could be argued whether our set of master contours was worthy of being the gold standard, but as per Nelms et al., given any reference dataset, variability would still exist [26]. Finally, inter-centre comparison of OAR doses was complicated by the fact that each centre created its own contours, making it harder to determine the degree to which differences were influenced by planning technique.
Steps have already been taken to address the last issue in the form of a two-step DR procedure. First, the DR case is sent to participating centres for delineation of volumes, along with a list of requirements for compliance and predefined major deviations. Via an iterative process, local investigators resubmit volumes for evaluation until no major deviations are identified. Minor deviations are also reported to participating centres but correction is not required. In the second step, the master contours are sent to centres, based on which a protocol-compliant treatment plan is generated and centrally reviewed. This guarantees the same starting conditions for all centres. Additional means by which improvement in DR quality could be achieved would depend on determining specific reasons for non-compliance within each trial; potential reasons have been recently reviewed [19].

Conclusions
Wide variation in dose planning in the EORTC 22071 dummy run confirms the complexity of developing IMRT-based multicentre clinical studies for head and neck cancer, and underscores the need, for the present time, to insist on adherence to strict QA procedures.

Additional file
Additional file 1: Dummy run case. Table S1. Volume and dose statistics for CTVs and PTVs. Dosimetric indices for CTVs not prespecified in protocol. Abbreviations: avg, average; N/a, not applicable; SD, standard deviation. Figure S1. Mean dose to right parotid versus planning target volumes.