Interobserver variability in target volume delineation in definitive radiotherapy for thoracic esophageal cancer: a multi-center study from China
Radiation Oncology volume 16, Article number: 102 (2021)
To investigate the interobserver variability (IOV) in target volume delineation of definitive radiotherapy for thoracic esophageal cancer (TEC) among cancer centers in China, and ultimately improve contouring consistency as much as possible to lay the foundation for multi-center prospective studies.
Sixteen cancer centers throughout China participated in this study. In Phase 1, three suitable cases with upper, middle, and lower TEC were chosen, and participants were asked to contour a set of gross tumor volume (GTV-T), nodal gross tumor volume (GTV-N), and clinical target volume (CTV) for each case based on their routine experience. In Phase 2, the same clinicians were instructed to follow a contouring protocol and re-contour a second set of target volumes. The variation in the target volumes was analyzed and quantified using the Dice similarity coefficient (DSC).
Sixteen clinicians provided routine volumes, whereas ten provided both routine and protocol volumes for each case. The IOV of routine GTV-N was the most striking in all cases, with the smallest DSC of 0.37 (95% CI 0.32–0.42), followed by CTV, whereas GTV-T showed high consistency. After following the protocol, the smallest DSC of GTV-N was improved to 0.64 (95% CI 0.45–0.83, P = 0.005) but the DSC of GTV-T and CTV remained constant in most cases.
Variability in target volume delineation was observed, but it could be significantly reduced and controlled using mandatory interventions.
Definitive radiotherapy (dRT) concurrent with chemotherapy is recognized as the standard treatment for patients with locally advanced or unresectable thoracic esophageal cancer [1]. Accurate target volume delineation is a prerequisite for three-dimensional conformal and intensity-modulated radiotherapy (IMRT) techniques, especially when simultaneous-integrated boost (SIB) radiotherapy is used to deliver a boost dose to the gross tumor volume (GTV-T) and nodal gross tumor volume (GTV-N) [2, 3]. In 1998, Tai et al. [4] observed interobserver variability (IOV) in target volume delineation for cervical esophageal cancer among 48 radiation oncologists, and the same team later found that the variation could be controlled with special training [5]. However, delineation variation in dRT for thoracic esophageal cancer has not been evaluated.
Traditionally, dRT field borders for esophageal cancer were defined by 3–5-cm expansions proximally and distally beyond the primary lesion along the esophagus, based on 2-dimensional planning [6, 7]. More recently, expert consensus contouring guidelines for IMRT [8], compiled by radiation oncologists from cancer centers throughout the United States, recommended that the clinical target volume (CTV) include the GTV and GTV-N with at least a 1-cm margin in all directions. In China, there are presently no consensus contouring guidelines, so some variance in the dRT field is likely among cancer centers. An investigation of IOV in target volume delineation therefore seemed appropriate, as IOV appears to affect clinical outcomes in multi-center studies and could potentially be minimized with refined consensus guidelines [9, 10].
This study aimed to investigate the IOV in target volume delineation in dRT for thoracic esophageal cancer among cancer centers in China, and ultimately improve contouring consistency as much as possible to lay the foundation for the multi-center prospective study.
Materials and methods
The following clinical examinations of the three cases were completed in the primary center. Barium meal films and esophagogastroduodenoscopy (EGD) were used to locate the site and length of the tumor; endoscopic ultrasound (EUS) and computed tomography (CT) were mainly used to determine the invasive depth and the relationship to surrounding tissues. Nodal status was judged comprehensively from EUS, CT, and 18-fluorodeoxyglucose positron emission tomography/computed tomography (PET/CT). Brain magnetic resonance imaging and PET/CT were performed to exclude distant metastasis.
Case 1: The primary lesion was in the upper thoracic esophagus, and its boundary with the surrounding tissue was unclear, with suspicion of tracheal invasion. Suspicious lymph nodes were in Stations 1R, 1L, 2L, 4L, and 5 [11].
Case 2: The primary lesion was in the middle thoracic esophagus, with a limited range of suspected lymphatic metastasis in Stations 2R and 4R.
Case 3: The primary lesion was in the lower thoracic esophagus, with a wide range of lymphatic metastasis. Suspicious lymph nodes were in Stations 2R and 8L and near the course of the left gastric artery.
The invited radiation oncologists from sixteen cancer centers were members of the Jing-Jin-Ji Esophageal and Esophagogastric Cancer Radiotherapy Oncology Group (3JECROG). A flow chart giving an overview of the study is shown in Fig. 1. In Phase 1, all branch centers received the patient history, clinical examinations, and planning CT fused with planning PET (slice thickness: 3.0 mm). The heads of the radiotherapy departments were asked to identify their specialists in thoracic oncology, who delineated the first set of GTV-T, GTV-N, and clinical target volume (CTV) based on their own routine experience. These volumes were sent back to the primary center after completion and recorded as the routine group (RG). In Phase 2, the differences and consistency of these target volumes between the centers were fully discussed at the second 3JECROG annual conference, after which the contouring protocol was drafted and referential target volumes (RTVs) were drawn based on the expert opinions. The RTVs, together with the contouring protocol (Additional file 1: The contouring protocol for guiding the determination of target volumes) and an atlas for target volume delineation [12], were then sent to each center. The same specialists were asked to give their opinions on the RTVs and to follow the protocol to re-delineate a second set of target volumes, which was recorded as the protocol group (PG).
We used the Dice similarity coefficient (DSC) [13] as a direct measure of the degree of target volume matching (Fig. 2), because it evaluates similarity in both volume and location. The method was used to calculate the spatial overlap between the RTVs and the target volumes from the branch-centers. The value of DSC ranges from 0 (completely disjoint) to 1 (complete overlap). DSC was defined as follows:
\[ \mathrm{DSC} = \frac{2 \times V(\mathrm{RTVs} \cap \mathrm{branch})}{V(\mathrm{RTVs}) + V(\mathrm{branch})} \]
where V(RTVs), V(branch), and V(RTVs∩branch) are the volume of the RTVs, the volume of the target volumes from the branch-centers, and the volume of their overlapping region, respectively.
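The overlap measure described above can be illustrated with a minimal Python sketch. It assumes each contour has already been rasterized to a set of voxel indices on a common grid with equal voxel size, so that voxel counts are proportional to volume; the function name `dice` and the toy contours are hypothetical and are not study data or the software used in the study.

```python
def dice(reference, branch):
    """Dice similarity coefficient between two sets of voxel indices.

    Assumes equal voxel size, so a voxel count stands in for a volume.
    """
    reference, branch = set(reference), set(branch)
    total = len(reference) + len(branch)
    if total == 0:
        raise ValueError("both volumes are empty; DSC is undefined")
    return 2 * len(reference & branch) / total


# Hypothetical toy contours on a 3-D voxel grid (not study data).
# Reference volume: 4 x 4 x 2 = 32 voxels.
rtv = {(x, y, z) for x in range(4) for y in range(4) for z in range(2)}
# Branch-center volume: same size, shifted by two voxels along x.
branch = {(x, y, z) for x in range(2, 6) for y in range(4) for z in range(2)}

# The overlap is 2 x 4 x 2 = 16 voxels, so DSC = 2 * 16 / (32 + 32) = 0.5.
print(dice(rtv, branch))
```

Two contours of identical volume can therefore still score well below 1 when they are displaced relative to each other, which is why the coefficient reflects both volume and location.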
The Shapiro–Wilk test [14] was used to check continuous data for normality. Intergroup differences were evaluated using the paired t test when the data were normally distributed and the rank test when the distribution was skewed. All tests were two-sided, and P < 0.05 was considered statistically significant. All statistical analyses were performed using R, version 3.5.1 (https://www.r-project.org/).
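For readers unfamiliar with paired rank tests, the sketch below shows one way such a comparison can be computed. It assumes that the "rank test" for paired data is a Wilcoxon signed-rank test, which the text does not state explicitly, and it computes an exact two-sided P value by enumerating all sign patterns, which is feasible for a small number of pairs. The study itself used R; this Python version, the function name `signed_rank_exact`, and the DSC values are all hypothetical illustrations, and the normality check is not shown.

```python
from itertools import product


def signed_rank_exact(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W+, P). Enumerates all 2**n sign patterns, so use it only
    for small n.
    """
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [round(a - b, 6) for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # W+ is the sum of the ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)

    # Count sign patterns at least as far from the centre as the observed W+.
    extreme = sum(
        1
        for signs in product((False, True), repeat=n)
        if abs(sum(r for r, s in zip(ranks, signs) if s) - centre) >= observed - 1e-9
    )
    return w_plus, extreme / 2 ** n


# Hypothetical DSC values for nine paired observers (not study data).
routine = [0.31, 0.35, 0.40, 0.28, 0.45, 0.38, 0.33, 0.42, 0.36]
protocol = [0.60, 0.58, 0.71, 0.49, 0.70, 0.66, 0.52, 0.75, 0.61]

w_plus, p_value = signed_rank_exact(routine, protocol)
print(w_plus, p_value)  # 45.0 0.00390625
```

In this toy example every observer improves, so W+ takes its maximum value of 45 and only 2 of the 512 sign patterns are as extreme, giving P = 2/512.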
Results
Number of datasets received
A total of 16 datasets were received from 15 branch-centers in Phase 1, because one branch-center had two radiation oncologists delineating target volumes separately. The CTV delineations in the RG are presented in Fig. 3a. In all three cases, the RG and PG were available from 10 clinicians; however, one of them did not comply with the protocol for the second delineation. Three participating clinicians returned their agreement with the RTVs instead of contouring new volumes, and another three clinicians did not submit the protocol group of target volumes before the deadline. Figure 3b shows the CTVs in the PG, and a total of nine pairs of target volumes were included in the following comparison analysis.
The IOV in routine clinical practice, based on the RG, is shown in Table 1. The largest CTV in case 3 was nearly seven times the smallest (range, 95.9–652.9 cc). In general, GTV-T showed a higher degree of consistency, with DSC > 0.75 in all three cases. In contrast, variability in GTV-N was larger, with DSC < 0.55.
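As a quick check of the "nearly seven times" figure, the ratio follows directly from the reported range of routine CTVs in case 3. The symbols V_max and V_min are introduced here only for this illustration:

```latex
\[
\frac{V_{\max}}{V_{\min}} = \frac{652.9\ \text{cc}}{95.9\ \text{cc}} \approx 6.8
\]
```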
Efficiency of protocol
Detailed results of the comparison between the paired groups are presented in Table 2. Use of the protocol improved the DSC of almost all target volumes. The most significant improvement was in GTV-N, which increased from 0.51, 0.38, and 0.37 to 0.67 (P = 0.022), 0.55 (P = 0.260), and 0.64 (P = 0.005) in case 1, case 2, and case 3, respectively. In addition, the CTV of case 3 showed significantly better consistency, with its DSC increasing from 0.63 to 0.72 (P = 0.004).
Discussion
In the era of precision radiotherapy, the accuracy of target volume delineation plays a significant role in the planning and execution of radiotherapy. However, owing to variance in the location of primary lesions and the range of lymph node metastasis, there is variability among radiation oncologists and radiation centers with respect to the target volumes of dRT, which may lead to delineation bias in multi-center research studies. It is therefore important to ensure the consistency of target volume delineation before conducting a prospective, multi-center study. Our study found that IOV existed in routine delineation practice and that the contouring protocol could help to improve contouring consistency.
According to the RG data, consistency in the delineation of GTV-T was generally high, with DSC above 0.75; no obvious IOV existed among clinicians and centers regardless of the location of the primary lesion. Similar results were reported by the QA program of the PRODIGE 26/CONCORDE phase 2/3 trial [15], in which GTV delineation was respected in almost all centers. As also reported by Nowee et al. [16], GTV delineation consistency seems difficult to improve further.
For GTV-N, although PET/CT, EUS, and other auxiliary examinations were provided to help diagnose metastatic lymph nodes, and the contouring protocol was applied to improve diagnostic consistency, the DSC was generally lower than 0.70. There are several possible reasons. First, there is no clear standardized definition of metastatic lymph nodes in esophageal cancer, which may have led to missed or over-contoured GTV-N. Second, some clinical studies [17,18,19] have shown that a large proportion of metastatic lymph nodes diagnosed by preoperative imaging are over- or under-estimated compared with the postoperative pathological results. Mantziari et al. [20] reported a study of 193 patients with esophageal cancer (clinical stage: T3N0) in which pathological N0 cases accounted for only 35.8% of patients in the surgery-alone group, indicating that more than 60% of cases were under-estimated at clinical assessment. Finally, according to the analysis by Gockel et al. [21], the rate of lymph node metastasis is 27–55% when the primary lesion invades the submucosa. In addition, lymph node metastasis in esophageal cancer is very extensive: among patients receiving three-field lymph node dissection, Isono et al. [22] reported a 27.5% rate of cervical node metastasis in middle thoracic esophageal cancer. It is therefore challenging to assess lymph node metastasis in clinical practice. We reviewed the multi-center target volumes and found that the variance in GTV-N was mainly due to the first reason, that is, clinicians' cautious overestimation of suspected lymph node metastasis. The introduction of the protocol allows clinicians to combine multiple diagnostic methods comprehensively when making this judgment, which may explain the increased consistency of GTV-N.
Delineation of the CTV relies mainly on clinical examinations, which give clinicians a reference for judging metastatic lymph nodes more accurately and for determining the range of radiation treatment. In the RG, IOV in CTV was observed among branch-centers. However, for cases with a relatively limited and proven range of lymph node metastasis, the IOV was relatively small regardless of the study group, which indicates that radiation oncologists agreed on the delineation of such target volumes. In case 3, with more extensive lymph node metastasis, the IOV in target volume delineation became an issue. The efficacy of involved field irradiation (IFI) versus elective nodal irradiation (ENI) is still debatable [23,24,25], and a meta-analysis [26] showed no survival difference between IFI and ENI. More prospective, comparative studies should therefore be conducted for validation. Our study suggests that the consistency of CTV delineation improved when the protocol requirements were followed, with a significant gain in case 3 (DSC 0.63 to 0.72, P = 0.004). The findings of this multi-center study are important because they show that different centers can achieve more consistent target volume delineation.
Similar observations have been documented for other tumors such as nasopharyngeal, cervical, pulmonary, and gastric carcinomas [27,28,29,30]. Factors accounting for the variance in GTV-N and CTV definition in this study were similar to those found by Weiss et al. [31], who suggested that the causes are multifactorial, including image- and observer-related factors. A previous study suggested that refined contouring guidelines should be provided to better reduce IOV. Accordingly, the present protocol proposed a consensus on involved lymph nodes and a strict definition of the CTV expansion criteria, leading to higher consistency in the delineation of GTV-N and CTV. According to an investigation of head and neck cancers by Peters et al. [32], protocol compliance improved radiotherapy quality assurance and helped to achieve optimal treatment outcomes in combined-modality (chemoradiotherapy) treatment.
To the best of our knowledge, this is one of the few multi-center studies of variability in the target volumes of dRT for thoracic esophageal cancer, and it included 16 centers throughout China. We investigated IOV with respect to both volume and spatial relationship. In addition, unlike previous trials that included dummy runs for QA analysis [33, 34], and similar to Spoelstra et al. [28], we measured IOV both before and after the protocol was introduced in order to evaluate its efficiency.
Our study explored the variability in target volumes of dRT for thoracic esophageal cancer and helps to fill the gap in this field. Furthermore, it reinforces that a contouring protocol can improve the consistency of target volumes, making the results of multi-center studies more reliable. Beyond the protocol, improvement in contouring consistency depends on advances in diagnostics. The department of imaging diagnosis in our center reported that combining lymph node size with the axial ratio could improve diagnostic sensitivity [35]. In addition, PET/CT and EUS further increase the accuracy of N staging [36, 37], thereby improving the consistency of target volume definition.
One limitation of our study is that the results were based on a combination of multiple examination modalities, whereas patients in clinical practice may receive only some of them, which could result in more striking contouring variance. In addition, we did not ask participating centers to design treatment plans for their target volumes, so variances in dosimetric parameters could not be evaluated. We plan to further assess the variability in treatment plans designed for dRT of esophageal cancer and to evaluate the impact of a dose-restriction protocol on dose–volume histogram parameters.
Conclusions
IOV was observed in target volume delineation. The lack of a uniform consensus may account for it, which likely reflects the different contouring philosophies of the participating centers and emphasizes the need for standardization. The consistency of target volume delineation across centers could be improved through mandatory procedures, laying a solid foundation for reliable multi-center prospective studies.
Availability of data and materials
All data generated or analyzed during this study are included in this published article and its supplementary information data.
Abbreviations
IMRT: Intensity-modulated radiation therapy
SIB: Simultaneously integrated boost
GTV: Gross tumor volume
GTV-N: Metastatic regional nodes
OAR: Organ at risk
PET: Positron emission tomography
CTV: Clinical target volume
RTV: Referential target volume
DSC: Dice similarity coefficient
IFI: Involved field irradiation
ENI: Elective nodal irradiation
References
1. Cooper JS, Guo MD, Herskovic A, Macdonald JS, Martenson JA, Al-Sarraf M, et al. Chemoradiotherapy of locally advanced esophageal cancer: long-term follow-up of a prospective randomized trial (RTOG 85–01). J Am Med Assoc. 1999;281:1623–7.
2. Chang JY, Gomez DR, Allen PK, Younes AI, Bhutani M, Komaki RU, et al. Local control and toxicity of a simultaneous integrated boost for dose escalation in locally advanced esophageal cancer: interim results from a prospective phase I/II trial. J Thorac Oncol. 2016;12:375–82.
3. Welsh J, Palmer MB, Ajani JA, Liao Z, Swisher SG, Hofstetter WL, et al. Esophageal cancer dose escalation using a simultaneous integrated boost technique. Int J Radiat Oncol Biol Phys. 2012;82:468–74.
4. Tai P, Van Dyk J, Yu E, Battista J, Stitt L, Coad T. Variability of target volume delineation in cervical esophageal cancer. Int J Radiat Oncol Biol Phys. 1998;42:277–88.
5. Tai P, Van Dyk J, Battista J, Yu E, Stitt L, Tonita J, et al. Improving the consistency in cervical esophageal target volume definition by special training. Int J Radiat Oncol Biol Phys. 2002;53:766–74.
6. Herskovic A, Martz K, Al-Sarraf M, Leichman L, Brindle J, Vaitkevicius V, et al. Combined chemotherapy and radiotherapy compared with radiotherapy alone in patients with cancer of the esophagus. N Engl J Med. 1992;326:1593–8.
7. Krasna MJ, Willett C, Goldberg R, Sugarbaker D, Tepper J, Hollis D, et al. Phase III trial of trimodality therapy with cisplatin, fluorouracil, radiotherapy, and surgery compared with surgery alone for esophageal cancer: CALGB 9781. J Clin Oncol. 2008;26:1086–92.
8. Wu AJ, Bosch WR, Chang DT, Hong TS, Jabbour SK, Kleinberg LR, et al. Expert consensus contouring guidelines for intensity modulated radiation therapy in esophageal and gastroesophageal junction cancer. Int J Radiat Oncol Biol Phys. 2015;92:911–20.
9. Vinod SK, Jameson MG, Min M, Holloway LC. Uncertainties in volume delineation in radiation oncology: a systematic review and recommendations for future studies. Radiother Oncol. 2016;121:169–79.
10. Joye I, Macq G, Vaes E, Roels S, Lambrecht M, Pelgrims A, et al. Do refined consensus guidelines improve the uniformity of clinical target volume delineation for rectal cancer? Results of a national review project. Radiother Oncol. 2016;120:202–6.
11. El-Sherief AH, Wu CC, Abbott GF, Drake RL, Rice TW, Lau CT. International Association for the Study of Lung Cancer (IASLC) lymph node map: radiologic review with CT illustration. RadioGraphics. 2014;34:1680–91.
12. Xiao Z, Zhou Z, Li Y. Esophageal cancer target volume delineation and treatment guidance for radiation therapy. 1st ed. Beijing: People’s Medical Publishing House; 2017.
13. Dice LR. Measures of the amount of ecologic association between species. Ecology. 1945;26:297–302.
14. Royston P. Remark AS R94: a remark on algorithm AS 181: the W-test for normality. Appl Stat. 1995;44:547–51.
15. Boustani J, Rivin Del Campo E, Blanc J, Peiffert D, Benezery K, Pereira R, et al. Quality assurance of dose-escalated radiation therapy in a randomized trial for locally advanced oesophageal cancer. Int J Radiat Oncol Biol Phys. 2019;105:329–37.
16. Nowee ME, Voncken FEM, Kotte ANTJ, Goense L, van Rossum PSN, van Lier ALHMW, et al. Gross tumour delineation on computed tomography and positron emission tomography-computed tomography in oesophageal cancer: a nationwide study. Clin Transl Radiat Oncol. 2019;14:33–9.
17. Samson P, Puri V, Robinson C, Lockhart C, Carpenter D, Broderick S, et al. Clinical T2N0 esophageal cancer: identifying pretreatment characteristics associated with pathologic upstaging and the potential role for induction therapy. Ann Thorac Surg. 2016;101:2102–11.
18. Speicher PJ, Ganapathi AM, Englum BR, Hartwig MG, Onaitis MW, D’Amico TA, et al. Induction therapy does not improve survival for clinical stage T2N0 esophageal cancer. J Thorac Oncol. 2014;9:1195–201.
19. Zhang JQ, Hooker CM, Brock MV, Shin J, Lee S, How R, et al. Neoadjuvant chemoradiation therapy is beneficial for clinical stage T2 N0 esophageal cancer patients due to inaccurate preoperative staging. Ann Thorac Surg. 2012;93:429–37.
20. Mantziari S, Gronnier C, Renaud F, Duhamel A, Théreaux J, Brigand C, et al. Survival benefit of neoadjuvant treatment in clinical T3N0M0 Esophageal cancer: results from a retrospective multicenter European study. Ann Surg. 2017;266:805–13.
21. Gockel I, Sgourakis G, Lyros O, Polotzek U, Schimanski CC, Lang H, et al. Risk of lymph node metastasis in submucosal esophageal cancer: a review of surgically resected patients. Expert Rev Gastroenterol Hepatol. 2011;5:371–84.
22. Isono K, Sato H, Nakayama K. Results of a nationwide study on the three-field lymph node dissection of esophageal cancer. Oncology. 1991;48:411–20.
23. Onozawa M, Nihei K, Ishikura S, Minashi K, Yano T, Muto M, et al. Elective nodal irradiation (ENI) in definitive chemoradiotherapy (CRT) for squamous cell carcinoma of the thoracic esophagus. Radiother Oncol. 2009;92:266–9.
24. Ji K, Zhao L, Yang C, Meng M, Wang P. Three-dimensional conformal radiation for esophageal squamous cell carcinoma with involved-field irradiation may deliver considerable doses of incidental nodal irradiation. Radiat Oncol. 2012;7:1–8.
25. Yamashita H, Takenaka R, Omori M, Imae T, Okuma K, Ohtomo K, et al. Involved-field radiotherapy (IFRT) versus elective nodal irradiation (ENI) in combination with concurrent chemotherapy for 239 esophageal cancers: a single institutional retrospective study. Radiat Oncol. 2015;10:1–10.
26. Cheng YJ, Jing SW, Zhu LL, Wang J, Wang L, Liu Q, et al. Comparison of elective nodal irradiation and involved-field irradiation in esophageal squamous cell carcinoma: a meta-analysis. J Radiat Res. 2018;59:604–15.
27. Jansen EPM, Nijkamp J, Gubanski M, Lind PARM, Verheij M. Interobserver variation of clinical target volume delineation in gastric cancer. Int J Radiat Oncol Biol Phys. 2010;77:1166–70.
28. Spoelstra FOB, Senan S, Le Péchoux C, Ishikura S, Casas F, Ball D, et al. Variations in target volume definition for postoperative radiotherapy in stage III non-small-cell lung cancer: analysis of an international contouring study. Int J Radiat Oncol Biol Phys. 2010;76:1106–13.
29. Eminowicz G, McCormack M. Variability of clinical target volume delineation for definitive radiotherapy in cervix cancer. Radiother Oncol. 2015;117:542–7.
30. Peng YL, Chen L, Shen GZ, Li YN, Yao JJ, Xiao WW, et al. Interobserver variations in the delineation of target volumes and organs at risk and their impact on dose distribution in intensity-modulated radiation therapy for nasopharyngeal carcinoma. Oral Oncol. 2018;82:1–7.
31. Weiss E, Hess CF. The impact of gross tumor volume (GTV) and clinical target volume (CTV) definition on the total accuracy in radiotherapy: theoretical aspects and practical experiences. Strahlenther Onkol. 2003;179:21–30.
32. Peters LJ, O’Sullivan B, Giralt J, Fitzgerald TJ, Trotti A, Bernier J, et al. Critical impact of radiotherapy protocol compliance and quality in the treatment of advanced head and neck cancer: results from TROG 02.02. J Clin Oncol. 2010;28:2996–3001.
33. Foppiano F, Fiorino C, Frezza G, Greco C, Valdagni R. The impact of contouring uncertainty on rectal 3D dose-volume data: results of a dummy run in a multicenter trial (AIROPROS01-02). Int J Radiat Oncol Biol Phys. 2003;57:573–9.
34. Davis JB, Reiner B, Dusserre A, Giraud JY, Bolla M. Quality assurance of the EORTC trial 22911. A phase III study of post-operative external radiotherapy in pathological stage T3N0 prostatic carcinoma: the dummy run. Radiother Oncol. 2002;64:65–73.
35. Liu J, Wang Z, Shao H, Qu D, Liu J, Yao L. Improving CT detection sensitivity for nodal metastases in oesophageal cancer with combination of smaller size and lymph node axial ratio. Eur Radiol. 2018;28:188–95.
36. Choi J, Kim SG, Kim JS, Jung HC, Song IS. Comparison of endoscopic ultrasonography (EUS), positron emission tomography (PET), and computed tomography (CT) in the preoperative locoregional staging of resectable esophageal cancer. Surg Endosc. 2010;24:1380–6.
37. Tan R, Yao SZ, Huang ZQ, Li J, Li X, Tan HH, et al. Combination of FDG PET/CT and contrast-enhanced MSCT in detecting lymph node metastasis of esophageal cancer. Asian Pac J Cancer Prev. 2014;15:7719–24.
Acknowledgements
The authors thank all participating centers for their effort to send their target volumes for comparison with other centers.
Funding
This work was supported by the Beijing Hope Run Special Fund of Cancer Foundation of China (LC2016L04).
Ethics approval and consent to participate
The study conformed to the Declaration of Helsinki and was approved by the institutional ethics committee (National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital). Informed consent was obtained with the opt-out method.
Consent for publication
Our study did not contain any individual person’s data in any form.
Competing interests
The authors declare no competing interests.
Cite this article
Chang, X., Deng, W., Wang, X. et al. Interobserver variability in target volume delineation in definitive radiotherapy for thoracic esophageal cancer: a multi-center study from China. Radiat Oncol 16, 102 (2021). https://doi.org/10.1186/s13014-020-01691-4