GTV delineation in supraglottic laryngeal carcinoma: interobserver agreement of CT versus CT-MR delineation
© Jager et al.; licensee BioMed Central. 2015
Received: 2 September 2013
Accepted: 23 December 2014
Published: 23 January 2015
GTV delineation is the first crucial step in radiotherapy and requires high accuracy, especially with the growing use of highly conformal and adaptive radiotherapy techniques. When the GTV delineations of different observers agree, they are considered to be of high accuracy.
The aim of the study is to determine the interobserver agreement for GTV delineations of supraglottic laryngeal carcinoma on CT and on CT combined with MR-images and to determine the effect of adding MR images to CT-based delineation on the delineated volume and the interobserver agreement.
Twenty patients with biopsy-proven T1-T4 supraglottic laryngeal cancer, treated with curative intent, were included. For all patients, a contrast-enhanced planning CT and a 1.5-T MRI with gadolinium were acquired in the same head-and-shoulder fixation mask as used during treatment. For MRI, a two-element surface coil was used as the receiver coil. Three dedicated observers independently delineated the GTV on CT. After an interval of 2 weeks, the observers were provided with co-registered CT and MR images and again delineated the GTV on CT. Common volumes (C) and encompassing volumes (E) were calculated, and C/E ratios were determined for each pair of observers. The general conformity index (CIgen) was used to quantify interobserver agreement. Results: In general, a large variation in interobserver agreement was found for CT (range: 0.29-0.77) as well as for CT-MR delineations (range: 0.17-0.80). The mean CIgen for CT (0.61) was larger than for CT-MR (0.57) (p = 0.032). The mean GTV volume delineated on CT-MR (6.6 cm³) was larger than on CT (5.6 cm³) (p = 0.002).
Delineation on CT with co-registered MR images resulted in a larger mean GTV volume and in a decrease in interobserver agreement compared to CT-only delineation for supraglottic laryngeal carcinoma.
Keywords: Interobserver agreement; Supraglottic laryngeal carcinoma; Head-and-neck cancer; GTV; MRI
Radiotherapy for head-and-neck cancer can give rise to severe acute and late side effects [1-4]. To minimize damage to healthy tissues while eradicating the macroscopic tumor, the gross tumor volume (GTV) should be determined as accurately as possible. This is especially important when intensity-modulated radiation therapy (IMRT) and position verification are applied, in order to maximize the benefits of high-precision radiation techniques that use smaller radiation fields.
Various studies [6-10] have been performed, using different imaging modalities, to determine the agreement among observers when delineating the GTV in head-and-neck cancer. Interobserver agreement and interobserver variability (disagreement) are often used in the same context, although they express opposite concepts. Generally, interobserver agreement is used to assess how well an imaging modality visualizes the tumor: the more the observers agree, the more precise, and possibly the more accurate, the modality is assumed to be, although accuracy can only be established by pathology.
For delineating the GTV and treatment planning in head-and-neck cancer, computed tomography (CT) is the imaging modality of first choice in most cases [11,12]. The advantages of CT are that it is widely available, does not cause geometrical distortion and carries intrinsic information on the relative electron density of the various tissues, which is used by dose calculation algorithms. Whereas CT offers excellent bony detail, magnetic resonance imaging (MRI) uses various sequences to visualize soft tissue and bone contrast. In particular, the ability of MRI to visualize soft tissues is an improvement over CT, permitting better definition of disease extent and organs at risk [12-14]. Because MRI does not carry intrinsic information on electron density, it is currently precluded as the sole imaging modality for radiotherapy treatment planning of head-and-neck tumors in clinical practice [11,12]. Various studies have demonstrated superior soft tissue contrast on MRI compared to CT [6,9,15,16]. Although there is agreement on the capacity of MRI to increase the visibility of soft tissue structures in head-and-neck oncology, there is no agreement on the value of MRI for determination of the GTV and its influence on interobserver agreement [7,8,10,11].
The aim of this study is to compare the interobserver agreement between delineations on CT and on CT with co-registered MR-images in supraglottic laryngeal carcinoma and to determine the value of adding MR-images to the “gold standard” CT images.
Twenty patients with biopsy proven T1-T4 supraglottic laryngeal cancer (squamous cell carcinoma, SCC) and treated with high-dose radiotherapy with curative intent at our institution between November 2005 and October 2009 were included in this study.
From a database of 120 patients with laryngeal and hypopharyngeal cancer, 39 patients fulfilled the inclusion criteria: a supraglottic tumor, and the availability of both a contrast-enhanced CT scan and an MRI with gadolinium performed in a radiotherapy mask. Twenty patients were randomly selected from this group. Initial clinical assessment of tumor stage was based on triple endoscopy under anesthesia, a contrast-enhanced CT scan, and indirect laryngoscopy to assess vocal cord mobility. The study group consisted of five female and 15 male patients with a mean age of 64 years (range: 40-80 yr).
Imaging technique and data acquisition
Delineation of GTV
Gross tumor volume (GTV) was defined as the macroscopic (gross) extent of the primary tumor that is demonstrable on the imaging modality, e.g. MRI or CT. The following guidelines for delineation of the GTV were agreed upon by the three observers at a consensus meeting in advance of the delineation sessions. In accordance with radiotherapy practice, areas of doubt had to be included in the GTV. Edema around the tumor had to be included, and evident stasis of saliva had to be excluded. Criteria for soft tissue infiltration were left-right asymmetry, contrast enhancement and fatty space infiltration. For cartilage invasion on CT, the following signs were used: osteolysis of densely mineralized areas (if in contact with the primary tumor), cortical erosions, abnormally increased asymmetrical density, and presence of tumor on both sides of bony/cartilaginous structures. Sclerotic cartilage with an intact cortex was not to be included in the GTV. Guidelines for interpretation of neoplastic invasion of laryngeal cartilages, as defined by M. Becker et al., were used during delineation on MRI.
Three dedicated, MRI-trained head-and-neck specialists (two radiation oncologists and one radiologist), referred to as observers a, b and c, independently delineated the GTV. They started with the CT images at a fixed window/level of 350/50, with minor adjustments of 10-20 HU based on individual preference.
After an interval of more than two weeks, to avoid possible bias due to recall of the previous delineation, the same CT (without previous contours) was delineated with the co-registered MR-images (T1w, T1w + Gd, T2w) simultaneously visible. Typical examples of delineations on CT and CT-MR can be found in the Additional files 1, 2, 3 and 4.
The observers received an anonymised triple-endoscopy report and were instructed to record the delineation time, the window/level, and which anatomical parts of the larynx were involved by tumor, during delineation on CT and on CT-MR. Observers were also asked to subjectively rate image quality (good, moderate, poor or not assessable) and tumor detectability. The latter was scored as follows: 0, tumor boundaries not visible; 1, tumor visible but boundaries not; 2, boundaries visible but not clear; 3, tumor as well as boundaries clearly visible.
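The detectability scale is used later to count patients for whom at least one observer scored the tumor borders as "visible but not clear" or worse (grade 2, 1 or 0). A minimal sketch of that counting rule follows; the function names and the per-patient score lists are hypothetical and only illustrate the rule, not the study's actual data handling.

```python
# Hypothetical encoding of the 0-3 tumor detectability scale described above.
DETECTABILITY = {
    0: "tumor boundaries not visible",
    1: "tumor visible, boundaries not visible",
    2: "boundaries visible but not clear",
    3: "tumor and boundaries clearly visible",
}


def borders_unclear(scores):
    """True if at least one observer scored 'visible but not clear' or worse (grade <= 2)."""
    for s in scores:
        if s not in DETECTABILITY:
            raise ValueError(f"invalid detectability score: {s}")
    return any(s <= 2 for s in scores)


def count_unclear_patients(scores_per_patient):
    """Count patients for whom at least one observer recorded grade <= 2."""
    return sum(borders_unclear(scores) for scores in scores_per_patient.values())


# Illustrative (made-up) scores from observers a, b and c for three patients.
example = {"p1": [3, 3, 3], "p2": [2, 3, 3], "p3": [0, 1, 3]}
print(count_unclear_patients(example))
```

A single low score is enough to flag a patient, which matches the "at least one observer" wording used when these scores are reported.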
Volumetric analysis and interobserver agreement
All GTVs were delineated in Volumetool, a software application capable of simultaneous visualization of multiple three-dimensional datasets. The volume of a GTV was determined by multiplying the number of voxels contained within the contour by the voxel volume. The voxel size depends on the in-plane resolution of the image reconstruction and on the slice thickness. A voxel is regarded as part of the volume if its center lies within the contour boundary.
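The volume rule above (count the voxels whose centers fall inside the contour, then multiply by the voxel volume) can be sketched as follows. This is not Volumetool's implementation: it assumes each slice contour is a closed polygon in millimetres, that the voxel grid starts at the origin, and it uses a standard even-odd ray-casting test for "center inside contour".

```python
# Sketch of a voxel-count volume estimate using the "voxel centre inside contour" rule.
# Assumptions (not from the study): each slice contour is a closed polygon in mm,
# the voxel grid starts at (0, 0) and voxel (i, j) has its centre at
# ((i + 0.5) * dx, (j + 0.5) * dy).


def point_in_polygon(x, y, polygon):
    """Even-odd ray-casting test: is the point (x, y) inside the closed polygon?"""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Count crossings of a horizontal ray running from (x, y) towards +x.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def contour_volume_cm3(slices, nx, ny, dx, dy, dz):
    """Volume of a delineation given one polygon per slice (None = no contour).

    nx, ny: in-plane grid size; dx, dy: pixel spacing (mm); dz: slice thickness (mm).
    """
    n_voxels = 0
    for polygon in slices:
        if polygon is None:
            continue
        for j in range(ny):
            for i in range(nx):
                if point_in_polygon((i + 0.5) * dx, (j + 0.5) * dy, polygon):
                    n_voxels += 1
    # Number of voxels times voxel volume, converted from mm^3 to cm^3.
    return n_voxels * dx * dy * dz / 1000.0
```

For example, a 10 mm x 10 mm square contour on three 2 mm slices of a 1 mm pixel grid contains 100 voxel centers per slice, giving 300 voxels x 2 mm³ = 0.6 cm³. Because the rule is all-or-nothing per voxel, a coarser grid or thicker slices change the result for the same drawn contour, which is why voxel size matters.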
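The CIgen, as proposed by Kouwenhoven et al. [22], is the sum of the common volumes of all pairs of observers divided by the sum of their encompassing volumes. A CIgen of 1.00 indicates perfect overlap (identical delineations), whereas a CIgen of 0.00 indicates no overlap at all.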
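A minimal sketch of the pairwise C/E ratio and the CIgen, assuming each delineation is represented as a set of voxel indices on a common grid (so voxel counts stand in for volumes). The set representation and the toy delineations are assumptions for illustration only.

```python
from itertools import combinations


def ce_ratio(a, b):
    """Common volume / encompassing volume for one pair of observers.

    a, b: sets of voxel indices (equal voxel sizes assumed, so counts stand in for volumes).
    """
    encompassing = a | b
    if not encompassing:
        raise ValueError("both delineations are empty")
    return len(a & b) / len(encompassing)


def ci_gen(delineations):
    """General conformity index: summed pairwise common volumes / summed pairwise encompassing volumes."""
    pairs = list(combinations(delineations, 2))
    if not pairs:
        raise ValueError("at least two observers are required")
    common = sum(len(a & b) for a, b in pairs)
    encompassing = sum(len(a | b) for a, b in pairs)
    if encompassing == 0:
        raise ValueError("all delineations are empty")
    return common / encompassing


# Hypothetical toy delineations for observers a, b and c (voxel indices along one axis).
obs_a = set(range(0, 10))
obs_b = set(range(5, 15))
obs_c = set(range(0, 10))
print(ce_ratio(obs_a, obs_b))         # 5 / 15
print(ci_gen([obs_a, obs_b, obs_c]))  # (5 + 10 + 5) / (15 + 10 + 15) = 0.5
```

Note that the CIgen is not the plain average of the pairwise C/E ratios: in the toy example the three ratios are 1/3, 1 and 1/3 (average about 0.56), whereas the CIgen is 0.5, because each pair is effectively weighted by its encompassing volume.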
Since the GTV is extended with margins to account for factors such as microscopic disease, movement and setup inaccuracies, the planning target volume (PTV) is considerably larger than the GTV.
For each patient, the GTV delineations were extended with a margin to create a PTV. Following the work of Vugts et al., two scenarios were investigated: one with conventional margins (PTVclinical) and one with tight margins (PTVtight). A margin of 15 mm was used for PTVclinical and 8 mm for PTVtight. As a “worst case scenario”, the largest GTV was assumed to be the correct GTV, and we determined in how many cases this GTV was not covered by the PTVs.
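The worst-case coverage check can be sketched on a voxel grid as below. This is an illustration under stated assumptions, not the study's planning-system procedure: the margin is modeled as an isotropic expansion (every voxel whose center lies within the margin of some GTV voxel center), delineations are sets of voxel indices, and the helper names are hypothetical.

```python
import math
from itertools import product


def expand(gtv, margin_mm, spacing):
    """Isotropic margin: every voxel whose centre lies within margin_mm of some GTV voxel centre.

    gtv: set of (i, j, k) voxel indices; spacing: (dx, dy, dz) voxel size in mm.
    """
    dx, dy, dz = spacing
    ri, rj, rk = (math.floor(margin_mm / d) for d in spacing)
    offsets = [
        (a, b, c)
        for a, b, c in product(range(-ri, ri + 1), range(-rj, rj + 1), range(-rk, rk + 1))
        if (a * dx) ** 2 + (b * dy) ** 2 + (c * dz) ** 2 <= margin_mm ** 2
    ]
    return {(i + a, j + b, k + c) for (i, j, k) in gtv for (a, b, c) in offsets}


def uncovered_voxels(gtvs, margin_mm, spacing):
    """Worst case: treat the largest GTV as correct and count, per observer,
    how many of its voxels fall outside that observer's PTV."""
    largest = max(gtvs, key=len)
    return [len(largest - expand(g, margin_mm, spacing)) for g in gtvs]


def covered_by_all_ptvs(gtvs, margin_mm, spacing):
    """True if the largest GTV lies entirely inside every observer's PTV."""
    return all(n == 0 for n in uncovered_voxels(gtvs, margin_mm, spacing))
```

With 2 mm voxels, a GTV voxel 10 mm away from another observer's contour is missed by that observer's 8 mm PTV but captured by a 15 mm PTV, which is the mechanism by which larger margins partly absorb interobserver variation.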
Because the samples were not normally distributed according to the Shapiro-Wilk test, the Wilcoxon signed-rank test was applied to compare the delineated GTV volumes on CT and CT-MR.
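The analyses in the study were run in SPSS; a library route in Python would be `scipy.stats.wilcoxon`. To show what the test does, the sketch below implements the two-sided Wilcoxon signed-rank test with the normal approximation, using only the standard library. It discards zero differences and applies no tie or continuity correction, so its p-values can differ slightly from SPSS or SciPy output.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values replaced by their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation, no tie correction).

    x, y: paired samples, e.g. per-patient GTV volumes on CT-MR and on CT.
    Returns (W_plus, p). Zero differences are discarded.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, p
```

The test works on ranks of the paired differences rather than on the volumes themselves, which is why it is suitable when the volume distributions are skewed.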
The coefficient of variation (COV), defined as COV = standard deviation (SD)/mean volume, was determined from the delineated GTVs of each patient for each imaging modality. For each modality, the correlation between mean GTV volume and COV was assessed with the Spearman rank correlation test.
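A minimal sketch of the per-patient COV and of Spearman's rho, computed as the Pearson correlation of tie-averaged ranks. The study does not state whether the sample or population SD was used; the sketch assumes the sample SD (`statistics.stdev`). Only the correlation coefficient is computed here; a p-value would come from a statistics package such as `scipy.stats.spearmanr`.

```python
import math
from statistics import mean, stdev


def cov(volumes):
    """Coefficient of variation for one patient: SD / mean of the observers' GTV volumes.

    Uses the sample SD (n - 1 denominator); this choice is an assumption.
    """
    return stdev(volumes) / mean(volumes)


def average_ranks(values):
    """1-based ranks, with tied values replaced by their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with at least two values")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den
```

Here `cov` would be applied to the three observers' volumes of each patient, and `spearman_rho` to the resulting list of COVs against the list of per-patient mean volumes. Because rho depends only on ranks, it detects any monotonic relation, not only a linear one.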
A paired Student t-test was used to compare the CIgen on CT with the CIgen on CT-MR; both samples were normally distributed according to the Shapiro-Wilk test. For each modality, the correlation between CIgen and GTV volume was assessed with the Spearman rank correlation test. Statistical analyses were performed with SPSS 16.0, using a significance level (alpha) of 0.05.
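For normally distributed paired data such as the per-patient CIgen values on CT and CT-MR, the paired t-test reduces to a one-sample test on the differences. The sketch below computes the t statistic and degrees of freedom with the standard library; the two-sided p-value would then follow from a t distribution with n - 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, which performs the whole test).

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for matched samples x and y.

    t = mean(d) / (sd(d) / sqrt(n)), with d = x - y and df = n - 1.
    """
    if len(x) != len(y):
        raise ValueError("samples must be paired")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(d)
    if sd == 0:
        raise ValueError("differences have zero variance")
    return mean(d) / (sd / math.sqrt(n)), n - 1
```

Pairing removes between-patient variation in delineation difficulty, so only the within-patient change from CT to CT-MR enters the test.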
Image quality was considered “good” for the majority of both the CT and the MR images. For some patients, CT image quality was degraded by insufficient contrast enhancement or by swallowing. Movement due to swallowing also adversely affected MR image quality. However, image quality was never considered “not assessable”, nor did the observers unanimously rate it as “poor”.
For the CT of 16 patients, at least one observer recorded that the tumor borders were “visible but not clear” or worse (grade 2, 1 or 0); for MRI, this was the case for 11 patients. In eight patients, at least one of the observers recorded explicit difficulties in the cranial and/or caudal direction on CT, and in four patients on MRI. These recorded difficulties were confirmed by larger discrepancies between the delineations (smaller common volumes) in the cranial and caudal areas and in the region of the epiglottis, for CT as well as for MRI, in nearly all patients.
Table 1. Gross tumor volumes delineated by three observers on CT and CT-MR in supraglottic laryngeal carcinoma (mean GTV in cm³)
The median GTV volume of the three observers on CT-MR (median: 6.6 cm³, interquartile range: 2.9-9.2 cm³, 95% confidence interval: 4.8-11.8 cm³) was significantly larger (p = 0.002) than on CT (median: 5.6 cm³, interquartile range: 2.5-7.9 cm³, 95% confidence interval: 4.2-9.8 cm³) (Table 1).
No relation between COV and mean GTV volume was observed for either imaging modality (CT: rho = -0.28, p = 0.23; CT-MR: rho = 0.05, p = 0.84).
The mean CIgen for CT (0.61, SD 0.12, range 0.29-0.77, 95% confidence interval: 0.56-0.67) was significantly larger than for CT-MR (0.57, SD 0.15, range 0.17-0.80, 95% confidence interval: 0.50-0.64) (p = 0.032).
Although the smallest CIgens were observed for the smallest tumors (Table 1), no relation between CIgen and mean GTV volume was observed for either CT or CT-MR delineations (CT: rho = 0.28, p = 0.24; CT-MR: rho = 0.31, p = 0.18).
When tight margins were applied, the largest GTV contour was not covered by all PTVs in 12 of the 20 patients. With clinical margins, this number decreased to two of the 20 patients. The regions where the GTV contour was not encompassed by the PTV contours lay mostly in the cranial and caudal directions.
The present study on supraglottic laryngeal carcinoma demonstrates that adding MR images to CT decreased interobserver agreement compared with the CT-only delineation session. Furthermore, the median GTV volume was larger on CT-MR than on CT, although no relation was found between GTV volume and CIgen. Subjectively, the observers reported increased visibility of anatomical details on MRI.
Among other studies comparing MRI with CT in head-and-neck cancer, Ahmed et al. demonstrated that the GTV volume delineated on MRI for base of tongue tumors was almost two times larger than on CT. They also reported superior subjective visualization and delineation of base of tongue tumors on MRI relative to CT. Several other studies reached the same conclusion for tongue and floor of the mouth cancer [15,16] and for nasopharyngeal carcinoma.
Although there is agreement on the capacity of MRI to increase visibility in head-and-neck oncology, there is no agreement on the value of MRI for determination of the GTV. Rasch et al. showed better interobserver agreement for target delineation in nasopharynx cancer with matched CT-MRI than with CT alone. A major factor improving interobserver agreement was the decision to include entire anatomical structures invaded by tumor. An earlier study by Rasch et al. reported that, for six patients with advanced head-and-neck carcinoma, the delineated GTVs and the interobserver agreement were better for delineations on MRI (with CT images available) than on CT (with MR images available). However, Daisne et al. found no difference between a single observer's mean GTV volumes delineated on CT and on MRI for oropharyngeal, laryngeal and hypopharyngeal tumors. Additionally, a study by Geets et al. showed no clinical advantage of MRI over CT in terms of volume determination and interobserver agreement for pharyngo-laryngeal tumors. In its design, the latter study was the only one that, to some extent, resembled ours. However, we could not adequately compare our findings with its results, because MRI was used without CT for delineation. Furthermore, the use of a different metric to quantify interobserver agreement, based on the area of overlap between contours, hampers a detailed comparison. In general, a wide variety of metrics is used to quantify interobserver agreement in delineation studies, for example the Dice similarity coefficient, the common-to-encompassing volume ratio and the Jaccard index [22,24,25].
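Part of why results from different overlap metrics are hard to compare is that they place the same overlap on different scales. For two observers, the C/E ratio is identical to the Jaccard index, and the Dice similarity coefficient is a monotonic function of it: since |A| + |B| = |A ∪ B| + |A ∩ B|, DSC = 2J / (1 + J). The sketch below, with delineations assumed to be sets of voxel indices, illustrates this relation.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; for two observers this equals the C/E ratio."""
    return len(a & b) / len(a | b)


def dice(a, b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))


def dice_from_jaccard(j):
    # Because |A| + |B| = |A ∪ B| + |A ∩ B|, DSC = 2I / (U + I) = 2J / (1 + J).
    return 2 * j / (1 + j)


# Hypothetical toy delineations (voxel indices along one axis).
a = set(range(0, 10))
b = set(range(5, 15))
print(jaccard(a, b), dice(a, b), dice_from_jaccard(jaccard(a, b)))
```

For the same pair of contours, the Dice coefficient is never smaller than the Jaccard index (here 0.5 versus about 0.33), so an agreement value is only meaningful together with the metric used to compute it.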
Since the GTV is extended by margins to account for factors such as microscopic disease, movement and setup inaccuracies, the PTV is larger than the GTV. Our analysis indicates that large conventional margins partly compensate for interobserver variation. However, when evidence-based tight margins are applied, interobserver variation in GTV delineation might result in inadequate dose coverage of the GTV.
Tumor recurrence was diagnosed in two patients in this study. Because radiotherapy treatment schedules and treatment planning evolved between 2005 and 2009, we cannot draw conclusions about treatment outcomes from this finding. Furthermore, the treatment plan was based on the delineation of the treating radiation oncologist, whereas the delineations in this study were made for research purposes.
In our study, a dedicated MR protocol for radiotherapy GTV delineation was applied; this protocol has been used at our department since 2005. Care was taken to optimize MR image quality for radiotherapy purposes. For the majority of patients, MR image quality was considered “good”. However, the introduction of 3.0 T and, more recently, 7.0 T MRI scanners and the development of new fast scan protocols might further improve MR image quality.
A shortcoming of the studies of Daisne et al. and Geets et al. was the use of a multipurpose body coil as the receiver coil for MR imaging. Rasch et al. used a head coil for MR imaging only, without a mask or external markers, which decreased image quality.
Although the observers in our study reported subjectively increased visibility of anatomical details on MRI compared to CT, this did not improve agreement between observers; on the contrary, interobserver agreement decreased. Apparently, the additional MRI information offered more options for interpreting the imaging data, resulting in greater variation among delineations and an increase in delineated volumes. The inclusion of areas of doubt in the GTV, as prescribed by our delineation guidelines, further increased these variations and volumes. In our opinion, the increased visibility of anatomical details on MRI might be of value in radiotherapy practice once it is clear how to combine the information of the different MR sequences when delineating the GTV. To maximize the benefits of high-precision radiation techniques, the GTV should be determined as accurately as possible, so clear guidelines for image interpretation and GTV delineation of laryngeal carcinoma could be very useful. To develop these guidelines, a validation study with total laryngectomy specimens is currently being performed at our institution, in which tumor tissue is identified based on pathological findings and compared with GTV delineations on different imaging modalities.
The large variation in interobserver agreement for the GTVs delineated on CT as well as on CT-MR (Table 1) suggests that some tumors were more difficult to delineate than others. In some cases, this might have been influenced by moderately decreased image quality. In our opinion, this variation was mostly caused by differences in tumor location and characteristics, and by difficulties in distinguishing tumor borders.
For the two T4 tumors, the differences between the volumes delineated on CT and CT-MR were the largest. This might be explained by edema, which is more extensive in larger tumors and could increase the delineated volume on CT-MR, since MR is superior in visualizing soft tissues (e.g. edema) [12-14]. The observers also included more cartilage in their GTV on CT-MR than on CT only. Besides increasing the visibility of soft tissue, MRI might also show cartilage invasion better than CT; research by Becker et al. supports this presumption [14,19]. Since no histopathological data were available for this study, we could not investigate this finding further.
The CT images used in this study were obtained on two different CT scanners, and slice thickness varied between 2 and 3 mm. This did not appear to influence the results, since there were no remarkable differences in CIgen between the two scanners. Moreover, no differences in image quality and no specific matching-related problems were reported.
Interobserver agreement decreased in the CT-MR session compared to the CT-only delineations, and the mean delineated volume on CT-MR was larger than on CT. At this point, MR has no objective added value in terms of CIgen. The increased visualization of anatomical details on MRI might lead to increased interobserver agreement and more accurate GTV estimation, but only once clear guidelines for the interpretation and delineation of MR images of laryngeal tumors are available.
- Meyer F, Fortin A, Wang CS, Lui G, Bairati I. Predictors of severe acute and late toxicities in patients with localized head-and-neck cancer treated with radiation therapy. Int J Radiat Oncol Biol Phys. 2012;82(4):1454–62.
- Zackrisson B, Mercke C, Strander H, Wennerberg J, Cavallin-Ståhl E. A systematic overview of radiation therapy effects in head and neck cancer. Acta Oncol. 2003;42(5–6):443–61.
- Caglar HB, Tishler RB, Othus M, Burke E, Li Y, Goguen L, et al. Dose to larynx predicts for swallowing complications after intensity-modulated radiotherapy. Int J Radiat Oncol Biol Phys. 2008;72(4):1110–8.
- Dijkema T, Raaijmakers CPJ, Braam PM, Roesink JM, Monninkhof EM, Terhaard CH. Xerostomia: a day and night difference. Radiother Oncol. 2012;104(2):219–23.
- Lambrecht M, Nevens D, Nuyts S. Intensity-modulated radiotherapy vs. parotid-sparing 3D conformal radiotherapy. Strahlenther Onkol. 2013;189(3):223–9.
- Ahmed M, Schmidt M, Sohaib A, Kong C, Burke K, Richardson C, et al. The value of magnetic resonance imaging in target volume delineation of base of tongue tumors – A study using flexible surface coils. Radiother Oncol. 2010;94(2):161–7.
- Rasch C, Keus R, Pameijer FA, Koops W, de Ru V, Muller S, et al. The potential impact of CT-MRI matching on tumor volume delineation in advanced head and neck cancer. Int J Radiat Oncol Biol Phys. 1997;39(4):841–8.
- Geets X, Daisne J-F, Arcangeli S, Coche E, De Poel M, Duprez T, et al. Inter-observer variability in the delineation of pharyngo-laryngeal tumor, parotid glands and cervical spinal cord: Comparison between CT-scan and MRI. Radiother Oncol. 2005;77(1):25–31.
- Chung NN, Ting LL, Hsu WC, Lui LT, Wang PM. Impact of magnetic resonance imaging versus CT on nasopharyngeal carcinoma: primary tumor target delineation for radiotherapy. Head Neck. 2004;26(3):241–6.
- Rasch CR, Steenbakkers RJ, Fitton I, Duppen JC, Nowak PJ, Pameijer FA, et al. Decreased 3D observer variation with matched CT-MRI, for target delineation in Nasopharynx cancer. Radiat Oncol. 2010;5:1–8.
- Daisne J-F, Duprez T, Weynand B, Lonneux M, Hamoir M, Reychler H, et al. Tumor volume in pharyngolaryngeal squamous cell carcinoma: comparison at CT, MR imaging, and FDG PET and validation with surgical specimen. Radiology. 2004;233(1):93–100.
- Khoo VS, Dearnaley DP, Finnigan DJ, Padhani A, Tanner SF, Leach MO. Magnetic resonance imaging (MRI): considerations and applications in radiotherapy treatment planning. Radiother Oncol. 1997;42(1):1–15.
- Castelijns JA, Hermans R, van den Brekel MWM, Mukherji SK. Imaging of laryngeal cancer. Semin Ultrasound CT MR. 1998;19(6):492–504.
- Becker M, Zbären P, Laeng H, Stoupis C, Porcellini B, Vock P. Neoplastic invasion of the laryngeal cartilage: comparison of MR imaging and CT with histopathologic correlation. Radiology. 1995;194(3):661–9.
- Sigal R, Zagdanski AM, Schwaab G, Bosq J, Auperin A, Laplanche A, et al. CT and MR imaging of squamous cell carcinoma of the tongue and floor of the mouth. Radiographics. 1996;16(4):787–810.
- Lam P, Au-Yeung KM, Cheng PW, Wei WI, Yuen AP, Trendell-Smith N, et al. Correlating MRI and histologic tumor thickness in the assessment of oral tongue cancer. AJR Am J Roentgenol. 2004;182(3):803–8.
- Verduijn GM, Bartels LW, Raaijmakers CP, Terhaard CH, Pameijer FA, van den Berg CA. Magnetic resonance imaging protocol optimization for delineation of gross tumor volume in hypopharyngeal and laryngeal tumors. Int J Radiat Oncol Biol Phys. 2009;74(2):630–6.
- Webster GJ, Kilgallon JE, Ho KF, Rowbottom CG, Slevin NJ, Mackay RI. A novel imaging technique for fusion of high-quality immobilised MR images of the head and neck with CT scans for radiotherapy target delineation. Br J Radiol. 2009;82(978):497–503.
- Becker M, Zbären P, Casselman JW, Kohler R, Dulguerov P, Becker CD. Neoplastic invasion of laryngeal cartilage: reassessment of criteria for diagnosis at MR imaging. Radiology. 2008;249(2):551–9.
- Murakami R, Baba Y, Furusawa M, Nishimura R, Nakaura T, Baba T, et al. Early glottic squamous cell carcinoma. Predictive value of MR imaging for the rate of 5-year control with radiation therapy. Acta Radiol. 2000;41(1):38–44.
- Bol GH, Kotte ANTJ, Van der Heide UA, Lagendijk JJ. Simultaneous multi-modality ROI delineation in clinical practice. Comput Methods Programs Biomed. 2009;96(2):133–40.
- Kouwenhoven E, Giezen M, Struikmans H. Measuring the similarity of target volume delineations independent of the number of observers. Phys Med Biol. 2009;54(9):2863–73.
- Vugts CA, Terhaard CH, Philippens ME, Pameijer FA, Kasperts N, Raaijmakers CP. Consequences of tumor planning target volume reduction in treatment of T2-T4 laryngeal cancer. Radiat Oncol. 2014;9:195.
- Fotina I, Lütgendorf-Caucig C, Stock M, Pötter R, Georg D. Critical discussion of evaluation parameters for inter-observer variability in target definition for radiation therapy. Strahlenther Onkol. 2012;188(2):160–7.
- Hanna GG, Hounsell AR, O’Sullivan JM. Geometrical analysis of radiotherapy target volume delineation: a systematic review of reported comparison methods. Clin Oncol. 2010;22(7):515–25.
- Caldas-Magalhaes J, Kasperts N, van den Kooij N, Berg CA, Terhaard CH, Raaijmakers CP, et al. Validation of imaging with pathology in laryngeal cancer: accuracy of the registration methodology. Int J Radiat Oncol Biol Phys. 2012;82(2):289–98.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.