Interobserver variability in the delineation of the tumour bed using seroma and surgical clips based on 4DCT scan for external-beam partial breast irradiation

To explore the interobserver variability in the delineation of the tumour bed using seroma and surgical clips on four-dimensional computed tomography (4DCT) scans for external-beam partial breast irradiation (EB-PBI) during free breathing. Patients recruited for EB-PBI after breast-conserving surgery who had a seroma clarity score (SCS) of 3-5 and ≥5 surgical clips in the lumpectomy cavity underwent 4DCT simulation. On the ten sets of 4DCT images acquired, five radiation oncologists delineated, according to specific guidelines, the tumour bed defined by the clips, by the seroma, and by both the clips and seroma (TB_C, TB_S and TB_C+S, respectively). The following parameters were calculated to analyse interobserver variability: tumour bed volume (TB_C, TB_S, TB_C+S), coefficient of variation (COV_C, COV_S, COV_C+S), and matching degree (MD_C, MD_S, MD_C+S). The interobserver variability was statistically significant for TB_C and TB_C+S (p = 0.021 and 0.008) and for COV_C and COV_C+S (p = 0.002 and 0.015). No significant difference was observed for TB_S or COV_S (p = 0.867 and 0.061). Significant differences were observed for MD_C vs MD_S, MD_C vs MD_C+S, and MD_S vs MD_C+S (p < 0.001, p = 0.032 and p = 0.008); because a larger MD indicates closer agreement, the interobserver variability for MD_S was smaller than that for MD_C and MD_C+S (MD_S > MD_C+S > MD_C). When the SCS was 3-5 points and the number of surgical clips was ≥5, interobserver variability in tumour bed delineation was smallest when the delineation was based on seroma.


Background
Breast-conserving therapy (BCT), which involves a wide local excision followed by radiotherapy to the whole breast, is the standard treatment for early-stage breast cancer [1]. Its efficacy for early-stage breast carcinoma has been established in multiple randomised trials [2,3]. External-beam partial breast irradiation (EB-PBI) has recently garnered increasing interest [4,5]. Several studies have reported that EB-PBI, which delivers radiotherapy to the postoperative tumour bed (TB) with a margin of adjacent breast tissue, can achieve excellent results in selected patients. However, defining accurate target volumes for partial breast irradiation (PBI) is inherently challenging. Studies have demonstrated significant interobserver variability among radiation oncologists in defining the lumpectomy cavity, indicating the need to improve the accuracy and consistency of TB delineation [6,7].
Surgical clips and seroma are important markers for delineating the TB for EB-PBI [7,8]. Kirby et al. [7] reported that the number of implanted markers influences the accuracy of target delineation and that five to six surgical clips are preferable for TB delineation for PBI or breast boost radiotherapy. Landis et al. [8] found that as the seroma clarity score (SCS) increased, the shift of the centre of mass (COM) decreased and the percent volume overlap (PVO) increased significantly. The influence of the number of metal clips, SCS, delineation experience and contouring guidelines on TB delineation in CT images has been investigated [6,7,9-11]. However, the interobserver variability in delineating the TB using seroma and surgical clips on 4DCT scans has not been clearly established. To investigate the impact of different markers on interobserver variability in TB delineation on 4DCT scans for EB-PBI, in this study we analysed the TBs delineated by five observers based on clips, seroma, and both clips and seroma.

Patients
Twenty patients (10 with left-sided and 10 with right-sided lesions) who underwent wide local excision of breast cancer without full-thickness closure of the excision cavity, followed by EB-PBI between June 2009 and November 2013, were included in this study. To improve delineation accuracy and consistency, only patients with an SCS of 3-5 and ≥5 surgical clips marking the boundaries of the lumpectomy cavity were enrolled. In every patient, five or more round surgical clips, 2 mm in diameter, were implanted (median number: 6), with clips fixed to the superior, inferior, medial, lateral and posterior walls of the surgical cavity [12]. The average interval from lumpectomy to 4DCT scan was 10 weeks (range, 3-16 weeks). All patients were free of chronic lung disease and had normal ventilatory function. Written informed consent was obtained from all patients, and the study was approved by the Institutional Review Board (Shandong Tumour Hospital Ethics Committee).

Four-dimensional CT image acquisition
All twenty patients were immobilised in the supine position on a breast board with an arm support (both arms above the head to expose the breast adequately). At the conclusion of the standard CT simulation, 4DCT images with a slice thickness of 3 mm and the respiratory signal were acquired using a 16-slice Brilliance Big Bore CT scanner (Philips Medical Systems, Inc., Cleveland, OH, USA). The respiratory signal was sent to the scanner, which attached a time tag to each CT image. Based on these tags, GE Advantage 4D software (General Electric Healthcare, Waukesha, WI, USA) sorted the reconstructed 4DCT images into 10 respiratory phases, with 0% corresponding to end inhalation (EI) and 50% to end exhalation (EE). The reconstructed 4DCT image sets were then transferred to the Eclipse treatment planning system (Eclipse™ 8.6; Varian Medical Systems, Palo Alto, CA, USA) for structure delineation.
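Phase-based sorting of this kind can be illustrated with a minimal sketch. It is not the GE Advantage 4D algorithm, whose internals are not described here. It assumes, hypothetically, that each reconstructed image carries a time stamp and that the times of the end-inhalation peaks in the respiratory trace are already known. Each time stamp is expressed as a percentage of the breathing cycle in which it falls and assigned to the nearest of ten 10% phase bins, so that 0% marks end inhalation and 50% is taken as end exhalation, following the convention above. The function names `phase_percent` and `phase_bin` are illustrative only.

```python
import math
from bisect import bisect_right


def phase_percent(t, peak_times):
    """Hypothetical helper: phase (0-100%) of time stamp t, measured from the
    preceding end-inhalation peak to the next one."""
    i = bisect_right(peak_times, t) - 1
    if i < 0 or i + 1 >= len(peak_times):
        raise ValueError("time stamp lies outside a complete breathing cycle")
    start, end = peak_times[i], peak_times[i + 1]
    return 100.0 * (t - start) / (end - start)


def phase_bin(t, peak_times, n_bins=10):
    """Hypothetical helper: label (0, 10, ..., 90 %) of the nearest phase bin.

    Phases in the last half-bin of a cycle wrap round to 0% (next inhalation).
    """
    width = 100.0 / n_bins
    k = math.floor(phase_percent(t, peak_times) / width + 0.5) % n_bins
    return round(k * width)


# Example: end-inhalation peaks every 4 s (a hypothetical 15 breaths/min).
peaks = [0.0, 4.0, 8.0]
labels = [phase_bin(t, peaks) for t in (0.0, 1.2, 2.0, 3.9, 5.2)]
print(labels)  # [0, 30, 50, 0, 30]
```

Rounding to the nearest bin centre (rather than truncating) keeps 0% and 50% at the centres of their bins; real systems may instead use amplitude-based sorting or different bin edges.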

Observers
Five observers, each specialising in the radiation treatment of breast carcinoma and with more than five years of radiotherapy experience, performed the delineations.

Tumour bed delineation
The 10%-90% phase images were registered to the 0% phase images, which served as the reference phase. The tumour beds were delineated on the ten sets of 4DCT images based on the clips, the seroma, and both the clips and seroma (termed TB_C, TB_S and TB_C+S, respectively) according to a set of guidelines [13] (Figure 1). When the TB was delineated based on clips, the window level and width were adjusted to minimise the influence of the seroma on contouring. Before the study began, all observers outlined a single test case, which was reviewed to ensure that the guidelines were being followed.

Observation parameters
To quantify interobserver variability, the following parameters were calculated for each patient: the TB volume (TB_C, TB_S, TB_C+S), averaged over the ten contours on the registered phase images, and the interobserver coefficient of variation (COV_C, COV_S, COV_C+S), defined as the ratio of the standard deviation to the mean of the TB volumes delineated by the observers. For TB_C, TB_S and TB_C+S, the matching degree among observers was also evaluated in the EE phase: for each patient, the ratio of the intersection volume (the intersection of the volumes delineated by the five observers) to the union volume (the union of the same five volumes) was calculated (termed MD_C, MD_S and MD_C+S, respectively) [14] (Figure 2). An MD of 1 indicates perfect spatial agreement, and smaller values indicate greater mismatch. The volume differences among TB_C, TB_S and TB_C+S delineated by the same observer were also calculated.
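The three parameters can be made concrete with a short sketch. It assumes, hypothetically, that each observer's contour is available as a set of voxel indices on the EE-phase image (so voxel counts stand in for volumes, and the common voxel size cancels in the MD ratio) and that the COV uses the sample standard deviation; the text does not state which standard deviation was used. The helper names are illustrative, not part of any planning system.

```python
from statistics import mean, stdev


def phase_averaged_volume(phase_volumes_cc):
    """Mean TB volume of one observer over the ten registered phase contours."""
    return mean(phase_volumes_cc)


def coefficient_of_variation(observer_volumes_cc):
    """Interobserver COV for one patient: SD / mean of the observers' volumes.

    Assumes the sample SD (n - 1 denominator); the source does not specify.
    """
    return stdev(observer_volumes_cc) / mean(observer_volumes_cc)


def matching_degree(observer_masks):
    """MD for one patient: intersection volume / union volume of all observers'
    contours, each given as a set of voxel indices on the EE-phase image."""
    intersection = set.intersection(*observer_masks)
    union = set.union(*observer_masks)
    return len(intersection) / len(union)


# Hypothetical example with three observers instead of five, for brevity.
print(coefficient_of_variation([8.0, 10.0, 12.0]))  # 0.2
masks = [{1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6}]
print(matching_degree(masks))  # 0.3333333333333333
```

In this reading, each observer's phase-averaged volume would feed into `coefficient_of_variation`, while `matching_degree` uses only the EE-phase contours.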

Statistical analysis
SPSS 19.0 software was used for statistical analyses. Tests for normality and homogeneity of variance were performed. To investigate the interobserver variability in TB delineation based on the different markers on 4DCT images, univariate analysis of variance (ANOVA) was used to compare the TB volumes (TB_C, TB_S, TB_C+S), COVs (COV_C, COV_S, COV_C+S) and MDs (MD_C, MD_S, MD_C+S) between observers. The volume differences among TB_C, TB_S and TB_C+S were also compared using ANOVA. P < 0.05 was considered statistically significant.
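The analyses were run in SPSS, and the exact model is not specified beyond "univariate ANOVA". As a minimal sketch, assuming a one-way layout in which observer is the factor and each observer's per-patient values form one group, the F statistic is the ratio of between-group to within-group mean squares. A repeated-measures model that treats patients as a blocking factor would give different results; this sketch does not attempt that. The p-value would come from the upper tail of the F distribution with the returned degrees of freedom (for example via `scipy.stats.f.sf`), which is left out here to keep the code dependency-free.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` holds one list of values per factor level (here, hypothetically,
    one list of per-patient values per observer).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```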

TB volume
The volumes of TB_C, TB_S and TB_C+S are listed in Table 1. The average volumes of TB_C, TB_S and TB_C+S were 21.98 cc (range 8.70-45.13), 14.36 cc (range 3.14-41.80) and 28.79 cc (range 12.88-55.80), respectively. The volume differences among TB_C, TB_S and TB_C+S were statistically significant (p < 0.05) (Table 2). The average interval from lumpectomy to 4DCT scan was 10 weeks (range, 3-16 weeks). The volume of TB_C was similar to that of TB_S in the six patients scanned during weeks 4-8, TB_C was smaller than TB_S in the two patients scanned during weeks 0-3, and TB_C was larger than TB_S in the twelve patients scanned during weeks 8-16. The interobserver variability for TB_C, TB_S and TB_C+S is also listed in Table 2. It was statistically significant for TB_C and TB_C+S (p = 0.021 and 0.008) but not for TB_S (p = 0.867).

COV
The interobserver variability for COV is listed in Table 3. It was statistically significant for COV_C and COV_C+S (p = 0.002 and 0.015) but not for COV_S (p = 0.061).

MD
Table 4 lists the differences in MD among the volumes delineated based on clips, seroma, and both clips and seroma in the EE phase. The difference among MD_C, MD_S and MD_C+S was statistically significant (F = 16.866, p < 0.001). There were also significant pairwise differences between MD_C and MD_S, MD_C and MD_C+S, and MD_S and MD_C+S (p < 0.001, p = 0.032 and p = 0.008). Because a larger MD indicates closer agreement, the interobserver variability for MD_S was smaller than that for MD_C and MD_C+S (MD_S > MD_C+S > MD_C).

Discussion
The accuracy of target volume delineation is critical for EB-PBI, yet the optimal target volume for EB-PBI remains to be established. Most reports define the tumour excision cavity or postoperative seroma as the target volume in treatment planning [7,8,15]. However, these studies also reported significant interobserver variation in delineating post-lumpectomy cavities. van Mourik et al. [16] investigated breast target volume delineation by thirteen observers in eight patients. They reported that the presence of clips or seroma reduced interobserver variability, but that significant volumetric and spatial interobserver variability in the clinical target volume (CTV) remained even with delineation guidelines. Reducing interobserver variability is therefore a pressing issue. To improve delineation accuracy and consistency, we selected patients with an SCS of 3-5 and ≥5 surgical clips marking the boundaries of the lumpectomy cavity.
Seroma clarity and volume in the lumpectomy cavity decrease with time from surgery to the CT scan. Using the CT-based seroma to guide the EB-PBI target volume is difficult because clearly defined standards are lacking. Kader et al. [6] performed planning CT after breast-conserving surgery in 205 women with early-stage breast cancer and found that the mean seroma volume decreased from 47 cc to 30 cc during postoperative weeks 3-8, stabilised during weeks 9-14 (mean 21 cc) and involuted further beyond 14 weeks (mean 9 cc). In our study, the average interval from lumpectomy to 4DCT scan was 10 weeks (range, 3-16 weeks). The volume of TB_C approached that of TB_S in the six patients scanned during weeks 4-8, and TB_C was smaller than TB_S in the two patients scanned during weeks 0-3. After 8 weeks, however, TB_C was larger than TB_S in 12 patients. Therefore, given the magnitude and time course of the loss of seroma volume and clarity, the optimal time to obtain the planning CT scan for PBI appears to be within 8 weeks after surgery.
Surgical clips do not always coincide with the edge of the seroma or the boundary of the lumpectomy cavity [13,17]. Ding et al. [13] measured the three-dimensional displacements of GTV_C, GTV_S and GTV_C+S and found that in the LR, AP and SI directions the displacements were 0.90 mm, 1.05 mm and 1.20 mm for GTV_C; 0.80 mm, 1.05 mm and 0.80 mm for GTV_S; and 0.90 mm, 1.20 mm and 1.40 mm for GTV_C+S, respectively. In other words, the displacements of GTV_C+S were equal to or greater than those of GTV_C and GTV_S in every direction. Yang et al. [17] also measured the distance between surgical clips and the edge of the seroma in the coronal plane in women who underwent wide local excision of breast cancer and reported that the seroma edge extended beyond the clips by a mean of 0.3-0.5 cm. In the present study, the volume of TB_C+S delineated by the observers was significantly larger than TB_C and TB_S. These results may be due to the decrease in seroma clarity and volume in the lumpectomy cavity between lumpectomy and the 4DCT scan, as well as to differences among TB_C, TB_S and TB_C+S with respect to the treatment margin.
Figure 2 Mismatch between the TBs delineated by five observers. The union volume is defined as the union of the five tumour beds, whereas the intersection volume is the intersection of the five tumour beds.
Interestingly, we observed no significant differences between observers for TB_S (p = 0.867) or COV_S (p = 0.061). This may be explained by the relatively short average interval from lumpectomy to planning CT and by the SCS of 3-5, which improved visualisation of the surgical cavity. Landis et al. [8] reported similar results in patients with SCS 4 and 5: the average COM shift was 3 mm and 2 mm, respectively, and the PVO was 77% and 87%, respectively. Wong et al. [18] also reported that after contouring guidelines were reviewed, the differences in seroma target volume (STV), CTV and planning target volume (PTV) were no longer statistically significant. Although Dzhugashvili et al. [9] reported that the conformity index of TB delineation was significantly improved by placing surgical clips within the lumpectomy cavity, in our study a higher level of interobserver concordance was observed among the five observers when the SCS was 3-5 points. These data suggest that interobserver variability in TB delineation based on seroma is small when the SCS is 3-5 points.
Although interobserver variability in TB delineation based on seroma was minimal here, clinical practice may vary across regions. This was a retrospective study, and all the enrolled patients had undergone wide local excision of breast cancer without full-thickness closure of the excision cavity. Full-thickness closure of the excision cavity and oncoplastic surgical procedures are becoming more widely practised in some regions, and these surgical methods could reduce the incidence of seroma and its reliability as a marker of the TB. Therefore, when only clips are used in clinical practice, five or more surgical clips should be placed to reduce interobserver variability. Moreover, clip placement in the surgical cavity could follow guidelines, and the interval from lumpectomy to 4DCT scan could be shortened.
Abbreviations: TB_C, the tumour bed delineated based on clips; TB_S, the tumour bed delineated based on the seroma; TB_C+S, the tumour bed delineated based on both seroma and clips.
Abbreviations: MD_C, the ratio between the intersection volume and the union volume based on clips; MD_S, the ratio between the intersection volume and the union volume based on seroma; MD_C+S, the ratio between the intersection volume and the union volume based on both clips and seroma.
Hurkmans et al. [19] reported that intra- and, to a greater extent, interobserver variability in the delineation of breast target volumes on CT scans can be large. Dzhugashvili et al. [20] and Yang et al. [21] also reported interobserver variability in TB delineation based on clips, similar to the results presented here. This may be explained by the fact that tissue stranding from the surgical cavity, proximity to muscle, dense breast parenchyma and benign calcifications can be mistaken for surgical clips. Moreover, the limited soft-tissue contrast of CT makes it unreliable for delineating the cavity on slices that contain no surgical clips and for distinguishing surgically induced densities from normal glandular breast tissue. Finally, the experience of the radiation oncologist and subjective judgement of the location of the post-surgical cavity contribute to interobserver variability. Combining several sources of information to define the TB more precisely, such as surgical reports, clinical palpation of the surgical defect and CT-based planning, may decrease interobserver variability. Additionally, clear communication between the surgeon and the radiotherapist, including diagrammatic explanations, is crucial for accurately targeting the TB.
Cover et al. [22] reported that, when the EE phase was reviewed in the sagittal plane, gating could reduce the mean tumour mobility from 6.3 ± 2.0 mm to 1.4 ± 0.5 mm. Moreover, 4DCT simulation can reduce motion artefacts [23,24]. Therefore, to investigate spatial mismatch among observers in TB delineation based on the different markers, MD_C, MD_S and MD_C+S were further analysed and compared in the end-exhalation phase. We found significant differences among MD_C, MD_S and MD_C+S, as well as between MD_C and MD_S, MD_C and MD_C+S, and MD_S and MD_C+S. These results suggest that spatial mismatch existed among observers in TB delineation based on the clips, the seroma, and both the clips and seroma. Additionally, the MD for the seroma was larger than that for the clips and for both the clips and seroma, and the MD for both the clips and seroma was larger than that for the clips alone (Table 4). The average conformity index (CI) reported by van Mourik et al. [16], 0.53, was considerably higher than the corresponding values in the present study; this difference is likely due to differing target volumes and calculation methods. Landis et al. [8] and Li et al. [25] reported even higher CI values of 0.73-0.75; however, these analyses focused on the PTV rather than the TB, and the CI increases with larger volumes.
Image-guided techniques can improve clip and seroma localisation during treatment, potentially allowing a smaller GTV-to-PTV margin. Ultrasound image guidance has also been investigated [26]. However, it is unclear whether such a margin is sufficient to account for interobserver contour variability. Further studies are needed to determine whether contouring variability could result in underdosing of the clips or seroma. Future studies aiming to reduce margins from current treatment practice must therefore take interobserver contour variability into account.

Conclusions
The results of our study suggest that, when the SCS was 3-5 points and ≥5 surgical clips were placed in the lumpectomy cavity, interobserver variability in 4DCT delineation of the TB was smaller when the delineation was based on seroma than when it was based on clips or on both clips and seroma. Significant interobserver volume differences were observed for the TBs delineated based on surgical clips and for those delineated based on both the clips and seroma. Spatial mismatch, measured as MD in the end-exhalation phase, was likewise greater for these two approaches than for seroma. Optimising the interval from lumpectomy to 4DCT scan is necessary to minimise interobserver variability in TB delineation. If this interval is appropriately chosen, delineating the target volume based on seroma may be the more reasonable approach in radiotherapy treatment planning.