Determination of patient-specific internal gross tumor volumes for lung cancer using four-dimensional computed tomography

Background: To determine the optimal approach to delineating patient-specific internal gross tumor volumes (IGTV) from four-dimensional (4-D) computed tomography (CT) image data sets used in the planning of radiation treatment for lung cancers. Methods: We analyzed 4D-CT image data sets of 27 consecutive patients with non-small-cell lung cancer (stage I: 17, stage III: 10). The IGTV, defined to be the envelope of respiratory motion of the gross tumor volume in each 4D-CT data set, was delineated manually using four techniques: (1) combining the gross tumor volume (GTV) contours from ten respiratory phases (IGTVAllPhases); (2) combining the GTV contours from two extreme respiratory phases (0% and 50%) (IGTV2Phases); (3) defining the GTV contour using the maximum intensity projection (MIP) (IGTVMIP); and (4) defining the GTV contour using the MIP with modification based on visual verification of contours in individual respiratory phases (IGTVMIP-Modified). Using the IGTVAllPhases as the optimum IGTV, we compared volumes, matching indices, and extent of target missing using the IGTVs based on the other three approaches. Results: The IGTVMIP and IGTV2Phases were significantly smaller than the IGTVAllPhases (p < 0.006 for stage I and p < 0.002 for stage III). However, the values of the IGTVMIP-Modified were close to those determined from IGTVAllPhases (p = 0.08). IGTVMIP-Modified also matched the best with IGTVAllPhases. Conclusion: IGTVMIP and IGTV2Phases underestimate IGTVs. IGTVMIP-Modified is recommended to improve IGTV delineation in lung cancer.


Background
Lung cancer remains the leading cause of cancer-related mortality. Conventional photon radiotherapy for lung cancer is associated with about 50% local tumor control [1]. Missing the target as a result of tumor motion has been considered one of the main reasons for local failure [2]. Researchers have reported that ~40% of lung tumors move > 5 mm and that 10-12% move > 1 cm [3,4]. Several strategies have recently been developed to address the issue of tumor motion and improve local control [2]. For example, the development of image-guided radiotherapy (IGRT) has allowed for more accurate tumor targeting, so it is rapidly replacing conventional radiotherapy for lung cancer [2]. In order to account for tumor motion, the International Commission on Radiation Units and Measurements (ICRU) report 62 introduced the concept of an internal target volume (ITV), defined as the clinical target volume (CTV) plus an additional margin to account for geometric uncertainties due to internal variations in tumor position, size, and shape. Using current imaging techniques, the CTV cannot be visualized. Consequently, generation of the ITV requires delineation of the gross tumor volume (GTV) on each of the phases that constitute the four-dimensional (4-D) computed tomography (CT) image data set, followed by expansion of each GTV to account for microscopic disease. The ITV is then determined to be the envelope of motion of the CTV. In order to make the determination of the ITV more efficient, we have proposed the concept of the internal gross tumor volume (IGTV), which explicitly accounts for internal variations in tumor position, size, and shape but can be derived directly from imaging studies [2]. The ITV is then determined to be the IGTV plus a margin that accounts for microscopic disease.
Traditionally, the margin necessary to account for internal motion of tumors in the thorax has been determined using an isotropic expansion based on population-based estimates of respiratory motion. However, because breathing characteristics vary greatly among individual patients, such population-based estimates may overestimate or underestimate the margin needed for a given patient. Moreover, respiratory-induced tumor motion is known to be anisotropic; typical tumor paths are those of elongated and possibly curved ellipses. The advent of the multislice helical CT scanner, combined with the establishment of temporal correlation between respiratory motion and the CT acquisition process, has allowed tumor size, shape, and position to be observed at multiple times during a patient's respiratory cycle [5,6]. The resultant CT data set, called the 4-D CT or respiration-correlated CT data set, provides patient-specific information about tumor position, shape, and size at different phases of the respiratory cycle.
Although 4-D CT data provide a reliable estimate of the extent of tumor motion due to respiration in three dimensions, their clinical implementation poses some challenges. Ideally, the IGTV should be determined by contouring the GTV on each of the ten phase image sets. The combination of these individual three-dimensional (3-D) volumes into a single 3-D volume represents the IGTV, which accounts for respiratory motion. However, contouring the tumor volume on ten different data sets for each patient increases the workload compared with contouring in only one data set. In these instances, postprocessing tools, such as the maximum intensity projection (MIP), have been shown to improve radiotherapy planning efficiency [7]. The MIP of a 4-D CT data set reduces the multiple 3-D CT data sets available from a 4-D CT data set into a single 3-D CT data set, where each voxel in the MIP represents the maximum intensity encountered by corresponding voxels in all individual 3-D phase image sets of the 4-D CT data set. The IGTV is then determined based on the GTV delineation on the single 3-D CT data set. Alternatively, some cancer centers have used breath-hold spiral CT imaging to acquire images at the two extremes of the respiratory cycle [2,7]; contouring the GTV at these extremes (the end-expiration and the end-inspiration phases) and then combining these two 3-D volumes yields the IGTV. A limited number of studies have analyzed the accuracy of the MIP and two-phase IGTV delineation techniques relative to the full ten-phase method for determining the IGTV [8][9][10][11].
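The voxel-wise definition of the MIP given above can be illustrated with a minimal sketch. It assumes that each respiratory phase is available as an equally shaped volume stored as nested lists indexed `volume[z][y][x]`; the function name and the toy intensity values are hypothetical and are not taken from the software used in this study.

```python
def maximum_intensity_projection(phases):
    """Collapse equally shaped 3-D phase volumes into a single MIP volume.

    phases: list of volumes, each a nested list indexed as volume[z][y][x].
    Each output voxel is the maximum of the corresponding voxels in all phases.
    """
    if not phases:
        raise ValueError("at least one phase volume is required")
    return [
        [
            [max(voxels) for voxels in zip(*rows)]  # same (z, y, x) across phases
            for rows in zip(*slices)                # same (z, y) row across phases
        ]
        for slices in zip(*phases)                  # same z slice across phases
    ]


# Toy data (hypothetical): a bright voxel sits at a different position in each phase.
phase_a = [[[0, 5, 0], [0, 0, 0]]]
phase_b = [[[0, 0, 7], [1, 0, 0]]]
mip = maximum_intensity_projection([phase_a, phase_b])
```

In this toy case the bright voxel appears at both of its positions in `mip`, which is why a contour drawn on the MIP can approximate the envelope of motion when the tumor is brighter than its surroundings.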
The aim of this study, therefore, was to evaluate the accuracy of 4-D CT MIP-based IGTV delineation and two-phase-based IGTV delineation, using ten-phase IGTV delineation as the reference. We also examined the accuracy of the MIP-based IGTV delineation after applying a modification through visual verification of GTV coverage in individual respiratory phases.

Data acquisition
As a retrospective review of radiation treatment planning, this study was included under an Institutional Review Board-approved retrospective chart review protocol. We studied 27 consecutive patients with non-small-cell lung cancer (NSCLC) who underwent 4-D CT simulation for treatment planning and received definitive radiotherapy at our institution between 2005 and 2006. Of these 27 patients, 17 had stage I disease and received stereotactic body radiotherapy (SBRT), and 10 had stage III disease and received intensity-modulated radiotherapy (IMRT). 4-D CT image data sets, each consisting of 10 respiratory phases, were acquired on a multislice CT scanner (Discovery ST, GE Medical Systems, Madison, WI) by sorting CT images based on the phase of an external respiratory monitor (Real-time Position Management System; Varian Medical Systems, Inc., Palo Alto, CA) [12]. MIPs of the 4-D CT data sets were then generated from the individual phase images as described elsewhere [5,6].

Patient-specific IGTV determination
We determined patient-specific IGTVs using the demonstrable extent of tumor motion shown in the 4-D CT images. We used four approaches to determine these IGTVs: (1) contouring the GTV on each of the ten respiratory phases of the 4-D CT data set and combining these GTVs to produce IGTV AllPhases; (2) contouring the GTV on the MIP of the 4-D CT data set to produce IGTV MIP; (3) contouring the GTV on the extreme respiratory phases (0% phase = peak inhalation, 50% phase = peak exhalation) and combining these GTVs to produce IGTV 2Phases; and (4) contouring the GTV on the MIP of the 4-D CT data set and then modifying these contours using visual verification of coverage in each phase of the 4-D CT data set to produce IGTV MIP-Modified. Visual verification of coverage in each phase was achieved by overlaying the MIP-based GTV contour onto each phase of the 4-D CT data set. Thus, each of these 3D volumes (IGTV AllPhases, IGTV MIP, IGTV 2Phases, and IGTV MIP-Modified) represented the demonstrable respiratory tumor motion volumes, or IGTVs. Figures 1 and 2 show the results obtained using these different approaches in the determination of the IGTV for cases of stage I and stage III disease, respectively. For consistency in contouring, all GTV contours in each respiratory phase of the 4-D CT and MIP data sets were drawn by a single radiation oncologist (ME) and verified by another radiation oncologist (JYC). We used a lung window on the CT data set to contour the primary tumor and a mediastinum window to contour any involved lymph nodes. Diagnostic CT of the chest with intravenous contrast and PET/CT were used to guide contouring of involved lymph nodes, as described in our previous publication [2]. A total of 324 GTVs were delineated, 12 for each patient (the GTV in each of 10 respiratory phases, IGTV MIP, and IGTV MIP-Modified). For stage III disease, involved hilar or mediastinal lymph nodes were contoured and analyzed independently.

Data analysis
We evaluated the IGTVs determined using each of the three test contouring approaches against the all-phases IGTV determined by contouring all ten respiratory phases of the 4-D CT data set (IGTV AllPhases). Specifically, we compared the following metrics for each 3D volume: matching index, total GTV volume, and underestimated or overestimated volume.

Matching index calculation
The matching index (MI) of any two 3D volumes A and B is defined as the ratio of the intersection of A with B to the union of A and B, that is,

$$\mathrm{MI} = \frac{A \cap B}{A \cup B}$$

As can be deduced from this equation, the maximum value of the MI is 1 if the two volumes are identical, and the minimum value is 0 if the volumes are completely non-overlapping.
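As an illustration of this definition, the following minimal sketch computes the MI for two volumes represented as sets of voxel coordinates. The set-of-voxels representation, the function name, and the toy volumes are assumptions made for the example; they do not reproduce the contour-based implementation used in this study.

```python
def matching_index(volume_a, volume_b):
    """Return the ratio of the intersection to the union of two voxel sets."""
    union = volume_a | volume_b
    if not union:
        raise ValueError("both volumes are empty; the matching index is undefined")
    return len(volume_a & volume_b) / len(union)


# Toy volumes (hypothetical): a 4 x 4 voxel square and the same square
# shifted by one voxel along x.
reference = {(x, y, 0) for x in range(4) for y in range(4)}
shifted = {(x, y, 0) for x in range(1, 5) for y in range(4)}

mi_identical = matching_index(reference, reference)  # identical volumes
mi_shifted = matching_index(reference, shifted)      # 12 shared voxels, 20 in the union
```

Because the MI is symmetric in A and B, a test volume that is too small and one that is too large by the same number of voxels can produce similar values, which motivates the separate volume difference calculation.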

Volume difference calculation
While the matching index is a good measure of how well the shapes of any two volumes match each other, it cannot discriminate between overestimation and underestimation. To gain better insight into any over/underestimation of the IGTV, we computed the differences in IGTV between the all-phases volume (IGTV AllPhases) and the three test volumes (IGTV MIP, IGTV 2Phases, and IGTV MIP-Modified). For each pair of volumes, we computed the underestimation and overestimation volumes (V Under and V Over) using the following equations:

$$V_{Under} = V_{AllPhases} \setminus V_{Test}, \qquad V_{Over} = V_{Test} \setminus V_{AllPhases}$$

where V AllPhases is the volume in ten respiratory phases, V Test is the test volume, and "\" denotes the set difference. The underestimation and overestimation volumes were computed as integrals over the z coordinate of the corresponding transverse areas as follows:

$$V_{Under} = \int \left( A_{AllPhases} \setminus A_{Test} \right) dz, \qquad V_{Over} = \int \left( A_{Test} \setminus A_{AllPhases} \right) dz$$

where A AllPhases is the area in ten respiratory phases and A Test is the test area. The underestimation area (A Under) and the overestimation area (A Over), defined as

$$A_{Under} = A_{AllPhases} \setminus A_{Test}, \qquad A_{Over} = A_{Test} \setminus A_{AllPhases}$$

were computed for each axial level by performing the Delaunay triangulation for the union of the all-phases and test contour points and computing the areas as a sum of the corresponding triangular areas (see Figure 3). Given a set of data points in the plane, the Delaunay triangulation is a set of triangles such that no data points are contained in any triangle's circumscribed circle. Delaunay triangulations maximize the minimum angle of all the triangles in the triangulation, and they tend to avoid skinny (or close-to-degenerate) triangles. We used the Delaunay triangulation implemented in a high-level graphical analysis and programming package, MATLAB (The Mathworks, Inc.: http://www.mathworks.com), which is based on the Quickhull algorithm [13].
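The set differences and the slice-by-slice integration over z can be sketched as follows. This example substitutes voxel counting on a regular grid for the Delaunay-based area computation described above, so it illustrates the definitions rather than the MATLAB implementation used in this study. The function names, the pixel area, and the slice thickness are hypothetical values chosen for the example.

```python
from collections import Counter


def volume_by_slices(voxels, pixel_area_cm2, slice_thickness_cm):
    """Approximate the integral of transverse area over z as a sum over slices."""
    voxels_per_slice = Counter(z for (_x, _y, z) in voxels)
    return sum(
        count * pixel_area_cm2 * slice_thickness_cm  # area(z) * dz for one slice
        for count in voxels_per_slice.values()
    )


def under_over_volumes(all_phases, test, pixel_area_cm2, slice_thickness_cm):
    """Return (V_Under, V_Over) in cm^3 for two sets of (x, y, z) voxel indices."""
    under = all_phases - test  # part of the reference missed by the test volume
    over = test - all_phases   # part of the test volume outside the reference
    return (
        volume_by_slices(under, pixel_area_cm2, slice_thickness_cm),
        volume_by_slices(over, pixel_area_cm2, slice_thickness_cm),
    )


# Toy volumes (hypothetical): the test volume misses one column of the
# reference in x and extends one slice beyond it in z.
all_phases = {(x, y, z) for x in range(4) for y in range(4) for z in range(3)}
test_volume = {(x, y, z) for x in range(1, 4) for y in range(4) for z in range(4)}

v_under, v_over = under_over_volumes(
    all_phases, test_volume, pixel_area_cm2=0.01, slice_thickness_cm=0.25
)
percent_under = 100.0 * len(all_phases - test_volume) / len(all_phases)
```

In this toy case both volumes contain 48 voxels, so a comparison of total volumes alone would show no difference, while the set differences reveal that a quarter of the reference volume is missed.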

Statistical analysis
To test for statistically significant differences between the IGTVs determined using each test volume (IGTV MIP, IGTV 2Phases, and IGTV MIP-Modified) and the IGTV determined using the all-phases volume (IGTV AllPhases), we used a paired-sample t-test in each case to determine p, with p < 0.05 considered significant. All statistical analyses were performed using the SPSS software package (v.10; SPSS Inc., Chicago, IL).

Table 2 shows the MI values for each of the three test IGTVs. As shown, the IGTV MIP-Modified (mean ± SD: 0.90 ± 0.02) most closely matched the IGTV AllPhases, with IGTV 2Phases (mean ± SD: 0.81 ± 0.06) and IGTV MIP (mean ± SD: 0.80 ± 0.05) following. There was no significant difference in MI between IGTV 2Phases and IGTV MIP (p = 0.728), but the differences in MI between IGTV MIP and IGTV MIP-Modified and between IGTV 2Phases and IGTV MIP-Modified were both significant (p < 0.001 for each).
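The paired comparisons reported above rest on a paired-sample t statistic, which can be sketched with the Python standard library as follows. The analyses in this study were performed in SPSS; the function name and the four pairs of volumes below are hypothetical and serve only to show the calculation. Converting t to a p value requires the Student t distribution with n - 1 degrees of freedom, which a statistics package provides (for example, `scipy.stats.ttest_rel` in SciPy).

```python
import math
import statistics


def paired_t_statistic(reference_values, test_values):
    """Return (t, degrees of freedom) for a paired-sample t-test."""
    if len(reference_values) != len(test_values):
        raise ValueError("paired samples must have the same length")
    differences = [r - t for r, t in zip(reference_values, test_values)]
    n = len(differences)
    if n < 2:
        raise ValueError("at least two pairs are required")
    sd_diff = statistics.stdev(differences)  # sample SD of the paired differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(differences) / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical volumes in cm^3 for four patients (not data from this study).
reference_volumes = [10.0, 12.0, 9.0, 15.0]
test_volumes = [9.0, 10.0, 8.0, 12.0]
t_value, dof = paired_t_statistic(reference_volumes, test_volumes)
```

A positive `t_value` here indicates that the test volumes are, on average, smaller than the reference volumes for the same patients.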

Results
We performed a comparative analysis of the MI values of the two patient groups (patients with SI motion ≤1 cm and those with SI motion > 1 cm) with stage I disease.
There was no strong correlation between the MI and the magnitude of SI motion, although the MI of IGTV 2Phases in some patients with SI motion ≤1 cm was lower than the general trend in patients with SI motion > 1 cm. Although the magnitude of SI motion did not significantly affect the accuracy of the IGTV contouring approaches, we found that the location of the primary tumor affected IGTV contouring accuracy (Table 2). For example, tumors located near the diaphragm (cases 1, 2, 3, and 15), mediastinum (case 8), and chest wall (cases 4, 6, 9, 10, and 12) appeared to have worse MI values than tumors located in the peripheral lung parenchyma (cases 5, 7, 11, 13, 14, 16, and 17), although this difference did not reach statistical significance. Table 3 shows the SI motion and the IGTVs based on the test and all-phases volumes for the 10 stage III lung tumors. As shown, the majority of these tumors (9/10) exhibited SI motion < 1 cm, so it was not meaningful to group these patients according to the 1-cm SI motion threshold.
As with stage I lung tumors, we found that, regardless of the magnitude of SI motion, the IGTV MIP and IGTV 2Phases (mean ± SD: 193.27 ± 135.09 cm³

Table 4 shows the MI values for each IGTV based on the test volumes and on the all-phases volume for patients with stage III disease. In general, we found that the GTV MIP-Modified-based IGTV (mean ± SD: 0.93 ± 0.20) matched the GTV AllPhases-based IGTV the closest, followed by the IGTVs based on GTV 2Phases (mean ± SD: 0.91 ± 0.05) and GTV MIP (mean ± SD: 0.86 ± 0.07). There was a significant difference between GTV 2Phases-based and GTV MIP-based IGTVs (p = 0.05) and between GTV MIP-based and GTV MIP-Modified-based IGTVs (p = 0.03).
The volumetric underestimation and overestimation between the all-phases volume and the test volumes for patients with stage I and III disease are shown in Table 5. We also observed that the volumetric underestimation percentages in stage III disease were lower than those in stage I disease. However, because GTVs are by definition larger in stage III than in stage I disease, the absolute volume underestimation was generally higher in stage III disease. Volumetric overestimation occurred in both stage I and stage III disease for both IGTV MIP and IGTV MIP-Modified. Overestimation for IGTV MIP-Modified was slightly higher than that for IGTV MIP, but both percentages were lower than 5.0% for the average volume overestimation and 10.10% for the maximum volume overestimation. Because IGTV 2Phases is a subset of IGTV AllPhases, the volumetric overestimation for IGTV 2Phases compared to the reference IGTV was always equal to zero. Figure 4 illustrates the proportional volumetric underestimations (Fig. 4a) and overestimations (Fig. 4b) in the 17 individual patients with stage I disease. We found that volumetric underestimation was > 10% using either IGTV MIP or IGTV 2Phases in 15 patients, but in no patients when IGTV MIP-Modified was used. Volumetric underestimation > 20% occurred in 5 patients using the IGTV MIP and in 7 patients using the IGTV 2Phases. Of the 5 patients in whom volumetric underestimation was > 20% using IGTV MIP, 2 had lesions near or attached to the diaphragm, 1 had a lesion near or attached to the chest wall, and another had a lesion near or attached to the mediastinum. Figure 5 illustrates the volumetric underestimations (Fig. 5a) and overestimations (Fig. 5b) in the 10 patients with stage III disease. We found that volumetric underestimation was > 5% in 9 patients using IGTV MIP, 8 patients using IGTV 2Phases, and 2 patients using IGTV MIP-Modified.
Volumetric underestimation > 10% occurred in 6 patients using IGTV MIP , 1 patient using IGTV 2Phases , but no patients using IGTV MIP-Modified . In general, we found that the lowest volumetric underestimation was achieved consistently using the modified MIP approach to delineate the IGTV.
To analyze the accuracy of these contouring approaches for involved lymph nodes, we conducted a second analysis restricted to the involved lymph nodes in the stage III cases described above. Our data showed that the IGTV MIP-Modified volume of the lymph nodes (mean ± SD: 32.95 ± 40.86 cm³) matched most closely with the IGTV AllPhases volume of the lymph nodes (mean ± SD: 34.26 ± 42.56 cm³, p = 0.24), while the IGTV 2Phases and IGTV MIP lymph node volumes (mean ± SD: 29.15 ± 38.14 and 25.63 ± 34.55 cm³, respectively) differed significantly from the IGTV AllPhases lymph node volume (p = 0.04 and 0.05, respectively; volume underestimation in all cases). In addition, the matching index of the lymph node IGTV MIP-Modified was not significantly different from that of IGTV 2Phases (p = 0.14) but was significantly different from that of IGTV MIP.

Figure caption: Computation of the underestimation area (dark gray) and the overestimation area (light gray) of the test area (area inside the dashed line) compared with the reference area (area inside the solid line).

Discussion
Real-time tumor motion tracking provides the most comprehensive data for respiratory tumor motion management. However, it is a challenging technique to implement in the clinical setting, and more research is needed to make its clinical implementation more practical [14]. Although both MIP-based and two-phase-based approaches have been shown to delineate the GTV more accurately than conventional 3D CT-based planning, their accuracy has not been compared with that of the ten-phase contouring approach, particularly in stage III disease. Jin et al., in a phantom study, examined the feasibility of a method to determine the ITV based on motion information obtained from select phases of a respiratory cycle [15]. They reported that adequate estimation of the IGTV could in general be achieved by combining motion information from the extremes of motion in most cases and in some cases by the addition of motion information from an intermediate phase. Underberg et al. [8] reported that MIP-based contouring could provide reliable margins for determining the IGTV for stage I lung tumors treated with SBRT. However, their method did not include visual verification of the MIP-defined GTV contour through each individual phase of the 4D CT (IGTV MIP-Modified). Bradley et al. [9] compared helical-, MIP-, and average-intensity (AI)-based 4-D CT imaging to find the optimal approach for determining the patient-specific IGTV for SBRT for stage I lung cancer. They found that the MIP-defined GTV was significantly larger than the helical-defined and average CT-defined GTVs. However, Bradley et al. did not compare the GTV based on GTV MIP with that based on GTV AllPhases, the optimal reference volume, nor did they discuss their results in the context of tumor location. In another study, Cai et al. [10] determined the IGTVs for six lung tumors using a simulation method based on dynamic magnetic resonance imaging (dMRI) and MIPs.
They found that MIP-based IGTVs were smaller than dMRI-based IGTVs. They concluded that because of the low temporal resolution and retrospective re-sorting, 4-D CT might not accurately depict the excursion of a moving tumor. Recent data by Rietzel et al. also support our observation that tumor delineation on the MIP with subsequent visual verification of contours over all individual phases of the 4D CT yields the best estimate of the IGTV. However, the performance of this approach in the delineation of involved lymph nodes was not separately addressed in that study [11]. In daily clinical practice, tumor contouring in stage III disease is more challenging than in stage I disease because of the larger tumor volume, more complicated tumor shape, involvement of critical structures, and potential involvement of multiple lymph nodes in which tissue density is similar to that of the tumor. In addition, although the two-phase-based approach has been used to delineate IGTVs in the clinical setting, there are scant data on the accuracy of such two-phase-based IGTVs in either stage I or stage III disease [16]. Our study showed that both MIP-based and two-phase-based IGTVs underestimate the 10-phase-based IGTV in both stage I and III disease, including involved lymph nodes, which can potentially result in marginal under-dosing, and that the IGTV MIP-Modified consistently

(3) The tumor spicula cannot be visualized on the MIP projections due to smearing of the tumor edge. Indeed, our data show that the MI was poor and volumetric underestimation was high using the MIP-based approach to delineate IGTVs in most of the lesions near the mediastinum, diaphragm, liver, and chest wall. Of these lesions, those closer to the diaphragm and liver had the lowest MI values, which could have been due to the significant motion of the diaphragm and liver and the MIP image's inability to record differences between the lesion and the diaphragm and liver.
We are currently developing software that excludes diaphragm and liver images in some breathing phases using cine CT images so that better tumor MIP images will be preserved (data to be published). We should note that MIP images do not reflect the densities of tumors, lungs, and other normal tissues accurately enough for dose calculation in treatment planning [17]. Thus, a free-breathing CT image set, a 4-D scan of a single respiratory phase, or an average CT image set extracted from a 4-D CT data set should be used for treatment planning and dose calculation. This would be especially important in proton therapy, which is more sensitive to tumor motion and changes in tissue density. In a previous study on 4-D CT in proton therapy planning, we found that a MIP density override for tumor contouring in an average CT data set was the optimal approach [18].
For the two-phase-based approach, tumor deformation between the two extreme phases of breathing and the curved motion pathway during each breathing cycle may introduce uncertainty. In most cases, however, we found that the MI of the two-phase-based IGTV was slightly higher than that of the MIP-based IGTV, which indicates that most tumors moved in a generally straightforward SI direction and that tumor deformation during breathing was minimal. Particularly in stage III disease, we found that the volumetric underestimation was generally lower for the two-phase-based IGTV than for the MIP-based IGTV. Therefore, if the 4-D CT-based IGTV MIP-Modified is not available, the two-phase-based IGTV is a reasonable alternative approach for taking tumor motion into consideration, although it is not the optimal one.
In the clinical setting, it is common to prescribe the dose to the PTV, which additionally takes the clinical target volume (CTV) and set-up uncertainty into consideration. The volume underestimation will be reduced if the PTV is used to compare the four approaches described above. We evaluated the effect of this underestimation on the PTV in a case with maximal underestimation of the IGTV in stage I disease. The IGTV was expanded by 1.6 cm (0.8 cm for the CTV, 0.3 cm to account for variability in the determination of motion extent, and 0.5 cm for image-guided patient setup). Analysis of volumetric underestimation of the PTV was carried out in the same manner as described for the IGTV. Our results showed that the volume underestimation was reduced from 30.86%, 21.2%, and 8.53% in the IGTV to 13.3%, 5.18%, and 3.36% in the PTV for IGTV MIP, IGTV 2Phases, and IGTV MIP-Modified, respectively. In general, this improvement is more dramatic in smaller lesions, such as those in stage I disease. However, when an ablative dose is attempted in the clinical setting but sparing of critical structures is a concern, such as in SBRT for stage I disease, we would accept compromised coverage for the PTV but not for the IGTV. Therefore, IGTV delineation accuracy is still crucial clinically.
As with the other comparative studies mentioned above, interobserver and intraobserver variability in the delineation of the GTV was not considered. The uncertainties introduced by such variability are, however, likely to differ from those analyzed in this study and would require a separate analysis that is beyond the scope of the current report.

Conclusion
We found that the MIP-based and two-phase-based approaches to IGTV delineation significantly underestimated the IGTV in patients with stage I and stage III NSCLC. Due to the limitations of each approach, a significant amount of the tumor volume could be missed in individual patients, so precautions should be taken when these techniques are used to treat patients. We also found that the IGTV MIP-Modified approach, which requires visual verification of tumor coverage in each phase of the breathing cycle, improved IGTV delineation in both stage I and stage III disease.

Abbreviations
GTV: gross tumor volume; IGTV: internal gross tumor volume; CTV: Clinical target volume; PTV: Planning target volume; IGTV AllPhases : the gross tumor volume (GTV) contours from ten respiratory phases; IGTV 2Phases : the GTV contours from two extreme respiratory phases (0% and 50%); IGTV MIP : the GTV contour using the maximum intensity projection (MIP); IGTV MIP-Modified : the GTV contour using the MIP with modification based on visual verification of contours in individual respiratory phase.