Accuracy of automatic deformable structure propagation for high-field MRI guided prostate radiotherapy

Background In this study we evaluated the accuracy of automatic deformable structure propagation from planning CT and MR scans for daily online plan adaptation in MR linac (MRL) treatment. Automatic propagation is an important element in minimizing re-planning time and in reducing the risk of misrepresenting the target under this time pressure. Methods For 12 high-risk prostate cancer patients treated to the prostate and pelvic lymph nodes, target structures and organs at risk were delineated on both planning MR and CT scans and propagated by deformable registration to three T2-weighted MR scans acquired during the treatment course. The propagated structures were evaluated against manual delineations on the repeated scans (ground truth) and compared with the intra-observer variation obtained on the planning MR. Results MR-to-MR propagated structures had a significantly smaller median surface distance and a larger Dice similarity coefficient than CT-to-MR propagated structures. The MR-to-MR propagation uncertainty was similar in magnitude to the intra-observer variation. Visual inspection of the deformed structures revealed that small anatomical differences between organs in the source and destination image sets were generally well accounted for, whereas large differences were not. Conclusion Both CT and MR based propagations require manual editing, but the current results show that MR-to-MR propagated structures require fewer corrections for high-risk prostate cancer patients treated on a high-field MRL.


Background
Changes in anatomy over a radiotherapy (RT) treatment course for pelvic cancer have motivated adaptive treatment schemes [1,2]. However, until now the limited image quality and soft tissue contrast of cone beam CT (CBCT) have hampered clinical implementation [3,4]. With the introduction of magnetic resonance (MR) guided radiotherapy delivery systems, daily MR guided adaptive radiotherapy (ART) has become possible, with the potential to reduce the safety margins used today [5][6][7] and thus the treatment toxicity [8]. However, an MR guided treatment fraction lasts longer than a standard CBCT based linac fraction, which increases the risk of patient motion during the treatment session [9][10][11]. One of the most time-consuming steps in the MR workflow is the re-delineation and validation of target structures and organs at risk (OAR) [10,12]. Automatic propagation of structures could shorten this step and might also reduce the risk of misdelineations that could introduce systematic or large random errors. Two commercial MR linac (MRL) systems are currently clinically available, and for both systems a deformable image registration (DIR) algorithm is part of the treatment planning system (TPS) to allow fast deformation of the planning images and propagation of structures, reducing the time spent on this step.
The standard workflow suggested by the vendor of the high-field MRL specifies propagation of structures from the planning CT scan (pCT) to the MR scan of the given treatment session. Different commercial algorithms have previously been evaluated for CT-to-MR DIR in different anatomies, showing great potential but also uncertainties in the deformations [13][14][15][16]. Hence, manual review and some manual corrections of the propagated structures are usually required across the different commercial DIR solutions and anatomies [13,17,18,19]. Using MR-MR DIR in the online MRL treatment workflow, as described by Bertelsen et al. [11], could therefore provide more precisely propagated structures. However, to our knowledge, this has not yet been verified.
It has been demonstrated that manual delineation of soft tissue structures is more consistent on MR than on CT, in terms of both inter- and intra-observer variation [20][21][22]. For example, Smith et al. showed that the inter-observer variation of prostate delineation on T2 weighted (T2w) MRI was smaller than on CT [20]. Furthermore, the prostate volume delineated on CT was larger than on MR [20,22]. For these reasons, the intra-observer variation on MR has been regarded as the reference precision in clinical practice [20,21,23].
This study investigates the geometric accuracy of deformable image registration of target volumes and OAR in high-risk prostate cancer patients for both CT-MR and MR-MR registrations using manual delineations as ground truth.

Patients
Twelve high-risk prostate cancer patients referred for 78 Gy to the prostate and proximal part of the seminal vesicles (SV) and 56 Gy to the pelvic lymph nodes, both delivered in 39 fractions on standard CBCT linacs, were included in the study. Exclusion criteria were contraindications to MR and metal implants in the pelvis (e.g. prosthetic hips).
The study has been approved by the regional board of ethics and all included patients have given their signed consent to participate.

Image acquisition
Planning CT scans were acquired on either a Philips Big Bore Brilliance 16-slice scanner (Philips Medical Systems BV, The Netherlands) or a Toshiba Aquilion One (Canon Medical Systems Corporation, Japan). The scan parameters were: 50 cm field of view (FOV), 512 × 512 matrix and 3 mm slice thickness.
All MR scans were performed in treatment position on a Philips Ingenia 1.5 T (Philips Medical Systems BV, The Netherlands) equipped with a flat table top and a fixation device for feet and knees. In addition to a planning MR (pMR) acquired immediately before or after the pCT, three MR sets (MR10, MR20 and MR30) were acquired at the 10th, 20th and 30th fraction (allowing a variation of +/− 2 days) as representative samples of daily MR sessions over the treatment course. The T2w sequence applied in this study was a 3D scan with 3D image distortion correction. The following parameters were used to resemble the sequence suggested by the vendor for use at the MRL: 300 slices, 400 × 400 mm FOV, 0.5 × 0.5 mm pixels, 1 mm slice thickness, TE 216 ms, TR 1800 ms. Scan time was 5.56 min.
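The sequence parameters above fix the MR sampling grid. As a quick consistency check, the arithmetic below derives the in-plane matrix, the coverage along the slice direction and the voxel volume. It is a sketch that assumes the 0.5 × 0.5 mm pixels span the full 400 mm FOV and that the 300 slices are contiguous (no gap or overlap); neither assumption is stated explicitly in the text.

```python
# Sampling grid implied by the T2w sequence parameters (illustrative check).
fov_mm = 400.0      # in-plane field of view, 400 x 400 mm
pixel_mm = 0.5      # in-plane pixel size (assumed to span the full FOV)
n_slices = 300
slice_mm = 1.0      # slice thickness; assumes contiguous slices, no gap

matrix = int(round(fov_mm / pixel_mm))   # pixels per in-plane axis
coverage_mm = n_slices * slice_mm        # extent along the slice direction
voxel_mm3 = pixel_mm * pixel_mm * slice_mm

print(matrix, coverage_mm, voxel_mm3)    # 800 300.0 0.25
```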
The geometric fidelity of the MR scanner was evaluated weekly according to a quality assurance program including scans and analysis of a vendor-specific phantom. MR sequences applied clinically were evaluated with a MagPhan RT 820 phantom (The Phantom Laboratory, Salem, NY, USA), with scans analysed in Total QA software (Image Owl, Greenwich, NY, USA).

Structure delineation
For the current study, target volumes (prostate, SV and CTV56) were delineated on the pCT and on each acquired T2w MR set (pMR, MR10, MR20 and MR30) by one experienced oncologist. The prostate structure included the connective tissue capsule surrounding the gland. The elective volume CTV56 was defined as the pelvic lymph nodes with a 7 mm margin, as described by the RTOG consensus guidelines [24]. All OAR (rectum, bladder, penile bulb, bowel and bilateral femoral heads) were delineated on each image set by one experienced RTT trained in delineation. For delineations on the pCT, standard abdominal window/level settings were used (W 350/L 40). Individual window/level settings were used for MR delineations. Previous image sets were available to the observer during contouring, as they would be in a clinical setting.
These manual delineations represent the ground truth for what each structure should encompass. The current study focuses on differences due to the image modality rather than on inter-observer variation. The observed uncertainties are, however, compared to the intra-observer variation, which is smaller than the inter-observer variation. To determine the intra-observer variation, all structures were re-delineated on every patient's pMR by the same observers no less than one month after the initial delineation. In the absence of an absolute truth, the intra-observer variation represents the best accuracy that can be expected from the propagated structures [18].
Delineations were performed in the treatment planning system (TPS) dedicated to the Elekta Unity high-field MRL, Monaco ver. 5.40 (Elekta AB, Stockholm, Sweden). In this TPS, images are displayed using pixel interpolation.

Fig. 2 a The manually delineated prostate (green), defined as reference, and the deformed prostate (red). b The distance between the two is calculated, and projections onto the coronal, sagittal and transversal planes are made. c For each projection plane, the projections are summed over the patients to provide the population percentile surface distance projection image.

Image registration
All registrations and structure propagations were performed in Monaco. The pCT was registered to each MR set and the pMR to each additional MR, as illustrated schematically in Fig. 1.
Standard settings of the commercial deformation software were used in all cases; specific details of the algorithm are not disclosed by the vendor. CT-to-MR deformations were performed using normalized mutual information, whereas MR-MR deformations were performed using a local cross correlation algorithm.
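The vendor does not disclose its algorithm, but the two similarity measures named above have standard textbook forms. The sketch below (NumPy only; the function names, the 32-bin histogram and the 5-voxel window are illustrative assumptions, not Monaco settings) shows why NMI suits CT-MR registration: it only rewards a consistent statistical relationship between intensities, whereas local cross correlation assumes a locally linear intensity relationship, which is reasonable for MR-MR registration with the same sequence.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def entropy(p):
    """Shannon entropy (nats) of a probability array, ignoring empty bins."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def normalized_mutual_information(a, b, bins=32):
    """NMI = (H(A) + H(B)) / H(A, B), estimated from a joint histogram.

    Ranges from 1 (statistically independent intensities) to 2 (one image's
    intensity bins fully determine the other's), whether or not the mapping
    between the two intensity scales is linear.
    """
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    h_a = entropy(p_ab.sum(axis=1))
    h_b = entropy(p_ab.sum(axis=0))
    return (h_a + h_b) / entropy(p_ab.ravel())


def local_cross_correlation(a, b, radius=2, eps=1e-8):
    """Mean normalized cross-correlation over all (2r+1)^d windows."""
    shape = (2 * radius + 1,) * a.ndim
    wa = sliding_window_view(a.astype(float), shape)
    wb = sliding_window_view(b.astype(float), shape)
    axes = tuple(range(a.ndim, 2 * a.ndim))       # the window axes
    da = wa - wa.mean(axis=axes, keepdims=True)   # zero-mean per window
    db = wb - wb.mean(axis=axes, keepdims=True)
    num = (da * db).sum(axis=axes)
    den = np.sqrt((da ** 2).sum(axis=axes) * (db ** 2).sum(axis=axes)) + eps
    return float(np.mean(num / den))


rng = np.random.default_rng(0)
x = rng.random((12, 12, 12))
print(normalized_mutual_information(x, x))   # 2.0
```

An inverted copy of an image (intensity -x) still gives an NMI close to 2 but a local cross correlation close to -1, which illustrates why the two measures are matched to inter- and intra-modal registration respectively.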

Structure comparison
Three metrics were calculated to evaluate the accuracy of DIR. The Dice similarity coefficient (DSC) gives the ratio of overlap between the manually delineated structure and the corresponding deformably propagated structure [25]. This measure is most relevant for smaller structures, as the index for large structures can be very high even though large, clinically relevant volumes do not overlap. The mean surface distance (MSD), as described by Zukauskaite et al. [26], gives the average distance between the manual and deformed structures in absolute terms, which is particularly relevant for larger structures. The Hausdorff distance (HD) gives the greatest distance between a given pair of structures and thus represents a worst-case scenario [27]; it is therefore very sensitive to outliers in the data. For each patient, the DSC, MSD and HD were averaged over all scans for each structure investigated, for both MR-MR and CT-MR registrations, and compared to the intra-observer variation.

To assess spatial patterns in the variation between ground truth and deformed or re-delineated contours, population based surface distance projection images were generated in the transversal, sagittal and coronal planes for each structure. These projection images, showing the differences between ground truth and deformed structures or the intra-observer variation, were created using the following procedure (see Fig. 2 for a graphical overview). First, the smallest possible bounding box surrounding a given patient organ, oriented along the main patient directions (anterior-posterior, right-left and cranio-caudal), was defined. For all voxels at the surface of the organ, the distance to the reference organ was measured and then projected onto the sagittal, transversal and coronal planes. In each of these three planes, a grid encompassing the bounding box with a fixed number of pixels was defined (Additional file 1: Table A1). Within each pixel, the projected deviations were averaged in order to capture scan-specific spatial structure in the uncertainty. To obtain the population based surface distance projection images, the 50th and 90th percentiles of all scan-specific projected deviations were subsequently determined in each pixel. Although a given type of structure varies in size and shape between scans and patients, fixing the number of pixels in the three planes allows the generation of population based maps in which the approximate spatial distribution of distances, and hence of uncertainties, can be assessed. The final projection images are shown with equal width and an aspect ratio corresponding to the mean structure based on all scans of all patients.
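The three agreement metrics can be sketched on binary voxel masks as follows. This is a minimal illustration, not the software used in the study: surface voxels are defined with 6-connectivity, nearest-surface distances are computed by brute force (adequate only for small masks), and the MSD is taken as the mean of the two directed mean distances, which is one common symmetric definition and may differ in detail from the one in [26].

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient: 2 * |A & B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def surface_points(mask, spacing=(1.0, 1.0, 1.0)):
    """Coordinates (mm) of mask voxels with at least one 6-neighbour outside."""
    padded = np.pad(mask.astype(bool), 1)
    core = (slice(1, -1),) * 3
    interior = padded[core].copy()
    for axis in range(3):
        for shift in (1, -1):
            interior &= np.roll(padded, shift, axis=axis)[core]
    surface = padded[core] & ~interior
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)


def directed_distances(p, q):
    """For each point in p, the distance to the nearest point in q."""
    diff = p[:, None, :] - q[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def msd_and_hd(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric mean surface distance and Hausdorff distance (mm)."""
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    d_ab, d_ba = directed_distances(pa, pb), directed_distances(pb, pa)
    msd = 0.5 * (d_ab.mean() + d_ba.mean())
    hd = max(d_ab.max(), d_ba.max())
    return float(msd), float(hd)


# Usage: two 10-voxel cubes offset by 2 voxels along the first axis.
manual = np.zeros((30, 30, 30), dtype=bool)
manual[10:20, 10:20, 10:20] = True
deformed = np.roll(manual, 2, axis=0)
print(dice(manual, deformed))              # 0.8
print(msd_and_hd(manual, deformed)[1])     # 2.0
```

The example also illustrates the point made above about DSC: a uniform 2 mm shift of a 10 mm cube already lowers the DSC to 0.8, while the same shift applied to a much larger organ would leave the DSC close to 1.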

Statistical analysis
Differences in patient median DSC, MSD and HD values between CT-MR registrations, MR-MR registrations and the intra-observer variation, as well as differences in structure volume between pCT and pMR, were tested for statistical significance using Wilcoxon signed-rank tests at the 5% significance level.

Fig. 3 Prostate (red) and bladder (green) delineated manually on pMR, shown in transverse and coronal view on the left. Note that the bladder and prostate do not occupy the same space. On the right, these structures have been propagated onto MR10, and the bladder structure overlaps the prostate ground truth.
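The text does not state which software performed the tests; in Python the usual route is `scipy.stats.wilcoxon`. To make the paired test explicit, the sketch below computes the exact two-sided signed-rank p-value for a small number of paired values (e.g. 12 per-patient DSC values for CT-MR versus MR-MR) by enumerating all sign assignments. The function name is hypothetical, and the sketch assumes no zero differences and no ties among the absolute differences; real data with ties need mid-ranks or a normal approximation.

```python
import numpy as np


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W+, p). Assumes no zero and no tied absolute differences.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    if np.any(d == 0) or np.unique(np.abs(d)).size != d.size:
        raise ValueError("sketch assumes no zero or tied |differences|")
    n = d.size
    ranks = np.argsort(np.argsort(np.abs(d))) + 1   # ranks 1..n of |d|
    w_plus = int(ranks[d > 0].sum())

    # counts[s] = number of the 2**n sign assignments whose W+ equals s
    max_w = n * (n + 1) // 2
    counts = np.zeros(max_w + 1, dtype=np.int64)
    counts[0] = 1
    for r in range(1, n + 1):
        counts[r:] = counts[r:] + counts[:-r]   # right side uses old counts

    total = 2 ** n
    p_low = counts[: w_plus + 1].sum() / total   # P(W+ <= observed)
    p_high = counts[w_plus:].sum() / total       # P(W+ >= observed)
    return w_plus, float(min(1.0, 2.0 * min(p_low, p_high)))


# Usage with three paired differences (+1, -2, +3):
print(wilcoxon_signed_rank_exact([1, -2, 3], [0, 0, 0]))   # (4, 0.75)
```

With 12 patients, the smallest attainable two-sided p-value is 2/4096 (about 0.0005), reached when all 12 paired differences have the same sign.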

Results
In total, 672 structures were created successfully by DIR propagation. However, for 25% of the patients (3/12) the bladder was not propagated correctly because the TPS cannot correctly deform pinhole structures (the TPS's best approximation of a donut-shaped structure), as illustrated in Fig. 3. All MR-MR propagated structures yielded a higher population median DSC than the CT-MR propagations when compared to the ground truth delineations. Population median DSC and MSD showed statistically significant differences between CT-MR propagated contours and the intra-observer variation for all organs (Table 1 A). MR-MR propagations were not significantly different from the intra-observer variation in most cases (4 of 8 structures for DSC and 6 of 8 for MSD).
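The total of 672 propagated structures is consistent with the registration scheme in the Image registration section. The arithmetic below reproduces it under one assumption: that eight structures were propagated per registration, as suggested by the "4 of 8" and "6 of 8" counts above.

```python
n_patients = 12
# pCT -> pMR, MR10, MR20, MR30 and pMR -> MR10, MR20, MR30
registrations_per_patient = 4 + 3
structures_per_registration = 8   # assumption, see lead-in

total = n_patients * registrations_per_patient * structures_per_registration
print(total)                      # 672
print(3 / n_patients)             # 0.25, i.e. the 3/12 bladder failures
```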
Visual inspection of the deformed structures and their source and destination image sets revealed that small differences between organs in the two image sets were generally well accounted for by the DIR algorithm. In contrast, large differences were in most cases not compensated at all, as exemplified in Fig. 4. This affected the MSD and DSC values: deformable organs, such as the bladder and rectum, showed larger disagreements than rigid structures, e.g. the prostate and femoral heads. The difference in size between pCT or pMR and MRx (Table 2) potentially affects the resulting deformation quality. Large variations in volume were observed for bladder and rectum delineations. The prostate structure was systematically smaller when delineated on MR than on CT (p < 0.001).
In Fig. 5, the DSC and MSD are plotted against the ratio of the structure volume on the planning image to that of the ground truth on the daily image (MRx). For the prostate, only small deviations were observed between the structure volumes on the planning image and on subsequent images, which translated into consistently high DSC and low MSD for all deformations. Still, DSC tended to be closer to 1 and MSD lower for ratios near 1.
A correlation between volume ratio and accuracy of the propagated structures was seen for the bladder, as shown by the parabolic tendency in the plots with extrema at a volume ratio of approximately 1 (Fig. 6). A similar, though less pronounced, pattern was seen for rectum propagations (Additional file 1: Figure A1).
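The parabolic tendency can be quantified by fitting a second-order polynomial of a metric against the planning-to-daily volume ratio and locating its vertex. The sketch below is an illustrative analysis, not one reported in the study; the helper names are hypothetical and the example data are synthetic.

```python
import numpy as np


def volume_ratio(v_planning, v_daily):
    """Planning-image volume divided by daily ground-truth volume (MRx)."""
    return np.asarray(v_planning, dtype=float) / np.asarray(v_daily, dtype=float)


def quadratic_vertex(ratios, metric):
    """Fit metric = a*r**2 + b*r + c and return the vertex location -b / (2a)."""
    a, b, _ = np.polyfit(ratios, metric, deg=2)
    return -b / (2.0 * a)


# Synthetic illustration: a DSC-like metric peaking at a volume ratio of 1.
r = volume_ratio(np.linspace(150.0, 450.0, 11), 300.0)
dsc_like = 0.9 - 0.5 * (r - 1.0) ** 2
print(round(quadratic_vertex(r, dsc_like), 6))   # 1.0
```

For DSC the fitted parabola opens downwards (a maximum), for MSD it would open upwards (a minimum); in both cases a vertex near 1 corresponds to the observation that propagation works best when planning and daily volumes agree.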
The population differences between ground truth and deformed or re-delineated structures are visualized for the prostate in Fig. 7. The projection images show that the delineation differences were overall larger for CT-MR than for MR-MR. Thus, the observed difference between the image modalities was related not only to specific regions of the prostate but also to variation in delineation between the image modalities. The largest errors in DIR propagated prostate structures were seen in the anterior-cranial part of the gland and towards the rectum for CT-MR deformations (Fig. 7).
A similar pattern was observed in the population projection images for the rectum (Fig. 8): overall, variations were larger for CT-based DIR, with the largest differences towards the cranial boundary and the anterior wall. MR-MR DIR propagated structures were generally in good agreement, although cranial differences exceeded 3 mm. The trend was similar for the intra-observer variation. Projection images of the remaining investigated structures showed similar patterns; for CTV56, the largest deviations between ground truth and both CT-MR and MR-MR propagated structures were seen at the cranial and caudal limits (Additional file 1: Figure B1). The seminal vesicles showed the largest deviations anteriorly for CT-MR, but cranially for MR-MR (Additional file 1: Figure B2). The penile bulb propagations showed the largest deviations anteriorly for CT-MR, while MR-MR propagations were similar to the intra-observer variation (Additional file 1: Figure B6).

Fig. 4 The large bladder volume seen on pCT and deformed to pMR matched the bladder seen on pMR poorly. Generally, large differences were not well accounted for by the deformable structure propagation.
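The population projection images discussed above follow the procedure described under Structure comparison. The sketch below covers its last two steps for one projection plane: averaging projected surface distances per pixel on a fixed grid spanning the bounding box, and taking per-pixel percentiles across scans. The function names, the default grid size and the use of NaN for empty pixels are assumptions for illustration; the grid sizes actually used are listed in Additional file 1: Table A1.

```python
import numpy as np


def project_to_plane(points, distances, drop_axis, n_pixels=(20, 20)):
    """Mean surface distance per pixel after projecting along one axis.

    points: (N, 3) surface-voxel coordinates; distances: (N,) distance of
    each point to the reference structure. The grid spans the bounding box
    of the points with a fixed number of pixels, so maps from differently
    sized organs can be stacked. Empty pixels are NaN.
    """
    keep = [ax for ax in range(3) if ax != drop_axis]
    uv = np.asarray(points, dtype=float)[:, keep]
    lo, hi = uv.min(axis=0), uv.max(axis=0)
    n = np.asarray(n_pixels)
    extent = np.where(hi > lo, hi - lo, 1.0)
    idx = np.clip(np.floor((uv - lo) / extent * n).astype(int), 0, n - 1)

    total = np.zeros(n_pixels)
    count = np.zeros(n_pixels)
    np.add.at(total, (idx[:, 0], idx[:, 1]), distances)   # sum per pixel
    np.add.at(count, (idx[:, 0], idx[:, 1]), 1)           # points per pixel
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)


def population_percentiles(scan_maps, percentiles=(50, 90)):
    """Per-pixel percentiles across the projection maps of all scans."""
    stack = np.stack(scan_maps)
    return {q: np.nanpercentile(stack, q, axis=0) for q in percentiles}


# Usage: a 4x4x4 block of "surface" points with constant distances per scan.
cube = np.argwhere(np.ones((4, 4, 4), dtype=bool))
maps = [project_to_plane(cube, np.full(len(cube), s), drop_axis=0, n_pixels=(4, 4))
        for s in (1.0, 2.0, 3.0)]
pop = population_percentiles(maps)
print(pop[50][0, 0])   # 2.0
```

Normalizing positions to the bounding box before binning is what makes the per-pixel statistics comparable across patients, at the cost of mapping anatomically different locations onto the same pixel when organ shapes differ strongly.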

Discussion
This validation study has shown that intra-modal MR-MR image deformation is almost comparable in accuracy to the intra-observer variation of manual delineations.
Inter-modal CT-MR DIR was less accurate and thus not ideal for clinical use. CT-MR deformations were less accurate than the intra-observer variation and will therefore require more time for manual editing. This is problematic for online treatment adaptation, not only because of the resulting inconsistency in target and OAR definition but also because the risk of internal organ motion increases with the duration of the session. Because MR-MR deformations were similar in accuracy to the intra-observer variation and thus require fewer manual corrections, the delineating clinician might have a better starting point when defining the target and OAR, and the risk of delineation errors decreases. This mirrors what has been shown for manual delineations: MR provides more consistent delineations. In this light, the workflow proposed as standard by the MRL vendor, using the pCT as the source for DIR to the session MR, might be suboptimal. Instead, a workflow using the pMR as the source would provide a set of propagated contours that better represent the actual shape and position of the structures. Hence, the time required to review and revise the contours while the patient is on the couch is minimized.

From the population surface distance projection images, some DIR propagation errors appear to be systematic. For both CT-MR and MR-MR, the prostate differs mainly in the most anterior and cranial region, and CTV56 at the caudal and cranial boundaries. Variations in bladder and rectum filling require corrections cranially, and the penile bulb generally requires corrections anteriorly. Comparison of the 90th percentile surface distance projection images for MR-MR with the corresponding intra-observer variation reveals that regions of large DIR uncertainty are also regions of large intra-observer variation, and thus probably regions for which the "ground truth" of the delineation is inherently difficult to define. Similar observations were made for the rectum (Fig. 8), CTV56 (Additional file 1: Figure B1) and the penile bulb (Additional file 1: Figure B6). Since some structures appear more difficult for the DIR algorithm to deform accurately, future algorithms might produce better results if they included organ-specific information, e.g. allowing larger variations in the cranio-caudal direction for the bladder. However, with the currently available DIR algorithm, the correlation found between bladder volume differences on the planning and daily images and the propagation accuracy indicates that minimizing the variation between the source image and the image of the day may be appropriate, e.g. by using a drinking protocol. The current results indicate that only volume ratios in the range 0.8-1.2 result in reasonably accurate DIR structure propagation. Neither the CT-MR nor the MR-MR DIR was able to account for large volume changes.
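Following the observation above, a hypothetical pre-registration check could flag fractions whose planning-to-daily volume ratio falls outside the 0.8-1.2 range. The function below is only a sketch of such a check, with an illustrative name and default bounds taken from the reported range; it is not part of the workflow evaluated in this study. Volumes can be in any consistent unit, e.g. ml.

```python
def within_volume_window(v_planning, v_daily, low=0.8, high=1.2):
    """Hypothetical check: is the planning/daily volume ratio inside [low, high]?

    Returns (inside, ratio). Defaults mirror the 0.8-1.2 range reported above.
    """
    ratio = v_planning / v_daily
    return low <= ratio <= high, ratio


print(within_volume_window(300.0, 200.0))   # (False, 1.5)
print(within_volume_window(210.0, 200.0))   # (True, 1.05)
```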
It should be noted that the slice thickness of CT (3 mm) and MR (1 mm) differs in this study. This alone can cause differences of up to 2 mm between delineations at the cranial and caudal boundaries. For structures with a small cranio-caudal extent but a large extent in other directions, this could have a non-negligible impact, favoring MR-MR DIR in this study. Similarly, the difference in pixel size between CT and MR could cause differences of up to 0.5 mm in the right-left and anterior-posterior directions. However, the population surface distance projection images of this study do not indicate that the difference in slice thickness severely affects these results, as the geometric distribution of differences is the same for CT-MR as for MR-MR. For example, for the CT-MR registrations the difference between ground truth and DIR propagated contours is not larger in the most cranial and caudal regions of rigid structures, such as the femoral heads, than in their more central regions (see Figures B4 and B5 in Additional file 1).
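The 2 mm and 0.5 mm figures follow from the acquisition parameters in the Image acquisition section. The arithmetic below makes this explicit, assuming that the CT images were reconstructed over the full 50 cm FOV on the 512 × 512 matrix.

```python
ct_slice_mm, mr_slice_mm = 3.0, 1.0
ct_pixel_mm = 500.0 / 512   # assumes 50 cm reconstruction FOV on 512 x 512
mr_pixel_mm = 0.5

print(ct_slice_mm - mr_slice_mm)   # 2.0
print(ct_pixel_mm - mr_pixel_mm)   # 0.4765625
```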
Image quality itself could affect image registration and the resulting structure propagations. In the current study, MR images during the course of treatment were acquired on a diagnostic MR scanner rather than on the MR Linac itself. Although the scan sequence was set up as closely as possible to the proposed clinical settings on the MRL, the acquired image quality is not identical to that of the MRL, as much of the hardware is designed differently, e.g. the magnet, the gradient system and the coils. A detailed comparison of image quality between the two scanners is beyond the scope of the current study. However, it has previously been demonstrated that the geometric fidelity of the two scanners is similar [28].
For a workflow using MR as the primary image set for dose planning, the electron densities (ED) required for dose calculation must be derived either from a pCT or, in an MR-only planning workflow, from an MR-based pseudo CT [29][30][31]. ART can also be performed using CBCT, which does provide ED information and, with appropriate corrections, can form the basis for precise dose calculations [32][33][34]. With current standard CBCT technology, the accuracy of CT-to-CBCT DIR is similar to the inter-observer variation [35,36]. Future studies will show whether new CBCT technologies with iterative image reconstruction can increase deformation accuracy.
The organs at risk evaluated in this study are also relevant for pelvic indications other than prostate cancer. Further studies will have to assess whether the DIR accuracy observed here also applies in other anatomical regions. Systematic shrinkage or swelling of organs is not expected for prostate cancer patients over the course of RT [37]. Therefore, consistently using the pCT and pMR as DIR sources, rather than successively using the most recently acquired MR, should not affect the overall accuracy of the DIR. In other indications, e.g. head and neck cancers and lung cancers, tumor shrinkage has been observed [38,39]. For such indications, it might be appropriate to successively use the most recently acquired MR as the DIR source for daily plan adaptation.

Conclusion
This study has shown that for high-risk prostate cancer patients treated with an adapted plan on the Unity MRL, structures propagated from planning images to online daily MR images need manual editing, whether they are propagated from CT or MR. However, the current results show that MR-MR propagated structures require fewer corrections and are therefore preferred for clinical use, as online planning time may decrease and delineation accuracy increase compared to a CT-MR workflow. A clinical MR-MR workflow has therefore been implemented locally.
Additional file 1. High resolution versions of images in the appendices are provided in the following files.