
A 4D-CBCT correction network based on contrastive learning for dose calculation in lung cancer



This study aimed to present a deep-learning network called contrastive learning-based cycle generative adversarial networks (CLCGAN) to mitigate streak artifacts and correct the CT value in four-dimensional cone beam computed tomography (4D-CBCT) for dose calculation in lung cancer patients.


4D-CBCT and 4D computed tomography (CT) images of 20 patients with locally advanced non-small cell lung cancer were used to train the deep-learning model in a paired manner. The lung tumors were located in the right upper lobe, right lower lobe, left upper lobe, left lower lobe, or mediastinum. Additionally, data from five patients were used to generate 4D synthetic computed tomography (sCT) images for testing. Using the 4D-CT as the ground truth, the quality of the 4D-sCT images was evaluated by quantitative and qualitative assessment methods. The correction of CT values was evaluated both globally and locally. To further validate the accuracy of the dose calculations, we compared the dose distributions and calculations of 4D-CBCT and 4D-sCT with those of 4D-CT.


The structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) of the 4D-sCT increased from 87% and 22.31 dB to 98% and 29.15 dB, respectively. Compared with cycle consistent generative adversarial networks (CycleGAN), CLCGAN enhanced SSIM and PSNR by 1.1% (p < 0.01) and 0.42% (p < 0.01), respectively. Furthermore, CLCGAN significantly decreased the absolute mean differences of CT values in the lungs, bones, and soft tissues. The dose calculation results revealed a significant improvement of 4D-sCT over 4D-CBCT. CLCGAN was the most accurate in dose calculations for the left lung (V5Gy), right lung (V5Gy), right lung (V20Gy), PTV (D98%), and spinal cord (D2%), with the relative dose differences reduced by 6.84%, 3.84%, 1.46%, 0.86%, and 3.32%, respectively, compared with 4D-CBCT.


Based on the satisfactory results obtained in terms of image quality and CT value measurement, it can be concluded that CLCGAN-corrected 4D-CBCT can be utilized for dose calculation in lung cancer.


Radiation therapy is one of the most important methods of treating cancer. However, radiation may cause side effects in the surrounding normal tissues, especially when treating organs affected by respiratory motion, such as the liver, lungs, and mediastinum [1, 2]. In addition to intensity-modulated radiotherapy for lung cancer, stereotactic body radiotherapy (SBRT) is clinically applied for early-stage non-small cell lung cancer patients who are unsuitable for or refuse surgery [3, 4]. Because SBRT delivers a large dose in a single fraction, it is more demanding in positioning and treatment. Using only three-dimensional (3D) imaging can cause blurring of anatomical structures, whereas four-dimensional (4D) imaging can dynamically display the movement of organs during radiotherapy [5]. When the target position is affected by respiratory motion, the use of 4D-CT for localization and treatment planning can minimize the impact of respiration-induced uncertainties on the displacement of the target position. Consequently, 4D-CBCT has practical significance for verifying the target location and dose during treatment [6]. Meanwhile, adaptive radiotherapy (ART) based on CBCT, which modifies the treatment plan according to changes in the target area between treatment fractions, has clinical significance [7, 8]. However, the relevant studies are at present primarily limited to 3D-CBCT. Studies have shown that 4D-CBCT and 4D-CT used for ART can mitigate the impact of interfractional changes while reducing the planning target volume (PTV) and minimizing the radiation dose to normal tissue [9,10,11]. Harsolia et al. [9] compared various planning techniques, including 3D-conformal, 4D-union, 4D-offline adaptive, and 4D-online adaptive, to enhance the accuracy and decrease the PTV margin in image-guided radiotherapy using 4D-CBCT. The results revealed that 4D-CBCT is more effective in guiding adaptive radiotherapy than 3D-CBCT.
Nonetheless, 4D-CBCT suffers from low image contrast and poor quality due to the undersampling of the projections in each temporal phase [12]. Additionally, issues such as scatter artifacts, image lag, beam hardening, and patient movement during acquisition distort the CT values [13]. Although ART is a promising direction, these challenges hinder the clinical implementation of 4D-CBCT for dose calculation in ART [9, 11].

Research in the field of CBCT value correction primarily targets three types of artifacts: scatter, motion, and streak artifacts. Scatter artifacts can be corrected through Monte Carlo simulation [14], which simulates the transmission, scattering, and absorption of X-rays in human tissues to improve the accuracy of CBCT dose calculation. However, the motion artifacts remaining in CBCT can blur tumors and tissues within the lungs. Rather than correcting motion artifacts after acquisition, the clinical use of 4D-CBCT effectively reduces their generation, but inevitably introduces streak artifacts due to undersampling. To reduce streak artifacts, Li et al. [15] improved the image quality by increasing the scanning time and scanning dose, but this increases the dose delivered to the patient and reduces clinical efficiency. Accordingly, some studies use iterative algorithms such as total variation regularization [16] and non-local means [17] to preserve image edges and suppress noise. Wang et al. [18] proposed motion-compensated reconstruction based on prior knowledge to improve image quality. Exploiting the repetitiveness of patient respiratory motion, Huang et al. [19] optimized the registered deformation vector field (DVF) on this basis to further improve the efficiency and accuracy of reconstruction. In recent years, deep learning has been extensively used in medical-image classification, segmentation, denoising, and super-resolution reconstruction, and it is gradually being applied to the image correction of 4D-CBCT. The primary approaches are deep-learning models combined with other correction methods (4D-AirNet (2020) and CNN-MoCo (2023)) and stand-alone deep-learning network models. Given the over-smoothing of image edges and contrast reduction caused by iterative algorithms, Jiang et al.
[20] proposed SR-CNN (2018) to improve the sharpness of edges and anatomical structure details in undersampled CBCT. Sun et al. [21] proposed a U-net model combined with a transfer-learning strategy (2020). It uses transfer learning to fine-tune the 4D-CBCT enhanced by U-net, resulting in significant improvements in structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) compared with before fine-tuning. Later, the residual dense network (RDN, 2020) proposed by Madesta et al. [22] simulates streak artifacts to achieve correction of 4D-CBCT without affecting the anatomical information.

The correction of 4D-CBCT by generating 4D-sCT is a research hotspot. Thummerer et al. [23] used deep convolutional networks to generate synthetic CT (sCT) through paired training on single-phase images for dose calculation in lung-cancer radiotherapy. Because such training depends on the reproducibility of the patient's breathing, 4D-CBCT cannot rely on paired supervised data for model training. Usui et al. [24] used cycle consistent generative adversarial networks (CycleGAN) for unpaired training of images from the two domains, 4D-CT and 4D-CBCT. However, owing to the limited training data and the use of only a single respiratory phase during training, some bones were not fully recovered, and the robustness requires further improvement.

In the present study, 4D-CT and 4D-CBCT were trained in a paired manner in a network called contrastive learning (CL)-based cycle generative adversarial networks (CLCGAN), which combines the latest CL with CycleGAN [25,26,27]. CLCGAN was used to explore the mutual information present in 4D-CT and 4D-CBCT during training, with the aim of training a model capable of generating images with reduced streak artifacts. Ideally, CLCGAN selectively generates images with high similarity in the feature space. To evaluate the model performance, image quality and CT values were quantitatively assessed, and the accuracy of the dose distribution and calculation on the generated images was verified.

Materials and methods

Patient data

4D images of 20 patients with thoracic tumors were selected to train and test the deep-learning model. Patient data were obtained from a publicly available dataset in The Cancer Imaging Archive (TCIA), created by the National Cancer Institute [28, 29]. All the patients had locally advanced non-small cell lung cancer and received concurrent chemoradiotherapy, with a total dose ranging from 59.4 to 70.2 Gy delivered in daily 1.8 or 2 Gy fractions. The clinical information of all patients used for training and testing is shown in Table 1. Throughout their treatment, all patients underwent 4D-CT imaging at least once, and most received 4D-CBCT imaging during treatment fractions. Consequently, the dataset consisted of a total of 82 4D-CT and 507 4D-CBCT images from these 20 patients.

Table 1 Clinical information for 20 patients

Image data


4D-CT images were acquired on a 16-slice helical CT simulator (Brilliance Big Bore, Philips Medical Systems, Andover, MA, USA) under scanning conditions with a tube voltage of 120 kVp, tube currents of 50–114 mA, and exposure times of 3.53–5.83 ms. The respiratory signals obtained from the RPM respiratory gating system were divided into 10 phases from 0 to 90% in phase order, with the 0% phase corresponding to the end of inspiration. The slice thickness for each phase was 3 mm, and the image size was 512 × 512 with a pixel spacing of 0.9766 × 0.9766 mm2.


4D-CBCT images were acquired on a commercial CBCT scanner (On-Board Imager v1.3, Varian Medical Systems, Inc.) with 360° scanning at a tube voltage of 125 kVp, a tube current of 20 mA, and an exposure time of 20 ms. To ensure accurate calculation of the radiotherapy dose, CT number to electron density (CT-ED) calibration was performed on the 4D-CBCT with a CIRS Model 062M electron density phantom (CIRS, Norfolk, Virginia, USA). During scanning, the respiratory surrogate used for 4D-CT was integrated into the 4D-CBCT acquisition system. The projections were sorted into the same 0–90% phases according to the respiratory signal of the surrogate. Each phase was reconstructed using the Feldkamp–Davis–Kress reconstruction algorithm with a slice thickness of 3 mm, an image size of 512 × 512, and a pixel spacing of 0.8789 × 0.8789 mm2.

4D-sCT based on CLCGAN

Image preprocessing

The training dataset comprised 4D images of 10 phases from 20 patients. Each phase comprised 50 slices, for a total of 10,000 4D-CT and 10,000 4D-CBCT slices. Each patient's images were centered on the lung cancer region and included the whole lung. Each phase of the 4D-CT images was adjusted to the same size and resolution as the 4D-CBCT images using an open-source registration toolkit, elastix [30, 31]. The adjusted images were used for paired training with CLCGAN, and random flipping was applied during training for data augmentation.
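The random-flip augmentation can be sketched as follows. This is a minimal numpy illustration of the idea, not the authors' implementation; the function name and the choice of flipping both members of a CT/CBCT pair identically (so their registration is preserved) are our assumptions.

```python
import numpy as np

def random_flip_pair(ct_slice, cbct_slice, rng):
    """Apply the same random horizontal/vertical flip to a paired
    CT/CBCT slice so the spatial correspondence between them is kept."""
    if rng.random() < 0.5:  # horizontal flip
        ct_slice = np.flip(ct_slice, axis=1)
        cbct_slice = np.flip(cbct_slice, axis=1)
    if rng.random() < 0.5:  # vertical flip
        ct_slice = np.flip(ct_slice, axis=0)
        cbct_slice = np.flip(cbct_slice, axis=0)
    return ct_slice, cbct_slice

rng = np.random.default_rng(0)
ct = np.arange(16).reshape(4, 4).astype(float)
cbct = ct + 100.0                      # toy "paired" slice
ct_aug, cbct_aug = random_flip_pair(ct, cbct, rng)
# Whatever flips were drawn, the pairing is preserved pixel-for-pixel.
assert np.allclose(cbct_aug - ct_aug, 100.0)
```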

Network architecture

The CLCGAN network model applied the idea of CL to the dual-domain CycleGAN. It used only the features similar across the two domains for image generation, thereby removing streak artifacts. Therefore, CLCGAN comprised two branches: CycleGAN and CL. CycleGAN realized the mutual mapping of CBCT/CT to CT/CBCT to obtain the feature information of the two samples. CL imposed constraints on the feature space to better guide image generation. Figure 1a shows the network architecture of CLCGAN. The implementation details of these two branches are described as follows.

Fig. 1
figure 1

Architecture and module details of the CLCGAN network: a illustrates the overall architecture of CLCGAN; and b showcases the detailed principle of contrastive learning, where (b−1) and (b−2) show the internal diagrams of the generator and discriminator, respectively

CycleGAN contained two symmetric sub-networks for generating 4D-sCT (CT → sCBCT → sCT) and 4D-sCBCT (CBCT → sCT → sCBCT). Each sub-network comprised two generators and one discriminator. Figure 1b shows the architecture of the generators: each generator comprised a three-layer encoder, nine residual blocks, and a three-layer decoder, whereas the discriminator comprised a four-layer encoder. The two sub-networks were trained simultaneously to extract features from CBCT and CT and thus form a feature space for regularization. The network performance was improved by optimizing the loss function between the generated and original images; the model converges when the discriminator can no longer distinguish sCT and sCBCT from CT and CBCT. Ultimately, the removal of streak artifacts in 4D-CBCT was achieved by generating 4D-sCT, although the artifact removal alone was weak. Accordingly, we combined CL to constrain the feature space and remove streak artifacts in the latent space. CL is an unsupervised learning technique. The main idea is to label low-difference features with similar or common properties in CBCT and CT as "positive" and the rest as "negative". During training, only "positive" features were used for image reconstruction or recovery. To keep the model architecture unchanged, features were extracted directly from the encoder of the generator, and the features from each layer were fed to a two-layer multilayer perceptron. In the feature embedding space, a feature \(\hat{x}\) from one side (CT or CBCT) served as a query, whereas the other side contained the positive feature \(\hat{x}^{ + }\) and k negative features \(\{\hat{x}_{i}^{ - }\}_{i=1}^{k}\). Positive features were close to the query, so they were attracted to each other (non-streaking → ← non-streaking); otherwise, they were pushed apart (streaking ← → non-streaking).
To visualize the impact of CL, the features extracted for image generation with and without CL were visualized using t-distributed stochastic neighbor embedding (t-SNE) [32]. The results are shown in Fig. 2. The two sets of features were closer and overlapped more after using CL. When comparing two sets of features with t-SNE, a high degree of similarity between them causes the corresponding data points in the t-SNE's two-dimensional coordinates to largely overlap and interleave rather than exhibiting distinct boundaries. Therefore, the features selected for generating the sCT were free of streak artifacts.

Fig. 2
figure 2

t-SNE Plots of Learned Features with and without CL. a and b represent the feature distribution obtained without and with the incorporation of contrastive learning, respectively

Loss function

In the experiment, the final loss function included a loss function \(L_{cont}\) for enforcing the distribution of the specified features, a loss function \(L_{adv}\) for minimizing the difference between the expected and predicted values of 4D-CT/4D-CBCT, and a loss function \(L_{cyc}\) for minimizing the difference between the original images of 4D-CT/4D-CBCT and the generated images. To further preserve the structure and content information of the images, a frequency loss function \(L_{freq}\) was utilized to fully leverage the frequency domain information. The overall loss function is represented as

$$L_{{{\text{total}}}} = \lambda_{1} L_{cont} + \lambda_{2} L_{cyc} + \lambda_{3} L_{adv} + \lambda_{4} L_{freq} ,$$

where \(\lambda_{i}\) is the weight parameter for each term; we set \(\lambda_{1}\), \(\lambda_{2}\), \(\lambda_{3}\), and \(\lambda_{4}\) to 2, 1, 1, and 0.01, respectively.
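With the weights above, the total objective is a simple weighted sum; the function below is an illustrative stand-in (the name and signature are ours, not from the paper's code):

```python
def total_loss(l_cont, l_cyc, l_adv, l_freq,
               weights=(2.0, 1.0, 1.0, 0.01)):
    """Weighted sum of the four loss terms:
    L_total = lam1*L_cont + lam2*L_cyc + lam3*L_adv + lam4*L_freq."""
    lam1, lam2, lam3, lam4 = weights
    return lam1 * l_cont + lam2 * l_cyc + lam3 * l_adv + lam4 * l_freq

# e.g. total_loss(0.5, 0.2, 0.3, 1.0) = 2*0.5 + 0.2 + 0.3 + 0.01*1.0 = 1.51
```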

Contrastive loss: The features were normalized as \(f = E(\hat{x})\), \(f^{ + } = E(\hat{x}^{ + } )\), and \(f_{{\text{i}}}^{ - } = E(\hat{x}_{i}^{ - } )\), and the loss enforcing the canonical feature distribution is denoted as

$$L_{{{\text{cont}}}} ({\text{G}}_{SN}, {\text{G}}_{NS} ) = E_{s\sim S,n\sim N} \left[ { - \log \frac{{{\text{sim}}(f,f^{ + } )}}{{{\text{sim}}(f,f^{ + } ) + \sum\nolimits_{{{\text{i}} = 1}}^{{\text{k}}} {{\text{sim}}(f,f_{i}^{ - } )} }}} \right],$$
$${\text{sim}}(u,v) = \exp \left( {\frac{{u^{\top } v}}{{\left\| u \right\|\left\| v \right\|\tau }}} \right),$$

where sim(u, v) denotes the exponentiated cosine similarity between two normalized feature vectors, and τ is the temperature parameter, set to 0.07.
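The contrastive term is the standard InfoNCE objective. A minimal numpy sketch (our own, with τ = 0.07 as in the paper; the toy features stand in for encoder outputs):

```python
import numpy as np

def sim(u, v, tau=0.07):
    """exp(cosine similarity / tau) between two feature vectors."""
    return np.exp(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) * tau))

def contrastive_loss(f, f_pos, f_negs, tau=0.07):
    """InfoNCE: -log( sim(f, f+) / (sim(f, f+) + sum_i sim(f, f_i^-)) )."""
    pos = sim(f, f_pos, tau)
    neg = sum(sim(f, fn, tau) for fn in f_negs)
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(0)
f = rng.standard_normal(64)                      # query feature
f_pos = f + 0.01 * rng.standard_normal(64)       # nearly identical -> "positive"
f_negs = [rng.standard_normal(64) for _ in range(8)]
loss_good = contrastive_loss(f, f_pos, f_negs)
# Swapping in an unrelated "positive" makes the loss much larger:
loss_bad = contrastive_loss(f, f_negs[0], [f_pos] + f_negs[1:])
assert loss_good < loss_bad
```

Minimizing this loss pulls the query toward its positive (low-difference, non-streaking) feature and pushes it away from the k negatives.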

Adversarial loss: The discriminator \(D_{N}\)/\(D_{S}\) was trained to make its output on real 4D-CT/4D-CBCT close to 1 and its output on generated 4D-sCT/4D-sCBCT close to 0, whereas the generator was trained to make the discriminator's output on generated images as close to 1 as possible by minimizing \(L_{adv}\). The adversarial loss function is denoted as

$$L_{adv} (G_{SN}, {\text{D}}_{N} ) = E_{n\sim N} \left[ {\log D_{N} ({\text{n}})} \right] + E_{s\sim S} \left[ {\log (1 - D_{N} (G_{SN} ({\text{s}})))} \right],$$
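Given discriminator outputs in (0, 1), the adversarial term can be evaluated numerically; the sketch below is a numpy illustration under our own naming, not the paper's code:

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    """L_adv = E[log D(n)] + E[log(1 - D(G(s)))], with D outputs in (0, 1).
    The discriminator maximizes this; the generator counters by driving
    D(G(s)) toward 1."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A discriminating D (real -> ~1, fake -> ~0) scores higher than a fooled one
# (fake -> ~1), which is exactly the adversarial pressure on the generator:
strong = adversarial_loss(np.array([0.9, 0.95]), np.array([0.1, 0.05]))
fooled = adversarial_loss(np.array([0.9, 0.95]), np.array([0.9, 0.95]))
assert strong > fooled
```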

Cycle consistency loss: The generator \(G_{SN}\)/\(G_{NS}\) was trained to minimize \(L_{cyc}\) so that the difference between the generated image and the real sample s/n was minimized. The cycle consistency loss function is denoted as

$$L_{{{\text{cyc}}}} = E_{n\sim N} \left[ {\left\| {G_{SN} \left( {G_{NS} \left( n \right)} \right) - n} \right\|_{1} } \right] + E_{s\sim S} \left[ {\left\| {G_{NS} \left( {G_{SN} \left( s \right)} \right) - s} \right\|_{1} } \right],$$
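The cycle term is an L1 reconstruction penalty on round-tripped images. A numpy sketch with toy "generators" (we use the mean absolute error as the per-image L1 term; the lambdas below are illustrative stand-ins, not the trained networks):

```python
import numpy as np

def cycle_loss(n, s, G_SN, G_NS):
    """L_cyc = E[|G_SN(G_NS(n)) - n|_1] + E[|G_NS(G_SN(s)) - s|_1]."""
    return (np.abs(G_SN(G_NS(n)) - n).mean()
            + np.abs(G_NS(G_SN(s)) - s).mean())

# Toy generators that are exact inverses give zero cycle loss:
G_NS = lambda x: x + 50.0      # stand-in for the CT -> sCBCT mapping
G_SN = lambda x: x - 50.0      # stand-in for the CBCT -> sCT mapping
n = np.full((4, 4), 100.0)     # toy CT image
s = np.full((4, 4), 150.0)     # toy CBCT image
assert cycle_loss(n, s, G_SN, G_NS) == 0.0
```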

Frequency loss:

$$L_{freq} = E_{n\sim N} \left[ {\left\| {FT(G_{SN} (G_{NS} (n))) - FT(n)} \right\|_{2}^{2} } \right] + E_{s\sim S} \left[ {\left\| {FT(G_{NS} (G_{SN} (s))) - FT(s)} \right\|_{2}^{2} } \right],$$
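The frequency term compares the cycle-reconstructed image against the original in the Fourier domain. A numpy sketch using the 2-D FFT (names and the mean-squared reduction are our assumptions):

```python
import numpy as np

def freq_loss(img, recon):
    """Squared L2 distance between the 2-D Fourier transforms of the
    original image and its cycle reconstruction."""
    diff = np.fft.fft2(recon) - np.fft.fft2(img)
    return np.mean(np.abs(diff) ** 2)

img = np.random.default_rng(0).standard_normal((8, 8))
assert freq_loss(img, img) == 0.0       # perfect reconstruction
assert freq_loss(img, img + 1.0) > 0.0  # any residual shows up in frequency space
```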

Parameter selection

During training, a batch size of 1 and instance normalization were used. The training images were randomly cropped into 512 × 512 blocks in a paired manner for CL. The Adam optimizer with parameters \(\beta_{1}\) = 0.5 and \(\beta_{2}\) = 0.999 and a learning rate of 0.0002 was adopted, and the model was trained from scratch for 100 epochs. The entire network was implemented in the PyTorch framework on a deep-learning server (Intel(R) Xeon(R) Gold 6133 CPU @ 2.50 GHz, NVIDIA A100 80 GB, 256 GB RAM).

Evaluation methods

Image-quality assessment

To evaluate the effect of the CLCGAN model in removing image artifacts, we selected five cases comprising 2500 paired 4D-CT and 4D-CBCT slices not used in training for testing. The resolution and size of the testing data were kept consistent with the training data. The evaluation comprised two parts: comparing the generated 4D-sCT with the original 4D-CT, and comparing the 4D-sCT generated by CLCGAN and by CycleGAN individually.

To quantitatively evaluate the image quality, the 4D-CBCT and the 4D-sCT based on CycleGAN and CLCGAN were measured against the original 4D-CT using SSIM and PSNR. To enable better use of 4D-sCT for guidance and dose calculation in lung-cancer radiation therapy, the CT values of 4D-CBCT and 4D-sCT were measured against the 4D-CT using mean error (ME) and mean absolute error (MAE). To ensure an accurate evaluation of the training results, the precision of the registration was measured by calculating mutual information (MI). Lastly, paired t-tests were performed in Statistical Product and Service Solutions (SPSS) software to assess significant differences between all 4D-sCT and 4D-CBCT results. Because multiple hypothesis tests were conducted, all p-values were assessed with Bonferroni correction; differences were considered significant at p < 0.003. The corresponding expressions are shown below:

$$SSIM(X,Y) = \frac{{\left( {2{\mu}_{X} {\mu}_{Y} + C_{1} } \right)(2{\sigma}_{XY} + C_{2} )}}{{({\mu}_{X}^{2} + {\mu}_{Y}^{2} + C_{1} )\left( {{\sigma}_{X}^{2} + {\sigma}_{Y}^{2} + C_{2} } \right)}},$$
$$PSNR = 10\log_{10} \frac{{\max \left| {X\left( {i,j} \right)} \right|^{2} }}{MSE},$$
$$MSE = \frac{1}{M \times N}\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {(X(i,j) - Y(i,j))^{2} } } ,$$
$$ME(X,Y) = \frac{1}{M \times N}\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {(X(i,j) - Y(i,j))} } ,$$
$$MAE(X,Y) = \frac{1}{M \times N}\sum\limits_{i = 1}^{M} {\sum\limits_{j = 1}^{N} {|X(i,j) - Y(i,j)|} } ,$$
$$p_{i} = h_{i} /\left( {\sum\limits_{i = 0}^{N - 1} {h_{i} } } \right),$$
$$H\left( Y \right) = - \sum\limits_{i = 0}^{N - 1} {p_{i} } \log p_{i} ,$$
$$H\left( {X,Y} \right) = - \sum\limits_{x,y} {p_{xy} \left( {x,y} \right)\log p_{xy} (x,y)} ,$$
$$MI(X,Y) = H(X) + H(Y) - H(X,Y),$$

In the expression of SSIM, X represents 4D-CBCT or 4D-sCT, and Y represents 4D-CT. \(\mu_{X}\) and \(\mu_{Y}\) denote the average pixel values of images X and Y, \(\sigma_{X}\) and \(\sigma_{Y}\) their standard deviations, and \(\sigma_{XY}\) their covariance, whereas C1 and C2 are regularization constants taken as (0.01 × 2000)2 and (0.03 × 2000)2, respectively. The dynamic range of the image pixels was 4095. In the expressions of mean-square error (MSE), ME, and MAE, X represents 4D-CBCT or 4D-sCT, whereas Y represents 4D-CT; M and N represent the width and height of the input images, respectively. PSNR is obtained from the squared maximum pixel value divided by the MSE. In the MI-related formulas, X and Y denote two images, \(h_{i}\) is the number of pixels in image Y with gray level i, N is the number of gray levels in image Y, and \(p_{i}\) is the probability of gray level i. H(Y) denotes the entropy of an image, and H(X,Y) the joint entropy of X and Y. MI reflects the amount of information shared between two images, with values ranging from 0 to positive infinity: the higher the similarity or overlap between the images, the smaller the joint entropy and the greater the MI. After conducting paired t-tests, statistical significance was observed in the SSIM, PSNR, ME, MAE, and MI of the 4D-sCT images.
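The scalar metrics above can be sketched in numpy. SSIM is omitted for brevity (in practice a library implementation with the stated C1/C2 constants would be used); the MI sketch estimates the entropies from a joint histogram, with the bin count as our assumption:

```python
import numpy as np

def me(x, y):                   # mean error
    return np.mean(x - y)

def mae(x, y):                  # mean absolute error
    return np.mean(np.abs(x - y))

def psnr(x, y, peak=4095.0):    # peak signal-to-noise ratio, in dB
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def mutual_information(x, y, bins=32):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y), estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    p_xy = joint / joint.sum()
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    entropy = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())

rng = np.random.default_rng(0)
x = rng.uniform(0, 4095, (64, 64))
assert np.isclose(me(x, x + 10.0), -10.0)
assert np.isclose(mae(x, x - 10.0), 10.0)
assert psnr(x, x + 1.0) > psnr(x, x + 10.0)          # smaller error -> higher PSNR
shuffled = rng.permutation(x.ravel()).reshape(64, 64)
assert mutual_information(x, x) > mutual_information(x, shuffled)
```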

To measure the local information of CT values, the 4D-CBCT, 4D-CT, and 4D-sCT images of the five test patients were outlined with 35 × 35, 15 × 15, and 25 × 25 regions of interest (ROIs) in the lungs, bones, and soft tissues, respectively, and the mean CT values were measured. The measurements showed that the mean CT value difference between 4D-sCT and 4D-CT was smaller, and the images generated by CLCGAN had the smallest differences. Moreover, to evaluate the CT value errors of the lung tumor, the 4D-CBCT, 4D-CT, and 4D-sCT images of the five patients were outlined with 15 × 15 ROIs in the region of the lung tumor. The results indicated that the CT value error of CLCGAN was smaller.

Dose evaluation

To assess the accuracy of dose calculations, the dose distributions of 4D-CT, 4D-CBCT, and 4D-sCT were compared and the relative percentage difference (RPD) was calculated. Each phase of the 4D-CT for the five test patients was contoured for target delineation, and the GTV and PTV contours averaged over the ten phases were used for volumetric-modulated arc therapy planning in a treatment planning system (Monaco 5.1, Elekta). A prescription dose of 6000 cGy in 30 fractions was applied. Subsequently, the 4D-CBCT and the 4D-sCT generated by both methods were rigidly registered with the reference 4D-CT, and the structure contours and treatment plans from the reference 4D-CT were copied to each image. Dose calculations were performed on all images, and dose–volume histogram (DVH) parameters were assessed for the PTV, left lung, right lung, and spinal cord. For the PTV, the dose at D98% and D2% was calculated, whereas for the spinal cord, the dose at D2% was calculated. For the left and right lungs, the lung volume at V20Gy and V5Gy was calculated.

$$RPD = \frac{|A - F|}{{(A + F)/2}} \times 100\% ,$$

In the expression of RPD, A represents the dose or volume of 4D-CT, and F represents the dose or volume of 4D-CBCT or 4D-sCT (Cyc and CLC).
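The RPD above is a symmetric percentage difference normalized by the mean of the two values; a one-line Python sketch (the dose values in the example are illustrative, not from the study):

```python
def rpd(a, f):
    """Relative percentage difference between the 4D-CT value A and the
    compared 4D-CBCT/4D-sCT value F: |A - F| / ((A + F) / 2) * 100."""
    return abs(a - f) / ((a + f) / 2.0) * 100.0

# e.g. a 4D-CT dose of 60 Gy vs a compared dose of 57 Gy:
assert abs(rpd(60.0, 57.0) - 3.0 / 58.5 * 100.0) < 1e-9
```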


Tables 2 and 3 present the results of the image-quality evaluation. CLCGAN improved SSIM and PSNR from 0.771 and 22.31 dB to 0.980 and 29.15 dB, respectively. The ME and MAE of the overall CT values also decreased from − 116.70 and 220.29 to 3.20 and 70.76. Additionally, compared with CycleGAN, CLCGAN showed improvements of 0.11 and 0.42 dB in SSIM and PSNR, respectively, and reductions of 0.25 and 3.39 in ME and MAE. Paired t-tests showed that all the improvements and reductions were statistically significant, including the improvement of CLCGAN over CycleGAN in SSIM and PSNR.

Table 2 Evaluation Results of Structural Similarity and Peak Signal-to-Noise Ratio
Table 3 Evaluation results of mean error and mean absolute error

Additionally, Table 4 illustrates the MI between the registered 4D-CBCT, the 4D-sCT generated by the two methods, and the 4D-CT. The results reveal that the MI between the registered 4D-CBCT and 4D-CT is only 0.735, whereas the 4D-sCT shows a substantial improvement (p < 0.01), with CycleGAN and CLCGAN yielding respective improvements of 0.568 and 0.588. Paired t-tests confirmed that the improvements of the 4D-sCT were statistically significant, including the improvement of CLCGAN over CycleGAN.

Table 4 Evaluation results of mutual information

To illustrate the qualitative evaluation results, we provide image slices of all test patients, including 4D-CBCT, 4D-CT, and the two types of 4D-sCT (Figs. 3 and 4). Figure 3 displays the slices in three directions for the first test patient, whereas Fig. 4 shows axial slices for the remaining four patients. At the same window width and level, CLCGAN generated images with fewer artifacts in the lungs, more continuous lung texture, and clearer and more accurate details than CycleGAN. CLCGAN also performed better in restoring bone tissue and effectively recovered details of muscle and soft tissue.

Fig. 3
figure 3

Structural Images of Patient 1 in Different Directions. The four columns represent 4D-CBCT, 4D-CT, 4D-sCT(Cyc), and 4D-sCT(CLC) images, respectively. All images are displayed at the same window width and window level

Fig. 4
figure 4

Structural images of four test patients. The four columns represent 4D-CBCT, 4D-CT, 4D-sCT(Cyc), and 4D-sCT(CLC) images. All images are displayed at the same window width and window level

To visually demonstrate the results of CT value correction, we selected one patient and performed subtraction between each 4D-sCT and the 4D-CT, as well as between the two types of 4D-sCT, to obtain axial CT value difference images (Fig. 5). Both methods effectively preserved the overall structure in the 4D-sCT images. However, the CT value error relative to 4D-CT was evidently smaller in the images generated by CLCGAN. Particularly in the lungs and some bone structures, the difference between the CLCGAN-generated images and the 4D-CT images was smaller than that of the CycleGAN-generated images. Furthermore, we subtracted the dose distributions of the 4D-CBCT and 4D-sCT from that of the 4D-CT for this patient, producing the dose difference images (Fig. 5). The findings indicate that the dose difference between the 4D-sCT generated by CLCGAN and the 4D-CT is the smallest.

Fig. 5
figure 5

CT value difference maps and dose difference maps of Patient 2. The first row is the CT value difference, and the second row is the dose difference. 5-1 shows the difference between 4D-sCT (Cyc) and 4D-CT. 5-2 displays the difference between 4D-sCT (CLC) and 4D-CT. 5-3 represents the difference between 4D-sCT (Cyc) and 4D-sCT (CLC). 5-4 shows the difference between 4D-CBCT and 4D-CT. 5-5 displays the difference between 4D-sCT (Cyc) and 4D-CT. 5-6 displays the difference between 4D-sCT (CLC) and 4D-CT

Figure 6 depicts the quantitative evaluation of the localized 3D ROIs and the mean CT difference in the ROIs at different phases for all test patients at the same window width and level. CLCGAN showed significant improvements in the restoration of the lung, bone, and soft tissue: the absolute mean differences from 4D-CT decreased from 137.31, 183.15, and 50.67 to 66.28, 62.91, and 43.72, respectively. Furthermore, the artifact removal in the lungs, bones, and soft tissues was also significantly improved with CLCGAN relative to CycleGAN, with decreases of 18.00, 20.94, and 5.70, respectively.

Fig. 6
figure 6

Differences in mean CT values for the regions of interest (bone, lung, and soft tissue) compared with 4D-CT

Table 5 provides the CT value errors; the errors for each patient were acquired by delineating the regions of interest for each phase. Comparing the results of 4D-CBCT and 4D-sCT with the ground truth 4D-CT, the errors of 4D-sCT are smaller than those of 4D-CBCT, and CLCGAN demonstrates lower errors for the lung tumor than CycleGAN.

Table 5 CT difference of lung tumor for 5 tested patients

The dose-calculation results are shown in Tables 6 and 7, which present the average dose difference relative to 4D-CT for the five patients and the dose difference relative to 4D-CT for each patient, respectively. In all dose-calculation results, the 4D-sCT showed a significant improvement over 4D-CBCT, with relative differences close to zero. CLCGAN performed most accurately in the dose calculation for the left lung (V5Gy), the right lung (V5Gy, V20Gy), the therapeutic target area (D98%), and the spinal cord (D2%). Specifically, we show the dose distribution and dose–volume histograms for one test patient (Fig. 7). CT1, CT2, CT3, and CT4 represent the dose distributions for the reference 4D-CT, 4D-sCT (CLCGAN), 4D-CBCT, and 4D-sCT (CycleGAN), respectively. Evidently, CT2 most closely resembled the dose curve of the reference CT in terms of the dose fall-off in the target region and the dose at 50% volume for the right lung and spinal cord.

Table 6 Average results of dose calculations for all patients
Table 7 Results of dose calculations for all patients
Fig. 7
figure 7

Dose distribution and dose–volume histograms (DVHs) of Patient 2: a 4D-CT, 4D-CBCT, and 4D-sCT (CLC); and b 4D-CT, 4D-CBCT, and 4D-sCT (Cyc)


4D-CBCT is an imaging technique that can display lung motion in real time. It has great practical significance in conventional radiotherapy and SBRT for lung cancer. However, factors such as streak artifacts caused by insufficient projection acquisition at each phase and scatter artifacts during acquisition can affect the accuracy of CT values. Such distortion can reduce the imaging quality of 4D-CBCT, make dose calculations imprecise (Fig. 7), and hinder the progress of 4D-CBCT image-based ART [24]. Therefore, we proposed a network framework called CLCGAN to exploit the feature-extraction capability of CL and thus improve the image quality of the generative model.

To reduce the slight anatomical displacement caused by patient respiratory motion [34], we performed deformable registration of 4D-CT and 4D-CBCT before training, and the registered 4D-CT was used as the ground truth for validation. During training, we selected 10 phases. CycleGAN achieved better bone recovery in 4D-sCT than training with a single phase [24]. However, as shown in Figs. 3 and 4, severe artifacts remained in the lungs, and the lung texture was unclear. These blurry artifacts can interfere with the clinical assessment of small structures, such as blood vessels and airways. Our network learned to remove streak artifacts through feature selection before generating the images. As a result, the 4D-sCT obtained by CLCGAN greatly reduced the streak artifacts in the lungs; the lung texture was clearer, the bone tissues were more accurate, and the results were closer to the real 4D-CT. Furthermore, the quantitative evaluation in Table 2 shows a statistically significant improvement in SSIM and PSNR for our results (p < 0.01). Owing to mode collapse and unstable losses, generative adversarial networks can generate unreal, blurred, and under-diverse images [35]. CycleGAN failed to correctly recover soft tissues within some parts of the chest wall (patients 2, 3, 4, and 5) and certain high-CT-value regions near the spine (patient 5). Conversely, our method reduced these distortion effects (red lines in Fig. 4). However, our method slightly over-corrects in places (green lines in Fig. 4); for example, the brightening of the pericardial region of patient 4 caused by a streak artifact is synthesized to appear even brighter. This over-correction may be attributed to overlearning the training dataset and the model's complexity. In the future, this issue may be avoided by reducing the model's complexity and fine-tuning the model parameters on the training dataset [36].
Moreover, Table 4 shows that the mutual information between the registered 4D-CBCT and 4D-CT is 0.735 ± 0.08, whereas the 4D-sCT generated by CycleGAN and CLCGAN reach 1.303 ± 0.08 and 1.323 ± 0.08, respectively. These results indicate that the generated 4D-sCT recovers lung texture, bone, and soft tissue, leading to higher mutual information. Compared with CycleGAN, CLCGAN leaves fewer residual artifacts and recovers detail better, and thus attains the highest mutual information.
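Mutual information of the kind reported in Table 4 can be estimated from a joint intensity histogram. The following is a minimal NumPy sketch; the bin count and the use of natural logarithms are illustrative choices, not the exact settings of the evaluation above.

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Mutual information (in nats) of two images via a joint intensity
    histogram. Higher MI indicates stronger statistical dependence."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()                 # joint probability p(x, y)
    px = pxy.sum(axis=1, keepdims=True)       # marginal p(x), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal p(y), shape (1, bins)
    nz = pxy > 0                              # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Toy example: a slice compared with itself and with a misaligned copy.
rng = np.random.default_rng(1)
ct = rng.normal(0.0, 1.0, size=(64, 64))
print(mutual_information(ct, ct))                      # self-MI: entropy of the binned image
print(mutual_information(ct, np.roll(ct, 8, axis=0)))  # misalignment lowers MI
```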

Given that 4D-CBCT can be used for ART, and that accurate dose calculation is required when it is, the CT values had to be restored while the image quality was improved [9, 24]. We therefore calculated the ME and MAE of the overall CT values and produced CT and dose difference maps for the generated images, as shown in Table 3 and Fig. 5. The ME and MAE of the CT values were significantly reduced, and the dose differences were significantly decreased. The quantitative evaluation of CT values within local ROIs is shown in Fig. 6: CLCGAN significantly improved the restoration of the lungs, bones, and soft tissues, with the smallest differences from 4D-CT. The improvements were most pronounced in the lungs and bones, consistent with previous 4D-sCT studies [23, 24, 34]. Additionally, Table 5 focuses on the CT value errors in the tumor region and shows that CLCGAN minimizes both the ME and MAE of the CT values within the lung tumor.
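For reference, the ME and MAE of CT values, over the whole image or restricted to an ROI, reduce to a few lines of NumPy. The mask-based restriction below mirrors how local evaluations (lungs, bones, soft tissue, tumor) are typically performed; the toy arrays and the ROI definition are placeholders, not our actual segmentations.

```python
import numpy as np

def me_mae(sct, ct, mask=None):
    """Mean error and mean absolute error of CT values (HU), optionally
    restricted to a region of interest given as a boolean mask."""
    diff = sct.astype(np.float64) - ct.astype(np.float64)
    if mask is not None:
        diff = diff[mask]
    return float(diff.mean()), float(np.abs(diff).mean())

# Toy example: a synthetic-CT slice vs. ground truth, with a fake "lung" ROI.
rng = np.random.default_rng(2)
ct = rng.uniform(-1000, 1000, size=(32, 32))
sct = ct + rng.normal(5, 20, size=(32, 32))   # small systematic HU offset
lung_roi = ct < -500                          # placeholder low-density mask
print(me_mae(sct, ct))            # global ME, MAE
print(me_mae(sct, ct, lung_roi))  # ME, MAE within the ROI
```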

4D images used for dose calculation to guide conventional and SBRT adaptive radiation therapy have been shown to improve target repeatability while reducing target volume and the radiation dose to normal tissues [9, 11]. Sonke et al. treated sixty-five lung cancer patients with frameless SBRT to 54 Gy in three fractions and found that, even with considerable breathing motion, the PTV margins could safely be kept small [37]. Similarly, Bellec et al. demonstrated a reduction in PTV in thirty-two lung cancer patients who received a prescribed dose of 48–54 Gy in three to six fractions under 4D-CBCT guidance [38]. Additionally, Harsolia et al. applied 3D-CBCT- and 4D-CBCT-guided ART to eight lung cancer patients receiving a prescription dose of 63 Gy in thirty-five fractions, and 4D-CBCT achieved the best results in decreasing PTV volume and normal tissue doses [9]. In our study, a prescription dose of 60 Gy in thirty fractions was applied. Tables 6 and 7 report the dose calculations performed on the test patients. Across the five patients, the mean relative differences decreased for the therapeutic target area and all critical organs, with all relative differences close to 0. Compared with CycleGAN-based 4D-sCT, CLCGAN further improved the dose calculation for the V5Gy of the left and right lungs, the V20Gy of the right lung, the D98% of the therapeutic target area, and the D2% of the spinal cord, although it did not achieve better results for the V20Gy of the left lung and the D2% of the therapeutic target area.
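The dose metrics compared above (VxGy and Dx%) can be read off a cumulative dose–volume histogram. A minimal sketch follows, assuming the structure's per-voxel dose values are available as an array; real treatment planning systems differ in interpolation details, so this is illustrative only.

```python
import numpy as np

def v_dose(dose, threshold_gy):
    """VxGy: percentage of the structure volume receiving at least
    `threshold_gy` (e.g. V5Gy, V20Gy for the lungs)."""
    dose = np.asarray(dose, dtype=np.float64).ravel()
    return 100.0 * np.mean(dose >= threshold_gy)

def d_percent(dose, volume_percent):
    """Dx%: minimum dose received by the hottest x% of the structure volume,
    e.g. D98% is the dose covering 98% of the volume on the cumulative DVH."""
    dose = np.asarray(dose, dtype=np.float64).ravel()
    return float(np.percentile(dose, 100.0 - volume_percent))

# Toy example: a synthetic dose array for one structure.
rng = np.random.default_rng(3)
lung_dose = rng.gamma(2.0, 6.0, size=10000)   # placeholder dose values in Gy
print(round(v_dose(lung_dose, 5.0), 1))       # V5Gy (%)
print(round(v_dose(lung_dose, 20.0), 1))      # V20Gy (%)
print(round(d_percent(lung_dose, 98.0), 2))   # D98% (Gy)
print(round(d_percent(lung_dose, 2.0), 2))    # D2% (Gy)
```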

Although CLCGAN effectively corrected 4D-CBCT, it did not improve the recovery of structures such as the blood vessels inside the heart, which may contribute to the relatively low PSNR. As Table 4 shows, the registration accuracy is not particularly high, which may also partly explain the poor blood vessel recovery and the low PSNR [39]. When removing streak artifacts, we did not account for the small amount of subtle lung texture present in the feature maps [25, 26]. Moreover, faster CBCT scanners may be used in clinical practice, resulting in fewer projections per respiratory phase and larger angular gaps between them, which can degrade image quality. We did not perform experiments under such conditions, so the robustness of our method remains untested in that regime; this is a limitation of our approach. In the future, collecting clinical data from multiple centers or simulating datasets with sparse projections at different angles may address these limitations.


We demonstrated the ability of CLCGAN to generate 4D-sCT from undersampled 4D-CBCT. Satisfactory results were obtained in image quality assessment, CT value evaluation, and dose calculation, with same-day 4D-CT images as the reference. Therefore, the CLCGAN-corrected 4D-CBCT can be used for dose calculation in lung cancer.

Availability of data and materials

The datasets used during the current study are available from the corresponding author on reasonable request.



Abbreviations

4D-CBCT: Four-dimensional cone beam computed tomography
CLCGAN: Contrastive learning-based cycle generative adversarial networks
ART: Adaptive radiotherapy
4D-sCT: Four-dimensional synthetic computed tomography
4D-CT: Four-dimensional computed tomography
SSIM: Structural similarity index measure
PSNR: Peak signal-to-noise ratio
CycleGAN: Cycle consistent generative adversarial networks
PTV: Planning target volume
SBRT: Stereotactic body radiotherapy
DVF: Deformation vector field
CL: Contrastive learning
RUL: Right upper lobe
RLL: Right lower lobe
LUL: Left upper lobe
LLL: Left lower lobe
t-SNE: T-distributed stochastic neighbor embedding
ME: Mean error
MAE: Mean absolute error
MSE: Mean-square error
MI: Mutual information
ROI: Regions of interest
DVH: Dose–volume histogram
RPD: Relative percentage difference


  1. Wang K, Tepper JE. Radiation therapy-associated toxicity: etiology, management, and prevention. CA Cancer J Clin. 2021;71(5):437–54.

  2. Dhont J, Harden SV, Chee LYS, Aitken K, Hanna GG, Bertholet J. Image-guided radiotherapy to manage respiratory motion: lung and liver. Clin Oncol (R Coll Radiol). 2020;32(12):792–804.

  3. Thai AA, Solomon BJ, Sequist LV, Gainor JF, Heist RS. Lung cancer. Lancet. 2021;398(10299):535–54.

  4. Shultz DB, Diehn M, Loo BW Jr. To SABR or not to SABR? Indications and contraindications for stereotactic ablative radiotherapy in the treatment of early-stage, oligometastatic, or oligoprogressive non-small cell lung cancer. Semin Radiat Oncol. 2015;25(2):78–86.

  5. Vinod SK, Hau E. Radiotherapy treatment for lung cancer: current status and future directions. Respirology. 2020;25(Suppl 2):61–71.

  6. Vergalasova I, Cai J. A modern review of the uncertainties in volumetric imaging of respiratory-induced target motion in lung radiotherapy. Med Phys. 2020;47(10):e988–1008.

  7. Giacometti V, Hounsell AR, McGarry CK. A review of dose calculation approaches with cone beam CT in photon and proton therapy. Phys Med. 2020;76:243–76.

  8. Yoo S, Yin FF. Dosimetric feasibility of cone-beam CT-based treatment planning compared to CT-based treatment planning. Int J Radiat Oncol Biol Phys. 2006;66(5):1553–61.

  9. Harsolia A, Hugo GD, Kestin LL, Grills IS, Yan D. Dosimetric advantages of four-dimensional adaptive image-guided radiotherapy for lung tumors using online cone-beam computed tomography. Int J Radiat Oncol Biol Phys. 2008;70(2):582–9.

  10. Britton KR, Starkschall G, Liu H, Chang JY, Bilton S, Ezhil M, John-Baptiste S, Kantor M, Cox JD, Komaki R, Mohan R. Consequences of anatomic changes and respiratory motion on radiation dose distributions in conformal radiotherapy for locally advanced non-small-cell lung cancer. Int J Radiat Oncol Biol Phys. 2009;73(1):94–102.

  11. O’Brien RT, Dillon O, Lau B, George A, Smith S, Wallis A, Sonke JJ, Keall PJ, Vinod SK. The first-in-human implementation of adaptive 4D cone beam CT for lung cancer radiotherapy: 4DCBCT in less time with less dose. Radiother Oncol. 2021;161:29–34.

  12. Sonke JJ, Zijp L, Remeijer P, van Herk M. Respiratory correlated cone beam CT. Med Phys. 2005;32(4):1176–86.

  13. Schulze R, Heil U, Gross D, Bruellmann DD, Dranischnikow E, Schwanecke U, Schoemer E. Artefacts in CBCT: a review. Dentomaxillofac Radiol. 2011;40(5):265–73.

  14. Thing RS, Bernchou U, Hansen O, Brink C. Accuracy of dose calculation based on artefact corrected cone beam CT images of lung cancer patients. Phys Imaging Radiat Oncol. 2017;1:6–11.

  15. Li T, Xing L. Optimizing 4D cone-beam CT acquisition protocol for external beam radiotherapy. Int J Radiat Oncol Biol Phys. 2007;67(4):1211–9.

  16. Sidky EY, Pan X. Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Phys Med Biol. 2008;53(17):4777–807.

  17. Jia X, Tian Z, Lou Y, Sonke JJ, Jiang SB. Four-dimensional cone beam CT reconstruction and enhancement using a temporal nonlocal means method. Med Phys. 2012;39(9):5592–602.

  18. Wang J, Gu X. Simultaneous motion estimation and image reconstruction (SMEIR) for 4D cone-beam CT. Med Phys. 2013;40(10): 101912.

  19. Huang X, Zhang Y, Chen L, Wang J. U-net-based deformation vector field estimation for motion-compensated 4D-CBCT reconstruction. Med Phys. 2020;47(7):3000–12.

  20. Jiang Z, Chen Y, Zhang Y, Ge Y, Yin FF, Ren L. Augmentation of CBCT reconstructed from under-sampled projections using deep learning. IEEE Trans Med Imaging. 2019;38(11):2705–15.

  21. Sun L, Jiang Z, Chang Y, Ren L. Building a patient-specific model using transfer learning for four-dimensional cone beam computed tomography augmentation. Quant Imaging Med Surg. 2021;11(2):540–55.

  22. Madesta F, Sentker T, Gauer T, Werner R. Self-contained deep learning-based boosting of 4D cone-beam CT reconstruction. Med Phys. 2020;47(11):5619–31.

  23. Thummerer A, Seller Oria C, Zaffino P, Visser S, Meijers A, GuterresMarmitt G, Wijsman R, Seco J, Langendijk JA, Knopf AC, Spadea MF, Both S. Deep learning-based 4D-synthetic CTs from sparse-view CBCTs for dose calculations in adaptive proton therapy. Med Phys. 2022;49(11):6824–39.

  24. Usui K, Ogawa K, Goto M, Sakano Y, Kyougoku S, Daida H. A cycle generative adversarial network for improving the quality of four-dimensional cone-beam computed tomography images. Radiat Oncol. 2022;17(1):69.

  25. Spiegl B. Contrastive unpaired translation using focal loss for patch classification. arXiv preprint arXiv:2109.12431; 2021.

  26. Chen X, Pan J, Jiang K, et al. Unpaired deep image deraining using dual contrastive learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022. p. 2017–2026.

  27. Zhu J Y, Park T, Isola P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision. 2017. p. 2223–2232.

  28. Clark K, Vendt B, Smith K, Freymann J, Kirby J, Koppel P, Moore S, Phillips S, Maffitt D, Pringle M, Tarbox L, Prior F. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging. 2013;26(6):1045–57.

  29. Hugo GD, Weiss E, Sleeman WC, Balik S, Keall PJ, Lu J, Williamson JF. A longitudinal four-dimensional computed tomography and cone beam computed tomography dataset for image-guided radiation therapy research in lung cancer. Med Phys. 2017;44(2):762–71.

  30. Klein S, Staring M, Murphy K, Viergever MA, Pluim JP. Elastix: a toolbox for intensity-based medical image registration. IEEE Trans Med Imaging. 2010;29(1):196–205.

  31. Shamonin DP, Bron EE, Lelieveldt BP, Smits M, Klein S, Staring M; Alzheimer's Disease Neuroimaging Initiative. Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease. Front Neuroinform. 2014;7:50.

  32. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res 2008;9(11).

  33. Pluim JPW, Maintz JBA, Viergever MA. Mutual-information-based registration of medical images: a survey. IEEE Trans Med Imaging. 2003;22(8):986–1004.

  34. Riblett MJ, Christensen GE, Weiss E, Hugo GD. Data-driven respiratory motion compensation for four-dimensional cone-beam computed tomography (4D-CBCT) using groupwise deformable registration. Med Phys. 2018;45(10):4471–82.

  35. Saxena D, Cao J. Generative adversarial networks (GANs) challenges, solutions, and future directions. ACM Comput Surveys (CSUR). 2021;54(3):1–42.

  36. Ying X. An overview of overfitting and its solutions. J Phys Conf Ser. 2019;1168:022022.

  37. Sonke JJ, Rossi M, Wolthaus J, van Herk M, Damen E, Belderbos J. Frameless stereotactic body radiotherapy for lung cancer using four-dimensional cone beam CT guidance. Int J Radiat Oncol Biol Phys. 2009;74(2):567–74.

  38. Bellec J, Arab-Ceschia F, Castelli J, Lafond C, Chajon E. ITV versus mid-ventilation for treatment planning in lung SBRT: a comparison of target coverage and PTV adequacy by using in-treatment 4D cone beam CT. Radiat Oncol. 2020;15(1):54.

  39. Tanabe Y, Ishida T. Quantification of the accuracy limits of image registration using peak signal-to-noise ratio. Radiol Phys Technol. 2017;10:91–4.





The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Changzhou Social Development Project, National Natural Science Foundation of China, Natural Science Foundation of Jiangsu Province, Jiangsu Provincial Medical Key Discipline Construction Unit (Oncology Therapeutics (Radiotherapy)), Social Development Project of Jiangsu Provincial Key Research & Development Plan, General Project of Jiangsu Provincial Health Commission (Grant Numbers CE20235063, 62371243, BK20231190, JSDW202237, BE2022720, and M2020006).

Author information

Authors and Affiliations



NC participated in the design of the study, carried out the study, performed the statistical analysis, and drafted the manuscript. JD, ZW, HZ, SZ, LG, JS, and KX helped to carry out the study. XN conceived and designed the study and edited and reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Xinye Ni.

Ethics declarations

Ethics approval and consent to participate

Not applicable. CT data were obtained from The Cancer Imaging Archive.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article


Cite this article

Cao, N., Wang, Z., Ding, J. et al. A 4D-CBCT correction network based on contrastive learning for dose calculation in lung cancer. Radiat Oncol 19, 20 (2024).
