Multiple comparisons permutation test for image based data mining in radiotherapy

Chen, Chun; Witte, Marnix; Heemsbergen, Wilma; Herk, Marcel van

doi:10.1186/1748-717X-8-293

Methodology
Open access
Published: 23 December 2013

Multiple comparisons permutation test for image based data mining in radiotherapy

Chun Chen¹,
Marnix Witte¹,
Wilma Heemsbergen¹ &
…
Marcel van Herk¹

Radiation Oncology volume 8, Article number: 293 (2013) Cite this article

5748 Accesses
65 Citations
2 Altmetric
Metrics details

Abstract

Comparing incidental dose distributions (i.e. images) of patients with different outcomes is a straightforward way to explore dose-response hypotheses in radiotherapy. In this paper, we introduced a permutation test that compares images, such as dose distributions from radiotherapy, while tackling the multiple comparisons problem. A test statistic T_max was proposed that summarizes the differences between the images into a single value and a permutation procedure was employed to compute the adjusted p-value. We demonstrated the method in two retrospective studies: a prostate study that relates 3D dose distributions to failure, and an esophagus study that relates 2D surface dose distributions of the esophagus to acute esophagus toxicity. As a result, we were able to identify suspicious regions that are significantly associated with failure (prostate study) or toxicity (esophagus study). Permutation testing allows direct comparison of images from different patient categories and is a useful tool for data mining in radiotherapy.

Introduction

When planning a radiotherapy treatment, a compromise is made between coverage of the target and exposure of Organs At Risk (OAR). While the dose to the designated target is generally uniform and homogeneous between patients, the dose to surrounding structures can be highly variable, depending on patient geometries, tumor locations, and treatment techniques. Such heterogeneous incidental dose distributions in patients might “accidentally” lead to different treatment outcomes regarding tumor control (e.g. if subclinical disease is important) or normal tissue toxicity. Therefore, applying data mining techniques to incidental dose distributions gives the possibility to explore dose patterns that are associated with clinical outcomes.

The purpose of introducing data mining in radiotherapy is to explore hypotheses for dose-response relationships. In cancer radiotherapy treatment, variations in stem cells, tumor microscopic disease and radiosensitivity distributions can be expected to affect dose-response relationships. Unfortunately, many of them are unknown. Data mining on incidental dose may yield suspicious anatomical features from which –based on biological or clinical considerations– hypotheses for dose-response relationships can be formulated. If validated, such dose-response relationships would eventually provide evidence for better treatment planning, such as refined knowledge of the clinical target volume (CTV), optimal dose painting inside the GTV and more effective sparing of OARs.

Several studies have focused on exploring dose-response relationships from a different perspective than the conventional dose volume histogram (DVH) based method[1–3]. These methods include either exploring the characteristics of dose distributions (e.g. eccentricity, homogeneity), or applying an advanced classifier (e.g. neural network). However, these methods are not easily applicable in the situation where the hypothesized region is a priori not known. Directly comparing dose distributions is then a straightforward method of exploring dose-response relationships. Since the dose at each voxel is compared without prior anatomical or geometrical based hypothesis, a voxel-by-voxel based testing is suitable for hypothesis generation, i.e., to localize suspicious regions. In a prostate study[4], the 3D dose prescribed to prostate patients were registered to an anatomy grid and tested voxel-by-voxel (t-test) for relations with failure. Results indicate that a cluster of voxels outside the prostate yield a p-value of less than 0.05. However, obtaining a p-value at every voxel is not yet the complete result. Since a large number of voxels were tested simultaneously, it is likely that the null hypothesis was incorrectly rejected at some voxels (type I errors), and this is the so called multiple comparisons problem.

The aim of this paper is to introduce a multiple comparisons permutation test to compare images between patients in radiotherapy studies. We begin with describing the methodology. Afterwards, we demonstrate the validity of this method with simulations. Finally, we give two examples of applying permutation test in radiotherapy: one study that relates dose to failure for prostate cancer patients[4] and another study that relates dose to acute esophagus toxicity for non-small cell lung cancer (NSCLC) patients.

Materials and methods

A permutation test involves five steps: 1) register images from different patients, 2) form a null hypothesis, 3) define a scalar test statistic, 4) generate random samples by permuting the true labels of the patients and extract the test statistic from each random sample, 5) calculate the adjusted p-value from the distribution of the test statistic. Thus, instead of a p-value for every voxel, this test gives a single p-value to describe the difference between two imaging datasets.

Suppose we observe a sample of patients with two outcomes: non-event (N) and event (E). These patients are considered to be representative for the entire population. To compare the dose distributions between the two groups, the first step is to register the dose distributions of all patients into the same grid, through an image registration method[5, 6]. The null hypothesis then states that there be no difference in dose distributions between the N and E labeled groups. In the following part, we introduce a test statistic T_max, and describe the permutation procedure to compute the adjusted p-value.

Test statistic

In order to compare the dose distribution for a sample i (randomly drawn from the study population) that includes two outcome groups N and E, the most straightforward way is to compute their average dose difference at each voxel, resulting in a dose difference map. To account for multiple comparisons, we can choose the maximum value of such a dose difference map as a single number test statistic. However, the maximum will not be consistent over all random samples (e.g. i = 1,…,1000), because it is highly sensitive to the variation or standard deviation (SD) of the dose difference at each voxel over all random samples. For instance, if voxel 1 has an average dose difference of 10 Gy in sample i but the SD of the dose difference is 10 Gy over all samples, while voxel 2 has an average dose difference of 8 Gy in sample i and the SD of the dose difference is 1 Gy over all samples, the maximum dose difference for sample i would be 10 Gy as derived from voxel 1. In fact, it is more likely that in sample i, the large 10 Gy dose difference for voxel 1 is due to chance, because it has a large variation over all samples. To account for this effect, the average dose difference d_i,k for sample i,(i = 1,…,N_p) between the E and N groups at voxel k,(k = 1,…,N_v) should be normalized into T_i,k according to an estimate of its SD:

\begin{align} d_{i, k} = μ_{E, i, k} - μ_{N, i, k} \end{align}

(1)

\begin{align} T_{i, k} = \frac{d_{i, k}}{σ_{k}} \end{align}

(2)

where μ_E,i,k and μ_N,i,k are the average dose at voxel k for group E and N in sample i, and σ_k is the standard deviation of d_i,k over N_p samples. σ_k is computed over the random samples generated from the permutation procedure, as described in the following part. As a result, we obtain a normalized dose difference map (or T_i,k map) for sample i. The test statistic T_max,i is then selected as the maximum value of the T_i,k map. Unlike a voxel-by-voxel based test, T_max,i gives a single number that summarizes the discrepancy of the dose distributions between the two label groups, rather than the discrepancy of a particular voxel. Therefore, T_max accounts for multiple comparisons. Clearly, T_max is not the only option for extracting a single value test statistic from the T_k map. Other test statistics like the x percentile, are also eligible[7]. However, T_max is often chosen for its strong control over Type 1 errors[8].

Permutation test

We then introduce a permutation procedure that generates random samples under the null hypothesis, such that the distribution of T_max,i is determined. A permutation test relies on the rearrangement of the outcome labels. Under the null hypothesis that there is no significant dose difference between group E and N, labels E and N are exchangeable. Thus randomly permuting the labels of the observed sample gives a new randomized sample. For example, in a sample of 5 patients with 3 N and 2 E labels, there are 10 possible label sets, as shown in Table1. If the first case is the true labelling that we observed, the other 9 labellings are assigned by permutation. In practice, the numbers of both E and N labels are large in the sample, leading to an extremely large and unfeasible number of possible permutations. However, it has been shown in[9] that when the number of patients in the observed sample is large, 1000 permutations can effectively approximate the distribution for the null hypothesis. The permutation test contains the following steps: For every permuted sample i, a d_i,k is computed for every voxel k. After 1000 permutations, the standard deviation σ_k of {d_i,k,i = 1,…,1000} at voxel k is computed and the normalized dose difference maps T_i,k and ${\tilde{T}}_{k}$ are generated for every permuted sample i and the observed sample, respectively. Subsequently, the maximum T_max,i of T_i,k is extracted for every permuted sample i and the maximum ${\tilde{T}}_{max}$ of ${\tilde{T}}_{k}$ is extracted from the observed sample. Thus, after 1000 permutations, we obtain a distribution of T_max,i under the null hypothesis. Finally, the adjusted p-value is computed as the proportion of the permutation samples that yield a higher T_max value than the ${\tilde{T}}_{max}$ observed with the true labels. If the adjusted p-value is smaller than significance level α (e.g. 0.05), we reject the null hypothesis. Furthermore, the (1-α) percentile of the T_max,i distribution gives a threshold value T^∗. In the observed normalized dose difference map ${\tilde{T}}_{k}$ , voxels higher than T^∗ show significant dose difference between E and N groups. The mathematical formulation of the permutation test is presented in more detail in Appendix A: the multiple comparisons permutation test.

Table 1 The 10 possible label combinations of 3 Ns and 2 Es

Full size table

Simulation

To demonstrate the permutation test, we conducted simulations on two groups of artificially generated dose images (with 128 × 128 pixels). Each group had 50 images. As seen in Figure1(a) and1(b), group E had a homogeneous average dose of 60 Gy, while for group N, a block (50 × 50) of average 58 Gy was located inside the dose image. At each pixel, we simulated the additive dose variation as a normal distribution N ∼ (0,σ). Figure1(c) illustrates the simulated values for σ. Large variations were simulated in the upper part (σ_max = 10 Gy), while small variations were in the lower part (σ_min = 1 Gy). After adding noise to the average dose, a smooth Gaussian filter was applied to each patient’s dose image. The purpose of generating spatially correlated variation and applying Gaussian filter is to mimic actual spatially correlated incidental dose in planning. We first illustrate how the the multiple comparison permutation test works step-by-step. Afterwards, we conduct a sensitivity analysis to show how the result changes according to the average dose difference and the sample size, compared to the voxel-by-voxel based t-test.

First, a voxel-by-voxel based t-test was applied to these two groups. Pixels obtaining a p < 0.05 are illustrated in Figure2(a), showing a big central region as well as many isolated spots. These isolated spots obtained a low p-value by chance. For the permutation test, Figures2(b)–2(d) show the average dose difference map for the observed sample, the SD of the average dose difference from 1000 permutation random samples, and the consequent normalized average dose difference map for the observed sample, respectively. The distribution of T_max,i from 1000 random samples is illustrated in Figure2(e). The ${\tilde{T}}_{max}$ from the observed sample is much larger than any of the T_max,i, thus the adjusted p-value is zero, i.e. there is a highly significant dose difference between the two groups. The 95% of the T_max,i distribution gives a threshold T^∗, by applying it to the ${\tilde{T}}_{k}$ map, we obtained the second region illustrated in Figure2(a). This region is smaller than that from a voxel-by-voxel based test. Additionally, the isolated spots, which would probably falsely reject the null hypothesis by the voxel-by-voxel based test, are excluded. Therefore, compared to the voxel-by-voxel based test, permutation test provides a statistically stronger result, in terms of an adjusted p-value and a more accurate region with dose differences.

Furthermore, Figure3 shows the simulation results by increasing the average dose difference, from 0.5 Gy, 2 Gy (example above) to 10 Gy, given the other parameters fixed as the above example. The larger the average dose difference is, the more regions were detected for both multiple comparisons test and t-test, especially at the top of the square region, where the dose variance is higher (Figure2(c)). Multiple comparison test becomes more conservative than the t-test, when the average dose difference is small. However, t-test always ends up with more isolated false positive spots (type-I error).

Similarly, Figure4 shows the simulation results by increasing the number of patients, from n = 5, n = 50 to n = 100. Results show that multiple comparison test becomes more conservative than the t-test, especially when the sample size is small. However, in any cases, t-test ends up with more isolated false positive spots (type-I error).

Applications

Study I: prostate

We applied the permutation test on data used by[4]. The aim of this study was to relate dose distributions with failure in prostate cancer patients. We selected a group of 67 patients with a relatively higher risk for extraprostatic disease, estimated according to T-stage, iPSA and Gleason score or differentiation grade[10]. These patients were treated in Erasmus Medical Center, The Netherlands, and they were included in the Dutch Phase III trial (CKVO 96-10) with dose randomized between 68 Gy and 78 Gy[11]. The Ethical Committee of each institution approved the protocol. Patients mainly had tumors of stage T3b and were treated to the delineated prostate and the seminal vesicles. The extra boost of 10 Gy had a 5 mm margin to the CTV (except towards the rectum, where a 0 mm margin was applied). For the 68 Gy PTV, a 10 mm margin was applied. In this study, the failure was biochemical (PSA nadir plus 2)[12] or clinical (locoregional or distant progression or start of salvage hormone therapy), determined at a fixed 4 year endpoint. As a result, 37 failure patients and 30 non-failure patients were eligible for analysis. Delineations from the planning CT and the planned dose distributions were collected for each patient. Firstly, dose distributions of all patients were registered into a dose grid as described in[4]. In short, voxels correspond if their direction with respect to the center of mass (CM) is the same, and their distance to the surface in this direction is the same. For voxels inside the prostate, corresponding voxels have the same fractional distance between the CM and the surface. The registration identifies anatomical points at locations relative to the delineated prostate surface. The choice for this registration procedure is an important part of the dose-effect hypothesis, and was based on the suspicion that extracapsular extension might have affected outcome. The resulting grid has a dimension of 31 × 35 × 34, resulting in N_v = 36890 dose voxels. The null hypothesis is that there is no dose distribution difference between the failure and the non-failure patients. The multiple comparison permutation test was applied to the registered dose maps.

Study II: esophagus

We applied the permutation test on a esophagus toxicity study. The aim of this study is to relate dose distributions on the esophagus surface with acute esophagus toxicity (AET) in NSCLC patients. We selected 185 NSCLC patients treated in Netherlands Cancer Institute (NKI) from 2008 to 2010 with concurrent chemotherapy combined with IMRT. The RT dose was 66 Gy in 24 fractions. The concurrent chemotherapy included a daily low dose cisplatin[13]. AET was scored according to the Common Toxicity Criteria 3.0. Toxicity was scored weekly from baseline until 3 weeks after RT. Afterwards, patients were checked every 2 months or more frequently if necessary. Of the 185 patients, 76 had no or grade 1 AET; 67 patients developed grade 2 and 42 patients had grade 3; Grade 4 or 5 AET did not occur. The delineated esophagus from the planning CT and the planned dose distributions were retrospectively collected for each patient, allowing a 2D esophagus surface dose map (ESDM) to be computed. For each patient, dose was sampled on every slice of the CT scan (3 mm thickness) at 36 fixed orientations along the delineated esophagus contour to the center, from 0 to 360 degrees with 10 degrees increment. The 0 degree angle was always chosen in the Right-Left direction. (In our experience the esophagus is always star shaped, i.e. the full contour can always be seen from a single centreline.) The same sampling procedure was then done through all the m slices where the esophagus was delineated. As a result, the ESDM contains m × 36 pixels for every patient (m varies from patient to patient). ESDMs of all patients were registered such that the pixel with the highest dose is in the center of the 2D dose map, alowing translations along and rotations around the length of the esophagus. The choice for this mapping was based on the assumption that the length and the circumference of the high dose region on the esophagus surface is associated with AET, irrespective of its anatomical location. Permutation tests were applied to find differences between grade 0–1 and grade 2–3, and between grade 0–2 and grade 3 AET.

Results

Results of study I

The distribution of T_max,i,(i = 1,…,1000) from the 1000 random permutations under the null hypothesis is shown in Figure5. The proportion of T_max,i that are larger than the observed value ( ${\tilde{T}}_{max} = 3.81$ ) gives an adjusted p-value of 0.02, i.e., there is a significant dose difference between the non-failure and the failure patients for this patient group at α = 0.05 significance level. Furthermore, the 95 percentile of the T_max,i distribution gives the threshold T^∗ = 3.56. In Figure6, voxels above this threshold in the observed ${\tilde{T}}_{k}$ map are marked, overlaid on the CT scan of the standard anatomy grid. This region is situated in the obturator region and suggests nodal involvement, but does not correspond to the presumed extracapsular extension. The significant region is much smaller than that obtained through a voxel-by-voxel based t test[4]. The adjusted value p = 0.02 was significant for the test statistic T_max. Compared to the previous voxel-by-voxel based testing method, the permutation test gives a statistically stronger conclusion that there is indeed a difference in the dose distribution outside the prostate between failure and non-failure patients.

Results of study II

An example of generating an ESDM is illustrated in Figure7. The average dose maps of the registered ESDMs for all 185 patients as well as each toxicity grade subgroup are illustrated in Figure8. The differences in the isodose lines for patients with grade 0–1 until grade 3 AET imply that for patients of increasing AET grade, their average esophagus surface that received high dose increases: All isodose lines expand along the length of the esophagus for patients with more complications. Specifically, the 50 Gy and 60 Gy isodose lines are expanding through both the length and the circumference.

The T_max distribution, adjusted p-value and T^∗ threshold were computed in the same way as in study 1. The adjusted p-value for the dose distribution between patients with AET grade 0–1 and grade 2–3 is p < 0.001, showing a significant dose difference at α = 0.05 significance level. The region with significant dose difference is illustrated in Figure9(a). Note that the region refers to a certain shape of dose distribution (as the highest doses were mapped to the same point), rather than a specific anatomical area on the esophagus. Considering the esophagus as a tube, this region includes a full band covering the high dose area (35 to 65 Gy) and a sub-band covering the moderate dose region (10 to 30 Gy). Similarly, the adjusted p-value for the dose distribution of patients between AET grade 0–2 and grade 3 is p = 0.002. The region with significant dose difference is illustrated in Figure9(b), it appears to be part of the significant region in Figure9(a), showing a high dose (50 to 60 Gy) area with extra length and circumference coverage.

Discussion

In this paper, we introduced multiple comparison permutation testing for voxel based data mining in radiotherapy and we demonstrated the test in two studies. For both studies we were able to locate regions where dose significantly associates with the outcome. In the prostate study, we were able to provide strong statistical evidence for a dose difference between non-failure and failure patients, confirming a difference located in the obturator region that could be suspicious for subclinical disease. In the esophagus study, both regions to predict grade ≥2 and grade 3 are consistent with the V50 dose volume histogram (DVH) parameter as derived in study[13]. Grade 2 seems to be caused by high dose (≥50 Gy) and the length/circumference coverage of low dose, while the length/circumference coverage of high dose (around 50 Gy) plays a role in severe AET of grade 3. This result suggests that using the length and circumference parameters may be a more sensitive method to predict AET compared to DSHs.

A broadly recognized method to address the problem of multiple comparisons is the Bonferroni correction[14]: if n independent hypotheses are tested, each individual hypothesis is tested at the 1/n times of the original statistical significance level when tested for only one hypothesis. However, this correction is not straightforwardly applicable to voxel maps, since there can be millions of voxels that are highly correlated in space. Hypothesis testing on images was first conducted through parametric random field methods[15]: t-tests are conducted at every voxel and the distributional results for continuous random fields are used to identify regions that are significant. Contrary to these parametric methods, non-parametric permutation tests on voxel maps were introduced by[8, 16]. Two test statistics are often used: a single maxima threshold and the supra-threshold cluster. A single maxima threshold is the T_max as we used in our study. In[16], T_max was applied in a permutation test to localize the region of visual cortex sensitive to motion on 3D PET imaging[17] and to analyze the order effects in working memory using fMRI[18]. Contrarily, a supra-threshold cluster test assesses the size of the connected supra-threshold regions for significance. As a result, the power to localize regions was reduced. Since the goal of data mining in radiotherapy was to localize suspicious regions, we recommend using T_max as test statistic.

The incidental dose essentially comes from the variations of dose planning for some un-targeted organs, and it’s a good thing to explore. Whether or not we are able to detect a significant dose difference depends on two aspects: 1) the average incidental dose between 2 groups and 2) the variance of the incidental dose. Statistically, a higher average incidental dose difference and a lower variance of each group facilitates yielding a true positive result. On the other hand, registration error is a bad thing. Inaccuracies in the registration, or an inappropriate choice for the registration method, could prevent the method to identify an existing dose effect relation, thus reducing the power of the statistical test. While it seems less likely that some particular registration procedure or inaccuracy therein generates a false positive result from dose variations which only consist of noise, any dose effect relation which is subsequently derived should be verified on an independent data set. Depending on the specific anatomical properties and the expected dose-effect parameters, the requirements for the accuracy of the registration vary. For instance, if the data mining is conducted in regions with small structures (e.g. head and neck), a sophisticated registration procedure may be required to find significant results. Contrarily, if we want to explore a large volume of low gradient dose distributions (e.g. lung), a loose registration may suffice. If we aim to explore dose distributions surrounding one structure, the registration accuracy is then focused on regions close to this structure. Therefore, a registration strategy should be chosen in advance based on the type of hypothesis that we want to explore. Afterwards, significant regions can be anatomically identified, and subjected to biological and clinical interpretation. Such consideration can then guide further efforts to derive dose-response relationships through conventional modeling methods.

Permutation testing is a useful tool to explore dose patterns from incidental dose distributions. Instead of analyzing dose-response effect, we intend to use permutation testing as a preliminary step to identify suspicious regions for hypothesis generation. Permutation testing takes into account multiple comparisons by yielding an adjusted p-value and gives visually straightforward suspicious regions. Another advantage of such method is that it is non-parametric. Thus, this test does not depend on the assumption of Normal distribution, which is often not true in the case of incidental dose in the planning. Permutation testing is practically useful and important in radiotherapy, especially in the era where adaptive radiotherapy is on the agenda, but we still have only limited knowledge about tumor stem cells, microscopic disease, radiosensitivity, etc. Permutation testing helps us to maximally explore dose-response relationships from the incidental dose in the clinical data.

Conclusions

We introduced a permutation test that deals with hypothesis testing on images and illustrated this method in a synthetic dataset, and in clinical datasets from a prostate and an esophagus study. Compared to a voxel-by-voxel based test, the permutation method reduces the rate of false positives. Permutation testing is a useful tool to identify hypotheses for dose-response relationships and tackle the multiple comparisons problem.

Consent

Written informed consent was obtained from the patient for the publication of this report and any accompanying images.

Appendix A: the multiple comparisons permutation test

Suppose we observed a sample of patients with two outcomes: non-event (N) and event (E). Every patient has a dose distribution of N_v voxels and they are all registered to an identical grid. To compare the dose distribution between the two groups, the permutation test is conducted as follows:

(i)
Compute the average dose difference $\tilde{d_{k}}$ between E and N groups in the observed sample:
$\tilde{d_{k}} = {\tilde{μ}}_{E, k} - {\tilde{μ}}_{N, k}, k = 1, \dots, N_{v},$
(A.1)

where ${\tilde{μ}}_{E, k}$ and ${\tilde{μ}}_{N, k}$ are the average dose value at voxel k for group E and N, respectively.
(ii)
Permute the labelling of the observed sample and compute the average dose difference. Repeat this process for N _p times:
$d_{i, k} = μ_{E, i, k} - μ_{N, i, k}, k = 1, \dots, N_{v}, i = 1, \dots, N_{p},$
(A.2)

where μ_E,i,k and μ_N,i,k are the average dose value at voxel k for group E and N in the i^th permuted random sample.
(iii)
Compute the standard deviation for every voxel k over all N _p random samples:
$σ_{k} = \sqrt{\frac{1}{N_{p} - 1} \sum_{i = 1}^{N_{p}} (d_{i, k} - {\bar{d}}_{k})}, where {\bar{d}}_{k} = \frac{1}{N} \sum_{i = 1}^{N_{p}} d_{i, k} .$
(A.3)
(iv)
Compute the locally normalized dose difference for every voxel in every random sample as well as the observed sample:
$\begin{align} T_{i, k} = \frac{d_{i, k}}{σ_{k}}, \end{align}$
(A.4)

$\begin{align} {\tilde{T}}_{k} = \frac{{\tilde{d}}_{k}}{σ_{k}}, k = 1, \dots, N_{v} . \end{align}$
(A.5)
(v)
Compute the test statistic T _max for every resampling as well as the true labeling sample:
$\begin{align} T_{max, i} = max (T_{i, k}), \end{align}$
(A.6)

$\begin{align} {\tilde{T}}_{max} = max ({\tilde{T}}_{k}), k = 1, \dots, N_{v} . \end{align}$
(A.7)
(vi)
Compute the adjusted p-value:
$p = Pr [T_{max, i} > {\tilde{T}}_{max}] .$
(A.8)
(vii)
Compare the adjusted p-value with the significance level α. If p < α, reject the null hypothesis, otherwise the null hypothesis can not be rejected.
(viii)
Compute the T ^∗ as (1 - α) percentile of T _max,i = 1,…,N _p. Regions in ${\tilde{T}}_{k}$ that are above T ^∗ show significant difference between E and N groups.

References

Buettner F, Gulliford SL, Webb S, Partridge M: Using dose-surface maps to predict radiation-induced rectal bleeding: a neural network approach. Phys Med Biol 2009, 54: 5139-5153. 10.1088/0031-9155/54/17/005
Article PubMed Google Scholar
Buettner F, Gulliford SL, Webb S, Sydes MR, Dearnaley DP, Partridge M: Assessing correlations between the spatial distribution of the dose to the rectal wall and late rectal toxicity after prostate radiotherapy: an analysis of data from the mrc rt01 trial (isrctn 47772397). Phys Med Biol 2009, 54: 6535-6548. 10.1088/0031-9155/54/21/006
Article PubMed Google Scholar
El Naqa I, et al.: Exploring feature-based approaches in pet images for predicting cancer treatment outcomes. Pattern Recognit 2009, 42: 1162-1171. 10.1016/j.patcog.2008.08.011
Article PubMed PubMed Central Google Scholar
Witte MG, Heemsbergen WD, Bohoslavsky R, Pos FJ, Al-Mamgani A, Lebesque JV, van Herk M: Relating dose outside the prostate with freedom from failure in the dutch trial 68 gy vs. 78 gy. Int J Radiat Oncol Biol Phys 2010, 77: 131-138. 10.1016/j.ijrobp.2009.04.040
Article PubMed Google Scholar
Castadot P, Lee JA, Parraga A, Geets X, Macq B, Grégoire V: Comparison of 12 deformable registration strategies in adaptive radiation therapy for the treatment of head and neck tumors. Radiother Oncol 2008, 89: 1-12. 10.1016/j.radonc.2008.04.010
Article PubMed Google Scholar
Goshtasby AA: 2-D and 3-D Image Registration: for Medical, Remote Sensing, and Industrial Applications. Hoboken: Wiley-Interscience; 2005. ISBN 0471649546
Google Scholar
Chen C, Witte MG, Heemsbergen WD, van Herk M: Significance testing in dose difference maps. In Proc. of XVIth International Conference on the use of Computers in Radiation Therapy. Amsterdam; 2010.
Google Scholar
Holmes AP, Blair RC, Watson JD, Ford I: Nonparametric analysis of statistic images from functional mapping experiments. J Cereb Blood Flow Metab 1996, 16: 7-22.
Article CAS PubMed Google Scholar
Edgington ES: Approximate randomnization tests. J Psychol 1969, 72: 143-149. 10.1080/00223980.1969.10543491
Article Google Scholar
Partin AW, Yoo J, Carter HB, Pearson JD, Chan DW, Epstein JI, Walsh PC: The use of prostate specific antigen, clinical stage and gleason score to predict pathological stage in men with localized prostate cancer. J Urol 1993, 150: 110-114.
CAS PubMed Google Scholar
Peeters ST, Heemsbergen WD, van Putten W L, Slot A, Tabak H, Mens JW, Lebesque JV, Koper PC: Acute and late complications after radiotherapy for prostate cancer: results of a multicenter randomized trial comparing 68 gy to 78 gy. Int J Radiat Oncol Biol Phys 2005, 61: 1019-1034. 10.1016/j.ijrobp.2004.07.715
Article PubMed Google Scholar
Roach M, Hanks G, Thames H, Schellhammer P, Shipley WU, Sokol GH, Sandler H: Defining biochemical failure following radiotherapy with or without hormonal therapy in men with clinically localized prostate cancer: recommendations of the rtog-astro phoenix consensus conference. Int J Radiat Oncol Biol Phys 2006, 65: 965-974. 10.1016/j.ijrobp.2006.04.029
Article PubMed Google Scholar
Kwint M, Uyterlinde W, Nijkamp J, Chen C, de Bois J, Sonke JJ, van den Heuvel M, Knegjens J, van Herk M, Belderbos J: Acute esophagus toxicity in lung cancer patients after intensity modulated radiation therapy and concurrent chemotherapy. Int J Radiat Oncol Biol Phys 2012,84(2):e223-e228. 10.1016/j.ijrobp.2012.03.027
Article PubMed Google Scholar
Abdi H: Bonferroni and Sidak corrections for multiple comparisons. In Encyclopedia of Measurement and Statistics. Edited by: Salkind NJ. Sage; 2007.
Google Scholar
Friston KJ, Holmes AP, Worsley KJ, Poline JP, Frith CD, Frackowiak RSJ: Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapp 1994, 2: 189-210. 10.1002/hbm.460020402
Article Google Scholar
Nichols TE, Holmes AP: Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapp 2002, 15: 1-25. 10.1002/hbm.1058
Article Google Scholar
Watson JDG, Myers R, Frackowiak RSJ, Hajnal JV, Woods RP, Mazziotta JC, Shipp S, Zeki S: Area V5 of the human brain: evidence from a combined study using positron emission tomography and magnetic resonance imaging. Cereb Cortex 1993, 3: 79-94.
Article CAS PubMed Google Scholar
Marshuetz C, Smith EE, Jonides J, DeGutis J, Chenevert TL: Order information in working memory: fMRI evidence for parietal and prefrontal mechanisms. J Cogn Neurosci 2000,12(Suppl 2):130-144.
Article PubMed Google Scholar

Download references

Acknowledgements

The authors would like to acknowledge the Dutch center for translational molecular medicine (CTMM) for supporting the project “Personalized chemo-radiation of lung and head-and-neck cancer” (AIRFORCE).

Author information

Authors and Affiliations

Department of Radiation Oncology, The Netherlands Cancer Institute, Plesmanlaan 121, 1066CX, Amsterdam, The Netherlands
Chun Chen, Marnix Witte, Wilma Heemsbergen & Marcel van Herk

Authors

Chun Chen
View author publications
You can also search for this author in PubMed Google Scholar
Marnix Witte
View author publications
You can also search for this author in PubMed Google Scholar
Wilma Heemsbergen
View author publications
You can also search for this author in PubMed Google Scholar
Marcel van Herk
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Chun Chen.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors participated in the study and drafted the manuscript. All authors read and approved the final manuscript.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Authors’ original file for figure 9

Authors’ original file for figure 10

Authors’ original file for figure 11

Authors’ original file for figure 12

Authors’ original file for figure 13

Authors’ original file for figure 14

Authors’ original file for figure 15

Authors’ original file for figure 16

Authors’ original file for figure 17

Authors’ original file for figure 18

Authors’ original file for figure 19

Authors’ original file for figure 20

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Chen, C., Witte, M., Heemsbergen, W. et al. Multiple comparisons permutation test for image based data mining in radiotherapy. Radiat Oncol 8, 293 (2013). https://doi.org/10.1186/1748-717X-8-293

Download citation

Received: 06 August 2013
Accepted: 24 November 2013
Published: 23 December 2013
DOI: https://doi.org/10.1186/1748-717X-8-293

Multiple comparisons permutation test for image based data mining in radiotherapy

Abstract

Introduction

Materials and methods

Test statistic

Permutation test

Simulation

Applications

Study I: prostate

Study II: esophagus

Results

Results of study I

Results of study II

Discussion

Conclusions

Consent

Appendix A: the multiple comparisons permutation test

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ contributions

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Radiation Oncology

Contact us