Evaluation of early imaging response criteria in glioblastoma multiforme

Background Early and accurate prediction of response to cancer treatment through imaging criteria is particularly important in rapidly progressive malignancies such as Glioblastoma Multiforme (GBM). We sought to assess the predictive value of structural imaging response criteria one month after concurrent chemotherapy and radiotherapy (RT) in patients with GBM. Methods Thirty patients were enrolled from 2005 to 2007 (median follow-up 22 months). Tumor volumes were delineated at the boundary of abnormal contrast enhancement on T1-weighted images prior to and 1 month after RT. Clinical Progression [CP] occurred when clinical and/or radiological events led to a change in chemotherapy management. Early Radiologic Progression [ERP] was defined as the qualitative interpretation of radiological progression one month post-RT. Patients with ERP were determined pseudoprogressors if clinically stable for ≥6 months. Receiver-operator characteristics were calculated for RECIST and MacDonald criteria, along with alternative thresholds against 1 year CP-free survival and 2 year overall survival (OS). Results 13 patients (52%) were found to have ERP, of whom 5 (38.5%) were pseudoprogressors. Patients with ERP had a lower median OS (11.2 mo) than those without (not reached) (p < 0.001). True progressors fared worse than pseudoprogressors (median survival 7.2 mo vs. 19.0 mo, p < 0.001). Volume thresholds performed slightly better compared to area and diameter thresholds in ROC analysis. Responses of > 25% in volume or > 15% in area were most predictive of OS. Conclusions We show that while a subjective interpretation of early radiological progression from baseline is generally associated with poor outcome, true progressors cannot be distinguished from pseudoprogressors. In contrast, the magnitude of early imaging volumetric response may be a predictive and quantitative metric of favorable outcome.


Background
In 1990, MacDonald et al [1] reported criteria for response assessment in glioma. Importantly, these criteria incorporated features such as time factors, degree of response of contrast-enhancing tumor using computedtomography (CT)-based uni-dimensional World Health Organization (WHO) criteria [2], neurologic status and the use of corticosteroids. Although these criteria have become widely accepted, they have also been criticized for their limitations [3][4][5], including their inability to accurately assess complex tumor morphology, account for non-tumor factors that may cause contrast enhancement, reaction to local therapies [6], and lack of applicability to non-enhancing tumors. Furthermore, the phenomenon of 'pseudoprogression' observed in patients receiving concurrent chemo-radiotherapy [7][8][9], as well as the dilemma of 'pseudo-response' seen with some of the newer anti-angiogenic therapies [5,10], adds to the already complex challenge of early assessment as these phenomena can confound image interpretations.
The accurate and early prediction of response and/or progression remains important for several reasons. In principle, this may enable more objective evaluation and comparison of novel therapies [5]. Secondly, such a biomarker could be utilized as a surrogate endpoint in clinical trials, thus conferring the distinct advantage of earlier response prediction and greater opportunity to amend or institute alternate therapies, especially given the aggressive nature of Glioblastoma Multiforme (GBM). Thirdly, earlier imaging predictors could potentially allow the conduct of smaller clinical trials requiring fewer patients, enable earlier judgements about promising versus futile therapies, more expeditious regulatory approval for new drugs, and ultimately earlier application and translation of new therapies into clinical practice [11,12]. In reality however, the evidence for reliable imaging response thresholds that could ultimately influence therapeutic decision making is still lacking. Currently, response criteria are largely based on the response evaluation criteria in solid tumors (RECIST) guidelines [13,14], which were developed to standardize reporting of outcomes of clinical trials. Most recently, the Response Assessment in Neuro-Oncology (RANO) working group provided updated criteria for high-grade gliomas [15], but as of yet there is not analysis of these criteria as they relate to clinical endpoints such as overall survival and progression-free survival.
We embarked on a study investigating early structural and functional magnetic resonance imaging (MRI) evaluations of response in patients with GBM. As a first step, we sought to investigate the predictive value of standard structural imaging response criteria one month after the delivery of concurrent chemotherapy and radiotherapy (RT). We also undertook exploratory analysis of alternate structural imaging response thresholds that may better correlate with and/or predict for clinical outcomes.

Methods
This study was approved by the institutional research ethics board. Patients were prospectively enrolled over a 26 month interval between May 2005 and July 2007. Patients were approached for enrollment if they met the following criteria: histological diagnosis of WHO grade IV Glioblastoma Multiforme; planned to receive definitive concurrent chemotherapy (temozolomide 75 mg/m 2 daily) and RT (60Gy in 30 fractions over 6 weeks) followed by adjuvant temozolomide chemotherapy (200 mg/m 2 × 5 days, monthly for 1 year or until progression); age ≥18 years; and ECOG performance status 0 or 1. Patients were excluded if they had contraindications to MRI, severe claustrophobia, or previous cranial radiotherapy. Relevant clinical and demographic information, including gender, age, diagnosis date, disease multifocality, surgical status, and radiation treatment dates were also captured.
MRI acquisition was performed at the following timepoints: Baseline (BL) post-operatively but prior to radiotherapy (RT); week 3 and week 6 of RT, 1 month after completion of RT, then every two months until evidence of clinical progression (defined below) or until 1 year of follow-up. All images were acquired using a 1. Post-contrast axial T1-weighted FSE images were rigidly co-registered (mutual information algorithm) with the RT planning CT datasets using a commercial radiotherapy treatment planning system (Pinnacle 3 v7.6c and 8.1, Philips Radiation Oncology Systems, Madison, WI). A radiation oncologist (ESK, NL) delineated tumor volumes on the T1-weighted post-contrast MR images as defined by areas of abnormal contrast enhancement reflecting residual or recurrent tumor, whilst excluding areas of post-surgical change. All volumes were then reviewed and finalized by a diagnostic radiologist (EY).
Both longest diameter (axial, coronal, and sagittal planes) and 3D volumetric data (cc) were computed at baseline (BL) and one-month post RT. Progression was then assessed via RECIST criteria, a 20% increase in the longest tumor diameter or a 40% increase in volume (sums of diameters or volumes were used in the case of multi-focal disease). Disease response as determined by RECIST was defined as a 65% decrease in volume or a 30% decrease in diameter. The MacDonald criteria were also evaluated: progressive disease defined as a 25% increase in the largest tumor area (cm 2 ) and responsive disease defined as a 30% decrease in largest area. Each patient was then classified in a binary fashion, as either having progressive or responsive disease based on these imaging thresholds. In addition, the following range of volume, area and diameter progression/response thresholds (see Additional File 1 - Table 1) were investigated including: Diameter -any increase; any increase or decrease up to > 5%, 15% or 30%; Area -any increase, any increase or decrease > 5%; 15% or 30%; and Volume -any increase, > 25% increase, any increase or decrease > 10%; 25%; or 50%.
Sensitivity and specificity values were calculated for each threshold using clinical progression-free survival at 1 year and overall survival at 2 years. Receiver-operator curves (ROC) were also constructed and statistical analysis was performed on the basis of work by DeLong et. al. [16]. Kaplan-Meier survival curves were created to analyze early progression, pseudoprogression and clinical progression as previously defined.

Results
A total of 30 patients were prospectively recruited. One patient refused study procedures after enrollment and another 4 patients did not undergo MRI examination one month after RT, leaving a total of 25 patients from whom imaging data was analyzed. It should be noted that demographical and follow-up data was taken from all 29 patients followed, however only the demographics of the 25 patients analyzed in this study are reported here. The median age of patients enrolled was 56 years (15 men, 10 women, range 46 -68 years). Five patients presented with multifocal disease. Tumor volumes at baseline ranged from 0.96 cm 3 to 143.2 cm 3 . The majority of patients were enrolled after gross total resection (n = 14), while 8 and 3 patients underwent partial resection and biopsy only, respectively.
The study cohort had a median follow-up of 26.3 months (range 13.3 -37.7 months). Median survival was high at 26.7 months and median time to clinical progression was 7.5 months (range 1.5 mo. -35.9 mo.).
A qualitative impression of any radiological progression (ERP) from baseline was found in 11 patients (40.0%), although only 2 patients strictly met the Mac-Donald criteria for progression at 1 month. Median survival for patients with ERP was significantly shorter than those without (11.2 mo vs. not reached, p < 0.001) (Figure 1). Of those with ERP, five were subsequently determined to have pseudoprogression (45.5% of ERP). Pseudoprogressors fared better than true early progressors, with a median survival of 19.0 months vs. 7.2 months (p < 0.001), (Figure 2) Sensitivity and specificity values were calculated for each response threshold, along with the positive and negative likelihood ratios (+LH; -LH) and the area-under-the-curve (AUC) for volume, area and diameter metrics (see Additional files 1, 2, 3 - Table 1, 2 and 3 respectively) in predicting for 2-year overall survival. The most sensitive tests were those measuring response, namely greater than 25% and 50% decreases in volume and 15% and 30% decreases in area and diameter. The most specific tests were those with the highest thresholds for progression, namely the RECIST criteria for both volume and diameter, and Mac-Donald criteria for area. In general, the volume measurements consistently performed better in every category than did the area and diameter metrics. This trend can also be visualized in Figure 3, receiver-operator curves plotting sensitivity vs. 1-specificity for the volume, area and diameter thresholds against overall survival at 2 years. The respective AUC's are 0.83 (0.59 -0.94 95% CI), 0.76 (0.53 -0.90 95% CI) and 0.69 (0.44 -0.84 95% CI) for volume, area and diameter respectively. These values were significantly different from chance (AUC of 0.5) for both volume and area (p < 0.005 and p < 0.05, respectively) but not for diameter (p > 0.1). When comparing amongst AUC's there was no significant difference between volume, area or diameter, with the greatest trend seen between volume and diameter (p > 0.1). The two most prognostic thresholds were > 15% decrease in area (3.33 +LH, 0.22 -LH) and > 25% decrease in volume (3.38 +LH, 0.21 -LH).   Figure 4 compares the receiver-operator characteristics of volume thresholds when predicting for progression-free survival at 1 year and overall survival at 2 years, demonstrating a trend that volume metrics to be more predictive of overall survival at 2 years than PFS at 1 year (AUC 0.83 vs. 0.70, p < 0.2). Figure 5 depicts Kaplan-Meier survival based on > 25% volume response at 1-month post RT nearing statistical significance (median survival 14.9 mo vs. not reached, p < 0.06).

Discussion
The early and accurate prediction of response to cancer treatment through the application of imaging criteria has several potential advantages. Ideally, imaging thresholds would provide utility as surrogates for outcome over and above the more traditional measures including overall and progression free survival [17], allowing for more expeditious conduct of clinical trials (both phase II [18] and III). This in turn could lead to the earlier institution of alternate therapies that show a beneficial effect on outcome. This is particularly important in dealing with aggressive and rapidly growing malignancies such as GBM.
Our results show that across all thresholds, both progressive and responsive, volume was uniformly more predictive of OS and PFS as seen by the right shift of the diameter ROC curve in Figure 3 (AUC of 0.83 vs. 0.76 vs. 0.69). However this was only a trend, not achieving significance amongst the three, the closest being volume vs. diameter (p > 0.15). This is similar to what Shah et al and Galanis et al have reported as correlations between uni and multi-dimensional radiological data in classifying progressive disease [19,20].
Furthermore, we show that a qualitative interpretation of any radiological progression one-month post therapy is associated with poor outcomes. However, this assessment is not acted upon clinically because of the confounding potential for treatment effect (or pseudoprogression), and our current inability (clinically and radiologically) to distinguish the two groups apriori. Many recent investigations have looked at the incidence and outcomes related to pseudoprogression [21][22][23][24]. Two Canadian studies by Roldan et al and Sanghera et al found rates of pseudoprogression of 40% and 32% respectively, and median survivals of 9.1 months and 31.2 months [22,23]. Another recent study by Gerstner et al found the pseudoprogression rate to be 57% with a median survival of 24.4 months, however their definition of pseuodprogression was at 3 months post-chemoRT [24], compared to 6 months in this study (and the two   referenced previously). All three showed no significant difference in OS between those with pseudoprogression and those without ERP. The results from this study were in keeping with other literature, including a rate of pseudoprogression of 38.5% and a median survival of 19.0 months. There was also no survival benefit between pseudoprogressors and those patients with no ERP, however pseudoprgressors showed improved OS compared with true early progressors (median survival 19.0 mo vs. 7.2 mo, p < 0.01), in keeping with the results of Roldan and Sanghera [22,23]. This demonstrates that there is sufficient qualitative information in early structural imaging to help guide clinicians in identifying progressive vs. responsive disease, with the exception of pseudoprogression, a topic which is now finding its way into the realm of imaging response criteria.
Historically, quantitative imaging criteria was first addressed in 1979 by the WHO in their published guidelines [2]. Since then, RECIST v1.0 [13] was published in 2000 with subsequent revised criteria (version 1.1) in 2009 [14]. Each was developed in an attempt to standardize reporting and facilitate comparison of imaging response assessment within the context of clinical oncology trials [4,11], however the results of this study show that the ability to assess progressive disease via quantitative radiological data remains limited. We found that each of the MacDonald, RECIST and additional thresholds, both uni and multi-dimensional, while specific for progressive disease were highly insensitive. This translated into a poor correlation with both PFS at one year and OS at two years (Figure 4), therefore limiting their usefulness as endpoint surrogates in clinical trials. One obvious contributor to this effect is the issue of pseudoprogression, in that pseudoprogressors will always negatively impact the accuracy of progressive thresholds based on standard structural imaging. Recent updates in response assessment criteria by the RANO group (Response Assessment in Neuro-Oncology) have included an effort to address these challenges by developing guidelines specific to the management of brain tumors including parameters for disease progression [15]. They suggest deferring the determination of progressive disease until ≥ 12 weeks after the completion of RT, except in the case of a new lesion outside of the radiation field and/or pathology proven progressive disease within the original tumor site. This recommendation aims to defer a change in clinical management until pseuodprogression can be more reliably ruled out. However, as was mentioned previously the OS between pseudoprogressors identified at one month after RT is not significantly different from non-progressors, and therefore if these patients could be identified more readily, the truly progressive patients would avoid an additional 8 weeks of ineffective chemotherapy.
In contrast, metrics for defining responsive disease performed much better in terms of both PFS and OS (Figure 4), likely in part because identifying responders is not marred by the issue of pseudoprogression and also because intuitively, those with large reductions in tumor burden will do better than those without. Clinical trials showing evidence of radiological response in GBM are therefore likely to have an increased clinical relevance in terms of survival endpoints, than those focusing on progressive characteristics. This is contrary to the findings of Galanis et al who found that progressive disease to be more predictive of OS. This difference is probably multi-factorial, for one a variety of gliomas were included as compared to solely GBM as in this study. Secondly, the there was a smaller portion of responders in the Galanis study, likely owing in part to the addition of temozolomide to the treatment regiment in this study. Finally, the timing of the imaging was later in the Galanis study, 4 months post-induction of therapy as compared to one month post-RT in our study. This difference in timing may decrease the incidence of pseudoprogressors as a fraction may have already declared themselves as true early progressors by that point, thereby alleviating their negative statistical impact on the progressive imaging thresholds. If true, it is conceivable that optimizing the timing of post-therapy followup imaging could aid in of identification of pseudoprogressors. Our study only looked at a single imaging time point, however further investigation into multiple imaging time points would certainly be insightful. It is unlikely however that the answer to this challenging issue lies in timing along, and as such an array of research continues to look for potentially more robust and quantifiable solutions. Many groups have looked at the use of functional imaging modalities to augment standard anatomical information. The addition of perfusion and diffusion-weighted techniques are thought to be able to provide information about tumor activity as a potential biomarker of tumor progression [25]. As such, the role of functional MRI (diffusion-weighted and perfusion) is the subject of intense clinical investigation [26][27][28][29][30][31][32][33], and recent findings have shown that diffusion-weighted imaging can predict for OS and time-to-progression in high grade glioma [29,30]. Furthermore, recent results by Tsien et. al. have shown promise in using dynamic susceptibility contrast magnetic resonace imaging (DSC-MRI) and parametric response maps measuring relative cerebral blood volume to identify pseudoprogression from true progression during therapy [34]. The role of FLT-PET and molecular imaging is also being actively investigated as a potential modality for imaging tumor progression [35,36].
A primary limitation of our study lies in a relatively small sample size of prospectively recruited Glioblastoma patients. Our work must be further validated in a larger cohort for meaningful interpretation and future clinical translation. Furthermore, as was mentioned above, our study only investigated a single imaging time point (one month post-RT), additional imaging would be useful determining if there is an optimal time point, and what that might be. Our study cohort had a significantly higher median survival (26.2 mo. 95% CI 13.7 -not reached) than expected from the literature (14.6 mo. 95% CI 13.2 -16.8 [37]). Finally, baseline imaging in the study was performed post-operatively, where resolving post-surgical changes may have been a potential confounding factor in the assessment of response. Strengths of this cohort include a typical and balanced population demographic in age, gender and size. Extent of surgery was also balanced with~50% undergoing gross total resection and the remainder having either partial total resection or biopsy alone. The extended length of follow-up (median 22 months) was also beneficial to this study.

Conclusion
We sought to evaluate early radiologic response criteria relevant to clinical outcomes in patients with GBM treated with concurrent chemotherapy and radiotherapy, and found that a qualitative clinical impression of radiologic progression at one month after therapy was predictive of poor outcomes despite the confounding factor of treatment effect (pseudoprogression). Quantitatively, we found that response metrics were more indicative of outcome than progressive indices and that there was a trend of volumetric data outperforming diameter or area thresholds, however significance was not reached in this case. Further investigation will focus on adding additional imaging time points as well as adjunct functional imaging to better understand progression features that may have a stronger predictive value than structural geometric indices alone.

Additional material
Additional file 1: Table 1: Sensitivity and specificity metrics in predicting 2 year overall survival according to various volume thresholds, from baseline to one month after RT.
Additional file 2: Table 2: Sensitivity and specificity metrics in predicting 2 year overall survival according to various area thresholds, from baseline to one month after RT.
Additional file 3: Table 3: Sensitivity and specificity metrics in predicting 2 year overall survival according to various diameter thresholds, from baseline to one month after RT.