Next-generation sequencing: hype and hope for development of personalized radiation therapy?

The introduction of next-generation sequencing (NGS) in the field of cancer research has boosted worldwide efforts of genome-wide personalized oncology aiming at identifying predictive biomarkers and novel actionable targets. Despite considerable progress in understanding the molecular biology of distinct cancer entities by the use of this revolutionary technology and despite contemporaneous innovations in drug development, translation of NGS findings into improved concepts for cancer treatment remains a challenge. The aim of this article is to briefly describe the NGS platforms for DNA sequencing and, in more detail, key achievements and unresolved hurdles. A special focus will be placed on potential clinical applications of this innovative technique in the field of radiation oncology.


Introduction
Recent technological advances in DNA sequencing with greater speed and resolution at lower costs have provided new insights in cancer genetics. The next-generation sequencing (NGS) technology is tremendously facilitating the in-depth genome-wide search for genetic alterations which might significantly contribute to aggressive and/or treatment-resistant phenotypes of cancers, thereby establishing the basis for the development of molecularly targeted therapy. High-throughput sequencing of distinct cancer entities in large-scale projects has improved our understanding of the disease-specific mutational patterns [1][2][3][4] and the 'Darwinian' selection forces involved in subclonal tumor evolution resulting in highly heterogeneous tumors. NGS was initially developed for detection of DNA-based alterations. However, it can also assess other molecular aberrations, including those in the epigenome [5,6], transcriptome [7,8] or RNAome [9]. In this review we will only briefly discuss the technical principle of NGS for DNA sequence analysis. For more detailed information we would like to refer the reader to the excellent reviews of Metzker et al. [10], Meyerson et al. [2] and Wong et al. [11]. We will instead focus on key achievements in cancer genetics and potential clinical applications of this innovative technique in the field of radiation oncology.

The advantages of NGS
Next-generation sequencing has rapidly been evolving within the last decade [10]. This high-throughput method offers several advantages over classical capillary electrophoresis-based 'Sanger' sequencing including increased speed and resolution at dramatically lower costs compared to the older sequencing technologies. To illustrate the remarkable progress achieved by NGS, the Human Genome Project which used first-generation 'Sanger' sequencing technology to sequence the human genome took over 10 years and nearly 3 billion USD to achieve its goal [12][13][14]. By next-generation sequencing an individual human genome can now be sequenced in less than 2 weeks for approximately 5000 USD [15].
In theory, the whole genome does not need to be sequenced to identify genetic alterations in most human cancer-associated genes. More than 85 % of pathogenic mutations are found within the protein-coding regions of the genome [16], which collectively are referred to as the "human exome". This already dramatically reduces the regions that need to be sequenced for personalized oncology, thereby decreasing costs and time for whole exome sequencing of one sample to approximately 1,500 USD and 48 h (the exact prices mainly depend on the NGS platform, the required sequencing depth and are exclusive of the costs for bioinformatics). Furthermore and probably even more relevant for integration into clinical trials [17] or routine diagnostic applications [18], focusing on a selected panel of genes with established impact in cancer progression and/or a proven role in treatment resistance is possible which offers the opportunity for detection of rare genetic variants at very high sensitivity [2,17] in all types of samples including archival formalin-fixed, paraffin-embedded (FFPE) tissue [18,19] and plasma cell-free circulating tumor DNA [20].

The technical principle behind NGS
DNA sequencing was initially developed in 1975 by Sanger and Coulson [21] and these techniques are still used widely today. 'Sanger' sequencing is based on the use of oligonucleotide primers specifically binding to either side of the target DNA region which is then amplified in a polymerase chain reaction (PCR). The use of chain-terminating nucleotides in the DNA synthesis process allows the generation of different copies of the original DNA template at all possible lengths, which are separated by capillary electrophoresis. By using specifically labelled chain-terminating nucleotides (A, C, T or G) the original DNA sequence can be assembled.
NGS is based on the principle of sequencing in a massively parallel fashion. This means that up to millions of DNA fragments can be sequenced at the same time. Initially, DNA is fragmented into short segments called a shotgun library. Adaptors are ligated to the ends of each fragment. These adaptors are themselves short sequences of DNA which have primer binding sites for subsequent amplification. The shotgun library can subsequently be enriched for the sequences of interest, using different approaches [22,23]. As one example, probes which correspond to the target regions, e.g. the human exome, and which are immobilized on beads or a solid plate can be used in order to physically separate the target DNA fragments from the remaining DNA. Alternatively, custom arrays can be designed to enrich for specific groups of genes of interest (cancer gene panels). Following enrichment, the fragment library can be sequenced on next-generation sequencing platforms from several manufacturers (for a comprehensive review of the differing platform techniques see Metzker et al. [10]). Recording of the captured sequences occurs in real time in a massively parallel fashion when the fluorescent signals from dye-labelled nucleotides in the nascent DNA strands on each bead, channel or cluster are detected during DNA synthesis.

The challenge of big data analysis from NGS
Whilst large amounts of sequencing data can be generated relatively quickly, data analysis can be time-consuming and difficult. The first problem is the large size of NGS raw data files, especially for results from whole exome sequencing (WES) or whole genome sequencing (WGS). For example, non-compressed FASTQ files from human WGS with a mean coverage of 30x require up to 200 gigabytes, making data transfer and storage of even small WGS projects a real challenge. These estimates do not include the disk space required for any downstream analysis.
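The order of magnitude of the storage estimate can be reproduced with a back-of-the-envelope calculation. The following minimal sketch assumes a genome size of 3.1 billion base pairs, a read length of 100 bp and a 60-character read header; these three values and the function name are illustrative assumptions and are not taken from the cited studies.

```python
def estimate_fastq_bytes(genome_size_bp, coverage, read_length, header_bytes=60):
    """Rough size in bytes of an uncompressed FASTQ file at a given mean coverage."""
    total_bases = genome_size_bp * coverage
    n_reads = total_bases // read_length
    # A FASTQ record has four lines: header, sequence, '+' separator, qualities.
    # Each line ends with a newline character (+1 byte).
    bytes_per_read = (
        (header_bytes + 1)      # header line
        + (read_length + 1)     # sequence line
        + 2                     # '+' separator line
        + (read_length + 1)     # quality line, one character per base
    )
    return n_reads * bytes_per_read


if __name__ == "__main__":
    # Assumed values: 3.1 Gb genome, 30x coverage, 100 bp reads.
    size = estimate_fastq_bytes(3_100_000_000, 30, 100)
    print(f"approx. {size / 1e9:.0f} GB uncompressed")
```

Because every sequenced base is stored together with one quality character, the file size is at least about two bytes per base, which at 30x coverage of a human genome already lies in the range of a few hundred gigabytes.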
Development of streamlined, highly automated pipelines for pre-processing of raw data, alignment or de novo assembly of reads, quality control, copy number variation (CNV) and/or SNP calling is essential and high-capacity server solutions are mandatory. The key first step of data processing is the alignment of the sequence reads to a reference genome. Three characteristics of NGS data complicate this task. First, read lengths are relatively short (on average 26-330 bp) [10] compared to capillary-based 'Sanger' sequencing, which decreases the likelihood that a read can be mapped to one unique location. Second, reads from NGS platforms contain higher rates of sequencing errors, especially in regions of homopolymer repeats [10]. Subsequent validation of novel variants by 'Sanger' sequencing to exclude technical sequencing errors is therefore highly recommended. This technical limitation of NGS is also underlined by the results from a recent study which revealed a higher rate of false-positive single nucleotide variations detected by WES compared to WGS and a considerable fraction of insertions and deletions detected by both WES and WGS which could not be confirmed by subsequent Sanger sequencing [24].
By all means, in each individual case most of the identified variants will represent single nucleotide polymorphisms (SNPs) of no pathogenic relevance [25]. These can be removed either by filtering against sequencing results from 'control' DNA of the same patient's normal tissue or, if such control is not available, against data sets from public databases such as the NCBI dbSNP and the '1000-genomes' project [25]. The remaining variants can be filtered against public collections of genetic alterations in cancer, such as the Catalogue Of Somatic Mutations In Cancer (COSMIC) database (http://cancer.sanger.ac.uk) which as of August 2014 contained over 2 million coding mutations, more than 70,000 gene fusions or genome rearrangements and almost 700,000 abnormal copy number variants [26]. By such an approach, genetic variants with known/potential oncogenic function can be identified.
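The two-step filtering logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `Variant` type, the function names and all coordinates are hypothetical, and real pipelines operate on VCF files and the actual dbSNP or COSMIC data rather than on in-memory sets.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_somatic(tumor, normal=None, population=frozenset()):
    """Remove variants seen in the matched normal tissue or, if no matched
    control is available, in a set of known population polymorphisms."""
    background = set(normal) if normal is not None else set(population)
    return [v for v in tumor if v not in background]


def annotate_known(candidates, cancer_catalogue):
    """Flag each remaining variant that is listed in a catalogue of
    known cancer-associated alterations."""
    return [(v, v in cancer_catalogue) for v in candidates]


if __name__ == "__main__":
    # Hypothetical toy data; the positions carry no biological meaning.
    tumor = [Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T"),
             Variant("chr3", 300, "G", "A")]
    normal = [Variant("chr1", 100, "A", "G")]      # germ-line polymorphism
    catalogue = {Variant("chr2", 200, "C", "T")}   # known cancer variant
    for variant, known in annotate_known(filter_somatic(tumor, normal), catalogue):
        print(variant, "known" if known else "novel")
```

The matched normal sample takes precedence in this sketch because it also removes private germ-line variants of the patient that are absent from public databases.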
An additional approach to separate biologically relevant from irrelevant variants often utilizes new software tools (SIFT [27,28], PolyPhen-2 [29], mutation-assessor [30]) which are now widely available and help to determine which mutations may have a functional impact on the encoded protein, which are likely to be pathogenic, or which are rather neutral variants without biological effect. These methods are generally based on the assumption that important amino acids will be conserved in the protein family, and that changes at well-conserved positions are likely to be deleterious [27]. For example, given a protein sequence SIFT chooses related proteins and obtains an alignment of these proteins with the query. Based on the amino acids appearing at each position in the alignment, SIFT calculates the probability that an amino acid at a position is tolerated or deleterious [27].
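The conservation principle behind such tools can be illustrated with a strongly simplified sketch. It is not the SIFT algorithm itself, which uses normalized probabilities and additional corrections; here an amino acid substitution is simply called tolerated if the new residue appears in the alignment column with a frequency above an assumed threshold. The function names, the threshold default and the toy alignment are illustrative assumptions.

```python
from collections import Counter


def column_frequencies(alignment, position):
    """Relative frequency of each amino acid at one alignment position,
    ignoring gap characters ('-')."""
    column = [seq[position] for seq in alignment if seq[position] != "-"]
    counts = Counter(column)
    total = len(column)
    return {aa: n / total for aa, n in counts.items()}


def is_tolerated(alignment, position, new_aa, threshold=0.05):
    """Call a substitution tolerated if the new residue is observed at this
    position in related proteins with at least the given frequency."""
    freqs = column_frequencies(alignment, position)
    return freqs.get(new_aa, 0.0) >= threshold


if __name__ == "__main__":
    # Hypothetical alignment of a query with three related proteins.
    alignment = ["MKVL", "MKIL", "MRVL", "MKVL"]
    print(is_tolerated(alignment, 0, "L"))  # fully conserved position
    print(is_tolerated(alignment, 2, "I"))  # variable position
```

A substitution at the fully conserved first position is rejected, whereas a residue already seen at the variable third position is accepted, which mirrors the assumption that changes at well-conserved positions are likely to be deleterious.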
MutSig is another algorithm which was developed at the Broad Institute of Harvard and MIT in 2007 [31]. MutSig is currently broadly used to identify driver mutations among large numbers of passenger mutations. In contrast to the above mentioned methods, MutSig takes into account that background mutation processes occurred during formation of tumors and it considers the mutations of each gene to identify genes that were mutated more often than expected by chance [4]. Besides looking for abundance above background, MutSig looks for positive selection in genes, i.e. increased numbers of non-synonymous vs. silent mutations or mutation clusters at hotspots. Its advanced version (MutSigv2.0) also takes into account the functional impact of mutations (as estimated by the above mentioned tools SIFT, PolyPhen-2, Mutation Assessor, etc). In addition, incorporation of the covariates DNA replication time, chromatin state (open/closed), and general level of transcription activity into the background model has been shown to substantially reduce the number of false-positive findings [4].
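The core idea of testing whether a gene is mutated more often than expected by chance can be sketched with a simple Poisson model. This is a deliberately reduced illustration and not the MutSig implementation: it assumes one uniform background mutation rate per base and tumor, and it ignores the covariates and selection signals mentioned above. The function names and all numerical values are hypothetical.

```python
import math


def poisson_tail(k, lam):
    """P(X >= k) for a Poisson-distributed X with mean lam."""
    term = math.exp(-lam)          # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term                # add P(X = i)
        term *= lam / (i + 1)      # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def gene_pvalue(observed, gene_length_bp, n_tumors, background_rate):
    """Probability of seeing at least `observed` mutations in a gene if
    mutations arise only at the background rate (per base, per tumor)."""
    expected = background_rate * gene_length_bp * n_tumors
    return poisson_tail(observed, expected)


if __name__ == "__main__":
    # Assumed values: 1,500 bp coding sequence, 100 tumors, 1e-5 mutations/base.
    print(gene_pvalue(20, 1500, 100, 1e-5))  # far more mutations than expected
    print(gene_pvalue(2, 1500, 100, 1e-5))   # compatible with background
```

The sketch also shows why the estimate of the background rate is critical: an underestimated rate inflates the number of apparently significant genes, whereas an overestimated rate hides true drivers.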
These in-silico methods certainly assist in the filtering process, however their results still need to be cautiously interpreted in conjunction with the involved gene and certainly have their limitations. Methods like MutSig identifying driver gene mutations based on background mutation rates rely on a correct estimation of this background rate in a given tumor type and at a defined genomic region in order to keep the number of false positives to a minimum [4]. Other algorithms underestimate functional changes in poorly conserved positions [32]. As a result, frequency-based methods with loose background mutation rates will detect driver candidates with a probably high rate of false positives. On the other hand, methods implementing stricter models will identify more specific candidate lists but might miss some true cancer driver genes. Combination of complementary methods might overcome these limitations [3] and will certainly increase the knowledge gain from NGS studies. Last but not least, functional studies in preclinical models for elucidation of the mode of interaction of genetic variants with biological processes in tumor cells are indispensable for validation of NGS findings and are certainly mandatory before NGS technologies should move into clinical applications [33]. Translation into clinical practice can certainly only be achieved by multidisciplinary research approaches in order to extract meaningful diagnostic interpretation from large NGS datasets.

Novel approaches for personalization of radiotherapy
Over the last two decades, technological advances in treatment planning and delivery have improved the quality of radiotherapy in terms of precise dose application to the target volume together with minimal dose to normal tissue. Despite these achievements, a fundamental question that remains unresolved is whether based on the molecular profile of their tumors it is possible to prospectively identify patients who are more likely to benefit from radiotherapy. Personalized radiotherapy could be achieved by establishing biomarkers which can classify radiosensitive/-resistant tumors and/or tumor-surrounding normal tissue before initiation of treatment. To achieve such goal, previous studies have mostly evaluated single biomarkers or functional assays of DNA damage repair as predictor of intrinsic cellular radiosensitivity. Among others, assessment of the cell survival fraction [34] or the number of residual DNA double strand breaks after ex vivo irradiation of tumor cells [35] or normal tissue [36,37] as well as in vivo determination of the extent of tumor hypoxia [38] have been evaluated extensively. Although promising according to preliminary clinical data, none of them have become routine yet which might be due to low robustness of some of these in-vivo assays [36].
The generation of high-throughput data sets in the omics era has provided a novel and complementary opportunity in biomarker discovery. Using high-throughput transcriptome analysis, it has been previously shown that prediction of cellular radiosensitivity of tumor cell lines by expression analysis of a defined set of genes clearly outperformed assays of single gene analysis [39]. The value of this molecular signature as predictive biomarker for radiosensitivity was already confirmed in a large clinical cohort [40] speaking for its clinical potential. Another interesting approach is the use of hypoxia gene expression signatures for selecting patients who likely benefit from the inclusion of hypoxia-modifying drugs in regimens of radio- [41] or radiochemotherapy [42].
Besides the influence of gene expression levels, individual differences in cellular radiosensitivity are thought to be at least partly determined by germ-line genetic variants. Rare variants which are likely to be functional can only be detected by high-throughput DNA sequencing, made now affordable by the NGS technology. To date, only a few studies have used NGS for assessment of the exact role of SNPs for treatment outcome after radiotherapy. Recently, the role of germ-line SNPs and rare variants in MRE11A as predictive biomarkers of both tumor response and toxicity following definitive radiotherapy of muscle-invasive bladder cancer was analyzed by this technology [43]. Carriers of at least one of six rare MRE11A variants had a significantly higher risk of local failure in the radiotherapy arm, whereas no such association was seen in the surgically treated patient cohort [43]. It will certainly be interesting to expand such type of analysis to a broader spectrum of cancer types.
For elucidating the role of somatic mutations in radioresistance NGS has first been applied in bacteria [44]. In a model of cellular adaptation to irradiation, extremely radioresistant E. coli strains were generated from the respective founder cells by repetitive cycles of increasing irradiation doses. Whole genome sequencing revealed a large number of genomic alterations in the radioresistant descendants of which only few were recurrent mutations, suggesting that multiple mechanisms can contribute to radiation resistance and that distinct evolutionary pathways can lead to this phenotype. Intriguingly, despite this heterogeneity, clear genetic patterns also emerged. Not unexpectedly, mutations clustered more frequently in genes of DNA double strand break repair.
In two recent NGS studies in locally advanced squamous cell carcinoma of the head and neck (HNSCC) our group has evaluated the role of somatic mutations in a set of cancer-related genes for the efficacy of definitive [45] and adjuvant chemoradiation [46]. Our studies could confirm previous reports of poor efficacy of radiotherapy in HNSCC tumors harboring disruptive TP53 mutations [47,48]. For the first time, we demonstrated a possible role of mutations in NOTCH1 and key driver genes (PIK3CA, KRAS, NRAS and HRAS) as predictive biomarkers of outcome after chemoradiation. Moreover, our studies also confirmed that archival formalin-fixed paraffin-embedded (FFPE) specimens are indeed suitable for targeted NGS although in series older than 8-10 years a considerable portion of samples (up to 30 %) might fail due to the high extent of DNA fragmentation (IT, ms in preparation, July 2015).
NGS is also increasingly being used for the dissection of the mechanisms involved in treatment-induced clonal selection in the course of acquired treatment resistance. To our knowledge, only one study so far has addressed this question in a model of radioresistance [49]. In this study, DNA-targeted sequencing was performed on pre- and post-treatment tumor tissues from rectal cancer patients who failed to respond to neoadjuvant chemoradiation. Mutant variants previously associated with radioresistance including TP53 were detected in post-treatment residual tumor tissue from non-responders. In line with an important role of TP53 mutation in radioresistance, an increase in allele frequency of aberrant TP53 variants as well as an increase in mutant p53 expression levels was observed in all cases in which the tumor harbored a hotspot missense mutation in the DNA-binding domain of p53. These data strongly suggest that chemoradiation exerts a selection pressure that leads to the increase in the relative portion of tumor cells expressing mutant p53 protein [49]. Strategies of downregulating mutant p53 [50] or refolding it into its wild-type conformation [51] might prove effective in sensitizing tumor cells to chemoradiation in this scenario.
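The increase in allele frequency referred to above is commonly expressed as the variant allele frequency (VAF), i.e. the fraction of sequencing reads at a position that carry the mutant allele. The following minimal sketch compares the VAF before and after treatment; the function names and the read counts are hypothetical and are not data from the cited study.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads covering a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def vaf_change(pre, post):
    """Difference in VAF between post- and pre-treatment samples, each
    given as a tuple (alt_reads, total_reads)."""
    return variant_allele_frequency(*post) - variant_allele_frequency(*pre)


if __name__ == "__main__":
    # Hypothetical read counts for one variant in paired samples.
    pre_treatment = (120, 1000)
    post_treatment = (450, 1000)
    print(f"VAF change: {vaf_change(pre_treatment, post_treatment):+.2f}")
```

A positive difference is compatible with enrichment of the mutant clone, although differences in tumor cell content between the two samples would also have to be considered before drawing such a conclusion.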
Another interesting NGS-based approach with potential impact in radiooncology is a novel method named XR-seq. This technique can be applied for genome-wide mapping of DNA excision repair [52]. The underlying principle is that human nucleotide excision repair generates two incisions surrounding the site of damage, creating fragments of approximately 30 nucleotides. In XR-seq, these fragments are enriched by immunoprecipitation of specific repair proteins which are tightly bound to the excised DNA fragments. By subjecting this fragment library to NGS, maps of global and transcription-coupled DNA repair can be generated. This novel method will allow uncovering repair characteristics and sequence preferences of treatment-induced DNA damage and as such might facilitate studies of the effects of mutational patterns on transcriptional activity on DNA repair in human tumor cells. This method should also prove useful in determining the effects of drugs like histone-modifying therapeutics or poly ADP ribose polymerase (PARP) inhibitors on nucleotide excision repair, and how they eventually interfere with radio- or chemosensitivity of tumor cells.
The immunomodulatory effects of radiation have been widely documented (for review see Burnette & Weichselbaum [53]) and immunogenic cell death was identified as key component not only of targeted therapies but also conventional treatment modalities including radiation [54]. It could thus be speculated that radiation of tumors with large numbers of genetic alterations, with a portion of them serving as putative neoantigens, is more likely to induce anti-tumor immunity compared to radiation of tumors with low number of alterations. In support of this assumption, the total number of immunogenic mutations per se (identified by WES) was positively correlated with overall survival of cancer patients treated with standard regimens [55]. Combining radiation and immune checkpoint blockade, which already demonstrated synergistic anti-tumor responses in animal models [56], is a promising strategy based on the above-mentioned principles. Integration of NGS-based mutational profiling in upcoming clinical trials of such combinatory treatment is anticipated and will determine the predictive value of the mutational load and/or the number of immunogenic mutations in this setting.

Intertumoral and intratumoral genomic heterogeneity: a real challenge for personalized medicine
As stated above, the technological advances coming along with NGS have permitted rapid analysis of individual cancer genomes at high resolution on single-nucleotide level. By this technical advancement, an astonishing heterogeneity between individual tumors has been revealed, with only a limited number of somatic alterations shared between tumors of the same histopathologic subtype. This large genetic heterogeneity can be illustrated in the model of HNSCC. Cases in this disease entity with a history of heavy smoking and alcohol consumption belong to the group of highly genetically instable tumors [57], most likely resulting from the extensive DNA damage that has been caused by tobacco carcinogen exposure for years. As of December 2014, preprocessed and preanalysed mutational data from 3 independent whole exome NGS studies in HNSCC [58][59][60] in total reporting on 412 HNSCC cases were available at cbioportal (http://www.cbioportal.org). We used these data which have been filtered using tissue-matched control sequences to exclude germ-line variants for a more detailed assessment of the extent of genetic heterogeneity in HNSCC. Overall, somatic non-synonymous mutations were detected in 15,293 genes. However, only 357 (2.3 %) of these genes were altered by mutation in >3 % of the tumors. In 127 (36 %) of the more frequently affected genes the mutation occurred within hotspot regions but for only 75 genes (15 %) the same base position was involved in more than one tumor. This means that recurrent mutations at hotspot regions were detected in only 0.5 % of all genes altered by mutations (Fig. 1). Alternatively, when the non-synonymous mutations were filtered using the MutSigv2.0 algorithm according to the background mutational rate per gene rather than their prevalence in HNSCC, only 51 genes (0.3 % of all affected genes) were identified as significantly mutated genes.
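Several of the proportions given above follow directly from the reported gene counts. The short sketch below recomputes them; the choice of denominator for each percentage is our reading of the text and should be regarded as an assumption, and the variable and function names are illustrative.

```python
def percent(part, whole):
    """Share of `part` in `whole`, expressed in percent."""
    return 100.0 * part / whole


ALL_MUTATED_GENES = 15_293    # genes with somatic non-synonymous mutations
FREQUENT_GENES = 357          # genes mutated in >3 % of tumors
HOTSPOT_GENES = 127           # frequent genes with mutations in hotspot regions
RECURRENT_BASE_GENES = 75     # genes with the same base hit in more than one tumor
MUTSIG_GENES = 51             # significantly mutated genes according to MutSig

if __name__ == "__main__":
    print(f"frequent / all:       {percent(FREQUENT_GENES, ALL_MUTATED_GENES):.1f} %")
    print(f"hotspot / frequent:   {percent(HOTSPOT_GENES, FREQUENT_GENES):.0f} %")
    print(f"recurrent base / all: {percent(RECURRENT_BASE_GENES, ALL_MUTATED_GENES):.1f} %")
    print(f"MutSig / all:         {percent(MUTSIG_GENES, ALL_MUTATED_GENES):.1f} %")
```

With these denominators the calculation yields 2.3 %, 36 %, 0.5 % and 0.3 %, in agreement with the corresponding values stated in the text.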
A second example for tumors of very high genetic heterogeneity is cutaneous melanoma [4]. In a landmark WES study on paired tumor and normal genomic DNA from 135 patients with melanoma an overall number of 86,813 coding mutations were detected at a 2:1 ratio of non-synonymous to synonymous events, suggestive for a high passenger mutation load [61]. Filtering against the basal mutation rates using MutSig [31] produced a list of 544 significantly mutated genes. By refining the algorithm to select for non-synonymous mutations of predicted functional consequence the authors reduced the list of candidate drivers to eleven genes harboring significant functional mutation burden. Interestingly, these genes included six well-known cancer genes (BRAF, NRAS, PTEN, TP53, CDKN2A, MAP2K1) and five new candidates (PPP6C, RAC1, SNX31, TACC1, and STK19) [61].
The huge genetic heterogeneity in these types of cancer underlines the need for advanced bioinformatics models for data analysis. It also impressively illustrates the need of identifying key oncogenic driver pathways rather than individual genes as targets of precision medicine. This assumption is also supported by the observation that many low-frequency mutations occur in breast and colorectal tumors, each of them having small effects on cell survival [62]. It is thus rather unlikely that genome sequencing will uncover a single target as the "Achilles heel" of a tumor.
Exacerbating the complexity of the genetic landscape of tumors, intratumoral heterogeneity in terms of spatial and temporal differences in the mutational patterns of key driver genes has recently been demonstrated for renal [63,64], lung [65], colorectal [66,67] and breast cancer [68]. Beyond etiologic, microenvironmental and tumor-specific factors which all might contribute to such genetic heterogeneity, therapy may act as further exogenous source of genome instability. Consistent with this, in a recent study using the genetic model system Caenorhabditis elegans cisplatin treatment has been found to lead to a striking increase in base substitutions as well as an elevated rate of larger structural alterations [69]. Importantly, among the mutations found to be induced by cisplatin in the human model some variants have been linked to tumor progression and drug resistance like activating HRAS mutations at codons 12 and 13 [70,71]. Temozolomide which is broadly used as radiosensitizer in brain tumors and sarcomas has been found to leave an imprint in the cancer genome in the form of an elevated rate of C > T transitions [57]. Concerning potential mutagenicity of radiotherapy, TP53 [72] as well as c-MYC among others were identified as radiosensitive gene loci [73].
Fig. 1 Genetic heterogeneity of squamous cell carcinomas of the head and neck region (HNSCC). The relative distribution of genes affected by mutations is shown according to their mean prevalence within the three analyzed study cohorts (≤3 % vs. >3 % of cases) and their frequency of occurrence at hotspot regions and/or recurrent base positions. The results shown here are based upon somatic mutation data generated by the TCGA Research Network [60], Stransky et al. [59] and Agrawal et al. [58]
In the light of accumulating evidence for high inter- and intratumoral genomic heterogeneity the identification of the relevant driver mutation(s) among passengers in an individual cancer biopsy at a defined stage of disease represents a significant hurdle in the development of NGS-based molecular diagnostics and personalized treatment. One approach to overcome this hurdle might be deep sequencing of cell-free circulating tumor DNA derived from blood plasma for personalized cancer genomic profiling [20,74,75,76,77,78], assuming that genetic variants which are present in tumors only at subclonal level (and which are probably not captured by the diagnostic biopsy) are finally and inevitably released by dying tumor cells to this common reservoir.

Future perspectives
Exciting new data from a continuously growing number of NGS cancer studies nourish the hope that this technology will also significantly contribute to increasing our understanding of the molecular mechanisms of radioresistance. However, many more studies will certainly be needed to determine the functional consequences of individual mutations or distinct mutational patterns for cellular radiosensitivity and the individual tumor's response to radiotherapy. Proteomics is expected to provide additional important information that will guide candidate drug selection and recent advances in proteomic techniques [79,80] have opened new avenues for optimized cancer treatment. The application of these techniques will not only allow the monitoring of protein-protein interactions, posttranslational modification and drug-target engagement directly in cells or tissues but will also represent a valuable tool for identifying off-target drug effects [80]. The latter feature will certainly also foster attempts to develop less toxic protocols of radiotherapy combined with molecularly targeted radiosensitizing agents.
The future of personalized radiation therapy will most likely not only include DNA-based NGS. It will also apply other high-throughput technologies such as RNA sequencing that in parallel provides quantitative gene expression as well as mutational status. Overall, it can be reasoned that integration of mutational patterns from NGS analysis and other omics data together with functional measures of cellular radiosensitivity in systems biology models will strongly improve the power of outcome prediction and optimize current treatment selection algorithms for individual patients.