Effect of machine learning methods on predicting NSCLC overall survival time based on radiomics analysis

Background: To investigate the effect of machine learning methods on predicting overall survival (OS) for non-small cell lung cancer (NSCLC) based on radiomics feature analysis.

Methods: A total of 339 radiomic features were extracted from the segmented tumor volumes of pretreatment computed tomography (CT) images. These radiomic features quantify the tumor phenotype on medical images in terms of tumor shape and size, intensity statistics, and texture. The performance of 5 feature selection methods and 8 machine learning methods was investigated for OS prediction. Prediction performance was evaluated with the concordance index between predicted and true OS for the NSCLC patients. Survival curves were estimated with the Kaplan-Meier method and compared by log-rank tests.

Results: The gradient boosting linear model based on Cox's partial likelihood, combined with the concordance index feature selection method, obtained the best performance (concordance index: 0.68, 95% confidence interval: 0.62 to 0.74).

Conclusions: These preliminary results demonstrate that appropriate combinations of machine learning and radiomics analysis methods can predict the OS of non-small cell lung cancer patients accurately.

Electronic supplementary material: The online version of this article (10.1186/s13014-018-1140-9) contains supplementary material, which is available to authorized users.

measuring the linear dependency between two variables. The Pearson correlation coefficient (PCC) between the feature x and the survival times y is defined as follows:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \quad (1)

The Kendall correlation coefficient (KCC) between the feature x and the times y is defined as follows:

\tau = \frac{n_c - n_d}{n(n-1)/2} \quad (2)

where n_c and n_d denote the numbers of concordant and discordant pairs, respectively. The Spearman correlation coefficient (SCC) is equivalent to the PCC applied to the rankings of the columns X and Y. When all the ranks in each column are distinct, the equation simplifies to:

\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \quad (3)

Here, d_i and n are the difference between the ranks of the two columns and the length of each column, respectively.

◆ Mutual information (MI):
MI is a measure of the mutual dependence between two variables.
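In practice, MI between a (discretized) feature and the survival times can be estimated from empirical joint frequencies; a minimal sketch (illustrative, not the paper's implementation):

```python
# Illustrative sketch: estimating MI between two discretized variables
# from empirical joint frequencies.  Returns the estimate in nats.
import math
from collections import Counter

def mutual_info(xs, ys):
    """I(X;Y) = sum over (x,y) of p(x,y) * log(p(x,y) / (p(x)p(y)))."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y)/(p(x)p(y)) simplifies to c*n/(count_x * count_y)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi
```

For continuous radiomic features, the values would first be binned; the binning scheme is an implementation choice not specified here.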
The MI between the feature x and the times y is defined as follows:

I(X; Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} \quad (4)

• Cox proportional hazards model (Cox): The hazard function of this model is formulated as follows:

h(t \mid x) = \lambda_0(t) \exp(\beta^T x) \quad (5)

Here, \lambda_0(t) and \beta are the baseline hazard function and the regression coefficients, respectively. The \beta can be estimated by maximizing the partial log-likelihood:

l(\beta) = \sum_{i : \delta_i = 1} \Big[ \beta^T x_i - \log \sum_{j \in R(t_i)} \exp(\beta^T x_j) \Big] \quad (6)

where \delta_i is the event indicator and R(t_i) is the set of patients still at risk at time t_i.

• Gradient boosting linear model based on CI and Cox (GB-Cindex and GB-Cox): The target of the gradient boosting linear models is to establish a function f^*(x) from the data X and Y. The functional mapping is learned by minimizing the loss function of the empirical risk:

f^*(x) = \arg\min_f \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) \quad (7)

Here, L is the loss function and f is built from component-wise base-learners. The gradient boosting method calculates the negative gradient of the loss function at each iteration (m = 1, \dots, m_{stop}) and evaluates it at f_{m-1}(x_i), i = 1, \dots, n. This yields the negative gradient vector:

u_m = \Big( -\frac{\partial L(y_i, f)}{\partial f} \Big|_{f = f_{m-1}(x_i)} \Big)_{i = 1, \dots, n} \quad (8)

Typically, one base-learner is utilized for each covariate, resulting in p candidate fits to u_m per iteration. The update \hat{f}_m is then set equal to the fitted values of the corresponding best base-learner. The package documentation also indicates that the penalty parameter (penalty) can be selected rather coarsely.
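The component-wise selection step can be sketched as follows. For simplicity this hypothetical example uses a squared-error loss in place of the Cox partial likelihood or CI-based loss (those change only how the negative gradient u is computed); the function name and parameters are illustrative, not the paper's:

```python
# Hypothetical sketch of component-wise gradient boosting with one
# linear base-learner per covariate.  Squared-error loss stands in for
# the Cox/CI losses; only the negative-gradient computation differs.
def componentwise_boost(X, y, n_iter=200, nu=0.1):
    n, p = len(X), len(X[0])
    coef = [0.0] * p      # aggregated linear coefficients
    f = [0.0] * n         # current predictions f_{m-1}(x_i)
    for _ in range(n_iter):
        u = [yi - fi for yi, fi in zip(y, f)]   # negative gradient (squared loss)
        best = None                              # (error, covariate, slope)
        for j in range(p):
            xj = [row[j] for row in X]
            sxx = sum(v * v for v in xj)
            if sxx == 0.0:
                continue
            b = sum(v * ui for v, ui in zip(xj, u)) / sxx  # LS fit of u on x_j
            err = sum((ui - b * v) ** 2 for ui, v in zip(u, xj))
            if best is None or err < best[0]:
                best = (err, j, b)
        _, j, b = best
        coef[j] += nu * b                        # update only the best covariate
        f = [fi + nu * b * row[j] for fi, row in zip(f, X)]
    return coef
```

Because only the best-fitting base-learner is updated at each iteration, the procedure performs implicit variable selection: covariates that never win a round keep a zero coefficient.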
• Bagging survival tree model (BST): Bagging is one of the most common methods used to reduce the variance of base learners. In the bagging survival tree model, the survival function is obtained by averaging the predictions of the individual survival trees. The log-rank splitting criterion is used as the splitting rule for survival data. The process of selecting splitting candidates and split points is repeated until the terminal nodes contain no fewer than nodeSize unique events. Based on the resulting tree ensemble, the cumulative hazard is estimated by aggregating the information of all ntree trees, with each tree given the equal weight w = 1/ntree. It should be noted that the documentation of the R package "randomForestSRC" recommends optimizing nodeSize by multiple experiments (Ishwaran, 2018).
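Within each tree, the terminal-node cumulative hazard is commonly obtained with the Nelson-Aalen estimator, H(t) = sum over event times t_i <= t of d_i / n_i; a minimal sketch (illustrative, not the paper's code):

```python
# Illustrative sketch: Nelson-Aalen cumulative hazard estimator for
# right-censored data, as used inside survival-tree terminal nodes.
def nelson_aalen(times, events):
    """Return (event_time, H) pairs at each distinct event time.

    times  : observed times (event or censoring)
    events : 1 if the event occurred, 0 if right-censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    H, out, i = 0.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]   # events at time t
            leaving += 1           # everyone observed at t leaves the risk set
            i += 1
        if deaths:
            H += deaths / at_risk  # increment d_i / n_i
            out.append((t, H))
        at_risk -= leaving
    return out
```

The bagged ensemble estimate is then the equal-weight average of these per-tree curves over the ntree bootstrap trees.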
• Support vector regression for censored data model (SVCR): The core idea of this method is to find a function that estimates the observed survival times (continuous outcome y) from the covariates, based on conventional support vector regression (SVR) (Vapnik, 1998). The SVCR model can be formulated as follows:

\min_{w, b, \xi, \xi^*} \; \frac{1}{2} \|w\|^2 + \gamma \sum_{i=1}^{n} (\xi_i + \xi_i^*)

subject to

w^T \Phi(x_i) + b \ge y_i - \xi_i, \quad i = 1, \dots, n

w^T \Phi(x_i) + b \le y_i + \xi_i^*, \quad \text{for } \delta_i = 1

\xi_i \ge 0, \; \xi_i^* \ge 0 \quad (9)

Here, \delta, \Phi, \gamma, and \xi_i, \xi_i^* are the censoring indicator, the function that translates the observed covariates to the feature space, the regularization constant, and the slack variables allowing for errors in the training-data predictions, respectively. The upper constraint is imposed only for uncensored observations (\delta_i = 1): for right-censored patients the true survival time is only known to exceed the observed time, so predictions above it are not penalized.
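The asymmetric treatment of censored patients can be made concrete as the per-sample penalty that the slack variables encode; a hedged sketch (illustrative names, not a library API):

```python
# Illustrative sketch: the per-sample slack penalty (xi + xi*) implied
# by the SVCR constraints for censored survival data.
def svcr_penalty(pred, time, delta):
    """delta = 1: event observed -> both over- and under-prediction penalized.
    delta = 0: right-censored -> the true time is only known to exceed
    `time`, so only under-prediction is penalized."""
    under = max(0.0, time - pred)   # slack xi  (prediction below observed time)
    over = max(0.0, pred - time)    # slack xi* (prediction above observed time)
    return under + over if delta == 1 else under
```

A full solver would minimize the regularized sum of these penalties over w via quadratic programming; the sketch shows only the loss geometry.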