Single-center versus multi-center data sets for molecular prognostic modeling: a simulation study

Radiation Oncology

Table 1 Parameters of simulations

Parameter	Symbol	Type	Sc1	Sc2	Sc3
signal strength	\(\tilde {\beta }\)	=	[0; 0.5]	0.25	0.125
number of genes, informative	n_inf	=	300	[1; 1000]	300
sample size	N_s	=	100	100	[40; 500]
number of genes, total	N_g	=	10³	10³	10³
number of centers in MC	N_c	=	8	8	8
minimum samples per center	n_min	=	10	10	5
basal level gene g	α_g	∼	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)
target	a_ij	∼	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)
fixed batch effect gene g	γ_jg	∼	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)
number of latent factors	m_j	=	5	5	5
factor loadings	b_jgl	∼	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)
impact of factor l on sample i	Z_ijl	∼	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)
noise scaling of gene g in batch j	δ_jg	∼	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)
noise	ε_ijg	∼	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)	\(\mathcal {N}(0,1)\)
standard deviation of observation noise	σ_y	=	0.1	0.1	0.1

Each column shows the parameter set for one of the three simulated scenarios. Intervals indicate the range over which a parameter was varied in the respective scenario. Fixed parameters are indicated by ‘=’, while sources of heterogeneity such as signal, noise and batch effects are characterized by the parameters of their densities, indicated by the ‘∼’ symbol.
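To make the table easier to read as a simulation setup, the following minimal sketch draws every quantity marked ‘∼’ for scenario Sc1, with array shapes inferred from the index letters (i sample, j center, g gene, l latent factor). The dictionary `SC1`, the function `draw_components` and the argument `samples_per_center` are hypothetical names introduced here. How samples are allocated to centers and how the components are combined into expression values are not specified in the table, so the sketch leaves the allocation to the caller and does not combine the components.

```python
import numpy as np

# Fixed values copied from Table 1, scenario Sc1. The key names are illustrative.
SC1 = {
    "n_genes": 1000,       # N_g, total number of genes
    "n_informative": 300,  # n_inf, listed for completeness (unused below)
    "n_samples": 100,      # N_s, sample size
    "n_centers": 8,        # N_c, number of centers in the multi-center setting
    "n_min": 10,           # minimum samples per center
    "n_factors": 5,        # m_j, number of latent factors
    "sigma_y": 0.1,        # sd of observation noise (unused below)
}


def draw_components(params, samples_per_center, rng):
    """Draw all N(0,1) quantities of Table 1; shapes are assumed from the indices."""
    n_g = params["n_genes"]
    m = params["n_factors"]
    # Check the caller-supplied allocation against the fixed parameters.
    if len(samples_per_center) != params["n_centers"]:
        raise ValueError("one sample count per center is required")
    if min(samples_per_center) < params["n_min"]:
        raise ValueError("a center has fewer samples than n_min")
    if sum(samples_per_center) != params["n_samples"]:
        raise ValueError("center sizes must sum to the sample size")

    alpha = rng.standard_normal(n_g)  # basal level alpha_g, shared by all centers
    centers = []
    for n_j in samples_per_center:
        centers.append({
            "a": rng.standard_normal(n_j),           # target a_ij
            "gamma": rng.standard_normal(n_g),       # fixed batch effect gamma_jg
            "b": rng.standard_normal((n_g, m)),      # factor loadings b_jgl
            "Z": rng.standard_normal((n_j, m)),      # factor impact Z_ijl
            "delta": rng.standard_normal(n_g),       # noise scaling delta_jg
            "eps": rng.standard_normal((n_j, n_g)),  # noise eps_ijg
        })
    return alpha, centers
```

A call such as `draw_components(SC1, [10] * 6 + [20, 20], np.random.default_rng(0))` returns the gene-level basal levels and one dictionary of draws per center. The center sizes in that call are only an example allocation that satisfies n_min and sums to N_s.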

ISSN: 1748-717X