Prediction of Genetic Values using Neural Networks
Paulino Perez 1
Daniel Gianola 2
Jose Crossa 1
1 CIMMYT, Mexico. 2 University of Wisconsin, Madison.
September 2014
SLU,Sweden Prediction of genetic Values using Neural Networks 1/26
Contents
1 Introduction
2 Non-linear models and NN
3 Model fitting
4 Case study: Wheat
5 Application examples
Introduction
High-density marker panels enable genomic selection (GS).
Marker-based models perform better than pedigree-based models (e.g., de los Campos et al., 2009).
Most research has been done with linear additive models (see eq. 1).
It might be possible to increase accuracy using non-linear models with dominance and additive effects.
y_i = \sum_{j=1}^{p} x_{ij} \beta_j + e_i \qquad (1)
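As a point of reference, a model of the form of eq. (1) can be fitted with a simple penalized (ridge) regression. The sketch below is purely illustrative: the binary markers are simulated and the shrinkage value lambda is arbitrary, not one of the models discussed in these slides.

```r
# Illustrative ridge fit of the additive marker model y_i = sum_j x_ij*beta_j + e_i
set.seed(1)
n <- 100; p <- 20
X <- matrix(rbinom(n * p, 1, 0.5), n, p)   # simulated binary markers (e.g., DArT-like)
beta_true <- rnorm(p, sd = 0.3)
y <- X %*% beta_true + rnorm(n, sd = 0.5)
lambda <- 1                                 # arbitrary shrinkage parameter
beta_hat <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
yhat <- X %*% beta_hat
drop(cor(y, yhat))                          # fitted correlation
```

Penalization here plays the same role as the prior variance in the Bayesian approaches discussed later: it shrinks the marker effects toward zero to control over-fitting.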
Continued... Recent studies with non-additive effects:
Continued...
Non-linear models and neural networks
y_i = \mu + f(\mathbf{x}_i) + e_i \qquad (2)
Any non-linear function can be exactly represented as (Kolmogorov's theorem):
f(\mathbf{x}_i) = f(x_{i1}, \ldots, x_{ip}) = \sum_{q=1}^{2p+1} g\left( \sum_{r=1}^{p} \lambda_r h_q(x_{ir}) \right) \qquad (3)
In neural networks (NN), non-linear functions are "approximated" as sums of finite series of smooth functions.
The most basic and well-known NN is the Single Hidden Layer Feed-Forward Neural Network (SHLNN).
Continued...
Figure 1: Graphical representation of a SHLNN.
Continued...
Figure 2: Inputs (e.g., markers) and output (phenotype) for a SHLNN.
Continued...
Prediction has two (automated) steps:
1 Inputs are transformed non-linearly in the hidden layer.
2 Outputs from the hidden layer are combined to obtain predictions.
y_i = \mu + \overbrace{\sum_{k=1}^{S} w_k \underbrace{g_k\left( b_k + \sum_{j=1}^{p} x_{ij} \beta_j^{[k]} \right)}_{\text{output from hidden layer}}}^{\text{combine output from hidden layer}} + e_i
g_k(·) is the activation (transformation) function.
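The prediction equation above can be sketched directly in R. This is an illustrative forward pass only, assuming a tanh activation for g_k; the weights, biases and connection strengths below are arbitrary values, not fitted estimates.

```r
# SHLNN forward pass: y_i = mu + sum_k w_k * g_k(b_k + sum_j x_ij * beta_j^[k]) 
shlnn_predict <- function(X, mu, w, b, B) {
  # X: n x p inputs; w, b: length-S weights and biases; B: p x S connection strengths
  Z <- tanh(sweep(X %*% B, 2, b, "+"))  # hidden-layer outputs g_k(.), assumed tanh
  as.vector(mu + Z %*% w)               # combine hidden-layer outputs
}

set.seed(2)
X <- matrix(rnorm(10 * 3), 10, 3)       # 10 observations, 3 inputs
yhat <- shlnn_predict(X, mu = 0,
                      w = c(0.5, -0.5), b = c(0.1, -0.1),
                      B = matrix(rnorm(6), 3, 2))
length(yhat)                             # one prediction per observation
```

Because tanh is bounded in (-1, 1), the predictions here are bounded by |mu| plus the sum of the absolute output weights, which makes the role of w_k as a "combining" layer easy to see.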
Model fitting
The parameters to be estimated in a NN are the weights (w_1, \ldots, w_S), biases (b_1, \ldots, b_S), connection strengths (\beta_1^{[1]}, \ldots, \beta_p^{[1]}; \ldots; \beta_1^{[S]}, \ldots, \beta_p^{[S]}), \mu and \sigma^2_e.
When the number of predictors (p) and the number of neurons (S) increase, the number of parameters to estimate grows quickly.
=⇒ This can cause over-fitting.
To prevent over-fitting, use penalized methods, via Bayesian approaches.
Contents
1 Introduction
2 Non-linear models and NN
3 Model fitting: Empirical Bayes
4 Case study: Wheat
5 Application examples
Empirical Bayes
MacKay (1995) developed an Empirical Bayes framework for estimating the parameters of a NN.
Let \theta = (w_1, \ldots, w_S, b_1, \ldots, b_S, \beta_1^{[1]}, \ldots, \beta_p^{[1]}; \ldots; \beta_1^{[S]}, \ldots, \beta_p^{[S]}, \mu)'

p(\theta \mid \sigma^2_\theta) = MN(\mathbf{0}, \sigma^2_\theta \mathbf{I})
Estimation requires two steps:
1) Obtain the conditional posterior modes of the elements in \theta assuming \sigma^2_\theta and \sigma^2_e known. These are obtained by maximizing

p(\theta \mid y, \sigma^2_\theta, \sigma^2_e) = \frac{p(y \mid \theta, \sigma^2_e)\, p(\theta \mid \sigma^2_\theta)}{p(y \mid \sigma^2_\theta, \sigma^2_e)} = \frac{p(y \mid \theta, \sigma^2_e)\, p(\theta \mid \sigma^2_\theta)}{\int_{\mathbb{R}^m} p(y \mid \theta, \sigma^2_e)\, p(\theta \mid \sigma^2_\theta)\, d\theta},

which is equivalent to minimizing the "augmented" sum of squares:

F(\theta) = \frac{1}{2\sigma^2_e} \sum_{i=1}^{n} e_i^2 + \frac{1}{2\sigma^2_\theta} \sum_{j=1}^{m} \theta_j^2 \qquad (4)
Continued...
2) Update \sigma^2_\theta and \sigma^2_e by maximizing the marginal likelihood of the data, p(y \mid \sigma^2_\theta, \sigma^2_e).
The marginal log-likelihood is approximated as:

\log p(y \mid \sigma^2_\theta, \sigma^2_e) \approx k + \frac{n}{2} \log \beta + \frac{m}{2} \log \alpha - \frac{1}{2} \log |\Sigma| \Big|_{\theta=\theta_{MAP}} - F(\theta) \Big|_{\theta=\theta_{MAP}}

where \Sigma = \frac{\partial^2}{\partial \theta \partial \theta'} F(\theta), \alpha = 1/(2\sigma^2_\theta) and \beta = 1/(2\sigma^2_e). It can be shown that this function is maximized when:

\alpha = \frac{\gamma}{2 \sum_{j=1}^{m} \theta_j^2}, \qquad \beta = \frac{n - \gamma}{2 \sum_{i=1}^{n} e_i^2}, \qquad \gamma = m - 2\alpha\, \mathrm{Trace}(\Sigma^{-1})
Iterate between steps 1 and 2 until convergence.
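For intuition, the two-step iteration can be sketched for a plain linear model, where step 1 has a closed form. This is an illustrative sketch with simulated data, not the NN algorithm itself; it uses alpha = 1/(2*sigma_theta^2) and beta = 1/(2*sigma_e^2), so the Hessian of F is Sigma = 2*(beta*X'X + alpha*I) and gamma = m - 2*alpha*tr(Sigma^-1) = m - alpha*tr(A^-1).

```r
# MacKay-style iteration for a linear model y = X theta + e (illustrative only)
set.seed(3)
n <- 80; m <- 10
X <- matrix(rnorm(n * m), n, m)
y <- X %*% rnorm(m) + rnorm(n)

alpha <- 1; beta <- 1                       # arbitrary starting values
for (it in 1:50) {
  A     <- beta * crossprod(X) + alpha * diag(m)
  theta <- solve(A, beta * crossprod(X, y))  # step 1: conditional posterior mode
  gamma <- m - alpha * sum(diag(solve(A)))   # effective number of parameters
  e     <- y - X %*% theta
  alpha <- gamma / (2 * sum(theta^2))        # step 2: update variance parameters
  beta  <- (n - gamma) / (2 * sum(e^2))
}
c(gamma = gamma, sigma2_e = 1 / (2 * beta))  # gamma near m when the signal is strong
```

With simulated noise variance 1, the implied estimate 1/(2*beta) settles near 1, and gamma approaches m because all m coefficients are well determined by the data.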
NOTE: similar to using BLUP and ML in Gaussian linear models.
Problems with the approach
Huge number of parameters to estimate,
m = 1 + S × (1 + 1 + p)
where S is the number of neurons and p is the number of covariates.
The Gauss-Newton algorithm used to minimize (4) requires solving linear systems of order m × m, with complexity O(m³).
The updating formulas for the variance components require inverting a matrix of order m × m, also with complexity O(m³).
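To see how quickly m grows, the formula above can be evaluated for the wheat marker panel used later (1,717 markers); the function name below is just for illustration.

```r
# Number of parameters in a SHLNN: intercept + S * (weight + bias + p connection strengths)
m_params <- function(S, p) 1 + S * (1 + 1 + p)

m_params(2, 1717)    # 2 neurons on the wheat panel  -> 3439 parameters
m_params(10, 1717)   # 10 neurons                    -> 17191 parameters
```

Even a modest network therefore implies thousands of unknowns, which is why the O(m³) linear solves and matrix inversions dominate the cost of fitting.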
Alternatives:
Derivative-free algorithms (may have poor performance, unstable).
Parallel computing.
brnn
We developed an R package (brnn) that implements the Empirical Bayes approach to fitting a NN. It will be available in a few months on the R mirrors.
Figure 3: Help page for the trainbr package.
Case study: additive genetic effects (wheat)
Prediction of grain yield (GY) and days to heading (DTH) in wheat lines:
306 wheat lines from the Global Wheat Program of CIMMYT.
1,717 binary markers (DArT).
Two traits analyzed:
1 GY (5 environments).
2 DTH (10 environments).
Bayesian regularized neural networks (BRNN) were fitted using the MCMC approach.
The predictive ability of BRNN was compared against standard models by generating 50 random partitions with 90% of the observations in training and 10% in testing.
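The partitioning scheme can be sketched as follows. Everything here is a stand-in: the data are simulated with dimensions mirroring the wheat set, and a simple ridge regression replaces BRNN so the sketch stays self-contained.

```r
# 50 random 90/10 train/test splits; predictive correlation per partition
set.seed(4)
n <- 306; p <- 50
X <- matrix(rnorm(n * p), n, p)
y <- X %*% rnorm(p, sd = 0.2) + rnorm(n)

cors <- replicate(50, {
  test <- sample(n, round(0.1 * n))                     # 10% held out
  b <- solve(crossprod(X[-test, ]) + diag(p),            # fit on the 90% (ridge stand-in)
             crossprod(X[-test, ], y[-test]))
  drop(cor(y[test], X[test, ] %*% b))                    # correlation in the test set
})
mean(cors)   # average predictive correlation across the 50 partitions
```

Averaging the correlation over many random partitions, rather than using a single split, is what makes comparisons like those in Table 1 stable.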
Continued...
Table 1: Correlations between observed and predictedphenotypes for DTH and GY (“winner” underlined).
NOTE: Non-parametric methods were better in 15/15 comparisons.
Continued...
Figure 4: Correlation for each of the 50 partitions and 10 environments for days to heading (DTH), for different combinations of models.
Toy examples
#Example 1
#Noise triangle wave function, similar to example 1 in Foresee and Hagan (1997)

#Generating the data
x1=seq(0,0.23,length.out=25)
y1=4*x1+rnorm(25,sd=0.1)
x2=seq(0.25,0.75,length.out=50)
y2=2-4*x2+rnorm(50,sd=0.1)
x3=seq(0.77,1,length.out=25)
y3=4*x3-4+rnorm(25,sd=0.1)
x=c(x1,x2,x3)
y=c(y1,y2,y3)
X=as.matrix(x)

neurons=2
out=brnn(y,X,neurons=neurons)
cat("Message: ",out$reason,"\n")

plot(x,y,xlim=c(0,1),ylim=c(-1.5,1.5),main="Bayesian Regularization for ANN 1-2-1")
Note: type library(brnn) and then demo('Example_1') to run this example in the R console.
Continued...
[Figure: scatter of the noisy triangle-wave data (x in [0, 1], y in [-1.5, 1.5]) with the fitted curves from Matlab and R overlaid.]
Continued...
#2 inputs and 1 output
#The data used in Paciorek and Schervish (2004): a two-input, one-output function
#with Gaussian noise with mean zero and standard deviation 0.25.
data(twoinput)
X=normalize(as.matrix(twoinput[,1:2]))
y=as.vector(twoinput[,3])

neurons=10
out=brnn(y,X,neurons=neurons)
cat("Message: ",out$reason,"\n")

f=function(x1,x2,theta,neurons) predictions.nn(X=cbind(x1,x2),theta,neurons)
x1=seq(min(X[,1]),max(X[,1]),length.out=50)
x2=seq(min(X[,2]),max(X[,2]),length.out=50)  #grid over the second input
z=outer(x1,x2,f,theta=out$theta,neurons=neurons) #evaluating the fitted surface on the grid

transformation_matrix=persp(x1,x2,z,main="Fitted model",
  sub=expression(y==italic(g)~(bold(x))+e),
  col="lightgreen",theta=30,phi=20,r=50,d=0.1,expand=0.5,
  ltheta=90,lphi=180,shade=0.75,ticktype="detailed",nticks=5)
points(trans3d(X[,1],X[,2],f(X[,1],X[,2],theta=out$theta,neurons=neurons),
  transformation_matrix),col="red")
Continued...
[Figure: perspective plot of the fitted surface y = g(x) + e over (x1, x2), with the observed points overlaid in red.]
Application for the wheat dataset

Warning: this analysis can take a while, so we select only some markers. You can select markers based on p-values, for example, or try to reduce the dimensionality of your problem by using the G matrix or principal component scores as inputs.

rm(list=ls())
setwd("/tmp")
library(brnn)
library(BLR)
#Load the wheat dataset
data(wheat)

#Normalize inputs
y=normalize(Y[,1])
X=normalize(X)

p=300

#Fit the model with the FULL DATA, but only some markers
#You can select the markers based on p-values, for example
out=brnn(y=y,X=X[,1:p],neurons=2)
cat("Message: ",out$reason,"\n")

#Obtain predictions
yhat_R=predictions.nn(X[,1:p],out$theta,neurons=2)
plot(y,yhat_R)
Continued...
[Figure: scatter plot of observed phenotypes (y) versus predictions (yhat_R) for the wheat data.]

Notes:
The function predictions.nn obtains the predictions; it takes as arguments the vector of estimated parameters and the number of neurons.
The vector of estimated parameters can be obtained using the function brnn.
The brnn software runs faster in the R build developed by Revolution Analytics in Linux environments.
References
de los Campos, G., H. Naya, D. Gianola, J. Crossa, A. Legarra, E. Manfredi, K. Weigel and J. Cotes. 2009. Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182: 375-385.

Foresee, F. D., and M. T. Hagan. 1997. Gauss-Newton approximation to Bayesian regularization. Proceedings of the 1997 International Joint Conference on Neural Networks.

Gianola, D., R. Fernando and A. Stella. 2006. Genomic-assisted prediction of genetic values with semi-parametric procedures. Genetics 173: 1761-1776.

Gianola, D., and J. B. C. H. M. van Kaam. 2008. Reproducing kernel Hilbert spaces regression methods for genomic-assisted prediction of quantitative traits. Genetics 178: 2289-2303.
Continued...
Gianola, D., H. Okut, K. Weigel and G. Rosa. 2011. Predicting complex quantitative traits with Bayesian neural networks: a case study with Jersey cows and wheat. BMC Genetics.

MacKay, D. 1995. Probable networks and plausible predictions - a review of practical Bayesian methods. Network: Computation in Neural Systems.