Exploring the Space of Machine Learning on Stencil Based Kernels

[CS6350 : Project Report]

Abstract— Stencil based computations are used extensively in the High Performance Computing (HPC) domain for finding solutions of Partial Differential Equations (PDEs). These computations are represented in a polynomial form where the coefficients of the polynomial are influenced by parameters such as boundary values, initial values, and the degree of accuracy. In this work, we present an empirical study that explores the solution space for the following questions: i) is it possible to learn the coefficients of a given polynomial using machine learning, ii) how accurate can the learning be compared to the original solution, and iii) how competitive are the various machine learning algorithms deployed for this purpose? We implement our own learning model based on the stochastic gradient descent algorithm and benchmark it against the machine learning packages LibSVM and LibLinear for learning the polynomial coefficients. Given the intractable size of the training data produced by the stencil kernels, we employ the sampling strategy used in prior work to reduce the sample size.

I. INTRODUCTION

Stencil based computations are an important area of study in the scientific domain. They find a variety of applications in areas such as solutions to Partial Differential Equations (PDEs) and related areas of scientific computation research. Stencil computations are essentially repeated updates of neighboring points on a multidimensional grid, based on a given initial condition (initial value problem) or initial and boundary conditions (boundary value problem), and can be used to represent a wide variety of real-world differential equations such as heat and wave propagation. The effort in this project was to learn the stencil points using the constituent grid points as features, and to compare the performance of various learning algorithms on this task. If the study is successful, the results could potentially be utilized in a variety of applications that mandate computational optimizations on stencils. The comparative analysis could also help in making informed choices when picking one learning algorithm over others. For this project, the PDE we used was the heat1d equation.

Fig. 1. Stencil computation

II. FORMULATION OF THE PROBLEM

A. Stencil Based Kernels

As shown in Fig. 1, the evaluation of any given grid point on the stencil of the heat1d PDE can be represented as:

h[i+1][j] = α·h[i][j−1] + (1 − 2α)·h[i][j] + α·h[i][j+1] + f[i][j]·dt

Here, i+1 is indicative of the time step for which the computation is being done, and j is the j-th discrete point that is being evaluated on the basis of the (j−1)-th, j-th, and (j+1)-th positions of the current time step i. The parameter α (also known as cfl) is defined by

α = dt / (dx)²

where dt and dx are the sizes of the discretized time and length steps. Further, the term f[i][j]·dt is the value of the actual heat1d equation at point (i, j) multiplied by the discretized time step dt.
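For concreteness, the following is a minimal sketch of the explicit heat1d stencil kernel in the form given above. The function name, the NumPy-based layout, and the handling of boundaries are our own illustrative choices, not taken from the report.

```python
import numpy as np

def heat1d_stencil(h0, bc_left, bc_right, f, dt, dx, timesteps):
    """Explicit heat1d update, one row of h per time step.
    h0: initial temperatures, f[i][j]: source term, bc_*: boundary values.
    (Sketch only; names and argument layout are illustrative.)"""
    alpha = dt / dx**2                     # the cfl parameter
    n = len(h0)
    h = np.empty((timesteps + 1, n))
    h[0] = h0
    for i in range(timesteps):
        h[i + 1, 0] = bc_left              # impose BC at every time step
        h[i + 1, -1] = bc_right
        for j in range(1, n - 1):
            # h[i+1][j] = alpha*h[i][j-1] + (1 - 2*alpha)*h[i][j]
            #           + alpha*h[i][j+1] + f[i][j]*dt
            h[i + 1, j] = (alpha * h[i, j - 1]
                           + (1 - 2 * alpha) * h[i, j]
                           + alpha * h[i, j + 1]
                           + f[i, j] * dt)
    return h
```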

Since the learned model will be evaluated on the mean squared error, we consider the loss function to be the squared error, and add an L2 regularizer term (for obtaining PAC guarantees) to form the optimization objective J(w):

J(w) = (1/2)·wᵀw + (C/2)·Σ_{i=1}^{m} (yᵢ − wᵀxᵢ)²

Taking the gradient of the cost with respect to w on a single example (xᵢ, yᵢ), as stochastic gradient descent does, we get:

∇J(w) = w + C(wᵀxᵢ − yᵢ)xᵢ

We use stochastic gradient descent to reach the global minimum of this convex objective, with the following update rule:

w ← w − r·∇J(w)

which, substituting the gradient above, expands to

w ← (1 − r)·w + r·C·(yᵢ − wᵀxᵢ)·xᵢ

From the lectures we know that SGD will converge if the learning-rate sequence is squared-summable but not summable, so we choose the following function r(t, C) as the learning rate:

r = ρ₀ / (1 + ρ₀·t/C)

where t represents the example number and ρ₀ represents the initial learning rate. Both C and ρ₀ are hyperparameters of the learning algorithm.
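Putting the objective, its stochastic gradient, the update rule, and the learning-rate schedule together, a minimal sketch of this SGD learner could look as follows. The function name, the epoch loop, and the per-epoch shuffling are our own assumptions; only the update rule and the schedule come from the equations above.

```python
import numpy as np

def sgd_linear_regression(X, y, C, rho0, epochs=10, seed=0):
    """SGD over J(w) = (1/2) w^T w + (C/2) sum_i (y_i - w^T x_i)^2.
    (Sketch only; names are illustrative.)"""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):       # visit examples in random order
            t += 1
            # r = rho0 / (1 + rho0*t/C): squared-summable, not summable
            r = rho0 / (1.0 + rho0 * t / C)
            # w <- (1 - r) w + r C (y_i - w^T x_i) x_i
            w = (1.0 - r) * w + r * C * (y[i] - w @ X[i]) * X[i]
    return w
```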

III. TERMINOLOGY

In our setting, the term Initial Condition (IC) refers to the initial temperature of the one-dimensional rod to be considered for heating. Such a condition is spread across the length of the rod, except for the end points. The temperatures of the end points constitute the Boundary Condition (BC), which is imposed at every progressive time step of the computation. The PDE, packaged along with the specification of IC and BC, constitutes what is known as a Boundary Value Problem (BVP). We consider the following three BVPs for learning:

1) Uniform (uni)
2) Triangular (tri)
3) Piecewise linear (pwl)

In this section, we illustrate these three BVPs using 3D plots generated by executing the native stencil computations for 80 timesteps each. The first three instances of each BVP form our training data, and the last instance forms our test data set, on which we evaluate the learned model. These stencil computations are governed by the CFL condition, a necessary (though not always sufficient) condition for the convergence of the finite-difference approximation of a given numerical problem: to establish convergence, limits are imposed on the length of the time step and/or the lengths of the spatial intervals. We have therefore chosen these BVPs to comply with the CFL convergence criterion.
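For illustration, one plausible construction of the three IC shapes is sketched below. The functional forms are our reading of caption labels such as "IC:70/(L/2)x" and "IC:0-(0,L/2)"; the report does not spell them out, so treat every name and formula here as an assumption.

```python
import numpy as np

def uniform_ic(n, value):
    """Uniform IC, e.g. 'IC:50': a constant temperature across the rod."""
    return np.full(n, float(value))

def triangular_ic(n, L, peak):
    """Triangular IC, e.g. 'IC:70/(L/2)x' with BC 0,0: read here as a tent
    profile rising linearly to `peak` at x = L/2, then falling back."""
    x = np.linspace(0.0, L, n)
    return np.where(x <= L / 2, peak * x / (L / 2), peak * (L - x) / (L / 2))

def piecewise_linear_ic(n, L, value, flat_end, right_val):
    """Piecewise linear IC, e.g. 'IC:0-(0,L/2)' with BC 0,50: read here as
    constant `value` on (0, flat_end), then a linear ramp reaching
    `right_val` at x = L."""
    x = np.linspace(0.0, L, n)
    ramp = value + (right_val - value) * (x - flat_end) / (L - flat_end)
    return np.where(x <= flat_end, float(value), ramp)
```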

Fig. 2. Uniform IC:50 and BC:90,70

Fig. 3. Uniform IC:25 and BC:50,50

Fig. 4. Uniform IC:0 and BC:25,80

Fig. 5. Uniform IC:10 and BC:75,80

Fig. 6. Triangular IC:70/(L/2)x and BC:0,0

Fig. 7. Triangular IC:20/(L/2)x and BC:5,5

Fig. 8. Triangular IC:50/(L/2)x and BC:20,20

Fig. 9. Triangular IC:60/(L/2)x and BC:10,10

Fig. 10. Piecewise linear IC:0-(0,L/2) and BC:0,50

Fig. 11. Piecewise linear IC:0-(0,3L/4) and BC:0,70

Fig. 12. Piecewise linear IC:20-(0,3L/4) and BC:20,0

Fig. 13. Piecewise linear IC:20-(0,3L/4) and BC:20,0

IV. FRAMEWORK

A. Data Set Generation

In this phase, we generate the solutions computed by the native stencil kernel and partition them into training and test data sets. Such data sets are typically obtained by thoroughly randomizing the components that define a BVP while satisfying the CFL criterion, but for these experiments we restricted ourselves to 4 sets of BVPs.

B. Sampling

It was intractable to train directly on the data obtained in the above step, so we chose to perform stratified random sampling, which was explored as a viable option for sample size reduction in the previous work [1]; a sketch of one such scheme is given below.
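The exact stratification scheme is described in [1]; the sketch below shows one plausible variant in which strata are equal-width bins over the target value and a fixed number of points is drawn per bin (all names and the binning choice are our assumptions).

```python
import numpy as np

def stratified_sample(X, y, n_bins=10, per_bin=100, seed=0):
    """Stratified random sampling for sample-size reduction.
    Strata here are equal-width bins over the target value y; the exact
    stratification used in [1] may differ."""
    rng = np.random.default_rng(seed)
    edges = np.linspace(y.min(), y.max(), n_bins + 1)[1:-1]
    bins = np.digitize(y, edges)           # stratum index 0..n_bins-1
    keep = []
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        if idx.size:                        # draw up to per_bin per stratum
            take = min(per_bin, idx.size)
            keep.extend(rng.choice(idx, size=take, replace=False))
    keep = np.array(keep)
    return X[keep], y[keep]
```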

C. Cross validation as a higher order function

In class, we used cross validation as the primary approach for choosing hyperparameters; here we also use cross validation, treated as a higher-order function over learners, to choose between learning algorithms, as sketched below.
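A minimal sketch of cross validation as a higher-order function: `cv_mse` takes an arbitrary learner (any function mapping training data to a weight vector), so the same routine can rank whole algorithms rather than only hyperparameter settings. Names and the fold construction are illustrative.

```python
import numpy as np

def cv_mse(learner, X, y, k=6, seed=0):
    """Mean k-fold cross-validation MSE of `learner`, where `learner`
    is any function (X_train, y_train) -> w. (Sketch only.)"""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mses = []
    for f in range(k):
        test = folds[f]
        train = np.concatenate([folds[g] for g in range(k) if g != f])
        w = learner(X[train], y[train])
        mses.append(np.mean((y[test] - X[test] @ w) ** 2))
    return float(np.mean(mses))

# e.g. pick the algorithm with the smallest 6-fold CV MSE:
# best = min(candidate_learners, key=lambda L: cv_mse(L, X, y, k=6))
```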

Fig. 14. Process flow executed for learning each BVP

D. Ideas from class

During the course of this project, we explored and learned a variety of ideas. We found that stencil computations involving neighboring points on a stencil grid are indeed learnable. We also observed that not all learning algorithms perform the same way on a given learning problem: for the heat1d stencil computation, LibLinear had the best performance, yielding the minimum error in both the training and testing phases across all variants. The ideas learned in class were very helpful. As mentioned earlier, in addition to deploying the standard machine learning packages LibSVM and LibLinear, we also implemented our own learning algorithm based on stochastic gradient descent over a linear regression objective, an algorithm taken directly from class. The technique of k-fold cross validation for selecting the best hyperparameters was also taken from class.

E. Lessons

It is possible to obtain accuracies comparable to those of popular machine learning packages by choosing the correct ranges of hyperparameters for cross validation. As summarized above, the stencil computations are indeed learnable, learning algorithms differ in performance on the same problem, and LibLinear performed best for the heat1d stencil computation, with minimum error in both the training and testing phases across all variants.

V. RESULTS

TABLE I
6-FOLD CROSS-VALIDATION RESULTS FOR UNIFORM BVP (uniform initialization)

liblinear:
    s    p      e      c      cv mse
    13   0.001  0.001  0.01   1.5e-07
    12   0.01   0.001  0.01   0.0015
    11   0.1    0.001  0.01   0.02

libsvm:
    s    p      e      c      cv mse
    3    0.001  0.001  0.01   2.603
    3    0.01   0.001  0.01   3.87
    3    0.1    0.001  0.01   4.93

sgd linear regression:
    c      rho    cv mse
    0.001  0.001  7.7
    0.001  0.1    10.7
    0.1    0.1    12.7

(In the option notation of the LibLinear/LibSVM packages, s, p, e, and c denote the solver/SVM type, the ε of the ε-insensitive loss, the stopping tolerance, and the cost parameter, respectively; rho is the initial learning rate ρ₀ of our SGD learner.)

TABLE II
6-FOLD CROSS-VALIDATION RESULTS FOR TRIANGULAR BVP (triangular initialization)

liblinear:
    s    p      e      c      cv mse
    13   0.001  0.001  0.01   1.213e-07
    12   0.01   0.001  0.01   1.25e-05
    11   0.1    0.001  0.01   0.00012

libsvm:
    s    p      e      c      cv mse
    3    0.001  0.001  0.01   2.5979
    3    0.01   0.001  0.01   4.87
    3    0.1    0.001  0.01   12.93

sgd linear regression:
    c      rho    cv mse
    0.001  0.001  0.002
    0.001  0.1    1.203
    0.1    0.1    11.279

TABLE III
6-FOLD CROSS-VALIDATION RESULTS FOR PWL BVP (piecewise linear initialization)

liblinear:
    s    p      e      c      cv mse
    13   0.001  0.001  0.01   1.504e-07
    12   0.01   0.001  0.01   0.0015
    11   0.1    0.001  0.01   0.0215

libsvm:
    s    p      e      c      cv mse
    3    0.001  0.001  0.01   3.1195
    3    0.01   0.001  0.01   3.87
    3    0.1    0.001  0.01   7.129

sgd linear regression:
    c      rho    cv mse
    0.001  0.001  7.72
    0.001  0.1    10.78
    0.1    0.1    12.67

TABLE IV
ERROR RESULTS ON BVP-4 (final test errors based on LibLinear)

    BVP Variant        Max Error   Mean Squared Error
    Uniform            0.0545      0.00161
    Triangular         0.0163      0.00013
    Piecewise Linear   0.0397      0.000733

Fig. 15. Learned solution for uniform initialization, liblinear

Fig. 16. Learned solution for triangular initialization, liblinear

Fig. 17. Learned solution for piecewise linear initialization, liblinear

VI. ON CONTINUING THE PROJECT

We have identified the following six ideas as part of future work on this project.

A. Improving the performance of our Predictor

We implemented a version of linear regression; from these experiments we learned that it needs significant improvements. We will explore ideas to improve it along the lines of computational efficiency and accuracy.

B. Sample Size Planning

An information-theoretic sampling method mentioned in [3] maintains the entropy E of the sample population,

E(samplePop) = −Σ_{i=1}^{M} pᵢ log(pᵢ)

and adds a new sample only when it increases this entropy by a certain threshold; this threshold becomes a hyperparameter of the learning algorithm. A minimal sketch follows.
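A minimal sketch of this admission rule, assuming the probabilities pᵢ are estimated from a histogram of the sample population (the binning and all names are our assumptions; [3] may define E differently):

```python
import numpy as np

def entropy(values, n_bins=20):
    """Empirical entropy E = -sum_i p_i log(p_i) over a histogram of the
    sample population (binning is an assumption)."""
    counts, _ = np.histogram(values, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def admit(sample_pop, candidate, threshold):
    """Add `candidate` only if it raises the population entropy by more
    than `threshold` (the threshold is a hyperparameter)."""
    new_pop = np.append(sample_pop, candidate)
    if entropy(new_pop) - entropy(sample_pop) > threshold:
        return new_pop, True
    return sample_pop, False
```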

C. Studying other Stencil Kernels

In this project, we study only the 1-D heat PDE, but there are two other important PDEs, namely the wave and Laplace equations, as well as 2-D and 3-D stencil programs for calculating numerical solutions using stencil kernels. We plan to reuse this flow to find out whether these time-stepped solutions are also learnable.

D. Other ML algorithms

Learning PDEs is becoming an important area for two reasons: first, numerical scientists have stated that PDE codes are inherently prone to bugs and are hard to debug; second, there has already been a study [14] on using ANNs to learn seemingly complex PDEs.

E. Generative vs Discriminative model

In class we studied these two orthogonal approaches to modeling a learning problem. In our case of learning solutions to PDEs, we know the "unknown" function and its exact behaviour; the concept and its class are deterministic. Essentially, we are fully aware of the generative story of the data sets, and we will strive to answer the question: can we predict better, having this prior knowledge?

F. Online Learning

We claimed that if a model can be learned on a fixed boundary value problem, it can be deployed on any other BVP. But we will keep in mind the "black swan" caveat of prediction theory and explore the possibility of updating the learned model upon encountering large errors while testing on similar BVPs.

REFERENCES

[1] Vishal C. Sharma, Ganesh Gopalakrishnan, Greg Bronevetsky, Detecting Soft Errors in Stencil based Computations.

[2] Siamak Mehrkanoon, Johan A. K. Suykens, Learning Solutions to Partial Differential Equations Using LS-SVM, 2015.

[3] Siamak Mehrkanoon, Johan A. K. Suykens, LS-SVM Approximate Solutions to Linear Time Varying Descriptor Systems, 2012.

[4] Eduardo Berrocal, Leonardo Bautista-Gomez, Sheng Di, Zhiling Lan, and Franck Cappello, Lightweight Silent Data Corruption Detection Based on Runtime Data Analysis for HPC Applications.

[5] Ken Kelley, Sample Size Planning for the Coefficient of Variation from the Accuracy in Parameter Estimation.

[6] Richard H. Byrd, Gillian M. Chin, Jorge Nocedal, Yuchen Wu, Sample Size Selection in Optimization Methods for Machine Learning.

[7] John Burkardt, Scientific Computing Library: https://people.sc.fsu.edu/~jburkardt

[8] Erwin Kreyszig, Advanced Engineering Mathematics, Wiley, 2011.

[9] Vishal C. Sharma, Approaches for Approximate Stencil based Computations.

[10] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, Chih-Jen Lin, LIBLINEAR: A Library for Large Linear Classification, NTU.

[11] John Burkardt, Florida State University, Codes and Data Sets: https://people.sc.fsu.edu/~jburkardt/py_src/fd1d_heat_explicit/

[12] Michael Bowles, Machine Learning in Python: Essential Techniques for Predictive Analysis.

[13] Bitbucket repository: https://[email protected]/ufmr/cs6350mlproj.git

[14] Isaac Elias Lagaris, Aristidis Likas, and Dimitrios I. Fotiadis, Artificial Neural Networks for Solving ODEs and PDEs.

