Gaussian Process Regression Forecasting of Computer Network Conditions
Christina Garman
Bucknell University
August 3, 2010
What are we doing and why do we care?
We have investigated Gaussian process regression for forecasting network conditions
Computer network conditions concern:
Users with large data transfers or resource-intensive applications
Network engineers monitoring the quality of their network
Network researchers
Gaussian process regression has not previously been applied to the field of computer networking
Computer Networking
A computer network is a system of computers and devices connected to share information and resources
Performance metrics of interest
Available bandwidth
Latency
Loss
L. Peterson and B. Davie, Computer Networks: A Systems Approach, Elsevier, 2007.
Background
Our forecasting efforts focus on the Department of Energy's Energy Sciences Network (ESnet)
Forecasts are done in MATLAB. We have created a framework that allows the code to be run directly in MATLAB or from a C program.
ESnet
[Figure: ESnet network topology map. Source: Department of Energy, Energy Sciences Network (ESnet), http://www.es.net/pub/maps/topology.html]
Gaussian Process Regression
Definition
A Gaussian process is an indexed set of random variables, any finite number of which have a joint Gaussian distribution. It can be completely specified by a mean function and a covariance function.

Gaussian process regression (GPR) allows us to make predictions of continuous quantities based on "learning" from a set of training data.
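In symbols (standard GP notation, not shown on the original slide): a process $f$ with mean function $m$ and covariance function $k$ satisfies

$$f \sim \mathcal{GP}(m, k) \;\Longleftrightarrow\; \begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{pmatrix},\; \big[k(x_i, x_j)\big]_{i,j=1}^{n}\right) \quad \text{for every finite set } \{x_1, \ldots, x_n\}.$$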
What is a covariance function?
Also called a kernel function
Chosen in a way that best fits the data
Gives us a model of the data
Controls the properties of the Gaussian process
Has adjustable parameters, called hyperparameters
$$k(x_i, x_j) = \sigma^2 e^{-\frac{1}{2}\left(\frac{|x_i - x_j|}{l}\right)^2}$$
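As a concrete illustration, here is a minimal MATLAB sketch of this squared-exponential kernel; the hyperparameter values are illustrative, not taken from the talk.

```matlab
% Squared-exponential kernel: k(xi,xj) = sigma^2 * exp(-(1/2)*(|xi-xj|/l)^2).
% sigma and l are the hyperparameters; example values are illustrative.
sigma = 1.0;   % signal standard deviation
l     = 2.0;   % characteristic length scale
k = @(xi, xj) sigma^2 .* exp(-0.5 * (abs(xi - xj) ./ l).^2);

k(1, 3)        % covariance of two inputs two time units apart (~0.61)
```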
What are hyperparameters?
Adjustable
Can be “learned” or inferred from a set of training data
Allow the kernel function to provide the best description of the current data
$$k(x_i, x_j) = \sigma^2 e^{-\frac{1}{2}\left(\frac{|x_i - x_j|}{l}\right)^2}$$
Maximum Likelihood Estimation
Used to “learn” the hyperparameters
$$\mu = \frac{\vec{Y}^T K^{-1} \vec{1}}{\vec{1}^T K^{-1} \vec{1}} \qquad\qquad \sigma^2 = \frac{1}{n}\, \vec{Y}^T K^{-1} \vec{Y}$$
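A minimal MATLAB sketch of these estimates on synthetic data (all values illustrative; $K$ here is the unit-variance kernel matrix built from the measurement times, and a backslash solve stands in for the explicit inverse):

```matlab
% ML estimates of the mean and variance hyperparameters, following the
% formulas above. Synthetic data; real inputs would be network measurements.
t = (1:20)';  n = numel(t);
Y = sin(0.3*t) + 0.1*randn(n, 1);        % synthetic measurements
l = 2.0;                                 % assumed length scale
K = exp(-0.5*((t - t')/l).^2) + 1e-8*eye(n);  % kernel matrix + jitter
one = ones(n, 1);
mu     = (Y' * (K \ one)) / (one' * (K \ one));  % mu = Y'K^-1 1 / 1'K^-1 1
sigma2 = (Y' * (K \ Y)) / n;   % sigma^2 = (1/n) Y'K^-1 Y, as on the slide
```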
Terminology
Expected Value
$$E[X] = \sum_i p_i x_i$$

Variance
$$V[X] = E[(X - E[X])^2]$$

Covariance
$$\mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])]$$

$$\Sigma = \mathrm{Cov}[\vec{Y}] = \begin{pmatrix} \mathrm{Cov}[Y_1, Y_1] & \mathrm{Cov}[Y_1, Y_2] & \cdots & \mathrm{Cov}[Y_1, Y_n] \\ \mathrm{Cov}[Y_2, Y_1] & \mathrm{Cov}[Y_2, Y_2] & \cdots & \mathrm{Cov}[Y_2, Y_n] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}[Y_n, Y_1] & \mathrm{Cov}[Y_n, Y_2] & \cdots & \mathrm{Cov}[Y_n, Y_n] \end{pmatrix}$$
Forecasting
Forecast
$$Y_f = E[Y_f \mid \vec{Y}]$$
Standard Error
$$se(Y_f) = \sqrt{V[Y_f \mid \vec{Y}]}$$
Basic Algorithm
1. Given a vector $\vec{Y}$ of $n$ measurements made at times $t_1, \ldots, t_n$ as training data
2. Choose a kernel function
3. Perform a maximum likelihood estimate of the kernel parameters (hyperparameters) using the training data
4. Forecast the measurement $Y_f$ at time $t_f$. The mean and variance of $Y_f$ given the $n$ measurements $\vec{Y}$ are
$$E[Y_f \mid \vec{Y}] = \mu + \Sigma_f^T \Sigma^{-1} (\vec{Y} - \vec{\mu})$$
$$V[Y_f \mid \vec{Y}] = \Sigma_{ff} - \Sigma_f^T \Sigma^{-1} \Sigma_f$$
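A self-contained MATLAB sketch of these four steps on synthetic data (all names and values are illustrative; the length scale is fixed rather than learned, and the sample mean stands in for the ML estimate of $\mu$):

```matlab
% GPR forecast of Yf at time tf from n training measurements.
t = (1:20)';  n = numel(t);
Y = sin(0.3*t) + 0.1*randn(n, 1);           % step 1: training data
sigma2 = 1.0;  l = 2.0;                     % steps 2-3: kernel with
kern = @(a, b) sigma2 * exp(-0.5*((a - b)/l).^2);  % assumed hyperparameters
Sigma   = kern(t, t') + 1e-8*eye(n);        % n x n training covariance
tf      = 22;                               % forecast time
Sigmaf  = kern(t, tf);                      % cross-covariance with tf
Sigmaff = kern(tf, tf);
mu = mean(Y);                               % stand-in for the ML estimate
Ef = mu + Sigmaf' * (Sigma \ (Y - mu));     % step 4: E[Yf | Y]
Vf = Sigmaff - Sigmaf' * (Sigma \ Sigmaf);  % step 4: V[Yf | Y]
se = sqrt(Vf);                              % standard error of the forecast
```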
Why GPR?
GPR accommodates
Asynchronous data sources
Periodic data
Actively measured data
Missing data
Structural data

GPR can model various trends and properties of a data set
Simple covariance functions can be combined to create morecomplex ones
Combining Covariance Functions
[Figure: examples of building complex covariance functions from simple ones. Source: C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.]
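For instance, a MATLAB sketch using the standard result from Rasmussen and Williams that sums and products of valid covariance functions are again valid; the particular kernels and parameter values here are illustrative, not taken from the talk:

```matlab
% Building complex kernels from simple ones. A squared-exponential term
% captures a smooth trend; a periodic term captures a daily cycle; sums
% and products combine the behaviors.
se  = @(a, b, s2, l)    s2 * exp(-0.5*((a - b)/l).^2);
per = @(a, b, s2, p, l) s2 * exp(-2*sin(pi*(a - b)/p).^2 / l^2);

ksum  = @(a, b) se(a, b, 1, 48) + per(a, b, 1, 24, 1);   % trend + 24 h cycle
kprod = @(a, b) se(a, b, 1, 200) .* per(a, b, 1, 24, 1); % slowly fading cycle

ksum(0, 24)   % one period apart: periodic part is back near its peak
```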
New Formulae for Updating GPR Forecasts
Expected Value
$$E[Y_f \mid \vec{Y}, Y_u] = E[Y_f \mid \vec{Y}] + \frac{\begin{pmatrix} \Sigma_f \\ \Sigma_{uf} \end{pmatrix}^{T} \begin{pmatrix} \Sigma^{-1}\Sigma_u \\ -1 \end{pmatrix} \begin{pmatrix} \Sigma^{-1}\Sigma_u \\ -1 \end{pmatrix}^{T} \begin{pmatrix} \vec{Y} \\ Y_u \end{pmatrix}}{\Sigma_{uu} - \Sigma_u^T \Sigma^{-1} \Sigma_u}$$
Variance
$$V[Y_f \mid \vec{Y}, Y_u] = V[Y_f \mid \vec{Y}] - \frac{\left[\begin{pmatrix} \Sigma_f \\ \Sigma_{uf} \end{pmatrix}^{T} \begin{pmatrix} \Sigma^{-1}\Sigma_u \\ -1 \end{pmatrix}\right]^{2}}{\Sigma_{uu} - \Sigma_u^T \Sigma^{-1} \Sigma_u}$$
Computationally efficient: no new matrix inversions are required
No need to redo the whole process each time a new data point is received, as sketched below
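A self-contained MATLAB sketch of this update (variable names are mine and the data illustrative; the mean is subtracted so the numbers stay consistent with the forecast formulas on the Basic Algorithm slide, which the displayed update omits for centered data):

```matlab
% Updating an existing forecast at tf with one new measurement Yu at time
% tu. Only solves against the original n x n Sigma are needed; the
% enlarged (n+1) x (n+1) matrix is never inverted.
t = (1:20)';  n = numel(t);
Y = sin(0.3*t) + 0.1*randn(n, 1);
kern = @(a, b) exp(-0.5*((a - b)/2).^2);
Sigma = kern(t, t') + 1e-8*eye(n);
tf = 24;  tu = 21;  Yu = sin(0.3*tu);            % hypothetical new point
Sigmaf  = kern(t, tf);   Sigmau  = kern(t, tu);
Sigmaff = kern(tf, tf);  Sigmauu = kern(tu, tu);  Sigmauf = kern(tu, tf);
mu = mean(Y);
Ef = mu + Sigmaf' * (Sigma \ (Y - mu));          % old forecast mean
Vf = Sigmaff - Sigmaf' * (Sigma \ Sigmaf);       % old forecast variance
v = [Sigma \ Sigmau; -1];                        % one reused solve
s = Sigmauu - Sigmau' * (Sigma \ Sigmau);
w = [Sigmaf; Sigmauf]' * v;
Ef_new = Ef + w * (v' * [Y - mu; Yu - mu]) / s;  % updated mean
Vf_new = Vf - w^2 / s;                           % updated variance
```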
Variance - Two Questions
Question 1
What is the effect of history length on prediction error?
[Timeline: training measurements at times $t_{n+1}, t_n, \ldots, t_2, t_1$, with forecast point $t_f$]

$$E\big[\mathrm{Var}[Y_f \mid Y_1, \ldots, Y_n] - \mathrm{Var}[Y_f \mid Y_1, \ldots, Y_{n+1}]\big] = \;???$$
Variance - Two Questions
Question 2
How does the variance change as our forecasting point moves out in time?
Variance - Two Questions
Both of these questions boil down to a study of the same quantity:
$$K_f^T K^{-1} K_f$$
Bounds
Using the Rayleigh-Ritz theorem, we can bound the quantity that we are interested in, giving us:

$$\frac{1}{\lambda_{\max}(K)}\, K_f^T K_f \;\leq\; K_f^T K^{-1} K_f \;\leq\; \frac{1}{\lambda_{\min}(K)}\, K_f^T K_f$$

Or more simply:

$$\frac{1}{\lambda_{\max}(K)}\, n\, k(t)^2 \;\leq\; K_f^T K^{-1} K_f \;\leq\; \frac{1}{\lambda_{\min}(K)}\, n\, k(t)^2$$
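A quick numeric sanity check of the first bound in MATLAB (illustrative data; a small jitter keeps $K$ well conditioned, since $\lambda_{\min}$ of a smooth kernel matrix can be tiny):

```matlab
% Verify  (Kf'Kf)/lambda_max(K) <= Kf' K^-1 Kf <= (Kf'Kf)/lambda_min(K).
t = (1:20)';
kern = @(a, b) exp(-0.5*((a - b)/2).^2);
K  = kern(t, t') + 1e-8*eye(numel(t));   % jitter for conditioning
Kf = kern(t, 24);                        % cross-covariance with tf = 24
q    = Kf' * (K \ Kf);                   % the quantity of interest
lams = eig(K);
lo = (Kf' * Kf) / max(lams);
hi = (Kf' * Kf) / min(lams);
fprintf('%.3g <= %.3g <= %.3g\n', lo, q, hi);
```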
Future Work
Revisit this work from an information-theoretic perspective
Improve network performance characteristics forecasting using multivariate data
Acknowledgements
Department of Energy Research Assistantship
MATLAB Code: Carl Edward Rasmussen and Hannes Nickisch
Questions?