Gaussian Process Regression Forecasting of Computer Network Conditions
Christina Garman
Bucknell University
August 3, 2010
What are we doing and why do we care?
We have investigated Gaussian process regression for forecasting network conditions
Computer network conditions concern:
Users with large data transfers or resource-intensive applications
Network engineers monitoring the quality of their network
Network researchers
Gaussian process regression has not previously been applied to the field of computer networking
Computer Networking
A computer network is a system of computers and devices connected to share information and resources
Performance metrics of interest
Available bandwidth
Latency
Loss
L. Peterson and B. Davie, Computer Networks: A Systems Approach, Elsevier, 2007.
Background
Our forecasting efforts focus on the Department of Energy's Energy Sciences Network (ESnet)
Forecasts are done in MATLAB. We have created a framework that allows the code to be run directly in MATLAB or from a C program.
ESnet
[Figure: ESnet network topology map. Source: Department of Energy, Energy Sciences Network (ESnet), http://www.es.net/pub/maps/topology.html]
Gaussian Process Regression
Definition
A Gaussian process is an indexed set of random variables, any finite number of which have a joint Gaussian distribution. It can be completely specified by a mean function and a covariance function.

Gaussian process regression (GPR) allows us to make predictions of continuous quantities based on "learning" from a set of training data.
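In symbols (standard GP notation, not shown on the original slide): a process $f$ with mean function $m$ and covariance function $k$ satisfies

$$f \sim \mathcal{GP}(m, k) \;\Longleftrightarrow\; \begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix} \sim \mathcal{N}\!\left(\begin{pmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{pmatrix},\; \big[k(x_i, x_j)\big]_{i,j=1}^{n}\right) \quad \text{for every finite set } \{x_1, \ldots, x_n\}.$$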
What is a covariance function?
Also called a kernel function
Chosen in a way that best fits the data
Gives us a model of the data
Controls the properties of the Gaussian process
Has adjustable parameters, called hyperparameters
$$k(x_i, x_j) = \sigma^2 e^{-\frac{1}{2}\left(\frac{|x_i - x_j|}{l}\right)^2}$$
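As a concrete illustration, here is a minimal MATLAB sketch of this squared-exponential kernel; the hyperparameter values are illustrative, not taken from the talk.

```matlab
% Squared-exponential kernel: k(xi,xj) = sigma^2 * exp(-(1/2)*(|xi-xj|/l)^2).
% sigma and l are the hyperparameters; example values are illustrative.
sigma = 1.0;   % signal standard deviation
l     = 2.0;   % characteristic length scale
k = @(xi, xj) sigma^2 .* exp(-0.5 * (abs(xi - xj) ./ l).^2);

k(1, 3)        % covariance of two inputs two time units apart (~0.61)
```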
What are hyperparameters?
Adjustable
Can be “learned” or inferred from a set of training data
Allow the kernel function to provide the best description of the current data
$$k(x_i, x_j) = \sigma^2 e^{-\frac{1}{2}\left(\frac{|x_i - x_j|}{l}\right)^2}$$
Maximum Likelihood Estimation
Used to “learn” the hyperparameters
$$\mu = \frac{\vec{Y}^T K^{-1} \vec{1}}{\vec{1}^T K^{-1} \vec{1}} \qquad\qquad \sigma^2 = \frac{1}{n}\, \vec{Y}^T K^{-1} \vec{Y}$$
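A minimal MATLAB sketch of these estimates on synthetic data (all values illustrative; $K$ here is the unit-variance kernel matrix built from the measurement times, and a backslash solve stands in for the explicit inverse):

```matlab
% ML estimates of the mean and variance hyperparameters, following the
% formulas above. Synthetic data; real inputs would be network measurements.
t = (1:20)';  n = numel(t);
Y = sin(0.3*t) + 0.1*randn(n, 1);        % synthetic measurements
l = 2.0;                                 % assumed length scale
K = exp(-0.5*((t - t')/l).^2) + 1e-8*eye(n);  % kernel matrix + jitter
one = ones(n, 1);
mu     = (Y' * (K \ one)) / (one' * (K \ one));  % mu = Y'K^-1 1 / 1'K^-1 1
sigma2 = (Y' * (K \ Y)) / n;   % sigma^2 = (1/n) Y'K^-1 Y, as on the slide
```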
Terminology
Expected Value
$$E[X] = \sum_i p_i x_i$$

Variance
$$V[X] = E[(X - E[X])^2]$$

Covariance
$$\mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])]$$

$$\Sigma = \mathrm{Cov}[\vec{Y}] = \begin{pmatrix} \mathrm{Cov}[Y_1, Y_1] & \mathrm{Cov}[Y_1, Y_2] & \cdots & \mathrm{Cov}[Y_1, Y_n] \\ \mathrm{Cov}[Y_2, Y_1] & \mathrm{Cov}[Y_2, Y_2] & \cdots & \mathrm{Cov}[Y_2, Y_n] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}[Y_n, Y_1] & \mathrm{Cov}[Y_n, Y_2] & \cdots & \mathrm{Cov}[Y_n, Y_n] \end{pmatrix}$$
Forecasting
Forecast
$$Y_f = E[Y_f \mid \vec{Y}]$$
Standard Error
$$se(Y_f) = \sqrt{V[Y_f \mid \vec{Y}]}$$
Basic Algorithm
1. Given a vector $\vec{Y}$ of $n$ measurements made at times $t_1, \ldots, t_n$ as training data
2. Choose a kernel function
3. Perform a maximum likelihood estimate of the kernel parameters (hyperparameters) using the training data
4. Forecast the measurement $Y_f$ at time $t_f$. The mean and variance of $Y_f$ given the $n$ measurements $\vec{Y}$ are
$$E[Y_f \mid \vec{Y}] = \mu + \Sigma_f^T \Sigma^{-1} (\vec{Y} - \vec{\mu})$$
$$V[Y_f \mid \vec{Y}] = \Sigma_{ff} - \Sigma_f^T \Sigma^{-1} \Sigma_f$$
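A self-contained MATLAB sketch of these four steps on synthetic data (all names and values are illustrative; the length scale is fixed rather than learned, and the sample mean stands in for the ML estimate of $\mu$):

```matlab
% GPR forecast of Yf at time tf from n training measurements.
t = (1:20)';  n = numel(t);
Y = sin(0.3*t) + 0.1*randn(n, 1);           % step 1: training data
sigma2 = 1.0;  l = 2.0;                     % steps 2-3: kernel with
kern = @(a, b) sigma2 * exp(-0.5*((a - b)/l).^2);  % assumed hyperparameters
Sigma   = kern(t, t') + 1e-8*eye(n);        % n x n training covariance
tf      = 22;                               % forecast time
Sigmaf  = kern(t, tf);                      % cross-covariance with tf
Sigmaff = kern(tf, tf);
mu = mean(Y);                               % stand-in for the ML estimate
Ef = mu + Sigmaf' * (Sigma \ (Y - mu));     % step 4: E[Yf | Y]
Vf = Sigmaff - Sigmaf' * (Sigma \ Sigmaf);  % step 4: V[Yf | Y]
se = sqrt(Vf);                              % standard error of the forecast
```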
Why GPR?
GPR accommodates
Asynchronous data sources
Periodic data
Actively measured data
Missing data
Structural data

GPR can model various trends and properties of a data set
Simple covariance functions can be combined to create morecomplex ones
Combining Covariance Functions
[Figure: examples of building complex covariance functions from simple ones. Source: C. Rasmussen and C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.]
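For instance, a MATLAB sketch using the standard result from Rasmussen and Williams that sums and products of valid covariance functions are again valid; the particular kernels and parameter values here are illustrative, not taken from the talk:

```matlab
% Building complex kernels from simple ones. A squared-exponential term
% captures a smooth trend; a periodic term captures a daily cycle; sums
% and products combine the behaviors.
se  = @(a, b, s2, l)    s2 * exp(-0.5*((a - b)/l).^2);
per = @(a, b, s2, p, l) s2 * exp(-2*sin(pi*(a - b)/p).^2 / l^2);

ksum  = @(a, b) se(a, b, 1, 48) + per(a, b, 1, 24, 1);   % trend + 24 h cycle
kprod = @(a, b) se(a, b, 1, 200) .* per(a, b, 1, 24, 1); % slowly fading cycle

ksum(0, 24)   % one period apart: periodic part is back near its peak
```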
New Formulae for Updating GPR Forecasts
Expected Value
$$E[Y_f \mid \vec{Y}, Y_u] = E[Y_f \mid \vec{Y}] + \frac{\begin{pmatrix} \Sigma_f \\ \Sigma_{uf} \end{pmatrix}^{T} \begin{pmatrix} \Sigma^{-1}\Sigma_u \\ -1 \end{pmatrix} \begin{pmatrix} \Sigma^{-1}\Sigma_u \\ -1 \end{pmatrix}^{T} \begin{pmatrix} \vec{Y} \\ Y_u \end{pmatrix}}{\Sigma_{uu} - \Sigma_u^T \Sigma^{-1} \Sigma_u}$$
Variance
$$V[Y_f \mid \vec{Y}, Y_u] = V[Y_f \mid \vec{Y}] - \frac{\left[\begin{pmatrix} \Sigma_f \\ \Sigma_{uf} \end{pmatrix}^{T} \begin{pmatrix} \Sigma^{-1}\Sigma_u \\ -1 \end{pmatrix}\right]^{2}}{\Sigma_{uu} - \Sigma_u^T \Sigma^{-1} \Sigma_u}$$
Computationally efficient: no new matrix inversions are required
No need to redo the whole process each time a new data point is received, as sketched below
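A self-contained MATLAB sketch of this update (variable names are mine and the data illustrative; the mean is subtracted so the numbers stay consistent with the forecast formulas on the Basic Algorithm slide, which the displayed update omits for centered data):

```matlab
% Updating an existing forecast at tf with one new measurement Yu at time
% tu. Only solves against the original n x n Sigma are needed; the
% enlarged (n+1) x (n+1) matrix is never inverted.
t = (1:20)';  n = numel(t);
Y = sin(0.3*t) + 0.1*randn(n, 1);
kern = @(a, b) exp(-0.5*((a - b)/2).^2);
Sigma = kern(t, t') + 1e-8*eye(n);
tf = 24;  tu = 21;  Yu = sin(0.3*tu);            % hypothetical new point
Sigmaf  = kern(t, tf);   Sigmau  = kern(t, tu);
Sigmaff = kern(tf, tf);  Sigmauu = kern(tu, tu);  Sigmauf = kern(tu, tf);
mu = mean(Y);
Ef = mu + Sigmaf' * (Sigma \ (Y - mu));          % old forecast mean
Vf = Sigmaff - Sigmaf' * (Sigma \ Sigmaf);       % old forecast variance
v = [Sigma \ Sigmau; -1];                        % one reused solve
s = Sigmauu - Sigmau' * (Sigma \ Sigmau);
w = [Sigmaf; Sigmauf]' * v;
Ef_new = Ef + w * (v' * [Y - mu; Yu - mu]) / s;  % updated mean
Vf_new = Vf - w^2 / s;                           % updated variance
```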
Variance - Two Questions
Question 1
What is the effect of history length on prediction error?
[Timeline: training measurements at times $t_{n+1}, t_n, \ldots, t_2, t_1$, with forecast point $t_f$]

$$E\big[\mathrm{Var}[Y_f \mid Y_1, \ldots, Y_n] - \mathrm{Var}[Y_f \mid Y_1, \ldots, Y_{n+1}]\big] = \;???$$
Variance - Two Questions
Question 2
How does the variance change as our forecasting point moves out in time?
Variance - Two Questions
Both of these questions boil down to a study of the same quantity:
$$K_f^T K^{-1} K_f$$
Bounds
Using the Rayleigh-Ritz theorem, we can bound the quantity that we are interested in, giving us:

$$\frac{1}{\lambda_{\max}(K)}\, K_f^T K_f \;\leq\; K_f^T K^{-1} K_f \;\leq\; \frac{1}{\lambda_{\min}(K)}\, K_f^T K_f$$

Or more simply:

$$\frac{1}{\lambda_{\max}(K)}\, n\, k(t)^2 \;\leq\; K_f^T K^{-1} K_f \;\leq\; \frac{1}{\lambda_{\min}(K)}\, n\, k(t)^2$$
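A quick numeric sanity check of the first bound in MATLAB (illustrative data; a small jitter keeps $K$ well conditioned, since $\lambda_{\min}$ of a smooth kernel matrix can be tiny):

```matlab
% Verify  (Kf'Kf)/lambda_max(K) <= Kf' K^-1 Kf <= (Kf'Kf)/lambda_min(K).
t = (1:20)';
kern = @(a, b) exp(-0.5*((a - b)/2).^2);
K  = kern(t, t') + 1e-8*eye(numel(t));   % jitter for conditioning
Kf = kern(t, 24);                        % cross-covariance with tf = 24
q    = Kf' * (K \ Kf);                   % the quantity of interest
lams = eig(K);
lo = (Kf' * Kf) / max(lams);
hi = (Kf' * Kf) / min(lams);
fprintf('%.3g <= %.3g <= %.3g\n', lo, q, hi);
```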
Future Work
Revisit this work from an information-theoretic perspective
Improve network performance characteristics forecasting using multivariate data
Acknowledgements
Department of Energy Research Assistantship
MATLAB Code: Carl Edward Rasmussen and Hannes Nickisch
Questions?