The curse of (attribute) dimensionality
• Data may have a large number of attributes
– Data matrix with many columns d
– Challenges
• Models have a large number of parameters
– d(d+1)/2 variances / covariances in a Gaussian model (d variances plus d(d-1)/2 covariances)
– Cumbersome, more difficult to estimate well
– Dimension reduction
• Identify a small number of attributes (or combinations of attributes) that account for most of the data variation
• Other dimensions of the data are viewed as noise
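To make the parameter growth concrete, here is a minimal sketch that counts the free parameters of a d-dimensional Gaussian with a full covariance matrix. The symmetric covariance matrix has d variances and d(d-1)/2 distinct covariances, and the mean vector adds d more. The function name `gaussian_param_count` is a hypothetical helper introduced only for illustration.

```python
def gaussian_param_count(d):
    """Free parameters of a d-dimensional Gaussian with full covariance.

    Hypothetical helper for illustration only.
    """
    n_means = d                      # one mean per attribute
    n_cov = d * (d + 1) // 2         # d variances + d(d-1)/2 distinct covariances
    return n_means + n_cov


# The count grows quadratically with the number of columns d
for d in (2, 10, 100, 1000):
    print(d, gaussian_param_count(d))
```

With d = 100 attributes the model already has 5,150 parameters to estimate, which is why many columns make the model hard to estimate well from a limited number of rows.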
λ is the Lagrange multiplier used in constrained optimization. See https://en.wikipedia.org/wiki/Lagrange_multipliers
The projected mean is 0 because D is centered (each column of D has mean 0)
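The two notes above fit the standard derivation of the first principal direction, sketched here under assumed notation, since the original equations did not survive. The symbols x_i (rows of the centered data matrix D), n (number of rows), w (a unit-length direction), z_i = w^T x_i (projected values), and the 1/n covariance convention are all assumptions, as is writing the multiplier as λ.

```latex
% Assumed notation: x_i are the rows of the centered data matrix D (n rows),
% w is a unit direction, \Sigma = \frac{1}{n} D^\top D, \lambda is the multiplier.
\begin{align*}
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n} w^\top x_i
         = w^\top \Big(\frac{1}{n}\sum_{i=1}^{n} x_i\Big) = 0 \\
\operatorname{Var}(z) &= \frac{1}{n}\sum_{i=1}^{n} (w^\top x_i)^2 = w^\top \Sigma w \\
L(w,\lambda) &= w^\top \Sigma w - \lambda\,(w^\top w - 1) \\
\nabla_w L &= 2\Sigma w - 2\lambda w = 0 \;\Longrightarrow\; \Sigma w = \lambda w \\
\operatorname{Var}(z) &= w^\top \Sigma w = \lambda\, w^\top w = \lambda
\end{align*}
```

Under these assumptions the stationary points are eigenvectors of Σ, and the projected variance equals the corresponding eigenvalue, so the direction of maximum variance is the eigenvector with the largest eigenvalue.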
Exercise!
Scree Plots: eigenvalues in decreasing order
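As a rough illustration of what a scree plot displays, the sketch below computes the covariance eigenvalues of a data matrix and lists them in decreasing order together with each one's share of the total variance. The helper name `scree_values`, the synthetic data, and the 1/n covariance convention are assumptions made for this example, not part of the original slides.

```python
import numpy as np


def scree_values(D):
    """Return covariance eigenvalues in decreasing order and their variance shares.

    D is assumed to be an (n, d) data matrix; it is centered inside the function.
    Hypothetical helper for illustration only.
    """
    Dc = D - D.mean(axis=0)                    # center each column
    cov = Dc.T @ Dc / Dc.shape[0]              # (d, d) covariance, 1/n convention (assumed)
    eigvals = np.linalg.eigvalsh(cov)[::-1]    # eigvalsh returns ascending order; reverse it
    eigvals = np.clip(eigvals, 0.0, None)      # clear tiny negative round-off
    share = eigvals / eigvals.sum()            # fraction of total variance per component
    return eigvals, share


rng = np.random.default_rng(0)
# Synthetic data: 2 strong latent directions spread over 6 attributes, plus small noise
latent = rng.normal(size=(500, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 6))
D = latent @ mixing + 0.1 * rng.normal(size=(500, 6))

eigvals, share = scree_values(D)
for k, (ev, s) in enumerate(zip(eigvals, share), start=1):
    print(f"component {k}: eigenvalue {ev:10.4f}  share {s:6.3f}")
```

Plotting `eigvals` against the component index gives the scree plot. In this synthetic case the values drop sharply after the second component, which is the kind of "elbow" used to choose how many dimensions to keep.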