Supervised Learning I, Cont’d
Page 1:

Supervised Learning I, Cont’d

Page 2:

Administrivia

•Machine learning reading group

•Not part of/related to this class

•We read advanced (current research) papers in the ML field

•Might be of interest. All are welcome

•Meets Fri, 3:00-4:30, FEC349 conf room

•More info: http://www.cs.unm.edu/~terran/research/reading_group/

•Lecture notes online

•Pretest/solution set online

Page 3:

5 minutes of math...

•Solve the linear system for V:

V = R + cTV

Page 4:

5 minutes of math...

•What if this were a scalar equation?

v = r + ctv  ⇒  v(1 − ct) = r  ⇒  v = r / (1 − ct)

Page 5:

5 minutes of math...

•Not much different for linear systems:

V = R + cTV  ⇒  (I − cT)V = R  ⇒  V = (I − cT)⁻¹R

•Linear algebra was developed to make working w/ linear systems as easy as working w/ linear scalar equations

•BUT matrix multiplication doesn’t commute!

NOTE! V = (I − cT)⁻¹R, not R(I − cT)⁻¹

Page 6:

5 minutes of math...

•So when does this work? When does a solution for V exist, and when is it unique?

•Think back to the scalar version:

v = r / (1 − ct)

•When does this have a solution? (Whenever 1 − ct ≠ 0)

•What’s the moral equivalent for linear systems?

Page 7:

5 minutes of math...

•The moral equivalent of a scalar “0” is a “singular matrix”

•Many ways to determine this. Simplest is the determinant: the system has a (unique) solution iff

det(I − cT) ≠ 0 (see the numeric check below)
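A quick numeric sanity check of this condition, as a minimal Python/NumPy sketch (the particular values of c, T, and R are made up for illustration):

import numpy as np

c = 0.9
T = np.array([[0.5, 0.5],
              [0.2, 0.8]])   # made-up square system matrix
R = np.array([[1.0],
              [2.0]])        # made-up column-vector right-hand side

A = np.eye(2) - c * T        # the matrix (I - cT)
print(np.linalg.det(A))      # nonzero, so a unique solution exists

V = np.linalg.solve(A, R)    # solve (I - cT) V = R for V
print(np.allclose(V, R + c * T @ V))  # confirms V = R + cTV holds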

Page 8:

5 minutes of math...

•Finally, what “shapes” are all of the parts?

•RHS and LHS must have the same shape, and the LHS, V, is a column vector...

•So R must be a column vector

•What about cTV? It, too, must be a column vector

Page 9:

5 minutes of math...

•Consider some cases. What if T is a vector?

•What about a rectangular matrix?

Page 10:

5 minutes of math...

⇒For the term cTV to be a column vector, T must be a square matrix: if V is n × 1, then T must be n × n

Page 11:

Review of notation

•Feature (attribute): a single measured property of an object

•Instance (example): a vector of feature values, x = (x_1, ..., x_d)

•Label (class): the output y to be predicted

•Feature space: the set of all possible instances

•Training data: a set of labeled instances, {(x_1, y_1), ..., (x_N, y_N)}
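As a concrete illustration of the notation, a small sketch in Python (the attribute names and values are borrowed from the rules example later in these slides):

# A feature is one measured property; an instance is a vector of them.
feature_names = ["skin", "liveBirth", "color"]

x = ("fur", "true", "brown")   # one instance from the feature space
y = "mammal"                   # its label (class)

# Training data: a set of labeled instances {(x_i, y_i)}
training_data = [
    (("fur", "true", "brown"), "mammal"),
    (("scales", "false", "yellow"), "coral snake"),
    (("scales", "false", "green"), "grass snake"),
]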

Page 12:

Hypothesis spaces

•The “true” function we want is usually called the target concept (also true model, target function, etc.)

•The set of all possible functions we’ll consider is called the hypothesis space, H

•NOTE! The target concept is not necessarily part of the hypothesis space!!!

•Example hypothesis spaces:

•All linear functions

•Quadratic & higher-order functions

Page 13:

Visually...

[Figure: the hypothesis space H drawn as a region inside the space of all functions on the feature space; the target concept might lie inside H, or it might lie outside it...]

Page 14:

More hypothesis spaces: Rules

if (x.skin == "fur") {
  if (x.liveBirth == "true") {
    return "mammal";
  } else {
    return "marsupial";
  }
} else if (x.skin == "scales") {
  switch (x.color) {
    case "yellow": return "coral snake";
    case "black":  return "mamba snake";
    case "green":  return "grass snake";
  }
} else {
  ...
}

Page 15:

More hypothesis spaces: Decision Trees

[Figure: an example decision tree]

Page 16:

More hypothesis spaces: Decision Trees

[Figure: another example decision tree]

Page 17:

Finding a good hypothesis

•Our job is now: given training data drawn from some feature space, and given a hypothesis space H, find the best hypothesis we can by searching H

[Figure: searching within H, drawn inside the space of all functions on the feature space]

Page 18:

Measuring goodness

•What does it mean for a hypothesis to be “as close as possible”?

•Could be a lot of things

•For the moment, we’ll think about accuracy: the fraction of training instances that the hypothesis labels correctly

•(Or, with a higher sigma-shock factor: acc = (1/N) Σᵢ 1[f̂(xᵢ) = yᵢ])
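In code, accuracy is just the fraction of training pairs the hypothesis gets right; a minimal Python sketch (fhat stands for the learned hypothesis, and training_data is a list of (x, y) pairs):

def accuracy(fhat, training_data):
    # Fraction of (x, y) pairs for which the hypothesis predicts y
    correct = sum(1 for x, y in training_data if fhat(x) == y)
    return correct / len(training_data)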

Page 19:

Constructing DT’s, intro

•Hypothesis space:

•Set of all trees, w/ all possible node labelings and all possible leaf labelings

•How many are there?

•Proposed search procedure:

1.Propose a candidate tree

2.Evaluate the accuracy of the candidate w.r.t. the training instances X and labels Y

3.Keep the max-accuracy tree seen so far

4.Go to 1

•Will this work?

Page 20:

A more practical alg

•Can’t really search all possible trees

•Instead, build tree greedily and recursively:

DecisionTree buildDecisionTree(X, Y)
Input: InstanceSet X, LabelSet Y
Output: decision tree

if (pure(X, Y)) {
  return new Leaf(Y);
} else {
  Attribute a = getBestSplitAttribute(X, Y);
  DecisionNode n = new DecisionNode(a);
  [X1, ..., Xk, Y1, ..., Yk] = splitData(X, Y, a);
  for (i = 1; i <= k; ++i) {
    n.addChild(buildDecisionTree(Xi, Yi));
  }
  return n;
}
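The same recursive procedure, rendered as a runnable Python sketch. The dict-based tree representation is my own choice, instances are assumed to be dicts mapping attribute names to values, termination corner cases (e.g., running out of attributes) are omitted, and pure() and get_best_split_attribute() are the helper functions the next slides discuss:

def build_decision_tree(X, Y):
    # X: list of instances; Y: matching list of labels
    if pure(X, Y):                     # all labels agree: make a leaf
        return {"leaf": Y[0]}
    a = get_best_split_attribute(X, Y)
    node = {"attr": a, "children": {}}
    # Split the data on attribute a's values, then recurse on each subset
    subsets = {}
    for x, y in zip(X, Y):
        Xi, Yi = subsets.setdefault(x[a], ([], []))
        Xi.append(x)
        Yi.append(y)
    for value, (Xi, Yi) in subsets.items():
        node["children"][value] = build_decision_tree(Xi, Yi)
    return node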

Page 21:

A bit of geometric intuition

[Figure: scatter plot of instances in a two-dimensional feature space; x1 = petal length, x2 = sepal width]

Page 22:

The geometry of DTs

•A decision tree splits space w/ a series of axis-orthogonal decision surfaces

•A.k.a. axis-parallel

•Equivalent to a series of half-spaces

•The intersection of all the half-spaces yields a set of hyper-rectangles (the analog of rectangles in d > 3 dimensional space)

•In each hyper-rectangle, the DT assigns a constant label

•So a DT is a piecewise-constant approximator over a sequence of hyper-rectangular regions (see the sketch below)
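For instance, a depth-2 tree over the two iris features above boils down to nested axis-parallel threshold tests, with a constant label inside each resulting rectangle (the thresholds here are invented for illustration):

def tiny_tree(petal_length, sepal_width):
    # Each comparison is an axis-orthogonal decision surface;
    # each branch returns a constant label for its hyper-rectangle.
    if petal_length < 2.5:
        return "setosa"
    if sepal_width < 3.0:
        return "versicolor"
    return "virginica"

The second test only applies inside the half-space petal_length >= 2.5, so the three labels tile the plane with axis-parallel rectangles.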

Page 23:

Filling out the algorithm

•Still need to specify a couple of functions:

•pure(X)

•Determine whether we’re done splitting set X

•getBestSplitAttribute(X,Y)

•Find the best attribute to split X on

•pure(X) is the easy (easier, anyway) one...
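A minimal Python version of the easy one, matching the pseudocode's pure(X, Y) signature (purity depends only on the labels, so X goes unused):

def pure(X, Y):
    # Done splitting when every remaining instance has the same label
    return len(set(Y)) <= 1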

Page 24:

Splitting criteria

•What properties do we want our getBestSplitAttribute() function to have?

•Increase the purity of the data: after the split, the new sets should be closer to uniform labeling than before the split

•Want the subsets to have roughly the same purity

•Want the subsets to be as balanced as possible

•These choices are designed to produce small trees

•Definition: learning bias == the tendency to find one class of solution out of H in preference to another
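The slide doesn't commit to a particular purity measure, but one standard way to score the purity improvement of a split is entropy-based information gain; a sketch in Python (assuming instances are dicts mapping attribute names to values, matching the earlier builder sketch):

import math
from collections import Counter

def entropy(Y):
    # 0 when the labels are pure; maximal when they are uniform
    total = len(Y)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(Y).values())

def get_best_split_attribute(X, Y):
    # Pick the attribute whose split leaves the least average entropy,
    # i.e., the one with the greatest information gain
    def remainder(a):
        groups = {}
        for x, y in zip(X, Y):
            groups.setdefault(x[a], []).append(y)
        return sum(len(Yi) / len(Y) * entropy(Yi)
                   for Yi in groups.values())
    return min(X[0].keys(), key=remainder)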

