  • Review of Lecture 17

    Occam's razor

    The simplest model that fits the data is also the most plausible.

    complexity of h ↔ complexity of H

    unlikely event ⇒ significant if it happens

    Sampling bias

    [Figure: input distribution P(x), with mismatched training and testing regions]

    Data snooping

    [Figure: cumulative profit (%) versus day, comparing "snooping" and "no snooping"]

  • Learning From Data

    Yaser S. Abu-Mostafa

    California Institute of Technology

    Lecture 18: Epilogue

    Sponsored by Caltech's Provost Office, E&AS Division, and IST. Thursday, May 31, 2012

  • Outline

    The map of machine learning

    Bayesian learning

    Aggregation methods

    Acknowledgments


  • It's a jungle out there

    stochastic gradient descent, nonlinear transformation, overfitting, data snooping, Occam's razor, perceptrons, data contamination, error measures, cross validation, linear models, types of learning, kernel methods, logistic regression, training versus testing, VC dimension, linear regression, deterministic noise, noisy targets, bias-variance tradeoff, RBF, SVM, weight decay, regularization, soft-order constraint, sampling bias, neural networks, exploration versus exploitation, weak learners, Gaussian processes, active learning, graphical models, decision trees, ensemble learning, Bayesian prior, collaborative filtering, clustering, hidden Markov models, distribution-free, ordinal regression, Boltzmann machines, no free lunch, mixture of experts, Q learning, learning curves, semi-supervised learning, is learning feasible?


  • The map

    THEORY: VC, bias-variance, complexity, Bayesian

    TECHNIQUES (models): linear, neural networks, SVM, nearest neighbors, RBF, Gaussian processes, SVD, graphical models

    TECHNIQUES (methods): regularization, validation, aggregation, input processing

    PARADIGMS: supervised, unsupervised, reinforcement, online, active


  • Outline

    The map of machine learning

    Bayesian learning

    Aggregation methods

    Acknowledgments


  • Probabilistic approach

    [Figure: the learning diagram, with the unknown target distribution P(y | x) (target function f: X → Y plus noise), the unknown input distribution P(x), the data set D = (x_1, y_1), ..., (x_N, y_N), the hypothesis set H, the learning algorithm A, and the final hypothesis g: X → Y with g ≈ f]

    Extend the probabilistic role to all components

    P(D | h = f) decides which h (likelihood)

    How about P(h = f | D)?


  • The prior

    P(h = f | D) requires an additional probability distribution:

    P(h = f | D) = P(D | h = f) P(h = f) / P(D) ∝ P(D | h = f) P(h = f)

    P(h = f) is the prior

    P(h = f | D) is the posterior

    Given the prior, we have the full distribution


  • Example of a prior

    Consider a perceptron: h is determined by w = (w_0, w_1, ..., w_d)

    A possible prior on w: each w_i is independent, uniform over [−1, 1]

    This determines the prior over h: P(h = f)

    Given D, we can compute P(D | h = f)

    Putting them together, we get P(h = f | D) ∝ P(h = f) P(D | h = f)
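    A minimal numerical sketch of this computation (illustrative, not from the lecture): perceptron weight vectors are sampled from the uniform prior, each sample is scored by the likelihood of a toy data set under an assumed label-noise rate eps, and the scores are normalized into a posterior over the sampled hypotheses. The data generation, the noise model, and all names are assumptions.

```python
# Sketch: approximate P(h = f | D) over perceptrons by sampling from the prior.
import numpy as np

rng = np.random.default_rng(0)
d, N, M, eps = 2, 20, 5000, 0.1        # dimension, data size, prior samples, label noise

# Toy data set D from a hidden target perceptron (illustrative)
w_true = rng.uniform(-1, 1, d + 1)
X = np.hstack([np.ones((N, 1)), rng.uniform(-1, 1, (N, d))])   # bias coordinate added
y = np.sign(X @ w_true)

# Prior: each weight independent, uniform over [-1, 1]
W = rng.uniform(-1, 1, (M, d + 1))

# Likelihood P(D | h = f): each label agrees with sign(w^T x) w.p. 1 - eps
agree = np.sign(X @ W.T) == y[:, None]             # N x M agreement matrix
log_lik = np.where(agree, np.log(1 - eps), np.log(eps)).sum(axis=0)

# Posterior over the sampled hypotheses: P(h = f | D) ∝ P(D | h = f) P(h = f)
post = np.exp(log_lik - log_lik.max())
post /= post.sum()

print("MAP weights:", W[post.argmax()])            # most probable sampled h given D
```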


  • A prior is an assumption

    Even the most neutral prior:

    x is unknown ⇒ x is random, uniform over [−1, 1]

    [Figure: P(x) uniform on [−1, 1]]

    The true equivalent would be:

    x is unknown ⇒ x is random, with distribution δ(x − a) for an unknown a

    [Figure: P(x) as a delta-function spike at the unknown a]


  • If we knew the prior

    . . . we could compute P(h = f | D) for every h ∈ H

    ⇒ we can find the most probable h given the data

    we can derive E(h(x)) for every x

    we can derive the error bar for every x

    we can derive everything in a principled way
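    As a sketch of this principle (assumptions throughout: a one-parameter hypothesis set h_a(x) = a·x, a discretized uniform prior on the slope a, Gaussian label noise with known sigma), the following derives E(h(x)) and an error bar at a query point directly from the posterior.

```python
# Sketch: with a known prior, E(h(x)) and error bars follow in a principled way.
import numpy as np

rng = np.random.default_rng(1)
a_grid = np.linspace(-1, 1, 2001)      # discretized uniform prior over the slope a

# Toy data set from a hidden slope with Gaussian noise (illustrative)
a_true, sigma, N = 0.6, 0.2, 15
xs = rng.uniform(-1, 1, N)
ys = a_true * xs + sigma * rng.standard_normal(N)

# Posterior over a: P(a | D) ∝ P(D | a) * (uniform prior)
log_lik = -0.5 * ((ys - a_grid[:, None] * xs) ** 2).sum(axis=1) / sigma**2
post = np.exp(log_lik - log_lik.max())
post /= post.sum()

x = 0.5                                 # any query point
h_vals = a_grid * x                     # h_a(x) for every hypothesis in the grid
mean = np.sum(post * h_vals)            # E(h(x)) under the posterior
err = np.sqrt(np.sum(post * (h_vals - mean) ** 2))   # error bar at x
print(f"E(h({x})) = {mean:.3f} ± {err:.3f}")
```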


  • When is Bayesian learning justified?

    1. The prior is valid

    trumps all other methods

    2. The prior is irrelevant

    just a computational catalyst


  • Outline

    The map of machine learning

    Bayesian learning

    Aggregation methods

    Acknowledgments


  • What is aggregation?

    Combining different solutions h_1, h_2, ..., h_T that were trained on D:

    Regression: take an average

    Classification: take a vote

    a.k.a. ensemble learning and boosting
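    A minimal sketch of the two combination rules (assuming h_1, ..., h_T are already-trained callables; the toy hypotheses below are illustrative):

```python
# Sketch: aggregate T solutions by averaging (regression) or voting (classification).
import numpy as np

def aggregate_regression(hypotheses, x):
    """Average the real-valued outputs of the individual hypotheses."""
    return np.mean([h(x) for h in hypotheses])

def aggregate_classification(hypotheses, x):
    """Majority vote over the ±1 outputs of the individual hypotheses."""
    return np.sign(np.sum([h(x) for h in hypotheses]))

# Toy already-trained hypotheses (odd T avoids tied votes)
hs_reg = [lambda x: x, lambda x: x + 0.2, lambda x: x - 0.1]
hs_cls = [lambda x: 1, lambda x: -1, lambda x: 1]
print(aggregate_regression(hs_reg, 1.0))      # average of the three predictions
print(aggregate_classification(hs_cls, 0.0))  # majority vote: +1
```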


  • Different from 2-layer learning

    In a 2-layer model, all units learn jointly:

    [Figure: the training data feeds a single learning algorithm that fits all units together]

    In aggregation, they learn independently, then get combined:

    [Figure: the training data feeds independent learning algorithms whose solutions are then combined]


  • Two types of aggregation

    1. After the fact: combines existing solutions

    Example. Netflix teams merging ⇒ blending

    2. Before the fact: creates solutions to be combined

    Example. Bagging - resampling D

    [Figure: resampled versions of the training data, each feeding its own learning algorithm]
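    A minimal sketch of bagging (the base learner here, a least-squares linear fit, is an illustrative assumption, not the lecture's choice): each solution is trained on a bootstrap resample of D, and predictions are averaged.

```python
# Sketch: bagging = train on resampled versions of D, then aggregate.
import numpy as np

def bag(X, y, T, rng):
    """Train T linear hypotheses, each on a bootstrap resample of (X, y)."""
    N = len(y)
    solutions = []
    for _ in range(T):
        idx = rng.integers(0, N, N)                # resample D with replacement
        w, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        solutions.append(w)
    return solutions

def predict(solutions, X):
    """Aggregate by averaging the T individual predictions."""
    return np.mean([X @ w for w in solutions], axis=0)

rng = np.random.default_rng(2)
X = np.hstack([np.ones((50, 1)), rng.uniform(-1, 1, (50, 1))])  # bias + one feature
y = 2 * X[:, 1] + 0.3 * rng.standard_normal(50)                 # noisy linear target
print(predict(bag(X, y, T=25, rng=rng), X[:3]))
```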


  • Decorrelation - boosting

    Create h_1, ..., h_t, ... sequentially: make h_t decorrelated with previous h's:

    [Figure: reweighted training data feeding the learning algorithm at each round]

    Emphasize points in D that were misclassified

    Choose weight of h_t based on Ein(h_t)
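    An AdaBoost-style sketch of this idea (an illustrative instance of decorrelation by reweighting, not necessarily the lecture's exact algorithm; the decision-stump weak learner and toy data are assumptions): misclassified points are emphasized, and each h_t is weighted by a function of its weighted in-sample error.

```python
# Sketch: boosting by reweighting; weak learner = decision stump on 1-D data.
import numpy as np

def stump(X, y, w):
    """Best stump sign(s * (x - thr)) under point weights w."""
    best_err, best_thr, best_s = np.inf, 0.0, 1
    for thr in X:
        for s in (1, -1):
            pred = s * np.sign(X - thr + 1e-12)
            err = w[pred != y].sum()
            if err < best_err:
                best_err, best_thr, best_s = err, thr, s
    return best_err, best_thr, best_s

def boost(X, y, T):
    N = len(y)
    w = np.full(N, 1.0 / N)                      # emphasis weights on points of D
    ensemble = []
    for _ in range(T):
        err, thr, s = stump(X, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)    # weight of h_t from its error
        pred = s * np.sign(X - thr + 1e-12)
        w = w * np.exp(-alpha * y * pred)        # emphasize misclassified points
        w = w / w.sum()
        ensemble.append((alpha, thr, s))
    return ensemble

def predict(ensemble, X):
    return np.sign(sum(a * s * np.sign(X - thr + 1e-12) for a, thr, s in ensemble))

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, 30)
y = np.sign(X - 0.2)                             # toy 1-D target
print("training accuracy:", np.mean(predict(boost(X, y, T=10), X) == y))
```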


  • Blending - after the fact

    For regression, h_1, h_2, ..., h_T ⇒ g(x) = Σ_{t=1}^{T} α_t h_t(x)

    Principled choice of the α_t's: minimize the error on an aggregation data set ⇒ pseudo-inverse

    Some α_t's can come out negative

    Most valuable h_t in the blend?

    Uncorrelated h_t's help the blend
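    A minimal sketch of blending by least squares (the base hypotheses and the aggregation data set are illustrative assumptions): the α_t's come from the pseudo-inverse of the matrix of hypothesis outputs on the aggregation set.

```python
# Sketch: choose the blend weights α by least squares via the pseudo-inverse.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 40)
y = np.sin(np.pi * x)                  # target values on the aggregation set

# Columns = outputs of the already-trained hypotheses h_1, ..., h_T
H = np.column_stack([x, x**3])

alpha = np.linalg.pinv(H) @ y          # minimizes ||H α - y||²; some α_t may be negative
g = H @ alpha                          # blended prediction g(x) = Σ_t α_t h_t(x)
print("blend weights:", alpha)
print("blend MSE:", np.mean((g - y) ** 2))
```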


  • Outline

    The map of machine learning

    Bayesian learning

    Aggregation methods

    Acknowledgments


  • Course content

    Professor Malik Magdon-Ismail, RPI

    Professor Hsuan-Tien Lin, NTU


  • Course staff

    Carlos Gonzalez (Head TA)

    Ron Appel

    Costis Sideris

    Doris Xin


  • Filming, production, and infrastructure

    Leslie Maxfield and the AMT staff

    Rich Fagen and the IMSS staff


  • Caltech support

    IST - Mathieu Desbrun

    E&AS Division - Ares Rosakis and Mani Chandy

    Provost's Office - Ed Stolper and Melany Hunt


  • Many others

    Caltech TAs and staff members

    Caltech alumni and the Alumni Association

    Colleagues all over the world


  • To the fond memory of Faiza A. Ibrahim

