
Machine Learning Practical
NITP Summer Course 2013

Pamela K. Douglas
UCLA Semel Institute
Email: [email protected]


Topics Covered

Part I: WEKA Basics

Part II: MONK Data Set & Feature Selection (from Kohavi & John, 1997)
• We will run part of this together

Part III: Applying WEKA to the Haxby data set


Part 1. Weka Basics

Background

What is Weka? Weka is data mining software written in Java. It contains a collection of machine learning algorithms (supervised & unsupervised), regression tools, and feature selection methods. Weka is open source and freely available at: http://www.cs.waikato.ac.nz/ml/weka/.

Currently, Weka only deals with "flat" files; however, an import-NIfTI button will be added to the next version of Weka. Input files are called Attribute Relation File Format (.arff) files. Until that "brain button" is released, you must first convert your data into this format. Example MATLAB files to do this are available on the NITP website.

Benefits of Weka:

1.) It is very easy to do cross-validation & nested cross-validation with the simple use of a flag.
2.) There are many classifiers available, each of which has been vetted by the machine learning community.
3.) The classification part is very fast. The art of using WEKA is in the feature selection step.

The Weka File Format

In WEKA, features are called "attributes." The first section of the input file is the header. In this section of the input file, one simply names (or initializes) all of the features. Each feature must be declared on a separate line starting with @attribute, followed by the feature name, and then the variable type.

Example: @attribute HaxbyVoxel1 real

The <variable type> can be any of the options supported by Weka:

numeric, integer, real, string, date

Note, for most neuroimaging features, we will use either a real or numeric variable type. However, we may also wish to include sex or behavioral data that may be a string (e.g., "female").

The next part of the .arff file contains the data as comma-separated lists, each followed by the class label (e.g., face). The entries on each line must correspond to the order of the features/attributes listed in the header. For example, if there are 4 voxels being used as features, and the first example (or instance) is one in which subjects viewed a face, the first data line might look like this:

-0.23, 0.56, 0.78, 0.51, face
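Putting the two sections together, a complete minimal .arff file for this hypothetical 4-voxel example might look like the following sketch (the relation name, voxel names, and the exact set of class labels are illustrative assumptions, not taken from the course scripts):

    @relation haxby_example

    @attribute HaxbyVoxel1 real
    @attribute HaxbyVoxel2 real
    @attribute HaxbyVoxel3 real
    @attribute HaxbyVoxel4 real
    @attribute class {face, house, cat, bottle, scissors, shoe, chair, scrambled}

    @data
    -0.23, 0.56, 0.78, 0.51, face
    0.12, -0.44, 0.09, 0.33, house

Note that the class attribute is declared as a nominal type by listing the allowed labels in braces, and that Weka treats the last attribute as the class by default.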


Testing (a Variety of) Machine Learning Classifiers

Why test multiple classifiers? According to the Wolpert & Macready "no free lunch" theorem, there is no single learning algorithm that universally performs best across all domains. Most supervised ML algorithms differ in the complexity of the model g(x|θ) they use to describe the inputs x with parameters θ (the inductive bias), in the loss function used, and/or in the optimization procedure used to best fit the model parameters to the data. You may therefore wish to test a series of classifier model hypotheses. Let's try a few using the WEKA GUI on one of the supplied test sets.

Launch the graphical user interface for Weka by navigating to WEKA-3-6, then double-click on the Weka icon.

From here, select the Explorer icon on the main Weka menu (see below).

You should now see a screen like the one shown below.


From the Preprocess menu (at the top), select 'Open file…', and navigate to the iris.arff file. Once you have selected this file, the data should be loaded in and should look like what you see below.

Select the "Visualize All" button to view a histogram of each attribute's distribution by class. Are there some attributes that are more informative than others?


Now select "Classify" from the top menu. We will now select the classifier to use on the iris data set. We will start by using the J48 decision tree algorithm, which implements the C4.5 decision tree as described originally by Quinlan (1993). You can find this algorithm by selecting classifiers >> trees >> J48.

There are various approaches to determining the performance of classifiers; however, cross-validation seems to be the most popular. In cross-validation, a number of folds n is specified. The dataset is randomly reordered and then split into n folds of equal size. In each iteration, one fold is used for testing and the other n-1 folds are used for training the classifier. The test results are collected and averaged over all folds, which gives the cross-validation estimate of the accuracy.
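The same run can also be reproduced from the command line, which will come in handy later for batching. A minimal sketch, assuming weka.jar is on your Java classpath and iris.arff is in the working directory (-t names the training file, -x the number of cross-validation folds):

    java -cp weka.jar weka.classifiers.trees.J48 -t iris.arff -x 10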


The default is to randomly assign your data into folds. However, you can create folds that have the same number of class examples per fold; this is called stratified cross-validation. Leave-one-out (LOO) cross-validation signifies that n is equal to the number of examples. Out of necessity, LOO CV has to be non-stratified, i.e., the class distributions in the test set are not related to those in the training data. Leave-one-out CV can be useful for datasets with a small number of exemplars, since it utilizes the greatest amount of training data. However, it provides only one example per fold for testing and assessing accuracy.

Here, we will start by using the default, 10-fold (randomly assigned) cross-validation. Now click Start. The Weka bird will pace back and forth while you classify. The output should look like what you see below:

You may wish to examine the confusion matrix, which indicates the number of correctly classified instances on the diagonal and misclassified instances on the off-diagonal.

Now try out a different classifier, Naïve Bayes. How does Naïve Bayes compare to the J48 tree? How about a support vector machine? (Hint: it's located under classifiers >> functions, and is called SMO.)


Part 2. Monk1 Data Set – Feature Selection

Background. The classic Monk-1 dataset (Thrun et al. 1991), available from the UCI repository (http://archive.ics.uci.edu/ml/machine-learning-databases/monks-problems/monks.names), was the first dataset used in an international competition applying ML algorithms to the same data. It is sometimes still used for ML benchmarking purposes.

Now, load monk1_train.arff using the Preprocess tab at the top, as you did previously for the iris data. Next click on Classify, and select 'Supplied test set.' Navigate to the file called monk1_test.arff. Click Close. Try using AdaBoost to classify this data set (located under the 'meta' menu tab).

The dangers of 'circular logic' have been discussed in detail in the neuroimaging literature (see Kriegeskorte et al., "Circular analysis in systems neuroscience: the dangers of double dipping," Nat Neurosci, 2009). To avoid this pitfall, one can set aside a separate set altogether, called a validation set, which has not been touched at all in any of the processing.

** NOTE – Feature selection should be run on your training data only! Using your test data in feature selection is another form of 'peeking.'

Feature Selection Step. Redundant (or highly correlated) features degrade classifier performance, and neuroimaging data have many spatially contiguous, highly correlated voxels. In this Monk-1 data set, see how only a few redundant features cause problems! With the Monk-1 data set, features 1 and 2 are highly correlated. Try going back to the Preprocess screen and selecting the second feature for removal. Using AdaBoost, try classifying the data set again. (Note: you will need to use the validation set called monk1_test_minus2.arff.) Did the classifier perform better?

There are a number of approaches to feature subset selection. Forward selection begins with an empty set of features, whereas backward elimination refers to a search that begins with the full set of features. Each of these methods is generally performed iteratively, as in the sketch below. You may wish to try out a few feature selection methods on the original monk1_train.arff data set. To do so, click on the Weka tab called "Select Attributes" at the top.
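To make the forward-selection idea concrete, here is a minimal MATLAB sketch of greedy forward selection. It assumes a hypothetical helper, scoreSubset(X, y, idx), that returns a cross-validated accuracy for the feature columns listed in idx; neither the helper nor this function comes from the Weka or NITP scripts.

    function selected = forward_select(X, y, scoreSubset)
    % Greedy forward selection: grow the feature set one feature at a time,
    % keeping each addition only if it improves the cross-validated score.
    nFeat = size(X, 2);
    selected = [];                 % start from the empty feature set
    bestScore = -Inf;
    improved = true;
    while improved
        improved = false;
        for f = setdiff(1:nFeat, selected)
            s = scoreSubset(X, y, [selected f]);   % score candidate subset
            if s > bestScore
                bestScore = s;
                bestFeat = f;
                improved = true;
            end
        end
        if improved
            selected = [selected bestFeat];        % commit the best addition
        end
    end
    end

Backward elimination is the mirror image: start from the full feature set and repeatedly drop the feature whose removal improves (or least degrades) the score.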


Part 3. Running the Haxby et al. 2001 Data Set with Weka

3.1. Background & Setup. In the well-known Haxby et al. 2001 Science paper, fMRI data were collected while subjects passively viewed objects from one of eight stimulus categories. Distributed and overlapping response patterns to each stimulus category were identified, even within regions that responded maximally to only one category. Here we will explore these data using Weka and MVPA scripts.

First, let's get started with Weka and see how different classifiers perform on these data. Open MATLAB, and add the folder NITP_ML_2015 to your path. Make sure to "add with subfolders." Note, this folder contains the following:

a.) Scripts for converting NIFTI data to Weka format
b.) The MVPA toolbox
c.) The MATLAB NIFTI toolbox

 

If you already have the MVPA and NIFTI toolboxes installed, you may wish to add only the main folder or appropriate select folders.

Step 1: Convert the Haxby data to Weka format.

Open the script called "create_arff_nifti_2015.m". Notice that it is rather straightforward to convert the data to Weka format. In this first script, we apply the ventral temporal lobe masks provided within the Haxby data set to select all voxels within that ROI. To run this script, you simply provide two input variables. The first is the path to the data (string); we will try this out for subject 1. The second is what you wish to name your output Weka file (string). For example, at the MATLAB command line, you might type:

data_dir = '/Users/NITP_Student/NITP_ML_2015/subj1/';
Weka_file = 'Haxby_Subj1_ROI';

Now, run the script:

create_arff_nifti_2015(data_dir, Weka_file)

Check the output. A file called Haxby_Subj1_ROI.arff should have been created in the data directory.
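If you are curious what such a conversion involves, the following is a rough MATLAB sketch of the general steps, not the actual contents of create_arff_nifti_2015.m. It assumes the NIfTI toolbox's load_nii function, hypothetical file names for the mask and functional data, and a cell array of class labels (labels) that you would build from the stimulus timing information.

    % Illustrative sketch only; file names and the labels variable are assumptions.
    mask = load_nii(fullfile(data_dir, 'mask4_vt.nii'));  % ventral temporal ROI mask
    bold = load_nii(fullfile(data_dir, 'bold.nii'));      % 4-D functional data
    idx  = find(mask.img > 0);                            % linear indices of ROI voxels
    nVol = size(bold.img, 4);

    fid = fopen([Weka_file '.arff'], 'w');
    fprintf(fid, '@relation haxby\n');
    for v = 1:numel(idx)
        fprintf(fid, '@attribute Voxel%d real\n', v);     % one attribute per ROI voxel
    end
    fprintf(fid, '@attribute class {face,house,cat,bottle,scissors,shoe,chair,scrambled}\n');
    fprintf(fid, '@data\n');
    for t = 1:nVol
        vol = bold.img(:, :, :, t);
        fprintf(fid, '%g,', vol(idx));                    % feature values, comma-terminated
        fprintf(fid, '%s\n', labels{t});                  % class label closes the line
    end
    fclose(fid);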


Load the file into Weka, and test out the support vector machine results. On the 8-class problem, you should get ~65% accuracy. Right-click on the bar at the top to see all the options available. You can try out different kernels and change parameters like the 'C' penalty term. To get a detailed description of your options, click "More."
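For reference, the configuration string shown in that bar looks roughly like the sketch below (exact defaults vary by Weka version). Here -C sets the complexity (penalty) parameter, and -K selects the kernel; for example, raising the PolyKernel exponent -E to 2.0 gives a quadratic kernel:

    weka.classifiers.functions.SMO -C 1.0 -K "weka.classifiers.functions.supportVector.PolyKernel -E 2.0"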

Try out some other classifiers. You might try the NaiveBayes classifier, which applies the conditional independence assumption; as you will see, it performs very poorly. On the other hand, you might try the MultiClassClassifier under "meta." How does this perform under the default parameters? (Better than SVM?) Note, with many of the "meta" classifiers, you have the option to choose from a variety of base classifiers, whose outputs are typically boosted (e.g., AdaBoost) or voted upon, perhaps after a bagging procedure (e.g., Random Forest).

Ok! Now you should be somewhat familiar with Weka!


Regularization & Hyperparameter Tuning

"To validate the generalization ability of a classifier with hyperparameters one has to perform a nested cross-validation. On each training set of the outer cross-validation, an inner cross-validation is performed for different values of the hyperparameters. The one with minimum (inner) cross-validation error is selected and evaluated on the test set of the outer cross-validation." – Müller et al. (2004)

Why perform parameter tuning? One example that illustrates why this is important comes from the Alpaydin (2004) textbook: when too few neighbors are used in K-NN (e.g., k = 1), the algorithm begins to overfit, and won't generalize well to incoming data sets.

Instead of performing feature selection as a separate step, you may wish to use regularization. A regularization term can be added to a ML algorithm's objective function to trade off complexity and accuracy. The 'C' term in SVM is essentially a regularization term. If time permits, you should try out different values for the C parameter. You may need to make large (order of magnitude) changes to see a difference.

Tuning the 'C' parameter can be vital. Note – in order to do this properly you will need to perform a nested cross-validation. Within MVPA, you can do this easily, and there are a number of good tutorial walk-throughs. For the purposes of this exercise, we will simply test how this parameter influences the outcome using cross-validation on the training data. If using Weka for this purpose, it is useful to optimize hyperparameters using command-line scripts.

Note on Batching Weka

To batch Weka, you can create scripts (bash, Perl, etc.) that use its command-line options. The bar that you right-click to get the options also gives you the line to type if you choose to run Weka at the command line. With large neuroimaging data sets, you may need to request more memory; use Java's '-Xmx' flag followed by the memory request.
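As a sketch of what such a batch script might look like, the bash loop below sweeps the SMO complexity parameter over several orders of magnitude, using 10-fold cross-validation on the training file. The weka.jar path, the 2 GB memory request, and the file name are assumptions to adapt to your setup:

    for C in 0.01 0.1 1 10 100; do
        # -Xmx2g raises the JVM memory limit; -t names the training file, -x the folds
        java -Xmx2g -cp weka.jar weka.classifiers.functions.SMO -C $C -t Haxby_Subj1_ROI.arff -x 10
    done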


MVPA: The Haxby et al. 2001 Data Set

MVPA is perfectly suited to classifying and decoding fMRI data. It has a number of capabilities built in for feature selection, cross-validation, and parameter tuning. One of the nice aspects of MVPA is the ability to run searchlight feature selection within the toolbox. Furthermore, there are tools for running permutation tests (where the labels are shuffled); in doing so, you can create null distributions against which to test your accuracy outcome. There are a number of detailed tutorials available, and we suggest you try out some of the tutorial scripts that ship with the MVPA toolbox. They can be found here: /mvpa/core/tutorial_easy (see the launch sketch at the end of this section).

Further Explorations

MVPA and Weka may also be run together: you can perform all your preprocessing in MVPA, and then run some of the additional classifiers available in Weka from MATLAB. If time permits, you might also try WEKA's feature selection tools under the "Select Attributes" menu. Try the linear forward search. Do the "informative features" found match the ones you would expect from visual inspection of the matrix? To see what command you should use, try holding your cursor over the main classifier window; WEKA should give you the command needed. This comes in handy when running a number of optimizations.

** Note – We will not do this today, however I wanted to let you know that scripts for converting functional connectivity matrices into Weka format are available from the previous Weka NITP tutorial. See the NITP website (2013).
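Returning to the MVPA tutorials mentioned above: assuming the toolbox and its tutorial data are on your MATLAB path, the introductory walk-through can be launched directly from the MATLAB prompt. The call below follows the toolbox's own tutorial as we recall it, so treat it as a sketch:

    [subj results] = tutorial_easy;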

     


Useful References

Feature Selection & Parameter Tuning

Kerr WT, Douglas PK, Anderson A, Cohen MS. The utility of data-driven feature selection: re: Chu et al. 2012. Neuroimage. 2014 Jan 1;84:1107-10.

Kriegeskorte N, Goebel R, Bandettini P. Information-based functional brain mapping. Proc Natl Acad Sci U S A. 2006 Mar 7;103(10):3863-8. Epub 2006 Feb 28.

Müller K-R, et al. Machine learning techniques for brain-computer interfaces. (2004) http://doc.ml.tu-berlin.de/bbci/publications/MueKraDorCurBla04.pdf

Lemm S, et al. Introduction to machine learning for brain imaging. Neuroimage. 2011 May 15;56(2):387-99. PMID: 21172442.

Why You Should Avoid Interpreting Feature Weights Directly

Haufe S, et al. On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage. 2014 Feb 15;87:96-110.

Neuroimaging & WEKA

If using these scripts in your own work, please cite:

Douglas PK, Harris S, Yuille A, Cohen MS. Performance comparison of machine learning algorithms and number of independent components used in fMRI decoding of belief vs. disbelief. Neuroimage. 2011 May;56(2):544-53. PMID: 21073969.

