Big data from small data: A survey of the neuroscience landscape through the Neuroscience Information Framework

Date post: 07-May-2015
Upload: maryann-martone
Description:
Presentation on the NIF project to Sandia Labs, with an in-depth look into NIF's data federation and strategies for creating on-line knowledge spaces
Maryann E. Martone, Ph. D. University of California, San Diego
Transcript
Page 1: Big data from small data:  A survey of the neuroscience landscape through the Neuroscience Information Framework

Maryann E. Martone, Ph.D., University of California, San Diego

Page 2:

Neuroscience is unlikely to be served by a few large databases, as the genomics and proteomics communities are

• Whole brain data (20 µm microscopic MRI)
• Mosaic LM images (1 GB+)
• Conventional LM images
• Individual cell morphologies
• EM volumes & reconstructions
• Solved molecular structures

No single technology serves these all equally well. Multiple data types; multiple scales; multiple databases

Page 3:

http://neuinfo.org

Page 4:
Page 5:

• NIF’s mission is to maximize the awareness of, access to and utility of research resources produced worldwide to enable better science and promote efficient use
  – NIF unites neuroscience information without respect to domain, funding agency, institute or community
  – NIF is like a “PubMed” for all biomedical resources and a “PubMed Central” for databases
  – Makes them searchable from a single interface
  – Practical and cost-effective; tries to be sensible
  – Learned a lot about current data practices

The Neuroscience Information Framework is an initiative of the NIH Blueprint consortium of institutes
http://neuinfo.org

Page 6:

http://neuinfo.org, June 10, 2013, dkCOIN Investigator's Retreat

• A portal for finding and using neuroscience resources
• A consistent framework for describing resources
• Provides simultaneous search of multiple types of information, organized by category
• Supported by an expansive ontology for neuroscience
• Utilizes advanced technologies to search the “hidden web”

UCSD, Yale, Cal Tech, George Mason, Washington Univ

Literature

Database Federation

Registry

Page 7:

We’d like to be able to find:
• What is known:
  – What are the projections of hippocampus?
  – Is GRM1 expressed in cerebral cortex?
  – What genes have been found to be upregulated in chronic drug abuse in adults?
  – What animal models have similar phenotypes to Parkinson’s disease?
  – What studies used my polyclonal antibody against GABA in humans?
• What is not known:
  – Connections among data
  – Gaps in knowledge

A framework makes it easier to address these questions

Page 8:
Page 9:

With the thousands of databases and other information sources available, simple descriptive metadata will not suffice

Page 10:

• NIF curators
• Nomination by the community
• Semi-automated text mining pipelines

NIF Registry: requires no special skills; site map available for local hosting

• NIF Data Federation
• DISCO interop
• Requires some programming skill
• Open Source Brain < 2 hr

Two-tiered system: low barrier to entry

Page 11:

[Figure labels: Current; Planned]

DISCO Dashboard Functions
• Ingest Script Manager
• Public Script Repository
• Data & Event Tracker
• Versioning System
• Curator Tool
• Data Transformer Manager

Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd, Yale University

Page 12:

NIF was designed to be populated rapidly with progressive refinement

Page 13:

Databases come in many shapes and sizes

• Primary data:
  – Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
• Secondary data:
  – Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD); gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
• Tertiary data:
  – Claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function of task
• Registries:
  – Metadata
  – Pointers to data sets or materials stored elsewhere
• Data aggregators:
  – Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
• Single source:
  – Data acquired within a single context, e.g., Allen Brain Atlas

Researchers are producing a variety of information artifacts using a multitude of technologies

Page 14:

Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”

• Query expansion: synonyms and related concepts
• Boolean queries
• Data sources categorized by “data type” and level of nervous system
• Common views across multiple sources
• Tutorials for using full resource when getting there from NIF
• Link back to record in original source
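The query expansion on this slide can be illustrated with a minimal sketch. It assumes a small hand-written synonym table; `SYNONYMS`, `quote` and `expand_query` are hypothetical names for illustration, not part of NIF, which draws its synonyms from its ontology.

```python
# Hypothetical synonym table; the hippocampus entry mirrors the slide's example.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term, synonyms=SYNONYMS):
    """Return a Boolean OR query over the term and its known synonyms."""
    variants = [term] + synonyms.get(term.lower(), [])
    return " OR ".join(quote(v) for v in variants)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is returned unchanged, so the expansion only ever widens a search.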

Page 15:

[Figure labels, connectivity relations used by different resources: Connects to; Synapsed with; Synapsed by; Input region; innervates; Axon innervates; Projects to; Cellular contact; Subcellular contact; Source site; Target site]

Each resource implements a different, though related, model; in many cases the systems are complex and difficult to learn

Page 16:
Page 17:

• You (and the machine) have to be able to find it
  – Accessible through the web
  – Structured or semi-structured
  – Annotations
• You (and the machine) have to be able to use it
  – Data type specified and in an actionable form
• You (and the machine) have to know what the data mean
  – Semantics
  – Context: experimental metadata
  – Provenance: where did they come from

Page 18:

Knowledge in space and spatial relationships (the “where”)

Knowledge in words, terminologies and logical relationships (the “what”)

Page 19:

[Figure labels: Purkinje Cell; Axon Terminal; Axon; Dendritic Tree; Dendritic Spine; Dendrite; Cell body; Cerebellar cortex]

There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent

Page 20:

• NIF covers multiple structural scales and domains of relevance to neuroscience
• Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology

NIFSTD

[Figure labels, NIFSTD modules: Organism; NS Function; Molecule; Investigation; Subcellular structure; Macromolecule; Gene; Molecule Descriptors; Techniques; Reagent; Protocols; Cell; Resource; Instrument; Dysfunction; Quality; Anatomical Structure]

Page 21:

[Figure: Brain has a Cerebellum; Cerebellum has a Purkinje Cell Layer; Purkinje Cell Layer has a Purkinje cell; Purkinje cell is a neuron]

• Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine-readable form
• Branch of philosophy: a theory of what is
• e.g., Gene ontologies

Page 22:

• Express neuroscience concepts in a way that is machine readable
  – Synonyms, lexical variants
  – Definitions
• Provide means of disambiguation of strings
  – Nucleus part of cell; nucleus part of brain; nucleus part of atom
• Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases GABA as a neurotransmitter
• Properties
  – Support reasoning
• Provide universals for navigating across different data sources
  – Semantic “index”
  – Link data through relationships, not just one-to-one mappings
• Provide the basis for concept-based queries to probe and mine data
• Establish a semantic framework for landscape analysis

Mathematics, computer code or Esperanto

Page 23:

birnlex_1732   Brodmann.1

Explicit mapping of database content helps disambiguate non-unique and custom terminology

Page 24:

Aligns sources to the NIF semantic framework

Page 25:
Page 26:

• Search Google: GABAergic neuron
• Search NIF: GABAergic neuron
  – NIF automatically searches for types of GABAergic neurons

Types of GABAergic neurons

Search by meaning not by string
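Searching by meaning rather than by string can be sketched as expanding a concept to all of its subclasses before the search is run. The `IS_A` table below is a hypothetical, hand-written fragment for illustration and is not taken from NIFSTD; the function names are likewise assumptions.

```python
# Hypothetical is-a edges (child -> parent), hand-written for illustration only.
IS_A = {
    "Purkinje cell": "GABAergic neuron",
    "Cerebellar basket cell": "GABAergic neuron",
    "GABAergic neuron": "Neuron",
    "Pyramidal cell": "Glutamatergic neuron",
    "Glutamatergic neuron": "Neuron",
}


def subclasses(concept, is_a=IS_A):
    """Return all direct and indirect subclasses of a concept, sorted by name."""
    found = set()
    frontier = [concept]
    while frontier:
        parent = frontier.pop()
        for child, child_parent in is_a.items():
            if child_parent == parent and child not in found:
                found.add(child)
                frontier.append(child)
    return sorted(found)


def expand_by_meaning(concept):
    """Search terms for a concept: the concept itself plus every subclass."""
    return [concept] + subclasses(concept)


print(expand_by_meaning("GABAergic neuron"))
```

A plain string search for "GABAergic neuron" would miss records that mention only "Purkinje cell"; the expanded term list is one simple way to recover them.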

Page 27:

Equivalence classes; restrictions

Arbitrary but defensible

• Neurons classified by
  – Circuit role: principal neuron vs interneuron
  – Molecular constituent: parvalbumin neurons, calbindin neurons
  – Brain region: cerebellar neuron
  – Morphology: spiny neuron
• Molecule roles: drug of abuse, anterograde tracer, retrograde tracer
• Brain parts: circumventricular organ
• Organisms: non-human primate, non-human vertebrate
• Qualities: expression level
• Techniques: neuroimaging

Page 28:

What genes are upregulated by drugs of abuse in the adult mouse? (show me the data!)

[Figure labels: Morphine; Increased expression; Adult Mouse]

Page 29:

• NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
  – Brain Architecture Management System (rodent)
  – Temporal lobe.com (rodent)
  – Connectome Wiki (human)
  – Brain Maps (various)
  – CoCoMac (primate cortex)
  – UCLA Multimodal database (human fMRI)
  – Avian Brain Connectivity Database (bird)
• Total: 1800 unique brain terms (excluding Avian)
  – Number of exact terms used in > 1 database: 42
  – Number of synonym matches: 99
  – Number of 1st order partonomy matches: 385
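The kind of count reported on this slide can be sketched as follows: tally the terms that occur in more than one database, first by exact (case-insensitive) match and then after mapping synonyms to a preferred label. The databases, terms and synonym table here are hypothetical placeholders, not the sources listed above, and this is not the analysis NIF actually ran.

```python
from collections import Counter

# Hypothetical term lists standing in for three connectivity databases.
DATABASES = {
    "db_a": {"Hippocampus", "Striatum"},
    "db_b": {"Ammon's horn", "Thalamus"},
    "db_c": {"Cornu Ammonis", "Thalamus"},
}

# Hypothetical synonym table mapping variants to a preferred label.
SYNONYMS = {"ammon's horn": "hippocampus", "cornu ammonis": "hippocampus"}


def shared_terms(databases, canonical=lambda term: term):
    """Return terms that appear in more than one database after normalization."""
    counts = Counter()
    for terms in databases.values():
        # Use a set so a term counts at most once per database.
        for key in {canonical(t.strip().lower()) for t in terms}:
            counts[key] += 1
    return {term for term, n in counts.items() if n > 1}


exact = shared_terms(DATABASES)
with_synonyms = shared_terms(DATABASES, lambda t: SYNONYMS.get(t, t))
print(sorted(exact), sorted(with_synonyms - exact))
```

The difference between the two sets is what a shared vocabulary adds: overlaps that exist in the data but are hidden by differing terminology.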

Page 30:

http://neurolex.org

• Semantic MediaWiki
• Provide a simple interface for defining the concepts required
• Lightweight semantics
• Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework
• Community based:
  – Anyone can contribute their terms, concepts, things
  – Anyone can edit
  – Anyone can link
• Accessible: searched by Google
• Growing into a significant knowledge base for neuroscience
• International Neuroinformatics Coordinating Facility

Demo D03

Larson et al, Frontiers in Neuroinformatics, in press

Page 31:

• Neurolex provides an on-line computable index for expressing models in semantic terms, and linking to other knowledge and data
• Implemented forms for certain types of entities
• Neuroscience knowledge in the web

Pages are linked through properties; knowledge base built through cross-modular relations and links to data; red links

Page 32:

• > 1000 DICOM terms
  – Karl Helmer
  – Data Sharing Task Force
• Tasks and Cognitive Concepts from Cognitive Atlas
  – Russ Poldrack
• > 280 Neurons
  – Gordon Shepherd and 30 worldwide experts
• ~500 fly neurons from Fly Anatomy Ontology
  – David Osumi-Sutherland
• > 1200 Brain parcellations

~20,000 concepts: spreadsheet downloads, through NIF Web Services, SPARQL endpoint

200,000 edits; 150 contributors

Page 33:

Because their pages have static URLs, wikis are searchable by Google

Page 34:

Neurolex: > 1 million triples

Dr. Yi Zeng: Chinese neural knowledge base

NIF Cell Graph

Page 35:

1. Look brain region up in NeuroLex
2. Look up cells contained in the brain region
3. Find those cells that are known to project out of that brain region
4. Look up the neurotransmitters for those cells
5. Determine whether those neurotransmitters are known to be excitatory or inhibitory
6. Report the projection as excitatory or inhibitory, and report the entire chain of logic with links back to the wiki pages where they were made
7. Make sure user can get back to each statement in the logic chain to edit it if they think it is wrong

Stephen Larson

CHEBI:18243

Are projections from the VTA excitatory or inhibitory?
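The steps above can be sketched as a chain of lookups over a small in-memory knowledge base. The `KB` dictionary is a hypothetical stand-in for NeuroLex pages, filled with a textbook cerebellar example rather than the VTA question on the slide, and `classify_projections` is an assumed name; the real implementation queried the wiki and linked each statement back to its page.

```python
# Hypothetical stand-in for NeuroLex content; a real version would query the wiki.
KB = {
    "cells_in_region": {
        "Cerebellar cortex": ["Purkinje cell", "Cerebellar granule cell"],
    },
    "projects_out": {"Purkinje cell": True, "Cerebellar granule cell": False},
    "neurotransmitter": {
        "Purkinje cell": "GABA",
        "Cerebellar granule cell": "Glutamate",
    },
    "transmitter_effect": {"GABA": "inhibitory", "Glutamate": "excitatory"},
}


def classify_projections(region, kb=KB):
    """Classify each projection out of a region and keep the chain of logic."""
    results = []
    for cell in kb["cells_in_region"].get(region, []):       # steps 1 and 2
        if not kb["projects_out"].get(cell, False):          # step 3
            continue
        transmitter = kb["neurotransmitter"].get(cell)       # step 4
        effect = kb["transmitter_effect"].get(transmitter, "unknown")  # step 5
        chain = [                                            # step 6
            f"{region} contains {cell}",
            f"{cell} projects out of {region}",
            f"{cell} releases {transmitter}",
            f"{transmitter} is {effect}",
        ]
        results.append({"cell": cell, "effect": effect, "chain": chain})
    return results


for result in classify_projections("Cerebellar cortex"):
    print(result["effect"], "->", "; ".join(result["chain"]))
```

Returning the chain alongside the verdict is what makes step 7 possible: each statement can be traced to its source and corrected there.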

Page 36:

• INCF Project
  – Neuron Registry
  – > 30 experts worldwide
  – Fill out neuron pages in Neurolex Wiki
  – Led by Dr. Gordon Shepherd

[Chart: Number, Total redlinks, easy fixes and hard fixes, by Soma location, Dendrite location and Axon location; axis from 0 to 300]

Social networks and community sites let us learn things from the collective behavior of contributors

Page 37:

neurolex.org: Semantic Wiki

• INCF Community encyclopedia
• Define all vocabulary, terms, protocols, brain structures, diseases, etc
• Living review articles
• Links to data, models and literature
• Semantic organization, search, analysis and integration
• Searchable via the web
• Global directory of all shared vocabularies, CDEs, etc

Slide courtesy of Sean Hill: International Neuroinformatics Coordinating Facility

Page 38:

Martin Telefont, HBP: Lab Space connecting to Knowledge Space

Page 39:

• NIF can be used to survey the data landscape
• Analysis of NIF shows multiple databases with similar scope and content
• Many contain partially overlapping data
• Data “flows” from one resource to the next
  – Data is reinterpreted, reanalyzed or added to
• Is duplication good or bad?

NIF is trying to make it easier to work with diverse data

Page 40:

NIF is in a unique position to answer questions about the neuroscience landscape: Kepler Workflow engine + NIF semantics

Where are the data?

[Figure: data sources plotted against brain regions; labeled regions: Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain]

Page 41:

∞

What is easily machine processable and accessible

What is potentially knowable

What is known: literature, images, human knowledge

Unstructured; natural language processing, entity recognition, image processing and analysis; paywalls communication

Abstracts vs full text vs tables etc

Page 42:

Closed world vs open world

We know a lot about some things and less about others; some of NIF’s sources are comprehensive; others are highly biased

But...NIF has > 2M antibodies, 338,000 model organisms, and 3 million microarray records

Page 43:

[Figure labels: Neocortex; Olfactory bulb; Neostriatum; Cochlear nucleus]

All neurons with cell bodies in the same brain region are grouped together

Properties in Neurolex

Page 44:

Exposing knowledge gaps and biases

Where are the data?

[Figure: data sources and funding plotted against brain regions; labeled regions: Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain]

Page 45:

• Gemma: Gene ID + Gene Symbol
• DRG: Gene name + Probe ID
• Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases
• Analysis:
  – 1370 statements from Gemma regarding gene expression as a function of chronic morphine
  – 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
  – Results for 1 gene were opposite in DRG and Gemma
  – 45 did not have enough information provided in the paper to make a judgment

Relatively simple standards would make life easier
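The baseline mismatch on this slide can be made concrete with a small sketch: before two sources are compared, every direction of change is re-expressed against the same baseline. The gene names and records are placeholders, the function names are assumptions, and the sketch presumes the two sources have already been mapped to a shared gene key, which the slide notes was itself a problem.

```python
FLIP = {"up": "down", "down": "up"}


def to_saline_baseline(direction, baseline):
    """Re-express a direction of change relative to saline."""
    if baseline == "saline":
        return direction
    if baseline == "chronic morphine":
        # A change reported against morphine is the reverse of the same
        # change reported against saline.
        return FLIP[direction]
    raise ValueError(f"unknown baseline: {baseline}")


def compare(gemma, drg):
    """Label each Gemma statement as consistent, opposite or not found in DRG."""
    verdicts = {}
    for gene, direction in gemma.items():
        normalized = to_saline_baseline(direction, "chronic morphine")
        if gene not in drg:
            verdicts[gene] = "not found"
        elif drg[gene] == normalized:
            verdicts[gene] = "consistent"
        else:
            verdicts[gene] = "opposite"
    return verdicts


# Placeholder records: Gemma relative to chronic morphine, DRG relative to saline.
gemma = {"GENE_A": "down", "GENE_B": "up", "GENE_C": "up"}
drg = {"GENE_A": "up", "GENE_B": "up"}
print(compare(gemma, drg))
```

Without the normalization step, GENE_A would be flagged as a contradiction and GENE_B as an agreement, which is exactly backwards.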

Page 46:

NIF favors a hybrid, tiered, federated system

• Domain knowledge
  – Ontologies
• Claims, models and observations
  – Virtuoso RDF triples
  – Model repositories
• Data
  – Data federation
  – Spatial data
  – Workflows
• Narrative
  – Full text access

[Figure labels: Neuron; Brain part; Disease; Organism; Gene; Technique; People]

Example claims: Caudate projects to Snpc; Grm1 is upregulated in chronic cocaine; Betz cells degenerate in ALS

NIF provides the tentacles that connect the pieces: a new type of entity for 21st century science
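The example claims on this slide can be read as subject, predicate, object triples. The sketch below stores them in a plain Python list and matches simple patterns; it is only an illustration of the triple model, not the Virtuoso store or its SPARQL interface, and `match` is a hypothetical helper.

```python
# The three example claims from the slide, written as (subject, predicate, object).
TRIPLES = [
    ("Caudate", "projects to", "Snpc"),
    ("Grm1", "is upregulated in", "chronic cocaine"),
    ("Betz cells", "degenerate in", "ALS"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


print(match(TRIPLES, predicate="projects to"))
print(match(TRIPLES, obj="ALS"))
```

Because every claim has the same three-part shape, claims about neurons, genes and diseases can sit in one store and be queried the same way.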

Page 47:

[Figure labels: Scholar; Library; Scholar; Publisher]

FORCE11.org: Future of research communications and e-scholarship

Page 48:

[Figure labels: Scholar; Consumer; Libraries; Data Repositories; Code Repositories; Community databases/platforms; OA; Curators; Social Networks; Peer Reviewers; Narrative; Workflows; Data; Models; Multimedia; Nanopublications; Code]

Page 49:

• Of the ~4000 columns that NIF queries, ~1300 map to one of our core categories:
  – Organism
  – Anatomical structure
  – Cell
  – Molecule
  – Function
  – Dysfunction
  – Technique
• 30-50% of NIF’s queries autocomplete
• When NIF combines multiple sources, a set of common fields emerges
  → Basic information models/semantic models exist for certain types of entities

Semantic frameworks create spaces in which to compare the current state of data and knowledge

Page 50:

• Several powerful trends should change the way we think about our data: One → Many
  – Many data
    • Generation of data is getting easier → shared data
    • Data space is getting richer: more –omes every day
    • But...compared to the biological space, still sparse
  – Many resources: everyone wants to be “the” one, but e pluribus unum
  – Many eyes
    • Wisdom of crowds
    • More than one way to interpret data
  – Many algorithms
    • Not a single way to analyze data
  – Many analytics
    • “Signatures” in data may not be directly related to the question for which they were acquired but tell us something really interesting

New works need to be created with an eye towards the web and interoperability

Page 51:

Jeff Grethe, UCSD, Co-Investigator, Interim PI
Amarnath Gupta, UCSD, Co-Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Svetlana Sulima
Davis Banks
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer (retired)
Jonathan Pollock, NIH, Program Officer

And my colleagues in Monarch, dkNet, 3DVC, Force 11

Page 52:

[Figure labels: Data Space; Laboratory Space; Knowledge Space; BAMS; Lexicon; Encyclopedia]

Page 53:

47/50 major preclinical published cancer studies could not be replicated

• “The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
• Getting data out sooner in a form where they can be exposed to many eyes and many analyses may allow us to expose errors and develop better metrics to evaluate the validity of data

Begley and Ellis, Nature, vol. 483, p. 531, 29 March 2012

Page 54:

• Every resource is resource limited: few have enough time, money, staff or expertise required to do everything they would like
  – If the market can support 11 MRI databases, fine
  – Some consolidation, coordination is usually warranted
• Big, broad and messy beats small, narrow and neat
  – Without trying to integrate a lot of data, we will not know what needs to be done
  – Progressive refinement; addition of complexity through layers
• Be flexible and opportunistic
  – A single optimal technology/container for all types of scientific data and information does not exist; technology is changing
• Think globally; act locally:
  – No source, not even NIF, is THE source; we are all a source
  – Think about interoperation from the inception

Page 55:
Page 56:

[Figure labels: Regional part of nervous system; Parcellation scheme parcel; Single species or strain; Parcellation scheme; Precise definition; Technique; Functional part of nervous system; Partially overlaps; Taxon rank; General hierarchy]

INCF Task Force: Alan Rutenberg, Seth Ruffins

Page 57:

• 1200 parts of nervous system characterized (mostly) according to CUMBO terms
• 1200 “parcels” from individual atlases/papers
• 700 neurons; 280 via Neuron Registry
• Available via NIF vocabulary services (REST)
• Hosted in a Virtuoso triple store via SPARQL

