
Data repositories -- Xiamen University 2012 06-08

Date post: 01-Dec-2014
Upload: jian-qin
Description:
Data Repositories and Services. Xiamen University Library, June 8, 2012. Jian Qin, School of Information Studies, Syracuse University. http://eslib.ischool.syr.edu/jqin/
Transcript
Page 1: Data repositories -- Xiamen University 2012 06-08

Data Repositories and Services

Xiamen University Library, June 8, 2012

Jian Qin

School of Information Studies, Syracuse University

http://eslib.ischool.syr.edu/jqin/

Page 2: Data repositories -- Xiamen University 2012 06-08

Agenda
• What is a repository? Repository software?
• What does it do?
• How does it work?
• Case studies:
  – Dryad: an international repository of data and publications for basic and applied biosciences
  – Dataverse: a data repository system

2 Data repositories and services 6/8/12

Page 3: Data repositories -- Xiamen University 2012 06-08

What is a data repository?

Data Repository is a logical (and sometimes physical) partitioning of data where multiple databases which apply to specific applications or sets of applications reside.
http://www.learn.geekinterview.com/data-warehouse/dw-basics/what-is-data-repository.html

Repository commonly refers to a location for storage, often for safety or preservation.
http://en.wikipedia.org/wiki/Repository

Page 4: Data repositories -- Xiamen University 2012 06-08

WHAT CAN WE EXPECT IN A DATA REPOSITORY?

Page 5: Data repositories -- Xiamen University 2012 06-08

Technical features
• Standards
  – OAI-PMH
  – Z39.50 protocol
  – Open source license
• Hardware
  – Minimum hardware requirements
  – SAN support
• Software
  – OS
  – Programming language
  – Database
  – Web server
  – Java servlet engine
  – Search engine
  – Other
• Staff requirements
  – UNIX systems administrator
  – Java programmer
  – Perl programmer
  – Python programmer

Open Society Institute. (2004). A guide to institutional repository software. 3rd ed. http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf

Page 6: Data repositories -- Xiamen University 2012 06-08

Features and functions
• Repository & system administration
  – User registration, authentication & password administration
  – Module-level APIs
• Content submission administration
  – Define multiple collections within the same instance of the system
  – Submission stages
  – Submission support
  – System-generated usage stats and reports

Open Society Institute. (2004). A guide to institutional repository software. 3rd ed. http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf

Page 7: Data repositories -- Xiamen University 2012 06-08

Functions of repositories
• Content management
  – Content import/export
  – Document/object formats
  – Metadata
  – Real-time updating and indexing of accepted content
• Dissemination
  – User interface
  – Search capability
    • Full text
    • All descriptive metadata
    • Selected metadata fields
    • Browse
    • Sort search results
  – Indexed by Google/other search engines
• Archiving
  – Persistent document identification
  – Data preservation report
  – Object history/version control
• System maintenance
  – System support
    • Documentation/manual
    • Listserv
    • Bug track/feature request system
    • Formal support/help desk

Open Society Institute. (2004). A guide to institutional repository software. 3rd ed. http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf
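The search capabilities listed under Dissemination (searching selected metadata fields, sorting results) can be illustrated with a small in-memory sketch. This is not how any particular repository package implements search; the `Record` fields and the `search` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical repository record with a few descriptive metadata fields."""
    identifier: str
    title: str
    creator: str
    year: int


def search(records, field, term, sort_by="year"):
    """Return records whose `field` contains `term` (case-insensitive), sorted by `sort_by`."""
    term = term.lower()
    hits = [r for r in records if term in str(getattr(r, field)).lower()]
    return sorted(hits, key=lambda r: getattr(r, sort_by))


# Made-up sample records, for illustration only.
records = [
    Record("rec-3", "Survey data on library use", "Lee", 2011),
    Record("rec-1", "Climate data for coastal sites", "Chen", 2009),
    Record("rec-2", "Library catalog usage report", "Smith", 2010),
]

# Search one selected metadata field (title) and sort the hits by year.
hits = search(records, field="title", term="library")
print([r.identifier for r in hits])
```

A production system would normally delegate this to a search engine index (the "Search engine" item under Technical features) rather than scanning records in memory.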

Page 8: Data repositories -- Xiamen University 2012 06-08

The context of repositories

[Diagram labels: Research community; Data repository (datasets); Institutional repository (publications, presentations, reports, etc.); Disciplines; Standards; Technology]

Page 9: Data repositories -- Xiamen University 2012 06-08

Institutional repositories
• An institutional repository (IR) consists of formally organized and managed collections of digital content generated by faculty, staff, and students at an institution
• Types of IRs:
  – Collection-based digital repositories managed by library professionals
  – Course management systems and associated file stores
  – Collections of research data and reports managed by research units (centers, laboratories, etc.)
  – Student academic portfolio systems
  – Institutional file storage systems
  – Digital asset management workflow systems
  – Web content management systems used by institutions or departments to store and stage web content

[Diagram: Institutional repository (publications, presentations, reports, etc.)]

EDUCAUSE Evolving Technologies Committee. (2003). Institutional repositories: Enhancing teaching, learning, and research. http://net.educause.edu/ir/library/pdf/DEC0303.pdf

Page 10: Data repositories -- Xiamen University 2012 06-08

Data repositories
• No one agreed-upon definition
• Characteristics:
  – A repository operated by an academic institution/unit or a research organization
  – A system for storing, managing, preserving, and providing access to data
  – Centered on a discipline or a research field involving multiple disciplines
  – Policies governing the intellectual property rights, management, access, sharing, and citation

[Diagram: Data repository (datasets)]

Page 11: Data repositories -- Xiamen University 2012 06-08

Dryad: a repository for data and publications

http://datadryad.org/

• As a data repository, Dryad provides a platform to associate data with underlying publications.
• Content acquisition: user submission
• How to motivate users to submit data?
  – Make it simple and rewarding
  – Provide detailed support information about:
    • Depositing data
    • Managing data
    • Intellectual property rights (CC0)
    • Download data packages
    • View usage statistics

Page 12: Data repositories -- Xiamen University 2012 06-08

Dryad metadata record example

http://datadryad.org/handle/10255/dryad.8085

Page 13: Data repositories -- Xiamen University 2012 06-08

Dryad metadata record example (cont’d)

Individual files in the data package. The metadata shows:
• # of downloads
• File technical data
• Copyright type
• Documentation for the data file
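The file-level metadata listed above can be pictured as a simple data structure: a data package that contains files, each carrying its own download count, technical data, copyright type, and documentation. The sketch below is illustrative only; the class and field names are assumptions and do not reflect Dryad's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    """One file in a data package; all field names here are hypothetical."""
    name: str
    size_bytes: int      # file technical data
    license: str         # copyright type, e.g. "CC0"
    readme: str = ""     # documentation for the data file
    downloads: int = 0   # number of downloads


@dataclass
class DataPackage:
    """A data package groups the files that belong to one publication."""
    title: str
    files: List[DataFile] = field(default_factory=list)

    def total_downloads(self) -> int:
        """Sum the download counts of all files in the package."""
        return sum(f.downloads for f in self.files)


# Made-up example values, for illustration only.
package = DataPackage(
    title="Example data package",
    files=[
        DataFile("measurements.csv", 20480, "CC0", "Column descriptions", downloads=7),
        DataFile("alignment.nex", 51200, "CC0", downloads=5),
    ],
)
print(package.total_downloads())
```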

Page 14: Data repositories -- Xiamen University 2012 06-08

Dryad Backend
• Uses core features of DSpace with modifications or complete replacement
• Uses OAI-PMH to allow metadata harvesting
  – Metadata formats available for harvesting include
    • METS/MODS, OAI-DC (Dublin Core), OAI-ORE/Atom, and RDF/DC
• Uses DOI to identify Dryad data packages and files

http://wiki.datadryad.org/Category:Technical_Documentation
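To make metadata harvesting concrete, here is a minimal sketch of the harvester side of OAI-PMH: building a `ListRecords` request for the `oai_dc` (Dublin Core) format and reading titles and identifiers out of a response. The verb, the `metadataPrefix` and `from` parameters, and the XML namespaces come from the OAI-PMH 2.0 specification. The base URL and the sample response are made up for the example and are not Dryad's real endpoint or data; no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL; `from_date` enables selective harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return every Dublin Core title found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


def extract_identifiers(xml_text):
    """Return the OAI identifiers from the record headers."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}identifier" % OAI_NS)]


# Hypothetical, hand-written response standing in for a real repository reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data package</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# "http://repository.example.org/oai" is a placeholder base URL.
print(list_records_url("http://repository.example.org/oai", from_date="2012-01-01"))
print(extract_identifiers(SAMPLE_RESPONSE), extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow the `resumptionToken` that repositories return when a result list is split across several responses.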

Page 15: Data repositories -- Xiamen University 2012 06-08

DOI Examples
• Data packages
  – doi:10.5061/dryad.1664
  – doi:10.5061/dryad.642
  – doi:10.5061/dryad.1307
• Data files
  – doi:10.5061/dryad.1664/1
  – doi:10.5061/dryad.642/1
  – doi:10.5061/dryad.1307/1
  – doi:10.5061/dryad.1307/2
  – doi:10.5061/dryad.1307/3
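The examples suggest a naming pattern: a data file DOI is the DOI of its data package followed by `/` and a file number. The sketch below splits a DOI along that pattern and builds a link through the public doi.org resolver. The pattern is inferred only from the examples on this slide, so treat the regular expression as an assumption; `parse_dryad_doi` and `resolver_url` are hypothetical helper names.

```python
import re

# Pattern inferred from the slide's examples: "doi:10.5061/dryad.<id>" for a
# package, with an optional "/<n>" suffix for a file inside that package.
DRYAD_DOI = re.compile(r"^doi:(10\.5061/dryad\.\w+)(?:/(\d+))?$")


def parse_dryad_doi(doi):
    """Split a DOI string into its package DOI and optional file number."""
    match = DRYAD_DOI.match(doi.strip())
    if match is None:
        raise ValueError("not a recognized Dryad-style DOI: %r" % doi)
    package, file_no = match.group(1), match.group(2)
    return {"package": package, "file": int(file_no) if file_no else None}


def resolver_url(doi):
    """Turn "doi:<name>" into a resolvable https://doi.org/ link."""
    name = doi.strip()
    if name.startswith("doi:"):
        name = name[len("doi:"):]
    return "https://doi.org/" + name


print(parse_dryad_doi("doi:10.5061/dryad.1307/2"))
print(parse_dryad_doi("doi:10.5061/dryad.642"))
print(resolver_url("doi:10.5061/dryad.1664/1"))
```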

Page 16: Data repositories -- Xiamen University 2012 06-08

DATA REPOSITORY SOFTWARE

Page 17: Data repositories -- Xiamen University 2012 06-08


Page 18: Data repositories -- Xiamen University 2012 06-08

Dataverse metadata editing interface

Page 19: Data repositories -- Xiamen University 2012 06-08

Dataverse metadata editing interface (cont’d)

Page 20: Data repositories -- Xiamen University 2012 06-08


Page 21: Data repositories -- Xiamen University 2012 06-08

Standards and tools for repositories
• Open Archives Initiative (OAI) and its Protocol for Metadata Harvesting (OAI-PMH)
• Tools (open source):
  – DSpace (http://www.dspace.org)
  – Fedora (http://www.fedora-commons.org/)
  – Dataverse (http://thedata.org/)
  – EPrints (http://www.eprints.org/)
  – More: http://oad.simmons.edu/oadwiki/Free_and_open-source_repository_software

