
Imagining the UK National Data Infrastructure - Recommendations

Date post: 22-Dec-2015
Upload: martin-hamilton
Description:
The data ecosystem in the UK is expanding rapidly to cope with the demands of the UK’s data-intensive research. We recognise the key challenges ahead if we are to develop our world-leading data infrastructure in a sustainable and innovative way. In response to these challenges, the National e-Infrastructure Project Directors Group (NeI-PDG) brought together in December 2014 a large number of representatives from RCUK-funded ‘Big Data’ projects to imagine how the national data infrastructure could develop. The working group has made a number of recommendations under the key themes of Integration, Capability, Connections, and Infrastructure (as identified in the EPSRC e-Infrastructure roadmap), and we outline some key deliverables for 2015.
Imagining the UK National Data Infrastructure
Connecting up Big Data in the UK

Project Directors Group (PDG)

Report of the UK National e-Infrastructure Project Directors Group workshop held at the Farr Institute, London, 15th December 2014

Authors:

David Fergusson, Francis Crick Institute
David Colling, Imperial College / GridPP / WLCG
David de Roure, University of Oxford / ESRC
Martin Hamilton, Jisc (editor)
Brian Matthews, STFC
Jacky Pallas, University College London / eMedLab
David Salmon, Jisc
Jeremy Yates, University College London / STFC DiRAC

Contents

1. Purpose and scope
2. Integration
3. Capability
4. Connections
5. Infrastructure
6. Deliverables
7. An Imagined Data Infrastructure – Another Traditional View


1. Purpose and scope

The data ecosystem in the UK is expanding rapidly to cope with the demands of the UK’s data-intensive research. We recognise the key challenges ahead if we are to develop our world-leading data infrastructure in a sustainable and innovative way. In response to these challenges, the National e-Infrastructure Project Directors Group (NeI-PDG) brought together in December 2014 a large number of representatives from RCUK-funded ‘Big Data’ projects to imagine how the national data infrastructure could develop.

 

Figure 1 – The UK National Data Landscape for Research

The working group has made a number of recommendations under the key themes of Integration, Capability, Connections, and Infrastructure (as identified in the EPSRC e-Infrastructure roadmap¹), and we outline some key deliverables for 2015.

¹ http://www.epsrc.ac.uk/newsevents/pubs/e-infrastructure-roadmap/


2. Integration

“Our aspiration is for the UK to have an integrated e-infrastructure: one that is run and managed as a whole without silos or boundaries, where there are simple processes by which users can get access to the e-infrastructure they need across the eco-system, as appropriate for the type or stage of research they are doing.”

We do not envisage the UK data infrastructure as a single system but rather as an integrated solution which reflects the range of excellent science supported via both large-scale projects and research institutions. We propose to build on existing resources and work towards better integration through best practice for sharing data, coupled with extensive training support.

The UK engages in a broad range of international projects such as EUDAT, ELIXIR, and SKA. There is a need for a “single voice” for the UK in the international arena which can represent the academic community in large collaborations.

Recommendation: Build on international activity (standards, policies etc.) in a more strategic and co-ordinated way. There is a role for an RCUK coordinator to ensure that the UK gets value for money from its involvement in, and subscriptions to, large-scale international collaborations.

There is an expectation that significant capital investment in the research e-Infrastructure should deliver benefits for UK industry, especially allowing SMEs to benefit through access to big data and compute resources. Some of these benefits can be realised through direct collaboration between industry and academic institution(s). However, we believe that there are additional opportunities in leveraging funding with Innovate UK and established (or future) Catapult Centres.

Recommendation: Identify funding opportunities within existing streams to allow academic institutions to interact more effectively with the existing and projected future Catapult Centres, as a mechanism to engage industry more effectively around key areas such as digital health and future cities/urban transformation.

We can only work effectively and share data with researchers, whether UK, international or industry, if datasets are managed and discoverable.


Standards

● Datasets
● Metadata, e.g. schema.org, CKAN, DataCite and others. We need methods to capture metadata automatically
● Internationally agreed, community driven
● Domain/project specific, regulatory (e.g. health)

De facto standards have often been driven by common hardware in instruments across domains, e.g. EXIF in digital imaging. We then need to layer “Discovery Metadata” on top of those domain-specific metadata standards. In some domains these are well established, such as Biosharing.org, but this is not yet widely the case.

Metadata is a key enabler of data management and discovery, and at “big data” scales its collection, and sometimes its use, must be automated. However, there is a need to document the current metadata landscape and best practice, and to identify areas for further development, improvement and standardization. This will become a living document, maintained in collaboration with those organisations involved in the Open Research Data and Data Transparency areas, e.g. the Digital Curation Centre and HE institutions.

Recommendation: Metadata is a key enabler of data management and discovery, which at “big data” scales must be automated. However, there is a need to document the current metadata landscape and best practice, and to identify areas for further development, improvement and standardization.
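As an illustration of what automated metadata capture could look like in practice, the following is a minimal sketch rather than a method proposed by the working group. It derives a small discovery record (name, size, checksum, modification time) directly from a data file and writes it as a sidecar file. The function names, the field names and the `.meta.json` suffix are all assumptions made for this example; they are not taken from schema.org, CKAN or DataCite.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def capture_metadata(path, owner, project):
    """Build a small discovery-metadata record for one data file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha256.update(chunk)
    stat = os.stat(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "sha256": sha256.hexdigest(),
        "date_modified": modified.isoformat(),
        "owner": owner,      # supplied by the caller, not discovered
        "project": project,  # supplied by the caller, not discovered
    }


def write_sidecar(path, record):
    """Store the record next to the data file as <file>.meta.json (hypothetical convention)."""
    sidecar = path + ".meta.json"
    with open(sidecar, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
    return sidecar


# Demo on a throwaway file; "alice" and "demo-project" are placeholder values.
with tempfile.TemporaryDirectory() as workdir:
    data_path = os.path.join(workdir, "readings.csv")
    with open(data_path, "w", encoding="utf-8", newline="") as handle:
        handle.write("t,value\n0,1.5\n")
    record = capture_metadata(data_path, owner="alice", project="demo-project")
    sidecar_path = write_sidecar(data_path, record)
    sidecar_exists = os.path.exists(sidecar_path)

print(record["name"], record["size_bytes"], sidecar_exists)
```

Only the fields that can be read from the file itself are captured automatically. Owner and project still have to come from the application or from a policy, which is one reason the report asks for the metadata landscape to be documented first.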

In order to promote sharing at scale, researchers must see some benefit beyond compliance with RCUK and other funder policies. Sharing of datasets should bring academic credit through data citations (for example via the DataCite consortium), with DOIs or other persistent identifiers being associated with published datasets. Publication of datasets should be captured as an impact outcome of funded research through metrics portals such as Researchfish. Jisc is also reviewing proposals for innovations in data management in the Research Data Spring initiative².

Recommendation: Recognise the impact of research datasets on the community through the use of DOIs or other common identifiers and, equally, give credit to researchers for generating datasets. Metrics should be captured via existing mechanisms such as Jisc, Gateway to Research and Researchfish.

² http://www.Jisc.ac.uk/rd/projects/research-data-spring


3. Capability

There is broad recognition of research data management as an essential activity across the project lifecycle rather than just a paper exercise at the time of grant submission, as illustrated in the DCC Data Lifecycle model below. RCUK has driven the requirement for institutions to show leadership in research data management, with a joint position on Data Management³ and the EPSRC in particular asking HEIs to meet specific standards by May 2015⁴.

Figure 2 – The Digital Curation Centre Lifecycle Model

Training in research data management needs to speak to projects/centres, institutions and individual researchers at all levels. There is a huge opportunity to reach Early Career Researchers in particular through existing Centres for Doctoral Training via a “train the trainers” type approach.

³ http://www.rcuk.ac.uk/research/datapolicy
⁴ http://www.epsrc.ac.uk/files/aboutus/standards/clarificationsofexpectationsresearchdatamanagement/


 

Recommendation: Training in data management - Build upon existing PDG, SSI and DCC activities to create a concerted and coordinated approach to promoting best practice in data management. Capitalize on existing activities to orchestrate this, e.g. “train the trainers”, whereby the actual training is delivered by projects and institutions.

Capacity building and skills training

The need for technology transfer between subject domains, in terms of staff experience rather than commercialization, was recognised. While RCUK has a number of schemes for academic placements, such as Bridging the Gaps, there is no equivalent for technical staff. One possible activity was proposed: a cross-RCUK big data scheme specifically for technical staff. Proposals to such a scheme would preferably be driven by an actual problem, ideally across disciplines or e-Infrastructure projects, and would provide potential for host institution staff to gain management or supervisory experience.

Recommendation: Sharing excellence across domains - e.g. a cross-RCUK initiative buying out staff time (not just academics) for a defined period to work on specific activities, with each proposal coming from two subject domains as a minimum.

 


4. Connections

User management

User management systems are essential to enable researcher access to regional and national systems. This is especially important for the health informatics and administrative data networks, which require additional security and two-factor authentication systems. There are existing activities around Shibboleth, SAFE, VOMS, Moonshot and Safe Share, but existing well-established services and facilities have their own approaches that need to be taken into account. Pilots will lead to recommendations for common standards. There is a particular role for Jisc and RCUK here in terms of international standards liaison, e.g. W3C, schema.org, Research Data Alliance. This will require wider buy-in from the community as well as pump-priming funding.

Data Transfer and access

Many closely coupled systems have compute and storage co-located, and there are some examples of tiered approaches when huge volumes of data are involved, e.g. WLCG. The group felt that these issues were typically addressed as part of projects. Exemplars for researcher access to datasets (and compute) respecting trust boundaries include EBI, UKDA, NERC data centres and GridPP data movement orchestration. A comparison was made between LHC data (instrument in the stream) and the Twitter “firehose” for social sciences studies.

There is a requirement for remote data access for researchers with the necessary control, orchestration and caching tiers. Examples range from a client running on an end user workstation (GridPP) to access mediated through a website (EBI). We propose a new project to develop cross-discipline solutions to managing data transfer through joint working with biomedical and physical science domains.

Recommendation: Orchestrating data transfer is a particular example. The problem is widely recognised, and there are already understood approaches in some subject domains. We propose a joint Crick, EBI and GridPP project on orchestrating data transfer.
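To make “orchestrating data transfer” concrete, here is a minimal sketch of the control loop such a service needs: queue the requested transfers, carry each one out, verify the copy against a checksum, and retry or report failures. It is an assumption-laden toy, not the design of FTS or of any GridPP tool: a local file copy stands in for a network transfer, and `TransferQueue`, `submit`, `run` and `max_attempts` are hypothetical names.

```python
import hashlib
import os
import shutil
import tempfile
from collections import deque


def sha256_of(path):
    """Checksum a file in chunks so large files are handled safely."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


class TransferQueue:
    """Hypothetical orchestrator: queue copies, verify each one, retry failures."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.pending = deque()
        self.completed = []
        self.failed = []

    def submit(self, source, destination):
        self.pending.append((source, destination, 1))

    def run(self):
        while self.pending:
            source, destination, attempt = self.pending.popleft()
            try:
                # Local copy stands in for a real network transfer.
                shutil.copyfile(source, destination)
                ok = sha256_of(source) == sha256_of(destination)
            except OSError:
                ok = False
            if ok:
                self.completed.append(destination)
            elif attempt < self.max_attempts:
                # Put the transfer back at the end of the queue for another try.
                self.pending.append((source, destination, attempt + 1))
            else:
                self.failed.append(destination)


# Demo: one transfer that succeeds and one whose source does not exist.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "dataset.bin")
    with open(source, "wb") as handle:
        handle.write(b"example payload")
    queue = TransferQueue(max_attempts=2)
    queue.submit(source, os.path.join(workdir, "copy.bin"))
    queue.submit(os.path.join(workdir, "missing.bin"), os.path.join(workdir, "never.bin"))
    queue.run()
    completed_names = [os.path.basename(p) for p in queue.completed]
    failed_names = [os.path.basename(p) for p in queue.failed]

print(completed_names, failed_names)
```

A production orchestrator would also need authentication, scheduling and bandwidth control, which is where the link to AAAI and the network discussion later in the report comes in.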

 


5. Infrastructure

Networks

The group felt that, with the recent investment in Janet6, the network had sufficient capacity and “room for expansion”. However, access to high capacity for short periods would increasingly be required. A number of points were raised about campus networks which would be challenging, difficult or expensive to address.

● “Last mile” - e.g. campus network to end user.
● Is the campus LAN fit for purpose for NeI users?
● Do campus firewalls have sufficient throughput?
● Is the campus Janet connection oversubscribed, or is a separate research connection required?
● What would a campus focal point look like? e.g. GridPP use of Squid cache
● Estates constraints on many institutions - listed buildings, busy city streets etc.
● Investment in Janet6, improved connectivity to major research institutions and improved resilience for day-to-day use.

Q: Do we need a new equivalent to the HEFCE LAN/MAN initiative?

Q: What would a “NeI Network Appliance” look like?

Would it be

● a Virtual Machine (VM) image or
● a Transmission Control Protocol (TCP) stack with tuned settings, e.g. Maximum Transmission Unit (MTU)?

It would need to use AAAI and it should schedule file transfers.

Recommendation: The group felt that more flexible access to high capacity networking for defined periods would increasingly be required. For example, the eMedLab project will be moving 2.5PB of data from EBI at the start of the project (April 2015).
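A rough calculation shows why a transfer of this size calls for planned, high capacity access. The sketch below estimates how long 2.5PB takes at a few link speeds. The link speeds, the use of a decimal petabyte (10^15 bytes) and the `efficiency` parameter are assumptions chosen for illustration; the report does not state which capacity eMedLab would use.

```python
def transfer_days(volume_bytes, link_bits_per_second, efficiency=1.0):
    """Days needed to move volume_bytes over a link at a given efficiency."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = volume_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86_400  # seconds per day


PETABYTE = 10**15  # decimal petabyte; an assumption, the report does not say
volume = 2.5 * PETABYTE

# Illustrative link speeds only, not figures from the report.
for gbit in (1, 10, 100):
    days = transfer_days(volume, gbit * 10**9)
    print(f"{gbit:>3} Gbit/s: {days:6.1f} days at full line rate")
```

At a sustained 10 Gbit/s the transfer takes roughly 23 days, and any protocol overhead or contention (an `efficiency` below 1) lengthens that further, so access to high capacity for a defined period matters more than average capacity.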


Archive

There was much discussion around archives, defined as long-term storage of immutable datasets. Some projects have their own archives and some disciplines have international repositories (e.g. EBI). However, the RCUK data sharing policy has specific requirements to make research data objects available for up to 10 years after the last requested access. The group felt that it was difficult to focus on approaches offered by individual institutions and proposed a survey of the data management landscape. Any institutional archive should provide DOIs or persistent identifiers for datasets to allow discovery, and a means of crediting researchers for creating and depositing datasets (as outlined earlier).


6. Deliverables

Pre-Requisites

• The Data Analytics and Open Research Data activities in the data e-Infrastructure should be supported by a simple layered middleware and software e-Infrastructure.
• This e-infrastructure should consist of a Common Basic Layer (CBL) on which a Research Domain Specific layer would sit.
• The Common Basic Layer (CBL) should therefore be small and capable of generic use.
• The Research Domain Specific Layer (RDSL) needs to be constructed at the same time.
• Key elements of the CBL are
    o The AAAI and Security Models – I am who I am and I can use resources.
    o Control access to data – The RCUK AAAI project SAFE SHARE is delivering aspects of this.
    o Data In-flight Security – my data is going to flow ok and only the right people will get it and see it
    o Data at-rest Security – it’s looked after and I am obeying the pertinent regulations. The data are open to those who are allowed to see it; it is searchable and query-able.
    o Cloud/Grid middleware to enable appropriate resources to be used. From the user perspective this can be broken down into the following attributes:
        1. Can I see resources?
        2. You can use resources,
        3. and actually using resources,
        4. here is what you have used and
        5. here are your results in the place you asked them to be put.
    o Wrapping compute around big data – use of virtualisation and containers to send our workflows to where the data are residing. The local compute simply executes the workflows we have constructed/run on other machines.
    o An Application Program Interface (API) that allows Data Policies (e.g. metadata requirements) to be actualised in applications.
    o Simple Tools and Services to enable data discovery and exploration. Data can be accessed and queried using published metadata and data transport tools.
• An RDSL would have elements such as
    o Applications or web portals that allow its researchers to use CBL services. These are the user-friendly User Interface (UI) and would be the gateway to the NeI for the average researcher.
    o If needed, extra security and AAAI requirements could be included here.
    o Access to training resources could be included, such as online courses and tests.
    o The interfaces and APIs to the Data Analytics and Open Research Data infrastructures would reside in the RDSL.


• Hardware will be domain and activity specific. However, object stores that can act as repositories could be centralised and be a common activity between the RCs.
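The layering described in the pre-requisites above can be sketched in a few lines. This is only one possible reading of the CBL/RDSL split, written under stated assumptions: the class names, method names and required metadata fields are hypothetical and do not describe Safe Share or any existing NeI software. The small common layer holds access control, a metadata policy check and discovery, while the domain layer is a thin front end that adds domain detail and delegates everything else.

```python
from dataclasses import dataclass, field


@dataclass
class CommonBasicLayer:
    """Hypothetical CBL: access control, metadata policy and discovery."""
    required_fields: tuple = ("owner", "project", "title")
    permissions: dict = field(default_factory=dict)  # user -> set of dataset ids
    catalogue: dict = field(default_factory=dict)    # dataset id -> metadata

    def grant(self, user, dataset_id):
        self.permissions.setdefault(user, set()).add(dataset_id)

    def can_access(self, user, dataset_id):
        return dataset_id in self.permissions.get(user, set())

    def publish(self, dataset_id, metadata):
        # Data policy check: refuse records that lack required metadata.
        missing = [name for name in self.required_fields if not metadata.get(name)]
        if missing:
            raise ValueError(f"missing metadata: {', '.join(missing)}")
        self.catalogue[dataset_id] = dict(metadata)

    def discover(self, user, keyword):
        # Only datasets the user is allowed to see are returned.
        return sorted(
            dataset_id
            for dataset_id, metadata in self.catalogue.items()
            if self.can_access(user, dataset_id)
            and keyword.lower() in metadata["title"].lower()
        )


class DomainPortal:
    """Hypothetical RDSL front end that adds domain detail on top of the CBL."""

    def __init__(self, cbl, domain):
        self.cbl = cbl
        self.domain = domain

    def deposit(self, user, dataset_id, title, project):
        metadata = {"owner": user, "project": project, "title": title, "domain": self.domain}
        self.cbl.publish(dataset_id, metadata)
        self.cbl.grant(user, dataset_id)

    def search(self, user, keyword):
        return self.cbl.discover(user, keyword)


# Demo with placeholder users and identifiers.
cbl = CommonBasicLayer()
portal = DomainPortal(cbl, domain="genomics")
portal.deposit("alice", "ds-001", title="Tumour sequencing run", project="demo")
alice_hits = portal.search("alice", "sequencing")
bob_hits = portal.search("bob", "sequencing")
print(alice_hits, bob_hits)
```

Keeping the policy check inside the common layer means every domain portal enforces the same metadata requirements without re-implementing them, which is the point of keeping the CBL small and generic.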

 

Our progress in creating these Pre-Requisites, in terms of current activities, is listed below.

Table 1: Pre-Requisites for the Data Infrastructure

| Infrastructure | Projects | Who is Responsible? |
|---|---|---|
| Authentication, Allocation and Authorisation Infrastructure with 2 factor Security Controls | Jisc-led Safe Share Project already underway. Research Domain aspects of AAAI need to be constructed. | Jisc and partners from ESRC and MRC / Research Domains |
| Data-in-flight Information Assurance | Jisc | Jisc, Research Domains |
| Data-at-rest Information Assurance | No overall description, or indeed none | NeI as a whole |
| Data abstraction layer development | NeI Projects | PDG members, RCs |
| Networks | High Capacity Networking / Local Research Organisation / Links to Business | Jisc / RO / Jisc |
| Advanced Compute | NeI Projects | PDG members, RCs |
| Data Storage Facilities | NeI Projects | PDG members, RCs |
| Cloud/Grid Infrastructure | GridPP, JASMIN2, EMBASSY CLOUD, eMedLab | Cloud WG, PDG |
| Tools and Software | Varied – no coherence | Big Data SIG, PDG and RCs |

What needs to be tried out and tested?

The tools and software needed to discover data and move data around (needed for multiple data sources) need to be developed into a coherent and simple package.

Below are listed a set of deliverables that can be achieved in 2015 to enable this. However, these are dependent on activities listed in Table 1. This is why the tests will be done in the field on live NeI systems.

Table 2: List of Deliverables

| Recommendations | Action | Milestone (OWNER) |
|---|---|---|
| Training in data management | Projects to produce data management plans and run courses on data management for user communities and staff. CDTs to be involved. | DMPs and Courses in place by June 2015 (PDG) |
| Document the current metadata landscape and best practice | RCs to document the relevant Metadata standards and publish these standards. Create code libraries that applications can use to produce metadata when data are produced. | Publish Standards and insist on their use – particularly when data are produced (RCUK). Demonstrate on PDG Projects’ systems (PDG) |
| Develop data abstraction layer | Build, test and open source software tools for data abstraction and presentation of meta-data | Integration of iRODS and OpenStack as a POC for data integration and presentation (PDG) |
| Co-ordination of International Projects to extract best value and influence Agendas | Produce report on the various national and international projects the UK is involved in | Produce Strategy Document (RCUK) |
| Working with Catapult Centres | Work with Innovate UK to ensure that business has access to Janet. RCUK NeI Group to communicate to academic community opportunities to work with Catapult Centres | Simple Contracts and portal to make sure Business can book network access easily (Jisc). Adding to existing regular research bulletins (RCUK). Organise joint academic/Innovate UK workshops to link academy to Catapults (RCUK) |
| Data Transfer 1 – Data transport and orchestration | Make FTS a generic tool to act as an aggregator and orchestrator and link to the RCUK AAAI | Test on the DiRAC, JASMIN2 and eMedLab systems (PDG, Jisc) |
| Data Transfer 2 – High Capacity Network Access | Secure Transport of Data to eMedLab and RAL WOS | Transfer of multi-PB EBI data to eMedLab and DiRAC@Durham Data to RAL WOS (PDG, Jisc) |
| Data Transfer 3 – Creating Single Name Spaces | Create WLAN and VLANS in projects to create single filesystems (global spaces) between distributed systems | Test on DiRAC systems between Durham and Edinburgh and between EBI and eMedLab (PDG, Jisc). Test on wLHC and DiRAC (PDG, Jisc) |
| Knowledge Transfer and Consultancy | Produce Work programme | Produce by April 2015 |

 

   


7. An Imagined Data Infrastructure – Another Traditional View

A schematic of what a National Data e-Infrastructure may look like. Note the ubiquitous presence of Janet.

The proposed CBL and RDSL would be the enabling middleware infrastructure for this e-Infrastructure.

[Figure: schematic of the imagined National Data e-Infrastructure. Sites shown, each with a Janet connection: HEI 1, HEI 2, HEI 3, JASMIN2, DIAMOND, the National Deep Archive Service, the National Tertiary Storage Service, and Sanger, EBI, ESRC, DiRAC, ARCHER. Inset: the attributes and functional blow-up of a typical local system, the National Tertiary Storage Service and the National Deep Archive Service, with layers labelled “Local” Tertiary Storage Layer; Meta Data Presented to World; Database Creation/Ingestion Layer and Analytics; Parallel File System, HEI RDM/Repository; Data Generator (Experiments, Clusters, PCs).]


The principal components needed for such an e-Infrastructure are:

1. Local tertiary storage platforms for active data.
2. A database creator/ingestor widget to create structured data from unstructured data, with policies for tagging such data with metadata, e.g. owner, project, grant number.
3. A National tertiary storage/metadata service to build up and store metadata from the other databases in the National e-Infrastructure, as well as to store our major active databases.
4. A National Deep Archive Service to store data that has been produced by National Facilities and to provide data replication services for the National e-Infrastructure.
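The “creator/ingestor widget” in the list above can be pictured as a small routine that parses raw output, loads it into a database and stamps every row with provenance tags. The sketch below is a toy version under stated assumptions: CSV text stands in for raw instrument output, SQLite stands in for whatever database a project actually uses, and the table layout, the `ingest` function and the grant number shown are invented for illustration.

```python
import csv
import io
import sqlite3


def ingest(raw_text, owner, project, grant_no, connection):
    """Load CSV text into a table and tag every row with provenance fields."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "sample TEXT, value REAL, owner TEXT, project TEXT, grant_no TEXT)"
    )
    rows = [
        (row["sample"], float(row["value"]), owner, project, grant_no)
        for row in csv.DictReader(io.StringIO(raw_text))
    ]
    connection.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)


# Demo with made-up data; the owner, project and grant number are placeholders.
raw = "sample,value\nA,1.5\nB,2.25\n"
db = sqlite3.connect(":memory:")
count = ingest(raw, owner="alice", project="demo", grant_no="XX/000000/0", connection=db)
tagged = db.execute(
    "SELECT sample, value FROM readings WHERE owner = ? ORDER BY sample", ("alice",)
).fetchall()
print(count, tagged)
```

Because the tags are stored alongside the data, a national metadata service of the kind described in item 3 could later harvest them with an ordinary query.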

This is a traditional representation of a computing infrastructure. It is very much the end point of the proposed work in this document, which is why it belongs at the end. The work proposed in this document enables this infrastructure to exist in an efficacious way. The outputs we propose are the real Data Infrastructure in that they enable data to be moved, selected, and queried. It is these that give the data its form and value.  

 

 

