Page 1: Streaming exa-scale data over 100Gbps networks

Streaming Exa-scale Data over 100Gbps Networks

Mehmet Balman

Scientific Data Management Group, Computational Research Division

Lawrence Berkeley National Laboratory

CRD All Hands Meeting, July 15, 2012

Page 2

Outline

• A recent 100Gbps demo by ESnet and Internet2 at SC11

• One of the applications: data movement of large datasets with many files (Scaling the Earth System Grid to 100Gbps Networks)

Page 3

ESG (Earth System Grid)

• Over 2,700 sites
• 25,000 users

• IPCC Fifth Assessment Report (AR5): 2PB
• IPCC Fourth Assessment Report (AR4): 35TB

Page 4

Applications' Perspective

• Increasing the bandwidth is not sufficient by itself; we need careful evaluation of high-bandwidth networks from the applications' perspective.

• Data distribution for climate science

• How can scientific data movement and analysis between geographically disparate supercomputing facilities benefit from high-bandwidth networks?

Page 5

Climate Data Distribution

• ESG data nodes

• Data replication in the ESG Federation

• Local copies: data files are copied into temporary storage at HPC centers for post-processing and further climate analysis.

Page 6

Climate Data over 100Gbps

• Data volume in climate applications is increasing exponentially.

• An important challenge in managing ever-increasing data sizes in climate science is the large variance in file sizes.

• Climate simulation data consists of a mix of relatively small and large files with an irregular file-size distribution in each dataset.

• Many small files

Page 7

Keep the data channel full

[Diagram: file-centric FTP issues a separate request/send exchange for every file, while an RPC-style channel requests data once and keeps receiving it.]

• Concurrent transfers
• Parallel streams
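The two techniques on this slide, concurrent transfers and parallel streams, can be sketched in Python. This is a minimal toy illustration under assumed names (`serve_chunks`, `parallel_fetch`), not the demo's actual transfer tool: a server hands each connected stream one chunk of a buffer, and the client opens several TCP streams at once and reassembles the chunks.

```python
import socket
import threading

def serve_chunks(data, port, n_streams):
    """Toy server: each accepted connection asks for a chunk index and gets that chunk."""
    chunk = len(data) // n_streams
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(n_streams)

    def handle(conn):
        idx = int.from_bytes(conn.recv(4), "big")   # chunk index requested by the client
        conn.sendall(data[idx * chunk:(idx + 1) * chunk])
        conn.close()

    workers = []
    for _ in range(n_streams):
        conn, _ = srv.accept()
        t = threading.Thread(target=handle, args=(conn,))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()
    srv.close()

def parallel_fetch(port, n_streams, chunk):
    """Client side: one thread per TCP stream, all pulling their chunks concurrently."""
    parts = [b""] * n_streams

    def fetch(i):
        s = socket.create_connection(("127.0.0.1", port))
        s.sendall(i.to_bytes(4, "big"))
        while len(parts[i]) < chunk:
            piece = s.recv(65536)
            if not piece:
                break
            parts[i] += piece
        s.close()

    threads = [threading.Thread(target=fetch, args=(i,)) for i in range(n_streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return b"".join(parts)
```

With per-file request/response (the FTP pattern on the left of the diagram) the channel idles between files; multiple concurrent streams keep bytes in flight the whole time.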

Page 8

lots-of-small-files problem! File-centric tools?

• Not necessarily high-speed (same distance)
  - Latency is still a problem

[Diagram: over both a 100Gbps pipe and a 10Gbps pipe, the client requests a dataset and the server sends data.]

Page 9

Framework for the Memory-mapped Network Channel

Memory caches are logically mapped between the client and the server.
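The slide does not show the channel's implementation; as an illustration of the idea only (all names hypothetical), the sketch below tags fixed-size memory blocks so that the sender's and receiver's caches refer to the same logical slots: the transfer unit is a block, not a file, and blocks can arrive in any order.

```python
from dataclasses import dataclass

BLOCK_SIZE = 16  # tiny for illustration; the SC11 demo used 4MB blocks

@dataclass
class Block:
    tag: int        # logical slot id shared by both ends of the channel
    offset: int     # position of this payload in the original byte stream
    payload: bytes

def to_blocks(data):
    """Sender side: carve a byte stream into tagged, fixed-size blocks."""
    n = (len(data) + BLOCK_SIZE - 1) // BLOCK_SIZE
    return [Block(tag=i,
                  offset=i * BLOCK_SIZE,
                  payload=data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
            for i in range(n)]

def reassemble(blocks, total_len):
    """Receiver side: place each block by its offset, regardless of arrival order."""
    out = bytearray(total_len)
    for b in blocks:
        out[b.offset:b.offset + len(b.payload)] = b.payload
    return bytes(out)
```

Because every block carries its own tag and offset, the receiver's cache mirrors the sender's without any per-file handshaking.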

Page 10

Moving climate files efficiently

Page 11

The SC11 100Gbps demo environment

Page 12

Advantages

• Decoupling I/O and network operations:
  • front-end (I/O processing)
  • back-end (networking layer)

• Not limited by the characteristics of the file sizes: an on-the-fly tar approach, bundling and sending many files together.

• Dynamic data channel management: the parallelism level can be increased/decreased both in the network communication and in the I/O read/write operations, without closing and reopening the data channel connection (as is done in regular FTP variants).
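The on-the-fly tar idea can be sketched with Python's `tarfile` in streaming mode. This is an illustration of bundling many small files into one continuous byte stream, not MemzNet's code; the `bundle`/`unbundle` names are assumptions.

```python
import io
import tarfile

def bundle(files):
    """Pack many small (name, bytes) pairs into a single tar stream on the fly,
    so the network sees one continuous stream instead of per-file requests."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w|") as tar:   # "w|": streaming, no seeks
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unbundle(stream):
    """Unpack the stream back into {name: bytes} on the receiving side."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r|") as tar:
        for member in tar:
            out[member.name] = tar.extractfile(member).read()
    return out
```

In a real transfer the tar stream would be written straight into the data channel rather than into memory, which is what removes the per-file round trips behind the lots-of-small-files problem.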

Page 13

The SC11 100Gbps Demo

• CMIP3 data (35TB) from the GPFS filesystem at NERSC
• Block size 4MB; each block's data section was aligned according to the system page size.
• 1GB cache at both the client and the server.
• At NERSC, 8 front-end threads on each host for reading data files in parallel.
• At ANL/ORNL, 4 front-end threads for processing received data blocks.
• 4 parallel TCP streams (four back-end threads) were used for each host-to-host connection.
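The page-aligned 4MB blocks can be sketched as follows. The 64-byte header is a hypothetical size for illustration (the slide does not give the real header layout): the data section is rounded up to the next page boundary, and anonymous `mmap` provides page-aligned memory for the block cache.

```python
import mmap

BLOCK_SIZE = 4 * 1024 * 1024   # 4MB blocks, as in the demo
HEADER_SIZE = 64               # hypothetical per-block header (tag, offset, length)

def aligned_data_offset(header_size, pagesize=mmap.PAGESIZE):
    """Round the start of a block's data section up to the next page boundary."""
    return -(-header_size // pagesize) * pagesize   # ceiling division

def make_cache(n_blocks, block_size=BLOCK_SIZE):
    """Anonymous mmap returns page-aligned memory for the block cache."""
    return [mmap.mmap(-1, block_size) for _ in range(n_blocks)]
```

Page alignment lets the kernel move whole pages when copying block data, avoiding extra partial-page work on the I/O path.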

Page 14

83Gbps throughput

Page 15

MemzNet: memory-mapped zero-copy network channel

[Diagram: front-end threads on each side access memory blocks; the blocks travel over the network, and the memory caches are logically mapped between client and server.]

Page 16

ANI 100Gbps testbed

[Diagram: ANI Middleware Testbed (updated December 11, 2011). NERSC hosts (nersc-diskpt-1/2/3, nersc-app) and ANL hosts (anl-mempt-1/2/3, anl-app) each connect through a site switch and 4x10GE (MM) links to an ANI 100G router; the two routers are joined by the 100G ANI network, with 10G links to ESnet. NIC inventory: nersc-diskpt-1: 2x 2x10G Myricom, 1x 4x10G HotLava; nersc-diskpt-2: 2x10G Myricom, 2x10G Chelsio, 6x10G HotLava; nersc-diskpt-3: 2x10G Myricom, 2x10G Mellanox, 6x10G HotLava; anl-mempt-1 and anl-mempt-2: 2x 2x10G Myricom each; anl-mempt-3: 2x10G Myricom, 2x10G Mellanox. Note: ANI 100G routers and the 100G wave are available until summer 2012; testbed resources after that are subject to funding availability.]

SC11 100Gbps demo

Page 17

Many TCP Streams

(a) Total throughput vs. the number of concurrent memory-to-memory transfers; (b) interface traffic, packets per second (blue) and bytes per second, over a single NIC with different numbers of concurrent transfers. Three hosts, each with 4 available NICs, and a total of 10 10Gbps NIC pairs were used to saturate the 100Gbps pipe in the ANI Testbed. 10 data movement jobs, each corresponding to a NIC pair, started simultaneously at source and destination. Each peak represents a different test: 1, 2, 4, 8, 16, 32, and 64 concurrent streams per job were initiated for 5-minute intervals (e.g., at concurrency level 4 there are 40 streams in total).

Page 18

Effects of many streams

ANI testbed, 100Gbps (10x10 NICs, three hosts): interrupts/CPU vs. the number of concurrent transfers [1, 2, 4, 8, 16, 32, 64 concurrent jobs, 5-minute intervals]; TCP buffer size is 50M.

Page 19

MemzNet's Performance

TCP buffer size is set to 50MB.

[Charts: MemzNet vs. GridFTP throughput in the SC11 demo and on the ANI Testbed.]
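Requesting a 50MB TCP buffer as in these tests is a standard `setsockopt` call; a sketch (the kernel may clamp the request to its configured limits, e.g. `net.core.rmem_max`/`wmem_max` on Linux):

```python
import socket

def make_tuned_socket(bufsize=50 * 1024 * 1024):
    """Ask the kernel for large send/receive buffers, as in the 50MB tests.
    The effective size is capped by the kernel's configured maximums."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    return s
```

Large buffers matter on a long 100Gbps path because the bandwidth-delay product dictates how many bytes must be in flight to keep the pipe full.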

Page 20

MemzNet's Architecture for data streaming

Page 21

Experience with 100Gbps Network Applications

Mehmet Balman, Eric Pouyoul, Yushu Yao, E. Wes Bethel, Burlen Loring, Prabhat, John Shalf, Alex Sim, and Brian L. Tierney

DIDC – Delft, the Netherlands, June 19, 2012

Page 22

Acknowledgements

Peter Nugent, Zarija Lukic, Patrick Dorn, Evangelos Chaniotakis, John Christman, Chin Guok, Chris Tracy, Lauren Rotman, Jason Lee, Shane Canon, Tina Declerck, Cary Whitney, Ed Holohan, Adam Scovel, Linda Winkler, Jason Hill, Doug Fuller, Susan Hicks, Hank Childs, Mark Howison, Aaron Thomas, John Dugan, Gopal Vaswani

Page 23

The 2nd International Workshop on Network-aware Data Management

to be held in conjunction with the IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC'12)

http://sdm.lbl.gov/ndm/2012

Nov 11th, 2012

Papers due by the end of August

Last year's program: http://sdm.lbl.gov/ndm/2011

