HPCS16 - Frederick Lefebvre - Bridging the last mile


A platform for data management and analytics in campuses and research labs

Frédérick Lefebvre, [email protected]

● Compute Canada and its regional partners have put a lot of work into using CANARIE's and the NRENs' networks to interconnect their infrastructure at high speed

● 10 GbE right now / 100 GbE for all new systems
● 25 Globus/GridFTP data transfer nodes have been deployed to facilitate data movement across the Compute Canada infrastructure

Fast data transfers between datacenters are great, but what about everyone else?

● Data doesn't just magically appear on Compute Canada's systems.

● It gets created “somewhere”, has a life of its own, comes to our systems for a brief time and goes back home...

Utilization data from the CC Globus infrastructure over the past 2 years supports this model

● Transfers to and from our infrastructure
○ More data moves back out but not by much

● As we centralize resources, we are moving storage and computing further away from researchers

● Visualization, real-time computation, as well as application development and prototyping, can be impaired by the increased latency to the systems and their teams

● There is a need to improve tools available to researchers to facilitate their use of Advanced Research Computing resources
○ Improved end-to-end networking
○ Wider deployment of data movement and pre-processing infrastructure

● Deploy Data Transfer Nodes (DTNs) close to where data is generated and extend the Science DMZ all the way to the labs
○ DTNs administered by the local ARC team
○ Local ingestion points can be dedicated to a research lab or the whole campus

Based on the FIONA DTN developed by SDSC for the Pacific Research Platform
https://fasterdata.es.net/science-dmz/DTN/fiona-flash-i-o-network-appliance/

● Science DMZ
○ Dedicated research network
○ Away from firewalls
○ All the way to the researchers
Ref: Science DMZ - es.net
http://fasterdata.es.net/science-dmz/science-dmz-architecture/

● High-speed data transfers need purpose-built Data Transfer Nodes

● Above all, they require fast drives to prevent disk I/O from becoming the bottleneck

● Spinning disks are seldom usable unless you are going to have lots of them
○ Think tens of them to achieve 40 Gbps! (a rough estimate follows below)
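A back-of-the-envelope check of that claim, assuming an illustrative ~180 MB/s of sustained sequential throughput per spinning disk (an assumption, not a figure from this deck):

```python
# Rough estimate: how many spinning disks does it take to sustain 40 Gbps?
# The ~180 MB/s per-drive figure is an assumption for illustration only.
link_gbps = 40
per_disk_mbps = 180                        # MB/s, sustained sequential, assumed

link_mbps = link_gbps * 1000 / 8           # 40 Gbps is roughly 5000 MB/s
disks_needed = link_mbps / per_disk_mbps   # about 28 drives, before RAID overhead

print(f"Need ~{link_mbps:.0f} MB/s, i.e. roughly {disks_needed:.0f} drives")
```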

● Modern processors have much more power than what is required to move data from drives to the network

● The fast I/O of a DTN and its large memory make it ideal for running streaming workloads, data analytics and general data transformation

● Why let it sit idle?

● Enhance the DTNs with the ability to run code on local data through a web interface
○ Focus for now on scripting languages and big data analytics with Apache Spark
○ Creates an environment where data can be ingested, explored, modified and then moved elsewhere
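As a concrete illustration of that environment, here is a minimal PySpark sketch of the kind of exploration a researcher could run against locally ingested data; the file path and column names are hypothetical:

```python
# Minimal sketch: exploring locally ingested data with Apache Spark from a
# Notebook running on the DTN. Path and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("dtn-local-exploration")
         .master("local[*]")               # use the DTN's spare cores
         .getOrCreate())

df = spark.read.csv("/data/ingest/sensors.csv", header=True, inferSchema=True)
df.groupBy("station").avg("temperature").show()   # quick look before moving the data on

spark.stop()
```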

● GridFTP server inside a container, bound to specific cores
● All other cores shared by the OS and user code
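One way to implement that core binding, sketched with the Docker SDK for Python; the image name and core range are assumptions, since the deck does not detail how the containers are launched:

```python
# Sketch: pin a containerized GridFTP server to a dedicated set of cores so the
# remaining cores stay free for the OS and user code. Image name and core range
# are placeholders, not taken from the deck.
import docker

client = docker.from_env()
client.containers.run(
    "example/gridftp-server:latest",   # hypothetical image
    detach=True,
    network_mode="host",               # transfers go straight over the host NIC
    cpuset_cpus="0-7",                 # cores reserved for the transfer service
)
```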

● JupyterLab to manage and launch users' Notebooks

● Authentication against the CC LDAP directory (a configuration sketch follows this list)

● perfSONAR in containers (in progress)
● Scale out whole Notebooks or Apache Spark workloads to a parallel cluster (in progress)
● Network export of local storage
● Automated data transformation pipelines
● Software building blocks & code snippets in the Notebooks
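On the authentication side, one plausible wiring is JupyterHub in front of JupyterLab with the jupyterhub-ldapauthenticator plugin; the sketch below assumes that setup, and the server address and bind template are placeholders rather than the real Compute Canada values:

```python
# jupyterhub_config.py (sketch) -- LDAP-backed login for the portal, assuming
# JupyterHub with the jupyterhub-ldapauthenticator plugin. Values are placeholders.
c.JupyterHub.authenticator_class = "ldapauthenticator.LDAPAuthenticator"
c.LDAPAuthenticator.server_address = "ldap.example.org"          # placeholder host
c.LDAPAuthenticator.bind_dn_template = [
    "uid={username},ou=People,dc=example,dc=org",                # placeholder DN
]
c.Spawner.default_url = "/lab"   # drop users into JupyterLab rather than the classic UI
```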

● Use case: sensor data ingested through S3
1. Sensors upload data to local storage through an S3 API
2. Researcher explores their data with R and Apache Spark in a Notebook
3. Data is anonymized
4. Anonymized data is transferred to a CC system using Globus
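A minimal sketch of step 1, assuming the local ingestion point exposes an S3-compatible endpoint reachable with boto3; the endpoint, credentials, bucket and file names are placeholders:

```python
# Sketch of step 1: a sensor pushing a reading file to the DTN's local storage
# through its S3 API. Endpoint, credential and object names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://dtn.example.org:9000",   # hypothetical local S3 endpoint
    aws_access_key_id="SENSOR_KEY",
    aws_secret_access_key="SENSOR_SECRET",
)
s3.upload_file("readings-20160601.csv", "sensor-data",
               "raw/readings-20160601.csv")
```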

● Use case: genomics sequencing
1. Sequencers output data to local storage through a CIFS share
2. FASTQ files are preprocessed locally
3. Files are characterized and indexed
4. Data is transferred to a parallel system for further processing
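A minimal sketch of steps 2 and 3, assuming plain FASTQ files on the share and a simple JSON index; the paths are placeholders and the actual preprocessing pipeline is not specified in the deck:

```python
# Sketch of steps 2-3: characterize FASTQ files landed on the CIFS share and
# write a simple index. Paths are placeholders; the real pipeline may differ.
import glob
import json

index = []
for path in glob.glob("/data/sequencer/**/*.fastq", recursive=True):
    with open(path) as fh:
        n_lines = sum(1 for _ in fh)
    index.append({"file": path, "reads": n_lines // 4})   # 4 lines per FASTQ record

with open("/data/sequencer/index.json", "w") as out:
    json.dump(index, out, indent=2)
```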

● A gateway to get researchers' data onto Compute Canada's infrastructure

● A local platform for data exploration & visualization, pre-processing and prototyping

● A generic web portal to submit workloads on ARC systems
○ We have automated node reservation to scale out Notebooks on Colosse
○ The way we do it on Colosse requires the portal to be a submit host
○ There has to be a better way. Web API?
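To make the submit-host constraint concrete, here is a sketch of the kind of submission the portal currently has to issue locally, assuming a Moab/Torque-style msub command and an illustrative job script; a remote Web API would remove the need to run this on the portal host:

```python
# Sketch: scaling a Notebook out to Colosse by reserving nodes from the portal.
# Assumes a Moab/Torque-style scheduler (msub) and an illustrative job script;
# this is the step that forces the portal to be a submit host.
import subprocess

result = subprocess.run(
    ["msub", "-l", "nodes=4:ppn=8,walltime=02:00:00", "scale_out_notebook.sh"],
    capture_output=True, text=True, check=True,
)
print("Reservation job id:", result.stdout.strip())
```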

Processors: 2x Xeon E5-2640v4 (40 logical cores)

Memory: 128 GB DDR4

Network interfaces: Mellanox ConnectX-3 Pro, dual-port 40 GbE

Drives for OS: 2x 128 GB SATA SSD

Local storage (performance option): 8x 400 GB NVMe drives

Local storage (capacity option): 24x 8 TB NL-SAS drives

● Cost ranges from ~$12K to $25K and up
○ Storage is the differentiator

● There is a need for high speed data transport services in campuses and larger labs

● Local computing capabilities create new opportunities for quick innovation

● We envision a model where researchers finance their local portal, sizing it to their needs

● We have selected 2 pilot sites that will be deployed this summer

● You can participate by:
○ Becoming a pilot site
○ Contributing to the platform design and development
○ Letting us know how we can improve the model
○ Helping us find a better name…

● Contact us: [email protected]

