Technology and Infrastructure Support for Large Scale Information

Marcio Faerman
The Brazilian National Education and Research Network - RNP
www.rnp.br

Generating Large Data Collections

• Large data volumes can be generated much faster than they can be analyzed
  – Instrument Observations
    • Particle Accelerators (CERN LHC)
    • Telescopes, Satellites
    • Sensor Networks
    • Virtual Observatories
  – Large Model Simulations
    • High resolution, very complex
• Scientific Experiments
  – Medical imaging (fMRI): ~1 GByte per measurement (day)
  – Bio-informatics queries: 500 GByte per database
  – Satellite world imagery: ~5 TByte per year
  – Current particle physics: 1 PByte per year
  – LHC physics (2007): 10-30 PByte per year
  – LSST Astronomy (2012): 5 PByte per year
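
To put the yearly volumes above in perspective, the following minimal sketch converts them into the average sustained rate needed just to keep pace with data production. It assumes decimal units (1 PByte = 10^15 bytes) and a 365-day year; the function name `sustained_rate_gbps` and the dictionary layout are illustrative and do not come from the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

# Assumption: decimal units, as is common for storage volumes.
TBYTE = 10**12
PBYTE = 10**15

# Yearly volumes quoted on the slide, in bytes.
YEARLY_VOLUMES = {
    "Satellite world imagery": 5 * TBYTE,
    "Current particle physics": 1 * PBYTE,
    "LHC physics (upper estimate)": 30 * PBYTE,
    "LSST Astronomy": 5 * PBYTE,
}


def sustained_rate_gbps(bytes_per_year: int) -> float:
    """Average rate in Gbit/s needed to move one year of data within one year."""
    return bytes_per_year * 8 / SECONDS_PER_YEAR / 1e9


if __name__ == "__main__":
    for name, volume in YEARLY_VOLUMES.items():
        print(f"{name}: {sustained_rate_gbps(volume):.3f} Gbit/s")
```

Under these assumptions, 30 PByte per year corresponds to roughly 7.6 Gbit/s sustained on average, before any replication or reprocessing is counted.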

Challenges Managing Large Volume Data

• Scalability
  – What works for small datasets does not necessarily work for large collections
• Data Integrity
  – At terabyte scale, failures and data corruption are very likely to occur
  – Is data provenance reliable?
• Efficiency
  – Data should be accessed at a rate which keeps work feasible
  – More data means a need for more speed
• Distributed Access
  – Data can be at a remote (and possibly unknown) location
• Infrastructure Management
  – Heterogeneous
  – Distributed
  – Prone to failures
  – Very complex
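
One common way to detect the silent corruption mentioned under Data Integrity is to record a cryptographic digest when a file is ingested and to recompute it later. The sketch below uses SHA-256 from the Python standard library; the function names `file_fingerprint` and `is_intact` are hypothetical, and the slides do not say which scheme any particular data grid uses.

```python
import hashlib
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks.

    Chunked reads keep memory use constant even for very large files.
    """
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_digest: str) -> bool:
    """True if the file still matches the digest recorded at ingestion time."""
    return file_fingerprint(path) == expected_digest
```

A digest only shows that the bytes are unchanged. Whether the provenance record itself can be trusted is a separate question.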

Challenges – Getting to Know your Data

• Extract knowledge from raw data files
  – Data product derivation
    • Visualization
    • Relationships
    • Patterns
    • New derived quantities
  – Cross-institutional and cross-disciplinary collaborations
    • "What if" experiments
      – Your data with our model?
• Dataset Access
  – Multiple formats
    • Each sensor or simulation has its own storage format
  – Federated collections
  – Discovery by content

Technological Response

• Integration of compute, communication, storage and instrument resources into a powerful infrastructure
  – Information Grids
  – Very powerful infrastructure
  – Economy of scale
• Serves a broad range of customers
  – Biologists, physicists, government, industry
• Infrastructure is heterogeneous, distributed, very complex
• Middleware and data-oriented tools act as facilitators to tackle data management complexities

Open Access and Preservation Functionalities

• Federated Digital Libraries
  – Integration of distributed repositories
  – Access control: can decide who can see it
  – Organize the data in collections
  – Describe your data: metadata
• Data Grids
  – Access to efficient parallel I/O systems
  – Hierarchical Systems
    • Disk caches, tapes
    • Often distributed
  – Analysis, Data Mining
  – Visualization
  – Workflow based systems
  – Transaction based data ingestion
    • Data provenance, data fingerprinting
  – "What if" virtual lab
• End User Oriented Portals
  – "I deal with the data in the way it makes sense to me"
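
The ideas of organizing data in collections, describing it with metadata, and finding it regardless of where it is stored can be sketched as a tiny in-memory catalog. This is an illustration only: the class names `DataObject` and `Collection`, the `find` method, and the sample site names are all hypothetical and do not reflect the API of SRB, Globus, or any other tool named in these slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataObject:
    """A logical data object; users see the logical name, not the storage site."""
    logical_name: str
    replicas: list[str]  # physical locations, possibly remote (hypothetical format)
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class Collection:
    """A named group of data objects that can be searched by metadata."""
    name: str
    objects: list[DataObject] = field(default_factory=list)

    def add(self, obj: DataObject) -> None:
        self.objects.append(obj)

    def find(self, **criteria: str) -> list[DataObject]:
        """Return objects whose metadata matches every key/value pair given."""
        return [
            obj
            for obj in self.objects
            if all(obj.metadata.get(key) == value for key, value in criteria.items())
        ]


if __name__ == "__main__":
    runs = Collection("climate-runs")
    runs.add(DataObject("run-001", ["site-a:/store/run-001.nc"],
                        {"format": "NetCDF", "variable": "temperature"}))
    runs.add(DataObject("run-002", ["site-b:/store/run-002.h5"],
                        {"format": "HDF5", "variable": "temperature"}))
    # Discovery by description rather than by file path.
    print([obj.logical_name for obj in runs.find(format="NetCDF")])
```

Keeping the logical name separate from the replica list is what lets a federation move or replicate data without changing how users refer to it.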

Middleware and Tools

• Data Management
  – Storage Resource Broker (SRB)
  – Globus Data Management
  – L-Store
  – IBP
  – Storage Resource Manager (SRM)
• Data Representation Libraries
  – HDF5
  – NetCDF
• Portals
  – OGCE
  – JSR 168

Today’s Reality

• Exceptional achievements by early adopters
• Integration between domain scientists – data users and producers still a challenge
  – Need much more cross-disciplinary interaction
• Emphasis on scale and performance
• Failures are still a taboo
  – Frustration factor should be addressed in partnership with users
  – Focus on failure recovery and quality of service getting more attention

e-Infrastructure Workshop, NUDI/USP, São Paulo, 07.05.2007

Grid Initiatives around the World

• HEPGrid
• Ringrid
• EELA
• SPRACE
• UCRAV
• OurGrid
• UNAM
• SINAPAD
• CL Grid

Networking in Latin America

• RNP-BR
• REUNA-CL
• CUDI-MX
• RAAP-PE
• REACCIUN-VE

Brazilian National Research and Education Network - RNP

• In November 2005 the RNP networking infrastructure was entirely renovated. It consists of:
  – A multigigabit core connecting 10 capitals at 2.5 and 10 Gbps
  – Connections at 34 Mbps to 11 capitals
  – Connections up to 16 Mbps to 6 capitals
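
To relate these link speeds to the data volumes discussed earlier, the sketch below estimates how long a 1 TByte transfer would take on each class of link. It is a best-case calculation that assumes decimal units and a link fully dedicated to the transfer; the `efficiency` parameter and the function name `transfer_time_hours` are illustrative assumptions, not figures from RNP.

```python
# Link speeds from the slide, in bits per second.
LINK_SPEEDS_BPS = {
    "16 Mbps": 16e6,
    "34 Mbps": 34e6,
    "2.5 Gbps": 2.5e9,
    "10 Gbps": 10e9,
}

TBYTE = 10**12  # assumption: decimal terabyte


def transfer_time_hours(size_bytes: float, link_bps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over a link.

    efficiency is the fraction of nominal capacity actually achieved
    (1.0 = ideal; real transfers are usually lower).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bps * efficiency) / 3600


if __name__ == "__main__":
    for label, speed in LINK_SPEEDS_BPS.items():
        print(f"1 TByte over {label}: {transfer_time_hours(TBYTE, speed):.2f} h")
```

Under these idealized assumptions, 1 TByte takes about 139 hours at 16 Mbps and about 13 minutes at 10 Gbps.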

Infra-estrutura para e-Ciência (Infrastructure for e-Science)

Community Metropolitan Networks

• It is not enough to bring high speed connectivity to each city; it must also reach the university campus and research lab.
• The metropolitan network is the solution
  – Infrastructure sharing to support:
    • Interconnection of the campuses of each partner institution
    • Access to the RNP national network backbone
  – This sharing substantially reduces deployment costs
  – Preferably, the infrastructure will be owned by the partners themselves (reducing operating costs)
• Pilot: the Metrobel project in the city of Belém do Pará in the Amazon region

Metrobel – Belém Metropolitan Network


Redecomep Project (2005-7)

• Following Metrobel, the Brazilian Ministry of Science and Technology is supporting the Community Networks for Education and Research (Redecomep) Project with R$ 39.7 M (~ US$ 19.0 M) through Finep (Dec/2004)
• Goals:
  – Extend the metropolitan optical network to 26 other cities with RNP points of presence
  – Promote integration in the metropolitan area
  – High speed access to the RNP point of presence

Next steps

• Integration between network, data repositories, compute, storage resources and applications
  – Identify who needs better connectivity
  – Developing Brazilian cyberinfrastructure
  – Generally uncoordinated funding for infrastructure resources
  – Funding agencies and partners need a broad vision of application requirements and cyberinfrastructure integration
• RNP is coordinating an e-Science/Infrastructure initiative in Brazil with scientific communities and infrastructure providers


JRU-Brazil: 22 members in EELA-2

#   STATE  INSTITUTION                    E-SCIENCE COMMUNITIES
1   SP     CCE / USP                      (e-INFRASTRUCTURE only)
2   RJ     CEFET-RJ                       e-GOVERNMENT, e-INDUSTRY
3   RJ     FCM / UERJ                     BIOMED
4   RJ     FIOCRUZ                        BIOMED, e-EDUCATION
5   SP     IAG / USP                      CLIMATE
6   RJ     IME                            BIOMED
7   SP     INCOR / USP                    BIOMED
8   SP     INPE                           CLIMATE
9   RJ     LNCC                           BIOMED
10  RJ     ON                             PHYSICS
11  BR     RNP (NREN)                     (e-INFRASTRUCTURE only)
12  SP     SPRACE / UNESP                 PHYSICS
13  PB     UFCG                           CLIMATE, EARTH-SCIENCE
14  RJ     UFF                            (e-INFRASTRUCTURE only)
15  MG     UFJF                           BIOMED
16  MS     UFMS                           BIOMED
17  RS     UFRGS                          CLIMATE
18  RJ     UFRJ (coordinator for EELA-2)  BIOMED, PHYSICS, e-EDUCATION, CLIMATE
19  RS     UFSM                           CLIMATE
20  DF     UnB                            BIOMED
21  RJ     UNILASALLE                     e-EDUCATION
22  SP     UNISANTOS                      BIOMED, e-LEARNING, e-GOVERNMENT

Developing Together

• Information infrastructure is being redefined in Brazil and Latin America

• Now is the time to have as much cross-disciplinary interaction as possible to define needs, partnerships and investments

• Please contact us

THANK YOU!

