Page 1: The data challenge in astronomy archives technology problems solution DCC conference Bath Andy Lawrence Sep 2005 the virtual observatory.

The data challenge in astronomy

• archives

• technology

• problems

• solution

DCC conference Bath Andy Lawrence Sep 2005

the virtual observatory

Page 2:

(1) astronomical archives

Page 3:

IT in astronomy : key areas

• (1) facility operations
• (2) facility output processing
• (3) shared supercomputers for theory
• (4) science archives
• (5) end-user tools

(1-3) : big bucks
(4-5) : smaller bucks but
- produces the final science output
- sets requirements for (1-2)

Page 4:

astronomical archives

• major archives growing at TB/yr

[Figure: ESO Archive Volume (GB), log scale from 1 to 10000 GB]

Page 5:

astronomical archives

• major archives growing at TB/yr
• issue not storage but management (curation)
• improving quality of data access and presentation
• needs specialist data centres

[Figure: ESO Archive Volume (GB), log scale from 1 to 10000 GB]

Page 6:

end users

• increasing fraction of archive re-use

Page 7:

end users

• increasing fraction of archive re-use
• increasing multi-archive use
• most download small files and analyse at home
• some users process whole databases
• reduction standardised; analysis home grown

Page 8:

needles in a haystack

Hambly et al 2001

- faint moving object is a cool white dwarf
- may be solution to the dark matter problem
- but hard to find : one in a million
- even harder across multiple archives

Page 9:

failed stars

compare optical and infra-red

extra object is very cold

a "brown dwarf" or failed star
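Comparing an optical and an infra-red catalogue, as on this slide, amounts to a positional cross-match: an infra-red source with no optical counterpart nearby is a candidate cold object. A minimal sketch, assuming both catalogues are small in-memory lists of (id, RA, Dec) in degrees. The identifiers, positions and the 2 arcsec match radius are made up for illustration, and a real archive would use a spatial index rather than this brute-force loop.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula; inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600


def unmatched(ir_sources, optical_sources, radius_arcsec=2.0):
    """Return infra-red sources with no optical counterpart within the radius.

    Brute force, O(N*M): fine for a sketch, not for billion-row tables.
    """
    return [
        ir for ir in ir_sources
        if not any(
            angular_sep_arcsec(ir[1], ir[2], op[1], op[2]) <= radius_arcsec
            for op in optical_sources
        )
    ]


# Hypothetical catalogue rows: (id, ra_deg, dec_deg)
optical = [("opt-1", 150.0000, 2.0000), ("opt-2", 150.0100, 2.0050)]
infrared = [
    ("ir-1", 150.0001, 2.0001),  # about 0.5 arcsec from opt-1
    ("ir-2", 150.0100, 2.0051),  # about 0.4 arcsec from opt-2
    ("ir-3", 150.0200, 2.0100),  # nothing optical nearby
]

candidates = unmatched(infrared, optical)
print([c[0] for c in candidates])  # prints ['ir-3']
```

Only the unmatched source survives the comparison, which is the "extra object" of the slide.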

Page 10:

multi-wavelength views of a Supernova Remnant

Shocks seen in the X-ray

Heavy elements seen in the optical

Dust seen in the IR

Relativistic electrons seen in the radio

Page 11:

solar-terrestrial links

Coronal mass ejection imaged by space-based solar observatory

Effect detected hours later by satellites and ground radar

Page 12:

(2) background technology

Page 13:

dogs and fleas

• there is a very large dog

Page 14:

hardware trends

• ops, storage, bw : all 1000x/decade
– can get 1TB IDE = $5K
– backbones and LANs are Gbps

[Figure: ops per second/$ against year, 1880 to 2000, log scale from 1.E-06 to 1.E+09; annotations: doubles every 7.5 years, doubles every 2.3 years, doubles every 1.0 years]

Page 15:

hardware trends

• ops, storage, bw : all 1000x/decade
– can get 1TB IDE = $5K
– backbones and LANs are Gbps
• but device bw 10x/decade
– real PC disks 10MB/s; fibre channel SCSI poss 100MB/s
• and last mile problem remains
– end-end b/w typically 10Mbps
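The two growth rates above can be turned into doubling times, which makes the mismatch easier to see. A small sketch, assuming steady exponential growth over the decade (a simplification, not something the slide states):

```python
import math


def doubling_time_years(growth_factor_per_decade: float) -> float:
    """Years needed to double, given a multiplicative growth factor per 10 years."""
    doublings_per_decade = math.log2(growth_factor_per_decade)
    return 10.0 / doublings_per_decade


fast = doubling_time_years(1000.0)  # ops, storage, backbone bandwidth
slow = doubling_time_years(10.0)    # device bandwidth

print(f"1000x/decade doubles every {fast:.2f} years")
print(f"  10x/decade doubles every {slow:.2f} years")
print(f"gap opened up after one decade: {1000.0 / 10.0:.0f}x")
```

On these assumptions capacity doubles roughly every year while device bandwidth doubles roughly every three years, so the time needed to read a full disk keeps growing.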

[Figure: ops per second/$ against year, 1880 to 2000, log scale from 1.E-06 to 1.E+09; annotations: doubles every 7.5 years, doubles every 2.3 years, doubles every 1.0 years]

Page 16:

operations on a TB database

• searching at 10 MB/s takes a day
– solved by parallelism
– but development non-trivial ==> people
• transfer at 10 Mbps takes a week
– leave it where it is
• ==> data centres provide search and analysis services
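The "a day" and "a week" figures above follow from simple arithmetic. A sketch, assuming a decimal terabyte (10^12 bytes), sustained rates with no protocol overhead, and ideal scaling for the parallel case; the 100-disk count is an arbitrary illustration:

```python
TB_BYTES = 1e12          # decimal terabyte (assumption)
SECONDS_PER_DAY = 86_400


def scan_days(size_bytes: float, disk_mb_per_s: float) -> float:
    """Days to read size_bytes sequentially at disk_mb_per_s megabytes/second."""
    return size_bytes / (disk_mb_per_s * 1e6) / SECONDS_PER_DAY


def transfer_days(size_bytes: float, link_mbps: float) -> float:
    """Days to move size_bytes over a link of link_mbps megabits/second."""
    return size_bytes * 8 / (link_mbps * 1e6) / SECONDS_PER_DAY


scan = scan_days(TB_BYTES, 10)             # one disk at 10 MB/s
xfer = transfer_days(TB_BYTES, 10)         # end-to-end link at 10 Mbps
parallel = scan_days(TB_BYTES, 10 * 100)   # 100 disks, ideal scaling

print(f"search 1 TB at 10 MB/s:   {scan:.2f} days")
print(f"transfer 1 TB at 10 Mbps: {xfer:.2f} days")
print(f"search across 100 disks:  {parallel * 24 * 60:.0f} minutes")
```

The search comes out at a little over a day and the transfer at a little over nine days, so parallel search at the data centre is the only one of the two that parallelism can rescue.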

Page 17:

network development

• higher level protocols ==> transparency
• TCP/IP : message exchange
• HTTP : doc sharing (web)
• grid suite : CPU sharing
• XML/SOAP : data exchange

==> service paradigm

Page 18:

next up on the internet

• workflow definition

• dynamic semantics (ontology)

• software agents

Page 19:

(3) the problems

Page 20:

data growth

• astronomical data is growing fast

• but so is computing power

• so what's the problem?

(1) Heterogeneity
(2) End user delivery
(3) End user demand

Page 21:

data rich future

• heritage
– Schmidt, IRAS, Hipparcos
• current hits
– VLT, SDSS, 2MASS, HST, Chandra, XMM, WMAP
• coming up :
– UKIDSS, VISTA, ALMA, JWST, Planck, Herschel
• cross fingers :
– LSST, ELT, LISA, Darwin, SKA, XEUS, etc.
• plus lots more
• issue is archive interoperability
– need standards and transparent infrastructure

Page 22:

archive data rates

• map the sky : 0.1" x 16 bits = 100 TB
• process to find objects : billion row tables
• VISTA 100 TB/yr by 2007
• SKA datacubes 100PB/yr by 2020
• not a technical or financial problem
– LHC doing 100PB/yr by 2007
• issue is logistic : data management
• need professional data centres
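The 100 TB figure for mapping the sky can be checked from the area of the celestial sphere. A sketch, assuming a single band, a single epoch, square 0.1 arcsec pixels, no compression and decimal terabytes (none of which the slide spells out):

```python
import math

ARCSEC_PER_RADIAN = 180.0 / math.pi * 3600.0  # about 206265


def sky_map_bytes(pixel_arcsec: float, bits_per_pixel: int) -> float:
    """Bytes needed to tile the whole sky once with square pixels."""
    sky_area_arcsec2 = 4.0 * math.pi * ARCSEC_PER_RADIAN ** 2  # 4*pi steradians
    n_pixels = sky_area_arcsec2 / pixel_arcsec ** 2
    return n_pixels * bits_per_pixel / 8.0


size_tb = sky_map_bytes(pixel_arcsec=0.1, bits_per_pixel=16) / 1e12
print(f"whole sky at 0.1 arcsec, 16 bits per pixel: {size_tb:.0f} TB")
```

This gives about 107 TB, in line with the round number on the slide; extra bands or repeat epochs multiply it accordingly.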

Page 23:

data rates : user delivery

• disk I/O and bandwidth
– end-user bottlenecks will get WORSE
– but links between data centres can be good
• move from download to service paradigm
– leave the data where it is
– operations on data (search, cluster analysis, etc) as services
– shift the results not the data
– networks of collaborating data centres (datagrid or VO)
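"Shift the results not the data" can be illustrated with a toy search service that runs next to the catalogue and returns only the matching rows. A minimal sketch, assuming an in-memory SQLite table as a stand-in for the data centre's database engine. The table name, column names, synthetic values and magnitude cut are all hypothetical, and a real service would sit behind a network interface rather than a function call.

```python
import sqlite3


def build_archive(n_rows: int = 100_000) -> sqlite3.Connection:
    """Stand-in for a data centre catalogue, filled with synthetic values."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE sources ("
        "id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, mag REAL)"
    )
    rows = (
        (i, (i * 0.0036) % 360.0, -30.0 + (i % 600) * 0.1, 10.0 + (i % 1500) * 0.01)
        for i in range(n_rows)
    )
    db.executemany("INSERT INTO sources VALUES (?, ?, ?, ?)", rows)
    return db


def search_service(db: sqlite3.Connection, fainter_than: float) -> list:
    """Runs where the data lives; only the matching rows leave the data centre."""
    query = "SELECT id, ra_deg, dec_deg, mag FROM sources WHERE mag > ?"
    return db.execute(query, (fainter_than,)).fetchall()


archive = build_archive()
total = archive.execute("SELECT COUNT(*) FROM sources").fetchone()[0]
hits = search_service(archive, fainter_than=24.985)

print(f"rows held at the data centre: {total}")
print(f"rows shipped to the user:     {len(hits)}")
```

The user receives a few dozen rows instead of the full table, which is the whole point of the service paradigm when the end-user link is the bottleneck.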

Page 24:

user demands

• bar constantly rising
– online ease
– multi-archive transparency
– easy data intensive science
• new requirements
– automated resource discovery (intelligent Google)
– cheap I/O and CPU cycles
– new standards and software infrastructure

Page 25:

(4) the virtual observatory

Page 26:

the VO concept

• web : all docs in the world inside your PC

• VO : all databases in the world inside your PC

Page 27:

Generic science drivers

• data growth
• multi-archive science
• large database science

can do all this now, but needs to be fast and easy

• empowerment

Beijing as good as Berkeley

Page 28:

what it's not

• not a monolith

• not a warehouse

Page 29:

VO framework

• framework + standards

• inter-operable data

• inter-operable software modules

• no central VO-command

- it's not a thing
- it's a way of life

Page 30:

VO geometry

• not a warehouse

• not a hierarchy

• not a peer-to-peer system

• small set of service centres and large population of end users

– note : latest hot database lives with creators / curators

Page 31:

yesterday

[Diagram labels: browser front end; CGI request; html; web page; DB engine; SQL; data]

Page 32:

today

[Diagram labels: application; web service; SOAP/XML request; SOAP/XML data; DB engine; SQL; native data; anything; standard formats]

Page 33:

tomorrow

[Diagram labels: application; web service (repeated six times); job; results; anything; Registry; Workflow; GLUE; Certification; VO Space; standard semantics; publish WSDL; grid connected]

Page 34:

publishing metaphor

• facilities are authors

• data centres are publishers

• VO portals are shops

• end-users are readers

• VO infrastructure is distribution system.

Page 35:

International VO alliance (IVOA)

Page 36:

IVOA standards

• formal process modelled on W3C

• technical working groups and interop workshops

• agreed functionality roadmap

Page 37:

IVOA standards

• key standards so far
– table formats
– resource and service metadata definitions
– semantic dictionary
– protocols for image and spectrum access
• coming year
– grid and web service interfaces
– authentication
– storage sharing protocols
– application metadata and interfaces

Page 38:

state of implementations

• key projects : AstroGrid, US-NVO, Euro-VO

• many compliant data services

• VO aware tools

• mutually harvesting registries

• workflow system

• simple shared storage

• AstroGrid has ~100 registered users

• first science results coming out

Page 39:

coming year

• single sign on

• internationally shared storage

• NGS link up

• many more tools

Page 40:

next steps

• intelligent glue
– ontology, agents
• analysis services
– cluster analysis, multi-D visualisation, etc
• theory services
– simulated data, models on demand
• embedding facilities
– VO ready facilities
– links to data creation

Page 41:

lessons

Page 42:

lessons

• drivers:
– end user bottleneck
– end user demand
– empowerment

• need network of healthy data centres

• need last mile investment

• need facilities to be VO ready

• need continuing technology development

• need continuing standards programme

Page 43:

FIN

