Web Services for the Virtual Observatory

Alex Szalay, Tamas Budavari, Tanu Malik, Jim Gray, and Ani Thakar

SPIE, Hawaii, 2002

(Living in an exponential world….)


Outline

Collecting Data
Exponential Growth
Making Discoveries
Publishing Data
VO: How will it work?
Web Services
    Atomic vs Composite services
Distributed queries with SkyQuery
Cross-Matching Algorithm
SkyNode Web Services + Portal


The World is Exponential

Astrophysical data is growing exponentially
    Doubling every year (Moore’s Law+): both data sizes and number of data sets
Computational resources scale the same way
    Constant $$$ will keep up with the data
Main problem is the software component
    Currently components are not reused
    Software costs are an increasingly large fraction
    Aggregate costs are growing exponentially


Making Discoveries

When and where are discoveries made?
    Always at the edges and boundaries
    Going deeper, using more colors….
Metcalfe’s law
    Utility of computer networks grows as the number of possible connections: O(N^2)
VO: Federation of N archives
    Possibilities for new discoveries grow as O(N^2)
Current sky surveys have proven this
    Very early discoveries from SDSS, 2MASS, DPOSS
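
The O(N^2) scaling can be made concrete by counting the distinct pairs of archives that could be compared with each other. The sketch below assumes each unordered pair counts once, giving N(N-1)/2 pairs; the function name is illustrative and does not come from the talk.

```python
def archive_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n federated archives."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Going from 10 to 100 archives multiplies the number of pairs by about 100.
    for n in (2, 3, 10, 100):
        print(n, archive_pairs(n))
```

Pairwise combinations are only the simplest case; combinations of three or more archives grow faster still.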


Publishing Data

Roles        Traditional   Emerging
Authors      Scientists    Collaborations
Publishers   Journals      Project www site
Curators     Libraries     Bigger Archives
Consumers    Scientists    Scientists


Changing Roles

Exponential growth:
    Projects last at least 3-5 years
    Data sent upwards only at the end of the project
    Data will never be centralized
More responsibility on projects
    Becoming Publishers and Curators
    Larger fraction of budget spent on software
    A lot of development is duplicated and wasted
More standards are needed
    Easier data interchange, fewer tools
More templates are needed
    Develop less software on your own


Emerging New Concepts

Standardizing distributed data
    Web Services, supported on all platforms
    Custom configure remote data dynamically
    XML: Extensible Markup Language
    SOAP: Simple Object Access Protocol
    WSDL: Web Services Description Language
Standardizing distributed computing
    Grid Services
    Custom configure remote computing dynamically
    Build your own remote computer, and discard
    Virtual Data: new data sets on demand


Shielding Users

Users do not want to deal with XML, they want their data
Users do not want to deal with configuring grid computing, they want results
SOAP: data appears in user memory, XML is invisible
SOAP call: just a remote procedure
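
As a rough illustration of how the XML can stay invisible, the sketch below wraps a call in a SOAP 1.1 envelope and unwraps it again, so the caller only sees a method name and plain parameters. It uses only the Python standard library and sends nothing over the network. The service namespace `urn:example:skynode` and the helper names are hypothetical; a real toolkit would generate this layer from the WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:skynode"  # hypothetical service namespace (assumption)


def build_request(method: str, **params: str) -> bytes:
    """Serialize a method call and its named parameters as a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def read_request(message: bytes) -> tuple:
    """Recover (method, params) from an envelope built by build_request."""
    root = ET.fromstring(message)
    call = root.find(f"{{{SOAP_NS}}}Body")[0]
    method = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return method, params


if __name__ == "__main__":
    message = build_request("Query", sqlCmd="SELECT objId FROM PhotoPrimary")
    print(read_request(message))
```

From the caller's side, `build_request("Query", sqlCmd=...)` reads like a local function call; the envelope is an implementation detail.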


NVO: How Will It Work?

Define commonly used ‘atomic’ services
Build higher level toolboxes/portals on top
We do not build ‘everything for everybody’
Use the 90-10 rule:
    Define the standards and interfaces
    Build the framework
    Build the 10% of services that are used by 90% of the users
    Let the users build the rest from the components

[Figure: # of users (vertical axis, 0 to 1) plotted against # of services (horizontal axis, 0 to 0.5)]


Atomic Services

Metadata information about resources
    Waveband
    Sky coverage
    Translation of names to universal dictionary (UCD)
Simple search patterns on the resources
    Cone Search
    Image mosaic
    Unit conversions
Simple filtering, counting, histogramming
On-the-fly recalibrations
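
As an example of what an atomic service does, a cone search returns every object within a given angular radius of a sky position. The sketch below is a minimal in-memory version, assuming a catalog of (id, ra, dec) tuples in degrees and a brute-force scan; a production service would use a spatial index instead. The function names are illustrative.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, computed with the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(catalog, ra, dec, radius_deg):
    """Return ids of catalog rows (id, ra, dec) within radius_deg of (ra, dec)."""
    return [row[0] for row in catalog
            if angular_separation_deg(ra, dec, row[1], row[2]) <= radius_deg]


if __name__ == "__main__":
    catalog = [("a", 181.3, -0.76), ("b", 181.3, 0.24), ("c", 10.0, 45.0)]
    print(cone_search(catalog, 181.3, -0.76, 0.5))
```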


Higher Level Services

Built on Atomic Services
Perform more complex tasks
Examples
    Automated resource discovery
    Cross-identifications
    Photometric redshifts
    Outlier detections
    Visualization facilities
Expectation:
    Build custom portals in a matter of days from existing building blocks (like today in IRAF or IDL)


SkyQuery

Distributed Query tool using a set of services
Feasibility study, built in 6 weeks from scratch
    Tanu Malik (JHU CS grad student)
    Tamas Budavari (JHU astro postdoc)
Implemented in C# and .NET
Won 2nd prize in the Microsoft XML Contest
Allows queries like:

    SELECT o.objId, o.r, o.type, t.objId
    FROM SDSS:PhotoPrimary o,
         TWOMASS:PhotoPrimary t
    WHERE XMATCH(o,t)<3.5
      AND AREA(181.3,-0.76,6.5)
      AND o.type=3 AND (o.i - t.m_j)>2


Architecture

[Diagram: components are the Web Page, SkyQuery, the Image cutout service, and three SkyNodes (SDSS, 2MASS, FIRST)]


Cross-id Steps

Parse query
Get counts
Sort by counts
Make plan
Cross-match
    Recursively, from small to large
Select necessary attributes only
Return output
Insert cutout image

    SELECT o.objId, o.r, o.type, t.objId
    FROM SDSS:PhotoPrimary o,
         TWOMASS:PhotoPrimary t
    WHERE XMATCH(o,t)<3.5
      AND AREA(181.3,-0.76,6.5)
      AND (o.i - t.m_j) > 2 AND o.type=3
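
The counting, sorting, and small-to-large matching steps can be sketched in memory as follows. This is a simplified illustration and not the SkyQuery implementation: the recursion is unrolled into a loop, every archive is a local list of (obj_id, ra, dec) tuples, and a plain angular-distance threshold in arcseconds stands in for XMATCH, whose actual criterion the slides do not specify. All names are hypothetical.

```python
import math


def separation_arcsec(a, b):
    """Great-circle separation of two (ra, dec) positions, in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def chained_cross_match(archives, tolerance_arcsec):
    """archives maps name -> list of (obj_id, ra, dec); returns matched id tuples
    as a list of dicts mapping archive name -> obj_id."""
    # "Get counts", "Sort by counts", "Make plan": visit the smallest archive first.
    plan = sorted(archives, key=lambda name: len(archives[name]))
    first = plan[0]
    # Each partial result keeps the matched ids and the position of the first object
    # (a simplification: no combined position is computed).
    partial = [({first: oid}, (ra, dec)) for oid, ra, dec in archives[first]]
    for name in plan[1:]:
        extended = []
        for ids, pos in partial:
            for oid, ra, dec in archives[name]:
                if separation_arcsec(pos, (ra, dec)) <= tolerance_arcsec:
                    extended.append(({**ids, name: oid}, pos))
        partial = extended  # only surviving tuples move on to the next, larger archive
    return [ids for ids, _ in partial]


if __name__ == "__main__":
    archives = {
        "SDSS": [(1, 181.3, -0.76), (2, 181.4, -0.70), (3, 181.5, -0.60)],
        "TWOMASS": [(10, 181.3, -0.7601), (11, 181.9, -0.2)],
    }
    print(chained_cross_match(archives, 3.5))
```

Starting from the smallest archive keeps the intermediate result small, so less data has to move between nodes at each step.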


Monte-Carlo Simulation

Comparing different algorithms for 3-way xid
    Transmit all the data
    Transmit after filtering
    Recursive cross-match
Surveys
    SDSS
    2MASS
    FIRST
Random variables:
    Sky Area (0..10 sqdeg)
    Selectivity of each subselect (0..1)
    Efficiency of join (0.5..2)
    Selectivity of common select (0..1)

[Figure: two panels, each with horizontal axis log cost (-4 to 4) and vertical axis 0 to 2000]


SkyNode

Metadata functions (SOAP)
    Info, Tables, Columns, Schema, Functions, Keysearch
Query functions (SOAP)
    Dataset Query(String sqlCmd)
    Dataset Xmatch(Dataset input, String sqlCmd, float eps)
Database: MS SQL Server
    Upload dataset
    Very fast spatial search engine (HTM-based)
    Crossmatch takes <3 ms/object over 15M in SDSS
    User defined functions and stored procedures
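
To give a feel for the interface, here is a toy, in-process stand-in for a SkyNode with a few of the listed functions. It is only a sketch under stated assumptions: SQLite from the Python standard library replaces MS SQL Server, there is no SOAP layer, Xmatch and the HTM index are omitted, and the class name, table layout, and lowercase method names are hypothetical.

```python
import sqlite3


class ToySkyNode:
    """In-memory stand-in exposing Tables, Columns, Upload and Query."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        # Hypothetical minimal schema; the real PhotoPrimary has many more columns.
        self.db.execute(
            "CREATE TABLE PhotoPrimary "
            "(objId INTEGER PRIMARY KEY, ra REAL, dec REAL, type INTEGER)"
        )

    def upload(self, rows):
        """Upload dataset: rows are (objId, ra, dec, type) tuples."""
        self.db.executemany("INSERT INTO PhotoPrimary VALUES (?, ?, ?, ?)", rows)

    def tables(self):
        """Metadata function: names of the tables held by this node."""
        cur = self.db.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
        return [row[0] for row in cur]

    def columns(self, table):
        """Metadata function: column names of one table."""
        if table not in self.tables():
            raise ValueError(f"unknown table: {table}")
        cur = self.db.execute(f'PRAGMA table_info("{table}")')
        return [row[1] for row in cur]

    def query(self, sql_cmd):
        """Query function: run a SQL string and return the result rows."""
        return self.db.execute(sql_cmd).fetchall()


if __name__ == "__main__":
    node = ToySkyNode()
    node.upload([(1, 181.3, -0.76, 3), (2, 181.4, -0.70, 6)])
    print(node.tables(), node.columns("PhotoPrimary"))
    print(node.query("SELECT objId FROM PhotoPrimary WHERE type=3"))
```

Because every node answers the same metadata and query calls, a portal can discover what a node holds before sending it any part of a distributed query.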


Data Flow

[Diagram: data flow of a query among SkyQuery and SkyNode 1, SkyNode 2, and SkyNode 3]

http://www.skyquery.net


Other web services

Create density maps and masks for angular clustering
Deliver photometric redshifts from photometry data
Intersect pointed observations with surveys
Generate XSLT from script XML => SVG
Wrap legacy (Linux C) data mining applications as a web service
Create a C# class for the CFITSIO library


Archive Footprint

Footprint is a ‘fractal’
Result depends on context
    all sky, degree scale, pixel scale
Translate to web services
    Footprint() returns a single region that contains the archive
    Intersection(region, tolerance) takes a region and returns its intersection with the archive footprint
    Contains(point) returns yes/no (maybe fuzzy) if the point is inside the archive footprint
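
The three footprint calls could look roughly like this in a toy setting. The sketch assumes the footprint is a single RA/Dec box in degrees with no wrap-around at RA = 360, which is far simpler than a real survey footprint. The meaning of tolerance is not given in the slides; here it is assumed to mean that overlaps narrower than tolerance degrees in either direction are treated as empty. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned RA/Dec region in degrees (no RA wrap-around)."""
    ra_min: float
    ra_max: float
    dec_min: float
    dec_max: float


class ToyArchive:
    def __init__(self, box: Box):
        self._box = box

    def footprint(self) -> Box:
        """Single region that contains the archive."""
        return self._box

    def contains(self, ra: float, dec: float) -> bool:
        """Strict yes/no answer; a fuzzy variant could return a probability."""
        b = self._box
        return b.ra_min <= ra <= b.ra_max and b.dec_min <= dec <= b.dec_max

    def intersection(self, region: Box, tolerance: float = 0.0) -> Optional[Box]:
        """Overlap of region with the footprint, or None if empty or too thin."""
        b = self._box
        ra_min, ra_max = max(b.ra_min, region.ra_min), min(b.ra_max, region.ra_max)
        dec_min, dec_max = max(b.dec_min, region.dec_min), min(b.dec_max, region.dec_max)
        if ra_max - ra_min <= tolerance or dec_max - dec_min <= tolerance:
            return None
        return Box(ra_min, ra_max, dec_min, dec_max)


if __name__ == "__main__":
    archive = ToyArchive(Box(180.0, 190.0, -5.0, 5.0))
    print(archive.contains(181.3, -0.76))
    print(archive.intersection(Box(185.0, 195.0, 0.0, 10.0)))
```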


Summary

Exponential data growth – distributed data – federation needed
Projects now Publishers and Curators
Web Services – hierarchical architecture
Use the 90-10 rule (maybe 80-20)
There are clever ways to federate datasets!

