Page 1: Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data.

Knowledge Extraction from Scientific Data

Roy Williams

California Institute of Technology

roy@caltech.edu

SDMIV, 24 October 2002

Edinburgh

KE Tools S Data

Page 2:

Scientific Data

• Datacubes
– N-dimensional array: spectrum, time-series, image, voxels, hyperspectral image
– Concentration
– Pattern matching
– Integration

• Event Sets
– Often derived from pattern matching
– A set of events is a table
– Integrating Event Sets
– Clustering

Page 3:

Knowledge Extraction

• Concentration: principal components, cluster/outlier finding

• Datacube → Eventset: pattern matching, from theory or from a training set

• Integration: registration of datacubes, join / crossmatch of eventsets

Page 4:

Datacube

Some stars from the DPOSS survey

Page 5:

Datacube

An AVIRIS image of San Francisco Bay

400-2500 nm in 224 bands

R. Green, JPL

atmospheric absorption

Page 6:

Concentrating Information

e.g. Principal Component Analysis

• Given a set of vectors
• Compute dot products (same as correlations)
• Diagonalize
• Throw out weaker (noise) components
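The steps on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the speaker's code: the function name `concentrate`, the parameter `n_keep`, the mean-centring step, and the synthetic test data are all introduced here for the example. Centring is what makes the dot products proportional to covariances.

```python
import numpy as np

def concentrate(vectors, n_keep):
    """Project row vectors onto their n_keep strongest principal components."""
    x = np.asarray(vectors, dtype=float)
    x = x - x.mean(axis=0)                    # centre each coordinate (assumption)
    gram = x.T @ x                            # dot products between coordinates
    eigvals, eigvecs = np.linalg.eigh(gram)   # diagonalize the symmetric matrix
    order = np.argsort(eigvals)[::-1]         # strongest components first
    basis = eigvecs[:, order[:n_keep]]        # throw out the weaker components
    return x @ basis, eigvals[order]

rng = np.random.default_rng(0)
t = rng.normal(size=500)
# Synthetic data: 3-D points lying close to a single line, plus small noise.
data = np.outer(t, [1.0, 2.0, -1.0]) + 0.01 * rng.normal(size=(500, 3))
reduced, spectrum = concentrate(data, n_keep=1)
print(reduced.shape, spectrum / spectrum.sum())
```

In this synthetic case nearly all of the variance sits in the first component, so keeping one component loses very little information.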

Page 7:

Information concentration

Principal Component Analysis

Page 8:

Event Sets

Created by

• pattern matching
– from a known rule
– from a training set
• finding clusters

Page 9:

Event Set = Table

name=longitude, content=Earth coordinate, units=degrees, datatype=double, display=f6.2

43.4, 87.2, 83.2

name=ID, content=key, units=none, datatype=char

E3948547, E3948545, E3943766

10^8?

10^3?
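One way to picture an event set as a table with self-describing columns is the sketch below. The metadata fields (`name`, `content`, `units`, `datatype`, `display`) come from the slide; the class names `Column` and `EventSet`, the `render` helper, and the row values are hypothetical and made up for illustration. The code assumes that a display hint such as `f6.2` is a Fortran-style fixed-point format.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    content: str
    units: str
    datatype: str
    display: str = ""   # Fortran-style hint such as "f6.2" (assumed meaning)

    def render(self, value):
        # Translate a hint like "f6.2" into Python's "6.2f"; otherwise use str().
        if self.display.startswith("f"):
            return format(value, self.display[1:] + "f")
        return str(value)

@dataclass
class EventSet:
    columns: list
    rows: list = field(default_factory=list)

    def add(self, *values):
        if len(values) != len(self.columns):
            raise ValueError("one value per column is required")
        self.rows.append(values)

    def render(self):
        return [[c.render(v) for c, v in zip(self.columns, row)]
                for row in self.rows]

events = EventSet([
    Column("ID", "key", "none", "char"),
    Column("longitude", "Earth coordinate", "degrees", "double", "f6.2"),
])
events.add("A1", 12.5)      # made-up rows, not values from the slide
events.add("A2", 47.25)
print(events.render())
```

Keeping units and display hints next to the data means a downstream tool can format or convert a column without out-of-band knowledge.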

Page 10:

Gravitational Lenses

A. Szalay, Johns Hopkins

Pattern matching finds events in datacubes

Page 11:

Black hole collisions

LIGO: Laser Interferometer Gravitational-Wave Observatory

Page 12:

Creating Event Sets

Given a set of volcanoes, find a lot more volcanoes

Here we use Singular Value Decomposition

Supervised Classification
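The slide does not say how the Singular Value Decomposition is applied. One common pattern, shown here purely as an assumption, is to build a low-dimensional subspace from the labelled examples and to score each candidate by how far it lies from that subspace. The function names `fit_subspace` and `residual` and the synthetic "bump" profiles that stand in for image patches are invented for this sketch.

```python
import numpy as np

def fit_subspace(examples, n_components):
    """Return the mean and the leading right singular vectors of the examples."""
    x = np.asarray(examples, dtype=float)
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def residual(candidates, mean, basis):
    """Distance of each candidate from the subspace spanned by the basis rows."""
    d = np.asarray(candidates, dtype=float) - mean
    reconstructed = (d @ basis.T) @ basis
    return np.linalg.norm(d - reconstructed, axis=1)

rng = np.random.default_rng(1)
# Synthetic training set: a bump of varying height on a 32-sample profile.
grid = np.linspace(-1.0, 1.0, 32)
bump = np.exp(-grid**2 / 0.05)
heights = rng.uniform(0.5, 2.0, size=(40, 1))
training = heights * bump + 0.02 * rng.normal(size=(40, 32))
mean, basis = fit_subspace(training, n_components=2)

positive = 1.3 * bump + 0.02 * rng.normal(size=32)   # looks like the training set
negative = rng.normal(size=32)                       # pure noise, no bump
scores = residual([positive, negative], mean, basis)
print(scores)   # a small residual means "resembles the training examples"
```

A threshold on the residual then turns the score into a yes/no label; choosing that threshold would need held-out labelled data.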

Page 13:

[Figure: multiparameter data in colour-colour-fX/fopt space. Symbols: X-ray source counterparts; contours: all optical objects. Panels: all sources; stellar; galaxy; compact galaxy; high, medium and low fX/fopt. Labelled classes: active dM stars, BLAGN, NELGs, possible hi-z quasar, F/G stars?, normal galaxies?]

Mike Watson, Leicester University

Page 14:

Integrating Datacubes

Find a mapping from one domain to the other

Registration of DPOSS and Hubble Deep Field
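As a toy illustration of finding a mapping between two datacubes, the sketch below recovers a pure integer translation between two images by locating the peak of their circular cross-correlation. Real registration between sky surveys also has to handle rotation, scale and distortion, none of which is covered here; the function name `estimate_shift` and the random test image are assumptions made for the example.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Estimate the integer (row, col) shift that maps reference onto moved."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    # Circular cross-correlation computed through the Fourier transform.
    corr = np.fft.ifft2(f_mov * np.conj(f_ref)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices past the midpoint correspond to negative shifts.
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, corr.shape))

rng = np.random.default_rng(2)
image = rng.normal(size=(64, 64))
shifted = np.roll(image, shift=(5, -3), axis=(0, 1))
print(estimate_shift(image, shifted))
```

Applied to two epochs of the same scene, the recovered shift is the displacement between them.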

Page 15:

Datacube Registration

Movement of ice inferred from registration

Page 16:

Integrating Event Sets

• Database Join

• Fuzzy Join
– e.g. astronomical crossmatch

• Distributed Join
– does the Grid do databases?
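A fuzzy join differs from an ordinary database join in that keys match when they are close rather than equal. For an astronomical crossmatch the key is sky position, so a minimal sketch pairs each source with its nearest neighbour in the other catalog if the angular separation is within a tolerance. The function name `crossmatch`, the brute-force search, the 2 arcsecond radius and the positions are all assumptions for illustration; a production crossmatch would use a spatial index.

```python
import numpy as np

def crossmatch(cat_a, cat_b, radius_arcsec):
    """Pair each cat_a source with its nearest cat_b source within the radius.

    Catalogs are sequences of (ra, dec) in degrees. Brute force: O(len(a) * len(b)).
    """
    a = np.radians(np.asarray(cat_a, dtype=float))
    b = np.radians(np.asarray(cat_b, dtype=float))
    ra_a, dec_a = a[:, 0][:, None], a[:, 1][:, None]
    ra_b, dec_b = b[:, 0][None, :], b[:, 1][None, :]
    # Haversine formula for angular separation on the sphere.
    h = (np.sin((dec_a - dec_b) / 2) ** 2
         + np.cos(dec_a) * np.cos(dec_b) * np.sin((ra_a - ra_b) / 2) ** 2)
    sep = 2 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))
    nearest = sep.argmin(axis=1)
    ok = sep[np.arange(len(a)), nearest] <= np.radians(radius_arcsec / 3600.0)
    matched = [(int(i), int(nearest[i])) for i in np.flatnonzero(ok)]
    unmatched = [int(i) for i in np.flatnonzero(~ok)]
    return matched, unmatched

# Made-up positions, for illustration only.
catalog_a = [(10.0000, 20.0000), (10.5000, 20.5000), (11.0000, 21.0000)]
catalog_b = [(11.0002, 21.0001), (10.0001, 19.9999), (50.0, -5.0)]
matched, unmatched = crossmatch(catalog_a, catalog_b, radius_arcsec=2.0)
print(matched, unmatched)
```

The unmatched list is as useful as the matched one: sources seen in one survey but not the other are often the interesting ones.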

Page 17:

Integration of Star Catalogs

Roy Williams

2MASS versus DPOSS cross-identification with
– j_m as 2MASS magnitude and
– I_mtotn as DPOSS magnitude

2MASS: j_m <= 15

DPOSS: I_mtotn <= 18

DPOSS unmatched

2MASS matched

DPOSS matched

2MASS unmatched

Cross Matching

Page 18:

Visualizing Event Sets

Unsupervised clustering

50000 stars in color-color space
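The slide does not name the clustering algorithm behind this plot. As a familiar stand-in, the sketch below runs plain k-means (Lloyd's algorithm) on synthetic two-dimensional points that play the role of a colour-colour plane. The function name `kmeans`, the choice of k = 2, and the synthetic blobs are assumptions for illustration, not a reconstruction of the original analysis.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centres, labels)."""
    x = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if it has none).
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres, labels

rng = np.random.default_rng(3)
# Two synthetic populations in a made-up colour-colour plane.
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(200, 2))
blob_b = rng.normal(loc=(3.0, 3.0), scale=0.3, size=(200, 2))
centres, labels = kmeans(np.vstack([blob_a, blob_b]), k=2)
centres = centres[np.argsort(centres[:, 0])]
print(centres)
```

No labels are supplied, so the grouping comes from the data alone; that is what separates this from the supervised classification earlier in the talk.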

Page 19:

A Grid of Services

Human gets Data

Network of Services

Understood by human

Further processing after format change

Grid of pipes and engines

Switches and actuators

data flow

Page 20:

Example Grid of Services

[Diagram of services: User's code, Storage Service, DPOSS Service, Catalog Service, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator]

flexible complex metadata AND broadband binary

Page 21:

Computing Challenges

• High-dimensional
– Clustering & Classification
– Visualization
– Outlier Detection

• Visualization of 10^10 points

• Database access to 10^10 points

• Large Distributed Join

Page 22:

Standards needed

• Bundling diverse objects together with code and references

• Referencing data resources on the Grid: local, remote, replicated, ....

Page 23:

Problem Solving Environment

[Diagram of services: User's code, Storage Service, DPOSS Service, Catalog Service, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator]

• Plumbing (big data) and electrical (control, metadata)

• Web service and workflow

• Finding service classes/implementations by semantics

• GUI / Executive / IO adapters / Algorithms

