
Classification

Date post: 05-Jan-2016
Upload: kalkin
Classification Ashish Mahabal aam at astro.caltech.edu iPTF Summer School Caltech 2014-08-25
Transcript
Page 1: Classification

Classification

Ashish Mahabal
aam at astro.caltech.edu

iPTF Summer School
Caltech

2014-08-25

Page 2: Classification

Need for classification

• Astro datasets getting larger (TB -> PB -> …)
• SDSS/CRTS/PTF/…/LSST/SKA/LIGO
• Transient science (multi-epoch surveys)
• Spectroscopy is a bottleneck
• Early characterization and classification is a must
• Separating ordinary and known from unknown and interesting
• Given the data volumes, it should be automated

Page 3: Classification

Semantic Tree of Astronomical Variables and Transients

[Figure labels: AGN Subtypes; SN Subtypes]

To understand transients, the variables need to be understood too.

Page 4: Classification

[Figure labels: Computer Science; Mathematics and Statistics; Domain Specific Knowledge; Machine Learning; Data Science]

[Figure annotations: efficient algorithms and optimization; galaxy proximity, Galactic latitude etc.; abstractions and summaries]

Page 5: Classification

Automated Classification Techniques

• Implementation of clustering algorithms in a machine-learning (ML) or AI setting
  – Examples: star-galaxy separation, automated galaxy morphology classification, stellar or galaxy spectral types, etc.
• Supervised classifiers: a set of learning examples is provided; the number of possible classes is known
  – Examples: SVMs, Decision Trees, …
• Unsupervised classifiers: the program decides how many classes are needed to account for the diversity of the data, and classifies on the basis of the data

Page 6: Classification

Variety of available tools

• Python
  – PyML
  – scikit-learn
• R
  – http://cran.r-project.org/web/views/MachineLearning.html
• Matlab

Page 7: Classification

From Python’s scikit-learn

Page 8: Classification

Transient classification

• Characteristic properties
  – proximity to a galaxy
  – Galactic latitude
  – proximity to a radio source
• Lightcurve based quantities
  – amplitude
  – skew
  – Stetson J

Quantify these; make “priors” out of them
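The three light-curve quantities named on this slide can be sketched in a few lines. This is a minimal illustration, not the code used in the talk: amplitude is taken as half the peak-to-peak range, skew as the population skewness, and Stetson J is computed in its single-observation form with unit weights (P_i = delta_i^2 - 1). All three choices are assumptions, and the function names and sample values are made up for the example.

```python
import math

def amplitude(mags):
    """Half the peak-to-peak range of the magnitudes (assumed definition)."""
    return 0.5 * (max(mags) - min(mags))

def skew(mags):
    """Population skewness: third central moment over sigma cubed."""
    n = len(mags)
    mean = sum(mags) / n
    m2 = sum((m - mean) ** 2 for m in mags) / n
    m3 = sum((m - mean) ** 3 for m in mags) / n
    return m3 / m2 ** 1.5

def stetson_j(mags, errs):
    """Stetson J with single-observation terms and unit weights (assumption)."""
    n = len(mags)
    weights = [1.0 / e ** 2 for e in errs]
    # Error-weighted mean magnitude.
    wmean = sum(w * m for w, m in zip(weights, mags)) / sum(weights)
    scale = math.sqrt(n / (n - 1))
    total = 0.0
    for m, e in zip(mags, errs):
        delta = scale * (m - wmean) / e      # normalised residual
        p = delta ** 2 - 1.0                 # single-observation product term
        total += math.copysign(math.sqrt(abs(p)), p)
    return total / n

# Made-up light curves: one strongly variable, one constant.
variable = [15.0, 16.0, 14.0, 16.0, 14.0]
constant = [15.0, 15.0, 15.0, 15.0]
print(amplitude(variable), stetson_j(variable, [0.05] * 5))
print(stetson_j(constant, [0.1] * 4))
```

A source whose scatter greatly exceeds its photometric errors gives a large positive J, while a constant source gives a negative value, which is what makes J usable as a variability index.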

Page 9: Classification

Simple(r) classification problem

Stars and galaxies

Page 10: Classification

Enter clustering

• Determine the number of classes
  – Stars
  – Galaxies

Page 11: Classification

Possible complications

• Star - galaxy
• Galaxy - galaxy (E, S0, S, Ir)
• Quasar - star
• Dwarfs - main sequence

Page 12: Classification

Enter clustering

• Determine the number of classes
• Understand their properties
  – Extendedness
  – Light concentration

Page 13: Classification

Measure parameters that are handles for these properties
  – Pixels occupied
  – Ratio of flux in two apertures

Arun Kumar

Page 14: Classification

Enter clustering

• Determine the number of classes
• Understand their properties
• Measure parameters that are handles for these properties
• Plot the parameters
• “Separate” the clusters
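The recipe on this slide (pick the number of classes, measure parameters, separate the clusters) can be illustrated with a plain k-means loop. This is a sketch under stated assumptions: k-means is one possible clustering method, not necessarily the one used in the talk, and the two features (log of pixels occupied, ratio of flux in two apertures) carry invented values chosen only to make two clumps.

```python
import math

def kmeans(points, centers, n_iter=20):
    """Plain Lloyd iteration; `centers` are the initial guesses."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for each point.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: move each centre to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster where it was
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical (log10 pixels occupied, aperture flux ratio) measurements.
points = [(1.0, 0.85), (1.1, 0.90), (0.9, 0.88),   # compact, concentrated
          (1.8, 0.45), (2.0, 0.40), (1.7, 0.50)]   # extended, diffuse
labels, centers = kmeans(points, [points[0], points[-1]])
print(labels)
print(centers)
```

The number of clusters is fixed by the initial centres here. Features on very different scales should be rescaled first, otherwise the largest one dominates the distance.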

Page 15: Classification

• Classification is an integral part of astronomy
• Clustering is the means to separate the classes (in an unsupervised manner)

Page 16: Classification

Simple classification problem
Complications: just stars and galaxies?

• Stars
• Galaxies
• CCD defects
• Cosmic rays
• Bleed trails
• Satellite trails
• Asteroids!

e.g. real-bogus or CRTS’ NN for artifact removal

Page 17: Classification

Complications

• How many classes are there?
• Are they cleanly separated?
  – Brighter stars
  – Distant galaxies
  – Grazing cosmic rays

Page 18: Classification

Complications

• How many classes are there?
• Are they cleanly separated?
• Do all objects belong to these classes?

Djorgovski

Page 19: Classification

Complications

• How many classes are there?
• Are they cleanly separated?
• Do all objects belong to these classes?
• Could we add observables to classify better and find rarer objects?
  – Another waveband?
  – A third one?
  – More epochs?

Page 20: Classification

Typical Parameter Space for S/G Classification

[Figure labels: Stellar locus; Galaxies]

(From DPOSS)

Page 21: Classification

Automated Star-Galaxy Classification: Decision Trees (DTs)

(Weir et al. 1995)
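A decision tree is built from repeated single-feature cuts, so the smallest useful illustration is one such cut chosen by Gini impurity. The sketch below is not the tree of Weir et al.; the feature (an aperture flux ratio), its values and the function names are assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best single cut on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut possible between identical values
        threshold = 0.5 * (pairs[i][0] + pairs[i - 1][0])
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical flux-ratio measurements with known classes (training set).
flux_ratio = [0.85, 0.90, 0.88, 0.45, 0.40, 0.50]
classes = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]
threshold, impurity = best_split(flux_ratio, classes)
print(threshold, impurity)
```

A full tree applies the same search recursively to each side of the cut and across all available features, stopping when the leaves are pure enough.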

Page 22: Classification

An Example: Classification of DPOSS Sources with AutoClass (an unsupervised Bayesian classifier)

Class 1: stellar (PSF)
Class 2: star with a fuzz
Class 3: early-type galaxy
Class 4: late-type galaxy

Page 23: Classification

• Classification is an integral part of astronomy
• Clustering is the means to separate the classes
• Outliers are the interesting rarer objects which do not belong to the main classes
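One simple way to act on the last point is to flag objects that sit unusually far from every cluster centre. The sketch below uses a median-plus-MAD threshold on the distance to the nearest centre; that threshold rule, the value of `n_mad`, the centres and the data points are all assumptions for illustration, not a method taken from the talk.

```python
import math
import statistics

def flag_outliers(points, centers, n_mad=3.0):
    """Return indices of points unusually far from every cluster centre."""
    nearest = [min(math.dist(p, c) for c in centers) for p in points]
    med = statistics.median(nearest)
    mad = statistics.median([abs(d - med) for d in nearest])
    # Robust cut: distances more than n_mad MADs above the median distance.
    return [i for i, d in enumerate(nearest) if d > med + n_mad * mad]

# Hypothetical cluster centres and measurements in a two-feature space.
centers = [(1.0, 0.87), (1.83, 0.45)]
points = [(1.0, 0.85), (1.1, 0.90), (0.9, 0.88),
          (1.8, 0.45), (2.0, 0.40), (1.7, 0.50),
          (1.4, 0.10)]                      # belongs to neither clump
outliers = flag_outliers(points, centers)
print(outliers)
```

Flagged objects are candidates for follow-up, not confirmed rarities: artifacts such as cosmic rays also land far from the main classes.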

Page 24: Classification

Semantic Tree of Astronomical Variables and Transients

[Figure labels: AGN Subtypes; SN Subtypes]

To understand transients, the variables need to be understood too.

Page 25: Classification

Richards+11

Debosscher+07

Richards+11

Page 26: Classification

Broad, incomplete hierarchy

[Figure node labels: All transients; SN; SN I; SN II; CV, blazars, periodic; CV, blazars; periodic; CV; blazars]

To other classifiers

Page 27: Classification

Light-curve features

• Measure features (metrics) for all light curves

[Figure labels: beyond1std; skew; Amplitude]

Adam Miller

Page 28: Classification


amplitude and std-dev for six classes of variables from CRTS

Page 29: Classification


Separation is better understood when shown as density

Page 30: Classification

beyond1std
skew
Amplitude
freq_signif
freq_varrat
freq_y_offset
freq_model_max_delta_mag
freq_model_min_delta_mag
freq_model_phi1_phi2
freq_rrd
freq_n_alias
flux_%_mid20
flux_%_mid35
flux_%_mid50
flux_%_mid65
flux_%_mid80
linear_trend
max_slope
MAD
median_buffer_range_percentage
pair_slope_trend
percent_amplitude
percent_difference_flux_percentile
QSO
non_QSO
std
small_kurtosis
stetson_j
stetson_k
scatter_res_raw
p2p_scatter_2praw
p2p_scatter_over_mad
p2p_scatter_pfold_over_mad
medperc90_p2_p
fold_2p_slope_10%
fold_2p_slope_90%
p2p_ssqr_diff_over_var

Many features - not all are independent

Adam Miller
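A few of the listed features are simple enough to sketch directly. The definitions below follow the commonly quoted ones (fraction of points beyond one standard deviation, median absolute deviation, largest slope between consecutive points, and the mid-20% flux percentile range over the 5-95% range), but the unweighted mean, the interpolation rule for percentiles and the function names are assumptions of this sketch.

```python
import statistics

def beyond1std(values):
    """Fraction of points more than one standard deviation from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mean) > sd for v in values) / len(values)

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def max_slope(times, values):
    """Largest absolute slope between consecutive observations.

    Assumes `times` is sorted and has no repeated entries.
    """
    return max(abs((v1 - v0) / (t1 - t0))
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))

def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100]."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def flux_percentile_ratio(fluxes, mid=20):
    """Range of the central `mid` percent of fluxes over the 5-95% range."""
    half = mid / 2.0
    num = percentile(fluxes, 50 + half) - percentile(fluxes, 50 - half)
    return num / (percentile(fluxes, 95) - percentile(fluxes, 5))

# Made-up fluxes, only to exercise the functions.
fluxes = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(flux_percentile_ratio(fluxes, mid=20))
print(mad([1, 2, 3, 4, 100]), max_slope([0, 1, 3], [10, 12, 11]))
```

Several of these measure nearly the same thing (std, MAD and amplitude all track scatter), which is one concrete sense in which the features are not independent.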

Page 31: Classification

A variety of parameters that can be used

• Discovery: magnitudes, delta-magnitudes
• Contextual:
  – Distance to nearest star
  – Magnitude of the star
  – Color of that star
  – Normalized distance to nearest galaxy
  – Distance to nearest radio source
  – Flux of nearest radio source
  – Galactic latitude
• Follow-up
  – Colors (g-r, r-i, i-z etc.)
• Prior classifications (event type)
• Characteristics from light-curve
  – Amplitude
  – Median buffer range percentage
  – Standard deviation
  – Stetson K
  – Flux percentile ratio mid80
  – Prior outburst statistic

Not all parameters are always present

http://ki-media.blogspot.com/

Bayesian networks are well suited to such datasets, as they can deal with missing data and their structure can be learnt from the data, at least in principle.
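The missing-data point can be illustrated with the simplest Bayesian network, a Gaussian naive Bayes model in which the class is the only parent of every feature. When a feature is absent it is marginalised out, which here just means skipping its term. This is a toy sketch, not the network from the talk: the feature names, values and class labels are invented, and it assumes every feature has been observed at least once in every class.

```python
import math
from collections import defaultdict

def fit(rows, labels):
    """rows: list of dicts mapping feature name -> value, or None if missing."""
    grouped = defaultdict(lambda: defaultdict(list))
    counts = defaultdict(int)
    for row, lab in zip(rows, labels):
        counts[lab] += 1
        for name, val in row.items():
            if val is not None:
                grouped[lab][name].append(val)
    model = {}
    for lab, feats in grouped.items():
        stats = {}
        for name, vals in feats.items():
            mean = sum(vals) / len(vals)
            # Small floor keeps the variance positive for tiny samples.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-6
            stats[name] = (mean, var)
        model[lab] = (math.log(counts[lab] / len(labels)), stats)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior for one object."""
    best_label, best_score = None, -math.inf
    for lab, (log_prior, stats) in model.items():
        score = log_prior
        for name, val in row.items():
            if val is None or name not in stats:
                continue  # missing feature: contributes nothing to the score
            mean, var = stats[name]
            score += -0.5 * math.log(2 * math.pi * var) - (val - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = lab, score
    return best_label

# Hypothetical training set with gaps in both features.
rows = [{"amplitude": 2.0, "gal_dist": 0.5}, {"amplitude": 2.4, "gal_dist": None},
        {"amplitude": 1.8, "gal_dist": 0.8}, {"amplitude": 0.5, "gal_dist": 5.0},
        {"amplitude": 0.7, "gal_dist": 4.0}, {"amplitude": None, "gal_dist": 6.0}]
labels = ["SN", "SN", "SN", "non-SN", "non-SN", "non-SN"]
model = fit(rows, labels)
print(predict(model, {"amplitude": 2.1, "gal_dist": None}))
print(predict(model, {"amplitude": None, "gal_dist": 5.5}))
```

A general Bayesian network relaxes the independence assumption by adding edges between features, and learning those edges from data is the structure learning the slide refers to.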

Page 32: Classification

Relative significance of parameters

Linear trend: sign(linear trend) × log(|linear trend| + 1e−06)
              sign(linear trend) × √|linear trend|

med_buf_range_per: −log(1 − med_buf_range_per)

Kurtosis: log(3 + kurtosis)

Parameters from Richards et al.
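The transformations on this slide translate directly into code. In this sketch the logarithm is taken to be the natural log and the function names are made up; both are assumptions, since the slide does not specify them.

```python
import math

def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def linear_trend_log(x, eps=1e-6):
    """sign(x) * log(|x| + eps); eps avoids log(0) for a flat light curve."""
    return sign(x) * math.log(abs(x) + eps)

def linear_trend_sqrt(x):
    """sign(x) * sqrt(|x|), the second form shown on the slide."""
    return sign(x) * math.sqrt(abs(x))

def med_buf_range_per_transform(x):
    """-log(1 - x); defined for x < 1."""
    return -math.log(1.0 - x)

def kurtosis_transform(k):
    """log(3 + k); defined for k > -3."""
    return math.log(3.0 + k)

# Made-up feature values.
print(linear_trend_log(-0.002), linear_trend_sqrt(-4.0))
print(med_buf_range_per_transform(0.3), kurtosis_transform(1.5))
```

Transformations of this kind compress long tails so that a few extreme values do not dominate a feature's range.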

Page 33: Classification

Bits we will leave out

• Periodicity
  – Kepler – dense light-curves
  – irregular and sparse light-curves (most surveys)
  – best phasing, characteristic time-scales etc.
• GPR
  – interpolation
  – regular grid

Page 34: Classification

Non-SNe (1) SNe (2)

[Figure labels, in order: 1, 2, 2, 2, 1, 1]

Using 900 non-SNe and 600 SNe

80-90% completeness using just these parameters
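Completeness is the fraction of true members of a class that the classifier recovers; its usual companion is contamination, the fraction of objects assigned to the class that do not belong to it. The sketch below uses the slide's coding (1 for non-SNe, 2 for SNe) with ten invented labels; the numbers are illustrative and are not the results quoted above.

```python
def completeness(true_labels, predicted, target):
    """Fraction of true `target` objects that were recovered."""
    actual = [p for t, p in zip(true_labels, predicted) if t == target]
    return sum(p == target for p in actual) / len(actual)

def contamination(true_labels, predicted, target):
    """Fraction of objects labelled `target` that are really something else."""
    claimed = [t for t, p in zip(true_labels, predicted) if p == target]
    return sum(t != target for t in claimed) / len(claimed)

# Invented labels: 1 = non-SN, 2 = SN.
truth     = [2, 2, 2, 2, 2, 1, 1, 1, 1, 1]
predicted = [2, 2, 2, 2, 1, 1, 1, 1, 2, 1]
print(completeness(truth, predicted, target=2))
print(contamination(truth, predicted, target=2))
```

Quoting both numbers matters because a classifier can reach high completeness simply by labelling everything as the target class.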

Page 35: Classification

Bigger Bayesian Network picture

Page 36: Classification

Various methods

• Support Vector Machines (SVM)
• Random Forests
• Decision Trees
• (Deep learning for images)

Page 37: Classification

Citizen science classifications as a path to machine learning

• Most data are never seen by scientists
• Pattern matching techniques are not mature enough (and may never be as mature as humans, but large data makes a difference)
  - Hanny’s Voorwerp is an excellent example
• Citizen Sky is a path to better understanding (not an end)

