An Interoperable Framework for Mining and Analysis of Space Science Data (F-MASS)

Date post: 22-Jan-2016
Page 1: An Interoperable Framework for Mining and Analysis of Space Science Data (F-MASS)

PI: Sara J. Graves
Project Lead: Rahul Ramachandran

Information Technology and Systems Center
University of Alabama in Huntsville

[email protected] [email protected]

http://www.itsc.uah.edu

Page 2: Others Involved in the Project

Wladislaw Lyatsky and Arjun Tan (Co-PI)
  Department of Physics, Alabama A&M University

Glynn Germany
  Center for Space Plasma, Aeronomy, and Astrophysics Research, University of Alabama in Huntsville

Xiang Li, Matt He, John Rushing and Amy Lin
  ITSC, University of Alabama in Huntsville

Page 3: Project Objectives

Extend the existing scientific data mining framework by providing additional data mining algorithms and customized user interfaces appropriate for the space science research domain

Provide a framework for mining to allow better data exploitation and use

Utilize specific space science research scenarios as use case drivers for identifying additional techniques to be incorporated into the framework

Enable scientific discovery and analysis

Page 4: Presentation Outline

Overview of the “New” Mining Framework

Case Study: Comparing Different Thresholding Algorithms for Segmenting Auroras

Page 5: ADaM 4.0

Algorithm Development and Mining

Page 6: Mining System: Design Objectives

Ease of Use!

Reusable Components

Simple Internal Data Model

Allow both loose and tight coupling with other applications/systems

Flexible to allow ease of use in both batch and interactive mode

Page 7: Mining System Design

[Diagram: a virtual repository of operations (data mining toolkit and image processing toolkit). The operations can be provided as web services, used as stand-alone executables, or used to build generic and customized applications.]

Each component is provided with a C++ application programming interface (API) and an executable in support of scripting tools (e.g. Perl, Python, Tcl, Shell)

ADaM components are lightweight and autonomous, and have been used successfully in a grid environment

ADaM has several translation components that provide data level interoperability with other mining systems (such as WEKA and Orange), and point tools (such as libSVM and svmLight)

ADaM also includes Python wrappers

ADaM toolkit is available to all

Page 8: ADaM 4.0 Components

Page 9: ADaM Example: Classification Process

Identify potential features which may characterize the phenomenon of interest

Generate a set of training instances where each instance consists of a set of feature values and the corresponding class label

Describe the instances using ARFF file format

Preprocess the data as necessary (normalize, sample etc.)

Split the data into training / test set(s) as appropriate

Train the classifier using the training set

Evaluate classifier performance using test set

Page 10: Sample Data Set – ARFF Format
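
The sample data set on this slide was shown as an image and is not part of the transcript. As a rough illustration of the ARFF layout only, the sketch below writes a tiny file body with a header (`@relation`, `@attribute`) and a `@data` section. The relation name, attribute names, and values are invented for illustration and are not taken from the slide.

```python
# Illustrative only: relation name, attribute names, and values are invented.
def to_arff(relation, attributes, rows):
    """Render rows as ARFF text.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of nominal values (used here for the class).
    """
    lines = ["@relation " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append("@attribute %s %s" % (name, kind))
        else:
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


attributes = [
    ("feature_a", "numeric"),
    ("feature_b", "numeric"),
    ("class", ["benign", "malignant"]),
]
rows = [(5, 1, "benign"), (8, 10, "malignant")]
arff_text = to_arff("example", attributes, rows)
print(arff_text)
```

Each data row lists one value per declared attribute, in declaration order, with the class label last in this example.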

Page 11: Utilities for Splitting the Samples

ADaM has utilities for splitting data sets into disjoint groups for training and testing classifiers

The simplest is ITSC_Sample, which splits the source data set into two disjoint subsets

Example: split data set into two groups, one with 2/3 of the patterns and another with 1/3 of the patterns:

ITSC_Sample -c class -i bcw.arff -o trn.arff -t tst.arff -p 0.66

The -i argument specifies the input file name

The -o and -t arguments specify the names of the two output files (-o = output one, -t = output two)

The -p argument specifies the portion of data that goes into output one (trn.arff); the remainder goes to output two (tst.arff)

The -c argument tells the sample program which attribute is the class attribute

Page 12: Training the Classifier

ADaM has several different types of classifiers

Each classifier has a training method and an application method

Example: Naïve Bayes classifier

ITSC_NaiveBayesTrain -c class -i trn.arff -b bayes.txt

The -i argument specifies the input file name

The -c argument specifies the name of the class attribute

The -b argument specifies the name of the classifier file

Page 13: Applying the Classifier

Once trained, the Naïve Bayes classifier can be used to classify unknown instances

For this demo, the classifier is run as follows:

ITSC_NaiveBayesApply -c class -i tst.arff -b bayes.txt -o res_tst.arff

The -i argument specifies the input file name

The -c argument specifies the name of the class attribute

The -b argument specifies the name of the classifier file

The -o argument specifies the name of the result file

Page 14: Evaluating Classifier Performance

By applying the classifier to a test set where the correct class is known in advance, it is possible to compare the expected output to the actual output.

ITSC_Accuracy is run as follows:

ITSC_Accuracy -c class -t res_tst.arff -v tst.arff -o acc_tst.txt
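
The comparison of expected and actual output can be illustrated with a few lines of Python. This is only a sketch of the idea, not ITSC_Accuracy itself, whose output format is not shown in the slides. It assumes the class labels have already been read out of the two ARFF files into plain lists, and the label values are made up.

```python
from collections import Counter


def accuracy_report(expected, actual):
    """Return (accuracy, confusion) for two equal-length label sequences."""
    if len(expected) != len(actual):
        raise ValueError("label sequences must have the same length")
    if not expected:
        raise ValueError("need at least one instance")
    # confusion[(truth, predicted)] counts how often each pairing occurs
    confusion = Counter(zip(expected, actual))
    correct = sum(n for (truth, pred), n in confusion.items() if truth == pred)
    return correct / len(expected), confusion


# Hypothetical labels standing in for tst.arff (expected) and res_tst.arff (actual)
expected = ["a", "a", "b", "b"]
actual = ["a", "b", "b", "b"]
acc, confusion = accuracy_report(expected, actual)
print(acc, dict(confusion))
```

The confusion counts show which classes are mistaken for which, which a single accuracy figure hides.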

Page 15: Python Script for Classification
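
The script on this slide was shown as an image and is not part of the transcript. The sketch below is therefore not the authors' script. It only shows one way the four command lines from pages 11 to 14 might be chained from Python, assuming the ADaM executables are on the PATH. It uses `subprocess` rather than ADaM's Python wrappers, because the wrapper API is not shown in the slides. The function names are hypothetical.

```python
import subprocess


def build_commands(data="bcw.arff", class_attr="class", portion="0.66"):
    """Return the four command lines from the preceding slides, in order."""
    return [
        ["ITSC_Sample", "-c", class_attr, "-i", data,
         "-o", "trn.arff", "-t", "tst.arff", "-p", portion],
        ["ITSC_NaiveBayesTrain", "-c", class_attr, "-i", "trn.arff",
         "-b", "bayes.txt"],
        ["ITSC_NaiveBayesApply", "-c", class_attr, "-i", "tst.arff",
         "-b", "bayes.txt", "-o", "res_tst.arff"],
        ["ITSC_Accuracy", "-c", class_attr, "-t", "res_tst.arff",
         "-v", "tst.arff", "-o", "acc_tst.txt"],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in turn; check=True stops at the first failure."""
    for cmd in commands:
        runner(cmd, check=True)


commands = build_commands()
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run_pipeline(commands)` would execute the steps in order: split, train, apply, evaluate. The `runner` parameter exists so the chain can be exercised without the executables installed.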

Page 16: Additional Information

http://datamining.itsc.uah.edu/adam/

Additional information

Documentation

Download ADaM 4.0 executables (Windows and Linux)

Page 17: Comparing Different Thresholding Algorithms for Segmenting Auroras

Case study using 130 images from UVI observations on September 14, 1997, covering the time period from 8:30 UT to 11:27 UT

Page 18: Segmenting Auroral Events

Spacecraft UV images observing auroral events contain two regions, an auroral oval and the background

Under ideal circumstances, the histogram of these images has two distinct modes and a threshold value can be determined to separate the two regions

Different factors such as the date, time of day, and satellite position all affect the luminosity gradient of the UV image, making the two regions overlap and thereby making threshold selection a nontrivial problem

Objective of this study: compare different thresholding techniques and algorithms for segmenting auroral events in Polar UV images

Page 19: Thresholding

Global Thresholding
  uses a fixed threshold for all pixels in the image
  works only if the intensity histogram of the input image contains neatly separated peaks corresponding to the desired subject(s) and background(s)
  cannot deal with images containing, for example, a strong illumination gradient

Adaptive/Local Thresholding
  selects an individual threshold for each pixel based on the range of intensity values in its local neighbourhood
  allows for thresholding of an image whose global intensity histogram doesn't contain distinctive peaks
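
The difference between the two approaches can be seen on a toy image with an illumination gradient. The sketch below is a minimal illustration, not the study's implementation. The local rule used here, the midpoint of the minimum and maximum intensity in a small window, is one assumed reading of "based on the range of intensity values", and the pixel values are invented.

```python
def global_threshold(image, t):
    """Label a pixel 1 if its intensity exceeds the single threshold t."""
    return [[1 if v > t else 0 for v in row] for row in image]


def local_threshold(image, radius=1):
    """Threshold each pixel at the midpoint of the intensity range of its
    (2 * radius + 1)-square neighbourhood, clipped at the image border."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(h, r + radius + 1))
                      for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            t = (min(window) + max(window)) / 2
            out_row.append(1 if image[r][c] > t else 0)
        out.append(out_row)
    return out


# One-row toy image: background ramps from 0 to 80, with bright spots
# (background + 50) at columns 1, 4 and 7.
image = [[0, 60, 20, 30, 90, 50, 60, 120, 80]]
global_labels = global_threshold(image, 55)
local_labels = local_threshold(image, radius=1)
print(global_labels)
print(local_labels)
```

No single global threshold separates the spots here, because the dimmest spot (60) is darker than the brightest background pixel (80). The local rule recovers the three spots because each pixel is compared only with its neighbours.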

Page 20: Methodology Used in this Study

Test global thresholding technique using the following algorithms
  Mixture Modeling
  Fuzzy Set
  Entropy

Develop adaptive thresholding technique using context information and the following algorithms
  Modified Mixture Modeling
  Fuzzy Set
  Entropy
  Edge Based Detection

Test and evaluate

Page 21: Mixture Modeling

The mixture modeling thresholding algorithm assumes that the object and the background are distributed normally. The threshold is calculated by fitting two Gaussian distributions to the histogram so as to minimize the error.

The Gaussian distributions are described by:

$$G(x) = P\, e^{-\left((x-\mu)^2 / 2\sigma^2\right)}$$

$$N = P \sum_{i=0}^{v} e^{-\left((i-\mu)^2 / 2\sigma^2\right)}$$

The least square minimization function is given by

$$R = \sum_{i=0}^{255} \left( G_1(i) + G_2(i) - F(i) \right)^2$$
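
As a small illustration of the least square criterion, the sketch below evaluates the residual R, the sum over gray levels 0 to 255 of the squared difference between the two Gaussians and the histogram F, for a given pair of Gaussians. The search over parameters that minimizes R is left out. The slide does not say how the threshold is derived from the fitted curves, so taking the gray level where the two Gaussians cross is an assumption made here, and all parameter values are invented.

```python
import math


def gaussian(x, P, mu, sigma):
    """G(x) = P * exp(-(x - mu)^2 / (2 * sigma^2))."""
    return P * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def residual(hist, g1, g2):
    """R = sum over gray levels i of (G1(i) + G2(i) - F(i))^2."""
    return sum((gaussian(i, *g1) + gaussian(i, *g2) - f) ** 2
               for i, f in enumerate(hist))


def crossing_threshold(g1, g2):
    """First gray level between the two means where G2 is at least G1.
    Assumes g1 has the lower mean. The crossing rule is an assumption."""
    lo, hi = int(round(g1[1])), int(round(g2[1]))
    for i in range(lo, hi + 1):
        if gaussian(i, *g2) >= gaussian(i, *g1):
            return i
    return hi


# Made-up (P, mu, sigma) triples for background and object
background = (100.0, 50.0, 10.0)
aurora = (40.0, 150.0, 20.0)
# Synthetic 256-bin histogram built from the same two Gaussians
hist = [gaussian(i, *background) + gaussian(i, *aurora) for i in range(256)]

exact_fit = residual(hist, background, aurora)
shifted_fit = residual(hist, (100.0, 60.0, 10.0), aurora)
threshold = crossing_threshold(background, aurora)
print(exact_fit, shifted_fit, threshold)
```

The residual is zero for the parameters that generated the histogram and grows when a mean is shifted, which is the quantity a fitting routine would drive down.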

Page 22: Fuzzy Sets

The fuzzy sets algorithm uses a membership function to define a fuzzy object region

In classical set theory, a set has no elements in common with its complement.

In a fuzzy set, each element may belong to a set and its complement with certain probabilities.

Yager (1979) defined a measure of fuzziness for the degree to which a set and its complement are indistinguishable. This is given by the following expression

$$D_p(t) = \left[ \sum_i \left| \mu_x(i) - \mu_{\bar{x}}(i) \right|^p \right]^{1/p}$$

The gray level value that minimizes the fuzziness is the threshold value for the image.

Page 23: Entropy

Entropy is a measure commonly used in information theory and characterizes the impurity of a data set. The entropy of the object and the background for a given threshold t can be calculated by using:

$$H_0 = -\sum_{j=0}^{t} \frac{p_j}{\sum_{i=0}^{t} p_i} \log \frac{p_j}{\sum_{i=0}^{t} p_i}$$

and

$$H_w = -\sum_{j=t+1}^{255} \frac{p_j}{\sum_{i=t+1}^{255} p_i} \log \frac{p_j}{\sum_{i=t+1}^{255} p_i}$$

where p_j is the histogram value at gray level j

Thus, finding the threshold for the image now becomes an optimization problem.

The gray level value that maximizes the sum of H0 and Hw is used as the threshold.
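
The entropy criterion can be sketched directly from its definition: for each candidate threshold t, normalize the histogram counts on each side of t, compute the entropy of each side, and keep the t with the largest sum. The code below is a minimal illustration under that reading, not the ADaM implementation. Skipping thresholds that leave one side empty and breaking ties in favour of the lowest t are choices made here, and the eight-bin histogram is invented.

```python
import math


def side_entropy(counts):
    """Entropy of one side of the histogram after normalizing its counts.
    Returns None when the side holds no pixels."""
    total = sum(counts)
    if total == 0:
        return None
    h = 0.0
    for p in counts:
        if p > 0:
            q = p / total
            h -= q * math.log(q)
    return h


def entropy_threshold(hist):
    """Return the gray level t that maximizes H0(t) + Hw(t)."""
    best_t, best_h = None, float("-inf")
    for t in range(len(hist) - 1):
        h0 = side_entropy(hist[:t + 1])    # gray levels 0..t
        hw = side_entropy(hist[t + 1:])    # gray levels t+1..max
        if h0 is None or hw is None:
            continue
        if h0 + hw > best_h:
            best_t, best_h = t, h0 + hw
    return best_t


# Toy histogram with two groups of gray levels separated by an empty gap
hist = [5, 5, 5, 0, 0, 5, 5, 5]
t = entropy_threshold(hist)
print(t)
```

For this toy histogram the maximum is reached when the threshold falls in the gap between the two groups, so that each side contains exactly one group.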

Page 24: Edge Based Detection

The edge based method identifies the aurora region by detecting the transition zone between aurora and background

This is an adaptive technique that uses context information to subdivide the image into subzones

Using the domain knowledge that the auroras are centered on the magnetic pole, radial slices at a certain Magnetic Local Time (MLT), starting from the magnetic pole, are used to divide the image into subzones

The rate of change of the intensity along the magnetic latitude is calculated using the following formula

$$\nabla I_{MLT}(Lat) = I(Lat + \Delta Lat,\, MLT) - I(Lat,\, MLT)$$

The transition zones show a sharp change in intensity between the aurora and background

Therefore, by detecting the locations of the maximum and minimum gradient, the poleward and equatorward boundaries of the auroral oval are identified for this MLT
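
The boundary search along one slice can be sketched as a finite difference followed by a search for its extremes. The code below is an illustration under stated assumptions, not the study's implementation. It assumes the intensities along one MLT slice have already been sampled into a list ordered from the magnetic pole outward, so that the sharpest rise marks the poleward boundary and the sharpest fall marks the equatorward boundary. With the opposite ordering the two roles swap. The profile values are invented.

```python
def oval_boundaries(profile):
    """Return (poleward, equatorward) sample indices for one MLT slice.

    profile holds intensities ordered from the magnetic pole outward.
    The gradient at index k is profile[k + 1] - profile[k], a finite
    difference between neighbouring latitude samples.
    """
    if len(profile) < 2:
        raise ValueError("need at least two samples")
    grad = [profile[k + 1] - profile[k] for k in range(len(profile) - 1)]
    poleward = max(range(len(grad)), key=grad.__getitem__)      # sharpest rise
    equatorward = min(range(len(grad)), key=grad.__getitem__)   # sharpest fall
    return poleward, equatorward


# Invented slice: dark polar cap, bright oval, dark background
profile = [5, 6, 5, 40, 80, 82, 79, 30, 6, 5]
boundaries = oval_boundaries(profile)
print(boundaries)
```

Repeating this for each MLT slice would give one pair of boundary points per slice, which together outline the oval.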

Page 25: Global Thresholding Result: Sept 14, 1997 image, 08:41:53 UTC

[Figure panels: Original Image; Image Histogram; Mixture Modeling (64); Entropy (122); Fuzzy Sets (132)]

Page 26: Global Thresholding Result: Sept 14, 1997 image, 10:31:40 UTC

[Figure panels: Original Image; Image Histogram; Mixture Modeling (43); Entropy (130); Fuzzy Sets (138)]

Page 27: Adaptive Thresholding Process

[Figure panels, MLT(18): Mixture Modeling (91); Fuzzy Sets (124); Entropy (117)]

[Figure panels, MLT(20): Mixture Modeling (52); Fuzzy Sets (123); Entropy (162)]

Page 28: Adaptive Thresholding Process: Edge Based Detection

[Figure panels: MLT(18); MLT(20)]

Page 29: Adaptive Thresholding Results: Sept 14, 1997 image, 09:05:48 UTC

[Figure panels: A. Original Image; B. Mixture Modeling; C. Entropy; D. Fuzzy Sets; E. Gradient]

Page 30: Adaptive Thresholding Results: Sept 14, 1997 image, 08:31:27 UTC

[Figure panels: A. Original Image; B. Mixture Modeling; C. Entropy; D. Fuzzy Sets; E. Gradient]

Page 31: Analysis

Global Thresholding Techniques:
  As expected, global thresholding techniques do not perform well in aurora detection
  Fuzzy sets and entropy based thresholding algorithms in general tend to overestimate the threshold and are unable to detect the entire aurora
  The mixture modeling algorithm, on the other hand, underestimates the threshold and may label some portion of the background as aurora

Adaptive Thresholding Techniques:
  Adaptive thresholding techniques show better results for the given set of images as compared to global thresholding
  The fuzzy set, entropy, and edge based detection algorithms do not perform as well as mixture modeling

Page 32: Future Work

Investigate the behavior of these thresholding algorithms for oval boundary detection in the presence of day glow and other types of noise

Investigate the use of the Chow and Kaneko (1972) technique for adaptive thresholding
  The current adaptive thresholding scheme divides an image into subzones based on MLT.
  Disadvantages:
    It requires auxiliary MLT information.
    The number of pixels in each subzone is not uniform, and at times an improper threshold is obtained
  The Chow and Kaneko method partitions the image into non-overlapping blocks of equal area, and the threshold for each block is computed independently
  The thresholds for the blocks are then interpolated over the entire image

The best thresholding algorithms will be incorporated into ADaM 4.0
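
The block-and-interpolate idea described for the Chow and Kaneko method can be sketched as follows. This is an illustration only: the per-block rule used here (the midpoint of the block's intensity range) is a placeholder assumption, since the slide does not say how each block's threshold is chosen. Placing each block threshold at the block centre and interpolating bilinearly between centres is likewise an assumed choice. The image values are invented.

```python
def midpoint(tile):
    """Placeholder per-block rule: midpoint of the block's intensity range."""
    return (min(tile) + max(tile)) / 2.0


def block_thresholds(image, block, threshold_fn=midpoint):
    """One threshold per non-overlapping block x block tile."""
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    return [[threshold_fn([image[r][c]
                           for r in range(r0, r0 + block)
                           for c in range(c0, c0 + block)])
             for c0 in range(0, w, block)]
            for r0 in range(0, h, block)]


def _locate(pos, block, n):
    """Map a pixel coordinate to (lower index, upper index, fraction)
    between block centres, clamped at the outermost centres."""
    f = (pos + 0.5) / block - 0.5
    f = min(max(f, 0.0), n - 1.0)
    i0 = min(int(f), max(n - 2, 0))
    i1 = min(i0 + 1, n - 1)
    return i0, i1, f - i0


def threshold_surface(grid, block, h, w):
    """Bilinearly interpolate the block thresholds to every pixel."""
    ny, nx = len(grid), len(grid[0])
    surface = []
    for r in range(h):
        y0, y1, fy = _locate(r, block, ny)
        row = []
        for c in range(w):
            x0, x1, fx = _locate(c, block, nx)
            top = (1 - fx) * grid[y0][x0] + fx * grid[y0][x1]
            bottom = (1 - fx) * grid[y1][x0] + fx * grid[y1][x1]
            row.append((1 - fy) * top + fy * bottom)
        surface.append(row)
    return surface


# Invented 4 x 8 image: a dim left block and a bright right block
image = [[0, 10, 0, 10, 100, 110, 100, 110] for _ in range(4)]
grid = block_thresholds(image, 4)
surface = threshold_surface(grid, 4, 4, 8)
print(grid)
print(surface[0])
```

Each pixel is then compared with its own entry in `surface`, so the threshold varies smoothly across block borders instead of jumping from one block to the next.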