Data-Driven Modeling using Spherical Self-Organizing Feature Maps

by

Archana P. Sangole

ISBN: 1-58112-319-1

DISSERTATION.COM

Boca Raton, Florida USA • 2006

Copyright © 2003 Archana P. Sangole All rights reserved.

Data-Driven Modeling using Spherical

Self-Organizing Feature Maps

By

Archana P. Sangole

The Department of Mechanical and Materials Engineering

Graduate Program

in

Engineering

Submitted in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

Faculty of Graduate Studies

The University of Western Ontario

London, Ontario

June, 2003

© Archana P. Sangole 2003

Dedication

This work is dedicated to my father Mr. Prabhakar R. Sangole and my mother Mrs. Shobha P.

Sangole who have stood by me at all times and believed in me.

Do not go where the path may lead,

go instead where there is no path and leave a trail.

- Ralph Waldo Emerson

ABSTRACT

Researchers and data analysts are increasingly relying on graphical tools to assist them in

modeling their data, generating their hypotheses, and gaining deeper insights into their

experimentally acquired data. Recent advances in technology have made available

improved and novel modeling and analysis media that facilitate intuitive, task-driven exploratory

analysis and manipulation of the displayed graphical representations. In order to utilize these

emerging technologies, researchers must be able to transform experimentally acquired data

vectors into a visual form or secondary representation that has a simple structure and is easily

transferable to the media. It is also essential that the representation can be modified or manipulated within

the display environment.

This thesis presents a data-driven modeling technique that utilizes the basic learning strategy

of an unsupervised clustering algorithm, called the self-organizing feature map, to adaptively

learn topological associations inherent in the data and preserve them within the topology imposed

by its predefined spherical lattice, thereby transforming the data into a 3D tessellated form. The

tessellated graphical forms originate from a sphere, thereby simplifying the process of computing

their transformation parameters on re-orientation within an interactive, task-driven, graphical

display medium. A variety of data sets, including six sets of scattered 3D coordinate data, chaotic

attractor data, the more commonly used Fisher’s Iris flower data, medical numeric data, and

geographic and environmental data, are used to illustrate the data-driven modeling and

visualization mechanism.

The modeling algorithm is first applied to scattered 3D coordinate data to understand the

influence of the spherical topology on data organization. Two cases are examined, one in which

the integrity of the spherical lattice is maintained during learning, and a second in which the

inter-node connections in the spherical lattice are adaptively changed during learning. In the

analysis, scattered coordinate data of freeform objects with topology equivalent to a sphere and

those whose topology is not equivalent to a sphere are used. Experiments demonstrate that it is

possible to get reasonably good results with the degree of resemblance, determined by an average

of the total normalized error measure, ranging from 6.2 × 10⁻⁵ to 1.1 × 10⁻³. The experimental

analysis using scattered coordinate data facilitates an understanding of the algorithm and provides

evidence of the topology-preserving capability of the spherical self-organizing feature map.

The algorithm is later applied to abstract, seemingly random, numeric data. Unlike

in the case of 3D coordinate data, wherein the SOFM lattice is in the same coordinate frame

(domain) as the input vectors, the numeric data is abstract. The criterion for deforming the

spherical lattice is determined using mathematical and statistical functions as measures-of-

information that are tailored to reflect some aspect of meaningful, tangible, inter-vector

relationships or associations embedded in the spatial data that reveal some physical aspect of the

data. These measures are largely application-dependent and need to be defined by the data analyst

or an expert. Interpretation of the resulting 3D tessellated graphical representation or form (glyph)

is more complex and task-dependent than that of scattered coordinate data. Very

simple measures are used in this analysis in order to facilitate discussion of the underlying

mechanism to transform abstract numeric data into 3D graphical forms or glyphs. Several data

sets are used in the analysis to illustrate how novel characteristics hidden in the data, and not

easily apparent in the string of numbers, can be reflected via 3D graphical forms.

The proposed data-driven modeling approach provides a viable mechanism to generate 3D

tessellated representations of data that can be easily transferred to a graphical modeling and

analysis medium for interactive and intuitive exploration.

KEYWORDS: Deformable shape modeling, spherical map, data-driven models, self-organizing

feature map, scientific data visualization, exploratory data analysis, and numeric data

transformation.

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to all those who helped to make this research

possible by being a part of this journey and by just being there for me.

First and foremost, heartfelt thanks are due to my supervisor, Prof. George K. Knopf, for his

guidance and encouragement throughout the gruelling experience. His constant enthusiasm in this

research added a touch of hope and inspiration to surge forward and to strive for excellence. He

shared with me his passion for research and teaching, and presented me with challenges and

opportunities that helped me learn more about myself. He has been a source of inspiration in all

respects and I am truly thankful to him.

I had the opportunity to interact with several faculty members and graduate students who provided

support in various ways; sometimes helping me brainstorm on ideas, by listening to me no matter

how busy or tired they were, by occasionally sharing a coffee with me and, by just having a

friendly conversation to brighten my day. Their presence made a big difference. My sincere

thanks to Prof. Roger Khayat and his students KyuTae Kim, Zhenyu Li and Radoslav German for

their assistance and for making me feel as if I were part of the team. I am very thankful to Prof.

Cynthia Dunning Zwicker for her constant encouragement during the last few months and for just

being there for me when I was almost ready to give up. Thanks are also due to Mr. Marian

Jaworski, a senior Technician in the Department, and to other students and friends who gave

their support.

My sincere gratitude to the department graduate secretaries Mrs. Joan Tangen and Ms. Chris

Seres, and the faculty ITS computing staff for their constant support and for accommodating me

at all times. It was a pleasure working with my former colleagues Jonathan, Alireza, and Basem,

as well as my colleagues Dennis, WeiWei and Philip. Special thanks are due to Dennis and his

family for seeing me through frustrating and difficult times.

I gratefully acknowledge the financial assistance that I received in the form of scholarships from

the Ontario Ministry of Education and Training and the Faculty of Graduate Studies. It made the

trip a little easier to endure. This work was also partially supported by a research grant from the

Natural Sciences and Engineering Research Council of Canada (NSERC).

There are not enough words to express my appreciation and thankfulness for the unreserved

encouragement, inspiration and support from my parents, Mr. and Mrs. P.R. Sangole, and my

Uncle and Aunt, Mr. and Mrs. D.J. Patel. The endless resource of moral support, patience and

understanding that I received from them deserves a special acknowledgement. They made the

journey worthwhile by being there for me at all times and I am truly grateful to be blessed with

such a wonderful family.

On a more official note, I would like to thank the Distributed Active Archive Centre (DAAC) at

the Goddard Space Flight Centre (GSFC), Greenbelt, MD, for the multi-spectral satellite data set,

Forrest Hall and Blanche Meeson, the principal investigators in the ISLSCP Initiative-II Project,

for the snow cover environmental data set, Prof. Hervé Delingette (INRIA), Prof. Jim Bezdek

(University of West Florida) and Mr. Robert Cushman (Director, CDIAC) for their helpful

comments and encouraging replies to my e-mails, the Stanford 3D Scanning Repository for the

bunny data set, the UCI Repository of Machine Learning for the Wisconsin breast cancer data and

Mr. Markus Wawryniuk (Computer Science Institute, Konstanz, Germany) for directing me to the

UCI Repository. The assistance of Mr. Peter Smith (DAAC-GSFC), Mr. Basem Yousef, who

helped in acquiring the sculptured and human head range data, and Mr. Tesfaye Breta, a 2nd year

Engineering student, who modeled for the human head range data, is gratefully acknowledged.

TABLE OF CONTENTS

CERTIFICATE OF EXAMINATION ………………………………………………. ii

ABSTRACT …………………………………………………………………………. iii

ACKNOWLEDGEMENTS ………………………………………………………… v

TABLE OF CONTENTS ……………………………………………………………. vii

LIST OF TABLES ………………………………………………………………….. xi

LIST OF FIGURES …………………………………………………………………. xii

CHAPTER 1 INTRODUCTION …………………………………………………. 1

1.1 The Problem ……………………………………………………………. 1

1.2 Basic Terminology …….…………………………………………….…. 1

1.3 General Concept of the Modeling Mechanism ..………………………. 2

1.4 Overview of the Thesis …………………………………………...…… 3

CHAPTER 2 THE SPHERICAL SELF-ORGANIZING FEATURE MAP ……. 5

2.1 Introduction ……………………………………………………………. 5

2.2 Unsupervised Topology Mapping Methods ……….…………………… 5

2.2.1 Sammon’s mapping method …………………………………. 5

2.2.2 Multi-dimensional scaling method ………………………….. 6

2.3 Self-Organizing Feature Map (SOFM) ………………………………… 7

2.3.1 Learning strategy in a one-dimensional SOFM ……………… 9

2.3.2 Topology of the SOFM lattice ………………………………. 11

2.3.3 Weight adaptation strategy in the spherical SOFM ………….. 13

2.3.4 The neighbourhood operator (NEi,j,k*) ……………………….. 16

2.3.5 The learning rate (α) ………………………………………… 16

2.3.6 The learning rules …………………………………………… 17

2.3.7 Computation time …………………………………………… 20

2.4 Significance of the Spherical Topology ………………………………… 20

2.5 Merits of the Self-Organizing Feature Map ……………………………. 23

2.6 Concluding Remarks …………………………………………………… 24

CHAPTER 3 DATA-DRIVEN MODELING OF SCATTERED

COORDINATE DATA ……………………………………………. 25

3.1 Introduction ……………………………………………………………. 25

3.2 Freeform Shape Approximation Methods ……………………………. 26

3.2.1 Interpolation methods ……………………………………….. 26

3.2.2 Methods based on splines ……………………………………. 27

3.2.3 Signed distance function and radial basis function methods ….. 27

3.2.4 Triangulation methods ………………………………………. 28

3.2.5 Adaptive methods …………………………………………… 28

3.2.6 Deformable models ………………………………………….. 29

3.3 Generation of Tessellated Representations for Freeform Shapes ………. 31

3.3.1 Maintaining topological associations during learning ……….. 31

3.3.2 Updating topological associations during learning ………….. 32

3.4 Quality of the Tessellated Representation ……………………………… 36

3.5 Merits of the Data-Driven Modeling Mechanism for Scattered

Coordinate Data ……………………………………………………….. 37

3.6 Potential Areas of Applicability ………………………………………… 38


CHAPTER 4 EXPERIMENTS USING SCATTERED COORDINATE DATA … 40

4.1 Introduction ……………………………………………………………. 40

4.2 Experiments using Coordinate Data …………………………………… 40

4.2.1 Synthetic data of a cube ………………………….…………. 40

4.2.2 Scattered data of a sculptured head …………………………. 41

4.2.3 Cloud of points of the Stanford bunny ………………………. 45

4.3 Quality of the Tessellated Form ……………………………………….. 49


CHAPTER 5 DATA-DRIVEN MODELING FOR SCIENTIFIC DATA

VISUALIZATION …………………………………………………. 53

5.1 Introduction ……………………………………………………………. 53

5.2 Scientific Data Visualization Methods …………………………………. 54

5.2.1 Geometric projection methods …………………………….. 56

5.2.2 Glyph-based visualization methods ………………………….. 56

5.2.3 Exploratory visualization methods …………………………… 59

5.2.4 Combined glyph-based and exploratory visualization

methods ……………………………………………………… 60

5.2.5 Visualization in the spherical domain ……………………….. 60

5.3 Generation of 3D Tessellated Graphical Forms for Numeric Data …….. 62

5.3.1 Mapping high-dimensional data into the tessellated

SOFM lattice …………………………………………………. 62

5.3.2 Mechanism to deform the spherical lattice ………………….. 64

5.4 Merits of the Data-Driven Modeling Mechanism for

Scientific Data Visualization ………………………………………….. 67

5.5 Metrics for an Effective Visualization Mechanism ……………………. 68


CHAPTER 6 EXPERIMENTS FOR SCIENTIFIC DATA VISUALIZATION … 71

6.1 Introduction ……………………………………………………………. 71

6.2 Chaotic Attractor Data …………………………..…………………….. 71

6.2.1 The Lozi attractor function ………….……………………..… 72

6.2.2 The Hénon attractor function ……………………………….. 72

6.2.3 The Rössler attractor function .………..……………………. 75

6.2.4 The Lorenz attractor function ………………………………. 75

6.3 Simulated High-Dimensional Data ………..…………………………… 79

6.4 Fisher’s Iris Flower Data ………..…………………………………….. 82

6.5 Wisconsin Breast Cancer Data …..…………………………………….. 85

6.6 Multi-spectral Satellite Image Data ………..………………………….. 88

6.7 Snow-coverage Data ………..………………………………………….. 92

6.8 Discussion: Effectiveness of a Visualization Mechanism .…………… 98


CHAPTER 7 CONCLUSIONS …………………………………………………… 101

7.1 Review of the Data-Driven Modeling Mechanism …………………….. 101

7.2 Novel Features of the Method …………………………………………. 101

7.2.1 Scattered coordinate data …………………………………… 101

7.2.2 Scientific data visualization …..…………………………….. 103

7.3 Recommendations to Resolve Limitations …………………………….. 104

7.4 Future Improvement of the Data-Driven Modeling Mechanism ……….. 105

7.5 Final Remark ………………………………………………….……….. 107

BIBLIOGRAPHY …………………………………………………………………… 108

APPENDIX A: Tessellated Forms of Scattered Coordinate Data …………………. 117

A.1 Introduction ……………………………………………………………. 117

A.1.1 Scattered data of a human head ……………………………… 117

A.1.2 Synthetic coordinate data of an open ring …………………… 118

A.1.3 Synthetic coordinate data of a torus …………………………. 120

A.2 Observations from the Experimental Analysis ………………………… 121

APPENDIX B: Scientific Data Visualization Tables ……………………………… 122

APPENDIX C: Rigid and Non-Rigid Shape Transformation ……………………….. 125

C.1 Introduction ……………………………………………………………. 125

C.1.1 Rigid shape transformation: Registration …………………….. 125

C.1.2 Freeform shape registration example ………………………… 129

C.1.3 Non-rigid shape transformation: Morphing ………………… 131

C.2 Summary ……………………………………………………………… 134

CURRICULUM VITAE ……………………………………………………………… 135

LIST OF TABLES

Table 4.1 Sum-of-squared errors between the tessellated forms and the respective

coordinate points. The corresponding figures are given in Appendix A ……. 50

Table 6.1 Visualization metrics for the different examples of abstract numeric

data ………………………………………………………………………… 98

Table B.1 Initial conditions and system parameters for the Lozi attractor function ……. 122

Table B.2 Initial conditions and system parameters for the Hénon attractor

function ……………………………………………………………………. 122

Table B.3 Initial conditions and system parameters for the Rössler attractor

function ……………………………………………………………………. 123

Table B.4 Initial conditions and system parameters for the Lorenz attractor

function …………………………………………………………………….. 123

Table B.5 Numeric test data set consisting of 6-dimensional feature vectors …………. 124

LIST OF FIGURES

Figure 2.1 The feature map (Φ), its relationship with the input vector (xp) in the

N-dimensional space, the cluster unit or node located at (i,j,k) in the

discrete SOFM output space (ϖ) and the weight vector (wi,j,k)

associated with it (Haykin, 1999) ……………………………….…………. 8

Figure 2.2 Schematic representation of the learning strategy in a 1D SOFM.

Weight vectors connecting just one sample vector to the SOFM cluster

units have been shown ……………………………………………………… 10

Figure 2.3 Data association in a one-, two- and three-dimensional SOFM …………..… 11

Figure 2.4 Spherical SOFM with quadrilateral elements ………………………………. 12

Figure 2.5 Data association in a spherical SOFM ……………………………………… 13

Figure 2.6 The spherical self-organizing feature map (SOFM) ……………………… 14

Figure 2.7 Characteristic curves displaying the rate of change in the weights

during the learning operation ………………………………………………. 19

Figure 2.8 Cloud of points of the test object and the resulting two-dimensional

self-organizing feature map at the end of the learning operation …………… 21

Figure 2.9 The folding effect in a two-dimensional self-organizing feature map

(SOFM) due to the wrap-around condition in the neighbourhood

configuration along the boundaries of the map ……………………..……. 22

Figure 2.10 The undesirable folding effect when a 2D self-organizing feature map

(SOFM) is implemented using the wrap-around condition in the

neighbourhood configuration along the boundaries of the map. Part of

the open seam is traced onto the figure for visual enhancement ……………. 22

Figure 2.11 The resulting tessellated form of the test object after implementing the

spherical self-organizing feature map ……………………………………… 23

Figure 3.1 Schematic representation of surface reconstruction using the spherical

map. Weight vectors connecting just one coordinate point to a few

spherical SOFM cluster units have been shown …………………………… 31

Figure 3.2 Schematic representation of surface reconstruction for a torus, wherein

the map stretches across the hole due to the predefined connections in

the neural architecture of the spherical map ………………………………… 33

Figure 3.3 Reassignment of neighbours within the neighbourhood of a winning

node located at (i,j,k)*. The links to only two nodes are shown …………… 34

Figure 4.1 Synthetic data of a cube and initialization of the spherical map ……...……. 40

Figure 4.2 Evolution of the tessellated form as it adaptively learns the topological

associations in the cloud of points of the cube ……………………………. 41

Figure 4.3 Range image and scattered coordinate points of the sculptured head ………. 42

Figure 4.4 Initialization of the spherical map around the cloud of points …………….. 42

Figure 4.5 A sequence of the map as the tessellated form of the sculptured head

evolves during various stages of the learning process ……………………… 43

Figure 4.6 The final tessellated representation of the sculptured head along with

the scattered coordinate points superimposed on it. It displays an

interesting effect where the spherical SOFM tries to approximate the

geometry of the freeform object in regions of scarce data. ………………… 44

Figure 4.7 Cloud of points of the Stanford bunny ……………………………………... 45

Figure 4.8 Initialization of the spherical SOFM where it engulfs the cloud of

points ………………………………………………………………………... 46

Figure 4.9 Weight vectors and reconstructed surface representation of the bunny

without updating the topological connections in the spherical map ………… 47

Figure 4.10 Evolution of the tessellated form of the bunny as the topology of the

spherical lattice is updated during the learning process ……………………. 47

Figure 4.11 Weight vectors and the facetted surface model of the Stanford bunny at

the end of the learning process ……………………………………………... 48

Figure 4.12 Coordinate points, weight vectors and the facetted surface model that

comprise the base of the bunny ………………………………..……………. 48

Figure 4.13 The tessellated form generated by the spherical SOFM approximates the

topology of the cube coordinate data set …………………………………… 49

Figure 4.14 Nodes at the base of the sculptured head tessellated shape

representation that contribute to high $SSE_{error}$ values ……………………… 51

Figure 5.1 Two-dimensional glyphs for visualizing an N-dimensional data vector

(a) Chernoff’s facial caricature and (b) Star plot ……………………..……. 57

Figure 5.2 Implicit surface generation to transform an N-dimensional vector into a

3D graphical form or glyph (Rohrer et al., 1999) …………………………… 58

Figure 5.3 Schematic representation of surface reconstruction using the spherical

deformable map. Weight vectors connecting just one coordinate point

to a few spherical SOFM cluster units have been shown …………………. 62

Figure 5.4 Deforming and colour-coding the spherical SOFM lattice to reflect

variation along the three dimensions of the input space. The

mechanism is illustrated using the analogy of five nodes uniformly

arranged on a circle ……………….….. 66

Figure 6.1 Poincaré maps and corresponding 3D tessellated graphical

representations for different initial conditions of the Lozi attractor

function. In the tessellated forms, information about similarity

between cluster centres is coded as both shape deformation and colour

for enhanced visualization …………………………………………………. 73


Figure 6.2 … representations for different initial conditions of the Hénon attractor …

Figure 6.3 … representations for different initial conditions of the Rössler attractor …

Figure 6.4 … representations for different initial conditions of the Lorenz attractor …

Figure 6.5 Comparative analysis of the 3D glyphs for the Lorenz attractor

function under two sets of initial conditions in an interactive

visualization environment …………………………………………………. 78

Figure 6.6 The different views of the colour-coded 3D graphical representation

for the 6-dimensional numeric data with the four classes identified

(Sangole and Knopf, 2003) …………………………………………………. 80

Figure 6.7 The scatter plot and three views of the 3D tessellated graphical

representation of the 6D numeric data along input dimension D3

(Sangole and Knopf, 2003) …………………………………………………. 81

Figure 6.8 The colour-coded graphical representations for each input dimension

of the 6D data set (Sangole and Knopf, 2003) ……………………………… 82

Figure 6.9 The three species of the Iris flower: Iris setosa, Iris versicolor and Iris

virginica ……………………………………………………………………. 83

Figure 6.10 The scatter plot of the Iris data along the sepal length and three views

of the corresponding graphical representations (Sangole and Knopf,

2003) ………………………………………………………………………… 84

Figure 6.11 Three-dimensional graphical representations (glyphs) of the Iris flower

for the four dimensional measurements of the data (Sangole and

Knopf, 2003) ………………………………………………………………. 85

Figure 6.12 Various views of the three-dimensional graphical representation of the

nine-dimensional Wisconsin cancer data. The location of one input

vector for each type of tissue sample, benign and malignant, are

identified on the colourized glyph ………………………………………….. 87

Figure 6.13 A misclassified benign tissue sample is identified in the three-

dimensional glyph ………………………………………………………….. 88

Figure 6.14 Three-dimensional glyph and the corresponding pattern projection onto

geographic space for multi-channel spectral data. The metric $M^{3}_{i,j,k}$

reflects combined vegetation and temperature data for North America ……. 90

Figure 6.15 Three-dimensional glyphs and the corresponding pattern projections

onto geographic space for individual channels of spectral data. The

metric $(M^{4}_{i,j,k})_n$ reflects each channel of the multi-spectral data …………… 91

Figure 6.16 The different views of the 3D glyph representing annual snow

coverage during the year of 1991 and its projection onto the Northern

Hemisphere geographic space (Sangole and Knopf, 2002) ………………… 94

Figure 6.17 Enlarged views of the regions highlighted in Figure 6.16. These

regions primarily fall within valleys, thereby indicating that very little

variation exists amongst the input vectors that have been assigned to

the cluster units (Sangole and Knopf, 2002) ………………………………. 95

Figure 6.18 The graphical representations of the annual snow cover patterns during

1987 and 1988 with their respective projections onto the Northern

Hemisphere geographic space (Sangole and Knopf, 2002) ………………… 96

Figure 6.19 Graphical forms of the annual snow cover patterns for the years 1988

and from 1991-1995, with their respective projections onto the

geographic space. The region on the graphical form that has cluster

units onto which feature vectors of Alaska have been mapped is

highlighted in each representation (Sangole and Knopf, 2002) ……………. 97

Figure 7.1 Triangulation and its dual simplex mesh ……………………………………. 106

Figure A.1 Scattered coordinate data of a human head and initialization of the

spherical map ………………………………………………………………. 117

Figure A.2 Evolution of the tessellated form as it adaptively learns the topological

associations in the cloud of points of the human head …………………….. 118

Figure A.3 Synthetic coordinate data of an open ring and initialization of the

spherical map ……………………………………………………………….. 119


Figure A.4 Evolution of the tessellated form as it adaptively learns the topological

associations in the cloud of points of the open ring ………………………… 119

Figure A.5 Scattered coordinate data of the torus and initialization of the spherical

map …………………………………………………………………………. 120


Figure A.6 Evolution of the tessellated form as it adaptively learns the topological

associations in the cloud of points of the torus …………………………….. 120

Figure C.1 Illustration of shape registration based on using tessellated

representations of the freeform shapes (Knopf and Sangole, 2002a) ………. 127

Figure C.2 The original orientation of the tessellated representations for the

reference sculptured head $S_r^{sculpt}$ and the object to be matched $S_t^{sculpt}$ ……. 130

Figure C.3 Aligning the transformed with the reference tessellated representation

so that their centres-of-mass coincide ……………………………………… 130

Figure C.4 Transformation of the sculptured head at various stages of the

registration process along with corresponding registration error values …… 131

Figure C.5 Non-rigid transformation from Shape-1 to Shape-2 (Knopf and

Sangole, 2002b) ……………………………………………………………. 133

Figure C.6 Sequence of shape transformations involving morphing a cube into a

disc and back into the cube (Knopf and Sangole, 2002b) ………………….. 134

CHAPTER 1 INTRODUCTION

1.1 The Problem

Graphical methods are essential tools that assist data analysts and product designers in evaluating

their data and deriving meaningful inferences, thereby gaining a deeper understanding of the

physical phenomenon characterizing their data. Emerging technologies have made available more

sophisticated graphical display tools. For example, recent advances in research have

demonstrated that immersive virtual reality (IVR) can provide a far richer visualization and

interactive modeling and analysis medium (Van Dam et al., 2000; Nielson et al., 1997). These

sophisticated, interactive, task-driven display and analysis media employ the full range of human

sensorimotor capability and help provide insight into huge volumes of experimentally

acquired data. Generating graphical models whose structure is simple to manipulate and that

facilitate the replication of real-world interactive techniques is an issue that must be addressed in order

to make the most effective use of this technology (Brooks, 1999). The data-driven modeling

approach discussed in this thesis presents a viable mechanism to generate a variety of 3D

graphical forms, or glyphs, of data that enable intuitive and task-driven interaction with

maximum flexibility and versatility. Several data sets are used to demonstrate the effectiveness of

the proposed methodology including scattered 3D coordinate data and a variety of abstract

numeric data sets. The following section briefly discusses the terms used and the context

within which they apply before introducing the general data-driven modeling concept.

1.2 Basic Terminology

Data-driven, in this context, implies a process that tries to extract novel characteristics of the

underlying physical system or phenomenon from strings of numeric data with few or no a priori

assumptions about the organization of the data (Solomatine, 2002). Structure refers to the

configuration of the lower-dimensional mapping space that tries to simplify the representation of

complex patterns inherent in the data. Its purpose is to enhance and display novel features or

attributes of the data, or the data itself, in a manner that emphasizes other underlying patterns that

are not easily apparent in the string of numbers, thereby providing a better understanding of the

physical phenomenon. It therefore forms a basis of connecting or relating the input space

variables that describe the system’s behaviour and which, by themselves, provide limited

knowledge about the details of the phenomenon or the system under study. Topological

associations apply to neighbourhood relationships between the data vectors in the input space.

These associations are likewise reflected in the mapping space by establishing connectivity

between groups or clusters of similar data vectors. It is therefore essential that the map be

configured in a manner that reflects these associations. Throughout the thesis, ‘map’ signifies the

structure within the mapping space and is intuitive in the visual sense. It is hence different from

the mathematical perspective where it is used as a synonym for transformation. The term

“abstract” is used in conjunction with numeric data in order to distinguish between the two

categories of data: spatial (x, y, z) 3D coordinate data of freeform objects, and numeric data for

scientific visualization that consist of a string of numbers indicating different attributes of the

physical phenomenon being observed.

1.3 General Concept of the Modeling Mechanism

The research work presented in this thesis discusses a data-driven modeling mechanism to

generate 3D tessellated graphical representations of data. Its core algorithm utilizes the

unsupervised clustering algorithm of the self-organizing feature map (SOFM) to adaptively learn

associations hidden in the data. The topology-preservation capability of the SOFM causes groups

of similar data vectors to get assigned to identifiable neighbourhoods in its predefined lattice. The

closed structure of the lattice is utilized to create a 3D graphical form that represents the data

within a simple structure with known inter-node connectivity, making it well suited to

intuitive manipulation in an interactive display medium and consequently facilitating an

exploratory, task-driven analysis.
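
The learning idea described above (find the cluster unit closest to an input vector, then pull that unit and its lattice neighbours toward the input) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the learning rule developed later in the thesis: the six-node octahedral lattice, the names `NODES`, `NEIGHBOURS`, `update` and `train`, the fixed `neighbour_gain` and the linear decay of the learning rate are all hypothetical choices made for brevity.

```python
import random

# Assumed toy lattice: the six vertices of an octahedron on the unit sphere.
NODES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# Each vertex is linked to every vertex except itself and its antipode (index i ^ 1).
NEIGHBOURS = {i: [j for j in range(6) if j not in (i, i ^ 1)] for i in range(6)}


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def update(weights, x, alpha, neighbour_gain):
    """One learning step: move the winning node and its neighbours toward x."""
    win = min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))
    # The winner moves by alpha; its lattice neighbours by alpha * neighbour_gain.
    for idx, gain in [(win, 1.0)] + [(j, neighbour_gain) for j in NEIGHBOURS[win]]:
        weights[idx] = [w + alpha * gain * (xi - w) for w, xi in zip(weights[idx], x)]
    return win


def train(data, epochs=20, alpha0=0.5, neighbour_gain=0.5, seed=0):
    """Present the data repeatedly with a linearly decaying learning rate (assumed schedule)."""
    rng = random.Random(seed)
    weights = [list(map(float, n)) for n in NODES]
    for epoch in range(epochs):
        alpha = alpha0 * (1.0 - epoch / epochs)
        for x in rng.sample(data, len(data)):  # random presentation order
            update(weights, x, alpha, neighbour_gain)
    return weights
```

Because the connectivity in `NEIGHBOURS` never changes, nodes that are adjacent on the lattice are repeatedly drawn toward similar inputs, which is the sense in which similar data vectors end up in identifiable neighbourhoods.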

A majority of the research work involving the self-organizing feature map,

primarily pattern classification, exploits the unsupervised clustering capability of the SOFM. In

contrast, the research presented in this thesis emphasizes the topology of the predefined

lattice and explains how it influences the SOFM learning parameters. The predefined lattice of

the SOFM in this work is a tessellated unit sphere where each node represents a cluster unit.

Although clustering is the primary function of the SOFM algorithm, the discussion is directed

towards highlighting the merits of its predefined spherical lattice and topology for modeling and

visualization. It is assumed that the same principles of clustering apply and are as valid as in the

case of a 1D or a 2D SOFM lattice (Kohonen, 1997).
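
One way to picture a tessellated unit sphere whose nodes act as cluster units is sketched below. The construction is an assumption chosen for illustration (recursive subdivision of an octahedron into triangles); the thesis may tessellate the sphere differently, and the function name `tessellated_sphere` and its return values are hypothetical.

```python
import math


def normalise(p):
    """Project a 3D point onto the unit sphere."""
    r = math.sqrt(sum(c * c for c in p))
    return tuple(c / r for c in p)


def tessellated_sphere(levels=1):
    """Return (nodes, faces, neighbours) for a subdivided octahedron."""
    verts = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
             (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
    faces = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
             (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
    for _ in range(levels):
        midpoint = {}  # edge (a, b) -> index of the node created at its midpoint

        def mid(a, b):
            key = (min(a, b), max(a, b))
            if key not in midpoint:
                m = tuple((verts[a][k] + verts[b][k]) / 2 for k in range(3))
                midpoint[key] = len(verts)
                verts.append(normalise(m))
            return midpoint[key]

        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
            # Split each triangle into four smaller triangles.
            new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
        faces = new_faces
    # Inter-node connectivity: two nodes are neighbours if they share a face edge.
    neighbours = {i: set() for i in range(len(verts))}
    for a, b, c in faces:
        neighbours[a] |= {b, c}
        neighbours[b] |= {a, c}
        neighbours[c] |= {a, b}
    return verts, faces, neighbours
```

The `neighbours` dictionary plays the role of the predefined lattice: it fixes which cluster units are adjacent before any data are presented, and because the surface is closed, every node has a complete ring of neighbours.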

Since there is very limited published literature on the spherical self-organizing feature map,

the thesis first explains the implementation of the data-driven modeling algorithm using scattered

3D coordinate data. This exercise demonstrates the topology preserving nature of the spherical

self-organizing feature map by generating tessellated representations of freeform objects given

their cloud of scattered points. Every node in the tessellated representation is therefore a

coordinate point in 3D space. It is later applied to seemingly “random” multi-dimensional data to

demonstrate how abstract information embedded in the data set is given a graphical form. Unlike

in the previous case, every node in the tessellated representation is now an N-dimensional vector.

In order to create a graphical representation, associations or inter-relationships between the

numeric vectors are extracted using metrics or mathematical formulations that best quantify and

relate to some aspect of the physical phenomenon. The metric may relate the N-dimensional

vectors of neighbouring nodes (cluster units) and consequently establish a relation between

groups of similar data vectors or, it may relate a group of input vectors assigned to a particular

cluster unit. In this manner, every node has one or more scalar values reflecting some aspect of

the physical phenomenon. The spherical lattice is then deformed or colourized in proportion to

these values thereby transforming it into a visual dimension of the display space. The resulting

colour-coded tessellated form is thus a glyph or a graphical representation reflecting patterns

hidden in the data. The mechanism provides researchers with an automatic or semi-automatic

method to quickly enhance and observe novel characteristics of their data by means of graphical

representations, and assists in gaining deeper insight into the physical phenomenon characterized by

their numeric experimental data. Unlike in the case of scattered coordinate data, interpretation of

the graphical representation is complex and largely application dependent.
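
As an illustration only, the following Python sketch shows one way the step of assigning a scalar to every node and deforming the lattice in proportion to it could be carried out. Everything specific in it is an assumption rather than the method of this thesis: the metric (mean Euclidean distance between a node's weight vector and those of its lattice neighbours), the min-max normalisation, the `gain` parameter, the function names, and the octahedron used as a very coarse stand-in for the tessellated unit sphere.

```python
import math

def neighbour_distance_metric(weights, neighbours):
    """Assumed metric: mean Euclidean distance between each node's
    N-dimensional weight vector and those of its lattice neighbours."""
    return [
        sum(math.dist(weights[k], weights[m]) for m in neighbours[k]) / len(neighbours[k])
        for k in range(len(weights))
    ]

def normalise(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def deform_sphere(unit_vertices, scalars, gain=0.5):
    """Move every unit-sphere vertex radially to radius 1 + gain * scalar."""
    return [
        tuple((1.0 + gain * s) * c for c in v)
        for v, s in zip(unit_vertices, scalars)
    ]

# Hypothetical lattice: the six vertices of an octahedron on the unit sphere.
vertices = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]
# Each vertex is connected to every vertex except itself and its antipode.
# Antipodal pairs are stored at indices (0, 1), (2, 3), (4, 5), so k ^ 1 is the antipode of k.
neighbours = {k: [m for m in range(6) if m != k and m != (k ^ 1)] for k in range(6)}

# Hypothetical 4-dimensional weight vectors; node 0 differs from all the others.
weights = [(4.0, 0.0, 0.0, 0.0)] + [(0.0, 0.0, 0.0, 0.0)] * 5

scalars = normalise(neighbour_distance_metric(weights, neighbours))
deformed = deform_sphere(vertices, scalars)
```

In this toy case the node whose weight vector differs from its neighbours is pushed furthest from the centre, and the same normalised scalars could equally be passed to a colour map instead of, or in addition to, the radial displacement.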

In both cases, 3D scattered data and abstract numeric data, the resulting representations

originate from a tessellated sphere and the connected nodes in the lattice assist in the comparison

of multiple representations within interactive environments by simplifying the process of

reorienting the object in the display space. Several different approaches are used to generate 3D

tessellated graphical representations of data. They have been discussed appropriately to address

the two general categories of data used in the analysis: scattered coordinate data and abstract

numeric data for scientific visualization. Three-dimensional coordinate data acquired using

commercial scanning systems help to illustrate certain novel features of the mechanism that

would otherwise be difficult to understand and perceive in the case of abstract numeric data. The

objective is to present the versatility of the spherical deformable map algorithm and the potential

of its applicability in data modeling and visualization using innovative immersive virtual reality

technology.

1.4 Overview of the Thesis

The thesis first explains in Chapter 2 the Sammon’s mapping method and the multi-dimensional

scaling method that, like the self-organizing feature map, retain some notion of topology. It then

provides an elaborate discussion on the self-organizing feature map learning strategy and the

influence of the topology of its predefined lattice on data organization. Chapter 3 discusses the

data-driven modeling mechanism as implemented using scattered 3D coordinate data,

followed by several examples in Chapter 4 that illustrate its performance. The examples

demonstrate two cases, one in which the integrity of the map is maintained during learning and

the second, wherein the topological connections are updated during learning. Implementation of

the data-driven mechanism using abstract numeric data is explained in Chapter 5. Various

examples illustrating its application are presented in Chapter 6. The examples include chaotic

attractor data, simulated six-dimensional data, Fisher’s Iris flower data, Wisconsin breast cancer

data, geographic and annual snow-coverage data. These examples demonstrate the capability of

the mechanism to generate consistent, reproducible, colour-coded shapes that represent the

numeric data and reflect underlying associations (natural clusters) in the form of variations in

colour and surface geometry of the 3D graphical representations (glyphs). In conclusion, Chapter

7 summarizes significant results and suggests recommendations to resolve limitations and to

further improve the method.

Appendix A provides examples of scattered 3D coordinate data that are not included in

Chapter 4. Tables of the chaotic attractor parameters and the simulated high-dimensional abstract

numeric data, used in Chapter 6, are included in Appendix B. Preliminary work involving the

utilization of the tessellated representations of scattered 3D coordinate data in rigid and non-rigid

shape transformations are summarized in Appendix C.

CHAPTER 2 THE SPHERICAL SELF-ORGANIZING FEATURE MAP

2.1 Introduction

The proposed data-driven modeling method utilizes a deformable map that is initialized by a

tessellated unit sphere with predefined topological associations between the nodes of the spherical

lattice. The map learns the topology of the input space using the weight adaptation strategy of the

self-organizing feature map (SOFM). This chapter discusses other unsupervised mapping

methods that maintain topology before presenting an overview of the fundamentals of the self-

organizing feature map and its learning strategy. It highlights the merits of using a spherical

topology in the predefined lattice of the SOFM architecture. It also provides a discussion on how

the proposed method compares with the Sammon’s mapping approach and the multi-dimensional

scaling method. In conclusion, it summarizes key points that justify the selection of the self-

organizing feature map in this particular research work.

2.2 Unsupervised Topology Mapping Methods

This section briefly discusses the two most commonly used methods, besides the self-organizing feature

map, that generate mappings which try to preserve some notion of the topological associations in the input:

Sammon’s mapping method and the statistical method of multi-dimensional scaling.

2.2.1 Sammon’s mapping method

Sammon’s mapping method (Sammon J.W.Jr., 1969) is a non-linear mapping approach that

maps high-dimensional vectors to a lower-dimensional space while preserving inherent geometric

relationships among subsets of data vectors in the input space. The algorithm involves a point-to-

point mapping from the N-dimensional space to the lower (2 or 3) dimensional space such that

inter-point distances are approximately preserved. If there are P data vectors in the N-dimensional

space ($\Re^N$), $X_P$ such that $x_p = [x_1^p, x_2^p, \ldots, x_N^p]$, where p = 1, 2, …, P, and the mapped space is

denoted by D, where D ∈ $\Re^2$ or $\Re^3$, then the cost function $E_{sammon}$ of the mapping operation is

$$E_{sammon} = \frac{1}{\sum_{i<j} [\delta_{ij}]} \sum_{i<j}^{P} \frac{[\delta_{ij} - d_{ij}]^2}{\delta_{ij}} \tag{2.1}$$

where

$$\delta_{ij} = \mathrm{dist}[x_i, x_j], \tag{2.2a}$$

and

$$d_{ij} = \mathrm{dist}[y_i, y_j], \tag{2.2b}$$

$(x_i, x_j) \in \Re^N$, $(y_i, y_j) \in \Re^2$ or $\Re^3$, i = 1, 2, …, P and j = 1, 2, …, P (Sammon J.W.Jr., 1969).

The initial starting configuration of D is generally defined by an orthogonal projection of the N-

dimensional vectors onto the D space.
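
The cost function of Equation (2.1) can be evaluated directly for any candidate configuration. The Python sketch below is a minimal illustration, assuming that dist[·,·] is the Euclidean distance and that no two input vectors coincide (otherwise δij = 0 and the cost is undefined). The function name `sammon_stress` and the toy data are hypothetical, and the sketch only evaluates the cost; it does not perform the iterative minimisation.

```python
import math
from itertools import combinations

def sammon_stress(X, Y):
    """Evaluate Equation (2.1) for input vectors X and their mapped images Y.

    X[i] and Y[i] must describe the same data point.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of points")
    total_delta = 0.0      # sum of delta_ij over all pairs i < j
    weighted_error = 0.0   # sum of (delta_ij - d_ij)^2 / delta_ij over all pairs i < j
    for i, j in combinations(range(len(X)), 2):
        delta = math.dist(X[i], X[j])   # distance in the input space, Eq. (2.2a)
        d = math.dist(Y[i], Y[j])       # distance in the mapped space, Eq. (2.2b)
        if delta == 0.0:
            raise ValueError("coincident input vectors make the cost undefined")
        total_delta += delta
        weighted_error += (delta - d) ** 2 / delta
    return weighted_error / total_delta

# Toy data (hypothetical): four points in 3D.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# Starting configuration: orthogonal projection onto the first two axes.
Y_proj = [(x[0], x[1]) for x in X]

stress_proj = sammon_stress(X, Y_proj)  # positive, since the projection distorts distances
stress_same = sammon_stress(X, X)       # zero, since every distance is preserved exactly
```

A gradient-based or other iterative optimiser would then adjust the mapped points so as to reduce this value, starting from the projected configuration.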

2.2.2 Multi-dimensional scaling method

Multi-dimensional scaling (MDS) is another technique that, like Sammon’s method, performs a

point-to-point mapping that preserves inter-point distances. Given a set of P data vectors in the

input space, associations in the Euclidean sense between the P(P-1)/2 pairs of points

are approximately preserved in the low-dimensional space (Duda and Hart, 1973). The cost function for finding

an optimum configuration in the low-dimensional space may be defined as below:

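$$E_{\text{MDS-1}} = \frac{1}{\sum_{i<j}^{P} (\delta_{ij})^2} \sum_{i<j}^{P} (\delta_{ij} - d_{ij})^2, \tag{2.3a}$$

$$E_{\text{MDS-2}} = \sum_{i<j}^{P} \left( \frac{\delta_{ij} - d_{ij}}{\delta_{ij}} \right)^2, \tag{2.3b}$$

$$E_{\text{MDS-3}} = \frac{1}{\sum_{i<j}^{P} \delta_{ij}} \sum_{i<j}^{P} \frac{(\delta_{ij} - d_{ij})^2}{\delta_{ij}}, \tag{2.3c}$$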

where $\delta_{ij}$ is the measure of dissimilarity in $\Re^N$, while $d_{ij}$ is that in $\Re^2$ or $\Re^3$, i = 1, 2, …, P and j =

1, 2, …, P. $E_{\text{MDS-1}}$ emphasizes the largest errors in the mapping operation, $E_{\text{MDS-2}}$ emphasizes the

largest fractional errors, while $E_{\text{MDS-3}}$ is a compromise between the previous two, as it emphasizes the

largest product of the two errors; it is also the cost function used in Sammon’s mapping

method. The initial configuration in the low-dimensional space is chosen randomly or by using a

projection of the vectors along the direction of largest variance.
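
To make the difference between the three cost functions concrete, the Python sketch below evaluates Equations (2.3a) to (2.3c) for one candidate configuration. It is a minimal illustration that assumes Euclidean distance as the dissimilarity measure in both spaces and no coincident input vectors. The names `pair_distances` and `mds_costs` and the toy data are hypothetical.

```python
import math
from itertools import combinations

def pair_distances(points):
    """Euclidean distances for all pairs i < j, in a fixed pair order."""
    return [
        math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    ]

def mds_costs(X, Y):
    """Return (E_MDS-1, E_MDS-2, E_MDS-3) as in Equations (2.3a)-(2.3c)."""
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of points")
    delta = pair_distances(X)   # dissimilarities in the input space
    d = pair_distances(Y)       # dissimilarities in the low-dimensional space
    if any(dl == 0.0 for dl in delta):
        raise ValueError("coincident input vectors make E_MDS-2 and E_MDS-3 undefined")
    pairs = list(zip(delta, d))
    e1 = sum((dl - dd) ** 2 for dl, dd in pairs) / sum(dl ** 2 for dl in delta)
    e2 = sum(((dl - dd) / dl) ** 2 for dl, dd in pairs)
    e3 = sum((dl - dd) ** 2 / dl for dl, dd in pairs) / sum(delta)
    return e1, e2, e3

# Toy data (hypothetical): four points in 3D and their projection onto two axes.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
Y_proj = [(x[0], x[1]) for x in X]

e1, e2, e3 = mds_costs(X, Y_proj)
e_same = mds_costs(X, X)   # all three costs vanish when every distance is preserved
```

Because the three functions weight the same pairwise errors differently, they generally rank candidate configurations differently, which is why the choice between them matters in practice.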

2.3 Self-Organizing Feature Map (SOFM)

The self-organizing feature map (Kohonen, 1997) is an unsupervised clustering algorithm that

develops a set of “internally ordered” cluster units (nodes) by exploiting hidden redundant and

complementary feature vectors and tries to preserve topological associations of the input space

within its predefined lattice. A continuous input space of activation patterns is mapped onto a

discrete output space of nodes, arranged on a predefined lattice, by a process of competition

among nodes in the SOFM network (Haykin, 1999). The response of the network, to an input

pattern presented to it, is interpreted in terms of the position of the node in the predefined lattice

or the weight vector of a node closest to the input pattern in the Euclidean sense.

If χ represents the N-dimensional spatially continuous input space that comprises a set

{XN} of activation patterns, whose topology is defined by the metric relationships among the vectors xp ∈ χ,

and ϖ denotes the spatially discrete SOFM space, then, in the mathematical sense (Haykin, 1999),

the SOFM non-linear transformation may be expressed as

Φ : χ → ϖ (2.4)

where Φ is the SOFM non-linear mapping between the input data space and the weight vectors of

the SOFM space (Figure 2.1). Every input vector is assigned to a cluster unit in the discrete

SOFM space. The weight vector of the assigned cluster unit forms a representation of the input

vector and may be considered as an image of that cluster unit when it is projected in the input

space.
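
The assignment of input vectors to cluster units can be sketched in a few lines of Python. The example below is a minimal illustration of the mapping only, assuming that the winning node is the one whose weight vector is closest to the input in the Euclidean sense and that the weight vectors are already given. It does not include the weight adaptation (learning) step, and the function names and toy values are hypothetical.

```python
import math

def winning_node(x, weights):
    """Index of the node whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(weights)), key=lambda k: math.dist(x, weights[k]))

def assign_inputs(inputs, weights):
    """Map every input vector to its cluster unit.

    Returns a dict: node index -> list of indices of the inputs assigned to it.
    """
    clusters = {k: [] for k in range(len(weights))}
    for p, x in enumerate(inputs):
        clusters[winning_node(x, weights)].append(p)
    return clusters

# Toy values (hypothetical): three nodes with 2D weight vectors and four inputs.
weights = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
inputs = [(0.1, 0.1), (0.9, 0.2), (0.2, 0.8), (1.1, -0.1)]

clusters = assign_inputs(inputs, weights)
```

Here the second and fourth inputs share a cluster unit, so the weight vector of that unit acts as their common representation in the input space.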
