Data-Driven Modeling using Spherical Self-Organizing Feature Maps
by
Archana P. Sangole
ISBN: 1-58112-319-1
DISSERTATION.COM
Boca Raton, Florida USA • 2006
Copyright © 2003 Archana P. Sangole All rights reserved.
Data-Driven Modeling using Spherical
Self-Organizing Feature Maps
By
Archana P. Sangole
The Department of Mechanical and Materials Engineering
Graduate Program
in
Engineering
Submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Faculty of Graduate Studies
The University of Western Ontario
London, Ontario
June, 2003
© Archana P. Sangole 2003
Dedication
This work is dedicated to my father Mr. Prabhakar R. Sangole and my mother Mrs. Shobha P.
Sangole who have stood by me at all times and believed in me.
Do not go where the path may lead,
go instead where there is no path and leave a trail.
- Ralph Waldo Emerson
ABSTRACT
Researchers and data analysts increasingly rely on graphical tools to assist them in
modeling their data, generating hypotheses, and gaining deeper insight into their
experimentally acquired data. Recent advances in technology have made available
improved and novel modeling and analysis media that facilitate intuitive, task-driven exploratory
analysis and manipulation of the displayed graphical representations. To utilize these
emerging technologies, researchers must be able to transform experimentally acquired data
vectors into a visual form, or secondary representation, that has a simple structure and is easily
transferable into the media. It is also essential that the representation can be modified or manipulated within
the display environment.
This thesis presents a data-driven modeling technique that utilizes the basic learning strategy
of an unsupervised clustering algorithm, called the self-organizing feature map, to adaptively
learn topological associations inherent in the data and preserve them within the topology imposed
by its predefined spherical lattice, thereby transforming the data into a 3D tessellated form. The
tessellated graphical forms originate from a sphere thereby simplifying the process of computing
its transformation parameters on re-orientation within an interactive, task-driven, graphical
display medium. A variety of data sets including six sets of scattered 3D coordinate data, chaotic
attractor data, the more commonly used Fisher’s Iris flower data, medical numeric data,
geographic and environmental data are used to illustrate the data-driven modeling and
visualization mechanism.
The modeling algorithm is first applied to scattered 3D coordinate data to understand the
influence of the spherical topology on data organization. Two cases are examined: one in which
the integrity of the spherical lattice is maintained during learning, and a second in which the
inter-node connections in the spherical lattice are adaptively changed during learning. In the
analysis, scattered coordinate data of freeform objects with topology equivalent to a sphere and
those whose topology is not equivalent to a sphere are used. Experiments demonstrate that
reasonably good results can be obtained, with the degree of resemblance, determined by an average
of the total normalized error measure, ranging from 6.2×10⁻⁵ to 1.1×10⁻³. The experimental
analysis using scattered coordinate data facilitates an understanding of the algorithm and provides
evidence of the topology-preserving capability of the spherical self-organizing feature map.
The algorithm is later implemented using abstract, seemingly random, numeric data. Unlike
in the case of 3D coordinate data, wherein the SOFM lattice is in the same coordinate frame
(domain) as the input vectors, the numeric data is abstract. The criterion for deforming the
spherical lattice is determined using mathematical and statistical functions as measures-of-
information that are tailored to reflect some aspect of meaningful, tangible, inter-vector
relationships or associations embedded in the spatial data that reveal some physical aspect of the
data. These measures are largely application-dependent and need to be defined by the data analyst
or an expert. Interpretation of the resulting 3D tessellated graphical representation or form (glyph)
is more complex and task-dependent than that of scattered coordinate data. Very
simple measures are used in this analysis in order to facilitate discussion of the underlying
mechanism to transform abstract numeric data into 3D graphical forms or glyphs. Several data
sets are used in the analysis to illustrate how novel characteristics hidden in the data, and not
easily apparent in the string of numbers, can be reflected via 3D graphical forms.
The proposed data-driven modeling approach provides a viable mechanism to generate 3D
tessellated representations of data that can be easily transferred to a graphical modeling and
analysis medium for interactive and intuitive exploration.
KEYWORDS: Deformable shape modeling, spherical map, data-driven models, self-organizing
feature map, scientific data visualization, exploratory data analysis, and numeric data
transformation.
ACKNOWLEDGEMENTS
I would like to express my sincere gratitude to all those who helped to make this research
possible by being a part of this journey and by just being there for me.
First and foremost, heartfelt thanks are due to my supervisor, Prof. George K. Knopf, for his
guidance and encouragement throughout the gruelling experience. His constant enthusiasm in this
research added a touch of hope and inspiration to surge forward and to strive for excellence. He
shared with me his passion for research and teaching, and presented me with challenges and
opportunities that helped me learn more about myself. He has been a source of inspiration in all
respects and I am truly thankful to him.
I had the opportunity to interact with several faculty members and graduate students who provided
support in various ways; sometimes helping me brainstorm on ideas, by listening to me no matter
how busy or tired they were, by occasionally sharing a coffee with me and, by just having a
friendly conversation to brighten my day. Their presence made a big difference. My sincere
thanks to Prof. Roger Khayat and his students KyuTae Kim, Zhenyu Li and Radoslav German for
their assistance and for making me feel as if I were part of the team. I am very thankful to Prof.
Cynthia Dunning Zwicker for her constant encouragement during the last few months and for just
being there for me when I was almost ready to give up. Thanks are also due to Mr. Marian
Jaworski, a senior Technician in the Department, and, to other students and friends who gave
their support.
My sincere gratitude to the department graduate secretaries Mrs. Joan Tangen and Ms. Chris
Seres, and, the faculty ITS computing staff for their constant support and for accommodating me
at all times. It was a pleasure working with my former colleagues Jonathan, Alireza, and Basem,
as well as my colleagues Dennis, WeiWei and Philip. Special thanks are due to Dennis and his
family for seeing me through frustrating and difficult times.
I gratefully acknowledge the financial assistance that I received in the form of scholarships from
the Ontario Ministry of Education and Training, and, the Faculty of Graduate Studies. It made the
trip a little easier to endure. This work was also partially supported by a research grant from the
Natural Sciences and Engineering Research Council of Canada (NSERC).
There are not enough words to express my appreciation and thankfulness for the unreserved
encouragement, inspiration and support from my parents, Mr. and Mrs. P.R. Sangole, and my
Uncle and Aunt, Mr. and Mrs. D.J. Patel. The endless resource of moral support, patience and
understanding that I received from them deserves a special acknowledgement. They made the
journey worthwhile by being there for me at all times and I am truly grateful to be blessed with
such a wonderful family.
On a more official note, I would like to thank the Distributed Active Archive Centre (DAAC) at
the Goddard Space Flight Centre (GSFC), Greenbelt, MD, for the multi-spectral satellite data set,
Forrest Hall and Blanche Meeson, the principal investigators in the ISLSCP Initiative-II Project,
for the snow cover environmental data set, Prof. Hervé Delingette (INRIA), Prof. Jim Bezdek
(University of West Florida) and Mr. Robert Cushman (Director, CDIAC) for their helpful
comments and encouraging replies to my e-mails, the Stanford 3D Scanning Repository for the
bunny data set, the UCI Repository of Machine Learning for the Wisconsin breast cancer data and
Mr. Markus Wawryniuk (Computer Science Institute, Konstanz, Germany) for directing me to the
UCI Repository. The assistance of Mr. Peter Smith (DAAC-GSFC), Mr. Basem Yousef, who
helped in acquiring the sculptured and human head range data, and Mr. Tesfaye Breta, a 2nd year
Engineering student who modeled for the human head range data, is gratefully acknowledged.
TABLE OF CONTENTS
CERTIFICATE OF EXAMINATION ………………………………………………. ii
ABSTRACT …………………………………………………………………………. iii
ACKNOWLEDGEMENTS ………………………………………………………… v
TABLE OF CONTENTS ……………………………………………………………. vii
LIST OF TABLES ………………………………………………………………….. xi
LIST OF FIGURES …………………………………………………………………. xii
CHAPTER 1 INTRODUCTION …………………………………………………. 1
1.1 The Problem ……………………………………………………………. 1
1.2 Basic Terminology …….…………………………………………….…. 1
1.3 General Concept of the Modeling Mechanism ..………………………. 2
1.4 Overview of the Thesis …………………………………………...…… 3
CHAPTER 2 THE SPHERICAL SELF-ORGANIZING FEATURE MAP ……. 5
2.1 Introduction ……………………………………………………………. 5
2.2 Unsupervised Topology Mapping Methods ……….…………………… 5
2.2.1 Sammon’s mapping method …………………………………. 5
2.2.2 Multi-dimensional scaling method ………………………….. 6
2.3 Self-Organizing Feature Map (SOFM) ………………………………… 7
2.3.1 Learning strategy in a one-dimensional SOFM ……………… 9
2.3.2 Topology of the SOFM lattice ………………………………. 11
2.3.3 Weight adaptation strategy in the spherical SOFM ………….. 13
2.3.4 The neighbourhood operator (NEi,j,k*) ……………………….. 16
2.3.5 The learning rate (α) ………………………………………… 16
2.3.6 The learning rules …………………………………………… 17
2.3.7 Computation time …………………………………………… 20
2.4 Significance of the Spherical Topology ………………………………… 20
2.5 Merits of the Self-Organizing Feature Map ……………………………. 23
2.6 Concluding Remarks …………………………………………………… 24
CHAPTER 3 DATA-DRIVEN MODELING OF SCATTERED
COORDINATE DATA ……………………………………………. 25
3.1 Introduction ……………………………………………………………. 25
3.2 Freeform Shape Approximation Methods ……………………………. 26
3.2.1 Interpolation methods ……………………………………….. 26
3.2.2 Methods based on splines ……………………………………. 27
3.2.3 Signed distance function and radial basis function methods ….. 27
3.2.4 Triangulation methods ………………………………………. 28
3.2.5 Adaptive methods …………………………………………… 28
3.2.6 Deformable models ………………………………………….. 29
3.3 Generation of Tessellated Representations for Freeform Shapes ………. 31
3.3.1 Maintaining topological associations during learning ……….. 31
3.3.2 Updating topological associations during learning ………….. 32
3.4 Quality of the Tessellated Representation ……………………………… 36
3.5 Merits of the Data-Driven Modeling Mechanism for Scattered
Coordinate Data ……………………………………………………….. 37
3.6 Potential Areas of Applicability ………………………………………… 38
3.7 Concluding Remarks …………………………………………………… 38
CHAPTER 4 EXPERIMENTS USING SCATTERED COORDINATE DATA … 40
4.1 Introduction ……………………………………………………………. 40
4.2 Experiments using Coordinate Data …………………………………… 40
4.2.1 Synthetic data of a cube ………………………….…………. 40
4.2.2 Scattered data of a sculptured head …………………………. 41
4.2.3 Cloud of points of the Stanford bunny ………………………. 45
4.3 Quality of the Tessellated Form ……………………………………….. 49
4.4 Concluding Remarks …………………………………………………… 51
CHAPTER 5 DATA-DRIVEN MODELING FOR SCIENTIFIC DATA
VISUALIZATION …………………………………………………. 53
5.1 Introduction ……………………………………………………………. 53
5.2 Scientific Data Visualization Methods …………………………………. 54
5.2.1 Geometric projection methods …………………………….. 56
5.2.2 Glyph-based visualization methods ………………………….. 56
5.2.3 Exploratory visualization methods …………………………… 59
5.2.4 Combined glyph-based and exploratory visualization
methods ……………………………………………………… 60
5.2.5 Visualization in the spherical domain ……………………….. 60
5.3 Generation of 3D Tessellated Graphical Forms for Numeric Data …….. 62
5.3.1 Mapping high-dimensional data into the tessellated
SOFM lattice …………………………………………………. 62
5.3.2 Mechanism to deform the spherical lattice ………………….. 64
5.4 Merits of the Data-Driven Modeling Mechanism for
Scientific Data Visualization ………………………………………….. 67
5.5 Metrics for an Effective Visualization Mechanism ……………………. 68
5.6 Concluding Remarks …………………………………………………… 70
CHAPTER 6 EXPERIMENTS FOR SCIENTIFIC DATA VISUALIZATION … 71
6.1 Introduction ……………………………………………………………. 71
6.2 Chaotic Attractor Data …………………………..…………………….. 71
6.2.1 The Lozi attractor function ………….……………………..… 72
6.2.2 The Hénon attractor function ……………………………….. 72
6.2.3 The Rössler attractor function .………..……………………. 75
6.2.4 The Lorenz attractor function ………………………………. 75
6.3 Simulated High-Dimensional data ………..…………………………… 79
6.4 Fisher’s Iris Flower Data ………..…………………………………….. 82
6.5 Wisconsin Breast Cancer Data …..…………………………………….. 85
6.6 Multi-spectral Satellite Image Data ………..………………………….. 88
6.7 Snow-coverage Data ………..………………………………………….. 92
6.8 Discussion: Effectiveness of a Visualization mechanism .…………… 98
6.9 Concluding Remarks …………………………………………………… 100
CHAPTER 7 CONCLUSIONS …………………………………………………… 101
7.1 Review of the Data-Driven Modeling Mechanism …………………….. 101
7.2 Novel Features of the Method …………………………………………. 101
7.2.1 Scattered coordinate data …………………………………… 101
7.2.2 Scientific data visualization …..…………………………….. 103
7.3 Recommendations to Resolve Limitations …………………………….. 104
7.4 Future Improvement of the Data-Driven Modeling Mechanism ……….. 105
7.5 Final Remark ………………………………………………….……….. 107
BIBLIOGRAPHY …………………………………………………………………… 108
APPENDIX A: Tessellated Forms of Scattered Coordinate Data …………………. 117
A.1 Introduction ……………………………………………………………. 117
A.1.1 Scattered data of a human head ……………………………… 117
A.1.2 Synthetic coordinate data of an open ring …………………… 118
A.1.3 Synthetic coordinate data of a torus …………………………. 120
A.2 Observations from the Experimental Analysis ………………………… 121
APPENDIX B: Scientific Data Visualization Tables ……………………………… 122
APPENDIX C: Rigid and Non-Rigid Shape Transformation ……………………….. 125
C.1 Introduction ……………………………………………………………. 125
C.1.1 Rigid shape transformation: Registration …………………….. 125
C.1.2 Freeform shape registration example ………………………… 129
C.1.3 Non-rigid shape transformation: Morphing ………………… 131
C.2 Summary ……………………………………………………………… 134
CURRICULUM VITAE ……………………………………………………………… 135
LIST OF TABLES
Table 4.1 Sum-of-squared errors between the tessellated forms and the respective
coordinate points. The corresponding figures are given in Appendix A ……. 50
Table 6.1 Visualization metrics for the different examples of abstract numeric
data ………………………………………………………………………… 98
Table B.1 Initial conditions and system parameters for the Lozi attractor function ……. 122
Table B.2 Initial conditions and system parameters for the Hénon attractor
function ……………………………………………………………………. 122
Table B.3 Initial conditions and system parameters for the Rössler attractor
function ……………………………………………………………………. 123
Table B.4 Initial conditions and system parameters for the Lorenz attractor
function …………………………………………………………………….. 123
Table B.5 Numeric test data set consisting of 6-dimensional feature vectors …………. 124
LIST OF FIGURES
Figure 2.1 The feature map (Φ) and its relationship with the input vector (x_p) in the
N-dimensional space, the cluster unit or node located at (i,j,k) in the
discrete SOFM output space (ϖ) and the weight vector (w_{i,j,k})
associated with it (Haykin, 1999) ……………………………….…………. 8
Figure 2.2 Schematic representation of the learning strategy in a 1D SOFM.
Weight vectors connecting just one sample vector to the SOFM cluster
units have been shown ……………………………………………………… 10
Figure 2.3 Data association in a one-, two- and three-dimensional SOFM …………..… 11
Figure 2.4 Spherical SOFM with quadrilateral elements ………………………………. 12
Figure 2.5 Data association in a spherical SOFM ……………………………………… 13
Figure 2.6 The spherical self-organizing feature map (SOFM) ……………………… 14
Figure 2.7 Characteristic curves displaying the rate of change in the weights
during the learning operation ………………………………………………. 19
Figure 2.8 Cloud of points of the test object and the resulting two-dimensional
self-organizing feature map at the end of the learning operation …………… 21
Figure 2.9 The folding effect in a two-dimensional self-organizing feature map
(SOFM) due to the wrap-around condition in the neighbourhood
configuration along the boundaries of the map ……………………..……. 22
Figure 2.10 The undesirable folding effect when a 2D self-organizing feature map
(SOFM) is implemented using the wrap-around condition in the
neighbourhood configuration along the boundaries of the map. Part of
the open seam is traced onto the figure for visual enhancement ……………. 22
Figure 2.11 The resulting tessellated form of the test object after implementing the
spherical self-organizing feature map ……………………………………… 23
Figure 3.1 Schematic representation of surface reconstruction using the spherical
map. Weight vectors connecting just one coordinate point to a few
spherical SOFM cluster units have been shown …………………………… 31
Figure 3.2 Schematic representation of surface reconstruction for a torus, wherein
the map stretches across the hole due to the predefined connections in
the neural architecture of the spherical map ………………………………… 33
Figure 3.3 Reassignment of neighbours within the neighbourhood of a winning
node located at (i,j,k)*. The links to only two nodes are shown …………… 34
Figure 4.1 Synthetic data of a cube and initialization of the spherical map ……...……. 40
Figure 4.2 Evolution of the tessellated form as it adaptively learns the topological
associations in the cloud of points of the cube ……………………………. 41
Figure 4.3 Range image and scattered coordinate points of the sculptured head ………. 42
Figure 4.4 Initialization of the spherical map around the cloud of points …………….. 42
Figure 4.5 A sequence of the map as the tessellated form of the sculptured head
evolves during various stages of the learning process ……………………… 43
Figure 4.6 The final tessellated representation of the sculptured head along with
the scattered coordinate points superimposed on it. It displays an
interesting effect where the spherical SOFM tries to approximate the
geometry of the freeform object in regions of scarce data. ………………… 44
Figure 4.7 Cloud of points of the Stanford bunny ……………………………………... 45
Figure 4.8 Initialization of the spherical SOFM where it engulfs the cloud of
points ………………………………………………………………………... 46
Figure 4.9 Weight vectors and reconstructed surface representation of the bunny
without updating the topological connections in the spherical map ………… 47
Figure 4.10 Evolution of the tessellated form of the bunny as the topology of the
spherical lattice is updated during the learning process ……………………. 47
Figure 4.11 Weight vectors and the facetted surface model of the Stanford bunny at
the end of the learning process ……………………………………………... 48
Figure 4.12 Coordinate points, weight vectors and the facetted surface model that
comprise the base of the bunny ………………………………..……………. 48
Figure 4.13 The tessellated form generated by the spherical SOFM approximates the
topology of the cube coordinate data set …………………………………… 49
Figure 4.14 Nodes at the base of the sculptured head tessellated shape
representation that contribute to high SSE_error values ……………………… 51
Figure 5.1 Two-dimensional glyphs for visualizing an N-dimensional data vector:
(a) Chernoff’s facial caricature and (b) Star plot ……………………..……. 57
Figure 5.2 Implicit surface generation to transform an N-dimensional vector into a
3D graphical form or glyph (Rohrer et al., 1999) …………………………… 58
Figure 5.3 Schematic representation of surface reconstruction using the spherical
deformable map. Weight vectors connecting just one coordinate point
to a few spherical SOFM cluster units have been shown …………………. 62
Figure 5.4 Deforming and colour-coding the spherical SOFM lattice to reflect
variation along the three dimensions of the input space. The
mechanism is illustrated using the analogy of five nodes uniformly
arranged on a circle ……………….….. 66
Figure 6.1 Poincaré maps and corresponding 3D tessellated graphical
representations for different initial conditions of the Lozi attractor
function. In the tessellated forms, information about similarity
between cluster centres is coded as both shape deformation and colour
for enhanced visualization …………………………………………………. 73
Figure 6.2 Poincaré maps and corresponding 3D tessellated graphical
representations for different initial conditions of the Hénon attractor
function. In the tessellated forms, information about similarity
between cluster centres is coded as both shape deformation and colour
for enhanced visualization …………………………………………………. 74
Figure 6.3 Poincaré maps and corresponding 3D tessellated graphical
representations for different initial conditions of the Rössler attractor
function. In the tessellated forms, information about similarity
between cluster centres is coded as both shape deformation and colour
for enhanced visualization …………………………………………………. 76
Figure 6.4 Poincaré maps and corresponding 3D tessellated graphical
representations for different initial conditions of the Lorenz attractor
function. In the tessellated forms, information about similarity
between cluster centres is coded as both shape deformation and colour
for enhanced visualization …………………………………………………. 77
Figure 6.5 Comparative analysis of the 3D glyphs for the Lorenz attractor
function under two sets of initial conditions in an interactive
visualization environment …………………………………………………. 78
Figure 6.6 The different views of the colour-coded 3D graphical representation
for the 6-dimensional numeric data with the four classes identified
(Sangole and Knopf, 2003) …………………………………………………. 80
Figure 6.7 The scatter plot and three views of the 3D tessellated graphical
representation of the 6D numeric data along input dimension D3
(Sangole and Knopf, 2003) …………………………………………………. 81
Figure 6.8 The colour-coded graphical representations for each input dimension
of the 6D data set (Sangole and Knopf, 2003) ……………………………… 82
Figure 6.9 The three species of the Iris flower: Iris setosa, Iris versicolor and Iris
virginica ……………………………………………………………………. 83
Figure 6.10 The scatter plot of the Iris data along the sepal length and three views
of the corresponding graphical representations (Sangole and Knopf,
2003) ………………………………………………………………………… 84
Figure 6.11 Three-dimensional graphical representations (glyphs) of the Iris flower
for the four dimensional measurements of the data (Sangole and
Knopf, 2003) ………………………………………………………………. 85
Figure 6.12 Various views of the three-dimensional graphical representation of the
nine-dimensional Wisconsin cancer data. The location of one input
vector for each type of tissue sample, benign and malignant, are
identified on the colourized glyph ………………………………………….. 87
Figure 6.13 A misclassified benign tissue sample is identified in the three-
dimensional glyph ………………………………………………………….. 88
Figure 6.14 Three-dimensional glyph and the corresponding pattern projection onto
geographic space for multi-channel spectral data. The metric M3_{i,j,k}
reflects combined vegetation and temperature data for North America ……. 90
Figure 6.15 Three-dimensional glyphs and the corresponding pattern projections
onto geographic space for individual channels of spectral data. The
metric (M4_{i,j,k})_n reflects each channel of the multi-spectral data …………… 91
Figure 6.16 The different views of the 3D glyph representing annual snow
coverage during the year of 1991 and its projection onto the Northern
Hemisphere geographic space (Sangole and Knopf, 2002) ………………… 94
Figure 6.17 Enlarged views of the regions highlighted in Figure 6.16. These
regions primarily fall within valleys, thereby indicating that very little
variation exists amongst the input vectors that have been assigned to
the cluster units (Sangole and Knopf, 2002) ………………………………. 95
Figure 6.18 The graphical representations of the annual snow cover patterns during
1987 and 1988 with their respective projections onto the Northern
Hemisphere geographic space (Sangole and Knopf, 2002) ………………… 96
Figure 6.19 Graphical forms of the annual snow cover patterns for the years 1988
and from 1991-1995, with their respective projections onto the
geographic space. The region on the graphical form that has cluster
units onto which feature vectors of Alaska have been mapped is
highlighted in each representation (Sangole and Knopf, 2002) ……………. 97
Figure 7.1 Triangulation and its dual simplex mesh ……………………………………. 106
Figure A.1 Scattered coordinate data of a human head and initialization of the
spherical map ………………………………………………………………. 117
Figure A.2 Evolution of the tessellated form as it adaptively learns the topological
associations in the cloud of points of the human head …………………….. 118
Figure A.3 Synthetic coordinate data of an open ring and initialization of the
spherical map ……………………………………………………………….. 119
Figure A.4 Evolution of the tessellated form as it adaptively learns the topological
associations in the cloud of points of the open ring ………………………… 119
Figure A.5 Scattered coordinate data of the torus and initialization of the spherical
map …………………………………………………………………………. 120
Figure A.6 Evolution of the tessellated form as it adaptively learns the topological
associations in the cloud of points of the torus …………………………….. 120
Figure C.1 Illustration of shape registration based on using tessellated
representations of the freeform shapes (Knopf and Sangole, 2002a) ………. 127
Figure C.2 The original orientation of the tessellated representations for the
reference sculptured head S_r^{sculpt} and the object to be matched S_t^{sculpt} ……. 130
Figure C.3 Aligning the transformed with the reference tessellated representation
so that their centres-of-mass coincide ……………………………………… 130
Figure C.4 Transformation of the sculptured head at various stages of the
registration process along with corresponding registration error values …… 131
Figure C.5 Non-rigid transformation from Shape-1 to Shape-2 (Knopf and
Sangole, 2002b) ……………………………………………………………. 133
Figure C.6 Sequence of shape transformations involving morphing a cube into a
disc and back into the cube (Knopf and Sangole, 2002b) ………………….. 134
CHAPTER 1 INTRODUCTION
1.1 The Problem
Graphical methods are essential tools that assist data analysts and product designers in evaluating
their data and deriving meaningful inferences, thereby gaining a deeper understanding of the
physical phenomenon characterizing their data. Emerging technologies have made available more
sophisticated graphical display tools. For example, recent research has
demonstrated that immersive virtual reality (IVR) can provide a far richer visualization and
interactive modeling and analysis medium (Van Dam et al., 2000; Nielson et al., 1997). These
sophisticated, interactive, task-driven display and analysis media employ the full range of human
sensorimotor capability and help provide insight into huge volumes of experimentally
acquired data. Generating graphical models whose structure is simple to manipulate and
facilitates replicating real-world interactive techniques is an issue that needs to be addressed in order
to use this technology most effectively (Brooks, 1999). The data-driven modeling
approach discussed in this thesis presents a viable mechanism to generate a variety of 3D
graphical forms, or glyphs, of data that enable intuitive and task-driven interaction with
maximum flexibility and versatility. Several data sets are used to demonstrate the effectiveness of
the proposed methodology, including scattered 3D coordinate data and a variety of abstract
numeric data sets. The following section briefly discusses the terminology used and the context
within which it applies before introducing the general data-driven modeling concept.
1.2 Basic Terminology
Data-driven, in this context, implies a process that tries to extract novel characteristics of the
underlying physical system or phenomenon from strings of numeric data with little or no a priori
assumptions about the organization of the data (Solomatine, 2002). Structure refers to the
configuration of the lower-dimensional mapping space that tries to simplify the representation of
complex patterns inherent in the data. Its purpose is to enhance and display novel features or
attributes of the data, or the data itself, in a manner that emphasizes other underlying patterns that
are not easily apparent in the string of numbers, thereby providing a better understanding of the
physical phenomenon. It therefore forms a basis for connecting or relating the input space
variables that describe the system’s behaviour and which, by themselves, provide limited
knowledge about the details of the phenomenon or the system under study. Topological
associations apply to neighbourhood relationships between the data vectors in the input space.
These associations are likewise reflected in the mapping space by establishing connectivity
between groups or clusters of similar data vectors. It is therefore essential that the map be
configured in a manner that reflects these associations. Throughout the thesis, ‘map’ signifies the
structure within the mapping space and is intuitive in the visual sense. It is hence different from
the mathematical perspective where it is used as a synonym for transformation. The term
“abstract” is used in conjunction with numeric data in order to distinguish between the two
categories of data: spatial (x, y, z) 3D coordinate data of freeform objects, and numeric data for
scientific visualization that consist of a string of numbers indicating different attributes of the
physical phenomenon being observed.
1.3 General Concept of the Modeling Mechanism
The research work presented in this thesis discusses a data-driven modeling mechanism to
generate 3D tessellated graphical representations of data. Its core utilizes the
unsupervised clustering algorithm of the self-organizing feature map (SOFM) to adaptively learn
associations hidden in the data. The topology-preservation capability of the SOFM causes groups
of similar data vectors to be assigned to identifiable neighbourhoods in its predefined lattice. The
closed structure of the lattice is utilized to create a 3D graphical form that represents the data
within a simple structure with known inter-node connectivity, making it well suited to
intuitive manipulation in an interactive display medium and consequently facilitating an
exploratory, task-driven analysis.
A majority of the research work involving the self-organizing feature map,
primarily pattern classification, exploits the unsupervised clustering capability of the SOFM. In
contrast, the research presented in this thesis emphasizes the topology of the predefined
lattice and explains how it influences the SOFM learning parameters. The predefined lattice of
the SOFM in this work is a tessellated unit sphere where each node represents a cluster unit.
Although clustering is the primary function of the SOFM algorithm, the discussion is directed
towards highlighting the merits of its predefined spherical lattice and topology for modeling and
visualization. It is assumed that the same principles of clustering apply and are as valid as in the
case of a 1D or a 2D SOFM lattice (Kohonen, 1997).
Since there is very limited published literature on the spherical self-organizing feature map,
the thesis first explains the implementation of the data-driven modeling algorithm using scattered
3D coordinate data. This exercise demonstrates the topology preserving nature of the spherical
self-organizing feature map by generating tessellated representations of freeform objects given
their cloud of scattered points. Every node in the tessellated representation is therefore a
coordinate point in 3D space. It is later applied to seemingly “random” multi-dimensional data to
demonstrate how abstract information embedded in the data set is given a graphical form. Unlike
in the previous case, every node in the tessellated representation is now an N-dimensional vector.
In order to create a graphical representation, associations or inter-relationships between the
numeric vectors are extracted using metrics or mathematical formulations that best quantify and
relate to some aspect of the physical phenomenon. The metric may relate the N-dimensional
vectors of neighbouring nodes (cluster units) and consequently establish a relation between
groups of similar data vectors or, it may relate a group of input vectors assigned to a particular
cluster unit. In this manner, every node has one or more scalar values reflecting some aspect of
the physical phenomenon. The spherical lattice is then deformed or colourized in proportion to
these values thereby transforming it into a visual dimension of the display space. The resulting
colour-coded tessellated form is thus a glyph or a graphical representation reflecting patterns
hidden in the data. The mechanism provides researchers with an automatic or semi-automatic
method to quickly enhance and observe novel characteristics of their data by means of graphical
representations, and to gain deeper insights into the physical phenomenon characterized by
their numeric experimental data. Unlike in the case of scattered coordinate data, interpretation of
the graphical representation is complex and largely application dependent.
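As an illustration of the last step, the following minimal Python sketch shows one way a scalar value per node could be computed and used to deform the lattice. It is a sketch under stated assumptions rather than the metric used in this thesis: the function names `neighbour_distance_metric` and `deform_sphere`, the choice of the mean distance to neighbouring weight vectors as the metric, and the `gain` parameter are all hypothetical.

```python
import math


def neighbour_distance_metric(weights, neighbours):
    """Return one scalar per node: the mean Euclidean distance between the
    node's N-dimensional weight vector and those of its lattice neighbours.
    (Assumed metric, chosen only for illustration.)"""
    values = []
    for k in range(len(weights)):
        dists = [math.dist(weights[k], weights[m]) for m in neighbours[k]]
        values.append(sum(dists) / len(dists))
    return values


def deform_sphere(vertices, values, gain=0.5):
    """Scale each unit-sphere vertex radially in proportion to its value.
    Values are normalized to [0, 1]; `gain` (hypothetical) sets the
    maximum radial displacement."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    deformed = []
    for (x, y, z), v in zip(vertices, values):
        r = 1.0 + gain * (v - lo) / span
        deformed.append((r * x, r * y, r * z))
    return deformed


# Toy lattice: three mutually connected nodes with 2D weight vectors.
vertices = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
weights = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]
values = neighbour_distance_metric(weights, neighbours)
glyph = deform_sphere(vertices, values)
```

The same normalized values could equally be passed to a colour map instead of, or in addition to, the radial scaling.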
In both cases, 3D scattered data and abstract numeric data, the resulting representations
originate from a tessellated sphere and the connected nodes in the lattice assist in the comparison
of multiple representations within interactive environments by simplifying the process of
re-orienting the object in the display space. Several different approaches are used to generate 3D
tessellated graphical representations of data. They have been discussed appropriately to address
the two general categories of data used in the analysis: scattered coordinate data and abstract
numeric data for scientific visualization. Three-dimensional coordinate data acquired using
commercial scanning systems help to illustrate certain novel features of the mechanism that
would otherwise be difficult to understand and perceive in the case of abstract numeric data. The
objective is to present the versatility of the spherical deformable map algorithm and the potential
of its applicability in data modeling and visualization using innovative immersive virtual reality
technology.
1.4 Overview of the Thesis
The thesis first explains, in Chapter 2, Sammon’s mapping method and the multi-dimensional
scaling method that, like the self-organizing feature map, retain some notion of topology. It then
provides an elaborate discussion on the self-organizing feature map learning strategy and the
influence of the topology of its predefined lattice on data organization. Chapter 3 discusses the
data-driven modeling mechanism when it is implemented using scattered 3D coordinate data
followed by several examples in Chapter 4 to illustrate its performance. The examples
demonstrate two cases, one in which the integrity of the map is maintained during learning and
the second, wherein the topological connections are updated during learning. Implementation of
the data-driven mechanism using abstract numeric data is explained in Chapter 5. Various
examples illustrating its application are presented in Chapter 6. The examples include chaotic
attractor data, simulated six-dimensional data, Fisher’s Iris flower data, Wisconsin breast cancer
data, geographic and annual snow-coverage data. These examples demonstrate the capability of
the mechanism to generate consistent, reproducible, colour-coded shapes that represent the
numeric data and reflect underlying associations (natural clusters) in the form of variations in
colour and surface geometry of the 3D graphical representations (glyphs). In conclusion, Chapter
7 summarizes significant results and suggests recommendations to resolve limitations and to
further improve the method.
Appendix A provides examples of scattered 3D coordinate data that are not included in
Chapter 4. Tables of the chaotic attractor parameters and the simulated high-dimensional abstract
numeric data, used in Chapter 6, are included in Appendix B. Preliminary work on using the
tessellated representations of scattered 3D coordinate data in rigid and non-rigid
shape transformations is summarized in Appendix C.
CHAPTER 2 THE SPHERICAL SELF-ORGANIZING FEATURE MAP
2.1 Introduction
The proposed data-driven modeling method utilizes a deformable map that is initialized by a
tessellated unit sphere with predefined topological associations between the nodes of the spherical
lattice. The map learns the topology of the input space using the weight adaptation strategy of the
self-organizing feature map (SOFM). This chapter discusses other unsupervised mapping
methods that maintain topology before presenting an overview of the fundamentals of the self-
organizing feature map and its learning strategy. It highlights the merits of using a spherical
topology in the predefined lattice of the SOFM architecture. It also provides a discussion on how
the proposed method compares with Sammon’s mapping approach and the multi-dimensional
scaling method. In conclusion, it summarizes key points that justify the selection of the self-
organizing feature map in this particular research work.
2.2 Unsupervised Topology Mapping Methods
This section briefly discusses the two most commonly used methods, besides the self-organizing
feature map, that generate mappings which try to preserve some notion of the topological
associations of the input space, namely Sammon’s mapping method and the statistical method of
multi-dimensional scaling.
2.2.1 Sammon’s mapping method
Sammon’s mapping method (Sammon J.W.Jr., 1969) is a non-linear mapping approach that
maps high-dimensional vectors to a lower-dimensional space while preserving inherent geometric
relationships among subsets of data vectors in the input space. The algorithm involves a
point-to-point mapping from the N-dimensional space to the lower (2 or 3) dimensional space such that
inter-point distances are approximately preserved. If there are P data vectors in the N-dimensional
space ($\Re^N$), $X_P$ such that $x_p = [x_{p1}, x_{p2}, \ldots, x_{pN}]$, where p = 1, 2, … , P, and the
mapped space is denoted by D, where $D \in \Re^2$ or $\Re^3$, then the cost function $E_{sammon}$ of the
mapping operation is
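$$E_{sammon} = \frac{1}{\sum_{i<j}^{P} [\delta_{ij}]} \sum_{i<j}^{P} \frac{[\delta_{ij} - d_{ij}]^{2}}{\delta_{ij}} \tag{2.1}$$
where
$$\delta_{ij} = \mathrm{dist}[x_i, x_j], \tag{2.2a}$$
and
$$d_{ij} = \mathrm{dist}[y_i, y_j], \tag{2.2b}$$
$(x_i, x_j) \in \Re^N$, $(y_i, y_j) \in \Re^2$ or $\Re^3$, i = 1, 2, … , P and j = 1, 2, … , P (Sammon J.W.Jr., 1969).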
The initial starting configuration of D is generally defined by an orthogonal projection of the N-
dimensional vectors onto the D space.
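The Sammon cost function of Equation 2.1 can be evaluated directly for a given configuration. The short Python sketch below is illustrative only: the function name `sammon_stress` is an assumption, the Euclidean distance is assumed for dist[·,·], and coincident input points (for which the cost is undefined) are rejected rather than handled.

```python
import math
from itertools import combinations


def sammon_stress(X, Y):
    """Evaluate the Sammon cost for input vectors X (N-dimensional) and
    their mapped points Y (2- or 3-dimensional), using Euclidean distances."""
    if len(X) != len(Y):
        raise ValueError("X and Y must contain the same number of points")
    sum_delta = 0.0       # normalizing term: sum of input-space distances
    weighted_error = 0.0  # sum of squared errors, each divided by delta_ij
    for i, j in combinations(range(len(X)), 2):  # all pairs with i < j
        delta = math.dist(X[i], X[j])  # distance in the input space
        d = math.dist(Y[i], Y[j])      # distance in the mapped space
        if delta == 0.0:
            raise ValueError("coincident input points are not supported")
        sum_delta += delta
        weighted_error += (delta - d) ** 2 / delta
    return weighted_error / sum_delta


# A mapping that preserves all inter-point distances has zero cost.
X = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (6.0, 8.0, 0.0)]
Y = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
stress = sammon_stress(X, Y)
```

In an actual mapping procedure this value would be minimized iteratively over the coordinates in Y, starting from the initial configuration described above; the sketch only evaluates the cost.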
2.2.2 Multi-dimensional scaling method
Multi-dimensional scaling (MDS) is another technique that, like Sammon’s method, performs a
point-to-point mapping while preserving inter-point distances. Given a set of P data vectors in the
input space, associations in the Euclidean sense between the P(P-1)/2 pairs of points
are approximately preserved in the low-dimensional space (Duda and Hart, 1973). The cost function
for finding an optimum configuration in the low-dimensional space may be defined in one of the
following ways:
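$$E_{MDS-1} = \frac{1}{\sum_{i<j}^{P} (\delta_{ij})^{2}} \sum_{i<j}^{P} (\delta_{ij} - d_{ij})^{2}, \tag{2.3a}$$
$$E_{MDS-2} = \sum_{i<j}^{P} \left( \frac{\delta_{ij} - d_{ij}}{\delta_{ij}} \right)^{2}, \tag{2.3b}$$
$$E_{MDS-3} = \frac{1}{\sum_{i<j}^{P} \delta_{ij}} \sum_{i<j}^{P} \frac{(\delta_{ij} - d_{ij})^{2}}{\delta_{ij}}, \tag{2.3c}$$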
where $\delta_{ij}$ is the measure of dissimilarity in $\Re^N$, while $d_{ij}$ is that in $\Re^2$ or $\Re^3$, i = 1, 2, … , P
and j = 1, 2, … , P. $E_{MDS-1}$ emphasizes the largest errors in the mapping operation, $E_{MDS-2}$
emphasizes the largest fractional errors, while $E_{MDS-3}$ is a compromise between the previous two, as it
emphasizes the largest product of the two errors; it is also the cost function used in Sammon’s mapping
method. The initial configuration in the low-dimensional space is chosen randomly or by using a
projection of the vectors along the direction of largest variance.
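To make the difference between the three criteria concrete, the Python sketch below evaluates all of them for the same configuration. It is an illustrative sketch only: the function name `mds_costs` is an assumption, Euclidean distance is assumed as the dissimilarity measure in both spaces, and all input points are assumed to be distinct.

```python
import math
from itertools import combinations


def mds_costs(X, Y):
    """Return (E_MDS-1, E_MDS-2, E_MDS-3) for input vectors X and their
    low-dimensional images Y, using Euclidean dissimilarities."""
    sum_delta = 0.0     # sum of delta_ij, normalizer of E_MDS-3
    sum_delta_sq = 0.0  # sum of delta_ij squared, normalizer of E_MDS-1
    sq_err = 0.0        # sum of squared absolute errors
    frac_err = 0.0      # sum of squared fractional errors
    mixed_err = 0.0     # sum of squared errors divided by delta_ij
    for i, j in combinations(range(len(X)), 2):  # all pairs with i < j
        delta = math.dist(X[i], X[j])
        d = math.dist(Y[i], Y[j])
        err = delta - d
        sum_delta += delta
        sum_delta_sq += delta ** 2
        sq_err += err ** 2
        frac_err += (err / delta) ** 2
        mixed_err += err ** 2 / delta
    return sq_err / sum_delta_sq, frac_err, mixed_err / sum_delta


# Three collinear points whose spacing is distorted by the mapping.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
Y = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
e1, e2, e3 = mds_costs(X, Y)
```

For this toy configuration the two distorted pairs have the same absolute error, so the first criterion weights them equally, whereas the second and third give more weight to the pair with the smaller input distance.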
2.3 Self-Organizing Feature Map (SOFM)
The self-organizing feature map (Kohonen, 1997) is an unsupervised clustering algorithm that
develops a set of “internally ordered” cluster units (nodes) by exploiting hidden redundant and
complementary feature vectors and tries to preserve topological associations of the input space
within its predefined lattice. A continuous input space of activation patterns is mapped onto a
discrete output space of nodes, arranged on a predefined lattice, by a process of competition
among nodes in the SOFM network (Haykin, 1999). The response of the network, to an input
pattern presented to it, is interpreted in terms of the position of the node in the predefined lattice
or the weight vector of a node closest to the input pattern in the Euclidean sense.
If χ represents the N-dimensional, spatially continuous input space comprising a set {XN} of
activation patterns, whose topology is defined by the metric relationships of the vectors xp ∈ χ,
and ϖ denotes the spatially discrete SOFM space, then, in the mathematical sense (Haykin, 1999),
the SOFM non-linear transformation may be expressed as
Φ : χ → ϖ (2.4)
where Φ is the SOFM non-linear mapping between the input data space and the weight vectors of
the SOFM space (Figure 2.1). Every input vector gets assigned to a cluster unit in the discrete
SOFM space. The weight vector of the assigned cluster unit forms a representation of the input
vector and may be considered as an image of that cluster unit when it is projected in the input
space.
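The assignment of input vectors to cluster units described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that an input is assigned to the node whose weight vector is closest in the Euclidean sense; the function names `best_matching_unit` and `assign_to_clusters` are hypothetical, and the weight vectors are taken as given rather than learned.

```python
import math


def best_matching_unit(x, weights):
    """Return the index of the node whose weight vector is closest to the
    input vector x in the Euclidean sense."""
    return min(range(len(weights)), key=lambda k: math.dist(x, weights[k]))


def assign_to_clusters(inputs, weights):
    """Map every input vector to a cluster unit; returns a dictionary from
    node index to the list of input indices assigned to that node."""
    clusters = {k: [] for k in range(len(weights))}
    for p, x in enumerate(inputs):
        clusters[best_matching_unit(x, weights)].append(p)
    return clusters


# Two cluster units and three input vectors in a 3D input space.
weights = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
inputs = [(0.1, 0.0, 0.0), (0.9, 1.0, 1.0), (0.2, 0.1, 0.0)]
clusters = assign_to_clusters(inputs, weights)
```

Here `weights[k]` plays the role of the image of cluster unit k in the input space, and the returned dictionary is a discrete realization of the mapping Φ for the given inputs.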