THREE-TIER DETECTION AND MULTI-LEVEL SYNERGY FOR COASTAL
MIXED-LAND ZONE CLASSIFICATION
MUHAMAD ASYRAF BIN MOHD POUZI
UNIVERSITI TEKNOLOGI MALAYSIA
A thesis submitted in partial fulfillment of the
requirements for the award of the degree of
Master of Science (Computer Science)
Faculty of Computing
Universiti Teknologi Malaysia
MAY 2013
All praises to Allah the Almighty for
the strengths and His blessing in completing this thesis.
Specially dedicated to:
my beloved parents Mohd Pouzi bin Hamzah and Murni binti Ghani
my precious siblings Muhamad Afiq and Adlina Najihah
my most helpful friend Ismaliza binti Ismail
ACKNOWLEDGEMENT
I would like to express my gratitude to my supervisor, Dr. Muhamad Razib
bin Othman, my co-supervisor, Dr. Hishammuddin bin Asmuni, and Dr.
Rohayanti binti Hassan for their patience, guidance, encouragement, invaluable
comments and advice, which made this research possible. I would like
to thank all members of the Laboratory of Computational Intelligence and
Biotechnology (LCIB) for their continuous support in many aspects of this research.
My deepest appreciation also goes to my parents, who encouraged me to pursue
my MSc. My father, who is a lecturer, shared his valuable research experience,
including ethics and skills, while my mother has continuously given her
greatest support. The strength to withstand the hardships encountered along
this road came from their aspirations, which live in me and will inspire me to
achieve greater things in life ahead, God willing.
The datasets used in this study are courtesy of a Geographic Information
System (GIS) solution provider and the Malaysian Remote Sensing Agency (ARSM).
Last but not least, I would like to acknowledge the funding from GATES BIOTECH
Solution Sdn. Bhd. (GBIT) under the GATES Scholars Foundation (GSF) scheme
(LTR/GSF/2011-06) and the MyMaster Scholarship of the Ministry of Higher
Education Malaysia, as well as the research opportunity provided by the Faculty
of Computing, Universiti Teknologi Malaysia.
ABSTRACT
Vegetation, urban terrain and water are considered problematic segments in
land use and land cover classification because confusion factors make them
prone to high misclassification levels. Addressing these problems requires
attention to several fundamental issues: ineffective stand-alone data
classification, the high investment required for data fusion and the need for
frequent data collection. This research therefore proposes a classification
method consisting of two components, Three-tier Detection (TTD) and Multi-level
Synergy (MLS), developed after evaluating LiDAR point cloud, aerial
photography, Quickbird and Landsat 7 ETM+ images. TTD, a hierarchical and
priority-based data fusion method, is used for vegetation and urban terrain
classification, while MLS, a synergy strategy that combines a single data
source with robust learning algorithms, is used for water classification. TTD
outperforms stand-alone data classification, which makes it a worthwhile
investment, while the use of a single data source in MLS can meet the demand
for frequent data collection. Both methods start with data processing such as
image filtering, followed by a comparison of several existing techniques for
each data type (ranking) to identify their potential and limitations. Next,
multi-level data fusion and multi-level synergy are conducted for TTD and MLS,
respectively. The dataset covers Bukit Kanada, Sarawak, which exemplifies a
coastal mixed-land zone. Performance is measured using statistical indices
including overall accuracy and the Kappa Index of Agreement. Both TTD and MLS
outperformed recent works such as the Normalized Digital Surface Model, the
Edge Detection technique and Support Vector Machine. Based on the success
rates, TTD is suitable for the planning and development sectors and for the
management and detection of land use changes, while MLS is suitable for
creating maps and charts and for monitoring the national coastline.
ABSTRAK
The main challenge in land use and land cover classification is the confusion
that occurs in problematic segments such as vegetation, urban areas and water.
These segments are exposed to a high level of misclassification. To address
this problem, several fundamental issues need to be emphasized: stand-alone
data classification is ineffective, while data fusion involves high investment
and depends on a high frequency of data collection. This study therefore
introduces a classification method consisting of two important components,
Three-tier Detection (TTD) and Multi-level Synergy (MLS), after evaluating
several data types. TTD is a hierarchical and priority-based data fusion method
used for vegetation and urban area classification, while MLS is a synergy
strategy based on a single data source and learning algorithms, used for water
classification. The performance of TTD, which surpassed stand-alone data
classification, makes it a worthwhile investment, while MLS, which operates on
a single data source, is seen as able to meet the dependence on a high
frequency of data collection. Both methods begin with data processing such as
image filtering, followed by a comparison of several existing techniques for
each data type to identify their potential and weaknesses. Next, multiple
levels of data fusion and synergy are tested for TTD and MLS. The study
location is Bukit Kanada, Sarawak, a mixed-land zone in a coastal area. Based
on statistical indices including overall accuracy and the Kappa Index of
Agreement, TTD and MLS outperformed recent works such as the Normalized
Digital Surface Model, the Edge Detection technique and Support Vector
Machine. With this performance, TTD is suitable for application in the
planning and development sector and in the management and detection of land
use changes, while MLS is suitable for producing maps and charts and for
monitoring national waters.
TABLE OF CONTENTS
CHAPTER TITLE
DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
1 INTRODUCTION
1.1 Background
1.2 Challenges of Land Use and Land Cover Classification
1.3 Current Methods in Land Use and Land Cover Classification
1.4 Problem Statement
1.5 Objectives of the Study
1.6 Scope and Significance of the Study
1.7 Organization of the Thesis
2 LITERATURE REVIEW
2.1 Introduction
2.2 Coastal Mixed-land Zone
2.3 Land Use and Land Cover Classification
2.4 Remote Sensing Data
2.5 Data Fusions
2.6 Remote Sensing Data and Learning Algorithm(s)
2.7 Information Fusion Schemes
2.8 Trends and Directions
2.9 Summary
3 RESEARCH METHODOLOGY
3.1 Introduction
3.2 Research Framework
3.3 Data Sources and Preparation
3.3.1 LiDAR Point Cloud
3.3.2 Aerial Photography
3.3.3 Quickbird Image
3.3.4 Landsat 7 ETM+ Image
3.3.5 Area-of-Interest
3.4 Instrumentation and Results Analysis
3.4.1 Hardware and Software Requirements
3.4.2 Testing and Analysis
3.4.3 Evaluation Metrics
3.5 Summary
4 A COMPARATIVE STUDY OF STAND-ALONE DATA APPLICATION IN COASTAL MIXED-LAND
ZONE
4.1 Introduction
4.2 Related Works
4.3 Materials and Method
4.3.1 Test Scenes
4.3.2 Generation of DSM and DTM
4.3.3 Orthorectification
4.3.4 Image Filtering
4.3.5 Signature Analysis
4.3.6 Classification
4.4 Results and Discussion
4.4.1 Analysis on Classification Results
4.4.2 Analysis on Supervised Classifiers
4.4.3 Analysis on Non-classifiers
4.4.4 Analysis on False Negative and False Positive
4.5 Summary
5 THREE-TIER DETECTION AS EFFECTIVE DATA FUSIONS FOR VEGETATION AND URBAN
TERRAIN CLASSIFICATION IN COASTAL MIXED-LAND ZONE
5.1 Introduction
5.2 Related Works
5.3 Materials and Method
5.3.1 Test Scenes
5.3.2 Orthorectification
5.3.3 Image Filtering
5.3.4 Signature Analysis
5.3.5 Classification
5.3.5.1 Stand-alone Data
5.3.5.2 Multi-level Data Fusions
5.4 Results and Discussion
5.4.1 Analysis on Classification Results
5.4.1.1 Analysis on Tier-1 (Inter-data Comparisons)
5.4.1.2 Analysis on Tier-2 (Fusion of Two Data)
5.4.1.3 Analysis on Tier-3 (Fusion of Three Data)
5.4.2 Analysis on False Negative and False Positive
5.4.3 Comparison to Other Related Works
5.5 Summary
6 MULTI-LEVEL SYNERGY (MLS) OF AERIAL PHOTOGRAPHY WITH THE FUSION OF
LEARNING ALGORITHMS FOR WATER CLASSIFICATION IN COASTAL MIXED-LAND ZONE
6.1 Introduction
6.2 Related Works
6.3 Materials and Method
6.3.1 Test Scenes
6.3.2 Orthorectification
6.3.3 Image Filtering
6.3.4 Signature Analysis
6.3.5 Classification
6.3.5.1 Synergy of Single Remote Sensing Data with Different Computational
Techniques
6.3.5.2 Multi-level Synergy of Remote Sensing Data with Computational
Techniques
6.4 Results and Discussion
6.4.1 Analysis on Stage 1 (Ranking Determination)
6.4.2 Analysis on Stage 2 (Synergy-based Classification)
6.4.3 Analysis on McNemar’s Test for End Result Selection
6.4.4 Analysis on False Negative and False Positive
6.4.5 Comparison with Other Related Works
6.5 Summary
7 CONCLUSION
7.1 Concluding Remarks
7.2 Contributions of the Research
7.3 Future Works
7.4 Closing Remarks
REFERENCES
LIST OF TABLES
TABLE NO. TITLE
2.1 The advantage and disadvantage of LULC classification patterns that
include amount of classification steps taken, information provided and also
time consumption
2.2 Example of works for each LULC classification patterns
2.3 Descriptions of remote sensing data. Note that RGB denotes the red, green
and blue, respectively
2.4 Related works that utilize data capabilities as well as the advantage and
disadvantage reported for each of the method
2.5 Results of the method by Awrangjeb et al. (2010). The Cmp and Crp denote
completeness and correctness, respectively while both Ql and Qlp denote
quality. Low completeness and correctness level in Scene 2 and also low
quality level in Scene 3
2.6 Results of the method proposed by Hermosilla et al. (2011). The µ denotes
mean value while σ denotes standard deviation. The high µ value performance
indicates that the thresholding-based approach performed better, meanwhile the
low σ value suggests a better robustness for this approach
2.7 LULC classification based on data fusions that consist of various type of
data and methods
2.8 Results of the OMISII image classification by Du et al. (2012). The
application of MCS outperformed single classifier-based classification and
between the MCS, the concatenation combination outperformed the others
2.9 The application of learning algorithms to remote sensing data including
the advantage and disadvantage
3.1 Testing analysis conducted in this study for particular SOI by using
several data types as well as the respective chapter for reference
3.2 KIA strength of agreement by Landis and Koch (1977)
4.1 Filtering formula and weighting function of ELF. Csi is standard speckle
index, Ci is varied standard speckle index, L is the number of looks, Cmax is
the upper threshold and Kd is called damping factor
4.2 RGB attributes for the imageries based on the signature analysis of both
SOIs. 10 samples for each SOI. These spectral information used as a reference
for supervised classifiers in this study
4.3 Results for stand-alone data classification (average of three test scenes)
for vegetation and urban terrain. The highlighted rows indicate the best
method for each data for vegetation and urban terrain classification
4.4 FN and FP for vegetation and urban terrain classification. The ‘*’
indicates that the process is conducted for vegetation classification only
while the highlighted rows indicate the best method for each data to classify
the SOIs
5.1 RGB attributes for the imageries based on the signature analysis of both
SOIs. 10 samples for each SOI. These spectral information used as a reference
for supervised classifiers in this study
5.2 Results for TTD (average of five test scenes) for each SOI. The
highlighted rows indicate the best method in each tier
5.3 Decrement of FNs and FPs for the classification using TTD. The highlighted
rows signify no change to the FN or FP values. This phenomenon is caused
either by the value is already 0 or no mutual agreement is achieved with the
additional supportive layer
6.1 RGB attributes for the imageries based on the signature analysis of water
(20 samples). These spectral information used as a reference for supervised
classifiers in this study
6.2 The results of water classification in Stage 1 (average of ten test
scenes) for LPC, AP, QB and LDST
6.3 The results of water classification in Stage 2 (average of ten test
scenes). The highlighted row indicates the selected synergy level for the end
results
6.4 Statistical significance of differences in overall classification accuracy
expressed by means of the McNemar’s test for the datasets. Statistically
significant differences at 5% level of significance (|Zij| ≥ 1.96) are
highlighted in bold face. The 4th and 5th level of synergy do not satisfy the
McNemar’s rule. Thus, the 3rd level of synergy is chosen as the final result
6.5 Decrement of FNs and FPs for the classification using MLS
6.6 Superiority of the proposed synergy approach over data fusions (average of
ten test scenes). The ‘’ indicates synergy
LIST OF FIGURES
FIGURE NO. TITLE
1.1 Comparisons between a coastal mixed-land zone and urban area; (a) a
coastal mixed-land zone and (b) urban area
1.2 Several characteristics of a coastal mixed-land zone; (a) seawater, (b)
industrial areas, (c) port facilities, and (d) roads
1.3 Scope of the study. Note that “*” indicates the involvement in water
classification only
2.1 The characteristics of vegetation classes
2.2 The possible confusion factors
2.3 Rapid changes of LULC segments in year 2003
2.4 Patterns in LULC classification
2.5 Examples of remote sensing data. (a) LPC, (b) AP, (c) QB and (d) LDST.
Note that for (d), the similar segment with (a) to (c) is highlighted in black
box
2.6 Swimming pool and flowering plants in pots (highlighted in red box)
2.7 Results of the method proposed by Pérez-Hoyos et al. (2012). Note that OA
denotes Oac
2.8 Results of the comparative study by Du et al. (2012) for Decision Level
Fusion. The Majority Voting (highlighted in red box) was ranked first over
Weighted Majority Vote, Fuzzy Integral and Dempster-Shafer evidence theory
(denoted by DS) in all diversity cases
2.9 Results of the comparative study by Du et al. (2013) that include Data
Level Fusion and Decision Level Fusion. Again, the Majority Voting (denoted by
MV and highlighted in red box) showed outstanding performance followed by
Fuzzy Integral (denoted by FI), Dempster-Shafer evidence theory (denoted by
D-S), Generalized Intensity-Hue-Saturation (denoted by GIHS), Principal
Component Analysis (denoted by PCA), Gram-Schmidt (denoted by GS), High Pass
Filter (denoted by HPF) and Wavelet Transform (denoted by WT). In short, the
Majority Voting works by collecting the agreement between particular numbers
of parties in deciding the final decision.
2.10 Directions of this study in improving the SOIs classification
3.1 Research framework
3.2 Research structure
3.3 The LPC data
3.4 The AP data
3.5 The QB data
3.6 The LDST data
3.7 The details of Sarawak and overview of Bukit Kanada
4.1 Flowchart of the classification procedures
4.2 The three test scenes for each data
4.3 The Digital Surface Model (DSM) and Digital Terrain Model (DTM)
4.4 The results of orthorectification process
4.5 The results of enhanced Lee filtering (ELF)
4.6 The results for all methods in each test scene
4.7 The KIA order of learning algorithms. (a) AP and (b) QB
4.8 Examples of the confusion factors (highlighted in red box). The first rows
are the original images while the second rows are the classified images; (a)
mossy roofs of old buildings, (b) dark coloured buildings, (c) shadow and (d)
ground class. The green colour should indicate the vegetation classification
result
4.9 The limitations of the classification. (a) LPC (TP for vegetation is
highlighted in green while low vegetation is undetected), (b) LPC (TP for the
building is highlighted in blue while the uneven building structure is not
fully detected), (c) LPC (particular buildings are incorrectly classified as
medium vegetation highlighted in red) and (d) QB (shaded areas)
5.1 Framework of TTD. “k” denotes the selected method in Chapter 4
5.2 Algorithm of TTD for the fusions of LPC, AP and QB
5.3 The test scenes for this study. (a) TS1, (b) TS2, (c) TS3, (d) TS4 and (e)
TS5
5.4 The results of orthorectification process. (a) to (e) are TS1 to TS5 (AP),
respectively while (f) to (j) are TS1 to TS5 (QB), respectively
5.5 The results of enhanced Lee filtering (ELF)
5.6 Illustration of multi-level data fusions by TTD
5.7 The results of TTD in each test scene from Tier-1 to Tier-3
5.8 The differences between Majority Voting with the proposed TTD
6.1 Framework of MLS. Note that “” is synergy and “*” indicates the process is
conducted for LDST only
6.2 Algorithm of MLS. The synergy approach () between Rsd and Ct
6.3 The test scenes for this study. TS1 to TS5 show the characteristics of
water areas for: (a) AP, (b) QB and (c) LDST, and (d) TS6 to TS10 are the
different ratio of LULC segments.
6.4 The results of orthorectification process
6.5 The results of enhanced Lee filtering. (a) TS1 to TS5 and (b) TS6 to TS10
6.6 The concept of the proposed synergy approach. Note that in (c), the outputs
of R1 synergized with R2 are highlighted in red. These regions indicate the
overlapped regions between the FP or FN of R1 and R2
LIST OF ABBREVIATIONS
AB - AdaBoost
AP - Aerial photography
AOI - Area-of-interest
ANN - Artificial Neural Networks
BB - Bagging and Boosting
Bf - Branching factor
Ce - Commission error
Cm - Completeness
Ct - Computational technique
Cr - Correctness
DS - Dempster-Shafer
DEM - Digital Elevation Model
DSM - Digital Surface Model
DTM - Digital Terrain Model
ED - Edge Detection
ELF - Enhanced Lee Filtering
FN - False Negative
FP - False Positive
FS - Fuzzy Sets
GIS - Geographic Information System
Gr - Green
GCP - Ground control point
htnDSM - Height threshold of nDSM
ICP - Image control points
J48 DT - J48 Decision Tree
KIA - Kappa Index of Agreement
kNN - k-Nearest Neighbor
LC - Land Cover
LU - Land Use
LDST - Landsat 7 ETM+
LPC - LiDAR point cloud
ARSM - Malaysian Remote Sensing Agency
ML - Maximum Likelihood
M - Methods
MIR - Middle infrared band
MD - Minimum Distance
Mf - Miss factor
MODIS - Moderate-resolution Imaging Spectroradiometer
MNDWI - Modified Normalized Difference Water Index
MLP - Multi-layer Perceptron
MLPNN - Multi-layer Perceptron Neural Network
MLS - Multi-level Synergy
Multi - Multiple
MCS - Multiple classifier system
NB - Naïve Bayes
NIR - Near-infrared
NN - Neural Network
NDVI - Normalized Difference Vegetation Index
nDSM - Normalized Digital Surface Model
Oe - Omission error
Oac - Overall accuracy
PP - Parallelepiped
PC - Personal Computer
Q - Quality
QB - Quickbird
RBF - Radial Basis Function
RAM - Random Access Memory
RF - Random Forest
R - Rank
RGB - Red, green and blue
Rsd - Remote sensing data
S - Sample
SOI - Segment-of-interest
SOM - Self-Organizing Map
SRTM - Shuttle Radar Topography Mission
Si - Single
SAM - Spectral Angle Mapper
SVM - Support Vector Machine
SAR - Synthetic Aperture Radar
TS - Test scenes
TTD - Three-tier Detection
TIN - Triangulated Irregular Network
TN - True Negative
TP - True Positive
UTM - Universal Transverse Mercator
CHAPTER 1
INTRODUCTION
1.1 Background
From a Geographic Information System (GIS) perspective, the Earth's surface
can be described in two ways: (i) Land Use (LU), which describes the human use
of land and involves the management or modification of the natural environment
or wilderness into a built environment such as residential and industrial
areas; and (ii) Land Cover (LC), which describes the physical materials at the
surface of the Earth such as vegetation, urban terrain and water segments. The
main difference between LU and LC is that LU concerns the changes made to the
Earth’s surface, while LC includes both changed and natural areas. Among the
important aspects related to LU and LC is classification, which also depends
on the efficiency of data procurement. Traditional data collection involves
physical contact with the objects, is costly and time-consuming, and cannot
easily reach some areas such as volcanic, landslide and remote areas. These
limitations have been overcome by enhanced remote sensing technology using
planes, unmanned aerial vehicles and satellites, which requires no physical
contact, costs less, is fast and has a long reach. The capabilities of the
data also increase over time, and each data type carries particular
advantages, some of which are unique. LiDAR point cloud (LPC), aerial
photography (AP), Quickbird (QB) and Landsat 7 ETM+ (LDST) images are examples
of the remote sensing data types used for classification.
The classification task takes place in two types of area-of-interest (AOI; see
Figure 1.1): (i) mixed-land zones (Garcia-Gutierrez et al., 2011); and (ii)
urban areas (Awrangjeb et al., 2010). A mixed-land zone is an area in which
land is used in a variety of ways, such as industrial areas, port facilities,
roads and natural areas, within the same space (Garcia-Gutierrez et al., 2011;
see Figure 1.2). It usually consists of low and medium-sized buildings and
large vegetation segments. These characteristics exist in almost all countries
in the world. In a coastal mixed-land zone, however, seawater is present in
addition to freshwater. Meanwhile, an urban area consists mainly of tall
buildings and smaller vegetation areas, and may contain more modern and
sophisticated structures. Obvious examples are the Petronas Twin Towers and KL
Tower in Kuala Lumpur as well as the Taming Sari Tower in Melaka. The basic
criteria used to differentiate these two AOIs are building density and total
vegetation area: a coastal mixed-land zone commonly has a lower building
density and a larger total vegetation area than an urban area.
Figure 1.1 Comparisons between a coastal mixed-land zone and urban area; (a) a
coastal mixed-land zone and (b) urban area.
Figure 1.2 Several characteristics of a coastal mixed-land zone; (a) seawater, (b)
industrial areas, (c) port facilities, and (d) roads.
1.2 Challenges of Land Use and Land Cover Classification
The existence of many types of data with various capabilities, accompanied by
advances in computing, provides wider options for improving the accuracy of
land use and land cover (LULC) classification. These options include the
Normalized Digital Surface Model (nDSM) application presented by Demir et al.
(2008), the height threshold of nDSM (htnDSM) application by Hermosilla et al.
(2011) and the Edge Detection (ED) technique by Babykalpana and Thanushkodi
(2011). The nDSM, htnDSM and ED techniques were reported to produce good
classification accuracy, but with certain limitations. This is where the first
challenge lies. The experiments reported by previous researchers are not
comparable with one another because many parameters differ, such as: (i)
dataset; (ii) method; and (iii) segment-of-interest (SOI). These
inconsistencies make comparison difficult, yet it is crucial to recognize the
advantages and disadvantages of each remote sensing data type, technique and
proposed method in order to provide better solutions for particular problems.
Several works, such as those by Sohn and Dowman (2007) and Awrangjeb et al.
(2010), have suggested data fusion as a new option for LULC classification. It
enables the collection of useful information from different sensors (Campos et
al., 2010) and is reported to outperform stand-alone data classification. A
number of researchers, such as Rottensteiner et al. (2005) and Hyde et al.
(2006), have demonstrated the potential of data fusion as a new alternative in
LULC classification. To consider the data fusion options, the second challenge
must be tackled. This challenge concerns the data fusion methods used to solve
the classification problem, since current results are not yet at a
satisfactory level. For example, the method by Campos et al. (2010), which
fuses LPC data with QB data, achieved a Kappa Index of Agreement (KIA) value of
0.78, which is in the “Good” range of the Kappa strength of agreement and
below the “Very good” range. Awrangjeb et al. (2010) fused various products of
LPC with QB, and reported low completeness and correctness levels in
particular test scenes as well as low quality levels. More recently,
Pérez-Hoyos et al. (2012) created a synergetic land-cover map using four types
of data. However, some issues arose, such as the reproducibility of the method
and the availability and price of the data. In short, the key to tackling the
second challenge lies in how the first challenge is handled.
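A KIA value such as the 0.78 quoted above is computed from a confusion matrix.
The following is a minimal sketch, assuming the standard Cohen's kappa
definition. The function names, the example matrix and the band limits (above
0.60 for “Good”, above 0.80 for “Very good”) are illustrative assumptions
chosen to be consistent with the 0.78 example, not values taken from the cited
works.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples of reference class i that were
    assigned to class j (a layout assumed for this sketch).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no samples")
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


def kappa_band(kappa):
    """Map kappa to a strength band using the assumed limits noted above."""
    if kappa > 0.80:
        return "Very good"
    if kappa > 0.60:
        return "Good"
    return "Below Good"


# Hypothetical two-class example (rows: reference, columns: classified).
oac, kia = overall_accuracy_and_kappa([[45, 5], [10, 40]])
```

Kappa discounts the agreement that would be expected by chance, which is why
it is reported alongside overall accuracy rather than instead of it.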
In contrast to the vegetation and urban terrain segments, the water segment is
subject to many factors that rapidly affect its condition, such as daily human
activity, natural phenomena and pollution. Hence, frequent classification is
needed to fulfill the purposes of water classification, such as monitoring the
national coastline. Since this task requires a high frequency of data
collection, the investment in preparing the data must be minimized. Such
circumstances require good results to be obtained from a single data source,
which leads to the consideration of learning algorithms and water indices, and
this is where the third challenge stems from. The multiple classifier system
(MCS) is an advanced learning-algorithm approach that has been shown by Du et
al. (2012) to outperform a single classifier, while the Modified Normalized
Difference Water Index (MNDWI) is a well-known water index that was explored
by Ho et al. (2010). The MNDWI is capable of classifying water bodies owing to
the sensitivity of LDST data. However, several issues arise, such as the choice
of classifiers, which affects the effectiveness and time consumption of the
MCS, and the identification of the superior method for water classification.
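The two ideas in this paragraph can be sketched per pixel. The text does not
spell out the MNDWI formula or how an MCS combines its classifiers, so the
sketch below assumes the commonly used MNDWI definition based on green and
middle-infrared reflectance (bands 2 and 5 on Landsat 7 ETM+), a water
threshold of 0, and simple majority voting as one possible combination rule.
All function names are hypothetical.

```python
from collections import Counter


def mndwi(green, mir):
    """MNDWI = (Green - MIR) / (Green + MIR) for a single pixel."""
    denominator = green + mir
    if denominator == 0:
        return 0.0  # assumed convention for pixels with no signal
    return (green - mir) / denominator


def is_water(green, mir, threshold=0.0):
    """Flag a pixel as water when MNDWI exceeds an assumed threshold."""
    return mndwi(green, mir) > threshold


def majority_vote(labels):
    """Combine the labels of several classifiers for one pixel."""
    if not labels:
        raise ValueError("at least one classifier label is required")
    # most_common orders ties by first appearance, so on a tie the label
    # from the earliest classifier in the list wins.
    return Counter(labels).most_common(1)[0][0]
```

The number of classifiers passed to a voting rule of this kind drives both the
run time and the chance of ties, which is the trade-off the paragraph raises
for the MCS.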
1.3 Current Methods in Land Use and Land Cover Classification
Generally, current methods for LULC classification can be categorized into
two groups, stand-alone data and data fusions:
(i) Stand-alone data - utilizes a single dataset for the classification task
by exploiting the benefits of each data type. For LPC, the methods include
nDSM (Brennan and Webster, 2006; Demir et al., 2008), ED (Babykalpana
and Thanushkodi, 2011) and LPC with intelligent techniques (Garcia-Gutierrez
et al., 2011). For imagery data, a single learning algorithm has been applied
(Foody et al., 2007; Perumal and Bhaskaran, 2010; Szuster et al., 2011), while
the fusion of learning algorithms was implemented by Du et al. (2012). Lee and
Yeh (2009) utilized the near-infrared band of QB images through the Normalized
Difference Vegetation Index (NDVI), which is a well-known vegetation index.
(ii) Data fusions - utilize more than one dataset for the classification task,
for which many schemes have been observed. Amarsaikhan et al. (2010) fused the
nDSM of LPC data with Support Vector Machine (SVM) application on imagery
data. Khoshelham et al. (2010) fused the htnDSM with an MCS that consists of
SVM and Maximum Likelihood (ML). Guan et al. (2012) fused Triangulated
Irregular Network (TIN) and nDSM with SVM. Elghazali (2011) and Campos
et al. (2010) fused LPC with QB data.
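Two of the quantities named in this list have simple closed forms: the nDSM is
the Digital Surface Model minus the Digital Terrain Model, and the NDVI is
(NIR - Red) / (NIR + Red). The sketch below computes both and then combines
them with a toy decision rule to show how a height layer and a spectral layer
can complement each other. The rule, its thresholds (2.5 m and 0.3) and all
names are hypothetical illustrations; they are not the fusion schemes of the
cited authors or the TTD method of this thesis.

```python
def ndsm(dsm, dtm):
    """Normalized DSM: per-cell object height, DSM minus DTM (same grid)."""
    return [[s - t for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(dsm, dtm)]


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red) for a single pixel."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumed convention for pixels with no signal
    return (nir - red) / denominator


def label_cell(height_m, ndvi_value, min_height_m=2.5, min_ndvi=0.3):
    """Toy fusion rule: height separates raised objects, NDVI separates green ones."""
    elevated = height_m >= min_height_m   # height threshold on the nDSM
    green = ndvi_value >= min_ndvi        # vegetation threshold on the NDVI
    if elevated and green:
        return "high vegetation"
    if elevated:
        return "building"
    if green:
        return "low vegetation"
    return "ground"


# Hypothetical 1 x 2 grid: a 10 m object next to bare ground.
heights = ndsm([[12.0, 5.0]], [[2.0, 5.0]])
```

Height alone cannot tell a tree from a building of similar height, and NDVI
alone cannot tell grass from a tree canopy, which is the motivation for fusing
the two kinds of data.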
1.4 Problem Statement
The accuracy of LULC classification is a general problem because LULC does not
consist only of vegetation, urban terrain and water segments. In such a
confusing environment, methods for reducing misclassification levels by using
a large amount of data have not been adequately investigated. To solve this
problem, various classification schemes have been explored, with single data
application, data fusions and learning algorithms among the options. However,
for some purposes, such as water classification, classification results are
required frequently over a period of time (the details are presented in
Chapter 6). Hence, the investment in preparing the data and the dependency on
frequent data collection need to be addressed. Moreover, the results reported
by different researchers are not comparable because of their differing
experimental configurations. The classification problem to be solved in this
study can therefore be described as follows:
“Given a number of remote sensing data types with particular capabilities, and
exploitation methods whose advantages and disadvantages are barely known,
the main problem is to produce accurate classification by reducing the
misclassification levels, namely False Negative (FN) and False Positive
(FP), caused by the presence of confusion factors in the study area, in order
to achieve higher overall accuracy (Oac) and Kappa Index of Agreement (KIA).
While current data fusions seem to be insufficiently effective, particular
purposes require good results to be obtained from a single data source. The
investment to prepare the data and the dependency on frequent data collection
are taken into account.”
Based on the above challenges, several factors need to be addressed by any
possible solution. The first factor is insufficient knowledge of the
advantages and disadvantages of particular methods in classifying a coastal
mixed-land zone, caused by differing experimental settings. As a result, the
results are not comparable and the actual potential of the methods remains
questionable. This study therefore aims to investigate a number of selected
methods to recognize their capabilities in terms of accuracy level and to
identify the contributors to misclassification.
The second factor is the unsatisfactory performance of current data fusion
techniques, which may lead to wasted investment. This is mainly caused by the
inability of the data fusions to fully utilize the potential of the data in
managing the misclassification factors that exist in a coastal mixed-land
zone. Recognizing this, once the advantages and disadvantages of the data and
methods have been identified, this study aims to utilize them properly in
order to reduce the FN and FP levels by considering several options of data
fusion techniques. The third factor is the need to reduce the investment in
preparing the data, given the demand for a high frequency of data collection.
Obtaining good results from a single data source leads to the use of learning
algorithms, which are among the options worth considering. Since the current
MCS performs at an unsatisfactory level and consists of too many classifiers,
this study aims to produce an MCS-based method with fewer classifiers, which
can reduce time consumption while still producing good results.
1.5 Objectives of the Study
The goal of this study is to develop a method that can produce high
classification levels for vegetation, urban terrain and water segments in the presence
of various confusion factors. This goal is divided into the following objectives:
(i) To perform a comparative study which investigates the potential of
each selected data and method in terms of classifying vegetation,
urban terrain and water segments in a coastal mixed-land zone.
(ii) To develop a method that utilizes the capabilities of the data for
vegetation and urban terrain and is able to manage the
misclassification factors that exist in a coastal mixed-land zone.
(iii) To establish a method based on MCS that is able to separate various
water types in a coastal mixed-land zone from other segments while
reducing the number of classifier combinations.
1.6 Scope and Significance of the Study
In this study, the remote sensing data used are LPC, AP, QB and LDST
images obtained from a GIS solution provider and the Malaysian Remote Sensing Agency
(ARSM: http://www.remotesensing.gov.my/). The proposed method consists of two
components: Three-tier Detection (TTD) and Multi-level Synergy (MLS). TTD
addresses the issues of vegetation and urban terrain classification, while MLS
addresses water classification. The proposed method is applied to a coastal
mixed-land zone exemplified by Bukit Kanada in Sarawak. Its performance is then
measured using established statistical indices, namely completeness (Cm),
correctness (Cr), omission error (Oe), commission error (Ce), branching factor (Bf),
miss factor (Mf), quality (Q), overall accuracy (Oac), Kappa Index of Agreement
(KIA) and McNemar’s test. The scope of this study is summarized in Figure 1.3.
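As an illustration only, the indices named in this section can be computed from
true positive (TP), false positive (FP) and false negative (FN) counts, and from a
confusion matrix. The sketch below assumes the definitions commonly used in the
building and road extraction evaluation literature; the exact formulations adopted
in this study may differ. The function names, argument names and example counts are
hypothetical and are not taken from the thesis.

```python
def segment_indices(tp, fp, fn):
    """Count-based indices for one segment (class).

    Assumed definitions (not confirmed by the thesis text):
    Cm = TP/(TP+FN), Cr = TP/(TP+FP), Oe = FN/(TP+FN), Ce = FP/(TP+FP),
    Bf = FP/TP, Mf = FN/TP, Q = TP/(TP+FP+FN).
    """
    if tp <= 0:
        raise ValueError("tp must be positive so that Bf and Mf are defined")
    return {
        "Cm": tp / (tp + fn),       # completeness
        "Cr": tp / (tp + fp),       # correctness
        "Oe": fn / (tp + fn),       # omission error (1 - Cm)
        "Ce": fp / (tp + fp),       # commission error (1 - Cr)
        "Bf": fp / tp,              # branching factor
        "Mf": fn / tp,              # miss factor
        "Q": tp / (tp + fp + fn),   # quality
    }


def overall_accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i that were
    assigned to class j.
    """
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


if __name__ == "__main__":
    # Hypothetical counts for a single segment.
    for name, value in segment_indices(tp=80, fp=20, fn=10).items():
        print(f"{name}: {value:.4f}")
    oac, kia = overall_accuracy_and_kappa([[80, 10], [20, 90]])
    print(f"Oac: {oac:.4f}, KIA: {kia:.4f}")
```

Under these assumed definitions, Oe and Ce are the complements of Cm and Cr, so a
method that lowers the FN and FP levels raises completeness, correctness and quality
together.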
The significance of this study can be divided according to the SOI: (i)
vegetation and urban terrain; and (ii) water. For vegetation and urban terrain, the
proposed method can serve the management, planning and development sectors, such as
estates, oil palm plantations, cities and housing. It can also be used to detect
land use changes such as soil erosion and deforestation, and to support natural
disaster management such as floods. The end results of water classification are
also substantial. Other than mapping rivers, lakes and reservoirs, the mapping of
coastal areas is very important for many countries to define and monitor their
national coastlines, create maps and charts, and monitor environmental change.
Nautical charts, which are among the coastal mapping products, are fundamental tools
for mariners in planning voyages and navigating ships using the shortest, safest and
most economical routes. In addition, coastal mapping is performed for coastal change
assessment, which determines the rate of change of the coast and can help with
future planning. The assessment is conducted by measuring the differences between
past and present shoreline locations; this ‘before’ and ‘after’ comparison is one
way in which scientists determine shoreline change.
[Figure 1.3: diagram of the scope of the study. Data: LiDAR point cloud, aerial
photography, Quickbird images, Landsat 7 ETM+ images*. Area-of-interest: Bukit
Kanada in Sarawak. Proposed method: Three-tier Detection (TTD), Multi-level Synergy
(MLS). Evaluation methods: completeness, correctness, omission error, commission
error, branching factor, miss factor, quality, overall accuracy, Kappa Index of
Agreement, McNemar’s test*.]
Figure 1.3 Scope of the study. Note that “*” indicates the involvement in water
classification only.
1.7 Organization of the Thesis
This thesis is organized into seven chapters. A brief description of the
content of each chapter is given below:
(i) Chapter 1 defines the challenges, problems, current methods,
objectives, scope and significance of the study.
(ii) Chapter 2 reviews the main subjects of interest which are coastal
mixed-land zone classification, remote sensing data, data fusions and
the application of remote sensing data with learning algorithm(s).
(iii) Chapter 3 provides the design of the computational method that
supports the objectives of the study. This includes research framework,
data sources, instrumentation and analysis of results.
(iv) Chapter 4 presents the comparative study of stand-alone data
application, in which several existing techniques were evaluated using
several statistical indices. The objectives are to produce comparable
results between the techniques and to identify the contributors to FN
and FP.
(v) Chapter 5 describes the proposed TTD, the data fusion approach used to
reduce the FN and FP levels for vegetation and urban terrain
classification in a coastal mixed-land zone.
(vi) Chapter 6 describes the other component of the proposed method, namely
MLS. MLS synergizes a single remote sensing data source with a fusion of
learning algorithms to solve the issues of water classification, and
thereby fulfils the requirement of high-frequency data collection.
(vii) Chapter 7 draws general conclusions from the accomplished results,
presents the contributions of the study and suggests several ideas
for related future work.
REFERENCES
Al-Ahmadi, F. S. and Hames, A. S. (2009). Comparison of four classification
methods to extract land use and land cover from raw satellite images for some
remote arid areas, Kingdom of Saudi Arabia. Earth Sciences. 20(1): 167–191.
Alajlan, N., Bazi, Y., Melgani, F., and Yager, R. R. (2012). Fusion of supervised and
unsupervised learning for improved classification of hyperspectral images.
Information Sciences. 217(1): 39–55.
Amarsaikhan, D., Blotevogel, H. H., Van Genderen, J. L., Ganzorig, M., Gantuya,
R., and Nergui, B. (2010). Fusing high-resolution SAR and optical imagery for
improved urban land cover study and classification. International Journal of
Image and Data Fusion. 1(1): 83–97.
Awrangjeb, M., Ravanbakhsh, M., and Fraser, C. S. (2010). Automatic detection of
residential buildings using LiDAR data and multispectral imagery. ISPRS
Journal of Photogrammetry and Remote Sensing. 65(1): 457–467.
Babykalpana, Y. and Thanushkodi, K. (2011). Classification of land use and land
cover change detection using remotely sensed data. International Journal of
Computer Science and Engineering. 3(4): 1638–1644.
Brennan, R. and Webster, T. L. (2006). Object-oriented land cover classification of
LIDAR-derived surfaces. Canadian Journal of Remote Sensing. 32(2): 167–172.
Campos, N., Lawrence, R., McGlynn, B., and Gardner, K. (2010). Effects of LiDAR-
Quickbird fusion on object-oriented classification of mountain resort
development. Journal of Applied Remote Sensing. 4(1): 1–14.
Ceamanos, X., Waske, B., Benediktsson, J. A., Chanussot, J., Fauvel, M., and
Sveinsson, J. R. (2010). A classifier ensemble based on fusion support vector
machines for classifying hyperspectral data. International Journal of Image and
Data Fusion. 1(4): 293–307.
Chen, L., Teo, T., Shao, Y., Lai, Y., and Rau, J. (2004). Fusion of LIDAR data and
optical imagery for building modelling. International Archives of the
Photogrammetry, Remote Sensing and Spatial Information Sciences. 35(Part B4):
732–737.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and
Psychological Measurement. 20(1): 37–46.
Demir, N., Poli, D., and Baltsavias, E. (2008). Extraction of building and trees using
images and LiDAR data and a combination of various methods. International
Archives of the Photogrammetry, Remote Sensing and Spatial Information
Sciences. 37(Part B4/1): 313–318.
Demir, N., Poli, D., and Baltsavias, E. (2009). Detection of buildings at airport sites
using images and LiDAR data and combination of various methods. International
Archives of the Photogrammetry, Remote Sensing and Spatial Information
Sciences. 38(Part 3/W4): 71–76.
Devi, R. M. and Baboo, S. S. (2011). Land Use and Land Cover Classification using
RGB&L based Supervised Classification Algorithm. International Journal of
Computer Science and Engineering Technology. 2(10): 167–180.
Du, P., Xia, J., Zhang, W., Tan, K., Liu, Y., and Liu, S. (2012). Multiple classifier
system for remote sensing image classification: A review. Sensors. 12(4): 4764–
4792.
Du, P., Liu, S., Xia, J., and Zhao, Y. (2013). Information fusion techniques for change
detection from multi-temporal remote sensing images. Information Fusion. 14(1):
19–27.
Elghazali, E. S. (2011). Performance of Quickbird image and LIDAR data fusion for
2d/3d city mapping. Australian Journal of Basic and Applied Sciences. 5(11):
1588–1600.
Eliason, E. M. and McEwen, A. S. (1990). Adaptive box filters for removal of
random noise from digital images. Photogrammetric Engineering and Remote
Sensing. 56(4): 453.
Er-Raki, S., Chehbouni, A., and Duchemin, B. (2010). Combining satellite remote
sensing data with the FAO-56 dual approach for water use mapping in irrigated
wheat fields of a semi-arid region. Remote Sensing. 2(1): 375–387.
Foody, G. M., Boyd, D. S., and Sanchez-Hernandez, C. (2007). Mapping a specific
class with an ensemble of classifiers. International Journal of Remote Sensing.
28(8): 1733–1746.
Frost, V. S., Stiles, J. A., Shanmugan, K. S., and Holtzman, J. C. (1982). A model for
radar images and its application to adaptive digital filtering of multiplicative
noise. IEEE Transactions on Pattern Analysis and Machine Intelligence. 4(2):
157–166.
Garcia-Gutierrez, J., Goncalves-Seco, L., and Riquelme-Santos, J. C. (2011).
Automatic environmental quality assessment for mixed land zone using LiDAR
and intelligent techniques. Expert Systems with Applications. 38(6): 6805–6813.
Guan, H., Ji, Z., Zhong, L., Li, J., and Ren, Q. (2012). Partially supervised
hierarchical classification for urban features from LIDAR data with aerial
imagery. International Journal of Remote Sensing. 34(1): 190–210.
Heipke, C., Mayer, H., Wiedemann, C., and Jamet, O. (1997). Evaluation of
automatic road extraction. International Archives of Photogrammetry and
Remote Sensing. 32(Part 3–2W3): 47–56.
Hermosilla, T., Ruiz, L. A., Recio, J. A., and Estornell, J. (2011). Evaluation of
automatic building detection approaches combining high resolution images and
LiDAR data. Remote Sensing. 3(6): 1188–1210.
Hill, R. A. and Thomson, A. G. (2005). Mapping woodland species composition and
structure using airborne spectral and LIDAR data. International Journal of
Remote Sensing. 26(17): 3763–3779.
Ho, L. T. K., Umitsu, M., and Yamaguchi, Y. (2010). Flood hazard mapping by
satellite images and SRTM DEM in Vu Gia-Thu Bon Alluvial Plain, Central
Vietnam. International Archives of the Photogrammetry, Remote Sensing and
Spatial Information Sciences. 38(8): 275–280.
Hostache, R., Matgen, P., and Wagner, W. (2012). Change detection approaches for
flood extent mapping: how to select the most adequate reference image from
online archives? International Journal of Applied Earth Observation and
Geoinformation. 19(1): 205–213.
Houtsitjoki, H. and Juhola, M. Comparing the one-vs-one and one-vs-all methods in
Benthic macroinvertebrate image classification. Proceedings of the 7th
International Conference on Machine Learning and Data Mining in Pattern
Recognition. August 30-September 03, 2011. New York, USA: Machine
Learning and Data Mining. 2011. 399–413.
Huth, J., Kuenzer, C., Wehrmann, T., Gebhardt, S., Tuan, V. Q., and Dech, S.
(2012). Land cover and land use classification with TWOPAC: towards
automated processing for pixel and object based image classification. Remote
Sensing. 4(9): 2530–2553.
Hyde, P., Dubayah, R., Walker, J. B., Holten, M., and Hunsaker, C. (2006). Mapping
forest structure for wildlife habitat analysis using multi-sensor (LiDAR,
SAR/InSAR, ETM+, Quickbird) synergy. Remote Sensing of Environment.
102(1–2): 63–73.
Jun, G. and Ghosh, J. (2011). Spatially adaptive semi-supervised learning with
Gaussian process for hyperspectral data analysis. Statistical Analysis and Data
Mining. 4(4): 358–371.
Kay, S., Hedley, J. D., and Lavender, S. (2009). Sun glint correction of high and low
spatial resolution images of aquatic scenes: A review of methods for visible and
near-infrared wavelengths. Remote Sensing. 1(4): 697–730.
Khoshelham, K., Nardinocchi, C., Frontoni, E., and Mancini, A. (2010). Performance
evaluation of automated approaches to building detection in multi-source aerial
data. ISPRS Journal of Photogrammetry and Remote Sensing. 65(1): 123–133.
Landis, J. R. and Koch, G. G. (1977). The measurement of observer agreement for
categorical data. Biometrics. 33(1): 159–174.
Lee, J. S. (1986). Speckle suppression and analysis for synthetic aperture radar
images. Optical Engineering. 25(1): 636–643.
Lee, T. M. and Yeh, H. C. (2009). Applying remote sensing technique to monitor
shifting wetland vegetation: A case study of Danshui River estuary mangrove
community, Taiwan. Ecological Engineering. 35(4): 487–496.
Loosvelt, L., Peters, J., Skriver, H., Lievens, H., Collie, F. V., Beats, B. D., and
Verhoest, N. E. C. (2012). Random Forest as a tool for estimating uncertainty at
pixel in SAR image classification. International Journal of Applied Earth
Observation and Geoinformation. 19(1): 173–184.
Lopes, A., Touzi, R., and Nezry, E. (1990). Adaptive speckle filters and scene
heterogeneity. IEEE Transactions on Geoscience and Remote Sensing. 28(6):
992–1000.
McCombs, J. W., Roberts, S. D., and Evans, D. L. (2003). Influence of fusing
LIDAR and multi-spectral imagery on remotely sensed estimates of stand density
and mean tree height in managed loblolly pine plantation. Forest Science. 49(3):
457–466.
Naidoo, L., Cho, M. A., Mathieu, R., and Asner, G. (2012). Classification of savanna
tree species, in the Greater Kruger National Park Region, by integrating
hyperspectral and LiDAR data in a Random Forest data mining environment.
ISPRS Journal of Photogrammetry and Remote Sensing. 69(1): 167–179.
Pérez-Hoyos, A., Garcia-Haro, F. J., and San-Miguel-Ayanz, J. (2012). A
methodology to generate a synergetic land-cover map by fusion of different
land-cover products. International Journal of Applied Earth Observation and
Geoinformation. 19(1): 72–87.
Perumal, K. and Bhaskaran, R. (2010). Supervised classification performance of
multispectral images. Journal of Computing. 2(2): 124–129.
Plaza, J., Plaza, A. J., and Barra, C. (2009). Multi-channel morphological profiles for
classification of hyperspectral images using support vector machines. Sensors.
9(1): 196–218.
Reinart, A., Herlevi, A., Arst, H., and Sipelgas, L. (2003). Preliminary optical
classification of lakes and coastal waters in Estonia and south Finland. Journal of
Sea Research. 49(4): 357–366.
Rodriguez-Galiano, V. F., Ghimire, B., Rogan, J., Chica-Olmo, M., and
Rigol-Sanchez, J. P. (2012). An assessment of the effectiveness of a random forest
classifier for land cover classification. ISPRS Journal of Photogrammetry and
Remote Sensing. 67(1): 93–104.
Rottensteiner, F., Trinder, J., Clode, S., and Kubik, K. (2005). Using the
Dempster-Shafer method for the fusion of LiDAR data and multi-spectral images for
building detection. Information Fusion. 6(4): 283–300.
Rottensteiner, F., Trinder, J., Clode, S., and Kubik, K. (2007). Building detection
by fusion of airborne laser scanner data and multi-spectral images: performance
evaluation and sensitivity analysis. ISPRS Journal of Photogrammetry and
Remote Sensing. 62(2): 135–149.
Rutzinger, M., Rottensteiner, F., and Pfeifer, N. (2009). A comparison of evaluation
techniques for building extraction from airborne laser scanning. IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing. 2(1): 11–20.
Salah, H., Trinder, J., Shaker, A., Hamed, M., and Elsagheer, A. (2009). Aerial
image and LIDAR data fusion for automatic feature extraction using the
Self-Organizing Map (SOM) classifier. Proceedings of the International Archives of
Photogrammetry and Remote Sensing. September 1-2, 2009. Paris, France:
International Archives of Photogrammetry and Remote Sensing. 317–322.
Sohn, G. and Dowman, I. (2007). Data fusion of high resolution satellite imagery and
LiDAR data for automatic building extraction. ISPRS Journal of
Photogrammetry and Remote Sensing. 62(1): 43–63.
Starek, M. J., Vemula, R., and Slatton, K. C. (2012). Probability detection of
morphology indicators for beach segmentation with multitemporal LiDAR
measurement. IEEE Transactions on Geoscience and Remote Sensing. 4(1): 1–12.
Szuster, B. W., Qi, C., and Borger, M. (2011). A comparison of classification
techniques to support land cover and land use analysis in tropical coastal zone.
Applied Geography. 31(2): 525–532.
Taharzadeh, E. and Shafri, H. Z. M. Using hyperspectral remote sensing data
mapping over Kuala Lumpur. Proceedings of the 2011 Joint Urban Remote
Sensing Event. April 11-13, 2011. Munich, Germany: Joint Urban Remote
Sensing Event. 2011. 405–408.
Tuia, D., Ratle, F., Kanevski, M. F., and Emery, W. J. (2010). Active learning
methods for remote sensing image classification. IEEE Transactions on
Geoscience and Remote Sensing. 47(7): 2218–2232.
Vu, T., Yamazaki, F., and Matsuoka, M. (2009). Multi-scale solution for building
extraction from LiDAR and image. International Journal of Applied Earth
Observation and Geoinformation. 11(4): 281–289.
Waske, B., Van Der Linden, S., Benediktsson, J. A., Rabe, A., and Hostert, P.
(2010). Sensitivity of support vector machines to random feature selection in
classification of hyperspectral data. IEEE Transactions on Geoscience and Remote
Sensing. 48(7): 2880–2889.
Wu, J., Cai, Z., and Goa, E. (2010). Dynamic K-Nearest Neighbor with distance and
attribute weighted for classification. ISPRS Journal of Photogrammetry and
Remote Sensing. 66(1): 798–808.
Yuan, H., Van Der Wiele, C. F., and Khorram, S. (2009). An automated artificial
neural network system for land use/land cover classification from Landsat TM
imagery. Remote Sensing. 1(3): 243–265.
Zhang, D. H., Zhang, H. K., Xie, B., Huang, Z. Q., Yu, L., and Cao, Y. Y. Analysis
of image fusion and classification for high resolution SAR data online.
Proceedings of the 2nd International Conference on Education Technology and
Computer. June 22-24, 2010. Shanghai, China: Education Technology and
Computer. 2010. 267–271.
Zheng, M., Cai, Q., and Qing, M. (2009). The effect of prior probabilities in the
maximum likelihood classification on individual classes: A theoretical reasoning
and empirical testing. Photogrammetric Engineering and Remote Sensing. 75(9):
1109–1117.