Page 1: Remote sensing-based land cover classification and change ...liu.diva-portal.org/smash/get/diva2:1443933/FULLTEXT01.pdfLand cover monitoring provides valuable insights to advice policy-

Department of Thematic Studies

Environmental Change

MSc Thesis (30 ECTS credits)

Science for Sustainable Development

Malena Hesping

Supervisor: Martin Karlson, PhD (Linköping University, Department of Thematic Studies)

Co-supervisor: Markus Immitzer, PhD (BOKU University of Natural Resources and Life Sciences, Vienna, Department of Landscape, Spatial and Infrastructure Sciences)

Remote sensing-based land cover classification and change detection using Sentinel-2 data and Random Forest

A case study of Rusinga Island, Kenya

Linköpings universitet, SE-581 83 Linköping, Sweden

Copyright

The publishers will keep this document online on the Internet – or its possible replacement –

for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to

download, or to print out single copies for his/her own use and to use it unchanged for non-

commercial research and educational purpose. Subsequent transfers of copyright cannot revoke

this permission. All other uses of the document are conditional upon the consent of the

copyright owner. The publisher has taken technical and administrative measures to assure

authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her

work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures

for publication and for assurance of document integrity, please refer to its www home page:

http://www.ep.liu.se/.

© Malena Hesping

Table of contents

List of figures
List of tables
Abstract
List of abbreviations
1. Introduction
1.1 Land use and land cover change in Africa and Kenya
1.2 Remote sensing for land use and land cover change monitoring
1.3 Vegetation to reduce land degradation
1.4 Aim of this study
2. Materials and methods
2.1 Study area
2.2 Sentinel-2 data and pre-processing
2.2.1 Software
2.2.2 Data acquisition and pre-processing
2.2.3 Vegetation indices
2.3 Classification scheme
2.4 Reference data
2.5 Random Forest land cover classification model development
2.5.1 Predictor datasets
2.5.2 Input feature selection and feature importance ranking
2.5.3 Model accuracy assessment
2.6 Post processing and land cover map creation
2.7 Change detection
3. Results
3.1 Random Forest classification model selection
3.2 Accuracy assessment
3.3 Feature importance
3.4 Land cover maps
3.5 Land cover change detection
4. Discussion
4.1 Sentinel-2 data for Random Forest land cover classification
4.1.1 Single-date vs. multi-temporal datasets
4.1.2 Classification scheme
4.1.3 Seasonal differences and phenology
4.1.4 Vegetation indices
4.1.5 Rusinga land cover maps
4.2 Post-classification change detection for land cover change and vegetation monitoring
4.3 Concluding remarks and future outlook
Acknowledgements
References

List of figures

Figure 1: Study area.
Figure 2: Erosion and vegetation restoration activities on Rusinga Island.
Figure 3: Workflow of data acquisition and geometric pre-processing.
Figure 4: Average spectral signatures of the five land cover classes and the four land cover classes.
Figure 5: Locations and distribution of reference samples.
Figure 6: Workflow of the land cover classification and change detection.
Figure 7: Input feature importance ranking measured by the mean decrease in accuracy.
Figure 8: Normalised feature importance score of the individual dates compared to the mean NDVI of all scenes of the overall land area of Rusinga Island.
Figure 9: Land cover maps of Rusinga Island.
Figure 10: Rusinga Island land cover map with additional buildings data from OpenStreetMap.
Figure 11: Comparison of the land cover map produced in this study with existing (global) land cover maps.
Figure 12: Land cover change map of Rusinga Island between May 2016 – April 2017 and May 2018 – April 2019.
Figure 13: Land cover change maps of Rusinga Island between May 2016 – April 2017 and May 2018 – April 2019 highlighting vegetation increase and decrease.

List of tables

Table 1: Technical specifications of the Sentinel-2 satellites.
Table 2: Overview of Sentinel-2 scenes selected for analysis.
Table 3: Vegetation indices used in this study.
Table 4: Description of land cover classes.
Table 5: Selected scenes for the predictor datasets.
Table 6: Classification performance based on internal OOB validation.
Table 7: Confusion matrix of the multi-temporal Random Forest land cover classification model after feature selection (based on internal OOB validation).
Table 8: Accuracy assessment results based on the independent test dataset.
Table 9: Confusion matrices of the classification predictions and the actual classes of the independent test dataset.
Table 10: Input feature importance ranking grouped by spectral band and vegetation index.
Table 11: Input feature importance ranking grouped by date.
Table 12: Adopted from Table 8 in results chapter: accuracy assessment results including number of scenes and number of features included in the predictor dataset after feature selection.

Abstract

Healthy forests and soils are crucial for the very existence of mankind as they provide food,

clean water and air, shade and protection against floods and storms. With their photosynthetic

carbon storage ability, they mitigate climate change and fertilise and stabilise soils.

Unfortunately, deforestation and the loss of fertile soils are the bleak reality and among the

world’s most pressing challenges. Over the past decades Kenya has faced severe deforestation,

but efforts are being undertaken to reverse deforestation, revegetate degraded land and combat

erosion. Satellite remote sensing technology is becoming increasingly useful for vegetation

monitoring as the data quality improves and the costs decrease. This thesis explores the

potential of free open access Sentinel-2 data for vegetation monitoring through Random Forest

land cover classification and post-classification change detection on Rusinga Island, Kenya.

Different single-date and multi-temporal predictor datasets were examined under two classification

schemes, one with five and one with four classes, to develop the most suitable model. The classification

achieved acceptable results when assessed on an independent test dataset (overall accuracy of

90.06% with five classes and 96.89% with four classes), which should however be confirmed

on the ground and could potentially be improved with better reference data. In this study,

change detection could only be analysed over a time frame of two years, which is too short to

produce meaningful results. Nevertheless, the method was proven conceptually and could be

applied in the future to monitor land cover changes on Rusinga Island.

Keywords: Land cover classification, post-classification change detection, Random Forest,

remote sensing, Sentinel-2

List of abbreviations

AOT Aerosol Optical Thickness

ASTER Advanced Spaceborne Thermal Emission and Reflection Radiometer

AVHRR Advanced Very High-Resolution Radiometer

CCI Climate Change Initiative (ESA)

CNES National Centre for Space Studies (France)

DEM Digital Elevation Model

ENVISAT - MERIS Environmental Satellite - Medium Resolution Imaging Spectrometer

EOSDIS Earth Observing System Data and Information System

ESA European Space Agency

ETM+ Enhanced Thematic Mapper Plus (Landsat)

EU European Union

FAO Food and Agriculture Organization of the United Nations

GIS Geographic Information System

GNDVI Green Normalised Difference Vegetation Index

GRVI Green Red Vegetation Index

HJ-1 Huanjing-1 (satellite system, China)

IPCC Intergovernmental Panel on Climate Change

IRECI Inverted Red-edge Chlorophyll Index

ITPS Intergovernmental Technical Panel on Soils

LiDAR Light Detection and Ranging

LP DAAC Land Processes Distributed Active Archive Center

MDA Mean Decrease in Accuracy

MDG Mean Decrease Gini

METI Ministry of Economy, Trade, and Industry (Japan)

MSI Multi-spectral Instrument

NASA National Aeronautics and Space Administration (United States of America)

NDII Normalised Difference Infrared Index

NDVI Normalised Difference Vegetation Index

NIR Near-infrared

OA Overall Accuracy

OOB Out-of-bag

PA Producer’s Accuracy

PROBA-V Project for On-board Autonomy - Vegetation

S2 Sentinel-2

SAVI Soil-adjusted Vegetation Index

SPOT Satellite Pour l’Observation de la Terre (CNES)

SWIR Short Wave Infrared

TM 5 Thematic Mapper 5 (Landsat)

UA User’s Accuracy

UN United Nations

UNEP United Nations Environment Programme

USGS United States Geological Survey

UTM/WGS Universal Transverse Mercator / World Geodetic System

VHR Very High Resolution

VRE Vegetation Red Edge

WCED World Commission on Environment and Development

WV Water Vapour

1. Introduction

The surface of the Earth is constantly changing for a great variety of reasons, many of which

are manmade. According to the recent special report on land by the Intergovernmental Panel

on Climate Change (IPCC, 2019) more than 70% of the global land surface is directly affected

by human use. Land use and land cover change, which often occurs in the form of deforestation or

conversion from naturally vegetated areas to fields for agricultural use, is a major driver of

climate change, biodiversity loss and land degradation, which in turn have negative

implications for carbon cycling, ecosystems, and food security (FAO, 2016c; FAO & ITPS,

2015; IPCC, 2019; UNEP, 2016). Mapping and visualising these changes helps to

understand their patterns, causes, and implications. Monitoring and analysing land use and

land cover change has emerged as a field of scientific research and found application in both

public and private sectors. Land cover monitoring provides valuable insights to advise policy-

and decision-making on different levels and to influence strategies and regulations for

development and land use (Lunetta, Knight, Ediriwickrema, Lyon, & Worthy, 2006). The aim

of this thesis is to explore the suitability of free open access satellite data to classify land cover

and to detect land cover changes for vegetation monitoring purposes. The Kenyan island

Rusinga in Lake Victoria served as a case study.

This chapter introduces the topic of land use and land cover change in Africa and Kenya in the

current global development and climate change context. Moreover, it introduces remote sensing

and presents its application for land cover classification and change detection. This is followed

by a brief introduction to land degradation, soil erosion, and the role vegetation cover plays as

a control measure to counteract erosion and land degradation. Finally, the aim of this study is

presented together with the research questions that guided the work. Chapter 2 presents the data

and methods used in this study and starts with a description of the study area. It then describes

the satellite and software and continues with the data acquisition and pre-processing, followed

by the description of the classification schemes and reference data. Next, the Random Forest

(Breiman, 2001) algorithm is introduced including a description of the predictor datasets used

in this study, feature selection and the accuracy assessment strategy. This is followed by a brief

description of the post-processing and map creation methods as well as the methodology for

the change detection. Chapter 3 presents the results of this study starting with the Random

Forest classification results including the feature selection and accuracy assessment. Then the

results of the feature importance analysis are presented before visualising the land cover maps.

Finally, the results of the change detection are presented by change maps. Chapter 4 provides a

discussion of the results guided by the two primary research questions including the sub-

questions and puts the outcomes in a wider context. Finally, it provides concluding remarks on

the suitability of land cover classification and change detection using Sentinel-2 data, Random

Forest and remotely collected reference data for vegetation monitoring by local organisations

on Rusinga Island.

1.1 Land use and land cover change in Africa and Kenya

The most dramatic land use and land cover change in Africa is seen in the reduction of forest cover,

often resulting from agricultural expansion and urbanisation. Additionally, mining causes land

use and land cover change in Africa, directly through clearing of forest for mining activities,

and indirectly through settlement of workers and the subsequently increased need for

agricultural production in the region (UNEP, 2016). Deforestation remains a major problem in

Africa. While the African continent held only 15.6% of the world’s total forest area in 2015,

84% of the global forest loss between 2010 and 2015 occurred in Africa (FAO, 2016b). This is

an immense deforestation rate for the rather small share of forest cover. Increasing world

population, economic growth and investment in large-scale commercial agriculture are the main

drivers for forest loss and other land cover changes in Africa (UNEP, 2016). Africa’s population

is projected to increase by 115% between 2013 and 2050 (FAO & ITPS, 2015). Increasing

wealth and shifting dietary preferences towards more livestock-based foods increase the need

for agricultural land and put pressure on the food production system and soil health (FAO,

2013; Montanarella et al., 2016). Subsequent unsustainable land use and agricultural practices,

such as cropland expansion, increasing livestock populations, large-scale monocultures and

chemical fertilisers as well as deforestation and overexploitation of natural resources cause

increasing soil erosion, soil fertility loss, and biodiversity loss (Borrelli et al., 2017; Cebecauer

& Hofierka, 2008; FAO & ITPS, 2015; Pimentel & Kounang, 1998). Deforestation and poor

soil quality have implications for carbon cycling and the climate. Climate change is another

important driver of land cover change, but at the same time changes in land use and land cover

contribute to climate change. Some land use and land cover change becomes inevitable as

climate change contributes to soil and land degradation (UNEP, 2016). Such degraded land

becomes useless for agriculture or natural vegetation and often turns into wasteland.

Conversely, changes such as deforestation, agricultural expansion, mining or urbanisation

contribute to climate change as they reduce natural carbon sinks, cause pollution and

greenhouse gas emissions and drive soil degradation.

1.2 Remote sensing for land use and land cover change monitoring

Remote sensing, which is the acquisition of geospatial data from space or air, plays a vital role

for the analysis and monitoring of land use and land cover change and vegetation dynamics.

Remote sensing systems can be used as an alternative or complementary method to traditional

ground-based data collection, which can be labour intensive, time consuming and expensive.

Other advantages of remote sensing are that these systems can cover large areas and long

periods of time and enable consistent observations with high revisit frequency. Moreover, they

do not disturb the landscape and enable researchers to collect data in otherwise inaccessible

areas such as mountain regions, glaciers and jungles (Willis, 2015). Defining properties of

remote sensing systems are their spatial, temporal, radiometric and spectral resolution. The

spatial resolution describes the pixel size of the sensed image, which represents the area covered

on the ground. There is some discrepancy in the categorisation of the systems by their spatial

resolution (Karlson, 2015; Rees, 2012). However, Sentinel-2 data, which were used in this

study and have a spatial resolution of 10 m, can be classified as high resolution images (ESA,

2015), while very high resolution (VHR) images have pixel sizes of 5 m or smaller and the

pixel size of medium resolution images ranges from 50 to 500 m. Any coarser pixel size is

referred to as low resolution (Rees, 2012). The temporal resolution describes the revisit time of

the sensor system, thus how frequently a place is covered by the sensing instrument (Karlson,

2015; Lillesand, Kiefer, & Chipman, 2015). The sensors used in remote sensing detect

electromagnetic radiation of sunlight reflected from ground objects. The radiometric resolution

describes the capacity of the sensor to distinguish differences in light intensity. Multi-spectral

sensors, such as the one used for the collection of Sentinel-2 data, sense the radiation at different

wavelength ranges within the electromagnetic spectrum. One multi-spectral image is then

composed of a set of different spectral bands, each containing the reflectance data of a specific

wavelength range. The spectral resolution defines the ability of the sensor to differentiate

different wavelengths. Different types of ground surface have distinct reflectance properties

along the electromagnetic spectrum, called spectral signatures. The spectral signatures are

useful to detect and classify different types of land cover. Moreover, they are used to create

indices that provide more specific information on certain ground cover types. Various

vegetation indices utilise the distinct spectral signature of green vegetation (Albertz, 2009;

Lillesand et al., 2015).
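
As an illustration of how a vegetation index is derived from spectral bands, the following minimal sketch computes the Normalised Difference Vegetation Index (NDVI) for a single pixel from its near-infrared and red reflectance. The function name, the sample reflectance values and the handling of a zero denominator are illustrative assumptions and are not taken from the workflow of this study.

```python
def ndvi(nir, red):
    """Return NDVI = (NIR - Red) / (NIR + Red) for one pixel."""
    total = nir + red
    if total == 0:
        # Assumption: report 0.0 where both reflectances are zero,
        # instead of raising a division error.
        return 0.0
    return (nir - red) / total


# Hypothetical surface reflectances (unitless, between 0 and 1).
green_vegetation = ndvi(nir=0.45, red=0.05)
bare_soil = ndvi(nir=0.30, red=0.25)
print(round(green_vegetation, 2), round(bare_soil, 2))
```

Green vegetation reflects strongly in the near-infrared and absorbs red light, so in this example it yields a clearly higher index value than the bare soil pixel.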

Remotely sensed data has been used in numerous studies to detect, classify and monitor land

use and land cover changes, especially in degraded or vulnerable as well as in protected areas

(e.g. W. B. Cohen, Yang, Healey, Kennedy, & Gorelick, 2018; Fensholt & Proud, 2012;

Frampton, Dash, Watmough, & Milton, 2013; Islam, Jashimuddin, Nath, & Nath, 2018; Rawat

& Kumar, 2015; Turner et al., 2015; Willis, 2015). Furthermore, it is useful for understanding

and assessing impacts of land use and land cover change on erosion risk and to evaluate and

develop land management strategies (e.g. Leh, Bajwa, & Chaubey, 2013; Nyberg et al., 2015;

Willis, 2015). Remote sensing for land cover classification and change detection has emerged

as an intensely studied field of research, which has developed a large number of methodologies

and techniques to study land use and land cover change of various natures. In pixel-based

approaches, spectral pattern recognition is used to categorise each image pixel according to its

spectral signature and assign it a class (Lillesand et al., 2015). The Random Forest

classification algorithm (Breiman, 2001), which is described in more detail in Chapter 2.5, can

be used to predict pixel classes based on a predictor dataset and a training dataset, which are

fed into the model. The predictor dataset contains a set of so-called features, in this case the

spectral bands as well as vegetation indices. If it contains data sensed at one point in time, it is

referred to as a single-date predictor dataset. By contrast, multi-temporal predictor datasets

contain image data sensed at multiple points in time. The training dataset consists of

representative training areas to which the researcher assigned a class beforehand, ideally based

on ground evidence (Lillesand et al., 2015). In this study training data was collected remotely

using VHR satellite images. The study compared two different classification schemes, which

describe the different classes and were defined by the author. Change detection is one of the

most common applications of remote sensing image analysis and refers to the comparison of

an area over time. In post-classification change detection, the approach used in this study, two

images are separately classified and then the two classification outputs are compared to each

other (Lillesand et al., 2015; Tewkesbury, Comber, Tate, Lamb, & Fisher, 2015).
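
The comparison step of post-classification change detection can be sketched as a cross-tabulation of two already classified maps. The example below is a minimal sketch that assumes each map is a flat list of class labels in the same pixel order. The function names, class names and label values are hypothetical and do not reproduce the classification scheme or the software used in this study.

```python
from collections import Counter


def change_matrix(labels_t1, labels_t2):
    """Count pixels for every (class at time 1, class at time 2) pair."""
    if len(labels_t1) != len(labels_t2):
        raise ValueError("both maps must cover the same pixels")
    return Counter(zip(labels_t1, labels_t2))


def changed_share(labels_t1, labels_t2):
    """Return the fraction of pixels whose class differs between the maps."""
    transitions = change_matrix(labels_t1, labels_t2)
    total = sum(transitions.values())
    changed = sum(n for (a, b), n in transitions.items() if a != b)
    return changed / total if total else 0.0


# Hypothetical classified maps of six pixels at two points in time.
map_t1 = ["trees", "bare", "bare", "water", "crops", "crops"]
map_t2 = ["trees", "trees", "bare", "water", "crops", "bare"]

for (before, after), count in sorted(change_matrix(map_t1, map_t2).items()):
    print(before, "->", after, count)
print(changed_share(map_t1, map_t2))
```

Because the two maps are classified independently, a classification error in either map appears as an apparent change in this comparison, which is why the accuracy of the individual maps matters for the change result.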

1.3 Vegetation to reduce land degradation

According to the IPCC (2014), global temperatures are increasing, precipitation patterns

are becoming less predictable and extreme weather events causing droughts and floods are occurring

more frequently. Africa is among the regions that are most severely affected by these climatic

changes (Niang et al., 2014), which degrade soil health and cause erosion. Precipitation changes

affect soil moisture and the water holding capacity of soils, while temperature increase affects

soil formation. Soil erosion was found to be the number one threat to soil function globally and

especially in sub-Saharan Africa (FAO & ITPS, 2015). It contributes to land degradation and

affects food security and livelihoods, ecosystems and water resources and poses risks of

landslides, flooding and desertification to humans and nature. The loss of fertile topsoil

diminishes agricultural productivity and reduces the infiltration capacity of soils, which can cause

floods, water pollution and destruction of infrastructure (Bastola, Dialynas, Bras, Noto, &

Istanbulluoglu, 2018). A recent globally consistent comparative study assessed soil erosion

rates and found that, in 2012, 6.1% of the global land surface was affected by erosion, with the

highest soil erosion rate of 10% found in Africa (Borrelli et al., 2017).

A solution that is repeatedly suggested by researchers and international organisations is

sustainable land and forest management (FAO & ITPS, 2015; IPCC, 2019; Niang et al., 2014;

UNEP, 2016). It is very broadly defined, adopted from the common definition of sustainable

development (WCED, 1987), as the use of land and forest resources to meet the current

changing needs of humanity while ensuring long-term functionality and productivity of these

resources in the future. More practically this includes approaches like conservation agriculture

and forestry, agroecology including permaculture, agroforestry and perennial cropping systems

(IPCC, 2019), all of which are based on the principle of permanently covering the soil with

vegetation (FAO, 2014, 2015, 2016a; Ferguson & Lovell, 2013). The increasing attention

these forms of land management are receiving in the global arena is beneficial for climate

resilience initiatives and contributes to mitigating and adapting to the adverse effects of climate

change in many regions of the world, especially those affected by desertification or severe land

degradation and erosion. Vegetation cover has been shown to limit soil erosion by reducing run-

off, stabilising soils and improving the infiltration of water and nutrients (Borrelli et al., 2017; Pimentel &

Burgess, 2013). For example, Bastola et al. (2018) found that both backfilling and revegetation

measures are effective in reducing gully erosion development. In the long term, however,

revegetation measures were found to be more effective, although their effect depends strongly on the

density and strength of the roots and the revegetation management practices. Similar results

have previously been found by Gomez et al. (2003), who conclude that woody vegetation is

more effective for gully erosion prevention than grassy vegetation. Nyssen et al. (2004) show

that a combination of sediment-holding structures, such as check dams, and revegetation is

effective for gully erosion control. Increased vegetation cover not only benefits local soils

by reducing erosion and increasing their moisture-holding capacity; it also has a cooling effect on

the local and regional climate through increased evapotranspiration and acts as a carbon sink,

mitigating climate change (IPCC, 2019). The Kenyan Government has included sustainable

land and forest management and extensive afforestation in its ‘Vision 2030’ (The Presidency,

2018). It also launched a ‘Greening Kenya’ initiative, which contributes to the national goal of

achieving 10% forest cover by 2022 by planting 1.8 billion trees (UN Environment, 2018). On

Rusinga Island, which is extremely deforested, a community-based organisation called

Badilisha undertakes efforts to revegetate the island to reduce erosion and land degradation and

to sustainably manage its natural resources. A similar project has proved successful in Lesotho,


where highland communities built physical barriers and revegetated areas to combat soil

erosion and protect headwaters (Orange-Senqu River Commission, 2014).

1.4 Aim of this study

A number of institutions and projects have developed global land cover and land cover change

maps, which are freely available (e.g. ESA & Université Catholique de Louvain, 2010; ESA

CCI Land Cover Project, 2015; Hansen et al., 2013; Mayaux et al., 2003; National Geomatics

Center of China, 2010). However, many of these maps have too coarse a spatial resolution or

are not adjusted for application on a local scale. The aim of this study was to explore the

suitability of free open access high resolution Sentinel-2 satellite data and open source software

to classify and map land cover to support vegetation monitoring and erosion control on Rusinga

Island, Kenya. A wide range of very high resolution satellite images is available commercially

with resolutions as high as 0.3 m (DigitalGlobe, 2017). For free open access satellite images,

however, the 10 m resolution of the Sentinel-2 images is the highest spatial resolution currently

available. Using Rusinga Island as a case study, this study evaluated an established method for

remote sensing-based monitoring of vegetation changes using Random Forest (Breiman, 2001)

classification and post-classification change detection. Moreover, it aimed to support the

development of a tool that can be used for processing Sentinel-2 imagery to identify,

visualise and analyse land cover change on Rusinga Island. The following questions led the

research in this study:

1. Are Sentinel-2 data suitable to classify land cover on Rusinga Island using

Random Forest classification and remotely collected reference data?

a. How does the performance of single-date predictor datasets

compare to the performance of multi-temporal predictor datasets?

b. How do seasonal differences in the landscape influence the

classification performance?

c. How does the definition of the classes in the classification scheme

affect the model performance?

d. How do vegetation indices contribute to classification accuracy

compared to the Sentinel-2 spectral bands?

2. Are the land cover classifications resulting from Sentinel-2 data suitable to detect

changes to support vegetation monitoring on Rusinga Island?


2. Materials and methods

2.1 Study area

Rusinga is an island of about 40 km² (Tryon et al., 2014) on the western border of Kenya in the

north-eastern part of Lake Victoria. It stretches approximately 10 km from north to south and

12 km from east to west. Since the 1980s it has been connected to the mainland by an artificial

causeway, which bridges the 250 m wide channel between the island and the mainland (Tryon

et al., 2014). The island is characterised by a number of hills, of which Ligongo Hill is the

highest at 300 m above lake level (see Figure 1; Andrews, 1973; Tryon et al., 2014). Lake

Victoria itself is located 1134 m above sea level (Andrews, 1973).

Figure 1: Study area: A: Kenya in Africa; B: Rusinga Island in Kenya; C: Satellite image of

Rusinga Island including its outline, highest peak and road network; D: Digital elevation model

(DEM) of Rusinga Island. The ASTER DEM was retrieved from the online Earth Explorer,

courtesy of the NASA EOSDIS Land Processes Distributed Active Archive Center (LP DAAC),

https://earthexplorer.usgs.gov/. ASTER GDEM is a product of Japan’s Ministry of Economy,

Trade, and Industry (METI) and NASA (NASA & METI, 2011).

The climate in Kenya is characterised by two rainy seasons. The so-called ‘long rains’ are

concentrated from March to May and are more intense and more reliable than the ‘short rains’

occurring in October and November (Nicholson, 2017). However, in the lake region the

precipitation is relatively evenly spread throughout the year, so that even the driest month receives a

modest amount of rain (Andrews, 1973). The lake water level as well as vegetation around Lake

Victoria are strongly influenced by local precipitation, and have been throughout the history of


the lake (Tryon et al., 2014). However, it is perceived that climate change causes shifting

precipitation patterns, and droughts are becoming more common (Badilisha, n.d.). These

observations and concerns are supported by scientific assessments, which observe and project more extreme

and less predictable weather events caused by climate change, with increasingly negative

impacts on people living in regions that are already degraded (IPCC, 2019). In the regional

assessment for Africa (Niang et al., 2014) the IPCC reports observed and projected increases

in temperature in the past 50 to 100 years and for the 21st century. Precipitation patterns show

a less clear trend and reveal high spatial and temporal variation. Nevertheless, in eastern Africa

a decrease in precipitation was observed during the wet season (March – May). However,

models project more intense wet seasons (March – May and October – December) by the end

of the 21st century, but drier Augusts and Septembers. An increase in extreme events, such as

droughts, has been observed in East Africa over the past 30 to 60 years. Extreme precipitation

events are also projected to increase by the mid-21st century.

While it is difficult to find reliable population data for Rusinga Island, it is estimated that in

2012 the population was about 35,000 compared to only 5,000 in the early 1980s (Byrne, 2013).

Official national census statistics counted a population of 24,275 on Rusinga in 2009 (Kenya

National Bureau of Statistics, 2010). However, no earlier or later statistics could be found. The

official census statistics for Homa Bay district, which includes Rusinga, report a population

increase from 745,040 in 1999 to 917,170 in 2009 and 1,131,950 in 2019 (Kenya National

Bureau of Statistics, 2012, 2019). This corresponds to an average yearly growth rate of

approximately 2.1%. The region is one of the poorest in Kenya with a large proportion of the

population living in extreme poverty. HIV/AIDS rates in the region are among the highest

nationwide. The island’s population is largely dependent on fishing for their livelihoods.

However, overfishing and the increasingly poor ecological state of Lake Victoria have caused

many residents to turn to agriculture and livestock keeping instead of or in addition to fishing

(Badilisha, n.d.; Byrne, 2013; Kanyala Little Stars, n.d.). The region is considered highly food

insecure (UNEP, 2009).
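The average yearly growth rate quoted above for Homa Bay district can be reproduced from the census figures given in the text. The following minimal sketch assumes that a compound (geometric) growth rate was meant, which the text does not state explicitly; the function name is chosen for illustration only.

```python
def average_annual_growth_rate(start_pop, end_pop, years):
    """Compound average annual growth rate between two census counts."""
    return (end_pop / start_pop) ** (1.0 / years) - 1.0


# Census figures for Homa Bay district as quoted in the text.
rate_1999_2019 = average_annual_growth_rate(745_040, 1_131_950, 2019 - 1999)
rate_1999_2009 = average_annual_growth_rate(745_040, 917_170, 2009 - 1999)
rate_2009_2019 = average_annual_growth_rate(917_170, 1_131_950, 2019 - 2009)

for label, rate in [("1999-2019", rate_1999_2019),
                    ("1999-2009", rate_1999_2009),
                    ("2009-2019", rate_2009_2019)]:
    print(f"{label}: {rate:.2%} per year")
```

Under this assumption all three periods give a rate close to 2.1% per year, which agrees with the value stated in the text. A simple linear average over the 20 years would give a noticeably higher figure.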

The island used to be densely vegetated before deforestation increased, driven by population

growth and the resulting need for building material, firewood or income generation.

Community elders tell of lush forests covering the hills and of sufficient rain and water

resources on the island 35 to 50 years ago (Byrne, 2013). Although Andrews’ (1973) vegetation

study of Rusinga, which was published nearly five decades ago, depicts intense human

influence and anthropogenic forest clearance already in the 1970s, there is no doubt that the

situation has worsened significantly since. Today the island is extremely deforested and its people

suffer from droughts, hunger, and poverty (Byrne, 2013; Mureithi, Mwagi, & Gruber, 2018).

The deforestation causes soil erosion, especially on the hill slopes, and an increased risk of

landslides or mud floods as well as droughts, since the soil loses the ability to capture and store

water. Land degradation decreases agricultural productivity and has negative impacts on the

island’s ecosystem. Consequently, it undermines the livelihood of the local population. Several

large gullies have formed on Rusinga, especially on the hill slopes and at the foot of the hill,

due to uncontrolled precipitation run-off during the rainy seasons over the past years, as depicted

in Figure 2. Moreover, sand and gold mining are reported as major causes of land degradation


on the island. Additionally, trees are usually regarded solely as a source of energy (as firewood

or charcoal) by the local community and are consequently planted mainly for this purpose

(Nyaga, 2018; Okolla, 2018). To reduce these risks and to prevent erosion, revegetation efforts

have been initiated on the island. One local organisation engaging in revegetation efforts and

erosion control is the Badilisha Eco-village Foundation Trust. Badilisha is a community-based

organisation engaging in various community development projects in the fields of food security,

care for vulnerable children, livelihoods for women, education, tree planting and permaculture.

The principles of permaculture provide the basis for Badilisha, which means ‘change’. The

organisation has pursued erosion control efforts, especially on the hill slopes, since the major rainy

season in 2018 by building sediment-holding structures such as check dams and through

revegetation by spreading different native grass seeds and planting trees (Figure 2; Mureithi et

al., 2018; Wagenknecht, 2018).

Figure 2: Erosion and vegetation restoration activities on Rusinga Island: A: Large gully

erosion on Rusinga Island; B: Community members working on environmental restoration:

building dams and stabilising them with plants; C: Seedling nursery at Badilisha community

centre; D: Loose rock dam stabilised with vegetation (picture source: Books for Trees).

2.2 Sentinel-2 data and pre-processing

This study relied on open access data and open source software as the evaluated method is

intended to be easily replicable by non-profit organisations and other stakeholders. The satellite

imagery was retrieved from the European Space Agency’s (ESA) satellite Sentinel-2, which is


a multispectral, high resolution, wide-swath twin-satellite system circling the Earth in a polar,

sun-synchronous orbit. Sentinel-2 collects data globally apart from open seas and the poles. The

first satellite was launched in June 2015, while the second followed in March 2017. Each

satellite has a revisit frequency of ten days. The satellites are phased at 180° to each other,

which allows for a combined revisit frequency of five days at the equator and two to three days

in mid-latitudes. Each satellite has a swath width of 290 km and carries a Multi-Spectral

Instrument (MSI), which collects data in 13 different spectral bands, four of which have a spatial

resolution of 10 m, six bands with 20 m resolution, and three bands at 60 m. Since the sensing

instrument works passively, using the reflectance of sunlight, the orbit synchronisation with the

sun is crucial as it minimises variations of the reflectance angle as well as potential shadows

(ESA, 2015). The technical specifications of the Sentinel-2 satellites are summarised in Table

1. The high resolution of the imagery is important for this study since it theoretically allows a

fine spatial scale and accurate classification, which is crucial for detecting small features. Since

the landscape on Rusinga Island is rather heterogeneous and erosion damage usually occurs in

narrow but deep gullies, using high resolution imagery is crucial for the analysis. Moreover,

the revegetation efforts are also undertaken in small and specific areas rather than through the creation

of large-scale plantations.

Table 1: Technical specifications of the Sentinel-2 satellites (ESA, 2015).

Satellites            Sentinel-2 A (launched in June 2015)
                      Sentinel-2 B (launched in March 2017)
Sensing instrument    Multi-Spectral Instrument (MSI)
Swath width           290 km
Temporal resolution   5 days (at equator)

Spectral and spatial resolution:

Band   Central wavelength (nm)   Region                             Spatial resolution (m)
1      443                       Coastal aerosol                    60
2      490                       Blue                               10
3      560                       Green                              10
4      665                       Red                                10
5      705                       Vegetation red edge (VRE)          20
6      740                       Vegetation red edge                20
7      783                       Vegetation red edge                20
8      842                       Near-infrared (NIR)                10
8a     865                       Narrow NIR                         20
9      940                       Water vapour                       60
10     1375                      Short wave infrared (SWIR) cirrus  60
11     1610                      SWIR                               20
12     2190                      SWIR                               20
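The band specifications in Table 1 can be held in a simple lookup structure, which makes it easy to check the counts given in the text (four bands at 10 m, six at 20 m and three at 60 m). The sketch below is illustrative only; the band identifiers follow the B02, B8A style used later in this chapter, and the helper name is hypothetical.

```python
from collections import Counter

# Native spatial resolution (m) of each MSI band, transcribed from Table 1.
BAND_RESOLUTION_M = {
    "B01": 60, "B02": 10, "B03": 10, "B04": 10,
    "B05": 20, "B06": 20, "B07": 20, "B08": 10,
    "B8A": 20, "B09": 60, "B10": 60, "B11": 20, "B12": 20,
}


def bands_at(resolution_m):
    """Return the band identifiers with the given native resolution."""
    return [band for band, res in BAND_RESOLUTION_M.items() if res == resolution_m]


counts = Counter(BAND_RESOLUTION_M.values())
print(counts[10], counts[20], counts[60])
print(bands_at(10))
```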

Two different Sentinel-2 data products at different processing levels are freely available for

users to download from ESA’s Copernicus Open Access Hub (ESA, 2019): level 1C, and


level 2A data. To prepare satellite data for representation and analysis, pre-processing of the

data is needed. This involves correction of geometric and radiometric errors in the data.

Sentinel-2 L1C data are geometrically and radiometrically corrected and come as ortho-images

of 100 km by 100 km, so-called tiles or granules. These are top-of-atmosphere reflectance

images. Sentinel-2 L2A data have the same cartographic geometry as the L1C data, but are

additionally corrected for atmospheric, topographic, and adjacency effects, and are thus called

bottom-of-atmosphere reflectance (ESA, 2015). To avoid distortions, only L2A data were used

in this study. L1C data products for the entire operating period of the satellite system are freely

available for download from the Copernicus Open Access Hub (ESA, 2019). After a successful

pilot since May 2017, ESA was working on providing readily processed L2A data products to

users, starting with the Mediterranean region as of 26 March 2018. Worldwide coverage with

L2A data was planned to be achieved by summer 2018, but was extended to the end of 2018

(ESA, 2018b, 2018d). Since mid-December 2018 L2A products for Rusinga Island (T36MXE)

sensed after 17 December 2018 can be downloaded from the Copernicus Open Access Hub

(ESA, 2019). For the data sensed earlier than that date, the atmospheric correction to transform

L1C data into L2A data needs to be performed by the user using the Sen2Cor processor as a

plugin in the Sentinel Application Platform (ESA, 2018c) or as a command line programme. This

processor is available for download on ESA’s Sentinel website.

The Sentinel-2 data tiles are projected in UTM/WGS84 (ESA, 2015). In this projection,

Rusinga Island is located in zone 36S. Consequently, all spatial analysis in this thesis was

performed using the WGS 84 / UTM zone 36S projection (EPSG:32736).
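The link between the location of the island, the UTM zone and the EPSG code can be illustrated with a short calculation. The sketch below uses the standard six-degree zone rule and the EPSG numbering for WGS 84 / UTM (326xx for the northern and 327xx for the southern hemisphere). The coordinates are an approximate island centre assumed for illustration, and the special zone exceptions around Norway and Svalbard are ignored.

```python
import math


def utm_zone(lon_deg):
    """UTM zone number (1 to 60) for a longitude in degrees east."""
    return int(math.floor((lon_deg + 180.0) / 6.0)) % 60 + 1


def wgs84_utm_epsg(lon_deg, lat_deg):
    """EPSG code of the WGS 84 / UTM zone that contains the point."""
    base = 32600 if lat_deg >= 0 else 32700  # northern vs southern hemisphere
    return base + utm_zone(lon_deg)


# Approximate centre of Rusinga Island (assumed values, for illustration only).
RUSINGA_LON, RUSINGA_LAT = 34.2, -0.4

print(utm_zone(RUSINGA_LON), wgs84_utm_epsg(RUSINGA_LON, RUSINGA_LAT))
```

For these coordinates the calculation yields zone 36 in the southern hemisphere, which matches the projection used in this thesis.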

2.2.1 Software

Most processing and analyses were performed using the programming language R (R Core

Team, 2018) in RStudio version 1.1.447 (RStudio, 2018). Apart from base commands, the

following packages were used: ‘raster’ (Hijmans et al., 2019), ‘randomForest’ (Breiman,

Cutler, Liaw, & Wiener, 2018), ‘rgdal’ (Bivand et al., 2019), ‘RStoolbox’ (Leutner, Horning,

Schwalb-Willmann, & Hijmans, 2019), ‘stringi’ (Gagolewski, Tartanus, contributors (stringi

source code), IBM and other contributors (ICU4C source code), & Unicode Inc. (Unicode

Character Database), 2019), ‘caret’ (Kuhn et al., 2019), ‘grDevices’ (R Core Team, 2019),

‘gdalUtils’ (Greenberg & Mattiuzzi, 2018), ‘DescTools’ (Signorell et al., 2017), ‘plotrix’

(Lemon et al., 2019), and ‘xtable’ (Scott et al., 2019). Atmospheric correction using the

Sen2Cor processor version 2.5.5 was performed as a command line programme on macOS 10.13.

For tasks that required a geographic information system (GIS) interface, mainly for

reference data collection and map production, QGIS versions 3.0 to 3.6 (QGIS Development

Team, 2019) were used.

2.2.2 Data acquisition and pre-processing

Sentinel-2 data for the study area (tile 36MXE) were downloaded directly from ESA’s Sentinel

data hub (ESA, 2019) at processing level 1C for those sensed before 17 December 2018, and at

processing level 2A for those sensed after that date. Scenes where Rusinga Island is cloud-free

were visually identified on the platform and selected for download. An overview of the selected

scenes is presented in Table 2.


Table 2: Overview of Sentinel-2 scenes selected for analysis. * Scenes used for land cover classification model selection as well as for change detection.

Change detection – first period:
27.05.2016   06.06.2016   15.08.2016   25.08.2016   14.10.2016
23.12.2016   12.01.2017   13.03.2017*  02.04.2017*  12.04.2017*

Land cover classification & model selection:
13.03.2017*  02.04.2017*  12.04.2017*  22.05.2017   01.07.2017
11.07.2017   31.07.2017   05.08.2017   15.08.2017   09.09.2017
04.10.2017   09.10.2017   29.10.2017   03.11.2017   18.11.2017
28.11.2017   13.12.2017   23.12.2017   28.12.2017   17.01.2018
22.01.2018   01.02.2018   11.02.2018   16.02.2018   26.02.2018

Change detection – second period:
01.06.2018   01.07.2018   06.07.2018   26.07.2018   31.07.2018   09.09.2018
24.09.2018   19.10.2018   28.11.2018   07.01.2019   12.01.2019   18.03.2019

The data sensed before 17 December 2018 needed to be pre-processed to obtain data at

processing level 2A. The pre-processing was performed using the Sen2Cor processor as a

command line tool. Its scene classification algorithm detects snow, clouds, cirrus, and cloud

shadow and produces a scene classification map with a focus on cloud differentiation. This map

is used as an input for the cirrus removal of the subsequent atmospheric correction. The

atmospheric correction process consists of five steps. Firstly, Look-Up Tables are prepared,

which contain specific information on sensor and solar geometries, atmospheric parameters,

and ground elevation. Different Look-Up Tables are calculated for the specifics of the

respective tile and user configurations. Secondly, Aerosol Optical Thickness (AOT), which is

a measure for the visual transparency of the atmosphere, is retrieved using the Dense Dark

Vegetation algorithm described by Kaufman et al. (1997). Next, water vapour (WV) content

over land is retrieved using the Atmospheric Pre-corrected Differential Absorption algorithm

(Schläpfer, Borel, Keller, & Itten, 1998). Then, cirrus is removed using the classification map

produced by the preceding scene classification algorithm. The visible, near-infrared

(NIR), and short-wave infrared (SWIR) spectral bands in particular are affected by cirrus

clouds, which are difficult to detect with broadband multispectral sensors. Therefore, the MSI of

Sentinel-2 contains a separate band (band 10: ~ 1337.5 – 1413 nm) to sense cirrus clouds.

Finally, surface reflectance is retrieved for all bands. As a result, a bottom-of-the-atmosphere

reflectance output at processing level 2A is produced. The processor was run at a resolution of

10 m. However, since cirrus correction is only possible at 20 and 60 m resolutions, the

processor was run twice: first at 20 m and then at 10 m resolution (Müller-Wilm, 2018a,

2018b). The naming convention of Sentinel-2 products changed on 6 December 2016 to overcome

pathname character limitations of Windows operating systems (ESA, 2018a). With the current

version of the Sen2Cor processor (v2.5.5) the old data naming format could not be processed.

However, for the study area, data in the new naming format are available for download from as

early as 27 May 2016. Thus, for this thesis, conversion from processing level 1C to 2A was

performed for cloud-free scenes between 27 May 2016 and 28 November 2018. Five

earlier cloud-free scenes (between 29 November 2015 and 28 March 2016) in the old naming

format were disregarded for this study.
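The decision logic described above (which scenes need user-side atmospheric correction, and in which order the two Sen2Cor passes are run) can be sketched as follows. This is an illustration and not the script used in the thesis. It assumes the `L2A_Process` executable and its `--resolution` option as provided by Sen2Cor 2.5.x; the product directory name is a hypothetical placeholder, and the commands are only assembled, not executed. The text does not say how a scene sensed exactly on 17 December 2018 would be treated, so the comparison below is one possible reading.

```python
from datetime import date

# Date after which ready-made L2A products were available for tile T36MXE,
# as stated in the text.
L2A_CUTOFF = date(2018, 12, 17)


def needs_sen2cor(sensing_date):
    """True if the scene has to be converted from L1C to L2A by the user."""
    return sensing_date < L2A_CUTOFF


def sen2cor_commands(l1c_product_dir):
    """Command lines for the two passes: first 20 m, then 10 m resolution."""
    return [
        ["L2A_Process", "--resolution", str(resolution), l1c_product_dir]
        for resolution in (20, 10)
    ]


# Hypothetical product directory, for illustration only.
for command in sen2cor_commands("scene_L1C.SAFE"):
    print(" ".join(command))

print(needs_sen2cor(date(2018, 11, 28)), needs_sen2cor(date(2019, 1, 7)))
```

Running the 20 m pass first reflects the constraint mentioned above that cirrus correction is not available at 10 m resolution.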


The Sen2Cor 20 m resolution processing produces separate atmospherically corrected spectral

reflectance bands with 20 m resolution (from the original 20 m and 10 m bands, except B08, which is

omitted in 20 m processing). Additionally, AOT and WV files are produced along with the

scene classification map. The 10 m resolution processing produces atmospherically corrected

files of the original 10 m resolution bands. In addition, AOT and WV files are produced as well as

a true colour image. The cirrus band (B10) is omitted in both cases as it does not provide any

ground information (Müller-Wilm, 2018a).

The 10 m spectral bands (B02, B03, B04, B08) and 20 m spectral bands (B05, B06, B07, B8A,

B11, B12) were combined for each scene (10 bands per scene). The remaining Sen2Cor outputs

were disregarded.

The spatial extent of each tile was cropped to a rectangle covering the area of Rusinga Island

and small parts of the contiguous mainland to reduce the size of the files and consequently the

computing power and memory requirements. Moreover, the selected bands of each scene

were resampled to the highest spatial resolution (10 m). Finally, the processed bands of each

scene were combined into single files, which were saved in GeoTIFF format. The resulting

multi-band subsets were visually evaluated for their suitability. Images which showed disturbance

from clouds or cloud shadows were excluded from the dataset.
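The resampling and stacking step described above can be illustrated on toy grids. The text does not state which resampling method was used, so the sketch below assumes nearest-neighbour resampling, in which each 20 m pixel is repeated as a block of 2 × 2 pixels at 10 m. Plain nested lists stand in for raster layers, and the function names are hypothetical.

```python
def upsample_nearest(band, factor=2):
    """Repeat every pixel `factor` times along rows and columns."""
    out = []
    for row in band:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def stack_bands(bands_10m, bands_20m):
    """Return all bands on the 10 m grid: native 10 m bands first,
    followed by the upsampled 20 m bands."""
    return list(bands_10m) + [upsample_nearest(band, 2) for band in bands_20m]


# Toy data covering the same area: a 4 x 4 grid at 10 m and a 2 x 2 grid at 20 m.
b04 = [[1, 2, 3, 4] for _ in range(4)]
b11 = [[10, 20], [30, 40]]

stacked = stack_bands([b04], [b11])
for row in stacked[1]:
    print(row)
```

With the four 10 m bands and six 20 m bands listed above, the same procedure would yield the ten-band stack per scene used in this study.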

Even if all data are obtained from the same source and have undergone the same processing, some

irregularities might occur. Systematic errors, i.e. those occurring in all scenes sensed by the

instrument, are corrected by ESA in the Payload Data Ground Segment. Non-systematic errors

might occur in single images and cannot be systematically corrected. Thus, any non-systematic

correction needs to be performed by the user (ESA, 2015; Jones & Vaughan, 2010). ESA aims

for 3 m (95% confidence level) performance of the multi-temporal registration, which,

according to their recent Quality Report, is currently at an average of 12 m (Clerc & Team,

2018). This means that images might be slightly shifted. Therefore, the pixel shift of all images

used for this study was calculated and corrected, if a shift was detected. A master image (13

March 2017) was defined, which is the temporally closest cloud-free Sentinel-2 scene to the

data used for the collection of reference data (20 February 2017, as described in Chapter 2.4).

This master image was visually compared to very high resolution (VHR) images (the Google

Satellite layer, acquired by the French National Centre for Space Studies (CNES); the Bing

Virtual Earth layer; the Esri satellite layer, all three embedded in QGIS; and a false colour

Pléiades-1A image) and found to be well aligned, which qualifies it to be used as master image.

Subsequently, the shifts of all images relative to the master image were calculated and corrected

if necessary. The algorithm calculates the best shift based on maximum mutual information,

which is a measure for the mutual dependence of two random variables in information theory

(Leutner et al., 2019). The workflow of data acquisition and pre-processing is visualised in

Figure 3.
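The principle of the shift estimation described above can be sketched on a toy example. An average registration error of 12 m corresponds to roughly one pixel at 10 m resolution, so only small integer shifts are tested. The code is a simplified illustration of a mutual information search and not the routine of the R package cited above: it works on small integer grids, tries whole-pixel shifts only, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) of two equally long value sequences."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in joint.items())


def shifted_pairs(master, slave, dx, dy):
    """Pixel values of master and of slave moved by (dx, dy), overlap only."""
    rows, cols = len(master), len(master[0])
    pairs = [(master[r][c], slave[r - dy][c - dx])
             for r in range(rows) for c in range(cols)
             if 0 <= r - dy < rows and 0 <= c - dx < cols]
    return [m for m, _ in pairs], [s for _, s in pairs]


def best_shift(master, slave, max_shift=2):
    """Integer shift (dx, dy) of slave that maximises mutual information."""
    candidates = [(dx, dy)
                  for dx in range(-max_shift, max_shift + 1)
                  for dy in range(-max_shift, max_shift + 1)]
    return max(candidates,
               key=lambda s: mutual_information(*shifted_pairs(master, slave, *s)))


# Toy scene with four grey levels; the slave shows the same scene displaced
# by one column and two rows relative to the master.
rng = random.Random(0)
master = [[rng.randrange(4) for _ in range(16)] for _ in range(16)]
slave = [[master[(r + 2) % 16][(c + 1) % 16] for c in range(16)] for r in range(16)]

shift = best_shift(master, slave)
print(shift)
```

The returned pair is the correction, in columns and rows, that brings the slave image back into alignment with the master image.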


Figure 3: Workflow of data acquisition and geometric pre-processing. Italic steps indicate

optional operations only for parts of the data.

2.2.3 Vegetation indices

Certain objects or ground covers reflect light differently at different wavelengths. There are

many factors influencing how light interacts with different ground characteristics. For example,

structural features, texture, water content and chlorophyll content determine how vegetation

reflects and absorbs light at different wavelengths. To measure and account for perturbing

parameters, so-called spectral indices are used, which are mathematical combinations of the

spectral bands collected by the satellite sensor. Indices allow for normalisation of the

spectral data against unrelated ground characteristics and for enhanced sensitivity within a narrow

part of the reflective spectrum. The use of vegetation indices is commonly adopted in studies of land cover

mapping and vegetation monitoring using remote sensing (e.g. Eckert, Hüsler, Liniger, &

Hodel, 2015; Lunetta et al., 2006; Motohka, Nasahara, Oguma, & Tsuchida, 2010; Nyberg et

al., 2015; Viña, Gitelson, Nguy-Robertson, & Peng, 2011; Wibowo, Ismullah, Dipokusumo, &

Wikantika, 2012; Yang et al., 2015). Vegetation indices are good and comparable proxies for

vegetation net primary production, vegetation trend analysis and vegetation phenology as they

make use of the vegetation’s sharp increase in reflectance between the red and the near-infrared

bands (Fensholt & Proud, 2012; Jones & Vaughan, 2010). Therefore, the ratio of those two

bands can be used as an indicator of vegetation cover. While vegetated areas are characterised by

a large difference in reflectance between those two bands, the difference in reflectance of bare

soil is low. Accordingly, vegetation indices are useful tools for vegetation monitoring through

change detection and classification and can serve as management and decision-making tools in

environmental management and revegetation activities. For example, bare soil areas can easily

be detected and together with a digital elevation model be used to prioritise areas where

interventions are needed most urgently. As input features for the Random Forest models,

vegetation indices provide additional spectral responses, which the classification

algorithm uses to distinguish the different classes. Different indices have been developed for

different purposes and for different satellite sensors. Remote sensing sensors are constantly

being improved and developed further. Accordingly, the range of different spectral bands

increases and so does the amount of data retrieved from them. This means that vegetation indices


are regularly refined and developed further as the technological improvements advance (Jones

& Vaughan, 2010).

The normalised difference vegetation index (NDVI) (Rouse, Haas, Schell, Deering, & Harlan,

1974) is the most commonly used vegetation index (e.g. Eckert et al., 2015; Fensholt & Proud,

2012; Gandhi, Parthiban, Thummalu, & Christy, 2015; Lunetta et al., 2006). It utilises the fact

that the chlorophyll of vegetation absorbs light in the red spectrum (600 – 700 nm), while the

light in the NIR spectrum (700 – 1000 nm) is reflected. Hence, the NDVI is obtained by

dividing the difference between the NIR and the red band by the sum of the NIR and the red

band. As a result, dense vegetation cover with high photosynthetic activity yields high NDVI

values close to 1, while areas free from vegetation, such as bare soil or water, result in much

lower NDVI values closer to 0 or negative. One adjustment of the NDVI is the green normalised

difference vegetation index (GNDVI) (Gitelson, Kaufman, & Merzlyak, 1996). It is defined

exactly as the NDVI, but replaces the red band with the green band, which improves the

sensitivity to dense vegetation as its wider dynamic range is more sensitive to higher

concentrations of chlorophyll in vegetation compared to the red band used in the NDVI.

Another variation of the NDVI is the green-red vegetation index (GRVI) (Motohka et al., 2010),

which replaces the NIR band with the green band. This modification improves the sensitivity

to colouring of the leaves from green to yellow. Furthermore, the normalised difference infrared

index (NDII) (Kimes, Markham, Tucker, & McMurtrey, 1981) is a variation of the NDVI,

which replaces the red band with the short-wave infrared (SWIR) band, which is sensitive to

leaf water content. Since the MSI collects SWIR reflectance around 1610 nm (band 11) and

around 2190 nm (band 12), two NDII combinations can be derived: NDII11, which uses band

11, and NDII12, which uses band 12. Huete (1988) developed a vegetation index which is also

derived from the NDVI: the soil-adjusted vegetation index (SAVI) corrects the influence of

soil reflectance on vegetation reflectance by adding a constant to the denominator of the NDVI

and adding a multiplication factor to keep the values within the original NDVI bounds (−1 to 1).

Additionally, the inverted red-edge chlorophyll index (IRECI) was developed specifically for

Sentinel-2 data and makes use of the three different vegetation red edge (VRE) bands, which

are collected by the MSI in the spectrum between red and NIR (690 – 795 nm) (Frampton et

al., 2013). It divides the difference between the third red-edge band and the red band by the

ratio of the first and second red-edge bands. A summary of the vegetation indices used in this

study can be found in Table 3.


Table 3: Vegetation indices used in this study.

| Name | Equation | Sentinel-2 bands used | Reference |
| --- | --- | --- | --- |
| Normalised Difference Vegetation Index | $NDVI = \frac{NIR - R}{NIR + R}$ | $\frac{B08 - B04}{B08 + B04}$ | (Rouse et al., 1974) |
| Green Normalised Difference Vegetation Index | $GNDVI = \frac{\rho_{NIR} - \rho_{G}}{\rho_{NIR} + \rho_{G}}$ | $\frac{B08 - B03}{B08 + B03}$ | (Gitelson et al., 1996) |
| Green-Red Vegetation Index | $GRVI = \frac{G - R}{G + R}$ | $\frac{B03 - B04}{B03 + B04}$ | (Motohka et al., 2010) |
| Normalised Difference Infrared Index | $NDII = \frac{\rho_{NIR} - \rho_{SWIR}}{\rho_{NIR} + \rho_{SWIR}}$ | $NDII11 = \frac{B08 - B11}{B08 + B11}$; $NDII12 = \frac{B08 - B12}{B08 + B12}$ | (Kimes et al., 1981) |
| Soil-adjusted Vegetation Index | $SAVI = (1 + 0.75)\,\frac{\rho_{NIR} - \rho_{R}}{\rho_{NIR} + \rho_{R} + 0.75}$ | $(1 + 0.75)\,\frac{B08 - B04}{B08 + B04 + 0.75}$ | (Huete, 1988) |
| Inverted Red-edge Chlorophyll Index | $IRECI = \frac{\rho_{RE3} - \rho_{R}}{\rho_{RE1} / \rho_{RE2}}$ | $\frac{B07 - B04}{B05 / B06}$ | (Frampton et al., 2013) |
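As an illustration of the index definitions in Table 3, the following minimal Python sketch computes the seven indices for a single pixel. It is not the processing chain used in this study: the function names, the handling of zero denominators, and the example reflectance values are assumptions made for illustration only.

```python
import math


def normalised_difference(a, b):
    """Return (a - b) / (a + b); NaN where the denominator is zero."""
    denominator = a + b
    return (a - b) / denominator if denominator != 0 else math.nan


def vegetation_indices(b03, b04, b05, b06, b07, b08, b11, b12, soil_constant=0.75):
    """Compute the seven indices of Table 3 for one pixel.

    Arguments are surface reflectances of the named Sentinel-2 bands.
    soil_constant is the SAVI constant; 0.75 is the value shown in Table 3.
    """
    # IRECI divides by the ratio B05 / B06, so both bands must be non-zero.
    ireci = (b07 - b04) / (b05 / b06) if b05 != 0 and b06 != 0 else math.nan
    return {
        "NDVI": normalised_difference(b08, b04),
        "GNDVI": normalised_difference(b08, b03),
        "GRVI": normalised_difference(b03, b04),
        "NDII11": normalised_difference(b08, b11),
        "NDII12": normalised_difference(b08, b12),
        "SAVI": (1 + soil_constant) * (b08 - b04) / (b08 + b04 + soil_constant),
        "IRECI": ireci,
    }


# Hypothetical reflectance values of a vegetated pixel (illustration only).
pixel = vegetation_indices(b03=0.08, b04=0.05, b05=0.10, b06=0.25,
                           b07=0.30, b08=0.35, b11=0.20, b12=0.10)
```

The same arithmetic applies element-wise when the band values are raster arrays instead of single numbers.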

2.3 Classification scheme

Land cover classification is one of the most common applications of optical satellite data. Since

the beginning of systematic global satellite image processing, the need to standardise land use and land cover classification has been recognised. Institutions like the United States Geological Survey (USGS) and the Food and Agriculture Organization of the United Nations (FAO) have

developed detailed and systematic classification schemes driven by the lack of uniformity and

comparability (Anderson, Hardy, Roach, & Witmer, 1976; Di Gregorio, 2005). It is crucial to

make a distinction between land cover and land use. In this study, the common definitions of

land cover and land use, as adopted for example by FAO and the European Union (EU), were

followed. While land cover describes the Earth’s observed (bio)physical cover, land use refers

to the way it is being used (Di Gregorio, 2005; Eurostat, 2018). A single land cover can serve

different uses: a grassland, for example, can serve as a meadow for growing fodder, as pasture for grazing animals, or as a sports field for football or golf, or it can be an inaccessible area that is not used for anything in particular. Conversely, a single land use type may occur on various land cover types: a recreational area can be a grassland for sports activities, a water body, a sandy beach, a forest, or a built-up urban space. This study focuses solely on land cover. Given the various uses of land and the different purposes of land cover classification, the definition of classes varies greatly within scientific literature depending on the choice of classification scheme and the purpose of

the land cover map (e.g. Congalton, Gu, Yadav, Thenkabail, & Ozdogan, 2014; Di Gregorio,

2005; Herold & Di Gregorio, 2012). While an agricultural map distinguishes precisely between

different types of crops, the fields may be uniformly classified as agricultural fields in a tourist

map regardless of the crop type, because it is not relevant for touristic purposes.

For the purpose of this study, a very fine classification is not necessary as the aim is to provide

a land cover classification map as baseline for future monitoring of vegetation cover

development and afforestation. Therefore, the classes in this study were kept broad to minimise

the potential for misclassifications. The development of the classification scheme in this study

was inspired by the first level classes of the EU’s land use and land cover survey: Artificial

land, cropland, woodland, shrubland, grassland, water areas, wetlands, and bare land (Eurostat,

2018). However, some modifications were made to account for the local character of Rusinga

Island. Wetlands do not exist on the island, so the class was disregarded. Artificial land was

included in bare land. On Rusinga, roads are built of sand and gravel, so they have spectral response patterns very similar to those of bare soil. Buildings as a separate class produced very high

misclassification rates, probably because they are rather small and scattered all over the island,

making them difficult to detect with 10 m spatial resolution data. Roofs of different colours

further cause disturbance in the spectral response patterns. As buildings are often surrounded

by bare soil, they were included in the bare land class in this study. Moreover, the classes

shrubland and woodland were combined in this study, because the reference data were collected

only from satellite imagery and not in situ, which made a distinction between small trees and

shrubs impossible. No large or dense forests exist on the island, but woody vegetation is

scattered, which further complicated the distinction between woodland and shrubland.

A second classification scheme with four classes was introduced at a later stage of the research

and compared to the 5-class scheme described above. This was done as a response to high

misclassification rates especially between the grassland and the cropland classes. The confusion

is not surprising since the spectral signatures of the two classes are very similar (Figure 4).

While the woodland and grassland classes displayed similar spectral signatures, those of the

grassland and cropland classes aligned even more closely. Consequently, the number of classes was reduced to four: water and bare land remained unchanged, while woodland was renamed continuous vegetation and extended to include evergreen crops and other dense, year-round (grassy) vegetation. The remaining cropland and grassland samples, which correspond to scattered or seasonal vegetation, form the fourth class. Table 4 lists and describes the land cover

classes for the two schemes used in this study.


Figure 4: Average spectral signatures of the five land cover classes (top) and the four land

cover classes (bottom).

Table 4: Description of land cover classes.

| 5 classes: Name | 5 classes: Description | 4 classes: Name | 4 classes: Description |
| --- | --- | --- | --- |
| Woodland | high / woody / dark green / dense vegetation | Continuous vegetation | dense / evergreen vegetation; trees |
| Grassland | low / grassy / light green / scattered vegetation | Scattered or seasonal vegetation | seasonal / scattered vegetation |
| Cropland | rectangular fields; seasonal vegetation | | |
| Bare land | bare soil; roads; buildings | Bare land | bare soil; roads; buildings |
| Water | open water (Lake Victoria) | Water | open water (Lake Victoria) |


2.4 Reference data

Supervised classification requires reference data, which are used to train, validate, and test the classification model. Reference data for this study were collected through visual image interpretation. Because the Sentinel-2 scenes have a maximum spatial resolution of 10 m, it is difficult to identify different types of land cover visually in these images. Consequently, reference data were collected using VHR images. The Google Satellite View imagery, which was acquired by the CNES Pléiades-1A satellite, has a spatial resolution of 0.5 m, and can be loaded in QGIS, was used to identify and classify reference samples. This imagery was acquired on 20

February 2017 for most of the study area. A small part in the north east of the island is covered

by imagery dated 30 June 2017 (Google, 2018). The Pléiades-1A scene from 20 February 2017

was also used as a false colour composite, which facilitated the identification of different vegetation types as it represents vegetation cover more sensitively than the true colour composite. Moreover, the Esri World Imagery, which can be embedded in QGIS, was acquired by DigitalGlobe on 29 August 2016, and has a spatial resolution of 0.46 m, was used to verify the visual classification (DigitalGlobe, 2016). Consulting both VHR images is useful, as they were acquired relatively close in time but represent different seasons. While the CNES image from mid-February represents the peak of the major dry season on Rusinga, the DigitalGlobe image from August was taken between the rains, after the major rainy season. Even though the DigitalGlobe image does not represent the rainy season, it differs clearly from the CNES image and is much greener. Ultimately, the reference dataset used in this study consists of two datasets that were initially collected separately and combined at a later stage of this thesis work. Originally, the training dataset consisted of polygons, whose mean reflectance was used to train the models, while the validation dataset comprised data points. However, it was decided to convert the training samples into point data and to combine the two datasets for two reasons. Firstly, using the polygons’ mean reflectance values implies using artificially constructed values that do not actually occur in the images. Secondly, combining the two datasets results in more accurate model predictions, because more samples can be used for training the models.

For the first part of the reference data, initially the training dataset, 322 samples were collected.

To best represent the actual landscape of the study area, 250 points were randomly distributed across the island and an additional 150 m buffer around it to cover the coastal areas and the lake. For each point, the class was visually identified using the VHR images and a corresponding homogeneous polygon was defined at or as close to the point as possible. If it was not possible to identify the class at the location of the point, a polygon of a different class representing the nearby area was defined. A drawback of simple random sampling is the potential underrepresentation of classes (Lillesand et al., 2015). A stratified random sampling method, with each class representing a stratum, would be useful. However, since the points need to be randomly selected before classes can be assigned to them, it is technically not possible to stratify the random sample by class. Instead, a 500 m x 500 m grid was laid over the island area and it was ensured that each grid cell included at least one training area to achieve a well-distributed set of polygons. If a cell did not contain a training sample, a new polygon was added, preferably representing a potentially underrepresented class. When defining the training polygons, it was also considered that spectral response patterns differ not only between land cover types but also within classes, especially if the classes are defined relatively broadly, as they are in this study. Therefore, the reference data were intended to cover as many of these intra-class variations as possible (Lillesand et al., 2015).
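The grid-based coverage check described above can be sketched as follows. This is an illustrative Python example, not the procedure carried out in QGIS for this study: it assumes projected coordinates in metres and a rectangular bounding box instead of the island outline, and the function name and the example extent are hypothetical.

```python
import math
import random


def empty_grid_cells(points, x_min, y_min, x_max, y_max, cell_size=500.0):
    """Return the (column, row) indices of grid cells that contain no point."""
    n_cols = math.ceil((x_max - x_min) / cell_size)
    n_rows = math.ceil((y_max - y_min) / cell_size)
    occupied = set()
    for x, y in points:
        # Points on the upper or right edge are assigned to the last cell.
        col = min(int((x - x_min) // cell_size), n_cols - 1)
        row = min(int((y - y_min) // cell_size), n_rows - 1)
        occupied.add((col, row))
    all_cells = {(c, r) for c in range(n_cols) for r in range(n_rows)}
    return sorted(all_cells - occupied)


# Hypothetical 2 km x 1 km extent with five random points (illustration only).
rng = random.Random(42)
points = [(rng.uniform(0, 2000), rng.uniform(0, 1000)) for _ in range(5)]
gaps = empty_grid_cells(points, 0, 0, 2000, 1000)
```

Each cell returned in `gaps` would then receive an additional, manually defined training polygon.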

For the second part of the reference data, initially the validation dataset, areas that could be classified unambiguously were visually identified using the VHR imagery described above. Polygons were defined and assigned a class. Their centroids were then calculated and used as reference pixels. This purposive sampling method was chosen to obtain samples that are as information-rich and accurate as possible. This resulted in a total of 215 samples.

For both reference datasets, plots of the NDVI of each sample over time were produced and inspected. The plots showed disturbances and noise, which were presumably caused by impure data. To resolve this problem, the initial training data were converted from polygon data to point data. Although this drastically reduced the overall surface area used for training, it also made the data purer and the spectral signatures much clearer. Moreover, all sample points were revised and corrected if necessary. At this stage, the 4-class classification scheme was introduced: besides revising the classes assigned in the 5-class scheme, a class in the 4-class scheme was assigned to each sample point. The process resulted in a total of 537 reference samples, each of which was assigned a class in the 5-class as well as the 4-class scheme. Their distribution by geolocation and class is shown in Figure 5. These samples were then randomly split into two datasets: one containing 70% (376) of the samples for model training and validation, and a second containing 30% (161) of the samples for independent testing of the models. This test dataset was used exclusively for assessing the accuracy of the models on independent data that were not involved in building the models in any way. The use of independent test data is essential if kappa coefficients, whose comparison assumes independence of samples, are used for accuracy assessment of the classification (Foody, 2004), which is the case in the present study. Using the same data for training a model and for assessing its accuracy results in an overestimation of the accuracy (Congalton, 1991).
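The 70/30 split can be illustrated with a short Python sketch. It assumes a simple (unstratified) random split and a fixed random seed. Neither is specified in the text, and the function name is hypothetical.

```python
import random


def split_samples(sample_ids, train_fraction=0.7, seed=0):
    """Randomly split sample identifiers into a training and a test set."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 537 reference samples, identified here simply by their index.
train_ids, test_ids = split_samples(range(537))
print(len(train_ids), len(test_ids))  # 376 161
```

The test identifiers are set aside and not touched again until the final accuracy assessment.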


Figure 5: Locations and distribution of reference samples (training and test data combined) in

5-class scheme (upper) and 4-class scheme (lower).


2.5 Random Forest land cover classification model development

Digital image classification approaches can broadly be divided into spectral pattern recognition, spatial pattern recognition, and hybrid approaches, such as object-based image analysis, which is receiving

increasing attention in scientific literature (Duro, Franklin, & Dubé, 2012; Hussain, Chen,

Cheng, Wei, & Stanley, 2013; Immitzer, Atzberger, & Koukal, 2012; Immitzer, Vuolo, &

Atzberger, 2016; Karlson, Ostwald, Reese, Bazié, & Tankoano, 2016; Myint, Gober, Brazel,

Grossman-Clarke, & Weng, 2011; Weih & Riggan, 2010). This study explored the potential of

using free open access Sentinel-2 data for land cover classification with Random Forest and

aimed to develop a tool for change detection and vegetation monitoring. There is a wide range

of VHR satellite images available commercially with resolutions as high as 0.3 m

(DigitalGlobe, 2017). Among free, open-access satellite images, however, Sentinel-2 images offer the highest spatial resolution currently available (10 m). Studies comparing pixel-based and

object-based classification methods with 10 m spatial resolution imagery have found no

improved classification accuracy for the object-based approach (Duro et al., 2012; Immitzer et

al., 2016) while more accurate results have been shown for object-based classification with

VHR imagery with a spatial resolution of 2.0 – 2.4 m in the multispectral bands (Immitzer et al., 2012; Myint et al.,

2011). Thus, the 10 m spatial resolution of Sentinel-2 images is probably too low for spatial

pattern recognition, but their great variety of spectral bands provides a solid base for a spectral

pattern recognition approach to classify different types of land cover using a pixel-based

analysis. An ongoing ESA-funded research project on using Sentinel-2 data for automated

global land cover mapping also relies on pixel-based supervised classification with Random

Forest as it was found to be the most suitable approach (Lewinski et al., 2017).

Random Forest is a supervised, non-parametric machine learning algorithm developed by Breiman (2001) in the 1990s and early 2000s. It can be applied to both regression and classification problems; in this study, it was used for the latter. Random Forest has become increasingly popular for classification problems, especially in remote sensing (Immitzer, 2017; Karlson, 2015; Schultz et al., 2015; Zhu & Woodcock, 2014), and a benchmark study of state-of-the-art supervised classification methods has shown it to be an adequate classification strategy for producing land cover maps (Inglada et al., 2015). It handles high-dimensional data, thus allows for a large number of input features, and is robust against overfitting. As a non-parametric classifier, it requires no assumptions about the data distribution (Breiman, 2001). Moreover, it provides an internal error rate estimate and two feature importance ranking tools. An additional advantage of Random Forest is its simplicity. Only two parameters need to be set: the number of trees the model builds (ntree) and the number of features randomly selected at each node (mtry). The Random Forest model then grows the pre-determined number of decision trees. For each tree, a bootstrap sample is drawn with replacement from the training dataset. On average, about two thirds of the training samples are drawn into the bootstrap sample and used to grow the decision tree, while the remaining third, the so-called out-of-bag (OOB) data, is used for internal model validation to estimate the generalisation error. The OOB data of each tree, which were not used to grow it, are passed down that tree, and the resulting misclassification rate is the OOB error. Its complement gives an estimate of the overall accuracy (OA) of the classification model. At each node of a tree, a random subset of mtry input features is selected as candidates for the split, and each tree eventually votes for a class. The final classification is derived from a majority vote of all trees in the forest (Breiman, 2001). In this study, ntree was set to 500, which is the default and has proved suitable in previous research (Breiman et al., 2018; Immitzer et al., 2012; Liaw & Wiener, 2002). Different mtry values were tested, of which the default, the square root of the number of input features rounded down (Breiman et al., 2018), proved suitable. Thus, mtry = 4 for the single-date datasets with 17 input features and mtry = 20 for the multi-temporal datasets with 425 input features.
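The two quantities discussed above, the default mtry and the share of out-of-bag samples, can be checked with a small Python sketch. It is an illustration under assumptions, not the randomForest implementation used in this study: the function names are hypothetical, and the training set size of 376 is taken from Section 2.4. For a training set of n samples, the probability that a given sample is never drawn is (1 - 1/n)^n, which approaches e^(-1) ≈ 0.368, i.e. roughly one third.

```python
import math
import random


def default_mtry(n_features):
    """Default mtry for classification: square root of the feature count, rounded down."""
    return math.floor(math.sqrt(n_features))


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample with replacement and return (in-bag, out-of-bag) indices."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


rng = random.Random(1)
in_bag, out_of_bag = bootstrap_split(376, rng)
oob_share = len(out_of_bag) / 376  # close to one third for a single tree

print(default_mtry(17), default_mtry(425))  # 4 20
```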

Since the internal OOB error estimates are based on bootstrapping and bootstrapping might be

problematic with small sample sizes (Kuhn & Johnson, 2013), the OOB bootstrap validation

results were compared to the results of repeated 10-fold cross-validation. Repeated 10-fold cross-validation was used because it yields acceptable bias and variance, even with

rather small sample sizes (Kuhn & Johnson, 2013). The model performance results were

compared based on the Kappa coefficient (J. Cohen, 1960), because it is particularly useful

when data are distributed unevenly among the classes (Kuhn, 2008), which is the case here

(Figure 5). Because the results did not differ much, the internal bootstrap validation was

considered reliable and in this study the model performance was estimated based on the

Random Forest OOB error.
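How repeated 10-fold cross-validation partitions the training data can be sketched in a few lines of Python. The number of repetitions (five), the random seed, and the function name are assumptions for illustration; the text does not state how many repetitions were used.

```python
import random


def repeated_k_fold(n_samples, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, validation_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        # Every k-th element of the shuffled order forms one fold.
        folds = [order[i::k] for i in range(k)]
        for fold, validation_idx in enumerate(folds):
            held_out = set(validation_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, fold, train_idx, validation_idx


# 376 training samples (Section 2.4) give 5 x 10 = 50 train/validation splits.
splits = list(repeated_k_fold(376))
```

A model would be fitted on `train_idx` and evaluated on `validation_idx` for every split, and the resulting kappa values averaged.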

2.5.1 Predictor datasets

As the second Sentinel-2 satellite was launched in March 2017, considerably more data have been available since then. Therefore, the period March 2017 – February 2018 was selected for model development. A range of models were developed from different predictor datasets and compared to assess which is the most suitable for land cover mapping on Rusinga Island using Sentinel-2 data and Random Forest classification. Table 5 provides an overview of the different predictor datasets and the scenes they comprise. A major distinction can be made between single-date and multi-temporal datasets. The single-date datasets each include the 10 spectral bands as well as the seven vegetation indices of one selected scene. One scene representing the major wet season from March to May and one scene representing the dry season from January to February were selected. The selection for each season was based on the lowest OOB error. As in the single-date datasets, the respective set of seven vegetation indices was also included in the multi-temporal datasets. The first multi-temporal dataset uses all 25 scenes covering the selected study period. The second was constructed from all four wet season scenes, while the third used all six dry season scenes. For the fourth multi-temporal dataset the seasons were combined, but the number of input features was reduced by using only one scene from each season, namely the one that produced the best-performing model in the selection of the single-date datasets. The fifth and the sixth multi-temporal predictor datasets comprised 6 and 12 randomly selected scenes respectively. In this way, the total amount of input data was reduced compared to the dataset with all scenes, but no assumptions were made about the suitability of different seasons; the randomly selected datasets therefore also represent the entire year. For each of the randomly selected predictor datasets, three sets of random samples were produced to account for the variability introduced by the random selection. The classification accuracies of single-date and multi-temporal data, as well as of different seasons, have been compared in previous studies, mostly in favour of multi-temporal classification, but it has to be recognised that conditions differ and that in some cases single-date classifications outperform multi-temporal approaches (Karlson et al., 2016; Langley, Cheshire, & Humes, 2001; Löw, Knöfel, & Conrad, 2015; Siachalou, Mallinis, & Tsakiri-Strati, 2015). Each of the predictor datasets was used with the 5-class scheme as well as the 4-class scheme.

Table 5: Selected scenes for the predictor datasets (*same scenes for 5 classes and 4 classes).

| Name | Scene(s), 5 classes | Scene(s), 4 classes |
| --- | --- | --- |
| Single-date wet season | 22.05.2017 | 13.03.2017 |
| Single-date dry season | 22.01.2018 | 17.01.2018 |
| Multi-temporal all scenes* | 13.03.2017, 02.04.2017, 12.04.2017, 22.05.2017, 01.07.2017, 11.07.2017, 31.07.2017, 05.08.2017, 15.08.2017, 09.09.2017, 04.10.2017, 09.10.2017, 29.10.2017, 03.11.2017, 18.11.2017, 28.11.2017, 13.12.2017, 23.12.2017, 28.12.2017, 17.01.2018, 22.01.2018, 01.02.2018, 11.02.2018, 16.02.2018, 26.02.2018 | same as 5 classes |
| Multi-temporal wet season* | 13.03.2017, 02.04.2017, 12.04.2017, 22.05.2017 | same as 5 classes |
| Multi-temporal dry season* | 17.01.2018, 22.01.2018, 01.02.2018, 11.02.2018, 16.02.2018, 26.02.2018 | same as 5 classes |
| Multi-temporal combined seasons | 22.05.2017, 22.01.2018 | 13.03.2017, 17.01.2018 |
| Multi-temporal 6 random scenes, 1st sample* | 01.07.2017, 31.07.2017, 15.08.2017, 04.10.2017, 28.11.2017, 22.01.2018 | same as 5 classes |
| Multi-temporal 6 random scenes, 2nd sample* | 13.03.2017, 09.10.2017, 29.10.2017, 22.01.2018, 01.02.2018, 11.02.2018 | same as 5 classes |
| Multi-temporal 6 random scenes, 3rd sample* | 13.03.2017, 02.04.2017, 12.04.2017, 11.07.2017, 05.08.2017, 28.11.2017 | same as 5 classes |
| Multi-temporal 12 random scenes, 1st sample* | 13.03.2017, 01.07.2017, 31.07.2017, 05.08.2017, 15.08.2017, 04.10.2017, 09.10.2017, 13.12.2017, 28.12.2017, 17.01.2018, 22.01.2018, 26.02.2018 | same as 5 classes |
| Multi-temporal 12 random scenes, 2nd sample* | 13.03.2017, 12.04.2017, 22.05.2017, 05.08.2017, 09.10.2017, 29.10.2017, 13.12.2017, 28.12.2017, 22.01.2018, 01.02.2018, 11.02.2018, 16.02.2018 | same as 5 classes |
| Multi-temporal 12 random scenes, 3rd sample* | 13.03.2017, 02.04.2017, 12.04.2017, 22.05.2017, 11.07.2017, 05.08.2017, 04.10.2017, 09.10.2017, 18.11.2017, 28.11.2017, 13.12.2017, 01.02.2018 | same as 5 classes |
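The construction of the predictor datasets in Table 5 can be sketched in Python as follows. The sketch only builds feature names and draws random scene subsets; it is not the code used in this study. The band list is an assumption (the text only states that 10 spectral bands were used), as are the naming pattern, the function names, and the random seed.

```python
import random

# Assumption: the ten 10 m and 20 m Sentinel-2 bands.
BANDS = ["B02", "B03", "B04", "B05", "B06", "B07", "B08", "B8A", "B11", "B12"]
INDICES = ["NDVI", "GNDVI", "GRVI", "NDII11", "NDII12", "SAVI", "IRECI"]

ALL_SCENES = [
    "13.03.2017", "02.04.2017", "12.04.2017", "22.05.2017", "01.07.2017",
    "11.07.2017", "31.07.2017", "05.08.2017", "15.08.2017", "09.09.2017",
    "04.10.2017", "09.10.2017", "29.10.2017", "03.11.2017", "18.11.2017",
    "28.11.2017", "13.12.2017", "23.12.2017", "28.12.2017", "17.01.2018",
    "22.01.2018", "01.02.2018", "11.02.2018", "16.02.2018", "26.02.2018",
]


def feature_names(scene_dates):
    """One feature per scene and per band or index, e.g. '13.03.2017_NDVI'."""
    return [f"{date}_{name}" for date in scene_dates for name in BANDS + INDICES]


def random_scene_sets(scenes, n_scenes, n_sets=3, seed=0):
    """Draw n_sets random subsets of n_scenes scenes, kept in chronological order."""
    rng = random.Random(seed)
    return [sorted(rng.sample(scenes, n_scenes), key=scenes.index)
            for _ in range(n_sets)]


print(len(feature_names(ALL_SCENES[:1])), len(feature_names(ALL_SCENES)))  # 17 425
six_scene_sets = random_scene_sets(ALL_SCENES, 6)
```

With a different seed the drawn subsets differ, which is the variability the three samples per dataset in Table 5 are meant to capture.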

2.5.2 Input feature selection and feature importance ranking

The input features influence the accuracy of the classification as not all input features are

equally relevant. The Random Forest classification algorithm provides two tools for input

feature importance ranking: the mean decrease in accuracy (MDA) and the mean decrease in

Gini (MDG). These tools generate a ranking of the input features based on their relative

importance to the classification results. The MDA is derived by randomly permuting the values of each input feature in the OOB data while keeping all other variables constant. The feature’s relative importance is determined by the effect this permutation has on the misclassification rate. The greater the MDA of a feature, the greater its importance to the classification (Breiman, 2001). The MDG is derived from the sum of the Gini impurity decreases achieved by splits on a feature, normalised over the total number of trees. The Gini impurity is a measure of the probability of a sample being misclassified at a node. The further down a tree, the purer the splits and the smaller the Gini impurity at each node. Hence, the greater the MDG of a feature, the greater its importance (Breiman & Cutler, 2004). Even though Random Forest handles high-dimensional data and can hence deal with a large number of input features, model performance can be optimised by eliminating input features with low explanatory power (Hastie, Tibshirani, & Friedman, 2009). For the feature selection in this study, a recursive elimination process based on the MDA measure was applied: the models were run n times, where n is the number of features, starting with all n features and reducing this number by one in every iteration until no input features remained. Of these iterated models, the one with the maximum OA was selected as the best model. If several iterations reached the same maximum OA, the one with the lowest number of input features was selected as the best model (Immitzer, 2017). The classification performance of the different models tested in this study was compared before and after the feature selection process based on the OA estimate.
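The recursive elimination can be sketched generically in Python. The sketch assumes that the feature with the lowest MDA is dropped in each iteration, which the text implies but does not state explicitly. The model fitting is abstracted into a hypothetical callback `fit_and_rank`, replaced here by a toy function so that the example runs without a classifier.

```python
def recursive_elimination(features, fit_and_rank):
    """Backward feature elimination.

    fit_and_rank(features) must return (overall_accuracy, {feature: importance}).
    Returns the smallest feature set that reaches the maximum overall accuracy.
    """
    remaining = list(features)
    best_features, best_oa = None, -1.0
    while remaining:
        oa, importance = fit_and_rank(remaining)
        # ">=" lets a later, smaller feature set win ties in overall accuracy.
        if oa >= best_oa:
            best_features, best_oa = list(remaining), oa
        weakest = min(remaining, key=lambda name: importance[name])
        remaining.remove(weakest)
    return best_features, best_oa


def toy_fit_and_rank(features):
    """Stand-in for model fitting: only three features carry information."""
    useful = {"NDVI": 0.5, "B08": 0.3, "B11": 0.1}
    oa = sum(useful.get(name, 0.0) for name in features)
    importance = {name: useful.get(name, 0.0) for name in features}
    return oa, importance


selected, oa = recursive_elimination(["B02", "NDVI", "B08", "B03", "B11"],
                                     toy_fit_and_rank)
print(selected)  # ['NDVI', 'B08', 'B11']
```

In a real application, `fit_and_rank` would train a Random Forest on the given features and return its OOB-based OA together with the MDA of each feature.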

Moreover, the MDA was used to rank the input features after feature selection to evaluate the relative importance of the different features. The features were grouped by type, i.e. the 10 spectral bands and the seven vegetation indices. For each feature type, the frequency of occurrence and a score were calculated to compare the importance of the different feature types. The feature ranking was sorted by MDA value and each feature was assigned a score from 1 for the lowest-ranking feature to n for the highest-ranking feature, where n is the number of input features after feature selection. The individual scores were summed and normalised per feature type. The same procedure was applied for grouping the input features by date. A potential correlation between the importance of the dates and vegetation seasonality was investigated by plotting the relative importance of the dates in the different models against the NDVI derived from surface reflectance, which is highly sensitive to plant greenness and consequently provides a good representation of the vegetation’s phenological cycle (Willis, 2015).
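The scoring of feature types can be illustrated with the following Python sketch. The normalisation is an assumption: the summed scores of each type are divided by the sum of all scores, since the exact normalisation is not specified in the text. The feature naming pattern, the function name, and the MDA values are hypothetical.

```python
from collections import defaultdict


def type_scores(mda_by_feature, feature_type):
    """Return {type: (frequency, normalised score)} from per-feature MDA values."""
    ranked = sorted(mda_by_feature, key=mda_by_feature.get)  # lowest MDA first
    scores = defaultdict(int)
    counts = defaultdict(int)
    for rank, name in enumerate(ranked, start=1):
        scores[feature_type(name)] += rank  # score 1 = lowest, n = highest
        counts[feature_type(name)] += 1
    total = sum(scores.values())
    return {t: (counts[t], scores[t] / total) for t in scores}


# Hypothetical MDA values of four selected features named '<date>_<type>'.
mda = {
    "13.03.2017_NDVI": 12.0,
    "22.05.2017_NDVI": 9.0,
    "13.03.2017_B08": 7.5,
    "22.05.2017_B11": 3.0,
}
by_type = type_scores(mda, feature_type=lambda name: name.split("_")[1])
by_date = type_scores(mda, feature_type=lambda name: name.split("_")[0])
```

Grouping by date instead of type only requires a different `feature_type` function, as shown in the last line.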

2.5.3 Model accuracy assessment

Accuracy assessment is, like the classification itself, an important part of the classification process; i.e. the classification is not complete until its accuracy has been assessed (Congalton, 1991). When different models are compared, as in this study, it is essential to evaluate their accuracies. The use of an independent test dataset for the accuracy assessment is not only good practice, but crucial for certain approaches, such as Cohen’s kappa coefficient (J. Cohen, 1960). The samples used to calculate the kappa coefficient are assumed to be independent, which makes it necessary to use an independent test dataset that was held out of the model development process (Foody, 2004).

One of the most commonly used techniques for such comparisons in accuracy assessment is the confusion matrix (Congalton, 1991; Lillesand et al., 2015), which compares the classes predicted by the model to the classes of the independent test dataset. It presents not only the OA, which is the percentage of correctly allocated classes, but also includes error measures for the individual classes. This is particularly useful when one is primarily interested in one or a few classes. The producer’s accuracy (PA) indicates the percentage of the test samples of a class that the model classified correctly. The user’s accuracy (UA), in turn, indicates the percentage of the samples assigned to a class by the model that belong to this class in the real world, that is, in the test data. The OA was calculated as follows:

$$\mathrm{OA} = \frac{\sum_{i=1}^{r} x_{ii}}{N} \qquad \text{(Equation 1)}$$

where

$N$ = total number of observations

$r$ = number of classes

$x_{ii}$ = number of correctly classified observations of each class

In addition, the kappa coefficient is often used in combination with the confusion matrix

(Immitzer et al., 2016; Karlson et al., 2016; Lillesand et al., 2015). It provides an indication of

the actual agreement between the validation data and the modelled classification outputs as it

accounts for chance agreement, i.e. the case that pixels are classified correctly just by chance

instead of by correct modelling. The kappa coefficient usually ranges between 0 and 1 where 1

indicates perfect agreement and 0 indicates no better classification than chance agreement.

Kappa ($\kappa$) was calculated as follows (adopted from Lillesand et al., 2015):

$$\kappa = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} (x_{i+} \cdot x_{+i})}{N^{2} - \sum_{i=1}^{r} (x_{i+} \cdot x_{+i})} \qquad \text{(Equation 2)}$$

where

$N$ = total number of observations

$r$ = number of classes

$x_{ii}$ = number of correctly classified observations of each class

$x_{i+}$ = total number of observations of the modelled classification of each class

$x_{+i}$ = total number of observations of the test data of each class

The prediction performance of the different models was assessed by predicting the classes of

the independent test dataset using the models and comparing the results to the actual classes,

which were assigned to the data by the author through visual interpretation. The model

accuracies were assessed by comparing their OA and their kappa coefficients.
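Equations 1 and 2, together with the producer's and user's accuracies, can be computed from predicted and reference labels as in the following Python sketch. It is a minimal illustration with made-up labels, not the code used in this study. The function name is hypothetical, and the degenerate case in which the denominator of Equation 2 is zero is not handled.

```python
import math


def accuracy_metrics(predicted, reference, classes):
    """Return OA, kappa, and per-class producer's and user's accuracies."""
    index = {name: i for i, name in enumerate(classes)}
    r = len(classes)
    # matrix[i][j]: samples predicted as class i whose reference class is j
    matrix = [[0] * r for _ in range(r)]
    for p, t in zip(predicted, reference):
        matrix[index[p]][index[t]] += 1
    n = sum(map(sum, matrix))
    diagonal = sum(matrix[i][i] for i in range(r))
    row_totals = [sum(row) for row in matrix]                 # x_{i+}
    col_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]  # x_{+i}
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    oa = diagonal / n                                          # Equation 1
    kappa = (n * diagonal - chance) / (n ** 2 - chance)        # Equation 2
    pa = {c: matrix[i][i] / col_totals[i] if col_totals[i] else math.nan
          for c, i in index.items()}
    ua = {c: matrix[i][i] / row_totals[i] if row_totals[i] else math.nan
          for c, i in index.items()}
    return oa, kappa, pa, ua


# Made-up test labels for three classes (illustration only).
reference = ["water"] * 4 + ["bare"] * 3 + ["wood"] * 3
predicted = ["water", "water", "water", "water",
             "bare", "bare", "wood",
             "wood", "wood", "bare"]
oa, kappa, pa, ua = accuracy_metrics(predicted, reference, ["water", "bare", "wood"])
```

In this example eight of ten samples are classified correctly, so the OA is 0.8, while kappa is lower (46/66, about 0.70) because part of the agreement is expected by chance.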

2.6 Post processing and land cover map creation

The selected models were used to predict the class of every pixel in order to produce a land cover classification of the entire study area. The images to be classified need to comprise exactly the same layers (i.e. features) as were used for developing the predictive model (Hijmans et al., 2019).

To reduce noise and achieve a better visual representation of the classification map, a 3x3 majority filter was applied. The majority filter uses a 3x3 kernel to re-determine the class of each pixel based on the majority class among the pixel itself and its eight immediate neighbours. Additionally, for the same purpose, a sieve filter was used, which removes connected groups of pixels of the same class that are smaller than six pixels. Diagonal pixels are considered connected (Lillesand et al., 2015; NASA Applied Remote Sensing Training, 2018). This procedure was also tested with a 5x5 majority filter and a 10-pixel sieve. However, the resulting maps turned out too coarse and generalised.
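The effect of the 3x3 majority filter can be illustrated with a small pure-Python sketch. It is not the GIS tool used in this study, and two details are assumptions: at the map edges the window is truncated to the available pixels, and a pixel keeps its original class when no single class is most frequent in the window.

```python
from collections import Counter


def majority_filter(grid):
    """Apply a 3x3 majority filter to a 2D list of class labels."""
    n_rows, n_cols = len(grid), len(grid[0])
    filtered = [row[:] for row in grid]
    for r in range(n_rows):
        for c in range(n_cols):
            # The pixel itself plus its (up to eight) immediate neighbours.
            window = [grid[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                      for cc in range(max(c - 1, 0), min(c + 2, n_cols))]
            (top_class, top_count), *rest = Counter(window).most_common()
            # Only reassign when one class is strictly the most frequent.
            if not rest or top_count > rest[0][1]:
                filtered[r][c] = top_class
    return filtered


classified = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 1, 3],
    [3, 3, 3, 3],
]
smoothed = majority_filter(classified)
```

In this example the isolated pixel of class 2 is absorbed by the surrounding class 1, while the boundary between classes 1 and 3 is left unchanged.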

2.7 Change detection

Change detection was performed to identify, visualise, and analyse land cover changes on the

island. As for image classification, a range of different change detection techniques exist. A

comprehensive review by Tewkesbury et al. (2015) concludes that, despite the great variety of

different methods, pixel-based post-classification techniques remain the most popular

choices for change detection due to their fast, convenient, descriptive and easy-to-interpret

characteristics. Because usable Sentinel-2 data have only been available since May 2016 for the study

area, change detection could only be performed over a time period of three years, which is very

short for land cover change detection. Huang et al. (2009) suggest a time period covering a

minimum of ten years for effective change detection, however with data acquired every two

years to account for rapid changes. Consequently, it was hypothesised that no noteworthy

changes could be detected over the analysed time frame. However, this change detection

application also served as a test and an example for future change detection to be performed by local

organisations.

For the post-classification change detection in this study two time periods were compared:

1. May 2016 – April 2017

2. May 2018 – April 2019

Although the models with all 25 scenes were selected for further analysis of the land

cover classification because they yielded the highest accuracies, the number of scenes for the

change detection was limited to 10 and 12 for the first and second time period respectively.

This was done to limit model complexity and is justified by the high accuracy of the

tested models with 12 randomly selected scenes, which comes very close to the accuracy of the

models with all scenes and is therefore accepted as an alternative. Moreover, for the first time

period only 10 images of the study area are available. Thus, for the second period 12 scenes

were randomly selected from the available Sentinel-2 images in which the study area was

cloud-free. For the first time period all 10 cloud-free scenes were used to create the predictor

dataset. The selected scenes were downloaded and pre-processed as described in Chapter 2.2.

The vegetation indices were calculated, which, together with the spectral bands of each selected

scene, make up the predictor datasets. The training data was extracted from the predictor dataset

and used to train the Random Forest models and select the best model after feature selection for

each of the two time periods. For the proof of concept of the change detection in this study, the

same training and test samples were used for both time periods as they are temporally so close

to each other and to the reference data date. Thus, it was assumed that no major land cover

changes occurred during that time. For change detection covering a larger time span new

reference data needs to be collected or the existing reference data needs to be updated. The

accuracy of each model was assessed on the independent test dataset as described in

Chapter 2.5.3. The models were used to predict the classes of the entire images and

subsequently produce land cover maps. Filtering the maps was done after the comparison to

maintain the original precision at this stage. To allow for unique change values the classification

output of the first time period was reclassified by the following formula:

𝑟 = 𝑐 ∙ 5 + 1

Equation 3

where

𝑐 = initial class

𝑟 = new class

Then, the newer image was subtracted from the reclassified older image, resulting in a change map with

unique values for each change direction. For visual representation these change maps were

filtered in the same way as the land cover maps described in Chapter 2.6. Figure 6 presents an

overview of the entire workflow.
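
The reclassification and subtraction can be illustrated with a short Python sketch. It assumes that the classes are coded as integers from 1 upwards (at most five classes), which is not stated explicitly in the text. The function names and the two small example grids are hypothetical.

```python
FACTOR = 5  # multiplier taken from Equation 3


def reclassify(c):
    """Equation 3: map an initial class c to the new class r."""
    return c * FACTOR + 1


def change_value(old_class, new_class):
    """Reclassified older class minus the newer class."""
    return reclassify(old_class) - new_class


def decode(value):
    """Recover (old_class, new_class); assumes class codes 1..FACTOR."""
    old_class = (value - 1) // FACTOR + 1
    new_class = reclassify(old_class) - value
    return old_class, new_class


# Hypothetical 2 x 3 classifications with class codes 1-4.
older = [[1, 2, 3], [4, 4, 2]]
newer = [[1, 3, 3], [2, 4, 1]]

change_map = [
    [change_value(o, n) for o, n in zip(row_old, row_new)]
    for row_old, row_new in zip(older, newer)
]

# Every (old, new) combination of four classes yields a distinct value.
codes = {change_value(o, n) for o in range(1, 5) for n in range(1, 5)}
print(change_map, len(codes))
```

Without the reclassification, a plain difference of class codes would be ambiguous: a change from class 1 to 2 and a change from class 2 to 3 would both yield -1.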

Figure 6: Workflow of the land cover classification and change detection.

[Flowchart. Elements recoverable from the figure, in reading order:]

- S2 data
- Pre-processing: atmospheric correction, spectral band selection, spatial extent selection, resampling to 10 m, image coregistration
- Vegetation indices: NDVI, GNDVI, GRVI, NDII11, NDII12, SAVI, IRECI
- Pre-processed datasets
- Predictor dataset extraction
- VHR imagery and land cover classes: reference data, split into training data (70%) and test data (30%)
- Random Forest land cover classification modelling: validation method comparison (internal RF bootstrap validation and repeated 10-fold cross-validation), feature selection
- Best model selection
- Accuracy assessment: confusion matrix, feature importance ranking
- Rusinga Island land cover classification
- Post-processing for map visualisation (majority filter, sieve): Rusinga Island land cover map
- Change detection (May 2016 - April 2017 and May 2018 - April 2019): change matrix
- Post-processing for map visualisation (majority filter, sieve): Rusinga Island land cover change maps

3. Results

3.1 Random Forest classification model selection

A land cover classification of Rusinga Island, Kenya was performed using Sentinel-2 images

and the Random Forest classification algorithm. Different models were tested and compared

based on different predictor datasets differentiating a range of single-date and multi-temporal

datasets and two classification schemes. Moreover, the model performances were compared

before and after the feature selection process. In the final step, the model accuracies were

assessed on an independent test dataset.

The comparison of the classification performance of the different models is summarised in

Table 6. It shows the overall accuracy (OA) and the kappa coefficient of each predictor dataset

before and after feature selection for the 5-class scheme and the 4-class scheme. The feature

selection resulted in higher performance values for all models except for the single-date dry

season, for which the performance decreased slightly after feature selection in the 5-class model

and remained the same in the 4-class model. However, this inconsistency can be attributed to

the natural variation of the results caused by the random component of the Random Forest

algorithm. The multi-temporal models achieved higher prediction performance than the single-

date models, except for the multi-temporal dry season model with five classes and the first

iteration of the 4-class multi-temporal model with six randomly selected scenes. Again, these

exceptions can be attributed to the randomness of the Random Forest. While the models with

all 25 scenes reached the highest accuracy, those with 12 randomly selected scenes followed

close behind. Generally, the models with four classes achieved higher accuracies than the

models with five classes. Considering the seasons separately revealed that the wet season

models reached better classification performance than the dry season models (Table 6). While for the

single-date models the performance of the wet and dry season differed only slightly, it is

interesting to note that the 5-class multi-temporal wet season model reached a much higher OA

than the dry season one. The difference is 5.32 percentage points (pp), while for the 4-class

multi-temporal models the difference is much smaller at 2.13 pp.

Table 6: Classification performance based on internal OOB validation, reported as Kappa

coefficient and overall accuracy (OA), of the Random Forest land cover classification models

with different predictor datasets before and after feature selection for the 5-class and the

4-class scheme (colour scaling relative for each column).

The classification models with four classes achieved much higher accuracies than those with

five classes. This is not surprising, since the spectral signatures (Figure 4) of the grassland and

cropland classes in the 5-class models were very similar, which causes confusion in the

prediction outputs of the models (exemplified in the confusion matrix of the 5-class multi-

temporal model with all scenes in Table 7). While the woodland and grassland classes displayed

similar spectral signatures, those of the grassland and cropland classes aligned even more

closely. The confusion matrix reveals that the grassland and the cropland classes achieved the

lowest classification accuracies with producer’s accuracies (PA) of 57% and 81% respectively

and user’s accuracies (UA) of 71% and 81% respectively, while the remaining classes achieved

accuracies over 93%. It also shows that of the 47 grassland samples 27 were classified correctly

while 20 were misclassified, 12 of which (60%) as cropland. Likewise, of the 67 cropland

samples 54 were correctly classified while 13 were misclassified, eight of which (62%) as

grassland. However, the confusion of grassland and cropland is not the only source of the low

PA of the two classes. While for both classes about 60% of the misclassified samples were

confused with the respective other class, the remaining 40% were confused with either woodland or bare

land. Of the misclassified grassland samples 25% were confused with bare land while 15% were

confused with woodland. The opposite is the case for the misclassified cropland samples, of which

23% were confused with woodland and 15% with bare land.

Table 7: Confusion matrix of the multi-temporal Random Forest land cover classification

model with all scenes and five classes after feature selection (based on internal OOB

validation).

For the further analysis, the multi-temporal models with all scenes for both classification

schemes were used as they achieved the highest prediction accuracy results. Feature selection

proved beneficial for classification accuracy. Thus, the models for further analysis were

considered after feature selection. Both classification schemes were considered to deliver

valuable information. The 4-class models showed fewer prediction errors, which can be

attributed to the similarity of the grassland and the cropland class of the classification scheme

with five classes (Figure 4), and therefore seem more reliable. However, the 5-class models

provide more detail and additionally give an indication of the land use (i.e. agricultural use).

Since the purpose of the models is to map land cover to provide a tool for vegetation monitoring,

using both classification schemes and providing the option to choose might prove beneficial. A

larger number of scenes incorporated in the predictor dataset increased the prediction accuracy,

but also the complexity of the model and consequently its computational needs. The predictor

dataset comprising 12 randomly selected scenes also provided acceptable prediction accuracy

compared to the one comprising all scenes, while reducing complexity by using only half as many

scenes. Therefore, it could be worth considering using the 12-random-scenes models if the

complexity of the all-scenes models increases due to an increased amount of available satellite

image data.

3.2 Accuracy assessment

The results of the accuracy assessment are summarised in Table 8 and reflect the results of the

Random Forest internal performance assessment (Table 6). The 4-class models performed

better than the 5-class models and the multi-temporal models achieved higher accuracies than

the single-date models. While for the 4-class models the predictor dataset with all scenes

achieved the highest accuracy, for the 5-class models the highest accuracy was achieved by one

of the predictor datasets with six randomly selected scenes (Table 8). As mentioned previously,

this can be attributed to the randomness of the Random Forest algorithm. Moreover, the 4-class

models indicate that the more scenes are incorporated in the predictor datasets, the higher the

prediction accuracy. This trend is not represented by the 5-class models, which is assumed to

derive from the high misclassification rates between the grassland and the cropland classes.

Table 8: Accuracy assessment results based on the independent test dataset, reported as Kappa

coefficient (κ) and overall accuracy (OA), for the Random Forest land cover classification

models with different predictor datasets after feature selection for the 5-class and the 4-class

scheme (colour scaling relative for each column).

                                     5-class models       4-class models
Predictor dataset                    Kappa     OA         Kappa     OA
Single-date wet season               0.7501    81.37%     0.8786    91.30%
Single-date dry season               0.7490    81.37%     0.8530    89.44%
Multi-temporal all scenes            0.8666    90.06%     0.9564    96.89%
Multi-temporal wet season            0.8233    86.96%     0.8771    91.30%
Multi-temporal dry season            0.7531    81.37%     0.8872    91.93%
Multi-temporal combined seasons      0.7989    85.09%     0.9132    93.79%
Multi-temporal 6_1 random scenes     0.8832    91.30%     0.9126    93.79%
Multi-temporal 6_2 random scenes     0.8167    86.34%     0.9476    96.27%
Multi-temporal 6_3 random scenes     0.8413    88.20%     0.9302    95.03%
Multi-temporal 12_1 random scenes    0.8673    90.06%     0.9477    96.27%
Multi-temporal 12_2 random scenes    0.8325    87.58%     0.9389    95.65%
Multi-temporal 12_3 random scenes    0.8246    86.96%     0.9390    95.65%

Table 9 shows the confusion matrices of the accuracy assessment for the selected models, those

containing all scenes. The models reached an OA of 90.06% (κ = 0.8666) with five classes and

96.89% (κ = 0.9564) with four classes. For the 5-class model, only the water class reached UA

and PA of 100%. While the UAs of the woodland and the bare land classes were both around

93%, the PA of the bare land class reached nearly 97% and the woodland PA reached nearly

98%. As expected, for the grassland and cropland classes the model performed much worse.

While the UA of the cropland class with nearly 93% kept up with the other classes, its PA

reached only 59%. The grassland UA equalled 68% and its PA equalled 90%. Of the 21

grassland samples 17 were classified correctly, while four were misclassified, two of which as

bare land and one each as woodland and cropland. Of the 22 cropland samples 13 were

classified correctly, while nine were misclassified, five of which as grassland and two each as

woodland and bare land. For the 4-class model the continuous vegetation and water classes both

achieved UAs and PAs of 100%. The scattered or seasonal vegetation class reached a UA of

nearly 95% and a PA of just over 92%. The UA and PA of the bare land class reached just over

95% and nearly 97% respectively. Of the 39 scattered or seasonal vegetation samples in the test

data, 36 were classified correctly, while three were misclassified as bare land. Likewise, of the

60 bare land samples in the test data, 58 were correctly classified, while two were misclassified

as scattered or seasonal vegetation.
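
The user's and producer's accuracies of the two confused classes in the 4-class model follow directly from the counts given above. The short Python sketch below reproduces them. It relies on the reported fact that the continuous vegetation and water classes reached UAs and PAs of 100%, so they neither received nor contributed misclassified samples. The dictionary layout, the shortened class labels and the function names are illustrative choices.

```python
# Counts reported in the text for the 4-class model (independent test data).
# Keys are (actual class, predicted class).
counts = {
    ("scattered", "scattered"): 36,
    ("scattered", "bare"): 3,
    ("bare", "bare"): 58,
    ("bare", "scattered"): 2,
}


def producers_accuracy(cls):
    """Correctly classified samples divided by all test samples of the class."""
    total = sum(n for (actual, _), n in counts.items() if actual == cls)
    return counts[(cls, cls)] / total


def users_accuracy(cls):
    """Correctly classified samples divided by all samples predicted as the class."""
    total = sum(n for (_, predicted), n in counts.items() if predicted == cls)
    return counts[(cls, cls)] / total


for cls in ("scattered", "bare"):
    print(cls, f"PA = {producers_accuracy(cls):.1%}", f"UA = {users_accuracy(cls):.1%}")
```

The results (PA 92.3% and UA 94.7% for scattered or seasonal vegetation, PA 96.7% and UA 95.1% for bare land) agree with the values stated in the text.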

Table 9: Confusion matrices of the classification predictions by the multi-temporal model with

all scenes after feature selection and the actual classes of the independent test dataset with five

classes (upper) and four classes (lower).

3.3 Feature importance

The input feature importance rankings of the selected models are depicted in Figure 7. Table

10 provides a summary of the importance grouped by feature type (the 10 spectral bands and

the 7 vegetation indices). In the 5-class model the predictor dataset was comprised of 47 input

features after the feature selection process, while the predictor dataset for the 4-class model

after feature selection was comprised of 46 input features. In the 5-class model, only 12 of the

17 feature types contributed to the classification. In the 4-class model this number decreased to

nine. Overall, the blue band is the feature with the highest explanatory power. It accounts for

31% and 25% of the explanatory power in the 5-class model and the 4-class model respectively.

Additionally, the red band, NDVI, GNDVI and SAVI rank high. It is notable that the NIR,

narrow NIR, and VRE bands rank rather low or do not occur at all in the models after

feature selection.
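
A grouping of this kind can be sketched in Python as follows. The sketch rests on two assumptions that are not taken from this study: each input feature is named "date_type" (for example "20170522_Blue"), and the normalised score of a group is the sum of the mean decrease in accuracy (MDA) of its features divided by the total MDA of all features. The MDA values are invented for illustration.

```python
from collections import defaultdict


def summarise_importance(mda_by_feature, part):
    """Group per-feature MDA by one part of the feature name.

    part=0 groups by date, part=1 groups by feature type (assumed naming).
    Returns {group: (frequency, normalised score)}, highest score first.
    """
    frequency = defaultdict(int)
    score = defaultdict(float)
    for name, mda in mda_by_feature.items():
        group = name.split("_")[part]
        frequency[group] += 1
        score[group] += mda
    total = sum(score.values())
    ranked = sorted(score, key=score.get, reverse=True)
    return {group: (frequency[group], score[group] / total) for group in ranked}


# Hypothetical MDA values, not results of this study.
mda = {
    "20170522_Blue": 12.0,
    "20170313_Blue": 8.0,
    "20170522_NDVI": 6.0,
    "20171128_Red": 4.0,
}
by_type = summarise_importance(mda, part=1)
by_date = summarise_importance(mda, part=0)
print(by_type)
print(by_date)
```

The frequency column counts how many selected features belong to a group, while the normalised score expresses the share of the total importance that the group carries.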

Figure 7: Input feature importance ranking measured by the mean decrease in accuracy of the

multi-temporal models with all scenes with five classes (upper) and four classes (lower).

Table 10: Input feature importance ranking of the multi-temporal models with all scenes after

feature selection grouped by spectral band and vegetation index for the 5-class model (left) and

the 4-class model (right).

Multi-temporal model with all scenes
5 classes                                    4 classes
Feature     Frequency  Normalised score      Feature     Frequency  Normalised score
Blue        10         0.31                  Blue        8          0.25
Red         5          0.15                  NDVI        9          0.23
GNDVI       5          0.11                  SAVI        7          0.17
NDVI        4          0.11                  Red         5          0.10
SAVI        3          0.09                  SWIR1       5          0.08
IRECI       5          0.06                  GNDVI       3          0.08
Green       3          0.06                  IRECI       4          0.07
SWIR1       2          0.05                  GRVI        1          0.01
VRE1        2          0.02                  NDII12      3          0.01
NIR         2          0.02                  NarrowNIR   1          0.00
VRE3        2          0.01                  Green       0          0.00
NDII12      2          0.01                  VRE1        0          0.00
SWIR2       1          0.00                  VRE2        0          0.00
VRE2        1          0.00                  VRE3        0          0.00
NarrowNIR   0          0.00                  NIR         0          0.00
GRVI        0          0.00                  SWIR2       0          0.00
NDII11      0          0.00                  NDII11      0          0.00

The importance ranking by scene (i.e. date) is summarised in Table 11. The 13 March 2017 and

22 May 2017 scenes can clearly be identified as the scenes resulting in the highest mean

decrease in accuracy (MDA). In the 5-class model they account respectively for 17% and 23%

of the explanatory power, while in the 4-class model it is 20% and 17% respectively. Moreover,

the scenes from 18 November 2017 and 28 November 2017 rank relatively high. It is notable

that all these dates represent the wet season. While March and May fall into the major wet

season, November falls into the minor wet season. To investigate a potential correlation of the

explanatory power of the dates and the seasonality of vegetation on Rusinga Island more

closely, the MDA scores for each date were plotted against the mean land area NDVI, which

represents the phenology (Figure 8). While no unambiguous correlation between the feature

importance of the dates and the phenology can be identified, Figure 8 clearly shows that the

dates with the highest explanatory power (13 March, 22 May, 18 November and 28 November)

coincide with the vegetation peaks represented by the mean land area NDVI.
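
The combined weight of the four wet season dates can be checked with a few lines of Python. The scores are the normalised scores listed in Table 11; the variable and function names are illustrative. Because the table values are rounded to two decimals, the sums are approximate.

```python
# Normalised scores of the four highlighted wet season dates (Table 11).
scores_5_class = {"20170313": 0.17, "20170522": 0.23, "20171118": 0.08, "20171128": 0.13}
scores_4_class = {"20170313": 0.20, "20170522": 0.17, "20171118": 0.11, "20171128": 0.06}


def combined_share(scores):
    """Sum of the normalised scores, rounded to the precision of the table."""
    return round(sum(scores.values()), 2)


share_5 = combined_share(scores_5_class)
share_4 = combined_share(scores_4_class)
print(share_5, share_4)
```

Together, these four of the 25 scenes account for roughly 61% of the explanatory power in the 5-class model and roughly 54% in the 4-class model.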

Table 11: Input feature importance ranking of the multi-temporal models with all scenes after

feature selection grouped by date for the 5- (left) and the 4-class (right) models.

Multi-temporal model with all scenes
5 classes                                    4 classes
Date        Frequency  Normalised score      Date        Frequency  Normalised score
20170522    9          0.23                  20170313    6          0.20
20170313    7          0.17                  20170522    5          0.17
20171128    4          0.13                  20171118    4          0.11
20171118    3          0.08                  20171103    4          0.09
20170815    5          0.08                  20180216    3          0.08
20170805    4          0.05                  20171128    2          0.06
20180226    4          0.05                  20180226    5          0.06
20171029    1          0.04                  20171029    2          0.05
20180117    1          0.03                  20171228    2          0.05
20170909    1          0.03                  20170815    4          0.04
20170711    1          0.03                  20171223    1          0.03
20170731    1          0.02                  20180117    1          0.02
20171228    1          0.02                  20180122    2          0.01
20170701    3          0.02                  20180211    1          0.01
20180211    1          0.01                  20171009    2          0.01
20171004    1          0.01                  20171004    1          0.00
20170402    0          0.00                  20170805    1          0.00
20170412    0          0.00                  20170402    0          0.00
20171009    0          0.00                  20170412    0          0.00
20171103    0          0.00                  20170701    0          0.00
20171213    0          0.00                  20170711    0          0.00
20171223    0          0.00                  20170731    0          0.00
20180122    0          0.00                  20170909    0          0.00
20180201    0          0.00                  20171213    0          0.00
20180216    0          0.00                  20180201    0          0.00

Figure 8: Normalised feature importance score of the individual dates of the 5- and 4-class

multi-temporal models with all scenes compared to the mean NDVI of all scenes of the overall

land area of Rusinga Island.

3.4 Land cover maps

Figure 9: Land cover maps of Rusinga Island with five classes (upper) and four classes (lower)

based on the model with all scenes and for visualisation purposes filtered to reduce noise.

The maps in Figure 9 show the results of the Random Forest classification model with all scenes

and five classes (upper) and four classes (lower). To reduce noise for better visual

interpretability the maps were filtered. When including buildings (OpenStreetMap contributors,

2017) in the map, some bare land areas can be identified as residential areas (Figure 10). The most

obvious is the largest residential area right at the driveway to the mainland. It also shows that

the large bare land areas around the central hill, especially on its north-eastern, eastern, and

southern slopes are mostly free from residential use. The same applies to the smaller hill with

the tower on top (in the south-west of the central hill) and the large bare area in the very western

part of the island.

Figure 10: Rusinga Island land cover map with five classes based on the model with all scenes

and additional buildings data from OpenStreetMap.

Figure 11 shows a comparison of the land cover map created in this study with existing land

cover maps. While the Global Land Cover 2000 map (A; Mayaux et al., 2003) with a resolution

of 1 km merely identifies the island, the two 300 m maps GlobCover 2009 and Land Cover

Map 2015 (B and C; ESA & Université Catholique de Louvain, 2010; ESA CCI Land Cover

Project, 2015) provide more detail and give a brief overview of the island’s land cover

composition. The Globeland30-2010 map (D; National Geomatics Center of China, 2010) with

a resolution of 30 m achieves a quite detailed discrimination between forest, shrubland,

grassland, and cultivated land. However, it does not detect any bare land or artificial land on

Rusinga. The Global Forest Cover Change map (E; Hansen et al., 2013), also with a spatial

resolution of 30 m, does not detect any changes in forest cover on Rusinga between 2000 and

2018. However, it gives a quite precise estimation of the island’s rather low forest cover. The

vector map Kenya Land Cover 2007 (F; Landsberg & Henninger, 2007) is neither detailed nor

accurate. The S2 Land Cover Map of Africa 2016 (G; ESA CCI Land Cover Project, 2017) with

a spatial resolution of 20 m, is very comparable to the one produced in this study (H) as it was

derived from the same satellite data. For better comparison, the map produced in this study was

adjusted to comply with the legend of the ESA Climate Change Initiative (CCI) land cover map.

The major difference is that this study better detected the bare areas on the hill slopes.

Moreover, the ESA CCI land cover map shows some grassland – cropland confusion on the

hilltop and it detects a few aquatic or regularly flooded vegetation areas, which this study did

not observe. This study seems to detect more woodland (including trees and shrubs), while the

ESA CCI map classified much of these areas as grassland. Moreover, this study detected much

more bare land / built-up areas.

Figure 11: Comparison of the land cover map produced in this study with existing (global) land

cover maps for the study area: A: European Commission Joint Research Centre: Global Land

Cover 2000 (based on SPOT 4 - VEGETATION 1 (spatial resolution 1 km); Mayaux et al.,

2003); B: ESA & UC Louvain: GlobCover 2009 (based on ENVISAT - MERIS (spatial

resolution 300 m); ESA & Université Catholique de Louvain, 2010); C: ESA Climate Change

Initiative: Land Cover Map 2015 (based on ENVISAT - MERIS, AVHRR, SPOT -

VEGETATION, and PROBA-V (spatial resolution 300 m); ESA CCI Land Cover Project,

2015); D: Chinese National Geomatics Center: Globeland 30 (based on Landsat TM 5, Landsat

ETM+, and HJ-1 (spatial resolution 30 m); National Geomatics Center of China, 2010); E:

USGS & University of Maryland: Global Forest Cover Change (based on Landsat (spatial

resolution 30 m); Hansen et al., 2013); F: World Resources Institute: Kenya Land Cover 2007

(vector data; Landsberg & Henninger, 2007); G: Prototype S2 Land Cover Map of Africa 2016

(based on Sentinel-2 (spatial resolution 20 m); ESA CCI Land Cover Project, 2017); H: this

study’s output: Rusinga Island Land Cover Map (based on Sentinel-2 (spatial resolution 10

m)).

3.5 Land cover change detection

Owing to the prominent confusion between grassland and cropland in the 5-class scheme, the

resulting change maps for these scenarios show a relatively large number of changes between

these classes, for which it is impossible to distinguish between actual conversion and the result of

misclassification. Therefore, the results of the change detection are presented only for the

4-class scheme. The map in Figure 12 presents the entire change map, while the two maps in

Figure 13 represent areas where vegetation cover increased or upgraded (upper) and decreased

or degraded (lower).

Figure 12: Land cover change map of Rusinga Island between May 2016 – April 2017 and May

2018 – April 2019.

Figure 13: Land cover change maps of Rusinga Island between May 2016 – April 2017 and

May 2018 – April 2019 highlighting vegetation increase (upper) and decrease (lower).

The vegetation cover change maps (Figure 13) reveal that increased or upgraded vegetation

cover is predominantly located in the north-eastern part of the island and at the northern hill of

the central part of the island where the major changes are conversion from scattered or seasonal

vegetation to continuous vegetation cover. Moreover, some areas north of the residential area

at the driveway experienced conversion from bare land to scattered or seasonal vegetation. The

most common changes are scattered or seasonal vegetation to continuous vegetation and bare

land to scattered or seasonal vegetation. Changes from bare land or water to continuous

vegetation and from water to scattered or seasonal vegetation are negligible. The most severe

decrease of vegetation cover is observed at the slopes of the island’s central hill, where scattered

or seasonal vegetation was converted to bare land, except for the north-eastern slopes.

Furthermore, several rather scattered areas in the south-western parts of the island experienced

vegetation decrease and degradation both from continuous vegetation to scattered or seasonal

vegetation and from scattered or seasonal vegetation to bare land.

4. Discussion

4.1 Sentinel-2 data for Random Forest land cover classification

One aim of this study was to explore the suitability of free, open-access Sentinel-2 data (ESA,

2019) and remotely collected reference data to classify land cover on Rusinga Island, Kenya

using the Random Forest classifier (Breiman, 2001). It is generally accepted among researchers

that in remote sensing applications and land management overall classification accuracies (OA)

should reach at least 85% (Kussul, Lavreniuk, Skakun, & Shelestov, 2017; Otukei & Blaschke,

2010; Willis, 2015; Wulder, Franklin, White, Linke, & Magnussen, 2006), as initially suggested

by Anderson (1971) when developing a standardised framework for land use and land cover

classification (Anderson et al., 1976). However, this benchmark is also questioned as evaluating

model performance highly depends on the purpose and the nature of the model (Laba et al.,

2002; Wulder et al., 2006). This is discussed in more detail in the sections below. In this study,

a feature selection process was undertaken to reduce the number of features to an optimum and

select those with the highest contribution to the model accuracy (Chapter 2.5.2). It was found

that the feature selection improved model performance. Consequently, all models considered

for further comparisons have undergone the feature selection process. Generally, most of the

models examined in this study produce classification accuracies above the 85% benchmark with

OAs ranging between 83.51% and 95.48% (κ = 0.7859 – 0.9380) in the internal out-of-bag

(OOB) assessment and 81.37% and 96.89% (κ = 0.7490 – 0.9564) in the accuracy assessment

based on the independent test dataset. Only three out of the 24 tested models reach an OA of

less than 85% (Table 6 and Table 8).
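
For the independent test results, the comparison against the benchmark can be reproduced from the OA values in Table 8. The following Python sketch only restates those values; the variable names are illustrative.

```python
BENCHMARK = 85.0  # OA benchmark in percent

# OA values (%) from Table 8, in the row order of the table.
oa_test = {
    "5-class": [81.37, 81.37, 90.06, 86.96, 81.37, 85.09,
                91.30, 86.34, 88.20, 90.06, 87.58, 86.96],
    "4-class": [91.30, 89.44, 96.89, 91.30, 91.93, 93.79,
                93.79, 96.27, 95.03, 96.27, 95.65, 95.65],
}

all_values = [oa for values in oa_test.values() for oa in values]
below = {scheme: sum(oa < BENCHMARK for oa in values) for scheme, values in oa_test.items()}

print(len(all_values), "models,", sum(below.values()), "below the benchmark")
print("OA range:", min(all_values), "to", max(all_values))
```

All three models below the benchmark belong to the 5-class scheme (single-date wet season, single-date dry season and multi-temporal dry season), each with an OA of 81.37%.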

4.1.1 Single-date vs. multi-temporal datasets

The models were compared by their different predictor datasets. Some of them were comprised

of single-date data, while others contained multi-temporal data. The multi-temporal models

clearly reached higher classification accuracies than the single-date models (Table 6 and Table

8). This result was expected as including remote sensing data collected at different points in

time in the predictor datasets broadens the variability of spectral signatures on which the model

is being trained. Thus, the greater the variability within the training data, the better the model

performs at correctly classifying image pixels which deviate from the most common spectral

signatures of the classes. This finding is in line with previous research, which has highlighted

the benefits of using multi-temporal datasets for remote sensing-based image classification

(Esch, Metz, Marconcini, & Keil, 2014; Guo, Price, & Stiles, 2003; Karlson et al., 2016; Senf,

Leitão, Pflugmacher, van der Linden, & Hostert, 2015; Tigges, Lakes, & Hostert, 2013). Vuolo,

Richter, and Atzberger (2011) found that increasing the number of features included in the

classification model resulted in higher accuracies. However, in this study the trend only held

up to a certain number of features. Some features with low explanatory power do not contribute

to the classification accuracy and therefore can be eliminated to optimise model performance

(Hastie et al., 2009). The feature selection results of this study, which indicated higher

performance of the models after feature selection, thus including fewer features, seem to

contradict the assumption of more features resulting in higher accuracies. The two findings are, however,

complementary. The more features are initially included, the greater the chances of those

features with the highest explanatory power being selected in the feature selection process. The

results of the accuracy assessment in this study only partly confirm the hypothesis that more

features increase model accuracy. Table 12 (adapted from Table 8 in the results section)

summarises the classification accuracy and adds the number of sensing dates and the number

of features included in each dataset after feature selection. It clearly shows that in both cases

(with five classes and four classes) the model with all 25 scenes reached the highest accuracy

and also included the highest number of features (and scenes). However, the results show no

clear correlation between the model accuracy and the number of features or number of scenes.

Hence, the results indicate that other factors than the number of features or scenes included in

the predictor dataset, for example the season in which the data was sensed and the classification

scheme, play relevant roles in the classification accuracy of the models. These factors are

discussed in the following sections.

Table 12: Adapted from Table 8 in the results chapter: accuracy assessment results of the 5-class

models (upper) and 4-class models (lower) after feature selection sorted by Kappa. Number of

scenes indicates the number of different sensing dates included in the predictor datasets after

feature selection. Number of features indicates the number of features included in the predictor

dataset after feature selection.

Predictor datasets (5-class models)    Kappa    OA       No. of scenes   No. of features
Single-date dry season                 0.7490   81.37%   1               11
Single-date wet season                 0.7501   81.37%   1               2
Multi-temporal dry season              0.7531   81.37%   6               11
Multi-temporal combined seasons        0.7989   85.09%   2               9
Multi-temporal wet season              0.8233   86.96%   4               35
Multi-temporal 12 random scenes*       0.8415   88.20%   7               13
Multi-temporal 6 random scenes*        0.8471   88.61%   6               29
Multi-temporal all scenes              0.8666   90.06%   16              47
* Average of three iterations

Predictor datasets (4-class models)    Kappa    OA       No. of scenes   No. of features
Single-date dry season                 0.8530   89.44%   1               9
Multi-temporal wet season              0.8771   91.30%   2               5
Single-date wet season                 0.8786   91.30%   1               16
Multi-temporal dry season              0.8872   91.93%   6               20
Multi-temporal combined seasons        0.9132   93.79%   2               15
Multi-temporal 6 random scenes*        0.9301   95.03%   4               13
Multi-temporal 12 random scenes*       0.9419   95.86%   10              34
Multi-temporal all scenes              0.9564   96.89%   17              46
* Average of three iterations

4.1.2 Classification scheme

The reason for distinguishing two different classification schemes throughout this study is the

rather high misclassification rate between the grassland and the cropland classes in the 5-class

scheme as described in Chapter 2.3. The challenge to correctly classify grassland and cropland

can be attributed to the similar spectral signatures of the two classes (Figure 4) and has been

discussed in previous research (Esch et al., 2014; Yin, Pflugmacher, Kennedy, Sulla-Menashe,

& Hostert, 2014). Nevertheless, the predictor datasets were tested with both five and four

classes, because even if the classification accuracy of the 5-class models is lower, the results

might still be useful for application purposes as the 5-class scheme provides a finer distinction

of land cover. Table 6 and Table 8 clearly show the higher prediction accuracy of the 4-class

models compared to the 5-class models. For the best performing models, the OAs when

discriminating four classes are around seven percentage points higher than those of the same

models discriminating five classes. The higher performance of the models discriminating fewer

classes is not surprising regarding the previously described similarity of spectral signatures of

the grassland and cropland classes. However, misclassification in the 5-class model does not

only occur between the grassland and cropland classes. The confusion matrices of the 5-class

model in Table 7 and Table 9 show that a rather large share of the misclassified grassland and

cropland samples, 38-40% in the internal validation and 44-75% in the external accuracy

assessment, were misclassified as either woodland or bare land. While grassland is more

commonly confused for bare land, cropland is more commonly misclassified as woodland. This

is assumed to result from the reference data, which includes some cropland samples that most

probably are perennial crops or fruit trees. Moreover, the distinction between grassland and

bare land in the reference data collection process was a challenge as the visual interpretation

was mainly performed on the very high resolution (VHR) image, which was taken in the dry

season where much of the grassland is dry and brown which makes it difficult to distinguish it

from bare soil. The use of the VHR image from August for verification reduced this limitation

to some extent. Another factor for the higher accuracy performance of the 4-class models is

simply the number of classes. The fewer classes there are to distinguish, the higher the prediction

accuracy that can be expected. This is because the classification criteria become more distinct and, at

least in this study, the total amount of training samples is divided among fewer classes, resulting

in more training samples per class, which positively contributes to the classification accuracy.

That the number of classes to discriminate between affects the classification accuracy has also

been observed by Esch et al. (2014), whose classification model OA reached 90% when

discriminating only grassland and cropland, 86% when regarding five classes, and only 70%

when regarding 11 different classes. It is suggested here that the 5-class model will be more

useful for application purposes, assuming that the cropland – grassland discrimination provides

acceptable results that can be worked with on the ground. Being able to work with three

different vegetation classes, woodland, grassland and cropland, could benefit the vegetation

restoration and monitoring process by differentiating the type and purpose of the vegetation.

While cropland areas would be useful to promote sustainable farming practices such as

perennial crops or agroforestry, the focus of restoring vegetation cover on grass- or woodland

(including shrubs) is stabilisation and subsequently nurturing and moisturising the soil.

However, in the 5-class model, the user’s accuracy (UA) of the grassland class with 68% is

particularly low while the remaining three classes all reach more than 92% (Table 9). In the 4-

class model the UAs all reach more than 94%. Thus, if the classification outputs prove

inconsistent with ground truth, it might be better to consider the 4-class model. Accordingly, as the

classification outputs were not validated in situ, the change detection was performed using a 4-

class model.
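
The per-class measures discussed above can be derived from a confusion matrix as in the following minimal Python sketch. It is not the implementation used in this study: the function name is hypothetical, the counts are invented for illustration (chosen so that the grassland UA equals 68%), the class list covers only four classes named in the text, and the layout (reference classes in rows, predicted classes in columns) is an assumption.

```python
def users_and_producers_accuracy(matrix, class_names):
    """Return {class: (UA, PA)} for a square confusion matrix.

    Assumed layout: matrix[i][j] counts samples of reference class i
    that the model predicted as class j.
    """
    n = len(class_names)
    result = {}
    for k, name in enumerate(class_names):
        predicted_total = sum(matrix[i][k] for i in range(n))  # column sum
        reference_total = sum(matrix[k])  # row sum
        # User's accuracy: share of pixels mapped as this class that are correct.
        ua = matrix[k][k] / predicted_total
        # Producer's accuracy: share of reference samples of this class that were found.
        pa = matrix[k][k] / reference_total
        result[name] = (ua, pa)
    return result


# Hypothetical counts for illustration only (not the values of Table 7 or Table 9).
classes = ["woodland", "grassland", "cropland", "bare land"]
counts = [
    [30, 1, 2, 0],
    [1, 17, 5, 2],
    [2, 6, 28, 0],
    [0, 1, 0, 25],
]
accuracies = users_and_producers_accuracy(counts, classes)
for name, (ua, pa) in accuracies.items():
    print(f"{name}: UA = {ua:.0%}, PA = {pa:.0%}")
```

In this invented example, most of the pixels wrongly mapped as grassland are cropland reference samples, which mirrors the kind of confusion described for the 5-class scheme.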

4.1.3 Seasonal differences and phenology

Determining the optimal timing for data acquisition is a recurring issue in remote sensing-

based land cover classification as it depends on the local ground conditions, climate, data

availability and the purpose of the classification. For example, while Karlson et al. (2016) found

dry season imagery, thus post growing season, resulting in higher classification accuracies for

tree species discrimination in Burkina Faso, Tigges et al. (2013) found growing season imagery

slightly more accurate for the same purpose in Berlin, Germany. Regarding the results of the

single-date models in this study, which concerns land cover classification rather than tree

species discrimination, the growing (wet) season models reach better classification

performance. While for the single-date models the performance in the wet and dry season differs

only slightly, the multi-temporal wet season model with five classes reaches a much higher OA

than the dry season one. However, the difference is much smaller for the 4-class models

(Table 6 and Table 8). This indicates that the discrimination between grassland and cropland,

the class discrimination that was dissolved in the 4-class scheme, is dependent on the

phenology. As the vegetation is greener during the wet season, grassland and cropland might

be easier to distinguish. It would therefore be worth analysing the confusion matrices and

comparing the misclassification between grassland, cropland and bare land during the wet

season and the dry season. A land cover classification study from southern Portugal (Senf et

al., 2015) found different optimal timing of image acquisition for different land cover classes.

The authors suggest that in southern Portugal March, which represents the end of the rainy

season and the peak of vegetation productivity, is the best time for discriminating between

cropland and grassland. For Rusinga, the end of the rainy season and the peak of vegetation

productivity would be late May and June and, slightly less pronounced, late November and

early December. Besides considering the seasonal models, the importance of the dates in this

study was assessed on the multi-temporal models with all scenes (Table 11 and Figure 8). The

analysis shows that of the four most important dates two fall into the major wet season

(13 March 2017 and 22 May 2017) and two into the minor wet season (18 and 28 November

2017). Thus, these findings are in line with the suggestion by Senf et al. (2015). Nevertheless,

the seasonal models clearly produce lower accuracies than the models covering a wider time

period spread over the year (Table 8). For example, the multi-temporal models with combined

seasons, which include only the features from the single-date dry season and the single-date

wet season (i.e., two scenes), perform better than the multi-temporal single season models,

which include 4 and 6 scenes, with the exception of the 5-class multi-temporal wet season model,

which scores higher. This indicates that not only the number of features included in the predictor

dataset influences the prediction accuracy, but also the time of the year when the data were

sensed. The models with input data covering a larger timeframe and different times of the year

result in higher accuracies. This finding is supported by previous studies (Esch et al., 2014;

Karlson et al., 2016; Senf et al., 2015; Tigges et al., 2013).

4.1.4 Vegetation indices

One of the sub-questions of this study was how vegetation indices contribute to the

classification accuracy compared to the spectral bands. Many land cover classification studies

include one or more vegetation index measures in their data to account for phenological changes

and to improve identification and distinction of vegetation cover (e.g. Eckert et al., 2015;

Lunetta et al., 2006; Motohka et al., 2010; Nyberg et al., 2015; Viña et al., 2011; Wibowo et

al., 2012; Yang et al., 2015). Therefore, it was expected that vegetation indices contribute

additionally to the classification performance. To compare the contribution (i.e. importance) of

the seven different vegetation indices and the spectral bands used in this study, a feature

importance ranking was performed, which proved the high contribution of the vegetation

indices to the classification accuracy (Figure 7 and Table 10). The blue band ranks particularly

high, while the red band, the NDVI, the GNDVI and the SAVI follow. The high importance of

the blue band could be explained by the large amount of water surrounding the study area.

Verifying this hypothesis by repeating the analysis with only the land area of Rusinga was,

however, not included in this thesis. The importance of the NDVI could be expected due to its

popularity as vegetation proxy. The relatively high importance of the SAVI was to be expected

as it corrects for soil reflectance on vegetation reflectance, which is particularly useful in

scattered or patchy vegetated areas (Huete, 1988) such as Rusinga Island. The high importance

of the GNDVI is somewhat more surprising as it was developed to increase the sensitivity to

dense vegetation (Gitelson et al., 1996), which is not particularly common on Rusinga Island.

The rather low importance of the two NDIIs was to be expected as they are sensitive to leaf

water content (Kimes et al., 1981), while the vegetation on Rusinga Island mainly consists of

scattered shrubs and grasses. Surprising was the low importance of the IRECI, which was

particularly developed for Sentinel-2 data to incorporate the newly included red-edge bands

(Frampton et al., 2013). However, Frampton et al. (2013) note the need for further validation

of their index. In the same line of argument, it is notable that the VRE as well as the NIR and

narrow NIR spectral bands all show relatively low importance in this study (Table 10).

Nevertheless, it could be worth testing variations of the vegetation indices, for example

replacing the NIR band with the narrow NIR band, as the latter was designed to reduce the water

vapour disturbance from which the NIR band suffered in earlier Landsat missions (ESA, 2015).

Jin et al. (2013) developed the Multi-Index Integrated Change Analysis method

for the USGS National Land Cover Database. Besides the NDVI as vegetation index

it includes the Normalised Burn Ratio, the Change Vector, and the Relative Change Vector

Maximum, which complement each other being sensitive to different kinds of changes.

Accordingly, it could have been worthwhile to diversify the features in this study more and to pay

more attention to the complementarity of the different input features. Nevertheless, this study

proves that including vegetation indices in the training data benefits classification performance.

Moreover, it confirms the suitability of the NDVI, but also shows that other vegetation indices

might be worth considering.
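
For reference, three of the indices discussed above can be computed per pixel as in the following minimal Python sketch. The formulas are the standard published definitions of the NDVI, the GNDVI and the SAVI; the soil adjustment factor L = 0.5 is the commonly used default and is an assumption here, as are the function names and the reflectance values, which are invented for illustration. The sketch is not the implementation used in this study.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """Green NDVI: the green band replaces the red band."""
    return (nir - green) / (nir + green)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index; soil_factor is L (0.5 assumed here)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


# Hypothetical surface reflectances (0-1) for a sparsely vegetated pixel.
green, red, nir, narrow_nir = 0.08, 0.10, 0.40, 0.42

indices = {
    "NDVI": ndvi(nir, red),
    "GNDVI": gndvi(nir, green),
    "SAVI": savi(nir, red),
    # Variation suggested in the text: use the narrow NIR band in place of NIR.
    "NDVI (narrow NIR)": ndvi(narrow_nir, red),
}
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

For Sentinel-2, the red, green, NIR and narrow NIR inputs would correspond to bands B4, B3, B8 and B8A respectively. Note that B8A has a native resolution of 20 m, so a variation based on it would require resampling to match the 10 m bands.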

4.1.5 Rusinga land cover maps

Even if the real suitability of the produced land cover maps needs to be confirmed locally on

the ground, the comparison with existing land cover maps (Figure 11) shows that this study

improved the quality of land cover maps for Rusinga Island. The comparison clearly shows the

improvement of remote sensing data over the years and the benefit of higher spatial resolution.

While the S2 Land Cover Map of Africa 2016 from the European Space Agency’s Climate

Change Initiative (ESA CCI Land Cover Project, 2017) was derived from data from the same

satellite, the maps of this study seem to be more accurate. It has to be noted that the ESA CCI

map is a prototype and has a spatial resolution of 20 m. Nevertheless, the higher accuracy of

this study’s map is not surprising as the training data and the classification models in this study

were developed particularly for the study area, while the existing 2016 Sentinel-2 land cover

map was taken from an Africa-wide classification output, which cannot include as much local

detail in the model development. This shows the benefit of developing a classification model

particularly for the concerned area rather than on a large scale. However, land cover

classification maps of large areas would be valuable and there are several projects currently

working on automated land cover classification of entire continents, such as the ESA CCI Land

Cover Project (ESA CCI Land Cover Project, 2019) that developed the cited Africa map and

another ESA funded project led by the Space Research Centre of the Polish Academy of

Sciences (ESA, n.d.) that has produced a 10 m land cover map of the whole of Europe and aims for a

global scale.

4.2 Post-classification change detection for land cover change and

vegetation monitoring

Even if post-classification change detection is a convenient and often-used approach for

detecting and mapping land cover changes, post-classification methods are entirely dependent

on the input maps. Thus, any error in any of the maps is directly transferred to the change map

and subsequently makes it very dependent on the quality of the classification process (Coppin,

Jonckheere, Nackaerts, Muys, & Lambin, 2004; Tewkesbury et al., 2015). Therefore, it is

difficult to assess the quality of the change detection without being entirely certain of the

classification quality. Moreover, change detection validation is particularly challenging for two

reasons: Firstly, it requires validation data from the earlier and the later point in time of the

change period. Secondly, changes typically occur only over a rather small area compared to the

overall classified area.
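
The dependence of the change map on the input maps can be illustrated with a common rule of thumb, which is an assumption introduced here and not a result of this study: if the errors in the two classifications are independent, the expected accuracy of the change map is roughly the product of the two map accuracies. The Python sketch below applies this to the highest 4-class OA reported in Table 12; using the same OA for both dates is a further assumption, and the function name is hypothetical.

```python
def expected_change_map_accuracy(accuracy_t1, accuracy_t2):
    """Rule-of-thumb estimate, assuming independent errors in both input maps."""
    return accuracy_t1 * accuracy_t2


# Assumption: both input maps reach the highest 4-class OA of Table 12 (96.89%).
estimate = expected_change_map_accuracy(0.9689, 0.9689)
print(f"Expected change map accuracy: {estimate:.2%}")
```

In practice, classification errors are often spatially correlated between dates, so the product should be read as a rough indication rather than as a measured accuracy.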

The change maps produced in this study seem to achieve acceptable results, although the actual

quality should be verified on the ground. Several studies have utilised Sentinel-2 data and noted

their suitability for various types of land cover classification and change detection (Frampton

et al., 2013; Immitzer et al., 2016; Inglada et al., 2015; Pesaresi et al., 2016). Many of these

classifications are much more complex than the one in this study, discriminating not only

between different types of land cover, but between different species, of trees or crops for

example. Accordingly, it should be possible to achieve better classification accuracies, which

represent the ground truth more precisely. The major challenge for the classification in this

study is the questionable quality of the reference data, which was collected remotely and over

a limited period of time. Although potential intra-annual changes were accounted for by

consulting VHR satellite images sensed at different times of the year (August 2016 and

February 2017) and different scenes temporally spread over the time period from March 2017

to February 2018, the reliability of the ‘ground truth’, which the reference data should represent,

remains questionable as only satellite-based data and visual image interpretation were used.

Therefore, in-situ verification of the change detection is considered necessary. Moreover, to

perform a reliable change analysis covering a longer time span, updated reference data is

necessary. With the methods used in this study, each classification model was developed based

on the corresponding predictor dataset and the reference samples. To avoid re-examining or

collecting new reference data, a method would be needed that allows a model to learn on a

given training dataset and predict any other input dataset accordingly. To complement the

change detection analysis a change matrix indicating the percentages of changes from one class

to another would be beneficial. However, this would make most sense if only the land area

of the island, including a small shoreline area, were considered to reduce the amount of

water, as mentioned in relation to the feature importance (Chapter 4.1.4).
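
A change matrix of the kind suggested above could be derived as in the following minimal Python sketch. It was not part of this study: the function name, the class labels and the ten example pixels are hypothetical, the maps are assumed to be flattened into equally long label sequences, and dropping pixels that are water at both dates is only one possible way of limiting the influence of the surrounding lake.

```python
from collections import Counter


def change_matrix(map_t1, map_t2, classes, background=None):
    """Cross-tabulate two classified maps as percentages of the retained pixels.

    map_t1 and map_t2 are equally long sequences of class labels. Pixels
    labelled `background` at both dates are dropped before counting.
    """
    if len(map_t1) != len(map_t2):
        raise ValueError("both maps must contain the same number of pixels")
    pairs = [
        (earlier, later)
        for earlier, later in zip(map_t1, map_t2)
        if not (earlier == background and later == background)
    ]
    if not pairs:
        raise ValueError("no pixels left after masking")
    counts = Counter(pairs)  # missing (from, to) pairs count as zero
    return {
        src: {dst: 100.0 * counts[(src, dst)] / len(pairs) for dst in classes}
        for src in classes
    }


# Hypothetical ten-pixel maps for two dates (illustration only).
labels = ["woodland", "grassland", "bare land", "water"]
earlier_map = ["water", "water", "grassland", "grassland", "bare land",
               "bare land", "woodland", "woodland", "grassland", "bare land"]
later_map = ["water", "water", "grassland", "bare land", "bare land",
             "grassland", "woodland", "woodland", "grassland", "bare land"]

matrix = change_matrix(earlier_map, later_map, labels, background="water")
for src in labels:
    print(src, [f"{matrix[src][dst]:.1f}" for dst in labels])
```

The diagonal of the resulting matrix holds the share of unchanged pixels per class, while the off-diagonal cells hold the share of pixels that changed from the row class to the column class.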

4.3 Concluding remarks and future outlook

This study found Sentinel-2 data and Random Forest useful to classify and map land cover on

Rusinga Island. VHR satellite data can be costly and therefore financially unfeasible for

many applications, especially in poorer regions, whereas the data and software used in this study are

available free of charge. The trade-off between spatial and temporal resolution on the one hand

and the cost of data acquisition on the other hand is often referred to as a limitation of remote

sensing (Karlson et al., 2016; Willis, 2015). However, the rapid developments in remote sensing

technology reduce this trade-off as data becomes available at higher resolution and lower cost.

With a spatial resolution of up to 10 m for multi-spectral bands and a temporal resolution of

five days the freely available Sentinel-2 data, which have been awaited by the scientific

community, provide substantial improvements compared to for example the Landsat-8 satellite

launched in May 2013, which provides multi-spectral data at 30 m resolution and a

panchromatic band at 15 m resolution (Karlson, 2015). Sentinel-2’s high temporal resolution

also reduces issues with limited data availability caused by cloud or haze contamination, which

is a prominent issue in remote sensing applications (Esch et al., 2014; Karlson, 2015; Willis,

2015). Hence, using multi-temporal data for land cover classification, which has been proven

to provide better classification results than single-date data, became more feasible with the

improvements of Sentinel-2. Many scientists highlight the benefits of Sentinel-2 data for Earth

Observation and the twin-satellites undoubtedly set a new status quo in the field of freely

available high spatial and temporal resolution satellite data (Frampton et al., 2013; Immitzer et

al., 2016; Inglada et al., 2015; Karlson, 2015; Pesaresi et al., 2016). Although there is no doubt about the

suitability of Sentinel-2 data for terrestrial monitoring, efforts are being made to evaluate the

usefulness and performance of combined data sources such as Sentinel-2 and Landsat 8 (Inglada

et al., 2015; Senf et al., 2015; Zhu & Woodcock, 2014). Moreover, additional input features

such as LiDAR data and digital elevation models could improve the classification.

Random Forest is a widely used and accepted classifier, not only in the field of land use and

land cover change, and it performs well in this study. The Random Forest classifier has also been

found most suitable in an ESA-funded project aiming to automatically classify global land

cover using Sentinel-2 data (ESA CCI Land Cover Project, 2019). Nevertheless, many studies

compare different classification methods for different specific purposes within the same field and

find several suitable classifiers, among them Random Forest, Support Vector Machine and

Neural Networks (Duro et al., 2012; Lu & Weng, 2007; Otukei & Blaschke, 2010). Moreover,

with the increasing spatial resolution of freely available satellite data a scientific discussion has

started about the benefits of object-based classification compared to pixel-based classification

(Duro et al., 2012; Hussain et al., 2013; Immitzer et al., 2012; Immitzer et al., 2016; Karlson,

2015; Tewkesbury et al., 2015; Weih & Riggan, 2010; Yu, Zhou, Qian, & Yan, 2016).

The biggest challenge and limitation in this study is the remotely collected reference data.

Visual image interpretation, which was used to collect the reference data for this study, requires

experience and knowledge of the objects on the ground (Albertz, 2009), which could not be

assured in this study. Provided that the classes in the training and test data were

allocated correctly, the models work quite well; however, any labelling errors are carried into the

classification and remain undetectable. Therefore, at least a sample test dataset should

be collected in situ or a random sample of the reference data should be verified on the ground.

Additionally, other existing land cover maps could be included to inform the remote reference

data collection. However, the availability and quality of existing maps cannot be guaranteed.

The change detection produced acceptable results; however, their usefulness for vegetation

monitoring is questionable. To allow for the detection of detailed vegetation changes, the

classification quality should be improved for example by including in situ reference data. Once

the reference data is improved, using a finer classification scheme could become possible for

example including a differentiation between trees, shrubs, grasses and crops as well as between

bare land and built-up areas. Additionally, to achieve more useful change detection results the

analysed time frame should be longer. However, for the ongoing vegetation restoration on the

island the maps produced in this study could prove useful for detecting eroded areas to support

the revegetation planning. To support the identification of areas at risk of erosion, the

development of an erosion model for example based on the Revised Universal Soil Loss

Equation (Renard, Foster, Weesies, McCool, & Yoder, 1997) would be beneficial. The results

of this study will be used to develop a tool for land cover classification and change detection to

be used by the organisations working on Rusinga Island to restore vegetation cover. The tool is

expected to support planning and monitoring of the planting activities.

Acknowledgements

I would like to express my gratitude to my supervisors Martin Karlson at the Department of

Thematic Studies – Environmental Change at Linköping University and Markus Immitzer at

the Department of Landscape, Spatial and Infrastructure Sciences – Geomatics at BOKU

University of Natural Resources and Life Sciences, Vienna. Thank you for your support, your

invaluable feedback and your patience. I also want to thank Sebastian Böck for supporting me

with data acquisition and programming. Further, I want to thank Bernhard Wagenknecht, the

founder of Books for Trees, for all the background information, stories and pictures that helped

me gain an impression of Rusinga Island without having been there. Thank you for initiating

the idea for this thesis together with Herbert Formayer at BOKU University. Moreover, I

express my thanks to Isabella Ostovary and Annika Neid, who have been on Rusinga for Books

for Trees and shared their impressions, knowledge, stories and pictures with me. Thank you

also, Evans Odula, founder of Badilisha, for providing me with background information.

Finally, I would like to express my gratitude to my friends and family, also my partner’s family,

who supported me whenever possible. Special gratitude goes to my partner Felix, who endured

this thesis process with me, always had a motivating attitude and supported me with technical

and programming issues.

References

Albertz, J. (2009). Einführung in die Fernerkundung. Grundlagen der Interpretation von Luft-

und Satellitenbildern (4th ed.). Darmstadt, Germany: Wissenschaftliche

Buchgesellschaft.

Anderson, J. R. (1971). Land Use Classification Schemes used in selected recent geographic

applications of remote sensing. Photogrammetric Engineering & Remote Sensing,

April, 379-387.

Anderson, J. R., Hardy, E. E., Roach, J. T., & Witmer, R. E. (1976). A Land Use and Land

Cover Classification System for Use with Remote Sensor Data (Vol. 964).

Washington, DC, USA: United States Government Printing Office.

Andrews, P. (1973). Vegetation of Rusinga Island. Journal of the East Africa Natural History

Society and National Museum, 142.

Badilisha. (n.d.). Rusinga Island Community. Retrieved from

http://www.badilishapermaculture.org/people/. Accessed: 31 January 2019

Bastola, S., Dialynas, Y. G., Bras, R. L., Noto, L. V., & Istanbulluoglu, E. (2018). The role of

vegetation on gully erosion stabilization at a severely degraded landscape: A case

study from Calhoun Experimental Critical Zone Observatory. Geomorphology, 308,

25-39. doi:10.1016/j.geomorph.2017.12.032

Bivand, R., Keitt, T., Rowlingson, B., Pebesma, E., Sumner, M., Hijmans, R. J., . . . Rundel,

C. (2019). Bindings for the 'Geospatial' Data Abstraction Library. Retrieved from

https://cran.r-project.org/web/packages/rgdal/rgdal.pdf. Accessed: 07 September 2019

Borrelli, P., Robinson, D. A., Fleischer, L. R., Lugato, E., Ballabio, C., Alewell, C., . . .

Panagos, P. (2017). An assessment of the global impact of 21st century land use

change on soil erosion. Nature Communications, 8(1). doi:10.1038/s41467-017-02142-7

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.

doi:10.1023/A:1010933404324

Breiman, L., & Cutler, A. (2004). Random forests - classification description. Random

Forests. Retrieved from

https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm. Accessed: 14

December 2018

Breiman, L., Cutler, A., Liaw, A., & Wiener, M. (2018). Breiman and Cutler's Random

Forests for Classification and Regression. 29. Retrieved from

https://cran.r-project.org/web/packages/randomForest/randomForest.pdf. Accessed: 07 September

2019

Byrne, L. (2013). Helping small farmers help themselves, on Rusinga Island, Lake Victoria,

Kenya. Retrieved from

https://permaculturenews.org/2013/01/29/helping-small-farmers-help-themselves-on-rusinga-island-lake-victoria-kenya/ Accessed: 22

November 2019

Cebecauer, T., & Hofierka, J. (2008). The consequences of land-cover changes on soil erosion

distribution in Slovakia. Geomorphology, 98(3-4), 187-198.

doi:10.1016/j.geomorph.2006.12.035

Clerc, S., & Team, M. (2018). S2 MPC. L1C Data Quality Report (28). Retrieved from

https://earth.esa.int/documents/247904/3897638/Sentinel-2_L1C_Data_Quality_Report/9699703d-556e-407a-a955-65c0c82dfcb5?version=1.2.

Accessed: 15 December 2018

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and

Psychological Measurement, 20(1), 37-46.

Cohen, W. B., Yang, Z., Healey, S. P., Kennedy, R. E., & Gorelick, N. (2018). A LandTrendr

multispectral ensemble for forest disturbance detection. Remote Sensing of

Environment, 205, 131-140. doi:10.1016/j.rse.2017.11.015

Congalton, R. G. (1991). A review of assessing the accuracy of classifications of remotely

sensed data. Remote Sensing of Environment, 37(1), 35-46.

Congalton, R. G., Gu, J., Yadav, K., Thenkabail, P., & Ozdogan, M. (2014). Global Land

Cover Mapping: A Review and Uncertainty Analysis. Remote Sensing, 6(12), 12070-

12093. doi:10.3390/rs61212070

Coppin, P., Jonckheere, I., Nackaerts, K., Muys, B., & Lambin, E. (2004). Digital change

detection methods in ecosystem monitoring: a review. International Journal of

Remote Sensing, 25(9), 1565-1596. doi:10.1080/0143116031000101675

Di Gregorio, A. (2005). Land Cover Classification System (LCCS), version 2: Classification

Concepts and User Manual. Retrieved from

http://www.fao.org/docrep/008/y7220e/y7220e02.htm#TopOfPage. Accessed: 08

October 2018

DigitalGlobe. (2016). World Imagery [Satellite Imagery]. Retrieved from:

https://www.arcgis.com/home/webmap/viewer.html?webmap=c1c2090ed8594e01931

94b750d0d5f83,

https://www.arcgis.com/home/item.html?id=10df2279f9684e4a9f6a7f08febac2a9.

Accessed: 08 May 2018

DigitalGlobe. (2017). WorldView-4. Retrieved from

http://worldview4.digitalglobe.com/#/main. Accessed: 27 March 2019

Duro, D. C., Franklin, S. E., & Dubé, M. G. (2012). A comparison of pixel-based and object-

based image analysis with selected machine learning algorithms for the classification

of agricultural landscapes using SPOT-5 HRG imagery. Remote Sensing of

Environment, 118, 259-272. doi:10.1016/j.rse.2011.11.020

Eckert, S., Hüsler, F., Liniger, H., & Hodel, E. (2015). Trend analysis of MODIS NDVI time

series for detecting land degradation and regeneration in Mongolia. Journal of Arid

Environments, 113, 16-28. doi:10.1016/j.jaridenv.2014.09.001

ESA. (2015). Sentinel-2 User Handbook. Retrieved from

https://sentinel.esa.int/documents/247904/685211/Sentinel-2_User_Handbook.

Accessed: 15 November 2018

ESA. (2018a). Naming Convention. Retrieved from https://sentinel.esa.int/web/sentinel/user-

guides/sentinel-2-msi/naming-convention. Accessed: 15 November 2018

ESA. (2018b). News. Upcoming Sentinel-2 Level-2A product evolution. Retrieved from

https://earth.esa.int/web/sentinel/missions/sentinel-2/news/-/article/upcoming-sentinel-

2-level-2a-product-evolution. Accessed: 15 November 2018

ESA. (2018c). Sentinel Application Platform (SNAP): European Space Agency. Accessed:

ESA. (2018d). User Guides. Level-2A. Retrieved from

https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types/level-2a.

Accessed: 15 November 2018

ESA. (2019). Copernicus Open Access Hub [Data Base]. Retrieved from

https://scihub.copernicus.eu/dhus/#/home. Accessed: 25 August 2019

ESA. (n.d.). Global Land Cover - Sentinel-2. Retrieved from http://s2glc.cbk.waw.pl/.

Accessed: 05 December 2019

ESA, & Université Catholique de Louvain. (2010). GlobCover 2009 [Map]. Retrieved from:

http://due.esrin.esa.int/page_globcover.php. Accessed: 02 August 2019

ESA CCI Land Cover Project. (2015). Land Cover Map 2015 [Map]. Retrieved from

http://maps.elie.ucl.ac.be/CCI/viewer/index.php. Accessed: 02 August 2019

ESA CCI Land Cover Project. (2017). CCI Land Cover - S2 Prototype Land Cover 20m Map

of Africa 2016 [Map]. Retrieved from:

http://2016africalandcover20m.esrin.esa.int/download.php. Accessed: 05 December

2019

ESA CCI Land Cover Project. (2019). ESA CCI Land Cover - S2 Prototype Land Cover 20 m

Map of Africa 2016. Retrieved from http://2016africalandcover20m.esrin.esa.int/.

Accessed: 05 December 2019

Esch, T., Metz, A., Marconcini, M., & Keil, M. (2014). Combined use of multi-seasonal high

and medium resolution satellite imagery for parcel-related mapping of cropland and

grassland. International Journal of Applied Earth Observation and Geoinformation,

28, 230-237. doi:10.1016/j.jag.2013.12.007

Eurostat. (2018). LUCAS - Land use and land cover survey. Retrieved from

https://ec.europa.eu/eurostat/web/lucas/overview. Accessed: 13 August 2019

FAO. (2013). World Livestock 2013 - Changing disease landscapes. Retrieved from

http://www.fao.org/3/i3440e/i3440e.pdf. Accessed: 04 March 2019

FAO. (2014). Perennial Crops for Food Security. Proceedings of the FAO Expert Workshop.

28-30 August, 2013, Rome, Italy. Retrieved from http://www.fao.org/3/a-i3495e.pdf.

Accessed: 09 August 2019

FAO. (2015). Agroforestry. Definition. Retrieved from

http://www.fao.org/forestry/agroforestry/80338/en/. Accessed: 09 August 2019

FAO. (2016a). Conservation Agriculture. Retrieved from http://www.fao.org/3/a-i6169e.pdf.

Accessed: 09 August 2019

FAO. (2016b). Global Forest Resources Assessment 2015. How are the world’s forests

changing? Retrieved from http://www.fao.org/3/a-i4793e.pdf. Accessed: 04 March

2019

FAO. (2016c). The State of Food and Agriculture. Climate Change, Agriculture and Food

Security. Retrieved from http://www.fao.org/3/a-i6030e.pdf. Accessed: 06 September

2019

FAO, & ITPS. (2015). Status of the World's Soil Resources (Main Report). Retrieved from

http://www.fao.org/3/a-i5199e.pdf. Accessed: 01 March 2019

Fensholt, R., & Proud, S. R. (2012). Evaluation of Earth Observation based global long term

vegetation trends — Comparing GIMMS and MODIS global NDVI time series.

Remote Sensing of Environment, 119, 131-147. doi:10.1016/j.rse.2011.12.015

Ferguson, R. S., & Lovell, S. T. (2013). Permaculture for agroecology: design, movement,

practice, and worldview. A review. Agronomy for Sustainable Development, 34(2),

251-274. doi:10.1007/s13593-013-0181-6

Foody, G. M. (2004). Thematic Map Comparison: Evaluating the Statistical Significance of

Differences in Classification Accuracy. Photogrammetric Engineering & Remote

Sensing, 70(5), 627-633.

Frampton, W. J., Dash, J., Watmough, G., & Milton, E. J. (2013). Evaluating the capabilities

of Sentinel-2 for quantitative estimation of biophysical variables in vegetation. ISPRS

Journal of Photogrammetry and Remote Sensing, 82, 83-92.

doi:10.1016/j.isprsjprs.2013.04.007

Gagolewski, M., Tartanus, B., contributors (stringi source code), IBM and other contributors

(ICU4C source code), & Unicode Inc. (Unicode Character Database). (2019).

Character String Processing Facilities. Retrieved from https://cran.r-

project.org/web/packages/stringi/stringi.pdf. Accessed: 07 September 2019

Gandhi, G. M., Parthiban, S., Thummalu, N., & Christy, A. (2015). NDVI: Vegetation Change

Detection Using Remote Sensing and GIS – A Case Study of Vellore District.

Procedia Computer Science, 57, 1199-1210. doi:10.1016/j.procs.2015.07.415

Gitelson, A. A., Kaufman, Y. J., & Merzlyak, M. N. (1996). Use of a green channel in remote

sensing of global vegetation from EOS-MODIS. Remote Sensing of Environment,

58(3), 289-298. doi:10.1016/S0034-4257(96)00072-7

Gomez, B., Banbury, K., Marden, M., Trustrum, N. A., Peacock, D. H., & Hoskin, P. J.

(2003). Gully erosion and sediment production: Te Weraroa Stream, New Zealand.

Water Resources Research, 39(7). doi:10.1029/2002WR001342

Google. (2018). Google Earth (Version 7.3.1.4507): Google. Accessed: 2018-05-15

Greenberg, J. A., & Mattiuzzi, M. (2018). Wrappers for the Geospatial Data Abstraction

Library (GDAL) Utilities: RDocumentation. Retrieved from

https://www.rdocumentation.org/packages/gdalUtils/versions/2.0.1.14. Accessed: 07

September 2019

Guo, X., Price, K. P., & Stiles, J. (2003). Grasslands Discriminant Analysis Using Landsat

TM Single and Multitemporal Data. Photogrammetric Engineering & Remote

Sensing, 69(11), 1255-1262.

Hansen, M. C., Potapov, P. V., Moore, R., Hancher, M., Turubanova, S. A., Tyukavina, A., . . .

Townshend, J. R. G. (2013). High-Resolution Global Maps of 21st-Century Forest

Cover Change [Map]. Retrieved from http://earthenginepartners.appspot.com/science-

2013-global-forest. Accessed: 07 September 2019

Hastie, T., Tibshirani, R., & Friedman, J. (2009). Elements of Statistical Learning. Data

Mining, Inference, and Prediction (Second Edition; Corrected 12th printing ed.).

Stanford, CA, USA: Springer Science+Business Media, LLC.

Herold, M., & Di Gregorio, A. (2012). Evaluating Land-Cover Legends Using the UN Land-

Cover Classification System. In Remote Sensing of Land Use and Land Cover.

Principles and Applications (pp. 425). Boca Raton, FL, USA: Taylor & Francis Group.

Hijmans, R. J., van Etten, J., Cheng, J., Sumner, M., Mattiuzzi, M., Greenberg, J. A., . . .

Wueest, R. (2019). Geographic Data Analysis and Modeling. Retrieved from

https://cran.r-project.org/web/packages/raster/raster.pdf. Accessed: 07 September

2019

Huang, C., Goward, S. N., Schleeweis, K., Thomas, N., Masek, J. G., & Zhu, Z. (2009).

Dynamics of national forests assessed using the Landsat record: Case studies in

eastern United States. Remote Sensing of Environment, 113(7), 1430-1442.

doi:10.1016/j.rse.2008.06.016

Huete, A. R. (1988). A soil-adjusted vegetation index (SAVI). Remote Sensing of

Environment, 25(3), 295-309. doi:10.1016/0034-4257(88)90106-X

Hussain, M., Chen, D., Cheng, A., Wei, H., & Stanley, D. (2013). Change detection from

remotely sensed images: From pixel-based to object-based approaches. ISPRS Journal

of Photogrammetry and Remote Sensing, 80, 91-106.

doi:10.1016/j.isprsjprs.2013.03.006

Immitzer, M. (2017). Mapping Tree Species, Forest Calamities and Growing Stock using

High Resolution Satellite Imagery - Possibilities and Limits. Vienna, Austria:

University of Natural Resources and Life Sciences.

Immitzer, M., Atzberger, C., & Koukal, T. (2012). Tree species classification with random

forest using very high spatial resolution 8-band WorldView-2 satellite data. Remote

Sensing, 4(9), 2661-2693.

Immitzer, M., Vuolo, F., & Atzberger, C. (2016). First Experience with Sentinel-2 Data for

Crop and Tree Species Classifications in Central Europe. Remote Sensing, 8(3).

doi:10.3390/rs8030166

Inglada, J., Arias, M., Tardy, B., Hagolle, O., Valero, S., Morin, D., . . . Koetz, B. (2015).

Assessment of an Operational System for Crop Type Map Production Using High

Temporal and Spatial Resolution Satellite Optical Imagery. Remote Sensing, 7(9),

12356-12379. doi:10.3390/rs70912356

IPCC. (2014). Climate Change 2014: Synthesis Report. Contribution of Working Groups I, II

and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate

Change. Retrieved from https://www.ipcc.ch/report/ar5/syr/. Accessed: 05 January

2020

IPCC. (2019). IPCC Special Report on Climate Change, Desertification, Land Degradation,

Sustainable Land Management, Food Security, and Greenhouse gas fluxes in

Terrestrial Ecosystems. Summary for Policymakers. Retrieved from

https://www.ipcc.ch/srccl/. Accessed: 13 January 2020

Islam, K., Jashimuddin, M., Nath, B., & Nath, T. K. (2018). Land use classification and

change detection by using multi-temporal remotely sensed imagery: The case of

Chunati wildlife sanctuary, Bangladesh. The Egyptian Journal of Remote Sensing and

Space Science, 21(1), 37-47. doi:10.1016/j.ejrs.2016.12.005

Jin, S., Yang, L., Danielson, P., Homer, C., Fry, J., & Xian, G. (2013). A comprehensive

change detection method for updating the National Land Cover Database to circa

2011. Remote Sensing of Environment, 132, 159-175. doi:10.1016/j.rse.2013.01.012

Jones, H. G., & Vaughan, R. A. (2010). Remote Sensing of Vegetation. Principles,

Techniques, and Applications. New York, USA: Oxford University Press.

Kanyala Little Stars. (n.d.). Where We Are - Rusinga Island. Retrieved from

https://kanyalalittlestars.wordpress.com/where-we-are/. Accessed: 31 January 2019

Karlson, M. (2015). Remote Sensing of Woodland Structure and Composition in the Sudano-

Sahelian zone: Application of WorldView-2 and Landsat 8. Linköping, Sweden:

Linköping University.

Karlson, M., Ostwald, M., Reese, H., Bazié, H. R., & Tankoano, B. (2016). Assessing the

potential of multi-seasonal WorldView-2 imagery for mapping West African

agroforestry tree species. International Journal of Applied Earth Observation and

Geoinformation, 50, 80-88. doi:10.1016/j.jag.2016.03.004

Kaufman, Y. J., Wald, A. E., Remer, L. A., Gao, B.-C., Li, R.-R., & Flynn, L. (1997). The

MODIS 2.1-μm channel-correlation with visible reflectance for use in remote

sensing of aerosol. IEEE Transactions on Geoscience and Remote Sensing, 35(5), 1286-

1298.

Kenya National Bureau of Statistics. (2010). The 2009 Kenya Population and Housing

Census. Volume 1A. Population Distribution by Administrative Units. Retrieved from

https://www.knbs.or.ke/publications/. Accessed: 10 November 2019

Kenya National Bureau of Statistics. (2012). 2009 Kenya Population and Housing Census.

Analytical Report on Population Dynamics. Retrieved from

https://www.knbs.or.ke/?page_id=3142. Accessed: 10 November 2019

Kenya National Bureau of Statistics. (2019). 2019 Kenya Population and Housing Census.

Population by County and Sub-County. Retrieved from

https://www.knbs.or.ke/?wpdmpro=2019-kenya-population-and-housing-census-

volume-i-population-by-county-and-sub-county. Accessed: 10 November 2019

Kimes, D. S., Markham, B. L., Tucker, C. J., & McMurtrey, J. E. (1981). Temporal

relationships between spectral response and agronomic variables of a corn canopy.

Remote Sensing of Environment, 11, 401-411. doi:10.1016/0034-4257(81)90037-7

Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of

Statistical Software, 28(5), 1-26.

Kuhn, M., Contributions from Jed Wing, Weston, S., Williams, A., Keefer, C., Engelhardt,

A., . . . Hunt, T. (2019). Classification and Regression Training. Retrieved from

https://cran.r-project.org/web/packages/caret/caret.pdf. Accessed: 07 September 2019

Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. New York, NY, USA:

Springer Science+Business Media.

Kussul, N., Lavreniuk, M., Skakun, S., & Shelestov, A. (2017). Deep Learning Classification

of Land Cover and Crop Types Using Remote Sensing Data. IEEE Geoscience and

Remote Sensing Letters, 14(5), 778-782. doi:10.1109/LGRS.2017.2681128

Laba, M., Gregory, S. K., Barden, J., Ogurcak, D., Hill, E., Fegraus, E., . . . DeGloria, S. D.

(2002). Conventional and fuzzy accuracy assessment of the New York Gap Analysis

Project land cover map. Remote Sensing of Environment, 81, 443-455.

Landsberg, F., & Henninger, N. (2007). Kenya GIS Data [Map]. Retrieved from:

https://www.wri.org/resources/data-sets/kenya-gis-data. Accessed: 07 September 2019

Langley, S. K., Cheshire, H. M., & Humes, K. S. (2001). A comparison of single date and

multitemporal satellite image classifications in a semi-arid grassland. Journal of Arid

Environments, 49(2), 401-411.

Leh, M., Bajwa, S., & Chaubey, I. (2013). Impact of Land Use Change on Erosion Risk: An

Integrated Remote Sensing, Geographic Information System and Modeling

Methodology. Land Degradation & Development, 24(5), 409-421.

doi:10.1002/ldr.1137

Lemon, J., Toews, M., Biancotto, E., Levy, O., Engelmann, R., Hecker, M., . . . Baral, D.

(2019). Various Plotting Functions: RDocumentation. Retrieved from

https://www.rdocumentation.org/packages/plotrix/versions/3.7-6. Accessed: 13 March

2019

Leutner, B., Horning, N., Schwalb-Willmann, J., & Hijmans, R. J. (2019). Tools for Remote

Sensing Data Analysis. Retrieved from https://cran.r-

project.org/web/packages/RStoolbox/RStoolbox.pdf. Accessed: 07 September 2019

Lewinski, S., Nowakowski, A., Rybicki, M., Kukawska, E., Malinowski, R. K., Michal

Krätzschmar, Elke Hofmann, Peter, Bielski, C., . . . Prieto, D. F. (2017). Towards

Automatic Global Land Cover Classification on Sentinel-2 Data. Paper presented at

the WorldCover 2017 Conference, Frascati, Italy.

http://s2glc.cbk.waw.pl/sites/default/files/inline-images/CBK_poster_A0_v2.pdf.

Liaw, A., & Wiener, M. (2002). Classification and Regression by randomForest. R News,

2(3), 18-22.

Lillesand, T. M., Kiefer, R. W., & Chipman, J. W. (2015). Remote sensing and image

interpretation (7th ed.). Hoboken, NJ, USA: John Wiley & Sons.

Löw, F., Knöfel, P., & Conrad, C. (2015). Analysis of uncertainty in multi-temporal object-

based classification. ISPRS Journal of Photogrammetry and Remote Sensing, 105, 91-

106.

Lu, D., & Weng, Q. (2007). A survey of image classification methods and techniques for

improving classification performance. International Journal of Remote Sensing, 28(5),

823-870. doi:10.1080/01431160600746456

Lunetta, R. S., Knight, J. F., Ediriwickrema, J., Lyon, J. G., & Worthy, L. D. (2006). Land-

cover change detection using multi-temporal MODIS NDVI data. Remote Sensing of

Environment, 105(2), 142-154. doi:10.1016/j.rse.2006.06.018

Mayaux, P., Bartholomé, E., Cabral, A., Cherlet, M., Defourny, P., Di Gregorio, A., . . .

Vasconcelos, M. (2003). The Land Cover Map for Africa in the Year 2000 [Map].

Retrieved from: https://forobs.jrc.ec.europa.eu/products/glc2000/products.php.

Accessed: 02 August 2019

Montanarella, L., Pennock, D. J., McKenzie, N., Badraoui, M., Chude, V., Baptista, I., . . .

Vargas, R. (2016). World's soils are under threat. Soil, 2(1), 79-82. doi:10.5194/soil-2-

79-2016

Motohka, T., Nasahara, K. N., Oguma, H., & Tsuchida, S. (2010). Applicability of Green-Red

Vegetation Index for Remote Sensing of Vegetation Phenology. Remote Sensing,

2(10), 2369-2387. doi:10.3390/rs2102369

Müller-Wilm, U. (2018a). Sen2Cor Configuration and User Manual. Retrieved from

http://step.esa.int/thirdparties/sen2cor/2.5.5/docs/S2-PDGS-MPC-L2A-SUM-

V2.5.5_V2.pdf. Accessed: 30 October 2018

Müller-Wilm, U. (2018b). Sen2Cor Software Release Note. Retrieved from

http://step.esa.int/thirdparties/sen2cor/2.5.5/docs/S2-PDGS-MPC-L2A-SRN-

V2.5.5.pdf. Accessed: 30 October 2018

Mureithi, D. S. M., Mwagi, A., & Gruber, B. [Unpublished Work]. Field Report: Training on

Permaculture with emphasis on erosion control in Rusinga Island, Homabay County.

(2018). Training Report Manuscript. Rusinga Island, Kenya.

Myint, S. W., Gober, P., Brazel, A., Grossman-Clarke, S., & Weng, Q. (2011). Per-pixel vs.

object-based classification of urban land cover extraction using high spatial resolution

imagery. Remote Sensing of Environment, 115(5), 1145-1161.

doi:10.1016/j.rse.2010.12.017

NASA, & METI. (2011). Routine ASTER Global Digital Elevation Model (Publication no.

10.5067/ASTER/ASTGTM.002). Retrieved from https://earthexplorer.usgs.gov/.

Accessed: 17 December 2018

NASA Applied Remote Sensing Training (Producer). (2018, 05 April 2019). Advanced

Webinar: Change Detection for Land Cover Mapping. [Webinar] Accessed: 05 April

2019

National Geomatics Center of China. (2010). Globeland30-2010 [Map]. Retrieved from

http://globallandcover.com/GLC30Download/index.aspx. Accessed: 07 September

2019

Niang, I., Ruppel, O. C., Abdrabo, M. A., Essel, A., Lennard, C., Padgham, J., & Urquhart, P.

(2014). Africa. In V. R. Barros, C. B. Field, D. J. Dokken, M. D. Mastrandrea, K. J.

Mach, T. E. Bilir, M. Chatterjee, K. L. Ebi, Y. O. Estrada, R. C. Genova, B. Girma, E.

S. Kissel, A. N. Levy, S. MacCracken, P. R. Mastrandrea, & L. L. White (Eds.),

Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part B: Regional

Aspects. Contribution of Working Group II to the Fifth Assessment Report of the

Intergovernmental Panel on Climate Change. Cambridge, UK & New York, NY,

USA: Cambridge University Press.

Nicholson, S. E. (2017). Climate and climatic variability of rainfall over eastern Africa.

Reviews of Geophysics, 55(3), 590-635. doi:10.1002/2016RG000544

Nyaga, M. [Unpublished Work]. Data collection on woody vegetation species diversity in

Rusinga Island, Homa Bay county, Kenya. (2018). Student Field Report. University of

Nairobi, Department of Land Resource Management and Agricultural Technology.

Nairobi, Kenya.

Nyberg, G., Knutsson, P., Ostwald, M., Öborn, I., Wredle, E., Otieno, D. J., . . . Malmer, A.

(2015). Enclosures in West Pokot, Kenya: Transforming land, livestock and

livelihoods in drylands. Pastoralism, 5(1). doi:10.1186/s13570-015-0044-7

Nyssen, J., Veyret‐Picot, M., Poesen, J., Moeyersons, J., Haile, M., Deckers, J., & Govers, G.

(2004). The effectiveness of loose rock check dams for gully control in Tigray,

northern Ethiopia. Soil Use and Management, 20(1), 55-64. doi:10.1111/j.1475-

2743.2004.tb00337.x

Okolla, L. [Unpublished Work]. Reconnaissance field visit to Rusinga Island on 2nd - 4th

May 2018. (2018). Student Field Report. University of Nairobi, Department of Land

Resource Management and Agricultural Technology. Nairobi, Kenya.

OpenStreetMap contributors. (2017). OpenStreetMap Data in Layered GIS Format. Retrieved

from https://download.geofabrik.de/africa/kenya.html. Accessed: 03 December 2018

Orange-Senqu River Commission. (2014). Rehabilitating rangelands for healthy headwaters:

Steps Basotho Communities are taking to reverse land degradation at the source of

the Orange-Senqu River. Retrieved from

https://iwlearn.net/resolveuid/5196ea7381784c7caba124942cab109c. Accessed: 05

September 2019

Otukei, J. R., & Blaschke, T. (2010). Land cover change assessment using decision trees,

support vector machines and maximum likelihood classification algorithms.

International Journal of Applied Earth Observation and Geoinformation, 12, S27-

S31. doi:10.1016/j.jag.2009.11.002

Pesaresi, M., Corbane, C., Julea, A., Florczyk, A., Syrris, V., & Soille, P. (2016). Assessment

of the Added-Value of Sentinel-2 for Detecting Built-up Areas. Remote Sensing,

8(4), 299. doi:10.3390/rs8040299

Pimentel, D., & Burgess, M. (2013). Soil Erosion Threatens Food Production. Agriculture,

3(3), 443-463. doi:10.3390/agriculture3030443

Pimentel, D., & Kounang, N. (1998). Ecology of Soil Erosion in Ecosystems. Ecosystems, 1,

416-426.

QGIS Development Team. (2019). QGIS. A Free and Open Source Geographic Information

System (Version 3.0 Girona, 3.2 Bonn, 3.4 Madeira (LTR), 3.6 Noosa): QGIS.

Accessed: 2018-05-02

R Core Team. (2018). R: A Language and Environment for Statistical Computing. Vienna,

Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-

project.org. Accessed: 02 May 2018

R Core Team. (2019). The R Graphics Devices and Support for Colours and Fonts:

RDocumentation. Retrieved from

https://www.rdocumentation.org/packages/grDevices/versions/3.6.1. Accessed: 07

September 2019

Rawat, J. S., & Kumar, M. (2015). Monitoring land use/cover change using remote sensing

and GIS techniques: A case study of Hawalbagh block, district Almora, Uttarakhand,

India. The Egyptian Journal of Remote Sensing and Space Science, 18(1), 77-84.

doi:10.1016/j.ejrs.2015.02.002

Rees, W. G. (2012). Physical Principles of Remote Sensing (3rd ed.). New York, USA:

Cambridge University Press.

Renard, K. G., Foster, G. R., Weesies, G. A., McCool, D. K., & Yoder, D. C. (1997).

Predicting Soil Erosion by Water: A Guide to Conservation Planning with the Revised

Universal Soil Loss Equation (RUSLE) (Vol. No. 703). Washington, DC, USA: U.S.

Department of Agriculture.

Rouse, J. W., Haas, R. H., Schell, J. A., Deering, D. W., & Harlan, J. C. (1974). Monitoring

the vernal advancement and retrogradation (greenwave effect) of natural vegetation.

Accessed: 21 July 2018

RStudio. (2018). RStudio. Open Source and Enterprise-Ready Professional Software for R

(Version 1.1.447). Accessed: 02 May 2018

Schläpfer, D., Borel, C. C., Keller, J., & Itten, K. I. (1998). Atmospheric precorrected

differential absorption technique to retrieve columnar water vapor. Remote Sensing of

Environment, 65(3), 353-366.

Schultz, B., Immitzer, M., Formaggio, A., Sanches, I., Luiz, A., & Atzberger, C. (2015). Self-

Guided Segmentation and Classification of Multi-Temporal Landsat 8 Images for

Crop Type Mapping in Southeastern Brazil. Remote Sensing, 7(11), 14482-14508.

doi:10.3390/rs71114482

Scott, D., Mitchell, D., Shah, A., Laake, J., Henningsen, A., Andronic, L., . . . Garbade, S.

(2019). Export Tables to LaTeX or HTML: RDocumentation. Retrieved from

https://www.rdocumentation.org/packages/xtable/versions/1.8-4. Accessed: 07

September 2019

Senf, C., Leitão, P. J., Pflugmacher, D., van der Linden, S., & Hostert, P. (2015). Mapping

land cover in complex Mediterranean landscapes using Landsat: Improved

classification accuracies from integrating multi-seasonal and synthetic imagery.

Remote Sensing of Environment, 156, 527-536. doi:10.1016/j.rse.2014.10.018

Siachalou, S., Mallinis, G., & Tsakiri-Strati, M. (2015). A Hidden Markov Models Approach

for Crop Classification: Linking Crop Phenology to Time Series of Multi-Sensor

Remote Sensing Data. Remote Sensing, 7(4), 3633-3650. doi:10.3390/rs70403633

Signorell, A., Gel, Y., Mitchell, D., Barton, K., Chamely, S., Chhay, L., . . . Sabbe, N. (2017).

Tools for Descriptive Statistics: RDocumentation. Retrieved from

https://www.rdocumentation.org/packages/DescTools/versions/0.99.19. Accessed: 07

September 2019

Tewkesbury, A. P., Comber, A. J., Tate, N. J., Lamb, A., & Fisher, P. F. (2015). A critical

synthesis of remotely sensed optical image change detection techniques. Remote

Sensing of Environment, 160, 1-14. doi:10.1016/j.rse.2015.01.006

The Presidency. (2018). Kenya Vision 2030. Marking 10 Years of Progress (2008 - 2018).

Sector Progress & Project Updates, June 2018. Retrieved from

http://vision2030.go.ke/inc/uploads/2018/09/Kenya-Vision-2030-Sector-Progress-

Project-Updates-June-2018.pdf. Accessed: 05 September 2019

Tigges, J., Lakes, T., & Hostert, P. (2013). Urban vegetation classification: Benefits of

multitemporal RapidEye satellite data. Remote Sensing of Environment, 136, 66-75.

doi:10.1016/j.rse.2013.05.001

Tryon, C. A., Faith, J. T., Peppe, D. J., Keegan, W. F., Keegan, K. N., Jenkins, K. H., . . .

Beverly, E. J. (2014). Sites on the landscape: Paleoenvironmental context of late

Pleistocene archaeological sites from the Lake Victoria basin, equatorial East Africa.

Quaternary International, 331, 20-30. doi:10.1016/j.quaint.2013.05.038

Turner, W., Rondinini, C., Pettorelli, N., Mora, B., Leidner, A. K., Szantoi, Z., . . .

Woodcock, C. (2015). Free and open-access satellite data are key to biodiversity

conservation. Biological Conservation, 182, 173-176.

doi:10.1016/j.biocon.2014.11.048

UN Environment. (2018). UN Environment joins campaign to green Kenya [Press release].

Retrieved from https://www.unenvironment.org/news-and-stories/press-release/un-

environment-joins-campaign-green-kenya. Accessed: 05 September 2019

UNEP. (2009). Kenya: Atlas of Our Changing Environment. Nairobi, Kenya: Division of

Early Warning and Assessment (DEWA) & United Nations Environment Programme

(UNEP).

UNEP. (2016). GEO-6 Regional Assessment for Africa. Retrieved from

http://wedocs.unep.org/bitstream/handle/20.500.11822/7595/GEO_Africa_201611.pdf

?sequence=1&isAllowed=y. Accessed: 04 March 2019

Viña, A., Gitelson, A. A., Nguy-Robertson, A. L., & Peng, Y. (2011). Comparison of

different vegetation indices for the remote assessment of green leaf area index of

crops. Remote Sensing of Environment, 115(12), 3468-3478.

doi:10.1016/j.rse.2011.08.010

Vuolo, F., Richter, K., & Atzberger, C. (2011). Evaluation of time-series and phenological

indicators for land cover classification based on MODIS data. Paper presented at the

Remote Sensing for Agriculture, Ecosystems, and Hydrology XIII, Prague, Czech

Republic.

Wagenknecht, B. (Personal Communication). Background on Books for Trees, Badilisha Eco-

village Foundation Trust, and Rusinga Island. 2018, 16 March.

WCED. (1987). Our Common Future. Retrieved from http://www.un-documents.net/wced-

ocf.htm. Accessed: 09 August 2019

Weih, R. C., & Riggan, N. D. (2010). Object-Based Classification Vs. Pixel-Based

Classification: Comparative Importance of Multi-Resolution Imagery. The

International Archives of the Photogrammetry, Remote Sensing and Spatial

Information Sciences, XXXVIII-4(C7), 6.

Wibowo, A., Ismullah, I. H., Dipokusumo, B. S., & Wikantika, K. (2012). Land Degradation

Model Based on Vegetation and Erosion Aspects Using Remote Sensing Data. ITB

Journal of Sciences, 44(1), 19-34. doi:10.5614/itbj.sci.2012.44.1.3

Willis, K. S. (2015). Remote sensing change detection for ecological monitoring in United

States protected areas. Biological Conservation, 182, 233-242.

doi:10.1016/j.biocon.2014.12.006

Wulder, M. A., Franklin, S. E., White, J. C., Linke, J., & Magnussen, S. (2006). An accuracy

assessment framework for large‐area land cover classification products derived from

medium‐resolution satellite data. International Journal of Remote Sensing, 27(4), 663-

683. doi:10.1080/01431160500185284

Yang, X., Xu, B., Jin, Y., Qin, Z., Ma, H., Li, J., . . . Zhu, X. (2015). Remote sensing

monitoring of grassland vegetation growth in the Beijing–Tianjin sandstorm source

project area from 2000 to 2010. Ecological Indicators, 51, 244-251.

doi:10.1016/j.ecolind.2014.04.044

Yin, H., Pflugmacher, D., Kennedy, R. E., Sulla-Menashe, D., & Hostert, P. (2014). Mapping

Annual Land Use and Land Cover Changes Using MODIS Time Series. IEEE Journal

of Selected Topics in Applied Earth Observations and Remote Sensing, 7(8), 3421-

3427. doi:10.1109/jstars.2014.2348411

Yu, W., Zhou, W., Qian, Y., & Yan, J. (2016). A new approach for land cover classification

and change analysis: Integrating backdating and an object-based method. Remote

Sensing of Environment, 177, 37-47. doi:10.1016/j.rse.2016.02.030

Zhu, Z., & Woodcock, C. E. (2014). Continuous change detection and classification of land

cover using all available Landsat data. Remote Sensing of Environment, 144, 152-171.

doi:10.1016/j.rse.2014.01.011
