
Deep Learning-based Aerial Image Segmentation with Open Data for Disaster Impact Assessment

Ananya Gupta, Simon Watson, Hujun Yin

Department of Electrical and Electronic Engineering, The University of Manchester, Manchester, United Kingdom

Abstract

Satellite images are an extremely valuable resource in the aftermath of natural disasters such as hurricanes and tsunamis, where they can be used for risk assessment and disaster management. In order to provide timely and actionable information for disaster response, in this paper a framework utilising segmentation neural networks is proposed to identify impacted areas and accessible roads in post-disaster scenarios. The effectiveness of pretraining with ImageNet on the task of aerial image segmentation has been analysed and performances of popular segmentation models compared. Experimental results show that pretraining on ImageNet usually improves the segmentation performance for a number of models. Open data available from OpenStreetMap (OSM) is used for training, forgoing the need for time-consuming manual annotation. The method also makes use of graph theory to update road network data available from OSM and to detect the changes caused by a natural disaster. Extensive experiments on data from the 2018 tsunami that struck Palu, Indonesia show the effectiveness of the proposed framework. ENetSeparable, with 30% fewer parameters compared to ENet, achieved comparable segmentation results to that of the state-of-the-art networks.

Keywords: Disaster Response, Aerial Images, Semantic Segmentation, Convolutional Neural Networks, Graph Theory

1. Introduction

Satellite imagery is an extremely important resource for disaster management and response. Following a major natural disaster such as an earthquake or a tsunami, authorised users from national civil protection, rescue or security organisations can activate the International Charter: Space and Major Disasters [5]. The Charter is a worldwide collaboration amongst space agencies and space systems operators that provide satellite imagery for disaster monitoring. This imagery can then be used to identify damaged areas that need the most support and also routes that are still accessible for evacuation and emergency responses.

Such image analysis is typically done manually with support from volunteer initiatives such as the Humanitarian OpenStreetMap (OSM) team. They organise mapathons with volunteers from around the world to manually annotate high resolution satellite images. Inevitably, this process can be slow and error-prone due to the inexperience of many volunteers [32]. Time is extremely critical in post-disaster situations for prompt relief efforts. Timely and accurate road maps are also extremely important for navigation in post-disaster scenarios. Pre-existing maps can be rendered inaccurate due to possible route blockages, water-logging, landslides and structural damages.

For instance, on September 28, 2018, a 7.5 magnitude earthquake with an epicenter in Central Sulawesi struck Indonesia and led to a tsunami in the province capital Palu, which washed away a lot of the coastal infrastructure. This was the deadliest earthquake worldwide in 2018, with over 4,000 fatalities and damages to over 60,000 buildings.

Figure 1: Extracted images from satellite imagery of Palu, Indonesia showing the devastation due to the tsunami and earthquake in September, 2018 [10]. Left: Before the tsunami. Right: The day after the tsunami.

Following the event, a number of rapid mapping efforts were initiated by the government and volunteer initiatives for damage assessment. However, these efforts took days to complete due to many manual processes [1].

Deep learning (DL) based techniques such as convolutional neural networks (CNNs) are becoming increasingly pervasive as a means of automating the knowledge discovery process in many fields, including remote sensing [43]. These techniques are used in conjunction with Earth Observation data for applications such as land-use classification, change detection, object detection and disaster analysis. However, DL models are data driven and typically require a large amount of manually annotated data for training [1]. This data annotation is a slow process and hence these methods cannot be directly used for rapid disaster analysis.



Some recent work has explored the use of data from OSM [28] for training ML models in the absence of high quality manually labelled training data [20]. OSM data can be assumed to be weakly labelled training data due to issues such as mis-registration and out-of-date labels. The work in [20] showed that a large enough training dataset helped alleviate the issues of using a weakly labelled dataset.

A framework to detect damaged roads from satellite imagery and register them to OSM was recently proposed in [15]. It was trained on publicly available OSM data, forgoing the requirement for expensive manually annotated data. It provided a method inspired by graph theory to automate the process of updating the OSM database by combining the changes detected using their framework with OSM road data. This paper builds on [15] and shows how it can be extended to multiple semantic classes. A systematic analysis of the performance of different neural networks for the task of aerial image segmentation is performed. The effect of pretraining the neural networks on a large image dataset is also analysed. Two variants of popular neural networks are also proposed: the first, ENetSeparable, focuses on efficiency and the second, UNetUpsample, on accuracy. Finally, it is shown that the proposed framework from [15] can be seen as being architecture agnostic since it helps reduce the difference in segmentation performance from the various segmentation networks.

2. Related Work

2.1. Aerial Image Segmentation

Recent successes of deep learning models in image classification and big data analysis have promoted much increased use of such models in remote sensing for tasks such as land cover classification and change detection [43]. Readers are referred to [42] and [43] for an extensive background and review on the use of deep learning for remote sensing tasks. Common tasks in this field are extraction of road networks [3, 24, 38] and building footprints [39] using semantic segmentation networks, popularised by large-scale competitions such as DeepGlobe [9] and SpaceNet [40].

Most popular semantic segmentation architectures are structured as encoder-decoder models, popularised by UNet [33]. The encoder consists of a number of blocks, where each block takes an input image or feature map and produces a set of downsampled feature maps which progressively identify higher level features. The decoder network mirrors the encoder network and progressively upsamples the output from the encoder network. Individual decoder blocks are connected to the corresponding encoder blocks with skip links to help recover the fine-grained details lost in the downsampling. The upsampling is typically done using transposed convolutions with learnable weights.

A study on using OSM data for learning aerial image segmentation showed that using a large amount of weakly labelled data for training helped achieve reasonable performance without the need for large well-labelled datasets [20]. Alternative schemes to train models for aerial image segmentation employ self-supervision [37] and supervised pretraining on ImageNet [2].

2.2. Disaster Analysis

Remote sensing is being increasingly used for disaster response management due to the increasing availability of remote sensing data, which can be acquired relatively quickly [22]. The main datatypes used in such cases are synthetic aperture radar (SAR) and high resolution optical images. SAR is extremely useful for dealing with low-light conditions and for areas with cloud cover. It is especially useful in finding flooded areas [35] and identifying ground displacements after earthquakes [31]. However, it cannot be used in urban areas with the same effectiveness due to radar backscattering to the sensors caused by tall objects such as buildings [5].

High resolution optical imagery is typically used for visual interpretation in the case of events such as hurricanes, cyclones and tsunamis, which leave visible damages to an area. Recent work in automating this process has focused on assessing the damage to buildings in disaster-struck areas. A combination of pre- and post-tsunami satellite images has been used to assess whether a building was washed away by a flood with the implementation of a CNN [14]. Similarly, an approach fusing multi-resolution, multi-sensor and multi-temporal imagery in a CNN was used to segment flooded buildings [34]. However, these approaches require manually labelled post-disaster data for training their networks, which is time-consuming and expensive to obtain.

Automated road extraction from satellite imagery is an area of interest since a number of location and navigation services require up-to-date road maps [25]. A number of approaches using segmentation methods have been proposed to extract road networks. These methods typically depend on heuristics-based techniques in post processing to fix incorrect gaps from the segmentation networks [24]. However, these methods, while valid in typical road extraction scenarios, are not suitable in post-disaster scenarios because gaps in the segmentation masks could be caused by the effects of a disaster and are of extreme importance.

There is some existing research for road extraction in post-disaster scenarios. Vehicle trajectories have been used for identifying obstacles such as standing waters and fallen trees [7]. Road centerline extraction from post-disaster imagery has used OSM vector data for generating seed points and creating a more accurate road map following an earthquake, which can cause registration errors [23]. This method only corrects the registration errors but does not deal with the problem of destroyed roads. A crowd-sourced pedestrian map builder has also been developed [4], but it requires people walking around in potentially destroyed areas and is not scalable.

Segmentation networks have also been used for detecting changes caused by disasters [11]. These methods use the difference between outputs of pre-disaster and post-disaster imagery to obtain a measure on areas that have been damaged the most. In contrast, the current work extends the previous work by identifying the changes to the road networks at a fine-grained level. Furthermore, the proposed framework also allows for an update to OSM to achieve more realistic road network maps for the area under consideration.


3. Methodology

The proposed disaster impact assessment is based on finding the difference in roads and buildings between satellite imagery from before and after a disaster. This is done by using a semantic segmentation network trained on pre-disaster aerial imagery for identifying these objects in the before and after imagery. The difference in the predicted road masks is further used to update data from OSM for finding accessible routes in the post-disaster scenario.

3.1. Segmentation Models

The models used in this study are modified versions of the UNet and LinkNet [6]. The modifications were inspired by the TernausNet [18], which showed that replacing the UNet encoder with a pretrained VGG11 encoder improved segmentation results.

Here a systematic study was carried out to compare the effectiveness of different encoder backbones. In the tested models, the encoder backbone was replaced by the convolutional layers from VGG [36] and ResNet [16] for the UNet. The original LinkNet model with ResNet18 as its encoder and another one with a ResNet34 backend were also tested.

A slight modification of the UNet was also studied, where the transposed convolutions in the decoders were replaced with nearest neighbour upsampling to deal with possible checkerboard artifacts [27]. This modified version has been called UNetUp in the remainder of the text.

Another model tested in this study is the ENet [29]. It is an encoder-decoder model optimised for efficiency in terms of latency and parameters, with an encoder inspired by ResNet and a small decoder. It uses early downsampling with a relatively low number of feature maps to reduce the number of operations required. It also decomposes n×n convolutions into smaller convolutions of n×1 and 1×n [19], allowing for large speedups.
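
The factorisation can be illustrated with a minimal PyTorch sketch (the module name and channel counts here are illustrative, not taken from the ENet code): an n×n convolution is replaced by an n×1 convolution followed by a 1×n convolution, reducing the per-layer cost from n² to 2n multiplications per output channel.

```python
import torch
import torch.nn as nn

class AsymmetricConv(nn.Module):
    """Illustrative sketch: factorise an n x n convolution into n x 1 and 1 x n
    convolutions, as in ENet's asymmetric bottlenecks, to cut the operation count."""
    def __init__(self, channels, n=5):
        super().__init__()
        self.conv_nx1 = nn.Conv2d(channels, channels, kernel_size=(n, 1), padding=(n // 2, 0))
        self.conv_1xn = nn.Conv2d(channels, channels, kernel_size=(1, n), padding=(0, n // 2))

    def forward(self, x):
        return self.conv_1xn(self.conv_nx1(x))

x = torch.randn(1, 64, 128, 128)
print(AsymmetricConv(64)(x).shape)  # torch.Size([1, 64, 128, 128]), resolution preserved
```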

Inspired by Xception-Net [8], a modified version of ENet called ENetSeparable is proposed. In this model, all convolutional filters are replaced by depthwise separable convolutions, a modification that reduces the number of parameters by 30%.
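
The substitution behind ENetSeparable can be sketched as follows (a generic depthwise separable block, assuming PyTorch; this is not the authors' exact implementation): a per-channel depthwise convolution followed by a 1×1 pointwise convolution replaces each standard convolution, which is where the parameter saving comes from.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Sketch of a depthwise separable convolution: a depthwise convolution
    (groups=in_channels) followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, padding=padding, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Parameter comparison against a standard 3x3 convolution (64 -> 64 channels):
std = nn.Conv2d(64, 64, 3, padding=1)
sep = DepthwiseSeparableConv(64, 64)
print(sum(p.numel() for p in std.parameters()),  # 36928
      sum(p.numel() for p in sep.parameters()))  # 4800
```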

The loss function is a weighted cross entropy loss with an additional soft Jaccard constraint and is given as follows:

$$ L = (1-\alpha)\,\frac{1}{I}\sum_{i}^{I}\left(-w_k \log \frac{e^{o_{ik}}}{\sum_{c}^{C} e^{o_{ic}}}\right) \;-\; \alpha \sum_{c}^{C} \log \frac{\sum_{i} e^{o_{ic}} \, t_{ic}}{\sum_{i} e^{o_{ic}} + t_{ic} - e^{o_{ic}} \, t_{ic}} \qquad (1) $$

where

$$ w_k = \frac{\sum_{c=1}^{C} S_c}{C \times S_k} $$

Subscript k denotes the target class, i indexes over all pixels, where the total number of pixels is given by I and the total number of classes is given by C. The output for class c at pixel i is given by o_{ic} and t is a one-hot encoded target vector. α is the weighting for the Jaccard loss. The weight for class k is given by w_k, and S_k denotes the number of samples in the training set for target class k, while S_c is the number of samples for class c.
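
A minimal PyTorch sketch of this loss is given below. It is an interpretation of Eq. (1), not the authors' code: the cross entropy term uses the class-frequency weights w_k, while the Jaccard term here is computed on softmax probabilities, which is a common implementation choice and may differ in normalisation details from the paper.

```python
import torch
import torch.nn.functional as F

def weighted_ce_jaccard_loss(logits, target, class_weights, alpha=0.5, eps=1e-7):
    """Sketch of Eq. (1): weighted cross entropy plus a soft Jaccard constraint.
    logits: (N, C, H, W) raw outputs o_ic; target: (N, H, W) integer labels;
    class_weights: tensor of w_k = sum_c S_c / (C * S_k)."""
    ce = F.cross_entropy(logits, target, weight=class_weights)       # weighted CE term
    probs = torch.softmax(logits, dim=1)                             # per-class probabilities
    onehot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))                      # soft intersection per class
    union = (probs + onehot - probs * onehot).sum(dim=(0, 2, 3))     # soft union per class
    jaccard = -torch.log((inter + eps) / (union + eps)).mean()       # -log of soft Jaccard
    return (1 - alpha) * ce + alpha * jaccard
```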

3.2. Disaster Impact Assessment with Change Detection

The segmentation network is used to identify buildings and roads in pre-disaster and post-disaster aerial imagery. Due to shadows and occlusions, the segmentation output can have a number of incorrect gaps. The building and road segmentation masks are dilated with a small kernel (e.g. 5×5) for several iterations (6 in our experiments) to overcome some of these gaps. The resulting masks can be used to distinguish the infrastructure that was destroyed due to the disaster as follows:

$$ M^{diff}_p = \begin{cases} 1 & \text{if } M^{pre}_p \in \{1, 2\} \text{ and } M^{post}_p = 0 \\ 0 & \text{otherwise} \end{cases} \qquad (2) $$

Subscript p indexes over all the pixels in each image and M^{diff} is the disaster difference mask. M^{pre} and M^{post} are the segmentation masks from pre- and post-disaster imagery, respectively. The inferred label is one of {0, 1, 2}, referring to the background, building or road class. This function computes true for any pixel that was identified as a road or building in the pre-disaster image but as background in the post-disaster image, since that can be assumed to be damaged due to the disaster.

Due to small mis-registration issues and non-ideal segmentation outputs, the segmentation masks from the pre-disaster and post-disaster images do not completely overlap. Hence, small blobs in the difference mask can be assumed to be noise or artifacts caused by the registration error. Morphological erosion and opening are used to remove all such noise and the final mask obtained represents the damaged infrastructure due to the disaster. The intermediate steps of this process are shown in Fig. 2.
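
The change-detection step can be sketched with NumPy and OpenCV as below (a sketch under stated assumptions: the 5×5 kernel and 6 dilation iterations follow the text, while the number of erosion iterations is an illustrative choice).

```python
import numpy as np
import cv2

def disaster_difference_mask(m_pre, m_post, kernel_size=5, dilate_iters=6):
    """Sketch of Sec. 3.2 / Eq. (2). Labels: 0 background, 1 building, 2 road.
    Dilate both masks to bridge small gaps, flag pixels that were infrastructure
    before the disaster but background after, then clean small blobs."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    pre = cv2.dilate((m_pre > 0).astype(np.uint8), kernel, iterations=dilate_iters)
    post = cv2.dilate((m_post > 0).astype(np.uint8), kernel, iterations=dilate_iters)
    diff = ((pre == 1) & (post == 0)).astype(np.uint8)               # Eq. (2)
    diff = cv2.erode(diff, kernel, iterations=1)                     # shrink registration noise
    diff = cv2.morphologyEx(diff, cv2.MORPH_OPEN, kernel)            # remove remaining small blobs
    return diff
```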

3.3. Generating Road Graphs

The output segmentation mask is converted to a road network graph motivated by graph theory to obtain a map suitable for route computation. Firstly, all pixels marked as road are extracted to form a road mask. The road mask is dilated to deal with small gaps in the segmentation output, since these can cause large errors in the network graph. Morphological thinning is performed on the obtained mask to get a single pixel thick road skeleton. The road skeleton is traversed to find all nodes, where each node is any positive pixel with three or more positive pixel neighbours. All pixels between two nodes are marked as part of an edge.

Since the edges approximated with this method are fairly crooked and small road segments can be assumed to be straight, the edges are simplified to piece-wise linear segments using the Ramer-Douglas-Peucker algorithm [12].
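
The skeletonisation, node detection and edge simplification steps could be sketched as follows (an assumption-laden sketch, not the authors' implementation: the pixel-by-pixel tracing of edges between nodes is omitted, and shapely's simplify is used as an off-the-shelf Douglas-Peucker implementation).

```python
import numpy as np
from scipy.ndimage import convolve
from skimage.morphology import binary_dilation, skeletonize
from shapely.geometry import LineString

def road_mask_to_skeleton_nodes(road_mask):
    """Sketch of Sec. 3.3: dilate the road mask to close small gaps, thin it to a
    one-pixel skeleton, then mark as graph nodes every skeleton pixel with three
    or more skeleton neighbours (junctions)."""
    mask = binary_dilation(road_mask.astype(bool))
    skel = skeletonize(mask)
    neighbours = convolve(skel.astype(np.uint8), np.ones((3, 3)), mode="constant") - skel
    nodes = np.argwhere(skel & (neighbours >= 3))
    return skel, nodes

def simplify_edge(pixel_path, tolerance=2.0):
    """Reduce a traced pixel path between two nodes to piece-wise linear segments
    using the Ramer-Douglas-Peucker algorithm (here via shapely's simplify)."""
    return list(LineString(pixel_path).simplify(tolerance).coords)
```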

3.4. Registering changes to OSM

The road network generated from post-disaster imagery using the methodology described above could be used for routing in most scenarios.


Figure 2: Pipeline for change detection in pre-disaster and post-disaster segmentation masks. Roads are shown in blue and buildings in green.

However, non-ideal segmentation masks can cause long detours when creating the road network graph. Hence, it is proposed to further use data from OSM as the best estimate of the world prior to a disaster and register the changes caused by such an event with the OSM road graph to obtain an updated map of the affected region. The change graph can be obtained from the difference mask generated in Section 3.2. Note that the OSM data is not completely accurate [24], but based on empirical observations, using it provides more robust results.

There are a number of methods in graph theory for measuring graph similarity. However, these methods compare the logical topology of graphs by looking for common nodes. In the case of road networks, the physical topology is extremely important and the graph comparison problem becomes non-trivial. In such cases, corresponding nodes in the two graphs may not spatially coincide due to image offsets and errors in the segmentation masks, making the pre-existing methods of graph comparison unfeasible.

In order to compare the topological graphs, each edge of the graphs G_a and G_b is sliced into smaller sub-segments of length l to obtain simplified graphs G'_a and G'_b. Corresponding sub-segments in the two graphs can be found using Eq. 3 [15], where two sub-segments are assumed to be corresponding if both vertices of one sub-segment are within a certain distance of the other sub-segment. A visual representation of this can be seen in Figure 3.

$$ \forall e_a, e_b : e_a \in G'_a,\ e_b \in G'_b, \quad e_a = \{v_{a1}, v_{a2}\},\ e_b = \{v_{b1}, v_{b2}\} $$
$$ e_a = e_b \iff |a_1 - b_1| < l/2 \ \text{and}\ |a_2 - b_2| < l/2 \qquad (3) $$

In Eq. 3, e_a and e_b are the sub-segments in graphs G'_a and G'_b and are defined in terms of their two vertices, v_{a1} and v_{a2}, and v_{b1} and v_{b2}, respectively. The euclidean distance between two vertices is given by |a_1 - b_1|, where a_1 and b_1 represent the coordinates of the vertices v_{a1} and v_{b1}, respectively.
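
A direct sketch of this correspondence test is shown below (an illustrative implementation, assuming each sub-segment is stored as a pair of coordinate tuples and ignoring reversed vertex orderings for brevity).

```python
import numpy as np

def match_subsegments(edges_a, edges_b, l):
    """Sketch of Eq. (3): a sub-segment e_a = (a1, a2) matches e_b = (b1, b2)
    when both corresponding vertices are closer than l/2.
    Edges are given as ((x1, y1), (x2, y2)) coordinate pairs."""
    matches = []
    for ea in edges_a:
        for eb in edges_b:
            a1, a2 = np.array(ea[0]), np.array(ea[1])
            b1, b2 = np.array(eb[0]), np.array(eb[1])
            if np.linalg.norm(a1 - b1) < l / 2 and np.linalg.norm(a2 - b2) < l / 2:
                matches.append((ea, eb))
                break
    return matches
```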

Figure 3: Graph comparison. Top: G'_a in blue and G'_b in red, with dashed circles of radius l/2 drawn around the nodes of G'_a. Bottom: Common sub-segments shown in green, non-corresponding sub-segments from G'_a shown in purple and those from G'_b shown in yellow.


Figure 4: Dataset extent. Training extent in blue and testing extent in yellow. The split was chosen such that most of the damaged area was part of the testing extent.

4. Experimental Setup

4.1. Neural Network Structures

The structures of the encoders and decoders are summarised in Table 1 and Table 2, respectively. In these tables, convx-y implies a convolutional layer with a kernel size of x and y filters, with nc in the final layer meaning the number of output classes. Similarly, convTranx-y is a transposed convolution layer with a kernel size of x and y filters. The individual decoder block structure for the different models is given in Table 3. All convolutional layers use the ReLU [26] activation and the encoders include pooling and batch-norm layers as proposed by the original authors. Note that UNet-style architectures concatenate the encoder feature map and the decoder feature map, while LinkNet architectures add the feature maps instead of concatenating them to make the network more efficient.
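
The three decoder block types of Table 3 could be sketched in PyTorch as follows. This is an illustrative reading of the table: the input channel argument and the placement of activations are assumptions, since only the conv/convTran layers are specified in the paper.

```python
import torch.nn as nn

def dec_unet(in_ch, a, b):
    """conv3-a followed by a stride-2 transposed convolution convTran4-b
    that doubles the spatial resolution (UNet decoder block of Table 3)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, a, 3, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(a, b, 4, stride=2, padding=1), nn.ReLU(inplace=True))

def dec_unet_up(in_ch, a, b):
    """UNetUp variant: nearest-neighbour upsampling followed by two 3x3 convolutions,
    avoiding the checkerboard artifacts of transposed convolutions."""
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),
        nn.Conv2d(in_ch, a, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(a, b, 3, padding=1), nn.ReLU(inplace=True))

def dec_link(in_ch, a, b):
    """LinkNet-style block: bottleneck to a/4 channels, upsample with convTran4-a/4,
    then project to b channels."""
    return nn.Sequential(
        nn.Conv2d(in_ch, a // 4, 3, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(a // 4, a // 4, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(a // 4, b, 3, padding=1), nn.ReLU(inplace=True))
```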

4.2. Datasets

DigitalGlobe's Open Data Program¹ provides high resolution satellite imagery in the wake of natural disasters to enable a timely response. This study uses the data from Palu, Indonesia, which was struck by an earthquake and tsunami on 28 September 2018 and had visible damages to its coastlines and infrastructure. The pre-disaster imagery was from 7th April 2018 and the post-disaster imagery was from 1st October 2018. The imagery had a ground sampling distance of approximately 50 cm per pixel.

¹ https://www.digitalglobe.com/ecosystem/open-data

An area of 45 km² around Palu city was extracted for the experiments, with 14 km² of the area with visible damage being set aside for testing. The remainder of the imagery was used for training and validation. The dataset split is visualised in Fig. 4, where the area in yellow was used for testing.

The labels for training the segmentation networks were downloaded from OSM². All polylines marked as motorways, primary, secondary, tertiary, residential, service, trunk and their links were extracted as roads. The roads and buildings in OSM were provided as vectors and polygons, respectively. They were converted to a raster format to create a dataset suitable for training. All the lat-long coordinates were converted to pixel coordinates. The roads were rasterised with a buffer of 2 m and the building polygons were rasterised as filled polygons. For the binary segmentation tasks, separate road and building mask images were generated, where the target classes were labelled as 1. For the multiclass segmentation experiments, the buildings and roads were labelled as 1 and 2, respectively. The background pixels were always marked with 0. The test datasets were annotated manually.
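
A possible rasterisation of these labels is sketched below with shapely and rasterio (an assumption about tooling, not the authors' pipeline; the buffer distance of 2 m only holds if the geometries have already been reprojected to a metric CRS matching the raster transform).

```python
from rasterio import features
from shapely.geometry import LineString, Polygon

def rasterise_osm_labels(roads, buildings, out_shape, transform, road_buffer_m=2.0):
    """Sketch of the label generation in Sec. 4.2: buffer OSM road polylines by ~2 m
    and burn them with value 2, fill building polygons with value 1, background 0."""
    shapes = [(Polygon(b), 1) for b in buildings]                        # filled buildings
    shapes += [(LineString(r).buffer(road_buffer_m), 2) for r in roads]  # buffered roads
    return features.rasterize(shapes, out_shape=out_shape, transform=transform, fill=0)
```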

Note that only pre-disaster data was used for training the neural networks and the segmentation based results. The post-disaster imagery was used purely for inference and for obtaining the post-disaster mapping results.

4.3. Metrics

The Jaccard Index or Intersection over Union (IoU) is a typical per-pixel metric for evaluating segmentation results. It is given by Eq. 4 and measures the overlap of predicted labels with the true labels. For the binary segmentation cases, the IoU for the target class is reported, and for the multi-class case the mean IoU (mIoU) over the target classes is also reported. The IoU for the background class is not included since the high number of background pixels would bias the results.

$$ IoU = \frac{TP}{TP + FP + FN} \qquad (4) $$
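
For a label mask, Eq. 4 amounts to counting overlapping pixels per class, as in this short sketch (a generic implementation; the small epsilon is added only to avoid division by zero):

```python
import numpy as np

def class_iou(pred, target, cls):
    """Sketch of Eq. (4): per-class intersection over union of predicted and
    ground-truth label masks; call once per target class to exclude background."""
    p, t = (pred == cls), (target == cls)
    tp = np.logical_and(p, t).sum()
    fp = np.logical_and(p, ~t).sum()
    fn = np.logical_and(~p, t).sum()
    return tp / (tp + fp + fn + 1e-9)
```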

The IoU metric measures the segmentation performance but is not the most suitable metric for graphs because a small gap in the segmentation mask may only cause a small error in the IoU metric but can lead to large detours if the resulting road network is used for navigation. As outlined in Section 3.4, comparing two topological graphs is a non-trivial task and graph connectivity is as important as graph completeness. Herein, two metrics are used: the first to evaluate the completeness of the graph and the second to evaluate the connectivity of the generated graph.

The first metric measures the similarity of the sub-segments described in Section 3.4 using the precision-recall metrics as follows:

² https://www.openstreetmap.org


Table 1: Encoder Structures

Block | VGG11                | VGG16                | ResNet18                  | ResNet34
enc1  | conv3-64             | conv3-64, conv3-64   | conv7-64                  | conv7-64
enc2  | conv3-128            | conv3-128, conv3-128 | [conv3-64, conv3-64] x2   | [conv3-64, conv3-64] x3
enc3  | conv3-256, conv3-256 | conv3-256 x3         | [conv3-128, conv3-128] x2 | [conv3-128, conv3-128] x4
enc4  | conv3-512, conv3-512 | conv3-512 x3         | [conv3-256, conv3-256] x2 | [conv3-256, conv3-256] x6
enc5  | conv3-512, conv3-512 | conv3-512 x3         | [conv3-512, conv3-512] x2 | [conv3-512, conv3-512] x3

Table 2: Decoder Structures

Block  | UNet               | UNetUp                | LinkNet
center | dec_unet(512, 256) | dec_unet_up(512, 256) | dec_link(512, 256)
dec5   | dec_unet(512, 256) | dec_unet_up(512, 256) | dec_link(256, 128)
dec4   | dec_unet(256, 128) | dec_unet_up(256, 128) | dec_link(128, 64)
dec3   | dec_unet(128, 64)  | dec_unet_up(128, 64)  | dec_link(64, 64)
dec2   | dec_unet(64, 32)   | dec_unet_up(64, 32)   | convTran3-32
dec1   | conv3-32           | conv3-32              | conv3-32
final  | conv3-nc           | conv3-nc              | conv3-nc

Table 3: Decoder Block Structure

Block             | Layers
dec_unet(a, b)    | conv3-a, convTran4-b
dec_unet_up(a, b) | upsample, conv3-a, conv3-b
dec_link(a, b)    | conv3-a/4, convTran4-a/4, conv3-b

$$ \mathrm{precision} = \frac{TP}{TP + FP}, \qquad \mathrm{recall} = \frac{TP}{TP + FN}, \qquad F_{score} = 2 \times \frac{p \times r}{p + r} \qquad (5) $$
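
Applied to road sub-segments, the counting behind Eq. 5 looks roughly like the following sketch (an interpretation: matched sub-segments are true positives, unmatched predicted ones false positives, unmatched ground-truth ones false negatives).

```python
def subsegment_prf(matched, total_pred, total_true):
    """Sketch of Eq. (5) on sub-segments: precision, recall and F score from the
    number of matched sub-segments and the totals in each graph."""
    tp = matched
    fp = total_pred - matched
    fn = total_true - matched
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    f_score = 2 * precision * recall / (precision + recall + 1e-9)
    return precision, recall, f_score
```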

The metric proposed in [41] has been reported for evaluating graph connectivity. This metric measures the similarity of graphs by comparing the shortest path length for a large set of random source-destination pairs between the actual graph and the predicted graph. If the extracted paths have a similar length, they can be assumed to be a match and are marked as 'Correct' in the results. If the path length in the predicted graph is smaller than in the actual graph, the generated graph has incorrect connections and this is reported as 'Too Short'. Conversely, if there are incorrect gaps in the predicted graph, the paths are either 'Too Long' or there are no possible paths, giving 'No Connections'.
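
A minimal networkx sketch of this connectivity metric is given below. It is an approximation of the procedure in [41]: the tolerance used to call two path lengths "similar" and the assumption that both graphs share node identifiers (e.g. after snapping predicted nodes to the reference graph) are not specified in this paper.

```python
import random
import networkx as nx

def connectivity_report(g_true, g_pred, pairs=1000, tol=0.05):
    """Sample random source-destination pairs, compare shortest-path lengths
    (edge attribute 'length' assumed) in the ground-truth and predicted graphs,
    and bin each pair as Correct / Too Short / Too Long / No Connection."""
    counts = {"Correct": 0, "Too Short": 0, "Too Long": 0, "No Connection": 0}
    nodes = list(g_true.nodes)
    for _ in range(pairs):
        s, t = random.sample(nodes, 2)
        try:
            d_true = nx.shortest_path_length(g_true, s, t, weight="length")
            d_pred = nx.shortest_path_length(g_pred, s, t, weight="length")
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            counts["No Connection"] += 1
            continue
        if abs(d_pred - d_true) <= tol * d_true:
            counts["Correct"] += 1
        elif d_pred < d_true:
            counts["Too Short"] += 1
        else:
            counts["Too Long"] += 1
    return counts
```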

4.4. Training Details

The models were trained using the Adam optimiser [21] with a learning rate of 10⁻⁴. A minibatch size of 5 images was used for all the UNet-based models and 32 images for the other models. The models were built in PyTorch [30]. The VGG11, VGG16, ResNet18 and ResNet34 models provided by the PyTorch model zoo were used for initialising the encoder networks in the pretrained networks. He initialisation [17] was used for all the other layers.

The training images and their corresponding masks were cropped to 416×416 pixels and were augmented with horizontal and vertical flipping. All images were zero-mean normalised. Only the pre-disaster images were used for training and the post-disaster images were used for inference.

All models were trained for 600 epochs to enable a fair comparison between the different models. A validation set was used for preventing overfitting; the final model used for measuring the performance was the one from just before the validation loss started diverging (i.e. overfitting on the training set).
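
These training details can be summarised in a skeleton loop such as the one below (a sketch under stated assumptions: the dataset is assumed to yield pre-cropped, flip-augmented, zero-mean-normalised image/mask tensors, and the checkpoint selection shown here is a simple best-validation-loss rule).

```python
import torch
from torch.utils.data import DataLoader

def evaluate(model, loader, loss_fn, device):
    """Mean validation loss, used to pick the checkpoint just before overfitting."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for images, masks in loader:
            total += loss_fn(model(images.to(device)), masks.to(device)).item()
            n += 1
    return total / max(n, 1)

def train(model, train_set, val_set, loss_fn, epochs=600, lr=1e-4, batch_size=5):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)        # Adam, lr = 1e-4
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    best_val = float("inf")
    for epoch in range(epochs):                                    # 600 epochs
        model.train()
        for images, masks in train_loader:                         # 416x416 crops
            images, masks = images.to(device), masks.to(device)
            optimiser.zero_grad()
            loss = loss_fn(model(images), masks)
            loss.backward()
            optimiser.step()
        val_loss = evaluate(model, val_loader, loss_fn, device)
        if val_loss < best_val:                                    # keep the model from just
            best_val = val_loss                                    # before the loss diverges
            torch.save(model.state_dict(), "best_model.pt")
```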

5. Results

5.1. Segmentation Results

Our experiments tested the following scenarios:


Figure 5: Buildings Validation Loss

• Effect of pretraining with ImageNet on aerial image segmentation models.

• Efficiency vs accuracy trade-off between different popular architectures.

5.1.1. Effect of pretraining on aerial image segmentation

The first set of experiments was conducted to analyse the effect of pretraining with ImageNet on the segmentation task. The purpose of these experiments was two-fold: firstly, whether pretraining on a large classification dataset such as ImageNet improves the accuracy of an unrelated task where the image statistics are quite different (ground-based object images for classification vs aerial images for segmentation); secondly, assuming accuracy with pretraining is similar to or even better than that without, whether pretraining improves the convergence speed.

Four different network architectures were tested: UNet with a VGG11 or VGG16 backend and LinkNet with a ResNet18 or ResNet34 backend. These models were trained for segmentation of buildings and roads, and the validation loss curves are shown in Fig. 5 and Fig. 6, respectively. From the loss curves it can be seen that pretrained networks generally converged quicker than their non-pretrained equivalents, requiring approximately 10 fewer epochs regardless of the model used and the target class. The training curves for the VGG-based UNet models for the road segmentation task also show that these models did not converge very well when training from scratch, though their validation loss, which was used for preventing overfitting, was lower than that of their pretrained equivalents.

The results of these models on the test set, summarised in Table 4, show that in general the pretrained models had a higher IoU, by a couple of points, compared to their non-pretrained equivalents, and this corresponds to the previous results in the literature [2]. Note that this section focuses on identifying the differences between training models from scratch and using pretrained encoders. The results across different models are compared in the next section.

Figure 6: Roads Validation Loss

Table 4: Effects of pretraining. All results given as IoU.

Model              | Pretrained | Roads | Buildings
UNet (VGG11)       | No         | 37.73 | 57.47
UNet (VGG11)       | Yes        | 39.36 | 57.58
UNet (VGG16)       | No         | 39.2  | 57.72
UNet (VGG16)       | Yes        | 40.07 | 59.72
LinkNet (ResNet18) | No         | 32.3  | 51.29
LinkNet (ResNet18) | Yes        | 35.08 | 57.08
LinkNet (ResNet34) | No         | 35.42 | 54.82
LinkNet (ResNet34) | Yes        | 37.2  | 57.15


5.1.2. Model capacity, design and accuracy

Visualisation of binary segmentation results for different models can be seen in Figure 7 and Figure 8, and their quantitative performance is reported in Table 5. Sizes of these different models are reported in Table 6. From the results it can be seen that the proposed UNetUp with a VGG16 encoder outperformed all other models by a couple of points on each task.

The building segmentation masks in Figure 7 show that the ENet-based models led to fairly blob-like outputs without clear boundaries. The LinkNet models gave more distinct boundaries, but the clearest results were with the UNet and UNetUp models. The road segmentation masks did not appear as distinctively different in terms of visual comparison, though the ENet-based models seemed to miss the most segments in this case.

It is interesting to note that the model performance was not directly correlated to model size. For instance, in the binary segmentation task in Table 5 it can be seen that ENetSeparable outperformed ENet even though it had 30% fewer parameters. The number of parameters in these models was smaller by two orders of magnitude compared to all the other models tested, but for the binary segmentation task these models were close to the top performing UNetUp (VGG16) model. However, the tradeoff between model size and capacity became obvious in the multiclass segmentation task, where the smaller models did not converge.

From the results it can be seen that the VGG-based encoders outperformed the ResNet-based encoders for all tasks. For instance, it can be seen from Table 5 that for the road segmentation task using the UNet models, the VGG11 and VGG16 encoders were consistently better than the ResNet18 and ResNet34 encoders. This performance difference can also be seen across the building segmentation and the multi-class segmentation tasks.

The major difference between the UNet models and the LinkNet models is the way the skip link features are treated. In the former, the skip link features are concatenated with the corresponding decoder features, whereas in the latter they are added to the decoder features to make the process more efficient. From the results it can be observed that the feature concatenation in the UNet models allowed the network to learn more discriminative features, as these models always outperformed their LinkNet equivalents even when the encoder was the same.
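
The two skip-connection styles differ by a single operation, as the generic sketch below shows: concatenation grows the channel dimension (so the following convolution has more parameters and capacity), while addition keeps it fixed and is cheaper.

```python
import torch

def unet_skip(decoder_feat, encoder_feat):
    """UNet-style skip: concatenate encoder and decoder features along channels,
    so the next convolution sees both feature maps."""
    return torch.cat([decoder_feat, encoder_feat], dim=1)

def linknet_skip(decoder_feat, encoder_feat):
    """LinkNet-style skip: element-wise addition keeps the channel count unchanged
    and is more efficient, at some cost in discriminative power."""
    return decoder_feat + encoder_feat
```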

Finally, the proposed UNetUp with a VGG16 encoder outperformed all other models on the segmentation tasks. It could also be seen that the UNetUp models outperformed equivalent UNet models when controlled for the encoder, even though they had fewer parameters, since they used upsampling instead of transposed convolution layers.

Table 5 also shows that all the models had better performance for the binary segmentation task as compared to multi-class segmentation for the same classes. This seems to imply that more training data does not necessarily improve the performance if the task is more complex. An example of the results of the UNetUp (VGG16) model on the multi-class segmentation problem is seen in Figure 9. The sample image is the same as the one shown for the building segmentation case in Figure 7, and by comparing the two it can be seen that the results in the multi-class case are less distinctive and more blob-like.

5.2. Quantitative Disaster Mapping Results

The precision-recall results of the obtained road networks are given in Table 7. The road networks created from the segmentation mask of the post-disaster image are denoted as Post. The results of the proposed method, where the OSM road network was updated by removing all destroyed road segments, are given as Diff.

The Post results convey the generalisation capability of the tested networks across image datasets from different times, since they were trained on pre-disaster imagery while the evaluation was over the post-disaster imagery. In contrast to the pre-disaster results, the best performing model for precision-recall was UNet with a ResNet backend.

It can be seen that in the case of Post, the precision was usually much higher than the recall, implying that the segmentation network has a higher number of false negatives than false positives. This was due to the fact that there were gaps in the segmentation mask caused by occlusions from shadows, buildings, etc. LinkNet with a ResNet34 backend gave the highest recall in this case.

As Table 7 shows, the proposed Diff framework helped improve the generated road graph regardless of the base network used. The difference in results between the various architectures also became less pronounced, as can be seen in Table 7, where the difference between the maximum and minimum F score in the case of Post was approximately 8%, whereas that for Diff was 2%. This was largely due to the fact that the proposed method benefited from prior knowledge from OSM. Note that the OSM data is not completely accurate [24]. However, based on empirical observations, using this data provides significantly better results than assuming no prior knowledge.

The connectivity results of the estimated post-disaster road networks are reported in Table 8. Similar to the precision-recall results, it can be seen that the proposed framework improved the results by a large margin. This was due to the fact that the output of the segmentation networks often had gaps, which caused missing connections in the generated road networks. The use of the OSM network, which is properly connected, as an initial estimate helped deal with these missing connections. This conjecture is supported by the number of pairs that are marked as having 'No Connections' in Table 8, where using the Diff framework reduced the number of 'No Connection' pairs to half of those from Post. The Post results had a number of small disconnected segments and some spurious paths caused by a non-ideal segmentation mask. The Diff results, on the other hand, were much better connected. However, Diff did have some incorrect segments where the mask difference missed segments.

5.3. Qualitative Disaster Impact Results

As outlined in Section 3.2, the difference between the segmentation masks from pre-disaster and post-disaster imagery can be used for disaster impact assessment. This process is


Figure 7: Visualisation of the building segmentation results using pretrained encoders. Panels: (a) Image, (b) Ground Truth, (c) ENet, (d) ENetSeparable, (e) LinkNet (ResNet18), (f) LinkNet (ResNet34), (g) UNet (VGG11), (h) UNet (VGG16), (i) UNet (ResNet18), (j) UNet (ResNet34), (k) UNetUp (VGG11), (l) UNetUp (VGG16), (m) UNetUp (ResNet18), (n) UNetUp (ResNet34).


Figure 8: Visualisation of the road segmentation results using pretrained encoders. Panels (a)-(n) as in Figure 7: input image, ground truth and the outputs of ENet, ENetSeparable, LinkNet (ResNet18/34), UNet (VGG11/16, ResNet18/34) and UNetUp (VGG11/16, ResNet18/34).


Table 5: Segmentation Results (IoU) using pretrained models

Model              | Binary: Roads | Binary: Buildings | Multiclass: Roads | Multiclass: Buildings | Multiclass: Average
UNet (VGG11)       | 39.36 | 57.58 | 35.91 | 51.92 | 43.92
UNet (VGG16)       | 40.07 | 59.72 | 35.12 | 52.23 | 43.68
UNet (ResNet18)    | 36.43 | 57.90 | 30.79 | 50.71 | 40.75
UNet (ResNet34)    | 37.56 | 58.29 | 34.18 | 51.08 | 42.63
LinkNet (ResNet18) | 35.08 | 57.08 | 29.25 | 50.59 | 39.92
LinkNet (ResNet34) | 37.2  | 57.15 | 32.70 | 50.09 | 41.40
ENet               | 36.34 | 59.44 | -     | -     | -
ENetSeparable      | 37.44 | 59.58 | -     | -     | -
UNetUp (VGG11)     | 39.16 | 58.72 | 33.86 | 50.66 | 42.26
UNetUp (VGG16)     | 41.13 | 60.04 | 36.12 | 53.86 | 44.99
UNetUp (ResNet18)  | 37.48 | 58.08 | 33.64 | 52.62 | 41.04
UNetUp (ResNet34)  | 38.97 | 58.39 | 34.69 | 52.94 | 41.91

Figure 9: Multi-class segmentation output. Panels: (a) Image, (b) Ground Truth, (c) UNetUp (VGG16).


Table 6: Model Size

Model              | Parameters | Size
ENetSeparable      | 226596     | 1.1 MB
ENet               | 349068     | 1.6 MB
LinkNet (ResNet18) | 11686561   | 46.8 MB
UNetUp (ResNet18)  | 20290377   | 81.2 MB
LinkNet (ResNet34) | 21794721   | 87.3 MB
UNet (ResNet18)    | 22383433   | 89.6 MB
UNetUp (VGG11)     | 22927393   | 91.7 MB
UNet (VGG11)       | 25364513   | 101.5 MB
UNetUp (VGG16)     | 29306465   | 117.2 MB
UNetUp (ResNet34)  | 30398537   | 121.7 MB
UNet (VGG16)       | 32202337   | 128.8 MB
UNet (ResNet34)    | 32491593   | 130.1 MB

Figure 10: Disaster impact assessment. Left: Road and building masks from satellite imagery using the segmentation network, with buildings in yellow and roads in blue. Top right: Estimated difference between the infrastructure before and after the disaster, given in red. Bottom right: Change heatmap overlaid onto an image of the test region.

shown in Figure 10, where the difference in the buildings and roads caused by the disaster is marked in red in the image on the top right.

The area under consideration was divided into a grid of cells of a fixed size and the number of changed pixels per grid cell was used as an overall estimate of the damage caused to a particular area. This has been plotted as a heatmap in Figure 10. The heatmap shows that the major destruction was along the coast and an area in the south-west of Palu city. This finding corresponds to the European Commission's (EC) Copernicus Emergency Mapping Services report [13] for the area. A major portion of the coast was washed away due to the tsunami and the south-west region of the city was washed away due to soil liquefaction.

6. Conclusions

This work provides a comparison among different segmentation models and presents a framework for the identification of damaged areas and accessible roads in post-disaster scenarios using satellite imagery. The framework leverages pre-existing knowledge from OSM to obtain a robust estimate of the affected road network.

The performances of various models for the tasks of binary and multi-class semantic segmentation in aerial images have been analysed and compared. The results show that using encoders pretrained on ImageNet improved the training time by around 10 epochs and the accuracy by a couple of percentage points, despite the domain gap that existed between ImageNet and aerial images.

On comparing the effects of using different encoders for the task of semantic segmentation, it could be seen that VGG16 outperformed all other feature extraction modules. The trade-off between accuracy and efficiency has been studied. An extremely efficient neural network, termed ENetSeparable, was proposed. It has 30% fewer parameters than ENet and still performed better on the binary segmentation task.

For post-disaster scenarios, areas affected by the disaster were identified using the difference in the predicted segmentation masks. The evaluated road changes were used to update the road networks available from OSM. There was a significant difference in the results of the various segmentation networks, where the F score varied by as much as 8%. The use of the proposed framework alleviated the differences and brought the difference in F score down to 2%. The highest F score achieved with the use of the proposed framework was 94.76, as compared to the highest F score of 73.98 from the segmentation networks.

The proposed framework uses OSM data for training and does not require time-consuming manually annotated post-disaster data. Finally, a qualitative assessment of the aftermath damage can be generated easily, as shown for the Palu tsunami, where the results were validated against the European Commission report.

This work can be further improved in a number of ways. Namely, the results of the different models could be ensembled to help improve the road connectivity results. Classification of damages could be used to identify whether the infrastructure has been completely destroyed, as in the case of soil liquefaction, or if a road blockage is something that can be dealt with relatively easily, such as one caused by a fallen tree.

ACKNOWLEDGMENT

The authors would like to thank Dr Andrew West, Dr Thomas Wright and Ms Elisabeth Welburn for their valuable comments and feedback. A. Gupta is funded by a Scholarship from the Department of Electrical and Electronic Engineering, The University of Manchester, and the ACM SIGHPC/Intel Computational and Data Science Fellowship.


Table 7: Precision-Recall of Sub-segments

Model              | Precision Post | Precision Diff | Recall Post | Recall Diff | F score Post | F score Diff
ENet               | 70.36 | 92.49 | 65.97 | 93.13 | 68.09 | 92.81
ENetSeparable      | 61.20 | 92.65 | 67.12 | 95.5  | 64.02 | 94.05
LinkNet (ResNet18) | 77.02 | 93.8  | 70.36 | 95.29 | 73.54 | 94.54
LinkNet (ResNet34) | 71.8  | 95.18 | 76.29 | 94.34 | 73.98 | 94.76
UNet (VGG11)       | 73.66 | 94.8  | 67.87 | 92.8  | 70.65 | 93.8
UNet (VGG16)       | 75.73 | 94.99 | 72.3  | 94.02 | 73.98 | 94.5
UNet (ResNet18)    | 81.71 | 93.64 | 63.86 | 92.27 | 71.69 | 92.95
UNet (ResNet34)    | 78.32 | 94.09 | 74.57 | 95.41 | 76.40 | 94.75
UNetUp (VGG11)     | 71.30 | 94.4  | 67.31 | 93.93 | 69.25 | 94.16
UNetUp (VGG16)     | 67.17 | 95.1  | 70.93 | 93.68 | 69.0  | 94.38
UNetUp (ResNet18)  | 78.17 | 94.32 | 67.42 | 95.12 | 72.40 | 94.72
UNetUp (ResNet34)  | 76.4  | 93.76 | 67.14 | 93.67 | 71.47 | 93.71

Table 8: Connectivity results. All numbers given as percentages.

Model              | Correct Post | Correct Diff | Too Long Post | Too Long Diff | Too Short Post | Too Short Diff | No Connection Post | No Connection Diff
ENet               | 21.82 | 54.02 | 19.15 | 14.81 | 2.29 | 1.26 | 56.66 | 29.83
ENetSeparable      | 20.29 | 62.22 | 18.43 | 14.94 | 2.18 | 1.35 | 59.04 | 21.39
LinkNet (ResNet18) | 18.33 | 70.36 | 23.65 | 6.93  | 2.35 | 1.35 | 55.59 | 21.25
LinkNet (ResNet34) | 31.05 | 74.59 | 14.97 | 5.2   | 3.97 | 1.41 | 49.96 | 18.71
UNet (VGG11)       | 25.15 | 65.56 | 25.58 | 9.99  | 5.28 | 1.02 | 43.94 | 23.33
UNet (VGG16)       | 26.8  | 67.68 | 31.64 | 8.09  | 5.42 | 0.97 | 36.4  | 23.16
UNet (ResNet18)    | 21.03 | 58.62 | 20.15 | 12.57 | 4.72 | 1.07 | 54    | 27.65
UNet (ResNet34)    | 37.04 | 75.58 | 17.13 | 4.41  | 4.68 | 1.43 | 41.1  | 18.49
UNetUp (VGG11)     | 19.58 | 61.03 | 14.22 | 13.58 | 1.94 | 0.99 | 64.23 | 24.3
UNetUp (VGG16)     | 24.3  | 66.18 | 23.5  | 8.98  | 4.71 | 1.09 | 47.43 | 23.66
UNetUp (ResNet18)  | 22.44 | 67.35 | 21.64 | 8.46  | 2.66 | 1.14 | 53.23 | 23.05
UNetUp (ResNet34)  | 18.71 | 66.01 | 23.9  | 8.89  | 2.24 | 1.57 | 55.1  | 23.44


References

[1] Adriano, B., Xia, J., Baier, G., Yokoya, N., Koshimura, S.: Multi-Source Data Fusion Based on Ensemble Learning for Rapid Building Damage Mapping during the 2018 Sulawesi Earthquake and Tsunami in Palu, Indonesia. Remote Sensing 11(7), 886 (2019). https://doi.org/10.3390/rs11070886

[2] Audebert, N., Le Saux, B., Lefevre, S.: Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks (2016). http://arxiv.org/abs/1609.06846

[3] Bastani, F., He, S., Abbar, S., Alizadeh, M., Balakrishnan, H., Chawla, S., Madden, S., DeWitt, D.: RoadTracer: Automatic Extraction of Road Networks from Aerial Images. In: Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00496

[4] Bhattacharjee, S., Roy, S., Das Bit, S.: Post-disaster map builder: Crowdsensed digital pedestrian map construction of the disaster affected areas through smartphone based DTN. Computer Communications 134, 96-113 (2019). https://doi.org/10.1016/j.comcom.2018.11.010

[5] Boccardo, P., Giulio Tonolo, F.: Remote Sensing Role in Emergency Mapping for Disaster Response. In: Engineering Geology for Society and Territory - Volume 5, pp. 17-24. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-09048-1_3

[6] Chaurasia, A., Culurciello, E.: LinkNet: Exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE Visual Communications and Image Processing (VCIP 2017), pp. 1-4 (2018). https://doi.org/10.1109/VCIP.2017.8305148

[7] Chen, L., Fan, X., Wang, L., Zhang, D., Yu, Z., Li, J., Nguyen, T.M.T., Pan, G., Wang, C.: RADAR. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1(4), 1-23 (2018). https://doi.org/10.1145/3161159

[8] Chollet, F.: Xception: Deep Learning with Depthwise Separable Convolutions (2016). http://arxiv.org/abs/1610.02357

[9] Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., Raska, R.: DeepGlobe 2018: A challenge to parse the earth through satellite images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 172-181 (2018). https://doi.org/10.1109/CVPRW.2018.00031

[10] DigitalGlobe: Open Data Initiative. https://www.digitalglobe.com/ecosystem/open-data

[11] Doshi, J., Basu, S., Pang, G.: From Satellite Imagery to Disaster Insights (2018). http://arxiv.org/abs/1812.07033

[12] Douglas, D.H., Peucker, T.K.: Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10(2), 112-122 (1973). https://doi.org/10.3138/FM57-6770-U75U-7727

[13] European Commission Joint Research Centre: Mw 7.5 Earthquake in Indonesia, 28 September 2018: Emergency Report. Tech. rep. (2018). http://inatews.bmkg.go.id/new/tsunami15.php

[14] Fujita, A., Sakurada, K., Imaizumi, T., Ito, R., Hikosaka, S., Nakamura, R.: Damage detection from aerial images via convolutional neural networks. In: IAPR International Conference on Machine Vision Applications, pp. 5-8. IEEE (2017). https://doi.org/10.23919/MVA.2017.7986759

[15] Gupta, A., Welburn, E., Watson, S., Yin, H.: CNN-Based Semantic Change Detection in Satellite Imagery. In: International Conference on Artificial Neural Networks, pp. 669-684 (2019). https://doi.org/10.1007/978-3-030-30493-5_61

[16] He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition (2015). http://arxiv.org/pdf/1512.03385v1.pdf

[17] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026-1034 (2015). https://doi.org/10.1109/ICCV.2015.123

[18] Iglovikov, V., Shvets, A.: TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation (2018). http://arxiv.org/abs/1801.05746

[19] Jin, J., Dundar, A., Culurciello, E.: Flattened Convolutional Neural Networks for Feedforward Acceleration (2014). http://arxiv.org/abs/1412.5474

[20] Kaiser, P., Wegner, J.D., Lucchi, A., Jaggi, M., Hofmann, T., Schindler, K.: Learning Aerial Image Segmentation from Online Maps. IEEE Transactions on Geoscience and Remote Sensing (2017). https://doi.org/10.1109/TGRS.2017.2719738

[21] Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. In: International Conference on Learning Representations (2015). http://arxiv.org/abs/1412.6980

[22] Li, S., Lyons, J., Voigt, S., Muthike, D.M., Pedersen, W., Giulio-Tonolo, F., Schneiderhan, T., Guha-Sapir, D., Kaku, K., Proy, C., Bequignon, J., Czaran, L., Platzeck, G., Kucera, J., Hazarika, M.K., James, G.K., Jones, B.: Global trends in satellite-based emergency mapping. Science 353(6296), 247-252 (2016). https://doi.org/10.1126/science.aad8728

[23] Liu, Z., Zhang, J., Li, X.: An automatic method for road centerline extraction from post-earthquake aerial images (2019). https://doi.org/10.1016/j.geog.2018.11.008

[24] Mattyus, G., Luo, W., Urtasun, R.: DeepRoadMapper: Extracting Road Topology from Aerial Images. In: IEEE International Conference on Computer Vision, pp. 3458-3466. IEEE (2017). https://doi.org/10.1109/ICCV.2017.372

[25] Miller, G.: The Huge Unseen Operation Behind the Accuracy of Google Maps (2014). https://www.wired.com/2014/12/google-maps-ground-truth

[26] Nair, V., Hinton, G.E.: Rectified Linear Units Improve Restricted Boltzmann Machines. In: 27th International Conference on Machine Learning (2010)

[27] Odena, A., Dumoulin, V., Olah, C.: Deconvolution and Checkerboard Artifacts. Distill 1(10) (2016). https://doi.org/10.23915/distill.00003

[28] OpenStreetMap Contributors: Planet dump retrieved from https://planet.osm.org (2017)

[29] Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation (2016). http://arxiv.org/abs/1606.02147

[30] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)

[31] Pathier, E., Fielding, E.J., Wright, T.J., Walker, R., Parsons, B.E., Hensley, S.: Displacement field and slip distribution of the 2005 Kashmir earthquake from SAR imagery. Geophysical Research Letters 33(20), L20310 (2006). https://doi.org/10.1029/2006GL027193

[32] Poiani, T.H., Rocha, R.d.S., Degrossi, L.C., Albuquerque, J.P.d.: Potential of collaborative mapping for disaster relief: A case study of OpenStreetMap in the Nepal earthquake 2015. In: Proceedings of the Annual Hawaii International Conference on System Sciences, pp. 188-197. IEEE (2016). https://doi.org/10.1109/HICSS.2016.31

[33] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science, vol. 9351, pp. 234-241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28

[34] Rudner, T.G.J., Rußwurm, M., Fil, J., Pelich, R., Bischke, B., Kopackova, V., Bilinski, P.: Multi3Net: Segmenting Flooded Buildings via Fusion of Multiresolution, Multisensor and Multitemporal Satellite Imagery. In: Thirty-Third AAAI Conference on Artificial Intelligence (2019). http://arxiv.org/abs/1812.01756

[35] Schumann, G., Hostache, R., Puech, C., Hoffmann, L., Matgen, P., Pappenberger, F., Pfister, L.: High-Resolution 3-D Flood Information From Radar Imagery for Flood Hazard Management. IEEE Transactions on Geoscience and Remote Sensing 45(6), 1715-1725 (2007). https://doi.org/10.1109/TGRS.2006.888103

[36] Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations, pp. 1-14 (2015). http://arxiv.org/abs/1409.1556

[37] Singh, S., Batra, A., Pang, G., Torresani, L., Basu, S., Paluri, M., Jawahar, C.: Self-supervised Feature Learning for Semantic Segmentation of Overhead Imagery. In: BMVC (2018)

[38] Sun, T., Chen, Z., Yang, W., Wang, Y.: Stacked U-Nets with multi-output for road extraction. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 187-191 (2018). https://doi.org/10.1109/CVPRW.2018.00033

[39] Van Etten, A.: You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery (2018). http://arxiv.org/abs/1805.09512

[40] Van Etten, A., Lindenbaum, D., Bacastow, T.M.: SpaceNet: A Remote Sensing Dataset and Challenge Series. arXiv (2018). http://arxiv.org/abs/1807.01232

[41] Wegner, J.D., Montoya-Zegarra, J.A., Schindler, K.: Road networks as collections of minimum cost paths. ISPRS Journal of Photogrammetry and Remote Sensing 108, 128-137 (2015). https://doi.org/10.1016/j.isprsjprs.2015.07.002

[42] Zhang, L., Zhang, L., Du, B.: Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art. IEEE Geoscience and Remote Sensing Magazine 4(2), 22-40 (2016). https://doi.org/10.1109/MGRS.2016.2540798

[43] Zhu, X.X., Tuia, D., Mou, L., Xia, G.S., Zhang, L., Xu, F., Fraundorfer, F.: Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geoscience and Remote Sensing Magazine 5(4), 8-36 (2017). https://doi.org/10.1109/MGRS.2017.2762307

Page 2: arXiv:2006.05575v1 [eess.IV] 10 Jun 2020(SAR) and high resolution optical images. SAR is extremely useful for dealing with low-light conditions and for areas with cloud cover. It is

Some recent work has explored the use of data from OSM [28]for training ML models in the absence of high quality man-ually labelled training data [20] OSM data can be assumedto be weakly labelled training data due to issues such as mis-registration and out-of-date labels The work in [20] showedthat a large enough training dataset helped alleviate the issuesof using a weakly labelled dataset

A framework to detect damaged roads from satellite im-agery and register them to OSM was recently proposed in [15]It was trained on publicly available OSM data forgoing the re-quirement for expensive manually annotated data It provideda method inspired by graph theory to automate the process ofupdating the OSM database by combining the changes detectedusing their framework with OSM road data This paper buildson [15] and shows how it can be extended to multiple semanticclasses A systematic analysis of the performance of differentneural networks for the task of aerial image segmentation is per-formed The effect of pretraining the neural networks on a largeimage dataset is also analysed Two variants of popular neuralnetworks are also proposed the first ENetSeparable focuseson efficiency and the second UNetUpsample on accuracy Fi-nally it is shown that the proposed framework from [15] canbe seen as being architecture agnostic since it helps reduce thedifference in segmentation performance from the various seg-mentation networks

2. Related Work

2.1. Aerial Image Segmentation

Recent successes of deep learning models in image classification and big data analysis have promoted much increased use of such models in remote sensing for tasks such as land cover classification and change detection [43]. Readers are referred to [42] and [43] for an extensive background and review on the use of deep learning for remote sensing tasks. Common tasks in this field are extraction of road networks [3, 24, 38] and building footprints [39] using semantic segmentation networks, popularised by large-scale competitions such as DeepGlobe [9] and SpaceNet [40].

Most popular semantic segmentation architectures are structured as encoder-decoder models, popularised by UNet [33]. The encoder consists of a number of blocks, where each block takes an input image or feature map and produces a set of downsampled feature maps which progressively identify higher level features. The decoder network mirrors the encoder network and progressively upsamples the output from the encoder network. Individual decoder blocks are connected to the corresponding encoder blocks with skip links to help recover the fine-grained details lost in the downsampling. The upsampling is typically done using transposed convolutions with learnable weights.

A study on using OSM data for learning aerial image segmentation showed that using a large amount of weakly labelled data for training helped achieve reasonable performance without the need for large well-labelled datasets [20]. Alternative schemes to train models for aerial image segmentation employ self-supervision [37] and supervised pretraining on ImageNet [2].

2.2. Disaster Analysis

Remote sensing is being increasingly used for disaster response management due to the increasing availability of remote sensing data, which can be acquired relatively quickly [22]. The main data types used in such cases are synthetic aperture radar (SAR) and high resolution optical images. SAR is extremely useful for dealing with low-light conditions and for areas with cloud cover. It is especially useful in finding flooded areas [35] and identifying ground displacements after earthquakes [31]. However, it cannot be used in urban areas with the same effectiveness due to radar backscattering to the sensors caused by tall objects such as buildings [5].

High resolution optical imagery is typically used for visual interpretation in the case of events such as hurricanes, cyclones and tsunamis, which leave visible damages to an area. Recent work in automating this process has focused on assessing the damage to buildings in disaster-struck areas. A combination of pre- and post-tsunami satellite images has been used to assess whether a building was washed away by a flood with the implementation of a CNN [14]. Similarly, an approach fusing multi-resolution, multi-sensor and multi-temporal imagery in a CNN was used to segment flooded buildings [34]. However, these approaches require manually labelled post-disaster data for training their networks, which is time-consuming and expensive to obtain.

Automated road extraction from satellite imagery is an area of interest since a number of location and navigation services require up-to-date road maps [25]. A number of approaches using segmentation methods have been proposed to extract road networks. These methods typically depend on heuristics-based techniques in post processing to fix incorrect gaps from the segmentation networks [24]. However, these methods, while valid in typical road extraction scenarios, are not suitable in post-disaster scenarios, because gaps in the segmentation masks could be caused by the effects of a disaster and are of extreme importance.

There is some existing research for road extraction in post-disaster scenarios. Vehicle trajectories have been used for identifying obstacles such as standing water and fallen trees [7]. Road centerline extraction from post-disaster imagery has used OSM vector data for generating seed points and creating a more accurate road map following an earthquake, which can cause registration errors [23]. This method only corrects the registration errors but does not deal with the problem of destroyed roads. A crowd-sourced pedestrian map builder has also been developed [4], but it requires people walking around in potentially destroyed areas and is not scalable.

Segmentation networks have also been used for detecting changes caused by disasters [11]. These methods use the difference between outputs of pre-disaster and post-disaster imagery to obtain a measure on areas that have been damaged the most. In contrast, the current work extends the previous work by identifying the changes to the road networks at a fine-grained level. Furthermore, the proposed framework also allows for an update to OSM to achieve more realistic road network maps for the area under consideration.


3. Methodology

The proposed disaster impact assessment is based on finding the difference in roads and buildings between satellite imagery from before and after a disaster. This is done by using a semantic segmentation network, trained on pre-disaster aerial imagery, for identifying these objects in the before and after imagery. The difference in the predicted road masks is further used to update data from OSM for finding accessible routes in the post-disaster scenario.

3.1. Segmentation Models

The models used in this study are modified versions of the UNet and LinkNet [6]. The modifications were inspired by TernausNet [18], which showed that replacing the UNet encoder with a pretrained VGG11 encoder improved segmentation results.

Here, a systematic study was carried out to compare the effectiveness of different encoder backbones. In the tested models, the encoder backbone was replaced by the convolutional layers from VGG [36] and ResNet [16] for the UNet. The original LinkNet model with ResNet18 as its encoder, and another one with a ResNet34 backend, were also tested.

A slight modification of the UNet was also studied, where the transposed convolutions in the decoders were replaced with nearest neighbour upsampling to deal with possible checkerboard artifacts [27]. This modified version has been called UNetUp in the remainder of the text.
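The difference between the two decoder variants can be illustrated with the PyTorch sketch below. The block names mirror the dec_unet / dec_unet_up notation used later in Table 3, but the channel sizes and exact layer counts here are illustrative assumptions rather than the authors' implementation.

```python
import torch.nn as nn

class DecUNet(nn.Module):
    """UNet-style decoder block: convolution followed by a learnable
    transposed convolution (can introduce checkerboard artifacts)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4,
                                     stride=2, padding=1)

    def forward(self, x):
        return self.up(self.conv(x))

class DecUNetUp(nn.Module):
    """UNetUp decoder block: nearest-neighbour upsampling followed by
    plain convolutions, avoiding the checkerboard pattern."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)
```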

Another model tested in this study is the ENet [29]. It is an encoder-decoder model optimised for efficiency in terms of latency and parameters, with an encoder inspired by ResNet and a small decoder. It uses early downsampling with a relatively low number of feature maps to reduce the number of operations required. It also decomposes n × n convolutions into smaller convolutions of n × 1 and 1 × n [19], allowing for large speedups.

Inspired by Xception-Net [8], a modified version of ENet called ENetSeparable is proposed. In this model, all convolutional filters are replaced by depthwise separable convolutions, a modification that reduces the number of parameters by 30%.
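A depthwise separable convolution of the kind substituted into ENetSeparable can be sketched as below; this is a generic module in the spirit of [8], not the authors' exact code, and the kernel size and bias settings are assumptions.

```python
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) spatial
    convolution followed by a 1x1 pointwise convolution that mixes channels."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=padding, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```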

The loss function is a weighted cross entropy loss with an additional soft Jaccard constraint and is given as follows:

L = (1-\alpha)\,\frac{1}{I}\sum_{i}^{I}\left(-w_k \log \frac{e^{o_{ik}}}{\sum_{c}^{C} e^{o_{ic}}}\right) - \alpha \sum_{c}^{C} \log \frac{\sum_{i} e^{o_{ic}} \cdot t_{ic}}{\sum_{i} e^{o_{ic}} + t_{ic} - e^{o_{ic}} \cdot t_{ic}}    (1)

where

w_k = \frac{\sum_{c=1}^{C} S_c}{C \times S_k}

Subscript k denotes the target class and i indexes over all pixels, where the total number of pixels is given by I and the total number of classes by C. The output for class c at pixel i is given by o_{ic}, and t is a one-hot encoded target vector. \alpha is the weighting for the Jaccard loss. The weight for class k is given by w_k, where S_k denotes the number of samples in the training set for target class k, and S_c the number of samples for class c.
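A possible PyTorch rendering of this loss is sketched below. It assumes softmax probabilities are used inside the soft Jaccard term and that the per-class weights have been precomputed from training-set frequencies following w_k; the function and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F

def weighted_ce_jaccard_loss(logits, target, class_weights, alpha=0.5, eps=1e-7):
    """Weighted cross entropy plus a soft log-Jaccard term, in the spirit of Eq. 1.

    logits:        (B, C, H, W) raw network outputs
    target:        (B, H, W) integer class labels
    class_weights: (C,) tensor of per-class weights (e.g. inverse frequencies)
    alpha:         trade-off between the two terms
    """
    # Weighted cross entropy averaged over all pixels
    ce = F.cross_entropy(logits, target, weight=class_weights)

    # Soft Jaccard per class, computed from softmax probabilities
    probs = F.softmax(logits, dim=1)                       # (B, C, H, W)
    onehot = F.one_hot(target, num_classes=logits.shape[1])
    onehot = onehot.permute(0, 3, 1, 2).float()            # (B, C, H, W)

    dims = (0, 2, 3)
    intersection = (probs * onehot).sum(dims)
    union = (probs + onehot - probs * onehot).sum(dims)
    jaccard = torch.log((intersection + eps) / (union + eps)).sum()

    return (1 - alpha) * ce - alpha * jaccard
```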

3.2. Disaster Impact Assessment with Change Detection

The segmentation network is used to identify buildings and roads in pre-disaster and post-disaster aerial imagery. Due to shadows and occlusions, the segmentation output can have a number of incorrect gaps. The building and road segmentation masks are dilated with a small kernel (e.g. 5 × 5) for several iterations (6 in our experiments) to overcome some of these gaps. The resulting masks can be used to distinguish the infrastructure that was destroyed due to the disaster as follows:

M^{diff}_p = \begin{cases} 1 & \text{if } M^{pre}_p \in \{1, 2\} \text{ and } M^{post}_p = 0 \\ 0 & \text{otherwise} \end{cases}    (2)

Subscript p indexes over all the pixels in each image and M^{diff} is the disaster difference mask. M^{pre} and M^{post} are the segmentation masks from pre- and post-disaster imagery, respectively. The inferred label is one of {0, 1, 2}, referring to the background, building or road class. This function computes true for any pixel that was identified as a road or building in the pre-disaster image but as background in the post-disaster image, since that can be assumed to be damaged due to the disaster.

Due to small mis-registration issues and non-ideal segmentation outputs, the segmentation masks from the pre-disaster and post-disaster images do not completely overlap. Hence, small blobs in the difference mask can be assumed to be noise or artifacts caused by the registration error. Morphological erosion and opening are used to remove all such noise, and the final mask obtained represents the infrastructure damaged by the disaster. The intermediate steps of this process are shown in Fig. 2.
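The change-detection step can be illustrated with OpenCV morphology as below; the kernel size, iteration counts and exact order of the clean-up operations are assumptions consistent with the description above, not the authors' exact pipeline.

```python
import cv2
import numpy as np

def damage_mask(pre_mask, post_mask, dilate_iter=6, kernel_size=5):
    """Dilate both masks to close small gaps, take the pre-minus-post
    difference (Eq. 2), then clean the result with erosion and opening."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)

    # Treat any non-background label (building = 1, road = 2) as foreground
    pre_fg = (pre_mask > 0).astype(np.uint8)
    post_fg = (post_mask > 0).astype(np.uint8)

    pre_fg = cv2.dilate(pre_fg, kernel, iterations=dilate_iter)
    post_fg = cv2.dilate(post_fg, kernel, iterations=dilate_iter)

    # Pixels that were infrastructure before the disaster but background after it
    diff = ((pre_fg == 1) & (post_fg == 0)).astype(np.uint8)

    # Remove small blobs caused by mis-registration and segmentation noise
    diff = cv2.erode(diff, kernel, iterations=1)
    diff = cv2.morphologyEx(diff, cv2.MORPH_OPEN, kernel)
    return diff
```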

3.3. Generating Road Graphs

The output segmentation mask is converted to a road network graph, motivated by graph theory, to obtain a map suitable for route computation. Firstly, all pixels marked as road are extracted to form a road mask. The road mask is dilated to deal with small gaps in the segmentation output, since these can cause large errors in the network graph. Morphological thinning is performed on the obtained mask to get a single pixel thick road skeleton. The road skeleton is traversed to find all nodes, where each node is any positive pixel with three or more positive pixel neighbours. All pixels between two nodes are marked as part of an edge.

Since the edges approximated with this method are fairly crooked and small road segments can be assumed to be straight, the edges are simplified to piece-wise linear segments using the Ramer-Douglas-Peucker algorithm [12].
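A minimal sketch of these road-graph steps, using off-the-shelf SciPy and scikit-image routines in place of whatever implementation the authors used, could look as follows; the dilation amount and simplification tolerance are illustrative values.

```python
import numpy as np
from scipy.ndimage import binary_dilation, convolve
from skimage.morphology import skeletonize
from skimage.measure import approximate_polygon

def road_skeleton_and_nodes(road_mask, dilate_iter=3):
    """Dilate the road mask to close small gaps, thin it to a 1-pixel
    skeleton, then mark junction pixels (>= 3 skeleton neighbours) as nodes."""
    mask = binary_dilation(road_mask > 0, iterations=dilate_iter)
    skel = skeletonize(mask)

    # Count 8-connected skeleton neighbours of every skeleton pixel
    neighbour_kernel = np.array([[1, 1, 1],
                                 [1, 0, 1],
                                 [1, 1, 1]])
    neighbours = convolve(skel.astype(int), neighbour_kernel, mode='constant')
    nodes = skel & (neighbours >= 3)
    return skel, nodes

def simplify_edge(pixel_chain, tolerance=2.0):
    """Ramer-Douglas-Peucker simplification of one edge's pixel chain."""
    return approximate_polygon(np.asarray(pixel_chain, dtype=float), tolerance)
```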

3.4. Registering changes to OSM

The road network generated from post-disaster imagery using the methodology described above could be used for routing in most scenarios.


Figure 2: Pipeline for change detection in pre-disaster and post-disaster segmentation masks. Roads shown in blue and buildings shown in green.

However, non-ideal segmentation masks can cause long detours when creating the road network graph. Hence, it is proposed to further use data from OSM as the best estimate of the world prior to a disaster and to register the changes caused by such an event with the OSM road graph to obtain an updated map of the affected region. The change graph can be obtained from the difference mask generated in Section 3.2. Note that the OSM data is not completely accurate [24] but, based on empirical observations, using it provides more robust results.

There are a number of methods in graph theory for measuring graph similarity. However, these methods compare the logical topology of graphs by looking for common nodes. In the case of road networks, the physical topology is extremely important and the graph comparison problem becomes non-trivial. In such cases, corresponding nodes in the two graphs may not spatially coincide due to image offsets and errors in the segmentation masks, making the pre-existing methods of graph comparison unfeasible.

In order to compare the topological graphs, each edge of the graphs G_a and G_b is sliced into smaller sub-segments of length l to obtain simplified graphs G'_a and G'_b. Corresponding sub-segments in the two graphs can be found using Eq. 3 [15], where two sub-segments are assumed to be corresponding if both vertices of one sub-segment are within a certain distance of the other sub-segment. A visual representation of this can be seen in Figure 3.

\forall e_a, e_b : e_a \in G'_a,\; e_b \in G'_b,\; e_a = \{v_{a1}, v_{a2}\},\; e_b = \{v_{b1}, v_{b2}\}:
e_a = e_b \iff |a_1 - b_1| < l/2 \text{ and } |a_2 - b_2| < l/2    (3)

In Eq. 3, e_a and e_b are the sub-segments in graphs G'_a and G'_b and are defined in terms of their two vertices, v_{a1} and v_{a2}, and v_{b1} and v_{b2}, respectively. The Euclidean distance between two vertices is given by |a_1 - b_1|, where a_1 and b_1 represent the coordinates of the first vertices v_{a1} and v_{b1}, respectively.
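The correspondence test of Eq. 3 can be written as a brute-force sketch such as the one below; the additional check of the reversed vertex ordering is an assumption, since sub-segments extracted from the two graphs need not be oriented consistently.

```python
import numpy as np

def match_subsegments(segs_a, segs_b, l):
    """Find corresponding sub-segments between two simplified graphs.

    segs_a, segs_b: lists of ((x1, y1), (x2, y2)) sub-segments of length ~l.
    Returns pairs of indices (i, j) whose endpoints lie within l/2 of each other.
    """
    matches = []
    for i, (a1, a2) in enumerate(segs_a):
        for j, (b1, b2) in enumerate(segs_b):
            d1 = np.linalg.norm(np.subtract(a1, b1))
            d2 = np.linalg.norm(np.subtract(a2, b2))
            # Also test the reversed orientation of the second sub-segment
            d1r = np.linalg.norm(np.subtract(a1, b2))
            d2r = np.linalg.norm(np.subtract(a2, b1))
            if (d1 < l / 2 and d2 < l / 2) or (d1r < l / 2 and d2r < l / 2):
                matches.append((i, j))
    return matches
```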

Figure 3: Graph Comparison. Top: G'_a in blue and G'_b in red, with dashed circles of radius l/2 drawn around the nodes of G'_a. Bottom: Common sub-segments shown in green, non-corresponding sub-segments from G'_a shown in purple and those from G'_b shown in yellow.


Figure 4: Dataset Extent. Training extent in blue and testing extent in yellow. The split was chosen such that most of the damaged area was part of the testing extent.

4. Experimental Setup

4.1. Neural Network Structures

The structures of the encoders and decoders are summarised in Table 1 and Table 2, respectively. In these tables, convx-y implies a convolutional layer with a kernel size of x and y filters, with the nc in the final layer meaning the number of output classes. Similarly, convTranx-y is a transposed convolution layer with a kernel size of x and y filters. The individual decoder block structure for the different models is given in Table 3. All convolutional layers use the ReLU [26] activation and the encoders include pooling and batch-norm layers as proposed by the original authors. Note that UNet-style architectures concatenate the encoder feature map and the decoder feature map, while LinkNet architectures add the feature maps instead of concatenating them, to make the network more efficient.

4.2. Datasets

DigitalGlobe's Open Data Program1 provides high resolution satellite imagery in the wake of natural disasters to enable a timely response. This study uses the data from Palu, Indonesia, which was struck by an earthquake and tsunami on 28 September 2018 and had visible damage to its coastlines and infrastructure. The pre-disaster imagery was from 7th April 2018 and the post-disaster imagery was from 1st October 2018. The imagery had a ground sampling distance of approximately 50 cm per pixel.

1 https://www.digitalglobe.com/ecosystem/open-data

An area of 45 km² around Palu city was extracted for the experiments, with 14 km² of the area with visible damage being set aside for testing. The remainder of the imagery was used for training and validation. The dataset split is visualised in Fig. 4, where the area in yellow was used for testing.

The labels for training the segmentation networks were downloaded from OSM2. All polylines marked as motorways, primary, secondary, tertiary, residential, service, trunk and their links were extracted as roads. The roads and buildings in OSM were provided as vectors and polygons, respectively. They were converted to a raster format to create a dataset suitable for training. All the lat-long coordinates were converted to pixel coordinates. The roads were rasterised with a buffer of 2 m and the building polygons were rasterised as filled polygons. For the binary segmentation tasks, separate road and building mask images were generated where the target classes were labelled as 1. For the multiclass segmentation experiments, the buildings and roads were labelled as 1 and 2, respectively. The background pixels were always marked with 0. The test datasets were annotated manually.
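Rasterising the OSM vectors could be done along the following lines with shapely and rasterio, assuming the geometries are GeoJSON-like and already projected to a metric CRS so that the 2 m road buffer is meaningful; the function name and the label values (1 = building, 2 = road) simply follow the labelling described above.

```python
from shapely.geometry import shape
from rasterio import features
from rasterio.transform import from_bounds

def rasterise_osm(road_lines, building_polys, bounds, width, height,
                  road_buffer_m=2.0):
    """Burn OSM vectors into a multi-class label mask
    (0 = background, 1 = building, 2 = road)."""
    transform = from_bounds(*bounds, width, height)

    # Buffer road centrelines into thin polygons, keep building footprints filled
    shapes = [(shape(geom).buffer(road_buffer_m), 2) for geom in road_lines]
    shapes += [(shape(geom), 1) for geom in building_polys]

    return features.rasterize(shapes, out_shape=(height, width),
                              transform=transform, fill=0, dtype='uint8')
```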

Note that only pre-disaster data was used for training the neural networks and for the segmentation based results. The post-disaster imagery was used purely for inference and for obtaining the post-disaster mapping results.

4.3. Metrics

The Jaccard Index or Intersection over Union (IoU) is a typical per-pixel metric for evaluating segmentation results. It is given by Eq. 4 and measures the overlap of the predicted labels with the true labels. For the binary segmentation cases, the IoU for the target class is reported and, for the multi-class case, the mean IoU (mIoU) over the target classes is also reported. The IoU for the background class is not included since the high number of background pixels would bias the results.

IoU = \frac{TP}{TP + FP + FN}    (4)
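Computed directly from integer label masks, the per-class IoU is simply the following; the small epsilon is an added safeguard against empty classes.

```python
import numpy as np

def iou(pred, target, cls):
    """Per-class IoU (Eq. 4) from integer label masks; the background
    class is excluded simply by never passing cls=0."""
    p = (pred == cls)
    t = (target == cls)
    tp = np.logical_and(p, t).sum()
    fp = np.logical_and(p, ~t).sum()
    fn = np.logical_and(~p, t).sum()
    return tp / (tp + fp + fn + 1e-9)
```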

The IoU metric measures the segmentation performance but is not the most suitable metric for graphs, because a small gap in the segmentation mask may only cause a small error in the IoU metric but can lead to large detours if the resulting road network is used for navigation. As outlined in Section 3.4, comparing two topological graphs is a non-trivial task and graph connectivity is as important as graph completeness. Herein, two metrics are used: the first to evaluate the completeness of the graph and the second to evaluate the connectivity of the generated graph.

The first metric measures the similarity of the sub-segments described in Section 3.4 using the precision-recall metrics as follows:

2 https://www.openstreetmap.org


Table 1: Encoder Structures

Block | VGG11 | VGG16 | ResNet18 | ResNet34
enc1 | conv3-64 | conv3-64, conv3-64 | conv7-64 | conv7-64
enc2 | conv3-128 | conv3-128, conv3-128 | [conv3-64, conv3-64] x2 | [conv3-64, conv3-64] x3
enc3 | conv3-256, conv3-256 | conv3-256, conv3-256, conv3-256 | [conv3-128, conv3-128] x2 | [conv3-128, conv3-128] x4
enc4 | conv3-512, conv3-512 | conv3-512, conv3-512, conv3-512 | [conv3-256, conv3-256] x2 | [conv3-256, conv3-256] x6
enc5 | conv3-512, conv3-512 | conv3-512, conv3-512, conv3-512 | [conv3-512, conv3-512] x2 | [conv3-512, conv3-512] x3

Table 2: Decoder Structures

Block | UNet | UNetUp | LinkNet
center | dec_unet(512, 256) | dec_unet_up(512, 256) | dec_link(512, 256)
dec5 | dec_unet(512, 256) | dec_unet_up(512, 256) | dec_link(256, 128)
dec4 | dec_unet(256, 128) | dec_unet_up(256, 128) | dec_link(128, 64)
dec3 | dec_unet(128, 64) | dec_unet_up(128, 64) | dec_link(64, 64)
dec2 | dec_unet(64, 32) | dec_unet_up(64, 32) | convTran3-32
dec1 | conv3-32 | conv3-32 | conv3-32
final | conv3-nc | conv3-nc | conv3-nc

Table 3: Decoder Block Structure

Block | Layers
dec_unet(a, b) | conv3-a, convTran4-b
dec_unet_up(a, b) | upsample, conv3-a, conv3-b
dec_link(a, b) | conv3-a/4, convTran4-a/4, conv3-b

precision = \frac{TP}{TP + FP}, \quad recall = \frac{TP}{TP + FN}, \quad F_{score} = \frac{2 \times p \times r}{p + r}    (5)

The metric proposed in [41] has been reported for evaluating graph connectivity. This metric measures the similarity of graphs by comparing the shortest path length for a large set of random source-destination pairs between the actual graph and the predicted graph. If the extracted paths have a similar length, they can be assumed to be a match and are marked as 'Correct' in the results. If the path length in the predicted graph is smaller than in the actual graph, the generated graph has incorrect connections and this is reported as 'Too Short'. Conversely, if there are incorrect gaps in the predicted graph, the paths are either 'Too Long' or there are no possible paths, giving 'No Connections'.
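A sketch of this connectivity evaluation using networkx is given below; the 10% path-length tolerance and the assumption that the two graphs share node identifiers and a 'length' edge attribute are illustrative choices, not details taken from [41].

```python
import random
import networkx as nx

def connectivity_stats(g_true, g_pred, n_pairs=1000, tol=0.1):
    """Sample random source-destination pairs, compare shortest-path lengths in
    the reference and predicted graphs, and bucket each pair as
    Correct / Too Short / Too Long / No Connection."""
    counts = {'correct': 0, 'too_short': 0, 'too_long': 0, 'no_connection': 0}
    nodes = list(g_true.nodes)
    for _ in range(n_pairs):
        s, d = random.sample(nodes, 2)
        try:
            ref = nx.shortest_path_length(g_true, s, d, weight='length')
        except nx.NetworkXNoPath:
            continue  # skip pairs not connected in the reference graph
        try:
            pred = nx.shortest_path_length(g_pred, s, d, weight='length')
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            counts['no_connection'] += 1
            continue
        if abs(pred - ref) <= tol * ref:
            counts['correct'] += 1
        elif pred < ref:
            counts['too_short'] += 1
        else:
            counts['too_long'] += 1
    return counts
```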

4.4. Training Details

The models were trained using the Adam optimiser [21] with a learning rate of 10⁻⁴. A minibatch size of 5 images was used for all the UNet-based models and 32 images for the other models. The models were built in PyTorch [30]. The VGG11, VGG16, ResNet18 and ResNet34 models provided by the PyTorch model zoo were used for initialising the encoder networks in the pretrained networks. He initialisation [17] was used for all the other layers.

The training images and their corresponding masks were cropped to 416 × 416 pixels and were augmented with horizontal and vertical flipping. All images were zero-mean normalised. Only the pre-disaster images were used for training and the post-disaster images were used for inference.

All models were trained for 600 epochs to enable a fair comparison between the different models. A validation set was used for preventing overfitting: the final model used for measuring the performance was taken from the point where the validation loss had converged, just before it started to diverge (i.e. overfit on the training set).
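The reported training setup can be summarised in a minimal loop such as the following, where criterion stands for the loss of Eq. 1 and the datasets are assumed to yield already-cropped, normalised and augmented image/mask pairs; checkpoint selection by lowest validation loss mirrors the description above.

```python
import torch
from torch.utils.data import DataLoader

def train(model, criterion, train_set, val_set,
          epochs=600, lr=1e-4, batch_size=5, device='cuda'):
    """Minimal training loop mirroring the reported setup (Adam, lr 1e-4,
    fixed epoch budget, validation loss tracked for model selection)."""
    model = model.to(device)
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)

    best_val = float('inf')
    for epoch in range(epochs):
        model.train()
        for images, masks in train_loader:
            images, masks = images.to(device), masks.to(device)
            optimiser.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimiser.step()

        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for images, masks in val_loader:
                val_loss += criterion(model(images.to(device)),
                                      masks.to(device)).item()
        if val_loss < best_val:  # keep the checkpoint with the lowest validation loss
            best_val = val_loss
            torch.save(model.state_dict(), 'best_model.pth')
```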

5. Results

5.1. Segmentation Results

Our experiments tested the following scenarios:


Figure 5: Buildings Validation Loss.

• Effect of pretraining with ImageNet on aerial image segmentation models.

• Efficiency vs accuracy trade-off between different popular architectures.

5.1.1. Effect of pretraining on aerial image segmentation

The first set of experiments was conducted to analyse the effect of pretraining with ImageNet on the segmentation task. The purpose of these experiments was two-fold. Firstly, whether pretraining on a large classification dataset such as ImageNet improves the accuracy of an unrelated task where the image statistics are quite different (ground-based object images for classification vs aerial images for segmentation). Secondly, assuming accuracy with pretraining is similar or even better than that without, whether pretraining improves the convergence speed.

Four different network architectures were tested: UNet with a VGG11 or VGG16 backend and LinkNet with a ResNet18 or ResNet34 backend. These models were trained for segmentation of buildings and roads, and the validation loss curves are shown in Fig. 5 and Fig. 6, respectively. From the loss curves it can be seen that pretrained networks generally converged quicker than their non-pretrained equivalents, requiring approximately 10 fewer epochs regardless of the model used and the target class. The training curves for the VGG-based UNet models on the road segmentation task also show that these models did not converge very well when training from scratch, even though their validation loss, which was used for preventing overfitting, was lower than that of their pretrained equivalents.

The results of these models on the test set, summarised in Table 4, show that in general the pretrained models had a higher IoU by a couple of points compared to their non-pretrained equivalents, and this corresponds to previous results in the literature [2]. Note that this section focuses on identifying the differences between training models from scratch and using pretrained encoders. The results across different models are compared in the next section.

Figure 6: Roads Validation Loss.

Table 4: Effects of pretraining. All results given as IoU.

Model | Pretrained | Roads | Buildings
UNet (VGG11) | No | 37.73 | 57.47
UNet (VGG11) | Yes | 39.36 | 57.58
UNet (VGG16) | No | 39.2 | 57.72
UNet (VGG16) | Yes | 40.07 | 59.72
LinkNet (ResNet18) | No | 32.3 | 51.29
LinkNet (ResNet18) | Yes | 35.08 | 57.08
LinkNet (ResNet34) | No | 35.42 | 54.82
LinkNet (ResNet34) | Yes | 37.2 | 57.15


5.1.2. Model capacity, design and accuracy

Visualisation of the binary segmentation results for the different models can be seen in Figure 7 and Figure 8, and their quantitative performance is reported in Table 5. Sizes of these different models are reported in Table 6. From the results it can be seen that the proposed UNetUp with a VGG16 encoder outperformed all other models by a couple of points on each task.

The building segmentation masks in Figure 7 show that the ENet-based models led to fairly blob-like outputs without clear boundaries. The LinkNet models gave more distinct boundaries, but the clearest results were with the UNet and UNetUp models. The road segmentation masks did not appear as distinctively different in terms of visual comparison, though the ENet-based models seemed to miss the most segments in this case.

It is interesting to note that model performance was not directly correlated with model size. For instance, in the binary segmentation task in Table 5, it can be seen that ENetSeparable outperformed ENet even though it had 30% fewer parameters. The number of parameters in these models was smaller by two orders of magnitude compared to all the other models tested, yet for the binary segmentation task these models were close to the top performing UNetUp (VGG16) model. However, the tradeoff between model size and capacity became obvious in the multiclass segmentation task, where the smaller models did not converge.

From the results it can be seen that the VGG-based encoders outperformed the ResNet-based encoders for all tasks. For instance, it can be seen from Table 5 that for the road segmentation task using the UNet models, the VGG11 and VGG16 encoders were consistently better than the ResNet18 and ResNet34 encoders. This performance difference can also be seen across the building segmentation and the multi-class segmentation tasks.

The major difference between the UNet models and the LinkNet models is the way the skip link features are treated. In the former, the skip link features are concatenated with the corresponding decoder features, whereas in the latter they are added to the decoder features to make the process more efficient. From the results it can be observed that the feature concatenation in the UNet models allowed the network to learn more discriminative features, as these models always outperformed their LinkNet equivalents even when the encoder was the same.

Finally, the proposed UNetUp with a VGG16 encoder outperformed all other models on the segmentation tasks. It could also be seen that the UNetUp models outperformed equivalent UNet models when controlled for the encoder, even though they had fewer parameters, since they used upsampling instead of transposed convolution layers.

Table 5 also shows that all the models had better performance for the binary segmentation task compared to multi-class segmentation for the same classes. This seems to imply that more training data does not necessarily improve the performance if the task is more complex. An example of the results of the UNetUp (VGG16) model on the multi-class segmentation problem is seen in Figure 9. The sample image is the same as the one shown for the building segmentation case in Figure 7 and, by comparing the two, it can be seen that the results in the multi-class case are less distinctive and more blob-like.

5.2. Quantitative Disaster Mapping Results

The precision-recall results of the obtained road networks are given in Table 7. The road networks created from the segmentation mask of the post-disaster image have been denoted as Post. The results of the proposed method, where the OSM road network was updated by removing all destroyed road segments, are given as Diff.

The Post results convey the generalisation capability of the tested networks across image datasets from different times, since they were trained on pre-disaster imagery while the evaluation was over the post-disaster imagery. In contrast to the pre-disaster results, the best performing model for precision-recall was UNet with a ResNet backend.

It can be seen that in the case of Post, the precision was usually much higher than the recall, implying that the segmentation network has a higher number of false negatives than false positives. This was due to the fact that there were gaps in the segmentation mask caused by occlusions from shadows, buildings, etc. LinkNet with a ResNet34 backend gave the highest recall in this case.

As Table 7 shows, the proposed Diff framework helped improve the generated road graph regardless of the base network used. The difference in results between the various architectures also became less pronounced, as can be seen in Table 7, where the difference between the maximum and minimum F score in the case of Post was approximately 8%, whereas that for Diff was 2%. This was largely due to the fact that the proposed method benefited from prior knowledge from OSM. Note that the OSM data is not completely accurate [24]. However, based on empirical observations, using this data provides significantly better results than assuming no prior knowledge.

The connectivity results of the estimated post-disaster road networks are reported in Table 8. Similar to the precision-recall results, it can be seen that the proposed framework improved the results by a large margin. This was due to the fact that the output of the segmentation networks often had gaps, which caused missing connections in the generated road networks. The use of the OSM network, which is properly connected, as an initial estimate helped deal with these missing connections. This conjecture is supported by the number of pairs marked as having 'No Connections' in Table 8, where using the Diff framework reduced the number of 'No Connection' pairs to half of those from Post. The Post results had a number of small disconnected segments and some spurious paths caused by a non-ideal segmentation mask. The Diff results, on the other hand, were much better connected. However, Diff did have some incorrect segments where the mask difference missed segments.

5.3. Qualitative Disaster Impact Results

As outlined in Section 3.2, the difference between the segmentation masks from pre-disaster and post-disaster imagery can be used for disaster impact assessment.


Figure 7: Visualisation of the building segmentation results using pretrained encoders. Panels: (a) Image, (b) Ground Truth, (c) ENet, (d) ENetSeparable, (e) LinkNet (ResNet18), (f) LinkNet (ResNet34), (g) UNet (VGG11), (h) UNet (VGG16), (i) UNet (ResNet18), (j) UNet (ResNet34), (k) UNetUp (VGG11), (l) UNetUp (VGG16), (m) UNetUp (ResNet18), (n) UNetUp (ResNet34).


Figure 8: Visualisation of the road segmentation results using pretrained encoders. Panels: (a) Image, (b) Ground Truth, (c) ENet, (d) ENetSeparable, (e) LinkNet (ResNet18), (f) LinkNet (ResNet34), (g) UNet (VGG11), (h) UNet (VGG16), (i) UNet (ResNet18), (j) UNet (ResNet34), (k) UNetUp (VGG11), (l) UNetUp (VGG16), (m) UNetUp (ResNet18), (n) UNetUp (ResNet34).


Table 5: Segmentation Results (IoU) using pretrained models

Model | Binary Roads | Binary Buildings | Multiclass Roads | Multiclass Buildings | Multiclass Average
UNet (VGG11) | 39.36 | 57.58 | 35.91 | 51.92 | 43.92
UNet (VGG16) | 40.07 | 59.72 | 35.12 | 52.23 | 43.68
UNet (ResNet18) | 36.43 | 57.90 | 30.79 | 50.71 | 40.75
UNet (ResNet34) | 37.56 | 58.29 | 34.18 | 51.08 | 42.63
LinkNet (ResNet18) | 35.08 | 57.08 | 29.25 | 50.59 | 39.92
LinkNet (ResNet34) | 37.2 | 57.15 | 32.70 | 50.09 | 41.40
ENet | 36.34 | 59.44 | - | - | -
ENetSeparable | 37.44 | 59.58 | - | - | -
UNetUp (VGG11) | 39.16 | 58.72 | 33.86 | 50.66 | 42.26
UNetUp (VGG16) | 41.13 | 60.04 | 36.12 | 53.86 | 44.99
UNetUp (ResNet18) | 37.48 | 58.08 | 33.64 | 52.62 | 41.04
UNetUp (ResNet34) | 38.97 | 58.39 | 34.69 | 52.94 | 41.91

Figure 9: Multi-class segmentation output. Panels: (a) Image, (b) Ground Truth, (c) UNetUp (VGG16).


Table 6: Model Size

Model | Params | Size
ENetSeparable | 226,596 | 1.1 MB
ENet | 349,068 | 1.6 MB
LinkNet (ResNet18) | 11,686,561 | 46.8 MB
UNetUp (ResNet18) | 20,290,377 | 81.2 MB
LinkNet (ResNet34) | 21,794,721 | 87.3 MB
UNet (ResNet18) | 22,383,433 | 89.6 MB
UNetUp (VGG11) | 22,927,393 | 91.7 MB
UNet (VGG11) | 25,364,513 | 101.5 MB
UNetUp (VGG16) | 29,306,465 | 117.2 MB
UNetUp (ResNet34) | 30,398,537 | 121.7 MB
UNet (VGG16) | 32,202,337 | 128.8 MB
UNet (ResNet34) | 32,491,593 | 130.1 MB

Figure 10: Disaster impact assessment. Left: Road and building masks from satellite imagery using the segmentation network, with buildings in yellow and roads in blue. Top Right: Estimated difference between the infrastructure before and after the disaster, shown in red. Bottom Right: Change heatmap overlaid onto an image of the test region.

This process is shown in Figure 10, where the difference in the buildings and roads caused by the disaster is marked in red in the image on the top right.

The area under consideration was divided into a grid of cells of a fixed size and the number of changed pixels per grid cell was used as an overall estimate of the damage caused to a particular area. This has been plotted as a heatmap in Figure 10. The heatmap shows that the major destruction was along the coast and in an area in the south-west of Palu city. This finding corresponds to the European Commission's (EC) Copernicus Emergency Mapping Services report [13] for the area. A major portion of the coast was washed away due to the tsunami and the south-west region of the city was destroyed due to soil liquefaction.
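Aggregating the difference mask into such a grid is straightforward; the sketch below counts changed pixels per cell, with the cell size left as a free parameter since the paper does not state the value used.

```python
import numpy as np

def damage_heatmap(diff_mask, cell_size=256):
    """Aggregate the change mask into a coarse grid: the count of changed
    pixels per cell serves as a rough per-area damage estimate."""
    h, w = diff_mask.shape
    rows, cols = h // cell_size, w // cell_size
    heat = np.zeros((rows, cols), dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            cell = diff_mask[r * cell_size:(r + 1) * cell_size,
                             c * cell_size:(c + 1) * cell_size]
            heat[r, c] = int(cell.sum())
    return heat
```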

6. Conclusions

This work provides a comparison among different segmentation models and presents a framework for the identification of damaged areas and accessible roads in post-disaster scenarios using satellite imagery. The framework leverages pre-existing knowledge from OSM to obtain a robust estimate of the affected road network.

The performances of various models for the tasks of binary and multi-class semantic segmentation in aerial images have been analysed and compared. The results show that using encoders pretrained on ImageNet improved the training time by around 10 epochs and the accuracy by a couple of percentage points, despite the domain gap that existed between ImageNet and aerial images.

On comparing the effects of using different encoders for the task of semantic segmentation, it could be seen that VGG16 outperformed all other feature extraction modules. The trade-off between accuracy and efficiency has been studied. An extremely efficient neural network, termed ENetSeparable, was proposed. It has 30% fewer parameters than ENet and still performed better on the binary segmentation task.

For post-disaster scenarios, areas affected by the disaster were identified using the difference in the predicted segmentation masks. The evaluated road changes were used to update the road networks available from OSM. There was a significant difference in the results of the various segmentation networks, where the F score varied by as much as 8%. The use of the proposed framework alleviated the differences and brought the difference in F score down to 2%. The highest F score achieved with the use of the proposed framework was 94.76, as compared to the highest F score of 73.98 from the segmentation networks.

The proposed framework uses OSM data for training and does not require time-consuming manually annotated post-disaster data. Finally, the qualitative assessment of the aftermath damage can be generated easily, as shown for the Palu tsunami, where it was validated against the European Commission report.

This work can be further improved in a number of ways. Namely, the results of the different models could be ensembled to help improve the road connectivity results. Classification of damages could be used to identify whether the infrastructure has been completely destroyed, as in the case of soil liquefaction, or whether a road blockage is something that can be dealt with relatively easily, such as one caused by a fallen tree.

ACKNOWLEDGMENT

The authors would like to thank Dr Andrew West, Dr Thomas Wright and Ms Elisabeth Welburn for their valuable comments and feedback. A. Gupta is funded by a Scholarship from the Department of Electrical and Electronic Engineering, The University of Manchester, and the ACM SIGHPC/Intel Computational and Data Science Fellowship.

References

Table 7: Precision-Recall of Sub-segments

Model | Precision Post | Precision Diff | Recall Post | Recall Diff | F score Post | F score Diff
ENet | 70.36 | 92.49 | 65.97 | 93.13 | 68.09 | 92.81
ENetSeparable | 61.20 | 92.65 | 67.12 | 95.5 | 64.02 | 94.05
LinkNet (ResNet18) | 77.02 | 93.8 | 70.36 | 95.29 | 73.54 | 94.54
LinkNet (ResNet34) | 71.8 | 95.18 | 76.29 | 94.34 | 73.98 | 94.76
UNet (VGG11) | 73.66 | 94.8 | 67.87 | 92.8 | 70.65 | 93.8
UNet (VGG16) | 75.73 | 94.99 | 72.3 | 94.02 | 73.98 | 94.5
UNet (ResNet18) | 81.71 | 93.64 | 63.86 | 92.27 | 71.69 | 92.95
UNet (ResNet34) | 78.32 | 94.09 | 74.57 | 95.41 | 76.40 | 94.75
UNetUp (VGG11) | 71.30 | 94.4 | 67.31 | 93.93 | 69.25 | 94.16
UNetUp (VGG16) | 67.17 | 95.1 | 70.93 | 93.68 | 69.0 | 94.38
UNetUp (ResNet18) | 78.17 | 94.32 | 67.42 | 95.12 | 72.40 | 94.72
UNetUp (ResNet34) | 76.4 | 93.76 | 67.14 | 93.67 | 71.47 | 93.71

Table 8: Connectivity results. All numbers given as percentages.

Model | Correct Post | Correct Diff | Too Long Post | Too Long Diff | Too Short Post | Too Short Diff | No Connection Post | No Connection Diff
ENet | 21.82 | 54.02 | 19.15 | 14.81 | 2.29 | 1.26 | 56.66 | 29.83
ENetSeparable | 20.29 | 62.22 | 18.43 | 14.94 | 2.18 | 1.35 | 59.04 | 21.39
LinkNet (ResNet18) | 18.33 | 70.36 | 23.65 | 6.93 | 2.35 | 1.35 | 55.59 | 21.25
LinkNet (ResNet34) | 31.05 | 74.59 | 14.97 | 5.2 | 3.97 | 1.41 | 49.96 | 18.71
UNet (VGG11) | 25.15 | 65.56 | 25.58 | 9.99 | 5.28 | 1.02 | 43.94 | 23.33
UNet (VGG16) | 26.8 | 67.68 | 31.64 | 8.09 | 5.42 | 0.97 | 36.4 | 23.16
UNet (ResNet18) | 21.03 | 58.62 | 20.15 | 12.57 | 4.72 | 1.07 | 54.0 | 27.65
UNet (ResNet34) | 37.04 | 75.58 | 17.13 | 4.41 | 4.68 | 1.43 | 41.1 | 18.49
UNetUp (VGG11) | 19.58 | 61.03 | 14.22 | 13.58 | 1.94 | 0.99 | 64.23 | 24.3
UNetUp (VGG16) | 24.3 | 66.18 | 23.5 | 8.98 | 4.71 | 1.09 | 47.43 | 23.66
UNetUp (ResNet18) | 22.44 | 67.35 | 21.64 | 8.46 | 2.66 | 1.14 | 53.23 | 23.05
UNetUp (ResNet34) | 18.71 | 66.01 | 23.9 | 8.89 | 2.24 | 1.57 | 55.1 | 23.44

[1] Adriano, B., Xia, J., Baier, G., Yokoya, N., Koshimura, S.: Multi-Source Data Fusion Based on Ensemble Learning for Rapid Building Damage Mapping during the 2018 Sulawesi Earthquake and Tsunami in Palu, Indonesia. Remote Sensing 11(7), 886 (2019). https://doi.org/10.3390/rs11070886

[2] Audebert, N., Le Saux, B., Lefevre, S.: Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks (2016). http://arxiv.org/abs/1609.06846

[3] Bastani, F., He, S., Abbar, S., Alizadeh, M., Balakrishnan, H., Chawla, S., Madden, S., DeWitt, D.: RoadTracer: Automatic Extraction of Road Networks from Aerial Images. In: Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00496

[4] Bhattacharjee, S., Roy, S., Das Bit, S.: Post-disaster map builder: Crowdsensed digital pedestrian map construction of the disaster affected areas through smartphone based DTN. Computer Communications 134, 96–113 (2019). https://doi.org/10.1016/j.comcom.2018.11.010

[5] Boccardo, P., Giulio Tonolo, F.: Remote Sensing Role in Emergency Mapping for Disaster Response. In: Engineering Geology for Society and Territory - Volume 5, pp. 17–24. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-09048-1_3

[6] Chaurasia, A., Culurciello, E.: LinkNet: Exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE Visual Communications and Image Processing (VCIP), pp. 1–4 (2018). https://doi.org/10.1109/VCIP.2017.8305148

[7] Chen, L., Fan, X., Wang, L., Zhang, D., Yu, Z., Li, J., Nguyen, T.M.T., Pan, G., Wang, C.: RADAR. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1(4), 1–23 (2018). https://doi.org/10.1145/3161159

[8] Chollet, F.: Xception: Deep Learning with Depthwise Separable Convolutions (2016). http://arxiv.org/abs/1610.02357

[9] Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., Raska, R.: DeepGlobe 2018: A challenge to parse the earth through satellite images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 172–181 (2018). https://doi.org/10.1109/CVPRW.2018.00031

[10] DigitalGlobe: Open Data Initiative. https://www.digitalglobe.com/ecosystem/open-data

[11] Doshi, J., Basu, S., Pang, G.: From Satellite Imagery to Disaster Insights (2018). http://arxiv.org/abs/1812.07033

[12] Douglas, D.H., Peucker, T.K.: Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10(2), 112–122 (1973). https://doi.org/10.3138/FM57-6770-U75U-7727

[13] European Commission Joint Research Centre: Mw 7.5 Earthquake in Indonesia, 28 Emergency Report. Tech. rep. (2018)

[14] Fujita, A., Sakurada, K., Imaizumi, T., Ito, R., Hikosaka, S., Nakamura, R.: Damage detection from aerial images via convolutional neural networks. In: IAPR International Conference on Machine Vision Applications, pp. 5–8. IEEE (2017). https://doi.org/10.23919/MVA.2017.7986759

[15] Gupta, A., Welburn, E., Watson, S., Yin, H.: CNN-Based Semantic Change Detection in Satellite Imagery. In: International Conference on Artificial Neural Networks, pp. 669–684 (2019). https://doi.org/10.1007/978-3-030-30493-5_61

[16] He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition (2015). http://arxiv.org/abs/1512.03385

[17] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015). https://doi.org/10.1109/ICCV.2015.123

[18] Iglovikov, V., Shvets, A.: TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation (2018). http://arxiv.org/abs/1801.05746

[19] Jin, J., Dundar, A., Culurciello, E.: Flattened Convolutional Neural Networks for Feedforward Acceleration (2014). http://arxiv.org/abs/1412.5474

[20] Kaiser, P., Wegner, J.D., Lucchi, A., Jaggi, M., Hofmann, T., Schindler, K.: Learning Aerial Image Segmentation from Online Maps. IEEE Transactions on Geoscience and Remote Sensing (2017). https://doi.org/10.1109/TGRS.2017.2719738

[21] Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. In: International Conference on Learning Representations (2015). http://arxiv.org/abs/1412.6980

[22] Li, S., Lyons, J., Voigt, S., Muthike, D.M., Pedersen, W., Giulio-Tonolo, F., Schneiderhan, T., Guha-Sapir, D., Kaku, K., Proy, C., Bequignon, J., Czaran, L., Platzeck, G., Kucera, J., Hazarika, M.K., James, G.K., Jones, B.: Global trends in satellite-based emergency mapping. Science 353(6296), 247–252 (2016). https://doi.org/10.1126/science.aad8728

[23] Liu, Z., Zhang, J., Li, X.: An automatic method for road centerline extraction from post-earthquake aerial images (2019). https://doi.org/10.1016/j.geog.2018.11.008

[24] Mattyus, G., Luo, W., Urtasun, R.: DeepRoadMapper: Extracting Road Topology from Aerial Images. In: IEEE International Conference on Computer Vision, pp. 3458–3466. IEEE (2017). https://doi.org/10.1109/ICCV.2017.372

[25] Miller, G.: The Huge, Unseen Operation Behind the Accuracy of Google Maps (2014). https://www.wired.com/2014/12/google-maps-ground-truth

[26] Nair, V., Hinton, G.E.: Rectified Linear Units Improve Restricted Boltzmann Machines. In: 27th International Conference on Machine Learning (2010)

[27] Odena, A., Dumoulin, V., Olah, C.: Deconvolution and Checkerboard Artifacts. Distill 1(10) (2016). https://doi.org/10.23915/distill.00003

[28] OpenStreetMap Contributors: Planet dump retrieved from https://planet.osm.org (2017)

[29] Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation (2016). http://arxiv.org/abs/1606.02147

[30] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)

[31] Pathier, E., Fielding, E.J., Wright, T.J., Walker, R., Parsons, B.E., Hensley, S.: Displacement field and slip distribution of the 2005 Kashmir earthquake from SAR imagery. Geophysical Research Letters 33(20), L20310 (2006). https://doi.org/10.1029/2006GL027193

[32] Poiani, T.H., Rocha, R.d.S., Degrossi, L.C., Albuquerque, J.P.d.: Potential of collaborative mapping for disaster relief: A case study of OpenStreetMap in the Nepal earthquake 2015. In: Proceedings of the Annual Hawaii International Conference on System Sciences, pp. 188–197. IEEE (2016). https://doi.org/10.1109/HICSS.2016.31

[33] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science, vol. 9351, pp. 234–241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28

[34] Rudner, T.G.J., Rußwurm, M., Fil, J., Pelich, R., Bischke, B., Kopackova, V., Bilinski, P.: Multi3Net: Segmenting Flooded Buildings via Fusion of Multiresolution, Multisensor and Multitemporal Satellite Imagery. In: Thirty-Third AAAI Conference on Artificial Intelligence (2019). http://arxiv.org/abs/1812.01756

[35] Schumann, G., Hostache, R., Puech, C., Hoffmann, L., Matgen, P., Pappenberger, F., Pfister, L.: High-Resolution 3-D Flood Information From Radar Imagery for Flood Hazard Management. IEEE Transactions on Geoscience and Remote Sensing 45(6), 1715–1725 (2007). https://doi.org/10.1109/TGRS.2006.888103

[36] Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations, pp. 1–14 (2015). http://arxiv.org/abs/1409.1556

[37] Singh, S., Batra, A., Pang, G., Torresani, L., Basu, S., Paluri, M., Jawahar, C.: Self-supervised Feature Learning for Semantic Segmentation of Overhead Imagery. In: BMVC (2018)

[38] Sun, T., Chen, Z., Yang, W., Wang, Y.: Stacked U-Nets with multi-output for road extraction. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 187–191 (2018). https://doi.org/10.1109/CVPRW.2018.00033

[39] Van Etten, A.: You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery (2018). http://arxiv.org/abs/1805.09512

[40] Van Etten, A., Lindenbaum, D., Bacastow, T.M.: SpaceNet: A Remote Sensing Dataset and Challenge Series (2018). http://arxiv.org/abs/1807.01232

[41] Wegner, J.D., Montoya-Zegarra, J.A., Schindler, K.: Road networks as collections of minimum cost paths. ISPRS Journal of Photogrammetry and Remote Sensing 108, 128–137 (2015). https://doi.org/10.1016/j.isprsjprs.2015.07.002

[42] Zhang, L., Zhang, L., Du, B.: Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art. IEEE Geoscience and Remote Sensing Magazine 4(2), 22–40 (2016). https://doi.org/10.1109/MGRS.2016.2540798

[43] Zhu, X.X., Tuia, D., Mou, L., Xia, G.S., Zhang, L., Xu, F., Fraundorfer, F.: Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geoscience and Remote Sensing Magazine 5(4), 8–36 (2017). https://doi.org/10.1109/MGRS.2017.2762307

15

Page 3: arXiv:2006.05575v1 [eess.IV] 10 Jun 2020(SAR) and high resolution optical images. SAR is extremely useful for dealing with low-light conditions and for areas with cloud cover. It is

3 Methodology

The proposed disaster impact assessment is based on find-ing the difference in roads and buildings between satellite im-agery from before and after a disaster This is done by usinga semantic segmentation network trained on pre-disaster aerialimagery for identifying these objects in the before and after im-agery The difference in the predicted road masks is furtherused to update data from OSM for finding accessible routes inthe post-disaster scenario

31 Segmentation Models

The models used in this study are modified versions of theUNet and LinkNet [6] The modifications were inspired by theTernausNet [18] which showed that replacing the UNet en-coder with a pretrained VGG11 encoder improved segmenta-tion results

Here a systematic study was carried out to compare the ef-fectiveness of different encoder backbones In the tested mod-els the encoder backbone was replaced by the convolutionallayers from VGG [36] and ResNet [16] for the UNet The orig-inal LinkNet model with ResNet18 as its encoder and anotherone with a ResNet34 backend were also tested

A slight modification of the UNet was also studied wherethe transposed convolutions in the decoders were replaced withthe nearest neighbour upsampling to deal with possible checker-board artifacts [27] This modified version has been called UN-etUp in the remainder of the text

Another model tested in this study is the ENet [29] It isan encoder-decoder model optimised for efficiency in terms oflatency and parameters with an encoder inspired by ResNet anda small decoder It uses early downsampling with a relativelylow number of feature maps to reduce the number of operationsrequired It also decomposes n times n convolutions into smallerconvolutions of ntimes1 and 1timesn[19] allowing for large speedups

Inspired by Xception-Net [8] a modified version of ENetcalled ENetSeparable is proposed In this model all convolu-tional filters are replaced by depthwise separable convolutionsa modification that reduces the number of parameters by 30

The loss function is a weighted cross entropy loss with anadditional soft Jaccard constraint and is given as follows

L = (1minusα)1I

Isumi

(minuswklog

eoiksumCc eoic

)minusα

Csumc

logsum

i eoic lowast ticsumi eoic + tic minus eoic lowast tic

(1)where

wk =

sumCc=1 S c

C times S k

Subscript k denotes the target class i indexes over all pixelswhere the total number of pixels is given by I and the totalnumber of classes is given by C The output for class c at pixeli is given by oic and t is a one-hot encoded target vector α is theweighting for the Jaccard loss The weight for class k is given

by wk and S k denotes the number of samples in the training setfor target class k while S c the number of samples for class c

32 Disaster Impact Assessment with Change DetectionThe segmentation network is used to identify buildings and

roads in pre-disaster and post-disaster aerial imagery Due toshadows and occlusions the segmentation output can have anumber of incorrect gaps The building and road segmentationmasks are dilated with a small kernel (eg 5 times 5) for several it-erations (6 in our experiments) to overcome some of these gapsThe resulting masks can be used to distinguish the infrastruc-ture that was destroyed due to the disaster as follows

Mdi f fp =

1 if Mprep isin 1 2 and Mpostp = 00 otherwise (2)

Subscript p indexes over all the pixels in each image andMdi f f is the disaster difference mask Mpre and Mpost are thesegmentation masks from pre and post-disaster imagery respec-tively The inferred label is one of 012 referring to the back-ground building or road class This function computes truefor any pixel that was identified as a road or building in thepre-disaster image but as background in the post-disaster imagesince that can be assumed to be damaged due to the disaster

Due to small mis-registration issues and non-ideal segmen-tation outputs the segmentation masks from the pre-disasterand post-disaster images do not completely overlap Hencesmall blobs in the difference mask can be assumed to be noiseor artifacts caused by the registration error Morphological ero-sion and opening are used to remove all such noise and the finalmask obtained represents the damaged infrastructure due to thedisaster The intermediate steps of this process are shown inFig 2

33 Generating Road GraphsThe output segmentation mask is converted to a road net-

work graph motivated by graph theory to obtain a map suitablefor route computation Firstly all pixels marked as road areextracted to form a road mask The road mask is dilated todeal with small gaps in the segmentation output since these cancause large errors in the network graph Morphological thin-ning is performed on the obtained mask to get a single pixelthick road skeleton The road skeleton is traversed to find allnodes where each node is any positive pixel with three or morepositive pixel neighbours All pixels between two nodes aremarked as part of an edge

Since the edges approximated with this method are fairlycrooked and small road segments can be assumed to be straightthe edges are simplified to piece-wise linear segments using theRamer-Douglas-Pecker algorithm [12]

34 Registering changes to OSMThe road network generated from post-disaster imagery us-

ing the methodology described above could be used for rout-ing in most scenarios However non-ideal segmentation masks

3

Figure 2 Pipeline for change detection in pre-disaster and post-disaster segmentation masks Roads shown in blue and buildings shown in green

can cause long detours when creating the road network graphHence it is proposed to further use data from OSM as the bestestimate of the world prior to a disaster and register the changescaused by such an event with the OSM road graph to obtainan updated map of the affected region The change graph canbe obtained from the difference mask generated in Section 32Note that the OSM data is not completely accurate [24] butbased on empirical observations using it provides more robustresults

There are a number of methods in graph theory for measur-ing graph similarity However these methods compare logicaltopology of graphs by looking for common nodes In the caseof road networks the physical topology is extremely importantand the graph comparison problem becomes non-trivial In suchcases corresponding nodes in the two graphs may not spatiallycoincide due to image offsets and errors in the segmentationmasks making the pre-existing methods of graph comparisonunfeasible

In order to compare the topological graphs each edge ofthe graphs Ga and Gb is sliced into smaller sub-segments oflength l to obtain simplified graphs Gprimea and Gprimeb Correspondingsub-segments in the two graphs can be found using Eq 3 [15]where two sub-segments are assumed to be corresponding ifboth vertices of one sub-segment are within a certain distanceof the other sub-segment A visual representation of this can beseen in Figure 3

forallea eb ea isin Gprimea eb isin Gprimebea = va1 va2 eb = vb1 vb2

ea = eb iff |a1 minus b1| lt l2 and |a2 minus b2| lt l2(3)

In Eq 3 ea and eb are the sub-segments in graphs Gprimea andGprimeb and are defined in terms of their two vertices va1 and va2and vb1 and vb2 respectively The euclidean distance betweentwo vertices is given by |a1minus b1| where a1 and b1 represent thecoordinates of the first vertices of va1 and vb1 respectively

Figure 3 Graph Comparison Top Gprimea in blue and Gprimeb in red with dashed circlesof radius l2 drawn around the nodes of Gprimea Bottom Common sub-segmentsshown in green non-corresponding sub-segments from Gprimea shown in purple andthose from Gprimeb shown in yellow

4

Figure 4 Dataset Extent Training extent in blue and testing extent in yellowThe split was chosen such that most of the damaged area was part of the testingextent

4 Experimental Setup

41 Neural Network Structures

The structures of the encoders and decoders are summarisedin Table 1 and Table 2 respectively In these tables convx-yimplies a convolutional layer with a kernel size of x and y fil-ters with the nc in the final layer meaning the number of out-put classes Similarly convTranx-y is a transposed convolutionlayer with a kernel size of x and y filters The individual de-coder block structure for the different models is given in Table3 All convolutional layers use the ReLU[26] activation and theencoders include pooling and batch-norm layers as proposedby the original authors Note that UNet-style architectures con-catenate the encoder feature map and the decoder feature mapwhile LinkNet architectures add the feature maps instead ofconcatenating them to make the network more efficient

42 Datasets

DigitalGlobersquos Open Data Program1 provides high resolu-tion satellite imagery in the wake of natural disasters to enablea timely response This study uses the data from Palu In-donesia which was struck by an earthquake and tsunami on 28September 2018 and had visible damages to its coastlines andinfrastructure The pre-disaster imagery was from 7th April2018 and the post disaster imagery was from 1st October 2018The imagery had a ground sampling distance of approximately50 cm pixelminus1

1httpswwwdigitalglobecomecosystemopen-data

An area of 45 km2 around Palu city was extracted for theexperiments with 14 km2 of the area with visible damage beingset aside for testing The remainder of the imagery was used fortraining and validation The dataset split is visualised in Fig 4where the area in yellow was used for testing

The labels for training the segmentation networks were down-loaded from OSM2 All polylines marked as motorways pri-mary secondary tertiary residential service trunk and theirlinks were extracted as roads The roads and buildings in OSMwere provided as vectors and polygons respectively They wereconverted to a raster format to create a dataset suitable for train-ing All the lat-long coordinates were converted to pixel coor-dinates The roads were rasterised with a buffer of 2 m and thebuilding polygons were rasterised as filled polygons For thebinary segmentation tasks separate road and building mask im-ages were generated where the target classes were labelled as 1For the multiclass segmentation experiments the buildings androads were labelled as 1 and 2 respectively The backgroundpixels were always marked with 0 The test datasets were an-notated manually

Note that only pre-disaster data was used for training theneural networks and the segmentation based results The post-disaster imagery was used purely for inference and for obtain-ing the post-disaster mapping results

43 Metrics

The Jaccard Index or Intersection over Union (IoU) is a typ-ical per-pixel metric for evaluating segmentation results It isgiven by Eq 4 and measures the overlap of predicted labelswith the true labels For the binary segmentation cases theIoU for the target class is reported and for the multi-class casethe mean IoU (mIOU) over the target classes is also reportedThe IoU for the background class is not included since the highnumber of background pixels would bias the results

IoU =T P

T P + FP + FN(4)

The IoU metric measures the segmentation performance butis not the most suitable metric for graphs because a small gapin the segmentation mask may only cause a small error in theIoU metric but can lead to large detours if the resulting road net-work is used for navigation As outlined in Section 34 compar-ing two topological graphs is a non-trivial task and graph con-nectivity is as important as graph completeness Herein twometrics are used the first to evaluate the completeness of thegraph and the second to evaluate the connectivity of the gener-ated graph

The first metric measures the similarity of the sub-segments described in Section 3.4 using the precision-recall metrics given in Eq. 5.

2 https://www.openstreetmap.org


Table 1: Encoder Structures

Block | VGG11                | VGG16                           | ResNet18                   | ResNet34
enc1  | conv3-64             | conv3-64, conv3-64              | conv7-64                   | conv7-64
enc2  | conv3-128            | conv3-128, conv3-128            | [conv3-64, conv3-64] x2    | [conv3-64, conv3-64] x3
enc3  | conv3-256, conv3-256 | conv3-256, conv3-256, conv3-256 | [conv3-128, conv3-128] x2  | [conv3-128, conv3-128] x4
enc4  | conv3-512, conv3-512 | conv3-512, conv3-512, conv3-512 | [conv3-256, conv3-256] x2  | [conv3-256, conv3-256] x6
enc5  | conv3-512, conv3-512 | conv3-512, conv3-512, conv3-512 | [conv3-512, conv3-512] x2  | [conv3-512, conv3-512] x3

Table 2: Decoder Structures

Block  | UNet               | UNetUp                | LinkNet
center | dec_unet(512, 256) | dec_unet_up(512, 256) | dec_link(512, 256)
dec5   | dec_unet(512, 256) | dec_unet_up(512, 256) | dec_link(256, 128)
dec4   | dec_unet(256, 128) | dec_unet_up(256, 128) | dec_link(128, 64)
dec3   | dec_unet(128, 64)  | dec_unet_up(128, 64)  | dec_link(64, 64)
dec2   | dec_unet(64, 32)   | dec_unet_up(64, 32)   | convTran3-32
dec1   | conv3-32           | conv3-32              | conv3-32
final  | conv3-nc           | conv3-nc              | conv3-nc

Table 3: Decoder Block Structure

Block             | Layers
dec_unet(a, b)    | conv3-a, convTran4-b
dec_unet_up(a, b) | upsample, conv3-a, conv3-b
dec_link(a, b)    | conv3-a/4, convTran4-a/4, conv3-b
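A hedged PyTorch reading of the decoder blocks in Table 3; the input channel widths and activation placement are our assumptions for a self-contained sketch, not the authors' exact code (in the real UNet decoder the input width also depends on the concatenated skip features).

```python
import torch.nn as nn

def dec_unet(a, b):
    # Table 3: conv3-a followed by a stride-2 transposed convolution (convTran4-b).
    return nn.Sequential(
        nn.Conv2d(a, a, 3, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(a, b, 4, stride=2, padding=1), nn.ReLU(inplace=True))

def dec_unet_up(a, b):
    # Table 3: bilinear upsampling followed by two 3x3 convolutions.
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
        nn.Conv2d(a, a, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(a, b, 3, padding=1), nn.ReLU(inplace=True))

def dec_link(a, b):
    # Table 3: shrink to a/4 channels, upsample with a transposed conv, expand to b.
    return nn.Sequential(
        nn.Conv2d(a, a // 4, 3, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(a // 4, a // 4, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(a // 4, b, 3, padding=1), nn.ReLU(inplace=True))
```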

precision = TP / (TP + FP)

recall = TP / (TP + FN)

F-score = 2 × (precision × recall) / (precision + recall)    (5)

The metric proposed in [41] has been reported for evaluating graph connectivity. This metric measures the similarity of graphs by comparing the shortest path length for a large set of random source-destination pairs between the actual graph and the predicted graph. If the extracted paths have a similar length, they can be assumed to be a match and are marked as 'Correct' in the results. If the path length in the predicted graph is smaller than in the actual graph, the generated graph has incorrect connections, and this is reported as 'Too Short'. Conversely, if there are incorrect gaps in the predicted graph, the paths are either 'Too Long' or there are no possible paths, giving 'No Connections'.
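A sketch of this comparison using networkx; the tolerance, the number of sampled pairs and the assumption of shared node ids are placeholders, and the original metric of [41] may differ in detail.

```python
import random
import networkx as nx

def connectivity_stats(g_true, g_pred, pairs=1000, tol=0.05):
    # Both graphs are assumed to share node ids and store edge lengths in a
    # 'length' attribute; a real pipeline would match nodes spatially instead.
    counts = {'Correct': 0, 'Too Short': 0, 'Too Long': 0, 'No Connection': 0}
    nodes = list(g_true.nodes)
    for _ in range(pairs):
        s, t = random.sample(nodes, 2)
        if not nx.has_path(g_true, s, t):
            continue  # only score pairs that are connected in the reference graph
        d_true = nx.shortest_path_length(g_true, s, t, weight='length')
        try:
            d_pred = nx.shortest_path_length(g_pred, s, t, weight='length')
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            counts['No Connection'] += 1
            continue
        if abs(d_pred - d_true) <= tol * d_true:
            counts['Correct'] += 1
        elif d_pred < d_true:
            counts['Too Short'] += 1   # spurious shortcuts in the predicted graph
        else:
            counts['Too Long'] += 1    # gaps force detours in the predicted graph
    return counts
```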

4.4 Training Details

The models were trained using the Adam optimiser [21] with a learning rate of 10⁻⁴. A minibatch size of 5 images was used for all the UNet-based models and 32 images for the other models. The models were built in PyTorch [30]. The VGG11, VGG16, ResNet18 and ResNet34 models provided by the PyTorch model zoo were used for initialising the encoder networks in the pretrained networks. He initialisation [17] was used for all the other layers.

The training images and their corresponding masks were cropped to 416×416 pixels and were augmented with horizontal and vertical flipping. All images were zero-mean normalised. Only the pre-disaster images were used for training and the post-disaster images were used for inference.

All models were trained for 600 epochs to enable a fair comparison between the different models. A validation set was used to prevent overfitting; the final model used for measuring the performance was taken at the point where the validation loss had converged, just before it started to diverge (i.e. overfit the training set).
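A minimal sketch of this training setup in PyTorch; the dataset, model and loss are stand-ins, and only the hyperparameters stated above (Adam, learning rate 10⁻⁴, batch size 5, 416×416 crops, 600 epochs, validation-based model selection) are taken from the text.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins for the real aerial crops and segmentation network.
images = torch.randn(20, 3, 416, 416)          # zero-mean normalised crops
masks = torch.randint(0, 3, (20, 416, 416))    # 0 background, 1 buildings, 2 roads
train_loader = DataLoader(TensorDataset(images[:15], masks[:15]), batch_size=5, shuffle=True)
val_loader = DataLoader(TensorDataset(images[15:], masks[15:]), batch_size=5)

model = nn.Conv2d(3, 3, kernel_size=3, padding=1)   # placeholder for a UNet/LinkNet model
optimiser = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

best_val, best_state = float('inf'), None
for epoch in range(600):
    model.train()
    for x, y in train_loader:
        optimiser.zero_grad()
        criterion(model(x), y).backward()
        optimiser.step()
    # Track the validation loss and keep the best weights, as described above.
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
    if val_loss < best_val:
        best_val, best_state = val_loss, model.state_dict()
```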

5. Results

5.1 Segmentation Results

Our experiments tested the following scenarios:


Figure 5: Buildings Validation Loss

• Effect of pretraining with ImageNet on aerial image segmentation models

• Efficiency vs accuracy trade-off between different popular architectures

5.1.1 Effect of pretraining on aerial image segmentation

The first set of experiments was conducted to analyse the effect of pretraining with ImageNet on the segmentation task. The purpose of these experiments was two-fold: firstly, to test whether pretraining on a large classification dataset such as ImageNet improves the accuracy on an unrelated task where the image statistics are quite different (ground-based object images for classification vs aerial images for segmentation); secondly, assuming the accuracy with pretraining is similar to or better than that without, to test whether pretraining improves the convergence speed.

Four different network architectures were tested: UNet with a VGG11 or VGG16 backend, and LinkNet with a ResNet18 or ResNet34 backend. These models were trained for segmentation of buildings and roads, and their validation loss curves are shown in Fig. 5 and Fig. 6 respectively. From the loss curves it can be seen that the pretrained networks generally converged quicker than their non-pretrained equivalents, requiring approximately 10 fewer epochs regardless of the model used and the target class. The training curves for the VGG-based UNet models on the road segmentation task also show that these models did not converge very well when training from scratch, even though their validation loss, which was used for preventing overfitting, was lower than that of their pretrained equivalents.

The results of these models on the test set, summarised in Table 4, show that in general the pretrained models had a higher IoU, by a couple of points, than their non-pretrained equivalents; this corresponds to previous results in the literature [2]. Note that this section focuses on identifying the differences between training models from scratch and using pretrained encoders. The results across the different models are compared in the next section.

Figure 6: Roads Validation Loss

Table 4: Effects of pretraining. All results given as IoU.

Model              | Pretrained | Roads | Buildings
UNet (VGG11)       | No         | 37.73 | 57.47
UNet (VGG11)       | Yes        | 39.36 | 57.58
UNet (VGG16)       | No         | 39.2  | 57.72
UNet (VGG16)       | Yes        | 40.07 | 59.72
LinkNet (ResNet18) | No         | 32.3  | 51.29
LinkNet (ResNet18) | Yes        | 35.08 | 57.08
LinkNet (ResNet34) | No         | 35.42 | 54.82
LinkNet (ResNet34) | Yes        | 37.2  | 57.15


5.1.2 Model capacity, design and accuracy

Visualisations of the binary segmentation results for the different models can be seen in Figure 7 and Figure 8, and their quantitative performance is reported in Table 5. The sizes of these different models are reported in Table 6. From the results it can be seen that the proposed UNetUp with a VGG16 encoder outperformed all other models by a couple of points on each task.

The building segmentation masks in Figure 7 show that the ENet-based models led to fairly blob-like outputs without clear boundaries. The LinkNet models gave more distinct boundaries, but the clearest results were those of the UNet and UNetUp models. The road segmentation masks did not appear as distinctively different in terms of visual comparison, though the ENet-based models seemed to miss the most segments in this case.

It is interesting to note that model performance was not directly correlated with model size. For instance, in the binary segmentation task in Table 5 it can be seen that ENetSeparable outperformed ENet even though it had 30% fewer parameters. The number of parameters in these two models is smaller by two orders of magnitude compared to all the other models tested, yet for the binary segmentation task they were close to the top-performing UNetUp (VGG16) model. However, the trade-off between model size and capacity became obvious in the multiclass segmentation task, where the smaller models did not converge.
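ENetSeparable's parameter savings presumably come from swapping standard convolutions for depthwise separable ones, as its name and the citation of [8] suggest; the generic PyTorch sketch below shows that construction and the resulting parameter reduction, and is not the authors' exact ENetSeparable block.

```python
import torch.nn as nn

def separable_conv(in_ch, out_ch, kernel_size=3):
    # Depthwise convolution (one filter per channel) followed by a 1x1 pointwise conv.
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, kernel_size=1))

standard = nn.Conv2d(64, 64, 3, padding=1)
separable = separable_conv(64, 64)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # 36928 vs 4800 parameters
```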

From the results it can be seen that the VGG-based encoders outperformed the ResNet-based encoders on all tasks. For instance, it can be seen from Table 5 that for the road segmentation task using the UNet models, the VGG11 and VGG16 encoders were consistently better than the ResNet18 and ResNet34 encoders. This performance difference can also be seen across the building segmentation and multi-class segmentation tasks.

The major difference between the UNet models and the LinkNet models is the way the skip link features are treated. In the former, the skip link features are concatenated with the corresponding decoder features, whereas in the latter they are added to the decoder features to make the process more efficient. From the results it can be observed that the feature concatenation in the UNet models allowed the network to learn more discriminative features, as these models always outperformed their LinkNet equivalents even when the encoder was the same.

Finally, the proposed UNetUp with a VGG16 encoder outperformed all other models on the segmentation tasks. It can also be seen that the UNetUp models outperformed the equivalent UNet models when controlled for the encoder, even though they have fewer parameters, since they use upsampling instead of transposed convolution layers.

Table 5 also shows that all the models performed better on the binary segmentation tasks than on multi-class segmentation of the same classes. This seems to imply that more training data does not necessarily improve the performance if the task is more complex. An example of the results of the UNetUp (VGG16) model on the multi-class segmentation problem is shown in Figure 9. The sample image is the same as the one shown for the building segmentation case in Figure 7, and by comparing the two it can be seen that the results in the multi-class case are less distinctive and more blob-like.

5.2 Quantitative Disaster Mapping Results

The precision-recall results of the obtained road networks are given in Table 7. The road networks created from the segmentation mask of the post-disaster image are denoted as Post. The results of the proposed method, where the OSM road network was updated by removing all destroyed road segments, are given as Diff.

The Post results convey the generalisation capability of the tested networks across image datasets from different times, since the networks were trained on pre-disaster imagery while the evaluation was over the post-disaster imagery. In contrast to the pre-disaster results, the best performing model for precision-recall was UNet with a ResNet backend.

It can be seen that in the case of Post the precision was usually much higher than the recall, implying that the segmentation networks had a higher number of false negatives than false positives. This was due to gaps in the segmentation masks caused by occlusions from shadows, buildings, etc. LinkNet with a ResNet34 backend gave the highest recall in this case.

As Table 7 shows, the proposed Diff framework helped improve the generated road graph regardless of the base network used. The differences in results between the various architectures also became less pronounced: the gap between the maximum and minimum F-score in the case of Post was approximately 8%, whereas for Diff it was 2%. This was largely because the proposed method benefited from prior knowledge from OSM. Note that the OSM data is not completely accurate [24]. However, based on empirical observations, using this data provides significantly better results than assuming no prior knowledge.

The connectivity results of the estimated post-disaster road networks are reported in Table 8. Similar to the precision-recall results, it can be seen that the proposed framework improved the results by a large margin. This was due to the fact that the output of the segmentation networks often had gaps, which caused missing connections in the generated road networks. The use of the OSM network, which is properly connected, as an initial estimate helped deal with these missing connections. This conjecture is supported by the number of pairs marked as having 'No Connections' in Table 8, where using the Diff framework reduced the number of 'No Connection' pairs to half of those from Post. The Post results had a number of small disconnected segments and some spurious paths caused by a non-ideal segmentation mask. The Diff results, on the other hand, were much better connected. However, Diff did have some incorrect segments where the mask difference missed segments.
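A simplified sketch of this Diff update, assuming the OSM road network is held as a networkx graph whose edges carry a list of lat/lon points and that the damage is given as a binary change mask; the attribute names, the point-to-pixel mapping and the damage threshold are all illustrative rather than the authors' exact procedure.

```python
import numpy as np
import networkx as nx

def apply_diff(osm_graph, change_mask, to_pixels, damaged_fraction=0.5):
    # Remove OSM edges whose sampled points mostly fall on changed (damaged) pixels.
    updated = osm_graph.copy()
    for u, v, data in osm_graph.edges(data=True):
        rows_cols = [to_pixels(p) for p in data['pts']]      # 'pts': per-edge geometry
        hits = [change_mask[r, c] for r, c in rows_cols
                if 0 <= r < change_mask.shape[0] and 0 <= c < change_mask.shape[1]]
        if hits and np.mean(hits) >= damaged_fraction:
            updated.remove_edge(u, v)
    return updated
```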

5.3 Qualitative Disaster Impact Results

As outlined in Section 3.2, the difference between the segmentation masks from pre-disaster and post-disaster imagery can be used for disaster impact assessment. This process is illustrated in Figure 10.


Figure 7: Visualisation of the building segmentation results using pretrained encoders. Panels: (a) Image, (b) Ground Truth, (c) ENet, (d) ENetSeparable, (e) LinkNet (ResNet18), (f) LinkNet (ResNet34), (g) UNet (VGG11), (h) UNet (VGG16), (i) UNet (ResNet18), (j) UNet (ResNet34), (k) UNetUp (VGG11), (l) UNetUp (VGG16), (m) UNetUp (ResNet18), (n) UNetUp (ResNet34).


Figure 8: Visualisation of the road segmentation results using pretrained encoders. Panels: (a) Image, (b) Ground Truth, (c) ENet, (d) ENetSeparable, (e) LinkNet (ResNet18), (f) LinkNet (ResNet34), (g) UNet (VGG11), (h) UNet (VGG16), (i) UNet (ResNet18), (j) UNet (ResNet34), (k) UNetUp (VGG11), (l) UNetUp (VGG16), (m) UNetUp (ResNet18), (n) UNetUp (ResNet34).


Table 5: Segmentation Results (IoU) using pretrained models

Model              | Binary Roads | Binary Buildings | Multiclass Roads | Multiclass Buildings | Multiclass Average
UNet (VGG11)       | 39.36 | 57.58 | 35.91 | 51.92 | 43.92
UNet (VGG16)       | 40.07 | 59.72 | 35.12 | 52.23 | 43.68
UNet (ResNet18)    | 36.43 | 57.90 | 30.79 | 50.71 | 40.75
UNet (ResNet34)    | 37.56 | 58.29 | 34.18 | 51.08 | 42.63
LinkNet (ResNet18) | 35.08 | 57.08 | 29.25 | 50.59 | 39.92
LinkNet (ResNet34) | 37.2  | 57.15 | 32.70 | 50.09 | 41.40
ENet               | 36.34 | 59.44 | -     | -     | -
ENetSeparable      | 37.44 | 59.58 | -     | -     | -
UNetUp (VGG11)     | 39.16 | 58.72 | 33.86 | 50.66 | 42.26
UNetUp (VGG16)     | 41.13 | 60.04 | 36.12 | 53.86 | 44.99
UNetUp (ResNet18)  | 37.48 | 58.08 | 33.64 | 52.62 | 41.04
UNetUp (ResNet34)  | 38.97 | 58.39 | 34.69 | 52.94 | 41.91

Figure 9: Multi-class segmentation output. Panels: (a) Image, (b) Ground Truth, (c) UNetUp (VGG16).


Table 6: Model Size

Model              | # Params   | Size
ENetSeparable      | 226,596    | 1.1 MB
ENet               | 349,068    | 1.6 MB
LinkNet (ResNet18) | 11,686,561 | 46.8 MB
UNetUp (ResNet18)  | 20,290,377 | 81.2 MB
LinkNet (ResNet34) | 21,794,721 | 87.3 MB
UNet (ResNet18)    | 22,383,433 | 89.6 MB
UNetUp (VGG11)     | 22,927,393 | 91.7 MB
UNet (VGG11)       | 25,364,513 | 101.5 MB
UNetUp (VGG16)     | 29,306,465 | 117.2 MB
UNetUp (ResNet34)  | 30,398,537 | 121.7 MB
UNet (VGG16)       | 32,202,337 | 128.8 MB
UNet (ResNet34)    | 32,491,593 | 130.1 MB

Figure 10: Disaster impact assessment. Left: road and building masks from satellite imagery using the segmentation network, with buildings in yellow and roads in blue. Top right: estimated difference between the infrastructure before and after the disaster, shown in red. Bottom right: change heatmap overlaid onto an image of the test region.

In Figure 10, the differences in the buildings and roads caused by the disaster are marked in red in the image on the top right.

The area under consideration was divided into a grid of cells of a fixed size, and the number of changed pixels per grid cell was used as an overall estimate of the damage caused to a particular area. This has been plotted as a heatmap in Figure 10. The heatmap shows that the major destruction was along the coast and in an area in the south-west of Palu city. This finding corresponds to the European Commission's (EC) Copernicus Emergency Mapping Service report [13] for the area. A major portion of the coast was washed away by the tsunami, and the south-west region of the city was destroyed by soil liquefaction.
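A small NumPy sketch of this per-cell damage estimate; the cell size is a placeholder, as the paper does not state one.

```python
import numpy as np

def damage_heatmap(change_mask, cell_size=64):
    # Sum the changed pixels inside each cell of a fixed-size grid over the mask.
    h, w = change_mask.shape
    rows, cols = h // cell_size, w // cell_size
    heat = np.zeros((rows, cols), dtype=np.int64)
    for i in range(rows):
        for j in range(cols):
            cell = change_mask[i * cell_size:(i + 1) * cell_size,
                               j * cell_size:(j + 1) * cell_size]
            heat[i, j] = cell.sum()
    return heat

mask = (np.random.rand(512, 512) > 0.95).astype(np.uint8)  # toy change mask
print(damage_heatmap(mask).shape)  # (8, 8)
```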

6. Conclusions

This work provides a comparison among different segmentation models and presents a framework for the identification of damaged areas and accessible roads in post-disaster scenarios using satellite imagery. The framework leverages pre-existing knowledge from OSM to obtain a robust estimate of the affected road network.

The performances of various models for the tasks of binary and multi-class semantic segmentation in aerial images have been analysed and compared. The results show that using encoders pretrained on ImageNet improved the training time by around 10 epochs and the accuracy by a couple of percentage points, despite the domain gap between ImageNet and aerial images.

Comparing the effects of using different encoders for the task of semantic segmentation, it could be seen that VGG16 outperformed all other feature extraction modules. The trade-off between accuracy and efficiency has also been studied, and an extremely efficient neural network termed ENetSeparable was proposed: it has 30% fewer parameters than ENet and still performed better on the binary segmentation task.

For post-disaster scenarios, areas affected by the disaster were identified using the difference in the predicted segmentation masks. The evaluated road changes were used to update the road networks available from OSM. There was a significant difference in the results of the various segmentation networks, where the F-score varied by as much as 8%. The use of the proposed framework alleviated these differences and brought the variation in F-score down to 2%. The highest F-score achieved with the proposed framework was 94.76, compared to the highest F-score of 73.98 from the segmentation networks alone.

The proposed framework uses OSM data for training and does not require time-consuming, manually annotated post-disaster data. Finally, a qualitative assessment of the aftermath damage can be generated easily, as shown for the Palu tsunami, where it was validated against the European Commission report.

This work can be further improved in a number of ways. Namely, the results of the different models could be ensembled to help improve the road connectivity results. Classification of damage could be used to identify whether the infrastructure has been completely destroyed, as in the case of soil liquefaction, or whether a road blockage is something that can be dealt with relatively easily, such as one caused by a fallen tree.

ACKNOWLEDGMENT

The authors would like to thank Dr Andrew West, Dr Thomas Wright and Ms Elisabeth Welburn for their valuable comments and feedback. A. Gupta is funded by a Scholarship from the Department of Electrical and Electronic Engineering, The University of Manchester, and the ACM SIGHPC/Intel Computational and Data Science Fellowship.

References

[1] Adriano, B., Xia, J., Baier, G., Yokoya, N., Koshimura, S.: Multi-Source Data Fusion Based on Ensemble Learning for Rapid Building Damage Mapping during the 2018 Sulawesi Earthquake and Tsunami in Palu, Indonesia. Remote Sensing 11(7), 886 (2019). https://doi.org/10.3390/rs11070886


Table 7: Precision/Recall of Sub-segments (values given as Post / Diff)

Model              | Precision (Post / Diff) | Recall (Post / Diff) | F-score (Post / Diff)
ENet               | 70.36 / 92.49 | 65.97 / 93.13 | 68.09 / 92.81
ENetSeparable      | 61.20 / 92.65 | 67.12 / 95.5  | 64.02 / 94.05
LinkNet (ResNet18) | 77.02 / 93.8  | 70.36 / 95.29 | 73.54 / 94.54
LinkNet (ResNet34) | 71.8  / 95.18 | 76.29 / 94.34 | 73.98 / 94.76
UNet (VGG11)       | 73.66 / 94.8  | 67.87 / 92.8  | 70.65 / 93.8
UNet (VGG16)       | 75.73 / 94.99 | 72.3  / 94.02 | 73.98 / 94.5
UNet (ResNet18)    | 81.71 / 93.64 | 63.86 / 92.27 | 71.69 / 92.95
UNet (ResNet34)    | 78.32 / 94.09 | 74.57 / 95.41 | 76.40 / 94.75
UNetUp (VGG11)     | 71.30 / 94.4  | 67.31 / 93.93 | 69.25 / 94.16
UNetUp (VGG16)     | 67.17 / 95.1  | 70.93 / 93.68 | 69.0  / 94.38
UNetUp (ResNet18)  | 78.17 / 94.32 | 67.42 / 95.12 | 72.40 / 94.72
UNetUp (ResNet34)  | 76.4  / 93.76 | 67.14 / 93.67 | 71.47 / 93.71

Table 8: Connectivity results. All numbers given as percentages (Post / Diff).

Model              | Correct (Post / Diff) | Too Long (Post / Diff) | Too Short (Post / Diff) | No Connection (Post / Diff)
ENet               | 21.82 / 54.02 | 19.15 / 14.81 | 2.29 / 1.26 | 56.66 / 29.83
ENetSeparable      | 20.29 / 62.22 | 18.43 / 14.94 | 2.18 / 1.35 | 59.04 / 21.39
LinkNet (ResNet18) | 18.33 / 70.36 | 23.65 / 6.93  | 2.35 / 1.35 | 55.59 / 21.25
LinkNet (ResNet34) | 31.05 / 74.59 | 14.97 / 5.2   | 3.97 / 1.41 | 49.96 / 18.71
UNet (VGG11)       | 25.15 / 65.56 | 25.58 / 9.99  | 5.28 / 1.02 | 43.94 / 23.33
UNet (VGG16)       | 26.8  / 67.68 | 31.64 / 8.09  | 5.42 / 0.97 | 36.4  / 23.16
UNet (ResNet18)    | 21.03 / 58.62 | 20.15 / 12.57 | 4.72 / 1.07 | 54.0  / 27.65
UNet (ResNet34)    | 37.04 / 75.58 | 17.13 / 4.41  | 4.68 / 1.43 | 41.1  / 18.49
UNetUp (VGG11)     | 19.58 / 61.03 | 14.22 / 13.58 | 1.94 / 0.99 | 64.23 / 24.3
UNetUp (VGG16)     | 24.3  / 66.18 | 23.5  / 8.98  | 4.71 / 1.09 | 47.43 / 23.66
UNetUp (ResNet18)  | 22.44 / 67.35 | 21.64 / 8.46  | 2.66 / 1.14 | 53.23 / 23.05
UNetUp (ResNet34)  | 18.71 / 66.01 | 23.9  / 8.89  | 2.24 / 1.57 | 55.1  / 23.44


[2] Audebert, N., Saux, B.L., Lefevre, S.: Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks (2016). http://arxiv.org/abs/1609.06846

[3] Bastani, F., He, S., Abbar, S., Alizadeh, M., Balakrishnan, H., Chawla, S., Madden, S., DeWitt, D.: RoadTracer: Automatic Extraction of Road Networks from Aerial Images. In: Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00496

[4] Bhattacharjee, S., Roy, S., Das Bit, S.: Post-disaster map builder: Crowdsensed digital pedestrian map construction of the disaster affected areas through smartphone based DTN. Computer Communications 134, 96–113 (2019). https://doi.org/10.1016/j.comcom.2018.11.010

[5] Boccardo, P., Giulio Tonolo, F.: Remote Sensing Role in Emergency Mapping for Disaster Response. In: Engineering Geology for Society and Territory - Volume 5, pp. 17–24. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-09048-1_3

[6] Chaurasia, A., Culurciello, E.: LinkNet: Exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE Visual Communications and Image Processing (VCIP 2017), pp. 1–4 (2018). https://doi.org/10.1109/VCIP.2017.8305148

[7] Chen, L., Fan, X., Wang, L., Zhang, D., Yu, Z., Li, J., Nguyen, T.M.T., Pan, G., Wang, C.: RADAR. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1(4), 1–23 (2018). https://doi.org/10.1145/3161159

[8] Chollet, F.: Xception: Deep Learning with Depthwise Separable Convolutions (2016). http://arxiv.org/abs/1610.02357

[9] Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., Raska, R.: DeepGlobe 2018: A challenge to parse the earth through satellite images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 172–181 (2018). https://doi.org/10.1109/CVPRW.2018.00031

[10] DigitalGlobe: Open Data Initiative. https://www.digitalglobe.com/ecosystem/open-data

[11] Doshi, J., Basu, S., Pang, G.: From Satellite Imagery to Disaster Insights (2018). http://arxiv.org/abs/1812.07033

[12] Douglas, D.H., Peucker, T.K.: Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10(2), 112–122 (1973). https://doi.org/10.3138/FM57-6770-U75U-7727

[13] European Commission Joint Research Centre: Mw 7.5 Earthquake in Indonesia, 28 September 2018. Emergency Report, Tech. rep. (2018)

[14] Fujita, A., Sakurada, K., Imaizumi, T., Ito, R., Hikosaka, S., Nakamura, R.: Damage detection from aerial images via convolutional neural networks. In: 15th IAPR International Conference on Machine Vision Applications, pp. 5–8. IEEE (2017). https://doi.org/10.23919/MVA.2017.7986759

[15] Gupta, A., Welburn, E., Watson, S., Yin, H.: CNN-Based Semantic Change Detection in Satellite Imagery. In: International Conference on Artificial Neural Networks, pp. 669–684 (2019). https://doi.org/10.1007/978-3-030-30493-5_61

[16] He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition (2015). http://arxiv.org/pdf/1512.03385v1.pdf

[17] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015). https://doi.org/10.1109/ICCV.2015.123

[18] Iglovikov, V., Shvets, A.: TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation (2018). http://arxiv.org/abs/1801.05746

[19] Jin, J., Dundar, A., Culurciello, E.: Flattened Convolutional Neural Networks for Feedforward Acceleration (2014). http://arxiv.org/abs/1412.5474

[20] Kaiser, P., Wegner, J.D., Lucchi, A., Jaggi, M., Hofmann, T., Schindler, K.: Learning Aerial Image Segmentation from Online Maps. IEEE Transactions on Geoscience and Remote Sensing (2017). https://doi.org/10.1109/TGRS.2017.2719738

[21] Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. In: International Conference on Learning Representations (2015). http://arxiv.org/abs/1412.6980

[22] Li, S., Lyons, J., Voigt, S., Muthike, D.M., Pedersen, W., Giulio-Tonolo, F., Schneiderhan, T., Guha-Sapir, D., Kaku, K., Proy, C., Bequignon, J., Czaran, L., Platzeck, G., Kucera, J., Hazarika, M.K., James, G.K., Jones, B.: Global trends in satellite-based emergency mapping. Science 353(6296), 247–252 (2016). https://doi.org/10.1126/science.aad8728

[23] Liu, Z., Zhang, J., Li, X.: An automatic method for road centerline extraction from post-earthquake aerial images (2019). https://doi.org/10.1016/j.geog.2018.11.008

[24] Mattyus, G., Luo, W., Urtasun, R.: DeepRoadMapper: Extracting Road Topology from Aerial Images. In: IEEE International Conference on Computer Vision, pp. 3458–3466. IEEE (2017). https://doi.org/10.1109/ICCV.2017.372

[25] Miller, G.: The Huge Unseen Operation Behind the Accuracy of Google Maps (2014). https://www.wired.com/2014/12/google-maps-ground-truth

[26] Nair, V., Hinton, G.E.: Rectified Linear Units Improve Restricted Boltzmann Machines. In: 27th International Conference on Machine Learning (2010)

[27] Odena, A., Dumoulin, V., Olah, C.: Deconvolution and Checkerboard Artifacts. Distill 1(10) (2016). https://doi.org/10.23915/distill.00003

[28] OpenStreetMap Contributors: Planet dump retrieved from https://planet.osm.org (2017)

[29] Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation (2016). http://arxiv.org/abs/1606.02147

[30] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)

[31] Pathier, E., Fielding, E.J., Wright, T.J., Walker, R., Parsons, B.E., Hensley, S.: Displacement field and slip distribution of the 2005 Kashmir earthquake from SAR imagery. Geophysical Research Letters 33(20), L20310 (2006). https://doi.org/10.1029/2006GL027193

[32] Poiani, T.H., Rocha, R.d.S., Degrossi, L.C., Albuquerque, J.P.d.: Potential of collaborative mapping for disaster relief: A case study of OpenStreetMap in the Nepal earthquake 2015. In: Proceedings of the Annual Hawaii International Conference on System Sciences, pp. 188–197. IEEE (2016). https://doi.org/10.1109/HICSS.2016.31

[33] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science, vol. 9351, pp. 234–241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28

[34] Rudner, T.G.J., Rußwurm, M., Fil, J., Pelich, R., Bischke, B., Kopackova, V., Bilinski, P.: Multi3Net: Segmenting Flooded Buildings via Fusion of Multiresolution, Multisensor, and Multitemporal Satellite Imagery. In: Thirty-Third AAAI Conference on Artificial Intelligence (2019). http://arxiv.org/abs/1812.01756

[35] Schumann, G., Hostache, R., Puech, C., Hoffmann, L., Matgen, P., Pappenberger, F., Pfister, L.: High-Resolution 3-D Flood Information From Radar Imagery for Flood Hazard Management. IEEE Transactions on Geoscience and Remote Sensing 45(6), 1715–1725 (2007). https://doi.org/10.1109/TGRS.2006.888103

[36] Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations, pp. 1–14 (2015). http://arxiv.org/abs/1409.1556

[37] Singh, S., Batra, A., Pang, G., Torresani, L., Basu, S., Paluri, M., Jawahar, C.: Self-supervised Feature Learning for Semantic Segmentation of Overhead Imagery. In: BMVC (2018)

[38] Sun, T., Chen, Z., Yang, W., Wang, Y.: Stacked U-Nets with multi-output for road extraction. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 187–191 (2018). https://doi.org/10.1109/CVPRW.2018.00033

[39] Van Etten, A.: You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery (2018). http://arxiv.org/abs/1805.09512

[40] Van Etten, A., Lindenbaum, D., Bacastow, T.M.: SpaceNet: A Remote Sensing Dataset and Challenge Series (2018). http://arxiv.org/abs/1807.01232

[41] Wegner, J.D., Montoya-Zegarra, J.A., Schindler, K.: Road networks as collections of minimum cost paths. ISPRS Journal of Photogrammetry and Remote Sensing 108, 128–137 (2015). https://doi.org/10.1016/j.isprsjprs.2015.07.002

[42] Zhang, L., Zhang, L., Du, B.: Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art. IEEE Geoscience and Remote Sensing Magazine 4(2), 22–40 (2016). https://doi.org/10.1109/MGRS.2016.2540798

[43] Zhu, X.X., Tuia, D., Mou, L., Xia, G.S., Zhang, L., Xu, F., Fraundorfer, F.: Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geoscience and Remote Sensing Magazine 5(4), 8–36 (2017). https://doi.org/10.1109/MGRS.2017.2762307




[30] Paszke A Gross S Chintala S Chanan G Yang E DeVito ZLin Z Desmaison A Antiga L Lerer A Automatic differentiationin PyTorch (2017)

[31] Pathier E Fielding EJ Wright TJ Walker R Parsons BE Hens-ley S Displacement field and slip distribution of the 2005 Kashmirearthquake from SAR imagery Geophysical Research Letters 33(20)L20310 (10 2006) httpsdoiorg1010292006GL027193 httpdoiwileycom1010292006GL027193

[32] Poiani TH Rocha RdS Degrossi LC Albuquerque JPd Poten-tial of collaborative mapping for disaster relief A case study of open-streetmap in the Nepal earthquake 2015 In Proceedings of the Annual

14

Hawaii International Conference on System Sciences vol 2016-Marchpp 188ndash197 IEEE (1 2016) httpsdoiorg101109HICSS201631httpieeexploreieeeorgdocument7427206

[33] Ronneberger O Fischer P Brox T U-net Convolutional net-works for biomedical image segmentation In Lecture Notes in Com-puter Science (including subseries Lecture Notes in Artificial Intel-ligence and Lecture Notes in Bioinformatics) vol 9351 pp 234ndash241 (5 2015) httpsdoiorg101007978-3-319-24574-4 28 httparxivorgabs150504597

[34] Rudner TGJ Ruszligwurm M Fil J Pelich R Bischke B KopackovaV Bilinski P Multi3Net Segmenting Flooded Buildings via Fusionof Multiresolution Multisensor and Multitemporal Satellite Imagery InThirty-Third AAAI Conference on Artificial Intelligence (2019) wwwaaaiorghttparxivorgabs181201756

[35] Schumann G Hostache R Puech C Hoffmann L Matgen PPappenberger F Pfister L High-Resolution 3-D Flood InformationFrom Radar Imagery for Flood Hazard Management IEEE Transac-tions on Geoscience and Remote Sensing 45(6) 1715ndash1725 (6 2007)httpsdoiorg101109TGRS2006888103 httpieeexploreieeeorgdocument4215088

[36] Simonyan K Zisserman A Very Deep ConvolutionalNetworks for Large-Scale Image Recognition InternationalConference on Learning Representations pp 1ndash14 (2015)httpsdoiorg101016jinfsof200809005 httparxivorgabs14091556

[37] Singh S Batra A Pang G Torresani L Basu S Paluri M Jawa-har C Self-supervised Feature Learning for Semantic Segmentation ofOverhead Imagery In BMVC (2018)

[38] Sun T Chen Z Yang W Wang Y Stacked U-nets with multi-output for road extraction In IEEE Computer Society Conference onComputer Vision and Pattern Recognition Workshops vol 2018-Junepp 187ndash191 (2018) httpsdoiorg101109CVPRW201800033httpopenaccessthecvfcomcontent_cvpr_2018_workshopspapersw4Sun_Stacked_U-Nets_With_CVPR_2018_paperpdf

[39] Van Etten A You Only Look Twice Rapid Multi-Scale Object Detec-tion In Satellite Imagery (2018) httparxivorgabs180509512

[40] Van Etten A Lindenbaum D Bacastow TM SpaceNet A Re-mote Sensing Dataset and Challenge Series Arxiv (2018) httpswwwsemanticscholarorgpaperSpaceNethttparxivorgabs180701232

[41] Wegner JD Montoya-Zegarra JA Schindler K Road net-works as collections of minimum cost paths ISPRS Journalof Photogrammetry and Remote Sensing 108 128ndash137 (102015) httpsdoiorg101016jisprsjprs201507002 httpswwwsciencedirectcomsciencearticlepiiS0924271615001690

[42] Zhang L Zhang L Du B Deep Learning for RemoteSensing Data A Technical Tutorial on the State of the ArtIEEE Geoscience and Remote Sensing Magazine 4(2) 22ndash40 (6 2016) httpsdoiorg101109MGRS20162540798httpieeexploreieeeorgdocument7486259

[43] Zhu XX Tuia D Mou L Xia GS Zhang L Xu F FraundorferF Deep Learning in Remote Sensing A Comprehensive Review and Listof Resources IEEE Geoscience and Remote Sensing Magazine 5(4) 8ndash36 (12 2017) httpsdoiorg101109MGRS20172762307 httpieeexploreieeeorgdocument8113128

15


Figure 4: Dataset extent. Training extent in blue and testing extent in yellow. The split was chosen such that most of the damaged area was part of the testing extent.

4 Experimental Setup

4.1 Neural Network Structures

The structures of the encoders and decoders are summarised in Table 1 and Table 2, respectively. In these tables, convx-y implies a convolutional layer with a kernel size of x and y filters, with nc in the final layer denoting the number of output classes. Similarly, convTranx-y is a transposed convolution layer with a kernel size of x and y filters. The individual decoder block structure for the different models is given in Table 3. All convolutional layers use the ReLU [26] activation, and the encoders include pooling and batch-norm layers as proposed by the original authors. Note that UNet-style architectures concatenate the encoder feature map and the decoder feature map, while LinkNet architectures add the feature maps instead of concatenating them to make the network more efficient.

4.2 Datasets

DigitalGlobe's Open Data Program¹ provides high resolution satellite imagery in the wake of natural disasters to enable a timely response. This study uses data from Palu, Indonesia, which was struck by an earthquake and tsunami on 28 September 2018 and suffered visible damage to its coastlines and infrastructure. The pre-disaster imagery was from 7th April 2018 and the post-disaster imagery was from 1st October 2018. The imagery had a ground sampling distance of approximately 50 cm per pixel.

¹ https://www.digitalglobe.com/ecosystem/open-data

An area of 45 km² around Palu city was extracted for the experiments, with 14 km² of the area with visible damage being set aside for testing. The remainder of the imagery was used for training and validation. The dataset split is visualised in Fig. 4, where the area in yellow was used for testing.

The labels for training the segmentation networks were downloaded from OSM². All polylines marked as motorways, primary, secondary, tertiary, residential, service, trunk and their links were extracted as roads. The roads and buildings in OSM were provided as vectors and polygons, respectively. They were converted to a raster format to create a dataset suitable for training. All the lat-long coordinates were converted to pixel coordinates. The roads were rasterised with a buffer of 2 m and the building polygons were rasterised as filled polygons. For the binary segmentation tasks, separate road and building mask images were generated, where the target classes were labelled as 1. For the multiclass segmentation experiments, the buildings and roads were labelled as 1 and 2, respectively. The background pixels were always marked with 0. The test datasets were annotated manually.
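This rasterisation step can be reproduced with standard open-source tooling. The sketch below is illustrative only: the paper does not specify its implementation, and the function name, file paths and the use of geopandas/rasterio are assumptions.

```python
# Illustrative sketch only: rasterise OSM buildings (class 1) and roads (class 2)
# onto the satellite image grid. Paths, names and libraries are assumptions.
import geopandas as gpd
import rasterio
from rasterio import features

def make_multiclass_mask(image_path, buildings_path, roads_path, road_buffer_m=2.0):
    with rasterio.open(image_path) as src:
        out_shape = (src.height, src.width)
        transform = src.transform
        crs_wkt = src.crs.to_wkt()

    # Reproject the vectors to the image CRS (assumed projected, in metres)
    # so that the 2 m road buffer is meaningful.
    buildings = gpd.read_file(buildings_path).to_crs(crs_wkt)
    roads = gpd.read_file(roads_path).to_crs(crs_wkt)
    road_polygons = roads.geometry.buffer(road_buffer_m)

    shapes = [(geom, 1) for geom in buildings.geometry] + \
             [(geom, 2) for geom in road_polygons]

    # Background stays 0; later shapes overwrite earlier ones where they overlap.
    return features.rasterize(shapes, out_shape=out_shape, transform=transform,
                              fill=0, dtype="uint8")
```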

Note that only pre-disaster data was used for training the neural networks and for the segmentation-based results. The post-disaster imagery was used purely for inference and for obtaining the post-disaster mapping results.

4.3 Metrics

The Jaccard Index, or Intersection over Union (IoU), is a typical per-pixel metric for evaluating segmentation results. It is given by Eq. 4 and measures the overlap of the predicted labels with the true labels. For the binary segmentation cases the IoU for the target class is reported, and for the multi-class case the mean IoU (mIoU) over the target classes is also reported. The IoU for the background class is not included since the high number of background pixels would bias the results.

IoU = TP / (TP + FP + FN)        (4)
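For reference, Eq. 4 translates directly into a few lines of NumPy; the function below is a simple sketch, not the paper's evaluation code.

```python
import numpy as np

def iou(pred, target, cls=1):
    """Intersection over Union (Eq. 4) for one target class of a label mask."""
    p = (pred == cls)
    t = (target == cls)
    tp = np.logical_and(p, t).sum()
    fp = np.logical_and(p, ~t).sum()
    fn = np.logical_and(~p, t).sum()
    return tp / float(tp + fp + fn) if (tp + fp + fn) else 0.0
```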

The IoU metric measures the segmentation performance but is not the most suitable metric for graphs, because a small gap in the segmentation mask may only cause a small error in the IoU metric but can lead to large detours if the resulting road network is used for navigation. As outlined in Section 3.4, comparing two topological graphs is a non-trivial task, and graph connectivity is as important as graph completeness. Herein, two metrics are used: the first to evaluate the completeness of the graph and the second to evaluate the connectivity of the generated graph.

The first metric measures the similarity of the sub-segments described in Section 3.4 using the precision-recall metrics, as follows:

² https://www.openstreetmap.org


Table 1: Encoder Structures

Block   VGG11           VGG16           ResNet18                     ResNet34
enc1    conv3-64        conv3-64 x2     conv7-64                     conv7-64
enc2    conv3-128       conv3-128 x2    (conv3-64, conv3-64) x2      (conv3-64, conv3-64) x3
enc3    conv3-256 x2    conv3-256 x3    (conv3-128, conv3-128) x2    (conv3-128, conv3-128) x4
enc4    conv3-512 x2    conv3-512 x3    (conv3-256, conv3-256) x2    (conv3-256, conv3-256) x6
enc5    conv3-512 x2    conv3-512 x3    (conv3-512, conv3-512) x2    (conv3-512, conv3-512) x3

Table 2: Decoder Structures

Block    UNet                UNetUp                  LinkNet
center   dec_unet(512,256)   dec_unet_up(512,256)    dec_link(512,256)
dec5     dec_unet(512,256)   dec_unet_up(512,256)    dec_link(256,128)
dec4     dec_unet(256,128)   dec_unet_up(256,128)    dec_link(128,64)
dec3     dec_unet(128,64)    dec_unet_up(128,64)     dec_link(64,64)
dec2     dec_unet(64,32)     dec_unet_up(64,32)      convTran3-32
dec1     conv3-32            conv3-32                conv3-32
final    conv3-nc            conv3-nc                conv3-nc

Table 3: Decoder Block Structure

Block               Layers
dec_unet(a,b)       conv3-a, convTran4-b
dec_unet_up(a,b)    upsample, conv3-a, conv3-b
dec_link(a,b)       conv3-a/4, convTran4-a/4, conv3-b

precision = TP / (TP + FP)
recall    = TP / (TP + FN)
F_score   = 2 × (p × r) / (p + r)        (5)
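Eq. 5 can likewise be computed directly from the matched and unmatched sub-segment counts; the helper below is a minimal sketch.

```python
def precision_recall_fscore(tp, fp, fn):
    """Eq. 5: precision, recall and F score from sub-segment match counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f
```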

The metric proposed in [41] has been reported for evaluating graph connectivity. This metric measures the similarity of graphs by comparing the shortest path lengths for a large set of random source-destination pairs between the actual graph and the predicted graph. If the extracted paths have a similar length, they can be assumed to be a match and are marked as 'Correct' in the results. If the path length in the predicted graph is smaller than in the actual graph, the generated graph has incorrect connections and this is reported as 'Too Short'. Conversely, if there are incorrect gaps in the predicted graph, the paths are either 'Too Long' or there are no possible paths, giving 'No Connections'.
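A minimal sketch of this connectivity check is given below, using networkx graphs. The number of sampled pairs, the path-length tolerance and the assumption that the two graphs share node identifiers are simplifications for illustration, not details taken from [41] or from the paper.

```python
# Simplified sketch of the shortest-path connectivity comparison described above.
import random
import networkx as nx

def connectivity_stats(g_true, g_pred, n_pairs=1000, tol=0.05):
    counts = {"Correct": 0, "Too Long": 0, "Too Short": 0, "No Connection": 0}
    nodes = [n for n in g_true.nodes if n in g_pred]   # assumes shared node ids
    for _ in range(n_pairs):
        s, t = random.sample(nodes, 2)
        try:
            d_true = nx.shortest_path_length(g_true, s, t, weight="length")
        except nx.NetworkXNoPath:
            continue                                   # pair unreachable in the reference graph
        try:
            d_pred = nx.shortest_path_length(g_pred, s, t, weight="length")
        except nx.NetworkXNoPath:
            counts["No Connection"] += 1
            continue
        if abs(d_pred - d_true) <= tol * d_true:
            counts["Correct"] += 1
        elif d_pred < d_true:
            counts["Too Short"] += 1
        else:
            counts["Too Long"] += 1
    total = sum(counts.values()) or 1
    return {k: 100.0 * v / total for k, v in counts.items()}  # percentages, as in Table 8
```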

4.4 Training Details

The models were trained using the Adam optimiser [21] with a learning rate of 10⁻⁴. A minibatch size of 5 images was used for all the UNet-based models and 32 images for the other models. The models were built in PyTorch [30]. The VGG11, VGG16, ResNet18 and ResNet34 models provided by the PyTorch model zoo were used for initialising the encoder networks in the pretrained networks. He initialisation [17] was used for all the other layers.
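The corresponding PyTorch setup is straightforward. The snippet below illustrates it with a VGG16 encoder and a deliberately tiny stand-in decoder; the decoder layers and class count are placeholders (the real decoders follow Table 2), and the `pretrained=True` flag reflects the torchvision API of the time (newer releases use `weights=`).

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained encoder from the torchvision model zoo.
encoder = models.vgg16(pretrained=True).features

# Tiny stand-in decoder, only to show He initialisation of the non-pretrained layers.
decoder = nn.Sequential(
    nn.ConvTranspose2d(512, 64, kernel_size=4, stride=2, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 3, kernel_size=3, padding=1),   # 3 classes: background, buildings, roads
)

def he_init(m):
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

decoder.apply(he_init)

model = nn.Sequential(encoder, decoder)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
```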

The training images and their corresponding masks were cropped to 416×416 pixels and were augmented with horizontal and vertical flipping. All images were zero-mean normalised. Only the pre-disaster images were used for training; the post-disaster images were used for inference.
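A joint image/mask version of this augmentation could look like the sketch below; the array layouts and the per-image zero-mean normalisation are assumptions rather than the paper's exact pipeline.

```python
import numpy as np

def augment(image, mask, crop=416):
    """Random 416x416 crop with horizontal/vertical flips, applied jointly
    to an image (C x H x W) and its mask (H x W)."""
    h, w = mask.shape
    top = np.random.randint(0, h - crop + 1)
    left = np.random.randint(0, w - crop + 1)
    image = image[:, top:top + crop, left:left + crop]
    mask = mask[top:top + crop, left:left + crop]
    if np.random.rand() < 0.5:                               # horizontal flip
        image, mask = image[:, :, ::-1], mask[:, ::-1]
    if np.random.rand() < 0.5:                               # vertical flip
        image, mask = image[:, ::-1, :], mask[::-1, :]
    image = image - image.mean(axis=(1, 2), keepdims=True)   # zero-mean normalisation
    return image.copy(), mask.copy()
```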

All models were trained for 600 epochs to enable a fair comparison between the different models. A validation set was used to prevent overfitting: the final model used for measuring performance was taken from the point at which the validation loss had converged, just before it started to diverge (i.e. overfit the training set).

5 Results

5.1 Segmentation Results

Our experiments tested the following scenarios:


Figure 5: Buildings validation loss.

• Effect of pretraining with ImageNet on aerial image segmentation models.

• Efficiency vs. accuracy trade-off between different popular architectures.

5.1.1 Effect of pretraining on aerial image segmentation

The first set of experiments was conducted to analyse the effect of pretraining with ImageNet on the segmentation task. The purpose of these experiments was two-fold: firstly, whether pretraining on a large classification dataset such as ImageNet improves the accuracy of an unrelated task where the image statistics are quite different (ground-based object images for classification vs. aerial images for segmentation); secondly, assuming the accuracy with pretraining is similar to or even better than that without, whether pretraining improves the convergence speed.

Four different network architectures were tested: UNet with a VGG11 or VGG16 backend, and LinkNet with a ResNet18 or ResNet34 backend. These models were trained for segmentation of buildings and roads, and the validation loss curves are shown in Fig. 5 and Fig. 6, respectively. From the loss curves it can be seen that pretrained networks generally converged quicker than their non-pretrained equivalents, requiring approximately 10 fewer epochs regardless of the model used and the target class. The training curves for the VGG-based UNet models on the road segmentation task also show that these models did not converge very well when trained from scratch, though their validation loss, which was used for preventing overfitting, was lower than that of their pretrained equivalents.

The results of these models on the test set, summarised in Table 4, show that in general the pretrained models had a higher IoU by a couple of points compared to their non-pretrained equivalents, and this corresponds to previous results in the literature [2]. Note that this section focuses on identifying the differences between training models from scratch and using pretrained encoders. The results across different models are compared in the next section.

Figure 6: Roads validation loss.

Table 4: Effects of pretraining. All results given as IoU.

Model                 Pretrained   Roads   Buildings
UNet (VGG11)          No           37.73   57.47
UNet (VGG11)          Yes          39.36   57.58
UNet (VGG16)          No           39.2    57.72
UNet (VGG16)          Yes          40.07   59.72
LinkNet (ResNet18)    No           32.3    51.29
LinkNet (ResNet18)    Yes          35.08   57.08
LinkNet (ResNet34)    No           35.42   54.82
LinkNet (ResNet34)    Yes          37.2    57.15


5.1.2 Model capacity, design and accuracy

Visualisations of the binary segmentation results for the different models can be seen in Figure 7 and Figure 8, and their quantitative performance is reported in Table 5. The sizes of these different models are reported in Table 6. From the results it can be seen that the proposed UNetUp with a VGG16 encoder outperformed all other models by a couple of points on each task.

The building segmentation masks in Figure 7 show that the ENet-based models led to fairly blob-like outputs without clear boundaries. The LinkNet models gave more distinct boundaries, but the clearest results were obtained with the UNet and UNetUp models. The road segmentation masks did not appear as distinctively different in terms of visual comparison, though the ENet-based models seemed to miss the most segments in this case.

It is interesting to note that model performance was not directly correlated with model size. For instance, in the binary segmentation task in Table 5 it can be seen that ENetSeparable outperformed ENet even though it had 30% fewer parameters. The number of parameters in these two models was smaller by two orders of magnitude compared to all the other models tested, yet for the binary segmentation task they were close to the top-performing UNetUp (VGG16) model. However, the trade-off between model size and capacity became obvious in the multiclass segmentation task, where the smaller models did not converge.

From the results it can be seen that the VGG-based encoders outperformed the ResNet-based encoders on all tasks. For instance, it can be seen from Table 5 that for the road segmentation task using the UNet models, the VGG11 and VGG16 encoders were consistently better than the ResNet18 and ResNet34 encoders. This performance difference can also be seen across the building segmentation and the multi-class segmentation tasks.

The major difference between the UNet models and the LinkNet models is the way the skip link features are treated. In the former, the skip link features are concatenated with the corresponding decoder features, whereas in the latter they are added to the decoder features to make the process more efficient. From the results it can be observed that the feature concatenation in the UNet models allowed the network to learn more discriminative features, as these models always outperformed their LinkNet equivalents even when the encoder was the same.
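The difference is essentially a one-line change in the decoder's forward pass, as the toy example below shows (tensor sizes are arbitrary).

```python
import torch

encoder_feat = torch.randn(1, 128, 64, 64)   # skip-link feature map from the encoder
decoder_feat = torch.randn(1, 128, 64, 64)   # upsampled feature map in the decoder

unet_merge = torch.cat([decoder_feat, encoder_feat], dim=1)   # 256 channels: richer but heavier
linknet_merge = decoder_feat + encoder_feat                    # 128 channels: cheaper to process

print(unet_merge.shape, linknet_merge.shape)
```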

Finally, the proposed UNetUp with a VGG16 encoder outperformed all other models on the segmentation tasks. It could also be seen that the UNetUp models outperformed the equivalent UNet models when controlled for the encoder, even though they had fewer parameters, since they used upsampling instead of transposed convolution layers.
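The two decoder blocks from Table 3 differ only in how the spatial resolution is doubled; the sketch below makes the parameter difference explicit. The input channel counts and the bilinear upsampling mode are assumptions, since Table 3 does not specify them.

```python
import torch.nn as nn

def dec_unet(in_ch, a, b):
    # conv3-a followed by convTran4-b (Table 3): the upsampling kernel is learned.
    return nn.Sequential(
        nn.Conv2d(in_ch, a, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.ConvTranspose2d(a, b, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True),
    )

def dec_unet_up(in_ch, a, b):
    # upsample, conv3-a, conv3-b (Table 3): parameter-free upsampling with fewer weights,
    # avoiding the checkerboard artifacts of transposed convolutions [27].
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(in_ch, a, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(a, b, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )
```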

Table 5 also shows that all the models performed better on the binary segmentation task than on multi-class segmentation for the same classes. This seems to imply that more training data does not necessarily improve the performance if the task is more complex. An example of the results of the UNetUp (VGG16) model on the multi-class segmentation problem is shown in Figure 9. The sample image is the same as the one shown for the building segmentation case in Figure 7, and by comparing the two it can be seen that the results in the multi-class case are less distinctive and more blob-like.

5.2 Quantitative Disaster Mapping Results

The precision-recall results of the obtained road networks are given in Table 7. The road networks created from the segmentation mask of the post-disaster image are denoted as Post. The results of the proposed method, where the OSM road network was updated by removing all destroyed road segments, are given as Diff.
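A hedged sketch of this Diff update is given below: each OSM edge is sampled along its length and dropped when too little of it survives in the post-disaster road mask. The sampling step, the coverage threshold and the world-to-pixel callback are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
import networkx as nx

def update_osm_graph(osm_graph, post_mask, world_to_pixel, step=5.0, min_coverage=0.5):
    """Remove OSM edges that are no longer covered by the post-disaster road mask."""
    g = osm_graph.copy()
    for u, v, data in list(g.edges(data=True)):
        geom = data["geometry"]                    # shapely LineString in metres
        dists = np.arange(0.0, geom.length, step)
        if len(dists) == 0:
            continue
        covered = 0
        for d in dists:
            x, y = geom.interpolate(d).coords[0]
            r, c = world_to_pixel(x, y)            # user-supplied geo-to-pixel mapping
            if 0 <= r < post_mask.shape[0] and 0 <= c < post_mask.shape[1]:
                covered += int(post_mask[r, c] > 0)
        if covered / len(dists) < min_coverage:
            g.remove_edge(u, v)                    # treat the segment as destroyed
    return g
```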

The Post results convey the generalisation capability of the tested networks across image datasets from different times, since they were trained on pre-disaster imagery while the evaluation was over the post-disaster imagery. In contrast to the pre-disaster results, the best performing model for precision-recall was UNet with a ResNet backend.

It can be seen that in the case of Post the precision was usually much higher than the recall, implying that the segmentation networks had a higher number of false negatives than false positives. This was due to gaps in the segmentation masks caused by occlusions from shadows, buildings, etc. LinkNet with a ResNet34 backend gave the highest recall in this case.

As Table 7 shows, the proposed Diff framework helped improve the generated road graph regardless of the base network used. The difference in results between the various architectures also became less pronounced: the difference between the maximum and minimum F score in the case of Post was approximately 8%, whereas that for Diff was 2%. This was largely because the proposed method benefited from prior knowledge from OSM. Note that the OSM data is not completely accurate [24]. However, based on empirical observations, using this data provides significantly better results than assuming no prior knowledge.

The connectivity results of the estimated post-disaster road networks are reported in Table 8. Similar to the precision-recall results, it can be seen that the proposed framework improved the results by a large margin. This was due to the fact that the output of the segmentation networks often had gaps, which caused missing connections in the generated road networks. The use of the OSM network, which is properly connected, as an initial estimate helped deal with these missing connections. This conjecture is supported by the number of pairs marked as having 'No Connection' in Table 8, where using the Diff framework reduced the number of 'No Connection' pairs to half of those from Post. The Post results had a number of small disconnected segments and some spurious paths caused by a non-ideal segmentation mask. The Diff results, on the other hand, were much better connected. However, Diff did have some incorrect segments where the mask difference missed segments.

5.3 Qualitative Disaster Impact Results

As outlined in Section 3.2, the difference between the segmentation masks from pre-disaster and post-disaster imagery can be used for disaster impact assessment.

Figure 7: Visualisation of the building segmentation results using pretrained encoders. Panels: (a) image, (b) ground truth, (c) ENet, (d) ENetSeparable, (e) LinkNet (ResNet18), (f) LinkNet (ResNet34), (g) UNet (VGG11), (h) UNet (VGG16), (i) UNet (ResNet18), (j) UNet (ResNet34), (k) UNetUp (VGG11), (l) UNetUp (VGG16), (m) UNetUp (ResNet18), (n) UNetUp (ResNet34).

Figure 8: Visualisation of the road segmentation results using pretrained encoders (same panel layout as Figure 7).

Table 5: Segmentation Results (IoU) using pretrained models

                       Binary               Multiclass
Model                  Roads   Buildings    Roads   Buildings   Average
UNet (VGG11)           39.36   57.58        35.91   51.92       43.92
UNet (VGG16)           40.07   59.72        35.12   52.23       43.68
UNet (ResNet18)        36.43   57.90        30.79   50.71       40.75
UNet (ResNet34)        37.56   58.29        34.18   51.08       42.63
LinkNet (ResNet18)     35.08   57.08        29.25   50.59       39.92
LinkNet (ResNet34)     37.2    57.15        32.70   50.09       41.40
ENet                   36.34   59.44        -       -           -
ENetSeparable          37.44   59.58        -       -           -
UNetUp (VGG11)         39.16   58.72        33.86   50.66       42.26
UNetUp (VGG16)         41.13   60.04        36.12   53.86       44.99
UNetUp (ResNet18)      37.48   58.08        33.64   52.62       41.04
UNetUp (ResNet34)      38.97   58.39        34.69   52.94       41.91

Figure 9: Multi-class segmentation output. Panels: (a) image, (b) ground truth, (c) UNetUp (VGG16).

Table 6: Model Size

Model                 Parameters    Size
ENetSeparable         226,596       1.1 MB
ENet                  349,068       1.6 MB
LinkNet (ResNet18)    11,686,561    46.8 MB
UNetUp (ResNet18)     20,290,377    81.2 MB
LinkNet (ResNet34)    21,794,721    87.3 MB
UNet (ResNet18)       22,383,433    89.6 MB
UNetUp (VGG11)        22,927,393    91.7 MB
UNet (VGG11)          25,364,513    101.5 MB
UNetUp (VGG16)        29,306,465    117.2 MB
UNetUp (ResNet34)     30,398,537    121.7 MB
UNet (VGG16)          32,202,337    128.8 MB
UNet (ResNet34)       32,491,593    130.1 MB

Figure 10: Disaster impact assessment. Left: road and building masks from satellite imagery using the segmentation network, with buildings in yellow and roads in blue. Top right: estimated difference between the infrastructure before and after the disaster, given in red. Bottom right: change heatmap overlaid onto an image of the test region.

This process is shown in Figure 10, where the differences in the buildings and roads caused by the disaster are marked in red in the image on the top right.

The area under consideration was divided into a grid of cells of a fixed size, and the number of changed pixels per grid cell was used as an overall estimate of the damage caused to a particular area. This has been plotted as a heatmap in Figure 10. The heatmap shows that the major destruction was along the coast and in an area in the south-west of Palu city. This finding corresponds to the European Commission's (EC) Copernicus Emergency Mapping Services report [13] for the area. A major portion of the coast was washed away due to the tsunami, and the south-west region of the city was washed away due to soil liquefaction.
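This grid-based damage estimate reduces to a block sum over the binary change mask; the sketch below shows one way to compute it (the cell size is an assumption).

```python
import numpy as np

def damage_heatmap(change_mask, cell=256):
    """Count changed pixels per grid cell of a binary pre/post change mask."""
    h, w = change_mask.shape
    rows, cols = h // cell, w // cell
    blocks = change_mask[:rows * cell, :cols * cell].reshape(rows, cell, cols, cell)
    return blocks.sum(axis=(1, 3))   # one damage score per grid cell
```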

6 Conclusions

This work provides a comparison among different segmentation models and presents a framework for the identification of damaged areas and accessible roads in post-disaster scenarios using satellite imagery. The framework leverages pre-existing knowledge from OSM to obtain a robust estimate of the affected road network.

The performances of various models for the tasks of binary and multi-class semantic segmentation in aerial images have been analysed and compared. The results show that using encoders pretrained on ImageNet improved the training time by around 10 epochs and the accuracy by a couple of percentage points, despite the domain gap that existed between ImageNet and aerial images.

On comparing the effects of using different encoders for the task of semantic segmentation, it could be seen that VGG16 outperformed all other feature extraction modules. The trade-off between accuracy and efficiency has also been studied. An extremely efficient neural network, termed ENetSeparable, was proposed; it has 30% fewer parameters than ENet and still performed better on the binary segmentation task.

For post-disaster scenarios, areas affected by the disaster were identified using the difference in the predicted segmentation masks. The evaluated road changes were used to update the road networks available from OSM. There was a significant difference in the results of the various segmentation networks, where the F score varied by as much as 8%. The use of the proposed framework alleviated these differences and brought the difference in F score down to 2%. The highest F score achieved with the use of the proposed framework was 94.76, as compared to the highest F score of 73.98 from the segmentation networks.

The proposed framework uses OSM data for training and does not require time-consuming, manually annotated post-disaster data. Finally, a qualitative assessment of the aftermath damage can be generated easily, as shown for the Palu tsunami, where the assessment was validated against the European Commission report.

This work can be further improved in a number of ways. Namely, the results of the different models could be ensembled to help improve the road connectivity results. Classification of damages could be used to identify whether the infrastructure has been completely destroyed, as in the case of soil liquefaction, or whether a road blockage is something that can be dealt with relatively easily, such as one caused by a fallen tree.

ACKNOWLEDGMENT

The authors would like to thank Dr Andrew West, Dr Thomas Wright and Ms Elisabeth Welburn for their valuable comments and feedback. A. Gupta is funded by a Scholarship from the Department of Electrical and Electronic Engineering, The University of Manchester, and the ACM SIGHPC/Intel Computational and Data Science Fellowship.

Table 7: Precision and Recall of Sub-segments

Model                 Precision        Recall           F score
                      Post     Diff    Post     Diff    Post     Diff
ENet                  70.36    92.49   65.97    93.13   68.09    92.81
ENetSeparable         61.20    92.65   67.12    95.5    64.02    94.05
LinkNet (ResNet18)    77.02    93.8    70.36    95.29   73.54    94.54
LinkNet (ResNet34)    71.8     95.18   76.29    94.34   73.98    94.76
UNet (VGG11)          73.66    94.8    67.87    92.8    70.65    93.8
UNet (VGG16)          75.73    94.99   72.3     94.02   73.98    94.5
UNet (ResNet18)       81.71    93.64   63.86    92.27   71.69    92.95
UNet (ResNet34)       78.32    94.09   74.57    95.41   76.40    94.75
UNetUp (VGG11)        71.30    94.4    67.31    93.93   69.25    94.16
UNetUp (VGG16)        67.17    95.1    70.93    93.68   69.0     94.38
UNetUp (ResNet18)     78.17    94.32   67.42    95.12   72.40    94.72
UNetUp (ResNet34)     76.4     93.76   67.14    93.67   71.47    93.71

Table 8: Connectivity results. All numbers given as percentages.

Model                 Correct          Too Long         Too Short       No Connection
                      Post     Diff    Post     Diff    Post    Diff    Post     Diff
ENet                  21.82    54.02   19.15    14.81   2.29    1.26    56.66    29.83
ENetSeparable         20.29    62.22   18.43    14.94   2.18    1.35    59.04    21.39
LinkNet (ResNet18)    18.33    70.36   23.65    6.93    2.35    1.35    55.59    21.25
LinkNet (ResNet34)    31.05    74.59   14.97    5.2     3.97    1.41    49.96    18.71
UNet (VGG11)          25.15    65.56   25.58    9.99    5.28    1.02    43.94    23.33
UNet (VGG16)          26.8     67.68   31.64    8.09    5.42    0.97    36.4     23.16
UNet (ResNet18)       21.03    58.62   20.15    12.57   4.72    1.07    54.0     27.65
UNet (ResNet34)       37.04    75.58   17.13    4.41    4.68    1.43    41.1     18.49
UNetUp (VGG11)        19.58    61.03   14.22    13.58   1.94    0.99    64.23    24.3
UNetUp (VGG16)        24.3     66.18   23.5     8.98    4.71    1.09    47.43    23.66
UNetUp (ResNet18)     22.44    67.35   21.64    8.46    2.66    1.14    53.23    23.05
UNetUp (ResNet34)     18.71    66.01   23.9     8.89    2.24    1.57    55.1     23.44

References

[1] Adriano, B., Xia, J., Baier, G., Yokoya, N., Koshimura, S.: Multi-Source Data Fusion Based on Ensemble Learning for Rapid Building Damage Mapping during the 2018 Sulawesi Earthquake and Tsunami in Palu, Indonesia. Remote Sensing 11(7), 886 (4 2019). https://doi.org/10.3390/rs11070886

[2] Audebert, N., Saux, B.L., Lefevre, S.: Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks (9 2016). http://arxiv.org/abs/1609.06846

[3] Bastani, F., He, S., Abbar, S., Alizadeh, M., Balakrishnan, H., Chawla, S., Madden, S., DeWitt, D.: RoadTracer: Automatic Extraction of Road Networks from Aerial Images. In: Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00496

[4] Bhattacharjee, S., Roy, S., Das Bit, S.: Post-disaster map builder: Crowdsensed digital pedestrian map construction of the disaster affected areas through smartphone based DTN. Computer Communications 134, 96-113 (1 2019). https://doi.org/10.1016/j.comcom.2018.11.010

[5] Boccardo, P., Giulio Tonolo, F.: Remote Sensing Role in Emergency Mapping for Disaster Response. In: Engineering Geology for Society and Territory - Volume 5, pp. 17-24. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-09048-1_3

[6] Chaurasia, A., Culurciello, E.: LinkNet: Exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE Visual Communications and Image Processing, VCIP 2017, vol. 2018-January, pp. 1-4 (6 2018). https://doi.org/10.1109/VCIP.2017.8305148

[7] Chen, L., Fan, X., Wang, L., Zhang, D., Yu, Z., Li, J., Nguyen, T.M.T., Pan, G., Wang, C.: RADAR. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1(4), 1-23 (1 2018). https://doi.org/10.1145/3161159

[8] Chollet, F.: Xception: Deep Learning with Depthwise Separable Convolutions (10 2016). http://arxiv.org/abs/1610.02357

[9] Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., Raska, R.: DeepGlobe 2018: A challenge to parse the earth through satellite images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2018-June, pp. 172-181 (5 2018). https://doi.org/10.1109/CVPRW.2018.00031

[10] DigitalGlobe: Open Data Initiative. https://www.digitalglobe.com/ecosystem/open-data

[11] Doshi, J., Basu, S., Pang, G.: From Satellite Imagery to Disaster Insights (2018). http://arxiv.org/abs/1812.07033

[12] Douglas, D.H., Peucker, T.K.: Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10(2), 112-122 (12 1973). https://doi.org/10.3138/FM57-6770-U75U-7727

[13] European Commission, Joint Research Centre: Mw 7.5 Earthquake in Indonesia, 28 Emergency Report. Tech. rep. (2018). http://inatews.bmkg.go.id/new/tsunami15.php

[14] Fujita, A., Sakurada, K., Imaizumi, T., Ito, R., Hikosaka, S., Nakamura, R.: Damage detection from aerial images via convolutional neural networks. In: 15th IAPR International Conference on Machine Vision Applications, pp. 5-8. IEEE (5 2017). https://doi.org/10.23919/MVA.2017.7986759

[15] Gupta, A., Welburn, E., Watson, S., Yin, H.: CNN-Based Semantic Change Detection in Satellite Imagery. In: International Conference on Artificial Neural Networks, pp. 669-684 (2019). https://doi.org/10.1007/978-3-030-30493-5_61

[16] He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition (2015). http://arxiv.org/abs/1512.03385

[17] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026-1034 (2 2015). https://doi.org/10.1109/ICCV.2015.123

[18] Iglovikov, V., Shvets, A.: TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation (1 2018). http://arxiv.org/abs/1801.05746

[19] Jin, J., Dundar, A., Culurciello, E.: Flattened Convolutional Neural Networks for Feedforward Acceleration (12 2014). http://arxiv.org/abs/1412.5474

[20] Kaiser, P., Wegner, J.D., Lucchi, A., Jaggi, M., Hofmann, T., Schindler, K.: Learning Aerial Image Segmentation from Online Maps. IEEE Transactions on Geoscience and Remote Sensing (7 2017). https://doi.org/10.1109/TGRS.2017.2719738

[21] Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. In: International Conference on Learning Representations (2015). http://arxiv.org/abs/1412.6980

[22] Li, S., Lyons, J., Voigt, S., Muthike, D.M., Pedersen, W., Giulio-Tonolo, F., Schneiderhan, T., Guha-Sapir, D., Kaku, K., Proy, C., Bequignon, J., Czaran, L., Platzeck, G., Kucera, J., Hazarika, M.K., James, G.K., Jones, B.: Global trends in satellite-based emergency mapping. Science 353(6296), 247-252 (7 2016). https://doi.org/10.1126/science.aad8728

[23] Liu, Z., Zhang, J., Li, X.: An automatic method for road centerline extraction from post-earthquake aerial images (1 2019). https://doi.org/10.1016/j.geog.2018.11.008

[24] Mattyus, G., Luo, W., Urtasun, R.: DeepRoadMapper: Extracting Road Topology from Aerial Images. In: IEEE International Conference on Computer Vision, pp. 3458-3466. IEEE (10 2017). https://doi.org/10.1109/ICCV.2017.372

[25] Miller, G.: The Huge Unseen Operation Behind the Accuracy of Google Maps (2014). https://www.wired.com/2014/12/google-maps-ground-truth

[26] Nair, V., Hinton, G.E.: Rectified Linear Units Improve Restricted Boltzmann Machines. In: 27th International Conference on Machine Learning (2010)

[27] Odena, A., Dumoulin, V., Olah, C.: Deconvolution and Checkerboard Artifacts. Distill 1(10) (10 2016). https://doi.org/10.23915/distill.00003

[28] OpenStreetMap contributors: Planet dump retrieved from https://planet.osm.org (2017)

[29] Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation (6 2016). http://arxiv.org/abs/1606.02147

[30] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)

[31] Pathier, E., Fielding, E.J., Wright, T.J., Walker, R., Parsons, B.E., Hensley, S.: Displacement field and slip distribution of the 2005 Kashmir earthquake from SAR imagery. Geophysical Research Letters 33(20), L20310 (10 2006). https://doi.org/10.1029/2006GL027193

[32] Poiani, T.H., Rocha, R.d.S., Degrossi, L.C., Albuquerque, J.P.d.: Potential of collaborative mapping for disaster relief: A case study of OpenStreetMap in the Nepal earthquake 2015. In: Proceedings of the Annual Hawaii International Conference on System Sciences, vol. 2016-March, pp. 188-197. IEEE (1 2016). https://doi.org/10.1109/HICSS.2016.31

[33] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9351, pp. 234-241 (5 2015). https://doi.org/10.1007/978-3-319-24574-4_28

[34] Rudner, T.G.J., Rußwurm, M., Fil, J., Pelich, R., Bischke, B., Kopackova, V., Bilinski, P.: Multi3Net: Segmenting Flooded Buildings via Fusion of Multiresolution, Multisensor and Multitemporal Satellite Imagery. In: Thirty-Third AAAI Conference on Artificial Intelligence (2019). http://arxiv.org/abs/1812.01756

[35] Schumann, G., Hostache, R., Puech, C., Hoffmann, L., Matgen, P., Pappenberger, F., Pfister, L.: High-Resolution 3-D Flood Information From Radar Imagery for Flood Hazard Management. IEEE Transactions on Geoscience and Remote Sensing 45(6), 1715-1725 (6 2007). https://doi.org/10.1109/TGRS.2006.888103

[36] Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations, pp. 1-14 (2015). http://arxiv.org/abs/1409.1556

[37] Singh, S., Batra, A., Pang, G., Torresani, L., Basu, S., Paluri, M., Jawahar, C.: Self-supervised Feature Learning for Semantic Segmentation of Overhead Imagery. In: BMVC (2018)

[38] Sun, T., Chen, Z., Yang, W., Wang, Y.: Stacked U-Nets with multi-output for road extraction. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2018-June, pp. 187-191 (2018). https://doi.org/10.1109/CVPRW.2018.00033

[39] Van Etten, A.: You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery (2018). http://arxiv.org/abs/1805.09512

[40] Van Etten, A., Lindenbaum, D., Bacastow, T.M.: SpaceNet: A Remote Sensing Dataset and Challenge Series. arXiv (2018). http://arxiv.org/abs/1807.01232

[41] Wegner, J.D., Montoya-Zegarra, J.A., Schindler, K.: Road networks as collections of minimum cost paths. ISPRS Journal of Photogrammetry and Remote Sensing 108, 128-137 (10 2015). https://doi.org/10.1016/j.isprsjprs.2015.07.002

[42] Zhang, L., Zhang, L., Du, B.: Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art. IEEE Geoscience and Remote Sensing Magazine 4(2), 22-40 (6 2016). https://doi.org/10.1109/MGRS.2016.2540798

[43] Zhu, X.X., Tuia, D., Mou, L., Xia, G.S., Zhang, L., Xu, F., Fraundorfer, F.: Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geoscience and Remote Sensing Magazine 5(4), 8-36 (12 2017). https://doi.org/10.1109/MGRS.2017.2762307

Page 6: arXiv:2006.05575v1 [eess.IV] 10 Jun 2020(SAR) and high resolution optical images. SAR is extremely useful for dealing with low-light conditions and for areas with cloud cover. It is

Table 1 Encoder Structures

Block VGG11 VGG16 ResNet18 ResNet34

enc1 conv3-64 conv3-64 conv7-64 conv7-64conv3-64

enc2 conv3-128 conv3-128 conv3-64 x2 conv3-64 x3conv3-128 conv3-64 conv3-64

enc3conv3-256 conv3-256 conv3-128

x2conv3-128

x4conv3-256 conv3-256 conv3-128 conv3-128conv3-256

enc4conv3-512 conv3-512 conv3-256

x2conv3-256

x6conv3-512 conv3-512 conv3-256 conv3-256conv3-512

enc4conv3-512 conv3-512 conv3-512

x2conv3-512

x3conv3-512 conv3-512 conv3-512 conv3-512conv3-512

Table 2 Decoder Structures

Block UNet UNetUp LinkNetcenter dec unet(512256) dec unet up(512256) dec link(512256)dec5 dec unet(512256) dec unet up(512256) dec link(256128)dec4 dec unet(256128) dec unet up(256128) dec link(12864)dec3 dec unet(12864) dec unet up(12864) dec link(6464)dec2 dec unet(6432) dec unet up(6432) convTran3-32dec1 conv3-32 conv3-32 conv3-32final conv3-nc conv3-nc conv3-nc

Table 3 Decoder Block Structure

Block Layers

dec unet(ab) conv3-aconvTran4-b

dec unet up(ab)upsampleconv3-aconv3-b

dec link(ab)conv3-a4convTran4-a4conv3-b

precision =T P

T P + FP

recall =T P

T P + FN

Fscore = 2 timesp times rp + r

(5)

The metric proposed in [41] has been reported for evaluat-ing graph connectivity This metric measures the similarity ofgraphs by comparing the shortest path length for a large set ofrandom source-destination pairs between the actual graph andthe predicted graph If the extracted paths have a similar lengththey can be assumed to be a match and are marked as rsquoCorrectrsquoin the results If the path length in the predicted graph is smallerthan the actual graph the generated graph has incorrect connec-

tions and this is reported as rsquoToo Shortrsquo Conversely if there areincorrect gaps in the predicted graph the paths are either rsquoTooLongrsquo or there are no possible paths giving rsquoNo Connectionsrsquo

44 Training DetailsThe models were trained using the Adam optimiser [21]

with a learning rate of 10minus4 A minibatch size of 5 images wasused for all the UNet-based models and 32 images for the othermodels The models were built in Pytorch [30] The VGG11VGG16 ResNet18 and ResNet34 models provided by the Py-torch model zoo were used for initialising the encoder networksin the pretrained networks He initialisation [17] was used forall the other layers

The training images and their corresponding masks werecropped to 416times416 pixels and were augmented with horizontaland vertical flipping All images were zero-mean normalisedOnly the pre-disaster images were used for training and thepost-disaster images were used for inference

All models were trained for 600 epochs to enable a fair com-parison between the different models A validation set was usedfor preventing overfitting the final model used for measuringthe performance was set to be the one where the validation lossconverged to just before starting diverging (ie overfitting onthe training set)

5 Results

51 Segmentation ResultsOur experiments tested the following scenarios

6

Figure 5 Buildings Validation Loss

bull Effect of pretraining with ImageNet on aerial image seg-mentation models

bull Efficiency vs accuracy trade-off between different pop-ular architectures

511 Effect of pretraining on aerial image segmentationThe first set of experiments were conducted to analyse the

effect of pretraining with ImageNet on the segmentation taskThe purpose of these experiments was two-fold Firstly whetherpretraining on a large classification dataset such as ImageNetimproves the accuracy of an unrelated task where the imagestatistics are quite different (ground-based object images forclassification vs aerial images for segmentation) Secondly as-suming accuracy with pretraining is similar or even better thanthat without whether pretraining improves the convergence speed

Four different network architectures were tested UNet withVGG11 or VGG16 backend and LinkNet with ResNet18 orResNet34 backend These models were trained for segmen-tation of buildings and roads while validation loss curves areshown in Fig 5 and Fig 6 respectively From the loss curvesit can be seen that pretrained networks generally convergedquicker than their non-pretrained equivalents requiring approx-imately 10 less epochs regardless of the model used and thetarget class The training curves for the VGG-based UNet mod-els for the road segmentation task also show that these modelsdid not converge very well when training from scratch thoughtheir validation loss which was used for preventing overfittingwas lower than their pretrained equivalent

The results of these models on the test set summarised inTable 4 show that in general the pretrained models had a higherIoU by a couple of points as compared to their non-pretrainedequivalents and this corresponds to the previous results in theliterature [2] Note that this section focuses on identifying thedifferences between training models from scratch and usingpretrained encoders The results across different models arecompared in the next section

Figure 6 Roads Validation Loss

Table 4 Effects of pretraining All results given as IoU

Model Pretrained Roads BuildingsUNet (VGG11) No 3773 5747UNet (VGG11) Yes 3936 5758UNet (VGG16) No 392 5772UNet (VGG16) Yes 4007 5972LinkNet (ResNet18) No 323 5129LinkNet (ResNet18) Yes 3508 5708LinkNet (ResNet34) No 3542 5482LinkNet (ResNet34) Yes 372 5715

7

512 Model capacity design and accuracyVisualisation of binary segmentation results for different

models can be seen in Figure 7 and Figure 8 and their quan-titative performance are reported in Table 5 Sizes of these dif-ferent models are reported in Table 6 From the results it can beseen that the proposed UNetUp with a VGG16 encoder outper-formed all other models by a couple of points on each task

The building segmentation masks in Figure 7 show that theENet-based models led to fairly blob-like outputs without clearboundaries The LinkNet models gave more distinct boundariesbut the clearest results were with the UNet and UNetUp modelsThe road segmentation masks did not appear as distinctivelydifferent in terms of visual comparison though the ENet basedmodels seemed to miss the most segments in this case

It is interesting to note that the model performance was notdirectly correlated to model size For instance in the binarysegmentation task in Table 5 it can be seen that ENetSeparableoutperformed ENet even though it had 30 fewer parametersThe number of parameters in these models were smaller by twoorders of magnitude compared to all the other models testedbut for the binary segmentation task these models were closeto the top performing UNetUp (VGG16) model However thetradeoff between model size and capacity became obvious inthe multiclass segmentation task where the smaller models didnot converge

From the results it can be seen that the VGG-based encodersoutperformed the ResNet-based encoders for all tasks For in-stance it can be seen from Table 5 that for the road segmenta-tion task using the UNet models the VGG11 and VGG16 en-coders were consistently better than the ResNet18 and ResNet34encoders This performance difference can also be seen acrossthe building segmentation and the multi-class segmentation tasks

The major difference between the UNet models and theLinkNet models is the way the skip link features are treatedIn the former the skip link features are concatenated with thecorresponding decoder features whereas in the latter they areadded to the decoder features to make the process more ef-ficient From the results it can be observed that the featureconcatenation in the UNet models allowed the network to learnmore discriminative features as these models always outper-formed their LinkNet equivalents even when the encoder wasthe same

Finally the proposed UNetUp with a VGG16 encoder out-performed all other models on the segmentation tasks It couldalso be seen that the UNetUp models outperformed equivalentUNet models when controlled for the encoder even though theyhad fewer parameters since they used upsampling instead oftransposed convolution layers

Table 5 also shows that all the models had better perfor-mance for the binary segmentation task as compared to multi-class segmentation for the same classes This seems to implythat more training data does not necessarily improve the perfor-mance if the task is more complex An example of the resultsof the UNetUp (VGG16) model on the multi-class segmenta-tion problem is seen in Figure 9 The sample image is the sameas the one shown for the building segmentation case in Figure 7

and by comparing the two it can be seen that the results in themulti-class case are less distinctive and more blob like

52 Quantitative Disaster Mapping Results

The precision-recall results of the obtained road networksare given in Table 7 The road networks created from the seg-mentation mask of the post-disaster image have been denoted asPost The results of the proposed method where the OSM roadnetwork were updated by removing all destroyed road segmentsare given as Diff

The Post results convey the generalisation capability of thetested networks across image datasets from different times sincethey were trained on pre-disaster imagery while the evalua-tion was over the post-disaster imagery In contrast to the pre-disaster results the best performing model for precision-recallwas UNet with a ResNet backend

It can be seen that in the case of Post the precision was usu-ally much higher than the recall implying that the segmentationnetwork has a higher number of false negatives than false pos-itives This was due to the fact that there were gaps in the seg-mentation mask caused due to occlusions from shadows build-ings etc LinkNet with a ResNet34 backend gave the highestrecall in this case

As Table 7 shows the proposed Diff framework helped im-prove the generated road graph regardless of the base networkused The difference in results between the various architec-tures also became less pronounced as can be seen in Table 7where the difference between the maximum and minimum F scorein the case of Post was approximately 8 whereas that for Diffwas 2 This was largely due to the fact that the proposedmethod benefited from prior knowledge from OSM Note thatthe OSM data is not completely accurate [24] However basedon empirical observations using this data provides significantlybetter results than assuming no prior knowledge

The connectivity results of the estimated post-disaster roadnetworks are reported in Table 8 Similar to the precision-recallresults it can be seen that the proposed framework improvedthe results by a large margin This was due to the fact thatthe output of the segmentation networks often had gaps whichcaused missing connections in the generated road networksThe use of the OSM network which is properly connectedas an initial estimate helped deal with these missing connec-tions This conjecture is supported by the number of pairs thatare marked as having rsquoNo Connectionsrsquo in Table 8 where usingthe Diff framework reduced the number of rsquoNo Connectionrsquopairs to half of those from Post The Post results had a num-ber of small disconnected segments and some spurious pathscaused due to a non-ideal segmentation mask The Diff resultson the other hand were much better connected However Diffdid have some incorrect segments where the mask differencemissed segments

53 Qualitative Disaster Impact Results

As outlined in Section 32 the difference between the seg-mentation masks from pre-disaster and post-disaster imagerycan be used for disaster impact assessment This process is

8

(a) Image (b) Ground Truth

(c) ENet (d) ENetSeparable (e) LinkNet (ResNet18) (f) LinkNet (ResNet34)

(g) UNet (VGG11) (h) UNet (VGG16) (i) UNet (ResNet18) (j) UNet (ResNet34)

(k) UNetUp (VGG11) (l) UNetUp (VGG16) (m) UNetUp (ResNet18) (n) UNetUp (ResNet34)

Figure 7 Visualisation of the building segmentation results using pretrained encoders

9

(a) Image (b) Ground Truth

(c) ENet (d) ENetSeparable (e) LinkNet (ResNet18) (f) LinkNet (ResNet34)

(g) UNet (VGG11) (h) UNet (VGG16) (i) UNet (ResNet18) (j) UNet (ResNet34)

(k) UNetUp (VGG11) (l) UNetUp (VGG16) (m) UNetUp (ResNet18) (n) UNetUp (ResNet34)

Figure 8 Visualisation of the road segmentation results using pretrained encoders

10

Table 5 Segmentation Results (IoU) using pretrained models

Model Binary MulticlassRoads Buildings Roads Buildings Average

UNet (VGG11) 3936 5758 3591 5192 4392UNet (VGG16) 4007 5972 3512 5223 4368UNet (ResNet18) 3643 5790 3079 5071 4075UNet (ResNet34) 3756 5829 3418 5108 4263LinkNet (ResNet18) 3508 5708 2925 5059 3992LinkNet (ResNet34) 372 5715 3270 5009 4140ENet 3634 5944 - - -ENetSeparable 3744 5958 - - -UNetUp (VGG11) 3916 5872 3386 5066 4226UNetUp (VGG16) 4113 6004 3612 5386 4499UNetUp (ResNet18) 3748 5808 3364 5262 4104UNetUp (ResNet34) 3897 5839 3469 5294 4191

(a) Image (b) Ground Truth (c) UNetUp (VGG16)

Figure 9 Multi-class segmentation output

11

Table 6 Model Size

Model params SizeENetSeparable 226596 11MBENet 349068 16MBLinkNet (ResNet18) 11686561 468MBUNetUp (ResNet18) 20290377 812MBLinkNet (ResNet34) 21794721 873MBUNet (ResNet18) 22383433 896MBUNetUp (VGG11) 22927393 917MBUNet (VGG11) 25364513 1015MBUNetUp (VGG16) 29306465 1172 MBUNetUp (ResNet34) 30398537 1217MBUNet (VGG16) 32202337 1288MBUNet (ResNet34) 32491593 1301MB

Figure 10 Disaster Impact assessment Left Road and building masks fromsatellite imagery using segmentation network with buildings in yellow androads in blue Top Right Estimated difference between the infrastructure be-fore and after disaster given in red Bottom Right Change heatmap overlaidonto an image of the test region

shown in Figure 10 where the difference in the buildings androads caused by the disaster are marked in red in the image onthe top right

The area under consideration was divided into a grid of cellsof a fixed size and the number of changed pixels per grid cellwas used as an overall estimate of the damage caused to a par-ticular area This has been plotted as a heatmap in Figure 10The heatmap shows that the major destruction was along thecoast and an area in the south-west of Palu city This findingcorresponds to the European Commissionrsquos (EC) CorpernicusEmergency Mapping Services report[13] for the area A majorportion of the coast was washed away due to the tsunami andthe south-west region of the city was washed away due to soilliquefaction

6 Conclusions

This work provides a comparison among different segmen-tation models and presents a framework for the identificationof damaged areas and accessible roads in post-disaster scenar-ios using satellite imagery The framework leverages on pre-existing knowledge from OSM to obtain a robust estimate ofthe affected road network

The performances of various models for the tasks of binaryand multi-class semantic segmentation in aerial images havebeen analysed and compared The results show that using en-coders pretrained on ImageNet improved the training time byaround 10 epochs and the accuracy by a couple of percentagepoints despite the domain gap that existed between ImageNetand aerial images

On comparing the effects of using different encoders for thetask of semantic segmentation it could be seen that VGG16outperformed all other feature extraction modules The trade-off between accuracy and efficiency has been studied An ex-tremely efficient neural network termed ENetSeparable wasproposed It has 30 fewer parameters than ENet and still per-formed better on the binary segmentation task

For post-disaster scenarios areas affected by the disasterwere identified using the difference in the predicted segmenta-tion masks The evaluated road changes were used to updatethe road networks available from OSM There was a significantdifference in the results of the various segmentation networkswhere the F score varied by as much as 8 The use of theproposed framework alleviated the differences and brought thedifference in F score down to 2 The highest F score achievedwith the use of the proposed framework was 9476 as comparedto the highest F score of 7398 from the segmentation networks

The proposed framework uses OSM data for training anddoes not require time-consuming manually annotated post-disasterdata Finally the qualitative assessment of the aftermath dam-age can be generated easily as shown in the Palu tsunami whichwas validated from the European Commission report

This work can be further improved in a number of waysNamely the results of the different models could be ensembledto help improve the road connectivity results Classification ofdamages could be used to identify where the infrastructure hasbeen completely destroyed as in the case of soil liquefaction orif a road blockage is something that can be dealt with relativelyeasily such as one caused by a fallen tree

Table 7: Precision and Recall of Sub-segments

Model                 Precision       Recall          F-score
                      Post    Diff    Post    Diff    Post    Diff
ENet                  70.36   92.49   65.97   93.13   68.09   92.81
ENetSeparable         61.20   92.65   67.12   95.5    64.02   94.05
LinkNet (ResNet18)    77.02   93.8    70.36   95.29   73.54   94.54
LinkNet (ResNet34)    71.8    95.18   76.29   94.34   73.98   94.76
UNet (VGG11)          73.66   94.8    67.87   92.8    70.65   93.8
UNet (VGG16)          75.73   94.99   72.3    94.02   73.98   94.5
UNet (ResNet18)       81.71   93.64   63.86   92.27   71.69   92.95
UNet (ResNet34)       78.32   94.09   74.57   95.41   76.40   94.75
UNetUp (VGG11)        71.30   94.4    67.31   93.93   69.25   94.16
UNetUp (VGG16)        67.17   95.1    70.93   93.68   69.0    94.38
UNetUp (ResNet18)     78.17   94.32   67.42   95.12   72.40   94.72
UNetUp (ResNet34)     76.4    93.76   67.14   93.67   71.47   93.71

Table 8: Connectivity results. All numbers given as percentages.

Model                 Correct         Too Long        Too Short       No Connection
                      Post    Diff    Post    Diff    Post    Diff    Post    Diff
ENet                  21.82   54.02   19.15   14.81   2.29    1.26    56.66   29.83
ENetSeparable         20.29   62.22   18.43   14.94   2.18    1.35    59.04   21.39
LinkNet (ResNet18)    18.33   70.36   23.65   6.93    2.35    1.35    55.59   21.25
LinkNet (ResNet34)    31.05   74.59   14.97   5.2     3.97    1.41    49.96   18.71
UNet (VGG11)          25.15   65.56   25.58   9.99    5.28    1.02    43.94   23.33
UNet (VGG16)          26.8    67.68   31.64   8.09    5.42    0.97    36.4    23.16
UNet (ResNet18)       21.03   58.62   20.15   12.57   4.72    1.07    54      27.65
UNet (ResNet34)       37.04   75.58   17.13   4.41    4.68    1.43    41.1    18.49
UNetUp (VGG11)        19.58   61.03   14.22   13.58   1.94    0.99    64.23   24.3
UNetUp (VGG16)        24.3    66.18   23.5    8.98    4.71    1.09    47.43   23.66
UNetUp (ResNet18)     22.44   67.35   21.64   8.46    2.66    1.14    53.23   23.05
UNetUp (ResNet34)     18.71   66.01   23.9    8.89    2.24    1.57    55.1    23.44

ACKNOWLEDGMENT

The authors would like to thank Dr Andrew West, Dr Thomas Wright and Ms Elisabeth Welburn for their valuable comments and feedback. A. Gupta is funded by a Scholarship from the Department of Electrical and Electronic Engineering, The University of Manchester, and the ACM SIGHPC/Intel Computational and Data Science Fellowship.

References

[1] Adriano, B., Xia, J., Baier, G., Yokoya, N., Koshimura, S.: Multi-Source Data Fusion Based on Ensemble Learning for Rapid Building Damage Mapping during the 2018 Sulawesi Earthquake and Tsunami in Palu, Indonesia. Remote Sensing 11(7), 886 (2019). https://doi.org/10.3390/rs11070886

[2] Audebert, N., Le Saux, B., Lefèvre, S.: Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks (2016). http://arxiv.org/abs/1609.06846

[3] Bastani, F., He, S., Abbar, S., Alizadeh, M., Balakrishnan, H., Chawla, S., Madden, S., DeWitt, D.: RoadTracer: Automatic Extraction of Road Networks from Aerial Images. In: Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00496

[4] Bhattacharjee, S., Roy, S., Das Bit, S.: Post-disaster map builder: Crowdsensed digital pedestrian map construction of the disaster affected areas through smartphone based DTN. Computer Communications 134, 96–113 (2019). https://doi.org/10.1016/j.comcom.2018.11.010

[5] Boccardo, P., Giulio Tonolo, F.: Remote Sensing Role in Emergency Mapping for Disaster Response. In: Engineering Geology for Society and Territory - Volume 5, pp. 17–24. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-09048-1_3

[6] Chaurasia, A., Culurciello, E.: LinkNet: Exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE Visual Communications and Image Processing (VCIP 2017), pp. 1–4 (2018). https://doi.org/10.1109/VCIP.2017.8305148

[7] Chen, L., Fan, X., Wang, L., Zhang, D., Yu, Z., Li, J., Nguyen, T.M.T., Pan, G., Wang, C.: RADAR. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1(4), 1–23 (2018). https://doi.org/10.1145/3161159

[8] Chollet, F.: Xception: Deep Learning with Depthwise Separable Convolutions (2016). http://arxiv.org/abs/1610.02357

[9] Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., Raska, R.: DeepGlobe 2018: A challenge to parse the earth through satellite images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 172–181 (2018). https://doi.org/10.1109/CVPRW.2018.00031

[10] DigitalGlobe: Open Data Initiative. https://www.digitalglobe.com/ecosystem/open-data

[11] Doshi, J., Basu, S., Pang, G.: From Satellite Imagery to Disaster Insights (2018). http://arxiv.org/abs/1812.07033

[12] Douglas, D.H., Peucker, T.K.: Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10(2), 112–122 (1973). https://doi.org/10.3138/FM57-6770-U75U-7727

[13] European Commission Joint Research Centre: Mw 7.5 Earthquake in Indonesia, 28 September 2018. Emergency Report. Tech. rep. (2018)

[14] Fujita, A., Sakurada, K., Imaizumi, T., Ito, R., Hikosaka, S., Nakamura, R.: Damage detection from aerial images via convolutional neural networks. In: 15th IAPR International Conference on Machine Vision Applications, pp. 5–8. IEEE (2017). https://doi.org/10.23919/MVA.2017.7986759

[15] Gupta, A., Welburn, E., Watson, S., Yin, H.: CNN-Based Semantic Change Detection in Satellite Imagery. In: International Conference on Artificial Neural Networks, pp. 669–684 (2019). https://doi.org/10.1007/978-3-030-30493-5_61

[16] He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition (2015). http://arxiv.org/abs/1512.03385

[17] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015). https://doi.org/10.1109/ICCV.2015.123

[18] Iglovikov, V., Shvets, A.: TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation (2018). http://arxiv.org/abs/1801.05746

[19] Jin, J., Dundar, A., Culurciello, E.: Flattened Convolutional Neural Networks for Feedforward Acceleration (2014). http://arxiv.org/abs/1412.5474

[20] Kaiser, P., Wegner, J.D., Lucchi, A., Jaggi, M., Hofmann, T., Schindler, K.: Learning Aerial Image Segmentation from Online Maps. IEEE Transactions on Geoscience and Remote Sensing (2017). https://doi.org/10.1109/TGRS.2017.2719738

[21] Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. In: International Conference on Learning Representations (2015). http://arxiv.org/abs/1412.6980

[22] Li, S., Lyons, J., Voigt, S., Muthike, D.M., Pedersen, W., Giulio-Tonolo, F., Schneiderhan, T., Guha-Sapir, D., Kaku, K., Proy, C., Bequignon, J., Czaran, L., Platzeck, G., Kucera, J., Hazarika, M.K., James, G.K., Jones, B.: Global trends in satellite-based emergency mapping. Science 353(6296), 247–252 (2016). https://doi.org/10.1126/science.aad8728

[23] Liu, Z., Zhang, J., Li, X.: An automatic method for road centerline extraction from post-earthquake aerial images (2019). https://doi.org/10.1016/j.geog.2018.11.008

[24] Mattyus, G., Luo, W., Urtasun, R.: DeepRoadMapper: Extracting Road Topology from Aerial Images. In: IEEE International Conference on Computer Vision, pp. 3458–3466. IEEE (2017). https://doi.org/10.1109/ICCV.2017.372

[25] Miller, G.: The Huge, Unseen Operation Behind the Accuracy of Google Maps (2014). https://www.wired.com/2014/12/google-maps-ground-truth

[26] Nair, V., Hinton, G.E.: Rectified Linear Units Improve Restricted Boltzmann Machines. In: 27th International Conference on Machine Learning (2010)

[27] Odena, A., Dumoulin, V., Olah, C.: Deconvolution and Checkerboard Artifacts. Distill 1(10) (2016). https://doi.org/10.23915/distill.00003

[28] OpenStreetMap Contributors: Planet dump retrieved from https://planet.osm.org (2017)

[29] Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation (2016). http://arxiv.org/abs/1606.02147

[30] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)

[31] Pathier, E., Fielding, E.J., Wright, T.J., Walker, R., Parsons, B.E., Hensley, S.: Displacement field and slip distribution of the 2005 Kashmir earthquake from SAR imagery. Geophysical Research Letters 33(20), L20310 (2006). https://doi.org/10.1029/2006GL027193

[32] Poiani, T.H., Rocha, R.d.S., Degrossi, L.C., Albuquerque, J.P.d.: Potential of collaborative mapping for disaster relief: A case study of OpenStreetMap in the Nepal earthquake 2015. In: Proceedings of the Annual Hawaii International Conference on System Sciences, pp. 188–197. IEEE (2016). https://doi.org/10.1109/HICSS.2016.31

[33] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science, vol. 9351, pp. 234–241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28

[34] Rudner, T.G.J., Rußwurm, M., Fil, J., Pelich, R., Bischke, B., Kopackova, V., Bilinski, P.: Multi3Net: Segmenting Flooded Buildings via Fusion of Multiresolution, Multisensor and Multitemporal Satellite Imagery. In: Thirty-Third AAAI Conference on Artificial Intelligence (2019). http://arxiv.org/abs/1812.01756

[35] Schumann, G., Hostache, R., Puech, C., Hoffmann, L., Matgen, P., Pappenberger, F., Pfister, L.: High-Resolution 3-D Flood Information From Radar Imagery for Flood Hazard Management. IEEE Transactions on Geoscience and Remote Sensing 45(6), 1715–1725 (2007). https://doi.org/10.1109/TGRS.2006.888103

[36] Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations, pp. 1–14 (2015). http://arxiv.org/abs/1409.1556

[37] Singh, S., Batra, A., Pang, G., Torresani, L., Basu, S., Paluri, M., Jawahar, C.: Self-supervised Feature Learning for Semantic Segmentation of Overhead Imagery. In: BMVC (2018)

[38] Sun, T., Chen, Z., Yang, W., Wang, Y.: Stacked U-Nets with multi-output for road extraction. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 187–191 (2018). https://doi.org/10.1109/CVPRW.2018.00033

[39] Van Etten, A.: You Only Look Twice: Rapid Multi-Scale Object Detection in Satellite Imagery (2018). http://arxiv.org/abs/1805.09512

[40] Van Etten, A., Lindenbaum, D., Bacastow, T.M.: SpaceNet: A Remote Sensing Dataset and Challenge Series (2018). http://arxiv.org/abs/1807.01232

[41] Wegner, J.D., Montoya-Zegarra, J.A., Schindler, K.: Road networks as collections of minimum cost paths. ISPRS Journal of Photogrammetry and Remote Sensing 108, 128–137 (2015). https://doi.org/10.1016/j.isprsjprs.2015.07.002

[42] Zhang, L., Zhang, L., Du, B.: Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art. IEEE Geoscience and Remote Sensing Magazine 4(2), 22–40 (2016). https://doi.org/10.1109/MGRS.2016.2540798

[43] Zhu, X.X., Tuia, D., Mou, L., Xia, G.S., Zhang, L., Xu, F., Fraundorfer, F.: Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geoscience and Remote Sensing Magazine 5(4), 8–36 (2017). https://doi.org/10.1109/MGRS.2017.2762307




[43] Zhu XX Tuia D Mou L Xia GS Zhang L Xu F FraundorferF Deep Learning in Remote Sensing A Comprehensive Review and Listof Resources IEEE Geoscience and Remote Sensing Magazine 5(4) 8ndash36 (12 2017) httpsdoiorg101109MGRS20172762307 httpieeexploreieeeorgdocument8113128

15

Page 12: arXiv:2006.05575v1 [eess.IV] 10 Jun 2020(SAR) and high resolution optical images. SAR is extremely useful for dealing with low-light conditions and for areas with cloud cover. It is

Table 6 Model Size

Model params SizeENetSeparable 226596 11MBENet 349068 16MBLinkNet (ResNet18) 11686561 468MBUNetUp (ResNet18) 20290377 812MBLinkNet (ResNet34) 21794721 873MBUNet (ResNet18) 22383433 896MBUNetUp (VGG11) 22927393 917MBUNet (VGG11) 25364513 1015MBUNetUp (VGG16) 29306465 1172 MBUNetUp (ResNet34) 30398537 1217MBUNet (VGG16) 32202337 1288MBUNet (ResNet34) 32491593 1301MB

Figure 10 Disaster Impact assessment Left Road and building masks fromsatellite imagery using segmentation network with buildings in yellow androads in blue Top Right Estimated difference between the infrastructure be-fore and after disaster given in red Bottom Right Change heatmap overlaidonto an image of the test region

shown in Figure 10 where the difference in the buildings androads caused by the disaster are marked in red in the image onthe top right

The area under consideration was divided into a grid of cellsof a fixed size and the number of changed pixels per grid cellwas used as an overall estimate of the damage caused to a par-ticular area This has been plotted as a heatmap in Figure 10The heatmap shows that the major destruction was along thecoast and an area in the south-west of Palu city This findingcorresponds to the European Commissionrsquos (EC) CorpernicusEmergency Mapping Services report[13] for the area A majorportion of the coast was washed away due to the tsunami andthe south-west region of the city was washed away due to soilliquefaction

6 Conclusions

This work provides a comparison among different segmen-tation models and presents a framework for the identificationof damaged areas and accessible roads in post-disaster scenar-ios using satellite imagery The framework leverages on pre-existing knowledge from OSM to obtain a robust estimate ofthe affected road network

The performances of various models for the tasks of binaryand multi-class semantic segmentation in aerial images havebeen analysed and compared The results show that using en-coders pretrained on ImageNet improved the training time byaround 10 epochs and the accuracy by a couple of percentagepoints despite the domain gap that existed between ImageNetand aerial images

On comparing the effects of using different encoders for thetask of semantic segmentation it could be seen that VGG16outperformed all other feature extraction modules The trade-off between accuracy and efficiency has been studied An ex-tremely efficient neural network termed ENetSeparable wasproposed It has 30 fewer parameters than ENet and still per-formed better on the binary segmentation task

For post-disaster scenarios, areas affected by the disaster were identified using the difference in the predicted segmentation masks, and the evaluated road changes were used to update the road networks available from OSM. There was a significant difference in the results of the various segmentation networks, with the F score varying by as much as 8%. The use of the proposed framework alleviated these differences and brought the variation in F score down to 2%. The highest F score achieved with the proposed framework was 94.76, compared to the highest F score of 73.98 from the segmentation networks alone.

The proposed framework uses OSM data for training and does not require time-consuming, manually annotated post-disaster data. Finally, a qualitative assessment of the aftermath damage can be generated easily, as demonstrated for the Palu tsunami and validated against the European Commission report.

This work can be further improved in a number of ways. The results of the different models could be ensembled to help improve the road connectivity results. Classification of damage types could be used to identify whether the infrastructure has been completely destroyed, as in the case of soil liquefaction, or whether a road blockage can be dealt with relatively easily, such as one caused by a fallen tree.

Table 7: Precision, recall and F score of sub-segments (%).

Model                 Precision        Recall           F score
                      Post    Diff     Post    Diff     Post    Diff
ENet                  70.36   92.49    65.97   93.13    68.09   92.81
ENetSeparable         61.20   92.65    67.12   95.50    64.02   94.05
LinkNet (ResNet18)    77.02   93.80    70.36   95.29    73.54   94.54
LinkNet (ResNet34)    71.80   95.18    76.29   94.34    73.98   94.76
UNet (VGG11)          73.66   94.80    67.87   92.80    70.65   93.80
UNet (VGG16)          75.73   94.99    72.30   94.02    73.98   94.50
UNet (ResNet18)       81.71   93.64    63.86   92.27    71.69   92.95
UNet (ResNet34)       78.32   94.09    74.57   95.41    76.40   94.75
UNetUp (VGG11)        71.30   94.40    67.31   93.93    69.25   94.16
UNetUp (VGG16)        67.17   95.10    70.93   93.68    69.00   94.38
UNetUp (ResNet18)     78.17   94.32    67.42   95.12    72.40   94.72
UNetUp (ResNet34)     76.40   93.76    67.14   93.67    71.47   93.71

Table 8: Connectivity results. All numbers given as percentages.

Model                 Correct          Too Long         Too Short        No Connection
                      Post    Diff     Post    Diff     Post    Diff     Post    Diff
ENet                  21.82   54.02    19.15   14.81    2.29    1.26     56.66   29.83
ENetSeparable         20.29   62.22    18.43   14.94    2.18    1.35     59.04   21.39
LinkNet (ResNet18)    18.33   70.36    23.65   6.93     2.35    1.35     55.59   21.25
LinkNet (ResNet34)    31.05   74.59    14.97   5.20     3.97    1.41     49.96   18.71
UNet (VGG11)          25.15   65.56    25.58   9.99     5.28    1.02     43.94   23.33
UNet (VGG16)          26.80   67.68    31.64   8.09     5.42    0.97     36.40   23.16
UNet (ResNet18)       21.03   58.62    20.15   12.57    4.72    1.07     54.00   27.65
UNet (ResNet34)       37.04   75.58    17.13   4.41     4.68    1.43     41.10   18.49
UNetUp (VGG11)        19.58   61.03    14.22   13.58    1.94    0.99     64.23   24.30
UNetUp (VGG16)        24.30   66.18    23.50   8.98     4.71    1.09     47.43   23.66
UNetUp (ResNet18)     22.44   67.35    21.64   8.46     2.66    1.14     53.23   23.05
UNetUp (ResNet34)     18.71   66.01    23.90   8.89     2.24    1.57     55.10   23.44

ACKNOWLEDGMENT

The authors would like to thank Dr Andrew West, Dr Thomas Wright and Ms Elisabeth Welburn for their valuable comments and feedback. A. Gupta is funded by a Scholarship from the Department of Electrical and Electronic Engineering, The University of Manchester, and the ACM SIGHPC/Intel Computational and Data Science Fellowship.

References

[1] Adriano, B., Xia, J., Baier, G., Yokoya, N., Koshimura, S.: Multi-Source Data Fusion Based on Ensemble Learning for Rapid Building Damage Mapping during the 2018 Sulawesi Earthquake and Tsunami in Palu, Indonesia. Remote Sensing 11(7), 886 (2019). https://doi.org/10.3390/rs11070886

[2] Audebert, N., Saux, B.L., Lefevre, S.: Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks (2016). http://arxiv.org/abs/1609.06846

[3] Bastani, F., He, S., Abbar, S., Alizadeh, M., Balakrishnan, H., Chawla, S., Madden, S., DeWitt, D.: RoadTracer: Automatic Extraction of Road Networks from Aerial Images. In: Computer Vision and Pattern Recognition (2018). https://doi.org/10.1109/CVPR.2018.00496

[4] Bhattacharjee, S., Roy, S., Das Bit, S.: Post-disaster map builder: Crowdsensed digital pedestrian map construction of the disaster affected areas through smartphone based DTN. Computer Communications 134, 96–113 (2019). https://doi.org/10.1016/j.comcom.2018.11.010

[5] Boccardo, P., Giulio Tonolo, F.: Remote Sensing Role in Emergency Mapping for Disaster Response. In: Engineering Geology for Society and Territory - Volume 5, pp. 17–24. Springer International Publishing, Cham (2015). https://doi.org/10.1007/978-3-319-09048-1_3

[6] Chaurasia, A., Culurciello, E.: LinkNet: Exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE Visual Communications and Image Processing (VCIP), pp. 1–4 (2017). https://doi.org/10.1109/VCIP.2017.8305148

[7] Chen, L., Fan, X., Wang, L., Zhang, D., Yu, Z., Li, J., Nguyen, T.M.T., Pan, G., Wang, C.: RADAR. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1(4), 1–23 (2018). https://doi.org/10.1145/3161159

[8] Chollet, F.: Xception: Deep Learning with Depthwise Separable Convolutions (2016). http://arxiv.org/abs/1610.02357

[9] Demir, I., Koperski, K., Lindenbaum, D., Pang, G., Huang, J., Basu, S., Hughes, F., Tuia, D., Raska, R.: DeepGlobe 2018: A challenge to parse the earth through satellite images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 172–181 (2018). https://doi.org/10.1109/CVPRW.2018.00031

[10] DigitalGlobe: Open Data Initiative. https://www.digitalglobe.com/ecosystem/open-data

[11] Doshi, J., Basu, S., Pang, G.: From Satellite Imagery to Disaster Insights (2018). http://arxiv.org/abs/1812.07033

[12] Douglas, D.H., Peucker, T.K.: Algorithms for the Reduction of the Number of Points Required to Represent a Digitized Line or its Caricature. Cartographica: The International Journal for Geographic Information and Geovisualization 10(2), 112–122 (1973). https://doi.org/10.3138/FM57-6770-U75U-7727

[13] European Commission Joint Research Centre: Mw 7.5 Earthquake in Indonesia, 28 September 2018: Emergency Report. Tech. rep. (2018)

[14] Fujita, A., Sakurada, K., Imaizumi, T., Ito, R., Hikosaka, S., Nakamura, R.: Damage detection from aerial images via convolutional neural networks. In: 15th IAPR International Conference on Machine Vision Applications, pp. 5–8. IEEE (2017). https://doi.org/10.23919/MVA.2017.7986759

[15] Gupta, A., Welburn, E., Watson, S., Yin, H.: CNN-Based Semantic Change Detection in Satellite Imagery. In: International Conference on Artificial Neural Networks, pp. 669–684 (2019). https://doi.org/10.1007/978-3-030-30493-5_61

[16] He, K., Zhang, X., Ren, S., Sun, J.: Deep Residual Learning for Image Recognition (2015). http://arxiv.org/abs/1512.03385

[17] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1026–1034 (2015). https://doi.org/10.1109/ICCV.2015.123

[18] Iglovikov, V., Shvets, A.: TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation (2018). http://arxiv.org/abs/1801.05746

[19] Jin, J., Dundar, A., Culurciello, E.: Flattened Convolutional Neural Networks for Feedforward Acceleration (2014). http://arxiv.org/abs/1412.5474

[20] Kaiser, P., Wegner, J.D., Lucchi, A., Jaggi, M., Hofmann, T., Schindler, K.: Learning Aerial Image Segmentation from Online Maps. IEEE Transactions on Geoscience and Remote Sensing (2017). https://doi.org/10.1109/TGRS.2017.2719738

[21] Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. In: International Conference on Learning Representations (2015). http://arxiv.org/abs/1412.6980

[22] Li, S., Lyons, J., Voigt, S., Muthike, D.M., Pedersen, W., Giulio-Tonolo, F., Schneiderhan, T., Guha-Sapir, D., Kaku, K., Proy, C., Bequignon, J., Czaran, L., Platzeck, G., Kucera, J., Hazarika, M.K., James, G.K., Jones, B.: Global trends in satellite-based emergency mapping. Science 353(6296), 247–252 (2016). https://doi.org/10.1126/science.aad8728

[23] Liu, Z., Zhang, J., Li, X.: An automatic method for road centerline extraction from post-earthquake aerial images (2019). https://doi.org/10.1016/j.geog.2018.11.008

[24] Mattyus, G., Luo, W., Urtasun, R.: DeepRoadMapper: Extracting Road Topology from Aerial Images. In: IEEE International Conference on Computer Vision, pp. 3458–3466. IEEE (2017). https://doi.org/10.1109/ICCV.2017.372

[25] Miller, G.: The Huge Unseen Operation Behind the Accuracy of Google Maps (2014). https://www.wired.com/2014/12/google-maps-ground-truth

[26] Nair, V., Hinton, G.E.: Rectified Linear Units Improve Restricted Boltzmann Machines. In: 27th International Conference on Machine Learning (2010)

[27] Odena, A., Dumoulin, V., Olah, C.: Deconvolution and Checkerboard Artifacts. Distill 1(10) (2016). https://doi.org/10.23915/distill.00003

[28] OpenStreetMap contributors: Planet dump retrieved from https://planet.osm.org (2017)

[29] Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation (2016). http://arxiv.org/abs/1606.02147

[30] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)

[31] Pathier, E., Fielding, E.J., Wright, T.J., Walker, R., Parsons, B.E., Hensley, S.: Displacement field and slip distribution of the 2005 Kashmir earthquake from SAR imagery. Geophysical Research Letters 33(20), L20310 (2006). https://doi.org/10.1029/2006GL027193

[32] Poiani, T.H., Rocha, R.d.S., Degrossi, L.C., Albuquerque, J.P.d.: Potential of collaborative mapping for disaster relief: A case study of OpenStreetMap in the Nepal earthquake 2015. In: Proceedings of the Annual Hawaii International Conference on System Sciences, pp. 188–197. IEEE (2016). https://doi.org/10.1109/HICSS.2016.31

[33] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Lecture Notes in Computer Science, vol. 9351, pp. 234–241 (2015). https://doi.org/10.1007/978-3-319-24574-4_28

[34] Rudner, T.G.J., Rußwurm, M., Fil, J., Pelich, R., Bischke, B., Kopackova, V., Bilinski, P.: Multi3Net: Segmenting Flooded Buildings via Fusion of Multiresolution, Multisensor and Multitemporal Satellite Imagery. In: Thirty-Third AAAI Conference on Artificial Intelligence (2019). http://arxiv.org/abs/1812.01756

[35] Schumann, G., Hostache, R., Puech, C., Hoffmann, L., Matgen, P., Pappenberger, F., Pfister, L.: High-Resolution 3-D Flood Information From Radar Imagery for Flood Hazard Management. IEEE Transactions on Geoscience and Remote Sensing 45(6), 1715–1725 (2007). https://doi.org/10.1109/TGRS.2006.888103

[36] Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. In: International Conference on Learning Representations, pp. 1–14 (2015). http://arxiv.org/abs/1409.1556

[37] Singh, S., Batra, A., Pang, G., Torresani, L., Basu, S., Paluri, M., Jawahar, C.: Self-supervised Feature Learning for Semantic Segmentation of Overhead Imagery. In: BMVC (2018)

[38] Sun, T., Chen, Z., Yang, W., Wang, Y.: Stacked U-nets with multi-output for road extraction. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 187–191 (2018). https://doi.org/10.1109/CVPRW.2018.00033

[39] Van Etten, A.: You Only Look Twice: Rapid Multi-Scale Object Detection in Satellite Imagery (2018). http://arxiv.org/abs/1805.09512

[40] Van Etten, A., Lindenbaum, D., Bacastow, T.M.: SpaceNet: A Remote Sensing Dataset and Challenge Series (2018). http://arxiv.org/abs/1807.01232

[41] Wegner, J.D., Montoya-Zegarra, J.A., Schindler, K.: Road networks as collections of minimum cost paths. ISPRS Journal of Photogrammetry and Remote Sensing 108, 128–137 (2015). https://doi.org/10.1016/j.isprsjprs.2015.07.002

[42] Zhang, L., Zhang, L., Du, B.: Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art. IEEE Geoscience and Remote Sensing Magazine 4(2), 22–40 (2016). https://doi.org/10.1109/MGRS.2016.2540798

[43] Zhu, X.X., Tuia, D., Mou, L., Xia, G.S., Zhang, L., Xu, F., Fraundorfer, F.: Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources. IEEE Geoscience and Remote Sensing Magazine 5(4), 8–36 (2017). https://doi.org/10.1109/MGRS.2017.2762307
