IJCSIT International Journal of Computer Science and Information Technology, Vol. 4, No. 1, June 2011, pp. 7-14

Advanced Data Fusion for Classification and Extraction

S. T. Patil1, Rajesh Jalanekar1, S. B. Mule2 and R. S. Gangapure2

1 Professor, Vishwakarma Institute of Technology, Pune. E-mail: [email protected], [email protected]

2 Assistant Professor, Sinhgad College of Engineering, Pune. E-mail: [email protected], [email protected]

ABSTRACT: This paper describes the fusion of information extracted from multispectral digital aerial images for land use classification. The proposed approach integrates spectral classification techniques and spatial information. The multispectral digital aerial images consist of a high resolution panchromatic channel as well as lower resolution RGB and NIR channels, and form the basis for information extraction. Our land use classification is a 2-step approach that uses the RGB and NIR images for an initial classification, and the panchromatic images as well as a digital surface model for a refined classification. The digital surface model is generated from the high resolution panchromatic images of a specific photo mission. Based on the aerial triangulation using area and feature based points of interest, we are able to generate a dense digital surface model by a dense matching procedure. This approach produces the desired land use classification results and exploits the high redundancy of the source data set to run automatically after a short interactive training phase.

Keywords: Classification, Extraction, Fusion, Orthoimage, DEM/DTM

1. INTRODUCTION

Digital aerial cameras can be used to produce images with a high degree of image overlap in flight direction at almost no additional cost. A terrain point may be visible in 5 to 15 or even more images. Currently, aerial photogrammetry is undergoing a "paradigm shift" (Leberl, 2004): the transition from minimizing the number of film photos, forced by operator-intensive processing, to maximizing the robustness of automation through highly redundant image information from new large-format digital aerial cameras.

This contribution is based on images from the UltraCamD camera from Vexcel Imaging with its multispectral capability. UltraCamD offers simultaneous sensing of high resolution panchromatic information and additional multispectral information, i.e. red, green, blue and near infrared (NIR). The benefit of the multispectral sensing is the simultaneous recording of all bands, i.e. any classification can be performed without cumbersome registration of different scenes. Additional ideas about photogrammetric color sensing may be found in (Leberl, 2002). The image data used comprise the panchromatic high resolution images as well as the low resolution multispectral images, see Fig. 1.

Figure 1: Image Data Used
(left) panchromatic high resolution image
(middle) RGB low resolution image
(right) NIR-R-G low resolution image

The proposed workflow includes the following steps:

(a) initial classification of all images, see section 2,

(b) the aerial triangulation (AT), see section 3,


(c) dense matching to generate a dense digital surface model (DSM), see section 4,

(d) ortho image production, see section 5,

(e) refined classification using the DSM and the ortho images, see section 6.

This paper will focus on the information fusion in classification.

2. INITIAL CLASSIFICATION

The initial classification is a supervised classification performed on each of the overlapping color images with the 4 color channels RGB and NIR. The classified classes mainly rely on color and infrared, and will be refined later using the information from the DSM.

In a supervised classification the analyst identifies several areas in an image which represent known features or land use. These known areas are referred to as 'training sites', where groups of pixels are a good representation of the land cover or surface phenomenon. Using the pixel information, the classification procedure then looks for other areas which have a similar grouping of pixel values. The analyst decides on the training sites and thus supervises the classification process.

The identification of training sites has to be done:

• after radiometric camera calibration

• when the weather and lighting conditions or the land properties change significantly

The following types of classifiers for supervised classification can be found in the literature:

(a) maximum likelihood classifier

(b) neural network classifier

(c) decision tree classifier

(d) support vector machine

Many papers, especially newer ones, propose support vector machines for use with multispectral data. The purpose of (Huang, 2002) is to demonstrate the applicability of support vector machines to deriving land use from operational sensor systems, and to systematically evaluate their performance in comparison to other popular classifiers.

The support vector machine (SVM) represents a group of theoretically superior machine learning algorithms, see (Vapnik, 1995). The SVM employs optimization algorithms to locate the optimal boundaries between classes. Statistically, the optimal boundaries should generalize to unseen samples with the least error among all possible boundaries separating the classes, thereby minimizing the confusion between classes.

An important benefit of the support vector machine approach is that the complexity of the resulting classifier is characterized by the number of support vectors rather than by the dimensionality of the transformed space. As a result, SVMs tend to be less prone to over-fitting than some other methods. Initial classification uses the SVM library LIBSVM developed at the National Taiwan University, see (Chang, 2005) for software details and (Hsu, 2003) for a practical guide to support vector classification.
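As an illustration only, the following sketch trains such a classifier with scikit-learn's SVC, which wraps LIBSVM internally; the file names, feature layout and hyperparameters are hypothetical placeholders, not details from the paper.

```python
# Sketch: training an SVM on labeled pixel feature vectors.
# scikit-learn's SVC wraps LIBSVM, the library named in the text.
# The file names and feature layout are hypothetical placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# One row per training pixel, e.g. [R, G, B, NIR, NDVI, ...].
features = np.load("training_features.npy")   # shape (n_pixels, n_features)
labels = np.load("training_labels.npy")       # shape (n_pixels,), class ids

# RBF kernel with probability estimates, so that the most probable and
# the second most probable class can be kept for the later fusion step.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=10.0, gamma="scale", probability=True),
)
model.fit(features, labels)

# Classify all pixels of an image whose feature planes were flattened
# to the same (n_pixels, n_features) layout.
probabilities = model.predict_proba(np.load("image_features.npy"))
```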

Initial classification discriminates all classes that are described more significantly by color and NIR values than by texture and spatial relationships. These classes are:

(a) Solid: man-made structures like streets, buildings with gray or non-colored roofs

(b) Colored roofs

(c) Soil, bare earth

(d) Lake, river, sea

(e) Vegetation: wood, grassland, fields

(f) Dark shadows

(g) Swimming pools

(h) Snow/ice

The first step in classification is feature extraction, i.e. the process of generating spectral feature vectors from the 4 input planes. The selection of the features to be extracted is important because it determines the number of features that have to be computed and processed. In addition to the improved computational speed in lower dimensional feature spaces, there might also be an increase in the accuracy of the classification algorithm. The features computed for initial classification include:

(a) Single pixel values of all input planes

(b) Normalized ratios between image planes. Ratio images may be used to remove the influence of light and shadow on a ridge due to the sun angle. It is also possible to calculate certain indices which can enhance vegetation or geology. NDVI (Normalized Difference Vegetation Index) is a commonly used vegetation index which uses the red and infrared bands of the spectrum (see the sketch after this list).

(c) Values computed in a circular neighborhood of a given radius, like the minimum, maximum or standard deviation
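As a concrete illustration of feature types (b) and (c), the sketch below computes an NDVI ratio image and local statistics over a circular neighborhood; the band arrays, the radius and the use of NumPy/SciPy are assumptions for illustration, not the paper's implementation.

```python
# Sketch: NDVI and circular-neighborhood statistics as feature planes.
# Band arrays and the neighborhood radius are illustrative assumptions.
import numpy as np
from scipy import ndimage

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index from the NIR and red bands."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

def circular_footprint(radius):
    """Boolean disk used as the neighborhood for the local statistics."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def local_stats(plane, radius=5):
    """Minimum, maximum and standard deviation in a circular neighborhood."""
    fp = circular_footprint(radius)
    plane = plane.astype(np.float64)
    mn = ndimage.minimum_filter(plane, footprint=fp)
    mx = ndimage.maximum_filter(plane, footprint=fp)
    sd = ndimage.generic_filter(plane, np.std, footprint=fp)
    return mn, mx, sd
```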

Fig. 2 illustrates a subset of feature values for some classified image regions. The colors represent the following features:

(a) R, G and B are colored red, green and blue

(b) NIR is colored violet

(c) NDVI is colored in yellow

Additional features computed are not included here, as they would reduce the clarity of Fig. 2. The distribution of features can be depicted, as shown in Fig. 3, for each pair of features. The SVM is trained to find optimal boundaries between the classes represented by these image features. Initial classification is performed by applying the trained SVM to each pixel.

The result of the initial classification is, for each pixel, the most probable class with its probability, and additionally a second class with its probability if there are two classes with high probabilities. The two classes and their probabilities are used when the fusion of several initial classification results is performed, see section 5.

Fig. 4 lists the color representation of the classes that are assigned in initial classification and that are used in the following classification images.
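A minimal sketch of keeping the two most probable classes per pixel, assuming a probability matrix like the one returned by predict_proba in the earlier sketch; the threshold for accepting a second class is an illustrative assumption.

```python
# Sketch: keep the most probable and second most probable class per pixel,
# given a (n_pixels, n_classes) probability matrix from the trained SVM.
import numpy as np

def top_two_classes(probabilities, second_class_threshold=0.25):
    """Return best/second-best class ids and probabilities per pixel.

    The second class is kept only if it is also reasonably probable;
    the threshold value here is an illustrative assumption.
    """
    order = np.argsort(probabilities, axis=1)
    best = order[:, -1]
    second = order[:, -2]
    rows = np.arange(probabilities.shape[0])
    best_p = probabilities[rows, best]
    second_p = probabilities[rows, second]
    # Mark the second class as invalid (-1) when its probability is low.
    second = np.where(second_p >= second_class_threshold, second, -1)
    return best, best_p, second, second_p
```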

Figure 2: A Subset of Feature Values used in Initial Classification

Figure 3: Distribution of Two Image Features Related to the Predefined Classes


Figure 4: Classes for Initial Classification and their Color Representation

Fig. 5 illustrates a detail of the classification result based on the color scheme in Fig. 4.

A more complex example for the input and output of initial classification is depicted in Fig. 6. The input is RGB and NIR data of a flight strip, and the output is a sequence of initial classification results.

Figure 5: (left) RGB Image Detail

(right) Initial Classification Result


3. AUTOMATIC AERIAL TRIANGULATION

Digital airborne cameras are able to deliver highly redundant images, which result in small baselines. Normally, the strips of images have at least 80% forward overlap and at least 20% side overlap (in urban areas 60% side overlap). This high redundancy and the constrained motion of the aircraft help to find the good starting solutions needed for a fully automated AT. Nevertheless, an accurate extraction of tie points is needed for a robust and accurate AT (Thurgood, 2004). Our extraction of points of interest (POIs) is based on Harris points and on POIs from line intersections (Bauer, 2004).

After the POI extraction in each image, we calculate feature vectors from the close neighborhood of each point. These feature vectors are used to find 1-to-n correspondences between POIs in two images. The number of candidates is further reduced using affine invariant area based matching. In order to fulfill the non-ambiguity criterion, only matches with a highly distinctive score are retained. The robustness of the matching process is enhanced by performing a back-matching as well.
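The paper uses its own POI descriptors and affine invariant area based matching; purely to illustrate the distinctiveness criterion and back-matching, the sketch below uses OpenCV ORB features and a brute-force matcher whose cross-check implements the mutual (back-)matching test.

```python
# Sketch: distinctive matching with back-matching (cross-check), using
# OpenCV ORB features as a stand-in for the paper's own POI descriptors.
import cv2

def matched_points(img1, img2, max_features=5000, keep=1000):
    orb = cv2.ORB_create(nfeatures=max_features)
    kp1, desc1 = orb.detectAndCompute(img1, None)
    kp2, desc2 = orb.detectAndCompute(img2, None)
    # crossCheck=True keeps a match only if it is mutual, i.e. the
    # back-matching from image 2 to image 1 returns the same pair.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc1, desc2)
    # Retain only the most distinctive matches by descriptor distance.
    matches = sorted(matches, key=lambda m: m.distance)[:keep]
    pts1 = [kp1[m.queryIdx].pt for m in matches]
    pts2 = [kp2[m.trainIdx].pt for m in matches]
    return pts1, pts2
```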

Another restriction is enforced by the epipolar geometry. Therefore the RANSAC method is applied to the well-known five-point algorithm (Nister, 2003). As a result we obtain inlier correspondences as well as the essential matrix. By decomposition of the essential matrix, the relative orientation of the current image pair can be calculated.
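OpenCV ships a RANSAC wrapper around a five-point solver in the spirit of (Nister, 2003), so this step can be sketched directly; the camera matrix K and the RANSAC parameters below are assumptions.

```python
# Sketch: RANSAC over the five-point algorithm and decomposition of the
# essential matrix into a relative orientation, using OpenCV.
import cv2
import numpy as np

def relative_orientation(pts1, pts2, K):
    pts1 = np.asarray(pts1, dtype=np.float64)
    pts2 = np.asarray(pts2, dtype=np.float64)
    # Five-point algorithm inside a RANSAC loop: returns the essential
    # matrix and an inlier mask for the correspondences.
    E, inliers = cv2.findEssentialMat(
        pts1, pts2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0
    )
    # Decompose E and resolve the fourfold ambiguity by cheirality checks.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t, inliers
```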

This step is accomplished for all consecutive image pairs. In order to get the orientation of the whole set, the scale factor for additional image pairs has to be determined. This is done using corresponding POIs available in at least three images. A block bundle adjustment refines the relative orientation of the whole set and integrates other data like GPS or ground control information. Fig. 7 shows an oriented block of images.

Figure 6: (bottom) RGB images of a flight strip

(top) Initial classification results of the flight strip

Figure 7: Oriented Block of 7 Strips of about 50 Images Each, Each Image Denoted by a Small Arrow

The 7 x 50 aerial images are oriented to each other using about 70,000 tie points on the ground, which are shown as white dots in Fig. 7. The whole block of images was processed without any human interaction.


4. DENSE MATCHING

Once the AT is finished, we perform a dense area based matching to produce a dense DSM (digital surface model). During the last few years, more and more new dense matching algorithms have been introduced. A good comparison of stereo matching algorithms is given by Scharstein and Szeliski (Scharstein, 2002). Recently, a PDE based multi-view matching method was introduced by Strecha et al. (Strecha, 2003). In our approach we focus on an iterative and hierarchical method based on homographies to find dense corresponding points. For each input image an image pyramid is created, and the calculation starts at the coarsest level. Corresponding points are determined and upsampled to the next finer level, where the calculation proceeds. This procedure continues until the full resolution level is reached. A more detailed description of this algorithm, implemented on graphics hardware, can be found in (Zach, 2003).
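The actual method matches via homographies and runs on graphics hardware (Zach, 2003); the sketch below only illustrates the coarse-to-fine pyramid control flow, with a naive per-pixel SSD search standing in for the real matcher, and every parameter is an assumption.

```python
# Sketch: coarse-to-fine control flow of hierarchical dense matching.
# A naive per-pixel SSD search stands in for the homography-based
# matching of the actual method; all parameters are assumptions.
import cv2
import numpy as np

def refine_disparity(left, right, disparity, search=2, win=7):
    """Refine a disparity estimate by a small local SSD search."""
    h, w = left.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    best = disparity.copy()
    best_cost = np.full((h, w), np.inf, dtype=np.float32)
    for d in range(-search, search + 1):
        cand = disparity + d
        # Sample the right image at x minus the candidate disparity.
        warped = cv2.remap(right, xs - cand, ys, cv2.INTER_LINEAR)
        cost = cv2.boxFilter((left - warped) ** 2, -1, (win, win))
        better = cost < best_cost
        best = np.where(better, cand, best)
        best_cost = np.where(better, cost, best_cost)
    return best

def coarse_to_fine_disparity(left, right, levels=5):
    left = left.astype(np.float32)
    right = right.astype(np.float32)
    pyr_l, pyr_r = [left], [right]
    for _ in range(levels - 1):           # build pyramids, fine to coarse
        pyr_l.append(cv2.pyrDown(pyr_l[-1]))
        pyr_r.append(cv2.pyrDown(pyr_r[-1]))
    disparity = np.zeros(pyr_l[-1].shape, dtype=np.float32)
    for lv in range(levels - 1, -1, -1):  # start at the coarsest level
        disparity = refine_disparity(pyr_l[lv], pyr_r[lv], disparity)
        if lv > 0:                        # upsample to the next finer level
            h, w = pyr_l[lv - 1].shape
            disparity = 2.0 * cv2.resize(disparity, (w, h))
    return disparity
```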

The workflow for aerial triangulation and dense matching is as follows: the input dataset consists of one or more flight strips of panchromatic data, camera parameters, and approximations for the exterior orientation of the images. The resulting output is a DSM, see Fig. 8.

Figure 8: (top) Panchromatic images of a flight strip

(bottom) DSM textured with panchromatic image data of the area marked red in the top image

5. DATA FUSION AND ORTHO IMAGE GENERATION

An ortho image is obtained by the ortho projection of the DSM, see Fig. 9 for an RGB ortho image. The color information of the ortho image is calculated using all available aerial images of one or more flight strips and is based on the view-dependent texture mapping described in (Bornik, 2001). The color information may be either panchromatic, RGB or CIR (NIR-R-G).

The fusion of initial classification results is done in the following way (see the sketch below):

• Determine if a pixel of an initial classification result is visible in the ortho image, i.e. not hidden by an object

• For each visible pixel, get the class with the highest probability as well as the class with the second highest probability, if available

• Perform special handling of shadows: remove visible shadow results if other initial classification results have more specific results like solid or vegetation

• Perform majority voting using the classes with the highest and second highest probability for all visible pixels

The advantages of the ortho classification are:

• Moving objects are removed, see Fig. 10

• Very scattered classification results are improved, see Fig. 11
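A minimal sketch of this voting scheme, assuming per-view arrays of best and second-best classes with probabilities and a visibility mask; the class ids, the simplified shadow rule and the probability-weighted voting are illustrative assumptions.

```python
# Sketch: per-pixel majority voting over several ortho-projected initial
# classification results. Class ids and the shadow rule are assumptions.
import numpy as np

SHADOW = 5          # hypothetical class id for dark shadows
N_CLASSES = 8

def fuse(best, best_p, second, second_p, visible):
    """Fuse k overlapping results into one ortho classification.

    best, second      : (k, h, w) int arrays of class ids (-1 = none)
    best_p, second_p  : (k, h, w) float arrays of class probabilities
    visible           : (k, h, w) bool array, pixel visible in ortho image
    """
    k, h, w = best.shape
    votes = np.zeros((N_CLASSES, h, w), dtype=np.float64)
    for i in range(k):
        for cls, p in ((best[i], best_p[i]), (second[i], second_p[i])):
            for c in range(N_CLASSES):
                sel = visible[i] & (cls == c)
                votes[c][sel] += p[sel]
    # Shadow handling: drop shadow votes wherever any more specific
    # (non-shadow) class also received support.
    non_shadow = np.delete(votes, SHADOW, axis=0).sum(axis=0) > 0
    votes[SHADOW][non_shadow] = 0.0
    # Pixels without any votes default to class 0 in this sketch.
    return votes.argmax(axis=0)
```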


Figure 9: Ortho RGB Image

Figure 10: Moving Red Car in the 2 Left Images is Removed in the Ortho Initial Classification in the Right Image



6. DATA FUSION AND REFINED CLASSIFICATION

Data fusion and the use of multiple classifiers, see (Roli, 2002), is a topic of special interest. The following data form the basis for data fusion in refined classification:

• Ortho-initial-classification

• Ortho-panchromatic image

• Ortho-height image extracted from the DSM

Refined classification updates the results from initial classification, merged into an ortho classification, using the spatial properties of the 3D features and the high resolution ortho panchromatic image. Spatial properties make it possible to differentiate trees from low vegetation, concrete roofs from streets, etc. Data fusion includes the computation of additional information from the ortho input data, like height gradients, building blocks and texture measures.

The following refinement of initial classification results is performed:

• Solid gets refined into

o Streets, depicted in gray

o Buildings, depicted in yellow

• Vegetation gets refined into

o Grassland and fields, depicted in bright green

o Wood and trees, depicted in dark green

The refined classification of objects of class solid implies the training of a minimal building height to distinguish buildings from objects with low height like cars and small huts. The minimal building height is used to compute building blocks. Building blocks are defined as local height maxima, as described in (Bolter, 2001), that are restricted to all non-vegetation and non-water classes. The building blocks are computed in the following way for each pixel classified as solid or roof in initial classification (a sketch of this computation follows below):

• Compute the significant minimal height value in a region with a specified radius

• Compute the maximal height difference, i.e. the difference between the height value and the significant minimal height value

• If the maximal height difference is higher than the trained minimal building height, then the pixel belongs to a building

• Remove small regions up to a specified size to prevent, for example, street-lamps or other small but high objects from being classified as buildings

Refined classification for objects of class solid or roof is based on the computed building blocks. See Fig. 12 for an example of building blocks and of refined classification results in which buildings – in yellow – are correctly classified. The RGB ortho image of the scene used is given in Fig. 9.

Figure 12: (left) Building blocks

(right) Refined classification with buildings in red and yellow – depending on roof color – as well as trees in dark green and grass in bright green

Fig. 13 shows how small objects like street-lamps or cars are handled in refined classification. The 2 lamps are seen as solid objects with large height values, which may lead to a classification as building. Due to their small size, the lamps are not classified as building but as solid. Additionally, small red objects like the red car in Fig. 13 that are misclassified as roofs are reclassified as solid in the refined classification due to their small size.

Figure 13: (left) RGB ortho image with 2 street lamps and a red car marked

(middle) Refined classification without verification of regions with small size

(right) Refined classification result ignoring the street-lamps and the car
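A minimal sketch of the building-block computation above, using SciPy; a plain local minimum stands in for the paper's "significant" minimal height, and the radius, building height and region size thresholds are illustrative assumptions.

```python
# Sketch: building blocks from the ortho height image. All thresholds
# are illustrative assumptions, not values from the paper.
import numpy as np
from scipy import ndimage

def building_mask(height, solid_or_roof, radius=25,
                  min_building_height=3.0, min_region_px=40):
    """height: (h, w) DSM heights; solid_or_roof: (h, w) bool mask."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    disk = x * x + y * y <= radius * radius
    # Minimal height in the circular region (grey erosion); a plain
    # minimum stands in for the paper's "significant" minimal height.
    local_min = ndimage.grey_erosion(height, footprint=disk)
    # Maximal height difference against that local minimum.
    height_diff = height - local_min
    # Pixels high enough above their surroundings become building pixels.
    mask = solid_or_roof & (height_diff > min_building_height)
    # Remove small regions (street-lamps, cars and other small objects).
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    keep = np.flatnonzero(sizes >= min_region_px) + 1
    return np.isin(labels, keep)
```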

Figure 11: Scattered Classification Result in a Kitchen Garden is Improved in the Ortho Initial Classification in the Right Image

The fusion of several initial classification results improves the quality of the classification. Scattered results, for example those caused by chimneys or windows on roofs, are improved in refined classification when a fusion with the height data is performed.


Refined classification not only detects buildings but also refines the class vegetation into grass and wood or trees. The refinement is based on an SVM trained using the following features (see the sketch after this list):

• the panchromatic value

• mean and standard deviation of panchromatic values in a specified neighborhood of the pixel

• mean and standard deviation of height gradient values in a specified neighborhood of the pixel
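A minimal sketch of assembling these three feature groups from a panchromatic ortho image and the DSM, using square neighborhoods as an approximation; the window size and array names are illustrative assumptions.

```python
# Sketch: the three feature groups for the vegetation refinement SVM.
# The window size and array names are illustrative assumptions.
import numpy as np
from scipy import ndimage

def vegetation_features(pan, dsm, size=9):
    """Per-pixel features: pan value, local pan mean/std, and local
    height-gradient mean/std in a size x size neighborhood."""
    pan = pan.astype(np.float64)
    gy, gx = np.gradient(dsm.astype(np.float64))
    grad = np.hypot(gx, gy)          # height gradient magnitude

    def mean_std(plane):
        mean = ndimage.uniform_filter(plane, size=size)
        sq = ndimage.uniform_filter(plane ** 2, size=size)
        return mean, np.sqrt(np.maximum(sq - mean ** 2, 0.0))

    pan_mean, pan_std = mean_std(pan)
    grad_mean, grad_std = mean_std(grad)
    # Stack into (n_pixels, 5) rows for an SVM as sketched in section 2.
    return np.stack([pan, pan_mean, pan_std, grad_mean, grad_std],
                    axis=-1).reshape(-1, 5)
```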

The use of height gradient values improved the detection of wood or trees compared to approaches where only texture measures are used. Fig. 14 gives an example of a house as well as trees, grass and solid. The trees are correctly classified.

Refined classification performs data fusion in such a way that the classification results are less scattered, see again Fig. 14: the initial classification – top middle image – has the roof correctly classified, but the chimney and a small roof over a window are classified as solid. The height data – top right image – as well as the height gradients – bottom left image – and the building blocks – bottom middle image – cause a classification of the whole roof as one block, see the bottom right image.

Figure 14: (top left) RGB image of a house with a red roof
(top middle) Initial classification
(top right) Height data
(bottom left) Height gradients
(bottom middle) Building blocks
(bottom right) Refined classification

7. CONCLUSIONS AND FUTURE WORK

In our approach we use the high redundancy of the source input images to carry out classification. The fusion of initial classification results into an ortho classification, the DSM generation, and the fusion of ortho images and DSM are all based on images with a high degree of image overlap. The classification task is performed without human interaction after an initial training phase. In the future we see the need to extract edge-based information and to construct digital elevation models from the DSM and the classification results.

REFERENCES

[1] J. Bauer, H. Bischof, A. Klaus, K. Karner, (2004), Robust and Fully Automated Image Registration using Invariant Features, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXV, Istanbul, Turkey, ISSN 1682-1777.

[2] R. Bolter, (2001), Buildings from SAR: Detection and Reconstruction of Buildings from Multiple View High Resolution Interferometric SAR Data, Dissertation, TU Graz.

[3] A. Bornik, K. Karner, J. Bauer, F. Leberl, H. Mayer, (2001), High-quality Texture Reconstruction from Multiple Views, The Journal of Visualization and Computer Animation, Volume 12, Issue 5, Online ISSN: 1099-1778, Print ISSN: 1049-8907, John Wiley & Sons, pp. 263-276.



[4] C. C. Chang and C. J. Lin, (2005), LIBSVM: A Library for Support Vector Machines, National Taiwan University.

[5] C. W. Hsu, C. C. Chang, and C. J. Lin, (2003), A Practical Guide to Support Vector Classification, Department of Computer Science and Information Engineering, National Taiwan University, Taipei 106, Taiwan.

[6] C. Huang, L. S. Davis, J. R. G. Townshend, (2002), An Assessment of Support Vector Machines for Land Cover Classification, Int. J. Remote Sensing, 23(4), 725-749.

[7] A. Klaus, J. Bauer, K. Karner, K. Schindler, (2002), MetropoGIS: A Semi-Automatic City Documentation System, Photogrammetric Computer Vision 2002 (PCV'02), ISPRS Commission III Symposium, September 9-13, 2002, Graz, Austria.

[8] F. Leberl, J. Thurgood, (2004), The Promise of Softcopy Photogrammetry Revisited, ISPRS 2004, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXV, Istanbul, Turkey, ISSN 1682-1777.

[9] D. Nister, (2003), An Efficient Solution to the Five-Point Relative Pose Problem, CVPR 2003, 195-202.

[10] F. Roli, J. Kittler, (2002), Fusion of Multiple Classifiers, Information Fusion, 3, 243.

[11] D. Scharstein and R. Szeliski, (2002), A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms, IJCV, 47(1/2/3), 7-42.

[12] M. Sormann, A. Klaus, J. Bauer, K. Karner, (2004), VRModeler: From Image Sequences to 3D Models, SCCG (Spring Conference on Computer Graphics) 2004, ISBN 80-223-1918-X, pp. 152-160.

[13] C. Strecha, T. Tuytelaars, L. Van Gool, (2003), Dense Matching of Multiple Wide-Baseline Views, ICCV 2003, 2, 1194-1201.

[14] J. Thurgood, M. Gruber, K. Karner, (2004), Multi-Ray Matching for Automated 3D Object Modeling, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXV, Istanbul, Turkey, ISSN 1682-1777.

[15] V. N. Vapnik, (1995), The Nature of Statistical Learning Theory, Springer.

[16] C. Zach, A. Klaus and K. Karner, (2003), Accurate Dense Stereo Reconstruction using 3D Graphics Hardware, Eurographics 2003, Short Presentations, pp. 227-234.

