IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 8, NO. 2, JUNE 2007 169

Video and Seismic Sensor-Based Structural Health Monitoring: Framework, Algorithms, and Implementation

Tarak Gandhi, Member, IEEE, Remy Chang, and Mohan Manubhai Trivedi

Abstract—This paper presents the design and application of novel multisensory testbeds for collection, synchronization, archival, and analysis of multimodal data for health monitoring of transportation infrastructures. The framework for data capture from vision and seismic sensors is described, and the important issue of synchronization between these modalities is addressed. Computer-vision algorithms are used to detect and track vehicles and extract their properties. It is noted that the video and seismic sensors in the testbed supply complementary information about passing vehicles. Data fusion between features obtained from these modalities is used to perform vehicle classification. Experimental results of vehicle detection, tracking, and classification obtained with these testbeds are described.

Index Terms—Multisensor systems, pattern classification, structural health monitoring, tracking.

I. INTRODUCTION AND MOTIVATION

THE deterioration of civil infrastructure in North America, Europe, and Japan has been well documented and publicized. In the United States, 50% of all bridges were built before the 1940s, and approximately 42% of these structures are structurally deficient [1], [2]. In addition, in regions such as the U.S. West Coast and Japan, earthquakes can cause further damage to the structures. Since the earthquakes in Northridge, CA (1994), and Kobe, Japan (1995), there has been a quantum jump in the number of civil structures that have been instrumented for monitoring purposes. The Federal Highway Administration has been actively involved in structural health monitoring of bridges and other infrastructure using various types of sensors.

Manuscript received January 4, 2005; revised May 25, 2006, July 16, 2006, August 11, 2006, and November 14, 2006. This work was supported by the NSF-ITR under Grant 0205720. The Associate Editor for this paper was H. Chen.

T. Gandhi is with the California Institute for Telecommunications and Information Technology and Computer Vision and Robotics Research Laboratory, University of California, San Diego, La Jolla, CA 92093-0434 USA (e-mail: [email protected]).

R. Chang was with the Computer Vision and Robotics Research Laboratory, University of California, San Diego, La Jolla, CA 92093-0434 USA. He is now with Broadcom Corporation, San Jose, CA 95134-1933 USA (e-mail: [email protected]).

M. M. Trivedi is with the Computer Vision and Robotics Research Laboratory, University of California, San Diego, La Jolla, CA 92093-0434 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TITS.2006.888601

The International Society for Structural Health Monitoring of Intelligent Infrastructure (http://www.ishmii.org) has described several case studies of monitoring various bridges, including the Commodore Barry Bridge on the Delaware River [3] and the Portage Creek Bridge in Victoria, BC, Canada [4]. Various issues involved in the use of sensors for structural damage detection in civil as well as aeronautical applications are discussed in [5].

The principal objective of civil infrastructure health monitoring is to detect and assess the level of damage due to natural and man-made events, as well as progressive environmental deterioration. Sensor data, including video streams, are likely to be useful for measuring parameters related to these events and their effect on the infrastructure. Since vehicular traffic is one of the main causes of deterioration, it is important to characterize the overall traffic flow as well as properties of individual vehicles, such as size and weight, and to relate these to vibrations and other observable effects.

Seismic sensors such as strain gauges, accelerometers, and fiber-optic devices can provide a temporal signature of the vehicles passing over them that could be used to extract the weight and the effect of the vehicles on the structure with good accuracy. However, seismic sensors are also sensitive to other natural and artificial phenomena, such as earthquakes, blasts, and external vibrations. Video sensors could be useful for distinguishing these phenomena from normal vehicular traffic. Video sensors give rich information about the shape, size, color, velocity, and track history of vehicles. However, they are more susceptible to environmental conditions such as illumination and shadows and to imaging artifacts such as occlusion. Hence, the information provided by the seismic and video sensors is complementary and can be combined to improve the reliability of the overall system.

We are part of a multidisciplinary group sponsored by NSF that is developing a framework for civil infrastructure health monitoring [2]. A multisensory testbed on the University of California, San Diego (UCSD) campus is used to collect, process, and archive data from seismic as well as vision sensors. In addition, data from 26 strain gauges mounted on the Vincent Thomas Bridge are recorded in the San Diego Supercomputer Center (SDSC) database. Pattern recognition methods such as neural networks are being used to perform system identification and anomaly detection for identifying damage in the structures, as well as to identify different classes of vehicles passing over the testbed using signatures from the seismic sensors [6].



Fig. 1. Block diagram for structural health monitoring of bridges.

Information on our activities is available at http://healthmonitoring.ucsd.edu.

This paper describes the design and application of a multisensory framework for collection, analysis, and archival of data from seismic as well as visual sensors for health monitoring of civil infrastructure. An application performing vehicle detection, quantitative property extraction, and classification is discussed in detail, followed by a set of real-world experiments that show the potential of this framework. The organization of this paper is as follows. Section II describes the overall system configuration and issues such as data retrieval and synchronization between multiple data streams. Vehicle detection, tracking, and information extraction using vision sensors are described in Section III. The information obtained from visual sensors is combined with that from the seismic sensors in Section IV to perform multisensory vehicle classification. Section V summarizes this paper and explores avenues for future work.

II. MULTISENSORY FRAMEWORK FOR HEALTH MONITORING

Fig. 1 shows the block diagram of the overall system. The inputs are obtained from multiple cameras as well as seismic sensors. To correlate information from various sensors, synchronization issues are addressed using timestamps. Video streams are processed by the vision module to perform detection and tracking of vehicles and extraction of their properties such as size, speed, and track history. These are used in conjunction with the information obtained from seismic sensors in order to obtain reliable classification. Since the video data is too large for complete archival, only snapshots of vehicles detected by the vision module are stored. The extracted vehicle properties are also stored in the database.

To conduct research on sensor-based infrastructure health monitoring, we have designed testbeds containing video as well as seismic sensors on our campus. The project involves multidisciplinary teams, with the Computer Vision and Robotics Research (CVRR) Laboratory designing the distributed video array (DIVA), the Department of Structural Engineering designing the seismic sensor array, and SDSC working on storage and retrieval of the multisensory data.

A. DIVA

An array of video sensors has been installed near the CVRR laboratory, and it is used for computer-vision research with applications in transportation and homeland security [7]–[9]. Issues such as modularity, scalability, wide-area distribution using wired or wireless links, as well as open standards for hardware and software, were considered during the design. The configuration of the vision sensor array is shown in Fig. 2. Several sensor clusters are configured, with two of them located on streetlights near a busy roadway on campus where buses, cars, bikes, and people are regularly in motion, as shown in Fig. 3(a). Other sensors are mounted on a neighboring building with coverage of nearby roads and courtyards. Each sensor cluster contains a high-speed pan/tilt/zoom (PTZ) rectilinear camera and an Omni-Directional Video Sensor (ODVS), both mounted in weatherproof housings. The video from these sensors is digitized by special video server units and streamed using direct fiber-optic cable and a pair of 1-Gb switches. This special high-speed network connection has the capability of carrying sixteen full-size video streams.


Fig. 2. Outdoor video clusters provide continuous streaming video of roads and other areas over standard or high-speed networks.

The modular design of this sensor architecture allows different algorithms to be tested without any difficulty in a plug-and-play manner. Another cluster of PTZ and ODVS cameras has a direct view of the I-5 freeway from the campus.

In addition, the Voigt Bridge on campus, which passes over the I-5 freeway (see Fig. 4), has been instrumented with vision as well as seismic sensors. The vision sensors include Sony XCD-X710CR Firewire cameras that capture high-resolution (1024 × 768) images. Each camera outputs images in Bayer format to save bandwidth, and these can be decoded to produce normal color images. The high-bandwidth data from the cameras are transferred to a computer mounted inside the bridge. Due to the distance between the cameras and the computer, transmission is performed over fiber-optic cable, with Firewire-fiber converters at both ends. Data can be retrieved from the computer onto the main network through a wireless link.

B. Seismic Sensor Array

To monitor the long-term performance under usual traffic loading conditions, the Department of Structural Engineering at UCSD has installed three fiber-reinforced polymer bridge-deck panels along the same campus roadway [10], as shown in Fig. 3(b). The composite deck is mounted with 16 strain-gauge sensors. The data from these sensors are captured and transferred to the database located at the SDSC. Fig. 3(c) shows the response of two of the strain gauges when a bus passes over the composite deck. Two prominent peaks corresponding to the passing of the front and rear wheels are recorded on the strain gauge corresponding to the lane over which the bus has passed. The value of the peaks depends on the vehicle load, and the interval between the peaks indicates the time taken by the vehicle to pass over the sensors.

In addition, a number of seismic sensors have recently been mounted under the Voigt Bridge on campus, which crosses the I-5 freeway (Fig. 4). Data from these sensors are captured by the bridge-mounted computer using the PXI data-acquisition system from National Instruments. A wireless connection is used to send the sensor data as well as the video data from this computer to the main campus network.

C. Capture and Synchronization Issues

The block diagram of the video and seismic event capture system is shown in Fig. 5. The video camera that provides the view of the testbed in Fig. 3 is used to capture video in a time-synchronized manner with the strain-gauge capture system by synchronizing the system clocks of the separate capture systems and adding a timestamp to the data as soon as samples have been acquired. The system is modularized into data capture components and a data processing component. This allows incremental improvements to be made to each component without directly affecting other parts of the system. The components include strain-gauge data capture, video data capture, data processing, and data archival.

Seismic sensors such as strain gauges and accelerometers typically output 16-bit data at a sampling rate of 200 Hz each. For the testbed used, the total data rate of the sensors is 51.2 kb/s. This relatively low data rate allows the strain-gauge sensors to be operational at all times, transmitting sensor data to the data storage server.


Fig. 3. (a) Image acquired from a camera on a campus roadway testbed, with a bus passing over the composite deck panel shown as a blue rectangle. (b) Strain-gauge sensors inside the composite deck panel (from [6]). (c) Time series of measurements recorded by the two strain-gauge sensors.

Although the current video capture system is capable of recording 640 × 480 video at 30 frames/s in Motion-JPEG compressed format, the data rate for such a configuration would be 45 Mb/s, which makes long-term data acquisition and transmission over remote links impractical. It is possible to store subsampled images for subsequent processing, or to use other compression formats such as MPEG-2 with upgraded hardware capabilities. However, since the goal of the project is to study the effect of traffic on the health of bridges and other structures, storage of full-motion video is not critical to the analysis of the state of the bridge. It is more useful to process the videos in real time to provide another source of traffic data in addition to that provided by the strain gauges and accelerometers. In addition, it is useful to store snapshot sequences around interesting events detected by the vision system.

Fig. 4. Testbed on the Voigt Bridge, crossing the I-5 freeway. Pole-mounted cameras overlook the traffic on the bridge and freeway. A seismic sensor array mounted under the bridge monitors the vibrations due to traffic and other events.

Fig. 5. Block diagram of seismic and video event capture system.

A local buffer of the past 15 s of data is maintained, and it is transmitted to the data archival system when an event of interest occurs.

Both of the data capture systems reside on a local network and transmit the data to be processed over a network link. The strain-gauge data capture component is able to capture and transmit time-stamped data samples with 100-µs precision over the local network due to the low data rate, whereas the video capture system down-samples and compresses the video data to maintain a consistent frame rate. The compression method chosen was lossy JPEG compression due to its frame-to-frame independence and relatively low computational complexity. The frame-to-frame independence is necessary to allow for an accurate history of the past 15 s worth of video and also allows more straightforward access to the archived data for review. A simple bidirectional protocol is implemented between the data analyzer and the video server to transmit a timestamp along with every captured frame and to receive triggers that send the buffered data to the storage server when necessary.
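To make the buffering scheme concrete, the following minimal Python sketch keeps a rolling window of time-stamped JPEG frames and flushes it to an archive on a trigger. The class name, the (timestamp, jpeg_bytes) representation, and the archive.store interface are illustrative assumptions, not details from the paper.

```python
import collections

class PreEventBuffer:
    """Keep the most recent window_s seconds of time-stamped JPEG frames."""

    def __init__(self, window_s=15.0):
        self.window_s = window_s
        self.frames = collections.deque()  # entries: (timestamp, jpeg_bytes)

    def push(self, timestamp, jpeg_bytes):
        self.frames.append((timestamp, jpeg_bytes))
        # Drop frames older than the window; JPEG frame-to-frame
        # independence makes trimming safe at any frame boundary.
        while self.frames and timestamp - self.frames[0][0] > self.window_s:
            self.frames.popleft()

    def flush_to_archive(self, archive):
        """On a trigger from the data analyzer, hand the buffered history over."""
        for ts, jpg in self.frames:
            archive.store(ts, jpg)  # hypothetical storage interface
        self.frames.clear()
```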

It should be noted that the synchronization obtained using timestamps is sufficient only if the seismic sensors are visible in the field of view, as with Camera 1 in Fig. 5. When the sensors are not in the field of view, as with Camera 2 in Fig. 5, the vehicles passing over the seismic sensors need to be matched with those seen in the camera. Currently, this matching is performed manually. However, the velocity of the vehicle obtained from vehicle tracking in images can be used to estimate the approximate time at which it would pass over the seismic sensors.
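As a rough illustration of that last idea, the sketch below predicts when a tracked vehicle should register on the out-of-view gauges and tests a candidate seismic event against that prediction. The distance, tolerance, and function names are assumptions for illustration only.

```python
def predicted_crossing_time(t_seen, distance_to_gauge_m, speed_mps):
    """Time at which a tracked vehicle should reach the seismic sensors,
    given its image-based speed estimate (assumed constant)."""
    return t_seen + distance_to_gauge_m / speed_mps

def is_match(event_time, t_pred, tol_s=1.0):
    """Associate a seismic event with a track if it falls within a tolerance."""
    return abs(event_time - t_pred) <= tol_s
```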


Fig. 6. Block diagram of image-based vehicle detection and tracking algorithm.

III. VEHICLE DETECTION AND TRACKING

To detect interesting events corresponding to moving vehicles, the image-based vehicle detection and tracking algorithm shown in Fig. 6 is used. The algorithm is capable of processing the video stream captured at 30 Hz. Further acceleration is possible with relatively simple digital signal processor or field-programmable gate array hardware. Single-instruction multiple-data-stream vector processing available in many modern general-purpose processors can also be used to speed up the processing.

The algorithm generates a background image from a number of frames using a fast infinite impulse response (IIR) low-pass filter. The background is subtracted from the original image frame to enhance moving objects and suppress the stationary background. A statistical parametric algorithm is used to suppress shadows of the moving objects. The shadow-suppressed difference image is then used to extract blobs, and vehicles are detected by selecting blobs satisfying appropriate constraints. The vehicles are tracked over a number of frames using mean shift and Kalman filters, and their statistics, such as blob area, aspect ratio, and velocity, are determined.

A. Motion Analysis and Video Segmentation

To identify moving objects in a video sequence, it is necessary to identify locations where changes occur in the video scene. A computationally inexpensive method of accomplishing this is background subtraction for segmentation, where a background image is subtracted from the current video image.

The background is generated using an IIR low-pass filter, as described in [8]. The use of an IIR low-pass filter allows the background to be slowly updated for changes in lighting and scenery while utilizing a small memory footprint and having a low computational complexity. Other methods using a running average [11] or median filtering [12] require substantially more frames to be kept in memory to provide a sufficient estimate of the background. On the other hand, Gaussian mixture models [13] and optical flow analysis [14] are capable of detecting moving objects more precisely and can also be used for this application. However, they are computationally more intensive.

An advantage of utilizing an IIR filter is its relatively inexpensive computational requirements. Only two frames are kept in memory to represent the filter taps of a first-order low-pass filter. The calculations for generating the background are deterministic and computationally inexpensive, requiring only three multiply and four accumulate operations per pixel for each color channel. The background image is subtracted from the incoming frame to generate a difference image that suppresses the background and enhances moving objects. The IIR filter continuously updates the background in order to accommodate illumination changes.
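A minimal sketch of this first-order IIR background model, assuming frames are held as NumPy float arrays; the update rate alpha is an assumed parameter, and the exact per-pixel arithmetic of the deployed filter may differ.

```python
import numpy as np

def update_background(background, frame, alpha=0.01):
    """First-order IIR low-pass update: bg <- (1 - alpha) * bg + alpha * frame.
    Only the running background and the current frame are held in memory."""
    return (1.0 - alpha) * background + alpha * frame

def difference_image(background, frame):
    """Signed difference; negative values are retained for the shadow test."""
    return frame.astype(np.float32) - background.astype(np.float32)
```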

B. Shadow Suppression

The estimation of vehicle statistics can be adversely affected when shadows are present. In those cases, the shadows should be separated before classifying the pixels as object or background. In this paper, shadow suppression is performed using the statistical parametric approach described in [15]. Shadow suppression can be switched on or off depending on environmental conditions.

Negative values in the background-subtracted image are observed to determine which pixels have decreased in intensity. These are candidates for shadow pixels, because cast shadows always cause the intensity of the pixels to drop. Also, pixels covered by shadows generally exhibit a drop in the normalized red color component and a rise in the normalized blue color component. For a pixel to be classified as a shadow, it must have a negative intensity shift relative to the background with a magnitude greater than the hysteresis threshold. For these pixels, if the blue-to-red ratio of the foreground is at least 2% higher than that of the corresponding background pixel, the pixel is classified as a predicted shadow pixel. Filtering is performed to eliminate gaps for shadow pixels as well as foreground pixels. This suppresses noise and produces the final predicted shadows.
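A sketch of this shadow test, assuming RGB float arrays and a single-channel signed intensity difference from background subtraction. The 2% blue-to-red rise comes from the text; the intensity-drop threshold and the gap-filtering step (omitted here) are assumptions.

```python
import numpy as np

def predict_shadow_mask(frame, background, diff_intensity, drop_thresh=15.0):
    """Mark pixels as predicted shadows: intensity dropped past the threshold
    and the blue-to-red ratio rose by at least 2% relative to the background."""
    eps = 1e-6
    candidates = diff_intensity < -drop_thresh          # intensity decreased
    br_fg = frame[..., 2] / (frame[..., 0] + eps)       # blue/red, RGB order assumed
    br_bg = background[..., 2] / (background[..., 0] + eps)
    return candidates & (br_fg >= 1.02 * br_bg)
```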


Fig. 7. (a) Vehicle segmentation without shadow suppression. (b) Background subtraction, showing shadow pixel candidates in red. (c) Vehicle segmentation with shadow pixels suppressed.

Fig. 7 shows examples of shadow suppression, with a bounding box around the detected vehicle. The results are compared with those without shadow suppression.

C. Segmentation Analysis

After the shadow-suppressed difference image is obtained, a hysteresis threshold on its magnitude is employed to extract blobs from the difference image. Hysteresis thresholding takes advantage of the fact that foreground blobs tend to form connected components, reducing misdetections and false alarms. In this procedure, two threshold levels are set based on observations of the magnitude of the difference image. Pixels above the high threshold are classified as foreground, and those below the low threshold as background. Pixels between the two thresholds are set as foreground only if their connected component touches a foreground pixel.
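A compact sketch of the hysteresis step, assuming the difference magnitude is a NumPy array; the two threshold levels are placeholders to be tuned per installation.

```python
import numpy as np
from scipy import ndimage

def hysteresis_blobs(diff_mag, low=10.0, high=30.0):
    """Keep pixels above `low` only if their connected component
    contains at least one pixel above `high`."""
    strong = diff_mag > high
    weak_or_strong = diff_mag > low
    labels, n = ndimage.label(weak_or_strong)
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True  # components touching a strong pixel
    keep[0] = False                         # label 0 is the background
    return keep[labels]
```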

After the binary image of blobs is obtained, morphological closing is used to eliminate spurs. The following constraints are placed on the blobs to select the blobs of interest:

1) Minimum dimension requirement: decreases the likelihood of pedestrians being mistaken for a vehicle.

2) Minimum blob area requirement: ensures that the blob is sufficiently large to be an object of interest.

3) Blob density requirement: requires that the bounding box of the blob has a certain minimum density. This eliminates cases of loosely connected blobs.

Applying these constraints produces an image containing only relatively large connected blobs. The bounding box of each blob is used as an initial estimate of the location of the vehicle, which is input to the mean-shift algorithm [16] to produce a better estimate of the size and location. This technique relies on the assumption that vehicles are more textured than the roadway; thus, the bounding box should grow or shrink to fit the contour of the vehicle. A back-projection histogram image is created in HSV space and used by mean shift for accurate localization of objects.
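The sketch below applies the three blob constraints from the list above and then refines a detection window with OpenCV's mean shift on an HSV back-projection; all thresholds are assumed values, and `hist` is taken to be a precomputed hue histogram of the target blob.

```python
import cv2
import numpy as np

def filter_blobs(stats, min_dim=20, min_area=800, min_density=0.4):
    """stats rows follow cv2.connectedComponentsWithStats: x, y, w, h, area."""
    boxes = []
    for x, y, w, h, area in stats[1:]:  # row 0 is the background component
        if min(w, h) >= min_dim and area >= min_area and area / (w * h) >= min_density:
            boxes.append((int(x), int(y), int(w), int(h)))
    return boxes

def refine_with_meanshift(frame_bgr, box, hist):
    """Refine a bounding box using hue back-projection and mean shift."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    _, refined = cv2.meanShift(backproj, box, criteria)
    return refined
```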

Fig. 8. Steps in the implementation of the vehicle tracking algorithm. (a) Original frame from the video sequence. (b) Output after background subtraction. (c) Identification of vehicle regions. (d) Tracking over frames using a Kalman filter.

D. Vehicle Tracking

After the bounding box is extracted for each vehicle, the pixel locations are remapped using a planar homography to obtain ground-plane coordinates, correcting for perspective distortion. Kalman filtering is used to predict the positions of the remapped centroid and bounding box of each vehicle as well as to estimate its velocity. Frame-to-frame association is performed by first using nearest-neighbor centroid locations for several frames to allow the Kalman filter to converge. The closest centroid to the predicted centroid and the closest instantaneous velocity to the estimated velocity are then used for association. When a vehicle is not detected in the next frame, the Kalman prediction of the bounding box location and centroid is used in conjunction with the mean-shift algorithm to estimate the next location of the vehicle.
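A minimal sketch of the ground-plane remap and the constant-velocity Kalman predict/update cycle; the homography H is assumed to be calibrated offline, and the noise magnitudes q and r are placeholders.

```python
import numpy as np

def to_ground_plane(H, u, v):
    """Map an image point to ground-plane coordinates via the homography H."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]

def kalman_predict(x, P, dt, q=1.0):
    """Constant-velocity model over state [x, y, vx, vy]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    Q = q * np.eye(4)  # assumed process noise
    return F @ x, F @ P @ F.T + Q

def kalman_update(x, P, z, r=1.0):
    """Update with a ground-plane position measurement z = (x, y)."""
    Hm = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0]], dtype=float)
    R = r * np.eye(2)  # assumed measurement noise
    innov = z - Hm @ x
    S = Hm @ P @ Hm.T + R
    K = P @ Hm.T @ np.linalg.inv(S)
    return x + K @ innov, (np.eye(4) - K @ Hm) @ P
```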

E. Application

Fig. 8 shows an example of the vehicle detection and tracking algorithm. Fig. 9 shows the application with live data taken from the testbed. Snapshots of the vehicles passing over the seismic sensors, shown at the side, are recorded. The bottom figures show the output from the strain-gauge sensors. Note that the current implementation has a delay in getting the sensor output; hence, the blips due to the vehicle are not visible in the image. Fig. 10 shows snapshots of vehicles detected by the algorithm, along with strain-gauge measurements recorded at the same time. The algorithm for detection and tracking of vehicles was also tested on the I-5 testbed camera (Fig. 4). The results are shown in Fig. 11.

IV. SENSOR FUSION AND CLASSIFICATION

The classification of vehicles moving over civil infrastructures is useful for studying the effect of vehicles on the deterioration of roadways.


Fig. 9. Snapshot of the vehicle detection and tracking application. The detected vehicles in the left and right lanes are identified, and the snapshots of them shown on the side are archived. The bottom figures show the output from the strain-gauge sensors. Note that the current implementation has a delay in getting the sensor output; hence, the blip due to the vehicle is not visible in the image.

Fig. 10. Snapshots of detected vehicles with the measurements from two strain gauges corresponding to vehicles traveling in the left and right lanes.

Fig. 11. Results of vehicle detection and tracking algorithm on freeway traffic.

The signatures from seismic sensors could be used for classifying vehicles, as in [7]. However, the strain gauges are also sensitive to natural and artificial events such as earthquakes, blasts, and vibrations due to heavy traffic on underpasses. Video sensors can help disambiguate these external events from the effects of traffic. The following section describes the use of multiple modalities, strain-gauge and video sensors, to perform classification of vehicles.

A. Multimodal Feature Selection

The strain-gauge response for a vehicle contains two peaks corresponding to the passing of the front and rear wheels over the composite deck. The peak strain-gauge response is higher for heavier vehicles such as buses and relatively low for passenger cars. Furthermore, heavy vehicles are longer than cars and tend to travel slower. For this reason, the time interval between the peaks is likely to be larger for heavier vehicles. Hence, the peak strain-gauge response as well as the time duration between the two peak responses can be used to discriminate between vehicle types. However, the peak response of the strain-gauge sensors can vary greatly depending on the load the vehicles are carrying. Also, with light traffic such as cars, the signal-to-noise ratio is low, and the peak responses, as well as the interval between peaks, are likely to be less reliable. For classification using video, the blob area is useful for distinguishing classes based on physical size. The aspect ratio can also be used to distinguish between different types of vehicles. However, the accuracy of the blob parameters deteriorates in the presence of shadows, occlusions, and poor lighting.

Thus, it is seen that vision and seismic sensors have their limitations under different operating conditions but can complement each other in performing classification. To perform sensor fusion between the two modalities, the following features obtained from the video and strain-gauge sensors were studied:

1) peak response recorded by the strain gauge in a time interval around the detected vehicle;

2) time interval between the peak responses of the strain gauge;

3) vehicle blob area obtained from the image;

4) aspect ratio of the vehicle blob.

For scale invariance, logarithms of these features were used during classification. However, a small offset was added before taking the logarithms in order to avoid the logarithm of zero and to minimize sensitivity to noise when feature values are small.
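As a one-line illustration, assuming the four features are collected into an array, with eps standing in for the paper's unspecified offset:

```python
import numpy as np

def log_features(peak_strain, peak_gap, blob_area, aspect_ratio, eps=1e-3):
    """Scale-invariant features; the offset avoids log(0) and tames near-zero noise."""
    raw = np.array([peak_strain, peak_gap, blob_area, aspect_ratio], dtype=float)
    return np.log(raw + eps)
```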

B. Classification Using Fisher Linear Discriminant

For classification, Fisher linear discriminant analysis [17] was applied to these features. The original feature space is projected to a lower dimensional space in a way that preserves as much discriminability between the classes as possible. This classifier uses the within-class scatter $S_W$ and the between-class scatter $S_B$ given by

$$S_W = \sum_c \sum_k (x_{c,k} - \mu_c)(x_{c,k} - \mu_c)^T, \qquad S_B = \sum_c n_c (\mu_c - \mu)(\mu_c - \mu)^T$$

where $x_{c,k}$ represents the feature vector of sample $k$ in class $c$, $n_c$ is the number of samples in class $c$, $\mu_c$ is the class mean, and $\mu$ is the overall mean over all the classes. The feature vector $x$ is projected into a lower dimensional space as

$$y = W^T x.$$


Fig. 12. Scatter plots of various features associated with vehicles. Cars and other light vehicles are marked with a blue cross, and buses and other heavy vehicles are marked with a red triangle.

To preserve the discriminative power even when projected into the lower dimensional space, one tries to maximize the between-class scatter while minimizing the within-class scatter. The scatter matrices of the projected feature vector are given by

$$\tilde{S}_B = W^T S_B W, \qquad \tilde{S}_W = W^T S_W W.$$

The optimal projection is found by maximizing the ratio

$$J(W) = \frac{W^T S_B W}{W^T S_W W}.$$

It can be shown that $W$ is then a matrix whose columns are the eigenvectors $w_i$ corresponding to the $d$ largest generalized eigenvalues of the system

$$S_B w_i = \lambda_i S_W w_i$$

with $d$ being the dimension of the reduced space.
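A compact sketch of this construction using SciPy's generalized symmetric eigensolver; X is an (n, d) feature matrix, y holds class labels, and $S_W$ is assumed nonsingular.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_projection(X, y, out_dim=1):
    """Return W whose columns solve S_B w = lambda * S_W w for the
    out_dim largest generalized eigenvalues."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    d = X.shape[1]
    SW = np.zeros((d, d))
    SB = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Z = Xc - mu_c
        SW += Z.T @ Z                      # within-class scatter
        dm = (mu_c - mu)[:, None]
        SB += len(Xc) * (dm @ dm.T)        # between-class scatter
    vals, vecs = eigh(SB, SW)              # generalized eigenproblem
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:out_dim]]        # project with y = W.T @ x
```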

The class distributions of the projected features are then modeled as lower dimensional Gaussians with class means and variances projected from their original values. To classify the samples, the Bayesian criterion can be applied to the log-likelihood functions $L_c$ given by

$$L_c = \frac{(y - \tilde{\mu}_c)^T \tilde{\Sigma}_c^{-1} (y - \tilde{\mu}_c)}{2} - \ln\sqrt{2\pi |\tilde{\Sigma}_c|}$$

where

$$\tilde{\mu}_c = W^T \mu_c, \qquad \tilde{\Sigma}_c = W^T \Sigma_c W.$$

It should be noted that the presence of outliers in a class, lying very far from most of the elements of the class, has a large effect on the computation of these matrices. Hence, during the computation of the scatter matrices, an iterative method is used to identify and remove the outliers. The Mahalanobis distance between each sample and the class mean is computed as

$$\sqrt{(x_{c,k} - \mu_c)^T \Sigma_c^{-1} (x_{c,k} - \mu_c)}$$


TABLE I. Classification performance using all combinations of features. (a) Area under ROC. (b) Total error probability.

where $\Sigma_c$ is the class covariance matrix. Features with a large Mahalanobis distance are removed, and the scatter matrices are recomputed. The process is repeated for a few iterations.
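A sketch of this trimming loop; the distance cutoff and the number of iterations are assumptions.

```python
import numpy as np

def trim_outliers(Xc, n_iter=3, cutoff=3.0):
    """Iteratively drop samples of one class whose Mahalanobis distance
    to the class mean exceeds the cutoff, then recompute the statistics."""
    for _ in range(n_iter):
        mu = Xc.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(Xc, rowvar=False))
        diff = Xc - mu
        dist = np.sqrt(np.einsum('ij,jk,ik->i', diff, inv_cov, diff))
        Xc = Xc[dist <= cutoff]
    return Xc
```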

C. Experimental Results and Analysis

For the experimental studies, 40 min of video and seismic sensor data obtained from the roadway testbed were processed. A total of 67 vehicles were detected and classified using the features extracted from the video and strain-gauge sensors. Fig. 12 shows the scatter plots between various pairs of features. It is seen that the blob area works best in discriminating between the two classes, which form distinctive clusters except for two outliers having abnormally large blob areas due to the merging of two blobs into one. The clusters approximately correspond to two classes: 1) small vehicles such as cars, vans, and small pick-up trucks (blue crosses) and 2) large vehicles such as buses and trucks (red triangles). However, it should be noted that the performance of the image features could deteriorate under conditions with poor illumination, sharp shadows, occlusions, and background motion. The strain-gauge response would be useful in such cases. The peak strain is usually lower for cars than for buses. However, since the strain-gauge response is related to the weight rather than the volume of the vehicle, there is some overlap between heavier SUVs and empty buses. The gap between the peaks was quite consistent for large vehicles. However, for small vehicles, the peaks in the strain-gauge response were not well defined; therefore, the peak gap was unreliable and had a large spread. On the other hand, the aspect ratio of small vehicles was found to be more consistent than that of large vehicles.

The linear discriminant analysis described above was then applied to all the different combinations of the features, projecting them into one dimension and then classifying based on the Bayesian criterion. For performance evaluation, a bootstrapping method was used to generate a larger number of trials from the same data set. Each trial randomly partitioned the data set into training and testing subsets of nearly equal size.

Fig. 13. ROC curves for classification obtained using combinations of various features.

The classifier was trained using the training set and then applied to the testing set. The experiment was repeated for 1000 trials, and receiver operating characteristic (ROC) curves were obtained by pooling the classified test samples from all the trials.
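The evaluation loop might look like the sketch below, which pools scores from repeated random splits before computing a single ROC curve; train_classifier and score_samples stand in for the Fisher pipeline above, and binary labels are assumed.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def bootstrap_roc(X, y, train_classifier, score_samples, n_trials=1000, seed=0):
    """Random ~50/50 splits; train on one half, score the other,
    and pool all test scores into one ROC curve."""
    rng = np.random.default_rng(seed)
    scores, labels = [], []
    n = len(y)
    for _ in range(n_trials):
        idx = rng.permutation(n)
        train, test = idx[: n // 2], idx[n // 2:]
        model = train_classifier(X[train], y[train])
        scores.append(score_samples(model, X[test]))
        labels.append(y[test])
    fpr, tpr, _ = roc_curve(np.concatenate(labels), np.concatenate(scores))
    return fpr, tpr, auc(fpr, tpr)
```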


Fig. 14. Scatter plots of features associated with vehicles divided into three classes. Cars are marked with a blue cross, vans and pickup trucks with a magenta circle, and buses and other large vehicles with a red triangle.

Different combinations of features were used to classify the test samples, and the performance is reported in terms of the area under the ROC (AROC) as well as the total error of classification for all combinations of features (Table I). Individual ROC curves generated from selected combinations are shown in Fig. 13 for comparison purposes.

It was observed that the classification obtained using the blob area was better than that from the peak strain measurement. The strain-gauge peaks and peak gaps were not very reliable for cars, since their amplitudes were only slightly greater than the background noise. Also, some small but heavier vehicles, such as pick-up trucks included in the car class, possibly gave large peak strain responses. Combining peak strain and blob area gave better results than using the strain-gauge information alone. However, the results using area and strain together were not much different from those using only the blob area.

The approach described above can also be used to divide the vehicles into a larger number of classes. Fig. 14 shows the scatter plots of features of vehicles divided into three classes: 1) cars; 2) buses and other large vehicles; and 3) vans and small pick-up trucks. However, it is observed that the distinction between classes 1) and 3) in terms of these features is not substantial enough to perform reliable classification. Other features such as shape contours and the full signature of the seismic sensors may help to disambiguate between the classes. Furthermore, a much larger dataset would be needed to adequately train the classifier. Linear discriminant analysis would be especially useful in this case to reduce the dimensionality of the problem.

It should be noted that camera placement can play a crucial role in the accuracy of vehicle classification. As shown in Fig. 5, there are two cameras overlooking the roadway on which the composite deck lies. Camera 2, which has a side view, shown in Fig. 15, overlooks the road from a high perspective. In contrast to Camera 1 (shown in Fig. 8), there is less perspective distortion to influence vehicle area measurements.


Fig. 15. (a) Image from the side-view camera. The length of the vehicle (instead of its width) can be measured, giving better discrimination between cars and buses. (b) Scatter plot of the strain-gauge response against blob area.

Furthermore, in this view, the length of the vehicle can be measured, which has greater variation between the car and bus classes. For these reasons, greater separability between classes is obtained using the side-view blob area, as seen in the scatter plot of the peak strain against the blob area (Fig. 15). This camera view, however, does not overlook the strain-gauge sensors on the composite deck, and it currently requires manual registration to correlate the strain-gauge data with the video data. If automatic registration is desired, the speed of the vehicle computed by tracking in the vision sensor can be used to estimate the expected time at which it would be detected by the seismic sensor. Correlation between the image-based features and the seismic sensor response would also be useful to disambiguate the matches.

These results show the potential of using sensor fusion for improving classification. Nonlinear classifiers such as support vector machines and neural networks may help obtain more robust performance.

V. SUMMARY AND FUTURE WORK

This paper described the issues involved in the design of a multisensory testbed for health monitoring of transportation infrastructure. An application for detection and tracking of vehicles passing over the testbed using computer-vision algorithms was described. It was noted that the vision and seismic sensors provide complementary information and have their respective advantages and limitations. Hence, a combination of these modalities would be useful for robust analysis. In particular, an application that classifies vehicles using features from both modalities was illustrated.

For future work, we plan to use multiple cameras and seismic sensors in order to obtain more robust tracking and classification of vehicles. Multiple cameras with overlapping fields of view would be useful for separating the three dimensions of the vehicles and would help in handling occlusion between vehicles. On the other hand, cameras with slightly overlapping or nonoverlapping fields of view could be used to cover a wider area. In that case, the problem of handoff and reidentification needs to be addressed. Also, when the seismic sensors are not in the line of sight of the cameras, one needs to match the vehicles observed by the sensor with those seen by the camera. This is a multimodal reidentification problem that also needs to be explored. Possible cues for reidentifying the vehicles are the expected time interval computed from the vehicle speed, as well as the correlation between the size of the vehicle and the response of the seismic sensors. Using these cues, the classification process could be combined with reidentification to form a unified system.

If there are multiple traffic lanes, the camera could identify the lane from the lateral positions of the vehicle track. Each lane could have its own seismic sensor to detect vehicles passing in that lane. In this way, correspondences can be established separately for each lane. In that case, the vehicle velocity can be used to synchronize the camera and the seismic outputs. If the lanes cannot be identified, then the correlation method will probably need to be used.

The classification illustrated in this paper uses two features from each modality. Features such as shape, texture, and color from vision sensors and time-series properties from seismic sensors would be useful for finer classification into more vehicle classes. The use of classifiers such as support vector machines or neural networks could also help increase the robustness of the classification. Other modalities such as audio, laser scanners, and inductive loops can also help improve the reliability of detection and classification of vehicles [18].

ACKNOWLEDGMENT

The authors acknowledge the support of the National Science Foundation and a UC Discovery Grant for this research. The authors thank the editor and the reviewers for providing insightful comments that helped improve the quality of this paper, as well as the collaborators A. Elgamal and J. Conte, along with their research groups from the Department of Structural Engineering, UCSD.


The authors would also like to thank, in particular, M. Fraser for his valuable assistance with data collection, the colleagues from the CVRR laboratory for their contributions and support, J. Ploetner for the implementation of the shadow removal algorithm, and J. Wu for insights on pattern classification.

REFERENCES

[1] J. M. Stallings, J. W. Tedesco, M. El-Mihilmy, and M. McCauley, "Field performance of FRP bridge repair," ASCE J. Bridge Eng., vol. 5, no. 2, pp. 107–113, 2000.

[2] A. Elgamal, J. P. Conte, L. Yan, M. Fraser, S. F. Masri, M. El Zarki, T. Fountain, and M. Trivedi, "A framework for monitoring bridges and civil infrastructure," in Proc. 3rd China-Japan-US Symp. Struct. Health Monitoring and Control, Dalian, China, Oct. 2004.

[3] A. E. Aktan, F. N. Catbas, K. A. Grimmelsman, and M. Pervizpour, "Development of a model health monitoring guide for major bridges," Drexel Intell. Infrastruct. Transp. Safety Inst., Philadelphia, PA, Federal Highway Administration Research Development, Tech. Rep. No. DTFH61-01-P-00347, Sep. 2002.

[4] A. A. Mufti, S. Rahman, V. Lemay, P. Sargent, and S. Huffman, "Field assessment and remote monitoring of a bridge pier strengthened with GFRP wrap," in Proc. Develop. Short and Medium Span Bridge Eng., Aug. 2002, vol. 1, pp. 611–618.

[5] H. Van der Auweraer and B. Peeters, "Sensors and systems for structural health monitoring," J. Struct. Control, vol. 10, no. 2, pp. 117–125, 2003.

[6] L. Yan, M. Fraser, A. Elgamal, and J. P. Conte, "Applications of neural networks in structural health monitoring," in Proc. 3rd China-Japan-US Symp. Struct. Health Monitoring and Control, Dalian, China, Oct. 2004.

[7] G. Kogut and M. M. Trivedi, "Real-time wide area tracking: Hardware and software infrastructure," in Proc. 5th Int. IEEE Conf. Intell. Transp. Syst., Singapore, Sep. 2002, pp. 587–593.

[8] R. Chang, T. Gandhi, and M. M. Trivedi, "Computer vision for multisensory structural health monitoring system," in Proc. 7th IEEE Conf. Intell. Transp. Syst., Oct. 2004, pp. 971–976.

[9] M. M. Trivedi, T. L. Gandhi, and K. S. Huang, "Distributed interactive video arrays for event capture and enhanced situational awareness," IEEE Intell. Syst., Special Issue on Artificial Intelligence in Homeland Security, vol. 20, no. 5, pp. 58–66, Sep./Oct. 2005.

[10] M. Fraser, A. Elgamal, K. Oliver, and J. P. Conte, "Data fusion application for health monitoring," in Proc. 1st Int. Workshop Adv. Smart Mater. and Smart Struct. Technol., Honolulu, HI, Jan. 2004.

[11] K. Karmann, A. Brandt, and R. Gerl, "Moving object segmentation based on adaptive reference images," in Proc. European Signal Processing Conf., 1990, vol. 5, pp. 951–954.

[12] A. Lai and N. Yung, "A fast and accurate scoreboard algorithm for estimating stationary backgrounds in an image sequence," in Proc. IEEE Int. Symp. Circuits and Syst., 1998, vol. 4, pp. 241–244.

[13] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in Proc. IEEE Conf. Comput. Vis. and Pattern Recog., Fort Collins, CO, Jun. 1999, pp. 246–252.

[14] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proc. DARPA Image Understanding Workshop, 1981, pp. 121–130.

[15] I. Mikic, P. Cosman, G. Kogut, and M. M. Trivedi, "Moving shadow and object detection in traffic scenes," in Proc. Int. Conf. Pattern Recog., Sep. 2000, vol. 1, pp. 321–324.

[16] D. Comaniciu, V. Ramesh, and P. Meer, "Real-time tracking of non-rigid objects using mean shift," in Proc. IEEE Conf. Comput. Vis. and Pattern Recog., 2000, vol. 2, pp. 142–149.

[17] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed. New York: Wiley-Interscience, 2000.

[18] J. Ploetner and M. M. Trivedi, "A multimodal framework for vehicle and traffic flow analysis," in Proc. IEEE Intell. Transp. Syst. Conf., Sep. 2006, pp. 1507–1512.

Tarak Gandhi (S'93–M'99) received the Bachelor of Technology degree in computer science and engineering from the Indian Institute of Technology, Bombay, India, and the M.S. and Ph.D. degrees in computer science and engineering, specializing in computer vision, from Pennsylvania State University, State College.

He was with Adept Technology, Inc., where he designed algorithms for robotic systems. Currently, he is an Assistant Project Scientist at the California Institute for Telecommunications and Information Technology (Cal-IT2), University of California, San Diego, La Jolla. He is also one of the key members of the Computer Vision and Robotics Research Laboratory (CVRR), University of California, San Diego. He is working on projects involving intelligent driver assistance, motion-based event detection, traffic flow analysis, and structural health monitoring of bridges. His research interests include computer vision, motion analysis, image processing, robotics, target detection, and pattern recognition.

Remy Chang received the B.S. and M.S. degrees in electrical and computer engineering from the University of California, San Diego, La Jolla.

During his graduate studies, he was with the Computer Vision and Robotics Research Laboratory, University of California, San Diego, where he performed multimodal analysis of traffic data from video as well as seismic sensors. He is currently with Broadcom Corporation, San Jose, CA, as an Engineer for enterprise switching ASICs.

Mohan Manubhai Trivedi received the B.E. degree (with honors) from the Birla Institute of Technology and Science, Pilani, India, and the Ph.D. degree from Utah State University, Logan.

He is a Professor of Electrical and Computer Engineering and the founding Director of the Computer Vision and Robotics Research Laboratory, University of California, San Diego (UCSD), La Jolla. In partnership with research laboratories of major automobile companies and the UC Discovery Program, he has established the Laboratory for Intelligent and Safe Automobiles (LISA) at UCSD to pursue a multidisciplinary research agenda. His research interests include intelligent systems, computer vision, intelligent ("smart") environments, intelligent vehicles and transportation systems, and human-machine interfaces.

Prof. Trivedi served as the Editor-in-Chief of the Machine Vision and Applications Journal (1996–2003) and on the editorial boards of several journals. Currently, he is an Associate Editor for the IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS. He served as a Program Chair for the IEEE International Intelligent Vehicles Symposium IV 2006. He serves on the Executive Committees of the University of California Digital Media Innovation Program and of the California Institute for Telecommunications and Information Technology (Cal-IT2) as the leader of the Intelligent Transportation and Telematics Layer at UCSD. He received the Distinguished Alumnus Award from Utah State University, the Pioneer Award (Technical Activities), and the Meritorious Service Award from the IEEE Computer Society. He regularly serves as a consultant to industry and government agencies in the USA and abroad.

