
Advances in Multimodal Tracking of Driver Distraction

Carlos Busso and Jinesh Jain

The University of Texas at Dallas, Richardson, TX 75080, USA
[email protected], [email protected]

Summary. This chapter discusses our research efforts in tracking driver distraction using multimodal features. A car equipped with various sensors is used to collect a database with real driving conditions. During the recording, the drivers were asked to perform common secondary tasks such as operating a cell phone, talking to another passenger and changing the radio stations. We analyzed the differences observed across multimodal features when the driver is engaged in these secondary tasks. The study considers features extracted from the controller area network-bus (CAN-Bus), a frontal camera facing the driver and a microphone. These features are used to predict the distraction level of the drivers. The output of the proposed regression model has high correlation with human subjective evaluations (ρ = 0.728), which validates our approach.

Keywords: Driver Behavior, Attention, Distraction, Multimodal Feature Analysis, Driver Distraction, Secondary Task, CAN-Bus Data, Head Pose Estimation, Subjective Evaluation of Distraction, Real Traffic Driving Recording

1 Introduction

New advances in sensing technologies and signal processing have opened interesting opportunities for in-vehicle systems that aim to increase safety on the roads. One important direction is to monitor driver behaviors that can lead to car accidents. According to the National Highway Traffic Safety Administration (NHTSA), more than 25% of the police-reported accidents involved distracted drivers [26]. This fact is supported by the “100-car Naturalistic Driving Study,” which concluded that over 78% of crashes and 65% of near crashes were the result of inattentive drivers [21]. These statistics are not surprising since 30% of the time that a car is moving, the driver is involved in secondary tasks that are potentially distracting [25]. With the development of new in-vehicle technologies, these numbers are expected to increase. Therefore, it is important to identify and develop feasible monitoring systems that


!"#$%"&$#'()&$*+&(

,+-.*$"&(

/++01.23(

!$2&"4-"#+5(

/&"#%.6(7.8+&.(

79:;,<5(

=".0(7.8+&.(

Fig. 1. Monitoring driver behavior system using multimodal information.

are able to detect and warn inattentive drivers. These systems will play a crucial role in preventing accidents and increasing the overall safety on the roads.

Distraction can affect the visual, cognitive, auditory, psychological and physical capabilities of the driver. Distraction is defined by the Australian Road Safety Board as “the voluntary or involuntary diversion of attention from the primary driving tasks not related to impairment (from alcohol, drugs, fatigue, or a medical condition)” [32]. Under this well-accepted definition, the driver is involved in additional activities that are not related to the primary driving task, which include talking to a passenger, focusing on events or objects, and manipulating in-cab technologies. As a result, the driver reduces his/her situational awareness, which affects his/her decision making and increases the risk of crashes.

We have been working on detecting inattentive drivers by combining different modalities including Controller Area Network-Bus (CAN-Bus) data, video cameras, and microphones [9, 14, 15]. Our long-term goal is to develop a multimodal framework that can quantify the attention level of the driver by using these noninvasive sensors (Fig. 1). Instead of relying on simulations, the study is based on recordings with actual drivers in real-world scenarios using the UTDrive platform, a car equipped with multiple sensors [2]. First, we have studied the changes observed in features across modalities when the driver is involved in common secondary tasks such as operating navigation systems, the radio and a cell phone [14]. Then, we have proposed a regression model based on relevant multimodal features that can predict driver distraction [9]. The results have shown that the outputs of the proposed system correlate with human subjective evaluations.

This chapter discusses the state of the art in detecting inattentive drivers using multiple sensing technologies. It describes previous studies and our own contributions in this field. Notice that we only focus on distractions produced by secondary tasks. We do not include distractions or impairments produced by alcohol, fatigue or drugs [18, 24].

The chapter is organized as follows. Section 2 gives a brief overview of previous work related to the study presented in this chapter. Section 3 describes


the protocol used to collect the database. Section 4 presents subjective evaluations to quantify the perceived distractive behaviors. Section 5 reports our analysis of features extracted from the CAN-Bus signal, a frontal camera and a microphone. We study the changes in behaviors observed when the driver is engaged in secondary tasks. Section 6 demonstrates that the multimodal features can be used to recognize drivers engaged in secondary tasks, and to infer the distraction level of the drivers. Section 7 concludes the chapter with discussion and future directions.

2 Related Work

Several studies have attempted to detect inattentive drivers. These studies have proposed different sensing technologies including Controller Area Network-Bus (CAN-Bus) data [2, 3, 16], video cameras facing the driver [11, 17, 30], microphones [3] and invasive sensors to capture biometric signals [10, 18, 23]. Some studies have analyzed data from real driving scenarios [2, 6, 16], while others have considered car simulators [12, 23, 31]. They also differ on the secondary tasks considered in the analysis. Bach et al. presented an exhaustive review of 100 papers that have considered the problem of understanding, measuring and evaluating driver attention [4]. This section gives a brief overview of the current approaches to detect driver distractions.

2.1 Modalities

Features derived from the vehicle such as speed, acceleration, and steering wheel angle can be very valuable to assess driver behaviors [12, 17, 27, 28, 31, 33]. Relevant information can be extracted from CAN-Bus data. Sathyanarayana et al. used CAN-Bus signals to model driver behaviors [27]. They extracted the steering wheel angle and the gas and brake pedal pressures. This information was used to detect driving maneuvers such as turns, stops, and lane changes. After maneuver recognition, they recognized distraction using a driver-dependent Gaussian mixture model-universal background model (GMM-UBM). Unfortunately, accessing the CAN-Bus information is not always possible, since car manufacturers protect this information. Accessing the car information is easier in studies that use car simulators. These interfaces usually provide detailed information about the car. For example, Tango and Botta used features such as the steering angle, lateral position, lateral acceleration, and speed of the host vehicle to predict the reaction time of the drivers [31]. Along with other features, Liang et al. used the steering wheel position, steering error and lane position to assess cognitive distraction [17]. Ersal et al. built a radial-basis neural network model to characterize normal driving behavior [12]. The proposed system used features derived from the pedal position. They used this normal model to identify variations in the


behaviors displayed when the driver was engaged in secondary tasks. Likewise, Yang et al. used a GPS signal to approximate information that can be extracted from the CAN-Bus signal such as the velocity and steering wheel angle. They used computer simulations for the study [33].

Cameras have been used to detect and track inattentive drivers. Several studies have attempted to infer the head pose and/or eyelid movements of the driver from frontal cameras [6, 16, 17, 30]. Liang et al. showed that eye movements and driving performance measures (e.g., steering wheel position, lane position) were useful to detect cognitive distraction [17]. They proposed a classifier trained with support vector machines (SVM), achieving an average accuracy of 81.1%. Su et al. presented a simple approach to monitor driver inattention using eyelid movements and facial orientation [30]. They used a low cost CCD camera mounted on the car dashboard. Bergasa et al. also considered eyelid movements and head pose to detect fatigue [6]. They estimated the percent eye closure (PERCLOS), eye closure duration, blink frequency, face position, fixed gaze and nodding frequency, which were fused with fuzzy classifiers. In addition to head rotations, Kutila et al. used the gaze of the driver and lane tracking data to detect visual and cognitive workload [16]. They used a stereo camera system to estimate the head and gaze features. An important challenge in this field is processing video in real driving conditions due to lighting variations. Fortunately, advances in computer vision have led to robust tracking algorithms that are effective and suitable for in-vehicle systems [20]. Furthermore, Bergasa et al. showed that an IR illuminator can be used to reduce changes in illumination [6].

Other studies have considered physiological signals, which are correlated with the driver's workload, attention and fatigue [10, 18, 23]. Among all physiological signals, electroencephalography (EEG) is the predominant and most used modality [4]. Putze et al. used multiple biometric signals such as skin conductance (SC), photoplethysmography (PPG), respiration, and EEG [23]. Damousis and Tzovaras proposed a fuzzy fusion system to detect alert versus drowsy drivers using the electrooculogram (EOG) [10]. They used this signal to infer different eyelid activity indicators that were used as features. Lin et al. used EEG to detect drowsiness [18]. Sathyanarayana et al. extracted CAN-Bus information along with body sensors (accelerometer and gyroscope) to detect driver distraction [28]. They attached body sensors to the driver's head and legs. The drivers were also recorded with cameras, which were used to manually and automatically segment the corpus into normal and task conditions. They reported accuracies above 90% for the detection of distraction with k-NN classifiers.

2.2 Inducing visual and cognitive distractions

Different approaches have been used to induce visual and cognitive distractions. They aim to increase the driver workload, affecting the primary driving task. Therefore, the recordings can include samples of inattentive behaviors.


The most common approaches to induce cognitive distractions include solving math problems [13, 16, 23], talking to another passenger [16, 27], and paying attention to cognitive activities (e.g., following the stock market) [17]. For visual distractions, common techniques are “look and find” tasks [17, 23, 31], operating devices (e.g., touchscreen, cellphone, GPS) [12, 14], and reading numbers [16]. In our work, we are interested in analyzing behaviors when the driver is engaged in secondary tasks that are commonly performed in real driving scenarios.

2.3 Driving Platforms

Bach et al. reported that most of the studies in this research area have considered recordings from car simulators (51% of the studies considered in the review) [4]. In many cases, using simulators is the only feasible and secure approach. For example, studies that aim to detect fatigue are usually conducted in a laboratory setup [10, 18]. As an exception, Bergasa et al. studied fatigue in real driving conditions. However, the subjects were asked to simulate drowsy behaviors [6]. Likewise, car simulators are normally used when physiological signals are collected [10, 18, 23]. Some of these signals are difficult to collect in a real car. Also, the invasive nature of the sensors makes this approach less suitable for real-world driving scenarios.

Some studies have used recordings from cars on real roads [6, 16, 27, 28]. For this purpose, different car platforms have been designed to collect driver behaviors with data acquisition systems consisting of invasive and noninvasive sensors in real-world driving scenarios. Examples include Argos [22], UYANIK [1] and UTDrive [2, 3] (this study uses the UTDrive platform). These cars provide more realistic data to study driver behaviors.

3 Recording Drivers in Real Road Conditions

The scope of our work is to use multimodal sensors to track driver behaviors during real driving conditions. This requirement implies that the study cannot rely on recordings based on driving simulations. Therefore, we recorded subjects driving the UTDrive platform on real roads near the main campus of The University of Texas at Dallas (UTD). The UTDrive car is a 2006 Toyota RAV4 equipped with data acquisition systems with multiple sensors including camera and microphone arrays (Fig. 2-a). The system records the CAN-Bus data, which provides relevant car information such as the brake, gas, acceleration, vehicle speed and steering wheel angle. A frontal camera (PBC-700H) facing the driver was placed on the dashboard behind the steering wheel (Fig. 2-b). This camera records 30 fps at a resolution of 320x240. It provides valuable information about the facial expressions and head orientation of the driver. There is also a camera facing the road ahead, which records 15 fps at a resolution of 320x240. Although we have not used this camera in


Fig. 2. UTDrive car and sensor setup: (a) UTDrive; (b) setup.

our work, it provides important information that can be used for lane tracking [8, 16]. The information is simultaneously stored in a Dewetron computer. In addition, a Global Positioning System (GPS) receiver is placed at the center of the front windshield for the experiments. The database was collected on dry days with good lighting conditions to reduce the impact of the environment on the study. The readers are referred to [2] for further details about the UTDrive project.

In our study, we are interested in analyzing the observed behaviors when the driver is involved in secondary tasks. We decided to include common activities such as changing radio stations, talking on a cellphone, operating a GPS and having a spontaneous conversation with a passenger. There are other common secondary tasks such as texting, eating, drinking, grooming, and smoking that were not included for safety reasons.

A multimodal database was collected from 20 subjects consisting of students and employees of the university. They were asked to drive the UTDrive car following a predefined 5.6-mile route, described in Figure 3. This route includes many traffic lights, stop signs, heavy and low traffic zones, residential areas and a school zone. Each subject completed this route twice (12-16 minutes per lap).

In the first lap, drivers are asked to sequentially perform common tasks as described in Figure 3. The first task corresponds to changing the built-in car radio to targeted stations (red route in Fig. 3). The second task requires the driver to input a specific pre-decided address into the GPS and then follow the instructions to the desired destination (green route in Fig. 3). Preliminary analysis of the data indicated that driver behaviors while operating and while following the GPS were different. Therefore, we subdivided this task into two. Then, the subject is asked to make a phone call from his/her cell phone to obtain flight information between two US cities (navy blue route in Fig. 3). For reasons similar to those observed in the GPS task, we subdivided this task into operating and talking on the cell phone. Notice that at the time of the recording, the State of Texas allowed drivers to use cell phones. After that, a passenger shows randomly selected pictures and the driver is asked to describe them (orange route in Fig. 3). The purpose of this task is to collect approximated (and maybe exaggerated) data on distractions caused by billboards,


Fig. 3. Route used for the recording (5.6 miles long). Subjects drove this route two times. In the first lap, the subjects performed the tasks in order, starting with the Radio task and ending with the Conversation task. In the second lap, the subjects drove the same route without performing any task.

sign boards and shops. The last task corresponds to a spontaneous conversation between the driver and a passenger, who asked a few general questions (black route in Fig. 3). After subdividing the phone and GPS tasks, the database includes the following 7 tasks: Radio, GPS - Operating, GPS - Following, Phone - Operating, Phone - Talking, Picture and Conversation.

The second lap involves normal driving without any of the aforementioned tasks, which is used as a normal reference. Since the same route is used for both normal and task conditions, the analysis is less dependent on the selected road.

4 Assessing Driver Distraction

After collecting the database, the first research question is to assess the level of distraction induced on the drivers by the selected secondary tasks [15]. Defining the ground truth for driver distraction is a crucial problem for training and testing systems that aim to identify inattentive drivers. However, this is a nontrivial task, since different in-cab activities will create specific demands on the drivers causing visual, cognitive, auditory, psychological and/or physical distractions.

As a first approximation, we conducted perceptual evaluations to assess driver distraction. A graphical user interface (GUI) was created for this subjective evaluation (Fig. 4). This GUI allows the evaluators to watch videos extracted from the frontal camera. They can evaluate the perceived distraction level of the driver on a scale from 1 (less distracted) to 5 (more distracted).


Fig. 4. Subjective evaluation GUI. The subjects are required to rate the video based on how distracted they feel the driver is (1 for less distraction, 5 for more distraction).

Notice that evaluators should be able to identify visual distractions with high accuracy. However, they may not be able to assess more challenging types of distractions. Quantifying cognitive or psychological distractions remains an open challenge.

The database contains over 7 hours of data. However, we decided to assess only a portion of this corpus to reduce the time and resources needed for the evaluation. The corpus was automatically split into 5-sec videos. For each of the drivers, three videos were randomly selected per task. In addition, we included three videos of the drivers under normal conditions. Therefore, we selected 480 5-sec videos for the evaluation (3 videos × 8 conditions × 20 drivers = 480). Nine students participated in the subjective experiment. They each assessed only 160 videos (1 video × 8 conditions × 20 drivers = 160). We read the adopted definition of distraction (Sec. 1) to the evaluators before the evaluation to unify their understanding of distraction. The presentation of the videos was randomized to avoid biases. With this setup, each video was rated by 3 independent evaluators.
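A minimal sketch of how such ratings could be aggregated into per-video and per-task distraction scores is shown below. The table layout, column names and file name are assumptions for illustration; they are not part of the original evaluation tool.

```python
# Illustrative aggregation of the subjective ratings (assumed CSV layout,
# not the authors' tool): one row per (evaluator, video) rating on the 1-5 scale.
import pandas as pd

ratings = pd.read_csv("subjective_ratings.csv")  # columns: video_id, driver, task, evaluator, score

# each 5-sec video was rated by 3 independent evaluators;
# the mean rating serves as the perceived distraction level of that video
per_video = (ratings.groupby(["video_id", "driver", "task"])["score"]
                    .mean()
                    .reset_index())

# mean and standard deviation per task across drivers and evaluators (cf. Fig. 5)
per_task = per_video.groupby("task")["score"].agg(["mean", "std"])
print(per_task)
```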

Figure 5 gives the means and standard deviations of the perceived distraction level across secondary tasks. The results indicate that the tasks GPS - Following and Phone - Talking are not perceived to be as distracting as the tasks GPS - Operating and Phone - Operating, respectively. Notice that talking on a cell phone increases the cognitive load of the drivers. Studies have reported that cellphone use affects driver performance (e.g., missing traffic lights, failing to recognize billboards) [4, 29]. This subjective evaluation does not seem to capture this type of distraction. Likewise, Radio and Picture are perceived as distractive tasks.


Fig. 5. Perceived distraction levels based on subjective evaluations. The figure shows the means and standard deviations for each task across drivers and evaluators.

5 Analysis of Multimodal Features

Our next research problem is to identify multimodal features that can characterize inattentive drivers. As mentioned in Section 3, the corpus includes CAN-Bus information, videos and audio. Identifying informative features from these noninvasive modalities is an important step toward detecting distracted drivers.

CAN-Bus: One important source of information is provided by the CAN-Bus, which includes the steering wheel angle, brake value, vehicle speed, and acceleration. The car also has sensors to measure and record the brake and gas pedal pressures. From these continuous streams of data, we estimate the derivative of the brake and gas pedal information. In addition, we estimate the jitter in the steering wheel angle, since we expect that drivers involved in secondary tasks will produce more “jittery” behaviors. Vehicle speed is also considered, since it is hypothesized that drivers tend to reduce the speed of the car when they are engaged in a secondary task.

Frontal video camera: The camera captures frontal views of the drivers. From this modality, we estimate the head orientation and eye closure. The head pose is described by the yaw and pitch angles. Head roll movement is hypothesized to be less important, given the considered secondary tasks. Therefore it is not included in the analysis. Likewise, the eye closure percentage is defined as the percentage of frames in which the eyelids are lowered below a given threshold. This threshold is set at the point where the eyes are looking straight at the frontal camera. These variables are automatically extracted with the AFECT software [5]. Previous studies have shown that this toolkit is robust against large datasets and different illumination conditions. Another advantage of this toolkit is that the information is independently estimated frame by frame. Therefore, the errors do not propagate across frames. Unfortunately, some information is lost when the head is rotated beyond a certain


degree or when the face is occluded by the driver's hands. The algorithm produces empty data in those cases.

Microphone array: The acoustic information is a relevant modality for secondary tasks characterized by sound or voice activity such as GPS - Following, Phone - Talking, Pictures and Conversation. Here, we estimate the average audio energy from the microphone that is closest to the driver.
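The following sketch illustrates, under stated assumptions, how such per-frame streams could be derived; the function names, thresholds and sampling rates are ours, not the chapter's implementation.

```python
# Illustrative per-frame streams: pedal derivatives, a simple steering jitter
# measure, an eye-closure flag against a fixed threshold, and short-term audio energy.
import numpy as np

def pedal_derivative(pedal, fs):
    """First derivative of the brake/gas pedal pressure (units per second)."""
    return np.gradient(np.asarray(pedal)) * fs

def steering_jitter(angle):
    """Frame-to-frame variation of the steering wheel angle; 'jittery' driving
    produces larger absolute differences."""
    angle = np.asarray(angle)
    return np.abs(np.diff(angle, prepend=angle[0]))

def eye_closure_flags(eyelid_opening, threshold):
    """1 when the eyelid opening falls below the reference measured while the
    driver looks straight at the frontal camera, 0 otherwise."""
    return (np.asarray(eyelid_opening) < threshold).astype(int)

def audio_energy(samples, frame_len):
    """Mean short-term energy per frame from the microphone closest to the driver."""
    samples = np.asarray(samples)
    n = len(samples) // frame_len
    frames = samples[:n * frame_len].reshape(n, frame_len)
    return np.mean(frames ** 2, axis=1)
```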

The proposed monitoring system segments the data into small windows (e.g., 5 seconds), from which it extracts relevant features. We estimate the mean and standard deviation of each of the aforementioned signals, which are used as features. Details of other preprocessing steps are described in Jain et al. [14].
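A minimal sketch of this windowing step is given below, assuming the streams have already been synchronized; the signal names and sampling rates are illustrative.

```python
# Window-level features: mean and standard deviation of each stream per 5-sec window.
import numpy as np

WINDOW_SEC = 5.0

def window_features(signals, rates):
    """signals: dict name -> 1-D array; rates: dict name -> samples per second.
    Returns one feature vector (mean and std per signal) per 5-second window."""
    n_windows = min(int(len(x) / (rates[k] * WINDOW_SEC)) for k, x in signals.items())
    features = []
    for w in range(n_windows):
        row = []
        for name, x in signals.items():
            step = int(rates[name] * WINDOW_SEC)
            seg = x[w * step:(w + 1) * step]
            row.extend([np.mean(seg), np.std(seg)])  # mean and std of this stream
        features.append(row)
    return np.array(features)

# example: CAN-Bus speed and head yaw at 30 Hz, audio energy at 100 frames/s
feats = window_features(
    {"speed": np.random.rand(9000), "yaw": np.random.rand(9000), "energy": np.random.rand(30000)},
    {"speed": 30, "yaw": 30, "energy": 100},
)
```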

After the multimodal features are estimated, we compare their values under task and normal conditions. Notice that segments of the road have different speed limits and numbers of turns. Therefore, the features observed when the driver was engaged in one task (first lap – Sec. 3) are only compared with the data collected when the driver was not performing any task over the same route segment (second lap – Sec. 3). This approach reduces the variability introduced by the route.

We conducted a statistical analysis to identify features that change their values when the driver is engaged in secondary tasks. A matched pairs hypothesis test is used to assess whether the differences in the features between each task and the corresponding normal condition are significant. We used matched pairs instead of independent samples, because we want to compensate for potential driver variability. For each feature f, we have the following hypothesis test [19]:

H0 : μ^f_normal − μ^f_task = 0
H1 : μ^f_normal − μ^f_task ≠ 0        (1)

where μ^f_normal and μ^f_task are the means of f in normal and task conditions, across speakers. Since the database consists of 20 drivers, we use a t-test for small samples,

t_f = d̄^f / (s^f_d / √n)        (2)

where d̄^f and s^f_d represent the mean and standard deviation of the sample of differences across the n (=20) drivers. Figure 6 shows the features that are found significant at p-value = 0.05 (dark gray), p-value = 0.10 (gray) and p-value = 0.20 (light gray). The figure shows that the means of the audio energy and head yaw movement are significantly different for all the tasks (p-value = 0.05). Eye closure percentage (blink) is also significantly different for tasks such as Radio, GPS - Following, Phone - Operating and Pictures. The figure also shows that there are tasks such as GPS - Following, Phone - Talking, and Conversation


Fig. 6. Results of the matched pairs t-test: features vs. tasks. For a particular task, gray regions indicate the features that are found to have significant differences (dark gray, p-value = 0.05; gray, p-value = 0.10; light gray, p-value = 0.20). (Rows: jitter, yaw, pitch, blink, speed, brake, gas and audio energy features – mean and STD; columns: Radio, GPS - Operating, GPS - Following, Phone - Operating, Phone - Talking, Pictures, Conversation.)

in which few of the selected features present significant differences at p-value = 0.05. Interestingly, these tasks are perceived as less distracting than the others (see Fig. 5). Notice that for different tasks, there are significant features across all the modalities considered in this study (CAN-Bus, video, and microphone).
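As an illustration of this matched-pairs analysis, the sketch below applies a paired t-test (Eqs. 1-2) to synthetic per-driver feature means; the data are made up, and only the procedure mirrors the description above.

```python
# Paired t-test on the per-driver differences between normal and task conditions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# per-driver means of one feature (e.g., head yaw) over the matched route segment
normal = rng.normal(loc=0.0, scale=5.0, size=20)            # normal condition (second lap)
task = normal + rng.normal(loc=4.0, scale=5.0, size=20)     # task condition (first lap)

# H0 states that the per-driver differences average to zero (Eq. 1);
# the statistic follows Eq. 2 with n = 20 drivers
t_stat, p_value = stats.ttest_rel(normal, task)
print(t_stat, p_value, p_value < 0.05)
```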

While Figure 6 identifies features that are significantly different under normal and task conditions, it does not show the characteristic patterns for each condition. Therefore, we estimated the mean and standard deviation of the features for each of the secondary tasks, across speakers. We also estimated these statistics for the features observed in the normal condition over the corresponding route segments. Figure 7 shows the error-bar plots for a) head yaw movement, b) head pitch movement, c) vehicle speed, and d) steering wheel jitter. Figure 7-a reveals that drivers tend to look to their right while performing the tasks. Changes in head pitch movements are more evident in tasks such as Phone - Operating and Picture (Fig. 7-b). In those cases, drivers tend to look down. Figure 7-c shows that drivers reduce the car speed when they are engaged in secondary tasks. This result is consistently observed across tasks. Figure 7-d shows that the jitter in the steering wheel is slightly higher in GPS - Operating and Phone - Operating. However, these differences are not significant (see Fig. 6). Figure 7 also shows differences in the features during normal conditions across tasks. These differences are inherently dependent on the road. This result suggests that the characteristics of the route are an important variable that should be considered in the design of automatic feedback systems [27].

Figure 8 shows the percentage of eye closure for the normal and task conditions. It can be seen that the closure rates differ from the patterns observed during the normal condition for tasks such as Radio, Phone - Operating, and Pictures. For these tasks, the drivers tend to keep their eyelids more open.

Figure 9 provides further information about the differences observed in head yaw movements for the tasks a) Radio and b) Conversation. The figure provides the feature distributions for normal and task conditions. Notice that


Fig. 7. Error-bar plots displaying the mean and standard deviation for: (a) head yaw, (b) head pitch, (c) vehicle speed, and (d) jitter in the steering wheel. Each panel compares the neutral and task conditions across the seven secondary tasks.

Fig. 8. Percentage of eye closure in task and normal conditions. The values for normal conditions are estimated over the features observed in the corresponding route of the tasks.

these are among the most common secondary tasks performed by drivers. The figure shows that both distributions present positive skewness for the task conditions. The implication is that drivers shift their attention from the road, which may affect their situational awareness.


Fig. 9. Distribution of head yaw movements [degrees] for the tasks (a) Radio and (b) Conversation. The distributions are estimated from data recorded during task (dark gray) and normal (light gray) conditions. The vertical lines represent the corresponding means.

6 Prediction of Driver Distractions

After studying relevant features that can signal driver distraction, this section explores whether the proposed multimodal features can be used to detect driver distractions. The study includes two evaluations. First, we train a classifier to recognize whether the driver is performing any of the secondary tasks (Sec. 6.1). Then, we build a regression model that aims to predict the level of distraction of the driver (Sec. 6.2).

6.1 Classification of Secondary Tasks

The perceptual evaluation in Section 4 shows that some secondary tasks are perceived as more distracting than others. This result suggests that recognizing drivers engaged in secondary tasks is an important problem. We have proposed binary classifiers to distinguish between individual tasks and normal conditions [14]. Here, we are interested in the multi-class problem of distinguishing between the 7 tasks and the normal condition (8-class problem) [9]. We argue that this is a more practical approach that can lead to useful applications.

For this evaluation, we trained a k-Nearest Neighbor (kNN) classifier. The database is split into 5-sec windows, which are considered independent samples. The task labels are assigned to the samples according to the task in the corresponding route segment. To ensure driver-independent results, samples from one driver are included in either the training or the testing set, but not both. This was implemented using a “leave-one-driver-out” cross-validation scheme. Table 1 gives the average accuracies across folds for different values of k. The best performance is obtained for k = 20, which gives an accuracy of 42.72%. This accuracy is significantly higher than chance (12.5%).


Table 1. Accuracies for the multi-class kNN classifier for different values of k.

k          4        8        12       16       20
Accuracy   0.3582   0.3958   0.4021   0.4218   0.4272
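A minimal sketch of this driver-independent evaluation protocol is shown below, assuming the window-level features, task labels and driver IDs are already available; the arrays here are random placeholders, so the printed accuracy is not the chapter's result.

```python
# Leave-one-driver-out evaluation of an 8-class kNN (sketch under assumed inputs).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

X = np.random.rand(800, 14)              # 5-sec window features (placeholder values)
y = np.random.randint(0, 8, 800)         # labels: 7 secondary tasks + normal condition
drivers = np.repeat(np.arange(20), 40)   # driver ID of each window (20 drivers)

# samples of a given driver appear either in the training folds or in the test fold, never both
scores = cross_val_score(KNeighborsClassifier(n_neighbors=20), X, y,
                         cv=LeaveOneGroupOut(), groups=drivers)
print("average accuracy across driver folds:", scores.mean())
```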

6.2 Regression Model for Driver Distraction

The second evaluation consists of building a regression model to predict the distraction level of the driver [9]. The ground truth for this experiment is the average of the subjective evaluation results presented in Section 4. Only the samples that were perceptually evaluated are considered in this analysis (480 5-sec segments with a balanced number of samples per task and driver). The multimodal features are included as independent variables. The proposed model includes interaction and quadratic terms, because they improved the performance. The coefficient of determination for this model is R² = 0.53, which corresponds to a correlation of ρ = 0.728. This result shows that the proposed features can be used to predict the distraction level of the driver.
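A sketch of such a regression with quadratic and interaction terms is shown below; the feature matrix and ratings are synthetic placeholders, and the exact term selection of the chapter's model is not reproduced.

```python
# Regression with interaction and quadratic terms (illustrative setup).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X = np.random.rand(480, 14)        # multimodal features of the 480 evaluated 5-sec segments
y = 1 + 4 * np.random.rand(480)    # averaged perceived distraction ratings (1-5 scale)

# degree-2 expansion adds the quadratic and pairwise interaction terms before the linear fit
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

r2 = model.score(X, y)             # coefficient of determination R^2
rho = np.sqrt(r2)                  # for such a fit, rho = sqrt(R^2); e.g., 0.728 ≈ sqrt(0.53)
print(r2, rho)
```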

7 Discussion and Conclusions

Any distraction can affect the situational awareness of the driver, leading to disastrous consequences. This chapter summarized our current research efforts in detecting distracted drivers using multiple modalities. The study was based on a database collected under real driving conditions, in which 20 drivers were asked to perform common secondary tasks. Our analysis identified features extracted from the CAN-Bus data, a camera and a microphone that present characteristic differences during task conditions. Our results show that these multimodal features can be used to predict the distraction level of the drivers. The proposed metric, estimated from a regression model, was highly correlated with human subjective evaluations (ρ = 0.728).

One weakness in our approach is the ground truth for the distraction level, which is derived from subjective evaluations. The proposed perceptual evaluation may not capture cognitive and psychological distractions. Indicators for these types of distraction may be derived by conducting workload ratings (e.g., SWAT, NASA-TLX) [4], by measuring brain activity with invasive sensors [7], or by indirectly inferring the underlying cognitive state of the driver (i.e., emotions, stress). In this area, we are working toward detecting the emotional state of the drivers, especially negative reactions.

Another area of interest is expanding the set of features. For example, advances in the field of computer vision can lead to algorithms to directly detect distractive objects (e.g., cell phone) or actions (e.g., eating). The outputs of these algorithms can be used as discrete variables in the proposed regression models.


The work presented in this chapter represents our first step towards obtaining a metric to determine the attention level of the drivers. A real-time algorithm with such capability will facilitate the design of a feedback system to alert inattentive drivers, preventing potential accidents, and therefore, improving the overall driver experience.

References

[1] H. Abut, H. Erdogan, A. Ercil, B. Curuklu, H.C. Koman, F. Tas, A.O. Argunsah, S. Cosar, B. Akan, H. Karabalkan, et al. Real-world data collection with “UYANIK”. In K. Takeda, H. Erdogan, J.H.L. Hansen, and H. Abut, editors, In-Vehicle Corpus and Signal Processing for Driver Behavior, pages 23–43. Springer, New York, NY, USA, 2009.

[2] P. Angkititrakul, D. Kwak, S. Choi, J. Kim, A. Phucphan, A. Sathyanarayana, and J.H.L. Hansen. Getting start with UTDrive: Driver-behavior modeling and assessment of distraction for in-vehicle speech systems. In Interspeech 2007, pages 1334–1337, Antwerp, Belgium, August 2007.

[3] P. Angkititrakul, M. Petracca, A. Sathyanarayana, and J.H.L. Hansen. UTDrive: Driver behavior and speech interactive systems for in-vehicle environments. In IEEE Intelligent Vehicles Symposium, pages 566–569, Istanbul, Turkey, June 2007.

[4] K.M. Bach, M.G. Jaeger, M.B. Skov, and N.G. Thomassen. Interacting with in-vehicle systems: understanding, measuring, and evaluating attention. In Proceedings of the 23rd British HCI Group Annual Conference on People and Computers: Celebrating People and Technology, Cambridge, United Kingdom, September 2009.

[5] M. Bartlett, G. Littlewort, T. Wu, and J. Movellan. Computer expression recognition toolbox. In IEEE International Conference on Automatic Face and Gesture Recognition (FG 2008), Amsterdam, The Netherlands, September 2008.

[6] L.M. Bergasa, J. Nuevo, M.A. Sotelo, R. Barea, and M.E. Lopez. Real-time system for monitoring driver vigilance. IEEE Transactions on Intelligent Transportation Systems, 7(1):63–77, March 2006.

[7] C. Berka, D.J. Levendowski, M.N. Lumicao, A. Yau, G. Davis, V.T. Zivkovic, R.E. Olmstead, P.D. Tremoulet, and P.L. Craven. EEG correlates of task engagement and mental workload in vigilance, learning, and memory tasks. Aviation, Space, and Environmental Medicine, 78(5):231–244, May 2007.

[8] P. Boyraz, X. Yang, and J.H.L. Hansen. Computer vision applications for context-aware intelligent vehicles. In 4th Biennial Workshop on DSP for In-Vehicle Systems and Safety, Dallas, TX, USA, June 2009.

[9] C. Busso and J.J. Jain. Modeling of driver behavior in real world scenarios using multiple noninvasive sensors. IEEE Transactions on Intelligent Transportation Systems, Submitted, 2011.

[10] I.G. Damousis and D. Tzovaras. Fuzzy fusion of eyelid activity indicators for hypovigilance-related accident prediction. IEEE Transactions on Intelligent Transportation Systems, 9(3):491–500, September 2008.

[11] Y. Dong, Z. Hu, K. Uchimura, and N. Murayama. Driver inattention monitoring system for intelligent vehicles: A review. In IEEE Intelligent Vehicles Symposium, pages 875–880, Xi'an, Shaanxi, China, June 2009.

[12] T. Ersal, H.J.A. Fuller, O. Tsimhoni, J.L. Stein, and H.K. Fathy. Model-based analysis and classification of driver distraction under secondary tasks. IEEE Transactions on Intelligent Transportation Systems, 11(3):692–701, September 2010.

[13] J.L. Harbluk, Y.I. Noy, P.L. Trbovich, and M. Eizenman. An on-road assessment of cognitive distraction: Impacts on drivers' visual behavior and braking performance. Accident Analysis and Prevention, 39(2):372–379, March 2007.

[14] J. Jain and C. Busso. Analysis of driver behaviors during common tasks using frontal video camera and CAN-Bus information. In IEEE International Conference on Multimedia and Expo (ICME 2011), Barcelona, Spain, July 2011.

[15] J.J. Jain and C. Busso. Assessment of drivers' distraction using perceptual evaluations, self assessment and multimodal feature analysis. In 5th Biennial Workshop on DSP for In-Vehicle Systems, Kiel, Germany, September 2011.

[16] M. Kutila, M. Jokela, G. Markkula, and M.R. Rue. Driver distraction detection with a camera vision system. In IEEE International Conference on Image Processing (ICIP 2007), volume 6, pages 201–204, San Antonio, Texas, USA, September 2007.

[17] Y. Liang, M.L. Reyes, and J.D. Lee. Real-time detection of driver cognitive distraction using support vector machines. IEEE Transactions on Intelligent Transportation Systems, 8(2):340–350, June 2007.

[18] C.-T. Lin, R.-C. Wu, S.-F. Liang, W.-H. Chao, Y.-J. Chen, and T.-P. Jung. EEG-based drowsiness estimation for safety driving using independent component analysis. IEEE Transactions on Circuits and Systems I: Regular Papers, 52(12):2726–2738, December 2005.

[19] W. Mendenhall and T. Sincich. Statistics for Engineering and the Sciences. Prentice-Hall, Upper Saddle River, NJ, USA, 2006.

[20] E. Murphy-Chutorian and M.M. Trivedi. Head pose estimation and augmented reality tracking: An integrated system and evaluation for monitoring driver awareness. IEEE Transactions on Intelligent Transportation Systems, 11(2):300–311, June 2010.

[21] V. Neale, T. Dingus, S. Klauer, J. Sudweeks, and M. Goodman. An overview of the 100-car naturalistic study and findings. Technical Report Paper No. 05-0400, National Highway Traffic Safety Administration, June 2005.

[22] A. Perez, M.I. Garcia, M. Nieto, J.L. Pedraza, S. Rodriguez, and J. Zamorano. Argos: An advanced in-vehicle data recorder on a massively sensorized vehicle for car driver behavior experimentation. IEEE Transactions on Intelligent Transportation Systems, 11(2):463–473, June 2010.

[23] F. Putze, J.-P. Jarvis, and T. Schultz. Multimodal recognition of cognitive workload for multitasking in the car. In International Conference on Pattern Recognition (ICPR 2010), Istanbul, Turkey, August 2010.

[24] T. Rahman, S. Mariooryad, S. Keshavamurthy, G. Liu, J.H.L. Hansen, and C. Busso. Detecting sleepiness by fusing classifiers trained with novel acoustic features. In 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, August 2011.

[25] T.A. Ranney. Driver distraction: A review of the current state-of-knowledge. Technical Report DOT HS 810 787, National Highway Traffic Safety Administration, April 2008.

[26] T.A. Ranney, W.R. Garrott, and M.J. Goodman. NHTSA driver distraction research: Past, present, and future. Technical Report Paper No. 2001-06-0177, National Highway Traffic Safety Administration, June 2001.

[27] A. Sathyanarayana, P. Boyraz, Z. Purohit, R. Lubag, and J.H.L. Hansen. Driver adaptive and context aware active safety systems using CAN-bus signals. In IEEE Intelligent Vehicles Symposium (IV 2010), San Diego, CA, USA, June 2010.

[28] A. Sathyanarayana, S. Nageswaren, H. Ghasemzadeh, R. Jafari, and J.H.L. Hansen. Body sensor networks for driver distraction identification. In IEEE International Conference on Vehicular Electronics and Safety (ICVES 2008), Columbus, OH, USA, September 2008.

[29] D.L. Strayer, J.M. Cooper, and F.A. Drews. What do drivers fail to see when conversing on a cell phone? In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, volume 48, New Orleans, LA, USA, September 2004.

[30] M.C. Su, C.Y. Hsiung, and D.Y. Huang. A simple approach to implementing a system for monitoring driver inattention. In IEEE International Conference on Systems, Man and Cybernetics (SMC 2006), volume 1, pages 429–433, Taipei, Taiwan, October 2006.

[31] F. Tango and M. Botta. Evaluation of distraction in a driver-vehicle-environment framework: An application of different data-mining techniques. In P. Perner, editor, Advances in Data Mining. Applications and Theoretical Aspects, volume 5633 of Lecture Notes in Computer Science, pages 176–190. Springer Berlin / Heidelberg, Berlin, Germany, 2009.

[32] I. Trezise, E.G. Stoney, B. Bishop, J. Eren, A. Harkness, C. Langdon, and T. Mulder. Inquiry into driver distraction: Report of the Road Safety Committee on the inquiry into driver distraction. Technical Report No. 209 Session 2003-2006, Road Safety Committee, Parliament of Victoria, Melbourne, Victoria, Australia, August 2006.

[33] J. Yang, T.N. Chang, and E. Hou. Driver distraction detection for vehicular monitoring. In Annual Conference of the IEEE Industrial Electronics Society (IECON 2010), Glendale, AZ, USA, November 2010.

