Towards a Vision-based System Exploring 3D Driver Posture Dynamics for Driver Assistance: Issues and Possibilities

Cuong Tran and Mohan M. Trivedi

2010 IEEE Intelligent Vehicles Symposium, University of California, San Diego, CA, USA, June 21-24, 2010

Abstract—A driver's body posture in 3D contains information potentially related to driver intent, driver affective state, and driver distraction. In this paper, we discuss issues and possibilities in developing a vision-based, markerless system to systematically explore the role of 3D driver posture dynamics for driver assistance. At a high level, the two main emphases of the proposed system are: (i) the coordination between a real world driving testbed and a simulation environment, and (ii) studying the usefulness of driver posture dynamics not only as an individual cue but also in relation to other contextual information (e.g. head dynamics, facial features, and vehicle dynamics). Some initial experimental results following these guidelines show the feasibility and promise of extracting and using 3D driver posture dynamics for driver assistance.

I. INTRODUCTION

Looking at the driver to understand driver state (e.g. affective states, distraction states) and intention is an important component of driver assistance systems. It has been reported that human error causes a large portion of roadway accidents [1]. In this paper, we are concerned with vision-based, markerless systems that look at the driver and analyze driver state and intention. A vision-based approach provides a more natural, non-contact solution compared to bio or physiological sensors, which require the driver to wear specific devices. It should also be mentioned that an effective intelligent driver assistance system needs to be human centric and work in a holistic manner, taking into account different components including sensors for the environment (e.g. looking at roads and other cars) and sensors for the vehicle (e.g. looking at steering angle and vehicle speed) besides looking at the driver [18].

In vision-based systems for driver state and intention analysis, many related studies focus on features of the head and face. For example, head pose and/or eye gaze were used to predict lane change intent [8, 11, 14]. Head movement, eye movement, and facial features were used for monitoring driver mental state [3, 22] and for detecting fatigue [21, 23]. There were also some studies using hand position, e.g. hand position was incorporated with head pose for lane change intent prediction [6] and for a driver distraction alert system [17]. In [5], the authors proposed a method for determining whether the driver's or the passenger's hand is in the infotainment area, which also has a connection with driver distraction.

The authors are with LISA: Laboratory for Safe and Intelligent Automobiles, University of California at San Diego (http://cvrr.ucsd.edu/lisa). Email: {cutran, mtrivedi}@ucsd.edu.

We see that hand position alone already contains important and useful information for driver assistance. The whole 3D driver posture, which is more informative with torso, head, and arm dynamics, appears to be a promising cue for driver assistance systems. Therefore, exploring the role of 3D driver posture dynamics for driver assistance in a systematic manner is useful and needed. In this paper, we discuss several related issues and the possibilities of using some developed techniques for this task. Following these discussions are some initial experimental results which show the feasibility and promise of extracting and using 3D posture dynamics for driver support systems.

II. RELATED STUDIES

Driver posture dynamics in 3D is informative and can help in developing better driver support systems. The study in [2] indicated the relation between sitting posture and affective state, e.g. a slumped pose after a failure and an upright pose after a success. It also pointed out the importance of mood-congruent interaction in smart interactive systems. In [13], a marker-based posture tracking system was used to study the relation between postural stability and driver control state. There were also studies using posture information for passive driver assistance systems, e.g. in [19] sitting posture was used to adjust airbag deployment, and in [4] driver posture was studied to build a more comfortable driver cockpit. The role of driver posture dynamics in active driver assistance, e.g. detecting driver intent and driver state and then interacting appropriately to improve driver safety as well as comfort, has not been studied much.

Fig. 1 shows some possible ranges of driver posture movement which seem to have a connection to driver state and intention. For example, leaning backward might indicate a relaxed position, while leaning forward indicates concentration. Before performing specific tasks, the driver may also make some preparatory posture changes, such as moving the head forward to prepare for a better visual check before a lane change (Sect. IV.A has some real-world driving illustrations). We will discuss several related issues and possibilities towards the goal of systematically exploring the role of 3D posture dynamics in vision-based, active driver assistance. At a high level, the two main emphases in developing our testbeds and approaches are (i) the need for coordination between a real world driving testbed and a simulation environment and (ii) that the usefulness of driver posture dynamics should be studied not only as an individual cue but also in a holistic manner with other contextual information.


Fig. 1. Illustration of some possible ranges of driver posture movement during driving

Fig. 2. Coordination between real world and simulation environment

III. DEVELOPING A VISION-BASED SYSTEM FOR EXPLORING THE ROLE OF 3D POSTURE DYNAMICS IN DRIVER ASSISTANCE

A. Coordination between Real World Driving Testbed and Simulation Environment

Working with a real world driving testbed is important and is the ultimate goal. However, a simulation environment can provide more flexibility in configuring sensors and designing experimental tasks for deeper analysis, which might be difficult and unsafe to implement in real world driving. Hence, coordination between real world driving and the simulation environment is useful, and we take it into account when developing our system.

As shown in Fig. 2, observations from real world driving data can initiate the design of a simulation experiment. This simulation experiment can then be modified and improved to achieve the desired analyses. However, there are always gaps between the simulation environment and the real world. This happens even with today's complex and expensive simulators. For example, the driver's realistic feel of the vehicle dynamics and surrounding environment will be different, and several random difficulties only happen in real world situations, such as sudden difficult lighting conditions and highly dynamic backgrounds. Therefore, the usefulness of analyses and findings in the simulation environment should again be verified with real world driving data.

B. A System for Exploring the Role of 3D Upper Body Pose in Combination with Other Contextual Information

Following the underlying principle of a holistic sensing approach [18], the interaction between different cues is important for an effective driver assistance system. A cue might not seem very useful or relevant when considered separately, but using it in combination with other contextual information cues could help improve the performance of the whole system. Fig. 3 shows the flowchart of our proposed system for studying the role of driver posture dynamics in combination with other contextual information.

Fig. 3. General flowchart of the system using 3D posture dynamics for driver assistance

First, the inputs from different contextual sensors (observing the driver, the environment, and the vehicle state) need to be captured in a synchronous manner. Contextual information from the different sensors is then extracted separately. In the next step, the extracted contextual information can be fused in different combination sets to analyze driver state and intent. Finally, based on the above analysis, the system will interact to assist the driver when needed. Different types of interaction can be used, such as visual interaction (e.g. using an Active Heads-up Display [9]), audio interaction (e.g. a beep sound), or mechanical interaction (e.g. lightly shaking the steering wheel).
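As one concrete illustration of the synchronous capture step, the following is a minimal sketch (ours, not the system's actual implementation) that aligns multi-rate sensor streams against a reference clock by nearest timestamp; the stream names and rates are assumptions for illustration only:

```python
from bisect import bisect_left

def nearest_sample(timestamps, samples, t):
    """Return the sample whose timestamp is closest to t.
    Assumes timestamps are sorted in ascending order."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return samples[0]
    if i == len(timestamps):
        return samples[-1]
    before, after = timestamps[i - 1], timestamps[i]
    return samples[i] if after - t < t - before else samples[i - 1]

def fuse_streams(reference_ts, streams):
    """For each reference timestamp (e.g. the posture camera clock),
    pick the nearest sample from every other stream, yielding one
    synchronized context record per frame."""
    fused = []
    for t in reference_ts:
        frame = {name: nearest_sample(ts, xs, t)
                 for name, (ts, xs) in streams.items()}
        frame["t"] = t
        fused.append(frame)
    return fused

# Hypothetical usage: a 15 Hz posture stream as reference, a 30 Hz head
# stream, and a higher-rate vehicle (CAN bus) stream:
# fused = fuse_streams(posture_ts, {"head": (head_ts, head_poses),
#                                   "vehicle": (can_ts, can_samples)})
```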

This paper focuses on the components looking at the driver, so we go into more detail on the related components in the following sections.


XMOB (Extremity Movement Observation) for upper body pose tracking [16]

The skeletal upper body model used in the XMOB upper body tracker is shown in Fig. 4. The lengths of the body parts are considered fixed, which means there is only kinematic movement at the joints. There are four joints in the model: two shoulder joints, each with 3 degrees of freedom (DOF), and two 1-DOF elbow joints. An upper body pose at time t can be represented by seven 3D positions of the inner joints and extremities:

X_t = {P_t^hea, P_t^lha, P_t^rha, P_t^lsh, P_t^rsh, P_t^leb, P_t^reb}

where hea, lha, rha, lsh, rsh, leb, and reb denote the head, left hand, right hand, left shoulder, right shoulder, left elbow, and right elbow, respectively.

The idea of XMOB is to break the 3D upper body pose tracking problem into two sub-problems: first, track the 3D movements of the extremal parts, i.e. the head and hands {P_t^hea, P_t^lha, P_t^rha}; then, using human knowledge of upper body configuration constraints, those 3D extremity movements are used to predict the whole upper body pose sequence as an inverse kinematics problem.

The underlying motivation is that extremities are easier to track, with less occlusion, compared to inner body parts like the elbow and shoulder joints. Moreover, by breaking the high dimensional search for 3D upper body pose into two sub-problems, the complexity is reduced considerably to achieve real-time performance (XMOB runs at ~15 frames per second on an Intel Core i7 3.0 GHz). On the other hand, since human upper body kinematics is redundant, i.e. for the same head and hand positions there can be many solutions for the upper body pose, the inverse kinematics problem mentioned above is ambiguous. To deal with this ambiguity, XMOB takes a numerical approach utilizing dynamics information, not just the head and hand positions at a single frame. First, at each frame, XMOB determines a set of inner joint candidates from the head and hand positions based on geometric constraints between the upper body inner joints and extremities. Then, by observing extremity movements over a period of time, XMOB uses the assumption of minimizing total joint displacement to predict the corresponding upper body pose sequence.
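To make the candidate-and-displacement idea concrete, here is a minimal sketch (ours, not the actual XMOB implementation) for a single elbow given tracked shoulder and hand trajectories: candidates are sampled on the circle fixed by the two limb lengths, and a Viterbi-style pass selects the per-frame sequence with minimal total joint displacement. The limb lengths, candidate count, and the assumption that every frame is geometrically reachable are illustrative simplifications:

```python
import numpy as np

# Illustrative fixed limb lengths (meters); in XMOB these are model parameters.
UPPER_ARM = 0.30   # shoulder -> elbow
FOREARM = 0.27     # elbow -> hand

def elbow_candidates(shoulder, hand, n=16):
    """Sample candidate elbow positions on the circle of points lying at
    distance UPPER_ARM from the shoulder and FOREARM from the hand."""
    d = hand - shoulder
    dist = np.linalg.norm(d)
    if dist > UPPER_ARM + FOREARM or dist < abs(UPPER_ARM - FOREARM):
        return np.empty((0, 3))                    # arm cannot reach this pose
    a = (UPPER_ARM**2 - FOREARM**2 + dist**2) / (2.0 * dist)
    r = np.sqrt(max(UPPER_ARM**2 - a**2, 0.0))     # circle radius
    center = shoulder + a * d / dist
    u = np.cross(d, [0.0, 0.0, 1.0])               # basis of the circle's plane
    if np.linalg.norm(u) < 1e-6:
        u = np.cross(d, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(d / dist, u)
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return center + r * (np.outer(np.cos(angles), u) + np.outer(np.sin(angles), v))

def track_elbow(shoulders, hands):
    """Viterbi-style search over per-frame candidates, choosing the elbow
    sequence that minimizes total displacement between consecutive frames."""
    cands = [elbow_candidates(s, h) for s, h in zip(shoulders, hands)]
    costs, back = np.zeros(len(cands[0])), []
    for t in range(1, len(cands)):
        step = np.linalg.norm(cands[t][None, :, :] - cands[t - 1][:, None, :], axis=2)
        total = costs[:, None] + step              # (prev candidate, cur candidate)
        back.append(total.argmin(axis=0))
        costs = total.min(axis=0)
    path = [int(np.argmin(costs))]
    for bp in reversed(back):                      # backtrack the optimal path
        path.append(int(bp[path[-1]]))
    path.reverse()
    return np.array([c[i] for c, i in zip(cands, path)])
```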

Head pose tracking with HyHOPE (Hybrid Head Orientation and Position Estimation) [12]

We use HyHOPE, a real-time, robust head pose tracking method from a monocular view [12], to extract head dynamics information. HyHOPE combines static head pose estimation with a real-time 3D model-based tracking system for better tracking performance. From an initial estimate of head position and orientation, the system generates a texture-mapped 3D model of the head from the most recent head image and uses a particle filter approach to find the best match of this 3D model in each subsequent frame. HyHOPE also uses GPU (Graphics Processing Unit) programming to parallelize computations and achieve real-time performance (~30 frames per second).
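As a rough, generic illustration of that model-based tracking step (not HyHOPE's actual code), the sketch below runs one particle filter update over pose hypotheses; `render` and `similarity` are assumed callbacks that draw the textured 3D head model at a pose and score it against the observed frame:

```python
import numpy as np

def particle_filter_step(particles, weights, frame, render, similarity,
                         noise_std, rng=None):
    """One update of a pose particle filter over [x, y, z, yaw, pitch, roll].
    render(pose) draws the textured 3D head model at a hypothesized pose;
    similarity(img_a, img_b) scores that rendering against the frame."""
    rng = rng or np.random.default_rng()
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)         # resample by weight
    particles = particles[idx]
    particles = particles + rng.normal(0.0, noise_std, particles.shape)  # diffuse
    scores = np.array([similarity(render(p), frame) for p in particles])
    weights = np.exp(scores - scores.max())        # softmax-style reweighting
    weights /= weights.sum()
    estimate = weights @ particles                 # weighted-mean pose estimate
    return particles, weights, estimate
```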

Fig. 4. Upper body model used in XMOB upper body tracker

Gabor Wavelet Filters for Facial Feature Extraction

The Gabor wavelet filter closely models the response function of simple cells in the primary visual cortex, consisting of a Gaussian kernel function modulated by a sinusoidal plane wave. Using Gabor filters has been shown to be a good feature extraction method for recognizing Facial Action Coding System (FACS) action units [20].
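For reference, a minimal sketch of a 2D Gabor kernel built directly from that definition (a Gaussian envelope times a sinusoidal carrier); the parameter values are illustrative only:

```python
import numpy as np

def gabor_kernel(size=31, sigma=4.0, theta=0.0, wavelength=8.0, psi=0.0):
    """Real part of a 2D Gabor filter: a Gaussian envelope modulated
    by a sinusoidal plane wave at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier runs along orientation theta
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + y_t**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_t / wavelength + psi)
    return envelope * carrier

# A filter bank over orientations and spatial frequencies, as in Fig. 8
bank = [gabor_kernel(theta=t, wavelength=w)
        for t in np.linspace(0, np.pi, 4, endpoint=False)
        for w in (4.0, 8.0, 16.0)]
# Features are the responses of a face patch convolved with each kernel,
# e.g. scipy.signal.convolve2d(patch, k, mode="same") for k in bank.
```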

Since Gabor features are good for FACS recognition, which can be considered as recognizing basic facial movements, they should also be an effective representation for facial dynamics. However, when applied to real world driving situations, several issues need to be considered, such as challenging lighting conditions and shadows. Moreover, since facial features can typically only be extracted reliably on a frontal face, we use the head pose information from HyHOPE to extract Gabor facial features only when the driver's head is looking straight ahead.

Driver state and intent analysis using different combinations of extracted contextual information

This analysis step can follow either:
• a rule-based approach, e.g. IF no hands on wheel AND head turned away THEN alert for a serious distraction, as was done in [17] (a minimal sketch follows this list), or

• a statistical learning approach, such as the Relevance Vector Machine (RVM) method [15], which was shown to be quite effective in analyzing multimodal data from different sensors for driver intent prediction [8, 11]. RVM can produce a sparse representation of the data from a large feature set for classification, and it also provides a probabilistic output of class membership (versus a binary output, e.g. from the Support Vector Machine method), which can be useful in situations where output ranking is needed.
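The promised sketch of the rule-based option, assuming hypothetical per-frame outputs from the trackers above (the field names and threshold are ours, for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DriverContext:
    """Hypothetical per-frame context assembled from the trackers."""
    hands_on_wheel: bool    # e.g. XMOB hand positions inside a wheel region
    head_yaw_deg: float     # e.g. HyHOPE head yaw

HEAD_AWAY_DEG = 30.0        # illustrative threshold

def distraction_alert(ctx: DriverContext) -> Optional[str]:
    """IF no hands on wheel AND head turned away THEN serious distraction."""
    head_away = abs(ctx.head_yaw_deg) > HEAD_AWAY_DEG
    if not ctx.hands_on_wheel and head_away:
        return "serious"    # e.g. audio beep plus lightly shaking the wheel
    if not ctx.hands_on_wheel or head_away:
        return "mild"       # e.g. visual cue on the heads-up display only
    return None
```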

IV. EXPERIMENT

A. Real World Driving Testbed LISA-P and Some Motivating Observations

Fig. 5 shows our real world testbed LISA-P (http://cvrr.ucsd.edu/LISA/index.html). As discussed in Sect. III.A, we utilize data collected from our LISA-P testbed to determine some initial scenarios for the experimental setup in the simulation environment.

Fig. 6 illustrates some observations from real world driving data that we are interested in.


Fig. 5. LISA-P real world testbed with the positions of the three cameras observing the driver (two for upper body, one for head and face).

Fig. 6. Some observations from real world driving data: (top) the driver leans forward before a head turn for a more careful lane change visual check, which is quite common in our freeway driving data, and incorporating detection of such body posture movement would indicate a higher probability of lane change intent compared to, e.g., using the head turn only; (bottom) the driver leans to the right to better hear a passenger in a conversation, which might be an indication of distraction.

• We observe that in highway driving, for a more careful lane change visual check, drivers commonly tend to lean forward before making a head turn. This observation relates to the relaxed versus concentrated sitting poses discussed above. By incorporating detection of such driver posture movement with, e.g., head turn information, we obtain a stronger indication of lane change intent.

• Sometimes the driver leans to the right to better hear a passenger in a conversation, which might be used as one indication of distraction.

• In highway driving, we also observed that the driver looks more serious before making a lane change; e.g. a head turn with a smile on the face commonly happens only when the driver is talking to other passengers, not during a lane change. This kind of information could help to reduce false alarms when predicting lane change intent using head pose.

B. Driving Simulation Environment LISA-S

Fig. 7 shows our simulation environment, which we call LISA-S. Two cameras are used for 3D driver posture tracking and one camera for head and face tracking, and we have also installed a stereo eye tracking system. The steering wheel is the same size as a real steering wheel and can turn 450 degrees in each direction (900 degrees in total).

Fig. 7. LISA-S Simulation Environment

Fig. 8. Some visualization samples of Gabor feature extraction from face images in LISA-S data

We use the open source TORCS driving simulator (http://torcs.sourceforge.net/), with which we can design the road track and environment scene and control the dynamics of the ego vehicle as well as other vehicles.

C. Some Initial Results

To demonstrate the ability to extract the contextual information of interest, including 3D upper body posture dynamics, head dynamics, and facial features: Fig. 8 shows some visual samples of applying Gabor wavelets with different spatial frequencies and orientations to face images for facial feature extraction. For more reliable feature extraction, we extract facial features only when the face is close to the frontal view (determined from HyHOPE head tracking). Figs. 9-12 show good visual evaluations of 3D XMOB upper body pose tracking and HyHOPE head pose tracking in both real world driving sequences (from LISA-P) and simulated driving sequences (from LISA-S).

Fig. 13 shows quantitative plots (compared with manually annotated ground truth) of a simple analysis using the extracted contextual information of 3D upper body pose and head pose to determine some events of interest. From 3D driver posture dynamics, the driver state is classified as relaxed or concentrated. Head dynamics is classified as looking straight, turning to the left, or turning to the right. Moreover, when combining head and 3D posture information, we see examples where the driver changes from the relaxed to the concentrated state and then makes a head turn, which could be a strong indicator of lane change intent.
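A minimal sketch of that combination rule, assuming hypothetical discrete event streams from the posture and head classifiers (the labels and window size are illustrative, not from our implementation):

```python
def lane_change_indicator(posture_events, head_events, window=45):
    """Flag frames where a relaxed -> concentrated posture change is
    followed, within `window` frames, by a head turn in either direction."""
    flags = [False] * len(posture_events)
    for t in range(1, len(posture_events)):
        if posture_events[t - 1] == "relaxed" and posture_events[t] == "concentrated":
            horizon = head_events[t:t + window]
            if "turn_left" in horizon or "turn_right" in horizon:
                flags[t] = True   # posture change followed by a head turn
    return flags
```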

Regarding the interaction part, Fig. 14 illustrates an example of using the Active Heads-up Display for visual interaction with the driver in the assistance system for "Keeping hands on the wheel and eyes on the road" [17].

V. CONCLUDING REMARKS AND FUTURE WORK

In this paper, we discussed several issues and possibilities in developing a vision-based, markerless system for systematically exploring the role of 3D driver posture dynamics in active driver assistance. Some initial experimental results indicated the feasibility and promise of extracting and using 3D driver posture in combination with other contextual information to analyze driver affective state and intent.

Based on these discussions as well as the initial implementation and results, the obvious follow-up work is to design more natural test cases in the LISA-S simulation environment for analyzing the usefulness of 3D driver posture information. Besides an intuitive rule-based approach similar to that in [17], statistical learning methods like RVM will also be used for analysis and comparison.

ACKNOWLEDGMENT

We thank our colleagues at the CVRR lab for useful discussions and assistance, especially Mr. Anup Doshi, who played a main role in setting up the LISA-S simulation environment.

REFERENCES

[1] "World Report on Road Traffic Injury Prevention: Summary," Technical Report, World Health Organization, 2004.
[2] H. I. Ahn, A. Teeters, A. Wang, C. Breazeal, and R. W. Picard, "Stoop to Conquer: Posture and affect interact to influence computer users' persistence," The 2nd International Conference on Affective Computing and Intelligent Interaction, 2007.
[3] S. Baker, I. Matthews, J. Xiao, R. Gross, T. Ishikawa, and T. Kanade, "Real-time non-rigid driver head tracking for driver mental state estimation," Robot. Inst., Carnegie Mellon Univ., Tech. Rep. 04-10, Feb. 2004.
[4] R. Brodeur, H. M. Reynolds, K. Rayes, and Y. Cui, "The Initial Position and Postural Attitudes of Driver Occupants, Posture," ERL-TR-95-009, Ergonomics Research Laboratory, 1996.
[5] S. Y. Cheng and M. M. Trivedi, "Vision-based Infotainment User Determination by Hand Recognition for Driver Assistance," IEEE Transactions on Intelligent Transportation Systems, 2010.
[6] S. Cheng and M. M. Trivedi, "Turn-Intent Analysis Using Body Pose for Intelligent Driver Assistance," IEEE Pervasive Computing, 5(4):28-37, Oct-Dec 2006.
[7] A. Datta, Y. Sheikh, and T. Kanade, "Linear Motion Estimation for Systems of Articulated Planes," IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[8] A. Doshi and M. M. Trivedi, "On the Roles of Eye Gaze and Head Pose in Predicting Driver's Intent to Change Lanes," IEEE Transactions on Intelligent Transportation Systems, September 2009.
[9] A. Doshi, S. Y. Cheng, and M. M. Trivedi, "A Novel, Active Heads-Up Display for Driver Assistance," IEEE Transactions on Systems, Man, and Cybernetics, Part B, Feb 2009.
[10] V. Ferrari, M. Jimenez, and A. Zisserman, "Progressive Search Space Reduction for Human Pose Estimation," IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[11] J. McCall, D. Wipf, M. M. Trivedi, and B. Rao, "Lane Change Intent Analysis Using Robust Operators and Sparse Bayesian Learning," IEEE Transactions on Intelligent Transportation Systems, Sept 2007.
[12] E. Murphy-Chutorian and M. M. Trivedi, "HyHOPE: Hybrid Head Orientation and Position Estimation for Vision-based Driver Head Tracking," IEEE Intelligent Vehicles Symposium, 2008.
[13] A. Petersen and R. Barrett, "Postural Stability and Vehicle Kinematics During an Evasive Lane Change Manoeuvre: A Driver Training Study," Ergonomics, Vol. 52, Issue 5, May 2009.
[14] L. Tijerina, W. R. Garrott, D. Stoltzfus, and E. Parmer, "Eye glance behavior of van and passenger car drivers during lane change decision phase," Trans. Res. Rec., vol. 1937, pp. 37-43, 2005.
[15] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," J. Mach. Learn. Res., vol. 1, pp. 211-244, Sep. 2001.
[16] C. Tran and M. M. Trivedi, "Introducing 'XMOB': Extremity Movement Observation Framework for Upper Body Pose Tracking in 3D," IEEE International Symposium on Multimedia, 2009.
[17] C. Tran and M. M. Trivedi, "Driver Assistance for 'Keeping Hands on the Wheel and Eyes on the Road'," IEEE International Conference on Vehicular Electronics and Safety, 2009.
[18] M. M. Trivedi and S. Cheng, "Holistic Sensing and Active Displays for Intelligent Driver Support Systems," IEEE Computer Magazine, May 2007.
[19] M. M. Trivedi, S. Cheng, E. Childers, and S. Krotosky, "Occupant Posture Analysis with Stereo and Thermal Infrared Video: Algorithms and Experimental Evaluation," IEEE Transactions on Vehicular Technology, Special Issue on In-Vehicle Vision Systems, Vol. 53, Issue 6, November 2004.
[20] M. S. Bartlett, J. R. Movellan, G. C. Littlewort, B. Braathen, M. G. Frank, and T. J. Sejnowski, "Towards automatic recognition of spontaneous facial actions," in P. Ekman (Ed.), What the Face Reveals, 2nd Edition, Oxford University Press, 2005.
[21] E. Vural, M. Çetin, A. Erçil, G. Littlewort, M. S. Bartlett, and J. R. Movellan, "Drowsy Driver Detection Through Facial Movement Analysis," IEEE International Conference on Computer Vision - Human Computer Interaction, 2007.
[22] Y. Zhu and K. Fujimura, "Head pose estimation for driver monitoring," IEEE Intelligent Vehicles Symposium, 2004.
[23] Z. Zhu and Q. Ji, "Real Time and Non-intrusive Driver Fatigue Monitoring," IEEE International Conference on Intelligent Transportation Systems, 2004.

Fig. 9. Visual evaluation of HyHOPE head pose tracking (top) and 3D XMOB upper body tracking superimposed on input images (bottom) in some real driving sequences from LISA-P


Fig. 10. Visual evaluation of XMOB upper body tracking in 3D from a driving sequence in LISA-S. White blobs are 3D voxels reconstructed from two-view skin segmentation of the head and hands. Colored lines show the estimated 3D upper body pose

Fig. 11. Visual evaluation of HyHOPE head pose tracking from sequences in LISA-S simulation environment

Fig. 12. Visual evaluation of 3D XMOB upper body tracking superimposed on input images. Top: subject 1, view 2. Bottom: subject 2, view 1 (LISA-S environment)

Fig. 13. Quantitative plots comparing extracted results from XMOB and HyHOPE with the manually annotated ground truth

Fig. 14. Illustrative example of using the Active Heads-up Display for visual interaction with the driver in the assistance system for "Keeping hands on the wheel and eyes on the road" [17]


