
Data-Driven Sub-Units and Modeling Structure for Continuous Sign Language Recognition with Multiple Cues

Vassilis Pitsikalis, Stavros Theodorakis and Petros Maragos
School of ECE, National Technical University of Athens 15773, Greece

2. Outline and Contributions

Visual Front-End + Feature Extraction: hands' centroid movement-position measurements (2D): positions, derivatives, velocity, dynamics.

Data-Driven Sub-Unit Construction exploiting Multiple Cues

Model-based sign segmentation + Dynamic vs. Static Labels

Dynamic vs. Static Specific Modeling: Features, Architecture, Parameters.

HMM Recognition framework + Evaluation

1. Sign Language: Motivation

Visual patterns are formed by hand shapes, manual or general body motion, and facial expressions.

A word in spoken language corresponds to a sign.

Sign parameters: Position, Movement, Hand Shape, Orientation, Facial expression.

Sub-units: of a different nature than the phoneme unit of speech; annotated data are scarce.

Focus on automatic, data-driven modeling of sub-units without any linguistic-phonological information.

Goal: continuous Sign Language Recognition.

3. Visual Front-End

Hand and Head Detection

Probabilistic skin color model,

Color force in a Geodesic Active Regions model [2].

Occlusion Handling

Features: Movement-Position of the hands.

Effect of simple features on sub-unit modeling.

Movement-Position: Among the main cues that describe a sign [4, 9].

Fitting ellipses to the hands after segmentation.
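As a hedged sketch of this step (not the authors' implementation; it assumes an OpenCV-style pipeline and a binary hand mask produced by the skin-color/active-region segmentation), the code below fits an ellipse to the largest blob and returns its centroid, from which the movement-position features (positions plus derivatives) can be built:

```python
import cv2
import numpy as np

def hand_centroid(mask: np.ndarray):
    """Fit an ellipse to the largest blob of a binary hand mask and
    return its centre (cx, cy) as the movement-position measurement."""
    # OpenCV >= 4: findContours returns (contours, hierarchy).
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)   # largest blob = hand
    if len(hand) < 5:                           # fitEllipse needs >= 5 points
        return None
    (cx, cy), _axes, _angle = cv2.fitEllipse(hand)
    return np.array([cx, cy])

def movement_position_features(centroids: np.ndarray) -> np.ndarray:
    """Stack 2D positions with their temporal derivatives (velocity)."""
    velocity = np.gradient(centroids.astype(float), axis=0)
    return np.hstack([centroids, velocity])     # (T, 4) feature matrix
```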

4. Overview

7. Feature Normalizations

[Figure: example hand trajectories on X-Y axes illustrating the feature normalizations.]
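The exact normalizations are defined in the paper; as a minimal sketch, the function below applies one plausible reading of the direction, scale and position normalizations named in Section 8 to a (T, 2) centroid trajectory (all conventions here are assumptions):

```python
import numpy as np

def normalize_trajectory(traj: np.ndarray,
                         direction=True, scale=True, position=True):
    """Apply position, scale and direction normalizations to a (T, 2)
    hand-centroid trajectory (one plausible reading of Section 8)."""
    t = traj.astype(float).copy()
    if position:                     # translate so the trajectory starts at 0
        t -= t[0]
    if scale:                        # divide by the largest spatial extent
        extent = np.ptp(t, axis=0).max()
        if extent > 0:
            t /= extent
    if direction:                    # rotate start-to-end vector onto +X axis
        dx, dy = t[-1] - t[0]
        theta = -np.arctan2(dy, dx)
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        t = t @ rot.T
    return t
```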

8. Subunit Construction

D/S Modeling Structure: an appropriate feature/model architecture for each case.

Feature Normalizations

Intuitive Subunits per cue

[Figure: intuitive subunits per cue (SU-1 to SU-4), one panel each for Direction, Scale and Position, plotted on X-Y axes and the X coordinate.]
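As a minimal sketch of one side of this construction, the snippet below clusters static segments by k-means over their mean hand positions (scikit-learn is an assumed library choice; dynamic segments would instead be clustered as whole sequences, cf. [12]). K = 21 echoes the cluster count marked in the accuracy plot:

```python
import numpy as np
from sklearn.cluster import KMeans

def static_subunit_labels(static_segments, n_clusters=21):
    """Cluster static segments into position subunits.

    static_segments: list of (T_i, 2) arrays of hand positions.
    Each segment is summarized by its mean position; k-means then
    assigns it to one of n_clusters static subunits."""
    means = np.stack([seg.mean(axis=0) for seg in static_segments])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(means)     # one subunit index per segment
```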

10. SU-Sequence to Sign Mapping

Apparent Signs → Similarity Metric → Effective Signs
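The poster cites [7] for a probabilistic HMM distance, a natural choice of similarity metric for merging apparent signs into effective signs. Below is a hedged Monte Carlo sketch of the symmetrized Juang-Rabiner distance; it assumes hmmlearn-style models with .sample() and .score() (the poster does not name a toolkit):

```python
def hmm_distance(model_a, model_b, n_seq=200, length=30):
    """Symmetrized probabilistic distance between two HMMs [7]:
    D(a, b) = E_{O ~ a}[log p(O | a) - log p(O | b)] / T,
    estimated by sampling n_seq sequences of the given length."""
    def one_sided(src, other):
        total = 0.0
        for _ in range(n_seq):
            obs, _states = src.sample(length)   # draw a sequence from src
            total += (src.score(obs) - other.score(obs)) / length
        return total / n_seq
    return 0.5 * (one_sided(model_a, model_b) + one_sided(model_b, model_a))
```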

References

[1] B. Bauer and K. F. Kraiss. 2001. Towards an automatic sign language recognition system using subunits. In Proc. of Int'l Gesture Workshop, 2298:64–75.

[2] O. Diamanti and P. Maragos. 2008. Geodesic active regions for segmentation and tracking of human gestures in sign language videos. In Proc. Int'l Conf. Image Processing (ICIP) 2008.

[3] P. Dreuw, C. Neidle, V. Athitsos, S. Sclaroff, and H. Ney. 2008. Benchmark databases for video-based automatic sign language recognition. In Proc. Int'l Conf. on Language Resources and Evaluation (LREC), May.

[4] K. Emmorey. 2002. Language, Cognition, and the Brain: Insights from Sign Language Research. Erlbaum.

[5] G. Fang, X. Gao, W. Gao, and Y. Chen. 2004. A novel approach to automatically extracting basic units from Chinese sign language. In Proc. Int'l Conf. Pattern Recognition 2004.

[6] J. Han, G. Awad, and A. Sutherland. 2009. Modelling and segmenting subunits for sign language recognition based on hand motion analysis. Pat. Rec. Lett., 30(6):623–633.

[7] B. H. Juang and L. R. Rabiner. 1985. A probabilistic distance measure for hidden Markov models. AT&T Technical Journal.

[8] S. K. Liddell and R. E. Johnson. 1989. American Sign Language: The phonological base. Sign Language Studies, 64:195–277.

[9] S. Ong and S. Ranganath. 2005. Automatic sign language analysis: A survey and the future beyond lexical meaning. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(6):873–891.

[10] N. Paragios and R. Deriche. 2002. Geodesic active regions: A new framework to deal with frame partition problems in computer vision. Journ. of Vis. Commun. and Image Repres., 13(1/2):249–268.

[11] L. R. Rabiner. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77:257–286.

[12] P. Smyth. 1997. Clustering sequences with hidden Markov models. In Advances in Neural Information Processing Systems, 9:648–654.

[13] C. Vogler and D. Metaxas. 2003. Handshapes and movements: Multiple-channel American Sign Language recognition. In Proc. of Int'l Gesture Workshop, 247–258.

12. SL Recognition Experiments

Continuous American Sign Language [3]: 843 utterances, 406 words, 4 signers, uniform background. Sign-level transcriptions; English glosses; annotated start/end points.

BU400 HQ: 6 videos, 648×484 frames, 60 fps. The 94 most frequent glosses; cross-validation with a 60-40% train/test split.
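The recognition framework is HMM-based [11]. As an illustrative sketch only (the poster does not state the toolkit; hmmlearn and all parameter values below are assumptions), one GaussianHMM can be trained per subunit and a test segment assigned to the best-scoring model:

```python
import numpy as np
from hmmlearn import hmm

def train_subunit_hmms(segments_by_su, n_states=3):
    """Train one GaussianHMM per subunit label.

    segments_by_su: dict mapping SU label -> list of (T_i, d) arrays."""
    models = {}
    for su, segs in segments_by_su.items():
        X = np.vstack(segs)                      # concatenated observations
        lengths = [len(s) for s in segs]         # per-sequence lengths
        m = hmm.GaussianHMM(n_components=n_states,
                            covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[su] = m
    return models

def classify_segment(models, segment):
    """Assign a segment to the subunit whose HMM scores it highest."""
    return max(models, key=lambda su: models[su].score(segment))
```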

For further information

Please contact [sth, vpitsik, maragos]@cs.ntua.gr. More information can be found at http://cvsp.cs.ntua.gr and http://www.dictasign.eu.

Acknowledgments

This research work was supported by the EU under the research program Dictasign with grant FP7-ICT-3-231135. We also wish to thank Boston University and C. Neidle for providing the BU400 database.

6. Segmentation Sample

[Figure: a sample sign segmented into static and dynamic segments, with the segmentation points marked.]

11. SU Sequence to Sign Map: Sample

[Figures: subunit instances (3) shown schematically on superimposed initial and final corresponding frames (e.g. D1S4, D2S2), with SU labels (SU-1 to SU-4) over X Position and Dynamics; plots of Sign Accuracy %, SU Accuracy % and Discrimination Factor vs. the number of clusters in static SUs (K = 21 marked), for Position features, with a zoomed panel.]

9. Lexicon Sample

Each entry maps Gloss (pronunciation variant) to its SU sequence. The three blocks correspond to the Dynamic + Static feature configurations in the legend: D + P, S + P and MSPn + P.

D + P (MD subunits):
BECAUSE P1: HP10
BECAUSE P2: HP10 MD1
BECAUSE P3: HP10 MD1 HP10
BETTER P3: MD1 HP2
BETTER P4: MD1 HP8

S + P (MS subunits):
BECAUSE P1: HP10
BECAUSE P2: HP10 MS1
BECAUSE P3: HP10 MS1 HP10
BETTER P3: MS1 HP2
BETTER P4: MS1 HP8

MSPn + P (MSPn subunits):
BECAUSE P1: HP10
BECAUSE P2: HP10 MSPn1
BECAUSE P3: HP10 MSPn1 HP10
BETTER P3: MSPn1 HP2
BETTER P4: MSPn1 HP8
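A hypothetical Python representation of such a lexicon (entries copied from the D + P block above; names are illustrative) makes the gloss lookup explicit:

```python
# Gloss (pronunciation) -> SU sequence, as in the D + P block above.
LEXICON = {
    ("BECAUSE", "P1"): ["HP10"],
    ("BECAUSE", "P2"): ["HP10", "MD1"],
    ("BECAUSE", "P3"): ["HP10", "MD1", "HP10"],
    ("BETTER",  "P3"): ["MD1", "HP2"],
    ("BETTER",  "P4"): ["MD1", "HP8"],
}

def glosses_for(su_sequence):
    """Return every (gloss, pronunciation) whose SU sequence matches."""
    return [key for key, seq in LEXICON.items() if seq == su_sequence]

print(glosses_for(["HP10", "MD1"]))   # [('BECAUSE', 'P2')]
```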

5. Segmentation + Dynamic/Static

Automatic Segmentation

Manual Segmentation (by LIMSI-CNRS)

[Figure: acceleration and velocity vs. frames for a sample sign. Feature-level labels: Dynamic (D), Static (S); model-level labels: Accelerating (A), Uniform (U).]
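The poster's segmentation is model-based; purely to illustrate the feature-level Dynamic/Static idea, the sketch below labels frames by thresholding the centroid speed (the threshold, and the use of a simple threshold at all, are assumptions, not the authors' method):

```python
import numpy as np

def dynamic_static_labels(centroids: np.ndarray, speed_thresh=1.0):
    """Label each frame of a (T, 2) centroid track as dynamic (D) or
    static (S) by thresholding the frame-to-frame speed."""
    velocity = np.gradient(centroids.astype(float), axis=0)  # (T, 2)
    speed = np.linalg.norm(velocity, axis=1)                 # px / frame
    return np.where(speed > speed_thresh, "D", "S")
```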
