
DCU Search Runs at MediaEval 2014 Search and Hyperlinking

Date post: 06-Jul-2015
Upload: multimediaeval
Description:
We describe the participation of Dublin City University (DCU) in the Search sub-task of the Search and Hyperlinking Task at MediaEval 2014. Exploratory experiments were carried out to investigate the utility of prosodic prominence features for retrieving relevant video segments from a collection of BBC videos. Normalised acoustic correlates of loudness, pitch, and duration were incorporated into a standard TF-IDF weighting scheme to increase the weights of terms that were prominent in speech. Prosodic models outperformed a text-based TF-IDF baseline on the training set but failed to surpass the baseline on the test set.
DCU Search Runs at MediaEval 2014 Search and Hyperlinking David N. Racca, Maria Eskevich, Gareth J.F. Jones CNGL Centre for Global Intelligent Content School of Computing, Dublin City University Dublin, Ireland
Transcript
Page 1: DCU Search Runs at MediaEval 2014 Search and Hyperlinking

DCU Search Runs at MediaEval 2014 Search and Hyperlinking

David N. Racca, Maria Eskevich, Gareth J.F. Jones

CNGL Centre for Global Intelligent Content
School of Computing, Dublin City University

Dublin, Ireland


Overview

Novelty of Approach for MediaEval 2014:

● What: integrate prosodic prominence into the IR weighting scheme

● How: increase the weight of prosodically prominent terms

● Effect: promote the rank of retrieved segments containing prominent terms


Motivation

●  Speech is much more than a “bag of words”

●  Prosodic information: e.g. intonation, duration, and loudness

● Shown to be useful in many speech processing tasks:

    ● emotions, discourse structure, speech acts, speaker ID, focus, contrast, topic shifting


Related Work

● Possible correlation between acoustic stress and TF-IDF scores in English (Crestani 2001)

● Use of signal amplitude and duration in a spoken content retrieval task in Chinese (Chen et al. 2001)

● Use of pitch and intensity in a topic tracking task in French (Guinaudeau and Hirschberg 2011)


Approach

● We followed the method of Guinaudeau and Hirschberg (2011)

● Implemented a spoken content retrieval (SCR) system that gives greater importance to terms that are prosodically prominent in the spoken content

● Prosodically prominent terms stand out from their surroundings by means of their acoustic characteristics


Approach: Indexing

[Diagram: indexing pipeline]

● Video Collection → LVCSR → Automatic text transcripts (XML)

● Video Collection → Feature extraction → Prosodic features: F0, loudness, every ~10 ms (CSV)

● Transcripts with prosodic information (XML)

● Normalised max, min of F0 and loudness (XML)

● Non-overlapping fixed time segmentation → Segments with prosodic information

● Terrier Indexing (stopword removal, stemming) → Segments Index
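
The indexing steps on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Word` type, the function names, the min-max normalisation, and the rule that a word belongs to the segment containing its start time are all assumptions made for illustration. Only the ~10 ms frame step, the max of F0 and loudness per word, and the fixed non-overlapping segmentation come from the slide.

```python
from dataclasses import dataclass

FRAME_STEP = 0.01  # seconds between prosodic frames (~10 ms, per the slide)


@dataclass
class Word:
    """One recognised word with its time span in seconds (hypothetical type)."""
    text: str
    start: float
    end: float


def word_maxima(word, f0, loudness):
    """Return (max F0, max loudness) over the frames spanned by one word.

    f0 and loudness are frame-level lists sampled every FRAME_STEP seconds.
    """
    lo = int(round(word.start / FRAME_STEP))
    hi = max(lo + 1, int(round(word.end / FRAME_STEP)))  # at least one frame
    return max(f0[lo:hi]), max(loudness[lo:hi])


def min_max_normalise(values):
    """Scale values to [0, 1] within one context.

    Assumption: the slides say features are normalised but do not say how;
    min-max scaling is used here only as a stand-in.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def segment_id(word, segment_length):
    """Index of the fixed, non-overlapping segment that contains the word start."""
    return int(word.start // segment_length)
```

With a 90-second segment length (an illustrative value), a word starting at 95.0 s falls into segment 1, and the per-word maxima can be normalised either within that segment or across the whole transcript.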


Approach: Indexing

[Diagram labels: Transcripts with prosodic information (XML); Normalised max, min of F0 and loudness (XML); Segments Index]

Example index entries, in the form: term idf (docid, tf, [F0, L, D], ...)

car 350 (1, 3, [.78, .34, .20], [.56, .23, .25], [0.66, 1.0, .33]) (4, 1, [.23, .10, .15]), ...

race 45 (1, 1, [0.81, 0.98]) (9, 2, [0.23, 0.10], [0.54, 0.27]), ...

[Diagram labels: Transcript (Speech Seg 0, Speech Seg 1, ..., Speech Seg N); Speech Segment context; Full transcript context]


Approach: Retrieval

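[Diagram: Query “car race” → Terrier Matching (Segments Index) → Result List]

Term weighting:

$$w_s(t) = \frac{\theta_{ir} \cdot (\mathrm{idf}_t \cdot \mathrm{tf}_t) + \theta_{ac} \cdot \mathrm{ac}_t}{\theta_{ir} + \theta_{ac}}$$

Acoustic score by model:

● G-dur: $\mathrm{ac}_t = \max(\mathrm{dur}_t)$

● G-lp: $\mathrm{ac}_t = \max(\mathrm{loud}_t) \cdot \max(\mathrm{F0}_t)$

● G-pr: (formula not recoverable from the slide transcript)

Result List:

Rank  Video         Start  End    Score
1     Traffic cops  0.0    1.30   3.35
2     Top gear      1.30   3.00   2.82
3     Top gear      48.0   49.30  2.73
...   ...           ...    ...    ...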
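
The weighting on this slide combines a TF-IDF term with an acoustic term, normalised by the sum of the two θ parameters. The following is a minimal sketch under stated assumptions rather than the Terrier-based system itself: the dictionary layout of occurrences (`f0`, `loud`, `dur` keys), the use of the occurrence count as tf, and the summation of term weights over query terms in `score_segment` are illustrative choices not confirmed by the slides.

```python
def term_weight(idf, tf, ac, theta_ir, theta_ac):
    """w_s(t) = (theta_ir * idf * tf + theta_ac * ac) / (theta_ir + theta_ac)."""
    return (theta_ir * idf * tf + theta_ac * ac) / (theta_ir + theta_ac)


def ac_g_dur(occurrences):
    """G-dur: maximum normalised duration over a term's occurrences."""
    return max(o["dur"] for o in occurrences)


def ac_g_lp(occurrences):
    """G-lp: maximum loudness times maximum F0 over a term's occurrences."""
    return max(o["loud"] for o in occurrences) * max(o["f0"] for o in occurrences)


def score_segment(query_terms, segment, idf, ac_fn, theta_ir, theta_ac):
    """Score one segment for a query.

    segment maps each term to a list of occurrence dicts (hypothetical layout).
    Assumption: the segment score is the sum of term weights over the query
    terms that occur in the segment.
    """
    total = 0.0
    for term in query_terms:
        occurrences = segment.get(term)
        if not occurrences:
            continue
        tf = len(occurrences)  # assumption: tf is the raw occurrence count
        total += term_weight(idf[term], tf, ac_fn(occurrences), theta_ir, theta_ac)
    return total
```

Setting `theta_ir=1` and `theta_ac=0` reduces `term_weight` to plain `idf * tf`, which matches the TF-IDF baseline rows in the results tables.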


Results: Development Set

● 50 queries – 1335 hours

● Known-item Task

Transcript      Retrieval Model   θir   θac   MRR
Subtitles       TF-IDF            1     0     .428
Subtitles       G-pr              2     3     .126
Subtitles       G-lp              3     1     .281
Subtitles       G-dur             1     3     ---
NST-Sheffield   TF-IDF            1     0     .313
NST-Sheffield   G-pr              2     3     .329
NST-Sheffield   G-lp              3     1     .319
NST-Sheffield   G-dur             1     3     ---
LIMSI           TF-IDF            1     0     .259
LIMSI           G-pr              2     3     .264
LIMSI           G-lp              3     1     .285
LIMSI           G-dur             1     3     .279
LIUM            TF-IDF            1     0     .296
LIUM            G-pr              2     3     .253
LIUM            G-lp              3     1     ---
LIUM            G-dur             1     3     ---
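
For reference, mean reciprocal rank (MRR) in a known-item setting can be computed as in the sketch below. It assumes each query has exactly one target item and that a hit is an exact identifier match; how the task actually judged whether a retrieved segment matched the known item is not stated on the slides, so the matching rule here is a simplification.

```python
def reciprocal_rank(ranked_ids, known_item):
    """1 / rank of the known item in the result list, or 0.0 if it is absent."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item == known_item:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over queries.

    runs is a list of (ranked_ids, known_item) pairs, one per query.
    """
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, target) for ranked, target in runs) / len(runs)
```

A query whose target is ranked first contributes 1.0, one ranked second contributes 0.5, and a miss contributes 0.0.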


Results: Test Set

● 36 queries – 2686 hours

● Ad-hoc Retrieval Task

● MAP is Overlap MAP

Transcript      Retrieval Model   θir   θac   MAP
Subtitles       TF-IDF            1     0     .639
Subtitles       G-pr              2     3     .599
Subtitles       G-lp              3     1     .533
Subtitles       G-dur             1     3     .345
NST-Sheffield   TF-IDF            1     0     .440
NST-Sheffield   G-pr              2     3     .434
NST-Sheffield   G-lp              3     1     .435
NST-Sheffield   G-dur             1     3     .404
LIMSI           TF-IDF            1     0     .525
LIMSI           G-pr              2     3     .508
LIMSI           G-lp              3     1     .428
LIMSI           G-dur             1     3     .505
LIUM            TF-IDF            1     0     .451
LIUM            G-pr              2     3     .444
LIUM            G-lp              3     1     .436
LIUM            G-dur             1     3     .358


Conclusion

● We explored the potential of prosodic information in the Search and Hyperlinking task

● Prosodic models outperformed a text-based baseline on the development set but failed to do so on the test set:

    ● known-item vs ad hoc task

    ● differences in query length

    ● other differences in style of query


References

● F. Crestani. Towards the use of prosodic information for spoken document retrieval. SIGIR'01

● B. Chen, H.-M. Wang, and L.-S. Lee. Improved spoken document retrieval by exploring extra acoustic and linguistic cues. INTERSPEECH'01

● C. Guinaudeau and J. Hirschberg. Accounting for prosodic information to improve ASR-based topic tracking for TV broadcast news. INTERSPEECH'11

