Audio Information for Hyperlinking of TV Content
Petra Galuščáková and Pavel [email protected]
Faculty of Mathematics and Physics, Charles University in Prague
SLAM Workshop, 30. 10. 2015
Hyperlinking TV Content
● Our main objective: create hyperlinks
  ● Retrieve segments similar to a given query segment from the collection of television programmes.
● Benefits:
  ● Recommendation – bring additional entertainment value
  ● Exploratory search – explore the topic and enable users to find unexplored connections
BBC Broadcast Data
● Subtitles
● Three ASR transcripts
  ● LIMSI
    – word variants occurring at the same time
    – confidence of each word variant
  ● TED-LIUM
    – confidence of each word
  ● NST-Sheffield
● Metadata
● Prosodic features
System Description
● Retrieve relevant segments
  ● Divide documents into 60-second segments
  ● A new segment is created every 10 seconds
  ● Index textual segments
  ● Post-filter retrieved segments
● A query segment is transformed into a textual query
● Terrier IR Framework
● Speech retrieval
  ● Suffers from problems associated with ASR systems
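The overlapping segmentation described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the `(start_time, token)` input format and the function name `sliding_segments` are assumptions made for the example; only the 60-second length and 10-second step come from the slide.

```python
from typing import List, Tuple

# (start time in seconds, token); a hypothetical input format
Word = Tuple[float, str]


def sliding_segments(words: List[Word], length: float = 60.0,
                     step: float = 10.0) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) windows of `length` seconds, one every `step` seconds."""
    if not words:
        return []
    last_start = max(t for t, _ in words)
    segments = []
    start = 0.0
    # Open a new window every `step` seconds until the last word is passed.
    while start <= last_start:
        end = start + length
        text = " ".join(tok for t, tok in words if start <= t < end)
        segments.append((start, end, text))
        start += step
    return segments


if __name__ == "__main__":
    demo = [(0.0, "good"), (15.0, "evening"), (65.0, "headlines")]
    for seg in sliding_segments(demo):
        print(seg)
```

Each resulting text window would then be indexed as its own document, so one word can appear in up to six overlapping segments.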
Speech Retrieval Problems
1. Restricted vocabulary
  ● Data and query segment expansion
  ● Combination of transcripts
2. Lack of reliability
  ● Utilizing only the most confident words of the transcripts
  ● Using confidence score
3. Lack of content
  ● Audio music information
  ● Acoustic similarity
1. Restricted Vocabulary
● The number of unique words in the transcripts is almost three times smaller than in the subtitles.
● Low-frequency words are expected to be the most informative for information retrieval.
● Expand data and query segments
  ● Metadata
  ● Content surrounding the query segment
● Combine different transcripts
Data and Query Segment Expansion
● Metadata
  ● Concatenate each data and query segment with metadata of the corresponding file.
  ● Title, episode title, description, short episode synopsis, service name, and program variant
● Content surrounding the query segment
  ● Use 200 seconds before and after the query segment.
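A rough sketch of the two expansion steps above, under stated assumptions: the timed-word input format, the function name `expand_query`, and the metadata dictionary keys are hypothetical; the field list and the 200-second context come from the slide.

```python
from typing import Dict, List, Tuple

# (start time in seconds, token); a hypothetical input format
Word = Tuple[float, str]

# Metadata fields listed on the slide; the dictionary keys are assumed names.
METADATA_FIELDS = ["title", "episode_title", "description",
                   "synopsis", "service_name", "variant"]


def expand_query(words: List[Word], seg_start: float, seg_end: float,
                 metadata: Dict[str, str], context: float = 200.0) -> str:
    """Build a textual query from the segment, its surrounding context, and file metadata."""
    lo, hi = seg_start - context, seg_end + context
    # Words spoken inside the segment or within `context` seconds around it.
    spoken = [tok for t, tok in words if lo <= t < hi]
    # Non-empty metadata values of the file the segment belongs to.
    meta = [metadata[f] for f in METADATA_FIELDS if metadata.get(f)]
    return " ".join(spoken + meta)


if __name__ == "__main__":
    demo = [(0.0, "early"), (300.0, "before"), (500.0, "inside"),
            (700.0, "after"), (1000.0, "late")]
    print(expand_query(demo, 480.0, 540.0, {"title": "News"}))
```

Data segments could be expanded with metadata the same way by setting `context` to 0, since the slide applies the surrounding context only to the query segment.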
Data and Query Segment Expansion Results
● The improvement is significant in terms of both measures.
● Expansion using metadata and context may substantially reduce the query expansion problem.
● The highest MAP-tol score was achieved on the LIUM transcript.
  ● Even though the transcripts have a relatively high WER.
● The metadata and context produce a much higher relative improvement for the automatic transcripts than for the subtitles.
● The MAP-bin score corresponds with the WER
Transcripts Combination
● The combination is generally helpful.
  ● Despite the high score achieved by the LIUM transcripts
● The overall highest MAP-bin score was achieved using the union of the LIMSI and NST transcripts.
  ● Outperforms the results achieved with the subtitles
2. Transcript Reliability
● WER
  ● LIMSI: 57.5%
  ● TED-LIUM: 65.1%
  ● NST-Sheffield: 58.6%
● Word variants
● Word confidence
Word Variants
● Compare utilization of the first, most reliable word and all word variants in LIMSI transcripts.
Word Confidence
● Only use words with high confidence scores
  ● Only the words from LIMSI and LIUM transcripts with a confidence score higher than a given threshold
  ● Increased both scores for the development set
  ● It did not outperform fully transcribed test data
● We also experimented with voting
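Confidence-based filtering as described above might look like the following sketch. The `(token, confidence)` pair format and the name `filter_by_confidence` are assumptions; the slide does not state the threshold value, so it is left as a parameter.

```python
from typing import List, Tuple

# (token, confidence in [0, 1]); a hypothetical input format
ScoredWord = Tuple[str, float]


def filter_by_confidence(words: List[ScoredWord], threshold: float) -> List[str]:
    """Keep only tokens whose confidence is strictly higher than `threshold`."""
    return [tok for tok, conf in words if conf > threshold]


if __name__ == "__main__":
    demo = [("bbc", 0.9), ("uh", 0.2), ("news", 0.5)]
    print(filter_by_confidence(demo, 0.5))
```

A higher threshold removes more misrecognized words but also shrinks an already restricted vocabulary, which may be one reason the filtered transcripts did not beat the full ones on the test data.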
3. Lack of Content
● We only use content of the subtitles/transcripts
● A wide range of acoustic attributes could also be utilized: applause, music, shouts, explosions, whispers, background noise, …
● Acoustic fingerprinting
● Acoustic similarity
Acoustic Fingerprinting: Motivation
● Obtain additional information from the music contained within the query segment
● Especially helpful for hyperlinking music programmes
Acoustic Fingerprinting
● 1) Minimize noise in each query segment
  ● Query segments were divided into 10-second passages; a new passage was created every second
● 2) Submit sub-segments to Doreso API service
● 3) Retrieve song title, artist and album
  ● Development set: 4 queries out of 30
  ● Test set: 10 queries out of 30
● 4) Concatenate the title, artist and album name with the text of the query segment
  ● Both retrieval scores drop
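The four steps can be outlined in code as below. This is only a sketch: the recognition service is represented by a caller-supplied `identify` function, which is a hypothetical stand-in and not the Doreso API; `passage_bounds` and `expand_with_music` are likewise illustrative names.

```python
from typing import Callable, List, Optional, Tuple

Track = Tuple[str, str, str]  # (title, artist, album)


def passage_bounds(seg_start: float, seg_end: float, length: float = 10.0,
                   step: float = 1.0) -> List[Tuple[float, float]]:
    """10-second passages, a new one starting every second, kept inside the segment."""
    bounds = []
    start = seg_start
    while start + length <= seg_end:
        bounds.append((start, start + length))
        start += step
    return bounds


def expand_with_music(query_text: str, seg_start: float, seg_end: float,
                      identify: Callable[[float, float], Optional[Track]]) -> str:
    """Append title, artist and album of every distinct recognized track to the query text.

    `identify` is a hypothetical callback standing in for a fingerprinting service;
    it returns None when no song is recognized in the passage.
    """
    seen: List[Track] = []
    for start, end in passage_bounds(seg_start, seg_end):
        track = identify(start, end)
        if track is not None and track not in seen:
            seen.append(track)
    extra = " ".join(" ".join(track) for track in seen)
    return f"{query_text} {extra}".strip()


if __name__ == "__main__":
    fake = lambda s, e: ("Song", "Artist", "Album") if s >= 1 else None
    print(expand_with_music("query", 0, 12, fake))
```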
Acoustic Similarity: Motivation
● Retrieve identical acoustic segments
  ● E.g. signature tunes and jingles
● Detect semantically related segments
  ● E.g. segments containing action scenes and music
Acoustic Similarity
● Calculate similarity between data and query vector sequences of prosodic features
● Find the most similar sequences near the beginning
● Linearly combine the highest acoustic similarity with text-based similarity score
● MAP-bin: 0.2689 0.2687
● MAP-tol: 0.2465 0.2473
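The linear combination mentioned above could take a form like the following. The interpolation formula, the `weight` parameter and its default value are assumptions for illustration; the slides do not give the exact weighting used.

```python
from typing import Iterable


def combined_score(text_score: float, acoustic_scores: Iterable[float],
                   weight: float = 0.1) -> float:
    """Interpolate the text score with the highest acoustic similarity.

    `weight` is a hypothetical mixing parameter; 0 ignores the acoustic evidence.
    """
    best_acoustic = max(acoustic_scores, default=0.0)
    return (1.0 - weight) * text_score + weight * best_acoustic


if __name__ == "__main__":
    print(combined_score(0.5, [0.2, 0.8], weight=0.25))
```

Retrieved segments would then be re-ranked by this combined value instead of the text score alone.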
Overview
Restricted vocabulary
  Data expansion +
  Transcripts combination +
Transcript reliability
  Word variants +
  Word confidence -
Lack of content
  Acoustic fingerprinting -
  Acoustic similarity +