Question-Answering of Large News Video Archives
CHUA, Tat-Seng, Yang, Hui, Chaisorn, Lekha & Zhao, Yun-Long
School of Computing, National University of Singapore
Email: [email protected]
URL: http://www.comp.nus.edu.sg/~chuats
Outline of Talk
• Introduction and Motivation
• News Video Processing & Story Segmentation
• Video Transcript Correction
• Question-answering on News Video
• Results
• Conclusion
Personalized News Video Retrieval
• Infotainment, including news video, is one of the major applications of MM technology
• In a personalized news video scenario, users “interact” with the system to enquire about info such as:
  o show me the latest news video on Iraq → “Iraq”
  o highlights of last night's European football → “European football”
  o Results are time-specific
• Users increasingly want to see video news, supplemented with audio and text
  o and summarized to as much detail as is necessary
• In a more futuristic setup, these will be accomplished through “natural” human-oriented I/O
Issues to Resolve
• Imprecision of users' queries
  o “highlight of football match last night?”
• Extraction of semantic contents of video:
  o Multi-modality
  o Multi-sources
• Segmentation of news video into story units with genre classifications
• Summarization of info for viewing at different levels of detail
What Kinds of Data Do We Have?
• Most research in the past has looked into only one source
  o Example: video and its accompanying audio track, + ASR
• In most real-life applications, information is readily available in multiple sources:
  o Broadcast news -- video and audio
  o Web-based news articles (by news stations)
  o On-line wired news (by news agencies)
  o Other general resources: ontologies, dictionaries, etc.
• Other types of info increasingly used in the IR community:
  o User models: query logs, user profiles, etc.
• A challenge in developing usable systems: how to use these available data effectively in a co-training/testing type framework? Ignoring these obvious data resources will result in unsatisfactory solutions.
Outline of Our Approach
• In this talk, I will describe our approach to developing systems that handle large-scale video corpora – TREC video
• Sources of data used:
  o News video itself: visual, audio features, ASR
  o External sources: on-line news articles of the same period
  o General resources – ontology of countries, dictionary - WORDNET
• Approach (see architecture):
Overview of QA on News Video
[Figure: System Architecture of VideoQA. Modules: Content Preprocessing, Question Processing, Query Reinforcement, Transcript Retrieval, Answer Extraction, Video Summarization. Other labels: Video, Video Story Segmentation, Speech Recognition, Imperfect Video Transcript, Transcript Correction, Corrected Transcript Corpus, Input Question, Question Parsing, Question Classification, Expected Answer Type, Original Query Terms, Web, WordNet, Reinforced Query, Document Retrieval, Relevant Transcript, Sentence Ranking, Answer Selection, 3-sentence Answer Window, Shot Identification, Video Trimming, Output Video Answer. The figure marks Stages 1 to 6.]
Video Story Segmentation for News Video
• First basic problem: break the news video into meaningful units based on stories. Issues:
  o How to classify shots into the correct class/category?
  o How to detect story boundaries?
• Most news programs adopt a structure similar to CNN’s (?)
  Intro | News | Com1 | News | Finance | News | Com2 | Sports | News | Weather
Video Story Segmentation for News Video -2
• To help alleviate the estimation problem in statistical learning, we adopt a two-stage process:
  o Stage 1: Shot classification
  o Stage 2: Scene segmentation & classification
• The set of features considered:
  o Visual (color histogram, b/g change)
  o Temporal [motion activity, audio type, shot duration, speaker change]
  o Mid-level [# of faces, shot type, # of text lines, text position, cue phrases]
Stage 1: Shot Classification
• Divide video sequence into shots
• Consider 13 categories of shots
  o Intro/Highlight
  o Anchor; 2-Anchor; Meeting; Speech
  o Still image shot; Text Scene
  o Sports; Live reporting
  o Finance; Weather; Commercial; Special
• Perform classification using Decision Tree (SEE 6.0)
Stage 2: Scene Detection
• Employ Hidden Markov Model (HMM) to detect story boundaries
• Features (sequence-level features) used at this stage:
  o Shot classes – shot tags
  o Scene change [c/u]
  o Speaker change [c/u]
  o Cue phrases at the beginning of new stories
• Input to HMM:
  [1cc 1uu 1cu ..2cc 4c 4uu 6uu 6uu …. 2cc …. ]
• Tested on 120 hours of TREC video and achieved an F1 measure of around 76% in story segmentation
• TREC data may be downloaded from TREC web sites later (?)
(Chaisorn & Chua et al, ICME’02, WWW Journal’02, TREC’03)
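The HMM decoding step can be illustrated with a small Viterbi sketch. Everything numeric here is assumed for illustration: two hidden states (`B` = a shot that begins a story, `I` = a shot inside a story), a toy emission table over plain shot-class names, and hand-picked transition probabilities. The talk gives neither the model topology nor its parameters, and its real input symbols also encode scene and speaker change (for example `1cc`), which this sketch leaves out.

```python
import math

STATES = ("B", "I")  # hypothetical: B = story begins at this shot, I = inside a story

# Illustrative probabilities only, not taken from the talk.
START = {"B": 0.9, "I": 0.1}
TRANS = {"B": {"B": 0.1, "I": 0.9}, "I": {"B": 0.25, "I": 0.75}}
EMIT = {
    "B": {"anchor": 0.7, "live": 0.1, "speech": 0.1, "commercial": 0.1},
    "I": {"anchor": 0.1, "live": 0.4, "speech": 0.4, "commercial": 0.1},
}

def viterbi(observations):
    """Return the most likely hidden-state sequence for a list of shot tags."""
    # scores[s] = best log-probability of any path that ends in state s
    scores = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def story_boundaries(observations):
    """Indices of shots decoded as the start of a new story."""
    return [i for i, s in enumerate(viterbi(observations)) if s == "B"]
```

Calling `story_boundaries(["anchor", "live", "speech", "anchor", "live"])` marks the shots decoded as story starts; in a trained system the tables would be estimated from labelled news video rather than written by hand.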
Text Transcript: from Speech to Text
• Need accurate transcript for QA
  o not a problem for document or story retrieval
• Performance of speech recognition system
  o Accuracy about 80% for news
  o Most errors are named entities – likely answer targets (ATs)
  o Most such errors are of the substitution type (homonym problem)
  o Examples: pneumonia → new area; Tony Blair → Teddy Bear
• How to correct errors in ATs? Use phonetic sound matching to correct the errors
  o May use the confusion matrix successfully used in spoken document retrieval
  o Problem: low precision; matches many irrelevant phrases
• One solution: limit the scope of phonetic sound matching
  o By utilizing on-line text news of the same period (extract base noun phrases and named entities) – reasonable
Use of External Resource to correct Speech Errors
• Extract all ATs from on-line news articles, Ai = (ai1, .., aiq)
• Given video transcript Ti with a list of terms (ti1, .., tip)
• The basic problem is then to select an aik ∈ Ai to replace a sequence of terms sj ∈ Ti that maximizes the probability:
  $$\arg\max_{s_j \in T_i,\; a_{ik} \in A_i} p(s_j \mid a_{ik})$$
  where sj contains one or more consecutive terms in Ti
• Basic idea: use co-occurrence probabilities & phonetic matching to find the most likely aik ∈ Ai to replace the sequence of terms sj ∈ Ti:
  a) Extract list of probable ATs using co-occurrence probabilities
  b) Matching at phonetic syllable level
  c) Matching at confusion syllable string level
  (see Wang & Chua, ACL’03)
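The search over transcript spans and candidate answer targets can be sketched as below. This is not the phonetic matcher of the talk: as a stand-in, it scores each pair with the character-level ratio from Python's `difflib.SequenceMatcher`, and the names `best_replacement`, `correct`, `max_len` and `threshold` are hypothetical. It only shows the shape of the arg-max: every span of consecutive terms is compared with every AT from the news articles, and the best pair is substituted if it is similar enough.

```python
from difflib import SequenceMatcher

def best_replacement(transcript_terms, answer_targets, max_len=3):
    """Return (score, start, end, target) for the best span/target pair, or None."""
    best = None
    for start in range(len(transcript_terms)):
        last = min(start + max_len, len(transcript_terms))
        for end in range(start + 1, last + 1):
            span = " ".join(transcript_terms[start:end])
            for target in answer_targets:
                # Stand-in similarity: character overlap, NOT phonetic matching.
                score = SequenceMatcher(None, span.lower(), target.lower()).ratio()
                if best is None or score > best[0]:
                    best = (score, start, end, target)
    return best

def correct(transcript_terms, answer_targets, threshold=0.6):
    """Replace the best-matching span with its answer target if similar enough."""
    best = best_replacement(transcript_terms, answer_targets)
    if best is None or best[0] < threshold:
        return list(transcript_terms)
    _, start, end, target = best
    return transcript_terms[:start] + target.split() + transcript_terms[end:]
```

Restricting `answer_targets` to phrases from news of the same period is what keeps the number of spurious matches down; a character-based score like this one would still need to be swapped for syllable-level matching to handle pairs such as "pneumonia" and "new area".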
Overview of QA on News Video
[Figure: System Architecture of VideoQA]
(Similar to our text-based QA work – Yang & Chua, SIGIR’03)
Question Processing
• Users typically issue short queries (several keywords):
  o “development in North Korea”
  o “match last night”
  o Query is ambiguous!!
• Analyze the query to extract:
  o Key terms in query
  o Likely answer target
  o NP & NE in query
  o Type of video genre
  o Temporal constraint
  o Duration constraint
• Example: “football match last night?”
  o Key terms: “football”, “match”
  o Likely answer target: “football team” (ORG-NAME)
  o NP & NE: “football match”
  o Video genre: SPORTS
  o Temporal constraint: LAST-NIGHT
  o Duration constraint: 30 seconds (default)
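The output of question processing can be pictured as a small record. The sketch below is a toy, rule-based illustration only: the cue tables, the `QueryAnalysis` type and the `analyze` function are hypothetical, and it omits the answer-target and NP/NE extraction, which would need a parser and a named-entity tagger.

```python
from dataclasses import dataclass

@dataclass
class QueryAnalysis:
    key_terms: list
    genre: str
    temporal: str
    duration_seconds: int = 30  # default duration, as in the football example

# Hypothetical cue tables, for illustration only.
STOPWORDS = {"of", "the", "in", "on", "a", "an", "last", "night", "latest"}
GENRE_CUES = {"football": "SPORTS", "match": "SPORTS", "weather": "WEATHER", "stock": "FINANCE"}
TEMPORAL_CUES = {"last night": "LAST-NIGHT", "today": "TODAY", "latest": "LATEST"}

def analyze(query):
    """Extract key terms, a video genre and a temporal constraint from a short query."""
    text = query.lower().strip(" ?")
    words = text.split()
    # First temporal cue found anywhere in the query, else no constraint.
    temporal = next((tag for cue, tag in TEMPORAL_CUES.items() if cue in text), "ANY")
    # First word that signals a genre, else fall back to general news.
    genre = next((GENRE_CUES[w] for w in words if w in GENRE_CUES), "GENERAL-NEWS")
    key_terms = [w for w in words if w not in STOPWORDS]
    return QueryAnalysis(key_terms, genre=genre, temporal=temporal)
```

For the query “football match last night?” this yields the key terms, SPORTS genre, LAST-NIGHT constraint and 30-second default listed in the example.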
Query Reinforcement
• The query, however, is ambiguous!
  o Use on-line news articles to provide the context (user independent)
• Basic idea: given original query q(0):
  o Use web (or news sites) and dictionary – WordNet
  o Find terms (from web articles) that co-occur frequently with q(0)
  o Extract semantically related terms from WordNet
  o Add high-probability terms into q(0) to get q(1)
• Expect q(1) to contain more context terms than q(0)
  o For the football example: we expect q(1) to also contain terms like “arsenal”, “inter milan”, “soccer”, etc. (the big match last night)
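The co-occurrence part of query reinforcement can be sketched as follows. This is a minimal illustration under stated assumptions: documents are plain strings split on whitespace, a term's weight is the fraction of query-matching documents that contain it, and `reinforce`, `top_m` and `min_weight` are illustrative names. The WordNet step and stopword removal are left out.

```python
from collections import Counter

def reinforce(query_terms, documents, top_m=5, min_weight=0.5):
    """Expand q(0) into q(1) using terms that co-occur with the query terms."""
    q0 = [t.lower() for t in query_terms]
    # Keep only documents that mention at least one original query term.
    doc_words = [set(d.lower().split()) for d in documents]
    matching = [words for words in doc_words if any(t in words for t in q0)]
    # For every other word, count how many matching documents contain it.
    counts = Counter(w for words in matching for w in words if w not in q0)
    # Weight = fraction of matching documents that contain the word.
    weights = {w: c / len(matching) for w, c in counts.items()}
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    extra = [w for w in ranked if weights[w] >= min_weight][:top_m]
    return q0 + extra
```

With a handful of news snippets about last night's match, a query of `["football"]` would pick up the team names that appear in most of them, which is the kind of context the slide expects q(1) to carry.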
Query Reinforcement – Another Example
• q(0) = “What are the symptoms of atypical pneumonia?”
• q(1) = “symptoms, pneumonia, virus, spread, fever, cough, breath, doctor”
[Figure: flow from the original query terms, through the Web and WordNet, to the refined context words]
  Original Query Terms: symptoms, atypical, pneumonia
  Context Words from Web: symptoms, pneumonia, spread, virus, fever, cough, hospital, atypical, doctor, Asia, ...
  Refined Context Words using WordNet: symptoms, pneumonia, virus, spread, fever, cough, breath, doctor, ...
Use q(1) to retrieve a list of news transcripts at story level
Candidate Sentence Extraction
• For the retrieved transcript Ti, we select sentences Sentij that best match the user query as follows:
  o noun phrases, wnj
  o named entities, whj
  o original query words q(0), wcj
  o expanded query words q(1-0) = q(1) - q(0), wej
  o video genre, wvj
• Final score is:
  $$S_{ij} = \sum_k \alpha_k w_{kj}$$
  where Σk αk = 1 and wkj ∈ {wnj, whj, wcj, wej, waj, wvj}
• The top K sentences are selected as the candidate answer sentences based on Sij
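The weighted-sum scoring of candidate sentences can be sketched in a few lines. The feature keys reuse the slide's subscripts (n, h, c, e, a, v); the weight values, the feature scores and the function names are assumptions made for the example, since the talk does not give them.

```python
def sentence_score(features, alphas):
    """Weighted sum of per-feature match scores; the alphas must sum to 1."""
    assert abs(sum(alphas.values()) - 1.0) < 1e-9
    return sum(alphas[name] * features.get(name, 0.0) for name in alphas)

def top_k(sentences, alphas, k=3):
    """sentences: list of (text, features) pairs; return the k best texts."""
    ranked = sorted(sentences, key=lambda s: sentence_score(s[1], alphas), reverse=True)
    return [text for text, _ in ranked[:k]]
```

A sentence that matches both the original query words and a named entity will outrank one that matches only an expanded query word, as long as the corresponding weights are not set to zero.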
Results
• Use 7 days of CNN news video from 13-19 Mar 2003
  o contained a total of 350 minutes of news video
  o retrieved about 600 news articles per day from the Alta Vista news web site during these 7 days
• Designed 40 factoid questions
  o 28 general questions that are asked every day
  o 12 questions are date-specific
  o Giving a total of 208 questions
• Results (to present in ACM Multimedia ’03):
  Transcript | Correct Answers | Accuracy
  without error correction | 116 | 55.8%
  with error correction | 153 | 73.6%
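The figures on this slide can be checked with a few lines of arithmetic. The one assumption is the reading of the question count: the 28 general questions are asked on each of the 7 days and the 12 date-specific questions once each, which reproduces the stated total of 208.

```python
general, date_specific, days = 28, 12, 7
total = general * days + date_specific  # 28 general questions on each of 7 days, plus 12

def accuracy(correct, total_questions):
    """Percentage of correctly answered questions, rounded to one decimal."""
    return round(100.0 * correct / total_questions, 1)

without_correction = accuracy(116, total)  # transcript without error correction
with_correction = accuracy(153, total)     # transcript with error correction
extra_correct = 153 - 116                  # additional questions answered correctly
```

This gives a total of 208 questions and accuracies of 55.8% and 73.6%, so transcript correction accounts for 37 additional correct answers.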
Results -- Example
• Query: “What are the symptoms of atypical pneumonia?”
• The 3-sentence window selected by the QA engine is:
  o S1: He and his two companions are now in isolation and the one hundred and fifty five passengers on the flight were briefly quarantined.
  o S2: Symptoms include high fever, coughing, shortness of breath and difficulty breathing.
  o S3: But health officials say there's no reason to panic.
• The video summary example (4 shots) is:
Related Work
• Research in correcting speech recognition errors (ACL’03, EMNLP’02)
• News story and dialogue segmentation (Columbia U) (ICME’03, ACL’03)
• Question-answering in text (TREC’02, SIGIR’03)
• Infomedia Project
  o Uses multi-modality features effectively, esp. speech
  o Insufficient emphasis on external resources
• Work on Video-TREC - large-scale testing
• Collaboration with Ramesh Jain (Georgia Tech) as part of Video Tagging Project
  o Employ TV-Anytime metadata for news (collaborate with ETRI Korea)
  o Automatic tagging of TV-Anytime metadata, and use it as basis for video QA
Summary
• Work is preliminary
  o Many processes need to be automated
• Participating in this year’s Video-TREC and testing on large-scale corpora (120 hours of news video)
  o On both story segmentation and retrieval
• Experience:
  o Story segmentation: content features are important; text or ASR features are less important
  o Retrieval: text or ASR is important; content features help in enhancing precision
• Current work:
  o Build appropriate meta model to encode domain knowledge
  o Use higher-order statistics to analyze data
• KEY MESSAGE – Must incorporate domain model and utilize multi-modality, multi-source information
THANK YOU
Question classification and possible video genres
Answer Target | Likely Video Genre | Example
Human | Anchor, Meeting, Speech, General-news | Who is the Secretary of State of the United States?
Location | Live report, Anchor, General-news | Where is Saddam Hussein hiding?
Organization | Live report, Anchor | Which hospital is the center for SARS treatment in Singapore?
Time | Anchor, General-news | When did the Iraq war start?
Number | Finance | What is the expected GDP of Singapore this year?
Number | Sports, Text-scene | How many points did Yao Ming score?
Number | Weather, Text-scene | What is the highest temperature tomorrow?
Object | Anchor, Still-image, Text-scene | Which kinds of bombs are used in the current Iraq war?
Description | Anchor, Text-scene | What does SARS stand for?
Question analysis
Field | Example 1 | Example 2
Question | What is the score of the football match last night? | What are the symptoms of atypical pneumonia?
q(0) | score, football, match, last, night | symptoms, atypical, pneumonia
n | football match, last night | symptom, atypical pneumonia
h | football | atypical pneumonia
Answer Target | Number | Description
Video Genre | Sports, Text-scene | General News
List of Questions
1. Who is the British Prime Minister?
2. Who is elected to be China's President?
3. Who is the President of the United States?
4. What is the name of the former Premier of China?
5. What is the name of the new Premier of China?
6. Who will pay the heaviest tallies?
7. Who was arrested in Pakistan?
8. Which musician called off his US tour?
9. When will NASA resume shuttle flights?
10. When will Germany, France and Russia meet?
11. When is the funeral of DjinDjic?
12. Which are the three countries involved in the summit today?
13. Where was the summit held?
14. Which city is the capital of Central African Republic?
15. Which are the three major war opponent countries?
16. To whom US withdrew the aid offer?
17. Which country vowed to veto the resolution today?
18. Which country's compromise proposal was rejected by US?
19. Where is Kashmir Hotel?
20. Where did Iraq invite the chief weapons inspectors to?
21. Which city has the largest anti war demonstration?
22. Where did a AL QUEDA suspect arrested?
23. How many people attended the rally in San Francisco?
24. What is the cost of war?
25. How many people were killed in a Kashmir Hotel?
26. How many people participated in the rally in Madrid?
27. How many people were killed by the new pneumonia?
28. What are the symptoms of the atypical pneumonia?
29. What sanction did President Bush lift?
30. What was the name of the space shuttle broken apart in February?
31. Which rally shows the support for President Bush?
32. What is the official name for the mysterious pneumonia?
33. Which company tests their new passenger profiling system?
34. Name one Jewish holiday.
35. What is British stance?
36. How did Serbs Prime Minister die?
37. How is the anti-war protest in Madrid?
38. How is tomorrow's weather?
39. What is the conflict between US and Turkey?
40. What does the WHO call the new pneumonia?
Some Remarks on Story Segmentation Task
• Our 2-stage approach helps alleviate the statistical estimation problem – requires less training data
• Similar work done at Columbia U
  o Using maximum entropy method
  o For video segmentation (ICME’03) and dialogue segmentation (ACL’03)
  o Achieves similar performance
• Our current work:
  o Integration of multiple machine learning methods: HMM, ME, heuristic rule methods, and co-training approach
  o Fusion of multiple modal features: visual/audio features, text (speech to text), meta-data + domain knowledge
  o Note: using only the text feature (ASR) performs badly
Multi-tier mapping (Wang, Chua, ACL’03)
• We perform matching at 2 levels to find the most likely aik ∈ Ai to replace the sequence of terms sj ∈ Ti:
  a) Phonetic syllable level
  b) Confusion syllable string level
[Figure labels: Recall, Precision]
• At each level, we compute:
  $$Sim(a_{ik}, s_j) = \frac{LCS(a_{ik}, s_j)}{\max\{|a_{ik}|, |s_j|\}} \cdot \sum_{k=0}^{LCS(a_{ik}, s_j)} M_k$$
  o LCS(aik, sj): gives the longest common subsequence (LCS) match between aik and sj at phonetic syllable level in the order of their occurrence
  o Mk = I for Level a and b matches; and = coefficients of the confusion matrix for Level c matches
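The LCS part of this similarity can be sketched as below. It computes only the longest-common-subsequence length divided by the length of the longer sequence; the Mk weighting from the confusion matrix is omitted. The syllable lists used with it are assumed to come from some external pronunciation step that is not shown, and the function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(candidate, span):
    """LCS length normalised by the longer of the two syllable sequences."""
    longest = max(len(candidate), len(span))
    return lcs_length(candidate, span) / longest if longest else 0.0
```

Because the LCS respects order, two sequences that share the same syllables in a different order score lower than two that share them in the same order, which matches the "in the order of their occurrence" condition on the slide.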
Query Reinforcement
• The query, however, is ambiguous!
  o Use on-line news articles to provide the context (user independent)
• Basic idea: given original query q(0):
  o Go to web (or news sites) to retrieve top N documents
  o Extract terms with high co-location probabilities with q(0), Cq
  o Extract semantically related terms from WordNet, Gq & Sq
  o Extra terms to be added: Kq = Cq + (Gq Sq)
  o q(1) = q(0) + {top m terms ∈ Kq with weights >= σ}
• Expect q(1) to contain more context terms than q(0)
  o For the football example: expect q(1) to also contain terms like “real madrid”, “manchester united”, “soccer”