
Instant Question Answering System

Date post: 22-May-2015
Upload: dhwaj-raj
Description:
Instant Question Answering System using machine learning and natural language processing
Transcript
Page 1: Instant Question Answering System

Instant Question Answering

Dhwaj Raj

Page 2: Instant Question Answering System

What is Instant Question Answering?

● The user asks a question in text form, and the InstantQA system automatically retrieves or formulates an answer and presents it back to the user, instantly.

Page 3: Instant Question Answering System

Why Instant Question Answering?

● In spite of the continuous progress of search engines, many user needs still remain unanswered.

● Community Question Answering platforms (e.g. the AnA platform) can feature factoid questions, but their primary goal is to satisfy needs such as opinion seeking, recommendation, open-ended questions and problem solving.

● In community question answering the user has to wait for the answer, even if the question is very simple and asks for a mere fact.

● Better user experience: why browse through search result listings or related questions when the information can be served upfront?

Page 4: Instant Question Answering System

Why Instant Question Answering?

● CASE : SHIKSHA.COM

● Top searched domains, based on both query logs and data availability in listings: fees, duration, seats, application date, application url, affiliation, approval, entrance exams, placement companies and job salaries.

● There is a high number of fact-type questions that can be targeted; we are not targeting opinion-based or open-ended questions.

● 23% of the questions in a random sample of 1.15 lakh (115,000) belong to these 10 domains.

Page 5: Instant Question Answering System

Is it something similar to the AnA platform?

● Our organization has a discussion forum called the AnA (Ask and Answer) platform.

● InstantQA has no relation whatsoever to, and no direct use case with, the current AnA forum contents, as of now.

Page 6: Instant Question Answering System

What kind of questions do we target?

● What is the price of X?

● When is the last date of Y?

● How much is the fee for W?

● What is the fee for W?

● Which companies hire from campus Q?

● How is the placement at Z?

● Is Z college in Delhi? (transform to where)

● What is the meaning of life, the universe and everything?

● I do not feel like studying, what to do?

● Will I get admission in Z?

● How to improve my career?

● Should I invest in noida?

● I have purchased X project, should I sell it now or hold?

● Is it beneficial to buy 2bhk in 30 lacs?

Page 7: Instant Question Answering System

What kind of questions do we target?

FACTOIDS

● What is the price of X?

● When is the last date of Y?

● How much is the fee for W?

● What is the fee for W?

● Which companies hire from campus Q?

● How is the placement at Z?

● Is Z college in Delhi? (transform to where)

Open ended. Not definite.

● What is the meaning of life, the universe and everything?

● I do not feel like studying, what to do?

● Will I get admission in Z?

● How to improve my career?

● Should I invest in noida?

● I have purchased X project, should I sell it now or hold?

● Is it beneficial to buy 2bhk in 30 lacs?

Page 9: Instant Question Answering System

What is the very basic approach to instant question answering?

● General architecture

question → Question Classification and Analysis → Information Retrieval → Answer Extraction → answer

e.g.

What is Calvados?

/Q is /A, where /Q = “(Calvados)”

Query = “Calvados is”

Text retrieval = “…Calvados is often used in cooking…Calvados is a dry apple brandy made in…”

/A is: a dry apple brandy

Answer:

/Q is /A:

“Calvados” is “a dry apple brandy”
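
The "/Q is /A" pattern above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `extract_definition`, the stop words in the lookahead and the two hard-coded passages are hypothetical, and a real system would obtain passages from an information retrieval step.

```python
import re

def extract_definition(question, passages):
    """Answer a 'What is X?' question with the '/Q is /A' surface pattern."""
    m = re.match(r"what is (.+?)\?*$", question.strip(), flags=re.IGNORECASE)
    if m is None:
        return None
    focus = m.group(1)  # /Q, e.g. "Calvados"
    # /A: a noun phrase starting with a/an/the right after "<focus> is".
    # The stop words in the lookahead are an illustrative heuristic.
    pattern = re.compile(
        re.escape(focus)
        + r" is ((?:a|an|the) [a-z ]+?)(?= made| that| which|[.,;]|$)",
        flags=re.IGNORECASE,
    )
    for passage in passages:
        hit = pattern.search(passage)
        if hit:
            return hit.group(1)
    return None

# Passages mimic the retrieved text on the slide (elision kept as-is).
passages = [
    "Calvados is often used in cooking.",
    "Calvados is a dry apple brandy made in ...",
]
answer = extract_definition("What is Calvados?", passages)
print(answer)
```

The first passage is skipped because "often used in cooking" does not look like a definition, which is the kind of filtering the pattern is meant to provide.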

Page 10: Instant Question Answering System

If it is so simple, why haven't you done it already?

Page 11: Instant Question Answering System

There are challenges in QA!

● Quality of text data.

● Language variability (paraphrase).

● Knowledge base domain: the answer has to be supported by the collection, not by the current state of the world.

● How to locate the information given the question keywords?

● It is unlikely that a system will have all necessary resources pre-computed.

● The task requires some deduction or extra linguistic knowledge.

● How does a reasoning system find relevant pieces of information?

Page 12: Instant Question Answering System

Do we have any prior research to tackle these challenges?

Page 13: Instant Question Answering System

QA research

● Well established over two decades

● TREC (Text REtrieval Conference)
– Funded by NIST/DARPA since 1992
– QA track 1999 – 2007, directed at ‘Factoids’

● CLEF (Cross Language Evaluation Forum)
– 2001 – current
– Information retrieval, language resources

● NTCIR (NII Test Collection for IR Systems)
– 1997 – current
– IR, question answering, summarization, extraction

● Our literature survey can be accessed at: http://svn.infoedge.com:8080/Common_Engineering_Projects_Trac/wiki/instant_question_answering#LiteratureSurvey

Page 14: Instant Question Answering System

Ok investigation is done.

But how to do it actually?

Page 15: Instant Question Answering System

Knowledge base generation

Page 16: Instant Question Answering System

Knowledge base generation

PHASE 1

Page 17: Instant Question Answering System

Knowledge base generation: Example

PHASE 1

● The fees for Btech course in IIT D is 24000 INR.

● The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>.

● Fees, Btech, IIT D, 24000

● What is the fees of Btech course at IIT Delhi?

● How much is the fees for Btech course from IIT Delhi?

● How many INR is the fees of btech from iit delhi?

● What ….........

Index: Btech, iit d, fees, 24000, INR

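
As a rough sketch of this phase, a tagged fact sentence can be turned into index entries as follows. The names `add_fact`, `index` and `facts` are hypothetical, and a plain in-memory inverted index stands in for whatever search index is actually used.

```python
import re
from collections import defaultdict

TAG = re.compile(r"<<(.+?)>>")

def slots(tagged_sentence):
    """Return the lower-cased terms marked with <<...>> in a tagged fact."""
    return [term.lower() for term in TAG.findall(tagged_sentence)]

def plain(tagged_sentence):
    """Strip the <<...>> markers and keep the readable sentence."""
    return TAG.sub(r"\1", tagged_sentence)

index = defaultdict(set)  # term -> ids of facts that contain it
facts = {}                # fact id -> plain fact sentence

def add_fact(fact_id, tagged_sentence):
    facts[fact_id] = plain(tagged_sentence)
    for term in slots(tagged_sentence):
        index[term].add(fact_id)

add_fact(1, "The <<fees>> for <<Btech>> course in <<IIT D>> is <<24000 INR>>.")
print(sorted(index))
print(facts[1])
```

Generated question variants could be stored against the same fact id, so that any paraphrase leads back to one fact.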

Page 18: Instant Question Answering System

Answer Retrieval

Page 19: Instant Question Answering System

Answer Retrieval: Example

How much will I pay for btech from IIT D?

How much will I <<pay for>> <<btech>> from <<IIT D>>?

Focus: How much

Object: Pay

Class: quantity to pay, fees

Consistency checks

● You should pay 24000 INR for Btech from IIT D.

● The fees for Btech from IITD is 24000 INR.

● 24000 INR should be paid for Btech from IIT D.

Already indexed knowledge base.

Trained once at startup.

Rank and prune best answer based on collective match.
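
The example above can be walked through with a toy lookup. Everything here is a simplifying assumption: the synonym table, the tiny entity sets and the dictionary-based knowledge base are hypothetical stand-ins for the trained classifier and the indexed knowledge base.

```python
# Words that signal the question class "fees" (hypothetical synonym table).
CLASS_SYNONYMS = {"pay": "fees", "fee": "fees", "fees": "fees", "cost": "fees"}

# Toy knowledge base keyed by (class, course, institute).
KNOWLEDGE_BASE = {("fees", "btech", "iit d"): "24000 INR"}
COURSES = {"btech"}
INSTITUTES = {"iit d"}

def analyse(question):
    """Return (class, course, institute) found in the question, None if absent."""
    text = question.lower().rstrip("?")
    tokens = text.split()
    q_class = next((CLASS_SYNONYMS[t] for t in tokens if t in CLASS_SYNONYMS), None)
    course = next((c for c in COURSES if c in text), None)
    institute = next((i for i in INSTITUTES if i in text), None)
    return q_class, course, institute

def answer(question):
    """Look up the analysed question; None means 'no instant answer'."""
    return KNOWLEDGE_BASE.get(analyse(question))

print(answer("How much will I pay for btech from IIT D?"))
```

Returning `None` for an unmatched question mirrors the consistency checks: it is safer to show no instant answer than a wrong one.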

Page 20: Instant Question Answering System

So many boxes!! Let us check out the major components in brief.

Page 21: Instant Question Answering System

A.1. Fact phrase generator from structured listings

● Structured listing to factoid text.

● No need to rely only on user generated sentences.

● Use basic language model techniques to create sentences from templates.

<doc>
…..
<college_name>iit</college_name>
<college_id>13213</college_id>
<fee>54000 inr annual</fee>
<location>delhi</location>
…....
</doc>

Language Model

Fee of iit delhi is 54000 inr annual.
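
A minimal sketch of the listing-to-fact step, assuming the listing is a small XML document like the one above. A fixed string template stands in for the language model here, and `listing_to_fact` and `TEMPLATE` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# A single hand-written template; a language model would choose among many.
TEMPLATE = "Fee of {college_name} {location} is {fee}."

def listing_to_fact(xml_doc, template=TEMPLATE):
    """Fill a sentence template with the fields of one structured listing."""
    root = ET.fromstring(xml_doc)
    fields = {child.tag: (child.text or "").strip() for child in root}
    return template.format(**fields)

doc = """<doc>
  <college_name>iit</college_name>
  <college_id>13213</college_id>
  <fee>54000 inr annual</fee>
  <location>delhi</location>
</doc>"""

fact = listing_to_fact(doc)
print(fact)
```

Fields that the template does not mention, such as `college_id`, are simply ignored.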

Page 22: Instant Question Answering System

A.2. Template Generator

● Start with identifying:
– Answer type
– Entities in focus
– Part of speech tags

● With these tags and language grammar rules, a factoid/sentence can be converted into all possible question forms. (Question Generation, QG, task)

Fee of iit delhi is 54000 inr annually.

● What is the fee of iit delhi annually?
● What is the fee of iit delhi?
● How much is the fee of iit delhi?
● Is fee of iit delhi 54000 inr?

Answer type: quantity
Focus: fee
Entity: iit + delhi
POS tags etc.

Fee of <II> <LL> is <$$>.
Fees of <II> <LL> is <$$>.
Cost of <II> <LL> is <$$>.
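
The question generation step could be approximated as below. This is a sketch only: one regular expression replaces the answer type, entity and POS analysis, and the names `FACT_PATTERN`, `QUESTION_TEMPLATES` and `generate_questions` are hypothetical.

```python
import re

# Matches facts of the form "Fee|Fees|Cost of <entity> is <value>."
FACT_PATTERN = re.compile(
    r"^(?P<focus>Fees?|Cost) of (?P<entity>.+?) is (?P<value>.+?)\.$",
    re.IGNORECASE,
)

QUESTION_TEMPLATES = [
    "What is the {focus} of {entity}?",
    "How much is the {focus} of {entity}?",
    "Is {focus} of {entity} {value}?",
]

def generate_questions(fact):
    """Turn one factoid sentence into several question forms."""
    m = FACT_PATTERN.match(fact)
    if m is None:
        return []
    parts = {name: text.lower() for name, text in m.groupdict().items()}
    return [template.format(**parts) for template in QUESTION_TEMPLATES]

questions = generate_questions("Fee of iit delhi is 54000 inr.")
for q in questions:
    print(q)
```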

Page 23: Instant Question Answering System

B.1. Text Preprocessing

● Short-forms
– i’m, im, i m → i am
– can’t, cant, can t → can not

● Spelling correction

● Repeated punctuation (!!!, ???, …)

● Smilies

● Salutations (Hi all, Hiya, etc.)

● Names, signature, course codes
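
Some of these clean-up steps can be sketched with regular expressions. The patterns below are illustrative assumptions that cover only short forms, salutations, smileys and repeated punctuation; spelling correction and the removal of names, signatures and course codes are not attempted.

```python
import re

SHORT_FORMS = [
    (re.compile(r"\bi\s?['’]?\s?m\b", re.IGNORECASE), "i am"),
    (re.compile(r"\bcan\s?['’]?\s?t\b", re.IGNORECASE), "can not"),
]
SALUTATION = re.compile(r"^(?:hi all|hiya|hi|hello)\b[\s,!.]*", re.IGNORECASE)
SMILEY = re.compile(r"(?<!\S)[:;]-?[)(DPp](?!\S)")
REPEATED_PUNCT = re.compile(r"([!?.])\1+")

def preprocess(text):
    """Normalise a raw user question before tagging and analysis."""
    text = SALUTATION.sub("", text.strip())   # drop a leading greeting
    text = SMILEY.sub("", text)               # drop stand-alone smileys
    text = REPEATED_PUNCT.sub(r"\1", text)    # "!!!" -> "!"
    for pattern, expansion in SHORT_FORMS:    # "im" -> "i am"
        text = pattern.sub(expansion, text)
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

cleaned = preprocess("Hi all, im confused!!! cant find the fee :)")
print(cleaned)
```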

Page 24: Instant Question Answering System

B.2. Entity and POS Tagger

● QER (query entity recognition)
– Names, locations etc.

● Part of speech tagger using word sequence patterns
– Sequence (nouns, verbs, auxiliaries, modifiers)

● Phrase chunker

● Dependency parsing: validate tag relationships
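
One simple way to approximate query entity recognition is a longest-match dictionary lookup. The gazetteer entries and labels below are hypothetical, and this sketch does not cover POS tagging, chunking or dependency parsing.

```python
# Hypothetical gazetteer: token tuples mapped to entity labels.
GAZETTEER = {
    ("iit", "delhi"): "INSTITUTE",
    ("iit",): "INSTITUTE",
    ("delhi",): "LOCATION",
    ("btech",): "COURSE",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def tag_entities(tokens):
    """Greedy longest-match tagging; tokens outside the gazetteer get 'O'."""
    tagged, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in GAZETTEER:
                tagged.append((" ".join(tokens[i:i + n]), GAZETTEER[key]))
                i += n
                break
        else:  # no gazetteer entry starts at this token
            tagged.append((tokens[i], "O"))
            i += 1
    return tagged

tags = tag_entities("fee for btech at IIT Delhi".split())
print(tags)
```

Trying the longest span first is what lets "IIT Delhi" win over the shorter "IIT" and "Delhi" entries.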

Page 25: Instant Question Answering System

B.3. Question Analysis

● Create features to be used during answer extraction

● Identify keywords to be matched in document sentences

● Identify answer type to match answer candidates. We can create an inventory of questions and expected answer types, and so we can train a classifier
– Quantity?
– Dates?
– Definition?

● Select a list of useful patterns from a pattern repository

● Identify question relations which may be used for sentence analysis, etc.
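
Before a trained classifier exists, answer type detection can be prototyped with a few rules. The rule list below is a hand-written assumption, not the trained model described above; it only covers the three answer types named on the slide.

```python
import re

# Ordered rules: the first pattern that matches decides the answer type.
ANSWER_TYPE_RULES = [
    ("QUANTITY", re.compile(r"^(?:how much|how many)\b|\b(?:fee|fees|price|cost|salary)\b")),
    ("DATE", re.compile(r"^when\b|\blast date\b")),
    ("DEFINITION", re.compile(r"^what is\b")),
]

def answer_type(question):
    """Return the expected answer type, or 'UNKNOWN' if no rule fires."""
    q = question.lower().strip()
    for label, pattern in ANSWER_TYPE_RULES:
        if pattern.search(q):
            return label
    return "UNKNOWN"

for q in ["How much is the fee for W?", "When is the last date of Y?",
          "What is Calvados?", "How to improve my career?"]:
    print(q, "->", answer_type(q))
```

Questions that fall through to `UNKNOWN` are natural candidates for the open-ended category that the system does not target.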

Page 26: Instant Question Answering System

B.4. Query Formulation

● The question needs to be transformed into a query for the document retrieval system

● Each IR system has its own query language so we need to perform this mapping

● Identify useful keywords; use type of answer sought, entities to boost etc.

● Query Creation : Ordered terms, combined terms, weighted terms.
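
As an illustration of combined and weighted terms, the sketch below builds a query string in Lucene-style syntax, where a quoted phrase keeps terms together and `^` applies a boost. The target syntax, the boost values, the stop word list and the name `build_query` are all assumptions; another IR system would need its own mapping.

```python
STOPWORDS = {"what", "is", "the", "of", "for", "at", "in",
             "how", "much", "from", "a", "an"}

def build_query(question, entities=(), type_terms=()):
    """Map an analysed question to a boosted keyword query string."""
    text = question.lower().rstrip("?")
    # Entities become quoted phrases (combined terms) with the highest boost.
    clauses = [f'"{e.lower()}"^3' for e in entities]
    # Terms that hint at the answer type get a smaller boost.
    clauses += [f"{t.lower()}^2" for t in type_terms]
    covered = {w for e in entities for w in e.lower().split()}
    covered |= {t.lower() for t in type_terms}
    # Remaining content words are added as plain keywords, in question order.
    clauses += [w for w in text.split() if w not in STOPWORDS and w not in covered]
    return " ".join(clauses)

query = build_query("What is the fee of Btech at IIT Delhi?",
                    entities=["IIT Delhi"], type_terms=["fee"])
print(query)
```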

Page 27: Instant Question Answering System

B.5. Answer Candidate Searcher

● Index the <question, qtypes, entities, answer template> tuples in a training corpus

● Retrieve a set of n <question, qtypes, entities, answer template> tuples given a new question

● Based on the scores of the returned answers, decide the best answer to the new question
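
The retrieve-and-decide steps can be sketched with a simple token overlap score. Jaccard similarity, the `min_score` threshold and the two corpus entries are assumptions made for illustration; a real deployment would rely on the scores of the search index.

```python
def tokens(text):
    """Lower-cased bag of words with question marks removed."""
    return set(text.lower().replace("?", " ").split())

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy training corpus of <question, qtype, entities, answer template> entries.
CORPUS = [
    {"question": "What is the fee of btech at iit delhi?", "qtype": "QUANTITY",
     "entities": ["btech", "iit delhi"],
     "answer_template": "Fee of {course} at {institute} is {fee}."},
    {"question": "When is the last date to apply for btech at iit delhi?",
     "qtype": "DATE", "entities": ["btech", "iit delhi"],
     "answer_template": "Last date to apply for {course} at {institute} is {date}."},
]

def search(new_question, corpus=CORPUS, n=5, min_score=0.3):
    """Return the best of the top-n entries, or None if nothing is close enough."""
    query = tokens(new_question)
    scored = [(jaccard(query, tokens(e["question"])), e) for e in corpus]
    top_n = sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]
    if not top_n or top_n[0][0] < min_score:
        return None
    return top_n[0][1]

best = search("How much is the fee of btech at IIT Delhi?")
print(best["answer_template"])
```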

Page 28: Instant Question Answering System

Pheww.... !

Page 29: Instant Question Answering System

Where do we need Natural Language Processing?

● Tokenisation (words, numbers, punctuation, whitespace)

● Sentence detection

● Part of speech tagging (verbs, nouns, pronouns, etc.)

● Query entity recognition

● Chunking/Parsing (noun/verb phrases and relationships)

● Statistical modelling tools

● Dictionaries, word-lists, WordNet, VerbNet

● Template generation using grammar rules.

Page 30: Instant Question Answering System

So you are telling me there are ready-made NLP tools?

Page 31: Instant Question Answering System

NLP tools problems

● Training data issues
– Training domains are completely different.
– Local English language: slang, spelling, localisation

● Sentence detection failures:
– Bad style (capitalisation, punctuation)
– Ellipsis (i tried... it failed... error message...)

● Tokenisation failures:
– Multiple punctuation ???, !!! (student emphasis)
– Abbreviations (im, m.b.a, cant, doesnt, etc.)

● POS errors
– Spelling, grammar

● We need to experiment, modify code and train on our domain data!
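
To make the tokenisation failures concrete, here is a small domain-aware tokeniser sketch. The regular expression is an assumption written for the examples above (ellipsis, repeated punctuation, dotted abbreviations); it is not taken from any existing NLP toolkit.

```python
import re

TOKEN = re.compile(r"""
    (?:[A-Za-z]\.){2,}[A-Za-z]?   # dotted abbreviations such as m.b.a
  | \.{2,}                        # ellipsis kept as one token
  | [!?]+                         # a run of ! or ? kept as one token
  | \w+(?:'\w+)?                  # words, numbers, simple contractions
  | [^\w\s]                       # any other single punctuation mark
""", re.VERBOSE)

def tokenize(text):
    """Split text into tokens without breaking abbreviations or emphasis runs."""
    return TOKEN.findall(text)

toks = tokenize("i tried... it failed!!! is m.b.a fee 24000?")
print(toks)
```

Keeping "!!!" and "..." as single tokens lets a later step normalise or drop them without confusing sentence detection.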

Page 32: Instant Question Answering System

What are the use cases of instant QA?

How does it fit in our system?

Page 33: Instant Question Answering System

Interaction

● If users are not writing good English, then try to minimize how much they have to write. We can focus on capturing user intent with the least amount of typed text.

● This not only helps user experience but also increases the accuracy of language-based statistical systems.

✔ Auto complete

✔ Guidance

✔ Spell check

✔ Auto correct

✔ Manual feedback on conflicts

✔ Make them write good queries

Page 34: Instant Question Answering System

Shiksha : main search & cafe search

Page 35: Instant Question Answering System

Shiksha : Integration with main search auto-suggestor

We will already be generating good quality questions, which could be integrated here.

Page 36: Instant Question Answering System

99acres

● Use cases similar to Shiksha.

● The real estate domain has more open-ended opinion questions and far fewer factoid questions.

● If a single text box search is introduced in future:
– SRP can cater not only listings but also question answers.
– Instant QA would be really helpful for user experience.

Page 37: Instant Question Answering System

And many more use cases …...

Plus some components of this system will be utilized separately in improving other existing systems.

Page 38: Instant Question Answering System

Thank you.

