
Question-Answering: Systems & Resources

Ling573: NLP Systems & Applications
April 8, 2010

Roadmap

Two extremes in QA systems:
LCC's PowerAnswer-2
Insight's Patterns…
Question classification (Li & Roth)
Resources

PowerAnswer-2

Language Computer Corp.
Lots of UT Dallas affiliates
Tasks: factoid questions
Major novel components:
Web-boosting of results
COGEX logic prover
Temporal event processing
Extended semantic chains
Results: "above median", 53.4% on the main task

Challenges: Co-reference

Single, basic referent:
Multiple possible antecedents:
Depends on previous correct answers

Challenges: Events

Event answers:
Not just nominal concepts
Nominal events: Preakness 1998
Complex events: Plane clips cable wires in Italian resort
Establish question context, constraints

PowerAnswer-2

Factoid QA system
Standard main components:
Question analysis, passage retrieval, answer processing
Web-based answer boosting
Complex components:
COGEX abductive prover
Word knowledge, semantics: Extended WordNet, etc.
Temporal processing

Web-Based Boosting

Create search engine queries from the question
Extract the most redundant answers from search (cf. Dumais et al., AskMSR)
Increase weight on TREC candidates that match; higher weight for higher frequency (see the sketch below)
Intuition:
Common terms in search results are likely to be the answer
QA answer search is too focused on query terms
Reweighting improves results
Web-boosting improves significantly: +20%
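A minimal sketch of the redundancy idea, assuming a toy scoring scheme (the web_votes/boost names, the alpha weight, and the snippet answers are illustrative assumptions, not PowerAnswer-2's code):

# Hedged sketch: count answer candidates across web snippets and boost
# TREC candidates that match frequent web answers.
from collections import Counter

def web_votes(snippet_answers):
    """Count how often each candidate string appears in web snippets."""
    return Counter(a.lower() for a in snippet_answers)

def boost(trec_candidates, votes, alpha=0.5):
    """Raise a candidate's score in proportion to its web frequency."""
    total = sum(votes.values()) or 1
    return {cand: score + alpha * votes[cand.lower()] / total
            for cand, score in trec_candidates.items()}

votes = web_votes(["1867", "1867", "1876", "1867"])
print(boost({"1867": 0.4, "1876": 0.5}, votes))
# '1867' (0.775) overtakes '1876' (0.625) thanks to web redundancy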

Deep Processing: Query/Answer Formulation

Preliminary shallow processing:
Tokenization, POS tagging, NE recognition, preprocessing
Parsing creates syntactic representation:
Focused on nouns, verbs, and particles; attachment
Coreference resolution links entity references
Translate to full logical form, as close as possible to the syntax

Syntax to Logical Form

(worked examples shown as figures in the original slides)
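As a rough illustration of the flavor of a flat, syntax-close logic form (the notation here only approximates LCC's published examples, and the helper function is an invented sketch):

# Hedged sketch: a flat, syntax-close logic form for a simple clause.
# Nouns become predicates over entity variables; the verb becomes a
# predicate over an event variable plus its argument variables.
def to_logic_form(subj, verb, obj):
    return f"{subj}_NN(x1) & {verb}_VB(e1, x1, x2) & {obj}_NN(x2)"

# "An engineer built the engine" ->
#   engineer_NN(x1) & build_VB(e1, x1, x2) & engine_NN(x2)
print(to_logic_form("engineer", "build", "engine"))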

Deep Processing: Answer Selection

Lexical chains:
Bridge the gap in lexical choice between Q and A
Improve retrieval and answer selection
Create connections between synsets through topicality (search sketch below)
Q: When was the internal combustion engine invented?
A: The first internal-combustion engine was built in 1867.
invent → create_mentally → create → build
Perform abductive reasoning between the question logic form (QLF) and answer logic form (ALF): tries to justify the answer given the question
Yields a 10% improvement in accuracy!
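A hedged sketch of finding such a chain with NLTK's WordNet interface (PowerAnswer-2 used Extended WordNet; plain WordNet and this breadth-first search are stand-ins):

# Hedged sketch: find a lexical chain between two verbs by searching
# WordNet hypernym/hyponym links breadth-first.
from collections import deque
from nltk.corpus import wordnet as wn

def lexical_chain(src_word, dst_word, max_depth=4):
    targets = set(wn.synsets(dst_word, pos=wn.VERB))
    queue = deque([(s, [s]) for s in wn.synsets(src_word, pos=wn.VERB)])
    seen = set()
    while queue:
        syn, path = queue.popleft()
        if syn in targets:
            return path
        if len(path) > max_depth or syn in seen:
            continue
        seen.add(syn)
        for nxt in syn.hypernyms() + syn.hyponyms():
            queue.append((nxt, path + [nxt]))
    return None

# May recover a path in the spirit of invent -> create_mentally -> create -> build
chain = lexical_chain('invent', 'build')
if chain:
    print(' -> '.join(s.name() for s in chain))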

Temporal Processing

16% of factoid questions include a time reference
Index documents by date: absolute, relative
Identify temporal relations between events
Store as triples (S, E1, E2), where S is the temporal relation signal, e.g. during, after (toy sketch below)
Answer selection:
Prefer passages matching the question's temporal constraint
Discover events related by temporal signals in Qs & As
Perform temporal unification; boost good answers
Improves only by 2%: mostly captured by surface forms
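A toy sketch of the triple store and constraint matching (the data and the fully grounded match are simplifying assumptions; the real system unifies variables and normalizes dates):

# Hedged sketch: store temporal relations as (signal, event1, event2)
# triples and prefer passages whose triples match the question's.
TRIPLES = [
    ("after", "earthquake", "evacuation"),
    ("during", "war", "rationing"),
]

def matches(question_triple, passage_triples):
    """True if some passage triple has the same signal and events."""
    sig, e1, e2 = question_triple
    return any(sig == s and e1 == a and e2 == b
               for (s, a, b) in passage_triples)

# Q constraint from "What happened after the earthquake?"
print(matches(("after", "earthquake", "evacuation"), TRIPLES))  # True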

Results

(results table shown as a figure in the original slides)

Overview

Key sources of improvement:
Shallow processing: web-boosting: +20%
Deep processing:
COGEX logic prover + semantics: +10%
Temporal processing: +2%
Relation queries: all relatively shallow
Biggest contributors: keyword extraction, topic signatures

Patterns of Potential Answer Expressions… ("Insight")

Shallow, pattern-based approach; contrasts with deep processing techniques
Intuition: some surface patterns are highly correlated with information
E.g. Mozart (1756-1791): person – birth date, death date
Pattern: capitalized word; paren; 4 digits; dash; 4 digits; paren (regex sketch below)
Attested 850 times in a corpus
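That one pattern written as a regular expression, as a sketch (Insight's 51 pattern elements are richer than plain regexes):

# Hedged sketch: the Mozart (1756-1791) surface pattern as a regex,
# capturing a capitalized name plus birth and death years.
import re

PATTERN = re.compile(r"\b([A-Z][a-z]+)\s*\(\s*(\d{4})\s*-\s*(\d{4})\s*\)")

m = PATTERN.search("Wolfgang Amadeus Mozart (1756-1791) was a composer.")
if m:
    name, born, died = m.groups()
    print(name, born, died)  # Mozart 1756 1791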

Pattern Library

Potentially infinite patterns
Pattern structure:
Fixed components: words, characters, symbols
Variable components: usually query terms and answer terms
List of 51 pattern elements, combined into patterns; ordered or unordered (composition sketch below)
More complex patterns are typically more indicative
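A toy sketch of composing fixed and variable elements into one ordered pattern (the element inventory, slot names, and matcher are illustrative assumptions):

# Hedged sketch: patterns as sequences of fixed strings and variable slots.
import re

def compile_pattern(elements):
    """elements: fixed strings, or 'QUERY'/'ANSWER' variable slots."""
    parts = []
    for el in elements:
        if el == "QUERY":
            parts.append(r"(?P<query>[\w ]+?)")
        elif el == "ANSWER":
            parts.append(r"(?P<answer>[\w ]+?)")
        else:
            parts.append(re.escape(el))
    return re.compile(r"\s*".join(parts))

# e.g. an "<answer>, Queen of <query>" style pattern
pat = compile_pattern(["ANSWER", ",", "Queen of", "QUERY", "."])
m = pat.search("Beatrix, Queen of the Netherlands.")
if m:
    print(m.group("answer"), "|", m.group("query"))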

Other Examples

Post questions: Who is the Queen of the Netherlands?
Beatrix, Queen of the Netherlands
Pattern elements: country name, post name, person name, title (optional), in some order

Basic Approach

Question analysis:
Identify the detailed question type
Passage retrieval:
Collect a large number of retrieved snippets, possibly with query expansion
Answer processing:
Find matching patterns in the candidates; tens of patterns per answer type (pipeline sketch below)
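The three stages wired together in a minimal sketch (the classifier stub, retrieval stub, and pattern library are toy assumptions):

# Hedged sketch of an Insight-style pipeline: type the question,
# retrieve snippets, then scan them with type-specific patterns.
import re

PATTERNS = {  # toy library: answer type -> compiled patterns
    "birth-death": [re.compile(r"([A-Z][a-z]+)\s*\((\d{4})-(\d{4})\)")],
}

def classify(question):
    # stand-in for detailed question typing
    return "birth-death" if "when" in question.lower() else "other"

def retrieve(question):
    # stand-in for search-engine snippet retrieval
    return ["Wolfgang Amadeus Mozart (1756-1791) was a composer."]

def answer(question):
    qtype = classify(question)
    for snippet in retrieve(question):
        for pat in PATTERNS.get(qtype, []):
            m = pat.search(snippet)
            if m:
                return m.groups()
    return None

print(answer("When was Mozart born?"))  # ('Mozart', '1756', '1791')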

Results

Best result in TREC-10: MRR (strict) 0.676
Correct: 289; 120 unanswered
Retrieval based on shallow patterns (bag of patterns, and sequences) is still highly effective

Question Classification: Li & Roth

Roadmap

Motivation:

Why Question Classification?

Question classification categorizes possible answers
Constrains answer types to help find and verify the answer
Q: What Canadian city has the largest population?
Type? -> City: can ignore all non-city NPs (filtering sketch below)
Provides information for type-specific answer selection
Q: What is a prism?
Type? -> Definition: answer patterns include "A prism is…"
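A minimal sketch of type-based filtering (the candidate list and entity tags are invented for illustration; Li & Roth describe the idea, not this code):

# Hedged sketch: keep only candidate NPs whose entity tag matches the
# predicted answer type for the question.
CANDIDATES = [
    ("Toronto", "CITY"),
    ("Ontario", "PROVINCE"),
    ("2.5 million", "NUMBER"),
]

def filter_by_type(candidates, answer_type):
    return [text for text, tag in candidates if tag == answer_type]

# Q: "What Canadian city has the largest population?" -> type CITY
print(filter_by_type(CANDIDATES, "CITY"))  # ['Toronto']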

Challenges

Variability:
What tourist attractions are there in Reims?
What are the names of the tourist attractions in Reims?
What is worth seeing in Reims?
Type? -> Location
Manual rules? Nearly impossible to create sufficient patterns
Solution? Machine learning with a rich feature set

Approach

Employ machine learning to categorize by answer type
Hierarchical classifier over a semantic hierarchy of types
Coarse vs. fine-grained; up to 50 classes
How does this differ from text categorization?
Questions are much shorter: less information, but deep analysis is more tractable

Approach

Exploit syntactic and semantic information
Diverse semantic resources:
Named Entity categories
WordNet senses
Manually constructed word lists
Automatically extracted semantically similar word lists
Results: Coarse: 92.5%; Fine: 89.3%
Semantic features reduce error by 28%

Question Hierarchy

(taxonomy figure shown in the original slides)

Learning a Hierarchical Question Classifier

Many manual approaches use only:
A small set of entity types and a set of handcrafted rules
Note: Webclopedia's 96-node taxonomy with 276 manual rules
Learning approaches can generalize:
Can train on a new taxonomy, but someone still has to label the data…
Two-step learning (Winnow), with the same features in both steps (two-stage sketch below):
First classifier produces (a set of) coarse labels
Second classifier selects from the fine-grained children of the coarse tags generated by the previous stage
Select the highest-density classes above a threshold
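A minimal two-stage sketch with scikit-learn perceptrons standing in for the paper's Winnow (SNoW) learners; the label set, toy training data, and restriction step are assumptions:

# Hedged sketch: coarse classifier first, then a fine classifier restricted
# to the fine-grained children of the predicted coarse label.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Perceptron

COARSE_CHILDREN = {"LOC": ["LOC:city", "LOC:country"],
                   "HUM": ["HUM:individual", "HUM:group"]}

questions = ["What Canadian city has the largest population ?",
             "What country borders France ?",
             "Who invented the telephone ?",
             "What band recorded this song ?"]
coarse_y = ["LOC", "LOC", "HUM", "HUM"]
fine_y = ["LOC:city", "LOC:country", "HUM:individual", "HUM:group"]

vec = CountVectorizer(ngram_range=(1, 2)).fit(questions)
X = vec.transform(questions)
coarse_clf = Perceptron().fit(X, coarse_y)
fine_clf = Perceptron().fit(X, fine_y)

def classify(question):
    x = vec.transform([question])
    coarse = coarse_clf.predict(x)[0]
    allowed = set(COARSE_CHILDREN[coarse])
    scores = fine_clf.decision_function(x)[0]
    labels = list(fine_clf.classes_)
    # restrict the fine decision to children of the coarse label
    fine = max((lbl for lbl in labels if lbl in allowed),
               key=lambda lbl: scores[labels.index(lbl)])
    return coarse, fine

print(classify("What city hosts the Olympics ?"))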

Features for Question Classification

Primitive lexical, syntactic, and lexical-semantic features
Automatically derived; combined into conjunctive, relational features
Sparse, binary representation
Words: combined into n-grams
Syntactic features (extraction sketch below):
Part-of-speech tags
Chunks
Head chunks: the first noun and verb chunks after the question word
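A hedged sketch of the lexical and POS features using NLTK (the feature-string naming is an assumption, and chunk features are omitted for brevity):

# Hedged sketch: sparse binary features -- unigrams, bigrams, and POS tags.
import nltk  # assumes punkt and the POS tagger models are downloaded

def features(question):
    tokens = nltk.word_tokenize(question)
    feats = set()
    feats.update(f"w={w.lower()}" for w in tokens)
    feats.update(f"bi={a.lower()}_{b.lower()}"
                 for a, b in zip(tokens, tokens[1:]))
    feats.update(f"pos={tag}" for _, tag in nltk.pos_tag(tokens))
    return feats

print(sorted(features("Who was the first woman killed in the Vietnam War?")))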

Syntactic Feature Example

Q: Who was the first woman killed in the Vietnam War?
POS: [Who WP] [was VBD] [the DT] [first JJ] [woman NN] [killed VBN] [in IN] [the DT] [Vietnam NNP] [War NNP] [? .]
Chunking: [NP Who] [VP was] [NP the first woman] [VP killed] [PP in] [NP the Vietnam War] ?
Head noun chunk: 'the first woman'

Semantic Features

Treat analogously to syntax?
Q1: What's the semantic equivalent of POS tagging?
Q2: POS tagging is > 97% accurate; how accurate is semantic tagging, given semantic ambiguity?
A1: Explore different lexical semantic information sources, which differ in granularity, difficulty, and accuracy:
Named Entities
WordNet senses
Manual word lists
Distributional sense clusters

Tagging & Ambiguity

Augment each word with its semantic category
What about ambiguity? E.g. 'water' as 'liquid' or 'body of water'
Don't disambiguate: keep all alternatives and let the learning algorithm sort it out
Why?

Semantic Categories

Named Entities:
Expanded class set: 34 categories, e.g. profession, event, holiday, plant, …
WordNet: IS-A hierarchy of senses
All senses of the word + direct hypernyms/hyponyms
Class-specific words: manually derived from 5,500 questions
E.g. class Food: {alcoholic, apple, beer, berry, breakfast, brew, butter, candy, cereal, champagne, cook, delicious, eat, fat, …}
The class is the semantic tag for any word in its list

Semantic Types

Distributional clusters:
Based on Pantel and Lin
Clustered by similarity in dependency relations
Word lists for 20K English words
Lists correspond to word senses, e.g. water:
Sense 1: {oil gas fuel food milk liquid}
Sense 2: {air moisture soil heat area rain}
Sense 3: {waste sewage pollution runoff}
Treat the head word as the semantic category of the words on its list (feature sketch below)
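A toy sketch of turning such lists into features while keeping every sense, as the previous slide suggests (the cluster contents are invented for illustration):

# Hedged sketch: map each word to the head words of all clusters that
# contain it, keeping all senses as features.
CLUSTERS = {  # head word -> member words (toy stand-in for Pantel & Lin lists)
    "oil": {"oil", "gas", "fuel", "food", "milk", "liquid", "water"},
    "air": {"air", "moisture", "soil", "heat", "area", "rain", "water"},
}

def cluster_features(word):
    return {f"cluster={head}" for head, members in CLUSTERS.items()
            if word in members}

print(cluster_features("water"))  # e.g. {'cluster=oil', 'cluster=air'}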

Evaluation

Assess hierarchical coarse-to-fine classification
Assess impact of different semantic features
Assess training requirements for different feature sets
Training: 21.5K questions from TREC 8 and 9; manual labels; USC data
Test: 1K questions from TREC 10 and 11
Measures: accuracy and class-specific precision

Results

Syntactic features only:
POS useful; chunks useful mainly for contributing head chunks
Fine categories are more ambiguous
Semantic features:
Best combination: syntactic features, NE, and manual & automatic word lists
Coarse: same; Fine: 89.3% (28.7% error reduction)
Wh-word most common class: 41%

Observations

Effective coarse and fine-grained categorization
Mix of information sources and learning:
Shallow syntactic features effective for coarse classes
Semantic features improve fine-grained classification
Most feature types help
WordNet features appear noisy
Use of distributional sense clusters dramatically increases feature dimensionality

Software Resources

Build on existing tools; focus on QA-specific tasks
General: machine learning tools:
Mallet: http://mallet.cs.umass.edu
Weka toolkit: www.cs.waikato.ac.nz/ml/weka/
NLP toolkits, collections:
GATE: http://gate.ac.uk
NLTK: http://www.nltk.org
LingPipe: alias-i.com/lingpipe/
Stanford NLP tools: http://nlp.stanford.edu/software/

Software Resources: Specific

Information retrieval:
Lucene: http://lucene.apache.org (on patas); standard system, tutorials
Indri/Lemur: http://www.lemurproject.org/indri/; high-quality research system
Managing Gigabytes: http://ww2.cs.mu.oz.au/mg//; linked to a textbook on IR

Software Resources, Cont'd

POS taggers:
Stanford POS tagger
TreeTagger
MaxEnt POS tagger
Brill tagger
Stemmers: http://snowball.tartarus.org
Implementations of the Porter stemmer in many languages
Sentence splitters: NIST

Software Resources

Parsers:
Constituency parsers: Stanford parser, Collins/Bikel parser, Charniak parser
Dependency parsers: Minipar
WSD packages: WordNet::Similarity

Software Resources

Semantic analyzer: Shalmaneser
Databases, ontologies: WordNet, FrameNet, PropBank

Information Resources

Proxies for world knowledge:
WordNet: synonymy; IS-A hierarchy
Wikipedia
The Web itself…
Training resources:
Question classification sets (UIUC)
Other TREC QA data (questions, answers)

