Page 1

Beyond Keywords: Finding Information More Accurately and Easily Using Natural Language

Matt Lease
[email protected]

Brown Laboratory for Linguistic Information Processing (BLLIP)
Brown University

Center for Intelligent Information Retrieval (CIIR)
University of Massachusetts Amherst

Page 2

All Results Relevant!

Page 3

What is the state of recognizing handwriting in today's computer systems?

Only 2 relevant results!

1st relevant result: rank 5

Page 4

Community Q&A

Page 5

Searching off the Desktop

Longer and more natural queries emerge in spoken settings [Du and Crestani'06]

Page 6

Verbosity and Complexity

▶ Complex information requires complex description
- Information theory [Shannon'51]
- Human discourse implicitly respects this [Grice'67]

▶ Simple searches are easily expressed in keywords
- navigation: "alaska airlines"
- information: "american revolution"

▶ Verbosity naturally increases with complexity
- More specific information needs [Phan et al.'07]
- Iterative reformulation [Lau and Horvitz'99]

Keywords?

Page 7

Outline of Talk

▶ Natural language queries: what, where & why?

▶ Term-based models for NL queries

Problem: query complexity → query ambiguity

▶ Regression Rank [Lease, Allan, and Croft, ECIR’09]

Learning framework independent of retrieval model

▶ Extensions

Modeling term relationships [Lease, SIGIR’09]

Relevance feedback: explicit and pseudo [Lease, TREC’08]

Page 8

Term-Based Retrieval

Relevance(Q, D) = Σ_{w ∈ V} weight_QD(w)

Relevance(Q, D) = Σ_{w ∈ V} weight_QD(w) + Prior(D)

Standard approaches
▶ Vector-similarity [Salton et al.'60s, Singhal et al.'96]
▶ Document-likelihood [Sparck Jones et al.'00]
▶ Query-likelihood [Ponte and Croft'98]
- KL-divergence variant [Lafferty and Zhai'01]

Roughly the same features and accuracy [Fang et al.'04]
Document-likelihood is rank-equivalent to query-likelihood under a suitable parameterization [Lease, SIGIR'09]
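To make the scoring template above concrete, here is a minimal Python sketch of a term-based scorer. The TF-IDF-style weight is illustrative only, standing in for whichever of the models named on this slide is used.

```python
import math
from collections import Counter

def relevance(query_terms, doc_terms, doc_freq, num_docs):
    """Score a document as Relevance(Q, D) = sum over terms of weight_QD(w).
    The TF-IDF-style weight here is a placeholder for the vector-space /
    likelihood weights listed on the slide."""
    tf = Counter(doc_terms)
    score = 0.0
    for w in set(query_terms):
        if tf[w] == 0:
            continue                                   # absent terms contribute 0
        idf = math.log(num_docs / (1.0 + doc_freq.get(w, 0)))
        score += tf[w] * idf                           # weight_QD(w)
    return score
```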

Page 9

KL-Divergence Ranking

▶ Estimate a unigram Θ_D underlying each document
- Length- & order-independent representation of topicality
- Smoothing assigns non-zero probability to unseen terms

▶ Estimate a similar unigram Θ_Q underlying the query
- Default: maximum-likelihood (ML) estimation

▶ Rank documents by minimal KL(Θ_Q || Θ_D)
- −KL(Θ_Q || Θ_D) = Θ_Q · log Θ_D + C_Q (a query-dependent constant)

▶ Key Idea: weight_QD(·) decomposed into Θ_Q & Θ_D
- Θ_D fixed for all queries (Dirichlet smoothing)
- Θ_Q expresses importance of terms for a given query

Example: D = "duck duck goose"
- ML estimate: θ^D_duck = 2/3, θ^D_goose = 1/3
- Smoothed: θ^D_duck < 2/3, θ^D_goose < 1/3; (∀w) θ^D_w > 0
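A minimal sketch of the recipe above (ML query model, Dirichlet-smoothed document model, negative cross-entropy score). The smoothing parameter μ = 2000 is an assumed value, not one given on the slide.

```python
import math
from collections import Counter

def ml_theta_q(query_terms):
    """Default ML query model: each token's relative frequency in the query."""
    tf, n = Counter(query_terms), len(query_terms)
    return {w: c / n for w, c in tf.items()}

def dirichlet_theta_d(doc_terms, collection_prob, mu=2000.0):
    """Dirichlet-smoothed document model Θ_D: non-zero probability for unseen terms."""
    tf, n = Counter(doc_terms), len(doc_terms)
    return lambda w: (tf[w] + mu * collection_prob(w)) / (n + mu)

def score(theta_q, theta_d):
    """Rank-equivalent to -KL(Θ_Q || Θ_D): sum over w of Θ_Q(w) · log Θ_D(w)."""
    return sum(p * math.log(theta_d(w)) for w, p in theta_q.items())

# The slide's example, D = "duck duck goose":
#   ML:       θ^D_duck = 2/3, θ^D_goose = 1/3
#   Smoothed: θ^D_duck < 2/3, θ^D_goose < 1/3, and θ^D_w > 0 for every w
```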

Page 10

Verbosity vs. Retrieval Accuracy
TREC Topic 838

Title: "urban suburban coyotes"
Description: "How have humans responded and how should they respond to the appearance of coyotes in urban and suburban areas?"

Page 11

Verbosity vs. Retrieval Accuracy
TREC Topic 838

Title: "urban suburban coyotes" → <urban suburban coyot>

Description: "How have humans responded and how should they respond to the appearance of coyotes in urban and suburban areas?" → <human respond respond appear coyot urban suburban area>

Average Precision example (relevant results at ranks 1, 2, and 5):
AP = (1/1 + 2/2 + 3/5) / 3

Natural Language?
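The slide's AP arithmetic as a small Python helper (dividing by the number of relevant documents retrieved, as the example does):

```python
def average_precision(relevant_at_rank):
    """relevant_at_rank: list of 0/1 judgments in rank order."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevant_at_rank, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Relevant results at ranks 1, 2, and 5, as in the slide:
# average_precision([1, 1, 0, 0, 1]) == (1/1 + 2/2 + 3/5) / 3 ≈ 0.867
```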

Page 12

Verbosity vs. Retrieval Accuracy (2)

RRIA Workshop [Buckley and Harman'04]
▶ 10-40 hours of error analysis per query, 45 Description queries
▶ Models failed to emphasize the right terms for ≈ 2/3 of queries

Collection   Type       # Documents   # Queries
Robust04     Newswire   528,155       250
W10g         Web        1,692,096     100
GOV2         Web        25,205,179    150

Mean Average Precision (MAP): per-query AP averaged across queries
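MAP, as defined above, is simply the mean of the per-query AP values:

```python
def mean_average_precision(per_query_ap):
    """MAP: per-query average precision, averaged across all queries."""
    return sum(per_query_ap) / len(per_query_ap)
```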

Page 13

Problem: Query Ambiguity

ML estimation assumes all query tokens are equally important to Θ_Q!

▶ The core information is often obscured
▶ Details distract rather than inform

<human respond respond appear coyot urban suburban area>

Page 14

Example: Better Estimate of Θ_Q

More important terms should be assigned greater weight in Θ_Q

How to estimate Θ_Q?

Page 15

Outline of Talk

▶ Natural language queries: what, where & why?

▶ Term-based models for NL queries

Problem: query complexity → query ambiguity

▶ Regression Rank [Lease, Allan, and Croft, ECIR’09]

Learning framework independent of retrieval model

▶ Extensions

Modeling term relationships [Lease, SIGIR’09]

Relevance feedback: explicit and pseudo [Lease, TREC’08]

Page 16

Supervised Learning of Θ_Q

▶ Training data: document relevance
- Known relevance: documents manually assessed
- Inferred relevance: query log "click-through" data

▶ Potential benefits
- Data-driven: let examples guide estimation
- Lifetime learning: continually improve with more data
- Expressiveness: keep terms, replace estimation

▶ Challenge: sparsity
- One parameter per vocabulary term [cf. Mei et al.'07]
- Existing Learning To Rank methods don't address this

Page 17

Regression Rank [Lease et al.'09]

▶ Idea: Predict Θ_Q using fewer parameters
- Find features correlated with Θ_Q (term importance)
- Predict Θ_Q from these features

Query      Capitalized?   Is noun?   Log(DF)   Θ_Q
respond    0              0          4.13      0.03
coyot      0              1          3.48      0.30
urban      0              0          3.83      0.11
suburban   0              0          3.73      0.16
Dallas     1              1          3.23      0.40

λ1 · Capitalized? + λ2 · Is noun? + λ3 · Log(DF) = Θ_Q
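A sketch of the idea in the table: each term's Θ_Q weight is predicted from a few features via learned weights λ. The λ values below are made up for illustration, and the final renormalization is an assumption about how raw predictions become a distribution.

```python
import numpy as np

# Hypothetical learned weights for [Capitalized?, Is noun?, Log(DF)]
lam = np.array([0.2, 0.5, 0.05])

# Feature rows for the terms in the table above
F = np.array([
    [0, 0, 4.13],   # respond
    [0, 1, 3.48],   # coyot
    [0, 0, 3.83],   # urban
    [0, 0, 3.73],   # suburban
    [1, 1, 3.23],   # Dallas
])

raw = F @ lam                # λ1·Capitalized + λ2·Is-noun + λ3·Log(DF), per term
theta_q = raw / raw.sum()    # renormalize the predictions into a distribution Θ_Q
```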

Page 18

Estimation, Feature Extraction, Regression

Training:
1. Estimation: given relevant/non-relevant documents, find a strong "gold" Θ_Q for each training query (explicit relevance feedback with massive feedback)
2. Feature Extraction: define features correlated with term importance, F = {f1, f2, f3}
3. Regression Training: learn feature weights Λ = {λ1, λ2, λ3}

Run-time:
Feature Extraction on the input query, then Regression Prediction: Λ · F = Θ_Q
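The three training stages above, stitched together as a sketch; every function argument is a placeholder for the corresponding stage, not an actual API.

```python
def train_regression_rank(training_queries, estimate_gold_theta, extract_features, fit):
    """Stage 1: estimate a 'gold' Θ_Q per training query from relevance judgments.
    Stage 2: extract per-term features F.
    Stage 3: fit a regressor Λ so that Λ·F approximates the gold Θ_Q."""
    X, y = [], []
    for query in training_queries:
        gold = estimate_gold_theta(query)              # stage 1 (see following slides)
        for term in query:
            X.append(extract_features(term, query))    # stage 2: feature vector
            y.append(gold[term])                       # its "gold" weight
    return fit(X, y)                                   # stage 3: learned weights Λ

def predict_theta_q(query, extract_features, model):
    """Run-time: features in, predicted term weights out (Λ · F = Θ_Q)."""
    return {term: model(extract_features(term, query)) for term in query}
```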

Page 19

Regression Rank: Estimation

▶ Goal: optimize Θ_Q for a rank-based metric (e.g. AP)
- Challenge: non-differentiable, non-convex
- Simpler metrics are easier to optimize, but diverge from the goal

▶ Grid search (sampling) [cf. Metzler and Croft'05]
- Embarrassingly parallel
- Exponential # samples

E_AP[Θ_Q] = (1/Z) Σ_s AP(Θ_Q^s) · Θ_Q^s

Alternative: take the argmax over sampled candidates, argmax_{Θ_Q^s} AP(Θ_Q^s)
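A sketch of the sampling step: enumerate candidate Θ_Q vectors on a regular grid over the probability simplex (the step count below is arbitrary), score each by AP, then keep the AP-weighted expectation or the argmax.

```python
import itertools
import numpy as np

def simplex_grid(n_terms, steps):
    """All Θ_Q candidates whose weights are multiples of 1/steps and sum to 1.
    The number of candidates grows exponentially with query length, hence the
    'embarrassingly parallel / exponential # samples' trade-off noted above."""
    for counts in itertools.product(range(steps + 1), repeat=n_terms):
        if sum(counts) == steps:
            yield np.array(counts) / steps

# e.g. a 3-term query with steps=4 yields 15 candidate weightings to score with AP
```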

Page 20

Estimation Example

Query: [human suburban urban]

Sub-query      ML Θ_Q^s     AP(Θ_Q^s)
Q1: human      [1, 0, 0]    0.3859
Q2: suburban   [0, 1, 0]    0.2992
Q3: urban      [0, 0, 1]    0.4897

E_AP[Θ_Q] = (1/Z) Σ_s AP(Θ_Q^s) · Θ_Q^s,  with Z = 0.3859 + 0.2992 + 0.4897 = 1.175

= (0.3859/1.175) Θ_Q^1 + (0.2992/1.175) Θ_Q^2 + (0.4897/1.175) Θ_Q^3

= [0.3285, 0.2547, 0.4168]
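The same computation in a few lines of Python, reproducing the slide's numbers:

```python
import numpy as np

samples = np.array([[1, 0, 0],     # Q1: human
                    [0, 1, 0],     # Q2: suburban
                    [0, 0, 1]])    # Q3: urban
ap = np.array([0.3859, 0.2992, 0.4897])   # AP of each single-term sub-query

Z = ap.sum()                                          # ≈ 1.175
expected_theta_q = (ap[:, None] * samples).sum(axis=0) / Z
# expected_theta_q ≈ [0.3285, 0.2547, 0.4168], the "gold" Θ_Q above
```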

Page 21

Estimation, Feature Extraction, Regression

(Pipeline recap, highlighting step 2)
Feature Extraction: define features correlated with term importance, F = {f1, f2, f3}

Page 22

Regression Rank: Features

▶ Features
- Traditional IR statistics: e.g. term frequency, document frequency
  ▶ source: document collection & large external corpora
- Position: integer index of the term in the query
- Lexical context: preceding/following terms and punctuation
- Syntactic part-of-speech: e.g. is the term a noun / verb / other?

▶ Feature normalization: set mean = 0 & standard deviation = 1

▶ Feature selection: prune features occurring < 12 times
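A minimal sketch of the normalization and selection steps described above (per-feature z-scoring, and pruning features seen fewer than 12 times in training):

```python
import numpy as np

def zscore(F):
    """Per-feature normalization to mean 0 and standard deviation 1."""
    return (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)

def select_features(feature_counts, min_count=12):
    """Keep only features observed at least `min_count` times in the training data."""
    return {name for name, count in feature_counts.items() if count >= min_count}
```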

Page 23

Estimation, Feature Extraction, Regression

(Pipeline recap, highlighting step 3)
1. Estimation: given relevant/non-relevant documents, find a strong "gold" Θ_Q
2. Feature Extraction: define features correlated with term importance
3. Regression: predict Θ_Q given features

Page 24

Regression Rank: Regression

▶ Ridge regression (L2 regularization of least squares)
- Consistently better than ML, Lasso (L1), and others
- Metric divergence (squared loss vs. AP)

Query      Capitalized?   Is noun?   Log(DF)   Θ_Q
respond    0              0          4.13      0.03
coyot      0              1          3.48      0.30
urban      0              0          3.83      0.11
suburban   0              0          3.73      0.16
Dallas     1              1          3.23      0.40

λ1 · Capitalized? + λ2 · Is noun? + λ3 · Log(DF) = Θ_Q
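A sketch of the ridge fit in closed form; F holds per-term feature rows from training queries, y the "gold" Θ_Q weights, and alpha is an assumed regularization strength. Note the trained objective is squared loss, which is exactly the metric divergence from AP mentioned above.

```python
import numpy as np

def ridge_fit(F, y, alpha=1.0):
    """L2-regularized least squares: Λ = (FᵀF + αI)⁻¹ Fᵀ y."""
    d = F.shape[1]
    return np.linalg.solve(F.T @ F + alpha * np.eye(d), F.T @ y)

def ridge_predict(F, lam):
    """Predicted per-term weights Λ · F (renormalize to obtain Θ_Q)."""
    return F @ lam
```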

Page 25

Regression Rank: Strengths

▶ Learning framework is independent of retrieval model
- e.g. predict weights for term interactions rather than terms
- Similar to Probabilistic Indexing [Fuhr and Buckley'91]

▶ Can learn context-dependent term weights
- Model richer context than just query length

▶ Together: query-specific LTR [Geng et al.'08]
- e.g. dynamically weighted mixture model

Page 26

Key Concepts [Bendersky and Croft'08]

▶ Annotate the "key" noun phrase (NP) for each query, train a classifier
▶ Weight NPs by classifier confidence, and mix with the ML Θ_Q

Collection   Type       # Documents   # Queries
Robust04     Newswire   528,155       250
W10g         Web        1,692,096     100
GOV2         Web        25,205,179    150

Page 27

Regression Rank: Results

Collection   Type       # Documents   # Queries   # Dev Queries
Robust04     Newswire   528,155       250         150
W10g         Web        1,692,096     100         -
GOV2         Web        25,205,179    150         -

BLIND
5-fold cross-validation

▶ Fully predicts all parameters (no mixing/tying)
▶ Can optimize model accuracy for any metric
▶ Lifetime learning from query log

Page 28

Example: Predicted Θ_Q

TREC Topic 838
"How have humans responded and how should they respond to the appearance of coyotes in urban and suburban areas?"
<human respond respond appear coyot urban suburban areas>

E_AP[Θ_Q] = (1/Z) Σ_s AP(Θ_Q^s) · Θ_Q^s

Page 29

Room for Further Improvement

▶ The expectation E_AP[Θ_Q] is restricted to the query vocabulary
- Expand the vocabulary: feedback documents
- Model more than terms: e.g. term interactions

Page 30

Outline of Talk

▶ Natural language queries: what, where & why?

▶ Term-based models for NL queries

Problem: query complexity → query ambiguity

▶ Regression Rank [Lease, Allan, and Croft, ECIR’09]

Learning framework independent of retrieval model

▶ Extensions

Modeling term relationships [Lease, SIGIR’09]

Relevance feedback: explicit and pseudo [Lease, TREC’08]

Page 31

Sequential Dependency Model [Metzler and Croft'05]

▶ Simple, efficient, & consistently beats unigram

▶ Consecutive query terms are scored 3 ways
- Individual occurrence: unigram
- Co-occurrence: adjacency (ordered) & proximity

▶ Example
Query: What research is ongoing for new fuel sources?
Document = "fuel source fuel source" (matched via unigram, adjacency, and proximity)
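A sketch of the three kinds of evidence the SD model gathers for one consecutive term pair; the unordered-window size of 8 is a commonly used default, not something stated on the slide.

```python
def sd_evidence(doc_terms, w1, w2, window=8):
    """Counts for the pair (w1, w2): individual occurrences, ordered adjacency,
    and unordered co-occurrence within a small window."""
    unigram = doc_terms.count(w1) + doc_terms.count(w2)
    adjacency = sum(1 for i in range(len(doc_terms) - 1)
                    if doc_terms[i] == w1 and doc_terms[i + 1] == w2)
    pos1 = [i for i, t in enumerate(doc_terms) if t == w1]
    pos2 = [i for i, t in enumerate(doc_terms) if t == w2]
    proximity = sum(1 for i in pos1 for j in pos2 if abs(i - j) < window)
    return unigram, adjacency, proximity

# For the slide's document "fuel source fuel source" and the (stemmed) pair
# ("fuel", "source"): sd_evidence(...) returns (4, 2, 4)
```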

Page 32

Better Estimation of the SD Unigram

▶ Estimate the SD unigram by Regression Rank
- Adjacency and proximity still use ML
- Consistent improvement [Lease, SIGIR'09]

Page 33

Dependency Importance Varies Too

What research is ongoing for new fuel sources?
<research ongoing new fuel sources>
{research, ongoing} {ongoing, new} {new, fuel} {fuel, sources}

Page 34

Filtering Spurious Dependencies

Oracle experiment [Lease, SIGIR'09]
- Rank dependencies by expected weight
- Successively add them in rank order

▶ 3% better MAP using the single best dependency

Page 35

Next: Estimate Dependency Weights

▶ Apply current features like TF/IDF
▶ Add new term-relationship features
- Syntax, collocations, named entities, etc.

Page 36

Outline of Talk

▶ Natural language queries: what, where & why?

▶ Term-based models for NL queries

Problem: query complexity → query ambiguity

▶ Regression Rank [Lease, Allan, and Croft, ECIR’09]

Learning framework independent of retrieval model

▶ Extensions

Modeling term relationships [Lease, SIGIR’09]

Relevance feedback: explicit and pseudo [Lease, TREC’08]

Page 37

Relevance Feedback (Explicit & Pseudo)

▶ Idea: Better estimate Θ_Q using related documents
- Particularly valuable for finding other related terms

▶ Explicit: given examples of relevant documents
- Compute the average Θ_D, mix with the query Θ_Q

▶ Pseudo: blind expansion
- Score documents with Θ_Q
- Compute the expected Θ_D, mix with the query Θ_Q

▶ How can we apply supervised learning here?

[Rocchio'71, Lavrenko and Croft'01, Lafferty and Zhai'01]
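A sketch of the mixing step described above: average the feedback documents' models and interpolate with the query model. The mixing weight alpha is an assumed parameter, not a value from the slides.

```python
def mix_with_feedback(theta_q, feedback_doc_models, alpha=0.5):
    """Θ_Q' = (1 - α)·Θ_Q + α·(average of the feedback-document models Θ_D)."""
    vocab = set(theta_q)
    for d in feedback_doc_models:
        vocab |= set(d)
    n = len(feedback_doc_models)
    avg_d = {w: sum(d.get(w, 0.0) for d in feedback_doc_models) / n for w in vocab}
    return {w: (1 - alpha) * theta_q.get(w, 0.0) + alpha * avg_d[w] for w in vocab}
```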

Page 38

Preliminaries: TREC'08 RF Track

▶ Varied feedback: from none (ad-hoc) to many documents
▶ Approach: RF + PRF + sequential term dependencies
▶ Best results in the track [Lease'08] (GOV2)

Page 39

Step 1: Supervised Θ_Q + PRF

(Results compared without PRF vs. with PRF)

▶ Are supervision and PRF complementary?
▶ Yes, and dependencies too! [Lease, SIGIR'09]

Page 40

Outlook: Supervised RF/PRF

▶ [Cao et al.'08]
- Standard PRF: only 17% of terms help, 26-37% hurt
- Classify terms as good/bad, weight by confidence
- Some details of the approach can be improved

▶ Future work: apply Regression Rank
- Feedback document(s) are just more verbosity
- Apply better learning, more features

Page 41

Summary

▶ Natural language queries: what, where & why?

▶ Term-based models for NL queries

Problem: query complexity → query ambiguity

▶ Regression Rank [Lease, Allan, and Croft, ECIR’09]

Learning framework independent of retrieval model

▶ Extensions

Modeling term relationships [Lease, SIGIR’09]

Relevance feedback: explicit and pseudo [Lease, TREC’08]

Page 42

Brown Laboratory for Linguistic Information Processing (BLLIP)
Brown University
http://bllip.cs.brown.edu

Center for Intelligent Information Retrieval (CIIR)
University of Massachusetts Amherst
http://ciir.cs.umass.edu

Support for this work comes from the National Science Foundation
Partnerships for International Research and Education (PIRE)