Research--Probabilistic Models in Information Retrieval

Date post: 30-May-2018
Upload: andrew-denner
  • 8/14/2019 Research--Probabilistic Models in Information Retrieval

    1/29

    PROBABILISTIC MODELS IN INFORMATION RETRIEVAL

    Norbert Fuhr

    Introduction

    The intrinsic uncertainty of IR.

    Two approaches:

        Relevance models

        Proof-theoretic model

    Relevance models

    A user assigns relevance judgments to documents with respect to his or her query.

    The IR system yields an approximation of the set of relevant documents.

    Some models: the BIR model, the BII model, the DIA model, etc.

    Relevance models

    Binary independence retrieval (BIR) model

    A document d_m is composed of a set of terms and is represented as a vector.

    Assumptions:

        Cluster hypothesis: terms are distributed differently within relevant and non-relevant documents.

        A query q_k is also a set of terms.
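
    A minimal sketch of how the BIR assumptions can be turned into a ranking. It uses the standard log-odds term weight c_i = log(p_i (1 - q_i) / (q_i (1 - p_i))), where p_i and q_i are the probabilities that term i occurs in a relevant and in a non-relevant document. The function names, term names, and probability values are illustrative assumptions and do not come from the slides.

```python
import math

def bir_term_weight(p, q):
    """Log-odds weight of a term; p = P(term | relevant), q = P(term | non-relevant)."""
    return math.log(p * (1 - q) / (q * (1 - p)))

def bir_score(doc_terms, query_terms, p, q):
    """Sum the weights of the query terms that occur in the document."""
    return sum(bir_term_weight(p[t], q[t]) for t in query_terms if t in doc_terms)

# Hypothetical probability estimates for two query terms.
p = {"t1": 0.8, "t2": 0.6}
q = {"t1": 0.3, "t2": 0.4}
query = ["t1", "t2"]

# Four documents covering every combination of the two query terms.
docs = {"d1": {"t1", "t2"}, "d2": {"t1"}, "d3": {"t2"}, "d4": set()}
ranking = sorted(docs, key=lambda d: bir_score(docs[d], query, p, q), reverse=True)
print(ranking)
```

    With these assumed values, documents containing both terms rank first and documents containing neither rank last.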

    Relevance models

    An example

    Ranking is (1,1), (1,0), (0,1), (0,0)

    The probability ranking principle

    Let C be the cost of retrieving a relevant document and C̄ the cost of retrieving a non-relevant document.

    Retrieve the document for which the expected cost of retrieval is a minimum.
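
    The principle can be sketched in a few lines of Python. The expected cost of retrieving a document is taken here as the cost for a relevant document times its probability of relevance, plus the cost for a non-relevant document times the complementary probability. The function names, default costs, and probabilities are assumptions made for illustration.

```python
def expected_cost(p_rel, cost_rel, cost_nonrel):
    """Expected cost of retrieving a document with relevance probability p_rel."""
    return cost_rel * p_rel + cost_nonrel * (1.0 - p_rel)

def rank_by_expected_cost(p_rel_by_doc, cost_rel=0.0, cost_nonrel=1.0):
    """Order documents by increasing expected cost of retrieval."""
    return sorted(
        p_rel_by_doc,
        key=lambda d: expected_cost(p_rel_by_doc[d], cost_rel, cost_nonrel),
    )

# Hypothetical relevance probabilities for three documents.
probs = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
order = rank_by_expected_cost(probs)
print(order)
```

    As long as retrieving a relevant document costs less than retrieving a non-relevant one, this ordering is the same as ranking by decreasing probability of relevance.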

    Proof-theoretic model

    IR is interpreted as uncertain inference.

    A generalization of deductive databases: queries and contents are treated as logical formulas.

    The query has to be proved from the formulas.

    A document is an answer to a query iff the logical formula is true.
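
    A crisp (non-probabilistic) sketch of this idea, under strong simplifying assumptions: each document is the set of atomic terms asserted for it, anything not asserted is treated as false, and a query is a nested tuple built from "and", "or", and "not". The representation and all names are hypothetical. Uncertain inference would replace the true/false result with a probability.

```python
def holds(formula, facts):
    """Evaluate a query formula against the set of atoms asserted for a document."""
    if isinstance(formula, str):
        return formula in facts
    op, *args = formula
    if op == "and":
        return all(holds(a, facts) for a in args)
    if op == "or":
        return any(holds(a, facts) for a in args)
    if op == "not":
        return not holds(args[0], facts)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical document contents.
documents = {
    "d1": {"retrieval", "probabilistic"},
    "d2": {"retrieval", "boolean"},
}
query = ("and", "retrieval", ("not", "boolean"))

# A document is an answer iff the query formula holds for it.
answers = [d for d, facts in documents.items() if holds(query, facts)]
print(answers)
```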

    GOAL-CENTRIC TRACEABILITY FOR MANAGING NON-FUNCTIONAL REQUIREMENTS

    Jane Cleland-Huang, Raffaella Settimi, Oussama BenKhadra, Eugenia Berezhanskaya, Selvia Christina

    Non-functional requirements (NFRs) are difficult to trace:

        Global impact upon a software system

        Extensive network of interdependencies and trade-offs

    Goal-centric traceability (GCT) approach:

        NFRs are modeled as goals and operationalizations within a Softgoal Interdependency Graph (SIG).

        Traces are established dynamically from impacted functional design elements to elements in the SIG.

    Softgoal Interdependency Graph

    GCT Model

    Impact detection in GCT

    Documents

    Queries

    Index terms

    The relevance of a document d_j to a query q is pr(d_j, q)
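
    The slide does not give the formula for this probability. As one plausible sketch, assuming a term-frequency estimate over shared index terms, a document's share of each query term is weighted by how rare that term is across the documents and then normalized by the total query weight. The function name, the estimate itself, and the sample corpus are assumptions of this example.

```python
from collections import Counter

def relevance(doc_terms, query_terms, corpus):
    """Assumed estimate of relevance: term overlap weighted by frequency and rarity."""
    d = Counter(doc_terms)
    qc = Counter(query_terms)
    # n[t]: number of documents in the corpus that contain term t
    n = Counter(t for terms in corpus.values() for t in set(terms))
    # Weight of each query term, smaller for terms found in many documents.
    pr_q_t = {t: qc[t] / n[t] for t in qc if n[t] > 0}
    pr_q = sum(pr_q_t.values())
    if pr_q == 0 or not d:
        return 0.0
    total = sum(d.values())
    return sum((d[t] / total) * w for t, w in pr_q_t.items()) / pr_q

# Hypothetical corpus of three documents and one query.
corpus = {
    "d1": ["deice", "schedule", "truck"],
    "d2": ["truck", "maintenance", "schedule"],
    "d3": ["weather", "report"],
}
query = ["deice", "schedule"]
scores = {name: relevance(terms, query, corpus) for name, terms in corpus.items()}
print(scores)
```

    A candidate link would then be proposed whenever the score exceeds a chosen threshold.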

    UTILIZING SUPPORTING EVIDENCE TO IMPROVE DYNAMIC REQUIREMENTS TRACEABILITY

    Jane Cleland-Huang, Raffaella Settimi, Chuan Duan,

    Xuchang Zou

    Introduction

    Current work:

        Recall level close to 90%.

        Precision from 10% to 45%.

    Target:

        Maintain a recall level of at least 90%.

        Precision of at least 20%.
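
    For reference, recall is the fraction of the correct links that were retrieved, and precision is the fraction of the retrieved links that are correct. A small sketch with made-up link sets, sized so that the result sits exactly at the stated targets:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved links."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: 10 correct links; 45 candidates retrieved, 9 of them correct.
relevant = {("correct", i) for i in range(10)}
retrieved = {("correct", i) for i in range(9)} | {("wrong", i) for i in range(36)}
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```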

    Introduction

    Three strategies to improve the performance of dynamic requirements traceability:

    1. Hierarchical modeling

    2. Logical clustering of artifacts

    3. Semi-automated pruning of the probabilistic network

    Enhancement strategies

    Motivation Example

    Hierarchical

    The label of R3 is "De-icing".

    Using the hierarchical information in R3, we can tell that R5 describes the de-icing service.

    Similarly, C4 describes the truck maintenance service.

    The link between C4 and R5 is not correct!

    Hierarchical

    Solution:

        Build a directed acyclic graph (DAG) to represent the direct relationships between artifacts.
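
    A minimal sketch of how such a DAG could supply context, assuming each artifact inherits the terms of its ancestors. R3, R5, and C4 come from the motivating example. The parent C2, the term sets, and the function names are hypothetical.

```python
def ancestors(node, parents):
    """Return all ancestors of node in a DAG given as child -> list of parents."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def terms_with_context(node, parents, terms):
    """Own terms of an artifact plus the terms inherited from its ancestors."""
    result = set(terms.get(node, ()))
    for a in ancestors(node, parents):
        result |= set(terms.get(a, ()))
    return result

parents = {"R5": ["R3"], "C4": ["C2"]}  # C2 is a hypothetical parent of C4
terms = {
    "R3": {"de-icing"},
    "R5": {"schedule"},
    "C2": {"truck", "maintenance"},
    "C4": {"schedule"},
}
r5_terms = terms_with_context("R5", parents, terms)
c4_terms = terms_with_context("C4", parents, terms)
print(r5_terms, c4_terms)
```

    On their own terms R5 and C4 look identical. With inherited context they share only "schedule", which gives a retrieval algorithm grounds to lower the probability of the incorrect link.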

    Results

    Clustering

    Links tend to occur in clusters:

        q linked to d_j => higher prob. that q is linked to the siblings of d_j

        d linked to q_i => higher prob. that d is linked to the siblings of q_i

    Care about the relationships of sibling artifacts.
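
    One way to picture the clustering idea, as a sketch only: blend each document's score for a query with the mean score of its sibling documents, so that a document whose sibling is strongly linked gets a boost. The linear blend, the weight of 0.3, and all names and values are assumptions of this example. The slides do not state the actual formula.

```python
def cluster_adjusted(scores, siblings, weight=0.3):
    """Blend each document's score with the mean score of its siblings.

    The linear blend and the default weight are assumptions of this sketch.
    """
    adjusted = {}
    for doc, score in scores.items():
        sib = [scores[s] for s in siblings.get(doc, []) if s in scores]
        if sib:
            adjusted[doc] = (1 - weight) * score + weight * sum(sib) / len(sib)
        else:
            adjusted[doc] = score
    return adjusted

# Hypothetical scores of three documents for one query; d1 and d2 are siblings.
scores = {"d1": 0.8, "d2": 0.2, "d3": 0.2}
siblings = {"d1": ["d2"], "d2": ["d1"]}
adjusted = cluster_adjusted(scores, siblings)
print(adjusted)
```

    Here d2 and d3 start with the same score, but d2 ends up higher because its sibling d1 is strongly linked to the query.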

    Clustering

    Solution

    Clustering

    Evaluation

    Graph Pruning Enhancement

    Observation:

        The word "schedule" is used for both de-icing schedules and truck maintenance schedules.

        A query containing "schedule" will return artifacts from both domains, which lowers precision.

    Graph Pruning Enhancement

    Solution:

        Use the initial decisions made by the analyst to place constraints and improve precision in problematic areas.

    Rules for placing constraints:

    1. One or more links between two groups are all rejected by the analyst.

    2. The basic retrieval algorithm generated candidate links between the two groups.
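
    The two rules can be sketched as follows, assuming candidate links are (source, target) pairs, each artifact belongs to one group, and analyst decisions are recorded as accepted (True) or rejected (False). The data layout, function names, and sample artifacts are hypothetical.

```python
def constrained_group_pairs(candidate_links, group_of, decisions):
    """Find group pairs where every link reviewed by the analyst was rejected."""
    reviewed = {}
    for link in candidate_links:
        if link in decisions:
            pair = (group_of[link[0]], group_of[link[1]])
            reviewed.setdefault(pair, []).append(decisions[link])
    return {pair for pair, verdicts in reviewed.items() if not any(verdicts)}

def prune(candidate_links, group_of, decisions):
    """Keep accepted links and unreviewed links outside constrained group pairs."""
    blocked = constrained_group_pairs(candidate_links, group_of, decisions)
    kept = []
    for link in candidate_links:
        if link in decisions:
            if decisions[link]:
                kept.append(link)
        elif (group_of[link[0]], group_of[link[1]]) not in blocked:
            kept.append(link)
    return kept

# Hypothetical artifacts: two truck queries and three requirements.
group_of = {"q1": "truck", "q2": "truck", "r1": "deicing", "r2": "deicing", "r3": "truck"}
candidate_links = [("q1", "r1"), ("q1", "r2"), ("q2", "r1"), ("q2", "r3")]
decisions = {("q1", "r1"): False}  # the analyst rejected one truck/de-icing link

remaining = prune(candidate_links, group_of, decisions)
print(remaining)
```

    A single rejection between the truck and de-icing groups removes the other candidate links between those groups, while the link inside the truck group survives.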

    Graph Pruning Enhancement

    Evaluation

