
Wong Cheuk Fun Presentation on Keyword Search. Head, Modifier, and Constraint Detection in Short...

Date post: 01-Jan-2016

Presentation on Keyword Search

Wong Cheuk Fun
Presentation on Keyword Search

Head, Modifier, and Constraint Detection in Short Texts
Zhongyuan Wang, Haixun Wang, Zhirui Hu

Example query: Popular iphone 5s smart cover
Labeled parts: Modifiers, Constraint, Head

90% of distinct queries consist of 2 or more components

Detection Challenges
No grammar rules
Popular iphone 5s smart cover vs Popular smart cover iphone 5s
Require external knowledge
Job search vs Job interview
Instance-level head-modifier knowledge
Conceptual knowledge
Concept-level head-modifier knowledge

Detection Approach
(concept[head], concept[modifier], score)

e.g. (accessory[head], device[modifier], 0.9)

Three major challenges:
Knowledge coverage to handle all possible input
Avoid deriving conflicting patterns
Distinguish constraints from non-constraint modifiers

Mining Concept Patterns -- Probase

IsA taxonomy
Entities vs concepts: (Barack Obama) vs USA president
2.7 million concepts
P(e|c) tells how popular e is as far as concept c is concerned, and vice versa.
e.g. P(Fujitsu|Computer) > P(Acer|Computer)
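As a minimal sketch of how such typicality scores could be computed, the snippet below assumes the usual relative-frequency estimate over co-occurrence counts of an entity and a concept. The function name `typicality` and all counts are hypothetical and only illustrate the arithmetic; they are not taken from Probase.

```python
from collections import defaultdict


def typicality(counts):
    """Estimate P(e|c) and P(c|e) from co-occurrence counts keyed by (entity, concept).

    Assumption: both scores are plain relative frequencies of the counts.
    """
    concept_totals = defaultdict(int)
    entity_totals = defaultdict(int)
    for (entity, concept), n in counts.items():
        concept_totals[concept] += n
        entity_totals[entity] += n
    p_e_given_c = {(e, c): n / concept_totals[c] for (e, c), n in counts.items()}
    p_c_given_e = {(e, c): n / entity_totals[e] for (e, c), n in counts.items()}
    return p_e_given_c, p_c_given_e


# Hypothetical counts, for illustration only.
counts = {
    ("apple", "company"): 60,
    ("apple", "fruit"): 40,
    ("microsoft", "company"): 140,
    ("banana", "fruit"): 60,
}
p_e_given_c, p_c_given_e = typicality(counts)
print(p_e_given_c[("apple", "company")])  # share of "company" mentions that are apple
print(p_c_given_e[("apple", "company")])  # share of apple mentions that are "company"
```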

n(e, c) denotes how often e and c occur together

Mining Concept Patterns -- Instance-level Head-Modifiers
Identify the head and modifiers regardless of their order
smart cover for iphone 5s
Other prepositions: of, with, in, on, at
When they are used (A for B, A of B, A with B), it is almost always true that A is the head and B is the constraint.
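The preposition heuristic can be sketched as a small pattern matcher. This is an illustrative assumption about how "A for B" queries might be split, not the authors' implementation; `extract_pair` is a hypothetical helper, and it splits on the first preposition it finds.

```python
import re

# Prepositions listed on the slide.
PREPOSITIONS = ("for", "of", "with", "in", "on", "at")

# Lazy match so the split happens at the first preposition in the query.
_PATTERN = re.compile(
    r"^(?P<head>.+?)\s+(?:%s)\s+(?P<constraint>.+)$" % "|".join(PREPOSITIONS)
)


def extract_pair(query):
    """Return (head, constraint) for queries shaped like 'A for B', else None."""
    match = _PATTERN.match(query.strip().lower())
    if match is None:
        return None
    return match.group("head"), match.group("constraint")


print(extract_pair("smart cover for iphone 5s"))
print(extract_pair("iphone 5s smart cover"))
```

Queries without a preposition return `None`; those are the cases where the concept-level patterns are needed.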

Mining Concept Patterns -- Concept-level Head-Modifiers
Levels of conceptualization: (head, modifier, score)
(smart cover, iphone 5s) is too specific; (obj, obj) is too general
Conflicting rules: (company, device) vs (device, company)

Conceptualizing instances
(1) Map e to c if P(c|e) is among the top k;
(2) Map e to c if P(e|c) is among the top k;
(3) Map e to c if P(c|e)*P(e|c) is among the top k;
(4) Map e to itself if e is itself a concept.
The first two are not desirable, as they are either too general or too specific.
For (3), a larger value is evidence of closeness between c and e.
For (4), we use entropy to identify popular instances:
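A minimal sketch of strategy (3), ranking candidate concepts by the product P(c|e)*P(e|c) and keeping the top k. The probabilities and the helper name `conceptualize` are hypothetical; strategy (4) and the entropy measure are not covered here.

```python
def conceptualize(entity, p_c_given_e, p_e_given_c, k=2):
    """Return the top-k concepts for entity, ranked by P(c|e) * P(e|c).

    Both probability tables are dicts keyed by (entity, concept).
    """
    scored = []
    for (e, concept), p_ce in p_c_given_e.items():
        if e == entity:
            score = p_ce * p_e_given_c.get((e, concept), 0.0)
            scored.append((score, concept))
    scored.sort(reverse=True)
    return [concept for _, concept in scored[:k]]


# Hypothetical probabilities, for illustration only.
p_c_given_e = {
    ("iphone 5s", "device"): 0.5,
    ("iphone 5s", "product"): 0.4,
    ("iphone 5s", "object"): 0.1,
}
p_e_given_c = {
    ("iphone 5s", "device"): 0.02,
    ("iphone 5s", "product"): 0.005,
    ("iphone 5s", "object"): 0.0001,
}
top_concepts = conceptualize("iphone 5s", p_c_given_e, p_e_given_c, k=2)
print(top_concepts)
```

With these numbers the very general concept "object" falls outside the top 2, because its P(e|c) is tiny even though it is a valid concept for the entity.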

Mining Concept Patterns -- Conceptualizing Pairs
The term apple conceptualizes to fruit or company
CEO for apple: (CEO, fruit), (CEO, company)
Obviously, (CEO, fruit) is wrong.

Wrong concept pairs introduced this way are filtered out because of their low scores
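One way to see why wrong pairs end up with low scores is to aggregate evidence over many instance pairs. The sketch below assumes a simple scheme: each instance pair contributes the product of its concept weights, and scores are normalized to sum to 1. The slides do not give the actual scoring formula, so the function `mine_concept_patterns`, the weights, and the `min_score` cutoff are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def mine_concept_patterns(instance_pairs, concepts_of, min_score=0.2):
    """Aggregate (head, modifier) instance pairs into scored concept pairs.

    concepts_of maps a term to {concept: weight}. Pairs scoring below
    min_score after normalization are dropped.
    """
    totals = defaultdict(float)
    for head, modifier in instance_pairs:
        head_concepts = concepts_of[head].items()
        modifier_concepts = concepts_of[modifier].items()
        for (hc, hw), (mc, mw) in product(head_concepts, modifier_concepts):
            totals[(hc, mc)] += hw * mw
    norm = sum(totals.values())
    return {pair: s / norm for pair, s in totals.items() if s / norm >= min_score}


# Hypothetical conceptualizations and instance pairs.
concepts_of = {
    "CEO": {"CEO": 1.0},
    "apple": {"company": 0.6, "fruit": 0.4},
    "microsoft": {"company": 1.0},
    "google": {"company": 1.0},
}
instance_pairs = [("CEO", "apple"), ("CEO", "microsoft"), ("CEO", "google")]
patterns = mine_concept_patterns(instance_pairs, concepts_of)
print(patterns)
```

Only "apple" supports (CEO, fruit), while every pair supports (CEO, company), so the wrong pair falls below the cutoff.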

Head and Modifier Detection -- Parsing
1. Texts are parsed using Probase
* New York and New York Times

2. Remove non-constraint modifiers

3. Cluster terms
Cluster short texts that have more than one head (e.g. apple ipad microsoft surface)
Reduce the number of pairs for conceptualization
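The parsing step has to decide between overlapping terms such as New York and New York Times. The slides do not say how this is resolved; the sketch below assumes a greedy longest-match against a term vocabulary, which is one common choice. The function `segment` and the vocabulary are hypothetical.

```python
def segment(text, vocabulary, max_len=4):
    """Split text into terms, preferring the longest vocabulary match at each position.

    Assumption: greedy longest-match; unknown single tokens are kept as-is.
    """
    tokens = text.lower().split()
    terms = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, falling back to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in vocabulary:
                terms.append(candidate)
                i += n
                break
    return terms


# Hypothetical vocabulary standing in for the knowledge base's term list.
vocabulary = {"new york", "new york times", "subscription"}
print(segment("New York Times subscription", vocabulary))
print(segment("new york hotel", vocabulary))
```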

Head and Modifier Detection for 2 components
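For two components, detection can be sketched as comparing the pattern score of each possible ordering. This is an illustration under assumptions, not the authors' algorithm: `detect_head` is a hypothetical helper, the pattern table reuses the (accessory, device, 0.9) example, and the reverse score of 0.1 is made up.

```python
def detect_head(term_a, term_b, concepts_of, patterns):
    """Return (head, modifier) for a two-term query, or None if undecided.

    patterns maps (head_concept, modifier_concept) to a score;
    concepts_of maps a term to a list of its concepts.
    """
    def score(head, modifier):
        # Best-scoring concept pattern that supports this ordering.
        return max(
            (
                patterns.get((hc, mc), 0.0)
                for hc in concepts_of.get(head, [])
                for mc in concepts_of.get(modifier, [])
            ),
            default=0.0,
        )

    a_as_head = score(term_a, term_b)
    b_as_head = score(term_b, term_a)
    if a_as_head == b_as_head:
        return None
    return (term_a, term_b) if a_as_head > b_as_head else (term_b, term_a)


patterns = {("accessory", "device"): 0.9, ("device", "accessory"): 0.1}
concepts_of = {"smart cover": ["accessory"], "iphone 5s": ["device"]}
print(detect_head("iphone 5s", "smart cover", concepts_of, patterns))
print(detect_head("smart cover", "iphone 5s", concepts_of, patterns))
```

Both word orders give the same answer, which is the point of relying on concept patterns rather than position.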

Head and Modifier Detection for > 2 components

Modifiers can thus be ranked by their closeness to the head.
For the query college football player, we remove the likely weakest edge, college player.
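A small sketch of the two operations mentioned here, assuming pairwise closeness scores between components are already available. How those scores are computed is not given in the slides; the numbers, the undirected-edge representation, and the helper names are hypothetical.

```python
def rank_modifiers(head, modifiers, closeness):
    """Order modifiers from closest to farthest from the head."""
    return sorted(
        modifiers,
        key=lambda m: closeness[frozenset((m, head))],
        reverse=True,
    )


def drop_weakest_edge(closeness):
    """Return (remaining_edges, weakest_edge) after removing the lowest-scoring edge."""
    weakest = min(closeness, key=closeness.get)
    remaining = {edge: s for edge, s in closeness.items() if edge != weakest}
    return remaining, weakest


# Hypothetical closeness scores for "college football player".
closeness = {
    frozenset(("college", "football")): 0.7,
    frozenset(("football", "player")): 0.9,
    frozenset(("college", "player")): 0.3,
}
ranked = rank_modifiers("player", ["college", "football"], closeness)
remaining, weakest = drop_weakest_edge(closeness)
print(ranked)
print(sorted(weakest))
```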

Mining non-constraint modifiers
Top query Seattle, good travelling hostel
Non-constraint modifiers: Top, good

Non-constraint modifiers are more likely to appear on the left of the query
e.g. cheap red shoe instead of red cheap shoe

Mining non-constraint modifiers using Probase
2.7 million concepts

Mining non-constraint modifiers -- mining process
Construct modifier networks based on observations
Calculate the score of each node as a non-constraint modifier in the networks

A lower PMS marks a node as a non-constraint modifier

Framework for head, modifier and constraint detection

On Masking Topical Intent in Keyword Search
Peng Wang and Chinya V. Ravishankar

Keyword-Based Obfuscation
Hide the real query in a mass of dummy queries generated using a Dummy Query Generation Algorithm (DGA).
Advantage: Purely client-based

Disadvantage: Not secure, cannot ensure real and dummy queries are indistinguishable

Topical Intent Obfuscation
For a real user query q, dummy queries are created matching other topics.
* Topic relevance ensures obfuscation
Two thresholds are used, one smaller than the other. For topic t and query q:
Pr[t]: t's relevance based on general interest patterns
Pr[t|q]: t's relevance after taking q into account
If Pr[t|q] - Pr[t] exceeds a threshold, t is relevant to q.
Aim: keep Pr[t|q] - Pr[t] below a threshold to create irrelevant dummy queries
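The relevance test can be sketched as a comparison of the shift Pr[t|q] - Pr[t] against two thresholds. The slide does not preserve which threshold applies to which condition, so the sketch assumes the larger one marks relevance and the smaller one marks irrelevance. The parameter names `lower` and `upper`, the function `classify_topic`, and all numbers are hypothetical.

```python
def classify_topic(prior, posterior, lower, upper):
    """Classify topic t for query q from prior Pr[t] and posterior Pr[t|q].

    Assumption: shift > upper means relevant, shift < lower means irrelevant,
    anything in between is left undetermined.
    """
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    shift = posterior - prior
    if shift > upper:
        return "relevant"
    if shift < lower:
        return "irrelevant"
    return "undetermined"


# Hypothetical probabilities and thresholds.
print(classify_topic(prior=0.05, posterior=0.40, lower=0.01, upper=0.10))
print(classify_topic(prior=0.05, posterior=0.055, lower=0.01, upper=0.10))
```

Under this reading, a dummy query is only useful for masking if every topic relevant to the real query comes out as "irrelevant" for the dummy.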

