Www2013 acronym-mining

Page 1: Www2013 acronym-mining

Mining Acronym Expansions and Their Meanings Using Query Click Log

05/02/2023 WWW 2013

Bilyana Taneva, Tao Cheng, Kaushik Chakrabarti, Yeye He

DMX Group, Microsoft Research

Page 2: Www2013 acronym-mining

The Popularity of Acronyms

Acronym: an abbreviation formed from the initial components of words or phrases

E.g., CMU, MIT, RISC, MBA, …

Acronyms are very commonly used in web search, tweets, text messages, …

Even more common on mobile devices

Page 3: Www2013 acronym-mining

Acronym Characteristics

Ambiguous: one acronym can have many different meanings

E.g., CMU can refer to “Central Michigan University”, “Carnegie Mellon University”, “Central Methodist University”, and many other meanings

Disambiguated by context: the meaning is often clear when context is available

E.g., “cmu football” -> “Central Michigan University”

“cmu computer science” -> “Carnegie Mellon University”

Page 4: Www2013 acronym-mining

Application Scenario

Web Search

Acronym Queries: suggest the different meanings of the input acronym, or expand to the most likely intended meaning

Acronym + Context Queries: infer the most likely intended meaning given the context and then perform query alteration, e.g., “cmu football” -> “central michigan university football”

Page 5: Www2013 acronym-mining

Problem Statement

Input: an acronym

Output: the various different meanings of the acronym; each meaning is represented by its canonical expansion, a popularity score and a set of associated context words

Input: CMU

Meaning                        Popularity  Context Words
central michigan university    0.615       michigan, athletics, football, …
carnegie mellon university     0.312       pittsburgh, library, computer, …
concrete masonry unit          0.045       block, concrete, cement, …
central methodist university   0.017       fayette, central, missouri, …
canton municipal utilities     0.004       court, docket, case, …
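The output described above can be pictured as a small record per meaning. The following is only a sketch: the class name `AcronymMeaning` and its field names are hypothetical, and the values are copied from the CMU example on this slide.

```python
from dataclasses import dataclass, field


@dataclass
class AcronymMeaning:
    """One mined meaning of an acronym (hypothetical structure for illustration)."""
    canonical_expansion: str
    popularity: float
    context_words: list[str] = field(default_factory=list)


# Values taken from the CMU example on the slide.
cmu_meanings = [
    AcronymMeaning("central michigan university", 0.615, ["michigan", "athletics", "football"]),
    AcronymMeaning("carnegie mellon university", 0.312, ["pittsburgh", "library", "computer"]),
    AcronymMeaning("concrete masonry unit", 0.045, ["block", "concrete", "cement"]),
]

# The most popular meaning is the natural default expansion for a bare "cmu" query.
most_popular = max(cmu_meanings, key=lambda m: m.popularity)
```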

Page 6: Www2013 acronym-mining

Insight: Exploiting Query Co-click

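[Figure: bipartite query-document click graph. Queries: “cmu”, “central mich univ”, “cmu football”, “central michigan university”, “carnegie mellon university”, “cs carnegie mellon”; clicked documents: d1, d2, d3, d4.]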
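The co-click idea can be sketched as follows: two queries are co-clicked when users clicked at least one common document for both. The click log below is hypothetical (the edges of the slide's graph are not recoverable from the transcript), and `co_clicked_queries` is an illustrative helper name.

```python
from collections import defaultdict

# Hypothetical (query, clicked document) pairs; not the data from the slide.
CLICK_LOG = [
    ("cmu", "d1"), ("cmu", "d2"), ("cmu", "d3"),
    ("central mich univ", "d1"),
    ("cmu football", "d2"),
    ("central michigan university", "d1"), ("central michigan university", "d2"),
    ("carnegie mellon university", "d3"), ("carnegie mellon university", "d4"),
    ("cs carnegie mellon", "d4"),
]


def co_clicked_queries(click_log, query):
    """Return all other queries that share at least one clicked document with `query`."""
    docs_by_query = defaultdict(set)
    queries_by_doc = defaultdict(set)
    for q, doc in click_log:
        docs_by_query[q].add(doc)
        queries_by_doc[doc].add(q)

    related = set()
    for doc in docs_by_query[query]:
        related |= queries_by_doc[doc]
    related.discard(query)
    return related


candidates = co_clicked_queries(CLICK_LOG, "cmu")
```

In this toy log, “cs carnegie mellon” is not co-clicked with “cmu” because its only document, d4, was never clicked for the acronym query.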

Page 7: Www2013 acronym-mining

Technical Challenges

Identify co-clicked queries that are expansions

Mined expansions are often noisy, containing variants for the same meaning

Handle tail meanings

Identify context words for each meaning

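[Figure: the query-document click graph, repeated. Queries: “cmu”, “central mich univ”, “cmu football”, “central michigan university”, “carnegie mellon university”, “cs carnegie mellon”; clicked documents: d1, d2, d3, d4.]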

Page 8: Www2013 acronym-mining

Mining Steps

Input: CMU

1. Expansion Identification
2. Expansion Clustering
3. Canonical Expansion Identification
4. Popularity Mining
5. Context Mining

Example mined expansions: central mich univ, caneigie mellon univ, central mi university

Example output (canonical expansion, popularity, context words):

central michigan university    0.615    michigan, athletics, football, …
carnegie mellon university     0.312    pittsburgh, library, computer, …
concrete masonry unit          0.045    block, concrete, cement, …

Page 9: Www2013 acronym-mining

Acronym Candidate Expansion Identification

Rely on an acronym-expansion checking function

Not a trivial task, e.g., “Hypertext Transfer Protocol” for “HTTP”, “Master of Business Administration” for “MBA”

[Figure: click graph with candidate expansions. Queries: “cmu”, “central mich univ”, “central michigan university”, “carnegie mellon university”; clicked documents: d1, d2, d3, d4.]
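The slides do not spell out the checking function, so the following is only one plausible sketch of such a check. It assumes that every non-stopword must contribute its first letter, that a word may contribute further interior letters (as in “Hypertext” giving “HT”), and that stopwords such as “of” may be skipped. The function name and the stopword list are hypothetical.

```python
STOPWORDS = frozenset({"of", "the", "and", "for", "in"})  # assumed, not from the slides


def is_candidate_expansion(acronym, expansion, stopwords=STOPWORDS):
    """Heuristic check that `expansion` could spell out `acronym`."""
    letters = acronym.lower()
    words = expansion.lower().split()

    def match_from(i, w):
        # Can letters[i:] be produced by words[w:]?
        if w == len(words):
            return i == len(letters)
        word = words[w]
        if word in stopwords and match_from(i, w + 1):
            return True  # stopword skipped
        if i < len(letters) and word[0] == letters[i]:
            return extend(i + 1, w, 1)  # word contributes its initial
        return False

    def extend(i, w, pos):
        # Either move on to the next word, or take another letter from words[w].
        if match_from(i, w + 1):
            return True
        word = words[w]
        if i < len(letters):
            for p in range(pos, len(word)):
                if word[p] == letters[i] and extend(i + 1, w, p + 1):
                    return True
        return False

    return bool(words) and match_from(0, 0)
```

Under these assumptions the check accepts abbreviated variants such as “central mich univ” for “CMU”, while rejecting co-clicked queries like “cmu football” in which a word contributes no letter.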

Page 10: Www2013 acronym-mining

Mining Steps

[Figure: the mining steps pipeline, repeated from Page 8.]

Page 11: Www2013 acronym-mining

Acronym Expansion Clustering

Edit distance is inadequate, e.g., “central michigan university” and “central mich univ”

Insight: leverage clicked documents

Each document typically corresponds to a single meaning

Expansions of the same meaning click on the same set of documents, and expansions of different meanings click on different documents

Clicked-document-based distance:

Set distance (Jaccard distance)

Distributional distance (Jensen-Shannon divergence)
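Both clicked-document distances can be sketched in a few lines. The function names and the input shapes (a set of document ids for Jaccard, a dict of document id to click count for Jensen-Shannon) are assumptions for illustration; the base-2 logarithm is chosen so that the divergence lies in [0, 1].

```python
import math


def jaccard_distance(docs_a, docs_b):
    """Set distance: 1 - |A ∩ B| / |A ∪ B| over clicked document ids."""
    union = docs_a | docs_b
    if not union:
        return 0.0
    return 1.0 - len(docs_a & docs_b) / len(union)


def js_divergence(clicks_a, clicks_b):
    """Jensen-Shannon divergence between two click distributions.

    Each argument maps document id -> click count and is assumed non-empty.
    """
    total_a = sum(clicks_a.values())
    total_b = sum(clicks_b.values())
    divergence = 0.0
    for doc in set(clicks_a) | set(clicks_b):
        p = clicks_a.get(doc, 0) / total_a
        q = clicks_b.get(doc, 0) / total_b
        m = (p + q) / 2  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence
```

Unlike the set distance, the distributional distance takes click frequency into account, so a rarely clicked stray document changes the result only slightly.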

Page 12: Www2013 acronym-mining

Mining Steps

[Figure: the mining steps pipeline, repeated from Page 8.]

Page 13: Www2013 acronym-mining

Identifying Canonical Expansion

The probability that a click of the acronym query on a document is intended for an expansion

The probability that the acronym query is intended for an expansion

For each meaning group, the canonical expansion is the one with the highest probability
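The formulas for these probabilities did not survive in the transcript. The sketch below shows one plausible estimator, stated purely as an assumption: a click of the acronym query on a document is attributed to each expansion in proportion to that expansion's own clicks on the document, and the query-level probability sums this over the acronym's click distribution. All function and parameter names are hypothetical.

```python
def expansion_probabilities(acronym_clicks, expansion_clicks):
    """Estimate, for each expansion, how likely the acronym query is intended for it.

    acronym_clicks:   {doc: clicks of the acronym query on doc}
    expansion_clicks: {expansion: {doc: clicks of the expansion query on doc}}
    Assumed estimator: sum over docs of P(doc | acronym) * P(expansion | doc).
    """
    total_acronym_clicks = sum(acronym_clicks.values())

    # Total expansion clicks per document, used to split each document's share.
    clicks_per_doc = {}
    for clicks in expansion_clicks.values():
        for doc, count in clicks.items():
            clicks_per_doc[doc] = clicks_per_doc.get(doc, 0) + count

    probabilities = {}
    for expansion, clicks in expansion_clicks.items():
        p = 0.0
        for doc, count in acronym_clicks.items():
            if clicks_per_doc.get(doc, 0) > 0:
                p_doc = count / total_acronym_clicks
                p_expansion_given_doc = clicks.get(doc, 0) / clicks_per_doc[doc]
                p += p_doc * p_expansion_given_doc
        probabilities[expansion] = p
    return probabilities


def canonical_expansion(group, probabilities):
    """Pick the expansion with the highest probability within one meaning group."""
    return max(group, key=lambda e: probabilities[e])


# Hypothetical click counts for illustration only.
probs = expansion_probabilities(
    {"d1": 60, "d3": 40},
    {
        "central michigan university": {"d1": 30},
        "central mich univ": {"d1": 10},
        "carnegie mellon university": {"d3": 20},
    },
)
best = canonical_expansion(["central michigan university", "central mich univ"], probs)
```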

Page 14: Www2013 acronym-mining

Mining Steps

[Figure: the mining steps pipeline, repeated from Page 8.]

Page 15: Www2013 acronym-mining

Measure Meaning Popularity

Recall that we mined the probability of each expansion when identifying the canonical expansion

The popularity of a meaning of an acronym is the aggregated probability of all the expansions in its group
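As a minimal sketch of this aggregation, assuming the per-expansion probabilities are already available and that "aggregated" means a plain sum: the group names, the probabilities, and the function name below are hypothetical.

```python
def meaning_popularity(groups, expansion_probability):
    """Sum the probabilities of all expansions in each meaning group."""
    return {
        meaning: sum(expansion_probability[e] for e in expansions)
        for meaning, expansions in groups.items()
    }


# Hypothetical per-expansion probabilities for the acronym "cmu".
expansion_probability = {
    "central michigan university": 0.45,
    "central mich univ": 0.15,
    "carnegie mellon university": 0.40,
}
groups = {
    "central michigan university": ["central michigan university", "central mich univ"],
    "carnegie mellon university": ["carnegie mellon university"],
}
popularity = meaning_popularity(groups, expansion_probability)
```

Summing over the group means that abbreviated or misspelled variants still count toward the popularity of the meaning they belong to.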

Page 16: Www2013 acronym-mining

Mining Steps

[Figure: the mining steps pipeline, repeated from Page 8.]

Page 17: Www2013 acronym-mining

Compute Context Words for Each Meaning

Consider the set of documents clicked by the expansions in a meaning group; all the words from the queries that clicked on these documents are treated as context words for that meaning group

The probability of a word given a meaning is derived from the aggregated frequency of the word w in the group
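The slide's formula is not preserved, so the sketch below assumes the simplest reading: the probability of a word given a meaning is the word's aggregated frequency in the group divided by the total frequency of all words in the group. Weighting each query by its click count and excluding the acronym token itself are additional assumptions; the function name is hypothetical.

```python
from collections import Counter


def context_word_probabilities(queries_with_clicks, exclude=()):
    """Estimate P(word | meaning) from the queries that clicked the group's documents.

    queries_with_clicks: iterable of (query, click count on the group's documents)
    exclude:             tokens to ignore, e.g. the acronym itself (assumed choice)
    """
    counts = Counter()
    for query, clicks in queries_with_clicks:
        for word in query.lower().split():
            if word not in exclude:
                counts[word] += clicks
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Hypothetical queries that clicked documents of the "central michigan university" group.
word_probs = context_word_probabilities(
    [("cmu football", 3), ("central michigan university athletics", 1)],
    exclude={"cmu"},
)
```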

Page 18: Www2013 acronym-mining

Enhancement for Tail Meanings

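[Figure: query-document click graph for “mit”. Queries: “mit”, “mass institute of tech”, “mit boston”, “massachusetts institute of technology”, “mit pune”, “mit ujjain”, “maharashtra institute of technology pune”, “mahakal institute of technology ujjain”, “mahakal institute of technology”; clicked documents: d1, d2, d3, d4.]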

Page 19: Www2013 acronym-mining

Expansion Identification (Enhanced)

Consider acronym supersequence queries

E.g., “mit pune”, “mit ujjain”, etc.

Identify expansions from the co-clicked queries of the acronym supersequence queries

E.g., “maharashtra institute of technology pune”, “mahakal institute of technology ujjain”, etc.
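A supersequence query can be detected with a simple token test. The sketch assumes that a supersequence query is any query that contains the acronym as a whole token plus at least one further word; the slides do not give a formal definition, and the function name is hypothetical.

```python
def supersequence_queries(acronym, queries):
    """Map each supersequence query of `acronym` to its additional context tokens."""
    acronym = acronym.lower()
    result = {}
    for query in queries:
        tokens = query.lower().split()
        # The acronym must appear as a whole token, alongside other words.
        if acronym in tokens and len(tokens) > 1:
            result[query] = [t for t in tokens if t != acronym]
    return result


found = supersequence_queries("mit", ["mit", "mit pune", "mit ujjain", "summit pune"])
```

The co-clicked queries of each detected query, such as “mit pune”, can then be screened for expansions in the same way as the co-clicked queries of the bare acronym.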

Page 20: Www2013 acronym-mining

Expansion Clustering (Enhanced)

Need to aggregate across supersequence queries, e.g., “mahakal institute of technology ujjain”, “mahakal institute of technology india”, …

Distance aggregation: for each supersequence pair, compute the distance and then aggregate the distances over all supersequence pairs

Click frequency aggregation: for each expansion, consider all the documents, including the ones clicked by supersequence queries, and then compute the distributional distance on the aggregated click distribution
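The two aggregation strategies can be sketched as follows. The slides do not say how distances are aggregated, so the arithmetic mean used here is an assumption, as are the function names and the toy click counts.

```python
from collections import Counter


def aggregate_clicks(click_distributions):
    """Click frequency aggregation: merge the click counts of an expansion
    and of its supersequence queries into one distribution."""
    total = Counter()
    for clicks in click_distributions:
        total.update(clicks)  # Counter.update adds counts per document
    return total


def mean_distance(pairs, distance):
    """Distance aggregation: average a distance over all supersequence pairs
    (the mean is an assumed choice of aggregate)."""
    distances = [distance(a, b) for a, b in pairs]
    return sum(distances) / len(distances)


# Hypothetical clicks for "mahakal institute of technology ujjain" and "... india".
merged = aggregate_clicks([{"d4": 2}, {"d4": 1, "d5": 3}])

# Toy set distance (Jaccard) over clicked document sets, for illustration.
jaccard = lambda a, b: 1.0 - len(a & b) / len(a | b)
average = mean_distance([({"d4"}, {"d4"}), ({"d4"}, {"d5"})], jaccard)
```

The merged distribution can be passed to a distributional distance such as the Jensen-Shannon divergence in place of the sparse per-query click counts.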

Page 21: Www2013 acronym-mining

Application: Online Meaning Prediction

Given an acronym and context, predict the meaning of the acronym under that context

Given a context word, the probability that each candidate meaning is the intended one is calculated from the mined popularity and context words

This can be extended to handle context with multiple words
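The prediction formula is missing from the transcript. A natural reading, offered here only as an assumption, is Bayes' rule: the mined popularity serves as the prior, and the context-word probabilities serve as the likelihood. Multiple context words are handled with a naive independence assumption, and the smoothing constant for unseen words is a hypothetical choice.

```python
def predict_meaning(context_words, popularity, word_prob, smoothing=1e-6):
    """Return P(meaning | context words), assuming Bayes' rule with independent words.

    popularity: {meaning: prior probability of the meaning}
    word_prob:  {meaning: {word: P(word | meaning)}}
    smoothing:  probability assigned to words unseen for a meaning (assumed value)
    """
    scores = {}
    for meaning, prior in popularity.items():
        score = prior
        for word in context_words:
            score *= word_prob[meaning].get(word, smoothing)
        scores[meaning] = score
    total = sum(scores.values())
    return {meaning: score / total for meaning, score in scores.items()}


# Popularity values from the slides; word probabilities are hypothetical.
popularity = {"central michigan university": 0.615, "carnegie mellon university": 0.312}
word_prob = {
    "central michigan university": {"football": 0.2, "athletics": 0.1},
    "carnegie mellon university": {"computer": 0.2, "library": 0.1},
}
football = predict_meaning(["football"], popularity, word_prob)
computer = predict_meaning(["computer", "library"], popularity, word_prob)
```

With these toy numbers, the context word “computer” overrides the higher prior of “central michigan university”, which mirrors the “cmu computer science” example earlier in the deck.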

Page 22: Www2013 acronym-mining

Experiments

Data: 100 input acronyms sampled from Wikipedia disambiguation pages

Compared methods:

Edit Distance based Clustering (EDC)

Jaccard Distance based Clustering (JDC)

Acronym Expansion Clustering (AEC)

Enhanced Acronym Expansion Clustering (EAEC)

Ground truth:

Wikipedia meanings: Wikipedia disambiguation page

Golden standard meanings: manually captured from co-clicked queries

Page 23: Www2013 acronym-mining

Evaluation Measures

Standard measures used for evaluating clustering, specifically:

Purity: how pure are the meaning clusters

Normalized Mutual Information (NMI): considering both the quality of clusters and the number of clusters

Recall: number of meanings found with respect to the Golden Standard
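Purity and NMI can be computed directly from two parallel label lists. This sketch normalizes the mutual information by the mean of the two entropies, which is one common convention; the slides do not state which normalization was used, and the function names are illustrative.

```python
import math
from collections import Counter


def purity(clusters, labels):
    """Fraction of items that belong to the majority gold label of their cluster."""
    by_cluster = {}
    for cluster, label in zip(clusters, labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    return sum(max(counts.values()) for counts in by_cluster.values()) / len(labels)


def entropy(assignments):
    """Shannon entropy (natural log) of a list of cluster ids or labels."""
    n = len(assignments)
    return -sum((c / n) * math.log(c / n) for c in Counter(assignments).values())


def nmi(clusters, labels):
    """Mutual information normalized by the mean of the two entropies (assumed variant)."""
    n = len(labels)
    joint = Counter(zip(clusters, labels))
    cluster_counts, label_counts = Counter(clusters), Counter(labels)
    mutual_information = sum(
        (n_ij / n) * math.log(n * n_ij / (cluster_counts[c] * label_counts[l]))
        for (c, l), n_ij in joint.items()
    )
    normalizer = (entropy(clusters) + entropy(labels)) / 2
    # Both sides trivial (one cluster, one label): treat as perfect agreement.
    return mutual_information / normalizer if normalizer > 0 else 1.0
```

Purity alone rewards putting every expansion in its own cluster, which is why it is paired with NMI, a measure that also penalizes an excessive number of clusters.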

Page 24: Www2013 acronym-mining

Meanings, Popularity and Context Words

Page 25: Www2013 acronym-mining

Mining Results

AEC > JDC > EDC: weighting by click frequency helps

EAEC > AEC: exploiting supersequence queries boosts recall

Page 26: Www2013 acronym-mining

Wikipedia and Golden Standard Meanings

Page 27: Www2013 acronym-mining

Wikipedia vs. Golden Standard Meanings

Page 28: Www2013 acronym-mining

Online Meaning Prediction Results

Data: 7,612 acronym+context queries

Each query is manually labeled by judges with its most probable meaning.

Examples:

Average Precision: 94.1%

Page 29: Www2013 acronym-mining

Summary

We introduce the problem of finding the distinct meanings of each acronym, along with the canonical expansion, popularity score and context words of each meaning

We present a novel, end-to-end solution leveraging the query click log

We demonstrate that the mined information can be used effectively for online queries in web search

Page 30: Www2013 acronym-mining

Thanks!

