
Web Behavior Analysis

Date post: 05-Jan-2016
Upload: ismael
Transcript
Page 1: Web Behavior Analysis

Web Behavior Analysis

Page 2: Web Behavior Analysis

Your Last Words? (in the 22nd century)

• To family
• To your best friend?

Page 3: Web Behavior Analysis

Web Behavior Analysis

• Why important?
• Why scary?

Page 4: Web Behavior Analysis

Part I: Why Important?

Page 5: Web Behavior Analysis

Q. In the past six months have you used a search engine to help inform your decisions for the following tasks?

66% of people are using search more frequently to make decisions

• We rely more and more on search for our real-life decisions
  – Opportunities for business
  – Concerns for privacy

Page 6: Web Behavior Analysis

Length of Sessions by Type

What should be done?

• Focus on new territory

Page 7: Web Behavior Analysis

Taxonomy of Web queries

• Navigational (we are good at this)
  – To reach a particular site
  – E.g., searching for a company's top page
• Informational
  – To acquire pages that provide knowledge for the user's information need
  – Conventional ad hoc retrieval
• Transactional
  – To perform a Web-mediated activity
  – E.g., online shopping

Page 8: Web Behavior Analysis

Navigational queries vs. pseudo-navigational queries

Example: Good and Bad

Page 9: Web Behavior Analysis

Examples of “hard queries” (informational/transactional):

• Car GPS around $300
• Four-day trip to Bhutan from Delhi to visit important Buddhist places

Page 10: Web Behavior Analysis

[Figure panels: Game Consoles; Party Site]

Page 11: Web Behavior Analysis

What do we want?

Page 12: Web Behavior Analysis

Current research directions

• How to classify queries?
• Then what?
  – Search engines trying to reduce clicks for “hard queries”
  – Extracting information from forums

Page 13: Web Behavior Analysis

Importance of query classification: “obama”

• Informational: people may search to learn more about Barack Obama
• Navigational: visit his official website
• Transactional: perhaps the user's goal is to donate money online to support Mr. Obama’s campaign

Page 14: Web Behavior Analysis

Yahoo numbers

• ~25% informational (content text?)
• ~40% navigational (anchor text?)
• ~35% transactional (site template?)

Page 15: Web Behavior Analysis

Can you tell whether a query is “navigational” or not?

Page 16: Web Behavior Analysis

Lee et al. [WWW05]: Overview

• Analyze how the query term is used in anchor texts

[Figure: Q = “WWW2008”: anchors with the text “WWW2008” all point to the top page of WWW2008, so the destinations are identical → navigational. Q = “search”: anchors with the text “search” point to diverse pages (a description in Wikipedia, search engines), so the destinations are diverse → informational.]

Page 17: Web Behavior Analysis

Anchor-link distribution (ALD)

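• $P(d \mid t)$: probability that the page linked by anchor text t is d

[Figure: for t = WWW2008, the links concentrate on the top page of WWW2008, so the ALD is skewed (navigational); for t = search, they spread over Google, Yahoo!, and Wikipedia, so the ALD is uniform (informational).]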
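The ALD above can be estimated by relative frequency over anchors. A minimal Python sketch, assuming the anchor collection is available as (anchor text, destination) pairs; the function name `anchor_link_distribution` and the toy anchors are hypothetical, not Lee et al.'s data or code:

```python
from collections import Counter, defaultdict

def anchor_link_distribution(anchors):
    """Estimate P(d | t) from (anchor_text, destination) pairs by relative frequency."""
    counts = defaultdict(Counter)
    for text, dest in anchors:
        counts[text.lower()][dest] += 1  # anchor text is normalized to lowercase
    ald = {}
    for text, dest_counts in counts.items():
        total = sum(dest_counts.values())
        ald[text] = {d: c / total for d, c in dest_counts.items()}
    return ald

# Hypothetical toy anchor collection, for illustration only
anchors = [
    ("WWW2008", "WWW2008 top page"),
    ("WWW2008", "WWW2008 top page"),
    ("WWW2008", "WWW2008 top page"),
    ("WWW2008", "Wikipedia: WWW2008"),
    ("search", "Google"),
    ("search", "Yahoo!"),
    ("search", "Wikipedia"),
]

ald = anchor_link_distribution(anchors)
for term, dist in ald.items():
    # Share of the most-linked destination: a crude indicator of how skewed the ALD is
    print(term, round(max(dist.values()), 2))
```

Here “www2008” concentrates 75% of its links on one destination, while “search” spreads them evenly over three, mirroring the skewed vs. uniform contrast on the slide.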

Page 18: Web Behavior Analysis

Lee et al.: Problem

• Targets only anchor texts that are exactly the same as the query
  – If no anchor text identical to the query exists on the Web, the ALD cannot be computed
• Problematic queries
  – Long phrases
    • E.g., “information retrieval system research”
  – Multiple keywords
    • E.g., “trec, nist, test collection”

Page 19: Web Behavior Analysis

Multi-query solution

Query Q = “trec, test collection”

• Terms T = {trec, test, collection}
• Destinations D = {d1, d2, …}
• [Figure: a separate ALD $P(d \mid t)$ for t = trec, t = test, and t = collection]
• Compute the ALD on a term-by-term basis and integrate them

Page 20: Web Behavior Analysis

Computation of classification score

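• Entropy of D given the query terms T:

$$H(D \mid T) = -\sum_{t \in T} P(t) \sum_{d \in D} P(d \mid t) \log P(d \mid t)$$

• The inner sum is the entropy of a single term t; the outer sum is a weighted average over the terms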
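The score above can be computed directly from per-term ALDs. A minimal sketch, assuming P(t) is uniform over the terms that have an ALD (the slide leaves P(t) open) and using log base 2; the threshold in `classify` and the toy ALDs are hypothetical:

```python
import math

def term_entropy(dist):
    """Entropy of one term's ALD: -sum_d P(d|t) log2 P(d|t)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def query_entropy(terms, ald, weights=None):
    """H(D|T) = sum_t P(t) * H(D|t), using only terms that have an ALD."""
    known = [t for t in terms if t in ald]
    if not known:
        return None  # no anchor text matches any term: the score is undefined
    if weights is None:
        # Assumption: P(t) uniform over the known terms
        weights = {t: 1 / len(known) for t in known}
    return sum(weights[t] * term_entropy(ald[t]) for t in known)

def classify(h, threshold=0.5):
    """Low entropy (skewed ALD) -> navigational; the threshold is a hypothetical value."""
    return "navigational" if h < threshold else "informational"

# Hypothetical toy ALDs, keyed by term (d1..d7 are placeholder destinations)
ald = {
    "trec": {"d1": 1.0},
    "test": {"d2": 0.5, "d3": 0.5},
    "collection": {"d4": 0.25, "d5": 0.25, "d6": 0.25, "d7": 0.25},
}
h = query_entropy(["trec", "test", "collection"], ald)
print(h, classify(h))
```

The per-term entropies here are 0, 1, and 2 bits, so the uniform weighted average is 1 bit; a single fully skewed term such as “trec” scores 0 and falls on the navigational side.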

Page 21: Web Behavior Analysis

Now what?

• For “WWII”
  – Google: http://www.google.com/search?q=WWII&hl=en&tbo=1&output=search&tbs=ww:1
  – Microsoft: http://www.bing.com/reference/semhtml/World_War_II?fwd=1&qpvt=wwii&src=abop&q=wwii
  – Wolfram: http://www.wolframalpha.com/input/?i=wwII
• Can you tell informational from transactional?

Page 22: Web Behavior Analysis

Challenges/Opportunities

• Slightly subtle and interleaved
• But huge advertising revenue (yet to be explored)!
• Classic query logs + clicks on the surface web are not enough
• Any ideas?

Page 23: Web Behavior Analysis

More signals?

• Eye movement?
• Brain signal?

Page 24: Web Behavior Analysis

More corpus? (social corpus for polls? expert advice?)

Page 25: Web Behavior Analysis

More signals

Page 26: Web Behavior Analysis

CS: Client Simple

• First representation:
  – Trajectory length
  – Horizontal range
  – Vertical range

[Figure: a trajectory annotated with its trajectory length, horizontal range, and vertical range]
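The three “client simple” features are straightforward to compute. A sketch, assuming the trajectory arrives as a list of (x, y) screen positions (the slide does not say how it is captured); `client_simple_features` is a hypothetical name:

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def client_simple_features(traj: List[Point]) -> Dict[str, float]:
    """Trajectory length plus horizontal and vertical range of a 2-D path."""
    if not traj:
        return {"length": 0.0, "h_range": 0.0, "v_range": 0.0}
    # Sum of Euclidean distances between consecutive positions
    length = sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))
    xs = [x for x, _ in traj]
    ys = [y for _, y in traj]
    return {
        "length": length,
        "h_range": max(xs) - min(xs),  # horizontal extent
        "v_range": max(ys) - min(ys),  # vertical extent
    }

print(client_simple_features([(0, 0), (3, 4), (3, 10)]))
```

`math.dist` requires Python 3.8 or later.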

Page 27: Web Behavior Analysis

CF: Client Full

• Second representation:
  – 5 segments: initial, early, middle, late, and end
  – Each segment: speed, acceleration, rotation, slope, etc.

[Figure: a trajectory divided into segments 1 to 5]
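The segmented representation can be sketched as follows, assuming the trajectory is a list of (t, x, y) points with strictly increasing timestamps and that the 5 segments split it evenly by point count (the slide does not say how segment boundaries are chosen). Only speed, acceleration, and slope are computed; rotation is omitted. All names are hypothetical:

```python
import math

SEGMENT_NAMES = ["initial", "early", "middle", "late", "end"]

def split_segments(points, k=5):
    """Split (t, x, y) points into k contiguous segments that share boundary points."""
    if len(points) < k + 1:
        raise ValueError("need at least k + 1 points")
    n = len(points)
    bounds = [i * (n - 1) // k for i in range(k + 1)]
    return [points[bounds[i]:bounds[i + 1] + 1] for i in range(k)]

def segment_features(seg):
    """Speed, acceleration, and slope for one segment of (t, x, y) points."""
    steps = list(zip(seg, seg[1:]))
    dists = [math.hypot(b[1] - a[1], b[2] - a[2]) for a, b in steps]
    step_speeds = [d / (b[0] - a[0]) for d, (a, b) in zip(dists, steps)]
    speed = sum(dists) / (seg[-1][0] - seg[0][0])  # average speed over the segment
    # Acceleration: change in step speed between the midpoints of the first and last steps
    t_first = (steps[0][0][0] + steps[0][1][0]) / 2
    t_last = (steps[-1][0][0] + steps[-1][1][0]) / 2
    accel = (step_speeds[-1] - step_speeds[0]) / (t_last - t_first) if t_last > t_first else 0.0
    dx, dy = seg[-1][1] - seg[0][1], seg[-1][2] - seg[0][2]
    slope = dy / dx if dx != 0 else math.inf  # vertical segment: infinite slope
    return {"speed": speed, "acceleration": accel, "slope": slope}

def client_full_features(points):
    return {name: segment_features(seg)
            for name, seg in zip(SEGMENT_NAMES, split_segments(points))}

# Toy trajectory: x = t^2 along a horizontal line, so speed grows at a constant rate
points = [(t, t * t, 0) for t in range(11)]
for name, feats in client_full_features(points).items():
    print(name, feats)
```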

Page 28: Web Behavior Analysis

Navigational query: “facebook”

Page 29: Web Behavior Analysis

Informational query: “spanish wine”

Page 30: Web Behavior Analysis

Transactional query: “integrator”

Page 31: Web Behavior Analysis

More corpus

• Community question answering (cQA) has succeeded as an “additional corpus”, not as an “additional means”

• Challenges?

Page 32: Web Behavior Analysis

cQA (Yahoo Answers)

Page 36: Web Behavior Analysis

How Yahoo Answers works

Page 42: Web Behavior Analysis

Good questions draw good answers

Page 44: Web Behavior Analysis

Good Q/As? -- Clicks

Page 45: Web Behavior Analysis

Good Q/As? -- Community

Page 46: Web Behavior Analysis

Why scary?

Page 47: Web Behavior Analysis

Useful beyond imagination

• Spell checker: SIGMOD → “Did you mean ‘sigmoid’?”
• Entity relation: SIGMOD ~ SIGIR
• Translation: SIGMOD, 씨그모드 (Korean transliteration of “SIGMOD”), sigmod.com
• Query suggestion: 영일대 호텔 (“Yeongildae Hotel”), 영일대 (“Yeongildae”)
• Rank learning: the top-10 entry is visited all the time; what should we do?
• Reason for a migraine?

Page 48: Web Behavior Analysis

Companies need YOUR HELP

• AOL released its query logs
• Guess what happened?

Page 49: Web Behavior Analysis

More scientific observations (Yahoo Research)

• X = {query1, query2, query3}
• Y = age, gender, area
• X → Y (how likely?), validated with ground-truth info (Yahoo accounts)

Page 50: Web Behavior Analysis

See if you can do it?

• Observe for yourself:

http://aolpsycho.com/user/5826-kallemeyn

Page 51: Web Behavior Analysis

Gender

• Female: fanfiction, bridal, makeup, women’s, knitting, hair, ecards, glitter, yoga, and diet

• Male: nfl, poker, espn, ufc, railroad, prostate, football, golf, male, wrestling, compusa, as well as a variety of adult terms

Accuracy: 80+%
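The slide does not say which model produced the 80+% figure. As an illustration only, here is a multinomial naive Bayes classifier with add-one smoothing, a common baseline swapped in here, predicting a label Y from query terms X. The training data is a hypothetical toy set built from terms listed above, so it says nothing about real accuracy:

```python
import math
from collections import Counter

def train_nb(labeled_logs):
    """labeled_logs: list of (query_terms, label). Returns class counts, term counts, vocabulary."""
    priors = Counter(label for _, label in labeled_logs)
    term_counts = {label: Counter() for label in priors}
    for terms, label in labeled_logs:
        term_counts[label].update(terms)
    vocab = {t for counts in term_counts.values() for t in counts}
    return priors, term_counts, vocab

def predict_nb(model, terms):
    """Return the label with the highest log-posterior for the given query terms."""
    priors, term_counts, vocab = model
    n_users = sum(priors.values())
    scores = {}
    for label, prior in priors.items():
        total = sum(term_counts[label].values())
        score = math.log(prior / n_users)
        for t in terms:
            # Laplace (add-one) smoothing so an unseen term does not zero out a class
            score += math.log((term_counts[label][t] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training data using terms from the slide
train = [
    (["bridal", "makeup", "knitting"], "female"),
    (["yoga", "diet", "hair"], "female"),
    (["nfl", "poker", "espn"], "male"),
    (["golf", "football", "ufc"], "male"),
]
model = train_nb(train)
print(predict_nb(model, ["makeup", "yoga"]))
print(predict_nb(model, ["espn", "golf"]))
```

The same structure applies to age or area by swapping the labels; validating against ground truth (as on the earlier Yahoo Research slide) would mean holding out labeled users and measuring accuracy on them.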

Page 52: Web Behavior Analysis

Age

• YOUNG: myspace, pregnancy, wikipedia, lyrics, quotes, apartments, torrent, baby, wedding, mall, soundtrack;

• OLD: aarp, telephone, lottery, amazon.com, retirement, funeral, senior, mapquest, medicare, newspapers, repair.

Page 53: Web Behavior Analysis

Place

• A user’s zip code (US postal code) or other location identifier may be detectable from place names used in queries

• Check out the Yahoo GEO APIs

Page 54: Web Behavior Analysis

Name?

• 50+% issued their own name
• (but other names too)

Ref: "Vanity Fair: Privacy in Querylog Bundles"

Page 55: Web Behavior Analysis

User Solutions?

• TrackMeNot (TMN): an extension to the Firefox web browser that initiates randomized search queries in the background to a number of commercial search engines

• Tor: changes IP/cookie (prevents aggregation)

  – Losing services, e.g., personalization

Page 56: Web Behavior Analysis

Company Solution

• K-anonymity (bundling)
  – Reported to be unsafe for vanity search + geo-queries and for long-tail keywords
  – [So far, it is considered TOO RISKY]
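The slide names bundling but gives no algorithm. One simple reading, sketched here under that assumption: randomly group users into bundles of at least k, pool their queries, and drop the user IDs, so any query in a bundle could have come from any of its k or more users. The function name and the toy log are hypothetical:

```python
import random

def bundle_logs(user_queries, k, seed=0):
    """Merge per-user query logs into bundles of at least k users, dropping user IDs.

    user_queries: dict mapping a user ID to that user's list of queries.
    Returns a dict mapping a bundle ID to the pooled, sorted queries of its users.
    """
    users = sorted(user_queries)
    random.Random(seed).shuffle(users)  # random assignment of users to bundles
    groups = [users[i:i + k] for i in range(0, len(users), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        leftover = groups.pop()
        groups[-1].extend(leftover)  # fold stragglers in so no bundle has fewer than k users
    if not groups or len(groups[0]) < k:
        return {}  # fewer than k users overall: nothing can be bundled
    # Sorting hides which user, and in what order, issued each query within a bundle
    return {f"bundle-{i}": sorted(q for u in g for q in user_queries[u])
            for i, g in enumerate(groups)}

# Hypothetical toy log
user_queries = {
    "u1": ["jane doe", "pizza 60601"],
    "u2": ["nfl scores"],
    "u3": ["yoga mats"],
    "u4": ["medicare forms"],
    "u5": ["lottery results"],
}
print(bundle_logs(user_queries, k=2))
```

Even so, a bundle that contains both a vanity search (“jane doe”) and a geo-query (“pizza 60601”) can still point to one person, which is the weakness the slide reports.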

Page 57: Web Behavior Analysis

Summary

• You are leaving trails in the cyber world, which align more and more with your real-life trails

• Companies are interested in predicting as much of your future behavior as possible

• More signals? More corpus?

• Can you hide as much as possible to protect privacy, while revealing as much as possible to enable such prediction? (privacy dilemma)

• But it is OK even if we can’t know (product state-of-the-art)

Page 58: Web Behavior Analysis

Search UI? Visualization?

Page 59: Web Behavior Analysis

What are query aspects?

Page 60: Web Behavior Analysis

Challenge

• Intentions are hidden
  – Omitting key information makes the intent of queries ambiguous
  – E.g., omitting “reviews” when searching for reviews of “Canon EOS 40D SLR”
  – E.g., omitting “location/city” when searching for “jobs”
• Queries are often too broad

Page 61: Web Behavior Analysis

Goal

• Mine broad latent aspects from search logs
  – Formulate the problem based on a real-world model of user interaction with the search engine (session = 10 mins)
  – Bring interesting aspects to the attention of editors, who can then determine saliency and usefulness

Page 62: Web Behavior Analysis

User interaction model

[Figure: flowchart. The user enters the original query “Canon EOS 40D”. Without aspects: the search engine (SE) returns general results, the user reformulates the query by adding the qualifier “reviews”, and the SE returns reviews of the camera. With aspects: the SE returns general results + query aspects, and the user reformulates the query by selecting the “reviews” aspect. Either way, the user’s query is satisfied (e.g., she clicks on a result).]

Learning of query aspects from reformulations
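The slides do not spell out the mining algorithm. A minimal sketch of learning aspects from reformulations, assuming a log of (user, timestamp in seconds, query) records, a 10-minute inactivity gap as the session boundary (one reading of “session = 10 mins”), and that an aspect is whatever words a user appends to the original query in the next query of the same session. All names and log entries are hypothetical:

```python
from collections import Counter, defaultdict

SESSION_GAP_SECONDS = 10 * 60  # assumption: "session = 10 mins" read as an inactivity gap

def sessions(log):
    """Group (user, timestamp_seconds, query) records into per-user lists of queries."""
    by_user = defaultdict(list)
    for user, ts, query in log:
        by_user[user].append((ts, query))
    for events in by_user.values():
        events.sort()
        current = [events[0][1]]
        for (prev_ts, _), (ts, query) in zip(events, events[1:]):
            if ts - prev_ts > SESSION_GAP_SECONDS:
                yield current  # gap too long: close the session
                current = []
            current.append(query)
        yield current

def mine_aspects(log, base_query):
    """Count qualifiers appended to base_query by the next query in the same session."""
    base = base_query.lower().split()
    aspects = Counter()
    for queries in sessions(log):
        for q1, q2 in zip(queries, queries[1:]):
            t1, t2 = q1.lower().split(), q2.lower().split()
            if t1 == base and len(t2) > len(base) and t2[:len(base)] == base:
                aspects[" ".join(t2[len(base):])] += 1
    return aspects

# Hypothetical toy log
log = [
    ("u1", 0, "canon eos 40d"), ("u1", 60, "canon eos 40d reviews"),
    ("u2", 0, "canon eos 40d"), ("u2", 120, "canon eos 40d price"),
    ("u3", 0, "canon eos 40d"), ("u3", 30, "canon eos 40d reviews"),
    ("u4", 0, "canon eos 40d"), ("u4", 3600, "canon eos 40d manual"),  # different session
]
print(mine_aspects(log, "Canon EOS 40D").most_common())
```

Only suffix additions are counted; a fuller version would also handle qualifiers inserted elsewhere in the query, and the resulting counts would go to editors to judge saliency and usefulness.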

Page 63: Web Behavior Analysis

Results: Examples of aspects found

Page 64: Web Behavior Analysis

New directions might be

• Taking the clicked target web page into account while constructing aspects

• Or visualization techniques that help users visually/perceptually/cognitively mine such “aspects”
  – Visualization/refinement iterations to narrow down

Tomorrow 4:15pm (B2 102)
Title: Using Information Visualization to Understand Data
Abstract: Information Visualization is the art and science of representing abstract information in a visual form that enables users to understand data through their perceptual and cognitive capabilities.

Dr. Bongshin Lee (Microsoft Research)

