
Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Description:
Aggregating search results from a variety of heterogeneous sources, so-called verticals, such as news, image and video, into a single interface is a popular paradigm in web search. Current approaches to evaluating the effectiveness of aggregated search systems reward systems that return highly relevant verticals for a given query, where this relevance is assessed under different assumptions. It is difficult to evaluate or compare such systems without fully understanding the relationship between those underlying assumptions. To address this, we present a formal analysis and a set of extensive user studies that investigate the effects of the various assumptions made when assessing query vertical relevance. A total of more than 20,000 assessments on 44 search tasks across 11 verticals were collected through Amazon Mechanical Turk and subsequently analysed. Our results provide insights into various aspects of query vertical relevance and allow us both to explain in more depth and to question the evaluation results published in the literature. Work with Ke (Adam) Zhou, Ronan Cummins and Joemon Jose. Presented at WWW 2013, Rio de Janeiro.
Transcript
Page 1: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Which Vertical Search Engines are Relevant?

Understanding Vertical Relevance Assessments for Web Queries

Ke Zhou1, Ronan Cummins2, Mounia Lalmas3, Joemon M. Jose1

1University of Glasgow  2University of Greenwich  3Yahoo! Labs Barcelona

WWW 2013, Rio de Janeiro

Page 2: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Aggregated Search

•  Diverse search verticals (image, video, news, etc.) are available on the web.

•  Aggregating (embedding) vertical results into the "general web" results has become the de facto approach in commercial web search engines.

[Figure: vertical search engines feed their results, via vertical selection, into the general web search results page.]

Motivation


Page 4: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Evaluation of Aggregated Search

•  Evaluation is based solely on vertical selection.

•  Compare the system prediction set against the user annotation set.

•  Annotations are gathered
– Explicitly (assessing)
– Implicitly (derived from search logs)

[Figure: for the topic "yoga poses", an assessor's annotated vertical set is compared against the sets predicted by Systems A, B and C, yielding the ordering System C > System B > System A. A minimal sketch of such a comparison follows.]

Motivation
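The slide does not name the comparison measure, so the following is a minimal sketch assuming a simple set-based F1 score; the systems' predictions and the annotation set are hypothetical, chosen only to reproduce the ordering C > B > A.

```python
# A minimal sketch (not the authors' code) of set-based vertical-selection
# evaluation: each system predicts a set of relevant verticals for a query,
# and the prediction is scored against the assessors' annotation set.

def set_f1(predicted: set, annotated: set) -> float:
    """F1 between a predicted vertical set and the annotated set."""
    if not predicted or not annotated:
        return 0.0
    tp = len(predicted & annotated)          # correctly selected verticals
    precision = tp / len(predicted)
    recall = tp / len(annotated)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical judgments for the "yoga poses" example on the slide.
annotation = {"image", "video"}
systems = {"A": {"news"}, "B": {"image", "news"}, "C": {"image", "video"}}

for name in sorted(systems):
    print(name, round(set_f1(systems[name], annotation), 3))
# Scoring by F1 reproduces the slide's ordering: C > B > A.
```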

Page 5: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Assessor: which vertical search engines are relevant?

•  The definition of the relevance of a vertical, given a query, remains complex.
– Different work makes different assumptions.
– The underlying assumptions may have a major effect on the evaluation of a SERP.

•  We want to understand the different vertical assessment processes and investigate their impact.

Motivation


Page 7: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

(RQ1) Assumptions: user perspective

•  Pre-retrieval:
– Vertical Orientation: before issuing the query, the user thinks about which verticals might provide better results.

•  Post-retrieval:
– After viewing the search results, the user considers which vertical provides better results.

•  Influencing factors
– Vertical orientation (type preference)
– Within-vertical ranking
– Serendipity
– Visual attractiveness

Problem and Previous Work

[Figure: the pre-retrieval user need versus the post-retrieval user perspective on the aggregated results page.]


Page 11: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

(RQ2) Assumptions: dependency of relevance

•  Inter-dependent approach:
– The quality of the verticals is relative, and the assessments depend on each other.

•  Web-anchor approach:
– The quality of the "general web" results serves as a reference criterion for deciding relevance.

•  Context
– Does the context (results returned from other verticals) affect a user's perception of the relevance of the vertical of interest?

•  Utility vs. effort

Problem and Previous Work

[Figure: the inter-dependent approach judges all verticals jointly against each other; the web-anchor approach judges each vertical against the general web results. A sketch contrasting the two follows.]

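A minimal sketch of how the two assessment schemes could be operationalised; this is my own framing, not the authors' protocol, and all judgments below are hypothetical. In the inter-dependent setting an assessor ranks the candidate verticals jointly, while in the web-anchor setting each vertical is judged only against the "general web" results.

```python
# A minimal sketch (my own framing, not the authors' protocol) contrasting
# the two assessment schemes. In the inter-dependent setting an assessor
# ranks all candidate verticals jointly; in the web-anchor setting each
# vertical is judged only against the "general web" results.

from statistics import mean

# Hypothetical raw judgments for a single query.
joint_ranking = ["image", "video", "web", "news"]  # inter-dependent: one joint ranking
anchor_votes = {                                   # web-anchor: 1 = assessor preferred
    "image": [1, 1, 0],                            # the vertical over the web results
    "video": [1, 0, 1],
    "news": [0, 0, 0],
}

def interdependent_scores(ranking):
    """Relative relevance: higher rank -> higher score, normalised to [0, 1]."""
    n = len(ranking)
    return {v: (n - 1 - i) / (n - 1) for i, v in enumerate(ranking)}

def web_anchor_scores(votes):
    """Fraction of assessors preferring each vertical over the web anchor."""
    return {v: mean(vs) for v, vs in votes.items()}

print(interdependent_scores(joint_ranking))  # scores depend on the other verticals
print(web_anchor_scores(anchor_votes))       # scores depend only on the web anchor
```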

Page 16: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

(RQ3) Assumptions: assessment grade

•  Binary (pairwise) preference
•  Multi-grade preference
•  SERP slots
– ToP: top of the page
– MoP: middle of the page
– BoP: bottom of the page
– NS: not shown

•  Is the binary (pairwise) preference information provided by a population of users able to predict the "perfect" embedding position of a vertical?

Problem and Previous Work

[Figure: a binary preference places a vertical either at ToP or NS; a multi-grade preference places it at one of ToP, MoP or BoP along the SERP, or NS.]


Page 20: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Experimental Design Overview

•  Manipulation (Independent) Variables
– Search Tasks
– Verticals of Interest
– User Perspective (Study 1: RQ1)
– Dependency of Relevance (Study 2: RQ2)
– Assessment Grade (Study 3: RQ3)

•  Dependent Variables
– Inter-assessor Agreement, measured by Fleiss' Kappa (KF)
– Vertical Relevance Correlation, measured by Spearman correlation

Both dependent variables are illustrated in the sketch below.

Experimental Design

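A minimal sketch of the two dependent variables: Fleiss' kappa implemented directly from its textbook definition, and Spearman correlation via scipy.stats.spearmanr. The judgment matrix and score lists below are hypothetical.

```python
# A minimal sketch of the two dependent variables. Fleiss' kappa is
# implemented from its textbook definition; Spearman correlation comes from
# scipy.stats.spearmanr. The judgment matrix and score lists are hypothetical.

import numpy as np
from scipy.stats import spearmanr

def fleiss_kappa(counts: np.ndarray) -> float:
    """counts[i, j] = number of assessors placing item i in category j;
    every item must be judged by the same number of assessors."""
    n = counts.sum(axis=1)[0]                              # assessors per item
    p_j = counts.sum(axis=0) / counts.sum()                # category proportions
    p_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))  # per-item agreement
    p_bar, p_e = p_i.mean(), (p_j ** 2).sum()
    return (p_bar - p_e) / (1 - p_e)

# 5 query-vertical pairs, 4 assessors each, 2 categories (relevant / not).
counts = np.array([[4, 0], [3, 1], [2, 2], [4, 0], [1, 3]])
print("Fleiss' kappa:", round(fleiss_kappa(counts), 3))

# Correlation between the vertical relevance scores produced by two
# assessment conditions over the same query-vertical pairs.
rho, p_value = spearmanr([0.9, 0.2, 0.5, 0.7], [0.8, 0.1, 0.6, 0.5])
print("Spearman rho:", round(rho, 3))
```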

Page 22: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Experiment Design Details

•  Crowd-sourced Data Collection
– We hire crowd-sourced workers on Amazon Mechanical Turk to make the assessments.

•  Verticals
– Cover a variety of 11 verticals employed by three major commercial search engines.
– Use existing commercial vertical search engines.

•  Search Tasks
– 44 tasks covering a variety of (vertical) intents.
– Taken from an existing aggregated search collection (TREC).

•  Quality Control (see the sketch below)
– 4 assessment points for one manipulation.
– Trap HITs (assessment pages with results for other queries).
– Trap search tasks (assessment pages with an explicit vertical request).

Experimental Design

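A minimal sketch of the trap-HIT quality control; the record format and the rejection rule are hypothetical, as the slide only names the mechanism: an assessment page showing results for a different query should be recognised by an attentive worker.

```python
# A minimal sketch of trap-HIT filtering; the record format and the
# worker-level rejection rule are hypothetical, since the slide only
# names the mechanism.

records = [
    # (worker_id, hit_id, is_trap, passed_trap)
    ("w1", "hit-01", False, None),
    ("w1", "trap-01", True, True),
    ("w2", "hit-01", False, None),
    ("w2", "trap-01", True, False),   # w2 failed the trap HIT
]

# Discard every assessment from workers who failed any trap HIT.
failed_workers = {w for (w, _, trap, passed) in records if trap and not passed}
kept = [(w, h) for (w, h, trap, _) in records if w not in failed_workers and not trap]
print("kept assessments:", kept)  # only w1's real assessment survives
```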

Page 26: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Experimental Design: Study 1

•  Same design as the overview (Page 20): the manipulated variable is the User Perspective (RQ1); the dependent variables are inter-assessor agreement (Fleiss' Kappa) and vertical relevance correlation (Spearman).

Experimental Design

Page 27: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Study 1 Results: pre-retrieval vs. post-retrieval

•  Both pre-retrieval and post-retrieval inter-assessor agreement is moderate, and assessors have a similar level of difficulty in assessing for both.

•  Vertical relevance is moderately (but significantly) correlated (0.53) between pre-retrieval and post-retrieval.

•  Highly relevant verticals derived pre-retrieval and post-retrieval overlap significantly.
– Almost 60% of tasks overlap on at least 2 out of the 3 top verticals (a sketch of this statistic follows).

•  There is a bias towards visually salient verticals in post-retrieval search utility.

Experimental Results

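A minimal sketch of the top-3 overlap statistic reported on the slide: for each task, take the three most relevant verticals under each condition and count the tasks where at least 2 of the 3 agree. The scores below are hypothetical.

```python
# A minimal sketch of the top-3 overlap statistic: for each task, compare the
# three most relevant verticals under the two assessment conditions and count
# tasks where at least 2 of the 3 agree. Scores are hypothetical.

def top_k(scores: dict, k: int = 3) -> set:
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

pre = {"q1": {"image": 0.9, "video": 0.8, "news": 0.4, "web": 0.3},
       "q2": {"news": 0.9, "web": 0.7, "image": 0.2, "video": 0.1}}
post = {"q1": {"image": 0.7, "news": 0.6, "video": 0.5, "web": 0.2},
        "q2": {"web": 0.8, "video": 0.6, "news": 0.5, "image": 0.1}}

agree = sum(len(top_k(pre[q]) & top_k(post[q])) >= 2 for q in pre)
print(f"{agree}/{len(pre)} tasks overlap on at least 2 of the top 3 verticals")
```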

Page 31: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Study 1 Results: topical relevance vs. pre-retrieval orientation

•  Vertical relevance between pre-retrieval orientation and post-retrieval is moderately correlated (0.53).

•  Vertical relevance between topical relevance and post-retrieval is weakly correlated (0.36).

•  The impact of pre-retrieval orientation on post-retrieval search utility is greater than that of post-retrieval topical relevance.

[Figure: pre-retrieval orientation and topical relevance, the latter measured by nDCG over the judged (R/N) results of a vertical vi and of the web results w, as predictors of post-retrieval search utility. An nDCG sketch follows.]

Experimental Results

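A minimal sketch of nDCG, the measure behind the slide's topical-relevance condition: nDCG(vi) over the binary (R/N) judgments of a vertical's top results and nDCG(w) over the web results. The judgment lists below are hypothetical, echoing the R/N strings on the slide.

```python
# A minimal sketch of nDCG over binary relevance judgments, as used for the
# slide's topical-relevance condition. The judgment lists are hypothetical.

import math

def dcg(gains):
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(gains):
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

vi = [0, 1, 0]  # vertical results judged N R N
w = [1, 1, 0]   # web results judged R R N
print("nDCG(vi):", round(ndcg(vi), 3), "nDCG(w):", round(ndcg(w), 3))
```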

Page 34: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Experimental Design: Study 2

•  Same design as the overview (Page 20): the manipulated variable is the Dependency of Relevance (RQ2); the dependent variables are inter-assessor agreement (Fleiss' Kappa) and vertical relevance correlation (Spearman).

Experimental Design

Page 35: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Study 2 (Dependency of Relevance) Results

•  Inter-assessor agreement is moderate for both approaches, with little difference in agreement between them.

•  The vertical relevance correlation between the inter-dependent and web-anchor approaches is moderate (0.573).

•  The overlap of the top-three relevant verticals between the two approaches is quite high.
– More than 70% of tasks overlap on 2 out of the 3 top verticals.

•  The web-anchor approach provides a better trade-off between utility and effort.

Experimental Results


Page 39: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Study 2 (Dependency of Relevance) Results

•  Not much difference is observed when using different anchors (i.e. anchors with different observed topical relevance levels).

•  Context matters
– The context of the other verticals can diminish the utility of a vertical.
– Examples: ("Answer", "Wiki"), ("Books", "Scholar"), etc.

Experimental Results


Page 41: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Experimental Design: Study 3

•  Same design as the overview (Page 20): the manipulated variable is the Assessment Grade (RQ3); the dependent variables are inter-assessor agreement (Fleiss' Kappa) and vertical relevance correlation (Spearman).

Experimental Design

Page 42: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Study 3 (Assessment Grade) Results

•  Deriving the "perfect" embedding position from multi-graded assessments.

•  Thresholding multi-graded assessments into binary ones (user type simulation), as sketched below:
– Risk-seeking
– Risk-medium
– Risk-averse

[Figure: per-vertical graded scores (e.g. 1.0, 0.75, 0.5, 0.25, 0.0) are mapped to a majority-preference slot among ToP, MoP, BoP and NS, and thresholded at the three risk levels.]

Experimental Results
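A minimal sketch of the two steps on this slide, under my own reading of it and with hypothetical thresholds: deriving a slot from multi-graded assessments by majority preference, then thresholding graded scores into binary ToP/NS decisions for the three simulated user types.

```python
# A minimal sketch (my reading of the slide, with hypothetical thresholds) of
# Study 3's two steps: derive a slot from multi-graded assessments by
# majority preference, then threshold graded scores into binary ToP/NS
# decisions for three simulated user types.

from collections import Counter

def majority_slot(labels):
    """Most frequent slot (ToP, MoP, BoP or NS) across assessors."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical thresholds: a risk-seeking user embeds a vertical at the top
# on weak evidence; a risk-averse user requires strong evidence.
THRESHOLDS = {"risk-seeking": 0.25, "risk-medium": 0.5, "risk-averse": 0.75}

def binarise(score, user_type):
    return "ToP" if score >= THRESHOLDS[user_type] else "NS"

print(majority_slot(["ToP", "MoP", "ToP", "NS"]))   # -> ToP
for user_type in THRESHOLDS:
    print(user_type, binarise(0.5, user_type))      # ToP, ToP, NS
```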

Page 43: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Study 3 (Assessment Grade) Results

•  Inter-assessor agreement is moderate, and different users have different risk levels.

•  Vertical Relevance Correlation
– Most of the binary approaches correlate significantly with the multi-graded ground truth, but the correlations are mostly modest.
– The risk-medium thresholding approach performs best.

Experimental Results


Page 45: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Final take-away

•  Study 1
– Assessing for aggregated search is difficult.
– Highly relevant verticals overlap significantly between the pre-retrieval and post-retrieval user perspectives.
– Vertical (type) orientation is more important than topical relevance.

•  Study 2
– The anchor-based approach might be better than the inter-dependent approach with respect to the utility-effort trade-off.
– Context matters.

•  Study 3
– The binary approach can be used to determine the "perfect" embedding position of the verticals, and it performs relatively well without requiring many assessments.


Page 48: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Conclusions

•  We compared different vertical relevance assessment processes and analysed their impact.

•  Our work has implications for "how" and "what" evaluation design decisions affect the actual evaluation.

•  This work also creates a need to re-interpret previous evaluation efforts in this area.

Page 49: Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

Questions?

•  Thanks!

