Post on 02-Jan-2016
Search Science @ eBay… from a Web Search perspective
Tim Converse
Senior Director, Head of Search Science Engineering
Outline
• The eBay Search problem (starting from Web search)
– What’s the same? What’s different?
• Search Science @ eBay
– Who we are
– What we do
• Current frontier
My background
• Inktomi => Yahoo! Web Search
– Spam detection / doc classification
• Powerset => Bing Web Search (Microsoft)
– ML Ranking and metrics for NLP-driven engine
– Led Search Summaries group for Bing
• Jybe => Yahoo!
– Co-founded small mobile personalization co.
– Personalization of main Yahoo! page
• Joined eBay November 2013
The eBay Search Problem
The eBay Search Results Page (SRP)
• Auction listing
• Fixed-price listing
• Category refinements
• Aspect refinements
• Related Searches
• Sort type
Search Intents
Web Search (Broder ’02)
• Informational
– Info in document or summary
• Navigational
– Get me to a known site or page
• Transactional
– I want to buy something
eBay Search
• Transactional
– Most common: want to buy something specific
– Less common: browsing/window-shopping
Search inputs
Web | eBay
Documents from Web | Listings from sellers (not products!)
Mostly unstructured text | Semi-structured
Link graph | N/A
Anchortext | N/A
N/A | Seller history, reputation
User clicks, dwell time | Clicks
N/A | Conversions(!)
Search challenges
Web | eBay
Crawling for discovery | N/A
Index freshness | Freshness + short-lived inventory
Query understanding/rewriting | Query understanding/rewriting; mapping to categories/aspects
Web graph analysis | N/A
Authoritativeness | Seller trustworthiness
Result ranking | Auction ranking; pricing and value
eBay Scale! [Numbers from 2012 / 2013]
• ~130 million buyers and sellers
• 250 million queries/day
• ~1 billion listings
• 100 petabytes of data (Hadoop / Terabyte)
• ~2 billion pageviews / day
• $75 billion of merchandise sold (2012), mostly as a result of searches
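To put the daily figures in per-second terms, a quick back-of-the-envelope calculation helps. This is only a sketch: it assumes traffic is spread evenly across the day, which real traffic is not, so peak load would be higher than these averages. The constant and function names are illustrative.

```python
# Daily figures quoted on the slide (2012 / 2013).
QUERIES_PER_DAY = 250_000_000
PAGEVIEWS_PER_DAY = 2_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(daily_count: int) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return daily_count / SECONDS_PER_DAY


avg_qps = per_second(QUERIES_PER_DAY)
avg_pvps = per_second(PAGEVIEWS_PER_DAY)

print(f"average queries/second:   {avg_qps:,.0f}")
print(f"average pageviews/second: {avg_pvps:,.0f}")
```

On these assumptions the average works out to roughly 2,900 queries and 23,000 pageviews per second.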
Search ecosystem
Web | eBay
Publishers mostly passively crawled, indexed | Sellers must actively list (“paying customer”). Implications: eBay can impose requirements; sellers can complain
Advertising is primary model | Transaction fees primary
Policies enforced by removal or demotion | Policies enforced at listing time and by later sanctions
Types of spam
Web | eBay
Keyword stuffing | Keyword stuffing (title)
Duplication | Duplication
Link spam | Shill bidding and buying
Cloaking/redirects/invisible text | N/A (no links, eBay controls presentation)
N/A | Fitment spam
N/A | Multi-SKU spam (targeting sorts)
Unique e-commerce challenges
eBay point of difference => Consequence
• Items, not products
– Ranking bags of words
– Hard to compare relative value of items
• Open door selling policy
– Limited standardisation of seller behaviour, more dimensions of choice: New vs. Refurb vs. Used vs. Broken; Shipping time/cost/CBT/custom/Returns/Service; C2C / b2C / B2C
– Incomplete/inaccurate structured data
– Spam / Trust
• Many options for selling: solving for edge cases
– Complexity in search. For example: 1/3/5/7/10/30 day FP, 1/3/5/7/10 day auctions, ABIN, BIN with Best offer, live auctions, pickup only
– Auction ranking and intermingling is required
• Broad and deep inventory
– Buyers need to refine / disambiguate before they buy
• Long history / heavy buyers
– Existing workflows are deeply ingrained. Resistance to change
Search Science @ eBay
Search Science Mission
• Mission: Help the user find the item they want as quickly and easily as possible
• Users typically care about three things in search results
– Relevance
– Trust
– Value
• Success Measures:
– Revenue per user (primary)
– Human-judged relevance tests
• Search is a product of Search Front End, Search Back End, and Search Science
Search Science Areas
• Recall: Which items match the user query
• Ranking: Which items are you most likely to want to buy
• Universal Search: Relevance of left nav, related searches and inline elements
• Metrics: Tracking performance, delivering insights and developing internal tools
• Spam: Spotting deliberate search manipulation, actioning via CS & neutralizing on site
Recall
• Responsible for mapping a query to a set of items to return
• Primarily accomplished by query rewriting
– User query transformed into much larger back-end query
– Terms mapped to: synonyms, plural/singular, categories, aspects
• e.g. [red dress] => [category = dresses, color = red]
– Phrases enforced ([iphone 5])
– Whole-query expansions for popular queries
– Process driven by data-mining on queries/clicks/purchases
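The rewriting step can be pictured with a small sketch. Everything here is an assumption made for illustration: the lookup tables, the `rewrite` function, and the shape of the back-end query are hypothetical, not eBay's implementation. In practice such tables would be mined from queries, clicks, and purchases rather than written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-written tables standing in for mined data.
SYNONYMS: Dict[str, List[str]] = {"dress": ["dresses", "gown"]}
CATEGORY_TERMS: Dict[str, str] = {"dress": "dresses"}
ASPECT_TERMS: Dict[str, Tuple[str, str]] = {"red": ("color", "red")}


def rewrite(query: str) -> dict:
    """Expand a raw user query into a larger, structured back-end query."""
    rewritten = {"terms": [], "categories": [], "aspects": {}}
    for token in query.lower().split():
        # Each term may match itself or any of its synonyms / plural forms.
        rewritten["terms"].append([token] + SYNONYMS.get(token, []))
        if token in CATEGORY_TERMS:
            rewritten["categories"].append(CATEGORY_TERMS[token])
        if token in ASPECT_TERMS:
            name, value = ASPECT_TERMS[token]
            rewritten["aspects"][name] = value
    return rewritten


result = rewrite("red dress")
print(result)
```

For [red dress] this yields the category `dresses` and the aspect `color = red`, matching the slide's example, plus an OR-group of alternatives for each term.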
Ranking
• Given a query and a recall set, rank the set
• Largely machine-learned (with business rules on top)
– Separate machine-learned models for auction and fixed-price
– Auction / fixed-price interleaving handled separately
• A number of targets for machine learning
– Clicks
– Revenue per query/item (Why this is a good idea)
• Features:
– Query/item match features, clicks & sales, seller metrics …
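As a rough illustration of "separate models, interleaving handled separately", the sketch below ranks the two listing formats with different scoring functions and then merges them by simple alternation. The scoring functions, field names, and the alternation rule are all placeholders assumed for the example; the slides do not describe the actual models or interleaving policy.

```python
from itertools import zip_longest


def fixed_price_score(item: dict) -> float:
    # Stand-in for a learned fixed-price model (hypothetical feature).
    return item["sales_per_view"]


def auction_score(item: dict) -> float:
    # Stand-in for a learned auction model (hypothetical features).
    return item["bids"] / item["hours_left"]


def interleave(fixed_price: list, auctions: list) -> list:
    """Alternate two ranked lists, fixed-price first; leftovers keep their order."""
    merged = []
    for fp, au in zip_longest(fixed_price, auctions):
        if fp is not None:
            merged.append(fp)
        if au is not None:
            merged.append(au)
    return merged


listings = [
    {"id": "fp1", "format": "fixed", "sales_per_view": 0.02},
    {"id": "fp2", "format": "fixed", "sales_per_view": 0.05},
    {"id": "au1", "format": "auction", "bids": 3, "hours_left": 2.0},
    {"id": "au2", "format": "auction", "bids": 10, "hours_left": 1.0},
    {"id": "fp3", "format": "fixed", "sales_per_view": 0.01},
]

fixed = sorted((l for l in listings if l["format"] == "fixed"),
               key=fixed_price_score, reverse=True)
auctions = sorted((l for l in listings if l["format"] == "auction"),
                  key=auction_score, reverse=True)
ranked_ids = [l["id"] for l in interleave(fixed, auctions)]
print(ranked_ids)
```

Keeping the merge step separate from the two scorers means the mix of formats can be tuned without retraining either model.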
Refinements and Universal Search
• Responsible for most non-item SRP elements:
– Aspect refinements
– Category refinements
– Related searches
– Inline elements (universal search)
– Snippets
• Mostly driven by offline analysis of user behavior
• Screenshots: category refinements, aspect refinements, related searches
Metrics, Tools, Monitoring
• Human judgments for training sets, internal metrics
• Tools for scraping, studying queries and result sets
• Intelligent alerting on relevance and system problems
Spam
• Effort broader than Search Science
• Alternatives:
– Educate (explain to sellers why they shouldn’t game)
– Block (don’t allow sellers to add spammy listings in first place)
– Neutralize (algorithmically remove advantage of abuse)
– Enforce (remove bad listings and sellers)
• Search Science helps neutralize and flag for enforcement
Details of our anti-spam algorithms
[This page intentionally left blank]
Current Frontier
Query segmentation by frequency
Unique queries
• Head: thousands
• Torso: hundreds of thousands
• Tail: tens of millions
Result set size
• Head: millions
• Torso: tens of thousands
• Tail: hundreds
Immediate Conversion / Immediate Filtering (two charts; bar labels in extracted order)
• Head, Torso (1/3 Head), Tail (1/5 Head)
• Head, Torso (3x Head), Tail (5x Head)
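A segmentation like this can be derived from a query log by ranking unique queries by frequency. The sketch below is a minimal illustration: the `segment_queries` function and the cut-off sizes are assumptions, chosen small so the example is readable, whereas the slide's segments are on the order of thousands (head) and hundreds of thousands (torso) of unique queries.

```python
from collections import Counter
from typing import Dict, Iterable


def segment_queries(query_log: Iterable[str],
                    head_size: int,
                    torso_size: int) -> Dict[str, str]:
    """Label each unique query head/torso/tail by its frequency rank."""
    counts = Counter(query_log)
    segments = {}
    for rank, (query, _count) in enumerate(counts.most_common()):
        if rank < head_size:
            segments[query] = "head"
        elif rank < head_size + torso_size:
            segments[query] = "torso"
        else:
            segments[query] = "tail"
    return segments


# Toy log; real logs would hold hundreds of millions of queries per day.
log = (["iphone"] * 5 + ["red dress"] * 3
       + ["lego 10179"] * 2 + ["white gold hoop earrings"])
segments = segment_queries(log, head_size=1, torso_size=2)
print(segments)
```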
Different opportunities by segment
Head
• Store per-query info (likely categories)
• Browse-oriented experience
• Forefront refinements
Torso
• Store per-query info (likely categories)
Tail
• No per-query info (match similar head/torso)?
• Category prediction as classification?
• Null/Low recovery (broaden result set)
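One simple way to picture null/low recovery is to relax the query by dropping terms until enough items match. This is a sketch under stated assumptions: the term-dropping strategy, the `min_results` threshold, and both function names are illustrative, and a production system would more likely choose which terms to drop using mined data rather than trying every subset.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def search(terms: Sequence[str], titles: List[str]) -> List[str]:
    """Strict AND match: every term must appear as a word in the title."""
    return [t for t in titles
            if all(term in t.lower().split() for term in terms)]


def search_with_recovery(query: str, titles: List[str],
                         min_results: int = 2) -> Tuple[List[str], int]:
    """Drop query terms one at a time until at least min_results items match.

    Returns the results and the number of query terms that were kept.
    """
    terms = query.lower().split()
    results: List[str] = []
    for keep in range(len(terms), 0, -1):
        results = []
        for subset in combinations(terms, keep):
            for title in search(subset, titles):
                if title not in results:
                    results.append(title)
        if len(results) >= min_results:
            return results, keep
    return results, (1 if terms else 0)


titles = ["vintage brass compass", "brass ship compass", "vintage pocket watch"]
results, kept = search_with_recovery("vintage brass nautical compass", titles)
print(results, kept)
```

Here the full four-term query matches nothing, three terms match one item, and two terms finally return two items.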
Query understanding (annotation)
• Query: [white gold hoop earrings]
• Stage 1 (bag of words): hoop, gold, earrings, white
• Stage 2 (phrasing and rewriting):
– “white gold” hoop earrings, or
– color=gold, color=white, hoop, earrings, category=jewelry
• Stage 3:
– ProductType = earrings, style=hoop, material=[white gold], color=white, category=jewelry
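The progression from bag of words to phrases to structured annotations can be sketched with two small dictionaries. Both tables and all function names are hypothetical, and the sketch covers only part of Stage 3: it does not infer `color=white` or `category=jewelry`, which would need additional knowledge about materials and product types.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical tables; real ones would be mined or curated at scale.
PHRASES: Dict[Tuple[str, str], str] = {("white", "gold"): "white gold"}
ANNOTATIONS: Dict[str, Tuple[str, str]] = {
    "white gold": ("material", "white gold"),
    "hoop": ("style", "hoop"),
    "earrings": ("ProductType", "earrings"),
}


def bag_of_words(query: str) -> Set[str]:
    """Stage 1: unordered terms, no phrase or attribute knowledge."""
    return set(query.lower().split())


def phrase(query: str) -> List[str]:
    """Stage 2: greedily merge adjacent tokens that form a known phrase."""
    tokens = query.lower().split()
    units, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in PHRASES:
            units.append(PHRASES[pair])
            i += 2
        else:
            units.append(tokens[i])
            i += 1
    return units


def annotate(query: str) -> Tuple[Dict[str, str], List[str]]:
    """Stage 3 (partial): map units to attributes; return leftovers too."""
    annotated, leftover = {}, []
    for unit in phrase(query):
        if unit in ANNOTATIONS:
            name, value = ANNOTATIONS[unit]
            annotated[name] = value
        else:
            leftover.append(unit)
    return annotated, leftover


query = "white gold hoop earrings"
print(bag_of_words(query))
print(phrase(query))
print(annotate(query))
```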
Deterministic sorts
• For sorting, we offer
– Best Match (relevance sort)
– Time: ending soonest
– Time: newly listed
– Price + Shipping: lowest first
– Price + Shipping: highest first
– Distance: nearest first
• Deterministic (non-best-match) sorts can surface very irrelevant items
Wedding dress – Best Match
Wedding dress – Price + shipping (low)
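The sketch below shows why a pure price sort can surface irrelevant items, and one possible mitigation: apply a relevance floor before sorting. The item data, the relevance scores, and the 0.5 threshold are invented for illustration, and the slides do not say whether eBay handles the problem this way.

```python
from typing import List


def total_cost(item: dict) -> float:
    return item["price"] + item["shipping"]


def sort_price_low(items: List[dict]) -> List[dict]:
    """Plain deterministic sort: price + shipping, lowest first."""
    return sorted(items, key=total_cost)


def sort_price_low_with_floor(items: List[dict],
                              min_relevance: float = 0.5) -> List[dict]:
    """Drop items below a relevance floor, then sort by price + shipping."""
    relevant = [i for i in items if i["relevance"] >= min_relevance]
    return sorted(relevant, key=total_cost)


# Hypothetical recall set for the query [wedding dress].
items = [
    {"title": "wedding dress garment bag", "relevance": 0.2,
     "price": 3.0, "shipping": 1.0},
    {"title": "lace wedding dress", "relevance": 0.9,
     "price": 120.0, "shipping": 10.0},
    {"title": "satin wedding dress", "relevance": 0.8,
     "price": 80.0, "shipping": 15.0},
]

naive = [i["title"] for i in sort_price_low(items)]
floored = [i["title"] for i in sort_price_low_with_floor(items)]
print(naive)
print(floored)
```

In the naive sort the cheap accessory takes the top slot; with the floor applied, only actual dresses are sorted by price.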
Explicit result set construction for diversity
• “3rd-phase ranking” (after recall and ranking)
• Examine result set, enforce diverse mixes of products/interpretations
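A third-phase pass of this kind can be sketched as a greedy re-ordering that caps how many items from one group may appear in the top slots. The function, the `max_per_group` and `window` parameters, and the example groups are assumptions for illustration, not the rule set described in the talk.

```python
from typing import Callable, Hashable, List, TypeVar

T = TypeVar("T")


def diversify(ranked: List[T], key: Callable[[T], Hashable],
              max_per_group: int, window: int) -> List[T]:
    """Fill the first `window` slots with at most `max_per_group` items per group.

    Items that would exceed the cap are deferred and appended afterwards,
    keeping their original relative order.
    """
    top, deferred, seen = [], [], {}
    for item in ranked:
        group = key(item)
        if len(top) < window and seen.get(group, 0) < max_per_group:
            top.append(item)
            seen[group] = seen.get(group, 0) + 1
        else:
            deferred.append(item)
    return top + deferred


# Hypothetical ranked results for the ambiguous query [jaguar].
ranked = [
    ("jaguar xj sedan", "car"),
    ("jaguar xk coupe", "car"),
    ("jaguar f-type", "car"),
    ("jaguar plush toy", "toy"),
    ("jaguar guitar", "instrument"),
]
diverse = diversify(ranked, key=lambda r: r[1], max_per_group=2, window=4)
print([title for title, _ in diverse])
```

The third car listing is pushed below the toy and the guitar, so the top of the page covers three interpretations of the query instead of one.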
Personalization and contextualization
• Personalization
– What does this person like to {click, buy}
– Product types, price ranges, condition, shipping
• Session contextualization
– What [product type, price range, condition, etc] item did this person {click, buy} in the immediately-preceding query?
• Contextualization > personalization
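Session contextualization can be pictured as a small score boost for items that share attributes with whatever the user clicked in the preceding query. The attribute names, the additive boost, and its size are all assumptions made for this sketch; the talk does not specify how the signal is actually used.

```python
from typing import List, Sequence


def contextualize(ranked: List[dict], last_clicked: dict,
                  boost: float = 0.1,
                  attributes: Sequence[str] = ("product_type", "condition"),
                  ) -> List[dict]:
    """Re-rank by base score plus a boost per attribute shared with the last click."""
    def adjusted(item: dict) -> float:
        matches = sum(
            1 for a in attributes
            if last_clicked.get(a) is not None
            and item.get(a) == last_clicked.get(a)
        )
        return item["score"] + boost * matches

    return sorted(ranked, key=adjusted, reverse=True)


# Hypothetical session: the user just clicked a new phone case.
last_clicked = {"product_type": "phone case", "condition": "new"}
ranked = [
    {"id": "a", "score": 0.50, "product_type": "phone", "condition": "used"},
    {"id": "b", "score": 0.45, "product_type": "phone case", "condition": "new"},
    {"id": "c", "score": 0.30, "product_type": "phone case", "condition": "used"},
]
reranked_ids = [i["id"] for i in contextualize(ranked, last_clicked)]
print(reranked_ids)
```

Item `b` matches both attributes of the last click and moves ahead of the higher-scored item `a`.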
Image understanding for ranking
• Query: “red shoes” => Search Results
Technologies we like and use• Hadoop ecosystem
• Scala (Scoobi, Scalding)
• R (gbm)
Q & A