Learned Embeddings for Search and Discovery at Instacart

Posted on 21-Feb-2017


Sharath Rao, Data Scientist / Manager, Search and Discovery

Collaborator: Angadh Singh


Talk Outline

• Quick overview of Search and Discovery at Instacart

• Word2vec for product recommendations

• Extending Word2vec embeddings to improve search ranking


About me

• Worked on several ML products in catalog, search and personalization at Instacart since 2015

• Currently leading full stack search and discovery team

@sharathrao


Search and Discovery @ Instacart

Help customers find what they are looking for and discover what they might like


Grocery Shopping in “Low Dimensional Space”

Search

Restock

Discover

Search: “Here is what I want.” The query has easily accessible information content.

Discovery: “Here I am, what next?” The context is the query, and recommendations vary as contexts do.


Entities we try to model

items

products

aisles

departments

retailers

queries

customers

brands

Most of our data products are about modeling relationships between them (part of which is learning embeddings)


Common paradigm for search/discovery problems

1st phase: Candidate Generation

2nd phase: Reranking

Candidate Generation

• Select the top few hundred from among potentially millions

• Must be fast and simple; often not even a learned model

• Recall oriented

Reranking

• Ranks fewer products, but uses richer models/features

• Tuned towards high precision

• Often happens in real-time

Word2vec for Product Recommendations


“Frequently bought with” recommendations

Help customers shop for the next item

[Example recommendations: some pairs are probably consumed together; others are not necessarily consumed together]

Quick Tour of Word2vec

(simplified, with a view to developing some intuition)

One-hot encodings are good, but not good enough

• We need to represent words as features in a model/task

• Naive representation => one-hot encoding

• Problem: every word is dissimilar to every other word => not intuitive
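As a quick illustration (a toy vocabulary, not from the talk), one-hot vectors are mutually orthogonal, so no pair of distinct words is ever more similar than any other:

```python
# Toy sketch: one-hot vectors carry no notion of similarity,
# because every pair of distinct words has cosine similarity 0.
import numpy as np

vocab = ["stadium", "field", "restaurant", "game"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(one_hot["stadium"], one_hot["field"]))       # 0.0
print(cosine(one_hot["stadium"], one_hot["restaurant"]))  # 0.0, same as any other pair
```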

What one-hot encodings fail to capture

Data:

Observation 1: “game at the stadium”
Observation 2: “game at the field”
Observation 3: “ate at the restaurant”
Observation 4: “met at the game”

What models must learn:

• stadium, field and restaurant are in some sense similar

• stadium and field are more similar than stadium and restaurant

• game has another meaning/sense that is similar to stadium/field/restaurant

None of these are easy to learn with one-hot encoded representations without supplementing with hand-engineered features

“You shall know a word by the company it keeps” (Firth, J.R., 1957)

Core motivation behind semantic word representations

Word2vec is a scalable way to learn semantic word representations

• Learns a feature vector per word

• For a 1M vocabulary and 100-dimensional vectors, we learn 1M vectors => 100M numbers

• Vectors must be such that words appearing in similar contexts are closer than any random pair of words

• Must work with unlabeled training data

• Must scale to large (unlabeled) datasets

• The embedding space itself must be general enough that word representations are broadly applicable

Word2vec beyond text and words


On graphs where random walks are the sequences


On songs where plays are the sequences


Even the emojis weren’t spared!

You shall know a product by what it's purchased with
(after Firth: you shall know a word by the company it keeps)


We applied word2vec to purchase sequences

Typical purchasing session contains tens of cart adds

Sequences of products added to cart are the ‘sentences’

“Frequently Bought With” Recommendations

Pipeline: Event Data -> Extract Training Sequences -> Learn word2vec representations -> Approximate Nearest Neighbors -> Eliminate substitute products -> Cache recommendations
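Below is a minimal sketch of how such a pipeline could be wired up with gensim and Annoy; the session data, hyperparameters, and substitute filter are illustrative assumptions, not Instacart's production setup.

```python
# Sketch of the "frequently bought with" pipeline with gensim word2vec + Annoy ANN.
# Sessions, hyperparameters, and the substitute filter are placeholders.
from gensim.models import Word2Vec
from annoy import AnnoyIndex

# Each "sentence" is one shopping session: the product ids added to cart, in order.
cart_sequences = [
    ["prod_123", "prod_456", "prod_789"],
    ["prod_456", "prod_111", "prod_123"],
    # ... millions more sessions extracted from event data
]

# Treat product ids as words (skip-gram); raise min_count for real data volumes.
model = Word2Vec(cart_sequences, vector_size=100, window=5, min_count=1, sg=1, workers=4)

# Index product vectors for approximate nearest neighbor lookup.
index = AnnoyIndex(model.vector_size, "angular")
product_ids = model.wv.index_to_key
for i, pid in enumerate(product_ids):
    index.add_item(i, model.wv[pid])
index.build(50)  # number of trees: more trees => better recall, slower build

def frequently_bought_with(pid, k=10):
    """k approximate nearest neighbors in embedding space (then filter substitutes, cache)."""
    i = product_ids.index(pid)
    neighbors = index.get_nns_by_item(i, k + 1)[1:]  # drop the product itself
    return [product_ids[j] for j in neighbors]

print(frequently_bought_with("prod_123", k=2))
```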

Word2vec for product recommendations

Surfaces complementary products

Next step is to make recommendations contextual

• Not ideal if the customer has already shopped for sugar recently

• Inappropriate if they are allergic to walnuts

• Not ideal if their favorite brand of butter isn't otherwise popular

We see word2vec recommendations as a candidate generation step

Word2vec for Search Ranking

The Search Ranking Problem

In response to a query, rank products in the order that maximizes the probability of purchase

The Search Ranking Problem

Candidate Generation:
• Boolean matching
• BM25 ranking
• Synonym/semantic expansion

(Online) Reranking with a Learning to Rank model:
• Matching features
• Historical aggregates
• Personalization features

Search Ranking - Training

Event Data -> Label Generation with Implicit Feedback -> Generate training features -> Learning to Rank Training -> Model Repository

Search Ranking - Scoring

query -> processed query -> Top N candidate products -> Online reranker (model loaded from the Model Repository) -> Reranked products -> Final ranking

Features for Learning to Rank

Historical aggregates => normalized purchase counts at different granularities:
[product], [query, aisle], [query, brand], [user, brand], [query, product], [query, product, user]

Coarser aggregates have high coverage but low precision; finer aggregates have low coverage but more precision.

But historical aggregates alone are not enough, because of sparsity and cold start.
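For intuition, here is a toy sketch of one such aggregate, a smoothed purchase rate per (query, product); the column names and smoothing constants are hypothetical, not from the talk.

```python
# Toy sketch of a historical-aggregate feature: a smoothed, normalized
# purchase rate per (query, product). Schema and constants are hypothetical.
import pandas as pd

events = pd.DataFrame({
    "query":     ["milk", "milk", "milk", "bread", "bread"],
    "product":   ["p1",   "p2",   "p1",   "p3",    "p4"],
    "purchased": [1,       0,      1,      1,       0],
})

agg = (events.groupby(["query", "product"])["purchased"]
             .agg(purchases="sum", impressions="count")
             .reset_index())

# Additive smoothing so rare (query, product) pairs aren't over-trusted.
alpha, beta = 1.0, 10.0
agg["qp_purchase_rate"] = (agg["purchases"] + alpha) / (agg["impressions"] + beta)
print(agg)
```

The same pattern applies at coarser granularities like [query, brand] or [query, aisle], which trade precision for coverage.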

v

Learning word2vec embeddings from search logs

• Representations learnt from Wikipedia or petabytes of generic text aren't ideal for Instacart

• No constraint that word2vec models must be learnt on temporal or spatial sequences

• We constructed training data for word2vec from search logs


Training Contexts from Search Logs

• Create contexts that are observed and desirable

<query1> <purchased description of converted product1> <query2> <purchased description of converted product2> .. ..

• Learn embeddings for each word in the unified vocabulary of query tokens and catalog product descriptions
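A minimal sketch of constructing such contexts; the session format below is an assumed example, not the actual log schema.

```python
# Sketch: interleave each query with the description of the product it converted to,
# in session order, to form one word2vec training "sentence" per session.
def build_training_contexts(sessions):
    """sessions: list of sessions, each a list of (query, converted_product_description)."""
    contexts = []
    for session in sessions:
        tokens = []
        for query, product_description in session:
            tokens.extend(query.lower().split())
            tokens.extend(product_description.lower().split())
        contexts.append(tokens)
    return contexts

sessions = [[("bread", "Oroweat Whole Wheat Bread"),
             ("raisins", "Sun-Maid Natural California Raisins")]]
print(build_training_contexts(sessions))
# [['bread', 'oroweat', 'whole', 'wheat', 'bread', 'raisins', 'sun-maid', ...]]
```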

Examples of nearest neighbors in embedding space

bread: orowheat, bonaparte, zissel, ryebread
raisins: sunmade, monukka, raisinettes, fruitsource
mushrooms: cremini, portabello, shitake

OK, so now we have word representations. But matching features require product features in the same space.

Deriving other features from word embeddings

• One option is to create contexts with product identifiers in the training sequences

• Promising, but we would have a product cold start problem

• Instead, use the learnt word embeddings to derive features for other entities such as products, brand names, queries and users

Simple averaging of word representations works well

Word: representation learnt from converted search logs
Query: average embeddings of words in the query
Product: average embeddings of words in the product description
Brand: average embeddings of products sold by the brand
User: average embeddings of products purchased by the user
Aisle/Department: average embeddings of products in the aisle/department
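A sketch of the averaging step, assuming `wv` is a trained word-vector lookup (e.g. gensim KeyedVectors); the helper names are hypothetical.

```python
# Sketch: derive entity embeddings by averaging word vectors.
import numpy as np

def embed_text(wv, text, dim=100):
    """Average the embeddings of in-vocabulary words in a query or product description."""
    vecs = [wv[w] for w in text.lower().split() if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def embed_products(wv, descriptions, dim=100):
    """Average product embeddings, e.g. over a brand's catalog or a user's purchase history."""
    vecs = [embed_text(wv, d, dim) for d in descriptions]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# query_vec   = embed_text(wv, "whole wheat bread")
# product_vec = embed_text(wv, "Oroweat Whole Wheat Bread")
# brand_vec   = embed_products(wv, descriptions_of_products_sold_by_brand)
# user_vec    = embed_products(wv, descriptions_of_products_purchased_by_user)
```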

Wait, so we have 2 different representations for products?

Embeddings learnt from purchase sequences: products are similar if they are bought together
Embeddings derived from words in search logs: products are similar if their descriptions are semantically similar

Product representations in different embedding spaces

Embeddings learnt from purchase sequences: products are similar if they are bought together

Product representations in different embedding spaces

Embeddings derived from words in search logs: products are similar if their descriptions are semantically similar

Examples of nearest neighbors for products

Cinnamon Toast Crunch Cereal -> Golden Grahams Cereal

Not much in common between the product names


Examples of nearest neighbors for brands


Using word2vec features in search ranking

• Construct similarity scores between different entities as features

Matching features: [query, product], [query, brand], [query, aisle], [query, department]
Personalization features: [user, product], [user, brand], [user, aisle], [user, department]
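A sketch of turning these entity embeddings into ranking features via cosine similarity; the feature names are hypothetical.

```python
# Sketch: similarity scores between entity embeddings as matching and
# personalization features for the learning-to-rank model.
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def word2vec_features(query_vec, user_vec, product_vec, brand_vec, aisle_vec, dept_vec):
    return {
        # matching features
        "w2v_query_product":    cosine(query_vec, product_vec),
        "w2v_query_brand":      cosine(query_vec, brand_vec),
        "w2v_query_aisle":      cosine(query_vec, aisle_vec),
        "w2v_query_department": cosine(query_vec, dept_vec),
        # personalization features
        "w2v_user_product":     cosine(user_vec, product_vec),
        "w2v_user_brand":       cosine(user_vec, brand_vec),
        "w2v_user_aisle":       cosine(user_vec, aisle_vec),
        "w2v_user_department":  cosine(user_vec, dept_vec),
    }
```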

We saw significant improvement with word2vec features

[Chart: relative improvement with word2vec features over the baseline, measured on AUC and Recall@10]

word2vec features rank high among top features

Model: XGBoost, logistic loss, with early stopping

Features in the importance chart: [query, aisle] word2vec, [query, product] historical aggregate, [query, department] word2vec, [user, product] word2vec, lda model 2, lda model 1, query length, [user, brand] word2vec, [query, product] word2vec, bm25, position
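A minimal sketch of the training setup named above (XGBoost, logistic loss, early stopping), with synthetic data standing in for the real feature and label generation.

```python
# Sketch: train an XGBoost reranker with logistic loss and early stopping.
# Synthetic features/labels below stand in for (query, product) training rows.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 20)), rng.integers(0, 2, size=1000)
X_valid, y_valid = rng.normal(size=(200, 20)),  rng.integers(0, 2, size=200)

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {"objective": "binary:logistic", "eval_metric": "auc", "eta": 0.1, "max_depth": 6}

booster = xgb.train(
    params,
    dtrain,
    num_boost_round=1000,
    evals=[(dvalid, "valid")],
    early_stopping_rounds=25,  # stop when validation AUC stops improving
)

# At serving time, rank a query's candidates by predicted purchase probability:
# scores = booster.predict(xgb.DMatrix(X_candidates))
```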

Other contextual recommendation problems


Broad based discovery oriented recommendations

Including from stores customers may have never shopped from


Run out of X?

Rank products by repurchase probability


Introduce customers to products new in the catalog

Also mitigates product cold start problems


Replacement Product Recommendations

Mitigate adverse impact of last-minute out of stocks

Data Products in Search and Discovery

Search:
• query autocorrection
• query spell correction
• query expansion
• deep matching / document expansion
• search ranking
• search advertising

Discovery:
• substitute/replacement products
• frequently bought with products
• next basket recommendations
• guided discovery
• interpretable recommendations


Thank you!

We are hiring!

Senior Machine Learning Engineer http://bit.ly/2kzHpcg