How Lucene Powers the LinkedIn Segmentation and Targeting Platform

Date post: 15-Jan-2015
Upload: hien-luu

HOW LUCENE POWERS LINKEDIN SEGMENTATION & TARGETING PLATFORM

Hien Luu & Raj Rangaswamy

About Us

Hien Luu, Rajasekaran Rangaswamy

Agenda

•  A little bit about LinkedIn
•  Segmentation & Targeting Platform Overview
•  How Lucene powers Segmentation & Targeting Platform
•  Q&A

Our Mission: Connect the world’s professionals to make them more productive and successful.

Our Vision: Create economic opportunity for every professional in the world.

Members First!

The world’s largest professional network
Over 65% of members are now international

•  >3M Company Pages
•  19 languages
•  >30M
•  >90% of Fortune 100 companies use LinkedIn Talent Solutions to hire
•  >5.7B professional searches in 2012

©2013 LinkedIn Corporation. All Rights Reserved.

Other Company Facts

•  Headquartered in Mountain View, Calif., with offices around the world
•  LinkedIn has ~4200 full-time employees located around the world

Segmentation & Targeting Platform Overview

Segmentation & Targeting Platform Overview

1. Create attributes

•  Name
•  Email
•  State
•  Occupation
•  Etc.

2. Attributes Added to Table

Name         Email              State        Occupation   …
John Smith   [email protected]   California   Engineer
Jane Smith   [email protected]   Nevada       HR Manager

3. Create Target Segment: California, Engineer

Name         Email              State        Occupation
John Smith   [email protected]   California   Engineer
Jane Doe     [email protected]   California   Engineer

4. Export List & Send to Vendor

Jane Doe     [email protected]   California   Engineer
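The four steps above can be sketched in a few lines of Python. This is only an illustration of the workflow, not the platform's code: the rows, the placeholder email addresses, and the helper names `build_segment` and `export_list` are all assumptions made for the example.

```python
import csv
import io

# Hypothetical member rows; the emails are placeholders, not data from the slides.
MEMBERS = [
    {"name": "John Smith", "email": "john@example.com",
     "state": "California", "occupation": "Engineer"},
    {"name": "Jane Smith", "email": "jane.smith@example.com",
     "state": "Nevada", "occupation": "HR Manager"},
    {"name": "Jane Doe", "email": "jane.doe@example.com",
     "state": "California", "occupation": "Engineer"},
]


def build_segment(rows, **criteria):
    """Return the rows whose attributes equal every given criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def export_list(rows, columns):
    """Serialize a segment as CSV text containing only the chosen columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Step 3: target segment "California, Engineer"; step 4: export the list.
segment = build_segment(MEMBERS, state="California", occupation="Engineer")
exported = export_list(segment, ["name", "email"])
```

A real deployment would evaluate the criteria against an index rather than scanning rows, which is the subject of the Lucene part of the talk.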

Segmentation & Targeting Platform Overview

•  Business definition
   –  Business would like to launch new campaigns often
   –  Business would like to specify targeting criteria using an arbitrary set of attributes
   –  Attributes need to be computed to fulfill the targeting criteria
   –  The attribute data resides on Hadoop or TD
   –  Business is most comfortable with a SQL-like language

Segmentation & Targeting Platform Overview

Attribute Computation Engine

Attribute Serving Engine

Segmentation & Targeting Platform Overview

Attribute Computation Engine

•  Self-service
•  Support various data sources
•  Attribute consolidation
•  Attribute availability

Segmentation & Targeting Platform Overview

Attribute computation (figure labels: ~238M, PB, TB, TB, ~440)

Segmentation & Targeting Platform Overview

Attribute Serving Engine

•  Self-service
•  Attribute predicate expression
•  Build segments
•  Build lists

Segmentation & Targeting Platform Overview

Attribute Serving Engine

•  Supported operations: count, filter, sum, complex expressions

(Figure labels: ~238M, ~440)

Segmentation & Targeting Platform Overview

Who are the North American recruiters who don’t work for a competitor?

Who are the LinkedIn Talent Solution prospects in Europe?

Who are the job seekers?

How Lucene powers Segmentation & Targeting Platform

How Lucene powers Segmentation & Targeting Platform

•  Architecture
   –  Indexer Architecture
   –  Serving Architecture
•  Load Balanced Model
•  Next Steps - Distributed Model
•  DocValues
•  Lessons Learnt
•  Why not use an existing solution?

Architecture

Diagram components:

•  Data Storage Layer
•  Attribute Creation Engine
•  Attribute Materialization Engine
•  Attribute Computation Engine
•  Attribute Metastore
•  Attribute Indexing
•  Attribute Serving Engine

Architecture

Indexer diagram components:

•  Avro data in HDFS
•  MySQL attribute store
•  Attribute Definitions
•  Hadoop Indexer MR
•  HDFS: shard 1, shard 2, ... shard n
•  Index Merger
•  Web Servers

Hadoop Indexer MR types:

•  Mapper: K => AvroKey<GenericRecord>, V => AvroValue<NullWritable>
•  Reducer: K => NullWritable, V => LuceneDocumentWrapper
•  LuceneOutputFormat RecordWriter: LuceneDocumentWrapper -> Document -> Index
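The indexing flow in the diagram (records mapped to shards, one index built per shard) can be imitated without Hadoop or Lucene. The sketch below is a plain-Python stand-in under stated assumptions: the CRC-based shard assignment, the record layout, and the function names are invented for illustration and say nothing about how the actual job partitions data.

```python
import zlib
from collections import defaultdict


def shard_for(member_id, num_shards):
    """Deterministic shard assignment (assumption: hash of the member id)."""
    return zlib.crc32(str(member_id).encode("utf-8")) % num_shards


def map_phase(records, num_shards):
    """Mapper stand-in: group incoming records by their target shard."""
    grouped = defaultdict(list)
    for record in records:
        grouped[shard_for(record["id"], num_shards)].append(record)
    return grouped


def reduce_phase(records):
    """Reducer stand-in: build one inverted index, (field, value) -> sorted ids."""
    index = defaultdict(list)
    for record in sorted(records, key=lambda r: r["id"]):
        for field, value in record.items():
            if field != "id":
                index[(field, value)].append(record["id"])
    return dict(index)


def build_shards(records, num_shards):
    """Run both phases and return {shard number: inverted index}."""
    grouped = map_phase(records, num_shards)
    return {s: reduce_phase(grouped.get(s, [])) for s in range(num_shards)}


# Made-up input: ten members with a single "state" attribute each.
records = [{"id": i, "state": "California" if i % 2 == 0 else "Nevada"}
           for i in range(10)]
shards = build_shards(records, 3)
```

Because every member lands in exactly one shard, a query has to consult all shards and combine the partial results.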

Architecture

Serving flow (diagram): JSON Predicate Expression -> JSON Lucene Query Parser -> Inverted Index (x3) -> Segment & List
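To make the serving flow concrete, here is a minimal sketch of turning a JSON predicate expression into set operations over an inverted index. The JSON shape (`term`, `and`, `or`, `not`), the field names, and the toy index are assumptions for illustration; the slides do not show the platform's actual predicate format, and no Lucene classes are used.

```python
import json

# Toy inverted index: (field, value) -> set of member ids. All data is made up.
INDEX = {
    ("region", "north_america"): {1, 2, 3, 5},
    ("region", "europe"): {4, 6},
    ("occupation", "recruiter"): {1, 3, 4, 5},
    ("employer", "competitor"): {3},
}
ALL_IDS = set().union(*INDEX.values())


def evaluate(node):
    """Recursively evaluate a predicate node into the set of matching ids."""
    if "term" in node:
        term = node["term"]
        return set(INDEX.get((term["field"], term["value"]), set()))
    if "and" in node:
        results = [evaluate(child) for child in node["and"]]
        return set.intersection(*results) if results else set(ALL_IDS)
    if "or" in node:
        return set().union(*(evaluate(child) for child in node["or"]))
    if "not" in node:
        return ALL_IDS - evaluate(node["not"])
    raise ValueError(f"unknown predicate node: {node!r}")


# "North American recruiters that don't work for a competitor"
PREDICATE = json.loads("""
{"and": [
  {"term": {"field": "region", "value": "north_america"}},
  {"term": {"field": "occupation", "value": "recruiter"}},
  {"not": {"term": {"field": "employer", "value": "competitor"}}}
]}
""")

segment = sorted(evaluate(PREDICATE))
```

Each boolean operator maps onto a set operation over posting lists, which is roughly the job a boolean query performs inside a search library.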

Serving – Load Balanced Model

Diagram: HTTP Request -> Load Balancer -> Web Server 1, Web Server 2, ... Web Server n -> Shared Drive (Shard 1, Shard 2, ... Shard n)

Serving – Load Balanced Model

But wait...

•  Is load balancing alone good enough?
•  What about distribution and failover?

Next Steps – Distributed Model

•  A generic cluster management framework
•  Manages partitioned and replicated resources in distributed systems
•  Built on top of ZooKeeper; hides the complexity of ZK primitives
•  Provides distributed features such as leader election, two-phase commit, etc. via a state machine model

Apache Helix: http://helix.incubator.apache.org/

Next Steps – Distributed Model

Diagram: HTTP Request -> Load Balancer -> Web Servers (Scatter Gather)

Web Server     Active    Standby
Web Server 1   Shard 1   Shard 2
Web Server 2   Shard 2   Shard 3
Web Server 3   Shard 3   Shard 1

Next Steps – Distributed Model

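Diagram: HTTP Request -> Load Balancer -> Web Servers (Scatter Gather), with one server failed

Web Server     First replica        Second replica
Web Server 1   Shard 1 (active)     Shard 2 (standby)
Web Server 2   Shard 2 (active)     Shard 3 (active)
Web Server 3   Shard 3 (failure)    Shard 1 (failure)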
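The failover behaviour in the diagram can be sketched as follows: when a server fails, the standby replica of each shard it was actively serving is promoted, and scatter-gather keeps querying one active replica per shard. This is a toy model only. The `Node` class, the function names, and the shard contents are assumptions, and it does not use Helix or ZooKeeper.

```python
class Node:
    """A web server holding shard replicas, each 'active' or 'standby'."""

    def __init__(self, name, replicas):
        self.name = name
        self.replicas = dict(replicas)  # shard name -> state
        self.alive = True


def fail_node(cluster, name):
    """Mark a node as failed and promote standbys for its active shards."""
    failed = next(n for n in cluster if n.name == name)
    failed.alive = False
    for shard, state in failed.replicas.items():
        if state != "active":
            continue
        for node in cluster:
            if node.alive and node.replicas.get(shard) == "standby":
                node.replicas[shard] = "active"
                break


def scatter_gather(cluster, shard_data, predicate):
    """Query each shard through its live active replica and merge the hits."""
    hits = []
    for shard in sorted(shard_data):
        owner = next((n for n in cluster
                      if n.alive and n.replicas.get(shard) == "active"), None)
        if owner is None:
            raise RuntimeError(f"no active replica for {shard}")
        hits.extend(doc for doc in shard_data[shard] if predicate(doc))
    return sorted(hits)


cluster = [
    Node("web1", {"shard1": "active", "shard2": "standby"}),
    Node("web2", {"shard2": "active", "shard3": "standby"}),
    Node("web3", {"shard3": "active", "shard1": "standby"}),
]
shard_data = {"shard1": [1, 4], "shard2": [2, 5], "shard3": [3, 6]}  # made-up doc ids


def is_even(doc):
    return doc % 2 == 0


before = scatter_gather(cluster, shard_data, is_even)
fail_node(cluster, "web3")
after = scatter_gather(cluster, shard_data, is_even)
```

In this model the query result is unchanged after the failure, but shard 1 and shard 3 are each left with a single copy until the failed server returns.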

DocValues – Use Case

•  Once segments are built, users want to forecast, that is, see a target revenue projection for the campaigns that they want to run.
•  Campaigns can be run on various revenue models.
•  This involves adding per-member propensity scores and dollar amounts.

DocValues – Why not Stored Fields?

•  Stored fields have one indirection per document, resulting in two disk seeks per document
•  Performance cost quickly adds up when fetching millions of documents

Lookup path: Document ID -> .fdx (fetch file pointer to field data) -> .fdt (scan by id until field is found)
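The lookup path above can be imitated with two in-memory buffers. The layout below is a deliberately simplified stand-in for the `.fdx` and `.fdt` files, not Lucene's real on-disk format; the encoding and the function names are assumptions chosen to show the pointer lookup followed by a scan.

```python
import io
import struct


def write_stored_fields(docs):
    """Build a toy '.fdx' (one 8-byte offset per doc) and '.fdt' (field data)."""
    fdx, fdt = io.BytesIO(), io.BytesIO()
    for fields in docs:
        fdx.write(struct.pack(">q", fdt.tell()))      # where this doc starts
        fdt.write(struct.pack(">i", len(fields)))     # number of fields
        for name, value in fields.items():
            for text in (name, value):
                raw = text.encode("utf-8")
                fdt.write(struct.pack(">i", len(raw)) + raw)
    return fdx, fdt


def read_string(buf):
    """Read one length-prefixed UTF-8 string from the buffer."""
    (length,) = struct.unpack(">i", buf.read(4))
    return buf.read(length).decode("utf-8")


def load_field(fdx, fdt, doc_id, wanted):
    """Two seeks per document: one into the index, one into the data."""
    fdx.seek(doc_id * 8)                              # seek 1: pointer lookup
    (offset,) = struct.unpack(">q", fdx.read(8))
    fdt.seek(offset)                                  # seek 2: jump to the doc
    (count,) = struct.unpack(">i", fdt.read(4))
    for _ in range(count):                            # scan until field found
        name, value = read_string(fdt), read_string(fdt)
        if name == wanted:
            return value
    return None


# Made-up documents; values are strings, so numbers must be parsed after loading.
docs = [{"name": "a", "score": "0.25"}, {"name": "b", "score": "0.75"}]
fdx, fdt = write_stored_fields(docs)
score = float(load_field(fdx, fdt, 1, "score"))
```

Repeating this pair of seeks, plus the string parsing, for millions of documents is the cost the slide warns about.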

DocValues – Why not Stored Fields?

•  Why not use Field Cache?
   –  Is memory resident
   –  Works fine when there is enough memory
   –  But keeping millions of un-inverted values in memory is impossible
   –  Additional cost to parse values (from String and to String)

DocValues

•  Dense column-based storage
   –  (1 value per document and 1 column per field and segment)
•  Accepts primitives
•  No conversion from/to String needed
•  Loads 80x-100x faster than building a FieldCache
•  All the work is done during indexing
•  DocValue fields can be indexed and stored too
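A rough picture of column-based storage, using Python's `array` module in place of Lucene's DocValues: each field is one dense array of primitives indexed by document id, so reading a value needs no pointer chasing and no string parsing. The `NumericColumn` class and the revenue formula (propensity score times dollar amount, summed over the segment) are assumptions for illustration; the slides do not give the actual projection formula.

```python
from array import array


class NumericColumn:
    """One dense column: exactly one primitive value per document id."""

    def __init__(self, typecode="d"):
        self.values = array(typecode)

    def add(self, value):
        self.values.append(value)   # position in the array is the doc id

    def get(self, doc_id):
        return self.values[doc_id]


def projected_revenue(doc_ids, propensity, dollars):
    """Sum propensity * dollar amount over the documents in a segment."""
    return sum(propensity.get(d) * dollars.get(d) for d in doc_ids)


# Made-up per-member values, written once at indexing time.
propensity = NumericColumn("d")
dollars = NumericColumn("d")
for score, amount in [(0.5, 100.0), (0.25, 200.0), (1.0, 40.0)]:
    propensity.add(score)
    dollars.add(amount)

# Forecast for a segment that matched documents 0 and 2.
forecast = projected_revenue([0, 2], propensity, dollars)
```

The per-document work at query time is two array reads and a multiplication, which is why this layout suits aggregations over millions of matching documents.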

Lessons Learnt

Indexing

•  Reuse index writers, field and document instances
•  Create many partitions and merge them in a different process
•  Rebuild (bootstrap) entire index if possible
•  Use partial updates with caution
•  Analyze the index

Lessons Learnt

Serving

•  Reuse a single instance of IndexSearcher
•  Limit usage of stored fields and term vectors
•  Plan for load balancing and failover
•  Cache term frequencies
•  Use different machines for serving and indexing
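Two of the serving lessons, reusing one searcher and caching term frequencies, can be shown with a toy searcher. This is a sketch of the pattern only: the `Searcher` class, `get_searcher`, and the postings data are hypothetical and do not reflect Lucene's `IndexSearcher` API.

```python
class Searcher:
    """Toy searcher over term -> list of doc ids; meant to be created once."""

    def __init__(self, postings):
        self._postings = postings
        self._freq_cache = {}
        self.computed = 0           # how many frequencies were actually computed

    def doc_freq(self, term):
        """Return the number of documents containing the term, cached per term."""
        if term not in self._freq_cache:
            self.computed += 1
            self._freq_cache[term] = len(self._postings.get(term, ()))
        return self._freq_cache[term]


_SEARCHER = None


def get_searcher(postings):
    """Return the process-wide searcher, creating it on first use only."""
    global _SEARCHER
    if _SEARCHER is None:
        _SEARCHER = Searcher(postings)
    return _SEARCHER


# Made-up postings lists.
POSTINGS = {"engineer": [1, 2, 5], "recruiter": [3, 4]}
first = get_searcher(POSTINGS)
second = get_searcher(POSTINGS)
frequencies = [first.doc_freq("engineer"), second.doc_freq("engineer")]
```

Sharing one instance means its caches stay warm across requests; a per-request searcher would recompute the same frequencies every time.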

Why not use existing solutions?

•  Doesn’t allow dynamic schema
•  Difficult to bootstrap indexes built in Hadoop
•  Indexing elevates query latency

•  Doesn’t allow dynamic schema
•  Difficult to bootstrap indexes built in Hadoop
•  Larger memory overhead
•  Comparatively slow

Questions?

More info: data.linkedin.com

