
Data Science outside the box: Developing a generic scoring algorithm for customer acquisition

Date post: 09-Feb-2017
Upload: eoda-gmbh
Transcript
Page 1: Data Science outside the box: Developing a generic scoring algorithm for customer acquisition

© 2010 – 2016 eoda GmbH | Erik Barzagar-Nazari | eRum 2016

Erik Barzagar-Nazari, Data Scientist

Data Science outside the box:

Developing a Generic Scoring Algorithm for Customer Acquisition

Page 2

© 2010 – 2016 eoda GmbH | Erik Barzagar-Nazari | www.eoda.de

About eoda

Interdisciplinary team: statisticians | engineers | economists | sociologists | …

Based in Kassel, Germany

Data Science Consulting, Training, Support, Software and Analytic Services with a focus on R

Page 3

Aims of Today’s Talk

I. Present a real-world case study

II. Discuss its unique challenges

III. Take a look at our solution

IV. Reflect on the benefits of using R

Page 4

Our Client: databyte GmbH

Provides business information

Database of about five million companies

100 million pieces of information, such as sales, size, branches and many more

Updated daily!

Page 5

Use Case: Customer Acquisition

databyte’s clients are usually businesses/organizations…

…looking for new business clients (e.g. for direct marketing campaigns)

Page 6

Use Case: Customer Acquisition

[Diagram: Start → list of current customers → scoring → dataset of new potential business clients]

Page 7

Case Study: Our Task

Main task: develop a new scoring algorithm that…

…learns from the client’s current customer base and…

…identifies the most promising entries in databyte’s database.

Page 8

Challenges

Image source: http://vignette3.wikia.nocookie.net/simpsons/images/4/43/Daredevil_bart.jpg/revision/latest/scale-to-width-down/1000?cb=20160619043051

Page 9

Challenges | Training on Customer Data

Standard approach: train a binary classifier to distinguish between non-customers and customers (labels {0, 1}).

Bad news: this does not work in our case, because we only know the positive data.

Page 10

Challenges | Training on Customer Data

P: Positive data = customer data. Already known customers of the client.

U: Unlabeled data = databyte’s database. Contains companies that may fit into the client’s customer base as well as companies that do not.

N: Negative data = ? Companies that definitely do not fit into the client’s customer base.

Page 11

Challenges | Training on Customer Data

Positive-unlabeled (PU) classification: there are strategies to deal with PU problems, but…

…there are no well-established best practices yet,

…the strategies usually require strong assumptions, and

…PU classifiers require a lot of tuning and are quite fragile.

Page 12

Challenges | Self-Training Algorithm

databyte has many clients

Page 13

Challenges | Self-Training Algorithm

The scoring algorithm must be able to train itself on previously unseen training data (= customer lists)!

Page 14

Challenges | Conclusion

PU

We have to get creative!

Page 15

Solution

Page 16

Solution | Basic Idea

Our approach is based on similarities.

Core concept:

1. Cluster the customer data and extract the medoids; these are representative customers.

2. Calculate similarities between the database entries and the medoids.
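The two steps of the core concept can be illustrated with a small Python sketch. It assumes that companies are already encoded as numeric feature vectors and that cluster labels for the customers are already available (the talk uses partitioning around medoids for the clustering itself). The helper names and the similarity measure `1 / (1 + Euclidean distance)` are hypothetical choices for illustration, not the measures used in the project.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def extract_medoids(points: List[Vector], labels: List[int]) -> Dict[int, Vector]:
    """Step 1 (second half): for each cluster, return the member with the
    smallest total distance to the other members, i.e. the cluster's medoid."""
    clusters: Dict[int, List[Vector]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return {
        lab: min(members, key=lambda c: sum(euclidean(c, m) for m in members))
        for lab, members in clusters.items()
    }


def similarity(a: Vector, b: Vector) -> float:
    """Hypothetical similarity: 1 for identical vectors, approaching 0 with distance."""
    return 1.0 / (1.0 + euclidean(a, b))


def score(database: List[Vector], medoids: List[Vector]) -> List[float]:
    """Step 2: compare every database entry with every medoid (cross-proximities)
    and score each entry by its similarity to the closest medoid."""
    return [max(similarity(entry, m) for m in medoids) for entry in database]


# Toy data: two customer clusters; the labels are assumed to come from a clustering step.
customers = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]
labels = [0, 0, 0, 1, 1, 1]
medoids = extract_medoids(customers, labels)
scores = score([[1.1, 1.0], [20.0, 20.0]], list(medoids.values()))
```

Here the first database entry lies close to a representative customer and receives a high score, while the second is far from both medoids and scores low.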

Page 17

Solution | Basic Steps

Segmentation step: identify segments based on branches.

Core concept:

1. Cluster the customer data and extract the medoids; these are representative customers.

2. Calculate similarities between the database entries and the medoids.

Weighting step: weight the similarities based on the distribution of branches.
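A minimal Python sketch of the segmentation and weighting steps follows. The slides do not specify the weighting scheme; here we assume, hypothetically, that an entry's raw similarity is multiplied by the share of the client's customers in the entry's branch, so entries from branches without any customers receive a score of 0. All function names are illustrative, and the raw similarities in the usage lines are placeholder inputs.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Features = Sequence[float]


def segment_by_branch(companies: List[Tuple[str, Features]]) -> Dict[str, List[Features]]:
    """Segmentation step: group the feature vectors of companies by branch."""
    segments: Dict[str, List[Features]] = defaultdict(list)
    for branch, features in companies:
        segments[branch].append(features)
    return dict(segments)


def branch_weights(customer_branches: List[str]) -> Dict[str, float]:
    """Hypothetical weighting: share of the client's customers in each branch."""
    counts = Counter(customer_branches)
    total = sum(counts.values())
    return {branch: n / total for branch, n in counts.items()}


def weighted_scores(entries: List[Tuple[str, float]], weights: Dict[str, float]) -> List[float]:
    """Weighting step: scale each entry's raw similarity by its branch weight
    (0.0 for branches in which the client has no customers)."""
    return [weights.get(branch, 0.0) * sim for branch, sim in entries]


customers = [("retail", [1.0, 2.0]), ("retail", [1.5, 2.5]),
             ("retail", [0.5, 1.5]), ("logistics", [7.0, 3.0])]
segments = segment_by_branch(customers)  # each segment could then be clustered on its own
weights = branch_weights([branch for branch, _ in customers])
# (branch, raw similarity) pairs; placeholder values standing in for the medoid comparison.
scores = weighted_scores([("retail", 0.8), ("logistics", 0.8), ("banking", 0.9)], weights)
```

With three retail customers and one logistics customer, a retail entry keeps 75 % of its similarity, a logistics entry 25 %, and a banking entry is suppressed entirely.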

Page 18

Solution | Pros & Cons

Pro

It works and performs well!

Comprehensible approach, even for laymen.

Con

Similarity calculation is computationally costly.

Lack of “rock-solid” theory.
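Why the similarity calculation is costly can be seen from a back-of-the-envelope estimate: every database entry has to be compared with every medoid, and each comparison touches every feature. Only N ≈ 5 million companies comes from the slides; k = 10 medoids and d = 20 features are hypothetical values chosen for illustration.

```latex
\text{cost} \in \mathcal{O}(N \cdot k \cdot d), \qquad
N \approx 5 \times 10^{6},\; k = 10,\; d = 20
\;\Rightarrow\;
N \cdot k = 5 \times 10^{7} \text{ similarities}, \quad
N \cdot k \cdot d = 10^{9} \text{ feature comparisons}
```

This estimate applies to a single customer list and is repeated for every client that submits one.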

Page 19

Benefits of Using R

Page 20

Benefits of Using R

{data.table}: fast & efficient data handling

{proxy}: library of distance and similarity measures; allows calculation of cross-proximities; many measures are implemented in C!

fpc::pamk(): partitioning around medoids… …with estimation of the number of clusters
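As we understand its documentation, `fpc::pamk()` runs partitioning around medoids for a range of cluster numbers and, by default, keeps the number with the best average silhouette width, so the number of representative customers does not have to be fixed by hand for each new customer list. The Python sketch below imitates that idea with a simplified k-medoids (farthest-point initialisation followed by alternating assignment and medoid updates) instead of the full PAM BUILD/SWAP algorithm; it is not the `fpc` implementation, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_medoids(points: List[Vector], k: int) -> List[int]:
    """Start with the most central point, then repeatedly add the point farthest
    from the medoids chosen so far (assumes at least k distinct points)."""
    n = len(points)
    medoids = [min(range(n), key=lambda i: sum(dist(points[i], q) for q in points))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist(points[i], points[m]) for m in medoids)))
    return medoids


def assign(points: List[Vector], medoids: List[int]) -> List[int]:
    """Label each point with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: dist(p, points[medoids[j]])) for p in points]


def k_medoids(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Alternate between assigning points and moving each medoid to the most
    central member of its cluster; returns sorted medoid indices."""
    medoids = init_medoids(points, k)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for j, m in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the medoid of an empty cluster unchanged
                new_medoids.append(m)
                continue
            new_medoids.append(min(members, key=lambda c: sum(dist(points[c], points[i]) for i in members)))
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def avg_silhouette(points: List[Vector], labels: List[int]) -> float:
    """Average silhouette width; singletons contribute 0, as in the usual definition."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(points)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = math.inf
        for c in set(labels) - {labels[i]}:
            other = [j for j in range(n) if labels[j] == c]
            b = min(b, sum(dist(points[i], points[j]) for j in other) / len(other))
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def pam_with_k_estimate(points: List[Vector], k_range: Sequence[int] = range(2, 11)) -> Tuple[int, List[int]]:
    """pamk-like: cluster for every k in k_range and keep the k with the best
    average silhouette width. Returns (k, medoid indices)."""
    best = None
    for k in k_range:
        if k >= len(points):
            break
        medoids = k_medoids(points, k)
        width = avg_silhouette(points, assign(points, medoids))
        if best is None or width > best[0]:
            best = (width, k, medoids)
    if best is None:
        raise ValueError("need more points than the smallest k in k_range")
    return best[1], best[2]


# Toy customer data with three well-separated groups.
customers = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [20, 1], [21, 0]]
k, medoid_idx = pam_with_k_estimate(customers, range(2, 5))
```

On this toy data the silhouette criterion selects three clusters, one medoid per group; merging two groups (k = 2) or splitting one (k = 4) lowers the average silhouette width.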

Page 21

Thank you for your attention!

Any questions?

Page 22

@eodaGmbH

@eodaGmbH eodaGmbH

blog.eoda.de

eoda GmbH, Universitätsplatz 12

34127 Kassel, Germany

www.eoda.de

+49 561 202724-40

The Data Science Specialists.

