Data Science in E-commerce

Date post: 17-Jan-2017
Upload: vincent-michel
Data Science in E-commerce industry DSSP 2016/05/20 Vincent Michel Big Data Europe, BDD, Rakuten Inc. / PriceMinister [email protected] @HowIMetYourData
Transcript
Page 1: Data Science in E-commerce

Data Science in E-commerce industry
DSSP 2016/05/20
Vincent Michel

Big Data Europe, BDD, Rakuten Inc. / PriceMinister

[email protected] @HowIMetYourData

Page 2: Data Science in E-commerce

Short Bio

Education
• ESPCI: engineer in Physics / Biology
• ENS Cachan: MVA Master (Mathematics, Vision and Learning)
• INRIA Parietal team: PhD in Computer Science, "Understanding the visual cortex by using classification techniques"

Experience
• Logilab: development and data science consulting; Data.bnf.fr (French National Library open-data platform), Brainomics (platform for heterogeneous medical data)
• Rakuten PriceMinister: senior developer and data scientist; data engineer and data science consulting

Page 3: Data Science in E-commerce

Software engineering
Lessons learned from (painful) experiences

Page 4: Data Science in E-commerce

Do not redo it yourself!

Lots of really interesting open-source libraries for all your needs:
• Test first on a small POC, then contribute/develop
• Scikit-learn, pandas, Caffe, Scikit-image, OpenCV, …
• Be careful: it is really easy to do something wrong!

Open data:
• More and more open data for catalogs, …
• E.g. data.bnf.fr: ~ 2.000.000 authors, ~ 200.000 works, ~ 200.000 topics

Contribute to open source:
• Is there a need / a pool of potential developers?
• Do it well (documentation / tests)
• Unless you are doing some kind of super magical algorithm
• May bring you help, bug fixes, and engineers! But it takes time and energy

Page 5: Data Science in E-commerce

Quality in data science software engineering

Never underestimate integration cost:
• It is really easy to write 20 lines of Python code doing some fancy Random Forests…
• …that could be really hard to deploy (data pipeline, packaging, monitoring)
• Developer != DevOps != Sys admin

Make it clean from the start (> 2 days of dev or > 100 lines of code):
• Tests, tests, tests, tests, tests, tests, tests, …
• Documentation
• Packaging / supervision / monitoring
• Release early, release often
• Agile development, pull requests, code versioning

Choose the right tool:
• Do you really need this super fancy NoSQL database to store your transactions?

Page 6: Data Science in E-commerce

Monitoring and metrics

Always monitor:
• Your development: continuous integration (Jenkins)
• Your service: Nagios / Shinken
• Your business data (BI): Kibana
• Your users: tracker
• Your data science process: e.g. A/B test

Evaluation:
• Choose the right metric: prediction accuracy / precision-recall, …
• Always A/B test rather than relying on personal opinions
• A good question leads to a good answer: define your problem
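
As an illustration of the evaluation points above, here is a minimal sketch of two common building blocks: precision / recall of a recommended list, and a two-proportion z-test for reading an A/B test. The function names and the choice of a z-test are assumptions made for this example; the slides do not say which test or tooling is used at Rakuten.

```python
import math


def precision_recall(recommended, relevant):
    """Precision and recall of a recommended set against the relevant set."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test on conversion counts: returns (z, two-sided p-value).

    Assumes the pooled conversion rate is strictly between 0 and 1.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return z, p_value
```

For instance, `ab_test_z(100, 1000, 150, 1000)` compares a 10% and a 15% conversion rate on 1000 users each; a small p-value suggests the difference is unlikely to be noise, which is the kind of evidence the slide prefers over personal opinion.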

Page 7: Data Science in E-commerce

Hiring remarks
Finding the right data scientist

Page 8: Data Science in E-commerce

Finding your data scientist

Do not try to find a unicorn!

Define your needs (and unicorns no longer exist…)

Page 9: Data Science in E-commerce

A few remarks on hiring (my personal opinion)

Be careful of CVs with buzzwords!
• E.g. “IT skills: SVM (linear, non-linear), Clustering (K-means, Hierarchical), Random Forests, Regularization (L1, L2, Elastic net…) …”
• It is like someone saying “IT skills: Python (for loop, if/else pattern, …)”
• Often found in junior CVs (OK), but a huge warning sign in senior CVs

Hungry for data?
• Loving data is the most important thing to check
• Open data? Personal projects? Curious about data? (Hackathons?)
• Multidisciplinary == knowing how to handle various datasets

Check for IT skills:
• Should be able to install/develop new libraries/algorithms
• A huge part of the job could be to format / clean up the data
• Experience vs. education -> autonomy

Page 10: Data Science in E-commerce

Recommendations @Rakuten
Data science use case

Page 11: Data Science in E-commerce

Rakuten Group Worldwide

Recommendation challenges:
• Different languages
• User behavior
• Business areas

Page 12: Data Science in E-commerce

Rakuten Group in Numbers

Rakuten in Japan:
• > 12.000 employees
• > 48 billion euros of GMS
• > 100.000.000 users
• > 250.000.000 items
• > 40.000 merchants

Rakuten Group:
• Kobo: 18.000.000 users
• Viki: 28.000.000 users
• Viber: 345.000.000 users

Page 13: Data Science in E-commerce

Rakuten Ecosystem

Rakuten global ecosystem:
• Member-based business model that connects Rakuten services
• Rakuten ID common to various Rakuten services
• Online shopping and services

Main business areas:
• E-commerce
• Internet finance
• Digital content

Recommendation challenges:
• Cross-services
• Aggregated data
• Complex user features

Page 14: Data Science in E-commerce

Rakuten’s e-commerce: B2B2C Business Model

Business to Business to Consumer:
• Merchants located in different regions / online virtual shopping mall
• Main profit sources: fixed fees from merchants; fees based on each transaction and other services

Recommendation challenges:
• Many shops
• Item references
• Global catalog

Page 15: Data Science in E-commerce

Big Data Department @ Rakuten

Big Data Department: 150+ engineers (Japan / Europe / US)

Missions: development and operations of internal systems for:
• Recommendations
• Search
• Targeting
• User behavior tracking

Average traffic:
• > 100.000.000 events / day
• > 40.000.000 item views / day
• > 50.000.000 searches / day
• > 750.000 purchases / day

Technology stack:
• Java / Python / Ruby
• Solr / Lucene
• Cassandra / Couchbase
• Hadoop / Hive / Pig
• Redis / Kafka

Page 16: Data Science in E-commerce

Recommendations on Rakuten Marketplaces

Non-personalized recommendations:
• All-shop recommendations: item to item, user to item
• In-shop recommendations
• Review-based recommendations

Personalized recommendations:
• Purchase history recommendations
• Cart add recommendations
• Order confirmation recommendations

System status and scale:
• In production in over 35 services of Rakuten Group worldwide
• Several hundred servers running Hadoop, Cassandra, APIs

Page 17: Data Science in E-commerce

Challenges in Recommendations

[Diagram: Items catalogue, Items similarity, Recommendations engine, Evaluation process]

• Items catalogues: catalogue for multiple shops with different item references?
• Items similarity / distances: cross-services aggregation? Lots of parameters?
• Recommendations engine: best / optimal recommendations logic?
• Evaluation process: offline / online evaluation? Long tail? KPI?

Page 18: Data Science in E-commerce

Recommendations Architecture: Constantly Evolving

[Architecture diagram. Components: browsing events, purchase events, cocounts storage, catalogue(s), distribution layer, recommendations (offline / materialized), recommendations (online algebra / multi-arm)]
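
The "multi-arm" label in the architecture presumably refers to multi-armed bandits for choosing between recommendation logics online. The slides give no detail, so the following is only a minimal epsilon-greedy sketch under that assumption; the class name, the reward definition (e.g. 1.0 for a click, 0.0 otherwise) and the default epsilon are all hypothetical.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit: each arm is one candidate recommendation logic."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # times each arm was shown
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In use, the serving layer would call `select()` for each request, display the recommendations of the chosen logic, and call `update(arm, reward)` once the user's reaction is known.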

Page 19: Data Science in E-commerce

Items Catalogues

Use different levels of aggregation to improve recommendations:
• Category level (e.g. food, soda, clothes, …)
• Product level (manufactured items)
• Item-in-shop level (a specific product sold by a specific shop)

Increased statistical power in co-events computation

Easier business handling (picking the right item)
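
To make the gain in statistical power concrete, the sketch below counts the same events at the three aggregation levels. The catalogue contents, identifiers and function name are invented for illustration; a real catalogue mapping would come from the marketplace data.

```python
from collections import Counter

# Hypothetical catalogue: item-in-shop id -> (product id, category).
CATALOGUE = {
    "shopA:cola-33cl": ("cola-33cl", "soda"),
    "shopB:cola-33cl": ("cola-33cl", "soda"),
    "shopB:lemonade-1l": ("lemonade-1l", "soda"),
    "shopC:tshirt-m": ("tshirt-m", "clothes"),
}


def aggregate(events, level):
    """Count events at 'item', 'product' or 'category' level."""
    index = {"item": None, "product": 0, "category": 1}[level]
    counts = Counter()
    for item_id in events:
        key = item_id if index is None else CATALOGUE[item_id][index]
        counts[key] += 1
    return counts
```

The same cola sold by two shops yields two sparse counters at item-in-shop level but a single, better-populated counter at product level, which is what makes co-event statistics more reliable.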

Page 20: Data Science in E-commerce

Enriching Catalogues using Record Linkage

[Diagram: Marketplace 1, Marketplace 2, Reference database]

Record linkage:
• Use external sources (e.g. Wikidata) to align the marketplaces' products
• Fuzzy matching of 600K vs 350K items for a movie alignment use case
• Blocking algorithm

Cross recommendation:
• Global catalog
• Item aggregation
• Helps with cold-start issues
• Improved navigation
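
A minimal sketch of record linkage with blocking and fuzzy matching, using only the Python standard library. It is not the pipeline used for the movie alignment: the blocking key (release year), the title normalization, the 0.85 threshold and the record layout `(id, title, year)` are assumptions chosen to keep the example small.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase, drop punctuation and sort tokens ('Matrix, The' == 'The Matrix')."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(sorted(tokens))


def link_records(market, reference, threshold=0.85):
    """Link (id, title, year) records; the blocking key is the release year."""
    blocks = defaultdict(list)
    for ref_id, title, year in reference:
        blocks[year].append((ref_id, normalize(title)))

    links = {}
    for item_id, title, year in market:
        name = normalize(title)
        best_id, best_score = None, threshold
        # Blocking: only compare against reference records with the same key.
        for ref_id, ref_name in blocks.get(year, []):
            score = SequenceMatcher(None, name, ref_name).ratio()
            if score >= best_score:
                best_id, best_score = ref_id, score
        if best_id is not None:
            links[item_id] = best_id
    return links
```

Blocking is what keeps a 600K vs 350K comparison tractable: without it every pair would need a fuzzy comparison, whereas here each item is only compared with the reference records sharing its key.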

Page 21: Data Science in E-commerce

Co-occurrences and Similarities Computation

Only access to unitary data (purchase / browsing)

Use co-occurrences for computing item similarity

Multiple possible parameters:
• Size of the time window to be considered: do browsing and purchase data reflect similar behavior?
• Threshold on co-occurrences: is one co-occurrence significant enough to be used? Two? Three?
• Symmetric or asymmetric: is the order important in the co-occurrence? A then B == B then A?
• Similarity metrics: which similarity metric should be used on the co-occurrences?
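
The parameters listed above (time window, threshold, symmetry, similarity metric) can be made explicit in a small sketch. This is an illustrative implementation, not the production logic: the event layout `(user, timestamp, item)`, the once-per-user counting rule and the cosine-style normalization are assumptions.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrences(events, window, symmetric=True):
    """Count item pairs seen by the same user within `window` time units.

    events: iterable of (user, timestamp, item).
    If symmetric is False, pairs are ordered (earlier item, later item).
    """
    by_user = defaultdict(list)
    for user, ts, item in events:
        by_user[user].append((ts, item))

    pairs = Counter()
    for history in by_user.values():
        history.sort()
        seen = set()  # count each pair at most once per user
        for (t1, a), (t2, b) in combinations(history, 2):
            if a == b or t2 - t1 > window:
                continue
            key = tuple(sorted((a, b))) if symmetric else (a, b)
            if key not in seen:
                seen.add(key)
                pairs[key] += 1
    return pairs


def similarities(pairs, item_counts, min_count=2):
    """Cosine-style score co(a, b) / sqrt(n(a) * n(b)), after thresholding."""
    return {
        (a, b): c / math.sqrt(item_counts[a] * item_counts[b])
        for (a, b), c in pairs.items()
        if c >= min_count
    }
```

Each question on the slide maps to one argument: `window` for the time window, `min_count` for the threshold, `symmetric` for the ordering question, and the formula in `similarities` for the choice of metric (Jaccard or a likelihood-based score could be swapped in).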

Page 22: Data Science in E-commerce

Co-occurrences Example

[Figure: timeline of browsing and purchase events dated between 08/09/2015 and 24/11/2015, with candidate session boundaries ("Session ?") and two time windows ("Time window 1", "Time window 2")]

Page 23: Data Science in E-commerce

Co-occurrences Computation

Classical co-occurrences:
• Co-purchases: complementary items (“You may also want…”)
• Co-browsing: substitute items (“Similar items…”)

Other possible co-occurrences:
• Items browsed and bought together
• Items browsed and not bought together
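
One possible reading of "browsed and bought together" versus "browsed and not bought together" is sketched below. The interpretation is an assumption, not the definition used on the slides: within a session, a pair of browsed items counts as bought together when both were purchased, and as not bought together when exactly one was. The session layout and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_cooccurrences(sessions):
    """sessions: iterable of (browsed_items, purchased_items), one per session."""
    bought_together, not_bought_together = Counter(), Counter()
    for browsed, purchased in sessions:
        purchased = set(purchased)
        for a, b in combinations(sorted(set(browsed)), 2):
            n_bought = (a in purchased) + (b in purchased)
            if n_bought == 2:
                bought_together[(a, b)] += 1       # complementary signal
            elif n_bought == 1:
                not_bought_together[(a, b)] += 1   # substitute signal
    return bought_together, not_bought_together
```

The second counter is noisy (any unbought browsed item pairs with every bought one), so in practice it would likely be combined with a threshold or restricted to items of the same category.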

Page 24: Data Science in E-commerce

Recommendation Quality Challenges

Recommendations categories:
• Cold start issue: external data? Cross-services?
• Hot products (A): top-N items?
• Short tail (B)
• Long tail (C + D)

[Figure: products placed in quadrants (A) to (D) along two axes, minor vs. major (popular) product and new vs. old product]

Page 25: Data Science in E-commerce

Long Tail is Fat

Long tail numbers:
• Most of the items are long tail
• They still represent a large portion of the traffic

Long tail approaches:
• Content-based
• Aggregation / clustering
• Personalization

[Figure: browsing share and number of items for the popular, short tail and long tail segments]
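
A small sketch of how items could be split into the three segments from view counts. The cut-offs (first 20% of cumulative browsing share for "popular", up to 50% for "short tail") are illustrative assumptions, not values from the slides, and the function name is hypothetical.

```python
def segment_items(views, popular_share=0.2, short_tail_share=0.5):
    """Label items by cumulative browsing share, most viewed first.

    views: dict mapping item id to its number of views.
    """
    total = sum(views.values())
    labels, cumulative = {}, 0.0
    for item, count in sorted(views.items(), key=lambda kv: (-kv[1], kv[0])):
        # The label depends on the share already covered by more viewed items.
        if cumulative < popular_share:
            labels[item] = "popular"
        elif cumulative < short_tail_share:
            labels[item] = "short tail"
        else:
            labels[item] = "long tail"
        cumulative += count / total
    return labels
```

With views of 40, 20, 10, 10, 5, 5, 5 and 5 for eight items, six of the eight items land in the long tail while still accounting for 40% of all views, which is the "fat" long tail the slide describes.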

Page 26: Data Science in E-commerce

Recommendations Offline Evaluation

Pros/Cons:
• Convenient way to try new ideas
• Fast and cheap
• But hard to align with online KPIs

Approaches:
• Rescoring
• Prediction game
• Business simulator
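
The slides do not define the "prediction game". One common offline protocol that fits the name is leave-last-out: hide the last item of each user history and check whether the recommender ranks it in its top k. The sketch below assumes that reading; `recommend` is a hypothetical callable taking the past items and returning a ranked list.

```python
def hit_rate_at_k(histories, recommend, k=5):
    """Share of histories whose hidden last item appears in the top-k list."""
    hits = trials = 0
    for history in histories:
        if len(history) < 2:
            continue  # nothing left to predict from
        *past, held_out = history
        trials += 1
        hits += held_out in recommend(past)[:k]
    return hits / trials if trials else 0.0
```

Such a score is fast and cheap to compute, but it rewards predicting what users would have done anyway, which is one reason it is hard to align with online KPIs.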

Page 27: Data Science in E-commerce

Public Initiative – Viki Recommendation Challenge

567 submissions from 132 participants
http://www.dextra.sg/challenges/rakuten-viki-video-challenge

Page 28: Data Science in E-commerce

Data science everywhere!

Rakuten provides marketplaces worldwide

Specific challenges for recommendations:
• Items catalogue: reinforce the statistical power of co-occurrences across shops and services
• Items similarities: find the right parameters for the different use cases
• Recommendation models: what are the best models for in-shop, all-shops, personalization?
• Evaluation: handling the long tail? Comparing different models?

Page 30: Data Science in E-commerce

We are Hiring!

Big Data Department, team in Paris
http://global.rakuten.com/corp/careers/bigdata/
http://www.priceminister.com/recrutement/?p=197

Data Scientist / Software Developer:
• Build algorithms for recommendations, search, targeting
• Predictive modeling, machine learning, natural language processing
• Working close to the business
• Python, Java, Hadoop, Couchbase, Cassandra…

Also hiring: search engine developers, big data system administrators, etc.

