June Spark meetup: search as recommendation

transcript

Search as recommendation

With today's technologies!

Bertrand Dechoux

Thursday, June 11, 2015

Spark User

Bertrand Dechoux @BertrandDechoux

FROM DATA TO SALES

Bayesian Networks

Inspired by real events

What we will cover

• search vs. recommendation

• full-text recommendation (Elasticsearch)

• April 11, 2015 => Mahout 0.10 (Samsara)

• February 25, 2015 => Confluent 1.0

Search or recommendation?

binary classification or ranking system => precision / recall / DCG …
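The metrics named on this slide can be illustrated with a minimal sketch. The item names and the relevance judgments below are hypothetical, and the DCG variant shown (binary gain divided by log2(rank + 1)) is only one common convention.

```python
import math

def precision_recall(recommended, relevant):
    """Precision and recall of a recommended list against a set of relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def dcg(gains):
    """Discounted cumulative gain: the gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

# Hypothetical example: 4 results returned, 3 items actually relevant.
recommended = ["spark", "gatling", "scikit-learn", "hadoop"]
relevant = {"spark", "gatling", "mahout"}

p, r = precision_recall(recommended, relevant)
gains = [1 if item in relevant else 0 for item in recommended]
score = dcg(gains)
```

Precision and recall ignore the order of the results, while DCG rewards relevant items that appear near the top of the list.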

matrix factorization => minimization of squared errors
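As a rough illustration of what "minimization of squared errors" means for matrix factorization, here is a minimal sketch using plain stochastic gradient descent. The ratings, the rank and the hyperparameters are all assumptions chosen for the example; production systems typically rely on ALS or a library implementation instead.

```python
import random

def squared_error(ratings, P, Q):
    """Sum of squared differences between observed ratings and dot(P[u], Q[i])."""
    return sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
               for (u, i), r in ratings.items())

def factorize(ratings, n_users, n_items, rank=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Fit ratings[(u, i)] ~ dot(P[u], Q[i]) by SGD on the regularized squared error."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for k in range(rank):
                p, q = P[u][k], Q[i][k]
                P[u][k] += lr * (err * q - reg * p)
                Q[i][k] += lr * (err * p - reg * q)
    return P, Q

# Hypothetical observed ratings, keyed by (user index, item index).
ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
           (1, 2): 1.0, (2, 1): 1.0, (2, 2): 5.0}

P0, Q0 = factorize(ratings, 3, 3, epochs=0)   # random initialization only
P, Q = factorize(ratings, 3, 3)               # same seed, after training
error_before = squared_error(ratings, P0, Q0)
error_after = squared_error(ratings, P, Q)
```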

Once upon a time… Hadoop

Lucene 1999

Elasticsearch

Solr 2004

1/3: Behavior storage

Diagram: Web application, recent history (temporary), full history (permanent), search engine, user full-text, similarity analysis

2/3: Batch analysis

Diagram: Web application, recent history (temporary), full history (permanent), search engine

3/3: Full-text recommendation

Diagram: Web application, recent history (temporary), full history (permanent), search engine

ES: data

curl -XPUT 'http://localhost:9200/search/item/1' -d '{"name" : "spark", "languages" : "scala java python" }'

curl -XPUT 'http://localhost:9200/search/item/2' -d '{"name" : "gatling", "languages" : "scala" }'

curl -XPUT 'http://localhost:9200/search/item/3' -d '{"name" : "scikit-learn", "languages" : "python" }'

ES: search

curl -XGET 'http://localhost:9200/search/item/_search?q=scala'

[{"_score":0.19178301, "_source":{ "name" : "gatling", "languages" : "scala"}},

{"_score":0.15342641, "_source":{ "name" : "spark", "languages" : "scala java python" }}]
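One plausible reason gatling scores above spark here is length normalization: the term "scala" makes up a larger share of the shorter document. The sketch below uses a deliberately simplified score, sqrt(tf) / sqrt(number of terms), and assumes that each document's field values are concatenated into one text. It is not the actual Lucene formula, which also involves IDF and a lossy encoding of the norm.

```python
import math

def simple_score(term, text):
    """Simplified term frequency with length normalization (not the real Lucene score)."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    tf = tokens.count(term)
    return math.sqrt(tf) / math.sqrt(len(tokens))

# Assumption: name and languages are searched together as one text per document.
docs = {
    "gatling": "gatling scala",
    "spark": "spark scala java python",
}

ranked = sorted(docs, key=lambda name: simple_score("scala", docs[name]), reverse=True)
```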

ES: recommendation

{ "name" : "product1", "description" : "a long description of this product", "similar" : "product2 product4 product7", "category" : "categoryA" }
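With documents shaped like this, a recommendation can be expressed as an ordinary full-text query: the items in the user's recent history are matched against the "similar" field. The sketch below only builds the request body. The function name, the default size and the example history are assumptions, not something shown in the talk.

```python
import json

def build_reco_query(recent_items, field="similar", size=10):
    """Build a full-text match query from the items in the user's recent history."""
    return {
        "size": size,
        "query": {"match": {field: " ".join(recent_items)}},
    }

# Hypothetical recent history for one user.
body = build_reco_query(["product2", "product7"])
print(json.dumps(body))
```

The resulting JSON could be sent as the body of a `_search` request. Products whose "similar" field mentions more of the recently seen items would then tend to rank higher.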

0.10: Samsara

• “April 25, 2014 - Goodbye MapReduce”

• “April 11, 2015 - Samsara”

• inspired by R, helped by Scala

• cross-platform (Spark, H2O, Flink?, Ignite?, …)

• “Hive for math” (Dmitriy Lyubimov)

0.10: spark-itemsimilarity

• new version using Spark

• and DistributedRowMatrix

• supports only LogLikelihoodRatio (LLR)

• “Surprise and coincidence” (Ted Dunning)

• inputs and outputs in text format
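The log-likelihood ratio mentioned above can be sketched in a few lines, following the entropy-based formulation described in Ted Dunning's "Surprise and coincidence" post. The names k11, k12, k21 and k22 are the conventional cells of a 2x2 contingency table (both events, only the first, only the second, neither). This is an illustrative re-implementation, not Mahout's code.

```python
import math

def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of cooccurrence counts."""
    row_entropy = entropy(k11 + k12, k21 + k22)
    column_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + column_entropy - matrix_entropy))

independent = llr(10, 10, 10, 10)   # the two items cooccur exactly as chance predicts
associated = llr(10, 0, 0, 10)      # the two items always appear together
```

A score near zero means the cooccurrence is what chance would predict. A large score flags a pair of items whose association is surprising.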

Figure: interaction matrix with items (item 1, item 2, item 3, item 4, item 5, item 6 ...) on one axis and users (user 1, user 2, user 3, user 4, user 5, user 6 ...) on the other
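To make the user/item matrix concrete, the sketch below counts how many users interacted with both items of each pair. These cooccurrence counts are the raw material for a similarity measure such as LLR. The interaction data and the function name are hypothetical, and the pure-Python loop only illustrates the idea. It is not how Mahout distributes the computation.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_counts(interactions):
    """Count, for each pair of items, how many users interacted with both."""
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)
    counts = defaultdict(int)
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Hypothetical (user, item) interactions.
interactions = [
    ("user1", "item1"), ("user1", "item2"),
    ("user2", "item1"), ("user2", "item2"), ("user2", "item3"),
    ("user3", "item3"),
]
counts = cooccurrence_counts(interactions)
```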

mahout spark-itemsimilarity -i input -o output

2 parameters to remember:

• --omitStrength

• --maxSimilaritiesPerItem
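The text output of the job can then be turned into documents with a "similar" field for the search engine. The sketch below assumes each output line looks like `itemA<TAB>itemB:strength itemC:strength`, or the same line without strengths when --omitStrength is used, and that item IDs contain no colon. Check the actual output of your Mahout version before relying on this.

```python
def parse_similarity_line(line):
    """Turn one similarity output line into a document with a 'similar' field."""
    item, _, rest = line.rstrip("\n").partition("\t")
    # Drop the optional ':strength' suffix; assumes item IDs contain no colon.
    similar = [token.split(":")[0] for token in rest.split()]
    return {"name": item, "similar": " ".join(similar)}

# Hypothetical output lines, with and without strengths.
with_strength = parse_similarity_line("product1\tproduct2:4.5 product4:2.1\n")
without_strength = parse_similarity_line("product1\tproduct2 product4")
```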

Backbone specs

• scalable (performance)

• cross-tech

• structured

• consistent

• secure

Confluent: Kafka platform

• Kafka

• + REST server

• + Schema Registry (Avro as the standard)

• + Camus
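As a sketch of how a web application might publish behavior events through the REST server with an Avro schema, the code below only builds the HTTP request and does not send it. The schema, topic name and base URL are hypothetical, and the endpoint and content type are assumptions based on the v1 REST proxy API. Verify them against the documentation of the version you run.

```python
import json

# Hypothetical Avro schema for a user-behavior event; the field names are illustrative.
EVENT_SCHEMA = {
    "type": "record",
    "name": "ViewEvent",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "item", "type": "string"},
        {"name": "timestamp", "type": "long"},
    ],
}

def build_produce_request(topic, events, base_url="http://localhost:8082"):
    """Return (url, headers, body) for producing Avro records through a Kafka REST proxy."""
    url = f"{base_url}/topics/{topic}"
    # Assumption: v1 Avro content type of the REST proxy.
    headers = {"Content-Type": "application/vnd.kafka.avro.v1+json"}
    body = json.dumps({
        "value_schema": json.dumps(EVENT_SCHEMA),   # the schema travels as a JSON string
        "records": [{"value": event} for event in events],
    })
    return url, headers, body

url, headers, body = build_produce_request(
    "views", [{"user": "user1", "item": "item1", "timestamp": 1434000000000}]
)
```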

Pat Ferrel: http://occamsmachete.com/ml/

http://confluent.io/

https://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html

And you, what do you do?

On LinkedIn: https://goo.gl/1mi0Me
