Xebia Knowledge Exchange (March 2011) - Machine Learning with Apache Mahout

Date post: 15-Jan-2015
Upload: michael-figuiere
Transcript
Page 1

Machine Learning with Apache Mahout

Classification, Clustering and Recommendation

3/3/2011 Michaël Figuière

Page 2

Machine Learning

Page 3

Machine Learning

Machine Learning is a subset of Artificial Intelligence

Page 4

NoSQL, Search and Machine Learning

NoSQL, Search and Machine Learning greatly complement each other!

Page 5

Machine Learning algorithms

• Recommendations: advise users with recommended items

• Classification: automatically classify documents based on a given set of examples

• Clustering: automatically discover groups within a set of documents

• Pattern mining, evolutionary algorithms, ...

Page 6

Recommendation - User-based

Amazon suggests articles bought by similar customers

Page 7

Recommendation - Item-based

On the article page, Amazon leverages item-based recommendation
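Item-based recommendation compares items rather than users: two articles count as similar when the same customers rated them alike, so an article page can list its closest neighbors. The plain-Java sketch below illustrates the idea under stated assumptions: ratings sit in a small in-memory matrix where 0 means "not rated", item similarity is the cosine between item columns, and `ItemCosine` with its methods are hypothetical names, not Mahout classes.

```java
/** Hypothetical helper: item-to-item cosine similarity over a small in-memory ratings matrix. */
final class ItemCosine {
    private ItemCosine() {}

    /** ratings[u][i] is user u's rating of item i; 0 means "not rated" and contributes nothing. */
    static double cosine(double[][] ratings, int itemA, int itemB) {
        double dot = 0, normA = 0, normB = 0;
        for (double[] user : ratings) {
            dot += user[itemA] * user[itemB];
            normA += user[itemA] * user[itemA];
            normB += user[itemB] * user[itemB];
        }
        if (normA == 0 || normB == 0) return 0; // an item nobody rated is similar to nothing
        return dot / Math.sqrt(normA * normB);
    }

    /** Index of the item closest to the given one, e.g. to fill a "related articles" box. */
    static int mostSimilar(double[][] ratings, int item) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int other = 0; other < ratings[0].length; other++) {
            if (other == item) continue;
            double score = cosine(ratings, item, other);
            if (score > bestScore) {
                bestScore = score;
                best = other;
            }
        }
        return best;
    }
}
```

For example, with three users rating four items as {5, 4, 0, 1}, {4, 5, 1, 0} and {0, 1, 5, 4}, `mostSimilar(ratings, 0)` returns 1: items 0 and 1 were rated highly by the same two users.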

Page 8

Similarities between users

(Diagram: items A, B, C, D, E, F and users 1 and 2)

Here we observe that users 1 and 2 have similar tastes.
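One common way to quantify "similar tastes" is the Pearson correlation of two users' ratings over the items both have rated: +1 when their ratings rise and fall together, -1 when they move in opposite directions. Mahout provides `PearsonCorrelationSimilarity` for this measure; the standalone Java below is only a sketch of the formula itself, assuming ratings stored in arrays where 0 means unrated. Returning `NaN` when the overlap is too small or a user's ratings are constant is an assumption of this sketch, and `UserPearson` is a hypothetical name.

```java
/** Hypothetical helper: Pearson correlation between two users over their co-rated items. */
final class UserPearson {
    private UserPearson() {}

    /** a[i] and b[i] are the two users' ratings of item i; 0 means "not rated". */
    static double pearson(double[] a, double[] b) {
        // First pass: means over the items both users rated
        int n = 0;
        double sumA = 0, sumB = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != 0 && b[i] != 0) {
                n++;
                sumA += a[i];
                sumB += b[i];
            }
        }
        if (n < 2) return Double.NaN; // not enough overlap to correlate
        double meanA = sumA / n, meanB = sumB / n;
        // Second pass: covariance and variances of the centered ratings
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] != 0 && b[i] != 0) {
                double da = a[i] - meanA, db = b[i] - meanB;
                cov += da * db;
                varA += da * da;
                varB += db * db;
            }
        }
        if (varA == 0 || varB == 0) return Double.NaN; // constant ratings: correlation undefined
        return cov / Math.sqrt(varA * varB);
    }
}
```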

Page 9

Recommendation use cases

• Advise users with items on e-commerce websites, and increase revenue

• Advise users with features they may be interested in on a Web application, as most features usually remain unknown to them

• Filter and adapt the scoring of search engine results, based on similar users' clicks, ...

Page 10

Classification

Emails classified as spam by Gmail

Page 11

Classification use cases

• Automatically attach tags to documents, based on existing manual tagging, Wikipedia, ...

• Extract suspicious documents: spam, corrupted documents, ...

Page 12

Clustering

Trending topics discovered by Google News

Page 13

Clustering with K-Means

(Diagram: data points A, B, C, D, E and F)

Page 14

Clustering with K-Means

Cluster centers start at random initial positions

Page 15

Clustering with K-Means

Each data point is attached to its nearest cluster center

Page 16

Clustering with K-Means

Cluster centers are then moved to minimize the sum of squared distances to the points attached to them
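Stated precisely, K-Means minimizes the sum J of squared Euclidean distances between each point x and the center μ_j of the cluster C_j it is attached to, over k clusters. With the assignments fixed, J is minimized by moving every center to the mean of its attached points:

```latex
J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad
\mu_j \leftarrow \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
```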

Page 17

Clustering with K-Means

Data point C is then attached to the first center, which has become its nearest
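The steps on these slides (initial centers, attach each point to its nearest center, move the centers, reattach) are Lloyd's iteration for K-Means. A minimal in-memory sketch in Java, assuming 2-D points and initial centers chosen by the caller (for instance at random); it is not Mahout's distributed implementation, `KMeans2D` is a hypothetical name, and keeping an empty cluster's previous center is just one possible policy.

```java
import java.util.Arrays;

/** Minimal Lloyd-style K-Means on 2-D points; an illustrative sketch, not Mahout's implementation. */
final class KMeans2D {
    private KMeans2D() {}

    /** Assignment step: index of the nearest center for each point. */
    static int[] assign(double[][] points, double[][] centers) {
        int[] labels = new int[points.length];
        for (int p = 0; p < points.length; p++) {
            double best = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centers.length; c++) {
                double dx = points[p][0] - centers[c][0], dy = points[p][1] - centers[c][1];
                double d = dx * dx + dy * dy; // squared Euclidean distance
                if (d < best) {
                    best = d;
                    labels[p] = c;
                }
            }
        }
        return labels;
    }

    /** Alternates update and assignment steps until assignments stop changing or maxIter is hit. */
    static int[] cluster(double[][] points, double[][] initialCenters, int maxIter) {
        double[][] centers = new double[initialCenters.length][];
        for (int c = 0; c < centers.length; c++) centers[c] = initialCenters[c].clone();
        int[] labels = assign(points, centers);
        for (int iter = 0; iter < maxIter; iter++) {
            // Update step: move each center to the mean of the points attached to it
            double[][] sums = new double[centers.length][2];
            int[] counts = new int[centers.length];
            for (int p = 0; p < points.length; p++) {
                sums[labels[p]][0] += points[p][0];
                sums[labels[p]][1] += points[p][1];
                counts[labels[p]]++;
            }
            for (int c = 0; c < centers.length; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its previous center
                    centers[c][0] = sums[c][0] / counts[c];
                    centers[c][1] = sums[c][1] / counts[c];
                }
            }
            // Assignment step: reattach every point to its now-nearest center
            int[] next = assign(points, centers);
            if (Arrays.equals(next, labels)) break; // converged
            labels = next;
        }
        return labels;
    }
}
```

With hypothetical coordinates A(0,0), B(0,1), C(3,0), D(10,0), E(10,1), F(11,0) and starting centers (0,0) and (5,0), C is first attached to the second center and, once the centers move, switches to the first one, as on the slides; the final labels are {0, 0, 0, 1, 1, 1}.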

Page 18

Clustering use cases

• Find key topics in a set of documents: news feeds, business documents, ...

• Find typical behaviors within a set of users: visit frequency, buying habits, ...

Page 19

Apache Mahout

Page 20

In a few words

• Implementation of machine learning algorithms in Java: a continuously growing collection of algorithms

• Most of them come with a MapReduce implementation for Hadoop, making them scalable to huge datasets

• Still quite young but growing fast: started in early 2009

• Intended to be for Machine Learning what Lucene is for Information Retrieval
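To give an idea of how an algorithm fits MapReduce, one K-Means iteration splits naturally into a map phase (emit each point keyed by its nearest center) and a reduce phase (average the points sharing a key into the new center). The Java sketch below only simulates those phases in memory with streams, using 1-D points for brevity; it uses neither Hadoop's nor Mahout's APIs, and `MapReduceKMeansStep` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** One K-Means iteration phrased as map and reduce phases, simulated in memory (1-D points). */
final class MapReduceKMeansStep {
    private MapReduceKMeansStep() {}

    /** Map phase: the key emitted for a point is the index of its nearest center. */
    static int nearest(double x, double[] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (Math.abs(x - centers[c]) < Math.abs(x - centers[best])) best = c;
        }
        return best;
    }

    /** Shuffle groups points by key; reduce averages each group into that cluster's new center. */
    static Map<Integer, Double> iterate(List<Double> points, double[] centers) {
        return points.stream().collect(Collectors.groupingBy(
                x -> nearest(x, centers),
                TreeMap::new,
                Collectors.averagingDouble(Double::doubleValue)));
    }
}
```

For example, points 1, 2, 3, 10, 11, 12 with centers 0 and 5 produce the new centers 1.5 and 9.0. On Hadoop the map and reduce phases would run on different machines, which is what makes such algorithms scale.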

Page 21

Documentation

Page 22

Recommendation example

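// Imports: java.io.File, java.util.List and the org.apache.mahout.cf.taste classes below;
// the enclosing method must handle IOException and TasteException.
// data.csv holds one "userID,itemID,preference" line per rating
DataModel model = new FileDataModel(new File("data.csv"));
UserSimilarity simil = new PearsonCorrelationSimilarity(model);
// Neighborhood of the 2 users most similar to the target user
UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, simil, model);
Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, simil);
// At most 1 recommended item for user 1
List<RecommendedItem> recommendations = recommender.recommend(1, 1);

The code for a basic recommendation is pretty straightforward!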
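To show roughly what such a recommender computes, here is a simplified user-based recommender in plain Java. It is a sketch, not Mahout's `GenericUserBasedRecommender`: it assumes an in-memory ratings matrix (0 = not rated), takes the similarity measure as a parameter (much as Mahout takes a `UserSimilarity`), keeps the N most similar users as the neighborhood, and estimates a preference as the similarity-weighted average of the neighbors' ratings, ignoring neighbors with non-positive similarity. All names are hypothetical.

```java
import java.util.Comparator;
import java.util.function.ToDoubleBiFunction;
import java.util.stream.IntStream;

/** Simplified user-based recommender (illustrative; not Mahout's implementation). */
final class SimpleUserBasedRecommender {
    private final double[][] ratings; // ratings[user][item], 0 = not rated
    private final ToDoubleBiFunction<double[], double[]> similarity;
    private final int neighborhoodSize;

    SimpleUserBasedRecommender(double[][] ratings, ToDoubleBiFunction<double[], double[]> similarity,
                               int neighborhoodSize) {
        this.ratings = ratings;
        this.similarity = similarity;
        this.neighborhoodSize = neighborhoodSize;
    }

    /** The neighborhoodSize other users most similar to the given user (ties keep index order). */
    int[] neighborhood(int user) {
        return IntStream.range(0, ratings.length)
                .filter(u -> u != user)
                .boxed()
                .sorted(Comparator.comparingDouble(
                        (Integer u) -> similarity.applyAsDouble(ratings[user], ratings[u])).reversed())
                .limit(neighborhoodSize)
                .mapToInt(Integer::intValue)
                .toArray();
    }

    /** Similarity-weighted average of the neighbors' ratings for an item, or NaN if no usable neighbor. */
    double estimate(int user, int item, int[] neighbors) {
        double weighted = 0, totalWeight = 0;
        for (int n : neighbors) {
            if (ratings[n][item] == 0) continue; // neighbor has not rated this item
            double w = similarity.applyAsDouble(ratings[user], ratings[n]);
            if (!(w > 0)) continue;              // skip dissimilar or undefined neighbors
            weighted += w * ratings[n][item];
            totalWeight += w;
        }
        return totalWeight == 0 ? Double.NaN : weighted / totalWeight;
    }

    /** Up to howMany items the user has not rated yet, highest estimate first. */
    int[] recommend(int user, int howMany) {
        int[] neighbors = neighborhood(user);
        return IntStream.range(0, ratings[user].length)
                .filter(i -> ratings[user][i] == 0)
                .filter(i -> !Double.isNaN(estimate(user, i, neighbors)))
                .boxed()
                .sorted(Comparator.comparingDouble((Integer i) -> estimate(user, i, neighbors)).reversed())
                .limit(howMany)
                .mapToInt(Integer::intValue)
                .toArray();
    }
}
```

For example, with a dot-product similarity and ratings {5, 3, 0, 0}, {5, 3, 4, 0}, {5, 3, 0, 1}, {1, 5, 5, 5} and a neighborhood of 2, user 0's neighbors are users 1 and 2, and `recommend(0, 2)` returns item 2 (estimate 4.0) before item 3 (estimate 1.0). User 3 rates item 3 highly but falls outside the neighborhood, so it has no influence.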

Page 23

Classification with Mahout

Training examples -> Training algorithm -> Model

New data -> Model (a copy of the trained model) -> Decision
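The diagram's two phases can be illustrated with a tiny multinomial naive Bayes text classifier, a classic choice for spam filtering: `train` turns labeled examples into word counts (the model), and `classify` uses those counts to decide on new data. This plain-Java sketch assumes a naive tokenizer and add-one smoothing; it is not Mahout's classifier API, and `NaiveBayesSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Tiny multinomial naive Bayes text classifier; an illustrative sketch, not Mahout's API. */
final class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> words seen
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> examples seen
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Naive tokenizer: lowercase, then split on non-word characters. */
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("\\W+");
    }

    /** Training phase: accumulate counts from one labeled example. */
    void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, l -> new HashMap<>());
        for (String w : tokenize(text)) {
            if (w.isEmpty()) continue;
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    /** Decision phase: the label with the highest log-probability, with add-one smoothing. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs); // class prior
            Map<String, Integer> counts = wordCounts.get(label);
            double denom = totalWords.getOrDefault(label, 0) + vocabulary.size();
            for (String w : tokenize(text)) {
                if (w.isEmpty()) continue;
                score += Math.log((counts.getOrDefault(w, 0) + 1) / denom);
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

For instance, after training on "win money now cheap pills" and "cheap money offer win" as spam and on two short meeting and project notes as ham, `classify("win cheap money")` returns "spam".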

Page 24

Clustering with Mahout

Documents -> Clustering algorithm -> List of clusters
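A clustering algorithm works on numeric vectors, so documents have to be vectorized first. One common recipe, sketched here in plain Java rather than with Mahout's own tooling, weights each term by TF-IDF (term frequency times log(N / document frequency)) and compares documents with cosine similarity; the naive tokenizer and the `TfIdfVectors` name are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Turns raw documents into sparse TF-IDF vectors a clustering algorithm can compare. Sketch only. */
final class TfIdfVectors {
    private TfIdfVectors() {}

    static List<Map<String, Double>> vectorize(List<String> docs) {
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>(); // number of documents containing each term
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String t : doc.toLowerCase().split("\\W+")) {
                if (!t.isEmpty()) tf.merge(t, 1, Integer::sum);
            }
            for (String t : tf.keySet()) docFreq.merge(t, 1, Integer::sum);
            termFreqs.add(tf);
        }
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> tf : termFreqs) {
            Map<String, Double> v = new HashMap<>();
            // weight = term frequency * log(N / document frequency); a term found everywhere weighs 0
            tf.forEach((t, f) -> v.put(t, f * Math.log((double) docs.size() / docFreq.get(t))));
            vectors.add(v);
        }
        return vectors;
    }

    /** Cosine similarity between two sparse vectors (0 if either is all zeros). */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double x : b.values()) nb += x * x;
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }
}
```

For example, "Invoice payment overdue" comes out closer to "Invoice payment received" than to "Football match tonight", with which it shares no term and therefore has a cosine of 0.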

Page 25

Relevance evaluation

The entire dataset is split into two parts:

• Data used for training

• Data used to evaluate the relevance of an algorithm and its settings
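A minimal sketch of this hold-out evaluation in Java: shuffle a copy of the dataset with a fixed seed, cut it into a training part and an evaluation part, then measure the share of evaluation examples a trained model labels correctly. The 80/20 split in the usage note, the seed and the `HoldOut` helper are illustrative assumptions, and accuracy is only one possible relevance measure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

/** Hold-out evaluation helpers: train on one part of the dataset, measure on the rest. Sketch only. */
final class HoldOut {
    private HoldOut() {}

    /** Returns [trainingSet, evaluationSet]; trainingFraction is e.g. 0.8. The input is not modified. */
    static <T> List<List<T>> split(List<T> data, double trainingFraction, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed)); // fixed seed keeps runs reproducible
        int cut = (int) Math.round(shuffled.size() * trainingFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));               // training set
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size()))); // evaluation set
        return parts;
    }

    /** Fraction of evaluation examples whose predicted label equals the expected one. */
    static <T> double accuracy(List<T> evaluationSet, Function<T, String> predicted, Function<T, String> expected) {
        if (evaluationSet.isEmpty()) return Double.NaN;
        long correct = evaluationSet.stream()
                .filter(x -> predicted.apply(x).equals(expected.apply(x)))
                .count();
        return (double) correct / evaluationSet.size();
    }
}
```

For example, `HoldOut.split(data, 0.8, 42L)` on 10 examples yields 8 training and 2 evaluation examples; the model is trained on the first list only and scored on the second, so the score reflects data it has never seen.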

Page 26

A search engine use case

Page 27

A Search Engine

(Mock-up: a search box)

Page 28

A Search Engine

(Mock-up: a search for "MyCustomer")

Page 29

A Search Engine

(Mock-up: results of a search for "MyCustomer")

• Non-Disclosure Agreement (Document, 12 days ago): "... MyCustomer agrees not to disclose any part of ..."

• 2010 Sales Report (Document, 1 month ago): "... MyCustomer: 12 M€ with 3 deals ..."

• Phone Call (Phone Call, 2 days ago): Customer: MyCustomer, Time: 9:55am, Duration: 13min, Description: Invoice not received for order #2354E

Page 30

Indexing Pipeline

PDF -> Text Extractor (Tika) -> Analyzer -> Search Index (Lucene)

Phone Call -> Analyzer -> Search Index (Lucene)
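In this pipeline Tika extracts the text and Lucene analyzes and indexes it. As a conceptual illustration only (not Lucene's API), the toy Java below shows the last two stages: an analyzer that lowercases text and splits it into terms, and an inverted index mapping each term to the documents containing it. Queries go through the same analyzer, which is why "MyCustomer" and "mycustomer" match the same documents. `ToyIndex` and its tokenization rule are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy analyzer + inverted index illustrating the pipeline's last two stages (not Lucene's API). */
final class ToyIndex {
    private final Map<String, Set<Integer>> postings = new TreeMap<>(); // term -> ids of documents containing it

    /** Analyzer: lowercase, then split on non-word characters. Real analyzers also stem, drop stop words, etc. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    /** Indexing: text already extracted (from a PDF, a phone-call record, ...) goes through the analyzer. */
    void add(int docId, String extractedText) {
        for (String term : analyze(extractedText)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Querying: the same analyzer runs on the query; returns documents containing every query term. */
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```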

Page 31

A more complex Search Engine

(Mock-up: results of a search for "MyCustomer")

• 2010 Sales Report (Document, 1 month ago): "... MyCustomer: 12 M€ with 3 deals ..."

• Phone Call (Phone Call, 2 days ago): Customer: MyCustomer, Time: 9:55am, Duration: 13min, Description: Invoice not received for order #2354E

Categories: Sales, Legal, Accounting

Page 32

Indexing Pipeline with Mahout

PDF -> Text Extractor (Tika) -> Analyzer and Classifier (Mahout) -> Search Index (Lucene)

Phone Call -> Analyzer and Classifier (Mahout) -> Search Index (Lucene)
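The extra step can be sketched as a pipeline in which the extracted text is also handed to a trained classifier, whose label is stored with the document and later used to offer category filters such as Sales or Accounting. In this hypothetical Java sketch, plain functions stand in for Tika, for a trained Mahout model and for the Lucene index; none of the names are real APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical indexing pipeline with a classification step; every name here is illustrative. */
final class ClassifyingIndexer {
    /** What gets indexed: the searchable text plus the category used for filtering. */
    static final class IndexedDoc {
        final String id;
        final String text;
        final String category;

        IndexedDoc(String id, String text, String category) {
            this.id = id;
            this.text = text;
            this.category = category;
        }
    }

    private final Function<byte[], String> textExtractor;        // stands in for Tika
    private final Function<String, String> classifier;           // stands in for a trained Mahout model
    private final List<IndexedDoc> documents = new ArrayList<>(); // stands in for the Lucene index

    ClassifyingIndexer(Function<byte[], String> textExtractor, Function<String, String> classifier) {
        this.textExtractor = textExtractor;
        this.classifier = classifier;
    }

    /** Extract the text, classify it, then store text and category together. */
    void index(String id, byte[] rawContent) {
        String text = textExtractor.apply(rawContent);
        documents.add(new IndexedDoc(id, text, classifier.apply(text)));
    }

    /** Number of indexed documents per category, e.g. to display category filters next to results. */
    Map<String, Integer> categoryCounts() {
        Map<String, Integer> counts = new TreeMap<>();
        for (IndexedDoc doc : documents) counts.merge(doc.category, 1, Integer::sum);
        return counts;
    }
}
```

In a quick test, `bytes -> new String(bytes, StandardCharsets.UTF_8)` can play the extractor and a simple keyword rule the classifier; in a real system the classifier would be a model trained on already-categorized documents.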

Page 33

Query pipeline

Query -> Analyzer -> Search Index (Lucene) -> Results

Page 34

Query pipeline with Mahout

Query -> Analyzer -> Search Index (Lucene) -> Custom Scoring (using Mahout recommendations) -> Results
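One way to picture the custom scoring step: blend each hit's text relevance score with a recommendation score for the current user, then re-sort. The Java sketch below assumes recommendation scores normalized to [0, 1] and uses finalScore = textScore * (1 + boost * recommendation), an illustrative formula rather than anything Lucene or Mahout prescribes; `RecommendationRescorer` and `Hit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Re-ranks search hits with recommendation scores; the blending formula is an illustrative assumption. */
final class RecommendationRescorer {
    /** A search hit as returned by the text index: document id plus text relevance score. */
    static final class Hit {
        final String docId;
        final double textScore;

        Hit(String docId, double textScore) {
            this.docId = docId;
            this.textScore = textScore;
        }
    }

    private RecommendationRescorer() {}

    /**
     * finalScore = textScore * (1 + boost * recommendation), where recommendation is in [0, 1]
     * and defaults to 0 when the recommender has nothing to say about a document.
     * Returns document ids, best first.
     */
    static List<String> rerank(List<Hit> hits, Map<String, Double> recommendations, double boost) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(
                (Hit h) -> h.textScore * (1 + boost * recommendations.getOrDefault(h.docId, 0.0))).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) ids.add(h.docId);
        return ids;
    }
}
```

With text scores 1.0, 0.9 and 0.8 and a recommendation score of 1.0 for the third hit only, a boost of 0.5 lifts that hit to the top (0.8 * 1.5 = 1.2), while a boost of 0 keeps the original order. Multiplying rather than adding keeps a strongly recommended but irrelevant document from outranking relevant ones.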

Page 35

Conclusion

• Machine learning brings many valuable features to enterprises: increased revenue, better productivity, user adoption, ...

• Mahout is growing fast and is becoming a great choice for Java apps, with easy integration into business applications

• Business people are not used to these kinds of use cases, so collaboration with technical folks is mandatory

Page 36

Questions / Answers

@mfiguiere

blog.xebia.fr

