Introducing Apache Mahout

Date post: 05-Jan-2016
Page 1: Introducing Apache Mahout

Introducing Apache Mahout

Scalable Machine Learning for All!

Grant Ingersoll

Lucid Imagination

Page 2: Introducing Apache Mahout

Overview

• What is Machine Learning?

• Mahout

Page 3: Introducing Apache Mahout

Definition

• “Machine Learning is programming computers to optimize a performance criterion using example data or past experience”

– Introduction to Machine Learning by E. Alpaydin

• Subset of Artificial Intelligence

– Draws on many other fields: computer science, biology, math, psychology, etc.

Page 4: Introducing Apache Mahout

Types

• Supervised

– Using labeled training data, create a function that predicts the output of unseen inputs

• Unsupervised

– Using unlabeled data, create a function that predicts output

• Semi-Supervised

– Uses labeled and unlabeled data

Page 5: Introducing Apache Mahout

Characterizations

• Lots of Data

• Identifiable Features in that Data

• Too big/costly for people to handle

– People still can help

Page 6: Introducing Apache Mahout

Clustering

• Unsupervised

• Find Natural Groupings

– Documents

– Search Results

– People

– Genetic traits in groups

– Many, many more uses
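To make "find natural groupings" concrete, here is a minimal k-means sketch in plain Python. It is an illustration only, not Mahout's implementation: the function name `kmeans`, the sample points, and the choice to seed the centroids with the first `k` points (for reproducibility) are all assumptions made for this example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    # Assumption for this sketch: seed centroids with the first k points.
    centroids = list(points[:k])
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical 2-D data: two visibly separate groups, interleaved.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

No labels are supplied anywhere: the grouping comes only from the distances between points, which is what makes the technique unsupervised.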

Page 7: Introducing Apache Mahout

Example: Clustering

Google News

Page 8: Introducing Apache Mahout

Collaborative Filtering

• Unsupervised

• Recommend people and products

– User-User

• User likes X, you might too

– Item-Item

• People who bought X also bought Y
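The item-item idea ("people who bought X also bought Y") can be sketched by counting how often two items appear in the same purchase. This is a simplified illustration, not how Taste computes similarity; the helper names `cooccurrence` and `also_bought` and the baskets are made up for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts[a][b] += 1
    return counts


def also_bought(counts, item, n=2):
    """Return up to n items most often bought together with `item`."""
    return [other for other, _ in counts[item].most_common(n)]


# Hypothetical purchase histories, one set of items per customer.
baskets = [
    {"book", "lamp"},
    {"book", "lamp", "pen"},
    {"book", "pen"},
    {"lamp", "desk"},
    {"book", "lamp"},
]
counts = cooccurrence(baskets)
print(also_bought(counts, "book"))
```

Raw counts favor popular items, so real systems usually normalize them (for example with a similarity measure) before ranking.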

Page 9: Introducing Apache Mahout

Example: Collab Filtering

Amazon.com

Page 10: Introducing Apache Mahout

Classification/Categorization

• Many, many types

• Spam Filtering

• Named Entity Recognition

• Phrase Identification

• Sentiment Analysis

• Classification into a Taxonomy

Page 11: Introducing Apache Mahout

Example: NER

NER?

Excerpt from Yahoo News

Page 12: Introducing Apache Mahout

Example: Categorization

Page 13: Introducing Apache Mahout

Info. Retrieval

• Learning Ranking Functions

• Learning Spelling Corrections

• User Click Analysis and Tracking

Page 14: Introducing Apache Mahout

Other

• Image Analysis

• Robotics

• Games

• Higher level natural language processing

• Many, many others

Page 15: Introducing Apache Mahout

What is Apache Mahout?

• A Mahout is an elephant trainer/driver/keeper, hence…

[image] + Machine Learning = [image]

(and other distributed techniques)

Page 16: Introducing Apache Mahout

What?

• Hadoop brings:

– Map/Reduce API

– HDFS

– In other words, scalability and fault-tolerance

• Mahout brings:

– Library of machine learning algorithms

– Examples
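For readers new to the Map/Reduce model, the following single-process sketch shows the three stages that the framework distributes across machines: map, shuffle (group by key), and reduce. It does not use the Hadoop API; the function names and the sample documents are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input records.
documents = [
    "Mahout rides Hadoop",
    "Hadoop stores data in HDFS",
    "mahout learns from data",
]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(word_counts)
```

Because each mapper call sees one record and each reducer call sees one key, both stages can run in parallel on different machines, which is where the scalability comes from.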

Page 17: Introducing Apache Mahout

Why Mahout?

• Many Open Source ML libraries either:

– Lack Community

– Lack Documentation and Examples

– Lack Scalability

– Lack the Apache License ;-)

– Or are research-oriented

Page 18: Introducing Apache Mahout

Why Mahout?

• Intelligent Apps are the Present and Future

• Thus, Mahout’s Goal is:

– Scalable Machine Learning with Apache License

Page 19: Introducing Apache Mahout

Current Status

• What’s in it:

– Simple Matrix/Vector library

– Taste Collaborative Filtering

– Clustering

• Canopy/K-Means/Fuzzy K-Means/Mean-shift/Dirichlet

– Classifiers

• Naïve Bayes

• Complementary NB

– Evolutionary

• Integration with Watchmaker for fitness function
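Of the clustering algorithms listed, canopy clustering is probably the least familiar. A minimal single-machine sketch follows, assuming Euclidean distance and two thresholds with `t1 > t2`; the function name `canopy` and the sample points are illustrative and this is not Mahout's distributed implementation.

```python
import math


def canopy(points, t1, t2):
    """Group points into possibly overlapping canopies (requires t1 > t2)."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Take the next unprocessed point as a new canopy center.
        center = remaining.pop(0)
        members = [center]
        still_remaining = []
        for p in remaining:
            d = math.dist(center, p)
            if d < t1:
                members.append(p)  # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(p)  # not tightly bound: stays a candidate
        remaining = still_remaining
        canopies.append((center, members))
    return canopies


# Hypothetical points on a line, given as 2-D tuples.
points = [(0.0, 0.0), (0.5, 0.0), (2.0, 0.0), (10.0, 0.0), (10.4, 0.0)]
canopies = canopy(points, t1=3.0, t2=1.0)
for center, members in canopies:
    print(center, members)
```

A single cheap pass like this is often used to pick the number of clusters and starting centers for a more expensive algorithm such as k-means.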

Page 20: Introducing Apache Mahout

How?

• Examples

– Taste

– Clustering

– Classification

– Evolutionary

Page 21: Introducing Apache Mahout

Taste: Movie Recommendations

• Given ratings by users of movies, recommend other movies

• http://lucene.apache.org/mahout/taste.html#demo
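The task on this slide can be sketched as user-based collaborative filtering: find users whose ratings correlate with yours, then score the movies you have not seen by their ratings. The code below is a plain-Python illustration under that assumption, not the Taste API; the function names, users, movies, and ratings are all invented for the example.

```python
import math


def pearson(a, b):
    """Pearson correlation over the items both users rated; 0.0 if undefined."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    xs = [a[i] for i in shared]
    ys = [b[i] for i in shared]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def recommend(ratings, user, n=3):
    """Rank unseen movies by the similarity-weighted ratings of like-minded users."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim <= 0:  # ignore users with no or opposite taste
            continue
        for movie, rating in other_ratings.items():
            if movie not in ratings[user]:
                totals[movie] = totals.get(movie, 0.0) + sim * rating
                weights[movie] = weights.get(movie, 0.0) + sim
    scored = {m: totals[m] / weights[m] for m in totals}
    return sorted(scored, key=scored.get, reverse=True)[:n]


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 5, "Brazil": 5, "Casablanca": 2, "Dune": 5, "Eraserhead": 2},
    "cat": {"Alien": 1, "Brazil": 2, "Casablanca": 5, "Dune": 1, "Eraserhead": 5},
}
print(recommend(ratings, "ann"))
```

Here "ann" agrees with "bob" and disagrees with "cat", so only bob's ratings influence her recommendations.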

Page 22: Introducing Apache Mahout

Taste Demo

• http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=12&debug=true

• http://localhost:8080/mahout-taste-webapp/RecommenderServlet?userID=43&debug=true

Page 23: Introducing Apache Mahout

Clustering: Synthetic Control Data

• http://archive.ics.uci.edu/ml/datasets/Synthetic+Control+Chart+Time+Series

• Each clustering implementation has an example Job for running in <MAHOUT_HOME>/examples

– org.apache.mahout.clustering.syntheticcontrol.*

• Outputs clusters…

Page 24: Introducing Apache Mahout

Classification: NB and CNB Examples

• 20 Newsgroups

– http://cwiki.apache.org/confluence/display/MAHOUT/TwentyNewsgroups

• Wikipedia

– http://cwiki.apache.org/confluence/display/MAHOUT/WikipediaBayesExample
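As background for the NB examples, here is a small multinomial Naive Bayes text classifier with add-one smoothing. It is a teaching sketch and not Mahout's Bayes code, and it does not cover the complementary variant. The class name `NaiveBayes` and the training sentences are invented; the two labels are borrowed from the 20 Newsgroups group names.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def train(self, label, text):
        self.doc_counts[label] += 1
        for word in text.lower().split():
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log P(label) plus the sum of log P(word | label); smoothing keeps
            # unseen words from driving the probability to zero.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training snippets.
model = NaiveBayes()
model.train("rec.autos", "engine car brakes car dealer")
model.train("rec.autos", "car tires engine oil")
model.train("sci.space", "orbit rocket launch nasa")
model.train("sci.space", "shuttle orbit moon launch")
print(model.classify("car engine oil"))
```

Working in log space avoids numeric underflow when documents contain many words.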

Page 25: Introducing Apache Mahout

Evolutionary

• Traveling Salesman

– http://cwiki.apache.org/confluence/display/MAHOUT/Traveling+Salesman

• Class Discovery

– http://cwiki.apache.org/confluence/display/MAHOUT/Class+Discovery
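To show what an evolutionary approach to the Traveling Salesman problem looks like, here is a compact sketch with a fitness function (tour length), a mutation operator, and selection with elitism. It is written in plain Python rather than with Watchmaker, and every name, parameter value, and city coordinate in it is an assumption made for illustration.

```python
import math
import random


def tour_length(tour, cities):
    """Fitness function: total length of the closed tour (lower is better)."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def mutate(tour, rng):
    """Reverse a random segment of the tour, which keeps it a valid permutation."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


def evolve(cities, population_size=30, generations=200, seed=0):
    rng = random.Random(seed)
    # Start from random permutations of the city indices.
    population = [
        rng.sample(range(len(cities)), len(cities)) for _ in range(population_size)
    ]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=lambda t: tour_length(t, cities))
        survivors = population[: population_size // 2]
        # Refill the population with mutated copies of survivors.
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return min(population, key=lambda t: tour_length(t, cities))


# Hypothetical cities on the border of a 2 x 2 square.
cities = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
best = evolve(cities)
print(best, tour_length(best, cities))
```

Because the best tours always survive unchanged, the best fitness can never get worse from one generation to the next; how close it gets to the optimum depends on the parameters and the random seed.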

Page 26: Introducing Apache Mahout

What’s Next?

• More Examples

• Winnow/Perceptron (MAHOUT-85)

• Text Clustering

• Association Rules (MAHOUT-108)

• Logistic Regression

• Solr Integration (SOLR-769)

• GSOC
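For readers unfamiliar with the perceptron named in this roadmap, the classic update rule fits in a few lines. The sketch below is a generic textbook version, not the code proposed in MAHOUT-85; the function names and the toy AND dataset are assumptions for the example.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Learn weights and a bias from (features, label) pairs with labels in {-1, +1}."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in samples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            if label * activation <= 0:  # misclassified, or exactly on the boundary
                # Nudge the boundary toward classifying this sample correctly.
                weights = [w + learning_rate * label * x for w, x in zip(weights, features)]
                bias += learning_rate * label
                mistakes += 1
        if mistakes == 0:  # every training sample is now classified correctly
            break
    return weights, bias


def predict(weights, bias, features):
    """Return +1 or -1 depending on which side of the boundary the input falls."""
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else -1


# Logical AND over {0, 1} inputs, which is linearly separable.
samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
print([predict(weights, bias, x) for x, _ in samples])
```

The rule only converges when the classes are linearly separable; for data that is not, the epoch limit is what stops training.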

Page 27: Introducing Apache Mahout

When, Who

• When? Now!

– Mahout is growing

• Who? You!

– We want programmers who:

• Are comfortable with math

• Like to work on hard problems

– We want others to:

• Kick the tires

Page 28: Introducing Apache Mahout

Where?

• http://lucene.apache.org/mahout

– Hadoop - http://hadoop.apache.org

• http://cwiki.apache.org/MAHOUT

• mahout-{user|dev}@lucene.apache.org

– http://www.lucidimagination.com/search/p:mahout

Page 29: Introducing Apache Mahout

Resources

• “Programming Collective Intelligence” by Segaran

• “Data Mining - Practical Machine Learning Tools and Techniques” by Witten and Frank

• “Taming Text” by Ingersoll and Morton
