Post on 10-May-2015
transcript
© Mathias Brandewinder / @brandewinder
Machine Learning on .NET
F# FTW!
A few words about me
»Mathias Brandewinder / @brandewinder
»Background: economics, operations research
».NET developer for ~10 years (C#, F#)
»Bay.Net San Francisco, SFSharp.org
»www.clear-lines.com/blog
I am assuming…
»Few familiar with F#
»Mostly unfamiliar with Data Science / Machine Learning
»Mostly familiar with OO languages (C#, Java)
»Some familiar with Functional Languages
Why this talk
»Machine Learning, Data Science are red-hot topics
› ... and relevant to developers
».NET is under-represented
»ML is also for developers!
My goal
»Can’t introduce F# and Machine Learning in under 1h
»Give you a sense for what Machine Learning is
› Highlight some differences with “standard” development
› Mostly live code
»Illustrate why I think F# is a great fit
What is F#?
»Functional first, statically typed language
»Cross-platform: Windows, iOS, Linux
»Open-source (www.github.com/fsharp)
»Think lighter Scala? Python with types?
»Very friendly community (Twitter #fsharp)
What is Machine Learning?
»“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” [Tom M. Mitchell]
In English, please?
»Program performs a Task using Data
»The more Data, the better it gets
»Rooted in statistics, math
»A Computer Science problem as well
› Used in live software, with changing data
The plan
»Classification
»Regression
»Unsupervised
»Type Providers
»Existing .NET libraries
»Algebra
»Functional fit
Classification & Regression
Goal
»What does “a day of Machine Learning” look like?
»Illustrate Classification and Regression
Classification, Regression
»Classification = using data to classify items
› Ex: Spam vs. Ham, Character Recognition, …
»Regression = predicting a number
› Ex: predict price of item given attributes, …
»Both belong to Supervised Learning
› You know what question you are trying to answer
› You use data to fit a predictive model
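The difference between the two can be sketched in a few lines of F#. This is only an illustration, not the code from the talk: the nearest-neighbour classifier, the least-squares line, and the toy data are all assumptions chosen to keep the example small.

```fsharp
// Classification: predict a label from labelled examples.
// Sketch using 1-nearest-neighbour (an assumed, deliberately simple technique).
let distance (a: float[]) (b: float[]) =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b |> Array.sum

let classify (examples: (string * float[]) list) (item: float[]) =
    examples
    |> List.minBy (fun (_, features) -> distance features item)
    |> fst

// Regression: predict a number.
// Sketch fitting a least-squares line y = intercept + slope * x.
let fitLine (points: (float * float) list) =
    let meanX = points |> List.averageBy fst
    let meanY = points |> List.averageBy snd
    let slope =
        (points |> List.sumBy (fun (x, y) -> (x - meanX) * (y - meanY)))
        / (points |> List.sumBy (fun (x, _) -> (x - meanX) * (x - meanX)))
    let intercept = meanY - slope * meanX
    fun x -> intercept + slope * x

// Hypothetical toy data: two features per message, one attribute per item.
let examples =
    [ "ham",  [| 1.0; 0.0 |]
      "spam", [| 9.0; 8.0 |] ]

let label = classify examples [| 8.0; 7.0 |]   // a label: "ham" or "spam"

let predictPrice = fitLine [ (1.0, 10.0); (2.0, 20.0); (3.0, 30.0) ]
let price = predictPrice 4.0                   // a number
```

In both cases the question is known up front and the data is used to fit a predictive model; only the type of the answer differs.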
Support Vector Machine
»Classic algorithm
»Tries to separate the 2 classes by the widest possible margin
»Using Accord.NET implementation
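For reference, “widest possible margin” is usually written as the following optimization problem. This is the textbook hard-margin formulation, given here as an assumption about what the slide refers to; the symbols are not from the talk. Labels are y_i in {-1, +1}, observations are x_i, and the separating line is w · x + b = 0, with a margin of width 2 / ||w||.

```latex
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
```

Making ||w|| as small as possible makes the margin as wide as possible, while the constraints keep every observation on the correct side.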
Demo: Kaggle Digit Recognizer
Take-Aways
»F# is a first-class citizen in .NET
»Good libraries: Accord.NET, Math.NET, Alea.cuBase, …
»Interactive experience with the REPL
»Syntax matters!
»Classification, Regression, Cross-Validation
Unsupervised
Goal
»Illustrate unsupervised learning
»Functional programming and ML are a great fit
Writing your own
»Usually not advised
»Useful for ML because
› Active research: you might not have a library yet
› As you learn your domain, you may need a custom model
Most ML algorithms are the same
»Read data
»Transform into Features
»Learn a Model from the Features
»Evaluate Model quality
Translates well to FP
»Read data
»Transform into Features -> Map
»Learn a Model from the Features -> Recursion
»Evaluate Model quality -> Fold/Reduce
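The four steps above can be sketched as a tiny F# script. This is a minimal illustration under stated assumptions, not code from the talk: the in-memory `raw` data, the one-parameter model y = w * x, the learning rate, and the iteration count are all made up for the example.

```fsharp
// Read data: an in-memory stand-in for a file (hypothetical values).
let raw = [ "1,2"; "2,4"; "3,6" ]

// Transform into Features -> Map
let features =
    raw
    |> List.map (fun line ->
        let cols = line.Split(',')
        float cols.[0], float cols.[1])

// Evaluate Model quality -> Fold/Reduce (sum of squared errors)
let cost (w: float) =
    features
    |> List.fold (fun acc (x, y) -> acc + (w * x - y) ** 2.0) 0.0

// Learn a Model from the Features -> Recursion
// (gradient descent on y = w * x; 0.01 is an assumed learning rate)
let rec learn (w: float) (iterations: int) =
    if iterations = 0 then w
    else
        let gradient =
            features |> List.sumBy (fun (x, y) -> 2.0 * (w * x - y) * x)
        learn (w - 0.01 * gradient) (iterations - 1)

let model = learn 0.0 100   // converges towards w = 2 on this data
let quality = cost model    // close to 0 when the fit is good
```

Each step is a plain function over immutable data, so swapping the feature extraction or the cost function does not touch the rest of the pipeline.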
Focus on transforms, not objects
»Need to transform Features rapidly
› Don’t force domain to fit algorithm
› Morph around the shape of the data, pass functions
› Algorithms need to be generic
»FP is fantastic for code reuse
What is Unsupervised Learning?
»“Tell me something about my data”
»Example: Clustering
› Find groups of “similar” entities in my dataset
Example: clustering (1)
Example: clustering (2)
“Assign to closest Centroid” [Map Distance]
Example: clustering (3)
“Update Centroids based on Cluster” [Reduce]
Example: clustering (4)
“Stop when no change” [Recursion]
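The three clustering steps map almost directly onto F# functions. The following is a minimal k-means sketch, assuming Euclidean distance and hand-picked initial centroids; it is not the demo code, and the `Point` alias and sample data are hypothetical.

```fsharp
type Point = float[]

let distance (a: Point) (b: Point) =
    Array.map2 (fun x y -> (x - y) ** 2.0) a b |> Array.sum |> sqrt

// "Assign to closest Centroid" [Map Distance]
let closest (centroids: Point list) (p: Point) =
    centroids |> List.minBy (distance p)

// "Update Centroids based on Cluster" [Reduce]
let average (cluster: Point list) : Point =
    let sum = cluster |> List.reduce (Array.map2 (+))
    sum |> Array.map (fun x -> x / float cluster.Length)

// "Stop when no change" [Recursion]
// Note: a centroid that attracts no points is dropped in this sketch.
let rec kmeans (centroids: Point list) (data: Point list) =
    let updated =
        data
        |> List.groupBy (closest centroids)
        |> List.map (fun (_, cluster) -> average cluster)
    if updated = centroids then centroids
    else kmeans updated data

// Hypothetical data: two obvious groups, seeded with one point from each.
let data : Point list =
    [ [| 1.0; 1.0 |]; [| 1.0; 2.0 |]; [| 9.0; 9.0 |]; [| 10.0; 9.0 |] ]

let centroids = kmeans [ data.[0]; data.[2] ] data
```

Because `distance` is just a function, it can be replaced by any other notion of similarity without changing `kmeans` itself.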
Demo
Type Providers
No data, no learning
»Most of ML effort is spent acquiring data
»Most of the World is not in your Type System
»Unpleasant trade-off:
› Dynamic: easy hacking but runtime exceptions
› Static: safer, but a straitjacket
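A type provider aims to remove this trade-off by generating types from a sample of the data at compile time. Here is a small sketch using the `JsonProvider` from the FSharp.Data library. It assumes an F# Interactive session on a recent .NET SDK with NuGet access; the `Person` type name and the JSON samples are made up for illustration.

```fsharp
#r "nuget: FSharp.Data"
open FSharp.Data

// The provider infers a type from the inline sample:
// a string property Name and an int property Age.
type Person = JsonProvider<""" { "name": "John", "age": 94 } """>

// Data with the same shape is parsed into that type.
let person = Person.Parse(""" { "name": "Tomas", "age": 4 } """)

// Properties are statically typed and discoverable in the editor;
// a typo such as person.Nmae is a compile-time error.
printfn "%s is %d years old" person.Name person.Age
```

The data stays outside the type system until it is needed, yet the code that uses it is checked like any other F# code.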
Demo
»http://www.youtube.com/watch?v=cCuGgA9Yqrs
Conclusion
F# is a perfect fit for ML on .NET
»Functional style fits very well with ML
»REPL/interactive experience is crucial
»Smooth integration with all of .NET
»Flexible exploration, performance in production
»Type Providers: static types, without the pain
My recommendation
»Take a look at Machine Learning, Data Science
»Do it with a functional language
»… and preferably, do it using F#
Getting involved
»Very dynamic community
»FSharp.org, the F# Foundation
»#fsharp on Twitter