Machine Learning for Web Data

Date post: 27-Jan-2015
Upload: hilary-mason
Description:
Presentation at Web Directions 2010, Atlanta, GA.
Machine Learning for Web Data Hilary Mason Web Directions USA 2010
Transcript
Page 1: Machine Learning for Web Data

Machine Learning for Web Data

Hilary Mason

Web Directions USA 2010

Page 2: Machine Learning for Web Data
Page 3: Machine Learning for Web Data

= new capacities

(superpowers)

Machine learning is a way of thinking about data.

Page 4: Machine Learning for Web Data
Page 5: Machine Learning for Web Data

http://www.meetup.com/NYC-Tech-Talks/calendar/12939544/?from=list&offset=0

http://bit.ly/9N7VB1

Page 6: Machine Learning for Web Data

6

Page 7: Machine Learning for Web Data

wicked hard problem

10s of millions of URLs / day

100s of millions of events / day

1000s of millions of data points

Page 8: Machine Learning for Web Data
Page 9: Machine Learning for Web Data

?

Page 10: Machine Learning for Web Data
Page 11: Machine Learning for Web Data

@hmason

Page 12: Machine Learning for Web Data

[archive photo]

Page 13: Machine Learning for Web Data

ELIZA

Page 14: Machine Learning for Web Data
Page 15: Machine Learning for Web Data
Page 16: Machine Learning for Web Data
Page 17: Machine Learning for Web Data

ML Today

Page 18: Machine Learning for Web Data

Algorithms +

On-demand computing +

Ubiquitous data

Page 19: Machine Learning for Web Data

Algorithms

New frames for modeling the world with data.

Page 20: Machine Learning for Web Data
Page 21: Machine Learning for Web Data

[moar data and new kinds of data]

Page 22: Machine Learning for Web Data

Examples

Page 23: Machine Learning for Web Data

[spam filters]

Page 24: Machine Learning for Web Data

[netflix movie recommendations]

Page 25: Machine Learning for Web Data

Language Identification

Page 26: Machine Learning for Web Data

Face Identification

Page 27: Machine Learning for Web Data

Machine Learning

Page 28: Machine Learning for Web Data

Supervised Learning

Vs

Unsupervised Learning

Page 29: Machine Learning for Web Data

Clustering

immunity

ultrasound

medical imaging

medical devices

thermoelectric devices

fault-tolerant circuits

low power devices

Page 30: Machine Learning for Web Data

Entity disambiguation

This is important.

Page 31: Machine Learning for Web Data

ME

UGLY HAG

Page 32: Machine Learning for Web Data

Entity disambiguation

This is important.

Company disambiguation is a very common problem: are “Microsoft”, “Microsoft Corporation”, and “MS” the same company?
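A minimal sketch of one rule-based way to treat the three names above as the same company. The alias table, the suffix list, and the function name `canonical_company` are all hypothetical and chosen only for illustration; they are not from the talk.

```python
import re

# Hypothetical alias table: known short forms mapped to one canonical name.
ALIASES = {"ms": "microsoft", "msft": "microsoft"}
# Hypothetical list of corporate suffixes that do not identify the company.
SUFFIXES = {"corporation", "corp", "inc", "incorporated", "co", "ltd", "llc"}

def canonical_company(name):
    """Lowercase, drop punctuation and trailing suffixes, then apply aliases."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens.pop()
    key = " ".join(tokens)
    return ALIASES.get(key, key)

names = ["Microsoft", "Microsoft Corporation", "MS"]
canonical = {canonical_company(n) for n in names}
print(canonical)
```

A lookup table like this only works when an alias is unambiguous. A short form such as "MS" can refer to other entities, which is where context and learned models come in.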

Page 33: Machine Learning for Web Data

Classification

Page 34: Machine Learning for Web Data

classification

[diagram: Training Data → Feature Extractor → Trained Classifier; Text → Feature Extractor → Trained Classifier → Cats / Dogs / Fire]
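The pipeline on this slide (training data, feature extractor, trained classifier, labels) can be sketched in a few lines of plain Python. This is an illustrative naive Bayes classifier over a bag of words with add-one smoothing. The training sentences are invented, and the talk does not say which classifier the diagram stands for.

```python
import math
from collections import Counter, defaultdict

def extract_features(text):
    """Feature extractor: a lowercase bag of words."""
    return text.lower().split()

def train(labeled_texts):
    """Count words per label; the counts are the 'trained classifier'."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_texts:
        label_counts[label] += 1
        word_counts[label].update(extract_features(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary

def classify(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in extract_features(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented training data using the three labels from the slide.
training_data = [
    ("my cat purrs and naps", "Cats"),
    ("the kitten chased a mouse", "Cats"),
    ("my dog barks at the mailman", "Dogs"),
    ("the puppy fetched a stick", "Dogs"),
    ("smoke and flames in the building", "Fire"),
    ("the fire alarm is ringing", "Fire"),
]
model = train(training_data)
print(classify(model, "the kitten purrs"))
print(classify(model, "flames and smoke"))
```

The same feature extractor is used for training and for new text, which is why it appears twice in the diagram.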

Page 35: Machine Learning for Web Data

<math>

Page 36: Machine Learning for Web Data

Probability

P(A) is the probability that A is true.

Page 37: Machine Learning for Web Data

Axioms of Probability

0 ≤ P(A) ≤ 1

P(True) = 1

P(False) = 0

P(A or B) = P(A) + P(B) – P(A and B)

Page 38: Machine Learning for Web Data

P(A or B) = P(A) + P(B) – P(A and B)

P(A)

P(B)

P(A and B)
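A quick numeric check of the rule above, counting outcomes directly. The sample space (one roll of a fair die) and the events A and B are assumptions chosen for illustration.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
outcomes = set(range(1, 7))
A = {2, 4, 6}   # event "the roll is even"
B = {4, 5, 6}   # event "the roll is greater than 3"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

lhs = prob(A | B)                       # P(A or B)
rhs = prob(A) + prob(B) - prob(A & B)   # P(A) + P(B) - P(A and B)
print(lhs, rhs)
```

Subtracting P(A and B) removes the overlap that would otherwise be counted twice, once in P(A) and once in P(B).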

Page 39: Machine Learning for Web Data

Bayes' Law
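For reference, the standard statement of the law in the P(A) notation used on the earlier slides. A and B are generic events here, not symbols taken from the slide.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```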

Page 40: Machine Learning for Web Data

Example

There are 10,000 people.

1% have a rare disease.

Page 41: Machine Learning for Web Data

Example

• Population of 10,000
• 1% have rare disease
• There’s a test that is 99% effective.
  – 99% of sick patients test positive
  – 99% of healthy patients test negative

Page 42: Machine Learning for Web Data

Given a positive test result, what is the probability that the patient is sick?

Page 43: Machine Learning for Web Data
Page 44: Machine Learning for Web Data

Disease Diagnosis

99 sick patients test positive, 99 healthy patients test positive

Given a positive test, there is a 50% probability that the patient is sick.
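The arithmetic behind the 50% figure can be checked with a short sketch. The numbers are the ones given in the example; the function and variable names are illustrative. Exact fractions are used so that the result is not blurred by floating-point rounding.

```python
from fractions import Fraction

def posterior_sick_given_positive(prior, sensitivity, specificity):
    """P(sick | positive) computed with Bayes' law."""
    p_pos_given_sick = sensitivity
    p_pos_given_healthy = 1 - specificity
    # Total probability of a positive test, over sick and healthy patients.
    p_pos = p_pos_given_sick * prior + p_pos_given_healthy * (1 - prior)
    return p_pos_given_sick * prior / p_pos

population = 10_000
prior = Fraction(1, 100)          # 1% have the disease
sensitivity = Fraction(99, 100)   # 99% of sick patients test positive
specificity = Fraction(99, 100)   # 99% of healthy patients test negative

sick_positive = population * prior * sensitivity
healthy_positive = population * (1 - prior) * (1 - specificity)
posterior = posterior_sick_given_positive(prior, sensitivity, specificity)
print(sick_positive, healthy_positive, posterior)  # 99 99 1/2
```

The test is accurate, but healthy patients outnumber sick ones 99 to 1, so the 1% of healthy patients who test positive are as numerous as the sick patients who do.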

Page 45: Machine Learning for Web Data

Bayesian Disease

We know the probability of testing positive given that the patient is healthy, and of testing negative given that the patient is sick.

Use Bayes' theorem to invert these conditional probabilities.

Page 46: Machine Learning for Web Data

</math>

Page 47: Machine Learning for Web Data

Obtain

Scrub

Explore

Model

iNterpret

Page 48: Machine Learning for Web Data
Page 49: Machine Learning for Web Data

1. Obtain Data

“pointing and clicking does not scale!”

http://www.delicious.com/pskomoroch/dataset

Page 50: Machine Learning for Web Data

lynx -dump http://www.nytimes.com

Lynx: http://bit.ly/a6Pumm
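For a scripted pipeline, a rough Python analogue of dumping a page to text, using only the standard library. This is a sketch, not a reimplementation of Lynx: it keeps visible text and drops script and style contents, and it runs on an HTML string. Fetching the page (for example with `urllib.request.urlopen`) is left out, and the class name, function name, and sample markup are made up for illustration.

```python
from html.parser import HTMLParser

class TextDump(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text outside skipped tags, with whitespace collapsed.
        if not self._skip_depth and data.strip():
            self.parts.append(" ".join(data.split()))

def dump_text(html):
    parser = TextDump()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)

# Made-up sample markup standing in for a fetched page.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Top   Stories</h1><p>Markets <b>rally</b> today.</p>"
    "<script>var x = 1;</script></body></html>"
)
text = dump_text(sample)
print(text)
```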

2. Scrub

Page 51: Machine Learning for Web Data
Page 52: Machine Learning for Web Data

3. Explore

http://vis.stanford.edu/protovis/

Page 53: Machine Learning for Web Data

4. Model

Google Prediction API

http://code.google.com/apis/predict/

Page 54: Machine Learning for Web Data

4. Model

Python

• NLTK - http://www.nltk.org/
• Scikits Learn - http://scikit-learn.sourceforge.net/

Page 55: Machine Learning for Web Data

4. Model

http://www.alchemyapi.com/

Page 56: Machine Learning for Web Data

5. Interpret

Andrew Vande Moere – Visual Poetry 06

Page 57: Machine Learning for Web Data

http://www.dataists.com

Page 58: Machine Learning for Web Data

One Final Example

Twitter is full of noise.

Sports – down

Math – UP!

Narcissism – down

Page 59: Machine Learning for Web Data

Code!

Page 60: Machine Learning for Web Data

Filtering & Relevance Ordering

http://github.com/hmason/tc
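To make the idea of filtering and relevance ordering concrete, here is a minimal keyword-weight sketch. It is not the code in the repository above. The topic keywords, the weights, and the function names are all hypothetical, and a real system would more likely use a trained classifier per topic than fixed word lists.

```python
# Hypothetical topic keywords; a real system would learn these from data.
TOPIC_KEYWORDS = {
    "math": {"theorem", "proof", "probability", "bayes"},
    "sports": {"game", "score", "team", "touchdown"},
    "narcissism": {"i", "me", "my", "myself"},
}
# Hypothetical weights: positive pushes a tweet up, negative pushes it down.
TOPIC_WEIGHTS = {"math": 2.0, "sports": -1.0, "narcissism": -1.0}

def relevance(tweet):
    """Sum the weights of every topic whose keywords appear in the tweet."""
    words = set(tweet.lower().split())
    return sum(
        weight
        for topic, weight in TOPIC_WEIGHTS.items()
        if words & TOPIC_KEYWORDS[topic]
    )

def filter_and_rank(tweets, threshold=0.0):
    """Drop tweets scoring below the threshold, then sort best first."""
    scored = [(relevance(t), t) for t in tweets]
    kept = [(s, t) for s, t in scored if s >= threshold]
    return [t for s, t in sorted(kept, key=lambda st: st[0], reverse=True)]

tweets = [
    "my team won the game",
    "a neat proof of bayes theorem",
    "lunch was fine",
    "look at me",
]
ranked = filter_and_rank(tweets)
print(ranked)
```

Filtering (the threshold) and ordering (the sort) are kept as separate steps so that either can be tuned without touching the other.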

Page 61: Machine Learning for Web Data

What’s next?

Page 62: Machine Learning for Web Data

Soon:

Natural Language Generation

Rich media classification

Contextual everything

Page 63: Machine Learning for Web Data

Algorithms-As-A-Service

Page 64: Machine Learning for Web Data

infer links in data

Page 65: Machine Learning for Web Data

Filtering

Page 66: Machine Learning for Web Data

Relevance

Page 67: Machine Learning for Web Data

[email protected]

@hmason

Thank you!

