Data Mining (and machine learning)

Page 1: Data Mining (and machine learning)

Data Mining (and machine learning)

A few important things in brief

Topics: the top-10 data mining algorithms (top10dm), neural networks, overfitting, and SVMs

Page 2: Data Mining (and machine learning)

http://is.gd/top10dm

1. C4.5
2. k-means
3. SVMs
4. Apriori
5. EM algorithm
6. PageRank
7. AdaBoost
8. k-NN
9. Naive Bayes
10. CART

Page 3: Data Mining (and machine learning)

The same top-10 list, annotated: these are the 10 algorithms most mentioned in the data mining academic literature up to 2007, not including the machine learning literature.

Page 5: Data Mining (and machine learning)

The list once more, with brief annotations: C4.5 and CART are decision tree methods; SVMs are a machine-learning decision boundary method; Apriori and PageRank are specific to certain kinds of data; the EM algorithm is heavily mathematical, a generalised version of k-means.

Page 6: Data Mining (and machine learning)

So... today we will look at:

- The no. 1 Machine Learning method …
- Overfitting (this is a good place for it...)
- Support vector machines

In the last lecture we will see the new number 1 Machine Learning method …

Page 7: Data Mining (and machine learning)

Decision boundary methods: finding a separating boundary

Page 13: Data Mining (and machine learning)

If your data were 2D, you could plot your known data points, colour them with their known classes, draw the boundary you think looks best, and use that as the classifier.

Page 14: Data Mining (and machine learning)

Drawing the boundary by hand works when the data is 2D, but it is much more common (and effective) to use machine learning methods.

Neural networks and support vector machines are the most common 'decision boundary' methods. In both cases they learn by finding the parameters of a very complex curve, and that curve is the decision boundary.

Page 15: Data Mining (and machine learning)

Artificial Neural Networks

(Diagrams: an artificial neuron (node), and an ANN (a neural network).)

Nodes abstractly model neurons; they do very simple number crunching.

Numbers flow from left to right: the numbers arriving at the input layer get "transformed" to a new set of numbers at the output layer.

There are many kinds of nodes, and many ways of combining them into a network, but we need only be concerned with the types described here, which turn out to be sufficient for any (consistent) pattern-classification task.

Page 16: Data Mining (and machine learning)

A single node (artificial neuron) works like this

(Diagram: a node with weights 3, 2, 1 on its three input lines and -2, 2 on its two output lines.)

Page 17: Data Mining (and machine learning)

A single node (artificial neuron) works like this

(Diagram: the same node; the values 4, 0 and -3 arrive on the input lines weighted 3, 2 and 1.)

Field values come along (inputs from us, or from other nodes).

Page 18: Data Mining (and machine learning)

A single node (artificial neuron) works like this

(Diagram: 4×3 = 12, 0×2 = 0, -3×1 = -3; the output lines still carry weights -2 and 2.)

They get multiplied by the strengths on the input lines …

Page 19: Data Mining (and machine learning)

A single node (artificial neuron) works like this

(Diagram: the same node and weights.)

The node adds up its inputs, and applies a simple function f to the sum: f(12 + 0 - 3) = f(9).

Page 20: Data Mining (and machine learning)

A single node (artificial neuron) works like this

(Diagram: the result f(9) leaves along the two output lines as -2 × f(9) and 2 × f(9).)

It sends the result out along its output lines, where it will in turn get multiplied by the line weights before being delivered …
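
To make the arithmetic concrete, here is a minimal sketch of this node in Python. The slides leave the node's "simple function" f unspecified, so the identity is assumed here; the weights and inputs are the ones from the diagrams above.

```python
# A minimal sketch of the node above. The node's "simple function" f
# is unspecified on the slides, so the identity is assumed.
def node_output(inputs, weights, f=lambda s: s):
    # multiply each incoming value by the weight on its line, sum, apply f
    return f(sum(x * w for x, w in zip(inputs, weights)))

result = node_output([4, 0, -3], [3, 2, 1])   # f(12 + 0 - 3) = f(9)
print(result * -2, result * 2)                # the values sent down the two output lines
```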

Page 21: Data Mining (and machine learning)

Computing AND with a NN

(Diagram: inputs A and B connect to the output node along lines weighted 0.5 and 0.5.)

The blue node is the output node. It adds the weighted inputs, and outputs 1 if the result is >= 1, otherwise 0. So with these weights the output is 1 only when both A and B are 1.

Page 22: Data Mining (and machine learning)

Computing OR with a NN

(Diagram: inputs A and B connect to the output node along lines weighted 1 and 1.)

The blue node is the output node. It adds the weighted inputs, and outputs 1 if the result is >= 1, otherwise 0.

With these weights, only one of the inputs needs to be 1 for the output to be 1. The output will be 0 only if both inputs are zero.

Page 23: Data Mining (and machine learning)

Computing NOT with a NN

(Diagram: input A connects to the output node with weight -1; a bias unit, which always sends a fixed signal of 1, connects with weight 1.)

This NN computes the NOT of input A. The blue unit is a threshold unit with a threshold of 1, as before.

So if A is 1, the weighted sum at the output unit is -1 + 1 = 0, hence the output is 0; if A is 0, the weighted sum is 1, so the output is 1.
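
A short sketch of all three gate networks, assuming nothing beyond what the slides state: a single unit that fires 1 when its weighted input sum reaches the threshold of 1.

```python
# A sketch of the three gate networks above: one threshold unit
# that outputs 1 when its weighted input sum reaches the threshold of 1.
def threshold_unit(inputs, weights, threshold=1):
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

def AND(a, b): return threshold_unit([a, b], [0.5, 0.5])
def OR(a, b):  return threshold_unit([a, b], [1, 1])
def NOT(a):    return threshold_unit([a, 1], [-1, 1])   # second input is the bias unit, fixed at 1

for a in (0, 1):
    for b in (0, 1):
        print(f"A={a} B={b}  AND={AND(a, b)}  OR={OR(a, b)}")
    print(f"NOT {a} = {NOT(a)}")
```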

Page 24: Data Mining (and machine learning)

So, an NN can compute AND, OR and NOT – so what?

It is straightforward to combine ANNs together, with outputs from some becoming the inputs of others, etc. That is, we can combine them just like logic gates on a microchip.

And this means that a neural network can compute ANY function of its inputs, just as circuits of logic gates can; see the XOR sketch below.
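
As one illustration of such combining, here is a hedged sketch of XOR, a function no single threshold unit can compute, built by wiring the AND, OR and NOT networks from the previous slides together as OR(a, b) AND NOT(AND(a, b)).

```python
# Combining units, as the slide describes: outputs of some become
# inputs of others. XOR built as OR(a, b) AND NOT(AND(a, b)).
def threshold_unit(inputs, weights, threshold=1):
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

def XOR(a, b):
    either   = threshold_unit([a, b], [1, 1])               # OR unit
    both     = threshold_unit([a, b], [0.5, 0.5])           # AND unit
    not_both = threshold_unit([both, 1], [-1, 1])           # NOT unit (with bias input 1)
    return threshold_unit([either, not_both], [0.5, 0.5])   # final AND unit

print([XOR(a, b) for a, b in ((0, 0), (0, 1), (1, 0), (1, 1))])   # [0, 1, 1, 0]
```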

Page 25: Data Mining (and machine learning)

And you're telling me this because … ? Imagine this: an image of a handwritten character is converted into an array of grey levels (the inputs), and there are 26 outputs, one for each character.

(Diagram: grey-level values, e.g. 7, 2, 3, 0, 0, 0, enter the network; the outputs are labelled a, b, c, d, e, f, …)

Weights on the links are chosen such that the output corresponding to the correct letter emits a 1, and all the others emit a 0.

This sort of thing is not only possible, but routine: medical diagnosis, wine-tasting, lift control, sales prediction, …

Page 26: Data Mining (and machine learning)

Getting the Right Weights

An ANN starts with randomised weights, and with a database of known examples for training.

(Diagram: the grey-level inputs again. If this pattern corresponds to a "c", we want the outputs 0, 0, 1, 0, 0, 0, …)

Clearly, an application will only be accurate if the weights are right. If the network is wrong, the weights are adjusted in a simple way which makes it more likely that the ANN will be correct for this input next time.

Page 27: Data Mining (and machine learning)

Training an NN

It works like this: send a training pattern in, crunch it to outputs, and compare. If some outputs are wrong, adjust the weights and repeat; once all are correct, STOP.

Present a pattern as a series of numbers at the first layer of nodes, e.g. the field values for instance 1.

Each node in the next layer does its simple processing and sends its results to the next layer, and so on, until numbers come out at the output layer.

Compare the NN's output pattern with the known correct pattern (target class). If they differ, adjust the weights somehow to make it more likely to be correct on this pattern next time; a sketch of one such rule follows.
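
The slides deliberately say only "adjust the weights somehow". As one concrete, classical choice, here is a sketch using the perceptron update rule for a single threshold unit; it illustrates the loop above, not the exact rule used for multi-layer networks.

```python
# A sketch of the send/crunch/compare/adjust loop, with the classic
# perceptron rule standing in for "adjust the weights somehow".
# The threshold is folded into a bias weight, so the unit fires
# when the weighted sum reaches 0.
import random

def train(patterns, targets, n_inputs, rate=0.1, max_epochs=1000):
    # start with randomised weights; the last one is for a bias input of 1
    w = [random.uniform(-1, 1) for _ in range(n_inputs + 1)]
    for _ in range(max_epochs):
        all_correct = True
        for x, t in zip(patterns, targets):
            x = list(x) + [1]                               # append the bias input
            out = 1 if sum(xi * wi for xi, wi in zip(x, w)) >= 0 else 0
            if out != t:                                    # some wrong: adjust weights
                all_correct = False
                w = [wi + rate * (t - out) * xi for wi, xi in zip(w, x)]
        if all_correct:                                     # all correct: STOP
            return w
    return w

# e.g. learning AND from its four training patterns
weights = train([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1], n_inputs=2)
```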

Page 28: Data Mining (and machine learning)

'Classical' NN Training

An algorithm called backpropagation (BP) is the classic way of training a neural network. Based on partial differentiation, it prescribes a way to adjust the weights so that the error on the latest pattern would probably be reduced next time.

We can instead use any optimisation algorithm (e.g. a genetic algorithm) to find the weights for a NN. E.g. the first ever significant application of particle swarm optimisation showed that it was faster than BP, with better results.
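
To illustrate the "any optimisation algorithm" point, here is a sketch with simple hill climbing standing in for a GA or PSO; the error function is an assumption, taken to run the network with the given weights over the training set and return a score to be minimised.

```python
# Any black-box optimiser can search the weight space. Simple hill
# climbing stands in for a GA or PSO here; error(w) is assumed to
# return the network's training-set error for weight vector w.
import random

def hill_climb_weights(error, n_weights, steps=1000, step_size=0.1):
    w = [random.uniform(-1, 1) for _ in range(n_weights)]   # random starting weights
    best = error(w)
    for _ in range(steps):
        candidate = [wi + random.gauss(0, step_size) for wi in w]   # mutate every weight a little
        e = error(candidate)
        if e < best:                                        # keep the mutation only if it helps
            w, best = candidate, e
    return w
```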

Page 29: Data Mining (and machine learning)

Generalisation: Decision boundaries

The ANN is learning during its training phase. When it is in use, providing decisions/classifications for live cases it hasn't seen before, we expect a reasonable decision from it, i.e. we want it to generalise well.

(Diagram: three plots of the same training points, labelled A and B, each with a different learned boundary drawn through them. The panels are titled 'Good generalisation', 'Fairly poor generalisation' and 'Stereotyping?'. A white A marks an unseen test case.)

Suppose a network was trained with the black As and Bs; the black line is a visualisation of its decision boundary: it will think anything on one side is an A, and anything on the other side is a B. The white A represents an unseen test case; in the third example, the network thinks this case is a B.

Coverage and extent of training data help to avoid poor generalisation. Main point: when an NN generalises well, its results seem sensible, intuitive, and generally more accurate than people.

Page 30: Data Mining (and machine learning)

Poor generalisation – some insight into overfitting

Suppose we train a classifier to tell the difference between handwritten t and c, using only these examples:

(Images: the training examples – a few handwritten t's and a few handwritten c's.)

The classifier will learn easily. It will probably give 100% correct prediction on these cases.

Page 31: Data Mining (and machine learning)

Overfitting

BUT: this classifier will probably generalise very poorly; it will perform very badly on a test set. E.g. here is its likely performance on certain unseen cases:

(Images: one unseen case that it will probably predict is a c, and another that it will probably predict is a t.)

Why?

Page 32: Data Mining (and machine learning)

Avoiding Overfitting

It can be avoided by using as much training data as possible, and by ensuring as much diversity as possible in the data. This cuts down on the potential existence of features that might be discriminative in the training data, but are otherwise spurious.

It can also be avoided by jittering (adding noise). During training, every time an input pattern is presented, it is randomly perturbed. The idea is that spurious features will be 'washed out' by the noise, while valid discriminatory features will remain. The problem with this approach is how to correctly choose the level of noise; see the sketch below.
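
A minimal sketch of jittering, assuming Gaussian noise; the noise level sigma is exactly the hard-to-choose parameter just mentioned.

```python
# Jittering: each time a training pattern is presented, perturb it
# with a little random noise, so the network never sees exactly the
# same pattern twice. Choosing sigma well is the hard part.
import random

def jitter(pattern, sigma=0.05):
    return [x + random.gauss(0, sigma) for x in pattern]

# inside the training loop: train on jitter(x) instead of x itself
```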

Page 33: Data Mining (and machine learning)

Avoiding Overfitting II

(Chart: error against time, for methods like neural networks. A typical curve shows error on the training data falling steadily throughout training, while error on validation data, i.e. unseen data not in the training set, falls at first and then starts to rise: the turning point is where the network is starting to overfit.)

Page 34: Data Mining (and machine learning)

Avoiding Overfitting III

Another approach is early stopping. During training, keep track of the network's performance on a separate validation set of data. At the point where the error continues to improve on the training set but starts to get worse on the validation set, training should be stopped, since the network is starting to overfit the training data. The problem here is that this point is far from always clear-cut.
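
A sketch of early stopping. The train_epoch and validation_error functions (and the net.weights attribute) are assumptions for illustration, and a patience parameter stands in for the judgement call about when validation error has "really" started to get worse.

```python
# Early stopping: track validation error after each epoch and stop
# once it has not improved for `patience` epochs, then roll back to
# the best weights seen. train_epoch and validation_error are assumed.
def train_with_early_stopping(net, train_epoch, validation_error,
                              patience=5, max_epochs=1000):
    best_err, best_weights, since_best = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch(net)                      # one pass over the training set
        err = validation_error(net)           # error on the held-out validation set
        if err < best_err:
            best_err, best_weights, since_best = err, net.weights[:], 0
        else:
            since_best += 1
            if since_best >= patience:        # validation error stopped improving
                break
    net.weights = best_weights                # roll back to the best point seen
    return net
```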

Page 35: Data Mining (and machine learning)

Real-world applications of ANNs are all over the place

Page 36: Data Mining (and machine learning)

Stocks, Commodities and Futures

Currency Price Predictions – James O'Sullivan: controls trading of more than 10 different financial markets with consistent profits.

Corporate Bond Rating – George Pugh: predicts corporate bond ratings with 100% accuracy for consulting and trading.

Standard and Poor's 500 Prediction – LBS Capital Management, Inc.: predicts the S&P 500 one day ahead and one week ahead with better accuracy than traditional methods.

Forecasting Stock Prices – Walkrich Investments: neural networks rate underpriced stock, beating the S&P.

Page 37: Data Mining (and machine learning)

Business, Management, and Finance

Direct Marketing Mail Prediction – Microsoft: improves response rates from 4.9% to 8.2%.

Credit Scoring – Herbert Jensen: predicts loan application success with 75-80% accuracy.

Identifying Policemen with Potential for Misconduct – the Chicago Police Department predicts misconduct potential based on employee records.

Jury Summoning with Neural Networks – the Montgomery Court House in Norristown, PA saves $70 million annually using The Intelligent Summoner from MEA.

Forecasting Highway Maintenance with Neural Networks – Professor Awad Hanna at the University of Wisconsin in Madison has trained a neural network to predict which type of concrete is better than another for a particular highway problem.

Page 38: Data Mining (and machine learning)

Medical Applications

Breast Cancer Cell Analysis – David Weinberg, MD: image analysis ignores benign cells and classifies malignant cells.

Hospital Expenses Reduced – Anderson Memorial Hospital: improved the quality of care, reduced the death rate, and saved $500,000 in the first 15 months of use.

Diagnosing Heart Attacks – J. Furlong, MD: recognizes acute myocardial infarction from enzyme data.

Emergency Room Lab Test Ordering – S. Berkov, MD: saves time and money ordering tests using symptoms and demographics.

Classifying Patients for Psychiatric Care – G. Davis, MD: predicts length of stay for psychiatric patients, saving money.

Page 39: Data Mining (and machine learning)

Sports Applications

Thoroughbred Horse Racing – Don Emmons: 22 races, 17 winning horses.

Thoroughbred Horse Racing – Rich Janeva: 39% of winners picked at odds better than 4.5 to 1.

Dog Racing – Derek Anderson: 94% accuracy picking first place.

Page 40: Data Mining (and machine learning)

Science

Solar Flare Prediction – Dr. Henrik Lundstet: predicts the next major solar flare; helps prevent problems for power plants.

Mosquito Identification – Aubrey Moore: 100% accuracy distinguishing between male and female of two species.

Spectroscopy – StellarNet Inc: analyzes spectral data to classify materials.

Weather Forecasting – Fort Worth National Weather Service: predicts rainfall to 85% accuracy.

Air Quality Testing – researchers at the Defense Research Establishment Suffield, Chemical & Biological Defense Section, in Alberta, Canada, have trained a neural network to recognize, classify and characterize aerosols of unknown origin with a high degree of accuracy.

Page 41: Data Mining (and machine learning)

Manufacturing

Plastics Testing – Monsanto: predicts plastics quality, saving research time, processing time, and manufacturing expense.

Computer Chip Manufacturing Quality – Intel: analyzes chip failures to help improve yields.

Nondestructive Concrete Testing – Donald G. Pratt: detects the presence and position of flaws in reinforced concrete.

Beer Testing – Anheuser-Busch: identifies the organic content of competitors' beer vapors with 96% accuracy.

Steam Quality Testing – AECL Research in Manitoba, Canada has developed the INSIGHT steam quality monitor, an instrument used to measure steam quality and mass flow rate.

Page 42: Data Mining (and machine learning)

Support Vector Machines: a different approach to finding the decision surface, particularly good at generalisation.

Page 43: Data Mining (and machine learning)

Suppose we can divide the classes with a simple hyperplane

Page 44: Data Mining (and machine learning)

There will be infinitely many such lines

Page 45: Data Mining (and machine learning)

One of them is ‘optimal’

Page 46: Data Mining (and machine learning)

Because it maximises the margin: the distance of the hyperplane from the 'support vectors', the instances that are closest to instances of a different class.

Page 47: Data Mining (and machine learning)

A Support Vector Machine (SVM) finds this hyperplane

Page 48: Data Mining (and machine learning)

But, usually there is no simple hyperplane that separates the classes!

Page 49: Data Mining (and machine learning)

One dimension (x), two classes

Page 50: Data Mining (and machine learning)

Two dimensions (x, x·sin(x))

Page 51: Data Mining (and machine learning)

Now we can separate the classes

Page 52: Data Mining (and machine learning)

That's what SVMs do. If we add enough extra dimensions/fields, using arbitrary functions of the existing fields, it becomes very likely that we can separate the data with a 'straight line' hyperplane. SVMs apply such a transformation, and then find the optimal separating hyperplane. The 'optimality' of the separating hyperplane is what gives the good generalisation properties; see the sketch below.
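
Here is a sketch of the whole idea using scikit-learn (an assumption: the slides name no library). The made-up 1-D points below have classes that interleave along x, but after adding the extra field x·sin(x) from the earlier slides, a linear SVM separates them.

```python
# Illustrative only: 1-D points whose classes interleave along x
# become linearly separable once we add the field x*sin(x).
import numpy as np
from sklearn.svm import SVC

x = np.array([1.0, 2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = (x * np.sin(x) > 0).astype(int)        # labels 1 1 0 0 1 1 0: no single threshold on x works

X2 = np.column_stack([x, x * np.sin(x)])   # add the extra dimension/field
clf = SVC(kernel="linear").fit(X2, y)      # find the separating hyperplane
print(clf.score(X2, y))                    # should print 1.0 on this toy data
```

In practice SVMs usually apply the transformation implicitly, via a kernel function, rather than by constructing the extra fields explicitly as done here.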

Page 53: Data Mining (and machine learning)

Next: the classic, field-defining DM algorithm

