AI – CS289 Machine Learning
Introduction to Machine Learning
30th October 2006
Dr Bogdan L. Vrusias
[email protected]
Contents
• Learning
• Artificial Neural Networks
• Supervised Learning
• Unsupervised Learning
What is Learning?
• ‘The action of receiving instruction or acquiring knowledge’.
• ‘A process which leads to the modification of behaviour or the acquisition of new abilities or responses, and which is additional to natural development by growth or maturation’.
Source: Oxford English Dictionary Online: http://www.oed.com/, accessed October 2003.
Machine Learning
• Negnevitsky: ‘In general, machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy’ (2005:165)
• Callan: ‘A machine or software tool would not be viewed as intelligent if it could not adapt to changes in its environment’ (2003:225)
• Luger: ‘Intelligent agents must be able to change through the course of their interactions with the world’ (2002:351)
• Learning capabilities can improve the performance of an intelligent system over time.
• The most popular approaches to machine learning are artificial neural networks and genetic algorithms.
Types of Learning
• Inductive learning
– Learning from examples
• Evolutionary/genetic learning
– Shaping a population of individual solutions through survival of the fittest
• Emergent learning
– Learning through social interaction: game of life
Inductive Learning
• Supervised learning
– Training examples with a known classification from a teacher
• Unsupervised learning
– No pre-classification of training examples
– Competitive learning: learning through competition on training examples
Key Concepts
• Learn from experience
– Through examples, analogy or discovery
• To adapt
– Changes in response to interaction
• Generalisation
– To use experience to form a response to novel situations
Generalisation
Uses of Machine Learning
• Techniques and algorithms that adapt through experience.
• Used for:
– Interpretation / visualisation: summarise data
– Prediction: time series / stock data
– Classification: malignant or benign tumours
– Regression: curve fitting
– Discovery: data mining / pattern discovery
Why Machine Learning?
• Complexity of task / amount of data
– Other techniques fail or are computationally expensive
• Problems that cannot be defined
– Discovery of patterns / data mining
• Knowledge Engineering Bottleneck
– ‘Cost and difficulty of building expert systems using traditional […] techniques’ (Luger 2002:351)
Common Techniques
• Least squares
• Decision trees
• Support vector machines
• Boosting
• Neural networks
• K-means
• Genetic algorithms
Decision Trees
• A map of the reasoning process, good at solving classification problems (Negnevitsky, 2005)
• A decision tree represents a number of different attributes and values
– Nodes represent attributes
– Branches represent values of the attributes
• Path through a tree represents a decision
• Tree can be associated with rules
Example 1
• Consider one rule for an ice-cream seller (Callan 2003:241)
IF Outlook = Sunny
AND Temperature = Hot
THEN Sell
Example 1
[Figure: decision tree for the ice-cream seller. The root node Outlook branches on Sunny and Overcast; these lead to Temperature (Hot, Mild, Cold) and Holiday Season (Yes, No) nodes, and every path ends in a Sell or Don’t Sell leaf. Annotations point out the root node, a branch, a node and a leaf.]
Construction
• Concept learning:
– Inducing concepts from examples
• We can intuitively construct a decision tree for a small set of examples
• Different algorithms are used to construct a tree based upon the examples
– Most popular: ID3 (Quinlan, 1986), sketched below
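
A minimal sketch of ID3’s attribute-selection step: compute the entropy of the class labels and pick the attribute whose split yields the highest information gain. The toy ice-cream dataset and attribute names below are invented for illustration.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute):
    # reduction in entropy achieved by splitting on one attribute
    base = entropy([e["label"] for e in examples])
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e["label"] for e in examples if e[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

# Invented examples: ID3 would place the highest-gain attribute at the root.
data = [
    {"outlook": "sunny",    "temp": "hot",  "label": "sell"},
    {"outlook": "sunny",    "temp": "cold", "label": "don't sell"},
    {"outlook": "overcast", "temp": "hot",  "label": "don't sell"},
    {"outlook": "sunny",    "temp": "hot",  "label": "sell"},
]
root = max(["outlook", "temp"], key=lambda a: information_gain(data, a))
print(root)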
Which Tree?
• Different trees can be constructed from the same set of examples
• Which tree is the best?
– Based upon choice of attributes at each node in the tree
– A split in the tree (branches) should correspond to the predictor with the maximum separating power
• Examples can be contradictory
– Real-life data is noisy
Extracting Rules
• We can extract rules from decision trees
– Create one rule for each root-to-leaf path
– Simplify by combining rules
• Other techniques are not so transparent:
– Neural networks are often described as ‘black boxes’; it is difficult to understand what the network is doing
– Extraction of rules from trees can help us to understand the decision process, as the sketch below illustrates
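
A minimal sketch of the path-to-rule idea; the nested-dictionary tree representation is an assumption made for illustration, not a format from the lecture.

def extract_rules(tree, conditions=()):
    # walk a tree {attribute: {value: subtree-or-leaf}} and emit one rule
    # per root-to-leaf path
    if not isinstance(tree, dict):  # a leaf holds a class label
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {antecedent} THEN {tree}"]
    rules = []
    for attribute, branches in tree.items():
        for value, subtree in branches.items():
            rules += extract_rules(subtree, conditions + ((attribute, value),))
    return rules

# a fragment of the ice-cream tree from Example 1
tree = {"Outlook": {"Sunny": {"Temperature": {"Hot": "Sell",
                                              "Cold": "Don't Sell"}}}}
for rule in extract_rules(tree):
    print(rule)
# IF Outlook = Sunny AND Temperature = Hot THEN Sell
# IF Outlook = Sunny AND Temperature = Cold THEN Don't Sell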
Issues
• Use prior knowledge where available
• Not all the examples may be needed to construct a tree
– Test the generalisation of the tree during training and stop when the desired performance is achieved
– Prune the tree once constructed
• Examples may be noisy
• Examples may contain irrelevant attributes
Artificial Neural Networks
• An artificial neural network (or simply a neural network) can be defined as a model of reasoning based on the human brain.
• The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.
• The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them.
• By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.
Artificial Neural Networks
• Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power.
• A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.
[Figure: a biological neuron, showing the soma, dendrites, axon and synapses.]
Artificial Neural Networks
• Our brain can be considered as a highly complex, non-linear and parallel information-processing system.
• Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.
Artificial Neural Networks
• An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.
• The neurons are connected by weighted links passing signals from one neuron to another.
• The output signal is transmitted through the neuron’s outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.
[Figure: architecture of a typical artificial neural network: input signals enter the input layer, pass through a middle layer, and leave the output layer as output signals.]
The Perceptron
• The operation of Rosenblatt’s perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter.
• The weighted sum of the inputs is applied to the hard limiter, which produces an output equal to +1 if its input is positive and -1 if it is negative.
[Figure: single-layer two-input perceptron: inputs x1 and x2, weighted by w1 and w2, feed a linear combiner; a hard limiter with a threshold then produces the output Y.]
The Perceptron
• How does the perceptron learn its classification tasks?
• This is done by making small adjustments in the weights to reduce the difference between the actual and desired outputs of the perceptron.
• The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain an output consistent with the training examples, as in the sketch below.
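
A minimal sketch of this procedure, the perceptron learning rule, trained here on the logical AND operation; the learning rate and epoch count are illustrative choices.

import random

def hard_limiter(x):
    return 1 if x >= 0 else -1  # output +1 or -1

def train_perceptron(samples, alpha=0.1, epochs=50):
    # weights and threshold start as random values in [-0.5, 0.5]
    w1, w2 = (random.uniform(-0.5, 0.5) for _ in range(2))
    theta = random.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for (x1, x2), desired in samples:
            y = hard_limiter(x1 * w1 + x2 * w2 - theta)
            error = desired - y
            w1 += alpha * error * x1      # small adjustments that reduce the
            w2 += alpha * error * x2      # difference between actual and
            theta -= alpha * error        # desired outputs
    return w1, w2, theta

# AND with bipolar targets: +1 only when both inputs are 1
and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w1, w2, theta = train_perceptron(and_data)
print([hard_limiter(x1 * w1 + x2 * w2 - theta)
       for (x1, x2), _ in and_data])  # [-1, -1, -1, 1]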
[Figure: two-dimensional plots of the AND, OR and Exclusive-OR operations on inputs x1 and x2; a single perceptron can separate AND and OR with a straight line, but not Exclusive-OR.]
Multilayer neural networks
• A multilayer perceptron is a feedforward neural network with one or more hidden layers.
• The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons.
• The input signals are propagated in a forward direction on a layer-by-layer basis.
[Figure: multilayer perceptron with an input layer, first and second hidden layers, and an output layer; input signals flow forward into output signals.]
What does the middle layer hide?
• A hidden layer “hides” its desired output.
• Neurons in the hidden layer cannot be observed through the input/output behaviour of the network.
• There is no obvious way to know what the desired output of the hidden layer should be.
• Commercial ANNs incorporate three and sometimes four layers, including one or two hidden layers. Each layer can contain from 10 to 1000 neurons. Experimental neural networks may have five or even six layers, including three or four hidden layers, and utilise millions of neurons.
Supervised Learning
• Supervised or active learning is learning with an external “teacher” or a supervisor who presents a training set to the network.
• The most popular supervised neural network is the back-propagation neural network.
Back-propagation neural network
• Learning in a multilayer network proceeds the same way as for a perceptron.
• A training set of input patterns is presented to the network.
• The network computes its output pattern, and if there is an error (a difference between the actual and desired output patterns) the weights are adjusted to reduce this error, as in the sketch below.
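
A minimal sketch of such a network, with one hidden layer of sigmoid neurons, trained on the Exclusive-OR patterns used later in this lecture; the layer sizes, learning rate and epoch count are illustrative choices.

import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# 2 inputs -> 2 hidden sigmoid neurons -> 1 sigmoid output, each with a bias
w_h = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-0.5, 0.5) for _ in range(3)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    return h, sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
alpha = 0.5
for _ in range(20000):  # backprop can stall in a local minimum; rerun if so
    for x, yd in data:
        h, y = forward(x)
        delta_o = (yd - y) * y * (1 - y)  # output error gradient
        delta_h = [delta_o * w_o[j] * h[j] * (1 - h[j]) for j in range(2)]
        for j in range(2):                # adjust weights to reduce the error
            w_o[j] += alpha * delta_o * h[j]
            w_h[j][0] += alpha * delta_h[j] * x[0]
            w_h[j][1] += alpha * delta_h[j] * x[1]
            w_h[j][2] += alpha * delta_h[j]
        w_o[2] += alpha * delta_o

print([round(forward(x)[1], 2) for x, _ in data])  # close to [0, 1, 1, 0]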
Back-propagation neural network
[Figure: three-layer back-propagation network: input signals x1 … xn enter the input layer, pass through a hidden layer of m neurons via weights wij, and reach an output layer of l neurons via weights wjk, producing outputs y1 … yl. Input signals propagate forwards; error signals propagate backwards.]
Back-propagation neural network
• A network represented by the McCulloch-Pitts model for solving the Exclusive-OR operation.
[Figure: inputs x1 and x2 feed hidden neurons 3 and 4, each with weights +1.0 and +1.0 and thresholds +1.5 and +0.5 respectively; output neuron 5 combines them with weights -2.0 and +1.0 and threshold +0.5 to produce y5.]
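
Assuming the weights and thresholds shown above (the -2.0 is reconstructed from the figure), a quick check that this network reproduces Exclusive-OR with hard-limit neurons:

def step(x):
    return 1 if x >= 0 else 0

def xor_net(x1, x2):
    h3 = step(1.0 * x1 + 1.0 * x2 - 1.5)     # neuron 3 fires on x1 AND x2
    h4 = step(1.0 * x1 + 1.0 * x2 - 0.5)     # neuron 4 fires on x1 OR x2
    return step(-2.0 * h3 + 1.0 * h4 - 0.5)  # neuron 5: OR but not AND

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# [0, 1, 1, 0]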
Back-propagation neural network
Results after training on the Exclusive-OR operation:

Inputs x1, x2    Desired output yd    Actual output y5    Error e
1, 1             0                    0.0155              -0.0155
0, 1             1                    0.9849              0.0151
1, 0             1                    0.9849              0.0151
0, 0             0                    0.0175              -0.0175
Sum of squared errors: 0.0010
[Figure: (a) decision boundary constructed by hidden neuron 3: x1 + x2 - 1.5 = 0; (b) decision boundary constructed by hidden neuron 4: x1 + x2 - 0.5 = 0; (c) decision boundaries constructed by the complete three-layer network.]
Unsupervised Learning
• Unsupervised or self-organised learning does not require an external teacher.
• During the training session, the neural network receives a number of different input patterns, discovers significant features in these patterns and learns how to classify input data into appropriate categories.
• Unsupervised learning tends to follow the neuro-biological organisation of the brain.
• The most popular unsupervised neural networks are the Hebbian network and the self-organising feature map.
Hebbian Network
• Hebb’s Law can be represented in the form of two rules (sketched in code below):
1. If two neurons on either side of a connection are activated synchronously, then the weight of that connection is increased.
2. If two neurons on either side of a connection are activated asynchronously, then the weight of that connection is decreased.
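
Hebb’s Law is stated verbally above; a minimal sketch of one common mathematical form, the activity product rule with a forgetting factor, follows. The exact formula and the constants are assumptions, not taken from the slide.

def hebb_update(w, x, y, alpha=0.1, phi=0.02):
    # strengthen w[i][j] when input i and output j fire together;
    # the forgetting factor phi decays the weight otherwise
    # (this generalised form is an assumption)
    for i in range(len(x)):
        for j in range(len(y)):
            w[i][j] += alpha * y[j] * x[i] - phi * y[j] * w[i][j]
    return w

w = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(10):  # repeated synchronous activation of input 0, output 0
    w = hebb_update(w, x=[1, 0], y=[1, 0])
print(w)  # w[0][0] has grown; the other weights stay at zero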
[Figure: Hebbian network, with an input neuron i connected to an output neuron j; input signals enter on the left, output signals leave on the right.]
Competitive Learning
• In competitive learning, neurons compete among themselves to be activated.
• While in Hebbian learning, several output neurons can be activated simultaneously, in competitive learning, only a single output neuron is active at any time.
• The output neuron that wins the “competition” is called the winner-takes-all neuron.
• Self-organising feature maps are based on competitive learning; a sketch of the winner-takes-all update follows.
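
A minimal sketch of winner-takes-all adaptation; the Euclidean matching and the update that moves only the winner toward the input are standard competitive-learning choices assumed here, not formulas from the slide.

import math, random

def winner(x, weights):
    # index of the output neuron whose weight vector is closest to x
    return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))

def competitive_step(x, weights, alpha=0.1):
    j = winner(x, weights)
    # only the winning neuron adapts, moving its weights toward the input
    weights[j] = [w + alpha * (xi - w) for w, xi in zip(weights[j], x)]

# three output neurons with random two-dimensional weight vectors
weights = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
for _ in range(500):
    competitive_step([random.uniform(-1, 1), random.uniform(-1, 1)], weights)
print(weights)  # each weight vector settles in its own region of the inputs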
What is a self-organising feature map?
• Our brain is dominated by the cerebral cortex, a very complex structure of billions of neurons and hundreds of billions of synapses.
• The cortex includes areas that are responsible for different human activities (motor, visual, auditory, somatosensory, etc.), and associated with different sensory inputs. We can say that each sensory input is mapped into a corresponding area of the cerebral cortex. The cortex is a self-organising computational map in the human brain.
• The self-organising feature map was introduced by Teuvo Kohonen and is therefore also called the Kohonen network.
What is a self-organising feature map?
[Figure: the feature-mapping Kohonen model: an input layer fully connected to a Kohonen layer, shown (a) before and (b) after competition, when the winning neuron outputs 1 and all other neurons output 0.]
The Kohonen network
• The Kohonen model provides a topological mapping. It places a fixed number of input patterns from the input layer into a higher-dimensional output or Kohonen layer.
• Training in the Kohonen network begins with the winner’s neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.
[Figure: Kohonen network: input signals x1 and x2 enter the input layer, which is fully connected to output-layer neurons y1, y2 and y3 producing the output signals.]
The Kohonen network
• The lateral connections are used to create a competition between neurons.
• The neuron with the largest activation level among all neurons in the output layer becomes the winner.
• This neuron is the only neuron that produces an output signal. The activity of all other neurons is suppressed in the competition.
• The lateral feedback connections produce excitatory or inhibitory effects, depending on the distance from the winning neuron.
Competitive learning in the Kohonen network
• To illustrate competitive learning, consider the Kohonen network with 100 neurons arranged in the form of a two-dimensional lattice with 10 rows and 10 columns.
• The network is required to classify two-dimensional input vectors; each neuron in the network should respond only to the input vectors occurring in its region.
• The network is trained with 1000 two-dimensional input vectors generated randomly in a square region in the interval between -1 and +1; a sketch of this experiment follows.
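
A minimal sketch of that experiment; the shrinking neighbourhood-radius and learning-rate schedules below are illustrative assumptions, not the exact schedules used in the original simulation.

import math, random

rows = cols = 10  # 100 neurons in a 10-by-10 lattice
# each lattice neuron holds a two-dimensional weight vector, started randomly
w = {(r, c): [random.uniform(-1, 1), random.uniform(-1, 1)]
     for r in range(rows) for c in range(cols)}

def best_matching_unit(x):
    # lattice position whose weight vector is closest to the input
    return min(w, key=lambda j: math.dist(x, w[j]))

epochs = 1000
for t in range(epochs):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    win = best_matching_unit(x)
    radius = 1.0 + 5.0 * (1 - t / epochs)  # neighbourhood shrinks over time
    alpha = 0.01 + 0.5 * (1 - t / epochs)  # learning rate decays too
    for (r, c), wj in w.items():
        if math.hypot(r - win[0], c - win[1]) <= radius:
            for k in range(2):  # neighbours move toward the input vector
                wj[k] += alpha * (x[k] - wj[k])

print(w[(0, 0)], w[(9, 9)])  # opposite corners should map to opposite regions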
Competitive learning in the Kohonen network
[Figure: four snapshots of the network’s weight vectors, W(1,j) plotted against W(2,j) over the square [-1, +1] x [-1, +1], showing the map organising itself as training proceeds.]