Date post: | 18-Dec-2015 |
Upload: | ronald-thomas |
Bayesian Network
David Grannen, Mathieu Robin, Micheal Lynch, Sohail Akram, Tolu Aina
Bayesianism is a controversial but increasingly popular approach to statistics that offers many benefits, although not everyone is persuaded of its validity.
Bayesian networks are based on a statistical approach presented by the mathematician Thomas Bayes in 1763.
This is an approach for calculating probabilities among several variables that are causally related but for which the relationships can't easily be derived by experimentation. Bayes' formula provides the mathematical tool that combines prior knowledge with current data to produce a posterior distribution.
At first sight the formula can look complicated:
P(a|b) = L(b|a) P(a) / [ L(b|a) P(a) + L(b|not a) P(not a) ]
Consider a medical example: a patient is concerned about his or her chances of experiencing a heart attack. The historical data are:
Share of the population that experiences heart attacks: 20%
Share of smokers among those who have had a heart attack: 90%
Share of smokers among those who have not had a heart attack: 60%
P(heart attack | smoker) = L(smoker | heart attack) Prior(heart attack) / [ L(smoker | heart attack) Prior(heart attack) + L(smoker | no heart attack) Prior(no heart attack) ]
or
P(heart attack | smoker) = (90% * 20%) / [ (90% * 20%) + (60% * 80%) ]
P(heart attack | smoker) = 27%
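The heart attack calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior` and the variable names are illustrative assumptions, and the three input figures are the ones quoted in the example.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' formula for a yes/no hypothesis.

    prior                -- P(a)
    likelihood           -- L(b|a)
    likelihood_given_not -- L(b|not a)
    """
    numerator = likelihood * prior
    evidence = numerator + likelihood_given_not * (1.0 - prior)
    return numerator / evidence


# Figures from the example (names are illustrative, not from the slides).
p_heart_attack = 0.20          # prior: share of the population with heart attacks
p_smoker_given_ha = 0.90       # smokers among those with a heart attack
p_smoker_given_no_ha = 0.60    # smokers among those without a heart attack

p_ha_given_smoker = posterior(p_heart_attack, p_smoker_given_ha, p_smoker_given_no_ha)
print(f"P(heart attack | smoker) = {p_ha_given_smoker:.0%}")  # prints 27%
```

The denominator is the total probability of being a smoker, 0.18 + 0.48 = 0.66, which is why the posterior is 0.18 / 0.66.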
Bayesian networks are complex diagrams that organize the body of knowledge in any given area by mapping out cause-and-effect relationships among key variables and encoding them with numbers that represent the extent to which one variable is likely to affect another.
This approach allows scientists to combine new data with their existing knowledge or expertise.
In the late 1980s, building on the work of Judea Pearl, a professor of computer science at UCLA, AI researchers discovered that Bayesian networks offered an efficient way to deal with the lack or ambiguity of information that had hampered previous systems.
Bayesian networks provide "an overarching graphical framework" that brings together diverse elements of AI and increases the range of its likely application to the real world.
Bayesian applications
Decision-making using Bayesian methods has many uses in software. The best-known example is Microsoft's Office Assistant. When a user calls up the assistant, Bayesian methods are used to analyse recent actions in order to try to work out what the user is attempting to do, with this calculation constantly being modified in the light of new actions. Microsoft is the most aggressive in exploiting the Bayesian approach. The company offers a free Web service that helps customers diagnose printing problems with their computers and recommends the quickest way to resolve them. Another Web service helps parents diagnose their children's health problems.
Scott Musman, a computer consultant in Arlington, Va., recently designed a Bayesian network for the Navy that can identify enemy missiles, aircraft or vessels and recommend which weapons could be used most advantageously against incoming targets.
General Electric is using Bayesian techniques to develop a system that will take information from sensors attached to an engine and, based on expert opinion built into the system as well as vast amounts of data on past engine performance, pinpoint emerging problems.
Representation of Graphical Models
•Graphical models are graphs in which nodes represent random variables.
•A Bayesian network is a kind of directed graphical model, which takes into account the directionality of the arcs (the arrows between nodes).
•An advantage of a directed graphical model is that one can regard an arc from A to B as indicating that A "causes" B.
A → B
Graphical Models 2
•Along with the graph, it is necessary to specify the parameters of the model.
•For a directed model, we must specify the Conditional Probability Distribution (CPD) at each node.
•If the variables are discrete, this can be represented as a table (CPT), which lists the probability that the child node takes on each of its different values for each combination of values of its parents.
Example – Wet Grass
•The event "grass is wet" (W) has two possible causes: rain (R) or the sprinkler (S).
•From the table, Pr(W = true | S = true, R = false) = 0.9. Each row sums to 1.0, so Pr(W = false | S = true, R = false) = 0.1.
•Developing Inference from the Bayesian networks
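Inference
We observe that the grass is wet. There are two possible causes, the sprinkler or rain. Which is more likely?
Pr(S=1|W=1) = Pr(S=1, W=1) / Pr(W=1) = 0.2781 / Pr(W=1)
Pr(R=1|W=1) = Pr(R=1, W=1) / Pr(W=1) = 0.4581 / Pr(W=1)
Each joint probability is obtained by summing over the unobserved variables. The normalizing constant is Pr(W=1) = 0.6471.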
Inference 2
Pr(S=1|W=1) = 0.2781 / 0.6471 = 0.430
Pr(R=1|W=1) = 0.4581 / 0.6471 = 0.7079
It is more likely that the grass is wet because it is raining.
• The example given is "bottom up" reasoning in a Bayes network, from effects to causes. Top-down reasoning is also possible: using the example above, we can deduce the probability that the grass is wet given that it is cloudy.
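The wet grass figures can be reproduced by brute-force enumeration of the joint distribution. The sketch below is hedged: only Pr(W = true | S = true, R = false) = 0.9 appears in the text, so every other table entry, and the cloudy node C as parent of S and R, is an assumption chosen because it reproduces the quoted values 0.2781, 0.4581 and 0.6471. All function and variable names are illustrative.

```python
from itertools import product

# Assumed conditional probability tables (only the 0.9 entry is from the text).
P_C1 = 0.5                                  # Pr(C=1), cloudy
P_S1_GIVEN_C = {0: 0.5, 1: 0.1}             # Pr(S=1 | C)
P_R1_GIVEN_C = {0: 0.2, 1: 0.8}             # Pr(R=1 | C)
P_W1_GIVEN_SR = {(0, 0): 0.0, (1, 0): 0.9,  # Pr(W=1 | S, R)
                 (0, 1): 0.9, (1, 1): 0.99}


def bernoulli(p_true, value):
    """Probability of a binary value given the probability of value 1."""
    return p_true if value == 1 else 1.0 - p_true


def joint(c, s, r, w):
    """Pr(C, S, R, W) as the product of the local CPDs."""
    return (bernoulli(P_C1, c)
            * bernoulli(P_S1_GIVEN_C[c], s)
            * bernoulli(P_R1_GIVEN_C[c], r)
            * bernoulli(P_W1_GIVEN_SR[(s, r)], w))


def marginal(**fixed):
    """Sum the joint over every variable that is not fixed."""
    total = 0.0
    for c, s, r, w in product((0, 1), repeat=4):
        assignment = {"c": c, "s": s, "r": r, "w": w}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(c, s, r, w)
    return total


p_w = marginal(w=1)                                  # normalizing constant
p_s_given_w = marginal(s=1, w=1) / p_w               # bottom up: sprinkler
p_r_given_w = marginal(r=1, w=1) / p_w               # bottom up: rain
p_w_given_c = marginal(w=1, c=1) / marginal(c=1)     # top down: wet given cloudy

print(round(p_w, 4), round(p_s_given_w, 4), round(p_r_given_w, 4))
```

Enumeration touches all 16 joint assignments, which is fine for four binary nodes but grows exponentially with the number of variables; that growth is the reason efficient inference algorithms matter for larger networks.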
Inference (cont.)
Inference is concerned with the question: how can we use graphical models to efficiently answer probabilistic queries?
It uses Bayes' theorem: P(B|A) = odds(B|A) / (1 + odds(B|A))
A prior probability is based on previously observed data
A conditional probability has the form P(B|A)
Scenario
Apartment with a smoke detector
Smoke detector near the bathroom
Taking a shower often triggers the detector (smoke detectors detect steam)
Scenario (2)
Nodes of the network:
B (burn dinner)
O (plan to go out)
A (smoke alarm)
S (take shower)
F (electrical fire)
Bayes theorem
P(B|A) = odds(B|A) / (1 + odds(B|A))
= Likelihood(A|B) * odds(B) / (1 + Likelihood(A|B) * odds(B))
where Likelihood(A|B) = P(A|B) / P(A|B') and odds(B) = P(B) / P(B'), so that odds(B|A) = P(A|B) * P(B) / (P(A|B') * P(B'))
Bayes theorem (2)
Conditional probabilities specify the degree of belief in some proposition or propositions based on the assumption that some other propositions are true. Therefore the theory has no meaning without prior resolution of the probability of these antecedent propositions.
Approach
Top down: the probability that an event will occur, given its prior probability
Bottom up: reasoning which starts from the effect and tries to determine the causes
Types of inference
(a) Predictive: a can cause b
(b) Diagnostic: b is evidence of a
(c) Intercausal: a and b can both cause c; a explains c, so it is evidence against b ("explaining away", "Berkson's paradox", or "selection bias")
[Figure: three small graphs over the nodes a, b and c, one for each type of inference]
Example
The a priori probability of a burglary B is 0.0001.
The conditional probability of an alarm A given a burglary is Pr(A|B)
Example (2)
            Burglary   No Burglary
          +----------+-------------+
Alarm     |   0.95   |    0.01     |
          +----------+-------------+
No Alarm  |   0.05   |    0.99     |
          +----------+-------------+
What is the value of Pr(B|A)?
Example (3)
Pr(B|A) = odds(B|A) / (1 + odds(B|A))
where odds(B|A) = Likelihood(A|B) * odds(B) = [P(A|B) / P(A|B')] * [P(B) / P(B')] = (0.95 / 0.01) * (0.0001 / 0.9999) = 0.0095
Pr(B|A) = 0.0095 / 1.0095 = 0.00941
An alarm implies that a burglary is about 94 times more likely than a priori.
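The odds form of the burglary calculation can be sketched in Python as follows. The function name `posterior_from_odds` and its parameter names are illustrative assumptions; the numbers are those of the example.

```python
def posterior_from_odds(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Odds form of Bayes' theorem for a yes/no hypothesis."""
    likelihood_ratio = p_evidence_given_h / p_evidence_given_not_h  # Likelihood(A|B)
    prior_odds = prior / (1.0 - prior)                              # odds(B)
    posterior_odds = likelihood_ratio * prior_odds                  # odds(B|A)
    return posterior_odds / (1.0 + posterior_odds)                  # Pr(B|A)


p_burglary = 0.0001
p_burglary_given_alarm = posterior_from_odds(p_burglary, 0.95, 0.01)

print(f"{p_burglary_given_alarm:.5f}")             # prints 0.00941
print(round(p_burglary_given_alarm / p_burglary))  # prints 94
```

Even with a reliable alarm the posterior stays below 1%, because the prior odds of a burglary are so small compared with the false-alarm rate.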
Bayesian Learning
Sources:
A Tutorial on Learning Bayesian Networks, by David Heckerman (MSR-TR-95-06)
Learning Bayesian Networks from Data, by Nir Friedman and Moises Goldszmidt (Berkeley and SRI International)
The easier side to Bayesian Learning
Chorus
In the Theory we can build a sample,
With Convergence surely guaranteed,
But beware of autocorrelations,
Or it will take forever to succeed!
Verse 4
When it runs ain't it thrillin'
To the last Iteration.
It frolics and plays throughout n-space
Walkin' in a Bayesian Wonderland
Ending
Random walkin' in a Bayesian Wonderland.
In perspective
Where Learning enters the arena
Bayesian networks can be summarised as follows:
Efficient representations of probability distributions
  Local models
  Independence
Effective representations of probability distributions for
  Computing posterior probabilities
  Computing the most probable instantiation
  Decision making
But there is more, i.e. statistical induction -> learning
The Learning Process
Done by
  Encoding existing 'expert' knowledge in a Bayesian network
  Using a database to update this knowledge, creating one or more new Bayesian networks
Results in
  Refinement of the original knowledge
  Sometimes the identification of new distinctions and relationships
  Robustness to errors in the knowledge of experts
Similar to Neural Net Learning
But with the following advantages:
  We can easily encode expert knowledge, increasing the efficiency and accuracy of learning
  Nodes and arcs in learned Bayesian networks often correspond to recognizable distinctions and causal relationships
  Thus it is easier to understand and interpret the knowledge encoded in the representation
Bayesian Learning– The Problem
                | Known Structure                   | Unknown Structure
Complete Data   | Statistical parametric estimation | Discrete optimization over structures
Incomplete Data | Parametric optimization           | Combined
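The simplest case in this grid, known structure with complete data, reduces to counting. The following sketch estimates a conditional probability table by relative frequencies for a two-node network Rain -> WetGrass. The data set, the network and the name `estimate_cpt` are all hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical complete data: one (rain, wet) pair per observation.
samples = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0), (1, 1)]


def estimate_cpt(pairs):
    """Maximum-likelihood estimate of Pr(child | parent) from (parent, child) pairs."""
    pair_counts = Counter(pairs)
    parent_counts = Counter(parent for parent, _ in pairs)
    return {
        (parent, child): count / parent_counts[parent]
        for (parent, child), count in pair_counts.items()
    }


cpt = estimate_cpt(samples)
p_rain = sum(parent for parent, _ in samples) / len(samples)

print(cpt[(1, 1)])  # estimated Pr(wet=1 | rain=1): prints 0.75
print(cpt[(0, 1)])  # estimated Pr(wet=1 | rain=0): prints 0.25
```

With incomplete data some of these counts are unobserved, so counting no longer suffices and an iterative optimization is needed instead; with unknown structure the arcs themselves have to be searched for.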
Why Learning
Feasibility of learning
  Availability of data and computational power
Need for learning
  Characteristics of current systems and processes
    Defy closed-form analysis => need a data-driven approach for characterisation
    Scale and change fast => need continuous automatic adaptation
  Examples: communications networks, illegal activities, the brain, economic markets
Why Learn a Bayesian Network
Combine knowledge engineering and statistical induction
  Covers the whole spectrum from knowledge-intensive model construction to data-intensive model induction
More than a learning black box
  Explanation of outputs
  Interpretability and modifiability
  Algorithms for decision making, value of information, diagnosis and repair
Causal representation, reasoning and discovery, e.g. does smoking cause cancer?
A Simple Example
Wang presents a simple example in [2] using only the first four operations, which I reproduce in abbreviated form here. He begins with the following 8 statements:
1. robin (= feathered-creature <1.00, 0.90>
2. bird (= feathered-creature <1.00, 0.90>
3. swan (= bird <1.00, 0.90>
4. swan (= swimmer <1.00, 0.90>
5. gull (= bird <1.00, 0.90>
6. gull (= swimmer <1.00, 0.90>
7. crow (= bird <1.00, 0.90>
8. crow (= swimmer <0.00, 0.90>
(Note that giving a statement with a frequency of 0.00 simply means that it is not true.) The system is then asked to evaluate the truth value of "robin (= swimmer". It comes to the following conclusions, in this order:
9. robin (= bird <1.00, 0.45> (1 and 2, abduction)
10. bird (= swimmer <1.00, 0.45> (3 and 4, induction)
11. robin (= swimmer <1.00, 0.20> (9 and 10, deduction)
12. bird (= swimmer <1.00, 0.45> (5 and 6, induction)
13. bird (= swimmer <1.00, 0.62> (10 and 12, revision)
14. bird (= swimmer <0.00, 0.45> (7 and 8, induction)
15. bird (= swimmer <0.67, 0.71> (13 and 14, revision)
16. robin (= swimmer <0.67, 0.32> (9 and 15, deduction)
Note that NARS actually comes to a great many more conclusions than this, but the ones shown are the ones that actually lead toward the conclusion. Also, NARS reports the conclusions at both lines 11 and 16, since the guesswork involved necessarily means it needs to be able to change its mind, as it were. The final conclusion, given at line 16, means that two thirds of the relevant evidence indicates that a robin can swim, but that this conclusion has somewhat less than one third of the possible degree of confidence; both of these items, of course, indicate the need for more information :-).
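The text does not state the formulas NARS uses, so the following is only a hedged sketch of the revision step. It assumes that a confidence c corresponds to an amount of evidence w = c / (1 - c), that revision adds the evidence of the two premises, and that frequency is the evidence-weighted average. Under those assumptions it reproduces the values reported at lines 13 and 15; the function name `revise` is illustrative.

```python
def revise(truth_a, truth_b):
    """Pool two <frequency, confidence> pairs about the same statement.

    Assumption (not from the text): confidence = w / (w + 1), where w is the
    amount of evidence, and evidence from independent premises adds up.
    Confidences must be strictly below 1.
    """
    (f1, c1), (f2, c2) = truth_a, truth_b
    w1 = c1 / (1.0 - c1)
    w2 = c2 / (1.0 - c2)
    w = w1 + w2
    frequency = (w1 * f1 + w2 * f2) / w
    confidence = w / (w + 1.0)
    return frequency, confidence


line_13 = revise((1.00, 0.45), (1.00, 0.45))   # lines 10 and 12
line_15 = revise(line_13, (0.00, 0.45))        # lines 13 and 14

print(tuple(round(x, 2) for x in line_13))  # prints (1.0, 0.62)
print(tuple(round(x, 2) for x in line_15))  # prints (0.67, 0.71)
```

The two agreeing premises raise the confidence from 0.45 to 0.62, while the conflicting premise lowers the frequency to two thirds and still raises the confidence, since it is additional evidence.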
A Comparison with another Learning Technique
Current Topics
Time: beyond discrete time and beyond fixed rate
Causality: removing the assumptions
Hidden variables: where to place them and how many
Model evaluation and active learning: what parts of it are suspect, and what and how much data is needed
Decision Theory (1)
What happens when it is time to convert beliefs into actions?
Decision Theory = Probability Theory + Utility Theory
Decision Theory (2)
Decompose a multi-attribute utility function into a sum of local utilities
Each term is a node, which has as parents:
  The random variables on which it depends
  The action (control) nodes
The resulting graph is an influence diagram
Finally, compute the optimal sequence of actions to perform to maximize expected utility
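For a single decision, maximizing expected utility amounts to weighting each outcome's utility by its probability and picking the best action. The sketch below covers only that one-step case, not a sequence of actions or a full influence diagram. The umbrella scenario, all numbers and the function names are hypothetical.

```python
def expected_utility(action, belief, utility):
    """Sum over states of Pr(state) * U(action, state)."""
    return sum(p * utility[(action, state)] for state, p in belief.items())


def best_action(actions, belief, utility):
    """Action with the highest expected utility under the current belief."""
    return max(actions, key=lambda a: expected_utility(a, belief, utility))


# Hypothetical belief (e.g. a posterior from a Bayesian network) and utilities.
belief = {"rain": 0.3, "dry": 0.7}
utility = {
    ("take umbrella", "rain"): 70, ("take umbrella", "dry"): 80,
    ("leave umbrella", "rain"): 0, ("leave umbrella", "dry"): 100,
}
actions = ["take umbrella", "leave umbrella"]

for action in actions:
    print(action, expected_utility(action, belief, utility))
print("best:", best_action(actions, belief, utility))  # best: take umbrella
```

Here taking the umbrella scores 0.3 * 70 + 0.7 * 80 = 77 against 70 for leaving it, so it wins even though rain is the less likely state.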
Applications (1)
QMR-DT: a decision-theoretic reformulation of the Quick Medical Reference model
Some Applications
Biostatistics – Medical Research Council (Bayesian Inference Using Gibbs Sampling, BUGS)
Data analysis – NASA (AutoClass)
Collaborative filtering – Microsoft (Microsoft Belief Networks, MSBN)
Fraud detection – AT&T
Speech recognition – UC Berkeley
Applications (2)
Real-time decision: NASA's Vista system
Genetics: linkage analysis
Speech recognition
Data compression: density estimation
Coding: turbocodes
Applications : MS Office
MS office assistant: The Lumière Project
Source: The Lumière Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users, by E. Horvitz, J. Breese, D. Heckerman, D. Hovel, K. Rommelse (Microsoft Research)
MS Office (2)
User behaviour is monitored to determine Assistant actions. Examples:
Search
Focus of attention
Introspection
Undesired effects
Inefficient command sequences
Domain-specific syntactic and semantic content
MS Office (3)
Portion of a Bayesian net for inferring the likelihood that a user needs assistance, considering profile information and recent activity