Bayesian Network

David Grannen, Mathieu Robin, Micheal Lynch, Sohail Akram, Tolu Aina


Bayesianism is a controversial but increasingly popular approach to statistics that offers many benefits, although not everyone is persuaded of its validity.

Bayesian networks are based on a statistical approach presented by the mathematician Thomas Bayes, in an essay published in 1763.

This is an approach for calculating probabilities among several variables that are causally related but for which the relationships can't easily be derived by experimentation. Bayes' formula provides the mathematical tool that combines prior knowledge with current data to produce a posterior distribution.

The formula can look complicated:

$$P(a \mid b) = \frac{L(b \mid a)\,P(a)}{L(b \mid a)\,P(a) + L(b \mid \neg a)\,P(\neg a)}$$

Consider the following medical example: a patient is concerned about his or her chances of having a heart attack. The historical data we have are:

•Proportion of the population that experiences heart attacks: 20%
•Proportion of heart-attack patients who are smokers: 90%
•Proportion of people without a heart attack who are smokers: 60%

$$P(\text{heart attack} \mid \text{smoker}) = \frac{L(\text{smoker} \mid \text{heart attack})\,\mathrm{Prior}(\text{heart attack})}{L(\text{smoker} \mid \text{heart attack})\,\mathrm{Prior}(\text{heart attack}) + L(\text{smoker} \mid \text{no heart attack})\,\mathrm{Prior}(\text{no heart attack})}$$

or

$$P(\text{heart attack} \mid \text{smoker}) = \frac{0.90 \times 0.20}{(0.90 \times 0.20) + (0.60 \times 0.80)} = \frac{0.18}{0.66} \approx 27\%$$
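As a quick check of the arithmetic, here is a minimal Python sketch that applies the formula above to the three figures in the example. The function `bayes_posterior` and its parameter names are our own illustration, not part of the original material.

```python
def bayes_posterior(prior: float, lik_given_h: float, lik_given_not_h: float) -> float:
    """Return P(h | e) from P(h), P(e | h) and P(e | not h) via Bayes' formula."""
    numerator = lik_given_h * prior
    # Denominator: total probability of the evidence, summed over h and not h.
    evidence = numerator + lik_given_not_h * (1.0 - prior)
    return numerator / evidence


# Figures from the heart-attack example: P(HA) = 0.20,
# P(smoker | HA) = 0.90, P(smoker | no HA) = 0.60.
p = bayes_posterior(prior=0.20, lik_given_h=0.90, lik_given_not_h=0.60)
print(f"P(heart attack | smoker) = {p:.1%}")
```

It prints 27.3%, i.e. 0.18 / 0.66, which the example rounds to 27%.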

Bayesian networks are complex diagrams that organize the body of knowledge in any given area by mapping out cause-and-effect relationships among key variables and encoding them with numbers that represent the extent to which one variable is likely to affect another.

This approach allows scientists to combine new data with their existing knowledge or expertise.

In the late 1980s, building on the work of Judea Pearl, a professor of computer science at UCLA, AI researchers discovered that Bayesian networks offered an efficient way to deal with the lack or ambiguity of information that had hampered previous systems.

Bayesian networks provide "an overarching graphical framework" that brings together diverse elements of AI and widens the range of its likely applications to the real world.

Bayesian applications

Decision-making using Bayesian methods has many uses in software. The best-known example is Microsoft's Office Assistant. When a user calls up the assistant, Bayesian methods are used to analyse recent actions in order to work out what the user is attempting to do, with this calculation constantly being modified in the light of new actions. Microsoft has been the most aggressive in exploiting the Bayesian approach. The company offers a free Web service that helps customers diagnose printing problems with their computers and recommends the quickest way to resolve them. Another Web service helps parents diagnose their children's health problems.

Scott Musman, a computer consultant in Arlington, Va., recently designed a Bayesian network for the Navy that can identify enemy missiles, aircraft or vessels and recommend which weapons could be used most advantageously against incoming targets.

General Electric is using Bayesian techniques to develop a system that will take information from sensors attached to an engine and, based on expert opinion built into the system as well as vast amounts of data on past engine performance, pinpoint emerging problems.

Representation of Graphical Models

•Graphical models are graphs in which nodes represent random variables.

•A Bayesian network is a kind of directed graphical model, which takes into account the directionality of the arcs (arrows between nodes).

•An advantage of a directed graphical model is that one can regard an arc from A to B as indicating that A "causes" B.

A -> B

Graphical Models 2

•Along with the graph, it is necessary to specify the parameters of the model.

•For a directed model, we must specify the Conditional Probability Distribution (CPD) at each node.

•If the variables are discrete, this can be represented as a table (CPT), which lists the probability that the child node takes on each of its different values for each combination of values of its parents.

Example – Wet Grass

•The event "grass is wet" has two possible causes: rain or the sprinkler.

•From the table, Pr(W=true | S=true, R=false) = 0.9. Each row sums to 1.0, so Pr(W=false | S=true, R=false) = 0.1.

•Drawing inferences from the Bayesian network
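A conditional probability table for a binary child can be held in a dictionary keyed by the parents' values, storing only P(W = true | S, R) and deriving the complement because each row sums to 1.0. This is a minimal sketch: only the 0.9 entry is stated above, the other three values are assumed from the standard sprinkler example in Kevin Murphy's Bayes net tutorial, and `P_W_TRUE` and `cpt_row` are hypothetical names.

```python
# P(W = true | S, R), keyed by (sprinkler_on, raining).
# Only the (True, False) entry is stated on the slide; the rest are assumed
# values from Murphy's sprinkler example.
P_W_TRUE = {
    (False, False): 0.0,
    (True, False): 0.9,
    (False, True): 0.9,
    (True, True): 0.99,
}


def cpt_row(s: bool, r: bool) -> dict[bool, float]:
    """Full distribution over W for one parent combination; it sums to 1.0."""
    p_true = P_W_TRUE[(s, r)]
    return {True: p_true, False: 1.0 - p_true}


row = cpt_row(s=True, r=False)
print(f"Pr(W=false | S=true, R=false) = {row[False]:.1f}")
```

Storing one number per row is enough for binary variables; a multi-valued child would store the whole row and check that it sums to 1.0.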

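Inference

We observe that the grass is wet. There are two possible causes, the sprinkler or rain. Which is more likely?

$$\Pr(S=1 \mid W=1) = \frac{\Pr(S=1, W=1)}{\Pr(W=1)} = \frac{\sum_{c,r} \Pr(C=c, S=1, R=r, W=1)}{\Pr(W=1)} = \frac{0.2781}{\Pr(W=1)}$$

$$\Pr(R=1 \mid W=1) = \frac{\Pr(R=1, W=1)}{\Pr(W=1)} = \frac{\sum_{c,s} \Pr(C=c, S=s, R=1, W=1)}{\Pr(W=1)} = \frac{0.4581}{\Pr(W=1)}$$

Here C is the "cloudy" node, and the sums run over the unobserved variables. The normalizing constant is

$$\Pr(W=1) = \sum_{c,s,r} \Pr(C=c, S=s, R=r, W=1) = 0.6471$$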

Inference 2

$$\Pr(S=1 \mid W=1) = \frac{0.2781}{0.6471} \approx 0.4298$$

$$\Pr(R=1 \mid W=1) = \frac{0.4581}{0.6471} \approx 0.7079$$

So it is more likely that the grass is wet because it is raining.
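The posterior figures above can be reproduced by inference by enumeration: write the joint distribution as the product of the CPTs and sum it over the unobserved variables. The sketch assumes the four-node network C (cloudy) -> S, C -> R, (S, R) -> W with the CPT values from Kevin Murphy's sprinkler example; those values are not listed in this section, but they reproduce the totals 0.2781, 0.4581 and 0.6471. The helpers `bern`, `joint` and `prob` are hypothetical names.

```python
from itertools import product

# Assumed CPTs (Murphy's sprinkler example); True means "yes".
P_C = 0.5                                   # P(C = true), cloudy
P_S_GIVEN_C = {False: 0.5, True: 0.1}       # P(S = true | C)
P_R_GIVEN_C = {False: 0.2, True: 0.8}       # P(R = true | C)
P_W_GIVEN_SR = {(False, False): 0.0, (True, False): 0.9,
                (False, True): 0.9, (True, True): 0.99}  # P(W = true | S, R)


def bern(p_true: float, value: bool) -> float:
    """Probability that a binary variable with P(true) = p_true takes `value`."""
    return p_true if value else 1.0 - p_true


def joint(c: bool, s: bool, r: bool, w: bool) -> float:
    """Factorised joint P(C) P(S|C) P(R|C) P(W|S,R)."""
    return (bern(P_C, c) * bern(P_S_GIVEN_C[c], s)
            * bern(P_R_GIVEN_C[c], r) * bern(P_W_GIVEN_SR[(s, r)], w))


def prob(**evidence: bool) -> float:
    """Marginal probability of the evidence: sum the joint over consistent assignments."""
    total = 0.0
    for c, s, r, w in product((False, True), repeat=4):
        assignment = {"c": c, "s": s, "r": r, "w": w}
        if all(assignment[name] == value for name, value in evidence.items()):
            total += joint(c, s, r, w)
    return total


p_w = prob(w=True)
print(f"Pr(W=1)     = {p_w:.4f}")
print(f"Pr(S=1|W=1) = {prob(s=True, w=True) / p_w:.4f}")
print(f"Pr(R=1|W=1) = {prob(r=True, w=True) / p_w:.4f}")
```

Run as is, it prints 0.6471, 0.4298 and 0.7079. Enumeration grows exponentially with the number of variables, so it only suits toy networks like this one.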

•The example given is "bottom-up" reasoning in a Bayesian network, from effects to causes. Top-down reasoning is also possible: using the example above, we can deduce the probability that the grass is wet given that it is cloudy.
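Top-down (predictive) reasoning runs the same machinery in the causal direction: fix C and sum out S and R. A minimal sketch, again assuming Murphy's sprinkler CPTs (not given in this section) and a hypothetical helper `p_wet_given_cloudy`:

```python
from itertools import product

# Assumed sprinkler CPTs (Murphy's example); the slide itself only gives
# P(W=1 | S=1, R=0) = 0.9.
P_S_GIVEN_C = {False: 0.5, True: 0.1}       # P(S = true | C)
P_R_GIVEN_C = {False: 0.2, True: 0.8}       # P(R = true | C)
P_W_GIVEN_SR = {(False, False): 0.0, (True, False): 0.9,
                (False, True): 0.9, (True, True): 0.99}  # P(W = true | S, R)


def p_wet_given_cloudy(c: bool) -> float:
    """Predictive query P(W=1 | C=c): sum out the sprinkler and rain nodes."""
    total = 0.0
    for s, r in product((False, True), repeat=2):
        p_s = P_S_GIVEN_C[c] if s else 1.0 - P_S_GIVEN_C[c]
        p_r = P_R_GIVEN_C[c] if r else 1.0 - P_R_GIVEN_C[c]
        total += p_s * p_r * P_W_GIVEN_SR[(s, r)]
    return total


print(f"P(W=1 | C=1) = {p_wet_given_cloudy(True):.4f}")
```

Under those assumed values it prints 0.7452, higher than the unconditional Pr(W=1) = 0.6471, so learning that it is cloudy makes wet grass more likely.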

Inference (cont.)

Inference is concerned with the question: how can we use graphical models to answer probabilistic queries efficiently?

•It uses Bayes' theorem in odds form: $P(B \mid A) = \frac{\mathrm{odds}(B \mid A)}{1 + \mathrm{odds}(B \mid A)}$.

•A prior probability is based on previously observed data.

•Conditional probabilities have the form P(B|A).

Scenario

•An apartment with a smoke detector.
•The smoke detector is near the bathroom.
•Taking a shower often triggers the detector (smoke detectors detect steam).

Scenario (2)

Nodes of the network: B (burn dinner), O (plan to go out), A (smoke alarm), S (take shower), F (electrical fire).

Bayes theorem

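$$P(B \mid A) = \frac{\mathrm{odds}(B \mid A)}{1 + \mathrm{odds}(B \mid A)} = \frac{\mathrm{Likelihood}(A \mid B)\,\mathrm{odds}(B)}{1 + \mathrm{Likelihood}(A \mid B)\,\mathrm{odds}(B)}$$

where B' denotes "not B", $\mathrm{Likelihood}(A \mid B) = P(A \mid B) / P(A \mid B')$ is the likelihood ratio and $\mathrm{odds}(B) = P(B) / P(B')$, so that

$$\mathrm{odds}(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A \mid B')\,P(B')}$$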

Bayes theorem (2)

Conditional probabilities specify the degree of belief in some proposition or propositions based on the assumption that some other propositions are true. Therefore the theory has no meaning without prior resolution of the probability of these antecedent propositions.

Approach

Top down: the probability that an event will occur, given its prior probability.

Bottom up: reasoning that starts from an effect and tries to determine its causes.

Types of inference:
(a) Predictive: a can cause b.
(b) Diagnostic: b is evidence of a.
(c) Intercausal: a and b can both cause c; if a explains c, that is evidence against b ("explaining away", also called "Berkson's paradox" or "selection bias").

(Figure: node diagrams for the three types of inference over a, b and c.)
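Explaining away can be seen numerically in the wet-grass network. The sketch below assumes Murphy's sprinkler CPTs (not listed in this section) and compares belief in the sprinkler before and after also learning that it rained; the helper names are hypothetical.

```python
from itertools import product

# Assumed sprinkler CPTs (Murphy's example).
P_S = {False: 0.5, True: 0.1}                    # P(S = true | C)
P_R = {False: 0.2, True: 0.8}                    # P(R = true | C)
P_W = {(False, False): 0.0, (True, False): 0.9,
       (False, True): 0.9, (True, True): 0.99}   # P(W = true | S, R)


def bern(p_true: float, value: bool) -> float:
    return p_true if value else 1.0 - p_true


def prob(**evidence: bool) -> float:
    """Sum the factorised joint over all assignments consistent with the evidence."""
    total = 0.0
    for c, s, r, w in product((False, True), repeat=4):
        values = {"c": c, "s": s, "r": r, "w": w}
        if all(values[k] == v for k, v in evidence.items()):
            # P(C = c) = 0.5 for both values of c.
            total += 0.5 * bern(P_S[c], s) * bern(P_R[c], r) * bern(P_W[(s, r)], w)
    return total


# Diagnostic query: belief in the sprinkler once the grass is wet.
p_s_given_w = prob(s=True, w=True) / prob(w=True)
# Intercausal query: also learning that it rained "explains away" the wet grass.
p_s_given_w_r = prob(s=True, w=True, r=True) / prob(w=True, r=True)
print(f"P(S=1 | W=1)      = {p_s_given_w:.4f}")
print(f"P(S=1 | W=1, R=1) = {p_s_given_w_r:.4f}")
```

Belief in the sprinkler falls from about 0.43 to about 0.19: rain (a) explains the wet grass (c), which counts as evidence against the sprinkler (b).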

Example

The a priori probability of a burglary B is 0.0001.

The conditional probability of an alarm A given a burglary is Pr(A|B).

Example (2)

+----------+----------+-------------+
|          | Burglary | No Burglary |
+----------+----------+-------------+
| Alarm    |   0.95   |    0.01     |
+----------+----------+-------------+
| No Alarm |   0.05   |    0.99     |
+----------+----------+-------------+

What is the value of Pr(B|A)?

Example (3)

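$$\Pr(B \mid A) = \frac{\mathrm{odds}(B \mid A)}{1 + \mathrm{odds}(B \mid A)}$$

where

$$\mathrm{odds}(B \mid A) = \mathrm{Likelihood}(A \mid B)\cdot\mathrm{odds}(B) = \frac{\Pr(A \mid B)}{\Pr(A \mid B')}\cdot\frac{\Pr(B)}{\Pr(B')} = \frac{0.95}{0.01}\cdot\frac{0.0001}{0.9999} \approx 0.0095$$

$$\Pr(B \mid A) = \frac{0.0095}{1.0095} \approx 0.00941$$

An alarm implies that a burglary is about 94 times more likely than it was a priori.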
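The odds-form calculation can be written as a small function. This is a sketch; `posterior_via_odds` is a hypothetical name, and the inputs are the figures from the burglary table.

```python
def posterior_via_odds(p_a_given_b: float, p_a_given_not_b: float,
                       p_b: float) -> tuple[float, float]:
    """Return (odds(B|A), Pr(B|A)) using odds(B|A) = Likelihood(A|B) * odds(B)."""
    likelihood = p_a_given_b / p_a_given_not_b   # likelihood ratio Pr(A|B) / Pr(A|B')
    prior_odds = p_b / (1.0 - p_b)               # odds(B) = Pr(B) / Pr(B')
    post_odds = likelihood * prior_odds          # odds(B|A)
    return post_odds, post_odds / (1.0 + post_odds)


odds_b_a, p_b_a = posterior_via_odds(0.95, 0.01, 0.0001)
print(f"odds(B|A) = {odds_b_a:.4f}")
print(f"Pr(B|A)   = {p_b_a:.5f}")
print(f"Pr(B|A) / Pr(B) = {p_b_a / 0.0001:.1f}")
```

It prints 0.0095, 0.00941 and 94.1, matching the hand calculation.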

Bayesian Learning

Sources:
•A Tutorial on Learning Bayesian Networks, by David Heckerman (MSR-TR-95-06)
•Learning Bayesian Networks from Data, by Nir Friedman and Moises Goldszmidt (Berkeley and SRI International)

The easier side to Bayesian Learning

Chorus
In the Theory we can build a sample,
With Convergence surely guaranteed,
But beware of autocorrelations,
Or it will take forever to succeed!

Verse 4
When it runs, ain't it thrillin'
To the last iteration.
It frolics and plays throughout n-space,
Walkin' in a Bayesian Wonderland.

Ending
Random walkin' in a Bayesian Wonderland.

In perspective

Where Learning enters the arena

Bayesian networks can be summarised as follows:

•Efficient representations of probability distributions: local models, independence.

•Effective representations of probability distributions for computing posterior probabilities, computing the most probable instantiation, and decision making.

•But there is more, i.e. statistical induction -> learning.

The Learning Process

Learning is done by:
•Encoding existing 'expert' knowledge in a Bayesian network.
•Using a database to update this knowledge, creating one or more new Bayesian networks.

This results in:
•Refinement of the original knowledge.
•Sometimes, the identification of new distinctions and relationships.
•Robustness to errors in the experts' knowledge.

Similar to Neural Net Learning

...but with the following advantages:

•We can easily encode expert knowledge, increasing the efficiency and accuracy of learning.

•Nodes and arcs in learned Bayesian networks often correspond to recognizable distinctions and causal relationships.

•Thus it is easier to understand and interpret the knowledge encoded in the representation.

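Bayesian Learning – The Problem

+-----------------+-----------------------------------+---------------------------------------+
|                 | Known Structure                   | Unknown Structure                     |
+-----------------+-----------------------------------+---------------------------------------+
| Complete Data   | Statistical parametric estimation | Discrete optimization over structures |
+-----------------+-----------------------------------+---------------------------------------+
| Incomplete Data | Parametric optimization           | Combined                              |
+-----------------+-----------------------------------+---------------------------------------+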
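For the easiest cell of the table (known structure, complete data), statistical parametric estimation of a discrete CPT reduces to counting. The sketch below is an illustration under stated assumptions: the records are hypothetical, the child is assumed binary, and the optional `alpha` pseudo-count (a Beta prior; Laplace smoothing when alpha = 1) is our addition rather than something this section specifies.

```python
from collections import Counter


def estimate_cpt(records, child, parents, alpha=0.0):
    """Estimate P(child = True | parents) from complete data by counting.

    alpha = 0 gives the maximum-likelihood estimate; alpha > 0 adds
    pseudo-counts (an assumed Beta prior) to both outcomes.
    """
    n_true = Counter()
    n_total = Counter()
    for rec in records:
        key = tuple(rec[p] for p in parents)   # this record's parent configuration
        n_total[key] += 1
        n_true[key] += bool(rec[child])
    return {key: (n_true[key] + alpha) / (n + 2 * alpha)
            for key, n in n_total.items()}


# Hypothetical complete records over S, R and W.
data = [
    {"s": True, "r": False, "w": True},
    {"s": True, "r": False, "w": True},
    {"s": True, "r": False, "w": False},
    {"s": False, "r": True, "w": True},
    {"s": False, "r": False, "w": False},
]
print(estimate_cpt(data, "w", ["s", "r"]))
print(estimate_cpt(data, "w", ["s", "r"], alpha=1.0))
```

With alpha = 0 the estimates are 2/3, 1.0 and 0.0; alpha = 1 pulls them toward 0.5 (0.6, 2/3 and 1/3). Parent combinations that never appear in the data, here (True, True), get no entry at all.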

Why Learning

Feasibility of learning: availability of data and computational power.

Need for learning: characteristics of current systems and processes:
•They defy closed-form analysis => need a data-driven approach for characterisation.
•They scale and change fast => need continuous automatic adaptation.

Examples: communications networks, illegal activities, the brain, economic markets.

Why Learn a Bayesian Network

•Combine knowledge engineering and statistical induction: covers the whole spectrum from knowledge-intensive model construction to data-intensive model induction.

•More than a learning black box: explanation of outputs; interpretability and modifiability; algorithms for decision making, value of information, and diagnosis and repair.

•Causal representation, reasoning and discovery, e.g. does smoking cause cancer?

A Simple Example

Wang presents a simple example in [2] using only the first four operations, which I reproduce in abbreviated form here. He begins with the following 8 statements:

1. robin (= feathered-creature <1.00, 0.90>
2. bird (= feathered-creature <1.00, 0.90>
3. swan (= bird <1.00, 0.90>
4. swan (= swimmer <1.00, 0.90>
5. gull (= bird <1.00, 0.90>
6. gull (= swimmer <1.00, 0.90>
7. crow (= bird <1.00, 0.90>
8. crow (= swimmer <0.00, 0.90>

(Note that giving a statement a frequency of 0.00 simply means that it is not true.) The system is then asked to evaluate the truth value of "robin (= swimmer". It reaches the following conclusions, in this order:

9. robin (= bird <1.00, 0.45> (1 and 2, abduction)
10. bird (= swimmer <1.00, 0.45> (3 and 4, induction)
11. robin (= swimmer <1.00, 0.20> (9 and 10, deduction)
12. bird (= swimmer <1.00, 0.45> (5 and 6, induction)
13. bird (= swimmer <1.00, 0.62> (10 and 12, revision)
14. bird (= swimmer <0.00, 0.45> (7 and 8, induction)
15. bird (= swimmer <0.67, 0.71> (13 and 14, revision)
16. robin (= swimmer <0.67, 0.32> (9 and 15, deduction)

Note that NARS actually comes to a great many more conclusions than this, but the ones shown are the ones that actually lead toward the conclusion. Also, NARS reports the conclusions at both lines 11 and 16, since the guesswork involved necessarily means it needs to be able to change its mind, as it were. The final conclusion, given at line 16, means that two thirds of the relevant evidence indicates that a robin can swim, but that this conclusion has somewhat less than one third of the possible degree of confidence; both of these items, of course, indicate the need for more information :-).
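The revision steps (statements 13 and 15) can be checked numerically. This is a minimal sketch of the revision rule as we understand it from Wang's Non-Axiomatic Logic: a truth value <f, c> is converted to total evidence w = k*c/(1 - c) and positive evidence f*w, the evidence from both premises is added, and the pooled evidence is converted back with f = w+/w and c = w/(w + k). The evidential horizon k = 1 is an assumption, the function names are hypothetical, and the deduction, induction and abduction rules are not shown.

```python
K = 1.0  # evidential horizon; k = 1 is an assumption


def to_evidence(f: float, c: float) -> tuple[float, float]:
    """Convert a truth value <f, c> into (positive, total) evidence weights."""
    w = K * c / (1.0 - c)
    return f * w, w


def revise(t1: tuple[float, float], t2: tuple[float, float]) -> tuple[float, float]:
    """Revision: pool the evidence behind two truth values for the same statement."""
    wp1, w1 = to_evidence(*t1)
    wp2, w2 = to_evidence(*t2)
    wp, w = wp1 + wp2, w1 + w2
    return wp / w, w / (w + K)


t13 = revise((1.00, 0.45), (1.00, 0.45))   # statements 10 and 12
t15 = revise((1.00, 0.62), (0.00, 0.45))   # statements 13 and 14
print(f"13. bird (= swimmer <{t13[0]:.2f}, {t13[1]:.2f}>")
print(f"15. bird (= swimmer <{t15[0]:.2f}, {t15[1]:.2f}>")
```

It prints <1.00, 0.62> and <0.67, 0.71>, matching statements 13 and 15 to two decimals.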

A Comparison with another Learning Technique

Current Topics

•Time: beyond discrete time and beyond fixed rate.

•Causality: removing the assumptions.

•Hidden variables: where to place them and how many.

•Model evaluation and active learning: which parts of the model are suspect, and what data, and how much, are needed.

Decision Theory (1)

What happens when it is time to convert beliefs into actions?

Decision Theory = Probability Theory + Utility Theory

Decision Theory (2)

Decompose a multi-attribute utility function into a sum of local utilities. Each term is a node, which has as parents:
•the random variables on which it depends;
•the action (control) nodes.
The resulting graph is an influence diagram.

Finally, compute the optimal sequence of actions to perform to maximize expected utility
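For a single decision, maximizing expected utility is a weighted sum followed by an argmax. The sketch below handles one decision only, not the sequence of actions an influence diagram can encode; the function name, actions and utility values are all hypothetical, and the rain belief is borrowed from the wet-grass example.

```python
def best_action(p_states: dict[str, float],
                utility: dict[tuple[str, str], float]) -> tuple[str, dict[str, float]]:
    """Return the action with maximum expected utility, plus every action's EU."""
    actions = sorted({a for a, _ in utility})
    expected = {a: sum(p * utility[(a, s)] for s, p in p_states.items())
                for a in actions}
    return max(expected, key=expected.get), expected


# Hypothetical single decision: beliefs over rain (e.g. a network posterior)
# and made-up utilities for two actions.
beliefs = {"rain": 0.7079, "no rain": 0.2921}
utilities = {
    ("take umbrella", "rain"): 0.0, ("take umbrella", "no rain"): -1.0,
    ("leave umbrella", "rain"): -10.0, ("leave umbrella", "no rain"): 0.0,
}
action, eu = best_action(beliefs, utilities)
print(action, eu)
```

Taking the umbrella wins, with expected utility -0.2921 against -7.079.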

Applications (1)

QMR-DT: a decision-theoretic reformulation of the Quick Medical Reference model

Some Applications

•Biostatistics: Medical Research Council (Bayesian Inference Using Gibbs Sampling, BUGS)
•Data analysis: NASA (AutoClass)
•Collaborative filtering: Microsoft (Microsoft Belief Networks, MSBN)
•Fraud detection: AT&T
•Speech recognition: UC Berkeley

Applications (2)

•Real-time decision making: NASA's Vista system
•Genetics: linkage analysis
•Speech recognition
•Data compression: density estimation
•Coding: turbocodes

Applications : MS Office

MS office assistant: The Lumière Project

Source: The Lumière Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users, by E. Horvitz, J. Breese, D. Heckerman, D. Hovel, K. Rommelse (Microsoft Research)

MS Office (2)

User behaviour is monitored to determine Assistant actions. Examples:
•Search
•Focus of attention
•Introspection
•Undesired effects
•Inefficient command sequences
•Domain-specific syntactic and semantic content

MS Office (3)

Portion of a Bayesian network for inferring the likelihood that a user needs assistance, considering profile information and recent activity.
