
  • Artificial Intelligence

    Roman Barták, Department of Theoretical Computer Science and Mathematical Logic

    Introduction

    We construct rational agents. An agent is an entity that perceives its environment through sensors and acts upon that environment through actuators. A rational agent is an agent maximizing its expected performance measure.

    In AI 1 we dealt mainly with a logical approach to agent design (no uncertainty). We ignored:

    – interface to the environment (sensors, actuators)
    – uncertainty
    – the possibility of self-improvement (learning)

  • Course structure

    Introduction – motivation and background on probability

    Probabilistic reasoning – uncertainty, probabilistic reasoning, Bayesian networks, Hidden Markov Models

    Rational decisions – utility theory, Markov Decision Processes, game theory, mechanism design

    Machine learning – decision trees, regression, SVM, reinforcement learning

    Resources

    Artificial Intelligence: A Modern Approach
    – S. Russell and P. Norvig
    – Prentice Hall, 2010 (3rd ed.)
    – http://aima.cs.berkeley.edu/

    Umělá inteligence 1-6
    – Vladimír Mařík, Olga Štěpánková, Jiří Lažanský et al.
    – Academia

  • Course website

    http://ktiml.mff.cuni.cz/~bartak/ui2

    You can find there:
    – slides
    – links and resources
    – contacts
    – quiz
    – …

    Links to other courses

    Seminar on Artificial Intelligence II – how to apply AI techniques in practice

    Machine learning – how can computers learn new things

    Multi-agent systems – how to handle multiple agents

    Probabilistic graphical models – how to do Bayesian inference efficiently etc.

    Human-like artificial agents – how to design agents for virtual environments

    Practical course on robotics – how to design hardware agents

  • Uncertainty so far

    Can we handle uncertain information in the purely logical approach?
    belief states – represent sets of all possible world states for the agent

    Drawbacks:
    – a logical agent must consider every logically possible explanation for the observations, no matter how unlikely (large and complex representations)
    – a correct contingent plan must consider arbitrarily unlikely contingencies (big plans)
    – sometimes there is no plan that is guaranteed to achieve the goal, yet the agent must act

    Example

    Diagnosing a dental patient's toothache. Let us try to apply propositional logic:
    Toothache ⇒ Cavity

    Hmm, is it really true?
    – not all patients with toothaches have cavities; some of them have gum disease, an abscess, or other problems:
    Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ …

    We could try turning the rule into a causal rule:
    Cavity ⇒ Toothache

    But this is not right either – not all cavities cause pain. The only way to fix the rule is to make it logically exhaustive!

  • Using a logical approach

    Why does logic fail to cope with a domain like medical diagnosis?

    • laziness: it is too much work to list the complete set of antecedents or consequents, and too hard to use such rules

    • theoretical ignorance: medical science has no complete theory for the domain

    • practical ignorance: even if we know all the rules, we might be uncertain because not all the necessary tests have been or can be run

    We need another tool to deal with degrees of belief – probability theory.

    A logical agent believes each sentence to be true or false or has no opinion. A probabilistic agent may have a numerical degree of belief between 0 (certainly false) and 1 (certainly true).

    Basic probability notation

    Like logical assertions, probabilistic assertions are about possible worlds – the sample space Ω.
    – the possible worlds are mutually exclusive and exhaustive

    Each possible world ω is associated with a numerical probability P(ω) such that:

    0 ≤ P(ω) ≤ 1
    Σω∈Ω P(ω) = 1

    Example: If we are about to roll two (distinguishable) dice, there are 36 possible worlds to consider: (1,1), (1,2), …, (6,6), each with P(ω) = 1/36.

  • Events

    Sets of possible worlds are called events. Example: "doubles are rolled" is an event.

    The probability of an event is the sum of the probabilities of the possible worlds in the event.

    P(φ) = Σω∈φ P(ω)
    Example: P(doubles) = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6

    These probabilities are called unconditional or prior probabilities ("priors" for short).
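
    As a quick check, the dice example can be run in code. A minimal Python sketch (ours, not part of the slides): a possible world is a pair of die values, an event is a set of worlds, and the priors follow the definitions above.

        from fractions import Fraction
        from itertools import product

        # 36 equally likely possible worlds (1,1), (1,2), ..., (6,6)
        omega = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}
        assert sum(omega.values()) == 1              # probabilities sum to 1

        # an event is a set of possible worlds; P(event) is the sum over them
        doubles = {w for w in omega if w[0] == w[1]}
        print(sum(omega[w] for w in doubles))        # 1/6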

    Conditional probability

    Frequently, we have some information (evidence) and we are interested in the probability of some event.

    For example, what is the probability of doubles if we already know that the first die rolled 5?
    P(doubles | Die1 = 5) = (1/36) / (6 · 1/36) = 1/6

    This is called conditional or posterior probability:
    P(a|b) = P(a ∧ b) / P(b), whenever P(b) > 0

    This can also be written in a different form called the product rule:
    P(a ∧ b) = P(a|b) · P(b)

    Beware! If we have more evidence, then the conditional probability needs to take it into account:
    P(doubles | Die1 = 5, Die2 = 5) = 1
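
    Continuing the same sketch, conditional probability can be computed directly from the definition P(a|b) = P(a ∧ b) / P(b); the helper names below are ours.

        from fractions import Fraction
        from itertools import product

        omega = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

        def p(event):
            # P(φ) = sum of P(ω) over the worlds in the event
            return sum(omega[w] for w in event)

        def p_given(a, b):
            # P(a|b) = P(a ∧ b) / P(b); conjunction of events is set intersection
            return p(a & b) / p(b)

        doubles = {w for w in omega if w[0] == w[1]}
        die1_is_5 = {w for w in omega if w[0] == 5}
        die2_is_5 = {w for w in omega if w[1] == 5}

        print(p_given(doubles, die1_is_5))               # 1/6
        print(p_given(doubles, die1_is_5 & die2_is_5))   # 1 (more evidence, new answer)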

  • Random variables

    In a factored representation, a possible world is represented by a set of variable/value pairs. Variables in probability theory are called random variables. Every random variable has a domain – the set of possible values it can take on (similarly to a CSP).

    Die1 – represents the value on the first die (1, …, 6)
    Cavity – describes whether the patient has a cavity or not (true, false)

    A possible world is fully identified by the values of all random variables.

    P(Die1 = 5, Die2 = 5)

    Probability distribution

    The probabilities of all possible worlds can be described using a table called the full joint probability distribution – its elements are indexed by the values of the random variables.

    Given the table, we can calculate probabilities of values of any random variable:

    P(toothache=true) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
    P(toothache=false) = 0.072 + 0.008 + 0.144 + 0.576 = 0.8

    We will describe the table in a short way as:
    P(Toothache) = 〈0.2, 0.8〉
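
    The table referenced above can be written down explicitly. A sketch (ours): the eight cell values below follow the textbook example from which the sums above are taken, stored as a Python dict keyed by (toothache, catch, cavity).

        # full joint distribution over (Toothache, Catch, Cavity)
        joint = {
            (True,  True,  True):  0.108, (True,  True,  False): 0.016,
            (True,  False, True):  0.012, (True,  False, False): 0.064,
            (False, True,  True):  0.072, (False, True,  False): 0.144,
            (False, False, True):  0.008, (False, False, False): 0.576,
        }

        # marginalize (sum out) Catch and Cavity to get P(Toothache)
        p_toothache = sum(p for (t, _, _), p in joint.items() if t)
        print(p_toothache, 1 - p_toothache)   # ≈ 0.2 0.8, i.e. P(Toothache) = 〈0.2, 0.8〉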

  • Probability axioms

    P(¬a) = 1 – P(a)

    inclusion-exclusion principle:
    P(a ∨ b) = P(a) + P(b) – P(a ∧ b)

    chain rule:
    P(A,B,C,D) = P(A|B,C,D) P(B,C,D)
    = P(A|B,C,D) P(B|C,D) P(C,D)
    = P(A|B,C,D) P(B|C,D) P(C|D) P(D)

    Inference using full joint distributions

    How to answer questions? The knowledge base is represented using the full joint distribution. To compute the posterior probability of a query proposition given the observed evidence, we add up the probabilities of the possible worlds in which the proposition is true (marginalization or summing out).

    P(φ) = Σω:ω⊨φ P(ω)
    P(Y) = Σz∈Z P(Y, z)

  • Example of probabilistic inference

    P(φ) = Σω:ω⊨φ P(ω)
    P(Y) = Σz∈Z P(Y, z)

    P(toothache) (= P(Toothache=true)) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

    P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28

    P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
    = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
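
    The three queries above can be reproduced from the joint dict built earlier; p below is our helper that sums over the worlds satisfying a proposition.

        def p(holds):
            # P(φ) = sum of P(ω) over the worlds ω in which φ holds
            return sum(pr for w, pr in joint.items() if holds(w))

        # a world is the tuple (toothache, catch, cavity)
        toothache = lambda w: w[0]
        cavity = lambda w: w[2]

        print(p(toothache))                                       # ≈ 0.2
        print(p(lambda w: cavity(w) or toothache(w)))             # ≈ 0.28
        print(p(lambda w: not cavity(w) and toothache(w))
              / p(toothache))                                     # ≈ 0.4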

    Normalization

    P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
    = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4

    P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
    = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6

    Notice that the denominators are identical in both formulas! We do not even need to know the exact value of the denominator:

    P(¬cavity | toothache) + P(cavity | toothache) = 1

    We can use a normalization constant α instead, computed such that the evaluated distribution adds up to 1.

    P(Cavity | toothache) = α P(Cavity, toothache)
    = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
    = α [〈0.108, 0.016〉 + 〈0.012, 0.064〉]
    = α 〈0.12, 0.08〉
    = 〈0.6, 0.4〉
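
    The normalization trick in code (again a sketch over the joint dict): sum the unnormalized entries for each value of Cavity, then rescale so they add up to 1.

        # unnormalized vector P(Cavity, toothache) for cavity = true, false
        unnorm = [
            sum(pr for (t, c, cav), pr in joint.items() if t and cav),       # 0.12
            sum(pr for (t, c, cav), pr in joint.items() if t and not cav),   # 0.08
        ]
        alpha = 1 / sum(unnorm)               # normalization constant
        print([alpha * x for x in unnorm])    # [0.6, 0.4]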

  • Inference via enumeration

    In a typical case, we know the values e of random variables E from the observation, and we are looking for the probability distribution of the random variables Y from the query. The other random variables are hidden: H = X – Y – E.

    P(Y | E=e) = α P(Y, E=e) = α Σh P(Y, E=e, H=h)

    Some drawbacks of inference by enumeration:
    • the worst-case time complexity is O(d^n), where d is the number of values in the domain of each random variable and n is the number of variables
    • to store the full joint probability distribution we need O(d^n) space
    • last but not least, it is not easy to obtain the probabilities for all possible worlds
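
    The whole scheme fits in a few lines. A generic sketch (our helper, not an API from the lecture): worlds are tuples of Boolean values, the evidence fixes some positions, and the hidden variables are summed out before normalizing.

        def enumerate_ask(joint, query_var, evidence):
            # P(Y | E=e) = α Σh P(Y, E=e, H=h)
            # joint:     dict {world tuple: probability}
            # query_var: index of the query variable Y in the world tuples
            # evidence:  dict {index: observed value} for the evidence variables E
            dist = {True: 0.0, False: 0.0}
            for world, p in joint.items():
                if all(world[i] == v for i, v in evidence.items()):
                    dist[world[query_var]] += p      # sum out hidden variables
            alpha = 1 / sum(dist.values())           # normalization constant
            return {y: alpha * p for y, p in dist.items()}

        # P(Cavity | toothache) over the dental joint distribution from before
        print(enumerate_ask(joint, query_var=2, evidence={0: True}))
        # {True: 0.6, False: 0.4} (up to rounding)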

    Independence

    Let us expand the full joint distribution by adding a fourth variable Weather with the domain {cloudy, sunny, rain, snow} – the new full joint distribution has 2×2×2×4 = 32 elements (possible worlds).

    P(toothache, catch, cavity, cloudy) = P(cloudy | toothache, catch, cavity) · P(toothache, catch, cavity)

    Do one's dental problems influence the weather?
    P(cloudy | toothache, catch, cavity) = P(cloudy)

    We can write in general:

    P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) · P(Weather)

    Hence the full joint distribution can be constructed from two smaller tables, one with 8 elements and one with 4 elements.

    This property is called (absolute) independence:
    P(X|Y) = P(X) or P(Y|X) = P(Y) or P(X,Y) = P(X) · P(Y)
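
    A quick check of this factorization in code (a sketch; the weather probabilities are invented for illustration): the 32-element joint is the product of the 8-element dental table from before and a 4-element weather table.

        weather = {'cloudy': 0.5, 'sunny': 0.3, 'rain': 0.1, 'snow': 0.1}   # made-up priors

        # P(t, c, cav, w) = P(t, c, cav) · P(w) by absolute independence
        big_joint = {(t, c, cav, w): p * pw
                     for (t, c, cav), p in joint.items()
                     for w, pw in weather.items()}

        assert len(big_joint) == 32
        assert abs(sum(big_joint.values()) - 1) < 1e-12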

  • Conditional independence

    Full independence allows us to reduce the size of the domain representation, but unfortunately full independence is rare, and even independent subsets can be quite large.

    When one has a cavity, does catch depend on toothache?
    P(catch | toothache, cavity) = P(catch | cavity)
    P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

    Random variables Catch and Toothache are independent if we know the value of Cavity.

    P(Catch | Toothache, Cavity) = P(Catch | Cavity)

    This property is called conditional independence:
    P(X|Y,Z) = P(X|Y) or P(Z|X,Y) = P(Z|Y) or P(Z,X|Y) = P(Z|Y) P(X|Y)

    Exploiting conditional independence

    Conditional independence can be used to further reduce the size of the domain representation.

    P(Toothache, Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
    = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

    The full joint distribution can be constructed from three smaller tables of sizes 2 + 2 + 1 = 5 (only independent elements are represented).
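
    The claimed conditional independence actually holds in the dental table; a short check over the joint dict from before:

        def p(holds):
            return sum(pr for w, pr in joint.items() if holds(w))

        # a world is the tuple (toothache, catch, cavity)
        lhs = p(lambda w: w[1] and w[0] and w[2]) / p(lambda w: w[0] and w[2])
        rhs = p(lambda w: w[1] and w[2]) / p(lambda w: w[2])
        print(lhs, rhs)   # P(catch|toothache,cavity) = P(catch|cavity), both ≈ 0.9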

  • Diagnostic systems

    Let us go back to diagnostic problems. Usually we are looking for a disease (the source of problems) based on symptoms (observations).
    – we are interested in the diagnostic direction expressed as the conditional probability P(disease | symptoms)

    However, from past experience we often have other information:
    – the probability of the disease P(disease)
    – the probability of the symptoms P(symptoms)
    – the causal relation expressed as the conditional probability P(symptoms | disease)

    How can this information be exploited to get the probability of the diagnostic direction?

    Bayes' rule

    Recall the product rule:
    P(a ∧ b) = P(a|b) P(b) = P(b|a) P(a)

    We can deduce the so-called Bayes' rule (law or theorem):
    P(a|b) = P(b|a) P(a) / P(b)

    in general:
    P(Y|X) = P(X|Y) P(Y) / P(X) = α P(X|Y) P(Y)

    It looks like two steps backward, as now we need to know P(X|Y), P(Y), P(X). But these are the values that we frequently have.
    P(cause | effect) = P(effect | cause) P(cause) / P(effect)
    – P(effect | cause) describes the causal direction
    – P(cause | effect) describes the diagnostic relation

  • Using Bayes' rule

    Medical diagnosis
    – from past cases we know P(symptoms | disease), P(disease), P(symptoms)
    – for a new patient we know the symptoms and are looking for the diagnosis P(disease | symptoms)

    Example:
    – meningitis causes a stiff neck 70% of the time
    – the prior probability of meningitis is 1/50000
    – the prior probability of a stiff neck is 1%
    What is the probability that a patient having a stiff neck has meningitis?

    P(m|s) = P(s|m) · P(m) / P(s) = 0.7 · (1/50000) / 0.01 = 0.0014

    Why is the conditional probability for the diagnostic direction not stored directly?
    • diagnostic knowledge is often more fragile than causal knowledge
    • for example, if there is a sudden epidemic of meningitis, the unconditional probability of meningitis P(m) will go up, so P(m|s) should also go up, while the causal relation P(s|m) is unaffected by the epidemic, as it reflects how meningitis works
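
    The meningitis computation, transcribed into Python:

        p_s_given_m = 0.7      # P(s|m): stiff neck given meningitis (causal direction)
        p_m = 1 / 50000        # prior probability of meningitis
        p_s = 0.01             # prior probability of a stiff neck

        print(p_s_given_m * p_m / p_s)   # Bayes' rule: ≈ 0.0014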

    Naive Bayes model

    What if there are more observations? We can exploit conditional independence as follows:

    P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

    If all the effects are conditionally independent given the cause variable, we get:

    P(Cause, Effect1, …, Effectn) = P(Cause) Πi P(Effecti | Cause)

    Such a probability distribution is called a naive Bayes model (it is often used even in cases where the "effect" variables are not actually conditionally independent given the value of the cause variable).
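
    A naive Bayes sketch over the dental example (the conditional tables below are derived from the full joint above, e.g. P(toothache|cavity) = 0.12/0.2 = 0.6; the variable names are ours):

        p_cavity = {True: 0.2, False: 0.8}             # P(Cavity)
        p_toothache_given = {True: 0.6, False: 0.1}    # P(toothache | Cavity)
        p_catch_given = {True: 0.9, False: 0.2}        # P(catch | Cavity)

        # observed effects: toothache and catch are both true
        unnorm = {cav: p_cavity[cav] * p_toothache_given[cav] * p_catch_given[cav]
                  for cav in (True, False)}
        alpha = 1 / sum(unnorm.values())
        print({cav: alpha * v for cav, v in unnorm.items()})
        # {True: ≈0.87, False: ≈0.13}; matches exact inference here, because the
        # conditional independence holds exactly in this table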

  • The Wumpus world revisited

    Wumpus is back! We have a maze with pits that are detected in neighboring squares via breeze (Wumpus and gold will not be assumed now).

    Where should the agent go, if there is breeze at (1,2) and (2,1)?

    Pure logical inference can conclude nothing about which square is most likely to be safe!

    To which square should the agent go?

    Wumpus: probabilistic model

    Boolean variables:
    Pi,j – pit at square (i,j)
    Bi,j – breeze at square (i,j) (only for the observed squares B1,1, B1,2 and B2,1)

    Full joint probability distribution (by the product rule):
    P(P1,1, …, P4,4, B1,1, B1,2, B2,1) = P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) · P(P1,1, …, P4,4)

    Pits are spread independently, each with probability 0.2, so for a world with n pits:
    P(P1,1, …, P4,4) = Πi,j P(Pi,j) = 0.2^n · 0.8^(16–n)

  • Wumpus: query and simple reasoning

    Assume that we have evidence:
    b = ¬b1,1 ∧ b1,2 ∧ b2,1
    known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1

    We are interested in answering queries such as P(P1,3 | known, b).

    The answer can be computed by enumeration of the full joint probability distribution. Let Unknown be the variables Pi,j except P1,3 and Known:
    P(P1,3 | known, b) = α Σunknown P(P1,3, unknown, known, b)

    But this means exploring all possible values of the variables Unknown, and there are 2^12 = 4096 terms!

    Can we do it better (faster)?

    Wumpus: conditional independence

    Observation: The observed breezes are conditionally independent of the other variables given the known (white), frontier (yellow), and query variables.

    We split the set of hidden variables into fringe and other variables:
    Unknown = Fringe ∪ Other

    From conditional independence we get:
    P(b | P1,3, known, unknown) = P(b | P1,3, known, fringe)

    Now, let us exploit this formula.

  • Wumpus: reasoning

    P(P1,3 | known, b)
    = α Σunknown P(P1,3, known, unknown, b)
    = α Σunknown P(b | P1,3, known, unknown) · P(P1,3, known, unknown)    (product rule P(X,Y) = P(X|Y) P(Y))
    = α Σfringe Σother P(b | P1,3, known, fringe, other) · P(P1,3, known, fringe, other)
    = α Σfringe Σother P(b | P1,3, known, fringe) · P(P1,3, known, fringe, other)    (conditional independence)
    = α Σfringe P(b | P1,3, known, fringe) Σother P(P1,3, known, fringe, other)
    = α Σfringe P(b | P1,3, known, fringe) Σother P(P1,3) P(known) P(fringe) P(other)    (pits are independent)
    = α P(known) P(P1,3) Σfringe P(b | P1,3, known, fringe) P(fringe) Σother P(other)
    = α′ P(P1,3) Σfringe P(b | P1,3, known, fringe) P(fringe)

    where α′ = α · P(known) and Σother P(other) = 1.

    Wumpus: solution

    P(P1,3 | known, b) = α′ P(P1,3) Σfringe P(b | P1,3, known, fringe) P(fringe)

    Let us explore the possible models (values) of Fringe that are compatible with observation b.

    P(P1,3 | known, b) = α′ 〈0.2 · (0.04 + 0.16 + 0.16), 0.8 · (0.04 + 0.16)〉 = 〈0.31, 0.69〉

    P(P2,2 | known, b) = 〈0.86, 0.14〉

    Definitely avoid the square (2,2)!
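
    The numbers above can be verified by brute force: enumerate all 2^16 pit configurations, keep those consistent with known and b, and sum their probabilities. A sketch (ours; square coordinates as in the slides):

        from itertools import product

        squares = [(i, j) for i in range(1, 5) for j in range(1, 5)]

        def neighbors(i, j):
            return [(x, y) for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 1 <= x <= 4 and 1 <= y <= 4]

        def breeze(pit, sq):
            return any(pit[n] for n in neighbors(*sq))   # breeze iff a neighbor has a pit

        def query(target):
            # P(P_target | known, b) by summing the consistent possible worlds
            dist = {True: 0.0, False: 0.0}
            for bits in product((False, True), repeat=16):
                pit = dict(zip(squares, bits))
                if pit[(1, 1)] or pit[(1, 2)] or pit[(2, 1)]:
                    continue                             # known: no pits in visited squares
                # evidence b: no breeze at (1,1), breeze at (1,2) and (2,1)
                if breeze(pit, (1, 1)) or not breeze(pit, (1, 2)) or not breeze(pit, (2, 1)):
                    continue
                p = 1.0
                for s in squares:
                    p *= 0.2 if pit[s] else 0.8          # pits i.i.d. with probability 0.2
                dist[pit[target]] += p
            alpha = 1 / sum(dist.values())
            return {v: round(alpha * pr, 2) for v, pr in dist.items()}

        print(query((1, 3)))    # {True: 0.31, False: 0.69}
        print(query((2, 2)))    # {True: 0.86, False: 0.14}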

  • Summary

    Probability theory is a formal mechanism to handle uncertainty. The full joint distribution describes the probabilities of all possible worlds. Answers to queries can be obtained by summing out the probabilities of the possible worlds consistent with the observation. However, larger problems will require a better approach. We are going to exploit independence and conditional independence.

    © 2016 Roman Barták, Department of Theoretical Computer Science and Mathematical Logic

    bartak@ktiml.mff.cuni.cz