
  • Artificial Intelligence

    Roman Barták, Department of Theoretical Computer Science and Mathematical Logic

    Introduction

    We construct rational agents. An agent is an entity that perceives its environment through sensors and acts upon that environment through actuators. A rational agent is an agent maximizing its expected performance measure.

    In AI 1 we dealt mainly with a logical approach to agent design (no uncertainty). We ignored:

    – interface to the environment (sensors, actuators)
    – uncertainty
    – the possibility of self-improvement (learning)

  • Course structure

    Introduction – motivation and background on probability

    Probabilistic reasoning – uncertainty, probabilistic reasoning, Bayesian networks, Hidden Markov Models

    Rational decisions – utility theory, Markov Decision Processes, game theory, mechanism design

    Machine learning – decision trees, regression, SVM, reinforcement learning

    Resources

    Artificial Intelligence: A Modern Approach
    – S. Russell and P. Norvig
    – Prentice Hall, 2010 (3rd ed.)
    – http://aima.cs.berkeley.edu/

    Umělá inteligence 1-6
    – Vladimír Mařík, Olga Štěpánková, Jiří Lažanský et al.
    – Academia

  • Course website

    http://ktiml.mff.cuni.cz/~bartak/ui2

    You can find there:
    – slides
    – links and resources
    – contacts
    – quiz
    – …

    Links to other courses

    Seminar on Artificial Intelligence II – how to apply AI techniques in practice

    Machine learning – how can computers learn new things

    Multi-agent systems – how to handle multiple agents

    Probabilistic graphical models – how to do Bayesian inference efficiently etc.

    Human-like artificial agents – how to design agents for virtual environments

    Practical course on robotics – how to design hardware agents

  • Uncertainty so far

    Can we handle uncertain information in the purely logical approach?
    belief states – represent sets of all possible world states for the agent

    Drawbacks:
    – a logical agent must consider every logically possible explanation for the observations, no matter how unlikely (large and complex representations)
    – a correct contingent plan must consider arbitrarily unlikely contingencies (big plans)
    – sometimes there is no plan that is guaranteed to achieve the goal, yet the agent must act

    Example

    Diagnosing a dental patient's toothache. Let us try to apply propositional logic:
    Toothache ⇒ Cavity

    Hmm, is it really true?
    – not all patients with toothaches have cavities; some of them have gum disease, an abscess, or other problems:
    Toothache ⇒ Cavity ∨ GumProblem ∨ Abscess ∨ …

    We could try turning the rule into a causal rule:
    Cavity ⇒ Toothache

    But this is not right either – not all cavities cause pain. The only way to fix the rule is to make it logically exhaustive!

  • Using a logical approach

    Why does logic fail to cope with a domain like medical diagnosis?

    • laziness: it is too much work to list the complete set of antecedents or consequents, and too hard to use such rules

    • theoretical ignorance: medical science has no complete theory for the domain

    • practical ignorance: even if we know all the rules, we might be uncertain because not all the necessary tests have been or can be run

    We need another tool to deal with degrees of belief – probability theory.

    A logical agent believes each sentence to be true or false or has no opinion. A probabilistic agent may have a numerical degree of belief between 0 (certainly false) and 1 (certainly true).

    Basic probability notation

    Like logical assertions, probabilistic assertions are about possible worlds – the sample space Ω.
    – the possible worlds are mutually exclusive and exhaustive

    Each possible world ω is associated with a numerical probability P(ω) such that:

    0 ≤ P(ω) ≤ 1
    Σω∈Ω P(ω) = 1

    Example: If we are about to roll two (distinguishable) dice, there are 36 possible worlds to consider: (1,1), (1,2), …, (6,6), each with P(ω) = 1/36.

  • Events

    Sets of possible worlds are called events. Example: "doubles are rolled" is an event.

    The probability of an event is the sum of the probabilities of the possible worlds in the event.

    P(φ) = Σω∈φ P(ω)
    Example: P(doubles) = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6

    These probabilities are called unconditional or prior probabilities ("priors" for short).
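
    As a quick check, the dice example can be run in code. A minimal Python sketch (ours, not part of the slides): a possible world is a pair of die values, an event is a set of worlds, and the priors follow the definitions above.

        from fractions import Fraction
        from itertools import product

        # 36 equally likely possible worlds (1,1), (1,2), ..., (6,6)
        omega = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}
        assert sum(omega.values()) == 1              # probabilities sum to 1

        # an event is a set of possible worlds; P(event) is the sum over them
        doubles = {w for w in omega if w[0] == w[1]}
        print(sum(omega[w] for w in doubles))        # 1/6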

    Conditional probability

    Frequently, we have some information (evidence) and we are interested in the probability of some event.

    For example, what is the probability of doubles if we already know that the first die rolled 5?
    P(doubles | Die1 = 5) = (1/36) / (6 · 1/36) = 1/6

    This is called conditional or posterior probability:
    P(a|b) = P(a ∧ b) / P(b), whenever P(b) > 0

    This can also be written in a different form called the product rule:
    P(a ∧ b) = P(a|b) · P(b)

    Beware! If we have more evidence, then the conditional probability needs to take it into account:
    P(doubles | Die1 = 5, Die2 = 5) = 1
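
    Continuing the same sketch, conditional probability can be computed directly from the definition P(a|b) = P(a ∧ b) / P(b); the helper names below are ours.

        from fractions import Fraction
        from itertools import product

        omega = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

        def p(event):
            # P(φ) = sum of P(ω) over the worlds in the event
            return sum(omega[w] for w in event)

        def p_given(a, b):
            # P(a|b) = P(a ∧ b) / P(b); conjunction of events is set intersection
            return p(a & b) / p(b)

        doubles = {w for w in omega if w[0] == w[1]}
        die1_is_5 = {w for w in omega if w[0] == 5}
        die2_is_5 = {w for w in omega if w[1] == 5}

        print(p_given(doubles, die1_is_5))               # 1/6
        print(p_given(doubles, die1_is_5 & die2_is_5))   # 1 (more evidence, new answer)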

  • Random variables

    In a factored representation, a possible world is represented by a set of variable/value pairs. Variables in probability theory are called random variables. Every random variable has a domain – the set of possible values it can take on (similarly to a CSP).

    Die1 – represents the value on the first die (1, …, 6)
    Cavity – describes whether the patient has a cavity or not (true, false)

    A possible world is fully identified by the values of all random variables.

    P(Die1 = 5, Die2 = 5)

    Probability distribution

    The probabilities of all possible worlds can be described using a table called the full joint probability distribution – its elements are indexed by the values of the random variables.

    Given the table, we can calculate probabilities of values of any random variable:

    P(toothache=true) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
    P(toothache=false) = 0.072 + 0.008 + 0.144 + 0.576 = 0.8

    We will describe the table in a short way as:
    P(Toothache) = 〈0.2, 0.8〉
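
    The table referenced above can be written down explicitly. A sketch (ours): the eight cell values below follow the textbook example from which the sums above are taken, stored as a Python dict keyed by (toothache, catch, cavity).

        # full joint distribution over (Toothache, Catch, Cavity)
        joint = {
            (True,  True,  True):  0.108, (True,  True,  False): 0.016,
            (True,  False, True):  0.012, (True,  False, False): 0.064,
            (False, True,  True):  0.072, (False, True,  False): 0.144,
            (False, False, True):  0.008, (False, False, False): 0.576,
        }

        # marginalize (sum out) Catch and Cavity to get P(Toothache)
        p_toothache = sum(p for (t, _, _), p in joint.items() if t)
        print(p_toothache, 1 - p_toothache)   # ≈ 0.2 0.8, i.e. P(Toothache) = 〈0.2, 0.8〉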

  • Probability axioms

    P(¬a) = 1 – P(a)

    inclusion-exclusion principle:
    P(a ∨ b) = P(a) + P(b) – P(a ∧ b)

    chain rule:
    P(A,B,C,D) = P(A|B,C,D) P(B,C,D)
    = P(A|B,C,D) P(B|C,D) P(C,D)
    = P(A|B,C,D) P(B|C,D) P(C|D) P(D)

    Inference using full joint distributions

    How to answer questions? The knowledge base is represented using the full joint distribution. To compute the posterior probability of a query proposition given the observed evidence, we add up the probabilities of the possible worlds in which the proposition is true (marginalization or summing out).

    P(φ) = Σω:ω⊨φ P(ω)
    P(Y) = Σz∈Z P(Y, z)

  • Example of probabilistic inference

    P(φ) = Σω:ω⊨φ P(ω)
    P(Y) = Σz∈Z P(Y, z)

    P(toothache) (= P(Toothache=true)) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

    P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28

    P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
    = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
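
    The three queries above can be reproduced from the joint dict built earlier; p below is our helper that sums over the worlds satisfying a proposition.

        def p(holds):
            # P(φ) = sum of P(ω) over the worlds ω in which φ holds
            return sum(pr for w, pr in joint.items() if holds(w))

        # a world is the tuple (toothache, catch, cavity)
        toothache = lambda w: w[0]
        cavity = lambda w: w[2]

        print(p(toothache))                                       # ≈ 0.2
        print(p(lambda w: cavity(w) or toothache(w)))             # ≈ 0.28
        print(p(lambda w: not cavity(w) and toothache(w))
              / p(toothache))                                     # ≈ 0.4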

    Normalization

    P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
    = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4

    P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
    = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6

    Notice that the denominators are identical in both formulas! We do not even need to know the exact value of the denominator:

    P(¬cavity | toothache) + P(cavity | toothache) = 1

    We can use a normalization constant α instead, computed such that the evaluated distribution adds up to 1.

    P(Cavity | toothache) = α P(Cavity, toothache)
    = α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
    = α [〈0.108, 0.016〉 + 〈0.012, 0.064〉]
    = α 〈0.12, 0.08〉
    = 〈0.6, 0.4〉
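
    The normalization trick in code (again a sketch over the joint dict): sum the unnormalized entries for each value of Cavity, then rescale so they add up to 1.

        # unnormalized vector P(Cavity, toothache) for cavity = true, false
        unnorm = [
            sum(pr for (t, c, cav), pr in joint.items() if t and cav),       # 0.12
            sum(pr for (t, c, cav), pr in joint.items() if t and not cav),   # 0.08
        ]
        alpha = 1 / sum(unnorm)               # normalization constant
        print([alpha * x for x in unnorm])    # [0.6, 0.4]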

  • Inference via enumeration

    In a typical case, we know the values e of random variables E from the observation, and we are looking for the probability distribution of the random variables Y from the query. The other random variables are hidden: H = X – Y – E.

    P(Y | E=e) = α P(Y, E=e) = α Σh P(Y, E=e, H=h)

    Some drawbacks of inference by enumeration:
    • the worst-case time complexity is O(d^n), where d is the number of values in the domain of each random variable and n is the number of variables
    • to store the full joint probability distribution we need O(d^n) space
    • last but not least, it is not easy to obtain the probabilities for all possible worlds
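
    The whole scheme fits in a few lines. A generic sketch (our helper, not an API from the lecture): worlds are tuples of Boolean values, the evidence fixes some positions, and the hidden variables are summed out before normalizing.

        def enumerate_ask(joint, query_var, evidence):
            # P(Y | E=e) = α Σh P(Y, E=e, H=h)
            # joint:     dict {world tuple: probability}
            # query_var: index of the query variable Y in the world tuples
            # evidence:  dict {index: observed value} for the evidence variables E
            dist = {True: 0.0, False: 0.0}
            for world, p in joint.items():
                if all(world[i] == v for i, v in evidence.items()):
                    dist[world[query_var]] += p      # sum out hidden variables
            alpha = 1 / sum(dist.values())           # normalization constant
            return {y: alpha * p for y, p in dist.items()}

        # P(Cavity | toothache) over the dental joint distribution from before
        print(enumerate_ask(joint, query_var=2, evidence={0: True}))
        # {True: 0.6, False: 0.4} (up to rounding)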

    Independence

    Let us expand the full joint distribution by adding a fourth variable Weather with the domain {cloudy, sunny, rain, snow} – the new full joint distribution has 2×2×2×4 = 32 elements (possible worlds).

    P(toothache, catch, cavity, cloudy) = P(cloudy | toothache, catch, cavity) · P(toothache, catch, cavity)

    Do one's dental problems influence the weather?
    P(cloudy | toothache, catch, cavity) = P(cloudy)

    We can write in general:

    P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) · P(Weather)

    Hence the full joint distribution can be constructed from two smaller tables, one with 8 elements and one with 4 elements.

    This property is called (absolute) independence:
    P(X|Y) = P(X) or P(Y|X) = P(Y) or P(X,Y) = P(X) · P(Y)
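
    A quick check of this factorization in code (a sketch; the weather probabilities are invented for illustration): the 32-element joint is the product of the 8-element dental table from before and a 4-element weather table.

        weather = {'cloudy': 0.5, 'sunny': 0.3, 'rain': 0.1, 'snow': 0.1}   # made-up priors

        # P(t, c, cav, w) = P(t, c, cav) · P(w) by absolute independence
        big_joint = {(t, c, cav, w): p * pw
                     for (t, c, cav), p in joint.items()
                     for w, pw in weather.items()}

        assert len(big_joint) == 32
        assert abs(sum(big_joint.values()) - 1) < 1e-12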

  • Conditional independence

    Full independence allows us to reduce the size of the domain representation, but unfortunately full independence is rare, and even independent subsets can be quite large.

    When one has a cavity, does catch depend on toothache?
    P(catch | toothache, cavity) = P(catch | cavity)
    P(catch | toothache, ¬cavity) = P(catch | ¬cavity)

    Random variables Catch and Toothache are independent if we know the value of Cavity.

    P(Catch | Toothache, Cavity) = P(Catch | Cavity)

    This property is called conditional independence:
    P(X|Y,Z) = P(X|Y) or P(Z|X,Y) = P(Z|Y) or P(Z,X|Y) = P(Z|Y) P(X|Y)

    Exploiting conditional independence

    Conditional independence can be used to further reduce the size of the domain representation.

    P(Toothache, Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch, Cavity)
    = P(Toothache | Catch, Cavity) P(Catch | Cavity) P(Cavity)
    = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

    The full joint distribution can be constructed from three smaller tables of sizes 2 + 2 + 1 = 5 (only independent elements are represented).
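
    The claimed conditional independence actually holds in the dental table; a short check over the joint dict from before:

        def p(holds):
            return sum(pr for w, pr in joint.items() if holds(w))

        # a world is the tuple (toothache, catch, cavity)
        lhs = p(lambda w: w[1] and w[0] and w[2]) / p(lambda w: w[0] and w[2])
        rhs = p(lambda w: w[1] and w[2]) / p(lambda w: w[2])
        print(lhs, rhs)   # P(catch|toothache,cavity) = P(catch|cavity), both ≈ 0.9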

  • Diagnostic systems

    Let us go back to diagnostic problems. Usually we are looking for a disease (the source of problems) based on symptoms (observations).
    – we are interested in the diagnostic direction expressed as the conditional probability P(disease | symptoms)

    However, from past experience we often have other information:
    – the probability of the disease P(disease)
    – the probability of the symptoms P(symptoms)
    – the causal relation expressed as the conditional probability P(symptoms | disease)

    How can this information be exploited to get the probability of the diagnostic direction?

    Bayes' rule

    Recall the product rule:
    P(a ∧ b) = P(a|b) P(b) = P(b|a) P(a)

    We can deduce the so-called Bayes' rule (law or theorem):
    P(a|b) = P(b|a) P(a) / P(b)

    in general:
    P(Y|X) = P(X|Y) P(Y) / P(X) = α P(X|Y) P(Y)

    It looks like two steps backward, as now we need to know P(X|Y), P(Y), P(X). But these are the values that we frequently have.
    P(cause | effect) = P(effect | cause) P(cause) / P(effect)
    – P(effect | cause) describes the causal direction
    – P(cause | effect) describes the diagnostic relation

  • Using Bayes' rule

    Medical diagnosis
    – from past cases we know P(symptoms | disease), P(disease), P(symptoms)
    – for a new patient we know the symptoms and are looking for the diagnosis P(disease | symptoms)

    Example:
    – meningitis causes a stiff neck 70% of the time
    – the prior probability of meningitis is 1/50000
    – the prior probability of a stiff neck is 1%
    What is the probability that a patient having a stiff neck has meningitis?

    P(m|s) = P(s|m) · P(m) / P(s) = 0.7 · (1/50000) / 0.01 = 0.0014

    Why is the conditional probability for the diagnostic direction not stored directly?
    • diagnostic knowledge is often more fragile than causal knowledge
    • for example, if there is a sudden epidemic of meningitis, the unconditional probability of meningitis P(m) will go up, so P(m|s) should also go up, while the causal relation P(s|m) is unaffected by the epidemic, as it reflects how meningitis works
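
    The meningitis computation, transcribed into Python:

        p_s_given_m = 0.7      # P(s|m): stiff neck given meningitis (causal direction)
        p_m = 1 / 50000        # prior probability of meningitis
        p_s = 0.01             # prior probability of a stiff neck

        print(p_s_given_m * p_m / p_s)   # Bayes' rule: ≈ 0.0014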

    Naive Bayes model

    What if there are more observations? We can exploit conditional independence as follows:

    P(Toothache, Catch, Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

    If all the effects are conditionally independent given the cause variable, we get:

    P(Cause, Effect1, …, Effectn) = P(Cause) Πi P(Effecti | Cause)

    Such a probability distribution is called a naive Bayes model (it is often used even in cases where the "effect" variables are not actually conditionally independent given the value of the cause variable).
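
    A naive Bayes sketch over the dental example (the conditional tables below are derived from the full joint above, e.g. P(toothache|cavity) = 0.12/0.2 = 0.6; the variable names are ours):

        p_cavity = {True: 0.2, False: 0.8}             # P(Cavity)
        p_toothache_given = {True: 0.6, False: 0.1}    # P(toothache | Cavity)
        p_catch_given = {True: 0.9, False: 0.2}        # P(catch | Cavity)

        # observed effects: toothache and catch are both true
        unnorm = {cav: p_cavity[cav] * p_toothache_given[cav] * p_catch_given[cav]
                  for cav in (True, False)}
        alpha = 1 / sum(unnorm.values())
        print({cav: alpha * v for cav, v in unnorm.items()})
        # {True: ≈0.87, False: ≈0.13}; matches exact inference here, because the
        # conditional independence holds exactly in this table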

  • The Wumpus world revisited

    Wumpus is back! We have a maze with pits that are detected in neighboring squares via breeze (Wumpus and gold will not be assumed now).

    Where should the agent go, if there is breeze at (1,2) and (2,1)?

    Pure logical inference can conclude nothing about which square is most likely to be safe!

    To which square should the agent go?

    Wumpus: probabilistic model

    Boolean variables:
    Pi,j – pit at square (i,j)
    Bi,j – breeze at square (i,j) (only for the observed squares B1,1, B1,2 and B2,1)

    Full joint probability distribution (by the product rule):
    P(P1,1, …, P4,4, B1,1, B1,2, B2,1) = P(B1,1, B1,2, B2,1 | P1,1, …, P4,4) · P(P1,1, …, P4,4)

    Pits are spread independently, each with probability 0.2, so for a world with n pits:
    P(P1,1, …, P4,4) = Πi,j P(Pi,j) = 0.2^n · 0.8^(16–n)

  • Wumpus: query and simple reasoning

    Assume that we have evidence:
    b = ¬b1,1 ∧ b1,2 ∧ b2,1
    known = ¬p1,1 ∧ ¬p1,2 ∧ ¬p2,1

    We are interested in answering queries such as P(P1,3 | known, b).

    The answer can be computed by enumeration of the full joint probability distribution. Let Unknown be the variables Pi,j except P1,3 and Known:
    P(P1,3 | known, b) = α Σunknown P(P1,3, unknown, known, b)

    But this means exploring all possible values of the variables Unknown, and there are 2^12 = 4096 terms!

    Can we do it better (faster)?

    Wumpus: conditional independence

    Observation: The observed breezes are conditionally independent of the other variables given the known (white), frontier (yellow), and query variables.

    We split the set of hidden variables into fringe and other variables:
    Unknown = Fringe ∪ Other

    From conditional independence we get:
    P(b | P1,3, known, unknown) = P(b | P1,3, known, fringe)

    Now, let us exploit this formula.

  • Wumpus: reasoning

    P(P1,3 | known, b)
    = α Σunknown P(P1,3, known, unknown, b)
    = α Σunknown P(b | P1,3, known, unknown) · P(P1,3, known, unknown)    (product rule P(X,Y) = P(X|Y) P(Y))
    = α Σfringe Σother P(b | P1,3, known, fringe, other) · P(P1,3, known, fringe, other)
    = α Σfringe Σother P(b | P1,3, known, fringe) · P(P1,3, known, fringe, other)    (conditional independence)
    = α Σfringe P(b | P1,3, known, fringe) Σother P(P1,3, known, fringe, other)
    = α Σfringe P(b | P1,3, known, fringe) Σother P(P1,3) P(known) P(fringe) P(other)    (pits are independent)
    = α P(known) P(P1,3) Σfringe P(b | P1,3, known, fringe) P(fringe) Σother P(other)
    = α′ P(P1,3) Σfringe P(b | P1,3, known, fringe) P(fringe)

    where α′ = α · P(known) and Σother P(other) = 1.

    Wumpus: solution

    P(P1,3 | known, b) = α′ P(P1,3) Σfringe P(b | P1,3, known, fringe) P(fringe)

    Let us explore the possible models (values) of Fringe that are compatible with observation b.

    P(P1,3 | known, b) = α′ 〈0.2 · (0.04 + 0.16 + 0.16), 0.8 · (0.04 + 0.16)〉 = 〈0.31, 0.69〉

    P(P2,2 | known, b) = 〈0.86, 0.14〉

    Definitely avoid the square (2,2)!
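
    The numbers above can be verified by brute force: enumerate all 2^16 pit configurations, keep those consistent with known and b, and sum their probabilities. A sketch (ours; square coordinates as in the slides):

        from itertools import product

        squares = [(i, j) for i in range(1, 5) for j in range(1, 5)]

        def neighbors(i, j):
            return [(x, y) for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 1 <= x <= 4 and 1 <= y <= 4]

        def breeze(pit, sq):
            return any(pit[n] for n in neighbors(*sq))   # breeze iff a neighbor has a pit

        def query(target):
            # P(P_target | known, b) by summing the consistent possible worlds
            dist = {True: 0.0, False: 0.0}
            for bits in product((False, True), repeat=16):
                pit = dict(zip(squares, bits))
                if pit[(1, 1)] or pit[(1, 2)] or pit[(2, 1)]:
                    continue                             # known: no pits in visited squares
                # evidence b: no breeze at (1,1), breeze at (1,2) and (2,1)
                if breeze(pit, (1, 1)) or not breeze(pit, (1, 2)) or not breeze(pit, (2, 1)):
                    continue
                p = 1.0
                for s in squares:
                    p *= 0.2 if pit[s] else 0.8          # pits i.i.d. with probability 0.2
                dist[pit[target]] += p
            alpha = 1 / sum(dist.values())
            return {v: round(alpha * pr, 2) for v, pr in dist.items()}

        print(query((1, 3)))    # {True: 0.31, False: 0.69}
        print(query((2, 2)))    # {True: 0.86, False: 0.14}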

  • Summary

    Probability theory is a formal mechanism to handle uncertainty. The full joint distribution describes the probabilities of all possible worlds. Answers to queries can be obtained by summing out the probabilities of the possible worlds consistent with the observation. However, larger problems will require a better approach. We are going to exploit independence and conditional independence.

    © 2016 Roman Barták, Department of Theoretical Computer Science and Mathematical Logic

    bartak@ktiml.mff.cuni.cz