Decision Making - The University of Edinburgh · 2018-02-13 · Decision Making in Robots and...

Post on 16-Jul-2020


Decision Making in Robots and Autonomous Agents

Causality:

How should a robot reason about cause and effect?

Subramanian Ramamoorthy, School of Informatics

13 February, 2018

What do you Need to Know about your Robot?

13/02/18 2

What does Robot Need to Know?

•  Given access to raw data channels for various (uninterpreted) sensors and motors

•  Devise a procedure for learning that will tell you what you need for various tasks (as yet unspecified)
–  What types of models?
–  What types of learning methods?

What are you Learning from?

An Experiment: How Much can we Learn from Uninterpreted Data?

•  Learn models of robot and environment with no initial knowledge of what sensors and actuators are doing

•  Many learning methods begin this way, e.g., RL, but the goal here is to construct a representation incrementally and continually as well

[D. Pierce, B.J. Kuipers, Map learning with un-interpreted sensors and effectors, Artificial Intelligence 91:169-227, 1997.]

Simple Scenario

•  Robot critter has a set of distance sensors (range), one of which is defective, but it doesn’t know that yet

•  Other sensors: battery power, digital compass

•  It has a track-style motor apparatus: it turns by differentially actuating its wheels

What do you Learn from?

Randomized actions (hold a randomly chosen action for 10 time steps), repeatedly applied

How does environment appear in the data? Can there be a simple empirical learning scheme?

One Step: Go from Raw Channels to Structure of Sensor Array

•  Sensors may come in groupings: ring of distance sensors, array of photoreceptors, video camera, etc.

•  We first want to extract groupings based on two criteria:
–  Sensors that have similar values over time
–  Sensors that have a similar frequency domain behaviour

•  Two simple hypothesised distance metrics:

[Equations for d1 and d2 not preserved; surviving annotation: Distribution (e.g., counts)]
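The formulas for the two metrics did not survive in this transcript. As a rough sketch only, assuming d1 is the mean absolute difference between two sensors' readings over time and d2 is half the summed absolute difference between their normalised value histograms (counts), the pairwise distances could be computed as below. The function names, bin count, and synthetic traces are illustrative assumptions, not the definitions from the slides.

```python
import numpy as np

def d1(traces):
    """Value-based distance: mean |x_i(t) - x_j(t)| over time.
    traces has shape (T, n): T time steps, n sensors. (Assumed form.)"""
    diff = np.abs(traces[:, :, None] - traces[:, None, :])  # shape (T, n, n)
    return diff.mean(axis=0)

def d2(traces, bins=10, value_range=(0.0, 1.0)):
    """Distribution-based distance: half the L1 gap between value histograms.
    (Assumed form; bins and value_range are hypothetical choices.)"""
    T, n = traces.shape
    hists = np.stack([
        np.histogram(traces[:, i], bins=bins, range=value_range)[0] / T
        for i in range(n)
    ])  # shape (n, bins)
    return 0.5 * np.abs(hists[:, None, :] - hists[None, :, :]).sum(axis=2)

# Synthetic trace: two neighbouring range sensors and one battery reading.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
range_a = 0.5 + 0.35 * np.sin(2 * np.pi * t)
range_b = range_a + 0.01 * rng.standard_normal(t.size)
battery = np.full(t.size, 0.95)
traces = np.column_stack([range_a, range_b, battery])

D1, D2 = d1(traces), d2(traces)
```

Under both metrics the two range sensors end up much closer to each other than either is to the battery channel, which is the property the grouping step relies on.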

Example Trace

[Figure panels: d1, d2]

Extending the Group Notion

We can reason transitively about similarity: So, a wandering trace might yield something like this as groups:

Upon Taking the Transitive Closure
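Taking the transitive closure of a "similar to" relation amounts to finding connected components: if sensor 0 is close to 1 and 1 is close to 2, all three land in one group even when 0 and 2 are not directly close. A minimal sketch using union-find follows; the threshold and the toy distance matrix are hypothetical values chosen for illustration.

```python
def group_by_closure(dist, threshold):
    """Group indices whose pairwise distance is below threshold,
    closing the relation transitively via union-find."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy matrix: 0~1 and 1~2 are similar, 0 and 2 are not, 3 is unrelated.
dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.1, 0.9],
    [0.9, 0.1, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.0],
]
groups = group_by_closure(dist, threshold=0.2)
```

Here sensors 0, 1 and 2 form one group and sensor 3 stays on its own.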

Getting at the Structure of Array

•  Task is to find an assignment of positions (in space) to elements that captures the structure of the array as reflected in distance metric d1.

•  Distance between positions in image ≈ distance between elements according to d1.

•  This is a constraint satisfaction problem: n sensor elements yield n(n-1)/2 constraints.

•  Could solve by metric scaling:
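The scaling formula itself is missing from this transcript. One standard way to do metric scaling is classical multidimensional scaling (MDS): double-centre the squared distance matrix and keep the leading eigenvectors. The sketch below assumes d1 behaves like a Euclidean distance; the ring of eight sensors is a hypothetical test case, and this is not necessarily the exact method used in the lecture.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed n items in `dim` dimensions so that pairwise Euclidean
    distances approximate the entries of the distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram matrix of centred points
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:dim]    # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

def pairwise(X):
    """Euclidean distance matrix of the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# Hypothetical ground truth: eight range sensors arranged on a ring.
angles = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
true_positions = np.column_stack([np.cos(angles), np.sin(angles)])
D = pairwise(true_positions)

positions = classical_mds(D, dim=2)
D_rec = pairwise(positions)
```

The recovered positions match the true layout only up to rotation and reflection, so the check that matters is that the reconstructed distances agree with the input distances.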

Structural Model of Distance Array

Various Types of Models

•  Models of motion
–  Own dynamics
–  Object dynamics
–  Other agents

•  Models of environment
–  Space & how I move in space
–  Other navigation considerations

•  Models of self
–  What is the connection between my sensors and actuators?
–  What do the sensorimotor channels even mean?
–  How to ground all of the above at this low level?

Example: Solar-Heated House (Ljung)

•  The sun heats the air in the solar panels
•  The air is pumped into a heat storage (box filled with pebbles)
•  The stored energy can be later transferred to the house
•  For control, one cares about how solar radiation, w(t), and pump velocity, u(t), affect heat storage temperature, y(t).

System Identification in Engineering

In building a model, the designer has control over three parts of the process
1.  Generating the data set

2.  Selecting a (set of) model structure (e.g., autoregressive linear model)

3.  Selecting the criteria (e.g., least squares over output error), used to specify the optimal parameter estimates

A very popular approach involves (recursive) parameter estimation

[Flow diagram labels: Priors; Experiment Design; Data; Choose Model Set; Choose Criterion of Fit; Calculate Model; Validate Model]
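The three design choices can be made concrete with a small batch example. The sketch below generates data from a simulated system, selects a first-order autoregressive linear structure y(t) = a·y(t-1) + b·u(t-1) + c·w(t-1), and uses least squares as the criterion. The coefficients, noise level, and input signals are hypothetical; this is not Ljung's solar-house data set, and it uses batch rather than recursive estimation.

```python
import numpy as np

# 1. Generate the data set (simulated; all numbers are assumptions).
rng = np.random.default_rng(0)
T = 500
u = rng.uniform(0.0, 1.0, T)     # stands in for pump velocity u(t)
w = rng.uniform(0.0, 1.0, T)     # stands in for solar radiation w(t)
a_true, b_true, c_true = 0.9, 0.3, 0.5
y = np.zeros(T)                  # stands in for storage temperature y(t)
for t in range(1, T):
    noise = 0.01 * rng.standard_normal()
    y[t] = a_true * y[t - 1] + b_true * u[t - 1] + c_true * w[t - 1] + noise

# 2. Model structure: one-step-ahead linear regression on past y, u and w.
Phi = np.column_stack([y[:-1], u[:-1], w[:-1]])   # regressors at time t-1
target = y[1:]                                    # output at time t

# 3. Criterion: least squares over the one-step prediction error.
theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
a_hat, b_hat, c_hat = theta
```

With this much data and little noise, the estimates land close to the coefficients used in the simulation; validating the model would then mean checking its predictions on data not used for the fit.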

On the Nature of Scientific Questions

Science seeks to understand and explain physical observations
•  Why doesn’t the wheel turn?
•  What if I make the beam half as thick, will it carry the load?
•  How do I shape the beam so it will carry the load?

What Do Laws Tell Us About Causality?

•  Does acceleration cause the force?
•  Does the force cause the acceleration?
•  Does the force cause the mass?

Different Views on Causation

•  Hume (1711–1776) [Causation as perception]: We remember seeing the flame and feeling a sensation called heat; without further ceremony, we call one cause and the other effect
•  Pearson (1857–1936) [Statistical Machine Learning view]: Forget causation! Correlation is all you should ask for.

•  Pearl (1936-) [Mathematics of causality]: Forget empirical observations! Define causality based on a network of known, physical causal relationships

Two Major Questions about Causality

1.  Learning of causal connections: What empirical evidence legitimizes a cause–effect connection?
–  How do people ever acquire knowledge of causation?
–  e.g., does a rooster cause the sun to rise?
–  succession and correlations are not sufficient
–  e.g., roosters crow before dawn; both ice cream sales and crime rate increase at the same time (in summer months)

2.  Use of causal connection
–  What inferences can be drawn from causal information and how?
–  e.g., what would change if the rooster were to cause the sun to rise; can we make the night shorter by waking him up early?

What is Special about these Questions?

•  These are “What If?” kind of questions

•  Interventional questions such as “What if I act?”
•  Retrospective or explanatory questions such as “What if I had acted differently?”

•  How would we answer such questions using the standard machine learning toolbox?

Discuss

Three Layer Causal Hierarchy

•  We can think in terms of a classification of causal information

•  Based on the type of questions that each class is capable of answering

•  3-level hierarchy in the sense that questions at a level i (i = 1, 2, 3) can only be answered if information from a level j (j greater than or equal to i) is available

3-layer Causal Hierarchy

[Pearl 2017]

3-layer Causal Hierarchy

Association: invokes purely statistical relationships, defined directly by the raw data
•  This is learnt by any “black-box” or purely model-free and data-driven algorithm
•  Famous examples such as that diapers and beer are often bought together

Intervention: ranks higher because it asks about a change in observed variables
•  Example: what happens if we double the price, how will the customer respond?

3-level Causal Hierarchy

Counterfactuals: “What if I had acted differently?”
•  Subsume interventional and associational questions

If we have a model at a higher level, the lower level can be answered easily, e.g., if we had a counterfactual model, then the interventional question can be simply posed as:

What would happen if we double the price? = What would happen had the price been double its current value?

Another Way to Conceptualize Hierarchy

Extended Version of Hierarchy

Judea Pearl’s Model: Major Ideas

Concept        Formalization
Causation      Encoding of behaviour under intervention
Intervention   Surgeries on mechanisms
Mechanisms     Functional relationships by equations and graphs

Pearl’s Model: Key Steps

•  Devise a computational scheme for causality to facilitate prediction of the effects of “actions”
–  Use “Intervention” for “Action”
–  As actions are external entities originating “outside” the theory

•  Mechanism: Autonomous physical laws or mechanisms of interest
–  We can change one without changing the others
–  e.g., logic gates of a circuit, mechanical linkages

Pearl’s Model: Key Steps

•  Intervention
–  Breakdown of a mechanism = surgery

•  Causality
–  Which mechanism is to be surgically modified by a given action

Example to Ponder - 1

•  If the grass is wet, then it rained
•  If we break this bottle, the grass gets wet

•  Conclusion: If we break this bottle, then it rained!

Example to Ponder - 2

•  A suitcase will open iff both locks are open
•  The right lock is open
•  What happens if we open the left lock?

•  Not sure: the right lock might get closed!

Modelling Causality

Causal Model M = (U, V, F)
•  U = Exogenous variables
–  Values are determined by factors outside the model

•  V = Endogenous variables
–  Values are described by structural equations

•  F is a set of structural equations (endogenous), $\{F_X \mid X \in V\}$
–  $F_X$ is a mapping that tells us the value of X given the values of all the other variables in U and V
–  represents a mechanism or law in the world

Example: Modelling Causality

•  Forest fire could be caused by lightning or a lit match by an arsonist

•  Endogenous variables, Boolean
–  F for fire
–  L for lightning
–  ML for match lit

•  Exogenous variables, U
–  Whether wood is dry
–  Whether there is enough oxygen in the air

•  Structural equation for fire: $F_F(U, L, ML)$ such that $F = 1$ if $L = 1$ or $ML = 1$
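The forest-fire model can be written down directly as code, which also makes the idea of an intervention concrete: do(X = x) replaces the structural equation for X by a constant and leaves every other equation alone. This is a minimal sketch; the exogenous switches `u_L` and `u_ML` (whether background conditions produce lightning or a lit match) are hypothetical stand-ins for the slide's exogenous variables, and the `do` argument is an illustrative convention.

```python
def forest_fire(u, do=None):
    """Solve the structural equations for context u.
    `do` maps an endogenous variable name to a forced value,
    overriding that variable's own equation (the 'surgery')."""
    do = do or {}
    v = {}
    v["L"] = do.get("L", u["u_L"])        # lightning (assumed set by context)
    v["ML"] = do.get("ML", u["u_ML"])     # match lit (assumed set by context)
    # F = 1 if L = 1 or ML = 1, unless F itself is intervened on.
    v["F"] = do.get("F", int(v["L"] == 1 or v["ML"] == 1))
    return v

# Context with lightning only: the forest burns.
observed = forest_fire({"u_L": 1, "u_ML": 0})

# Same context, but intervene to prevent the lightning: no fire.
no_lightning = forest_fire({"u_L": 1, "u_ML": 0}, do={"L": 0})

# With both causes present, removing lightning alone does not stop the fire.
both_causes = forest_fire({"u_L": 1, "u_ML": 1}, do={"L": 0})
```

The last case is the kind of overdetermined situation that makes a careful definition of cause necessary.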

Causal Networks

Intervention / Contingency

Counterfactuals

Actual Causes

A Definition of Actual Cause

Measure of Causality: Responsibility

Probabilistic Causal Model

Represented by a pair (M, P(u))

•  P(u) is a probability function defined over the exogenous variables U

•  Each endogenous variable in V is a function of the exogenous variables U
–  so P(u) also gives a distribution on V

•  In turn gives the probability of a counterfactual statement, $\Pr(Y_{X=x} = y)$, or simply $\Pr(Y_x = y)$

Probabilistic Model

Necessity: the probability that event y would not have occurred in the absence of event x (i.e., $y'_{x'}$), given that x and y did in fact occur

$$\Pr(Y_{X=x'} = y' \mid X = x, Y = y) = \Pr(y'_{x'} \mid x, y)$$

Sufficiency: the probability that setting x would produce y in a situation where x and y are in fact absent; it measures the ability of event x to produce event y

$$\Pr(Y_{X=x} = y \mid X = x', Y = y') = \Pr(y_x \mid x', y')$$

Worked Example on Structural Equations: Conditional Probability vs. Action

Observing versus Acting to make X3 = ON
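The figure for this worked example is not preserved here, so the following is a stand-in rather than a reconstruction. It uses a hypothetical three-variable network in which the season influences both whether the sprinkler is on and whether it rains; all names and probabilities are illustrative assumptions. Observing the sprinkler ON is evidence about the season and therefore changes the belief about rain. Acting to switch it ON cuts the link from season to sprinkler, so the belief about the season, and hence about rain, stays at its prior.

```python
# Hypothetical model: season -> sprinkler, season -> rain.
P_SEASON = {"dry": 0.5, "wet": 0.5}
P_ON = {"dry": 0.8, "wet": 0.1}      # P(sprinkler = ON | season)
P_RAIN = {"dry": 0.1, "wet": 0.7}    # P(rain | season)

def p_rain_given_observed_on():
    """P(rain | sprinkler = ON): condition on the observation via Bayes' rule."""
    joint_on = {s: P_SEASON[s] * P_ON[s] for s in P_SEASON}
    p_on = sum(joint_on.values())
    return sum(joint_on[s] / p_on * P_RAIN[s] for s in P_SEASON)

def p_rain_given_do_on():
    """P(rain | do(sprinkler = ON)): the sprinkler's own mechanism is
    replaced, so the season keeps its prior distribution."""
    return sum(P_SEASON[s] * P_RAIN[s] for s in P_SEASON)

observed = p_rain_given_observed_on()
acted = p_rain_given_do_on()
```

With these numbers, seeing the sprinkler on makes rain less likely (about 0.17), while turning it on by hand leaves the probability of rain at 0.4.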

Conditional Probability of a Counterfactual Sentence

If we want to compute the probability of: “{if it were A then B} given evidence e”

we might use the following three-step procedure:
1.  Abduction
–  Update P(u) by evidence to get P(u|e)
2.  Action
–  Modify M by action do(A), where A is the antecedent of the counterfactual, to yield $M_A$
3.  Deduction
–  Use P(u|e) and $M_A$ to compute the probability of the counterfactual consequence B
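The three steps can be traced on the forest-fire model by enumerating the exogenous contexts. This is a sketch under stated assumptions: the prior P(u) over two independent binary switches (0.1 for lightning, 0.2 for a lit match) is hypothetical, as are the function names. The query asks: given that a match was lit and there was a fire, what is the probability that there would have been no fire had the match not been lit?

```python
from itertools import product

# Hypothetical prior P(u) over contexts u = (u_L, u_ML), independent switches.
P_U = {(ul, uml): (0.1 if ul else 0.9) * (0.2 if uml else 0.8)
       for ul, uml in product([0, 1], repeat=2)}

def solve(u, do=None):
    """Structural equations of M; passing `do` yields the modified model M_A."""
    do = do or {}
    v = {}
    v["L"] = do.get("L", u[0])
    v["ML"] = do.get("ML", u[1])
    v["F"] = do.get("F", int(v["L"] == 1 or v["ML"] == 1))
    return v

def matches(values, assignment):
    return all(values[name] == val for name, val in assignment.items())

def abduction(prior, evidence):
    """Step 1: update P(u) to P(u | e) by keeping contexts consistent with e."""
    kept = {u: p for u, p in prior.items() if matches(solve(u), evidence)}
    total = sum(kept.values())
    return {u: p / total for u, p in kept.items()}

def counterfactual(prior, evidence, action, consequence):
    posterior = abduction(prior, evidence)                 # step 1
    return sum(p for u, p in posterior.items()             # step 3: deduction
               if matches(solve(u, do=action), consequence))  # step 2: do(A)

p_necessity = counterfactual(P_U, {"ML": 1, "F": 1}, {"ML": 0}, {"F": 0})
p_partial = counterfactual(P_U, {"F": 1}, {"ML": 0}, {"F": 0})
```

The first query is an instance of the necessity probability defined earlier and evaluates to 0.9 under these assumed priors; the second conditions on the fire alone and gives 9/14.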

Pearl’s View of a Structural Equations based “Inference Engine”

Answer to query

Answer + estimated confidence

Fit of data to model assumptions

[Pearl 2017]

Recap: Influence Diagrams [Howard & Matheson ‘84]

•  Influence Diagrams (ID) extend Bayesian Networks for decision making.

•  Rectangles are decisions; ovals are chance variables; diamonds are utility functions.

•  Graph topology describes decision problem.

•  Each node specifies a probability distribution (CPD) given each value of parents.
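Solving a single-decision influence diagram means picking the decision value with the highest expected utility under the chance variables. A minimal sketch with one decision node (drill or not), one chance node (oil present or not) and one utility node follows; the probability and payoffs are hypothetical numbers chosen for illustration.

```python
# Chance node (oval): prior over whether oil is present. Assumed value.
P_OIL = {True: 0.4, False: 0.6}

# Utility node (diamond): payoff for each (drill, oil) combination. Assumed values.
UTILITY = {
    (True, True): 100.0,    # drill and find oil
    (True, False): -40.0,   # drill a dry hole
    (False, True): 0.0,     # do not drill
    (False, False): 0.0,
}

def expected_utility(drill):
    """Average the utility of a decision over the chance variable."""
    return sum(P_OIL[oil] * UTILITY[(drill, oil)] for oil in P_OIL)

# Decision node (rectangle): choose the action with the best expected utility.
best_decision = max([True, False], key=expected_utility)
```

With these numbers drilling has expected utility 16 against 0 for not drilling, so the optimal decision is to drill.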

Multi-agent Influence Diagrams [Milch and Koller ‘01]

•  Extend Influence Diagrams to the multi-agent case.

•  Rectangles and diamonds represent decisions and utilities associated with agents; ovals represent chance variables.

•  A strategy for a decision is a mapping from the informational parents of the decision to a value in its domain.

•  A strategy profile includes strategies for all decisions.


Reasoning Patterns through IDs

•  Informally, a reasoning pattern is a form of argument that leads to and explains a decision
–  e.g.
•  modus ponens in logic
•  explaining away in Bayes nets

•  What reasoning patterns can agents use in interactive decision making contexts?

[A. Pfeffer & Y. Gal, On the reasoning patterns of agents in games, In Proc. AAAI 2007]

Characterization of Reasoning Patterns

•  Four basic reasoning patterns, each characterized by paths in a multiple-agent version of influence diagrams

•  Characterization based on graphical criteria only
–  could further refine characterization based on numerical parameters

Reasoning Pattern #1: Direct Effect

•  An agent takes a decision because of its direct effect on its utility
–  without being mediated by other agents’ actions

[Diagram nodes: Drill, Profit]

Reasoning Pattern #2: Manipulation

•  Child knows about parent’s action
•  Parent does not care about reading, but wants child to brush teeth
•  Child dislikes brushing teeth but likes being read to
⇒ Parent can manipulate child

[Diagram nodes: Offer to Read (Parent), Brush Teeth (Child)]
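The manipulation pattern can be checked by backward induction on a two-decision game: the child's strategy maps what it observes (the parent's offer) to an action, and the parent chooses its own action anticipating that strategy. The sketch below is illustrative only; the payoff numbers, and the assumption that reading happens only when the offer is made and the child brushes, are not from the slides.

```python
OFFERS = [True, False]   # parent's decision: offer to read or not
BRUSH = [True, False]    # child's decision: brush teeth or not

def child_utility(offer, brush):
    read_to = offer and brush             # assumed: story only if both hold
    return (2 if read_to else 0) - (1 if brush else 0)

def parent_utility(offer, brush):
    return 1 if brush else 0              # parent does not care about reading

# Child's strategy: a mapping from its informational parent (the offer)
# to its best response.
child_strategy = {
    offer: max(BRUSH, key=lambda b: child_utility(offer, b))
    for offer in OFFERS
}

# Parent picks the offer that is best given how the child will respond.
parent_choice = max(OFFERS, key=lambda o: parent_utility(o, child_strategy[o]))
```

The child brushes only when the offer is made, so the parent makes the offer even though reading has no direct value to the parent; the decision is explained entirely by its effect on the child's action.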

Reasoning Pattern #3: Signaling

•  A communicates something that she knows to B, thus influencing B’s behavior

[Diagram nodes: Recommendation (Alice), Choice (Bob), Better Restaurant]

Reasoning Pattern #4: Revealing/Denying

•  Driller cares about oil
•  Tester receives fee if driller drills
•  Tester causes driller to find out (or not) about information tester herself does not know

[Diagram nodes: Seismic Structure, Oil, Test Result, Drill, Test, Tester’s Profit, Driller’s Profit]

Example: Two Stage Principal-Agent Game

[Diagram nodes: Type; Rep0, Rep1; P1, P2; A1, A2; U(A1), U(A2), U(P2), U(P1)]

Type: describes parameters specific to an agent
Rep: quantification of “Reputation”

Direct Effect For All Four Decisions

Manipulation (P1 → A1)

Manipulation (P2 → A2)

Signaling (A1 signals Type to P2)


Revealing/Denying (P1 reveals Type to P2)


Acknowledgement

The source of some of these slides is a VLDB 2014 tutorial entitled “Causality and Explanations in Databases”, by Meliou, Roy, Suciu.