Strategies Without Frontiers

Meredith L. Patterson
BSidesLV
August 5, 2014

This is mostly a talk about game theory, founded by John von Neumann and Oskar Morgenstern in 1944.

Game theory is part of econ, which is way more than just macro/micro and where the money goes.

Weird that the study of decision-making is called the dismal science, though to be fair, the more you look at the problem of allocating finite resources, the more hard truths you run up against about physics and human nature.

Game theory provides a framework for refining our decision-making models as more information about the data's structure comes in.

Motivation

I hate boring problems
I especially hate solving tiny variations on the same boring problem over and over again
The internet is full of the same boring problems over and over again
Both in the cloud and in the circus
Not my circus, not my monkeys

The circus = social media.

I'm largely giving this talk because I'm tired of assholes being better at coordination than people who aren't assholes.

Keith Alexander is consulting for $600K/month on the grounds of some kind of behaviour analysis secret sauce. So, other people are thinking about these problems too.

Also appearing in this talk

Information theory
Probability theory
Formal language theory (of course)
Control theory
First-order logic
Haskell

Keep the Shannon/Weaver model of communication in your head: two endpoints communicating over a possibly noisy channel of finite bandwidth, who have to serialize their messages to the channel and parse incoming messages off the channel. Both serialization and parsing can produce errors.

This isn't really a langsec talk, but we'll still be talking about boundaries of competence. In a signaling game, how much confidence you can have in the signal you received being the one that was transmitted depends on how reliably you can receive signals in the language of the channel and how reliably the sender serializes them.

We won't be getting all that deeply into feedback loops, but if you know how they work, keep them in mind.

I kinda lied about the only math you need being the ability to compare two numbers; it'll help later in the talk if you can read first-order logic notation, but it's not really necessary.

It is pitch black. You are likely to be eaten by a grue.

When an unknown agent acts, how do you react?
1. Observation of side effects
2. Signals the agent sends
3. Past interactions with others
4. Formal language theory (if you're a computer)
5. Systematic knowledge about the structure of interactions and the incentives involved in them

1: I.e., effects on the environment.

2: So important, they named a class of games after them.

3: The quality of your data is really important here.

4: Langsec won't be making much of an appearance in this talk, but when all the agents are machines, it's relevant. Who do you think is going to be driving all those automated exploit generators DARPA is soliciting? People? At first, maybe, but not for long. Drones are expensive and hard to build. More servers are not. And in any case, being able to tell where FLT matters and where it doesn't is an important distinction. Decidable problems are priceless; for everything else there's heuristics, and when those inevitably fail, there's Mastercard.

5: Game theory is the framework we'll be building up this knowledge around, but we'll be pulling from all the fields I mentioned earlier.

Outline

Everything You Actually Need to Know About Classical Game Theory
  in math and psychology
Changing the Game
  Extensive form and signaling games
  Multiplayer and long-running games
Reasoning Under Uncertainty, Over Real Data

Everything you actually need to know about classical game theory

What's in a game?

Players
Information available at each decision point
Possible actions at each decision point
Payoffs for each outcome

Strategies (pure or mixed)
  Or behaviour, in iterated or turn-taking games
Equilibria
  Different kinds of games have different kinds of equilibria

The four elements at the top are all you need to define a game.

Strategies and equilibria are derived from the structure of the game you're playing.

A normal form game

             Cooperate    Defect
Cooperate    a, b         c, d
Defect       e, f         g, h

Strategies

Pure strategy: fully specified set of moves for every situation
Mixed strategy: probability assigned to each possible move, random path through game tree
Behaviour strategies: probabilities assigned at information sets

Behavior strategies and mixed strategies are functionally equivalent as long as the player has perfect recall (Kuhn's theorem). So behavior strategies are a bit more like how people act in real life.

Prisoner's dilemma

             Cooperate    Defect
Cooperate    -1, -1       -3, 0
Defect       0, -3        -2, -2

d, e > a, b > g, h > c, f

First described in 1950 by Merrill Flood and Melvin Dresher

Four payoffs: Temptation, for screwing the other guy; Reward, for mutual cooperation; Punishment, for mutual defection; and Sucker, for being defected on.

Because Reward > Punishment, mutual cooperation is better than mutual defection.

Because Temptation > Reward and Punishment > Sucker, defection is the dominant strategy for both agents.

It's a dilemma because mutual cooperation is better than mutual defection, but at the *individual* level, defection is superior to cooperation.
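The dominance argument can be checked mechanically. This is a minimal sketch using the slide's payoffs; the names `Action`, `rowPayoff` and `strictlyDominates` are made up for illustration and come from nowhere in particular.

```haskell
-- Actions and the slide's Prisoner's Dilemma payoffs, seen from one player's side.
data Action = Cooperate | Defect deriving (Eq, Show, Enum, Bounded)

-- rowPayoff mine theirs: payoff to a player choosing `mine` against `theirs`.
rowPayoff :: Action -> Action -> Int
rowPayoff Cooperate Cooperate = -1  -- Reward
rowPayoff Cooperate Defect    = -3  -- Sucker
rowPayoff Defect    Cooperate =  0  -- Temptation
rowPayoff Defect    Defect    = -2  -- Punishment

-- x strictly dominates y if x pays more than y against every opposing action.
strictlyDominates :: (Action -> Action -> Int) -> Action -> Action -> Bool
strictlyDominates pay x y =
  all (\opp -> pay x opp > pay y opp) [minBound .. maxBound]
```

Because the game is symmetric, the same check covers the column player. `strictlyDominates rowPayoff Defect Cooperate` holds, while `rowPayoff Cooperate Cooperate > rowPayoff Defect Defect` also holds, which is the dilemma in two lines.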

Matching pennies

          Heads     Tails
Heads     1, -1     -1, 1
Tails     -1, 1     1, -1

a = d = f = g > b = c = e = h

Basically rock-paper-scissors but with only two options.

There is no pure-strategy equilibrium here: one player wants to match and the other wants to mismatch, so whichever pure strategy you commit to, your opponent has a reply that beats it.

Deadlock

             Cooperate    Defect
Cooperate    1, 1         0, 3
Defect       3, 0         2, 2

e > g > a > c and d > h > b > f

Here, the mutually beneficial outcome is also the dominant outcome: there is no conflict between self-interest and mutual benefit. Still, it's an interesting basis for a signaling game, since there's still some incentive to screw the other guy.

Stag hunt

         Stag     Hare
Stag     2, 2     0, 1
Hare     1, 0     1, 1

a = b > d = e = g = h > c = f

The classic social cooperation game, originally described by Jean-Jacques Rousseau.

Two pure-strategy equilibria: both cooperate or both defect. Cooperating is payoff dominant, defecting is risk dominant.
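A brute-force way to find those two equilibria: a cell is a pure-strategy Nash equilibrium if neither player gains by switching actions on their own. A minimal sketch, using the slide's Stag Hunt payoffs; `Game`, `Hunt` and `pureNash` are illustrative names.

```haskell
-- A two-player game: (row action, column action) -> (row payoff, column payoff).
type Game a = a -> a -> (Int, Int)

data Hunt = Stag | Hare deriving (Eq, Show, Enum, Bounded)

stagHunt :: Game Hunt
stagHunt Stag Stag = (2, 2)
stagHunt Stag Hare = (0, 1)
stagHunt Hare Stag = (1, 0)
stagHunt Hare Hare = (1, 1)

-- Keep every cell where no unilateral deviation improves the deviator's payoff.
pureNash :: (Enum a, Bounded a) => Game a -> [(a, a)]
pureNash g =
  [ (r, c)
  | r <- actions, c <- actions
  , all (\r' -> fst (g r' c) <= fst (g r c)) actions  -- row can't do better
  , all (\c' -> snd (g r c') <= snd (g r c)) actions  -- column can't do better
  ]
  where actions = [minBound .. maxBound]
```

`pureNash stagHunt` returns the both-Stag and both-Hare cells. The same function works on any of the 2x2 games above if you retype their payoff tables.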

Chicken

             Swerve    Straight
Swerve       0, 0      -1, 1
Straight     1, -1     -10, -10

e > a > c > g and d > b > f > h

Chicken is more of an anti-coordination game: choosing the same action creates negative externalities, so you want to not coordinate.

Hawk/dove

Actions for each player: Share, Fight

e > a > c > g and d > b > f > h

Proposed by John Maynard Smith and George Price in 1973 in Nature to describe conflict among animals over resources.

V is the value of the contested resource, C is the cost of getting into a fight.

Often considered as a signaling game: there's a round of threatening each other before the players choose their moves.

Battle of the sexes

             Opera    Football
Opera        3, 2     0, 0
Football     0, 0     2, 3

(a > g and h > b) > c = d = e = f

Also known as conflicting interest coordination

One partner wants to go to the opera, the other wants to go to the ball game, but they'd both rather be together than go to different events. They forgot which one to go to, each knows that the other forgot, and they can't communicate. Where should each go?

Two pure strategy equilibria: both opera or both football. But this is unfair, since one person consistently gets a higher payoff than the other.

One mixed strategy: go to your preferred event with 60% probability. But this is inefficient: the players miscoordinate 52% of the time, so each player's expected utility is 1.2, which is worse than the 2 they would get by always going to their non-preferred event.
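Where the 60%, 52% and 1.2 come from, as exact arithmetic. A sketch assuming the payoff table above (3 for your preferred event together, 2 for the other event together, 0 apart); the names are illustrative.

```haskell
import Data.Ratio ((%))

-- Probability q that a player goes to their own preferred event.
-- It makes the *other* player indifferent: 3 * (1 - q) == 2 * q, so q = 3/5.
preferred :: Rational
preferred = 3 % 5

-- Both at the opera, or both at the football game.
coordinate :: Rational
coordinate = preferred * (1 - preferred) + (1 - preferred) * preferred

miscoordinate :: Rational
miscoordinate = 1 - coordinate

-- One player's expected payoff: 3 at their preferred event together,
-- 2 at the other event together, 0 if they end up apart.
expectedPayoff :: Rational
expectedPayoff =
  3 * (preferred * (1 - preferred)) + 2 * ((1 - preferred) * preferred)
```

`miscoordinate` comes out to 13/25 (52%) and `expectedPayoff` to 6/5 (1.2), matching the numbers in the notes.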

What have we seen so far?

Games can be zero-sum or non-zero-sum
Games can be about conflict or cooperation
Actions are not inherently morally valenced
Payoffs determine type of game, strategy

Types of games overlap in various ways.

Zero-sum: the gains/losses of all players balance out to zero. Matching Pennies is zero-sum; Prisoner's Dilemma and Stag Hunt are non-zero-sum.

All zero-sum games are competitive; non-zero-sum games can be competitive or noncompetitive.
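The zero-sum test is just "every cell adds up to zero". A small sketch over the payoff tables from the earlier slides; `Cells` and `isZeroSum` are illustrative names.

```haskell
-- A 2x2 game as its four cells, each a (row payoff, column payoff) pair.
type Cells = [(Int, Int)]

matchingPennies, prisonersDilemma, stagHunt :: Cells
matchingPennies  = [(1, -1), (-1, 1), (-1, 1), (1, -1)]
prisonersDilemma = [(-1, -1), (-3, 0), (0, -3), (-2, -2)]
stagHunt         = [(2, 2), (0, 1), (1, 0), (1, 1)]

-- Zero-sum: in every outcome, one player's gain is exactly the other's loss.
isZeroSum :: Cells -> Bool
isZeroSum = all (\(r, c) -> r + c == 0)
```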

An action is just an action. There's nothing inherently good or bad about choosing Heads or Tails in Matching Pennies; the morality of snitching in PD depends on your ethical framework around snitching; the morality of going off to hunt rabbits in Stag Hunt depends on whether you agreed to hunt a stag beforehand and how seriously you take keeping your word.

As we go on, we'll look at more complicated games: ones that go on longer, have more players, where players have uncertain information about each other, and even ones where the game being played changes form as the game goes on.

Equilibrium

Cournot equilibrium: each actor's output maximizes its profit given the outputs of other actors
Nash equilibrium: each actor is making the best decision they can, given what they know about each other's decisions
Subgame perfect equilibrium: eliminates non-credible threats
Trembling hand equilibrium: considers the possibility that a player might make an unintended move

Cournot equilibrium: Antoine Augustin Cournot, 1838. He was talking about businesses, e.g. factories, but it generalises.

Nash equilibrium: nobody can do better by unilaterally changing their strategy. In the Prisoner's Dilemma, this is clear: any player who wants to cooperate knows that the other guy can defect on him and screw him, so he's better off defecting.

A subgame is a subset of the tree of a game. In a subgame perfect equilibrium, the strategies form a Nash equilibrium in every subgame. To find one, start at the outcomes and work backward, removing branches that involve a player making a non-optimal move.
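The work-backward procedure is short enough to sketch. Everything here is an assumption for illustration: the `Tree` type, the `solve` function, and the entry-deterrence example (an entrant decides whether to enter a market, and the incumbent threatens to fight). It also assumes every `Node` has at least one branch and that there are no payoff ties.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- A finite two-player game tree with perfect information.
data Player = P1 | P2 deriving (Eq, Show)

data Tree
  = Leaf (Int, Int)                 -- payoffs (to P1, to P2)
  | Node Player [(String, Tree)]    -- who moves, and the labelled branches

payoffFor :: Player -> (Int, Int) -> Int
payoffFor P1 = fst
payoffFor P2 = snd

-- Solve from the leaves upward: each mover keeps only their best branch.
-- Returns the moves along the surviving path and the resulting payoffs.
solve :: Tree -> ([String], (Int, Int))
solve (Leaf ps) = ([], ps)
solve (Node who branches) =
  maximumBy (comparing (payoffFor who . snd))
    [ (label : path, ps) | (label, sub) <- branches, let (path, ps) = solve sub ]

-- Hypothetical example: P2 threatens to fight if P1 enters.
entryGame :: Tree
entryGame =
  Node P1
    [ ("stay out", Leaf (0, 2))
    , ("enter", Node P2 [("fight", Leaf (-1, -1)), ("accommodate", Leaf (1, 1))])
    ]
```

In `entryGame`, fighting hurts P2 more than accommodating does, so the threat is not credible: `solve entryGame` prunes it and P1 enters.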

Trembling hand: i.e., you might miss and hit the big red button instead.

Transactional analysis: games people play

Traditional game theory assumes that all agents are rational. But in the 1960s, Eric Berne looked at irrational games: the sorts of social games that people entice each other into for attention, sympathy, and other kinds of psychological payoffs, while hiding their true motives.

Berne drops the assumption that players are driven by the most rational angels of their nature, and looks at the payoffs of ulterior-motive social games as ways for players to satisfy unmet emotional needs. So in effect we're now considering players to have two sets of preferences that impact their decision-making: one that the rational System 2 uses when making considered decisions, one that the prerational System 1 uses when making quick heuristic decisions.

Mind games

"As far as the theory of games is concerned, the principle which emerges here is that any social intercourse whatsoever has a biological advantage over no intercourse at all."

Humans are social animals. We all have biological drives to interact with other members of our species to some extent or another, and when that drive is demanding to be satisfied, an argument can serve the same purpose as a productive discussion or even a hug, if what a person is fundamentally looking for is external recognition that they exist.

Payoff comes in the form of neurotransmitter activity. Berne didn't go into that, and the imaging equipment we need to investigate this directly doesn't exist yet, but we can black-box it (Skinner-box it?) with behaviorism: each player experiences some consequences from each interaction, as reinforcement or as punishment.

Positive reinforcement: a rewarding stimulus (a chocolate, a kiss, &c)
Negative reinforcement: removal of an aversive stimulus (e.g. when someone stops yelling at you)
Positive punishment: an aversive stimulus
Negative punishment: removal of a rewarding stimulus

Berne identified stimulus hunger, recognition hunger, and structure hunger. Status hunger is probably a combination of the latter two.

Types of interactions

Procedures
Operations
Rituals
Pastimes
(Predatory) Games

Procedure: a series of complementary transactions toward some physical end.

Operation: a set of transactions undertaken for a specific, stated purpose. If you ask explicitly for something, like reassurance or support, and you get it, that's an operation.

Ritual: a stereotyped series of simple complementary transactions programmed by external social forces.

Pastime: an iterated ritual, with state; can turn into status gaming (establishment of a pecking order).

People spend a *lot* of time on pastimes; that's why they're called that. Facebook is largely a pastime for most people. So is Twitter. When different clusters' pastimes collide, you get fireworks, because pastimes have a ritual quality (jargon, signaling certain beliefs, &c) and people don't know what pre-existing state they're walking into.

Game: an ongoing series of complementary ulterior transactions progressing to a well-defined predictable outcome. IOW, the initiator of the game has a goal in mind and isn't being upfront about it. If you ask for reassurance and then turn that against the person, that's a game.

Berne's games: structure

Hands or roles = players
Extensive form; players move in response to each other
Advantages
  Existential advantage: confirmation of existing beliefs
  Internal psychological advantage: direct emotional payoff
  External psychological advantage: avoiding a feared situation
  Internal social advantage: structure/position with respect to other players
  External social advantage: as above, wrt non-players

Berne's work is pretty heavily based in Freud; he's got this parent/child/adult triad of ego states, and posits that people fall into authoritarian parent modes or contrarian child modes when they play power games with each other. It's kind of a just-so story, so we're not really going to get into it. But we will look at the roles that the context of various mind games establishes for the players.

Since games are a series of complementary ulterior transactions, that means there's turn-taking. Each move is considered to be a stroke, i.e., something that affects the other player in some way.

Advantages ~ payoffs.

Existential advantage is that sense that events in the world are confirming your beliefs about how the world works, even if you manipulated the events to that end.

Emotional payoff here is analogous to positive reinforcement; external psychological advantage is analogous to negative reinforcement. If you win the game, you're raising the likelihood that you'll behave that way again, because you've reinforced the evidence that playing games works.

Internal and external social advantage are about status and limiting other players' moves. If you signal as oppressed, people who prioritize oppression will limit what they do on your behalf.

Berne's games: examples

Kick Me
  Goal: Sympathy
  Find someone to beat on you, then whine about it
  "My misfortunes are better than yours"
Ain't It Awful
  Can be a pastime, but also manifests as a game
  Player displays distress; payoff is sympathy and help
Why Don't You / Yes, But
  Player claims to want advice. Player doesn't really want it.
  Goal: Reassurance

Ain't It Awful, taken to the pathological extreme, manifests as things like Munchausen syndrome or Munchausen by proxy.

In Why Don't You / Yes, But, the initiator really wants reassurance that their problem is not their fault, but they get it manipulatively by challenging people to present solutions they can't find fault with. Obviously they can nitpick anything to death.

Courtroom: pick a victim/scapegoat and pick them apart, most effectively in front of a jury of their peers.

The badger game

Now I've Got You, You Son Of A Bitch
  Goal: Justification (or just money)
  Three-handed version is the badger game
Roles
  Victim
  Aggressor
  Confederate
Moves
  Provocation → Accusation
  Defence → Accusation
  Defence → Punishment

Introduce the idea of changing the game here: the mark thinks it's one game (the one where if he wins he gets laid at the end), but what he doesn't know is that he's playing a different game (the one where if he wins he doesn't get beaten up but does lose his wallet).

Can be played with just a victim and an aggressor, as long as the victim does something that the aggressor can construe as the victim screwing up in some way.

Confederate lures the victim into provoking the aggressor.

Trolling

Schlemiel, in Berne's glossary
Moves:
  Provocation → resentment (repeat)
  If B responds with anger, A appears justified in more anger
  If B keeps their cool, A still keeps pushing

Often about getting the target to embarrass themselves in some way, typically by overreacting and saying something they'll regret later. (I'm doubtful as to whether the target ever does actually regret it later, but we'll set that aside for now.)

Berne talks about there being an apology->forgiveness phase of the game, though trolls really aren't in it for the forgiveness. So this might be better considered a modification.

Note that a troll's actions revolve around sending signals to some receiver in an attempt to provoke an overreaction. Engaging is therefore a feedback loop providing the troll with more material to feed into its signal generation function. Proceed with caution.

And on that note, let's take a closer look at the class of games that we can use to model interactions involving two-way communication: signaling games.

Other monkey gameboards

Social media
  Organic responses against predatory games
    Predator Alert Tool
    /r/TumblrInAction "known trolls" wiki
    Those just happen to be ones I know about
  A truly generic reputation system is probably a pipe dream
Wikipedia
eBay
But for these, we have to extend the basic mathematical model.

Dissecting a signaling game

THE SETUP

Get it out of your system now, because you're going to hear "balls" more often than any other noun in the clips that follow. I counted.

The type

[Game tree figure: the root node, player 1, with two branches, Split and Steal]

This is the beginning of an extensive form game tree for this game.

The unfilled dot in the center is the root. It indicates who makes the first move, in this case player 1.

Traditionally the first move is made by Nature and is taken to be the type of the player: in a job interview, whether the candidate being interviewed is competent or incompetent; when you buy someone a drink, whether they're interested in you or not interested in you; when you're deciding whether to tell someone a secret, whether they're trustworthy or untrustworthy.

But since player 1 has already decided whether he's going to split or steal, he's making the first move.

BOTH SPLIT

Both split

[Game tree figure: the branches where both players choose Split, with payoffs 6800, 6800]

One splits, one steals

[Game tree figure: adds the branches where one player chooses Split and the other Steal, with payoffs 0, 13600 and 13600, 0]

Both steal

[Game tree figure: adds the branches where both players choose Steal, with payoffs 0, 0]

Normal form

Also known as the Friend-or-Foe game.

          Split    Steal
Split     1, 1     0, 2
Steal     2, 0     0, 0

d = e > a = b > c = f = g = h

Similar to Prisoner's Dilemma, except that if you decide to screw each other, you both get screwed just as badly as you would if you cooperated but the other guy defected. Being a sucker isn't any worse for you (materially, at least) than betting you can screw the other guy and being wrong.

observation

First move: Nick's choice

[Game tree figure: the full Split/Steal tree, with nodes labelled "I'm likely to split" and "I'm likely to steal"]

signaling

Poll the audience after this segment is over. What do they think Ibrahim will pick? What do they think Nick will pick?

Radiolab interviewed both these guys after the show. In the studio, the argument went on for 45 minutes and the audience was booing Nick over and over again. He stuck to his guns the whole time, so in uncompressed time, his signal was fairly unambiguous.

Second move: Nick's signal

[Game tree figure]

We don't know whether Nick has actually chosen Split or Steal at this point. He's signaled unambiguously that he plans to steal, which means that if Ibrahim decides his signal is credible, Ibrahim can only operate on the lower right quadrant of the graph.

At this point, Nick's signal has changed the structure of the game they're playing: it's no longer Friend-or-Foe, it's Ultimatum. So the risk Nick is taking now is whether Ibrahim will decide that the ultimatum is so insulting that he should punish Nick by forcing them both to go home with nothing, or whether the promise of 6800 after the show is a credible enough incentive that he should cooperate.

Takeaway: extensive form helps you see how a game's structure changes as branches of the decision tree are pruned away.

The big reveal

The complete path

[Game tree figure]

Games in the transparent society

Iterated games

Strategies now depend on payoff matrix and history
Axelrod, 1981: how well do these strategies perform against each other over time?
Ecological tournaments: players abandon bad strategies
Rapoport: if the only information you have is how player X interacted with you last time, the best you can do is Tit-for-Tat
TFT cannot score higher than its opponent
  Axelrod: "Don't be envious"
Against TFT, no one can do better than cooperate
  Axelrod: "Don't be too clever"

Axelrod's initial tournaments just played strategies against each other for 200 rounds and totaled up points at the end. In ecological (or evolutionary) tournaments, each strategy's success in the previous round determines how prevalent it is in the current round, and cooperative strategies outcompeted non-cooperative ones.

It would be really great if players in the real world abandoned bad strategies as soon as they recognised the strategies weren't working, but in practice people are actually pretty bad at recognising this. People are unusually invested in the strategies they choose. Confirmation bias, choice-supportive bias, &c.

Complex inferences just didn't work very well: the inferences were usually wrong.
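A 200-round match is easy to sketch. This is not Axelrod's tournament code, just a minimal stand-in: it reuses the Prisoner's Dilemma payoffs from the earlier slide rather than Axelrod's own point values, and `Move`, `Strategy` and `match` are illustrative names. It assumes a non-negative round count.

```haskell
data Move = C | D deriving (Eq, Show)

-- A strategy sees (my past moves, their past moves), most recent first.
type Strategy = [Move] -> [Move] -> Move

titForTat, allDefect, allCooperate :: Strategy
titForTat _ []          = C      -- open by cooperating
titForTat _ (their : _) = their  -- then copy their last move
allDefect _ _    = D
allCooperate _ _ = C

-- Payoff to the first player, using the slide's Prisoner's Dilemma numbers.
payoff :: Move -> Move -> Int
payoff C C = -1
payoff C D = -3
payoff D C = 0
payoff D D = -2

-- Play n rounds and return both players' totals.
match :: Int -> Strategy -> Strategy -> (Int, Int)
match n sa sb = go n [] [] (0, 0)
  where
    go 0 _ _ totals = totals
    go k as bs (ta, tb) =
      let a = sa as bs
          b = sb bs as
      in go (k - 1) (a : as) (b : bs) (ta + payoff a b, tb + payoff b a)
```

`match 200 titForTat allDefect` shows "TFT cannot score higher than its opponent": TFT loses the first round and then ties every round after. Against itself or against `allCooperate`, TFT just cooperates for all 200 rounds.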

Properties

Nice: S is a nice strategy iff it will not defect on someone who has not defected on it
Retaliatory: S is a retaliatory strategy iff it will defect on someone who defects on it
Forgiving: S is a forgiving strategy iff it will stop defecting on someone who stops defecting on it

In Axelrod's IPD, success (i.e., doing the best you can possibly do) requires a strategy that satisfies all these properties. Such strategies also outcompete strategies that don't satisfy these properties.

But can we do better than an eye for an eye and a tooth for a tooth? Certainly in the real world there are plenty of people whose modus operandi is moving from victim to victim, opportunistically defecting whenever they think they can get away with it; and remember Berne's games. Are there strategies that can incorporate other information to expose social predators?

Societal iterated game theory

Ord/Blair, 2002: what happens when strategies can take into account all past interactions?
We can express strategies in convenient first-order logic, as it turns out
Tit-for-Tat: $D(c, r, p)$
Tit-for-Two-Tats: $D(c, r, p) \land D(c, r, b(p))$
Grim: $\exists t\, D(c, r, t)$
Bully: $\lnot \exists t\, D(c, r, t)$
Spiteful-Bully: $\lnot \exists t\, D(c, r, t) \lor \exists s\, (D(c, r, s) \land D(c, r, b(s)) \land D(c, r, b(b(s))))$
Vigilante: $\exists j\, D(c, j, p)$
Police: $D(c, r, p) \lor \exists j\, (D(c, j, p) \land \lnot \exists k\, D(j, k, b(p)))$

c is the column player, r is the row player (i.e., you); p is the last round, b() is a predecessor function. D(x, y, t) means x defected on y in round t.

TFT: Defect on them if they defected on me last round.

TFTT: Defect on them if they defected on me last round and the round before.

Grim: Defect on them if they ever defected on me in the past.

Bully: Defect on them if they've *never* defected on me in the past. Spiteful-Bully is similar, but also defects if it's been defected on three rounds in a row.

Vigilante: Defect on them if they defected on anyone else last round.

Police: Defect on them if they defected on me last round, or if last round they defected on someone who had just cooperated with everyone.

Vigilante and Police are peacekeeping strategies: they ignore who someone defected on, and only care that they did it.
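These formulas translate almost directly into predicates over a shared history. This is a sketch, not Ord and Blair's implementation: the `History` representation (a list of rounds, most recent first, each a list of who defected on whom) is an assumption, as are all the names. Round indices are assumed non-negative.

```haskell
type Agent = String
-- One round records who defected on whom, as (defector, victim) pairs.
type Round = [(Agent, Agent)]
-- Most recent round first, so index 0 plays the role of p and b(t) is t + 1.
type History = [Round]

-- d h x y t: did x defect on y in round t? Rounds before the start count as "no".
d :: History -> Agent -> Agent -> Int -> Bool
d h x y t = case drop t h of
  (rnd : _) -> (x, y) `elem` rnd
  []        -> False

-- Everyone who appears in any recorded defection (enough for the quantifiers below).
agents :: History -> [Agent]
agents h = concatMap (\(x, y) -> [x, y]) (concat h)

-- Each strategy answers: given the history, should r defect on c?
type Strategy = History -> Agent -> Agent -> Bool   -- history, c, r

titForTat, titForTwoTats, grim, bully, vigilante, police :: Strategy
titForTat h c r     = d h c r 0
titForTwoTats h c r = d h c r 0 && d h c r 1
grim h c r          = any (d h c r) [0 .. length h - 1]
bully h c r         = not (grim h c r)
vigilante h c _     = any (\j -> d h c j 0) (agents h)
police h c r        =
  d h c r 0
    || any (\j -> d h c j 0 && not (any (\k -> d h j k 1) (agents h))) (agents h)
```

The difference between the two peacekeepers shows up when someone retaliates. With the history `[[("bob", "mallory")], [("mallory", "bob")]]`, Bob's defection answers Mallory's earlier one, so `vigilante` defects on Bob while `police` leaves him alone.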

Evolution is a harsh mistress

[Figure: plot with series Tit-for-Tat, All-Cooperate, Spiteful-Bully]

Peacekeeping

[Figure: plot with series Police, All-Cooperate, Spiteful-Bully]

Niceness and loyalty

In a society, niceness is more nuanced
  Individually nice: will not defect on someone who has not defected on it
  Meta-individually nice: will not defect on individually nice
  Communally nice: will not defect on someone who has not defected at all
  Meta-communally nice: will not defect on communally nice
Same applies to forgiveness and retaliation
Loyalty: will not defect on the same strategy as itself

All individually nice strategies are communally nice, but not necessarily vice versa. All individually forgiving strategies are communally forgiving, and all communally retaliatory strategies are individually retaliatory.

Individually retaliatory: defects on someone who defects on it.
Communally retaliatory: defects on someone who defects on anyone.
Individually forgiving: stops defecting on someone who stops defecting on it.
Communally forgiving: stops defecting on someone who stops defecting on everyone.

TFT is loyal; if it plays another TFT, they'll cooperate forever. Same for Police, but Vigilante is not loyal: Vigilantes will defect on other Vigilantes. TFT is individually nice, retaliatory and forgiving; Vigilante is communally nice, retaliatory and forgiving.

Meta-peacekeeping

Peacekeepers don't always agree
  Police will defect on Vigilantes and vice versa
Peacekeepers protect non-peacekeeping strategies at their own expense

[Figure: plot with series Police, All-Cooperate, Spiteful-Bully, Tit-for-Tat]

Reductio ad absurdum: Absolutist

$\exists t\, \exists j\, (D(r, j, t) \neq D(c, j, t))$

[Figure: plot with series Tit-for-Tat, All-Cooperate, Spiteful-Bully, Absolutist]

Absolutist: Defect on c iff c has ever cooperated with someone when you defected, or vice versa.

Absolutist is loyal: it doesn't defect on other Absolutists of its own kind. Note that if you put two groups of Absolutists into a population, they'll defect on each other.

It's also unforgiving: it never stops defecting on someone once it's started, like Grim.

Neither individually nice nor communally nice, since it will defect on All-C (cooperated in the past with a defector).

Really only works when there's no noise in players' information or actions.

Absolutism über alles

[Figure: plot with series Tit-for-Tat, All-Cooperate, Spiteful-Bully, Absolutist]

Reasoning under uncertainty

Two interpretations of probability

Frequentist: probability is the long-term frequency of events
  Reasoning from absolute probabilities
  What happens if an event only happens once?
  Returns an estimate
Bayesian: probability is a measure of confidence that an event will occur
  Reasoning from relative probabilities
  Returns a probability distribution over outcomes
  Update beliefs (confidence) as new evidence arrives

The frequentist perspective operates under the assumption that the long-term absolute probability of an event occurring can be known.

The Bayesian interpretation is a subjective one, depending entirely on the information available to the agent.

For a large enough number of samples (as evidence accumulates), the Bayesian and frequentist interpretations typically converge. But you don't always have all that many samples to choose from.

Really big data problems can be solved by frequentist analysis. But for medium-sized data and really small data, Bayesian analysis performs much better.

A is the parameters, X is the evidence.
P(A): prior probability of A. A belief, i.e., a measure of confidence.
P(A|X): posterior probability of A, given X: the conditional probability of A, based on evidence X.
P(X|A): conditional probability of X, given A: the likelihood, or the probability of the evidence given the parameters. (Avoiding the post hoc ergo propter hoc fallacy, statistically.)
P(X) decomposes to P(X|A)P(A) + P(X|~A)P(~A): the probability that X occurs whether A happens or not.
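Putting those pieces together gives Bayes' rule, P(A|X) = P(X|A)P(A) / P(X). A minimal sketch with exact fractions; the function name and the example numbers are made up for illustration, and it assumes P(X) is not zero.

```haskell
import Data.Ratio ((%))

-- posterior P(A) P(X|A) P(X|~A) = P(A|X), by Bayes' rule.
posterior :: Rational -> Rational -> Rational -> Rational
posterior pA pXgivenA pXgivenNotA = pXgivenA * pA / pX
  where
    -- P(X) = P(X|A)P(A) + P(X|~A)P(~A)
    pX = pXgivenA * pA + pXgivenNotA * (1 - pA)

-- Hypothetical example: A = "this player is untrustworthy",
-- X = "they defected last round".
example :: Rational
example = posterior (1 % 4) (9 % 10) (1 % 10)
```

With a prior of 1/4, a defection that untrustworthy players produce 90% of the time and trustworthy ones 10% of the time moves the belief to 3/4. Feeding that posterior back in as the next prior is the "update beliefs as new evidence arrives" step.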

Distributions

Probability distribution function: assigns probabilities to outcomes
Discrete: a countable set of values (enumeration)
  Function also called a probability mass function
  Poisson, binomial, Bernoulli, discrete uniform
Continuous: arbitrary-precision values
  Function also called a probability density function
  Exponential, Gaussian (normal), chi-squared, continuous uniform
Mixed: both discrete and continuous
Narrower distribution = greater certainty

Probability mass function: gives the probability that a discrete random variable has some particular value.

Poisson is basically the bell curve for discrete outcomes; binomial gives the probability of an event occurring a given number of times over N trials, given probability p that it occurs in one trial; Bernoulli is binomial with one trial.

Expected value of Z in the Poisson distribution is equal to its parameter, lambda; in the exponential distribution, it's equal to the inverse of the parameter.

Probability density function: gives the relative likelihood of a continuous random variable taking a value near some point; to get the probability of a range, take the integral of the variable's density over that range.

All that we see is Z. We have to estimate lambda, and that's why Bayesian analysis is useful: it gives us useful tools for updating our beliefs about lambda even though we can't see it.

Figuring out the right distribution to use with your data is important. There are a lot of them, useful in different situations, and that's outside the scope of this talk.

You don't know what you don't know

Game theory is great when you know the payoffs
What can you do if you don't know the payoffs?
Or what the game tree looks like?
Well...
  You usually have some educated guesses about who the players are
  You have some idea what your possible actions are, as well as the other players'
  You can look at past interactions and make inferences
Which of these can be random variables? All of them.
  Deterministic: if all inputs are known, value is known
  Stochastic: even if all inputs are known, still random

We're treating input here as anything that influences the value of a variable. Deterministic entails decidability.

Don't wait, simulate

Figure out what distribution to use
Figure out what parameter you need to estimate
Figure out a distribution for it, and any parameters
Observing data tells you what your priors are
Fixing values for stochastic variables
Markov Chain Monte Carlo: sampling the posterior distribution thousands of times

So you've got some data! What are you going to do with it?

Questions to ask yourself when modeling:
What am I interested in?
What does it look like?
What influences it?

Data conditions the values of random variables: the conditional distribution of Y given X is the probability distribution of Y when X is known to be a particular value.

You can keep on assigning distributions to parameters as long as it's useful, but if you don't have any strong beliefs about a parameter, this is probably not useful. Pick an average value and let inference update it for you. Or you can also use a uniform distribution for it, and infer what its value is likely to be. It's just another prior, after all.

Monte Carlo simulation: also co-developed by John von Neumann (with Stanislaw Ulam). In normal MC, variables are independent and identically distributed; sample and average. In MCMC, variables can condition each other; the conditioning defines the chain. When you combine probabilities, you're reducing the effective volume of your search space; MCMC helps you narrow the search to the areas where you're likely to find values that satisfy the data and the conditions.

Converging on expected values

Prerequisites:
  A Markov chain with an equilibrium distribution
  A function f proportional to the density of the distribution you care about
1. Choose some initial set of values for all variables (state, S)
2. Modify S according to Markov chain state transitions, giving a candidate S'
3. If $f(S')/f(S) \geq 1$, S' is more likely than S, so accept
4. Otherwise, accept S' with probability $f(S')/f(S)$
5. Repeat
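The accept/reject loop above can be sketched in a few lines. Everything specific here is an assumption for illustration: the four-state target with weights 1:2:3:4, the step-left-or-right proposal, and the tiny linear congruential generator (used only so the sketch needs nothing beyond base; a real program would use a proper random library).

```haskell
-- Toy linear congruential generator; the seed doubles as the random draw.
nextSeed :: Integer -> Integer
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Map a seed to a number in [0, 1).
uniform :: Integer -> Double
uniform s = fromIntegral s / 2147483648

-- f: proportional to the target distribution over states 0..3 (weights 1:2:3:4).
f :: Int -> Double
f s | s >= 0 && s <= 3 = fromIntegral (s + 1)
    | otherwise        = 0

-- One step: propose a neighbouring state S', accept it if f(S')/f(S) >= 1,
-- otherwise accept it with probability f(S')/f(S).
step :: (Integer, Int) -> (Integer, Int)
step (seed, s) =
  let seed1 = nextSeed seed
      seed2 = nextSeed seed1
      s'    = if uniform seed1 < 0.5 then s - 1 else s + 1
      ratio = f s' / f s
      next  = if ratio >= 1 || uniform seed2 < ratio then s' else s
  in (seed2, next)

-- The first n states visited, starting from s0 (which must be in 0..3).
chain :: Int -> Integer -> Int -> [Int]
chain n seed s0 = map snd (take n (iterate step (seed, s0)))
```

Proposals outside 0..3 have f = 0 and are always rejected, so the chain stays in range. Over a long run, the fraction of time spent in each state should approach the 1:2:3:4 weights, even though the code never normalises f.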

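A game without payoffs

    type Outcome = Measure (Bool, Bool)
    type Trust = Double
    type Strategy = Trust -> Bool -> Bool -> Measure Bool

    tit :: Trust -> Bool -> Bool -> Measure Bool
    -- The other player defected: defect with probability 0.9
    tit me True _ = conditioned $ bern 0.9
    -- The other player cooperated: defect with probability equal to our own paranoia
    tit me False _ = conditioned $ bern me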

With this definition, the payoffs are completely hidden; all we assume is that the players consider some actions to be cooperating and others to be defecting, and that whether they consider an action to be cooperative or defecting is conditioned on how trusting they are. In this case, a higher value means more paranoid.

If the other player defects on them (the True case), then the probability distribution of this player defecting is a Bernoulli distribution with p = 0.9; this parameter could have been a random variable as well, but for this toy example we're fixing its value.

If the other player cooperates, then the probability that this player defects is also a Bernoulli distribution, with p = whatever the player's paranoia is.

Choosing which hole to fill in

    play :: Strategy -> Strategy -> (Bool, Bool) -> (Trust, Trust) -> Outcome
    play strat_a strat_b (last_a,last_b) (a,b) = do
      a_action [...] Measure Bool

    allDefect _ _ _ = conditioned $ bern 0.9

    grimTrigger :: Trust -> Bool -> Bool -> Measure Bool
    grimTrigger me True False = conditioned $ bern 0.9
    grimTrigger me False False = conditioned $ bern 0.1
    grimTrigger me _ True = conditioned $ bern 0.9

For Grim Trigger, the fact that we've defected on a previous round tells us that we should continue to defect on that person. Note that we're not making this conditional on paranoia.

Strategy as a random variable

    data SChoice = Tit | GrimTrigger | AllDefect | AllCooperate
      deriving (Eq, Ord, Enum, Typeable, Show)

    chooseStrategy :: SChoice -> Strategy
    chooseStrategy Tit = tit
    chooseStrategy AllDefect = allDefect
    chooseStrategy AllCooperate = allCooperate
    chooseStrategy GrimTrigger = grimTrigger

    strat :: Measure SChoice
    strat = unconditioned $ categorical [(AllCooperate, 0.25), (AllDefect, 0.25), (GrimTrigger, 0.25), (Tit, 0.25)]

Let's play another game

    iterated_game2 :: Measure (SChoice, SChoice)
    iterated_game2 = do
      let a_initial = False
      let b_initial = False
      a [...]
