
Blackjack & The game of tag

Date post: 17-Jan-2016
Upload: cecile
Transcript
Page 1: Blackjack  & The game of tag

Blackjack &

The game of tag

Presented by Leonid Leontiev

Page 2: Blackjack  & The game of tag

Game of tag

Competition, Coevolution and the Game of Tag

Craig W. Reynolds

Electronic Arts

1450 Fashion Island Boulevard

San Mateo, CA 94404 USA

telephone: 415-513-7442, fax: 415-571-1893

[email protected]@red.com

Page 3: Blackjack  & The game of tag

3

Game of tag introduction

• Tag is a children’s game based on symmetrical pursuit and evasion

• Tag is played by two or more, one of whom is designated as “it”

• The it player chases the others, who all try to escape

Page 4: Blackjack  & The game of tag

4

Background

• Tag is intended as a simple model of behavior based on control of locomotion direction, or steering

• Test case to learn about evolving controllers for related, but more complex tasks

• A player’s fitness is determined by how well it performs when placed in competition with several opponents chosen randomly from the coevolving population of players

Page 5: Blackjack  & The game of tag

5

Goals

• Study the use of competitive fitness in the evolution of agent behavior

• Automatically discover a controller through evolution based solely on competition between controllers

• Analyze approach that stands in contrast to evolving controllers by pitting them against a static, predetermined expert strategy

Page 6: Blackjack  & The game of tag

6

History

• 1992 John Koza “Genetic Programming: on the Programming of Computers by Means of Natural Selection”

• 1993 Pete Angeline’s work on coevolution of players for the game of Tic Tac Toe, using competitive fitness

• 1994 R. E. Smith’s work on coevolution of strategies for the game of Othello

• 1994 Sims, K. “Evolving 3D Morphology and Behavior by Competition”

Page 7: Blackjack  & The game of tag

7

Types of competitive architecture

competitive architecture             matches per generation of n   opponents per individual   reference
new versus all                       (n^2 - n)/2                   n - 1                      [Koza 1992]
new versus several                   nk                            k                          this paper
single elimination tournament tree   n - 1                         log2 n                     [Angeline 1993]
new versus previous best             n                             1                          [Sims 1994]
new versus new                       n/2                           1                          [Smith 1994]
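The match counts in the table can be checked with a few lines of Python. This is a minimal sketch: the function names and the default k = 6 (the number of opponents used later in these slides) are illustrative choices, not part of the original paper.

```python
import math


def matches_per_generation(n: int, k: int = 6) -> dict:
    """Matches needed to score a population of n under each architecture."""
    return {
        "new versus all": (n * n - n) // 2,
        "new versus several": n * k,
        "single elimination tournament tree": n - 1,
        "new versus previous best": n,
        "new versus new": n // 2,
    }


def opponents_per_individual(n: int, k: int = 6) -> dict:
    """Opponents each individual meets under each architecture."""
    return {
        "new versus all": n - 1,
        "new versus several": k,
        "single elimination tournament tree": math.log2(n),
        "new versus previous best": 1,
        "new versus new": 1,
    }


if __name__ == "__main__":
    for name, count in matches_per_generation(1024).items():
        print(f"{name}: {count} matches")
```

For large populations, "new versus all" grows quadratically, while "new versus several" grows only linearly in n.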

Page 8: Blackjack  & The game of tag

8

Experimental Design

• Genetic Programming is used to evolve control programs for simulated vehicles

• No static, predetermined control program

• The vehicles are abstract autonomous agents, moving at constant speed on a two dimensional surface

• Job of control program is to inspect the environment and to compute a steering angle

Page 9: Blackjack  & The game of tag

9

Experimental Design

For each player, at each simulation step:

– Its control program is run to determine a steering angle

– The vehicle's heading is altered by this angle

– The vehicle is moved a fixed distance along its new heading

– Tags are detected and handled

• The step length is typically 125% longer for “it”
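The per-step update described above can be sketched as follows. Several details are assumptions made for illustration only: angles are in radians, the opponent's local y axis points along the vehicle's heading and local x to its left, and the names `Vehicle`, `step`, and `is_tag` are hypothetical. The step length is left as a parameter so that a different value can be passed for "it".

```python
import math
from dataclasses import dataclass


@dataclass
class Vehicle:
    x: float
    y: float
    heading: float  # radians (assumed unit)


def opponent_in_local_frame(me: Vehicle, other: Vehicle):
    """Return the opponent's position as (local_x, local_y) in me's frame."""
    dx, dy = other.x - me.x, other.y - me.y
    cos_h, sin_h = math.cos(me.heading), math.sin(me.heading)
    forward = dx * cos_h + dy * sin_h   # along the heading (assumed local y)
    side = -dx * sin_h + dy * cos_h     # to the left (assumed local x)
    return side, forward


def step(me, other, is_it, controller, step_length=1.0):
    """One simulation step: run the controller, turn, then move."""
    local_x, local_y = opponent_in_local_frame(me, other)
    me.heading += controller(is_it, local_x, local_y)  # steering angle
    me.x += step_length * math.cos(me.heading)
    me.y += step_length * math.sin(me.heading)


def is_tag(a: Vehicle, b: Vehicle, vehicle_length: float = 1.0) -> bool:
    """A tag happens when the vehicles are within one vehicle length."""
    return math.hypot(a.x - b.x, a.y - b.y) <= vehicle_length
```

Because there is no force, mass, or momentum in this model, the whole update is a rotation followed by a fixed-length translation.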

Page 10: Blackjack  & The game of tag

10

Experimental Design

• No simulation of force, mass, acceleration or momentum

• Always two players in a tag game

• The playing field is featureless

• Fitness is defined to be the portion of time (simulation steps) spent not being it

Page 11: Blackjack  & The game of tag

11

Experimental Design

• The entire state of the world consists of:

– a flag indicating who is it

– the relative position of the opponent's vehicle

Page 12: Blackjack  & The game of tag

12

Experimental Design

• A series of 4 games is played

• The two players alternate starting as it for each game of the series

• Before each game:

– The players are given random initial headings

– The players are randomly positioned within a starting box measuring about 3.5 vehicle-body-lengths on a side

• A tag occurs when the it player gets to within one vehicle length of the opponent

Page 13: Blackjack  & The game of tag

13

Experimental Design

• Each game consisted of 25 simulation steps

• A player's score for a game is the number-of-non-it-steps divided by 25

Page 14: Blackjack  & The game of tag

14

Experimental Design

• To determine a player's fitness, it is pitted against 6 randomly chosen players from the existing population

• Scores from these 24 games are averaged together to obtain the final fitness value
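The fitness test (6 opponents, 4 games each, 24 scores averaged) might look like the sketch below. The `play_game` callable is a hypothetical stand-in for the tag simulation and is assumed to return the player's score for one game. Sampling opponents without replacement is also an assumption; the slides only say "randomly chosen".

```python
import random
from typing import Callable, Sequence

GAMES_PER_SERIES = 4
OPPONENTS_PER_TEST = 6


def game_score(it_flags: Sequence[bool]) -> float:
    """Fraction of simulation steps the player spent NOT being it."""
    return sum(1 for is_it in it_flags if not is_it) / len(it_flags)


def fitness(player, population: Sequence, play_game: Callable,
            rng: random.Random) -> float:
    """Average score over 6 opponents x 4 games = 24 games."""
    opponents = rng.sample(list(population), OPPONENTS_PER_TEST)
    scores = []
    for opponent in opponents:
        for game in range(GAMES_PER_SERIES):
            starts_as_it = (game % 2 == 0)  # players alternate starting as it
            scores.append(play_game(player, opponent, starts_as_it))
    return sum(scores) / len(scores)
```

Since opponents come from the current population, the same player can receive a different fitness each time it is tested.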

Page 15: Blackjack  & The game of tag

15

Genetic Programming

• Steady State Genetic Programming (SSGP)

– choosing two parent programs from the population

– creating a new offspring program from the parents by applying the crossover operator and mutation

– testing the fitness of the new program

– choosing a program to remove from the population to make room

– adding the new program into the population
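One iteration of the steady-state loop above can be sketched as follows. The slides do not say how parents or the removed program are chosen, so the small tournaments used here are an assumption; `evaluate`, `crossover`, and `mutate` are hypothetical callables supplied by the caller.

```python
import random


def steady_state_step(population, fitnesses, evaluate, crossover, mutate,
                      rng: random.Random, tournament: int = 2):
    """Create one offspring and replace one existing program with it."""

    def pick(best: bool) -> int:
        # Assumed selection scheme: compare a few random individuals.
        contenders = rng.sample(range(len(population)), tournament)
        chooser = max if best else min
        return chooser(contenders, key=lambda i: fitnesses[i])

    # 1. choose two parents
    mother = population[pick(best=True)]
    father = population[pick(best=True)]
    # 2. create an offspring by crossover and mutation
    child = mutate(crossover(mother, father, rng), rng)
    # 3. test the fitness of the new program
    child_fitness = evaluate(child)
    # 4. choose a program to remove, 5. add the new program in its place
    victim = pick(best=False)
    population[victim] = child
    fitnesses[victim] = child_fitness
    return child, child_fitness
```

The population size never changes, and there is no generation boundary: "generation" can only be defined after the fact, for example as one population's worth of new individuals.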

Page 16: Blackjack  & The game of tag

16

Problems

• A mediocre-but-lucky program may receive an undeservedly high fitness and go on to dominate the population

• Competitive fitness values are measured relative to the population at a certain point in time

• Because steady state genetic computation proceeds individual by individual, there is no demarcation of generations.

Page 17: Blackjack  & The game of tag

17

Set of functions

function   usage             description
+          (+ a b)           a plus b
-          (- a b)           a minus b
*          (* a b)           a times b
%          (% a b)           if b=0 then 1 else a divided by b
min        (min a b)         if a<b then a else b
max        (max a b)         if a>b then a else b
abs        (abs a)           absolute value of a
iflte      (iflte a b c d)   if a <= b then c else d
if-it      (if-it a b)       if this player is it then a else b
local-x    (local-x)         returns x-coordinate of the opponent player
local-y    (local-y)         returns y-coordinate of the opponent player
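A small interpreter makes the function set concrete. This sketch assumes programs are stored as nested Python tuples such as `("max", 0, ("local-y",))`. That representation and the name `evaluate` are illustrative choices, not the original implementation.

```python
def evaluate(expr, is_it: bool, local_x: float, local_y: float) -> float:
    """Evaluate a control program and return its numeric result."""

    def ev(sub):
        return evaluate(sub, is_it, local_x, local_y)

    if not isinstance(expr, tuple):      # numeric constant
        return float(expr)
    op, args = expr[0], expr[1:]
    if op == "local-x":
        return local_x
    if op == "local-y":
        return local_y
    if op == "if-it":
        return ev(args[0]) if is_it else ev(args[1])
    if op == "iflte":
        return ev(args[2]) if ev(args[0]) <= ev(args[1]) else ev(args[3])
    if op == "abs":
        return abs(ev(args[0]))
    a, b = ev(args[0]), ev(args[1])
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "%":                        # protected division
        return 1.0 if b == 0 else a / b
    if op == "min":
        return a if a < b else b
    if op == "max":
        return a if a > b else b
    raise ValueError(f"unknown function: {op}")


# Hypothetical program: the pursuer branch (0.25) is a placeholder constant.
example = ("if-it", 0.25, ("max", 0, ("local-y",)))
print(evaluate(example, False, 0.0, -3.0))
```

Protected division (`%`) guarantees that every syntactically valid program returns a number, so crossover can never produce a program that crashes.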

Page 18: Blackjack  & The game of tag

18

Size limitation

• Program size is measured in terms of the total number of functions and/or terminals

• When a program size exceeds this limit, the hoist genetic operator [Kinnear 1994] is used to find a smaller (but hopefully still fit) subexpression
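The size limit and a hoist-style reduction can be sketched as below, again assuming programs are nested tuples. Picking a random subexpression that fits under the limit is an assumption made for illustration; the slides do not describe the exact selection rule of the hoist operator in [Kinnear 1994].

```python
import random


def size(expr) -> int:
    """Count functions and terminals in a program."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + sum(size(arg) for arg in expr[1:])


def subexpressions(expr):
    """Yield the program itself and every nested subexpression."""
    yield expr
    if isinstance(expr, tuple):
        for arg in expr[1:]:
            yield from subexpressions(arg)


def hoist(expr, limit: int, rng: random.Random):
    """Return expr if it fits, else a random subexpression that does."""
    if size(expr) <= limit:
        return expr
    candidates = [s for s in subexpressions(expr) if size(s) <= limit]
    return rng.choice(candidates)  # terminals always qualify when limit >= 1
```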

Page 19: Blackjack  & The game of tag

19

Results

• These experiments were run on Macintosh Quadra 950 workstations. In this implementation a fitness test consisting of 24 tag games takes 7 to 12 seconds to run, depending on program size.

Page 20: Blackjack  & The game of tag

20

Run A

• A population of 5000 individuals.

• Both players moved at the same speed

• Most popular strategies at the early stage:

– Evasion vehicles simply traveled in a straight line

– Pursuit strategies appear to have been of two kinds: looping (constant steering angle), and “stumblers” that seemed to move erratically, but managed to creep slowly towards their target.

Page 21: Blackjack  & The game of tag

21

Run A cont.

• Later an improved evasion strategy appeared: if the pursuer is behind you, go straight ahead, otherwise turn randomly (if-it <pursuer-branch> (max 0 (local-y)))

• The pursuers got to be very good at picking off the easy targets, the inefficient evaders

Page 22: Blackjack  & The game of tag

22

Run A cont.

• At the end stage of run A, the pursuit strategy used a competent but inefficient “three phase” technique

Page 23: Blackjack  & The game of tag

23

Page 24: Blackjack  & The game of tag

24

Run C

• A population of 1000 individuals

• Mutation was added in an attempt to prevent the loss of diversity observed in earlier runs

• Many games consisted of a chase featuring near-optimal pursuit and evasion

Page 25: Blackjack  & The game of tag

25

Fitness of the optimal player placed in competition with the evolving population

Page 26: Blackjack  & The game of tag

26

Histogram of fitness distribution after 215 generations

Page 27: Blackjack  & The game of tag

27

Run C cont.

• After 215415 individuals were processed (215 generations), there were 4 individuals with the same best fitness value

• One of these was compared to the optimal player in a series of 100 games

• Got a score of 49.3%

Page 28: Blackjack  & The game of tag

28

Page 29: Blackjack  & The game of tag

29

Page 30: Blackjack  & The game of tag

30

Run G

• Did not segregate the pursuer and evader code

• The change seemed to make the problem harder to solve

• Used a larger limit on program size (100)

Page 31: Blackjack  & The game of tag

31

Fitness of the optimal player placed in competition with the evolving population

Page 32: Blackjack  & The game of tag

32

Run G cont.

• Individual 113520 was the best of population

• The program size is 98

• Many strange behavioral traits

– Pursuit behavior has a reasonable two phase strategy for opponents up to 5 units ahead but is very inept for opponents further away

– The evasion behavior is strongly asymmetrical

Page 33: Blackjack  & The game of tag

33

Page 34: Blackjack  & The game of tag

34

Individual 113520 code

(% (% (if-it (abs (local-x)) (iflte (iflte (local-x) 0.57168305 (local-x)

(+ (iflte (local-y) (iflte (local-y) (if-it (local-x) (abs (local-x))) (iflte 0.40530929 0.26004231 (abs (local-x)) (local-y)) (if-it 0.40530929 0.57168305)) (min (abs (local-x)) (+ (local-x) (local-x))) (local-x)) (local-x))) 0.57168305 (local-x) (+ (iflte (local-y) (iflte (local-y) (if-it (local-x) (local-x)) (iflte 0.40530929 (local-x) (abs (iflte (local-x) 0.37254661 0.32281655 (local-x))) (local-x)) (if-it 0.40530929 (abs (local-x)))) (min 0.1637349 (iflte (local-x) (local-y) (abs (iflte (abs (local-x)) (max (if-it (local-y) (abs 0.53183758)) (local-x)) 0.32281655 (local-x))) 0.53183758)) (local-x)) (local-x)))) (+ (local-x) (local-x)))

(iflte (- (abs 0.53183758) (if-it (% 0.57168305 (local-y)) (- 0.1637349 (local-y)))) 0.40530929 (abs 0.53183758) 0.83426005))

Page 35: Blackjack  & The game of tag

35

Conclusions

• Using the game of tag to test relative fitness, artificial evolution was able to discover skillful, near-optimal tag players

• Good results were obtained despite the random selection of opponents and based only on relative performance fitness

• The population’s average performance was within 10% of the optimal player, and the best of population individual performed within a few percentage points of optimal (in run C)

Page 36: Blackjack  & The game of tag

36

Conclusions

• The quality of evolved players approached, but did not reach, that of the optimal player

• Possible reasons:

– Fundamental limitation of competitive fitness

– Flaw in the experimental design

– Limitations of genetic population size and length of runs

Page 37: Blackjack  & The game of tag

Blackjack

Evolving Strategies in Blackjack

David B. Fogel

Natural Selection, Inc.

3333 N. Torrey Pines Ct., Suite 200

La Jolla, CA 92037 USA

[email protected]

Page 38: Blackjack  & The game of tag

38

Background on blackjack

• Blackjack is also known as 21

• A player or players compete against the dealer, or “house”

• The rules vary by casino, and even by country

• The variations are minor, but they affect the potential profitability of player strategies

Page 39: Blackjack  & The game of tag

39

Blackjack rules

• The dealer and each player receive two cards. The dealer turns the first of his cards face up and the other remains face down.

• The object is to come as close to 21 as possible without going over (“busting”).

• Each card is counted at its face value

• Face cards count as 10

• Aces count as 1 or 11
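The card-counting rules above translate directly into a hand-value function. This is a minimal sketch; the rank strings and the name `hand_value` are illustrative. An ace is counted as 11 only when doing so does not push the total over 21.

```python
def hand_value(cards):
    """Return (total, is_soft) for ranks like '2'..'10', 'J', 'Q', 'K', 'A'."""
    total = 0
    aces = 0
    for rank in cards:
        if rank == "A":
            aces += 1
            total += 1           # count every ace as 1 first
        elif rank in ("J", "Q", "K"):
            total += 10          # face cards count 10
        else:
            total += int(rank)   # other cards count their face value
    # Promote one ace from 1 to 11 if that does not bust the hand.
    is_soft = aces > 0 and total + 10 <= 21
    if is_soft:
        total += 10
    return total, is_soft


print(hand_value(["A", "K"]))
print(hand_value(["10", "9", "5"]))
```

At most one ace can ever count as 11, since two aces at 11 would already total 22.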

Page 40: Blackjack  & The game of tag

40

Blackjack rules

Page 41: Blackjack  & The game of tag

41

Blackjack rules

• If the first two cards dealt to the player yield 21, this is called “blackjack”

• If the dealer’s up card is an ace, the player may purchase “insurance” for half the amount of the player’s wager. If the dealer has blackjack, the insurance bet pays 2:1

Page 42: Blackjack  & The game of tag

42

Blackjack rules

• If the player has two cards of equal denomination on the deal, he may split the cards into two new hands.

• Also, on the initial deal, when the player has two cards, he has the option of “doubling down”

• If the player goes over 21, he busts and immediately loses his wager.

• If the player stands at a value less than or equal to 21, the play proceeds to the dealer.

Page 43: Blackjack  & The game of tag

43

History

• The intelligent player can win consistently at blackjack by “counting cards,” using the history of which cards have been played

• 1956 – “The Optimum Strategy in Blackjack”, Dr. Roger Baldwin

• 1962 – “Beat the Dealer”, Dr. Edward Thorp

Page 44: Blackjack  & The game of tag

44

History

• Thorp analyzed the player advantage, using his basic strategy, when the (single) deck contained all 16 tens, and when a number of the tens were removed. Player advantage:

– +0.13% with all 16 tens

– −1.85% with 12 tens

– −3.13% with 8 tens

– −2.14% with 4 tens

– +1.62% when no ten remained

• No linear relationship between the number of tens and the player’s advantage or disadvantage.
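The lack of a linear relationship can be checked directly from the five data points quoted above. The sketch below computes the slope between consecutive points; for a linear relationship, all slopes would be equal. The function name is illustrative.

```python
def segment_slopes(xs, ys):
    """Slope of each straight segment between consecutive data points."""
    return [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]


tens_in_deck = [16, 12, 8, 4, 0]
player_advantage = [0.13, -1.85, -3.13, -2.14, 1.62]  # percent, from the slide

slopes = segment_slopes(tens_in_deck, player_advantage)
for (hi, lo), slope in zip(zip(tens_in_deck, tens_in_deck[1:]), slopes):
    print(f"{hi} -> {lo} tens: {slope:+.4f} % per ten")
```

The slopes change sign: removing the first eight tens hurts the player, while removing the last eight helps, so the relationship is not even monotonic.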

Page 45: Blackjack  & The game of tag

45

Basic strategy

• The player makes the same play in the same setting without respect to which cards have been played in prior hands

• If the player mimicked the dealer's rules, the player faced a disadvantage of between 5.56% and 6.78% (95% confidence)

Page 46: Blackjack  & The game of tag

46

Basic Strategies

Player advantage (%) for each basic strategy (chart data):

strategy    decks   average   min     max
Thorp       1       -0.02     -0.14   0.09
Thorp       4       -0.41     -0.51   -0.33
Revere      1       -0.02     -0.14   0.10
Revere      4       -0.44     -0.55   -0.32
Archer      1       -0.03     -0.14   0.09
Archer      4       -0.43     -0.55   -0.32
Gollehon    1       0.06      -0.06   0.17
Gollehon    4       -0.43     -0.55   -0.32
Patterson   1       -0.02     -0.14   0.10
Patterson   4       -0.71     -0.82   -0.59

Page 47: Blackjack  & The game of tag

47

Counting strategy

• Computer simulation has shown that the player can have an advantage over the house by altering his strategy based on the distribution of cards played in prior hands

• Player advantage after removing all of the cards of a given rank

• The most significant single card is the 5

Page 48: Blackjack  & The game of tag

48

Counting strategy

Type of missing card   Player advantage (%)
Aces                   -2.42
Twos                   +1.75
Threes                 +2.14
Fours                  +2.64
Fives                  +3.58
Sixes                  +2.40
Sevens                 +2.05
Eights                 +0.43
Nines                  -0.41
Tens                   +1.62

Page 49: Blackjack  & The game of tag

49

Evolving Basic Strategies

• Starting with Gollehon's basic strategy and three random variants of the strategy

• 3 million simulated hands on a single deck, reshuffling after 2/3 of the deck had been played

• Strategies were represented as entries in matrices describing decisions

Page 50: Blackjack  & The game of tag

50

Strategy representation example

Page 51: Blackjack  & The game of tag

51

Evolving Basic Strategies

• Simple mutation was used to create an offspring from each parent, altering multiple entries in the strategy

• 10 generations on a one-deck game

• 10 more generations on a four-deck game

• Each generation of evolution required just less than three days on the Macintosh SE
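A matrix-style strategy and the mutation step can be sketched as follows. Everything about the layout is an assumption: the table is keyed by (player hard total, dealer up card), the actions are hit/stand/double, and the starting point is a "mimic the dealer" table rather than Gollehon's strategy, which is not reproduced in these slides.

```python
import random

PLAYER_TOTALS = range(4, 22)   # assumed hard totals 4..21
DEALER_UP = range(2, 12)       # 2..10, with 11 standing for an ace
ACTIONS = ("H", "S", "D")      # hit, stand, double down (assumed action set)


def mimic_dealer_strategy() -> dict:
    """Placeholder parent: hit below 17, otherwise stand."""
    return {
        (total, up): ("H" if total < 17 else "S")
        for total in PLAYER_TOTALS
        for up in DEALER_UP
    }


def mutate(parent: dict, rng: random.Random, n_changes: int = 3) -> dict:
    """Copy the parent and alter several randomly chosen entries."""
    child = dict(parent)
    for key in rng.sample(sorted(child), n_changes):
        alternatives = [a for a in ACTIONS if a != child[key]]
        child[key] = rng.choice(alternatives)
    return child


parent = mimic_dealer_strategy()
child = mutate(parent, random.Random(42))
changed = [key for key in parent if parent[key] != child[key]]
print(len(changed), "entries changed")
```

Each offspring would then be scored by simulating many hands, which is where nearly all of the running time goes.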

Page 52: Blackjack  & The game of tag

52

Basic Strategies

Player advantage (%) for each basic strategy, including the evolved strategy (EA) (chart data):

strategy    decks   average   min     max
Thorp       1       -0.02     -0.14   0.09
Thorp       4       -0.41     -0.51   -0.33
Revere      1       -0.02     -0.14   0.10
Revere      4       -0.44     -0.55   -0.32
Archer      1       -0.03     -0.14   0.09
Archer      4       -0.43     -0.55   -0.32
Gollehon    1       0.06      -0.06   0.17
Gollehon    4       -0.43     -0.55   -0.32
Patterson   1       -0.02     -0.14   0.10
Patterson   4       -0.71     -0.82   -0.59
EA          1       0.22      0.10    0.33
EA          4       -0.25     -0.36   -0.13

Page 53: Blackjack  & The game of tag

53

Evolving Counting Strategies

• The best-evolved basic strategy and two random variants of this strategy were further evolved

• 400,000 simulated hands, increased over time to speed the process

• 50 generations were executed using the plus-minus counting framework
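A plus-minus running count can be sketched as below. The point values are an assumption: the slides name the framework but do not list them, so this uses the commonly described scheme in which low cards (2 to 6) count +1 and tens, face cards, and aces count −1.

```python
LOW_CARDS = ("2", "3", "4", "5", "6")
HIGH_CARDS = ("10", "J", "Q", "K", "A")


def count_value(rank: str) -> int:
    """Assumed plus-minus point value of one card."""
    if rank in LOW_CARDS:
        return 1
    if rank in HIGH_CARDS:
        return -1
    return 0  # 7, 8, 9 are neutral in this scheme


def running_count(cards_seen) -> int:
    """Sum of point values over all cards played since the last shuffle."""
    return sum(count_value(rank) for rank in cards_seen)


print(running_count(["2", "5", "K", "9", "A", "6"]))
```

A positive count means more low cards than high cards have left the deck, which matches the removal table: losing fives helps the player most, while losing aces hurts.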

Page 54: Blackjack  & The game of tag

54

Counting Strategies

Player advantage (%) for counting strategies (chart data):

strategy              decks   average   min     max
EA                    1       1.82      1.65    1.99
EA                    2       0.93      0.77    1.08
EA                    4       0.31      0.16    0.45
EA                    8       -0.12     -0.25   0.02
Patterson and Olsen   1       0.56      0.32    0.80
Patterson and Olsen   4       -0.21     -0.45   0.03

Page 55: Blackjack  & The game of tag

55

Conclusions

• The standard blackjack strategies rely on a separate analysis of the correct play in each of a set of various possible situations

• Each of these situations is viewed independently

• It is reasonable to view the challenge of finding optimum strategies as a nonlinear problem that should be addressed by lifelike simulation of sequences of hands played until a deck or decks is/are reshuffled.

