Faculty of Mathematics and Information Science Warsaw...

Towards cognitively-plausible

game playing systems

Jacek Mańdziuk

Faculty of Mathematics and Information Science, Warsaw University of Technology

Yonsei University, Seoul, Korea, 29/03/2011

Agenda

• Brief introduction to board games

• State-of-the-art achievements in board games

• Quo vadis board game research?

• Cognitively-plausible game playing systems

• Intuitive playing – a real challenge

• Examples of intuitively playing systems

• Conclusions

About me

• Associate Professor, Warsaw University of Technology in Poland

• Knowledge-free and learning-based methods in solving problems

• Application to games, financial modeling, bioinformatics, optimization

Board games - basics

Basic comparison of the most popular board games

Game          BF     State space    Game length    Tree size
Chess         35     10^50          80             10^123
Checkers      3      5x10^20        70             10^31
Othello       7      10^28          60             10^50
Go (Baduk)    250    10^172         150            10^360

(BF = branching factor)
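A rough cross-check of the figures listed above, assuming (this is not stated on the slide) that the tree size is approximated by BF raised to the power of the game length. The function name `tree_size_exponent` is made up for this illustration.

```python
import math

# (branching factor, game length) as listed in the comparison above
GAMES = {
    "Chess": (35, 80),
    "Checkers": (3, 70),
    "Othello": (7, 60),
    "Go (Baduk)": (250, 150),
}


def tree_size_exponent(branching_factor, game_length):
    """Return log10 of branching_factor ** game_length."""
    return game_length * math.log10(branching_factor)


for name, (bf, length) in GAMES.items():
    exponent = tree_size_exponent(bf, length)
    print(f"{name}: tree size of about 10^{exponent:.1f}")
```

Under this assumption the estimates for chess, Othello and Go land within one order of magnitude of the listed tree sizes. The checkers estimate is about two orders of magnitude above the listed 10^31, which suggests the rounded branching factor of 3 is slightly too high for this simple formula.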

• Perfect-information (vs. imperfect)

• Two-player (vs. multi-player)

• Zero-sum (vs. expected payoff)

• Deterministic (vs. non-deterministic)

• Turn-based (vs. simultaneously-played)

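Minimax algorithm

von Neumann and Morgenstern (1944)

Evaluation function

Claude Shannon (1950) – linear weighted evaluation function; algorithms of type A and type B

$$
\text{value of } s =
\begin{cases}
MAX & \text{if } s \text{ is a win} \\
MIN & \text{if } s \text{ is a loss} \\
V(s, w) & \text{otherwise}
\end{cases}
\qquad \text{where } V(s, w) = \sum_i w_i x_i
$$

For convenience:

$$
MAX = 1, \qquad MIN = -1, \qquad V(s, w) = \tanh\Big(b \sum_i w_i x_i\Big)
$$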
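A minimal sketch of how minimax and a tanh-squashed linear evaluation function fit together. The toy game (players alternately take 1-3 stones, whoever takes the last stone wins), the single hand-made feature, and all function names are assumptions made for this illustration, not part of the talk.

```python
import math

MAX_VALUE, MIN_VALUE = 1.0, -1.0  # the "for convenience" convention


def evaluate(features, weights, b=1.0):
    """V(s, w) = tanh(b * sum_i w_i * x_i)."""
    return math.tanh(b * sum(w * x for w, x in zip(weights, features)))


def successors(stones):
    """Toy game (assumed): a move removes 1, 2 or 3 stones."""
    return [stones - k for k in (1, 2, 3) if stones - k >= 0]


def features(stones, max_to_move):
    """One hypothetical feature: +1 if the position looks good for MAX."""
    good_for_mover = stones % 4 != 0
    return [1.0 if good_for_mover == max_to_move else -1.0]


def minimax(stones, depth, max_to_move, weights):
    """Depth-limited minimax; leaves are scored by the evaluation function."""
    if stones == 0:
        # The previous player took the last stone, so the side to move lost.
        return MIN_VALUE if max_to_move else MAX_VALUE
    if depth == 0:
        return evaluate(features(stones, max_to_move), weights)
    values = [
        minimax(s, depth - 1, not max_to_move, weights)
        for s in successors(stones)
    ]
    return max(values) if max_to_move else min(values)


if __name__ == "__main__":
    print(minimax(5, 10, True, [1.0]))  # searched to the end of the game
    print(minimax(5, 0, True, [1.0]))   # heuristic estimate only
```

With a deep enough search the terminal values MAX and MIN are propagated to the root. With depth 0 only the heuristic estimate is returned, and tanh keeps it between MIN and MAX.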

Why does AI care about games?

• Environments

    • popular

    • repeatable

    • cheap

• Used for benchmarking

    • heuristic search methods

    • methods of efficient problem representation

• Social and mental aspect of games

State-of-the-art board game playing systems

Chess

• The Turk (Baron Wolfgang von Kempelen, 1769) – Empress Catherine II, Napoleon, …

• Torres y Quevedo (around 1890), KR vs. K

• Alan Turing (1953) – the first chess playing program

• …

• Deep Blue II (1997 - rematch)

• Evaluation function composed of 8000 features

• Manually adjusted weights in the evaluation

function (depending on particular opponent)

• Parallel implementation based on 480 specialized

chess chips

• 30-node cluster

• 100-330 million positions/sec. (50 billion/3 min.)

• NegaScout + transposition table (TT) + iterative deepening + quiescence search

• Extended opening and endgame databases

• Deep Fritz

• Junior

• Shredder

• Hydra

• Rybka (Vasik Rajlich and Iweta Rajlich)

• Above 3100 ELO on a 4 CPU PC

• Close to 3000 ELO on a single-core PC

Checkers

• A.L. Samuel (1959)

• Evaluation function composed of 22 elements

• Actually it was the first implementation of TD method

• Chinook (1994)

• 4 phases of the game (4 evaluation functions), 25 basic features

• Each feature depends on several detailed parameters

• Dr. Marion Tinsley lost only 7 games in his 40-year career as world champion (including 2 to Chinook)

Checkers…

• The game was solved (Science, 2007, Schaeffer et al.)

• Complete 10-piece endings database

• Partly relying on brute force

• Checkers is a draw under perfect play

Othello

• Logistello (1997) – M. Buro

• Logistello vs. Takeshi Murakami (6:0)

• Convincing win.

Go (Baduk)

• One of the last strongholds of human supremacy

• MoGo, Many Faces of Go, Fuego, CrazyStone, GoIntellect,

Indigo, Golois, …

• Size of the board

• High branching factor

• Additive nature of the game

• Serious problems with the analytical construction of an effective evaluation function for the middle-game phase (the assessment of end-game positions is relatively easy).

• Due to the variety of positional and tactical threats it is highly probable that "no simple yet reasonable evaluation function will ever be found for Go" – M. Müller

Go – rollout (MC) simulations

• While the stopping condition is not met (there is still time):

    • Start at the root (the current node in the game tree)

    • Until an end-game position is reached:

        • Choose a move according to some policy

        • Simulate execution of the move

    • Assess the game (in the end-game position)

    • Update all moves on the played path in the game tree according to the end-game score.

• Policy: UCT and its variations (exploration vs. exploitation)

• In 2008, MoGo (Gelly et al.) won a game against a professional (dan) player (5d) for the first time
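The rollout loop described above can be sketched as a small UCT search. This is only an illustration under stated assumptions: the game is a toy one (players alternately take 1-3 stones, whoever takes the last stone wins) rather than Go, the tree policy is plain UCB1 with an assumed exploration constant c = 1.4, the rollout policy is uniformly random, and the names `Node`, `ucb1`, `rollout` and `uct_search` are hypothetical.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones
        self.parent = parent
        self.move = move
        self.children = []
        self.untried = [k for k in (1, 2, 3) if k <= stones]
        self.visits = 0
        self.wins = 0.0  # wins for the player who moved INTO this node


def ucb1(child, parent_visits, c=1.4):
    """Exploitation term plus exploration bonus."""
    exploit = child.wins / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def rollout(stones, rng):
    """Random playout; True if the player to move at the start wins."""
    starter_to_move = True
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return starter_to_move
        starter_to_move = not starter_to_move
    return False  # no stones left: the player to move has already lost


def uct_search(root_stones, iterations, rng):
    root = Node(root_stones)
    for _ in range(iterations):  # stopping condition: iteration budget
        node = root
        # Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # Expansion: add one previously untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # Simulation: play randomly until the end of the game.
        mover_wins = rollout(node.stones, rng)
        # Backpropagation: update every node on the played path.
        win_for_previous = not mover_wins
        while node is not None:
            node.visits += 1
            node.wins += 1.0 if win_for_previous else 0.0
            win_for_previous = not win_for_previous
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(uct_search(5, 3000, random.Random(0)))
```

The most-visited root move is returned, which is a common choice. Real Go programs replace the random rollout policy with pattern-based or learned policies.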

Quo vadis

board game research?

Quo vadis?

• Humans have practically no chance of catching up with machines in chess, checkers, Othello, and many other games.

• Baduk remains the last stronghold. For how long?

• What can we really gain from developing more and

more efficient programs running on faster and faster

hardware?

Human-type approach – a challenge

Mimicking human approach to game playing

in all its major aspects.

• Automatic feature selection for the evaluation function

• Modeling the opponent’s style of play.

• Learning from scratch (knowledge-free methods).

• Autonomous pattern-based knowledge acquisition.

• Highly selective search (efficient move preselection)

Anaconda (Blondie24) – neuro-evolution: checkers

TD-Gammon – TD-learning (self-play mode): backgammon

Grand challenges

• Intuition

• Creativity ("to bring into form or being out of nothing")

• Backgammon (new opening)

• Blondie24 (Anaconda)

• Morph

• Zenith

• Multigame playing (universal, game-independent learning)

• SAL

• Hoyle and METAGAMER

• General Game Playing contest (Cluneplayer, Cadia, Ary)

Intuitive playing – some insights

Quoting Albert Einstein

The intuitive mind is a sacred gift and the rational mind

is a faithful servant. We have created a society that

honors the servant and has forgotten the gift.

(Albert Einstein, 1931)

Facets of human-type intuition

• Instantaneous qualitative estimation of a game position

• Instantaneous focus on relevant regions of the game board

• Efficient move pre-ordering

• Search-free playing

• Focus on goals/plans rather than particular moves

• Ability to make long-term material sacrifice in order to

gain some positional advantage (without precise

verification of all consequences – possible paths of play)

Example of intuitive play

Immortal game, Anderssen vs. Kieseritzky, London 1851

11. Rg1 … 17. … Qxb2 23. Be7#

Intuition as a "side effect" of "perfect" play

Deep Blue II vs. Kasparov (NY, 1997, game 2)

After 36...axb5 Deep Blue played the "deeply strategic", intuitive move 37. Be4!!, instead of the obvious 37. Qb6.

Kasparov accused the Deep Blue team of cheating!

Intuition – goals and plans of play

Wilkins, 1980

Intuition – neurobiological foundations

• „Monitoring” the decision-making process of human

players:

• How game positions are perceived (mainly in chess)

• Activity of brain regions – fMRI technique (chess and Go)

• Research on perception abilities [de Groot, 1965]

• A chess position composed of about 25 pieces

• 3-15 seconds

• Reconstruction rate: 93% for grandmasters, gradually decreasing with the level of play

• Perception [Chase and Simon, 1973]

• Confirmed de Groot’s conclusions

• Short 5 second exposure

• Much higher capabilities of grandmasters than intermediate and

novice players, but only in the case of sensible positions.

• Not confirmed in the case of random positions (!)

• This ability is related neither to specific memory skills nor to their training

• It is the effect of internal position representation in the form of chunks of information (templates with associated moves), which grandmasters use to remember and categorize positions

• The template library of a grandmaster is composed of around 300K elements.

• Saccadic eye movement [de Groot and Gobet, 2002]

• GM – concentration on edges, weaker players on squares

• GM – greater range (the average distance between consecutive fixations is greater for GMs)

• strong players decode 2-3 pieces (part of a template?)

within a single fixation; weaker ones – single element

• Analysis of the first 5 fixations in the problem of finding the best

move in a given position:

• strong players concentrated on relevant pieces/squares

more often than weak players

• Check status detection [Reingold and Charness, 2005]

• Positions on small 3x3 or 5x5 boards: a king and 1-2 opponent’s pieces (potentially attacking the king)

• One of the pieces can be cued

• GM – the same average time; weak players – shorter time in the case of one opponent’s piece or when the cued piece is the attacking one

• Confirms the suggestion that GMs analyze in parallel a few pieces comprising a meaningful chunk or part of it.

• Functional Magnetic Resonance Imaging (fMRI)

• Go players compared with chess players

• Amateur players (know the rules, play occasionally)

• Chess: empty board (focus on the center) vs. position with improperly placed pieces (a few of them marked with a star – point them out) vs. appropriate position with about 25 pieces (point out the next move)

• Go: empty board (focus on the center) vs. position with improperly placed pieces (point out 6 marked stones) vs. appropriate position with about 25 pieces (point out the next move)

• Positions are shown in a cycle; each exposure lasts 30 sec.

• For each type of position different neuronal activity

patterns in players’ brains are observed.

• Some differences between Go and chess

• In the case of real Go positions: much higher involvement of the right hemisphere (responsible for spatial relations)

• In the case of real chess positions: greater involvement of the "analytical" (left) hemisphere - QUESTIONABLE

Examples of intuitively playing systems

Trajectories of solutions

Pioneer Project and Linguistic Geometry

• Pioneer Project (USSR, 1960s-1970s)

• Mikhail Botvinnik and his collaborators

• Working on search methods that significantly narrow the game tree

• Goal: abstraction of relevant position features and using them, as human players do, to find "promising" trajectories (solution candidates) instead of applying laborious, systematic search

• Continuation and generalization of this research in the Linguistic Geometry (LG) project (mainly military applications, classified)

Abstraction and generalization of game features

Reti ending. White to begin and draw.

How about a machine solving the problem on a 100 x 100 board?

Intuition – trajectories of solutions

d=6:

Full tree: 10^6 nodes

Pioneer: 54 nodes, av. bf.=1.68
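A quick consistency check of these figures, assuming (this is not stated on the slide) a uniform tree with branching factor b, where the Pioneer count covers all levels from the root down to depth d = 6 and the full-tree count refers to the leaves of a tree with b = 10:

```latex
\sum_{k=0}^{6} b^{k} = \frac{b^{7}-1}{b-1},
\qquad
b = 1.68:\quad \frac{1.68^{7}-1}{0.68} \approx \frac{37.77-1}{0.68} \approx 54,
\qquad
b = 10:\quad 10^{6} \text{ leaf nodes at depth } 6 .
```

Under these assumptions an average branching factor of 1.68 reproduces the 54 nodes reported for Pioneer.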

Intuitive chess playing – SYLPH

• SYLPH [Finkelstein and Markovitz, 1998]

• Extension of Morph (Levinson) and Chump (Gobet).

• Relies on move patterns (game position + played move).

• Game positions are represented as hypergraphs: nodes refer to squares and edges to relations between them (including empty squares).

• Relations: a piece controls a square, a piece attacks directly (indirectly) another piece/square; double attack.

• More complicated relations – involving up to 4-5 nodes

(pieces/squares).

• Learning with a teacher

• a human,

• GNU Chess,

• copy of itself.

• Material patterns (ref. to capture moves): assigned weights

equal to the difference in material.

• Positional patterns (non-capture moves): weight proportional

to the frequency of using that move in played games.

• Augmentation process: observation of games played by two

copies of GNU Chess or analysis of grandmaster games from

game repository.

• SYLPH was equipped with the rules of chess

• 100 games against GNU Chess + augmentation (50 self-played games by GNU Chess), resulting in 4614 patterns

• Test (of intuitive play): find the best move in a given position

without search.

• Filter_tk: 1<= k <=10: selects k (best) moves.

• Filter_gf: 0.1<= f <=1.0: selects fraction f of (best) moves.

• A success = the best move is included in the selected set.

• A few thousand test positions generated from games played

between two copies of GNU Chess.

• For k=4, the efficiency equals 0.557

• For f=1/3, SYLPH was superior to alpha-beta with d=4 (!) and

material-based evaluation function.

• On 3830 positions extracted from 100 games of M. Tal (previously unknown to the system), results were similar to those of the above test on GNU Chess positions: good generalization properties.

• No retention mechanism
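The search-free test with Filter_tk and Filter_gf described above can be sketched as follows. This is not SYLPH's code: the move scores stand in for whatever the pattern weights would produce, rounding the fraction f up to a whole number of moves is an assumption, and the helper name `success_rate` is made up for the illustration.

```python
import math


def filter_tk(scored_moves, k):
    """Keep the k highest-scored moves from (move, score) pairs."""
    ranked = sorted(scored_moves, key=lambda pair: pair[1], reverse=True)
    return [move for move, _ in ranked[:k]]


def filter_gf(scored_moves, f):
    """Keep the best fraction f of the moves (at least one move)."""
    k = max(1, math.ceil(f * len(scored_moves)))
    return filter_tk(scored_moves, k)


def success_rate(test_cases, move_filter):
    """Fraction of positions whose reference best move survives the filter."""
    hits = sum(best in move_filter(scored) for scored, best in test_cases)
    return hits / len(test_cases)


if __name__ == "__main__":
    # Made-up scores for one position; not data from the talk.
    scored = [("e4", 0.9), ("d4", 0.7), ("a3", 0.1), ("h4", 0.05)]
    cases = [(scored, "d4"), (scored, "a3")]
    print(filter_tk(scored, 2))
    print(filter_gf(scored, 0.5))
    print(success_rate(cases, lambda s: filter_tk(s, 2)))
```

A success is counted exactly as on the slide: the reference best move only has to be somewhere in the selected set, not ranked first.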

Summary of the main facets

of cognitive playing

The main facets of cognitively-plausible

board game playing systems

• Allen Newell (1990) introduced several criteria that

define cognitive systems, which refer mainly to their

behavioral, learning, and knowledge-related

properties.

• Duch et al. (2008) proposed a simplified taxonomy of

cognitive systems based on two main criteria:

• LEARNING

• MEMORY

Learning-related postulates

• P1: Learning should be implemented as an incremental

development process.

• P2: Learning should be implemented in a parallel (multitasking) mode.

• P3: Learning system should be capable of suitable

decomposition of game patterns into meaningful

subpatterns without the need for external intervention.

• P4: Learning process should be performed on several

levels of detail.

Memory-related postulates

• P5: Game-related concepts may be effectively

represented and processed with the use of

pattern-based representation.

• P6: Knowledge acquired in the learning process

should be represented in a hierarchical structure

with various inter- and intra-level connections.

• P7: Acquisition of knowledge as well as its further

processing in the system should take into account

symmetries existing in the game.
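One way to read P7 is that positions equivalent under a board symmetry should share a single stored pattern. The sketch below assumes a square board with the full 8-element symmetry group (rotations and reflections), which fits games such as Othello or Go but not chess in general. The function names are hypothetical.

```python
def rotate90(board):
    """Rotate a square board (tuple of row tuples) 90 degrees clockwise."""
    return tuple(zip(*board[::-1]))


def mirror(board):
    """Reflect the board left to right."""
    return tuple(row[::-1] for row in board)


def symmetries(board):
    """All 8 images of the board under rotations and reflections."""
    images = []
    current = tuple(tuple(row) for row in board)
    for _ in range(4):
        images.append(current)
        images.append(mirror(current))
        current = rotate90(current)
    return images


def canonical(board):
    """Pick one representative so that equivalent positions share a key."""
    return min(symmetries(board))


if __name__ == "__main__":
    position = (("X", "."), (".", "."))
    print(canonical(position) == canonical(rotate90(position)))
```

Storing knowledge under the canonical key means a pattern learned in one orientation is reused for all symmetric variants.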

General conclusions

• Arguments for the potential virtues of cognitively-plausible, pattern-based playing systems. Several concepts which, in our opinion, should underlie such systems were presented.

• Certainly, the cognitively-plausible methods are by no means competitive with established AI approaches, but efficacy should not be the sole goal of game research.

Thank you for your attention!

J. Mańdziuk, Knowledge-Free and Learning-Based Methods in Intelligent Game Playing, Studies in Computational Intelligence, vol. 276, Springer-Verlag, 2010

J. Mańdziuk, Towards cognitively-plausible game playing systems, IEEE Computational Intelligence Magazine, 6(2), May 2011, in press
