
Machine Learning of Player Strategy in Games

Thomas Fletcher

BSc in Computer Science with Mathematics

May 6, 2010

Supervisor: Dr. J. Bryson

This dissertation may be made available for consultation within the University Library and may be photocopied or lent to other libraries for the purposes of consultation.


Machine Learning of Player Strategy in Games

Submitted by: Thomas Fletcher

Copyright

Attention is drawn to the fact that copyright of this dissertation rests with its author. The Intellectual Property Rights of the products produced as part of the project belong to the University of Bath (see http://www.bath.ac.uk/ordinances/#intelprop). This copy of the dissertation has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the dissertation and no information derived from it may be published without the prior written consent of the author.

Declaration

This dissertation is submitted to the University of Bath in accordance with the requirements of the degree of Bachelor of Science in the Department of Computer Science. No portion of the work in this dissertation has been submitted in support of an application for any other degree or qualification of this or any other university or institution of learning. Except where specifically acknowledged, it is the work of the author.

.........................................


Abstract

In this project I explore the application of machine learning techniques and classifiers to the games of 'Noughts and Crosses' and the ancient Chinese game 'Go'. Naive Bayes classifiers and other techniques are used to classify player strategy from board positions. Computer players are considered first and are identified with a high level of accuracy; the same techniques are then applied to games played between pairs of human players, achieving around 30% accuracy when classifying between 24 different players.


Contents

1 Introduction
    1.1 Overview
    1.2 Document Structure
    1.3 Plan
        1.3.1 Gathering of data and strategies
        1.3.2 Playing the game
        1.3.3 Games

2 Background Knowledge
    2.1 Game Theory
    2.2 Algorithms for Playing Games
    2.3 Learning and Inference in Humans
    2.4 Machine Learning
        2.4.1 Probability & Bayes
        2.4.2 Strategy Categorisation
        2.4.3 Data Mining & Classification
            2.4.3.1 Naive Bayes Classifier
            2.4.3.2 Bayesian Belief Network
            2.4.3.3 Adapting Weights
    2.5 Other Machine Learning Techniques Used
        2.5.1 K-fold Cross-Validation
        2.5.2 Selection of Attributes
        2.5.3 Boosting
    2.6 Software Used
        2.6.1 RapidMiner
        2.6.2 Other Development Tools

3 Noughts and Crosses
    3.1 Game Description
        3.1.1 Analysis
    3.2 Machine Learning techniques applied to Completed Games
        3.2.1 Player Strategy
        3.2.2 Attributes to store
        3.2.3 Data Collection
        3.2.4 Model Generation
        3.2.5 Analysis Of Errors
            3.2.5.1 Identical moves from Different Players
            3.2.5.2 Naive Independence Assumption
        3.2.6 Improving Classification through Attribute Selection
    3.3 Games in Progress
        3.3.1 Data Collection
        3.3.2 Model Generation
        3.3.3 Interactive Play

4 Go
    4.1 Game Description
        4.1.1 Game Rules
        4.1.2 Basic Strategy
            4.1.2.1 Opening (Fuseki)
            4.1.2.2 Midgame
            4.1.2.3 Endgame
        4.1.3 Analysis
    4.2 Tools
        4.2.1 GoGUI
        4.2.2 Smart Game Format
    4.3 Machine Learning techniques applied to Completed Games of Go
        4.3.1 Player Strategy
            4.3.1.1 Aya
            4.3.1.2 GNU Go
            4.3.1.3 MoGo
        4.3.2 Training Games
    4.4 Board Positions (First Attempt)
        4.4.1 Attributes to store
        4.4.2 Results
            4.4.2.1 Simple Naive Bayes
            4.4.2.2 Forward Selection
        4.4.3 Analysis of results
    4.5 Board Positions — Unbiased
        4.5.1 Attributes to store
        4.5.2 Results
            4.5.2.1 Simple Naive Bayes
            4.5.2.2 Forward Selection
            4.5.2.3 ADA Boost
        4.5.3 Analysis Of Results
            4.5.3.1 GNU Go 1 and GNU Go 10
            4.5.3.2 MoGo
            4.5.3.3 High Accuracy
            4.5.3.4 Forward Selection
    4.6 Scores
        4.6.1 Attributes to Store
        4.6.2 Inbuilt GoGUI Scoring Engine
        4.6.3 Results
            4.6.3.1 Simple Naive Bayes
            4.6.3.2 Forward Selection
            4.6.3.3 ADA Boost
        4.6.4 Analysis of Results
            4.6.4.1 Comparison with board attributes set
            4.6.4.2 Main Source Of Errors
            4.6.4.3 Forward Selection
            4.6.4.4 AdaBoost
    4.7 Combined Boards & Scores
        4.7.1 Results
            4.7.1.1 Simple Naive Bayes
            4.7.1.2 Forward Selection
            4.7.1.3 ADA Boost
            4.7.1.4 Aya, MoGo and GNU Go 1 only
            4.7.1.5 GNU Go players only
        4.7.2 Analysis of Results
            4.7.2.1 Combined Scores and Boards
            4.7.2.2 Forward Selection
            4.7.2.3 GNU Go confusion
    4.8 Better Classifiers for GNU Go

5 Go & Humans
    5.1 Human Data
    5.2 Classifying Humans as Computers
    5.3 Classifying Humans as Humans
        5.3.1 Results
        5.3.2 Analysis
            5.3.2.1 Comparison with Computer Players
            5.3.2.2 Forward Selection
    5.4 Classifying Computers as Humans

6 Discussion of Implementation
    6.1 Data Generation
    6.2 Additional Computer Players
    6.3 Large Data Sets
    6.4 Model Generation Time
    6.5 Noughts and Crosses: PerfectPlayer
    6.6 Programming Style
    6.7 Modifications Made and Coding Done
        6.7.1 Noughts and Crosses implementation
        6.7.2 Modifications made to GoGUI
        6.7.3 BugFixes and Modifications made to RapidMiner

7 Conclusion & Future Work
    7.1 Conclusion
    7.2 Future Work
        7.2.1 Additional Attributes
        7.2.2 More Advanced Classifier
        7.2.3 Clustering
        7.2.4 Application

Bibliography

Appendices

A Source Code
    A.1 Noughts and Crosses
    A.2 GoGui Modifications


List of Figures

2.1 Example Bayesian Network
2.2 Example of overfitting a line
4.1 Stone Efficiency in Opening
4.2 Example Of a Ladder
4.3 Example Of a Net
4.4 Distribution graph from generated model for attribute 10_Q16
4.5 Distribution graph from unbiased generated model for attribute 10_Q16
4.6 Intersections selected by Forward Selection
4.7 Scores Forward Selection Progress Graph
4.8 First classifier distribution
4.9 Second classifier distribution
4.10 Boards & Scores Forward Selection Progress Graph
4.11 Distribution Graph for 300_capturedWhite


List of Tables

3.1 Results of completed-game Noughts and Crosses classifier
3.2 Results of completed-game selected attribute Noughts and Crosses classifier
4.1 Results of board-position-only Go classifier
4.2 Results of board-position-only Go classifier after Forward Selection
4.3 Games played between players
4.4 Computer Player winners
4.5 Results of board-position-only Go classifier
4.6 Results of board-position-only Go classifier after Forward Selection
4.7 Results of board-position-only Go classifier after Boosting
4.8 Results of scores-only Go classifier
4.9 Results of scores-only Go classifier after Forward Selection
4.10 Results of scores-only Go classifier after Boosting
4.11 Weights in AdaBoost Model
4.12 Results of scores and boards Go classifier
4.13 Results of scores and boards Go classifier using Forward Selection
4.14 Results of scores and boards Go classifier using ADABoost
4.15 Results of scores and boards Go classifier between Aya, MoGo and GNU Go 1
4.16 Results of scores and boards Go classifier between GNU Go 1 and GNU Go 10
5.1 Games played between human players
5.2 Human Games Classified as Computer Players
5.3 Computer Games Classified as Human Players


Chapter 1

Introduction

1.1 Overview

Finding strategies in games is an activity that goes back as long as people have been playing games themselves. A major part of strategy in many games involves guessing an opponent's strategy so that any weaknesses can be exploited and their future moves can be predicted more accurately. I aim to produce a system that will analyse an opponent's game-playing technique by observing them playing a game, and will then attempt to identify other games played by this player: recognising a player by their strategy. Much previous research has focused on producing a provably optimal strategy (see section 2.1); however, a strategy that is not provably optimal can still be used to play, and to win, in games for which no provably optimal strategy is known.

1.2 Document Structure

The structure of this document chiefly reflects the order in which I performed the work. This report begins with the original plan for the project, then looks at related previous work with a review of classifiers and machine learning techniques. The use of these techniques is then explored using the simple game of Noughts and Crosses, where several strategies are classified with high accuracy. The techniques are then applied to the more complex game of Go, and a classifier which can identify several different computer players is developed. Lastly, the same techniques are applied to Go games played between 24 different human players on an online Go server. The final chapters of this report detail problems I encountered and their solutions, and possible future work which could develop this technique further.


1.3 Plan

1.3.1 Gathering of data and strategies

For many games, there is already a large corpus of games available which could be used to train the system. The advantage of using this data is that it comes from a large variety of sources, so it is likely to be varied and to include many possible game strategies. Using it as training data would reduce the chance of the system encountering a strategy it had never seen before, and would mean that it was more likely to perform well against all kinds of strategies. The disadvantage is that a corpus of games is unlikely to come with strategies included, and these would have to be classified manually by a human (i.e. me). This would take a large amount of time and is likely to be very error prone, as identifying exactly what strategy a player was using requires consideration of many moves and will never be 100% certain (hence the reason we are trying to design a system to do this classification automatically!).

An alternative method of collecting game data would be to generate it. Since reasonable computer players exist for most common games, it should be possible to adapt some of them (or write my own) to play with a specific strategy, and simply log these games. This should allow a large number of games to be played, and a lot of data to be collected in a (comparatively) short time. The strategies which both players were using could also be logged, providing training data that includes the target classes for the classifier. Automated game logs should also be noise-free, whereas human-generated data would be likely to contain at least some noise as players use the wrong strategy or subconsciously select moves based on criteria other than those the strategy dictates.

1.3.2 Playing the game

Once the opponent's strategy has been identified, the system will then need to be able to take advantage of this knowledge. We can use the training data again by considering which strategies were most successful against the opponent's strategy in past games. For example, if a cautious player is often beaten by a player who plays aggressively, then the system can take advantage of this fact and use an aggressive strategy whenever it determines that the opponent is playing cautiously.

Since successful computer players already exist for most games, it makes sense to use one of these as a base, and then adjust it according to the strategy chosen. This could perhaps be achieved by adjusting the weights for certain moves, or by changing the evaluation function to favour particular events.

1.3.3 Games

I start out with a simple game, Noughts and Crosses (also known as Tic-Tac-Toe), and a few specific computer strategies whose games can be recorded. This simple game should be easy to simulate, and thus to create a large supply of training data to analyse. It is also a trivially simple game, so I should be able to investigate and find the cause of any errors or problems on which the classifier consistently fails.

I then consider the ancient Chinese board game 'Go'. This game is rich in strategy and very complex: current computer players find it difficult, and the best of them are regularly beaten by human amateurs. The basic rules, however, are very simple (unlike Chess, which has many different pieces making different moves), which I hope will make it easier to produce an accurate classifier.


Chapter 2

Background Knowledge

Artificial Intelligence has previously been used in game playing. The main areas of research have been game theory, heuristic solutions and algorithms for playing, and machine learning to learn and improve a strategy over time. Each of these is considered in this chapter, followed by an introduction to probability and the techniques that will be used throughout this project. Probability is discussed because it allows us to express levels of certainty about particular strategies, and provides a foundation for the Bayesian modelling and categorisation introduced in the final section of this chapter.

2.1 Game Theory

Game Theory is used to study games mathematically. Typically, several players are modelled and the payoffs for each possible action or "move" are represented in a matrix. Game Theory typically assumes that all players play to maximise their payoff (i.e. score the highest) and that they are perfectly rational and do not make mistakes. It is concerned with finding equilibria, situations where all players maximise their payoffs, and so can be used to analyse and design auctions and voting systems (Roth, 2002; Laffont, 1997).

Game Theory and fast computers have been used to solve some games completely. This means that there is a perfect strategy which will allow a player to win or force a draw under any circumstances. Noughts and Crosses is a well-known example of a solved game: perfect play from one player will result in a draw against another perfect opponent, or a win if the opponent makes a mistake. Connect-4 was solved in 1988 (Allis, 1988), and Draughts in 2007 (Schaeffer et al., 2007); the latter is the largest game solved to date.

Game Theory does not have much relevance to my project, and is included here only for completeness and to describe the concept of a solved game. It is important to note that many games have not been solved, and are unlikely to be solved in the near future. This is due to the time and effort required to solve a game: many games need all (or many) permutations of play to be simulated in a brute-force manner. For games such as Chess, which has around 60 moves in a game and 40 possible moves per turn, this is infeasible for modern-day computers. The case is similar for many other games which have a large search tree (Müller, 2002; Schaeffer and Van den Herik, 2002). Go, the larger game examined in this project, is one of the most complex games in existence, with an average of 200 possible moves per turn and a game length of around 200 moves. This provides a purpose for my project: it will hopefully help to improve algorithms which do not depend on Game Theory and solved games to play competently.

2.2 Algorithms for Playing Games

Several games have had strategies and algorithms produced by humans which are capable of playing competently and beating average human opponents.

One of the first chess algorithms written for a computer, as described by Shannon (1950), uses a brute-force search of every possible move for the next 3 moves for each side. It uses an evaluation function of the board position to rank the moves, and then chooses the move which maximises this evaluation function (i.e. gives the most favourable board position). This is a very simple algorithm which would perform slowly, and more complex algorithms and methods of pruning the search tree have since been developed (Campbell et al., 2002; Fürnkranz, 1996). Note that this algorithm does not take the opponent's strategy into account explicitly: the current board position is the only factor which affects what move the algorithm will make.
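To make the idea concrete, here is a minimal sketch of such a fixed-depth search, assuming both sides alternately maximise and minimise a shared evaluation function (i.e. minimax). The Board and Move types are hypothetical placeholders; nothing here is taken from Shannon's paper or from this project's source.

```java
import java.util.List;

// Hypothetical game abstraction: positions, legal moves, and a static
// evaluation from the root player's point of view.
interface Move {}
interface Board {
    List<Move> legalMoves();
    Board apply(Move m);
    double evaluate();
}

final class BruteForceSearch {
    // Value of a position searched to a fixed depth: the side to move
    // alternately maximises and minimises the evaluation. Terminal-position
    // handling is omitted for brevity.
    static double search(Board b, int depth, boolean maximising) {
        if (depth == 0) return b.evaluate();
        double best = maximising ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
        for (Move m : b.legalMoves()) {
            double v = search(b.apply(m), depth - 1, !maximising);
            best = maximising ? Math.max(best, v) : Math.min(best, v);
        }
        return best;
    }

    // Pick the root move leading to the best searched value.
    static Move bestMove(Board b, int depth) {
        Move best = null;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (Move m : b.legalMoves()) {
            double v = search(b.apply(m), depth - 1, false);
            if (v > bestValue) { bestValue = v; best = m; }
        }
        return best;
    }
}
```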

These algorithms generally play games with a good level of proficiency: good enough to beat novices and intermediate players, although perhaps not good enough to beat experts (the best chess masters still beat the best computer players in some games). I will look at and analyse some of these computer players for playing Go in chapter 4.

Moriarty and Miikkulainen (1995) achieved success training neural networks to discover complex strategies for the board game Othello; the paper claims that 'a complex strategy which had eluded experts for years was evolved in a week'. Neural networks show promise in learning to play games intelligently; however, they are often very black-box-like: it is difficult to see exactly how they work. The networks simply produce an output, so it is difficult for a human operator to discover any reasoning behind their decisions. Neural networks also tend to require a lot of CPU time to generate: the Othello network populations typically took 24 hours of CPU time for the genetic algorithms to evolve a network with significant behaviour. This paper shows that it is possible for machine learning techniques to develop strong players capable of consistently winning games. While this is not directly what my project is about, the classification of player strategies could help to develop more advanced computer players in the future.


2.3 Learning and Inference in Humans

Tenenbaum et al. (2006) suggest that humans learn according to some Bayesian model (Tenenbaum, 1999). Humans are capable of generalising from a few examples (rather than needing to see every possibility or combination of variables) to the probabilities of future events or unseen variables. This is similar to a Bayesian network (or Naive Bayes), which splits the probabilities over the different random variables, allowing it to forecast probabilities for unseen combinations of those variables. Human inference is obviously much stronger (we have many more past memories to draw on, and additional techniques), but these similarities are good news for this project, because they suggest that it will be possible for the classifier to correctly diagnose an opponent's strategy using these techniques.

2.4 Machine Learning

Machine Learning has been used in several different games to try to create winning strategies or more intelligent players. These methods differ from game theory in that they do not generally consider perfect solutions which can be mathematically proven to win, and from simple algorithms and heuristics in that they adapt over time depending on how well a solution works.

In this section I introduce several machine learning techniques which will be used throughout the project. I begin with an introduction to probability and Bayesian statistics, which underpin probabilistic machine learning, then describe exactly what categorisation is. Several different classifiers, and how they work, are described in detail.

2.4.1 Probability & Bayes

Much of machine learning focuses on probability. Without probability, we are restricted to statements such as

X ⇒ Y

which means that whenever X is true, Y must also be true. While this is useful in many situations, such as mathematics or some logic, it is difficult to express any level of doubt or 'likelihood'. For example, most sheep are white, but it is incorrect to always assume that sheep ⇒ white, because there is also the possibility that a sheep is black. If we include probabilities in our model, we can start to describe relationships between variables in terms of their likelihood, or our confidence in them. For example, we can say that 99% of the time, sheep ⇒ white.

We will use standard probability notation throughout this paper. P(A = a) means the probability that some variable A has the value a. For example, P(coinFlipResult = heads) = 0.5 means that the chance that a flipped coin lands heads-up is 0.5, or 50%. Where it is unambiguous, the variable name will sometimes be dropped, so that P(heads) = P(coinFlipResult = heads) = 0.5. A is known as a random variable. If A can take n values, a_1, a_2, \ldots, a_n, then the probabilities of A being equal to each value sum to 1:

$$\sum_{k=1}^{n} P(A = a_k) = 1$$

Joint probability, denoted by P(A and B) or P(A, B), is the probability of A and B both happening. For example, if we draw a single card from a pack of cards, then P(Clubs, Ace) = 1/52, because there is only one card in the pack which is both a Club and an Ace, namely the Ace of Clubs.

We also use P(A = a | B = b) to denote conditional probability: the probability that A = a given that we know B = b. For example, if we roll 2 dice and observe that the first dice lands on 4, then P(sumOfDice = 9 | firstDice = 4) = 1/6. Formally, P(A|B) is defined by:

$$P(A \mid B) = \frac{P(A, B)}{P(B)}$$

We can manipulate this equation to give the product rule, which states that:

$$P(A, B) = P(A)\,P(B \mid A)$$

i.e. the probability of both A and B happening is equal to the probability of A happening, multiplied by the probability of B happening given that A happens. From this, we can derive Bayes' Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Bayes' Theorem can be used to find probabilities for certain variables, given that we know the values of others. This is very useful in machine learning, since we often want to find the likelihood that something is true, given that we have observed other variables (Bishop et al., 2006).

We discuss probability here because it allows us to express levels of certainty about particular strategies. When playing against an opponent, we will rarely be certain about their strategy, since we can only see their past moves. Probability allows us to state quantitatively how certain we are about our assumptions. It also gives us the mechanism to use identities and mathematical formulae (e.g. Bayes' Theorem, described above) to reason about probabilities and certainties, and to use some of the classification techniques described in the next section.
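As a small worked example with hypothetical numbers (the strategies named here are illustrative, anticipating the Noughts and Crosses players of chapter 3): suppose a corner-preferring strategy opens in a corner with probability 0.9, a random strategy does so with probability 4/9, and the two strategies are equally likely a priori. After observing a corner opening, Bayes' Theorem gives:

$$P(\text{corner strategy} \mid \text{corner move}) = \frac{P(\text{corner move} \mid \text{corner strategy})\,P(\text{corner strategy})}{P(\text{corner move})} = \frac{0.9 \times 0.5}{0.9 \times 0.5 + \frac{4}{9} \times 0.5} \approx 0.67$$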


2.4.2 Strategy Categorisation

Some progress has also been made in identifying an opponent's strategy. He et al. (2008a) produced a game of Pacman which was then deliberately played by different human players using 3 different strategies. Various attributes of the game were measured (such as 'distance to ghost', 'pellets eaten', 'ghosts eaten') and used to train a classifier. Further games were then presented to the classifier and classified into one of the 3 strategies that the player had been using. The classifiers generally had high levels of success (around 90% of games were correctly classified).

Dead-End (a real-time predator/prey game) has been modelled similarly, and strategies identified using the same methods of pattern recognition (He et al., 2008b). Two classification techniques were used in these papers, the naive Bayes classifier and the Bayesian Belief Network; both are outlined in the Classification section below.

2.4.3 Data Mining & Classification

Since this system will be analysing lots of data in the form of game logs, it is a form of data mining. Data mining is the act of processing data to find patterns or classes in it. It is frequently used in marketing and fraud detection, to identify consumer spending patterns or anomalies. We shall look at classifiers, which are one part of machine learning. A classifier takes a set of n attributes (an observation), F = [F_1, F_2, \ldots, F_n], and chooses a class C for it, where the class is drawn from a discrete set of labels. Typically, a class will be a group or category which should depend on F.

In machine learning, a classifier is typically trained using training data: a set of observations for which the classes are known. The parameters of the classifier are adjusted until it produces the correct classification for most (ideally, all) of the training observations. In the Pacman example from He et al. (2008a) described above, the known attributes are the observations from the game (distance to ghosts, pellets eaten, score, etc.), and the class being predicted is the strategy the player was using, one of A, B or C. Below, I describe several classifiers which will be used throughout this project.

2.4.3.1 Naive Bayes Classifier

A naive Bayes classifier assumes that the attributes F_i are independent, i.e. that a change in F_i will not affect F_j; formally, ∀i, j: P(F_i | F_j) = P(F_i). It uses the product rule for probabilities and Bayes' rule to reduce the conditional distribution over the class to a series of n multiplications. It requires that a probability be specified for each P(F_i | C); then, given all of the F_i, it is possible to compute a probability for each C, and so decide which class is most likely.

When deriving these P(F_i | C), there are 2 cases: either F_i is a discrete attribute which will always be set to one of a list of several possible values, or F_i is a continuous attribute which can take any number. An example of a nominal attribute might be 'winner', in which case the possible values are 'black' or 'white', or a boolean attribute which can only be 'True' or 'False'. An example of a continuous attribute might be 'speed of a car', where the value could take any realistic value (say, '0mph' to '200mph'). For the purposes of generating a naive Bayes classifier, we will treat any textual attribute such as 'winner' or 'colour of piece in position A1' as discrete, and any numerical attribute such as 'score', 'number of pieces on the board' or 'ratio of black:white pieces' as continuous, even though some of them may be restricted to a discrete set of integers and not strictly be continuous. The reason for this is that the classifier assumes a Gaussian distribution for P(F_i | C) on the numerical attributes, which helps to reduce overfitting and reduces the number of training examples needed, since intermediate values can be interpolated. Obviously, we can only assume a Gaussian distribution for numerical attributes where the attribute represents a range of 'nearby' values, not when it represents a numerical label or class, but we avoid this problem here since no numerical labels are used in this project.

The Gaussian distribution assumes that, for each attribute and class, the values all lie around a specific value (the mean) and are normally distributed with a certain variance. For example, if we were classifying humans into the classes 'male' and 'female' based on their physical attributes, then the distribution of the 'height' attribute for the male class might be centred at 175cm with a standard deviation of 8.8cm. To calculate the likelihood term for a person who is 180cm tall under the male class, that is, P(Height = 180cm | Sex = male), the density value can be read off the Gaussian distribution graph.

Model Generation. For discrete attributes, we need to generate P(F_i | C) for each possible value of F_i and C. That is, we calculate P(F_i = s_j | C) for each i, j and C (where s_j denotes one of the possible discrete values of F_i). This value is derived from the training set by finding:

$$P(F_i = s_j \mid C) = \frac{\text{number of examples with class } C \text{ where } F_i = s_j}{\text{number of examples with class } C}$$

These values of P(F_i | C) are then stored in a distribution table and used for classifying as described below.

For numerical attributes, our naive Bayes algorithm assumes that the probability distribution is Gaussian (or Normal). This allows us to estimate probabilities for 'unseen' values of the attribute, reducing the amount of training data needed. It also helps to 'smooth over' any outlying points, and produces a more consistent model which is less sensitive to minor changes in attributes. A Gaussian distribution must therefore be produced for each attribute and class, i.e. a Gaussian curve must be produced to describe each P(F_i | C). A mean and a standard deviation are all that are needed to define a Gaussian distribution unambiguously, and these are calculated as follows:

If we denote the R training examples as T_{ri}, where r refers to the particular example (or record, or row) and i = 1, \ldots, I is the attribute index (or column), and there are L examples for each class C, then:

15

Page 19: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

$$\text{Mean:}\quad \mu_{iC} = \frac{1}{L}\sum_{r} T_{ri}$$

$$\text{Standard deviation:}\quad \sigma_{iC} = \sqrt{\frac{\sum_{r} T_{ri}^{2} \;-\; \frac{1}{L}\left(\sum_{r} T_{ri}\right)^{2}}{L}}$$

where both sums run over the L training examples belonging to class C.

At classification time, each distribution P(F_i | C) can then be calculated from the mean and standard deviation using the standard Gaussian density formula:

$$P(F_i = x \mid C) = \frac{1}{\sqrt{2\pi\sigma_{iC}^{2}}}\; e^{-\frac{(x - \mu_{iC})^{2}}{2\sigma_{iC}^{2}}}$$
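A one-pass sketch of this fitting step (a hypothetical helper, not quoted from the project's code), using the sum and sum-of-squares form of the variance given above:

```java
// Fits mu_{iC} and sigma_{iC} for one attribute/class pair.
// `values` holds attribute i's value in each of the L training
// examples belonging to class C.
final class GaussianFit {
    static double[] meanAndStdDev(double[] values) {
        int L = values.length;
        double sum = 0, sumSq = 0;
        for (double t : values) { sum += t; sumSq += t * t; }
        double mean = sum / L;
        double variance = (sumSq - sum * sum / L) / L;  // population variance
        return new double[] { mean, Math.sqrt(variance) };
    }
}
```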

Classification. We use the definition of conditional probability and the chain rule to derive:

$$P(C \mid F_1, \ldots, F_n) = \frac{P(C, F_1, \ldots, F_n)}{P(F_1, \ldots, F_n)} = \frac{P(C)}{P(F_1, \ldots, F_n)} \prod_{i=1}^{n} P(F_i \mid F_{i-1}, F_{i-2}, \ldots, F_1, C) = \frac{P(C)}{P(F_1, \ldots, F_n)} \prod_{i=1}^{n} P(F_i \mid C)$$

where the final step applies the naive independence assumption.

This allows us to compute a probability for each class for a given observation of all the F_i. A classifier can be constructed by calculating the probability of each class and then choosing the class which is most probable. Formally, this means finding:

$$\mathrm{classify}(f_1, \ldots, f_n) = \operatorname*{arg\,max}_{c}\; P(C = c) \prod_{i=1}^{n} P(F_i = f_i \mid C = c)$$

For a given observation, P(f_1, \ldots, f_n) is constant, so it has been omitted from the formula here.
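Putting the pieces of this section together, the following is a minimal sketch of the classification step for continuous attributes only. The layout (maps of priors, means and standard deviations) is an assumption for illustration, not the structure RapidMiner uses; log-probabilities are summed rather than multiplying densities directly, which avoids numerical underflow when there are many attributes.

```java
import java.util.Map;

final class GaussianNaiveBayes {
    final Map<String, Double> priors;    // class -> P(C = c)
    final Map<String, double[]> means;   // class -> mu_{iC} per attribute i
    final Map<String, double[]> stdDevs; // class -> sigma_{iC} per attribute i

    GaussianNaiveBayes(Map<String, Double> priors,
                       Map<String, double[]> means,
                       Map<String, double[]> stdDevs) {
        this.priors = priors; this.means = means; this.stdDevs = stdDevs;
    }

    // Gaussian density used for P(F_i = x | C = c).
    static double gaussian(double x, double mu, double sigma) {
        double d = (x - mu) / sigma;
        return Math.exp(-0.5 * d * d) / (Math.sqrt(2 * Math.PI) * sigma);
    }

    // argmax over c of  log P(C=c) + sum_i log P(F_i = f_i | C=c).
    String classify(double[] f) {
        String best = null;
        double bestLog = Double.NEGATIVE_INFINITY;
        for (String c : priors.keySet()) {
            double logP = Math.log(priors.get(c));
            for (int i = 0; i < f.length; i++)
                logP += Math.log(gaussian(f[i], means.get(c)[i], stdDevs.get(c)[i]));
            if (logP > bestLog) { bestLog = logP; best = c; }
        }
        return best;
    }
}
```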

2.4.3.2 Bayesian Belief Network

A Bayesian Belief Network is a directed acyclic graph in which each node represents a particular attribute (such as 'pellets eaten' in the example above), and the edges represent causal relationships between them: if an edge connects A to B, then the value of A affects B. Probabilities are assigned to each node for each combination of its parents' values. The belief network can then be used to produce probabilities for any particular node given known values of other nodes.

An example of a Bayesian Network is shown in figure 2.1.

The network shows the relationship between rain, the sprinkler and the grass being wet. The arrow pointing from rain to the grass indicates that rain affects the chance of the grass being wet: if we know that it is raining, then the probability that the grass is wet changes (it rises). The same is true for the arrow from the sprinkler to the grass: if the sprinkler is currently running, then the grass is more likely to be wet. The third arrow, from rain to the sprinkler, indicates that the fact that it is raining affects whether the sprinkler is turned on. Looking at the sprinkler truth table, we can see that if it is raining then the sprinkler is much less likely to be on (a 0.01 chance of the sprinkler being on, compared to 0.4 if there were no rain); presumably the sprinkler has a 99% reliable rain sensor, or is human operated, which enables it to save water when it rains. A Bayesian network requires truth tables of this type for each node, with a dimension in the table for each parent. The example here considers only random variables with 2 possible values (true and false), but the theory generalises to any number of discrete values (and even to continuous variables, with some slight changes).

Once we have the network together with its truth tables, we can use it to predict variables, given that we know some others. We use G to denote 'the grass is wet', R to denote 'it is raining', and S to denote 'the sprinkler is on'. The joint probability function (the function we use to evaluate the chance of R, S and G all taking specific values) is P(G, S, R). Using the product rule, and removing the independencies (i.e. relationships between variables for which there are no arrows in the network), we can factorise this as:

$$P(G, S, R) = P(G \mid S, R)\, P(S, R) = P(G \mid S, R)\, P(S \mid R)\, P(R)$$

We can now use this to answer questions such as "What is the probability that the sprinkler is on, given that the grass is wet?"

$$P(S = T \mid G = T) = \frac{P(S = T, G = T)}{P(G = T)} = \frac{\sum_{R \in \{T, F\}} P(S = T, R, G = T)}{\sum_{R, S \in \{T, F\}} P(G = T, S, R)}$$

$$= \frac{0.2 \times 0.01 \times 0.99 \;+\; 0.8 \times 0.4 \times 0.9}{0.2 \times 0.01 \times 0.99 \;+\; 0.2 \times 0.99 \times 0.8 \;+\; 0.8 \times 0.4 \times 0.9 \;+\; 0.8 \times 0.6 \times 0.0} = \frac{0.28998}{0.44838} \approx 0.6467
$$

So if we see that the grass is wet and we do not know whether it is raining, we can say that the sprinkler has around a 65% chance of being on. Further examples and details are available online (e.g. Wikipedia.org, autonlab.org) and in textbooks on graphical models, data analysis and/or data mining.
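The same query can be answered mechanically by enumerating the joint distribution. A small sketch, using the probability tables from the worked example above (the class and method names are illustrative):

```java
final class SprinklerNetwork {
    // Truth tables from the worked example: P(R), P(S|R), P(G|S,R).
    static double pRain(boolean r) { return r ? 0.2 : 0.8; }
    static double pSprinkler(boolean s, boolean r) {
        double pOn = r ? 0.01 : 0.4;
        return s ? pOn : 1 - pOn;
    }
    static double pGrassWet(boolean g, boolean s, boolean r) {
        double pWet = (s && r) ? 0.99 : s ? 0.9 : r ? 0.8 : 0.0;
        return g ? pWet : 1 - pWet;
    }
    // Factorised joint: P(G,S,R) = P(G|S,R) P(S|R) P(R).
    static double joint(boolean g, boolean s, boolean r) {
        return pGrassWet(g, s, r) * pSprinkler(s, r) * pRain(r);
    }

    public static void main(String[] args) {
        // P(S=T | G=T) = sum_R P(G=T,S=T,R) / sum_{S,R} P(G=T,S,R)
        double num = 0, den = 0;
        for (boolean r : new boolean[]{true, false}) {
            num += joint(true, true, r);
            for (boolean s : new boolean[]{true, false}) den += joint(true, s, r);
        }
        System.out.println(num / den);   // prints ~0.6467
    }
}
```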

The advantage of a Bayesian Belief Network is that it allows relationships between variables to be modelled easily. It provides a framework for managing complex systems of dependence using Bayesian statistics. There are also algorithms for creating Bayesian networks from data. This is usually done in 2 steps: first the structure of the network is decided, then the probability values for the truth tables are derived. Once the network has been created, it can be used as above to derive probabilities for unknown variables (Charniak, 1991).

2.4.3.3 Adapting Weights

In many models, machine learning is used to find the best weights for a particular strategy; in Chess, for example, this may correspond to the points value assigned to each piece. This learning can be performed in several ways: either by using a training set and giving higher weights to pieces which played a role in winning games, or by using a genetic algorithm to produce several versions of the strategy with different weights, playing games with these strategies, and keeping and breeding the most successful ones (Kendall and Whitwell, 2001; Samuel, 1967). This technique works well for fine-tuning a strategy and deriving suitable weights or constants, but it is difficult to define an initial strategy this way, because there is nothing to start from. The technique has been used to optimise evaluation functions by Kendall and Whitwell (2001).

2.5 Other Machine Learning Techniques Used

Several other machine learning techniques are described in detail in this section and used later in this document. K-fold Cross-Validation is used to provide an accurate measure of a classifier's accuracy over training data, and of how much that accuracy varies. Attribute Selection and Boosting are used to enhance the performance of the classifiers described in the previous section. These techniques will be applied during the generation of classifiers to increase classification accuracy on the Noughts and Crosses and Go data in later chapters.


2.5.1 K-fold Cross-Validation

Validation is a method for evaluating the performance of a model, to show that it is classifying correctly, or accurately enough for use. It is not advisable to use the same data for training and testing, because this does not test for overfitting. Overfitting is a problem which occurs when a model becomes specific to the training data, and so does not generalise well to unseen data. An example could be a series of points which display linear correlation, with noise (see fig. 2.2). A polynomial curve may appear to fit the training set better because it passes through every point; however, it is actually fitting the noise, and a linear 'curve' would extrapolate better to points which were not in the training set. The same effect can be observed with more advanced models and classifiers, and in any situation where the model is tested purely against the training data.

The simplest method of validation is Split-Validation, in which the entire training set is partitioned into a smaller training set and a test set. The smaller training set is used to train a classifier, which is then tested on the test set. In this way, accuracy can be estimated by counting how many of the test-set elements were correctly classified.

K-fold Cross-Validation splits the entire training set into k subsets, S_1, \ldots, S_k. Stratified sampling is used to ensure that each subset contains the same proportion of examples of each class. k − 1 of these subsets are used for training, and the trained classifier is tested on the remaining subset. This process is repeated k times, with each S_i being used for testing in turn. After all k rounds have been completed, the performance vectors are combined to produce an average accuracy, along with a measure of how much the accuracy varied (e.g. X% +/− Y%).

Cross-Validation is used throughout this report to give accurate measurements of how well a classifier performs. Since the training and test sets are separated, there is no risk of bias towards models which tend to overfit. Because k − 1 of the k subsets are used in each round, the number of examples used for training is not significantly reduced, meaning that the models will not become much less accurate, even where the amount of training data is limited. Unless otherwise stated, when a classifier's accuracy is given in the form X% +/− Y%, the accuracy was determined using 10-fold Cross-Validation.
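A sketch of the mechanics, under the assumption of a hypothetical Example record and a trainer callback standing in for RapidMiner's learner operators:

```java
import java.util.*;
import java.util.function.Function;

final class CrossValidation {
    static class Example { String label; double[] attributes; }
    interface Classifier { String classify(Example e); }

    static double kFoldAccuracy(List<Example> data, int k,
                                Function<List<Example>, Classifier> trainer) {
        // Stratify: group examples by class, then deal each group round-robin
        // so every fold gets roughly the same class proportions.
        Map<String, List<Example>> byClass = new HashMap<>();
        for (Example e : data)
            byClass.computeIfAbsent(e.label, x -> new ArrayList<>()).add(e);
        List<List<Example>> folds = new ArrayList<>();
        for (int i = 0; i < k; i++) folds.add(new ArrayList<>());
        int next = 0;
        for (List<Example> group : byClass.values())
            for (Example e : group) folds.get(next++ % k).add(e);

        // Hold out each fold once; train on the remaining k-1 folds.
        double accuracySum = 0;
        for (int i = 0; i < k; i++) {
            List<Example> train = new ArrayList<>();
            for (int j = 0; j < k; j++) if (j != i) train.addAll(folds.get(j));
            Classifier model = trainer.apply(train);
            int correct = 0;
            for (Example e : folds.get(i))
                if (model.classify(e).equals(e.label)) correct++;
            accuracySum += (double) correct / folds.get(i).size();
        }
        return accuracySum / k;   // mean accuracy over the k rounds
    }
}
```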

2.5.2 Selection of Attributes

Selection of attributes is a technique used to extract a subset of attributes from the initial set, such that a more accurate classifier can be generated from the extracted set than from the entire set. There may be many reasons why a smaller number of attributes can produce a better classifier, but the chief one we shall see is that a naive Bayes classifier makes a 'naive' independence assumption: it assumes all attributes are independent of each other, so that a change in attribute F_i will not affect attribute F_j. In situations where this assumption does not hold, a group of dependent attributes can cause bias or oversensitivity to changing attributes, and decrease the accuracy of the model. A smaller number of attributes may also reduce the complexity of, or the time taken to generate, a model.

Figure 2.1: An Example Bayesian Network. (Creative Commons licensed; source: Wikipedia)

Figure 2.2: Example of overfitting a line. The blue polynomial trendline has overfitted the training set, and provides a worse estimate than the linear trendline for the red data points, which were not included in the training set.

Forward Selection is one such attribute-selection algorithm. It starts by taking each attribute in turn and generating and evaluating a model using the training set reduced to only that one attribute. The attribute which generates the most accurate model (say, F_i) is added to the result set. The process then repeats with round 2: each remaining attribute is taken in turn, and a model is generated and evaluated which takes both F_i and the attribute being tested as inputs. Again, the most successful attribute (F_j) is added to the result set. The process continues until a round produces no increase in accuracy, or all attributes have been added, and the result set F_i, F_j, \ldots, F_p is returned. Forward Selection can take a long time: for a training set initially consisting of N attributes, N models have to be generated and evaluated in each round. If K-fold Cross-Validation is used for the generation and evaluation, this increases to KN models per round. Since the maximum number of rounds is N, the maximum number of models which need to be generated and evaluated is KN².
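The loop is short enough to sketch directly; evaluate is a hypothetical callback that trains and cross-validates a classifier restricted to the given attribute subset and returns its accuracy:

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

final class ForwardSelection {
    static Set<Integer> select(int numAttributes,
                               Function<Set<Integer>, Double> evaluate) {
        Set<Integer> selected = new LinkedHashSet<>();
        double bestSoFar = 0.0;
        while (selected.size() < numAttributes) {
            int bestAttr = -1;
            double bestAcc = bestSoFar;
            // Try adding each remaining attribute to the current result set.
            for (int i = 0; i < numAttributes; i++) {
                if (selected.contains(i)) continue;
                Set<Integer> candidate = new LinkedHashSet<>(selected);
                candidate.add(i);
                double acc = evaluate.apply(candidate);
                if (acc > bestAcc) { bestAcc = acc; bestAttr = i; }
            }
            if (bestAttr == -1) break;   // no attribute improved accuracy: stop
            selected.add(bestAttr);
            bestSoFar = bestAcc;
        }
        return selected;
    }
}
```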

2.5.3 Boosting

Boosting is a method used to create a strong classifier from several weak classifiers. Several different boosting algorithms exist, but I will use AdaBoost (Adaptive Boosting), developed by Freund and Schapire (1995). AdaBoost is one of the most common boosting algorithms: it is simple, supports any weak classifier, and has many existing software implementations. Over a series of rounds, t = 1, \ldots, T, AdaBoost maintains a distribution of weights D which represents the importance of correctly classifying each example in the training set. In each round, a weak classifier h_t is generated which minimises the weighted error. The weights are then adjusted so that correctly classified examples have their weights reduced for subsequent rounds, while incorrectly classified examples have their weights increased. This has the effect that subsequent classifiers prioritise examples which were previously misclassified. The final classifier combines the classifications of each h_t, weighted appropriately, to produce a final classification.
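A sketch of that loop for the two-class case with labels +1/−1 (the classification problems in this project are multi-class, so this is a simplification; WeakLearner is a hypothetical callback standing in for whatever base classifier is boosted, and the weighted error of each round is assumed to lie strictly between 0 and 0.5):

```java
final class AdaBoost {
    interface Hypothesis { int classify(double[] x); }             // +1 or -1
    interface WeakLearner { Hypothesis train(double[][] xs, int[] ys, double[] weights); }

    static Hypothesis fit(double[][] xs, int[] ys, int rounds, WeakLearner learner) {
        int n = xs.length;
        double[] d = new double[n];
        java.util.Arrays.fill(d, 1.0 / n);            // D_1: uniform weights
        Hypothesis[] hs = new Hypothesis[rounds];
        double[] alpha = new double[rounds];

        for (int t = 0; t < rounds; t++) {
            Hypothesis h = learner.train(xs, ys, d);   // minimise weighted error
            double err = 0;
            for (int i = 0; i < n; i++)
                if (h.classify(xs[i]) != ys[i]) err += d[i];
            alpha[t] = 0.5 * Math.log((1 - err) / err); // vote weight for h_t
            hs[t] = h;

            // Reweight: shrink correctly classified examples, grow mistakes,
            // then renormalise so the weights again sum to 1.
            double z = 0;
            for (int i = 0; i < n; i++) {
                d[i] *= Math.exp(-alpha[t] * ys[i] * h.classify(xs[i]));
                z += d[i];
            }
            for (int i = 0; i < n; i++) d[i] /= z;
        }

        // Final classifier: sign of the alpha-weighted vote of all h_t.
        return x -> {
            double vote = 0;
            for (int t = 0; t < hs.length; t++) vote += alpha[t] * hs[t].classify(x);
            return vote >= 0 ? 1 : -1;
        };
    }
}
```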

2.6 Software Used

2.6.1 RapidMiner

RapidMiner is an open-source, GPL-licensed data mining application. It includes implementations of many machine learning algorithms as modules called operators, including classifiers, regression techniques, boosting and selection algorithms, and many preprocessing operators to prepare and filter data before it is used to generate a model. It also includes validation operators (such as k-fold cross-validation) and logging and graphing outputs which can be used to view models or the progress of a process. RapidMiner provides a GUI for building processes out of these operators, which made it easy to create or restructure processes, or switch between operators, without writing or refactoring significant amounts of code. Some freedom of configuration is lost by using the GUI rather than accessing the operators directly from code via the API, but I did not find that this hindered any of the processes I needed to build in this project.

RapidMiner includes a plugin which allows the WEKA (Hall et al., 2009) operators (models, classifiers, etc.) to be used within its processes. Weka is a similar collection of open-source data mining operators which can be used within its own GUI or accessed directly from Java code. Since RapidMiner includes these operators as a subset of its own, I chose RapidMiner over Weka.

I chose RapidMiner after reading several reviews of data mining applications. RapidMiner is freely available, actively developed, used by many individuals and businesses, and provides both a GUI and an API for integration into other Java programs. It also has the advantage of being open source, so I could look into exactly how the algorithms work and make changes if necessary.

2.6.2 Other Development Tools

I used the Java IDE 'Eclipse' for all Java development and debugging. Eclipse is actively developed, is the most popular Java IDE, is free and open source, runs on most platforms, and is the IDE with which I had the most experience.

Microsoft Excel was used extensively to view and edit the data collected throughout this project.


Chapter 3

Noughts and Crosses

As described in the introduction, I began by applying the machine learning techniques described so far to the simple game of Noughts and Crosses (also known as Tic-Tac-Toe). Although this game is fully understood, it provided a way for me to learn the concepts and to test whether the techniques would work for recognising strategy in such a simple game. Several computer strategies were created and played against each other, and the games were recorded. Several classifiers were then trained to differentiate between these strategies using the machine learning techniques described in chapter 2. The best classifiers correctly determined the player around 87% of the time on completed games, and around 80% of the time on games in progress.

3.1 Game Description

Noughts and Crosses is a 2-player game played on a 3x3 grid. One player is designated 'Noughts' (O) and the other 'Crosses' (X). The players take it in turns to place a piece in a currently empty space on the grid. The aim of the game is to get three of your pieces in a row, either vertically, horizontally or diagonally. When this happens, the game is over and the player with 3 in a row wins. If all 9 spaces of the board are filled without either player winning, the game is over and declared a draw. Throughout this chapter, we will refer to spaces or cells by their coordinates in the form (column, row), starting with (1,1) in the top left and ending with (3,3) in the bottom right.

3.1.1 Analysis

Noughts and Crosses is a game which is easy to solve (i.e. to produce a perfect strategy which will never lose). For this reason it is primarily played by young children. Once both players work out a perfect strategy, all games end in draws.


3.2 Machine Learning techniques applied to Completed Games

3.2.1 Player Strategy

I created 6 different players to play Noughts and Crosses:

• RandomPlayer: Plays randomly in an empty space each turn.

• RandCornerPlayer: Plays in a random corner each turn. If there are no corners empty, will play randomly.

• RandEdgePlayer: Plays on a random edge each turn. If there are no edges empty, will play randomly.

• CornerPlayer: Plays one of these in order of preference: Top-left, Top-right, Bottom-left, Bottom-right, random empty space.

• EdgePlayer: Plays one of these in order of preference: Top-middle, Middle-right, Bottom-middle, Middle-left, random empty space.

• PerfectPlayer: Plays such that it will never lose (will only ever win or draw).

I wrote a Java implementation of Noughts and Crosses which allowed me to plug in each of these computer players as either O's or X's. The starting player is decided randomly, with a 50% chance of either side playing first. The implementation also captures details about the board and game after each turn.
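The pluggable-player arrangement might look like the following sketch; the interface and class shown here are illustrative, and the actual names in the project's source (Appendix A) may differ:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Each strategy implements this interface; the game loop calls chooseMove
// on whichever player owns the current turn.
interface Player {
    // board[row][col] holds 'X', 'Y' (noughts) or 'e' (empty);
    // returns the chosen cell as {row, col}.
    int[] chooseMove(char[][] board);
}

// The simplest of the six strategies: play in any empty space.
class RandomPlayer implements Player {
    private final Random rng = new Random();
    public int[] chooseMove(char[][] board) {
        List<int[]> empty = new ArrayList<>();
        for (int r = 0; r < 3; r++)
            for (int c = 0; c < 3; c++)
                if (board[r][c] == 'e') empty.add(new int[]{r, c});
        return empty.get(rng.nextInt(empty.size()));
    }
}
```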

3.2.2 Attributes to store

I chose what details to record to try to give a representation of what a human sees when he/she looks at a board. A simple record of which player played first and an ordered list of the coordinates of moves would be sufficient to describe the game and make it possible to replay it (e.g. X;(1,1);(2,2);(1,3);(3,3);(1,2)). However, this is unintuitive and requires an understanding of the rules of the game in order to make sense. If a child who had never played a game of Noughts and Crosses before were to look at this representation of the game, they would find it much more difficult to understand the rules or learn to play than if they were looking at the traditional representation of the game. When a human watches a game in progress, they can observe several facts:

1. There are a total of 9 spaces

2. Pieces do not disappear or change once they have been placed

3. The O and X players take it in turns to make a move

4. Connecting 3 in a row (vertically, horizontally or diagonally) wins the game for that player


These are all important facts that are obscured by the use of a concise game record.

To address problem 2, I chose to record the state of the game after each move. Each of the following attributes was recorded for every turn. This more accurately models human observation of the game, as at any point the human player can see the current state of the entire board, not only the individual moves made throughout the game.

Each cell in the board was recorded as either ‘X’, ‘Y’¹ or ‘e’ (for empty). This means that all 9 cells in the board had their own attribute, and this made it easy to see that there are only 9 cells. These attributes are named according to their coordinates, e.g. 11=Top-left, 13=Top-right etc.

The number of O’s, X’s and empty spaces currently placed on the board was recorded.

Several Boolean attributes were also recorded: ‘2Xs’, ‘2Ys’, ‘3Xs’ and ‘3Ys’. ‘2Xs’ was set to true if there were 2 ‘X’s in a row (either vertically, horizontally or diagonally). This tries to encapsulate the connections / relationships between the positions of cells (i.e. that cell (1,2) is next to cell (1,1)) and the concept that lines of X’s and O’s are important in the game.

This gives a total of 16 attributes to be recorded for each turn. A game of Noughts and Crosses lasts for a maximum of 9 turns, so 16*9 = 144 attributes are needed to describe an entire game. The final list of attributes describing a game is:

1_11, 1_21, 1_31, 1_12, 1_22, 1_32, 1_13, 1_23, 1_33, 1_numXs, 1_numYs, 1_numEmptys, 1_2Xs, 1_2Ys, 1_3Xs, 1_3Ys, 2_11, 2_21, 2_31, 2_12, . . . , 9_3Xs, 9_3Ys.

where the ‘i_’ part at the start of the attribute name is the turn number that the attribute refers to. To describe a game in which 9 moves were not made (either because one player won after fewer than 9 moves, or because the game is not yet complete), the attributes for uncompleted turns are left empty.
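As an illustration of this encoding, the following sketch flattens a list of per-turn board snapshots into the 144 attribute values (a reconstruction for illustration only; in particular, it assumes ‘2Xs’ means a line containing exactly 2 X’s and no O, which is one possible reading of the definition above):

import java.util.List;

class GameSerialiser {
    // Flatten one game (one 9-cell board snapshot per turn) into
    // 9 x 16 = 144 comma-separated values; unplayed turns stay empty.
    static String toCsvRow(List<char[]> snapshots) {
        StringBuilder row = new StringBuilder();
        for (int turn = 0; turn < 9; turn++) {
            if (turn < snapshots.size()) {
                char[] b = snapshots.get(turn);
                for (char cell : b) row.append(cell).append(',');  // the 9 cell attributes
                row.append(count(b, 'X')).append(',')              // numXs
                   .append(count(b, 'Y')).append(',')              // numYs
                   .append(count(b, 'e')).append(',')              // numEmptys
                   .append(hasLine(b, 'X', 2)).append(',')         // 2Xs
                   .append(hasLine(b, 'Y', 2)).append(',')         // 2Ys
                   .append(hasLine(b, 'X', 3)).append(',')         // 3Xs
                   .append(hasLine(b, 'Y', 3)).append(',');        // 3Ys
            } else {
                row.append(",".repeat(16));                        // 16 empty attributes
            }
        }
        row.setLength(row.length() - 1);                           // drop the trailing comma
        return row.toString();
    }

    static int count(char[] b, char mark) {
        int n = 0;
        for (char c : b) if (c == mark) n++;
        return n;
    }

    // True if some row, column or diagonal holds exactly 'needed' stones
    // of the given mark and no opponent stones.
    static boolean hasLine(char[] b, char mark, int needed) {
        int[][] lines = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},{1,4,7},{2,5,8},{0,4,8},{2,4,6}};
        for (int[] line : lines) {
            int own = 0, other = 0;
            for (int i : line) {
                if (b[i] == mark) own++;
                else if (b[i] != 'e') other++;
            }
            if (own == needed && other == 0) return true;
        }
        return false;
    }
}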

3.2.3 Data Collection

Each of the 6 computer players was played against each of the 6 computer players (including itself) 100 times, for a total of 3600 games. A final ‘label’ attribute was added to the records of each of these games. This label consisted of a string: the name of the computer player playing as Crosses (e.g. ‘CornerPlayer’). This data generation created a comma-separated values file containing the training set of examples, which could be loaded into the data mining package ‘RapidMiner’ (see section 2.6.1) for processing. A preprocessing step to reduce the number of attributes was used: RapidMiner’s ‘Remove Useless Attributes’ operator was applied to the data to remove any attributes which had the same value in every one of the 3600 training examples. These attributes do not help to differentiate between classes since they are constant, so are ‘useless’ to the classifier and can be safely removed.

¹ ‘Y’ was used to represent ‘O’ to avoid confusion with 0 (zero), which was originally used for empty cells. O and Y are used interchangeably throughout this chapter.

3.2.4 Model Generation

I used 10-fold cross-validation with stratified sampling (see Section 2.5.2) on the attributes gathered from the training set of games to generate a Naive Bayes Classifier, which achieved an accuracy of 79.11% +/- 1.26%. This is substantially higher than would be expected from a random classifier (∼ 17% for 6 classes), and corresponds to correctly classifying around 4 out of 5 games. We can see from table 3.1 that the precision (fewest ‘false positives’) is highest for CornerPlayer, while the recall (most ‘true positives’) is highest for EdgePlayer.
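For reference, the decision rule being applied is the standard Naive Bayes rule described in chapter 2, restated here for this chapter's attribute set (nothing new is assumed beyond the independence assumption):

\hat{s} = \arg\max_{s \in S} \; P(s) \prod_{i=1}^{144} P(a_i \mid s)

where S is the set of 6 player strategies and a_1, ..., a_144 are the attributes recorded for a game.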


Table 3.1: Accuracy results of Naive Bayes Classifier applied to the training set of completed Noughts and Crosses games using 10-fold cross-validation. Accuracy: 79.11% +/- 1.26%

Prediction \ True    PerfectPlayer  EdgePlayer  RandomPlayer  CornerPlayer  RandEdgePlayer  RandCornerPlayer  Class Precision
PerfectPlayer              480           8            57            42             49                34           71.64%
EdgePlayer                   0         531             8             0             59                 0           88.80%
RandomPlayer               119           6           451            16             62                57           63.43%
CornerPlayer                 0           0             3           477              0                30           93.53%
RandEdgePlayer               0          55            43             0            430                 0           81.44%
RandCornerPlayer             1           0            38            65              0               479           82.16%
Class Recall            80.00%      88.50%        75.17%        79.50%         71.67%            79.83%

Table 3.2: Accuracy results of Naive Bayes Classifier applied to the training set of completed Noughts and Crosses games using 10-fold cross-validation after Forward Selection of 24 attributes. Accuracy: 86.83% +/- 1.56%

Prediction \ True    PerfectPlayer  EdgePlayer  RandomPlayer  CornerPlayer  RandEdgePlayer  RandCornerPlayer  Class Precision
PerfectPlayer              536           0            36             0              0                 0           93.71%
EdgePlayer                   0         593            12             0             82                 0           86.32%
RandomPlayer                43           0           355             0              2                12           86.17%
CornerPlayer                 1           0            10           600              0                62           89.15%
RandEdgePlayer               0           7            95             0            516                 0           83.50%
RandCornerPlayer            20           0            92             0              0               526           82.45%
Class Recall            89.33%      98.83%        59.17%       100.00%         86.00%            87.67%


3.2.5 Analysis Of Errors

3.2.5.1 Identical moves from Different Players

From table 3.1, we can see that there is a lot of misclassification between EdgePlayer and RandEdgePlayer, with 59 RandEdgePlayer games being incorrectly classified as EdgePlayer, and 55 EdgePlayer games being classified as RandEdgePlayer. This is to be expected, as RandEdgePlayer will often randomly play the exact same moves as EdgePlayer (i.e. RandEdgePlayer will randomly first choose cell (2,1), then (3,2) etc.). This makes it impossible to tell which player was playing simply from looking at the game record. Similar misclassifications are made between CornerPlayer and RandCornerPlayer. Confusion between RandomPlayer and the other players can also be explained in a similar way.

Due to the randomness of its moves, RandEdgePlayer could make the same moves as EdgePlayer would make in the same situation, making the actual player impossible to determine from the game record. If the opponent (Noughts) player were never to play in one of the edges, we might expect this to happen in 1 in 24 games (1/4 · 1/3 · 1/2 · 1/1 = 1/24), or around 25 times in our 600 training games played by RandEdgePlayer. However, we know that for 200 of these 600 games the opponent was EdgePlayer or RandEdgePlayer, which will also play in the edges, reducing RandEdgePlayer’s choice of edges to play in and making it more likely (probability of 1/16 + 1/6 = 11/48) that it will choose to make the same moves as EdgePlayer would. RandomPlayer will also sometimes play in edges, increasing the chance that RandEdgePlayer will make the same moves as EdgePlayer to ∼ 1/9 in those games, and PerfectPlayer will sometimes win the game after RandEdgePlayer has only made 2 moves, increasing the chance of identical play to 3/48 in the 100 games against PerfectPlayer. Putting these together gives us a lower bound for the number of games played by RandEdgePlayer which are identical to the moves EdgePlayer would make:

(11/48) · 200 + (1/24) · 200 + (3/48) · 100 + (1/9) · 100 ≈ 72

So we expect there to be more than 72 examples in the training set which are actually RandEdgePlayer, but playing the same moves as EdgePlayer. This means that even a perfect classifier could not tell these examples apart, and could do no better than guessing for these examples, so we should expect half (∼ 36 games) to be misclassified. We can see that our Naive Bayes classifier has actually incorrectly classified 59 examples of RandEdgePlayer as EdgePlayer. Looking closely at some of these misclassified examples, I found that the classifier is not perfect, and several examples such as those shown in figure 3.1 were misclassified as EdgePlayer. This suggests that the model could be improved, or that a human could perform classification with a higher accuracy rate.


3.2.5.2 Naive Independence Assumption

Looking at the examples which were misclassified as PerfectPlayer, it appears that the majority (91.5%) of them were games where O had moved first and won within the first 5 moves (i.e. the shortest number of moves possible to win). This suggests that the model puts a high weight on a short game for the PerfectPlayer class. This is understandable since PerfectPlayer is the only player likely to win a game quickly, but unfortunately this criterion does not differentiate between X and O winning the game. This misclassification may also be due to the way a Naive Bayes classifier works. Because a Naive Bayes classifier makes the naive assumption that all attributes are independent, a high number of attributes which are dependent will produce a high bias. In this case, later attributes having an empty value depend on the game having finished early. PerfectPlayer is particularly affected by this problem, since if a game finishes after only 5 moves then there are a lot of attributes which are empty (the remaining 4 moves * 16 attributes = 64 attributes), and so a large weight is assigned to a short game. A better model may be able to consider the games more intelligently and reduce some of these misclassifications.

3.2.6 Improving Classification through Attribute Selection

One way to improve the model is to change what data is being used to generate it. Initially, there are 144 attributes. I applied the Forward Selection algorithm (see section 2.5.2) to optimise the attribute set and generate a more accurate Naive Bayes model. The forward selection optimisation selected 24 attributes, and the improved Naive Bayes model achieved 86.83% +/- 1.56% accuracy, an improvement of around 7% over the non-optimised attributes.
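In outline, the Forward Selection wrapper amounts to the following greedy loop (this sketch only restates the algorithm used by RapidMiner's operator, with evaluate() standing in for one 10-fold cross-validated Naive Bayes run):

import java.util.*;

abstract class ForwardSelection {
    // Stand-in for a 10-fold cross-validated Naive Bayes run using only
    // the given subset of attributes; returns the estimated accuracy.
    abstract double evaluate(Set<String> attributes);

    List<String> select(List<String> allAttributes) {
        Set<String> chosen = new LinkedHashSet<>();   // keeps selection order
        double best = 0.0;
        boolean improved = true;
        while (improved) {
            improved = false;
            String bestCandidate = null;
            for (String a : allAttributes) {
                if (chosen.contains(a)) continue;
                chosen.add(a);                        // try adding this attribute
                double acc = evaluate(chosen);
                chosen.remove(a);
                if (acc > best) {
                    best = acc;
                    bestCandidate = a;
                }
            }
            if (bestCandidate != null) {              // keep only the best addition
                chosen.add(bestCandidate);
                improved = true;
            }
        }
        return new ArrayList<>(chosen);
    }
}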

The attributes selected, in order (most important attribute first), were:

7_31, 2_21, 2_11, 4_2Xs, 4_32, 5_23, 6_13, 2_22, 4_31, 3_33, 3_12, 3_13, 1_21, 4_11, 8_2Ys, 5_3Xs, 2_23, 1_31, 5_3Ys, 1_32, 3_numXs, 1_33, 5_numYs, 9_2Ys.

We can see that the attribute selection has heavily favoured the earlier turns, with only 4 of the 24 attributes being from the second half of the game (turns 6, 7, 8 or 9). I attribute this to the naive independence assumption problem for games that finish early, described above in section 3.2.5.2. By removing many interdependent attributes, the naive independence assumption of the Naive Bayes classifier was contradicted less, and the number of games incorrectly classified as PerfectPlayer was reduced. In fact, no games where CornerPlayer, EdgePlayer, RandCornerPlayer or RandEdgePlayer played as X were incorrectly classified as PerfectPlayer (see results, table 3.2).

The other problem of 2 players making the same moves, described in section 3.2.5.1, was also helped by this attribute selection. There are a total of 82 + 7 = 89 misclassifications between EdgePlayer and RandEdgePlayer, compared with 59 + 55 = 114 in the full-attribute version. It can be seen that one direction (EdgePlayer being misclassified as RandEdgePlayer) has been improved much more than the reverse misclassification, where accuracy actually decreased. Misclassifications between CornerPlayer and RandCornerPlayer also followed this pattern. A larger class weight on EdgePlayer would explain this imbalance, and it is possible that this was the best way to improve the overall accuracy of the classifier, even though it decreases the precision of the EdgePlayer class.

3.3 Games in Progress

The classifiers trained so far all looked at games which had been completed. Next, I investigated how a classifier could perform on a game in progress. This could be useful in developing a computer player which would identify the strategy its opponent was using and adapt its own playing style accordingly. Throughout this section, X will denote the player whose strategy we want to identify, and we consider the computer as the O player (who, for our purposes here, will always use the RandomPlayer strategy).

3.3.1 Data Collection

The training set of games was generated in a similar way to the completed-games training set: each of the 6 computer players played 100 games against each of the 6 players, for a total of 3600 games, and the same attributes were recorded. However, instead of only recording the game when it was finished, I created a record after every move X made. This results in 2-5 records for each game, increasing the total number of training examples to 13290.

3.3.2 Model Generation

Generating a Naive Bayes classifier from this training data yields a model which achieves 73.53% +/- 1.50% accuracy. This decrease of around 5.5% accuracy is expected, as there is now much less data to analyse for some examples. This will increase the number of collisions where 2 different players would make identical moves (for example, considering only the first move, RandEdgePlayer has a 1 in 4 chance of making the same move as EdgePlayer, compared with a 1 in 24 chance when the entire game is considered). The way that incomplete and completed games deal with the empty columns may also contribute to the increased error. If a game is incomplete (no one has won and there are still empty spaces) then the attributes for the turns which have not yet been played are left empty. Likewise, if X or O were to win on move 6 then the attributes for turns 7, 8 and 9 are left empty. This means that there is no way to tell simply by looking at one of the unknown attributes (such as 9_11) whether the game has already been won (in which case it is likely that PerfectPlayer was one of the players) or whether this is a partial game record and the game is still in progress.

Running the forward attribute selection algorithm on this training set increased accuracy to 79.96% +/- 1.36% (an increase of 6.5%, similar to the increase seen in the full-game training set). However, the number of attributes selected was 31, 7 higher than in the full-game set, possibly as a result of less data being available from empty attributes (see above) or because there is no longer a set of attributes which often describe the game’s final position. This accuracy rate was verified by applying the model to another set of 13281 partial game records which did not come from the example set (79.89% of examples were classified correctly).

3.3.3 Interactive Play

I created another player, HumanPlayer, which could be used as the ‘Crosses’ player in my Java implementation of Noughts and Crosses. When HumanPlayer.makemove() was called, HumanPlayer printed the board to the screen, applied the model learnt in section 3.3.2 above to the game so far, printed the classification to the screen, then waited for command-line input to make a move. This interactive demo allowed a human to adopt one of the strategies such as CornerPlayer or PerfectPlayer, and see how well the classifier was able to detect the correct strategy. No formal experiments were performed using this method of play (because getting a human to make the moves is slower and more error-prone than the automatic method used so far), but it proves that it is possible to create a system which can play against a human and recognise their strategy in real time with a high level of accuracy. From here, it would be very simple to have the computer (Noughts) player adapt its own strategy to take advantage of any known weaknesses in the recognised human strategy.
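In outline, HumanPlayer looked something like the following (a simplified sketch reusing the illustrative Player interface from section 3.2.1; classify() stands in for applying the RapidMiner model from section 3.3.2, and is not an actual method of that library):

class HumanPlayer implements Player {
    private final java.util.Scanner in = new java.util.Scanner(System.in);

    public int makeMove(char[] board, char myMark) {
        for (int r = 0; r < 3; r++)        // print the board row by row
            System.out.println("" + board[3*r] + board[3*r + 1] + board[3*r + 2]);
        System.out.println("Predicted strategy: " + classify(board));
        System.out.print("Your move (0-8): ");
        return in.nextInt();               // command-line input makes the move
    }

    private String classify(char[] board) {
        // Placeholder: the real implementation converts the game so far to
        // the attribute representation and applies the learnt model here.
        return "?";
    }
}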


Chapter 4

Go

The board is a mirror of the mind of the players as the moments pass. When a master studies the record of a game he can tell at what point greed overtook the pupil, when he became tired, when he fell into stupidity, and when the maid came by with tea.

— Anonymous Go player. (Sensei’s Library, 2010b)

4.1 Game Description

Go is an ancient Chinese board game which is very computationally complex and requires a great deal of strategy to play well. It is played between 2 players, Black and White, who take turns to place pieces (or ‘stones’) onto a 19x19 board. Pieces do not move once they have been placed, although pieces may be captured by the opposing player and removed. The object of the game is to control the largest portion of the board at the end of the game.

4.1.1 Game Rules

These rules were first published by Davies and Bozulich (1984). This is a version from Jasiek (2010), slightly modified for clarity.

1. The square grid board is empty at the outset of the game.

2. Black makes the first move, after which Black and White alternate.

3. A move consists of placing one stone of one’s own colour on an empty intersection on the board.

4. Players may pass their own turn at any time.

5. A stone, or a group of stones of one colour solidly connected via grid lines, is captured and removed from the board when all the intersections directly adjacent to it are occupied by the enemy. Capture of the enemy takes precedence over self-capture.

6. No stone may be played so as to recreate a former board position (the ‘Ko’ rule).

7. Two consecutive passes end the game.

8. A player’s territory consists of all the board points either occupied or monochromely surrounded.

9. The player with more territory wins.

There are several terms used in these rules, or used later in this chapter, which need defining:

• Intersection — The board is a grid of 19 vertical and 19 horizontal lines. Any place where 2 lines cross is an ‘intersection’ or ‘space’, and stones may usually be placed on any empty intersection. A Go board contains 361 intersections.

• Chain — A chain is a group of stones of the same colour which are adjacent. Chains may be connected vertically or horizontally, but not diagonally. A single stone could be said to be a chain of length 1; a 3x2 block of stones would be an example of a chain of length 6.

• Liberties — Liberties are the empty intersections directly adjacent to a chain. Typically, playing a Black stone in one of a Black chain’s liberties would increase the chain length by one. If a chain has no liberties, it is captured and all of its stones are removed from the board.

• Atari — A chain is said to be ‘in atari’ if it could be captured on the opponent’s next move. Typically, any chain with only one liberty is in atari.

• Group — A group is a collection of chains which are considered together, although they may not be connected.

• Komi — The ‘Komidashi’ or ‘Komi’ is the score added to White’s score in order to offset Black’s advantage of moving first. For even players with no handicap this is set to a value between 5.5 and 7.5 (the 0.5 prevents draws). In all games played in this chapter I used a komi of 6.5.

• Eye — An eye is an empty intersection surrounded on all 4 sides by the same chain. The opposing player can never play in an eye unless it captures the surrounding chain, since the piece placed would have no liberties and so be immediately removed. Suicide moves such as this are illegal in most versions of Go.

• Kyu and Dan — The ranking system for Go starts at around 30 Kyu (student) for an absolute beginner, advances down the ranks to 1 Kyu for an intermediate amateur, then to 1 Dan (master) for an advanced amateur, up to 9 Dan for the best players (professionals). A difference of ranks between 2 players determines the handicap: how many extra stones the weaker player should start with. The actual ranking system is quite a bit more complex than this, but I will not go into more detail here.

There are various variations on the rules and scoring systems, but none that seriously alter the nature of the game, and they are not relevant to this work.

4.1.2 Basic Strategy

4.1.2.1 Opening (Fuseki)

The game begins with players placing stones, usually around the edges of the board, in order to provide a good basis for capturing territory. The corners are most highly sought after, since a few stones are capable of controlling the most territory there, and the surrounding edges of the board mean that the opponent can only invade from 2 possible directions (Sensei’s Library, 2010a). See figure 4.1.

There are 9 intersections on the board known as ‘star points’. These points have no special rules, but are commonly accepted to be strategically important points on the board. For this reason, in the opening, players regularly play at (or next to) the star points in order to maximise their influence over the board, especially the corners. Many players extend this idea and have a set of memorised openings and the moves to make based on the opponent’s moves. Similarly, there are specific sequences known as ‘joseki’ which involve knowing several possible precomputed playouts for a particular shape or situation (usually in the corners) which can be chosen to produce a known outcome. The joseki should be chosen to give the best result for the group in question and the surrounding groups or the board as a whole. In this way players can be sure to make strong moves at the beginning of the game, giving them a good base from which to begin the midgame.

4.1.2.2 Midgame

Much of the midgame strategy of Go relies on determining the life of groups. A group is alive if it can resist capture by the opponent, i.e. if both players continue to take turns to play, then it is always possible to defend some of the group’s chains from being captured. A chain is alive if it can form 2 eyes (see section 4.1.1), as the opponent cannot hope to fill both eyes, since playing in an eye would be suicide. A chain is dead if the opponent can capture it, even given perfect play from the defending player. It is often very difficult to determine the life or death of a group, since it involves looking ahead many moves to determine if there is a way for the group to form 2 eyes or escape (extend the chain past the attacker to a point where there are many liberties). Lichtenstein and Sipser (1980) showed that the problem of determining the winner (a problem which depends on being able to classify groups as alive or dead) is NP-Hard. Players often study many problems and learn to recognise shapes and patterns which are alive or dead, or can be made stronger or killed using the correct sequence of moves.

Figure 4.1: Stone Efficiency. 2 stones in the corner claim most territory, two stones on a side claim some territory, while 2 stones in the center claim no territory.

While the aim of the midgame is to maintain life for as many large groups as possible, this is performed in a number of ways. Offense and defense must be balanced: it is important that enemy stones are threatened and that the player’s own groups are not captured. Common shapes or sequences of moves are usually studied by players learning to play Go (see the ‘ladder’, figure 4.2, or the ‘net’, figure 4.3). Concepts such as ‘influence’ (the ability of groups or stones to affect other groups on the board), ‘thickness’ (strong groups which are difficult to capture and have a lot of influence) and ‘sente’ (having the initiative: playing the first move in a sequence which the opponent must respond to) are also to be considered. Evaluation of the board (involving classifying the life of groups) allows players to estimate who is winning and adjust their strategy accordingly. If they are a long way behind then it is more likely that they will need to play more aggressively in order to achieve victory.

There are too many strategies and concepts to list here (the website ‘Sensei’s Library’ has a list of strategies and over 100 ‘Go proverbs’ (Sensei’s Library, 2010b)), and learning when to use each of them is an important step in becoming a good Go player. Because most of these concepts are not steadfast rules, they are particularly hard to express as part of a computer program, and are another reason that computer Go programs have not reached the level of humans.

4.1.2.3 Endgame

By the time the endgame is reached, the majority of the board layout is in place. The important part of the endgame is to solidify positions, and play out the life or death of any groups whose status is unclear. Most of the fighting tends to only affect 2 groups, and it is unlikely that either player will be able to make a large difference to the score. However, if a player plays well, then they may be able to make up a few extra points and maximise their final score.

Computer players tend to be strong in the endgame section of the game, since the number of possible moves is much lower and stones have much less influence over large areas of the board and on subsequent turns. This means that complexity is lower, and so a tree search can be applied more effectively. However, the NP-hard problem of determining life and death remains, and situations such as multiple-ko and the life of one group affecting another mean that the endgame is still far from solved.

4.1.3 Analysis

Since there are a total of 19 ∗ 19 = 361 possible spaces on the board, and almost all of the empty spaces are legal moves on any turn, the number of possible Go games is very large (much higher than in games such as Chess). The average number of possible moves at a given point is around 250, and the average length of a game is 150 moves (Allis et al., 1994). This means that a brute-force search of the game tree is infeasible, even to a depth of only a few turns. The exact number of legal Go board positions is believed to be over 2.08 ∗ 10^170 (Tromp and Farnebäck, 2007). Of course, most of these positions do not arise in play between 2 competent players, because they would require multiple ‘bad’ moves to be made by one or both players. This complexity is the root of why computer players are currently unable to beat the best human players.

Figure 4.2: Example of a ladder. Black can continue to ‘run’ away across the board towards the top-right, but will be killed when the edge of the board is reached.

Figure 4.3: Example of a net (Black). The white 3-chain cannot escape; it is dead.

A further reason that Go is complex is that moves can have consequences far in the future. A move played as a threat on turn 20 may not be realised until turn 100. Searching the game tree to this depth would not be possible on current computers, even if the size of the board were substantially reduced.

The problem of deciding which groups on the board are alive or dead, which is important for player strategy during the game and also for scoring at the end, has been shown to be NP-Hard (Lichtenstein and Sipser, 1980; Demaine, 2001).

4.2 Tools

Software was used to automatically play the Go games, store game records, and then replay and convert the games into a form which could be used directly to train a classifier. GoGui was used to play the games, and I made modifications to analyse game record files in batches to produce comma-separated value files which could be read by RapidMiner. The game record files were stored as Smart Game Format (SGF) files, an open standard for recording the moves made throughout a game.

4.2.1 GoGUI

GoGUI is an open source, GPL-licensed Java Go board written and maintained by Markus Enzenberger. It allows 2 human players to play against each other, or allows computer players which support the Go Text Protocol (GTP) to be ‘attached’ to play as an opponent against human players. It is the interface for human games recommended by GNU Go and Fuego. It supports time limits, both sets of scoring rules, replaying of a game move by move, and setup of either colour stone to create specific scenarios or diagrams, and, because it is open source, it can be modified easily.

GoGUI also includes a command-line tool, ‘gogui-twogtp’, which automates playing 2 GTP players against each other for a specified number of games and records the games as Smart Game Format (SGF) files. I used this to play computer players against each other and record the SGF files for later analysis.

I made modifications to GoGUI to read in a directory of SGF files and perform analysis on them to generate a comma-separated-values file containing data which can be more easily analysed and used to generate a model. The details of what was recorded are described in later sections. The source code modifications are included in the appendix.
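Schematically, the batch analysis amounts to a loop of the following shape (the class, output file name and analyse() method here are illustrative stand-ins, not the actual modifications, which are in the appendix):

import java.io.*;

class SgfBatchAnalyser {
    public static void main(String[] args) throws IOException {
        File dir = new File(args[0]);   // directory containing the .sgf files
        File[] games = dir.listFiles((d, name) -> name.endsWith(".sgf"));
        try (PrintWriter csv = new PrintWriter(new FileWriter("games.csv"))) {
            for (File sgf : games) {
                csv.println(analyse(sgf));   // one comma-separated record per game
            }
        }
    }

    // Stand-in for replaying the game in GoGui and deriving the attributes
    // described in sections 4.4 and 4.6.
    static String analyse(File sgf) {
        return sgf.getName();
    }
}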


4.2.2 Smart Game Format

Smart Game Format (SGF) or Standard Game Format files describe the play of a game between players. The format supports over 30 games, including Go. A typical SGF file might contain the following details:

• The game being played. For all games described here, this will be ‘Go’.

• Details of who is playing. For humans this would typically be their name or username. For computers, their name and version.

• Date and time the game was played

• Rules used including board size, time limits, komi, scoring method, etc.

• A list of moves made, in order. B[aa] denotes Black moving at intersection‘aa’ (in a corner). W[] indicates a pass for White.

• Final score and winner.

SGF is extensible, and support can be added for comments or additional information about the rules, players or gameplay. The full specification is available online (Hollosi, 2010).
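For illustration, a minimal hand-written SGF record of the start of a Go game (not taken from the project’s data) could look like this:

(;GM[1]FF[4]SZ[19]PB[Black player]PW[White player]KM[6.5]RE[W+2.5]
;B[pd];W[dp];B[pq];W[dd];B[qk];W[])

Here GM[1] identifies the game as Go, FF[4] the file format version, SZ[19] the board size, KM[6.5] the komi and RE[W+2.5] the result; the remaining nodes are the moves in order, ending with a pass by White.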

4.3 Machine Learning techniques applied to Completed Games of Go

4.3.1 Player Strategy

Many different algorithms have been used to create computer Go players, from brute-force searching with tree pruning, to neural networks (Lubberts and Miikkulainen, 2001; Richards et al., 1998), to Monte Carlo methods (Brügmann, 1993; Hoock et al., 2008; Bouzy et al., 2004). I downloaded 3 freely available Go computer players which played with different strategies. The aim of this part of the project was to see if it was possible to identify which player was playing from only a record of the game. Each of these 3 players supported the Go Text Protocol (Farnebäck, 2010b), and could be attached to GoGui (an open source Java implementation of a Go board GUI) to play either Black, White, or both.

4.3.1.1 Aya

Aya was created by Hiroshi Yamashita, and was ranked first in 9x9 Go and joint 6th in 19x19 Go at the 8th Computer Olympiad in 2003 (Computer Olympiad, 2010). More recently, Aya came joint 4th in 19x19 Go at the 13th Computer Olympiad in 2008. Aya plays as a bot on the KGS Go Server (KGS), against human players and other bots, and has maintained a level of 8 Kyu for at least the past year (March 2009 - April 2010) (KGS, 2010c). The version of Aya I used was v6.34.


4.3.1.2 GNU Go

GNU Go was created with the intent of making a Go player which was open source. At the time (in 1989), there were reportedly no open-source implementations of Go (Farnebäck, 2010a). Because of its long history (compared to the other Go players considered) and its open-source nature, GNU Go has had many developers and consists of many modules or ‘move generators’ which work in different ways to compute the best moves. Each move generator focuses on a particular area (such as ‘capture or defend chains’, ‘break into opponent’s territory’ or ‘patterns/shapes seen before’) and provides a list of suggested moves along with justifications for those moves. These moves are then evaluated and one is chosen as the move to be made. GNU Go uses many different techniques for playing Go, including traditional methods and more recent Monte Carlo methods. GNU Go also includes a command-line argument to set a level for play, which enables or disables certain techniques and varies the amount of computation time given to each move. In this experiment, I included both GNU Go at the default level (level 10) and GNU Go at an ‘easier’ level (level 1), for a total of 4 computer players (Aya, GNU Go 1, GNU Go 10 and MoGo).

GNU Go won the 19x19 Go tournaments at the 8th and 11th Computer Olympiads in 2003 and 2006, and came third at the 12th Computer Olympiad in 2007 (losing to MoGo) (Computer Olympiad, 2010). Several unofficial GNU Go bots are run on KGS with ranks of 5-8 Kyu (KGS, 2010e). The version of GNU Go I used was v3.8.

4.3.1.3 MoGo

MoGo is based on Monte Carlo Tree Search, and was originally developed by Yizao Wang as his Master’s thesis at the University of Paris-Sud. MoGo won the 12th Computer Olympiad in 2007, came 2nd in 2008 and third in 2009 (Computer Olympiad, 2010). In August 2008, MoGo played against Kim Myungwan, a professional player rated 8 Dan, and won (with a handicap of 9) (Wang, 2010). This was the first ever 19x19 game won by a computer player against a professional player. When playing this game, MoGo was running on the supercomputer Huygens, and used 800 cores at 4.7GHz (15 teraflops). MoGo was also the first to win a 9x9 game against a professional player, playing as Black (Wang, 2010). MoGo has a rank of 2 Kyu on KGS (KGS, 2010f).

4.3.2 Training Games

I played each of the 4 computer players against each other 100 times, for a total of (4 + 3 + 2 + 1) ∗ 100 = 1000 games. Aya was run with default settings. GNU Go was run at both level 1 and level 10, as the separate players GNUGo1 and GNUGo10. MoGo was limited to 1 second per move. The GoGui command-line tool ‘TwoGTP’, which uses the Go Text Protocol to play two players against each other, was used to play the games, and produced 1000 Smart Game Format (SGF) files (see section 4.2.2), which contain details about the game, the list of moves of each player, the final score, and all details required to replay the game.


Initially, each of the 1000 training games was used to produce 2 training records: one with the black player as the ‘class’ attribute, and one with all of the Black / White attributes swapped and the original white player (the new black player) as the ‘class’ attribute. This enabled me to use a game which was originally played as Aya (Black) vs MoGo (White) as a training example for both Aya and MoGo, reducing the number of games which needed to be played. This was accomplished using a simple script which swapped the colour of all stones in the SGF file and saved it as a new SGF file. Strictly speaking, these converted games are no longer valid Go games, since Black should always move first; however, if we think of ‘Black’ as ‘the player whose strategy is to be identified’, then this process will train the classifier to work on games where the player to be identified went first or second. I judged that any slight discrepancies resulting from the difference in strategy between moving first and second should be negligible next to doubling the amount of training data available.
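In outline, the swap can be done by simple text substitution on the SGF file (a reconstruction of the idea, not the original script; it ignores less common properties such as setup stones):

class SgfColourSwap {
    // Swap every Black token for a White one and vice versa, so a record of
    // A (Black) vs B (White) becomes B (Black) vs A (White).
    static String swapColours(String sgf) {
        return sgf.replace(";B[", ";#[")   // protect Black moves with a marker
                  .replace(";W[", ";B[")   // White moves become Black
                  .replace(";#[", ";W[")   // protected Black moves become White
                  .replace("PB[", "P#[")   // swap the player-name properties too
                  .replace("PW[", "PB[")
                  .replace("P#[", "PW[");
    }
}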

4.4 Board Positions (First Attempt)

4.4.1 Attributes to store

I adapted GoGui to batch-process a list of SGF files and produce a representation of the game which could be used to train a classifier. For the first experiment, I simply recorded an attribute for each intersection on the board at 10-turn intervals for the first 300 turns of a game. This resulted in 361 intersections ∗ 300 turns / 10 = 10830 attributes recorded for each of the games. Each attribute has a value of ‘.’ (empty), ‘X’ (black stone), or ‘O’ (white stone). The attribute names are of the form [Turn]_[Column][Row], so 20_A19 refers to the top-left intersection, 20_A1 refers to the bottom-left intersection, etc. I ran this process on all 2000 games to produce a training set. The ‘Remove Useless Attributes’ operator was used in RapidMiner to remove any attributes which had the same value in every one of the training games.

4.4.2 Results

4.4.2.1 Simple Naive Bayes

I used 10-fold cross-validation with stratified sampling (see Section 2.5.1) on the attributes gathered from the training set of games to generate a Naive Bayes Classifier, which achieved an accuracy of 68.10% +/- 3.71% (see table 4.1).

4.4.2.2 Forward Selection

I repeated the experiment using the Forward Selection algorithm to select the most important attributes from the initial set of 10830. This resulted in a similar (within the error margin) classification rate of 64.85% +/- 4.24% (see table 4.2).


Table 4.1: Accuracy results of Naive Bayes Classifier applied to the training set of board positions taken from Go games between computer players using 10-fold cross-validation. Accuracy: 68.10% +/- 3.71%

Prediction \ True      Aya   GNU Go 1    MoGo   GNU Go 10   Class Precision
Aya                    355        69        7        76         70.02%
GNU Go 1                69       274       14       133         55.92%
MoGo                     8        12      467        25         91.21%
GNU Go 10               68       145       12       266         54.18%
Class Recall        71.00%    54.80%   93.40%    53.20%

Table 4.2: Accuracy results of Naive Bayes Classifier applied to the training set of board positions taken from Go games between computer players after forward attribute selection, using 10-fold cross-validation. Accuracy: 64.85% +/- 4.24%

Prediction \ True      Aya   GNU Go 1    MoGo   GNU Go 10   Class Precision
Aya                    343        82        9        92         65.21%
GNU Go 1                55       299       48       109         58.51%
MoGo                    44        38      396        40         76.45%
GNU Go 10               58        81       47       259         58.20%
Class Recall        68.60%    59.80%   79.20%    51.80%


The selection process took 3 hours and 14 minutes on a laptop with a 2.5GHz Intel Core 2 Duo processor. The 17 attributes selected by the algorithm were:

10_Q16, 20_Q4, 70_C14, 10_D4, 40_R17, 110_D16, 80_C3, 20_R15, 10_R17, 60_R14, 40_D6, 30_R6, 50_H8, 10_Q6, 10_R2, 20_G13, 20_H15

4.4.3 Analysis of results

Figure 4.4 shows the distribution for the attribute ‘10_Q16’ (the first, and so ‘most influential’, attribute selected by Forward Selection) in the final model. We can see that Q16 being empty after 10 moves have elapsed is a strong indication that Aya is playing Black. Likewise, if Q16 contains a white stone, it is likely to be GNU Go 10 playing Black. MoGo and GNU Go 1 are both likely to play to occupy Q16. Unfortunately, this attribute is not as good an indicator of who is playing as it appears. The frequencies in this graph (and model) are an artefact of how the training data was generated. As described in section 4.3.2, only 1000 games were initially played and the remaining games were generated by swapping the colours of all stones.

The original 1000 games were played as shown in table 4.3. This meant that GNU Go 1 always played Black, and GNU Go 10 always played White (except when playing against themselves). Aya and MoGo played both Black and White against other players. Upon closer investigation I found that both GNU Go players and MoGo frequently choose Q16 as their first move when they are the first to move (i.e. when they are playing Black). This explains why, in figure 4.4, GNU Go 1 has such a high bar for Black: in the majority of games that GNU Go 1 played, it was able to go first, and so move at Q16. The same should be true for GNU Go 10; however, because of the way the sides were chosen, GNU Go 10 usually played White (see table 4.3), so it rarely got to move first, and needed to react to the moves that the opponent had played (and also lost the chance to move at Q16 if Black had already played there).

This means that the training set used was biased due to the ordering of the computer players in table 4.3. Consequently, the model generated was also biased, and although the given accuracy was high, this was only because the test set contained the same bias as the training set. In a situation where any player could be playing as Black or White with equal probability, the model would still have its bias towards GNU Go 1 for players moving first, and would have a much lower accuracy rate.

To fix this bias in the training set, I decided to have the computer players play the remaining 600 games. The training set then contained a total of 1600 records (one generated from each game), with no bias or preference toward any of the players. The results of these games are shown in table 4.4.


Figure 4.4: Distribution graph from the generated model for attribute 10_Q16, using board positions only.

Table 4.3: Number of games played between each of the 4 computer players.

Black \ White   GNU Go 1   Aya   MoGo   GNU Go 10
GNU Go 1           100     100    100      100
Aya                  0     100    100      100
MoGo                 0       0    100      100
GNU Go 10            0       0      0      100

Table 4.4: Number of games where each of the computer players won or lost in the training games. MoGo and GNU Go 10 are the strongest players; Aya is the weakest.

                     Win   Lose
GNU Go 10 (Black)    270    130
GNU Go 1 (Black)     196    204
MoGo (Black)         289    111
Aya (Black)           52    348


4.5 Board Positions — Unbiased

4.5.1 Attributes to store

I performed exactly the same experiments using exactly the same processes and attributes, but using the 1600 records rather than the biased set of 2000 records described above.

4.5.2 Results

4.5.2.1 Simple Naive Bayes

I used 10-fold cross-validation with stratified sampling (see Section 2.5.1) on the attributes gathered from the training set of games to generate a Naive Bayes Classifier, which achieved an accuracy of 63.69% +/- 2.72% (see table 4.5).

4.5.2.2 Forward Selection

I repeated the experiment using the Forward Selection algorithm to select the most important attributes from the initial set of 10830. This resulted in a classification rate of 62.19% +/- 2.67% (see table 4.6). The 10 attributes selected by the algorithm were:

10_Q16, 50_C16, 50_Q3, 50_C17, 100_B15, 180_L3, 140_R3, 180_K1, 10_F17, 240_H14.

4.5.2.3 ADA Boost

I repeated the experiment a third time using the boosting algorithm ADABoost (see section 2.5.3). 10 Naive Bayes models were generated, each with a weight between 0.79 and 1.91. This boosted classifier achieved an accuracy of 59.38% +/- 5.13% (see table 4.7).


Table 4.5: Accuracy results of Naive Bayes Classifier applied to the training set of board positions taken from Go games between computer players using 10-fold cross-validation. Accuracy: 63.69% +/- 2.72%

Prediction \ True      Aya   GNU Go 1    MoGo   GNU Go 10   Class Precision
Aya                    306        50        1        56         74.09%
GNU Go 1                40       164        5       160         44.44%
MoGo                     3        14      379        14         92.44%
GNU Go 10               51       172       15       170         41.67%
Class Recall        76.50%    41.00%   94.75%    42.50%

Table 4.6: Accuracy results of Naive Bayes Classifier applied to the training set of board positions taken from Go games between computer players after forward attribute selection, using 10-fold cross-validation. Accuracy: 59.38% +/- 5.13%

Prediction \ True      Aya   GNU Go 1    MoGo   GNU Go 10   Class Precision
Aya                    307        68        0        84         66.88%
GNU Go 1                48       140       28       122         41.42%
MoGo                    20        72      357        48         71.83%
GNU Go 10               25       120       15       146         47.71%
Class Recall        76.75%    35.00%   89.25%    36.50%


Table 4.7: Accuracy results of Naive Bayes Classifier applied to the training set of board positions taken from Go games between computer players after ADABoost, using 10-fold cross-validation. Accuracy: 63.94% +/- 3.02%

Prediction \ True      Aya   GNU Go 1    MoGo   GNU Go 10   Class Precision
Aya                    297        49        3        59         72.79%
GNU Go 1                49       158        2       147         44.38%
MoGo                     4        17      386        12         92.12%
GNU Go 10               50       176        9       182         43.65%
Class Recall        74.25%    39.50%   96.50%    45.50%


4.5.3 Analysis Of Results

4.5.3.1 GNU Go 1 and GNU Go 10

We can see from the results that the main source of error is classification between GNU Go 1 and GNU Go 10. The classifier correctly predicted GNU Go 1 and GNU Go 10 a total of 164 + 170 = 334 times, compared with 160 + 172 = 332 games incorrectly classified as the other player. This is unsurprising, since they use the same strategy and algorithm, but with the difficulty level changed. We would expect them to play similarly, although we would expect GNU Go 10 to be a stronger overall player and win more games. Their similar play may also cause confusion due to them fighting over the same spaces in games where they play against each other. If part of both players’ strategy is to try to control a particular space, and we assume that each gets control of it in 50% of their games, then the colour of that space becomes useless for determining which player is which.

Figure 4.5: Distribution graph from the generated model for attribute 10_Q16, using unbiased board positions only.

Figure 4.5 shows the distribution for attribute ‘10_Q16’ in this classifier. Comparing this with figure 4.4 (the same graph, but from the model generated from the biased data in section 4.4), we can see that the bias has now been eliminated. We see that MoGo always starts at Q16, and that GNU Go 1 and GNU Go 10 now both start at Q16 with almost equal probability (around 65-70% of the time).


4.5.3.2 MoGo

MoGo has the most easily identified strategy, with a recall of 94.75%. This may be because MoGo uses a Monte Carlo Tree Search method, a different technique to the older methods used by Aya and GNU Go. This technique has improved computer Go players immensely in recent years, to the point that MoGo and Fuego (another Monte Carlo based computer player) have achieved podium positions in the most recent 19x19 Go Computer Olympiads in 2007, 2008 and 2009. It is possible that this technique causes a distinct difference in play which is particularly noticeable to a Naive Bayes classifier.

4.5.3.3 High Accuracy

I found the accuracy of 63% surprisingly high, since this is based only on board position. The naive independence assumption of the Naive Bayes classifier means that the classifier does not consider complex relationships between the stones, and is purely basing its predictions on the colour of individual intersections on a particular turn. This suggests that some algorithms prioritise capturing specific areas or intersections, or that their style of play means that particular intersections are more likely to be given away to the opposing player or left empty. This observation suggests that the more advanced concepts such as chains, eyes and life are not absolutely necessary to differentiate between strategies (or at least, not between these computer strategies).

4.5.3.4 Forward Selection

The Forward Selection algorithm mostly selected early turns (only 5 of the 10 attributes selected were from turns after turn 50). This may be because at the start of the game the players have a much more open choice of where they can place their pieces, and are not governed by where their opponent has played. This means that a wider range of choices is open, because the player does not need to play to defend territory or prevent chains from being captured. More freedom means that the players can play according to their own strategy, so if they have specific goals or areas which they deem important, then it is more likely that they can move in those places.

It also prioritised the attributes corresponding to intersections near the corners, close to the star points (strategically important points marked on the board at D4, Q4, D16 and Q16). The intersections selected by the forward selection algorithm are shown in figure 4.6. These are the intersections which are the best indicators of which player is playing, either because a particular player frequently captures the intersection, plays such that the opponent is likely to capture it, or leaves it empty (e.g. playing such that this is the location of an eye).


Figure 4.6: Go board showing the intersections identified by the forward attribute selection algorithm as most useful for identifying player strategy.


4.6 Scores

4.6.1 Attributes to Store

Although the classifier based purely on board positions (section 4.5) achieved a reasonable level of accuracy, it does not encapsulate any of the concepts which are generally believed to be important in high-level strategy, such as life and death or offensiveness. I investigated whether changing the training data to more intuitive data which helps to encapsulate these concepts could create a better classifier. To do this, I modified GoGui further to produce a list of scores and statistics derived from the current game state, which do not necessarily allow the game to be recreated exactly as the SGF file and the board positions do. I decided what attributes to collect by reading through discussions of games, reading about strategy, and choosing attributes which were frequently considered by players, plus other attributes which were easy to collect from GoGui (e.g. the score). There is no penalty for recording too many attributes, since forward selection will remove similar or useless attributes during model generation. The attributes that I collected were:

• capturedBlack and capturedWhite: The number of black and white stones respectively which had been captured in the game up to this point.

• adjacentOppColourPoints: The number of black stones which border white stones. If a black stone borders 2 white stones it will be counted twice. This is obviously reversible: the number of white stones which border black stones will be the same number. It was conceivable that one of the players might play a more aggressive strategy which happens to place stones next to stones of the opposite colour, either aggressively in an attempt to capture them, or to block intrusions or strengthen borders.

• chainsBlack, chainsWhite and chainsEmpty: The total number of chains (as defined in section 4.1.1) of black stones, white stones or empty intersections. Although ‘empty’ spaces typically wouldn’t be considered chains, this attribute required no extra effort to add and provides a measure of how ‘divided’ the unoccupied space is.

• meanChainSizeBlack and meanChainSizeWhite: The mean (average) length of the black and white chains respectively. Isolated stones count as chains of length 1.

• libertiesBlack and libertiesWhite: The total number of liberties that the black and white chains have respectively.

• meanLibertiesBlack and meanLibertiesWhite: The average number of liberties per chain:

meanLibertiesBlack = libertiesBlack / chainsBlack

• atariBlack and atariWhite: The number of chains of each colour which have only one liberty, and so could potentially be captured unless defended next move. Strictly speaking, a chain with only one liberty may not be in atari due to the Ko rule, but this is uncommon enough that it can be ignored. This provides a measure of how aggressively the opposing colour is playing.

• eyesBlack and eyesWhite: We consider an empty intersection to be an eye if it is surrounded on all 4 sides (or 3 sides if it is on an edge) by stones of the same colour. While, strictly speaking, the space may not be an eye because the 4 surrounding stones may not be part of the same chain, this attribute provides a good approximation to the number of eyes each player has on the board.

• meanEyesBlack and meanEyesWhite: The average number of eyes per chain. If this is high, it could indicate that the player has been playing defensively and has many alive chains.

• areaBlack and areaWhite: The amount of area each colour would have captured if the game were scored at this point using area scoring (Chinese) rules. This is the number of stones the player has on the board, plus the number of empty intersections they surround.

• resultArea: The overall score if the game were scored at this point using area scoring. A positive score means Black wins; a negative score is a victory for White:

resultArea = areaBlack − areaWhite − komi

• territoryBlack and territoryWhite: The amount of territory each colour would have captured if the game were scored at this point using territory scoring (Japanese) rules. This is the number of empty spaces that are surrounded by a colour, not including spaces with stones in them.

• resultTerritory: The overall score if the game were scored at this pointusing territory scoring. A positive score means Black wins, a negativescore is a victory for White. This is the number of empty spaces that aresurrounded by a colour, together with those captured during the game.

resultTerritory = territoryBlack − territoryWhite + capturedWhite − capturedBlack − komi
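
The eye-detection heuristic referenced in the eyesBlack/eyesWhite attribute is simple enough to sketch directly. The following is a minimal illustration only, assuming a hypothetical Board class with size() and colourAt(row, col) methods returning BLACK, WHITE or EMPTY; it is not the exact code used in my modified GoGui.

    // Count empty intersections whose every on-board neighbour holds a stone
    // of the given colour. Edge and corner points have fewer neighbours, so
    // they need correspondingly fewer surrounding stones.
    static int countEyes(Board board, int colour) {
        int eyes = 0;
        int n = board.size();
        for (int row = 0; row < n; row++) {
            for (int col = 0; col < n; col++) {
                if (board.colourAt(row, col) != Board.EMPTY)
                    continue; // only empty intersections can be eyes
                boolean isEye = true;
                int[][] neighbours = {{row - 1, col}, {row + 1, col},
                                      {row, col - 1}, {row, col + 1}};
                for (int[] p : neighbours) {
                    if (p[0] < 0 || p[0] >= n || p[1] < 0 || p[1] >= n)
                        continue; // off-board neighbours are ignored
                    if (board.colourAt(p[0], p[1]) != colour)
                        isEye = false;
                }
                if (isEye)
                    eyes++;
            }
        }
        return eyes;
    }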

Each of these 24 attributes was recorded every 10 moves of each game for the first 300 moves, giving a total of 24 ∗ 30 = 720 attributes recorded for each game. The same set of 1600 training games as used in section 4.5 was used to produce a csv file which could be loaded into RapidMiner. I did this by modifying the GoGui source to step through an SGF game 10 moves at a time, and perform analysis on the board by counting the number of black / white stones, performing grouping into chains, performing eye-detection etc. using the algorithms I developed and described above. After the csv file was generated and loaded into RapidMiner, the 'Remove Useless Attributes' operator was used to remove any attributes which were constant.
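
In outline, the per-game extraction loop has the shape sketched below. The GameReader, playTo and count* names are hypothetical stand-ins for the corresponding GoGui classes and the routines described above, so this shows the shape of the process rather than the actual implementation.

    import java.io.File;
    import java.io.PrintWriter;

    // Replay one SGF record, emitting one csv row of 720 attributes plus the
    // class label (the player who played black in this game).
    static void writeGameAttributes(File sgfFile, PrintWriter csv, String playerLabel) {
        GameReader game = new GameReader(sgfFile);   // hypothetical SGF replayer
        StringBuilder row = new StringBuilder();
        for (int move = 10; move <= 300; move += 10) {
            game.playTo(move);                       // replay the record up to this move
            Board board = game.board();
            row.append(game.capturedStones(Board.BLACK)).append(',');
            row.append(game.capturedStones(Board.WHITE)).append(',');
            row.append(countChains(board, Board.BLACK)).append(',');
            row.append(countEyes(board, Board.BLACK)).append(',');
            // ... the remaining attributes, 24 per 10-move interval
        }
        csv.println(row.append(playerLabel));        // final column is the class
    }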


4.6.2 Inbuilt GoGUI Scoring Engine

The score attributes (that is, areaBlack/White, territoryBlack/White, resultArea and resultTerritory) were generated using the scoring engine built into GoGui. Scoring in Go is not a simple matter as it requires both players to agree on which chains are alive, and which are captured (if they do not agree then play continues). I encountered 2 main problems while implementing the score generation / data collection part of the program to derive the listed attributes from an already completed game:

1) Scoring on an early turn. It is difficult to say what the score is going to be before the game is close to finishing. At the start of the game, it is impossible to say who is currently winning and by how much.

2) Deciding what groups are alive or dead purely from board position is difficult to do (NP-Hard) (Lichtenstein and Sipser, 1980). This means that it takes a long time to score a completed game accurately. If the game is not yet completed, it will take even longer to score.

For both of these reasons, I chose to use the inbuilt GoGui scoring engine. This engine works by ignoring any chains or territory for which life or death cannot be decided very quickly. This provides a rough guide to the current situation on the board, but is not very accurate. Using this method takes around half a second to generate all attributes and scores for a game (so, around 0.017 seconds for each 10-turn set of attributes). The alternative method would be to use GNU Go (or another computer player) to score the board at the 10-turn intervals throughout the game in order to decide what chains are alive or dead and so generate the score. This would be more accurate, but would take a lot longer (5-30 seconds for each 10-turn interval, or around 7 minutes per game). For development and testing, I decided to go with the much faster, although less accurate, scoring method. An exact score is unlikely to be needed by a classifier, and a general indication of who is winning and by how much is almost as useful. The decreased time also allowed many more experiments and tests to be run and decreased development time.

4.6.3 Results

4.6.3.1 Simple Naive Bayes

I ran the same processes on the score data as were run on the board data. The simple Naive Bayes classifier achieved an accuracy of 56.81% +/- 0.99% (see table 4.8).

4.6.3.2 Forward Selection

The Forward Selection Algorithm achieved an accuracy of 61.88% +/- 2.94% (see table 4.9). The 15 selected attributes (in order) were:

110_libertiesBlack, 170_capturedBlack, 270_meanLibertiesBlack, 180_atariBlack, 200_meanLibertiesWhite, 180_eyesWhite, 260_resultTerritory, 100_libertiesBlack, 240_libertiesBlack, 200_eyesBlack, 270_meanLibertiesWhite, 250_areaBlack, 270_areaWhite, 220_resultArea, 230_capturedWhite

4.6.3.3 ADA Boost

With the same training set, using AdaBoost, an accuracy of 56.81% +/- 0.99% was achieved (see table 4.10).

Figure 4.7: Accuracy results as the Forward Selection algorithm added more and more attributes. Each dot represents the accuracy of a single trial. Colour denotes generation.


Table 4.8: Accuracy results of Naive Bayes Classifier applied to the training set of scores taken from Go games between computer players using 10-fold cross-validation. Accuracy: 56.81% +/- 0.99%

  Prediction \ True Player   Aya      GNU Go 1   MoGo     GNU Go 10   Class Precision
  Aya                        318      75         29       66          65.16%
  GNU Go 1                   25       119        25       105         43.43%
  MoGo                       41       63         316      73          64.10%
  GNU Go 10                  16       143        30       156         45.22%
  Class Recall               79.50%   29.75%     79.00%   39.00%

Table 4.9: Accuracy results of Naive Bayes Classifier applied to the training set of scores taken from Go games between computer players using Forward Selection and 10-fold cross-validation. Accuracy: 61.88% +/- 2.94%

  Prediction \ True Player   Aya      GNU Go 1   MoGo     GNU Go 10   Class Precision
  Aya                        308      47         21       34          75.12%
  GNU Go 1                   50       239        40       187         46.32%
  MoGo                       30       26         311      47          75.12%
  GNU Go 10                  12       88         28       132         50.77%
  Class Recall               77.00%   59.75%     77.75%   33.00%


Table 4.10: Accuracy results of Naive Bayes Classifier applied to the training set of scores taken from Go games between computer players using ADABoost and 10-fold cross-validation. Accuracy: 56.81% +/- 0.99%

  Prediction \ True Player   Aya      GNU Go 1   MoGo     GNU Go 10   Class Precision
  Aya                        318      75         29       66          65.16%
  GNU Go 1                   25       119        25       105         43.43%
  MoGo                       41       63         316      73          64.10%
  GNU Go 10                  16       143        30       156         45.22%
  Class Recall               79.50%   29.75%     79.00%   39.00%


4.6.4 Analysis of Results

4.6.4.1 Comparison with board attributes set

We can see immediately that the accuracies achieved with the score attributes are slightly lower than those achieved by the boards classifier. This is surprising since the score attributes more directly correspond to the principles of strategy and tactics than the exact positions stones are placed. It is apparent that the computer player strategies are slightly more easily identified by their opening moves and where they tend to capture territory than by the attributes examined here. On the other hand, the score-based classifiers achieved an accuracy much higher than random classification, meaning that the scores definitely provide a good way of differentiating between the different strategies, even if they do not perform quite so well as the board positions.

4.6.4.2 Main Source Of Errors

Again, the main source of error is between GNU Go 1 and GNU Go 10. This is for the same reasons discussed in section 4.5.3.1— both players play in very similar ways. The number of misclassifications between GNU Go 1 and GNU Go 10 is 143 + 105 = 248, and the number of correct classifications is 119 + 156 = 275. This is not substantially better than random. The scores attribute set did not achieve such a high precision or recall for MoGo (64% and 79% compared with 92% and 95%). It appears that MoGo is better classified by board position, suggesting that MoGo's strategy for what areas of the board to capture is less varied than the other players'. This is supported by this statement from Yizao Wang, one of the developers of MoGo, commenting on a game in progress at the 2007 12th Computer Olympiad between MoGo and TSGO:

This is a typical opening for MoGo's game. It knows nothing about opening strategy but has a special sense for the centre. —Wang (2007)

4.6.4.3 Forward Selection

The attributes selected by forward selection are all from turn 100 onwards. This suggests that the score based attributes do not differ substantially between players during the opening. Indeed, chains and battles— the concepts that the attributes chosen here depend on— are most important during the midgame, so it makes sense that forward selection has prioritised attributes from this section of the game. A range of attributes were selected, with most score attributes (or their derivatives) being selected. Forward Selection improved accuracy by 5%.

The graph of performance as Forward Selection was performed can be seen in figure 4.7. Each dot on the graph represents the performance of a model which was generated and tested using 10-fold Cross Validation. Colour denotes generation. The first generation is the dark-blue block which runs from 0-20,000 milliseconds, as each of the 720 attributes was used to generate and test a model in turn. The attribute which scored the highest accuracy, 110_libertiesBlack, was then added to the result set, and generation 2, testing models generated using 110_libertiesBlack and one other attribute, runs from 20,000-40,000 milliseconds.

We can immediately see that the increase in accuracy is logarithmic in time, with the majority of the increase in accuracy being made in earlier generations. The graph starts at around 25% accuracy, which is the minimum we would expect since there were 4 classes, so even a random classifier would achieve 25% accuracy. By the 6th generation, the average accuracy is around 0.55, or 55%. The graph continues to climb for another 10 generations, taking another 360,000 milliseconds (6 minutes) to reach the final accuracy of ∼60%.

The length of each generation is also of interest to us— since each generation contains roughly the same number of tests, we can see from the graph how the number of attributes affects the time taken to generate and test a classifier. The time for each generation starts at roughly 20 seconds and increases linearly to 60 seconds for generation 16. This means that generation time increases linearly with the number of attributes when the size of the training set and model parameters are held constant for a Naive Bayes Classifier. This is as we would expect— the algorithm requires a pass over each training example for each attribute as seen in section 2.4.3.1, so it is of complexity O(A ∗ N) where A is the number of attributes and N is the number of training examples. It is this linearity which allows the Naive Bayes classifier to be used on this dataset with such a large number of attributes. Other classifiers do not perform so well.
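
The O(A ∗ N) behaviour can be seen directly in the structure of the training pass. The sketch below is a minimal illustration, not RapidMiner's implementation, assuming the examples arrive as a plain double[N][A] matrix with integer class labels; a single nested loop is all that is needed to fit the per-class Gaussian of every attribute.

    // Fit per-class Gaussian parameters (mean, variance) for every attribute
    // in one pass over the data: O(A * N).
    static double[][][] trainGaussianNB(double[][] x, int[] label, int numClasses) {
        int n = x.length;                              // N training examples
        int a = x[0].length;                           // A attributes
        double[][][] s = new double[numClasses][a][2]; // per-class {sum, sumOfSquares}
        int[] count = new int[numClasses];
        for (int i = 0; i < n; i++) {                  // one pass over N examples...
            count[label[i]]++;
            for (int j = 0; j < a; j++) {              // ...times A attributes
                s[label[i]][j][0] += x[i][j];
                s[label[i]][j][1] += x[i][j] * x[i][j];
            }
        }
        double[][][] model = new double[numClasses][a][2]; // {mean, variance}
        for (int c = 0; c < numClasses; c++)
            for (int j = 0; j < a; j++) {
                double mean = s[c][j][0] / count[c];
                model[c][j][0] = mean;
                model[c][j][1] = s[c][j][1] / count[c] - mean * mean;
            }
        return model; // a real implementation would also smooth zero variances
    }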

4.6.4.4 AdaBoost

Using the AdaBoost algorithm here had no discernible effect on accuracy. Comparing results tables 4.8 and 4.10 we can see that the two classifiers performed identically. This may be an artefact of the Normal distribution assumption that the naive Bayes algorithm makes for continuous variables. All of the attributes here are continuous, so every attribute uses the Gaussian distribution for converting a value into a probability for each class (see section 2.4.3.1). However, when the second Naive Bayes classifier for the Boosting model is generated, the modified weights do not do much to change the normal curve, so the second model ends up very like the first model and makes very similar predictions. The only way that the curve would shift significantly is if there was a group of examples which was NOT normally distributed about the mean and was entirely misclassified by the first model. See figures 4.8 and 4.9 for an example. The similarity of the classifiers, combined with the comparatively low weights of the models generated in rounds 2-6 (see table 4.11), means that they perform the exact same classification, and so get the same accuracy.
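
For reference, the example-reweighting step that drives this behaviour is sketched below under the standard AdaBoost.M1 scheme (the update RapidMiner uses may differ in detail). Correctly classified examples have their weights shrunk; if the surviving heavy examples are still spread evenly through a class's normal curve, the reweighted mean and variance, and hence the next Naive Bayes model, barely change.

    // AdaBoost.M1 weight update. weightedError is the weighted error rate of
    // the weak model just trained; correct[i] says whether example i was
    // classified correctly by it.
    static void reweight(double[] weights, boolean[] correct, double weightedError) {
        double beta = weightedError / (1.0 - weightedError); // < 1 if better than chance
        double sum = 0;
        for (int i = 0; i < weights.length; i++) {
            if (correct[i])
                weights[i] *= beta;   // shrink weights of correctly classified examples
            sum += weights[i];
        }
        for (int i = 0; i < weights.length; i++)
            weights[i] /= sum;        // renormalise so the weights sum to 1
    }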

Figure 4.8: The model generated in the first round of AdaBoost. The examples lie in a normal curve.

Table 4.11: Weights for each weak classifier within the AdaBoost model.

  Model   Weight
  1       0.445
  2       0.136
  3       0.022
  4       0.003
  5       0.007
  6       0.002
  7       0.003

Figure 4.9: The model generated in the second round of AdaBoost. Examples that the first round model classified correctly have been removed (had their weights reduced). We can see that the examples still form a normal distribution very similar to the first round, so both classifiers will make similar predictions.

4.7 Combined Boards & Scores

I tried combining the Boards and Scores data together into a single classifier. Since the Board data would identify areas where a player favoured a particular space or area (especially in the early turns of the game), and the score data would identify players that tended to play aggressively or score highly (especially towards the end of the game), I hoped that the combination of both of these training sets would produce a better classifier. Since the Boards and Scores achieved around 59-65% accuracy separately, we would hope that the combined data set would yield a classifier with significantly better accuracy.

4.7.1 Results

4.7.1.1 Simple Naive Bayes

The Naive Bayes Classifier with 10-fold Cross Validation achieved an accuracy of 69.38% +/- 2.45% on the combined boards and scores training data (see table 4.12).

4.7.1.2 Forward Selection

Applying forward selection to the 11550 attributes reduced the attribute set to 10 attributes, and created a classifier which had an accuracy of 63.56% +/- 3.08% (see table 4.13). The attributes selected were:

10_Q16, 190_capturedBlack, 110_libertiesBlack, 30_M10, 50_C15, 30_P11, 60_S9, 20_C18, 10_M11, 20_D19

4.7.1.3 ADA Boost

The ADABoost algorithm achieved an accuracy of 69.75% +/- 2.97% on the same data (see table 4.14).


Figure 4.10: Accuracy results as the Forward Selection algorithm added more and more attributes. Each dot represents the accuracy of a single trial. Colour denotes generation.


Table 4.12: Accuracy results of Naive Bayes Classifier applied to the training set of scores and boards taken from Go games between computer players using 10-fold cross-validation. Accuracy: 69.38% +/- 2.45%

  Prediction \ True Player   Aya      GNU Go 1   MoGo     GNU Go 10   Class Precision
  Aya                        363      31         3        32          84.62%
  GNU Go 1                   21       153        3        148         47.08%
  MoGo                       2        19         391      17          91.14%
  GNU Go 10                  14       197        3        203         48.68%
  Class Recall               90.75%   38.25%     97.75%   50.75%

Table 4.13: Accuracy results of Naive Bayes Classifier applied to the training set of scores and boards taken from Go games between computer players using Forward Selection and 10-fold cross-validation. Accuracy: 63.56% +/- 3.08%

  Prediction \ True Player   Aya      GNU Go 1   MoGo     GNU Go 10   Class Precision
  Aya                        319      36         0        40          80.76%
  GNU Go 1                   42       246        53       168         48.33%
  MoGo                       15       43         308      48          74.40%
  GNU Go 10                  24       75         39       144         51.06%
  Class Recall               79.75%   61.50%     77.00%   36.00%


Table 4.14: Accuracy results of Naive Bayes Classifier applied to the training set of scores and boards taken from Go games between computer players using ADABoost and 10-fold cross-validation. Accuracy: 69.75% +/- 2.97%

  Prediction \ True Player   Aya      GNU Go 1   MoGo     GNU Go 10   Class Precision
  Aya                        363      37         2        30          84.03%
  GNU Go 1                   16       147        2        144         47.57%
  MoGo                       4        17         394      14          91.84%
  GNU Go 10                  17       199        2        212         49.30%
  Class Recall               90.75%   36.75%     98.50%   53.00%


4.7.1.4 Aya, MoGo and GNU Go 1 only

Since the majority of the error in the classifiers comes from confusion between the two GNU Go players, I generated a classifier using only the training games from Aya, MoGo and GNU Go 1 (i.e. without any of the GNU Go 10 games). 400 games for each player gave a total of 1200 games, which were used to generate a naive Bayes classifier using 10-fold validation. The resulting classifier achieved a much higher classification rate of 91.50% +/- 2.41% (see table 4.15).

4.7.1.5 GNU Go players only

I also wanted to investigate whether the classifier was capable of differentiating between GNU Go 1 and GNU Go 10 without any influence from the other computer players. Reducing the training set to only those training games labelled as GNU Go 1 or GNU Go 10 yielded a classifier which achieved the accuracy results seen in table 4.16 (accuracy of 50.25% +/- 5.33%). This is well within the margin of error of a classifier which guesses randomly or always chooses the same classification, so this classifier cannot differentiate between the two strategies. This indicates that the GNU Go 1 and GNU Go 10 strategies are much more similar to each other than to either Aya or MoGo. In section 4.8 we create a better classifier which focuses only on GNU Go, and performs better than a random classifier.


Table 4.15: Accuracy results of Naive Bayes Classifier applied to the training set of scores and boards taken from Go games between Aya, MoGo and GNU Go 1 using 10-fold cross-validation. Accuracy: 91.50% +/- 2.41%

  Prediction \ True Player   Aya      GNU Go 1   MoGo     GNU Go 10   Class Precision
  Aya                        366      38         5        0           89.49%
  GNU Go 1                   31       341        4        0           90.69%
  MoGo                       3        21         391      0           94.22%
  GNU Go 10                  0        0          0        0           0.00%
  Class Recall               91.50%   85.25%     97.75%   0.00%

Table 4.16: Accuracy results of Naive Bayes Classifier applied to the training set of scores and boards taken from Go games between GNU Go 1 and GNU Go 10 using 10-fold cross-validation. Accuracy: 50.25% +/- 5.33%

  Prediction \ True Player   Aya      GNU Go 1   MoGo     GNU Go 10   Class Precision
  Aya                        0        0          0        0           0.00%
  GNU Go 1                   0        174        0        172         50.29%
  MoGo                       0        0          0        0           0.00%
  GNU Go 10                  0        226        0        228         50.22%
  Class Recall               0.00%    43.50%     0.00%    57.00%


4.7.2 Analysis of Results

4.7.2.1 Combined Scores and Boards

Combining the scores and boards data has increased the accuracy in every experiment. This is attributed to the fact that more data is available. The board data allows players who favour specific intersections or areas of the board to be identified, while the score data provides an overview of how the player has played and scored over the game so far, which allows long-term strategy to be identified by the classifier. As described in the individual analysis sections for each of the data sets above, the board data tends to be more useful toward the beginning of the game, while the score data is more useful closer to the end of the game, after 100 moves or so have passed. Both of these factors mean that the combination of data is best at classifying entire games.

4.7.2.2 Forward Selection

When using forward selection, similar attributes were selected to those selected when the forward selection algorithm was run on only the boards or only the scores data. This was expected since the attributes which are the best indicators of player strategy in the board or score only data will also be the best indicators of strategy in the combined data. Forward selection also chose attributes from both score and board data. For the first 4 attributes, where the biggest gain in accuracy was obtained (see fig 4.10), board and score attributes were used equally. This supports the hypothesis made at the beginning of this section that the 2 sets of attributes complement each other and are better at identifying certain players in different situations.

4.7.2.3 GNU Go confusion

An accuracy of over 90% is very high for differentiating between the 3 strategies Aya, MoGo and GNU Go 1. This result shows that it is possible to identify computer strategies with a high level of accuracy, which was my initial aim. I do not have the resources to play a substantial number of games using additional strategies, but the evidence suggests no reason why other computer players would be any more difficult to identify than the ones already tested.

On the other hand, the classifier has shown a complete inability to differentiate between the two different levels of GNU Go. The accuracy is within the error bound of 50%, which means that it is no better than a random classifier (e.g. a coin toss). As discussed above, we would expect the two players to play very similarly; however I did expect there to be some difference in how the players make their moves. The results suggest one of two situations:

1) GNU Go 1 and GNU Go 10 play identically. This turns out not to be the case: GNU Go 10 takes far longer to make a move than GNU Go 1, and also wins more than GNU Go 1. In the 100 games played between GNU Go 10 (Black) vs GNU Go 1 (White), GNU Go 10 took an average of 140 seconds of thinking time per game (about 1.26 seconds per move). GNU Go 1 took an average of 31 seconds of thinking time per game (about 0.28 seconds per move). In these 100 games, GNU Go 10 won, 72 to 28. In the 100 games where GNU Go 1 was playing black and GNU Go 10 was playing white, GNU Go 10 won again, 74 to 26. So there is a definite difference in strategy, or at least ability, since GNU Go 10 takes longer to decide its moves, and wins many more games.

2) GNU Go 1 and GNU Go 10 require a more advanced classifier to tell them apart. From the win statistics above, it is obvious that GNU Go 10 is the stronger player; however it still loses 1 in 4 games to GNU Go 1. In Go rankings, winning around 73% of the time corresponds to a difference of around 1 rank (the exact amount varies depending on the ranking system used), which means that the two players are very similar in strength. This provides further evidence that they are very difficult to tell apart, but it is possible that a better classifier would be able to differentiate between them with a better accuracy than random.

4.8 Better Classifiers for GNU Go

As seen in section 4.7.1.5, the classifiers generated so far cannot tell the difference between GNU Go 1 and GNU Go 10. This is unsurprising since they are both the same CPU player, but with different difficulty levels. Since we have established that they are essentially playing the same strategy, we have achieved our aim of creating a classifier which can classify between different strategies successfully. However, I also wanted to investigate whether it was possible at all to create a classifier which was capable of classifying between different levels of GNU Go— differentiating between 'sub-strategies'— with more accuracy than a random classifier.

Based on the fact that GNU Go 10 is stronger than GNU Go 1, I thought that a better approach might be to consider only the final score, with the intention that a high score for black would be a good indication that GNU Go 10 was playing black. Reconsidering the assumption made above in section 4.6.2, I thought that a more accurate measure of score may be beneficial (and certainly would not hinder performance!). I used GNU Go (level 1) to score all of the board positions for the 300th turn (usually after the game was finished). This gave an identical list of attributes to those in the boards and scores attribute set, but where 300_resultArea, 300_resultTerritory, 300_areaBlack/White and 300_territoryBlack/White were scored more accurately. This process took around 2 hours to complete for all 1600 games.
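
Driving GNU Go for this kind of batch scoring can be done over its GTP (Go Text Protocol) interface. The fragment below is a rough sketch of the idea only: loadsgf and final_score are part of GNU Go's GTP command set, but the response handling here is simplified, the file name is a placeholder, and the real data collection code differs.

    import java.io.*;

    static String scoreWithGnuGo(String sgfPath) throws IOException {
        Process gnugo = Runtime.getRuntime().exec(new String[]{"gnugo", "--mode", "gtp"});
        PrintWriter out = new PrintWriter(gnugo.getOutputStream(), true);
        BufferedReader in = new BufferedReader(new InputStreamReader(gnugo.getInputStream()));
        out.println("loadsgf " + sgfPath + " 300");  // replay the record up to move 300
        readResponse(in);                            // consume the acknowledgement
        out.println("final_score");                  // response such as "= B+12.5"
        String score = readResponse(in);
        out.println("quit");
        return score;
    }

    static String readResponse(BufferedReader in) throws IOException {
        // GTP responses end with a blank line; keep the last non-empty line
        String line, last = "";
        while ((line = in.readLine()) != null && !line.isEmpty())
            last = line;
        return last;
    }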

Running the simple Naive Bayes classifier on this data did not significantly affect the results, either when run on games for all 4 players, or when restricted to only GNU Go 1 and GNU Go 10 games. The accuracy found was less than 1% higher in both cases- i.e. within the error margins. This is unsurprising since the scores used in the previous (less accurate) attributes had provided a rough indication of the winner and the final score (see section 4.6.2).

I next tried generating a classifier using the final score (300_resultArea) as the only attribute, in the hope that GNU Go 10's stronger AI (and so higher score) would enable the classifier to differentiate it from GNU Go 1. This classifier performed slightly better, and achieved an accuracy of 55.63% +/- 3.84%, marginally above the 50% that a random classifier would achieve. The reason that this classifier does not perform better is that GNU Go 1 and GNU Go 10 still perform very similarly when playing against Aya and MoGo. Although GNU Go 10 may perform slightly better on average, it will still lose almost as many games as GNU Go 1 (statistically they are only 1 rank apart). If all of the games were played as GNU Go 1 vs GNU Go 10, then it would be easier to classify the winner and loser using this attribute (with a 75% success rate); however the nature of this experiment is that for a given game we do not know which player is playing as black or white.

I decided to use Forward Selection to ensure that there were no better attributes to test on than 300_resultArea (which was initially chosen somewhat arbitrarily). Surprisingly, forward selection performed very well here, and was able to produce a classifier which achieved an accuracy of 68.12% +/- 6.60%. The entire forward selection process took around an hour to complete. The most significant attribute was 300_capturedWhite. The distribution graph from the final model can be seen in figure 4.11. The attributes selected, in order, were:

300_capturedWhite, 160_Q15, 100_L11, 50_S9, 90_G15, 80_S1, 90_T15, 50_O12, 220_J19, 90_R19, 80_A9, 70_G15, 30_O10, 70_O5.

Figure 4.11: Graph showing the probability distribution for GNU Go 1 and GNU Go 10 for the attribute 300_capturedWhite after forward selection.

It is clear from figure 4.11 that GNU Go 10 has a higher expectation for the number of white stones captured (25 compared with 18). The variance of GNU Go 10 is also higher, indicating that it is more likely to capture a large number of white stones in a game (from the left hand side of the graph it also appears that GNU Go 10 is more likely to capture a large negative number of stones, but capturing a negative number of stones is impossible). This attribute is the best single measurement for classifying between GNU Go 1 and GNU Go 10. It is also interesting to note that the scoring attributes (territory and area attributes) were not selected. This supports my argument made in section 4.6.2 that very accurate scores do not significantly increase classification accuracy, although it is surprising since we have already established that GNU Go 10 wins more games than GNU Go 1.


Chapter 5

Go & Humans

5.1 Human Data

So far, I have used classifiers on data generated by computer players playing against each other. I next investigated whether it was possible to use the same techniques on games played by humans. The records from KGS (KGS, 2010b), one of the largest Go servers online, are freely available and provide a record of all Go games between players where at least one player is of rank 7 Dan or stronger, or both players are 6 Dan. I used the records of all games played between 1st January 2009 and 31st December 2009 (KGS, 2010a).

Since the games are provided in SGF format (see section 4.2.2), the same processes as used for the computer players can be applied easily. I first ran a script over all of the 18,839 games to separate them by the player playing black. The most active players, those with over 100 games, can be seen in table 5.1. Choosing the first 50 games for each of the 24 players who have played between 100 and 300 games gave an exampleSet of 1200 games. I ran the modified version of GoGui on these games to generate the scores and board position data.

5.2 Classifying Humans as Computers

I ran the Naive Bayes classifier which was generated in section 4.7 (without Forward Selection or Boosting) on the human data. All of the 1200 human games were classified as GNU Go 1.

I then ran the classifier created in section 4.7 using the Forward Selection algorithm over the same data. This classifier only uses 10 attributes out of the 11550 recorded:

10_Q16, 190_capturedBlack, 110_libertiesBlack, 30_M10, 50_C15, 30_P11, 60_S9, 20_C18, 10_M11, 20_D19

Table 5.1: Number of games played as black by each human player in 2009 (KGS server). Players who have played between 100 and 300 games are shown.

  Player       Games Played
  Hutoshi      242
  GBPacker     232
  turk         224
  coolbabe     197
  BUM          164
  take1970     164
  eastwisdom   151
  supertjc     146
  lorikeet     136
  aguilar1     129
  jim          129
  guxxan       127
  hirubon      127
  yagumo       127
  artem92      124
  satoke       122
  zchen        119
  koram        118
  stopblitz    113
  loveHER      111
  abel         106
  michi2009    104
  Erinys       102
  shiryuu      102

Table 5.2: Results of classifying the games played between human players as computer players.

  Player       Aya   GnuGo1   MoGo1   GnuGo10
  Hutoshi      11    2        2       35
  GBPacker     0     1        33      16
  turk         9     1        2       38
  coolbabe     3     0        37      10
  BUM          10    5        8       27
  take1970     2     4        31      13
  eastwisdom   4     1        22      23
  supertjc     4     0        27      19
  lorikeet     1     1        29      19
  aguilar1     4     2        15      29
  jim          2     5        19      24
  guxxan       10    4        6       30
  hirubon      4     3        16      27
  yagumo       7     5        5       33
  artem92      5     1        13      31
  satoke       1     2        24      23
  zchen        0     1        38      11
  koram        2     1        36      11
  stopblitz    14    0        14      22
  loveHER      4     0        9       37
  abel         7     1        12      30
  michi2009    3     0        19      28
  Erinys       6     1        2       41
  shiryuu      6     3        26      15
  TOTAL        119   44       445     592

As can be seen from table 5.2, GNU Go 10 and MoGo are the most predicted classes. This may be because these algorithms are the most 'humanlike', and so games which are played Human vs. Human most resemble games played by them (at least when only considering the 10 attributes above). I suspect that GNU Go 1 also plays in a 'humanlike' way (as suggested from the first classification experiment above), but because GNU Go 1 and 10 both play similarly, GNU Go 1 was 'eclipsed' by GNU Go 10: GNU Go 1 was the 'second best' classification for 487 out of the 592 human games classified as GNU Go 10. This indicates that there were many human games which were played in a similar way to GNU Go 1, but they were even more similar to GNU Go 10, so were classified as GNU Go 10. In other words, if GNU Go 10 had not been one of the classes, a lot more games would have been classified as GNU Go 1. Since we have already seen that GNU Go 1 and GNU Go 10 are roughly indistinguishable by our Naive Bayes classifier, it is likely that the difference between the classes is negligible and the classifications could be reversed by slight changes to the attributes used.

5.3 Classifying Humans as Humans

5.3.1 Results

I tried using the users themselves as classes, giving a total of 24 classes. Using a Naive Bayes Classifier and 10-fold Cross Validation, the generated model achieved an accuracy of 17.42% +/- 3.97%. This is substantially better than a random classifier, which would achieve 100/24 = 4.17% accuracy. This accuracy was achieved regardless of whether the final score (for turn 300) was calculated using the inbuilt GoGui or external GNU Go to do the scoring.

When Forward Selection was used on the same attributes, an accuracy of 29.92% +/- 1.74% was attained, with the following attributes selected:

30_R16, 20_Q3, 10_Q16, 20_P17, 100_D4, 10_Q4, 170_H1, 270_R18, 10_R5, 260_D16, 240_meanChainSizeBlack, 10_R17, 240_T12.

When the exampleSet was restricted to only the first 5 users (shiryuu, Erinys, michi2009, abel and loveHER) and the same process was run, an accuracy of 46% +/- 12.81% was attained (without Forward Selection). Restricting further, to only the first 3 users (shiryuu, Erinys and michi2009), yielded a classifier with 68.67% +/- 7.33% accuracy (without forward selection).

5.3.2 Analysis

5.3.2.1 Comparison with Computer Players

The result of 68.67% accuracy when classifying between 3 players is not as high as when classifying between 3 computer players (GNU Go 1, Aya and MoGo) (see section 4.7.1.4), where 91.5% accuracy was achieved, but is still much better than a random classification. The drop in accuracy relative to the computer players can be explained by their nature. Computer players are deterministic, and will always play according to the same algorithm. In a given situation (board position), they will always do the same thing (randomness may be applied here to make the computer produce different outcomes, but the choice and probabilities of choosing a particular action remain fixed). They also do not consider their opponent, his strategy (and strengths / weaknesses), or other 'meta game' factors such as time taken to make moves, whether several games have been played against the same opponent, or the possibility of cheating. Computer players which learn from previous games, or do consider these factors, may exist, but have not been studied here.

Contrastingly, humans are not deterministic, will learn over time (the human games studied here were played over the whole of 2009), will be less likely to make a wrong move if they have made that mistake before, and will attempt to identify weaknesses in the opponent's technique, possibly over several games. Factors such as the player's mood or external time constraints may affect how they play in different games.

The way the games were played and the data was collected will also have had an impact on the classification rate. The computer games were played in a controlled environment, with each computer player given a set time limit and computational power. The opponents also varied less— there were only 4 different opponents and 100 games were played against each opponent. In the human games, there were hundreds of different opponents, meaning that strategy was likely to be more varied. Also, with data gathered from internet Go servers (such as our data) there is no guarantee that the player behind an account is always the same player (friends may occasionally share an account), meaning that their strategy could appear to change significantly between games. When all of these factors are considered, the drop from ∼91% accuracy on computer players to ∼68% (for 3 human classes) here is much more understandable.

5.3.2.2 Forward Selection

Forward selection increased the accuracy when classifying between the 24 players from 17.42% to 29.92%. The 13 selected attributes were listed in section 5.3.1 above.

The recall accuracy varies a lot across the different players. 'turk' has a recall of 62%, being correctly identified 31 out of 50 times. 'supertjc' was not identified correctly at all. In some cases, the badly classified players were classified as several different players. For example, supertjc's 50 games were classified as 16 different players, with no more than 8 games being classified as the same player. In other situations, players were frequently misclassified as another specific player. This may indicate that the 2 players play very similarly— i.e. with similar strategies- and so should perhaps be grouped together into a new class. This could form the basis of a clustering technique which would find several strategies from similar players which could be successfully classified.

The attributes selected here are mostly board attributes, with only one score attribute (240_meanChainSizeBlack) being selected, as the 11th attribute. This suggests that human players stick to playing similar moves and focusing on specific areas of the board, rather than consistently scoring higher or playing such that the other score attributes remain consistent between games. This could be due to there being many different opponents (a different opponent in each game), meaning that the majority of the 'white' attributes (such as meanLibertiesWhite) are not relevant, and that the attributes which depend on the score (e.g. Area, Territory) could vary dramatically depending on the strength of the other player.

5.4 Classifying Computers as Humans

Once a classifier which classified humans successfully had been created in the previous section, I applied it to the computer games to see what humans they would be classified as. I used the classifier generated using the forward selection algorithm, since this had achieved the highest accuracy between human players. The results are shown in table 5.3.

Table 5.3: Number of games played between computer players classified as human players.

  Player       Aya   GnuGo1   MoGo1   GnuGo10
  shiryuu      52    45       55      49
  Erinys       20    2        0       3
  michi2009    11    3        1       10
  abel         36    11       1       17
  loveHER      8     13       0       11
  stopblitz    50    2        5       3
  koram        8     39       10      48
  zchen        3     19       5       14
  satoke       4     15       1       14
  artem92      30    18       1       21
  guxxan       46    16       4       9
  hirubon      10    29       22      25
  yagumo       26    6        0       7
  jim          10    31       69      24
  aguilar1     6     13       0       11
  lorikeet     7     8        20      11
  supertjc     0     2        2       3
  eastwisdom   2     1        3       3
  BUM          4     0        0       3
  take1970     14    67       157     53
  coolbabe     29    21       44      16
  turk         18    12       0       23
  GBPacker     2     12       0       15
  Hutoshi      4     15       0       7

We can see from the results that each computer player tends to be classified as 3-10 different human players. There are some players (e.g. eastwisdom and BUM) that have very few cpu-games classified as them. This suggests that these players do not play similarly to computer players, and that the computer-played games are closer to different human classes. On the other hand, there are players such as 'shiryuu' who had many cpu-games classified as them— for all 4 different computer players. This suggests that these players have some similarity with all of the computer players, and may hint at there being a common strategy between all 4 of the computer players. Again, GNU Go 1 and GNU Go 10 have achieved similar results, and were classified as most similar to the same human players. This makes sense since if GNU Go 1 plays similarly to koram, then we would expect GNU Go 10 to play similarly to koram too.

Also very noticeable is that MoGo was classified as the player 'take1970' for 157 of its 400 games. This indicates that MoGo plays in a way most similar to take1970. It is possible that take1970 was actually MoGo playing as a bot on KGS. This is supported by the results of the reverse experiment in section 5.2, where take1970 was classified as MoGo much more often than any other computer player (see table 5.2). It is not possible to test this theory since I have no way of getting in touch with the user take1970, but if this is true then it may indicate that a version of these classifiers could be used to detect cheating— identifying when a user is using a bot to play instead of playing themselves.


Chapter 6

Discussion of Implementation

The end results and analysis of the experiments performed have been discussed in previous chapters. This chapter addresses the problems that I encountered while completing this project, and their solutions. There were several problems which prevented experiments from working correctly, or which meant that the results were incorrect. In most of these cases I was able to fix the bugs and re-run the experiment, but in some cases the problems were more fundamental and required creative solutions or a different approach. Most of the problems described in this section are of the latter kind, since they are the more 'interesting' problems and are more applicable to related or future work which may be performed. This chapter also describes the design decisions I made when planning and coding the implementation, and suggests ways that it might be improved if it were to be used for future experiments.

6.1 Data Generation

One of the largest problems in this project was collecting the data. For Go, this involved having all of the computer players play enough games to create a large enough training set from which to learn a classifier. The time that it takes to play a Go game meant that generating data at the time the classifier was to be trained was not feasible. I split the processing into 3 sections:

1. Game Playing— Playing the computer players against each other to produce game records (SGF files).

2. Generation of an 'intuitive' representation— Processing the SGF files to produce useful training data that could be imported into RapidMiner and used to generate a classifier.

3. Generation & Testing of models— Using the imported data to produce a classifier and then testing that model.


Each of these 3 steps was completed separately, which meant that games which had been played could be used to produce several different representations, and each representation could then be used to generate several different models. This was especially useful during development, when the values in the intuitive representations could be inspected and were frequently found to be incorrect (due to bugs in the generation process), or when adjusting parameters while producing preliminary models to find what worked best.

I started by playing all 4 computer players against each other in order to give them a varied set of opponents. Since each game took 2-5 minutes to play on a modern laptop, the initial set of 1600 games would have taken a long time (∼100 hours) to play. To reduce this time I took several steps:

1. I was able to have several instances of Go games in progress at once on the same machine.

2. I had access to other PCs which could be used to run Go games when notin use.

3. Amazon EC2 ('The Cloud') offers cheap server rental charged by the hour. I was able to use a High-CPU instance to play some of the games.

4. I originally only played 1000 games, then 'flipped' the black and white sides to allow a game to produce 2 training records- one for black and one for white (see section 4.6.2).

While having several instances of Go games running on the same machine (point 1), I was careful to stop starting new instances when CPU usage neared 100%, so that the computer player strategy would not be degraded. On most PCs, 2 instances were enough to hit this limit. A brand new quad-core PC could have 4 games in progress at the same time. The computer players' behaviour varied when run on faster machines. Aya and GNU Go simply took less time to consider and make their moves. MoGo thought for the same amount of time (2 seconds), and performed more analysis of the board (more Monte-Carlo simulations), so made smarter moves when running on a faster computer. I adjusted the amount of thinking time in order to try to keep the number of simulations made constant (5000-9000 per move) when running on different hardware.

Using the games to produce 2 records each did not work as well as hoped, since player strategy differs depending on whether they move first or second, resulting in a biased training set. This was discussed at length in section 4.4.

6.2 Additional Computer Players

I originally intended to use 5 different computer players for analysis. Fuego is one of the strongest computer players (along with GNU Go and MoGo). Unfortunately, including Fuego would have required an additional 900 games to be played, and would probably not have made much of a difference to the results or conclusions drawn.


Fuego

Fuego is an open source, Lesser General Public Licensed Go library originally developed by the Computer Go Group at the University of Alberta. It includes a player that uses a Monte-Carlo tree search, and is actively maintained and developed. Fuego won 9x9 Go and achieved second place in 19x19 Go at the most recent (at time of writing) 14th Computer Olympiad in May 2009 (Computer Olympiad, 2010). In August 2009, Fuego won a game of 9x9 Go against Zhou Junxun, a top-rank (9 Dan) professional player- the first time a computer player has won a game of 9x9 Go against a top-ranked player without a handicap. Fuego also won a game of 19x19 Go against Mr. Chang Shen-Su, a strong 6-Dan amateur player, with 4 handicap stones (Group, 2010). Playing as a bot on KGS, Fuego has maintained a level of 1-2 Kyu over the past year (KGS, 2010d).

6.3 Large Data Sets

Another issue that arose when I was investigating the human data was that RapidMiner does not handle large (over 100MB) data sets very well. I initially used 100 games for each of the top 24 players, for a total of 2400 records, but RapidMiner struggled to load this data from the csv file and crashed repeatedly. This may be a bug, or RapidMiner may not be intended to handle such large data sets in this way. Extensions exist to allow it to interface with a database and only retrieve records in batches, so that it can be used with huge data sets or data sets which cannot be kept in memory, but I felt that it would be much easier and less error prone to reduce the number of games used from each player than to develop a database and interface and ensure that an equivalent process to the non-database process was being applied. Versions of Microsoft Excel prior to 2007 also cannot handle datasets with more than 256 attributes (columns).

6.4 Model Generation Time

Another issue with using large data sets is that it can take a long time to do processing and generate classifiers. The time taken and memory needed to generate a classifier is a function of the number of attributes (A), the size of the training set (N), and the parameters set for the model (P = p1, ..., pk), for example the number of generations or iterations to perform during generation.

GenerationTime = f(A, N, P)

With over 10,000 attributes and 1000 records, many classifiers take a very long time to be generated. Since I was using 10-fold cross validation, 10 models had to be generated in order to provide a correct measure of accuracy.

Forward Selection also takes a long time, although it was the fastest of the attribute selection algorithms tested. It has the advantage that in round n, only n attributes are used to generate a model (i.e. A = n is low), so the generation time for an individual model will be low. On the other hand, each round consists of all A attributes being tried one at a time— i.e. there will be A iterations, and A models generated (or, in the case of K-fold cross validation, K ∗ A models generated). Since forward selection typically selected around 15 attributes, that means that a total of K ∗ 15 ∗ A ≈ 10 ∗ 15 ∗ 10,000 = 1,500,000 models were generated and tested. Obviously this is only possible if the model takes a very short time to generate and test when the number of attributes is low. For Naive Bayes classifiers this is true, and forward selection could be used— although it still took several hours to complete. All other models I tried took much longer, meaning that forward selection was not practical.

Genetic Attribute Selection was an alternative attribute selection method. This method generates a population of Pop attribute sets, then uses each of these sets to generate and test a model. The attribute sets which generated the most accurate models 'survive' to the next generation, and the others are removed and replaced with copies of the more accurate sets. The survivors then may mutate— a few attributes are randomly added and removed, and all of the new attribute sets are retested. This continues for a given number of generations (Gens), and at the end the best set of attributes is returned. This method requires fewer models to be generated (with the default settings, Pop ∗ Gens ∗ K = 5 ∗ 30 ∗ 10 = 1,500 models), but each model will be generated from far more attributes (i.e. A will start off large), with the expected starting number of attributes being A/2. For Naive Bayes with 5000 attributes, it takes around 2 seconds to produce and test a model. This means that the entire genetic attribute selection process takes around 50 minutes. This can be improved by changing parameters of the genetic attribute selection so that (for example) the maximum number of attributes to be selected is set to 20. This speeds up generation of each model, so cuts the overall run time. Unfortunately, genetic attribute selection did not perform as well as forward selection (or no selection!) on any of the datasets I tried it on, with any of the parameter settings I used.

6.5 Noughts and Crosses: PerfectPlayer

To save development time, I used a strategy approximating perfect play for the PerfectPlayer strategy in Noughts and Crosses. The player first looks to see if it can win this turn, and moves there if it can. It next tries to block the opponent if they could win on their next move. A brute force approach then plays out all possible games from the current board position, and the possible empty spaces which could be moved in this turn are scored according to how likely they are to result in a victory. Although rather 'heavy-handed', this brute force approach took only a few minutes to program and test, and since we are only using the player to provide a strategy which will be classified, it is not important that it is actually playing with provably perfect strategy. We could rename the player 'VeryGoodPlayer' and all of the experiments and conclusions made in chapter 3 would be unaffected.
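
The move-selection order just described can be sketched as follows. The Board, emptyCells, wouldWin and scorePosition names are hypothetical stand-ins for the corresponding parts of my implementation, so treat this as an outline of the decision order rather than the actual code.

    static int chooseMove(Board board, char me, char opponent) {
        for (int cell : board.emptyCells())        // 1. take an immediate win
            if (board.wouldWin(cell, me))
                return cell;
        for (int cell : board.emptyCells())        // 2. block an immediate opponent win
            if (board.wouldWin(cell, opponent))
                return cell;
        int best = -1;                             // 3. otherwise brute-force every
        double bestScore = Double.NEGATIVE_INFINITY; //    continuation and pick the
        for (int cell : board.emptyCells()) {        //    cell most likely to win
            double score = scorePosition(board, cell, me);
            if (score > bestScore) {
                bestScore = score;
                best = cell;
            }
        }
        return best;
    }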

I am very confident that PerfectPlayer actually plays in a way identical to the actual perfect strategy, but I have not formally proved this. I did however perform a statistical analysis. I played PerfectPlayer against RandomPlayer 10,000 times and checked that PerfectPlayer never lost. RandomPlayer has a possible 9 ∗ 7 ∗ 5 ∗ 3 ∗ 1 = 945 different ways of playing if it goes first, and 8 ∗ 6 ∗ 4 ∗ 2 = 384 ways of playing if it goes second, for a total of 945 + 384 = 1329 possible sequences of moves. If we suppose that there is a possible sequence of moves such that PerfectPlayer loses, then the chance of RandomPlayer playing in this way in a given game is 1 in 1329, or 1/1329. It follows that in a given game, RandomPlayer will not play this way with probability 1 − 1/1329 = 1328/1329. For RandomPlayer to avoid this sequence of moves for 10,000 games in a row, the probability falls to (1328/1329)^10000 = 0.000538. This means that we can be 1 − 0.000538 = 99.946% sure that PerfectPlayer will really never lose. Since we have not included rotations and translations here, our confidence is actually even greater. This is definitely 'perfect' enough for its purpose here.

6.6 Programming Style

Throughout this project, I needed to develop scripts or programs to accomplish specific tasks. Since the scale of these tasks was very small, and the programs were usually only run a few times, I did not put a lot of emphasis on maintainable code or 'good' system design. Many of the scripts included hard coded values or other code specific to the task, rather than obeying the usual reusable object-oriented code design principles. The choice to use this methodology was a conscious one- I thought that getting results from code which worked was more important and would allow me to try more techniques and experiments, and create more interesting and useful results. If the software was to be distributed, or even used on a long term basis, it would need to be refactored significantly. Some examples of where the code style could be improved are:

1) PerfectPlayer Noughts and Crosses Player. PerfectPlayer uses a brute-force search to find the optimal positions to play. This is slow (it can take up to 1 second to make a move). Since it was only used to generate the initial 3600 games' training data, this was not a problem, but if much more training data was needed, or a faster response was necessary, then a rewrite of the perfect strategy would be needed. PerfectPlayer was discussed above in section 6.5.

2) Scoring the board. As discussed in section 4.6.2, scoring a Go board is a difficult problem. I used the inbuilt GoGui engine to score the boards in order to keep the time needed to process the games down. This was not as accurate as scoring the board using GNU Go or scoring it manually, but saved a huge amount of time.

3) Hard-Coded Values. There are several examples of where values are hard-coded into the program. For example, the directories which contained the SGF files to be processed are included in the GoGui.java file. When I needed to change the directories to process different files (for example, the human games), I altered the code manually and recompiled. It was much faster to do this on occasion than it was to write a GUI multiple file-selector for the user to choose what directories to process. It probably also reduced user error, since I did not have to manually choose the directories to use every time processing needed to be completed.


6.7 Modifications Made and Coding Done

6.7.1 Noughts and Crosses implementation

As described in section 3, I created a noughts and crosses implementation entirely from scratch. This implementation is extensible to add new Players, Symbols (beyond O and X), and, with a small amount of effort, change board sizes or rules. I also linked this in with the RapidMiner libraries and a pre-generated model to allow interactive play and live classification of the player's strategy (see section 3.3.3).

6.7.2 Modifications made to GoGUI

The main addition was 3 extra menu buttons which cause the program to loop through a list of SGF files and produce the representations which were then used as training data. There are 3 buttons:

• Write All Board Positions To File.

• Write All Scores To File.

• Write Board and Score Positions to File.

When one of these buttons is pressed, a new thread is started to loop through the SGF files and produce the output, which is saved to output.csv. Careful attention had to be paid to ensure that interfacing with GNU Go to score the final positions was possible (since GNU Go can take 5-30 seconds to score a board). Using a new thread has the benefit that the EventHandler thread is free to update the GUI, so the game currently being processed is played out in a fraction of a second as the processing is completed.
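
A minimal sketch of this threading pattern follows; the class and method names here are illustrative, not the actual GoGui internals:

import java.io.File;
import java.util.List;

public class SgfBatchProcessor {
    // Run the SGF loop off the event-dispatch thread so the GUI stays responsive
    // (GNU Go may block for many seconds while scoring each final position).
    public void processAllGames(final List<File> sgfFiles) {
        new Thread(new Runnable() {
            public void run() {
                for (File sgf : sgfFiles) {
                    processSingleGame(sgf); // parse, replay, score, append to output.csv
                }
            }
        }).start();
    }

    private void processSingleGame(File sgf) {
        // Omitted: SGF parsing, scoring and CSV output.
    }
}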

6.7.3 Bug Fixes and Modifications made to RapidMiner

RapidMiner was usually excellent at processing the data, but there were a few cases where I was able to update or improve the performance of some of the operators. For example, in the default implementation of the genetic attribute selection algorithm, initial attribute sets were randomly generated, then tested and rejected (to be regenerated) if they contained more than the maximum number of attributes. With over 10,000 attributes, a maximum number of attributes set to 20, and a default chance of inclusion of each variable of 50%, this meant millions of attribute sets were being generated before one which was accepted was found. I was able to change the probability of inclusion of an individual attribute in the set to MaximumNumberOfAttributes / TotalNumberOfAttributes, which decreased the size of the attribute sets dramatically, hugely increasing the speed of the process. I also discovered 2 bugs in other operators which I was able to fix and report to the developers.
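
The change amounts to sampling each attribute into an initial set with probability max/total rather than 0.5, so the expected set size equals the maximum allowed. A sketch of the idea (not the actual RapidMiner operator code):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class InitialAttributeSets {
    // With p = maxAttributes/totalAttributes the expected set size is
    // maxAttributes (e.g. ~20 of 10,000), so candidate sets are rarely
    // rejected for exceeding the maximum.
    public static List<Integer> sample(int totalAttributes, int maxAttributes, Random rng) {
        double p = (double) maxAttributes / totalAttributes;
        List<Integer> chosen = new ArrayList<Integer>();
        for (int i = 0; i < totalAttributes; i++)
            if (rng.nextDouble() < p)
                chosen.add(i);
        return chosen;
    }
}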


Chapter 7

Conclusion & Future Work

7.1 Conclusion

I have shown that it is possible to create classifiers which are capable of differentiating between strategies with a reasonable level of accuracy. The best Go classifier was able to differentiate between 3 very different computer players with 90% accuracy, or 70% accuracy when there are 4 different players, 2 of which play very similarly. The best classifier applied to human data achieved 30% accuracy when classifying between 24 different players using real-life, unfiltered data. My original aim was to "analyse an opponent’s game playing technique by observing them playing a game and attempt to identify other games played by this player – recognising a player by their strategy" (see section 1.1). The classifiers I have developed accomplish this better than any previous work and are capable of recognising a player from their strategy for Noughts and Crosses and Go. It is also possible to perform this analysis in real time, as demonstrated by the interactive Noughts and Crosses player (see section 3.3.3), and classify a player’s strategy as the game is played. More accurate or specialised classifiers may be created in the future in order to take this work further, as described below.

7.2 Future Work

There are several possible avenues for further work.

1. Additional Attributes — The classifier could be developed further with more measurements made from the board to try to isolate further important attributes which would enable a higher classification accuracy.

2. More Advanced Classifier — The classifier itself could be developed further to identify more patterns or perform more complicated analysis on the input attributes. The main constraint here is processing time and memory, but it may be possible to use a classifier on a subset of attributes, or create a classifier which can be generated fast enough to be practical.

3. Clustering — Clustering of strategies could be performed in order to identify particular strategies (rather than individual people) which would generalise to new players. This may require a human expert to describe or label strategies in order to create a better classifier.

4. Application — The work which has been done here could be used to create a better computer player. As described at the start of this report, current players do not take the opponent’s strategy into account directly: they simply try to make the best move in a given scenario. The inclusion of one of the classifiers I have developed could help to identify the opponent’s strategy and adjust the player’s own strategy accordingly.

7.2.1 Additional Attributes

Using further measurements from the board could give more of an insight into how players play. For example, more attributes such as ‘length of game’, ‘number of live/dead groups’, or ‘average distance of move from opponent’s previous move’ could be added. Attributes which refer to only the last 10 or 100 moves may also be beneficial if a particular player tends to play differently at different points in the game, e.g. ‘White stones captured in the last 10 turns’ or ‘Black chains created since 100 turns ago’. The granularity of turns considered could also be decreased from 10, which may result in some increased accuracy, especially toward the start of the game.

Identifying what shapes frequently occur in the game may be a good identifier of strategy. For example, one player may play in such a way that ‘ladders’ (see fig 4.2) are particularly common. One way to accomplish this task would be to create a template of size 3x3 and place it in the top left, then consider the shape under the template (either considering only black stones, only white stones, or both stones) and increment a counter for that shape by 1. The template would then be moved across the entire board, each time considering the visible stones and incrementing the relevant counters. The process could then be repeated for templates of size 4x4, 5x5 and 6x6 (or larger); a sketch of this windowing loop is given at the end of this subsection. This would result in a lot of attributes to be considered.

3 × Σ_{i=3}^{6} (19 − i)² = 3 × 846 = 2538

2538 attributes per turn is a lot, so some key shapes out of these would probably have to be chosen to be used in the final board data which is used to train a classifier. The concept of ‘joseki’ (a ‘textbook’ play which is a well-known response to a particular situation) could also be incorporated here through the use of a joseki dictionary to which a sequence of shapes could be matched, to produce more attributes which contain the number of times the player has used that particular joseki.
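
A sketch of the template count for a single window size follows; the board encoding (characters for black, white and empty) and the use of the raw window contents as the counter key are assumptions made for illustration:

import java.util.HashMap;
import java.util.Map;

public class ShapeCounter {
    // Slide a k-by-k window over the board and count each distinct shape seen.
    // board[row][col] holds 'B', 'W' or '.'.
    public static Map<String, Integer> countShapes(char[][] board, int k) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (int r = 0; r + k <= board.length; r++) {
            for (int c = 0; c + k <= board[r].length; c++) {
                StringBuilder key = new StringBuilder();
                for (int dr = 0; dr < k; dr++)
                    for (int dc = 0; dc < k; dc++)
                        key.append(board[r + dr][c + dc]);
                String shape = key.toString();
                Integer seen = counts.get(shape);
                counts.put(shape, seen == null ? 1 : seen + 1);
            }
        }
        return counts;
    }
}

Running this once per window size (k = 3 to 6) and once per colour filter would produce the per-turn attribute counts discussed above.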

7.2.2 More Advanced Classifier

An alternative to providing more attributes for the current classifiers to work on could be to use a classifier which is able to extract more meaningful data from the board positions already included in the training data. Naive Bayes was used throughout the majority of this project, primarily because of its speed on large data sets. However, Naive Bayes does not consider the relationships between individual variables (in fact, it assumes they are all independent). It is likely that there are other classifiers which could be used which would utilise the more complex relationships between attributes and achieve a higher accuracy (Webb et al., 2005).

Stacking, or ensemble learning, is a technique which is used to make a strong classifier out of several weaker ones. A range of classifiers are trained and evaluated on training data. These classifiers are then combined or ‘blended’ together using weights to create a classifier which is stronger than any of its components. Feature-Weighted Linear Stacking (Sill et al., 2009) takes stacking even further and uses meta-data to generate the weights dynamically, so that different models are used for examples they are particularly good at classifying. Stacking techniques may be applicable to this data without significant increases in the processing power, time or memory required.
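
As a minimal illustration of the blending step, the sketch below combines per-class probability estimates from several base models using fixed weights; Feature-Weighted Linear Stacking would replace these fixed weights with functions of meta-features, which is not shown here:

public class StackedBlend {
    // modelProbs[m][c] is model m's probability estimate for class c;
    // weights[m] would typically be fitted on held-out validation data.
    public static double[] blend(double[][] modelProbs, double[] weights) {
        double[] combined = new double[modelProbs[0].length];
        for (int m = 0; m < modelProbs.length; m++)
            for (int c = 0; c < combined.length; c++)
                combined[c] += weights[m] * modelProbs[m][c];
        return combined; // predict the class with the largest combined score
    }
}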

7.2.3 Clustering

The project has focused on being able to classify computer or human players from their strategy. However, in order to extend to data other than the training data, it would be beneficial to identify generic strategies which are not tied to a particular player. There is little evidence to suggest that using the players who played the games as the classes is the optimal selection of classes; they may be able to be grouped together, or new classes produced altogether. For example, GNU Go 1 and GNU Go 10 both played with very similar, although not identical, strategies. These could potentially be grouped together to create another class which shares a lot of attributes. If the human data could be grouped together in a similar way, to find 5-10 distinct strategies, then the classifier could be applied to future games to identify the strategy.

Unfortunately, there are a huge number of ‘strategies’. Strategies could be as basic as ‘always capture the centre point’, or as complex as to specify a move for any possible board position. In order to choose 5-10 different strategies to use as the classes of a classifier, relevant strategies need to be identified: for example, strategies which are used a significant proportion of the time, or that consistently beat another one of the 5-10 strategies but are not always used by players (presumably because they are weak to another strategy). Since the only labels we have in the current data set are the usernames of the players who played the games, and they may employ several different strategies in different games, it is difficult to decide which of the millions of possible strategies would make good classes for a classifier.

One solution could be to get an expert to manually classify some training games with well-known strategies which humans are known to play with. A classifier could then be trained in the same way as the classifiers trained so far, and should be able to tell between these defined strategies. There are obvious drawbacks to this method in that access to a Go expert is needed, and s/he needs to be willing to explain and classify a substantial set of games to produce a training set. There is also the problem that humans may switch strategies part way through a game or use different strategies in different situations.

Another alternative might be to select a large number of different strategies, then classify games according to how much of each of these strategies they employ. In this way, a player who happens to use both the ‘capture the centre point’ and the ‘play aggressively’ strategies would have both strategies identified.

7.2.4 Application

This project, all experiments, the generation of classifiers and the application to human data were undertaken with a view to identifying strategies from the board, with the intention that this might be applied in a computer player to identify the opponent’s strategy and adapt its own strategy accordingly. The classifiers generated here could be applied directly, and with some statistical analysis of which strategies tended to beat which other strategies, it would be possible to choose a strategy which would be more likely to win against the opponent.

One issue that arises is that knowing what should be achieved and how to achieve it are different problems. For example, if the best strategy to adopt was one that captured the top-left corner of the board, then the computer player would attempt to do this. However, it would not necessarily know how to achieve such an objective. Analogously, one might say that the best strategy is to ‘win’ (i.e. to have a high score at the end of the game). However, this does not give a possible method through which this might be achieved. There are a few methods that might be applied in order to plan towards an objective:

Generative models could be used to generate the best next move or several moves. For example, every possible move could be simulated and the resulting boards classified, and the move that caused the strategy to be classified closest to the desired objective would be made. For example, if White’s (the computer player’s) objective was to capture the top-left corner of the board, it is likely that moves made in or near that area would cause the classifier to classify White’s strategy as ‘capturing-top-left’. If the classifier were advanced enough and several strategies were applied at once with different weights (such as ‘minimise white casualties’ or ‘defend when attacked’), it is possible that this would produce a strong computer player capable of planning and adapting strategy according to the state of the board and the opponent’s perceived strategy.
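
A sketch of that selection loop is below; the BoardClassifier interface and the array of one hypothetical board per candidate move are illustrative placeholders rather than part of the system described above:

interface BoardClassifier {
    // Probability that the given board position reflects the named strategy.
    double probabilityOf(int[][] board, String strategy);
}

public class StrategyGuidedMoveChooser {
    // boardsAfterMoves holds one hypothetical board per legal move; return the
    // index of the move whose resulting board best matches the target strategy.
    public static int choose(int[][][] boardsAfterMoves, BoardClassifier clf,
            String targetStrategy) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < boardsAfterMoves.length; i++) {
            double s = clf.probabilityOf(boardsAfterMoves[i], targetStrategy);
            if (s > bestScore) {
                bestScore = s;
                best = i;
            }
        }
        return best;
    }
}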


Another approach would be to integrate the strategy recognition classifiers into an existing computer player. In this way, when an aim was identified, such as ‘capture top-left corner’, weights within the existing program could be changed to make it weight that corner more highly. In this way, the classifier is used purely as an ‘enemy intelligence’ tool and to provide long-term strategic aims. The planning and tactical play are left to the existing program’s AI. Given the strength of recent computer players (e.g. GNU Go) and the amount of research that they currently make use of, this is probably the best way to draw on the strengths of both areas and include the new techniques developed here to make the existing player even stronger.

Other applications for the work I have demonstrated here might include adapting strategy recognition to other games; teaching, to allow a student to receive automated, helpful feedback without a teacher; an automated game summary tool for creating short summaries of games, including the players’ strategies; or research into a particular opponent to discover their ‘favourite’ strategies, so that one might study more effectively how to beat that opponent. Hopefully this project paves the way for further work to come!


Bibliography

Allis, L. et al. (1994). Searching for solutions in games and artificial intelligence. CIP-Gegevens Koninklijke Bibliotheek, Den Haag, pages 90–9007488.

Allis, V. (1988). A knowledge-based approach to connect-four. The game is solved: White wins. Master's thesis, Faculty of Mathematics and Computer Science, Free University, Amsterdam.

Bishop, C. et al. (2006). Pattern recognition and machine learning. Springer, New York.

Bouzy, B., Helmstetter, B., and Hsu, T. (2004). Monte-Carlo Go developments. In Advances in computer games: many games, many challenges: proceedings of the ICGA/IFIP SG16 10th Advances in Computer Games Conference (ACG 10), November 24-27, 2003, Graz, Styria, Austria, page 159. Springer Netherlands.

Brügmann, B. (1993). Monte Carlo Go. Unpublished.

Campbell, M., Hoane, A., and Hsu, F. (2002). Deep Blue. Artificial Intelligence, 134(1-2):57–83.

Charniak, E. (1991). Bayesian networks without tears. AI Magazine, 12(4):50.

Computer Olympiad (2010). Computer Olympiad results. http://www.grappa.univ-lille3.fr/icga/competition.php?id=3.

Davies, J. and Bozulich, R. (1984). An introduction to Go. Ishi Press.

Demaine, E. (2001). Playing games with algorithms: Algorithmic combinatorial game theory. Mathematical Foundations of Computer Science 2001, pages 18–33.

Farnebäck, G. (2010a). GNU Go manual. http://www.gnu.org/software/gnugo/gnugo_1.html#SEC2.

Farnebäck, G. (2010b). Go Text Protocol specification. http://www.lysator.liu.se/~gunnar/gtp/.

Freund, Y. and Schapire, R. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory, pages 23–37. Springer.


Fürnkranz, J. (1996). Machine learning in computer chess: The next generation. International Computer Chess Association Journal, 19(3):147–160.

Group, C. G. (2010). Fuego. http://fuego.sourceforge.net/.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. (2009). The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1):10–18.

He, S., Du, J., Chen, H., Meng, J., and Zhu, Q. (2008a). Strategy-Based Player Modeling during Interactive Entertainment Sessions by Using Bayesian Classification. In Natural Computation, 2008. ICNC'08. Fourth International Conference on, volume 4.

He, S., Xie, F., Wang, Y., Meng, J., Chen, H., Liu, Z., and Zhu, Q. (2008b). Game Player Strategy Pattern Recognition and How UCT Algorithms Apply Pre-Knowledge of Player's Strategy to Improve Opponent AI. In the International Conference on Innovation in Software Engineering (ISE'2008).

Hollosi, A. (2010). SGF file format FF[4] official specification. http://www.red-bean.com/sgf/.

Hoock, J., Rimmel, A., and Teytaud, O. (2008). Combining expert, offline, transient and online knowledge in Monte-Carlo exploration.

Jasiek, R. (2010). Elementary rules. http://home.snafu.de/jasiek/element.html.

Kendall, G. and Whitwell, G. (2001). An evolutionary approach for the tuning of a chess evaluation function using population dynamics. hno, 1(26):4–5.

KGS (2010a). Historical records for KGS games. http://www.u-go.net/gamerecords/.

KGS (2010b). KGS Go Server. http://www.gokgs.com/.

KGS (2010c). KGS Go Server rank graph for Aya. http://www.gokgs.com/graphPage.jsp?user=ayabot.

KGS (2010d). KGS Go Server rank graph for Fuego. http://www.gokgs.com/graphPage.jsp?user=Fuego.

KGS (2010e). KGS Go Server rank graph for GNU Go. http://www.gokgs.com/graphPage.jsp?user=GNU.

KGS (2010f). KGS Go Server rank graph for MoGo. http://www.gokgs.com/graphPage.jsp?user=CzechBot.

Laffont, J. (1997). Game theory and empirical economics: The case of auction data. European Economic Review, 41(1):1–35.

Lichtenstein, D. and Sipser, M. (1980). Go is polynomial-space hard. Journal of the ACM (JACM), 27(2):393–401.


Lubberts, A. and Miikkulainen, R. (2001). Co-evolving a Go-playing neural network. In Proceedings of the GECCO-01 Workshop on Coevolution: Turning Adaptive Algorithms upon Themselves, pages 14–19. Citeseer.

Müller, M. (2002). Computer Go. Artificial Intelligence, 134(1-2):145–179.

Moriarty, D. and Miikkulainen, R. (1995). Discovering complex Othello strategies through evolutionary neural networks. Connection Science, 7(3):195–210.

Richards, N., Moriarty, D., and Miikkulainen, R. (1998). Evolving neural networks to play Go. Applied Intelligence, 8(1):85–96.

Roth, A. (2002). The economist as engineer: Game theory, experimentation, and computation as tools for design economics. Econometrica, 70(4):1341–1378.

Samuel, A. (1967). Some studies in machine learning using the game of checkers II: recent progress.

Schaeffer, J., Burch, N., Bjornsson, Y., Kishimoto, A., Muller, M., Lake, R., Lu, P., and Sutphen, S. (2007). Checkers is solved. Science, 317(5844):1518.

Schaeffer, J. and Van den Herik, H. (2002). Games, computers, and artificial intelligence. Artificial Intelligence, 134(1-2):1–7.

Sensei's Library (2010a). About the opening. http://senseis.xmp.net/?AboutTheOpening.

Sensei's Library (2010b). Go proverbs. http://senseis.xmp.net/?GreatQuotes.

Shannon, C. (1950). Programming a computer for playing chess. Philosophical Magazine, 41(7):256–275.

Sill, J., Takacs, G., Mackey, L., and Lin, D. (2009). Feature-Weighted Linear Stacking. Arxiv preprint arXiv:0911.0460.

Tenenbaum, J. (1999). Bayesian modeling of human concept learning. In Advances in neural information processing systems 11: proceedings of the 1998 conference, page 59. The MIT Press.

Tenenbaum, J., Griffiths, T., and Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7):309–318.

Tromp, J. and Farnebäck, G. (2007). Combinatorics of Go. Computers and Games, pages 84–99.

Wang, Y. (2007). Interview with Yizao Wang. http://www.youtube.com/watch?v=ELRCn4hfPjQ.

Wang, Y. (2010). MoGo: A software for the game of Go. http://www.lri.fr/~teytaud/mogo.html.

Webb, G., Boughton, J., and Wang, Z. (2005). Not so naive Bayes: Aggregating one-dependence estimators. Machine Learning, 58(1):5–24.


Appendices


App

endixA

Source

Cod

e

A.1

Nou

ghts

andCrosses

The

source

code

form

yim

plem

entatio

nof

Nou

ghts

andCrosses

(Tic-Tac-Toe)

isinclud

edhe

re.

Thiscode

does

notusean

yexternal

librarie

san

dwas

all

writ

tenby

me.

TicTa

cToeLo

gger.ja

vaan

dBoard.ja

vaprov

idethemaincode

executab

le,L

ocation.java

andSy

mbo

l.javaareused

asda

tastructures

tobe

passed

betw

eenothe

rclasses,

each

oftheplayer

strategies

hasits

ownclass,

andCustomPlayer.ja

va,PlayGam

e.java

andRMTicTa

cToeClassifier.ja

vaare

used

fortheinteractivepa

rtof

thegame(see

section3.3.3).

Allsource

code

,exp

erim

ents

andda

tasets

canbe

foun

don

theen

closed

CD.

Source/tictactoe/T

icTa

cToeLo

gger.ja

va1

pac

kage

uk.c

o.t

f214

;2 3

imp

ort

java

.io

.∗;

4im

por

tja

va.u

til.

Arr

ayL

ist

;5 6

pu

bli

ccl

ass

Tic

Tac

Toe

Log

ger

{7 8

pu

bli

cst

ati

cvo

idm

ain

(Str

ing

[]ar

gs)

{

9S

trin

gou

tpu

t=

"";

10ou

tpu

t+

=p

rin

tBoa

rdL

abel

s(1)

;11

outp

ut

+=

pri

ntB

oard

Lab

els(

2);

12ou

tpu

t+

=p

rin

tBoa

rdL

abel

s(3)

;13

outp

ut

+=

pri

ntB

oard

Lab

els(

4);

14ou

tpu

t+

=p

rin

tBoa

rdL

abel

s(5)

;15

outp

ut

+=

pri

ntB

oard

Lab

els(

6);

16ou

tpu

t+

=p

rin

tBoa

rdL

abel

s(7)

;17

outp

ut

+=

pri

ntB

oard

Lab

els(

8);

18ou

tpu

t+

=p

rin

tBoa

rdL

abel

s(9)

;19

outp

ut

+=

"Cla

ss\n

";

20 21/∗

22∗

//T

his

sect

ion

test

sw

heth

erp

erfe

ctP

laye

ris

act

ua

lly

per

fect

.fo

r(

23∗

int

i=

0;i

<10

000;

i++

){

if(

i%

100

==

0)

Syst

em.o

ut.p

rin

tln

(24

∗i

);

25∗

26∗

Str

ing

gam

eStr

ing

=pl

ayG

ame(

new

Per

fect

Pla

yer

(ne

wSy

mbo

l(

"X"

))

,27

∗ne

wR

ando

mP

laye

r(ne

wSy

mbo

l("Y

")

))

;if

(!g

ameS

trin

g.

equ

als

(""

28∗

))

{ou

tpu

t+

=ga

meS

trin

g;

}}

29∗/

30 31A

rray

Lis

t<P

laye

r>X

Pla

yers

=ne

wA

rray

Lis

t<P

laye

r>()

;32

XP

laye

rs.a

dd(n

ewP

erfe

ctP

lay

er(n

ewSy

mbo

l("X

")))

;33

XP

laye

rs.a

dd(n

ewE

dgeP

laye

r(ne

wSy

mbo

l("X

")))

;34

XP

laye

rs.a

dd(n

ewR

ando

mP

laye

r(ne

wSy

mbo

l("X

")))

;35

XP

laye

rs.a

dd(n

ewC

orn

erP

laye

r(ne

wSy

mbo

l("X

")))

;36

XP

laye

rs.a

dd(n

ewR

andE

dgeP

laye

r(ne

wSy

mbo

l("X

")))

;37

XP

laye

rs.a

dd(n

ewR

andC

orne

rPla

yer(

new

Sym

bol(

"X")

));

38 39A

rray

Lis

t<P

laye

r>Y

Pla

yers

=ne

wA

rray

Lis

t<P

laye

r>()

;40

YP

laye

rs.a

dd(n

ewP

erfe

ctP

lay

er(n

ewSy

mbo

l("Y

")))

;41

YP

laye

rs.a

dd(n

ewE

dgeP

laye

r(ne

wSy

mbo

l("Y

")))

;42

YP

laye

rs.a

dd(n

ewR

ando

mP

laye

r(ne

wSy

mbo

l("Y

")))

;43

YP

laye

rs.a

dd(n

ewC

orn

erP

laye

r(ne

wSy

mbo

l("Y

")))

;44

YP

laye

rs.a

dd(n

ewR

andE

dgeP

laye

r(ne

wSy

mbo

l("Y

")))

;45

YP

laye

rs.a

dd(n

ewR

andC

orne

rPla

yer(

new

Sym

bol(

"Y")

));

46 47S

trin

gBu

ild

erou

tpu

tBu

ild

er=

new

Str

ingB

uil

der

(ou

tpu

t);

48fo

r(P

laye

rX

Pla

yer

:X

Pla

yers

){

49Sy

stem

.out

50.p

rin

tln

("X

Pla

yer:

"+

XP

laye

r.ge

tCla

ss()

.get

Sim

pleN

ame

93

Page 97: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

())

;51

for

(Pla

yer

YP

laye

r:

YP

laye

rs)

{52

Syst

em.o

ut.p

rin

tln

("Y

Pla

yer:

"53

+Y

Pla

yer.

getC

lass

().g

etSi

mpl

eNam

e()

);

54fo

r(i

nt

gam

enum

=0;

gam

enum

<10

0;ga

men

um+

+)

{55

outp

utB

uil

der

.app

end

(pla

yGam

e(X

Pla

yer

,Y

Pla

yer)

);

56}

57}

58}

59 60tr

y{

61//

Cre

ate

file

62F

ileW

rite

rfs

trea

m=

new

Fil

eWri

ter(

"out

.tx

t")

;63

Bu

ffer

edW

rite

rou

t=

new

Bu

ffer

edW

rite

r(fs

trea

m)

;64

out.

wri

te(o

utp

utB

uil

der

.toS

trin

g()

);

65//

Clo

seth

eou

tpu

tst

ream

66ou

t.cl

ose

();

67}

catc

h(E

xcep

tion

e){/

/C

atch

exce

ptio

nif

any

68Sy

stem

.err

.pri

ntl

n("

Err

or:

"+

e.g

etM

essa

ge()

);

69}

70 71//

Syst

em.o

ut.p

rin

tln

(ou

tpu

t)

;72

Syst

em.o

ut.p

rin

tln

("do

ne")

;73

Syst

em.e

xit

(1)

;74

}75 76

pri

vat

est

ati

cS

trin

gpl

ayG

ame(

Pla

yer

Xpl

ayer

,P

laye

rY

pla

yer)

{77

Boa

rdbo

ard

=ne

wB

oard

();

78S

trin

gou

tpu

tstr

=""

;79

Str

ing

trai

nin

gEx

amp

lesS

trin

g=

"";

80 81in

ttu

rn=

0;82 83

//Y

goes

firs

th

alf

ofth

eti

me

.84

if(M

ath

.ran

dom

()>

0.5)

{85

turn

++

;86

Loc

atio

nlo

c=

Yp

laye

r.m

akeM

ove(

boar

d)

;87

boar

d.m

akeM

ove(

loc

,ne

wSy

mbo

l("Y

"))

;88 89

outp

uts

tr+

=tu

rn+

","

;90

outp

uts

tr+

=p

rin

tBoa

rdD

etai

ls(b

oard

)+

","

;91

//Sy

stem

.out

.pri

ntl

n(

boar

d)

;92

}93 94

wh

ile

(!bo

ard

.gam

eCom

plet

e())

{95

turn

++

;

96L

ocat

ion

loc

=X

pla

yer.

mak

eMov

e(bo

ard

);

97bo

ard

.mak

eMov

e(lo

c,

new

Sym

bol(

"X")

);

98 99ou

tpu

tstr

+=

turn

+"

,";

100

outp

uts

tr+

=p

rin

tBoa

rdD

etai

ls(b

oard

)+

","

;10

1//

Syst

em.o

ut.p

rin

tln

(bo

ard

);

102

103

//T

his

isth

epo

int

atw

hich

inan

ACTU

AL(n

on−

tra

inin

g)

gam

e,

we

104

//w

ould

ask

for

ap

red

icti

on

ofX

’sst

rate

gy.

Wri

teth

isex

ampl

eto

105

//th

etr

ain

ing

file

.10

6S

trin

gp

add

edou

tpu

tstr

=ou

tpu

tstr

;10

7fo

r(i

nt

i=

0;i

<(9

−tu

rn)

∗17

;i+

+)

108

pad

ded

outp

uts

tr+

="

,";

109

110

trai

nin

gEx

amp

lesS

trin

g+

=p

add

edou

tpu

tstr

111

+X

pla

yer.

getC

lass

().g

etSi

mpl

eNam

e()

+"\

n"

;11

211

3if

(boa

rd.g

ameC

ompl

ete(

))11

4b

reak

;11

511

6tu

rn+

+;

117

loc

=Y

pla

yer.

mak

eMov

e(bo

ard

);

118

boar

d.m

akeM

ove(

loc

,ne

wSy

mbo

l("Y

"))

;11

912

0ou

tpu

tstr

+=

turn

+"

,";

121

outp

uts

tr+

=p

rin

tBoa

rdD

etai

ls(b

oard

)+

","

;12

2//

Syst

em.o

ut.p

rin

tln

(bo

ard

);

123

}12

412

5w

hil

e(t

urn

<9)

{12

6tu

rn+

+;

127

outp

uts

tr+

=tu

rn+

","

;12

8fo

r(i

nt

i=

0;i

<16

;i+

+)

129

outp

uts

tr+

="

,";

130

//o

utp

uts

tr+

=pr

intB

oard

Det

ails

(boa

rd)

+"

,";

131

}13

2ou

tpu

tstr

+=

Xp

laye

r.ge

tCla

ss()

.get

Sim

pleN

ame

()+

"\n

";

133

134

//re

turn

ou

tpu

tstr

;13

5re

turn

trai

nin

gEx

amp

lesS

trin

g;

136

}13

713

8p

riv

ate

sta

tic

Str

ing

pri

ntB

oard

Det

ails

(Boa

rdbo

ard

){

139

Str

ing

outp

uts

tr=

"";

94

Page 98: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

140

141

for

(in

tro

w=

1;ro

w<

=3;

row

++

){

142

for

(in

tco

l=

1;co

l<

=3;

col+

+)

{14

3ou

tpu

tstr

+=

boar

d.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(col

,ro

w))

144

+"

,";

145

}14

6}

147

148

outp

uts

tr+

=bo

ard

.get

Num

bero

fSym

bol(

new

Sym

bol(

"X")

)+

","

;14

9ou

tpu

tstr

+=

boar

d.g

etN

umbe

rofS

ymbo

l(ne

wSy

mbo

l("Y

"))

+"

,";

150

outp

uts

tr+

=bo

ard

.get

Num

bero

fSym

bol(

new

Sym

bol(

""))

+"

,";

151

outp

uts

tr+

=bo

ard

.has

Tw

oInA

Row

(new

Sym

bol(

"X")

)+

","

;15

2ou

tpu

tstr

+=

boar

d.h

asT

woI

nAR

ow(n

ewSy

mbo

l("Y

"))

+"

,";

153

outp

uts

tr+

=bo

ard

.has

Thr

eeIn

AR

ow(n

ewSy

mbo

l("X

"))

+"

,";

154

outp

uts

tr+

=bo

ard

.has

Thr

eeIn

AR

ow(n

ewSy

mbo

l("Y

"))

;15

5//

Syst

em.o

ut.p

rin

tln

();

156

157

retu

rnou

tpu

tstr

;15

8}

159

160

pri

vat

est

ati

cS

trin

gp

rin

tBoa

rdL

abel

s(in

ttu

rn)

{16

1S

trin

gou

tpu

tstr

=tu

rn+

"_tu

rn,"

;16

2fo

r(i

nt

row

=1;

row

<=

3;ro

w+

+)

{16

3fo

r(i

nt

col

=1;

col

<=

3;co

l++

){

164

outp

uts

tr+

=tu

rn+

"_"

+co

l+

row

+"

,";

165

}16

6}

167

168

outp

uts

tr+

=tu

rn+

"_nu

mX

s,"

;16

9ou

tpu

tstr

+=

turn

+"_

num

Ys,

";

170

outp

uts

tr+

=tu

rn+

"_nu

mE

mpt

ys,"

;17

1ou

tpu

tstr

+=

turn

+"_

2Xs,

";

172

outp

uts

tr+

=tu

rn+

"_2Y

s,"

;17

3ou

tpu

tstr

+=

turn

+"_

3Xs,

";

174

outp

uts

tr+

=tu

rn+

"_3Y

s,"

;17

517

6re

turn

outp

uts

tr;

177

}17

817

9}

Source/tictactoe/B

oard.ja

va1

pac

kage

uk.c

o.t

f214

;2 3

imp

ort

java

.uti

l.A

rray

Lis

t;

4im

por

tja

va.u

til.

Has

hMap

;5 6

pu

bli

ccl

ass

Boa

rdim

plem

ents

Clo

nea

ble

{7 8

pri

vat

eH

ashM

ap<

Loc

atio

n,

Sym

bol>

boar

d;

9 10p

ub

lic

Boa

rdcl

one

(){

11B

oard

new

Boa

rd=

new

Boa

rd()

;12

new

Boa

rd.S

etB

oard

((H

ashM

ap<

Loc

atio

n,

Sym

bol>

)bo

ard

.clo

ne

())

;13

retu

rnne

wB

oard

;14

}15 16

pu

bli

cB

oard

(){

17bo

ard

=ne

wH

ashM

ap<

Loc

atio

n,

Sym

bol>

();

18fo

r(i

nt

row

=1;

row

<=

3;ro

w+

+)

{19

for

(in

tco

l=

1;co

l<

=3;

col+

+)

{20

boar

d.p

ut(n

ewL

ocat

ion

(col

,ro

w)

,ne

wSy

mbo

l("e

"))

;21

}22

}23

}24 25

pu

bli

cvo

idSe

tBoa

rd(H

ashM

ap<

Loc

atio

n,

Sym

bol>

boar

dHas

hMap

){

26bo

ard

=bo

ardH

ashM

ap;

27}

28 29p

ub

lic

void

mak

eMov

e(L

ocat

ion

loc

,Sy

mbo

ls)

{30

Arr

ayL

ist<

Loc

atio

n>

empt

ySpa

ces

=ge

tEm

ptyS

pace

s()

;31

if(e

mpt

ySpa

ces.

con

tain

s(lo

c))

{32

boar

d.p

ut(l

oc,

s);

33}

else

{34

thro

wne

wIn

vali

dM

oveE

xcep

tion

();

35}

36}

37 38p

ub

lic

Arr

ayL

ist<

Loc

atio

n>

getE

mpt

ySpa

ces(

){

39re

turn

getL

ocat

ion

sOfS

ymb

ol(n

ewSy

mbo

l("e

"))

;40

}41 42

pu

bli

cA

rray

Lis

t<L

ocat

ion

>ge

tLoc

atio

nsO

fSym

bol

(Sym

bol

s){

43A

rray

Lis

t<L

ocat

ion

>li

st=

new

Arr

ayL

ist<

Loc

atio

n>

();

44fo

r(i

nt

row

=1;

row

<=

3;ro

w+

+)

{45

for

(in

tco

l=

1;co

l<

=3;

col+

+)

{46

Loc

atio

nlo

c=

new

Loc

atio

n(c

ol,

row

);

47Sy

mbo

lsy

mb

olat

this

loc

=bo

ard

.get

(loc

);

48if

(sy

mb

olat

this

loc

.eq

ual

s(s)

){

49li

st.a

dd(n

ewL

ocat

ion

(col

,ro

w))

;

95

Page 99: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

50}

51}

52}

53re

turn

list

;54

}55 56

pu

bli

cSy

mbo

lge

tSym

bol

AtL

ocat

ion

(Loc

atio

nlo

c)

{57

retu

rnbo

ard

.get

(loc

);

58}

59 60p

ub

lic

int

getN

umbe

rofS

ymbo

l(Sy

mbo

ls)

{61

retu

rnge

tLoc

atio

nsO

fSym

bol

(s)

.siz

e()

;62

}63 64

pu

bli

cb

oole

anha

sTw

oInA

Row

(Sym

bol

s){

65//

Loo

kal

ong

each

row

and

see

ifth

issy

mbo

lis

eith

erin

colu

mns

1an

d66

//2

or2

and

3.67

for

(in

tro

w=

1;ro

w<

=3;

row

++

){

68if

(boa

rd.g

et(n

ewL

ocat

ion

(2,

row

)).e

qu

als(

s)69

&&

(boa

rd.g

et(n

ewL

ocat

ion

(1,

row

)).e

qu

als(

s)||

boar

d.

get(

70ne

wL

ocat

ion

(3,

row

)).e

qu

als(

s)))

{71

retu

rntr

ue

;72

}73

}74 75

//L

ook

dow

nea

chco

lum

n.

76fo

r(i

nt

col

=1;

col

<=

3;co

l++

){

77if

(boa

rd.g

et(n

ewL

ocat

ion

(col

,2)

).e

qu

als(

s)78

&&

(boa

rd.g

et(n

ewL

ocat

ion

(col

,1)

).e

qu

als(

s)||

boar

d.

get(

79ne

wL

ocat

ion

(col

,3)

).e

qu

als(

s)))

{80

retu

rntr

ue

;81

}82

}83 84

//C

heck

the

diag

on

als

.85

if(b

oard

.get

(new

Loc

atio

n(2

,2)

).e

qu

als(

s)86

&&

(boa

rd.g

et(n

ewL

ocat

ion

(1,

1))

.eq

ual

s(s)

87||

boar

d.g

et(n

ewL

ocat

ion

(3,

3))

.eq

ual

s(s)

88||

boar

d.g

et(n

ewL

ocat

ion

(3,

1))

.eq

ual

s(s)

||bo

ard

89.g

et(n

ewL

ocat

ion

(1,

3))

.eq

ual

s(s)

)){

90re

turn

tru

e;

91}

92

93re

turn

fals

e;

94}

95 96p

ub

lic

Boo

lean

hasT

hree

InA

Row

(Sym

bol

s){

97fo

r(i

nt

row

=1;

row

<=

3;ro

w+

+)

{98

if(b

oard

.get

(new

Loc

atio

n(2

,ro

w))

.eq

ual

s(s)

99&

&bo

ard

.get

(new

Loc

atio

n(1

,ro

w))

.eq

ual

s(s)

100

&&

boar

d.g

et(n

ewL

ocat

ion

(3,

row

)).e

qu

als(

s))

{10

1re

turn

tru

e;

102

}10

3}

104

105

//L

ook

dow

nea

chco

lum

n.

106

for

(in

tco

l=

1;co

l<

=3;

col+

+)

{10

7if

(boa

rd.g

et(n

ewL

ocat

ion

(col

,2)

).e

qu

als(

s)10

8&

&bo

ard

.get

(new

Loc

atio

n(c

ol,

1))

.eq

ual

s(s)

109

&&

boar

d.g

et(n

ewL

ocat

ion

(col

,3)

).e

qu

als(

s))

{11

0re

turn

tru

e;

111

}11

2}

113

114

//C

heck

the

dia

gon

als

.11

5if

((bo

ard

.get

(new

Loc

atio

n(2

,2)

).e

qu

als(

s)11

6&

&bo

ard

.get

(new

Loc

atio

n(1

,1)

).e

qu

als(

s)&

&bo

ard

.get

(11

7ne

wL

ocat

ion

(3,

3))

.eq

ual

s(s)

)11

8||

(boa

rd.g

et(n

ewL

ocat

ion

(1,

3))

.eq

ual

s(s)

119

&&

boar

d.g

et(n

ewL

ocat

ion

(2,

2))

.eq

ual

s(s)

&&

boar

d12

0.g

et(n

ewL

ocat

ion

(3,

1))

.eq

ual

s(s)

)){

121

retu

rntr

ue

;12

2}

123

124

retu

rnfa

lse

;12

5}

126

127

pu

bli

cSy

mbo

lge

tWin

ner(

){

128

Arr

ayL

ist<

Sym

bol>

sym

bols

=ne

wA

rray

Lis

t<Sy

mbo

l>()

;12

9sy

mbo

ls.a

dd(n

ewSy

mbo

l("X

"))

;13

0sy

mbo

ls.a

dd(n

ewSy

mbo

l("Y

"))

;13

113

2fo

r(S

ymbo

ls

:sy

mbo

ls)

{13

3if

(has

Thr

eeIn

AR

ow(s

))13

4re

turn

s;

135

}13

613

7re

turn

new

Sym

bol(

"")

;13

8}

96

Page 100: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

139

140

pu

bli

cH

ashM

ap<

Loc

atio

n,

Sym

bol>

getB

oard

Has

hMap

(){

141

retu

rnbo

ard

;14

2}

143

144

pu

bli

cS

trin

gge

tBoa

rdA

sStr

ing

(){

145

Str

ing

str

=""

;14

6fo

r(i

nt

row

=1;

row

<=

3;ro

w+

+)

{14

7fo

r(i

nt

col

=1;

col

<=

3;co

l++

){

148

Sym

bol

s=

boar

d.g

et(n

ewL

ocat

ion

(col

,ro

w))

;14

9if

(s.e

qu

als(

new

Sym

bol(

"e")

)){

150

str

+=

""

;15

1}

else

152

str

+=

s;

153

}15

4st

r+

="\

n"

;15

5}

156

retu

rnst

r;

157

}15

815

9p

ub

lic

Str

ing

toS

trin

g()

{16

0re

turn

getB

oard

AsS

trin

g()

;16

1}

162

163

pu

bli

cb

oole

anga

meC

ompl

ete(

){

164

if(h

asT

hree

InA

Row

(new

Sym

bol(

"X")

)||

hasT

hree

InA

Row

(new

Sym

bol(

"Y")

)16

5||

getE

mpt

ySpa

ces(

).s

ize

()=

=0)

166

retu

rntr

ue

;16

7el

se16

8re

turn

fals

e;

169

}17

017

1//

Con

ven

ien

cem

etho

dsto

avoi

d"n

ewlo

cati

on

s"

etc

ever

ywhe

re.

172

pu

bli

cvo

idm

akeM

ove(

int

col

,in

tro

w,

Str

ing

sym

bol)

{17

3m

akeM

ove(

new

Loc

atio

n(c

ol,

row

),

new

Sym

bol(

sym

bol)

);

174

}17

517

6}

Source/tictactoe/L

ocation.java

1p

acka

geuk

.co

.tf2

14;

2 3p

ub

lic

cla

ssL

ocat

ion

{4

pri

vat

ein

tro

w;

5p

riv

ate

int

col;

6 7p

ub

lic

Loc

atio

n(i

nt

col

,in

tro

w)

{8

this

.col

=co

l;9

this

.row

=ro

w;

10}

11 12p

ub

lic

Str

ing

toS

trin

g()

{13

retu

rn"(

"+

col

+"

,"+

row

+")

";

14}

15 16p

ub

lic

boo

lean

equ

als(

Loc

atio

nlo

c2)

{17

if(l

oc2

.col

==

col

&&

loc2

.row

==

row

)18

retu

rntr

ue

;19

else

20re

turn

fals

e;

21}

22 23p

ub

lic

boo

lean

equ

als(

Ob

ject

loca

tion

2ob

j){

24//

Syst

em.o

ut.p

rin

tln

(lo

cati

on

2o

bj.g

etC

lass

().g

etN

ame(

))

;25

if(!

loca

tion

2ob

j.ge

tCla

ss()

.get

Nam

e()

.eq

ual

s("u

k.c

o.t

f214

.L

ocat

ion

"))

26re

turn

fals

e;

27 28L

ocat

ion

loc2

=(L

ocat

ion

)lo

cati

on2o

bj;

29if

(loc

2.c

ol=

=co

l&

&lo

c2.r

ow=

=ro

w)

30re

turn

tru

e;

31el

se32

retu

rnfa

lse

;33

}34 35

pu

bli

cin

tha

shC

ode

(){

36re

turn

(2∗

col

+3

∗ro

w)

;37

}38

}

Source/tictactoe/S

ymbo

l.java

1p

acka

geuk

.co

.tf2

14;

2 3p

ub

lic

cla

ssSy

mbo

l{

4 5p

riv

ate

Str

ing

valu

e;

6 7p

ub

lic

Sym

bol(

Str

ing

xoro

){

8va

lue

=xo

ro;

9}

10

97

Page 101: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

11p

ub

lic

boo

lean

equ

als(

Str

ing

x)

{12

retu

rn(x

.eq

ual

s(va

lue

));

13}

14 15p

ub

lic

boo

lean

equ

als(

Sym

bol

x)

{16

retu

rn(x

.toS

trin

g()

.eq

ual

s(va

lue

));

17}

18 19p

ub

lic

int

hash

Cod

e()

{20

retu

rnva

lue

.has

hCod

e()

;21

}22 23

pu

bli

cS

trin

gto

Str

ing

(){

24re

turn

valu

e;

25}

26 27}

Source/tictactoe/P

layer.java

1p

acka

geuk

.co

.tf2

14;

2 3p

ub

lic

abst

ract

cla

ssP

laye

r{

4 5p

rote

cted

Sym

bol

myS

ymbo

l;6

pro

tect

edSy

mbo

lot

herS

ymbo

l;7

pro

tect

edSy

mbo

lem

ptyS

ymbo

l=

new

Sym

bol(

"e")

;8 9

pu

bli

cP

laye

r(Sy

mbo

ls)

{10

myS

ymbo

l=

s;

11if

(myS

ymbo

l.eq

ual

s(ne

wSy

mbo

l("X

")))

12ot

herS

ymbo

l=

new

Sym

bol(

"Y")

;13

else

14ot

herS

ymbo

l=

new

Sym

bol(

"X")

;15

}16 17

pu

bli

cL

ocat

ion

mak

eMov

e(B

oard

boar

d)

{18

retu

rnne

wL

ocat

ion

(1,

1);

19}

20}

Source/tictactoe/E

dgeP

layer.java

1p

acka

geuk

.co

.tf2

14;

2 3im

por

tja

va.u

til.

Arr

ayL

ist

;

4 5p

ub

lic

cla

ssE

dgeP

laye

rex

ten

ds

Pla

yer

{6 7

pu

bli

cE

dgeP

laye

r(Sy

mbo

lsy

mbo

l){

8su

per

(sym

bol)

;9

}10 11

pu

bli

cL

ocat

ion

mak

eMov

e(B

oard

boar

d)

{12

Arr

ayL

ist<

Loc

atio

n>

loca

tio

ns

=bo

ard

.get

Em

ptyS

pace

s()

;13

if(

loca

tio

ns

.con

tain

s(ne

wL

ocat

ion

(2,

1)))

14re

turn

new

Loc

atio

n(2

,1)

;15

if(

loca

tio

ns

.con

tain

s(ne

wL

ocat

ion

(3,

2)))

16re

turn

new

Loc

atio

n(3

,2)

;17

if(

loca

tio

ns

.con

tain

s(ne

wL

ocat

ion

(2,

3)))

18re

turn

new

Loc

atio

n(2

,3)

;19

if(

loca

tio

ns

.con

tain

s(ne

wL

ocat

ion

(1,

2)))

20re

turn

new

Loc

atio

n(1

,2)

;21 22

int

rand

=(i

nt)

(Mat

h.r

ando

m()

∗(

loca

tio

ns

.siz

e()

));

23re

turn

loca

tio

ns

.get

(ran

d)

;24

}25 26

}

Source/tictactoe/C

orne

rPlayer.java

1p

acka

geuk

.co

.tf2

14;

2 3im

por

tja

va.u

til.

Arr

ayL

ist

;4 5

pu

bli

ccl

ass

Cor

ner

Pla

yer

exte

nd

sP

laye

r{

6 7p

ub

lic

Cor

ner

Pla

yer(

Sym

bol

sym

bol)

{8

sup

er(s

ymbo

l);

9}

10 11p

ub

lic

Loc

atio

nm

akeM

ove(

Boa

rdbo

ard

){

12A

rray

Lis

t<L

ocat

ion

>lo

cati

on

s=

boar

d.g

etE

mpt

ySpa

ces(

);

13if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(1

,1)

))14

retu

rnne

wL

ocat

ion

(1,

1);

15if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(3

,1)

))16

retu

rnne

wL

ocat

ion

(3,

1);

17if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(1

,3)

))18

retu

rnne

wL

ocat

ion

(1,

3);

19if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(3

,3)

))20

retu

rnne

wL

ocat

ion

(3,

3);

21

98

Page 102: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

22in

tra

nd=

(in

t)(M

ath

.ran

dom

()∗

(lo

cati

on

s.s

ize

()))

;23

retu

rnlo

cati

on

s.g

et(r

and

);

24}

25}

Source/tictactoe/R

andC

orne

rPlayer.java

1p

acka

geuk

.co

.tf2

14;

2 3im

por

tja

va.u

til.

Arr

ayL

ist

;4 5

pu

bli

ccl

ass

Ran

dCor

nerP

laye

rex

ten

ds

Pla

yer

{6 7

pu

bli

cR

andC

orne

rPla

yer(

Sym

bol

sym

bol)

{8

sup

er(s

ymbo

l);

9}

10 11p

ub

lic

Loc

atio

nm

akeM

ove(

Boa

rdbo

ard

){

12A

rray

Lis

t<L

ocat

ion

>lo

cati

on

s=

boar

d.g

etE

mpt

ySpa

ces(

);

13A

rray

Lis

t<L

ocat

ion

>p

ossi

ble

Mov

es=

new

Arr

ayL

ist<

Loc

atio

n>

();

14if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(1

,1)

))15

pos

sib

leM

oves

.add

(new

Loc

atio

n(1

,1)

);

16if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(3

,1)

))17

pos

sib

leM

oves

.add

(new

Loc

atio

n(3

,1)

);

18if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(1

,3)

))19

pos

sib

leM

oves

.add

(new

Loc

atio

n(1

,3)

);

20if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(3

,3)

))21

pos

sib

leM

oves

.add

(new

Loc

atio

n(3

,3)

);

22 23if

(pos

sib

leM

oves

.siz

e()

>=

1){

24in

tra

nd=

(in

t)(M

ath

.ran

dom

()∗

(pos

sib

leM

oves

.siz

e()

));

25re

turn

pos

sib

leM

oves

.get

(ran

d)

;26

}el

se{

27in

tra

nd=

(in

t)(M

ath

.ran

dom

()∗

(lo

cati

on

s.s

ize

()))

;28

retu

rnlo

cati

on

s.g

et(r

and

);

29}

30}

31}

Source/tictactoe/R

andE

dgeP

layer.java

1p

acka

geuk

.co

.tf2

14;

2 3im

por

tja

va.u

til.

Arr

ayL

ist

;4 5

pu

bli

ccl

ass

Ran

dEdg

ePla

yer

exte

nd

sP

laye

r{

6 7p

ub

lic

Ran

dEdg

ePla

yer(

Sym

bol

sym

bol)

{8

sup

er(s

ymbo

l);

9}

10 11p

ub

lic

Loc

atio

nm

akeM

ove(

Boa

rdbo

ard

){

12A

rray

Lis

t<L

ocat

ion

>lo

cati

on

s=

boar

d.g

etE

mpt

ySpa

ces(

);

13A

rray

Lis

t<L

ocat

ion

>p

ossi

ble

Mov

es=

new

Arr

ayL

ist<

Loc

atio

n>

();

14if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(2

,1)

))15

pos

sib

leM

oves

.add

(new

Loc

atio

n(2

,1)

);

16if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(3

,2)

))17

pos

sib

leM

oves

.add

(new

Loc

atio

n(3

,2)

);

18if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(2

,3)

))19

pos

sib

leM

oves

.add

(new

Loc

atio

n(2

,3)

);

20if

(lo

cati

on

s.c

onta

ins(

new

Loc

atio

n(1

,2)

))21

pos

sib

leM

oves

.add

(new

Loc

atio

n(1

,2)

);

22 23if

(pos

sib

leM

oves

.siz

e()

>=

1){

24in

tra

nd=

(in

t)(M

ath

.ran

dom

()∗

(pos

sib

leM

oves

.siz

e()

));

25re

turn

pos

sib

leM

oves

.get

(ran

d)

;26

}el

se{

27in

tra

nd=

(in

t)(M

ath

.ran

dom

()∗

(lo

cati

on

s.s

ize

()))

;28

retu

rnlo

cati

on

s.g

et(r

and

);

29}

30}

31}

Source/tictactoe/R

ando

mPlayer.ja

va1

pac

kage

uk.c

o.t

f214

;2 3

imp

ort

java

.uti

l.A

rray

Lis

t;

4 5p

ub

lic

cla

ssR

ando

mP

laye

rex

ten

ds

Pla

yer

{6 7

pu

bli

cR

ando

mP

laye

r(Sy

mbo

lsy

mbo

l){

8su

per

(sym

bol)

;9

}10 11

pu

bli

cL

ocat

ion

mak

eMov

e(B

oard

boar

d)

{12

Arr

ayL

ist<

Loc

atio

n>

loca

tio

ns

=bo

ard

.get

Em

ptyS

pace

s()

;13

int

rand

=(i

nt)

(Mat

h.r

ando

m()

∗(

loca

tio

ns

.siz

e()

));

14re

turn

loca

tio

ns

.get

(ran

d)

;15

}16

}

99

Page 103: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

Source/tictactoe/P

erfectPlayer.ja

va1

pac

kage

uk.c

o.t

f214

;2 3

imp

ort

java

.uti

l.A

rray

Lis

t;

4 5p

ub

lic

cla

ssP

erfe

ctP

lay

erex

ten

ds

Pla

yer

{6 7

pu

bli

cP

erfe

ctP

lay

er(S

ymbo

lsy

mbo

l){

8su

per

(sym

bol)

;9

}10 11

pu

bli

cL

ocat

ion

mak

eMov

e(B

oard

boar

d)

{12

Arr

ayL

ist<

Loc

atio

n>

loca

tio

ns

=bo

ard

.get

Em

ptyS

pace

s()

;13

if(

loca

tio

ns

.siz

e()

==

1)14

retu

rnlo

cati

on

s.g

et(0

);

15if

(lo

cati

on

s.s

ize

()=

=9)

16re

turn

new

Loc

atio

n(2

,2)

;17 18

//1.

Win

ifp

oss

ible

this

mov

e.19

Loc

atio

nw

inni

ngM

ove

=m

akeW

inni

ngM

ove(

boar

d,

myS

ymbo

l);

20if

(win

ning

Mov

e!=

nu

ll)

21re

turn

win

ning

Mov

e;

22 23//

2.B

lock

oppo

nen

tif

po

ssib

le(T

his

mea

nsgo

ing

inth

e24

//"w

inn

ing

spac

e"

for

the

othe

rpe

rson

).

25L

ocat

ion

bloc

king

Mov

e=

mak

eWin

ning

Mov

e(bo

ard

,ot

herS

ymbo

l);

26if

(blo

ckin

gMov

e!=

nu

ll)

27re

turn

bloc

king

Mov

e;

28 29//

2.5

Fix

bug

whe

rew

edo

n’t

no

tice

whe

nw

eha

veto

thre

aten

tow

inin

30//

orde

rto

prev

ent

them

spli

ttin

g.

31if

(boa

rd.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(1,

1))

.eq

ual

s(ot

herS

ymbo

l)32

&&

boar

d.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(3,

3))

.eq

ual

s(33

othe

rSym

bol)

34&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(2

,2)

).e

qu

als(

35m

ySym

bol)

&&

boar

d.g

etE

mpt

ySpa

ces(

).s

ize

()=

=6)

36re

turn

new

Loc

atio

n(2

,1)

;37

if(b

oard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(1

,3)

).e

qu

als(

othe

rSym

bol)

38&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(3

,1)

).e

qu

als(

39ot

herS

ymbo

l)40

&&

boar

d.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(2,

2))

.eq

ual

s(41

myS

ymbo

l)&

&bo

ard

.get

Em

ptyS

pace

s()

.siz

e()

==

6)42

retu

rnne

wL

ocat

ion

(2,

1);

43 44//

3.If

we

have

n’t

fin

ish

edye

t,

doa

bru

tefo

rce

sear

chof

the

othe

r45

//bo

ard

po

siti

on

s.

46Sy

mbo

ltu

rn=

myS

ymbo

l;47

dou

ble

bes

tSco

re=

−10

00;

48A

rray

Lis

t<L

ocat

ion

>be

stM

oves

=ne

wA

rray

Lis

t<L

ocat

ion

>()

;49

for

(Loc

atio

nlo

cati

on

:lo

cati

on

s)

{50

Boa

rdbo

ard2

=bo

ard

.clo

ne

();

51bo

ard2

.mak

eMov

e(lo

cati

on,

turn

);

52d

oub

lesc

ore

=m

oveA

ndE

valu

ate(

boar

d2,

othe

rSym

bol)

;53

//Sy

stem

.out

.pri

ntl

n(

"Sco

refo

r"

+m

ySym

bol

+"

atlo

cati

on

"+

54//

loca

tio

n+

"is

"+

scor

e)

;55

if(b

estS

core

==

scor

e)

{56

best

Mov

es.a

dd(l

oca

tio

n)

;57

}58

if(b

estS

core

<sc

ore

){

59b

estS

core

=sc

ore

;60

best

Mov

es.c

lea

r()

;61

best

Mov

es.a

dd(l

oca

tio

n)

;62

}63

}64 65

//If

2lo

cati

on

sha

veth

esa

me

scor

e,

choo

seon

era

ndom

ly.

66in

tra

nd=

(in

t)(M

ath

.ran

dom

()∗

(bes

tMov

es.s

ize

()))

;67

retu

rnbe

stM

oves

.get

(ran

d)

;68

}69 70

pu

bli

cd

oub

lem

oveA

ndE

valu

ate(

Boa

rdbo

ard

,Sy

mbo

ltu

rn)

{71

if(b

oard

.gam

eCom

plet

e())

{72

if(b

oard

.has

Thr

eeIn

AR

ow(m

ySym

bol)

)73

retu

rn1;

74if

(boa

rd.h

asT

hree

InA

Row

(oth

erSy

mbo

l))

75re

turn

−1;

76re

turn

0;77

}78 79

Arr

ayL

ist<

Loc

atio

n>

loca

tio

ns

=bo

ard

.get

Em

ptyS

pace

s()

;80

dou

ble

scor

e=

0;81

for

(Loc

atio

nlo

cati

on

:lo

cati

on

s)

{82

Boa

rdbo

ard2

=bo

ard

.clo

ne

();

83bo

ard2

.mak

eMov

e(lo

cati

on,

turn

);

84if

(tu

rn.e

qu

als(

myS

ymbo

l))

85sc

ore

+=

0.00

01∗

mov

eAnd

Eva

luat

e(bo

ard2

,ot

herS

ymbo

l);

86el

se

100

Page 104: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

87sc

ore

+=

0.00

01∗

mov

eAnd

Eva

luat

e(bo

ard2

,m

ySym

bol)

;88

}89

retu

rnsc

ore

;90

}91 92

/∗93

∗p

ubl

icL

ocat

ion

mak

eMov

e(B

oard

boar

d)

{//

1.W

inif

po

ssib

leth

ism

ove.

94∗

Loc

atio

nw

inni

ngM

ove

=m

akeW

inni

ngM

ove(

boar

d,

myS

ymbo

l);

if(

win

ning

Mov

e95

∗!=

nu

ll)

retu

rnw

inni

ngM

ove

;96

∗97

∗//

2.B

lock

oppo

nen

tif

po

ssib

le(T

his

mea

nsgo

ing

inth

e"

win

nin

gsp

ace

"98

∗fo

rth

eot

her

pers

on)

.L

ocat

ion

bloc

kin

gMov

e=

mak

eWin

ning

Mov

e(bo

ard

,99

∗ot

herS

ymbo

l);

if(

bloc

kin

gMov

e!=

nu

ll)

retu

rnbl

ocki

ngM

ove

;10

0∗

101

∗//

3.B

ranc

hif

po

ssib

le.

102

∗10

3∗

Arr

ayL

ist<

Loc

atio

n>

loca

tio

ns

=bo

ard

.get

Em

ptyS

pace

s()

;}

104

∗/10

510

6p

riv

ate

Loc

atio

nm

akeW

inni

ngM

ove(

Boa

rdbo

ard

,Sy

mbo

lsy

mbo

lToW

in)

{10

7fo

r(i

nt

row

=1;

row

<=

3;ro

w+

+)

{10

8fo

r(i

nt

col

=1;

col

<=

3;co

l++

){

109

int

myS

ymLo

c1=

col

+1;

110

int

myS

ymLo

c2=

col

+2;

111

if(m

ySym

Loc1

==

4)11

2m

ySym

Loc1

=1;

113

if(m

ySym

Loc1

==

5)11

4m

ySym

Loc1

=2;

115

if(m

ySym

Loc2

==

4)11

6m

ySym

Loc2

=1;

117

if(m

ySym

Loc2

==

5)11

8m

ySym

Loc2

=2;

119

120

if(b

oard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(m

ySym

Loc1

,ro

w)

)12

1.e

qu

als(

sym

bolT

oWin

)12

2&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(12

3ne

wL

ocat

ion

(myS

ymLo

c2,

row

)).e

qu

als(

124

sym

bolT

oWin

)12

5&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(c

ol,

row

))12

6.e

qu

als(

empt

ySym

bol)

)

127

retu

rnne

wL

ocat

ion

(col

,ro

w)

;12

8}

129

}13

0fo

r(i

nt

col

=1;

col

<=

3;co

l++

){

131

for

(in

tro

w=

1;ro

w<

=3;

row

++

){

132

int

myS

ymLo

c1=

row

+1;

133

int

myS

ymLo

c2=

row

+2;

134

if(m

ySym

Loc1

==

4)13

5m

ySym

Loc1

=1;

136

if(m

ySym

Loc1

==

5)13

7m

ySym

Loc1

=2;

138

if(m

ySym

Loc2

==

4)13

9m

ySym

Loc2

=1;

140

if(m

ySym

Loc2

==

5)14

1m

ySym

Loc2

=2;

142

143

if(b

oard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(c

ol,

myS

ymLo

c1)

)14

4.e

qu

als(

sym

bolT

oWin

)14

5&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(14

6ne

wL

ocat

ion

(col

,m

ySym

Loc2

)).e

qu

als(

147

sym

bolT

oWin

)14

8&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(c

ol,

row

))14

9.e

qu

als(

empt

ySym

bol)

)15

0re

turn

new

Loc

atio

n(c

ol,

row

);

151

}15

2}

153

154

//.

155

//.

156

//.

157

158

if(b

oard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(1

,1)

).e

qu

als(

sym

bolT

oWin

)15

9&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(2

,2)

).e

qu

als(

160

sym

bolT

oWin

)16

1&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(3

,3)

).e

qu

als(

162

empt

ySym

bol)

)16

3re

turn

new

Loc

atio

n(3

,3)

;16

4if

(boa

rd.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(2,

2))

.eq

ual

s(sy

mbo

lToW

in)

165

&&

boar

d.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(3,

3))

.eq

ual

s(16

6sy

mbo

lToW

in)

167

&&

boar

d.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(1,

1))

.eq

ual

s(16

8em

ptyS

ymbo

l))

169

retu

rnne

wL

ocat

ion

(1,

1);

101

Page 105: Machine Learning of Player Strategy in - cs.bath.ac.ukmdv/courses/CM30082/projects.bho/2009-10/... · Machine Learning of Player Strategy in Games Thomas Fletcher BSc in Computer

170

if(b

oard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(1

,1)

).e

qu

als(

sym

bolT

oWin

)17

1&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(3

,3)

).e

qu

als(

172

sym

bolT

oWin

)17

3&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(2

,2)

).e

qu

als(

174

empt

ySym

bol)

)17

5re

turn

new

Loc

atio

n(2

,2)

;17

617

7//

.17

8//

.17

9//

.18

0if

(boa

rd.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(1,

3))

.eq

ual

s(sy

mbo

lToW

in)

181

&&

boar

d.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(2,

2))

.eq

ual

s(18

2sy

mbo

lToW

in)

183

&&

boar

d.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(3,

1))

.eq

ual

s(18

4em

ptyS

ymbo

l))

185

retu

rnne

wL

ocat

ion

(3,

1);

186

if(b

oard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(1

,3)

).e

qu

als(

sym

bolT

oWin

)18

7&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(3

,1)

).e

qu

als(

188

sym

bolT

oWin

)18

9&

&bo

ard

.get

Sym

bol

AtL

ocat

ion

(new

Loc

atio

n(2

,2)

).e

qu

als(

190

empt

ySym

bol)

)19

1re

turn

new

Loc

atio

n(2

,2)

;19

2if

(boa

rd.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(3,

1))

.eq

ual

s(sy

mbo

lToW

in)

193

&&

boar

d.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(2,

2))

.eq

ual

s(19

4sy

mbo

lToW

in)

195

&&

boar

d.g

etS

ymb

olA

tLoc

atio

n(n

ewL

ocat

ion

(1,

3))

.eq

ual

s(19

6em

ptyS

ymbo

l))

197

retu

rnne

wL

ocat

ion

(1,

3);

198

199

retu

rnn

ull

;20

0}

201

}

Source/tictactoe/C

ustomPlayer.ja

va1

pac

kage

uk.c

o.t

f214

;2 3

imp

ort

java

.uti

l.A

rray

Lis

t;

4im

por

tja

va.i

o.∗

;5

imp

ort

java

.uti

l.∗

;6 7

pu

bli

ccl

ass

Cus

tom

Pla

yer

exte

nd

sP

laye

r{

8

9p

ub

lic

Cus

tom

Pla

yer(

Sym

bol

sym

bol)

{10

sup

er(s

ymbo

l);

11}

12 13p

ub

lic

Loc

atio

nm

akeM

ove(

Boa

rdbo

ard

){

14b

oole

anva

lidM

ove

=fa

lse

;15

Loc

atio

nlo

c=

nu

ll;

16 17w

hil

e(!

vali

dMov

e){

18Sy

stem

.out

.pri

ntl

n("

The

Boa

rdso

far

:")

;19

Syst

em.o

ut.p

rin

t(bo

ard

.get

Boa

rdA

sStr

ing

())

;20 21

Sca

nn

erin

pu

t=

new

Sca

nn

er(S

yste

m.i

n)

;22

Syst

em.o

ut.p

rin

tln

("E

nter

colu

mn

top

lace

anX

in:"

);

23in

tco

l=

inp

ut

.nex

tIn

t()

;24

Syst

em.o

ut.p

rin

tln

("E

nter

row

top

lace

anX

in:"

);

25in

tro

w=

inp

ut

.nex

tIn

t()

;26 27

if(c

ol<

0||

col

>3

||ro

w<

0||

row

>3)

{28

Syst

em.o

ut.p

rin

tln

("E

nter

row

and

colu

mn

num

bers

12

or3.

");

29co

nti

nu

e;

30}

31 32lo

c=

new

Loc

atio

n(c

ol,

row

);

33if

(boa

rd.g

etS

ymb

olA

tLoc

atio

n(l

oc)

.eq

ual

s(ne

wSy

mbo

l("e

")))

34va

lidM

ove

=tr

ue

;35

else

36Sy

stem

.out

.pri

ntl

n("

Not

anem

pty

squ

are

.C

hoos

eag

ain

.")

;37

}38 39

retu

rnlo

c;

40}

41}

Source/tictactoe/PlayGame.java

package uk.co.tf214;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import com.rapidminer.operator.OperatorCreationException;
import com.rapidminer.operator.OperatorException;

import uk.co.tf214.rapid.RMTicTacToeClassifier;

public class PlayGame {

    public static void main(String[] args) {
        RMTicTacToeClassifier classifier = null;
        try {
            classifier = new RMTicTacToeClassifier(
                new File("ModelApplierprocesssometurns.xml"));
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (OperatorCreationException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (OperatorException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        Board board = new Board();
        String outputstr = "";

        outputstr += printBoardLabels(1) + ",";
        outputstr += printBoardLabels(2) + ",";
        outputstr += printBoardLabels(3) + ",";
        outputstr += printBoardLabels(4) + ",";
        outputstr += printBoardLabels(5) + ",";
        outputstr += printBoardLabels(6) + ",";
        outputstr += printBoardLabels(7) + ",";
        outputstr += printBoardLabels(8) + ",";
        outputstr += printBoardLabels(9);
        outputstr += "\n";

        Player Yplayer = new RandomPlayer(new Symbol("Y"));
        Player Xplayer = new CustomPlayer(new Symbol("X"));
        Player perfectPlayer = new PerfectPlayer(new Symbol("X"));
        int turn = 0;

        // Y goes first half of the time.
        if (Math.random() > 0.5) {
            turn++;
            Location loc = Yplayer.makeMove(board);
            board.makeMove(loc, new Symbol("Y"));

            outputstr += turn + ",";
            outputstr += printBoardDetails(board) + ",";
            // System.out.println(board);
        }

        while (!board.gameComplete()) {
            turn++;
            System.out
                .println("In the below situation, PerfectPlayer would move in one of:"
                    + perfectPlayer.makeMove(board)
                    + perfectPlayer.makeMove(board)
                    + perfectPlayer.makeMove(board)
                    + perfectPlayer.makeMove(board));
            Location loc = Xplayer.makeMove(board);
            board.makeMove(loc, new Symbol("X"));

            outputstr += turn + ",";
            outputstr += printBoardDetails(board);
            // System.out.println(board);

            String paddedoutputstr = outputstr;
            for (int i = 0; i < (9 - turn) * 17; i++)
                paddedoutputstr += ",";

            writeToFile(paddedoutputstr);
            outputstr += ",";
            System.out.println("I think you are using the strategy: "
                + classifier.apply());

            if (board.gameComplete())
                break;

            turn++;
            loc = Yplayer.makeMove(board);
            board.makeMove(loc, new Symbol("Y"));

            outputstr += turn + ",";
            outputstr += printBoardDetails(board) + ",";
            // System.out.println(board);
        }

        System.out.println(outputstr);
        System.exit(1);
    }

    private static String printBoardDetails(Board board) {
        String outputstr = "";

        for (int row = 1; row <= 3; row++) {
            for (int col = 1; col <= 3; col++) {
                outputstr += board.getSymbolAtLocation(new Location(col, row))
                    + ",";
            }
        }

        outputstr += board.getNumberofSymbol(new Symbol("X")) + ",";
        outputstr += board.getNumberofSymbol(new Symbol("Y")) + ",";
        outputstr += board.getNumberofSymbol(new Symbol("")) + ",";
        outputstr += board.hasTwoInARow(new Symbol("X")) + ",";
        outputstr += board.hasTwoInARow(new Symbol("Y")) + ",";
        outputstr += board.hasThreeInARow(new Symbol("X")) + ",";
        outputstr += board.hasThreeInARow(new Symbol("Y"));
        // System.out.println();

        return outputstr;
    }

    private static void writeToFile(String output) {
        try {
            // Create file
            FileWriter fstream = new FileWriter("outtemp.txt");
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(output);
            // Close the output stream
            out.close();
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }

    private static String printBoardLabels(int turn) {
        String outputstr = turn + "_turn,";
        for (int row = 1; row <= 3; row++) {
            for (int col = 1; col <= 3; col++) {
                outputstr += turn + "_" + col + row + ",";
            }
        }

        outputstr += turn + "_numXs,";
        outputstr += turn + "_numYs,";
        outputstr += turn + "_numEmptys,";
        outputstr += turn + "_2Xs,";
        outputstr += turn + "_2Ys,";
        outputstr += turn + "_3Xs,";
        outputstr += turn + "_3Ys";

        return outputstr;
    }
}
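Tracing printBoardLabels(1) above by hand gives a feel for the feature vector that each turn contributes to the CSV header: a turn label, one label per board square (column before row), then the seven summary features:

    1_turn,1_11,1_21,1_31,1_12,1_22,1_32,1_13,1_23,1_33,1_numXs,1_numYs,1_numEmptys,1_2Xs,1_2Ys,1_3Xs,1_3Ys

That is seventeen fields per turn, which is why main() pads unplayed turns with (9 - turn) * 17 empty fields. (This line is a hand-traced sketch of the labels, not output copied from a run.)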

Source/tictactoe/RMTicTacToeClassifier.java

package uk.co.tf214.rapid;

import java.io.File;
import java.io.IOException;
import java.util.*;

import com.rapidminer.RapidMiner;
import com.rapidminer.Process;
import com.rapidminer.example.*;
import com.rapidminer.example.set.RemappedExampleSet;
import com.rapidminer.example.set.SimpleExampleSet;
import com.rapidminer.example.table.ExampleTable;
import com.rapidminer.operator.*;
import com.rapidminer.tools.OperatorService;
import com.rapidminer.tools.XMLException;

public class RMTicTacToeClassifier {

    private Process process;

    public RMTicTacToeClassifier(File modelFile) throws IOException,
            OperatorCreationException, OperatorException {

        // RapidMiner.init(false, false, false, true);
        RapidMiner.init();

        try {
            process = new Process(modelFile);
        } catch (XMLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }

    public String apply() {

        IOContainer output = null;
        try {
            output = process.run();
        } catch (OperatorException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        // System.out.println(output);

        ExampleSet results = (ExampleSet) output.getElementAt(0);
        Attributes atts = results.getAttributes();
        Attribute classatt = atts.get("prediction(Class)");
        Iterator<Example> it = results.iterator();

        Example row = it.next();
        String prediction = row.getNominalValue(classatt);

        return prediction;
    }

}


A.2 GoGui Modifications

To avoid reproducing the entire (several hundred pages) GoGui source code here, below is a unified diff of my modifications ('>' represents an added line, '<' is a removed line). These changes were made to GoGui version 1.2pre2. The modified executable source code is included on the CD bundled with this report.
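As a reading aid: the first hunk of GoGui.diff below, '61a62' followed by '> import net.sf.gogui.go.AlphabetNumbers;', records that this import line was added after line 61 of the original file. Hunks written in the form '622c630,631' pair the removed lines (prefixed '<') with their replacements (prefixed '>'), separated by a '---' line.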

Source/gogui/GoGui.diff

61a62
> import net.sf.gogui.go.AlphabetNumbers;
146a148,153
>     private String OutputToFileText = "";
>
>     private boolean doneWaiting = true;
>
>     private String boardAsString;
>
152a160
>         doneWaiting = true;
622c630,631
<         File file = showSave(i18n("MSG_EXPORT_TEXT"));
---
>         // File file = showSave(i18n("MSG_EXPORT_TEXT"));
>         File file = new File("finalPositions.txt");
627c636
<         String text = BoardUtil.toString(getBoard(), false);
---
>         OutputToFileText += BoardUtil.toStringOneLine(getBoard(), false);
629c638
<         out.print(text);
---
>         out.print(OutputToFileText);
1086a1096,1361
>
>     public ArrayList<File> getDirectories()
>     {
>         ArrayList<File> directories = new ArrayList<File>();
>
>         directories.add(new File("CpuGames/AyaVsAya"));
>         directories.add(new File("CpuGames/AyaVsMogo1"));
>         directories.add(new File("CpuGames/AyaVsGnugo10"));
>         directories.add(new File("CpuGames/AyaVsGnugo1"));
>
>         directories.add(new File("CpuGames/gnugo1Vsgnugo1"));
>         directories.add(new File("CpuGames/gnugo1VsAya"));
>         directories.add(new File("CpuGames/gnugo1VsMogo1"));
>         directories.add(new File("CpuGames/gnugo1Vsgnugo10"));
>
>         directories.add(new File("CpuGames/Mogo1VsMogo1"));
>         directories.add(new File("CpuGames/Mogo1VsGnugo10"));
>         directories.add(new File("CpuGames/Mogo1VsAya"));
>         directories.add(new File("CpuGames/Mogo1VsGnuGo1"));
>
>         directories.add(new File("CpuGames/gnugo10Vsgnugo10"));
>         directories.add(new File("CpuGames/GnuGo10VsAya"));
>         directories.add(new File("CpuGames/GnuGo10VsGnuGo1"));
>         directories.add(new File("CpuGames/GnuGo10VsMogo1"));
>
>         /*
>         directories.add(new File("KGSByName/shiryuu"));
>         directories.add(new File("KGSByName/Erinys"));
>         directories.add(new File("KGSByName/michi2009"));
>         directories.add(new File("KGSByName/abel"));
>         directories.add(new File("KGSByName/loveHER"));
>         directories.add(new File("KGSByName/stopblitz"));
>         directories.add(new File("KGSByName/koram"));
>         directories.add(new File("KGSByName/zchen"));
>         directories.add(new File("KGSByName/satoke"));
>         directories.add(new File("KGSByName/artem92"));
>         directories.add(new File("KGSByName/guxxan"));
>         directories.add(new File("KGSByName/hirubon"));
>         directories.add(new File("KGSByName/yagumo"));
>         directories.add(new File("KGSByName/jim"));
>         directories.add(new File("KGSByName/aguilar1"));
>         directories.add(new File("KGSByName/lorikeet"));
>         directories.add(new File("KGSByName/supertjc"));
>         directories.add(new File("KGSByName/eastwisdom"));
>         directories.add(new File("KGSByName/BUM"));
>         directories.add(new File("KGSByName/take1970"));
>         directories.add(new File("KGSByName/coolbabe"));
>         directories.add(new File("KGSByName/turk"));
>         directories.add(new File("KGSByName/GBPacker"));
>         directories.add(new File("KGSByName/Hutoshi"));
>         */
>
>         return (directories);
>     }
>
>     public void actionWriteAllBoardPositions()
>     {
>         writeBoardStats(true, false);
>     }
>
>     public void actionWriteAllScores()
>     {
>         Runnable runnable = new Runnable() {
>             public void run() {
>                 writeBoardStats(false, true);
>             }
>         };
>         Thread t = new Thread(runnable);
>         t.start();
>         System.out.println("Completed actionWriteAllScores.");
>     }
>
>     public void actionWriteBoardPositionsAndScores()
>     {
>         Runnable runnable = new Runnable() {
>             public void run() {
>                 writeBoardStats(true, true);
>             }
>         };
>         Thread t = new Thread(runnable);
>         t.start();
>         System.out.println("Completed actionWriteAllScores.");
>     }
>
>     public void writeBoardStats(boolean boardPositions, boolean boardScores)
>     {
>         ArrayList<File> directories = getDirectories();
>
>         // Check directories can be read.
>         for (File directory : directories)
>         {
>             if (!directory.canRead())
>             {
>                 System.out.println("Unable to read " + directory.getAbsolutePath());
>                 return;
>             }
>         }
>
>         StringBuilder OutputToFile = new StringBuilder();
>         int label = 1;
>         int maxturns = 300;
>         int everyNthTurn = 10;
>         AlphabetNumbers alphabet = new AlphabetNumbers();
>
>         OutputToFile.append("gamefile,");
>         for (int i = everyNthTurn; i <= maxturns; i += everyNthTurn)
>         {
>             if (boardPositions)
>             {
>                 for (int row = 19; row >= 1; row--)
>                 {
>                     for (int col = 1; col <= 19; col++)
>                     {
>                         OutputToFile.append(i + "_" + alphabet.flip(col) + row + ",");
>                     }
>                 }
>             }
>             if (boardScores)
>             {
>                 OutputToFile.append(i + "_areaBlack,");
>                 OutputToFile.append(i + "_areaWhite,");
>                 OutputToFile.append(i + "_capturedBlack,");
>                 OutputToFile.append(i + "_capturedWhite,");
>                 OutputToFile.append(i + "_resultArea,");
>                 OutputToFile.append(i + "_resultTerritory,");
>                 OutputToFile.append(i + "_territoryBlack,");
>                 OutputToFile.append(i + "_territoryWhite,");
>                 OutputToFile.append(i + "_adjacentOppColorPoints,");
>                 OutputToFile.append(i + "_chainsBlack,");
>                 OutputToFile.append(i + "_chainsWhite,");
>                 OutputToFile.append(i + "_chainsEmpty,");
>                 OutputToFile.append(i + "_atariBlack,");
>                 OutputToFile.append(i + "_atariWhite,");
>                 OutputToFile.append(i + "_meanChainSizeBlack,");
>                 OutputToFile.append(i + "_meanChainSizeWhite,");
>                 OutputToFile.append(i + "_libertiesBlack,");
>                 OutputToFile.append(i + "_libertiesWhite,");
>                 OutputToFile.append(i + "_meanLibertiesBlack,");
>                 OutputToFile.append(i + "_meanLibertiesWhite,");
>                 OutputToFile.append(i + "_eyesBlack,");
>                 OutputToFile.append(i + "_eyesWhite,");
>                 OutputToFile.append(i + "_meanEyesBlack,");
>                 OutputToFile.append(i + "_meanEyesWhite,");
>             }
>         }
>         OutputToFile.append("Label");
>
>         for (File directory : directories)
>         {
>             File[] files = directory.listFiles();
>             int filenum = 0;
>
>             for (File file : files)
>             {
>                 filenum++;
>                 //if (filenum > 50)
>                 //break;
>
>                 OutputToFile.append("\n");
>                 OutputToFile.append(file.getName() + ",");
>                 actionOpenFileNoNewThread(file);
>                 for (int i = 0; i <= maxturns; i++)
>                 {
>                     if (i % everyNthTurn == 0 && i != 0) // Only get score for every Nth move.
>                     {
>
>                         OutputToFile.append(getBoardDetails(i, boardPositions, boardScores));
>                         //actionScore();
>                     }
>                     actionForward(1);
>                 }
>
>                 /*
>                 if (label <= 4)
>                     OutputToFile.append("Aya");
>                 else if (label <= 8)
>                     OutputToFile.append("GnuGo1");
>                 else if (label <= 12)
>                     OutputToFile.append("Mogo1");
>                 else if (label <= 16)
>                     OutputToFile.append("GnuGo10");
>                 */
>
>                 File parent = file.getParentFile();
>                 String parentName = parent.getName();
>                 OutputToFile.append(parentName);
>
>             }
>             label++;
>
>             try {
>                 File outfile = new File("output.csv");
>                 FileWriter fstream = new FileWriter(outfile, true);
>                 BufferedWriter out = new BufferedWriter(fstream);
>                 out.write(OutputToFile.toString());
>                 out.close();
>             } catch (IOException e) {
>                 // TODO Auto-generated catch block
>                 e.printStackTrace();
>             }
>             OutputToFile = new StringBuilder();
>         }
>     }
>
>
>     private String getBoardDetails(int turnNum, boolean boardPositions, boolean boardScores)
>     {
>         StringBuilder ret = new StringBuilder();
>         String newBoardAsString = BoardUtil.toStringOneLine(getBoard(), false);
>         if (boardPositions)
>         {
>             ret.append(newBoardAsString);
>         }
>         if (boardScores)
>         {
>             Score currentScore = null;
>             if (turnNum >= 300)
>             {
>                 if (!newBoardAsString.equals(boardAsString))
>                 {
>                     actionScore();
>                     while (doneWaiting == false)
>                         try {
>                             //wait(5000);
>                             Thread.sleep(100);
>                             //actionForward(0);
>                         } catch (InterruptedException e) {
>                             // TODO Auto-generated catch block
>                             System.out.println("Sleep interrupted. As expected.");
>                         }
>                 }
>                 boardAsString = newBoardAsString;
>                 currentScore = initScoreReturn(m_deadStone);
>             }
>             else
>             {
>                 currentScore = initScoreReturn(null);
>             }
>             ret.append(currentScore.m_areaBlack + ",");
>             ret.append(currentScore.m_areaWhite + ",");
>             ret.append(currentScore.m_capturedBlack + ",");
>             ret.append(currentScore.m_capturedWhite + ",");
>             ret.append(currentScore.m_resultArea + ",");
>             ret.append(currentScore.m_resultTerritory + ",");
>             ret.append(currentScore.m_territoryBlack + ",");
>             ret.append(currentScore.m_territoryWhite + ",");
>             ret.append(currentScore.m_adjacentOppColorPoints + ",");
>             ret.append(currentScore.m_chainsBlack + ",");
>             ret.append(currentScore.m_chainsWhite + ",");
>             ret.append(currentScore.m_chainsEmpty + ",");
>             ret.append(currentScore.m_atariBlack + ",");
>             ret.append(currentScore.m_atariWhite + ",");
>             ret.append(currentScore.m_meanChainSizeBlack + ",");
>             ret.append(currentScore.m_meanChainSizeWhite + ",");
>             ret.append(currentScore.m_libertiesBlack + ",");
>             ret.append(currentScore.m_libertiesWhite + ",");
>             ret.append(currentScore.m_meanLibertiesBlack + ",");
>             ret.append(currentScore.m_meanLibertiesWhite + ",");
>             ret.append(currentScore.m_eyesBlack + ",");
>             ret.append(currentScore.m_eyesWhite + ",");
>             ret.append(currentScore.m_meanEyesBlack + ",");
>             ret.append(currentScore.m_meanEyesWhite + ",");
>         }
>
>         return ret.toString();
>     }
1107a1383,1403
>
>     public void actionOpenFileNoNewThread(final File file)
>     {
>         if (file == null)
>             return;
>         if (!checkStateChangePossible())
>             return;
>         if (!checkSaveGame())
>             return;
>         final boolean protectGui = (m_gtp != null);
>         if (protectGui)
>             protectGui();
>         // SwingUtilities.invokeLater(new Runnable() {
>         // public void run() {
>         loadFile(file, -1);
>         boardChangedBegin(false, true);
>         if (protectGui)
>             unprotectGui();
>         // }
>         // });
>     }
1343a1640
>         doneWaiting = false;
1360a1658,1705
>     public void actionScoreCallWriteBoardStats()
>     {
>         if (m_scoreMode)
>             return;
>         if (!checkStateChangePossible())
>             return;
>         boolean programReady = (m_gtp != null && synchronizeProgram());
>         if (m_gtp == null || !programReady)
>         {
>             String disableKey = "net.sf.gogui.gogui.GoGui.score-no-program";
>             String optionalMessage;
>             if (m_gtp == null)
>                 optionalMessage = "MSG_SCORE_NO_PROGRAM";
>             else
>                 optionalMessage = "MSG_SCORE_CANNOT_USE_PROGRAM";
>             m_messageDialogs.showInfo(disableKey, this,
>                 i18n("MSG_SCORE_MANUAL"),
>                 i18n(optionalMessage), true);
>             updateViews(false);
>             initScore(null);
>             return;
>         }
>         if (m_gtp.isSupported("final_status_list"))
>         {
>             Runnable callback = new Runnable() {
>                 public void run() {
>                     scoreContinue();
>                     doneWaiting = true;
>                 }
>             };
>             doneWaiting = false;
>             runLengthyCommand("final_status_list dead", callback);
>         }
>         else
>         {
>             String disableKey =
>                 "net.sf.gogui.gogui.GoGui.score-not-supported";
>             String optionalMessage;
>             String name = getProgramName();
>             optionalMessage = format(i18n("MSG_SCORE_NO_SUPPORT"), name);
>             m_messageDialogs.showInfo(disableKey, this,
>                 i18n("MSG_SCORE_MANUAL"),
>                 optionalMessage, true);
>             updateViews(false);
>             initScore(null);
>         }
>     }
2289a2635,2636
>     private PointList m_deadStone;
>
2623a2971
>         actionExportTextPosition();
3373a3722,3741
>
>     private Score initScoreReturn(ConstPointList deadStones)
>     {
>         resetBoard();
>         GuiBoardUtil.scoreBegin(m_guiBoard, m_countScore, getBoard(),
>             deadStones);
>         m_scoreMode = true;
>         if (m_scoreDialog == null)
>         {
>             ScoringMethod scoringMethod = getGameInfo().parseRules();
>             m_scoreDialog = new ScoreDialog(this, this, scoringMethod);
>         }
>         restoreLocation(m_scoreDialog, "score");
>         Komi komi = getGameInfo().getKomi();
>         ScoringMethod scoringMethod = getGameInfo().parseRules();
>         //m_scoreDialog.showScore(m_countScore, komi);
>         //m_scoreDialog.setVisible(true);
>         showStatus(i18n("STAT_SCORE"));
>         return m_countScore.getScore(komi, scoringMethod);
>     }
3416c3784
<         String warnings = runnable.getWarnings();
---
>         /* String warnings = runnable.getWarnings();
3426c3794
<         }
---
>         } */
3720a4089,4090
>         m_deadStone = isDeadStone;
>         doneWaiting = true;

Source/gogui/GoGuiActions.diff

445a446,460
>
>     public final Action m_actionWriteAllBoardPositions =
>         new Action("ACT_WRITE_ALL_BOARD_POSITIONS") {
>             public void actionPerformed(ActionEvent e) {
>                 m_goGui.actionWriteAllBoardPositions(); }
>         };
>
>     public final Action m_actionWriteAllScores =
>         new Action("ACT_WRITE_ALL_SCORES") {
>             public void actionPerformed(ActionEvent e) {
>                 m_goGui.actionWriteAllScores(); }
>         };
>
>     public final Action m_actionWriteAllBoardsAndScores =
>         new Action("ACT_WRITE_BOARDS_AND_SCORES") {
>             public void actionPerformed(ActionEvent e) {
>                 m_goGui.actionWriteBoardPositionsAndScores(); }};

Source/gogui/GoGuiMenuBar.diff

356a357,359
>         menu.addItem(actions.m_actionWriteAllBoardPositions, KeyEvent.VK_T);
>         menu.addItem(actions.m_actionWriteAllScores, KeyEvent.VK_Z);
>         menu.addItem(actions.m_actionWriteAllBoardsAndScores, KeyEvent.VK_Z);

Source/gogui/Main.diff

19a20
>         System.out.println("Boo!");

Source/gogui/Score.diff

42a43,62
>     public int m_adjacentOppColorPoints;
>     public int m_chainsBlack;
>     public int m_chainsWhite;
>     public int m_chainsEmpty;
>     public int m_atariBlack;
>     public int m_atariWhite;
>     public float m_meanChainSizeBlack;
>     public float m_meanChainSizeWhite;
>     public float m_medianChainSizeBlack;
>     public float m_medianChainSizeWhite;
>     public int m_libertiesBlack;
>     public int m_libertiesWhite;
>     public float m_meanLibertiesBlack;
>     public float m_meanLibertiesWhite;
>     public int m_eyesBlack;
>     public int m_eyesWhite;
>     public float m_meanEyesBlack;
>     public float m_meanEyesWhite;
>
>

Source/gogui/CountScore.diff

7a8,10
>
> import java.util.ArrayList;
>
136a140,144
>
>         addEyeData(s);
>         addChainData(s);
>
>
175a184,195
>
>                 // This section counts the number of cases where a black piece is placed next to a white piece.
>                 // If the same piece shares 2 borders, it will be counted twice.
>                 for (GoPoint adjacentPoint : m_board.getAdjacent(p))
>                 {
>                     GoColor adjc = m_board.getColor(adjacentPoint);
>
>                     if (adjc == WHITE && c == BLACK)
>                     {
>                         ++s.m_adjacentOppColorPoints;
>                     }
>                 }
195c215,366
< /** Change the life-death status of a stone.
---
>     public void addChainData(Score s) {
>         ArrayList<Chain> chains = getChains();
>         int chainsBlack = 0;
>         int chainsWhite = 0;
>         int chainsEmpty = 0;
>         int atariBlack = 0;
>         int atariWhite = 0;
>         int totalSizeBlack = 0;
>         int totalSizeWhite = 0;
>         //int medianChainSizeBlack = 0;
>         //int medianChainSizeWhite = 0;
>         int libertiesBlack = 0;
>         int libertiesWhite = 0;
>
>         for (Chain chain : chains)
>         {
>             if (chain.color == BLACK)
>             {
>                 chainsBlack++;
>                 if (chain.inAtari())
>                     atariBlack++;
>
>                 libertiesBlack += chain.liberties;
>
>                 totalSizeBlack += chain.size();
>             }
>             if (chain.color == WHITE)
>             {
>                 chainsWhite++;
>                 if (chain.inAtari())
>                     atariWhite++;
>
>                 libertiesWhite += chain.liberties;
>
>                 totalSizeWhite += chain.size();
>             }
>
>             if (chain.color == EMPTY)
>             {
>                 chainsEmpty++;
>             }
>         }
>
>         s.m_chainsBlack = chainsBlack;
>         s.m_chainsWhite = chainsWhite;
>         s.m_chainsEmpty = chainsEmpty;
>         s.m_atariBlack = atariBlack;
>         s.m_atariWhite = atariWhite;
>         s.m_libertiesBlack = libertiesBlack;
>         s.m_libertiesWhite = libertiesWhite;
>         s.m_meanLibertiesBlack = (float) libertiesBlack / chainsBlack;
>         s.m_meanLibertiesWhite = (float) libertiesWhite / chainsWhite;
>         s.m_meanChainSizeBlack = (float) totalSizeBlack / chainsBlack;
>         s.m_meanChainSizeWhite = (float) totalSizeWhite / chainsWhite;
>
>         if (s.m_eyesBlack != 0)
>             s.m_meanEyesBlack = (float) s.m_eyesBlack / chainsBlack;
>         if (s.m_eyesWhite != 0)
>             s.m_meanEyesWhite = (float) s.m_eyesWhite / chainsWhite;
>     }
>
>     private ArrayList<Chain> getChains()
>     {
>         ArrayList<Chain> ret = new ArrayList<Chain>();
>         PointList groupedPoints = new PointList();
>
>         for (GoPoint p : m_board)
>         {
>             PointList checkedPoints = new PointList();
>
>             if (groupedPoints.contains(p))
>                 continue;
>
>             Chain newChain = new Chain();
>             newChain.add(p);
>             GoColor color = m_board.getColor(p);
>             newChain.color = color;
>             groupedPoints.add(p);
>             checkedPoints.add(p);
>
>             PointList neighbours = new PointList();
>             neighbours.addAllFromConst(m_board.getAdjacent(p));
>
>             for (GoPoint adjPoint : neighbours)
>             {
>                 if (checkedPoints.contains(adjPoint))
>                     continue;
>                 checkedPoints.add(adjPoint);
>                 if (m_board.getColor(adjPoint) == EMPTY)
>                     newChain.liberties++;
>
>                 if (groupedPoints.contains(adjPoint)) // Should never happen
>                     continue;
>
>                 if (m_board.getColor(adjPoint) == color)
>                 {
>                     newChain.add(adjPoint);
>                     groupedPoints.add(adjPoint);
>                     neighbours.addAllFromConst(m_board.getAdjacent(adjPoint));
>                 }
>
>
>             }
>
>             ret.add(newChain);
>         }
>         return (ret);
>     }
>
>     public void addEyeData(Score s) {
>         int eyesBlack = 0;
>         int eyesWhite = 0;
>         for (GoPoint p : m_board)
>         {
>             if (m_board.getColor(p) == EMPTY);
>             {
>                 PointList adjacentPoints = new PointList(m_board.getAdjacent(p));
>
>                 GoColor color = null;
>                 boolean newEye = true;
>                 for (GoPoint adj : adjacentPoints)
>                 {
>                     if (color == null)
>                     {
>                         color = m_board.getColor(adj);
>                     }
>                     else
>                     {
>                         if (color != m_board.getColor(adj))
>                         {
>                             newEye = false;
>                             break;
>                         }
>                     }
>                 }
>
>                 if (newEye && color == BLACK)
>                 {
>                     eyesBlack++;
>                 }
>                 else if (newEye && color == WHITE)
>                 {
>                     eyesWhite++;
>                 }
>             }
>         }
>         s.m_eyesBlack = eyesBlack;
>         s.m_eyesWhite = eyesWhite;
>     }
>
> /** Change the life-death status of a stone.

Source/gogui/BoardUtil.diff

93a94,134
>
>     public static String toStringOneLine(ConstBoard board, boolean withGameInfo)
>     {
>         StringBuilder s = new StringBuilder(1024);
>         int size = board.getSize();
>         String separator = System.getProperty("line.separator");
>         assert separator != null;
>         //printXCoords(size, s, separator);
>         for (int y = size - 1; y >= 0; --y)
>         {
>             //printYCoord(y, s, true);
>             //s.append(' ');
>             for (int x = 0; x < size; ++x)
>             {
>                 GoPoint point = GoPoint.get(x, y);
>                 GoColor color = board.getColor(point);
>                 if (color == BLACK)
>                     s.append("X,");
>                 else if (color == WHITE)
>                     s.append("O,");
>                 else
>                 {
>                     //if (board.isHandicap(point))
>                     //s.append("+");
>                     //else
>                     s.append(".,");
>                 }
>             }
>             //printYCoord(y, s, false);
>             //if (withGameInfo)
>             //printGameInfo(board, s, y);
>             //s.append(separator);
>         }
>         //printXCoords(size, s, separator);
>         //if (withGameInfo)
>         //{
>         //printToMove(board, s);
>         //s.append(separator);
>         //}
>         return s.toString();
>     }

Source/gogui/Chain.java

package net.sf.gogui.go;

public class Chain {

    public PointList points;
    public GoColor color;
    int liberties;

    public Chain()
    {
        points = new PointList();
        liberties = 0;
    }

    public int size()
    {
        return points.size();
    }

    public void add(GoPoint point)
    {
        points.add(point);
    }

    public boolean inAtari()
    {
        if (liberties == 1)
            return true;
        else
            return false;
    }

    public String toString()
    {
        return points.toString();
    }
}
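A minimal usage sketch of this helper class (an assumed single-stone chain; GoPoint.get and GoColor.BLACK are the existing GoGui types already used in the listings above):

    Chain chain = new Chain();
    chain.color = GoColor.BLACK;     // colour shared by every point in the chain
    chain.add(GoPoint.get(3, 3));    // one stone
    chain.liberties = 1;             // liberties are counted externally, in getChains()
    System.out.println(chain.size());    // 1
    System.out.println(chain.inAtari()); // true: exactly one liberty left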

Source/gogui/AlphabetNumbers.java

package net.sf.gogui.go;
import java.util.HashMap;

import sun.security.action.PutAllAction;

/**
 * WARNING! I is missing from this implementation to suit Go games.
 * @author Tom
 *
 */
public class AlphabetNumbers {

    HashMap<String, Integer> strToInt = new HashMap<String, Integer>();
    HashMap<Integer, String> intToStr = new HashMap<Integer, String>();

    public AlphabetNumbers()
    {
        strToInt.put("A", 1);
        strToInt.put("B", 2);
        strToInt.put("C", 3);
        strToInt.put("D", 4);
        strToInt.put("E", 5);
        strToInt.put("F", 6);
        strToInt.put("G", 7);
        strToInt.put("H", 8);
        //strToInt.put("I", 9); I is not used for Go games.
        strToInt.put("J", 9);
        strToInt.put("K", 10);
        strToInt.put("L", 11);
        strToInt.put("M", 12);
        strToInt.put("N", 13);
        strToInt.put("O", 14);
        strToInt.put("P", 15);
        strToInt.put("Q", 16);
        strToInt.put("R", 17);
        strToInt.put("S", 18);
        strToInt.put("T", 19);
        strToInt.put("U", 20);
        strToInt.put("V", 21);
        strToInt.put("W", 22);
        strToInt.put("X", 23);
        strToInt.put("Y", 24);
        strToInt.put("Z", 0);

        intToStr = flipMap(strToInt);
    }

    private static HashMap flipMap(HashMap hm)
    {
        HashMap ret = new HashMap();
        for (Object i : hm.keySet())
        {
            ret.put(hm.get(i), i);
        }
        return ret;
    }

    public String flip(int i)
    {
        return intToStr.get(i);
    }

    public int flip(String s)
    {
        return strToInt.get(s);
    }

}
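A short usage sketch; the expected values follow directly from the table built in the constructor, which skips the letter 'I' as Go board coordinates do:

    AlphabetNumbers alphabet = new AlphabetNumbers();
    System.out.println(alphabet.flip(9));   // "J", since 'I' is skipped
    System.out.println(alphabet.flip("T")); // 19, the last column of a 19x19 board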

Source/gogui/FlipBW.java

package net.sf.gogui.go;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Scanner;

public class FlipBW {

    /**
     * @param args
     * @throws FileNotFoundException
     */
    public static void main(String[] args) throws FileNotFoundException {

        File dir = new File("../SomeoneVsSomeone");

        File[] files = dir.listFiles();

        for (File file : files)
        {
            Scanner scanner = new Scanner(file);
            StringBuilder OutputToFile = new StringBuilder();

            while (scanner.hasNextLine())
            {
                String line = scanner.nextLine();
                line = line.replaceAll("Black", "Zoom");
                line = line.replaceAll("White", "Black");
                line = line.replaceAll("Zoom", "White");
                line = line.replaceAll("B\\[", "Zoom\\[");
                line = line.replaceAll("W\\[", "B\\[");
                line = line.replaceAll("Zoom\\[", "W\\[");
                line = line.replaceAll("B\\+", "Zoom\\+");
                line = line.replaceAll("W\\+", "B\\+");
                line = line.replaceAll("Zoom\\+", "W\\+");
                OutputToFile.append(line + "\n");
            }

            PrintStream out = new PrintStream(new File(file.getAbsolutePath()));
            out.print(OutputToFile.toString());
            out.close();
        }
    }

}
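The intermediate "Zoom" token in the replacements above sidesteps the classic simultaneous-swap problem: replacing Black with White and then White with Black directly would leave every occurrence as Black, so one side is first moved out of the way via a placeholder assumed not to occur elsewhere in the SGF files.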

Source/gogui/SortByPB.java

package net.sf.gogui.go;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SortByPB {

    /**
     * @param args
     * @throws FileNotFoundException
     */
    public static void main(String[] args) throws FileNotFoundException
    {
        File rootDir = new File("../KGS");
        File[] dirs = rootDir.listFiles();
        HashMap<String, Integer> usercounts = new HashMap<String, Integer>();

        for (File dir : dirs)
        {
            File[] files = dir.listFiles();

            for (File file : files)
            {
                Scanner scanner = new Scanner(file);
                String playerBlack = "";

                while (scanner.hasNextLine())
                {
                    String line = scanner.nextLine();

                    Pattern p = Pattern.compile("^PB\\[(\\w*)\\]$");
                    Matcher m = p.matcher(line);
                    if (m.matches())
                    {
                        playerBlack = m.group(1);
                        int count = 0;
                        if (usercounts.containsKey(playerBlack))
                            count = usercounts.get(playerBlack);

                        usercounts.put(playerBlack, count + 1);
                        break;
                    }
                }

                if (!playerBlack.equals(""))
                {
                    File outdir = new File("KGSByName/" + playerBlack);
                    File outfile = new File("KGSByName/" + playerBlack + "/" + file.getName());
                    try {
                        outdir.mkdir();
                        copyFile(file, outfile);
                    } catch (IOException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                    }
                }
            }
        }
        for (String key : usercounts.keySet())
        {
            System.out.println(key + "," + usercounts.get(key) + "\n");
        }

    }

    public static void copyFile(File sourceFile, File destFile) throws IOException {
        if (!destFile.exists()) {
            destFile.createNewFile();
        }

        FileChannel source = null;
        FileChannel destination = null;
        try {
            source = new FileInputStream(sourceFile).getChannel();
            destination = new FileOutputStream(destFile).getChannel();
            destination.transferFrom(source, 0, source.size());
        }
        finally {
            if (source != null) {
                source.close();
            }
            if (destination != null) {
                destination.close();
            }
        }
    }

}

116


Recommended