How Chess Players Think-Patrick Turner

How chess players think:

evidence for the role of search

at Expert level and below

Patrick Turner

First degree: BSc. (Hons) Mathematics

Open University personal identifier: U6094525

Dissertation submitted for:

MSc. in Psychological Research Methods

March 2005


Abstract

There are two competing views of the dominant mechanism underpinning chess

thinking – pattern recognition or search-and-evaluation? Whilst the recent

development of template theory has gone some way to unifying the two existing

theories, there still remain a great many unanswered questions concerning the nature

of the chess thinking process – in particular the relative contribution of recognition

and search-and-evaluation to chess skill. Although recognition-based theories of

chess thinking do not deny that search is part of the thought process, they emphasise

that recognition of the position provides for highly selective search. Thus an Expert

need not search any faster, or deeper, to arrive at a good move – he narrows down his

search by pattern recognition to focus his analysis on the good moves. Conversely,

search-and-evaluation theories emphasise the ability to search deeper, wider, faster

and more thoroughly, coupled with the ability to evaluate leaf nodes more accurately,

as the basis for the selection of good moves. They do not claim that recognition is not

involved in directing search – merely that it is not the dominant mechanism.

The aim of the research discussed here was to investigate support for both recognition

and search theories of chess skill through experimentation involving chess players at

two levels (Expert and Class A/B) completing a ‘choice of next move’ task for three

chess positions. Two major conclusions are drawn from the results. Firstly, there is

strong evidence for differences in search capabilities across skill levels in chess

players, supporting the results of Gobet (1998a) and others. Such evidence argues

against the basis of de Groot’s main conclusion (1965) that recognition is the

dominant mechanism underpinning chess skill. Proponents of template theory (e.g.

Gobet & Simon, 1998a) argue that such continued results for search differences


across skill levels do not undermine the recognition-based theory of chess skill itself.

The second major conclusion to be drawn, however, suggests that there is less support

for the role of recognition than in previous studies, such as Gobet’s (1998a). It may

be that the results hold only between Class A/B players and Experts. This would

provide evidence that the better players at club level are superior primarily

because of their search capabilities and not recognition. A different model of chess

skill may be required for players below the level of Master.


Table of contents

Introduction

Literature review

Methodology

Analysis

Project Review

Conclusions

Appendix I: de Groot positions

Appendix II: Protocol analysis

Bibliography


Introduction

The game of chess provides an ideal environment for the study of human

decision-making in complex domains. As such, it has provided the basis for a

number of studies into human cognition, including perception, memory and

decision-making. Over the decades following the publication, in 1965, of

Adriaan de Groot’s original research into chess thinking, there have emerged two

schools of thought concerning how chess players think – the family of

recognition-based theories typified by chunking theory, due to de Groot (1965) and

Chase & Simon (Gobet and Simon, 1998a, 1998b; Gobet, 2004) among others;

and the search-and-evaluation theory of Holding (Holding, 1985; Gobet 2004).

Whilst the recent development of template theory has gone some way to unifying

the two theories, there still remain a great many unanswered questions

concerning the nature of the chess thinking process – in particular the relative

contribution of recognition and search-and-evaluation (often simply referred to

as ‘search’) to chess skill.

The structure of chess thinking

The two theories agree on the basic structure of the chess thought process. De

Groot (1965) showed that this process can be represented as a sequence of

mental operations on not only the perceived position that the player is confronted

with but also imagined positions as might occur if certain sequences of moves

are played – a development of Selz’s Framework of Productive Thinking (de

Groot, 1965). Briefly, the chess thinking process comprises three main phases –

a phase of orientation, noting possible threats, plans and candidate moves; a


phase of elaboration, within which specific sequences of moves are considered

(“I move here, then he moves here” etc.), each of which terminates in an

evaluation of the desirability of an imagined position (a ‘leaf node’); and a final

phase within which the best move so far considered may be checked before the

player commits to it (de Groot 1965, pp100-116). It is within the middle phase

that search activity is carried out. Although recognition-based theories of chess

thinking do not deny that search is part of the thought process, they emphasise

that recognition of the position (and good moves or general plans to undertake in

such a position) serves to make search activity highly selective. Thus an expert

player need not search any faster, or deeper, to arrive at a good move – he

narrows down his search by recognition to focus his analysis on the good moves.

Conversely, search-and-evaluation theories emphasise the ability to search

deeper, wider, faster and more thoroughly, coupled with the ability to evaluate

leaf nodes more accurately, as the basis for the selection of good moves. They

do not claim that recognition is not involved in directing search – merely that it is

not the dominant mechanism.

Newell & Simon (1972) formalised de Groot’s framework in the Problem

Behaviour Graph (PBG) model. A PBG characterises the phase of elaboration in

chess thinking, where search is undertaken. Such graphs are characterised by sequences

of moves, beginning with a candidate move (or base move) and alternating for

moves from each side, with possible branching in each sequence. Each branch

ends in a leaf node and each leaf node is evaluated, usually only as ‘good’ or

‘bad’ for the player on move. As such, PBGs allow for the extraction of search

variables such as ‘number of nodes searched’ and ‘maximum depth of search’.

It is more difficult to extract variables characterising recognition although


‘number of base moves considered’ serves to characterise option generation

before any search is conducted.
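
The extraction of such variables can be illustrated with a minimal sketch. It assumes that each episode of a PBG is stored as a small tree rooted at its base move, that depth is counted in plies with the base move at depth 1, and that nodes revisited in a later episode are counted again. The class and function names, and the move labels, are hypothetical and serve only as illustration; they are not taken from de Groot (1965) or Newell & Simon (1972).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One imagined position, reached by playing `move` from its parent."""
    move: str
    children: List["Node"] = field(default_factory=list)
    evaluation: Optional[str] = None  # 'good', 'bad', or None if unexpressed


def count_nodes(node: Node) -> int:
    """Number of positions in the episode rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


def max_depth(node: Node) -> int:
    """Longest line in plies, counting the base move as depth 1 (assumption)."""
    return 1 + max((max_depth(child) for child in node.children), default=0)


def search_variables(episodes: List[Node]) -> dict:
    """Summarise a chronological list of episodes, one per base-move visit."""
    return {
        "nodes_searched": sum(count_nodes(e) for e in episodes),
        "max_depth": max((max_depth(e) for e in episodes), default=0),
        # Distinct base moves, so a reinvestigated move is counted once.
        "base_moves_considered": len({e.move for e in episodes}),
    }


# Illustrative protocol: three episodes, the third revisiting the first base move.
episodes = [
    Node("Bxd5", [
        Node("exd5", [Node("Qf3", evaluation="good")]),
        Node("Nxd5", evaluation="bad"),
    ]),
    Node("Nxc6", [Node("bxc6", evaluation="bad")]),
    Node("Bxd5", [Node("exd5", [Node("Nxd5", [Node("Nxd5", evaluation="good")])])]),
]

stats = search_variables(episodes)
print(stats)
```

Counting revisited nodes again is one possible convention; a study could equally count each distinct position once, which would lower the node total for players who reinvestigate heavily.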

Aims

The aim of the research discussed here was to investigate support for both

recognition and search theories of chess skill through experimentation involving

chess players of different calibres completing a ‘choice of next move’ task for a

small number of chess positions with varying character.

The experimental aims were to establish significant differences in choice of next

move and search behaviour across two groups of chess players of differing

calibres, for three different chess positions. This was to be achieved through the

application of de Groot’s experimental procedure and using the analysis methods

of de Groot (1965) and Newell & Simon (1972). Data from the most recent

study of this kind, that of Gobet (1998a), was also to be used for comparisons of

results.

The specific research questions included:

Do club-level chess players of differing calibres differ in terms of quality

of move selection?

Do club-level chess players of differing calibres differ in terms of

capacity of search, mean and maximal search depth, and thoroughness of

search?

To what degree do the levels of search activity in club-level players fit

with existing models of chess thinking?


Novelty

The experimentation and analysis outlined above is not completely novel. It

draws much of the experimental procedure, analysis methods and study variables

from existing research in the field, such as de Groot (1965) and subsequent

replications of that original set of experiments (Newell & Simon, 1972; Gobet,

1998a). It is novel in two respects, however:

It comprises a repeated measures choice of next move task across three

positions; each of the three studies named above focused only on one

position;

It samples from club-level players only (Experts down to Class B) and

therefore serves to test some of de Groot’s original conclusions, which

were based on an extremely high calibre sample including Grandmasters.

Motivation for this dissertation

The choice of subject matter for this dissertation is motivated by twin interests in

human decision-making in naturalistic settings and empirical research into

human decision-making. An enduring methodological problem that human

decision-making research faces is the design of experiments that both preserve

ecological validity (i.e. a naturalistic decision-making setting and task) and

enable the valid measurement of important variables. Chess is a rare case of a

structured and bounded decision-making environment that still affords

ecologically valid, yet well-defined, experimentation.


Structure of this dissertation

The remainder of this dissertation is structured as follows:

The Literature Review introduces the main arguments for both

recognition- and search-based theories of chess skill;

The Methodology chapter outlines the experimental design, experimental

procedure and analysis techniques undertaken.

The Analysis chapter sets out the results and analysis from the

experiment.

The Project Review reflects upon the changes in focus for the research

throughout its course, including modifications to the design, the success

of the experiment, the focusing of the analysis and the validity of the

methods.

The Conclusions chapter revisits the main findings of the analysis in the

context of the original research questions and the wider debate

concerning the nature of chess skill.


Literature review

The game of chess is ideally suited to a range of studies in cognitive psychology,

particularly memory, expertise and decision-making. Success at chess is

completely dependent upon skill and, whilst the configuration of the board and

pieces, and the rules of the game can be understood relatively quickly, a typical

chess position offers a non-trivial decision-making task, even for highly skilled

players. This is because of the inherent complexity that the game offers and,

although information about each position is known perfectly and the ultimate

goal of the game is certain, this complexity renders chess a credible domain of

interest for the study of human decision-making. There is also a substantial

amount of psychological literature on chess, perhaps because of the relatively

simple manner in which experiments can be conducted.

Cognitive psychology and chess enjoy a history of over a century of research; the

key question that has engaged psychologists throughout has been, “What

constitutes skill at chess?” Although it is generally agreed that chess skill is

based upon both recognition (the ability to match patterns based on the

possession of ‘good’ patterns) and look-ahead search (essentially the ability to

compute sequences of moves), opinions are polarised and there are distinct

camps that espouse the dominance of one mechanism over the other.

Most modern research on chess skill has its foundations in the studies of the

Dutch international chess master Adriaan de Groot, whose original experiments,

conducted between 1938 and 1944, served to develop both theories of expertise

and decision-making, and corresponding experimental methods. The remainder

of this chapter is divided into sections, each of which discusses a key


development in one or both of the competing recognition-based and search-based

theories of chess skill.

The role of recognition: de Groot

De Groot (1965) was concerned with the thought processes underlying expert

chess players’ choice of next move decisions. His main experiment was a

‘choice of next move’ task, conducted with a relatively small sample of good

chess players, ranging from grandmasters (including Alexander Alekhine and

Max Euwe) to Class C players (approximately average club level). De Groot

used a set of chess positions, typically middlegame positions taken from games

which he had played. De Groot set these positions up on a chessboard and asked

his subjects, assuming the role of the player on move, to think of a move and

play it on the board as if they were involved in an actual tournament game. The

only extra stipulation was that the subject ‘thought aloud’ as he or she did so, so that

de Groot could record the way in which the subject arrived at his or her next

move. (This method is discussed in further detail in the next section.)

De Groot recorded each subject’s thought as a verbal protocol which he then

coded, using Otto Selz’s framework of productive thinking (de Groot, 1965). De

Groot was motivated by Selz’s framework, which described thinking as a

‘hierarchically organised linear series of operations’ (de Groot, 1965, p vi) and,

in fact, sought to test it through the coding of the protocols. De Groot

demonstrated that he could successfully represent the protocols within this

framework, which, at the macro-structural level, comprises three phases: a first

phase of orientation that may include a listing of candidate moves for

consideration; a phase of elaboration whereby candidate moves are examined in


detail through the consideration of possible sequences of moves that they

precipitate; and a final phase in which a move is selected, possibly following

some form of summarisation. De Groot’s coding, which was later formalised by

Newell and Simon (1972) as a Problem Behaviour Graph (PBG), captured the

history of all sequences of moves, each beginning with a base move (candidate

next move) considered by the subject. Such sequences included branching,

whereby the subject considered two or more possible sequences from some

branching move coming after the base move. Each sequence terminated in an

evaluation (positive, negative or unexpressed). Since this coding captured all the

moves considered it allowed for the reinvestigation of base moves.

De Groot did not expose every player to every position; positions A, B and C

were most commonly used and de Groot chose only to extract quantitative

variables from the encoded protocols for these positions (seen by 19, 6 and 6

players, respectively). These variables included the chosen move, the time taken

for each phase, the ordered sequence of base moves considered (candidate next

moves), the total number of moves, and variables concerning the frequency of

both immediate and non-immediate reinvestigations. De Groot had also analysed

positions A to C extensively to generate an order of ‘move quality’ for each of

the legal moves in each position.

De Groot’s first result was that stronger players chose better quality moves

than weaker players. Secondly, there was little difference between masters and

Experts on the various ‘search variables’, including the total number of moves

considered (typically fewer than 100), depth of search or rate of search (number of

moves per minute). (‘Experts’ is capitalised when referring to the class of players

directly below masters and not capitalised when referring, in general, to people

possessing expertise.) De Groot then asserted “the master does not necessarily

calculate deeper, but the variations that he does calculate are much more to the

point; he sizes up positions more easily and, especially, more accurately” (1965,

p320). Although de Groot stated that he still expected greater search abilities in

high calibre players, he conceded that such differences did not explain the

observed performance differences. Having failed to establish skill differences on

these search variables, de Groot therefore conducted a second experiment based

on a ‘recall’ task, originally conducted – in flawed form – by Djakow, Rudik and

Petrowski in 1927 (Gobet 2004). Players were exposed to 16 positions, taken

from relatively obscure master games, each for a short length of time (between 2

and 15 seconds). After each presentation the player was requested to reproduce

the position verbally and de Groot developed a scoring scheme for assessing the

corresponding verbal protocols. The results showed, significantly, that

grandmasters outperformed weaker players. De Groot inferred that experience

(in its effect upon perceptual processes) was the contributory factor, asserting

that “the position is perceived in large complexes, each of which hangs together as

a genetic, functional and/or dynamic unit. For the master such complexes are of

a typical nature.” (1965, p329, italics from original text). De Groot also

suggested that “eye movements undoubtedly come into play” – a hypothesis

proved, in 1996, by de Groot and Gobet (Gobet, 2004). De Groot conducted a

detailed analysis of the verbal protocols for the recall task and identified content-

specific themes that demanded differing degrees of attention. It is interesting to

compare this approach with the quantitative (information-theoretic) approach of

Chase and Simon in the development of chunking theory (see below).

Returning to the results of the ‘choice of next move’ experiment, one of de

Groot’s innovations was an extension of the Selzian framework of productive


14/78

14

thinking. De Groot noticed that players employed a method that he denoted

‘progressive deepening’ – the reinvestigation of sequences emanating from the

same base move several times, either immediately or non-immediately, with the

tendency to search both progressively wider (examining more branches) and

deeper each time before evaluating at leaf nodes. This is referred to as ‘rough

cut, fine cut’ by Newell and Simon (1972, p752). Selz’s concept of ‘subsidiary

methods’ stated that human problem solving is based on, essentially, exhaustive

depth-first search in support of one plan followed by depth-first search for a

second plan if the first fails etc. (where ‘plan’ defines the context of evaluation

of leaf nodes). De Groot effectively redefined ‘exhaustiveness’ in relative terms

(1965, p270). This allowed for the reinvestigation of any base move, with the

examination of ever deeper and wider extensions to the search tree emanating

from each move. De Groot proposed that the varying criteria by which a

sequence is considered to be ‘exhausted’ upon investigation/reinvestigation –

and thus the criteria by which the corresponding base move is evaluated as good

or not – are based on recognition.

De Groot’s main conclusion, across both of his experiments, was that

recognition (based on the possession of perceptual chess-specific knowledge)

and the application of an effective set of heuristic goal-driven rules were

the major components of chess skill. The identification of recognition, in

particular, as a key mechanism refuted the then commonly held view that chess

skill was innate and had a large impact on theories of expertise that still persists.


15/78

15

Information processing and Problem Behaviour Graphs

The representation of human problem solving in the Selzian framework was

attractive to Herbert Simon, who viewed such an activity, essentially, as

information processing. Simon was also the originator of the concepts of

bounded rationality, which states that there are limits on human information

processing that, in turn, impose limits on human rationality, and satisficing,

which describes the sufficient, yet sub-optimal, human approach to decision-

making where bounded rationality is enforced, e.g. due to the complexity of the

decision-making environment. Chess is certainly one such environment and

there are clear parallels between satisficing and de Groot’s progressive

deepening, the latter of which seeks a positive evaluation of a move even though

a thorough analysis may be lacking.

In 1965, Newell and Simon (1972) reinvestigated and replicated de Groot’s

‘choice of next move’ experiment with the aim of investigating whether the

human decision-maker, in selecting his next move in chess, could be considered

an Information Processing System (IPS) and whether a thorough task analysis

would enable them to enrich their IPS model. Newell and Simon advocated the

elicitation of verbal protocols but emphasised their quantitative analysis rather

than de Groot’s extensive qualitative analysis. As such they built on de Groot’s

enhanced Selzian framework and formalised the coding of the verbal protocol as

a Problem Behaviour Graph (PBG).

A PBG is a descriptive chronological model of an individual’s thinking

throughout the course of a problem-solving task. It concerns the navigation of a

human decision maker along sequences of linked nodes, each representing some

projected state of the environment with links representing the application of an


16/78

16

operator to a previous node. This forms a chronologically ordered set of sequences

of linked nodes, possibly with branching (representing the conception of two

different operators on a given node), ending at given leaf nodes. A PBG for

choosing the next move in a chess position represents, as nodes, future chess

positions that may be arrived at through the application of a sequence of moves

for white and black. Each initial move, or base move, represents the candidate

moves that a player conceives, and chooses from, in completing the task. Each

leaf node terminates in an evaluation (including a ‘non-evaluation’) of the

position at that point. Note that a PBG is not equivalent to a search tree because

the latter models all sequences of moves considered by the chess player in

selecting his next move once only whereas a PBG provides a chronological view

on that player’s considerations. As such, PBGs may contain a number

of sequences beginning with the same base move, which may or may not be

different (indeed, identical sequences may or may not include different

evaluations). Whilst most of the work underpinning PBGs is due to Selz and de

Groot, Newell and Simon added the graphical formalism. To differentiate

between different sequences, they redefined de Groot’s ‘sub-phases’ as

episodes – distinct chains of reasoning beginning with a base move, whether it be

different or the same as that considered beforehand.
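
The distinction between the chronological PBG and a search tree can be made concrete with a short sketch. It assumes that each episode is recorded as the ordered tuple of moves considered, with the base move first; that an ‘immediate’ reinvestigation is an episode sharing its base move with the directly preceding episode; and that a ‘non-immediate’ reinvestigation returns to a base move after at least one other base move has intervened. These working definitions, the function name and the move labels are assumptions made for illustration rather than the coding rules of any cited study.

```python
from typing import List, Tuple

# An episode is the ordered line of moves considered; episode[0] is the base move.
Episode = Tuple[str, ...]


def count_reinvestigations(episodes: List[Episode]) -> dict:
    """Count returns to an already-considered base move, in chronological order."""
    immediate = 0
    non_immediate = 0
    seen = set()
    previous = None
    for episode in episodes:
        base = episode[0]
        if base == previous:
            immediate += 1       # same base move as the episode just before
        elif base in seen:
            non_immediate += 1   # returned to after considering something else
        seen.add(base)
        previous = base
    return {"immediate": immediate, "non_immediate": non_immediate}


# Chronological record (the PBG view): five episodes over two base moves.
pbg = [
    ("Bxd5", "exd5"),
    ("Bxd5", "exd5", "Qf3"),
    ("Nxc6", "bxc6"),
    ("Bxd5", "exd5"),
    ("Nxc6", "bxc6"),
]

counts = count_reinvestigations(pbg)

# Search-tree view: every distinct line appears once only, so order and
# repetition (and hence any record of reinvestigation) are lost.
distinct_lines = set(pbg)

print(counts, len(pbg), len(distinct_lines))
```

In this toy record the five chronological episodes collapse to three distinct lines, which is why reinvestigation variables can only be read from the chronological form.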

The advantage of the PBG formulation is that it provides for the quantitative

analysis of the search-and-evaluation process. Newell and Simon (1972)

examined the quantitative variables derived from the protocol of a single subject

(S2) and compared them with those of de Groot’s sample, noting the consensus

in results in terms of both quality of move and decision-making method. In

particular, S2 exhibited progressive deepening.


17/78

17

Perhaps the most important contribution of Newell and Simon’s 1965 research

was their detailed analysis of the search strategies of S2 and de Groot’s subjects.

They proposed a small number of principles for the generation of moves and

episodes – essentially an attempt at naming the ‘heuristic rules’ that de Groot had

suggested contributed to chess skill. Newell and Simon did not find much

evidence, in the protocols, of means-ends analysis (goal-setting and the

identification and analysis of means – i.e. moves – to achieve those goals)

although they noted both that all protocols studied concerned position A – a

highly tactical position in which strategic plans are of less consequence – and

that de Groot had observed numerous examples of goal-setting in more strategic

positions (1965, pp157-9). Despite their characterisation of search strategies,

Newell and Simon share de Groot’s view on the importance of recognition in

chess skill, particularly upon immediate consideration of a position and prior to

any search: “players notice a small number of considerable moves, and do not

notice (or at least do not mention noticing) the large number of remaining legal

moves” (Newell & Simon, 1972, p775), that is, there is a perceptual process guiding

search from the outset. This embodies the ‘first phase’ in de Groot’s

macro-structural model of next-move selection.

Chase and Simon’s Chunking theory

Chunking theory emerged from the 1973 experiments of Chase and Simon

(Gobet, 2004) as a general theory of expertise, originally applied to chess. In

line with de Groot’s conclusions, it asserts that recognition is the key mechanism

underpinning expertise. In the experiment, three classes of player (Masters,

Experts and novices) were exposed to middle and end-game positions of two


18/78

18

types: positions from actual games and random positions matched for the number

of pieces present. There were two tasks: the ‘recall’ task was essentially a

modification of de Groot’s procedure although all positions were shown for 5

seconds and the players were subsequently asked to reconstruct them on a chess

board; the ‘copy’ task differed in that the positions were not hidden from the

subjects during the reconstruction phase. For the positions drawn from

actual games, success at reconstruction (according to the number of pieces

correctly placed) was found to be proportional to skill level. For the random

positions, however, there were no significant differences across the three groups

of players. Chase and Simon concluded that the improved performance for more

skilled players was not due to any superiority in short-term memory, but to the

recognition of familiar patterns.

Chase and Simon (Gobet 2004) noted that, in both tasks, subjects reconstructed

pieces in groups, as defined by the intervals between piece placement in the

recall task, and by glances at the stimulus position in the copy task; further,

pieces in the same group tended to share more meaningful relations (e.g.

attacking, defending, same colour, same type etc. – judged by skilled players)

than those in different groups. Chase and Simon denoted these patterns of pieces

‘chunks’. The experiment also provided evidence that better players possess

bigger chunks (in terms of number of component pieces) and more chunks.
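
The grouping of piece placements by inter-placement interval can be sketched as follows. The sketch assumes that each placement is logged with a timestamp and uses the 2-second latency threshold cited later in this chapter for the 1973 recall experiments; the function name, the data layout and the piece labels are hypothetical.

```python
from typing import List, Tuple

# (piece-on-square label, time in seconds at which the piece was placed)
Placement = Tuple[str, float]


def segment_into_bursts(placements: List[Placement],
                        max_gap: float = 2.0) -> List[List[str]]:
    """Split a timed recall sequence into bursts of activity.

    Consecutive placements stay in the same burst while the latency between
    them is less than `max_gap` seconds; a longer pause starts a new burst.
    """
    bursts: List[List[str]] = []
    last_time = None
    for piece, time in placements:
        if last_time is None or time - last_time >= max_gap:
            bursts.append([])  # first placement, or a pause of max_gap or more
        bursts[-1].append(piece)
        last_time = time
    return bursts


# Illustrative recall log for one position.
recall_log = [
    ("Kg1", 0.0), ("Rf1", 0.8), ("Pg2", 1.5), ("Ph2", 2.1),
    ("Nc6", 5.0), ("Pd5", 5.9),
    ("Qd8", 9.5),
]

bursts = segment_into_bursts(recall_log)
print(bursts)
```

Treating each burst as one chunk is exactly the one-to-one mapping that later authors questioned, so the threshold is kept as a parameter rather than fixed.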

Chase and Simon (Gobet 2004) asserted that chunks are stored in short-term

memory (STM) as pointers to patterns encoded in semantic long-term memory

(LTM). Essentially, chunks are akin to the conditions of productions in LTM

that associate patterns with moves. Chase and Simon also expressed time


19/78

19

parameters for the rate of learning (approximately 8 seconds per chunk) and

STM limits (7 chunks, in line with Miller’s predictions).

In a second 1973 paper, Chase and Simon also proposed that a secondary

transient memory store, a visuo-spatial store known as the mind’s eye, provides

an internal representation of the position upon which mental operations may be

carried out (e.g. the moves suggested by LTM). The position in the mind’s eye is

also available to perceptual processes and thus chunks in a projected position

following a potential move may also be perceived and matched against patterns

in LTM. Thus chunking theory offers an explanation of how recognition may be

combined with mental simulation to arrive at good moves. It should be noted,

however, that the mind’s eye extension to the theory is not supported by

empirical evidence since the experiment did not include a decision-making task.

Chase and Simon conducted a second experiment to demonstrate the stability of

chunks. The criterion for stability was: a chunk is considered to be repeated if at

least two thirds of its component pieces are recalled together. Stability of chunks

for class A players was 96%, versus 65% for the master player in the sample.
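
The two-thirds criterion can be expressed as a short sketch. It assumes that ‘recalled together’ means that at least two thirds of a chunk’s pieces fall within a single chunk of a second recall of the same position, and that stability is the proportion of first-recall chunks meeting this criterion. Both readings, the function names and the piece labels are assumptions for illustration; the original scoring procedure may have differed.

```python
from typing import FrozenSet, List

Chunk = FrozenSet[str]  # a non-empty set of piece-on-square labels


def is_repeated(chunk: Chunk, second_recall: List[Chunk]) -> bool:
    """True if at least two thirds of `chunk` reappear inside one later chunk."""
    # Integer comparison avoids floating-point error at exactly two thirds.
    return any(3 * len(chunk & other) >= 2 * len(chunk) for other in second_recall)


def stability(first_recall: List[Chunk], second_recall: List[Chunk]) -> float:
    """Proportion of first-recall chunks that are repeated in the second recall."""
    if not first_recall:
        return 0.0
    repeated = sum(is_repeated(chunk, second_recall) for chunk in first_recall)
    return repeated / len(first_recall)


first = [
    frozenset({"Kg1", "Rf1", "Pg2"}),
    frozenset({"Nc6", "Pd5", "Pe6"}),
    frozenset({"Qd8", "Ra8"}),
]
second = [
    frozenset({"Kg1", "Rf1", "Pg2", "Ph2"}),  # contains all of the first chunk
    frozenset({"Nc6", "Pd5"}),                # two of three pieces: just repeated
    frozenset({"Qd8"}),                       # the third chunk is split in two
    frozenset({"Ra8"}),
]

score = stability(first, second)
print(round(score, 2))
```

Under these assumptions two of the three illustrative chunks count as repeated, giving a stability of two thirds.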

Support for chunking theory comes from Charness (Gobet, 2004) who, in 1974,

conducted a recall experiment with positions presented verbally, at a rapid rate

(average latency 2.3 seconds per piece) in three ways: by Chase and Simon’s

relations; by columns (on the board); or randomly. The best recall was found for

Chase and Simon’s relations and the worst for the random condition.

Criticisms of chunking theory

Chunking theory was not without its critics, however. These criticisms are on a

number of bases and include both methodological criticisms and theoretical


20/78

20

criticisms. Gobet and Simon (1998b) summarised the methodological criticisms

raised by many authors, including Holding (1985) and highlighted some

methodological concerns of their own, including the small sample size in the

1973 experiments and the one-to-one mapping of pieces placed in a single ‘burst

of activity’ onto chunks. A single burst of activity was defined, in the 1973

recall experiments, as a sequence of piece placement with latencies less than 2

seconds between pieces. Gobet and Simon (1998a) argued that this latency may

actually increase over the recall period. Further, a burst of activity is also

dependent upon the physical limitations of picking up all component pieces of a

chunk in one hand. The most outspoken critic of the theory was, perhaps,

Holding (1985), who advocated the roles of both search and conceptual

knowledge (rather than perceptual chunks) in chess skill. Holding’s specific

arguments included the following:

Chunks may be encoded into LTM in less than 8 seconds;

The size of chunks is too small to reflect conceptual knowledge;

Although chess skill can explain memory performance, there is no

evidence for a causal relationship in the opposite direction, that is that

memory (and recognition) explains chess skill.

The first criticism was based on recall experiments in which interpolated tasks

designed to cause STM interference (e.g. Charness’s experiment of 1976,

reported in Holding, 1985) had shown no effect on memory performance,

suggesting that LTM encoding for chunks was rapid. The second criticism is

based on Holding’s assertion that chunking theory “does not provide a sufficient

basis for maintaining that chess memory is organised in small chunks whose

labels are held in STM. Instead it appears that chess players who actively


21/78

21

process the given positions are able to integrate the general characteristics of

these positions in a hierarchical, prototypical or schematic format, not necessarily

based on pairs of pieces, that constitutes an ‘understanding’ of the positions”

(Holding, 1985 p130). Key to this argument is Holding’s inspection of both

positions and corresponding chunks from Chase and Simon’s experiments. He

claimed that the actual chunks identified bear little relation to the important

playing themes in that same position and concluded “if we assume that all the

chunks for memorising purposes are to be identified on one basis and the patterns

for move selection on another, the theory loses a good deal of its economy”

(Holding, 1985 p103). Indeed, if we accept the criterion for the stability of chunks

across experiments, it appears that better players perceive positions in a number

of ways (65% stability is a fairly low figure). The final criticism is backed up

with evidence from Holding and Reynolds’ (1982) experiment with random

positions. Players of different skill levels from novice to Expert completed two

tasks: the first was a recall task and the second was a choice of next move task on

the corrected positions. As expected, there was no effect of skill on memory, but

there was a significant effect of skill on (assessed) quality of next move.

Holding and Reynolds concluded that “the evidence shows that skill differences

continue to appear in situations where recognition by chunking is impossible”

(Holding, 1985 p133). In light of such criticisms, Gobet and Simon replicated

the 1973 experiments and made corresponding modifications to the theory

(discussed in Gobet and Simon’s template theory, below).


SEEK Theory: the contribution of Holding

Above all of Holding‟s specific criticisms of Chunking Theory, his central belief

was that it was basically flawed – although he accepted the result that skill has an

effect on memory for meaningful chess positions, he believed that the role of

recognition (based on memory) was insufficient in explaining chess skill.

Holding promoted the importance of search, evaluation and knowledge to chess

skill and expressed this idea in his SEEK theory. It is important to understand

Holding‟s distinction between the mechanisms of „recognition‟ and „search‟

since his use of terminology differs slightly from that of other researchers. To

Holding „recognition‟ defines the key mechanism of Chunking Theory as the

association between perceived patterns (chunks) and good moves – without

search. „Search‟ involves a combination of planning a selective search through

candidate moves and sequences, and evaluating the utility of these moves to

support next move selection. Perhaps the most confusing aspect of Holding‟s

definitions is that he asserts that pattern recognition from semantic knowledge

also plays a key role in directing search by suggesting good moves. To Holding,

“patterns may be general rather than specific chunks” (1985, p174) and the

corresponding recognition mechanism is almost certainly less „automatic‟ in its

cueing of moves than that of Chunking Theory. In fact, it appears that „search‟,

in itself, is an extremely low-level skill, involving only focusing one‟s evaluative

skills on different moves. It should be noted that Holding (and others) refers to

„search‟ when he really means the wider set of skills described above, i.e. search,

evaluation and knowledge – all three of which are embodied in SEEK theory.

Holding claimed that, within de Groot‟s verbal protocols, there was, in fact, a

relationship between skill level and both number of moves considered and speed


of search (number of moves considered per minute), although this was not

statistically significant. He argued that the real effect was obscured by the highly

tactical nature of the only position for which a meaningful number of protocols

were published, i.e. position A. Other studies have supported this claim. In

particular, Charness‟s 1981 experiment (Holding, 1985; Gobet, 2004), conducted

with 34 skilled players and a balance of tactical and strategic positions, different

to those used by de Groot, suggests a linear relationship between skill level (in

terms of Elo points) and depth of search (in terms of number of moves). Holding

reports that average maximal depth of search increases by 1.4 plies per standard

deviation of skill (200 points) and Gobet reports that the average depth of search

increases by 0.5 plies for the same interval.
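
The two reported slopes can be turned into a rough worked example. The sketch below assumes a strictly linear relationship and uses only the figures quoted above (1.4 plies of maximal depth and 0.5 plies of mean depth per 200 Elo points); the function name and the two example ratings are illustrative and are not taken from either study.

```python
# Illustrative arithmetic only: assumes the linear relationships quoted above.
ELO_PER_SD = 200  # one standard deviation of skill, in Elo points


def expected_depth_gain(elo_difference, plies_per_sd):
    """Extra search depth (in plies) predicted for a given Elo difference."""
    return plies_per_sd * elo_difference / ELO_PER_SD


# Hypothetical comparison: a 2100-rated Expert versus a 1700-rated Class B player.
difference = 2100 - 1700
print(expected_depth_gain(difference, 1.4))  # maximal depth, Holding's figure
print(expected_depth_gain(difference, 0.5))  # mean depth, Gobet's figure
```

On either figure the predicted gap between such players is small in absolute terms, which is worth bearing in mind when interpreting skill effects on depth of search.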

In 1979, Holding (1985) developed a single scale to evaluate positions on the

basis of advantage to one side over the other using the expert judgement of

skilled players. He then asked 50 Class A-E players to evaluate a set of

quiescent positions, with level material, from actual grandmaster games on this

scale. The players were also asked to select a next move. Evaluations were

scored in comparison with the actual outcomes of the games. The results showed

that there is an effect of skill on evaluations. In Holding and Reynolds‟s 1982

experiment (Holding, 1985) for recall on random positions, players were also

asked to evaluate the position immediately (after it had been corrected following

the recall task) and after 5 minutes of consideration. There were no skill

differences for „correctness of evaluation‟ at either measurement point. Holding

concluded that evaluative skill is influenced by memory, including “generic

[semantic] memory for the type of specific… formations that are known to give

rise to advantages and disadvantages” (Holding, 1985 p208).


Holding‟s main conclusion is that differences in chess skill are due to search,

evaluation and knowledge: “the better players show greater competence in every

phase of the SEEK processes, conducting more knowledgeable evaluations, in

order to anticipate events on the chessboard” (1985, pp255-256).

Gobet and Simon’s template theory

Gobet and Simon (1996) set out to test Holding‟s conclusion by means of a

„natural experiment‟, observing the performance of the then-world champion

Grand Master Garry Kasparov, in both a series of matches of simultaneous games

and tournaments against expert opponents (predominately Masters and Grand

Masters). The average time afforded to Kasparov for each move was 3 minutes

in tournament play and 3 minutes per round (all matches of simultaneous games,

played against between four and eight opponents). Gobet and Simon reasoned

that the increased time-pressure in the simultaneous games would provide

Kasparov with less time to evaluate moves and, therefore, if Holding‟s

conclusion were true, he should perform less well in the simultaneous games

than in the tournament. The results showed that Kasparov‟s performance did not

greatly differ across the two conditions. Indeed, in the simultaneous matches,

Kasparov played at the level of a very strong Grand Master. Gobet and Simon

concluded that it was Kasparov‟s pattern-matching that accounted for his similar

performances in both simultaneous matches and normal tournament play, and

that this result could be generalised to all expert chess players. This is supported

by a similar result from Calderwood, Klein and Randall (1988).

Gobet and Simon (1998b) asserted that some of Holding‟s criticisms were valid

(e.g. those concerning LTM encoding and chunk size) whilst others were


incorrect (or had been shown to be incorrect). For example, Holding‟s result for

skill differences for choice of next move decisions in random positions was

countered by Gobet and Simon‟s experimental results (1998b) that indicated that

chunking theory does predict a small skill difference in the recall of such

positions – contrary to de Groot‟s and Chase and Simon‟s earlier results and

preserving the possibility of a relationship between memory and skill. Gobet and

Simon state that Holding‟s main issue with chunking theory – that it consists of

pattern recognition without search – is a misunderstanding, since the „mind‟s

eye‟ extension to the theory clearly describes the use of pattern recognition to

support a „think-ahead‟ process, thus generating subsequent moves for

consideration (this account also largely equates pattern recognition of non-base

moves with Holding‟s evaluation mechanism).

In 1996, Gobet and Simon (1998a) replicated Chase and Simon‟s original

experiments, with some key modifications, including an increased sample size of

26 (ranging from Masters to Class A players) and computer-aiding for the

reconstruction of positions, to eliminate the physical limitations on piece

replacement in the original experiment that may have confounded results on

chunk size. The main results concurred with Chase and Simon‟s original study –

that is, skill effects on recall in both tasks disappeared for random positions. The

most startling difference in results, however, related to the size of chunks.

Whilst the effect of skill level on chunk size was again present, mean largest

chunk size at all skill levels was greater. In particular, for Masters this figure

was 16.8 in the recall task (compared with 7 in the original experiment), and 14

in the copying task. Moreover, some positions were reconstructed by Masters

using only one chunk.


This new data confirmed Gobet and Simon‟s development of chunking theory,

namely template theory (1998a, 1998b). Template theory uses the same basic

mechanism as chunking theory, so that chunks are stored in STM as pointers to

patterns in LTM; they are also used to reconstruct visuo-spatial images in the

mind‟s eye (the secondary transient memory store). Gobet and Simon stated that

the more typical the position, the stronger the associations that chunk will have

with semantic memory, including moves, plans and other patterns. Further, they

proposed that such positions are actually represented by templates, which are

essentially chunks with slots for variables. They therefore comprise a „core

chunk‟ and their parameters allow them to describe a range of chunks within a

class defined by the range of variable values. Templates can provide for large

constellations of pieces to be considered together where large chunks alone

cannot, since the number of chunks with, e.g. more than 10 pieces, required to

hold all meaningful patterns on those pieces would be unmanageably large.

Templates, instead, provide for the redundancy that occurs because classes of

chunks tend to share good moves, plans, tactical and strategic features etc. Gobet

and Simon emphasise, within template theory, the associations between chunks

and templates with semantic knowledge. As with chunking theory, the authors

suggested a learning time of 8 seconds for chunks and templates. Two learning

parameters are proposed: Gobet and Simon also assert that “like the chunking

theory, template theory is not limited to chess” (Gobet 1998b p.127).

Template theory served to address the outstanding criticisms of chunking theory

in the following ways. The null effect of interference for recall of chess

positions could be accounted for by chunk size, since if fewer STM pointers are

required to encode a single position (possibly only one for Masters) then noise


will not necessarily eradicate that memory. Likewise, Holding‟s criticisms on

chunk size and conceptual knowledge were countered by direct modifications to

the theory, which were supported by empirical evidence. Finally, Gobet (1998a)

has used template theory to explain skill differences for search variables; this is

discussed in the next section.

The integration of pattern recognition and search

Gobet (1998a) conducted a replication of de Groot‟s choice of next move

experiment with 48 Swiss players (ranging from Master to Class B) using de

Groot‟s position A, and conducted an extensive analysis of the resultant verbal

protocols, including the generation of problem behaviour graphs (Newell &

Simon, 1972) and the extraction of the same quantitative variables as de Groot,

with the aim of comparing results and reinvestigating the effects of search

variables on quality of next move. Gobet was motivated both by empirical

evidence that opposed de Groot‟s result that search variables did not differ across

skill levels, e.g. due to Charness (Gobet 2004) and by the lack of replication of

de Groot‟s original experiment; he was undoubtedly also motivated by the search for

empirical evidence to support his own work at that time with Herbert Simon in

developing template theory, since although the research was published in 1998,

the original data was collected as part of a different study in 1986. As well as a

small skill difference for the mean depth of search, Gobet discovered a skill

effect for the way in which progressive deepening was conducted. The variables

in the study characterising progressive deepening behaviour related to the

number of reinvestigations of sequences starting with the same base move; these

were sub-divided into immediate reinvestigations (same base move considered


twice in succession) and non-immediate reinvestigations (same base move

considered twice with at least one different base move considered in between),

and also maximal and total values, with the former providing the largest number

of reinvestigations (immediate or non-immediate) among all base moves

considered. The maximal number of immediate reinvestigations had a positive

association with skill level and the maximal number of non-immediate

reinvestigations had a negative association with skill level.
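
The distinction between the two kinds of reinvestigation can be made concrete with a short sketch. It assumes that each episode is summarised by its base move and that episodes are listed in the order in which they were considered; the function name and the example sequence are hypothetical, and the published coding scheme may treat edge cases differently.

```python
from collections import Counter


def count_reinvestigations(base_moves):
    """Count immediate (IR) and non-immediate (NIR) reinvestigations.

    base_moves: the base move of each episode, in the order considered.
    """
    ir, nir = Counter(), Counter()
    seen = set()
    previous = None
    for move in base_moves:
        if move == previous:
            ir[move] += 1   # same base move considered twice in succession
        elif move in seen:
            nir[move] += 1  # returned to after a different base move
        seen.add(move)
        previous = move
    return {
        "total_ir": sum(ir.values()),
        "total_nir": sum(nir.values()),
        "max_ir": max(ir.values(), default=0),
        "max_nir": max(nir.values(), default=0),
    }


# Hypothetical episode sequence, not taken from any protocol in the study.
episodes = ["Bxd5", "Bxd5", "Rc2", "Bxd5", "Bxd5", "Rc2"]
print(count_reinvestigations(episodes))
```

In this example Bxd5 is immediately reinvestigated twice and non-immediately once, and Rc2 is non-immediately reinvestigated once, so the maximal values are 2 (IR) and 1 (NIR).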

Gobet‟s main conclusions were that players in his sample differed along more

dimensions than those in de Groot‟s sample, and that the average values on all

variables (pooled across skill levels) did not differ significantly between studies.

Gobet notes that the differences he found within his sample were mainly between

Masters and Class players. Since de Groot‟s sample only included 2 players at

Class level, it is perhaps not surprising that such differences did not show up in

the original experiments.

Importantly, Gobet claims that his skill effects for search can still be accounted

for by pattern recognition models of chess thinking because sequences of moves

are likely to be associated with patterns: “pattern recognition should facilitate the

generation of moves in the mind‟s eye, permitting a smooth search” (1998a p24).

Saariluoma presented further evidence of the pattern-recognition-based search

hypothesis (Gobet 1998a, 2004) with his „smothered mate‟ experiment, in which

high calibre players were asked to choose a move that would lead to mate in a

specially devised endgame. The position was one that had an efficient, yet

unusual sequence of moves that led to mate as well as a longer, more familiar

sequence. Players tended to choose the move at the beginning of the stereotyped

sequence.


Summary

In summary, the relative influences of recognition and search-and-evaluation on

chess skill are not fully understood. Further, the degree to which these are, in

fact, separate processes rather than alternative descriptions of the same process,

is unclear. Certainly most advocates of either theory believe that both

recognition and search mechanisms are fundamental to chess skill. For example,

de Groot‟s (Gobet, 2004, p120) assertion that recognition serves to direct the

look-ahead search-and-evaluation suggests that these processes are, in some

sense, interdependent. Further, Holding‟s (1985) conclusion that search-and-

evaluation is the dominant process is based on the assertion that better players

plan these evaluations in a more effective way. Yet Holding‟s “knowledgeable

evaluations” (1985, p256) might well be directed by effective pattern-matching,

which is essentially de Groot‟s conclusion. Gobet and Simon‟s template theory,

developed in part due to criticisms of chunking theory from advocates of search-

and-evaluation, provides for a credible explanation of skill differences for search

(if it is accepted that templates can store sequences of moves). This extended

theory apparently leaves no room for alternative explanations of chess skill

wherever it could be argued that patterns exist (e.g. any experimentation

involving real chess positions). It therefore offers the possibility of unifying both

recognition-based and search-based theories. To refine the template theory

explanation of skill differences on search variables, further data concerning such

differences would be of great benefit.

Further, the balance of chess research has been in favour of recall tasks, rather

than choice of next move tasks. The attractions of recall tasks (over choice of

next move tasks) in explaining chess skill are the objectivity of the measures and


the ease with which data can be analysed. Since chess skill is primarily

concerned with decision-making, however, it seems strange that there are not

more studies based on the choice of next move task. Finally, research based on

the choice of next move task, perhaps because of the analytical overheads the

task usually imposes, tends to focus on a small number of positions, often only

one – notably Gobet (1998a). An obvious danger in generalising results from a

single position is that any position effects are discounted.


Methodology

This chapter outlines the experimental design, procedure and analytical methods

employed in the research. It also includes an ethical section. The ecological

nature of the experimentation in this study meant that a great deal of relatively

unstructured data (verbal protocols) were generated through the experimental

procedure. These data were subjected to a detailed and structured (qualitative)

protocol analysis that provided a set of quantitative variables to be entered into

statistical analyses. The intermediate results of the protocol analysis offer the

best means of conveying this part of the methodology and serve to precipitate the

relevant section of the Project Review. Appendix II therefore contains details of

the protocol analysis, including an example verbal protocol and Problem

Behaviour Graph (PBG).

Participants

Eight male chess players from four different clubs in Worcestershire and the

West Midlands took part in the experiment. Although their ages were not

recorded, all had been playing chess as graded players for between 30 and 45

years (mean 34.75 years, standard deviation 5.39 years). Their British Chess

Federation (BCF) grades were converted into the Fédération Internationale Des

Échecs (FIDE) standard Elo ratings using the BCF conversion formula (BCF,

2003) and subsequently mapped onto United States Chess Federation (USCF)

class divisions to facilitate comparisons between the results of this experiment

and those of existing studies (e.g. Gobet, 1998a). The players were assigned to


two skill levels according to their equivalent USCF class as described in Table 1,

below.

                                 Level 1 (Expert; n=4)    Level 2 (Class; n=4)
Sample mean (BCF grading)        168                      120
Sample mean (FIDE Elo rating)    2087                     1849
Equivalent USCF class            Expert                   Class A / Class B
Equivalent Elo rating band       2000 – 2200              1600 – 2000

Table 1; Description of Skill levels of experiment players
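
The assignment of players to skill levels can be sketched as a simple banding rule. The bands are those given in Table 1 and the ratings are those reported for the eight players in Table 5; the function name is hypothetical, and treating each band as including its lower bound and excluding its upper bound is an assumption, since the table does not say how boundary ratings were handled.

```python
def skill_level(elo):
    """Map an Elo rating to the two skill levels used in this study (Table 1)."""
    if 2000 <= elo < 2200:
        return "Expert"
    if 1600 <= elo < 2000:
        return "Class"  # USCF Class A or Class B
    return None         # outside the bands sampled in this study


# Elo ratings of the eight players, as listed in Table 5.
ratings = [1720, 1780, 1925, 1970, 2010, 2045, 2105, 2190]
levels = [skill_level(r) for r in ratings]
print(levels.count("Expert"), levels.count("Class"))
```

Applied to these ratings, the rule yields four players at each level, in agreement with the group sizes stated in Table 1.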

Materials

Three chess positions were used in the experiment. They were positions A, B1

and C of de Groot‟s original choice of next move experiments (de Groot, 1965

pp88-93) and were labelled A, B and C, respectively. They were depicted as

standard chess position images on A4 card, complete with full move histories for

the games from which they were taken. The positions themselves can be found

in Appendix I.

Portable digital recording equipment, and pen and paper, were also used in the

experiment. The recording time display on the equipment was made available to

the players in place of a chess clock.

Experimental Design and procedure

A 2 x 3 repeated measures experiment was conducted using the following

independent variables: Skill (Expert; Class) and Position (A; B; C).

The experiment, which was conducted with each participant individually and in a

quiet and undisturbed environment, consisted of a single „choice of next move‟

task repeated across three conditions, defined by the three positions described

above (A, B and C). The procedure was essentially the same as in the original de

Groot experiments of 1938-43 (de Groot, 1965). Before the first task began the


experimenter instructed the player that he would be presented with the positions

one by one and, for each, would be required to choose his next move, as if he

were playing over the board in normal tournament play; the only difference being

that he was requested to think aloud as he did so. The experimenter clarified that

„thinking aloud‟ was not the same as providing a commentary on one‟s thought

process, i.e. it was simply a natural verbal expression of thought. Further, the

player was informed that the positions were from real games and were not chess

„problems‟ (typified by a single provable winning move); and that there were no

time limits imposed, although a guideline was provided: that the player should

aim to spend as much time on the task as they might reasonably expect to in a

tournament game. Once the experimenter had checked that instructions had been

understood and had gained the player‟s informed consent for their participation,

the task began.

The conditions were conducted sequentially with the offer of a short break

between each if required. The position was presented to the player at the same

time the recording began. Thereafter the experimenter only intervened if asked a

direct question concerning procedure or if the participant had remained quiet for

a period of approximately 30 seconds; in the latter case the experimenter

prompted the player by asking, “What are you thinking now?” Throughout the

recording and wherever necessary, the experimenter noted questions for

clarification. At the end of each condition the recording was stopped and the

experimenter requested clarification accordingly. Most such instances concerned

a misreported or unspecified move, piece or square.

Upon completion of the three iterations of the „choice of next move task‟ the

experiment concluded.


Protocol Analysis

The data collected from the experiment consisted of a single verbal protocol for

each player at each level of the 2 x 3 design, giving 24 protocols in total. Each

protocol was transcribed into tabular format and used to generate a Problem

Behaviour Graph (PBG) according to the coding scheme set out in de Groot

(1965), Newell & Simon (1972) and Gobet (1998a). Appendix II describes the

coding scheme in greater detail and includes an example verbal protocol and the

PBG that was generated from it. It also provides definitions of the important

elements of PBGs from which the quantitative variables may be extracted.

Derivation of quantitative variables

Table 2 describes the set of quantitative variables derived from each graph, and

their means of derivation. Although most of these variables were originally

devised by de Groot (1965) and also used by Gobet (1998a), two were novel and

are indicated in the table.


Variable                                 Derivation

Quality of Move                          Subjective assessment of the quality of the chosen move (see Appendix A for the derivation of scores).

Total Time                               Total time taken for choice of next move: time elapsed from initial presentation of position to confirmation of next move selection.

Time of First Phase                      Total time elapsed before first Episode begins.

Number of Base Moves                     Number of distinct base moves (null moves permitted).

Number of Episodes                       Number of distinct Episodes of problem-solving behaviour.

Number of Nodes                          Number of nodes (moves) considered, including repeated and null moves.

Total Depth                              Aggregate of search depths for each episode, with null moves included in the totals. Episodic depth is defined by the longest sequence of moves, beginning with the base move, among all branches. This variable is only measured to enable the calculation of Mean and Maximal Search Depths.

Maximal Depth of Search                  The maximal number among all episodic depths, with null moves omitted from the totals.

Mean Depth of Search                     Mean episodic depth with null moves included; Total Depth divided by Episodes.

Standard Deviation of Depth of Search    Standard deviation of episodic depth with null moves included. This is a new variable.

Rate of Base Moves                       Rate of generation of distinct base moves; Base Moves divided by Total Time.

Rate of Nodes                            Rate at which nodes are considered; Nodes divided by Total Time.

Total IR                                 Total number of immediate reinvestigations of all base moves.

Total NIR                                Total number of non-immediate reinvestigations of all base moves.

Maximal IR                               The maximal IR amongst all base moves.

Maximal NIR                              The maximal NIR across all base moves.

Number of Null Moves                     Total number of null moves among all nodes. This is a new variable and is only measured to enable the calculation of Proportion of Null Moves.

Proportion of Null Moves                 Proportion of total number of nodes that are null moves; Null Moves divided by Nodes. This is a new variable.

Table 2; quantitative variables derived from Problem Behaviour Graphs
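
As an illustration of how several of these variables follow from the coded graph, the sketch below derives them from a per-protocol summary. The function name, its arguments and the example values are hypothetical. Rates are taken as counts per minute and the proportion as null moves over nodes, which is the reading consistent with the nodes-per-minute and proportion values reported in the results; the use of the sample (n-1) standard deviation is an assumption. Maximal Depth of Search is omitted because it requires a second set of episodic depths with null moves excluded.

```python
from statistics import stdev


def derive_variables(episode_depths, n_base_moves, n_nodes, n_null_moves, total_time_min):
    """Derive some Table 2 variables from a hand-coded PBG summary.

    episode_depths: depth of each episode, null moves included.
    total_time_min: Total Time in minutes.
    """
    n_episodes = len(episode_depths)
    total_depth = sum(episode_depths)
    return {
        "episodes": n_episodes,
        "total_depth": total_depth,
        "mean_depth": total_depth / n_episodes,
        # Sample standard deviation (assumed); undefined for a single episode.
        "sd_depth": stdev(episode_depths) if n_episodes > 1 else 0.0,
        "rate_base_moves": n_base_moves / total_time_min,
        "rate_nodes": n_nodes / total_time_min,
        "prop_null_moves": n_null_moves / n_nodes,
    }


# Hypothetical protocol: four episodes, 20 nodes (2 of them null), 5 minutes.
example = derive_variables([3, 5, 1, 7], n_base_moves=3, n_nodes=20,
                           n_null_moves=2, total_time_min=5.0)
print(example)
```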

Ethics

The only serious ethical consideration for this research is the non-disclosure of

any personally identifiable data both during and after the life of the study.

Although all data has been rendered anonymous before reporting, players‟

choices of next move have been assessed and thus they may have reason to feel

that their individual performance is under scrutiny. To guard against any such

misconceptions, the experimenter explained that each player‟s data was to


remain anonymous and protected from unauthorised use under the Data

Protection Act 1998. The experimenter also explained that the anonymous

results would be published as part of the MSc. dissertation. The players were

also advised of their right to withdraw from the study, even retrospectively, and

the experimenter provided contact details to each player if they wished to

exercise this right.

The experimental procedure itself was totally innocuous – there were no risks to

the players‟ physical or mental well-being as a result of taking part.


Analysis

Each dependent variable in Table 2 except Total Depth of Search and Number of Null Moves was subjected to a repeated measures factorial analysis of variance

(ANOVA) with the between-subjects variable Skill and the within-subjects

variable Position. The criterion of sphericity was satisfied for all variables

entering each analysis except for Number of Non-immediate Reinvestigations,

which was subsequently excluded from the analysis. These results for each

variable are provided in the next section in meaningful groups; details of other

tests are provided under the appropriate headings. The second section compares

the results with those of similar studies, notably Gobet (2004) and the final

section provides a higher level discussion of all findings.

Results from this study

Quality of Move

The main effect of Skill on Quality of Move is significant (F(1,6)=9.757,

MSE=15.042, p


Skill level    Position A         Position B         Position C
               Move    Quality    Move    Quality    Move    Quality

Expert         Rc2     1          Rb8     5          Ne4     3
               Bxd5    5          Rb8     5          Kh8     2
               Bxd5    5          Rb8     5          Bd7     3
               Bxd5    5          Rb8     5          e5      5

Class          Rc2     1          Kf8     4          d5      1
               b4      1          Rb8     5          Bd7     3
               b4      1          Kg7     3          e5      5
               Kh1     1          h5      2          Ne4     3

Table 3; Moves chosen and Quality of Move for all players across all positions
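
The main effect of Skill on Quality of Move can be recomputed directly from the scores in Table 3. The sketch below is not the original analysis; it is a hand calculation of the between-subjects F ratio for a balanced mixed design, assuming that each row of Table 3 is one player and that the columns are Positions A, B and C. The function and variable names are illustrative.

```python
# Quality of Move scores from Table 3: one row per player, columns = Positions A, B, C.
scores = {
    "Expert": [[1, 5, 3], [5, 5, 2], [5, 5, 3], [5, 5, 5]],
    "Class": [[1, 4, 1], [1, 5, 3], [1, 3, 5], [1, 2, 3]],
}


def between_subjects_f(groups):
    """F ratio for the between-subjects factor in a balanced mixed design."""
    n_positions = len(next(iter(groups.values()))[0])
    all_scores = [s for players in groups.values() for row in players for s in row]
    grand_mean = sum(all_scores) / len(all_scores)

    ss_effect = 0.0  # variation of group means around the grand mean
    ss_error = 0.0   # variation of player means around their group mean
    n_players = 0
    for players in groups.values():
        player_means = [sum(row) / n_positions for row in players]
        group_mean = sum(player_means) / len(player_means)
        ss_effect += len(players) * n_positions * (group_mean - grand_mean) ** 2
        ss_error += n_positions * sum((m - group_mean) ** 2 for m in player_means)
        n_players += len(players)

    df_effect = len(groups) - 1
    df_error = n_players - len(groups)
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error, ms_effect, df_effect, df_error


f_ratio, ms_effect, df1, df2 = between_subjects_f(scores)
print(f"F({df1},{df2}) = {f_ratio:.3f}, MS(Skill) = {ms_effect:.3f}")
# prints "F(1,6) = 9.757, MS(Skill) = 15.042"
```

The result agrees with the F ratio reported for Skill. In this calculation the value 15.042 is the mean square for the Skill effect rather than an error term.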

[Figure 1: plot of Quality of Move (vertical axis, 0 to 6) by Skill level (Expert, Class), with a separate series for each Position (A, B, C)]

Figure 1; estimated marginal means for Quality of Move

The most interesting feature of the data illustrated above is that, although

Position A appears to split Experts from Class players in terms of Quality of

Move, Move Quality in the other two Positions is better balanced across Skill

levels. In particular, the marginal means for Quality of Move across Skill levels

in position C are almost identical (Class = 3; Expert = 3.25). Further, no player

selected a „bad move‟ in Position B, with no Quality of Move score below 2.


Time variables

There is no main effect of Skill on Total Time (F(1,6)=0.605, MSE=29.592, ns)

and, in fact, Experts apparently took longer than Class players in choosing their

next move in all three positions, the biggest difference being observed for Position

A (a mean total time of 14.5 minutes for Experts versus 9.2 minutes for Class

players). The same pattern is observed for the Time of First Phase; the main

effect of Skill is non-significant here also (F(1,6)=3.604, MSE=3.604, ns).

There is, however, a significant main effect of Position on Total Time

(F(2,12)=8.117, MSE=64.528, p


Position    Marginal Means                                 Number of
            Number of Base Moves    Number of Episodes     Legal Moves

A           4.625                   10.25                  56
B           3                       7.75                   35
C           6.375                   13.625                 37

Table 4; Marginal Means for Base Moves/ Episodes and Number of Legal Moves

As can be seen in Table 4, the relationship between Position and Number of Base

Moves does not apparently stem from the number of legal moves available in

each position: an average of 4.625 base moves are generated for position A (56

legal moves) and 3 for position B (35 legal moves), yet 6.375 of the possible 37

legal moves are generated for position C. Further, it can be seen that there

appears to be a linear relationship between the mean Number of Base Moves and

the mean Number of Episodes.
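
The apparent linear relationship can be checked informally by computing the correlation between the two columns of marginal means in Table 4. This is only a sketch with illustrative names: with three positions there are only three data points, so a high correlation is weak evidence and no significance test is attempted.

```python
from math import sqrt

# Marginal means from Table 4, in the order Position A, B, C.
base_moves = [4.625, 3.0, 6.375]
episodes = [10.25, 7.75, 13.625]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(base_moves, episodes)
print(round(r, 3))
```

The coefficient is very close to 1, which is consistent with each additional base move bringing a roughly constant number of additional episodes.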

Number of Nodes

The main effect of Skill upon Number of Nodes is significant (F(1,6)=6.593,

MSE=4056, p


[Figure 2: plot of Number of Nodes (vertical axis, 10 to 70) by Skill (Expert, Class), with a separate series for each Position (A, B, C)]

Figure 2; Marginal Means for Number of Nodes

Finally, the distribution of Number of Nodes is shown in Figure 3. Apart from

the outlier (117 nodes searched by one of the Expert players in Position A),

Number of Nodes is fairly normally distributed with all values < 100.

[Figure 3: histogram of Number of Nodes in bins of width 10 from 0 to 110 (vertical axis: frequency, 0 to 6); Std. Dev = 26.51, Mean = 41, N = 24]

Figure 3; Frequency distribution of Number of Nodes

Rate of generation

There are no effects (main or interaction) of Skill or Position on Rate of Base

Moves. The main effect of Skill level on Rate of Nodes is weakly significant

(F(1,6)=5.646, MSE=13.777, p


(F(2,12)=0.590, MSE=0.001978, ns) and no interaction effect. Better players

generate nodes more rapidly (Expert: mean 4.09, s.d. 1.03; Class: mean 2.58,

s.d. 1.48), as illustrated in Figure 4.

[Figure 4: plot of Number of Nodes per minute (vertical axis, 1.5 to 5.0) by Skill level (Expert, Class), with a separate series for each Position (A, B, C)]

Figure 4; Estimated marginal means of Number of Nodes per minute

Depth of Search

The main effect of Skill for Mean Depth of Search is significant (F(1,6)=3.977,

MSE=3.899, p


a. selecting the maximal search depth of all episodes undertaken to derive

Maximal Depth of Search (pooled);

b. pooling both Total Depth of Search and Number of Episodes to derive the

new quotient Mean Depth of Search (pooled).

Table 5 summarises the corresponding search data entering the analysis.

Elo       Maximal Depth    Total Depth    Number of    Mean Depth
rating    of Search        of Search      episodes     of Search
          (pooled)         (pooled)       (pooled)     (pooled)

1720      4                8              7            1.14
1780      8                79             25           3.16
1925      5                111            36           3.08
1970      7                128            29           4.41
2010      14               170            43           3.95
2045      11               100            26           3.85
2105      9                170            44           3.86
2190      9                144            43           3.35

Table 5; Pooled Mean and Maximal Depth of Search by Player
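
The pooled quotient in Table 5 and the direction of its relationship with rating can be illustrated as follows. The sketch recomputes Mean Depth of Search (pooled) as Total Depth divided by Number of Episodes, and fits an ordinary least-squares line to the eight per-player values of Maximal Depth of Search (pooled). The names are illustrative, and the fit is not a reproduction of the regression reported in this study, whose degrees of freedom indicate a larger number of observations than the eight pooled points used here.

```python
# Per-player pooled data from Table 5:
# (Elo rating, maximal depth, total depth, number of episodes)
table5 = [
    (1720, 4, 8, 7), (1780, 8, 79, 25), (1925, 5, 111, 36), (1970, 7, 128, 29),
    (2010, 14, 170, 43), (2045, 11, 100, 26), (2105, 9, 170, 44), (2190, 9, 144, 43),
]

# Mean Depth of Search (pooled) = Total Depth (pooled) / Number of Episodes (pooled)
mean_depths = [round(total / n_episodes, 2) for _, _, total, n_episodes in table5]
print(mean_depths)


def ols_fit(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


elo = [row[0] for row in table5]
max_depth = [row[1] for row in table5]
slope, intercept = ols_fit(elo, max_depth)
print(f"{slope * 200:.2f} plies of maximal depth per 200 Elo points")
```

The recomputed quotients match the final column of Table 5, and the fitted slope is positive, that is, maximal depth of search rises with rating across these eight players.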

The regression of Maximal Depth of Search on Elo Rating is significant

(F(1,22)=10.597, MSE=59.802, p


Reinvestigations

There are no main effects of Skill or Position on any of the reinvestigation

variables although the interaction effect upon Maximal Number of IR is

significant (F(1,6)=7.895, MSE=6.25, p


[Figure 6: plot of Proportion of Null Moves (vertical axis, .04 to .18) by Skill (Expert, Class), with a separate series for each Position (A, B, C)]

Figure 6; Estimated Marginal Means of Proportion of Null Moves


Summary

The following table summarises the main effects of Skill level and Position on

each of the dependent variables entered into the analysis.

Dependent variable    Main effect of Skill [2]    Main effect of Position    Interaction effect

Quality of Move p


Comparison with other studies

The results reported above are interpreted in the context of the design and sample

size. This is particularly important for comparisons with results from other

related studies, i.e. Gobet (1998a) and de Groot (1965). The sample was fairly

small, with a relatively narrow range of skill levels; in particular there

were no Masters among the sample. De Groot‟s sample [3] included players of all

skill levels down to Class (n=14; Grandmasters=5, Masters = 2; Experts = 5;

Class = 2). Gobet‟s sample was larger (n=48) with average skill level

somewhere in between de Groot‟s and the sample used in this study

(Masters=12; Experts=12; Class A=12; Class B=12). Conversely, the data in

both of the other studies is based on Position A only, whereas this study

employed three very different types of position (see Appendix I).

Quality of Move

The results of both this study and Gobet's confirm de Groot's assertion that better players choose stronger moves. The significance of the effect of Position on Quality of Move in this study, however, suggests that it is harder to select a good move in some positions than in others, Position A in particular. Interestingly, the position that the players were least comfortable with (Position C) generated the best quality moves on average. Figure 1 suggests an interaction effect, with the tactical and complex Position A splitting the two groups effectively and the strategic and quieter Position B showing little difference, but the corresponding F ratio is non-significant.

³ For the purposes of comparison, this sample includes only the players for whom detailed statistics have been extracted from their Position A protocols, courtesy of Gobet (1998a).



Time variables

Gobet (1998a) found a weakly significant result for Total Time, suggesting that

Masters choose their next move more rapidly than lower calibre players. The

results above show no differences between Experts and Class players, although

the marginal means indicate that Experts are slower than Class players (12.68

minutes versus 10.46 minutes). The implication is that there are, in fact, no

differences between players of different levels in the time taken to choose their

next move. An observation from the experiment is that some players consciously

truncated their thought processes on the basis that, in a tournament game, too

much time spent on the single choice would lead them into time trouble. Gobet

found a significant reduction in the Time of First Phase for higher calibre players,

whereas the results here are again non-significant. Time of First Phase was

perhaps one of the more difficult variables to extract from the protocols due to

the poorly defined boundary it shares with the Phase of Elaboration (de Groot,

1965). Although certain players deliberately sized up the situation and discussed

general plans before entering a longer phase of search and evaluation, others

apparently focused immediately on base moves and corresponding sequences,

whilst one player spent the majority of his time apparently in the First Phase

before committing to a move. This issue is revisited in the Methodological

Discussion.

Base Moves and Episodes

Gobet's results suggest a curvilinear relationship for both variables with Skill, since Class A players generate more base moves and episodes than either Experts or Class B players, although only the effect on Number of Base Moves is significant (Gobet 1998a). Perhaps unsurprisingly, with Class A and B players pooled in this experiment, there are no significant effects of Skill. The significant effects of Position on both Number of Base Moves and Number of Episodes, however, again suggest that different types of position give rise to different search and evaluation strategies irrespective of skill level, but that this relationship is not explained by the complexity of the position (as measured by number of legal moves). Position C demanded the widest search for base moves and generated the most episodes; it may be argued that the character of this position is more ambiguous than that of the other two, containing both strategic and tactical themes. It is possible that this required players to pursue potential tactical lines as well as more strategic moves.

Search variables⁴

De Groot (1965) based his main conclusion, that recognition is the dominant

mechanism in chess thinking, on two results suggesting that search behaviour

does not differ across skill levels (at least at the higher levels of chess skill):

1. Chess players rarely search more than 100 nodes in any position;

2. There are no significant effects of skill on any search variable (e.g. Number

of Nodes, Mean Depth of Search, Maximal Depth of Search).

Whilst both this study and Gobet's (1998a) provide evidence in support of the first result, this study shows that Experts do search more nodes than Class players. This is partially backed up by Gobet (1998a): although he did not find a skill effect for Number of Nodes in Position A, the average number of Nodes was considerably lower for the Class B group (33.9) than for the other groups (58 for Masters, 58.3 for Experts and 56.8 for Class A players; Gobet 1998a p13). The significant difference found here, therefore, might be due, in part, to the reduced skill range among the players in the experiment; it could be that the biggest skill differences for this search variable are actually to be found between Experts and Class players. This suggests that there is an improvement in search capacity up to Expert level, beyond which this measure remains fairly constant, and that de Groot's second result, above, does not hold below the level of Expert.

⁴ The variables in the previous groups Number of Nodes, Rate of generation and Depth of Search are considered here together.

This study also confirms the significant result from Gobet (1998a) concerning the effect of Skill on Mean Depth of Search. The significant result on Maximal Depth of Search adds further evidence to the argument (counter to that of de Groot) that higher calibre players search more deeply than lower calibre players.

To investigate such effects in more detail, Charness (Holding 1985; Gobet, 2004)

and Gobet (1998) made predictions of search capabilities for different skill levels

by analysing the relationship between Elo rating and selected depth of search

variables (Maximal Depth of Search and Mean Depth of Search). Charness, in

his 1981 experiment investigating the effects of age and skill on search

capabilities, used four positions, two of which were strategic whilst the other two

were tactical in nature. Gobet used only one position, de Groot's Position A,

which is highly tactical in nature. The regression equations calculated from the

pooled data in this study suggest slightly larger increases in Maximal Depth of

Search and Mean Depth of Search per 200 Elo points than evidenced by the

previous studies (see Table 7).



Prediction                                               This study   Charness   Gobet

Increase in Maximal Depth of Search per 200 Elo points   2.1          1.4        N/A
Increase in Mean Depth of Search per 200 Elo points      0.8          0.5        0.6

Table 7; predicted gain in search capabilities as a function of Elo rating

In interpreting this result it is noted that:

1. de Groot's results are based on a sample dominated by Grandmasters,

Masters and Experts;

2. Charness and Gobet found skill differences for search capabilities when

lower calibre players were more prevalent in the sample;

3. Both Charness and Gobet have suggested that the relationship between skill

level and search capabilities across all playing levels is not linear. Whilst

Charness proposes a plateau effect for high calibre players, Gobet suggests a

curvilinear relationship, whereby high calibre players actually search less due

to better recognition-led evaluation capabilities.

Given the relatively low calibre of the players in this sample, the data presented here therefore extend the model of Gobet in suggesting that the rate of change of search capability (as measured by Mean and Maximal Depth of Search) is greater at lower skill levels (e.g. between Class A/B and Expert). Note that the predictions for Mean Depth of Search are similar across the three studies, which used different combinations of types of position. This backs up the result of the previous section that there is no significant effect of Position on either Mean Depth of Search or Maximal Depth of Search.
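The gains per 200 Elo points reported for this study are slopes from a linear regression of a depth variable on Elo rating, scaled by 200. The sketch below illustrates that calculation with an ordinary least-squares fit to the eight pooled per-player rows of Table 5. This is an assumption made for illustration only: the reported regression has 22 residual degrees of freedom, so it was presumably fitted to per-position observations rather than to the pooled rows, and the slopes obtained here should be read as being of comparable size rather than as a reproduction of Table 7. The function names are hypothetical.

```python
# Pooled per-player values transcribed from Table 5.
ELO = [1720, 1780, 1925, 1970, 2010, 2045, 2105, 2190]
MAX_DEPTH = [4, 8, 5, 7, 14, 11, 9, 9]
MEAN_DEPTH = [1.14, 3.16, 3.08, 4.41, 3.95, 3.85, 3.86, 3.35]


def ols_slope(x, y):
    """Slope of the ordinary least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def gain_per_200_elo(x, y):
    """Predicted change in y for a 200-point increase in Elo rating."""
    return 200 * ols_slope(x, y)


max_gain = gain_per_200_elo(ELO, MAX_DEPTH)
mean_gain = gain_per_200_elo(ELO, MEAN_DEPTH)

print(f"Maximal Depth of Search gain per 200 Elo (pooled rows): {max_gain:.2f}")
print(f"Mean Depth of Search gain per 200 Elo (pooled rows): {mean_gain:.2f}")
```

Fitting the pooled rows keeps the example self-contained, since the per-position observations are not tabulated in this section.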

Rate of generation

The weakly significant effect of Skill on Rate of Nodes diverges from Gobet's (1998a) result. Although neither study provides evidence for an effect of Skill on Rate of Base Moves, Charness's 1981 result (Gobet, 1998a) suggests that Grandmasters generate more base moves per minute than Experts. The reduced sample size in this study might explain why such a result was not identified here.

Reinvestigations

There was a degree of convergence with Gobet (1998a) concerning

reinvestigation variables. Gobet's only significant results in this area were for

the main effects of Skill on Maximal Number of IR (p



Gobet (1998a) also asserted that Maximal Number of NIR is inversely

proportional to Skill, yet an ANOVA with the current data (Position A only)

generates a non-significant result, as Figure 7 indicates.

Figure 7; Estimated Marginal Means for Maximal Number of NIR (plot not reproduced; axes: Maximal Number of NIR against Skill (Expert, Class); legend: Position A, B, C)

Null Moves

The significant skill effect for Proportion of Null Moves suggests that better

players think in terms of completely specified sequences of moves more often

than lesser players. By way of comparison, Saariluoma and Hohlfeld (Gobet

2004)⁵ examined the proportion of null moves as a function of position type

(strategic or tactical) and found that it is greater, at approximately 12%, in

strategic positions; Charness (Gobet 2004) previously found this percentage to be

approximately 10%. Interestingly, although the result in the current study holds

for Expert players (Position B = 11%; Position A = 5.5%; Position C = 5%),

Class players search approximately 15-16% null moves irrespective of position

type. (See also Figure 6.)

⁵ Calibre of players involved in the study unspecified.



The differences in proportions across the three positions at each skill level lead to two alternative interpretations:

1. Strategic positions (Position B) demand more generalised 'plan formulation' than tactical positions (Position A and, to a certain extent, Position C). The result is an increased proportion of templates of move sequences;

2. Better players are simply more thorough in their analysis of tactical sequences.

Summary

The results generated by this study broadly agree with those of Gobet (1998a),

Charness (Holding, 1985; Gobet, 2004) and Saariluoma and Hohlfeld (Gobet

2004) and argue against some of de Groot's earlier conclusions. Better players

make better choices of move, as shown by de Groot (1965) and Gobet (1998a),

but they also search more, to a greater depth and more thoroughly than lesser

players. The exact relationship between skill and both capacity and depth of

search is probably not linear. It appears that the rate of increase in search

capacity plateaus at the level of Master and above; and that depth of search may

actually vary in a curvilinear fashion with skill level, with a rate of increase that

itself decreases, and actually changes sign, as skill level increases from Class B

to Grandmaster. Given the difference in calibre of players in the samples

considered across the various studies, it is entirely possible that de Groot's

results on search variables were actually correct – it is merely the applicability of

the conclusions to lower skill levels that is in question.



Project Review

This chapter reflects upon two key issues: the necessary refocusing of the

research throughout its course (including modifications both to the design and

the analysis) and the validity of the data collection and analysis methods used in

support of the choice of next move task.

Focus of research

The final dissertation is far more focused than the original research proposal

suggested it might be. The main reason for this is that one half of the study was

suspended to keep the study to a manageable size, both in a positive sense (due to

the healthy amount of material available from the choice of next move task) and

a negative sense (due to both access difficulties and increased overheads of

qualitative analysis). The original experimental design included a choice of next

move task and a personal construct elicitation task, the latter conceived with the

aim of investigating the nature of conceptual knowledge that chess players

possess. Holding (1985) postulated that conceptual knowledge, along with

search and evaluation, explains skill in chess, and one of his main criticisms of

chunking theory was that chunks were too small in size to reflect conceptual

knowledge (Gobet & Simon, 1998b). Template theory (Gobet & Simon, 1998a)

addresses this criticism by introducing larger perceptual structures known as

templates, which are large enough, in theory, to encode entire positions.

Personal Construct Psychology (PCP) is concerned with how individuals construe the world, based on the assertion that each man possesses an ever-changing set of hypotheses about the world that are represented on personal constructs, essentially axes of reference characterised by contrasting poles (e.g. we may hypothesise about people on the construct 'good-bad' or we may hypothesise about chess positions on the construct 'tactical-strategic'). Much of PCP is due to George Kelly, who also devised the Repertory Grid technique, which includes methods for the elicitation of personal constructs (Fransella, Bell & Bannister, 2004).

Under the assumption that personal constructs, which may exist at any level of abstraction, are equivalent ways of classifying/describing both templates and the higher level schemata that they relate to, the research questions for the second half of the study were therefore:

How many constructs do chess players of a given skill level possess?

How are the construct systems of chess players organised?

What degree of overlap is there between different chess players' construct systems, particularly those of players with similar skill levels?

What are the most concrete constructs and do they correspond to Chase & Simon's piece relations in chunking theory?

Thus the questions for this part of the study were fairly open-ended and the

analysis was intended to be investigative. The basic procedure chosen was the

method of triads, whereby three 'elements' (in this case, chess positions) are
