
NORMALIZATION, CUT-ELIMINATION AND THE THEORY

OF PROOFS

CSLI Lecture Notes

No. 28


A. M. Ungar

CENTER FOR THE STUDY OF LANGUAGE AND INFORMATION

Copyright © 1992 Center for the Study of Language and Information

Leland Stanford Junior University

Printed in the United States

CIP data and other information appear at the end of the book

Contents

Introduction 1

1 Background 11

2 Comparing NJ with LJ 32

3 Natural Deduction Revisited 46

4 The Problem of Substitution 55

5 A Multiple-Conclusion Calculus 77

6 Reduction Procedures 102

7 Correspondence Results 126

8 Interpretations of Derivations 153

Appendices

A A Strong Cut-Elimination Theorem for LJ 186

B A Formulation of the Classical Sequent Calculus 200

C Proofs and Categories 220

List of Works Cited 229

Index 234


Introduction

The idea that proofs are objects capable of being treated by a mathematical theory is due to Hilbert. His motivations, the resulting theory, its successes and failures are too well known to need rehearsing here. There is, however, one point to be emphasized about Hilbert's theory: because it was in part at least an attempt to reduce abstract reasoning to reasoning about concrete entities, proofs were identified with the derivations of a formal system and it was these latter which were the objects of study. It is equally well known that Hilbert's program found little favor with Brouwer, who objected, on the one hand, that the foundational purpose of the program could not be achieved since the study of formal systems presupposed abstract mathematical principles of the kind it was intended to justify and, on the other, that Hilbert had falsified the nature of mathematics by identifying it with an activity which was essentially linguistic (or, at the very least, which was reducible to such an activity).

Because Brouwer viewed mathematics as an activity taking place in the minds of individual mathematicians, a formal proof of the kind studied by Hilbert—or even an informal proof of the kind to be found in any account of mathematics—could serve only as a device to aid memory or communication. In fact, the same could be said of mathematical statements in general. As proofs describe certain kinds of mental activity, so assertions report that the activity in question has or may be carried out; more specifically, a mathematical statement asserts the execution (either as a possibility or as a fait accompli) of the mental construction described by its proof. This radical reinterpretation of mathematical statements so that, instead of describing properties of and relations between various kinds of objects, they refer to mental activity assigns to proofs a far more important role in the study of mathematics than being the subject matter of a small part of it. On this conception, the mental constructions described by proofs are the subject matter of every mathematical assertion. (Of course, the proofs themselves are secondary to the experiences or activities which they describe. It is the latter which validate them and there is no guarantee that every such experience can be given satisfactory expression in this form.) This emphasis upon the priority of non-linguistic activity concentrates attention upon what proofs describe, and in the subsequent tradition even the meaning of the term proof has shifted from "interpreted linguistic structure" to "the interpretation of a certain linguistic structure," while a shift in the opposite direction takes place in the tradition beginning with Hilbert (proofs as interpreted linguistic structures are bled of their meaning and identified with derivations of a formal system).

Brouwer's conception of mathematics may strike some as eccentric. Nevertheless, the criticisms he advances against Hilbert are well taken. Even if one doubts whether thought can be separated from language and believes that mathematical statements are indeed what they appear prima facie to be, namely, assertions about the properties of independently existing objects, a distinction can be drawn between proofs as linguistic objects and their interpretations, and it is the latter which are of primary interest. In fact, with the perspective of hindsight, it seems clear that Hilbert's proof theory is misnamed—on any interpretation of the term "proof": what he was interested in studying was not proof, but the combinatorial structure of particular representations of proofs. The interest of such a study depends of course upon the ability to translate knowledge about the latter into knowledge about the former. But, granted this possibility, the issue arises as to whether there are interesting questions about the former which cannot be approached through a study of the latter and, if so, whether a mathematical study of the former would be a profitable way to approach them.

As far as the first question is concerned, it seems clear that the study of the interpretations of proofs has a general metaphysical interest of the same kind as the investigation of propositions—the interpretations of sentences— and is prompted by the same kinds of consideration. Sentences are tied to particular languages and are nothing more than conventional arrangements of signs unless they are thought of as expressing something significant; furthermore, whatever they express is not tied to a particular language and may indeed be independent of language altogether. The most fundamental problems about propositions concern their ontological status and their structure. At the very least, if there are to be propositions at all, we need to know when sentences express the same proposition, in other words, to be able to formulate identity criteria for propositions.

The same can be said about the interpretations of proofs. Not only are proofs tied to a particular language but, if we consider formal derivations, they are constructed according to a particular set of rules whose choice is in some sense logically insignificant. (I am not referring here to the choice between first-order and higher-order rules, for example, or between classical and intuitionistic ones. These presumably are substantive choices motivated by the intended interpretation of the proofs in question. I am thinking rather of the choice that is made when one formulation of classical first-order logic, say, is preferred to another.) The recognition that two proofs constructed according to different rules (from the same or different calculi) are the same can only be explained by reference to their interpretations, the analogues of propositions, whose identity is being asserted.

It might be thought that the problems associated with identifying and individuating propositions are sufficiently difficult that they become aggravated to the point of intractability when addressed in the context of proofs. It seems to me, however, that the opposite is in fact true, i.e., that they become more amenable to solution. It seems quite plausible after all to claim that logical operations can be represented by structurally similar linguistic ones, with the result that a connection can be established between the structure of a proof and that of its interpretation. On the other hand, there is little reason to believe in a general structural correspondence between sentences and the propositions they express. Furthermore, even if the interpretation of a mathematical proof should turn out to be a structure built up from propositions, at least these will be logical combinations of propositions of a standard, rather simple form (for example, the interpretations of equations whose terms range over a well-determined domain).

In addition to questions of general philosophical interest, there are mathematically interesting ones which can be formulated in terms of proofs. On the view suggested above, proofs provide evidence for mathematical theorems. It is wrong, however, to think of any science, physical, mathematical or social, simply as a collection of putative truths the evidence for which, although perhaps essential to the determination of these truths, is not part of the subject matter of the science in question. The very idea of an organized body of knowledge implies that a theory is not to be described simply as a set of statements; these statements must be related to one another and, in some cases, to other data. Furthermore, in our scientific studies we seek not only to know what is true but also to understand why. In other words, the evidence for the claims of a particular science is itself a part of it, and deserves to be considered as such. The application of this claim to mathematics is scarcely controversial. It boils down to saying that proofs are an integral part of mathematics on any conception of the subject. Even those mathematicians who have railed against them seem to have directed their barbs more against what they considered to be excessive rigor, or a mistaken view of the relationship between proofs and intuition, than against the notion of proof itself.1

1See, for example, the authors quoted on this subject by Morris Kline in Mathematics: The Loss of Certainty, Oxford, 1980.

The point is worth making, nevertheless, since the trend in logic during the last 60 years has been to confine serious interest in proofs to those favoring constructive or intuitionistic interpretations of mathematics. This is because of what might be called the model-theoretic view of mathematics, namely, that in mathematics one attempts to discover what statements are true in certain structures. Proofs matter solely because they provide a means of establishing such truths; hence, the importance of axiomatizability, etc. I am not claiming that this view is false, but it is surely incomplete. The model-theoretic view provides a useful classificatory framework for mathematics, but it neglects certain features of the subject. In particular, it suggests that axiomatizations are to be judged only by external criteria—for example, by the set of their consequences—and it neglects the explanatory function of proofs by judging them only by what they establish.

In fact, how a theorem is established and the methods employed in its proof are matters of interest to all kinds of mathematicians, even though the nature of this interest, like the nature of the proofs themselves, may vary depending upon one's viewpoint. It is easy enough to illustrate this thesis from the history of mathematics, beginning with the ruler and compass constructions of the ancients and ending with the current interest in mechanical theorem proving and checking. More generally, it seems fair to claim that mathematicians are interested in why a theorem holds, not simply in its holding;2 they recognize that different proofs of the same result—for example, Cantor's and Liouville's proofs of the existence of transcendental numbers—each have their own significance, and interest themselves in new proofs of familiar results. (The question here is not whether one proof is more reliable than another, but what additional understanding a particular proof provides.) Such an interest, however, presupposes an ability to distinguish between proofs of the same conclusion, in particular cases at least, and suggests more general questions about their identity and difference.

2The first chapter of The Book of Prime Number Records by Paulo Ribenboim (2nd edition, New York, 1989), which is devoted to presenting nine and a half proofs of the existence of infinitely many primes, may serve to illustrate this fact—even though the theory of half proofs lies beyond the scope of the present work.

Granted the mathematical interest of questions about proofs, it remains to argue that these are best formulated as questions about their interpretations, rather than about certain kinds of linguistic objects. It seems clear that the preceding discussion is not about the combinatorial properties of formal objects, but at the very least about interpreted proofs. Even a discussion of machine provability can only be carried out within a framework in which the reduction of proofs to formal objects is not presupposed. As for the advantages and disadvantages of interpretations as opposed to interpreted linguistic objects, the advantages are clear. Interpretations are not tied to particular languages, to particular formulations of the rules nor to particular concrete representations of the structure of proofs. Furthermore, they are fundamental to the questions raised above; our interest in linguistic objects arises from their role as representations and our evaluation of them is in terms of what they represent. It seems perverse therefore, if not downright erroneous, to treat the representation as fundamental. The disadvantages, I suppose, have to do with clarity and metaphysics. It is easier to deal with concrete than with more abstract objects but, since it is the latter whose nature and properties we are attempting to understand, it seems better to face this problem directly, rather than attempting vainly to avoid it altogether (whatever success such tactics may have had in certain formal applications). As for the metaphysical issue, it seems to me that nothing is gained by denying existence to the interpretations of proofs. Once we acknowledge our interest in determining when two expressions have the same interpretation, we have as good as acknowledged it—no identity without an entity, one might say. Of course, the nature of these interpretations remains open—whether they are to be regarded as abstract objects, mental ones, or even equivalence classes of inscriptions—but such questions can be deferred for the time being.

I wish to equivocate here not only about what constitutes such an interpretation, but even about the terminology appropriate to describe it. The fact is that usage provides little guidance in the matter. Phrases like "the denotation of a proof" or "what is expressed by a proof" depend on analogies (like the one between proofs and sentences utilized above), and substantial issues may hinge upon which are deemed to be appropriate and, if more than one, whether significant distinctions can be formulated in terms of them. These issues should not be prejudged by the terminology used in discussing them. Some recent commentators have reserved the term "proof" for the interpretation of a certain kind of linguistic object called a "derivation". Although this terminology does not accord well with usage, it will be adopted henceforth for want of anything better (with the proviso that a derivation need not be formal). The precise relationship between derivations and proofs, in particular, whether the former can be said to express or denote the latter, remains to be investigated.

Having argued for the interest of studying proofs (rather than derivations), it remains to argue for the suitability of a mathematical treatment of this subject. A conclusive argument for this claim would be a full-blown theory of the kind in question. Lacking that, I hope the considerations advanced below make it seem worth pursuing. It should be said that a theory of proofs—in the sense explained above—is not a new idea. The attempt to utilize the methods so successfully applied to the foundations of classical mathematics for the study of intuitionistic foundations has resulted in efforts in this direction (for reasons which should be clear from my earlier discussion of Brouwer's opinions). I am thinking particularly of the attempts by Kreisel and some of his followers to formulate the basic properties of proofs, in other words, to provide an axiomatic theory of them.3

3These are described in a succession of papers, beginning with his "Foundations of Intuitionistic Logic," pp. 198-210 of Logic, Methodology and Philosophy of Science, edited by E. Nagel, P. Suppes and A. Tarski, Stanford, 1962.

It seems fair to say that the results of their efforts have been somewhat inconclusive, as Kreisel himself seemed to acknowledge when he remarked that "at present there do not seem to be any even mildly promising ideas for a systematic or fundamental science of proof."4

4 "On the kind of data needed for a theory of proof," Logic Colloquium 76, edited by R. Gandy and J. Hyland, Amsterdam, 1977, page 125.

Another approach to formulating a mathematical theory is to regard it as a theory of familiar mathematical objects and attempt to specify which of these objects the theory is intended to describe. This too has been tried in the case of proofs, again in the constructive tradition for the most part, the familiar mathematical objects in this case being functions. The idea itself is an old one, and is already implicit in Heyting's interpretation of the logical operators.5 More recently, it has been developed by Gödel and exploited to prove the consistency of arithmetic.6 I do not mean to suggest that these two approaches are independent of one another: the axioms for proofs are, one assumes, suggested by an intended interpretation and, conversely, the identification of proofs with mathematical objects of a certain kind needs to be supplemented by principles which pick out suitable ones of the kind in question. Nevertheless, the distinction between them is worth drawing if only for heuristic reasons.

5See Intuitionism: An Introduction by A. Heyting (3rd edition, Amsterdam, 1971), although the interpretation was first formulated in his Mathematische Grundlagenforschung. Intuitionismus. Beweistheorie, Berlin, 1934.

6 "Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes," Dialectica, Vol. 12, 1958.

A mathematical theory may be regarded from a syntactic or a semantic point of view. Sometimes the semantic aspect is dominant in the sense that the truths of the theory are determined by a very clear conception of its intended interpretation, as in the case of arithmetic. In other cases, non-Euclidean geometry for example, the syntactic aspect predominates: we search for interpretations which make a certain set of statements true. These two possibilities are not exhaustive, however. There are surely cases where the intended interpretation is conceived too indistinctly to determine the truths of the theory even though there is no other source for them. Here it seems helpful to alternate between semantic and syntactic approaches. An initial body of data leads to the adoption of certain principles; the investigation of these principles sharpens our ideas about possible interpretations; these interpretations in turn suggest other principles, and so on.7 The development of set theory seems to me to exemplify this general schema, and my suggestion is that it also provides a helpful model for the development of a theory of proofs.

7This account accords well with Gödel's views about the nature of axiomatic systems, as reported by Hao Wang in his book Reflections on Kurt Gödel, Cambridge, Mass., 1987. For example: "But Gödel does not take an axiomatic system as an 'implicit definition' of the concept(s) in it, because it is supposed to be a report of our (generally incomplete) intuitions of the concept and can be revised and expanded. It does not define (and fix completely) the concept, but rather invites improvements by comparison with the changing intuition." (page 247).

The analysis of a certain part of our intellectual activity yields formal systems which purport to represent reasoning. The elements of these systems are derivations, related in various ways and with various properties. These systems of derivations themselves admit interpretations which impart significance to the aforementioned properties and relations, revealing difficulties and discrepancies between different formalizations which may then be resolved or reconciled by looking again at interpretations.

Gentzen's systems of natural deduction—N systems, for short—seem to be the result of just such an analysis. I do not mean to suggest by this that they can be used to represent every kind of reasoning. Nevertheless, their derivations may plausibly be claimed to represent a certain class of proofs—and quite a significant class at that. Gentzen after all arrived at his rules by analyzing the actual practice of mathematicians, and our intuitions are sufficiently well developed to be able to recognize the success of his analysis. Furthermore, his rules offer a systematic treatment of the logical particles: the inferential behavior of each one is governed by a pair of symmetrical rules (an introduction and an elimination) which have a certain separability property.8 This lends credibility to the claim, for its truth does not require that every step in a piece of reasoning correspond to a rule of the system, only that every such step can be expressed naturally in terms of these rules. An analysis like Gentzen's, therefore, which appears to break inference down into atomic steps, is well suited for this purpose.

8Let * be any connective; then its introduction and elimination rules make no mention of the other connectives and, if A contains no occurrences of *, then A is derivable in the system iff it is derivable in the system without these rules.

One of the distinguishing features of Gentzen's systems is that they admit a normal form theorem. This is a very general result about the structure of their derivations. Roughly speaking, it asserts that any derivation Π can be converted into one of the same conclusion which has a particularly simple and direct form (and whose assumptions are included amongst those of Π). Now, when considering the derivations of a calculus as representations of proofs, it seems appropriate to concentrate upon those of their features which any object built up by means of its rules must possess, rather than upon those which depend on how the rules are interpreted as building up structures. In the case of natural deduction, the most conspicuous examples of the former are the reduction relations which hold between two derivations when one is obtained from the other by performing one of the transformations employed in the proof of the normal form theorem. As for the latter, although natural deduction derivations are usually thought of as trees whose nodes are labelled by formulae, there is nothing about the rules which forces us to interpret them in this way. They could equally well be sequences of formulae, trees labelled by sequents or, for that matter, a variety of physical structures. For this reason, properties distinctive to these labelled trees fall into the latter category. In contrast to the relations mentioned earlier, they are connected only very indirectly with the content of the rules and seem less likely, therefore, to be significant from the point of view of a theory of proofs. (In fact, a likely benefit of such a theory would be to obliterate distracting distinctions between different realizations of derivations.)
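To fix ideas, here is the standard example of such a reduction relation (supplied for illustration; it is not quoted from the text): an introduction immediately followed by the corresponding elimination constitutes a detour, and the transformations employed in the proof of the normal form theorem remove it. For conjunction:

```latex
% Standard conjunction detour reduction (illustrative, not quoted from
% the text): a derivation ending in \wedge-introduction followed at once
% by \wedge-elimination reduces to the subderivation \Pi_1 of A alone.
\[
\dfrac{\;\overset{\Pi_1}{A} \qquad \overset{\Pi_2}{B}\;}
      {\dfrac{A \wedge B}{A}}
\;\;\rhd\;\;
\overset{\Pi_1}{A}
\]
\]
```

The reduced derivation proves the same conclusion from a subset of the same assumptions, which is what the normal form theorem requires.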

These considerations lend credence to the suggestion that, although formulated as a result about the combinatorial properties of derivations, the normal form theorem really tells us something about the structure of the underlying proofs. The relationship between a derivation and its normal form, so it has been claimed, reflects a relationship between the proofs which they represent, and the operations by which the former is transformed into the latter can be interpreted either as operations on the proofs themselves or as preserving certain properties of these proofs. In addition, the formal analogy between logical calculi and theories of functions (the λ-calculus, for example), according to which derivations of the former can be correlated with terms of the latter, provides a suggestive interpretation of the theorem. This is because the reduction steps which effect the conversion referred to in its statement correspond on this analogy to the computation rules for terms which are used to analyze the equality relation between them.

It seems reasonable, therefore, to hope that the theorem casts some light on the question of the identity of proofs: just as any model of the λ-calculus must respect the equivalence generated by the computation rules for its terms, so any interpretation of what might be called the N theory of proofs (i.e., the theory whose axioms are inequalities expressing the reduction steps between natural deduction derivations) would have to assign the same proof to interconvertible derivations. The terms of this theory are derivations, and the proofs which they are supposed to denote can be thought of as functions. It has even been conjectured that interconvertibility characterizes the identity relation between proofs.9
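The correspondence can be made concrete in a toy sketch (mine, not the book's): derivations are encoded as λ-terms, with pairing playing the role of conjunction introduction and the projections the role of the eliminations, and the reduction steps of the normal form theorem become the computation rules for terms. Two interconvertible derivations then receive the same normal form, and so must denote the same function.

```python
# Toy sketch of the N-theory analogy (illustrative only, not the book's
# formalism).  Terms are nested tuples: ("var", x), ("lam", x, body),
# ("app", f, a), ("pair", a, b), ("fst", t), ("snd", t).

def subst(t, x, s):
    """Substitute term s for variable x in t.  Variable capture is
    ignored; the closed examples below make this safe."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, s))
    if tag in ("fst", "snd"):
        return (tag, subst(t[1], x, s))
    return (tag, subst(t[1], x, s), subst(t[2], x, s))  # app, pair

def normalize(t):
    """Reduce a term to normal form: the analogue of converting a
    derivation into one free of introduction/elimination detours."""
    tag = t[0]
    if tag == "var":
        return t
    if tag == "lam":
        return ("lam", t[1], normalize(t[2]))
    if tag == "pair":
        return ("pair", normalize(t[1]), normalize(t[2]))
    if tag == "app":
        f, a = normalize(t[1]), normalize(t[2])
        if f[0] == "lam":                       # implication detour
            return normalize(subst(f[2], f[1], a))
        return ("app", f, a)
    u = normalize(t[1])                         # fst / snd
    if u[0] == "pair":                          # conjunction detour
        return u[1] if tag == "fst" else u[2]
    return (tag, u)

# Two interconvertible "derivations" of A from A and B: one projects
# directly, the other routes through an implication detour.  Any
# interpretation respecting the reductions must identify them.
d1 = ("fst", ("pair", ("var", "a"), ("var", "b")))
d2 = ("app", ("lam", "x", ("fst", ("var", "x"))),
      ("pair", ("var", "a"), ("var", "b")))
print(normalize(d1) == normalize(d2) == ("var", "a"))  # True
```

The printed `True` is the point of the exercise: the equivalence generated by the computation rules identifies `d1` and `d2`, so any model assigning proofs to derivations must give them the same denotation.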

The preceding seems to provide an attractive and coherent framework within which to investigate a certain conception of proofs and their identity criteria. Notice incidentally that proofs are treated here as the denotations of derivations. Although this appears to conflict with a widely held view of proofs as intensional objects (whatever that phrase may mean), it seems to me that such a treatment poses no new problems for those who have attempted to explain meaning in proof-theoretic terms.10 There is, however, a striking obstacle in the path of developing the N theory along the lines suggested above, namely, what might be called by analogy the L theory of proofs. As is well known, Gentzen abandoned his systems of natural deduction as a tool for metamathematical investigation in favor of sequent calculi—L systems, for short. The latter are a reformulation of the former which constitutes little more than a notational variant in the case of the introduction rules, but replaces the eliminations by a more restrictive set of rules whose equivalence to them is most easily demonstrated with the aid of the cut rule. Corresponding to the normal form theorem for N derivations is a cut-elimination theorem for L ones, whose proof proceeds by systematically transforming a derivation with cut into a derivation of the same conclusion without cut. There is an obvious similarity between derivations in the two kinds of calculus which extends in a rough and ready way to a correlation between normal and cut-free derivations and, in an even rougher way, to the steps by which derivations are converted into normal or cut-free form. In view of this, it is possible to interpret sequent derivations too as terms denoting functions and the cut-elimination steps as analyzing equality. This is what I mean by the L theory of proofs.

9 The conjecture is defended in a number of papers by Dag Prawitz, who attributes it to Martin-Löf and the influence of some related ideas of Tait.
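For orientation, the cut rule in question has the following standard form (stated here for illustration rather than quoted from the text):

```latex
% The cut rule of the sequent calculus (standard formulation, supplied
% for illustration).  The cut formula A disappears from the conclusion;
% this is why cut-free derivations enjoy the subformula property.
\[
\frac{\Gamma \vdash \Delta, A \qquad A, \Gamma' \vdash \Delta'}
     {\Gamma, \Gamma' \vdash \Delta, \Delta'}\;(\textit{cut})
\]
```

Cut plays the role that composing derivations (substituting one into another) plays in natural deduction, which is what makes the correlation between cut-elimination steps and normalization steps plausible in the first place.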

Unfortunately, on closer inspection, discrepancies emerge between the two theories. These were noticed first by Zucker in his paper "The Correspondence between Cut-Elimination and Normalization."11 He showed that, although a very satisfactory correspondence obtains when attention is restricted to fragments without disjunction or the existential quantifier, the full theories as they stand are incompatible with one another. At the end of his paper, he considers a modification of the L theory which conforms better to the N theory but, as he himself admits, it is at best ad hoc. As a result, he seems to give up the idea of reconciling the two theories and concludes that there may be meaningful properties of proofs which are preserved by all reductions of the N theory, but not by those of the L theory. I believe, however, that this assessment is both too optimistic and too pessimistic: too optimistic because, if the two theories cannot be reconciled, this detracts from the plausibility of any claim to the effect that the structure of derivations in either calculus tells us very much about properties of proofs, and too pessimistic because it may yet be possible to reconcile them. The chapters which follow are intended to defend this last claim and expand on some of the issues raised above.
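The rule standardly held responsible for such discrepancies is disjunction elimination, whose form (given here for illustration, on my own authority rather than the text's) differs from the other eliminations in having minor premises with an arbitrary conclusion:

```latex
% Natural deduction disjunction elimination (standard form, supplied for
% illustration).  Brackets mark assumptions discharged by the inference;
% the arbitrary conclusion C gives rise to the permutative reductions
% that complicate the correspondence with cut-elimination.
\[
\frac{A \vee B
      \qquad \begin{matrix}[A]\\ \vdots\\ C\end{matrix}
      \qquad \begin{matrix}[B]\\ \vdots\\ C\end{matrix}}
     {C}\;(\vee E)
\]
```

The existential quantifier's elimination rule has the same minor-premise structure, which is why the two connectives pattern together in Zucker's result.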

Chapter 1 contains a brief introduction to Gentzen's N and L systems, including a sketch of the proofs of the normal form and cut-elimination theorems. In Chapter 2, I outline the correspondence between the two kinds of calculus and discuss Zucker's results, both positive and negative.

10 See Chapter 8 below for a fuller discussion of these and related matters.

11 Annals of Mathematical Logic, Vol. 7, 1974.


Chapter 3 is concerned with the issue of whether natural deduction derivations can be said to represent proofs adequately and suggests that they might be deficient in some respects; these deficiencies, I argue, play a role in the breakdown of the correspondence. The main result of Chapter 4 is that almost any formal system whose derivations are supposed to represent proofs will inevitably share some of these defects; this leads me to conclude that there is little hope of explaining the usual procedures for reducing a derivation to its normal form solely in terms of proofs and their properties. In Chapter 5, I present a system of natural deduction whose derivations have more than one conclusion. Systems of this kind, I argue, have some advantages over more traditional calculi; in particular, they provide a suitable framework within which to compare the different reduction procedures used for N and L derivations. Chapter 6 contains a description and analysis of reduction steps for the derivations of this system and, in Chapter 7, I show in some detail how they can be related to reduction procedures for the classical sequent calculus. The results of this chapter suggest that proofs may have less structure than is usually thought.

Chapter 8 opens with a critical examination of some proposals to interpret the normal form theorem along the lines suggested above. The conclusions arrived at earlier are used to argue that obstacles in the way of establishing a correspondence between various reduction procedures and of justifying them in terms of proofs do not rule out the possibility of such an interpretation, but rather indicate that traditional procedures are too restrictive and traditional ideas about proofs too narrow. To substantiate this opinion, I suggest how proofs, regarded as denotations of formal derivations, may be incorporated into a general account of justification and how, by applying the methodological principle advocated above, the interplay between theory and interpretation may be exploited to help clarify their properties. The resulting notion of proof, although not entirely standard, has some virtues. It avoids the more annoying obstacles in the way of a unified treatment of cut-elimination and normalization. Furthermore, by diminishing the significance of some structural features of derivations, it opens up the possibility that a general theory of proofs can afford to disregard the minutiae of syntax. This seems to me to be a helpful idea. Proof theory is usually identified with a detailed study of the syntactic features of formal systems. Such a view of the subject, however appropriate when it is regarded as a metamathematical tool, has not provided many answers to general questions about the nature of proofs. Although the conclusions of Chapter 8 are speculative, the approach advocated there, I suggest, may yet do so.

Acknowledgments: I am grateful to Solomon Feferman for instruction and advice on the topics discussed in this book. I would also like to thank the publications staff at CSLI for their painstaking work in producing the final copy.

1

Background

It is a commonplace of the textbooks that logic can be regarded from two points of view. Here, for example, is Copi's account of the matter: "On the one hand, logic is an instrument or organon for appraising the correctness of reasoning; on the other hand, the principles and methods of logic used as organon are interesting and important topics to be themselves systematically investigated."1 Furthermore, it has become customary to associate this distinction with that between formal system and metatheory—as Copi appears to do. Such early pioneers of formalization as Russell, however, seem not to have thought in quite these terms. Not only was their view of logic somewhat broader than the one taken above but, more to the present point, they tended to blur the distinction between logic as organon and logic as systematic investigation. In part, this was because they did not regard formal systems simply as instruments for the appraisal of reasoning. Consider, for example, how Russell introduces his axiomatization of propositional logic.

But the subject to be treated in what follows is the theory of how one proposition can be inferred from another. Now in order that one proposition may be inferred from another, it is necessary that the two should have that relation which makes one a consequence of the other. When a proposition q is a consequence of a proposition p, we say that p implies q. Thus deduction depends upon the relation of implication, and every deductive system must contain among its premises as many of the properties of implication as are necessary to legitimate the ordinary procedure of deduction.2

The system Russell presents is to be a theory of the relation which holds between premises and conclusion of a valid argument; it is intended to legitimate the ordinary procedure of deduction. All this sounds a little strange

1 Symbolic Logic by I. M. Copi, fifth edition, New York, 1979, p. vii.

2 Principia Mathematica, p. 90.


12 Normalization, Cut-Elimination and the Theory of Proofs

to our ears, accustomed as we are to enforcing a sharp distinction between relations and connectives. Clearly, however, it is Russell's conception of a formal system of logic which underlies the theory of strict implication, and the tradition lives on in current work on the logic of entailment. Nevertheless, it seems fair to say that it is no longer the orthodox one. In fact, we are so used to thinking of logical systems as attempts to replicate ordinary reasoning in a formal context that, unless we keep in mind the different aims which they were originally intended to serve,3 it will seem incomprehensible that the obvious disparity between formal deduction and informal proof was ignored for so long.

The first person to pay attention to it appears to have been Lukasiewicz.

In 1926 Professor J. Lukasiewicz called attention to the fact that mathematicians in their proofs do not appeal to the theses of the theory of deduction, but make use of other methods of reasoning. The chief means employed in their method is that of an arbitrary supposition. The problem raised by Mr. Lukasiewicz was to put those methods under the form of structural rules and to analyze their relation to the theory of deduction.4

Russell did not suppose that his particular axioms and rules coincided with those ordinarily employed in deduction, but he did claim that they were in some sense "sufficient for all common forms of inference."5 As Lukasiewicz points out, however, there is a difference in kind between the derivations of an axiomatic theory of deduction on the one hand and ordinary proofs on the other. The logical content of a mathematical proof is contained, for the most part at least, in the steps which lead from statements to their consequences. (The qualification is needed because an appeal to some logical principle—the law of excluded middle, for example—may occasionally be made as well.) Furthermore, these statements need not themselves be accepted as true, except for the sake of argument, nor are they in general valid propositions about logical relationships. Jaskowski, in the paper quoted above, proposed a solution to Lukasiewicz's problem; it took the form of a novel kind of logical calculus, one distinguished from axiomatic systems in allowing the introduction of arbitrary statements into a derivation—as

3This is not to suggest that there were clearly articulated rival views about the nature of formal systems, only that at one time logicians had a variety of different ambitions for them, not all of which were clearly distinguished, and including some that we are more likely to attempt to realize through the study of formal systems, rather than through the systems themselves.

4 "On the Rules of Suppositions in Formal Logic" by Stanislaw Jaskowski, Studia Logica, Vol. 1, 1934; reprinted in Polish Logic: 1920-1939, edited by Storrs McCall, Oxford, 1967.

5 Principia, p. 90. He seems to have meant by this phrase only that, if C is commonly (and correctly) inferred from S1, ..., Sn, then a certain implication—roughly speaking, one equivalent to (S1 ∧ ... ∧ Sn) → C—will be a theorem of Principia Mathematica.


assumptions, hypotheses or suppositions—as well as by the use of rules which could discharge assumptions.

These are the characteristic properties of so-called natural deduction calculi, as opposed to Hilbert-style formalizations of logic. The latter term is the one usually employed to describe any axiomatic treatment of logic interpreted as a deductive engine, i.e., as an instrument for deriving the consequences of a set of statements. If Hilbert did not originate such axiomatic treatments, he did at least encourage this sort of interpretation of them.6 In particular, he seems to have thought that any step in an (informal) argument could be made to correspond to a series of steps in a formal derivation.

The transition from statements to their logical consequences, as occurs in the drawing of conclusions, is analyzed into its primitive elements, and appears as the formal transformation of the initial formulas in accordance with certain rules.7

Seen in this light, derivations are simply the formal counterparts of informal proofs and discrepancies between the two are reduced to matters of style. A formal derivation, because it makes explicit the "primitive elements" that combine to make up each step of the proof, will be less natural than its informal counterpart. Hilbert acknowledges as much when, at the conclusion of a general discussion about different axiomatizations of logic, he comments on a calculus devised by one of his assistants.

Finally, we mention, as a system which occupies a special place, the "Calculus of Natural Inferences," as set forth by G. Gentzen, which constitutes an attempt to make out of the formal deduction of formulas something more similar to the usual method of proof . . . , such as is customary, e.g., in mathematics. The calculus contains no logical axioms, but only figures of inference which indicate which inferences can be drawn from given assumptions, as well as figures which yield formulas in which the dependence upon the assumptions is eliminated.8

As is apparent from the brief descriptions given above, Gentzen's calculus is similar in kind to that of Jaskowski. It is presented in a paper entitled "Untersuchungen über das logische Schließen" which appeared in Mathematische Zeitschrift for 1935.9

6 See, for example, Chapter II, Section 11 of Principles of Mathematical Logic by D. Hilbert and W. Ackermann, New York, 1950. This is a revised translation of the second edition of their Grundzüge der theoretischen Logik.

7 Ibid., p. 1.

8 Ibid., p. 30.

9 Gentzen's work appears to have been done independently of Jaskowski. The latter's

was published in 1934 although, according to Prawitz, it represents a revision of results obtained and announced in the late twenties. Gentzen's paper was submitted for publi-


Gentzen describes the initial motivation for his work in the following terms.

My starting point was this: The formalization of logical deduction, especially as it had been developed by Frege, Russell and Hilbert, is rather far removed from the forms of deduction used in practice in mathematical proofs. Considerable formal advantages are achieved in return.

In contrast, I intended first to set up a formal system which comes as close as possible to actual reasoning. The result was a calculus of natural deduction (NJ for intuitionist, NK for classical predicate logic).10

Although Gentzen begins his paper with an account of this system of natural deduction and concludes it with a proof of its equivalence to more conventional formulations of first-order logic, the intervening parts are devoted to other topics. Here again is Gentzen.

A closer investigation of the specific properties of the natural calculus finally led me to a very general theorem which will be referred to below as the 'Hauptsatz'.

The Hauptsatz says that every purely logical proof can be reduced to a definite, though not unique, normal form. Perhaps we may express the essential properties of such a normal proof by saying: it is not roundabout. . . .

In order to be able to enunciate and prove the Hauptsatz in a convenient form, I had to provide a logical calculus especially suited to the purpose. For this the natural calculus proved unsuitable. For, although it already contains the properties essential to the validity of the Hauptsatz, it does so only with respect to its intuitionist form, . . . 11

The calculus for which Gentzen proved his Hauptsatz is called the calculus of sequents (LJ for intuitionist, LK for classical predicate logic). I shall be concerned in what follows with the relationship between these two kinds of calculus and, in particular, with the significance of the Hauptsatz as it applies to both of them. As a preliminary, the present chapter contains a brief introduction to those parts of Gentzen's work relevant to this inquiry.

Assume a first-order language, say ℒ, containing the usual logical symbols (∨, ∧, →, ¬, ∀ and ∃) as well as a constant, ⊥, for falsity. In addition, ℒ contains n-place predicate letters, for each n, individual variables and parameters, and whatever punctuation devices are needed. (For simplicity

cation in July of 1933. An English translation appears on pp. 68-131 of The Collected Papers of Gerhard Gentzen, edited by M. E. Szabo, Amsterdam, 1969. All subsequent page references will be to this translation.

10 Op. cit., page 68.

11 Ibid., pp. 68-69.


and definiteness, assume it contains neither equality nor function symbols.) The well-formed formulas of ℒ are defined in the usual way, with the proviso that variables may only occur bound in them. (The parameters and variables are supposed to form disjoint sets.) There will be no occasion to use the language ℒ, only to talk about it and its expressions. So, the symbols used above should be thought of as metalinguistic names for the corresponding symbols of ℒ. Upper-case letters, A, B, C, ..., sometimes with natural numbers as superscripts, will range over formulas of ℒ, lower-case letters a, b, c, ... from the beginning of the alphabet over its parameters, and lower-case letters x, y, z, ... from the end of the alphabet over its variables. If a formula A contains occurrences of the parameter a, it may be written as A(a) and the (ill-formed) expression which results from replacing one or more occurrences of a in A by the variable x can be written as A(x)—provided that x is free for a in A(a). Names for the operators will be used to form complex expressions which range over various classes of formula in the usual way: A ∧ B will range over conjunctions, etc. Furthermore, parentheses will be used where necessary to disambiguate these expressions.
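These conventions are easy to make concrete. The sketch below (the tuple encoding and function names are mine, not the book's) represents formulas of ℒ as nested tuples and implements the replacement of a parameter a by a variable x that turns A(a) into A(x):

```python
# Formulas of the first-order language are encoded as nested tuples:
#   atoms:        ("pred", "P", ("a", "b"))   -- predicate letter with terms
#   falsity:      ("bot",)
#   connectives:  ("and", A, B), ("or", A, B), ("imp", A, B), ("not", A)
#   quantifiers:  ("all", "x", A), ("ex", "x", A)
# Parameters and variables are plain strings; by convention parameters are
# drawn from a, b, c, ... and variables from x, y, z, ...

def subst_param(formula, a, x):
    """Replace every occurrence of the parameter a by the variable x,
    yielding the expression written A(x) in the text."""
    tag = formula[0]
    if tag == "pred":
        _, name, terms = formula
        return ("pred", name, tuple(x if t == a else t for t in terms))
    if tag == "bot":
        return formula
    if tag in ("and", "or", "imp"):
        return (tag, subst_param(formula[1], a, x), subst_param(formula[2], a, x))
    if tag == "not":
        return ("not", subst_param(formula[1], a, x))
    if tag in ("all", "ex"):
        return (tag, formula[1], subst_param(formula[2], a, x))
    raise ValueError(f"unknown formula tag: {tag}")

# Example: P(a) ∧ Q(a, b) with a replaced by x gives P(x) ∧ Q(x, b).
Pa_and_Qab = ("and", ("pred", "P", ("a",)), ("pred", "Q", ("a", "b")))
```

Tuples are used rather than classes only to keep the sketch short; nothing in what follows depends on this choice.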

Gentzen's natural calculi are systems for constructing derivations from assumptions. These are supposed to be tree-like structures whose root is their conclusion and whose topmost branches are assumptions. An occurrence of an assumption may be open or closed. The idea here is that the conclusion of a derivation will depend only on its open assumptions; the closed assumptions, having been introduced as auxiliary (open) assumptions in the course of the derivation, will have been discharged before the final conclusion is reached. An isolated assumption is treated as a trivial derivation whose root and only branch are the same, i.e., the assumption A is a derivation of the conclusion A which depends on itself. In addition, there are rules of inference of two kinds. The first kind allows the construction of a derivation of its conclusion which depends upon the same assumptions as the derivation(s) of its premise(s). Consider, for example, the rule which sanctions the construction of a derivation of the conjunction A ∧ B from derivations of A and B: given two such derivations, their roots are transformed into lower branches and a new root, A ∧ B, is joined to them from below.12 The second kind of rule is more interesting; it allows assumptions to be closed in the process of transforming the derivation(s) of its premise(s) into a derivation of its conclusion.

12 It was Gentzen who originally described these derivations as having a tree form. Students of biology may object that, since they may lack a trunk, they would be better described as shrubs. There will be no occasion to consider any other kind of vegetation in the sequel, however, so this choice of terminology should not cause confusion. In graph theory a tree is usually defined more generally to be a connected graph without circuits. Derivation trees satisfy the additional requirement that, when a direction (up and down the page) is imposed upon them, they have exactly one root.


Conditional Proof is a good example of such a rule: it transforms a derivation of the conclusion B from the assumption A into one of the conclusion A → B which no longer depends on A. Because of their tree structure, the derivations of a natural calculus may contain more than one occurrence of a given assumption. The question then arises as to whether all or only some of these occurrences should be discharged and, indeed, whether any assumptions need be discharged at all. The answer to this seemingly unimportant question, that the application of such a rule may discharge some, none, or all occurrences of the appropriate assumptions, is of some significance.13 It is convenient, therefore, to suppose that the occurrences of an assumption in a derivation are grouped into one or more classes (not excluding the empty one) and to stipulate that the application of a rule of the second kind discharges all the members of an assumption class.

The basic rules of inference for natural deduction are as follows:

 (1)   A     B                 (2) a.  A ∧ B      b.  A ∧ B
      --------                        -------         -------
       A ∧ B                             A               B

 (3) a.    A      b.    B      (4)           [A]    [B]
        -------      -------        A ∨ B     C      C
        A ∨ B        A ∨ B          -------------------
                                             C

 (5)  [A]                      (6)   A     A → B
       B                            ------------
     ------                              B
     A → B

 (7)  [A]                      (8)   A     ¬A
       ⊥                            ----------
      ----                              ⊥
       ¬A

 (9)   A(a)                   (10)   ∀xA(x)
      -------                       --------
      ∀xA(x)                          A(t)

(11)   A(t)                   (12)            [A(a)]
      -------                        ∃xA(x)     C
      ∃xA(x)                         ---------------
                                           C

In each case, the premise(s) of the rule is (are) separated from its conclusion by a horizontal line. The formulae in square brackets stand for the assumption occurrences, if any, discharged by the inference. (They are written above the premise in whose derivation they may occur.) In rules (10) and (11), t, like a in rules (9) and (12), is supposed to be any parameter of ℒ. Given a language with function symbols, however, t could be any term, whereas a would still have to be a parameter. a is sometimes called the proper parameter or eigenvariable of the inference. It must satisfy the following restrictions: in (9), a cannot occur in the conclusion of the inference nor in any assumption on which the premise depends; in (12), a cannot appear in either premise of the inference, nor in any assumption on which the derivation of the premise C depends except for those of the form A(a). It is easy to generate invalid inferences by ignoring these restrictions on the proper parameter.

13 See "Assumption Classes in Natural Deduction" by D. Leivant, Zeitschrift für mathematische Logik und Grundlagen der Mathematik, Vol. 25, 1979, pp. 1-4. The issue is not one of deductive strength, but has to do with properties of the procedure by which a derivation can be reduced to normal form. These will be discussed in the next chapter.
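The restriction on the proper parameter in rule (9) can be sketched as a mechanical check (the encoding and names below are illustrative assumptions, not the book's): a must occur neither in the conclusion ∀xA(x) nor in any open assumption on which the premise depends.

```python
def params_of(formula):
    """Collect the parameters occurring in a tuple-encoded formula.
    For this sketch, parameters are strings beginning with a, b or c;
    variables begin with x, y or z."""
    tag = formula[0]
    if tag == "pred":
        return {t for t in formula[2] if t[0] in "abc"}
    if tag == "bot":
        return set()
    if tag == "not":
        return params_of(formula[1])
    if tag in ("and", "or", "imp"):
        return params_of(formula[1]) | params_of(formula[2])
    if tag in ("all", "ex"):
        return params_of(formula[2])
    raise ValueError(formula)

def forall_intro_ok(a, conclusion, open_assumptions):
    """Rule (9): a may occur neither in the conclusion ∀xA(x) nor in
    any assumption on which the premise A(a) depends."""
    if a in params_of(conclusion):
        return False
    return all(a not in params_of(f) for f in open_assumptions)

# Invalid: inferring ∀xP(x) from P(a) while P(a) is still an open assumption.
Pa = ("pred", "P", ("a",))
allP = ("all", "x", ("pred", "P", ("x",)))
```

The check rejects exactly the kind of invalid inference mentioned in the text: ∀xP(x) may not be inferred from the premise P(a) so long as P(a) itself remains open.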

Writing the rules in two columns, as in the preceding paragraph, exhibits their symmetry. There is a pair of rules for each operator *: the conclusion of the left member of the pair is a formula with * as its principal operator; the right member has such a formula as a premise. (This statement is slightly inaccurate, since (2) and (3) each comprise a pair of rules. For ease of expression, however, I shall continue referring to each as though it were a single rule.) The rule on the left introduces * into its conclusion; the rule on the right eliminates * from its premise. Following Gentzen, therefore, the odd numbered rules are usually called Introductions, and the even numbered ones Eliminations. When an elimination rule has a multiplicity of premises, the one whose principal operator is being eliminated is called its major premise; the remainder are minor premises. For example, (5) is called →-Introduction, and (6) →-Elimination—rather than conditional proof and modus ponens, respectively—and the major premise of (6) is the one of the form A → B. It is on this symmetry that Gentzen's "very general theorem" depends. Note that, if ¬ is treated as a defined symbol with ¬A an abbreviation for A → ⊥, the introduction and elimination rules for ¬ become special cases of the corresponding → rules.
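That ¬ can be treated as defined is easy to verify mechanically. In this sketch (tuple encoding and names mine), ¬A abbreviates A → ⊥, and the step from A and ¬A to ⊥ is literally an instance of →-elimination:

```python
BOT = ("bot",)

def neg(a):
    """¬A abbreviated as A → ⊥."""
    return ("imp", a, BOT)

def imp_elim(major, minor):
    """→-elimination (rule 6): from A → B and A, conclude B."""
    tag, antecedent, consequent = major
    assert tag == "imp" and antecedent == minor, "major premise must be minor → B"
    return consequent

# ¬-elimination (rule 8) falls out as the special case B = ⊥.
A = ("pred", "P", ("a",))
```

With this reading, imp_elim(neg(A), A) returns ⊥, which is precisely the conclusion rule (8) would give.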

The system characterized by the rules given above is called minimal logic. It is weaker than intuitionistic logic as formalized by Heyting. The system NJ is obtained from it by adding the intuitionistic negation rule

      ⊥
     ---
      A

that any conclusion follows from a falsehood.14 To obtain NK there are a number of different possibilities. Gentzen adds as axioms all instances of A ∨ ¬A, the law of excluded middle. He also considers adding instead a rule of double negation,

     ¬¬A
     ----
      A

14 A is assumed to be distinct from ⊥. Nothing much depends upon this restriction, but it simplifies the statement of certain results below.


Prawitz, in his monograph,15 favors the classical negation rule

     [¬A]
      ⊥
     ----
      A

And, finally, there is

     [A]   [¬A]
      B     B
     ----------
         B

This last is preferable in some respects.16 NK, without further qualification, will refer to any system obtained from NJ by adding one of the four principles or rules just described. Unfortunately, the addition of a negation rule, whether classical or intuitionistic, spoils the symmetry of the calculus.

NJ and NK can be shown to be deductively equivalent to any of the familiar axiomatic formulations of intuitionistic and classical logic, respectively, in the sense that:

(1) If there is an NJ [NK] derivation of the conclusion C from the assumptions A1, ..., An, then (A1 ∧ ... ∧ An) → C is a theorem of intuitionistic [classical] logic.

(2) If T is a theorem of intuitionistic [classical] logic, then there is an NJ [NK] derivation of T from no assumptions.

These equivalences were first established by Gentzen for particular axiomatic systems due to Glivenko, and Hilbert and Ackermann.17 The argument in both directions is a straightforward induction on the length of derivations.

The relationship between introductions and eliminations is explained by Prawitz in the following terms.

Observe that an elimination rule is, in a sense, the inverse of the corresponding introduction rule: by an application of an elimination rule one essentially only restores what had already been established if the major premise of the application was inferred by an application of an introduction rule.18

Consider, for example, the derivation

      Π1    Π2
      A     B
     ---------
       A ∧ B
       -----
         A

15 Natural Deduction: a proof-theoretical study, Stockholm, 1965.

16 See Chapter 7 below.

17 Gentzen, op. cit., pp. 115-131.

18 Prawitz, op. cit., p. 33.


Its conclusion A is obtained by applying ∧-elimination to the premise A ∧ B, which itself has been inferred by ∧-introduction. It accomplishes no more, however, than the derivation Π1 of A, and that in a roundabout way. A slightly more complicated example is provided by:

              [A]
              Π2
      Π1       B
      A      ------
             A → B
      -------------
            B

This may be replaced by the derivation

      Π1
      A
      Π2
      B

obtained from Π2 by replacing each occurrence of the assumption A discharged by the indicated application of →-introduction by a copy of the derivation Π1 of A (or by Π2 if no assumption was discharged). In each case, one derivation is replaced by another of the same conclusion from the same or fewer assumptions which is more direct in the sense that a detour, introduction followed by elimination, has been removed.
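The contraction just described can be sketched as an operation on tree-encoded derivations. The encoding below is my own simplification (it ignores assumption classes and treats every open occurrence of the discharged formula as discharged): contracting the detour grafts the derivation of the minor premise onto the discharged assumptions.

```python
def assume(formula):
    return {"rule": "assume", "concl": formula, "prems": []}

def and_i(d1, d2):
    return {"rule": "andI", "concl": ("and", d1["concl"], d2["concl"]),
            "prems": [d1, d2]}

def imp_i(assumption, d):
    """Discharge the open assumptions of the given formula (a
    simplification of the book's assumption classes)."""
    return {"rule": "impI", "concl": ("imp", assumption, d["concl"]),
            "prems": [d], "discharged": assumption}

def imp_e(major, minor):
    return {"rule": "impE", "concl": major["concl"][2], "prems": [major, minor]}

def graft(d, formula, replacement):
    """Replace each assumption occurrence of `formula` by `replacement`."""
    if d["rule"] == "assume" and d["concl"] == formula:
        return replacement
    new = dict(d)
    new["prems"] = [graft(p, formula, replacement) for p in d["prems"]]
    return new

def contract_imp_detour(d):
    """→-introduction immediately followed by →-elimination:
    replace the detour by grafting the minor premise's derivation
    onto the discharged assumptions, as in the text."""
    major, minor = d["prems"]
    assert d["rule"] == "impE" and major["rule"] == "impI"
    return graft(major["prems"][0], major["discharged"], minor)

# Detour: from [A] and C derive A ∧ C, discharge A by →I, then re-supply A by →E.
A, C = ("pred", "P", ()), ("pred", "Q", ())
detour = imp_e(imp_i(A, and_i(assume(A), assume(C))), assume(A))
direct = contract_imp_detour(detour)
```

The contracted derivation proves the same conclusion A ∧ C, but its last inference is the ∧-introduction rather than the eliminated →-detour.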

Another kind of detour may be introduced into a derivation by the intuitionistic negation rule. This is so when the conclusion of the rule is itself the major premise of an elimination. Consider, for example, the derivations

      Π                    Π
      ⊥                    ⊥
    ------            A  ------
    A ∧ B                A → B
    ------            ----------
      B                    B

In either case, the simplification of the conclusion of the negation rule effected by eliminating its principal operator is more straightforwardly accomplished by inferring the simpler conclusion directly. In other words, either derivation could just as well be replaced by

      Π
      ⊥
     ----
      B

Call an occurrence of a formula in a derivation maximal if it serves both as the conclusion of an introduction or the intuitionistic negation rule and as the major premise of an elimination. The above examples show how a maximal formula occurrence of the form A ∧ B or A → B may be removed from a derivation without changing its conclusion or increasing the number of assumptions on which that conclusion depends. Using these as a model, it is not hard to formulate similar procedures for each of the other logical operators, i.e., for removing maximal formula occurrences of the forms A ∨ B, ¬A, ∀xA(x) and ∃xA(x).19 Note that a derivation terminating in a redundant application of ∃- or ∨-elimination (i.e., one which discharges no occurrence of an assumption) whose major premise is maximal will be replaced by the derivation of (one of) its minor premise(s). A derivation whose last inference has a maximal major premise is sometimes called a redex, and the derivation by which it is replaced is its contractum. Π1 is said to be a subderivation of Π2 whenever the latter can be constructed from the former using the rules of inference, and Π1 is said to reduce in one step to Π2 if the latter can be obtained from the former by replacing a subderivation having the form of a redex by its contractum. Roughly speaking, a derivation is said to be in normal form if it contains no maximal formula occurrences. The normal form theorem for NJ asserts that every derivation can be reduced to normal form (where the reduction relation is the transitive, reflexive closure of one-step reduction). The proof proceeds by systematically removing maximal formula occurrences using the techniques illustrated above. Both the statement and proof of the theorem have been reconstructed by Prawitz from passing remarks to be found in Gentzen's original paper.20

This deliberately simplified account of the matter will be elaborated in the chapters which follow. There are, however, a couple of points which deserve mention here. The first is that, in order to remove a maximal occurrence of A → B or ¬A, it is necessary to know which occurrences of the assumption A are discharged by the relevant introduction inference. Similarly, to remove maximal occurrences of disjunctions or existential formulae, one needs to know which assumption occurrences are discharged by their respective elimination rules. Each derivation tree must therefore be such that this information can be read off from it. There are a number of ways to accomplish this. The one adopted in the sequel is to replace the formulae occurring at each vertex by indexed formulae; these will be expressions of ℒ with a numerical subscript. Assumption occurrences of the same formula with the same index are said to belong to the same assumption class; each application of →-introduction, ¬-introduction or ∃-elimination will discharge all the members of at most one class, and ∨-elimination all the members of at most two. Classes will be identified by their numerical indices, so that a natural number (or two) beside the conclusion of an inference which discharges a particular assumption class will be sufficient to provide the requisite information. Unfortunately, this necessary device complicates the structure of the derivations.
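The indexing device can be sketched as follows (encoding mine): assumption leaves carry a numerical index, a discharging inference records the single class index it closes, and the open assumptions of a derivation are computed by erasing that class.

```python
def open_assumptions(d):
    """Return the set of (formula, index) pairs on which the conclusion
    of a tree-encoded derivation still depends."""
    if d["rule"] == "assume":
        return {(d["concl"], d["index"])}
    opened = set()
    for p in d["prems"]:
        opened |= open_assumptions(p)
    if "discharges" in d:          # e.g. an →-introduction naming a class
        opened = {(f, i) for (f, i) in opened if i != d["discharges"]}
    return opened

A, B = ("pred", "P", ()), ("pred", "Q", ())
# Two occurrences of A in the same class 1, one occurrence of B in class 2.
leafA1 = {"rule": "assume", "concl": A, "index": 1, "prems": []}
leafA2 = {"rule": "assume", "concl": A, "index": 1, "prems": []}
leafB  = {"rule": "assume", "concl": B, "index": 2, "prems": []}
body = {"rule": "andI", "concl": ("and", A, ("and", A, B)),
        "prems": [leafA1, {"rule": "andI", "concl": ("and", A, B),
                           "prems": [leafA2, leafB]}]}
# →-introduction discharging the whole of class 1 at once:
d = {"rule": "impI", "concl": ("imp", A, body["concl"]),
     "prems": [body], "discharges": 1}
```

A single application of →-introduction thus discharges every member of class 1, leaving only class 2 open.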

The second point is more fundamental. The preceding account of reduction and normal form is adequate only for NJ⁻, the fragment of NJ

19 See Prawitz, Natural Deduction, pp. 35-38 for details.

20 Prawitz, ibid., especially Chapter IV.


without ∨ and ∃. (NJ⁻ is called the negative fragment of NJ.) Even a cursory inspection reveals that the eliminations for these two operators differ from the other rules: an application of either one yields a conclusion of the same form as its minor premise(s), albeit dependent on different assumptions. Hence, a derivation of NJ may contain a sequence of occurrences of the same formula—say, A1, ..., An—such that, for 1 ≤ i < n, Ai+1 is the conclusion of an inference with minor premise Ai. A sequence of this kind which is not properly contained in another such sequence is called a segment. It is useful to distinguish between four kinds of segment:

(1) If A1 is the conclusion of an introduction or the intuitionistic negation rule and

  a. An is the major premise of an elimination, then it is called a maximal segment,

  b. otherwise it is called an introduction segment.

(2) If A1 is an assumption or the conclusion of an elimination and

  a. An is the major premise of an elimination, then it is called an elimination segment,

  b. otherwise it is called a minimal segment.

Suppose now that A1, ..., An is a maximal segment; then (unless n = 1) none of its members is itself a maximal formula occurrence, even though the segment as a whole constitutes a detour of the same kind as a maximal formula would. It is desirable, therefore, that a normal derivation should contain no maximal segment. This necessitates the addition of reduction procedures for their removal. Prawitz's solution to this problem utilizes another distinctive feature of ∨- and ∃-elimination, the fact that they can be permuted with other inferences without necessarily destroying the structure of a derivation. He introduces permutative reductions which diminish the length of maximal segments by permuting the elimination inference whose major premise is An with the application of ∨- or ∃-elimination whose conclusion is An. For example, the derivation on the left below—assuming the lower occurrence of A ∧ B belongs to a maximal segment—reduces to the one on the right

              [C(a)]                        [C(a)]
      Π1        Π2                            Π2
   ∃xC(x)     A ∧ B                         A ∧ B
   ----------------                         ------
        A ∧ B                      Π1         A
        ------                  ∃xC(x)
          A                     ---------------
                                      A

and the other permutative reductions follow the same pattern.21 The result of n - 1 applications of this procedure will be a maximal segment of

21 See Prawitz, op. cit., p. 51. There is a minor technical problem which may arise when an elimination rule with more than one premise is permuted with ∃-elimination: the proper parameter of the inference may no longer satisfy the restrictions placed on it.


length 1, i.e., a maximal formula occurrence, which can then be removed in the usual way. Although the proof of the normal form theorem is not much complicated by their presence, these reductions introduce problems of another kind which will be discussed in the next chapter.
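A permutative reduction of this kind can be sketched on tree-encoded derivations (encoding mine; the eigenvariable bookkeeping mentioned in the footnote is ignored): when the major premise of an ∧-elimination is the conclusion of an ∃-elimination, the ∧-elimination is pushed up into the minor premise, shortening the segment by one.

```python
def and_elim_left(d):
    """∧-elimination (rule 2a): from A ∧ B conclude A."""
    return {"rule": "andE1", "concl": d["concl"][1], "prems": [d]}

def permute(d):
    """Permute a one-premise elimination with the ∃-elimination that
    derives its major premise, as in the figure in the text."""
    major = d["prems"][0]
    assert d["rule"] == "andE1" and major["rule"] == "exE"
    existential, minor = major["prems"]
    return {"rule": "exE", "concl": d["concl"],
            "prems": [existential, and_elim_left(minor)]}

A, B = ("pred", "P", ()), ("pred", "Q", ())
ex = {"rule": "assume", "concl": ("ex", "x", ("pred", "C", ("x",))), "prems": []}
minor = {"rule": "assume", "concl": ("and", A, B), "prems": []}  # stands for Π2
before = and_elim_left({"rule": "exE", "concl": ("and", A, B),
                        "prems": [ex, minor]})
after = permute(before)
```

Before the reduction the last inference is the ∧-elimination; afterwards it is the ∃-elimination, with the ∧-elimination now applied inside the minor premise.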

The virtue of normal derivations is that, for the most part, their elimination inferences come before their introductions. More precisely, define an ordering relation on the formula occurrences in a derivation as follows:

(1) A premise of an introduction inference or of an application of the intuitionistic negation rule precedes its conclusion.

(2) The premise of an application of ∧- or ∀-elimination precedes its conclusion.

(3) A minor premise of an application of ∨- or ∃-elimination precedes its conclusion.

(4) The major premise of an application of →- or ¬-elimination precedes its conclusion.

(5) The major premise of an application of ∨- or ∃-elimination precedes every assumption occurrence discharged by that inference.

(6) If A precedes B and B precedes C, then A precedes C.

According to the above, nothing precedes an open assumption occurrence and nothing is preceded by the conclusion of the derivation. Note also that nothing is preceded by the minor premise of an application of ¬- or →-elimination. Inferences can now be ordered by the precedence relation on their premises, the major premise in the case of an elimination. It then follows that, for normal derivations, no elimination is preceded by an application of the intuitionistic negation rule, no application of the intuitionistic negation rule is preceded by an introduction, and no elimination is preceded by an introduction. Using these facts, it is not difficult to verify that normal derivations have the subformula property in the sense that each formula (not of the form ⊥) occurring in a normal derivation is a subformula of an open assumption or of the conclusion. If ¬ is defined in the manner suggested earlier, then the parenthetical qualification involving ⊥ can be dropped.
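The subformula relation invoked here is the usual one; a sketch (encoding mine, with the simplification that instances A(t) of a quantified formula are not chased):

```python
def subformulas(f):
    """The set of subformulas of a tuple-encoded formula, including f
    itself.  For quantified formulas only the matrix is descended into,
    which is enough for this illustration."""
    tag = f[0]
    if tag in ("pred", "bot"):
        return {f}
    if tag == "not":
        return {f} | subformulas(f[1])
    if tag in ("and", "or", "imp"):
        return {f} | subformulas(f[1]) | subformulas(f[2])
    if tag in ("all", "ex"):
        return {f} | subformulas(f[2])
    raise ValueError(f)

def has_subformula_property(formulas_in_derivation, open_assumps, conclusion):
    """Check the property as stated in the text: every formula occurring
    in a normal derivation (⊥ aside) is a subformula of an open
    assumption or of the conclusion."""
    allowed = subformulas(conclusion)
    for a in open_assumps:
        allowed |= subformulas(a)
    return all(f in allowed or f == ("bot",) for f in formulas_in_derivation)

A, B = ("pred", "P", ()), ("pred", "Q", ())
```

A derivation containing A ∨ B, for instance, cannot have the property if neither its conclusion nor any open assumption has A ∨ B as a subformula.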

An alternative description of the structure of normal derivations, which is more in the spirit of Prawitz,22 runs as follows. For any derivation Π, let TΠ be the family of subsets of vertices of Π which are linearly ordered by the precedence relation, and call the maximal elements of TΠ23 routes through Π; routes containing the conclusion of Π are called main routes.

In such a case, before permuting the inferences, one can simply replace it by a more suitable choice throughout the derivation of the minor premise.

22 See Natural Deduction, Chapter IV, Section 2.

23 These are maximal with respect to the inclusion relation on TΠ; they are not to be confused with maximal formula occurrences.


Then each route through a normal derivation consists of three successive parts (some of which may be empty):

(1) an analytic part comprising formulae which belong to some elimination segment;

(2) a minimal segment;

(3) a synthetic part comprising formulae which belong to some introduction segment.

With the exception of the rules for negation, the conclusion of an elimination is always a subformula of its major premise and a premise of an introduction is always a subformula of its conclusion. It should be obvious, therefore, that every formula (other than ⊥, perhaps) appearing in a route through a normal derivation is a subformula of the first or last formula in the route. It follows that normal derivations have the subformula property described above, even though a route need not begin with an open assumption nor terminate with the conclusion. To see this, consider the other possibilities:

(1) The route begins with an assumption discharged by an application of ¬- or →-introduction. In this case, the initial formula will be a subformula of the conclusion of the inference and, hence, of the last formula in the route.

(2) The route terminates with the minor premise of an application of ¬- or →-elimination. In this case, the terminal formula will be a subformula of the major premise of the inference and, hence, of the initial formula in any route on which that premise lies.

(3) The route terminates with the major premise of a redundant application of ∨- or ∃-elimination. In this case the route can contain only an analytic part and the terminal formula will be a subformula of the initial formula in the route.

Of course, in cases (2) and (3), the initial formula in question may have been discharged by an application of -i- or —>- introduction and case (1) will apply. On the other hand, in case (1), the last formula in the route may be the minor premise of an application of ->- or —•-elimination and case (2) will apply. Nevertheless, it is easy to see that no circularity can arise, and a straightforward induction on any reasonable measure of the complexity of a normal derivation (e.g., the number of inferences it contains) establishes that it has the subformula property.

Not only did Prawitz reconstruct Gentzen's original result for NJ, he also succeeded in extending it to classical logic, albeit at the cost of some artificiality. Utilizing the definability of V and 3, he was able to restrict attention to the fragment of NK without these operators and to prove a

24 Normalization, Cut-Elimination and the Theory of Proofs

version of the normal form theorem for the resulting system. The first step in the proof is to show that each application of the classical negation rule

hA] J_ A

can be replaced by one or more applications in which A is atomic. This is accomplished by means of a set of reductions which lower the complexity of A. For example, the figure on the left below reduces to the one on the right (provided that a is not the proper parameter of any inference in II):

[VxB(x)] B(a) hB(a)}

[-.VxB(ar)] J. n -.Vxfl(x) j . n

MxB(x) X Bja)

VxB{x)

The assumption occurrences VxB(x) are discharged by -"-introduction; the classical negation rule is used only to discharge -*B(a), a formula of lower complexity than -WxB(x). The reductions for A of the form -«B, B —• C and B A C are similar.24 (It is this step in the proof which fails when V and 3 are included.) Maximal formula occurrences can then be removed as before, and the resulting normal derivations will still have a form of the subformula property: each formula occurrence which is not of the form _L, nor an assumption discharged by an application of the classical negation rule, is a subformula of an open assumption or the conclusion.

Gentzen was less sanguine about the prospects for a useful normal form theorem for NK and, for this reason, abandoned natural deduction in favor of sequent calculi. He describes his aim as "to formulate a deductive calculus (for predicate logic) which is logistic on the one hand, i.e., in which the derivations do not, as in the calculus NJ, contain assumption formulae, but which, on the other hand, takes over from the calculus NJ the division of forms of inference into introductions and eliminations of the various logical symbols."25 Natural deduction systems have an inconvenient feature not shared by axiomatic (logistic) calculi: some of their rules— notably, those like —^-introduction which discharge assumptions and those like V-introduction which place restrictions on the proper parameter of the inference—are properly applied to the derivations of their premises, rather than to the premises by themselves. This introduces some additional complexity into the structure of derivations, and makes axiomatic calculi easier

24See Prawitz, op. cit, p. 40, for details. 25Gentzen, p. 82.

Background 25

to work with in some contexts. Nevertheless, this is not the crucial difference between N and L systems. Gentzen turned to sequent calculi because they enabled him to give a uniform treatment of classical and intuitionistic logic; in particular, no additional rules were required to obtain the former from the latter.

There are axiomatic systems related to NJ and NK in a very direct way, namely, those for deriving statements of the form "C is deducible by the rules of NJ (or NK) from the assumptions A1, . . . , An" By introducing a two-place relation symbol h (for deducibility) and interpreting the rules of natural deduction as clauses in an inductive definition of this relation, it is possible to transform NJ and NK into logistic calculi. The axioms will be of the form A h A, A is deducible from the assumption A. A-introduction will translate into

ThA A h f f r , A h A A 5 '

and A-elimination into the pair

T\-AAB , T\-AAB _ _ _ _ _ a n d _ _ _ _ _

Translation of the remaining rules is equally straightforward. Notice that —•-introduction becomes

r,A\- B rh A-+ B

so, to allow the requisite flexibility in discharging assumptions,26 it is better not to read T and A simply as sets of formulae. They are most conveniently handled as sets of indexed formulae, and will be so in subsequent chapters. For Gentzen, however, T and A were sequences of formulae, and it is his treatment that will be followed here. (I\ A is the concatenation of T with A, and likewise for I\ A.) He called expressions of the form F h A, and even T h A, sequents.

The calculi sketched in the previous paragraph, although related to Gentzen's LJ and LK, differ from them in a significant respect. The theorems of the L calculi are sequents, and their rules are inspired by those of natural deduction. In fact, each introduction rule of NJ is translated into a sequent rule in the manner described above. The elimination rules are treated differently however. Consider, for example, the A-elimination rule which asserts that A follows from A A B. This becomes inverted in the sequent calculus into a rule which asserts that whatever can be deduced from A can also be deduced from A A B:

A,r\-c AAB,T\-C

See note 13 above.

26 Normalization, Cut-Elimination and the Theory of Proofs

All the logical rules of a sequent calculus will be introduction rules, but the symmetry between introductions and eliminations will be preserved in the symmetry between rules which introduce an operator into a formula on the right of h and those which introduce the same operator into a formula on the left. It is not immediately apparent that a left rule has the same deductive strength as the elimination which inspired it. Consider an N derivation of the conclusion C from the assumptions A1,... ,An which contains an application of A-elimination. If that inference were to be removed, the derivation would split into two parts: the subderivation terminating in the premise of the A-elimination (AAS, say) and a derivation terminating in C, with the conclusion of the missing A-elimination (A, say) as an additional open assumption. In other words, A A B is deducible from Al,...,An

( r h i A B), and C is deducible from A, Al,...,An (A, A h C—where the terms of T and A are drawn from among A1,... ,An). The problem then is to put the derivation back together again, i.e., to infer the sequent r , A h C, using A-left. This rule will allow us to infer A A B, A h C from A, A h C, but something more is needed. It is for this reason that Gentzen introduced the cut rule:

$ h P P, 9 h Q

Applying cut to T \- A A B and A A B, A r- C as premises immediately

yields the desired conclusion.27 A sequent calculus will consist therefore of

(1) Axioms (2) Right (introduction) rules for the logical operators—which are noth

ing more than the translations of the introduction rules into sequent notation

(3) Left (introduction) rules for the operators—which are related to the elimination rules and from which their sequent translations can be derived with the aid of cut

(4) Cut and, in addition, some structural rules for manipulating the sequences to the left and right of h; these comprise rules for padding a sequence (thinning), rearranging its terms (interchange) and identifying occurrences of the same formula (contraction).

Here is a list of the axioms and rules for LK:

Axioms: AY- A

2 7In fact, it is even easier to derive directly the translation of A-elimination into sequent notation, using cut together with A-left, as the following shows:

A\- A T\-AAB AAB\-A

r\- A

Background 27

Logical Rules:

Right Left

T\-A,A n - A , s A,T\-A B,r\-A T I - A , y l A £ AAB,T\-A A/\B,T\-A

T I - A , A T I - A , B A , r i - A BJ^rA r\-A,A\/B T\-A,AVB

A,T\- A,B

r\- A,A-*B

i , r h A

Y\-A,A{a) * r\-A,VxA(x)

r\-A,A{t) _ r h A,3xA(x) 3xA(x),T\-A

* where a, the proper parameter, does not occur in T, A, or A(x)

Cut Rule: r h A, A A,Q\-V

A\ZB,T\- A

r\-A,A 5 , $ h $

A ^ B,r,<a\- A ,$

ri- A,A n^ri-A

A(t),T\- A VxA(x),T\- A

A(a), T\- A *

r , $ h A , f

iral Rules:

Right Left

Thinning rf-A

rh A,A

r i - A ,4,rh A

Contraction r\- A,A,A

ri- A,A A,A,T\- A

A,r\- A

Interchange r\-$,B,A,$

$,A,B,tf 1- A $ , £ , A , # h A

LJ is obtained from LK by requiring that the sequence on the right of a sequent have no more that one term. Note that empty sequences are allowed. There is no constant J_ for falsity, but an empty sequence on the right serves the same function: T h can be interpreted as "a falsehood is deducible from the assumptions in IY' The intuitionistic negation rule then disappears into an application of right thinning. There is no need to add a special classical negation rule. To see this observe that, if more than one term is allowed to the right of h, all instances of the law of excluded

28 Normalization, Cut-Elimination and the Theory of Proofs

middle can easily be derived as theorems:

A\- A f- A,-^A

h A,A\Z^A \-AV-*A,A

h A V ^A,A V -^A h Ay ^A

Every rule 1Z of the sequent calculus, with the exception of cut, is such that any formula which occurs in a premise of an inference involving 71 must be a subformula of some formula in its conclusion. This means that derivations which do not employ cut will have the subformula property, i.e., if d is a cut-free derivation of T r- A and A occurs anywhere in d, then A is a subformula of some formula occurring in T, A. Gentzen's Hauptsatz, the analogue for sequent calculi of his normal form theorem for NJ, asserts:

Every LJ or LK derivation can be transformed into another LJ or LK derivation of the same sequent in which no cuts occur.

The proof, although similar in kind to that of the normal form theorem, contains some additional complications. First, Gentzen found it convenient to replace cut by the mix rule:

r> A $H v r ,$ A hA A , t f

A and $ are each supposed to contain at least one occurrence of the mix formula, A, while A^ and $A result from them by removing all occurrences of A. In the presence of the other structural rules, mix is clearly equivalent to cut in deductive strength. He then described a procedure for systematically eliminating mixes from a derivation. Details of the proof can be found in Section III.3 of his paper.28 The conclusion of an application of mix where at least one of the premises is an axiom can be obtained directly from the other premise, using contraction and interchange if necessary. These mixes can therefore be eliminated without difficulty. The idea now is to transform all the mix inferences in a derivation into trivial ones of this kind by first moving them upwards and then replacing them, where necessary, by mixes applied to logically simpler formulae. A non-trivial mix can always be permuted with an inference immediately above it except when some occurrence of the mix formula on the right side of the left premise or the left side of the right one has itself been introduced by that inference. In such a case, the occurrence in question is said to be active m the inference. The procedure for simplifying or eliminating a

2 8 The proof is also reproduced in Chapter XV of Introduction to Metamathematics by S.C. Kleene, Amsterdam, 1952. The proof of a slightly stronger result for a version of LJ is sketched in Appendix A below.

Background 29

non-trivial mix depends therefore on whether its mix formula has an active occurrence. There are three possibilities.

(1) When there are no active occurrences of the mix formula in at least one premise, the mix can simply be moved upwards by permuting it with the inference which yielded that premise.29

(2) If the mix formula is active in both premises, but one of them was derived by the application of a structural rule, the structural inference can be eliminated and the mix either moves up automatically as a result or disappears in favor of applications of thinning and interchange.

(3) If the mix formula is introduced into both premises by a logical rule, there are two kinds of case to consider:

a. The premises also contain inactive occurrences of the mix formula. Consider, for example, a mix formula of the form A A B and suppose that this same formula occurs also in 3> below:

d2

r h A AAB,$\-V r,$AABh AAAB,V

This mix inference can be replaced by two new ones. First, mix is applied before the application of A-left to yield:

di d2

TV- A A,9\-V T,A,$AAB\- &AAB,®

Second, interchange is used to move A leftward—call the resulting derivation d$—and A-left is then applied to it. Next, mix is applied to remove the new occurrence of A A B:

ds di A,r,$AAB h AAAB,$

r h A A/\B,r,$AABhAAAB,9 r , rAAB, $AAB •" AAAB,AAAB, *

Finally contraction and interchange can be applied to this conclusion to produce T, $AAB •" &AAB, ^-

b. The premises contain no inactive occurrences of the mix formula. Consider by way of example the case in which the mix formula has -> as its principal operator. Then it must have been

2 9This account glosses over a number of complications which can arise. It may, for example, be necessary to apply structural rules to the conclusion of the mix after carrying out the permutation. Furthermore, if mix is permuted with an application of V-right or 3-left, clashes with the proper parameter of the inference may occur. These are matters of detail however which, although inconvenient, present no real difficulties.

30 Normalization, Cut-Elimination and the Theory of Proofs

introduced by -»-left in the right-hand premise of the mix, and by -i-right in the left one:

d\ d2

A,T\- A $ h f r , , 4

The above can be transformed into the following:

d2 d\ $\-$,A A,T\-A

# , r A f - ^ , A Since -u4 does not occur in A or $ , thinning and interchange are now sufficient to yield the original conclusion.30

It should be plausible that moving mixes upwards closer to the axioms, as in (1) and (2) above, is a step in the right direction. The same is true of replacing a mix inference by one or more mixes of lower degree (where the degree of a mix is defined as the logical complexity of the mix formula), as in (3b) above. This is because the logical rules are all introductions; hence an atomic formula occurrence can come only from an axiom or a structural inference, and a mix applied to such a formula will be eliminable. It may be doubted however that the kind of transformation described in (3a) accomplishes anything. The first mix at least is higher up the derivation than the inference it replaces, but the second is not. Nevertheless, it is of lower rank than the original mix. (The rank of a formula A occurring on the right [left] side of a sequent S in a derivation is defined to be the largest number of consecutive sequents in a path31 such that the lowest of these is S and A occurs on the right [left] in each of them. The rank of a mix is the sum of the ranks of the mix formula in its two premises.) Gentzen's proof proceeds by induction on the degree of a mix and, for fixed degree, by induction on rank. It is shown that a mix with no mixes above it in the derivation can be eliminated, and this result is then applied to remove all the mixes in a derivation from top to bottom.

The preceding is no more than a sketch of the proof of the Cut-Elimination Theorem; the interested reader can consult the references given earlier for a full account. Despite its added complexity, the argument does have the virtue of working for both classical and intuitionistic logic. Since it makes no reference to the distinction between LK and LJ, it applies equally to both calculi. To see this, it is enough to observe that the various transformations, permutations and eliminations of cut inferences on 3 0Appendix B below contains a full list of the various transformations by which a cut may be either eliminated or replaced by less complex cuts for a version of LK with indexed sets rather than sequences of formulae. 3 1A path here is just a chain <Si,.. .Sn of sequents such that, for 1 < i < n, St is a premise of an inference with conclusion <Si+i-

Background 31

which the proof depends will always yield an L J derivation, when applied to one—and similarly for LK. Within the context of intuitionistic logic, however, the normal form theorem for NJ is not only simpler to prove than the corresponding result for £J, but also more illuminating in some respects. Normal forms, unlike cut-free ones, are unique.32 Furthermore, the similarity between (fragments of) NJ and various term calculi (the calculus of A-conversion, for example) reveals a connection between normal form theorems for derivations, on the one hand, and for terms on the other. As a result, techniques devised for proving theorems of the latter kind have been carried over to the former. This has led not only to normal form theorems for stronger systems based on natural deduction, but also to proofs of stronger results for NJ itself. I have in mind here the so-called strong normalization theorem for NJ. Recall that the normal form theorem is proved by describing a class of reduction steps, specifying a systematic procedure for applying these to any derivation and showing that this procedure must terminate in a normal derivation. The strong normalization theorem asserts that, given a derivation II, any (sufficiently long) sequence of such steps applied to n must terminate in a unique normal derivation, the normal form of II. A more detailed comparison between the two versions of Gentzen's Hauptsatz is reserved for the next chapter.

3 2This is true without qualification for the fragment without disjunction, but redundant applications of V-elimination (if they have a maximal major premiss) create problems which can be solved only by some rather arbitrary restrictions on their removal.

2

Comparing NJ with LJ

The relationship between introductions and eliminations in natural deduction, on the one hand, and right and left rules in the sequent calculus, on the other, gives rise to a kind of structural similarity between the derivations of LJ and those of NJ. Unfortunately, the presence of left thinning in Gentzen's version of the sequent calculus makes it difficult to exhibit this feature. For purposes of comparison, therefore, it is convenient to depart from his original formulation and omit the rule. To compensate for this omission, however, it becomes necessary to strengthen A-right, —»-right, -i-right, V-left and 3-left in the following manner:

T\-A,A T'\-A,B (A),r\-A,B (A),rhA r,r'hA,AAB T\-A,A^B r n A , - ^

Q4),rhA (J?),r'hA Q4(q)),rhA* i v B , r , r ' h A 3xA{x),rt- A

* a cannot occur in T, A or 3xA(x)

The point of enclosing formulae in parentheses is to indicate that they need not actually occur in a premise for the rule to be applicable. Thus revised, the rules correspond more closely to their correlates in NJ.1 A version of LJ modified in this way is proposed by Zucker in his paper "The Correspondence between Cut-Elimination and Normalization." He also abandons sequences of formulae in favor of sets of indexed ones, which enables him to dispense with the rule of interchange and simplifies matters somewhat.2

1Here, A is supposed to be empty in the three right rules, and to contain at most one formula in the two left ones. This restriction is abandoned in the modification of LK considered below.

2 Annals of Mathematical Logic, Vol. 7, 1974, pp. 1-156. See Section 2.2. The other major differences between his calculus and Gentzen's are:

(1) Only atomic formulae may appear in the axioms. (2) Negation is defined in terms of falsity, and the left and right negation rules re

placed by axioms of the form I h P , where P is atomic (and different from _L).

32

Compar ing NJ wi th LJ 33

Now that sets have replaced sequences T, A stands for r U A, and r , Ai for T U {Ai}. This notation is not intended to imply either that T fl A = 0 or that Ai 0 I \ (When I do want to indicate that T and A are disjoint, or that Ai is not a member of T, I will use T; A and T;Ai, respectively.) Once these revisions are made, it is easy to define, by induction on the rules, a mapping </> from the derivations of LJ to those of NJ. Axioms are mapped onto single assumptions. A derivation terminating with an application of a right logical rule is mapped onto the result of applying the corresponding introduction to the image(s) under </> of the derivation(s) of its premise(s). A derivation terminating with an application of a left logical rule is mapped onto the result of applying the corresponding elimination upwards to each member of the appropriate assumption class in the image(s) under 0 of the derivation(s) of its premise(s).3 An application of right thinning corresponds to an application of the intuitionistic negation rule, and left contraction is the amalgamation of a pair of assumption classes, (j) can be described as a homomorphism from LJ to NJ since it preserves logical structure as embodied in the rules of inference. It should be obvious that, if d is a derivation of the sequent T h C, then (j)(d) is a derivation of C whose open assumptions are the members of

r.4

Although <f> is clearly not a one to one map, it is possible to establish a sort of converse to the above: each NJ derivation II of C from A1,..., An can be transformed into an LJ derivation </>'(II) of the sequent Al,...,An h C. (The Al's are supposed to be indexed formulae now.) Again, the definition is by induction on the rules. Assumptions correspond to axioms and introductions to right rules as before. The only difference concerns the eliminations. Each application of an elimination will translate into an application of the corresponding left rule together with cut.5 The properties of <\> and <j>' are sufficient to establish the deductive equivalence of NJ and LJ in the sense that: C is derivable in NJ from the assumptions A1,..., An iff A1,..., An h C is derivable in LJ. LJ is therefore an adequate formalization of intuitionistic logic.6

(3) There is no thinning rule. So the only structural rules are left contraction and cut.

Nothing much depends upon whether negation is defined and thinning omitted— provided that the corresponding modifications are made to NJ—but, unless non-atomic formulae are allowed in the axioms, Lemma 2.2 below will not hold.

3Upward application means adding the premise(s) of a rule, when its conclusion is already present as an assumption.

4In Section 2.4 of his paper, Zucker gives an exhaustive description of the mapping <j> for his versions of LJ and NJy and establishes its basic properties.

5See note 27 of Chapter 1 for the translation of A-elimination. The same principles apply in the remaining cases. Gentzen describes a similar transformation of NJ derivations into LJ ones in Section V of his paper.

6Although the two versions of LJ do not yield exactly the same set of theorems, it is obvious that r h C is derivable in the original calculus iff V h C is derivable in the

34 Normalization, Cut-Elimination and the Theory of Proofs

There is no difficulty about transforming NK derivations into derivations of LK. The law of excluded middle is a theorem of LK, and the various classical negation rules all translate straightforwardly into derived rules of LK. Everything else is similar to the intuitionistic case. If LK is revised along the lines of L J,7 it is even possible to extend (/> to LK in such a way that (f)(d) is an NK derivation of WA from the members of T whenever d is a derivation of T h A (where WA is a disjunction of the formulae in A, or just ± if A is empty). This extension is rather unnatural, however, and (j> will no longer be a homomorphism. In any event, LK (in its original version) is easily shown to be a complete formalization of classical logic in the sense that: T h A is derivable in LK iff AT —• WA is classically valid, (AT is any conjunction of the terms of I\ If T is empty, AT —• VA is just VA.)8

Although superficially different, the cut-elimination theorem is accurately described as the sequent analogue of the normal form theorem. This can be seen by comparing NJ with the revised formulation of L J. For the latter calculus, the cut-elimination theorem can be proved directly, without recourse to the mix rule. The proof is much the same as the one outlined at the end of the previous chapter. As before, there are three cases under which a non-trivial cut may fall:

(1) The cut formula is not active in at least one premise. In this case, the cut is simply permuted with the inference which yielded that premise.

(2) The cut formula is active in both premises, but has been introduced by a structural rule in at least one of them. Here there are only two possibilities:

a. The left premise is the conclusion of a right thinning. This case is easily handled by replacing both thinning and cut by a new application of thinning:

d r\- d'

T\-A A , A h B r , A h #

b. The right premise is the conclusion of a contraction. Here, the contraction is removed with the result that two cuts are needed, where only one was required before:

revised one, for some V whose members (ignoring indices) are included among the terms of T. Hence, the adequacy of the latter establishes that of the former as well.

7This just means omitting left thinning and adopting the five strengthened rules. There is no need to restrict right thinning since it has a translation in NK under the mapping suggested below.

8 The relationship between NK, LK and Hilbert's axiomatization of first-order logic is discussed in more detail in the final section of Gentzen's paper. See also note 11 below.

d

r\- B

Comparing NJ with LJ 35

d>2 d\ d2

di Aj,Aj,A\- B di T\-A Aj,AjA\-B T\-A Aj,A\-B F\-A r,Aj,A\-B

r , A h 5 T,A\-B

(Ai and Aj are supposed to be occurrences of the same formula with different indices. We may assume that Aj & I\) This is the only case in which using cut rather than mix creates a problem. It is solved by redefining rank for indexed formulae in such a way that, if Ai and Aj have ranks n and m respectively in the premise of the contraction, Aj has rank n + m -f 1 in its conclusion. Then both cuts on the right have lower rank than the one on the left.

(3) The cut formula is introduced into both premises by a logical rule. This corresponds to case (3b) of the original proof, and the cut is replaced by a cut or cuts of lower degree in the manner illustrated there. (Notice that case (3a) of the original proof cannot arise when cut is used instead of mix.)

The absence of left thinning means that the procedures by which a cut is eliminated or replaced by one of lower degree may produce a sequent which contains fewer formulae on the left than the conclusion of the original cut. When this is so, it is necessary to go through the sequence of inferences below the cut, deleting those which no longer apply (because the formula on which they once operated has disappeared). Zucker calls this deletion process pruning. As he points out, it is cumbersome to define pruning in full, although the idea behind the definition is simple enough; its clauses are determined by the rules of inference.9 The inductive argument now goes through as before and the theorem holds in the form:

A derivation of T \- A can be transformed into a cut-free derivation of T' h A, for some T' C I \

A cut which can be reduced to a trivial cut, and thence eliminated, by permuting it upwards past other inferences (as in case (1)) and by removing contractions (as in case (2b)) is said to be inessential Essential cuts, on the other hand, require the procedures described in (2a) and (3) for their elimination. Call a derivation almost cut-free if it contains no essential cuts, then the following hold:

Theorem 2.1 For every derivation d of LJ, d is almost cut-free iff </>(d) is normal.

Theorem 2.2 For every derivation U of NJ, U is normal iff <fr'(U) is almost cut-free.

A definition is given in Zucker, op. cit., Sections 3.1.5. and 7.8.3.

36 Normalization, Cut-Elimination and the Theory of Proofs

Theorem 2.1 holds because each axiom of d corresponds to some formula belonging to the minimal segment of a route through (p(d) and, roughly speaking, each such route is constructed from these formulae by applying eliminations upwards and introductions below them, as required. So, if the rules of LJ are interpreted as an inductive definition of derivabil-ity in JVJ, the cut-free rules can be interpreted as a definition of normal derivability. More formally, the proof of Theorem 2.1 depends upon the following:

Lemma 2.3 / / d' can he obtained from d by eliminating a trivial cut, or by a procedure which falls under cases (1) or (2b), then (j)(d) is normal iff (j){d') is.

Lemma 2.3 is proved by verifying that <p(d) = <j)(d'), except when d' comes from d by permuting a cut with an application of V- or 3-left in its left premise. In these two cases (/>(df) is obtained by moving a group of similar applications of V- or 3-elimination downwards past one or more inferences and amalgamating them into a single application of the rule.10 ("Similar" just means that the respective premises of each application have the same derivation.) But it is easy to see that this sort of permutation, in either direction, can never introduce a new maximal segment into a derivation. In fact, both <j>(d) and <t>(d') will have routes of exactly the same structure; the only difference will be that the segments which constitute the routes may have different lengths.

Now let d be almost cut-free and d' result from d by removing all the inessential cuts. A routine induction on the construction of d' (i.e., on the rules of LJ without cut) establishes that <p(df) is normal and hence, by Lemma 2.3, that (j)(d) is too. Conversely, suppose that d is not almost cut-free. By Lemma 3, we can suppose that d contains a cut which actually falls under case (3), or under case (2a) with the proviso that the cut formula in the right premise is active and has been introduced by a logical rule. It is a straightforward matter to verify that the subderivation which terminates with this cut is mapped by (j) onto a derivation containing a maximal formula occurrence. For example, if the cut formula were A A B, there would be two possibilities.

(1) The subderivation in question is of the form:

d\ di dz TV A A h f f A , $ h C T,A\-AAB AAB,$\-C

T , A , $ h C

10See Zucker, op. cit. It follows from his Theorem 4.1 (Theorem 2.6 below) that <f>(d) is unaffected when 3 and V are not involved, and his remarks in Section 7.2 are helpful for clarifying the effect of permuting cut with an application of V-left.

Comparing NJ with LJ 37

whose image under </> is the result of substituting

Mdi) 4>{d2) A B

AAB

for the indicated assumptions A A B in

AAB A

<t>{d3)

(2) The subderivation has the form shown on the left below, while its image under <f> is as shown on the right:

d\ d,2 <Kdi)

± AAB

A 4>{d2)

T\-AAB AAB,A\-C

<Kdi) ±

AAB A

4>{d2) T,A\-C

<Kdi) ±

AAB A

4>{d2)

The remaining operators present no special difficulties, so the result follows by a trivial induction on the number of inferences below the cut.

For Theorem 2.2, let us suppose first that Π terminates with an elimination whose major premise belongs to a maximal segment (call such an inference a crucial elimination). We prove by induction on the length of the segment that φ′(Π) terminates with an essential cut. If the length is 1, the cut in question falls under case (3) (or case (2a) with the same proviso as before). This is just a matter of checking the various cases. For example, if the last inference of Π is an application of ∧-elimination and Π has the form

      Π₁      Π₂
      A       B
     ─────────── ∧-intro
       A ∧ B
      ─────── ∧-elim
         A

then φ′(Π) will be

     φ′(Π₁)    φ′(Π₂)
     Γ ⊢ A     Δ ⊢ B
     ─────────────── ∧-right
       Γ, Δ ⊢ A ∧ B          A ∧ B ⊢ A
       ─────────────────────────────── cut
                Γ, Δ ⊢ A

If A ∧ B had been introduced into Π by an application of the intuitionistic negation rule, the case would be exactly analogous, with thinning taking the place of ∧-right in φ′(Π). Now suppose that the segment has length n + 1 and let Π′ be obtained by permuting the last inference of Π with the application of ∨- or ∃-elimination immediately above it; then φ′(Π′) is obtained by permuting the final cut of φ′(Π) with the application of ∨- or ∃-left whose conclusion is its left premise. Hence, the induction step follows by applying the induction hypothesis to the subderivation of Π′ which terminates with a crucial elimination whose major premise belongs to a segment of length n. The theorem now follows, in one direction, by a trivial induction on the number of inferences below the last crucial elimination of a non-normal derivation.

38 Normalization, Cut-Elimination and the Theory of Proofs

The proof of the converse is scarcely more interesting. A formula A occurring on the right of a sequent in a derivation d is said to have been introduced by axioms if d is an axiom, or the last inference R of d is an application of a left rule or cut and every occurrence of A on the right in a premise of R was introduced by axioms. The following can then be proved by induction on the construction of normal derivations:

Lemma 2.4 For normal Π, if there is no main route through Π which terminates with an introduction segment, then the formula on the right of the conclusion of φ′(Π) was introduced by axioms.

It is also a straightforward matter to prove by induction on the construction of d₁:

Lemma 2.5 If d is of the form

       d₁           d₂
     Γ ⊢ A     A, Δ ⊢ B
     ────────────────── cut
          Γ, Δ ⊢ B

where A was introduced by axioms in the conclusion of d₁, then the last cut of d is inessential.

The desired result now follows from Lemmas 2.4 and 2.5 by induction on normal Π. There is nothing to show except when the last inference of Π is an elimination. In this case, the last inference of φ′(Π) will be a cut, and the derivation of its left premise will be φ′(Π′), where Π′ is the derivation of the major premise of the last inference of Π. Since Π is normal, it follows (by Lemma 2.4) that the cut formula in the conclusion of φ′(Π′) was introduced by axioms and, hence, (by Lemma 2.5) that the cut is inessential.

Theorems 2.1 and 2.2 establish a correspondence between normal derivations and (almost) cut-free ones (for intuitionistic logic, at least), but they fall short of establishing a correspondence between the two versions of the Hauptsatz. The cut-elimination theorem is not simply a demonstration of the completeness of the cut-free rules. If it were, it could be proved more simply by semantic methods. In fact, one of the most attractive features of LK is that its rules, minus cut, can be shown to be complete in a very straightforward way.11 The proof exploits the fact that each logical rule can be read backwards as an attempt to refute its conclusion. Given a sequent, the rules are applied backwards in an effort to find an interpretation which falsifies all the formulae on the right while verifying those on the left. The result will be a search tree. A branch of the tree is closed if the same formula appears on both the left and right side of its final sequent. Now, if the tree terminates with all its branches closed, it can be transformed into a derivation of the initial sequent simply by inverting it. On

11 Since it is obvious that cut is classically valid, this provides a rather direct proof of the adequacy of LK.


the other hand, if it does not, a counterexample to the initial sequent can be extracted from it.12
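The backward search described above can be sketched in code. The following is a minimal illustration of my own (not from the text, and ignoring quantifiers): formulae are nested tuples, the propositional rules of a cut-free classical calculus are applied backwards, a branch closes when some formula occurs on both sides of its sequent, and the search succeeds exactly when every branch closes.

```python
# Illustrative sketch only: backward proof search for cut-free classical
# propositional sequents.  Formulae are atoms (strings) or tuples such as
# ("and", A, B), ("or", A, B), ("imp", A, B), ("not", A).

def provable(left, right):
    """Search for a cut-free proof of the sequent  left |- right."""
    left, right = list(left), list(right)
    if set(left) & set(right):              # branch closes: axiom
        return True
    for i, f in enumerate(left):            # apply a left rule backwards
        if isinstance(f, tuple):
            rest = left[:i] + left[i + 1:]
            op = f[0]
            if op == "and":
                return provable(rest + [f[1], f[2]], right)
            if op == "or":                  # branching rule
                return (provable(rest + [f[1]], right) and
                        provable(rest + [f[2]], right))
            if op == "imp":
                return (provable(rest, right + [f[1]]) and
                        provable(rest + [f[2]], right))
            if op == "not":
                return provable(rest, right + [f[1]])
    for i, f in enumerate(right):           # apply a right rule backwards
        if isinstance(f, tuple):
            rest = right[:i] + right[i + 1:]
            op = f[0]
            if op == "and":
                return (provable(left, rest + [f[1]]) and
                        provable(left, rest + [f[2]]))
            if op == "or":
                return provable(left, rest + [f[1], f[2]])
            if op == "imp":
                return provable(left + [f[1]], rest + [f[2]])
            if op == "not":
                return provable(left + [f[1]], rest)
    return False                            # open branch: a counterexample

# Peirce's law, classically provable without cut:
peirce = ("imp", ("imp", ("imp", "A", "B"), "A"), "A")
```

Since each backward rule application removes a connective, the search terminates; an open branch assigns true to the atoms on its left and false to those on its right, yielding the counterexample mentioned above.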

What is missing from this proof of 'cut elimination' is the association of a particular cut-free form with each derivation. Whether this is regarded as a significant loss depends upon one's point of view. Many applications of the theorem remain unaffected. For these one needs only that cut-free derivations have the subformula property and that relatively weak principles are sufficient to show, for every derivation, the existence of a cut-free one with the same conclusion.13 It is sometimes claimed, however, that Gentzen's Hauptsatz is the first major theorem of general proof theory, the study of the notion of proof for its own sake—without restriction on the principles employed and without regard to possible applications. Although its significance may not be very well understood, it is thought to have an intrinsic interest beyond its applications to foundational problems. Furthermore, although it deals only with the combinatorial properties of formal derivations, its interest is supposed to stem from what it tells us about the proofs represented by these derivations. The analogy between derivations and terms mentioned at the end of the previous chapter, together with the strong normalization theorem, suggests that reduction steps for derivations may be like rules for computing the value of a function. On this analogy a derivation and its normal (or cut-free) form represent the same proof, the latter perhaps representing it in a particularly direct way (just as numerals, the normal forms of closed arithmetical terms, are related in a direct way to the natural numbers which they denote).
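The closing analogy can be made concrete with a toy rewrite system of my own (not from the text): closed arithmetical terms built from zero, successor, addition, and multiplication reduce step by step, and every reduction sequence ends in a numeral, the term's normal form.

```python
# Illustrative sketch only: terms are "z" (zero), ("s", t) (successor),
# ("add", t, u) or ("mul", t, u).  step performs one rewrite; normalize
# records the whole reduction sequence, which ends with a numeral.

def step(t):
    """Return the result of one reduction step, or None if t is normal."""
    if t == "z":
        return None
    if t[0] == "s":
        r = step(t[1])
        return None if r is None else ("s", r)
    op, a, b = t
    if b == "z":
        return a if op == "add" else "z"        # a+0 -> a,  a*0 -> 0
    if isinstance(b, tuple) and b[0] == "s":
        if op == "add":
            return ("s", ("add", a, b[1]))      # a+S(n) -> S(a+n)
        return ("add", ("mul", a, b[1]), a)     # a*S(n) -> a*n + a
    return (op, a, step(b))                     # reduce inside the term

def normalize(t):
    """The reduction sequence from t down to its normal form."""
    seq = [t]
    while (r := step(t)) is not None:
        t = r
        seq.append(t)
    return seq

one = ("s", "z")
# normalize(("add", one, one))[-1] == ("s", ("s", "z")), the numeral 2
```

Here the intermediate terms of the sequence play the role of the intermediate derivations of a reduction sequence, and the final numeral that of the normal derivation.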

Implicit in this view is the idea that it is not merely the existence of normal (or cut-free) forms which is of interest, but also the procedure by which a derivation can be reduced to such a form. After all, to refer to a derivation d′ as a normal form of d is to imply that d and d′ are related by something more than their common assumptions and conclusion (since there are in general many other normal derivations which share these features of d′ and which cannot be called a form of d in any sense); what distinguishes d′ is that by carrying out a specific procedure d can be transformed into d′. It is not surprising, therefore, that the rough account of the relationship between the cut-elimination and the normal form theorems given above has been supplemented by a more refined analysis which compares the actual cut-elimination procedure for LJ with the normalization procedure for NJ. The most notable effort in this direction is to be

12 A good account of a completeness proof along these lines can be found in Chapter VI of Mathematical Logic by S. C. Kleene, New York, 1967. To facilitate the proof, he considers a system G4 which differs from LK in some minor respects.
13 See Kreisel's remarks on pages 329 and 364 of his "A Survey of Proof Theory," Journal of Symbolic Logic, Vol. 33, 1968, as well as the first section of his "A Survey of Proof Theory II," Proceedings of the Second Scandinavian Logic Symposium, ed. by J. E. Fenstad, Amsterdam, 1971.


found in Zucker's paper, "The Correspondence Between Cut-Elimination and Normalization," cited above.

Zucker considers first the negative fragments NJ⁻ and LJ⁻ of these calculi. In LJ⁻ he distinguishes two kinds of conversion step:

(1) Permutative Conversions—These allow cuts to be permuted upwards past other inferences, contractions to be permuted downwards past other inferences and trivial cuts to be eliminated.

(2) Essential Conversions—These replace cuts by cuts of lower degree.

In NJ⁻, of course, there is only one kind of conversion, namely removing a maximal formula occurrence. The equivalence relation = on the derivations of LJ⁻ generated by the permutative conversions he calls strong equivalence and proves:

Theorem 2.6 For all derivations d, d′ of LJ⁻, d = d′ iff φ(d) = φ(d′).

Let d, d′, ... range over the derivations of LJ⁻, and Π, Π′, ... over those of NJ⁻. d ≻₁ d′ means that d′ is obtained from d by applying an essential conversion, and Π ≻₁ Π′ means that Π′ is obtained from Π by replacing a redex by its contractum. ≻ is the transitive closure of ≻₁.

Theorem 2.7 (Zucker)

(1) If d ≻₁ d′, then φ(d) ≻ φ(d′).

(2) If φ(d) ≻₁ φ(d′), then d* ≻₁ d′ for some d* = d.

A reduction sequence in either calculus is a sequence of derivations each term of which—apart from the last—converts (or reduces) to its successor. Obviously not every reduction sequence in LJ⁻ is finite—after all, successive cuts can be permuted with one another ad infinitum—but Zucker shows that there are only a finite number of distinct derivations to which a given d can convert by a series of permutative reductions. A reduction sequence in LJ⁻ without infinite repetitions is said to be proper. Using Theorems 2.6 and 2.7 above, we can easily correlate reduction sequences in LJ⁻ with ones in NJ⁻ and vice versa. In general these will not correspond term by term but, if S = (d₀, d₁, ...) is a reduction sequence, we can construct a reduction sequence Φ(S) = (Π₀, Π₁, ...) with φ(d₀) = Π₀ and φ(dᵢ) = Πⱼ for some j, and conversely. It follows from the proof of Theorem 2.1 above that each essential conversion corresponds under φ to one or more (finitely many) reduction steps and, hence, that a proper reduction sequence S is infinite iff Φ(S) is infinite. Using this fact and Theorem 2.1, Zucker infers the following results from the corresponding ones for NJ:

Theorem 2.8 (The Strong Cut-Elimination Theorem for LJ⁻) Every proper reduction sequence in LJ⁻ is finite.

Theorem 2.9 (Church–Rosser-type Theorem for LJ⁻) If d reduces to both d′ and d*, where d′ and d* are cut-free, then d′ = d*.


He also points out that these results in turn imply their analogues for NJ⁻, so that strong normalization for NJ⁻ is equivalent to strong cut-elimination for LJ⁻, and uniqueness of normal forms for the derivations of NJ⁻ is equivalent to the uniqueness of cut-free forms (up to strong equivalence) for LJ⁻ derivations.

These results are described in some detail because they are paradigms of the kind of results we would like to obtain for NJ and LJ. They conform to our expectations about the relationship between the sequent calculus and natural deduction as expressed by the mapping φ. It is all the more disappointing, therefore, to find that they do not extend in a straightforward way to the full calculi. The problem arises with the conversion steps which allow a cut to be permuted upwards past an application of ∃- or ∨-left in the derivation of its left-hand premise. As remarked earlier, these do not in general correspond to permutative reductions of the sort prescribed by Prawitz for NJ. Furthermore, Zucker shows that they give rise to infinite proper reduction sequences unless restrictions are placed on them. Before describing his example of such a sequence, I want to introduce a brief digression on the subject of indices.

Zucker's system of indexing formulae is gratuitously complicated and I have deliberately avoided giving an account of it. It does have one feature, however, that must be mentioned here. He insists that each formula introduced into a derivation have its own unique index, so that formula occurrences can only be identified with one another by applying the contraction rule. This means that the derivations of the two premises of an application of cut, for example, must have disjoint sets of indices. This restriction poses no real problem, however, because derivations which differ only in the naming of their indices are said to be congruent and for all practical purposes are treated as identical. Given a derivation d₁, let us write d₁* for some derivation which is congruent to d₁ but has no index in common with it (or with any derivation to which d₁* may be joined). Then, for example, we can write the conversion rule for permuting a cut with a contraction in its right-hand premise as follows:

                      d
        d₁      B_i, B_j ⊢ C
                ──────────── contraction
     A_m ⊢ B      B_j ⊢ C
     ──────────────────── cut
           A_m ⊢ C

converts to

        d₁            d
     A_m ⊢ B    B_i, B_j ⊢ C
     ─────────────────────── cut
       d₁*        A_m, B_j ⊢ C
     A_n ⊢ B
     ───────────────────────── cut
           A_n, A_m ⊢ C
          ────────────── contraction
             A_m ⊢ C

(I have omitted the additional formulae which may appear on the left of the conclusions of d, d₁, etc., for the sake of simplicity, and will continue this practice below.)

What follows is a slightly simplified version of Zucker's example of a


non-terminating proper reduction sequence:14

(2.1)

       d₁         d₂
     A_j ⊢ E    B_j ⊢ E                d₃
     ────────────────── ∨-left
       A ∨ B_k ⊢ E              E_i, F_m ⊢ G
       ───────────────────────────────────── cut
   d₄        d₅          A ∨ B_k, F_m ⊢ G
 C_n ⊢ F   D_p ⊢ F
 ───────────────── ∨-left
    C ∨ D_q ⊢ F
    ──────────────────────────────── cut
         C ∨ D_q, A ∨ B_k ⊢ G

(2.2)

      d₁          d₃               d₂          d₃*
    A_j ⊢ E   E_i, F_m ⊢ G       B_j ⊢ E   E_r, F_s ⊢ G
    ────────────────────── cut   ────────────────────── cut
        A_j, F_m ⊢ G                 B_j, F_s ⊢ G
        ───────────────────────────────────────── ∨-left
             A ∨ B_k, F_m, F_s ⊢ G
   d₄        d₅      ───────────────── contraction
 C_n ⊢ F   D_p ⊢ F    A ∨ B_k, F_m ⊢ G
 ───────────────── ∨-left
    C ∨ D_q ⊢ F
    ──────────────────────────────── cut
         C ∨ D_q, A ∨ B_k ⊢ G

(2.3)

      d₁          d₃               d₂          d₃*
    A_j ⊢ E   E_i, F_m ⊢ G       B_j ⊢ E   E_r, F_s ⊢ G
    ────────────────────── cut   ────────────────────── cut
        A_j, F_m ⊢ G                 B_j, F_s ⊢ G
        ───────────────────────────────────────── ∨-left
   d₄        d₅          A ∨ B_k, F_m, F_s ⊢ G
 C_n ⊢ F   D_p ⊢ F
 ───────────────── ∨-left                       d₄*        d₅*
    C ∨ D_q ⊢ F                              C_n′ ⊢ F   D_p′ ⊢ F
    ───────────────────────────── cut        ─────────────────── ∨-left
     C ∨ D_q, A ∨ B_k, F_s ⊢ G                  C ∨ D_w ⊢ F
     ─────────────────────────────────────────────────────── cut
              C ∨ D_w, C ∨ D_q, A ∨ B_k ⊢ G
              ───────────────────────────── contraction
                   C ∨ D_q, A ∨ B_k ⊢ G

(2.2) is obtained from (2.1) by permuting the uppermost cut with ∨-left and (2.3) from (2.2) by permuting the bottom cut with contraction. Since the part of (2.3) written in calligraphic characters has the same form as (2.1), we can now repeat the procedure using it in place of (2.1), and so on ad infinitum.

Zucker concludes from this example that any set of conversion rules for NJ which has the strong normalization property cannot correspond to the natural cut-elimination procedure. He does describe a more restricted set of rules for permuting cut with ∨- and ∃-left. These preserve the correspondence between reductions in the two systems but, as he rightly points out, they are ad hoc. In fact, they scarcely deserve to be called permutative conversions because the transformations they specify are quite complicated and are defined by cases according to the principal connective of the cut formula involved.

The exact significance of the results quoted above is a matter for debate. Zucker himself interprets them as demonstrating a failure of correspondence which "shows that there is indeed a combinatorial difference between sequent calculus and natural deduction, at least with regard to reduction procedures"; he even raises the possibility that "there may be (meaningful) properties of proofs which are preserved by all reductions in NJ but

14 Throughout this work the subscript on a logically complicated formula is associated with the formula as a whole, not with its rightmost component. So, for example, A ∨ B_k should be read as (A ∨ B)_k, not as A ∨ (B_k), and similarly for the other logical particles.


not in LJ."15 On the other hand Pottinger, in his paper "Normalization as a Homomorphic Image of Cut-Elimination,"16 asserts that these same results provide a positive answer to the "question whether Cut-elimination procedures in L-systems are 'really the same thing as' normalization procedures in natural deduction systems." (In fact, Pottinger deals only with intuitionistic propositional logic, but his remarks apply equally well when quantifiers are included.)

There are really two issues raised by Zucker's work: one is the failure of strong cut-elimination for LJ, and the other concerns the relationship between cut-elimination and normalization. As regards the first, I tried to indicate above how Zucker's counter-example depends upon a special feature of his indexing system. If his calculus is modified so that a formula occurrence need not be introduced with a new index, his construction can no longer be carried out. In fact, as Dragalin has shown, strong cut-elimination does hold for the full sequent calculus (classical as well as intuitionistic) relative to a 'natural' set of conversion rules.17 Cuts can be permuted upwards past any other kind of inference without restriction, except that there should be no 'clash' of indices. To see what this means consider, for example, a figure of the form

(2.4)

                            d′
                     Δ; A_p; B_q ⊢ C
         d           ──────────────── ∧-left
     Γ; A_p ⊢ B      Δ, A ∧ D_r; B_q ⊢ C
     ─────────────────────────────────── cut
           Γ, Δ, A ∧ D_r; A_p ⊢ C

We cannot permute cut directly with ∧-left since this would result in the occurrence of A_p in the conclusion of d being incorporated into the premise of ∧-left. To avoid this, we pick some index m which occurs nowhere in d or d′, and replace the latter by a derivation d′* obtained from it by replacing all occurrences of A_p by A_m. So the rule for permuting cut with ∧-left (in its right-hand premise) will be that (2.4) converts to

(2.5)

          d                 d′*
     Γ; A_p ⊢ B      Δ; A_m; B_q ⊢ C
     ─────────────────────────────── cut
          Γ, Δ; A_m; A_p ⊢ C
          ────────────────── ∧-left
         Γ, Δ, A ∧ D_r; A_p ⊢ C

Such a restriction is quite natural if we bear in mind that conversions of this kind are intended to be simple permutations of inferences which should in all other respects leave the derivation unchanged. It is analogous to the requirement that, when a term is substituted for a variable in some expression, it be free for the variable in question.
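The analogy can be made explicit in code. The sketch below is my own illustration (not from the text): capture-avoiding substitution in the lambda calculus renames a bound variable to a fresh one whenever it would capture a free variable of the substituted term, just as the clashing index p is renamed to a fresh m before the cut is permuted.

```python
# Illustrative sketch only: lambda terms are ("var", x), ("app", t, u)
# or ("lam", x, body); subst performs capture-avoiding substitution.

import itertools

_fresh = (f"v{i}" for i in itertools.count())   # supply of fresh names

def free_vars(t):
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "lam":
        return free_vars(t[2]) - {t[1]}
    return free_vars(t[1]) | free_vars(t[2])    # application

def subst(t, x, s):
    """t[x := s], renaming bound variables to avoid capture."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == x else t
    if tag == "app":
        return ("app", subst(t[1], x, s), subst(t[2], x, s))
    y, body = t[1], t[2]
    if y == x:                                  # x is shadowed: nothing to do
        return t
    if y in free_vars(s):                       # clash: rename y first
        z = next(_fresh)
        body = subst(body, y, ("var", z))
        y = z
    return ("lam", y, subst(body, x, s))

# (\y. x)[x := y] must not become \y. y; the binder is renamed instead:
t = subst(("lam", "y", ("var", "x")), "x", ("var", "y"))
```

Without the renaming clause the free y would be captured and the meaning of the term changed, the same corruption that permuting the cut without reindexing would produce.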

15 Zucker, op. cit., Section 1.5.1.
16 Annals of Mathematical Logic, Vol. 12, 1977, pp. 323–357.
17 See Appendix A below for further discussion of this issue and a sketch of his proof.


In view of what was said earlier about permuting cut with ∨- and ∃-left, we cannot expect a correspondence between this sort of cut-elimination procedure and the usual normalization one. In fact, when = is interpreted as the equivalence relation generated by this new set of permutative conversions, Theorem 2.6 fails in both directions for the derivations of LJ. Furthermore, the normalization procedure obtained by translating it via φ into NJ is on the face of it neither particularly natural nor convenient. Nevertheless, the fact that strong cut-elimination holds relative to a natural set of conversions has some bearing on the second of the two issues mentioned above, that of the correspondence between cut-elimination and normalization.

The customary normalization procedure for NJ is accepted uncritically in Zucker's work, and its image under φ investigated. His belief that strong cut-elimination fails relative to the natural set of conversions for LJ derivations is no doubt the reason why he does not consider whether there is a reasonable modification of this normalization procedure which corresponds to the cut-elimination one; it also leads him to his pessimistic conclusion about the possibility of a correspondence. There is, however, unquestionably an ad hoc character about the permutative reductions allowed in normalizing a derivation of NJ. Consider, for example, the following three pairs of figures:

(2.6)

           [B(a)]                              [B(a)]
             Π                                   Π
            A(x)                                A(x)
           ─────── ∀-intro                     ─────── ∀-intro
  ∃zB(z)   ∀xA(x)                              ∀xA(x)
  ──────────────── ∃-elim                      ─────── ∀-elim
       ∀xA(x)                        ∃zB(z)     A(y)
       ─────── ∀-elim                ─────────────── ∃-elim
        A(y)                               A(y)

(2.7)

           [B(a)]                              [B(a)]
             Π                                   Π
            A(x)                                A(x)
           ─────── ∀-intro                     ─────── ∀-intro
  ∃zB(z)   ∀xA(x)                              ∀xA(x)
  ──────────────── ∃-elim                   ──────────── ∨-intro
       ∀xA(x)                               ∀xA(x) ∨ C
    ──────────── ∨-intro           ∃zB(z)
    ∀xA(x) ∨ C                     ──────────────────── ∃-elim
                                        ∀xA(x) ∨ C

(2.8)

           [B(a)]                              [B(a)]
             Π                                   Π
         ∀xA(x) ∧ C                          ∀xA(x) ∧ C
         ────────── ∧-elim                   ────────── ∧-elim
  ∃zB(z)   ∀xA(x)                              ∀xA(x)
  ──────────────── ∃-elim                      ─────── ∀-elim
       ∀xA(x)                        ∃zB(z)     A(y)
       ─────── ∀-elim                ─────────────── ∃-elim
        A(y)                               A(y)

Anyone who knows the proof of the normalization theorem will recognize that the figure on the left in (2.6) must reduce to the one on the right, but it is almost impossible to imagine that there exists any general theoretical explanation of why this should be so in (2.6), but not in (2.7) or (2.8) (or in (2.6) and (2.8), but not in (2.7), if one accepts Martin-Löf's suggestion about normal forms18)—let alone one couched in terms of the properties of proofs represented by these derivations. It should not surprise us, therefore, that these features appear even more conspicuously ad hoc when they are translated into a cut-elimination procedure. We can define what it means for a formula occurring on the left of a sequent in an LJ derivation to be 'the major premise of an elimination', and allow a cut to be permuted with an application of ∨- or ∃-left in the derivation of its left-hand premise only when the occurrence of the cut-formula in its right-hand premise satisfies this definition. Of course, the definition is very ad hoc, and the cut-elimination procedure which results can be tortuous (since it requires us to allow the permutative reductions to go from right to left as well as from left to right, i.e., to treat them as symmetric permutations rather than asymmetric reductions). But, reinterpreted in the appropriate way, Theorems 2.6 and 2.7 above will now hold for all of NJ and LJ. Conversely, if we start with the cut-elimination procedure for LJ, we can obtain a correspondence with a non-standard normalization procedure for NJ. The situations are not quite symmetrical, however, because the cut-elimination procedure outlined above does seem natural, and lacks features which are obviously ad hoc. Unfortunately, this impression of naturalness derives in part from the fact that it is well-adapted to certain combinatorial peculiarities of the sequent calculus which NJ does not share.

Faced with a situation of this kind, one can either suppose that there are significant differences between cut-elimination and normalization, and search for a more general framework within which these can be understood, or search for a framework within which such differences are explained away. It is my intention to pursue the latter course. The anomalies revealed by Zucker's work seem to lack intrinsic interest, and they pale beside the striking similarities between cut-elimination and normalization. I suggested above that the treatment of ∨ and ∃ in NJ is unsatisfactory even if the problems it creates for the correspondence are ignored. In the next chapter, therefore, I propose to take a closer look at natural deduction calculi to see how a revised treatment can be justified without reference to normalization. After that, I will consider what kind of correspondence with cut-elimination can be established for the revised calculus.

18 See pp. 253–4 of "Ideas and Results in Proof Theory" by D. Prawitz, in Proceedings of the Second Scandinavian Logic Symposium, ed. by J. E. Fenstad, Amsterdam, 1971.

3

Natural Deduction Revisited

Granted that Gentzen's N systems are natural—both in the sense that their derivations correspond quite closely to informal patterns of argument and in their treatment of the logical connectives—this does not preclude the possibility that other formulations of natural deduction are equally (if not more) so. These calculi do not correspond so closely to informal reasoning as to allow no room for variation. Furthermore, not every change in the formulation of their rules will affect the meanings of the connectives, and some may give rise to systems possessing certain formal advantages. Gentzen himself was the first to utilize this fact when he devised sequent calculi as a means of proving his Hauptsatz.

Kreisel has remarked that "we can hardly hope that existing formalizations [of natural deduction] are exactly right for the new applications . . . After all, the systems were developed for other reasons, logical or aesthetic."1

It is doubtful, for example, that Gentzen was interested in what properties of derivations (apart from the obvious one—that their conclusions remain more or less unchanged) are preserved by his reduction steps. Had he been, he would scarcely have introduced the mix rule as an auxiliary, justifying it only by a remark to the effect that in the presence of the other structural rules it is deductively equivalent to cut. (For we can show that any two derivations of the same end-sequent—and many more besides—are 'equivalent' if we combine Zucker's ideas about strong equivalence with the view that derivations which have the same translation in the calculus with mix are equivalent.) It is equally doubtful whether he was overly concerned with the relationship between combinatorial properties of derivations and structural properties of proofs, if indeed he thought in such terms at all.

It is true that Gentzen prided himself on the similarity of his rules to the steps involved in actual reasoning, but this is a slightly different issue. On the few occasions he does consider the formal structure of derivations, he talks mostly about what he takes to be its artificial features. For example,

1 "A Survey of Proof Theory II," page 114.



he points out2 that in a proof by cases (∨-elimination) the tree form of derivations "does not bring out the fact that it is after the enunciation of [the disjunction] ... that we distinguish the cases." As a matter of fact, NJ is not a particularly plausible candidate for a calculus the form of whose derivations accurately reflects the structure of the proofs they are supposed to represent. This last assertion is not based upon a particular conception of the nature of proofs, just on some general considerations about what constitutes an accurate representation.

A derivation of NJ is a graph with a formula at each vertex, so it must represent an array of what these formulae stand for—propositions, formulae of some other language, judgments, mathematical constructions, or whatever—structured by some logically significant relation, which might be called 'immediately follows from'. The direction of the edge relation represents an ordering of logical rather than temporal priority (otherwise its transitive closure would surely have to be linear), and a fortiori indicates the dependence of each formula occurrence on its predecessors. Of course, this account of the matter is somewhat oversimplified, if only because some connectives, notably ∨ and →, by virtue of their meaning require more information about dependence to be represented than the edge relation alone can convey. It is nonetheless roughly correct for derivations in the negative fragment. The elimination rules for ∨ and ∃, however, do not fit comfortably into this pattern. The edges joining premises and conclusion in an application of either one of them do not exemplify the relation 'immediately follows from', nor does the ordering any longer reflect only logical priority. In the left-hand figure of (2.6) above, for example, the logical relationship between A(y) and ∃zB(z) is the same as that between the lower occurrence of ∀xA(x) and ∃zB(z). A kind of temporal order has now been introduced into the structure of derivations. (Significantly, these are the only two rules which may be permuted with the others in a derivation without destroying it.) In addition, both rules allow vacuous applications in a sense that the others do not; continuing the previous example, if no occurrences of B(a) are discharged by the inference, it is not clear that there is any significant logical connection between ∃zB(z) and the remaining formulae occurring in the figure.

The preceding remarks are not intended as criticisms of the elimination rules for ∨ and ∃. One could well argue that rules of their form provide a more adequate means of formalizing the notion of proof. The point is rather that, if we take seriously the idea of a structural similarity between derivations and proofs, inferences need to be represented by formal rules in a uniform way, unless there is some compelling logical reason which justifies the differences. But it is difficult to argue that there is something about the meanings of ∨ and ∃ which explains the distinctive features of their

2 "Investigations into Logical Deduction," page 79.


elimination rules: why the order in which they are applied matters more than in the cases of the other connectives, for example, or why they allow vacuous applications. In consequence, it seems reasonable to conclude that there are combinatorial features of NJ derivations which have no analogue in the proofs they represent.

It is also not my intention to challenge Gentzen's analysis of the logical particles. His analysis, however, does not determine uniquely what form the rules must take. We can distinguish three aspects of a rule which may vary while its logical content remains fixed: its formulation, its structural effect, and its manner of application. These terms are somewhat imprecise, but they are meant to be suggestive rather than scientific. The following examples should give an idea of what I have in mind:

(1) As mentioned in Chapter 1 above, NJ can be formulated as a sequential system by writing the assumptions on which each formula occurrence depends to its left (followed by a turnstile). Then ∧-introduction, for example, becomes ∧-right, and ∧-elimination becomes the rule

     Γ ⊢ A ∧ B
     ─────────
       Γ ⊢ A

(or the same rule with conclusion Γ ⊢ B). As Prawitz remarks, this is just a trivial reformulation of the system and is not to be confused with the calculus of sequents LJ. Equally, we could reformulate the rules of LJ after the pattern of natural deduction rules.

(2) The schematic description of a rule does not by itself tell us what sort of structure results from applying it. It has often been pointed out, for example, that the rules of LJ can be interpreted as rules for constructing derivations of NJ, in addition to their intended interpretation as rules for constructing trees labelled by sequents. ∃-elimination provides another example. Instead of interpreting it in the usual way, we could specify that the tree resulting from an application of this rule is obtained by placing a copy of the derivation of the major premise above each assumption discharged by the inference (in the derivation of the minor premise).

(3) Even allowing for differences of kinds (1) and (2), it is clear that left rules and eliminations are not the same. If we reformulate ∧-left, for example, as a natural deduction rule, it becomes a sort of upward version of ∧-elimination. The difference is not a matter of meaning, but rather of how the rule is applied. Prawitz, taking his cue from the rules for ∨ and ∃, has given a more general formulation of what constitutes an elimination. Using it, we can write Gentzen's rule for inferring from a conjunction as

               [A]
      Π         Π′
    A ∧ B       C
    ──────────────
          C

Natural Deduction Revisited 49

Now, ∧-elimination is the special case of this rule in which Π′ is empty, and ∧-left the case in which Π is empty. Differences of this kind may affect the deductive strength of a calculus, but it is misleading to think of one form of the rule as being stronger or weaker than the other.3

When the rule for inferring from an existential formula is rewritten so as to agree in its formulation and manner of application with the rules for the other connectives (excluding ∨), the result is a rule of existential instantiation, i.e., something of the form

     ∃xA(x)
     ──────
      A(b)

Unfortunately, a rule of this kind has its drawbacks. Although it can be given a convenient formulation in classical logic by introducing a function from formulae to variables as an auxiliary syntactic device—a point I shall return to later—in the intuitionistic case awkward restrictions must be placed on some of the other rules (∀- and →-introduction, at the very least) to ensure validity.4 Furthermore, even a deductively satisfactory formulation leads to problems when normalizability properties are taken into account. If the parameter b is supposed to depend only on the premise ∃xA(x), the normalization theorem does not even hold, as the following example shows:

3Later on, we shall encounter an example of a calculus which is incomplete, not because its rules are too weak, but because the conventions governing their application are inadequate. In the present case, the cut-elimination theorem ensures that no such problem can arise. Nevertheless, we have the impression that left rules are weaker than eliminations because not every NJ derivation corresponds to a derivation in LJ without cut. To see why this is misleading, it is helpful to think of left rules as eliminations applied upwards. Applying eliminations and introductions downwards, as in NJ, yields a larger class of derivations than applying eliminations upwards and introductions downwards, as in LJ, unless the latter calculus is supplemented by a cut rule. We can, however, also imagine introductions being applied upwards—although it then becomes more difficult to formulate the restrictions necessary to ensure their validity—and consider a calculus in which eliminations and introductions are both applied upwards; its derivations will just be those of NJ. Finally, if eliminations are applied downwards and introductions upwards, the resulting calculus is incomplete without the addition of a structural rule of some kind. (This last idea may seem far-fetched, but as a matter of fact tableaux proof procedures are of this form.)

The above considerations suggest that, although the convention that all rules can be applied in the same direction is less restrictive than the alternatives, there is no sense in which an elimination is stronger than its corresponding left rule or vice versa.

⁴See derivation (5.2) below for an example of why some restriction on →-introduction is needed. The case of ∀-introduction is a little different. In both classical and intuitionistic logic, some device is needed to block the inference from ∃xB(x) to ∀xB(x). There is a simple and elegant way to do so for classical logic, but it depends upon interpreting ¬∀ as equivalent to ∃¬ and hence cannot serve for the intuitionistic case. (See Chapter 5 below.) The only other possibility seems to be the clumsy flagging restrictions on variables which are usually incorporated into systems of natural deduction containing a rule of existential instantiation.

50 Normalization, Cut-Elimination and the Theory of Proofs

   ∃x(A(x) ∧ B(x))   [A(a) ∧ B(a)]
                          A(a)
                         ∃xA(x)           ∃xA(x)
                          A(b)             A(b)
                           Π                Π′
                                C(b)

(Here, and below, all the occurrences of b shown are supposed to be connected.⁵)

If we stipulate that b depends on the derivation of ∃xA(x) as well, the following example shows that the normalization theorem will only hold if we allow reduction steps which simultaneously eliminate more than one maximal occurrence of a formula.

       Π            Π
      A(a)         A(a)
     ∃xA(x)       ∃xA(x)
      A(b)         A(b)
       Π₁           Π₂
            C(b)

(This is so even if we allow interreducible derivations of ∃xA(x) to determine the same parameter.) It seems, therefore, that we would be ill advised to replace ∃-elimination by a rule of existential instantiation. Our aims will be better served by simply altering its structural effect in the manner suggested above.

For the time being, we propose to exclude redundant applications of ∃-elimination and restrict our attention to the fragment of NJ without

⁵Two occurrences of a term t in a derivation are said to be connected if there exists a t-connection between the formulae in which they occur. A t-connection is a connection each of whose members contains t, and a connection is a sequence α₁, ..., αₙ of formula occurrences such that for all i (i < n) one of the following conditions holds:

(1) αᵢ and αᵢ₊₁ are both premises of an application of →-elimination.

(2) αᵢ lies immediately below αᵢ₊₁ or vice versa.

(3) αᵢ and αᵢ₊₁ are both discharged by the same inference.

(4) αᵢ is discharged by an application of →-introduction whose conclusion is αᵢ₊₁, or vice versa.

When we are dealing with NJ or fragments thereof (as opposed to a calculus of the sort considered in the text), this definition must be modified so as to conform to Zucker's. (See Def. 2.5.1 on page 34 of his paper.) In effect this means qualifying clause (2) with the proviso that αᵢ not be the major premise of an application of ∨- or ∃-elimination, and adding the clause: αᵢ is the major premise of an application of ∨- or ∃-elimination and αᵢ₊₁ is discharged by the inference, or vice versa.
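The clauses above generate a symmetric link relation between formula occurrences, and connectedness is then just reachability over those links. A minimal Python sketch under that reading (the occurrence labels p1, p2, p3 and the sample links are hypothetical, not taken from any derivation in the text):

```python
from collections import deque

# Formula occurrences are opaque labels; each clause of the definition
# contributes an undirected "link" between two occurrences.  Two
# occurrences are connected iff one is reachable from the other.
def connected(links, x, y):
    """Breadth-first search over the (symmetric) link relation."""
    seen, queue = {x}, deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return True
        for a, b in links:
            for nxt in ((b,) if a == u else (a,) if b == u else ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False

# Hypothetical fragment: p1 lies immediately above p2 (clause (2));
# p2 and p3 are premises of one application of ->-elimination (clause (1)).
links = [("p1", "p2"), ("p2", "p3")]
```

A t-connection would additionally require every occurrence on the path to contain the term t; that filter can be applied to the link list before searching.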

Natural Deduction Revisited 51

disjunction (NJ^(−∨)). Consider a calculus which is like NJ^(−∨) except that

               [A(b)]
       Π₁        Π₂
    ∃xA(x)       C
    ──────────────
          C

instead of being the tree obtained from Π₁ and Π₂ by adding a new vertex labelled C below their bottom vertices, is obtained by placing a copy of Π₁ above each assumption discharged by the inference and connecting each conclusion ∃xA(x) with the corresponding occurrence of the assumption A(b). (Call this calculus NJ^(−∨)′.)
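The structural difference between the two readings of ∃-elimination can be sketched with a toy encoding: derivations as nested (formula, children) tuples, with the conclusion at the root. Everything here—the function names and the sample derivations—is illustrative only, not the book's official definition:

```python
# Toy derivations: (formula, children); the root is the conclusion,
# childless nodes are open assumptions.  "ExA(x)" stands in for 3xA(x).
def leaf(f):
    return (f, [])

# NJ-style 3-elimination: a new vertex C below the major and minor
# sub-derivations.
def exists_elim_nj(major, minor, concl):
    return (concl, [major, minor])

# NJ'-style 3-elimination: graft a copy of the major sub-derivation
# above every discharged assumption A(b) of the minor one, connecting
# each conclusion ExA(x) with the corresponding assumption A(b).
def exists_elim_njp(major, minor, discharged):
    f, kids = minor
    if f == discharged and not kids:
        return (f, [major])
    return (f, [exists_elim_njp(major, k, discharged) for k in kids])

major = ("ExA(x)", [leaf("D")])              # hypothetical derivation of 3xA(x)
minor = ("C", [leaf("A(b)"), leaf("A(b)")])  # C from two copies of A(b)
```

On these inputs the first reading yields a tree with C at a new bottom vertex, while the second yields a copy of the major derivation above each of the two discharged assumptions.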

As I remarked earlier, it is a feature of ∃-elimination that an application can be permuted with other inferences without destroying the derivation in which it occurs. The only restrictions are: (1) that it cannot be permuted downwards past an inference which discharges an assumption in the derivation of its major premise, and (2) that it cannot be permuted upwards past an inference whose premise(s) contain the proper parameter of the elimination. To minimize the inconvenience of (2) and to ensure that the proper parameter property,⁶ for example, is preserved by such permutations, let us adopt the convention that derivations which result from one another by relettering all the connected occurrences of a parameter (subject, of course, to the obvious restrictions on such substitutions, and provided that the open assumptions and conclusion of the derivation remain unchanged) are to be identified. This means, for example, that the following two derivations of NJ^(−∨)′ can be obtained from each other by permutation:

              [A(b)]   [A(b)]
                Π₁      Π₂(b)
      Π          C        D
    ∃xA(x)      C ∧ D
    ─────────────────
         C ∧ D

and

        [A(b)]                [A(c)]
          Π₁                    Π₂(c)
    Π      C              Π      D
  ∃xA(x)                ∃xA(x)
  ──────────            ──────────
      C                     D
      ───────────────────────
              C ∧ D

where c is a parameter which occurs nowhere in Π₂(b). Another example of such a pair is:

              [A(b)]
                Π₁
               C(b)
      Π       ∃yC(y)
    ∃xA(x)
    ────────────────
        ∃yC(y)

and

              [A(b)]
                Π₁′
      Π        C(d)
    ∃xA(x)
    ────────────────
         C(d)
        ∃yC(y)

6This is explained for both natural deduction and sequent derivations in Zucker's Definition 2.5.1 under the name of the proper variable property. For the former it means that, if a occurs as the proper parameter of an inference, then all the occurrences of a in the derivation are connected to each other. For the latter, it just means that every occurrence of the proper parameter of an inference in a derivation occurs above that inference.


where d is a parameter occurring nowhere in Π₁, and Π₁′ is obtained from Π₁ by substituting d for all occurrences of b connected to the one in the conclusion C(b) (provided that the two occurrences of b shown on the left are not connected). I shall not write out any more cases since they are pretty obvious. The only point to bear in mind is that we may assume parameters are relettered in an appropriate way whenever it is necessary or expedient to do so.

Let ∼ be the equivalence relation on the derivations of NJ^(−∨) generated by this set of permutations. There is an obvious correlation between such derivations and those of NJ^(−∨)′, which can conveniently be represented as a map ψ : NJ^(−∨) → NJ^(−∨)′. Furthermore, it is not hard to see that for all Π, Π′ in NJ^(−∨)

Π ∼ Π′ iff ψ(Π) = ψ(Π′).

This can be proved by the same technique as Zucker uses to prove the result cited as Theorem 2.6 in Chapter 2 above.⁷ From left to right it is just a matter of checking all the cases, and from right to left the proof is by induction on the length of ψ(Π), the various cases being determined by the last rules of Π and Π′. (The only non-trivial case is when at least one of these is ∃-elimination.)

Composing ψ with the mapping φ referred to earlier yields a mapping from the derivations of LJ^(−∨) (the calculus which is like LJ without the rules for disjunction, except that vacuous applications of ∃-left are disallowed) to those of NJ^(−∨)′. If the definition of ≈ is extended in such a way that permutations of cuts upwards past applications of ∃-right and ∃-left are included among the permutative reductions,⁸ we can show that for all d, d′

in LJ^(−∨), d ≈ d′ iff φ∘ψ(d) = φ∘ψ(d′).

The method of proof is the same as before although there are a larger number of non-trivial cases to consider in the induction; most of them, however, are already dealt with in Zucker's proof.

A number of different normalization procedures suggest themselves for NJ^(−∨)′. One is to allow each conversion step to remove exactly one maximal occurrence of a formula. This means the steps will be defined as for NJ except that the two ∃-reductions are replaced by:

       Π
      A(c)                     Π
     ∃xA(x)       ≻₁          A(c)
      A(b)                     Π″
       Π′

where Π″ results from Π′ by substituting c for all occurrences of b connected

⁷See pp. 51–65 of his paper.

⁸These reduction steps are listed as cases (B1b), (B1h) and (B2d) in Appendix A below.


to the one shown. The drawback to this procedure is that it takes us outside the class of NJ^(−∨)′ derivations. To implement it we would need to introduce a wider class of figures (obtained from the derivations by adding the rule

       Π
     ∃xA(x)
      A(b)

for all parameters b) and show that every derivation reduces to a normal one despite possible detours via these quasi-derivations. An alternative procedure would be to define reduction steps in terms of how NJ^(−∨)′ derivations are constructed—in effect, this means in terms of NJ^(−∨) derivations. So, we would have for all σ, τ in NJ^(−∨)′

σ ≻₁ τ iff Π ≻₁ Π′

for some Π, Π′ such that ψ(Π) = σ and ψ(Π′) = τ. Finally, there is an intermediate procedure which requires that maximal formula occurrences be removed one by one, except in the case of existential formulae, all of whose a-connected occurrences can be removed at the same time (where a is supposed to be the proper parameter of the elimination).

How close a correspondence obtains between normalization in NJ^(−∨)′ and in the other calculi depends, of course, upon the particular normalization procedure chosen. But for all those sketched above we have at least the following:

(1) For all Π, Π′ in NJ^(−∨), Π ≻₁ Π′ implies ψ(Π) ≻ ψ(Π′).⁹

(2) For all σ, τ in NJ^(−∨)′, σ ≻₁ τ implies that there exist Π, Π′ such that σ = ψ(Π), τ ≻ ψ(Π′) and Π ≻₁ Π′.¹⁰

(3) For all d, d′ in LJ^(−∨), d ≻₁ d′ implies φ∘ψ(d) ≻ φ∘ψ(d′).

(4) For all σ, τ in NJ^(−∨)′, σ ≻₁ τ implies that there exist d, d′ such that σ = φ∘ψ(d), τ ≻ φ∘ψ(d′) and d ≻₁ d′.

Such differences as exist between reduction procedures in the three calculi are not unexpected, given the nature of the correspondence between their derivations. In the passage from LJ^(−∨) to NJ^(−∨) and from the latter to NJ^(−∨)′, a single formula occurrence in a derivation may be associated with a number of occurrences of that same formula. Now, generally speaking, a reduction step eliminates a single maximal occurrence (or its analogue in the sequent calculus). So, one such step may correspond to a succession of steps under the mappings φ and ψ (as in (1) and (3))

⁹Since vacuous applications of ∃-elimination are not allowed in NJ^(−∨), the normalization procedure for this calculus will be like the usual one for NJ except that these must be removed—by what Prawitz calls immediate simplifications—as they arise in the course of the reduction process. In addition, of course, we allow the permutations which generate ∼.

¹⁰Here, and in (4), τ may range over quasi-derivations when the first or third reduction procedure for NJ^(−∨)′ is being considered.


whereas, going in the opposite direction, a single step may need to be combined with some others before it corresponds to one step (as in (2) and (4)). Furthermore, the correspondence is close enough to allow results to be translated from one calculus to another. In particular, strong normalization and uniqueness of normal forms in NJ^(−∨)′ translate via φ and ψ into the corresponding results for NJ^(−∨) and LJ^(−∨). Although normal forms may not be unique in NJ^(−∨) or LJ^(−∨), they will be equivalent under ∼ and ≈, respectively. As for strong normalization, it can be translated into the assertion that every non-repeating reduction sequence terminates. Because equivalence classes of derivations under ∼ are finite, permutations can be included among the reduction steps for NJ^(−∨). In the case of LJ^(−∨), however, equivalence classes under ≈ are not finite, as Zucker already observed for the negative fragment. Nevertheless, by utilizing the asymmetry of the reduction steps, he was able to show that any sequence of reductions in LJ⁻ must either terminate or have infinite repetitions, and the same holds for LJ^(−∨).¹¹ (The translation will also go the other way, if we adopt the second of our normalization procedures for NJ^(−∨)′, for it allows ≻ to be replaced by = in (2) and (4).¹²)

Although it is possible to define the class of NJ^(−∨)′ derivations directly, using a rule of existential specification, and to normalize them by removing one maximal formula occurrence at a time, such an approach is neither as natural nor as convenient as one would wish. Nevertheless, NJ^(−∨)′ does provide what might be called a permutation-free representation of proofs (albeit for a restricted set of connectives) and a reasonable framework within which to compare normalization and cut-elimination procedures. To some extent, therefore, it vindicates my earlier remarks about the desirability of a revised treatment of ∨- and ∃-elimination. Unfortunately, when it comes to full predicate logic, there is no satisfactory analogue of NJ^(−∨)′. The problems encountered in attempting to extend the above treatment to disjunction form the subject of the next chapter.

¹¹A detailed account of how these results can be translated between the negative fragments of NJ and LJ is to be found in Sections 5 and 6 of his paper.

¹²This will be so, however, only if Zucker's indexing conventions and contraction conversions are adopted as well. It does not hold for the versions of the sequent calculus presented in the appendices. This is because (4), even with ≻ replaced by =, does not guarantee that reduction sequences in NJ^(−∨)′ can be translated into reduction sequences in LJ^(−∨). If we were to add the clause:

Furthermore, for all d″ such that φ∘ψ(d″) = φ∘ψ(d), d″ reduces to d in a finite number of steps,

translation would no longer be a problem, but it is hard to see how reduction for LJ^(−∨) could be defined so as to satisfy this stronger condition without compromising the asymmetry of the permutative reductions.

4

The Problem of Substitution

When the rule for inferring from a disjunction is rewritten so as to agree in formulation and manner of application with the elimination rules for →, ∀ and ∧, the result is something of the form

    A ∨ B
    ─────
    A   B

There is an obvious non-standard feature of this rule, which some might argue disqualifies it from being regarded as one at all, namely, it lacks a unique conclusion. The idea of rules with more than one conclusion is, however, a coherent one and there is no reason to eschew it in principle.¹

Indeed, something like it is embodied in the sequent calculus (in its classical version at least), for sequential systems are most naturally interpreted as defining a derivability relation between sets of formulae. (Although disavowed by Gentzen, such an interpretation clearly underlies his development of these calculi.) Formally speaking, a sequent rule has only one conclusion, an array of formulae—not that the same could not be said of the rule cited above—but if the difference is just a matter of form, there is little to be said against derivations with multiple conclusions. Ordinary practice certainly does not rule them out. An incomplete proof by cases, for example, can reasonably be regarded as an argument with more than one conclusion (and a completed one may contain more than one occurrence of its conclusion). Even if we believe that a constructive proof, by its very nature, can only establish a single conclusion, there is no reason why its formal representation should not be allowed to contain multiple occurrences of that conclusion. As a matter of fact, the method of representing proofs in natural deduction runs counter to ordinary practice in this respect. In an NJ derivation an assumption is written down each time it is used, while the conclusion is a unique formula occurrence. When arguing informally,

¹It has been treated at length by Shoesmith and Smiley in their book Multiple-Conclusion Logic (Cambridge, 1978), and more recently by Girard. See his discussion of proof-nets in Section 2 of "Linear Logic," Theoretical Computer Science, Vol. 50, 1987, pp. 1-102.



on the other hand, it is rare to copy down assumptions more than once and not unknown for the conclusion to be written out several times.

The revised rule for ∨-elimination given above is clearly valid in both the intuitionistic and the classical sense, provided that its conclusions are interpreted disjunctively. Classically, this means that at least one of them must be true; from a constructive point of view, it is perhaps better to think of a derivation with more than one conclusion as an unfinished argument which is completed by showing that a particular formula follows from each of its conclusions. When combined with the other rules, however, the presence of multiple conclusions requires us to place some additional restrictions on ∀- and →-introduction in the intuitionistic case. These should have the effect of blocking inferences from ∀x(A(x) ∨ B) to ∀xA(x) ∨ B and from A → (B ∨ C) to (A → B) ∨ C. Such restrictions can be avoided by adopting the expedient used earlier in the case of ∃-elimination, namely, leaving Gentzen's formulation of the rule intact while altering its structural effect. This would not get rid of multiple conclusions, but it would ensure that each conclusion in a derivation was an instance of the same formula. Furthermore, even if the direct rule given above is retained, Gentzen's ∨-elimination will have to be translated into a (derived) rule of the resulting multiple-conclusion calculus (call it NJ′) when the need arises to map the derivations of NJ into those of NJ′. For these reasons I shall turn now to the question of how such a translation is to be accomplished.

The basic idea is straightforward enough. I want to interpret

           [A]    [B]
    Π       Π₁     Π₂
  A ∨ B      C      C
  ───────────────────
           C

as the figure obtained by simultaneously substituting the conclusions A of

      Π
    A ∨ B
    ─────
    A   B

for the assumptions A of

    A
    Π₁
    C

and the conclusions B of this same derivation for the assumptions B of

    B
    Π₂
    C

It is, however, by no means obvious what substitution is to mean in this context. The derivations of NJ are just trees and we know exactly what it means to substitute a derivation Π of C, say, for each occurrence of C


as an open assumption in some other derivation Π′. The derivations of NJ′, on the other hand, will be directed graphs which may have more than one bottom vertex labelled by the same formula. Our aim is to define an operation on these figures which shares the basic properties of substitution for trees, and coincides with it in case the derivation being substituted is a tree. For example, assuming σ has conclusions of the form A, we expect the result of substituting these for the assumptions A of τ, written (σ/A)τ,

to satisfy the following conditions:

(4.1)  (σ/A)C  =  σ, if C is A;  C, otherwise

               τ                   (σ/A)τ
(4.2)  (σ/A)   ─  R       =        ──────  R
               D                      D

when R is a one-premise rule, and

               τ   τ′              (σ/A)τ   (σ/A)τ′
(4.3)  (σ/A)   ──────  R′   =      ────────────────  R′
                 D                         D

when R′ is a two-premise rule. In view of this, it appears that we must first decide what kind of figure results from applying a rule of inference to a graph derivation. Again, this is unproblematic in the case of tree derivations, but less so when we consider graphs—especially where two-premise rules are concerned. Having made such a decision, we should then be able to define substitution, using the above conditions together with the requirement that it be an associative operation.
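For the unproblematic tree case, conditions (4.1)-(4.3) determine substitution by recursion on the figure being substituted into. A minimal sketch, again with derivations modelled as hypothetical (formula, children) tuples:

```python
# Tree derivations as (formula, children); a childless node is an open
# assumption.
def subst(sigma, A, tau):
    """(sigma/A)tau: replace each open assumption A of tau by sigma."""
    f, kids = tau
    if not kids:                         # condition (4.1)
        return sigma if f == A else tau
    return (f, [subst(sigma, A, k) for k in kids])   # (4.2)/(4.3)

def apply_rule(concl, *premisses):
    """A rule adds a new bottom vertex below its premise derivations."""
    return (concl, list(premisses))

# Hypothetical sample derivations.
rho   = ("D", [("E", [])])   # derives D from assumption E
sigma = ("A", [("D", [])])   # derives A from assumption D
tau   = ("B", [("A", [])])   # derives B from assumption A
```

Because the recursion passes through rule applications unchanged, substitution automatically commutes with `apply_rule`, and consecutive substitutions for distinct assumptions associate.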

If proofs are regarded as functions, substitution is analogous to the operation of composition. For this reason alone, it ought to be associative. In the present context, however, even the property of associativity needs to be generalized. What I have in mind can be illustrated by considering the properties of composition applied to functions of more than one argument. Suppose that f : X₁ × X₂ ↦ X, g : Y₁ × Y₂ ↦ Y, h : X × Y ↦ Z and i : V₁ × V₂ ↦ X₁. Then h∘f, for example, is a function from (X₁ × X₂) × Y to Z and, just as (h∘f)∘i = h∘(f∘i), so (h∘f)∘g : (X₁ × X₂) × (Y₁ × Y₂) ↦ Z must be equal to (h∘g)∘f. Because the derivations under consideration here correspond not only to functions of more than one argument, but also to ones which yield an array of values, substitution should satisfy not only conditions of this sort but also their duals. To illustrate what I mean by the dual of (h∘f)∘g = (h∘g)∘f (subject to the obvious restrictions on the domains and ranges of f, g and h), consider the derivations Π, Π₁ and Π₂ on the preceding page, and let σ be the figure which results from applying the rule

    A ∨ B
    ─────
    A   B

to Π; then it should be the case that

(4.4)  ((σ/A)Π₁/B)Π₂ = ((σ/B)Π₂/A)Π₁


provided that B is not among the conclusions of Π₁ nor A among those of Π₂.
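The functional analogue—that (h∘f)∘g and (h∘g)∘f agree on (X₁ × X₂) × (Y₁ × Y₂)—can be checked on concrete toy functions; the particular integer operations below are arbitrary stand-ins:

```python
# f: X1 x X2 -> X,   g: Y1 x Y2 -> Y,   h: X x Y -> Z   (toy instances)
def f(x1, x2): return x1 + x2
def g(y1, y2): return y1 * y2
def h(x, y):   return (x, y)

# h o f : (X1 x X2) x Y -> Z
def h_f(x1, x2, y): return h(f(x1, x2), y)
# (h o f) o g : (X1 x X2) x (Y1 x Y2) -> Z
def h_f_g(x1, x2, y1, y2): return h_f(x1, x2, g(y1, y2))
# h o g : X x (Y1 x Y2) -> Z
def h_g(x, y1, y2): return h(x, g(y1, y2))
# (h o g) o f : the same domain, composing in the other order
def h_g_f(x1, x2, y1, y2): return h_g(f(x1, x2), y1, y2)
```

The two composites are extensionally equal because f consumes the first pair of arguments and g the second, so the order in which they are plugged into h cannot matter.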

Recall that the assumptions in an NJ derivation are supposed to be grouped into equivalence classes, and that the application of a rule of inference discharges all the members of at most one such class (or a pair of such classes, in the case of ∨-elimination) rather than all occurrences of a given formula as an assumption. Rules can therefore be described as applying to equivalence classes of formula occurrences, rather than to individual occurrences or to all occurrences of the appropriate form. Since each NJ derivation has a unique formula occurrence as its conclusion, these distinctions become blurred in the case of the direct rules. In a multiple-conclusion calculus, however, this is not so, and it is convenient to group conclusions into equivalence classes as well while retaining the convention that rules are applied to such classes. The class to which a formula occurrence belongs will be indicated by its subscript, a natural number. Whether these subscripts are formally a part of the calculus or merely an auxiliary device will be left open. In the latter case any renumbering of subscripts which does not affect the composition of the various equivalence classes leaves a derivation literally unchanged; in the former, any such renumbering is said to produce a congruent derivation and, for (almost) all practical purposes, congruent derivations can be treated as identical.

Pursuing the previous example in the light of these considerations, it is always possible to ensure that

((σ/A)Π₁/B)Π₂ = ((σ/B)Π₂/A)Π₁

or, more properly, that

((σ/A_n)Π₁/B_m)Π₂ = ((σ/B_m)Π₂/A_n)Π₁

by carrying out some appropriate resubscripting when necessary. Furthermore, this can be done in such a way that the subscripts on the assumptions and conclusions of the resulting derivation remain unchanged. Let (A_k/n)Π denote the result of replacing all the assumptions A_n of Π by ones of the form A_k; similarly, Π(A_k/n) is the result of replacing all conclusions A_n by ones of the form A_k. The meaning of simultaneous in the phrase 'simultaneous substitution' can now be explained. Given a figure σ which includes A_n and B_m amongst its conclusions, the result of simultaneously substituting the conclusions A_n of σ for the assumptions A_n of Π₁ and the conclusions B_m of σ for the assumptions B_m of Π₂, is defined as ((σ(B_k/m)/A_n)Π₁/B_k)(B_k/m)Π₂ or ((σ(A_p/n)/B_m)Π₂/A_p)(A_p/n)Π₁ for suitable choice of k and p.² The point of all this is that the order in which these consecutive substitutions are performed does not affect the result; they can therefore be treated as having been carried out simultaneously. A

²k and p are suitably chosen if they avoid any clashes of subscripts. It is sufficient for this purpose to require that k occur nowhere in Π₁ and p occur nowhere in Π₂.


more suggestive notation for the figure denoted by the above expressions is

         σ
    A_n     B_m
     Π₁      Π₂

and I shall adopt it, modified according to context, for simultaneous substitutions in general—at least until the end of this chapter.³

Condition (4.4) above is just one of a group which together constitute a generalization of associativity in the sense that they characterize the basic properties of substitution for multiple-conclusion derivations (or composition for functions of more than one argument and value). It is easy enough to supply the others but, with one exception which is of particular relevance to the discussion, I shall not do so here. Using τ, τ′, ... etc., to range over possible multiple-conclusion derivations, the condition in question can be written as follows: If C_p occurs among the conclusions of τ and τ′, and A_n is not an assumption or conclusion of τ″, then

(4.5)  ((τ/A_n)τ′/C_p)τ″ = ((τ/C_p)τ″/A_n)(τ′/C_p)τ″

The above is of interest because, when τ is (σ/B_m)Π₂, τ′ is Π₁ and τ″ is a figure of the form

    C_p
    ─── R
    D_q

or

    C_p   E_r
    ───────── R′
       D_q

it asserts that

        Π                                  Π
     A ∨ B_k                            A ∨ B_k
   A_n      B_m                      A_n       B_m
    Π₁       Π₂           =           Π₁        Π₂
        C_p                          C_p       C_p
        ─── R                        ─── R     ─── R
        D_q                          D_q       D_q

or the analogue of this equation with R′ in place of R (provided that substitution is assumed to be associative). Read as a reduction from left to right rather than as an equality, the above is just a notational variant of what Prawitz calls ∨E-reduction (on the understanding that Π, Π₁ and Π₂ are NJ derivations and at least one occurrence of C_p is maximal). The situation is reminiscent of NJ^(−∨). Once the decision was made to interpret ∃-elimination in terms of substitution, the basic properties of this operation ensured that permutative ∃-reductions became redundant. So, here,

³Substitution as defined in Chapter 5 does not satisfy (4.4). As a result, a slight reinterpretation of the notation introduced here will become necessary.


the properties of a more general notion of substitution rule out permutative ∨-reductions.

What I have done so far is to adumbrate a kind of multiple-conclusion variant of natural deduction. I have not, it is true, stated its rules—although it is not hard to imagine what form they ought to take—but have simply discussed some of its general properties. More significantly, I have not yet described the structural effect of applying a rule in this calculus. It is the latter that concerns me most here but, in order to have a fixed framework within which to consider it, I will begin by listing a set of propositional rules:

(In what follows m, n, p, ... range over natural numbers.)

Axioms:  A_m

Rules:

(1)   A_m   B_n
      ─────────
       A ∧ B_p

(2) a.  A ∧ B_p       b.  A ∧ B_p
        ───────           ───────
          A_m               B_n

(3) a.    A_m         b.    B_n
        ───────           ───────
        A ∨ B_p           A ∨ B_p

(4)   A ∨ B_p
      ───────
      A_n   B_m

(5)     B_m
      ─────── {A_n}
      A → B_p

(6)   A_n   A → B_p
      ─────────────
           B_m

(7)   ⊥_p          (8)   A_n
      ───                ───
      A_n                ⊤_p

Some comments on the rules:

(1) Rule (8) is clearly redundant and is included only for aesthetic reasons. Having introduced it to preserve symmetry, I propose to ignore it in the sequel because its presence complicates the discussion of normalization.

(2) These rules are supposed to operate on multiple-conclusion derivations. This means, in particular, that they apply not only to derivations of their premises, but also to derivations whose conclusions include their premises. For example, rule (1) is to be interpreted as follows: if τ is a derivation of Δ, A_m from Γ and τ′ is a derivation of Δ′, B_n from Γ′, then

      τ    τ′
      ───────
      A ∧ B_p

is a derivation of Δ, Δ′, A ∧ B_p from Γ, Γ′ (where Γ, Δ, ... range over sets of subscripted formulae).

(3) Rule (5) is just →-introduction; the notation {A_n} indicates that all occurrences of A_n as an assumption are discharged by the inference.


There is one small deviation from the usual formulation of the rule, however: it is convenient to allow an application to discharge any other occurrence of A_n as an assumption which may subsequently be incorporated into the derivation, provided that there is a path between it and some occurrence of B_m (as a premise of the inference) which does not pass through any occurrence of A → B_p (as a conclusion of the inference). In view of what was said earlier about the properties of substitution, permutations of inference—where possible—cannot affect the structure of a derivation. As a result, the class of derivations is not enlarged by this feature of the rule. All it allows is more latitude in the matter of permuting inferences.

(4) Rule (5) is the only rule which is not intuitionistically valid (because it is tantamount to inferring (B → C) ∨ D from B → (C ∨ D)). It could be made so by imposing suitable restrictions, but for the present it suffices that all the rules are classically valid—a fact which is easily verified.
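The classical validity of the inference that rule (5) builds in—from B → (C ∨ D) to (B → C) ∨ D—is easily confirmed by a brute-force truth-table check; a sketch:

```python
def implies(p, q):
    return (not p) or q

def rule5_classically_valid():
    """Every valuation satisfying B -> (C v D) satisfies (B -> C) v D."""
    for i in range(8):                        # all valuations of B, C, D
        b, c, d = bool(i & 1), bool(i & 2), bool(i & 4)
        premiss    = implies(b, c or d)       # B -> (C v D)
        conclusion = implies(b, c) or d       # (B -> C) v D
        if premiss and not conclusion:
            return False
    return True
```

The check succeeds because whenever D is false the two formulae coincide, and whenever D is true the conclusion holds outright; intuitionistically, of course, no such truth-table argument is available.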

A set of rules ℛ determines both a consequence relation between (sets of) formulae and a class of figures called derivations. In the former case, rule (3a), for example, would be interpreted as

if Δ, A is a consequence of Γ, then Δ, A ∨ B is a consequence of Γ

and, in the latter, as

the result of applying this rule to a derivation of Δ, A from Γ is a derivation of Δ, A ∨ B from Γ.

The consequence relation or class of derivations determined by ℛ is the least such relation or class closed under the rules of ℛ, when appropriately interpreted. These two notions are connected by the derivability relation (where Δ, A is said to be derivable from Γ if there is a derivation of Δ, A from Γ) because it is usually taken for granted that, for any ℛ, derivability by ℛ coincides with consequence by ℛ. It is, however, slightly misleading to speak of ℛ determining a class of derivations. In fact, there may be many classes of figures which qualify. To fix the class of ℛ derivations, it is necessary to specify the kind of object which the rules of ℛ are supposed to construct (whether, for example, they are to be sequences, trees or graphs, and how their nodes are to be labelled). The rules of ℛ can then be interpreted as operations on these objects, and the ℛ derivations will be the smallest class closed under the operations. (The rules of LJ, for example, as remarked earlier, admit more than one such interpretation.) In other words, it is necessary to specify a notion of derivation. It can happen, however, that the notion in question is defective: it may not be closed under all the requisite operations, for example, or it may be closed under them, but their application may not always yield the desired result (i.e., a


derivation of the required conclusions from the assumptions given). Such deficiencies may even carry over to the derivability relation so that it no longer coincides with consequence. I propose to call a notion of derivation adequate for ℛ if the relation of derivability by ℛ which it determines coincides with the consequence relation for ℛ.⁴
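The two interpretations of rule (3a) quoted above can be sketched side by side. The representations chosen here (sequent-style frozenset pairs, derivations as nested tuples) are toy assumptions, not the book's:

```python
# Reading 1: rule (3a) as a condition on a consequence relation between
# sets of formulae.
def rule_3a_rel(gamma, delta, a, b):
    """From 'Delta, A is a consequence of Gamma' obtain
       'Delta, A v B is a consequence of Gamma'."""
    assert a in delta
    return (gamma, (delta - {a}) | {f"({a} v {b})"})

# Reading 2: rule (3a) as an operation on derivation figures, here
# modelled as (formula, children) trees.
def rule_3a_der(deriv, a, b):
    """Add a new bottom vertex labelled A v B below a derivation of A."""
    assert deriv[0] == a
    return (f"({a} v {b})", [deriv])
```

A notion of derivation is adequate when the derivability relation read off from figures built by operations like `rule_3a_der` coincides with the least consequence relation closed under conditions like `rule_3a_rel`.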

It should be obvious, especially to anyone familiar with Gentzen's calculus LK, that (1)-(7) represent a complete set of rules for the classical propositional calculus in the sense that (Γ, Δ) is in the consequence relation determined by (1)-(7) iff ∧Γ → ∨Δ is a tautology. It still remains to specify a notion of derivation which is adequate for (1)-(7) and satisfies the properties described earlier in connection with the substitution operation. Derivations, according to what was said above, will be graphs whose vertices are labelled by formulae (together with some additional information, perhaps). Not every such graph will be a derivation, however, even if it is built up exclusively from configurations like

    A   B          B
    ─────        ─────    etc.
    A ∧ B        A ∨ B

which exemplify instances of the rules. To exclude figures like

    A ∨ B
    ─────
    A   B
    ─────
    A ∧ B

for example, we stipulate that the premises of rules (1) and (6) cannot both be conclusions of a single derivation. Following the usual practice for NJ, each axiom of the form A_m will be represented by a one-element graph whose only vertex is labelled A_m. So, the problem is to generate a class of derivations from these one-element graphs by means of operations which represent the rules of inference—in other words, to describe the structural effect of applying a rule. Unfortunately, there is no way to do this without abandoning at least one of the requirements discussed above.⁵

⁴These ideas are drawn from Shoesmith and Smiley. See, in particular, page 26 of their Multiple-Conclusion Logic. They use "deduction" and "deducibility" where I have used "derivation" and "derivability." Aside from this, their explanation of adequacy differs from the above in only one respect: I have defined adequacy relative to a set of rules, whereas they call a notion of deduction adequate if it is adequate for every set of rules. The coincidence of derivability and consequence for a set ℛ of rules is of course not the same as the completeness of ℛ.

In principle, the issue of adequacy arises in the case of both single- and multiple-conclusion rules, but it assumes no practical importance in the former case. This is because there are few alternatives to consider when defining deduction for single-conclusion calculi, and the only two taken seriously—deductions as sequences of formulae and deductions as trees—are both adequate in this sense for any set of rules. In the multiple-conclusion case, however, matters are less straightforward, as we shall see below.

⁵ I have attempted to place as few restrictions as possible on what kinds of figures

The Problem of Substitution 63

Theorem 4.1 Any notion of derivation which is adequate for rules (1)-(7) cannot have an associative substitution operation.

The remainder of the chapter is devoted to proving this claim and discussing some of its consequences.

Assume the rules of inference have been represented as certain operations on graphs and that substitution has been defined in terms of these in the manner suggested above. It then makes sense to talk about the class N of derivations generated by the rules, and to consider the subclass D of N generated by the propositional rules of NJ. (Alternatively, D could be described as the range of a mapping from propositional NJ derivations to N. It is preferable, however, to think of the rules of NJ as being reinterpreted so that they now yield members of N.) The schematic descriptions of these rules coincide with those given above, except for →- and ∨-elimination. In the case of the former, the difference is just a matter of notation. As for the latter,

            [A]   [B]
     Π      Π₁    Π₂
   A ∨ B    C     C
   ─────────────────
           C

is to be interpreted as

      Π
    A ∨ B
    ─────
    A     B
    Π₁    Π₂
    C     C

Redundant applications of the rule, i.e., ones which do not discharge an assumption in both of the minor premises, are excluded as they were earlier in the case of ∃-elimination. The problems these present will be discussed later.

Lemma 4.2 Suppose that σ is a particular occurrence of the configuration

    A ∨ B
    ─────
    A   B

derivations could be. To call them labelled graphs is just a way of saying that they are structured arrays of formulae. As for the choice of one-element graphs to represent axioms, it is made for the sake of simplicity and definiteness. Nothing depends upon it and almost any other objects could serve as well. The argument which follows makes few additional assumptions about derivations; only that their structure is logically significant and that the rules of inference operate in a uniform way. It does, of course, depend upon the choice of rules—although not on these particular formulations of them—and, more importantly, on the stipulation that the two premises of an inference involving (1) or (6) not be connected prior to the application of the rule. These assumptions and the possibility of avoiding the result by doing without them are discussed briefly at the end of the present chapter.

64 Normalization, Cut-Elimination and the Theory of Proofs

in a derivation Π of D. Then Π can be written as

      Π′
    A ∨ B
    ─────
    A      B
    Π₁     Π₂

for some Π′, Π₁, Π₂ in D, where σ is introduced by the application of ∨-elimination shown.

Proof. By induction on the definition of D (i.e., on the rules of NJ). We think of Π as being given together with a construction tree which establishes its membership in D, and carry out the induction on the number of steps in this tree. The basis step is trivial, and the induction step is taken care of by condition (4.5) above (which says, in effect, that ∨-elimination can be permuted downwards past any inference) except when the last step in the construction of Π is another application of ∨-elimination. In this case Π is of the form

      Πₓ
    C ∨ D
    ─────
    C      D
    Π₁ˣ    Π₂ˣ

and there are two subcases to consider:

1. σ is part of Πₓ. By induction hypothesis Πₓ can be written as

      Π*
    A ∨ B
    ─────
    A      B
    Π₁′    Π₂′

There are a number of possibilities to consider according to whether C ∨ D is among the conclusions of Π*, Π₁′ or Π₂′. Suppose C ∨ D occurs as a conclusion of all three; the other possibilities are similar and easier to handle. So, Πₓ has the form

          Π*
    A ∨ B     C ∨ D
    ─────
    A           B
    Π₁′         Π₂′
   C ∨ D       C ∨ D

I have not bothered to write in the subscripts and will simply assume that the only occurrences of C ∨ D, A ∨ B, C, D, A and B as assumptions or conclusions in Πₓ, Π*, etc., are the ones shown.⁶ Now

⁶ Obviously, this effect can be achieved with subscripted formulae by resubscripting where necessary, so there is no loss of generality.


(4.6)    Π = ((Π̄ₓ/C)Π₁ˣ/D)Π₂ˣ

where

(4.7)    Π̄ₓ  =    Πₓ
                 C ∨ D
                 ─────
                 C    D

By applying (4.5) twice to the right hand side of (4.7) and because of the relationship between substitution and the application of a rule expressed by (4.1)-(4.3) above, we obtain

(4.8)    Π̄ₓ = ((Π″/A)Π₁″/B)Π₂″

where

              Π*
Π″  =    A ∨ B    C ∨ D
         ─────    ─────
         A   B    C   D

               Π₁′
Π₁″  =        C ∨ D
              ─────
              C   D

               Π₂′
Π₂″  =        C ∨ D
              ─────
              C   D

So,

(4.9)    Π = ((((Π″/A)Π₁″/B)Π₂″/C)Π₁ˣ/D)Π₂ˣ.

Applying (4.5) twice to the expression within the outermost brackets yields

(4.10)   Π = ((((Π″/C)Π₁ˣ/A)(Π₁″/C)Π₁ˣ/B)(Π₂″/C)Π₁ˣ/D)Π₂ˣ

and, using (4.5) again to distribute Π₂ˣ, we get

(4.11)   Π = ((((Π″/C)Π₁ˣ/D)Π₂ˣ/A)((Π₁″/C)Π₁ˣ/D)Π₂ˣ/B)((Π₂″/C)Π₁ˣ/D)Π₂ˣ

But, writing

      Π′
    A ∨ B
    ─────
    A    B

for ((Π″/C)Π₁ˣ/D)Π₂ˣ, and writing Πᵢ for ((Πᵢ″/C)Π₁ˣ/D)Π₂ˣ, where i = 1 or 2, (4.11) becomes

                                        Π′
          Π′                          A ∨ B
Π = ((  A ∨ B  /A) Π₁ /B) Π₂   =      ─────
        ─────                         A     B
        A   B                         Π₁    Π₂

2. σ is part of Π₁ˣ or Π₂ˣ. We might as well assume σ is part of Π₁ˣ. (If it is part of Π₂ˣ, the argument is the same.) Here the various subcases (according to whether C occurs as an assumption in Π*, Π₁′ or Π₂′) do not require separate treatment because it follows from (4.1)-(4.3) that, when A is not among the assumptions of σ, (τ/A)σ = σ. So, by induction hypothesis, Π₁ˣ can be written as

      Π*
    A ∨ B
    ─────
    A      B
    Π₁′    Π₂′

The remarks made above about the absence of subscripts, and the occurrence of A ∨ B, C ∨ D, etc., as assumptions or conclusions apply here as well, except that C may occur as an assumption in any or all of Π*, Π₁′ and Π₂′, although I will not display it. Let τ be

      Πₓ
    C ∨ D
    ─────
    C     D
          Π₂ˣ

then

                         Π*
(4.12)   Π = (τ/C)((   A ∨ B   /A) Π₁′ /B) Π₂′
                       ─────
                       A   B

and we want to show that this is the same as

                  Π*
        ((τ/C)  A ∨ B  /A) ((τ/C)Π₁′ /B) (τ/C)Π₂′
                ─────
                A   B

For this purpose I appeal to the dual of (4.5). Duality in this context means switching assumptions with conclusions, and the expression being substituted with the one being substituted into. The condition in question is

(4.13)   (τ/A)(τ′/B)τ″ = ((τ/A)τ′/B)(τ/A)τ″

provided that B is not among the assumptions or conclusions of τ. Two applications of (4.13) to (4.12) yield the desired result. Hence

          Π′
        A ∨ B
Π =     ─────
        A     B
        Π₁    Π₂

where

     Π′                        Π*
    A ∨ B     =    (τ/C)     A ∨ B
    ─────                    ─────
    A   B                    A   B

and Πᵢ = (τ/C)Πᵢ′ (i = 1, 2). ∎

If v is a vertex of a derivation Π which is labelled C_n, v is called the occurrence v of C_n in Π.

Definition 4.1 Suppose there is a configuration σ of the form

    A ∨ B
    ─────
    A   B

in Π and that v is a vertex of Π. v is said to join the disjunction A ∨ B in σ if there are paths from v to both the occurrence of A and the occurrence of B in σ which do not pass through the disjunction.
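The defining condition is just a reachability condition on the underlying graph. Purely as an illustration (the graph encoding and all names below are mine, not the book's), it can be checked mechanically:

```python
from collections import deque

def joins(neighbours, v, disj, occ_a, occ_b):
    """Does vertex v join the disjunction at vertex `disj`?
    `neighbours` maps each vertex to its adjacent vertices (edges taken
    in either direction); `occ_a` and `occ_b` are the vertices carrying
    the occurrences of A and B in the configuration sigma."""
    def reachable(target):
        seen, frontier = {v}, deque([v])
        while frontier:
            u = frontier.popleft()
            if u == target:
                return True
            for w in neighbours[u]:
                # paths may not pass through the disjunction itself
                if w != disj and w not in seen:
                    seen.add(w)
                    frontier.append(w)
        return False
    return reachable(occ_a) and reachable(occ_b)

# toy derivation graph: the disjunction vertex has the two occurrences
# a and b below it, plus an extra edge connecting a and b directly
g = {"v": ["disj"], "disj": ["v", "a", "b"],
     "a": ["disj", "b"], "b": ["disj", "a"]}
print(joins(g, "v", "disj", "a", "b"))  # False: v reaches a, b only via disj
print(joins(g, "a", "disj", "a", "b"))  # True: the edge a-b avoids disj
```

The extra a-b edge is exactly the kind of connection that Lemma 4.3 below says the rules can never create.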

Although I have not yet specified what operation on graphs corresponds to applying a rule of inference, the following holds at least.

Lemma 4.3 The application of a rule of inference to a derivation Π cannot introduce formula-occurrences which join any disjunction in Π.

Proof. Again the proof is by induction on the definition of D, or rather on the definition of the class D′ which is like D except that no restriction is placed upon the conclusions of the minor premises in an application of ∨-elimination (i.e., they need not be the same as one another). Although Lemma 4.2 was stated only for D, the proof applies equally to D′, so I feel entitled to claim this result for D′ as well. (Notice, incidentally, that D′ is just N.)

The basis step is trivial. Now suppose Π contains no joined disjunctions and that the application of a one-premise rule R, one of the form

     C
    ────
     C′

say, introduces a vertex v which joins some occurrence of the disjunction A ∨ B in the resulting derivation

     Π
     C
    ────
     C′

Then, by Lemma 4.2,

         Π′
        A ∨ B
Π  =    ─────
        A     B
        Π₁    Π₂
        C     C

where the disjunction which becomes joined in

     Π
     C
    ────
     C′

is introduced by the application of ∨-elimination shown. Hence, by (4.5),

     Π           Π′               Π′
     C          A ∨ B            A ∨ B
    ────   =    ─────      =     ─────
     C′         A     B          A     B
                Π₁    Π₂         Π₁    Π₂
                C     C          C     C
                ───────          ──    ──
                   C′            C′    C′

But rules operate only on their premises, not on any other vertices of a derivation, so it follows from the definition of substitution that there can be no connection between the vertices of

    Π₁
    C
   ────
    C′

and those of

    Π₂
    C
   ────
    C′

except via an occurrence of A ∨ B—which means that no vertex in either of them can join such an occurrence. The case in which R is a two-premise rule, given our earlier stipulation that its premises must belong to separate derivations, is just a notational variant of the above.

It only remains to argue that no disjunctions are joined as a result of applying ∨-elimination. By the definition of simultaneous substitution, it suffices to show that, if Π and Π′ contain no joined disjunctions, then neither does (Π/A)Π′. But this follows from the preceding cases by an induction on the construction of Π′ and the associativity of substitution. ∎

If rules are thought of as operating on individual vertices, rather than on groups of them with the same label, there is an obvious interpretation of applying a rule of inference, namely, the usual operation on tree derivations extended to the case of directed graphs. Suppose now that the formulae appearing as premises in the set of rules given above are taken to be individual occurrences and that this interpretation is adopted; then the class of derivations generated in this way will be denoted by N′.

Lemma 4.4 Let N be any class of derivations generated by rules (1)-(7) above whose members satisfy Lemma 4.3. Then, to each derivation Π in N there corresponds Πᶜ in N′ such that Π and Πᶜ have the same assumptions and conclusions.

Proof. Because each formula occurrence in a graph-derivation is supposed to follow from its immediate predecessors (or, when a vertex has two successors, to follow from it jointly in some sense), the application of a rule can only consist of performing the following operations (perhaps more than once):

(1) a. Add a new vertex labelled by the conclusion of rule (2), (3), (5) or (7) below a bottom vertex labelled by its premise.

b. Add a new vertex labelled by the conclusion of rule (1) or (6) below a pair of bottom vertices each labelled by one of its premises.

c. Add two new vertices, each labelled by one of the conclusions of rule (4), below a bottom vertex labelled by its premise.

(2) Amalgamate a pair of vertices which are both instances of the same conclusion.

Vertices in a derivation are said to belong to the same cluster if they were introduced by the same inference—except in the case of rule (4), where the vertices introduced are divided into two clusters depending upon whether they are labelled by the left or right conclusion of the inference.⁷ It follows from Lemma 4.3 that, when a pair of vertices is operated upon—as in (1b) and (2)—the members of the pair must belong either to the same cluster or to separate derivations. In view of this it is a routine matter to show by induction on the rules that, if Π is any derivation built up using (1) and (2), we can find Πᶜ as described above with the additional property that Πᶜ has the same number of occurrences of each of its conclusions as Π has clusters of them. (This stronger condition is needed for the induction step in the case of the two-premise rules.) ∎

I claim now that derivability in N′ is not the same as the consequence relation defined by the rules. Furthermore, there is no hope of salvaging the situation for intuitionistic logic by finding a suitably restricted version of the rules and showing that the corresponding subset of N′ coincides with their consequence relation. In other words, there is at least one intuitionistically valid formula of the propositional calculus which cannot be derived in N′.

⁷ I assume some way to differentiate between these two groups of vertices even if both conclusions happen to be the same formula (with the same index).


The argument I propose to give is based upon a similar one to be found in Chapter 8 of Shoesmith and Smiley.⁸ They set out to show the inadequacy—in the sense explained in footnote 4 above—of Kneale's notion of development, using a slightly different set of rules. As they point out, "Kneale's 'tables of development' are the pioneer multiple-conclusion proofs."⁹ A table of development is essentially the same as a derivation of N′ except that rule (5) is replaced by

    ───────────            B
    A    A → B          ───────
                         A → B

where the rule on the left is one with no premises. I have preferred rule (5) out of deference to tradition and, more importantly, because it is based on the intuitionistic meaning of implication. Their rules on the other hand reflect the truth-table definition of this connective. (Of course, this is not to say that their rules cannot be restricted—or that ours need not be—to become intuitionistically valid.)

The normal-form theorem holds trivially for this calculus, as does the strong normalization theorem, because the carrying out of any reduction step actually diminishes the size of a derivation (as measured by the number of its vertices). Smiley and Shoesmith therefore need only argue that there are tautologies which have no normal derivation to establish their result. The example which they provide, namely (A → A) ∧ (A ∨ (A → A)), is easily seen to be derivable in N′, so I have been obliged to come up with a more complicated one. Nevertheless, it is misleading to describe one set of rules as being stronger than the other—after all, both can fairly claim to be complete formalizations of the classical propositional calculus. It is the way in which derivations are pieced together, rather than individual peculiarities of the rules themselves, which accounts for the existence of such examples.

Lemma 4.5 The normal form theorem holds for N′.

Proof. The proof, which I will just sketch, is adapted from the one for NJ to be found in Prawitz's monograph.¹⁰ Notice that, despite the classical character of the rules for N′, it is the proof for NJ, not NK, which adapts to the present case, and that it is simplified rather than complicated in the process.

The reduction steps for N′-derivations are as expected. They include ⊥-reductions for each of the other connectives, namely,

⁸ loc. cit.
⁹ They appeared first in his paper, "The Province of Logic" (Contemporary British Philosophy, 3rd Series, ed. by H. D. Lewis, London, 1956), and subsequently in the aptly titled The Development of Logic (Oxford, 1962).
¹⁰ Natural Deduction, Chapter IV.


      Π          Π          Π          Π         Π
      ⊥          ⊥          ⊥          ⊥         ⊥
    ─────      ─────      ─────      ─────      ───
    A ∧ B      B ∧ A      A ∨ B      B → A       B
    ─────      ─────      ─────      ──────────────
      A          A        A   B            A

all reduce to

      Π
      ⊥
     ───
      A

∧-reduction is the same as for NJ. There are no permutative reductions, and ∨-reduction takes the following form:

      Π                               Π
      A               Π               B               Π
    ─────             A             ─────             B
    A ∨ B      ↝            and     A ∨ B      ↝
    ─────                           ─────
    A   B                           A   B

Finally, the reduction step for → is

        Π
        B
    ──────── {A_p}      Π′                  Π′
     A → B              A                   A
    ───────────────────────      ↝          Π
             B                              B

Because the rules of N′ apply only to single formula occurrences, there is no need to bother with subscripts except for assumptions, and the right-hand figure in the statement of →-reduction denotes the result of replacing each assumption A_p of Π by a copy of the conclusion A of Π′ shown on the left (together with its derivation).

Maximal formulae are defined in the usual way, and to each derivation Π we assign a value (m, n), where m is the highest degree of a maximal formula occurrence in Π and n is the number of such occurrences of degree m. The reduction procedure is as follows: eliminate any maximal formula occurrence of highest degree unless it is an implication to which →-reduction applies; in this latter case eliminate it only if the derivation of the minor premise contains no maximal formula occurrences of highest degree.

It is easy to see that any reduction step performed in accordance with this procedure diminishes the value of a derivation—the pairs (m, n) are ordered lexicographically—and that the procedure can always be applied to a derivation which is not in normal form. From this the result follows. ∎
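The termination argument can be made concrete with a small sketch (the function and the sample values are illustrative, not from the text): a reduction step either removes one of the n maximal occurrences of top degree m, or, when the last of them goes, uncovers a new top degree m′ < m; either way the pair (m, n) drops in the lexicographic order, which is well-founded.

```python
def step(value, new_top):
    """One reduction step on a derivation with value (m, n).
    If more than one maximal occurrence of degree m remains, the count
    drops; otherwise `new_top` is the (m', n') left behind, with m' < m."""
    m, n = value
    return (m, n - 1) if n > 1 else new_top

v = (5, 2)
v = step(v, None)      # still a degree-5 occurrence left
assert v == (5, 1)
v = step(v, (3, 4))    # last degree-5 occurrence eliminated
assert v == (3, 4)
# Python tuples compare lexicographically, matching the ordering used here
assert (3, 4) < (5, 1) < (5, 2)
print("value decreases at every step")
```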

Multiple-conclusion derivations do not have unique normal forms—at least, not relative to the conventional type of reduction step. For example,

      Π
      A
    ─────
    A ∨ B
    ─────
    A        B
    Π′       Π″
    C        D
    ──────────
      C ∧ D

reduces to both

      Π                Π
      A                A
      Π′      and      Π″
      C                D

by our procedure (assuming that A ∨ B and C ∧ D are of the same degree). Furthermore, the choice of rule (5) as a formulation of →-introduction serves to compound the problem, because a single group of assumptions may be discharged by more than one application of this rule. As a result, N′ contains derivations of the form

       Π              Π′
       B              C
    ─────── {A_p}  ─────── {A_p}      Π″
     A → B          A → C             A
    ──────────────────────────────────────
              B  ,  C

which may reduce to

       Π″                      Π″      Π″
       A                       A       A
      Π   Π′        and        Π       Π′
      B   C                    B   ,   C

(assuming that A → B and A → C are of the same degree and that Π′, Π″ contain only maximal formulae of lesser degree). These are some of the problems involved in finding a correspondence between normalization in a calculus like N′ and normalization in NJ. None of this is strictly relevant to the matter at hand, however, although I will return to the topic later.

Normal derivations have a particularly simple structure—more so even than in NJ. If a branch of a derivation is defined to be any subset linearly ordered by the edge relation which does not pass through (although it may terminate with) the minor premise of an application of rule (6), then every branch begins with a (possibly empty) series of eliminations, ends with a (possibly empty) series of introductions and contains, in between, perhaps an application of rule (7). It is obvious, therefore, that normal derivations have the subformula property.

Lemma 4.6 The formula *(A, B) ∧ *(C, D) is not derivable in N′, where *(X, Y) abbreviates (X ∨ Y) → ((X ∨ Y → X) ∨ (X ∨ Y → Y)).

Proof. Assume A, B, C, and D are all atomic. This makes matters a little simpler, although it is not essential to do so. If the formula is derivable, it has a normal derivation, whose last rule of inference must therefore be an introduction—∧-introduction, in fact. So, consider what form a derivation of *(A, B), say, can take. Again, the last inference must be an introduction, and the derivation will consist of an application of ∨-elimination followed immediately by introductions. An inspection of the rules reveals that it must look like the following:

                 A ∨ B
                 ─────
          A                         B
   ─────────────              ─────────────
    (A ∨ B) → A                (A ∨ B) → B
 ───────────────────────    ───────────────────────
 (A∨B → A) ∨ (A∨B → B)      (A∨B → A) ∨ (A∨B → B)
 ─────────────────────      ─────────────────────
        *(A, B)                    *(A, B)

where the assumption A ∨ B is discharged by any or all of the applications of →-introduction. Similar considerations apply to the derivation of *(C, D). What is interesting about these derivations is that they cannot have fewer than two conclusions. It is apparent, therefore, that however many times ∧-introduction is applied there will always be a pair of conjuncts left over. Consequently, there can be no normal derivation of *(A, B) ∧ *(C, D); only of *(A, B) ∧ *(C, D), *(A, B) or of *(A, B) ∧ *(C, D), *(C, D). ∎
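Note that the formula itself is perfectly valid: rules (1)-(7) are complete for classical logic, so Lemma 4.6 exposes a defect of the notion of derivation N′, not of the formula. A brute-force truth-table check (an illustrative sketch; the helper names are mine) confirms that *(A, B) ∧ *(C, D) is a tautology:

```python
from itertools import product

def imp(p, q):
    # material implication
    return (not p) or q

def star(x, y):
    # *(X, Y) = (X ∨ Y) -> ((X ∨ Y -> X) ∨ (X ∨ Y -> Y))
    d = x or y
    return imp(d, imp(d, x) or imp(d, y))

# every assignment to the atoms A, B, C, D satisfies *(A,B) ∧ *(C,D)
assert all(star(a, b) and star(c, d)
           for a, b, c, d in product([False, True], repeat=4))
print("*(A,B) ∧ *(C,D) is a tautology")
```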

Theorem 4.1, whose proof is now complete, indicates that there is a difficulty about defining substitution for the derivations of the propositional calculus. If this operation is to be well-defined, derivations which differ only in the order of their construction cannot be distinguished from one another. Yet I have just shown that there is no way to obliterate such distinctions entirely—at least not for any class of derivations sufficient to yield all valid formulae. The difficulty could be avoided, of course, by weakening the properties required of substitution. Unless we are prepared to change our understanding of this operation in more familiar contexts, however, there seems to be little doubt that they are appropriate ones. (This claim becomes even more plausible when substitution is considered in the abstract framework of category theory.) Anyway, although such an expedient would allow substitution to be defined for derivations of the kind considered above, it would leave their structural properties unaffected. In particular, there would still be distinct derivations which differed only in the order of their construction and, as a result, permutative reductions or their analogues would be required for a satisfactory normalization procedure.

This circumstance, regrettable though it may be, seems unavoidable given the treatment of derivations and rules of inference adopted above. My aim, however, was to make it as general as possible. Permutative reductions will be needed not just for this particular multiple-conclusion calculus, but for all those systems (like Gentzen's N and L calculi—in both their intuitionistic and classical forms) whose rules can be interpreted as generating a sufficiently large subset of its derivations.¹¹ Derivations are taken to be arrays of formulae structured by a logically significant relation and rules of inference are supposed to operate on their premises in a uniform way, but it is difficult even to imagine what would be entailed by doing without these assumptions. The same cannot be said for the stipulation that the two premises of an application of rule (1) or (6) be from separate derivations.

If this restriction is relaxed and, more generally, even the conclusion of an inference is allowed to belong to the same derivation as its premise(s), application can be reduced to a single basic operation: connecting a pair of vertices, whether from the same or different derivations. It then becomes necessary to place some global restrictions on the structure of a derivation to ensure its correctness. This approach to multiple-conclusion logic is discussed extensively by Smiley and Shoesmith, who devote Part II of their book to the issues involved in piecing together derivations in this way. The flexibility gained by such a piecemeal method of construction does facilitate the definition of an associative substitution operation—several of them in fact, but this turns out to be a relatively minor advantage. It is hard to attach much intuitive significance to the joining of individual vertices, especially when they belong to the same derivation. Furthermore the resulting derivations, although they may not strictly speaking allow permutations of inference, do display structural differences which appear to be no more logically significant. (These will depend on whether different applications of rules with the same conclusion connect to one or more occurrences of it.) More importantly, our basic problem remains unsolved because we are looking for a homomorphism (with respect to the logical rules) from a familiar Gentzen calculus to one of this kind which will preserve substitution, and the image of ∧-introduction, say, under such a mapping must be a derived rule which transforms separate derivations of A and B into a derivation of A ∧ B.

Girard accepts the impossibility of finding a homomorphism of this kind.¹² He is led therefore to take a more radical approach to multiple-conclusion calculi, albeit one which may also be described as a piecemeal method of derivation construction. His rules operate by connecting pairs of individual formula occurrences, and the resulting figures are called proof-nets if they satisfy a certain global soundness condition. Unlike the situation described in the previous paragraph, however, each occurrence must be the conclusion of exactly one application of a rule (axioms are treated as conclusions of a 0-premise rule) and the premise of at most one. What distinguishes Girard's approach above all is that he abandons the traditional logical vocabulary and studies a variety of novel connectives. In terms of these he is able to characterize fragments of classical logic with nice properties. There is no space here to do justice to his ideas and results, but they do provide further evidence for the interest of derivations with multiple conclusions.¹³

¹¹ Since Lemmas 4.2 and 4.3 depend upon the special form of rule (5), it might seem that the result could be avoided by replacing this rule with a more traditional version of →-introduction, but this is not so. These lemmas will still hold for derivations in which no application of →-introduction discharges an assumption in the derivation of one of the premises of an application of ∨-elimination lying above it, and it is only reasonable to require the structural effect of applying a rule in these special circumstances to be the same as in all others.

¹² He too is interested in how to represent the proofs of classical logic by derivations whose inferences cannot be permuted with each other and which can be reduced to normal form without permutative reductions, and remarks in a number of places that it cannot be done. See, for example, page 9 of "Linear Logic" (Theoretical Computer Science, Vol. 50, 1987): "It seems that the problem is hopeless in usual classical logic and the accumulation of several inconclusive attempts is here to back up this impression." The results of the present chapter may also be seen as backing it up.

Another possibility would be to relax the requirement that a derivation must be an array of formulae. There are no doubt a number of ways to do this, but what I have in mind here is to treat them as sets of such arrays—sets of trees, in fact. For example, the derivation

            [A]   [B]
     Π      Π₁    Π₂
   A ∨ B    C     C
   ─────────────────
           C

is to be interpreted as the union of

      Π          A         B
    A ∨ B        Π₁        Π₂
                 C         C

with some notation to indicate that the occurrences of A ∨ B shown are not among the conclusions of the resulting derivation, nor A and B among its (open) assumptions. This results in a rather unfamiliar notion of derivation which is difficult to reconcile with the idea of a proof as a determinate procedure for arriving at a conclusion. By removing the connection between the major premise of an application of ∨-elimination and the assumptions discharged by that application, it would appear that an essential component of the derivation has been lost. Furthermore, some justification is needed for treating

    A   B
    ─────
    A ∧ B

as an entirely different structure from

    A ∨ B
    ─────
    A   B

¹³ His recent book Proofs and Types (Cambridge, 1989), written with Yves Lafont and Paul Taylor, includes a readable sketch of these ideas. A more detailed account is to be found in his paper "Linear Logic."


(After all, if the idea of a structural similarity between proofs and derivations is to be taken seriously, inferences have to be represented in a uniform way unless there is some compelling logical reason for not doing so.) Despite these objections, however, if the derivations of such a calculus could be shown to characterize the equivalence relation on natural deduction derivations generated by permutations of inference, they would at the very least be of formal interest.

The impossibility of defining an associative substitution operation on derivations does not vitiate entirely this particular approach to logic, since the derivations in question may still be (and indeed are) closed under substitution in the sense that, given derivations of Δ, A from Γ and of Δ′ from Γ′, A, we can find a derivation of Δ, Δ′ from Γ, Γ′. Nonetheless, it is disappointing that there appears to be no straightforward extension of the negative fragment of NJ which preserves its distinctive combinatorial properties. This has consequences for both of the issues raised earlier. Granted that, whatever notion of proof is captured by NJ, it cannot be one in which the order of (permutable) inferences is important, there now seems to be little hope of representing this feature conveniently in a formal derivation. As for the correspondence between cut-elimination and normalization, had it been possible to exhibit a calculus C which needed no permutative reductions, together with homomorphisms from NJ and LJ to C which preserved the proper reduction steps, this would have sufficed not only to establish a correspondence between reduction procedures for these calculi but also to make it plausible that permutative reductions are logically insignificant.

Under the circumstances, it seems best to consider equivalence classes of derivations (the equivalence being generated by permutations of inferences) and try to interpret these within the context of a general discussion of the identity of proofs. If it can be successfully argued that equivalence in this sense and interreducibility (factored out by this equivalence) represent significant relations on proofs, it should be possible to make better sense of permutative reductions and reestablish a correspondence between cut-elimination and normalization procedures.

5

A Multiple-Conclusion Calculus

Before pursuing the line of investigation suggested at the end of the previous chapter, I think it worth considering multiple-conclusion systems of logic in a little more detail. I am less concerned with their intrinsic interest than with the fact that they seem to be the natural analogues of sequent calculi—for classical logic at least. As such, they provide a convenient framework for the comparison of N and L calculi in general, and a treatment of classical natural deduction which is superior to the conventional one. I shall attempt to substantiate these claims below, after having first outlined a usable version of natural deduction with multiple conclusions.

In the previous chapter, I discussed the relationship between consequence and derivability relative to a set of multiple-conclusion rules—for propositional logic at least—but failed to describe an adequate notion of derivation. In order to do so, I must specify what operation on graphs is to represent the application of a rule of inference. It is apparent that Lemma 4.3 cannot be expected to hold for such an operation, and derivations containing circuits will have to be allowed. I do, however, want to exclude circuits formed by joining a number of occurrences of the premise of a rule (or a number of occurrences of each premise in the cases of rules (1) and (6)) to a single occurrence of its conclusion (or a single occurrence of each conclusion in the case of rule (4)). My objection to this procedure is that its effect is to reintroduce the kind of structural features which prompted the search for alternatives to Gentzen's N calculi in the first place.

Consider for simplicity any one-premise rule

      A
    ──────  R
     R(A)

and suppose that the result of applying R to

       Π
    A, B, ...


is the derivation

        Π
    R(A), B, ...

obtained from Π by adding a new vertex labelled R(A) below all the bottom vertices of Π labelled A. (There is no loss of generality here since it is easy to restrict the application of the rule to fewer occurrences of A by appropriate use of subscripts.) Now, the obvious mapping from NJ derivations to multiple-conclusion ones constructed by rules which operate in this way, call it F, is clearly an isomorphism between its range and the derivations of NJ. So, for example, the difference between

           [C]   [D]                          [C]    [D]
     Π     Π₁    Π₂                     Π     Π₁     Π₂
   C ∨ D   A     A                    C ∨ D   A      A
   ─────────────────      and                ────   ────
          A                                  R(A)   R(A)
        ─────                         ──────────────────
        R(A)                                R(A)

will be reflected exactly by the difference between the multiple-conclusion derivations

       Π                             Π
     C ∨ D                         C ∨ D
     ─────                         ─────
     C      D                      C      D
   F(Π₁)  F(Π₂)        and       F(Π₁)  F(Π₂)
     A      A                      A      A
     ────────                     ────   ────
       R(A)                       R(A)   R(A)

(It is conceivable that there is some alternative to F which would avoid these consequences, but it is hard to imagine what it would look like. On the whole, this particular line of inquiry seems not worth pursuing.) These considerations lie behind the development which follows.

Since (4) is the only rule with more than one conclusion, a derivation having exactly two conclusions, both labelled A, can be represented by a figure of the form

         Π₁
        C∨D
Π  =   C    D
       Π₂   Π₃
       A    A

where Π₁, Π₂ and Π₃ are all single-conclusion derivations. Π specifies two ways of reaching the conclusion A, depending on how the disjunction C∨D is decided. (C∨D is decided when Π₁, or the result of substituting derivations of the appropriate kinds for the assumptions of Π₁, contains a proof of C or of D in the sense that it reduces to a derivation whose last inference is

   C              D
 -----    or    -----
  C∨D            C∨D   .)

A Multiple-Conclusion Calculus 79

Similarly, a derivation having exactly three conclusions, all labelled B, can be represented by a figure of one of the forms

          Π'₁                         Π''₁
          E∨F                         E∨F
       E        F                  E        F
Π' =  Π'₂       Π'₃    or  Π'' =  Π''₂      Π''₃
      G∨H       B                  B       G∨H
     G    H                               G    H
     Π'₄  Π'₅                             Π''₄  Π''₅
     B    B                               B    B

Suppose now that we want to derive A ∧ B from Π and Π'. The resulting derivation should specify six ways of reaching the conclusion A ∧ B, depending upon how the disjunctions C∨D, E∨F and G∨H are decided. These can be represented in a single derivation by taking three copies of Π and two of Π', and joining their conclusions to six new vertices labelled A ∧ B in the following manner.

(5.1)  [Figure: three copies of Π and two copies of Π', arranged side by side; the six conclusions A of the copies of Π and the six conclusions B of the copies of Π' are joined, one A and one B apiece, to six new vertices labelled A ∧ B.]

This is not simply an arbitrary arrangement. It is designed to ensure that, no matter how each disjunction in Π and Π' is decided, there will always be at least one way to reach the conclusion A ∧ B. (Of course, a disjunction can be decided in only one way in all copies of a single derivation.) Furthermore, in view of the preceding discussion, each such way should be represented by a different path. If we add the (quite reasonable) requirement that each one should be represented by at most one path, then the above arrangement is the only possibility.


Guided by this example, I define below an operation of combination on graphs. The application of a rule of inference will then be interpreted as the combination of graphs, one of which has a special form. So, the above derivation is obtained by combining Π and Π' with the graph

 A   B
 A∧B

(This notation is explained in the next paragraph.) In the case of a one-premise rule, for example (3a), its application to Π should result in a derivation of A ∨ B obtained by adding two new vertices with this label, one below each occurrence of the conclusion A. Such a figure can be obtained by combining Π with the graph

  A
 A∨B

Before proceeding further I need to introduce a few conventions:

(1) I will write, for n > 0,

 A₁ ... Aₙ
     A

for the graph comprising n vertices labelled A₁, ..., Aₙ, respectively, which are joined to a single vertex labelled A below them.

     A
 A₁ ... Aₙ

is the graph obtained from the former by reversing the direction of the edge relation. Finally, I will use A to denote the graph consisting of a single vertex labelled A. (It should be obvious from the context when A is being used to denote a formula, and when a graph.)

(2) As mentioned earlier, formulae occurring in a derivation will be assigned natural numbers as subscripts. For the present, these subscripts are to be considered part of the formalism of the calculus.

The use of subscripts is simply a bookkeeping device. It corresponds to the use of sequences of formulae on the left and right of a sequent, and generalizes the idea of equivalence classes of assumptions which is routinely employed nowadays in the treatment of natural deduction. When derivations are regarded as instances of valid argument forms which can be combined together to produce new forms, subscripts make it possible to preserve distinctions which would otherwise be lost. They serve as place holders in much the same way as variables do in the usual notation for functions and terms. There is also a strong reason to use subscripts if one is interested in the strong normalization theorem, since this is known to fail for the version of NJ (in fact, even for its pure implicational fragment) in which either all assumptions of the appropriate kind are discharged by an application of →-introduction or none are. Subscripts are the means of distinguishing between different occurrences of a particular assumption when it is desirable to do so.

Although not absolutely necessary, matters are simplified if a graph comprising a single vertex is labelled by a formula with a pair of subscripts: its subscript as an assumption and its subscript as a conclusion. The advantages of this modification are twofold. It facilitates the comparison of multiple-conclusion calculi with the familiar Gentzen ones, and it simplifies various definitions below by eliminating the need to treat one-element graphs as special cases. I will write the subscript as an assumption above the subscript as a conclusion (rendered linearly here as a superscript) so that, for example, the one-element graph labelled by A whose subscript as an assumption is i and as a conclusion is j will be denoted by A^i_j. A one-element graph labelled in this way will be called an axiom. It is convenient to be able to describe the axiom A^i_j as having a top vertex labelled A_i and a bottom one labelled A_j. I propose to adopt this manner of speaking henceforth, even though it suggests erroneously that the graph in question has at least two vertices.

(3) Any axiom and any directed graph with at least two vertices, each of which is labelled by a subscripted formula, will be called a quasi-derivation. The labels of the top vertices of a quasi-derivation are called assumptions, and those of its bottom vertices conclusions, but I will also use these terms to refer to the vertices themselves. Again, it should be obvious when a formula occurrence is meant, and when a vertex.

(4) Graphs which are identical except for their vertices are said to be copies of one another. There is no need to distinguish between different copies of the same quasi-derivation, and I will always assume that distinct graphs and distinct copies of the same graph have disjoint sets of vertices.

(5) If Π is a quasi-derivation which is not an axiom, I will write (A_i/j)Π for the quasi-derivation obtained from Π by using A_i to relabel all assumptions of the form A_j. Similarly, Π(A_i/j) is the quasi-derivation obtained by using A_i to relabel all conclusions of the form A_j. If Π is an axiom, say C^n_m, (A_i/j)Π is C^i_m if C = A and n = j, and Π otherwise. Similarly, Π(A_i/j) is C^n_i if C = A and m = j, and Π otherwise.

Anticipating the comparison with LK, I will describe (A_i/j)Π as having been obtained from Π by left contraction and Π(A_i/j) by right contraction.


Definition 5.1 Given A_k, let Π be a quasi-derivation with m bottom vertices labelled A_k and Π' be a quasi-derivation with n top vertices labelled A_k (m, n > 0). Suppose also that these vertices are enumerated in some arbitrary way, and let Π₁, ..., Πₙ be n copies of Π and Π'₁, ..., Π'ₘ be m copies of Π'. [Π, A_k, Π'], the result of combining the conclusions A_k of Π with the assumptions A_k of Π', is defined as follows:

(1) If Π is an axiom A^l_k, [Π, A_k, Π'] = (A_l/k)Π'.

(2) If Π' is an axiom A^k_l, [Π, A_k, Π'] = Π(A_l/k).

(3) If neither Π nor Π' is an axiom, [Π, A_k, Π'] is the graph obtained from the union of Π₁, ..., Πₙ, Π'₁, ..., Π'ₘ by identifying the vertices v(p,q) and v'(p,q) for each p, q (1 ≤ p ≤ n, 1 ≤ q ≤ m), where v(p,q) is the q-th bottom vertex of Πₚ labelled A_k and v'(p,q) is the p-th top vertex of Π'_q labelled A_k.

It is easy to verify that, because copies are not distinguished from one another, [Π, A_k, Π'] does not depend upon the particular enumerations chosen for the conclusions A_k of Π and the assumptions A_k of Π'. Also, suppose for a moment that combination has been defined for graphs labelled by unsubscripted formulae as well; then figure (5.1) above can be written as

                   A  B
 [ Π', B, [ Π, A,  A∧B ] ]

(provided that B does not occur among the assumptions of Π) or

                   A  B
 [ Π, A, [ Π', B,  A∧B ] ]

(provided that A does not occur among the assumptions of Π'). The next task is to explain what it means to be a derivation in this calculus. For this I need some additional notation and terminology:

(1) a. Two quasi-derivations are said to be congruent if they are obtainable from one another by a one-one mapping T between labels which satisfies the condition:

For all A, i, T(A_i) = A_j for some j.

b. Call a subscript occurrence intermediate if it is not part of the label of an (open) assumption¹ or conclusion. Quasi-derivations which are identical once all intermediate subscript occurrences have been deleted are said to be almost alike.

I would like to be able to claim that derivations which are almost alike or congruent (i.e., related by the transitive closure of the union of (a) and (b)) are indistinguishable. As will emerge, however, only derivations which are both can be identified. Such derivations are said to be alike, and in what follows a derivation will be considered well-defined if it has been specified uniquely up to likeness.

¹A formal distinction between open and closed assumptions is drawn below.

(2) The quasi-derivations Π and Π' are said to be compatible if no subscript with an intermediate occurrence in Π occurs anywhere in Π', and vice versa.

For the remainder of this work, I will tacitly assume that all quasi-derivations Π satisfy the following condition: no subscript which occurs on an assumption or conclusion of Π has any intermediate occurrences in Π. This guarantees that, for any Π and Π', it will always be possible to find a pair of mutually compatible quasi-derivations which are like them.

(3) I will write

  Π
 A_i
 B_k

for

                     A_n
 [ Π*(A_n/i), A_n,   B_k ]

where n occurs nowhere in Π*, and Π* is a quasi-derivation like Π which contains no intermediate occurrences of k. Similarly,

   Π
  A_i
 B_k  C_j

will denote

                      A_n
 [ Π**(A_n/i), A_n,  B_k C_j ]

where n occurs nowhere in Π**, which is like Π except that it contains no intermediate occurrences of k or j.

(4) Let Π₁ and Π₂ be quasi-derivations with conclusions of the form A_i and B_j, respectively. I will write

 Π₁   Π₂
 A_i  B_j
   C_k

for

                                    A_n B_m
 [ Π*(A_n/i), A_n, [ Π**(B_m/j), B_m,  C_k  ] ]

where Π* and Π** are like Π₁ and Π₂, respectively, except that neither contains an intermediate occurrence of k and they are compatible with one another; furthermore, m and n are distinct subscripts which occur nowhere in Π* or Π**.


(5) The notations

 B_k        B_k  C_j        A_i
 A_i          A_i         B_k  C_j
  Π            Π          Π₁    Π₂

are dual to the above (the duality being between assumptions and conclusions), and as such do not require separate explanation.

Granted that the application of a rule is to be interpreted in terms of combination, there are still a number of different ways to read the axioms and rules of Chapter 4 as the clauses of a definition of (propositional) derivation, depending upon how these rules are to be applied. It is most natural to think of them as being applied downwards so that the definition becomes:

Axioms: For all A, n and m, A^n_m is a derivation of A_m from A_n.² Rules:

(1) If Π is a derivation of Δ, A_m from Γ and Π' is a derivation of Δ', B_n from Γ', then

  Π     Π'
 A_m    B_n
   A∧B_p

is a derivation of Δ, Δ', A∧B_p from Γ, Γ' for all p. (I do not intend to exclude the possibility that A∧B_p is already a member of Δ or Δ'.)

(2a) If Π is a derivation of Δ, A∧B_p from Γ, then

   Π
 A∧B_p
  A_m

is a derivation of Δ, A_m from Γ. (Again, A_m may be a member of Δ.) With the possible exception of rule (5), it should be obvious how the remaining clauses are to be formulated, so there is no need to list them here. As for (5):

(5) Let Π be a derivation of B_m, Δ from Γ; then

 (A_q/n)Π
   B_m
 A→B_p {q}

is a derivation of A→B_p, Δ from Γ - A_n, where q is a subscript (distinct from p) which occurs nowhere in Π, and A→B_p {q} is the label A→B_p augmented by some notation which indicates that any vertex so labelled discharges all assumptions of the form A_q.³ I will write

    Π
   B_m
 A→B_p {A_n}

for

 (A_q/n)Π
   B_m
 A→B_p {q}

²I have changed the form of the axioms slightly so that they will satisfy the definition of quasi-derivation given above.

³It is not hard to make this description more precise and to spell out a particular labelling procedure, but the above should be sufficient. Also, it goes without saying that the definition of quasi-derivation is now modified to include labels of this kind, and that an assumption is closed if it is discharged by some vertex.

Two remarks:

(1) It is apparent that the derivability relation characterized by this set of derivations coincides with the consequence relation determined by the rules of Chapter 4.

(2) If no restrictions had been placed on subscripts, or indeed if subscripts had been omitted entirely, the result would have been equally satisfactory from the point of view of derivability. I mention this to emphasize that the complications these involve have less to do with the multiple-conclusion approach than with the use I want to make of it, in particular, to study normalization and compare the derivations of different calculi. Formulations of more conventional systems suitable for these purposes would be no less complicated.

As mentioned above, there are other ways in which the class of derivations might be defined. One possibility is to apply the rules upwards. A slight complication arises because rule (5) will now allow closed assumptions to be introduced into a derivation. Furthermore, when this happens, it is inconvenient to insist that there are no other closed assumptions of the form in question already present in the derivation. For the purposes of this paragraph, therefore, let the subscripts on closed assumptions not be classified as intermediate. (Of course, this alters slightly the meaning of compatibility, as well as the various notations explained in terms of it.) With this proviso, the clauses of the definition are easy to state. For example, clause (2a) reads:

If Π is a derivation of Δ from Γ, A_m then

 A∧B_p
  A_m
   Π

is a derivation of Δ from Γ if every occurrence of the assumption A_m


in Π lies above a vertex labelled (A∧B)→C_q {p} (for some C, q), and from Γ, A∧B_p otherwise.

Similarly, clause (5) reads:

If Π is a derivation of Δ from Γ, A→B_p and n appears neither as a subscript nor as an annotation in Π,⁴ then

   B_m
 A→B_p {n}
    Π

is a derivation of Δ from Γ if A = B and n = m, and from Γ, B_m otherwise.

It is a routine matter to write out the remaining clauses. The class of derivations obtained in this way is not identical with the previous one, although the two are deductively equivalent.

Except from a heuristic point of view, the above method of constructing derivations is perhaps little more than a curiosity. Of more interest is the class of derivations which results from applying the introduction rules, i.e., rules (1), (3) and (5), downwards and the elimination rules, i.e., (2), (4) and (6), upwards. By closing this class under an operation CUT such that the result of applying CUT to

   Γ                Γ', A_i
   Π       and        Π'
 Δ, A_i               Δ'

is a derivation of Δ, Δ' from Γ, Γ' whenever all occurrences of A_i as an assumption in Π' are minor premises of an application of rule (6), a natural interpretation for the logical rules of (the propositional part of) LK can be obtained.⁵ No matter how CUT is defined, this class of derivations, call it ND, is not co-extensive with either of the other two. The cut-elimination theorem, however, ensures that it is deductively equivalent to them.

To give a more complete account of the relationship between Gentzen's calculi and the above, quantifiers have to be considered and a substitution operation appropriate to multiple-conclusion derivations defined. The latter presupposes that the set of such derivations has been fixed. For this reason I will not pursue the idea of the upward application of rules any further here, and will subsequently propose an alternative interpretation for left rules.

⁴Weaker restrictions on n are possible, but these are convenient.

⁵This is so once some provision has been made for negation. For the version of LK found in Appendix B it is enough to require that rule (7) be applied only to axioms. More traditional formulations of the sequent calculus are best handled by introducing additional rules corresponding to the left and right rules for this connective.

Turning now to quantifiers, there is no doubt about what form their rules should take, namely:

         A(b)_n                  QxA(x)_n
(9Q)   ----------      (10Q)   ----------
        QxA(x)_m                  A(b)_m

where Q is ∀ or ∃. The only questions concern the restrictions to be placed upon them (apart from the obvious one that x be free for b in (9Q)). Briefly, in some suitable sense b must be arbitrary in (9∀) and new in (10∃). Also, the inference from ∀∃ to ∃∀ must be blocked. This is usually accomplished by placing some rather cumbersome restrictions on the rules in question, which in the present case would need to be supplemented by restrictions on some of the other rules.⁶ It seems preferable, therefore, to adopt a slightly different approach.

Recall that the vocabulary of our language L includes both variables and parameters, and that variables can only occur bound in a formula while parameters can only occur free. (L is assumed to be countable.) Expressions which are like formulae except that they may contain free variables are called quasi-formulae. I will write V for the set of variables of L and P for its set of parameters; V ∩ P = ∅. As before, a, b, c, ... will range over the members of P, but I will now use x, y, z, ... to range over V ∪ P and reserve v, v', v₁, ... for V. Now, let U and E be the sets of quasi-formulae of L whose principal connective is a universal and existential quantifier, respectively; let f : E ∪ U → V, and consider the following rules.⁷

⁶The rules I have in mind are the two-premise ones, and the need to place restrictions on them is a result of the fact that derivations here are not written in sequent form. For example, the requirement that b occur nowhere in the derivation of the premise of an application of (10∃) is not sufficient to prohibit the invalid combination

 ∃xA(x)_m      ∃xB(x)_p
  A(b)_n        B(b)_q
 -----------------------
     A(b) ∧ B(b)_r

To exclude this, rule (1) must not be applied to premises containing occurrences of the same parameter when at least one such occurrence in each premise comes from an application of (10∃).

⁷For the record:

(1) In (9∀), A(v') is the result of replacing every free occurrence of f(∀vA(v)) in A(f(∀vA(v))) by v', where (unless v' = f(∀vA(v))) v' does not occur in A(f(∀vA(v))).

(2) In (9∃), v' does not occur in A(x) (unless v' = x).

(3) In (10∀), A(x) is obtained from A(v') by replacing every free occurrence of v' by x. Also, if x is a variable, it is free for v' in A(v').

(4) In (10∃), A(f(∃vA(v))) is obtained from A(v') by replacing every free occurrence of v' by f(∃vA(v)). (Condition (2) on f below ensures that, unless f(∃vA(v)) = v', f(∃vA(v)) cannot occur in ∃v'A(v').)


        A(f(∀vA(v)))_n                 ∀v'A(v')_n
(9∀)   ----------------     (10∀)    -------------
          ∀v'A(v')_m                     A(x)_m

            A(x)_n                     ∃v'A(v')_n
(9∃)   ----------------     (10∃)   -----------------
          ∃v'A(v')_m                 A(f(∃vA(v)))_m

By placing some simple conditions on f and adjoining these rules to the propositional ones given earlier, a formalization of classical predicate logic is obtained which is adequate in the sense that, if Γ and Δ are sets of formulae, there is a derivation of Δ from Γ iff ∧Γ → ∨Δ is classically valid. The following conditions suffice for this purpose:

(1) f is one-one.

(2) There is an enumeration (v_α)_{α<β} of Range(f) such that, if f(A) = v_γ and v_α occurs in A, then α < γ.

(1) ensures that f(∀vA(v)) is suitably arbitrary and that f(∃vA(v)) is sufficiently new, whereas (2) rules out the possibility that, for some A(v,v₁), v', v'', f(∃vA(v,v')) = v'' and f(∀v₁A(v'',v₁)) = v', thus blocking the inference from ∀∃ to ∃∀.
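Condition (2) amounts to requiring that the relation "v_α occurs in the formula to which f assigns v_γ" admit a compatible enumeration, i.e., that it be acyclic. The following sketch (my own illustration; the function name and the encoding of f by a dependency table are invented) checks a finite candidate by topological sorting and rejects the circular assignment that would licence the inference from ∀∃ to ∃∀.

```python
def satisfies_condition_2(occurs_in):
    """occurs_in[v] = the set of Range(f)-variables occurring in the formula
    to which f assigns v. Condition (2) asks for an enumeration in which
    every variable comes after those occurring in its formula, which exists
    exactly when the dependency relation is acyclic."""
    placed = set()
    pending = set(occurs_in)
    while pending:
        ready = {v for v in pending if occurs_in[v] <= placed}
        if not ready:
            return False   # a cycle: no admissible enumeration exists
        placed |= ready
        pending -= ready
    return True

# The forbidden pattern: f(∃vA(v,v')) = v'' with v' in its formula, and
# f(∀v₁A(v'',v₁)) = v' with v'' in its formula; each depends on the other.
bad = {"v'": {"v''"}, "v''": {"v'"}}
good = {"v'": set(), "v''": {"v'"}}
```

Here `satisfies_condition_2(bad)` is false and `satisfies_condition_2(good)` is true, mirroring the way (2) blocks the circular choice of witnesses.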

The idea is that f(∃vA(v)) can be interpreted as an individual which satisfies A(x) and f(∀vA(v)) as one which fails to satisfy it, whenever such individuals exist. More precisely, if (M, a) is any model for L, it can be expanded to a model ((M, (v_α)_{α<β}), a), where β is given by (2) above, for the language obtained from L by treating the members of Range(f) in their free occurrences as constants. I take a model for L to be a structure M = (M, ...) of the appropriate similarity type together with an assignment a : P → M. Also, for convenience, I assume that all models have the natural numbers as their domain. The v_α's are defined by induction on α (α < β) as follows:

Suppose that f(A) = v_α.

(1) If A is of the form ∃vB(v) and there exists m ∈ M such that

((M, (v_γ)_{γ<α}), a) ⊨ B(c) [m]

where c does not occur in ∃vB(v), let v_α be the least such m.

(2) If A is of the form ∀vB(v) and there exists m ∈ M such that

((M, (v_γ)_{γ<α}), a) ⊨ B(c) → ⊥ [m]

where c does not occur in ∀vB(v), let v_α be the least such m.

(3) In all other cases let v_α = 0.

(M, a) ⊨ B(c) [m] means that (M, a') ⊨ B(c), where a' is the assignment which is like a except that it takes c to m.
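Over a finite initial segment of the natural numbers, this induction can be carried out mechanically. The sketch below is my own illustration: the encoding of quasi-formulae by Python predicates, the function names, and the truncation of the domain are all invented for the example.

```python
DOMAIN = range(10)   # a finite initial segment standing in for the
                     # natural-number domain assumed in the text

def expand(assignments):
    """Compute v_0, v_1, ... for a list of ('exists'|'forall', matrix) pairs
    given in the order of condition (2); matrix(m, values) may consult the
    previously computed values, mirroring (M, (v_gamma)_{gamma<alpha})."""
    values = []
    for kind, matrix in assignments:
        if kind == 'exists':                  # clause (1): least witness of B
            candidates = [m for m in DOMAIN if matrix(m, values)]
        else:                                 # clause (2): least m with B(c) -> ⊥
            candidates = [m for m in DOMAIN if not matrix(m, values)]
        values.append(min(candidates) if candidates else 0)   # clause (3)
    return values

# f(∃v A(v)) with A = "v > 3": the least witness is 4;
# f(∀v B(v)) with B = "v ≠ v_0": the least counterexample is v_0 itself.
vals = expand([('exists', lambda m, vs: m > 3),
               ('forall', lambda m, vs: m != vs[0])])
```

Note how the second value consults the first, exactly as v_α may depend on the v_γ with γ < α.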

The rules are obviously sound under this interpretation and any model for L can be expanded in this way. The adequacy of the rules therefore follows. It is also worth mentioning, perhaps, that a function satisfying the conditions specified above is easily defined for any L; in fact, it can be constructed by a routine adaptation of the technique used in the Henkin completeness proof to extend a consistent set of sentences to a saturated one.⁸

As with rule (5), in the presence of conclusions other than those operated upon by the rule, (9∀) is not intuitionistically valid. For example, it allows us to derive

           Π
    ∀v(A(v) ∨ B)_m
 ----------------------- (10∀)
  A(f(∀v'A(v'))) ∨ B_n
 ----------------------- (4)
  A(f(∀v'A(v')))_p   B_q
 ----------------- (9∀)
     ∀vA(v)_r
 ----------------------- (3)
     ∀vA(v) ∨ B_s

Furthermore, even in the absence of such additional conclusions, (10∃) and (5) taken together lead outside the confines of intuitionistic validity. This can be seen from the following derivation:

(5.2)
     B_m    B → ∃v'A(v')_n
   --------------------------- (6)
          ∃v'A(v')_q
   --------------------------- (10∃)
         A(f(∃vA(v)))
   --------------------------- (5)
   B → A(f(∃vA(v)))_r {B_m}
   --------------------------- (9∃)
        ∃v'(B → A(v'))_s

It seems reasonable to claim that (10∃) itself is intuitionistically valid and that rule (5) is problematic. Certainly, when the rules are applied downwards, the only way to obtain intuitionistic logic seems to be by restricting the latter, and, of course, (9∀), rather than (10∃). A purist might argue that the individual asserted to exist on the basis of an intuitionistic proof of ∃vA(v) depends on the derivation of this conclusion and not simply on its form. Even a version of (10∃) modified to meet this objection, however, would not prevent the derivation of ∃v(B → A(v)) from B → ∃vA(v). It may seem anomalous that a rule which has traditionally been claimed to express the meaning of intuitionistic implication turns out to be invalid. This is so, however, only when it is applied to quasi-formulae, and no effort has been made to interpret these in an intuitionistically meaningful way. The question of how best to characterize the intuitionistic part of this calculus will be deferred until after substitution has been defined.⁹

⁸This interpretation notwithstanding, I should emphasize that f is just an auxiliary syntactic device. It is not even part of the vocabulary of L, and certainly not a logical symbol comparable to Hilbert's ε-symbol.

⁹Notice that, if the definition of ND is extended to include rules (9Q) and (10Q), the former being applied downwards and the latter upwards, there is no need for f or for the use of quasi-formulae. It is sufficient to require that the parameter generalized in (9∀), or introduced by (10∃), not occur in any other assumptions or conclusions of the derivation.


In the light of my earlier remarks on the subject, the definition of substitution should come as no surprise. The only questions will concern its properties. Given any graph G, let c(G) be the cardinality of its set of vertices. Furthermore, let D be the class of derivations generated by rules (1)-(10Q) when they are applied downwards and their application is interpreted in terms of combination in the manner outlined earlier.

Definition 5.2 Let Π be a quasi-derivation with conclusions of the form A_i and let Π' be a member of D. [Π/A_i]Π', the result of substituting Π for each occurrence of the assumption A_i in Π', is defined by induction on c(Π') as follows:

Case 0: c(Π') = 1.

If Π' = A_i, [Π/A_i]Π' is Π.
If Π' ≠ A_i, [Π/A_i]Π' is Π'.

Now suppose c(Π') = n (n > 1) and that [Π/A_i]Π'' has been defined for all Π'' such that c(Π'') < n. There are a number of cases to consider, depending upon the form of Π', but these fall into four groups.

Cases 2, 3, 7, 9Q and 10Q: Π' is of the form

  Π''
  C_n
 ------ R
  D_m

where R is any single-premise, single-conclusion rule except (5). Let p be an index which occurs nowhere in Π or Π''; then

              [Π/A_i](Π''(C_p/n))
[Π/A_i]Π' =          C_p
                    ------ R
                     D_m

Case 4: Π' is of the form

   Π''
  C∨D_n
 C_m  D_p

This is just a notational variant of the preceding cases.

This is just a notational variant of the preceding cases. Cases 1 and 6: IT is of the form

iii n 2

D

where R is rule (1) or (6). Let r and s be indices which occur nowhere in II or 11', then

rnM<](n1(flr/m)) [U/AijW = Br

Dq

[nM,](n2(c,/n))


Case 5: Π' is of the form

     Π''
     C_n
 B → C_m {B_p}

where p occurs nowhere in Π and only on assumptions of the form B_p in Π''. Let q be an index distinct from p which occurs nowhere in Π or Π''; then

              [Π/A_i](Π''(C_q/n))
[Π/A_i]Π' =          C_q
                B → C_m {B_p}

The foregoing definition may fairly be claimed to express the meaning of substitution in the context of multiple-conclusion derivations. Nevertheless, it might be thought that substitution could simply have been identified with combination. D, however, is not closed under arbitrary combinations. It is, of course, immediate that D is closed under substitution as defined above. Furthermore, if Π is a derivation of Δ, A_i from Γ and Π' is a derivation of Δ' from Γ', A_i, then [Π/A_i]Π' is a derivation of Δ, Δ' from Γ, Γ'. It only remains to show:

Theorem 5.1 The operation of substitution is well-defined.

Proof. The problem here is that the various cases of the definition are not exclusive. They are easily seen to be exhaustive. The only doubt that may arise is on account of Case 5. Recall, however, that if

         Π₁
Π  =     B_n
     A → B_m {A_q}

and p is any subscript not occurring in Π, then

      (A_p/q)Π₁
Π  =     B_n
     A → B_m {A_p}

as well. In other words, if Π is obtained by an application of rule (5), it can always be written in the form required by Case 5. The proof is by induction on c(Π'), utilizing the fact that combination satisfies

(5.3)  [[Π₁, A_i, Π₂], B_j, Π₃] = [[Π₁, B_j, Π₃], A_i, Π₂]

provided that A_i is not among the conclusions of Π₃, nor B_j among those of Π₂.¹⁰

¹⁰Cf. the remarks following the definition of combination.

The basis, c(Π') = 1, is trivial. As for the induction step, there are


ten kinds of case to consider, some divided into subcases, corresponding to pairs of the four kinds of case in the definition. By way of example, I will sketch the case in which both representations of Π' fall under the third of the four inductive clauses of the definition, i.e., the case in which

        Π₁  Π₂         Π₃  Π₄
Π'  =   B   C     =    E   F
        ------         ------
          D              G

(I omit subscripts to simplify the notation.) There are four subcases to consider.¹¹

Subcase 1: For some Π'',

         Π''  Π₄                     Π₁  Π''
Π₂  =   C  E   F       and   Π₃ =    B   C  E
           -----                     -----
             G                         D

Given a derivation Σ with conclusions C and E, let

          [Π/A]Σ  [Π/A]Π₄                      [Π/A]Π₁  [Π/A]Σ
Φ(Σ)  =     C  E    F         and   Ψ(Σ)  =       B      C  E
               -----                              -----
                 G                                  D

Then,

           Π₁  Π₂          [Π/A]Π₁  [Π/A]Π₂
[Π/A]      B   C      =       B        C
           ------             ----------
             D                    D

                           [Π/A]Π₁  Φ(Π'')
                      =       B        C
                              ----------
                                  D

                           Ψ(Π'')  [Π/A]Π₄
                      =       E       F          by (5.3)
                              ---------
                                  G

                                   Π₃  Π₄
                      =   [Π/A]    E   F
                                   -----
                                     G

¹¹The point to be emphasized is that these are the only possibilities. It was to ensure this that I insisted on distinguishing between derivations which are almost alike and did not simply identify the application of a rule with combination. If Π' is thought of as having been constructed by two different sequences of operations, say (α_i)_{i≤n} and (β_j)_{j≤m}, then it is not possible for α_n and β_m to be distinct unless they operate upon disjoint sets of vertices. I have in fact arranged matters so that, if (α_i)_{i≤n} and (β_j)_{j≤m} are both constructions of Π', then m = n and, for some permutation p of 1, ..., n, α_{p(i)} = β_i (1 ≤ i ≤ n). Suppose, for example, that

        Π₁  Π₂          Π₃  Π₄
Π'  =   A_i B_j    =    A_i B_j
        -------         -------
          C_k             C_k

Then there is no Π'₄ such that

        Π₃  Π'₄
Π₂  =   A_i B_j
        -------
          C_k

with Π₁ = Π₃ and Π₄ = Π'₄(B_j/n). It is straightforward to check that, had cases of this kind not been ruled out, substitution would not in general have been a well-defined operation.

Subcase 2: For some Π'',

         Π''  Π₄                     Π''  Π₂
Π₁  =   B  E   F       and   Π₃ =   E  B   C
           -----                       -----
             G                           D

Subcase 3: For some Π'',

         Π₃  Π''                     Π''  Π₂
Π₁  =    E   F  B      and   Π₄ =   F  B   C
         -----                         -----
           G                             D

Subcase 4: For some Π'',

         Π₃  Π''                     Π₁  Π''
Π₂  =    E   F  C      and   Π₄ =    B   C  F
         -----                       -----
           G                           D

These three are just notational variants of Subcase 1, and there are no additional complications associated with the remaining cases. ∎

Finally, notice that substitution satisfies the following conditions:

(5.4)  [[Π/A_i]Π₁/B_j]Π₂ = [Π/A_i]([Π₁/B_j]Π₂)

provided that B_j is not among the conclusions of Π nor A_i among the assumptions of Π₂, and

(5.5)  [Π/A_i][Π₁/B_j]Π₂ = [Π₁/B_j][Π/A_i]Π₂

provided that A_i is not among the assumptions of Π₁ nor B_j among those of Π. It is an easy matter to prove both (5.4) and (5.5) by induction on c(Π₂).


Suppose now that A_i ≠ B_j and that m, n occur nowhere in Π, Π₁ or Π₂. I will write

  Π    Π₁
 A_i   B_j
    Π₂

for [Π(A_n/i)/A_n][Π₁(B_m/j)/B_m](A_n/i)(B_m/j)Π₂. In view of (5.5) above, the latter can be described as the result of simultaneously substituting the conclusions A_i of Π and B_j of Π₁ for the assumptions A_i and B_j, respectively, of Π₂. This notation generalizes in the obvious way to

    Π₁     ...     Πₙ
 A^1_{i_1} ... A^n_{i_n}
          Π

which denotes the result of simultaneously substituting the conclusions A^k_{i_k} of Πₖ for the assumptions A^k_{i_k} of Π (1 ≤ k ≤ n). I will also write

    Π
 A_i  B_j
 Π₁   Π₂

for [[Π(A_n/i)(B_m/j)/A_n](A_n/i)Π₁/B_m](B_m/j)Π₂. I would like to be able to claim that

[[Π/A_i]Π₁/B_j]Π₂ = [[Π/B_j]Π₂/A_i]Π₁,

provided B_j is not among the conclusions of Π₁ nor A_i among those of Π₂, and to describe

    Π
 A_i  B_j
 Π₁   Π₂

as the result of simultaneously substituting the conclusions A_i and B_j of Π for the assumptions A_i of Π₁ and B_j of Π₂, respectively. Unfortunately, however, the latter is not in general equal to

    Π
 B_j  A_i
 Π₂   Π₁

This notation is, therefore, intended to represent consecutive, rather than simultaneous, substitutions, the order of substitution being indicated by the left/right order of the conclusions of the derivation being substituted. I will also sometimes write [Π/A_i]Π' as

  Π
  Π'

(The notation introduced above for substitution coincides with that for

upward application in those cases where the derivation being substituted has the form of one of the rules. For example,

 A∧B_m
  A_n
   Π

can be read either as the result of substituting the conclusions A_n of

 A∧B_m
  A_n

for the assumptions A_n of Π, or as the result of applying rule (2a) upwards to the assumptions A_n of Π. To resolve this ambiguity, I specify that henceforth the former interpretation is always the intended one.)

Now that substitution has been defined, it is possible to establish the relationship between members of D and the derivations of the familiar Gentzen calculi, at least for those versions of them which do not involve thinning. In the case of the N calculi, this means excluding applications of ∨- and ∃-elimination which do not discharge an assumption in the derivation of each minor premise. (That such applications involve a thinning procedure will be argued below.) Thinning is a troublesome feature of Gentzen's calculi which affects properties involving normalization and normal forms more than derivability.¹² It seems best, therefore, to postpone discussion of this rule until we consider normalizability.

I begin by considering NJ with a view to interpreting its rules in such a way that they generate a subclass NJ_D of D. This, in turn, induces a structure preserving map between the derivations of NJ and NJ_D. It should be obvious how to proceed except for a couple of points of detail. One of these has to do with subscripts. If they are included as part of the formalism of NJ, there is no problem. If they are not, however, each derivation of NJ must be associated with a subset of D whose members are congruent. Unfortunately, this relation is not really a congruence with respect to any

¹²In the case of the L calculi, derivability is unaffected by the omission of thinning if the rules are formulated after the manner of Chapter 2 above, rather than Chapter 1, and the negation rules are replaced by axioms for ⊥. The absence of thinning, however, complicates somewhat the proof of the cut-elimination theorem since it requires the reduction steps to be supplemented by a pruning operation on derivations. A similar complication arises in the proof of the normalization theorem for N if empty assumption classes are not allowed in applications of ∨- and ∃-elimination. It was perhaps for some such reason that Gentzen included the rule, although his motivation in the case of N might equally well have been to treat all the rules which discharge assumptions in a uniform way. (If →-introduction is restricted so that it must always discharge an assumption, it will still be deductively equivalent to the usual formulation of the rule in the presence of the rules for conjunction. The restricted rule, however, spoils both the normal form theorem and the separability property.)

I take "derivable from Γ" to mean "there is a derivation with assumptions from among the members of Γ." If it means "there is a derivation with assumptions Γ," it still remains unaffected by the absence of thinning (for the calculi with which we are concerned) although other nice properties, like separability, will fail.

96 Normalization, Cut-Elimination and the Theory of Proofs

operation, such as substitution or applying a two-premise rule, which combines two or more derivations. In the case of such operations, therefore, it is necessary to specify the representative from each congruence class to which they are to be applied.13 There is no particular difficulty involved in doing this. So, for the purposes of the comparison, it makes little difference whether subscripts are included as part of NJ or not.

The other point concerns the proper parameter in an application of ∀-introduction or ∃-elimination. There are two possibilities. The first is to allow quasi-formulae to figure in NJ derivations and modify the restrictions on these rules to ensure that the proper 'parameter' is always an appropriately chosen free variable. The second is to exclude quasi-formulae and introduce a one-one correspondence g between parameters and variables. The rules can then be left unchanged except for the requirement that the proper parameter of an application of ∃-elimination, for example, with major premise ∃vA(v) must be g(f(∃v′A(v′))) for some v′. (A similar remark applies to ∀-introduction.) Since matters have been arranged so that there will always be infinitely many parameters available for each such application, little if anything is lost by this restriction. Nothing much depends on which alternative is chosen, but I prefer the second and adopt it below.

With the exceptions of ∨- and ∃-elimination, the schematic descriptions of the rules for NJ correspond to my notation for the application of a rule in multiple-conclusion logic. Hence, they can be read ambiguously as generating the conventional tree derivations or members of D. As for ∨- and ∃-elimination, their application is interpreted as follows:

          [A_i]   [B_k]                             Π
    Π       Π₁      Π₂                            A∨B_p
  A∨B_p     C_q     C_q    corresponds to      A_i     B_k
  ─────────────────────                        Π₁      Π₂
           C_q                                 C_q     C_q

and

              [A(g(x))_q]                           Π
    Π             Π₁                             ∃vA(v)_p
  ∃vA(v)_p        C_r      corresponds to         A(x)_q
  ─────────────────────                             Π₁′
           C_r                                      C_r

¹³A similar procedure is necessary in NJ itself. For example, if each of

    Π          Π′
    A   and    B

contains an assumption class involving C, some way is needed to indicate whether these two classes are to be amalgamated or kept distinct in

    Π     Π′
    A     B
   ──────────
    A ∧ B


(where x = f(∃v′A(v′)) and Π₁′ is obtained from Π₁ by replacing all occurrences of g(x) which are linked to those in assumptions of the form A(g(x))_q by x).¹⁴ I write NJ_D for the class of derivations generated by the rules of NJ interpreted as instructions for the construction of multiple-conclusion derivations. There is then an obvious isomorphism between the derivations of NJ and the members of NJ_D (or congruence classes thereof, depending upon whether subscripted formulae are allowed to appear in NJ derivations). If almost alike members of NJ_D are identified, this becomes a homomorphism from the derivations of NJ onto those in NJ_D. I shall have more to say about these correspondences later.

I turn now to LK; there is no need to give a separate treatment of LJ since it is simply a special case of LK. As before, everything is very straightforward except for a couple of points of detail. These are for the most part concerned with the formulation of LK to be chosen. If sequents are taken to be of the form Γ ⊢ Δ, where Γ and Δ are sets of subscripted formulae, and the negation rules are replaced by axioms for ⊥, the rules of LK are easily interpreted as generating members of D. (Again, there is a small point to be considered concerning the proper parameter of an application of ∀-right or ∃-left. The situation is exactly analogous to the case of NJ, however, and I propose to deal with it in the same way.) I will sketch such an interpretation for this formulation, and then indicate very briefly how it can be adapted to other versions of the sequent calculus.

There are two kinds of axioms, namely

    A_i ⊢ A_j   and   ⊥_i ⊢ A_j

for all A, i and j. These are interpreted as the derivations

    A_i          ⊥_i
    A_j   and    A_j

respectively. As for structural rules, interchange is redundant and thinning has been excluded for the time being; cut corresponds obviously to substitution, and contraction is just a special case of resubscripting an assumption or conclusion. (It is clear that D is closed under such resubscripting in the sense that, if Π ∈ D, then (A_i/j)Π′ and Π′(A_i/j) are both in D for some Π′ which is like Π. In case this is properly an instance of contraction, i.e., in case A_i is already among the open assumptions or conclusions of Π, Π′ can be taken to be Π.) Right rules are handled in the same way as the introduction rules of NJ, so it only remains to consider the left rules. To ensure that D is closed under these, a way must be found to interpret

¹⁴Let α and β be occurrences of a parameter or free variable in a derivation (i.e., of the same parameter or variable); then α and β are said to be linked if (α, β) is in the least equivalence relation R satisfying the condition:

    (α, β) ∈ R if the formula occurrence which contains β lies immediately below the one containing α.
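The least equivalence relation generated by such "immediately below" pairs can be computed mechanically by a union-find closure. A small sketch (Python; the occurrence names are my own illustration, not the book's notation):

```python
def linked_classes(occurrences, below_pairs):
    """Partition occurrences into the classes of the least equivalence
    relation containing below_pairs (each pair: an occurrence and the
    occurrence lying immediately below it)."""
    parent = {o: o for o in occurrences}

    def find(x):                      # find the class representative
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for a, b in below_pairs:          # merge the two classes
        parent[find(a)] = find(b)

    classes = {}
    for o in occurrences:
        classes.setdefault(find(o), set()).add(o)
    return {frozenset(c) for c in classes.values()}

# Three occurrences stacked vertically are linked; y1 stands alone.
cls = linked_classes({"x1", "x2", "x3", "y1"}, [("x1", "x2"), ("x2", "x3")])
print(cls)
```

Transitivity and symmetry come for free from the shared representatives, which is exactly what taking the least equivalence relation demands.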


them in terms of downward applications of multiple-conclusion rules. This is accomplished with the help of substitution. I give the examples of ∨- and →-left below; it should be apparent how to proceed in the remaining cases:

     Π             Π′                               A∨B_k
  A_i, Γ ⊢ Δ   B_j, Γ′ ⊢ Δ′                      A_i     B_j
  ─────────────────────────   corresponds to     Γ Π     Γ′ Π′
    A∨B_k, Γ, Γ′ ⊢ Δ, Δ′                         Δ       Δ′

and

     Π             Π′                                Γ
  Γ ⊢ Δ, A_i   B_j, Γ′ ⊢ Δ′                          Π
  ─────────────────────────   corresponds to     Δ  A_i  A→B_k
    A→B_k, Γ, Γ′ ⊢ Δ, Δ′                               B_j
                                                     Γ′ Π′
                                                       Δ′

Let LK_D and LJ_D denote the classes of derivations generated by the rules of LK and LJ, respectively, when interpreted as above.

No real difficulty arises if sequents are taken to be of the form Θ ⊢ Ψ, where Θ and Ψ are sequences of unsubscripted formulae, and rules of interchange are added to LK. A sequent in this sense can be associated with each member of D as follows:

Given an enumeration e of the formulae of the language, define

    A_i > B_j iff i > j, or i = j and A comes after B in e.

For Π ∈ D, replace all free occurrences of variables in the assumptions or conclusions of Π by their images under g⁻¹, then place the formulae thus obtained from the assumptions in decreasing order (with respect to >) to the left of ⊢, and those which result from the conclusions in increasing order to the right. Finally, delete all subscripts. The rules of LK can now be interpreted as generating the equivalence classes of members of D obtained by identifying congruent derivations associated with the same sequent. One need only ensure that a formula occurrence introduced by the application of a rule is assigned a sufficiently large subscript, and allow for the necessary resubscripting in the case of the two premise rules. Interchange is taken care of by one or more changes of subscript.
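The ordering just defined amounts to a sort key, and the passage from a derivation to its associated sequent to a pair of sorts. A sketch (Python; the pair encoding of subscripted formulae and all names are my own, and the replacement of variables under g⁻¹ is omitted for simplicity):

```python
def key_for(e):
    # A_i > B_j iff i > j, or i == j and A comes after B in the enumeration e
    def key(sf):
        formula, subscript = sf
        return (subscript, e.index(formula))
    return key

def associated_sequent(assumptions, conclusions, e):
    left = sorted(assumptions, key=key_for(e), reverse=True)   # decreasing
    right = sorted(conclusions, key=key_for(e))                # increasing
    # finally, delete all subscripts
    return [f for f, _ in left], [f for f, _ in right]

e = ["A", "B", "C"]                      # an enumeration of formulae
left, right = associated_sequent({("A", 2), ("B", 1)},
                                 {("C", 1), ("A", 1)}, e)
print(left, right)
```

Because the key is lexicographic on (subscript, position in e), formulae with equal subscripts are tie-broken exactly as the definition prescribes.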

If the axioms for ⊥ are replaced by left and right rules for negation, there are two possibilities. One is to define negation in terms of ⊥ and treat its rules as special cases of the corresponding ones for implication. The drawback to this approach is that LJ cannot be obtained from LK by allowing no more than one formula to appear on the right of a sequent; in addition, sequents of the form Θ ⊢ ⊥, A or Θ ⊢ A, ⊥ are needed.¹⁵ The

¹⁵The resulting system [when restricted in the manner indicated in the text] is an adequate formalization of classical [intuitionistic] logic in the sense that Θ ⊢ Ψ is provable


other alternative is perhaps more natural. It involves treating negation as primitive and augmenting the multiple-conclusion rules of Chapter 4 by the analogues of ¬-left and ¬-right. This can be done conveniently, however, only after the introduction of thinning, so I shall omit a discussion of these rules here.

I turn now to NK. Whether we follow Gentzen and add as axioms all instances of the law of excluded middle, or adopt Prawitz's classical negation rule, it is easy to extend the interpretation of NJ to NK, as the following shows:

A ∨ (A→⊥)_i corresponds to

           [A_n]
      A_p        ⊥_q
  A∨(A→⊥)_i    A→⊥_r {n}
               A∨(A→⊥)_i

and

  [A→⊥_n]
     Π                              [A_m]
    ⊥_k       corresponds to    A_i      ⊥_p
  ─────────                         A→⊥_n {m}
    A_i                                 Π
                                       ⊥_k
                                       A_i

I will use NK1_D and NK2_D, respectively, to denote the extensions of NJ_D obtained by means of each of the above; NK_D will denote ambiguously NK1_D or NK2_D. It is clear that NK1_D ≠ NK2_D. For example, the particular derivation of A ∨ ¬A_i chosen to interpret this axiom is not in NK2_D,

and

       [A_n]
   A_p      ⊥_q
      A→⊥_r {n}    A_s
           ⊥_t
           A_p

is not in NK1_D. Another alternative suggested by Gentzen is to add the double negation rule

    ¬¬A
  ───────
     A

iff '∧Θ → ∨Ψ′' is classically [intuitionistically] valid, where Ψ′ is a non-empty subsequence of Ψ obtained by deleting 0 or more terms of the form ⊥. If Ψ is never allowed to contain more than one formula, the result is a system of minimal logic. Thus, we can dispense with the axioms for ⊥ without adding any new rules or axioms and still characterize, albeit in a slightly artificial way, minimal, intuitionistic and classical logic. Similar considerations would allow us to dispense with rule (7) at the expense of some artificiality.


The obvious interpretation of

       Π
  (A→⊥)→⊥_i
  ──────────
      A_j

is

        [A_n]              Π
    A_j      ⊥_p
        A→⊥_q {n}    (A→⊥)→⊥_i
                ⊥_r
                A_j

The resulting extension of NJ_D, however, is just a subset of NK2_D. As a matter of fact, none of these extensions seem particularly natural. What seems to be essential to each of them is the use of the configuration

       [A_m]
   A_i      ⊥_p
       A→⊥_n {m}

in constructing derivations. But the obvious way to do this, while preserving the single-conclusion character of NK, is to admit the rule

   [A]    [A→⊥]
    Π       Π′
    C       C
  ──────────────
        C

whose interpretation will be

        [A_m]
    A_i      ⊥_p
        A→⊥_n {m}
    Π          Π′
    C_k        C_k

The resulting extension of NJ_D, although obviously a proper subset of LK_D, properly includes NK1_D ∪ NK2_D.

The uniform interpretation of Gentzen's calculi in terms of D helps us better understand the similarities and differences between their respective rules. Furthermore, it allows relationships between the derivations of the various calculi to be expressed in a rather satisfactory way. In particular, it is a routine matter to verify (by induction on the rules in each case) that

NJ_D = LJ_D ⊆ NK_D ⊆ LK_D ⊆ D.

For the propositional parts of LK_D and D, this last inclusion can be replaced by an equality. The reason why LK_D ≠ D is that the restrictions on


(9∀) and (10∃) are more generous than those placed on ∀-right and ∃-left (or on ∀-introduction and ∃-elimination, for that matter). Clearly, there is no sequent derivation corresponding to

    A(f(∀vA(v)))_n            ∃v′A(v′)_n
    ∀v′A(v′)_m        or      A(f(∃vA(v)))_m

However, even members of D which contain no quasi-formulae among their (open) assumptions or conclusions may not be in LK_D. Consider, for example,

          Π₁                          Π₂
  ∃v(A(v) ∧ B(v))_n        ∃v₁(A(v₁) ∧ B(v₁))_m
    A(x) ∧ B(x)_p             A(x) ∧ B(x)_r
       A(x)_q                    B(x)_s
             A(x) ∧ B(x)_t
          ∃v₂(A(v₂) ∧ B(v₂))_u

where x = f(∃v₃(A(v₃) ∧ B(v₃))) for some v₃.

6

Reduction Procedures

The interpretation discussed at the end of the previous chapter can, of course, be extended to the various reduction procedures for N and L calculi. I shall not spell out how this is to be done since it is a purely mechanical matter to translate Prawitz's reduction steps for NJ, for example, into operations on the members of NJ_D. I shall, however, assume such a translation in the discussion which follows. In addition, the rules for generating D themselves suggest a method of normalization. The possibility seems to exist, therefore, for a uniform treatment of reduction in all five calculi. It is to this topic that I now turn.

Suppose Π ∈ D and A is any formula. There is no need to refer to the rules in order to explain what it means for an occurrence of A to be maximal in Π.

Definition 6.1 An occurrence of A is maximal in a derivation Π if there is a subformula B of A such that B occurs as one of its immediate predecessors and successors in Π.

In other words, a maximal occurrence is one which appears in a configuration of the form

    B
    |
    A
    |
    B

(I will not bother with subscripts for the moment.) A reduction step deletes the maximal occurrence of A and identifies the two occurrences of its subformula B. This is essentially what all the familiar reduction procedures accomplish. Unfortunately, there are various complications which obscure somewhat the basic picture. These are discussed in (1)-(4) below.
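Read purely combinatorially, this basic step is a local contraction on a chain of formula occurrences. A sketch (Python; the list-of-strings representation and the crude substring test for subformulae are simplifying assumptions of mine, not the book's definitions):

```python
def is_proper_subformula(b, a):
    # crude syntactic test, adequate for this illustration only
    return b != a and b in a

def contract_maximal(chain):
    """Find a maximal occurrence A flanked above and below by the same
    subformula B; delete A and identify the two occurrences of B."""
    for k in range(1, len(chain) - 1):
        above, a, below = chain[k - 1], chain[k], chain[k + 1]
        if above == below and is_proper_subformula(above, a):
            return chain[:k] + chain[k + 2:]   # one B survives, A is gone
    return chain

# B introduced into A&B and immediately eliminated again:
print(contract_maximal(["B", "A&B", "B"]))
print(contract_maximal(["C", "B", "A&B", "B"]))
```

A chain with no such flanked occurrence is returned unchanged, which is the degenerate case of a derivation already free of maximal occurrences.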

(1)

Reduction steps are thought of as operations on derivations. This means, in particular, that the result of applying one to a derivation must



itself be a derivation. The procedure described above does not satisfy this condition, however, since matters have been arranged so that the application of a rule may involve more than one occurrence of a formula. By itself, there is no reason why this should lead to different reduction procedures for the different calculi, except in the case of NK. (This is because NK_D is the only class under consideration which is not closed under all the usual reductions.) In addition, however, reductions are required to preserve as far as possible the local character of the above procedure and this means, in effect, defining them relative to a set of rules: a reduction removes a certain kind of inference step from a derivation, namely all the occurrences of a formula which figure as the premise of a particular application of a rule together with all the occurrences of the conclusion of this application. But different sets of rules do suggest different reductions when the latter are conceived in this way. For example, suppose Π, Π′ ∈ NJ_D and Π′ results from Π by removing an inference step of this kind while leaving the rest of the derivation intact; even though Π and Π′ must also be in LJ_D, there is no reason to suppose that the relationship between them can be expressed in such simple terms relative to the rules of LJ. This dependence on the rules seems to be the major source of the differences between various reduction procedures.

(2) Because there are rules with more than one premise or conclusion, the configuration displayed above may occur as part of a larger configuration having one of the forms

     B        B       B   C     C   B
     |        |        \ /       \ /
     A        A         A         A
    / \      / \        |         |
   B   C    C   B       B         B

Once the vertex labelled A has been deleted and the two vertices labelled B have been identified, the question arises as to what should be done with the vertex labelled C and that part of the derivation connected to it. There are basically two alternatives: one is to delete these as well, the other is to retain them while ensuring that C does not appear as an additional assumption or conclusion in the resulting derivation. For the latter an operation is needed which allows redundant formula occurrences or derivations to be adjoined to a given derivation. Out of deference to tradition I propose to call this operation thinning.

Although the most satisfactory procedure would seem to be one which pruned as much of the derivation as possible, there are a number of reasons for adopting the latter alternative, or something approximating to it. In the first place, the inclusion of thinning may actually simplify reduction in a calculus. This is because, if we consider the sequence (or possibly, tree) of

104 Normalization, Cut-Elimination and the Theory of Proofs

operations by which a derivation is constructed, ∨- and ∃-elimination (in NJ and NK), left rules (in LJ and LK) and right rules (in LK) are all such that applications of them may be made redundant by the removal of an earlier step of the construction. So, without thinning, reduction steps cannot operate simply on some initial subsequence (or subtree) of the construction. (In terms of the traditional representations of the derivations of Gentzen's calculi, the situation can be described by saying that applications of these rules which lie below a given inference step may be made redundant by its removal. As a result, a reduction step cannot simply operate on the subderivation terminating with the configuration to be removed, while the rest of the derivation remains unchanged, unless redundant applications of at least some of the rules are allowed.) Considerations of this kind are probably sufficient to explain Gentzen's treatment of thinning. It does not seem to be based on any general principles but, on the contrary, to be rather ad hoc and designed simply to facilitate the proof of the normal-form or cut-elimination theorem.¹

A second reason to allow some thinning is the well-known fact that, in its absence, normal forms are not in general unique. More precisely, they are not unique in any fragment which contains both conjunction and disjunction, as the following configuration illustrates:

    A    A ∨ B    A    B    C
         B ∧ C    C

Although the uniqueness of normal forms is an important desideratum, it did not become one until relatively recently and, therefore, cannot properly be used to explain features of the traditional reduction procedures.

Finally, without at least thinning on the right, the cut-elimination theorem will not hold for LK; there will, for example, be no cut-free derivation of ⊢ A, A→B.

There is no doubt that the treatment of thinning is another source

¹In the case of natural deduction, the fact that redundant applications of only two of the rules are allowed suffices to justify the claim that Gentzen's treatment is ad hoc. In the case of the sequent calculus, it is the use of thinning in the cut-elimination procedure which is disturbing. Neither alternative mentioned in the text is employed consistently; instead both of them are permitted, as well as everything in between. That is to say, using the example in the text, the whole subderivation connected to C may be deleted, selectively pruned, or left entirely intact. Its final form is determined by the position of the inferences from which it is constructed relative to the cut being eliminated. The principle is that, when an inference becomes redundant as a result of a cut-elimination step, it is deleted if it lies above the cut in question and retained if it lies below it. One consequence of this is that derivations which differ from one another in what are usually thought to be insignificant ways (i.e., when one results from the other by a trivial permutation of inferences) may reduce to entirely different cut-free forms.


of differences between reduction procedures. From the present point of view, however, it should not be. We can hardly claim to have understood the significance of reduction if we are unable to decide on general grounds what to do with those parts of a derivation made redundant by the removal of maximal formula occurrences. Furthermore, our decision on the matter should not simply reflect what is convenient given the format of a particular set of rules.

(3)

A third complication arises from the fact that the removal of an inference may add open assumptions or conclusions to the non-redundant part of a derivation. For the members of D and its various sub-classes, this can only occur in the reduction of maximal occurrences of an implication. There is no need to dwell on this case, however, since complete unanimity exists on how it is to be handled: such a reduction involves the removal of an application α of rule (5) followed by an application β of rule (6), so the derivation of the minor premise of β can be substituted for all the assumption occurrences reopened by the removal of α. (Negation treated as a primitive and governed by rules analogous to ¬-left and ¬-right provides another example of a connective which requires a reduction step involving this kind of complication.) Notice that, were it not for this feature of the step for implication, reduction would be a trivial matter. This is especially evident for the members of D since all the other reduction steps actually diminish the size of the derivation to which they are applied.
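Under a Curry-Howard reading, removing an introduction/elimination pair for implication and substituting the minor premise's derivation for the reopened assumptions is ordinary β-reduction. A minimal proof-term sketch (Python; the term datatypes are my own illustration, and capture-avoidance is deliberately ignored):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:            # an open assumption
    name: str

@dataclass(frozen=True)
class Lam:            # ->-introduction, discharging assumption `name`
    name: str
    body: object

@dataclass(frozen=True)
class App:            # ->-elimination: major premise `fun`, minor premise `arg`
    fun: object
    arg: object

def subst(t, name, repl):
    """Substitute repl for the open assumption `name` in t (no capture check)."""
    if isinstance(t, Var):
        return repl if t.name == name else t
    if isinstance(t, Lam):
        return t if t.name == name else Lam(t.name, subst(t.body, name, repl))
    return App(subst(t.fun, name, repl), subst(t.arg, name, repl))

def reduce_detour(t):
    """An introduction immediately followed by an elimination is removed
    by substituting the minor premise's derivation for the assumption."""
    if isinstance(t, App) and isinstance(t.fun, Lam):
        return subst(t.fun.body, t.fun.name, t.arg)
    return t

print(reduce_detour(App(Lam("x", Var("x")), Var("y"))))
```

The interesting point in the text survives the translation: this is the one step that can enlarge a derivation, since the substituted derivation may be copied once for each reopened assumption occurrence.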

(4)

The last point I want to raise is connected with the first. Call a collection X of occurrences of a formula in a derivation a premise or conclusion occurrence if its members constitute together one of the premises or conclusions, respectively, of a single application of some rule. If X is a premise occurrence, it is clear that its members need not all belong to the same conclusion occurrence (even if they are all intermediate) which means, in particular, that X may contain both maximal and non-maximal occurrences. The procedure outlined in (1) must, therefore, be supplemented if it is to succeed in removing all maximal formula occurrences from a derivation. For example, suppose X = X₁ ∪ X₂ where X₁ and X₂ are conclusion occurrences of different rules, the members of X₁ are maximal while those of X₂ are not, and X is a premise occurrence of an application α of rule R. Clearly, what is needed is a way to replace α by successive applications α₁ and α₂ of R which differ from it only by using the premises X₁ and X₂, respectively, instead of X. (X₁ together with the conclusion of α₁ can then be removed in the manner described in (1) above.) In case R is a one-premise rule, this is accomplished by replacing the derivation in question by one which is almost alike. In general, however, two-premise


rules require something more complicated. In any event, the above demonstrates the need to incorporate some method of splitting up inferences into the reduction procedure.

This need is another source of differences between reduction in N and reduction in L. In both kinds of system it is met by steps which allow the permutation of inferences. The effect of the permutative conversions in N, however, is to allow a premise occurrence to be split only when it contains some maximal members. On the other hand, there is no such restriction in L, where the fact that cuts may be permuted upwards past any other inference legitimizes every kind of split. As in the case of thinning, these differences are most plausibly explained by reference to the format of the rules and what is convenient for the proof of the normal-form or cut-elimination theorem. This state of affairs, however, is no more satisfactory than the corresponding one in (2). We ought to be able to decide on the basis of general principles, and independently of any particular formalism, which inferences may be split (without affecting whatever properties are preserved by reduction) and which, if any, may not.

Let me conclude this discussion with a couple of observations. All the problems associated with establishing a correspondence between cut-elimination and normalization derive from the issues raised in (2) and (4) above, i.e., from the need for thinning and permutative reductions. Because these are not treated in an altogether satisfactory manner in the traditional accounts of reduction, it would be desirable to provide a treatment which is in some sense more natural and, in addition, applies equally to both N and L calculi. By itself, this does not seem to be an especially difficult task; what is harder is to accomplish it in such a way that uniqueness of normal forms and the strong normalization property are preserved.

I turn now to a description of possible reduction procedures for the members of D. In view of my previous remarks, a necessary preliminary step is to define an operation corresponding to thinning. To this end, I propose to augment rules (1)-(7) of Chapter 4 by the following:

It would seem more natural to formulate these rules with A_k replaced by A_i. The resulting class of derivations would not be closed under contractions, however.

Thinning is essentially a method of combining derivations together, while at the same time dispensing with some assumptions or conclusions. There are at least two ways to represent it. The one I have chosen is to attach the thinned derivation (or formula) to the occurrences of a conclusion. The drawback to this approach is that there is a measure of artificiality involved in interpreting the usual thinning rules since these make no reference to a particular assumption or conclusion by means of which the thinned formula is attached to the rest of the derivation. The alternative is to do without attachments of this kind altogether and allow derivations to be disconnected graphs; roughly speaking, derivations will be constructed by taking sets of derivations in the old sense and discharging some of their assumptions or conclusions. Although superficially attractive, this approach is fraught with difficulties. It becomes necessary to redefine such basic notions as likeness, combination and substitution. Substitution, however, is not easily defined for derivations of this kind without sacrificing some of its basic properties. It must either be defined relative to some construction tree of the derivation being substituted into, or distinctions must be made between a variety of derivations all of which are essentially alike. (An example of the kind of undesirable distinction I have in mind is that between {Π} and {Π, Π′}, where Π and Π′ are alike and compatible.) It is technical problems of this kind which have led me to prefer the thinning rules given above.

Let D_T denote the class of derivations which is like D except that it is generated by the rules (LT) and (RT) in addition to (1)-(7), (9∀) and (10∃). The definition of substitution is easily extended to the members of D_T for, although new cases arise, these fall within the groups specified in the definition. In particular, case (RT) can be treated with Cases 1 and 6, and case (LT) can be treated with Case 4. A similar remark applies to the argument that substitution is well defined: new cases arise, but no new kinds of case. In view of this, I shall not bother to rewrite the definition, but will take it for granted that substitution into a member of D_T has been properly defined. Furthermore, the notation introduced earlier in the context of members of D will be used unchanged for the derivations of D_T.

The rules of Gentzen's N and L calculi can be interpreted in D_T without the restrictions required by D. In the cases of NJ and NK, this involves extending the earlier interpretation to cover applications of ∨- and ∃-elimination which do not discharge any assumptions in the derivation of the minor premise (or which do not discharge assumptions in both of them in the case of ∨-elimination). The idea is simply to use (LT) to add assumptions of the appropriate form when these are lacking. I consider the case of ∃-elimination below. It should be obvious how to apply the same technique to the troublesome cases of ∨-elimination: Using the same notation as before,

              [A(g(x))_q]                           Π
    Π             Π₁                             ∃vA(v)_p
  ∃vA(v)_p        C_r      corresponds to         A(x)_q
  ─────────────────────                             Π₁′
           C_r                                      C_r

where Π₁′ is obtained from Π₁ as before if Π₁ contains occurrences of A(g(x))_q as an assumption, and Π₁′ is

     Π₁
  A(x)_q  C_r

otherwise. To interpret LJ and LK, provision must be made for left and right thinning, which I take to be the following pairs of rules:

        B_k, Γ ⊢ Δ               B_k, Γ ⊢ Δ
  (a)  ────────────────    (b)  ────────────────
       A_i, B_j, Γ ⊢ Δ          B_j, Γ ⊢ Δ, A_i

and

        Γ ⊢ Δ, B_k               Γ ⊢ Δ, B_k
  (a)  ────────────────    (b)  ────────────────
       Γ ⊢ Δ, B_j, A_i          A_i, Γ ⊢ Δ, B_j

      Π                                   A_i  B_j  Γ
   B_k, Γ ⊢ Δ                                  Π
  ────────────────    corresponds to           Δ
  A_i, B_j, Γ ⊢ Δ

      Π                                    A_i  Γ
   Γ ⊢ Δ, B_k                                  Π
  ────────────────    corresponds to         Δ  B_j
  A_i, Γ ⊢ Δ, B_j

      Π                                      Γ
   Γ ⊢ Δ, B_k                                Π
  ────────────────    corresponds to      Δ  B_j  A_i
  Γ ⊢ Δ, B_j, A_i

      Π                                   B_j  Γ
   B_k, Γ ⊢ Δ                                Π
  ────────────────    corresponds to       Δ  A_i
  B_j, Γ ⊢ Δ, A_i

When k = j in the above, these rules are obviously equivalent to the familiar thinning rules for the sequent calculus. Their (b) versions do not add to the deductive strength of the calculus, for it is easy to demonstrate that without them

     ⊢ A
  ──────────
   A_i ⊢ A

is a derived rule. In other words, given

     Π
    ⊢ A

it is always possible to construct

     Π′
   A_i ⊢ A

without using the (b) versions of left or right thinning. The argument is by induction on the number of inference steps which follow the last application of →-right in Π and is routine.² As for their resubscripting function, it can be duplicated by the instances

     Γ ⊢ Δ, A_k                  A_k, Γ ⊢ Δ
  ─────────────────    and    ─────────────────
   Γ ⊢ Δ, A_j, A_i             A_i, A_j, Γ ⊢ Δ

of the (a) versions. (The right-hand side of a sequent can never be empty in this formulation of L.) These (b) rules are not even necessary for cut-elimination to hold. They do, however, make it possible to give a more systematic and, I hope, more rational treatment of thinning in the cut-elimination procedure, and that is why I have chosen to include them.

The thinning rules of an L calculus for deriving sequents composed of sequences of unindexed formulae are not hard to interpret in D_T. Right thinning is thought of as operating on the rightmost formula of a sequent, and left thinning on the leftmost one. I mention this only to emphasize once more that all variants of Gentzen's calculi can be interpreted within this multiple-conclusion framework without the need to tinker with their rules.

To substantiate this claim further, let us briefly consider how to interpret ¬-left and ¬-right in a multiple-conclusion calculus which treats negation as primitive. There are a number of ways to add rules for negation to D_T. Perhaps the most convenient is to follow Kneale and allow rules which have zero premises or conclusions.³ Then rule (7) can be replaced by:

         A_n   ¬A_m                      *
  (7′)  ─────────────    and    (8′)  ─────────────
              *                       A_n   ¬A_m

(Here * is supposed to indicate that (7′) has no conclusion and (8′) no premises. It can be thought of as an auxiliary symbol whose function is to close certain assumptions or conclusions. It is convenient to stipulate that, when * occurs as a result of applying (7′), this occurrence cannot be part of an application of (8′) as well.) Let D′_T be the class of derivations generated by this modified set of rules.

Corresponding to D′_T is a sequent calculus obtained from LK by removing the axioms for ⊥ and adding the usual negation rules. It is easy to see

² "last" here means "having no application of →-right below or to the left of it."  ³ The Development of Logic, page 542.

110 Normalization, Cut-Elimination and the Theory of Proofs

how this calculus can be interpreted in D'T.

Γ ⊢ Δ, A_n    corresponds to    Γ, ¬A_m ⊢ Δ

and

Γ, A_n ⊢ Δ    corresponds to    Γ ⊢ Δ, ¬A_m

Everything else is as before. Notice, however, that the (b) version of thinning on the right is necessary in this formulation of the sequent calculus if Γ ⊢ A_i is to be derivable from Γ ⊢.

Let NJDT be the class of derivations obtained by interpreting the rules of NJ as instructions for constructing members of DT, and similarly for NKDT, LJDT and LKDT. It is an easy matter to establish relationships between these classes similar to those described at the end of Chapter 5. Now, however,

NJDT ⊆ NKDT ⊆ LKDT ⊆ DT

and NJDT ⊆ LJDT ⊆ LKDT ⊆ DT.

It is not the case that LJDT ⊆ NKDT, because in the N calculi thinning is only used when it is needed for an application of ∨- or ∃-elimination, whereas arbitrary left thinnings are allowed in LJ.

I now want to list some reduction steps for the members of DT. In order to simplify matters, let DT be modified so that A_n in rule (7) is always atomic and A itself is distinct from ⊥. I assume that the corresponding modifications have also been made to both the classical and intuitionistic negation rules in natural deduction, and to the axioms for ⊥ in the sequent calculi. It is well known that such changes do not affect deductive strength. I propose to describe these reductions without first specifying what it means for a derivation to be in normal form. My reason for doing so is that I am unwilling to commit myself in advance to a particular notion of normal form. Clearly, however, a normal derivation should contain no maximal formula occurrences and should possess the subformula property, or something approximating to it. In addition, I would like to give a sufficiently comprehensive and general list of reduction steps so that the familiar reduction procedures can be interpreted in terms of them. This is my primary concern here, and I will not worry about whether all reduction sequences terminate or about the uniqueness of normal forms until later.
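Throughout what follows, reduction steps are described as replacing one subderivation of a figure by another. For ordinary tree derivations this replacement is plain subtree substitution, which can be sketched as follows. (This is a toy encoding of my own, not the book's DT calculus; the rule names are invented for illustration.)

```python
# Toy tree derivations: nested tuples (rule_name, premises, conclusion).
# Replacing a subderivation is ordinary subtree substitution, the
# "familiar and obvious" case that the multiple-conclusion setting
# complicates.

def replace(deriv, old, new):
    """Return deriv with every subtree equal to `old` replaced by `new`."""
    if deriv == old:
        return new
    rule, premises, conclusion = deriv
    return (rule, tuple(replace(p, old, new) for p in premises), conclusion)

# Example: an assumption A used as the premise of a one-premise rule.
ax_A = ("assumption", (), "A")
ax_B = ("assumption", (), "B")
d = ("weaken", (ax_A,), "A")
assert replace(d, ax_A, ax_B) == ("weaken", (ax_B,), "A")
```

The point of the sketch is only that, for trees, "applying the same series of rules to Π₃ rather than Π₁" is exactly this kind of structural substitution; the difficulty discussed below is that DT derivations do not decompose so cleanly.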

To state the reductions I must introduce a further piece of notation. Consider for a moment the simplest kind of proper reduction step in any Gentzen calculus. Roughly speaking, it can be translated as follows:

r n

AAn -.A

Reduction Procedures 111

If Π has some subderivations of the form

    Π′
    A
    ────── Introduction
    f(A)
    ────── Elimination
    A

then Π reduces to the result of replacing these by derivations of the form

    Π′
    A

The problem is to specify what constitutes a subderivation of Π, and what it means to replace a subderivation in Π by some other figure. In the case of tree derivations the answers to both these questions are familiar and obvious. The matter is a little more complicated here, however. I want to say that Π₁ is a subderivation of Π₂ if the latter can be obtained from the former by applying a series of rules of inference, and that the result of replacing Π₁ by Π₃ in Π₂ is the figure obtained by applying this same series to Π₃ rather than Π₁. (In fact, I am really describing a special kind of subderivation here, namely one whose assumptions are also assumptions of Π₂. Call it an initial subderivation. A more general notion is obtained if, in addition to applications of rules, substitution for the assumptions of Π₁ is allowed.) The step described above cannot always be expressed in the form

    Π′                     Π′
    A                      A
    ──────   reduces to    Π″
    f(A)
    Π″

even if it is simply the translation of a reduction in NJ or LJ. Suppose, for example, that I want to remove a maximal occurrence of f(A) in the derivation of the left minor premise of an application of ∨-elimination. This will correspond, on the multiple-conclusion interpretation of the rules of NJ, to the removal of one or more occurrences of f(A) from a derivation Π* in NJDT. It may not be possible, however, to represent Π* in the form shown on the left above, where the occurrences of f(A) displayed in the figure are those to be removed by the reduction. Π* might have the form

n i c

( n2 \ A

f(A) A

V n3 / E

D

n4 E


or be the result of substituting this latter for the assumptions E in some Π₅, and so on. (Again, it is the failure of generalized associativity, and in particular of condition (4.4), which is responsible for the present difficulty.) Another possibility is that Π* results from the figure shown by applying rule (5) to its conclusions of the form E. In this case, there may not even be a Π₅ such that Π* is of the form

[[[[Π₁/C]Π⁺/A]Π₃/D]Π₄/E]Π₅

where Π⁺ is

    Π₂
    A
    ──────
    f(A)  A.

These considerations underlie the following definition.

Definition 6.2 (1) a. For Π ∈ DT, the subset S(Π) of DT is defined by induction as follows:

i. Π ∈ S(Π).
ii. If Π′ ∈ S(Π) and Π″ results from Π′ by applying any one premise rule, then Π″ ∈ S(Π).
iii. If

III

then

R € S(II) and

rii n2

n2 eDT

n2 II! c n Bm

Dp

Bm Cn and _ ^ _ _ _

are both in S(Π), where R is any two premise rule.

b. S′(Π) is defined like S(Π) except for the additional clause:

iv. If Π′ ∈ S′(Π) and Π″ ∈ DT has conclusions of the form A_i, then [Π″/A_i]Π′ ∈ S′(Π).

Π is said to be a subderivation of Π* if Π* ∈ S′(Π), and an initial subderivation of Π* if Π* ∈ S(Π).

(2) a. S₀(Π) is defined like S(Π) above except that (i) is replaced by the clause:

i′. If Π′ ∈ DT, and A_i appears among both the conclusions of Π and the open assumptions of Π′, then [Π/A_i]Π′ ∈ S₀(Π).

Π′ ∈ S_{n+1}(Π) iff Π′ ∈ S₀(Π″) for some Π″ ∈ S_n(Π).

b. S′₀(Π) is defined like S₀(Π) except that S′(Π) replaces S(Π) in its definition.

S′_{n+1}(Π) is defined like S_{n+1}(Π) with S′_n(Π) and S′₀(Π) playing the roles of S_n(Π) and S₀(Π), respectively.

Π* is said to have subderivations of the form Π if Π* ∈ S′_n(Π) for some n, and to have initial subderivations of the form Π if Π* ∈ S_n(Π) for some n.

If Π* has initial subderivations of the form Π, then Π* is the result of applying a sequence of operations to Π. These can be taken to consist of substitutions and applications of rule (5), since applications of all the other rules can be treated as special cases of substitution. If Σ abbreviates such a sequence, Π* may be written as Π∕Σ. (Notice that, although the latter denotes a derivation, Σ by itself merely stands for a sequence of operations.)

Suppose that Π* ∈ S_m(Π) and that Π′ is a derivation with the same conclusions as Π. It is obvious that Π′∕Σ, the result of applying the operations in Σ to Π′,⁴ is a member of S_m(Π′) with the same conclusions as Π*. If, in addition, the open assumptions of Π′ are included amongst those of Π, then those of Π′∕Σ will also be included amongst those of Π*. I propose to use Σ, Σ′, Σ₁, … to stand for sequences of such operations in general. Using this notation, the reduction step (6.1) above can be written as

    Π′                     Π′
    A                      A
    ──────   reduces to    Σ
    f(A)
    Σ

It is just the special case in which Σ consists of a single substitution.

I will write Π* as Π∕(Σ) to indicate that Π* has subderivations of the form Π. (I.e., the parentheses around Σ mean that it may contain operations which require substitution for the assumptions of the derivation under construction.) It might be argued that, since the notion of initial subderivation does not correspond to anything very natural when applied to the derivations of N and L, the reduction steps for these calculi would be better formulated in terms of subderivations. The above would then become

    Π′                     Π′
    A                      A
    ──────   reduces to    (Σ)
    f(A)
    (Σ)

The former, however, seems more appropriate in the present context. Both

⁴ What is intended by this phrase should be sufficiently clear for the purposes of the present discussion. See Definition 6.3 below for a proper account of it and of some related notions.


formulations are, of course, equivalent because, if

          Π′
          A
    Π* =  f(A)  A
          (Σ)

it is always possible to find Π″ and Σ′ such that

          Π″
          A               Π″       Π″
    Π* =  f(A)    and    (Σ)   =   Σ′
          Σ′

(I expand on this remark below.) The differences between the reduction steps for NJ, LJ and LK cannot be expressed simply by placing conditions on Σ. From this perspective, (proper) reductions in all three calculi are essentially of the same sort. To represent them we must in general allow the derivation on the left to be a member of S_n(Π⁺), for any n, where

          Π′
          A
    Π⁺ =  ──────
          f(A)

This kind of reduction step does not seem particularly natural, however, when the rules for generating DT are considered. (The criterion of naturalness employed here is that a single reduction should remove exactly those maximal formula occurrences which constitute the conclusion of a single application of an introduction and the major premise of a single application of an elimination.) The reason is that substitution is not a basic rule of DT. In view of this, it would be more appropriate to restrict the step described above by requiring that the derivation on the left be a member of S(Π⁺) (where Π⁺ is as above). There is a sense, therefore, in which a proper reduction step in any of Gentzen's calculi corresponds to a series of one or more natural reductions in DT.

Definition 6.3

(1) Given Π∕Σ and Π′, Π′∕Σ is defined by induction on the length n of Σ as follows:

a. If n = 0, Π∕Σ is just Π and Π′∕Σ = Π′.


b. If n = m + 1, Π∕Σ results from Π∕Σ′, where Σ′ has length m, by performing one of the following operations:

i. Substituting the conclusions A_i of Π∕Σ′ for the assumptions A_i of Π″ (for some A_i and Π″).

In this case Π′∕Σ = [Π′∕Σ′/A_i]Π″ if A_i is among the conclusions of Π′∕Σ′, and Π′∕Σ = Π′∕Σ′ otherwise.

ii. Applying rule (5) to the conclusions A_i of Π∕Σ′.

In this case Π′∕Σ is the result of applying rule (5) to the conclusions A_i of Π′∕Σ′, if such there be, and Π′∕Σ = Π′∕Σ′ otherwise.

(2) Given Π∕(Σ) and Π′, Π′∕(Σ) is defined in the same way as Π′∕Σ except for the additional clause:

iii. Substituting the conclusions A_i of Π″ for the assumptions A_i of Π∕(Σ′).

In this case Π′∕(Σ) = [Π″/A_i]Π′∕(Σ′).

(3) Given Π∕(Σ) and Π′ such that Π, Π′ ∈ LJDT and LJDT is closed under the operations in Σ, Π′∕(Σ) is defined in the same way as in (2), except that (iii) is replaced by the clause:

iii′. Substituting the conclusions A_i of Π″ for the assumptions A_i of Π∕(Σ′).

A% 0 t (E') n' IT IT

In this case / v l = [ n / ; / i 4 i ] / v n if / v , x has assumptions n'

of the form Ai, and / v l = [II"/.42]n* otherwise, where n'

IE'} IT

IT = , D • (-^j *s the- conclusion of / V / V )

Bj

Notice that, even if the open assumptions and conclusions of Π′ are included in those of Π, Π′∕Σ may have open assumptions which are closed in Π∕Σ. A similar remark applies to the relationship between Π′∕(Σ) and Π∕(Σ). Notice also that, if Π* = Π∕(Σ), then Π* = Π∕Σ′ for some Σ′.

Before listing the reduction steps, I introduce one last convention. As

remarked earlier, a reduction step is taken to have the general form:

If Π has initial subderivations of the form Π₁, then Π reduces to the result of replacing Π₁ by Π₂ in Π.

For ease of writing, however, I will simply display the subderivations directly involved in the reduction, so that the above will be written

Π₁ reduces to Π₂

rather than

Π₁∕Σ reduces to Π₂∕Σ for any Σ.

The reductions themselves fall into three groups:

I. Proper Reductions

These remove maximal formula occurrences.

(1)
    Π₁   Π₂                    Π₁   Π₂
    A_n  B_m                   A_n  B_m
    ─────────   reduces to     ───
    A ∧ B_p                    B_q
    ───
    B_q

and similarly for A_q in place of B_q.
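As a point of comparison, the effect of a proper reduction of this kind on ordinary tree-style derivations can be sketched in a toy proof-term encoding. (The tag names are mine, and the single-conclusion setting is a simplification of DT.) An ∧-introduction immediately followed by an ∧-elimination collapses to the relevant immediate subderivation, removing the maximal occurrence of A ∧ B.

```python
# Toy proper reduction: detect an ∧-elimination whose major premise is
# the conclusion of an ∧-introduction, and collapse the pair.
def reduce_once(term):
    if term[0] == "and_elim_r" and term[1][0] == "and_intro":
        _, (_, left, right) = term
        return right            # ∧-intro then right-elimination yields B
    if term[0] == "and_elim_l" and term[1][0] == "and_intro":
        _, (_, left, right) = term
        return left             # ∧-intro then left-elimination yields A
    return term                 # no maximal occurrence at the root

t = ("and_elim_r", ("and_intro", ("hyp", "A"), ("hyp", "B")))
assert reduce_once(t) == ("hyp", "B")
```

In DT itself the step is more delicate, because the maximal occurrences to be removed need not sit at a single introduction/elimination pair; that is the point of the Σ notation above.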

(2)
    Π                         Π
    A_n        reduces to     A_n
    ────────
    A ∨ B_m

and similarly for B_n in place of A_n.

(3) a. n ' II Bm reduces to

V Q

n(A,/n) Aq

An A->BP W Br

n,(Br/m) Br

if A_q is among the assumptions of Π′. (Notice that, because Π and Π′ have to be compatible, Π cannot have conclusions of the form A_q.)


b. II' II Bm reduces to

n n' A R

An A^BP W Br

BT

otherwise.

(4) n A(v')n reduces to

n'{A(x)p/n) A{x)p

VvA(v)m

A{x)p

provided that the figure on the right is a derivation with the same open assumptions and conclusions as the one on the left, where Π′ is obtained from Π by substituting x for each occurrence of v′ connected to those in conclusions of the form A(v′)_n.⁵

(5) n U{A(x)p/n) A(x)n A{x)p

3vA(v)n reduces to n" A{v')p

ir

provided that the figure on the right is a derivation with the same open assumptions and conclusions as the one on the left, where Π″ is obtained from Π′ by substituting x for each occurrence of v′ connected to those in assumptions of the form A(v′)_p.

Comment: The restrictions on (4) and (5) are required because DT is not closed under transformations of the kind employed in these steps. More disturbing than this fact is the impossibility of eliminating all maximal occurrences of universal and existential formulae by means of these reductions, even from derivations without quasi-formulae among their (open) assumptions or conclusions. This can be seen from the following rather trivial examples:

⁵ Connectedness is defined in footnote 5 of Chapter 3. In the presence of (LT) and (RT), it seems best to stipulate that for the purposes of clause 2 of this definition B_k does not lie immediately below A_i in a configuration of the form

Ai £ *

nor does Ax lie immediately below Bk in

Bk B3 A, •


\/v{A{v)VA{v))n A(b)t

(A{v')\/A{vf))m 3vA(v)r 3v"A(v")8

A{v')p A(v')q A(V')P A(V')Q

\fvA{v)r W'A(v")s a n d A(v') A A(v')m

A(b)t 3v(A{v) A A{v))n

(where v′ = f(∀vA(v)))          (where v′ = f(∃vA(v)))

It seems better to deny that derivations of the above sort have normal forms than to admit as reduction steps the radical transformations necessary to remove their maximal formula occurrences. To prove a normalization theorem for DT, therefore, it becomes necessary to show first that each Π ∈ DT (whose assumptions and conclusions consist only of formulae) can be associated with a derivation in some convenient subclass, like LKDT,

having the same assumptions and conclusions as Π. Unnormalizable derivations involving only maximal universal formula

occurrences can conveniently be excluded by restricting (9∀) in such a way that the proper variable of the inference does not occur in any other conclusions of the derivation of its premise (or, if it does, that there is no connection between these occurrences and those in the premise itself). Existential formulae are not so easily dealt with, however. There appear to be no convenient restrictions on the rules of DT which will exclude derivations like the right-hand one above. If they are to be excluded, this is best accomplished by incorporating substitution into the rules and rewriting (10∃) to mimic ∃-left or ∃-elimination.

II. Permutative Reductions

These allow inferences to be split up. In particular, given an application of an elimination whose major premise consists of maximal and non-maximal occurrences of some formula, it can be divided into a number of applications of the same rule, each of which is such that its major premise is entirely composed either of maximal formula occurrences or of non-maximal ones. This is a necessary preliminary to the removal of the maximal occurrences, and represents a generalization of the procedure whereby maximal segments are removed using permutative reductions in N.
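The N-style procedure that these steps generalize can be sketched in a toy single-conclusion encoding. (The names are invented, and this is a simplification, not the multiple-conclusion rules of DT.) An elimination applied below a case split is permuted upwards into both branches, so that a maximal segment becomes an ordinary maximal formula occurrence in each branch.

```python
# Toy permutative (commuting) conversion: push an elimination applied to
# the conclusion of a case split into both branches of the split.
def permute(term):
    if term[0] == "elim" and term[1][0] == "case":
        _, (_, scrutinee, branch1, branch2) = term
        return ("case", scrutinee, ("elim", branch1), ("elim", branch2))
    return term

t = ("elim", ("case", ("hyp", "A∨B"), ("hyp", "C"), ("hyp", "C")))
assert permute(t) == ("case", ("hyp", "A∨B"),
                      ("elim", ("hyp", "C")), ("elim", ("hyp", "C")))
```

After the permutation, each copied elimination sits directly below the derivation of its branch, where a proper reduction can remove it.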

(1) Let n 2

l l n —- Cyy

n3 if CT is among the conclusions of II2, and let II2 = n 2 otherwise. Also, let

III Tl = Ap

n2


and assume that Π₁ has conclusions of the form C_r. Then

n

n3

reduces to \S<p S*-p

n3 ir2'

Let n =

IIi Bp

rw and n' = cm

A - 5 , rw and

Π₂

Also, assume that Π₂ has neither open assumptions of the form A_k nor conclusions of the form B_p, and that C_m ≠ A → B_q. Then

n IT Cm reduces to Bp

n 2 A-*Bq {k}

Comment: The above, taken together, are equivalent to the LK conversion which allows a cut to be permuted upwards past any inference. To translate this same conversion step for LJ (and, a fortiori, the permutative reductions for NJ and NK), (2) is not needed, only the special case of (1) in which Π₁ is of the form

n* Dq A.p

n4

where C_r is not among the conclusions of Π* nor A_p among those of Π₄. In other words, the only permutative reduction needed for LJ is

IT n* JDq Ap IJq Ap

n4 n2 reduces to /n 4 \ /n2 KJf \cr\ \cr IT3 W ln3. Notice incidentally that, when II2 has no conclusions of the form C r ,

(1) becomes a symmetrical transformation, namely,

    Π₁                        Π₁
    A_p   C_r   reduces to    C_r   A_p.
    Π₂    Π₃                  Π₃    Π₂

III. Thinning Reductions

These are of two kinds:

A. pruning reductions which remove inferences from the derivation of the thinned premise in an application of (LT), and from the derivation below the thinned conclusion in an application of (RT), and


B. permutative reductions which allow applications of (LT) and (RT) to exchange places with another inference.

Reductions of kind (A) are needed to ensure that normal derivations possess the subformula property. As for those of kind (B), they are intended to take care of any "maximal segments" which may arise because of the presence of the same formula in the premise and conclusion of a thinning. When it comes to the choice of thinning reductions, there is a certain incompatibility between the claims of reason and expediency. In addition, because they are not given much consideration in the usual treatments of normalization, there is little to guide decisions about which to include. The following, therefore, is intended to be a comprehensive list of the various possibilities. I do not mean to suggest that all of these reductions are necessary, or even desirable, nor to rule out the possibility of placing some restrictions on those which are found to be acceptable.
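The intent of the pruning reductions of kind (A) can be illustrated in a toy encoding. (Again the names are my own, not the rules of DT.) A thinning node carries the derivation of a premise that contributes nothing to the conclusion; pruning simply drops that derivation, which is what makes the subformula property attainable for normal derivations.

```python
# Toy pruning: a "thin" node records the derivation actually used
# together with a discarded premise derivation; pruning removes the
# discarded part throughout the tree.
def prune(term):
    if term[0] == "thin":
        _, kept, discarded = term
        return prune(kept)      # drop the discarded derivation entirely
    if term[0] == "rule":
        _, name, premises = term
        return ("rule", name, tuple(prune(p) for p in premises))
    return term                 # a hypothesis: nothing to prune

t = ("rule", "and_intro",
     (("thin", ("hyp", "A"), ("hyp", "C")), ("hyp", "B")))
assert prune(t) == ("rule", "and_intro", (("hyp", "A"), ("hyp", "B")))
```

The sketch also makes comment (b) below vivid: dropping a discarded subderivation can change the stock of open assumptions, which is why some pruning steps fail to preserve them.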

A. Pruning Reductions (1) a.

Ill n» Al

n w Bk Q

cn

n„ A? c,

Up

s to

IIi '_

^u Cp Gp

CP Bj

Cp

Ca BZ

where A¹, …, Aᵐ are all the open assumptions of Π, and B¹, …, Bʳ are all its conclusions with the exception of B_k. (Here and below p is supposed to be some new index.)

b. If Π has no open assumptions

n ir

Ca

reduces to

un

B

II' Ci

Uv Bj

3rn


where B¹, …, Bʳ are all the conclusions of Π with the exception of B_k. In the special case where Π has only B_k as its conclusion, the figure on the right is to be interpreted as Π′(C_q/i).

(2)

n Op

n Ci

Cq Bk reduces to —n—7*

Op

ir CP Bi,

L/p

Cq BTm

where A¹, …, Aᵐ are all the open assumptions of Π′ with the exception of B_k, and B¹, …, Bʳ are all its conclusions.⁶

(3)
    Π
    A_i  B_j                 Π(B_k/j)
    ────────   reduces to    B_k
    B_k

(4)
    Π
    B_j                      Π(B_k/j)
    ─────────  reduces to    B_k
    B_k  A_i

(5) n 11' reduces

n n' 3 t O -^71 \^T

ck

11' reduces Ck

(6) n An n'

reduces to

n „̂ n'

c~k

reduces to / l m 1 r Og

where r occurs nowhere in Π or Π′.

⁶ In D′T (the version of DT with rules for ¬, rather than axioms for ⊥) the possibility arises that Π′ may have no conclusions. In this case, the figure on the right is taken to be Π(C_q/i) whenever B_k is the only open assumption of Π′.


Comments: (a) (1), (2), and (3) are sufficient for interpreting the reductions employed in the usual normalization/cut-elimination procedures for the various Gentzen calculi. Unless they are restricted in some way, however, normal forms will not be unique.

(b) Notice that (1), (2), and (6) have the property that the figure on the left has the same open assumptions and conclusions as the one on the right. In (5), on the other hand, the figure on the right may have additional open assumptions.

B. Thinning Permutations

(i) iii n 2 n 2

Ai Bj reduces to II i Bj ~W^~ At C~q

R

    C_d                       C_d

and vice versa, where R is any single-premise, single-conclusion rule. If the step is read from left to right, I assume that the only occurrences of B_k as a conclusion in

    Π₁   Π₂
    A_i  B_j
    ─────────
    B_k

are the ones displayed and q is supposed to be a new index; furthermore, if R is an application of rule (5), I assume that no assumptions in Π₁ are discharged by it. If the step is read from right to left, I assume that the only occurrences of C_q as a conclusion in

n2

Ei cq

are the ones displayed and k is supposed to be a new index; furthermore, if R is an application of rule (5), I assume that no assumptions in Π₁ become closed by it.

(2) a.
    Π₁   Π₂                    Π₂

Ai Bj reduces to IIi Bj

and vice versa, where R is any two conclusion rule. If the step is read from left to right, I assume that the only occurrences of B_k as a conclusion in

    Π₁   Π₂
    A_i  B_j
    ─────────
    B_k

are the ones displayed and q is supposed to be a new index. If the step


is read from right to left, I assume that the only occurrences of C_q as a conclusion in

n2

Bj Cq Dn

are the ones displayed and k is supposed to be a new index.

(2) b. i i i n 2 n 2

Bj reduces to Bi Bk

R

"m ^n R Dn At Cq

Cn

and vice versa. Again, R is any two conclusion rule and the same assumptions about B_k and C_q are made as in (2)a.

(3) a. Eh n 2 At Bj n 3

Bk Cn

Dm R

reduces to n2 n3

Iii Bj Cn

Ai Dq

Dm and vice versa.

(3) b. iii n 2

n 3 At B3

Cn Bk }

reduces to

n3 n2 Iii Cn Bj M Dg

R

R

Dn D„ and vice versa.

In both cases, R is supposed to be any two premise rule; furthermore, if the step is read from left to right, I assume that the only occurrences of B_k as a conclusion in

    Π₁   Π₂
    A_i  B_j
    ─────────
    B_k

are the ones displayed and q is supposed to be a new index. If the step is read from right to left, I assume that the only occurrences of D_q as a conclusion in

    Π₂   Π₃
    B_j  C_n
    ─────────
    D_q

or in

    Π₃   Π₂
    C_n  B_j
    ─────────
    D_q


in the case of (3)b, are the ones displayed and k is supposed to be a new index.

(4) n

Ak Bi reduces to

n

cJR Cn Bj

where R is any single-premise, single-conclusion rule, and vice versa. Also, the usual assumptions (i.e., those made in (1)–(3) above, adapted in the obvious way to the present case) about A_k and C_q apply.

(5) a. n Ai

n Ai

Ak Cn Dr

R Bi reduces to Cn Da

R

and vice versa.

(5) b. n Ai

Dm Bj

n Ai

Ak

Urn ^n

Bi R 3

reduces to Dn &n

Dm Bj

and vice versa. In each case, R is supposed to be any two conclusion rule, and the usual assumptions apply to A_k and D_q.

(6) a.

versa.

n i n2 M Cn Ak^

~DZ~R

n i

At

Bj reduces to

reduces to

n2 nx ^n Ai

and vice

(6) b.

versa.

n i n2 M Cn Ak^

~DZ~R

n i

At

Bj reduces to

reduces to

Dm Bj

nx n 2

versa.

Ak Cn 0 Bi

reduces to

reduces to Dg

versa.

Ak Cn 0 Bi

reduces to

reduces to

n B.

R

i- /m

and vice versa. In each case, R is supposed to be any two premise rule, and the usual assumptions apply to A_k and D_q.

Some additional permutations are needed for D′T to alter the position of a thinning with respect to applications of the negation rules. The following will suffice for this purpose:

(7) II' reduces to n

n ->AP

~>An Bk At Bk

IT -*An

and vice versa. If the step is read from left to right, the only occurrences


of ¬A_n as a conclusion in

    Π′
    ¬A_p
    ¬A_n   B_k

are the ones displayed and t is supposed to be a new index; if the step is read from right to left, the only conclusions of

    Π
    A_m
    A_t   B_k

having the form A_t are the ones displayed and n is a new index.

(8) n x n 2 i ii n IT Bk ~^Ap reduces to Bk Am n 2

Am ~*An At ~~*Ap * *

and vice versa, where similar assumptions apply to ¬A_n and A_t as in the case of (7).

(9) * * ->Ap Am reduces to -*An Aq

-"An Bk Am Bk

and vice versa, where q is any index if the step is read from left to right, and similarly for p if the step is read from right to left.

(io) n * * Bk ~^Ap Am reduces to -iAn n

-*An B* A«

A

and vice versa—p and q as in the case of (9).

Comment: These permutations are more than are needed for normalization. To reduce maximal segments to maximal formula occurrences, it would be sufficient to allow them to go only one way, either from left to right or from right to left, and apply only to certain rules. My motive in presenting them as I have done above is to leave open the possibility of equating derivations which differ from one another only by some permutations of thinnings.

This concludes the list of possible reduction steps.

7

Correspondence Results

I now want to consider briefly the question of a correspondence between the steps described in the preceding chapter and reduction in LK. I will adopt for a moment the terminology according to which the derivations of LK are mapped onto LKDT. Call this mapping φ and let d, d′, d₁, … range over sequent derivations. d >₁ d′ means that d reduces to d′ by applying any one of the reduction steps listed in Appendix B, and > is the transitive closure of >₁. It can reasonably be claimed that the reduction procedure characterized by > is in essence the familiar one which derives from Gentzen. It consists of three kinds of reduction step: the elimination of a cut one of whose premises is a logical axiom, the replacement of a cut by one of lower degree, and the permutation of a cut upwards past another inference. Unlike the traditional procedure, however, these steps are not applied in any systematic order (as determined, say, by the degree of the cuts or their position in a derivation); if a step can be applied, it may be applied.
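The nondeterministic character of this procedure, where any applicable step may be applied, can be pictured abstractly: given a function enumerating the one-step reducts of a derivation, the derivations reachable under > form the closure of the starting point under single steps. This is a generic sketch of my own, not tied to the particular steps of Appendix B.

```python
# Generic sketch of the transitive (here also reflexive) closure of a
# one-step reduction relation >1: `step(x)` returns the immediate
# reducts of x, in no particular order, and `reachable` closes a start
# point under single steps.
def reachable(start, step):
    seen, frontier = {start}, [start]
    while frontier:
        x = frontier.pop()
        for y in step(x):
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

# Toy example: "derivations" are numbers, and the only step lowers by 1.
step = lambda n: [n - 1] if n > 0 else []
assert reachable(3, step) == {0, 1, 2, 3}
```

The sketch terminates only when the reduction relation is finitely branching and generates finitely many reducts; for the calculi discussed here that is exactly the kind of question the termination results of the following chapters address.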

The other significant difference concerns the treatment of thinning. Ignoring distinctions which cannot be made in the usual formulations of the sequent calculus, it is fair to describe the reductions listed in the appendix as allowing any application of thinning to be permuted upwards. Although not part of the traditional procedure, these are perhaps not such a radical extension of it as they may appear to be, for if d′ is obtained from d by such a permutation (or indeed by permuting any other inference upwards), we can find d₁, d₁′ such that φ(d) = φ(d₁), φ(d′) = φ(d₁′) and d₁ reduces to d₁′ by an upward permutation of cut. In a sense, therefore, the traditional procedure already sanctions the upward permutation of thinnings. (Once this is granted, my formulation of the steps for reducing the complexity of cuts, according to which any necessary thinnings are applied above the new cut rather than below it, can be seen not to differ significantly from the customary ones.)

My aim is to characterize in terms of some subset X of the reduction



steps listed in the previous chapter a relation ≻_X between members of LKDT such that the following hold:

Theorem 7.1 If Π ≻_X Π′, then d >₁ d′ for some d, d′ such that φ(d) = Π and φ(d′) = Π′.

Theorem 7.2 If d >₁ d′, then φ(d) ≽ φ(d′).

Here ≽ is the transitive and reflexive closure of ≻₁, and Π ≻₁ Π′ means that, for some Π₁, Π₁′ and Σ:

Π = Π₁∕Σ,   Π′ = Π₁′∕Σ,   and   "Π₁ reduces to Π₁′"

is an instance of one of the reductions in X. Let X include all the steps in (I) and (II) together with (IIIA1) and

(IIIA2). In addition, X should contain the reductions in (IIIB) restricted as follows:

(1) If R is an introduction, the step applies only from right to left. (2) If R is →-elimination, (3) and (6) are replaced by

i. (3b) and (6a) from left to right.

n. n' n n A-+Bm A? i r

Ap A —• Bk Cn reduces to Aq cn A —• Bm

Br Br

where the only occurrences of A → B_k as a conclusion in

    Π′
    A → B_m
    A → B_k   C_n

are the ones shown, and q is supposed to be a new index.

iii.
    Π₁   Π′                    Π₁   Π

II Cn A —• Bm reduces to Cn Ap W Ap A^Bk Aq A^B,

Br Br

where the only occurrences of A → B_k as a conclusion in

    Π₁   Π′
    C_n  A → B_m
    A → B_k

are the ones shown, and q is supposed to be a new index.

(3) If R is any other elimination, the step applies only from left to right.¹

¹ (2ii) and (2iii) above can be included in X because, although they themselves are not members of (IIIB), they can easily be derived from this group of reductions. ((2ii) is obtained by an application of (6a) from left to right, followed by an application of (6b) from right to left. Similarly, (2iii) is obtained by an application of (3b) from left to right, followed by an application of (3a) from right to left.)


(To establish a correspondence between the version of LK with rules for negation and LK′DT, the image of this calculus under φ, X must be augmented by (7) and (8) from left to right, and (9) and (10) from right to left.)

It is not surprising that (IIIA3–6) are omitted from X. (IIIA3) is a part of the normalization procedure for N, rather than of cut-elimination for L, whereas (IIIA4–6) are best described as experimental steps. The restrictions on the members of (IIIB) are also to be expected. From one point of view thinning permutations are all alike in the sense that, if one is allowed, there is no reason to exclude any other; the members of (IIIB) were selected with this in mind. On the other hand, from the more limited perspective of a cut-elimination or normalization theorem, the criteria for including such reductions are stricter and take into account only what is necessary and convenient, given the format of the rules, for proving the theorem in question.

Theorems 7.1 and 7.2 are established by lengthy and mostly routine inductive arguments. I shall merely outline these below, considering only one or two of the more interesting cases in any detail.

Proof of Theorem 7.1: If

Π₁∕Σ ≻₁ Π₁′∕Σ,

we show by induction on the length n of Σ that d >₁ d′ for some d, d′ such that φ(d) = Π₁∕Σ and φ(d′) = Π₁′∕Σ.

Basis Step: n = 0. There are various subcases to consider according to which step transforms Π₁ into Π₁′. For reductions from group (I), it should be obvious that the result holds by virtue of the corresponding reductions in group (B).² Consider, by way of example, (I1):

If φ(d₁) = Π₁ and φ(d₂) = Π₂, then the image under φ of

di AnhAn r\-A';Bm BqVBq

di An; T'\-A;AABp A A Bp r- Bq

r t -A ;A n A w ; r ' hA ,B q

Γ, Γ′ ⊢ Δ, Δ′, B_q    is

    Π₁   Π₂
    A ∧ B_p
    B_q.

2See Appendix B below for a detailed description of the version of LK assumed here, together with a complete list of reduction steps for this calculus.


A single application of (Bib) to the former yields

d2

r ' hA ' ;B m

rfx An,r\-A',Bq BqhBq r i -AM n An;T'\-A',Bq


r , r ' hA,A ' ,B ,

whose image under φ is

    Π₁   Π₂
    B_q.

In group (II), (1) is taken care of by (C13), and (2) by (C16). Turning to (IIIA), consider first (1a): Let φ(d_i) = Π_i (1 ≤ i ≤ n), φ(d) = Π and φ(d′) = Π′; then the image

under φ of the derivation given below is the same as the figure on the left in the statement of (IIIA1a) in Chapter 6 above. (Δ is supposed to stand for {B¹, …, Bʳ}.)

d' d T h A'; Ct

di A},;.. ;A£ \~A;Bk ft;r'hA',C, r t h A i ; A\ At;...; A? ; P h A, A', Cq

_ . 4 ; . . . ; A ? ; r „ r i - A 1 , A , A ' , C ,

dn . " r n h A n ; ^ .

Γ_n, …, Γ₁, Γ′ ⊢ Δ_n, …, Δ₁, Δ, Δ′, C_q

An application of (E4) to the above gives

d' d1 T'\-A';C,

Tx h A i ; Al Aj;...; A? ; V \- A, A ' ,C , . ^ ; . . . ; i 4 ? B ; r 1 , r ' l - A 1 , A , A ' , C ,

Tn\-An;Al

3 r^.^ri.ri-An A^A.A' ,^

The double horizontal line is meant to indicate a series of thinnings. These can be assumed to have been ordered in such a way that the image under φ of the preceding derivation is just the figure on the right in the statement of (IIIA1a).

(1b) is just a special case of the above. As for (2), it is handled in the same way with (E1) being used instead of (E4). It only remains to consider

³ A fuller explanation of this notation can be found in Appendix B.


the reductions in (IIIB), but these are routinely handled by their analogues in (F5–16) or in (G). For example, when R is rule (1), (IIIB3a) is treated as follows:

Since rule (1) is an introduction, it is sufficient to find d and d′ such that

n2 n3 rii n2

*4=£jy*. «'Q~i4rL£: and d>id' BACm BACm

So, let Πᵢ = φ(dᵢ) (1 ≤ i ≤ 3) and take d to be

d2 ^3 r 2 h A 2 ; ^ r 3 h A 3 ; C n

di r 2 , r 3 h A 2 , A 3 ; g A C q

r t h Ai; ̂ ^ ; T2, T3 h A2, A3, B A Cm

r 1 , r 2 , r 3 h A 1 , A 2 , A 3 , 5 A C m

An application of (F14bi) to the above gives

d2

T2hA2;Bj d3

di Ai;r2)r&2;Bk r 3 h A 3 ; C n

T i h A i ; ^ 4 r 2 , r 3 h A 2 , A 3 , B A C m

r 1 , r 2 , r 3 h A 1 , A 2 , A 3 , S A C m

and this latter derivation can serve as d′. The remaining members of (IIIB) are dealt with in a similar manner. This establishes the basis of the induction.

Induction Step: n = m + 1. There are only two cases to consider according to whether the last operation in Σ is substitution or an application of rule (5). Both follow trivially from the induction hypothesis, however.

This concludes the proof of Theorem 7.1. □

Proof of Theorem 7.2: The proof of Theorem 7.2 is similar, albeit a little more complicated.

Given d and d′ such that d ≻₁ d′, we show by induction on the number n of inferences below the cut or thinning to which the reduction is applied that φ(d) ≽ φ(d′).

Basis Step: n = 0. There are innumerable cases to consider according to the nature of the step by which d′ is obtained from d. (The cases below are numbered according to the classification of reduction steps for LK found in Appendix B.)

A. It is obvious in this case that φ(d) = φ(d′).

B. The reductions in this group are handled rather straightforwardly by means of their analogues in (I) supplemented, in some cases, by the


use of conversions from (II) and (IIIA). As an example, I consider (B2a) below. The other cases are similar (and usually simpler): So, let d and d′ be the figures on the left and right, respectively, in the statement of (B2a) in Appendix B; furthermore, let Γ′ be {A¹, …, Aᵐ} and Δ′ be {B¹, …, Bⁿ}. Then

4>(di) Bi

4>(d) = Ay Bk

A p Bq

and

<t>(di)

Bs

4>(d') = *L_

Bs

Bs

B^ .Bh

Ba

Bq By £ '

7"

where (for any r) γ′ᵣ is the result of substituting the conclusions B_q of γᵣ for the assumptions B_q of φ(d₃), and γ″ is the result of substituting the conclusions A_p of γ for the assumptions A_p of φ(d₂) and then substituting the conclusions B_q of the derivation thus obtained for the assumptions B_q of φ(d₃). (The usual provisions about subscripts apply to avoid unwanted clashes. This presents no problem since alike derivations are treated as indistinguishable from one another. In particular, I assume that B_q ∉ Γ ∪ Γ′.) Now,

B1 Bl

<f>(d) >i R A = Bq A p Xi <j>(d'). q y P 4>{d2)

(The first reduction is justified by (I2), the second by (IIIA2).) Notice that, in the case of (B2b), an application of (III) would be


required before (IIIA2) is applied, followed by a second application afterwards—the inverse of the first—to transform the resulting derivation into φ(d′).

C1–9. In all these cases, it follows from the properties of substitution that φ(d) and φ(d′) are the same.⁴

C10–18. Here there are various possibilities, depending upon the last rule of the left-hand premise of the cut.

(1) In case it is a one-premise left-rule, φ(d) and φ(d′) are clearly identical.

(2) In case it is →-right, the image under φ of the step by which d′ is obtained from d is just (II2).

(3) In all the remaining cases except ∧-right, the image under φ of the step by which d′ is obtained from d is an instance of (III). This is obvious when the rule in question has only one premise or is cut. If it is ∨-left, take Π₁ in the statement of (III) to be

BAAn

&m Ap

so that this step takes the following form (assuming Cᵣ is among the conclusions of Π₂):

BAAn

Bm Ap

n; n2

n3

B\An

reduces to

Similarly, if it is →-left, take Π₁ to be

n; Bm B

*p>

so that (again assuming that Cᵣ is among the conclusions of Π₂) (III) becomes:

/ n j \ n; | £>TH t> • A n O f Bm B-*An

V n 2 ) reduces to n3 Ap

n2 CT Or

Π₃        Π₃

(4) It only remains to consider (C10), the case in which the last rule

⁴See equations (5.4) and (5.5) above.


on the left is ∧-right. To simplify the writing I will omit subscripts and assume that the cut-formula occurs in both premises of the application of ∧-right. So, d and d′ can be taken to be

Γ⊢Δ;C;A   Γ′⊢Δ′;C;B
───────────────────              d₃
  Γ,Γ′⊢Δ,Δ′,A∧B;C             C;Γ″⊢Δ″
  ───────────────────────────────────
        Γ,Γ′,Γ″⊢Δ,Δ′,Δ″,A∧B

and

d₁            d₃         d₂            d₃
Γ⊢Δ;C;A   C;Γ″⊢Δ″       Γ′⊢Δ′;C;B   C;Γ″⊢Δ″
──────────────────      ────────────────────
  Γ,Γ″⊢Δ,Δ″;A             Γ′,Γ″⊢Δ′,Δ″;B
  ──────────────────────────────────────
        Γ,Γ′,Γ″⊢Δ,Δ′,Δ″,A∧B

respectively. Writing Πᵢ for φ(dᵢ) (i = 1, 2, 3), we must show that

n n; n2 c t A B n3 AAB

where

      Π₁   Π₂                    Πⱼ
Π  =   A     B      and   Π′ⱼ =   C      for j = 1, 2
       A∧B                       Π₃

Now Π can be written as

[njA] ( [n 2 /B]^_ | - )

hence by (III)

(7.1)

11 ( C y, [n'JAjt n3

v '^W^AA/0. n3)

But, by (5.4), (7.1) can be written as

[n iM][n 2 /B]^J - /c n3

which by (5.5) is the same as

(7.2) [n2/5][ni, '^-TAH n3

An application of (III) to (7.2) yields

[U'JBm/A}^^-


which is just

ni n'2 A. B_ AAB

D1–5. In these cases, the fact that φ(d) = φ(d′) follows immediately from the following:

Lemma 7.3 / / j occurs nowhere in . then Ai

[TKA^/AjllU/A^W = [U/AiU^W

This lemma is easily proved by induction on c(Π′); the basis step is trivial, and the induction hypothesis, together with the various clauses in the definition of substitution, takes care of the rest.
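The substitution identities used here can be illustrated with a minimal sketch in which derivations are finite trees and substitution replaces open assumptions wholesale. All names in the sketch (Leaf, Node, subst, occurs) are mine, not the book's, and the property checked is only the commutation of independent substitutions, a simpler fact of the same family as Lemma 7.3.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Leaf:
    """An open assumption, identified by its index."""
    index: str

@dataclass(frozen=True)
class Node:
    """An inference: a rule label together with its premises."""
    rule: str
    premises: Tuple["Tree", ...]

Tree = Union[Leaf, Node]

def subst(pi: Tree, i: str, repl: Tree) -> Tree:
    """[repl/A_i]pi: replace every open assumption A_i of pi by repl."""
    if isinstance(pi, Leaf):
        return repl if pi.index == i else pi
    return Node(pi.rule, tuple(subst(p, i, repl) for p in pi.premises))

def occurs(pi: Tree, i: str) -> bool:
    if isinstance(pi, Leaf):
        return pi.index == i
    return any(occurs(p, i) for p in pi.premises)

# Independent substitutions commute: if A_j occurs nowhere in pi1 and
# A_i occurs nowhere in pi2, the order of the two substitutions is
# irrelevant -- the kind of fact the substitution lemmas record.
X   = Node("cut", (Leaf("i"), Node("thin", (Leaf("j"),))))
pi1 = Node("andI", (Leaf("k"), Leaf("k")))   # contains no A_j
pi2 = Leaf("m")                              # contains no A_i
assert not occurs(pi1, "j") and not occurs(pi2, "i")
assert subst(subst(X, "j", pi2), "i", pi1) == subst(subst(X, "i", pi1), "j", pi2)
```

In this toy model the side conditions of the lemmas ("A_j occurs nowhere in …") are exactly what makes the two orders of substitution collapse to the same tree.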

D6–9. All these cases translate under φ into the special case of (III) in which Π₂ is an assumption. They are all similar, so I will consider a single example, (D6): Let Π′ᵢ = φ(dᵢ) (i = 1, 2, 3); then it suffices to show that

n' A KB,

n3

n m >-i A A Bm

n3

II

AABS

where n'3' = (A A Bsjm ;)n'3,

n' -ni

AABm , Ai Bj AABm

n'2 , AABt

and

n" = ni n2

, AABt

AABS

But this is just the instance of (III) in which Π₁ = Π″, Π₂ = A∧Bₘ and Π₃ = Π″₃.⁵

E. The translation of (1) under φ is an instance of (IIIA2). Similarly, (4) translates into an instance of (IIIA1a) (or (IIIA1b) if Γ is empty). (3) too translates into an instance of (IIIA1), the only complication being that it is necessary to replace the subscript i whenever Aᵢ ∈ Δ, but this presents no problem. It only remains to consider (2). Let Γ = {A¹, …, Aᵐ}, Δ′ = {B¹, …, Bⁿ} and assume Aᵢ ∉ Δ′; then we must show that

Aj n Ai Bk y Ai

φ(d₁)   φ(d₂)            φ(d₁)

⁵The term on the left in each equation refers back to the formulation of (III) in Chapter 6. According to this same key, Aₚ is replaced by A∧Bₛ and Cᵣ by A∧Bₘ.


where A1

2±L-Aj

U = A?

B 3\

But

BV

Ai Bk yx

tfdi) 4>(d2)

by (III), from which

^ Ai Bkl

Bk <j>(d2)/Ai 4>{di)

n Ai

φ(d₁) is obtained by (IIIA2). In case Aᵢ ∈ Δ, the index i must first be replaced by one which does not occur on A in Δ. Again, this presents no problem.

F1–4. In all these cases, with the exceptions of (F2c) and (F3c), it is an easy matter to verify that φ(d) = φ(d′). As for (F2c) and (F3c), φ(d) ≻₁ φ(d′) by an application of (III). To see this, consider by way of example the case of (F2c) in which R is ∨-left; the remaining ones are almost identical. We must show that

EVFm

Ck

En F0 EVFm

4>(d) (j>{df) y En F0

Ai n nr

where

n = Ck Ai_

Aj

4>(d') and n ' = Cfc Ai

in other words, that

£ V F T O (7.3) En F0

En 4>(d)/F0 4>{d')/Ai Ck A%

Ai


EVFni

reduces to

1 \En F0

But, by (III), (7.3) reduces to

E V F7

En U/F0 n'

En En 4>{d)IAi

Ck At A,

Fa

and this is the same as (7.4), since by (5.4)

E\/F„ En F.

En m/Ai Ck A{ EVFm

En F0

n'

Eₙ        Π

F5–16. Roughly speaking, the reductions in this group all translate into

instances of (IIIB). There are, however, some minor exceptions and a few complications. Notice first that φ(d) = φ(d′) in the case of (F9aii) or (F9bii). The remainder of (F5–10) all translate directly into members of (IIIB). When we consider (F11–16), on the other hand, the situation is complicated by the fact that the active formula in the premise of the thinning being permuted may also occur in the premise(s) of the preceding inference. (Of course, this may happen in (F5–10) as well, but it causes no difficulty in these cases because they deal with left rules. As a result, given d and d′ such that d ≻₁ d′ by one of these reductions, we can find d″ such that d″ ≻₁ d′ by the same reduction, φ(d) = φ(d″) and the active formula in the premise of the thinning being permuted is introduced by the preceding inference.⁶) If it does, (III) must first be used to reduce φ(d) to a φ(d*) where d* is like d except that the final application of thinning has been split into two in such a way that φ(d*) ≻₁ φ(d′) by an application of the appropriate member of (IIIB). This procedure takes care of all cases except for (F15aii) and (F15bii). These last two translate straightforwardly into instances of (III). (Cf. (F9aii) and (F9bii) above.) It is worthwhile to distinguish (F16) and (F13) from the other cases, however, because they require (III) to be applied even when the active formula of the thinning is introduced by the preceding inference. (In (F16) and (F13), (III) serves not only to split up the thinning, but also to permute one of the inferences which results.)

⁶As an example, consider (F5a). Let d₁ and d′₁ denote the derivations on the left and right, respectively, in the statement of this reduction step; then we can take d″ to be

{A3/i)d A a ; (B j ) ; rh A

A s ; (B f e ) , (C n ) , rhA Bt;(gfc),(Cn),rhA

B f c ,c n , rhA

R

It follows from the familiar properties of substitution that φ(d″) = φ(d₁). Furthermore, it is obvious that φ(d″) ≻₁ φ(d′₁) is an instance of (IIIB1) from left to right.


So, there are really two kinds of case to consider. Let us take (F11a) as typical of the one, and (F16a) as typical of the other:

F11a. To simplify the writing, assume that Aᵢ is distinct from Bₖ and Cₙ; then d and d′ can be taken to be

di _ r h A ; ( 5 m ) ; A

and

di rhA;(Bm);Ai

T h A ; B , R

rhA,5 f c ,cn

r h A , ( £ f c ) , ( C n ) ; ^ r\-A,(Bk),Cn;At

r h A , ^ , c n R

respectively. Let Π′ denote φ(d₁) if Bₘ is not among the conclusions of φ(d₁), and let Π′ denote

otherwise. Now, φ(d) can be written as

L^)/^]AySm] Bn Bk Cn

but this either is Ai

[Tl'/Ai] Bm

Bk Cn

(if Bₘ is not among the conclusions of φ(d₁)) or reduces to it by an instance of (III). The latter derivation, however, reduces immediately to

/ M \n!/Ai] At_ Cn

(by (IIIB4) from right to left), which is just φ(d′) by (5.4).

F16a. We can take d and d′ to be

r ; ^ K A ; ( B m ) di r;^hA;(B r o ) r,Aj\-A;Bm

r,Aj\-&,Bk,Cn

and T;Ai\-A,(Bk),{Cn) r;At\-A,Bk,(Cn)

*,. ~ , ~ * , ~ „ r , ^ l - A , B f c , C B

respectively. Also, let Π′ be as above; then φ(d) can be written

as Aj

m A{ Djn J

This reduces by (III) to

Ai Hdi)/Bn Bn

Bk Cn

Bm Bk Cnl

Ai n'


whether or not the conclusions of φ(d₁) include Bₘ. Now, the latter derivation reduces immediately to

A, I At Cn / A{ ir

_Ai Bk I

by an application of (IIIB5a) from right to left, and this last is just φ(d′) (again, by (5.4)).

G. These reductions all translate into instances of members of (IIIB). More specifically, the image under φ of (1a) and (2d) is (IIIB5b) (in either direction); (1b) and (2b) translate into instances of (IIIB6a) from left to right, while (1c) and (2c) translate into the same reduction, but from right to left. Finally, (1d) and (2a) translate into instances of (IIIB3b) (in either direction).

This completes the basis step of the induction.

Induction Step: n = m + 1. Suppose that the result holds when the reduction is applied to any inference with no more than m steps below it. Let d ≻₁ d′, and assume that the number of steps below the inference I being reduced is m + 1. We must show that φ(d) ≽ φ(d′).

Let d₁ be the immediate subderivation of d which contains I, and d′₁ be the result of applying the appropriate reduction to I; then φ(d₁) ≽ φ(d′₁) by the induction hypothesis. We distinguish two cases:

(1) d₁ is the derivation of the right-hand premise of an application of →-left or cut, either premise of an application of ∧-right, or the premise of an application of right thinning (a), left thinning (b), ¬-left, ∨-, →-, ∀- or ∃-right. In all these cases, φ(d) and φ(d′) can be written as [φ(d₁)/Aᵢ]Π and [φ(d′₁)/Aᵢ]Π, respectively (or as

 φ(d₁)               φ(d′₁)
  Aⱼ        and       Aⱼ
────────{n}        ────────{n}
 B→Aₖ               B→Aₖ

respectively, in the case of →-right), for some Aᵢ and Π. Hence, it follows immediately from the induction hypothesis that φ(d) ≽ φ(d′).

(2) In all other cases φ(d) and φ(d′) can be written as [Π/Aᵢ]φ(d₁) and [Π/Aᵢ]φ(d′₁), respectively, for some Aᵢ and Π (or, if d₁ is the derivation of the left-hand premise of an application of ∨-left, as [[Π/Aᵢ]φ(d₁)/Bⱼ]Π′ and [[Π/Aᵢ]φ(d′₁)/Bⱼ]Π′, respectively). Here, φ(d) ≽ φ(d′) follows immediately from the induction hypothesis and the following:


Lemma 7.4 If Π₁ ≻₁ Π₂, Π is any member of LKDT and Aᵢ is any conclusion of Π, then [Π/Aᵢ]Π₁ ≻₁ [Π/Aᵢ]Π₂ (by an application of the same reduction step).

Proof of Lemma 7.4: Since Π₁ ≻₁ Π₂, Π₁ can be written as the result of applying a series Σ of operations to some Π′₁, and Π₂ as the result of applying Σ to Π′₂, where "Π′₁ reduces to Π′₂" is an instance of one of the steps listed above. The lemma is proved by induction on l(Σ), the length of Σ.

Basis step: l(Σ) = 0. In this case it is obvious that "[Π/Aᵢ]Π₁ reduces to [Π/Aᵢ]Π₂" is an instance of the same reduction step. (Strictly speaking, this has to be verified in the case of each reduction step with the aid of the definition of substitution, but it is a trivial matter to do so.)

Induction Step: l(Σ) = n + 1. Suppose that the lemma holds for all Σ″ such that l(Σ″) = n, and let Σ′ denote the first n terms of Σ. There are two cases to consider:

i. The last term of Σ is an application of rule (5). Now, Σ′(Π′₁) ≻₁ Σ′(Π′₂); hence [Π/Aᵢ]Σ′(Π′₁) ≻₁ [Π/Aᵢ]Σ′(Π′₂)

by the induction hypothesis. But it follows from the definition of substitution that, if Π has no open assumptions of the form Cₙ or conclusions of the form Dₘ, for any Π′:

         Π′                         [Π/Aᵢ]Π′
         Dₘ                            Dₘ
[Π/Aᵢ]  ─────── {n}      =           ─────── {n}
         C→Dₚ                         C→Dₚ

So, in particular, this holds when Π′ is the result of applying Σ′ to Π′ⱼ (j = 1, 2).

ii. The last term of Σ is an application of substitution—for the assumptions Bₖ of some Π′, say. Again, it follows from the induction hypothesis that

[Π/Aᵢ]Σ′(Π′₁) ≻₁ [Π/Aᵢ]Σ′(Π′₂)

Furthermore, by (5.4), if Aᵢ is not among the assumptions of Π′ nor Bₖ among the conclusions of Π,

[Π/Aᵢ][Σ′(Π′ⱼ)/Bₖ]Π′ = [[Π/Aᵢ]Σ′(Π′ⱼ)/Bₖ]Π′


(j = 1, 2). But the term on the right is just [Π/Aᵢ]Πⱼ; hence

[Π/Aᵢ]Π₁ ≻₁ [Π/Aᵢ]Π₂

This completes the proof of the lemma and of Theorem 7.2.⁷ □

Whether all the reduction steps listed in the preceding chapter are considered, only that subset of them which corresponds to reduction in LK, or the steps for LK listed in Appendix B, it should be apparent that normal forms are not unique, nor does every reduction sequence terminate. The latter feature can be attributed to two factors. The first is that some reduction steps are symmetrical. These comprise various instances of (III) as well as the thinning permutations in (IIIB); in the case of LK, they are basically those permutations which involve only cuts and thinnings, and do not result in the splitting up of an inference. The reduction sequences generated by steps of this kind can contain only a finite number of distinct terms—although some of them may be repeated infinitely many times. For this reason, they seem not to pose a serious problem. The same cannot be said of the second factor, which is that certain reductions, when applied to a derivation, may yield a more complicated one. Here again I am thinking of (III) or, in the case of LK, those steps which allow a cut to be split up, namely, (C1–4), (C10–13), (D1–5) and (D6–9).⁸ It was this phenomenon which was exploited by Zucker to produce an infinite non-repeating reduction sequence for LJ. His example can easily be adapted to the case of LKDT, as the following shows.

⁷Using this lemma, it is easy to show (again by induction on l(Σ)) that, if Π and Π′ are obtained by applying Σ to Π₁ and Π₂ respectively, and "Π₁ reduces to Π₂" is an instance of one of the reduction steps, then Π ≻₁ Π′—thus substantiating the claim made earlier that nothing is lost by formulating the reduction steps in terms of initial subderivations.

⁸All I intend here is to draw attention to the fact that some reduction steps enable us to generate infinite non-repeating sequences. As a matter of fact, I am hard put to explain what it means for one derivation to be more complicated than another in the present context. A necessary condition seems to be that the former should contain more inferences or vertices than the latter, but this is not sufficient. For example, although the application of (I3a) may increase the size of a derivation, it seems inappropriate to assert that it also increases the complexity—because the strong normalization theorem holds (for NJ), if for no other reason. Yet it is notoriously difficult to specify what kind of simplification is accomplished by this step. (A way to do so, and hence to define a measure of complexity which decreased with each application, would yield a simple and direct proof of strong normalization.)

In the case of LK, a thinning permutation such as (F1d) provides an example of a reduction which, although it may increase the number of inferences, seems to simplify rather than complicate a derivation. The situation here is further obscured by the fact that applications of at least two different reduction steps are needed to generate sequences of the kind described above. It is perhaps unreasonable, therefore, to claim that any particular step is by itself responsible for an increase in complexity.

To simplify the notation, I shall not write in subscripts; clearly, nothing is lost by this omission. Recall that by (5.5)

Π₁   Π₂
 A     B

   Π

may be used to denote ambiguously

         Π₂                    Π₁
[Π₁/A]    B      and   [Π₂/B]   A
          Π                     Π

(provided, of course, that Π₂ has no assumptions of the form A and Π₁ has none of the form B). Now, suppose that Π₁ and Π₂ are of the forms

ir n" CD A E F

ni n2

A A

respectively. Then,

nx n2 A B

n

(by (III) and (5.5)) where

n2 n+ = B

n

^ 1

and

d i i u ni' n2' B B

n' C D n2 n2 n; rr2 = B\ B2

A A rr n+ n+

n' C D

IT = ni ir2 Bx A A B2

Π        Π

As is apparent,

Π₁   Π₂              Π₂   Π₂
 A     B      and     B₁    B₂
    Π                    Π*

have the same form, and Π is a proper subderivation of Π*. (III) can therefore be applied once more to obtain a figure of the form

 Π₂    Π₂
B₂₁   B₂₂
    Π″

where 11̂ = E ni' B\ B2i

n*

n2' B\ B22

n*


Clearly the new derivation properly includes Π*. Furthermore, (III) can now be applied to

n2 n2 # 2 i &22

Π*

and so on ad infinitum. With each application, the derivation which results is of increased size. (The use of the subscripts 1, 2, 21 and 22 on B is, strictly speaking, an abuse of notation. They are only intended as an informal device to make matters clearer by keeping track of various occurrences of B between figures.)
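The growth phenomenon just described can be imitated by a toy rewriting system in which one step duplicates a subterm, so that the size of the term strictly increases and the reduction sequence never repeats. The encoding and the particular rule below are mine and stand in only loosely for (III); the point is simply that a duplicating step suffices for non-termination.

```python
# Terms are nested tuples: ("cut", p, q), ("pair", p, q) or ("leaf",).
# The single rule models a duplicating permutation in the style of (III):
#     cut(p, q)  -->  cut(pair(q, q), p)
# Each application copies q, so sizes grow strictly and no term recurs.

def size(t):
    return 1 + sum(size(c) for c in t[1:])

def step(t):
    op, p, q = t
    assert op == "cut"
    return ("cut", ("pair", q, q), p)

t = ("cut", ("leaf",), ("leaf",))
sizes = []
for _ in range(6):
    sizes.append(size(t))
    t = step(t)

# The sequence of sizes is strictly increasing, so the reduction
# sequence is infinite and non-repeating.
assert sizes == [3, 5, 7, 11, 15, 23]
assert all(a < b for a, b in zip(sizes, sizes[1:]))
```

As in the text, no single application is "to blame": it is the interplay of copying with repeated applicability at the root that produces the unbounded growth.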

As I remarked in Chapter 2 above, this sort of example cannot be carried out in the sequent calculus except under special circumstances. In particular, it does not apply to the version of LK presented here. The difference between LKDT and LK in this respect is accounted for by the fact that, if d is obtained from d′ by a reduction step in (C1–9) or (D1–5) (i.e., by permuting a cut with the last inference in the derivation of its right-hand premise, or by splitting the cut-formula in the right-hand premise of a cut), φ(d) = φ(d′). As a result, the translation of a reduction sequence from LKDT back into LK may involve applying these steps from right to left, as well as from left to right. This is the case in the above example. It is easy to check that an infinite non-repeating reduction sequence analogous to the one described above can be generated in LK if the reduction procedure is augmented in this way.

The conclusion to be drawn from all this, however, is not that there can be no infinite non-repeating reduction sequences in LK. I present an example of one below. Notice that it depends essentially upon allowing more than one formula to appear on the right-hand side of a sequent. In this respect, it differs significantly from the preceding example, which applies to LJDT no less than to LKDT. Again, I shall omit subscripts and all parts of the derivation (e.g., side-formulae) which do not affect the situation. Reading from top to bottom, each one of the figures below reduces to its successor by an application of (C4) or (C13).

(7.5)

A⊢C;B   C⊢B        B⊢E   B;E⊢D
───────────        ───────────
    A⊢B                B⊢D
──────────────────────────────
              A⊢D

(7.6)

            B⊢E   B;E⊢D              B⊢E   B;E⊢D
            ───────────              ───────────
A⊢C;B          B⊢D           C⊢B        B⊢D
──────────────────          ────────────────
      A⊢C;D                      C⊢D
─────────────────────────────────────
                A⊢D


(7.7)

A⊢C;B   B⊢E     A⊢C;B   B;E⊢D                 B⊢E   B;E⊢D
───────────     ─────────────                 ───────────
   A⊢C;E           A;E⊢C;D            C⊢B        B⊢D
   ───────────────────────           ────────────────
          A⊢C;D                           C⊢D
          ────────────────────────────────────
                        A⊢D

(7.8)

A⊢C;B   B⊢E     A⊢C;B   B;E⊢D        𝒞⊢ℬ   ℬ⊢ℰ     𝒞⊢ℬ   ℬ;ℰ⊢𝒟
───────────     ─────────────        ──────────     ────────────
   A⊢E;C           A;E⊢C;D              𝒞⊢ℰ            𝒞;ℰ⊢𝒟
   ───────────────────────              ──────────────────
          A⊢C;D                               𝒞⊢𝒟
          ────────────────────────────────────────
                         A⊢𝒟

Now, it is clear that any reduction step which applies to (7.5) can be applied with the same result to the part of (7.8) which is written in calligraphic characters. So, this sequence of steps can be repeated, beginning this time with (7.8), to obtain a larger derivation, and so on ad infinitum. Other similar examples of infinite non-repeating sequences can be constructed, but the above is as simple as any.

I turn now to the issue of uniqueness of normal forms. The above considerations are already sufficient to rule out the possibility of each derivation having a unique normal or cut-free form. (For the purposes of the present discussion, I will regard as normal any derivation of LKDT which has the subformula property, and lacks maximal segments and maximal formula occurrences.) In the first place, some of the symmetrical reduction steps, most notably the thinning permutations, apply to derivations which may already be normal or cut-free. By itself, this is perhaps not so disturbing since it implies only that a derivation may reduce to a finite number of normal ones and that these are all reducible to one another by means of such permutations. In fact, however, a derivation may have infinitely many distinct normal forms. This is obvious in the case of LKDT, since each derivation in the above example of an infinite reduction sequence may be normal, and it is also true for LK. (To see this, suppose that A⊢C;B, C⊢B, B⊢E and B;E⊢D all have cut-free derivations and that C, B and E are atomic. Then, it is an easy matter to specify a reduction procedure, namely always eliminate the left-most cut with no cuts above it, which will yield a distinct cut-free form for each term of (7.5)–(7.8).)
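The strategy mentioned parenthetically above ("always eliminate the left-most cut with no cuts above it") amounts to a simple search in the derivation tree. The following sketch (encoding and names mine) merely selects that cut; how the selected cut is then eliminated is, of course, the substantive part and is not modelled here.

```python
# A derivation is a tuple (rule, premise, ...); premises are listed
# left to right.  We look for the left-most "cut" node none of whose
# premise subderivations contains a cut.

def leftmost_uppermost_cut(t, path=()):
    """Path (tuple of child indices, root = ()) of the chosen cut, or None."""
    for i, child in enumerate(t[1:]):
        p = leftmost_uppermost_cut(child, path + (i,))
        if p is not None:
            return p
    return path if t[0] == "cut" else None

d = ("cut",
     ("axiom",),
     ("cut",
      ("cut", ("axiom",), ("axiom",)),
      ("axiom",)))

assert leftmost_uppermost_cut(d) == (1, 0)   # the innermost cut is chosen
assert leftmost_uppermost_cut(("axiom",)) is None
```

Because premises are visited left to right before the node itself is considered, the first cut returned is always uppermost, and among uppermost cuts it is the left-most one, which is exactly what makes the procedure deterministic.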

In addition to the problems caused by the failure of strong normalization, there is a further difficulty which stands in the way of uniqueness. In LK it has to do with the manner in which cuts are to be eliminated, and in LKDT it concerns the pruning of redundant parts of a derivation (i.e., those attached by thinning). Notice first that the proper reduction steps for LKDT are conservative: each application of any one of them removes exactly two inferences from a derivation (an introduction together with the elimination following it). Those which remove maximal formula occurrences whose principal connective is ∧, ∨ or → differ from the conventional reductions in this respect. (This holds for → only when the occurrences are introduced by an application of rule (5) which discharges no assumptions.) The usual procedure is to remove some or all of the subderivation which culminates in [or derives from] the redundant premise [or conclusion] of the inferences eliminated by the reduction—whenever such redundancies occur. How much of this subderivation is to be removed depends upon considerations which vary from calculus to calculus. Now, as an illustration of the difficulty which presents itself, consider a derivation of the form

(7.9)

   Π′
   A
  ─────
  C∨A          Π    Π″
 ───────       A    B
  C   A       ───────
               A∧B
              ─────
                A

where Π, Π′ and Π″ are assumed to be normal. Applying (I1) and (I2) in any order yields (7.10):

C A n n" A B

A

The problem is that, although this latter derivation contains no maximal formula occurrences, it may not be normal, i.e., it may lack the subformula property. A solution is provided by the pruning reductions in (IIIA). Unfortunately, however, when used without restriction, they lead to distinct normal forms. In particular, they can be used to convert (7.10) into . or Π′ into .. Before discussing the possibility of restricting them in some way,

it is worthwhile to consider how this matter is handled by the reduction procedures for NJ and LK. In NJ, (7.9) cannot be represented by a derivation whose last inference is ∧-elimination; rather, it corresponds to one of the form

[C]

n n" rr A B A AAB

CvA A [A]


and this reduces to . no matter which maximal formula occurrence is operated upon first. In LK, (7.9) corresponds to a number of different sequent derivations, among them the following:

(7.11)

 d′        C⊢B   A⊢A
 ⊢A        ──────────
 ─────      C∨A⊢B;A            d
 ⊢C∨A
 ───────────────────
        ⊢B;A                  ⊢A       A⊢A
        ──────────────────────        ──────
               ⊢A∧B;A                 A∧B⊢A
               ─────────────────────────────
                          ⊢A

and

(7.12)

 d
⊢A   C⊢B        A⊢A
──────────     ──────
  C⊢A∧B        A∧B⊢A
  ──────────────────
 d′          C⊢A        A⊢A
 ⊢A          ───────────────
 ────            C∨A⊢A
 ⊢C∨A
 ─────────────────
        ⊢A

(d, d′ and d″ are supposed to be such that φ(d) = Π, φ(d′) = Π′ and φ(d″) = Π″. Also, I have not bothered to write in any side-formulae.) No matter how the cuts are eliminated from (7.11) and (7.12), the result is that the former reduces to d and the latter to d′. This would be perfectly satisfactory were it not for the fact that the following derivation can easily be seen to reduce to both (7.11) and (7.12):

 d′        C⊢B   A⊢A        d
 ⊢A        ──────────      ⊢A   B⊢B       A⊢A
 ────       C∨A⊢B;A        ─────────     ──────
 ⊢C∨A                        B⊢A∧B       A∧B⊢A
 ─────────────────           ─────────────────
       ⊢B;A                         B⊢A
       ──────────────────────────────────
                     ⊢A

The problem is that when a cut is eliminated or reduced in complexity, its location in the derivation affects the result of reducing it. Unfortunately, however, this location is not in general uniquely determined—except by adding ad hoc restrictions. (Another illustration of the difficulty this causes is provided by the procedure for eliminating a cut, one of whose premises is introduced by thinning.)

The uniqueness of normal forms in natural deduction can be explained, as far as the negative fragment is concerned, by two features of this calculus. The first is that there is a natural ordering of the inferences which constitute an N derivation, and this ordering is not affected by any of the reduction steps. The second is the fact that any branches made redundant by the application of a reduction are composed entirely of inferences subordinate (in the sense of this ordering) to the ones being removed. In view of this, they may be pruned in their entirety without spoiling uniqueness.

When we turn to the full calculus, it is no longer obvious that the ordering of inferences is entirely natural. In addition, it becomes necessary to allow reductions which alter this ordering. To preserve uniqueness, ad hoc restrictions must be placed on these. By itself, however, this is not sufficient because the branches made redundant in the process of normalization may contain inferences which are not subordinate. So, to ensure that the calculus possesses the second feature mentioned above, the meaning of "redundant" is altered by treating inessential applications of ∨- and ∃-elimination as though they were essential.⁹ In NJDT and NKDT,

this last translates into some ad hoc conventions concerning when the rule (LT) can be applied and how much of a redundant subderivation is to be pruned.

In light of the preceding considerations, we can understand better why normal forms are not unique in LK and LKDT. To begin with, there is in general no satisfactory way to order the inferences of a multiple-conclusion derivation. This can be seen from the example of (7.9), which can be interpreted as having been constructed either from Π and

IT A

CV A C A

Π″

by applications of ∧-introduction and elimination, or from

C IT n n" A A B_

CV A AAB A

and A by ∨-elimination. Although there are a number of other similar examples, I will present only one more:

⁹The branches of a derivation made redundant by a reduction are those which, if they are to be retained after the removal of some maximal formula occurrences, must be reattached by means of thinning. I argued earlier that applications of ∨-elimination which do not discharge assumptions in both minor premises and applications of ∃-elimination which discharge no assumptions both involve the tacit use of thinning. If this interpretation is kept in mind and reduction in the N calculi is taken to consist of removing maximal formula occurrences together with the branches made redundant by their removal, then it can fairly be claimed that what it means for a branch to be made redundant in the negative fragment is not the same as in the full calculus.


Given ~ ^, let

n n4 rii n n2 = E F and n 3 = j4 B_

G C

Then iii n 2 n 3 n4

_^4 B_ = ]Z F_ C G

(provided that E is not among the conclusions of Π₁, nor B among those of Π₄).

Of course, the inferences of an LK derivation can be ordered. The problem is that the ordering is rather artificial and can be changed radically—most notably by the upward permutation of cuts—in the course of cut-elimination. In LKDT too, even when the order of inferences is determined by the structure of the derivation, it can often be reversed by applying (III). As a result, any reduction procedure (for LK or LKDT) which allows entire branches to be pruned as they become redundant will not yield unique cut-free or normal forms. It is, however, hard to envision natural reduction steps which prune enough to ensure the subformula property, but not so much as to destroy uniqueness.

Certainly those reduction steps for LK which involve pruning, namely (B1–3), do not fit this description. They correspond in LKDT to (I1–3), respectively, followed by an application of (IIIA1) (in the cases of (I1) and (I3b)) or (IIIA2) (in the case of (I2)). In effect, they allow a segment of variable length to be removed from each redundant branch—the only constraint being that no open assumptions or conclusions are to be lost. It is obvious that this is too drastic for uniqueness. Furthermore, because the extent to which a branch is pruned depends upon the position of the inferences used to construct it relative to the cut whose complexity has been reduced (those above it being removed, while all others are retained), relatively insignificant permutations of inferences will alter significantly the effect of these reductions, and it is by no means clear that they have any claim to be called natural.
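The failure of uniqueness discussed here is, abstractly, a failure of confluence: with overlapping reduction steps, different strategies can reach different normal forms. A deliberately tiny rewriting system (rules and encoding mine, unrelated to the book's calculi) shows the shape of the problem: with the rules f(a) → c and a → b, an outermost strategy and an innermost strategy produce two distinct normal forms from the same term.

```python
# Terms: ("f", t) applications over constants ("a",), ("b",), ("c",).
# Rule 1: f(a) -> c.   Rule 2: a -> b.   The two rules overlap on f(a).

def rule1(t):
    return ("c",) if t == ("f", ("a",)) else None

def rule2(t):
    return ("b",) if t == ("a",) else None

def outermost(t):
    """Rewrite at the root first, then inside (sufficient for this example)."""
    u = rule1(t) or rule2(t)
    if u is not None:
        return outermost(u)
    return (t[0],) + tuple(outermost(c) for c in t[1:]) if len(t) > 1 else t

def innermost(t):
    """Rewrite inside subterms first, then at the root."""
    if len(t) > 1:
        t = (t[0],) + tuple(innermost(c) for c in t[1:])
    u = rule1(t) or rule2(t)
    return innermost(u) if u is not None else t

# Both results are normal forms (no rule applies anywhere), yet they differ.
assert outermost(("f", ("a",))) == ("c",)
assert innermost(("f", ("a",))) == ("f", ("b",))
```

Rewriting a inside f(a) destroys the redex f(a), just as pruning or permuting one part of a derivation can destroy a redex elsewhere; this is the elementary mechanism behind the distinct cut-free forms of (7.11) and (7.12).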

If we contemplate replacing (IIIA1–2) by pruning reductions which are more systematic and will preserve uniqueness, there seems to be only one reasonable possibility: restrict (IIIA1a) to the case in which Π consists of a single introduction and, dually, (IIIB2) to the case in which Π′ consists of a single elimination. The drawback to this idea is that, if the connective being introduced is →, Π will not fit into the format of (IIIA1a). To take account of this, (IIIA1b) must be replaced by (IIIA5). Now, if the somewhat trivial step (IIIA6) is also included, we have a group of pruning reductions which will ensure that normal derivations possess the subformula property. (In fact, ⊥ may occur in a normal derivation even though it is not a subformula of any assumption or conclusion. For all practical purposes, however, the claim is true.) Furthermore, because no maximal formula occurrences are removed by these reductions, they do not by themselves threaten uniqueness and leave open the possibility of proving that the normal forms of a derivation are all equivalent in some suitable sense of the word. What vitiates this approach, of course, is the fact that (IIIA5) may introduce new open assumptions into the derivation on which it operates, and hence is unsuited to be a reduction step. The disappointing conclusion, therefore, appears to be that the only natural way to prune a normal form of an LKDT derivation Π in such a way that it will have the subformula property and bear some structural resemblance to the other normal forms of Π leads to an insuperable difficulty.

The preceding remarks notwithstanding, by choosing an appropriate notion of normal form and placing sufficient restrictions on the reduction steps, it is clearly possible to prove for any of the calculi under consideration not only the uniqueness (up to some equivalence) of normal forms but also the termination of each reduction sequence in a normal derivation with the subformula property. The problem is that this will be an ad hoc procedure designed expressly for the purpose of obtaining these results. On the other hand, unless we are prepared to interpret 'natural' as 'natural relative to the rules of a particular system', it is not clear that there is any such thing as a natural reduction procedure.

Despite a fundamental similarity between reduction in all the systems discussed above, we are faced with a bewildering number of choices about matters of detail which are decided for each particular calculus in what appears to be a reasonable way only by respecting its combinatorial peculiarities. As a result, these decisions often seem pointless and arbitrary when translated from one calculus into another. Furthermore, it is upon these apparently trivial decisions that the possibility of proving strong normalization and Church-Rosser type theorems depends. When we do come across a calculus like NJ for which such theorems hold with respect to a relatively straightforward set of reduction steps, it seems to be more a matter of combinatorial accident than a reflection of some profound truth about normalization. For this reason, it seems unwise to use (as Zucker appears to do) the normalization procedure for NJ as a kind of benchmark by which to judge other reduction procedures. Once we go beyond the negative fragment, no method of reduction stands out as privileged; they all appear to be more or less satisfactory compromises between competing requirements. An investigation of their formal properties does not provide sufficient grounds for choosing between them or assessing their wider significance. So, rather than pursuing such an investigation further in the hope of discovering some clue as to how the relationship between a derivation and its normal form(s) is to be interpreted, it might prove more fruitful to

Correspondence Results 149

consider directly various interpretations which have been suggested for the derivations of a formal system with a view to drawing up a set of criteria, independent of the rules of any particular formalism, by which to judge reduction procedures and proposals regarding their significance.

Before turning to this task in the next chapter, I would like to conclude the present one with a brief discussion of a topic which, although peripheral to my main theme, is nevertheless of some interest—namely, the advantages of presenting classical logic in a multiple-conclusion framework. As I observed earlier, the rules of DT are all classically valid so that, for example, the sequent Γ ⊢ A is derivable in LK iff there is a derivation Π ∈ DT of A from Γ. Some of them, however, are not intuitionistically valid. In particular, both (5) and (10∨) need to be restricted to obtain intuitionistic logic. It is rather a complicated matter to formulate such restrictions if the rules are to be applied downwards in a straightforward manner, although there are quite naturally generated subsets of DT which are adequate for intuitionistic logic—the most conspicuous example of one being LJDT. Nevertheless, it seems fair to say that these multiple-conclusion rules express more naturally the classical interpretation of the logical connectives than its intuitionistic counterpart. The situation is reversed in the single-conclusion case. It has often been remarked, for example, that NJ is extended to classical logic at the cost of a certain artificiality, or that NK is perhaps not "the proper way of analyzing classical inferences."10 Furthermore, NKDT can only be described as an arbitrary subset of DT. The claim I wish to defend here is that DT (or some variant of it) is the proper generalization of NJ to classical logic and that it is superior to NK as a natural system of classical deduction.

In the first place, the rules of classical multiple-conclusion logic can be formulated in a completely explicit way without restrictions. (This was observed by Kneale and seems to have motivated, in part at least, his interest in multiple-conclusion derivations.) The only rule which does not conform to this description is (5) and, following Kneale, it could be replaced by:

(5′)     B          (5″)  -------------
      --------             A     A → B
       A → B

This is surely as it should be since we have been taught that classical inference depends only on the truth or falsity of statements, not on the manner in which they are established. (There is an additional advantage in doing away with improper inference rules if one believes that the meanings of the logical connectives are defined by their associated rules: such definitions will now all be explicit ones.)

A second point to consider is that (with some minor qualifications) every derivation in DT can be reduced to a normal form having the subformula property. This is in contrast to NK, for which a theorem of this kind holds only in the negative fragment.11

10See Gentzen's "Investigations into Logical Deduction," or Prawitz's "Ideas and Results in Proof Theory." The quotation is taken from the latter.

It is perhaps worth noting, however, that a similar result can be proved for a single-conclusion variant of NK—or at least for its propositional part. I pointed out earlier that, if the following rule is substituted for the law of excluded middle or the classical negation rule in NK, the resulting system of rules will, under the appropriate interpretation, generate a more natural subset of DT:

(7.13)    [X]   [¬X]
           Π     Π′
           C     C
          -----------
              C

Let the calculus obtained by adding this rule to NJ be called NK′. As far as I know, NK′ has not been much studied.12 It is obvious that NK′ is adequate for classical logic (since, for example, the law of excluded middle is trivially derivable using the above rule). Furthermore, a strong normalization theorem is provable for the propositional part of NK′. This is because nothing is lost (as far as propositional inferences are concerned) by requiring X to be atomic in the above. I show that this is so below.

The four figures which follow can be interpreted either as the cases in an inductive argument to show that NK′ is deductively equivalent to the system obtained by restricting the above rule to atomic X, or as a set of negation reductions according to which (7.13) reduces to the figure shown when X has the appropriate form.

(1) X = A ∧ B

The dilemma on A ∧ B reduces to a dilemma on B nested inside a dilemma on A. In the branch with [A] and [B], A ∧ B is obtained by ∧-introduction and Π yields C. In the branch with [¬B], the assumption [A ∧ B] gives B by ∧-elimination, contradicting ¬B; hence ¬(A ∧ B) follows by ¬-introduction and Π′ yields C. The branch with [¬A] is symmetric, using the first conjunct.

11It was Smiley and Shoesmith who first observed this advantage of multiple-conclusion natural deduction and, in Chapter 20 of their Multiple-Conclusion Logic, proved a normal form theorem for the classical predicate calculus which made use of it.
12Sundholm however mentions it in his article, "Systems of Deduction," where it is called the rule of non-constructive dilemma. (See Vol. I of the Handbook of Philosophical Logic edited by F. Guenthner and D. Gabbay, Dordrecht, 1983.)


(2) X = A ∨ B

Here the dilemma on A ∨ B reduces to a dilemma on A nested around a dilemma on B. In the branch with [A] (and likewise in the branch with [B]), A ∨ B follows by ∨-introduction and Π yields C. In the branch with both [¬A] and [¬B], the assumption [A ∨ B] leads to ⊥ by ∨-elimination, since each disjunct contradicts the corresponding negated assumption; hence ¬(A ∨ B) follows by ¬-introduction and Π′ yields C.

(3) X = A → B

The outer dilemma is on B. In the branch with [B], A → B follows by →-introduction (with vacuous discharge) and Π yields C. In the branch with [¬B], an inner dilemma on A is applied: given [A], the assumption [A → B] yields B by →-elimination, contradicting ¬B, so ¬(A → B) follows by ¬-introduction and Π′ yields C; given [¬A], an assumption [A] leads to ⊥ and hence, via ⊥, to B, so A → B follows by →-introduction and Π yields C.

(4) X = ¬A

A single dilemma on A suffices. In the branch with [A], the assumption [¬A] leads to ⊥, so ¬¬A follows by ¬-introduction and Π′ yields C. In the branch with [¬A], Π yields C directly.

An additional permutative reduction is needed to take account of maximal segments produced by (7.13), but this creates no problem and the usual proof of normalization goes through virtually unchanged.13
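The inductive content of cases (1)-(4) can be put schematically: a dilemma on a compound formula is traded for dilemmas on its immediate subformulas, so that after the reduction only atomic dilemmas remain. A small sketch of that recursion follows (the tuple representation of formulas and the function name are mine, not from the text):

```python
# Formulas: ('atom', name), ('not', x), ('and', x, y), ('or', x, y), ('imp', x, y).
# Each case of the recursion mirrors (1)-(4): a dilemma on a compound formula
# is replaced by dilemmas on its immediate subformulas.

def dilemma_atoms(x):
    """Atoms on which rule (7.13) is applied once the reduction is carried out."""
    if x[0] == 'atom':
        return {x[1]}
    if x[0] == 'not':                                # case (4): one dilemma on A
        return dilemma_atoms(x[1])
    # cases (1)-(3): nested dilemmas on both components
    return dilemma_atoms(x[1]) | dilemma_atoms(x[2])

# Example: the dilemma on (A and B) -> not C reduces to dilemmas on A, B, C.
f = ('imp', ('and', ('atom', 'A'), ('atom', 'B')), ('not', ('atom', 'C')))
```

This makes visible why nothing is lost by restricting X to be atomic: the recursion bottoms out at the atoms of X, and each compound case is discharged by the corresponding figure above.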

The attempt to extend the above to quantifiers founders on the following difficulty: the analogues of (1)-(4) do not hold for the usual introduction and elimination rules for ∀ and ∃; on the other hand, they do hold trivially with respect to the rules (9Q) and (10Q). These latter rules, however, pose certain problems for normalizability, so it is by no means clear that there is a reasonable formulation of the quantifier rules with respect to which a normalization theorem will hold.

My final reason for maintaining the superiority of DT is in some ways the most important: just as Gentzen claimed that the derivations of NJ had a "close affinity to actual reasoning," so I claim that classically valid principles can be derived in rather a natural, concise and straightforward way using the rules of DT. To illustrate this fact, I present two sample derivations, the first of (A → B) ∨ (B → A) and the second of (A → (B ∨ C)) → ((A → B) ∨ (A → C)). For the sake of legibility I will not bother to write in subscripts or identify the rules used to justify each step:

13There also appears to be no difficulty in extending the "convertibility" proof of strong normalization to the present context, but I have not checked this in detail.

          [B]
        -------
         A → B
  ---------------------------
  (A → B) ∨ (B → A)       B
                       -------
                        B → A
                 -------------------
                 (A → B) ∨ (B → A)

  [A]*    [A → (B ∨ C)]†
  -----------------------
          B ∨ C
     ----------------
      B            C
   ------- *    ------- *
    A → B        A → C
  ---------------   ---------------
  (A → B) ∨ (A → C)   (A → B) ∨ (A → C)
  ------------------------------------- †
  (A → (B ∨ C)) → ((A → B) ∨ (A → C))

(* and † indicate which assumptions are discharged by which inference.) These derivations compare very favorably in terms of length and intelligibility with their NK counterparts, as do many other DT derivations. (The relationships between universal and existential quantifiers, for example, can be established in just a few lines in DT.) It seems fair to conclude from all this that multiple-conclusion logic has more than mere curiosity value.
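The same two principles can also be verified in a modern proof assistant; here is a sketch in Lean 4, using `Classical.em` for the case split that the multiple-conclusion branching (or the dilemma rule) performs. The proofs are mine, intended only to mirror the case structure of the derivations above:

```lean
-- (A → B) ∨ (B → A): split on B. If B holds, A → B vacuously; otherwise B → A
-- by contradiction from any proof of B.
example (A B : Prop) : (A → B) ∨ (B → A) :=
  (Classical.em B).elim
    (fun hb => Or.inl (fun _ => hb))
    (fun hnb => Or.inr (fun hb => absurd hb hnb))

-- (A → (B ∨ C)) → ((A → B) ∨ (A → C)): split on A, then on which disjunct
-- the hypothesis delivers.
example (A B C : Prop) : (A → (B ∨ C)) → ((A → B) ∨ (A → C)) :=
  fun h =>
    (Classical.em A).elim
      (fun ha =>
        (h ha).elim
          (fun hb => Or.inl (fun _ => hb))
          (fun hc => Or.inr (fun _ => hc)))
      (fun hna => Or.inl (fun ha => absurd ha hna))
```

Both principles are classically valid but intuitionistically underivable, which is why the case split on excluded middle is essential in each proof.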

8

Interpretations of Derivations

If the analogy between the derivations of a logical calculus and the terms of a calculus of functions is taken seriously, it leads naturally to the idea that interreducible derivations represent the same proof.1 This is so not only because it suggests the possibility of a strong normalization theorem for derivations (which, as Prawitz has pointed out, gives a certain coherence to the idea) but also because reduction in term calculi has traditionally been used to analyze the equality relation between terms and, by extension, the identity relation between the objects which they denote. Church, for example, states quite explicitly that the substitution of interreducible terms for one another in an expression leaves its meaning unchanged. This implies at least that such terms must have the same denotation under any interpretation. He also points out, however, that the notion of difference in meaning is a vague one, that there is a range of distinct identity criteria for functions and that it is not always possible to distinguish between these by means of the reductions he considers.2

Prawitz seems to have been the first author to suggest in print that the identity relation on proofs could be characterized in terms of reductions between derivations, although the idea that there is a connection between proofs and functions, or a formal analogy between the derivations of a logical calculus and the terms of a calculus of functions, goes back quite a long way. Gödel, for example, observed that the concepts of computable function of finite type and intuitionistically correct proof may be used interchangeably in certain contexts.3

Furthermore, Curry and Feys noted a striking analogy between the theory of functionality and the positive implicational calculus, which they exploited to prove a normal-form theorem for combinators.4 Another connection was made by Tait.5 He adapted a method used to prove cut-elimination for derivations involving induction principles to analyze the computation of functionals involving definition by recursion (i.e., to prove a normal-form theorem for a certain set of terms). Howard, building on the ideas of Curry and Tait, extended the analogy from the positive implicational calculus to Heyting arithmetic and indicated how to establish a normalization theorem for the associated calculus of terms.6 The method he suggests for proving such a theorem is due to Tait; it involves the notion of a convertible term, which was introduced by the latter to analyze the computable functionals of finite type, in the sense of Gödel, as well as a certain extension of them.7 This method was also used to good effect by various contributors to the Proceedings of the Second Scandinavian Logic Symposium, in particular, by Girard, Martin-Löf and Prawitz, all of whom adapt it to prove normalization theorems for the derivations of a variety of systems.8

1See, for example, page 249 of Prawitz's paper "Philosophical Aspects of Proof Theory" in Volume I of Contemporary Philosophy (ed. by G. Fløistad, The Hague, 1981).
2The Calculus of λ-Conversion, Princeton, 1941. See page 15 and pp. 2-3.
3"Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes," Dialectica, Vol. 12, 1958, p. 283; reprinted with translation in Vol. II of his Collected Works (ed. by S. Feferman et al., Oxford, 1990).
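The analogy can be made concrete in a few lines: under the formulae-as-types reading, an →-introduction immediately followed by an →-elimination (a maximal formula occurrence) corresponds to a β-redex, and normalizing the derivation is β-reducing the term. A minimal sketch follows; the tuple representation is mine, and capture-avoiding substitution is elided since the examples substitute only terms whose free variables are not rebound:

```python
# Lambda-terms standing in for derivations: an abstraction is an
# implication-introduction, an application is an implication-elimination,
# and a beta-redex corresponds to a maximal formula occurrence.

def subst(term, var, value):
    """Substitute value for var (no capture avoidance: used on safe examples)."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == var else term
    if kind == 'lam':
        _, v, body = term
        return term if v == var else ('lam', v, subst(body, var, value))
    _, f, a = term
    return ('app', subst(f, var, value), subst(a, var, value))

def normalize(term):
    """Repeatedly contract beta-redexes until no redex remains."""
    kind = term[0]
    if kind == 'app':
        f = normalize(term[1])
        if f[0] == 'lam':  # redex: an introduction immediately eliminated
            return normalize(subst(f[2], f[1], normalize(term[2])))
        return ('app', f, normalize(term[2]))
    if kind == 'lam':
        return ('lam', term[1], normalize(term[2]))
    return term

# ((lambda x. lambda y. x) a) b  normalizes to  a
K = ('lam', 'x', ('lam', 'y', ('var', 'x')))
t = ('app', ('app', K, ('var', 'a')), ('var', 'b'))
```

On this reading, the claim that interreducible derivations represent the same proof is the claim that β-equal terms denote the same function.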

Prawitz argues that the derivations of Gentzen's N calculi represent first-order proofs (and, conversely, that each such proof can be represented by an N derivation), and then goes on to conjecture that two derivations represent the same proof if and only if they are interreducible. There are a number of points to notice about this conjecture. The first has to do with what it means for a proof to be represented by a derivation. It is quite clear that for Prawitz proofs are abstract objects whose relationship to their representations is analogous to that between propositions and the sentences which express them. It follows that to ask when two derivations represent the same proof is much like asking when two sentences have the same meaning.9 Whether this is an appropriate analogy and how it is related to the one mentioned in the preceding paragraph will be considered later. For the present, I want to draw attention to a second point about the conjecture: its tentative nature. Of course, it is advanced only as a possible answer to the question of when two derivations represent the same proof, and no conclusive evidence is offered in support of it. There is, however,

4Combinatory Logic, Volume I, Amsterdam, 1958, page 312ff.
5"Infinitely Long Terms of Transfinite Type," Formal Systems and Recursive Functions, 1965, page 177.
6"The Formulae-As-Types Notion of Construction"; this has circulated as a manuscript since 1969, but was first published in the volume of essays To H.B. Curry, ed. by Seldin and Hindley, New York, 1980.

7 "Intensional Interpretations of Functionals of Finite Type I," Journal of Symbolic Logic, Vol. 32, 1967, pp. 198-212.

8Prawitz's contribution to this volume forms the basis for my discussion of his views in the text. The conjecture about the identity of proofs is to be found on page 257. He states on page 261 that it "is due to Martin-Löf and is also influenced by similar ideas in connection with terms" which are to be found in Tait's 1967 paper referred to above.

9See, for example, page 237 of "Ideas and Results in Proof Theory"—or "On the Idea of a General Proof Theory," Synthese, Volume 27, page 68.


another sense in which it may be considered tentative: even if it should turn out that identity of proofs can be characterized in terms of reducibility:

Nevertheless, the conjecture as stated above is clearly in need of certain refinements. Firstly, derivations that only differ with respect to proper parameters should obviously be counted as equivalent. Secondly, one may ask whether not also the expansion operations preserve the identity of the proofs represented. It seems unlikely that any interesting property of proofs is sensitive to differences created by an expansion.10

These are just examples of matters of detail which need to be settled before the conjecture can be put in a definitive form. Two further ones are provided by questions concerning immediate simplifications and the permutative reductions. Immediate simplifications allow the removal of redundant applications of ∨- and ∃-elimination as well as of the classical negation rule; such applications obviously "constitute unnecessary complications" in a derivation, but their unrestricted removal destroys the uniqueness of normal forms.11 As for the permutative reductions, there seems to be no particular reason—apart from expediency—for restricting their application to cases where it results in diminishing the length of a maximal segment. If, for example, the major premise of an elimination rule can always be permuted upwards past an application of ∨- or ∃-elimination, this will not affect the strong normalization theorem.12 If, however, no restriction is placed on these reductions, not every reduction sequence will terminate (as Zucker's example shows). What is at stake in all these cases is the exact definition of the reduction relation. Even after stipulating that interreducibility is to be, roughly speaking, the equivalence relation generated by an agreed upon set of proper reductions, we are still obliged to decide a number of rather picayune questions before we can fix it precisely. Furthermore, if the claim that this relation has significance outside the confines of a formal system is to have any credibility, we must exhibit sound reasons for our decisions. It is, however, difficult even to imagine what kind of evidence would settle conclusively issues as small as these.
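The failure of termination under unrestricted reductions is the familiar gap between weak and strong normalization, and it can be illustrated in the abstract. The toy rewrite system below is my own illustration, not Zucker's actual example: every object has a normal form, yet some reduction sequences never reach one.

```python
# A reduction relation that is weakly but not strongly normalizing:
# 'a' can step to the normal form 'b', but the self-loop 'a' -> 'a'
# gives an infinite reduction sequence a -> a -> a -> ...
steps = {'a': ['a', 'b'], 'b': []}   # 'b' admits no reduction: it is normal

def normal_forms(state, seen=None):
    """All normal forms reachable from `state` (finite search over states)."""
    seen = set() if seen is None else seen
    if state in seen:
        return set()
    seen.add(state)
    if not steps[state]:
        return {state}
    out = set()
    for nxt in steps[state]:
        out |= normal_forms(nxt, seen)
    return out
```

Restricting which reductions may be applied (as the length-diminishing proviso does) is exactly a way of cutting such loops out of the reduction relation.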

This leads me to my final point about the conjecture, namely, the evidence offered to support it. To substantiate the claim that derivations which reduce to the same normal form represent the same proof, Prawitz appeals to what he calls the inversion principle: "a proof of the conclusion of an elimination is already 'contained' in the proofs of the premises when the major premise is inferred by introduction." He infers from this that "a proper reduction does not affect the identity of the proof represented."13

10Prawitz, "Ideas and Results in Proof Theory," page 257.
11op. cit., pages 254 and 249.
12op. cit., page 253.
13op. cit., pages 246 and 257.


Of course, as Prawitz himself is quick to point out, this does not imply the truth of the above claim because it makes no mention of the permutative reductions, but although he acknowledges the possibility that these might be problematic, he does not seem to take it very seriously. As it stands, however, the inversion principle does not even allow us to conclude that proper reductions leave unchanged the proof being represented; it tells us only that the two derivations in question (before and after a reduction) have the same conclusion. To supplement it, we need an explanation of the relationship between introductions and eliminations which will guarantee that the proofs represented by such pairs of derivations are the same. An explanation of the appropriate kind is provided by Prawitz's ideas about the validity of arguments. These are sketched in his "Ideas and Results in Proof Theory" and elaborated in subsequent papers.14

Prawitz suggests that the introduction rules of NJ can be interpreted as expressing the constructive meanings of the logical particles in operational terms, while the inversion principle provides a justification of the elimination rules in terms of these meanings. According to him, the derivations of a formal system are incomplete exemplars of valid arguments; each such derivation needs to be supplemented by operations justifying the inferences of which it is composed. On the above interpretation, introduction inferences are self-justifying; eliminations, on the other hand, do require justifying operations, and these are provided by the proper reduction steps. He goes on to claim that "such an operation which is supposed to justify an inference expresses the meaning of this inference." So, if d contains an elimination whose major premise is maximal and d' results from d by applying a reduction which removes this maximal formula occurrence, "the meaning of the elimination inferences is expressed by saying that an argument [of the form d] . . . represents the same proof as represented by [one of the form d']." d' is "obtained by removing the elimination inference in question according to its meaning and [d and d'] are intentional (sic) identical."15

In more detail: Prawitz considers trees of formulae of a fixed language with some additional information; these he calls arguments. An inference is essentially a means of constructing new arguments from ones already given. For simplicity, assume that the language is a propositional one; then the additional information need only enable us to distinguish between open and closed occurrences of assumptions, and to associate with each closed one a unique inference. Arguments are just derivations in the abstract, as opposed to the derivations of a particular formal system.

Prawitz is interested in the question of what constitutes a valid argument (when does it represent a proof, when does its conclusion follow from its assumptions) and answers it by elaborating on an idea he attributes to Gentzen. (Writing about his systems of natural deduction, Gentzen remarks, "The introductions represent, as it were, the 'definitions' of the symbols concerned, and the eliminations are no more, in the final analysis, than the consequences of these definitions. . . . In eliminating a symbol, we may use the formula with whose terminal symbol we are dealing only 'in the sense afforded it by the introduction of that symbol'."16) The conditions under which a (logically complicated) statement can be asserted are taken to constitute the meaning of its principal connective, and these conditions are expressed in terms of rules of derivation, or inferences. For example, the meaning of conjunction is expressed by an inference which transforms arguments for A and B into an argument for A ∧ B. Similarly, the meaning of implication is expressed by an inference which transforms an argument for B from A into an argument for A → B. Such inferences are said to be canonical. By definition, the result of applying a canonical inference to valid arguments is itself valid; arguments of this form are also called canonical. The meanings given to the connectives may also justify the inclusion of non-canonical inferences in a valid argument, where a justification for an inference I will be an operation which transforms the result of applying I to canonical arguments into a valid argument for the same conclusion. Not only are canonical arguments valid, therefore, but also those arguments which can be converted into canonical ones by means of the justifying operations. Furthermore, if we interpret an open argument as claiming that its conclusion follows from valid arguments for its (open) assumptions, it is reasonable to allow such an argument to be valid whenever all its closed instances are.

14See, in particular, "Towards a Foundation of a General Proof Theory" (Logic, Methodology and Philosophy of Science IV, ed. by P. Suppes, Amsterdam, 1973) and "On the Idea of a General Proof Theory" (Synthese, Vol. 27, 1974).
15"Towards a Foundation of a General Proof Theory," page 234.
(A closed instance of an argument is the result of replacing its open assumptions by closed valid arguments for them.) These conditions are sufficient to characterize validity, assuming we know what it means to be a closed valid argument for an atomic sentence:

(1) An argument is valid relative to a set B of atomic arguments iff all its closed instances can be converted into canonical arguments or members of B by means of the justifying operations.

It is then a trivial matter to show:

(2) An argument is valid in the sense of (1) iff it can be generated from assumptions and members of B by means of canonical and justifiable non-canonical inferences.

A notion of strong validity can also be defined by substituting "strongly valid" for "valid" in the explanations of canonical argument and closed instance, and insisting in (1) that every (sufficiently long) sequence of justifications terminates in a canonical argument or a member of B. (This is not quite Prawitz's definition, but see below.)

16The Collected Papers of Gerhard Gentzen, page 80.
tifications terminates in a canonical argument or a member of B. (This is not quite Prawitz's definition, but see below.)

Specializing the preceding discussion to NJ, it is obvious that the introduction rules constitute canonical inferences, while the eliminations, although non-canonical, are justified by the various proper reductions. (2) is nothing more than the claim that all the derivations of NJ (relative to a system B of atomic derivations) are valid.17

Strong validity for derivations is a variant of Tait's notion of convertibility for terms. The idea of interpreting reduction steps as defining conditions derives from this same source, although in Tait they define the operations associated with the introduction of various constants, rather than the meanings of elimination inferences. Not surprisingly, strong validity provides a tool for proving that derivations must reduce to a normal form, and is so used by Prawitz in the appendices to "Ideas and Results in Proof Theory." In Appendix A, he considers first-order logic and shows without difficulty that every way of reducing a strongly valid N derivation must terminate. A more complicated argument is required to show that every N derivation is strongly valid. To make it go through, Prawitz employs an inductive definition of strong validity which differs from the one given above. What he wants to say is that a derivation terminating with an introduction is strongly valid when the derivations of the premises of its final inference are, while a derivation terminating in an elimination is strongly valid when every derivation to which it reduces in one step is strongly valid. But he is obliged to stipulate in addition that irreducible derivations are strongly valid to ensure that this notion is well-defined. (Validity is defined similarly except that a derivation which terminates with an elimination is valid when there is some valid derivation to which it reduces in one step.)

On the basis of the ideas explained above, however, there is no justification for assigning a special status to normal derivations. They are no more obviously valid or strongly valid (in the earlier sense) than derivations in general, and making them so by definition is to sacrifice the intuitive content of these notions for the sake of technical expediency. It is in fact the lack of an explanation for the significance of normal forms which makes it difficult to interpret normalization theorems in terms of validity. Nonetheless, Prawitz attempts to do so. For example, he writes, "The significance of the strong normalization theorem is very clear from the present point of view. To carry out a reduction is essentially to replace a definiendum by its definiens. . . . The strong normalization theorem then says that the arguments in question are well-defined in the sense that each way of successively replacing definiendum by definiens will finally terminate."18 It seems much more reasonable, however, to say that an argument is well-defined

17Cf. the argument on page 287 of "Ideas and Results in Proof Theory."
18"On the Idea of a General Proof Theory," page 76.


if applying the definitions in any order will eventually yield a determinate argument not involving any defined operations. In general, this will mean considering not just the argument in question, but also its closed instances, subarguments of those instances, etc. In other words, an argument is well-defined iff it is strongly valid, and the analogue of (2) above for strong validity expresses the claim that every argument is well-defined. Notice that this differs from the statement of the strong normalization theorem and makes no reference to normal forms.

Whatever the interest of these ideas, they do not provide conclusive evidence for the claim in question. In the first place, they depend upon assigning a very narrow interpretation to the logical connectives—narrower than what is usually taken to be the constructive one, and require a treatment of classical arguments which reduces them to particular cases of intuitionistic ones.19 Furthermore, even granted this interpretation, it still does not follow that interreducible derivations represent the same proof. In particular, the notion of validity which Prawitz adopts is not uniquely determined. Given this notion, it is possible to argue (as he does in the quotation above) that the meaning of an elimination can be expressed by its associated reduction step and, a fortiori, that such steps preserve the identity of proofs.

On the other hand, assuming that the reductions have this property, it is possible to justify the definition of validity (as Prawitz does, for example, on page 285 of "Ideas and Results in Proof Theory": ". . . in view of the conjecture about identity between proofs, a derivation that reduces to [a canonical one] . . . shall also be counted as valid."). The interdependence of these ideas, however, makes it unlikely that doubts about the one can be dispelled by an appeal to the other.

All this notwithstanding, Prawitz refers to this half of the conjecture as unproblematic, and his opinion seems to be shared by most commentators. Kreisel, for example, claims that it is evident simply by inspection that such reductions do not change the proof described.20 Apparently, the only dissenting voice belongs to Feferman who, in his review of "Ideas and Results in Proof Theory," objects that information may be lost in the process of reduction. The particular example he considers is a derivation D in arithmetic whose last inference is an application of ∀-elimination with a maximal premise and a closed atomic conclusion A(t). He points out that the normal form of D "will simply give a computation which verifies A(t); in this case every abstract idea in the derivation may be lost."21

If the relationship between derivation and proof is in fact a special case of that between a linguistic expression and its meaning or content, this objection seems undeniable. The two derivations are clearly not just linguistic variants of one another, and to distinguish between information and content in this context would be nothing more than a quibble.

19"On the Idea of a General Proof Theory," page 70.
20"A Survey of Proof Theory II," page 112.
21Journal of Symbolic Logic, Vol. 40, 1977, page 234.

Prawitz's response to this objection is of interest.22 After conceding the point about loss of information, he goes on to observe "that on the view presented here, a proof is . . . the result of applying certain operations to obtain a certain end result," and claims that this makes it difficult to deny the identity of proofs represented by interreducible derivations. According to him, the real issue is whether proofs can indeed be identified with objects of this kind. It might be more appropriate, however, to question whether a proof thought of in this way can be identified with the meaning of the derivation which represents it. I propose to consider this last question below, even though Prawitz does not do so.

There is less to be said about the other half of the conjecture, that any two representations of a proof are interreducible. Prawitz does not even attempt to argue for it and merely refers the reader to Kreisel's "A Survey of Proof Theory II" for a discussion of "the possibility of finding adequacy conditions for an identity criterion such as the one above."23 As one might imagine from this description, the discussion turns out to be somewhat inconclusive. Kreisel endorses the doctrine that derivations are the linguistic expressions of proofs, but he gives it a psychological twist by identifying these latter with mental acts.24 Furthermore, he takes it for granted that all the reduction steps under consideration preserve the identity of proofs. His idea is that, if he can find a "mathematically manageable" condition M(d1, d2) which follows (in some informal sense) from the claim that d1 and d2 represent the same proof, it might be possible to show:

(*) M(d₁, d₂) implies that d₁ and d₂ reduce to the same normal form.

Recall that removing the last inference from a closed normal derivation d of an existential formula, (∃x)A(x), transforms it into a derivation of A(t_d) for some term t_d; this term may be written as t_{d′} when d′ is a derivation whose normal form is d. Also, given a formal system F, an assignment A_F of F terms to parameters and closed F derivations to their conclusions, and an arbitrary derivation d, let A_F(d) be obtained from d by replacing the parameters occurring in its open assumptions and conclusion with F terms, and the open assumptions of the derivation which results with closed F derivations—all in accordance with the assignment A_F. Kreisel restricts his attention to derivations of the predicate calculus with existential conclusions. Let d₁ and d₂ be such derivations (of the same conclusion from

22See pp. 249-250 of "Philosophical Aspects of Proof Theory." 23"Ideas and Results in Proof Theory," page 257. 24Kreisel, op. cit., page 111. According to him, words in general express mental entities—namely, thoughts.

Interpretations of Derivations 161

the same assumptions), then he suggests the following as a possible choice for M(d₁, d₂):

For all extensions F of predicate logic for which normalization holds and all assignments A_F, t_{A_F(d₁)} and t_{A_F(d₂)} define extensionally equal functions (presumably, in all interpretations of the appropriate kind).

It seems obvious that, if d₁ and d₂ represent the same proof, M(d₁, d₂) ought to hold. Unfortunately, however, the converse is not true (as Kreisel himself is quick to point out) and, as a result, (*) fails too. The problem is that we can easily construct an existential statement all of whose derivations will supply the same instantiating term. The particular example Kreisel considers is of the form (∃x)(x = c ∧ P), where P is independent of x.25 Clearly, any derivation d of this statement will be such that t_d = c, no matter how P is obtained.
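The counterexample can be displayed schematically (a hypothetical rendering, not taken from Kreisel's paper; Π₁ and Π₂ stand for two arbitrary, possibly quite different, derivations of P): whichever of them is used, the final ∃-introduction must instantiate x with c.

```latex
% Any derivation of (\exists x)(x = c \wedge P) must end in the same way:
\[
  \frac{c = c \qquad \begin{matrix}\Pi_i\\ P\end{matrix}}
       {c = c \wedge P}
  \qquad\text{and then}\qquad
  \frac{c = c \wedge P}{(\exists x)(x = c \wedge P)}
  \quad (\exists\text{-I with } t := c)
\]
% so the instantiating term t_d is c for every derivation d, however P is proved.
```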

In view of the above, Kreisel describes M(d₁, d₂) as providing only a partial criterion, and takes pains to emphasize that this does not invalidate it. Perhaps his general point, that a partial criterion may be both useful and interesting, is correct, but it is of little relevance to the present case. M(d₁, d₂) applies only to some derivations, but we have no idea how to characterize this subclass; it needs to be supplemented by other conditions to increase its range of application, or to make its limitations explicit, but we have no idea what these might be. (In fact, there is no particular reason to believe that such conditions—i.e., ones which do not supplant M(d₁, d₂), but complement it—exist.) In short, not only does (*) fail to hold, but M(d₁, d₂) seems to have no interesting consequences.

Kreisel too dismisses M(d₁, d₂), albeit on general grounds. For him, (*) has the form of an adequacy condition which the interreducibility relation might satisfy (amongst others) before it can be accepted as characterizing identity between proofs. The example discussed above indicates that it is not applicable to all derivations of existential formulae. What disturbs Kreisel is that, even if (*) could be shown to hold whenever d₁ and d₂

actually represented the same proof, we would have managed only to

evade rather than solve questions about the nature of the identity of proofs or, indeed, about the nature of proofs. For what we propose to show, for specific derivations, is that questions of the adequacy of [interreducibility] can be settled without closer analysis of the concepts involved.26

Again, one may grant his general point but question the emphasis he places on it. No one pretends that, if (*) could be shown to hold, it would provide us with an understanding of the concepts involved sufficient to render

25Kreisel, op. cit., page 116. 26Kreisel, op. cit., page 117.


further analysis unnecessary; on the other hand, for almost any reasonable choice of M(d₁, d₂), such a result would be of interest and would at least advance our understanding of them. It seems perverse to criticize M(d₁, d₂), not because it fails to coincide with any interesting relation between derivations or proofs, but because such a relation would not be fully explained even if it did.

Having reached this negative conclusion, Kreisel claims only a pedagogic value for his discussion. Apparently, our ability to recognize the shortcomings of his proposed criterion serves as a lesson to those who assume "that 'nothing' precise (and reasonable) can be done on questions about synonymity of proofs."27 Whatever contribution may have been made to pedagogy, however, no new evidence has been adduced in favor of Prawitz's conjecture. The net result of our efforts to discover such evidence has in fact been rather disappointing. The analogy between derivations and terms, which suggested the conjecture in the first place, remains the most persuasive argument in favor of its first part, whereas its second part has not been substantiated in any way. We do not even know where to look for evidence which might support this latter. Before drawing any conclusions from this state of affairs, I will consider one other attempt to vindicate Prawitz's conjecture, albeit at the cost of reinterpreting it. The attempt is made by Martin-Löf in "About models for intuitionistic type theories and the notion of definitional equality."28

As its title suggests, much of this paper is devoted to the analysis of a relation which Martin-Löf calls definitional equality. According to him, it "is used on almost every page of an informal mathematical text" (page 93), may reasonably be identified with Frege's equality of content (page 104) and enters into the definitional schemata of the primitive recursive functionals of Gödel's theory T (pages 105-6). All of these claims may be called into question, however, as may the mathematical interest of this relation. Martin-Löf distinguishes between informal notions and their formal counterparts, between, for example, propositions and formulae, proofs and derivations, and mathematical objects and terms. Corresponding to the formal relation of convertibility (he calls the symmetric closure of the reduction relation between terms by this name) is the informal one of definitional equality.

It might be thought that the distinction between informal and formal corresponds to that between abstract objects and their linguistic representations, but this is apparently not the case. Definitional equality is explicitly described as a "relation between linguistic expressions and not between the abstract entities they denote and which are the same." (page 93). As

27Ibid. 28Proceedings of the Third Scandinavian Logic Symposium, ed. by S. Kanger, Amsterdam, 1975, pp. 81-109.


for the other informal notions, they too seem to be regarded as linguistic items. Martin-Lof asserts, for example, that definitionally equal propositions are notational variants of the same abstract proposition (page 94) and contrasts proofs thought of as linguistic expressions with abstract proofs (page 104). His usage suggest that, while terms like "proof" and "proposition" may refer either to linguistic or to abstract objects, he intends the former when using them without qualification. It seems to follow that mathematical objects must also be expressions of an informal language. This is consistent with Martin-Lof's account of models which occupies the first half of the paper, although it is not clear whether he really wants to say that a model is a linguistic structure.29

Although a conception of mathematics according to which its objects are certain expressions of an informal language seems sufficiently original to deserve a new name—informalism suggests itself as a suitable candidate—Martin-Löf associates it with the intuitionistic tradition. (He never discusses how, if at all, such a conception can be reconciled with Brouwer's oft-repeated insistence that mathematics "neither in its origin nor in the essence of its method has anything to do with language."30) A connection between his views and other attempts to explicate intuitionism becomes apparent once it is realized that Martin-Löf's interest in informal languages derives exclusively from their being interpreted. This enables him to talk about the expressions of such languages where other authors would talk about meanings in the abstract.

Whether his preference for the former idiom over the latter is motivated by philosophical or pragmatic considerations is never made clear, but he does exploit it for his own advantage, most notably in the characterization of definitional equality. Since informal languages, unlike formal ones, come ready interpreted, each informal notion can be used to give meaning to the corresponding formal one. In other words, the distinction between formal and informal, although entirely on the level of language, can be correlated with that between (formal) syntax and semantics. Definitional equality is characterized as the least equivalence relation containing certain equations between expressions called definitions and closed under the following rule:

29The alternative is to regard it as an abstract one, but to interpret formal theories in terms of the (meta-)linguistic description of the model, rather than in the model itself. There is perhaps no essential difference between these two approaches. In both cases we are presented with a formal language, a system of abstract objects, and a description of the latter—in English, let us say. The formal language is given meaning by correlating its well-formed expressions with certain expressions of English and thus, indirectly, with the abstract notions they describe. Whether we reserve the term "model" for the abstract system or apply it to the fragment of English which represents this system seems to be just a terminological matter. 30The quotation is taken from a paper entitled "The Effect of Intuitionism on Classical Algebra of Logic" (Proceedings of the Royal Irish Academy, A 57, 1955), but the same sentiment is expressed by Brouwer in countless other places.


(†) If a and b are definitionally equal and c(x) is any expression, then c(a) and c(b) are also definitionally equal.

(Here "a" and "b" are supposed to denote expressions while "c(a)" denotes the result of substituting the expression a for free occurrences of the variable x in c(x).) No explanation is offered as to what constitutes a definition, and the remaining conditions are justified by an argument to the effect that they are the minimal ones needed if definitional equality is to serve Martin-Löf's purposes. A noteworthy feature of this relation is that it is not closed under the condition:

(‡) If a and b are definitionally equal, so are λx.a and λx.b.

Martin-Löf's formulation of definitional equality glosses over the difficulties associated with this relation. In the first place, it is not clear how definitions are to be distinguished from other equations. Implicit in his discussion is the view that they are more than a means of introducing notation. (Cf. for example, the introduction of b(a₁, . . . , aₙ) on page 85.) On the other hand, for two expressions to be definitionally equal it is not sufficient that they denote the same object. (The rejection of (‡), if nothing else, makes this clear.) The tacit understanding is that the expressions on either side of a definition have the same meaning, but this is nowhere stated explicitly. The reason most likely is that Martin-Löf wishes to avoid that quagmire of philosophical discussion having to do with synonymy and the nature of meaning. His attitude seems to be that, since we use definitions in informal contexts, we must have a sufficient understanding of what they are. By formulating definitional equality as a purely syntactic relation we can avoid the need to analyze this understanding; it is enough to chronicle our usage.

A second point concerns the closure conditions on definitional equality. If this relation is supposed to be something like sameness of meaning, it must be an equivalence and may plausibly be claimed to satisfy (†). What is less obvious is that these are the only conditions it need fulfill. Martin-Löf treats the question of which operations preserve definitional equality purely as a technical one. He therefore feels entitled to reject (‡) on the basis of the principle of sufficient reason. This seems innocuous enough so long as we think only in syntactic terms. Consider however that, by doing so, he is in effect maintaining the following:

"the result of applying the operation γ to α" may not mean the same as "the result of applying the operation γ to β" even though "α" and "β" have the same meaning.

Perhaps this is true, but it is rather a counterintuitive claim which is not supported by anything in our informal experience. Furthermore, there seems to be no way to substantiate it other than by showing it to follow from some convincing account of meaning. (A satisfactory determination


of its status is of some importance since one of Martin-Löf's major points is that the formal analogue of (‡) should not be part of the definition of convertibility.) Martin-Löf pretends that definitional equality is a familiar relation, and hence not in need of explanation or justification. This does not, however, prevent him from insisting that it has particular properties which are obviously left undecided by our intuition.

According to Martin-Löf, definitional equality is the standard or intended interpretation of the convertibility relation. This means, in the case of the theory of combinators, that a privileged position is assigned to weak equality and hence to weak reduction. In the case of the λ-calculus, the "correct" reduction relation is the image of weak reduction under the usual mapping between combinatory and λ-terms, which is of course weaker than λβ-conversion.31 The implications of the above for natural deduction are that the permutative reductions, immediate simplifications and expansions cannot be part of the convertibility relation which correctly formalizes definitional equality between proofs (pages 100-1). More importantly, the proper reduction steps must be weakened so as to apply only to the final inferences of open subderivations (page 96), where an open subderivation of d is one such that none of its open assumptions are closed in d.
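The contrast between weak reduction and full β-reduction can be made concrete in an executable sketch (an illustrative toy with invented names, not code from Martin-Löf's paper): the weak normalizer never reduces inside the body of an abstraction, so a term like λx.((λy.y) x) already counts as normal, although full β-normalization takes it to λx.x.

```python
# Toy untyped lambda-calculus terms; all names here are invented for
# illustration, and capture-avoiding substitution is elided (the example
# below uses distinct variable names, so naive substitution suffices).
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def subst(t, x, s):
    # Replace free occurrences of variable x in t by the term s.
    if isinstance(t, Var):
        return s if t.name == x else t
    if isinstance(t, Lam):
        return t if t.param == x else Lam(t.param, subst(t.body, x, s))
    return App(subst(t.fun, x, s), subst(t.arg, x, s))

def weak_normalize(t):
    # Weak reduction: contract redexes, but never inside a lambda body.
    if isinstance(t, App):
        f = weak_normalize(t.fun)
        if isinstance(f, Lam):
            return weak_normalize(subst(f.body, f.param, t.arg))
        return App(f, weak_normalize(t.arg))
    return t  # variables and abstractions are weak normal forms

def full_normalize(t):
    # Ordinary beta-normalization: also reduces under abstractions.
    if isinstance(t, Lam):
        return Lam(t.param, full_normalize(t.body))
    if isinstance(t, App):
        f = full_normalize(t.fun)
        if isinstance(f, Lam):
            return full_normalize(subst(f.body, f.param, t.arg))
        return App(f, full_normalize(t.arg))
    return t

# \x.((\y.y) x) is weak-normal, but beta-normalizes to \x.x.
term = Lam("x", App(Lam("y", Var("y")), Var("x")))
```

Here `weak_normalize(term)` returns the term unchanged while `full_normalize(term)` yields `Lam("x", Var("x"))`, mirroring the way the restricted proper reductions leave certain derivations irreducible that the usual reduction steps would still transform.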

We are now in a position to appreciate Martin-Löf's discussion of Prawitz's conjecture. In his opinion it admits of two possible interpretations. According to the first, proofs are linguistic objects whose sameness is a matter of definitional equality. If we accept the foregoing analysis, the conjecture becomes true on this interpretation once the convertibility relation is restricted in the manner indicated above. According to the second interpretation, proofs are abstract objects and to say they are the same means simply that they are identical. Martin-Löf's comment here is that "there seems to be little hope of proving the conjecture in this form unless identical is replaced by provably identical" (page 104). As for the relationship between convertibility and provable identity, he believes it to be settled by his own earlier discussion in which he argues that, for a variety of theories, the former implies the latter but not vice versa.32

31For an explanation of these notions, see for example Introduction to Combinators and λ-Calculus by J. R. Hindley and J. P. Seldin, Cambridge, 1986. 32Reduction rules were introduced originally as a means of analyzing equality. It is scarcely surprising therefore that convertibility, on any definition of this relation, implies equality in all formal contexts. The converse does not hold in general for convertibility as understood here. It is relatively easy to find a theory T and terms t₁, t₂ such that T ⊢ t₁ = t₂ even though t₁ does not convert to t₂. The particular counterexample Martin-Löf exhibits is a term of the form ⟨p(z), q(z)⟩ (where p and q are the left and right projection functions, respectively, and z is a variable—all of the appropriate types) in a type theory which, while not including the rule: ⟨p(z), q(z)⟩ reduces to z, contains axioms of the form

∀x_σ ∀y_τ A(⟨x, y⟩) → ∀z_{σ×τ} A(z)

These together with the identity axioms and the usual defining equations for p and q


Despite their definiteness, there is something unsatisfactory about these conclusions. In the first place, the interpretation which is dismissed on the grounds that it makes the conjecture unprovable is probably the intended one—and certainly the most interesting. Furthermore, a statement may be worth investigating even though it does not admit of direct proof. In the present case such an investigation might actually involve establishing the relationship between convertibility and identity in some formal theory, provided that the theory in question could plausibly be claimed to capture significant properties of the identity relation between proofs. (The result would be of doubtful interest otherwise.) This claim, however, is no more likely to be provable than the original conjecture. It seems overly optimistic, therefore, to suppose that the status of the conjecture depends solely upon the answer to a technical question—especially because, once the meaning of identity between (abstract) proofs has been settled, there still remains the definition of convertibility. Martin-Löf's characterization of this latter relation is certainly the most controversial aspect of his treatment. Acceptance of it would oblige us to revise our most basic ideas about normal forms and their significance—most notably, the idea that normal derivations have to be direct. A derivation which is irreducible in Martin-Löf's sense may lack the subformula property. In fact, no bound (expressed in terms of the complexity of its assumptions and conclusions) can be placed on the complexity of the formulae occurring in such a derivation.33

enable us to derive ⟨p(z), q(z)⟩ = z. Because the terms on each side of this equation are irreducible, however, we cannot prove that one converts to the other.

The situation for closed terms is slightly different. As Martin-Löf points out, a closed derivation of an equation will usually yield a means of converting its terms into one another. So, we can at least state a partial converse of the above, namely:

t₁ converts to t₂ if there is a closed derivation of t₁ = t₂

for a range of familiar theories satisfying the normalization theorem. The preceding settles the relationship between provable identity and convertibility

only if one accepts that Martin-Löf's characterization of the latter is, as he asserts, the correct one.

33Consider the following derivation: from the assumption [A] one obtains, by repeated applications of →-introduction, a formula A→(A→ . . . (A→A) . . . ) of arbitrary complexity; successive applications of →-elimination to this formula and further occurrences of [A] then lead back down through A→A to A, from which the conclusion A→A is recovered. (All occurrences of the assumption A are discharged by the final application of →-


As pointed out earlier, we cannot afford to be too dogmatic about what constitutes the correct definition of interreducibility, but so unacceptable a consequence makes it tempting to reject Martin-Löf's candidate out of hand; whatever applications his notion of convertibility may possess, it is surely not the relation we are interested in analyzing. At the very least, we should be cautious about accepting it and subject the arguments offered in its favor to careful scrutiny.

There are only two such arguments, and one of them is not really relevant to the purpose at hand. The first is that, by weakening the definition of convertibility in the manner suggested above, certain technical advantages are obtained.34 The weaker relation is certainly more manageable (cf. the case of weak vs. strong reduction in combinatory logic), although it is not without its drawbacks too. Even if we grant the point, however, it establishes only that the relation defined by Martin-Löf can be useful—and this is not in dispute. As he himself writes (on page 96): "we are free to define many different relations between terms and call them convertibility relations." He then adds: "but my claim is that only one of these correctly formalizes the [intended interpretation of convertibility]." I am interested in his contention that he has supplied the correct formalization. This brings me to the second argument:

(a) The intended interpretation of convertibility is a relation between linguistic expressions called definitional equality.

(b) Definitional equality is the least equivalence containing various defining equations and closed under (†) above.

(c) It follows that the correct definition of convertibility simply formalizes the properties mentioned in (b) (and that is exactly what Martin-Löf's does).

Once (a) and (b) are accepted, there is obviously no denying (c). It would appear therefore that, no matter which interpretation of the conjecture is favored, Martin-Löf's remarks about it depend ultimately on his conception of definitional equality. As was hinted earlier, however, he seems to think that this relation is a familiar one whose role needs no explanation and whose properties, once stated, are easily recognizable as such. Consequently, he does not embark on a systematic defense of (a) or (b). In

introduction.) This example depends upon the fact that its conclusion is introduced by an application of →-introduction. It is the only rule which will produce such a result in DT although similar examples can be constructed in NJ and NK using ∨- or ∃-elimination. (A notion of normal form which is sensitive to differences of this kind is perhaps not entirely satisfactory.) Notice, however, that the subformula property will hold for certain subclasses of derivations—for example, derivations of atomic formulae in the negative fragment—so that many applications of normalization arguments will not be much affected by this weakening of the reduction relation. 34Some of these are listed in Section 2.1 of Martin-Löf's paper, op. cit.


view of the importance of definitional equality to the discussion, it is worth considering briefly whether he is justified in treating it in this way.

Martin-Löf claims that by definitional equality he means "the relation which is used on almost every page of an informal mathematical text." He also claims that it is to be found in the writings of such authors as Frege and Gödel. None of these claims will bear much scrutiny, however. It seems obvious that definitions are used informally in more than one way, and it may plausibly be argued that in one of these usages they express a relationship between signs. Nonetheless, whatever relation R holds between definiens and definiendum, R is not an equivalence. (It is probably not reflexive, symmetric or transitive, let alone all three.) In the second place, it is not closed under (†) above. (This is not to deny that the inference c(a) = c(b) is often drawn from a =df. b, but this is simply to justify the conclusion that c(a) and c(b) are the same by reference to the definition of a. It certainly does not suggest that there is some special relationship between "c(a)" and "c(b).")

The evidence for these assertions is contained in any "informal mathematical text," where the reader will be hard put to find statements like "b =df. a because a =df. b" or "since a =df. b, it follows that c(a) =df. c(b)." This sort of nitpicking about usage does not really get us very far. It certainly provides no argument against the possibility, or even the desirability, of introducing a relation like definitional equality. On the other hand, it is surely sufficient to establish that, far from playing the central role attributed to it by Martin-Löf, this relation is not to be found in mathematics as it is currently written.

Turning now to Frege, we find according to Martin-Löf that definitional equality may be identified with the relation of equality of content found in the Begriffsschrift (and symbolized there by "≡") "provided one disregards the geometrical example" with the aid of which it is introduced (in Section 8). He goes on to say that Frege's axiomatization of ≡ (in Sections 20 and 21) is not "compatible with the analysis of the relation given earlier" (presumably, earlier in the Begriffsschrift). The reason is that Frege gives the familiar identity axioms for ≡, and Martin-Löf contends that, because "a" and "b" stand for themselves in some occurrences and for their contents in others, statements like a ≡ b → (A(a) → A(b)) are meaningless. He concludes that this led Frege to replace ≡ with the more familiar equality relation analyzed in "Über Sinn und Bedeutung" and the Grundgesetze. The implication is that Frege gave the wrong axioms for the notion he was trying to capture, and that the consequences of this mistake led him to abandon it altogether.

Unfortunately, none of this is in the least plausible. It is obvious from the outset that Frege's identity of content resembles Martin-Löf's definitional equality in only one respect: it too is a relation between expressions. As the example we are asked to disregard makes clear, it is the relation


which holds between two terms when they denote the same thing. The axioms presented in Sections 20 and 21 are, of course, valid under this interpretation. Furthermore, Frege seems not to have been unduly disturbed by the need to use names in a systematically ambiguous way. (Granted that this usage offends the ears of formal language speakers nowadays, it surely does not reduce the axiom quoted in the preceding paragraph to meaninglessness.) It is unlikely, therefore, that this impelled him to revise his treatment of identity. A more plausible suggestion is that, finding himself unable to distinguish between more than two judgable contents, he abandoned the notion of content altogether in favor of the doctrine of sense and reference. He then found it convenient to reformulate his treatment in terms of this distinction. The point to stress is that no great discontinuity exists between the earlier and the later accounts of identity. In fact, identity judgements are analyzed in remarkably similar terms in Begriffsschrift and "Über Sinn und Bedeutung."35

The geometrical example of a single point determined in two distinct ways which appears in both works, albeit in slightly different guises, makes these similarities particularly evident. Frege takes pains to stress the connection between names and ways of determining. In Begriffsschrift he writes that the different "ways of determination" correspond to the "different names of the thing thus determined," while in "Über Sinn und Bedeutung" he speaks of "different designations of the same point" and states that these names "indicate the mode of presentation."36 It is this connection which saves judgements of identity from triviality. The difference between the two works is that in the later one he grants these modes of presentation an existence apart from the names to which they correspond. Of course, he also reformulates his account of identity statements so that the relation asserted to hold is between the individuals named rather than the names themselves, but this seems to be of less interest. I am not denying the obvious formal differences between these two sorts of relation, but from most perspectives—including Frege's, I suspect—it matters little whether "a = b" is taken to mean that "a" and "b" name the same individual, or that the individual named by "a" is the same one named by "b."

Frege nowhere suggests that its way of determining is part of the content of a name. On the contrary, his analysis precludes such a possibility since it is only when the same content is determined in different ways that a non-trivial identity judgement can be made. For singular terms, at least, content can be identified with what Frege later called reference. This much seems uncontroversial and is sufficient to refute Martin-Löf's interpretation

35For this observation, and for much else in this paragraph, I am indebted to the interesting discussion of Frege's views in Chapter 4 of An Essay on Facts by K.R. Olson (Stanford, 1987). 36The phrases in quotation marks are those used by Geach and Black in their Translations from the Philosophical Writings of Gottlob Frege (3rd edition, Oxford, 1980).


of identity of content, even if one prefers his account of why Frege eventually revised the Begriffsschrift theory.

When Frege turns to definitions in Section 24 of Begriffsschrift, he distinguishes them from identity judgements in the previous sense not because they are concerned with a different kind of relation—they are not—but because, being prescriptive, they are not to be regarded as judgements at all. Baker and Hacker comment on this section: "If a symbol is introduced by a formal definition, the fact that it designates an entity in a particular way . . . seems to be an altogether objective feature of it, and hence there seems pressure towards adopting the principle that in this special case the way of regarding (or the mode of determining) an entity is part of its content. Frege did not draw this conclusion."37 So, there is no comfort for Martin-Löf here, either. Somewhat ironically, his ideas are more easily reconciled with Frege's later views. It is only after the mode of determining has been separated from the name that it makes sense to ask whether two names determine an object in the same way, and I take this to be the question underlying his conception of definitional equality. In the final analysis, however, nothing approximating to this relation is to be found in Frege's writings.

It only remains to consider what Gödel has to say about definitional equality and, in particular, whether it plays any role in his Dialectica paper cited earlier. The paper describes a translation of intuitionistic arithmetic into a system T of computable functionals of finite type. Roughly speaking, T comprises:

(i) certain equational axioms for defining these functions,
(ii) the principle of proof by induction (with respect to a numerical variable),
(iii) the usual axioms for identity, and
(iv) the propositional calculus—including axioms of the form (s = t) ∨ (s ≠ t) for all terms s and t.
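Clause (i) can be illustrated with a toy computation (an invented encoding for illustration, not Gödel's own formalism): if addition is introduced by the defining equations add(s, 0) = s and add(s, S(t)) = S(add(s, t)), then two closed terms count as definitionally equal, in the conversion-rule sense discussed below, when unfolding these equations brings them to a common normal form.

```python
# Numerals as first-order terms: ZERO is ("0",) and S(t) is ("S", t).
# This encoding is hypothetical and chosen only for illustration.
ZERO = ("0",)

def S(t):
    return ("S", t)

def add(s, t):
    # Unfold the defining equations:
    #   add(s, 0)    = s
    #   add(s, S(t)) = S(add(s, t))
    if t == ZERO:
        return s
    return S(add(s, t[1]))

two = S(S(ZERO))
three = S(S(S(ZERO)))
# add(two, three) and add(three, two) unfold to the same normal form,
# S(S(S(S(S(ZERO))))), and so are definitionally equal in this sense.
```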

Martin-Löf contends that 'equality' in (i) is not the relation which is described in (iii) or appears in an equation obtained with the aid of (ii). His reason is that he takes the former to be definitional equality, whereas "we cannot convince ourselves [of the validity of (ii) or (iii)] unless, when reading the formulae, we associate with the terms not themselves but the abstract objects which they denote." (page 106). In fact, he upbraids Gödel for remarking in a footnote to the identity axioms that this relation is to be understood as "intensional definitional equality."38

37Frege: Logical Excavations by G.P. Baker and P. Hacker, New York, 1984, page 160. 38The reason for the remark is that from a constructive viewpoint (extensional) equality between functions of higher type is not a decidable relation. The axioms mentioned in (iv) are essential for the translation, however, so "=" must be given some other interpretation if T is to be constructively acceptable.

Rather surprisingly, this remark probably gave rise to the idea of definitional equality as a relation between terms which is determined by their conversion rules. Tait, while discussing the interpretation of equality in (certain extensions of) T, observes that according to Gödel's own interpretation: "s = t means that s and t denote definitionally equal reckonable terms." (The term he translates as "reckonable" is rendered by "computable" above.) He goes on to say: "Lacking a general conception of the kinds of definitions by which an operation may be introduced, the notion of definitional equality is not very clear to me. But if . . . we can regard the operations . . . as being introduced by conversion rules . . . then definitional equality has a clear meaning: s and t are definitionally equal if they reduce to a common term by means of a sequence of applications of the conversion rules."39 This is essentially Martin-Löf's view, except that he is less diffident about the general notion of definitional equality.40 There is a difference, however. For the reasons alluded to earlier, Martin-Löf regards definitional equality, not as a possible interpretation for identity in T (or any other theory), but as a relation satisfying different laws.

Despite what Tait writes, it is doubtful whether Gödel really intended s = t to express a relationship between terms. If he had, it is difficult to understand why he regarded T as constituting an extension of finitary mathematics. According to his own account:

Bernays' observations . . . teach us to distinguish two component parts in the concept of finitary mathematics, namely: first, the constructivistic element, which consists in admitting reference to mathematical objects or facts only in the sense that they can be exhibited or obtained by construction or proof; second, the specifically finitistic element, which requires in addition that the objects and facts considered should be given in concrete mathematical intuition. This, as far as the objects are concerned, means that they must be finite space-time configurations of elements whose nature is irrelevant except for equality or difference. . . .

39 Tait, op. cit., page 198.
40 He is no doubt encouraged in this by Kreisel who makes light of Tait's misgivings in a review of the latter's paper (in Zentralblatt für Mathematik, Vol. 174, 1969, pages 12-13). Kreisel attributes them to the view—mistaken, in his opinion—that to make sense of definitional equality for constructive operations it is necessary to have a listing of their possible arguments. (He makes the same point in "A Survey of Proof Theory II," page 156: "Tait expressed doubts in [1967] about the sense of definitional equality t = t' unless all possible arguments of t and t' are listed.") This criticism seems misplaced, however. The passage quoted above points out only that the notion of definitional equality depends upon what constitutes a definition, and that we lack a general answer to this question. It is hard to find fault with this observation. (Presumably, even definitional equality between number-theoretic functions is not very clear to Tait, although there can be no doubt about their possible arguments.)

172 Normalization, Cut-Elimination and the Theory of Proofs

It is the second requirement which must be dropped [in the face of negative results about the provability of consistency].41

He goes on to say that his theory T is one result of doing so. Now, the terms of T conform to the finitistic description of objects; the conversion rules which are supposed to settle questions of their identity and difference are finite combinatorial operations, and the theorems of T are just (propositional combinations of) equations of the form s = t. The intended interpretation of T, i.e., what its theorems are supposed to be about, must therefore lie outside this domain. In fact, it is clearly stated that T is a theory of certain abstract objects, the computable functionals of finite type; this granted, equations in T must surely express a relation between these abstract objects.

It should be apparent by now that there is little in Gödel's paper to support Martin-Löf's analysis of definitional equality. In fact, I have tried to argue that there is little reason to accept it at all. Few arguments, and no compelling ones, have been advanced in favor of the thesis that the intended interpretation of convertibility is a relation between linguistic expressions, and the same can be said of his claim that, because its intended interpretation does not satisfy the laws of identity, the definition of convertibility must be weakened in the manner indicated above. Furthermore, even if one accepts definitional equality (in the sense of Martin-Löf) as the correct interpretation of convertibility, it seems to me that this is quite simply a case of explaining obscurum per obscurius: Martin-Löf succeeds neither in establishing that the former is a familiar relation, nor in explaining its significance. These strictures notwithstanding, Martin-Löf's discussion of identity between proofs does focus on the central questions: where proofs are to be located in the scheme of things, and what sort of equivalence relations might hold between them. That these issues should be contentious at all is due, in part at least, to the view that proofs are intensional and that a relation other than the familiar one of extensional equality holds between them.

The notion of intensionality is a notoriously problematic one. It appears under a variety of different names, and the term "intensional" has been used to express a variety of different distinctions.42 This is not the place to survey or evaluate these usages, but I do think it worth distinguishing between two of them which, although not independent of each other, can be separated. In so doing, I do not mean to suggest that these two are the only ways to understand the term or, least of all, that they are the only correct ones. My claim is simply that elements of both are present in discussions about proofs and that some advantage is to be gained from distinguishing between them. The first sense is easy to explain: intensional means not satisfying an extensionality principle of the appropriate kind. The second is harder to make precise, so I propose to give only a rough idea of what I have in mind. A distinction can be drawn between some domain of elements and a system which describes or represents it. The domain can be anything from the universe in which we find ourselves to a mathematical structure.

As for the representational system, it is usually thought of nowadays in linguistic terms, and I shall follow this practice below, but it need not be. Ideas, for example, could serve just as well, and would have done so in another age. I am less concerned with the exact nature of these two items than with the relationship between them. To explain this, i.e., how language can refer to the elements of the domain or, alternatively, how these can be grasped in linguistic terms, an intermediate realm is sometimes postulated which consists of 'ways of determination' or 'modes of presentation'—to borrow a pair of phrases from Frege. In its second sense the word "intensional" characterizes the denizens of this realm. I prefer this usage and propose to adopt it below, reserving "non-extensional" for the first sense.43

41 The quotation is from a revised English version of Gödel's Dialectica paper (see Vol. II of his Collected Works, page 274), but a similar passage occurs in the original (ibid., page 244, or Dialectica, Vol. 12, page 282).

It is unclear why Tait interprets s = t to mean that s and t denote definitionally equal terms. Three or four lines earlier he suggests that s and t denote functionals. These statements are hard to reconcile unless terms are supposed to be used ambiguously in the manner recommended by Martin-Löf. There is no mention of this, however, which makes it seem an unlikely possibility. Perhaps it is simply a slip on his part, occasioned by the existence of term models for T or his reservations about the notion of definitional equality.

42 This is well illustrated by the appendix to Fr. Frisch's essay Extension and Comprehension in Logic (New York, 1969) in which he lists one hundred and seventy-eight modern (i.e., since 1662) logicians, the terms they use to express the distinction between intension and extension, and what they understand it to be. The remarkable feature of this list is not that there are similarities between its entries, but that scarcely any two of them are exactly alike.

43 I am quite comfortable with the division into objective (i.e., pertaining to the objects of our interest—not, as opposed to subjective), intensional and linguistic spheres. I realize, however, that there are those who deny the independent existence of one or more of them and others who deny their existence outright. Although this presents a potential source of problems, I hope that the discussion which follows will be acceptable even to those who reject my metaphysical prejudices. All it depends upon really is the claim that we can draw some kind of distinction between these three elements of our experience. Whether it is a distinction in re or merely in intellectu need not be decided here.

Kreisel respects none of the distinctions drawn above. His conception of intensionality is patterned after the paradigm of a predicate or formula which represents a property and its extension. Here, formal systems play the role of predicates. The proofs expressed by such a system can be compared to the property, while the conclusions established by these proofs are analogous to its extension. In case the system generates arguments from assumptions, comparison with a functional term is more appropriate:


its consequence relation is analogous to a function in extension, while its proofs correspond to the procedure by which values are obtained from arguments. For example, Prawitz, who shares this general outlook, comments on Frege's rules for the predicate calculus that they "may be understood as an extensional characterization of logical proofs (i.e., a characterization with respect to the set of theorems) within certain languages . . . but the characterization is only extensional since the formal derivation[s] may use quite different methods of proof and have a structure different from the intuitive proof[s]."44

It is not just the system as a whole which may have an intensional feature however. Individual proofs may also be described as intensional objects. Kreisel, for example, at the beginning of his review of The Collected Papers of Gerhard Gentzen refers to them as such when emphasizing that "the distinction between different formal systems with the same set of theorems, in terms of the proofs expressed by their derivations, is meaningful," and he goes on to note: "I use 'express' for the relation between a formal expression E and the intensional object meant by E, and 'denote' for the case when we suppress the intensional features of the object, for example, in model theory."45

It seems then that there are intensional objects but these are continuous with extensional ones—the latter being simply the former minus some of their characteristics. (This is a sort of opposite to the view, espoused by Quine amongst others, that a property is a set with something added.) In the present case, presumably, what the object asserts is an extensional feature, how this assertion is established an intensional one. This view may reasonably be described therefore, in the terms used earlier, as one which associates intensions with how objects are presented and, at the same time, insists that objects cannot be separated from their mode of presentation.

As for the criterion of what constitutes an intensional feature, this is provided by a principle of extensionality. So, for Kreisel, there is no difference between intensional and non-extensional. What form such a principle should take is assumed to be uncontroversial. This assumption—although almost universal—is, I think, unfortunate. Consider, for example, proofs as conceived above. Once their intensional features are suppressed, we are left with the conclusions they assert (on the basis of their assumptions).

The extensionality principle underlying Kreisel's analysis seems, therefore, to be one derived from regarding the consequence relation as the analogue of a function in extension, namely: two proofs are extensionally equal if they have the same assumptions and conclusion. Now, this is certainly a possible criterion, but it is not the only one. In fact, it even conflicts with the obvious extensionality principle for proofs, when each one of them is regarded as a function along the lines suggested by the analogy with λ-terms.46

44 "Ideas and Results in Proof Theory," page 238.
45 Journal of Philosophy, Vol. 68, 1971, page 243.

Intensionality appears in another guise too, associated now with the way in which an expression determines (rather than with the mode of presentation of) an object. Somewhat confusingly, therefore, intensional objects seem to be continuous not only with extensional ones, but also with linguistic representations. For example, after an analysis of Gödel's Second Theorem, Kreisel remarks that "not only deductions treated as extensional objects are relevant here . . . but even additional information or 'structure', namely the sequence of operations involved in building up the deductions."47

It is true that the features of a derivation which, by implication, do not count as extensional are here associated with how it is constructed (i.e., with its mode of presentation) but, thought of in this way, the distinction is purely conventional and arbitrary. I raised this same point earlier in connection with the elimination rules for ∨ and ∃, and when comparing L with N rules. The fact is that formal derivations always bear traces of their construction; the extent to which they do so varies from calculus to calculus, and even from rule to rule within a calculus. It only makes sense to distinguish between extensional and intensional features in this context if derivations are thought of not merely as formal objects, but as expressions. Their extensional features are sufficient to determine what they express, their intensional ones indicate how. This does indeed seem to be the point of the remark quoted above, namely, that for the theorem to hold it matters not only what proof a derivation expresses, but also how (as "the proof X," for example, or as "the proof X than which there is no earlier proof of the negation of its conclusion in some fixed listing").

The intensional features described in the preceding paragraph differ from those discussed earlier. Proofs were originally said to be intensional because they mediate between derivations and assertions; now there are intensional elements which mediate between derivations and proofs. Perhaps these too deserve to be called proofs. Prawitz seems to encourage this usage when he implies that derivations which represent the same proof are synonymous.48 Certainly, it underlies the distinction drawn by Martin-Löf between (linguistic) proofs and abstract ones. His point, apparently, is that a proof can be a certain kind of abstract object or a way of determining (or defining) such an object. He has little to say about what constitutes

46 The division of an object's characteristics into extensional and intensional ones is oddly reminiscent of the traditional distinction between primary and secondary qualities. The former inhere in the object itself, the latter have to do with how it appears to us. The problem here is that it is not altogether obvious where the line is to be drawn.
47 "A Survey of Proof Theory II," page 179.
48 "Ideas and Results in Proof Theory," page 237.


identity between proofs in the first sense. As for the second, identity does not mean determining the same object, but determining it in the same way; hence it is a matter of definitional equality. In the terms I introduced earlier, proofs in the second sense are intensional objects. Unfortunately, Martin-Löf, for reasons which he never makes explicit, treats the intensional as a subdivision of the linguistic. As a result, he refers to them as expressions of an informal language.

What emerges from the above is, I think, not only that "intensional" and "extensional" can each be used in different ways, but also that they are so used—sometimes by the same author in a single work. This is relevant to our inquiry because, as I suggested earlier, the language of extensions and intensions provides the framework within which the most basic and general questions about the nature of proofs are to be settled. I do not wish to imply that any of the authors whose views I have been discussing are confused in their use of these terms, but they are perhaps a little confusing. I have tried to indicate above, in a reasonably unambiguous manner, what I understand by "intensional" and "extensional." It remains for me now to classify proofs according to the concepts so introduced.

My proposal is that we should not regard them as intensional objects at all, but simply as the denotations of derivations. This does not rule out a study of the relationship between proofs and what they establish, but it does affect how the relation is to be described. It is also not intended as a comment upon the possibility or the interest of a study of how derivations denote proofs. Contrary to what Martin-Löf suggests, however, I believe that, if such a study is to be successfully undertaken, it can only be after we have gained a better understanding of proofs themselves (including their identity criteria). We would be ill-advised, therefore, to concentrate upon it at the outset. As for the question of whether or not proofs are extensional, and even what constitutes an appropriate criterion of extensionality for proofs, these are matters for investigation. Certainly, we cannot simply assume that they are non-extensional.49

49 Those who are made uncomfortable by talk of intensional objects can interpret my proposal simply as a methodological principle (or even, less kindly, as a terminological one). It amounts to no more than the claim that the objects of interest in a particular field of inquiry should be separated from the way in which they are presented. If it should happen that we are interested in a domain O of objects which comprises the members of another domain O' together with their modes of presentation R, we should be especially careful not to confuse the members of R with the manner in which we represent the members of O.

I realize that, even interpreted in this way, the principle is not philosophically neutral. In fact, it contradicts a famous view of the foundations of mathematics. Its rejection, however, seems to lead almost inevitably to obscurity and confusion. For what it is worth, I think that even writers from the intuitionistic standpoint adopt it de facto. Because it violates their principles, however, they do so neither very explicitly nor always consistently.


There are immediate benefits to be obtained from adopting this proposal.

(1) There is a gain in clarity. Viewed in the way suggested above, the formal study of proofs takes place within a well-developed conceptual framework, that of model theory. We are interested in models of certain formal systems whose terms are derivations. Our ideas about their intended interpretations are, perhaps, not as precise as we would like. But, as I argued in the introduction, we may hope to clarify them by investigating what interpretations they will in fact admit.

(2) I remarked at the beginning of this chapter on the possibility that the analogy between derivations and λ-terms might be incompatible with that between derivations representing the same proof and sentences having the same meaning. It clearly is if meanings are supposed to be intensional. From the perspective of the λ-calculus, the appropriate comparison is not with sentences and their meanings, but with terms and their denotations. In other words, the view of the relationship between derivations and proofs taken above is forced on us if we want to take seriously the analogy between derivations and terms. This analogy, however, has been the mainstay of the subject as developed by Prawitz et al., and there is no reason to suppose that its usefulness—as a source of results, for example—has been exhausted yet.

(3) I think it is regrettable that the interest of a general theory of proofs for classical mathematics has seldom been emphasized. The fact that proofs play a special role in mathematics conceived intuitionistically, as part of the subject matter of ordinary mathematical assertions, should not blind us to their importance for mathematics on any conception of the subject. As I remarked earlier, the claim that classical mathematicians are interested only in consequence, and not in the proofs themselves, does not do justice to the facts. Questions concerning the identity of proofs, for example, or their constructive content are no less interesting from a classical, than from any other point of view. On the other hand, intensions have no role in classical mathematics. It seems to me, therefore, that there ought to be a classical theory of proofs, and that the distinction drawn above between a derivation's denotation and its manner of denoting provides a convenient way of differentiating classical from intuitionistic approaches to the subject.

(4) The distinction is also useful for resolving, or at least clarifying, the nature of the disagreement between Prawitz and Feferman alluded to earlier.


On the one hand, the view that N derivations adequately represent proofs, their similarity to the terms of certain calculi and the interpretation of the convertibility relation in these calculi combine to make it almost impossible to deny that interreducible derivations represent the same proof. On the other, such derivations are clearly not just linguistic variants of one another; there can be no doubt that information is lost in the process of reduction. The only apparent way to reconcile these facts is to classify the information lost as relevant not to the proof itself but to its mode of presentation. In other words, although interreducible derivations describe the same proof, they may do so in different ways (just as "the author of Waverley" and "the author of Waverley and Kenilworth" describe the same individual, although information is lost when the first replaces the second). This is not to deny the possibility of other conceptions of the subject, according to which distinctions between proofs are expressed by differences between such derivations; they may well be necessary for some purposes. Yet, I want to claim more than that a coherent and interesting notion is also arrived at by identifying the denotations of interreducible derivations. It seems to me that the ideas and methods currently employed in the general theory of proofs—the emphasis on strong normalization, the comparison of proofs with functions, etc.—presuppose a notion of this kind. (This seems to be Prawitz's point too, when he remarks that the real issue is whether a proof can indeed be identified with "the result of applying certain operations to obtain a certain end result.") If it should turn out to be inadequate, new insights and techniques will be needed to study its replacement.

(5) Finally, I think the distinction between a derivation's denotation and its way of denoting helps to clarify the status of permutative reductions. The idea that any reduction step preserves the way of denoting is, in my opinion, wholly implausible. Recall that, given any derivation in N or L, it is easy to construct another one with the same normal form which is arbitrarily complex (on any measure of complexity). Although I do not subscribe to the view that meanings are psychological, it does seem to me that psychological criteria can play a role in evaluating a theory of meaning. For example, if two expressions present an object in the same way, anyone familiar with the conventions governing their use should be able to recognize this fact immediately. By the above, however, it is far from obvious in general when two derivations reduce to the same normal form. (In my opinion, this argument counts also against Martin-Löf's conception of definitional equality.) As for permutative reductions in particular, they alter those features of a derivation which, according to Kreisel at least, are paradigms of intensional ones.
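The point just recalled—that a derivation can be made arbitrarily large without changing its normal form—has a direct analogue for λ-terms which may make it vivid. The Python sketch below is mine, not the text's: it pads a term with identity redexes (λx. x) t, so that the padded terms grow without bound on the obvious measure of size, yet all reduce to the same normal form.

```python
# Terms: ('var', x) | ('lam', x, body) | ('app', fun, arg)
ID = ('lam', 'x', ('var', 'x'))      # the identity combinator

def pad(term, n):
    """Wrap term in n redexes (lam x. x) term; each adds size but not content."""
    for _ in range(n):
        term = ('app', ID, term)
    return term

def size(term):
    """Number of term constructors, a crude measure of complexity."""
    if term[0] == 'var':
        return 1
    if term[0] == 'lam':
        return 1 + size(term[2])
    return 1 + size(term[1]) + size(term[2])

def normal_form(term):
    # The only redexes present here are (lam x. x) t; one beta-step strips one.
    while term[0] == 'app' and term[1] == ID:
        term = term[2]
    return term

base = ('var', 'a')
print(size(pad(base, 50)))                 # 151: as large as we please
print(normal_form(pad(base, 50)) == base)  # True: the normal form is unchanged
```

Hence recognizing that two terms (or derivations) share a normal form can require arbitrarily much reduction work, which is the psychological point made above.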

Once the idea in question is abandoned, it is possible to look at permutative reductions in a new light. Viewed in the abstract, there is little to distinguish one permutation of inference from another. As I observed earlier, it is difficult even to imagine what kind of evidence would legitimate some of these whilst ruling out others. The obvious conclusion to be drawn is that, given any property of proofs, it is either preserved by all such permutations or by none. The circumstances described in (4) above, as well as the interpretations of derivations discussed below, would seem to favor the first of these alternatives. Unfortunately, however, arbitrary permutations can alter radically the structure of a derivation, which implies that much of this structure cannot correspond to features of the proof being described. Its importance, on the other hand, is undeniable and not explicable simply as a matter of syntax. The position being advocated, then, is that permuting any pair of inferences in a derivation leaves its denotation unchanged, but alters the manner in which this denotation is presented. This does justice to the importance of the structure of a derivation, whilst removing the need to perform the seemingly impossible task of judging between permutative reductions.

I do not mean to imply that there are no grounds for distinguishing between different sets of permutative reductions. For certain applications it may obviously be necessary to restrict attention to a group of them which is adequate (in the sense that it allows every derivation to be reduced to a normal form), yet has certain desirable properties; it ensures that normal forms are unique, for example, or that every reduction sequence must terminate. But these distinctions are based on considerations of technical expediency. It is not surprising therefore that they should be made differently in different formal contexts, and unlikely that any profound consequences can be drawn from their being so. One of the virtues of the multiple-conclusion approach to these matters is that it reveals very clearly both the arbitrariness of the restrictions placed upon permutative reductions in the conventional reduction procedures (in particular, how they are motivated by syntactic features of the calculus concerned) and the wide variety of choices available when it comes to selecting such restrictions. The preceding provides, I think, a reasonable explanation of this situation.

The views expressed above, especially in (5), have an apparently disturbing implication. It is that in some respects proofs might be better represented by derivations whose inferences cannot be permuted. A single such derivation could then be associated with each group of N or L derivations whose members differed from one another only in the order of their inferences. As I argued at the end of Chapter 4, however, the possibilities for a representation of this kind seem rather limited. The most promising appears to be a calculus of the sort mentioned there in which the conclusions of a rule need not be connected to its premises; but it remains to be seen whether this possibility can be realized.50 One objection raised against this sort of calculus in Chapter 4—that the representation it provides is not uniform because the relationship between the premise and conclusions of rule (4) is treated differently from that between the premises and conclusions of the other rules—could perhaps be overcome by making more widespread changes in the structural effect of applying these rules. Such an expedient would only serve, however, to reinforce the other objection, namely, that a calculus of this kind is less than ideal for representing the actual process of reasoning. That this should be so is, in my opinion, simply a reflection of the difference between a proof as conceived above and a particular piece of reasoning. The latter is properly regarded as a mode of presenting the former.

My proposal now is that this difference can be regarded as a special case of the distinction between the grounds for an assertion and the procedure by which they are established. Expressed in these terms the distinction is a familiar one—albeit one which, because it corresponds to the distinction between truth and proof, is usually held not to apply to the present situation. This simple dichotomy is, however, misleading. It seems to me that the ambiguity inherent in the notion of justification is more accurately conveyed by a whole spectrum of possible meanings than by a pair of clear-cut alternatives. At one end of this spectrum is the view, espoused by Frege amongst others, that all justifiable assertions have the same justification, namely, they denote the True. At the other is Brouwer's conception of a justification as a singular process that takes place at a particular point in time; of course, an extrapolation from or description of such a process may also be called a justification, but this is clearly a secondary meaning which applies to something only insofar as it can serve to produce a justification in the primary sense. Justification in Frege's sense is certainly general enough, but this generality is purchased at the expense of informativeness; exactly the reverse could be said of Brouwer's conception. Our general philosophical perspective may determine how wide this spectrum appears to be, and may blind us to certain parts of its range. I am however more concerned with the other factors which influence where along this spectrum an acceptable justification is thought to be located.

One such factor—perhaps the decisive one—is the nature of the assertion to be justified. As a rule of thumb, the distance from the Fregean end seems to be inversely proportional to the degree of obviousness and accessibility of the procedure by which an assertion of the kind in question is established. So, for example, a claim to the effect that its grounds can be established is usually sufficient to justify an assertion about observable events close at hand. If there are obstacles in the way of observing an event, the justification of assertions about it may require additional information indicating how their grounds were established. To describe a justification in this way, however, is also to interpret it. For example, the grounds for asserting that it is 70° and sunny in Athens are certain meteorological conditions at a particular time and place, and the weather report in The Times provides a means of ascertaining whether these conditions do in fact obtain; but, if for some reason I am not particularly interested in Athens itself and have made the assertion on the basis of the weather report, it might be more natural to describe the report itself as the grounds for my assertion.

50 The project of characterizing the equivalence relation generated by permutations of inference is of interest even if one rejects the thesis that interreducible derivations represent the same proof. Whatever view is taken of the proper reduction steps, if order of inference has some real significance for proofs, I do not see how either N or L derivations can be said to represent them adequately.

Of more interest in the present context are mathematical assertions. Here, the procedures by which they can be established are well known, but relatively inaccessible. As a result, they normally require a more elaborate justification. (A simple "factual" claim may still suffice, however, when these procedures are accessible—e.g., in the case of a particularly obvious assertion, or one intended to convince only specialists.) Here too there is a question as to how such a justification is to be characterized. The difference is only that in the present case the question is taken more seriously. (After all, the view that Athens exists only insofar as it is reported on in newspapers is currently out of favor even amongst philosophers; its mathematical analogue, however, is actively discussed.)

It may be helpful to compare a mathematical assertion to an empirical one pertaining to a state of affairs which cannot be investigated directly. For example, on a foggy night the location of a sandbank may be inferred from the sound of a bell or a siren, the sight of a warning buoy or light, from depth soundings or from any combination of these. According to some mariners, the sandbank itself, a particular configuration of matter, is distinct from all of the above; they only constitute ways by which we may come to know about it, albeit indirectly. Under other circumstances, we might hope to see the bank directly and comprehend all its properties including its location.51 According to others, there is no such configuration of matter. To assert the existence of a sandbank at some location is simply to make a statement about the possibilities of undergoing certain kinds of experiences (seeing buoys and hearing sirens, for example, in a particular place). The sandbank itself is one of those "noxious ornaments, beautiful in

51 This somewhat heavy-handed extension of the analogy is intended to provide a metaphorical account of Plato's conception of mathematics. Mathematical objects enjoy an independent existence and under the right circumstances, with the appropriate effort and philosophical training, we may hope to know them directly (i.e., when the fog lifts and the sun comes up we may hope to see them). In the meantime, we must learn about them with the aid of objects we do know (can see) which reflect their properties. See, for example, Republic, 510e.

182 Normalization, Cut-Elimination and the Theory of Proofs

form, but hollow in substance," which must be excised from our ontology; any risks incurred in so doing will be "at least partly compensated for by the charm of subtle distinctions and witty methods" by which it enriches our thought.52

My purpose in mentioning all this is not to adjudicate between different views, but to illustrate how many different distinctions can be drawn when it comes to a matter of justification. There are at least the following:

I. the particular configuration of sand and its location [mathematical objects and facts about them]

II. evidence for I (for example, sirens and lights, or bells and buoys) [proofs]

III. particular presentations of II (for example, the sighting of a buoy followed by the sound of a bell) [modes of presenting proofs]

IV. linguistic expressions which describe II in terms of III [derivations]

V. a particular experience (for example, seeing the buoy and hearing the bell) [the grasping of a proof].

As I have already indicated, there are those who would dispute the significance of some of the above. Furthermore, the way in which I have chosen to describe them is not neutral. (For example, to characterize the activity V as the grasping of a proof is to give II a priority or, at least, an independence which will offend those for whom II is derivative upon V.) Nevertheless, this list does serve to indicate where proofs, conceived of as the denotations of formal derivations, can be located in the scheme of things. Of course, I have not provided an analysis of this notion of proof, nor can I do so here, but I hope that I have done enough to make the idea of distinguishing between proofs and reasoning at least seem coherent.53

The ideas discussed in (1)-(5) above are of a rather general and speculative kind, although some do contribute to the solution of quite specific problems. As a result, their value depends to some extent on their ability

52 A. Heyting, Intuitionism, 3rd edition, Amsterdam, 1971, page 11.
53 I realize that the view I have espoused is an unconventional one, violating both linguistic and philosophical orthodoxy, but I do think it has much to recommend it. For example, I remarked earlier that mathematicians are said to be interested in questions about the identity of proofs. It is doubtful, however, whether they are as interested in such questions in the abstract as they are in new proofs, even of known results. Whatever "new proof" means in this context, it surely is something more than "obtained from the old one by the kind of structural operations on derivations considered above." (Anyone who attempted to publish a proof which was "new" in this sense would invite ridicule.) On the other hand, a determinate structure is an integral part of an argument or a piece of reasoning. There seems to be no other way of reconciling these facts than by distinguishing proofs from reasoning.

Interpretations of Derivations 183

to interact fruitfully with more formal and specialized work in logic and mathematics. By a fruitful interaction, I mean both that there should exist formal results which support them and which acquire significance when interpreted in terms of them, and that they should suggest new lines of inquiry. Whether such interaction is in fact possible remains to be seen. Even at this stage, however, it seems to me that there are interesting connections between these ideas and some formal work. I would like to conclude by mentioning two examples which, in my opinion, illustrate this fact.

(1) The analogy between derivations and terms, coupled with the idea of

looking for models which the former will admit, leads naturally to a consideration of the full type-structure over some domain. Harvey Friedman has considered structures of this kind and characterized the identity relation between their elements in terms of convertibility (or provable equality) in a version of the typed λ-calculus.54 The version in question includes—in addition to axioms and rules for identity, an axiom for λ-conversion and an axiom which allows changes of bound variables—an extensionality principle:

((λx)(sx)) = s   if x is not among the free variables of s.

After defining a notion of structure appropriate to the typed λ-calculus, he proves its soundness—showing by induction on the length of the derivation of s = t that, if ⊢ s = t, then M ⊨ s = t for all structures M. He then constructs a particular M₀ whose elements are terms of the calculus factored out by the equivalence relation of being provably equal and shows that it is a structure in his sense. Once this has been established, it follows almost immediately that, if M₀ ⊨ s = t, then ⊢ s = t. (Friedman refers to this as a completeness theorem for the typed λ-calculus because, when taken together with the soundness result quoted earlier, it establishes the equivalence of the following three conditions: (i) ⊢ s = t; (ii) for all structures M, M ⊨ s = t; and (iii) M₀ ⊨ s = t.)

Friedman goes on to establish what he calls an extended completeness theorem for the typed λ-calculus. He first defines a notion of partial homomorphism between structures which has the property that, if there exists a partial homomorphism from M onto N and M ⊨ s = t, then N ⊨ s = t. Given any set B, T_B—the full type-structure over B—is the paradigm of a structure for the typed λ-calculus. Friedman proves that, for any infinite

54 "Equality Between Functionals," Springer Lecture Notes in Mathematics, Vol. 453, 1975, pages 22-37. Actually this paper deals with a wider range of topics than the above description indicates. Friedman was interested in classifying the relations of equality and being everywhere different between functionals of various classes according to their complexity. The completeness theorem quoted in the text is an intermediary towards showing that equality between simple functionals (i.e., those in the full type-structure over ω which are defined by a closed term of the typed λ-calculus) is recursive.


set B, there is a partial homomorphism from T_B onto M₀. This allows him to conclude that T_B ⊨ s = t is equivalent to any of (i)-(iii) above. Equality in the structure T_B is of course set-theoretical equality between functions.
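Friedman's structures are infinite, but the soundness direction of this picture can be illustrated concretely over a small base set. The following sketch is hypothetical code (the names `dom` and `app` and the two-element base set are mine, not Friedman's construction): it tabulates the full type structure over B = {0, 1} and checks that an η-expanded identity and the identity itself receive the same extensional denotation.

```python
# A hypothetical sketch: tabulate the full type structure over a
# two-element base set and compare two closed terms extensionally.
# Types are 'o' (base) or pairs (s, t) standing for s -> t; a function
# is represented by the tuple of its values on the (ordered) domain.
from itertools import product

BASE = [0, 1]

def dom(tp):
    """Enumerate the full type structure over BASE at type tp."""
    if tp == 'o':
        return list(BASE)
    s, t = tp
    ds, dt = dom(s), dom(t)
    return [outs for outs in product(dt, repeat=len(ds))]

def app(arg_tp, f, a):
    """Apply a tabulated function f to an argument a of type arg_tp."""
    return f[dom(arg_tp).index(a)]

oo = ('o', 'o')   # the type o -> o

# Denotations of two terms of type (o -> o) -> (o -> o):
#   s = \f.\x. f x   (an eta-expansion of the identity)
#   t = \f. f        (the identity)
denote_s = tuple(tuple(app('o', f, x) for x in dom('o')) for f in dom(oo))
denote_t = tuple(dom(oo))

print(denote_s == denote_t)   # True: s and t denote the same functional
```

Over a finite B this only illustrates one direction of the extended completeness theorem; the interesting (completeness) direction requires B infinite.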

Translated into the language of derivations, the extended completeness theorem asserts that, when closed derivations in the pure implicational fragment of NJ are interpreted as denoting functionals in the set-theoretic sense over some (infinite) domain of atomic proofs, Π and Π′ are interreducible (using Prawitz's reduction steps augmented by expansions) iff they denote the same functional. I think this is an interesting statement; it provides an illuminating characterization of interreducibility, albeit for a restricted class of derivations, and it demonstrates that proofs need not disappear into the consequence relation when they are interpreted as extensional objects. In addition, it suggests some interesting questions for further investigation. For example:

(a) Can this result be extended to all of propositional NJ?

It seems to be a routine matter to extend the theorem to the negative fragment of NJ by adding product types and modifying the notion of structure accordingly. If the type-structure is then extended to take account of disjunction, it seems clear that all permutations of inference will preserve equality. A more problematic issue is whether the converse holds, i.e., whether it will still be true that derivations which denote the same functional are interreducible. The difficulty lies not so much with disjunction itself as with the thinning that accompanies it. It is unclear how best to treat this latter operation in a functional context.

(b) Can it be extended to NK?

Multiple-conclusion logic suggests a natural generalization of the notion of function and of the type-structure which may provide an appropriate framework within which to tackle this question.

(c) What is the significance of the expansions?

In view of Prawitz's remark quoted earlier, it may be that expansions are of no particular interest, and the need for them here is neither significant nor disturbing. On the other hand, it may be worth considering the possibility of obtaining the same result for the interreducibility relation generated by the reduction steps alone. This could perhaps be accomplished by interpreting derivations not as particular proofs, but as proof patterns, i.e., by regarding their minimal formulae as variables and identifying derivations which differed only with respect to these. Derivations might then be interpreted as denoting (untyped) partial functions—the difference between Π and Π′, when the latter is obtainable from the former by expansion, being reflected in a difference in their domains of definition (Dom(Π′) ⊆ Dom(Π)).

(2) Another possible interpretation for derivations is as the morphisms of

a category. Again this might provide evidence for the choice of reduction steps—in particular, for whether there is any reason beyond expediency to restrict permutations. The first obstacle to be overcome on this approach is the need to find a suitable generalization of the notion of category which will accommodate morphisms with a series of domains and, if LK derivations are to be interpreted, a series of ranges as well.

In Appendix C below, I have sketched very briefly some work in this direction. I have also outlined what I think is a natural generalization of categories to morphisms with more than one domain and range, suggested by the multiple-conclusion calculus considered earlier. It then turns out that the structural axioms for these generalized categories force us to identify derivations which permute to one another—thus providing some additional support for (5) above. Zucker's non-terminating reduction sequence appears in this context as an innocuous example of an infinite series of terms all of which refer to the same morphism.

The conclusions I have reached may seem surprising at first sight but there is, I believe, much to recommend them. On quite general grounds, the conception of proofs as rather loosely structured objects is a plausible one. Once it is accepted and the denotations of formal derivations are viewed in terms of it, a unified treatment of cut-elimination and normalization becomes a possibility and the general theory of proofs is freed to some extent from the shackles of syntax. Although it is impossible to predict with any certainty how the subject will develop in the future, the direction I have indicated does show some promise of being a fruitful one—or so, at least, I have tried to argue.

Appendix A

A Strong Cut-Elimination Theorem for LJ

This appendix discusses the possibility of avoiding Zucker's negative result about strong cut-elimination for LJ by altering his conventions for the indexing of formulae. As I remarked in Chapter 2 above, his particular counterexample to strong cut-elimination depends upon a special, and perhaps rather unnatural, feature of these conventions. The question that remains is whether an alternative indexing system could avoid such counterexamples altogether. The version of LJ presented below is essentially the one adopted by Zucker in his paper.1 A sequent has the form Γ ⊢ A, where Γ is a set of indexed formulae; negation is defined in terms of a constant ⊥ for falsity, and there is no thinning rule. I have however altered the conventions which govern the indexing of formulae and, as a result, contraction becomes a derived rather than a primitive rule. My desire to follow Zucker's treatment as closely as possible explains the unusual formulations of ∨- and ∃-left.

As before, Γ, Δ, … are supposed to range over sets of indexed formulae, and i, j, k, … over indices. I will write Γ, Δ for Γ ∪ Δ and Γ, A_i for Γ ∪ {A_i}; this notation is not intended to imply either that Γ ∩ Δ = ∅ or that A_i ∉ Γ. When I do want to indicate this, I will use Γ; Δ and Γ; A_i, respectively. Finally, Γ; (A_i) will be used to denote ambiguously Γ; A_i and Γ. (In the latter case, it is assumed that A_i ∉ Γ.) Similarly, Γ, (A_i) will be used to denote Γ, A_i and Γ, when it is left open whether or not A_i ∈ Γ. The notations Γ; (A) and Γ, (A) are explained in an analogous way.

I take the Calculus LJ to consist of the following.

Axioms:

   A_i ⊢ A        ⊥_i ⊢ P   (P atomic and different from ⊥)

1 "On the Correspondence between Cut-Elimination and Normalization," Annals of Mathematical Logic, Vol. 7, 1974, pages 1-156.



Logical Rules:

∧-right:
   Γ ⊢ A    Δ ⊢ B
   ----------------
   Γ, Δ ⊢ A∧B

∨-right:
   Γ ⊢ A            Γ ⊢ B
   ----------       ----------
   Γ ⊢ A∨B          Γ ⊢ A∨B

→-right:
   Γ; (A_i) ⊢ B
   --------------
   Γ ⊢ A→B

∀-right:
   Γ ⊢ A(a)
   ------------  *
   Γ ⊢ ∀xA(x)

∃-right:
   Γ ⊢ A(t)
   ------------
   Γ ⊢ ∃xA(x)

∧-left:
   Γ; (A_i) ⊢ C          Γ; (B_i) ⊢ C
   ----------------      ----------------
   Γ, (A∧B_j) ⊢ C        Γ, (A∧B_j) ⊢ C

∨-left:
   Γ; (A_i) ⊢ C    Δ; (B_j) ⊢ C
   -----------------------------
   Γ, Δ, A∨B_k ⊢ C

→-left:
   Γ ⊢ A    Δ; (B_j) ⊢ C
   ----------------------
   (Γ), Δ, (A→B_k) ⊢ C

∀-left:
   Γ; (A(t)_i) ⊢ B
   ------------------
   Γ, (∀xA(x)_j) ⊢ B

∃-left:
   Γ; (A(a)_i) ⊢ B
   -----------------  †
   Γ, ∃xA(x)_j ⊢ B

* where a does not occur in Γ;  † where a does not occur in Γ, B.

Cut Rule:

   Γ ⊢ A    (A_i); Δ ⊢ B
   ----------------------
   (Γ), Δ ⊢ B

The application of cut, or of a left rule other than ∨- or ∃-left, to a premise which contains no formula occurrence involved in the inference is empty. This means in the case of cut that, if Δ; (A_i) = Δ, then

   d        d′
   Γ ⊢ A    Δ; (A_i) ⊢ B
   ----------------------
   (Γ), Δ ⊢ B

is just another notation for the derivation d′. Empty applications of ∧-, →- and ∀-left are treated similarly. Formulating these rules so as to take account of empty applications is just a convenient way of introducing some notation which will be useful later on.

The use of sets in place of sequences of formulae makes a rule of interchange redundant. As for contraction, in this calculus it takes the form

   Γ, A_j; A_i ⊢ B
   ----------------
   Γ, A_j ⊢ B

and is not included among the basic rules because of the following:


Lemma A.1  If d is a derivation of Γ; A_i ⊢ B, then there is a derivation d(A_j/i) of Γ, A_j ⊢ B which differs from d only in that some formula occurrences are assigned different indices. (In particular, there are no cuts in d(A_j/i) which are not already in d.)

This lemma is proved by a straightforward induction on the construction of d.
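Lemma A.1 is purely combinatorial, and a toy rendering may make it concrete. The sketch below is an illustrative representation only (the `Deriv` datatype and its fields are mine, not Zucker's): a derivation is a tree of sequents whose antecedents are sets of (formula, index) pairs, and reindexing replaces every occurrence of A_i by A_j while, as the lemma requires, introducing no new cuts.

```python
# Illustrative only: a toy derivation datatype, not Zucker's formalism.
# Antecedents are frozensets of (formula, index) pairs; reindex builds
# d(A_j/i) by renaming one indexed occurrence throughout the tree.
from dataclasses import dataclass

@dataclass
class Deriv:
    rule: str          # e.g. 'axiom', 'cut', 'and-right'
    left: frozenset    # antecedent: pairs (formula, index)
    right: str         # succedent formula
    subs: tuple = ()   # immediate subderivations

def reindex(d, formula, i, j):
    """d(A_j/i): rename index i of `formula` to j throughout d."""
    ren = lambda occ: (formula, j) if occ == (formula, i) else occ
    return Deriv(d.rule,
                 frozenset(map(ren, d.left)),
                 d.right,
                 tuple(reindex(s, formula, i, j) for s in d.subs))

def cuts(d):
    """Count applications of cut; reindexing must not change this."""
    return (d.rule == 'cut') + sum(cuts(s) for s in d.subs)

ax = Deriv('axiom', frozenset({('A', 1)}), 'A')
d  = Deriv('and-right', frozenset({('A', 1), ('B', 2)}), 'A&B', (ax,))
d2 = reindex(d, 'A', 1, 3)
print(('A', 3) in d2.left, cuts(d2) == cuts(d))   # True True
```

The point of the "no new cuts" clause is exactly what the `cuts` check records: contraction obtained this way costs nothing in terms of the reduction steps defined below.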

Reduction steps are of three kinds:

A. Elimination of trivial cuts

B. Permuting cuts upwards

C. Reducing the complexity of cuts.

The following reduction steps are to be read from left to right.

A. Elimination of trivial cuts:

a.
   d
   Γ ⊢ A    A_i ⊢ A           d
   ------------------   ⇒    Γ ⊢ A
   Γ ⊢ A

b.
             d                     d(A_i/j)
   A_i ⊢ A   Γ; A_j ⊢ B      ⇒    Γ, A_i ⊢ B
   ---------------------
   Γ, A_i ⊢ B

B. Permuting cuts upwards:

These reductions divide into two groups according to whether the cut-formula is passive on the right or on the left.

(1) Cut-formula passive on the right

a.
            d₁               d₂
   d        Γ; (A_k) ⊢ B     Δ; (A_k) ⊢ C
   Θ ⊢ A    Γ, Δ; A_k ⊢ B∧C
   -------------------------
   Θ, Γ, Δ ⊢ B∧C

      ⇒

   d        d₁                 d        d₂
   Θ ⊢ A    Γ; (A_k) ⊢ B       Θ ⊢ A    Δ; (A_k) ⊢ C
   ---------------------       ---------------------
   (Θ), Γ ⊢ B                  (Θ), Δ ⊢ C
   ---------------------------------------
   Θ, Γ, Δ ⊢ B∧C

b.
            d′
   d        Δ; A_k ⊢ B          d        d′
   Γ ⊢ A    Δ; A_k ⊢ B′  R  ⇒   Γ ⊢ A    Δ; A_k ⊢ B
   --------------------         -------------------
   Δ, Γ ⊢ B′                    Δ, Γ ⊢ B
                                ----------  R
                                Δ, Γ ⊢ B′

   where R is ∨- or ∃-right.


c.
            d′
   d        Δ; A_p; (B_q) ⊢ C         d        d′(B_m/q)
   Γ ⊢ A    Δ; A_p ⊢ B→C         ⇒    Γ ⊢ A    Δ; A_p; (B_m) ⊢ C
   ----------------------             --------------------------
   Γ, Δ ⊢ B→C                         Γ, Δ; (B_m) ⊢ C
                                      ----------------
                                      Γ, Δ ⊢ B→C

   where m does not occur as an index in d or d′.

d.
            d′(a)
   d        Δ; B_k ⊢ A(a)             d        d′(b)
   Γ ⊢ B    Δ; B_k ⊢ ∀xA(x)      ⇒    Γ ⊢ B    Δ; B_k ⊢ A(b)
   ------------------------           ----------------------
   Γ, Δ ⊢ ∀xA(x)                      Γ, Δ ⊢ A(b)
                                      -------------
                                      Γ, Δ ⊢ ∀xA(x)

   where b is a parameter not occurring in d or d′(a).

e.
            d₁                     d₂
   d        Δ; (D_n); (A_p) ⊢ C    Θ; (D_n); (B_q) ⊢ C
   Γ ⊢ D    Δ, Θ, A∨B_k; D_n ⊢ C
   ------------------------------
   Γ, Δ, Θ, A∨B_k ⊢ C

      ⇒

   d        d₁(A_r/p)               d        d₂(B_s/q)
   Γ ⊢ D    Δ; (D_n); (A_r) ⊢ C     Γ ⊢ D    Θ; (D_n); (B_s) ⊢ C
   ----------------------------     ----------------------------
   Δ, (Γ); (A_r) ⊢ C                Θ, (Γ); (B_s) ⊢ C
   ----------------------------------------------
   Γ, Δ, Θ, A∨B_k ⊢ C

   where r, s do not occur in d, d₁ or d₂.

f.
            d′
   d        Δ; A_p; B_q ⊢ C           d        d′(A_m/p)
   Γ ⊢ B    Δ, A′_k; B_q ⊢ C  R  ⇒    Γ ⊢ B    Δ; A_m; B_q ⊢ C
   -------------------------          ------------------------
   Δ, Γ, A′_k ⊢ C                     Δ, Γ; A_m ⊢ C
                                      --------------  R
                                      Δ, Γ, A′_k ⊢ C

   where m occurs nowhere in d or d′, and R is ∧- or ∀-left.

g.
            d₁                d₂
   d        Γ; (C_m) ⊢ A      Δ; (C_m); B_k ⊢ D
   Θ ⊢ C    Γ, Δ, A→B_p; C_m ⊢ D
   ------------------------------
   Γ, Δ, Θ, A→B_p ⊢ D

      ⇒

   d        d₁                 d        d₂(B_q/k)
   Θ ⊢ C    Γ; (C_m) ⊢ A       Θ ⊢ C    Δ; (C_m); B_q ⊢ D
   ---------------------       --------------------------
   Γ, (Θ) ⊢ A                  Δ, (Θ); B_q ⊢ D
   -------------------------------------------
   Γ, Δ, Θ, A→B_p ⊢ D

   where q occurs nowhere in d or d₂.


h.
            d′(a)
   d        Δ; A_p; (B(a)_q) ⊢ C           d        d′(b)(B(b)_s/q)
   Γ ⊢ A    Δ, ∃xB(x)_r; A_p ⊢ C      ⇒    Γ ⊢ A    Δ; A_p; (B(b)_s) ⊢ C
   -----------------------------           -----------------------------
   Δ, ∃xB(x)_r, Γ ⊢ C                      Δ, Γ; (B(b)_s) ⊢ C
                                           -------------------
                                           Δ, ∃xB(x)_r, Γ ⊢ C

   where s occurs nowhere in d or d′(b), and b is a parameter not occurring in d or d′(a).

(2) Cut-formula passive on the left

a.
   d₁              d₂
   Γ; (A_p) ⊢ C    Δ; (B_q) ⊢ C      d
   Γ, Δ, A∨B_k ⊢ C                   Θ; C_m ⊢ D
   --------------------------------------------
   Γ, Δ, A∨B_k, Θ ⊢ D

      ⇒

   d₁(A_r/p)       d                d₂(B_s/q)      d
   Γ; (A_r) ⊢ C    Θ; C_m ⊢ D       Δ; (B_s) ⊢ C   Θ; C_m ⊢ D
   --------------------------       --------------------------
   Θ, Γ; (A_r) ⊢ D                  Θ, Δ; (B_s) ⊢ D
   ------------------------------------------------
   Γ, Δ, A∨B_k, Θ ⊢ D

   where r, s do not occur in d, d₁ or d₂.

b.
   d
   Δ; A_p ⊢ B                   d(A_k/p)      d′
   Δ, A′_q ⊢ B  R    d′    ⇒    Δ; A_k ⊢ B    B_n; Γ ⊢ C
   -----------  B_n; Γ ⊢ C      -------------------------
   ---------------------        Δ, Γ; A_k ⊢ C
   Δ, A′_q, Γ ⊢ C               --------------  R
                                Δ, A′_q, Γ ⊢ C

   where k occurs nowhere in d or d′, and R is ∧- or ∀-left.

c.
   d₁       d₂
   Γ ⊢ A    Δ; B_k ⊢ C
   --------------------       d
   Γ, Δ, A→B_p ⊢ C            Θ; C_m ⊢ D
   -------------------------------------
   Γ, Δ, A→B_p, Θ ⊢ D

      ⇒

            d₂(B_q/k)      d
   d₁       Δ; B_q ⊢ C     Θ; C_m ⊢ D
   Γ ⊢ A    Δ, Θ; B_q ⊢ D
   ----------------------
   Γ, Δ, A→B_p, Θ ⊢ D

   where q occurs nowhere in d or d₂.

d.
   d(a)
   Γ; (A(a)_p) ⊢ B                  d(b)(A(b)_r/p)     d′
   Γ, ∃xA(x)_q ⊢ B    d′       ⇒    Γ; (A(b)_r) ⊢ B    Δ; B_m ⊢ C
   ----------------  Δ; B_m ⊢ C     ------------------------------
   ----------------------------     Δ, Γ; (A(b)_r) ⊢ C
   Δ, ∃xA(x)_q, Γ ⊢ C               -------------------
                                    Δ, ∃xA(x)_q, Γ ⊢ C

   where r occurs nowhere in d(b) or d′, and b is a parameter not occurring in d′ or d(a).


C. Reducing the complexity of cuts:

(1) a.
   d₁       d₂                d
   Γ ⊢ A    Δ ⊢ B             Θ; (A∧B_k); A_p ⊢ C
   Γ, Δ ⊢ A∧B                 Θ; A∧B_k ⊢ C
   ------------------------------------------
   Θ, Γ, Δ ⊢ C

      ⇒
             d₁       d₂
             Γ ⊢ A    Δ ⊢ B       d(A_q/p)
   d₁        Γ, Δ ⊢ A∧B           Θ; (A∧B_k); A_q ⊢ C
   Γ ⊢ A     Θ, (Γ, Δ); A_q ⊢ C
   ----------------------------
   Θ, Γ, (Δ) ⊢ C

   where q does not occur in d, d₁ or d₂.

b. Like (1a) with B_p instead of A_p and d₂ playing the role of d₁.

(2) a.
   d          d₁                       d₂
   Θ ⊢ A      Γ; (A∨B_k); (A_p) ⊢ C    Δ; (A∨B_k); (B_q) ⊢ C
   Θ ⊢ A∨B    Γ, Δ; A∨B_k ⊢ C
   --------------------------
   Γ, Δ, Θ ⊢ C

      ⇒
             d
             Θ ⊢ A
             --------
   d         Θ ⊢ A∨B    d₁(A_q/p): Γ; (A∨B_k); (A_q) ⊢ C
   Θ ⊢ A     -------------------------------------------
             Γ, (Θ); (A_q) ⊢ C
   ---------------------------
   Γ, (Θ) ⊢ C

   where q occurs nowhere in d or d₁.

b. Like (2a) with B instead of A, and d₂ playing the role of d₁.

(3)
   d                 d₁                 d₂
   Γ; (A_p) ⊢ B      (A→B_k); Δ ⊢ A    (A→B_k); Θ; B_r ⊢ C
   Γ ⊢ A→B           A→B_k; Δ, Θ ⊢ C
   -------------------------------
   Γ, Δ, Θ ⊢ C

      ⇒  the derivation obtained by joining

   d
   Γ; (A_p) ⊢ B
   -------------   d₁
   Γ ⊢ A→B         (A→B_k); Δ ⊢ A        d
   ------------------------------
   (Γ), Δ ⊢ A                            Γ; (A_p) ⊢ B
   ----------------------------------------------
   Γ, (Δ) ⊢ B

   and

   d
   Γ; (A_p) ⊢ B
   -------------   d₂(B_q/r)
   Γ ⊢ A→B         (A→B_k); Θ; B_q ⊢ C
   -----------------------------------
   (Γ), Θ; B_q ⊢ C

   by a final cut on B_q, with conclusion Γ, (Δ), Θ ⊢ C,

   where q occurs nowhere in d, d₁ or d₂.


(4)
   d(a)           d′
   Γ ⊢ A(a)       Δ; (∀xA(x)_p); A(t)_q ⊢ B
   Γ ⊢ ∀xA(x)     Δ; ∀xA(x)_p ⊢ B
   -------------------------------
   Δ, Γ ⊢ B

      ⇒
              d(a)
              Γ ⊢ A(a)       d′(A(t)_r/q)
              ----------
   d(t)       Γ ⊢ ∀xA(x)     Δ; (∀xA(x)_p); A(t)_r ⊢ B
   Γ ⊢ A(t)   Δ, (Γ); (A(t)_r) ⊢ B
   -------------------------------
   Δ, Γ ⊢ B

   where r occurs nowhere in d(a) or d′.

(5)
   d              d′(a)
   Γ ⊢ A(t)       Δ; (∃xA(x)_q); (A(a)_p) ⊢ B
   Γ ⊢ ∃xA(x)     Δ; ∃xA(x)_q ⊢ B
   -------------------------------
   Δ, Γ ⊢ B

      ⇒
              d
              Γ ⊢ A(t)       d′(t)(A(t)_r/p)
              ----------
   d          Γ ⊢ ∃xA(x)     Δ; (∃xA(x)_q); (A(t)_r) ⊢ B
   Γ ⊢ A(t)   Δ, (Γ); (A(t)_r) ⊢ B
   -------------------------------
   Δ, (Γ) ⊢ B

   where r occurs nowhere in d or d′(a).

Remarks:

(1) The reductions in group B are easy to describe. They simply allow a cut to be permuted upwards past applications of any other rule provided that the cut formula is passive. Unfortunately, there are a large number of cases to consider and I have not been able to find a notation that enables more than a couple of these to be amalgamated. No restrictions are placed on these permutations except to ensure that there are no clashes of indices, i.e., to ensure that a formula occurrence which is passive does not become active as a result of performing one of these reductions. This explains the need to reindex formulae in some of them. (In (B1c), for example, B_q might occur in Γ. The active occurrence of B_q in d′ must therefore be converted into B_m, where m is some new index, before permuting the cut with →-right.) Such clashes are undesirable in principle since these reductions are supposed to be simple permutations and nothing more. In addition, they pose problems for strong cut-elimination.2

2 See the discussion of Gentzen's mix rule in Section 7.6 of Zucker, op. cit.


(2) The statement of the reduction rules in group C could be simplified if

the left rules were reformulated in such a way that the formula introduced had to be assigned a new index. Contraction would then no longer be a derived rule, but could be added as a basic one. This in turn would necessitate adding a further group of reductions for permuting contractions downwards past other inferences, and would result in a slightly different set of choices for reduction procedures. Such a modification would not be of much significance, but there is little incentive to make it. It probably complicates matters more than it simplifies them and, in addition, the new restriction on indices is difficult to motivate. What cannot be required is that subderivations of a given derivation have no index in common—for example, that T and A be disjoint in the statement of A-right. Zucker's example of a non-terminating (proper) reduction sequence depends upon just this point.

(3) If d terminates with a cut and d′ is obtained from d by applying one of the above reductions to its final inference, then d′ is said to come from d by a primitive reduction. In general, we say that d reduces (in one step) to d′ if d′ can be obtained from d by replacing one of its subderivations by a derivation which comes from it by a primitive reduction. When the primitive reduction in question is from group C, the conclusion of the new subderivation need not coincide with that of the subderivation from which it comes, with the result that some members of the succeeding chain of inferences may become empty. The notation introduced earlier is intended to make clear how such inferences are to be treated. It obviates the need to give a separate definition of what Zucker has called pruning,3 and has the effect of eliminating all inferences made redundant by the reduction except for applications of ∨- and ∃-left. This result is easily seen to be the same as would be obtained by adapting Zucker's definition to the present context. In according special treatment to redundant applications of ∨- and ∃-left I am following Zucker's example. His only motivation for doing so seems to be that it facilitates the comparison with normalization procedures for natural deduction derivations.

It has been shown by Dragalin that every reduction sequence constructed according to A, B and C above must be finite in length.4 In

3 Zucker, op. cit., pages 44-47.
4 "A strong theorem on normalization of derivations in Gentzen's sequent calculus," Studies in the theory of algorithms and mathematical logic, ed. by A. A. Markov and V. I. Khomich, "Nauka," Moscow, 1979, pp. 26-39 (Russian). An English translation of the proof appears as Appendix B of his monograph Mathematical Intuitionism: introduction to proof theory, Vol. 67 of the AMS series "Translations of Mathematical Monographs," Providence, 1988, pp. 185-200.


fact, he establishes this result for a version of LK, not just for LJ. There are a number of differences between his version of the sequent calculus and the system described above, most of them rather minor. In the first place, his sequents are constructed from lists of formulae, where a list is explained as a finite set with repetitions so that, although a formula may have more than one occurrence in a list, the order of formula occurrences does not matter. In the second place, his rules include thinning and contraction as well as rules for negation (rather than axioms for ±) . Finally, his calculus employs a mix rule instead of cut, i.e., a rule which removes every occurrence of the cut formula from a list. As for his reduction steps, they do not coincide exactly with those listed above. They include, of course, the reductions needed for contraction, thinning and the negation rules. In addition, however, each reduction listed under C above is replaced by a pair of reductions whose applicability depends upon whether the cut formula introduced into each premise of the cut by the preceding inference already occurs in the premise(s) of that inference or not. To handle the former case, there are reductions which allow the cut to be applied before the inference in question to remove these prior occurrences, and then reapplied after it to remove the new occurrence of the cut formula. In the latter case, the familiar steps for replacing the cut by a cut or cuts of lower degree are used. This results in a slightly more flexible reduction procedure.
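The difference between cut and Dragalin's mix can be made concrete with a toy representation (the list-based encoding below is illustrative only, not Dragalin's notation): mix deletes every occurrence of the cut formula from the right premise's antecedent, while cut removes a single designated occurrence.

```python
# Toy illustration of cut vs. mix on list-based sequents.
# A sequent is a pair (antecedent list, succedent formula).

def mix(left_prem, right_prem, cut_formula):
    """Conclusion of a mix: every copy of the cut formula is removed
    from the right premise's antecedent."""
    gamma, a = left_prem
    delta, b = right_prem
    assert a == cut_formula
    kept = [c for c in delta if c != cut_formula]   # delete ALL copies
    return (gamma + kept, b)

def cut(left_prem, right_prem, cut_formula):
    """Conclusion of a cut: exactly one copy is removed."""
    gamma, a = left_prem
    delta, b = right_prem
    assert a == cut_formula and cut_formula in delta
    kept = list(delta)
    kept.remove(cut_formula)                        # delete ONE copy
    return (gamma + kept, b)

p1 = (['G'], 'A')
p2 = (['A', 'A', 'D'], 'B')
print(mix(p1, p2, 'A'))   # (['G', 'D'], 'B')
print(cut(p1, p2, 'A'))   # (['G', 'A', 'D'], 'B')
```

In the indexed-set formulation of this appendix the distinction is absorbed into the indexing conventions, which is why contraction can be a derived rule there.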

Although these various differences affect some details of the proof, none of them is of much significance. In my account, I have tried to follow Zucker as closely as possible, except for the reductions in C which have been chosen to simplify the exposition. Dragalin's proof is suggested by Prawitz's proof of the analogous result for natural deduction. An inductively defined property of derivations (an analogue of "strong validity" or "computability") is introduced such that derivations with this property are easily shown to generate only reduction sequences of finite length. The work of the proof consists in establishing—by induction on the definition (amongst other things)—that all derivations have the property. Call the derivation(s) of the premise(s) of the final inference of a derivation d its immediate subderivation(s).

Definition A.1
(1) A derivation d is said to be inductive if
   a. d is an axiom.
   b. the last inference of d is not cut and the immediate subderivation(s) of d is (are) inductive.
   c. the last inference of d is cut and every derivation to which d reduces in one step is inductive.
(2) The inductive complexity of d is defined as follows:



   a. if d is an axiom it has inductive complexity 1.
   b. if the last inference of d is not cut, its inductive complexity is one more than the inductive complexity (sum of the inductive complexities) of its immediate subderivation(s).
   c. if the last inference of d is cut, then its inductive complexity is one more than the sum of the inductive complexities of the derivations to which d reduces in one step.

It is now a straightforward matter to prove by induction on inductive complexity that there is no infinite reduction sequence beginning with an inductive derivation d. If d is an axiom, there is nothing to prove. If d terminates with an inference other than cut, it follows from the elementary properties of the reduction steps that such a sequence must contain an infinite subsequence of reductions applied to (one of) the immediate subderivation(s) of d—contrary to the induction hypothesis. Finally, if d terminates with a cut, the result follows immediately from the induction hypothesis.
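The shape of this argument—an inductively defined measure that bounds the length of every reduction sequence—can be illustrated on a toy abstract reduction system (the finite graph below is my own example, not a derivation calculus):

```python
# A finite abstract reduction system given as a graph of one-step
# reductions.  The "inductive complexity" of a node (cf. Definition
# A.1(2)) bounds the length of every reduction sequence from it.
from functools import lru_cache

STEPS = {            # node -> its one-step reducts
    'd0': ['d1', 'd2'],
    'd1': ['d3'],
    'd2': ['d3'],
    'd3': [],        # normal form: complexity 1, like an axiom
}

@lru_cache(maxsize=None)
def complexity(n):
    # one more than the sum over all one-step reducts
    return 1 + sum(complexity(m) for m in STEPS[n])

@lru_cache(maxsize=None)
def longest(n):
    # length of the longest reduction sequence starting at n
    return max((1 + longest(m) for m in STEPS[n]), default=0)

print(complexity('d0'), longest('d0'))                   # 5 2
print(all(longest(n) <= complexity(n) for n in STEPS))   # True
```

The work in the actual proof, of course, lies in showing that every derivation has a well-defined complexity at all, i.e., that the recursion in clause (2)c is grounded.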

It remains to argue that every derivation is inductive. This too is a straightforward induction—here, on the construction of d—provided that the derivation which results from applying cut to the conclusions of a pair of inductive derivations can itself be shown to be inductive. Dragalin calls such a derivation, i.e., one terminating with a cut whose immediate subderivations are both inductive, a regular figure. So, for strong cut-elimination, it is enough to show:

Lemma A.2 Every regular figure d is inductive.

Lemma A.2 is proved by induction on the pair (α, β), where α is the degree (or logical complexity) of the cut formula of the terminal cut of d and β is the sum of the inductive complexities of its immediate subderivations. The proof is a matter of verifying that every derivation d′ to which d reduces in one step is inductive—using the induction hypothesis as needed and the fact that the result of applying a reduction step to an inductive derivation is itself an inductive derivation of lower inductive complexity. Let us designate the final inference of d by C, and its right immediate subderivation by d″. There are a number of cases to consider:

(1) d′ comes from d by reducing a cut other than C. If C disappears as a result, it follows from the regularity of d that d′ is inductive. If it does not, we argue that the inductive complexity of the immediate subderivation containing the reduced cut has been lowered and apply the induction hypothesis.

(2) C is a trivial cut and d′ comes from d by eliminating it with a type A reduction. Here, the inductiveness of d′ follows immediately from the hypothesis that d is regular.


(3) The cut formula is passive in at least one premise of C and d′ is obtained by using a type B reduction to permute C. In this case we argue first that the immediate subderivation(s) of d′ terminating with C (or the pair of cuts into which C has been split by the reduction) is (are) regular and that one of its (their) immediate subderivations is of lower inductive complexity than the corresponding immediate subderivation of d. Next we apply the induction hypothesis to infer the inductiveness of this (these) subderivation(s) and, finally, use the fact that d′ is obtained from an inductive derivation (or two such) by a rule other than cut to infer that it too must be inductive.

(4) The last case is that d′ comes from d by applying a proper reduction to d. In this case, we argue first that the immediate subderivations of d′ (or the subderivations obtained from d′ by removing the last two cuts in the case of (C3)) are inductive. This follows either by the same argument as in the preceding case or from the regularity of d—depending upon whether or not the cut formula of C is already present in the conclusion of an immediate subderivation of d″. We can then appeal to the fact that the final cut of d′ (or last two cuts in the case of (C3)) is (are) of lower degree than C and infer from the induction hypothesis that d′ is inductive.

From the argument outlined above we can conclude:

Theorem A.3 (Dragalin) Every reduction sequence generated by the steps in A, B and C above is finite.

Although this is a satisfying result, it falls short of what we would like to be able to claim. Dragalin rightly points out that the reduction steps he has selected are those employed by Gentzen in his original proof of cut-elimination. On the other hand, Gentzen's procedure, according to which cuts are eliminated from top to bottom in a derivation,5 is often modified in such a way that cuts are systematically reduced by degrees, beginning with those of highest degree.6 This procedure may require a cut to be permuted upwards past other cuts of lower degree. Furthermore, Zucker allows cuts to be permuted freely with one another since, as he shows, such permutation variants will still be mapped onto the same natural deduction derivation and, hence, may plausibly be claimed to represent the same proof. There is no difficulty about formulating steps for permuting cuts with cuts.

5See Chapter 1 above. 6See, for example, "Proof Theory: Some Applications of Cut-Elimination" by

H. Schwichtenberg, Handbook of Mathematical Logic, ed. by J. Barwise, pp. 867-895.


To the reductions in group (B1), cut-formula passive on the right, we add the following:

i.
            d₁               d₂
   d        Δ; (A_k) ⊢ C     Θ; C_p; (A_k) ⊢ D
   Γ ⊢ A    Δ, Θ; A_k ⊢ D
   ----------------------
   Δ, Γ, Θ ⊢ D

      ⇒

   d        d₁                 d        d₂(C_q/p)
   Γ ⊢ A    Δ; (A_k) ⊢ C       Γ ⊢ A    Θ; C_q; (A_k) ⊢ D
   ---------------------       --------------------------
   Δ, (Γ) ⊢ C                  Θ, (Γ); C_q ⊢ D
   -------------------------------------------
   Δ, Θ, Γ ⊢ D

   where q occurs nowhere in d or d₂.

and to those in (B2), cut-formula passive on the left,

e.
   d₁       d₂
   Γ ⊢ A    A_p; Δ ⊢ B     d
   Γ, Δ ⊢ B                B_q; Θ ⊢ D
   ----------------------------------
   Γ, Δ, Θ ⊢ D

      ⇒

            d₂(A_r/p)      d
   d₁       A_r; Δ ⊢ B     B_q; Θ ⊢ D
   Γ ⊢ A    A_r; Δ, Θ ⊢ D
   ----------------------
   Γ, Δ, Θ ⊢ D

   where r occurs nowhere in d or d₂.

The only problem is that the inclusion of (B1i) and (B2e) among the reduction steps obviously opens up the possibility of infinite reduction sequences: they allow successive cuts to be permuted with one another ad infinitum. All is not lost, however, since we may hope to salvage the result by utilizing Zucker's notion of a proper reduction sequence, i.e., one without infinite repetitions. Although (B2e) may increase the number of cuts in a derivation, no proper infinite sequences can be generated by permuting cuts with one another. This can be seen from the following considerations.

Definition A.2 The power of a formula occurrence on the left of a sequent in a derivation is defined by induction on the rules:7

(1) The formula occurrence on the left of an axiom has power 1.
(2) The formulae on the left in the conclusion of an application of a right rule have the (sum of the) power(s) of their occurrence(s) in its premise(s).

(3) The passive formulae in the conclusion of an application of ∧- or ∀-left have the same power as their occurrences in its premise.

7. Recall that occurrences of the same formula with different indices are counted as different formula occurrences.


The active formula has the power of its occurrence (if any) in the premise plus the power of the formula occurrence from which it was obtained by the rule.

(4) The passive formulae in the conclusion of an application of ∃-left have the same power as their occurrences in its premise. The active formula has the power of its occurrence (if any) in the premise plus one.

(5) The passive formulae in the conclusion of an application of ∨-left have the sum of the powers of their occurrences in its premises. The active formula has the sum of the powers of its occurrences (if any) in the premises plus one.

(6) Let n be the power of the formula occurrence in the right premise of an application of →-left which is operated on by the rule. The passive formulae in the conclusion of this application have the power of their occurrences in the right premise plus n times the power of their occurrences in the left premise. The active formula in the conclusion has the power of its occurrence (if any) in the right premise plus n times the power of its occurrence (if any) in the left premise plus n.

(7) Let n be the power of the occurrence of the cut formula in the right premise of an application of cut. The formulae in the conclusion of this application have the power of their occurrences in its right premise plus n times the power of their occurrences in its left premise.

Intuitively, the power of a formula occurring on the left of the conclusion of d is the cardinality of its corresponding assumption class in φ(d).
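As a small schematic illustration of clause (7) (my own example, not taken from the text), consider a derivation d ending in a cut:

```latex
\[
\frac{\;C_m \vdash A \qquad A_j;\, C_m \vdash D\;}{C_m \vdash D}\;\text{cut}
\]
% Suppose the cut-formula occurrence A_j has power n = 1 on the left of the
% right premise, and C_m has power 1 on the left of each premise. Then by
% clause (7) the power of C_m in the conclusion is
\[
1 + n \cdot 1 \;=\; 1 + 1 \cdot 1 \;=\; 2 .
\]
```

This matches the intended reading: φ substitutes the natural deduction derivation of A for the assumption A, so the assumptions C of the left subderivation reappear once for each such substitution, giving an assumption class with two members.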

Definition A.3 The weight of an application of cut in a derivation d is defined as follows:

(1) If the last inference of d is not cut, then each cut in d has the same weight as it does in the immediate subderivation of d in which it occurs.

(2) If the last inference of d is cut and the cut formula in its right premise has power n, then the weights of the cuts in the right immediate subderivation of d are unchanged. Those in the left immediate subderivation have their weights multiplied by n, and the final cut in d is assigned n as its weight.

Again, the motivation for this definition comes from the mapping φ from LJ to NJ.
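For instance (a schematic example of my own, under Definition A.3), suppose the last inference of d is a cut whose cut formula has power n = 2 in the right premise, and that the left immediate subderivation contains a single cut of weight 3:

```latex
% Weights assigned by Definition A.3, clause (2), with n = 2:
\[
\begin{aligned}
\text{the final cut of } d:\quad & \text{weight } n = 2,\\
\text{the cut in the left subderivation}:\quad & \text{weight } 2 \cdot 3 = 6,\\
\text{cuts in the right subderivation}:\quad & \text{weights unchanged.}
\end{aligned}
\]
```

The multiplication by n records that φ duplicates the left subderivation — and with it each of its cuts — once for each member of the cut formula's assumption class.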

Let d be a derivation whose last two inferences are both applications of cut; then it is easy to check that:

(1) The use of (B1i) or (B2e) to permute these cuts leaves the powers of the formulae occurring on the left of the conclusion of d unchanged.


(2) If the final cut of d is split by an application of (B2e), and with it all the cuts in the left immediate subderivation of d, then the weight of each such cut in d is equal to the sum of the weights of the two cuts which replace it.

It follows from (2) that the weight of a cut provides a bound on the number of cuts into which it can be split by repeated applications of (B2e). Furthermore, (1) guarantees that, when cuts are split in a derivation, no cut occurring below them has its weight increased as a result. Hence the sum of the weights of the cuts occurring in a derivation fixes a bound for the derivation as a whole. It is obvious, however, that no infinite non-repeating sequence can be generated by applications of (B1i) and non-splitting applications of (B2e). We can conclude therefore:

Lemma A.4 No infinite sequence of applications of (B1i) or (B2e) is proper.8

This is in fact the only point in the discussion which depends upon the distinctive property of LJ. If more than one formula is allowed to occur on the right of a sequent, infinite non-repeating sequences are easily constructed—as the example in Chapter 7 above illustrates.

Lemma A.4 does not imply that strong cut-elimination (in the sense that every proper reduction sequence is finite) holds when the reductions in A, B and C above are augmented by steps for permuting cuts with one another. In fact, it is not clear whether Dragalin's argument can be adapted to this new situation. On the other hand, Zucker's proof for the negative fragment, which allows for such permutations, depends upon translating cut-elimination steps for LJ into the familiar normalization steps for NJ. (B2a) and (B2d) above do not translate in this way however and, although his methods can be extended to take account of (B2d),9 disjunction remains a problem. The upshot of this discussion is that strong cut-elimination does hold relative to the reduction steps described above, subject to certain restrictions. It seems likely that these restrictions—whether to fragments of LJ or on the permutation of cuts—can be removed, but at present this is just a conjecture.

8. This does not follow immediately from Zucker's results because his indexing system precludes the possibility of splitting up a cut when it is permuted upwards past another one.
9. See Chapter 3 above.

Appendix B

A Formulation of the Classical Sequent Calculus

Axioms:   Ai ⊢ Aj

Logical Rules:

Right:

        Γ ⊢ Δ; Ai    Γ' ⊢ Δ'; Bj        Γ ⊢ Δ; Aj        Γ ⊢ Δ; Bj
        ------------------------        ------------     ------------
        Γ, Γ' ⊢ Δ, Δ', A∧Bk             Γ ⊢ Δ, A∨Bk      Γ ⊢ Δ, A∨Bk

        Γ; (Ai) ⊢ Bj, Δ        Γ; Aj ⊢ Δ
        ----------------       ------------
        Γ ⊢ A→Bk, Δ            Γ ⊢ ¬Ak, Δ

        Γ ⊢ A(a)i; Δ           Γ ⊢ Δ; A(t)j
        ---------------- *     ----------------
        Γ ⊢ ∀xA(x)k, Δ         Γ ⊢ Δ, ∃xA(x)k

Left:

        Γ; Aj ⊢ Δ          Γ; Bj ⊢ Δ          Γ; (Ai) ⊢ Δ    Γ'; (Bj) ⊢ Δ'
        ------------       ------------       --------------------------
        Γ, A∧Bk ⊢ Δ        Γ, A∧Bk ⊢ Δ        Γ, Γ', A∨Bk ⊢ Δ, Δ'

        Γ ⊢ Δ; Ai    Bj; Γ' ⊢ Δ'        Γ ⊢ Ai; Δ
        ------------------------        -----------
        Γ, Γ', A→Bk ⊢ Δ, Δ'             Γ, ¬Ak ⊢ Δ

        Γ; A(t)j ⊢ Δ           Γ; (A(a)i) ⊢ Δ
        ----------------       ----------------- *
        Γ, ∀xA(x)k ⊢ Δ         Γ, ∃xA(x)k ⊢ Δ

* where a does not occur in Γ or Δ.

(An alternative possibility is to replace the rules for ¬ by axioms of the form ⊥i ⊢ Pj — where P is atomic and different from ⊥.)

Cut Rule:

        Γ ⊢ Δ; Aj    Aj; Γ' ⊢ Δ'
        ------------------------
        Γ, Γ' ⊢ Δ, Δ'


Thinning Rules:

        Right (a)              Left (a)
        Γ ⊢ Δ; Bk              Bk; Γ ⊢ Δ
        ----------------       ----------------
        Γ ⊢ Δ, Bj, Ai          Ai, Bj, Γ ⊢ Δ

        Right (b)              Left (b)
        Bk; Γ ⊢ Δ              Γ ⊢ Δ; Bk
        ----------------       ----------------
        Bj, Γ ⊢ Δ, Ai          Ai, Γ ⊢ Δ, Bj

Notation:

The notational conventions are as before; in particular:

• Γ, Δ, ... range over sets of indexed formulae, and i, j, k, ... over indices.
• I will write Γ, Δ for Γ ∪ Δ and Γ, Ai for Γ ∪ {Ai}; this notation is not intended to imply either that Γ ∩ Δ = ∅ or that Ai ∉ Γ. When I do want to indicate this, I will use Γ; Δ and Γ; Ai, respectively.

In the cut-elimination steps below,

• Γ; (Ai) will be used to denote ambiguously Γ; Ai and Γ. (In the latter case, it is assumed that Ai ∉ Γ.)
• Similarly, Γ, (Ai) will be used to denote Γ, Ai when it is left open whether or not Ai ∈ Γ.
• The notations Γ; (Δ) and Γ, (Δ) are explained in an analogous way.

As before, we can prove

Lemma B.1 If d is a derivation of Γ; Ai ⊢ Δ; Bj, then there are derivations (Ak/i)d of Γ, Ak ⊢ Δ; Bj and d(Bn/j) of Γ; Ai ⊢ Δ, Bn which differ from d only in that some formula occurrences are assigned different indices. (In particular, no cuts are introduced, and d, (Ak/i)d and d(Bn/j) all have the same length.)

Proof. A routine induction on the length of d. □

(Ak/i)d is said to be obtained from d by left contraction and d(Bn/j) by right contraction.
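For example (a one-line illustration of my own), if d consists of the single axiom Ai ⊢ Aj, then the two contractions simply reassign the indices:

```latex
\[
d \;=\; A_i \vdash A_j, \qquad
(A_k/i)d \;=\; A_k \vdash A_j, \qquad
d(A_n/j) \;=\; A_i \vdash A_n .
\]
```

Neither operation changes the underlying formulae or the length of the derivation; only the bookkeeping indices move.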

Gentzen's formulation of LK, described in Chapter 1 above, differs from the version presented here in a number of respects. In particular, he uses sequences of unindexed formulae where I have used sets of indexed ones and, as a result, is obliged to introduce interchange rules which vary the order of their terms; also, the premises of his thinning rules contain no active formula. Even the reduction steps listed below do not coincide exactly with his. These various differences arise for the most part from a difference


of interest. It seems fair to say that Gentzen was concerned only about proving a normal-form theorem in a sufficiently elementary way. On the other hand, I am interested here in the relationship between the sequent calculus and natural deduction, and between a derivation and its cut-free forms. For this reason, I have tried to formulate the calculus so that the interpretation of its derivations as instructions for constructing natural deduction ones can be extended in a reasonably natural way from LJ to LK and from a calculus without thinning to one with. I have also tried to arrange matters in such a way that the transformation of a derivation into cut-free form preserves as much of its original structure as possible. These considerations help to motivate the cut-elimination steps which follow.

A. Elimination of Trivial Cuts

B. Reducing the Complexity of Cuts

C. Permuting Cuts Upwards

The reductions in these three groups require little comment or explanation. They do not differ significantly from the steps which figure in Gentzen's proof of the Hauptsatz except that I have insisted in B on applying any necessary thinnings to the premises of the reduced cut rather than to its conclusion. This is simply to facilitate comparison with reduction in D and does not really complicate matters.

D. Splitting Up Cuts

These too are direct analogues of reductions used in Gentzen's proof (in the induction on rank, when the mix formula is introduced by the preceding inference). They could be dispensed with, as they are in Appendix A, at the cost of complicating the steps described in B, but it seems more perspicuous to list them separately. They could also be made redundant if each formula introduced into a sequent were assigned a new index, although it would then become necessary to include contraction as a basic rule and to replace the steps in D by contraction conversions of various kinds.1 The possibility of doing without contraction, while insisting that each formula must be introduced with a new index, is ruled out by the considerations in Chapter 4 above.2 It can only be realized if cut is replaced by a sort of mix rule which operates on all occurrences of the formula to be cut regardless of their indices. There is no advantage to be gained by introducing such a rule, however, since its elimination inevitably involves reduction steps analogous to those in D.

1. This is essentially the approach adopted by Zucker for LJ in his paper "On the Correspondence between Cut-Elimination and Normalization."
2. See Chapter 8 of Multiple-Conclusion Logic by Smiley and Shoesmith for an argument which in effect demonstrates the inadequacy of such a system.


E. Elimination of Cuts with a Thinned Premise

These appear in Gentzen too, although his formulation of thinning makes them slightly simpler to state.

F. Thinning Permutations

These have no analogues of any kind in Gentzen. They constitute a not altogether satisfactory solution to the problem posed by the need to associate each formula introduced by an application of thinning with an active formula in the premise of the rule. (I chose to formulate the thinning rules in this way in order to make them interpretable as operations on derivations of D. The only non-arbitrary interpretation of Gentzen's version of thinning seems to be in terms of derivations which are not necessarily connected graphs.) If the active formula in the premise of a thinning becomes (possibly after resubscripting) a cut formula in its conclusion, then it will disappear when the cut is permuted upwards, leaving the thinned formula stranded. The problem is to provide a systematic way of finding a new formula to replace the active one which is removed by the cut. The expedient I have adopted is to allow thinnings to be permuted upwards until they are applied only to axioms, at which point an obvious solution presents itself. It might be argued that this is a much more complicated procedure than the needs of cut-elimination demand. This is no doubt true, but it does seem necessary to preserve the hope of establishing an interesting connection between the various cut-free forms of a given derivation. Of course, cut-free forms are not unique, and to make them so, even in some weak sense of equivalent modulo certain permutations of inference, would require a much more conservative treatment of thinnings in the reductions in B and E.3 To handle thinnings with care only in F represents therefore a rather uneasy compromise. I know of no entirely satisfactory way to deal with these inferences, however, and hence did not feel justified in reformulating the other reduction steps.

G. Additional Thinning Permutations

Again, these have no analogues in Gentzen, nor are they needed for cut-elimination. Their only purpose is to justify the notation introduced in convention (1) below. The point of the notation itself is to minimize the arbitrariness in the use of thinnings in B by making the order in which they are applied unimportant.

Comments:

First, there is one obvious difference between Gentzen's cut-elimination steps and those listed below that I have not stressed; it is that his are written for mix rather than cut. The distinction between these two rules

3. This topic is discussed further in Chapter 7 above.


becomes a little blurred when sets replace sequences. This is because mix is needed only to deal with interchange. (I realize that it helps a little with contraction too, but one can just as easily manage without it as far as this rule is concerned.) Since interchange is not merely redundant, but makes no sense in the version of LK which I have described, the distinction between mix and cut loses some of its importance as well. Put differently, the calculus presented above can be compared to one which differs from Gentzen's in allowing the rules to operate upon any appropriate formula occurrence on the left [right] of a sequent instead of only on the leftmost [rightmost] one. Mix then becomes equivalent to cut or to a series of cuts— not to a combination of cut and the other structural rules. The cut rule, however, has one important advantage over mix from my point of view. It is that the reduction steps for the former preserve the structure of the derivation to which they are applied far better than those for the latter. (I am thinking particularly of reductions of the sort described in group C, which presumably would allow a mix to be permuted upwards past an inference whose premise contained an active occurrence of the mix formula.) Gentzen, on the other hand, seems not to have been concerned with this issue.

Second, if the alternative formulation of LK—that is, without rules for negation—is adopted, all the reduction steps pertaining to ¬-right and ¬-left can be omitted. No additional steps are made necessary by the inclusion of axioms for ⊥.

Conventions:

(1) Given Γ', Δ', I will write

        d
        Γ; Ai ⊢ Δ
        ==================
        Γ', Γ, Ak ⊢ Δ, Δ'

for the derivation of Γ', Γ, Ak ⊢ Δ, Δ' obtained from (Ak/i)d by a series of thinnings applied to Ak. (The double line marks such a series of thinnings. When using this notation, I will always choose k so that Ak ∉ Γ, Γ'.) The dual notation, with Ai on the right of the sequent instead of the left, is explained in an analogous way.4

(2) If (Bj); Γ = Γ, then both

        d                        d
        (Bj); Γ ⊢ Δ              (Bj); Γ ⊢ Δ
        -------------------      -------------------
        (Bk), (Cn), Γ ⊢ Δ        (Bk), Γ ⊢ Δ, (Cn)

denote the derivation d of Γ ⊢ Δ; otherwise they denote

        d                        d
        (Bj); Γ ⊢ Δ              (Bj); Γ ⊢ Δ
        -------------------      -------------------
        Bk, Cn, Γ ⊢ Δ            Bk, Γ ⊢ Δ, Cn

respectively. The dual notation, with Bj, Bk and Cn changing sides, is explained in an analogous way.

4. Strictly speaking, the notation introduced here is not well-defined. But, whenever it happens to denote more than one derivation, they will be obtainable from one another by the permutations in G.

(3) If (Ai); Γ' = Γ', then

        d                d'
        Γ ⊢ Δ; (Ai)      (Ai); Γ' ⊢ Δ'
        ------------------------------
        (Γ), Γ' ⊢ (Δ), Δ'

is just the derivation d' of Γ' ⊢ Δ'.

(4) The letters s, t, u and v will be reserved for subscripts that occur only where explicitly indicated in the figures of which they are a part.

(5) In A-E below, the first figure of each case reduces to the second; the permutations in F and G are symmetrical.

The Cut-Elimination Steps

A. Elimination of Trivial Cuts

(1)
        d
        Γ ⊢ Δ; Ai    Ai ⊢ Aj
        --------------------
        Γ ⊢ Δ, Aj

    reduces to

        d(Aj/i)
        Γ ⊢ Δ, Aj

(2)
        d
        Ai ⊢ Aj    Aj; Γ ⊢ Δ
        --------------------
        Ai, Γ ⊢ Δ

    reduces to

        (Ai/j)d
        Ai, Γ ⊢ Δ

B. Reducing the Complexity of Cuts

(1) a.
        d1           d2                 d3
        Γ ⊢ Δ; Ai    Γ' ⊢ Δ'; Bj        Ap; Γ'' ⊢ Δ''
        ------------------------        -----------------
        Γ, Γ' ⊢ Δ, Δ'; A∧Bk             A∧Bk; Γ'' ⊢ Δ''
        ---------------------------------------------
                 Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

        d1                   d3
        Γ ⊢ Δ; Ai            Γ''; Ap ⊢ Δ''
        ================     ==================
        Γ, Γ' ⊢ Δ; As        Γ''; As ⊢ Δ', Δ''
        -------------------------------------
                 Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    b. Like (1a) with d1 and d2 interchanged, and B∧Ak in place of A∧Bk.


(2) a.
        d1             d2              d3
        Γ ⊢ Δ; Bj      Ap; Γ' ⊢ Δ'     Bq; Γ'' ⊢ Δ''
        ------------   -----------------------------
        Γ ⊢ Δ; A∨Bk    A∨Bk; Γ', Γ'' ⊢ Δ', Δ''
        --------------------------------------
                 Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

        d1                   d3
        Γ ⊢ Δ; Bj            Γ''; Bq ⊢ Δ''
        ================     ==================
        Γ, Γ' ⊢ Δ; Bs        Γ''; Bs ⊢ Δ', Δ''
        -------------------------------------
                 Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    b. Like (2a) with d3 and d2 interchanged, and B∨Ak in place of A∨Bk.

(3) a.
        d1               d2             d3
        Γ; Ap ⊢ Δ; Bq    Γ' ⊢ Δ'; Ai    Bj; Γ'' ⊢ Δ''
        -------------    -----------------------------
        Γ ⊢ Δ; A→Bk      A→Bk; Γ', Γ'' ⊢ Δ', Δ''
        ---------------------------------------
                 Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

        d2(As/i)       (As/p)d1(Bt/q)
        Γ' ⊢ Δ'; As    As; Γ ⊢ Δ; Bt
        ----------------------------
        Γ, Γ' ⊢ Δ, Δ'; Bt               (Bt/j)d3
                                        Bt; Γ'' ⊢ Δ''
        -------------------------------------------
                 Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    b.
        d1             d2             d3
        Γ ⊢ Δ; Bq      Γ' ⊢ Δ'; Ai    Bj; Γ'' ⊢ Δ''
        ------------   -----------------------------
        Γ ⊢ Δ; A→Bk    A→Bk; Γ', Γ'' ⊢ Δ', Δ''
        --------------------------------------
                 Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

        d1                   d3
        Γ ⊢ Δ; Bq            Γ''; Bj ⊢ Δ''
        ================     ==================
        Γ, Γ' ⊢ Δ; Bs        Γ''; Bs ⊢ Δ', Δ''
        -------------------------------------
                 Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

(4)
        d1                  d2
        Γ ⊢ Δ; A(a)i        A(t)p; Γ' ⊢ Δ'
        ----------------    -----------------
        Γ ⊢ Δ; ∀xA(x)k      ∀xA(x)k; Γ' ⊢ Δ'
        ---------------------------------
                 Γ, Γ' ⊢ Δ, Δ'

    reduces to

        d'(t)               (A(t)s/p)d2
        Γ ⊢ Δ; A(t)s        A(t)s; Γ' ⊢ Δ'
        ---------------------------------
                 Γ, Γ' ⊢ Δ, Δ'

    where d' = d1(A(a)s/i)

(5)
        d1                  d2(a)
        Γ ⊢ Δ; A(t)j        A(a)p; Γ' ⊢ Δ'
        ----------------    -----------------
        Γ ⊢ Δ; ∃xA(x)k      ∃xA(x)k; Γ' ⊢ Δ'
        ---------------------------------
                 Γ, Γ' ⊢ Δ, Δ'

    reduces to

        d1(A(t)s/j)         d''(t)
        Γ ⊢ Δ; A(t)s        A(t)s; Γ' ⊢ Δ'
        ---------------------------------
                 Γ, Γ' ⊢ Δ, Δ'

    where d'' = (A(a)s/p)d2

(6)
        d1             d2
        Ai; Γ ⊢ Δ      Γ' ⊢ Δ'; Ap
        ------------   -------------
        Γ ⊢ Δ; ¬Ak     ¬Ak; Γ' ⊢ Δ'
        ---------------------------
              Γ, Γ' ⊢ Δ, Δ'

    reduces to

        d2(As/p)       (As/i)d1
        Γ' ⊢ Δ'; As    As; Γ ⊢ Δ
        ------------------------
              Γ, Γ' ⊢ Δ, Δ'

C. Permuting Cuts Upwards

Cut-formula passive on the right:

(1)
            d2                       d3
        (Ai); Γ' ⊢ Δ'; Bj        (Ai); Γ'' ⊢ Δ''; Ck
        --------------------------------------------
        d1           Ai; Γ', Γ'' ⊢ Δ', Δ'', B∧Cm
        Γ ⊢ Δ; Ai
        ----------------------------------------
             Γ, Γ', Γ'' ⊢ Δ, Δ', Δ'', B∧Cm

    reduces to

        d1           d2(Bs/j)               d1           d3(Ct/k)
        Γ ⊢ Δ; Ai    (Ai); Γ' ⊢ Δ'; Bs      Γ ⊢ Δ; Ai    (Ai); Γ'' ⊢ Δ''; Ct
        ------------------------------      --------------------------------
        (Γ), Γ' ⊢ (Δ), Δ'; Bs               (Γ), Γ'' ⊢ (Δ), Δ''; Ct
        -----------------------------------------------------------
                  Γ, Γ', Γ'' ⊢ Δ, Δ', Δ'', B∧Cm

(2)
            d2                       d3
        (Ai); Bj; Γ' ⊢ Δ'        (Ai); Ck; Γ'' ⊢ Δ''
        --------------------------------------------
        d1           Ai; B∨Cm, Γ', Γ'' ⊢ Δ', Δ''
        Γ ⊢ Δ; Ai
        ----------------------------------------
             B∨Cm, Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

        d1           (Bs/j)d2               d1           (Ct/k)d3
        Γ ⊢ Δ; Ai    (Ai); Bs; Γ' ⊢ Δ'      Γ ⊢ Δ; Ai    (Ai); Ct; Γ'' ⊢ Δ''
        ------------------------------      --------------------------------
        Bs; (Γ), Γ' ⊢ (Δ), Δ'               Ct; (Γ), Γ'' ⊢ (Δ), Δ''
        -----------------------------------------------------------
                  B∨Cm, Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

(3)
            d2                       d3
        (Ai); Γ' ⊢ Δ'; Bj        (Ai); Ck; Γ'' ⊢ Δ''
        --------------------------------------------
        d1           Ai; B→Cm, Γ', Γ'' ⊢ Δ', Δ''
        Γ ⊢ Δ; Ai
        ----------------------------------------
             B→Cm, Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

        d1           d2(Bs/j)               d1           (Ct/k)d3
        Γ ⊢ Δ; Ai    (Ai); Γ' ⊢ Δ'; Bs      Γ ⊢ Δ; Ai    (Ai); Ct; Γ'' ⊢ Δ''
        ------------------------------      --------------------------------
        (Γ), Γ' ⊢ (Δ), Δ'; Bs               Ct; (Γ), Γ'' ⊢ (Δ), Δ''
        -----------------------------------------------------------
                  B→Cm, Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''


(4)
            d2                       d3
        (Ai); Γ' ⊢ Δ'; Bj        (Ai); Bj; Γ'' ⊢ Δ''
        --------------------------------------------
        d1           Ai; Γ', Γ'' ⊢ Δ', Δ''
        Γ ⊢ Δ; Ai
        ----------------------------------
             Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

        d1           d2(Bs/j)               d1           (Bs/j)d3
        Γ ⊢ Δ; Ai    (Ai); Γ' ⊢ Δ'; Bs      Γ ⊢ Δ; Ai    (Ai); Bs; Γ'' ⊢ Δ''
        ------------------------------      --------------------------------
        (Γ), Γ' ⊢ (Δ), Δ'; Bs               Bs; (Γ), Γ'' ⊢ (Δ), Δ''
        -----------------------------------------------------------
                  Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

(5)
                     d2
        d1           Ai; Γ' ⊢ Δ'; Bj
        Γ ⊢ Δ; Ai    ---------------- R
                     Ai; Γ'' ⊢ Δ''
        --------------------------
             Γ, Γ'' ⊢ Δ, Δ''

    reduces to

        d1           d2(Bs/j)
        Γ ⊢ Δ; Ai    Ai; Γ' ⊢ Δ'; Bs
        ----------------------------
        Γ, Γ' ⊢ Δ, Δ'; Bs
        ----------------- R
        Γ, Γ'' ⊢ Δ, Δ''

    where R is ∨-right, ∃-right, right thinning (a), left thinning (b) or ¬-left (applied to Bj in the first figure, and to Bs in the second).

(6)
                     d2
        d1           Ai; Bj; Γ' ⊢ Δ'
        Γ ⊢ Δ; Ai    ---------------- R
                     Ai; Γ'' ⊢ Δ''
        --------------------------
             Γ, Γ'' ⊢ Δ, Δ''

    reduces to

        d1           (Bs/j)d2
        Γ ⊢ Δ; Ai    Ai; Bs; Γ' ⊢ Δ'
        ----------------------------
        Bs; Γ, Γ' ⊢ Δ, Δ'
        ----------------- R
        Γ, Γ'' ⊢ Δ, Δ''

    where R is ∧-left, ∀-left, left thinning (a), right thinning (b) or ¬-right (applied to Bj in the first figure, and to Bs in the second).

(7)
                     d2
        d1           Ai; Bj; Γ' ⊢ Ck; Δ'
        Γ ⊢ Δ; Ai    -------------------- →-right
                     Ai; Γ' ⊢ B→Cm, Δ'
        ------------------------------
           Γ, Γ' ⊢ B→Cm, Δ, Δ'

    reduces to

        d1           (Bs/j)d2(Ct/k)
        Γ ⊢ Δ; Ai    Ai; Bs; Γ' ⊢ Ct; Δ'
        --------------------------------
        Bs; Γ, Γ' ⊢ Ct; Δ, Δ'
        ---------------------- →-right
        Γ, Γ' ⊢ B→Cm, Δ, Δ'

(8)
                     d2(a)
        d1           Ai; B(a)j; Γ' ⊢ Δ'
        Γ ⊢ Δ; Ai    ---------------------- ∃-left
                     Ai; ∃xB(x)m, Γ' ⊢ Δ'
        ---------------------------------
           ∃xB(x)m, Γ, Γ' ⊢ Δ, Δ'

    reduces to

        d1           d2(b)
        Γ ⊢ Δ; Ai    Ai; B(b)j; Γ' ⊢ Δ'
        -------------------------------
        B(b)j; Γ, Γ' ⊢ Δ, Δ'
        ------------------------ ∃-left
        ∃xB(x)m, Γ, Γ' ⊢ Δ, Δ'

    where b occurs nowhere in d1 or d2.


(9)
                     d2(a)
        d1           Ai; Γ' ⊢ B(a)j, Δ'
        Γ ⊢ Δ; Ai    ---------------------- ∀-right
                     Ai; Γ' ⊢ ∀xB(x)k, Δ'
        ---------------------------------
           Γ, Γ' ⊢ ∀xB(x)k, Δ, Δ'

    reduces to

        d1           d2(b)
        Γ ⊢ Δ; Ai    Ai; Γ' ⊢ B(b)j, Δ'
        -------------------------------
        Γ, Γ' ⊢ B(b)j, Δ, Δ'
        ------------------------ ∀-right
        Γ, Γ' ⊢ ∀xB(x)k, Δ, Δ'

    where b occurs nowhere in d1 or d2.

Cut-formula passive on the left:

(10)-(18). These are just the duals of (1)-(9) above. In each case, put Ai on the left hand side of the conclusion of d1 and on the right hand side[s] of the conclusion[s] of d2 [and d3], then rearrange the premises of the cut accordingly. By way of illustration, I have written out case (14) below. The remaining ones are left to the reader.

(14)
        d2
        Γ' ⊢ Δ'; Bj; Ai
        ---------------- R        d1
        Γ'' ⊢ Δ''; Ai             Ai; Γ ⊢ Δ
        ----------------------------------
             Γ, Γ'' ⊢ Δ, Δ''

    reduces to

        d2(Bs/j)                  d1
        Γ' ⊢ Δ'; Bs; Ai           Ai; Γ ⊢ Δ
        ----------------------------------
        Γ, Γ' ⊢ Δ, Δ'; Bs
        ----------------- R
        Γ, Γ'' ⊢ Δ, Δ''

    where R is ∨-right, ∃-right, right thinning (a), left thinning (b) or ¬-left (applied to Bj in the first figure, and to Bs in the second).

D. Splitting Up Cuts

Cut-formula active on the right

(1)
                       d2                        d3
                       Ai; (A∨Bm); Γ' ⊢ Δ'       Bj; (A∨Bm); Γ'' ⊢ Δ''
        d1             ----------------------------------------------- ∨-left
        Γ ⊢ Δ; A∨Bm    A∨Bm, A∨Bm; Γ', Γ'' ⊢ Δ', Δ''
        ----------------------------------------
                Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

            d2                        d3
        Ai; (A∨Bm); Γ' ⊢ Δ'       Bj; (A∨Bm); Γ'' ⊢ Δ''
        ----------------------------------------------- ∨-left
        d1             A∨Bs; A∨Bm; Γ', Γ'' ⊢ Δ', Δ''
        Γ ⊢ Δ; A∨Bm
        ---------------------------------------------
        d1(A∨Bs/m)     A∨Bs; Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''
        Γ ⊢ Δ; A∨Bs
        ---------------------------------------------
                Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

(2)
                       d2                        d3
                       (A→Bm); Γ' ⊢ Δ'; Aj       Bj; (A→Bm); Γ'' ⊢ Δ''
        d1             ----------------------------------------------- →-left
        Γ ⊢ Δ; A→Bm    A→Bm, A→Bm; Γ', Γ'' ⊢ Δ', Δ''
        ----------------------------------------
                Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

            d2                        d3
        (A→Bm); Γ' ⊢ Δ'; Aj       Bj; (A→Bm); Γ'' ⊢ Δ''
        ----------------------------------------------- →-left
        d1             A→Bs; A→Bm; Γ', Γ'' ⊢ Δ', Δ''
        Γ ⊢ Δ; A→Bm
        ---------------------------------------------
        d1(A→Bs/m)     A→Bs; Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''
        Γ ⊢ Δ; A→Bs
        ---------------------------------------------
                Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

(3)
                     d2
        d1           Ai; Γ' ⊢ Δ'
        Γ ⊢ Δ; Ai    ------------- R
                     Ai; Γ'' ⊢ Δ''
        --------------------------
             Γ, Γ'' ⊢ Δ, Δ''

    reduces to

                     d2
                     Ai; Γ' ⊢ Δ'
        d1           ------------------ R
        Γ ⊢ Δ; Ai    Ai; As; Γ'' ⊢ Δ''
        ------------------------------
        d1(As/i)     As; Γ, Γ'' ⊢ Δ, Δ''
        Γ ⊢ Δ; As
        --------------------------------
             Γ, Γ'' ⊢ Δ, Δ''

    where R is ∧-, ∀-, →-left, left thinning (a) or (b)—provided that, if R is left thinning (a), Ai is not the active formula in its premise.

(4)
                     d2
        d1           Ak; Γ' ⊢ Δ'
        Γ ⊢ Δ; Ai    ----------------- LT(a)
                     Ai, Ai; Γ' ⊢ Δ'
        ----------------------------
             Γ, Γ' ⊢ Δ, Δ'

    reduces to

                     d2
                     Ak; Γ' ⊢ Δ'
        d1           ---------------- LT(a)
        Γ ⊢ Δ; Ai    Ai; As; Γ' ⊢ Δ'
        ----------------------------
        d1(As/i)     As; Γ, Γ' ⊢ Δ, Δ'
        Γ ⊢ Δ; As
        ------------------------------
             Γ, Γ' ⊢ Δ, Δ'

(5)
                          d2(a)
        d1                A(a)n; ∃xA(x)j; Γ' ⊢ Δ'
        Γ ⊢ Δ; ∃xA(x)j    --------------------------- ∃-left
                          ∃xA(x)j, ∃xA(x)j; Γ' ⊢ Δ'
        -----------------------------------------
                     Γ, Γ' ⊢ Δ, Δ'

    reduces to

                          d2(a)
                          A(a)n; ∃xA(x)j; Γ' ⊢ Δ'
        d1                --------------------------- ∃-left
        Γ ⊢ Δ; ∃xA(x)j    ∃xA(x)s; ∃xA(x)j; Γ' ⊢ Δ'
        -------------------------------------------
        d1(∃xA(x)s/j)     ∃xA(x)s; Γ, Γ' ⊢ Δ, Δ'
        Γ ⊢ Δ; ∃xA(x)s
        -------------------------------------------
                     Γ, Γ' ⊢ Δ, Δ'


Cut-formula active on the left

(6)
        d1                       d2
        Γ ⊢ Δ; (A∧Bm); Ai        Γ' ⊢ Δ'; (A∧Bm); Bj         d3
        -------------------------------------------- ∧-right
        Γ, Γ' ⊢ Δ, Δ'; A∧Bm, A∧Bm                            A∧Bm; Γ'' ⊢ Δ''
        ----------------------------------------------------
                Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

    reduces to

        d1                       d2
        Γ ⊢ Δ; (A∧Bm); Ai        Γ' ⊢ Δ'; (A∧Bm); Bj
        -------------------------------------------- ∧-right
        Γ, Γ' ⊢ Δ, Δ'; A∧Bm; A∧Bs             d3
                                              A∧Bm; Γ'' ⊢ Δ''
        -----------------------------------------------------
        Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''; A∧Bs         (A∧Bs/m)d3
                                              A∧Bs; Γ'' ⊢ Δ''
        -----------------------------------------------------
                Γ, Γ', Γ'' ⊢ Δ, Δ', Δ''

(7)
        d1
        Γ' ⊢ Δ'; Ai
        -------------- R          d2
        Γ'' ⊢ Δ''; Ai             Ai; Γ ⊢ Δ
        ----------------------------------
             Γ'', Γ ⊢ Δ'', Δ

    reduces to

        d1
        Γ' ⊢ Δ'; Ai
        ------------------ R          d2
        Γ'' ⊢ Δ''; Ai; As             Ai; Γ ⊢ Δ
        --------------------------------------
        Γ'', Γ ⊢ Δ'', Δ; As           (As/i)d2
                                      As; Γ ⊢ Δ
        --------------------------------------
             Γ'', Γ ⊢ Δ'', Δ

    where R is ∨-, →-, ∃-, ¬-right, right thinning (a) or (b)—provided that, if R is right thinning (a), Ai is not the active formula in its premise.

(8)
        d1
        Γ ⊢ Δ; Aj
        --------------- RT(a)        d2
        Γ ⊢ Δ; Ak, Ak                Ak; Γ' ⊢ Δ'
        ---------------------------------------
             Γ, Γ' ⊢ Δ, Δ'

    reduces to

        d1
        Γ ⊢ Δ; Aj
        ----------------- RT(a)      d2
        Γ ⊢ Δ; Ak; As                Ak; Γ' ⊢ Δ'
        ---------------------------------------
        Γ, Γ' ⊢ Δ, Δ'; As            (As/k)d2
                                     As; Γ' ⊢ Δ'
        ---------------------------------------
             Γ, Γ' ⊢ Δ, Δ'


(9)
        d1(a)
        Γ ⊢ Δ; ∀xA(x)j; A(a)n
        -------------------------- ∀-right    d2
        Γ ⊢ Δ; ∀xA(x)j, ∀xA(x)j               ∀xA(x)j; Γ' ⊢ Δ'
        ------------------------------------------------------
                     Γ, Γ' ⊢ Δ, Δ'

    reduces to

        d1(a)
        Γ ⊢ Δ; ∀xA(x)j; A(a)n
        -------------------------- ∀-right    d2
        Γ ⊢ Δ; ∀xA(x)s; ∀xA(x)j               ∀xA(x)j; Γ' ⊢ Δ'
        ------------------------------------------------------
        Γ, Γ' ⊢ Δ, Δ'; ∀xA(x)s                (∀xA(x)s/j)d2
                                              ∀xA(x)s; Γ' ⊢ Δ'
        ------------------------------------------------------
                     Γ, Γ' ⊢ Δ, Δ'

E. Elimination of Cuts with a Thinned Premise

(1)
        d1
        Γ ⊢ Δ; Ai
        ---------------          d2
        Γ ⊢ Δ, Aj; Bk            Bk; Γ' ⊢ Δ'
        ----------------------------------
           Γ, Γ' ⊢ Δ, Δ', Aj

    reduces to

        d'(Aj/s)
        Γ, Γ' ⊢ Δ, Δ', Aj

    where d' =

        d1
        Γ ⊢ Δ; Ai
        ====================
        Γ, Γ' ⊢ Δ, Δ'; As

(2) is like (1) except with Ai, Aj and As on the left.

(3)
                       d2
                       Γ'; Ai ⊢ Δ'
        d1             ----------------
        Γ ⊢ Δ; Bk      Bk; Γ', Aj ⊢ Δ'
        ------------------------------
           Γ, Γ', Aj ⊢ Δ, Δ'

    reduces to

        (Aj/s)d'
        Γ, Γ', Aj ⊢ Δ, Δ'

    where d' =

        d2
        Γ'; Ai ⊢ Δ'
        ====================
        Γ, Γ'; As ⊢ Δ, Δ'

(4) is like (3) except with Ai, Aj and As on the right.

F. Thinning Permutations

Premise of the thinning passive in the preceding inference

Left Thinning (a)

(1) a.
        Ai ⊢ Aj                  Ak ⊢ Aj
        --------------           --------------
        Ak, Bn ⊢ Aj              Ak, Bn ⊢ Aj

4 r h A A<;ri- A ii;r"HA" 4 C t , r h A „

^,c*fc,r"i-A" Aj,ck,r"\-A"K

A Formulation of the Classical Sequent Calculus 213

where R is -i-left, V-, 3-, V-right, right thinning (a) or left thinning (6), provided that, if R is V-right and the proper parameter of R occurs in C, it is replaced by a parameter which does not occur in the figure on the right.

    c.
        d                          (Bs/n)d
        Ai; Γ; Bn ⊢ Δ              Ai; Γ; Bs ⊢ Δ
        --------------- R          ------------------- LT(a)
        Ai; Γ'' ⊢ Δ''              Aj, Ck, Γ; Bs ⊢ Δ
        ----------------- LT(a)    ------------------- R
        Aj, Ck, Γ'' ⊢ Δ''          Aj, Ck, Γ'' ⊢ Δ''

    where R is ¬-, →-right, ∧-, ∃-, ∀-left, right thinning (b) or left thinning (a), provided that, if R is ∃-left and the proper parameter of R occurs in C, it is replaced by a parameter which does not occur in the figure on the right. (Bn is supposed to be a premise of R in d, and Bs in (Bs/n)d.)

    d.
        d1                     d2
        (Ai); Γ ⊢ Δ; Ep        (Ai); Γ' ⊢ Δ'; Fq
        ----------------------------------------- ∧-right
        Ai; Γ, Γ' ⊢ Δ, Δ', E∧Fm
        ----------------------------- LT(a)
        Aj, Ck, Γ, Γ' ⊢ Δ, Δ', E∧Fm

    permutes to

        d1                           d2
        (Ai); Γ ⊢ Δ; Ep              (Ai); Γ' ⊢ Δ'; Fq
        -----------------------      ------------------------
        (Aj), (Ck), Γ ⊢ Δ; Ep        (Aj), (Ck), Γ' ⊢ Δ'; Fq
        ----------------------------------------------------- ∧-right
        Aj, Ck, Γ, Γ' ⊢ Δ, Δ', E∧Fm

    e.
        d1                     d2
        (Ai); Γ; Ep ⊢ Δ        (Ai); Γ'; Fq ⊢ Δ'
        ----------------------------------------- ∨-left
        Ai; Γ, Γ', E∨Fm ⊢ Δ, Δ'
        ----------------------------- LT(a)
        Aj, Ck, Γ, Γ', E∨Fm ⊢ Δ, Δ'

    permutes to

        (Es/p)d1                     (Ft/q)d2
        (Ai); Γ; Es ⊢ Δ              (Ai); Γ'; Ft ⊢ Δ'
        -----------------------      ------------------------
        (Aj), (Ck), Γ; Es ⊢ Δ        (Aj), (Ck), Γ'; Ft ⊢ Δ'
        ----------------------------------------------------- ∨-left
        Aj, Ck, Γ, Γ', E∨Fm ⊢ Δ, Δ'

    f.
        d1                     d2
        (Ai); Γ ⊢ Δ; Ep        (Ai); Γ'; Fq ⊢ Δ'
        ----------------------------------------- R
        Ai; Γ'' ⊢ Δ, Δ'
        ----------------------------- LT(a)
        Aj, Ck, Γ'' ⊢ Δ, Δ'

    permutes to

        d1(Es/p)                     (Ft/q)d2
        (Ai); Γ ⊢ Δ; Es              (Ai); Γ'; Ft ⊢ Δ'
        -----------------------      ------------------------
        (Aj), (Ck), Γ ⊢ Δ; Es        (Aj), (Ck), Γ'; Ft ⊢ Δ'
        ----------------------------------------------------- R
        Aj, Ck, Γ'' ⊢ Δ, Δ'

    where R is →-left or cut, with premises Ep and Fq in the first figure, and Es and Ft in the second. If R is cut, I assume that s = t. (Notice that the possibility of Fq being equal to Ai has not been excluded.)


(1b-f) above simply state that an application of left thinning (a) the active formula of whose premise is passive in the conclusion of the preceding inference can be permuted upwards past that inference. Ignoring complications which arise from the need to keep the active formulae in the premises of the inference distinct from the active formulae in the conclusion of the thinning (and, in case the inference is an application of ∀-right or ∃-left, from the need to ensure that the proper parameter does not occur in the conclusion of the thinning), there are really only two cases to consider according to whether the inference has one or two premises. Having shown how these complications are handled in the case of left thinning (a), there is no reason to give a similarly detailed treatment of the remaining thinning rules since no new problems arise.

So, given

        d
        Γ ⊢ Δ
        -------- I
        Γ' ⊢ Δ'

where I is an application of any one premise rule, let the derivation dI be obtained from d by applying whatever contractions may be necessary to preserve the distinctions mentioned above and, in case I is an application of ∀-right or ∃-left, by possibly replacing its proper parameter. (It is convenient to write the conclusion of dI as Γ ⊢ Δ, even though it may differ from the conclusion of d in minor respects.) If I is an application of a two premise rule which operates on the conclusions of d and d', the derivations dI and d'I are explained similarly. With the help of this notation the remaining cases can be presented in a simple and uniform way.

Left Thinning (b)

(2) a.
        Ai ⊢ Aj                  Ai ⊢ Ak
        --------------           --------------
        Bn, Ai ⊢ Ak              Bn, Ai ⊢ Ak

    b.
        d                          dI
        Γ ⊢ Δ; Ai                  Γ ⊢ Δ; Ai
        --------------- I          ------------------- LT(b)
        Γ' ⊢ Δ'; Ai                (Ck), Γ ⊢ Δ, (Aj)
        ----------------- LT(b)    ------------------- I
        Ck, Γ' ⊢ Δ', Aj            Ck, Γ' ⊢ Δ', Aj

    c.
        d              d'
        Γ ⊢ Δ; (Ai)    Γ' ⊢ Δ'; (Ai)
        ---------------------------- I
        Γ'' ⊢ Δ''; Ai
        ------------------ LT(b)
        Ck, Γ'' ⊢ Δ'', Aj

    permutes to

        dI                    d'I
        Γ ⊢ Δ; (Ai)           Γ' ⊢ Δ'; (Ai)
        -------------------   --------------------
        (Ck), Γ ⊢ Δ, (Aj)     (Ck), Γ' ⊢ Δ', (Aj)
        ----------------------------------------- I
        Ck, Γ'' ⊢ Δ'', Aj


Right Thinning (a)

(3) a.
        Ai ⊢ Aj                  Ai ⊢ Ak
        --------------           --------------
        Ai ⊢ Ak, Bn              Ai ⊢ Ak, Bn

    b. Like (2b), but with Ck on the right.
    c. Like (2c), but with Ck on the right.

Right Thinning (b)

(4) a.
        Ai ⊢ Aj                  Ak ⊢ Aj
        --------------           --------------
        Ak ⊢ Aj, Bn              Ak ⊢ Aj, Bn

    b. Like (3b), but with Ai and Aj on the left.
    c. Like (3c), but with Ai and Aj on the left.

Premise of the thinning active in the preceding inference

(5) a.
        d                          (As/i)d
        Ai; (Bj); Γ ⊢ Δ            As; (Bj); Γ ⊢ Δ
        ---------------- R         ----------------------
        Bj; Γ ⊢ Δ                  As; (Bk), (Cn), Γ ⊢ Δ
        ---------------- LT(a)     ----------------------
        Bk, Cn, Γ ⊢ Δ              At; (Bk), Cn, Γ ⊢ Δ
                                   ---------------------- R
                                   Bk, Cn, Γ ⊢ Δ

    where R is ∧-, ∀- or ∃-left. In this last case, if the proper parameter of R occurs in C, it must be replaced in (As/i)d by one which does not.

    b. Like (5a) with Cn on the right.

(6) a. Like (6b) below with Cn on the left.

    b.
        d                          d(As/i)
        Γ ⊢ Ai; Δ                  Γ ⊢ As; Δ
        ---------------- ¬-left    --------------------------
        ¬Aj; Γ ⊢ Δ                 (¬Ak), Γ ⊢ As; Δ, (Cn)
        ------------------         --------------------------
        ¬Ak, Γ ⊢ Δ, Cn             (¬Ak), Γ ⊢ At; Δ, Cn
                                   -------------------------- ¬-left
                                   ¬Ak, Γ ⊢ Δ, Cn

(7) a. i.
        d1                       d2
        Ai; (A∨Bm); Γ ⊢ Δ        Bj; (A∨Bm); Γ' ⊢ Δ'
        ------------------------------------------- ∨-left
        A∨Bm; Γ, Γ' ⊢ Δ, Δ'
        --------------------------
        A∨Bk, Cn, Γ, Γ' ⊢ Δ, Δ'

    permutes to

        (As/i)d1                       (Bt/j)d2
        As; (A∨Bm); Γ ⊢ Δ              Bt; (A∨Bm); Γ' ⊢ Δ'
        -------------------------      --------------------------
        As; (A∨Bk), (Cn), Γ ⊢ Δ        Bt; (A∨Bk), (Cn), Γ' ⊢ Δ'
        -------------------------
        Au; (A∨Bk), Cn, Γ ⊢ Δ
        --------------------------------------------------- ∨-left
        A∨Bk, Cn, Γ, Γ' ⊢ Δ, Δ'

    ii.
        d1                       d2
        Ai; (A∨Bm); Γ ⊢ Δ        Bj; (A∨Bm); Γ' ⊢ Δ'
        ------------------------------------------- ∨-left
        A∨Bm; Γ, Γ' ⊢ Δ, Δ'
        --------------------------
        A∨Bk, Cn, Γ, Γ' ⊢ Δ, Δ'

    permutes to

        (As/i)d1                       (Bt/j)d2
        As; (A∨Bm); Γ ⊢ Δ              Bt; (A∨Bm); Γ' ⊢ Δ'
        -------------------------      --------------------------
        As; (A∨Bk), (Cn), Γ ⊢ Δ        Bt; (A∨Bk), (Cn), Γ' ⊢ Δ'
                                       --------------------------
                                       Bv; (A∨Bk), Cn, Γ' ⊢ Δ'
        --------------------------------------------------- ∨-left
        A∨Bk, Cn, Γ, Γ' ⊢ Δ, Δ'

    b. Like (7ai) and (7aii) with Cn on the right.

(8) a. i.
        d1                       d2
        (A→Bm); Γ ⊢ Δ; Ai        Bj; (A→Bm); Γ' ⊢ Δ'
        -------------------------------------------- →-left
        A→Bm; Γ, Γ' ⊢ Δ, Δ'
        --------------------------
        A→Bk, Cn, Γ, Γ' ⊢ Δ, Δ'

    permutes to

        d1                             (Bt/j)d2
        (A→Bm); Γ ⊢ Δ; Ai              Bt; (A→Bm); Γ' ⊢ Δ'
        -------------------------      --------------------------
        (A→Bk), (Cn), Γ ⊢ Δ; Ai        Bt; (A→Bk), (Cn), Γ' ⊢ Δ'
        -------------------------
        (A→Bk), Cn, Γ ⊢ Δ; Au
        --------------------------------------------------- →-left
        A→Bk, Cn, Γ, Γ' ⊢ Δ, Δ'

    ii.
        d1                       d2
        (A→Bm); Γ ⊢ Δ; Ai        Bj; (A→Bm); Γ' ⊢ Δ'
        -------------------------------------------- →-left
        A→Bm; Γ, Γ' ⊢ Δ, Δ'
        --------------------------
        A→Bk, Cn, Γ, Γ' ⊢ Δ, Δ'

    permutes to

        d1                             (Bt/j)d2
        (A→Bm); Γ ⊢ Δ; Ai              Bt; (A→Bm); Γ' ⊢ Δ'
        -------------------------      --------------------------
        (A→Bk), (Cn), Γ ⊢ Δ; Ai        Bt; (A→Bk), (Cn), Γ' ⊢ Δ'
                                       --------------------------
                                       Bv; (A→Bk), Cn, Γ' ⊢ Δ'
        --------------------------------------------------- →-left
        A→Bk, Cn, Γ, Γ' ⊢ Δ, Δ'

    b. Like (8ai) and (8aii) with Cn on the right—except that, if Ai = Cn, d1 in the second figures must be replaced by d1(As/i) whenever (A→Bm); Γ ≠ Γ.

(9) i.
        d                          (As/i)d
        (Bm); Γ; Ai ⊢ Δ            (Bm); Γ; As ⊢ Δ
        ------------------         -----------------------
        Bm; Γ, Aj ⊢ Δ              (Bk), (Cn), Γ; As ⊢ Δ
        ------------------         -----------------------
        Bk, Cn, Γ, Aj ⊢ Δ          (Bk), Cn, Γ; At ⊢ Δ
                                   -----------------------
                                   Bk, Cn, Γ, Aj ⊢ Δ

    ii.
        d                          d
        (Am); Γ; Ai ⊢ Δ            (Am); Γ; Ai ⊢ Δ
        ------------------         -------------------
        Am; Γ ⊢ Δ                  Am; Γ; As ⊢ Δ
        ------------------         -------------------
        Ak, Cn, Γ ⊢ Δ              Ak, Cn, Γ; As ⊢ Δ
                                   -------------------
                                   Ak, Cn, Γ ⊢ Δ

    a. Like (9i) and (9ii) with Cn on the right.

(10) a. Like (10b) below, with Cn on the left.

    b.
        d                          d(As/i)
        (Bm); Γ ⊢ Δ; Ai            (Bm); Γ ⊢ Δ; As
        ------------------         -----------------------
        Bm; Γ ⊢ Δ, Aj              (Bk), Γ ⊢ Δ, (Cn); As
        ------------------         -----------------------
        Bk, Γ ⊢ Δ, Aj, Cn          (Bk), Γ ⊢ Δ, (Cn); At
                                   -----------------------
                                   Bk, Γ ⊢ Δ, Aj, Cn

(11) a.
        d                          d(As/i)
        Γ ⊢ Δ; (Bm); Ai            Γ ⊢ Δ; (Bm); As
        ------------------ R       -----------------------
        Γ ⊢ Δ; Bm                  Γ ⊢ Δ, (Bk), (Cn); As
        ------------------         -----------------------
        Γ ⊢ Δ, Bk, Cn              Γ ⊢ Δ, (Bk), Cn; At
                                   ----------------------- R
                                   Γ ⊢ Δ, Bk, Cn

    where R is ∨-, ∃- or ∀-right. In this last case the proper parameter of R, if it occurs in C, must be replaced in d(As/i) by one which does not.

    b. Like (11a) with Cn on the left.

(12) a.
        d                                  d(Bt/j)
        Γ; Ai ⊢ Bj; (A→Bm); Δ              Γ; Ai ⊢ Bt; (A→Bm); Δ
        ---------------------- →-right     -----------------------------
        Γ ⊢ A→Bm; Δ                        Γ; Ai ⊢ Bt; (A→Bk), (Cn), Δ
        ----------------------             -----------------------------
        Γ ⊢ A→Bk, Cn, Δ                    Γ; Ai ⊢ Bu; (A→Bk), Cn, Δ
                                           ----------------------------- →-right
                                           Γ ⊢ A→Bk, Cn, Δ

    b. Like (12a) with Cn on the left and (As/i)d instead of d(Bt/j).

(13) a. Like (13b) below with Cn on the right.

    b.
        d                            (As/i)d
        Γ; Ai ⊢ (¬Aj); Δ             Γ; As ⊢ (¬Aj); Δ
        ------------------ ¬-right   ------------------------
        Γ ⊢ ¬Aj; Δ                   Γ, (Cm); As ⊢ (¬Ak), Δ
        ------------------           ------------------------
        Γ, Cm ⊢ ¬Ak, Δ               Γ, Cm; At ⊢ (¬Ak), Δ
                                     ------------------------ ¬-right
                                     Γ, Cm ⊢ ¬Ak, Δ

(14) i.
        d1                       d2
        Γ ⊢ Δ; (A∧Bm); Ai        Γ' ⊢ Δ'; (A∧Bm); Bj
        -------------------------------------------- ∧-right
        Γ, Γ' ⊢ Δ, Δ'; A∧Bm
        ----------------------------
        Γ, Γ' ⊢ Δ, Δ', A∧Bk, Cn

    permutes to

        d1(As/i)                       d2(Bt/j)
        Γ ⊢ Δ; (A∧Bm); As              Γ' ⊢ Δ'; (A∧Bm); Bt
        --------------------------     ---------------------------
        Γ ⊢ Δ, (A∧Bk), (Cn); As        Γ' ⊢ Δ', (A∧Bk), (Cn); Bt
        --------------------------
        Γ ⊢ Δ, (A∧Bk), Cn; Au
        ------------------------------------------------------ ∧-right
        Γ, Γ' ⊢ Δ, Δ', A∧Bk, Cn

    ii.
        d1                       d2
        Γ ⊢ Δ; (A∧Bm); Ai        Γ' ⊢ Δ'; (A∧Bm); Bj
        -------------------------------------------- ∧-right
        Γ, Γ' ⊢ Δ, Δ'; A∧Bm
        ----------------------------
        Γ, Γ' ⊢ Δ, Δ', A∧Bk, Cn

    permutes to

        d1(As/i)                       d2(Bt/j)
        Γ ⊢ Δ; (A∧Bm); As              Γ' ⊢ Δ'; (A∧Bm); Bt
        --------------------------     ---------------------------
        Γ ⊢ Δ, (A∧Bk), (Cn); As        Γ' ⊢ Δ', (A∧Bk), (Cn); Bt
                                       ---------------------------
                                       Γ' ⊢ Δ', (A∧Bk), Cn; Bv
        ------------------------------------------------------ ∧-right
        Γ, Γ' ⊢ Δ, Δ', A∧Bk, Cn

    a. Like (14i) and (14ii) with Cn on the left.

(15) i.
        d                          d(As/i)
        Γ ⊢ Δ; Ai; (Bm)            Γ ⊢ Δ; As; (Bm)
        ------------------         -----------------------
        Γ ⊢ Δ, Aj; Bm              Γ ⊢ Δ, (Bk), (Cn); As
        ------------------         -----------------------
        Γ ⊢ Δ, Aj, Bk, Cn          Γ ⊢ Δ, (Bk), Cn; At
                                   -----------------------
                                   Γ ⊢ Δ, Aj, Bk, Cn

    ii.
        d                          d
        Γ ⊢ Δ; Ai; (Am)            Γ ⊢ Δ; Ai; (Am)
        ------------------         -------------------
        Γ ⊢ Δ; Am                  Γ ⊢ Δ; Am; As
        ------------------         -------------------
        Γ ⊢ Δ, Ak, Cn              Γ ⊢ Δ, Ak, Cn; As
                                   -------------------
                                   Γ ⊢ Δ, Ak, Cn

    a. Like (15i) and (15ii) with Cn on the left.

(16) a. Like (16b) below with Cn on the right.

    b.
        d                          (As/i)d
        Γ; Ai ⊢ Δ; (Bm)            Γ; As ⊢ Δ; (Bm)
        ------------------         -----------------------
        Γ, Aj ⊢ Δ; Bm              (Cn), Γ; As ⊢ Δ, (Bk)
        ------------------         -----------------------
        Cn, Γ, Aj ⊢ Δ, Bk          Cn, Γ; At ⊢ Δ, (Bk)
                                   -----------------------
                                   Cn, Γ, Aj ⊢ Δ, Bk

G. Additional Thinning Permutations

(1) a.
        d                          d
        Γ ⊢ Δ; Ai                  Γ ⊢ Δ; Ai
        ------------------         ------------------
        Γ ⊢ Δ, Bj; Ak              Γ ⊢ Δ, Cn; As
        ------------------         ------------------
        Γ ⊢ Δ, Bj, Cn, Am          Γ ⊢ Δ, Bj, Cn, Am

    b. Like (1a) with Cn on the left.
    c. Like (1a) with Bj on the left.
    d. Like (1c) with Cn on the left.

(2) a.
        d                          d
        Ai; Γ ⊢ Δ                  Ai; Γ ⊢ Δ
        ------------------         ------------------
        Ak; Bj, Γ ⊢ Δ              As; Cn, Γ ⊢ Δ
        ------------------         ------------------
        Am, Cn, Bj, Γ ⊢ Δ          Am, Cn, Bj, Γ ⊢ Δ

    b. Like (2a) with Cn on the right.
    c. Like (2a) with Bj on the right.
    d. Like (2c) with Cn on the right.

Appendix C

Proofs and Categories

This appendix outlines one way in which the derivations of a formal system may be regarded as representing the morphisms of a category with some additional structure. The possibility of such a representation arises from the similarity between the rules of Gentzen's N and L systems, on the one hand, and the definitions of product, exponent, etc. in category theory, on the other. Because these morphisms must satisfy certain identities, they are not in general represented by a unique derivation. This naturally suggests the question of what logical sense can be made of the notion of two derivations representing the same morphism and whether this relationship can be characterized solely in terms of structural properties of the derivations themselves, without reference to their categorial interpretation.

These topics were first investigated by Lambek in a series of papers on deductive systems and categories.1 Subsequently, Szabo gave an account of the negative fragment of LJ in terms of a relation between derivations which he called 'equi-generality'.2 Mann reproduced Szabo's results for the negative fragment of NJ and showed that equi-generality was equivalent to the relation 'being reducible to the same expanded normal form' (in the sense of Prawitz).3 When Szabo extended his treatment to the whole of intuitionistic (first-order) predicate logic, he abandoned his original equivalence relation between derivations in favor of one which, like Mann's, is defined in terms of unique normal forms.4 (The word "unique" needs some qualification, but I shall ignore that complication here.)

Derivations most naturally represent morphisms having a sequence of

1 "Deductive systems and categories I," Mathematical Systems Theory, Vol. 2, pages 287-318, "Deductive systems and categories II" in Springer Lecture Notes in Mathematics, Vol. 86, pages 76-122, and "Deductive systems and categories III" in Springer Lecture Notes in Mathematics, Vol. 274, pages 57-82.

2 "A categorical equivalence of proofs," Notre Dame Journal of Formal Logic, Vol. 15, 1974, pages 177-191. See also the addendum to this paper in the same journal, Vol. 17, 1976, page 78.

3 "The connection between equivalence of proofs and cartesian closed categories," Proceedings of the London Mathematical Society (3), Vol. 31, 1975, pages 289-310.

4 Algebra of Proofs, Amsterdam, 1978.


objects, rather than a single object, as their domain. There is, however, no standard way of treating such morphisms within category theory. Of course, in the presence of pairing (or products) an n-place function can always be regarded as a function of one argument, but this approach is not well suited to the present purpose. On the other hand, the multicategories of Lambek and sequential categories of Szabo are cumbersome and difficult to work with. I propose to adopt here an alternative notion of 'multicategory'. Rather than defining it at the outset, however, I shall try to show how it arises from the attempt to introduce additional structure into an ordinary category.

The idea is to give a categorial interpretation of propositional logic (or, more properly, of its derivations). So, let L_P be a language comprising a set P of propositional variables, the propositional constants ⊤ and ⊥, and the binary connectives ∧, ∨ and →. Let C_P be the discrete category whose objects are the members of P. I want to extend C_P to a cartesian category C∧ which will serve to interpret logic based on the ∧-fragment of L_P. The objects of C∧ will be the formulae built up from the members of P by means of conjunction. The morphisms of C∧ will include an identity morphism for each object. In addition, for each object of the form A ∧ B, there will be projection maps π₁^{A∧B}: A ∧ B ↦ A and π₂^{A∧B}: A ∧ B ↦ B. (I will suppress reference to their domains when these are obvious from the context.) Finally, if f: C ↦ A and g: D ↦ B, there will be a unique pairing map ⟨f, g⟩: C, D ↦ A ∧ B which makes the following diagram commute:

              C, D
         f   ⟨f, g⟩   g
         ↓      ↓     ↓
    A ←──π₁── A ∧ B ──π₂──→ B

Here, C and D are supposed to be sequences of objects of C∧, and C, D their concatenation. It is convenient to identify the object A with the sequence of length 1 whose only term is A. It remains to ensure that the multimaps of C∧ are closed under composition.

The commutativity of the product diagram means that

f = ⟨f, g⟩ ∘ π₁   and   g = ⟨f, g⟩ ∘ π₂

As a result, the domains of ⟨f, g⟩ ∘ πᵢ (i = 1, 2) will be a subsequence of the domains of ⟨f, g⟩. In general, unless f and g both have single domains, the domains of f ∘ g may differ from those of f. This makes it necessary to specify domains when composing multimaps. It will usually be obvious what is intended, however, and in such cases I won't indicate it explicitly.

In addition to the above, C∧ should satisfy the usual axioms for a category (adapted straightforwardly to take account of multimaps). It follows that composition, however we understand it, must satisfy the following conditions:

(1) If g = 1_A and f: C ↦ A, f ∘ g = f; if f = 1_E (where E is a term of C) and g: C ↦ A, f ∘ g = g.

(2) If f: C ↦ A ∧ B is of the form ⟨h, h′⟩, where h: D ↦ A, h′: D′ ↦ B and C = D, D′, then:

a. if g = π₁^{A∧B}, f ∘ g = h
b. if g = π₂^{A∧B}, f ∘ g = h′

(3) If g = ⟨h, h′⟩: D, D′ ↦ A ∧ B and f: C ↦ E, then

f ∘ g = ⟨f ∘ h, f ∘ h′⟩: (C/E)D, D′ ↦ A ∧ B

By convention, f ∘ g is just g when the range of f does not appear among the domains of g. (C/A)D is supposed to be the result of replacing each occurrence of A in D by the sequence C.

(1) follows from the properties of identity morphisms in a category, (2) from the commutativity of the product diagram, and (3) from the uniqueness requirement on morphisms of the form ⟨x, y⟩. To see this last, notice that f ∘ h: (C/E)D ↦ A and f ∘ h′: (C/E)D′ ↦ B, so there is a unique ⟨f ∘ h, f ∘ h′⟩: (C/E)D, D′ ↦ A ∧ B such that

⟨f ∘ h, f ∘ h′⟩ ∘ π₁ = f ∘ h

and

⟨f ∘ h, f ∘ h′⟩ ∘ π₂ = f ∘ h′

But

(f ∘ g) ∘ π₁ = f ∘ (g ∘ π₁) = f ∘ h

and

(f ∘ g) ∘ π₂ = f ∘ h′

Hence

f ∘ g = ⟨f ∘ h, f ∘ h′⟩

Suppose we have closed the class of projection maps under composition. (Since these always have a single domain, π ∘ π′ for example will be a map from the domain of π to the range of π′.) We can now define the class of multimaps of C∧ to be the closure under pairing of the identity, projection and compositions of projection maps. It is then easy to infer from conditions (1)–(3) above, using the associativity of composition, what might be called the cut-elimination theorem for the ∧-fragment of LJ, namely:

Proofs and Categories 223

The morphisms of C∧ are closed under composition. In other words, C∧ is a cartesian category.5
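The construction can be checked on a small model. In the following sketch (all names here, Map, pair, proj1, compose, and the sample maps, are my own illustrative assumptions, not notation from the text), a multimap of C∧ is interpreted by a Python function with one argument per domain formula; compose performs the substitution of domains required by condition (3), and the assertions check the two commutativity equations of the product diagram.

```python
# A minimal semantic sketch of the multimaps of C_and: a multimap carries a
# sequence of domain formulas, one range formula, and a Python function of as
# many arguments as there are domains.

class Map:
    def __init__(self, doms, rng, fn):
        self.doms, self.rng, self.fn = tuple(doms), rng, fn

def ident(a):                       # identity morphism 1_A
    return Map([a], a, lambda x: x)

def proj1(a, b):                    # pi_1: A /\ B |-> A
    return Map([("and", a, b)], a, lambda p: p[0])

def proj2(a, b):                    # pi_2: A /\ B |-> B
    return Map([("and", a, b)], b, lambda p: p[1])

def pair(f, g):                     # <f,g>: C, D |-> A /\ B
    nf = len(f.doms)
    return Map(f.doms + g.doms, ("and", f.rng, g.rng),
               lambda *xs: (f.fn(*xs[:nf]), g.fn(*xs[nf:])))

def compose(f, g):
    # f o g: substitute f for the first occurrence of f's range among g's
    # domains -- the composite's domains are (C/E)D as in condition (3)
    i = g.doms.index(f.rng)
    doms = g.doms[:i] + f.doms + g.doms[i + 1:]
    nf = len(f.doms)
    def fn(*xs):
        args = xs[:i] + (f.fn(*xs[i:i + nf]),) + xs[i + nf:]
        return g.fn(*args)
    return Map(doms, g.rng, fn)

# The product diagram: f = <f,g> o pi_1 and g = <f,g> o pi_2
f = Map(["C"], "A", lambda c: c + 1)        # sample f: C |-> A
g = Map(["D"], "B", lambda d: d * 2)        # sample g: D |-> B
fg = pair(f, g)
left = compose(fg, proj1("A", "B"))
assert left.doms == ("C", "D") and left.fn(10, 99) == f.fn(10)
right = compose(fg, proj2("A", "B"))
assert right.fn(99, 10) == g.fn(10)
```

Since this model is semantic rather than syntactic, the composite ⟨f, g⟩ ∘ π₁ keeps the full domain sequence C, D; the assertions only check that its value agrees with f, which is the content of the diagram.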

I now want to extend this treatment to include implication. The idea is to construct a cartesian closed category, C∧→, whose objects will be the formulae built up from P by conjunction and implication. The morphisms of C∧→ will include identity, projection and pairing maps for this enlarged set of objects together with an evaluation map e^{A→B}: A, (A → B) ↦ B for each object of the form A → B. In addition, for each morphism f: A, C ↦ B, there will be a unique exponent map e(f): C ↦ (A → B) such that the following diagram commutes:

    A, C ──e(f)──→ A, (A → B) ──e^{A→B}──→ B

The arrow notation is chosen to make it easier to read the domains of the various maps from the diagram. Here, e(f) ∘ e and f have the same domains, and the commutativity of the diagram allows us to identify these two morphisms.

In addition to (1), (2) and (3) above, composition must satisfy:

(4) If g is of the form e(h): C ↦ (A → B) and f: D ↦ E, where E appears among the terms of C, then f ∘ g: (D/E)C ↦ (A → B) is just e(f ∘ h).

(5) If f: D ↦ (A → B) is of the form e(h), where h: A, D ↦ B, and g is e^{A→B}, then f ∘ g = h.

(Again, (4) follows from the uniqueness of e(f ∘ h) and (5) from the commutativity of the exponent diagram.)
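Read semantically, with an object A → B interpreted by functions, the exponent map is just currying and the evaluation map is application. The sketch below checks the commutativity of the exponent diagram under that reading; the names exponent and ev, and the sample map f, are my own assumptions, not the text's notation.

```python
# A semantic sketch of the exponent diagram: e(f) curries the A-argument away,
# and e^{A->B} applies a function to an argument.  The commuting condition is
# that e(f) o e^{A->B} and f agree as maps on A, C.

def exponent(f):
    # f interprets A, C |-> B as a function of (a, c); e(f): C |-> (A -> B)
    return lambda c: (lambda a: f(a, c))

def ev(a, func):                    # e^{A->B}: A, (A -> B) |-> B
    return func(a)

f = lambda a, c: (a, c, "done")     # a sample morphism f: A, C |-> B
for a in range(3):
    for c in range(3):
        # e(f) o e^{A->B} = f  (the exponent diagram commutes)
        assert ev(a, exponent(f)(c)) == f(a, c)
```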

It might seem, by analogy with C∧, that we need only close the evaluation and projection maps (taken together) under composition, add the resulting class to the identity morphisms of C∧→, and then close everything under pairing and exponentiation in order to obtain a cartesian closed category. Even with these additions, however, and conditions (1)–(5), there

5 Cf. the discussion of logical calculi at the beginning of Chapter 4 above, where it was pointed out that the meaning of 'application of a rule' determines the notion of substitution for derivations and that the class of derivations is closed under the latter operation.


is still one possibility left uncovered, namely the case in which f: D ↦ A is composed with e^{A→B}. This makes it necessary to introduce morphisms f ∘ e^{A→B}: D, (A → B) ↦ B for each such f if the cut-elimination theorem is to hold for C∧→. An alternative (more in the spirit of Gentzen's →-left rule) is to replace the maps e^{A→B} by e_f^{A→B}: D, (A → B) ↦ B, for each f: D ↦ A, and require that, for any h: A, C ↦ B, there is a unique map e(h) such that the following diagram commutes:6

    D, C ──e(h)──→ D, (A → B) ──e_f^{A→B}──→ B

the composite being required to equal f ∘ h: D, C ↦ B.

If this alternative is adopted, condition (4) remains unchanged, condition (5) generalizes to:

(5′) If f: D ↦ (A → B) is of the form e(h), where h: A, D ↦ B, and g: C, (A → B) ↦ B is of the form e_{h′}^{A→B}, where h′: C ↦ A, then f ∘ g = h′ ∘ h.

and the last remaining case is dealt with by the following:

(6) If f: E ↦ F (where F ≠ A → B) and g = e_h^{A→B}: D, (A → B) ↦ B, where F appears among the terms of D, then f ∘ g: (E/F)D, (A → B) ↦ B is e_{f∘h}^{A→B}.

The cut-elimination theorem for C∧→ (i.e., the statement that its morphisms are closed under composition, or that it is a cartesian closed category) can now be proved with the help of conditions (1)–(6) by a slightly more complicated inductive argument than was the case for C∧. That the maps e^{A→B} do not suffice for cut-elimination is reflected in the sequent calculus by the fact that cut-elimination does not hold when the →-left rule is replaced by

    Γ; B ⊢ Δ
    ─────────────────
    Γ, A, A → B ⊢ Δ

6 Notice that e(h) does not depend upon f.


In natural deduction calculi, it is reflected in the structure of normal derivations: more precisely, in the fact that a branch of such a derivation cannot pass through the minor premise of an application of →-elimination if it is to consist of a series of eliminations followed only by introductions.

Despite the added complications associated with implication, the above seems to provide rather a satisfactory interpretation of a fragment of the sequent (or natural deduction) calculus and its associated normal form theorem. Furthermore, it is easily extended to the full negative fragment (of intuitionistic propositional logic) by introducing an initial object ⊥ into C∧→ together with a unique morphism ⊥_A: ⊥ ↦ A for each object A. Unfortunately, the picture is spoiled somewhat by a number of complications having to do with the 'structural' properties of morphisms which I have chosen to disregard here. For example, we really need some principle corresponding to Gentzen's interchange rule which, given f: A, B ↦ C, say, will ensure the existence of f′: B, A ↦ C and allow us to treat f and f′ as equivalent in some sense.

For reasons such as this, the approach sketched above does not work out very well in detail. It can, however, be modified in such a way as to avoid these difficulties. The modified approach is connected to the present one as the formulation of sequent calculi in terms of sets of indexed formulae is connected to the usual formulation in terms of sequences. Basically, the idea is to operate with sets of indexed formulae rather than sequences. The objects of the category should still be formulae, however, rather than indexed ones. One way of accomplishing this is to think of multimaps as having arbitrarily long sequences of sets of domains, not excluding the empty set. The occurrence of A as a member of the mth set of a sequence will be associated with the indexed formula A_m. Within this framework, the interchange principle mentioned above is no longer needed, and structural operations corresponding to contraction and thinning can be conveniently introduced if desired.

My aim in this appendix has just been to give a preliminary exposition. For that purpose, the approach taken above seems the most perspicuous and is easiest to motivate. Rather than modifying it now and trying to spell out in detail the interpretation of derivations from the negative fragment, I want to conclude with a brief sketch of how disjunction might be incorporated into this framework.

Let us begin by reconsidering C_P to see what is involved in extending it to the dual of C∧, the co-cartesian category C∨. The objects of C∨ are just the formulae built up from P using ∨, and its morphisms include identity maps (one for each object), injection maps i₁^{A∨B}: A ↦ A ∨ B and i₂^{A∨B}: B ↦ A ∨ B for each object of the form A ∨ B, and, given f: A ↦ C and g: B ↦ D, a unique map [f, g]: A ∨ B ↦ C, D, such that the following diagram commutes:

    A ──i₁──→ A ∨ B ←──i₂── B
      f       [f, g]       g
       ↘        ↓        ↙
              C, D

Dually to the case of C∧, the commutativity of the coproduct diagram means that

f = i₁ ∘ [f, g]   and   g = i₂ ∘ [f, g]

so that the ranges of i_j ∘ [f, g] (j = 1, 2) will be a subsequence of the ranges of [f, g]. In general, unless f and g both have single ranges, the ranges of f ∘ g may differ from those of g. Furthermore, composition in C∨ must satisfy the duals of conditions (1), (2) and (3), namely:

(1′) If f = 1_A and g: A ↦ C, then f ∘ g = g; if g = 1_E (where E is a term of C) and f: A ↦ C, f ∘ g = f.

(2′) If g: A ∨ B ↦ C is of the form [h, h′], where h: A ↦ D, h′: B ↦ D′ and C = D, D′, then:

a. if f = i₁, f ∘ g = h.
b. if f = i₂, f ∘ g = h′.

(3′) If f = [h, h′]: A ∨ B ↦ C and g: D ↦ E, then

f ∘ g = [h ∘ g, h′ ∘ g]: A ∨ B ↦ (E/D)C

It can now be shown that, once the injection maps are closed under composition, C∨ becomes a category of the appropriate kind. Of more interest, however, is the result of trying to extend C∧ and C∨ to a category C∧∨ containing both products and coproducts. Composition in such a category must satisfy both (3) and (3′). So if f = [h, h′] and g = ⟨k, k′⟩,

f ∘ g = [h, h′] ∘ ⟨k, k′⟩ = ⟨[h, h′] ∘ k, [h, h′] ∘ k′⟩   by (3)
      = [h ∘ ⟨k, k′⟩, h′ ∘ ⟨k, k′⟩]   by (3′).

In the special case that h: A ↦ D, h′: B ↦ D, k = 1_D and k′: C ↦ E, where D is not among the terms in C, this yields:

(*)   ⟨[h, h′], k′⟩ = [⟨h, k′⟩, ⟨h′, k′⟩]
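Reading A ∨ B as a tagged union and D ∧ E as a pair, (*) can be checked extensionally. The sample maps and the names copair, h2 and k2 below are my own illustrative assumptions (the text writes h, h′ and k′):

```python
# An extensional check of (*): <[h,h'], k'> and [<h,k'>, <h',k'>] denote the
# same function when disjunctions are tagged values and conjunctions are pairs.

def copair(p, q):                   # [p,q]: acts by case analysis on the tag
    return lambda v: p(v[1]) if v[0] == "inl" else q(v[1])

h  = lambda a: a + 1                # h : A |-> D
h2 = lambda b: b * 2                # h': B |-> D
k2 = lambda c: -c                   # k': C |-> E

lhs = lambda v, c: (copair(h, h2)(v), k2(c))           # <[h,h'], k'>
rhs = lambda v, c: copair(lambda a: (h(a), k2(c)),
                          lambda b: (h2(b), k2(c)))(v)  # [<h,k'>, <h',k'>]

for v in [("inl", 3), ("inr", 4)]:
    for c in range(3):
        assert lhs(v, c) == rhs(v, c)
```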

The derivations of NJ are mapped onto morphisms of a category in such a way that h, for example, would be the image of a derivation of D from A. Translated into these terms, (*) becomes:

           [A]   [B]      C
   Π₁      Π₂    Π₃       Σ
  A ∨ B    D     D        E
  ──────────────────
          D
  ─────────────────────────────
             D ∧ E

      =

           [A]  C        [B]  C
           Π₂   Σ        Π₃   Σ
   Π₁      D    E        D    E
  A ∨ B    ───────       ───────
            D ∧ E         D ∧ E
  ─────────────────────────────
             D ∧ E

This illustrates the fact that, if derivations are to be interpreted as the morphisms of a category, the appropriate equivalence relation between them is not simply interreducibility in the sense of Prawitz, but interreducibility between equivalence classes of derivations obtainable from one another by permuting inferences.7

The example (*) does not depend upon particular features of the description of multicategories; in fact, it does not depend upon multicategories at all and can be reproduced using the standard definitions of product and coproduct in a category. An example which does depend upon mappings having more than one domain is the following.

Consider the maps f: A, G ↦ E, g: B, G ↦ E, h: C, H ↦ E and k: D, H ↦ E. Then [f, g]: A ∨ B, G ↦ E and [h, k]: C ∨ D, H ↦ E.8 Let

α = [[f, g], [h, k]]: A ∨ B, G ∨ H, C ∨ D ↦ E,

(C.1)   i₁^{C∨D} ∘ α = β₁: A ∨ B, G ∨ H, C ↦ E,

and

(C.2)   i₂^{C∨D} ∘ α = β₂: A ∨ B, G ∨ H, D ↦ E.

It follows from the coproduct diagram that

(C.3)   i₁^{G∨H} ∘ β₁ = [f, g]   and   i₂^{G∨H} ∘ β₁ = h

Also

(C.4)   i₁^{G∨H} ∘ β₂ = [f, g]   and   i₂^{G∨H} ∘ β₂ = k

But there is a unique map β₁ satisfying (C.3), namely [[f, g], h], and a unique map β₂ satisfying (C.4), namely [[f, g], k]. Similarly, there is a unique map α satisfying (C.1) and (C.2), namely [β₁, β₂] (= [[[f, g], h], [[f, g], k]]). In short,

(C.5)   [[f, g], [h, k]] = [[[f, g], h], [[f, g], k]]

7 See also "Weak Adjointness in Proof Theory" by R. A. G. Seely (pp. 697-701 of Applications of Sheaves, ed. by M. P. Fourman, C. Mulvey and D. S. Scott, Vol. 753 of Springer Lecture Notes in Mathematics, 1979), where a similar generalization of the permutative reductions is employed.

8 To avoid the kind of problem discussed in Chapter 4 above (in connection with multiple conclusion derivations), I assume some way of amalgamating the occurrences of G in the domains of [f, g] and likewise those of H in [h, k]. These problems are easily solved (they disappear, in fact) when the present approach is modified along the lines suggested earlier. Furthermore, for the sake of simplicity, I have tacitly identified the sequence E, E with the object E.


= [[[[f, g], h], [f, g]], [[[f, g], h], k]] (this last equality by the same argument applied to the right-hand side of (C.5)), and so on, ad infinitum. The reason I have given this example is that it corresponds to Zucker's non-terminating, non-repeating reduction sequence for natural deduction derivations. Here, however, it takes the less troubling form of an infinite number of terms all of which denote the same morphism.
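The situation can be illustrated concretely: the two sides of (C.5) are syntactically different terms, but, with the disjunctions read as tagged values, they denote the same morphism, and the same holds at every later stage of the expansion. The sample maps and the names alpha and alpha_expanded below are my own assumptions, not the text's notation.

```python
import itertools

f = lambda a, g0: ("f", a, g0)      # f: A, G |-> E
g = lambda b, g0: ("g", b, g0)      # g: B, G |-> E
h = lambda c, h0: ("h", c, h0)      # h: C, H |-> E
k = lambda d, h0: ("k", d, h0)      # k: D, H |-> E

def alpha(u, v, w):
    # [[f,g],[h,k]]: case analysis on the G v H argument first
    if v[0] == "inl":               # v = inl g0: use [f,g] on the A v B part
        return (f if u[0] == "inl" else g)(u[1], v[1])
    else:                           # v = inr h0: use [h,k] on the C v D part
        return (h if w[0] == "inl" else k)(w[1], v[1])

def alpha_expanded(u, v, w):
    # [[[f,g],h],[[f,g],k]]: case analysis on the C v D argument first
    def beta(cd_leaf):              # [[f,g], leaf]: case analysis on G v H
        def m(u, v, w):
            if v[0] == "inl":
                return (f if u[0] == "inl" else g)(u[1], v[1])
            return cd_leaf(w[1], v[1])
        return m
    branch = beta(h) if w[0] == "inl" else beta(k)
    return branch(u, v, w)

# Both sides of (C.5) agree on all eight combinations of tags
for u, v, w in itertools.product([("inl", 1), ("inr", 2)],
                                 [("inl", 3), ("inr", 4)],
                                 [("inl", 5), ("inr", 6)]):
    assert alpha(u, v, w) == alpha_expanded(u, v, w)
```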

Allowing morphisms which, in addition to a series of domains, also have a series of ranges introduces some problems which have not been addressed here. They are needed, however, if the full duality between products and coproducts is to be preserved. They also facilitate the interpretation of multiple-conclusion derivations. For the derivations of NJ and LJ, on the other hand, they are not required, and the coproduct diagram can be modified accordingly so that they do not arise.

List of Works Cited

Baker, G.P. and Hacker, P. Frege: Logical Excavations, New York, 1984.

Barwise, J., ed. Handbook of Mathematical Logic, Amsterdam, 1977.

Brouwer, L.E.J. "The Effect of Intuitionism on Classical Algebra of Logic," Proceedings of the Royal Irish Academy, Section A 57, 1955, pp. 113-116; reprinted in Vol. I of Brouwer's Collected Works, pp. 551-554. Collected Works, Vol. I edited by A. Heyting, Amsterdam 1975.

Church, A. The Calculus of λ-Conversion, Princeton, 1941.

Crossley, J.N. and Dummett, M.A.E., eds. Formal Systems and Recursive Functions, Amsterdam, 1965.

Curry, H.B. and R. Feys. Combinatory Logic I, Amsterdam, 1958.

Dragalin, A.G. "A strong theorem on normalization of derivations in Gentzen's sequent calculus," Studies in the Theory of Algorithms and Mathematical Logic ed. by A.A. Markov and V.I. Khomich, pp. 26-39 (Russian). Mathematical Intuitionism: introduction to proof theory, Vol. 67 of the AMS series "Translations of Mathematical Monographs," Providence, 1988.

Fenstad, J.E., ed. Proceedings of the second Scandinavian Logic Symposium, Amsterdam, 1971.

Feferman, S. Review of Prawitz, "Ideas and Results in Proof Theory," The Journal of Symbolic Logic, Vol. 40, 1977, pp. 232-234.

Fløistad, G., ed. Contemporary Philosophy: a new survey, Vol. I, The Hague, 1981.


Fourman, M.P., C. Mulvey and D.S. Scott Applications of Sheaves. Proceedings of the L.M.S. Durham Symposium 1977, Vol. 753 of Springer Lecture Notes in Mathematics, New York and Berlin, 1979.

Frege, G. Conceptual Notation, and Related Articles, translated and edited by T.W. Bynum, Oxford, 1972. (This contains a complete translation of Begriffsschrift.) Translations from the Philosophical Writings of Gottlob Frege, edited by P. Geach and M. Black, 3rd. edition, Oxford, 1980.

Friedman, H. "Equality between Functionals," Logic Colloquium: symposium on logic held at Boston, 1972-73, edited by R. Parikh, pp. 22-37.

Frisch, J.C. Extension and Comprehension in Logic, New York, 1969.

Gentzen, G. The Collected Papers of Gerhard Gentzen, edited and translated by M.E. Szabo, Amsterdam, 1969. "Untersuchungen über das logische Schließen," Mathematische Zeitschrift, Vol. 39, 1934, pp. 176-210 and 405-431. (An English translation of this paper can be found in the preceding item.)

Girard, J.Y. "Linear Logic," Theoretical Computer Science, Vol. 50 (1987), pp. 1-102.

Girard, J.Y., Y. Lafont and P. Taylor. Proofs and Types, Cambridge, 1989.

Gödel, K. "Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes," Dialectica, Vol. 12, 1958, pp. 280-287. Collected Works, Vol. II, Oxford, 1990.

Grube, G.M.A. Plato's Republic, translated by G.M.A. Grube, Indianapolis, 1974.

Gunthner, F. and D.M. Gabbay, eds. Handbook of Philosophical Logic, Vol. I (Elements of Classical Logic), Dordrecht, 1983.

Heyting, A. Intuitionism, 3rd. edition, Amsterdam, 1971.

Hilbert, D. and W. Ackermann. Principles of Mathematical Logic, New York, 1950. (This is a revised translation of the 2nd. edition of their Grundzüge der theoretischen Logik, Berlin, 1938.)

Hilton, P., ed. Category Theory, Homology Theory and their Applications I, Vol. 86


of Springer Lecture Notes in Mathematics, Berlin and New York, 1969.

Hindley, J.R. and Seldin, J.P. Introduction to Combinators and λ-Calculus, Cambridge, 1986.

Howard, W. "The Formulae-As-Types Notion of Construction," manuscript, 1969; a slightly revised version of this paper appears in To H.B. Curry, edited by Seldin and Hindley, pp. 479-490.

Hyland, J. and R. Gandy, eds. Logic 76, Amsterdam, 1977.

Kanger, S., ed. Proceedings of the third Scandinavian Logic Symposium, Amsterdam, 1975.

Kleene, S.C. Mathematical Logic, New York, 1967.

Kline, M. Mathematics: The Loss of Certainty, Oxford, 1980.

Kneale, W. "The Province of Logic," Contemporary British Philosophy, third series, ed. by H.D. Lewis, London, 1956, pp. 237-261.

Kneale, W. and M. Kneale. The Development of Logic, Oxford, 1962.

Kreisel, G. "A Survey of Proof Theory," Journal of Symbolic Logic, Vol. 33, 1968, pp. 321-388. Review of Tait, "Intensional Interpretations of Functionals of Finite Type I," Zentralblatt für Mathematik, Vol. 174, 1969, pp. 12-13. Review of Szabo (ed.), The Collected Papers of Gerhard Gentzen, The Journal of Philosophy, Vol. 68, 1971, pp. 238-265. "A Survey of Proof Theory II," Proceedings of the second Scandinavian Logic Symposium, ed. by J.E. Fenstad, pp. 109-170.

Lambek, J. "Deductive Systems and Categories I," Mathematical Systems Theory, Vol. 2, pp. 287-318. "Deductive Systems and Categories II," in Category Theory, Homology Theory and their Applications I, edited by P. Hilton, pp. 76-122. "Deductive Systems and Categories III," in Toposes, Algebraic Geometry and Logic, edited by F.W. Lawvere, pp. 57-82.

Lawvere, F.W., ed. Toposes, Algebraic Geometry and Logic, Vol. 274 of Springer Lecture Notes in Mathematics, Berlin and New York, 1972.


Leivant, D. "Assumption Classes in Natural Deduction," Zeitschrift für mathematische Logik und Grundlagen der Mathematik, Vol. 25, 1979, pp. 1-4.

Lewis, H.D., ed. Contemporary British Philosophy, third series, London, 1956.

Mann, C. "The Connection between Equivalence of Proofs and Cartesian Closed Categories," Proceedings of the London Mathematical Society (3), Vol. 31, 1975, pp. 289-310.

Markov, A.A. and V.I. Khomich, eds. Studies in the Theory of Algorithms and Mathematical Logic, "Nauka," Moscow, 1979 (Russian).

Martin-Löf, P. "About Models for Intuitionistic Type Theories and the Notion of Definitional Equality," Proceedings of the third Scandinavian Logic Symposium, edited by S. Kanger, pp. 81-109.

McCall, S., ed. Polish Logic 1920-1939, Oxford, 1967.

Nagel, E., P. Suppes and A. Tarski, eds. Logic, Methodology and Philosophy of Science, Stanford, 1962.

Olson, K.R. An Essay on Facts, Stanford, 1987.

Parikh, R., ed. Logic Colloquium: symposium on logic held at Boston, 1972-73, Vol. 453 of Springer Lecture Notes in Mathematics, Berlin and New York, 1975.

Pottinger, G. "Normalization as a Homomorphic Image of Cut-Elimination," Annals of Mathematical Logic, Vol. 12, 1977, pp. 323-357.

Prawitz, D. "Ideas and Results in Proof Theory," in Proceedings of the second Scandinavian Logic Symposium, edited by J.E. Fenstad, pp. 235-307. Natural Deduction. A Proof-Theoretical Study, Uppsala, 1965. "On the Idea of a General Proof Theory," Synthese, Vol. 27, 1974, pp. 63-77. "Philosophical Aspects of Proof Theory," Contemporary Philosophy, Vol. I, ed. by G. Fløistad, pp. 235-277. "Towards a Foundation of a General Proof Theory," Logic, Methodology and Philosophy of Science IV, edited by P. Suppes et al., pp. 225-250.

Ribenboim, P. The Book of Prime Number Records, 2nd. edition, New York, 1989.


Russell, B. and A.N. Whitehead. Principia Mathematica, Vol. I, Second Edition, Cambridge, 1927.

Schwichtenberg, H. "Proof Theory: Some Applications of Cut-Elimination," Handbook of Mathematical Logic, ed. by J. Barwise, pp. 867-895.

Seely, R.A.G. "Weak Adjointness in Proof Theory," Applications of Sheaves, ed. by M.P. Fourman, C. Mulvey and D.S. Scott, pp. 697-701.

Seldin, J.P. and J.R. Hindley, eds. To H.B. Curry: essays on combinatory logic, λ-calculus and formalism, New York, 1980.

Shoesmith, D.J. and T.J. Smiley. Multiple-Conclusion Logic, Cambridge, 1978.

Sundholm, G. "Systems of Deduction," Handbook of Philosophical Logic, Vol. I, ed. by Gunthner and Gabbay, pp. 133-188.

Suppes, P. et al., eds. Logic, Methodology and Philosophy of Science IV (proceedings of the fourth international congress for logic, methodology and philosophy of science held at Bucharest in 1971), Amsterdam, 1973.

Szabo, M.E. "A Categorical Equivalence of Proofs," Notre Dame Journal of Formal Logic, Vol. 15, 1974, pp. 177-191, and the addendum thereto in Vol. 17, 1976, p. 78. The Algebra of Proofs, Amsterdam, 1978.

Tait, W. "Infinitely Long Terms of Transfinite Type," Formal Systems and Recursive Functions, edited by Crossley and Dummett, pp. 176-185. "Intensional Interpretation of Functionals of Finite Type I," The Journal of Symbolic Logic, Vol. 32, 1967, pp. 198-212.

Wang, H. Reflections on Kurt Godel, Cambridge, Mass., 1987.

Zucker, J.I. "The Correspondence between Cut-Elimination and Normalization," Annals of Mathematical Logic, Vol. 7, 1974, pp. 1-156.

Index

Ackermann, W., 13, 18
active occurrence, 29
adequate for 11, 62
alike, 83
almost alike, 82
almost cut-free, 35
analytic part, 23
assumption class, 16
Baker, G.P., 170
Begriffsschrift, 168
Bernays, P., 171
Black, M., 169
branch of derivation, 72
Brouwer, L.E.J., 1, 163
canonical inference, 157
category, 73
  cartesian, 221, 224
  co-cartesian, 225
  theory, 221
Church, A., 153
Church-Rosser type theorem, 40
classical negation rule, 18
closed assumptions, 15
closed cartesian category, 224
closed instance of an argument, 157
cluster, 69
co-cartesian category, 225
combination of graphs, 80
combinators, 154
compatible quasi-derivations, 83
completeness theorem for typed λ-calculus, 183
composition, 222
computable functionals of finite type, 170
computation rules, 8
congruent quasi-derivations, 82
contractum, 20
convertibility, 158
Copi, I.M., 11
coproduct diagram, 226
crucial elimination, 37
Curry, H.B., 153, 154
CUT, 86
cut-elimination theorem, 9
DT, 107
D'T, 109
definitional equality, 162
degree of a mix, 30
derivation, 5
discrete category, 221
Dragalin, A.G., 43, 193
eigenvariable, 17
elimination, 7, 17
  segment, 21
equality of content, 168
equality relation, 8
equi-generality, 220
essential cut, 35
evaluation map, 223
expansion, 184
exponent diagram, 223
exponent map, 223
extensional equality, 172
extensionality, 173

Feferman, S., 159
Feys, R., 153
finitary mathematics, 171
finite type, 170
Friedman, H., 183
Frisch, J.C., 172
Frege, G., 14, 168
full type-structure, 183
Gödel, K., 153
Geach, P., 169
general proof theory, 39, 178
Gentzen, G., 7
Girard, J.Y., 55, 74, 154
Glivenko, 18
Grundgesetze, 168
Hacker, P., 170
Hauptsatz, 14, 28
Heyting, A., 154, 182
Hilbert, D., 1, 13, 18
Hilbert-style formalization, 13
Howard, W., 154
identity, 3
  criteria, 2
  morphism, 221
  of proofs, 155
immediate subderivation, 194
inductive complexity, 195
inductive derivation, 195
initial subderivation, 112
initial object, 225
injection maps, 225
intensional object, 8, 174
intensionality, 172
introduction, 7, 17
  segment, 21
intuitionism, 163
intuitionistic negation rule, 17
inversion principle, 155
Jaśkowski, S., 12
justification for an inference, 157
justifying operations, 156
Kleene, S.C., 39
Kneale, W., 70
Kreisel, G., 39, 159
λ-calculus, 8
λ-terms, 177
λβ-conversion, 165
Lambek, J., 220
left rules, 26
Leivant, D., 16
LJ, 27
LJ', 52
LJD, 98
LJDT, 110
LK, 26
LKD, 98
LKDT, 110
logical operators, 6
Łukasiewicz, J., 12
main routes, 22
major premise, 17
Mann, C., 220
Martin-Löf, P., 8, 45, 154
maximal formula occurrence, 19, 102
maximal segment, 21
meaning, 153
minimal logic, 17
minimal segment, 21
minor premise, 17
mix rule, 28
morphisms of a category, 185
multicategories, 221
multimaps, 221
multiple-conclusion logic, 149
natural deduction, 7
ND, 86
NJ, 17
NJ', 56
NJ(-∨), 51
NJ(-∨)', 51
NJD, 97
NJDT, 110
NK, 17
NK', 150
NK1D, 99
NK2D, 99
NKDT, 110
non-extensional domain, 173
normal form, 8, 20
  theorem, 7
    for NJ, 20

Olson, K.R., 169
open assumption, 15
pairing map, 221
permutative reduction, 21, 118
Plato's Republic, 181
Pottinger, G., 43
power of a formula occurrence, 197
Prawitz, D., 8, 154
primary and secondary qualities, 175
primitive reduction, 193
principle of extensionality, 174
principle of sufficient reason, 164
product diagram, 221
projection map, 222
proof, 5
proof-net, 75
proper parameter, 17, 51
proper reduction sequence, 40, 116, 199
proposition, 2
provable identity, 165
pruning reduction, 120
qualities, 175
quasi-derivation, 81
rank for indexed formulae, 35
rank of a formula, 30
rank of a mix, 30
reasoning, 180
redex, 20
reduce in one step, 20
reduction relations, 7
reduction sequence, 40
regular figure, 195
right rules, 26
routes, 22
rule of non-constructive dilemma, 150
Russell, B., 11
Schwichtenberg, H., 196
Seely, R.A.G., 227
separability property, 7
sequent calculus, 9, 26
sequential categories, 221
set theory, 6
Shoesmith, D.J., 55
simultaneous substitution, 58
Smiley, T.J., 55
strong cut-elimination, 186
  theorem, 40
strong equivalence, 40
strong normalization theorem, 31
strong reduction, 167
strong validity, 157
subderivation, 20, 112
subformula property, 22
substitution, 91
Sundholm, G., 150
synonymy, 164
synthetic part, 23
Szabo, M.E., 220
T, 170
table of development, 70
Tait, W., 8, 154
t-connection, 50
theory of proofs, 5
thinning, 106
  permutation, 122
  reduction, 119
tree, 8
validity, 156
weak equality, 165
weak reduction, 165
weight of an application of cut, 198
Zucker, J.I., 9

CSLI Publications

Lecture Notes The titles in this series are distributed by the University of Chicago Press and may be purchased in academic or university bookstores or ordered directly from the distributor: Order Department, 11030 S. Langely Avenue, Chicago, Illinois 60628.

A Manual of Intensional Logic. Johan van Benthem, second edition, revised and expanded. Lecture Notes No. 1. ISBN 0-937073-29-6 (paper), 0-937073-30-X (cloth)

Emotion and Focus. Helen Fay Nis-senbaum. Lecture Notes No. 2. ISBN 0-937073-20-2 (paper)

Lectures on Contemporary Syntactic Theories. Peter Sells. Lecture Notes No. 3. ISBN 0-937073-14-8 (paper), 0-937073-13-X (cloth)

An Introduction to Unification-Based Approaches to Grammar. Stuart M. Shieber. Lecture Notes No. 4. ISBN 0-937073-00-8 (paper), 0-937073-01-6 (cloth)

The Semantics of Destructive Lisp. Ian A. Mason. Lecture Notes No. 5. ISBN 0-937073-06-7 (paper), 0-937073-05-9 (cloth)

An Essay on Facts. Ken Olson. Lecture Notes No. 6. ISBN 0-937073-08-3 (paper), 0-937073-05-9 (cloth)

Logics of Time and Computation. Robert Goldblatt, second edition, revised and expanded. Lecture Notes No. 7. ISBN 0-937073-94-6 (paper), 0-937073-93-8 (cloth)

Word Order and Constituent Structure in German. Hans Uszkoreit. Lecture Notes No. 8. ISBN 0-937073-10-5 (paper), 0-937073-09-1 (cloth)

Color and Color Perception: A Study in Anthropocentric Realism. David Russel Hilbert. Lecture Notes No. 9. ISBN 0-937073-16-4 (paper), 0-937073-15-6 (cloth)

Prolog and Natural-Language Analysis. Fernando C. N. Pereira and Stuart M. Shieber. Lecture Notes No. 10. ISBN 0-937073-18-0 (paper), 0-937073-17-2 (cloth)

Working Papers in Grammatical Theory and Discourse Structure: Interactions of Morphology, Syntax, and Discourse. M. Iida, S. Wechsler, and D. Zec (Eds.) with an Introduction by Joan Bresnan. Lecture Notes No. 11. ISBN 0-937073-04-0 (paper), 0-937073-25-3 (cloth)

Natural Language Processing in the 1980s: A Bibliography. Gerald Gazdar, Alex Franz, Karen Osborne, and Roger Evans. Lecture Notes No. 12. ISBN 0-937073-28-8 (paper), 0-937073-26-1 (cloth)

Information-Based Syntax and Semantics. Carl Pollard and Ivan Sag. Lecture Notes No. 13. ISBN 0-937073-24-5 (paper), 0-937073-23-7 (cloth)

Non-Well-Founded Sets. Peter Aczel. Lecture Notes No. 14. ISBN 0-937073-22-9 (paper), 0-937073-21-0 (cloth)

Partiality, Truth and Persistence. Tore Langholm. Lecture Notes No. 15. ISBN 0-937073-34-2 (paper), 0-937073-35-0 (cloth)

Attribute-Value Logic and the Theory of Grammar. Mark Johnson. Lecture Notes No. 16. ISBN 0-937073-36-9 (paper), 0-937073-37-7 (cloth)

The Situation in Logic. Jon Barwise. Lecture Notes No. 17. ISBN 0-937073-32-6 (paper), 0-937073-33-4 (cloth)

The Linguistics of Punctuation. Geoff Nunberg. Lecture Notes No. 18. ISBN 0-937073-46-6 (paper), 0-937073-47-4 (cloth)

Anaphora and Quantification in Situation Semantics. Jean Mark Gawron and Stanley Peters. Lecture Notes No. 19. ISBN 0-937073-48-4 (paper), 0-937073-49-0 (cloth)

Propositional Attitudes: The Role of Content in Logic, Language, and Mind. C. Anthony Anderson and Joseph Owens. Lecture Notes No. 20. ISBN 0-937073-50-4 (paper), 0-937073-51-2 (cloth)

Literature and Cognition. Jerry R. Hobbs. Lecture Notes No. 21. ISBN 0-937073-52-0 (paper), 0-937073-53-9 (cloth)

Situation Theory and Its Applications, Vol. 1. Robin Cooper, Kuniaki Mukai, and John Perry (Eds.). Lecture Notes No. 22. ISBN 0-937073-54-7 (paper), 0-937073-55-5 (cloth)

The Language of First-Order Logic (including the Macintosh program, Tarski's World). Jon Barwise and John Etchemendy, second edition, revised and expanded. Lecture Notes No. 23. ISBN 0-937073-74-1 (paper)

Lexical Matters. Ivan A. Sag and Anna Szabolcsi, editors. Lecture Notes No. 24. ISBN 0-937073-66-0 (paper), 0-937073-65-2 (cloth)

Tarski's World. Jon Barwise and John Etchemendy. Lecture Notes No. 25. ISBN 0-937073-67-9 (paper)

Situation Theory and Its Applications, Vol. 2. Jon Barwise, J. Mark Gawron, Gordon Plotkin, Syun Tutiya, editors. Lecture Notes No. 26. ISBN 0-937073-70-9 (paper), 0-937073-71-7 (cloth)

Literate Programming. Donald E. Knuth. Lecture Notes No. 27. ISBN 0-937073-80-6 (paper), 0-937073-81-4 (cloth)

Normalization, Cut-Elimination and the Theory of Proofs. A. M. Ungar. Lecture Notes No. 28. ISBN 0-937073-82-2 (paper), 0-937073-83-0 (cloth)

Lectures on Linear Logic. A. S. Troelstra. Lecture Notes No. 29. ISBN 0-937073-77-6 (paper), 0-937073-78-4 (cloth)

A Short Introduction to Modal Logic. Grigori Mints. Lecture Notes No. 30. ISBN 0-937073-75-X (paper), 0-937073-76-8 (cloth)

Other CSLI Titles Distributed by UCP

Agreement in Natural Language: Approaches, Theories, Descriptions. Michael Barlow and Charles A. Ferguson (Eds.). ISBN 0-937073-02-4 (cloth)

Papers from the Second International Workshop on Japanese Syntax. William J. Poser (Ed.). ISBN 0-937073-38-5 (paper), 0-937073-39-3 (cloth)

The Proceedings of the Seventh West Coast Conference on Formal Linguistics (WCCFL 7). ISBN 0-937073-40-7 (paper)

The Proceedings of the Eighth West Coast Conference on Formal Linguistics (WCCFL 8). ISBN 0-937073-45-8 (paper)

The Phonology-Syntax Connection. Sharon Inkelas and Draga Zec (Eds.) (co-published with The University of Chicago Press). ISBN 0-226-38100-5 (paper), 0-226-38101-3 (cloth)

The Proceedings of the Ninth West Coast Conference on Formal Linguistics (WCCFL 9). ISBN 0-937073-64-4 (paper)

Japanese/Korean Linguistics. Hajime Hoji (Ed.). ISBN 0-937073-57-1 (paper), 0-937073-56-3 (cloth)

Experiencer Subjects in South Asian Languages. Manindra K. Verma and K. P. Mohanan (Eds.). ISBN 0-937073-60-1 (paper), 0-937073-61-X (cloth)

Grammatical Relations: A Cross-Theoretical Perspective. Katarzyna Dziwirek, Patrick Farrell, Errapel Mejias Bikandi (Eds.). ISBN 0-937073-63-6 (paper), 0-937073-62-8 (cloth)

The Proceedings of the Tenth West Coast Conference on Formal Linguistics (WCCFL 10). ISBN 0-937073-79-2 (paper)

Books Distributed by CSLI

The Proceedings of the Third West Coast Conference on Formal Linguistics (WCCFL 3). ($10.95) ISBN 0-937073-45-8 (paper)

The Proceedings of the Fourth West Coast Conference on Formal Linguistics (WCCFL 4). ($11.55) ISBN 0-937073-45-8 (paper)

The Proceedings of the Fifth West Coast Conference on Formal Linguistics (WCCFL 5). ($10.95) ISBN 0-937073-45-8 (paper)

The Proceedings of the Sixth West Coast Conference on Formal Linguistics (WCCFL 6). ($13.95) ISBN 0-937073-45-8 (paper)

Hausar Yau Da Kullum: Intermediate and Advanced Lessons in Hausa Language and Culture. William R. Leben, Ahmadu Bello Zaria, Shekarau B. Maikafi, and Lawan Danladi Yalwa. ($19.95) ISBN 0-937073-68-7 (paper)

Hausar Yau Da Kullum Workbook. William R. Leben, Ahmadu Bello Zaria, Shekarau B. Maikafi, and Lawan Danladi Yalwa. ($7.50) ISBN 0-937073-69-5 (paper)

Ordering Titles Distributed by CSLI

Titles distributed by CSLI may be ordered directly from CSLI Publications, Ventura Hall, Stanford University, Stanford, California 94305-4115 or by phone (415)723-1712 or (415)723-1839. Orders can also be placed by e-mail ([email protected]) or FAX (415)723-0758.

All orders must be prepaid by check, VISA, or MasterCard (include card name, number, expiration date). For shipping and handling add $2.50 for first book and $0.75 for each additional book; $1.75 for the first report and $0.25 for each additional report. California residents add 7% sales tax.

For overseas shipping, add $4.50 for first book and $2.25 for each additional book; $2.25 for first report and $0.75 for each additional report. All payments must be made in US currency.
