Expected utility theory and some extensions∗

Paul Schweinzer
Birkbeck College, University of London
School of Economics, Mathematics, and Statistics
Malet Street, Bloomsbury, London WC1E 7HX
[email protected]

Abstract

We provide a nontechnical survey of expected utility theory and some of its extensions. The general focus is on the interpretation of the theory rather than on the formal discussion of the theory’s properties, for which we refer to the original work.

    1 Introduction

Choice under risk¹ is the branch of economic theory that models decisions under imperfect information over the course of future events. Expected utility theory restricts the set of all possible preferences over risky alternatives to a subset which defines the rationality of the individuals holding such preferences. Modern expected utility theory was created by von Neumann & Morgenstern as a theory based on objective probabilities.² Its subjective Bayesian foundations were developed by Savage and others into the main theoretical vehicle used to model economic situations under uncertainty. It was widely successful in key applications like insurance theory and portfolio selection and has become the workhorse of modern utility theory. It is a normative, axiomatised theory that is faced with a handful of seemingly paradoxical but systematic empirical violations—an example is simultaneous gambling and insurance—which confirm von Neumann & Morgenstern’s view that expected utility theory is not a description of people’s actual behaviour (cf. [Mor79, 176f]). Strictly speaking, one is not forced to accept any of these empirical objections as falsifications.³ It is always possible to say (with Savage) that expected utility theory is not merely a prescriptive theory and that, at the end of the day, thorough deliberation will lead individuals to accept expected utility theory as the only theory of rational choice. In that view, the principle of rationality is regarded as an animating principle rather than as a testable hypothesis and, consequently, all deviations from expected utility theory are considered theories of irrational choice. This strategy, though, is not entirely convincing. The theory’s inability to solve several ‘paradoxes’ satisfactorily has given rise to the development of alternative and descriptively more successful non-expected utility theories.

∗ This work is part of my M.Sc. dissertation at the London School of Economics and Political Science, Department of Philosophy, Logic and Scientific Method. I am grateful for helpful discussions, comments, and corrections of numerous mistakes to Richard Bradley, Till Grüne, Georg Kirchsteiger, Christoph Schmidt-Petri, and Luca Zamparini.


In general, these theories are mathematically less elegant and analytically less powerful than their antecedent but are more general in the sense that they give up certain aspects of the theory in order to bring about more consistency with empirical results. Both their more cumbersome application to general economic questions and their (allegedly) inferior normative appeal are the reasons why there is no established alternative to expected utility theory which is widely used for modelling.

The following section presents von Neumann-Morgenstern expected utility theory. The normative axioms that, if obeyed, lead individuals to behave as if they were maximising expected utility are discussed. Several subsections are devoted to the description of the empirical and theoretical problems troubling the theory. These constitute the principal reason for being concerned with generalisations of expected utility theory. Throughout the analysis we shall focus on the objective probability interpretation of von Neumann & Morgenstern—Savage’s subjective interpretation will only be touched on when indispensable. In that sense, we shall not discuss theories of choice under uncertainty. This is a serious restriction because objective probabilities may well not exist for some real choice situations.⁴

In the third section, we shall elaborate on a ‘minimalist’ extension of the von Neumann-Morgenstern framework. It has interesting interpretations, is easily tractable, and should thus be useful for developing the arguments that form the rationale for the more complicated extensions discussed in the fourth section.

The fourth section presents some of the most widely discussed theories based on non-expected utility functions. We are concerned with only one strand of the theoretical literature—axiomatic non-expected utility theory—and structure this literature according to some plausible deviations from the independence axiom. These are the betweenness property and rank-dependent probability representations.⁵

In section five we shall discuss some interpretations of the different axiomatic frameworks. Most of the non-technical examples are given in this section.⁶

    2 von Neumann-Morgenstern expected utility theory

    2.1 The model

Von Neumann-Morgenstern expected utility theory asks us to express our preferences over lotteries much in the same way as we usually express our preferences over goods in the general economic theory of choice under certainty. Lotteries are taken to be representations of risky or uncertain, mutually exclusive alternatives; they are denoted by {x₁ : p₁, x₂ : p₂, . . .}, where xᵢ is the payoff in case of outcome i and pᵢ is the probability of outcome i. The outcomes (x₁, . . . , xₙ) of simple lotteries are sure, while in compound lotteries we replace the sure outcomes by (simple) lotteries. We shall assume throughout the discussion that a compound lottery can be reduced to a probabilistically equivalent simple lottery (A0—reduction axiom).⁷

Von Neumann & Morgenstern axiomatise the theory with the undefined primitive binary relation ‘≻’ on the (abstract) convex and nonempty set L (cf. [Neu44, 24ff]). We interpret members of L as probability measures on a Boolean algebra of subsets of a set X of outcomes (that is, as lotteries),⁸ ‘u ≻ v’ as ‘u is preferred to v’, and ‘u ∼ v’ as ‘u is indifferent to v’ if neither ‘u ≻ v’ nor ‘v ≻ u’. Let u, v, w ∈ L and α, β ∈ [0, 1]:

A1—order: the binary relation ≻ is asymmetric and transitive; ∼ is transitive.⁹ (2.1)

A2—separation (‘independence’): u ≻ v ⇒ αu + (1−α)w ≻ αv + (1−α)w.¹⁰ (2.2)

A3—continuity: u ≻ v ≻ w ⇒ ∃α, β ∈ [0, 1] such that αu + (1−α)w ≻ v and v ≻ βu + (1−β)w.¹¹ (2.3)


Based on these restrictions on ≻, von Neumann-Morgenstern’s expected utility theorem establishes the existence of a real function U(·), linear in the probabilities and unique up to positive affine transformations, that represents preferences over lotteries meeting A0–A3 such that:

v ≻ w ⇔ U(v) > U(w). (2.4)

That is, individuals whose preferences satisfy the above axioms choose as though maximising expected utility. Von Neumann-Morgenstern expected utility functions have the following characteristic form for p = {x₁ : p₁, . . . , xₙ : pₙ} ∈ L:¹²

U(p) = ∑ᵢ pᵢ u(xᵢ). (2.5)

That is, the utility valuation function u(x) used to rank the sure outcomes (to be interpreted as mutually exclusive, final wealth levels) xᵢ is the same for each outcome. This implies that each outcome’s utility is totally independent of the valuation of the other outcomes in the lottery; therefore, they can be summed up independently, that is, outcomes are additively separable.¹³

Utilities are combined multiplicatively with the probabilities and are subsequently added—just as mathematical expectations are formed. But since we form expectations not over the sure outcomes xᵢ but over their utilities u(xᵢ), the result is considerably more general than the mere mathematical expectation of separable events. We call the expected utility function U(p) the von Neumann-Morgenstern expected utility function and the function u(x) the Bernoulli utility function (cf. [Mas96, 184]).¹⁴
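As a concrete illustration of (2.5), the following sketch evaluates a lottery under an assumed square-root Bernoulli function (both the numbers and the choice of u are ours, purely for illustration):

```python
# Minimal sketch of (2.5): U(p) = sum_i p_i u(x_i).
# The lottery format {outcome: probability} and the square-root
# Bernoulli function are illustrative assumptions.
import math

def expected_utility(lottery, u):
    """Expected utility of a simple lottery {x_i: p_i}."""
    return sum(p * u(x) for x, p in lottery.items())

u = math.sqrt                      # a concave (risk-averse) Bernoulli function
q = {0: 1/3, 100: 1/3, 400: 1/3}   # a lottery with three equiprobable outcomes

print(expected_utility(q, u))      # (1/3)(0 + 10 + 20) = 10.0
```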

All of the above axioms can be (and have been) challenged. In the present discussion, however, we shall accept the order axiom A1 (implying transitivity and completeness) and the continuity axiom A3 as normatively compelling. This will be justified in section five and is not universally supported (cf. [Fis88a, 27f]).

    2.2 The core element: the independence axiom

In general, we would assume that any choice-act under risk depends on the utility of the outcome and its probability of occurring. Such a general function could look like:

    U(p) = u(p, x). (2.6)

The independence axiom A2, however, embodies a much deeper insight that gives the theory its vast empirical content and its power to identify irrational behaviour (cf. [Mac82]). Its essential presumption is that utilities are in a sense the unalterable data of a decision problem while probabilities represent some relative frequency of occurrence that may vary with the evidence available. Hence, some separation between the two seems warranted. The structure of our discussion is given by several different attempts to axiomatise the separation of these factors in a meaningful way that allows empirical phenomena such as Allais’ paradox to be accommodated.

The independence axiom is a normative argument and can be deduced from von Neumann & Morgenstern, although they did not explicitly state it in their axiomatisation. A statement of the independence axiom alternative to A2 is:

If the lottery u is preferred (resp. indifferent) to the lottery v, then the mixture αu + (1−α)w will be preferred (resp. indifferent) to the mixture αv + (1−α)w for all α in the unit interval and all w.

If we mix two lotteries u and v, over which there is an established preference relationship, with a third lottery w, each with the same probability 1−α, the original and mixed relationships between u and v coincide. This is the key ingredient of the definition of von Neumann-Morgenstern rational choice behaviour. It can easily be shown that the level set¹⁵ of functional representations of preferences over lotteries u, v and w that satisfy A1–A3, such as (2.5), exhibits straight, parallel lines (indifference curves) in a probability diagram (cf. [Mas96, 175f]).

The level sets of expected utility functions for n outcomes can be represented in an (n−1)-dimensional simplex. In the three-dimensional case, the simplex ∆ = {p ∈ ℝᴺ : ∑ᵢ pᵢ = 1} can be graphically represented by an equilateral triangle with altitude 1. Each perpendicular can then be interpreted as the probability of the outcome at the opposing vertex. Hence, every point in the triangle represents a lottery. This is shown in figure 1.a for the three sure outcomes £3 ≻ £2 ≻ £1 placed at the vertices. Notice that these vertices represent degenerate lotteries (i.e. pᵢ = 1, pⱼ = 0 for j ≠ i). The lottery Q = {£1 : p₁, £2 : p₂, £3 : p₃}, with all probabilities equal to one third, is drawn in figure 1.a.

Figure 1: Level set of an expected utility function in a three-dimensional simplex.

Assuming u(·) is twice differentiable and increasing, we define the Arrow-Pratt coefficient of absolute risk aversion of Bernoulli utility functions defined over outcomes xᵢ as¹⁶

r(x) = −u″(x)/u′(x). (2.7)

Indifference curves of a risk-averse decision-maker have a steeper slope than the level sets of the mathematical expectation, which are shown as dashed lines in figure 1.b.¹⁷ Trivially, the level sets of the mathematical expectation consist of straight, parallel lines as well. Hence, the level set in figure 1.b represents risk-prone behaviour and the set in 1.a risk-averse choices. The arrow points in the preferred direction.

Bernoulli functions of risk-averse agents in [x, u(x)] space are concave—the more concave, the more risk-averse—as depicted in figure 2.a, while those of risk-prone agents are strictly convex. Notice that this follows directly from Jensen’s inequality.¹⁸
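A quick numerical check of the Jensen argument (the 50:50 lottery and the logarithmic u are assumptions made for illustration): for concave u, u(E[x]) ≥ E[u(x)], so the sure expectation is weakly preferred to the gamble.

```python
# Sketch: Jensen's inequality for a concave Bernoulli function.
import math

u = math.log1p                          # u(x) = log(1 + x), concave, increasing
outcomes, probs = [1.0, 3.0], [0.5, 0.5]

mean = sum(p * x for p, x in zip(probs, outcomes))     # E[x] = 2
eu   = sum(p * u(x) for p, x in zip(probs, outcomes))  # E[u(x)]

print(u(mean) >= eu)   # True: the risk-averse agent prefers E[x] for sure
```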

Another technicality we need to touch on, which is concerned with the shape of u(·), is stochastic dominance. We have already seen that we can move from simple lotteries with a discrete number of outcomes to any finite number of outcomes by using compound lotteries. By switching to continuous distribution functions, we can cover an infinite number of outcomes. The probability distribution F first-order stochastically dominates G iff (for non-decreasing u):¹⁹

∫ u(x) dF(x) ≥ ∫ u(x) dG(x). (2.8)


Figure 2: Bernoulli utility function of a risk averse (a) and a risk prone (b) decision-maker.

First-order stochastic dominance (fosd) is a straightforward extension of the ≻ relation to the stochastic case: more is better. Here ≻ describes a weak order over the continuous probability distributions F and G rather than over the simple measures u and v. So essentially, the discrete probabilities in a lottery are replaced by a continuous distribution function. Fosd implies that the mean of F is larger than that of G; it does not imply, however, that each probability value F(xᵢ) is larger than G(xᵢ). Actually, the discrete version of the definition of fosd is F(xᵢ) ≤ G(xᵢ) for every xᵢ. Stochastic dominance of distribution F over G means that F gives unambiguously higher returns than G. Since the dominating distribution is always preferred, fosd gives some structure to the preferences of a decision-maker endowed with preferences meeting this criterion—these are called stochastic dominance preferences. Figure 3.a shows fosd for the continuous case and 3.b is a discrete version for a lottery with three outcomes.

Figure 3: Two lotteries represented as cumulative distribution functions with supports in the unit interval. Distribution F first-order stochastically dominates G if F is everywhere at or below G.


2.3 Some paradoxical results

Its founders took expected utility theory as a normative argument for idealised behaviour under risk. Its successes, however, made it imperative to test the theory’s predictions as well. The classical falsifications of the descriptive side of expected utility theory are Allais’ argument against the independence axiom, Ellsberg’s paradox against the existence of unique subjective probabilities, and, perhaps most fundamentally, Lichtenstein & Slovic’s argument against transitivity of preferences over lotteries (cf. [All53], [Ell61], [Lic71]).

The fundamental objections to expected utility theory fall into five broad categories: (1) common consequence, (2) common ratio, (3) preference reversal (or, more generally, all intransitivity) phenomena, (4) non-existence of subjective probabilities,²⁰ and (5) framing effects. Each of these is taken up in turn.

    2.3.1 Common consequence effect

The intuition behind paradoxes based on the common consequence effect is summarised by Bell as “winning the top prize of $10,000 in a lottery may leave one much happier than receiving $10,000 as the lowest prize in a lottery” [Bel85]. Essentially, it says that outcomes are not independent and agents show a higher degree of risk aversion for losses than for gains. Therefore, the independence axiom does not hold and we would expect the expected utility form to be violated. The Allais paradox depicted in table 2 is the leading example of this class of anomalies, and it shows that this is indeed what happens in experiments (cf. [All53, 89]). It illustrates the normative tension between the reduction and the independence axioms: if one accepts the former, one is to discard the latter. The general form of the common consequence argument is:

b1: αδₓ + (1−α)F        b2: αP + (1−α)F
b3: αδₓ + (1−α)G        b4: αP + (1−α)G

Table 1: General form of (row-wise) common consequence effects.

The paradoxical result is obtained by setting δₓ to yield x with certainty while the lottery P contains both larger and smaller outcomes than x. In addition, the probability distribution F fosds G. A concrete example is Allais’ paradox.

a1: {£1 Mio: 1, £0 Mio: 0}         a2: {£5 Mio: .1, £1 Mio: .89, £0 Mio: .01}
a3: {£5 Mio: .1, £0 Mio: .9}       a4: {£1 Mio: .11, £0 Mio: .89}

Table 2: Allais’ paradox; usually a1 ≻ a2 and a3 ≻ a4.

In figure 4.a we show a set of level curves that represents risk-prone behaviour, and in figure 4.b indifference curves of risk-averse attitudes are graphed. Notice that the dotted lines connecting the choice lotteries of table 2 form a parallelogram with the base of the simplex. Therefore, no parallel, straight level set (i.e. indifference curves consistent with the independence axiom) can represent both a1 ≻ a2 and a3 ≻ a4 in either of the two representations. Hence, whatever the attitude towards risk, the choice behaviour described by Allais’ paradox cannot be represented by expected utility theory. Apparently, for this purpose we need a level set whose indifference curves are not parallel.²¹

Indifference curves in the level set of figure 4.c are not parallel but ‘fan out’ from a point of intersection south-east of the £1 Mio vertex in order to accommodate preferences that are forbidden by the independence axiom.


Figure 4: (a) Allais’ paradox with an expected utility level set that represents risk-prone behaviour, (b) a level set representing risk-aversion, (c) indifference curves showing the ‘fanning out’ property.

    2.3.2 Common ratio effect

The best-known examples of this second class of paradoxes are Kahneman & Tversky’s certainty effect and Hagen’s Bergen paradox (cf. [Kah79], [Hag71, 289]). Analytically, the common ratio effect is very similar to the common consequence effect.

d1: {X : p, 0 : 1−p}         d2: {Y : q, 0 : 1−q}
d3: {X : rp, 0 : 1−rp}       d4: {Y : rq, 0 : 1−rq}

Table 3: General form of common ratio effects (the ratio p:q equals rp:rq).

Here, the paradoxical result is obtained by setting p > q, r ∈ (0, 1), and 0 < X < Y. The special case of the certainty effect is illustrated in table 4.

c1: {£3.000 : 1, £0 : 0}           c2: {£4.000 : .8, £0 : .2}
c3: {£3.000 : .25, £0 : .75}       c4: {£4.000 : .2, £0 : .8}

Table 4: Kahneman & Tversky’s certainty effect as a special form of the common ratio effect.

Kahneman & Tversky report that in their experiments 80% of the subjects chose c1 over c2 and 65% chose c4 over c3, which again implies that level sets will show fanning out of indifference curves, contradicting the independence axiom.
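The conflict with expected utility can be verified mechanically (the normalisation u(0) = 0, u(4.000) = 1 and the grid scan are ours):

```python
# Sketch: no expected utility function fits the certainty effect.
# Normalise u(0) = 0, u(4.000) = 1 and scan a = u(3.000) over (0, 1).
feasible = [a / 1000 for a in range(1, 1000)
            if a / 1000 > 0.8             # c1 > c2:  u(3.000) > .8 u(4.000)
            and 0.2 > 0.25 * a / 1000]    # c4 > c3:  .2 u(4.000) > .25 u(3.000)
print(feasible)   # [] -- the first choice forces a > .8, the second a < .8
```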

    2.3.3 Order problems

In the preference reversal setting of Lichtenstein & Slovic, the agent is asked to choose between a number of betting situations of the form shown in table 5 (cf. [Lic71]). Agents are first asked to choose between the two bets and then to give their (true) certainty equivalents²² for the bets in the form of a selling and a buying price. Clearly, the bet that is chosen should also be the one given the higher certainty equivalent—this, however, is not in general the case. Lichtenstein & Slovic report that in one particular setting 127 out of 173 subjects chose e1 over e2 but assigned a higher value to e2 than to e1 (hence the names p-bet and £-bet).


Figure 5: (a) Preferences allowing for the certainty effect; (b) preferences forbidding the effect.

    e1 (p-bet): {X : p, x : 1 − p} e2 (£-bet): {Y : q, y : 1 − q}

    Table 5: Preference reversal when stated as certainty equivalent and payoff.

If X > x, Y > y, Y > X and p > q, a violation of the order axiom (A1) is usually observed in the agent’s choices. This can be exploited: a lottery can be bought from the agent and sold back to her at a premium while leaving her other options unchanged. That is, she can be used as a money pump. We shall argue below that, upon reflection, such behaviour is not plausible. This is the basis of our acceptance of the order axiom on normative grounds.²³ Some of the theories discussed in section four, however, can accommodate violations of transitivity.
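A minimal money-pump sketch (the cyclic preference A ≻ B ≻ C ≻ A and the premium are hypothetical stand-ins for the reversal described above):

```python
# Sketch: exploiting an intransitive cycle A > B > C > A as a money pump.
eps, cash = 0.10, 0.0
prefers = {"C": "B", "B": "A", "A": "C"}   # what the agent would swap up to

holding = "C"
for _ in range(3):            # one full trip around the cycle
    cash += eps               # she pays eps for each upgrade she prefers
    holding = prefers[holding]

print(holding, round(cash, 2))   # C 0.3: back where she started, 0.30 poorer
```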

Other systematic intransitivities have been reported by MacGrimmon & Larsson and by May (cf. [Mac79], [May54]).

    2.3.4 Framing effects

Framing effects are disturbing because, evidently, the way a decision problem is posed has some influence on how individuals choose. One of the most persuasive examples reported is due to Kahneman & Tversky, who ask doctors to choose between two alternative vaccination programs, each carried out on 600 persons:

“Assume that the exact scientific estimate of the consequences of the programs is as follows: If program A is adopted, 200 people will be saved. If program B is adopted, there is a 1/3 probability that 600 people will be saved, and a 2/3 probability that no people will be saved.”

“If program C is adopted, 400 people will die. If program D is adopted, there is a 1/3 probability that nobody will die, and a 2/3 probability that 600 people will die.” [Kah88]

Usually, agents choose program A over B but D over C, although probabilistically A equals C and B equals D. This behaviour violates a consequence of A1—asymmetry. This class of experiments targets the so-called reference point from which the agent evaluates variations in individual wealth. The focus on a reference point, as incorporated e.g. in prospect theory, is a modification of von Neumann & Morgenstern’s original concept of regarding total wealth as the argument of their Bernoulli utility functions. While normatively compelling, this was dropped by Kahneman & Tversky after they showed that agents choose inconsistently over lotteries with identical distributions over total wealth (cf. [Kah88]). The particular reference point taken by an individual to judge the desirability of a gamble, however, can be influenced by the description of the choice situation—this is not true to the same extent for final wealth levels.

Although the above example only targets asymmetry, there are different frames against virtually every axiom that has been proposed. Hence, we have to select the axioms we wish to keep on a normative basis. The arguments for doing so will be discussed in section five.

    2.4 Static Dutch book arguments

A Dutch book is a bet where, no matter how uncertainty or risk is resolved, the decision-maker will always lose. A decision situation is said to be dynamic if it involves some decisions to be made after (some of) the initial uncertainty is resolved, that is, if at least one information set²⁴ is more finely partitioned at some point of the decision making than was the case ex ante. When all decisions are made on the initial information partitioning, the choice situation is static. Part of the literature claims that only expected utility theory can ensure that preferences are of a form such that static Dutch books can never be made (cf. [Web87]). This is wrong for appropriately designed non-expected utility functionals because of an implicit continuity assumption. This is well documented in the literature and we shall only pick one example to illustrate it (cf. [Ana93, 79]). Let a risk-averse decision-maker hold preferences over w = {£3 : 2p, 0 : 1 − 2p} and v = {£3 : p, 0 : 1 − p} such that:

    u(w) < u(v) + u(v). (2.9)

Notice that w′ = {£3 : 2p + ε, 0 : 1 − 2p − ε} and v′ = {£3 : p + ε, 0 : 1 − p − ε} fosd w and v, respectively. By continuity, however, we can restate (2.9) as:

    u(w′) < u(v) + u(v) (2.10)

which is a direct violation of fosd, although this behaviour seems entirely sensible. Hence, we should be suspicious about proofs involving the continuity assumption in this context—not all failures of fosd are troubling.

Other breakdowns of fosd, however, do allow Dutch books to be made. If we use a functional form which does not employ the Bernoullian riskless intensity notion but uses probabilities in some more general form τ(pᵢ), we may have ∑ᵢ τ(pᵢ) ≠ τ(1).²⁵ If the inequality is strictly greater, we can always extract a positive amount from an individual who holds such preferences in return for a smaller sure payoff. Kahneman & Tversky’s ‘pure’ prospect theory (i.e. the mathematical form without the ‘editing out’ of dominated outcomes) violates fosd in this way, allowing for Dutch books. More sophisticated functionals (exhibiting stochastic dominance), however, avoid this shortcoming (cf. [Kel93, 17]).²⁶

All forms of intransitive preferences will lead to the possibility of Dutch books as well.²⁷ The same is true if someone’s preferences over lotteries are incoherent (i.e. not satisfying Kolmogorov’s axioms of probability)—a fact used by Ramsey in his statement of subjective probability theory.

    2.5 Dynamic Dutch book arguments

In the dynamic setting, where decisions take the form of contingent plans involving sequential moves, a distinction can be made between planned and actual choices. If there are no unexpected events, the planned and actual choices will be consistent. As in the preceding subsection, an important defence of expected utility functions would be to show that only the independence axiom implies dynamically consistent behaviour, and that a Dutch book can always be made against an agent holding non-expected preferences. In general, again, this is not the case—it is true, however, for theories that (implicitly) assume a version of consequentialism (discussed in section four).²⁸

Our plan is to show that there are reasonable consistency requirements which imply some version of utility maximisation. In particular, we want to explore which axioms are compatible with the additive separability of states implied by the independence axiom, while still entailing that agents choose dynamically consistently (if beliefs are updated using Bayes’ rule). There are several possible ways of establishing this.

    Employing the reduction axiom, we can rewrite table 2 in a way that lends itself to thedirect application of the independence axiom.

a1: {£1 Mio: .89, £1 Mio: .11}                 a2: {£5 Mio: .1, £1 Mio: .89, £0 Mio: .01}
a3: {£5 Mio: .1, £0 Mio: .89, £0 Mio: .01}     a4: {£1 Mio: .11, £0 Mio: .89}

Table 6: An equivalent form of Allais’ paradox; again usually a1 ≻ a2 and a3 ≻ a4.

This decision situation can be represented in the following extensive form game tree (cf. [Mac89, 57]). Nature’s move at A (A′) precedes the move by the decision-maker, and therefore this move is included in whatever is chosen at B (B′).

Figure 6: The two choices a1 : a2 and a3 : a4 of Allais’ paradox in the extensive form.

After Nature’s first move at A, the player’s choice at B precisely corresponds to the first row of Allais’ paradox in the above table. The choice at B′ corresponds to the table’s second row. The highlighted moves are the ones that are usually taken: a1 and a3. Notice that the subtrees rooted at B and B′ are identical—we therefore denote the choice node by B∗.

We first record the decision-maker’s choice at B∗, ignoring the preceding part of the tree (this is a consequentialist procedure), as the decision-maker’s planned move for the case that she is called upon to choose at B∗ (i.e. if Nature chooses ‘up’ at A or A′). If she were now put in the situation of actually deciding after Nature’s move (on which she has no influence whatsoever), it would be paradoxical if she were to change her mind. This is, however, precisely what subjects in experiments usually do by preferring a1 to a2 and a3 to a4. This can certainly not be brought into line with the kind of (temporal) separability the independence axiom allows. We shall use this as our intuitive notion of dynamic inconsistency between planned and actual moves.

A different version of dynamic inconsistency again employs the notion of money pumps (cf. [Gre87, 787]). An outside agent (who knows the preferences of the decision-maker) formulates compound lotteries and offers the decision-maker at intermediate stages a reduction in risk in exchange for an outcome beneficial to the outside agent. Thereby the outside agent leads the decision-maker from one probability distribution over wealth to another, with the difference being absorbed by the outside agent.²⁹ If this difference is positive, outsiders flourish, and decision-makers with such preferences will always lose money.

The route we shall follow for a more formal definition of dynamic consistency was charted by Karni & Schmeidler (cf. [Kar91, 1787ff]). As mentioned above, fosd does not imply separability across sublotteries. However, this is precisely what we need in order to fulfil our dynamic consistency requirements. We denote the space of all compound lotteries on X as P(X). Let z, y be two elements of the set of compound lotteries P(X) and denote the sublottery z of y as z ∈ y or (z|y). Finally, we define Ψ(X) = {(z|y) | y ∈ P(X), z ∈ y} as the space of lotteries and sublotteries on which the preference relation ≽ is defined. The preference relation ≽ satisfies dynamic consistency on Ψ(X) if for all quadruples of compound lotteries (y, y′, z, z′) we have:

C1—dynamic consistency: (y|y) ≽ (y′|y′) ⇔ (z|y) ≽ (z′|y′) (2.11)

where z ∈ y and y′ is obtained from y by replacing z with z′. This means that if the decision-maker prefers the compound lottery y to y′ then, if the sublottery (z|y) is played, she has no incentive to switch from z to z′.

Similarly, we define Hammond’s consequentialism (subsection 4.1.2), where actions are solely judged by their consequences, as follows: ≽ satisfies consequentialism on Ψ(X) if for all quadruples of compound lotteries (y, y′, z, z′):

C2—consequentialism: (z|y) ≽ (z′|y′) ⇔ (z|ȳ) ≽ (z′|ȳ′) (2.12)

where z ∈ y, z ∈ ȳ; y′ is obtained from y by replacing z with z′, and similarly ȳ′ is obtained from ȳ by replacing z with z′.

    Compound lotteries satisfy reduction if for all (y, y′, z, z′):

C0—reduction: (z|y) ≽ (z′|y′) ⇔ (z̄|ȳ) ≽ (z̄′|ȳ′) (2.13)

where z ∈ y, z′ ∈ y′, but z̄ is a simple probability measure.³⁰ z̄ is obtained by replacing the compound lottery z in y by its reduced form, which is obtained by applying the calculus of probabilities. z̄′ is defined analogously.

A representation theorem then establishes that if the preference relation ≽ on Ψ(X) satisfies C0 and C2, then it also satisfies C1—dynamic consistency—iff it satisfies the independence axiom A2 (cf. [Kar91, 1789]). This is our desired result.

The consequence of this discussion is that we cannot have it all: we cannot weaken the independence axiom while sticking to the reduction axiom to accommodate ‘paradoxical’ behaviour and still satisfy dynamic consistency in general. Our formulations of non-expected utility functions in section four will be measured by their ability to comply with the requirements imposed by static and dynamic consistency and the dominance criteria developed in this section. Moreover, we can structure the different approaches by checking which of the above axioms—reduction, consequentialism or dynamic consistency—is given up in order to arrive at the particular non-expected theory.

    3 A minimalist extension of expected utility

    3.1 The inductive model

To study the argumentation that leads to functional forms of utility functions which are more general than the above-described additive expected utility functions, we first develop a minimalist extension of expected utility. The idea is to provide the theory with the ability to discount not only wealth levels (in the form of utility functions) but also probabilities in very small or very large regions. In this extension, probabilities have the usual linear relationship throughout the interval [p̲, p̄], but p̲ > 0 and p̄ < 1 are two thresholds where the function is discontinuous. More formally:

U(x) = ∑ᵢ v(pᵢ) u(xᵢ), (3.1)

v(p) ≡ 0 if p < p̲;  p if p̲ ≤ p ≤ p̄;  1 if p > p̄. (3.2)

If individuals perceive probabilities³¹ as too low to care about or, symmetrically, too high to doubt the outcome, they consider the respective states as impossible or certain. This seems intuitively sensible as long as different thresholds are allowed for different outcomes. For instance, the lower tolerance limit for nuclear accidents may well be considerably lower than the lower limit for what is perceived as crossing a road safely. Two sensible (empirical) assumptions are:

dp̲(x)/dx < 0  and  dp̄(x)/dx < 0. (3.3)

This expresses the above idea that both the lower and the upper thresholds decrease with growing values of the gamble. For very high stakes, one is unwilling to accept the same risk as for relatively unimportant low stakes. Equivalently, the upper limit is constructed by formulating the bet symmetrically, i.e. by asking at which probability the nuclear plant is perceived as ‘safe’.
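A direct transcription of (3.1)–(3.2) as a sketch (the threshold values and the square-root utility are illustrative assumptions):

```python
# Sketch of the inductive model (3.1)-(3.2) with illustrative thresholds.
P_LO, P_HI = 0.01, 0.99        # assumed thresholds p_ and p-bar

def v(p):
    """Linear on [P_LO, P_HI]; rounds to 0 / 1 outside the thresholds."""
    if p < P_LO:
        return 0.0
    if p > P_HI:
        return 1.0
    return p

def inductive_value(lottery, u):
    """U(x) = sum_i v(p_i) u(x_i); note the v(p_i) need not sum to one."""
    return sum(v(p) * u(x) for x, p in lottery.items())

u = lambda x: x ** 0.5
print(inductive_value({0: 0.005, 100: 0.995}, u))
# 10.0: the tiny risk is ignored and the near-certain outcome treated as sure
```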

One interpretation of such behaviour is simple induction. A Bayesian can never actually reach probability 1 if there is any counter-evidence—be it ever so slight. In our inductive model, however, we can (discontinuously) jump from a threshold value that is ‘quite sure’ to absolute certainty.

The lower level p̲ can be interpreted as the attention span an individual requires to devote any attention at all to a phenomenon (e.g. the positive probability of breaking through the ceiling of a building is usually neglected). p̄, accordingly, can be seen as the probability above which an individual is certain that a phenomenon will occur (e.g. the sun rising tomorrow). This is in contrast to the usual Bayesian approach, which requires the density function between the two limits to be smooth and (indeed asymptotically) S-shaped. Since we now use the interpretation of induction, the abscissa is labeled H—the set of hypotheses—in figure 7.

The discontinuous nature of the inductive distribution function means that, in general, we shall lose many of the established economic tools we can use to show the existence of equilibria. We can, however, project our inductive model onto a lattice and study probability not, as in section two, as a probability measure on some σ-algebra of attributes, but as itself being a lattice. An argument in favour of such a strategy is that our inductive model will only be able to cope with Allais’ paradox if the probabilities in question are either very small or very large. This, however, is not necessarily the case. We can extend our model to what is suggested in figure 7.b: to give up the linear segment as well and only retain a discrete set of probability values. In this view, probabilities themselves constitute a set M that can be partially ordered by a relation ≻. There are no numerical probability measures but only a weak order on the elements of M. Addition and subtraction are defined in the usual way by their set-theoretic equivalents of union and intersection. If the probability ‘sum’ or ‘difference’ hits or exceeds the thresholds, the supremum and infimum (which themselves need not be elements of the lattice) of 1 and 0 result.

Figure 7: A smooth and a discontinuous cumulative distribution. The S-shaped function (a) represents the Bayesian approach, while the linear function with upper and lower thresholds (b) illustrates the inductive approach.

Notice that the minimum difference between two outcome probabilities towards which agents are not indifferent (i.e. the distance between lattice elements) can be arbitrarily small—the same is true for the distance of the thresholds from 1 and 0. We shall refer to this as the agent’s attention span. There exists an isomorphism between each element of X (outcomes) and the elements of M (probabilities).³²

    3.2 The inductive model’s predictions

The level set of this inductive model would look like figure 8.a in the probability simplex. The point is not that the level set fans out (which would signal a systematic one-sided distortion of how probabilities are mapped onto the lattice). Rather, we want to point out that there are only slight departures from parallelism. Moreover, indifference is not defined in a dense fashion but only for the values the probabilities can take (i.e. for the elements of M)—hence the dotted indifference ‘curves’.

Figure 8: A level set corresponding to a transitive inductive model and Allais’ paradox is shown in (a). In (b) we give an example of a non-additive union operation yielding a different result from expected utility theory.

Why should this be worth our while? The easiest way to demonstrate the general argument is again graphical. A drawback is that we have to restrict ourselves to two dimensions.³³

It is easy to see from figure 8.b that we can define unions which do not exhibit the additive separability of expected utility theory. Hence, we can easily find non-expected formulations. In expected utility theory, each lattice element is formed by multiplying the probability with the outcome; then all elements along the 45° diagonal of the lattice are joined (i.e. added). Separability means that only one element is evaluated at a time—the off-diagonal elements are ignored. Our union operation is considerably more general and can exclude certain diagonal elements while including some off-diagonal elements.

An example of a non-standard union operation is the set depicted in figure 8.b. In addition to the diagonal elements, it includes the element (p1, u(£1))—for some reason, this element is important for the decision. In the above example of the nuclear power plant, we could think of the small probability that the plant might explode as affecting all other states. Hence, state evaluation is not separable.

    3.3 A disclaimer

The most important step in developing a theory as outlined above is to show that some version of Kolmogorov’s axioms of probability still holds. This, however desirable, is not a simple task and will not be attempted here. Moreover, for the purposes of this essay it is not required. We just want to develop an alternative line of reasoning for the sake of illustrating which arguments are important—we do not need to show that it actually works. This is done for the more elaborate theories in the following section.

    4 Non-expected utility theory

There exists a variety of non-expected utility theories. All are designed to accommodate empirically established violations of one or more of the axioms A0–A3. The focus of the present discussion lies on alternatives to the additive separability implied by the independence axiom. The alternative roads that are taken are to discuss early alternatives, the betweenness property, and some form of rank-dependent probability representation. These four different (empirically increasingly weak while mathematically progressively more general) fundamental intuitions give rise to a number of different axiomatisations, which will be discussed in turn. Space limitations, however, force us to discuss only two examples of each category. For each different kind of separation, an example is presented in section five which illustrates the fundamental intuition behind this class of theories.³⁴

    4.1 Variations on the additivity property

If preferences satisfy the independence axiom, outcomes in the respective lottery are independent of the specific context they are placed in. The theories described in this subsection are extensions or reformulations of expected utility theory that leave von Neumann-Morgenstern’s basic tenet intact. Their level sets are therefore properly represented by figure 1 and are not duplicated here.

    4.1.1 Bernoulli’s riskless intensity notion

Using the notation of section two, Bernoulli’s notion of expected utility is

U(p) = ∑ᵢ p(xᵢ) log(xᵢ). (4.1)

Quite remarkably, and with considerable effect on economics, this is the first statement of the principle of (logarithmically) diminishing marginal ‘moral worth’, or utility, of wealth. Obviously, the additive separability implied by the independence axiom is obeyed.

Since Bernoulli uses a specific (risk-averse) function u(·), his version is less general than von Neumann-Morgenstern expected utility theory. Bernoulli does not place weights on the probabilities—hence the name riskless intensity (cf. [Kar91, 1778], [Fis88a, 50]).

    4.1.2 Hammond’s consequentialism

The separability of outcomes given by the independence axiom is backed up by Hammond’s consequentialist analysis. If an agent ignores all information contained in the decision tree preceding her choice node, she acts in a purely consequentialist manner: only the consequences of the agent’s action count. Hence, Hammond’s version of the independence axiom is that in any finite decision tree, everything that influences a decision must be modelled as a consequence. It is consequences and nothing else that determine behaviour. This axiom—we stated a version as C2 in (2.12)—justifies studying only the normal form (i.e. the decision matrix) of the tree and neglecting the extensive form (i.e. the decision tree) of a decision situation, because there is no additional information contained in the tree.

Hammond’s axiom is an in principle testable normative hypothesis and, as shown in subsection 2.5, it implies the independence axiom. As Hammond proves, a consequentialist norm prescribing consistent behaviour over a decision tree can be defined which only depends on consequences of behaviour. He proceeds to show that there exists a (complete and transitive) revealed preference ordering maximising this norm at any decision node that satisfies both the independence axiom and a version of the sure-thing principle allowing independent probabilities. Finally, he proves that these conditions (consistency, order, independence axiom / sure-thing principle) are a complete characterisation of consequentialist behaviour. Together with an additional continuity axiom, this implies the existence of a von Neumann-Morgenstern expected utility function. That is, consequentialism implies (in its objective probability version) expected utility maximisation.

To obtain the subjective version of the theory, additional assumptions have to be made. As Hammond points out, and contrary to the objective case, it is not the case that all probabilities must be independent—consequently, we arrive at a state-dependent utility theory which is based on multiplicative rather than additive separability (cf. [Ham88, 74]). Hence, there is also a foundation for non-expected utility theory in Hammond. Further support for this rejection of additive separability comes from Drèze. He shows that the standard hypothesis of possible consequences being independent of the state is completely unacceptable ‘if states include calamities such as accidental death or injuries’. Under these circumstances, the lottery framework of expected utility theory is not useful and is usually replaced by state-dependent utility formulations (cf. [Drè87], [Mas96, 199ff]).

The same conclusion, the unacceptability of the additive separability assumption if utility depends on states, is also implied by another result of the consequentialist approach. One can show that, to avoid inconsistencies of behaviour, there cannot be zero probabilities of consequences at any chance node of a decision tree. This is in general not defendable and, hence, the assumption of additive separability cannot be maintained in general. We need non-expected formulations.


4.2 Early alternatives

The basic idea of all theories in this category is to assign probability weights to single outcomes in order to accommodate empirically troubling effects such as Allais’ paradox. All these attempts are vulnerable to Dutch books—only the more sophisticated RDU approach can circumvent this problem by looking at the complete distribution.

    4.2.1 Intensity theory

Intensity theory goes back to Allais, who uses the same form as Bernoulli and von Neumann-Morgenstern and extends it with a functional θ(p∗). Indeed, Allais is led to a Bernoulli utility function with a shape very similar to Bernoulli’s own logarithmic function (cf. [All53, 34], [Sug86, 12], [Fis88, 274]).

V(p) = θ(p∗) + ∑ᵢ u(xᵢ) p(xᵢ), (4.2)

where θ(p∗) is defined as the measure induced by p on the differences of utilities from expected utility. Allais argues that θ depends at least on the second moment (M2), and Hagen takes this to the third moment of p∗ (M3).³⁵ Hence, the basic idea of this approach is to enrich von Neumann-Morgenstern’s conception by factors determined by the shape of the probability distribution over outcomes. If θ vanishes, that is, if we neglect all distributional aspects other than the expectation, we are back with expected utility. A particularly simple form of θ would be:

    θ = αM2 + βM3. (4.3)

Allais assumes the sign of the second moment’s influence to be negative, while he assumes β to be positive. This has the reasonable interpretation that people dislike increased risk and show higher risk aversion with respect to good outcomes than to bad ones. The simple form above, however, is vulnerable to Dutch book attacks because it does not ensure that stochastically dominating lotteries are chosen. Loomes & Sugden’s disappointment theory incorporates this aspect with a straightforward extension that amounts to a locally linear S-shaped utility function (concave for gains but convex for losses) as depicted in figure 17.b (cf. [Sug86, 13]). This type of utility function is the basis of Machina’s locally linear approximation of utility functions (cf. [Mac82]). Since this approach does not fit well within our axiomatic framework, we shall not discuss it here.³⁶ The level set of figure 9.a below is, however, applicable to both theories.

As can be seen from figure 9.b, in Allais’ theory the decumulative probability density is distorted both horizontally, as the effect of the Bernoulli utility function, and vertically, as the effect of the functional θ(p∗). For a decumulative density function that shows only the effect of expected utility functions, see figure 15.a.

Allais does not assume that decision-makers maximise the expected value of their riskless utility, since he considers only one-off choice situations where ‘it would be wrong to consider that a strategy of maximising the mathematical expectation would be a good first approximation’ (cf. [All53, 73, 92ff]). In that, he departs from Bernoulli and von Neumann-Morgenstern. Allais’ basic principles are the reduction axiom A0 and a version of the order axiom A1 that he calls the ‘axiom of absolute preference’. This axiom contains the assumption that decision-makers’ preferences satisfy fosd, because otherwise consistency cannot be ensured.


Figure 9: (a) A level set corresponding to Allais’ non-expected utility theory and (b) the decumulative probability representation of the same theory.
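A sketch of (4.2)–(4.3), building θ from the second and third central moments of the utility distribution p∗ (the signs α < 0 and β > 0 follow Allais’ suggestion; the magnitudes and the linear u are our assumptions):

```python
# Sketch of Allais' intensity form (4.2) with theta = alpha*M2 + beta*M3 (4.3).
def allais_value(lottery, u, alpha=-0.1, beta=0.02):
    eu = sum(p * u(x) for x, p in lottery.items())              # expectation
    m2 = sum(p * (u(x) - eu) ** 2 for x, p in lottery.items())  # 2nd moment
    m3 = sum(p * (u(x) - eu) ** 3 for x, p in lottery.items())  # 3rd moment
    return alpha * m2 + beta * m3 + eu

u = lambda x: x                      # risk attitude carried entirely by theta
gamble = {0: 0.5, 10: 0.5}
sure   = {5: 1.0}
print(allais_value(gamble, u), allais_value(sure, u))   # 2.5 5.0
```

With these signs the gamble is penalised for its variance even though u itself is linear, which is exactly the enrichment over expected utility described above.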

    4.2.2 Prospect theory

This very influential descriptive approach is due to Kahneman & Tversky (cf. [Kah79]). They developed their theory as a direct response to their experimental evidence against expected utility theory. In contrast to the other theories presented here, it is a descriptive theory and has no normative standing. Indeed, Kahneman & Tversky believe that no theory can exist that is both descriptively accurate and normatively appealing (cf. [Fis88a, 26]). The normative touch Kahneman & Tversky give their theory is the invariance property of preferences to different frames (cf. [Fis88a, 27]). As mentioned above, lotteries in this theory are not about final wealth levels but about deviations from a certain reference point.

Prospect theory distinguishes two successive stages: the editing phase and the evaluation phase. At the editing stage, decision-makers contemplate the choice situation and, if possible, simplify the problem. This includes the operations of coding, combining, segregating, rounding, and cancelling that, in essence, amount to the usual manipulation rules we applied above when rewriting Allais’ paradox from the representation of table 2 into table 6. The most important additional operation included in the editing stage is the detection of dominated prospects, which are ruled out and discarded. Therefore, prospect theory is not solely based on the probability distribution over the ultimate payoffs but on additional factors which are seen as important ingredients of the individual’s actual choice—in this, the theory is similar to regret theory. The activities of the editing stage are referred to below as the methodological twist—without them, prospect theory would be prone to Dutch book attacks and, being unable to model intransitivity, descriptively false.

In the subsequent evaluation phase, the prospects are ranked and the most highly valued risky outcome is chosen. Prospect theory employs two functionals: a probability weighting functional τ(p) as shown in figure 10.b, and a Bernoulli utility function u(x) measuring gains and losses, which is S-shaped to reflect Kahneman & Tversky’s findings that individuals are risk-prone towards losses and risk-averse towards gains (cf. figure 17.b). This amounts to:

V(p) = ∑ᵢ τ(pᵢ) u(xᵢ). (4.4)

In general, τ(p) + τ(1−p) ≠ 1; hence, τ(·) cannot be a simple measure. Prospect theory reduces to the expected utility form if τ(·) is the identity mapping. That this is indeed not in general the case, while stochastic dominance preferences are nonetheless ensured, is due to the methodological twists of the editing stage.

Figure 10: (a) A level set corresponding to prospect theory and (b) a weighting function τ(·).

Interpreted in a purely descriptive fashion, prospect theory allows Dutch books—only if we allow for the methodological twists is the theory an alternative to expected utility theory.
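The inverse-S shape of figure 10.b can be illustrated with the one-parameter form Tversky & Kahneman used in later work; we borrow it here purely to demonstrate τ(p) + τ(1−p) ≠ 1 (the functional form and γ = 0.61 are not part of the 1979 theory surveyed above):

```python
# Sketch: an inverse-S probability weighting function (illustrative form).
def tau(p, gamma=0.61):
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

for p in (0.01, 0.5, 0.99):
    print(p, round(tau(p), 3), round(tau(p) + tau(1 - p), 3))
# small probabilities are overweighted, large ones underweighted,
# and tau(p) + tau(1-p) != 1: tau is not a simple measure
```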

    4.3 The betweenness property

A weakened form of the independence axiom A2, called the betweenness axiom, says that the preference ranking of a probability mixture of any two lotteries is always intermediate between the individual lotteries.³⁷ We leave the order axiom A1 unchanged and therefore retain the weak order ≻ and the equivalence relation ∼. Again let p, q ∈ P(X), with α ∈ (0, 1):³⁸

B2—separation (‘betweenness’): p ≻ q ⇒ p ≻ αp + (1−α)q ≻ q. (4.5)

    This implies for every p, q ∈ P (X) with p ∼ q and α ∈ (0, 1) that

    p ∼ αp + (1 − α)q ∼ q, (4.6)

that is, if the decision-maker is indifferent between two lotteries, she is indifferent to any mixture of them as well. Figure 11.b illustrates this point. Betweenness is clearly implied by independence; this is easy to see since (4.6) implies both quasi-concavity and quasi-convexity of ≻ on P(X).³⁹ Therefore, level curves are linear, but they are not parallel and they need not all emanate from a single point (cf. [Che98, 211]). Hence, mixtures between lotteries are not fully separable. A picture may clarify the difference between independence and betweenness:

    For all functionals V (p) that satisfy betweenness, we have for all α ∈ (0, 1):

    V (p) > V (q) ⇒ V (p) > V (αp + [1 − α]q) > V (q). (4.7)

In general, a more appropriate formulation of continuity B3 has to be supplied (cf. [Kar89, 431], [Kar91, 1772]). Since its interpretation is equivalent to A3, we do not duplicate it here. Axiomatisations in this subsection are based on axioms A0, A1, B2, and B3.

    4.3.1 Weighted (linear) utility theory

Weighted utility theory is based on an axiomatisation of the betweenness property by Chew and MacGrimmon called the weak substitution axiom (cf. [Che83], [Mac79]).

Figure 11: Three lotteries with i = αp + (1−α)q, α ∈ (0, 1); (a) illustrates (4.6) and (b) shows that (4.7) forbids non-linear level curves.

The independence axiom requires that mixtures, in the same proportions, of two distributions having the same expectation with another distribution share the same mean, regardless of the third distribution they are mixed with. The weak substitution axiom allows these mixture proportions, which give rise to the same mean value, to differ (cf. [Che83, 1068]).

One interpretation of weighted utility theory is in terms of the ratio of transformed probability to probability as a function of the outcome x. If the positive weighting functional τ(·) is low for highly ranked outcomes and high for lowly ranked outcomes, the resulting distortion of probabilities implies overestimation of the disregarded outcome’s probability and underestimation of the highly regarded outcome’s probability. An appropriate choice of τ may well represent a pessimistic (resp. optimistic) attitude towards risk (cf. [Kar91, 1775]). We use:

V (p) = ∑i pi u(xi) / ∑i pi τ(xi) (4.8)

to separate pi and pj: The final utility attached to an outcome is given by the ratio of the two ‘expected utility’ functions u/τ. If τ(·) is identically 1, the above formulation reduces to the expected utility form. In general, however, the weighted linear utility function is given by the ratio:

pi > pj ⇒ u(pi)/τ(pi) > u(pj)/τ(pj). (4.9)

In the simplex representation, indifference curves for this form are straight, though not in general parallel, and intersect at a point outside the simplex. Here the intersection of indifference curves does not imply intransitivity because indifference curves are not defined outside the simplex: preference structures over outcomes outside the choice set do not matter (cf. [Sug86, 11]).

Figure 12: A level set corresponding to weighted linear utility theory.
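A minimal sketch of the functional (4.8), taking the text literally as the ratio of two ‘expected utility’ forms, follows. The outcome set, the Bernoulli utilities u, and the weighting function τ are assumptions made purely for illustration.

    # A sketch of the weighted utility functional (4.8).  Utilities u and
    # weights tau are assumptions; tau is low on highly ranked outcomes,
    # which the text interprets as a pessimistic attitude.

    u   = {0: 0.0, 1: 1.0, 2: 1.8}   # Bernoulli utilities of outcomes 0, 1, 2
    tau = {0: 1.4, 1: 1.0, 2: 0.7}   # positive weighting functional

    def weighted_utility(p):
        """V(p) = sum_i p_i u(x_i) / sum_i p_i tau(x_i), cf. (4.8)."""
        num = sum(pi * u[x] for x, pi in p.items())
        den = sum(pi * tau[x] for x, pi in p.items())
        return num / den

    lottery = {0: 0.2, 1: 0.3, 2: 0.5}
    print("weighted utility:", round(weighted_utility(lottery), 4))

    # With tau identically 1 the denominator sums the probabilities to 1
    # and the functional collapses to the expected utility form.
    tau = {x: 1.0 for x in tau}
    print("tau == 1 (EU)   :", round(weighted_utility(lottery), 4))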

    4.3.2 Regret and SSB theory

Regret theory as defined by Loomes & Sugden is a normative approach that uses a modified utility functional r and allows for intransitivity by dropping axiom A1 (cf. [Loo82]). Fishburn’s independent axiomatisation rests on the betweenness and continuity axioms (cf. [Fis82], [Fis88a, 68f]). Since the feeling of regret is a fundamentally individual capacity, the theory is founded on subjective probabilities; in that it differs from all other theories discussed here. Different from the subsequently discussed rank-dependent theories, regret theory is based on aspects other than the probability distribution over the ultimate payoffs (viz. on regret). Therefore, important ingredients of individual choice, such as variations in the temporal resolution of the uncertainty or of the payoffs themselves, can be included (cf. [Gre88, 377]). A simple formalisation of regret theory defines the modified utility function r as:

    r(xi, xj) = u(xi) + R[u(xi) − u(xj)]. (4.10)

The functional r is designed to accommodate a regret factor along with the usual Bernoullian riskless intensity notion u. The idea is that if action p is chosen from {p, q} and consequence xi obtains as a result of the choice, one may rejoice if u(xi) > u(xj), but experience regret if u(xj) > u(xi), where xj is the consequence that would have resulted if action q had been chosen. Depending on the regret/rejoice functional R, regret theory can amplify or dampen the difference between r(xi, xj) and r(xj, xi) and therefore accommodate different intensities.

    Decision-makers then maximise the non-expected utility of choosing p rather than q:

V (p) = ∑i ∑j r(xi, xj) p∗i pj. (4.11)

The full power of the theory is only reached when a skew-symmetric bilinear (SSB) functional is used that represents preferences by an SSB functional φ on L × L. φ is skew-symmetric if φ(xi, xj) = −φ(xj, xi) for all xi, xj ∈ L, and it is bilinear if it is linear separately in each argument (cf. [Fis88, 275]). φ is defined as:

φ(xi, xj) = r(xi, xj) − r(xj, xi). (4.12)

SSB theory requires in its axiomatised form only the very weak substitution axiom ([Kar91, 1776]). The major achievement of this extended formulation is that it can model statistically dependent prospects, which (4.11) cannot. Decision-makers maximise:

V (p) = ∑i ∑j φ(xi, xj) p∗i pj. (4.13)

As mentioned above, this freedom in assigning regret or rejoice to the same choice using an SSB functional allows for the modelling of cyclic (i.e. intransitive) behaviour that is observed in e.g. voting.40 In the simplex representation, indifference curves can intersect at one interior point. With such a weakened structure we can accommodate all of the paradoxes discussed in section two since our level set can exhibit both the fanning out and intransitivity properties.

Figure 13: Level sets corresponding to SSB theory. The intersection of indifference curves can be both inside the simplex as in (a) or outside as in (b). Since SSB theory generally allows for intransitivity, cycles are not excluded.
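The cycle possibility can be made concrete with a small sketch of the evaluation (4.13). The skew-symmetric matrix φ below is an assumption chosen purely for illustration (it is not derived from the regret form (4.10)); SSB theory only requires skew-symmetry, and this choice generates a strict preference cycle over three sure outcomes.

    # A sketch of the SSB evaluation (4.13):
    # Psi(p, q) = sum_ij phi(x_i, x_j) p_i q_j, with p preferred to q
    # whenever Psi(p, q) > 0.  The matrix phi is an assumed example.

    phi = [[ 0.0,  1.0, -1.0],      # phi(x_i, x_j) = -phi(x_j, x_i)
           [-1.0,  0.0,  1.0],
           [ 1.0, -1.0,  0.0]]

    def ssb(p, q):
        return sum(phi[i][j] * p[i] * q[j]
                   for i in range(3) for j in range(3))

    d1, d2, d3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)   # degenerate lotteries
    print(ssb(d1, d2) > 0)   # True: x1 preferred to x2
    print(ssb(d2, d3) > 0)   # True: x2 preferred to x3
    print(ssb(d3, d1) > 0)   # True: x3 preferred to x1 -- a cycle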

    4.4 Rank-dependent probabilities

The non-expected utility theories based on Quiggin’s rank-dependent probabilities (RDU) extend Allais’ and Hagen’s idea that utility evaluation should be based on more than just the first moment (i.e. the expectation) of (cardinal) utility (cf. [Qui93, 55]).41 The RDU approach is characterised by ordering outcomes prior to the application of a utility representation. The ranking is used to assign a weight to an outcome depending on the relative rank of this outcome in the whole probability distribution of outcomes (i.e. lotteries). The weighting functional is defined on the cumulative probability distribution p+ as τ: [0, 1] → [0, 1]. If it is the identity function, we are back with expected utility theory, but for convex τ(·) with increasing elasticity we can accommodate fanning out. This approach escapes the fosd vs. Dutch book problem by relying on a transformation that considers the whole structure of a risky prospect. Similarly, sophisticated formulations such as the ordinal independence approach can avoid higher-order dominance problems as well.42 RDU theories keep the reduction, order, and continuity axioms (A0, A1, A3) but reformulate the separation axiom A2.

RDU approaches that cannot be discussed here are Quiggin’s anticipated utility theory and non-Lebesgue-measure approaches as pioneered by Segal.

    4.4.1 Dual theory

An approach dual to von Neumann-Morgenstern’s was axiomatised by Yaari for u(xi) = xi (cf. [Yaa87]). Here, the wealth levels are undistorted, but the attitude towards probabilities can be specified. Hence, in this case the utility function is linear in wealth rather than in probability:

V (p) = ∑i τ(p+(xi)) u(xi). (4.14)

In Yaari’s model, the decision-maker’s attitude towards risk is not reflected in the curvature of the Bernoulli utility function but in the way the decumulative probability distribution function is distorted when a lottery is evaluated by the decision-maker. The difference to expected utility theory is shown in figure 15: Yaari’s decumulative distribution function is vertically distorted, while the expected utility function is horizontally distorted (cf. [Mun87, 11, 23]).

Figure 14: A level set corresponding to dual theory. Notice that Yaari’s approach is linear in outcomes rather than in probabilities. The weighting functional τ(p+) in (b) compares with the unrestricted form of 10.b.

Figure 15: The horizontal corrections made by expected utility theory with respect to a lottery with a linear distribution function are shown in (a), while (b) shows the vertical corrections made by dual theory.

As seen in figure 15, expected utility theory replaces for each decumulative density level p+ the outcome x by its valuation u(x); hence the horizontal distortion. Dual theory goes the opposite way: for each outcome level x, the probability density is distorted vertically by replacing p+ with τ(p+). Therefore, in Yaari’s model it is the curvature of τ(p+) that represents attitudes towards risk. Again, concavity represents risk-aversion (cf. figure 9.b).

An axiomatisation of the dual theory is based on the dual independence axiom (cf. [Qui93, 148]).
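A minimal sketch of the dual evaluation follows, under assumptions of our own: a two-outcome lottery and a power distortion. For ease of coding we distort the cumulative rather than the decumulative distribution (the mirror-image formulation); a concave τ then overweights the low-ranked outcomes and produces the risk-averse valuation described in the text.

    # Yaari's dual evaluation with u(x) = x: risk attitude lives entirely
    # in the distortion tau of the (here: cumulative) distribution.
    # The lottery and tau below are illustrative assumptions.

    def dual_value(outcomes, probs, tau):
        """Outcomes weighted by increments of the distorted cumulative."""
        value, cum = 0.0, 0.0
        for x, p in sorted(zip(outcomes, probs)):   # ascending rank order
            value += x * (tau(cum + p) - tau(cum))  # decision weight of x
            cum += p
        return value

    concave = lambda s: s ** 0.5                    # overweights low outcomes

    print(dual_value([0, 100], [0.5, 0.5], concave))      # about 29.3 < 50
    print(dual_value([0, 100], [0.5, 0.5], lambda s: s))  # 50.0: expectation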

    4.4.2 Ordinal independence

Green & Jullien replace the independence axiom by their ordinal independence axiom to weaken the implied linearity to quasi-convexity of preferences (cf. [Gre88, 255], [Qui93, 149]). In repeated betting situations ordinal independence amounts to two conditions:

• The decision-maker never agrees to (intermediate) outcomes that are fosd’d by the previous distribution.

• An outsider proposing a bet to the decision-maker should not be able to extract a profit with a positive mean.

Green & Jullien conclude that for any quasi-convex preference relation and any initial wealth level, no manipulation exists that leads the agent to a stochastically dominated alternative. Notice that a quasi-convex but non-linear preference relation can be manipulated from some initial random wealth but not from any non-random wealth. Whenever we can eliminate distributions that would indeed lead to the possibility of manipulations (e.g. by looking at the total ranking of outcomes), we have a strong argument for quasi-convexity and, hence, only a weak argument for additivity. This result supports similar results by Kreps & Porteus and Machina (cf. [Kre78], [Mac84], [Gre87, 788]).

In its spirit, the theory is an extension of Quiggin’s anticipated utility theory. The idea behind ordinal independence is that if two distributions over payoffs share a common tail, then this common tail can be modified for both distributions without altering the individual’s preference between these distributions. The shared segments do not affect the ranking of the distributions: preference is determined only by the interval on which the two distributions differ. This is considerably weaker than the independence axiom and is illustrated in figure 16.

Figure 16: Distributions F and G share the same tails; they differ only on S = [a, b]. (G is not necessarily a mean-preserving spread of F.)

The interpretation of the ordinal independence axiom bears some resemblance to the psychological concept of editing in e.g. prospect theory, in that the part where the distributions coincide is edited out. While the independence axiom implies additive separability along a single dimension, the utility functionals implied by the ordinal independence axiom exhibit additive separability along multiple dimensions (i.e. the Gorman form, cf. [Mas96, 119f]).

Let the preference relation ≿ be complete, transitive, and continuous. Then the ordinal independence axiom for the distributions F and G, S ⊂ X, is defined as (cf. [Gre88, 357], [Qui93, 149]):

D2—ordinal independence: If F ≿ G and

i) F̃(x) = G̃(x) ∧ F(x) = G(x) ∀x ∈ S, and ii) F̃(x) = F(x) ∧ G̃(x) = G(x) ∀x ∉ S, (4.15)

then F̃(x) ≿ G̃(x).

Together with appropriate forms of the order and continuity axioms and suitable monotonicity (i.e. stochastic dominance) assumptions, this axiom defines a class of non-expected utility functionals that we can compare in its discrete form to alternative functionals (cf. [Qui93, 57]):

V (p) = ∑_{i=1}^{n} u(xi) [τ(∑_{j=1}^{i} p(xj)) − τ(∑_{j=1}^{i−1} p(xj))]. (4.16)

This formulation allows for the above-mentioned S-shaped (concave-convex) utility functions that are able to accommodate the Friedman-Savage paradox of simultaneous gambling and insurance (cf. [Che83, 1082], [Gre88, 372]).
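A minimal sketch of (4.16) follows, together with a numerical check of the ordinal independence idea: modifying a tail that two distributions share leaves their ranking unchanged. The outcomes, the utility function u, and the distortion τ are illustrative assumptions.

    # The rank-dependent functional (4.16) and a common-tail check.
    # All numbers, u, and tau are assumptions chosen for illustration.

    def rdu(dist, u, tau):
        """V(p) = sum_i u(x_i)[tau(sum_{j<=i} p_j) - tau(sum_{j<i} p_j)]."""
        value, cum = 0.0, 0.0
        for x, p in sorted(dist.items()):       # outcomes in ascending rank
            value += u(x) * (tau(cum + p) - tau(cum))
            cum += p
        return value

    u, tau = lambda x: x ** 0.8, lambda s: s ** 2

    # F and G share both tails; they differ only on the middle outcome.
    F = {0: 0.2, 10: 0.3, 100: 0.5}
    G = {0: 0.2, 25: 0.3, 100: 0.5}
    print(rdu(G, u, tau) > rdu(F, u, tau))      # True: G ranked above F

    # Modify the shared upper tail in *both* distributions (100 -> 60):
    F2 = {0: 0.2, 10: 0.3, 60: 0.5}
    G2 = {0: 0.2, 25: 0.3, 60: 0.5}
    print(rdu(G2, u, tau) > rdu(F2, u, tau))    # True: ranking unchanged

The ranking is preserved because the decision weight of the middle rank, τ(0.5) − τ(0.2), is the same in all four evaluations, so the difference V(G) − V(F) depends only on the interval where the distributions differ.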

Figure 17: (a) A level set corresponding to ordinal independence theory and an S-shaped Bernoulli utility function over monetary gains and losses in (b).

Figure 17.a shows a level set for both an S-shaped Bernoulli function (as in 17.b) and an S-shaped weighting functional (as in 14.b). This bestows the theory with enormous flexibility but also considerable weakness in identifying irrational behaviour.

    5 Implied notions of rationality

The axioms of expected utility theory prescribe a particular form of rationality for homo economicus. It is obvious from the previous discussion that there are many different axioms that compete for this role. The most prominent are alterations of the independence axiom, but the other axioms are by no means sacrosanct either. Therefore, the need arises to reflect upon the axioms’ intuitive appeal in order to determine which set of axioms we are most inclined to adopt.

The main reason why expected utility is still the most widely used theory for modelling decisions under risk and uncertainty is its normative appeal to many researchers (cf. [Har92, 320]). Normative approaches are concerned with consistency and coherence requirements of rational preferences that are mostly formulated as (in themselves rather) convincing axioms. They do not necessarily have descriptive accuracy but, upon reflection, the axioms should convince people that their choices are wrong if the axioms are violated: people ought to behave as prescribed by the theory, although they sometimes make errors and do not. The normative interpretations of the two competing approaches discussed here are far less developed. Their claims to be reasonable foundations of rationality are nevertheless strong.

For convenience, we shall first state a somewhat more verbose version of the axioms A0-A3 in a concrete example. Although intuitive, it is not a full statement and is therefore analytically inferior to the earlier statement (cf. [Bin94, 272f]).

Let X be a set of sure outcomes of a decision under risk and let w, b ∈ X be its worst and best elements. S = {b : p, w : 1 − p} is a simple lottery with only two outcomes. Let C = {b : p, x : q, w : r} be a 2-stage compound lottery. We assume the existence of stochastically independent objective probabilities. For every outcome x ∈ X there exists a probability p∗ such that

E0—reduction: x ∼ {b : p∗, w : 1 − p∗}, b ∼ {b : 1, w : 0} and w ∼ {b : 0, w : 1}. (5.1)

    E1—order: Higher probabilities p∗ in {b : p∗, w : 1 − p∗} are preferred. (5.2)

Since (5.1) gives indifference of x, b and w with {b : p∗, w : 1 − p∗}, {b : 1, w : 0} and {b : 0, w : 1}, we can substitute the latter for the former in the compound lottery C:

E2—independence: {{b : 1, w : 0} : p1, {b : p∗, w : 1 − p∗} : p2, {b : 0, w : 1} : p3}. (5.3)

E0–E2 amount to the agent maximising the expectation of the lottery

{{b : 1, w : 0} : p1, {b : p∗, w : 1 − p∗} : p2, {b : 0, w : 1} : p3},

which equals p1b + p2p∗b + p3w. If we define u(b) = 1, u(w) = 0, and u(x) = p∗, we get the expected utility form U(C) = p1u(b) + p2u(x) + p3u(w) = ∑ pu(·).43 Hence, agents behave as if maximising expected utility.
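The reduction argument can be traced numerically; the probabilities below are illustrative assumptions.

    # Tracing E0-E2: substitute the simple-lottery equivalent of x into
    # the compound lottery C and reduce to a distribution over {b, w}.

    p1, p2, p3 = 0.3, 0.5, 0.2    # C = {b : p1, x : p2, w : p3} (assumed)
    p_star = 0.6                  # x ~ {b : p_star, w : 1 - p_star}, by E0

    # Reduction: total probability of the best outcome b after substitution.
    prob_b = p1 + p2 * p_star     # 0.3 + 0.5 * 0.6 = 0.6
    prob_w = 1 - prob_b

    # With u(b) = 1, u(w) = 0 and u(x) = p_star, the expected utility of C
    # coincides with the reduced probability of the best outcome.
    expected_utility = p1 * 1 + p2 * p_star + p3 * 0
    assert abs(expected_utility - prob_b) < 1e-12
    print(prob_b, expected_utility)   # 0.6 0.6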

    5.1 A0: Reduction

A simple justification of the reduction axiom goes like this. If p, q are lotteries, e.g. p = {1 : .4, 0 : .6}, q = {2 : .5, 0 : .5}, then a mixture of the two is not trivial and can be interpreted as choice from a 2-stage compound lottery, which implies the reduction axiom (cf. [Kar91, 1769]). The justification for the existence of objective probabilities lies in the fact that the probabilities come from mixed strategies, i.e. strategy profiles that are obtained by the employment of a randomisation device such as a coin toss. Since each mixed strategy is based on its own device, it is quite reasonable to assume complete stochastic independence of the underlying probabilities.44 This property commands support for the reduction principle if the generated utility functions are used for game theoretic analysis. If this is not the case, A0 may be indefensible (cf. [Drè87]).

    5.2 A1: Order

Axiom A1—order gives completeness, reflexivity, and transitivity of our primitive preference relation. The completeness property can easily be attacked. It stems from the analytical requirement of ex ante and constant preferences over outcomes and says that the decision-maker must be able to compare all pairs of possible risky outcomes, which is obviously unrealistic. It is, however, much easier to defend the complete preordering requirement in the specialised expected utility context than for general goods-bundles, since it is quite plausible that people have a universal (i.e. complete) preference order over final wealth levels. Hence, we rule out incommensurability of prospects.

Reflexivity requires p ∼ p, which is a mere technical requirement that always holds.

Transitivity is more serious. The idea behind transitivity is that people will always want to correct intransitive behaviour if they discover it in their preordering, simply because if they do not, Dutch books can be made against them (cf. [Fis88a, 10]); a minimal money-pump sketch of this argument follows. The problem is more subtle, however, as the restaurant example below may illustrate.
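The sketch assumes an agent with the cyclic strict preference A ≻ B ≻ C ≻ A and a fixed fee per trade; both are illustrative assumptions.

    # A money-pump sketch: an agent with intransitive strict preferences
    # pays a small fee for each 'upgrade' and can be cycled indefinitely.

    prefers = {("A", "B"), ("B", "C"), ("C", "A")}   # intransitive cycle

    def trade(holding, offer, fee_paid, fee=1.0):
        """Swap holding for a strictly preferred offer, paying the fee."""
        if (offer, holding) in prefers:
            return offer, fee_paid + fee
        return holding, fee_paid

    holding, paid = "A", 0.0
    for offer in ["C", "B", "A", "C", "B", "A"]:     # two full cycles
        holding, paid = trade(holding, offer, paid)

    print(holding, paid)   # back to "A" having paid 6.0 -- a pure loss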

In apparent violation of transitivity, an individual chooses Salmon while Hot Dog is also available in restaurant A and therefore reveals her preferences as Salmon ≻ Hot Dog. In a different restaurant B, both are again available, but she chooses Hot Dog over Salmon. We cannot represent these preferences by a rational preference ordering. The story, however, goes on: the individual knows that it is outright dangerous to eat Salmon in a bad restaurant and she would rather have a Hot Dog in a place like that, where nothing much can go wrong. Her effective preferences are therefore {Salmon, good restaurant} ≻ {Hot Dog, good restaurant} ≻ {Hot Dog, bad restaurant} ≻ {Salmon, bad restaurant}. In the second choice situation, the decision-maker presumed from the general cleanliness of the place, the area of the restaurant, the competence of the waiter etc. that restaurant A is ‘good’ and B is ‘bad’. Stated like this, there is no problem in expressing the individual’s preferences by a rational preordering.

This restatement reinforces the logical transitivity property by putting more pressure on the completeness assumption; we conjecture that we can resolve all intransitivity with this strategy (cf. [Ana93, 103]). Hence, we view the transitivity hypothesis as justified as long as completeness is a reasonable assumption and conclude that intransitivity is a sign of irrational behaviour.

    5.3 A2: Separation

    5.3.1 Independence

Samuelson’s reasoning in favour of the independence axiom can be summarised as follows (cf. [Sam52, 672]). We flip a coin that has a probability of showing tails of (1 − α); in this case we win the lottery w. If we are independently asked to decide whether we prefer lottery u or v in the case the coin shows heads, we have duplicated the setting of the independence axiom. If the coin lands tails, our preference over u and v does not matter (we get w in any case), but if the coin shows heads, we are back to precisely the decision between u and v, independently of w.

In general, the utility valuation of the outcome in one state depends on the valuation in different states of nature. Apparently, in the coin example, the outcomes heads and tails are mutually exclusive and therefore it is immaterial what would have happened in the other state: the independence axiom is applicable, and quite rightly so.45 If events are exclusive and lotteries are over total wealth levels, it is indeed rational to act like this. If heads occurs, tails cannot have occurred; hence, the state realisations are independent. The point of non-expected utility theory is that there apparently exist choice situations where this independence cannot be granted.

We conclude that the independence axiom is applicable iff the decision-maker’s preferences are separable among outcomes in the sense that the utility-ranking of an outcome is identical regardless of the state in which it occurs. There are choice situations, however, where it is rational not to treat events as independent. Hence, we cannot retain the independence axiom for a general theory of choice under risk.

    5.3.2 Betweenness

As we have seen in subsection 4.3, betweenness is the property that for every p, q ∈ P (X), and α ∈ (0, 1):

    p ∼ q ⇒ αp + (1 − α)q ∼ p. (5.4)

If a decision-maker is indifferent between two lotteries, she is also indifferent between any convex combination of the two lotteries. Under the above assumptions, this implies both quasi-concavity and quasi-convexity of preferences:

p ∼ q ⇒ αp + (1 − α)q ≿ p, p ∼ q ⇒ p ≿ αp + (1 − α)q, (5.5)

but does not require the other restrictions that the independence axiom places on preferences, which imply additive separability. We use (5.4) to give an intuitive example of what betweenness means (cf. [Gra98, 15]).

I recently bought a CD-writer, a so-called ‘toaster’. This little machine allows me to ‘burn’ CDs. The technology, however, is not very mature: approximately one in five copies does not work and the CD can only be thrown away. This is annoying because burning a CD on my double-speed toaster takes a lot of time: approximately 40 minutes (”) for an 80” CD and proportionally less time for less data. Moreover, it only shows at the very end of the process whether the CD is broken. There are very cheap 18” CDs, cheap 75” and not so cheap 80” CDs.46

Now suppose I want to take a weekend off and plan to burn a CD to take some data with me. If the CD were useable, on the one hand, I would prefer to have all the documents I have written recently with me on an 80” CD because I could re-use many bits of these. On the other hand, if the CD turned out to be broken, I would prefer to have burned only my dissertation draft on a cheap 18” CD since then both the wasted money and time would be minimised. Alternatively, I could also burn an audio-CD and just enjoy a nice weekend and listen to some music on the train. There is an analogous choice situation between my beloved Pet Shop Boys album on an 80” CD and just a single song from it on an 18” CD. I am quite indifferent between actually working on the full data set and listening to the complete album, but I would prefer having only my dissertation to listening to the same song over and over again.

The example can be formalised as follows: p is the probability that the burned CD works and (1 − p) that it does not; Nature decides which state occurs. Let dis be the Bernoulli valuation of my dissertation, and all that of the complete data set, si the utility of the short song, and lp that of the long album. bd18 is the outcome of the useless broken 18” data-CD, which I prefer to the outcome bd80 of the broken 80” data-CD. ba18 denotes the broken 18” audio-CD, preferred to ba80. According to the above preferences we have:

(lp ∼ all) ≻ (dis ≻ si), (bx18 ≻ bx80). (5.6)

    The corresponding lotteries look like this:47

t1 = {all : p, bd80 : 1 − p}    t2 = {dis : p, bd18 : 1 − p}
t3 = {lp : p, ba80 : 1 − p}    t4 = {si : p, ba18 : 1 − p}

    Table 7: The toaster example in lotteries.

The extensive form of the choice situation is depicted in figure 18. In the setting of the independence axiom, we can calculate the expectations of the lotteries ti independently of each state and act according to expected utility maximisation. In particular, we apply the independence axiom to eliminate the equivalent choices all and lp and we are done.

To illustrate betweenness, however, we argue as follows. Ex ante, we choose under risk and form plans of how to proceed. There are two equally valued, ‘optimal’ plans (all|CD works) and (lp|CD works). Condition (5.4) together with (5.6) requires that the decision-maker is indifferent between any mixture of the two optimal plans. Hence, any mixture of the optimal plans α(all|CD works) + (1 − α)(lp|CD works) is another optimal plan. This is clearly not an independence of the two optimal plans but a kind of ‘shuffling’ condition. It also illustrates that there is nothing like consequentialism in this choice; on the contrary, the decision is based on past information.

Figure 18: The two choices of the toaster example in decision trees.

In the context of the decision trees of figure 18, betweenness does not imply that the decision-maker’s choices at terminal nodes are independent of what her choices would be at other (unrealised) terminal nodes. (The independence axiom would require just that.) Betweenness does imply, however, that if there are several optimal plans, the decision-maker’s choices do not depend on which optimal plan she would have followed at prior nodes (cf. [Gra98, 15]).
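To make the ‘shuffling’ condition concrete, the following sketch assigns Bernoulli values under which the two optimal plans are indifferent; the numbers are our assumptions (the text fixes only the ordering (5.6)), and expected utility is used as the simplest betweenness-satisfying functional.

    # The toaster lotteries t1 and t3 under assumed Bernoulli values with
    # all ~ lp: any probability mixture of the two optimal plans is then
    # optimal as well, cf. (5.4).

    p = 0.8                                  # probability the CD works
    u = {"all": 10, "lp": 10,                # all ~ lp: two optimal plans
         "bd80": 0, "ba80": 0}

    t1 = {"all": p, "bd80": 1 - p}
    t3 = {"lp":  p, "ba80": 1 - p}

    def V(lottery):                          # expected utility
        return sum(q * u[x] for x, q in lottery.items())

    def mix(l1, l2, a):                      # probability mixture
        keys = set(l1) | set(l2)
        return {x: a * l1.get(x, 0) + (1 - a) * l2.get(x, 0) for x in keys}

    print(V(t1), V(t3), V(mix(t1, t3, 0.4)))  # all coincide: 8.0 8.0 8.0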

    5.3.3 Regret

As Sugden points out, “you feel disappointed when what is compares unfavourably with what might have been” [Sug86, 16]. This is the rationalisation behind regret theory, which employs an apparently weaker form of separation than complete independence of events. Even if events are mutually exclusive, a decision-maker might feel regret that the other outcome was not realised. Hence, the valuations of exclusive states are not independent. Since the experience of regret should be intuitively clear, we shall not expand further.

    5.3.4 Rank-dependent axioms: ordinal independence

The basic idea of rank-dependent utility theory is to provide a probability weighting while avoiding dominance problems. Formulations are representable by the mathematical expectations of a utility function (over outcomes) with respect to transformations of the whole probability distribution of outcomes. Since the prior ranking of each outcome is decisive for the probability-weight it is assigned, all outcomes are taken into account for calculating the probability-weight of each outcome. Therefore, it is possible for two outcomes with the same objective probability to have different weights, which contrasts with the separability dictated by the independence axiom.48 An example may illustrate this (cf. [Qui93, 63f]):

Imagine a decision-maker whose expected annual income is drawn from the interval £10.000-£11.000. There is also a 1/100.000 chance for her to win in a lottery and receive an income of £1 Mio. All realisation probabilities in [10.000, 11.000] are 1/100.000 as well. It is reasonable, however, to assume that the extreme outcome will receive a different final weight than the ones in the wage interval. Furthermore, it is quite realistic that this weight will depend on the ranking of all possible outcomes, something quite impossible if outcomes were treated as independent.
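The income example can be sketched numerically with rank-dependent weights. The convex distortion τ is an assumption, and the wage grid is coarsened (about a thousand levels instead of 100,000) to keep the loop short.

    # Rank-dependent decision weights for the income example: every
    # outcome has the same objective probability, yet the top-ranked
    # lottery win receives a very different weight.  tau and the
    # coarsened grid are illustrative assumptions.

    tau = lambda s: s ** 2

    outcomes = [10_000 + i for i in range(1001)]   # wage levels (coarse)
    outcomes.append(1_000_000)                     # the lottery win
    p = 1 / len(outcomes)                          # identical probability

    cum, weights = 0.0, {}
    for x in sorted(outcomes):
        weights[x] = tau(cum + p) - tau(cum)       # rank-dependent weight
        cum += p

    # Bottom-ranked weight is about p**2; top-ranked weight about 2*p.
    print(weights[10_000], weights[1_000_000])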

    28

  • 5.4 A3: Smoothness

The third axiom (continuity) cannot be defended globally but is a consequence of our desire to obtain real-valued utility functions. As is well known, lexicographic preferences cannot be represented by a continuous utility functional because the notion of one alternative being infinitely preferred to another cannot be modelled (cf. [Mar98]). Thus, a continuity axiom excludes a whole class of intuitively rather plausible preferences. To Julia the world is infinitely worse without Romeo: no bundle is desirable for her if it does not include Romeo.
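Lexicographic comparison itself is easy to state; what fails is its representation by a continuous real-valued function. The bundle encoding below is an illustrative assumption.

    # Lexicographic preferences over bundles (romeo, wealth): any amount
    # of the first good dominates every amount of the second.  No
    # continuous real-valued utility function can represent this relation.

    def lex_prefers(a, b):
        """a, b are (romeo, wealth) tuples; tuples compare lexicographically."""
        return a > b

    print(lex_prefers((1, 0), (0, 10**9)))   # True: Romeo beats any wealth
    print(lex_prefers((1, 5), (1, 4)))       # True: wealth decides only ties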

It is worth noting, though, that a rejection of the continuity axiom leaves von Neumann & Morgenstern’s theory itself unscathed: their set-theoretical axiomatisation does not require continuous preferences. This fact is exploited by Hammond in his formulation of a non-Archimedean axiom that still allows the construction of an (extended) expected utility representation (cf. [Ham97, 26]). Hence, while the particular form A3 may not be convincing, formulations exist that are. We therefore accept smoothness as one criterion of rationality.

    6 Conclusion

This essay focuses on the narrow interpretation of rationality in the context of choice situations where the course of the unfolding future is not known to the decision-maker. A series of alternative formulations that are intended to model such situations have been put forward. For the critical appraisal of these competing theories, we use reasonable criteria such as static and dynamic consistency of choice, and persuasive rules of behaviour such as lower payoff-expectations not being preferred to higher ones (fosd), or higher risks not being preferred to lower risks when expectations are equal (sosd). We suggest that we should accept the reduction, order, and continuity axioms (A0, A1, A3) and focus our criticism on the independence axiom in cases where outcomes cannot be viewed as mutually exclusive. We identify a small group of alternatives to independence centred on the ideas of betweenness and rank-dependence and discuss both their normative and descriptive appeals. These are irresistible iff the strict independence of events assumed by the independence axiom cannot be ensured.

Very influential people like Raiffa, Savage, or Harsanyi hold that expected utility theory is the only adequate normative theory for choice under risk. In the face of the systematic empirical violations that were established in numerous experiments, this view is difficult to uphold. The basic result of this paper is that several normative non-expected utility theories have a stronger claim to convincingly model rationality than von Neumann-Morgenstern’s theory has. Nevertheless, due to their technical complexity and weakly explored implications, none of the alternative theories we have at our disposal today commands comparable support in the literature. They are but valuable stepping stones in the direction of a universally accepted theory of choice; the remaining problems are formidable.

    Notes

1. Both the technical terms risk and uncertainty refer to situations where an action leads to a number of known outcomes among which one is selected randomly. A choice is referred to as made under risk when the underlying probability distributions are known objectively (imperfect but complete information). In the case of uncertainty, these probability distributions are not objectively known; (differing) subjective beliefs have to be formed (incomplete information).

    29

2. Von Neumann & Morgenstern interpret probabilities in terms of relative frequency as opposed to Ramsey’s interpretation of subjective degrees of belief (cf. [Neu44]). Hence, von Neumann-Morgenstern’s theory is about risk and Ramsey’s and Savage’s theories are about uncertainty. Today both cases are mostly subsumed under the latter. If we were in the present approach to think of the set of outcomes as a primitive and the lotteries as acts, the theory of decision-making under risk would be analogous to decision-making under uncertainty.

Von Neumann-Morgenstern’s theory was developed to represent mixed strategies in the general solution of zero-sum games. Therefore, the probabilities in von Neumann-Morgenstern’s lotteries are an implication of using mixed strategies. Since all mixed strategies are taken to be based on independent randomisation devices, the assumption of the existence of independent objective probabilities is justified in the context of games.

3. Recent contributions that compare expected utility theory’s predictions with those of non-expected refinements are Hey & Orme and Carbone & Hey. The basic conclusion to be drawn from this work seems to be that von Neumann-Morgenstern’s expected utility theory fares as well as its non-expected contenders in about 40-55% of the tested cases. In the remaining situations, one or another of the non-expected models shows better results (cf. [Hey94, 1321], [Car95, 131]).

4. In essentially one-off situations (as e.g. horse races), we cannot conduct the necessary repeated experiments to establish the relative frequency of the event in question. In these situations, objective probabilities do not exist, and we have to recur to subjective degrees of belief. To illustrate the difference, Anscombe & Aumann call uncertain lotteries horse lotteries and risky lotteries roulette lotteries (cf. [Ans63]).

As Karni and Schmeidler point out, however, the theoretical arguments to apply non-expected utility theories that obey only the betweenness or rank-dependency properties to uncertainty are still to be found (cf. [Kar91, 1810]). Hence, we are forced to remain in the framework of analysing risk.

5. Insightful alternative routes such as state-dependent utility theory, which gives up the requirement that all decisions are reducible to lotteries (cf. [Drè87]), are not elucidated. The same applies to evolutionary (learning) approaches such as Binmore’s (cf. [Bin99]) and the field of local expected utility analysis, where the preference functional (i.e. a real-valued function) is modified to accommodate the empirical violations while preserving, in a surprisingly robust manner, most of the results of expected utility theory locally. Space limitations make it impossible to discuss interesting and important empirical work (cf. [Hey94], [Har94]).

6. A word on the overall approach is in order here. Section three consists of original work while the rest of the paper takes the form of a specialised survey. Though attempting not to be excessively formal, a precise statement of the theories under investigation requires some formal apparatus that in some cases may exceed an introductory level. Where necessary and possible, however, this material is supplemented by examples which should convey the gist of the argument.

7. Notice that the reduction axiom ensures that the players in the game are not playing for their enjoyment. As Binmore points out, a gambler is unlike

