
European Journal of Operational Research 206 (2010) 676–685

Contents lists available at ScienceDirect

European Journal of Operational Research

journal homepage: www.elsevier.com/locate/ejor

Interfaces with Other Disciplines

Learning intransitive reciprocal relations with kernel methods

Tapio Pahikkala a,*, Willem Waegeman b, Evgeni Tsivtsivadze c, Tapio Salakoski a, Bernard De Baets b

a Turku Centre for Computer Science (TUCS), University of Turku, Department of Information Technology, Joukahaisenkatu 3-5 B, FIN-20520 Turku, Finland
b KERMIT, Department of Applied Mathematics, Biometrics and Process Control, Ghent University, Coupure Links 653, B-9000 Ghent, Belgium
c Institute for Computing and Information Sciences, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands


Article history: Received 25 June 2009; Accepted 8 March 2010; Available online 15 March 2010

Keywords: Transitivity; Reciprocal relations; Utility functions; Kernel methods; Preference learning; Decision theory; Game theory

0377-2217/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.ejor.2010.03.018


In different fields like decision making, psychology, game theory and biology, it has been observed that paired-comparison data like preference relations defined by humans and animals can be intransitive. Intransitive relations cannot be modeled with existing machine learning methods like ranking models, because these models exhibit strong transitivity properties. More specifically, in a stochastic context, where often the reciprocity property characterizes probabilistic relations such as choice probabilities, it has been formally shown that ranking models always satisfy the well-known strong stochastic transitivity property. Given this limitation of ranking models, we present a new kernel function that together with the regularized least-squares algorithm is capable of inferring intransitive reciprocal relations in problems where transitivity violations cannot be considered as noise. In this approach it is the kernel function that defines the transition from learning transitive to learning intransitive relations, and the Kronecker-product is introduced for representing the latter type of relations. In addition, we empirically demonstrate on two benchmark problems, one in game theory and one in theoretical biology, that our algorithm outperforms methods not capable of learning intransitive reciprocal relations.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

We start with an introductory example in the field of sports games in order to describe the purpose of this paper. Let us assume that an online betting company for tennis games wants to build statistical models to predict the probability that a given tennis player will defeat his/her opponent in the next Grand Slam competition. The company could be interested in building such models to maximize its profit when defining the amount of money that a client gets if he/she is able to predict the outcome of the game correctly. To this end, different types of data could be collected in order to construct the model, such as previous game outcomes, strong and weak points of players, current physical and mental conditions of players, etc. Yet, which type of machinery is required to obtain accurate predictions in this type of data mining problems? Firstly, as we will discuss in more detail below, we are for this example looking for an algorithm capable of predicting reciprocal relations from data, i.e., a relation between couples of players leading to a probability estimate of the outcome of a game. Secondly, we are also looking for a model that can predict intransitive relations, since commonly in sports games it turns out that



game outcomes manifest cycles such as player A defeating player B, B defeating a third player C, and simultaneously C winning from A.

So, this paper in general considers learning problems where intransitive reciprocal relations need to be learned. As mathematical and statistical properties of human preference judgments, reciprocity and transitivity have been a subject of study for researchers in different fields like mathematical psychology, decision theory, social choice theory, and fuzzy modeling. Historically, this kind of research has been motivated by the quest for a rational characterization of human judgments, and to this end, transitivity is often assumed as a crucial property (Diaz et al., 2008). This property basically says that a preference of an object $x_i$ over another object $x_j$ and a similar preference of $x_j$ over a third object $x_k$ should always result in a preference of $x_i$ over $x_k$, if preference judgments are made in a rational way. Nevertheless, it has been observed in several psychological experiments that human preference judgments often violate this transitivity property (see e.g. Anand, 1993; Tversky, 1998), especially in a context where preference judgments are considered as uncertain, resulting in non-crisp¹ preference relations between objects.

Contrary to some approaches taken in fuzzy set theory and decision theory, we adopt a probabilistic view of expressing

¹ In this work a relation is called crisp, when it can take only three values, e.g. 0 if A wins from B, 1 if B wins from A and 0.5 in case of a tie.



uncertainty in decision behavior, as it is for example the case in social choice theory and mathematical psychology, where preference relations are often called binary choice probabilities. In this probabilistic framework, it can be assumed that a preference relation defined on a space $\mathcal{X}$ satisfies the reciprocity property.

Definition 1.1. A function $Q : \mathcal{X}^2 \to [0,1]$ is called a reciprocal relation if for any $(x, x') \in \mathcal{X}^2$ it holds that

$$Q(x, x') + Q(x', x) = 1.$$

The reciprocity property was already taken into consideration in the early work of Luce and Suppes (1965) in mathematical psychology. In addition, the same authors also introduced several stochastic transitivity properties like weak, moderate and strong stochastic transitivity to characterize rational preference judgments in a probabilistic sense. Let us recall the definition of weak stochastic transitivity.

Definition 1.2. A reciprocal relation $Q : \mathcal{X}^2 \to [0,1]$ is called weakly stochastically transitive if for any $(x_i, x_j, x_k) \in \mathcal{X}^3$ it holds that

$$(Q(x_i, x_j) \geq 1/2 \wedge Q(x_j, x_k) \geq 1/2) \Rightarrow Q(x_i, x_k) \geq 1/2. \quad (1)$$

This definition of transitivity for reciprocal relations naturally extends the basic definition of transitivity for crisp relations. Below, when we speak about intransitive reciprocal relations, we specifically allude to reciprocal relations violating weak stochastic transitivity. In addition, we will also utilize strong stochastic transitivity a few times in this paper. This stronger condition is defined as follows.
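Weak stochastic transitivity is easy to verify mechanically on a finite set of objects. The following sketch (not from the paper; the matrix encoding and function names are our own) checks the property on a reciprocal relation stored as a matrix, and shows that a cyclic relation such as rock-paper-scissors violates it:

```python
import numpy as np

def is_weakly_stochastically_transitive(Q, tol=1e-12):
    """Check weak stochastic transitivity of a reciprocal relation.

    Q is an n x n array with Q[i, j] = Q(x_i, x_j) and
    Q[i, j] + Q[j, i] == 1 (reciprocity). The property requires:
    Q[i, j] >= 1/2 and Q[j, k] >= 1/2  =>  Q[i, k] >= 1/2.
    """
    n = Q.shape[0]
    assert np.allclose(Q + Q.T, 1.0), "Q is not reciprocal"
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if Q[i, j] >= 0.5 and Q[j, k] >= 0.5 and Q[i, k] < 0.5 - tol:
                    return False
    return True

# Rock (0), paper (1), scissors (2): each item beats exactly one other,
# producing the cycle discussed in the text.
Q_rps = np.array([[0.5, 1.0, 0.0],
                  [0.0, 0.5, 1.0],
                  [1.0, 0.0, 0.5]])
print(is_weakly_stochastically_transitive(Q_rps))  # False: the relation is cyclic
```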

Definition 1.3. A reciprocal relation $Q : \mathcal{X}^2 \to [0,1]$ is called strongly stochastically transitive if for any $(x_i, x_j, x_k) \in \mathcal{X}^3$ it holds that

$$(Q(x_i, x_j) \geq 1/2 \wedge Q(x_j, x_k) \geq 1/2) \Rightarrow Q(x_i, x_k) \geq \max(Q(x_i, x_j), Q(x_j, x_k)).$$

Many other transitivity properties for reciprocal relations have been put forward in recent years, but these properties will not be discussed here. Moreover, many of these properties can be elegantly expressed in the cycle-transitivity framework. We refer to De Baets et al. (2006) for an overview of this framework and the various transitivity properties it covers.

As for crisp relations, several authors observed that stochastic transitivity properties are often violated. This is definitely the case for strong stochastic transitivity (García-Lapresta and Meneses, 2005), but sometimes even weak stochastic transitivity can be violated (Switalski, 2000). As a consequence, there has been a long debate on interpreting this absence of transitivity. If preference judgments are considered as rational human decisions, then one should neglect the transitivity violations and apply traditional transitive models to represent this type of data. Although such a simplification makes sense in certain situations, different authors argued that often these transitivity violations describe potential truths of reasoned comparisons (see Fishburn, 1991 for a review). As a result, several authors have constructed models for paired-comparison data that are able to represent intransitive judgments explicitly (e.g. Carroll et al., 1990; Tsai and Böckenholt, 2006).

The motivation for building intransitive reciprocal preference relations might be debatable in a traditional (decision-theoretic) context, but the existence of rational transitivity violations becomes more appealing when the notion of a reciprocal preference relation is defined in a broader sense, like in the introductory example, or generally as any binary relation satisfying the reciprocity property. For example, reciprocal relations in game theory

violate weak stochastic transitivity, in situations where the best strategy of a player depends on the strategy of his/her opponent – see e.g. the well-known rock-scissors-paper game (Fisher, 2008), dice games (De Schuymer et al., 2003, 2006, 2009) and quantum games in physics (Makowski and Piotrowski, 2006). Furthermore, in biology many examples of intransitive reciprocal relations have been encountered, like in competition between bacteria (Kerr et al., 2002; Czárán et al., 2002; Nowak, 2002; Kirkup and Riley, 2004; Károlyi et al., 2005; Reichenbach et al., 2007) and fungi (Boddy, 2000), mating choice of lizards (Sinervo and Lively, 1996) and food choice of birds (Waite, 2001). Other examples of intransitive reciprocal relations can be found in order theory, when considering mutual ranking probabilities of the elements of a partially ordered set (De Baets et al., submitted for publication; De Loof et al., 2010).

Generally speaking, we believe that enough examples exist to justify the need for models that can represent intransitive reciprocal relations. In this article we will address the topic of constructing such models based on any type of paired-comparison data. Basically, one can interpret these models as a mathematical representation of a reciprocal preference relation, having parameters that need to be statistically inferred. The approach we take finds its origin in machine learning, as a generalization of existing utility or ranking models. These models have been popular in areas like information retrieval and marketing for predicting decisions of web users and clients of e-commerce applications (see e.g. Kalish and Nelson, 1991; Joachims, 2002). Utility or ranking models by construction possess weak (and often even strong) stochastic transitivity properties, rendering them unsuitable for representing intransitive preference judgments in an accurate way. As a solution, we will extend an existing kernel-based ranking algorithm that has been proposed recently by some of the present authors (Pahikkala et al., 2007, 2009b). This algorithm has been called RankRLS, as it optimizes a regularized least-squares (RLS) objective function on paired-comparison data that is represented as a graph.

This article is organized as follows. In Section 2 we give a short review on the role of transitivity in decision making and its connection to ranking models. Using the notions weak and strong stochastic transitivity, we in particular claim that ranking methods always exhibit certain transitivity properties that makes them useless for representing intransitive reciprocal relations. Then, in Section 3 we start with a brief introduction to kernel methods, followed by a discussion of a general kernel-based framework for learning reciprocal relations. We show that existing ranking models are included in this framework via a particular choice of kernel function. We prove that these models cannot learn intransitive reciprocal relations. Subsequently, we formally claim and prove how in our framework another kernel, based on the Kronecker-product, is able to represent intransitive reciprocal relations in a much more adequate way. Finally, we present in Section 4 experimental results for two benchmark problems, demonstrating the advantages of our approach over traditional (transitive) ranking algorithms.

2. From transitive to intransitive preference models

In order to model preference judgments one can distinguish two main types of models in decision making (Öztürk et al., 2005; Waegeman et al., 2009):

(1) Scoring methods: these methods typically construct a continuous function of the form $f : \mathcal{X} \to \mathbb{R}$ such that

$$x \succeq x' \Leftrightarrow f(x) \geq f(x'),$$

which means that alternative $x$ is preferred to alternative $x'$ if the highest value was assigned to $x$. In decision making, $f$ is



usually referred to as a utility function, while it is called a ranking function in machine learning.²

(2) Pairwise preference models: here the preference judgments are modeled by one (or more) valued relations $Q : \mathcal{X}^2 \to [0,1]$ that express whether $x$ should be preferred over $x'$. One can distinguish different kinds of relations such as crisp relations, fuzzy relations or reciprocal relations.

The former approach has been especially popular in machine learning for scalability reasons. The latter approach allows a flexible and interpretable description of preference judgments and has therefore been popular in decision theory and the fuzzy set community, see e.g. (Mousseau et al., 2001; Doumpos and Zopounidis, 2004; De Baets and De Meyer, 2005; Dias and Mousseau, 2006).

The semantics underlying reciprocal preference relations is often probabilistic: $Q(x, x')$ expresses the probability that object $x$ is preferred to $x'$. One can in general construct such a reciprocal or probabilistic preference relation from a utility model in the following way:

$$Q(x, x') = g(f(x), f(x')), \quad (2)$$

with $g : \mathbb{R}^2 \to [0,1]$ usually increasing in its first argument and decreasing in its second argument (Switalski, 2003). Examples of models based on reciprocal preference relations are Bradley–Terry models (Bradley and Terry, 1952; Agresti, 2002) and Thurstone-Case5 models (Thurstone, 1927). They have been applied in a machine learning context by Chu and Ghahramani (2005), Herbrich et al. (2007), Radlinski and Joachims (2007) and Hüllermeier et al. (2008).

The representability of reciprocal and fuzzy preference relations in terms of a single ranking or utility function has been extensively studied in domains like utility theory (Fishburn, 1970), preference modeling (Öztürk et al., 2005), social choice theory (Dasgupta and Deb, 1996; Fono and Andjiga, 2007), fuzzy set theory (Billot, 1995) and mathematical psychology (Luce and Suppes, 1965; Doignon et al., 1986). It has been shown that the notions of transitivity and ranking representability play a crucial role in this context.

Definition 2.1. A reciprocal relation $Q : \mathcal{X}^2 \to [0,1]$ is called weakly ranking representable if there exists a ranking function $f : \mathcal{X} \to \mathbb{R}$ such that for any $(x, x') \in \mathcal{X}^2$ it holds that

$$Q(x, x') \leq \tfrac12 \Leftrightarrow f(x) \leq f(x').$$

Reciprocal preference relations for which this condition is satisfied have also been called weak utility models. Luce and Suppes (1965) proved that a reciprocal preference relation is a weak utility model if and only if it satisfies weak stochastic transitivity, as defined by (1). As pointed out by Switalski (2003), a weakly ranking representable reciprocal relation can be characterized in terms of (2) such that the function $g : \mathbb{R}^2 \to \mathbb{R}$ satisfies

$$g(a, b) > \tfrac12 \Leftrightarrow a > b, \qquad g(a, b) = \tfrac12 \Leftrightarrow a = b.$$

Analogous to weak ranking representability or weak utility models, one can define other conditions on the relationship between $Q$ and $f$, leading to (stronger) transitivity conditions like moderate and strong stochastic transitivity. These properties are satisfied respectively by moderately and strongly ranking representable reciprocal preference relations. For such reciprocal relations one imposes additional conditions on $g$; for example, the following type of

² It should be emphasized that the notion of a utility function is often ambiguously defined in the literature. We adopt in this paper the rather mild definition of Luce and Suppes (1965), while more recent papers in economics and decision theory sometimes require more restrictive properties for a utility function, such as monotonicity, concavity and continuity.

reciprocal relation satisfies strong stochastic transitivity (Luce and Suppes, 1965).

Definition 2.2. A reciprocal relation $Q : \mathcal{X}^2 \to [0,1]$ is called strongly ranking representable if it can be written as in (2) with $g$ given by

$$g(f(x), f(x')) = G(f(x) - f(x')), \quad (3)$$

where $G : \mathbb{R} \to [0,1]$ is a cumulative distribution function satisfying $G(0) = \tfrac12$.

In addition, other transitivity conditions and corresponding

conditions on $G$ have been defined, such as strict ranking representability. This last property of reciprocal preference relations is satisfied by the Bradley–Terry model, a classical model for paired-comparison data (Bradley and Terry, 1952). A further discussion on ranking representability is however beyond the scope of this paper. More details and proofs can be found in Luce and Suppes (1965), Carroll et al. (1990), Ballinger and Wilcox (1997), Tversky (1998), Switalski (2003), Dzhafarov (2003), Zhang (2004) and Waegeman and De Baets (submitted for publication).

3. Learning intransitive reciprocal relations

In this section we will show how intransitive reciprocal relations can be learned from data with kernel methods. During the last decade, a lot of interesting papers on preference learning have appeared in the machine learning community, see e.g. (Herbrich et al., 2000; Freund et al., 2003; Crammer and Singer, 2001; Chu and Keerthi, 2007). Many of these authors use kernel methods to design learning algorithms. The majority of them also considers utility approaches to represent the preferences. Only a few authors such as Hüllermeier et al. (2003) and Chu and Ghahramani (2005) talk about pairwise preference relations, assuming weak stochastic transitivity so that an underlying ranking function exists.

We first explain the basic ideas behind kernel methods, followed by a discussion of a general framework for learning intransitive reciprocal relations. In this framework ranking can be seen as a special case, with a particular choice of the kernel function. To learn intransitive reciprocal relations, we then define a new type of kernel over pairs of data objects. We will formally prove that using this kernel we always learn relations that are reciprocal, but do not necessarily fulfill weak stochastic transitivity. This new kernel can be seen as a general concept that can be plugged into other kernel-based ranking methods as well, but in this paper we will illustrate its usefulness with the RLS algorithm. As this method optimizes a least-squares loss function, it is very suitable for learning reciprocal relations if the mean squared error measures the performance of the algorithm.

3.1. A brief introduction to kernels

This section is primarily based on Schölkopf and Smola (2002) and Shawe-Taylor and Cristianini (2004); a much more detailed introduction to kernel methods can be found in these works. Given a not further specified input space $\mathcal{E}$ that shows at this moment no correspondence with the space $\mathcal{X}$ defined in the previous section, let us consider mappings of the following form:

$$\Phi : \mathcal{E} \to \mathcal{F}, \qquad e \mapsto \Phi(e).$$

The function $\Phi$ represents a so-called feature mapping from $\mathcal{E}$ to $\mathcal{F}$, and $\mathcal{F}$ is called the associated feature space. Initially, kernels were introduced to compute the dot-product $\langle \cdot, \cdot \rangle$ in this feature space efficiently. Such a compact representation of the dot-products in a


certain feature space $\mathcal{F}$ will in general be called a kernel, with the notation

$$\langle \Phi(e_1), \Phi(e_2) \rangle = K(e_1, e_2).$$

For a given sequence $e_1, \ldots, e_N$ of objects, let us define the Gram matrix $K$ of a given kernel as $K_{i,j} = K(e_i, e_j)$. Kernel functions resulting in positive semi-definite Gram matrices always yield a dot-product. As a consequence, data analysis methods based on dot-products can always be rewritten in terms of kernels. Kernel versions have been proposed for classification, regression, clustering, principal component analysis, independent component analysis and many other methods (Shawe-Taylor and Cristianini, 2004). These algorithms are quite general, because the class of models considered is simply changed by replacing the kernel function.

Kernels can be interpreted as similarity measures, allowing one to model similarity of complex data objects. The specific form of the kernel function is domain-dependent and usually constructed by the data analyst (Schölkopf and Smola, 2002). Since the introduction of kernels, similarity measures have been proposed for a large number of complex data types, like trees, graphs, strings, text, sets, images, DNA-sequences, etc. In this paper we will restrict our discussion to kernels for vectorial data. The most basic kernel for vectors one can think of is the one for which $\Phi$ defines the identity mapping, i.e.

$$K(e_1, e_2) = \langle e_1, e_2 \rangle.$$

This similarity measure is called a linear kernel, since it defines linear models. Alternatively, if interactions up to $d$ features are allowed, one can use a polynomial kernel of degree $d$, i.e.

$$K(e_1, e_2) = \langle e_1, e_2 \rangle^d.$$

Another popular kernel function for non-linear modeling is the Gaussian RBF kernel

$$K(e_1, e_2) = e^{-\gamma \| e_1 - e_2 \|^2},$$

where $\gamma$ is a parameter determining the width of the kernel, resulting in an infinite-dimensional feature map $\Phi$. Many other kernels, like spline kernels and ANOVA kernels, exist, but are rarely employed in practice.
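The three kernels above can be sketched directly in code. This is a minimal illustration (our own helper names, not from the paper), together with the positive semi-definiteness check on the Gram matrix mentioned in the text:

```python
import numpy as np

# Sketch of the three kernels discussed above, for vectorial data.
def linear_kernel(e1, e2):
    return np.dot(e1, e2)

def polynomial_kernel(e1, e2, d=2):
    return np.dot(e1, e2) ** d

def rbf_kernel(e1, e2, gamma=1.0):
    return np.exp(-gamma * np.sum((e1 - e2) ** 2))

def gram_matrix(kernel, X):
    """Gram matrix K[i, j] = kernel(X[i], X[j]) for a sequence of vectors."""
    return np.array([[kernel(xi, xj) for xj in X] for xi in X])

X = np.random.RandomState(0).randn(5, 3)
K = gram_matrix(rbf_kernel, X)
# A valid kernel must yield a positive semi-definite Gram matrix:
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # True
```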

In the context of kernel methods, we also have the concept of so-called regularized bias (see e.g. Rifkin (2002)). With this, we refer to the approach in which an extra constant valued dimension is added to the feature mapping. Consequently, the kernel value then changes into $K(e_1, e_2) + b^2$, where $b$ is the extra constant feature.

Following the standard notations for kernel methods, we formulate our learning problem as the selection of a suitable function $h \in \mathcal{H}$, with $\mathcal{H}$ a certain hypothesis space, in particular a reproducing kernel Hilbert space (RKHS). Hypotheses $h : \mathcal{E} \to \mathbb{R}$ are usually denoted as $h(e) = \langle w, \Phi(e) \rangle$ with $w$ a vector of parameters that needs to be estimated based on training data. Let us denote a training dataset as a sequence

$$E = (e_i, y_i)_{i=1}^{N}, \quad (4)$$

of input-label pairs, then we formally consider the following variational problem in which we select an appropriate hypothesis $h$ from $\mathcal{H}$ for training data $E$. Namely, we consider an algorithm

$$\mathcal{A}(E) = \arg\min_{h \in \mathcal{H}} \frac{1}{N} \sum_{i=1}^{N} L(h(e_i), y_i) + \lambda \| h \|_{\mathcal{H}}^2, \quad (5)$$

with $L$ a given loss function and $\lambda > 0$ a regularization parameter. The first term measures the performance of a candidate hypothesis on the training data and the second term, called the regularizer, measures the complexity of the hypothesis with the RKHS norm. In our framework below, a squared loss is optimized in (5):

$$L(h(e), y) = (h(e) - y)^2. \quad (6)$$

Optimizing this loss function instead of the more conventional hinge loss has the advantage that the solution can be found by simply solving a system of linear equations. We do not describe in detail the mathematical properties and advantages of this approach compared to more traditional algorithms, since that is not in the scope of this paper. More details can be found for example in Rifkin (2002) and Suykens et al. (2002).
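The "system of linear equations" point can be made concrete: combining objective (5), squared loss (6) and the dual representation below, the coefficients solve $(K + \lambda N I)a = y$. The following is a minimal sketch under those assumptions (it is not the RankRLS implementation from the paper, and the function names are our own):

```python
import numpy as np

def rls_fit(K, y, lam):
    """Dual coefficients for regularized least-squares.

    Minimizing (1/N) * sum_i (h(e_i) - y_i)^2 + lam * ||h||^2 with
    h(e) = sum_i a_i K(e, e_i) reduces, via the representer theorem,
    to the linear system (K + lam * N * I) a = y.
    """
    N = K.shape[0]
    return np.linalg.solve(K + lam * N * np.eye(N), y)

def rls_predict(K_test_train, a):
    # h(e) = sum_i a_i K(e, e_i) for each test input
    return K_test_train @ a

rng = np.random.RandomState(0)
X = rng.randn(20, 2)
y = X @ np.array([1.0, -0.5]) + 0.01 * rng.randn(20)
K = X @ X.T                       # linear kernel Gram matrix
a = rls_fit(K, y, lam=1e-6)
print(np.allclose(rls_predict(K, a), y, atol=0.1))  # fits the training data closely
```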

According to the representer theorem (Schölkopf and Smola, 2002), any minimizer $h \in \mathcal{H}$ of (5) admits a dual representation of the following form:

$$h(e) = \sum_{i=1}^{N} a_i K(e, e_i) = \langle \Phi(e), w \rangle,$$

where $a_i \in \mathbb{R}$, $K$ is the kernel function associated with the RKHS mentioned above, $\Phi$ is the feature mapping corresponding to $K$, and

$$w = \sum_{i=1}^{N} a_i \Phi(e_i).$$

We will alternate several times between the primal and dual representation for $h$ in the remainder of this article.

3.2. Learning reciprocal relations

We will use the above framework in order to learn intransitive reciprocal relations. To this end, we associate in a preference learning setting with each input a couple of data objects, i.e. $e_i = (x_i, x'_i)$, where $x_i, x'_i \in \mathcal{X}$ and $\mathcal{X}$ can be any set. Consequently, we have an i.i.d. dataset

$$E = (x_i, x'_i, y_i)_{i=1}^{N},$$

so that for each couple in the training dataset a label is known. These labels will represent reciprocal relations observed on training data, but rescaled to the interval $[-1,1]$. This means that the following correspondence holds:

$$y = 2Q(x, x') - 1, \qquad \forall (x, x') \in \mathcal{X}^2.$$

Such a conversion is primarily made for ease of implementation. This implies that we will minimize the regularized squared error so that a model of type $h : \mathcal{X}^2 \to \mathbb{R}$ is obtained. An additional mapping $G : \mathbb{R} \to [0,1]$ is required to ensure that $[0,1]$-valued relations are predicted:

$$Q(x, x') = G(h(x, x')). \quad (7)$$

In our framework, we will aim to find such a $Q$ that minimizes the mean squared error between the true and predicted reciprocal relations. When using a least-squares loss function, we can equivalently search for a model $h$ that minimizes the same mean squared error and choose $G$ as follows:

$$G(a) = \begin{cases} 0, & \text{if } a < -1, \\ (a+1)/2, & \text{if } -1 \leq a \leq 1, \\ 1, & \text{if } a > 1. \end{cases} \quad (8)$$

Furthermore, to guarantee that reciprocal relations are learned, let us suggest the following type of feature mapping:

$$\Phi(e_i) = \Phi(x_i, x'_i) = \Psi(x_i, x'_i) - \Psi(x'_i, x_i),$$

where $\Phi$ is just the same feature mapping as before but now written in terms of couples and $\Psi$ is a new (not further specified) feature mapping from $\mathcal{X}^2$ to a feature space. As shown below, this construction will result in a reciprocal representation of the corresponding $[0,1]$-valued relation. By means of the representer theorem, the above model can be rewritten in terms of kernels, such that two


different kernels pop up, one for $\Phi$ and one for $\Psi$. Both kernels express a similarity measure between two couples of objects and the following relationship holds:

$$\begin{aligned} K^{\Phi}(e_i, e_j) &= K^{\Phi}(x_i, x'_i, x_j, x'_j) \\ &= \langle \Psi(x_i, x'_i) - \Psi(x'_i, x_i), \, \Psi(x_j, x'_j) - \Psi(x'_j, x_j) \rangle \\ &= \langle \Psi(x_i, x'_i), \Psi(x_j, x'_j) \rangle + \langle \Psi(x'_i, x_i), \Psi(x'_j, x_j) \rangle - \langle \Psi(x_i, x'_i), \Psi(x'_j, x_j) \rangle - \langle \Psi(x'_i, x_i), \Psi(x_j, x'_j) \rangle \\ &= K^{\Psi}(x_i, x'_i, x_j, x'_j) + K^{\Psi}(x'_i, x_i, x'_j, x_j) - K^{\Psi}(x'_i, x_i, x_j, x'_j) - K^{\Psi}(x_i, x'_i, x'_j, x_j). \end{aligned}$$

Using this notation, the prediction function given by the representer theorem can be expressed as:

$$h(x, x') = \langle w, \Psi(x, x') - \Psi(x', x) \rangle = \sum_{i=1}^{N} a_i K^{\Phi}(x_i, x'_i, x, x').$$

For this prediction function, we can easily show that it forms the basis of a reciprocal relation.

Proposition 3.1. Let $G : \mathbb{R} \to [0,1]$ be a cumulative distribution function satisfying $G(0) = 0.5$ and $G(-a) = 1 - G(a)$; then the function $Q : \mathcal{X}^2 \to [0,1]$ defined by (7), with $h : \mathcal{X}^2 \to \mathbb{R}$ given by (9), is a reciprocal relation.

Proof. One can easily see that $h(x, x') = -h(x', x)$ for all $x, x' \in \mathcal{X}$. The proof then immediately follows from:

$$Q(x, x') + Q(x', x) = G(h(x, x')) + G(h(x', x)) = G(h(x, x')) - G(h(x, x')) + 1 = 1. \qquad \square$$

3.3. Ranking: learning transitive reciprocal relations

Using the above notation, utility or ranking functions are usually written as

$$f(x) = \langle w, \phi(x) \rangle. \quad (9)$$

They can be elegantly expressed in our framework by defining a specific feature mapping and corresponding kernel function.

Proposition 3.2. If $K^{\Psi}$ corresponds to the transitive kernel $K^{\Psi}_T$ defined by

$$K^{\Psi}_T(x_i, x'_i, x_j, x'_j) = K^{\phi}(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle,$$

with $K^{\phi}$ any two-dimensional kernel function on $\mathcal{X}^2$, whose value depends only on the arguments $x_i$ and $x_j$ and their feature representations $\phi(x_i)$ and $\phi(x_j)$, then the reciprocal relation $Q : \mathcal{X}^2 \to [0,1]$ given by (7) is strongly stochastically transitive.

Proof. With the above representation for $K^{\Psi}$ and $K^{\phi}(x, x') = \langle \phi(x), \phi(x') \rangle$, the feature mapping $\Psi$, further denoted as $\Psi_T$, is for this model defined by

$$\Psi_T(x, x') = \phi(x).$$

So, only the first element of the couple is taken and the second element is simply ignored. Because of that, the model can be written as

$$h(x, x') = \langle w, \phi(x) \rangle - \langle w, \phi(x') \rangle = f(x) - f(x'),$$

with $f$ defined by (9) such that $Q$ takes the form of (3) without any further specified $G$. So, $Q$ is strongly ranking representable. As a consequence, $Q$ is also strongly stochastically transitive. We refer to Luce and Suppes (1965) for this proof. $\square$

For this choice of $K^{\Psi}$, our framework is reduced to a popular type of kernel function that has been introduced by Herbrich et al. (2000). The insight of the proposition is that the use of this kernel is equivalent to constructing a ranking for the individual inputs. This ranking function is in the dual representation given by:

$$f(x) = \langle w, \phi(x) \rangle = \sum_{i=1}^{N} a_i \left( K^{\phi}(x_i, x) - K^{\phi}(x'_i, x) \right).$$

As explained in Section 2, ranking results in a reciprocal relation that satisfies the weak stochastic transitivity property. Due to the above proposition, we can even claim that the resulting reciprocal relation satisfies strong stochastic transitivity. Different ranking methods are obtained with different loss functions, such as RankSVM (Joachims, 2002) for the hinge loss and RankRLS (Pahikkala et al., 2007, 2009b) for the least-squares loss.
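The decomposition $h(x, x') = f(x) - f(x')$ in the proof can be verified numerically. This sketch (our own variable names; a linear base kernel is assumed for simplicity) builds $h$ from $K^{\Phi}$ with the transitive choice $\Psi_T(x, x') = \phi(x)$ and compares it against the dual ranking function:

```python
import numpy as np

rng = np.random.RandomState(0)
X_train = rng.randn(6, 3)          # the objects x_i
Xp_train = rng.randn(6, 3)         # the objects x'_i
a = rng.randn(6)                   # dual coefficients

def k_base(u, v):                  # linear base kernel K^phi
    return u @ v

def f(x):
    # Dual ranking function induced by the transitive kernel choice
    return sum(ai * (k_base(xi, x) - k_base(xpi, x))
               for ai, xi, xpi in zip(a, X_train, Xp_train))

def h_transitive(x, xp):
    # h built from K^Phi with Psi_T(x, x') = phi(x):
    # K^Phi(e_i, e) = K(x_i,x) + K(x'_i,x') - K(x'_i,x) - K(x_i,x')
    kPhi = lambda xi, xpi: (k_base(xi, x) + k_base(xpi, xp)
                            - k_base(xpi, x) - k_base(xi, xp))
    return sum(ai * kPhi(xi, xpi) for ai, xi, xpi in zip(a, X_train, Xp_train))

x, xp = rng.randn(3), rng.randn(3)
print(np.isclose(h_transitive(x, xp), f(x) - f(xp)))  # True: h(x,x') = f(x) - f(x')
```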

3.4. Learning intransitive reciprocal relations

Since the above choice for $\Psi$ forms the core of all kernel-based ranking methods, these methods cannot generate intransitive relations, i.e. relations violating weak stochastic transitivity. In order to derive a model capable of violating weak stochastic transitivity, we introduce the following feature mapping $\Psi_I$ for couples of objects:

$$\Psi_I(x, x') = \phi(x) \otimes \phi(x'),$$

where $\phi(x)$ is again the feature representation of the individual object $x$ and $\otimes$ denotes the Kronecker-product, which is defined as follows:

$$A \otimes B = \begin{pmatrix} A_{1,1}B & \cdots & A_{1,n}B \\ \vdots & \ddots & \vdots \\ A_{m,1}B & \cdots & A_{m,n}B \end{pmatrix},$$

where $A$ and $B$ are matrices and $A_{i,j}$ is the $(i,j)$th element of $A$. Kernel functions induced by this type of feature maps have also been considered under the name tensor product kernels (see e.g. Weston et al. (2005)) or Kronecker kernels (see e.g. Kashima et al. (2009)), and the Kronecker product has also been used to construct kernels based on linear feature transformations (see e.g. Pahikkala et al. (2009a)).

We use the following property of the Kronecker product:

$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD),$$

where $A \in \mathbb{R}^{a \times b}$, $B \in \mathbb{R}^{c \times d}$, $C \in \mathbb{R}^{b \times e}$, and $D \in \mathbb{R}^{d \times f}$. The Kronecker-product establishes joint feature representations $\Phi_I$ and $\Psi_I$ that depend on both arguments of $\Phi$ and $\Psi$. Instead of ignoring the second argument of $\Phi$ and $\Psi$, we now represent all pairwise interactions between individual features of the two data objects in the joint feature representation. Using the notation $K^{\Psi}_I$, this leads to the following expression:
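The mixed-product property of the Kronecker product can be checked directly with NumPy's `np.kron` (the random shapes are our own illustrative choice):

```python
import numpy as np

# Mixed-product property of the Kronecker product, checked with np.kron.
# Shapes: A (a x b), B (c x d), C (b x e), D (d x f), so that
# (A kron B)(C kron D) and (AC) kron (BD) are both (a*c) x (e*f).
rng = np.random.RandomState(0)
A = rng.randn(2, 3)
B = rng.randn(4, 5)
C = rng.randn(3, 6)
D = rng.randn(5, 2)

lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
print(np.allclose(lhs, rhs))  # True
```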

$$K^{\Psi}_I(x_i, x'_i, x_j, x'_j) = \langle \phi(x_i) \otimes \phi(x'_i), \, \phi(x_j) \otimes \phi(x'_j) \rangle = \langle \phi(x_i), \phi(x_j) \rangle \langle \phi(x'_i), \phi(x'_j) \rangle = K^{\phi}(x_i, x_j) \, K^{\phi}(x'_i, x'_j),$$

with again $K^{\phi}$ any kernel function defined over $\mathcal{X}^2$. As a result, using the Kronecker-product as feature mapping basically leads to a very simple kernel in the dual representation, consisting of just a regular product between two traditional kernels $K^{\phi}$. Remark that $K^{\phi}$ can be any existing kernel, such as the linear kernel, the RBF-kernel, etc. As a result of the above construction, the kernel function $K^{\Phi}$ becomes:

$$K^{\Phi}_I(x_i, x'_i, x_j, x'_j) = 2K^{\phi}(x_i, x_j) K^{\phi}(x'_i, x'_j) - 2K^{\phi}(x'_i, x_j) K^{\phi}(x_i, x'_j).$$

We further refer to $K^{\Phi}_I$ as the intransitive kernel.
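To see the intransitive kernel at work, the following sketch combines it with a regularized least-squares fit on the rock-paper-scissors cycle from Section 1. The one-hot encoding, regularization value and training couples are our own illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

# One-hot features for rock (0), paper (1), scissors (2); K^phi is linear.
phi = np.eye(3)
def k_base(i, j):
    return phi[i] @ phi[j]

def k_intransitive(xi, xpi, xj, xpj):
    """K^Phi_I = 2 K^phi(x_i,x_j) K^phi(x'_i,x'_j) - 2 K^phi(x'_i,x_j) K^phi(x_i,x'_j)."""
    return 2 * (k_base(xi, xj) * k_base(xpi, xpj) - k_base(xpi, xj) * k_base(xi, xpj))

# Training couples: paper beats rock, scissors beats paper, rock beats scissors;
# labels are rescaled to [-1, 1] as y = 2 Q(x, x') - 1.
couples = [(1, 0), (2, 1), (0, 2), (0, 1), (1, 2), (2, 0)]
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

N = len(couples)
K = np.array([[k_intransitive(*couples[i], *couples[j]) for j in range(N)]
              for i in range(N)])
a = np.linalg.solve(K + 0.01 * N * np.eye(N), y)   # RLS with squared loss

def h(x, xp):
    return sum(ai * k_intransitive(xi, xpi, x, xp) for ai, (xi, xpi) in zip(a, couples))

# The learned relation reproduces the cycle, violating weak stochastic transitivity:
print(h(1, 0) > 0, h(2, 1) > 0, h(0, 2) > 0)  # True True True
```

No ranking function $f$ can reproduce these three signs simultaneously, which is exactly the gap between Proposition 3.2 and the intransitive kernel.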


Indeed, in the above extension of the ranking framework, two different kernels $K^{\Psi}$ and $K^{\phi}$ must be specified by the data analyst, while the third kernel $K^{\Phi}$ is defined by the choice for $K^{\Psi}$. On the one hand, the choice for $K^{\Psi}$ (and hence $K^{\Phi}$) determines whether the model is allowed to violate weak stochastic transitivity. On the other hand, the kernel function $K^{\phi}$ acts as the traditional similarity measure on $\mathcal{X}$, resulting in a linear, polynomial, radial basis function or any other representation of the data.

We now present a result indicating that the intransitive kernel $K^{\Phi}_I$ can be used to learn arbitrary reciprocal preference relations, provided that the feature representation $\phi$ of the individual objects is powerful enough. It is important to emphasize that the proposition does not impose any assumption on the loss function.

Proposition 3.3. Let $E$ be a training dataset of type (4), let $L : \mathbb{R}^2 \to \mathbb{R}_+$ be a loss function, and let $\mathcal{H}_R : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ be the set of all hypotheses inducing a reciprocal relation on $\mathcal{X}$. Moreover, let

$$h^*(x, x') = \arg\min_{h \in \mathcal{H}_R} \sum_{i=1}^{N} L\left(y_i, h(x_i, x'_i)\right) \qquad (10)$$

be the set of hypotheses inducing a reciprocal relation on $\mathcal{X}$ that have a minimal empirical loss on $E$. Further, let

$$h(x, x') = \sum_{i=1}^{N} a_i K^{\Phi}_I(x_i, x'_i, x, x') = \sum_{i=1}^{N} 2a_i \left( K^{\phi}(x_i, x)K^{\phi}(x'_i, x') - K^{\phi}(x'_i, x)K^{\phi}(x_i, x') \right), \qquad (11)$$

where $a_i \in \mathbb{R}$, be the set of hypotheses we can construct using the intransitive kernel $K^{\Phi}_I$ and a given feature representation $\phi$ of a base kernel $K^{\phi}$. There exists a feature representation $\phi$ and coefficients $a_i$ for which the corresponding hypothesis (11) is one of the minimizers of (10).

Proof. First of all, remark that the above proposition does not make any assumption on the loss function. As mentioned in Section 3.1, in the experiments we will consider the squared loss (6), but the proof we give here holds for other loss functions too. We start by defining the reciprocal relation which is the solution to (10).

The training set $E$ may contain several couples that share the same two data objects, either in the same or in the opposite order, while their labels may be noisy in such a way that no reciprocal relation has zero loss on the whole training set. Therefore, we define

$$Z^+_i = \left\{ j \mid j \in \{1, \ldots, N\},\ x_j = x_i,\ x'_j = x'_i \right\},$$
$$Z^-_i = \left\{ j \mid j \in \{1, \ldots, N\},\ x_j = x'_i,\ x'_j = x_i \right\},$$

that is, $Z^+_i$ is the set of indices of the couples in the training set having $x_i$ as the first and $x'_i$ as the second data object, and $Z^-_i$ is the corresponding index set of the couples having $x_i$ and $x'_i$ in the opposite order. Moreover, for all $(x_i, x'_i, y_i) \in E$, let

$$\overline{y}_i = \arg\min_{y \in \mathbb{R}} \left( \sum_{j \in Z^+_i} L(y_j, y) + \sum_{j \in Z^-_i} L(y_j, -y) \right),$$

that is, $\overline{y}_i$ minimizes the sum of losses over the couples in $E$ having the same two data objects as the $i$-th couple. Now, the function

$$h(x_i, x'_i) = \overline{y}_i, \quad \forall i \in \{1, \ldots, N\},$$

obviously determines a solution to (10).

Next, let us define $K^{\phi}$ as follows:

$$K^{\phi}_S(x, x') = \begin{cases} 1, & \text{if } x = x', \\ 0, & \text{if } x \neq x'. \end{cases}$$

This kernel can be interpreted as a limit case of the Gaussian RBF kernel with $\gamma \to +\infty$. So, we define $\phi$ so that $\langle \phi(x), \phi(x') \rangle = 1$ if $x = x'$ and $\langle \phi(x), \phi(x') \rangle = 0$ otherwise. Then, $\langle \Phi(x, x'), \Phi(x, x') \rangle = 1$, $\langle \Phi(x, x'), \Phi(x', x) \rangle = -1$, and the value of the inner product is zero in all other cases. $K^{\Phi}$ is in this case given by

$$K^{\Phi}_{I,S}(x_i, x'_i, x_j, x'_j) = \begin{cases} 1, & \text{if } x_i = x_j \wedge x'_i = x'_j, \\ -1, & \text{if } x_i = x'_j \wedge x'_i = x_j, \\ 0, & \text{otherwise.} \end{cases}$$

In this construction, choosing

$$a_i = \frac{\overline{y}_i}{|Z^+_i| + |Z^-_i|}$$

satisfies $h(x_i, x'_i) = \overline{y}_i$ for all couples in the training set. $\square$

The above result indicates that this type of model is flexible enough to obtain as low an empirical error as possible on training data, while maintaining the reciprocity property. Hence, the algorithm can learn intransitive reciprocal relations, because intransitive reciprocal relations will minimize the empirical loss in intransitive problem settings.
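The construction in the proof can be checked numerically. The following sketch is a toy illustration with made-up couples and labels: assuming the squared loss, so that $\overline{y}_i$ is a signed average of matching labels, it builds the delta-based kernel $K^{\Phi}_{I,S}$ and the coefficients $a_i$, and verifies that the expansion reproduces $\overline{y}_i$ on every training couple, including an intransitive cycle.

```python
# Toy training set E of couples (x, x', y): repeated and reversed pairs
# with noisy labels, plus an intransitive cycle a beats b, b beats c,
# c beats a. All names and values are illustrative.
E = [("a", "b", 1.0), ("b", "a", -1.0), ("a", "b", 0.8),
     ("b", "c", 1.0), ("c", "a", 1.0)]

def k_phi_i_s(xi, xpi, xj, xpj):
    """K^Phi_{I,S} from the proof: +1 for identical couples,
    -1 for reversed couples, 0 otherwise."""
    if (xi, xpi) == (xj, xpj):
        return 1.0
    if (xi, xpi) == (xpj, xj):
        return -1.0
    return 0.0

def zbar(i):
    """Index sets Z+_i, Z-_i and the squared-loss minimizer ybar_i
    (a signed average of the labels of matching couples)."""
    xi, xpi, _ = E[i]
    zp = [j for j, (x, xp, _) in enumerate(E) if (x, xp) == (xi, xpi)]
    zm = [j for j, (x, xp, _) in enumerate(E) if (x, xp) == (xpi, xi)]
    ybar = (sum(E[j][2] for j in zp)
            - sum(E[j][2] for j in zm)) / (len(zp) + len(zm))
    return zp, zm, ybar

# Coefficients a_i = ybar_i / (|Z+_i| + |Z-_i|), as in the proof.
alpha = []
for i in range(len(E)):
    zp, zm, ybar = zbar(i)
    alpha.append(ybar / (len(zp) + len(zm)))

def h(x, xp):
    """Expansion (11) with the delta-based intransitive kernel."""
    return sum(a * k_phi_i_s(xi, xpi, x, xp)
               for a, (xi, xpi, _) in zip(alpha, E))

# h reproduces the loss-minimizing labels on every training couple while
# remaining antisymmetric, hence reciprocal after rescaling, and it
# preserves the intransitive cycle.
for i, (x, xp, _) in enumerate(E):
    assert abs(h(x, xp) - zbar(i)[2]) < 1e-12
```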

4. Experiments

4.1. Rock-paper-scissors

In order to test our approach, we consider a semi-synthetic benchmark problem in game theory, a domain in which intransitive reciprocal relations between players are often observed. In such a context, a pure strategy provides a complete description of how a player will play a game. In particular, it determines the move a player will make in any situation (s)he could face. A player's strategy set is the set of pure strategies available to that player. A mixed strategy is an assignment of a probability to each pure strategy. This allows a player to randomly select a pure strategy. Since probabilities are continuous, there are infinitely many mixed strategies available to a player, even if the strategy set is finite.

We consider learning the reciprocal relation given by the probability that one player beats another in the well-known rock-paper-scissors game. To test the performance of the learning algorithm in such a non-linear task, we generated the following synthetic data. First, we generate 100 individual objects for training and 100 for testing. The data objects are three-dimensional vectors representing players of the rock-paper-scissors game. The three attributes of a player are the probabilities that the player will choose 'rock', 'paper', or 'scissors', respectively. The probability $P(r|x)$ of player $x$ choosing rock is determined by $P(r|x) = \exp(wu)/z$, where $u$ is a random number between 0 and 1, $w$ is a steepness parameter, and $z$ is a normalization constant ensuring that the three probabilities sum to one. By varying the steepness $w$ of the exponential function, we can generate players tending to favor one of the three choices over the others or to play each choice almost equally likely.
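The player-generation scheme can be sketched as follows. This is a minimal NumPy illustration of the $\exp(wu)/z$ construction described above; the seed and sample size are arbitrary choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily

def make_players(n, w):
    """Generate n players as probability vectors (P(rock), P(paper),
    P(scissors)) via the exp(w*u)/z scheme, with u uniform on [0, 1]
    and z normalizing each row to sum to one."""
    u = rng.random((n, 3))
    p = np.exp(w * u)
    return p / p.sum(axis=1, keepdims=True)

# Large w concentrates each player's mass on a favorite move; small w
# yields nearly uniform players.
players = make_players(100, w=10)
```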

We generate 1000 player couples for training by randomly selecting the first and the second player from the set of training players. Each couple represents a game of rock-paper-scissors, and the outcome of this game can be considered stochastic in nature, because the strategy of a player is chosen in accordance with the probabilities of picking a particular pure strategy from that player's mixed strategy. For example, when a fixed rock player plays against a mixed-strategy player that plays scissors with probability 0.8 and paper with probability 0.2, then we

Fig. 1. Illustration of the players in the three data sets generated using the values 1 (left), 10 (middle), and 100 (right) for the parameter w.

Table 1
Mean squared error obtained with three different approaches: regularized least-squares with the kernel $K^{\Phi}_I$ (I), regularized least-squares with the kernel $K^{\Phi}_T$ (II), and a naive approach consisting of always predicting 1/2 (III).

        w = 1       w = 10      w = 100
I       0.000209    0.000445    0.000076
II      0.000162    0.006804    0.131972
III     0.000001    0.006454    0.125460

Table 2
Classification accuracy obtained with three different approaches: regularized least-squares with the kernel $K^{\Phi}_I$ (I), regularized least-squares with the kernel $K^{\Phi}_T$ (II), and a naive approach consisting of always predicting a tie (III).

        w = 1       w = 10      w = 100
I       0.538200    0.957800    0.995000
II      0.5         0.5         0.592950
III     0.5         0.5         0.5


have a higher chance of observing a game outcome in which the fixed rock player wins. Yet, the same couple of players can occur in the training data several times with different outcomes. During training and testing, the outcome of a game is $-1$, 0, or 1 depending on whether the first player loses the game, the game ends in a tie, or the first player wins the game, respectively. We use the game outcomes as the labels of the training couples.

For testing purposes, we use each possible couple of test players once, that is, we have a test set of 10,000 games. However, instead of using the outcome of a single simulated game as label, we assign to each test couple the element of the reciprocal relation that corresponds to the probability that the first player wins:

$$Q(x, x') = P(p|x)P(r|x') + \tfrac{1}{2}P(p|x)P(p|x') + P(r|x)P(s|x') + \tfrac{1}{2}P(r|x)P(r|x') + P(s|x)P(p|x') + \tfrac{1}{2}P(s|x)P(s|x').$$

The task is to learn to predict this reciprocal relation. The algorithm estimates the relation by rescaling the predicted outputs, which lie in the interval $[-1, 1]$, as discussed above.
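The ground-truth relation $Q$ can be computed directly from two players' probability vectors. The sketch below is illustrative (the function and variable names are our own); it encodes the winning pairings plus half of the tying mass, and checks reciprocity, $Q(x, x') + Q(x', x) = 1$, on the mixed-strategy example from the text.

```python
def q(x, xp):
    """Probability that player x beats player x', with ties counted as
    1/2. Players are (P(rock), P(paper), P(scissors)) tuples."""
    r, p, s = 0, 1, 2
    win = x[p] * xp[r] + x[r] * xp[s] + x[s] * xp[p]
    tie = x[r] * xp[r] + x[p] * xp[p] + x[s] * xp[s]
    return win + 0.5 * tie

rock = (1.0, 0.0, 0.0)    # pure rock player
mixed = (0.0, 0.2, 0.8)   # scissors 0.8, paper 0.2 (the example above)

# Rock beats the mixed player whenever it plays scissors: Q = 0.8.
assert abs(q(rock, mixed) - 0.8) < 1e-12
# Reciprocity: Q(x, x') + Q(x', x) = 1.
assert abs(q(rock, mixed) + q(mixed, rock) - 1.0) < 1e-12
```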

As mentioned above, we optimize a squared loss function on training data.3 The values of the regularization and bias parameters are selected with a grid search and cross-validation performed on the training set. We conduct experiments with three data sets generated using the values 1, 10, and 100 for the parameter $w$. These parameterizations are illustrated in Fig. 1. The value $w = 1$ corresponds to the situation where each player tends to play 'rock', 'paper', or 'scissors' almost equally likely, that is, the players are concentrated in the center of the triangle in the figure. For $w = 100$ the players tend to always play only their favorite item, that is, the players' strategies are concentrated near the three corners of the triangle. Finally, $w = 10$ corresponds to a setting between these two extremes.

The regression results are presented in Table 1. We report the mean squared error obtained by regularized least-squares in a transitive and an intransitive setting, respectively specified by the kernels $K^{\Phi}_T$ and $K^{\Phi}_I$. For $K^{\phi}$ a simple linear kernel is chosen in both cases. In addition, in Table 2 we present the results of binary classification experiments, in which the aim is to correctly predict the direction of preference, that is, whether the first player is more likely to win against the second player or vice versa. In both regression and classification experiments, we also compare these two approaches with a naive heuristic consisting of always predicting 1/2 (a tie).

In the regression experiments, the heuristic of always predicting a tie is close to optimal for $w = 1$, because in that case all players are located in the center of the triangle. This

3 For running the experiments, we use our RLScore software package, available athttp://www.tucs.fi/rlscore.

explains why neither the transitive nor the intransitive regularized least-squares algorithm can outperform this naive approach when $w = 1$. We conclude that there is not much to learn in this case. For the other two values of $w$ the situation is different: the regularized least-squares algorithm with the intransitive kernel performs substantially better than the naive approach, while the performance with the transitive kernel remains close to that of the naive one. Unsurprisingly, learning the intransitive reciprocal relation is more difficult when the probabilities of the players are close to the uniform distribution ($w = 10$) than when the players tend to always play their favorite strategy ($w = 100$). Especially in this last case, regularized least-squares with an intransitive kernel performs substantially better than its transitive counterpart. This supports the claim that our approach works well in practice when the reciprocal relation to be learned indeed violates weak stochastic transitivity. The stronger this violation, the more visible the advantage of an intransitive kernel becomes.

In the binary classification experiments, the heuristic of always predicting a tie or the same class has a classification accuracy equal to 0.5 in all cases, because each pair of players occurs twice in the test data. For the trained predictors, the most difficult case is now the parameter $w = 1$, because the game outcomes are almost random when both players are in the center of the triangle. For the values $w = 10$ and $w = 100$, RLS with the kernel $K^{\Phi}_I$ is able to perform the classification almost perfectly, while the classification accuracy with the kernel $K^{\Phi}_T$ remains at random level.

4.2. Theoretical biology

Non-transitive competition between species has recently received attention in theoretical biology. This phenomenon has been

Table 3
Classification accuracy (left) and mean squared regression error (right) obtained with three different approaches: regularized least-squares with the kernel $K^{\Phi}_I$ (I), regularized least-squares with the kernel $K^{\Phi}_T$ (II), and a naive approach consisting of always predicting a tie in classification and 0.5 in regression (III).

Method   Accuracy   MSE
I        0.960600   0.000146
II       0.680000   0.007835
III      0.500000   0.007757


observed in many natural systems (see e.g. Sinervo and Lively (1996), Boddy (2000), Kerr et al. (2002), Czárán et al. (2002), Nowak (2002), Kirkup and Riley (2004), Károlyi et al. (2005), Reichenbach et al. (2007)) and it has been studied and analyzed with computer simulations (see e.g. Frean and Abraham (2001), Frean (2006)). The simulations usually consist of an initial population of individuals or species and some limited resource, such as space, for which they compete. Most studies of non-transitive systems have considered rock-paper-scissors types of relationships between the competing species. Some of the studies and simulations also address competition between mutated individuals of a single species, following a similar non-transitive pattern as interspecific competition. Below, we use the term species when referring to the individuals in a cyclic competitive structure.

Inspired by the simulations made by Frean (2006), we consider the following setting. Suppose we have a number of competing species, each of them having two features: a species $x$ has a strong point denoted $s(x)$ and a weak point denoted $w(x)$, and the values of both features lie between 0 and 1. Then, for a couple of individuals, say $(x, x')$, we define a label $y$, whose value equals 1 if $x$ dominates $x'$ and $-1$ in the opposite case. The dominance is determined by the following formula:

$$y = \mathrm{sign}\left( u(s(x'), w(x)) - u(s(x), w(x')) \right), \qquad (12)$$

where sign is the signum function and

$$u(a, b) = \min\left( |a - b|,\ 1 - |a - b| \right). \qquad (13)$$

We observe that the species $x$ dominates $x'$ if and only if the strong point $s(x)$ of $x$ is closer to the weak point $w(x')$ of $x'$ than $s(x')$ is to $w(x)$, with closeness defined by (13), which expresses distance on the unit interval $[0, 1]$ considered as a circular domain.
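Formulas (12) and (13) can be sketched directly. The implementation below is illustrative (the test species are made up), and it checks that the circular distance treats 0 and 1 as equal and that the dominance label is antisymmetric.

```python
def u(a, b):
    """Circular closeness (13): distance between a and b on [0, 1]
    treated as a circle, so 0 and 1 coincide."""
    return min(abs(a - b), 1.0 - abs(a - b))

def label(x, xp):
    """Dominance label (12): +1 iff x dominates x', i.e. iff the strong
    point of x is closer to the weak point of x' than vice versa."""
    (sx, wx), (sxp, wxp) = x, xp
    d = u(sxp, wx) - u(sx, wxp)
    return (d > 0) - (d < 0)

# 0.95 and 0.05 are close on the circle (distance 0.1, not 0.9).
assert abs(u(0.95, 0.05) - 0.1) < 1e-9
# Species are (strong, weak) pairs; x's strong point 0.10 is circularly
# closer to x''s weak point 0.90 than x''s strong point 0.12 is to 0.50.
x, xp = (0.10, 0.50), (0.12, 0.90)
assert label(x, xp) == 1 and label(xp, x) == -1
```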

We elucidate the strong and weak points of a species with the following toy example about animals. The strong point of an animal can be thought of, for example, as the colour that the animal is best able to see, and the weak point as the colour of the animal itself. Further, the colour can be considered a continuous variable, so that the smaller the distance between the colours $s(x)$ and $w(x')$, the better $x$ is able to see $x'$. Then, an animal $x$ can dominate animal $x'$

if the distance is small enough.

We set up an experiment in which we randomly generate an initial population of 2500 species whose strong and weak points are drawn from a uniform distribution between 0 and 1. Then, we randomly select two species $x$ and $x'$ from the population and compute a label $y$ with (12). In the confrontation of these two species, we say that $x$ is the winner and $x'$ is the loser if $y = 1$, and vice versa if $y = -1$. After the confrontation,

Fig. 2. The set of 2500 species after 900,000 (left) and 1,000,000 (right) confrontations.

the loser is replaced by a mutation $\hat{x}$ of the winner. The strong and weak points of the mutant are obtained from those of the winner by shifting them by small amounts drawn from a normal distribution with zero mean and standard deviation 0.005.
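One confrontation-and-mutation step can be sketched as follows. This is a simplified illustration: wrapping the mutated points back into $[0, 1)$ with a modulo is our assumption (consistent with the circular domain), and the seed, step count, and function names are arbitrary.

```python
import random

random.seed(0)  # seed chosen arbitrarily

def u(a, b):
    """Circular closeness (13) on [0, 1]."""
    return min(abs(a - b), 1.0 - abs(a - b))

def step(pop, sigma=0.005):
    """One confrontation: pick two species at random, then replace the
    loser with a mutated copy of the winner. Mutations are N(0, sigma)
    shifts, wrapped modulo 1 to stay on the circular domain."""
    i, j = random.sample(range(len(pop)), 2)
    (si, wi), (sj, wj) = pop[i], pop[j]
    # Species i dominates j iff its strong point is closer to j's weak
    # point than j's strong point is to i's weak point, per (12).
    winner, loser = (i, j) if u(si, wj) < u(sj, wi) else (j, i)
    s, w = pop[winner]
    pop[loser] = ((s + random.gauss(0.0, sigma)) % 1.0,
                  (w + random.gauss(0.0, sigma)) % 1.0)

# Uniform initial population of 2500 (strong, weak) pairs; a short run
# for illustration (the paper's simulation uses 1,000,000 steps).
pop = [(random.random(), random.random()) for _ in range(2500)]
for _ in range(1000):
    step(pop)
```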

Unlike in the experiments of Frean (2006), we adopt an approach in which we do not consider any local neighborhood of the species, that is, the two confronting species are randomly selected from the current population of 2500 species. This is done in order to simplify the experimental setting. In addition, in each confrontation of two species there is always a winner and a loser, while in the experiments of Frean (2006) this was the case only if the value of (13) for $s(x)$ and $w(x')$ was smaller than a certain threshold; a next couple was randomly selected if the value was larger than the threshold. Finally, our closeness function (13) differs from the one used by Frean (2006) in that the strong and weak points are cyclic, in the sense that the values 0 and 1 can be considered equal. We adopted the cyclic property of the weak and strong points in order to eliminate the special case of values close to 0 and 1.

We perform altogether 1,000,000 subsequent confrontations of two species. In the beginning there are no clusters, since the strong and weak points of the species are uniformly distributed. However, the species start to form small clusters after a couple of tens of thousands of confrontations, and large clusters once a couple of hundreds of thousands of confrontations have passed. We sample our training and test sets from the last 100,000 confrontations, since at this point the simulation has already formed quite stable clusters. Namely, we randomly sample without replacement 1000 couples for a training set and 10,000 for a test set. The clusters formed after 900,000 and 1,000,000 confrontations are depicted in Fig. 2. The figures are two consecutive snapshots of a movie which is available online at http://staff.cs.utu.fi/aatapa/tbmovie.avi.


Fig. 3. Illustration of 100 randomly selected test couples. Left: the dotted lines denote the 69 couples classified correctly and the dashed lines the 31 incorrectly classified ones using RLS with the transitive kernel. Right: the dotted lines denote the 89 couples classified correctly and the dashed lines the 11 incorrectly classified ones using RLS with the intransitive kernel.


We train two RLS classifiers with the training set of 1000 confrontations and use them for predicting the outcomes of the unseen 10,000 confrontations in the test set. The first classifier uses the transitive kernel $K^{\Phi}_T$ and the second one the intransitive kernel $K^{\Phi}_I$. The base kernel $K^{\phi}$ is chosen to be the Gaussian radial basis function kernel in both cases, that is,

$$K^{\phi}(x, x') = e^{-\gamma\left( (s(x) - s(x'))^2 + (w(x) - w(x'))^2 \right)}. \qquad (14)$$

The values of the regularization and bias parameters, and the width $\gamma$ of the Gaussian kernel, are selected with a grid search and cross-validation performed on the training set.

The experimental results are listed in Table 3. Moreover, a random sample of 100 test couples and their classifications by the transitive and intransitive RLS classifiers are illustrated in Fig. 3. From the results, we observe that the classifier using the transitive kernel can learn the relation to some extent, but the intransitive kernel is clearly better suited for this purpose. In addition to the classification experiments, we also conduct experiments in which we aim to correctly regress the value of the reciprocal relation between two species. The value of the relation for two species is obtained by using the scaling function (8) in place of the signum function in (12). One can observe that here as well the intransitive kernel performs clearly better than its transitive counterpart.

5. Conclusion

In this paper the problem of learning intransitive reciprocal relations was tackled. To this end, we showed that existing approaches for preference learning typically exhibit strong stochastic transitivity as a property, and we introduced an extension of the existing RankRLS framework to predict reciprocal relations that can violate weak stochastic transitivity. In this framework, the choice of kernel function defines the transition from transitive to intransitive models. By choosing a feature mapping based on the Kronecker product, we are able to predict intransitive reciprocal relations. Experiments on benchmark problems in game theory and theoretical biology confirmed that our approach substantially outperforms the ranking approach when intransitive reciprocal relations are present in the data. Given the absence of publicly available datasets on learning intransitive reciprocal relations, we are willing to share our data with other researchers, and in the future we hope to apply our algorithm in other domains as well.

From a decision making point of view, one might argue that the models proposed in this paper suffer from a lack of interpretability. However, for most real-world data modeling problems a clear trade-off between interpretability and performance can be expected. Our methods incline to the latter side of the balance: state-of-the-art predictive performance but limited interpretability. Nevertheless, interpretability can be preserved by choosing a linear kernel for $K^{\phi}$, while still not necessarily imposing transitivity, by using the Kronecker-product kernel on top of this linear kernel. Furthermore, monotonicity, another important property of commonly used decision models, can be guaranteed in the same way, since a linear model always satisfies monotonicity, although monotone models do not necessarily have to be linear. For kernel methods in particular, it is widely accepted that monotonicity cannot easily be guaranteed in the dual formulation. As a specific form of incorporating domain knowledge into kernel methods, Le et al. (2006) recently proposed an algorithm capable of enforcing monotonicity in regression problems, but the topic definitely remains an open challenge that deserves further attention in future work.

References

Agresti, A., 2002. Categorical Data Analysis, second ed. John Wiley and Sons.
Anand, P., 1993. The philosophy of intransitive preferences. The Economic Journal 103, 337–346.
Ballinger, T., Wilcox, P., 1997. Decisions, error and heterogeneity. The Economic Journal 107, 1090–1105.
Billot, A., 1995. An existence theorem for fuzzy utility functions: A new elementary proof. Fuzzy Sets and Systems 74, 271–276.
Boddy, L., 2000. Interspecific combative interactions between wood-decaying basidiomycetes. FEMS Microbiology Ecology 31, 185–194.
Bradley, R., Terry, M., 1952. Rank analysis of incomplete block designs. I: The method of paired comparisons. Biometrika 39, 324–345.
Carroll, J., De Soete, G., De Sarbo, W., 1990. Two stochastic multidimensional choice models for marketing research. Decision Sciences 21, 337–356.
Chu, W., Ghahramani, Z., 2005. Preference learning with Gaussian processes. In: Proceedings of the International Conference on Machine Learning, Bonn, Germany, pp. 137–144.
Chu, W., Keerthi, S., 2007. Support vector ordinal regression. Neural Computation 19 (3), 792–815.
Crammer, K., Singer, Y., 2001. Pranking with ranking. In: Proceedings of the Conference on Neural Information Processing Systems, Vancouver, Canada, pp. 641–647.
Czárán, T., Hoekstra, R., Pagie, L., 2002. Chemical warfare between microbes promotes biodiversity. Proceedings of the National Academy of Sciences 99 (2), 786–790.
Dasgupta, M., Deb, R., 1996. Transitivity and fuzzy preferences. Social Choice and Welfare 13, 305–318.
De Baets, B., De Meyer, H., 2005. Transitivity frameworks for reciprocal relations: Cycle-transitivity versus FG-transitivity. Fuzzy Sets and Systems 152, 249–270.
De Baets, B., De Meyer, H., De Schuymer, B., Jenei, S., 2006. Cyclic evaluation of transitivity of reciprocal relations. Social Choice and Welfare 26, 217–238.
De Baets, B., De Meyer, H., De Loof, K., submitted for publication. On the cycle-transitivity of the mutual rank probability relation of a poset. Fuzzy Sets and Systems.
De Loof, K., De Baets, B., De Meyer, H., 2010. Counting linear extension majority cycles in posets on up to 13 points. Computers and Mathematics with Applications 59, 1541–1547.
De Schuymer, B., De Meyer, H., De Baets, B., Jenei, S., 2003. On the cycle-transitivity of the dice model. Theory and Decision 54, 164–185.
De Schuymer, B., De Meyer, H., De Baets, B., 2006. Optimal strategies for equal-sum dice games. Discrete Applied Mathematics 154, 2565–2576.
De Schuymer, B., De Meyer, H., De Baets, B., 2009. Optimal strategies for symmetric matrix games with partitions. Bulletin of the Belgian Mathematical Society – Simon Stevin 16, 67–89.
Dzhafarov, E., 2003. Thurstonian-type representations for same-different discriminations: Deterministic decisions and independent images. Journal of Mathematical Psychology 47, 184–204.
Dias, L., Mousseau, V., 2006. Inferring ELECTRE's veto-related parameters from outranking examples. European Journal of Operational Research 170, 172–191.
Diaz, S., Garcia-Lapresta, J.-L., Montes, S., 2008. Consistent models of transitivity for reciprocal preferences on a finite ordinal scale. Information Sciences 178 (13), 2832–2848.
Doignon, J.-P., Monjardet, B., Roubens, M., Vincke, Ph., 1986. Biorder families, valued relations and preference modelling. Journal of Mathematical Psychology 30, 435–480.
Doumpos, M., Zopounidis, C., 2004. A multicriteria classification approach based on pairwise comparisons. European Journal of Operational Research 158, 378–389.
Fishburn, P., 1970. Utility Theory for Decision Making. Wiley.
Fishburn, P., 1991. Nontransitive preferences in decision theory. Journal of Risk and Uncertainty 4, 113–134.
Fisher, L., 2008. Rock, Paper, Scissors: Game Theory in Everyday Life. Basic Books.
Fono, L., Andjiga, N., 2007. Utility function of fuzzy preferences on a countable set under max-*-transitivity. Social Choice and Welfare 28, 667–683.
Frean, M., 2006. Emergence of cyclic competitions in spatial ecosystems. In: Whigham, P.A. (Ed.), SIRC 2006: Interactions and Spatial Processes (Eighteenth Colloquium hosted by the Spatial Information Research Centre), pp. 1–9.
Frean, M., Abraham, E., 2001. Rock-scissors-paper and the survival of the weakest. Proceedings of the Royal Society (London), Series B 268, 1323–1328.
Freund, Y., Iyer, R., Schapire, R., Singer, Y., 2003. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research 4, 933–969.
Garcia-Lapresta, J., Meneses, C., 2005. Individual-valued preferences and their aggregation: Consistency analysis in a real case. Fuzzy Sets and Systems 151, 269–284.
Herbrich, R., Graepel, T., Obermayer, K., 2000. Large margin rank boundaries for ordinal regression. In: Smola, A., Bartlett, P., Schölkopf, B., Schuurmans, D. (Eds.), Advances in Large Margin Classifiers. MIT Press, pp. 115–132.
Herbrich, R., Minka, T., Graepel, T., 2007. TrueSkill: A Bayesian skill rating system. In: Schölkopf, B., Platt, J., Hoffman, T. (Eds.), Advances in Neural Information Processing Systems, vol. 19. The MIT Press, Cambridge, MA, pp. 569–576.
Hüllermeier, E., Fürnkranz, J., 2003. Pairwise preference learning and ranking. In: Proceedings of the European Conference on Machine Learning, Dubrovnik, Croatia, pp. 145–156.
Hüllermeier, E., Fürnkranz, J., Cheng, W., Brinker, K., 2008. Label ranking by learning pairwise preferences. Artificial Intelligence 172, 1897–1916.
Joachims, T., 2002. Optimizing search engines using clickthrough data. In: Hand, D., Keim, D., Ng, R. (Eds.), Proceedings of the Eighth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'02). ACM Press, pp. 133–142.
Kalish, S., Nelson, P., 1991. A comparison of ranking, rating and reservation price measurement in conjoint analysis. Marketing Letters 2 (4), 327–335.
Károlyi, G., Neufeld, Z., Scheuring, I., 2005. Rock-scissors-paper game in a chaotic flow: The effect of dispersion on the cyclic competition of microorganisms. Journal of Theoretical Biology 236 (1), 12–20.
Kashima, H., Oyama, S., Yamanishi, Y., Tsuda, K., 2009. On pairwise kernels: An efficient alternative and generalization analysis. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.B. (Eds.), PAKDD, Lecture Notes in Computer Science, vol. 5476. Springer, pp. 1030–1037.
Kerr, B., Riley, M., Feldman, M., Bohannan, B., 2002. Local dispersal promotes biodiversity in a real-life game of rock-paper-scissors. Nature 418, 171–174.
Kirkup, B., Riley, M., 2004. Antibiotic-mediated antagonism leads to a bacterial game of rock-paper-scissors in vivo. Nature 428, 412–414.
Le, Q., Smola, A., Gärtner, T., 2006. Simpler knowledge-based support vector machines. In: Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, pp. 521–528.
Luce, R., Suppes, P., 1965. Preference, utility and subjective probability. In: Luce, R., Bush, R., Galanter, E. (Eds.), Handbook of Mathematical Psychology. Wiley, pp. 249–410.
Makowski, M., Piotrowski, E., 2006. Quantum cat's dilemma: An example of intransitivity in quantum games. Physics Letters A 355, 250–254.
Mousseau, V., Figueira, J., Naux, J., 2001. Using assignment examples to infer weights for the ELECTRE TRI method: Some experimental results. European Journal of Operational Research 130 (2), 263–275.
Nowak, M., 2002. Biodiversity: Bacterial game dynamics. Nature 418, 138–139.
Öztürk, M., Tsoukiàs, A., Vincke, Ph., 2005. Preference modelling. In: Figueira, J., Greco, S., Ehrgott, M. (Eds.), Multiple Criteria Decision Analysis, State of the Art Surveys. Springer-Verlag, pp. 27–71.
Pahikkala, T., Tsivtsivadze, E., Airola, A., Boberg, J., Salakoski, T., 2007. Learning to rank with pairwise regularized least-squares. In: Joachims, T., Li, H., Liu, T.-Y., Zhai, C. (Eds.), SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, pp. 27–33.
Pahikkala, T., Pyysalo, S., Boberg, J., Järvinen, J., Salakoski, T., 2009a. Matrix representations, linear transformations, and kernels for disambiguation in natural language. Machine Learning 74 (2), 133–158.
Pahikkala, T., Tsivtsivadze, E., Airola, A., Järvinen, J., Boberg, J., 2009b. An efficient algorithm for learning to rank from preference graphs. Machine Learning 75 (1), 129–165.
Radlinski, F., Joachims, T., 2007. Active exploration for learning rankings from clickthrough data. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining, San Jose, CA, USA, pp. 570–579.
Reichenbach, T., Mobilia, M., Frey, E., 2007. Mobility promotes and jeopardizes biodiversity in rock-paper-scissors games. Nature 448, 1046–1049.
Rifkin, R., 2002. Everything Old Is New Again: A Fresh Look at Historical Approaches in Machine Learning. Ph.D. Thesis, Massachusetts Institute of Technology.
Schölkopf, B., Smola, A., 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT Press.
Shawe-Taylor, J., Cristianini, N., 2004. Kernel Methods for Pattern Analysis. Cambridge University Press.
Sinervo, B., Lively, C., 1996. The rock-paper-scissors game and the evolution of alternative male strategies. Nature 340, 240–246.
Suykens, J., Van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J., 2002. Least Squares Support Vector Machines. World Scientific Publishing Co., Singapore.
Switalski, Z., 2000. Transitivity of fuzzy preference relations – an empirical study. Fuzzy Sets and Systems 118, 503–508.
Switalski, Z., 2003. General transitivity conditions for fuzzy reciprocal preference matrices. Fuzzy Sets and Systems 137, 85–100.
Thurstone, L., 1927. A law of comparative judgment. Psychological Review 79, 281–299.
Tsai, R.-C., Böckenholt, U., 2006. Modelling intransitive preferences: A random-effects approach. Journal of Mathematical Psychology 50, 1–14.
Tversky, A., 1998. Preference, Belief and Similarity. MIT Press.
Waegeman, W., De Baets, B., submitted for publication. On the ERA ranking representability of multi-class classifiers. Artificial Intelligence.
Waegeman, W., De Baets, B., Boullart, L., 2009. Kernel-based learning methods for preference aggregation. 4OR 7, 169–189.
Waite, T., 2001. Intransitive preferences in hoarding gray jays (Perisoreus canadensis). Behavioral Ecology and Sociobiology 50, 116–121.
Weston, J., Schölkopf, B., Bousquet, O., 2005. Joint kernel maps. In: Cabestany, J., Prieto, A., Sandoval Hernández, F. (Eds.), Computational Intelligence and Bioinspired Systems, Eighth International Work-Conference on Artificial Neural Networks (IWANN 2005), Lecture Notes in Computer Science, vol. 3512. Springer-Verlag, Berlin Heidelberg, pp. 176–191.
Zhang, J., 2004. Binary choice, subset choice, random utility, and ranking: A unified perspective using the permutahedron. Journal of Mathematical Psychology 48, 107–134.
