
A LIKELIHOOD APPROACH TO HLA SEROLOGY


European Journal of Immunogenetics (1992), 19, 311-322


J. F. CLAYTON,* C. LONJOU,* P. BOURRET,† A. CAMBON-THOMSEN,* E. OHAYON,* J. HORS‡ & E. D. ALBERT§

*CNRS-CRPG, and †CERT-ONERA, Toulouse, and ‡France Transplant, Paris, France, and §Immunogenetics Laboratory, Munich, Germany

(Received 9 March 1992; revised 19 June 1992; accepted 22 June 1992)

SUMMARY

A likelihood approach to HLA serology has been developed in which the aim is not to define a recognition set for a serum but to describe the serum’s ability to react with each and every antigen in the test cells, this ability being quantified in terms of the probability of a positive reaction. For a given set of probabilities, one for each antigen, it is possible to derive the probability of the observed set of reactions (the likelihood of the set of probabilities). The maximum possible value of the likelihood for any possible combination of the probability set can then be sought, but this requires a maximization of likelihood with respect to 60-100 independent parameters. Theoretical considerations of the shape of the likelihood surface prove that, in this particular case, this is a feasible proposition.

This approach allows the recognition of three groups of antigens: those for which there is considerable evidence of a specificity, those for which there is either no specificity or a very weak specificity, and those for which there is insufficient evidence on which to base a conclusion. The existence of a specificity can be tested using a log likelihood ratio as a statistic, but the usual assumption of a $\chi^2$ distribution of this statistic cannot automatically be made in this situation. Therefore, the distribution is estimated by simulation.

A serologist using this approach would receive considerably more information as to the serum’s reaction patterns and valid statistics for the existence, or not, of a specificity.

INTRODUCTION

The basic experimental result of HLA serology is the reaction pattern between a number of cells and a particular serum (Bodmer, 1986, 1987). However, except in the most straightforward cases, it is not at all easy to establish which antigens on the cells’ surfaces cause a positive reaction with the serum and consequent cell death. To aid in this, serologists have invoked the concept of the recognition set of a serum, which is an ordered list of antigens with which the serum appears to be able to react (Albert et al., 1984; Fergusson et al., 1987). However, this concept requires a further definition which has not yet been standardized: what statistic is to be used in the ordering of these antigens, and what value of it is to be used to terminate the list?

Correspondence: Dr John Clayton, CNRS-CRPG, Hopital Purpan, Ave de Grande Bretagne, 31300 Toulouse, France.

A number of statistics have been proposed and used in laboratory serum typing, in international workshops and in computer programs. The most prominent of these are the $\chi^2$ statistic and the correlation coefficient (both based on representing the data as two-by-two contingency tables) and the Q-score, which incorporates the ability of serologists to recognize varying levels of reaction (Sierp & Albert, 1984).

A major drawback to this approach is that the definition of a positive recognition set might be wrongly taken to imply that unnamed antigens are unrecognized by the serum, there being no simple method of assessing the quantity of evidence in favour of an antigen’s positivity or negativity. In order to avoid this complication, a scheme of serological typing has been developed, based on likelihood principles, which are widely used in genetics. This scheme starts with the assumption that there is no such thing as a serum’s recognition set. Instead, the serum is seen as having a finite probability of a positive reaction with each of the antigens in the set of test cells, this probability being different for different antigens. The aim is to estimate, for each antigen, the value of this probability.

The theory involved in this approach and an implementation of it in a computer program are presented here.

Overview of the theory

The basic parameter of this approach is the probability of a positive reaction ($\phi_x$) between a serum and a particular antigen x, or its converse, the probability of a negative reaction ($\theta_x = 1 - \phi_x$). It is considerably easier to formulate the algebra and calculus of this approach using the negative probability, but, in accordance with the wishes of serologists, results are expressed in terms of positive reactions. For a cell which has antigens x, y, and z on its surface, the probability of a negative reaction will be:

$$P_- = \theta_x\,\theta_y\,\theta_z$$

assuming that the reactions between the serum and antigens occur independently. Similarly the probability of a positive reaction will be:

$$P_+ = 1 - \theta_x\,\theta_y\,\theta_z$$

The assumption of independent reactions seems reasonable in that there is no suggestion that the antigenic properties change in accordance with the other antigens present on the cell, for example. This assumption does not preclude cross reactions. Further, if a serum reacts with two or more antigens on a cell, the probability of cell death would be increased over that which would be the case if only one of these antigens was present. However, although it is intuitively reasonable, the assumption remains unproven.
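As a minimal illustration of the two expressions above, the following Python sketch computes the probabilities of a negative and of a positive reaction for a single cell under the independence assumption. The function names and the numerical values of $\theta$ are hypothetical and are not taken from the authors' program.

```python
from math import prod

def p_negative(thetas):
    """Probability that the cell survives: the product of theta over its antigens."""
    return prod(thetas)

def p_positive(thetas):
    """Probability of a positive reaction: one minus the product of the thetas."""
    return 1.0 - p_negative(thetas)

# Hypothetical thetas (probability of a NEGATIVE reaction) for antigens x, y, z.
theta = {"x": 0.9, "y": 0.5, "z": 1.0}   # z: no reactivity at all
cell = ["x", "y", "z"]

p_neg = p_negative(theta[a] for a in cell)
p_pos = p_positive(theta[a] for a in cell)
print(f"P- = {p_neg:.3f}, P+ = {p_pos:.3f}")
```

An antigen with $\theta = 1$ leaves the product unchanged, which is how an unrecognized antigen drops out of the calculation.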

Thus for a given set of values $\theta$, it is possible to calculate the probability of finding exactly the set of observed data. This figure, the probability of the data for given $\theta$, is also termed the likelihood of the assumed values of $\theta$. In general, there is a set of values for $\theta$ which has the maximum possible likelihood ($L_{\max}$). That is, all possible changes in the values of the $\theta$’s, either singly or in combinations, would produce a smaller value of the likelihood. This set of values provides the maximum likelihood estimates of the true values of $\theta$. For details on the properties of such estimates see Edwards (1984).
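The likelihood described in this paragraph can be sketched in a few lines of Python. This is one illustrative reading of the text, not the authors' implementation: the data layout (a tuple of antigen names paired with a reaction flag), the function name `log_likelihood` and the example values are all assumptions.

```python
from math import log, prod

def log_likelihood(cells, theta):
    """Log of the probability of the observed reactions for the given thetas.

    cells: list of (antigens, reacted_positive) pairs.
    theta: dict mapping antigen name -> probability of a negative reaction.
    """
    total = 0.0
    for antigens, positive in cells:
        p_neg = prod(theta[a] for a in antigens)
        p = 1.0 - p_neg if positive else p_neg
        if p <= 0.0:                      # observation impossible under theta
            return float("-inf")
        total += log(p)
    return total

# Hypothetical four-cell panel in which only the A23-bearing cells are killed.
cells = [
    (("A23", "B49"), True),
    (("A1", "B8"), False),
    (("A23", "B8"), True),
    (("A1", "B49"), False),
]
theta_specific = {"A23": 0.1, "B49": 1.0, "A1": 1.0, "B8": 1.0}
theta_flat = {a: 0.7 for a in theta_specific}

print(log_likelihood(cells, theta_specific))   # higher: fits the pattern
print(log_likelihood(cells, theta_flat))       # lower: same theta everywhere
```

Working with the logarithm avoids numerical underflow when the panel contains hundreds of cells.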

A theoretical study of the shape of this likelihood surface has been undertaken and is presented in more detail in the appendix. The principal conclusions of this study are: (1) that all stationary points on the likelihood surface are maxima, (2) that there are no saddle points, (3) that there is a single maximum, and (4) that the unique maximum can be formally demonstrated by calculation of the likelihoods at 2n + 1 points, where n is the number of antigens present. Hence, the procedure is a practical proposition which can be accomplished in a reasonable time.

Under the null hypothesis of no specificity to any antigen, each cell has exactly the same probability of a positive, or negative, reaction:

$$p_{\mathrm{null}} = n\,(n + m)^{-1}$$

where n is the number of positively reacting cells, and m the number of negatively reacting cells. Therefore, the likelihood for the null hypothesis is

$$L_{\mathrm{null}} = p_{\mathrm{null}}^{\,n}\,(1 - p_{\mathrm{null}})^{m}$$

The binomial coefficient, which is common to both $L_{\max}$ and $L_{\mathrm{null}}$, has been ignored since it is irrelevant to their ratio. The $L_{\mathrm{null}}$ and $L_{\max}$ can be used to generate a measure of the evidence for the specificity of the serum. According to large sample theory, the statistic:

$$X_{lld(1)} = 2\,(\ln L_{\max} - \ln L_{\mathrm{null}})$$

is asymptotically $\chi^2$ distributed, with degrees of freedom equal to the number of antigens in the data set. In practice, this will only rarely apply to data sets such as are used in serological analysis, since many alleles will be represented in very few cells. It is predictable that the distribution of this statistic will depend on the distribution of the antigens among the test cells and must therefore be assessed for those cases where a relatively low value has been achieved. This is discussed in more detail below. However, as a first approximation this statistic can be thought of as a $\chi^2$ test for the existence of a specificity of the serum.
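The null likelihood and the log likelihood difference are simple enough to write out directly. The sketch below is only an illustration: the function names, the counts and the value used for $\ln L_{\max}$ are hypothetical, and in practice $\ln L_{\max}$ would come from the maximization over all the $\theta$’s.

```python
from math import log

def null_log_likelihood(n_pos, n_neg):
    """Return (p_null, ln L_null) for n_pos positive and n_neg negative cells."""
    p_null = n_pos / (n_pos + n_neg)
    ll = 0.0
    if n_pos:
        ll += n_pos * log(p_null)
    if n_neg:
        ll += n_neg * log(1.0 - p_null)
    return p_null, ll

def x_lld(log_l_max, log_l_ref):
    """Twice the log likelihood difference between two hypotheses."""
    return 2.0 * (log_l_max - log_l_ref)

# Hypothetical serum: 20 positive and 180 negative cells.
p_null, ll_null = null_log_likelihood(20, 180)
assumed_ll_max = -40.0            # placeholder, not a result from the paper
print(p_null, ll_null, x_lld(assumed_ll_max, ll_null))
```

The same `x_lld` helper covers an antigen-specific test if the reference likelihood is the maximum obtained with that antigen's $\theta$ fixed at 1.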

Therefore, for each antigen, the maximum likelihood estimate of the $\theta$ is provided. However, there is also the need for an evaluation of the reasonable dispersion of this estimate. For this a ‘support interval’ is provided, which is defined as that interval for the $\theta$ outside which it is not possible to achieve a likelihood greater than $L_{\max}/e^2$ by varying the values of all the other $\theta$’s (see the appendix for further details). Again, given the uneven sampling of antigens in the test cells, it is not possible to be sure of the level of significance of such a support limit.

RESULTS

Typical results for sero-typing are shown in Tables 1, 2 and 3, the sera and the test cells having been taken from the study ‘Provinces Francaises’ (Ohayon & Cambon-Thomsen, 1986). The first line of each listing shows the general parameters for the serum: the $p_{\mathrm{null}}$, the natural logarithm of the likelihood of the null hypothesis, the logarithm of the maximum likelihood achievable under the hypothesis of allele specificity, and twice the log likelihood difference (called the $X_{lld(1)}$). These parameters have already been defined.

There then follows the list of all alleles which have a non-zero estimate of $\phi$. For each such allele, there are displayed the maximum likelihood estimate of $\phi$ and its support limit as previously defined. There is also given a further statistic which quantifies the support for the existence of a specificity against the allele and which has been used to order the list of alleles. This statistic ($X_{lld(2)}$) is defined as follows:

$$X_{lld(2)} = 2\,(\ln L_{\max} - \ln L_0)$$

where $L_{\max}$ is the maximum likelihood, and $L_0$ is the maximum likelihood achievable under the assumption that there is no specificity against the allele, i.e. $\theta = 1$. Asymptotically, this statistic is $\chi^2$ distributed with one degree of freedom. However, this cannot be assumed to be the case for many of the alleles in a study. The significance of this statistic is discussed later.

TABLE 1. The result of analysing data for serology between Workshop Serum 026 and 200 test cells

allele   $\phi$   SI          $X_{lld(2)}$
A23      0.888    0.59-0.99   {29.867}
B39      0.100    0.00-0.38   {2.201}
Bw58     0.334    0.00-0.85   {1.924}
A29      0.062    0.00-0.25   {0.916}

SI       alleles
0-0.9    Bw53 Bw56 DRw10
0-0.7    B49
0-0.65   A25 Aw33 DR9
0-0.5    Bw50 DRw8
0-0.45   DRw12
0-0.4    A26 B37 Bw41
0-0.35   B38 Bw47 Bw61 Cw1
0-0.3    B45 Bw52
0-0.25   A30 B27 Bw55
0-0.2    B13 B14
0-0.15   A28 A31 A32 B18 Bw62 Cw2
0-0.1    A11 A24 B35 B44 B51 Bw57 Bw60 Cw3 Cw4 Cw5 DR4 DRw11 DRw14
0-0.05   A1 A2 A3 B7 B8 Cw6 Cw7 DR1 DR2 DR3 DR7

See text for explanation of symbols. SI = support interval.

After this, the alleles for which there is a zero estimate of $\phi$ are listed. These are organized on the basis of the upper value of the support limits. It is not possible to declare that there is no specificity against such antigens, but, for those antigens low in the list, if such specificity exists, it is weak.

Table 1 shows the results for Workshop Serum number 026, when tested against a sample of 200 cells, representative of the French Caucasian population. The global $X_{lld}$ (61.559) is significant for the rate of positivity ($p_{\mathrm{null}}$) and this particular cell set, as is discussed below. A specificity against A23 is established, and any specificity against A1, A2 or A3, or any other of the antigens at the foot of the table, must be weak. In the central part of the table, each of the antigens Bw53, Bw56, DRw10, A25, Aw33 and DR9 has a very wide support limit because of the small number of cells expressing these antigens. None of these antigens is present on more than 3 cells. By contrast, B49 was present on five cells and it would have been expected that a narrower value of the support limit could have been found. The support limit actually found (0 to 0.7) is double that of Cw1, also present on 5 cells. This is explained by the strong linkage disequilibrium between A23 and B49, both antigens being present on four cells. This disequilibrium has acted to flatten the likelihood surface, thus increasing the support intervals of both antigens.

Tables 2 and 3 show the results for Workshop Serum 199, when tested against 200 and 800 cells respectively. B7 heads both lists despite the fact that the maximum likelihood estimate of its $\phi$ is smaller than those of Bw60, B13 and Bw61. This is because its $X_{lld}$, which reflects the quantity of evidence favouring a specificity, is the largest. It is important to note that the estimates of the $\phi$’s and their support limits are independent of the method of ordering. The order of the antigens is unchanged between the two lists for all antigens with an $X_{lld}$ greater than 5 with the 200 cell set. However, the $X_{lld}$’s are much larger with 800 cells, and the support intervals much narrower. With the exception of B7, the estimates of the $\phi$’s are little different. There is a much longer list of alleles at the foot of the table with the 800 cell set.

Statistical considerations

The use of the log likelihood difference is central to the approach presented here and, for those antigens present on the test cells with high frequency, there is little problem in its interpretation. However, for the others some caution is required. This caution applies equally to the parameter used to test the null hypothesis of no specificity for any antigen and those used to test for specificity to a particular antigen. Therefore, assessment of the distributions of these statistics is required: that is, it is necessary to be able to estimate the probability of obtaining any particular value of the statistic or higher if the null hypothesis were true. This distribution is a function of the number of cells in the test sample, the number of alleles present, the patterns of association of the alleles, and the overall rate of positive reaction for the serum under the null hypothesis ($p_{\mathrm{null}}$). Hence, there can be no globally determined level of significance and the distributions must be estimated for any serum typing for which there is doubt.

TABLE 2. The result of analysing data for serology between Workshop Serum 199 and 200 test cells

serum 199

allele   $\phi$   SI          $X_{lld(2)}$
B7       0.888    0.77-0.96   {96.014}
Bw60     0.934    0.73-0.99   {43.493}
B13      1.000    0.80-1.00   {42.414}
Bw61     1.000    0.67-1.00   {15.093}
Bw47     0.684    0.17-0.98   {8.920}
Bw41     0.500    0.09-0.90   {5.745}
A29      0.126    0.00-0.36   {1.454}
DRw10    1.000    0.00-1.00   {0.130}
Bw52     0.202    0.00-0.64   {0.029}

SI (in table order): 0-0.9, 0-0.65, 0-0.5, 0-0.45, 0-0.35, 0-0.3, 0-0.25, 0-0.2, 0-0.15, 0-0.1, 0-0.05

alleles (in table order): A25 Bw53 Bw56 DRw8 Aw33 Bw50 Bw58 DR9 DRw12 B37 Cw1 A28 B27 Bw55 A30 A31 A32 B38 B45 B49 A23 A26 B39 Cw2 DRw14 A11 B14 B18 A3 A24 B44 Bw57 DR7 B8 B35 B51 Bw62 Cw3 Cw4 Cw5 Cw6 Cw7 DR1 DR2 DR4 DRw11 A1 A2 DR3

TABLE 3. The result of analysing data for serology between Workshop Serum 199 and 800 test cells

serum 199

$p_{\mathrm{null}}$ 0.244

allele   $\phi$   SI          $X_{lld(2)}$
B7       0.632    0.55-0.71   {203.469}
Bw60     0.960    0.88-0.99   {195.941}
B13      0.814    0.61-0.94   {81.418}
Bw61     0.942    0.77-0.99   {66.985}
Bw47     0.832    0.44-0.99   {33.384}
B41      0.590    0.27-0.85   {29.047}
B27      0.056    0.00-0.17   {1.549}
Bw55     0.68     0.00-0.26   {1.337}

SI (in table order): 0-0.9, 0-0.65, 0-0.5, 0-0.4, 0-0.35, 0-0.25, 0-0.2, 0-0.15, 0-0.1, 0-0.05

alleles (in table order): DRw6 Aw34 Aw66 DR5 Bw63 DR9 DRw10 A25 Bw56 DRw12 B45 B49 Bw52 Bw53 Bw58 Cw2 A23 A32 Bw50 A11 A28 A30 A31 Aw33 B14 B37 B38 B39 DRw8 DRw11 DRw13 DRw8 A1 A2 A3 A24 A26 A29 B8 B18 B44 B35 B51 Bw57 Bw62 Cw1 Cw3 Cw4 Cw5 Cw6 Cw7 DR1 DR2 DR3 DR4 DR7 DRw14

Within the computer program, there exists an option which allows this distribution to be estimated for a set of test cells and a user-defined value of $p_{\mathrm{null}}$. This is achieved by simulating the cell/serum reactions for a number of hypothetical sera. In this, each cell is randomly assigned a positive or negative reaction using a random number generator, with an overall frequency of positive reactions equal to the $p_{\mathrm{null}}$ chosen. This is performed a large number of times, typically either 100 or 500 times, and, for each simulation, the $L_{\mathrm{null}}$, the $L_{\max}$ and the log likelihood difference are calculated. The distribution of the observed log likelihood differences gives an estimate of the true distribution subject to the null hypothesis of the serum having no antigen specificity. Thus, if the log likelihood difference for a real serum exceeds the value found in, for example, 95% of simulated sera with the same $p_{\mathrm{null}}$, it would be reasonable to reject the null hypothesis of no specificity at a significance level of the order of 5%. Similarly, the log likelihood difference can be calculated for the hypothesis of no specificity directed against any antigen(s) of interest.
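The simulation procedure can be sketched as follows. This is a rough Python illustration under stated assumptions, not the program described here: the panel of cells is invented, and the maximizer is a crude grid-based coordinate search substituted for the program's own routines, so the resulting values are only indicative.

```python
import random
from math import log, prod

def log_lik(cells, theta):
    """Log-likelihood of the reactions; cells = [(antigens, positive), ...]."""
    total = 0.0
    for antigens, positive in cells:
        p_neg = prod(theta[a] for a in antigens)
        p = 1.0 - p_neg if positive else p_neg
        if p <= 0.0:
            return float("-inf")
        total += log(p)
    return total

def maximise(cells, antigens, grid=20, sweeps=3):
    """Crude stand-in maximizer: try each grid value of each theta in turn."""
    values = [(k + 1) / grid for k in range(grid)]       # 0.05, 0.10, ..., 1.0
    theta = {a: 0.5 for a in antigens}                   # finite starting point
    best = log_lik(cells, theta)
    for _ in range(sweeps):
        for a in antigens:
            for v in values:
                old, theta[a] = theta[a], v
                ll = log_lik(cells, theta)
                if ll > best:
                    best = ll
                else:
                    theta[a] = old
    return best

def simulate_null(panel, p_null, n_sera=100, seed=0):
    """Sorted log likelihood differences for simulated non-specific sera."""
    rng = random.Random(seed)
    antigens = sorted({a for cell in panel for a in cell})
    stats = []
    for _ in range(n_sera):
        cells = [(cell, rng.random() < p_null) for cell in panel]
        n = sum(pos for _, pos in cells)                 # positive cells
        m = len(cells) - n                               # negative cells
        ll_null = sum(k * log(k / (n + m)) for k in (n, m) if k)
        stats.append(2.0 * (maximise(cells, antigens) - ll_null))
    return sorted(stats)

# Hypothetical panel: 30 cells, each carrying 3 of 6 antigens.
panel_rng = random.Random(42)
names = ["A1", "A2", "B7", "B8", "DR3", "DR4"]
panel = [tuple(panel_rng.sample(names, 3)) for _ in range(30)]

null_stats = simulate_null(panel, p_null=0.2, n_sera=40)
threshold = null_stats[int(0.95 * len(null_stats))]      # approx. 95th percentile
print(f"reject 'no specificity' if the observed statistic exceeds {threshold:.2f}")
```

Because the threshold depends on the panel and on $p_{\mathrm{null}}$, it has to be recomputed whenever either changes.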

Three distributions of the global log likelihood difference for a set of 200 cells are shown in Fig. 1. The distribution displays a rightward shift as the $p_{\mathrm{null}}$ increases, as it does for an increase in the number of cells. Hence, the limit of significance for the statistic would be different for sera with different rates of positivity: for example, a value of 50 with a 200-cell experimental set could be considered to be significant for a serum with a $p_{\mathrm{null}}$ of 0.1, but would not be significant if the $p_{\mathrm{null}}$ was 0.4. Further increase of the $p_{\mathrm{null}}$, up to about 0.7, creates little further right shift, and increases beyond this are accompanied by a left shift. Although these results are dependent on the particular set of cells used, they may also be valid for other sets of cells chosen in the same way from the same population. In this case, the cells were taken from the study ‘Provinces Francaises’ (Ohayon & Cambon-Thomsen, 1986), and so the distributions may be valid for other unselected Caucasian populations.

FIG. 1. The distribution of the global $X_{lld}$ statistic, for a set of 200 cells (as used in the production of the results in Tables 1 and 2) for different values of $p_{\mathrm{null}}$. These distributions were estimated by simulating 200 non-specific sera and, for each, calculating the relevant statistics. (Horizontal axis: $X_{lld}$ achieved.)

DISCUSSION

Previous attempts to produce a likelihood approach to serological analysis failed due to the extremely large number of independent parameters, each of which has to be optimized to find the solution. In order to demonstrate a maximum with a single variable, it is necessary to show that the likelihood is smaller on both sides; hence three calculations of likelihood are necessary. With two variables, nine calculations are needed and with three variables, 27. The number of calculations, and hence time, rises exponentially with the number of variables. It is rare that a maximization of likelihood is attempted with more than ten variables. In serological analysis, it is necessary to define an independent variable for each antigen, so that the total number is of the order of 60-100. Assuming that it takes 1 ms to calculate a single likelihood, it would normally take $10^{18}$ years to formally demonstrate that a maximum had been found in a 60-variable problem. In the case of serological analysis, such a formal demonstration is unnecessary. The theoretical work, presented in the appendix, demonstrates that there are no ‘saddle points’ (Spiegel, 1974) on the surface of likelihood. This observation is of the utmost importance to this approach, since it reduces the necessary calculation to find and demonstrate the maximum likelihood by a factor of the order of $3 \times 10^{26}$, allowing the maximum to be found in seconds.
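The orders of magnitude quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes, as stated in the overview of the theory, that 2n + 1 likelihood evaluations suffice once saddle points are excluded; the variable names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
MS_PER_LIKELIHOOD = 1e-3          # 1 ms per likelihood, as assumed in the text

n_vars = 60
exhaustive = 3 ** n_vars          # three values tried for every variable
years = exhaustive * MS_PER_LIKELIHOOD / SECONDS_PER_YEAR

formal = 2 * n_vars + 1           # evaluations needed without saddle points
reduction = exhaustive / formal

print(f"exhaustive check: {exhaustive:.2e} likelihoods, about {years:.1e} years")
print(f"2n + 1 check:     {formal} likelihoods, "
      f"{formal * MS_PER_LIKELIHOOD:.3f} s, reduction factor {reduction:.1e}")
```

The two printed figures are of the same order as the values given in the text.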

The value of this approach to HLA serology is the completeness of the description of the serum’s ability to react with each and every named antigen in the panel of cells tested. This allows the definition of not only those antigens for which there is sufficient evidence to conclude that there is a specific reaction but also those antigens against which any specific reaction, if it exists, is feeble and those for which there is insufficient evidence on which to base any conclusion.

The first of these groups of antigens is an equivalent of the recognition set of classical serological analysis. These antigens are usually ordered by such parameters as chi-squares and correlation coefficients, based on two-by-two analysis, or Q-scores, based on an analysis incorporating the reaction strengths (Sierp & Albert, 1984). In the likelihood approach, four parameters could have been used to order this list of antigens: the maximum likelihood estimate of the probability of a positive reaction, its upper and lower support limits, and the antigen specific log likelihood ratio. In its present form, the computer program uses the log likelihood ratio. Therefore, it emphasizes those antigens for which there is more evidence of a specificity even when the estimate of the probability of a positive reaction is low. It may be better to calculate the distribution of the ratio for each such antigen, given the data set, and to order them on the value of the estimated ‘p’. This will need to await a future implementation of the program.

The second group of antigens effectively defines a window of opportunity which is unavailable using other methods of serological analysis. It is important to emphasize that specificity against these antigens has not been excluded, but it is hard to imagine that such a list, with the quantified upper limit on the probability of a positive reaction, would not be of use in decisions on transplantation.

The third group of antigens comprises those with a wide support limit and includes antigens for which there is a non-zero estimate of $\phi$ but a small $X_{lld}$, and ones with a zero estimate. These are present in the central part of the table. This group is of considerable theoretical and practical interest. Some members are obvious following simple inspection of the data set: it is highly unlikely that it would be possible to demonstrate a specificity against an antigen present on only two cells. The rest are those antigens for which the antigen-antigen correlations, either due to linkage disequilibrium or to the selection of the data set, conspire to flatten the likelihood surface. If two antigens are very frequently associated in a data set, a reduction in the $\phi$ of one can be compensated for by an increase in the $\phi$ of the other, the likelihood being maintained. This will produce a wide support limit for the $\phi$’s and low log-likelihood ratios of both antigens. Higher order interactions are also possible, with three, four or more antigens interacting to flatten the likelihood surface. Such interactions are fully taken into account by the approach used here and are reflected in the support limits calculated for each antigen.

Future developments of the approach include the possibility of incorporating the observed strength of reactions into the analysis, requiring for each antigen probabilities for each reaction grade and a program to produce likelihoods for cell typing. A project is under way to develop a program to identify serologically defined antigen splits using likelihoods. It is not yet certain if such a program is feasible, but if so, it will provide a means of defining such splits taking into account real or putative linkage disequilibria.

A more prosaic, but perhaps useful, application would be the use of the serum simulation procedure with various sets of experimental cells. The results of simulations of both specific and non-specific sera could be used to optimize the choice of cells in order to maximize the chance of recognizing a specificity, minimize the chance of a false recognition and minimize the number of cells used. Such a procedure could reduce the considerable costs involved in serum, and hence cell, typing.

ACKNOWLEDGMENTS

This work has benefited from its association with the EEC SPRINT program. The development of the approach has been achieved on a computer bought with grants from the Conseil Regional of the Haute Garonne, France and from the Ligue Nationale Francaise Contre Cancer. The aid of these organizations is gratefully acknowledged.

REFERENCES

ALBERT, E.D., BARBALLO, T., BAUR, M.P., CHRISTIANSEN, F.T., DEPPE, H., KELLER, E., LUTON, T., MARSHALL, W.H., MCNICOLAS, A., RAFFOUX, C., SCHIESSL, B., SCHOLZ, S., SIERP, G. & SIEGMUND, M. (1984) Serological analysis of the data of the ninth Histocompatibility Workshop. In: Histocompatibility Testing 1984 (ed. by E.D. Albert, M.P. Baur & W.R. Mayr), pp. 72-74. Springer-Verlag, Berlin.

BODMER, W.F. (1986) HLA, Immune Response and Disease. In: Human Genetics, Proceedings of the 7th International Congress, Berlin 1986 (ed. by F. Vogel & K. Sperling), pp. 107-113. Springer-Verlag, Berlin.

BODMER, W.F. (1987) HLA 1987. In: Immunobiology of HLA, Histocompatibility Testing 1987 (ed. by B. Dupont), Vol. II, pp. 1-10. Springer-Verlag, New York.

CAMBON-THOMSEN, A., BOROT, N., NEUGEBAUER, M., SEVIN, A. & OHAYON, E. (1989) Inter-regional variability between fifteen French Provinces and Quebec. Collegium Anthropologicum, 13(1), 25.

EDWARDS, A. (1984) Likelihood. Cambridge University Press, Cambridge.

FERGUSSON, M., MILFORD, E., WHEELER, R., DUPONT, B. & LALOUEL, J.M. (1987) Data processing and serum analysis. In: Immunobiology of HLA, Histocompatibility Testing 1987 (ed. by B. Dupont), Vol. I, pp. 106-110. Springer-Verlag, New York.

OHAYON, E. & CAMBON-THOMSEN, A. (1986) Human Population Genetics. Colloque INSERM, Vol. 142, Paris, France.

SIERP, G. & ALBERT, E.D. (1984) Unbiased serum analysis. In: Histocompatibility Testing 1984 (ed. by E.D. Albert, M.P. Baur & W.R. Mayr), pp. 60-63. Springer-Verlag, Berlin.

SPIEGEL, M.R. (1974) Advanced Calculus. McGraw-Hill International Book Company, New York.

APPENDIX

The form of the likelihood surface

In this appendix, the shape of the likelihood surface is investigated algebraically, the aim being to determine if there are multiple maxima and saddle shapes. In this work, the following symbols are used:

$\theta_x$ = the probability of a negative reaction given antigen x
$L$ = the probability of the data set, or the likelihood of the currently assumed values of $\theta$
$L_x$ = the partial differential of L with respect to $\theta_x$, i.e. $\partial L/\partial\theta_x$
$L_{xy}$ = the double partial differential $\partial^2 L/\partial\theta_x\,\partial\theta_y$
$p_x$ = the product of the $\theta$’s of all other antigens present in the cell
$\Sigma_x^-$ = implies a summation over all cells expressing antigen x and which react negatively with the serum, and $\Sigma_x^+$ a summation over all positive cells

Therefore the probability of a negative reaction between the serum and a cell with antigen x will be $\theta_x p_x$, and for a positive reaction $1 - \theta_x p_x$. Therefore, the overall probability of all the reactions observed will be:

$$\log L = \Sigma_x^- \log(\theta_x p_x) + \Sigma_x^+ \log(1 - \theta_x p_x) + \ldots$$

and differentiating with respect to $\theta_x$ and setting it to zero as in a stationary point:

$$\frac{1}{L}\,L_x = \Sigma_x^- \frac{p_x}{\theta_x p_x} - \Sigma_x^+ \frac{p_x}{1 - \theta_x p_x} = \frac{n}{\theta_x} - \frac{1}{\theta_x}\,\Sigma_x^+ \frac{1-f}{f} = 0$$

where f is the probability of the positively reacting cell ($f = 1 - \theta_x p_x$). So that,

$$n = \Sigma_x^+ \frac{1-f}{f}$$

where n is the number of cells which express x and have a negative reaction. This expression assumes that $\theta_x > 0$, and hence that n > 0.
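The first-derivative expression can be checked numerically. The sketch below, which is an illustration rather than part of the original program, compares the analytic form $n/\theta_x - (1/\theta_x)\,\Sigma_x^+ (1-f)/f$ with a central finite difference of $\log L$; the data set, the values of $\theta$ and the function names are hypothetical, and each antigen is assumed to appear at most once per cell.

```python
from math import log, prod

def log_lik(cells, theta):
    """Log-likelihood of the reactions; cells = [(antigens, positive), ...]."""
    total = 0.0
    for antigens, positive in cells:
        p_neg = prod(theta[a] for a in antigens)
        total += log(1.0 - p_neg if positive else p_neg)
    return total

def dlogl_dtheta(cells, theta, x):
    """Analytic (1/L) dL/dtheta_x = n/theta_x - (1/theta_x) * sum+ (1-f)/f."""
    n = 0            # negatively reacting cells that carry x
    s = 0.0          # sum of (1 - f)/f over positively reacting cells with x
    for antigens, positive in cells:
        if x not in antigens:
            continue
        if positive:
            f = 1.0 - prod(theta[a] for a in antigens)
            s += (1.0 - f) / f
        else:
            n += 1
    return (n - s) / theta[x]

# Hypothetical data and parameter values.
cells = [
    (("x", "y"), True),
    (("x",), False),
    (("x", "z"), True),
    (("y", "z"), False),
    (("x", "y", "z"), False),
]
theta = {"x": 0.6, "y": 0.7, "z": 0.8}

h = 1e-6
up = dict(theta, x=theta["x"] + h)
down = dict(theta, x=theta["x"] - h)
numeric = (log_lik(cells, up) - log_lik(cells, down)) / (2 * h)
analytic = dlogl_dtheta(cells, theta, "x")
print(analytic, numeric)   # the two values should agree closely
```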

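Differentiating a second time:

$$\frac{1}{L}\,L_{xx} - \frac{1}{L^2}\,(L_x)^2 = -\frac{n}{\theta_x^2} - \Sigma_x^+ \frac{p_x^2}{(1 - \theta_x p_x)^2}$$

so that, at a stationary point (where $L_x = 0$):

$$\frac{1}{L}\,L_{xx} = -\left(\frac{1}{\theta_x}\right)^2 \left\{ n + \Sigma_x^+ \frac{(1-f)^2}{f^2} \right\}$$

$$= -\left(\frac{1}{\theta_x}\right)^2 \left\{ \Sigma_x^+ \frac{1-f}{f} + \Sigma_x^+ \frac{(1-f)^2}{f^2} \right\}$$

$$= -\left(\frac{1}{\theta_x}\right)^2 \Sigma_x^+ \left\{ \frac{(1-f)f + (1-f)^2}{f^2} \right\}$$

$$= -\left(\frac{1}{\theta_x}\right)^2 \Sigma_x^+ \left\{ \frac{f - f^2 + 1 - 2f + f^2}{f^2} \right\}$$

$$= -\left(\frac{1}{\theta_x}\right)^2 \left\{ \Sigma_x^+ \frac{1-f}{f^2} \right\}$$

Hence, all second differentials are negative. This implies that all stationary points on the surface are either maxima or saddle shapes (Spiegel, 1974). Similarly:

$$\frac{1}{L}\,L_{yy} = -\left(\frac{1}{\theta_y}\right)^2 \left\{ \Sigma_y^+ \frac{1-f}{f^2} \right\}$$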

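Now expressing the above differential for $\theta_x$, but emphasizing the positive cells which express antigen y as well as x (with $p_{xy}$ denoting the product of the $\theta$’s of the antigens other than x and y on the cell):

$$\frac{1}{L}\,L_x = \frac{n}{\theta_x} - \Sigma_{xy}^+ \frac{\theta_y p_{xy}}{1 - \theta_x \theta_y p_{xy}} + \ldots$$

and differentiating with respect to $\theta_y$:

$$\frac{1}{L}\,L_{xy} = -\Sigma_{xy}^+ \left\{ \frac{p_{xy}}{f} + \frac{\theta_x \theta_y p_{xy}^2}{f^2} \right\}$$

$$= -\frac{1}{\theta_x \theta_y}\,\Sigma_{xy}^+ \left\{ \frac{1-f}{f} + \frac{(1-f)^2}{f^2} \right\}$$

$$= -\frac{1}{\theta_x \theta_y}\,\Sigma_{xy}^+ \left\{ \frac{f - f^2 + 1 - 2f + f^2}{f^2} \right\}$$

$$= -\frac{1}{\theta_x \theta_y}\,\Sigma_{xy}^+ \left\{ \frac{1-f}{f^2} \right\}$$

It follows that:

$$L_{xx}\,L_{yy} - L_{xy}^2 > 0$$

where x and y do not have the same list of positively reacting cells, and it is zero when the two lists are identical. Hence there are no saddle surfaces on the likelihood surface (Spiegel, 1974). Therefore all stationary points on the likelihood surface are maxima. It follows that there can only be one such point.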

Consider now a likelihood function of 3 parameters. The change in the likelihood produced by small increments in each parameter away from the turning point is:

$$\delta L = L(\theta_x + \delta\theta_x,\ \theta_y + \delta\theta_y,\ \theta_z + \delta\theta_z) - L(\theta_x, \theta_y, \theta_z)$$

This is readily generalizable to any number of parameters, the result of which can be written in terms of individual cells:

$$-\frac{2\,\delta L}{L} = \Sigma_{\mathrm{cells}} \left\{ \frac{1-f}{f^2} \right\} \Sigma_I \left\{ \left( \frac{\delta\theta_I}{\theta_I} \right)^2 + \Sigma_J \frac{\delta\theta_I}{\theta_I}\,\frac{\delta\theta_J}{\theta_J} \right\}$$

where,

$\Sigma_{\mathrm{cells}}$ implies a summation over all positively reacting cells (under this view of the likelihood surface, the contribution of negatively reacting cells has been effectively transferred to the positive cells).

$\Sigma_I$ implies a summation over all antigens I which are present on the cell which is currently being considered.

$\Sigma_J$ implies a summation over all antigens other than the current antigen I.

This can be transformed to:

$$-\frac{2\,\delta L}{L} = \Sigma_{\mathrm{cells}} \left\{ \frac{1-f}{f^2} \right\} \left( \Sigma_I \frac{\delta\theta_I}{\theta_I} \right)^2$$

The expression within the summation for all cells is positive or zero for all possible small values of the increments $\delta\theta$, and hence $\delta L$ is always negative. This constitutes a proof that the turning point is a maximum and not a saddle point.

This likelihood function is extremely well behaved. In whatever direction it is approached, it simply rises to a maximum and then falls. Indeed, it is virtually impossible to construct a situation in which the likelihood was better suited to a numerical estimation of its maximum.

The computer program uses equation 1 above as a fast approach to the maximum. This first routine finds values for $\theta$ such that:

$$L_x(\theta_x) > 0 \quad \text{and} \quad L_x(\theta_x + \delta\theta_x) < 0$$

This produces a reasonable estimate of the maximum likelihood. However, when there is a high correlation in the presence of two antigens, for example, due to linkage disequilibrium, the estimates of $\theta$ can be very inaccurate. Because of this, a second slower algorithm is used to identify a point such that:

$$L(\theta_x - \delta) < L(\theta_x) > L(\theta_x + \delta)$$

For all antigens the turning point in $\theta_x$ is found to an accuracy $\pm\delta$, which is currently set at 0.002. It will rarely be the case that the data justify greater accuracy than this. Again, if there is considerable correlation between two antigens the error in the identification of the maximum could be larger than $\delta$. This is not a major problem in this context due to the fact that support limits are calculated for each and every antigen.
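The stopping condition of the second, slower algorithm can be illustrated with a simple coordinate search. The Python sketch below is an assumption-laden illustration and not the authors' routine: it steps each $\theta$ by $\pm\delta$ while the likelihood improves and stops when no single step of size $\delta$ on any antigen raises the likelihood. The function names and the small data set are hypothetical.

```python
from math import log, prod

def log_lik(cells, theta):
    """Log-likelihood of the reactions; cells = [(antigens, positive), ...]."""
    total = 0.0
    for antigens, positive in cells:
        p_neg = prod(theta[a] for a in antigens)
        p = 1.0 - p_neg if positive else p_neg
        if p <= 0.0:
            return float("-inf")
        total += log(p)
    return total

def climb(cells, start, delta=0.002, max_sweeps=1000):
    """Step each theta by +/- delta until no single step improves log L."""
    theta = dict(start)
    best = log_lik(cells, theta)
    for _ in range(max_sweeps):
        moved = False
        for a in theta:
            for step in (delta, -delta):
                while True:
                    trial = min(1.0, theta[a] + step)    # theta stays in (0, 1]
                    if trial <= 0.0 or trial == theta[a]:
                        break
                    old, theta[a] = theta[a], trial
                    ll = log_lik(cells, theta)
                    if ll > best:
                        best, moved = ll, True
                    else:
                        theta[a] = old                   # undo the failed step
                        break
        if not moved:                                    # local test satisfied
            break
    return theta, best

# Hypothetical data: x is on 3 positive cells and 1 negative cell,
# y only on negative cells.
cells = [(("x",), True)] * 3 + [(("x",), False)] + [(("y",), False)] * 2
theta_hat, ll_max = climb(cells, {"x": 0.5, "y": 0.5})
print(theta_hat, ll_max)
```

For this data set the search should settle near $\theta_x = 0.25$ (one negative cell in four) and $\theta_y = 1$, to within the step size.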

The support limit of a $\theta_x$ is defined as the values a for which,

$$\max\{L(\theta \mid \theta_x = a)\} = e^{-2}\,\max\{L(\theta)\}$$

where the operator max{} is the procedure of likelihood maximization for all values of $\theta$, subject on the left hand side to the condition that $\theta_x$ is held fixed. This point is calculated to an accuracy of 0.05 for those antigens which have an estimated $\theta = 1$ (i.e. negative antigens) and 0.002 for the others (positive antigens). Such a limit is the likelihood equivalent of a confidence interval (Edwards, 1984).
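A deliberately simplified sketch of the support limit follows, in Python. It treats a single antigen carried alone on its cells, so that no re-maximization over the other $\theta$’s is needed; the full procedure described above would repeat the maximization at every fixed value a. The function names, grid step and counts are illustrative assumptions.

```python
from math import log

def log_l(theta, n_neg, n_pos):
    """Single-antigen log-likelihood: n_neg negative and n_pos positive cells."""
    return n_neg * log(theta) + n_pos * log(1.0 - theta)

def support_limits(n_neg, n_pos, step=0.002, drop=2.0):
    """Range of theta whose log-likelihood lies within `drop` of the maximum."""
    grid = [k * step for k in range(1, round(1 / step))]   # open interval (0, 1)
    values = [log_l(t, n_neg, n_pos) for t in grid]
    cutoff = max(values) - drop                            # ln(L_max / e^2)
    inside = [t for t, v in zip(grid, values) if v >= cutoff]
    return min(inside), max(inside)

# Hypothetical antigen present on 4 cells: 1 negative and 3 positive reactions.
lo, hi = support_limits(n_neg=1, n_pos=3)
print(f"theta support interval: {lo:.3f} to {hi:.3f}")
print(f"phi support interval:   {1 - hi:.3f} to {1 - lo:.3f}")
```

With so few cells the interval spans most of the unit range, which mirrors the wide support limits reported for antigens present on only a handful of cells.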

