Cladistics (1995) 11:57-72

THE BEHAVIOUR OF GOLOBOFF'S TREE FITNESS MEASURE F

Hubert Turner¹ and Rino Zandee²

¹Rijksherbarium/Hortus Botanicus, Leiden University, PO Box 9514, 2300 RA, Leiden, The Netherlands, and ²Theoretical Biology Section, Institute of Evolutionary and Ecological Sciences, Leiden University, The Netherlands

Received for publication 23 November 1994; accepted 18 June 1995

Abstract -- Goloboff recently introduced a method of character weighting that can be performed concomitantly with tree reconstruction. The basis for this method is his tree fitness measure F. The behaviour of F is examined for a number of hypothetical and real data sets. It depends strongly on the value of the concavity constant k, and does not seem to be predictable. This makes it difficult to make general recommendations about the appropriate value of k in specific cases. The basis for F, the number of extra steps taken by a character on a tree, does remain valuable as a basis for quality measures of trees, because it is independent of the number of states in the character, unlike the total number of steps and measures based on it such as CI and RC. Although no new measure is developed here, a number of requirements are formulated for an ideal tree quality measure.

© 1995 The Willi Hennig Society

Introduction

Recently, Goloboff (1993a) proposed a new scheme for weighting a set of characters. His main concern about previously proposed schemes (e.g. successive weighting: Farris, 1969) was that there are no unambiguous criteria for the weighting procedure and that the resulting trees are not always self-consistent. Thus, the result of successive weighting is dependent on the initial weighting. Self-consistency is the property that a tree implies, under some weighting scheme, weights for the characters that will lead to the same tree when re-analysed. Another way of expressing this is that in the case of character conflict (homoplasy), characters that show the lowest number of homoplasies are favoured over others that have more homoplasy on the tree under consideration. In other words, the tree itself tells us how much confidence to place in each character. Trees in the set resulting from successive weighting are not necessarily self-consistent because the weight of each character is implied by the resulting set of trees, rather than by each tree individually.

Goloboff (1993a) developed a weighting method which weights each character according to the number of extra (homoplasious) steps it takes on a tree. Because the weight of a character depends solely on the number of extra steps it takes on the tree under consideration, it is possible to evaluate each tree without reference to other trees, just as the total number of steps on a tree is independent of the number of steps on other trees. Goloboff's weighting scheme therefore allows character weighting to proceed simultaneously with tree reconstruction. This has the advantage of requiring only a single pass through the data in order to come up with the "best" trees. Whether these trees are actually self-consistent is a point not addressed by him.

Present address: Institute for Systematics and Population Biology, University of Amsterdam, P.O. Box 94766, 1090 GT Amsterdam, The Netherlands.

Goloboff's formula for the weight of a character (or fit) is:

$$f_i = \frac{k+1}{s_i - m_i + k + 1} \qquad (1)$$

where $s_i$ is the actual number of steps observed for character i on a particular tree, $m_i$ is the minimum number of steps possible for character i (i.e. the number of states minus 1), and k is a constant of concavity added in order to influence how severely homoplasious characters are down-weighted. The total fitness F for a tree equals $\sum_i f_i$. Goloboff notes that "[t]he degree of concavity that should be preferred ... remains to be investigated". It is our intention to investigate here some properties of f and of F, and thus attempt to answer Goloboff's question.
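As a minimal sketch, Equation (1) and the sum F can be written in a few lines of Python. The function names and the example step counts below are illustrative assumptions, not part of the original analysis; exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def character_fit(steps, min_steps, k):
    """Fit f_i of one character, following Equation (1)."""
    extra = steps - min_steps  # extra (homoplasious) steps
    if extra < 0:
        raise ValueError("observed steps cannot be below the minimum")
    return Fraction(k + 1, extra + k + 1)


def tree_fitness(steps, min_steps, k):
    """Total fitness F of a tree: the sum of f_i over all characters."""
    return sum(character_fit(s, m, k) for s, m in zip(steps, min_steps))


# Hypothetical tree: five binary characters (m_i = 1), one of them
# taking two extra steps.
steps = [1, 1, 3, 1, 1]
min_steps = [1, 1, 1, 1, 1]
for k in (0, 1, 3):
    print(k, float(tree_fitness(steps, min_steps, k)))
```

A character without extra steps always contributes exactly 1, whatever the value of k, so only the homoplasious characters make F depend on k.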

Behaviour of f and F

From Equation (1) it can readily be seen that f depends on the number of extra steps (ES), $s_i - m_i$. For ES=0 (i.e. for a perfectly fitting character), f is maximal and equal to 1. As the number of steps increases, f decreases. The weighting function is concave, more precisely a hyperbola. It reaches its lower limit (but >0) when ES is maximal. This will always be the case on a completely unresolved tree, but may also occur on (partly) resolved trees. For increasing values of the concavity constant k, the steepness of the hyperbola decreases. For values of k approaching infinity, f approaches 1; in other words, all characters are weighted equally, independent of the value of ES. This behaviour of f is very straightforward and needs no further comment.
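The shape of f described above can be tabulated directly. The following sketch, with an illustrative helper name `fit` and arbitrarily chosen values of k, prints f for ES = 0 to 6 and checks that each additional step lowers f by less than the previous one.

```python
def fit(extra_steps, k):
    """f as a function of the number of extra steps ES, from Equation (1)."""
    return (k + 1) / (extra_steps + k + 1)


for k in (0, 1, 3, 9, 999):
    row = [round(fit(es, k), 3) for es in range(7)]
    print(f"k={k:>3}: {row}")

# The penalty per additional step shrinks as ES grows (shown here for k = 0).
drops = [fit(es, 0) - fit(es + 1, 0) for es in range(6)]
assert all(a > b for a, b in zip(drops, drops[1:]))
```

With k=999 every value in the row is close to 1, which is the equal-weights limit mentioned in the text.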

The behaviour of F is less easily predicted. Because F=Σf, and f for each character depends on the topology of the tree, there is no straightforward relationship between F and, e.g. the total number of extra steps. The only thing that can be concluded directly is that, as k→∞, F becomes equal to the number of characters in the data matrix, and thus the lengths of trees under implied weights become equal to their lengths under unweighted parsimony analysis. Thus, for k=∞, selecting trees with the highest fit becomes equal to selecting the most parsimonious trees under equal weights (MPTs). However, for low values of k this is not necessarily the case.
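One way to see the limit numerically: since (k+1)(1 - f) = (k+1)·ES/(ES+k+1), the shortfall of F from the number of characters n, multiplied by (k+1), tends to ΣES as k grows. The sketch below assumes a hypothetical profile of extra steps; `scaled_deficit` is an illustrative helper, not a quantity defined in the paper.

```python
def tree_fitness(extra_steps, k):
    """F for one tree, given the extra steps ES_i of each character."""
    return sum((k + 1) / (es + k + 1) for es in extra_steps)


def scaled_deficit(extra_steps, k):
    """(k + 1) * (n - F): the shortfall of F from its maximum n, rescaled."""
    n_chars = len(extra_steps)
    return (k + 1) * (n_chars - tree_fitness(extra_steps, k))


profile = [2, 0, 1, 3]  # hypothetical ES_i values, total ES = 6
for k in (0, 10, 100, 10_000):
    print(k, round(tree_fitness(profile, k), 4), round(scaled_deficit(profile, k), 4))
```

F itself approaches n = 4 for every tree, while the rescaled shortfall approaches the total number of extra steps, so for very large k ranking trees by F amounts to ranking them by ΣES.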

We can think of no biological reason for preferring one particular value of k over any other. A reasonable initial guess would seem to be k=0, or k=∞. The latter is equal to not weighting at all, as shown above, and k=0 may be too strong a weighter, according to Goloboff (1993a). We chose to investigate a number of different hypothetical and real data sets in order to elucidate the behaviour of F for different values of k. Because the computer program Pee-Wee, in which Goloboff (1993b) implemented F, only allows values of k up to 5 to be used, we calculated F values for different values of k and different trees using a spreadsheet program and an APL programming environment. Using hypothetical data sets for limited numbers of taxa allowed us to investigate all possible trees for these data sets. The trees submitted for the hypothetical data sets were unrooted, because the number of character state changes, and thus f, depends only on the topology of the unrooted tree (network).

Data matrix All7:

    a  0 0 0 0 0
    b  0 0 0 0 1
    c  0 0 0 1 1
    d  0 0 1 1 1
    e  0 1 1 1 1
    f  1 1 1 1 1
    g  1 1 1 1 1

[Figure: most parsimonious tree for taxa a-g; plot of fitness F against extra steps (0-6), with curves for different values of k.]

Fig. 1. Data matrix All7, most parsimonious tree, and diagram of maximum and minimum fitness values vs. ΣES for all possible different tree topologies at different values of k.

For a data set in which all characters are congruent, here exemplified for seven taxa (matrix All7, Fig. 1), the behaviour of F is reasonably regular. The tree that fits the data perfectly (no homoplasy, i.e. no extra steps) receives the highest F value of 5 (footnote 1). This value is equal to the number of characters in the matrix, and is independent of the value of k. Up to trees with three extra steps the behaviour of F is regular, decreasing linearly with the number of extra steps. For trees with four extra steps this regularity breaks down: two different F values are observed. Trees with ΣES=5 or 6 again all have the same F value. This behaviour is consistent for all values of k.

Footnote 1: Note that, in Pee-Wee, F values are multiplied by 10 and are corrected for autapomorphies, thus resulting in a value of 40 for matrix All7. In addition, the concavity index in Pee-Wee, set with the command CONC, equals k+1.

Data matrix All7_2:

    a  1 0 0 0 0
    b  0 0 0 0 1
    c  0 0 0 1 1
    d  0 0 1 1 1
    e  0 1 1 1 1
    f  1 1 1 1 1
    g  1 1 1 1 1

[Figure: most parsimonious tree for taxa a-g; plot of fitness F against extra steps, with curves for different values of k.]

Fig. 2. Data matrix All7_2, most parsimonious tree, and diagram of maximum and minimum fitness values vs. ΣES for all possible different tree topologies at different values of k.

Data matrix All7h11:

    a  1 1 0 0 0 1 0 0 0 0
    b  1 0 0 0 1 1 0 0 0 0
    c  0 0 0 1 1 1 0 0 1 1
    d  0 0 1 1 1 0 0 0 1 1
    e  0 1 1 1 0 0 1 1 1 1
    f  1 1 1 0 0 0 0 1 1 0
    g  1 1 1 0 0 0 1 0 0 1

[Figure: most parsimonious trees for taxa a-g; plot of fitness F against extra steps (9-20), with curves for different values of k.]

Fig. 3. Data matrix All7h11, most parsimonious trees, and diagram of maximum and minimum fitness values vs. ΣES for all possible different tree topologies at different values of k.
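The step counts behind these F values can be reproduced with a small parsimony routine. The sketch below uses Fitch optimization for unordered characters on fully bifurcating trees, which is an assumption made here for illustration (the text only says that a spreadsheet and APL were used). The matrix is All7 from Fig. 1; the tree `swapped` is a hypothetical alternative topology in which taxa b and c have exchanged places.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one unordered character.

    tree   : nested 2-tuples of taxon names (a rooted view of the unrooted tree)
    states : dict mapping taxon name -> state
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        (left_set, left_steps), (right_set, right_steps) = visit(node[0]), visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1

    return visit(tree)[1]


ALL7 = {  # matrix All7 (Fig. 1)
    "a": "00000", "b": "00001", "c": "00011", "d": "00111",
    "e": "01111", "f": "11111", "g": "11111",
}


def extra_steps(tree, matrix):
    """ES_i for every character: observed steps minus (number of states - 1)."""
    n_chars = len(next(iter(matrix.values())))
    result = []
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        min_steps = len(set(column.values())) - 1
        result.append(fitch_steps(tree, column) - min_steps)
    return result


def tree_fitness(es_profile, k):
    return sum((k + 1) / (es + k + 1) for es in es_profile)


perfect = ("a", ("b", ("c", ("d", ("e", ("f", "g"))))))
swapped = ("a", ("c", ("b", ("d", ("e", ("f", "g"))))))  # hypothetical alternative

for name, tree in (("perfect", perfect), ("swapped", swapped)):
    es = extra_steps(tree, ALL7)
    print(name, es, [round(tree_fitness(es, k), 3) for k in (0, 1, 3)])
```

The perfectly fitting tree scores F=5 for every k, while the alternative tree has one extra step in a single character and therefore a k-dependent F below 5.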

Introducing a single homoplasy (matrix All7_2, Fig. 2) increases the range of ΣES values for which F shows hysteresis. Equally interesting, for ΣES=4 some trees have a lower value of F than some trees for which ΣES=5, at least for k=0. For k=1, the best F value for ΣES=5 becomes equal to the worst F value for ΣES=4. For k=2 and higher, all trees with ΣES=4 have better F values than any tree with ΣES=5.

For matrix All7h11 (Fig. 3), for which the two MPTs have ΣES=9, similar behaviour is observed, but here some trees for which ΣES=10 have a better F than any MPT, at least for k=0. For k=1 the MPTs have F values equal to those for some trees with ΣES=10, while for values of k>1 the trees with the highest F belong to the set of MPTs.
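Why a longer tree can be fitter at low k, and stop being so at higher k, depends on how the extra steps are spread over the characters. The sketch below uses two hypothetical extra-step profiles (they are not taken from the matrices above) and searches for the smallest integer k at which the tree with fewer extra steps becomes strictly fitter; `smallest_k_favouring` is an illustrative helper.

```python
from fractions import Fraction


def tree_fitness(extra_steps, k):
    return sum(Fraction(k + 1, es + k + 1) for es in extra_steps)


def smallest_k_favouring(shorter, longer, k_max=1000):
    """Smallest integer k at which `shorter` is strictly fitter than `longer`."""
    for k in range(k_max + 1):
        if tree_fitness(shorter, k) > tree_fitness(longer, k):
            return k
    return None


# Hypothetical extra-step profiles over five characters
spread = [1, 1, 1, 1, 0]        # total ES = 4, homoplasy in four characters
concentrated = [5, 0, 0, 0, 0]  # total ES = 5, homoplasy in one character

print(smallest_k_favouring(spread, concentrated))  # prints 15
```

For these two profiles the fitness values are exactly equal at k=14, so the tree with more extra steps is fitter for k below 14 and the shorter tree is fitter from k=15 onwards. The crossover value depends entirely on the profiles chosen.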

For the real data matrices Fordia (Schot, 1991; Table 1) and Arytera (Turner, 1995; Table 2) similar behaviour is observed. Fordia represents a data matrix for which there are 111 MPTs of length 165 (CI=0.47, RI=0.63) with few unknown data and a reasonably resolved consensus tree. Arytera is messier in that there are many unknown entries, but the number of MPTs is much smaller: 17 (length 336, CI=0.30, RI=0.59). Due to the number of taxa for these matrices (n=19 and n=33, respectively) no full evaluation of all tree topologies was possible. Instead, Pee-Wee was used to find all fittest tree topologies for k values up to 5, using the option mult*50. In addition, the F values for the set of MPTs and the best-fitting trees from Pee-Wee were evaluated for k values up to 49. The results are presented in Tables 3 and 4.

As can be seen in Table 3, for Fordia, among the set of MPTs, k values up to 19 show the same five trees to be fittest; above k=19, however, two different trees are fittest (at least up to k=10 000). The fittest trees at k=0 have lengths of 172 and 173 on the unweighted data matrix (ΣES=93 and 94, respectively); at k=1 and k=2 the same tree of length 167 is fittest; at k=3 and k=4 the fittest tree has length 166; and at k=5, in addition to two MPTs, the same tree as for k=3 or 4 is fittest. However, probably due to shortcuts taken during fitness calculation, the results reported by Pee-Wee are inexact. As can be seen in Table 3, the two trees reported as fittest at k=0 actually differ in fitness by 0.008 (i.e. within the margin reported by Pee-Wee); additionally, the two MPTs reported as fittest at k=5 actually rank as second and third, respectively, nor do they become the fittest MPTs at any value above k=5. We can thus not guarantee that trees obtained by Pee-Wee (and reported here) as fittest for k=0-5 are indeed the best-fitting ones.

For Arytera (Table 4) at k=0, there are three fittest trees of length 357 (ΣES=256); at k=1-4 the same tree is selected each time, of length 347; and at k=5 again three trees, this time of length 338, are fittest. None of these trees is in the set of MPTs, however. The fittest MPTs are the same trees for all values of k (at least up to 10 000). MPTs only become fitter than the fittest trees at k=0-5 for values of k>11 (Table 4). In addition to these trees, 978 non-MPTs with lengths up to 341 were generated using PAUP version 3.1.1 (Swofford, 1993). The minimum and maximum F values were calculated for these trees with k values up to 49. The results (Fig. 4) show that the MPTs are fittest among this set of trees only for values of k>4.

Table 1
Data matrix Fordia (Schot, 1991). Characters 13 and 14 ordered, all others unordered.

    Species              Character number
                         0000000000 1111111111 2222222222 3333333333 44444
                         0123456789 0123456789 0123456789 0123456789 01234
    Millettia pulchra    1111311231 5451112111 1111321211 1111411114 12111
    Fordia albiflora     1111123122 5321114322 121232223? 1233422111 11133
    F. brachybotrys      1212123221 3432113221 3121121111 2124212113 12122
    F. bracteolata       1111123112 4321111112 1212322237 1233422111 21133
    F. cauliflora        1221412221 2433212321 3121111112 2112411113 12133
    F. incredibilis      1222223222 4323222322 2211211231 1213422112 11133
    F. johorensis        1121221132 1113214321 3121121132 2131212111 11233
    F. lanceolata        1221213111 1332212121 3121121112 2133111111 11211
    F. leptobotrys       1121223112 5321214322 221122123? 1233112111 11122
    F. ngii              1111123122 5322212122 2211221131 2232212111 13133
    F. nivea             1121223122 5322212122 2211211131 1232212111 13133
    F. ophirensis        1221321231 2321212221 3121171177 2132311113 13133
    F. pauciflora        1221211232 4223312121 3121121122 2133121122 11133
    F. rheophytica       1112223221 3532212121 3121121111 2112411214 12111
    F. splendidissima    1212222221 3642212121 3121121111 2124411213 12112
    F. stipularis        1222223211 3442212121 3121171177 2112411113 12113
    F. unifoliata        2111123112 1111214122 121231223? 1232422112 11133
    F. spec. a           1221223222 4323224422 2211121131 1212422112 11133
    F. spec. b           2111114231 3533212121 3121121111 2134211113 12111

[Table 2: data matrix Arytera (Turner, 1995). Table 3: fitness values for Fordia. Table 4: fitness values for Arytera. The contents of these tables are not recoverable from the extracted text.]

[Figure: plot of fitness F against extra steps (235-240), with curves for different values of k.]

Fig. 4. Minimum and maximum F values for the 17 MPTs and 978 additional trees up to five steps longer, at different values of k.
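The minimum and maximum F values per tree length, as plotted in the figures, amount to grouping trees by ΣES and taking the extremes of F within each group. The sketch below shows that bookkeeping for a handful of hypothetical extra-step profiles; the real analyses would supply one profile per tree, and `fitness_envelope` is an illustrative name.

```python
from collections import defaultdict


def tree_fitness(extra_steps, k):
    return sum((k + 1) / (es + k + 1) for es in extra_steps)


def fitness_envelope(profiles, k):
    """Map total ES -> (minimum F, maximum F) over all trees with that total."""
    groups = defaultdict(list)
    for profile in profiles:
        groups[sum(profile)].append(tree_fitness(profile, k))
    return {total: (min(fs), max(fs)) for total, fs in sorted(groups.items())}


# Hypothetical extra-step profiles, one list per tree
profiles = [
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0], [2, 1, 1, 0, 0], [4, 0, 0, 0, 0],  # total ES = 4
    [5, 0, 0, 0, 0], [2, 1, 1, 1, 0],                   # total ES = 5
]

for k in (0, 1, 9):
    print(k, {total: (round(lo, 3), round(hi, 3))
              for total, (lo, hi) in fitness_envelope(profiles, k).items()})
```

Whenever the maximum for one value of ΣES exceeds the minimum for a smaller value, some longer tree is fitter than some shorter one at that k, which is the overlap discussed in the text.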

Discussion and Conclusions

The behaviour described here seems to indicate that there is a minimal value of k above which all fittest trees are MPTs. We offer no proof for this conjecture but in our experiments we have not met a single counterexample. For lower values of k less parsimonious (under equal weighting) trees may have a better fitness. The conflict that arises when faced with a choice between a set of MPTs for a particular unweighted data set and a different set of trees that is fittest according to their F value seems to disappear when the value of k is chosen to be sufficiently high. The borderline case appears to be related to the maximum $ES_i$ value over the set of MPTs, $ES^{MPT}_{i,max}$. For matrix All7_2, the maximum $ES^{MPT}_{i,max}$ value is 1, for matrix All7h11 it is 2. For values of k below this border value the fittest trees are not necessarily part of the set of MPTs, but may have a higher ΣES value; for such values of k, which particular set of trees is fittest may vary with the value of k. For matrices Fordia and Arytera the $ES^{MPT}_{i,max}$ equals 7 and 11, respectively. For these matrices also, $ES^{MPT}_{i,max}$ functions as a borderline value for k, below which not all fittest trees are in the set of MPTs.

Within the set of MPTs, which trees are fittest may also depend on the value of k (e.g. for Fordia). Getting rid of a step in a very homoplasious character at the expense of acquiring one in a character with good fit will always result in a decrease in F, because the decrease in $f_i$ for the better character is larger than the corresponding gain for the worse one, regardless of the value of k. However, gaining extra steps in a good character while losing one each in several slightly more homoplasious ones does not necessarily result in a decrease of F. Only as k→∞ will the difference in F values between equally parsimonious trees become infinitely small or, in other words, will all MPTs become equally fit. This behaviour makes it necessary to investigate whether the MPTs that are selected as fittest remain the same above a certain value of k. For the Fordia matrix, at k>19, different MPTs are chosen than at lower values of k. This shows that neither $ES^{MPT}_{i,max}$ nor the absolute maximum for ES over the data matrix, i.e. its value on a completely unresolved polytomy for the taxa, $ES_{i,max}$ (10 for Fordia), is the borderline value. Because Pee-Wee allows only values of k up to 5 to be evaluated, we could not check whether such a borderline value of k exists. Such an investigation will have to wait till this constraint is removed from the program.
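Both trade-offs described above can be checked numerically for equally long trees. The sketch below uses small hypothetical extra-step profiles with the same ΣES before and after the exchange; the profiles and the values of k are illustrative choices only.

```python
from fractions import Fraction


def tree_fitness(extra_steps, k):
    return sum(Fraction(k + 1, es + k + 1) for es in extra_steps)


# Case 1: one step moves from a very homoplasious character (ES 6 -> 5)
# to a perfectly fitting one (ES 0 -> 1). F drops for every k tried.
before, after = [0, 6], [1, 5]
for k in (0, 1, 5, 50):
    assert tree_fitness(after, k) < tree_fitness(before, k)

# Case 2: a fairly good character gains two steps (ES 2 -> 4) while two
# slightly more homoplasious characters lose one step each (ES 3 -> 2).
before, after = [2, 3, 3], [4, 2, 2]
for k in (0, 1, 5, 50):
    print(k, float(tree_fitness(before, k)), float(tree_fitness(after, k)))
```

In the second case F is higher after the exchange for each k shown, so two equally parsimonious trees can differ in fitness in the direction that favours concentrating homoplasy in fewer characters.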

Goloboff's concern that MPTs may not be self-consistent is unfounded if self-consistency can be equated with maximum fitness according to his formula for F. At sufficiently high values of k at least some MPTs are always in the set of fittest trees. Moreover, character weighting presupposes that some characters are phylogenetically more informative than others. Even taking this assumption for granted, it does not follow necessarily that more homoplasious characters are per se less reliable as indicators of phylogeny. Rather, in our view, homoplasy is in the first place the result of incorrect assumptions of homology and should be treated as such by re-assessing these assumptions in the light of the initial analyses (cf. e.g. Hennig, 1966; Bryant, 1989). If the possibilities for re-assessment of homology statements (i.e. hypotheses of synapomorphies) have been exhausted, the remaining homoplasy may be indicative of the unreliability of the affected characters as markers of phylogeny, but still not necessarily so. In itself, the re-assessment of homology assumptions constitutes reweighting of the characters, but based on biological grounds (observations made on the specimens), until all characters are equally reliable as markers of phylogeny. The only remaining basis for weighting characters then becomes a parsimony argument, namely in order to select those MPTs in which the maximum number of characters are congruent. This can be done by adjusting k so that a subset of the MPTs is selected, or by applying other measures based on the same line of reasoning (e.g. OCCI [Rodrigo, 1992] or average RI [Turner, 1995]). We can see no foundation for preferring any particular k value that does not result in a subset of the set of MPTs. The same criticism (that weighting may lead to sets of trees not in the set of MPTs) can be expressed towards successive weighting (cf. Platnick et al., 1991).

Above a certain threshold value of k the subset of fittest MPTs seems to remain stable. This raises the possibility that this k value can be used as a basis for an index of the quality of the data set as a whole. Such an index would have at least the desirable properties that, unlike CI and RC, it is independent of the number of taxa in the matrix and independent of variation in the minimum number of steps due to characters with different numbers of states, because neither influences the number of extra steps. Data sets with different numbers of characters are not directly comparable, however. As the number of characters increases, so will F. Possibly, the fitness index, $FI_k$

$$FI_k = \frac{F - F_{min}}{F_{max} - F_{min}} \qquad (2)$$

(where $F_{max}$ is the F value for a data matrix of equal dimensions but without homoplasy and thus equal to the number of characters in the data matrix, and $F_{min}$ is the F value for a completely unresolved tree on the matrix under consideration) will be comparable across different data sets, at least for fixed values of k. As k→∞, FI→1 for any data matrix, so k should be set as low as possible in order to obtain maximum resolution. When comparing two data sets, the k value above which the set of selected MPTs remains stable for both matrices might be chosen as the k value at which to calculate FI.
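Equation (2) can be sketched as follows for matrix All7. Two assumptions are made here for illustration: the extra steps on the completely unresolved tree are computed by charging one change for every taxon that does not carry the commonest state of a character, and the tree being scored is a hypothetical one with a single extra step in one character.

```python
ALL7 = {  # matrix All7 (Fig. 1)
    "a": "00000", "b": "00001", "c": "00011", "d": "00111",
    "e": "01111", "f": "11111", "g": "11111",
}


def tree_fitness(extra_steps, k):
    return sum((k + 1) / (es + k + 1) for es in extra_steps)


def star_extra_steps(matrix):
    """ES_i of each character on a completely unresolved (star) tree."""
    rows = list(matrix.values())
    profile = []
    for i in range(len(rows[0])):
        column = [row[i] for row in rows]
        commonest = max(column.count(state) for state in set(column))
        steps = len(column) - commonest  # one change per minority taxon
        profile.append(steps - (len(set(column)) - 1))
    return profile


def fitness_index(es_tree, es_unresolved, k):
    """FI_k from Equation (2); undefined when F_max equals F_min."""
    f_max = len(es_tree)  # no homoplasy: every f_i equals 1
    f_min = tree_fitness(es_unresolved, k)
    return (tree_fitness(es_tree, k) - f_min) / (f_max - f_min)


unresolved = star_extra_steps(ALL7)
print(unresolved)
for k in (0, 1, 5):
    print(k, round(fitness_index([0, 0, 0, 1, 0], unresolved, k), 4))
```

The index is 1 for a tree without homoplasy and 0 for the unresolved tree; values in between depend on k through all three F terms.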

We have shown above that F is not well-behaved in that (1) different values of k may result in different sets of fittest trees, and (2) even within the set of MPTs, the fittest tree may depend on the value of k. This behaviour is not completely unexpected because different weighting schemes (i.e. different values of k) are expected to give different results. Our (and Goloboff's) initial question as to the appropriate value of k could not be answered. There seems to be no foundation for any particular choice. Therefore, F seems inappropriate as a tool for weighting characters. At best, it may serve as a secondary tool with which to select a subset of the set of MPTs for the unweighted data set. The value of k above which the selected subset remains stable may serve as an indication of the quality of the data set as a whole. In addition, F may form an appropriate basis for a general quality measure which can be used to compare different data sets.

Nevertheless, the concept of counting the number of extra steps remains valuable in that it is independent of the number of states per character, unlike CI and RC. In addition, the number of extra steps a character takes on a tree depends solely on the topology of the tree in question, and can be calculated for any optimization scheme. These are valuable properties which make $ES_i$ a good basis for a quality measure of trees because it allows trees to be selected or discarded independently of other trees and therefore requires only a single pass through the data, thus retaining the advantages of implied weighting. An ideal function for implied weighting should also be independent of any buffering constant, unlike F which is dependent on k. Other requirements can be formulated that should be met by such a weight function. These are the following:

(1) The measure must differentiate between trees of different length, preferring the most parsimonious trees. The weight function should have a combination of $ES_i$ and the total number of (extra) steps for all characters on the tree in the denominator, i.e. it should weight against longer trees.

(2) It should differentiate between trees of the same length, both within the set of MPTs and within sets of trees with a fixed larger number of extra steps. The weight function should have in its numerator a parameter describing the difference in degree of homoplasy among characters on the same tree.

Thus the general form of a weight function is:

$$W = \sum_i w_i, \qquad w_i = \frac{g(\text{homoplasy difference})}{h(ES_i, \Sigma ES)} \qquad (3)$$

The exact form of functions g and h will have to be determined in future research. On the basis of such a weight function, by analogy to Equation (2) a general quality measure can be devised, which is independent of the size of the data matrix, and of the kind and relative frequency of different types of characters (binary or multistate).

Acknowledgements

We would like to thank an anonymous referee for pointing out several inconsistencies in the original manuscript and making suggestions which have improved its quality.

REFERENCES

BRYANT, H. N. 1989. An evaluation of cladistic and character analyses as hypothetico-deductive procedures, and the consequences for character weighting. Syst. Zool. 38: 214-227.

FARRIS, J. S. 1969. A successive approximations approach to character weighting. Syst. Zool. 18: 374-385.

GOLOBOFF, P. A. 1993a. Estimating character weights during tree search. Cladistics 9: 83-91.

GOLOBOFF, P. A. 1993b. Pee-Wee. Ver. 2.0. American Museum of Natural History, New York.

HENNIG, W. 1966. Phylogenetic Systematics. University of Illinois Press, Urbana.

PLATNICK, N., C. A. CODDINGTON, R. R. FORSTER AND C. E. GRISWOLD. 1991. Spinneret morphology and the phylogeny of Haplogyne spiders (Araneae, Araneomorphae). Am. Mus. Novit. 3016: 1-73.

RODRIGO, A. G. 1992. Two optimality criteria for selecting subsets of most parsimonious trees. Syst. Zool. 41: 33-40.

SCHOT, A. M. 1991. Phylogenetic relations and historical biogeography of Fordia and Imbralyx (Papilionaceae, Millettieae). Blumea 36: 205-234.

SWOFFORD, D. 1993. PAUP--Phylogenetic analysis using parsimony. Ver. 3.1.1. Illinois Natural History Survey, Champaign, Illinois.

TURNER, H. 1995. Cladistic and biogeographic analyses of Arytera Blume and Mischarytera gen. nov. (Sapindaceae), with notes on methodology and a full taxonomic revision. Blumea Suppl. 9: 1-230.

