Using baboon "mothering" behavior to help teach Permutation tests


Tom Moore

Grinnell College

Sept. 14, 2010

Plan for presentation

• A case study in baboon behavior

• Descriptive analysis

• The question of statistical significance

• Review permutation tests

• Finish baboon case study

• Statistical lessons

See previous CAUSE webinar:

• "Concepts of Statistical Inference: A Randomization-Based Curriculum" with Allan Rossman & Beth Chance, Cal Poly - San Luis Obispo; John Holcomb, Cleveland State University

• 2:00 to 2:30 pm Eastern time, April 14th, 2009

Infant handling examples

The data (handout Table 1)

Rows: INFANTS/Mothers with infant rank. Columns: HANDLERS with handler rank.

Infant/Mother  Rank KM KN NQ PO | HQ LL NY PS SK ST WK | AL CO DD LS LY MH ML MM PA PH PT RS
Handler rank         1  1  1  1 |  2  2  2  2  2  2  2 |  3  3  3  3  3  3  3  3  3  3  3  3
KG/KM             1  0  0  4  1 |  1  0  0  0  3  1  0 |  0  0  0  0  0  0  0  0  0  0  2  1
HZ/HQ             2 13 23  7  5 |  0  2  1  1  5  6 18 |  1  6  3  0  1  4  1  0  9  0 10  1
LC/LL             2  4  0  1  4 |  3  0  2  1  1  5  3 |  1  0  0  1  0  2  1  1  1  0  1  6
NK/NY             2 12  4 10  5 |  9  1  0  2  3 11  7 |  8  6  3  1  0  2  1  1  5  3  2  3
PZ/PS             2  1  3  4  1 |  0  0  0  0  0  0  2 |  0  2  0  0  0  3  0  1  1  0  3  0
CY/CO             3  2  2  7  3 |  1  1  2  0  3 12 16 |  3  0  2  0  0  2  0  0  1  0  0  2
LZ/LS             3  1  0  3  2 |  1  1  0  0  2  0  5 |  2  2  2  0  1  9  2  0  0  0  3  2
MQ/ML             3  0  1  5  2 |  2  4  2  2  2  4  5 |  7  5  2  1  1  7  0  4  4  1  0  2
MW/MH             3  3  0  7  4 |  2  3  0  5  2  8 13 |  7 14  2  0  0  0  4  0  8  0 13  6
MX/MM             3  2  3  4  5 |  0  0  0  0  0  5  2 |  9  3  1  0  0  2  0  0  1  2  2  3
PK/PH             3  2  0  6  4 |  3  4  1  0  0 15 10 |  8  5  1  0  3  1  1  6  3  0  7  5


High-ranked female handles mid-ranked infant: Female NQ handles Infant NK 10 times

NK’s mother is NY

Infant/Mother  Rank KM KN NQ PO | HQ
Handler rank         1  1  1  1 |  2
KG/KM             1  0  0  4  1 |  1
HZ/HQ             2 13 23  7  5 |  0
LC/LL             2  4  0  1  4 |  3
NK/NY             2 12  4 10  5 |  9
PZ/PS             2  1  3  4  1 |  0


Dominance scores

• The data are related to interactions between two females and the infant of one of them, i.e., triads.

• But dominance scores are determined by a prior set of observations involving just two females, i.e., dyads.

• Dominance scores range between -22 and 22, but we create from them a categorical variable of High(1), Mid(2), and Low(3) ranks.


The variables

• Handler rank: high(1), mid(2), low(3)

• Infant rank: high(1), mid(2), low(3)

• The number of interactions between a given infant-handler pair

Primary question

Is infant handling behavior related to the dominance hierarchy rankings of females in the troop?

Research hypotheses

1. Females will tend to handle the infants of females who are ranked the same as or lower than themselves. (RH1)

2. Females will tend to handle the infants of females who are ranked directly below them (or same rank if female is low-ranked). (RH2)

The data (handout), collapsed into a 3-by-3 table of counts by rank

     [,1] [,2] [,3]
[1,]    5    5    3
[2,]   97   83   95
[3,]   68  138  184

X=handler rank; Y=Infant rank

(A) Counts

                     Handler's rank
                     Hi     Mid    Low
Infant rank   Hi      5       5      3
              Mid    97      83     95
              Lo     68     138    184
Totals:             170     226    282

(B) Column %

                     Handler's rank
                     Hi         Mid        Low
Infant rank   Hi     2.9%   >   2.2%   >   1.1%
              Mid   57.1%   >  36.7%   >  33.7%
              Lo    40.0%   <  61.1%   <  65.2%
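As a quick check on the column percentages, here is a minimal Python sketch that recomputes them from the counts in table (A). The function name column_percentages is an illustrative choice, not something from the talk.

```python
# Counts from table (A): rows = infant rank (Hi, Mid, Lo),
# columns = handler rank (Hi, Mid, Low).
counts = [
    [5, 5, 3],
    [97, 83, 95],
    [68, 138, 184],
]


def column_percentages(table):
    """Return each cell as a percentage of its column total (1 decimal place)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [round(100 * cell / total, 1) for cell, total in zip(row, col_totals)]
        for row in table
    ]


col_totals = [sum(col) for col in zip(*counts)]
percentages = column_percentages(counts)

print("Column totals:", col_totals)
for label, row in zip(["Hi", "Mid", "Lo"], percentages):
    print(label, row)
```

Reading down a column shows how one class of handlers splits its handling across infant ranks, which is the comparison the research hypotheses are about.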

Residuals

                     Handler's rank
                     Hi      Mid     Low
Infant rank   Hi     0.96    0.32   -1.04
              Mid    3.38   -0.91   -1.81
              Lo    -3.01    0.70    1.71
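The residuals can be recomputed from the counts. The sketch below assumes they are Pearson residuals, (observed - expected) / sqrt(expected), with expected counts taken from the row and column totals under independence. The slides do not name the residual type, so treat this as an assumption.

```python
import math

# Counts: rows = infant rank (Hi, Mid, Lo), columns = handler rank (Hi, Mid, Low).
counts = [
    [5, 5, 3],
    [97, 83, 95],
    [68, 138, 184],
]


def pearson_residuals(table):
    """Pearson residuals for a two-way table of counts (assumed residual type)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            # Expected count if infant rank and handler rank were independent.
            expected = row_totals[i] * col_totals[j] / n
            out.append((observed - expected) / math.sqrt(expected))
        residuals.append(out)
    return residuals


residuals = pearson_residuals(counts)
for label, row in zip(["Hi", "Mid", "Lo"], residuals):
    print(label, [round(r, 2) for r in row])
```

Large positive values mark cells with more interactions than independence would predict, and large negative values mark cells with fewer.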


Is the result Statistically Significant?

Permutation test: simpler example

• Do birds react to observer clothing color?

• Species-confidence hypothesis

• Field experiment (1997):

– Treatments: vest worn vs. vest not worn

– Response: approach distance

– n=19 different transects, widely spaced

– Transects are randomly allocated to “vests” or “no vests.”

Approach distance with and without wearing an orange vest

Summary data

            n    Mean     SD
vest worn  10   16.68   6.95
no vest     9   10.16   2.89

• Approach distance is greater when vest is worn.

• Is the difference statistically significant?

Assessing statistical significance

• Statistical significance means “we cannot attribute the observed difference to random allocation alone.”

• H0: “Vest mean distance” = “No-vest mean distance” (Null hypothesis)

• Ha: “Vest mean distance” > “No-vest mean distance” (Alternative hypothesis)

The test for statistical significance using a permutation test

• Assume chance alone, i.e., assume H0 is true;

• Then “vest” and “no vest” are merely labels that the experimenter has randomly assigned to the 19 outcomes;

• So simulate this process of random assignment many times and form a distribution of mean differences.

Distance   Group (1 = vest, 0 = no vest)
 14.20     1
 13.69     1
  3.12     1
 15.40     1
  9.67     1
 19.01     1
 20.38     1
 26.67     1
 21.21     1
 23.40     1      vest mean = 16.68
 11.94     0
 11.27     0
  8.81     0
  6.09     0
  8.18     0
 13.92     0
  6.38     0
 13.25     0
 11.62     0      no-vest mean = 10.16

Observed mean difference = 6.52

Distance   Random assign   Re-arranged distance   Group
 14.20          0                 3.12              1
 13.69          0                 9.67              1
  3.12          1                21.21              1
 15.40          0                23.40              1
  9.67          1                11.27              1
 19.01          0                 8.81              1
 20.38          0                 6.09              1
 26.67          0                 8.18              1
 21.21          1                13.92              1
 23.40          1                13.25              1      group 1 mean = 11.89
 11.94          0                14.20              0
 11.27          1                13.69              0
  8.81          1                15.40              0
  6.09          1                19.01              0
  8.18          1                20.38              0
 13.92          1                26.67              0
  6.38          0                11.94              0
 13.25          1                 6.38              0
 11.62          0                11.62              0      group 0 mean = 15.48

Mean difference for this re-arrangement = -3.59

[Figure: histogram of the mean differences for 100,000 replicates; horizontal axis runs from -10 to 10]

p-value and statistical significance

• P(mean diff ≥ observed mean diff) = the p-value of the test;

• of the 100,000 mean differences in the simulation, 908 were ≥ observed difference of 6.52;

• so the p-value is empirically estimated to be .00908;
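The simulation described above can be sketched in Python as follows. This is a minimal illustration, not the code used in the talk: the function name, the seed, and the choice of 10,000 replicates (instead of 100,000, to keep it fast) are all assumptions.

```python
import random

# Approach distances from the data slide.
vest = [14.20, 13.69, 3.12, 15.40, 9.67, 19.01, 20.38, 26.67, 21.21, 23.40]
no_vest = [11.94, 11.27, 8.81, 6.09, 8.18, 13.92, 6.38, 13.25, 11.62]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group1, group0, n_reps=10_000, seed=2010):
    """Estimate P(mean diff >= observed mean diff) under random relabeling."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    observed = mean(group1) - mean(group0)
    pooled = group1 + group0           # under H0 the labels are arbitrary
    n1 = len(group1)
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)            # one random re-assignment of labels
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    return observed, hits / n_reps


observed_diff, p_value = permutation_p_value(vest, no_vest)
print(f"observed difference = {observed_diff:.2f}")
print(f"estimated p-value   = {p_value:.4f}")
```

The estimate varies a little from run to run and with the number of replicates, so it should land near the slide's value without matching it exactly.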

Drawing a conclusion from a p-value

• p-value = .009 says that “random allocation alone” is very unlikely to produce a difference as extreme as we observed in the data;

• The results are statistically significant

• We need some other way to explain this large a mean difference, such as …

• The wearing—or not—of an orange vest.

Two-sample t-test

• p-value for t=2.613 and 17 degrees of freedom is 0.0091;

• essentially identical to permutation p-value;

• Generally, the permutation result can be approximated by the t-test.
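For comparison, the t statistic can be recomputed from the raw distances. The sketch below assumes the pooled-variance (equal-variance) two-sample t-test, which is what 17 degrees of freedom implies. It computes only the statistic and the degrees of freedom, and leaves the p-value to a t table or a statistics library.

```python
import math
import statistics

vest = [14.20, 13.69, 3.12, 15.40, 9.67, 19.01, 20.38, 26.67, 21.21, 23.40]
no_vest = [11.94, 11.27, 8.81, 6.09, 8.18, 13.92, 6.38, 13.25, 11.62]


def pooled_t(group1, group0):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n1, n0 = len(group1), len(group0)
    var1 = statistics.variance(group1)   # sample variances (n - 1 divisor)
    var0 = statistics.variance(group0)
    df = n1 + n0 - 2
    pooled_var = ((n1 - 1) * var1 + (n0 - 1) * var0) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    t = (statistics.mean(group1) - statistics.mean(group0)) / std_error
    return t, df


t_stat, df = pooled_t(vest, no_vest)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

Working from the unrounded means gives a t statistic of about 2.61, in line with the slide's 2.613, which appears to use the rounded difference of 6.52.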

A fact discovered …

• nearly 100 years ago by:

• R.A. Fisher and

• E.J.G. Pitman;

• Fisher’s design of experiments book asserted this fact;

• Pitman provided the mathematical derivation;

• I give references in the last slide.

Interesting bit of history

• Pitman wrote in paper: “The main idea is not new, it seems to be implicit in all Fisher’s writings.”

• Pitman later said: “I was always dissatisfied with the sentence I wrote ... I wanted to say I really was doing something new.”

• Pitman’s contribution was new and it was too bad he ceded credit for it to Fisher.

Warblers vs Baboons examples

• Warblers data structure is simpler;

• Warblers example is a designed experiment, so that statistical significance implies a causal effect;

• Baboons example is an observational (non-experimental) study, so statistical significance can at best suggest the presence of a biological mechanism;

• with either example, we use a computer simulation to estimate the p-value, but …

• with warblers, a t-distribution provides a simpler method of calculating the p-value.

The Null Hypothesis implies …

The female handlers interacted with infants as given in the data set. These interactions involved a variety of complex causes, but none of this complexity had anything to do with ranks. That is, ranks can be viewed as meaningless labels attached to infants and females.

With observational data …

“…if the data are consistent with a properly constructed null model we can infer that the biological mechanism is not operating, but if the data are inconsistent with the null model, “...this provides some positive evidence in favor of the mechanism.”

(Gotelli and Graves, Null Models in Ecology, 1996)

Computing a permutation test

• Choose a test statistic, C, which reflects the version of research hypothesis being tested, RH1 or RH2.

• (1) Assign ranks at random to infants and females using the rank distributions of the data set. That is, assign ranks at random so that infants are assigned, in this case, 1 High, 4 Mid, and 6 Low and so that females are assigned 4 High’s, 7 Mid’s, and 12 Low’s. This assignment leads to the original data table but with permuted ranks.

• (2) Re-form the 3-by-3 table.

• (3) Compute the value of C for this table.

• Iterate (1)-(3) many times for empirical null distribution.

• P-value = P(C ≥ C_D), where C_D is the value of C for the observed data.
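The steps above can be sketched in Python using the handout data. Two parts of this sketch are assumptions rather than the talk's stated method. First, C is taken to be the number of interactions in which the handler's rank is the same as or higher than the infant's rank, which is one reading of RH1. Second, each infant carries its mother's rank, as in the sample permutation of Table 2, so ranks are shuffled separately among the 11 mothers and the 12 other females. All function names and the seed are illustrative.

```python
import random

HANDLERS = ("KM KN NQ PO HQ LL NY PS SK ST WK AL "
            "CO DD LS LY MH ML MM PA PH PT RS").split()
HANDLER_RANKS = [1] * 4 + [2] * 7 + [3] * 12       # observed ranks, same order
MOTHERS = "KM HQ LL NY PS CO LS ML MH MM PH".split()  # one infant per mother
COUNTS = [  # rows follow MOTHERS, columns follow HANDLERS
    [0, 0, 4, 1, 1, 0, 0, 0, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1],
    [13, 23, 7, 5, 0, 2, 1, 1, 5, 6, 18, 1, 6, 3, 0, 1, 4, 1, 0, 9, 0, 10, 1],
    [4, 0, 1, 4, 3, 0, 2, 1, 1, 5, 3, 1, 0, 0, 1, 0, 2, 1, 1, 1, 0, 1, 6],
    [12, 4, 10, 5, 9, 1, 0, 2, 3, 11, 7, 8, 6, 3, 1, 0, 2, 1, 1, 5, 3, 2, 3],
    [1, 3, 4, 1, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 3, 0, 1, 1, 0, 3, 0],
    [2, 2, 7, 3, 1, 1, 2, 0, 3, 12, 16, 3, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 2],
    [1, 0, 3, 2, 1, 1, 0, 0, 2, 0, 5, 2, 2, 2, 0, 1, 9, 2, 0, 0, 0, 3, 2],
    [0, 1, 5, 2, 2, 4, 2, 2, 2, 4, 5, 7, 5, 2, 1, 1, 7, 0, 4, 4, 1, 0, 2],
    [3, 0, 7, 4, 2, 3, 0, 5, 2, 8, 13, 7, 14, 2, 0, 0, 0, 4, 0, 8, 0, 13, 6],
    [2, 3, 4, 5, 0, 0, 0, 0, 0, 5, 2, 9, 3, 1, 0, 0, 2, 0, 0, 1, 2, 2, 3],
    [2, 0, 6, 4, 3, 4, 1, 0, 0, 15, 10, 8, 5, 1, 0, 3, 1, 1, 6, 3, 0, 7, 5],
]


def collapse(ranks):
    """Step (2): 3-by-3 table, rows = infant rank, columns = handler rank."""
    table = [[0] * 3 for _ in range(3)]
    for mother, row in zip(MOTHERS, COUNTS):
        for handler, count in zip(HANDLERS, row):
            table[ranks[mother] - 1][ranks[handler] - 1] += count
    return table


def statistic_c(table):
    """Step (3), ASSUMED form of C: handler rank same as or higher than infant rank."""
    return sum(table[i][j] for i in range(3) for j in range(3) if j <= i)


def permute_ranks(ranks, rng):
    """Step (1), ASSUMED scheme: shuffle ranks within mothers and within others."""
    others = [h for h in HANDLERS if h not in MOTHERS]
    new_ranks = {}
    for group in (MOTHERS, others):
        labels = [ranks[female] for female in group]
        rng.shuffle(labels)
        new_ranks.update(zip(group, labels))
    return new_ranks


observed_ranks = dict(zip(HANDLERS, HANDLER_RANKS))
observed_table = collapse(observed_ranks)
c_observed = statistic_c(observed_table)

rng = random.Random(1)                 # hypothetical seed for repeatability
n_reps = 1000
hits = sum(
    statistic_c(collapse(permute_ranks(observed_ranks, rng))) >= c_observed
    for _ in range(n_reps)
)
p_value = hits / n_reps
print("observed table:", observed_table)
print("observed C:", c_observed, " estimated p-value:", p_value)
```

Shuffling within the two groups keeps the rank distributions fixed at 1 High, 4 Mid, 6 Low for infants and 4 High, 7 Mid, 12 Low for females. A different choice of C or of shuffling scheme would change the estimated p-value.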

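For RH1, use this test statistic for C

[Slide graphic: the 3-by-3 table of counts; only the rows 97 83 95 and 68 138 184 survive in this transcript]

For RH2, use this test statistic for C

[Slide graphic: the 3-by-3 table of counts; only the rows 97 83 95 and 68 138 184 survive in this transcript]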

A sample permutation (Table 2)

Infant/Mother  Rank KM KN NQ PO HQ LL NY PS SK ST WK AL CO DD LS LY MH ML MM PA PH PT RS
Handler rank         1  3  1  3  3  3  2  2  3  1  3  2  2  3  3  1  3  3  3  2  2  3  2
KG/KM             1  0  0  4  1  1  0  0  0  3  1  0  0  0  0  0  0  0  0  0  0  0  2  1
HZ/HQ             3 13 23  7  5  0  2  1  1  5  6 18  1  6  3  0  1  4  1  0  9  0 10  1
LC/LL             3  4  0  1  4  3  0  2  1  1  5  3  1  0  0  1  0  2  1  1  1  0  1  6
NK/NY             2 12  4 10  5  9  1  0  2  3 11  7  8  6  3  1  0  2  1  1  5  3  2  3
PZ/PS             2  1  3  4  1  0  0  0  0  0  0  2  0  2  0  0  0  3  0  1  1  0  3  0
CY/CO             2  2  2  7  3  1  1  2  0  3 12 16  3  0  2  0  0  2  0  0  1  0  0  2
LZ/LS             3  1  0  3  2  1  1  0  0  2  0  5  2  2  2  0  1  9  2  0  0  0  3  2
MQ/ML             3  0  1  5  2  2  4  2  2  2  4  5  7  5  2  1  1  7  0  4  4  1  0  2
MW/MH             3  3  0  7  4  2  3  0  5  2  8 13  7 14  2  0  0  0  4  0  8  0 13  6
MX/MM             3  2  3  4  5  0  0  0  0  0  5  2  9  3  1  0  0  2  0  0  1  2  2  3
PK/PH             2  2  0  6  4  3  4  1  0  0 15 10  8  5  1  0  3  1  1  6  3  0  7  5

     [,1] [,2] [,3]
[1,]    5    1    7
[2,]   85   60  119
[3,]   81  117  203


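Test statistic for Research hypothesis 1

[Slide graphic: the 3-by-3 table for the sample permutation; only the rows 85 60 119 and 81 117 203 survive in this transcript]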

Null distribution: 1000 resamples

[Figure: histogram of the null distribution; axis ticks at 100, 200, 300, 400, 500]

Conclusion

• P-value ≈ 15/1000 = .015

• The observed pattern is unlikely to be the result of chance alone.

Results for All and Successful

        LTE     LT      n
All     0.015   0.038   678
Succ    0.372   0.017   112

p-values using two test statistics: LTE = less than or equal to; LT = less than

Look at Successful interactions

Counts

hi mid lo

hi 0 3 2

mid 19 6 22

lo 3 14 43

Column percentages

hi mid lo

hi 0% 13% 3%

mid 86% 26% 33%

lo 14% 61% 64%

Interpretation of results

• Infant-handling as a way to dominate.

• Infant-handling as “status striving”

• The data support a nuanced give-and-take between infant handling as nurturing for the infant and infant handling as dangerous to the infant.

Statistical lessons from the baboon study:

• An example where permutation tests have no simple alternative for calculating the p-value.

• Limits of observational study.

• Shows the use of p-values as a springboard toward interpretation …

• e.g., statistical significance gives a reason to take a pattern seriously and to try to make sense of it using scientific theory.

For a teaching module built around permutation tests, which includes this case study, you should be alert to the arrival soon of:

Practicing Statistics, by Shonda Kuiper, Pearson

http://www.pearsonhighered.com/kuiper1einfo/

References:

• “A study of permutation tests in the context of a problem in primatology,” J. of Data Science, v.8, no.1, Jan. 2010, by Moore and Bentley-Condit.

• “Bird reactions to observer clothing: Implications for distance-sampling techniques,” J. of Wildlife Management, v.61, no.3, July 1997, pp.935-947, by Kevin T. Gutzwiller and Heidi A. Marcum.

• R.A. Fisher’s The Design of Experiments, 1935, Hafner.

• Pitman, E. J. G., “Significance tests which may be applied to samples from any population---part I”, Royal Statistical Society Supplement, 1937, v.4: 119-130.

• Edgington, Eugene S., Randomization Tests, 1995, Marcel Dekker.

• Ernst, Michael D., “Permutation methods: A basis for exact inference”, Statistical Science, 2004, v.19, no.4, 676-685.

• Kuiper, Shonda, Practicing Statistics, Pearson, (to appear).
