  • MODERN ROBUST METHODS: BASICS AND

    SOME RECENT ADVANCES

    OR

    HOW MANY DISCOVERIES ARE LOST BY

    IGNORING MODERN ROBUST METHODS?

    ABSTRACT

    Consider the standard, routinely taught and used methods for comparing groups and studying associations. Based on hundreds of papers published during the last fifty years, these methods perform well when groups do not differ in any manner (they have identical distributions), and when there is no association. If the groups differ or there is an association, these methods might continue to perform well. But based on four major insights, under general conditions, this is not the case. Power can be relatively poor and a highly superficial understanding of the data can result. In contrast, there is now a vast array of new and improved methods that effectively deal with known concerns associated with classic techniques. They include substantially improved methods for dealing with skewed distributions, outliers, heteroscedasticity (unequal variances) and curvature. Under general conditions these modern methods offer substantial gains in power. Perhaps more importantly, they can provide a deeper, more accurate and more nuanced understanding of data. The basics of these modern methods, plus some recent advances, will be described.

    1 SOME PRELIMINARY REMARKS

    As noted in the abstract, hundreds of papers published during the last half century provide substantially improved methods for dealing with non-normality, outliers, heteroscedasticity and curvature.

    PRACTICAL REASONS FOR TAKING ADVANTAGE OF MORE MODERN METHODS:

    • The possibility of substantially higher power relative to methods that assume normality and homoscedasticity.

    • More accurate confidence intervals and better control over the probability of a Type I error.

    • A deeper and more accurate sense of how groups compare and how variables are related.


  • There are, of course, differences in the relative merits of the methods to be described, raising a natural question: which method is best?

    Perhaps a better question: How many methods does it take to understand data?

    Various aspects of this issue will be discussed.

    Remark: A copy of these notes can be downloaded from my web page. Search the web for

    software Rand Wilcox

    to find a link to my web page. On my web page, click on workshops.


  • FOUR MAJOR INSIGHTS WITH MANY RELATED RESULTS:

    • Heavy-tailed distributions (outliers are likely) are commonly encountered and can destroy power when using means or least squares regression.

    • Skewed distributions can result in highly inaccurate results and low power when using any method based on means, even when dealing with light-tailed distributions. This issue is related to insights about the central limit theorem, which will be illustrated. Certain classic methods can be inaccurate under general conditions regardless of how large the sample sizes might be.

    • Heteroscedasticity can be much more serious than once thought.

    • Curvature, when studying associations, is much more serious than what is generally assumed.

    GOAL: Briefly summarize the many issues and techniques that have been developed to deal with known concerns associated with standard methods.

    More details can be found in

    Wilcox, R. R. (2017). Introduction to Robust Estimation and Hypothesis Testing. 4th edition. San Diego, CA: Academic Press.

    Wilcox, R. R. (2012). Modern Statistics for the Social and Behavioral Sciences: A Practical Introduction. New York: Chapman & Hall/CRC Press.

    Wilcox, R. R. (2017). Understanding and Applying Basic Statistical Methods Using R. New York: Wiley.


  • THE THREE COMPONENTS OF THIS TALK:

    • BASICS PLUS SOME RECENT ADVANCES

    • METHODS FOR COMPARING GROUPS

    • METHODS AND SOME RECENT ADVANCES FOR STUDYING ASSOCIATIONS


  • DEALING WITH VIOLATIONS: STRATEGIES THAT PERFORM POORLY:

    • Transform the data.

    • Test assumptions.

    • Use standard methods and appeal to the central limit theorem when the sample size is greater than 40.

    • Remove outliers among the dependent variable and apply standard methods using the remaining data.

    More details are given later.


  • HISTORICAL COMMENTS

    Concerns about non-normality, and how these concerns might be addressed, date back two centuries to results derived by Laplace.

    He was aware, for example, that the sample median can have a smaller standard error than the sample mean as we move toward heavy-tailed distributions. Empirical results indicating that heavy-tailed distributions occur were first reported by Bessel (1818). And the common occurrence of such distributions, within astronomy and physics, was well established late in the 19th century (e.g., Newcomb, 1882). Today it is well established that heavy-tailed distributions are commonly encountered in a wide range of situations.

    But without reasonably fast computers and appropriate software, practical solutions are generally impossible.

    The catalyst that led to the mathematical foundation of modern robust methods is a paper by Tukey (1960).


  • Figure 1: For normal distributions, increasing the standard deviation from 1 to 1.5 results in a substantial change in the distribution. But when considering non-normal distributions, seemingly large differences in the variances do not necessarily mean that there is a large difference in the graphs of the distributions. The two curves shown here (a normal curve and a mixed normal) have an obvious similarity, yet the variances are 1 and 10.9.

  • Figure 2: Two probability curves (a normal curve and a mixed normal) having equal means and variances.

  • Figure 3: Two probability curves having equal means and variances.

  • Figure 4: The population mean can be located in the extreme portion of the tail of a distribution (here the median is 3.75 and the mean is 7.6). That is, the mean can represent a highly atypical response.

  • Figure 5: In the left panel, power is 0.96 based on Student’s T, α = 0.05. But in the right panel, power is only 0.28, illustrating the general principle that slight changes in the distributions being compared can have a large impact on the ability to detect true differences between the population means.

  • The variance is highly sensitive to the tails of a distribution. A classic example is the mixed normal shown in Figure 1.

    To illustrate one aspect regarding how the mean, 20% trimmed mean and median compare, first consider normality.

    Generated 30 observations from a normal distribution, computed the mean, 20% trimmed mean and median, and repeated this 10,000 times. Boxplots shown in Figure 6 illustrate the increased accuracy of the mean over the median.

    But now suppose sampling is from a mixed normal or a skewed distribution for which outliers tend to be relatively common. Repeating the simulation, the mean now performs poorly, as indicated by the boxplots in Figure 7.
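    A minimal R sketch of this simulation. The mixed normal used here is the contaminated normal with 10% contamination and contaminating standard deviation 10 (variance 10.9, matching Figure 1); Figures 6 and 7 were produced with the author's own code:

    # Sampling distributions of the mean, 20% trimmed mean and median, n = 30
    one.rep <- function(rdist) {
      x <- rdist(30)
      c(mean = mean(x), tmean = mean(x, trim = 0.2), median = median(x))
    }
    set.seed(1)
    est.norm <- t(replicate(10000, one.rep(rnorm)))          # normal case (Figure 6)
    # Mixed (contaminated) normal: with probability 0.1, sample from N(0, sd = 10)
    rmixed <- function(n) ifelse(runif(n) < 0.1, rnorm(n, sd = 10), rnorm(n))
    est.mix <- t(replicate(10000, one.rep(rmixed)))           # mixed normal case (Figure 7)
    boxplot(as.data.frame(est.norm))   # mean most accurate under normality
    boxplot(as.data.frame(est.mix))    # mean performs poorly under the mixed normal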

    ILLUSTRATIONS USING DATA FROM ACTUAL STUDIES: cortisol awakening response and a measure of reading ability.

    Note implications about replicating a study.


  • Figure 6: Boxplots of means, 20% trimmed means and medians when sampling from a normal distribution.

  • Figure 7: Boxplots of 10,000 means, 20% trimmed means and medians using data sampled from a mixed normal distribution.

  • Figure 8: Boxplots of means and medians using data from two different studies (the cortisol awakening response, CAR, and the reading data).

  • DEALING WITH OUTLIERS: two strategies that are reasonable, one that is relatively ineffective and another seemingly natural strategy that should never be used.

    • Trim. The best-known example is the median, but it might trim too much. Non-bootstrap methods perform well using a 20% trimmed mean, but a percentile bootstrap is generally best. For the median, when there are tied values, a percentile bootstrap method is the only known method that performs well. (Rank-based methods perform poorly under general conditions.)

    • Remove outliers among the dependent variable and average the values that remain. (An example is the modified one-step M-estimator, which is based on the MAD-median rule for detecting outliers.)

    Easily, bootstrap methods are the best at handling this approach.

    • Transform the data, for example take logarithms, but this approach performs poorly.

    • Highly unsatisfactory strategy: discard outliers and apply standard hypothesis testing methods for means to the remaining data. If, for example, the sample size is reduced from n to m after trimming, use a method for means that assumes we have m observations. This results in using the wrong standard error. Methods that remove outliers can be used, but it is imperative that a correct estimate of the standard error be used, which depends on how outliers are treated. (Details are given later.)

    Why is it technically unsound to discard outliers and apply standard methods based on means using the remaining data? This results in using an incorrect estimate of the standard error.

    All indications are that typically, a 20% trimmed mean is a good choice for general use. But there are always exceptions. No single method is always best. The only effective way to determine whether another choice makes a practical difference is to try it.


  • DETECTING OUTLIERS AND RELATED MEASURES OF LOCATION

    METHOD GENERALLY AGREED TO BE LEAST SATISFACTORY:

    |X − X̄|/s > 2.     (1)

    Suffers from masking. Reason: the sample variance has a breakdown point of only 1/n. Even a single outlier can inflate s, which makes the left side of the last equation small.

    Boxplot rule (using ideal fourths): The boxplot is based on the interquartile range, which has a breakdown point of 0.25.

    MAD-MEDIAN RULE

    MAD is the median of |X1 − M|, . . . , |Xn − M|, where M is the usual sample median. MADN is MAD/0.6745; MADN estimates σ under normality and its breakdown point is 0.5.

    THE MAD-MEDIAN RULE declares X an outlier if

    |X − M|/MADN > 2.24.
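    A minimal base-R sketch of this rule (the WRS function out, described below, is the supported implementation):

    mad.median.outliers <- function(x, crit = 2.24) {
      M <- median(x)
      MADN <- median(abs(x - M)) / 0.6745   # MAD rescaled to estimate sigma under normality
      abs(x - M) / MADN > crit              # TRUE for values declared outliers
    }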

    R SOFTWARE: HOW TO GAIN ACCESS TO MODERN METHODS.

    The R function

    outbox(x,mbox=FALSE)

    checks for outliers using the boxplot rule.

    MODIFIED ONE-STEP M-ESTIMATOR: remove values declared outliers via the MAD-median rule, average the remaining values. Breakdown=0.5.

    The R function

    out(x)


  • checks for outliers using the MAD-median rule.

    EXAMPLE

    Consider the values

    2, 2, 3, 3, 3, 4, 4, 4, 100000, 100000.

    The value 100,000 is not declared an outlier using the mean and standard deviation.

    For the same data, using the R function outbox to detect outliers, the value 100,000 is declared an outlier. Same is true using the MAD-median rule.
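    A quick check of these claims in base R (using the mad.median.outliers sketch given earlier; outbox is the WRS function):

    x <- c(2, 2, 3, 3, 3, 4, 4, 4, 100000, 100000)
    abs(x - mean(x)) / sd(x)     # no value exceeds 2, so rule (1) flags no outliers
    mad.median.outliers(x)       # the two values of 100,000 are declared outliers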

    EXAMPLE

    Cortisol upon awakening (Well Elderly study, n = 460). The mean and standard deviation detect 7 outliers (using the R function outms), the boxplot detects 19 outliers (using outbox) and the MAD-median rule detects 34 (using the R function out). The MAD-median rule has a higher breakdown point than the boxplot rule.


  • CENTRAL LIMIT THEOREM

    n=40, can assume normality when using a mean?

    No: two major insights, one of which is particularly devastating.

    1. Early studies focused on light-tailed distributions and the distribution of the sample mean. Heavy-tailed distributions turn out to be a problem.

    2. There was an implicit assumption that if the sample mean has approximately a normal distribution, Student’s t would perform well. But this is not necessarily the case, as will be illustrated.

    Even with skewed, light-tailed distributions, can need n ≥ 200. When dealing with two or more groups, violating the normality assumption is an even more serious concern. For skewed, heavy-tailed distributions, Student’s t can require n ≥ 300.

    BUT WHAT ABOUT REAL DATA? DO PROBLEMS PERSIST?


  • Figure 9: Shown is a lognormal distribution, which is skewed and relatively light-tailed, roughly meaning that the proportion of outliers found under random sampling is relatively small.

  • Figure 10: The left panel shows the distribution of 5000 T values, with each T value based on 20 observations generated from the lognormal distribution. The symmetric solid line is the distribution of T under normality. The right panel is the same as the left, only now the sample size is n = 100.

  • Figure 11: The asymmetric curve is the distribution of T when sampling 20 observations from a contaminated lognormal distribution, which is heavy-tailed. The symmetric curve is the distribution of T under normality.

  • Figure 12: The distribution of T when randomly sampling from the hangover data. The symmetric curve is the distribution of T under normality.

  • Figure 13: The distribution of T when randomly sampling from the sexual attitude data, n = 105.

  • Figure 14: The distribution of T when randomly sampling from the sexual attitude data with the extreme outlier removed.

  • 2 TRIMMED MEANS

    Trimming reduces problems associated with skewed distributions and low power due to outliers.

    Basic issue: estimating the squared standard error of the trimmed mean

    Recall that the estimate of VAR(X̄) is s²/n.

    For a 20% trimmed mean, which has a breakdown point of 0.2, the estimate is

    s²w / (0.6² n),

    where s²w is the 20% Winsorized sample variance.
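    A minimal base-R sketch of this estimate (the WRS function trimse, noted below, is the supported implementation):

    trimmed.se <- function(x, tr = 0.2) {
      n <- length(x)
      g <- floor(tr * n)
      xs <- sort(x)
      # Winsorize: pull the g smallest and g largest values in to the adjacent order statistics
      xw <- pmin(pmax(xs, xs[g + 1]), xs[n - g])
      sw2 <- var(xw)                       # Winsorized sample variance
      sqrt(sw2 / ((1 - 2 * tr)^2 * n))     # estimated standard error of the trimmed mean
    }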

    MORE BROADLY, SIMPLY DISCARDING OUTLIERS AND APPLYING METHODS FOR MEANS TO THE REMAINING DATA CAN RESULT IN HIGHLY INACCURATE RESULTS. STANDARD ERRORS DEPEND ON THE METHOD USED TO DETECT AND REMOVE THE INFLUENCE OF OUTLIERS.

    EXAMPLE

    For sexual attitude data, n = 105 males, the estimated standard error of the 20% trimmed mean is 0.53.

    Imagine we use the method for the sample mean on the remaining 63 values left after trimming. That is, we compute s using these 63 values only and then compute s/√63. This yields 0.28, which is about half of the value based on a theoretically sound technique.

    The R function

    trimse(x,tr=0.2)

    ESTIMATES THE STANDARD ERROR OF A TRIMMED MEAN.


  • A CONFIDENCE INTERVAL FOR THE POPULATION TRIMMED MEAN: TUKEY-MCLAUGHLIN METHOD

    The R function

    trimci(x,tr=0.2,alpha=0.05,nullvalue=0)

    computes a 1 − α confidence interval for µt. The amount of trimming, indicated by the argument tr, defaults to 20%. If the argument alpha is unspecified, α = 0.05 is used. The argument nullvalue indicates the null value when testing some hypothesis.

    Figure 15 shows the distributions of Tt (left panel) and T (right panel) based on n = 20 when sampling from the lognormal distribution in Figure 9 (a skewed, relatively light-tailed distribution). Also shown is the approximation of the distribution assuming normality. Note that Tt is better approximated by a Student’s t distribution compared to T.

    Figure 16 shows an estimate of the distribution of T for some cortisol data. Could the approximation be misleading? Yes, problems with T are probably being underestimated.


  • Figure 15: The left panel shows the actual distribution of Tt and the approximation based on Student’s t distribution when sampling from the lognormal distribution in Figure 9, n = 20. The right panel shows the actual distribution of T. In practical terms, using Student’s t to compute confidence intervals for the 20% trimmed mean tends to be more accurate than using Student’s t to compute a confidence interval for the mean.

  • Figure 16: The distribution of T based on cortisol measures upon awakening, n = 87.

  • Figure 17: The distribution of Tt (20% trimming) based on cortisol measures upon awakening, n = 87.

  • MEDIAN: REQUIRES SPECIAL METHODS

    FOR DISTRIBUTION FREE CONFIDENCE INTERVAL, USE THE R FUNCTION

    sint(x,alpha=0.05)

    The R function

    msmedse(x)

    computes an estimate of the standard error of the sample median, M.

    But this estimator, as well as all others that have been proposed, can perform poorly when tied (duplicated) values occur.

    Practical implication: Need an alternative method for comparing groups based on the median.

    We can deal with tied values using a percentile bootstrap method, which will be described later.

    Note: Student’s t can be biased. That is, the probability of rejecting is not minimized when the null hypothesis is true.

    COMMENTS ON M-ESTIMATOR AND MODIFIED ONE-STEP M-ESTIMATOR

    The standard error of the M-estimator can be estimated. But the resulting test statistic is unsatisfactory, in terms of Type I errors, when dealing with skewed distributions. Need to use a percentile bootstrap, to be described. The same is true when using the modified one-step M-estimator.


  • 3 BOOTSTRAP METHODS

    A collection of nonparametric techniques that can be used to compute confidence intervals, test hypotheses and solve other statistical problems. The basic strategy is to run simulations using the observed data, as will be illustrated.

    What does nonparametric mean? Consider, for example, the one-sample Student’s t test. One can determine a critical value by imposing a particular parametric assumption about the distribution under study: normality. Normal distributions are completely determined by two unknown parameters: the population mean and population variance.

    In contrast, bootstrap methods do not assume that the population distribution is determined by a specified class of distributions characterized by unknown parameters such as the population mean and population variance.

    Note: Bootstrap methods are not distribution free. A distribution free method means that when testing hypotheses, control over the Type I error probability can be determined exactly assuming random sampling only. An example of a distribution free method will be given later. (The shift function, to be described.) The R function sint computes a distribution free confidence interval for the population median.

    Bootstrap methods are not a panacea for dealing with the many practical problems encountered when trying to understand data.

    But they have considerable practical value for a wide range of situations.

    Examples:

    • Can remove outliers and test hypotheses in a technically sound manner. Removing outliers and applying a standard method to the remaining data is disastrous.

    • Bootstrap methods are particularly important when making inferences about measures of location when outliers are empirically determined and eliminated, in contrast to trimming a specified amount.

    • A percentile bootstrap method is currently the best method for comparing medians, particularly when there are tied values. Helps deal with heteroscedasticity.

    • When using robust regression estimators, bootstrap methods are the best methods when testing hypotheses. They include the ability to deal with heteroscedasticity.

    • Certain bootstrap methods are arguably best when making inferences based on means, but any method based on means can perform poorly relative to other techniques.

    • Estimating Power When Testing Hypotheses


  • • A Bootstrap Estimate of Standard Errors

    • Bootstrap methods do not always dominate. Example: there are bootstrap methods for least squares regression, but the HC4 method performs about as well and sometimes is a bit better. Same is true when comparing correlations. (Perhaps certain refinements of the basic bootstrap methods perform better than HC4.)

    • Detecting Associations Even When There Is Curvature

    • Quantile Regression

    • A test for homoscedasticity using a quantile regression approach or a method based on the residuals that allows curvature.

    • Regression: Which Predictors are Best? Can even test hypotheses. Example: Are IVs 1 and 2 more important than IV 3 when all three IVs are included in the model?

    • When comparing groups, inferences about a collection of quantiles can be made such that the probability of one or more Type I errors is controlled.

    • ANCOVA. Can deal with both types of heteroscedasticity, outliers, non-parallel regression lines, non-normality and even curvature.

    UNDERSTANDING THE BASICS:

    DETERMINING CRITICAL VALUES VIA SIMULATION STUDIES

    If the goal is to have a Type I error probability of 0.05, this can be accomplished if the distribution of

    T = (X̄ − µ0)/(s/√n)     (2)

    can be determined.

    Here is how we would use R to determine T if sampling from a lognormal distribution:

    • Generate n observations from the distribution of interest. (For a lognormal distribution the R function rlnorm accomplishes this goal.)


  • • Compute the sample mean X̄, the sample standard deviation s, and then T. (The population mean of a lognormal distribution is approximately 1.65.)

    • Repeat steps 1 and 2 many times. For illustrative purposes, assume they are repeated 1000 times yielding 1000 T values: T1, . . . , T1000.

    • Put these values in ascending order yielding T(1) ≤ · · · ≤ T(1000).

    • An estimate of the 0.975 quantile of the distribution of T is T(975), again because the proportion of observed T values less than or equal to T(975) is 0.975. If, for example, T(975) = 2.1, this suggests using t = 2.1 when computing a 0.95 confidence interval.

    But this assumes T has a symmetric distribution. Alternative solution: also estimate the 0.025 quantile with T(25). Call this tℓ and label T(975) tu. The resulting confidence interval is

    (X̄ − tu s/√n,  X̄ − tℓ s/√n).

    This is called an equal-tailed confidence interval.
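    A minimal R sketch of this simulation strategy for the lognormal case (n = 20 and 1000 replications are illustrative choices):

    # Estimate the distribution of T = (xbar - mu0) / (s / sqrt(n)) when sampling
    # from a lognormal distribution, whose population mean is exp(0.5), about 1.65
    set.seed(1)
    n <- 20
    mu0 <- exp(0.5)
    Tvals <- replicate(1000, {
      x <- rlnorm(n)
      (mean(x) - mu0) / (sd(x) / sqrt(n))
    })
    Tsort <- sort(Tvals)
    c(lower = Tsort[25], upper = Tsort[975])   # estimated 0.025 and 0.975 quantiles of T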

    But we don’t know the distribution we are sampling from.

    However, we have an estimate of it based on the observed data.

    So conduct a simulation on the observed data to determine the distribution of T.

    This is the strategy used by the bootstrap-t method.


  • Table 1: Actual Type I error probabilities for three methods based on the mean when testing at the α = 0.05 level

                 Method
    Dist.      BT       SB       T
    n = 20
      N        0.054    0.051    0.050
      LN       0.078    0.093    0.140
      MN       0.100    0.014    0.022
      SH       0.198    0.171    0.202
    n = 100
      N        0.048    0.038    0.050
      LN       0.058    0.058    0.083
      MN       0.092    0.018    0.041
      SH       0.168    0.173    0.190

    N=Normal, LN=Lognormal, MN=Mixed normal, SH=Skewed, heavy-tailed, BT=Equal-tailed bootstrap-t, SB=Symmetric bootstrap-t, T=Student’s T

    4 PERCENTILE BOOTSTRAP METHOD

    Not recommended when testing hypotheses about means, but easiest to explain when working with the mean. Works very well with a 20% trimmed mean or median.

    GOAL: TEST H0: µ = µ0,

    where µ0 is some hypothesized value.

    STRATEGY:

    Estimate the probability that a bootstrap sample mean is greater than µ0.

    In symbols, the goal is to determine

    p = P (X̄∗ > µ0),

    the probability that the bootstrap sample mean is greater than the hypothesized value.
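    A minimal base-R sketch of this strategy, shown for a 20% trimmed mean (2000 bootstrap samples mirrors the default of the supported WRS functions given next):

    onesamp.pb <- function(x, null.value = 0, est = function(z) mean(z, trim = 0.2),
                           nboot = 2000, alpha = 0.05) {
      boot.est <- replicate(nboot, est(sample(x, replace = TRUE)))
      phat <- mean(boot.est > null.value) + 0.5 * mean(boot.est == null.value)
      ci <- quantile(boot.est, c(alpha / 2, 1 - alpha / 2))   # percentile bootstrap CI
      list(ci = ci, p.value = 2 * min(phat, 1 - phat))
    }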

    The R function

    onesampb(x, est = onestep, alpha = 0.05, nboot = 2000)


  • computes a confidence interval based on the one-step M-estimator, where x is an R variable containing data, alpha is α, which defaults to 0.05, and nboot is b, the number of bootstrap samples to be used, which defaults to 2000. (This function contains two additional arguments, the details of which can be found in my books.) This function can be used with any measure of location via the argument est. For example,

    onesampb(x, est = tmean)

    would return a confidence interval based on the 20% trimmed mean. For trimmed means, the R function

    trimpb(x, tr=0.2)

    can be used instead.

    EXAMPLE

    CORTISOL AWAKENING RESPONSE PRIOR TO INTERVENTION (FROM THE WELL ELDERLY 2 STUDY).

    Percentile bootstrap, 20% trimmed mean: the confidence interval is

    (−0.041, 0.00765)

    compared to

    (−0.058, 0.030),

    the confidence interval for the mean using Student’s t, which is nearly twice as long as the confidence interval based on the 20% trimmed mean.


  • Table 2: Actual Type I error probabilities using 20% trimmed means, α = 0.05

                 Method
    Dist.      BT       SB       P        TM
    n = 20
      N        0.067    0.052    0.063    0.042
      LN       0.049    0.050    0.066    0.068
      MN       0.022    0.019    0.053    0.015
      SH       0.014    0.018    0.066    0.020

    N=Normal, LN=Lognormal, MN=Mixed normal, SH=Skewed, heavy-tailed, BT=Equal-tailed bootstrap-t, SB=Symmetric bootstrap-t, P=Percentile bootstrap, TM=Tukey-McLaughlin

    5 BOOTSTRAP-T METHOD BASED ON TRIMMED MEANS

    If the amount of trimming is 20% or more, it seems that using a percentile bootstrap method is best for general use, but with the amount of trimming close to zero, it currently seems that using a bootstrap-t method is preferable. (With 10% trimming, it is unclear whether a bootstrap-t is preferable to a percentile bootstrap method.)

    The R function

    trimcibt(x, tr = 0.2, alpha = 0.05, nboot = 599, side = T)

    computes a bootstrap-t confidence interval for a trimmed mean.

    In summary, all indications are that the percentile bootstrap is more stable (with at least 20% trimming) than the bootstrap-t method. That is, the actual Type I error probability tends to be closer to the nominal level. And it has the added advantage of more power, at least in some situations, compared to any other method we might choose.

    However, there are situations where the bootstrap-t method outperforms the percentile method, and there are additional situations where the percentile bootstrap is best. So both methods are important to know.


  • 6 COMPARING TWO INDEPENDENT GROUPS

    FOUR GENERAL APPROACHES WHEN COMPARING TWO GROUPS:

    1. Compare measures of location, such as the mean or median.

    2. Compare measures of variation.

    3. Focus on the probability that a randomly sampled observation from the first group is smaller than a randomly sampled observation from the second group.

    4. Simultaneously compare all of the quantiles to get a global sense of where the distributions differ and by how much. For example, low scoring participants in group 1 might be very similar to low scoring participants in group 2, but for high scoring participants, the reverse might be true.

    CONCERNS ABOUT STUDENT’S T

    EXAMPLE

    YUEN’S METHOD FOR TRIMMED MEANS

    Yuen (1974) derived a method for comparing the population trimmed means of two independent groups that reduces to Welch’s method for means when there is no trimming. Trimming 20% is generally a good choice, but there are exceptions.

    The improvement in power, achieving accurate confidence intervals, and controlling Type I error probabilities, can be substantial when using Yuen’s method with 20% trimming rather than Welch’s test.

    The R function

    yuen(x,y,alpha=0.05,tr=0.2)
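    A minimal base-R sketch of Yuen's test statistic (the WRS function yuen above is the supported implementation; the trimming and Winsorizing conventions below are the usual ones for 20% trimming):

    yuen.sketch <- function(x, y, tr = 0.2) {
      d <- function(z) {                 # squared standard error term for one group
        n <- length(z); g <- floor(tr * n); h <- n - 2 * g
        zs <- sort(z)
        zw <- pmin(pmax(zs, zs[g + 1]), zs[n - g])   # Winsorized sample
        (n - 1) * var(zw) / (h * (h - 1))
      }
      h1 <- length(x) - 2 * floor(tr * length(x))
      h2 <- length(y) - 2 * floor(tr * length(y))
      d1 <- d(x); d2 <- d(y)
      Ty <- (mean(x, trim = tr) - mean(y, trim = tr)) / sqrt(d1 + d2)
      df <- (d1 + d2)^2 / (d1^2 / (h1 - 1) + d2^2 / (h2 - 1))
      c(statistic = Ty, df = df, p.value = 2 * (1 - pt(abs(Ty), df)))
    }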

    COMPARING MEDIANS

    Best overall method, especially when dealing with tied (duplicated) values: percentile bootstrap.

    With no tied values, a non-bootstrap method based on the McKean-Schrader estimate of the standard error works well.


  • Note: under general conditions, all rank-based methods can perform poorly given the goal of comparing medians.

    POINT WORTH STRESSING

    Although the median belongs to the class of trimmed means, special methods are required for comparing groups based on medians.

    NO TIED VALUES:

    Then an approximate 1 − α confidence interval for the difference between the population medians is

    (M1 − M2) ± c √(S1² + S2²),

    where c is the 1 − α/2 quantile of a standard normal distribution, and Sj² is the McKean–Schrader estimate of the squared standard error of Mj.

    The R function

    msmed(x,y,alpha=0.05)

    compares medians using the McKean-Schrader estimate of the standard error.

    If there are tied values, a percentile bootstrap is the only known method that performs well in simulations.

    7 PERCENTILE BOOTSTRAP METHODS FOR COMPARING MEASURES OF LOCATION

    Let M*1 and M*2 be the bootstrap sample medians, and let

    p* = P(M*1 > M*2) + 0.5 P(M*1 = M*2).

    The p-value is 2 min(p̂*, 1 − p̂*).

    A confidence interval can be computed as well.
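    A minimal base-R sketch of this two-group percentile bootstrap for medians (the WRS function medpb2, given below, is the supported implementation):

    medpb2.sketch <- function(x, y, nboot = 2000, alpha = 0.05) {
      dif <- replicate(nboot, median(sample(x, replace = TRUE)) -
                               median(sample(y, replace = TRUE)))
      pstar <- mean(dif > 0) + 0.5 * mean(dif == 0)     # P(M1* > M2*) + 0.5 P(M1* = M2*)
      list(ci = quantile(dif, c(alpha / 2, 1 - alpha / 2)),
           p.value = 2 * min(pstar, 1 - pstar))
    }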

    THE SAME STRATEGY IS USED WITH OTHER MEASURES OF LOCATION.

    With skewed distributions, when comparing M-estimators, this is the best approach to hypothesis testing.

    Works well when using 20% trimmed means.


  • The R function

    medpb2(x,y,alpha=0.05,nboot=2000,SEED=T)

    tests the hypothesis of equal medians using the percentile bootstrap method just described. The function also returns a 1 − α confidence interval for the difference between the population medians.

    EVEN AMONG ROBUST METHODS, THE CHOICE OF METHOD CAN MATTER

    EXAMPLE

    In an unpublished study by Dana (1990), the general goal was to investigate issues related to self-awareness and self-evaluation. In one portion of the study, he recorded the times individuals could keep an apparatus in contact with a specified target. Two independent groups were compared. Comparing 20% trimmed means, fail to reject at the 0.05 level; the 0.95 confidence interval using a bootstrap-t method is (−305.7, 10.7). Comparing medians using a non-bootstrap method (McKean-Schrader estimate of the standard errors), reject; the 0.95 confidence interval is (−441.4, −28.6), p = 0.04. But the percentile bootstrap does not reject. Using MOM and the percentile bootstrap, p = 0.47.

    EXAMPLE

    Hangover Symptoms.

    The data are from a study dealing with the effects of consuming alcohol on hangover symptoms.

    Group 1 was a control group and measures reflect hangover symptoms after consuming a specific amount of alcohol in a laboratory setting. Group 2 consisted of sons of alcoholic fathers. The sample size for both groups is 20. Comparing means, the estimated difference is 4.5,

    CI [-1.63, 10.73], p = 0.14.

    Figure 18 shows boxplots of the data. As is evident, the data are skewed with outliers. Using 20% trimmed means and a non-bootstrap method (R function yuenv2) yields an estimated difference of 3.7,

    CI [-0.456, 7.788], p = 0.076.


  • Figure 18: Boxplots of hangover symptoms (control group and sons of alcoholics).

    Note that the lengths of the confidence intervals differ substantially; the ratio of the lengths is 0.67.

    Using a percentile bootstrap, 20% trimmed mean:

    CI [0.08, 8.3], p = 0.0475.

    Lower quantiles differ.


  • 8 RANK-BASED AND NONPARAMETRIC METHODS

    All of the standard rank-based methods have been improved. Details are summarized in my books. Of particular interest are the improvements related to the Wilcoxon–Mann–Whitney test.

    COMPARING ALL QUANTILES SIMULTANEOUSLY

    Roughly, when comparing medians, the goal is to compare the central values of the two distributions. But an additional issue is how low scoring individuals in the first group compare to low scoring individuals in the second. And in a similar manner, how do relatively high scores within each group compare? A way of addressing this issue is to compare the 0.25 quantiles of both groups as well as the 0.75 quantiles. Or to get a more detailed sense of how the distributions differ, all of the quantiles might be compared. There is a method for comparing all quantiles in a manner that controls the probability of a Type I error exactly assuming random sampling only. The method was derived by Doksum and Sievers (1976) and is based on an extension of the Kolmogorov-Smirnov method. Complete computational details are not provided, but a function that applies the method is supplied and illustrated next.

    The R function

    sband(x,y, flag = F, plotit = T, xlab = "x (First Group)", ylab = "Delta")

    computes confidence intervals for the difference between the quantiles using the data stored in the R variables x and y. Moreover, it plots the estimated differences as a function of the estimated quantiles associated with the first group, the first group being the data stored in the first argument, x. This difference between the quantiles, viewed as a function of the quantiles of the first group, is called a shift function. To avoid the plot, set the argument plotit=FALSE.

    EXAMPLE

    In a study by Victoroff et al. (2008), 52 14-year-old refugee boys in Gaza were classified into one of two groups according to whether a family member had been wounded or killed by an Israeli. One issue was how these two groups compare based on a measure of depression. In particular, among boys with relatively high depression, does having a family member killed or wounded have more of an impact than among boys with relatively low measures of depression? Figure 19 shows a plot of the shift function.


  • Figure 19: A plot of the shift function based on the Gaza data. The plot indicates that among boys with low measures of depression, there is little difference between the two groups. But as we move toward subpopulations of boys with high depression, the difference between the two groups increases.

  • Figure 20: Distribution of CES-D before intervention (solid line) and after intervention.

    The shift function just illustrated is distribution free and controls the probability of one or more Type I errors when comparing all quantiles. But when dealing with the tails of the distributions or when tied values can occur, power can be relatively low. Can deal with these concerns using a recently developed method, which is covered in the 4th Ed. of my robust book.

    EXAMPLE

    COMPARE CONTROL GROUP TO EXPERIMENTAL GROUP THAT RECEIVED INTERVENTION BASED ON DEPRESSIVE SYMPTOMS.


  • 9 MEASURING EFFECT SIZE

    Cohen’s d is not robust and it assumes homoscedasticity. Methods for dealing with these two issues are summarized in my books.

    The left panel of Figure 21, which is the same as in Figure 5, shows two normal distributions where the difference between the means is 1 (µ1 − µ2 = 1) and both standard deviations are one. So

    δ = 1,

    which is often viewed as being relatively large. Now look at the right panel of Figure 21. As is evident, the difference between the two distributions appears to be very similar to the difference shown in the left panel, so according to Cohen we again have a large effect size. However, in the right panel, δ = 0.3 because these two distributions are mixed normals with variances 10.9. This illustrates the general principle that arbitrarily small departures from normality can render the magnitude of δ meaningless.

    10 Comparing Correlations and Least Squares Regression Slopes

    The goal is to test H0: ρ1 = ρ2,

    the hypothesis that the two groups have equal population correlation coefficients.

    Do not use Fisher’s r-to-z transformation. It can perform poorly under non-normality (Duncan & Layard, 1973).

    Currently, one of the more effective approaches is to use a (modified) percentile bootstrap method.

    When comparing slopes based on least squares, a wild bootstrap method can be used as well as a non-bootstrap method based in part on the HC4 estimate of the standard error, which deals well with heteroscedasticity.

    But a seemingly better approach is the so-called HC4 method. It uses a correct estimate of the standard error when there is heteroscedasticity.

    Ignoring heteroscedasticity can result in using the wrong standard error, invalidating the results.

    The R function


  • Figure 21: In the left panel, δ = 1. In the right panel, δ = 0.3, illustrating that a slight departure from normality can lower δ substantially.

  • olsJ2(x1, y1, x2, y2, xout = FALSE, outfun = outpro, plotit = TRUE, xlab = ‘X’, ylab = ‘Y’, ISO = FALSE, ...)

    can be used to test the hypothesis that two independent groups have identical least squares regression lines. Setting the argument ISO=TRUE, the slopes are compared.

    twohc4cor(x1,y1,x2,y2)

    11 MAKING DECISIONS ABOUT WHICH METHOD TO USE

    Numerous methods have been described for comparing two independent groups. How does one choose which method to use? There is no agreed upon strategy for addressing this issue, but a few comments might help.

    First, and perhaps most obvious, consider what you want to know. If, for example, there is explicit interest in knowing something about the probability that an observation from the first group is smaller than an observation from the second, use Cliff’s method or the Brunner-Munzel technique. Despite the many problems with methods for comparing means, it might be that there is explicit interest in the means, as opposed to other measures of location. If this is the case, the R function yuenbt seems to be a relatively good choice for general use (setting the argument tr=0) with the understanding that all methods for means can result in relatively low power, inaccurate confidence intervals, and unsatisfactory control over the probability of a Type I error. But if any method based on means rejects, it seems reasonable to conclude that the distributions differ in some manner. A possible argument for using other methods for comparing means is that they might detect differences in the distributions that are missed by other techniques (such as differences in skewness). If, for example, boxplots indicate that the groups have no outliers and a similar amount of skewness, a reasonable speculation is that using other measures of location will make little or no difference. But the only way to be sure is to actually compare groups with another measure of location.

    If the goal is to maximize power, a study by Wu (2002), where data from various dissertations were reanalyzed, indicates that comparing groups with a 20% trimmed mean is likely to be best, but the only certainty is that exceptions occur. Generally, the method that has the highest power depends on how the groups differ, which is unknown. With sufficiently heavy tails, methods based on the median might have more power. To complicate matters, situations are encountered where a rank-based method has the most power. Keep in mind that for skewed distributions, comparing means is not the same as comparing trimmed means or medians.

    An issue of some importance is whether it is sufficient to use a single method for summarizing how groups differ. Different methods provide different perspectives, as was illustrated. So one strategy might be to focus on a single method for deciding whether groups differ and then use other methods, such as rank-based techniques or a shift function, to get a deeper understanding of how and where the groups differ.

    A criticism of performing multiple tests is that as the number of tests performed increases, the more likely it is that at least one test will reject even when the groups do not differ.

    12 COMPARING TWO DEPENDENT GROUPS

    13 ONE-WAY AND HIGHER ANOVA DESIGNS

    Violating assumptions is even more serious compared to the one- and two-sample situations.

    All of the methods for comparing two groups in a robust manner can be extended to higher-way designs.

    Test the assumptions of normality and homoscedasticity? Not supported based on published studies. Such tests often do not have enough power to detect violations of assumptions that have practical importance.

    R functions for dealing with two-way and three-way designs, including repeated measures designs, are described in my 2012 books. New editions of these books are in progress.

    14 MULTIPLE COMPARISONS

    TWO-WAY AND THREE-WAY DESIGNS CAN BE HANDLED AS WELL, INCLUDING WITHIN SUBJECTS (REPEATED MEASURES) DESIGNS.

    15 SOME MULTIVARIATE METHODS

    15.1 Detecting Outliers

    Usual Mahalanobis distance: Suffers from masking. One of the worst possible methods.

    Need a method that uses a robust measure of location and scatter. Numerous methods have been proposed. Two that perform relatively well are a projection method and what is called the minimum generalized variance method. Substantially better methods are available that take into account the overall structure of the data.

    EXAMPLE

    Predictors of reading ability study. Figure 22 shows the points flagged as outliers plus the regression plane based on the Theil-Sen estimator. The R command is

    out3d(read[,c(3,4,8)],reg.plane=T,regfun=tsreg,xout=FALSE,xlab="SBT1",ylab="RAN1T1",zlab="WWISST2")

    Figure 23 is the same, only with the regression plane estimated ignoring the outliers. The R command is now

    out3d(read[,c(3,4,8)],reg.plane=T,regfun=tsreg,xout=TRUE,xlab="SBT1",ylab="RAN1T1",zlab="WWISST2")

    The two regression planes differ substantially.

    16 REGRESSION AND CORRELATION

    Recall from basic principles that

    √(σ̂² / Σ(Xi − X̄)²)

    estimates the standard error of b1. Note that outliers among the independent variable reduce the standard error, but outliers among the dependent variable increase it.

    REVIEW THE NOTION OF HOMOSCEDASTICITY AND HETEROSCEDASTICITY.

    Figure 24 illustrates homoscedasticity and Figure 25 illustrates heteroscedasticity.

    WHEN USING LEAST SQUARES REGRESSION, SUGGEST DEALING WITH HETEROSCEDASTICITY USING THE HC4 ESTIMATOR.

    IN PRACTICAL TERMS, USE ONE OF THE FOLLOWING R FUNCTIONS:

    The R function

    olshc4(x,y,alpha=0.05,xout=F,outfun=out)


  • Figure 22: A scatterplot of the reading data (SBT1, RAN1T1, WWISST2) based on the R function out3d.

  • Figure 23: A scatterplot of the reading data based on the R function out3d, only now outliers among the independent variables are ignored.

  • Figure 24: An example of homoscedasticity. The conditional variance of Y, given X, does not change with X.

  • Figure 25: An example of heteroscedasticity. The conditional variance of Y, given X, changes with X.

  • computes 1 − α confidence intervals and p-values for each of the individual parameters. By default, 0.95 confidence intervals are returned. Setting the argument alpha equal to 0.1, for example, will result in 0.9 confidence intervals.

    The function

    hc4test(x,y,xout=F,outfun=out)

    tests the hypothesis that all of the slopes are equal to zero. Note that both functions include the option of removing leverage points via the arguments xout and outfun.
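    For readers curious about what the HC4 correction does, here is a minimal base-R sketch of HC4 standard errors for least squares estimates, following Cribari-Neto's HC4 weights; olshc4 and hc4test are the supported WRS implementations, and this sketch is not the author's code:

    hc4.se <- function(x, y) {
      X <- cbind(1, as.matrix(x))                      # design matrix with intercept
      fit <- lm.fit(X, y)
      e <- fit$residuals
      n <- nrow(X); p <- ncol(X)
      h <- diag(X %*% solve(crossprod(X)) %*% t(X))    # leverage values
      delta <- pmin(4, n * h / p)
      omega <- e^2 / (1 - h)^delta                     # HC4 weights
      XtXinv <- solve(crossprod(X))
      V <- XtXinv %*% t(X) %*% diag(omega) %*% X %*% XtXinv
      sqrt(diag(V))                                    # HC4 standard errors (intercept first)
    }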

    17 Problems with Least Squares

    EXAMPLE

    Figure 27 shows a scatterplot of data dealing with the association between surface temperature of stars and their light intensity. The solid line is the least squares regression line using all of the data. The dashed line is the least squares regression line ignoring leverage points.

    In regression, any outlier among the X values is called a leverage point. There are two kinds: good and bad.

    BAD LEVERAGE POINTS

    Roughly, bad leverage points are outliers that can result in a misleading summary of how the bulk of the points are associated. That is, bad leverage points are not consistent with the association among most of the data.

    Note: even when no outliers are detected among the data for the independent variable, unusual points can have a substantial impact on the least squares estimator as illustrated in Figure 26.

    EXAMPLE


  • Figure 26: The two points marked by the square in the lower right corner have a substantial impact on the least squares regression line. Ignoring these two points, the least squares regression is given by the solid line. Including them, the least squares regression line is given by the dotted line. Moreover, none of the X or Y values are declared outliers using the boxplot rule or MAD-median rule.

  • Figure 27: The solid line is the least squares regression line using all of the star data (surface temperature versus light intensity). Ignoring outliers among the X values, the least squares regression line is now given by the dotted line.

  • Figure 28: The left panel shows a scatterplot of the lake data. (TN = the mean annual total nitrogen concentration. NIN = the average influent nitrogen concentration.) The bad leverage points, located in the lower right portion of the scatterplot, have a tremendous influence on the least squares estimate of the slope, resulting in missing any indication of an association when using least squares regression. The right panel shows a scatterplot of the reading data (RAN1T1 versus WWISST2). Now the bad leverage points mask a negative association among the bulk of the points.

  • Figure 29: Shown are both good and bad leverage points. Good leverage points do not mask the true association among the bulk of the points and they have the practical advantage of resulting in shorter confidence intervals.

    GOOD LEVERAGE POINTS

    Leverage points can distort the association among most of the data, but not all leverage points are bad. A crude description of a good leverage point is a point that is reasonably consistent with the regression line associated with most of the data. Figure 29 illustrates the basic idea.

    BEWARE OF DISCARDING OUTLIERS AMONG THE Y VALUES (THE DEPENDENT VARIABLE)

    This results in using an invalid estimate of the standard error.

    PEARSON’S CORRELATION

    THE FOLLOWING FEATURES OF DATA INFLUENCE THE MAGNITUDE OF PEARSON’S CORRELATION:

    • THE SLOPE OF THE LINE AROUND WHICH POINTS ARE CLUSTERED.

    • THE MAGNITUDE OF THE RESIDUALS.

    • OUTLIERS

    • RESTRICTING THE RANGE OF THE X VALUES, WHICH CAN CAUSE r TO GO UP OR DOWN.

    • Curvature.

    Regardless of how large the sample size might be, Pearson’s correlation can miss a strong association among the bulk of the participants. The left panel of Figure 30 shows a bivariate normal distribution with ρ = 0.8. In the right panel, ρ = 0.2. But look at Figure 31. The correlation appears to be high, but ρ = 0.2.

    The R function

    pcorhc4(x,y)

    has been provided to compute a confidence interval for ρ using the HC4 method, meaning that it deals with heteroscedasticity, and a p-value is returned as well.

    18 ROBUST REGRESSION AND MEASURES OF ASSOCIATION

    18.1 The Theil-Sen Estimator

    Momentarily consider a single predictor and imagine that we have n pairs of values:

    (X1, Y1), . . . , (Xn, Yn).


  • Figure 30: When both X and Y are normal, increasing ρ from 0.2 to 0.8 has a noticeable effect on the bivariate distribution of X and Y.

  • Figure 31: Two bivariate distributions can appear to be very similar yet have substantially different correlations. Shown is a bivariate distribution with ρ = 0.2, but the graph is very similar to the left panel in the previous Figure where ρ = 0.8.

  • Consider any two pairs of points for which Xi > Xj. The slope corresponding to the two points (Xi, Yi) and (Xj, Yj) is

    b1ij = (Yi − Yj)/(Xi − Xj).     (3)

    The Theil (1950) and Sen (1964) estimate of β1 is the median of all the slopes represented by b1ij, which is labeled b1ts. The intercept is estimated with

    b0ts = My − b1ts Mx,

    where My and Mx are the sample medians corresponding to the Y and X values, respectively.
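    A minimal base-R sketch of this estimator for a single predictor (the WRS functions tsreg and tshdreg below are the supported implementations):

    theil.sen <- function(x, y) {
      ij <- combn(length(x), 2)                  # all pairs of points
      dx <- x[ij[2, ]] - x[ij[1, ]]
      dy <- y[ij[2, ]] - y[ij[1, ]]
      b1 <- median((dy / dx)[dx != 0])           # median of pairwise slopes (distinct X only)
      b0 <- median(y) - b1 * median(x)           # intercept via the sample medians
      c(intercept = b0, slope = b1)
    }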

    The R function

    tsreg(x, y, xout=F, outfun=out, iter=10, varfun=pbvar, corfun=pbcor, ...)

    computes the Theil–Sen estimator. If the argument xout=T, leverage points are removed based on the outlier detection method specified by the argument outfun.

    A recent modification gives better results (e.g., higher power) when there are tied (duplicated) values among the dependent variable. Use the R function:

    tshdreg(x, y, HD = TRUE, xout = FALSE, outfun = out, iter = 10, varfun = pbvar, corfun = pbcor, plotit = FALSE, tol = 1e-04, RES = FALSE, ...)

    There are several alternative robust regression estimators that deserve serious consideration. (See Wilcox, in press.)

    Bootstrap methods are currently the best techniques for testing hypotheses about the slopes:

    regci(x, y, regfun=tsreg, xout=F, outfun=out, ...)

    regtest(x, y, regfun=tsreg, xout=F, outfun=out, ...)

    QUANTILE REGRESSION: AN EXTENSION OF LEAST ABSOLUTE VALUE REGRESSION.

    The R function

    rqfit(x,y,qval=.5,xout=F,outfun=out,res=F),


  • performs the calculations; calls the function rq. (Both rq and rqfit assume that you have installed the R package quantreg. To install this package, start R and use the command install.packages("quantreg").) One advantage of rqfit over the built-in function rq is that, by setting the argument xout=T, it will remove any leverage points found by the function indicated by the argument outfun, which defaults to the function out.
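    For reference, a direct call to quantreg looks like this (a toy example with simulated data; rqfit wraps this kind of call):

    library(quantreg)
    set.seed(1)
    x <- rnorm(50)
    y <- 2 + 0.5 * x + rnorm(50)
    fit <- rq(y ~ x, tau = 0.5)    # median (0.5 quantile) regression
    summary(fit)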

    The R function

    mdepreg(x,y)

    computes the deepest regression line. It is based on the goal of a regression line giving the median of Y given X in a manner that protects against bad leverage points.

    Other methods are least trimmed squares, S-estimators, skipped estimators, plus several others listed in Wilcox (2005).

    DO NOT ASSUME THE REGRESSION LINE IS STRAIGHT

    EXAMPLE

    In a study by C. Chen and F. Manis, one general goal was to investigate the extent to which Asian immigrants to the United States learn how to pronounce English words and sounds. Figure 32 shows a scatterplot of age versus an individual’s score on a standardized test of pronunciation. As indicated, there appears to be curvature as well as heteroscedasticity.

    Note: an improved smoother for estimating a quantile regression line is now available that can make a substantial difference in a variety of situations. Details are in the 4th ed. of my robust book. It uses a combination of the running interval smoother, the Harrell–Davis estimator and loess.

    18.2 Dealing with Curvature: Smoothers

    EXAMPLE

    For a diabetes study, the goal was to understand the association between the age of children at diagnosis and their C-peptide levels. As noted there, the hypothesis of a zero


  • Figure 32: Shown are age and an individual’s score on a pronunciation test. The lower, middle and upper regression lines indicate the predicted 0.25 quantile, the median, and the 0.75 quantile, respectively. Note that the regression lines appear to be fairly horizontal among young individuals, but as age increases, scores on the test tend to decrease, and the variation among the scores appears to increase.

  • Figure 33: A smooth created by the R function lplot using the diabetes data (age versus C-peptide). Note that there seems to be a positive association up to about the age of 7, after which there is little or no association.

    slope is rejected with the R function hc4test; the p-value is 0.034. Student’s T test of a zero slope has a p-value of 0.008. So a temptation might be to conclude that as age increases, C-peptide levels increase as well. But look at Figure 33, which shows Cleveland’s smooth. Note that for children up to about the age of 7, there seems to be a positive association. But after the age of 7, it seems that there is little or no association at all.

    EXAMPLE

    The Well Elderly 2 study by Clark et al. (2012) was generally aimed at assessing the impact of an intervention program designed to enhance the physical and emotional well being of older adults. A portion of the study dealt with the association between the cortisol awakening response and a measure of depressive symptoms, which is labeled CESD. Cortisol is a hormone produced by the adrenal cortex. The cortisol awakening response (CAR) refers to the change in cortisol upon awakening and measured again 30-60 minutes later. Here the focus is on measures taken after six months of intervention. The sample size is n = 328. Using least squares regression, which is the most commonly used method for detecting an association, no association is found. The p-value is 0.218. Using least squares regression in conjunction with a method that allows heteroscedasticity (via the R function olshc4), the p-value is 0.168. Using the Theil–Sen estimator (via the R function regci), the p-value is 0.531. Eliminating leverage points and again using the Theil–Sen estimator, the p-value is 0.242. Given the reasonably large sample size, these results might suggest that there is little or no association.

However, look at Figure 34, which shows a plot of the regression line using the running interval smoother. (The data are stored on the author's web page in the file A3B3C.txt, which can be downloaded as described in Section 1.5.) Assuming the data have been read into the R variable A3B3C, here are the R commands that were used:

dif=A3B3C$cort1-A3B3C$cort2
rplot(dif,A3B3C$CESD,xout=TRUE,xlab='CAR',ylab='CESD')

The R variable A3B3C$cort1 contains the cortisol measures upon awakening and A3B3C$cort2 contains the cortisol measures 30-60 minutes after awakening. So the first R command stores the CAR values in the R variable dif. The argument xout=TRUE indicates that observations associated with leverage points were removed. Notice that when the CAR is negative (cortisol increases after awakening), the regression line appears to be nearly horizontal. But when the CAR is positive, there appears to be a positive association with CESD (depressive symptoms). Focusing on only the data for which the CAR is positive, the p-value based on the Theil–Sen estimator is 0.038. For CAR negative, no association is found; the p-value is 0.631.
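To make the steps concrete, here is a minimal sketch of how this split analysis might be carried out, assuming the same A3B3C data frame and the dif variable created above; whether leverage points were removed (xout=TRUE) for these particular tests is an assumption, not something stated above.

pos=dif>0
neg=dif<0
regci(dif[pos],A3B3C$CESD[pos],regfun=tsreg,xout=TRUE)  # Theil-Sen slope when the CAR is positive
regci(dif[neg],A3B3C$CESD[neg],regfun=tsreg,xout=TRUE)  # Theil-Sen slope when the CAR is negative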

    EXAMPLE

The next example involves examining the relationship between quality of life and anxiety among 47 patients diagnosed with depressive and anxiety disorders. The data come from a study by McEvoy et al. (2014). Quality of Life (QOL) was measured using the Quality of Life Enjoyment and Satisfaction Questionnaire Short Form (Endicott, Nee, Harrison & Blumenthal, 1993), with scores ranging from 0 to 100, larger scores indicating greater quality of life. Anxiety was measured using the Beck Anxiety Inventory (Beck, Epstein, Brown, & Steer, 1988). Scores can range from 0 to 63, with higher scores indicative of higher levels of anxiety. The hypothesis under investigation was that as anxiety increases, quality of life should decrease in a linear fashion. The data are stored in the file anxiety.csv. Correlation coefficients suggest that there is a strong linear relationship between the two variables.


Figure 34: Shown is the smooth created by the R function rplot. The solid line reflects an estimate of the typical CESD measure (depressive symptoms) given a value for the CAR (the cortisol awakening response). Notice that there appears to be a distinct bend close to where the CAR is zero.


Pearson's correlation is -0.53 and the robust skipped correlation is -0.59. (The skipped correlation can be computed with the R function scor or the MATLAB function described by Pernet et al. 2013.) Classical (least squares) and robust regression estimators also suggest that there is a linear relationship between the two variables. Based on the classical (least squares) regression estimator (using the R function olshc4, which allows heteroscedasticity), the slope is estimated to be -0.70, CI [-1.06, -0.34]. Using the modified Theil–Sen estimator, the estimated slope is similar, -0.84, CI [-1.32, -0.41]. (Confidence intervals were computed with the R function regci and the argument regfun=tshdreg.) A scatterplot of the data and the estimate of the regression line are shown in Figure 35.

Next, we fit a regression line using lplot, which is the curved line in Figure 36. This suggests that there is a strong linear relationship between quality of life and anxiety when anxiety scores are between 0 and 20, but for anxiety scores above 20 there appears to be little or no association.

This finding can be probed further by splitting the data into two groups, according to whether anxiety scores are above or below 20, and computing regression slopes separately for each group. If the regression line is truly straight, then the slopes should be similar. If the regression line is curved, they should differ. The estimated slope (using the R function regci) is -2.15, CI [-3.17, -1.46], for anxiety scores under 20, and the strength of the association is estimated to be 0.63 (using the R function tshdreg). For scores above 20 the estimated slope is -0.06 and now the strength of the association is 0.04. (When using least squares regression and the usual sample variance, the strength of the association used here reduces to the absolute value of Pearson's correlation.) There is a significant difference between the two slopes (using the R function reg2ci and the argument est=tshdreg). The estimated difference is -2.2, CI [-3.65, -0.97], which provides additional evidence of a curvilinear association.
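A rough sketch of this split-and-compare analysis is given below. It assumes the anxiety.csv data have been read into a data frame called anx with columns anxiety and qol; the column names, the use of regfun (rather than est) to pass the modified Theil–Sen estimator, and the handling of the cutoff at 20 are illustrative assumptions, not the commands actually used.

anx=read.csv('anxiety.csv')
low=anx$anxiety<20
regci(anx$anxiety[low],anx$qol[low],regfun=tshdreg)    # slope for anxiety scores below 20
regci(anx$anxiety[!low],anx$qol[!low],regfun=tshdreg)  # slope for anxiety scores of 20 or more
reg2ci(anx$anxiety[low],anx$qol[low],anx$anxiety[!low],anx$qol[!low],regfun=tshdreg)  # compare the two slopes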

    EXAMPLE

Consider again the Well Elderly study, only now the goal is to predict a measure of meaningful activities, labeled MAPA, using two independent variables: the CAR (the cortisol awakening response) and CESD (a measure of depressive symptoms). Using least squares regression, the slope associated with the CAR is not significant using the R function olshc4; the p-value is 0.787 (with leverage points removed). So when taking CESD into account, the usual regression model does not indicate any association between the CAR and MAPA. Also, a smooth indicates a fairly straight, horizontal regression line between MAPA and the CAR (ignoring CESD), and no association is found using least squares or the Theil–Sen estimator when ignoring CESD. Moreover, no association between MAPA and the CAR is found when focusing on only the CAR values greater than zero. That is, unlike depressive symptoms, no association is found between the CAR and MAPA when cortisol decreases shortly after awakening.


Figure 35: The straight line is the estimate of the regression line using a modification of the Theil–Sen estimator.


    Figure 36: The LOESS regression line


(The same is true when CAR is negative.) In summary, all of the analyses just described find no association between MAPA and the CAR.

However, look at Figure 37, which shows the regression surface for predicting MAPA given a value for CAR and CESD. Notice the distinct bend close to where CESD is 16. The nature of the association appears to depend crucially on whether depressive symptoms are high or low. Applying least squares regression again, but now using only the data for which CESD is less than 16, both slopes for predicting MAPA are significant when testing at the 0.05 level. For the slope associated with CAR the p-value is 0.022 and for the slope associated with CESD the p-value is less than 0.001. Here are the R commands that were used to compute these p-values, again assuming the data have been read into the R variable A3B3C:

dif=A3B3C$cort1-A3B3C$cort2
flag2=A3B3C$CESD<16
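These two commands only create the CAR values and an indicator of CESD scores below 16. A hedged sketch of the remaining step, assuming the MAPA scores are stored in a column called A3B3C$MAPA (a name not given here) and that leverage points are again removed via xout=TRUE, might look like this:

olshc4(cbind(dif,A3B3C$CESD)[flag2,],A3B3C$MAPA[flag2],xout=TRUE)  # slopes and p-values for CAR and CESD when CESD < 16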


Figure 37: Shown is the smooth created by the R function lplot using the CAR and CESD to predict the typical MAPA score. Notice the distinct bend close to where CESD is equal to 16. Focusing on only those participants who have a CESD score less than 16, an association is found between CAR and MAPA, in contrast to an analysis where any possibility of curvature is ignored.


An R function is also available that does a smooth for one or more quantiles using the running interval smoother in conjunction with the Harrell–Davis estimator.

18.3 COMPARING THE SLOPES OF TWO INDEPENDENT GROUPS BASED ON A ROBUST ESTIMATOR

    Consider two independent groups and imagine that the goal is to test

    H0 : β11 = β12, (4)

the hypothesis that the two groups have equal slopes. A percentile bootstrap method can be used, and heteroscedasticity is allowed.

    THE R FUNCTION

    reg2ci(x1, y1, x2, y2, regfun=tsreg, nboot=599, alpha=0.05, plotit=T)

    compares the slopes of two groups.
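As a minimal illustration with simulated data (the data, sample sizes and slopes are hypothetical and serve only to show the form of the call):

set.seed(1)
x1=rnorm(40); y1=0.5*x1+rnorm(40)  # hypothetical group 1
x2=rnorm(40); y2=1.5*x2+rnorm(40)  # hypothetical group 2
reg2ci(x1,y1,x2,y2,regfun=tsreg,nboot=599)  # percentile bootstrap CI for the difference between the Theil-Sen slopes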

    19 Robust Measures of Association

    TWO TYPES: M AND O.

    M: guard against outliers among the marginal distributions

O: guard against outliers in a manner that takes into account the overall structure of the data

19.1 MEASURING THE STRENGTH OF AN ASSOCIATION BASED ON A ROBUST FIT

η² = τ²(Ŷ)/τ²(Y), (5)

where Ŷ denotes the predicted value of Y based on some robust regression fit and τ² is some robust measure of variation. The strength of the association is η, the square root of this explanatory measure.
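One way this might be computed, shown only as a sketch for a single predictor, is to take the fitted values from a robust fit and apply some robust measure of variation to both the fitted and the observed values; here the 20% Winsorized variance (the R function winvar) plays the role of τ², which is one choice among several, and the names x and y are generic.

fit=tsreg(x,y)                   # Theil-Sen fit; the coefficients are assumed to be in fit$coef
yhat=fit$coef[1]+fit$coef[2]*x   # fitted values based on the robust fit
eta.sq=winvar(yhat)/winvar(y)    # explanatory measure of association; eta is its square root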

    19.2 Which Predictors Are Most Important?

    Consider the usual linear model

Y = β0 + ΣβjXj + e, (6)


Can use LASSO, least angle regression and the 0.632 bootstrap to identify the more important IVs (independent variables)

HOW STRONG IS THE EMPIRICAL EVIDENCE THAT THE MOST IMPORTANT INDEPENDENT VARIABLES HAVE BEEN IDENTIFIED?

How strong is the empirical evidence that the first independent variable, for example, is more important than the second independent variable? When there are three independent variables, how strong is the empirical evidence that the first two are more or less important than the third? Moreover, how can these questions be addressed in a manner that allows heteroscedasticity and simultaneously deals with skewed distributions and outliers?

Can address this with methods in Wilcox (2017, section 11.10.6): use the R function regIVcom.
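A rough sketch of the call is given below; the argument names IV1 and IV2 (the indices of the two independent variables being compared) are assumptions, so check the function itself or Wilcox (2017) for the exact arguments.

# x: matrix with one column per independent variable; y: the outcome
regIVcom(x,y,IV1=1,IV2=2)  # compare the relative importance of the first two IVs (argument names assumed)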

    19.3 Tests for Linearity

    The R function

    lintest(x,y,regfun=tsreg,nboot=500,alpha=0.05)

    tests the hypothesis that a regression surface is a plane. It uses a wild bootstrap method.

    The R function

    lintestMC(x,y,regfun=tsreg,nboot=500,alpha=0.05)

is the same as lintest, only it takes advantage of a multicore processor, if one is available, with the goal of reducing execution time.
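A minimal sketch of a call to lintest, using simulated data with curvature in the second predictor (the data are hypothetical and only illustrate the form of the call):

set.seed(1)
x=matrix(rnorm(200),ncol=2)          # two predictors, n = 100
y=x[,1]+x[,2]^2+rnorm(100)           # curvature in the second predictor
lintest(x,y,regfun=tsreg,nboot=500)  # test the hypothesis that the regression surface is a plane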

    20 MODERATOR ANALYSIS

    Standard approach uses least squares assuming that

    Y = β0 + β1X1 + β2X2 + β3X1X2 + e. (7)

    A more flexible approach for establishing that there is an interaction is to test

    H0 : Y = β0 + f1(X1) + f2(X2) + e, (8)


the hypothesis that for some unknown functions f1 and f2, a generalized additive model fits the data, versus the alternative hypothesis

    H1 : Y = β0 + f1(X1) + f2(X2) + f3(X1, X2) + e.

    The R function

    adtest(x, y, nboot=100, alpha=0.05, xout=F, outfun=out, SEED=T, ...)

    tests this hypothesis.

    EXAMPLE

A portion of a study conducted by Shelley Tom and David Schwartz dealt with the association between a Totagg score and two predictors: grade point average (GPA) and a measure of academic engagement. The Totagg score was a sum of peer nomination items that were based on an inventory that included descriptors focusing on adolescents' behaviors and social standing. (The peer nomination items were obtained by giving children a roster sheet and asking them to nominate a certain number of peers who fit particular behavioral descriptors.) The sample size is n = 336. Assuming that the model given by Equation (7) is true, the hypothesis of no interaction (H0: β3 = 0) is not rejected using the least squares estimator. The p-value returned by the R function olswbtest is 0.6. (And the p-value returned by olshc4 is 0.64.) But look at the left panel of Figure 38, which shows the plot of the regression surface assuming Equation (7) is true. (This plot was created with the R function ols.plot.inter.) And compare this to the right panel, which is an estimate of the regression surface using LOESS (created by the R function lplot). This suggests that using the usual interaction model is unsatisfactory for the situation at hand. The R function adtest returns a p-value less than 0.01, indicating that an interaction exists.
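The analyses in this example might be reproduced along the following lines, assuming the data are in a data frame dat with columns GPA, engage and totagg; these names are assumptions, as is using olshc4 (rather than olswbtest) for the product-term test.

x=cbind(dat$GPA,dat$engage)
olshc4(cbind(x,dat$GPA*dat$engage),dat$totagg)  # p-value for the product term (H0: beta3 = 0)
adtest(x,dat$totagg)                            # test the additive model against the interaction alternative
ols.plot.inter(x,dat$totagg)                    # surface implied by the product-term model (left panel)
lplot(x,dat$totagg)                             # LOESS approximation of the regression surface (right panel)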

    21 MEDIATION ANALYSIS

    SEE MY ROBUST BOOK FOR DETAILS

    22 ANCOVA

Methods are now available that allow both types of heteroscedasticity, non-parallel regression lines and outliers. There are even substantially improved methods for dealing with curvature in a very flexible manner.


Figure 38: The left panel shows the plot created by ols.plot.inter, which assumes that an interaction can be modeled with Y = β0 + β1X1 + β2X2 + β3X1X2 + e and where the least squares estimate of the parameters is used. The right panel shows an approximation of the regression surface based on the R function lplot.


The practical importance of the latter methods cannot be stressed enough.

    Too many options to explain quickly

    EXAMPLE

Do males and females differ in terms of depressive symptoms when a measure of interpersonal support (PEOP) is used as a covariate? Classic ANCOVA: no. Assuming straight regression lines and using a robust regression estimator: no. But when using a smoother that deals with curvature in a flexible manner: yes. For PEOP relatively low, females tend to have higher depressive symptoms. Figure 39 shows the plot created by the R function ancova.
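A minimal sketch of the call, with hypothetical variable names: peop.m and cesd.m are the covariate and CESD scores for the males, peop.f and cesd.f are the corresponding values for the females.

ancova(peop.m,cesd.m,peop.f,cesd.f)  # compare the two groups at several covariate values based on a running interval smoother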

Even with a single covariate, many new and improved ANCOVA methods have been added to my R package. Details are in the 4th ed. of my book on robust methods.

Well Elderly 2 study, SF36 before and after intervention. Use the cortisol awakening response as a covariate. No difference based on medians, or using the obvious parametric linear models. But looking at the 20% trimmed mean using a percentile bootstrap method, differences are found when the CAR is negative (cortisol increases after awakening). In contrast, assuming the regression lines are straight and using the R function Dancts, no significant differences are found.

In a similar manner, significant differences are found using CESD (a measure of depressive symptoms) as the dependent variable.

    22.1 Multiple Covariates

There are various ways multiple covariates might be handled, some of which have been developed just in the last few years, but no details are given here. They include both parametric regression models as well as methods based on smoothers (Wilcox, 2017, Chapter 12).

Data from the Well Elderly 2 study (Clark et al., 2011; Jackson et al., 2009) are used to illustrate an ANCOVA method for two covariates when there is curvature. A general goal in the Well Elderly 2 study was to assess the efficacy of an intervention strategy aimed at improving the physical and emotional health of older adults. A portion of the study was aimed at understanding the impact of intervention on a measure of self-perceived physical health and mental well-being, which was based on the Rand 36-item (SF36) health survey (Hays, 1993; McHorney et al., 1993). Higher scores reflect greater perceived health and well-being. There were two covariates. The first is a measure of depressive symptoms based on the Center for Epidemiologic Studies Depression Scale (CESD).


Figure 39: A comparison of males (solid line) and females (dotted line) based on a measure of depressive symptoms using a measure of personal support as a covariate.


Figure 40: Regression lines for predicting perceived health based on the cortisol awakening response. The solid line is the estimated regression line prior to intervention and the points prior to intervention are indicated by an 'o.'


Figure 41: The 0.75 quantile regression lines for depressive symptoms (CES-D) before (solid line) and after intervention. The points after intervention are indicated by a +.


Figure 42: Regression surface predicting the typical difference in SF36 scores as a function of the CAR and CESD.

The CESD (Radloff, 1977) is sensitive to change in depressive status over time and has been successfully used to assess ethnically diverse older people (Lewinsohn et al., 1988; Foley et al., 2002). Higher scores indicate a higher level of depressive symptoms. The other covariate was the cortisol awakening response (CAR), which is defined as the change in cortisol concentration that occurs during the first hour after waking from sleep. Extant studies (e.g., Clow et al., 2004; Steptoe, 2007; Chida & Steptoe, 2009) indicate that measures of stress are associated with the CAR. (The CAR is taken to be the cortisol level upon awakening minus the level of cortisol after the participants were awake for about an hour.) The sample size for the control group was 187 and the sample size for the group that received intervention was 228. Figure 42 shows the regression surface for predicting the typical difference in SF36 scores for a control group versus an experimental group.

23 SOME FINAL COMMENTS ON HOW TO PROCEED

Rigid rules about how to gain multiple perspectives seem unwise. Technology keeps changing and improving, and substantive issues might dictate to some extent which perspectives are relatively useful. However, some rough guidelines can be offered:

1. Whenever possible, plot the data. Error bars are popular and they provide some useful information, but they reflect a rather narrow feature of the data. Boxplots, estimates of the distributions, the shift function, and smoothers can be invaluable.

2. Be careful about using the mean to the exclusion of all other measures of central tendency. From a substantive point of view, situations might be encountered where the mean is the most meaningful measure or there might be a reason to focus on a particular measure such as the median. As illustrated here, different measures of central tendency can broaden our understanding of data.

3. Consider methods aimed at comparing multiple quantiles with the goal of getting a more detailed understanding of how distributions differ. (For discrete data where the cardinality of the sample space is relatively small, consider the R function binband, which is similar in spirit to the shift functions that were described.)

4. Consider measures of effect size other than Cohen's d. If they paint a different picture regarding how groups compare, the reasons for the differences need to be understood.

5. Be aware that the method used to detect outliers can make a practical difference. This is particularly true when dealing with multivariate data.


6. When dealing with regression, consider measures of association beyond the obvious three choices: Pearson, Kendall's tau and Spearman's rho. Skipped correlations, as well as measures of association based on some robust regression estimator, are among the possibilities.

7. When dealing with regression, consider robust estimators, including methods that deal with curvature in a flexible manner. For more details regarding the relative merits of the estimators that might be used, see Wilcox (2017, chapters 10–12).

8. Routinely use methods that allow heteroscedasticity. Homoscedastic methods are satisfactory in terms of testing the hypothesis that groups have identical distributions or that variables are independent. But when trying to understand how groups differ and how variables are related, there is the concern that typically, homoscedastic methods use the wrong standard error, which might have a substantial impact when testing hypotheses or computing confidence intervals. Heteroscedasticity can now be accommodated when using any of the usual ANOVA designs as well as all of the robust regression estimators that have been derived.

9. Be cautious with models. As Huber (2011, p. 31) puts it: "Insight is gained by thinking in models, but reliance on models can prevent insight." An example is the model where groups have normal distributions and differ only in terms of their means. This model leads naturally to Cohen's d, but it can prevent insight as was illustrated.


