
BASICS OF WET STATISTICS

SETAC Expert Advisory Panel: Performance Evaluation and Data Interpretation

GRAPH THE DATA

[Figure: Response (% Effect, axis 0 to 80) vs. Concentration (% Effluent, 0 to 6), showing raw data and means]

ANALYZE DATA FOLLOWING EPA WET STATISTICAL FLOWCHARTS

• Hypothesis Tests
  – NOAEC (Acute)
  – NOEC (Chronic)

• Point Estimation
  – LC50 (Acute)
  – EC25 or IC25 (Chronic)

PURPOSE OF HYPOTHESIS TESTS AND BASIC CONSIDERATIONS

• Purpose - Determine if two things (responses) are different

• Relevance of initial (control) condition(s)

• Power of statistical test

EFFECTS ASSOCIATED WITH THE NOEC IN FATHEAD MINNOW GROWTH DATA

[Figure: Effect at NOEC (%, axis -10 to 25) by test number (axis 0 to 16)]

EPA HYPOTHESIS TEST FLOWCHART (MULTI-CONC)

• Test assumptions of ANOVA
  – Transform data if necessary
  – Normally distributed data
    • Shapiro-Wilk test
  – Variance is equal
    • Bartlett’s test

• Select appropriate test
  – Parametric tests
    • Assumptions met
  – Non-parametric tests
    • Assumptions NOT met
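A minimal sketch of the Bartlett's test statistic named in the flowchart, using only the Python standard library. The function name `bartlett_statistic` is hypothetical. The statistic is compared with a chi-square critical value on k - 1 degrees of freedom at whatever significance level the methods manual specifies; if SciPy is available, `scipy.stats.chi2.ppf` gives that critical value and `scipy.stats.bartlett` runs the whole test.

```python
import math
from statistics import variance


def bartlett_statistic(groups):
    """Bartlett's chi-square statistic for equality of variances.

    groups: list of replicate lists, one per treatment (control included).
    Returns (statistic, degrees_of_freedom). Every group needs at least two
    replicates and a non-zero variance, otherwise the logarithm is undefined.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    variances = [variance(g) for g in groups]  # sample variances (n - 1 divisor)

    # Pooled within-treatment variance (the ANOVA mean square error).
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)

    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction, k - 1


# Example: three treatments with four replicates each.
stat, df = bartlett_statistic([[10, 11, 12, 13], [9, 11, 13, 15], [10, 10.5, 11, 11.5]])
print(f"Bartlett statistic = {stat:.3f} on {df} df")
```

Equal sample variances give a statistic of zero; the more the variances differ, the larger it grows.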

MULTIPLE CONCENTRATION PARAMETRIC TESTS

• Dunnett’s Test
  – Equal number of replicates in each treatment

• Multiple t-tests with Bonferroni adjustment
  – Unequal number of replicates in each treatment
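When replicate numbers differ, the slide calls for multiple t-tests with a Bonferroni adjustment. The sketch below (hypothetical function name; the pooled ANOVA error term is assumed as the variance estimate) computes one t statistic per treatment and divides alpha by the number of comparisons. Each t would then be compared, one-sided, with a t-table critical value at the adjusted alpha and the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def bonferroni_t_tests(control, treatments, alpha=0.05):
    """Compare each treatment with the control using t statistics built on the
    pooled within-treatment variance (the ANOVA mean square error).

    Returns (t statistics, Bonferroni-adjusted alpha, error degrees of freedom).
    A positive t means the treatment mean is below the control mean.
    """
    groups = [control] + list(treatments)
    df = sum(len(g) for g in groups) - len(groups)
    mse = sum((len(g) - 1) * variance(g) for g in groups) / df

    t_stats = []
    for trt in treatments:
        se = math.sqrt(mse * (1 / len(control) + 1 / len(trt)))
        t_stats.append((mean(control) - mean(trt)) / se)

    per_comparison_alpha = alpha / len(treatments)  # Bonferroni adjustment
    return t_stats, per_comparison_alpha, df


# Unequal replication: 4, 4 and 5 replicates.
ts, adj_alpha, df = bonferroni_t_tests([10, 10, 12, 12], [[10, 10, 12, 12], [6, 8, 6, 8, 7]])
print([round(t, 2) for t in ts], adj_alpha, df)
```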

MULTIPLE CONCENTRATION NON-PARAMETRIC TESTS

• Steel’s Many-one Rank Test
  – Equal number of replicates in each treatment

• Wilcoxon Rank Sum
  – Unequal number of replicates in each treatment
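Both non-parametric tests above rest on ranking a treatment together with the control. A sketch of that ranking step (tied values share the average rank) and of the treatment rank sum; the function names are hypothetical, and the critical values still come from the tables in the methods manuals. With a response such as survival or growth, a small treatment rank sum points to a reduction relative to the control.

```python
def average_ranks(values):
    """Rank values 1..n (smallest first); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def rank_sum(control, treatment):
    """Sum of the treatment's ranks when control and treatment are ranked together."""
    ranks = average_ranks(list(control) + list(treatment))
    return sum(ranks[len(control):])


# Proportion surviving per replicate
print(rank_sum([0.9, 1.0, 0.95, 1.0], [0.6, 0.7, 0.9, 0.65]))  # 10.5
```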

PASS/FAIL TESTS

• Control and critical concentration (IWC)

• Test assumptions
  – Transformations - Arc sine square root
  – Normality - Shapiro-Wilk test
  – Homogeneity - F-test

• Test for statistical difference
  – Normal/homogeneous - t-test
  – Non-normal - Wilcoxon rank sum test
  – Normal/heterogeneous - Modified t-test
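The pass/fail steps can be sketched as follows: an arc sine square root transformation of proportions, an F ratio (larger over smaller variance) for homogeneity, and a pooled-variance t statistic for the control vs. IWC comparison. Function names and the replicate values are hypothetical; the Shapiro-Wilk step, the modified t-test, and the critical values are left out of this sketch.

```python
import math
from statistics import mean, variance


def arcsine_sqrt(p):
    """Arc sine square root transformation of a proportion (radians)."""
    return math.asin(math.sqrt(p))


def f_ratio(a, b):
    """F statistic for homogeneity: larger sample variance over smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def pooled_t(control, iwc):
    """Two-sample t statistic with pooled variance and its degrees of freedom.

    A positive t means the IWC mean is below the control mean.
    """
    n1, n2 = len(control), len(iwc)
    sp2 = ((n1 - 1) * variance(control) + (n2 - 1) * variance(iwc)) / (n1 + n2 - 2)
    t = (mean(control) - mean(iwc)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Proportion surviving in each replicate, transformed before testing.
control = [arcsine_sqrt(p) for p in (0.9, 1.0, 0.9, 1.0)]
iwc = [arcsine_sqrt(p) for p in (0.6, 0.7, 0.5, 0.6)]
t, df = pooled_t(control, iwc)
print(f"F = {f_ratio(control, iwc):.2f}, t = {t:.2f} on {df} df")
```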

PURPOSE OF POINT-ESTIMATION AND BASIC CONSIDERATIONS

• Describe relationship between two parameters

• Selection of a significant response

• Elucidation of relationship

• Confidence in relationship

EPA POINT-ESTIMATE METHOD SELECTION

• Binomial Data
  – Probit
  – Spearman-Karber
    • Untrimmed or trimmed
  – Graphical

• Continuous Data
  – ICp / Linear Interpolation

PROBIT ANALYSIS

• Binomial data only (two choices)
  – Dead or alive, normal/abnormal, etc.
• Normally distributed
• Adjusted for control mortality
  – Abbott’s correction
• At least two partial mortalities
• Sufficient fit
  – Chi-square test for heterogeneity
• Designed for LC50/EC50 and confidence intervals
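Abbott's correction, listed under probit analysis, rescales a treatment's response so that the control response becomes the baseline: (P - C) / (1 - C). A minimal sketch with a hypothetical function name; clamping negative values to zero is a common convention assumed here, not something stated on the slide.

```python
def abbott_correct(p_treatment, p_control):
    """Abbott's correction for control response.

    Returns the proportion responding in a treatment after adjusting for the
    proportion responding in the control: (P - C) / (1 - C).
    """
    if not 0 <= p_control < 1:
        raise ValueError("control response must be at least 0 and below 1")
    corrected = (p_treatment - p_control) / (1 - p_control)
    return max(0.0, corrected)  # assumption: negative values are set to zero


# 55% mortality in a treatment with 10% control mortality
print(round(abbott_correct(0.55, 0.10), 3))  # 0.5
```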

SPEARMAN-KARBER

• Nonparametric model
• Monotonic concentration response
  – Smoothing
• Adjusted for control mortality
• Zero response in the lowest concentration
• 100% response in the highest concentration
• Calculates LC50/EC50
• Confidence interval calculation requires at least one partial response
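The untrimmed Spearman-Karber estimate can be sketched as below, assuming responses have already been adjusted for control mortality (for example with Abbott's correction). The estimator is log10 LC50 = Σ (p_{i+1} - p_i)(x_i + x_{i+1})/2, where x_i is the log10 concentration and p_i the smoothed response proportion. Smoothing here uses a pool-adjacent-violators average weighted by organisms exposed, which may differ in detail from the EPA program; function names are hypothetical and the confidence interval is not computed.

```python
import math


def pool_adjacent_violators(props, weights):
    """Smooth proportions so they never decrease as concentration increases.

    Any run that goes down is replaced by its weighted average (weights are
    the numbers of organisms exposed).
    """
    blocks = []  # each block: [weighted sum, total weight, treatments pooled]
    for p, w in zip(props, weights):
        blocks.append([p * w, w, 1])
        # Merge backwards while the previous block's average exceeds this one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, wt, count = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += wt
            blocks[-1][2] += count
    smoothed = []
    for s, wt, count in blocks:
        smoothed.extend([s / wt] * count)
    return smoothed


def spearman_karber_lc50(concs, dead, exposed):
    """Untrimmed Spearman-Karber LC50.

    concs: test concentrations (> 0, control excluded), in increasing order.
    dead, exposed: responding and exposed organisms per concentration, already
    adjusted for control mortality.
    """
    props = pool_adjacent_violators([d / n for d, n in zip(dead, exposed)], exposed)
    if props[0] != 0 or props[-1] != 1:
        raise ValueError("needs 0% response at the lowest and 100% at the highest "
                         "concentration; the trimmed method handles other cases")
    x = [math.log10(c) for c in concs]
    # Each increase in response is weighted by the midpoint of its log10 interval.
    log_lc50 = sum((props[i + 1] - props[i]) * (x[i] + x[i + 1]) / 2
                   for i in range(len(x) - 1))
    return 10 ** log_lc50


print(spearman_karber_lc50([1, 2, 4, 8], [0, 5, 10, 10], [10, 10, 10, 10]))
```

In the example the response rises symmetrically around 2 on the log scale, so the estimate comes out at about 2.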

TRIMMED SPEARMAN-KARBER

• Same basic procedure as Spearman-Karber

• Requires at least 50% mortality in one concentration

• The trimming procedure is employed when the zero and/or 100% response requirements of the Spearman-Karber method are not met.

GRAPHICAL METHOD

• Specifics
  – Nonparametric procedure
  – Adjusted for control mortality
  – Monotonic concentration response
    • Smoothing
  – Linear interpolation of “all or nothing” response
  – Calculates LC50/EC50; no CIs
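With an all-or-nothing response, interpolating to 50% between the highest concentration with no response and the lowest with complete response lands halfway between them. This sketch assumes the interpolation is done on a log10 concentration axis, so the result is the geometric mean of the two concentrations; the function name is hypothetical.

```python
import math


def graphical_lc50(concs, props):
    """LC50 for an all-or-nothing response (no partial mortalities).

    Interpolates to 50% between the highest concentration with 0% response
    and the lowest with 100% response on a log10 concentration axis, which
    is the geometric mean of the two concentrations.
    """
    if any(0 < p < 1 for p in props):
        raise ValueError("partial responses present; use another method")
    zero = max(c for c, p in zip(concs, props) if p == 0)
    full = min(c for c, p in zip(concs, props) if p == 1)
    if zero >= full:
        raise ValueError("response is not monotonic")
    return 10 ** ((math.log10(zero) + math.log10(full)) / 2)


print(round(graphical_lc50([6.25, 12.5, 25, 50, 100], [0, 0, 0, 1, 1]), 1))  # 35.4
```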

INHIBITION CONCENTRATION (ICp)

• Specifics
  – Nonparametric procedure
  – Calculates any effect level
  – Monotonic concentration response
    • Smoothing
  – Random, independent, and representative data
  – Piecewise linear interpolation
  – Bootstrapped confidence intervals
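The ICp steps (smooth the treatment means so they never rise with concentration, then interpolate linearly to the concentration giving a p% reduction from the control mean) can be sketched as follows. The interpolation is ICp = C_J + [M_1(1 - p/100) - M_J](C_{J+1} - C_J)/(M_{J+1} - M_J), where M_1 is the smoothed control mean and C_J, C_{J+1} bracket the target response. This is our reading of the EPA chronic manuals; the smoothing may differ in detail from the EPA program, the bootstrapped confidence interval is omitted, and the function names are hypothetical.

```python
def smooth_nonincreasing(means, weights):
    """Average adjacent means (weighted by replicate counts) wherever a mean is
    higher than the mean at the next lower concentration."""
    blocks = []  # each block: [weighted sum, total weight, treatments pooled]
    for m, w in zip(means, weights):
        blocks.append([m * w, w, 1])
        while len(blocks) > 1 and blocks[-1][0] / blocks[-1][1] > blocks[-2][0] / blocks[-2][1]:
            s, wt, count = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += wt
            blocks[-1][2] += count
    smoothed = []
    for s, wt, count in blocks:
        smoothed.extend([s / wt] * count)
    return smoothed


def icp(concs, means, p=25.0, weights=None):
    """ICp by linear interpolation between the two concentrations that bracket
    a p% reduction from the smoothed control mean.

    concs[0] is the control (0) and concs are increasing. Returns None when the
    smoothed means never fall below the target (ICp above the highest
    concentration tested).
    """
    weights = weights or [1] * len(means)
    m = smooth_nonincreasing(means, weights)
    target = m[0] * (1 - p / 100)
    for j in range(len(m) - 1):
        if m[j] >= target > m[j + 1]:
            return concs[j] + (target - m[j]) * (concs[j + 1] - concs[j]) / (m[j + 1] - m[j])
    return None


print(icp([0, 10, 20, 40], [1.0, 0.9, 0.6, 0.2], p=25))  # approximately 15
```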

SOFTWARE PROGRAMS

• Many software packages/programs are available

• DO NOT assume they follow the EPA recommended analysis

• DO verify the software by running example datasets from the methods manuals

DO THE RESULTS MAKE SENSE?

[Figure: Percent effect (axis 0 to 80) vs. concentration, showing raw data, means, the probit fit, the % MSD, and the EC25]

TOXIC UNITS IN WET TESTS

• Goals
  1) Standardize the results of toxicity tests so they can be applied like chemical-specific criteria.
  2) Create a reporting value that increases with sample toxicity.

DEFINITIONS OF TU VALUES

• Acute
  – TUa = 100/LC50 OR

• Chronic
  – TUc = 100/NOEC

• where the NOEC is defined by hypothesis testing, or the IC25 is used in its place
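The TU definitions translate directly into code. A trivial sketch with hypothetical function names, showing that a lower LC50 or NOEC (a more toxic sample) yields more toxic units, which is the second goal listed for toxic units.

```python
def tu_acute(lc50_percent):
    """Acute toxic units: TUa = 100 / LC50, with the LC50 in % effluent."""
    return 100.0 / lc50_percent


def tu_chronic(noec_percent):
    """Chronic toxic units: TUc = 100 / NOEC (or 100 / IC25 if that is used)."""
    return 100.0 / noec_percent


# A lower effect concentration (a more toxic sample) gives more toxic units.
print(tu_acute(25.0), tu_chronic(50.0), tu_chronic(12.5))  # 4.0 2.0 8.0
```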

SUMMARY OF THE ANALYSIS OF WET DATA

• STEP 1: Graph The Data

• STEP 2: Analyze The Data By EPA Methods

• STEP 3: Do The Results Make Sense?

ANALYSIS OF MULTIPLE CONTROL TOXICITY TESTS

WHAT IS A CONTROL SAMPLE ?

• A treatment in a toxicity test that duplicates all the conditions of the exposure treatments but contains no test material. The control is used to determine the absence of toxicity of basic test conditions (e.g. health of test organisms, quality of dilution water). Rand and Petrocelli, 1985.

WHAT IS A REFERENCE SAMPLE?

• “A reference sample is the ‘control’ by which to gauge the instream effects of a discharge at a particular site.” (Grothe et al., 1996)
  – Site-specific
  – Ecoregional

WHEN ARE MULTIPLE CONTROLS USED?

• When manipulations are made to SOME of the test concentrations or treatments.

• To compare “standard” and “alternative” methods.

• When testing control and/or reference samples in which the quality is unknown.

• When a sample used for toxicity testing possesses physico-chemical properties significantly different from the water in which surrogate test organisms were cultured.

• TIEs - Toxicity Identification Evaluations.

Example #1
• When manipulations are made to SOME of the test concentrations or treatments.

BRINE ADDITION IN MARINE TESTS

Concentration | Effluent Volume (0 ppt) | Brine Volume (68 ppt) | Seawater Volume (34 ppt) | Salinity
Seawater Control | 0 ml | 0 ml | 1000 ml | 34 ppt
1.25 % | 12.5 ml | 0 ml | 987.5 ml | 34 ppt
2.5 % | 25 ml | 0 ml | 975 ml | 33 ppt
5 % (IWC) | 50 ml | 0 ml | 950 ml | 32 ppt
10 % | 100 ml | 100 ml | 800 ml | 34 ppt
20 % | 200 ml | 200 ml | 600 ml | 34 ppt
Brine Control + 200 ml | 0 ml | 200 ml | 600 ml | 34 ppt
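The salinity column of the brine-addition table follows from a volume-weighted average, assuming salinity mixes conservatively. A sketch with a hypothetical helper name: it reproduces, for example, the 32 ppt at the IWC (no brine) and the 34 ppt at 10% effluent (brine added). The brine-control row lists 800 ml of brine and seawater plus an extra 200 ml whose salinity is not stated; if that 200 ml is salt-free water (an assumption), the same calculation also gives 34 ppt.

```python
def mixed_salinity(components):
    """Salinity (ppt) of a mixture as the volume-weighted average of its parts.

    components: list of (volume_ml, salinity_ppt) pairs. Assumes salinity
    mixes conservatively.
    """
    total_volume = sum(v for v, _ in components)
    return sum(v * s for v, s in components) / total_volume


# 5% (IWC): 50 ml effluent at 0 ppt + 950 ml seawater at 34 ppt
print(round(mixed_salinity([(50, 0), (950, 34)]), 1))  # 32.3
# 10%: 100 ml effluent + 100 ml brine at 68 ppt + 800 ml seawater
print(round(mixed_salinity([(100, 0), (100, 68), (800, 34)]), 1))  # 34.0
```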

ANALYSIS OF TWO-CONTROL TOXICITY TESTS WHEN SOME CONCENTRATIONS WERE MANIPULATED

• Both controls valid?
  – Yes: Control t-test non-significant?
    • Yes: Pool controls and analyze all data using EPA flowcharts
    • No: Analyze IWC and like-treated concs. and control using EPA flowcharts
  – No: IWC-treated control valid?
    • Yes: Analyze IWC and like-treated concs. and control using EPA flowcharts
    • No: Repeat test

WHEN ARE MULTIPLE CONTROLS USED?

Example #2
• To compare “standard” and “alternative” methods.
• To determine treatment effects.

EFFECT OF KELP STORAGE ON SENSITIVITY TO COPPER

[Figure: copper concentration (ppb) EC values for fresh vs. stored kelp at effect levels 1, 5, 10, 15, and 25%, and the change in EC values (stored minus fresh; ppb Cu) at EC1, EC5, EC10, EC15, and EC25, with asterisks marked on some effect levels]


Example #3
• When testing control and/or reference samples in which the quality is unknown.
  – Use of a reference not previously tested (ambient).
  – Quality of reference may vary from season to season (ambient).
  – When the potential exists for a sample to be impacted or impaired.

EFFECT OF A NON-POINT DISCHARGE ON AN INSTREAM DILUTION WATER

[Figure: C. dubia control survival (%, axis 0 to 120) for the lab control and upstream water on test dates Apr-96, May-96, Jun-96, Aug-96, and Dec-96]


Example #4
• When a sample used for toxicity testing possesses physico-chemical properties significantly different from the water in which surrogate test organisms were cultured
  – As a natural phenomenon
  – Due to sample manipulation


Example #5
• TIEs - Toxicity Identification Evaluations.
  – Methods require the use of multiple controls, called “blanks,” which are dilution water subjected to exactly the same manipulations.

TAKE HOME POINTS

• Multiple negative controls are a good idea if:
  – New reference or control sample.
  – Performing any sample manipulations.
  – Comparing “standard” vs. “alternative” methods. Multiple positive controls (e.g., reference toxicant tests) should be used in this situation.
  – Using multiple organisms with different sensitivities.

REFERENCES:

• Short-Term Methods for Estimating the Chronic Toxicity of Effluents and Receiving Waters to Freshwater Organisms. EPA/600/4-91/002. July 1994.

• Methods for Measuring the Acute Toxicity of Effluents and Receiving Waters to Freshwater and Marine Organisms. EPA/600/4-90/027F. August 1993.
  – Recommends multiple controls under certain conditions.

• Methods for Aquatic Toxicity Identification Evaluations: Phase I Toxicity Characterization Procedures. EPA/600/6-91/003. February 1991.
  – Recommends multiple controls (“blanks”).

• Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts. Grothe et al., 1996.

SUSPICIOUS DATA AND OUTLIER DETECTION



CONCERNS

• Outliers make interpretation of WET data difficult by

– Increasing the variability in test responses

– Biasing mean responses

IDENTIFYING OUTLIERS

• Graph raw data, means and residuals

[Figure: Raw Data and Means: proportion alive (0 to 1.0) vs. copper concentration (0 to 400 ppb)]

[Figure: Residuals: residual (predicted - observed, axis -0.8 to 0.4) vs. copper concentration (0 to 400 ppb)]

IDENTIFYING OUTLIERS

• Formal statistical test - Chauvenet’s Criterion
  – Using the previous mysid data, the critical values are:
    • Mean = 0.80, Std. Dev. = 0.302, n = 8
  – Chauvenet’s Criterion Value = n/2 = 4
  – Z-score = 2.054 (two-tailed probability of n/2 = 4%)
  – The calculations are:
    • Equation 1: (Z-score)(Std. Dev.) = (2.054)(0.302) = 0.620
    • Mean ± Equation 1 = 0.80 ± 0.620 = 1.42 and 0.18
    • Outlier range is > 1.42 or < 0.18
  – A value of 0.2 is not an outlier.
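The Chauvenet calculation above, reproduced in Python. The sketch takes the two-tailed probability (4%) as given on the slide and converts it to a Z-score with `statistics.NormalDist`; the function name `outlier_limits` is hypothetical.

```python
from statistics import NormalDist


def outlier_limits(mean, sd, two_tailed_p):
    """Return (z, lower, upper): values outside mean ± z*sd are flagged."""
    z = NormalDist().inv_cdf(1 - two_tailed_p / 2)
    return z, mean - z * sd, mean + z * sd


z, low, high = outlier_limits(0.80, 0.302, 0.04)
print(f"z = {z:.3f}, flag values < {low:.2f} or > {high:.2f}")
# z = 2.054, flag values < 0.18 or > 1.42
print(0.2 < low or 0.2 > high)  # False: 0.2 is not an outlier
```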

CAN A CAUSE BE ASSIGNED TO THE OUTLIER(S)?

• Review analyst’s daily observations
• Check water chemistry data
• Check data entry
• Check calculations

• If cause can be assigned to outlier, then reanalyze data without outlier

DETERMINE EFFECT ON TEST INTERPRETATION

• Keep all data unless cause is found

• Analyze data with and without suspect data

• Determine effect of suspect data on test interpretation

• Results reported will depend on effect of outlier(s) on test interpretation, best professional judgement, and discussions with regulatory agency

REPORTING OF RESULTS

• Insignificant Effect
  – With Outlier
    • IC25 = 131 (96.9-158) ppb
    • NOEC = 100 ppb
    • % MSD = 28.1 %
  – Without Outlier
    • IC25 = 124 (93.6-152) ppb
    • NOEC = 100 ppb
    • % MSD = 20.9 %
  – Report results with suspect data included

• Significant Effect
  – With Outlier
    • IC25 = 131 (96.9-158) ppb
    • NOEC = 100 ppb
    • % MSD = 28.1 %
  – Without Outlier
    • IC25 = 106 (83.8-126) ppb
    • NOEC = 50 ppb
    • % MSD = 12.2 %
  – Report results from both analyses

CONCENTRATION-RESPONSE CURVES IN WET TESTS

NON-MONOTONICITY VS. HORMESIS

• Hormesis is a toxicological response to a single toxicant characterized by stimulation at low concentrations and inhibition at higher concentrations.

• Non-monotonicity is a relationship where a smaller response (e.g. mortality) is observed at the higher of two consecutive concentrations.

TYPICAL TRAITS OF HORMESIS

• Calabrese and Baldwin, 1998

• Hormetic - concentration range

• Magnitude of hormetic stimulation

• Range from maximum stimulation to NOEL (NOEC)

[Figure: idealized hormetic response vs. concentration, showing maximum stimulation (30-60%), the hormetic range (10x), the range from maximum stimulation to the NOEL (4-5x), and the NOEL]

WHY IS HORMESIS DIFFICULT TO DETECT IN TOXICITY TESTS?

• Inadequate concentration series

• Inadequate description of concentration - response

• Inadequate statistical power

• Hormesis is not the cause

[Figure: Well Defined Hormetic Response: response vs. concentration (100 to 1000)]

[Figure: Poorly Defined “Hormetic” Response: response vs. concentration]

EFFECTS OF NON-MONOTONIC DATA

NOEC > LOEC: Sea Urchin Fertilization Data

[Figure: percent fertilized (axis 70 to 100) vs. percent effluent (0 to 6), with a statistically significant reduction marked]

NOEC = 6.0 %, LOEC = 0.36 %, % MSD = 5.82 %, IC25 > 6.0 %

• Limited replicates (4)
• High control & low concentration variability
• High statistical power
• NOEC > LOEC

EFFECTS OF NON-MONOTONIC DATA

HETEROGENEITY IN PROBIT ANALYSIS

• Limited replicates (5)
• High control & low concentration variability
• Significant chi-square
• Inflated confidence intervals
• Reanalyze with non-parametric models

[Figure: Significant chi-square for heterogeneity: response (0 to 1.0) vs. dose (ppb, log scale, 1 to 10,000)]

EFFECTS OF NON-MONOTONIC DATA

SMOOTHING IN ICp ANALYSIS

• Smoothing is used in all non-parametric models.
• Smoothing procedure averages treatment responses
• Increases estimated toxicity

[Figure: Selenastrum cell growth data: response (% of control, axis 0 to 250) vs. percent effluent (0 to 100), actual and smoothed responses]

REMEDIES FOR PROBLEMS ASSOCIATED WITH NON-MONOTONIC DATA

• Better concentration series selection
• Increase number of replicates
• % MSD limits (NOECs)
• Use of more robust parametric models (Bailer and Oris, 1997; Kerr and Meador, 1996; Baird et al., 1996)
• Concentration-response curve criterion

CONFIRMATION OF A CONCENTRATION-RESPONSE CURVE

• Graphical
• Linear regression analysis
• Correlation analysis

GRAPHIC ANALYSIS OF CONCENTRATION-RESPONSE CURVES

Concentration-Response Curve Absent
[Figure: response (% effect) vs. concentration, with raw data, means, and % MSD]

Concentration-Response Curve Present
[Figure: response (% effect) vs. concentration, with raw data, means, and % MSD]

GRAPHIC ANALYSIS OF CONCENTRATION-RESPONSE CURVES

Concentration-Response Curve Present ???
[Figure: response (% effect) vs. concentration (% effluent, 0 to 6), with raw data, means, and % MSD]

LINEAR REGRESSION ANALYSIS OF CONCENTRATION-RESPONSE CURVES

Concentration-Response Curve Absent
[Figure: response (% effect) vs. concentration, with raw data, means, probit fit, and % MSD. Negative slope not significantly different from zero]

Concentration-Response Curve Present
[Figure: response (% effect) vs. concentration. Positive slope significantly different from zero]

LINEAR REGRESSION ANALYSIS OF CONCENTRATION-RESPONSE CURVES

Concentration-Response Curve Present ???
[Figure: response (% effect) vs. concentration (% effluent, 0 to 6). Positive slope not significantly different from zero]

CORRELATION ANALYSIS OF CONCENTRATION-RESPONSE CURVES

Concentration-Response Curve Present
[Figure: response (% effect) vs. concentration, with raw data, means, and % MSD. Significant negative correlation (r = -0.965, P < 0.001)]

Concentration-Response Curve Absent
[Figure: response (% effect) vs. concentration, with raw data, means, and % MSD. Insignificant correlation (r = -0.0931, P = 0.593)]

CORRELATION ANALYSIS OF CONCENTRATION-RESPONSE CURVES

Concentration-Response Curve Present ???
[Figure: response (% effect) vs. concentration (% effluent, 0 to 6), with raw data, means, and % MSD. Significant negative correlation (r = -0.389, P = 0.021)]

SUMMARY

• Identification of a significant C-R curve is an important QA check.

• Graphical analysis is simple but subjective.

• Linear regression analysis is objective and conservative but requires parametric analysis.

• Correlation analysis is objective and liberal, and non-parametric methods are available.
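The objective checks in this summary can be sketched with the standard library: a least-squares slope, the Pearson correlation, the t statistic (n - 2 degrees of freedom) used to test r, or equivalently the slope, against zero, and a rank-based Spearman correlation as a non-parametric alternative. Function names and the example data are hypothetical; the p-value or critical value still has to come from a t table.

```python
import math
from statistics import mean


def slope_and_r(x, y):
    """Least-squares slope of y on x and the Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic on n - 2 df for H0: r = 0 (same test as slope = 0)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson r of the ranks."""
    return slope_and_r(ranks(x), ranks(y))[1]


conc = [0, 0, 1, 1, 2, 2, 4, 4]
effect = [0, 5, 3, 10, 12, 20, 35, 40]
slope, r = slope_and_r(conc, effect)
print(f"slope = {slope:.2f}, r = {r:.3f}, t = {t_for_r(r, len(conc)):.2f}, "
      f"rho = {spearman_rho(conc, effect):.3f}")
```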

BIOLOGICAL INTERFERENCE IN FATHEAD CHRONIC TESTS

TOXICITY CHARACTERISTICS

• Seasonal (cold months)

• Affects only fathead minnows

• High variability

• Poor dose response

• Fungus-like growth

[Photos: normal gills and pharynx; bacterial clogging]

% Survival on Day of Test
Rep | Day 3 | Day 4 | Day 7
1 | 100 | 13 | 0
2 | 100 | 25 | 0
3 | 100 | 100 | 100
4 | 100 | 88 | 88
5 | 100 | 50 | 13

STERILIZATION

[Figure: four panels of % survival (0 to 100) for untreated vs. treated samples: UV light (25%, 50%, 100%), autoclaved (50%, 100%), pasteurized (25%, 50%, 100%), and antibiotic (25%, 50%, 100%)]

ANTIBIOTIC ADDITION

[Figure: two panels of % survival (0 to 100) at Rec. control, 32%, 42%, 56%, 75%, and 100%; one panel compares baseline with diluent-only treatment, the other compares baseline with diluent + effluent treatment]

EFFECT OF ISOLATION

[Figure: % alive since previous day vs. day of test (1 to 6), with points marked “Sick Fish Removed” and “Dead Fish Removed”]

CONCLUSION

• “Toxicity” due to a naturally occurring pathogen

• Best viewed as a kind of interference

CONTROLLING BIOLOGICAL INTERFERENCE

• Heat

• Filtration (0.2 µm)

• UV light

• Antibiotics

HEAT

Advantages:
• Simple, no specialized equipment

Disadvantages:
• May be more “intrusive” (e.g., removal of volatile components)
• Must re-aerate sample

FILTRATION (0.2 µm)

Advantages:
• Usually very effective

Disadvantages:
• Impractical with high suspended solids
• Requires specialized equipment for filtering large volumes
• May remove particulate-bound contaminants

UV LIGHT

Advantages:
• Usually very effective
• Uses common equipment

Disadvantages:
• Less effective with high suspended solids or stained water
• May degrade organic contaminants or enhance organic toxicity (e.g., PAHs)

ANTIBIOTICS

Advantages:
• Usually very effective
• Chemicals inexpensive and widely available
• Easy to treat large volumes

Disadvantages:
• May require determination of proper dose

SUMMARY

• Chronic WET tests using fathead minnows may show evidence of interference due to pathogens.

• Interference = high variability, poorly defined dose response

• Most common with surface waters

• Control measures = sample treatment to kill or remove pathogens.

STATISTICAL AND BIOLOGICAL SIGNIFICANCE

TOXIC VS. NON-TOXIC

• WET Tests Developed to Identify Toxic Samples

• Two Methods Used
  – Hypothesis testing - Statistical difference
  – Point-estimation - Standard level of effect

TOXICITY ASSUMPTIONS OF HYPOTHESIS TESTING

• Non-Toxic = No statistical difference between control and critical concentration response

• Toxic = Statistical difference between control and critical concentration response

TOXICITY ASSUMPTIONS OF POINT-ESTIMATION

• A preselected level of effect is considered toxic
  – Acute test: 50 % effect
  – Chronic test: 25 % effect

• Toxic = ECx/ICx is less than the critical concentration (IWC)

• Non-Toxic = ECx/ICx is equal or greater than the critical concentration (IWC)

BOTH APPROACHES HAVE STRENGTHS AND LIMITATIONS

• Complete Discussion in:

– Grothe et al. (Eds.). 1996. Whole Effluent Toxicity Testing: An Evaluation of Methods and Prediction of Receiving System Impacts. SETAC Press, Pensacola, FL, USA.

STRENGTHS AND LIMITATIONS OF HYPOTHESIS TESTS

• Strengths
  – Suited for comparison of treatments
  – Simple to calculate (no modeling)
  – Not model dependent

• Limitations
  – NOEC is concentration dependent
  – Variability reduces statistical power and increases the effect size needed for statistical significance
  – No confidence intervals
  – Results are independent of the concentration-response curve

STRENGTHS AND LIMITATIONS OF POINT ESTIMATES

• Strengths
  – Uses concentration-response curve
  – Not limited to tested concentrations
  – Confidence intervals

• Limitations
  – Selection of effect level
  – Accuracy depends on partial responses
  – Model dependent
  – More difficult computations

WHICH METHOD IS BEST?

• Both Approaches Are Supported By The TSD And The Methods Manuals

• Depends On The Purpose Of The WET Test
  – Hypothesis test - Identify statistical difference from control response
  – Point-estimate - Concentration which shows a standard effect

TOXIC MAY NOT = ECOLOGICAL IMPACT

• Hypersensitive Hypothesis Tests
• Relatively Sensitive Test Species
• Inconsistent Exposure Parameters Between the Toxicity Test and Receiving Water
  – Magnitude, duration, frequency of exposure
  – Water chemistry

• Population/Community Structure Dynamics

NONTOXIC MAY NOT = NO ECOLOGICAL IMPACT

• Hyposensitive Hypothesis Tests
• High Effect Level In Point-Estimates
• Relatively Insensitive Test Species
• Inconsistent Exposure Parameters Between the Toxicity Test and Receiving Water
  – Magnitude, duration, frequency of exposure
  – Water chemistry
• Undetected Biological Effects
• Population/Community Structure Dynamics

WHAT CONCLUSIONS CAN BE MADE?

• The Sample Is Toxic/Non-Toxic As Defined By The WET Program

• The Biological Impact Was Significant/Insignificant In The Beaker

• The Receiving Water May or May not Become Impacted

WAYS TO INCREASE THE ECOLOGICAL RELEVANCE

• Identification of Toxic Agent(s)
• Consider the Use Of Indigenous Species In Toxicity Tests
• Consider Exposure Parameters Found In Receiving Water
  – Magnitude, duration, frequency of exposure
  – Water chemistry
  – Ambient water tests
• In Situ Bioassays
• Detection and Study Of Other Biological Effects
• Comprehensive Study Of Population/Community Structure Dynamics In Receiving Water
• Further Studies In A Variety Of Ecosystems Which Examine The Relationship Between WET Tests And Ecological Impact

COST OF “ECOLOGICALLY RELEVANT” WET TESTS

• Very Expensive
  – Methods research and development
  – Receiving water characterization
  – Field bioassessments

• Loss Of Comparability

• Increase In Complexity Of Water Quality Standards and Interpretation

SUMMARY

• WET Tests Were Developed To Identify Toxic and Nontoxic Samples

• WET Tests Are Useful In Conjunction With Chemical And Field Assessment Data To Protect Aquatic Ecosystems

• Adaptation Of WET Tests To Be Ecologically Relevant Can Be Helpful But Comes At A Cost

FALSE POSITIVES, FALSE NEGATIVES

GUIDING PRINCIPLE = REPEATABILITY

Repeatable test results are taken as “true” or “real” or “correct”.

FALSE POSITIVES/NEGATIVES IN CONTEXT OF WET TESTS

Depends on presumed function of WET tests:

• WET Test as “predictor” of instream effects.

• WET Test as “detector” of toxic amounts of toxic chemicals

WET TEST AS “PREDICTOR” OF INSTREAM EFFECTS.

• False Positive = false indication of instream effects

• False Negative = failure to indicate instream effects

WET TEST AS “DETECTOR” OF TOXIC AMOUNTS OF TOXIC CHEMICALS

• False Positive = false indication of presence of toxic amounts of toxic chemicals

• False Negative = failure to indicate presence of toxic amounts of toxic chemicals

WHAT IS “TOXICITY”?

• Statistically significant difference between effluent concentration and control

• An LC50 or other point estimate that is less than some predetermined value

The operational definition of toxicity is often statistical

TOXICITY AS A STATISTICAL CONCEPT

• False Positive = Statistically significant effect that is not “Real” (spurious, artifactual).

• False Negative = Effect that should be observed but is not.

THERE ARE REASONS WHY STATISTICALLY SIGNIFICANT

RESULTS HAPPEN

At most, 4 things are present in a test beaker:
• Diluent
• Sample
• Organism(s)
• Food

TOXICITY NOT DUE TO SAMPLE

• Technician error

• Bias in test chamber location or in assigning organisms to treatments.

• Statistical sampling error (Type 1 error)

• Other

TECHNICIAN ERROR

• Expertise

• Experience

BIAS IN ORGANISM/CHAMBER ASSIGNMENT

• Bias in organism assignment is a tendency to assign healthier or less healthy organisms to certain test concentrations

• Systematic arrangement of test chambers can result in systematic bias in organism response (e.g. Selenastrum algal growth test)

• Can be eliminated through proper randomization (see Davis et al., 1998).

STATISTICAL OUTCOMES: Types of Errors in Hypothesis Testing

Decision | If Ho is True | If Ho is False
If Ho is rejected | Type I error | No error
If Ho is not rejected | No error | Type II error

HYPOTHESIS TESTING FACTS

• NOECs are not point estimates

• Cannot calculate coefficients of variation or confidence intervals

• NOEC is a lower concentration level than the LOEC when the dose response curve is smooth

• LOEC may represent a different amount of effect from test to test

[Figure: sampling distributions of the response when the null hypothesis is true (centered at μ0) and when it is false (centered at μmsd), marking α = 0.05 (Type I error), β = 0.2 (Type II error), and power = 1 - β = 0.8]

STATISTICAL SAMPLING ERROR

• Type 1 error.

• Should be rare (P < alpha)

• Not repeatable

• Can be reduced by decreasing alpha but at cost of increasing Type 2 error (False Negatives)

“UNINTERESTING” TOXICITY

Toxic response due to a sample that deviates from culture conditions but is still within standard test conditions. For example, the toxic response is due to a slight difference in pH (0.2 units).

FALSE NEGATIVE: FAILURE OF THE TEST SYSTEM TO INDICATE TOXICITY

• Operator error

• Bias

• Type 2 error

• Intrinsically variable data

• Interference

CONCLUSIONS

• False +/- are “wrong” answers.

• In the absence of technician error, biased test design, and biased sampling, the False + and False - rates equal the Type I and Type II error rates, respectively.

• Repeatable results, in the absence of technician error and biased sampling, cannot be False +/-’s.

• An estimate of the False + rate could be obtained through testing of blanks.
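The point about estimating the false positive rate by testing blanks can be illustrated with a small simulation: control and "blank" responses are drawn from the same normal distribution, so every significant result is a false positive, and the observed rate should sit near alpha. For brevity this sketch uses a one-sided z-test with a known standard deviation rather than the t-tests used in practice; all names and settings are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def blank_false_positive_rate(n_tests=2000, reps=4, sd=1.0, alpha=0.05, seed=1):
    """Fraction of simulated control-vs-blank comparisons declared 'toxic'.

    Control and blank replicates come from the same normal distribution, so
    every rejection is a false positive (Type I error). Uses a one-sided z-test
    with the standard deviation treated as known.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    se = sd * math.sqrt(2 / reps)  # standard error of a difference of two means
    hits = 0
    for _ in range(n_tests):
        control = [rng.gauss(0.0, sd) for _ in range(reps)]
        blank = [rng.gauss(0.0, sd) for _ in range(reps)]
        # A reduction in the blank relative to the control counts as "toxic".
        if (mean(control) - mean(blank)) / se > z_crit:
            hits += 1
    return hits / n_tests


print(blank_false_positive_rate())  # should be close to alpha = 0.05
```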

INTRA- AND INTER-TEST VARIABILITY

TYPES OF VARIABILITY

• Variability inherent in any analytical procedure
• Intra-test: among and between concentrations
• Intra-lab: within one lab, same method
• Inter-lab: between labs, same method
• Method specific: within limits of method
  – organism age, length of test, dilution water, food type, etc.

INTRA-TEST VARIABILITY

Group | N | Mean Survival | s.d. | CV (%)
control | 4 | 0.975 | 0.050 | 5.1
2 | 4 | 0.975 | 0.050 | 5.1
3 | 4 | 0.975 | 0.050 | 5.1
4 | 4 | 0.950 | 0.058 | 6.1
5* | 4 | 0.675 | 0.150 | 22.2
6* | 4 | 0.275 | 0.222 | 80.6

MSE = 0.033, MSD = 13.9 %
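The CV column, and a %MSD, can be computed as below. The %MSD sketch uses the Dunnett-based form MSD = d·sqrt(MSE·(1/n_control + 1/n_treatment)), expressed as a percentage of the control mean. The critical value d must be looked up for the design, and because the MSE on the slide may be on a transformed scale, the sketch does not try to reproduce the 13.9% figure. Function names are hypothetical.

```python
import math


def cv_percent(sd, mean):
    """Coefficient of variation, in percent."""
    return 100 * sd / mean


def percent_msd(d_critical, mse, n_control, n_treatment, control_mean):
    """Minimum significant difference as a percentage of the control mean.

    d_critical: Dunnett's critical value for the design (looked up, not computed).
    mse and control_mean are assumed to be on the same, untransformed scale.
    """
    msd = d_critical * math.sqrt(mse * (1 / n_control + 1 / n_treatment))
    return 100 * msd / control_mean


# CV for group 5 in the table: 0.150 / 0.675
print(round(cv_percent(0.150, 0.675), 1))  # 22.2
```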

INTRA-TEST VARIABILITY AND ENDPOINT UNCERTAINTY

EC | Conc. | Lower 95% CL | Upper 95% CL | Conf. Int./EC
1 | 220 | 95 | 310 | 0.98
10 | 332 | 196 | 422 | 0.68
50 | 553 | 440 | 670 | 0.41
90 | 919 | 744 | 1416 | 0.73
99 | 1392 | 1024 | 2906 | 1.35

POINT ESTIMATE INTRA-LAB VARIABILITY

[Figure: LC50 (mg/l SDS) with 95% upper and lower confidence intervals for tests 5 to 13, compared with the mean LC50]

HYPOTHESIS TESTS INTRA-LAB VARIABILITY

[Figure: NOEC (ppb Cu) by test number (1 to 10). Horizontal lines = acceptance limits for two dilution series (red dotted = 0.5; blue dashed = 0.75)]

SOURCES OF INTRA-TEST VARIABILITY

• Genetic variability
• Organism handling and feeding
• Toxicity among and between treatments
• Non-homogeneous sample source
• Sample toxicity

SOURCES OF INTRA-TEST VARIABILITY

• Abiotic conditions

• Dilution scheme

• Number of organisms/treatment

• Dilution water pathogens

• Randomization important!

SOURCES OF INTRA-LAB VARIABILITY

• Intra-test sources
• Analyst experience and practice
• Organism age and health
• Acclimation
• Dilution water
• Type of sample
• Sample quality
• Test chamber characteristics
• Organisms/source
• Food type/rate/source
• Replicate volume
• Test duration
• Procedures

SOURCES OF INTER-LAB VARIABILITY

• All of previous are important

• Differences allowed in methods - Could be significant between labs

• Differences in protocols - State, federal, local, etc. Use promulgated standard

• ANALYST EXPERIENCE

VARIABILITY AND POINT ESTIMATE UNCERTAINTY

Parameter | Test #1 | Test #2
Mean CV (%) | 9.9 | 33.8
IC25 (%) | 27.2 | 26.0
MSE | 34.5 | 290.6
95% CI | 25.7-28.5 | 17.2-31.3

HYPOTHESIS TESTS: HIGH VARIABILITY - LOW STATISTICAL POWER

Group | n | Mean wt (µg/ind) | s.d. | CV%
Control | 4 | 632 | 552 | 87.4
2 | 4 | 727 | 674 | 92.7
3 | 4 | 1080 | 408 | 37.7
4 | 4 | 564 | 493 | 87.5
5 | 4 | 748 | 235 | 31.4

MSD = 131 %

HYPOTHESIS TESTS: LOW VARIABILITY - HIGH STATISTICAL POWER

Group | n | Mean wt (mg/survivor) | s.d. | CV
Control | 4 | 0.30 | 0.012 | 4.0%
10% | 4 | 0.30 | 0.013 | 4.3%
18% | 4 | 0.31 | 0.008 | 2.6%
32% | 4 | 0.30 | 0.010 | 3.3%
56% | 4 | 0.27* | 0.013 | 4.8%
100% | 4 | 0.27* | 0.013 | 4.8%

MSD = 6.5 %

ACTIONS TO REDUCE VARIABILITY

• Establish performance criteria

• QA program

• Establish and follow strict procedures

• MAXIMIZE ANALYST SKILL

• Contract lab selection

• Additional QA/QC criteria

WHY DETERMINE METHOD VARIABILITY AND WHY CONTROL VARIABILITY?

• If the inherent variability of each method is known, there will be less chance of making errors concerning toxicity.

• If variability is too high, toxicity may not be detected when present. If variability is too low, toxicity might be detected when it is not there.

• At present there is little incentive to reduce variability.

EXAMPLES OF ADDITIONAL QC TEST CRITERIA

• EPA Region IX: upper MSD limits

• Washington: upper MSD limits, change in

• N. Carolina: limit control CVs, C. dubia “Practical Sensitivity Criteria”

• EPA Region VI: limit control CV, increase number of replicates, biological significance

THE CHRONIC TEST GROWTH ENDPOINT



CHANGE IN GROWTH ENDPOINT CALCULATION

Pre-Nov., 1995 Approach: Growth = dry weight of surviving organisms / number of surviving organisms

Post-Nov., 1995 Approach: Growth = dry weight of surviving organisms / number of initial organisms

Treatment | % Mortality | Before Promulgation | After Promulgation
Control | 5.1 | 325 | 308
2 | 2.6 | 353 | 341
3 | 5.0 | 345 | 329
4 | 17.9 | 387 | 306
5 | 47.5 | 319 | 167
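The two growth calculations differ only in the denominator. A sketch with hypothetical function names and a made-up replicate: within a replicate, the post-1995 value equals the old value multiplied by the fraction surviving. That is why treatment 5 (47.5% mortality) drops from 319 to about 167 (319 × 0.525 ≈ 167) while the control barely changes. Treatment means averaged over replicates follow this relation only approximately.

```python
def growth_old(total_dry_weight, survivors):
    """Pre-November 1995: dry weight of survivors / number of survivors."""
    return total_dry_weight / survivors


def growth_new(total_dry_weight, initial):
    """Post-November 1995: dry weight of survivors / number of initial organisms."""
    return total_dry_weight / initial


# Hypothetical replicate: 10 larvae at start, 5 survive, 3.0 mg total dry weight.
print(growth_old(3.0, 5), growth_new(3.0, 10))  # 0.6 0.3
```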

EFFECT ON MEAN TREATMENT RESPONSES

[Figure: CV (%, axis 5 to 35) by observation, before and after the change in calculation]

INTRA-TREATMENT VARIABILITY AND WEIGHT CALCULATIONS

[Figure: ratio of old MSE to new MSE (axis 0 to 1.6) for tests 1 to 10, reference toxicant and effluent tests]

EFFECTS ON HYPOTHESIS TEST ENDPOINTS

Before vs. after promulgation:

Test # | %MSD (before) | NOEC (before) | %MSD (after) | NOEC (after)
1 | 16.4 | 50 | 16.7 | 50
2 | 10.8 | 10 | 29.1 | 10
3 | 11.9 | 5 | 39.0 | 5
4 | 19.7 | 25 | 18.5 | 25

EFFECTS ON HYPOTHESIS TEST ENDPOINTS

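Before vs. after promulgation:

Test # | %MSD (before) | NOEC (before) | Avg. wgt. at NOEC (before) | %MSD (after) | NOEC (after) | Avg. wgt. at NOEC (after)
1 | 20.9 | 100 | 296 | 23.4 | 100 | 296
2 | 19.5 | 100 | 268 | 25.1 | 100 | 233
3 | 22.1 | 100 | 254 | 24.1 | 100 | 227
4 | 21.4 | 100 | 387 | 22.8 | 100 | 313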

EFFECTS ON POINT ESTIMATE ENDPOINTS

Before vs. after promulgation:

Test # | IC25 (before) | 95% CI (before) | IC25 (after) | 95% CI (after)
1 | 56.2 | 45.4-79.3 | 48.3 | 43.3-61.9
2 | NC | NC | 12.4 | 6.4-13.8
3 | NC | NC | 4.2 | 1.5-7.3
4 | 33.7 | 28.2-40.6 | 30.0 | 19.4-35.0

EFFECTS ON POINT ESTIMATE ENDPOINTS

Before vs. after promulgation:

Test # | IC25 (before) | 95% CI (before) | IC25 (after) | 95% CI (after)
1 | 291 | NC | 234 | 191-262
2 | 386 | NC | 176 | 140-256
3 | 227 | 179-258 | 138 | 111-155
4 | >400 | NC | 144 | 104-162

NOEC/IC25 RELATIONSHIP

Test # | Test Type | NOEC | IC25 Before | IC25 After
1 | Effluent | 50% | 56.2 | 48.3
2 | Effluent | 25% | 33.7 | 30.0
3 | Ref. Tox. | 100 ppb | 291 | 234
4 | Ref. Tox. | 100 ppb | 386 | 176
5 | Ref. Tox. | 100 ppb | 227 | 138
6 | Ref. Tox. | 100 ppb | >400 | 144

IMPACT ON TEST INTERPRETATION

• Hypothesis Test Results - most cases show little change, but not always

• Point Estimate Results - usually increases predicted toxicity

ISSUES RELATED TO CHANGE IN APPROACH

• Test growth or biomass?

• Accurate representation of growth?

• Correlation between new results and instream responses?

ISSUES RELATED TO CHANGE IN APPROACH

• Conflict between new results and unchanged effluent quality?

• Effect on reference toxicant control charts

• Relationship between NOEC and IC25

AGE-RELATED SENSITIVITY OF FISH IN ACUTE WET TESTS



REVISIONS TO FISH AGES IN EPA ACUTE TEST MANUALS

• From: 1-90 days old in the 3rd edition of the acute manual (1985; EPA/600/4-85/013)

• To: 1-14 days old (or 9-14 days old for silversides) in the 4th edition of the acute manual (1993; EPA/600/4-90/027F)

COMMONLY USED TEST SPECIES

• Fathead minnows
• Sheepshead minnows
• Silversides (inland, Atlantic, and tidewater)

RATIONALE

• Younger life stage is generally more sensitive than older life stage

• Reduction in range of acceptable ages from 1-90 to 1-14 days will reduce variability

CONCERN

• Use of younger fish in NPDES testing may show an increase in apparent toxicity, without any changes in effluent conditions

COMMON QUESTIONS

• Are <14-day old fish more sensitive than <90-day old fish to toxicants?

• Does the use of <14-day old fish reduce intertest variability when compared to <90 day-old fish?

• How does the sensitivity and precision vary within the 1 to 14 day old age range?

SENSITIVITY OF 14, 30, AND 90 DAY-OLD FATHEAD MINNOWS

[Figure: mean 96-hr LC50 of 14-, 30-, and 90-day-old fathead minnows exposed to copper (ppb) and unionized ammonia (ppm), plotted against age (days); letters mark statistical groupings.]

INTER-TEST PRECISION OF 14, 30, AND 90-DAY OLD FATHEAD MINNOWS

[Figure: inter-test coefficient of variation for 14-, 30-, and 90-day-old fathead minnows exposed to copper and unionized ammonia, plotted against age (days).]

SENSITIVITY OF 1-14 DAY-OLD FATHEAD MINNOWS

[Figure: mean 48-hr LC50 of 1-, 4-, 7-, 10-, and 14-day-old fathead minnows exposed to sodium pentachlorophenol (ppb), hexavalent chromium (ppm), SDS (ppm), and unionized ammonia (ppm), plotted against age (days); letters mark statistical groupings.]

INTER-TEST PRECISION OF 1-14 DAY-OLD FATHEAD MINNOWS

[Figure: inter-test coefficient of variation for fathead minnow age ranges of 1-14, 4-14, 7-14, and 10-14 days, for NaPCP, Cr+6, SDS, and NH3.]

SUMMARY

• 14-day-old fathead minnow larvae are more sensitive to copper & ammonia than 90-day-old fish.

• The inter-test precision of 90-day-old fish is equal to or better than that of 14-day-old fish for copper & ammonia.

SUMMARY - Cont.

• Within the 1-14 day age range, 1-day-old larvae are less sensitive to several toxicants.

• Sensitivity to these toxicants becomes constant after 4-7 days of age.

• Inter-test precision for these toxicants is highest when the age range is limited to 7-14-day-old larvae.

REASONABLE POTENTIAL AND TOXICITY TEST DESIGN

RP DETERMINATION DEFINITION

• “to determine whether the discharge causes, has the reasonable potential to cause, or contributes to an excursion of numeric or narrative water quality criteria” (TSD, 1991)

REASONABLE POTENTIAL

• 40 CFR 122.44(d)(1) requires that the RP procedure address the following:
– effluent variability
– existing controls on all pollution sources
– available dilution
– species sensitivity

• A WERF survey of POTWs found that RP determination is not consistent among regulatory agencies

REASONABLE POTENTIAL EXAMPLES

• Virginia's definition: 75% of tests must meet the decision criterion

• Region IX uses a statistical approach adopted from the TSD

• Some states do not issue limits

• Some states issue limits to all major dischargers

VARIABILITY AND RP

• Primarily an inter-test issue
– effluent variability
– method variability

• How is it determined?
– Assumptions: TSD; similar facilities
– Collecting sufficient data: monthly? quarterly? annually?

VARIABILITY ASSUMPTION ISSUES

• TSD assumption (CV=0.6) may not be accurate

• Data from similar facilities may be used, which reduces some uncertainty

• Actual data are always best: they give greater certainty in the decision to issue a limit

• Reduce potential for erroneous conclusions based on a few data points
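As a minimal sketch of the variability assumption above, the function below picks the CV for an RP calculation: it uses the sample CV (standard deviation divided by mean) when enough results exist and otherwise falls back to the TSD default of 0.6. The 10-result threshold reflects the TSD's commonly cited recommendation and is an assumption to check against the applicable permit guidance; `cv_for_rp` and the toxic-unit data are hypothetical.

```python
from statistics import mean, stdev

TSD_DEFAULT_CV = 0.6      # default CV named on the slide (TSD, 1991)
MIN_SAMPLES_FOR_CV = 10   # assumed threshold for trusting a site-specific CV

def cv_for_rp(toxic_units):
    """Return the CV to use in an RP calculation (hypothetical helper).

    Uses the sample CV when at least MIN_SAMPLES_FOR_CV results are
    available; otherwise falls back to the TSD default of 0.6.
    """
    if len(toxic_units) < MIN_SAMPLES_FOR_CV:
        return TSD_DEFAULT_CV
    return stdev(toxic_units) / mean(toxic_units)

# Hypothetical chronic toxic unit (TUc) results
few = [1.2, 2.0, 1.5]
many = [1.0, 1.5, 2.0, 1.2, 3.1, 1.8, 1.1, 2.4, 1.6, 1.3]

print(cv_for_rp(few))             # 0.6 (too few results, default used)
print(round(cv_for_rp(many), 2))  # site-specific CV from 10 results
```

With only a few results the default dominates, which is why collecting actual data reduces the chance of a decision driven by the assumption rather than the effluent.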


HOW TO ADDRESS VARIABILITY THROUGH TEST DESIGN

• Consistency between tests:
– dilution schemes
– dilution water type and characteristics
– test vessel dimensions and material
– test replicate volume
– increased sample size per replicate or concentration
– test organism age (acute tests)
– species sensitivity, which affects variability

SPECIES SENSITIVITY AND RP

• Two components:
– Is the test representative of the condition to be protected?
– Magnitude of toxicity

• Both components are affected by:
– species
– age or life stage
– dilution water quality
– test type (static, renewal, flow-through)
– culturing/handling of organisms

SPECIES SENSITIVITY AND REPRESENTATION OF TOXICITY

• Tests must be reliable indicators of toxicity, which depends on test design parameters such as:
– pH
– hardness
– alkalinity
– treatment renewals

TEST AND INSTREAM HARDNESS

• C. dubia is sensitive to hardness

• C. dubia acclimated and tested at 120 ppm hardness

• Instream and effluent hardness is 300 ppm

• Is the test result due to the effluent or to sensitivity to hardness?

• Solution: test a different organism, or C. dubia cultured at higher hardness

SPECIES SENSITIVITY, TOXICITY, ORGANISM AGE & RP

• Flexibility in organism age tested:
– acute: significant
– chronic: minimal

• Data indicate that age affects sensitivity

SPECIES SENSITIVITY, TOXICITY, DILUTION WATER QUALITY & RP

• Example: pH

• If ammonia is present and pH rises artificially in the test beyond real-world levels, ammonia may contribute to toxicity and affect the results used to determine RP

• Solution: control pH in tests at levels occurring at the condition of interest (IWC, 100% discharge, etc.) using direct control (CO2 headspace) or flow-through testing

DILUTION & RP

• EPA’s RP approach compares data distribution to WLA

• If a specified upper percentile of the distribution is predicted to exceed the WLA, RP exists

• WLA consists of numeric standard and dilution
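The comparison of a data distribution to the WLA can be sketched with a TSD-style reasonable-potential multiplier, assuming lognormally distributed results. The largest of n observations is taken to represent the p_n = (1 - confidence)^(1/n) percentile, and the multiplier exp[(z_upper - z_pn)σ], with σ² = ln(CV² + 1), scales it up to a chosen upper percentile. The defaults of 99% confidence and the 99th percentile are assumptions (some agencies use 95%); `rp_multiplier`, `has_rp`, and the example numbers are hypothetical, and the WLA is assumed to already combine the numeric standard with dilution, expressed in chronic toxic units.

```python
import math
from statistics import NormalDist

def rp_multiplier(cv, n, confidence=0.99, upper_pct=0.99):
    """Ratio of the upper-percentile value to the observed maximum (lognormal)."""
    sigma = math.sqrt(math.log(cv ** 2 + 1.0))
    p_n = (1.0 - confidence) ** (1.0 / n)   # percentile represented by max of n
    z = NormalDist().inv_cdf
    return math.exp((z(upper_pct) - z(p_n)) * sigma)

def has_rp(max_observed_tu, cv, n, wla_tu, **kwargs):
    """Project the upper-percentile value and compare it to the WLA."""
    projected = max_observed_tu * rp_multiplier(cv, n, **kwargs)
    return projected > wla_tu, projected

# Hypothetical: 4 tests, max 1.5 TUc, default CV 0.6, WLA of 4 TUc
m = rp_multiplier(cv=0.6, n=4)
rp, projected = has_rp(max_observed_tu=1.5, cv=0.6, n=4, wla_tu=4.0)
print(round(m, 1), rp, round(projected, 1))
```

Here the multiplier is about 4.7, so a maximum of 1.5 TUc projects to about 7.1 TUc and exceeds the hypothetical WLA. Fewer data points or a higher CV increase the multiplier, which is how variability and sample size feed into the RP decision.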

[Figure: relative frequency distribution of chronic toxic units for Ceriodaphnia sp. (CV = 1.06), showing the long-term average, the 95th percentile, WLA1, and WLA2.]

ADDRESSING DILUTION & RP IN TEST DESIGN

• Center test dilutions on the respective effluent concentrations of concern

• Test dilutions both below and above the concentration of concern

• Avoid testing concentrations/conditions that are unlikely to occur naturally

• Maximize the dilution factor with intra-test and inter-test uncertainty in mind

CHOOSING TEST DILUTIONS

• Example:
– Chronic IWC = 25%
– Dilutions of 23%, 24%, 25%, 26%, and 27% may miss toxicity at 28%, which is well within the uncertainty of most chronic endpoints, and may produce a false negative indication of toxicity
– If dilutions are 6.25%, 12.5%, 25%, 50%, and 100%, results at a concentration 4x the IWC have little environmental relevance
– Choose something in between, such as 12%, 17%, 25%, 35%, and 50% (dilution factor 0.7)
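The options above are geometric series centered on the IWC. A minimal sketch, assuming the series is generated as IWC × factor^k with a factor below 1 and that concentrations above 100% effluent are dropped; `dilution_series` is a hypothetical helper.

```python
def dilution_series(iwc_pct, factor=0.7, below=2, above=2, max_pct=100.0):
    """Geometric dilution series centered on the IWC (ascending, % effluent).

    Produces IWC * factor**k for k = below, ..., 0, ..., -above, so with
    factor < 1 there are `below` concentrations under the IWC and `above`
    concentrations over it. Values above max_pct are discarded.
    """
    concs = [iwc_pct * factor ** k for k in range(below, -above - 1, -1)]
    return [c for c in concs if c <= max_pct]

print([round(c, 2) for c in dilution_series(25.0)])   # factor 0.7
print(dilution_series(25.0, factor=0.5))              # conventional 0.5 series
```

With a factor of 0.7 the series is 12.25, 17.5, 25, 35.71, and 51.02% effluent, close to the rounded values on the slide; a factor of 0.5 reproduces the 6.25-100% series. A larger factor (closer to 1) tightens spacing around the IWC at the cost of range.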

RP TEST DESIGN SUMMARY

• Minimize inter-test method variability

• Ensure representative test results through control of parameters not limited by methods

• Account for dilution in tests

• Balance maximum dilution factors in tests with endpoint uncertainty

MOST SENSITIVE SPECIES SELECTION



MOST SENSITIVE SPECIES (MSS) DETERMINATION

• Purpose
– To determine which test species is “most sensitive” to an effluent source or ambient water

• Desired toxicity information from MSS
– Variability/seasonality
– Magnitude or frequency of “sensitive” response

COMMON CONSIDERATIONS

• Test Frequency
• Species Selection
• Dilution Water Type
• Sample Type
• Concentration Series
• Statistical Analysis

FREQUENCY AND TIMING OF MSS SCREENS

• Balance of Cost and Adequate Information

• Initial or Reevaluation

• Seasonal or Summary Information Desired

SELECTION OF TEST SPECIES

• Diversity of Organism Types
– Plant, vertebrate, invertebrate

• Nature of Receiving Water
– Salinity, resident species

• Non-promulgated, Resident Species

• Suspected Toxicant(s)
– USEPA Region 9 & 10 Guidance Document

SELECTION OF DILUTION WATER

• Method Defined Synthetic Dilution Water

• Natural Receiving Waters

• Receiving Water Defined Synthetic Dilution Water

SELECTION OF SAMPLE TYPE

• Whole Effluents
• Receiving Water
• Composite or Grab Samples

CONCENTRATION SERIES SELECTION

• Multiple Concentration Tests
– Preferred experimental design for MSS screens
– Select concentrations based on the IWC and elucidation of the concentration-response (C-R) relationship

• Single Concentration Tests (Pass/Fail)
– Effective if cost is prohibitive
– Control and IWC

STATISTICAL ANALYSIS AND INTERPRETATION

• Multiple Biological Endpoints
• Combining Multiple Screen Results
• Statistical Analysis Method

MULTIPLE BIOLOGICAL ENDPOINT ANALYSIS

• Evaluate each biological endpoint

• Use most “toxic” endpoint

[Figure: kelp germination and germ tube length; NOEC and EC/IC25 (effluent concentration, %) for each endpoint.]

METHODS OF COMBINING MSS RESULTS

• Proportion (X times out of Y screens)

• Averaging

Multiple MSS Data Using FW Chronic Tests

[Figure: EC25/IC25 (% effluent) for FH, CD, and SC in screens 1, 2, and 3; * marks the most sensitive species in each screen.]

Species                Proportion (X/Y)   Average EC25/IC25 (% effluent)
Fathead Minnow (FH)    67% (2/3) *        87%
Ceriodaphnia (CD)      33% (1/3)          70% *
Selenastrum (SC)       0% (0/3)           97%

(* = most sensitive species under each combining method)
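The two combining methods can disagree, as in the table above. The sketch below computes both from per-screen EC25/IC25 values. The values are invented for illustration (they are not the data behind the slide) and were chosen to show the same kind of disagreement: FH is most often the most sensitive, while CD has the lowest average. Censored results such as ">100%" are not handled and would need a convention before averaging.

```python
from statistics import mean

# Hypothetical EC25/IC25 results (% effluent) for three MSS screens.
# Invented for illustration; NOT the values behind the slide.
screens = {
    "FH": [55.0, 90.0, 70.0],
    "CD": [65.0, 50.0, 80.0],
    "SC": [100.0, 95.0, 90.0],
}

def proportion_most_sensitive(results):
    """Fraction of screens in which each species had the lowest EC25/IC25."""
    n_screens = len(next(iter(results.values())))
    counts = {species: 0 for species in results}
    for i in range(n_screens):
        # Ties go to whichever species is listed first (a simplification).
        lowest = min(results, key=lambda species: results[species][i])
        counts[lowest] += 1
    return {species: counts[species] / n_screens for species in results}

def average_endpoint(results):
    """Mean EC25/IC25 per species across screens (lower = more sensitive)."""
    return {species: mean(values) for species, values in results.items()}

proportions = proportion_most_sensitive(screens)
averages = average_endpoint(screens)
print(proportions)  # FH lowest in 2 of 3 screens
print(averages)     # CD has the lowest average
```

Reporting both summaries, rather than one, makes it visible when the choice of combining method decides which species is called "most sensitive".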

STATISTICAL ANALYSIS METHODS FOR MSS SCREENS

• NOECs
• Point estimates
• Probability of effect at critical concentration (pECC)

NOECs

• Experimental question: Which method/species is most likely to identify a change from the control response?

ADVANTAGES OF NOECs

• Common method

• Integrates effect and intratest variability
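Because a NOEC is read off a set of comparisons against the control, it is always one of the tested concentrations (or below the lowest one). A minimal sketch, assuming one-sided p-values against the control are already available, for example from Dunnett's or Steel's test; `noec_loec` and the p-values are hypothetical.

```python
def noec_loec(concs, pvalues, alpha=0.05):
    """Derive the NOEC and LOEC from one-sided comparisons with the control.

    concs   -- tested effluent concentrations (%), ascending, control excluded
    pvalues -- p-values for "treatment response is lower than control"
    Returns (noec, loec). noec is None when even the lowest concentration
    is significant; loec is None when no concentration is significant.
    """
    for i, (conc, p) in enumerate(zip(concs, pvalues)):
        if p < alpha:
            noec = concs[i - 1] if i > 0 else None
            return noec, conc
    return concs[-1], None

concs = [6.25, 12.5, 25, 50, 100]
pvals = [0.90, 0.40, 0.20, 0.01, 0.001]   # hypothetical multiple-comparison results
print(noec_loec(concs, pvals))            # (25, 50)
```

Shifting the tested concentrations or adding replicates changes which comparisons reach significance, which is why a NOEC mixes biological effect with statistical sensitivity.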

[Figure: NOEC (% effluent) by species (FH, CD, SC).]

DISADVANTAGES OF NOECs

• Cannot separate biological effect from statistical sensitivity

• Cannot be averaged

• NOECs may not be environmentally relevant

[Figure: NOEC and EC/IC25 (effluent concentration, %) by species (FH, CD, SC) relative to the IWC; two values plotted as >100%.]

POINT ESTIMATES

• Experimental question: Which method/species shows the specified effect at the lowest concentration?

ADVANTAGES OF POINT ESTIMATES

• Evaluates a common effect level

• Utilizes the entire concentration-response curve (parametric models)

• Can use proportion or average analysis

[Figure: effect (%) versus concentration (%): FH EC25/IC25 = 70% *, CD EC25/IC25 = 90%, SC EC25/IC25 > 100%.]

DISADVANTAGES OF POINT ESTIMATES

• Effect level selection

• Concentration-response required

• Smoothing

• No consideration of endpoint precision

• EC values may not be environmentally relevant
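The "smoothing" item refers to steps like those in EPA's linear-interpolation ICp method, which makes treatment means non-increasing before interpolating between the two concentrations that bracket a p% reduction from the control. The sketch below follows that idea under simplifying assumptions (equal weight per treatment mean, no bootstrap confidence interval); the function names and the dry-weight data are hypothetical.

```python
def smooth_nonincreasing(means):
    """Pool adjacent means until the sequence never increases (equal weights)."""
    blocks = []  # each block is [sum_of_means, count]
    for m in means:
        blocks.append([m, 1])
        # Merge while the previous block's mean is below the current one.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    smoothed = []
    for total, count in blocks:
        smoothed.extend([total / count] * count)
    return smoothed

def icp(concs, means, p=25.0):
    """Linear-interpolation ICp; concs[0] must be the control (0%)."""
    m = smooth_nonincreasing(means)
    target = m[0] * (1.0 - p / 100.0)   # p% reduction from smoothed control
    for j in range(len(concs) - 1):
        if m[j] >= target > m[j + 1]:
            return concs[j] + (target - m[j]) * (concs[j + 1] - concs[j]) / (m[j + 1] - m[j])
    return None  # p% reduction not reached within the tested range

concs = [0, 6.25, 12.5, 25, 50, 100]
dry_weight = [0.50, 0.52, 0.45, 0.40, 0.30, 0.10]   # hypothetical mean mg/larva
print(smooth_nonincreasing(dry_weight))
print(icp(concs, dry_weight, 25.0))
```

Here the 6.25% mean exceeds the control, so the two are pooled to 0.51 before the IC25 (about 29.4% effluent) is interpolated between 25% and 50%. The result depends on the smoothing rule and on only two adjacent treatments, which is part of why point estimates carry the disadvantages listed above.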

[Figure: effect (%) versus concentration (%) relative to the IWC: FH EC25/IC25 = 70% *, CD EC25/IC25 = 90%, SC EC25/IC25 > 100%.]

PROBABILITY OF EFFECT AT THE CRITICAL CONCENTRATION (pECC)

• Experimental question: At the concentration of environmental concern, which method/species had the greatest effect at the lower 95% confidence limit?
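One way to obtain the lower confidence limit on the effect at the IWC is bootstrapping, which the disadvantages of pECC also mention. The sketch below resamples control and IWC replicates, computes the percent effect for each resample, and takes the lower percentile of a two-sided 95% interval. The percentile method, the 2,000 resamples, and all names and data are assumptions, not the panel's procedure. It also illustrates the zero-replicate-variance problem: identical replicates produce a bootstrap distribution with no spread.

```python
import random
from statistics import mean

def percent_effect(control, treatment):
    """Percent reduction of the treatment mean relative to the control mean."""
    c = mean(control)
    return 100.0 * (c - mean(treatment)) / c

def pecc_lower_limit(control, iwc, n_boot=2000, alpha=0.05, seed=1):
    """Lower bound of a percentile-bootstrap 100*(1-alpha)% interval for percent effect."""
    rng = random.Random(seed)
    boots = []
    for _ in range(n_boot):
        c = rng.choices(control, k=len(control))   # resample control replicates
        t = rng.choices(iwc, k=len(iwc))           # resample IWC replicates
        boots.append(percent_effect(c, t))
    boots.sort()
    return boots[int(alpha / 2 * n_boot)]

# Hypothetical replicate responses (e.g., young per female)
control = [20.0, 22.0, 21.0, 19.0, 23.0]
iwc = [15.0, 17.0, 16.0, 18.0, 14.0]
point = percent_effect(control, iwc)
lower = pecc_lower_limit(control, iwc)
print(round(point, 1), round(lower, 1))

# Zero replicate variance: every resample is identical, so the "interval" collapses.
print(pecc_lower_limit([20.0] * 4, [15.0] * 4))   # 25.0
```

Comparing species on `lower` rather than `point` rewards endpoints that are both large and precise at the IWC, which is the intent of the pECC question.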

ADVANTAGES OF pECC

• Considers precision of response estimate

• Can use proportion or average analysis

• Environmental relevance

• No concentration-response required

[Figure: effect (%) at the ECC and pECC by species (FH, CD, SC).]

DISADVANTAGES OF pECC

• Zero replicate variance

• Bootstrapping

• Obtaining 95% confidence intervals at the IWC

[Figure: effect (%) at the ECC and pECC by species (FH, CD, SC), second example.]

SUMMARY

• Discuss the MSS procedure in detail during permit development

• Select a variety of organism types

• Initially test for trends in toxicity

• Continue periodic screening

• Select type of statistical analysis carefully

• Make sure that statistical analysis and the raw results “make sense”

WHOLE EFFLUENT TOXICITY TEST DESIGN

WET TESTING DESIGN

• Important factors
– discharge concentration of concern
– type of statistical analysis
– typical toxicant(s)
– dilution/control water
– receiving water quality
– number of concentrations tested
– stage in testing program (initial, advanced)

DISCHARGE CONCENTRATION OF CONCERN (COC)

• Acute
– effluent concentration after initial dilution (if allowed) at the edge of the acute mixing zone, multiplied by 3.3 (TSD, 1991) to convert a concentration at the LC1 to a concentration at the LC50

• Chronic
– effluent concentration after the dilution available at the edge of the chronic mixing zone
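The arithmetic behind the two COCs can be sketched as follows, assuming dilution is expressed as total volume per volume of effluent (10 means 10:1), so the effluent concentration is 100/dilution percent. The function names are hypothetical. With no acute mixing zone (dilution of 1), the acute COC exceeds 100% effluent and cannot be tested directly.

```python
ACUTE_LC1_TO_LC50 = 3.3  # TSD (1991) factor cited on the slide

def iwc_percent(dilution):
    """Effluent concentration (%) after dilution (total volume / effluent volume)."""
    return 100.0 / dilution

def acute_coc(acute_dilution):
    """Acute COC (% effluent) for comparison with an LC50."""
    return ACUTE_LC1_TO_LC50 * iwc_percent(acute_dilution)

def chronic_coc(chronic_dilution):
    """Chronic COC (% effluent) at the edge of the chronic mixing zone."""
    return iwc_percent(chronic_dilution)

print(acute_coc(10.0))    # about 33% effluent for a 10:1 acute dilution
print(chronic_coc(4.0))   # 25.0% effluent for a 4:1 chronic dilution
```

In this example, an LC50 above about 33% effluent and chronic endpoints above 25% effluent would be the comparisons of interest, and the test dilutions would be centered on those values.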

TYPES OF WET TESTS

• COC and control

• Multiple concentrations and control

WET TESTS WITH MULTIPLE CONCENTRATIONS

• Recommended design for discharge monitoring

• Usually includes a small number of replicates

• Focuses more on the concentration-response relationship

• Dilutions center on the COC

• EPA recommends a dilution factor > 0.5

• Maximize the dilution factor with endpoint uncertainty and inter-test variability in mind

WET TESTING ONLY THE COC

• Design for ambient and some discharge monitoring

• Little flexibility in test design

• Increase number of replicates and/or organisms to increase confidence in results

• Information on the concentration-response relationship is not available and not considered

WET TESTS & WATER QUALITY PARAMETERS

• Parameters must match the goals of testing, either:
– instream condition of the discharge upon dilution, or
– inherent toxicity of the discharge independent of instream condition

WET TEST WATER QUALITY PARAMETERS

• Most common parameters of concern
– hardness
– salinity
– pH
– temperature
– conductivity

• Test design solution: extra controls

EXAMPLE OF ADDITIONAL CONTROL TO ADDRESS HARDNESS

• Example goal: test instream condition of discharge after dilution

• Daphnids cultured at 120 ppm

• Discharge and receiving water are at 300 ppm

• Prepare extra controls at 300 ppm hardness and compare results with dilutions tested

WET TEST DESIGN AND TYPICAL TOXICANTS

• The toxicant(s) suspected determine if and which test conditions are important

• Good example is ammonia:
– pH affects ammonia toxicity
– pH is not strictly limited by the methods
– pH drift beyond realistic levels may bring unionized ammonia to unrealistic levels

• Test design solution: use pH control in WET tests

WET TEST DESIGN & DILUTION/CONTROL WATER

• Depends on test goals

• Instream mixed discharge condition
– use of water upstream from discharge preferred
– second choice is water similar to upstream water
– the more culture and dilution water differ, the more important acclimation before testing becomes

WET TESTING FREQUENCY

• Dependent on variability in condition (instream or discharge)

• As variability increases, frequency should increase

• Balance variability and frequency of testing with cost

• Goal is to accurately represent the condition in question

WET TEST DESIGN & STAGE OF TESTING

• Species sensitivity varies with biological endpoints and test conditions

• Frequency of testing and number of endpoints tested can decrease as data set increases

WET TEST DESIGN & STATISTICS

• Statistical approach used to analyze results affects test design and usually is permit-defined

• Point estimates benefit from fewer replicates but more treatments

• Hypothesis testing benefits from greater numbers of replicates but the number of treatments minimally affects results

WET TEST DESIGN SUMMARY

• Focus on condition to be tested and question being asked

• Ensure test parameters are representative of condition being tested

• Testing frequency is driven by temporal variability in condition

• Design tests to meet requirements of statistical approaches to be used

Ambient Water Testing: Experimental Design and Data Analysis

SETAC Expert Advisory Panel Performance Evaluation and Data Interpretation

AMBIENT TOXICITY TESTING

OBJECTIVES OF AMBIENT TOXICITY TESTING

• Objectives vary
– General assessment of water quality in streams, rivers, bays, and oceans

• Determine whether water body should receive more focused assessment

• Assess whether a water body or segment should be placed on or removed from the CWA Section 303(d) list of impaired waterways

• Ascertain source of water contamination

OBJECTIVES OF AMBIENT TOXICITY TESTING - Cont.

• Compare results of effluent toxicity tests with receiving water tests

• In conjunction with TIEs, and associated chemical analysis, identify the cause(s) of contamination

• Assess the success of remediation efforts

• Determine compliance with water quality standard for toxicity

INFORMATION PROVIDED BY AMBIENT TOXICITY TESTING

• Toxicity testing procedures with TIEs and chemical analyses have been used effectively to identify the chemical causes and sources of water quality contamination.

• When applied in conjunction with carefully designed sampling regimes (e.g., site selection and timing of collection) these procedures can describe:

– Magnitude of toxicity
– Temporal extent (duration and frequency)
– Spatial/geographic distribution
– Land use practices responsible for toxicity

STRENGTHS OF SINGLE SPECIES TESTS

• Provide an integrative measure of aggregate, additive toxicity

• Provide a direct measure of toxicity and bioavailability

• In combination with TIEs, they can identify chemical cause(s) of toxicity

• Measure toxicological responses to chemicals for which there are no chemical specific water quality standards

STRENGTHS OF SINGLE SPECIES TESTS - Cont.

• Reliable predictors of instream impacts

• Afford reliable, repeatable, and comparable results relative to other types of biological and chemical tests

• Furnish an early warning signal so that actions can be taken to minimize ecosystem impacts from toxic chemicals

• Can be performed quickly and inexpensively compared to other biological monitoring procedures

LIMITATIONS OF SINGLE SPECIES TESTS

• Do not characterize the persistence/duration or frequency of exposures in ambient waters without repeated sampling and testing

• Do not directly measure biotic community responses

• Do not encompass the range of species, sensitivities, or functions (endpoints) responsive to toxic chemicals which occur in biological communities

LIMITATIONS OF SINGLE SPECIES TESTS - Cont.

• Do not measure delayed impacts or effects due to bioaccumulation or bioconcentration, mutagenicity, carcinogenicity, teratogenicity, and enrichment.

• Laboratory tests do not reflect the multivariate and complex exposure conditions which exist in many aquatic ecosystems

• Results may underestimate biotic community responses to chemicals because of multiple stressors acting on aquatic ecosystems

• Use of surrogate species may not represent toxicological sensitivities in some aquatic ecosystems


AMBIENT TESTING METHODS

• Usually U.S. EPA marine or freshwater methods

• Other (e.g., ASTM) protocols or indigenous species tests are sometimes used

DEVIATIONS FROM U.S. EPA EFFLUENT TESTING PROCEDURES

• Ambient water testing follows U.S. EPA protocols for testing effluents with a few exceptions

• A dilution series usually is not included in testing until TIEs are performed on toxic samples

• Water renewals may be from a single sample

• Number of control replicates may be increased

• Tests are conducted in glass or Teflon containers

“TIERED” APPROACH TO AMBIENT TESTING

• Initial surveys intended to characterize watershed or waterbody sites over several years or hydrologic cycles - sampling may be monthly

• Focused follow-up studies may include:

– Increased number of sites and frequency of sampling

– TIEs conducted

– Evaluation monitoring to assess toxicity reduction/remediation efforts

EXPERIMENTAL DESIGN

• Centers around selection of:

– Surface waterbody or segment(s) thereof to be monitored

– Number and location of sampling sites

– Sample type

– Timing/period and frequency of sampling

FACTORS TO CONSIDER WHEN SELECTING SAMPLING SITES

• Significant source of flow or loads into the watershed?

• Representative type of drainage (agriculture, urban, mining, etc.)?

• Receives runoff from particular land use?

• Predicted or suspected toxicity?

• “Integrator” site indicative of inputs and/or of waterway (e.g., near mouth of river)

• Previously identified toxicity?

• Critical or sensitive habitat?

TYPE OF SAMPLE

• Composite collected over various time periods

• Sub-surface grab sample

SELECTING PERIOD AND FREQUENCY OF SAMPLING

• Selecting sampling period depends on objectives of investigation

• Selecting sampling frequency relates to defining duration and frequency of toxic events

DATA ANALYSIS

• EPA recommends t-tests to compare laboratory control to single ambient water sample

• ANOVA and Dunnett’s multiple comparison are appropriate for multiple sites/samples
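For a single ambient sample, the comparison reduces to a two-sample t statistic against the laboratory control. The stdlib-only sketch below returns t and its degrees of freedom, with an optional unequal-variance (Welch-Satterthwaite) form for heterogeneous variances; the data and the `two_sample_t` name are hypothetical, and the p-value step is left to a t table or a statistics library.

```python
import math
from statistics import mean, variance

def two_sample_t(control, sample, equal_var=True):
    """t statistic for (control mean - sample mean) and its degrees of freedom.

    A positive t means the ambient sample responded less than the control.
    """
    n1, n2 = len(control), len(sample)
    v1, v2 = variance(control), variance(sample)
    if equal_var:
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        a, b = v1 / n1, v2 / n2
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return (mean(control) - mean(sample)) / se, df

# Hypothetical survival or growth responses per replicate
control = [10.0, 12.0, 11.0, 13.0, 12.0]
ambient = [8.0, 9.0, 7.0, 10.0, 8.0]
t, df = two_sample_t(control, ambient)
print(round(t, 2), df)
```

With 8 degrees of freedom, the one-tailed critical t at α = 0.05 is 1.860, so this hypothetical ambient sample (t ≈ 4.44) would be judged significantly lower than the control. For several sites, the ANOVA and Dunnett's approach controls the error rate across comparisons instead.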

ECOLOGICAL RELEVANCE QUESTION

• Are the results of the U.S. EPA tests, or other single species tests, reliable predictors of biotic community responses/impacts?

TWO REVIEWS OF ECOLOGICAL RELEVANCE ISSUE

• Waller, W.T., et al. 1996.

• de Vlaming, V., and T.J. Norberg-King. 1999.

ENCAPSULATED CONCLUSIONS OF REVIEWS

• SETAC Panel - “It is unmistakable and clear that when U.S. EPA toxicity test procedures are used properly, they are reliable predictors of environmental impact provided that the duration and magnitude of exposure are sufficient to resident biota.” and “a strong predictive relationship exists between ambient toxicity and ecological impact.”

ENCAPSULATED CONCLUSIONS OF REVIEWS - Cont.

• de Vlaming and Norberg-King - U.S. EPA and other single-species toxicity test results are, in a majority of cases, reliable qualitative predictors of responses in aquatic ecosystem populations.

DE VLAMING AND NORBERG-KING SUMMARY

• Available literature yields a weight-of-evidence demonstration that WET and other indicator-species toxicity test results are reliable qualitative predictors of biotic responses.

• There are no empirical data which demonstrate that the indicator species results consistently fail to provide reliable predictions of instream biological responses.

DE VLAMING AND NORBERG-KING SUMMARY - Cont.

• When toxicity test results fail to provide a reliable prediction, they more frequently underestimate instream biological responses.

• Lab toxicity test results do not tend to overestimate bioavailability of chemicals.

• Reliability with which toxicity test results predict instream biological responses increases when tests are performed on ambient waters and as the magnitude of toxicity increases.

• Reliability with which toxicity test results predict instream biological responses increases with characterization of persistence and frequency of toxicity.

• Reliability with which toxicity test results predict instream biological responses increases with effective matching (or accounting for) of lab and field exposures.


TIE/TRE TEST DESIGN

TIE/TRE GOAL

• To identify, confirm, and remove toxicant(s) in order to bring effluent into compliance with water quality standards

• Test design is dependent on the phase of the TIE and the magnitude/variability of toxicity present

• As toxicity decreases, number of replicates and identification/confirmation trials may need to increase

TEST DESIGN AND PHASE I TIE

• Use the species whose testing originally indicated toxicity

• Many sample manipulations

• Minimum number of replicates/treatment

• Primarily analyze with hypothesis testing and BPJ

• Test at 100% concentration or concentration providing significant response compared to controls

• Minimum QA/QC

TEST DESIGN AND PHASE III TIE

• May use more than one species to compare sensitivities in supporting hypothesis

• Few sample manipulations

• Number of replicates and treatments similar to normal tests

• May use hypothesis or point estimate statistical approaches, depending on the permit

• Usually test at multiple concentrations to support point estimates and to capture concentration-response relationships

• Standard QA/QC

OTHER TIE/TRE TEST DESIGN ISSUES

• Flexibility
• Temporal variability within and between samples
• Screening
• Dilution water
• Controls for manipulations
• QA/QC

FLEXIBILITY

• Be creative

• Do not be constrained by required methods

• Consider toxicology in test design and interpretation
– rate of action
– changes with organism age or development

• Consider magnitude of toxicity for chronic TIEs - can you use acute tests?

REFERENCE TEST APPROACH: FLUORIDE LC50s FOR EFFLUENT AND LAB WATER

Age (days)   Series #1    Series #2    Series #3
2            7.8, 4.7     7.1, 4.4     8.0, 5.0
4            11.0, 6.8    11.7, 6.8    9.5, 7.3
6            16.3, 9.3    17.6, 8.0    18.6, 9.2

TEST DESIGN & TEMPORAL VARIABILITY

• Variability can occur within and between samples, as well as between toxicant(s), over time

• As toxicity persistence within samples decreases, more frequent renewals may be required

• As temporal variability in toxicant identity and magnitude of toxicity increases, the number of trials increases

TIE/TRE TEST DESIGN AND SCREENING

• Only possible if screen can be a reliable predictor of toxicity in definitive test

• Utility of screens impacted when toxicity is not persistent

• Good idea when toxicity is unpredictable between samples - saves resources

• Difficult for chronic TIE/TREs

TIE/TRE TEST DESIGN AND DILUTION WATER

• Should use same dilution water as that in tests which originally suggested toxicity

• Advisable to test another dilution water to see if it impacts test results

• Dilution water may influence toxicity and TIE interpretation

• Differences may be biological, chemical or physical

TIE/TRE TEST DESIGN AND ADDITIONAL CONTROLS

• Phase I includes numerous manipulations of tested sample

• Manipulations may cause toxicity independent of samples

• Be wary of chemical additions which oxidize or reduce (examples will be provided)

• Solution: treat control water in same fashion as sample and add to test as another control

TIE/TRE TEST DESIGN SUMMARY

• Design changes with stage of study

• Focus resources on issues specific to each stage of study

• Maintain flexibility and creativity

• Avoid false conclusions with multiple controls and checks

• Expertise