
CHAPTER 17

Statistical Analysis of Comet Assay Data

DAVID P. LOVELL

Department of Biostatistics, Postgraduate Medical School, University of Surrey, Daphne Jackson Road, Manor Park, Guildford, Surrey, GU2 7WG, UK

From Issues in Toxicology No. 5, The Comet Assay in Toxicology, edited by Alok Dhawan and Diana Anderson. © Royal Society of Chemistry 2009. Published by the Royal Society of Chemistry, www.rsc.org.

17.1 Introduction

The single-cell gel electrophoresis (SCGE) or Comet assay is a quick, relatively simple and economic method for the investigation of single- and double-strand breaks in DNA. The assay has been used in in vivo and in vitro experimental approaches across a range of species as well as in human studies and other biomonitoring investigations. It is now used to assess the genotoxicity of chemical and physical agents. The method is increasingly accepted by regulatory authorities in their assessment of the genotoxicity of chemicals1 and initiatives are underway to develop OECD guidelines for an in vivo version, while in vitro methods are being investigated with the objective of future validation.2 Special issues of Mutagenesis (2008, 23, 143–240), Mutation Research (2009, 681, 1–109), and Cell Biology and Toxicology (2009, 25, 1–98) provide overviews of the fields of research where the Comet assay is now used.

The standard alkaline Comet assay detects strand breaks and acid-labile sites3 but, since its first description in the 1980s, the fields of research the assay has been applied to have grown and it now exists in a number of forms, with new applications of the methods continuing to be developed, such as assays incorporating lesion-specific enzymes.4 This has resulted in a series of protocols that have undergone various modifications depending upon the proposed use.


However, general recommendations and guidelines for carrying out the Comet assay have been produced.5–7

The purpose of this chapter is to discuss some of the experimental design and statistical analysis issues associated with the use of the Comet assay. The objective is to discuss some of the statistical concepts underlying the design of comet experiments, with emphasis on aspects of experimental design as opposed to a detailed mathematical treatment of different statistical methods. In this context, the link between the experimental unit (a term that has a very precise meaning in the context of statistical methodology) and the statistical analysis is critical. More detailed discussion of statistical methods can be found in Lovell et al.8 and Lovell and Omori,9 which include sets of recommendations.

17.2 Experimental Design and Statistical Analysis

Although researchers often concentrate much of their attention on the specific methods used for carrying out statistical tests, it is important to appreciate that this is only part of the statistical input into the design of Comet assay studies. Experimental design can be viewed as strategic, while the statistical analysis of the data obtained is more tactical. The analysis applied may, thus, be somewhat secondary or consequential to the work that has gone before into the design of a successful study. It is crucial, therefore, to involve statistical expertise at the design stage.

It is frequently stated, but unfortunately sometimes ignored, that a statistician should be consulted before starting a study. This continues to be extremely relevant. Failure to seek or act on statistical advice can lead to a poor design, with the consequence that subsequent statistical analysis is either suboptimal or impossible. Such an outcome, particularly where it involves human subjects or experimental animals, is both ethically and economically unsatisfactory. No amount of statistical "wizardry" or virtuosity can rescue a badly designed experiment. If this were the only point the reader took away from this chapter then a major objective would have been achieved.

The advice is, perhaps, even more relevant now than in the past. Statistical software has become increasingly easy to use; some of it comes with the sales "pitch" that statistical analyses can be carried out without needing the help of a statistician; and instrumentation systems and apparatus often include statistical analysis options embedded in the equipment. This increased convenience comes with the risk, unless the researcher is careful, of introducing serious errors into the analysis and interpretation of studies.

17.3 Study Design

Many of the experimental design and statistical issues related to the use of the Comet assay are also relevant to other mutagenicity tests as well as, in general, to other biological systems. Many of the points made in this chapter can, therefore, be applied more generally.


Statistical methods used in mutagenicity and other toxicology studies have traditionally been based upon approaches using hypothesis testing, with the reporting of the probability associated with the testing of a null hypothesis. The findings are often reported as statistically significant when the probability or P values are below certain critical values. Formerly, the researcher had to compare the test statistic obtained from the experiment with critical values published in special sets of statistical tables.10 Now, the test statistics and their associated P values are provided as part of the output of a software package. One main criticism of the hypothesis-testing approach is that the fact that a result is statistically significant does not mean that it is a large or a biologically important result. This is a symptom of the much greater problem of equating significance testing and P values with decision making.

Many statisticians have argued for a move away from formal hypothesis testing to an approach based more upon the estimation of the size of effects detected in a study, together with some measure of the uncertainty associated with the estimate (such as a confidence interval).11,12 A number of journals have followed the British Medical Journal's approach13 in their guidelines for publications, stressing estimation over P values.

A clear objective, with a realistic chance of achieving this objective, is a crucial aspect of any study design. An example is whether a study is a dose–response investigation and whether the objective is to identify an effect of a given size. The objective ties in with the concept of the power of a study, where power is the probability of detecting an effect of a given size if it is really present (or the probability of rejecting the null hypothesis when it is false).

Comet assays can be divided into three main areas of investigation: human, animal (in vivo) and in vitro studies. These studies may be inferential or descriptive and may also be observational or experimental.

Inferential studies require a comparator (or control) group, and the objective is to identify differences between the groups. Descriptive studies do not, generally, involve hypothesis testing; instead they focus on providing an accurate description of the variables under some specified conditions. They can often be considered as hypothesis-generating studies.

Experimental and observational studies differ in the degree of intervention and the relationship with causality. Most in vivo and in vitro studies are experimental studies. In the case of human studies the potential is for a "gold standard" randomised controlled trial (RCT) or a less well controlled (quasi-experimental) comparative study. In both cases, the treatment is administered and a cause–effect relationship is sought. The statistical methods for the analysis of experimental and observational studies are similar but differ in some major respects. In the experimental study an intervention is applied to one group (such as the treated group in the classic RCT) and the effects "caused" by this intervention are assessed. In observational studies an attempt is made to find an association (using, for instance, the Bradford Hill causality criteria14). In the observational study membership of the groups is a consequence of how the groups are defined, while in the experimental case it is (or should be) determined by randomisation.


It is important to remember that the statistical tests (such as the t-tests or analysis of variance) applied to both observational and experimental studies will produce numerical results even though the assumption is that the data are from a designed experiment. However, not all the assumptions underlying the statistical tests will be met in the observational study, and this can result in biases being introduced. Copas and Li wrote: "observational studies are often analysed as if they had resulted from a controlled study, and yet the tacit assumption of randomness can be crucial for the validity of inference".15

The key concepts of experimental design are: independence of experimental units, randomisation, replication and local control. These concepts were first proposed by R.A. Fisher in the 1920–30s and have undergone much subsequent development.16–18 Fisher's work on the factorial and similar designs provides powerful methods for the investigation of planned factors while controlling inaccuracy and estimating precision.

Fisher's work laid the basis for the important and increasingly influential field of design of experiments (DOE) methodology.18 Factorial designs are particularly powerful as they provide an efficient way of exploring both the main and interaction effects of experimental factors using relatively small numbers of experimental units (which, of course, has implications for the 3Rs and animal usage). The use of DOE approaches instead of the traditional OFAT (one factor at a time) approach is an important methodological development. It is also an entry into more complex designs suitable for the investigation of mixtures and interactions. Factors that could be examined without further use of resources include the effect of sex, different treatments and diets. An example might be the investigation of factors relevant to optimising electrophoretic conditions in the Comet assay.
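As an illustration of how a factorial experiment explores main effects and interactions with few experimental units, the sketch below fits a 2×2×2 factorial in a single linear model. The factor names (voltage, time, sex) and the simulated response are invented for illustration, not settings from this chapter.

```python
# A minimal sketch of a DOE-style factorial analysis on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# 2x2x2 full factorial: three two-level factors, replicated 4 times (32 units).
design = pd.DataFrame(
    [(v, t, s) for v in (0, 1) for t in (0, 1) for s in (0, 1)] * 4,
    columns=["voltage", "time", "sex"],
)
# Simulated response: main effects for voltage and time plus an interaction.
design["pct_tail_dna"] = (
    10 + 5 * design["voltage"] + 3 * design["time"]
    + 2 * design["voltage"] * design["time"]
    + rng.normal(0, 2, len(design))
)

# Fit all main effects and two-way interactions in one model.
model = smf.ols("pct_tail_dna ~ (voltage + time + sex) ** 2", data=design).fit()
print(model.summary())
```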

17.4 Endpoints

The Comet assay is a quantitative or semiquantitative method. The identification of the endpoint to be measured is an important aspect of the study. The measurements taken need to be consistent and repeatable so that valid comparisons can be made between sets of samples from different treatments.

The assay has the advantage of being relatively straightforward to carry out and does not need particularly sophisticated equipment. However, the analysis of the images of comets is not simple. Comets can have complex shapes, and quantifying these shapes in terms of simple measures is challenging. Increasingly, computer-based image analysis is preferred to a visual classification of comets based on the morphology and degree of damage,7 although both methods are still acceptable.4 Computer-based methods can increase the precision and reduce the subjectivity of the measurement process. Image-analysis programmes are capable of collecting large quantities of data, but one challenge is to reduce these data, which Collins et al.4 call a "surfeit of information", into a smaller number of informative values that summarise and describe the comet from that particular cell.


A number of measures, some derived from the use of automated image analysis, can be taken to describe and quantify the comet (Table 17.1). Examples include measures of the absolute amount of DNA as measured by the sum of the intensities of the pixels in the head or tail, the values of the relative amount of DNA in the head or tail, measures of absolute length, the tail length and the head radius of the comet. Measures representative of the "centre of gravity" in the head and tail, as well as "moment" measures (composite measures taking into account both comet length and the intensity of staining), can be derived. The Olive tail moment, for instance, attempts to combine two aspects of the comet shape, the length of the comet and the intensity of the comet, by calculating the product of the percentage DNA in the tail and the difference between the head and tail centres of gravity.

Three measures – % tail DNA, tail length and tail moment – are now commonly used as measures of DNA migration, with an increasing tendency for the endpoint % tail DNA to be the preferred measure for assessment.2,7

Collins et al.4 discuss scoring methods and point to the limitations of some of the quantitative methods, such as the lack of a standardised measurement for comparisons across studies. Some measurements, such as tail length, are made in pixels and are, in effect, "arbitrary values". As damage increases, the intensity of the staining of the DNA in the tail increases rather than the tail length. However, the tail length may be the most sensitive endpoint at very low levels of damage. Measures may be difficult to generalise across studies, limiting, for instance, their use in quantitative comparisons across studies or for deriving power calculations. Standardisation of measurements between laboratories can be problematic as, for instance, the choice of where to begin and end tail length measures can vary between laboratories.3 Collins et al.4 also suggest there is an indication of nonlinearity at low doses in calibration curves using the tail moment.

Collins et al.4 state that % tail DNA is "strongly recommended as the parameter of choice".

Table 17.1 Various measures obtainable from image-analysis programmes.

  Head DNA           Amount of DNA in the comet head
  Tail DNA           Amount of DNA in the comet tail
  % head DNA         Percent of DNA in the comet head
  % tail DNA         Percent of DNA in the comet tail
  Head radius        Radius of the comet head
  Tail length        Length of the comet tail, measured from the right border of the head area to the end of the tail
  Comet length       Length of the entire comet, from the left border of the head area to the end of the tail
  Head CoG           "Centre of gravity" of DNA in the head
  Tail CoG           "Centre of gravity" of DNA in the tail
  Tail moment        % tail DNA × tail length
  Olive tail moment  % tail DNA × (tail CoG − head CoG)
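The derived measures in Table 17.1 are simple functions of the pixel intensities. The sketch below computes % tail DNA, tail moment and Olive tail moment from a one-dimensional comet intensity profile; the profile values and the head/tail boundary are invented for illustration.

```python
# Minimal sketch: deriving Table 17.1-style measures from a 1-D comet
# intensity profile (summed pixel intensities along the electrophoresis axis).
import numpy as np

profile = np.array([5.0, 40, 60, 55, 30, 20, 15, 10, 8, 4])  # intensity per pixel column
head_end = 4  # assumed index of the right border of the head area

head, tail = profile[:head_end], profile[head_end:]
positions = np.arange(len(profile))

pct_tail_dna = 100 * tail.sum() / profile.sum()
tail_length = len(tail)  # in pixels, i.e. an "arbitrary value"

# Centres of gravity: intensity-weighted mean positions of head and tail.
head_cog = np.average(positions[:head_end], weights=head)
tail_cog = np.average(positions[head_end:], weights=tail)

tail_moment = pct_tail_dna * tail_length
olive_tail_moment = pct_tail_dna * (tail_cog - head_cog)

print(f"% tail DNA: {pct_tail_dna:.1f}")
print(f"Tail moment: {tail_moment:.1f}")
print(f"Olive tail moment: {olive_tail_moment:.1f}")
```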


The % tail DNA value has the advantages of being expressed on a scale from 0 to 100%, making comparisons across studies easier, and of being linearly related to dose in calibration studies. Electrophoretic conditions should be adjusted to allow cells from negative control samples to show some migration of DNA, such as 5–10% tail DNA, which gives a measure of variability for the statistical analysis. Collins3 suggests that untreated control cells should, in general, have a low level of damage, probably less than 10% (but more than 0).

Suggestions have been made that the % tail DNA values of negative control cells should be between 10–20% (or 5–15%) to allow both an increase and a decrease in migration to be detected. A decrease may represent a reduction in apparent damage as a consequence of crosslinking agents. A statistical point is that a formal hypothesis test would be one-sided if any change would only be expected in one direction, while a two-sided test would be appropriate if the results could go in either direction. A two-sided test is slightly less powerful than a one-sided test.

Semiquantitative approaches have also been used to score comets on a scale from 0 to 4. Collins et al.4 show a photograph of different grades of damage. (A grade 4 comet is equivalent to the "hedgehog" cell, where all the DNA is in the tail.) These grading scores correlate well with quantitative values of % tail DNA, with the difference between each grade being equivalent to an extra 20% tail DNA.3,4 The values (0–4) given to the comets can be summed to provide a quantitative measure for 100 cells on a scale from 0 to 400. Collins et al.19 showed a close agreement in the relationship between visual- and image-analysis-based methods.

Similarly, Pitarque et al.20 calculated a genetic damage index (GDI) based upon differential weightings given to the different grades of damage for five "arbitrary" categories from Type 0 (undamaged) to Type IV (highly damaged), and used this categorisation to obtain a quantitative measure for the slide based upon a weighting applied to the number of cells with the different grades of damage, where GDI = (Type I + 2×Type II + 3×Type III + 4×Type IV)/(Type 0 + I + II + III + IV).
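A minimal worked example of the GDI calculation, using invented counts of cells in each damage grade:

```python
# Minimal sketch of the Pitarque et al. genetic damage index (GDI) for one
# slide, from counts of cells in each damage grade (counts are invented).
counts = {0: 55, 1: 20, 2: 12, 3: 8, 4: 5}  # Type 0 .. Type IV cells

# GDI = (I + 2*II + 3*III + 4*IV) / (0 + I + II + III + IV)
gdi = sum(grade * n for grade, n in counts.items()) / sum(counts.values())
print(f"GDI = {gdi:.2f}")  # 0 (all undamaged) .. 4 (all highly damaged)
```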

Cells can also be assessed using a binary endpoint, as either a "responder" or "nonresponder", based on an assessment of the degree of migration. The percentage of "responder" cells per slide is then recorded.

One approach to the analysis of the data, given the number of alternative endpoints, is to analyse a number of them. Analyses should give similar results because studies have shown appreciable correlations between the measures of different endpoints. However, if the conclusions drawn by using different endpoints differ, this would indicate that the data should be examined closely to identify the reasons for the divergent results.

Extra data on toxicity may also be collected. Collins et al.4 discuss the use and limitations of the trypan blue exclusion test for viability but point out that viability should not be a problem in the in vivo assay, though it may be for in vitro studies. Counts of the numbers of "hedgehog" cells and "ghost" comets (consistent with the complete migration of DNA) can also be made. Such data are usually not included in the formal statistical analysis of the comet measures but provide an important aid in the assessment of the quality of a study.


It is good practice to sample cells from different parts of a slide to reduce the risk of introducing biases, because the comets are not homogeneous across the slide, for instance because of "edge effects", where comets at the edge of the slide have different measures to those nearer the centre. Sampling to reduce such potential biases could consist of ensuring that, if 50 cells are measured per slide, then no more than 5 cells are taken from each of 10 or more randomly chosen areas on the slide.

Bowden et al.21 suggested that the shape of the comet may be informative and may be preferred to measures of comet mass or length. The multiple measurements possible on the comet image would allow such a suggestion to be investigated using multivariate methods, to try to distinguish whether there are particular comet shapes indicative of particular types of damage.

Collins et al.4 note that the Comet assay is very sensitive and capable of detecting between 100 and several thousand breaks per human cell. They stress the use of rigorously controlled calibration studies using ionising radiation. These show near-linear slopes from 0 to 10 Gy, suggesting that it is possible to express data as Gy equivalents that can then be converted into lesion frequencies per 10^6 Da (daltons).

Forchhammer et al.22 have suggested that the most informative way to present Comet assay results is as lesions per unaltered nucleotides or diploid cells.

Collins et al.4 point out that there is considerable interlaboratory variability in the steepness of the calibration curves, probably reflecting protocol differences. Interlaboratory comparisons are in progress to try to reduce the discrepancies caused, presumably, by subtle differences in protocols, such as variability in electrophoretic conditions. There is appreciable potential for DOE methodology to identify the important factors involved.

Statistical analysis can be carried out using the values for the individual cells, but it is often carried out at the level of the experimental unit (the animal or the culture/subculture that the treatment is applied to) or of the individual slide. Identifying a representative value for the comet measures for each experimental unit is not straightforward. Complications arise because the distribution of the individual cell endpoints is unlikely to match any of the common statistical distributions such as the normal distribution. The distributions observed, especially after treatment, are complex, and even if a function could be fitted to the distribution, it would require a number of parameters. A simple transformation such as the logarithm of the measures may not produce a normal distribution.

A number of summary statistics of the cell measures have been suggested, including the geometric mean (equivalent to the antilog of the mean of the log10-transformed data), the median (the 50th percentile) and various other percentiles: 75th, 90th and 95th. The untransformed mean is usually not recommended because the distribution, particularly of treated cells, is often skewed.23 However, the central limit theorem implies that the means of samples are approximately normally distributed even when the distribution of the population they were derived from is distinctly non-normal, provided the sample sizes are above about 30, so concern about analysis using untransformed means may not be crucial.
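The per-unit summaries mentioned above are straightforward to compute. The sketch below collapses simulated per-cell % tail DNA values (lognormal, to mimic the skew often seen) to the suggested summary statistics for one experimental unit.

```python
# Minimal sketch: collapsing per-cell % tail DNA values to summary values
# for one experimental unit (animal or culture). Data are simulated.
import numpy as np

rng = np.random.default_rng(42)
cells = rng.lognormal(mean=2.0, sigma=0.8, size=150)  # 150 cells, one animal

median = np.median(cells)                        # 50th percentile
geometric_mean = np.exp(np.mean(np.log(cells)))  # antilog of the mean log
p75, p90, p95 = np.percentile(cells, [75, 90, 95])

print(f"median={median:.1f}, geometric mean={geometric_mean:.1f}, "
      f"75th={p75:.1f}, 90th={p90:.1f}, 95th={p95:.1f}")
```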


17.5 The Experimental Unit and Experimental Design

A central concept of experimental design, critical to a successful statistical analysis, is the identification of the experimental unit. The US NIST defines it as "the entity to which a specific treatment combination is applied" (NIST, http://www.itl.nist.gov/div898/handbook/pri/section7/pri7.htm), and it is the unit to which treatments are randomised. In the case of in vivo studies this is the animal,6 while for in vitro studies it will be an independent culture or subculture.7 The linking of the experimental unit to the level at which randomisation occurs is related to the concept of independence of the measures; this is an important assumption underlying many statistical tests. An incorrect specification of the experimental unit in the statistical analysis can lead to a serious misinterpretation of the results.

Replication is an important aspect of experimental design as it provides an estimate of the "error" variability used in the statistical tests. Replication can be by either biological or technical replicates. The former are taken from separate experimental units; the latter are repeat samples from within the same unit. In general, it is better, if the opportunity arises, to increase the number of biological as opposed to technical replicates. Pooling of samples from different experimental units before measuring should, in general, be avoided. Repeated sampling from a pooled sample results in a set of technical replicates but provides no estimate of the variability between biological replicates.

In general, the Comet assay is a hierarchical or "nested" design (Figure 17.1). In a hierarchical design the experimental units (the animals in the in vivo design and the cultures in the in vitro design) are "nested" or replicated within doses, while a number of slides or gels from each animal or culture are prepared and a number of cells from each slide or gel are "scored".

Figure 17.1 Hierarchical or nested design. Example of a hierarchical in vitro design based upon 4 dose levels including a negative control, 5 cultures/subcultures at each dose level, 3 slides/gels per subculture, and 50 cells per slide/gel. (From Lovell and Omori, 2008.)


For instance, in a study of goldfish exposed to a glyphosate formulation, five fish per dose per duration were studied, five slides were prepared per fish and 200 cells were scored from each slide.24

Such designs thus have a number of levels of variability – experimental groups, animals, slides, cells – and the statistical analysis involves developing methods that model these different levels of variability. Ideally a statistical analysis should "account" for the different levels of variability in the design and avoid the error of not taking into account "hidden layers" of variability. A serious error is the use of the individual cell as the experimental unit in the statistical analysis, as this may overestimate the statistical significance of a comparison. Wrongly treating repeated measures on the same individual as independent can result in a superficially more powerful test and an overestimation of statistical significance. This problem, termed "pseudoreplication", has been recognised for some time in ecological studies, and its implications and effects on experimental design and analysis are well documented and widely, if not completely, appreciated.25,26 Similar points have been made in the neurosciences.27 Failing to take the experimental unit into account in the statistical analysis is a serious error.
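The effect of pseudoreplication is easy to demonstrate by simulation. In the sketch below, two groups are drawn from the same distribution (no true effect); treating the 500 cells per group as independent units inflates the false-positive rate far above the nominal 5%, while analysing per-animal means does not. All values are invented.

```python
# Minimal sketch of pseudoreplication with simulated data: 5 animals per
# group, 100 cells per animal, and NO true difference between groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim, n_animals, n_cells = 500, 5, 100

def group():
    means = rng.normal(20, 5, n_animals)        # between-animal variation
    return np.array([rng.normal(m, 10, n_cells) for m in means])

false_pos = {"cells": 0, "animals": 0}
for _ in range(n_sim):
    a, b = group(), group()
    if stats.ttest_ind(a.ravel(), b.ravel()).pvalue < 0.05:
        false_pos["cells"] += 1     # cell treated as the (wrong) unit
    if stats.ttest_ind(a.mean(axis=1), b.mean(axis=1)).pvalue < 0.05:
        false_pos["animals"] += 1   # animal as the experimental unit

print({k: v / n_sim for k, v in false_pos.items()})
# The cell-level rejection rate is typically several times the nominal 0.05.
```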

In the case of in vitro designs it is important to clarify the relationship between the cultures and subcultures with respect to the experimental units and to ensure adequate replication. Lovell and Omori9 illustrate the different types of in vitro designs and the distinction between repeat experiments using different cultures and the use of different cultures within the same experiment. In the absence of appropriate replication there is a danger that any variability in subculture is confounded with treatment effects, leading to potentially artifactual results. Studies which fail to take into account this hidden variability can, for instance, produce apparently significant differences between treatments. The more cells scored per subculture in these designs, the more likely a significant result will occur.

Wiklund and Agurell23 have provided specific recommendations for Comet assay designs. Based upon simulation studies, they recommended a design with 50 cells from 3 slides per experimental unit and 4 to 5 animals per group for an in vivo study, or 2 or 3 cultures for an in vitro study. Recently, Smith et al.28 have provided recommendations for the design of rat Comet assays. They suggest that a design with 6 animals per group, 3 gels per animal and 50 cells per gel would have 80% power to detect a 2-fold difference in studies using liver, bone marrow and stomach, and a 3-fold increase in studies using blood. They recommend that investigators using the rodent Comet assay should carry out a similar analysis to determine the optimal experimental design for their own laboratory.

17.6 Statistical Methods

There is no consensus on a single statistical method for the analysis of Comet assay data.6 This is not surprising, as there is probably no statistical method that can adequately handle data from the individual cell values given the complexity of the distribution of the values.9


However, concentrating on a single representative value for each experimental unit (animal or culture) will probably result in data that can, with care, be analysed by a number of standard statistical methods. Some care may, however, be needed in the interpretation. Duez et al.,29 for instance, suggested using either the median or the 75th percentile of the sample. They concluded that a trend analysis on medians of the samples was satisfactory. In practice, though, any statistical analysis is a trade-off between the sophistication of the model being fitted and the practicalities of the conduct and reporting of the data.

In broad terms there are three types of statistical comparisons that may be made: firstly, a comparison between the negative and positive control groups (see below); secondly, a test for differences between a number of groups and for a dose–response relationship; and thirdly, pairwise comparisons between the individual treated groups and the negative control group.

Statistical tests usually involve a test of a null hypothesis. In a simple case this means that there is no difference between a treated and a control group. The alternative hypothesis is that there is a difference. The statistical test applied can be either one- or two-sided.

One-sided means the treatment will have either no effect or an effect in one predefined direction; two-sided means the effect, if any, could go in either direction. It is argued that if, when using a one-sided test, an effect in the wrong (unpredicted) direction is found, it should be ignored no matter how significant it might be, because if there is any interest at all in a result in the opposite direction then a two-sided test should have been used.

The outcome of a statistical test of a null hypothesis can be illustrated by a 2×2 table (Figure 17.2). This table shows there are two types of correct results and two types of incorrect results: the Type 1 error (or α, falsely rejecting the null hypothesis), which is related to the significance level of the test, and the Type II error (or β, wrongly accepting the null hypothesis), which is related to the power of the test. The power of the test is (1 − β).

A range of statistical methods (both parametric and nonparametric) are available, and have been used, for the analysis of comet data.8,9 Each test makes some assumptions about how the study was carried out and the nature of the data. In the case of parametric statistical tests (those based upon an underlying parameterised distribution such as the normal) these are: independence, normal distribution of the residual errors and equal variability within the groups. There is also the assumption (which is often violated in observational studies) that the experimental units were randomly assigned to the treatments. If the analysis is based upon a hypothesis-testing approach there is a range of parametric tests that includes the t-tests for two-group comparisons, the analysis of variance methods for multiple-group comparisons and specific tests of dose–response relationships. These tests are special cases of the wider general linear model (GLM) methodology.

For most of the simpler parametric tests there is a nonparametric equivalent: the Mann–Whitney for two groups, the Kruskal–Wallis for multiple-group comparisons and the Jonckheere–Terpstra trend test for specific tests of dose–response relationships. Table 17.2 lists parametric tests and their nonparametric equivalents.
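For per-unit summary values, these tests are available in standard software. The sketch below runs the two-group and multi-group pairs from Table 17.2 in SciPy on simulated per-animal medians; the data and group sizes are invented.

```python
# Minimal sketch: parametric tests and their nonparametric counterparts
# applied to simulated per-animal medians of % tail DNA.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
control = rng.normal(10, 3, 6)   # 6 animals per group
low     = rng.normal(12, 3, 6)
high    = rng.normal(18, 3, 6)

# Two groups: unpaired t-test vs. Mann-Whitney.
print(stats.ttest_ind(control, high))
print(stats.mannwhitneyu(control, high))

# Three or more groups: one-way ANOVA vs. Kruskal-Wallis.
print(stats.f_oneway(control, low, high))
print(stats.kruskal(control, low, high))
```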


In general terms, the parametric test will be more powerful if the assumptions underlying it are met. The nonparametric tests are slightly less powerful under these circumstances but may give more accurate Type 1 error rates (Figure 17.2) when the assumptions are violated. However, nonparametric tests are not assumption free, and violations affecting the distributions may also affect the probability values associated with nonparametric tests. Small sample sizes or numbers of experimental units (such as n = 4 or 5) will also reduce the power of the nonparametric tests.

A test for a dose-related effect will have greater statistical power than pairwise comparisons. The dose–response test can be thought of as a more "sophisticated" hypothesis, with the potential to define a set of orthogonal (statistically independent) components testing, in a four-group design, linear, quadratic and cubic contrasts. A curvilinear response may have two or more of the components statistically significant. The greater power of these tests of specific hypotheses may mean that a shallow but real dose–response relationship can be detected by the specific linear trend test even though the overall test of the equality of the four means in the ANOVA is nonsignificant. Some decision trees/flow charts used for choosing statistical tests suggest no further testing if the overall or omnibus ANOVA test of the equality of all the group means is not significant. This is clearly inappropriate if a more specific hypothesis is implicit in the experimental design.
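The linear trend contrast can be computed directly from the group means and the ANOVA error mean square. The sketch below is a minimal illustration on simulated per-animal values for four equally spaced dose groups; the data and group sizes are invented.

```python
# Minimal sketch: an orthogonal linear trend contrast across four equally
# spaced dose groups (simulated), alongside the omnibus one-way ANOVA.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 6                                                        # animals per group
groups = [rng.normal(10 + 1.5 * d, 4, n) for d in range(4)]  # shallow trend

print(stats.f_oneway(*groups))                               # omnibus test of equal means

coef = np.array([-3, -1, 1, 3])                   # linear contrast coefficients
means = np.array([g.mean() for g in groups])
ms_error = np.mean([g.var(ddof=1) for g in groups])  # pooled within-group variance
contrast = coef @ means
se = np.sqrt(ms_error * np.sum(coef**2) / n)
t = contrast / se
df = 4 * (n - 1)                                  # error degrees of freedom
p = 2 * stats.t.sf(abs(t), df)
print(f"linear trend contrast: t = {t:.2f}, P = {p:.4f}")
```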

  Decision    Null hypothesis false               Null hypothesis true
  Reject      Correct result                      False rejection: Type I error
                                                  (significance level, α)
  Accept      False acceptance: Type II error     Correct result
              (β; power = 1 − β)

Figure 17.2 Hypothesis testing. 2×2 table showing the possible results of a test of a null hypothesis, illustrating the occurrence of the Type 1 (or α) error, which is related to the significance level of the test, and the Type 2 (or β) error, which is related to the power of the test.


A more general approach, taking account of the hierarchical design, is possible using the general linear model (GLM). The GLM is a specific case of an even more general approach, confusingly called generalised linear modelling (GZM). GZM provides a modelling approach that can be used with a variety of theoretical data distributions. A range of more sophisticated methods can be used to model the hierarchical nature of the designs. These include random-effects modelling (REM), generalised estimating equations (GEEs) and hierarchical linear models (HLMs). Some of these have been applied to Comet assay data.8 The computing facilities for these methods are becoming more widely available and the use of such methods by statisticians is likely to increase in the future. A number of statistical software packages, such as SAS (through its GLM and MIXED procedures), SPSS, Genstat and R (a public-domain open-source statistical analysis software language), can be used to carry out analyses using some of these models. Lovell and Omori8 provide more details of the range of methods available. It is not clear yet whether these methods will provide appreciably more information than the less sophisticated methods currently in use.
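As a sketch of how such a hierarchical analysis might look in practice, the example below fits a random-effects model to simulated nested data (dose as a fixed effect, animal as a random effect, cells as repeated measures within animal) using statsmodels' MixedLM. The data and variance components are invented; SAS PROC MIXED or R's mixed-model packages would be alternatives.

```python
# Minimal sketch of a random-effects model for a nested Comet design.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
rows = []
for dose in (0, 1, 2, 3):
    for animal in range(5):
        animal_effect = rng.normal(0, 3)             # between-animal variability
        for _ in range(50):                          # 50 cells per animal
            y = 10 + 2 * dose + animal_effect + rng.normal(0, 8)
            rows.append({"dose": dose, "animal": f"{dose}-{animal}", "y": y})
df = pd.DataFrame(rows)

model = smf.mixedlm("y ~ dose", df, groups=df["animal"]).fit()
print(model.summary())
```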

Decision trees for statistical tests often include tests for whether data fit a particular distribution (the Kolmogorov–Smirnov and Shapiro–Wilk tests) or have equal variability (homogeneity of variances) between groups (Levene's and Bartlett's tests). Tests for normality are likely to have high power to detect deviations because of the large datasets produced by the Comet assay. The tests are, consequently, capable of detecting relatively minor deviations from distributions such as the normal. This means that a small dataset that is non-normally distributed may show a nonsignificant departure in a goodness-of-fit test, while a large dataset showing only a slight departure will show significance. This again illustrates some of the potential pitfalls of selecting statistical tests on the basis of the results of other tests.
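A small sketch of that point: the same mildly skewed distribution usually "passes" a Shapiro–Wilk test at a small sample size and "fails" it at a Comet-sized one. The data are simulated.

```python
# Minimal sketch: goodness-of-fit test power grows with sample size.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
slightly_skewed = rng.normal(0, 1, 3000) + 0.5 * rng.exponential(1, 3000)

print(stats.shapiro(slightly_skewed[:30]))    # small n: usually nonsignificant
print(stats.shapiro(slightly_skewed))         # large n: usually significant
```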

Table 17.2 Parametric tests and their nonparametric equivalents.

  Objective                                   Parametric                            Nonparametric
  Description of a group                      Mean and standard deviation (SD)      Median and interquartile range (IQR)
  Compare one group with a standard value     One-sample t-test                     Wilcoxon signed-rank test
  Compare two groups (unpaired)               Unpaired t-test(a)                    Mann–Whitney test
  Compare two groups (paired)                 Paired t-test                         Wilcoxon signed-rank test
  Compare ≥2 groups                           One-way ANOVA                         Kruskal–Wallis test
  Compare ≥2 matched groups                   Repeated-measures ANOVA               Friedman test
  Test for linear trend                       Linear component in ANOVA             Jonckheere–Terpstra trend test
  Association (2 variables)                   Pearson's product moment correlation  Spearman correlation
  Predict dependent variable from independent Linear regression                     Nonparametric regression

  (a) Note: there are a number of versions depending upon whether the within-group variability is assumed to be the same and pooled for the analysis.


Care should be taken in choosing statistical tests if the cells have been classified as either responders or nonresponders, or some similar binary response. Both the chi-square and Fisher exact tests of 2×2 tables assume independence of the data, but if individual cells are used in the analysis rather than the correct experimental unit, such as the culture or the animal, then the tests are vulnerable to producing highly significant but incorrect results. Data in this form can be expressed as the percentage of responder cells and analysed using methods developed for handling proportional data, such as appropriate logistic regression models, or by analysis of variance after an arcsine, angular or logit transformation. Escobar et al.,30 for instance, used ordered logistic regression to investigate the use of the Comet assay combined with fluorescence in situ hybridisation (Comet–FISH) to detect DNA breakage in specific chromosomal regions in in vitro TK6 lymphoblastoid cells.
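A minimal sketch of one of the options mentioned above: per-unit responder proportions, angular-transformed before a simple two-group test. The counts are invented; a logistic regression on per-unit counts would be an alternative.

```python
# Minimal sketch: analysing per-animal responder proportions (simulated)
# with an arcsine (angular) transformation before a t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
control = rng.binomial(50, 0.10, size=6) / 50   # responder fraction per animal
treated = rng.binomial(50, 0.25, size=6) / 50

def angular(p):
    # arcsin(sqrt(p)): variance-stabilising transformation for proportions
    return np.arcsin(np.sqrt(p))

print(stats.ttest_ind(angular(control), angular(treated)))
```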

In conclusion, there is no general consensus as to which method should be the standard.8 In practice, because of the different statistical philosophies underlying statistical analysis, this is likely to continue, especially as approaches based upon estimation rather than hypothesis testing, Bayesian methods and modelling become more widely used. A range of different methods can, therefore, be used. It may be useful to see if different methods give broadly similar results. If not, and the conclusions differ, it would be sensible to explore the data to try to identify what causes the difference in interpretation.

17.7 Use of Control Groups

Many experimental studies will have two types of concurrent controls included in the design: a negative (vehicle) control and a positive control using a compound known to produce comets. Further control groups may be included if comparisons between, for instance, untreated and vehicle-treated groups are considered relevant.

The negative vehicle control is a comparator for the various treated groups and is involved in the formal statistical comparison between groups. The role of the positive control is different. It may be included to characterise the sensitivity of the test method and to provide a check or an evaluation of the testing techniques of the laboratory.

Statistical tests between the negative and positive control groups can misleadingly produce nonsignificant results because of the small sample sizes, together with the high variability sometimes found in the positive control group resulting from variability in response, such as can arise from a mixture of responders and nonresponders. This would create problems if there were a decision rule that an experiment was rejected as unsatisfactory if a significant difference was not found between the two groups.

It is, therefore, not necessary to make formal statistical tests between the two control groups. Rather, as the purpose of the positive control is to check the technique, consideration should be given to methods that minimise the number of animals needed to provide this reassurance.


An approach that made more use of historical control data, or used a few or even one concurrent positive control animal to demonstrate technical capability, would reduce animal usage. In this context, although methods could be developed that took into account the difference between the two control groups, it does not seem sensible to include the positive control data in any formal testing method for the test material.

The development of sets of both negative and positive historical control data by a laboratory is a resource that could be used in conjunction with quality control (QC) methods31,32 to assess the quality of the concurrent experimental work and to identify and, if necessary, correct any long-term drift in the performance of the assay by the laboratory. Such an assessment could be part of the evaluation of whether the experimental study is satisfactory.
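One simple way a laboratory might use such historical negative-control data is a control-chart style check of each new study against historical limits. The sketch below uses invented values and conventional 3-SD Shewhart limits; it is an illustration, not a prescribed QC procedure.

```python
# Minimal sketch: Shewhart-style QC limits from a laboratory's historical
# negative-control % tail DNA means (values invented), to flag drift.
import numpy as np

history = np.array([8.2, 9.1, 7.5, 8.8, 10.2, 9.4, 8.0, 9.9, 8.6, 9.0])
mean, sd = history.mean(), history.std(ddof=1)
lower, upper = mean - 3 * sd, mean + 3 * sd
print(f"control limits: {lower:.1f} to {upper:.1f}")

new_study = 12.9
if not (lower <= new_study <= upper):
    print("concurrent negative control outside historical limits; investigate")
```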

17.8 Assessment of Results

Controversies over the use of statistics to assess the results of studies often relate to the use of probability values to draw a conclusion. The equating of a significant effect with a positive result and a nonsignificant effect with a negative result is a serious problem. This is a symptom of the greater problem of equating significance testing and P values with decision making. Increasingly, statistical opinion is moving from the concept of hypothesis testing to the ideas of estimation and model testing.12 Much more emphasis, therefore, should be given to the estimation of the size of an effect and the confidence interval associated with it than to the specific statistical significance level.
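A minimal sketch of estimation-style reporting for a two-group comparison: the effect size with its 95% confidence interval, computed from simulated per-animal values using the pooled-variance formula.

```python
# Minimal sketch: report the effect size with a 95% confidence interval
# rather than a bare P value. Data are simulated per-animal % tail DNA.
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
control = rng.normal(10, 3, 6)
treated = rng.normal(16, 3, 6)

n1, n2 = len(control), len(treated)
diff = treated.mean() - control.mean()
# Pooled variance and standard error of the difference in means.
sp2 = ((n1 - 1) * control.var(ddof=1) + (n2 - 1) * treated.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = stats.t.ppf(0.975, n1 + n2 - 2)
print(f"difference = {diff:.1f}; 95% CI ({diff - t_crit * se:.1f}, {diff + t_crit * se:.1f})")
```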

One example of this philosophy is the increased emphasis on the iterative and data-driven aspects of model building, which contrasts with the development of the codified statistical analysis plans (SAPs) increasingly required in regulatory science.33 Longford and Nelder, in particular, criticise what they call the "cult of the single study", the use of P values, multiple comparisons and nonparametric tests in the provision of evidence to regulatory authorities. Nester34 discusses some of the philosophical underpinnings of statistical analyses and has produced a set of quotes criticising the use of significance testing. (This is reproduced at http://welcome.warnercnr.colostate.edu/~anderson/nester.html.)

The comet experiment is a test of whether a compound is biologically active. The statistical test of whether a test such as the Comet assay is a good predictor of, say, genotoxicity is different from whether a compound is detected as positive in an experiment.

The finding of a significant effect of a treatment in a comet experiment does not mean that this compound is predicted to be, say, a carcinogen. Dichotomisation of results into a genotoxic or nongenotoxic classification based upon a decision rule may be a convenient management/regulatory endpoint. However, dichotomisation leads to a loss of information.33,35 A consequence is that some weak mutagens will be "called" negative and disagreements will occur when different criteria are used by different laboratories. Longford and Nelder point to the potential of modelling approaches to handle the dichotomisation problem.


It is also important to note that the two statistical procedures – assessing the results of a study and measuring a method's predictive ability – are different processes, and that equating a negative result arising from a dichotomisation with a mechanistic threshold is a serious error.

Comparisons against some gold standard provide a test of whether the assay is a good discriminator of carcinogens and noncarcinogens.36 A 2×2 table for the properties of a diagnostic test is shown in Figure 17.3. Estimates of statistics such as sensitivity and specificity can be derived. Although this 2×2 table is superficially similar to that in Figure 17.2, the false-positive and false-negative errors are different from the Type 1 and Type 2 errors associated with hypothesis testing.
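The Figure 17.3 statistics are simple ratios of the four cell counts. The sketch below computes them for an invented 2×2 table of short-term test results against known status.

```python
# Minimal sketch of the Figure 17.3 statistics from an assumed 2x2 table of
# short-term test results against known carcinogen status (counts invented).
a, b = 45, 10   # test positive: carcinogen (a), noncarcinogen (b)
c, d = 15, 80   # test negative: carcinogen (c), noncarcinogen (d)
n = a + b + c + d

print(f"sensitivity = {a / (a + c):.2f}")   # carcinogens detected
print(f"specificity = {d / (b + d):.2f}")   # noncarcinogens correctly cleared
print(f"PPV = {a / (a + b):.2f}, NPV = {d / (c + d):.2f}")
print(f"prevalence = {(a + c) / n:.2f}")
```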

17.9 Multiple Comparison Issues

Many studies involve multiple comparisons, such as between each dose level and the concurrent negative control, or between sets of subgroups or correlations. This raises the concern that, when a number of hypothesis tests are carried out, some results will be significant by chance alone. Figure 17.4 shows that if 20 independent comparisons are made at the significance level P = 0.05 then there is a 64% chance that one or more of these comparisons will be significant by chance alone, even though none of the groups is, in fact, different from any other. Using a significance level of P = 0.01 the corresponding percentage is 18%.
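The percentages quoted above follow directly from the familywise error calculation for k independent tests, P(at least one significant) = 1 − (1 − α)^k, as in this sketch.

```python
# Minimal sketch of the familywise error calculation behind Figure 17.4.
k = 20
for alpha in (0.05, 0.01):
    p_any = 1 - (1 - alpha) ** k
    print(f"alpha={alpha}: P(>=1 significant in {k} tests) = {p_any:.2f}")
# alpha=0.05 -> 0.64; alpha=0.01 -> 0.18, as quoted in the text.
```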

                          Disease or carcinogenic status
  Diagnostic or           Present/carcinogenic     Absent/noncarcinogenic
  STT result
  Positive (+)            a                        b
  Negative (−)            c                        d

  Sensitivity = a/(a + c)
  Specificity = d/(b + d)
  Positive predictive value (PPV) = a/(a + b)
  Negative predictive value (NPV) = d/(c + d)
  Prevalence = (a + c)/N, where N = a + b + c + d

Figure 17.3 Diagnostic test statistics. 2×2 table showing statistics derived from the use of a diagnostic test or a short-term test (STT) to predict disease or carcinogenic status.


This is of particular concern when a series of post hoc comparisons is carried out after the study has been completed and there is a danger of "data dredging". A number of multiple comparison methods have been proposed to try to address this problem.37

Multiple comparison methods aim to control the Type 1 error (false rejection rate) by managing the experimentwise or familywise error rate (EER or FWER) or the individual or comparisonwise error rate (CWER). The consequence of their use is to lower the power of the study, in effect "damping" down the significance of the results. Two widely used multiple comparison methods in toxicology are Bonferroni's correction and Dunnett's test.

The Bonferroni correction adjusts the significance level at which a hypothesis test is carried out by taking into account the number of comparisons (n) being made. A simple approximation is to use α/n as the significance level for rejecting the null hypothesis, or to multiply the actual P value obtained by n and compare this with the significance level of, say, 0.05. This is a highly conservative method. Other multiple comparison methods are somewhat less conservative.
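A minimal sketch of the Bonferroni adjustment described above, applied to invented raw P values for three comparisons against a control.

```python
# Minimal sketch of a Bonferroni adjustment for k pairwise comparisons:
# multiply each raw P value by k and cap at 1.
raw_p = [0.012, 0.030, 0.200]          # e.g. three dose groups vs. control
k = len(raw_p)
adjusted = [min(1.0, p * k) for p in raw_p]
print(adjusted)  # [0.036, 0.09, 0.6]: only the first stays below 0.05
```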

Dunnett's test was originally designed to test multiple treatments against a common control. It was designed to maintain the EER at 0.05, meaning that experiments where one or more of the comparisons with the negative control were falsely declared significant would occur on average only once in every 20 similar experiments.

Figure 17.4 Multiple comparisons. The multiple comparison problem: the probability of one or more significant results occurring when using P = 0.05 or P = 0.01 critical values in a set of tests when the null hypothesis of no difference between the groups is true. (The plot shows this probability, from 0 to 1, against the number of tests, from 0 to 100, with curves for P < 0.05 and P < 0.01.)


It is arguable whether its use is appropriate for studies where there is an explicit dose–response contrast in the design. A further argument against its use is that the size of effect detected as significant depends upon the number of other groups in the comparison, even when some of these groups may not be relevant to the comparison of interest. It has also been suggested that a larger negative control group should be included in the design to provide a better estimate of negative control values. This should be about the average treated-group size multiplied by the square root of the number of treatment groups.37

A criticism of multiple comparison methods is that they ignore the structure of a carefully designed experiment where the doses and group sizes have been chosen to have a high probability of identifying an effect of a certain size that is biologically important, or which explicitly includes a dose–response component. Tests of this comparison have appreciable statistical power, but the use of multiple comparisons together with corrections will reduce the power appreciably. The use of multiple comparison methods when there is a specific a priori designed comparison explicit in the study cannot be recommended.

It is important to identify any comparisons planned before the study begins (a priori) and to be transparent about the set of comparisons to be made. Planned a priori tests are preferred to a posteriori comparisons because the former is a case of hypothesis testing, the latter of hypothesis generation. There is still a multiple comparison issue when there are many a priori tests. However, in the case of experimental studies this is less of a problem because only a small number of specific contrasts are explicitly included in, say, a factorial design or in the test for a linear trend/contrast in a dose–response relationship. A recommendation is that the exact P values, without multiple comparison adjustments, should be reported. Statistical contrasts reflecting the underlying experimental design should also be reported.

In observational and human clinical trials subgroup analysis is a common secondary objective. This is a controversial area. Lagakos38 provides a clear exposition of the issues involved in subgroup analysis. Finding an effect in one subgroup but not another (say in one sex only) is a treatment × group interaction. A consideration is whether the interaction is qualitative or quantitative. The power of the test for an interaction is lower than that for the main effect. The finding of uniform effects in all subgroups (i.e. lack of interaction) should be reassuring. However, as the individual subgroup tests have lower power, their interpretation may be misleading if a hypothesis-testing approach is used rather than one based upon the estimation of the size of an effect.

The danger of subgroup analysis is that it can become "data dredging". In the case of conventional clinical trials any proposed subgroup analyses should be predetermined and included in the statistical analysis plan, as there is a danger of Type 1 errors. A formal test for the interaction should also be carried out. Mastaloudis et al.,39 for instance, report a sex–treatment interaction: in this study, endurance exercise resulted in DNA damage as shown by the Comet assay, and antioxidants seemed to enhance recovery in women but not in men.


Effects not seen in individual subgroups may be seen when the groups are combined. Ideally, a plan should be developed before the analysis is conducted to identify which subgroups will be investigated and how the potential risk of Type I errors will be evaluated. Post hoc comparisons that appear interesting on inspection should be viewed with considerable care. Any comparison chosen after a study has been completed solely because it is large is, almost by definition, likely to be statistically significant in a standard test.

Multiple comparison issues also arise when many measures are made on the same unit, such as the multiple endpoints possible on a cell with the Comet assay. However, many of these endpoints are likely to be correlated, so the results should be consistent (any bias in the measures should appear in the analyses for all the measures). Multiple comparison approaches are useful when the objective is to screen a large set of chemicals or genes, as in compound screening or microarray studies, and to select out a subset for further study.

In conclusion, the use of multiple comparison methods is a controversial area with considerable debate amongst statisticians over their use. Some statisticians argue that "multiple comparison methods have no place at all in the interpretation of data".40 Others argue that all hypothesis testing is a multiple comparison approach and that further corrections are inappropriate.41,42
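In the screening setting just described, a false-discovery-rate correction is often preferred to family-wise corrections. A minimal sketch, assuming a list of raw P values from many endpoints or genes (the values here are invented), uses the Benjamini–Hochberg procedure available in statsmodels:

    # Sketch: FDR (Benjamini-Hochberg) adjustment for a screening study.
    from statsmodels.stats.multitest import multipletests

    raw_p = [0.001, 0.008, 0.020, 0.041, 0.220, 0.510, 0.760]  # hypothetical raw P values

    reject, p_adjusted, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_bh")
    for p, padj, r in zip(raw_p, p_adjusted, reject):
        print(f"raw P = {p:.3f}  adjusted P = {padj:.3f}  selected: {r}")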

17.10 Power and Sample Size

A study should be designed to have a high chance of detecting an effect of a defined size. The power of a study is defined as the probability of detecting an effect of a specified size if it is really there.

Power calculations and/or sample-size determinations are increasingly required for regulatory and ethical reasons. Simple designs can be handled by standard software packages, interactive web sites, tables or equations. It is important to realise that the sample-size determination carried out at the design stage is increasingly expected to be transparent and subject to scrutiny by a statistician on a grant, regulatory or ethics board.

Five pieces of information are needed for an estimate of the sample size of a study using quantitative measures. These are the required power, the significance level, the size of the effect and a measure of the variability, such as the between-unit standard deviation (SD). The fifth piece of information is whether the test will be one- or two-sided.

Similarly, power calculations can be carried out for qualitative data. Again, five pieces of information are needed for an estimate of the sample size of a study: the required power, the significance level and the proportions in the control and treated groups. Again, a decision is needed on whether the test is to be carried out as one- or two-sided.

Most of the software is flexible enough to be able to provide sample sizes for a given power, or the power associated with a particular sample size.
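The standard normal-approximation formulae can also be coded directly. The sketch below computes the per-group sample size for a quantitative endpoint (difference in means) and for a qualitative endpoint (difference in proportions); all the numerical inputs are illustrative assumptions, not recommended values:

    # Sketch: per-group sample size from the normal approximation (illustrative inputs).
    import math
    from scipy.stats import norm

    alpha, power = 0.05, 0.80
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # two-sided test

    # Quantitative endpoint: detect a 10 percentage-point shift in % tail DNA, SD = 15.
    delta, sd = 10.0, 15.0
    n_quant = math.ceil(2 * (z * sd / delta) ** 2)

    # Qualitative endpoint: detect an increase from 10% to 20% "damaged" cells.
    p1, p2 = 0.10, 0.20
    n_qual = math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

    print(n_quant, "per group for the quantitative endpoint")  # about 36
    print(n_qual, "per group for the qualitative endpoint")    # about 197

With these illustrative inputs the qualitative endpoint needs roughly five times as many units per group, in line with the comment on information content below.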

Note that many programs give the number of units per experimental group rather than the total number of units needed in the full experiment. Note also that studies with qualitative endpoints have appreciably lower power than those with quantitative endpoints because of their lower information content.

Usually, a power of 80 or 90% is chosen and an alpha of either 0.05 or 0.01. Defining the size of the treatment effect, the biologically relevant difference (brd), is more difficult. This value is a matter of scientific judgement but clearly has to be realistic. Data from previous and pilot studies can provide estimates of the brd and of the variability of the material.

The brd could be either an absolute or a relative difference. An absolute difference might, for instance, be an increase in the % tail DNA from 10% to 20%. A relative difference might be a 2- or 3-fold change. On the absolute scale an increase from 5% to 15% would be the same as one from 10% to 20%, but as fold changes the first would be 3-fold and the second 2-fold. For a power calculation some estimate of variability is required. If the variability is the same across the scale then the sample size/power associated with the two examples would be the same, whereas the power associated with a given fold change would depend upon the negative control incidence. The researcher needs to select the measure that is most relevant to the system being investigated. If large absolute changes are relevant then fold changes may be appropriate, but if changes relative to the underlying background control incidence (noise) are of interest then an absolute difference may be more relevant. A similar discussion takes place concerning fold changes compared with statistically significant differences in a statistical test (a modified t-test) for the identification of important genes in microarray studies.43
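The point that power tracks the absolute difference (for a constant SD) rather than the fold change can be checked directly. A minimal sketch, with invented values (n = 20 per group, SD = 15), computes the approximate power for the two examples above:

    # Sketch: power depends on the absolute difference when the SD is constant.
    import math
    from scipy.stats import norm

    def approx_power(delta, sd, n, alpha=0.05):
        """Normal-approximation power for a two-sided, two-group comparison of means."""
        z_crit = norm.ppf(1 - alpha / 2)
        return norm.cdf(delta / (sd * math.sqrt(2 / n)) - z_crit)

    n, sd = 20, 15.0
    # 5% -> 15% is a 3-fold change; 10% -> 20% is 2-fold; both are +10 in absolute terms.
    print(f"power for 5 -> 15:  {approx_power(10, sd, n):.2f}")
    print(f"power for 10 -> 20: {approx_power(10, sd, n):.2f}")  # identical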

Estimates of the variability between experimental units are needed for inclusion in power and sample-size calculations. Results reported in the literature are one source. For example, Muller et al.44 reported a relative interpatient coefficient of variation (CV) of 14.3% and a relative average intrapatient CV of 15.3% in a study of the effects of fractionated radiotherapy on the DNA-repair capacity of lymphocytes in 50 patients, based upon a measure of the relative amount of DNA in the comet tail.

An alternative approach is based upon the work of Cohen.45 Cohen suggests expressing the difference in standard deviation units. He defined a small difference as equivalent to 0.2 standard deviation units, a medium one 0.5 units and a large effect 0.8 units. A simple rule of thumb is that, for a two-sided test of the means of two groups at alpha = 0.05 and with 80% power, the sample size in each group increases by approximately 4-fold for every halving of the effect size in SD units. Cohen's effect-size approach is potentially useful when information from previous studies is difficult to obtain. However, the approach, while useful, is not without its critics. Lenth, for instance, is critical of its indiscriminate use.46
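The 4-fold rule of thumb is easy to verify with the same normal-approximation formula used above; the effect sizes shown are Cohen's conventional values plus one further halving:

    # Sketch: per-group n roughly quadruples as the effect size (in SD units) halves.
    import math
    from scipy.stats import norm

    z = norm.ppf(1 - 0.05 / 2) + norm.ppf(0.80)  # two-sided alpha = 0.05, 80% power
    for d in (0.8, 0.4, 0.2):
        n = math.ceil(2 * (z / d) ** 2)
        print(f"effect size {d}: about {n} per group")
    # 0.8 -> ~25, 0.4 -> ~99, 0.2 -> ~393: approximately 4-fold per halving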

Methods exist for estimating the power and sample sizes of more complex experimental designs, but these often require the specification of a particular hypothesis. The power of more complex studies can be investigated by simulating patterns of results of interest.

A number of books also provide methods for sample-size and power determination for more complex designs, especially in the context of clinical studies.47,48

Simple power calculations assume that the animal or the culture is the experimental unit. Sample-size calculations can also be carried out if the design is clustered or hierarchical. Examples of "clusters" are pupils within a school or patients within a practice. Members of a cluster are autocorrelated in that they are more like each other than like members of another cluster. Sample-size estimates for clustered studies need a measure of this autocorrelation (the intracluster correlation, or ICC) to be included in the calculation.49 Software exists for these calculations.50
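The usual adjustment multiplies the unclustered sample size by the design effect 1 + (m − 1) × ICC, where m is the cluster size. A minimal sketch with invented numbers (200 cells scored per animal, ICC = 0.05):

    # Sketch: inflating a sample size for clustering via the design effect.
    import math

    def clustered_n(n_unclustered, cluster_size, icc):
        """Total units needed once the intracluster correlation (ICC) is allowed for."""
        deff = 1 + (cluster_size - 1) * icc  # design effect
        return math.ceil(n_unclustered * deff)

    # Hypothetical: 36 units needed if independent; 200 cells per animal, ICC = 0.05.
    print(clustered_n(36, 200, 0.05))  # about 395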

17.11 Human Studies

The Comet assay, because of its relative simplicity and versatility, is a convenient and popular biomarker or surrogate for human population/biomonitoring studies of DNA damage. It is convenient because it is quick and sensitive, needs only a small number of cells, has low invasiveness and can be used with both proliferating and nonproliferating cells (nasal and buccal epithelium, leucocytes and exfoliated bladder cells) from individuals.

Human studies can be intervention trials, such as clinical trials or volunteer studies, or biomonitoring involving the analysis of samples from individuals who have various conditions or potential exposures. Clinical or volunteer trials are intervention studies with a well-defined design, such as the randomised clinical trial (RCT). Biomonitoring studies are usually observational and may be case-control (retrospective), cohort (prospective) or cross-sectional (both prospective and retrospective) studies. In general, more weight is given to cohort than to case-control studies because of their better quality of data and the lack of recall bias.

Wasson et al.,52 for instance, reviewed the use of the Comet assay as a biomarker in the study of human nutrition and cancer. Their table 1 illustrated the range of human studies carried out in one area, antioxidant dietary factors, using case-control, cross-sectional and intervention studies. Many of the intervention studies were of the order of 10–30 subjects, probably reflecting cost as opposed to formal sample-size considerations, with the cross-sectional and case-control studies being, in general, somewhat larger.

The main problems with observational studies are bias and confounding. Confounding occurs when other factors are associated with the factor under study and may result in incorrect conclusions being drawn. It represents a threat to the internal validity (the underlying causal relationships) of a study. The extreme sensitivity of the Comet assay in detecting DNA damage and repair at the single-cell level at very low exposure levels is a major advantage of the system, but it also makes it vulnerable to biases such as can be introduced in observational or "nonrandomised" research. Bias is where there is a systematic difference in the measures taken in one group compared with another.

An example might be where samples from control and exposed individuals are collected at different times and processed separately. Confounding would occur if, for instance, the exposed group were a set of middle-aged male manual workers and the controls were a set of younger, mainly female, office workers: the possible effect of any exposure would be confounded with any effect of sex and/or age.

Methods to try to protect against the effect of confounding are similar to those used in the experimental situation: randomisation, restriction, matching, stratification and adjustment. The aim of randomisation is the random distribution of known and unknown confounders between study or experimental groups. Restriction aims to exclude individuals with confounding factors, but this approach can itself introduce biases. Individuals or groups may be matched to try to equalise the distribution of confounders between the groups. Stratification (the equivalent of blocking in the experimental situation) tries to ensure that confounders are distributed evenly within each stratum.

Confounders are not always known (giving rise to residual confounding or "lurking variables"). Randomisation provides the best protection against both known and unknown confounders through its random distribution of units to the various treatment groups. (This is one of the main arguments for randomisation in an intervention study such as an RCT.)

Many factors could affect the quality of samples before they are analysed. In order to prevent systematic biases being introduced into data derived from these samples, guidelines on their collection and processing should be followed.51

Randomisation should be applied to all aspects of the study. Care is needed to ensure that samples, for instance, are processed in a random order. Studies should also be run as "blind" as possible to minimise biases being introduced. It is not appropriate to confine randomisation just to the allocation of individuals to the treatment groups and then to process the samples, etc., in a systematic order after the code has been broken.
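Generating such a processing order is straightforward; a minimal sketch with hypothetical sample identifiers:

    # Sketch: random processing order for a batch of samples (hypothetical IDs).
    import numpy as np

    rng = np.random.default_rng(20090827)  # fixed seed so the order is reproducible and auditable
    sample_ids = [f"S{i:03d}" for i in range(1, 25)]
    processing_order = rng.permutation(sample_ids)
    print(list(processing_order))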

Data can be adjusted or standardised through the use of multivariate methods such as multiple regression (this only works if the confounders can be identified and measured). Including covariates in a multivariate analysis is an attempt to remove biases introduced by confounding, but the analysis is open to the criticisms associated with the use of multiple regression methods. Modelling approaches need to be transparent, with the choice of variables to be included or excluded, the tests of model fit and the assumptions made explicit, sensitivity analyses conducted, and all of this available for scrutiny. Mullner et al.53 discuss the reporting of statistical methods to adjust for confounding, and Campbell54 provides recommendations for reporting such analyses.
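As an illustration, adjustment by multiple regression amounts to adding the measured confounders as covariates alongside the exposure term. A minimal sketch using statsmodels, assuming a pandas data frame with hypothetical columns tail_dna, exposed, age and sex:

    # Sketch: regression adjustment for measured confounders (hypothetical data frame).
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    df = pd.DataFrame({
        "tail_dna": rng.normal(12, 4, 60),   # % tail DNA
        "exposed": rng.integers(0, 2, 60),   # 0 = control, 1 = exposed
        "age": rng.integers(20, 65, 60),
        "sex": rng.choice(["F", "M"], 60),
    })

    # The coefficient for 'exposed' is the exposure effect adjusted for age and sex.
    adjusted = smf.ols("tail_dna ~ exposed + age + C(sex)", data=df).fit()
    print(adjusted.params["exposed"], adjusted.pvalues["exposed"])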

Matched studies are analysed differently from unmatched ones: conditional logistic regression is used for matched and unconditional logistic regression for unmatched designs. Case-control matching on potential confounders, e.g. age, carries a risk of either over- or undermatching.

Dusinska and Collins55 have reviewed the use of the Comet assay in biomonitoring (in particular for studies of gene–environment interaction). Table 17.3 lists some of the issues to bear in mind in the conduct of biomonitoring studies.


17.12 Standardisation and Interlaboratory Comparisons

A common feature of reviews of Comet assay studies is a discussion of the problems associated with comparing data across different studies. Jha,56 for instance, discussed issues relating to optimising procedures and the generalisation of historical control data, with relevance to the use of the Comet assay in ecogenotoxicology studies, particularly with respect to developing the test so that it is reliable, reproducible and robust.

Table 17.3 Points to consider in human studies to minimise the effects of confounding and bias.

- Ensure that there is appropriate ethical approval.
- Ensure appropriate sample sizes (power calculation to determine the numbers needed in groups).
- Include appropriate control groups (unexposed, untreated or placebo treated).
- In an intervention study ensure that participants are randomly assigned to treatment groups.
- Ensure that control and treated/exposed samples are collected at the same time (avoid collecting batches of controls and treated/exposed samples at different times).
- In particular, avoid confounding effects such as seasonal effects or day of the week differentially affecting groups.
- If sample sizes are impractical to handle as one randomised group, use "blocking" to minimise the effect of changes over time.
- Carry out consistent sampling throughout the study using the same batch of reagents, equipment etc.
- Randomise the order of all procedures and the scoring of samples to minimise the effects of any uncontrollable variables.
- Carry out all procedures and scoring, where practical, blind to the identification, treatment or exposure group the sample was derived from.
- Include relevant negative and positive control samples to provide a check on technique.
- Ensure replication of samples to provide an estimate of variability and to avoid complete loss of samples.
- Ensure similar storage of batches of samples.
- Ensure a common standard protocol is used throughout the study; do not change or modify the protocol during the course of the study.
- Process samples in random order to avoid introducing accidental biases.
- Work to good laboratory practice (or at least in the spirit of it).
- Process all samples at the same time, or arrange to handle them in blocks with groups equally represented in each batch.
- Ensure that all scoring within a study is carried out by the same experienced scorer; use random order and blind scoring to minimise the effect of any "drift" in performance over time.
- If more than one scorer needs to be used, organise scoring so that scorers are, in effect, "blocks".
- Ensure scoring of samples is done blind.
- Identify the experimental unit and apply appropriate statistical methods.
- Consider the implications of missing data when analysing results: determine whether intention-to-treat (ITT) or per-protocol analyses are most appropriate.

Forchhammer et al.22 have pointed to the difficulties of comparing Comet assay results across studies because of different experimental protocols and methods of reporting data. They found appreciable interscorer variability in Comet assay measures among eight experienced scorers that could not be reduced by the use of investigator-specific calibration curves, and concluded that these differences in scoring are "a strong determinant of DNA-damage levels measured by the Comet assay". Collins et al.4 also noted scorer effects between experienced, trained scorers and suggest that the same scorer should score all samples from a specific project/experiment.

McKenna et al.57 discussed the use of the Comet assay in a clinical setting, pointing out its current limitations for prediction. They stressed the need for more standardisation of protocols and for multilaboratory validation trials. Controlling these variables is a challenge to the acceptance of the Comet assay as a reliable method for measuring DNA damage, whether for monitoring exposures to DNA-damaging agents or for use as a diagnostic test. Taube et al.,58 for instance, suggest three tests before a diagnostic tool is adopted for routine use: first, it needs to be robust and reproducible; secondly, it must be proven to be useful in the clinic; and, thirdly, it should meet a need and produce a benefit. More emphasis is, therefore, needed on developing guidelines for predictive biomarkers to reduce poor experimental design, the use of inappropriate or misleading statistical analyses, nonstandard protocols and the lack of reproducibility. Guidelines such as those developed for assessing tumour biomarker prognostic and diagnostic studies59 would be useful. Opportunities also exist to develop databases of published Comet assay results that would be suitable for meta- and teleoanalysis.60

A number of interlaboratory comparisons are in progress. Examples include the in vitro and in vivo validation studies being organised by the Japanese Center for the Validation of Alternative Methods (JaCVAM) and the ESCODD study.61 Lovell and Omori9 have discussed issues related to the design of interlaboratory comparisons and validation studies. Guidelines have been developed for studies of repeatability and reproducibility in intra- and interlaboratory comparisons.62,63
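The repeatability and reproducibility quantities defined in such guidelines can be estimated from a one-way analysis of variance with laboratory as the grouping factor. A minimal sketch, assuming a balanced design and invented results (three laboratories, four replicate measurements each), applies the standard mean-square decomposition:

    # Sketch: repeatability (within-lab) and reproducibility (within + between lab)
    # standard deviations from a balanced one-way ANOVA (hypothetical data).
    import numpy as np

    labs = np.array([
        [10.2, 11.1,  9.8, 10.5],  # lab A, replicate % tail DNA measurements
        [12.0, 12.6, 11.7, 12.3],  # lab B
        [ 9.1,  9.8, 10.0,  9.4],  # lab C
    ])
    k, n = labs.shape  # k labs, n replicates per lab

    ms_within = labs.var(axis=1, ddof=1).mean()         # pooled within-lab mean square
    ms_between = n * labs.mean(axis=1).var(ddof=1)      # between-lab mean square
    var_lab = max((ms_between - ms_within) / n, 0.0)    # between-lab variance component
    print(f"repeatability SD   s_r = {np.sqrt(ms_within):.2f}")
    print(f"reproducibility SD s_R = {np.sqrt(ms_within + var_lab):.2f}")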

References

1. S. Brendler-Schwaab, A. Hartmann, S. Pfuhler and G. Speit, The in vivo Comet assay: use and status in genotoxicity testing, Mutagenesis, 2005, 20, 245–254.
2. B. Burlinson, R. R. Tice, G. Speit, E. Agurell, S. Y. Brendler-Schwaab, A. R. Collins, P. Escobar, M. Honma, T. S. Kumaravel, M. Nakajima, Y. F. Sasaki, V. Thybaud, Y. Uno, M. Vasquez and A. Hartmann (In vivo Comet Assay Workgroup, part of the Fourth International Workgroup on Genotoxicity Testing), Fourth International Workgroup on Genotoxicity Testing: results of the in vivo Comet assay workgroup, Mutation Research, 2007, 627, 31–35.
3. A. R. Collins, The Comet assay for DNA damage and repair: principles, applications, and limitations, Mol. Biotechnol., 2004, 26, 249–261.

4. A. R. Collins, A. A. Oscoz, G. Brunborg, I. Gaivao, L. Giovannelli, M. Kruszewski, C. C. Smith and R. Stetina, The Comet assay: topical issues, Mutagenesis, 2008, 23, 143–151.
5. N. P. Singh, M. T. McCoy, R. R. Tice and E. L. Schneider, A simple technique for quantitation of low levels of DNA damage in individual cells, Exp. Cell Res., 1988, 175, 184–191.
6. R. R. Tice, E. Agurell, D. Anderson, B. Burlinson, A. Hartmann, H. Kobayashi, Y. Miyamae, E. Rojas, J. C. Ryu and Y. F. Sasaki, Single cell gel/Comet assay: guidelines for in vitro and in vivo genetic toxicology testing, Environ. Mol. Mutagen., 2000, 35, 206–221.
7. A. Hartmann, E. Agurell, C. Beevers, S. Brendler-Schwaab, B. Burlinson, P. Clay, A. Collins, A. Smith, G. Speit, V. Thybaud and R. R. Tice, Recommendations for conducting the in vivo alkaline Comet assay. 4th International Comet Assay Workshop, Mutagenesis, 2003, 18, 45–51.
8. D. P. Lovell, G. Thomas and R. Dubow, Issues related to the experimental design and subsequent statistical analysis of in vivo and in vitro comet studies, Teratogenesis Carcinog. Mutagen., 1999, 19, 109–119.
9. D. P. Lovell and T. Omori, Statistical issues in the use of the Comet assay, Mutagenesis, 2008, 23, 171–182.
10. R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, 6th edn., Oliver and Boyd, Edinburgh, 1963.
11. M. J. Gardner and D. Altman, Confidence intervals rather than P values: estimation rather than hypothesis testing, Br. Med. J., 1986, 292, 746–750.
12. D. G. Altman, T. N. Bryant, M. J. Gardner and D. Machin (eds), Statistics with Confidence: Confidence Intervals and Statistical Guidelines, 2nd edn., BMJ Books, London, 2000.
13. D. G. Altman, S. M. Gore, M. J. Gardner and S. J. Pocock, Statistical guidelines for contributors to medical journals, Br. Med. J., 1983, 286, 1489–1493.
14. A. B. Hill, The environment and disease: association or causation?, Proc. R. Soc. Med., 1965, 58, 295–300.
15. J. B. Copas and H. G. Li, Inference for non-random samples (with discussion), J. Roy. Stat. Soc., 1997, 59, 55–95.
16. R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh, 1925.
17. R. A. Fisher, Design of Experiments, Oliver and Boyd, Edinburgh, 1935.
18. G. E. P. Box, W. G. Hunter and J. S. Hunter, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, 2nd edn., Wiley, 2005.
19. A. Collins, M. Dusinska, M. Franklin, M. Somorovska, H. Petrovska, S. Duthie, L. Fillion, M. Panayiotidis, K. Raslova and N. Vaughan, Comet assay in human biomonitoring studies: reliability, validation, and applications, Environ. Mol. Mutagen., 1997, 30, 139–146.

20. M. Pitarque, A. Creus, R. Marcos, J. A. Hughes and D. Anderson, Examination of various biomarkers measuring genotoxic endpoints from Barcelona airport personnel, Mutation Research, 1999, 440, 195–204.
21. R. D. Bowden, M. R. Buckwalter, J. F. McBride, D. A. Johnson, B. K. Murray and K. L. O'Neill, Tail profile: a more accurate system for analyzing DNA damage using the Comet assay, Mutation Research, 2003, 537, 1–9.
22. L. Forchhammer, E. V. Brauner, J. K. Folkmann, P. H. Danielsen, C. Nielsen, A. Jensen, S. Loft, G. Friis and P. Møller, Variation in assessment of oxidatively damaged DNA in mononuclear blood cells by the Comet assay with visual scoring, Mutagenesis, 2008, 23, 223–231.
23. S. J. Wiklund and E. Agurell, Aspects of design and statistical analysis in the Comet assay, Mutagenesis, 2003, 18, 167–175.
24. T. Cavas and S. Konen, Detection of cytogenetic and DNA damage in peripheral erythrocytes of goldfish (Carassius auratus) exposed to a glyphosate formulation using the micronucleus test and the Comet assay, Mutagenesis, 2007, 22, 263–268.
25. S. H. Hurlbert, Pseudoreplication and the design of ecological field experiments, Ecol. Monographs, 1984, 54, 187–211.
26. S. H. Hurlbert, On misinterpretations of pseudoreplication and related matters: a reply to Oksanen, Oikos, 2003, 104, 591–597.
27. L. Machlis, P. W. D. Dodd and J. C. Fentress, The pooling fallacy: problems arising when individuals contribute more than one observation to the dataset, Z. Tierpsychol., 1985, 68, 201–214.
28. C. C. Smith, D. J. Adkins, E. A. Martin and M. R. O'Donovan, Recommendations for design of the rat Comet assay, Mutagenesis, 2008, 23, 233–240.
29. P. Duez, G. Dehon, A. Kumps and J. Dubois, Statistics of the Comet assay: a key to discriminate between genotoxic effects, Mutagenesis, 2003, 18, 159–166.
30. P. A. Escobar, M. T. Smith, A. Vasishta, A. E. Hubbard and L. Zhang, Leukaemia-specific chromosome damage detected by comet with fluorescence in situ hybridization (Comet-FISH), Mutagenesis, 2007, 22, 321–327.
31. T. P. Ryan, Statistical Methods for Quality Improvement, 2nd edn., John Wiley and Sons, New York, 2000.
32. E. Mullins, Statistics for the Quality Control Chemistry Laboratory, Royal Society of Chemistry, Cambridge, 2003.
33. N. T. Longford and J. A. Nelder, Statistics versus statistical science in the regulatory process, Stat. Med., 1999, 18, 2311–2320.
34. M. R. Nester, An Applied Statistician's Creed, Appl. Stat., 1996, 45, 401–410.
35. D. G. Altman and P. Royston, The cost of dichotomising continuous variables, Br. Med. J., 2006, 332, 1080.
36. D. Kirkland, M. Aardema, L. Henderson and L. Muller, Evaluation of the ability of a battery of three in vitro genotoxicity tests to discriminate rodent carcinogens and non-carcinogens. I. Sensitivity, specificity and relative predictivity, Mutation Research, 2005, 584, 1–256.

37. J. E. De Muth, Basic Statistics and Pharmaceutical Statistical Applications, 2nd edn., Chapman and Hall/CRC, Boca Raton, FL, 2006.
38. S. W. Lagakos, The challenge of subgroup analyses – reporting without distorting, New Eng. J. Med., 2006, 354, 1667–1669.
39. A. Mastaloudis, T.-W. Yu, R. P. O'Donnell, B. Frei, R. Dashwood and M. Traber, Endurance exercise results in DNA damage as detected by the Comet assay, Free Rad. Biol. Med., 2004, 36, 966–975.
40. J. A. Nelder, Contribution to the discussion of R. T. O'Neill and B. G. Wetherill, The present state of multiple comparison methods, J. Roy. Stat. Soc. B, 1971, 33, 218–241.
41. K. J. Rothman, No adjustments are needed for multiple comparisons, Epidemiology, 1990, 1, 43–46.
42. S. Greenland, Multiple comparisons and association selection in general epidemiology, Int. J. Epidemiol., 2008, 37, 430–434.
43. D. M. Witten and R. Tibshirani, A comparison of fold-change and the t-statistic for microarray data analysis, 2007. Posted at http://www-stat.stanford.edu/~tibs/ftp/FCTComparison.pdf.
44. W.-U. Muller, T. Bauch, C. Streffer and D. Von Mallek, Does radiotherapy affect the outcome of the Comet assay?, Br. J. Radiol., 2002, 75, 608–614.
45. J. Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd edn., Academic Press, New York, 1988.
46. R. V. Lenth, Statistical power calculations, J. Anim. Sci., 2007, 85, E24–E29.
47. D. Machin, M. Campbell, P. Fayers and A. Pinol, Sample Size Tables for Clinical Studies, 2nd edn., Blackwell Science, Oxford, 1997.
48. S.-C. Chow, J. Shao and H. Wang, Sample Size Calculations in Clinical Research, CRC Press, Taylor and Francis Group, Boca Raton, 2003.
49. S. M. Kerry and J. M. Bland, Statistics notes: sample size in cluster randomisation, Br. Med. J., 1998, 316, 549.
50. M. K. Campbell, M. S. Thomson, C. R. Ramsay, G. S. MacLennan and J. M. Grimshaw, Sample-size calculator for cluster randomized trials, Comput. Biol. Med., 2004, 34, 113–125.
51. N. T. Holland, M. T. Smith, B. Eskenazi and M. Bastaki, Biological sample collection and processing for molecular epidemiological studies, Mutat. Res., 2003, 543, 217–234.
52. G. R. Wasson, V. J. McKelvey-Martin and C. S. Downes, The use of the Comet assay in the study of human nutrition and cancer, Mutagenesis, 2008, 23, 153–162.
53. M. Mullner, H. Matthews and D. G. Altman, Reporting on statistical methods to adjust for confounding: a cross-sectional survey, Ann. Internal Med., 2002, 136, 122–126.
54. M. J. Campbell, Statistics at Square Two: Understanding Modern Statistical Applications in Medicine, BMJ Publishing Group, London, 2001.
55. M. Dusinska and A. R. Collins, The Comet assay in human biomonitoring: gene–environment interactions, Mutagenesis, 2008, 23, 191–205.

56. A. N. Jha, Ecotoxicological applications and significance of the Comet assay, Mutagenesis, 2008, 23, 207–221.
57. D. J. McKenna, S. R. McKeown and V. J. McKelvey-Martin, Potential use of the Comet assay in the clinical management of cancer, Mutagenesis, 2008, 23, 183–190.
58. S. E. Taube, J. W. Jacobson and T. G. Lively, Cancer diagnostics: decision criteria for marker utilization in the clinic, Am. J. Pharmacogenomics, 2005, 5, 357–364.
59. L. M. McShane, D. G. Altman, W. Sauerbrei, S. E. Taube, M. Gion and G. M. Clark, Reporting recommendations for tumor marker prognostic studies (REMARK), Breast Cancer Res. Treat., 2006, 100, 229–235.
60. N. J. Wald and J. K. Morris, Teleoanalysis: combining data from different types of study, Br. Med. J., 2003, 327, 616–618.
61. ESCODD (European Standards Committee on Oxidative DNA Damage), C. M. Gedik and A. Collins, Establishing the background level of base oxidation in human lymphocyte DNA: results of an interlaboratory validation study, FASEB J., 2005, 19, 82–84.
62. International Standards Organization (ISO), Precision of Test Methods – Determination of Repeatability and Reproducibility for a Standard Test Method by Inter-laboratory Tests, ISO 5725, 1986.
63. American Society for Testing and Materials, ASTM E691-99, Standard Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method, ASTM, 10 May 1999.
