CROPS AND SOILS RESEARCH PAPER

Equivalence criteria for the safety evaluation of a genetically modified crop: a statistical perspective

C. I. VAHL1* AND Q. KANG2

1 Department of Statistics, Kansas State University, Manhattan, KS 66506, USA

2 Independent Statistical Consultant, Manhattan, KS 66503, USA

* To whom all correspondence should be addressed. Email: vahl@ksu.edu

(Received 18 July 2014; revised 2 January 2015; accepted 3 March 2015; first published online 8 April 2015)

Journal of Agricultural Science (2016), 154, 383–406. © Cambridge University Press 2015. doi:10.1017/S0021859615000271

SUMMARY

Safety evaluation of a genetically modified (GM) crop is accomplished by establishing its substantial equivalence to non-GM reference crops with a history of safe use. Testing hypotheses of equivalence rather than difference is the appropriate statistical approach. A necessary first step in this regard is to specify a reasonable equivalence criterion that includes a measure for discrepancy between the GM and reference crops as well as a regulatory threshold. The present work explored several equivalence criteria and discussed their pros and cons. Each criterion addresses one of three ordered classes of equivalence: super, conditional and marginal equivalence. Their implications were investigated over an array of parameter values estimated from a real-world dataset. Marginal equivalence was identified as adhering most closely to the concept of substantial equivalence. Because conditional equivalence logically implies marginal equivalence and is practically quantifiable from current field designs, the present work recommends conditional equivalence criteria while encouraging producers to improve their design to enable testing marginal equivalence in the future. Contrary to concerns of the ag-biotech industry, empirical evidence from recent publications indicates that a linear mixed model currently implemented by the European Food Safety Authority is adequate for assessing equivalence despite its lack of genotype-by-environment interaction terms.

INTRODUCTION

Many countries throughout the world conduct an extensive safety evaluation on genetically modified (GM) crops before permitting their import for food and feed (König et al. 2004; Price & Underhill 2013). Although there are procedural distinctions from country to country, a general consensus among regulatory authorities is that proof of genetically modified organism (GMO) safety can be accomplished under the concept of substantial equivalence (OECD 1993; FAO/WHO 1996; Codex Alimentarius Commission 2009). Accordingly, a GM crop is compared to conventional non-GM crops in terms of key agronomic, phenotypic and compositional characteristics, or endpoints. Establishing substantial equivalence involves assessing their differences within the context of natural variation in conventional food crops (FAO/WHO 1996). Until recently, this has been accomplished by statistically comparing the GMO with its near-isogenic non-GM counterpart for each endpoint under the null hypothesis of ‘no difference’. A partial list of literature that executes such an assessment includes Berman et al. (2010), Zhou et al. (2011), Herman et al. (2013), Lepping et al. (2013), Lundry et al. (2013), Privalle et al. (2013), Brink et al. (2014) and Venkatesh et al. (2014). Non-significance is taken to mean there is no consequential difference between the two crops for a given endpoint. Unfortunately, there is a series of flaws in this proof-of-difference approach. The most obvious is that the logic of statistical hypothesis testing does not allow one to interpret non-significance as evidence for the null hypothesis. At the same time, a significant difference does not necessarily imply practical importance. This shortcoming has been addressed in the literature listed above by a subsequent across-site comparison of the GMO mean against the data range and/or estimated percentiles derived from commercial non-GM references grown in the same study. However, such a comparison is not robust to extreme data values. Worse still, the uncertainty of random sampling is not properly addressed. A less obvious but perhaps more important deficiency of the proof-of-difference approach is that it mitigates the producer’s risk while failing to adequately control the consumer’s risk. In the context of substantial equivalence, the producer’s risk is measured by the probability of concluding that the GMO is not comparable to non-GM crops when they are actually similar; the consumer’s risk is quantified by the probability of accepting equivalence when certain features of the GMO are severely altered beyond expectation. A difference test conducted at a fixed significance level (which is also the type I error rate) controls the producer’s risk in GMO safety evaluation; it does not, however, facilitate transparent specification and straightforward management of the consumer’s risk, a fact running contrary to the consumer’s primary status in regulatory affairs. Some producers lessened their risk in safety studies even further by applying the false-discovery-rate adjustment originally developed for testing multiple endpoints in efficacy studies (Herman et al. 2013; Lepping et al. 2013; Brink et al. 2014). The systematic flaw of the test-of-difference approach has been outlined in a statistical framework by Berger (1982). Its problematic application in GMO safety evaluation was illustrated through numerical examples in Hothorn & Oberdoerfer (2006).

A proof-of-equivalence approach aims to establish the similarity of two populations and essentially reverses the null and alternative hypotheses of the difference test. Significance in the equivalence test does indeed provide evidence of similarity and the significance level now represents the consumer’s risk. Additionally, a non-significant result does not necessarily indicate a difference exists and the producer’s risk can be minimized through increasing the sample size and optimizing the experimental design. Testing for equivalence is therefore the appropriate choice for GMO safety evaluation. Currently the GMO panel of the European Food Safety Authority (EFSA) mandates two sets of tests in the comparative assessment of a GM crop (EFSA 2010). Their difference tests detect potential changes from the near-isogenic non-GM counterpart and their equivalence tests evaluate the GMO’s similarity to commercial non-GM references. It is commonly agreed that the difference between GMO and references should be sufficiently small, but not necessarily zero, for one to conclude equivalence. A general expression for the equivalence hypotheses is given as

$$H_0: D(T, R) \ge \vartheta \quad \text{v.} \quad H_1: D(T, R) < \vartheta \tag{1}$$

Function $D(T, R)$ represents a measure that quantifies the discrepancy of the GMO (test product) population with respect to the reference population. This measure is fully specified by population parameters rather than data-dependent statistics. When two populations are identical, $D(T, R) = 0$. The positive constant $\vartheta$ is a regulatory threshold for a given $D(T, R)$. It defines the biological equivalence range tolerated by a regulatory authority. Schall & Williams (1996) refer to the equivalence measure $D(T, R)$ together with its regulatory threshold $\vartheta$ as an equivalence criterion. EFSA (2010) and Van der Voet et al. (2011) recommend testing equivalence through comparison of intervals generated by standard statistical software without defining the equivalence criterion in terms of model parameters. As emphasized by Schall (1995a), the critical step towards a compelling procedure for demonstrating equivalence is a clear specification of the equivalence criterion. Furthermore, it is unnecessary to restrict the choice of criteria according to the limitations of traditional statistical techniques as progress in modern statistical methodology already provides satisfactory paths forward. The present work focuses on developing equivalence criteria suitable for GMO safety evaluation. For the sake of brevity, statistical methodologies tailored to these criteria will be handled in separate publications.

Equivalence criteria have been much studied for the comparison between generic and brand name drugs. For decades, scientists from academia, government and industry have debated intensely over the measure of equivalence under a linear mixed model setting (Anderson & Hauck 1990; Sheiner 1992; Schall & Luus 1993; Hauck & Anderson 1994; Schall 1995a; Hauck et al. 1996; Endrenyi & Midha 1998; Chen et al. 2000; Dragalin et al. 2003; Haidar et al. 2008). This has led to a better understanding of issues related to the demonstration of bioequivalence, i.e. equivalence with respect to drug absorption. New bioequivalence criteria continue to appear as additional applications emerge (Chow et al. 2010, 2013; Kang & Chow 2013). Nonetheless, there are remarkable differences between GMO safety evaluation and bioequivalence testing. From the biological aspect, in vivo bioequivalence studies of orally administered drugs intend to assess the difference between test and reference products at the individual subject level; in vitro bioequivalence studies of nasal sprays and aerosols intend to assess product differences across batches and canisters; GMO safety studies aim at evaluating the difference between GMO and references in the context of natural variation. From the design aspect, in vivo bioequivalence studies use cross-over designs and in vitro bioequivalence studies use completely randomized designs with sub-sampling; GMO safety studies typically use multi-site randomized complete block (RCB) designs. From the modelling aspect, a study on drug bioequivalence references one product; a study on GMO equivalence includes an assortment of non-GM commercial varieties for references. From the decision-making aspect, statistical significance is required for regulatory drug approval; non-significance in testing GMO equivalence does not lead directly to regulatory denial.

The present paper first introduces the linear mixed model employed by EFSA. The impact of genotype-by-site interaction, also known as G × E with E standing for environment, is then examined using results from several recent publications. Models with multiple variance components give rise to three types of distributions relevant for GMO safety evaluation. Equivalence criteria addressing each type of distribution are generated using two different machineries. The machinery of scaled average equivalence (SAE) compares the test product with the reference product by scaling their squared mean difference to the variability of the reference distribution. This machinery complies with the common practice of judging the GMO mean against percentiles of reference distributions. Its usage in assessing GMO equivalence also harmonizes with the current recommendation by the U.S. Food and Drug Administration (FDA) for in vivo bioequivalence studies (Davit et al. 2012). The machinery of distribution-wise equivalence (DWE) judges the test product against the reference product via their entire distributions. It is useful for incorporating G × E. Furthermore, DWE criteria deliver straightforward interpretation regardless of whether G × E is present in the model. The criteria introduced here assess three ordered classes of equivalence. Their practical implications are investigated over an array of parameter values estimated from a real-world dataset. Finally, the present work provides new perspectives on issues surrounding G × E and makes recommendations on the choice of equivalence criteria for evaluating GMO safety.

MATERIALS AND METHODS

Experimental design and linear mixed models

EFSA (2010) recommends that the field design for GMO safety evaluation contain a minimum of eight sites ($n_S \ge 8$) and a total of at least six reference varieties ($n_R \ge 6$) with a minimum of three reference varieties per site. The RCB design is typically implemented within each site, where the GMO, its near-isogenic counterpart (control) and the selected references are randomized into field plots arranged in four blocks ($n_{B(S)} = 4$). Table 1 presents the sample sizes employed by several major producers in the ag-biotech industry. Figure 1 illustrates a practical design where the allocation of references to sites was taken from a safety study on GM cotton (Monsanto 2012; Harrison et al. 2013), in which the GMO and its control were grown at eight sites together with nine references. Each site contains three or four references.

Table 1. Sample sizes of field designs employed by the ag-biotech industry

| Producer | Species | $n_R$ | $n_S$ | $n_{B(S)}$ | $\nu$ |
|---|---|---|---|---|---|
| Dow AgroSciences (2011), Lepping et al. (2013) | Soybean | 6 | 10 | 4 | 3 |
| Monsanto (2011) | Soybean | 16 | 8 | 4 | 3 |
| Dow AgroSciences (2012) | Soybean | 6 | 10 | 4 | 3 |
| Monsanto (2012), Harrison et al. (2013) | Cotton | 9 | 8 | 4 | 3, 4 |
| Syngenta Seeds & Bayer CropScience AG (2012) | Soybean | 6 | 8 | 4 | 6 |
| Dow AgroSciences (2013), Herman et al. (2013) | Cotton | 6 | 8 | 4 | 2, 3 |
| Monsanto (2013) | Maize | 20 | 8 | 4 | 3, 4 |
| Venkatesh et al. (from Monsanto) (2014) | Maize | 22 | 8 | 4 | 4 |

$n_R$, number of references; $n_S$, number of sites; $n_{B(S)}$, number of blocks per site; $\nu$, number of reference varieties per site.


Let $Y_{ijl}$ be the response of crop genotype $i$ in block $l$ at site $j$, where $i = 1, \ldots, n_R, T, C$; $j = 1, \ldots, n_S$; $l = 1, \ldots, n_{B(S)}$. EFSA (2010) and Van der Voet et al. (2011) applied the following linear mixed model for testing equivalence:

$$
Y_{ijl} =
\begin{cases}
\mu_R + R_i + S_j + B(S)_{l(j)} + E_{ijl}, & i = 1, \ldots, n_R \\
\mu_T + S_j + B(S)_{l(j)} + E_{ijl}, & i = T \\
\mu_C + S_j + B(S)_{l(j)} + E_{ijl}, & i = C
\end{cases}
\tag{2}
$$

Because of natural variation among reference genotypes, Model (2) considers genotype as a treatment factor with both fixed and random levels. Parameters $\mu_R$, $\mu_T$ and $\mu_C$ correspond to the respective fixed effect of the reference, GMO and control genotype group. Effect $R_i$ from reference $i$ is a normal random variable with mean zero and variance $\sigma_R^2$, i.e. $R_i \sim N(0, \sigma_R^2)$. It represents a super-population of reference genotype means centred at zero. Random effects corresponding to site and block nested within site are represented by $S_j \sim N(0, \sigma_S^2)$ and $B(S)_{l(j)} \sim N(0, \sigma_{B(S)}^2)$, respectively. The random error term $E_{ijl} \sim N(0, \sigma_E^2)$ comprises plot-to-plot natural variation and variation due to sample processing and measurement. Random effects in Model (2) are mutually independent. The block-within-site term is usually kept in the model so as to fully represent the field design notwithstanding its trivial effect in comparison with the other variance components.
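As a rough illustration of how Model (2) generates data, the following Python sketch draws one balanced data set on the log scale. It is not part of the EFSA procedure or of any analysis cited here. The function name `simulate_model2`, its default sample sizes and the simplification that every reference is grown at every site (actual designs allocate only a few references to each site, as in Fig. 1) are assumptions made purely for illustration.

```python
import numpy as np

def simulate_model2(mu_R, mu_T, mu_C, var_R, var_S, var_B, var_E,
                    n_R=6, n_S=8, n_B=4, seed=0):
    """Draw one data set from Model (2).

    Assumption (not in the paper): a fully balanced layout in which all
    n_R references appear at every site.
    Returns y[g, j, l] with g = 0..n_R-1 for references,
    g = n_R for the GMO (T) and g = n_R + 1 for the control (C).
    """
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, np.sqrt(var_R), size=n_R)                  # R_i
    S = rng.normal(0.0, np.sqrt(var_S), size=n_S)                  # S_j
    B = rng.normal(0.0, np.sqrt(var_B), size=(n_S, n_B))           # B(S)_l(j)
    E = rng.normal(0.0, np.sqrt(var_E), size=(n_R + 2, n_S, n_B))  # E_ijl
    # Genotype-level means: mu_R + R_i for references, fixed for T and C
    geno = np.concatenate([mu_R + R, [mu_T, mu_C]])
    # Site and block effects are shared by all genotypes in the same plot group
    return geno[:, None, None] + S[None, :, None] + B[None, :, :] + E
```

Because site and block effects are shared by all genotypes within a block, they cancel in any within-block contrast such as GMO minus control; only the error term and, for references, the genotype effect $R_i$ remain.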

Terms representing G × E are customary in assessing GMO efficacy from field experiments. A classical G × E effect occurs when the difference between two genotypes experiences more variability than could be explained by a model without interaction. Historical studies on crop composition indicate that the effect of G × E could be significantly non-zero on some occasions (Oberdoerfer et al. 2005; Harrigan et al. 2007, 2010; Zhou et al. 2011). The ag-biotech industry viewed the absence of G × E in Model (2) as a fatal flaw and argued against the legitimacy of regulatory requirements on equivalence testing (Ward et al. 2012). Van der Voet et al. (2012) attended to their criticism by examining the effect of G × E in an analysis-of-variance model containing fixed effects only. However, it is not clear how to connect results from two conflicting models and draw a cohesive conclusion on equivalence. This prompted the present work to pursue equivalence criteria under the more complex model given below.

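$$
Y_{ijl} =
\begin{cases}
\mu_R + R_i + S_j + (G \times S)_{Rj} + (R \times S)_{ij} + B(S)_{l(j)} + E_{ijl}, & i = 1, \ldots, n_R \\
\mu_T + S_j + (G \times S)_{Tj} + B(S)_{l(j)} + E_{ijl}, & i = T \\
\mu_C + S_j + (G \times S)_{Cj} + B(S)_{l(j)} + E_{ijl}, & i = C
\end{cases}
\tag{3}
$$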

Model (3) accounts for G × E with two random terms: G × S and R × S. The interaction of genotype group with site is represented by the tri-variate normal random vector $((G \times S)_{Rj}, (G \times S)_{Tj}, (G \times S)_{Cj})'$ with mean $(0, 0, 0)'$ and covariance $\sigma_{G \times S}^2 I_3$, where $I_3$ denotes the 3 × 3 identity matrix. The interaction of reference genotypes with site is captured by the random effect $(R \times S)_{ij} \sim N(0, \sigma_{R \times S}^2)$. All random effects in Model (3) are mutually independent. Other linear mixed models that depict G × E in more intricate variance-covariance structures can be envisaged. The strength of Model (3) resides in its simplicity and interpretability. Note that when $\sigma_{G \times S}^2 = \sigma_{R \times S}^2 = 0$, Model (3) reduces to Model (2).

Fig. 1. An example of the field design for a GMO safety study. T, the GM crop; C, the non-GM near-isogenic counterpart; R1–R9, commercial non-GM references. Allocation of references to sites is taken from Harrison et al. (2013).

The safety implication of G × E has been questioned by Van der Voet et al. (2012) in their response to Ward et al. (2012). Recent publications provide consistent evidence against the ag-biotech industry’s assertions on the absolute necessity of G × E in modelling crop composition. In particular, Harrison et al. (2013) represented the effect of G × E in a way similar to Model (3). To reduce the number of response variables, they processed the 51 compositional endpoints of a GM cotton through principal component analysis and then modelled the resulting scores of seven uncorrelated key components in a manner analogous to Model (3), except that the effect of genotype group was treated as random. Table 2 summarizes the relative contribution of random effects to data total variation. This result is in agreement with previous recognition that site, reference genotype and error terms are the major sources of variation in crop composition, while blocking within site has a minor effect (Berman et al. 2010; Harrigan et al. 2010; Van der Voet et al. 2011; Shewry et al. 2013). Contribution of the two G × E terms to the total variation was much smaller than Ward et al. (2012) speculated, ranging from 0% in the primary component (accounting for 38% of data total variation) to 11% in the seventh largest component (accounting for 5% of data total variation). Moreover, the negligible effect of G × E has been reported from variance component analyses on the principal components of GM soybean (Harrigan et al. 2013) and on the compositional endpoints of GM maize (Venkatesh et al. 2014). Results from these three separate studies indicate that the non-zero significance of G × E observed in the past is unlikely to be present to a practically important degree.

Natural-log transformation before fitting the linear model to compositional endpoints is suggested by EFSA (2010). This parallels the recommendation by FDA (2001, 2003c) in bioequivalence studies. To aid the interpretation of equivalence criteria, it is helpful to review two properties that connect parameters of the log-normal distribution, i.e. normal distribution on the log scale, to the original data scale.

Property 1: A positive random variable $X$ is said to follow a log-normal distribution if $\mathrm{Ln}(X) \sim N(\mu, \sigma^2)$. The geometric mean of $X$ is $\exp(\mu)$, which is also the median of $X$. The ratio of the geometric means (medians) of two log-normal random variables, $\mathrm{Ln}(X_1) \sim N(\mu_1, \sigma_1^2)$ and $\mathrm{Ln}(X_2) \sim N(\mu_2, \sigma_2^2)$, is $\exp(\mu_1 - \mu_2)$.

Property 2: The coefficient of variation (CV) of $X$ is $\sqrt{\exp(\sigma^2) - 1}$.
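The two properties can be checked numerically. The sketch below is only an illustration: the function names and the simulation settings (sample size, seed) are hypothetical choices, not taken from the paper. It evaluates the closed forms and compares them with the sample median and sample CV of simulated log-normal data.

```python
import math
import numpy as np

def geometric_mean_ratio(mu1, mu2):
    """Property 1: ratio of geometric means (medians) of two log-normal variables."""
    return math.exp(mu1 - mu2)

def lognormal_cv(sigma2):
    """Property 2: coefficient of variation on the original data scale."""
    return math.sqrt(math.exp(sigma2) - 1.0)

def empirical_check(mu, sigma2, n=200_000, seed=1):
    """Return (sample median, sample CV) of simulated log-normal data."""
    rng = np.random.default_rng(seed)
    x = np.exp(rng.normal(mu, math.sqrt(sigma2), size=n))
    return float(np.median(x)), float(x.std() / x.mean())
```

For small $\sigma^2$ the CV is close to $\sigma$ itself; for instance, $\sigma^2 = 0.01$ gives a CV of about 0.10, which is why log-scale standard deviations are often read as approximate relative variation.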

Table 2. An example for the relative contribution of various sources to the total variation in crop composition (Harrison et al. 2013)

Columns three to nine give the relative contribution (%) to variation in the component score.

| Component* | Percentage of data total variation accounted for (%) | Genotype group (%) | Reference (%) | Site (%) | Block within site (%) | Error (%) | Genotype-group-by-site interaction (%) | Reference-by-site interaction (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | 38·19 | 25·84 | 4·40 | 50·43 | 1·82 | 15·88 | 1·63 | 0·00 |
| 2 | 8·82 | 12·78 | 1·94 | 69·23 | 0·29 | 13·65 | 0·00 | 2·11 |
| 3 | 7·75 | 17·25 | 43·68 | 27·98 | 0·40 | 7·54 | 0·83 | 2·32 |
| 4 | 7·37 | 0·00 | 62·59 | 28·75 | 0·23 | 4·32 | 1·00 | 3·12 |
| 5 | 7·37 | 0·00 | 13·94 | 75·82 | 0·00 | 6·70 | 1·27 | 2·28 |
| 6 | 5·95 | 4·95 | 13·82 | 66·79 | 0·55 | 6·70 | 4·87 | 2·32 |
| 7 | 5·18 | 0·26 | 0·00 | 0·00 | 2·11 | 86·85 | 0·00 | 10·78 |

* Variance components were estimated from scores corresponding to seven key components in principal component analysis on 51 compositional endpoints. The seven components cumulatively accounted for 80% of the total variation of the compositional data.


Equivalence criteria

It is articulated by EFSA (2010) that ‘statistical analysis of data from the experiments for comparative risk assessment is mainly concerned with studying the average difference and the average equivalence (AE) over sites. Here, the term “average equivalence” is adopted in the sense used in the drug testing literature’. The AE criterion used in regulating drugs (FDA 2001, 2003a, b, c; EMA 2010) corresponds to the following hypotheses:

$$H_0: (\mu_T - \mu_R)^2 \ge \vartheta_{AE} \quad \text{v.} \quad H_1: (\mu_T - \mu_R)^2 < \vartheta_{AE}$$

The equivalence limit (EL) for $(\mu_T - \mu_R)^2$ is fixed at $\Delta_{AE}(\vartheta_{AE}, \boldsymbol{\sigma}^2) = \vartheta_{AE}$. The FDA sets $\vartheta_{AE} = [\mathrm{Ln}(1.25)]^2$ for in vivo bioequivalence studies and $\vartheta_{AE} = [\mathrm{Ln}(0.9)]^2$ for in vitro bioequivalence studies. The AE criterion is deficient for GMO safety evaluation in several ways. First, using $[\mathrm{Ln}(1.25)]^2$ or $[\mathrm{Ln}(0.9)]^2$ as the one-size-fits-all EL for testing GMO lacks scientific justification. Second, explicit ELs are difficult, if not impossible, to specify for the large number of compositional endpoints (Hothorn & Oberdoerfer 2006). Third, the AE criterion fails to consider natural variation.
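To make the AE thresholds concrete, the short Python sketch below evaluates them and checks whether a pair of true log-scale means falls inside the equivalence region $H_1$. It works with population parameters only and is not a statistical test; the function names `ae_limit` and `ae_equivalent` are hypothetical.

```python
import math

def ae_limit(ratio):
    """AE threshold on the squared log-scale mean difference, [Ln(ratio)]^2."""
    return math.log(ratio) ** 2

def ae_equivalent(mu_T, mu_R, ratio=1.25):
    """True if the true means satisfy H1: (mu_T - mu_R)^2 < [Ln(ratio)]^2."""
    return (mu_T - mu_R) ** 2 < ae_limit(ratio)

# [Ln(1.25)]^2 is roughly 0.0498 and [Ln(0.9)]^2 is roughly 0.0111
in_vivo_limit = ae_limit(1.25)
in_vitro_limit = ae_limit(0.9)
```

By Property 1, the in vivo threshold is the same as requiring the ratio of geometric means to lie between 0.8 and 1.25, and the in vitro threshold corresponds to a ratio between 0.9 and 1/0.9. Neither interval depends on any variance component, which is the third deficiency noted above.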

The concept of substantial equivalence suggests comparing $(\mu_T - \mu_R)^2$ to an EL that accounts for natural variation. Hypotheses (1) are rather abstract. Equivalence criteria to be introduced henceforth can all be re-expressed in the following form.

$$H_0: (\mu_T - \mu_R)^2 \ge \Delta(\vartheta, \boldsymbol{\sigma}^2) \quad \text{v.} \quad H_1: (\mu_T - \mu_R)^2 < \Delta(\vartheta, \boldsymbol{\sigma}^2) \tag{4}$$

Function $\Delta(\cdot)$ represents the EL for $(\mu_T - \mu_R)^2$. Its formulation is derived from one’s choice of $D(T, R)$. There are two arguments for $\Delta(\cdot)$: constant $\vartheta$ is the regulatory threshold for $D(T, R)$; symbol $\boldsymbol{\sigma}^2$ denotes the vector of variances in the linear mixed model. A practical interpretation of Hypotheses (4) is that GMO and references are deemed equivalent when their absolute mean difference is less than $\sqrt{\Delta(\vartheta, \boldsymbol{\sigma}^2)}$. Appendix 1 conducts a brief survey on general strategies for developing equivalence criteria. The SAE and DWE machineries employ the collective strategies of ‘moment-based’, ‘reference-scaled’, ‘aggregated’, ‘non-symmetrical’ and ‘univariate’. They both demand specification of the GMO and reference distributions. It is important to recognize that there are three types of distributions relevant for testing GMO equivalence under Models (2) and/or (3): the super distributions of reference genotype means and GMO mean; the conditional distributions of references and GMO at the site and block level; the marginal distributions of references and GMO across sites and blocks. Comparing super distributions, in particular $N(\mu_R, \sigma_R^2)$ with $\mu_T$, may seem enticing at first. Nonetheless, EFSA (2010) advocates checking GMO equivalence within individual sites. This, in essence, assesses equivalence on the basis of conditional distributions. Recall that traditional GMO safety evaluation accounts for natural variation by judging, across sites, the GMO mean against the data range and/or estimated percentiles from in-study references. This corresponds to assessing equivalence based on the marginal distribution of the references. Both conditional and marginal distributions have been involved in testing bioequivalence. The FDA currently considers conditional distributions (at the individual subject level) for in vivo bioequivalence studies and marginal distributions (across batches and canisters) for in vitro bioequivalence studies (FDA 2003a, c; Davit et al. 2012). Also, veterinary medicines are often administered to food-producing animals on a population level via feed or water; marginal inference is more meaningful there (Toutain & Koritz 1997). In order to thoroughly explore the options for assessing GMO equivalence, the present work operated the SAE and DWE machineries on super, conditional and marginal distributions.

Scaled average equivalence

The SAE machinery tunes the EL for $(\mu_T - \mu_R)^2$ proportionally to the variance of the reference distribution. Due to the rising awareness of random effects in linear models, scaling has gradually been recognized as a sensible alternative to the AE criterion for in vivo bioequivalence studies on highly variable drugs (Boddy et al. 1995; Midha et al. 1997; Haidar et al. 2008; EMA 2010; Schall & Endrenyi 2010). Its acceptance by the FDA was made official in 2012 (Davit et al. 2012). Because SAE focuses on comparison of the mean difference and does not permit a straightforward incorporation of G × E, the three SAE criteria introduced here are expressed in terms of parameters for Model (2) only. Impact of G × E on GMO safety evaluation is investigated via the DWE machinery assuming the more complex Model (3).

Reference genotype is a major source of natural variation for many endpoints measured in crop safety evaluation (Van der Voet et al. 2011; Harrison et al. 2013; Shewry et al. 2013; also see Table 2). The concept of substantial equivalence suggests that reference-to-reference variability should be used when assessing GMO. Van der Voet et al. (2011) consider ‘appropriate percentiles (e.g. 2·5 and 97·5) of the distribution of reference variety characteristics’ as the limits for GMO’s relative deviation from the overall mean of references. Accordingly, Kang & Vahl (2014) proposed the following hypotheses under Model (2).

$$H_0: \mu_T - \mu_R \le z_{0.025}\,\sigma_R \ \text{ or } \ \mu_T - \mu_R \ge z_{0.975}\,\sigma_R \quad \text{v.} \quad H_1: z_{0.025}\,\sigma_R < \mu_T - \mu_R < z_{0.975}\,\sigma_R$$

Constants $z_{0.025} = -1.96$ and $z_{0.975} = 1.96$ are the 2·5th and 97·5th percentiles of $N(0, 1)$. By symmetry of $N(0, 1)$, $z_{0.975} = -z_{0.025}$. To be consistent with the form of Hypotheses (1), the above hypotheses can be expressed more concisely as

$$H_0: \frac{(\mu_T - \mu_R)^2}{\sigma_R^2} \ge z_{0.975}^2 \quad \text{v.} \quad H_1: \frac{(\mu_T - \mu_R)^2}{\sigma_R^2} < z_{0.975}^2 \tag{5}$$

This reveals that an SAE criterion with $\vartheta_{SAE} = z_{0.975}^2$ has been employed. Because the equivalence measure in Hypotheses (5) scales $(\mu_T - \mu_R)^2$ to the variance for the super population of reference genotype means, equivalence defined by $H_1$ of Hypotheses (5) is referred to as ‘SAE-S’. Although more biological support is needed to back up the decision on the regulatory threshold, the current work focused on the equivalence measure and tentatively assumed that $\vartheta_{SAE} = z_{0.975}^2$ is a reasonable choice. Hypotheses (5) can be converted to Hypotheses (4) with

$$\Delta_{SAE\text{-}S}(\vartheta_{SAE}, \boldsymbol{\sigma}^2) = z_{0.975}^2\,\sigma_R^2$$

According to Property 1 of log-normal distributions, the SAE-S criterion considers GMO equivalent to references, with respect to a key compositional endpoint, if the ratio of their geometric means, i.e. $\exp(\mu_T - \mu_R)$, is between $\exp(-z_{0.975}\,\sigma_R)$ and $\exp(z_{0.975}\,\sigma_R)$.

Hypotheses (5) conform to the concept of substantial equivalence when $\sigma_R^2$ dominates natural variation. For endpoints whose $\sigma_R^2$ is considerably smaller than $\sigma_E^2$, i.e. $\sigma_R^2 \ll \sigma_E^2$, scaling leads to low ELs that require $(\mu_T - \mu_R)^2$ to be near zero. However, this is in conflict with the concept of substantial equivalence. Responses from the same genotype grown in different field plots at the same site tend to deviate from $\mu_R$, yet those values are all considered safe. It is then desirable to incorporate $\sigma_E^2$ into GMO testing. Kang & Vahl (2014) propose the following SAE criterion.

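\[
H_0: \frac{(\mu_T - \mu_R)^2}{\sigma_R^2 + \sigma_E^2} \ge z_{0.975}^2 \quad \text{v.} \quad H_1: \frac{(\mu_T - \mu_R)^2}{\sigma_R^2 + \sigma_E^2} < z_{0.975}^2 \tag{6}
\]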

The scaling in Hypotheses (6) utilizes $\sigma_R^2 + \sigma_E^2$, the variance for the conditional distribution of references at a given site and block. Equivalence depicted by H1 of Hypotheses (6), referred to as ‘SAE-C’, complies with Van der Voet et al. (2011) by choosing the 2·5th and 97·5th percentiles of the conditional distribution of references as the ELs for $\mu_T$. It was noticed that the average bioequivalence of a highly variable drug is scaled to the within-subject variance of the reference drug in the cross-over design (Haidar et al. 2008; Davit et al. 2012). Modelling-wise, sites and blocks in the multi-site RCB design operate analogously to human subjects in the cross-over design. The SAE-C criterion therefore harmonizes with the SAE criteria currently employed in drug studies. Converting Hypotheses (6) into the form of Hypotheses (4) yields

\[
\Delta_{\text{SAE-C}}(\vartheta_{\text{SAE}}, \sigma^2) = z_{0.975}^2\,(\sigma_R^2 + \sigma_E^2)
\]

For log-transformed data, the SAE-C criterion deems GMO equivalent to references if the ratio of their geometric means is between $\exp\!\left(-z_{0.975}\sqrt{\sigma_R^2 + \sigma_E^2}\right)$ and $\exp\!\left(z_{0.975}\sqrt{\sigma_R^2 + \sigma_E^2}\right)$. Note that $\Delta_{\text{SAE-C}}(\vartheta_{\text{SAE}}, \sigma^2) \approx \Delta_{\text{SAE-S}}(\vartheta_{\text{SAE}}, \sigma^2)$ when $\sigma_R^2 \gg \sigma_E^2$. Hypotheses (6) thus unifies with Hypotheses (5) when variability of the error term is trivial.
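As an illustration of how the two scalings differ when $\sigma_E^2$ is large relative to $\sigma_R^2$, the following minimal sketch computes the ratio limits $\exp(\pm z_{0.975}\sqrt{\cdot})$ for SAE-S and SAE-C. The function name `ratio_limits` is hypothetical, and the inputs are the lysine point estimates listed in Table 4; treating them as known values is an assumption made only for this illustration.

```python
from math import exp, sqrt
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # 97.5th percentile of N(0, 1), about 1.96


def ratio_limits(scaling_variance):
    """Return (lower, upper) limits for the GMO-over-reference ratio of geometric means."""
    half_width = Z975 * sqrt(scaling_variance)
    return exp(-half_width), exp(half_width)


# Lysine point estimates from Table 4 (tabulated there multiplied by 10^4).
var_r = 7.4e-4    # sigma_R^2
var_e = 22.59e-4  # sigma_E^2

sae_s = ratio_limits(var_r)          # scaled by sigma_R^2 only
sae_c = ratio_limits(var_r + var_e)  # scaled by sigma_R^2 + sigma_E^2

print(f"SAE-S limits: {sae_s[0]:.3f} to {sae_s[1]:.3f}")
print(f"SAE-C limits: {sae_c[0]:.3f} to {sae_c[1]:.3f}")
```

For this endpoint the SAE-C interval is visibly wider than the SAE-S interval, which is the situation $\sigma_R^2 \ll \sigma_E^2$ discussed above.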

Environment, i.e. site, is another major contributor to crop natural variation (Berman et al. 2010; Harrigan et al. 2010; Harrison et al. 2013; Van der Voet et al. 2011; also see Table 2). A literature search on GMO safety evaluation revealed a consensus that the EL for the GMO mean could be based on the marginal distribution of references across sites (e.g. Berman et al. 2010; Zhou et al. 2011; Herman et al. 2013; Lepping et al. 2013; Lundry et al. 2013; Privalle et al. 2013; Brink et al. 2014; Venkatesh et al. 2014). Under Model (2), the marginal reference distribution has a variance of $\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2$. Assessing GMO substantial equivalence can then be carried out by testing the following hypotheses.

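\[
H_0: \frac{(\mu_T - \mu_R)^2}{\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2} \ge z_{0.975}^2 \quad \text{v.} \quad H_1: \frac{(\mu_T - \mu_R)^2}{\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2} < z_{0.975}^2 \tag{7}
\]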


Equivalence depicted by H1 of Hypotheses (7) is referred to as ‘SAE-M’. Its EL for $(\mu_T - \mu_R)^2$ is

\[
\Delta_{\text{SAE-M}}(\vartheta_{\text{SAE}}, \sigma^2) = z_{0.975}^2\,(\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2)
\]

The SAE-M criterion complies with Van der Voet et al. (2011) by taking the 2·5th and 97·5th percentiles of the marginal distribution of references as the ELs for $\mu_T$. It is worthwhile to cite EFSA (2010) that ‘When the natural variation [$\sigma_R^2$] is very small or zero, and the calculated equivalence limits are considered by experts to have little practical relevance, external data may be used to establish new equivalence limits.’ Hong et al. (2014) estimated percentiles for the marginal distribution of references from historical data. When these data are large enough to ignore the uncertainty in estimation, establishing ELs from this external information can be viewed as implementing the SAE-M criterion where the reference mean and variances are derived from historical data. However, such an approach is not as reliable as directly testing Hypotheses (7) from in-study references because historical data may lose relevance over time. For log-transformed data, the SAE-M criterion considers GMO equivalent to references if the ratio of their geometric means falls between $\exp\!\left(-z_{0.975}\sqrt{\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2}\right)$ and $\exp\!\left(z_{0.975}\sqrt{\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2}\right)$. When $\sigma_R^2 \gg \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2$, $\Delta_{\text{SAE-M}}(\vartheta_{\text{SAE}}, \sigma^2) \approx \Delta_{\text{SAE-C}}(\vartheta_{\text{SAE}}, \sigma^2) \approx \Delta_{\text{SAE-S}}(\vartheta_{\text{SAE}}, \sigma^2)$ and Hypotheses (5–7) reconcile.

Noticeably ELs of the three SAE criteria are weighted sums of mixed model variance components. The SAE-S criterion places zero weight on $\sigma_S^2 + \sigma_{B(S)}^2$ and $\sigma_E^2$. It depicts equivalence in the most stringent manner. The SAE-C criterion broadens Δ(·) by placing positive weight on $\sigma_E^2$. The SAE-M criterion further broadens Δ(·) by assigning positive weight to $\sigma_E^2$ and $\sigma_S^2 + \sigma_{B(S)}^2$. Inspection of these ELs revealed the following inequalities.

\[
\Delta_{\text{SAE-S}}(\vartheta_{\text{SAE}}, \sigma^2) < \Delta_{\text{SAE-C}}(\vartheta_{\text{SAE}}, \sigma^2) < \Delta_{\text{SAE-M}}(\vartheta_{\text{SAE}}, \sigma^2)
\]

This organizes the H1 parameter spaces of the SAE criteria according to a hierarchical order. Consequently, SAE-S implies SAE-C and SAE-C implies SAE-M (Fig. 2).
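The implication chain can be written out in one line. This is only a restatement of the inequalities above, assuming $\sigma_E^2$ and $\sigma_S^2 + \sigma_{B(S)}^2$ are positive so that each enlargement of the limit is strict:

\[
(\mu_T - \mu_R)^2 < z_{0.975}^2\,\sigma_R^2
\;\Longrightarrow\;
(\mu_T - \mu_R)^2 < z_{0.975}^2\,(\sigma_R^2 + \sigma_E^2)
\;\Longrightarrow\;
(\mu_T - \mu_R)^2 < z_{0.975}^2\,(\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2)
\]

Any parameter point satisfying H1 under SAE-S therefore also satisfies H1 under SAE-C, and any point satisfying H1 under SAE-C also satisfies H1 under SAE-M; the converse does not hold.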

Distribution-wise equivalence

The comments from the ag-biotech industry on the EFSA guidance plead for an equivalence criterion to take into account G × E (Ward et al. 2012). This prompted the present work to explore equivalence criteria under Model (3). Incorporating the two G × E terms in this model requires the statistical principle of judging two products based on their entire distributions instead of their means only. The DWE machinery was originally introduced to bioequivalence testing in order to guard against subject-by-drug interaction (Anderson & Hauck 1990; Sheiner 1992; Schall & Luus 1993; Hauck & Anderson 1994; Schall 1995a). In the context of a clinical cross-over design, it created the individual bioequivalence and population bioequivalence criteria for comparing test and reference products based on their conditional distributions (at the individual subject level) and marginal distributions (across study subjects), respectively. These two criteria were once thought to supplant the AE criterion for in vivo bioequivalence studies (Endrenyi & Midha 1998). As concerns on subject-by-drug interaction attenuated over time (Endrenyi et al. 2000; Hsuan 2000; Zariffa et al. 2000), the FDA dismissed DWE criteria from their final guidance for in vivo bioequivalence testing (FDA 2003b). Nevertheless, the DWE machinery should not be excluded from other areas of application. For example, FDA (2003a, c) recommends in vitro studies on nasal products to implement the population bioequivalence criterion.

The usefulness of DWE does not reside only in its capability to handle interaction terms. Dragalin et al. (2003) identified that its machinery generates equivalence measures linearly related to the Kullback–Leibler (KL) divergence, a well-known measure of the distance between two distributions in probability and information theories. For instance, the commonly practiced likelihood maximization aims at minimizing the KL divergence between the presumed model and the observed data (Section 13.2 of Pawitan 2001). A DWE criterion can then be interpreted, in general, as deeming the test and reference products equivalent when the distance between their distributions is within the regulatory threshold. Because the KL divergence is invariant to monotone transformations of data, the interpretation of DWE is not restricted to the original data scale as long as there exists a monotone transformation (including but not limited to the log function) satisfying the normality assumption in the linear mixed model. Such a feature may appease practitioners who are concerned about the interpretation of equivalence criteria on endpoints, such as crop agronomic and phenotypic characteristics, that are not traditionally modelled by log-normal distributions. The DWE machinery uses a comparison of references to themselves as the basis for the comparison of GMO to references. Because a distribution under the linear mixed model is fully specified by the first and second moments, the construction of a DWE criterion automatically enlists the means (of fixed effects) and variance components (of random effects) in the model. As demonstrated later, the resulting equivalence criteria incorporate G × E terms in a logical manner and unify with SAE criteria in the simplified case.

Fig. 2. Logical hierarchy in equivalences at $\vartheta_{\text{DWE}} = \vartheta_{\text{SAE}} - 1$ under Model (2).

First consider the most complex case where a DWE criterion is constructed to compare the conditional distributions of GMO and references at the site and block level. Equivalence depicted by this criterion is referred to as ‘DWE-C’. Suppose that $Y_T$ is a random observation from the GMO distribution; $Y_R$ and $Y_{R'}$ are two random observations from the reference distribution; $Y_R$, $Y_{R'}$ and $Y_T$ are all from the same site and block. Assessing DWE-C involves comparing the expected squared distance between $Y_T$ and $Y_R$, i.e. $E((Y_T - Y_R)^2)$, with the expected squared distance between $Y_R$ and $Y_{R'}$, i.e. $E((Y_R - Y_{R'})^2)$. Obviously $E((Y_T - Y_R)^2)$ should not be much greater than $E((Y_R - Y_{R'})^2)$ if GMO is similar to references. The expected squared distances are expressed in terms of parameters in Model (3) by the following equations.

\[
E((Y_T - Y_R)^2) = (\mu_T - \mu_R)^2 + \sigma_R^2 + \sigma_{R\times S}^2 + 2\sigma_{G\times S}^2 + 2\sigma_E^2
\]

\[
E((Y_R - Y_{R'})^2) = 2\sigma_R^2 + 2\sigma_{R\times S}^2 + 2\sigma_E^2
\]

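The DWE-C measure is defined as the scaled difference of expected squared distances, i.e.

\[
\frac{E((Y_T - Y_R)^2) - E((Y_R - Y_{R'})^2)}{E((Y_R - Y_{R'})^2)/2}
= \frac{(\mu_T - \mu_R)^2 - \sigma_R^2 - \sigma_{R\times S}^2 + 2\sigma_{G\times S}^2}{\sigma_R^2 + \sigma_{R\times S}^2 + \sigma_E^2}
\]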

The hypotheses for DWE-C are given by

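\[
H_0: \frac{(\mu_T - \mu_R)^2 - \sigma_R^2 - \sigma_{R\times S}^2 + 2\sigma_{G\times S}^2}{\sigma_R^2 + \sigma_{R\times S}^2 + \sigma_E^2} \ge \vartheta_{\text{DWE}} \quad \text{v.} \quad H_1: \frac{(\mu_T - \mu_R)^2 - \sigma_R^2 - \sigma_{R\times S}^2 + 2\sigma_{G\times S}^2}{\sigma_R^2 + \sigma_{R\times S}^2 + \sigma_E^2} < \vartheta_{\text{DWE}} \tag{8}
\]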

Accordingly, the DWE-C criterion deems GMO equivalent to references when the distance between their conditional distributions at the site and block level is less than $\vartheta_{\text{DWE}}$. Hypotheses (8) can be re-expressed in the form of Hypotheses (4) with

\[
\Delta_{\text{DWE-C}}(\vartheta_{\text{DWE}}, \sigma^2) = (\vartheta_{\text{DWE}} + 1)(\sigma_R^2 + \sigma_{R\times S}^2) - 2\sigma_{G\times S}^2 + \vartheta_{\text{DWE}}\,\sigma_E^2
\]

As anticipated, large values of $\sigma_R^2$ and $\sigma_E^2$ make it easier to conclude equivalence at a fixed $(\mu_T - \mu_R)^2$. It is interesting to observe that the two variances for G × E, $\sigma_{R\times S}^2$ and $\sigma_{G\times S}^2$, affect DWE-C in opposite ways: $\Delta_{\text{DWE-C}}(\cdot)$ increases with respect to $\sigma_{R\times S}^2$ while it decreases with respect to $\sigma_{G\times S}^2$. Nonetheless, these relationships are completely logical. Equivalence between two genotype groups (GMO and reference) at the site and block level should be supported by small genotype-group-by-site interaction as well as by large reference-by-site interaction. In the absence of any regulatory input, the threshold for DWE was tentatively set at $\vartheta_{\text{DWE}} = z_{0.975}^2 - 1$ so that $\Delta_{\text{DWE-C}}(\vartheta_{\text{DWE}}, \sigma^2) \approx z_{0.975}^2\,\sigma_R^2$ when $\sigma_R^2 \gg \sigma_{R\times S}^2 + \sigma_{G\times S}^2 + \sigma_E^2$. This unites Hypotheses (8) with Hypotheses (5–7) for those compositional endpoints where natural variation is dominated by $\sigma_R^2$.
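The step from Hypotheses (8) to the equivalence limit is a one-line rearrangement, spelled out here as a worked check. Multiplying H1 of Hypotheses (8) by its positive denominator and moving the variance terms to the right-hand side gives

\[
(\mu_T - \mu_R)^2 - \sigma_R^2 - \sigma_{R\times S}^2 + 2\sigma_{G\times S}^2 < \vartheta_{\text{DWE}}\,(\sigma_R^2 + \sigma_{R\times S}^2 + \sigma_E^2)
\]

\[
\Longleftrightarrow\quad
(\mu_T - \mu_R)^2 < (\vartheta_{\text{DWE}} + 1)(\sigma_R^2 + \sigma_{R\times S}^2) - 2\sigma_{G\times S}^2 + \vartheta_{\text{DWE}}\,\sigma_E^2
\]

With the tentative threshold $\vartheta_{\text{DWE}} = z_{0.975}^2 - 1$, the right-hand side becomes $z_{0.975}^2(\sigma_R^2 + \sigma_{R\times S}^2) - 2\sigma_{G\times S}^2 + (z_{0.975}^2 - 1)\sigma_E^2$, whose leading term is $z_{0.975}^2\,\sigma_R^2$ when $\sigma_R^2$ dominates the other components.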

Now, develop a DWE criterion that compares the marginal distributions of GMO and references across sites and blocks. Equivalence depicted by this criterion is referred to as ‘DWE-M’. Define $Y_T$, $Y_R$ and $Y_{R'}$ as before except they are not taken from the same site or block. Model (3) results in the following expected squared distances.

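\[
E((Y_T - Y_R)^2) = (\mu_T - \mu_R)^2 + \sigma_R^2 + \sigma_{R\times S}^2 + 2\sigma_S^2 + 2\sigma_{B(S)}^2 + 2\sigma_{G\times S}^2 + 2\sigma_E^2
\]

\[
E((Y_R - Y_{R'})^2) = 2\sigma_R^2 + 2\sigma_{R\times S}^2 + 2\sigma_S^2 + 2\sigma_{B(S)}^2 + 2\sigma_{G\times S}^2 + 2\sigma_E^2
\]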

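The DWE-M measure is defined as the difference between $E((Y_T - Y_R)^2)$ and $E((Y_R - Y_{R'})^2)$ scaled to $E((Y_R - Y_{R'})^2)/2$, i.e.

\[
\frac{E((Y_T - Y_R)^2) - E((Y_R - Y_{R'})^2)}{E((Y_R - Y_{R'})^2)/2}
= \frac{(\mu_T - \mu_R)^2 - \sigma_R^2 - \sigma_{R\times S}^2}{\sigma_R^2 + \sigma_{R\times S}^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_{G\times S}^2 + \sigma_E^2}
\]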

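The corresponding equivalence hypotheses are given as

\[
H_0: \frac{(\mu_T - \mu_R)^2 - \sigma_R^2 - \sigma_{R\times S}^2}{\sigma_R^2 + \sigma_{R\times S}^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_{G\times S}^2 + \sigma_E^2} \ge \vartheta_{\text{DWE}} \quad \text{v.} \quad H_1: \frac{(\mu_T - \mu_R)^2 - \sigma_R^2 - \sigma_{R\times S}^2}{\sigma_R^2 + \sigma_{R\times S}^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_{G\times S}^2 + \sigma_E^2} < \vartheta_{\text{DWE}} \tag{9}
\]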

The DWE-M criterion considers GMO equivalent to references when the distance between their marginal distributions across sites and blocks is less than $\vartheta_{\text{DWE}}$. After converting Hypotheses (9) into Hypotheses (4) with

\[
\Delta_{\text{DWE-M}}(\vartheta_{\text{DWE}}, \sigma^2) = (\vartheta_{\text{DWE}} + 1)(\sigma_R^2 + \sigma_{R\times S}^2) + \vartheta_{\text{DWE}}\,(\sigma_S^2 + \sigma_{B(S)}^2 + \sigma_{G\times S}^2 + \sigma_E^2)
\]

it becomes apparent that large variability in any of the five random components in Model (3) leads to a less stringent EL for $(\mu_T - \mu_R)^2$. Unlike DWE-C, equivalence between two genotype groups (GMO and reference) across sites and blocks is affected by genotype-group-by-site interaction and reference-by-site interaction in the same direction. Setting $\vartheta_{\text{DWE}} = z_{0.975}^2 - 1$ yields $\Delta_{\text{DWE-M}}(\vartheta_{\text{DWE}}, \sigma^2) \approx z_{0.975}^2\,\sigma_R^2$ when $\sigma_R^2 \gg \sigma_{R\times S}^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_{G\times S}^2 + \sigma_E^2$. Hypotheses (9) then unite with Hypotheses (5–8) for those endpoints where natural variation is dominated by $\sigma_R^2$.
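The opposite roles of the two G × E variances in DWE-C, and their common direction in DWE-M, can be checked numerically. The sketch below simply codes the two equivalence limits given above; the function names are hypothetical and the variance values are invented for illustration, not taken from any dataset.

```python
from statistics import NormalDist

Z975_SQ = NormalDist().inv_cdf(0.975) ** 2  # about 3.84
THETA_DWE = Z975_SQ - 1.0                    # tentative threshold used in the text


def delta_dwe_c(var_r, var_rs, var_gs, var_e, theta=THETA_DWE):
    """Equivalence limit for (mu_T - mu_R)^2 under DWE-C, Model (3)."""
    return (theta + 1.0) * (var_r + var_rs) - 2.0 * var_gs + theta * var_e


def delta_dwe_m(var_r, var_rs, var_s, var_bs, var_gs, var_e, theta=THETA_DWE):
    """Equivalence limit for (mu_T - mu_R)^2 under DWE-M, Model (3)."""
    return (theta + 1.0) * (var_r + var_rs) + theta * (var_s + var_bs + var_gs + var_e)


# Hypothetical variance components, chosen only to show the directions of change.
base = dict(var_r=0.010, var_rs=0.001, var_gs=0.001, var_e=0.002)
more_gs = dict(base, var_gs=0.002)  # larger genotype-group-by-site variance
more_rs = dict(base, var_rs=0.002)  # larger reference-by-site variance

c_base = delta_dwe_c(**base)
c_more_gs = delta_dwe_c(**more_gs)  # smaller than c_base: DWE-C limit tightens
c_more_rs = delta_dwe_c(**more_rs)  # larger than c_base: DWE-C limit loosens

m_base = delta_dwe_m(var_s=0.003, var_bs=0.001, **base)
m_more_gs = delta_dwe_m(var_s=0.003, var_bs=0.001, **more_gs)  # larger than m_base
m_more_rs = delta_dwe_m(var_s=0.003, var_bs=0.001, **more_rs)  # larger than m_base

print(c_more_gs < c_base < c_more_rs, m_more_gs > m_base, m_more_rs > m_base)
```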

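The super-distribution of reference genotype mean is $N(\mu_R, \sigma_R^2)$ and the super-distribution of GMO mean has probability mass one at point $\mu_T$. Refer to the DWE based on these two super-distributions as ‘DWE-S’. Its equivalence measure is $[(\mu_T - \mu_R)^2 - \sigma_R^2]/\sigma_R^2$. In terms of Hypotheses (4),

\[
\Delta_{\text{DWE-S}}(\vartheta_{\text{DWE}}, \sigma^2) = (\vartheta_{\text{DWE}} + 1)\,\sigma_R^2
\]

Setting $\vartheta_{\text{DWE}} = z_{0.975}^2 - 1$ equates DWE-S with SAE-S. It was observed under Model (2) that

\[
\Delta_{\text{DWE-C}}(\vartheta_{\text{DWE}}, \sigma^2) = (\vartheta_{\text{DWE}} + 1)\,\sigma_R^2 + \vartheta_{\text{DWE}}\,\sigma_E^2
\]

\[
\Delta_{\text{DWE-M}}(\vartheta_{\text{DWE}}, \sigma^2) = (\vartheta_{\text{DWE}} + 1)\,\sigma_R^2 + \vartheta_{\text{DWE}}\,(\sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2)
\]

\[
\Delta_{\text{DWE-S}}(\vartheta_{\text{DWE}}, \sigma^2) < \Delta_{\text{DWE-C}}(\vartheta_{\text{DWE}}, \sigma^2) < \Delta_{\text{DWE-M}}(\vartheta_{\text{DWE}}, \sigma^2)
\]

The hierarchical ordering between H1 parameter spaces of the three DWE criteria indicates that DWE-S supersedes DWE-C and DWE-C supersedes DWE-M (Fig. 2).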

Table 3 summarizes the SAE and DWE criteria for GMO safety evaluation. Due to the reported trivial effects of G × E (Harrigan et al. 2013; Harrison et al. 2013 and Venkatesh et al. 2014), the present work considered the simple case of $\sigma_{R\times S}^2 = \sigma_{G\times S}^2 = 0$. Figure 2 shows that the three SAE criteria run parallel to the three DWE criteria: the SAE-S and DWE-S criteria compare super-distributions and assess the general class of ‘super-equivalence’; the SAE-C and DWE-C criteria compare conditional distributions and hence assess ‘conditional equivalence’; and the SAE-M and DWE-M criteria compare marginal distributions and therefore assess ‘marginal equivalence’. The SAE and DWE machineries pave two separate paths from super equivalence to conditional and then to marginal equivalence. One possible way to envision the connection between these two paths is to level their ELs by assigning a common weight to $\sigma_R^2$, i.e. at $\vartheta_{\text{DWE}} = \vartheta_{\text{SAE}} - 1$. It turns out that there is a close resemblance between the DWE and SAE criteria: $\Delta_{\text{SAE-S}}(\cdot) = \Delta_{\text{DWE-S}}(\cdot)$; $\Delta_{\text{SAE-C}}(\cdot)$ and $\Delta_{\text{DWE-C}}(\cdot)$ are both weighted sums of $\sigma_R^2$ and $\sigma_E^2$; $\Delta_{\text{SAE-M}}(\cdot)$ and $\Delta_{\text{DWE-M}}(\cdot)$ are both weighted sums of $\sigma_R^2$ and $\sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2$. The distinction is in the weights assigned to $\sigma_S^2 + \sigma_{B(S)}^2$ and $\sigma_E^2$. In contrast to the two SAE criteria, the two DWE criteria allocate lighter weights to $\sigma_S^2 + \sigma_{B(S)}^2$ and $\sigma_E^2$. This is intuitively sensible since the GMO and reference distributions share the common random components of $\sigma_S^2$ and $\sigma_{B(S)}^2$ while $\sigma_R^2$ is the extra variability in the reference distribution. Setting $\vartheta_{\text{DWE}} = \vartheta_{\text{SAE}} - 1$ enables the H1 parameter spaces of the six criteria to be organized according to the inequalities of

\[
\Delta_{\text{SAE-S}}(\vartheta_{\text{SAE}}, \sigma^2) = \Delta_{\text{DWE-S}}(\vartheta_{\text{DWE}}, \sigma^2)
< \Delta_{\text{DWE-C}}(\vartheta_{\text{DWE}}, \sigma^2)
< \Delta_{\text{SAE-C}}(\vartheta_{\text{SAE}}, \sigma^2) \ \&\ \Delta_{\text{DWE-M}}(\vartheta_{\text{DWE}}, \sigma^2)
< \Delta_{\text{SAE-M}}(\vartheta_{\text{SAE}}, \sigma^2)
\]

It is thus seen that SAE-S/DWE-S represents the most stringent level of equivalence and SAE-M is the most relaxed. Note that there is no definite ordering between SAE-C and DWE-M. Their relationship is controlled by the relative magnitude of $\sigma_S^2 + \sigma_{B(S)}^2$ to $\sigma_E^2$.
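The condition that decides the order of SAE-C and DWE-M follows directly from their ELs under Model (2). As a worked check at $\vartheta_{\text{SAE}} = z_{0.975}^2$ and $\vartheta_{\text{DWE}} = z_{0.975}^2 - 1$:

\[
\Delta_{\text{SAE-C}} - \Delta_{\text{DWE-M}}
= z_{0.975}^2(\sigma_R^2 + \sigma_E^2) - \left[ z_{0.975}^2\,\sigma_R^2 + (z_{0.975}^2 - 1)(\sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2) \right]
= \sigma_E^2 - (z_{0.975}^2 - 1)(\sigma_S^2 + \sigma_{B(S)}^2)
\]

SAE-C is therefore the looser of the two when $\sigma_E^2 > (z_{0.975}^2 - 1)(\sigma_S^2 + \sigma_{B(S)}^2) \approx 2.84\,(\sigma_S^2 + \sigma_{B(S)}^2)$, and DWE-M is the looser one otherwise.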

RESULTS

The SAE and DWE criteria were investigated over the range of mixed model parameter values typically encountered in GMO safety evaluation. As far as it is known, the statistical analysis by Van der Voet et al. (2011) is the only public record that documents real-world variance component values of every single compositional endpoint. The exercise in the present work relied on their analysis results (cited in Table 4). It is important to know that the values in Table 4 produce point estimates of ELs and should not be treated as known parameters when testing for equivalence. A valid statistical analysis must account for the uncertainty in estimating $(\mu_T - \mu_R)^2$ as well as the EL. Appendix 2 reviews statistical methodologies employed in equivalence testing. The purpose of this exercise was to examine the range of ELs for crop compositional endpoints. The formal statistical analysis of this dataset is not trivial and will be published elsewhere.

Table 3. Summary of equivalence criteria for GMO safety evaluation

| Equivalence criterion | Model | D(T, R): equivalence measure | ϑ: regulatory threshold for the equivalence measure | Δ(ϑ, σ²): equivalence limit for $(\mu_T - \mu_R)^2$ |
|---|---|---|---|---|
| SAE-S | (2) | $\dfrac{(\mu_T - \mu_R)^2}{\sigma_R^2}$ | $z_{0.975}^2 = 3.84$ | $z_{0.975}^2\,\sigma_R^2$ |
| SAE-C | (2) | $\dfrac{(\mu_T - \mu_R)^2}{\sigma_R^2 + \sigma_E^2}$ | $z_{0.975}^2 = 3.84$ | $z_{0.975}^2(\sigma_R^2 + \sigma_E^2)$ |
| SAE-M | (2) | $\dfrac{(\mu_T - \mu_R)^2}{\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2}$ | $z_{0.975}^2 = 3.84$ | $z_{0.975}^2(\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2)$ |
| DWE-S | (2) and (3) | $\dfrac{(\mu_T - \mu_R)^2 - \sigma_R^2}{\sigma_R^2}$ | $z_{0.975}^2 - 1 = 2.84$ | $z_{0.975}^2\,\sigma_R^2$ |
| DWE-C | (3) | $\dfrac{(\mu_T - \mu_R)^2 - \sigma_R^2 - \sigma_{R\times S}^2 + 2\sigma_{G\times S}^2}{\sigma_R^2 + \sigma_{R\times S}^2 + \sigma_E^2}$ | $z_{0.975}^2 - 1 = 2.84$ | $z_{0.975}^2(\sigma_R^2 + \sigma_{R\times S}^2) - 2\sigma_{G\times S}^2 + (z_{0.975}^2 - 1)\sigma_E^2$ |
| DWE-C | (2) | $\dfrac{(\mu_T - \mu_R)^2 - \sigma_R^2}{\sigma_R^2 + \sigma_E^2}$ | $z_{0.975}^2 - 1 = 2.84$ | $z_{0.975}^2\,\sigma_R^2 + (z_{0.975}^2 - 1)\sigma_E^2$ |
| DWE-M | (3) | $\dfrac{(\mu_T - \mu_R)^2 - \sigma_R^2 - \sigma_{R\times S}^2}{\sigma_R^2 + \sigma_{G\times S}^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_{R\times S}^2 + \sigma_E^2}$ | $z_{0.975}^2 - 1 = 2.84$ | $z_{0.975}^2(\sigma_R^2 + \sigma_{R\times S}^2) + (z_{0.975}^2 - 1)(\sigma_S^2 + \sigma_{B(S)}^2 + \sigma_{G\times S}^2 + \sigma_E^2)$ |
| DWE-M | (2) | $\dfrac{(\mu_T - \mu_R)^2 - \sigma_R^2}{\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2}$ | $z_{0.975}^2 - 1 = 2.84$ | $z_{0.975}^2\,\sigma_R^2 + (z_{0.975}^2 - 1)(\sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2)$ |

Table 4. Variability of compositional endpoints in a study on GM maize (Van der Voet et al. 2011)

| Group | Analyte | $\sigma_R^2 \times 10^4$* | $\sigma_S^2 \times 10^4$* | $\sigma_{B(S)}^2 \times 10^4$* | $\sigma_E^2 \times 10^4$* | CVM (%) |
|---|---|---|---|---|---|---|
| Amino acid | Alanine | 121·5 | 17·57 | 11·37 | 19·74 | 13 |
| | Arginine | 26·4 | 2·9 | 3·18 | 25·05 | 8 |
| | Aspartic acid | 77·4 | 5·15 | 5·92 | 13·91 | 10 |
| | Cystine | 54·4 | 3·05 | 10·14 | 18·62 | 9 |
| | Glutamic acid | 147·4 | 17·69 | 14·7 | 22·6 | 14 |
| | Glycine | 24·3 | 2·3 | 4·1 | 13·27 | 7 |
| | Histidine | 65·6 | 0 | 7·05 | 19·49 | 10 |
| | Isoleucine | 128·9 | 1·31 | 15·1 | 28·9 | 13 |
| | Leucine | 185·1 | 41·43 | 17·88 | 26·88 | 17 |
| | Lysine | 7·4 | 1·8 | 0 | 22·59 | 6 |
| | Methionine | 125·5 | 19·74 | 17·86 | 30·79 | 14 |
| | Phenylalanine | 145 | 22·08 | 12·55 | 19·75 | 14 |
| | Proline | 137·2 | 0 | 20·04 | 25·92 | 14 |
| | Serine | 90·1 | 33·57 | 13·9 | 23·31 | 13 |
| | Threonine | 62·1 | 0 | 7·73 | 23·62 | 10 |
| | Tryptophan | 17·6 | 3·33 | 3·18 | 44·06 | 8 |
| | Tyrosine | 112·4 | 32·16 | 6·12 | 147·4 | 17 |
| | Valine | 79·3 | 0 | 9·22 | 20·64 | 10 |
| Fatty acid | 16 : 0 Palmitic acid | 157·3 | 35·97 | 0 | 38·31 | 15 |
| | 16 : 1 Palmitoleic acid | 40 | 98·2 | 0 | 8·73 | 12 |
| | 18 : 0 Stearic acid | 274 | 7·01 | 8·14 | 55·4 | 19 |
| | 18 : 1 Oleic acid | 330·5 | 2·98 | 3·07 | 34·84 | 19 |
| | 18 : 2 Linoleic acid | 118·9 | 24·07 | 9·09 | 49·23 | 14 |
| | 18 : 3 Linolenic acid | 133·6 | 31·34 | 8·8 | 46·58 | 15 |
| | 20 : 0 Arachidic acid | 103·7 | 15·17 | 8·02 | 34·84 | 13 |
| | 20 : 1 Eicosenoic acid | 184·6 | 35·02 | 6·42 | 57·08 | 17 |
| | 22 : 0 Behenic acid | 107·5 | 29·56 | 4·6 | 25·57 | 13 |
| Proximate and fibre | Acid detergent fibre | 71 | 3·55 | 0 | 270·63 | 19 |
| | Ash | 0 | 58·99 | 4·55 | 109·76 | 13 |
| | Carbohydrates | 0·6 | 1·77 | 0·006 | 0·41 | 2 |
| | Moisture | 9·8 | 70·57 | 0 | 6·93 | 9 |
| | Neutral detergent fibre | 40·9 | 17·5 | 16·13 | 153·29 | 15 |
| | Protein | 87·8 | 4·37 | 7·13 | 13·14 | 11 |
| | Total detergent fibre | 54·5 | 25·2 | 9·49 | 158·22 | 16 |
| | Total fat | 102·4 | 17·44 | 3·82 | 22·28 | 12 |
| Mineral | Calcium | 255·6 | 122·44 | 0 | 42·53 | 21 |
| | Copper | 389·6 | 0 | 0 | 91·64 | 22 |
| | Iron | 144·5 | 121·5 | 0 | 93·86 | 19 |
| | Magnesium | 67·2 | 3·45 | 0·45 | 18·82 | 10 |
| | Manganese | 339·1 | 24·48 | 8·85 | 39·24 | 21 |
| | Phosphorus | 42·7 | 25·69 | 0 | 20·05 | 9 |
| | Potassium | 31·3 | 80 | 0 | 15·03 | 11 |
| | Zinc | 136 | 18·51 | 1·92 | 33·64 | 14 |
| Vitamin | Folic acid | 134·1 | 0 | 13·77 | 796·94 | 31 |
| | Niacin | 126·4 | 20·64 | 3·51 | 25·13 | 13 |
| | Vitamin B1 | 84·4 | 1·5 | 6·11 | 31·95 | 11 |
| | Vitamin B2 | 26·9 | 180·09 | 0 | 77·77 | 17 |
| | Vitamin B6 | 94·6 | 92·81 | 2·96 | 28·28 | 15 |
| | Vitamin E | 261·3 | 3·18 | 1·15 | 96·76 | 19 |

By Property 1 of the log-normal distribution, $\exp\!\left(\sqrt{\Delta(\cdot)}\right)$ represents the maximum allowable GMO-over-reference ratio of geometric means. The variance component estimates of every analyte in Table 4 were plugged into $\exp\!\left(\sqrt{\Delta(\cdot)}\right)$. Figure 3 illustrates the value of $\exp\!\left(\sqrt{\Delta(\cdot)}\right)$ for each equivalence criterion ($\vartheta_{\text{SAE}} = z_{0.975}^2$; $\vartheta_{\text{DWE}} = z_{0.975}^2 - 1$). Note that the five distinct criteria numerically reconciled for many analytes. This is due to the fact that $\sigma_R^2$ was often the dominating source of variation. A perhaps comforting surprise was that the maximum allowable ratios of geometric means were not overly broadened in these criteria, as their values were mostly near 1·25. This relates to the fact that many analytes in Table 4 had small to moderate variability (Property 2 of the log-normal distribution indicates that their CVs were near 15%). Three analytes, namely, folic acid, p-coumaric acid and raffinose, had CVs around 30%. If adopting the standard used for evaluating drugs (Midha et al. 1997), these analytes fell into the ‘highly variable’ category. As intended, scaling relaxed the maximum allowable ratios in SAE-M to as large as 1·83. The CVs of arginine, glycine, lysine, tryptophan and carbohydrates were 8% or less. Scaling tightened their maximum allowable ratios around or below 1·11. For ash and phytic acid, $\sigma_R^2$ was estimated to be zero while the estimates of $\sigma_S^2$ and $\sigma_E^2$ were considerably large. In these cases, it is unreasonable to pin the maximum allowable ratio at one, as in SAE-S/DWE-S. The maximum allowable ratio of the other four criteria was 1·19–1·29 for ash and 1·35–1·49 for phytic acid, which are more sensible. The visible separation among SAE and DWE criteria also occurred with tyrosine, acid detergent fibre, moisture, neutral detergent fibre, total detergent fibre, iron, potassium, folic acid, vitamin B2 and raffinose. Referring to Table 4, these analytes all had a fair amount of variability with sizable contributions from site, block and/or error term variance components. Ordering of the ELs for the equivalence criteria under Model (2) was confirmed by these analytes as well. Specifically it was observed that when the error term dominated the total variation, as in acid detergent fibre and folic acid, the maximum allowable ratio in the SAE-S/DWE-S criterion was distinctively smaller than in other criteria; when site was the foremost source of variation, as in moisture and vitamin B2, the SAE-M and DWE-M criteria had visibly higher maximum allowable ratios than the others.
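The ratios quoted for ash and phytic acid can be reproduced from the Table 4 point estimates. The following is a minimal sketch, assuming the Model (2) equivalence limits of Table 3 with $\vartheta_{\text{SAE}} = z_{0.975}^2$ and $\vartheta_{\text{DWE}} = z_{0.975}^2 - 1$; the function name `max_ratios` is hypothetical, and the estimates are treated as fixed numbers purely for illustration, which is not a substitute for a formal equivalence test.

```python
from math import exp, sqrt
from statistics import NormalDist

Z_SQ = NormalDist().inv_cdf(0.975) ** 2  # theta_SAE, about 3.84
THETA_DWE = Z_SQ - 1.0                   # theta_DWE, about 2.84


def max_ratios(var_r, var_s, var_bs, var_e):
    """Maximum allowable ratio exp(sqrt(Delta)) for each criterion under Model (2)."""
    site_block = var_s + var_bs
    limits = {
        "SAE-S/DWE-S": Z_SQ * var_r,
        "DWE-C": Z_SQ * var_r + THETA_DWE * var_e,
        "SAE-C": Z_SQ * (var_r + var_e),
        "DWE-M": Z_SQ * var_r + THETA_DWE * (site_block + var_e),
        "SAE-M": Z_SQ * (var_r + site_block + var_e),
    }
    return {name: exp(sqrt(delta)) for name, delta in limits.items()}


# Point estimates from Table 4, rescaled from the tabulated (value x 10^4) units.
# Order of components: sigma_R^2, sigma_S^2, sigma_B(S)^2, sigma_E^2.
table4 = {
    "Ash": (0.0, 58.99e-4, 4.55e-4, 109.76e-4),
    "Phytic acid": (0.0, 2.68e-4, 95.07e-4, 319.19e-4),
}

ratios = {analyte: max_ratios(*components) for analyte, components in table4.items()}
for analyte, by_criterion in ratios.items():
    print(analyte, {name: round(value, 2) for name, value in by_criterion.items()})
```

With $\sigma_R^2$ estimated at zero, the SAE-S/DWE-S ratio is exactly one for both analytes, while the other four criteria span the ranges reported in the text.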

DISCUSSION

The present paper introduces several equivalence criteria for GMO safety evaluation. The SAE-S and DWE-S criteria compare GMO against references according to their super-distributions. They assess what is called super-equivalence, which could be unreasonably stringent when $\sigma_R^2$, the variability among reference genotypes, is relatively small. Motivated by the concept of substantial equivalence, the SAE-C, SAE-M, DWE-C and DWE-M criteria define equivalence by taking into account additional sources of natural variation. Note that their ELs are loosened when $\sigma_E^2$, variance for the error term, is large. Because part of $\sigma_E^2$ comes from variability in sample processing and measurement, Hsuan (2000) cautioned that its inclusion in equivalence criteria may reward a sloppy study with an artificially large EL. This concern can be dismissed by the following reasoning. It is important to include plot-to-plot variability as this makes up part of natural variation. Wellek (2010, Section 10.2.2) speculated that a study with poor quality not only has an excessively high $\sigma_E^2$ but also risks inducing large bias in the estimation of the mean difference; it is thus unlikely for a producer to profit from trying to surreptitiously manipulate these tests. After all, no method of statistical inference is immune to deceptive strategies in data collection. It is the responsibility of the overseeing regulators to guard against fraud. Enforcing compliance with current Good Laboratory Practices is a practical solution to prevent artificial inflation of $\sigma_E^2$. If values of an endpoint are difficult to quantify analytically, a regulatory authority could require producers to take replicates so as to improve precision. The average of the analytical replicates could then be treated as the response variable in the linear mixed model.

Table 4. (Cont.)

| Group | Analyte | $\sigma_R^2 \times 10^4$* | $\sigma_S^2 \times 10^4$* | $\sigma_{B(S)}^2 \times 10^4$* | $\sigma_E^2 \times 10^4$* | CVM (%) |
|---|---|---|---|---|---|---|
| Secondary metabolite and anti-nutrient | Ferulic acid | 133·1 | 42·6 | 5·44 | 81·06 | 16 |
| | p-coumaric acid | 567·8 | 60·3 | 0·346 | 113·03 | 28 |
| | Phytic acid | 0 | 2·68 | 95·07 | 319·19 | 21 |
| | Raffinose | 427·9 | 177·92 | 0 | 283·93 | 31 |

CVM, $\sqrt{\exp(\sigma_R^2 + \sigma_S^2 + \sigma_{B(S)}^2 + \sigma_E^2) - 1}$.

* Variance components in Model (2) are multiplied by $10^4$ for ease in reading.
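The CVM column can be recomputed from the four variance components with the footnote formula. The short sketch below does this for two rows of Table 4 (alanine and folic acid); the function name `cv_marginal_percent` is hypothetical and introduced only for this illustration.

```python
from math import exp, sqrt


def cv_marginal_percent(var_r, var_s, var_bs, var_e):
    """CVM in percent: sqrt(exp(total marginal variance on the log scale) - 1) * 100."""
    total = var_r + var_s + var_bs + var_e
    return 100.0 * sqrt(exp(total) - 1.0)


# Table 4 entries rescaled from the tabulated (value x 10^4) units.
alanine = cv_marginal_percent(121.5e-4, 17.57e-4, 11.37e-4, 19.74e-4)
folic_acid = cv_marginal_percent(134.1e-4, 0.0, 13.77e-4, 796.94e-4)

print(round(alanine), round(folic_acid))
```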


The SAE-C and DWE-C criteria compare GMO with references according to their conditional distributions at the site and block level. They are meant to evaluate conditional equivalence. The SAE-M and DWE-M criteria judge GMO against references on the basis of their marginal distributions across sites and blocks. They are meant to evaluate marginal equivalence which is less stringent than conditional equivalence. The question now becomes what class of equivalence should be used in evaluating GMO safety. Note that forage and grain harvested from different locations is usually bulked for shipment and/or pooled for processing before release to market. From a statistical standpoint, consumers are exposed over time to a blend of the GM product coming from various sites. This diminishes the necessity for demanding equivalence within individual sites. Marginal equivalence across sites and blocks should be sufficient. Empirical evidence consistently demonstrates the trivial effect of G × E in compositional data. Accordingly, there is little biological justification for insisting upon G × E in the safety evaluation of a GM product for food and feed.

Fig. 3. Values of $\exp\!\left(\sqrt{\Delta(\cdot)}\right)$, the maximum allowable GMO-over-reference ratio of geometric means, specified by the SAE and DWE criteria. □, SAE-S/DWE-S; ♦, DWE-C; ◊, SAE-C; ●, DWE-M; ○, SAE-M. Dashed line represents the AE maximum allowable ratio of either 1·11 or 1·25. Values of variance components are taken from Table 4.

Although marginal equivalence is more relevant to GMO safety, the conditional equivalence criteria turn out to be useful. It is observed that producers usually enrol eight to ten sites in their field experiments. Estimating the site-to-site variation, $\sigma_S^2$, under such a limited sample size is fraught with difficulty and hinders the application of the SAE-M and DWE-M criteria. The situation with $\sigma_R^2$ is quite different, though. Some producers have met the minimum requirement of the EFSA guidance exactly by including only six references per study. Other producers have used as many as 22 references. Kang & Vahl (2014) demonstrated that super equivalence could be established with good power when a reasonably large number of references are allocated to sites in an incomplete partially balanced fashion. Upon further validation of their statistical method, conditional equivalence could be established from practical designs. Considering the limited number of sites in current field experiments and the logical dominance of conditional equivalence over marginal equivalence, the SAE-C and DWE-C criteria appear to be more practical and conservative choices than the SAE-M and DWE-M criteria.

A recent symposium on the composition analysis of GM crops proclaimed that ‘composition analysis is meant to look for unintended effects and not to measure natural variation’ (Hoekenga et al. 2013). This statement is at odds with the concept of substantial equivalence where a reliable quantification of natural variation is crucial. It comes as a reminder that the experimental design of a study should concentrate on addressing specific scientific questions. If a study is intended to establish marginal equivalence, it must contain enough sites to adequately estimate $\sigma_S^2$. The position of marginal equivalence at the low end of the hierarchy also provides an incentive for producers to furnish their studies with more sites. This is an immense improvement over the proof-of-difference approach, where producers collecting more data are unfairly burdened with more significant, but spurious, differences to explain away. Testing for equivalence clearly offers the best of both worlds: it grants regulators firm control over consumer risk and encourages producers to conduct scientifically rigorous studies. Another relevant question is whether to enforce Model (3) in order to account for potential G × E effects. Because equivalence criteria entail estimation of variance components from small samples, the degree of complexity in statistical analysis is tremendously amplified as more random effects enter the model (e.g. Öfversten 1993; Christensen 1996; Burch & Iyer 1997; Burch 2007). This is in contrast to traditional estimation of fixed effects where statistical methods are more forgiving of an overly parameterized model. The extra cost incurred by including trivial random effects may not be worthwhile. Open access to real data will add further clarity. Since there has been no evidence justifying the importance of G × E, Model (2) and its associated equivalence criteria are preferable.

The SAE and DWE criteria under Model (2) pose similar degrees of technical challenge because they both require inference on means as well as variances. The SAE criteria are consistent with the common belief that percentiles of the reference distribution define the ELs for the GMO mean. They also follow the prevailing trend of scaling in drug studies. A DWE criterion is more of a mathematical construct. For in vivo drug studies, it has been labelled as a theoretical solution to a theoretical problem (Patterson 2001; Schall & Endrenyi 2010). Wellek (2010, p. 312) comments that ‘…statistical BE [bioequivalence] assessment has developed into a field where more mathematically sophisticated solutions exist than problems worth the effort entailed in deriving solutions.’ Yet for assessing GMO equivalence, the underpinning theory for DWE yields a more interpretable formulation. The DWE criteria resonate with the common statistical philosophy of controlling the KL divergence between distributions and are informative even if the data are transformed by monotone functions other than log. Because their ELs assign more weight to the extra variance of the reference distributions, skeptics of GMO may be more willing to accept DWE than SAE. The preference for one type of criteria over the other should be determined through an open dialogue between the ag-biotech industry and regulatory authorities. This process, in part, should be based on an abundance of empirical evidence provided by real-world data.

The present paper intends to challenge traditional thinking and stimulate further research in evaluating GMO safety. Both EFSA (2010) and Van der Voet et al. (2011) assess equivalence via a two-step procedure that replaces the fixed EL in the AE criterion with a data-dependent statistic. Appendix 3 demonstrates the series of issues associated with their approach. Interestingly, similar mistakes have occurred in regulatory applications of bioequivalence and non-inferiority testing (for comments and corrections, see pp. 1866–1867 of Boddy et al. 1995; pp. 293–294 of Berger & Hsu 1996; Hung et al. 2003). It is hoped that the present work provides a solid statistical framework in support of the concept of substantial equivalence. Further progress in GMO risk assessment is best made through judicious application of statistical rigour and a transparent public debate centred on real-world data. Given an opportunity, statisticians are willing and able to assist regulators in the protection of consumer safety while providing producers scientifically sound yet economical decision rules.

CONCLUSION

Testing for equivalence rather than difference is the appropriate approach for establishing the substantial equivalence of a GM crop to non-GM reference crops with a history of safe use. A necessary first step is to define a reasonable criterion for equivalence. The present work explored criteria generated by the SAE and DWE machineries. Each of the resulting criteria addresses one of three ordered classes of equivalence: super, conditional and marginal. Marginal equivalence adheres most closely to the concept of substantial equivalence. Because conditional equivalence logically implies marginal equivalence and is practically quantifiable from current field studies, the present paper recommends conditional equivalence to assess GMO while encouraging producers to improve their design to enable testing marginal equivalence in the future. Contrary to concerns of the ag-biotech industry, the linear mixed model currently implemented by EFSA was shown to be adequate for assessing equivalence despite its lack of G × E terms.

We gratefully acknowledge the comments from the three anonymous referees. Their input was immensely helpful for improving the clarity of the manuscript.

REFERENCES

ANDERSON, S. & HAUCK, W. W. (1990). Consideration of individual bioequivalence. Journal of Pharmacokinetics and Biopharmaceutics 18, 259–273.

BERGER, R. L. (1982). Multiparameter hypothesis testing and acceptance sampling. Technometrics 24, 295–300.

BERGER, R. L. & HSU, J. C. (1996). Bioequivalence trials, intersection-union tests and equivalence confidence sets. Statistical Science 11, 283–302.

BERMAN, K. H., HARRIGAN, G. G., RIORDAN, S. G., NEMETH, M. A., HANSON, C., SMITH, M., SORBET, R., ZHU, E. & RIDLEY, W. P. (2010). Compositions of forage and seed from second-generation glyphosate-tolerant soybean MON 89788 and insect-protected soybean MON 87701 from Brazil are equivalent to those of conventional soybean (Glycine max). Journal of Agricultural and Food Chemistry 58, 6270–6276.

BODDY, A. W., SNIKERIS, F. C., KRINGLE, R. O., WEI, G. C. G., OPPERMANN, J. A. & MIDHA, K. K. (1995). An approach for widening the bioequivalence acceptance limits in the case of highly variable drugs. Pharmaceutical Research 12, 1865–1868.

BRINK, K., CHUI, C. F., CRESSMAN, R. F., GARCIA, P., HENDERSON, N., HONG, B., MAXWELL, C. A., MEYER, K., MICKELSON, J., STECCA, K. L., TYREE, C. W., WEBER, N., ZENG, W. Q. & ZHONG, C. X. (2014). Molecular characterization, compositional analysis, and germination evaluation of a high-oleic soybean generated by the suppression of FAD2-1 expression. Crop Science 54, 2160–2174.

BROWN, L. D., CASELLA, G. & HWANG, J. T. G. (1995). Optimal confidence sets, bioequivalence, and the limaçon of Pascal. Journal of the American Statistical Association 90, 880–889.

BROWN, L. D., HWANG, J. T. G. & MUNK, A. (1997). An unbiased test for the bioequivalence problem. Annals of Statistics 25, 2345–2367.

BURCH, B. D. (2007). Generalized confidence intervals for proportions of total variance in mixed linear models. Journal of Statistical Planning and Inference 137, 2394–2404.

BURCH, B. D. & IYER, H. K. (1997). Exact confidence intervals for a variance ratio (or heritability) in a mixed linear model. Biometrics 53, 1318–1333.

CARRASCO, J. L. & JOVER, L. (2003). Assessing individual bioequivalence using the structural equation model. Statistics in Medicine 22, 901–912.

CASELLA, G. & BERGER, R. L. (2002). Statistical Inference, 2nd edn. Pacific Grove, CA, USA: Duxbury.

CHEN, M. L., PATNAIK, R., HAUCK, W. W., SCHUIRMANN, D. J., HYSLOP, T. & WILLIAMS, R. (2000). An individual bioequivalence criterion: regulatory considerations. Statistics in Medicine 19, 2821–2842.

CHERVONEVA, I., HYSLOP, T. & HAUCK, W. W. (2007). A multivariate test for population bioequivalence. Statistics in Medicine 26, 1208–1223.

CHINCHILLI, V. M. (1996). The assessment of individual and population bioequivalence. Journal of Biopharmaceutical Statistics 6, 1–14.

CHIU, S. T., TSAI, P. Y. & LIU, J. P. (2010). Statistical evaluation of non-profile analyses for the in vitro bioequivalence. Journal of Chemometrics 24, 617–625.

CHOW, S. C., SHAO, J. & WANG, H. S. (2003a). Statistical tests for population bioequivalence. Statistica Sinica 13, 539–554.

CHOW, S. C., SHAO, J. & WANG, H. S. (2003b). In vitro bioequivalence testing. Statistics in Medicine 22, 55–68.

CHOW, S. C., HSIEH, T. C., CHI, E. & YANG, J. (2010). A comparison of moment-based and probability-based criteria for assessment of follow-on biologics. Journal of Biopharmaceutical Statistics 20, 31–45.

CHOW, S. C., ENDRENYI, L. & LACHENBRUCH, P. A. (2013). Comments on the FDA draft guidance on biosimilar products. Statistics in Medicine 32, 364–369.

CHRISTENSEN, R. (1996). Exact tests for variance components. Biometrics 52, 309–314.

Codex Alimentarius Commission (2009). Foods Derived from Modern Biotechnology, 2nd edn. Joint FAO/WHO Food Standards Programme. Rome: FAO & WHO. Available from: http://www.fao.org/docrep/011/a1554e/a1554e00.htm (verified 2 January 2015).

DAVIT, B. M., CHEN, M. L., CONNER, D. P., HAIDAR, S. H., KIM, S., LEE, C. H., LIONBERGER, R. A., MAKHLOUF, F. T., NWAKAMA, P. E., PATEL, D. T., SCHUIRMANN, D. J. & YU, L. X. (2012). Implementation of a reference-scaled average bioequivalence approach for highly variable generic drug products by the US Food and Drug Administration. The AAPS Journal 14, 915–924.

Dow AgroSciences (2011). Petition for Determination of Nonregulated Status for Herbicide Tolerant DAS-444Ø6-6 Soybean. USDA-APHIS Petition No. 11-234-01p. Indianapolis, IN, USA: Dow AgroSciences LLC. Available from: http://www.aphis.usda.gov/brs/aphisdocs/11_23401p.pdf (accessed January 2015).

Dow AgroSciences (2012). Petition for Determination of Nonregulated Status for Insect-Resistant DAS-81419-2 Soybean. USDA-APHIS Petition No. 12-272-01p. Indianapolis, IN, USA: Dow AgroSciences LLC. Available from: http://www.aphis.usda.gov/brs/aphisdocs/12_27201p.pdf (accessed January 2015).

Dow AgroSciences (2013). Petition for Determination of Nonregulated Status for Herbicide Tolerant DAS-8191Ø-7 Cotton. USDA-APHIS Petition No. 13-262-01p. Indianapolis, IN, USA: Dow AgroSciences LLC. Available from: http://www.aphis.usda.gov/biotechnology/petitions_table_pending.shtml (accessed 2 January 2015).

DRAGALIN, V., FEDOROV, V., PATTERSON, S. & JONES, B. (2003). Kullback–Leibler divergence for evaluating bioequivalence. Statistics in Medicine 22, 913–930.

EFSA (2010). Scientific opinion on statistical considerations for the safety evaluation of GMOs. EFSA panel on genetically modified organisms (GMO). EFSA Journal 8, 1250. doi:10.2903/j.efsa.2010.1250. Available from: http://www.efsa.europa.eu/en/efsajournal/pub/1250.htm (accessed 2 January 2015).

EMA (2010). Guideline on the Investigation of Bioequivalence. CPMP/EWP/QWP/1401/98 Rev. 1/ Corr. London: Committee for Medicinal Products for Human Use, European Medicines Agency. Available from: http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500003011.pdf (accessed 2 January 2015).

ENDRENYI, L. & MIDHA, K. K. (1998). Individual bioequivalence – has its time come? European Journal of Pharmaceutical Sciences 6, 271–277.

ENDRENYI, L., TABACK, N. & TOTHFALUSI, L. (2000). Properties of the estimated variance component for subject-by-formulation interaction in studies of individual bioequivalence. Statistics in Medicine 19, 2867–2878.

ESINHART, J. D. & CHINCHILLI, V. M. (1994). Extension to the use of tolerance intervals for the assessment of individual bioequivalence. Journal of Biopharmaceutical Statistics 4, 39–52.

FAO/WHO (1996). Joint FAO/WHO Expert Consultation on Biotechnology and Food Safety. FAO Food and Nutrition Paper No. 61. Rome: FAO. Available from: http://www.fao.org/ag/agn/food/pdf/biotechnology.pdf (accessed 2 January 2015).

FDA (2001). Guidance for Industry: Statistical Approaches to Establishing Bioequivalence. Silver Spring, MD, USA: US Department of Health & Human Services, FDA. Available from: http://www.fda.gov/downloads/Drugs/Guidances/ucm070244.pdf (accessed 2 January 2015).

FDA (2003a). Guidance for Industry: Bioavailability and Bioequivalence Studies for Nasal Aerosols and Nasal Sprays for Local Action. Silver Spring, MD, USA: US Department of Health & Human Services, FDA. Available from: http://www.fda.gov/OHRMS/DOCKETS/98fr/99d-1738-gdl0002.pdf (accessed 2 January 2015).

FDA (2003b). Guidance for Industry: Bioavailability and Bioequivalence Studies for Orally Administered Drug Products – General Considerations. Silver Spring, MD, USA: US Department of Health & Human Services, FDA. Available from: http://www.fda.gov/downloads/Drugs/.../Guidances/ucm070124.pdf (accessed 2 January 2015).

FDA (2003c). Statistical Information from the June 1999 Draft Guidance and Statistical Information for in vitro Bioequivalence Data Posted on August 18, 1999. Silver Spring, MD, USA: US Department of Health & Human Services, FDA. Available from: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm070118.pdf (accessed 2 January 2015).

FREITAG, G., CZADO, C. & MUNK, A. (2007). A nonparametric test for similarity of marginals – with applications to the assessment of population bioequivalence. Journal of Statistical Planning and Inference 137, 697–711.

GOULD, A. L. (2000). A practical approach for evaluating population and individual bioequivalence. Statistics in Medicine 19, 2721–2740.

GRAYBILL, F. A. & WANG, C. M. (1980). Confidence intervals on nonnegative linear combinations of variances. Journal of the American Statistical Association 75, 869–873.

HAIDAR, S. H., MAKHLOUF, F., SCHUIRMANN, D. J., HYSLOP, T., DAVIT, B., CONNER, D. & YU, L. X. (2008). Evaluation of a scaling approach for the bioequivalence of highly variable drugs. The AAPS Journal 10, 450–454.

HARRIGAN, G. G., STORK, L. G., RIORDAN, S. G., REYNOLDS, T. L., RIDLEY, W. P., MASUCCI, J. D., MACISAAC, S., HALLS, S. C., ORTH, R., SMITH, R. G., WEN, L., BROWN, W. E., WELSCH, M., RILEY, R., MCFARLAND, D., PANDRAVADA, A. & GLENN, K. C. (2007). Impact of genetics and environment on nutritional and metabolite components of maize grain. Journal of Agricultural and Food Chemistry 55, 6177–6185.

HARRIGAN, G. G., GLENN, K. C. & RIDLEY, W. P. (2010). Assessing the natural variability in crop composition. Regulatory Toxicology and Pharmacology 58 (Suppl. 1), S13–S20.

HARRIGAN, G. G., CULLER, A. H., CULLER, M., BREEZE, M. L., BERMAN, K. H., HALLS, S. C. & HARRISON, J. M. (2013). Investigation of biochemical diversity in a soybean lineage representing 35 years of breeding. Journal of Agricultural and Food Chemistry 61, 10807–10815.

HARRISON, J. M., HOWARD, D., MALVEN, M., HALLS, S. C., CULLER, A. H., HARRIGAN, G. G. & WOLFINGER, R. D. (2013). Principal variance component analysis of crop composition data: a case study on herbicide-tolerant cotton. Journal of Agricultural and Food Chemistry 61, 6412–6422.

HAUCK, W. W. & ANDERSON, S. (1994). Measuring switchability and prescribability: when is average bioequivalence sufficient? Journal of Pharmacokinetics and Biopharmaceutics 22, 551–564.

HAUCK, W. W., CHEN, M. L., HYSLOP, T., PATNAIK, R., SCHUIRMANN, D. & WILLIAMS, R. (1996). Mean difference vs. variability reduction: tradeoffs in aggregate measures for individual bioequivalence. International Journal of Clinical Pharmacology and Therapeutics 34, 535–541.

HAUCK, W. W., BOIS, F. Y., HYSLOP, T., GEE, L. & ANDERSON, S. (1997). A parametric approach to population bioequivalence. Statistics in Medicine 16, 441–454.

HERMAN, R. A., FAST, B. J., JOHNSON, T. Y., SABBATINI, J. & RUDGERS, G. W. (2013). Compositional safety of herbicide-tolerant DAS-8191Ø-7 cotton. Journal of Agricultural and Food Chemistry 61, 11683–11692.

HOEKENGA, O. A., SRINIVASAN, J., BARRY, G. & BARTHOLOMAEUS, A. (2013). Compositional analysis of genetically modified (GM) crops: key issues and future needs. Journal of Agricultural and Food Chemistry 61, 8248–8253.

HOLDER, D. J. & HSUAN, F. (1993). Moment-based criteria for determining bioequivalence. Biometrika 80, 835–846.

HONG, B., FISHER, T. L., SULT, T. S., MAXWELL, C. A., MICKELSON, J. A., KISHINO, H. & LOCKE, M. E. H. (2014). Model-based tolerance intervals derived from cumulative historical composition data: application for substantial equivalence assessment of a genetically modified crop. Journal of Agricultural and Food Chemistry 62, 9916–9926.

HOTHORN, L. A. & OBERDOERFER, R. (2006). Statistical analysis used in the nutritional assessment of novel food using the proof of safety. Regulatory Toxicology and Pharmacology 44, 125–135.

HOWE, W. G. (1974). Approximate confidence limits on the mean of X + Y where X and Y are two tabled independent random variables. Journal of the American Statistical Association 69, 789–794.

HSUAN, F. C. (2000). Some statistical considerations on the FDA draft guidance for individual bioequivalence. Statistics in Medicine 19, 2879–2884.

HUNG, H. M. J., WANG, S. J., TSONG, Y., LAWRENCE, J. & O’NEILL, R. T. (2003). Some fundamental issues with non-inferiority testing in active controlled trials. Statistics in Medicine 22, 213–225.

HYSLOP, T., HSUAN, F. & HOLDER, D. J. (2000). A small sample confidence interval approach to assess individual bioequivalence. Statistics in Medicine 19, 2885–2897.

KANG, Q. & VAHL, C. I. (2014). Statistical analysis in the safety evaluation of genetically modified crops: equivalence tests. Crop Science 54, 2183–2200.

KANG, S. H. & CHOW, S. C. (2013). Statistical assessment of biosimilarity based on relative distance between follow-on biologics. Statistics in Medicine 32, 382–392.

KIMANANI, E. K. (2000). Definition of individual bioequivalence: occasion-to-occasion versus mean switchability. Statistics in Medicine 19, 2797–2810.

KÖNIG, A., COCKBURN, A., CREVEL, R. W. R., DEBRUYNE, E., GRAFSTROEM, R., HAMMERLING, U., KIMBER, I., KNUDSEN, I., KUIPER, H. A., PEIJNENBURG, A. A. C. M., PENNINKS, A. H., POULSEN, M., SCHAUZU, M. & WAL, J. M. (2004). Assessment of the safety of foods derived from genetically modified (GM) crops. Food and Chemical Toxicology 42, 1047–1088.

KRISHNAMOORTHY, K. & MATHEW, T. (2004). One-sided tolerance limits in balanced and unbalanced one-way random models based on generalized confidence intervals. Technometrics 46, 44–52.

KRISHNAMOORTHY, K. & MATHEW, T. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. New York: Wiley.

LEE, Y. H., SHAO, J. & CHOW, S. C. (2004). Modified large-sample confidence intervals for linear combinations of variance components: extension, theory, and application. Journal of the American Statistical Association 99, 467–478.

LEHMANN, E. L. & ROMANO, J. P. (2005). Testing Statistical Hypotheses, 3rd edn. New York: Springer.

LEPPING, M. D., HERMAN, R. A. & POTTS, B. L. (2013). Compositional equivalence of DAS-444Ø6-6 (AAD-12 + 2mEPSPS + PAT) herbicide-tolerant soybean and non-transgenic soybean. Journal of Agricultural and Food Chemistry 61, 11180–11190.

LIAO, C. T., LIN, T. Y. & IYER, H. K. (2005). One- and two-sided tolerance intervals for general balanced mixed models and unbalanced one-way random models. Technometrics 47, 323–335.

LIU, J. P. & CHOW, S. C. (1997). A two one-sided tests procedure for assessment of individual bioequivalence. Journal of Biopharmaceutical Statistics 7, 49–61.

LUNDRY, D. R., BURNS, J. A., NEMETH, M. A. & RIORDAN, S. G. (2013). Composition of grain and forage from insect-protected and herbicide-tolerant corn, MON 89034 × TC 1507 × MON 88017 × DAS-59122-7 (SmartStax), is equivalent to that of conventional corn (Zea mays L.). Journal of Agricultural and Food Chemistry 61, 1991–1998.

MCNALLY, R. J., IYER, H. & MATHEW, T. (2003). Tests for individual and population bioequivalence based on generalized p-values. Statistics in Medicine 22, 31–53.

MIDHA, K. K., RAWSON, M. J. & HUBBARD, J. W. (1997). Individual and average bioequivalence of highly variable drugs and drug products. Journal of Pharmaceutical Sciences 86, 1193–1197.

Monsanto (2011). Petition for the Determination of Nonregulated Status for MON 87712 Soybean. USDA-APHIS Petition No. 11-202-01p. St Louis, MO, USA: Monsanto. Available from: http://www.aphis.usda.gov/brs/aphisdocs/11_20201p.pdf (accessed January 2015).

Monsanto (2012). Petition for Determination of Nonregulated Status for Dicamba and Glufosinate-tolerant Cotton MON 887Ø1. USDA-APHIS Petition No. 12-185-01p. St Louis, MO, USA: Monsanto. Available from: http://www.aphis.usda.gov/brs/aphisdocs/12_18501p.pdf (accessed January 2015).

Monsanto (2013). Petition for Determination of Nonregulated Status for Corn Rootworm Protected and Glyphosate Tolerant MON 87411 Maize. USDA-APHIS Petition No. 13-290-01p. St Louis, MO, USA: Monsanto. Available from: http://www.aphis.usda.gov/brs/aphisdocs/13_29001p.pdf (accessed 2 January 2015).

MUNK, A. (1996). Equivalence and interval testing for Lehmann’s alternative. Journal of the American Statistical Association 91, 1187–1196.

MUNK, A. & PFLÜGER, R. (1999). 1-α equivariant confidence rules for convex alternatives are α/2-level tests – with applications to the multivariate assessment of bioequivalence. Journal of the American Statistical Association 94, 1311–1319.

OBERDOERFER, R. B., SHILLITO, R. D., BEUCKELEER, M. D. & MITTEN, D. H. (2005). Rice (Oryza sativa L.) containing the bar gene is compositionally equivalent to the non-transgenic counterpart. Journal of Agricultural and Food Chemistry 53, 1457–1465.

OECD (1993). Safety Evaluation of Foods Derived by Modern Biotechnology: Concepts and Principles. Paris, France: OECD. Available from: http://dbtbiosafety.nic.in/guideline/OACD/Concepts_and_Principles_1993.pdf (accessed 2 January 2015).

ÖFVERSTEN, J. (1993). Exact tests for variance components in unbalanced mixed linear models. Biometrics 49, 45–57.

PATTERSON, S. (2001). A review of the development of biostatistical design and analysis techniques for assessing in vivo bioequivalence: part two. Indian Journal of Pharmaceutical Sciences 63, 169–186.

PAWITAN, Y. (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. New York: Oxford University Press.

PRICE, W. D. & UNDERHILL, L. (2013). Application of laws, policies, and guidance from the United States and Canada to the regulation of food and feed derived from genetically modified crops: interpretation of composition data. Journal of Agricultural and Food Chemistry 61, 8349–8355.

PRIVALLE, L. S., GILLIKIN, N. & WANDELT, C. (2013). Bringing a transgenic crop to market: where compositional analysis fits. Journal of Agricultural and Food Chemistry 61, 8260–8266.

QUAN, H., BOLOGNESE, J. & YUAN, W. Y. (2001). Assessment of equivalence on multiple endpoints. Statistics in Medicine 20, 3159–3173.

QUIROZ, J., TING, N., WEI, G. C. G. & BURDICK, R. K. (2002). Alternative confidence intervals for the assessment of bioequivalence in four-period cross-over designs. Statistics in Medicine 21, 1825–1847.

SCHALL, R. (1995a). A unified view of individual, population and average bioequivalence. In Bio-International 2. Bioavailability, Bioequivalence and Pharmacokinetic Studies (Eds H. H. Blume & K. K. Midha), pp. 91–106. Stuttgart, Germany: Medpharm Scientific Publication.

SCHALL, R. (1995b). Assessment of individual and population bioequivalence using the probability that bioavailabilities are similar. Biometrics 51, 615–626.

SCHALL, R. & ENDRENYI, L. (2010). Bioequivalence: tried and tested. Cardiovascular Journal of Africa 21, 69–71.

SCHALL, R. & LUUS, H. G. (1993). On population and individual bioequivalence. Statistics in Medicine 12, 1109–1124.

SCHALL, R. & WILLIAMS, R. L. (1996). Towards a practical strategy for assessing individual bioequivalence. Journal of Pharmacokinetics and Biopharmaceutics 24, 133–149.

SCHUIRMANN, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15, 657–680.

SHAO, J., CHOW, S. C. & WANG, B. (2000). The bootstrap procedure in individual bioequivalence. Statistics in Medicine 19, 2741–2754.

SHEINER, L. B. (1992). Bioequivalence revisited. Statistics in Medicine 11, 1777–1788.

SHEWRY, P. R., HAWKESFORD, M. J., PIIRONEN, V., LAMPI, A. M., GEBRUERS, K., BOROS, D., ANDERSSON, A. A. M., ÅMAN, P., RAKSZEGI, M., BEDO, Z. & WARD, J. L. (2013). Natural variation in grain composition of wheat and related cereals. Journal of Agricultural and Food Chemistry 61, 8295–8303.

Syngenta Seeds & Bayer CropScience AG (2012). Revised Petition for Determination of Nonregulated Status for Herbicide-tolerant Event SYHT0H2 Soybean. USDA-APHIS Petition No. 12-215-01p. Research Triangle Park, NC, USA: Syngenta Seeds, Inc. & Bayer CropScience AG. Available from: http://www.aphis.usda.gov/brs/aphisdocs/12_21501p.pdf (accessed 2 January 2015).

TING, N. T., BURDICK, R. K., GRAYBILL, F. A., JEYARATNAM, S. & LU, T. F. C. (1990). Confidence interval on linear combinations of variance components that are unrestricted in sign. Journal of Statistical Computation and Simulation 35, 135–143.

TOUTAIN, P. L. & KORITZ, G. D. (1997). Veterinary drug bioequivalence determination. Journal of Veterinary Pharmacology and Therapeutics 20, 79–90.

TSUI, K. W. & WEERAHANDI, S. (1989). Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. Journal of the American Statistical Association 84, 602–607.

VAN DER VOET, H., PERRY, J. N., AMZAL, B. & PAOLETTI, C. (2011). A statistical assessment of differences and equivalences between genetically modified and reference plant varieties. BMC Biotechnology 11, 15. doi:10.1186/1472-6750-11-15.

VAN DER VOET, H., PERRY, J. N., AMZAL, B. & PAOLETTI, C. (2012). Response to comments on the paper ‘A statistical assessment of differences and equivalences between genetically modified and reference plant varieties’ by van der Voet et al. 2011. BMC Biotechnology 12, 13. doi:10.1186/1472-6750-12-13.

VENKATESH, T. V., BREEZE, M. L., LIU, K., HARRIGAN, G. G. & CULLER, A. H. (2014). Compositional analysis of grain and forage from MON87427, an inducible male sterile and tissue selective glyphosate-tolerant maize product for hybrid seed production. Journal of Agricultural and Food Chemistry 62, 1964–1973.

VOURINEN, J. & TURUNEN, J. (1996). A three-step procedure for assessing bioequivalence in the general mixed model framework. Statistics in Medicine 15, 2635–2655.

WANG, W. Z., HWANG, J. T. G. & DASGUPTA, A. (1999). Statistical tests for multivariate bioequivalence. Biometrika 86, 395–402.

WARD, K. J., NEMETH, M. A., BROWNIE, C., HONG, B., HERMAN, R. A. & OBERDOERFER, R. (2012). Comments on the paper ‘A statistical assessment of differences and equivalences between genetically modified and reference plant varieties’ by van der Voet et al. (2011). BMC Biotechnology 12, 13. doi:10.1186/1472-6750-12-13.

WEERAHANDI, S. (1991). Testing variance components in mixed models with generalized p values. Journal of the American Statistical Association 86, 151–153.

WEERAHANDI, S. (1993). Generalized confidence intervals. Journal of the American Statistical Association 88, 899–905.

WELLEK, S. (1996). A new approach to equivalence assessment in standard comparative bioavailability trials by means of the Mann–Whitney statistic. Biometrical Journal 38, 695–710.

WELLEK, S. (2000). On a reasonable disaggregate criterion of population bioequivalence admitting of resampling-free testing procedures. Statistics in Medicine 19, 2755–2767.

WELLEK, S. (2010). Testing Statistical Hypotheses of Equivalence and Noninferiority, 2nd edn. Boca Raton, FL, USA: Chapman and Hall/CRC.

ZARIFFA, N. M. D., PATTERSON, S. D., BOYLE, D. & HYNECK, M. (2000). Case studies, practical issues and observations on population and individual bioequivalence. Statistics in Medicine 19, 2811–2820.

ZHOU, J., BERMAN, K. H., BREEZE, M. L., NEMETH, M. A., OLIVEIRA, W. S., BRAGA, D. P. V., BERGER, G. U. & HARRIGAN, G. G. (2011). Compositional variability in conventional and glyphosate-tolerant soybean (Glycine max L.) varieties grown in different regions in Brazil. Journal of Agricultural and Food Chemistry 59, 11652–11656.

APPENDIX 1: STRATEGIES FOR CONSTRUCTING EQUIVALENCE CRITERIA

Over the past two decades, a number of equivalence criteria have been proposed to assess bioequivalence. Strategies for their construction could be cross-classified according to (i) probability- v. moment-based, (ii) constant-, reference-, v. mixed-scaled, (iii) symmetric v. non-symmetric, (iv) aggregate v. disaggregate. Reviews of these strategies are available from Schall (1995a), Schall & Williams (1996) and Chen et al. (2000). Briefly, probability-based criteria are defined in terms of the probability of a random event. For instance, Anderson & Hauck (1990) assessed the conditional equivalence of a drug based on the proportion of individuals in the cross-over design whose test-over-reference ratios of responses fall into a pre-set interval around unity. Meanwhile, Wellek (1996) and Munk (1996) evaluated bioequivalence based on the proportion of individuals whose test-over-reference ratios are less than unity. Moment-based criteria are constructed directly from the means and variances in the underlying linear mixed model. They were developed in parallel to probability-based criteria. There is a close mathematical connection between probability- and moment-based criteria (Holder & Hsuan 1993; Hauck & Anderson 1994). Because the dependence on mixed model parameters is less straightforward in the probability-based criteria, FDA (2001) adopts the moment-based criteria for assessing individual and population bioequivalences. The SAE and DWE criteria introduced here are moment-based, too.

Constant-scaled criteria compare the difference between test and reference products to a fixed EL. A classic example is the AE criterion. Reference-scaled criteria are created under the ideology that reference variability should be used to standardize the ELs. FDA (2001) recommends the mixed-scaling approach, which utilizes either constant scaling or reference scaling, depending on whether the reference variability is less than an explicit value. This approach circumvents the narrowing effect of scaling for drugs with low variability but a wide therapeutic window. Regarding GMO safety evaluation, the absence of a one-size-fits-all EL for testing AE prevents the application of mixed scaling.
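The switching logic of mixed scaling can be sketched in a few lines. This is an illustration only: the function name, the switch point `sigma_0` and the constant limit are hypothetical values chosen for the example and are not taken from FDA (2001) or from this paper.

```python
import math


def mixed_scaled_limit(sigma_ref, sigma_0, constant_limit):
    """Upper equivalence limit under a simple mixed-scaling rule.

    sigma_ref      : reference standard deviation
    sigma_0        : hypothetical switch point between the two regimes
    constant_limit : fixed limit used when reference variability is low
    """
    if sigma_ref <= sigma_0:
        # Low reference variability: keep the constant limit, so scaling
        # cannot narrow the acceptance region.
        return constant_limit
    # High reference variability: widen the limit in proportion to sigma_ref.
    # The two branches agree at sigma_ref == sigma_0.
    return constant_limit * sigma_ref / sigma_0


limit = math.log(1.25)  # hypothetical constant limit on the log scale
for s in (0.10, 0.25, 0.50):
    print(s, round(mixed_scaled_limit(s, sigma_0=0.25, constant_limit=limit), 4))
```

The rule presupposes that a sensible constant limit exists, which is exactly the ingredient the text identifies as missing for GMO safety evaluation.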

An equivalence criterion with the property of symmetry, i.e. D(T, R) = D(R, T), might be desirable for comparing two chemically equivalent drugs. Dragalin et al. (2003) proposed a modified version of KL divergence in order to attain symmetry in bioequivalence criteria. Nonetheless, the pursuit of symmetry triggers controversies in regulatory policies, namely, commutativity and transitivity. Note that GMO and references belong to different genotype groups and therefore should not be handled equally. A statistical test in this regard is expected to prove GMO is equivalent to references, not the other way around. There has also been no indication that an approved GMO will join commercial non-GM crops to serve as a reference for assessing other GMOs. Commutativity and transitivity are then not necessary when choosing equivalence criteria for GMO. The equivalence measures in Hypotheses (5–9) are non-symmetric.
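The lack of symmetry in a divergence-type measure is easy to see numerically. The sketch below uses the standard KL divergence between two normal distributions and one common symmetrisation, the sum of the two directed divergences. It is not claimed to reproduce the exact form used by Dragalin et al. (2003) or the measures in Hypotheses (5–9); the parameter values are hypothetical.

```python
import math


def kl_normal(mu_p, sd_p, mu_q, sd_q):
    """KL divergence KL(P || Q) for P = N(mu_p, sd_p^2), Q = N(mu_q, sd_q^2)."""
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
            - 0.5)


def symmetrised_kl(mu_p, sd_p, mu_q, sd_q):
    """Sum of the two directed divergences; symmetric in P and Q by construction."""
    return kl_normal(mu_p, sd_p, mu_q, sd_q) + kl_normal(mu_q, sd_q, mu_p, sd_p)


# Hypothetical test (T) and reference (R) distributions; R is more variable.
t = (0.1, 0.2)
r = (0.0, 0.4)
print("D(T, R) =", kl_normal(*t, *r))
print("D(R, T) =", kl_normal(*r, *t))
print("symmetrised =", symmetrised_kl(*t, *r), symmetrised_kl(*r, *t))
```

The two directed values differ whenever the variances differ, which is the situation expected when a single GMO line is compared with a collection of reference varieties.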

For each of the SAE and DWE criteria discussed here, means and variances are aggregated into a single criterion. Disaggregate criteria, in contrast, evaluate means and variances separately. This idea was systematically implemented by Vourinen & Turunen (1996) through a stepwise procedure, where the first step establishes bioequivalence between the means of test and reference drugs, the second step establishes bioequivalence in variances, and the final step assures subject-by-drug interaction is relatively small. Many ensembles of disaggregate criteria have been proposed for evaluating bioequivalence (e.g. Chinchilli 1996; Vourinen & Turunen 1996; Hauck et al. 1997; Gould 2000; Wellek 2000; Carrasco & Jover 2003). These criteria are easy to interpret and avoid the trade-off effect inherent in the aggregate bioequivalence criteria, i.e. the mean difference between two drugs could be intractably offset by their difference in variances (Hauck et al. 1996; Midha et al. 1997). Disaggregate criteria aim at assuring the test product is close to the reference product in each aspect of their distributions, including means, variances and covariance. This requirement is unnecessary for testing GMO because the reference distribution is known beforehand to have more variability than the GMO distribution. Criteria for assessing substantial equivalence are expected to take this additional variability into proper account. As seen from Fig. 3, equivalence criteria introduced here do not overly broaden the ELs for the geometric mean ratio. Concern about the mean-variance tradeoff is then moot. Furthermore, a separate test for interaction is counter-productive given that it is trivial relative to other sources of variation and does not appear to have any practical implications.
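The trade-off in aggregate criteria can be made concrete with a small sketch. The aggregate measure below is written in the spirit of reference-scaled population bioequivalence measures from the bioequivalence literature; it is not one of the SAE or DWE criteria of this paper, and the function names, limits and parameter values are hypothetical.

```python
def aggregate_measure(mu_t, sd_t, mu_r, sd_r):
    """Aggregate moment-based measure: squared mean difference plus the
    difference in variances, scaled by the reference variance."""
    return ((mu_t - mu_r) ** 2 + sd_t ** 2 - sd_r ** 2) / sd_r ** 2


def disaggregate_check(mu_t, sd_t, mu_r, sd_r, mean_limit, sd_ratio_limit):
    """Disaggregate alternative: means and variability are checked separately
    against hypothetical limits, so one cannot offset the other."""
    means_ok = abs(mu_t - mu_r) < mean_limit
    spread_ok = sd_t / sd_r < sd_ratio_limit
    return means_ok and spread_ok


reference = (0.0, 0.5)
identical = (0.0, 0.5)  # same mean and variance as the reference
shifted = (0.3, 0.4)    # mean shifted by 0.3, but smaller variance

print(aggregate_measure(*identical, *reference))
print(aggregate_measure(*shifted, *reference))
print(disaggregate_check(*shifted, *reference, mean_limit=0.2, sd_ratio_limit=1.25))
```

Both configurations give essentially the same aggregate value because the mean shift of 0.3 is cancelled by the reduction in variance (0.3² = 0.5² − 0.4²), whereas the separate check on means fails for the shifted product.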

Most efforts to date have been spent on creating equivalence criteria for univariate distributions. Munk & Pflüger (1999), Wellek (2010, Section 8.2), and Chervoneva et al. (2007) proposed multivariate analogues of AE, SAE and DWE criteria, respectively. These criteria are tailored to either paired observations or two independent samples. Extension to more complex designs is currently unavailable.

APPENDIX 2: STATISTICAL ANALYSIS METHODS

Various statistical methods have been studied alongside every newly proposed criterion with respect to their feasibility in evaluating equivalence under real-world designs. The traditional nonparametric method discussed by Anderson & Hauck (1990) sacrifices efficiency for robustness and overlooks certain important effects in complex experimental designs. It was quickly supplanted by linear mixed models (Esinhart & Chinchilli 1994; Liu & Chow 1997) since the log-normal distribution is deemed appropriate for modelling endpoints in bioequivalence studies. When normality is in doubt or there are unexplainable outliers, nonparametric statistics may still serve as a last resort (a comprehensive review can be found in Chapters 5 and 6 of Wellek 2010; also see Freitag et al. 2007). Within the mixed model framework, the asymptotically unbiased restricted maximum likelihood method tends to overestimate mixed model variance components and has been replaced by the method-of-moments approach, which remains unbiased even for small samples (Chen et al. 2000; Endrenyi et al. 2000; FDA 2001). The versatile bootstrapping approach was formerly used to estimate the individual and population bioequivalence measures (Schall 1995b; Kimanani 2000; Shao et al. 2000; Freitag et al. 2007). Also, Chow et al. (2003a) adopted the delta method for assessing population bioequivalence under a collection of cross-over designs. The modified-large-sample method comes as the third method. It was originally developed to generate approximate confidence intervals for linear combinations of variance components (Howe 1974; Graybill & Wang 1980; Ting et al. 1990). Unlike the bootstrapping approach and the delta method, it is less computationally intensive and has good finite-sample performance (Hyslop et al. 2000; Quiroz et al. 2002; Chow et al. 2003b; Lee et al. 2004). The modified-large-sample method is currently the approach preferred by FDA (Chen et al. 2000; FDA 2001). Because this method relies on frequentist large-sample theory, simulation studies are needed to verify its feasibility for executing GMO equivalence criteria. A method based on generalized inference offers an alternative approach to estimate functions of mixed model parameters (Tsui & Weerahandi 1989; Weerahandi 1991, 1993; Krishnamoorthy & Mathew 2004; Liao et al. 2005; an excellent introduction to generalized inference can be found in Section 1.4 of Krishnamoorthy & Mathew 2009). It has been used by McNally et al. (2003) and Chiu et al. (2010) to execute the individual and population bioequivalence criteria. Recently Kang & Vahl (2014) estimated variance components by method of moments under Model (2) and applied the generalized inference approach to execute the SAE-S criterion in testing GMO. The application of generalized inference in executing the other four equivalence criteria awaits further investigation.
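To make the method-of-moments idea concrete, the following is a minimal sketch for a balanced one-way random-effects layout, which is far simpler than Model (2) and is used here only as an assumed stand-in. The function name and the toy data are hypothetical. The estimates are obtained by equating the between-group and within-group mean squares to their expectations.

```python
from statistics import mean

def anova_variance_components(groups):
    """Method-of-moments (ANOVA) estimates for a balanced one-way
    random-effects layout: y_ij = mu + a_i + e_ij.
    Illustrative only; this is NOT the paper's Model (2)."""
    k = len(groups)          # number of groups (e.g. reference genotypes)
    n = len(groups[0])       # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("balanced data required")
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group mean squares
    msb = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    msw = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g) / (k * (n - 1))
    # E[MSB] = sigma_e^2 + n * sigma_a^2 and E[MSW] = sigma_e^2
    return {"between": (msb - msw) / n, "within": msw}

# Hypothetical toy data: three groups of three observations
estimates = anova_variance_components([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(estimates)
```

Because the between-group estimate is a difference of mean squares, it can turn out negative in small samples; how such cases are handled is a design choice outside this sketch.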

Analyzing multivariate data adds more challenge to equivalence tests. EFSA (2010) recommends the 'informal' procedure that analyses multiple endpoints one at a time and integrates the results afterwards via graphical representation. Statistically, the task of establishing multivariate equivalence has been tackled by constructing simultaneous equivalence intervals (Brown et al. 1995; Berger & Hsu 1996; Wang et al. 1999). It should be borne in mind that multivariate equivalence logically demands equivalence at every endpoint. The overall alternative hypothesis for equivalence is an intersection of alternatives for individual endpoints whereas the overall null hypothesis constitutes a union of individual null hypotheses. By the intersection-union principle (Berger 1982; Berger & Hsu 1996), the type I error rate for the overall test cannot be greater than that of an individual test. Quan et al. (2001) demonstrated that this type I error rate can fall below the nominal level when correlations among endpoints are low. This is the reverse of the multivariate difference test, which entails an inflated family-wise type I error rate. From a regulatory standpoint, testing for equivalence diminishes the need to correct for multiplicity and provides good protection against the consumer's risk. This conservativeness increases the burden of proof for producers, though. Adjusting for the deflated type I error in the intersection-union type of hypotheses is mathematically sophisticated and research in this area is limited (Schuirmann 1987; Berger & Hsu 1996; Brown et al. 1997).
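The conservativeness of the intersection-union test can be illustrated with a small Monte Carlo sketch. It assumes a known standard error, a z-based two one-sided tests procedure per endpoint, and endpoints that share one common factor to induce an equal correlation `rho`; the equivalence limit, standard error and function name are hypothetical choices, not settings from the paper. Every endpoint is placed exactly on the boundary of its null hypothesis.

```python
import random
from statistics import NormalDist

def iut_rejection_rate(n_endpoints, rho, n_sim=20000, alpha=0.05, seed=1):
    """Monte Carlo rate at which an intersection-union test declares
    equivalence when every endpoint sits on the boundary of its null."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha)   # one-sided critical value
    limit, se = 1.0, 0.1                  # hypothetical EL and known standard error
    hits = 0
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)      # common factor inducing correlation rho
        all_pass = True
        for _ in range(n_endpoints):
            noise = rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
            d_hat = limit + se * noise    # true difference equals the upper limit
            # Two one-sided tests for this endpoint
            if not (d_hat + z * se < limit and d_hat - z * se > -limit):
                all_pass = False
                break                     # one failure rejects overall equivalence
        hits += all_pass
    return hits / n_sim

print("single endpoint:", iut_rejection_rate(1, 0.0))
print("3 endpoints, rho = 0.0:", iut_rejection_rate(3, 0.0))
print("3 endpoints, rho = 0.9:", iut_rejection_rate(3, 0.9))
```

Under these assumptions the single-endpoint rate should sit near the nominal level, while the three-endpoint rate drops well below it, most sharply when the endpoints are uncorrelated.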

APPENDIX 3: THE FALLACY OF A DATA-DRIVEN EQUIVALENCE CRITERION

EFSA's decision rule for establishing equivalence is based on a two-step procedure proposed by Van der Voet et al. (2011). Step L of this procedure computes MEP, the margin of error for the two-sided 95% prediction interval of $\mu_T - \mu_R - R$ under Model (2). Step E computes MEC, the margin of error for the 90% confidence interval of $\mu_T - \mu_R$. Denote the estimate of $\mu_T - \mu_R$ as $\hat{d}$. The two-step procedure concludes equivalence at the 0·05 significance level when $\hat{d} - \mathrm{MEC} > -\mathrm{MEP}$ and $\hat{d} + \mathrm{MEC} < \mathrm{MEP}$. Van der Voet et al. (2011) justify this procedure under a completely balanced design by noting that

$$\mathrm{MEP} = t_{0.975,\,df_P}\sqrt{\hat{\sigma}^2_R + \frac{\hat{\sigma}^2_R}{n_R} + \frac{\hat{\sigma}^2_E}{n_S n_{B(S)}} + \frac{\hat{\sigma}^2_E}{n_R n_S n_{B(S)}}},$$

$$\mathrm{MEC} = t_{0.95,\,df_C}\sqrt{\frac{\hat{\sigma}^2_R}{n_R} + \frac{\hat{\sigma}^2_E}{n_S n_{B(S)}} + \frac{\hat{\sigma}^2_E}{n_R n_S n_{B(S)}}}.$$

Constant $t_{0.975,\,df_P}$ represents the 97·5th percentile of the t distribution with degrees of freedom $df_P$. Terms $df_P$ and $df_C$ are computed from their respective sum of weighted variances under the square root. Notation '^' indicates the variance component is estimated. The resemblance of $\mathrm{MEP}^2$ to $z^2_{0.975}\sigma^2_R$ prompted them to test AE with the following data-driven ELs.

$$\Delta(t_{0.975,\,df_P}; \hat{\sigma}^2) = \mathrm{MEP}^2$$

Note that $\mathrm{MEP}^2 > z^2_{0.975}\hat{\sigma}^2_R$ because (1) $t_{0.975,\,df_P} > z_{0.975}$ and (2) the term under the square root of MEP is always greater than $\hat{\sigma}^2_R$. Van der Voet et al. (2011) refer to $\mathrm{MEP}^2$ as the 'outer' confidence limit of the EL for $(\mu_T - \mu_R)^2$.

Kang & Vahl (2014) pointed out a series of issues associated with the two-step procedure. First, using the outer limit of the EL inflates the type I error rate of the testing procedure for a given set of hypotheses. In fact, existing statistical methods for testing equivalence control their type I error rates by plugging in the 'inner' confidence limit of the EL (Hyslop et al. 2000; Kang & Vahl 2014). Second, the two-step procedure does not properly account for variability in estimating $\sigma^2_R$. The term $\hat{\sigma}^2_R/n_R + \hat{\sigma}^2_E/(n_S n_{B(S)}) + \hat{\sigma}^2_E/(n_R n_S n_{B(S)})$ corresponds to the variability in estimating $\mu_T - \mu_R$. Its double inclusion in the decision rule (i.e. in both MEP and MEC) is logically indefensible. Third, exchanging a z percentile for a t percentile does not adequately address the reference natural variation. Rather, it introduces ambiguity and causes the actual threshold to be higher than the intended level. Kang & Vahl (2014) show that the two-step procedure uses the 0·5th and 99·5th percentiles of the reference distribution (assumed by Model (2) to be normal) when $n_R = 6$ and the 1·8th and 98·2nd percentiles when $n_R = 21$.
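The two-step decision rule and the double inclusion criticised above can be sketched in a few lines. This is an illustration under stated assumptions rather than EFSA's implementation: the function names are hypothetical, the t percentiles are supplied by the caller because the degrees-of-freedom calculation is not reproduced here, and the variance estimates and design sizes in the example are invented.

```python
from math import sqrt
from statistics import NormalDist

def two_step_margins(var_r, var_e, n_r, n_s, n_b, t_p, t_c):
    """Return (MEP, MEC) for a completely balanced design.

    var_r, var_e : estimated reference and residual variance components
    n_r, n_s, n_b: numbers of references, sites and blocks within site
    t_p, t_c     : t percentiles t_{0.975,dfP} and t_{0.95,dfC}, passed in
                   because dfP and dfC are not derived in this sketch
    """
    # Variability in estimating mu_T - mu_R; note it enters BOTH margins
    est_var = var_r / n_r + var_e / (n_s * n_b) + var_e / (n_r * n_s * n_b)
    mep = t_p * sqrt(var_r + est_var)
    mec = t_c * sqrt(est_var)
    return mep, mec

def two_step_equivalent(d_hat, mep, mec):
    """Conclude equivalence when d_hat +/- MEC lies inside (-MEP, MEP)."""
    return d_hat - mec > -mep and d_hat + mec < mep

# Hypothetical inputs; 2.571 and 2.015 are the t percentiles for 5 df,
# chosen only for illustration.
mep, mec = two_step_margins(var_r=1.0, var_e=0.5, n_r=6, n_s=8, n_b=4,
                            t_p=2.571, t_c=2.015)
z = NormalDist().inv_cdf(0.975)
print(f"MEP^2 = {mep ** 2:.3f}, z^2 * var_r = {z ** 2 * 1.0:.3f}")
print("equivalent at d_hat = 1.0:", two_step_equivalent(1.0, mep, mec))
```

With any positive variance estimates and a t percentile above the corresponding z percentile, the squared MEP exceeds the z-based limit on the reference variance, which is the inequality noted in the text.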

It was argued by Van der Voet et al. (2011) that 'the null hypothesis value of the test is only established after step L' and 'statistical properties of such tests can only be defined conditionally on the outcome of step L'. Contrary to this misconception, statistical procedures involving conditioning have always been studied with respect to their unconditional performance. Theoretical discussion of improper conditioning is provided in Section 10.4 of Lehmann & Romano (2005). Real-world examples can be found on pp. 293–294 of Berger & Hsu (1996) as well as in Hung et al. (2003). Even though Van der Voet et al. (2011) never explicitly defined the equivalence criterion in terms of parameters in Model (2), they investigated the type I error under Hypotheses (4) with

$$\Delta(\vartheta; \sigma^2) = z^2_{0.975}\left(\sigma^2_R + \frac{\sigma^2_R}{n_R} + \frac{\sigma^2_E}{n_S n_{B(S)}} + \frac{\sigma^2_E}{n_R n_S n_{B(S)}}\right). \tag{A1}$$


However, their simulation study altered the two-step procedure by replacing $t_{0.975,\,df_P}$ in MEP with $z_{0.975}$ and replacing the estimated variances ($\hat{\sigma}^2_R$ and $\hat{\sigma}^2_E$) by true simulation settings. The reported nominal level of this modified two-step procedure does not justify the statistical properties of their original procedure when applied to data with unknown variances. Kang & Vahl (2014) demonstrate that the original two-step procedure carries a type I error rate of 0·353 under the simulation setting of Van der Voet et al. (2011).

Disregarding the statistical properties of the two-step procedure, it may be tempting to retain Formula (A1) as an EL in assessing substantial equivalence. An important reminder to this end is that hypotheses are made with respect to the target population(s) (Chapter 8 of Casella & Berger 2002; Lehmann & Romano 2005). In the case of GMO safety evaluation, GMO and reference populations are of interest. Sample sizes have nothing to do with these two populations and should not appear in the hypotheses. A disturbing consequence of Formula (A1) is that producers conducting studies with more references will be punished with a lower EL. Without loss of generality, consider the situation where $\sigma^2_R \gg \sigma^2_S + \sigma^2_{B(S)} + \sigma^2_E$. Formula (A1) simplifies to $\Delta(\vartheta; \sigma^2) \approx (1 + 1/n_R)z^2_{0.975}\sigma^2_R$. Suppose there exists a valid equivalence testing procedure for assessing $(\mu_T - \mu_R)^2$ against the EL of $(1 + 1/n_R)z^2_{0.975}\sigma^2_R$. That is, its type I error rate is no greater than the 0·05 nominal level when $(\mu_T - \mu_R)^2 \geq (1 + 1/n_R)z^2_{0.975}\sigma^2_R$ and its power is greater than 0·05 when $(\mu_T - \mu_R)^2 < (1 + 1/n_R)z^2_{0.975}\sigma^2_R$. Let two producers, A and B, evaluate the same GM product whose true squared mean deviation from $\mu_R$ is $(13/12)z^2_{0.975}\sigma^2_R$. Producer A includes six references in his field design ($n_R = 6$) and Producer B uses 12 references ($n_R = 12$). Because $(13/12)z^2_{0.975}\sigma^2_R$ is at the boundary of the parameter space of $H_0$ tested by Producer B, he will conclude equivalence with 0·05 probability (nominal type I error rate = 0·05). In contrast, the true squared mean difference is within the parameter space of $H_1$ tested by Producer A since $(13/12)z^2_{0.975}\sigma^2_R < (7/6)z^2_{0.975}\sigma^2_R$. He will conclude equivalence with greater than 0·05 probability (power > 0·05). Producer B with a larger sample size is therefore unfairly penalized by a more stringent EL than the one used by Producer A.
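The arithmetic behind the two-producer example can be checked with a short sketch. It uses the simplified form of Formula (A1), in which the EL is a multiple of $z^2_{0.975}\sigma^2_R$, and exact fractions so the comparisons are not affected by rounding. The function and variable names are hypothetical.

```python
from fractions import Fraction
from statistics import NormalDist

Z2 = NormalDist().inv_cdf(0.975) ** 2   # z_{0.975}^2, roughly 3.84

def el_multiplier(n_r):
    """Multiplier of z^2 * sigma_R^2 in the simplified Formula (A1),
    valid when sigma_R^2 dominates the other variance components."""
    return 1 + Fraction(1, n_r)

# True squared mean deviation, in units of z^2 * sigma_R^2
true_dev = Fraction(13, 12)

for name, n_r in (("A", 6), ("B", 12)):
    mult = el_multiplier(n_r)
    region = "H1 (inside the EL)" if true_dev < mult else "H0 (on or beyond the EL)"
    print(f"Producer {name}: n_R = {n_r}, EL = {float(mult) * Z2:.3f} sigma_R^2, "
          f"true deviation lies in {region}")
```

The same product therefore falls under the alternative hypothesis for Producer A and on the null boundary for Producer B, purely because the EL shrinks as the number of references grows.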

In summary, variability of an estimator should not be folded into the EL. Rather, its impact should be lessened by implementing efficient experimental designs in conjunction with unbiased and powerful testing procedures. The ELs of the SAE-C and SAE-M criteria represent better ways to account for natural variation in the reference population.
