Statistical methods for the analysis of bioassay data. This work was financially supported by the pharmaceutical company MSD based in Oss, The Netherlands. © Thembile Virginia Mzolo, 2016

Statistical methods for the analysis of bioassay data

Citation for published version (APA): Mzolo, T. V. (2016). Statistical methods for the analysis of bioassay data. Eindhoven: Technische Universiteit Eindhoven.

Document status and date: Published: 19/04/2016

Document Version: Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Please check the document version of this publication:

• A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website.
• The final author version and the galley proof are versions of the publication after peer review.
• The final published version features the final layout of the paper including the volume, issue and page numbers.

Link to publication

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.
• You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow the link below for the End User Agreement:

www.tue.nl/taverne

Take down policy

If you believe that this document breaches copyright please contact us at:

[email protected]

providing details and we will investigate your claim.

Download date: 02. Mar. 2020


STATISTICAL METHODS FOR THE ANALYSIS OF BIOASSAY DATA

THEMBILE VIRGINIA MZOLO


This work was financially supported by the pharmaceutical company MSD based in Oss, The Netherlands.

© Thembile Virginia Mzolo, 2016

Statistical Methods for the Analysis of Bioassay Data by T. Mzolo - Eindhoven University of Technology, 2016

A catalogue record is available from the Eindhoven University of Technology Library

ISBN: 978-90-386-4051-8

Printed by Gildeprint, Enschede.


Statistical Methods for the Analysis of Bioassay Data

DISSERTATION

to obtain the degree of doctor at the Technische Universiteit Eindhoven, on the authority of the rector magnificus prof.dr.ir. F.P.T. Baaijens, to be defended in public before a committee appointed by the Doctorate Board on Tuesday 19 April 2016 at 16:00

by

Thembile Virginia Mzolo

born in Othulini, South Africa


This dissertation has been approved by the promotors, and the doctoral committee is composed as follows:

chair: prof. dr. J. de Vlieg
1st promotor: prof. dr. E. R. van den Heuvel
copromotor: dr. A. Di Bucchianico
members: prof. dr. E. C. Wit (Rijksuniversiteit Groningen)
dr. H. Geys (Universiteit Hasselt)
prof. dr. W. H. Woodall (Virginia Tech)
prof. dr. R. Gorb (Universität Würzburg)

The research or design described in this dissertation has been carried out in accordance with the TU/e Code of Scientific Conduct (TU/e Gedragscode Wetenschapsbeoefening).


Acknowledgements

This research work was carried out over a four-year period at the University Medical Center Groningen and Eindhoven University of Technology. Many people played a huge role in my success in this position; I cannot name them all here, but I will mention a few.

Firstly, I am grateful to my promoter Prof. dr. Edwin van den Heuvel, who offered me this position a little over four years ago. This has been an amazing journey and I have been privileged to work with you. I learnt a lot from you and I am positive that we will continue working together in future.

I am thankful to my co-promoter dr. Alessandro Di Bucchianico, who dedicated a lot of time to my thesis. Thank you very much for your insightful comments on my research work.

This work would not have been possible without the financial support from MSD, Oss, The Netherlands. I am grateful to Erik Talens and Pieta IJzerman-Boon for being part of my journey. I would also like to express my gratitude to Marga Hendriks and Goele Goris, who allowed me to work on their master's theses. To everyone at MSD, including those who were responsible for the data that I used for my thesis, thank you.

I am grateful to Prof Henry Mwambi and Prof Khangelani Zuma, who planted the biostatistics seed in me a few years ago when I was still in South Africa. Thank you for seeing something different in me and for your encouraging words.

To my colleagues in the Stochastics section at Eindhoven University of Technology and my former colleagues at the Medical Statistics unit at University Medical Center Groningen, thank you very much for your support.

To all members of the African Students Community in Belgium, Eindhoven and Groningen: you made me feel at home away from home. The list of names is too long to mention everyone here; all I can say is thank you for your warm friendship.

Phindile, Linda, Sisa, and Kwande, thank you for your kindness and the spirit of ubuntu you have shown towards me. For you, driving from Bekkevoort all the way to Groningen or Eindhoven was never an issue. The good old South African ubuntu lives within you and I wish you more blessings in life.

To my family, thank you for the love and support you have given me all these years.


Finally, I would like to thank the most amazing person I have ever met, Wami. Thank you for being my pillar of strength.

This thesis is dedicated in loving memory of Bheki, Ngelo and Bonginkosi Mzolo.


Contents

Acknowledgements

Chapter 1: General introduction
1.1 A brief history of animal experiments
1.2 Areas of applications
1.3 Definition of bioassays
1.4 Calculation of bioactivity
1.4.1 Linear regression analysis for quantitative bioassays
1.4.2 Probit analysis for quantal bioassays
1.4.3 Goodness-of-fit and parallelism
1.5 Motivation
1.6 Aim of the thesis
1.6.1 Estimation of bioactivity
1.6.2 Estimation of product quality
1.7 Scope of the thesis
References

Chapter 2: Equivalence testing for similarity in bioassays: A critical note
2.1 Introduction
2.2 Statistical methodology
2.2.1 Dose-response relationships
2.2.2 Relative bioactivities
2.3 Hypothesis testing
2.3.1 Traditional hypothesis testing
2.3.2 Equivalence testing
2.4 Case studies
2.4.1 Parallel line bioassays
2.4.2 Slope-ratio bioassays
2.4.3 Quantal bioassays
2.4.4 Quantitative sigmoid bioassays
2.5 Discussion and conclusion
Appendix
References

Chapter 3: A comparison of statistical methods for combining relative bioactivities from parallel line bioassays
3.1 Introduction
3.1.1 Motivating example
3.1.2 Objectives
3.2 Parallel line bioassays analysis
3.3 (Un)weighted average methods
3.3.1 Bliss
3.3.2 Cochran
3.3.3 Morse and Bickle
3.4 Maximum likelihood methods
3.4.1 Armitage, Bennett, and Finney
3.4.2 Williams
3.4.3 Meisner, Kushner, and Laska
3.4.4 Hardy and Thompson
3.4.5 Random coefficient model
3.5 Pretests for homogeneity
3.6 A simulation study
3.6.1 Design of the simulation study
3.6.2 Simulation results
3.7 Discussion and conclusion
References

Chapter 4: Statistical process control methods for monitoring in-house reference standards
4.1 Introduction
4.2 Statistical methods
4.2.1 Bioassay experiments
4.2.2 Statistical model and hypotheses
4.2.3 Contrasts
4.2.4 The Exponentially Weighted Moving Average
4.2.5 α-Spending functions
4.3 Simulation study
4.3.1 Study designs
4.3.2 Deterioration profiles
4.3.3 Performance measure
4.4 Statistical results
4.4.1 Simulation results
4.4.2 An example
4.5 Discussion and conclusion
References

Chapter 5: Monitoring the bioactivity in the absence of an international reference standard
5.1 Introduction
5.2 Statistical methods
5.2.1 Strategy
5.2.2 Statistical details on monitoring the primary and secondary references
5.3 A simulation study
5.3.1 Performance measures
5.3.2 Probability of false alarms
5.3.3 Probability of detecting a shift
5.3.4 Probability of detecting a linear trend
5.4 Discussion and conclusion
Appendix
References

Chapter 6: A modified Satterthwaite approach for estimation of one-sided tolerance limits for general mixed effects models
6.1 Introduction
6.2 Statistical methods
6.2.1 Determination of the tolerance factor for tolerance intervals
6.2.2 The generalised pivotal quantity tolerance intervals
6.2.3 The modified large sample tolerance intervals
6.3 Simulation study
6.3.1 Two-way nested random effects model
6.3.2 Two-way crossed random effects model with interaction
6.3.3 Application to bioassay analysis data
6.4 Discussion and conclusion
References

Chapter 7: Estimation of shelf life of a drug product in a stability degradation study
7.1 Introduction
7.2 Methodology
7.2.1 Statistical models
7.2.2 Shelf life estimation
7.2.3 A theoretical comparison of the standard errors used in the estimation of shelf life
7.3 A simulation study
7.4 Discussion and conclusion
Appendix
References

Chapter 8: Concluding remarks
8.1 Estimation of the bioactivity
8.2 Monitoring the bioactivity
8.3 Estimation of tolerance limits
8.4 Estimation of the shelf life of a drug product

Summary

About the Author


CHAPTER 1

General introduction

1.1 A brief history of animal experiments

The first reported animal experiment was about the discovery of diphtheria antitoxin (Youngdahl, 2010)∗. Diphtheria is a contagious bacterial infection caused by Corynebacterium diphtheriae, which mainly affects the nose and throat and leads to breathing difficulties (Markel, 2007; Youngdahl, 2010). In the 19th century, diphtheria infection caused a high mortality rate among children (Markel, 2007)†. According to Youngdahl (2010), a heat-treated diphtheria toxin was used to immunise guinea pigs in 1890 by physicians Dr Emil von Behring and Dr Emile Roux. A few years later, the two physicians made a significant breakthrough in the fight against diphtheria by discovering an effective treatment (Markel, 2007). In the lead-up to this discovery, Behring and Roux injected diphtheria toxin into healthy horses, drew at most four blood samples from the injected horses, and stored these blood samples under freezing conditions. The treatment consisted of the diphtheria antitoxin separated from the stored blood samples (Markel, 2007). The first dose was made available on January 1, 1895, and the treatment proved highly efficacious, particularly when administered early in the course of infection (Markel, 2007). This effective treatment reduced the number of deaths caused by diphtheria (Markel, 2007; Youngdahl, 2010).

∗ History of vaccines, http://www.historyofvaccines.org/content/blog/early-uses-diphtheria-antitoxin-united-states, accessed 26-11-2015

† The New York Times, http://www.nytimes.com/2007/07/10/health/10hors.html?_r=0, accessed 26-11-2015

Following this major breakthrough, in which animals were used to find a treatment for humans, several further experiments were performed. The discovery of vitamins is one example of scientific findings that led to a large reduction in deformity and mortality in humans (DeLuca, 2014). For instance, vitamins A to D were all discovered when scientists were investigating appropriate dietary requirements. In particular, McCollum and Davis (1913) discovered that an ingredient found in butter fat and cod liver oil promotes growth and prevents xerophthalmia and eye infection in white rats. Subsequent experiments, also using rats as experimental units, identified a fat-soluble factor, vitamin A, and a water-soluble factor, vitamin B. Vitamin A was responsible for the prevention of xerophthalmia and vitamin B for the prevention of neurological diseases (DeLuca, 2014). A second water-soluble factor, termed vitamin C, was discovered in these experiments; this vitamin was responsible for the prevention of scurvy, which was particularly prevalent among sailors (DeLuca, 2014).

One mystery that remained unsolved was the prevention of rickets, which was common among Scottish nationals (DeLuca, 2014). An early investigation into the causes of rickets involved an experiment in which dogs were fed food mostly eaten by Scottish people (Hess, 1929). In this experiment the dogs were deprived of sunlight, and they eventually developed rickets (Mellanby, 1919). Later, McCollum et al. (1922) discovered that cod liver oil deficient in vitamin A still cured rickets. This led to the discovery of a new vitamin responsible for curing rickets, which was called vitamin D (DeLuca, 2014; Hess, 1929; McCollum et al., 1922).

Figure 1.1 shows a schematic example of an animal experiment. It is a two-arm experiment: in one arm, rats were injected with cancer cells and given antagomir treatment; in the other arm, rats were injected with cancer cells but were not given any treatment. In this experiment, rats that were not given treatment developed more metastases, whereas antagomir treatment reduced the number of lung metastases but had no effect on the original tumor.

Figure 1.1: A schematic example of a two-arm animal experiment. Source: Michele De Palma & Luigi Naldini, Antagonizing metastasis, Nature Biotechnology 28, 331-332 (2010)

All these experiments were conducted by physicians, biologists, or bacteriologists. The chemical extraction was performed in laboratories, and the response was the reaction observed on the experimental unit. At the time, no binding regulations were in place to ensure that these experiments were conducted properly. Today, animal experiments are highly regulated by regulatory and standard-setting bodies, such as the United States Pharmacopeia (USP), European Pharmacopoeia (EP), Food and Drug Administration (FDA), and International Conference on Harmonisation (ICH). This is mostly done to prevent the misuse of animals and to maintain the standard and quality of medications. These issues have been a continuous point of discussion since the discovery of diphtheria antitoxin, when the use of horses as ‘patients’ led to many disputes with the public (Markel, 2007).

1.2 Areas of applications

Bioassays are used in various applications, for example in the pharmaceutical industry, where the bioactivity can be the main characteristic of a medicinal product (Irwin, 1950). They play a part in the development, licensing, manufacturing, stability testing, lot release and marketing of drug products (Jeffcoate, 1996; Salvador et al., 2007; USP <1032>, 2010; USP <1033>, 2010). For example, they are used for cancer drugs and hormonal products. A highly active cancer drug is beneficial for destroying tumor cells, albeit with serious adverse effects (Nastoupil et al., 2012; Rampling et al., 2004), while hormonal products are used in birth control and fertility products. Even when the bioactivity is not the main characteristic, bioassays are used to quantify toxicity levels of drug products (Govindarajulu, 2001).

Bioassays are also used in applications that are not connected to drug products for humans. For example, bioassays are used in water pollution control (Mackay et al., 1989), the assessment of contaminated sediments (Brils et al., 2000) and contaminated soils (Fernández et al., 2010; González et al., 2011), the detection of dioxins and dioxin-like compounds in wastes and the environment (Takigami et al., 2008), and the study of growth responses of plants (Salvador et al., 2007). Furthermore, bioassays are used in veterinary studies; for example, Bell et al. (1967) studied the serum and urinary gonadotrophin levels in pregnant donkeys.

1.3 Definition of bioassays

A bioassay is a (scientific) experiment conducted to estimate the biological activity (bioactivity for short) of an unknown test preparation (Finney, 1978; Govindarajulu, 2001). There are two types of bioassays: direct and indirect. A direct bioassay measures the dose or concentration of a stimulus that is needed to obtain a certain well-defined response, that is, a response that is unambiguous and easily recognised (Finney, 1978). A typical example is the lethal dose in an animal experiment. When the stimulus is infused at a fixed rate and the time to death is measured, the dose is the time multiplied by the rate and the response is death. An indirect bioassay seeks to estimate equally effective doses of a standard and a test preparation (Finney, 1978). This means that the potency, or biological activity, of the test preparation, that is, a measure of its biological strength, is expressed in dose units of the standard. Indirect bioassays are considered more reliable than direct assays, since direct assays often show more variability (Finney, 1978).
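The direct assay calculation described above can be sketched in a few lines. This is a minimal illustration, not the method used in this thesis: the infusion rate and times to death are hypothetical, the function names are illustrative, and the potency of the test preparation is estimated by the classical ratio of mean tolerance doses (standard over test).

```python
from statistics import mean

def tolerance_doses(rate, times):
    """Dose received by each animal: infusion rate multiplied by time to death."""
    return [rate * t for t in times]

def direct_relative_potency(std_doses, test_doses):
    """Potency of the test relative to the standard, as a ratio of mean tolerance doses.

    If the test is rho times as potent, it needs 1/rho of the standard's dose
    to produce the same response, so rho is estimated by mean(std) / mean(test).
    """
    return mean(std_doses) / mean(test_doses)

# Hypothetical data: infusion rate 0.5 dose units per minute, times to death in minutes.
std = tolerance_doses(0.5, [10, 12, 14])   # [5.0, 6.0, 7.0]
test = tolerance_doses(0.5, [5, 6, 7])     # [2.5, 3.0, 3.5]
print(direct_relative_potency(std, test))  # 2.0
```

Comparing mean log tolerance doses instead is a common alternative when individual tolerances are skewed.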

In an indirect bioassay it is common practice to consider several doses of the standard and test preparations. Equally spaced doses and an equal number of doses per preparation (symmetry) are preferred because they give the highest precision (Finney, 1978), but they are not required. The biological response is the reaction observed in the biological unit caused by the application of the doses of the preparations. Based on these results, the bioactivity of the test preparation is estimated (see Section 1.4). Note that the standard preparation is a known, available, and robust preparation that is not affected by time and has properties similar to those of the test preparation (Irwin, 1950). These known preparations are often international reference standards, which were created to standardise bioassays. Available international reference standards are documented in the chapters of the United States Pharmacopeia. Known standards can also be in-house reference standards created from a product (Müller et al., 1996).
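A symmetric design can be written down concretely. The sketch below is illustrative only: it assumes that "equally spaced" refers to the log-dose scale, as is usual for the geometric dilution series used in indirect assays, and the helper names `dilution_series` and `is_symmetric` as well as the twofold 2 x 4 design are hypothetical.

```python
import math

def dilution_series(top_dose, factor, n_doses):
    """Geometric dilution series: top_dose, top_dose/factor, top_dose/factor**2, ..."""
    return [top_dose / factor**j for j in range(n_doses)]

def is_symmetric(design):
    """Check the two symmetry conditions for a design.

    design maps a preparation name to its list of doses (highest first).
    Symmetric here means: (1) every preparation has the same number of doses and
    (2) successive doses are equally spaced on the log scale, with one common spacing.
    """
    if len({len(doses) for doses in design.values()}) != 1:
        return False
    gaps = [math.log(hi) - math.log(lo)
            for doses in design.values()
            for hi, lo in zip(doses, doses[1:])]
    return all(math.isclose(g, gaps[0]) for g in gaps) if gaps else True

# Hypothetical 2 x 4 design: twofold dilutions of each preparation.
design = {"standard": dilution_series(8.0, 2, 4),  # [8.0, 4.0, 2.0, 1.0]
          "test": dilution_series(4.0, 2, 4)}      # [4.0, 2.0, 1.0, 0.5]
print(is_symmetric(design))  # True
```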

Bioassays are classified as in vivo or in vitro: in vivo bioassays use animals, whereas in vitro bioassays use biological materials (such as cells) tested in 96-well plates, Petri dishes or test tubes (Finney, 1978; USP <1032>, 2010; USP <1033>, 2010).

Figure 1.2: A schematic example of an in vivo bioassay experiment. Source: Bruno Lunenfeld, Historical perspectives in gonadotrophin therapy, Human Reproduction Update, 10(6), 453-467 (2004)

An example of an in vivo assay is that of Lunenfeld (2004) for hormone preparations. The author gave a graphical representation of a Steelman-Pohley assay for the gonadotrophin hCG (test preparation) and FSH (follicle-stimulating hormone, standard preparation), which is reproduced in Figure 1.2. In this assay, one group of rats was injected with hCG and the other group with FSH. The ovaries of the rats were harvested and weighed, as shown by the scale in Figure 1.2. Based on the scale, the bioactivity is the amount of hCG required to yield ovaries of a weight similar to that of the ovaries of rats injected with FSH.
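The idea of equally effective doses in such an assay can be made concrete with a small calculation. This is a hedged sketch, not the analysis used in this thesis: it assumes that mean ovary weight is linear in log dose with a separate intercept per preparation and a common slope (parallel lines), and it uses noise-free hypothetical data in which the test preparation is constructed to be twice as potent as the standard. The function names are illustrative.

```python
import math
from statistics import mean

def common_slope_fit(groups):
    """Least-squares fit of y = a_i + b * log(x) with one intercept per preparation
    and a common slope b. groups maps a name to (doses, responses).
    Returns ({name: a_i}, b)."""
    sxy = sxx = 0.0
    centres = {}
    for name, (doses, ys) in groups.items():
        zs = [math.log(d) for d in doses]
        zbar, ybar = mean(zs), mean(ys)
        centres[name] = (zbar, ybar)
        # Pool the within-preparation sums of squares and cross-products.
        sxy += sum((z - zbar) * (y - ybar) for z, y in zip(zs, ys))
        sxx += sum((z - zbar) ** 2 for z in zs)
    b = sxy / sxx
    intercepts = {name: ybar - b * zbar for name, (zbar, ybar) in centres.items()}
    return intercepts, b

def relative_potency(intercepts, b, standard="standard", test="test"):
    """Equally effective doses satisfy a_S + b*log(z_S) = a_T + b*log(z_T), so the
    potency of the test in dose units of the standard is exp((a_T - a_S) / b)."""
    return math.exp((intercepts[test] - intercepts[standard]) / b)

# Hypothetical ovary weights; the test curve is the standard curve shifted by log(2).
doses = [1.0, 2.0, 4.0, 8.0]
groups = {
    "standard": (doses, [10 + 5 * math.log(x) for x in doses]),
    "test": (doses, [10 + 5 * math.log(2 * x) for x in doses]),
}
a, b = common_slope_fit(groups)
print(round(b, 6), round(relative_potency(a, b), 6))  # 5.0 2.0
```

Because the lines are parallel, their horizontal distance on the log-dose axis is the same at every response level, so a single potency value summarises the comparison; Section 1.4.1 introduces the formal model.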

An example of an in vitro bioassay is the measurement of the activity of an influenza virus. Such a bioassay has been studied by Sidwell and Smee (2000) and Van Kessel et al. (2012). In this experiment, a gel is poured onto a plate and left to harden. Then a dilution series of a standard antigen and of a test sample is punched into the plate. After an incubation period of 18 hours, the plate is washed and the diameter of the formed rings is measured (see Figure 1.3). The larger the diameter, the more active the virus.

Figure 1.3: A schematic example of an in vitro bioassay experiment for influenza. Source: Van Kessel et al. (2012)

1.4 Calculation of bioactivity

In the early animal experiments, for diphtheria and vitamins, the biological response was binary; that is, these were quantal bioassays. For diphtheria, the response was whether a horse did or did not recover, and for vitamin D the response was whether dogs did or did not develop rickets. At the time, no sound statistical analysis was involved; sound statistical calculations only began at the turn of the 20th century. Some of the earliest works were by Irwin (1937) and Bliss and Cattell (1943). The work of Irwin (1937) focused on both quantal and quantitative (e.g., ovarian weight) responses. For the quantal bioassays a normal sigmoid dose-response relation was suggested, while for the quantitative bioassays a linear regression analysis was considered. Finney (1947) and Finney (1978) presented a general formulation of statistical methods for the estimation of the bioactivity in indirect bioassays. To illustrate the calculations, we describe the linear regression analysis for quantitative bioassays (used in the Steelman-Pohley assay) and probit analysis for quantal bioassays.

1.4.1 Linear regression analysis for quantitative bioassays

Let $x_{ij}$ (with $i = 1, 2$ and $j = 1, 2, \ldots, J_i$) be the $j$th dose of the $i$th preparation. We assume that $i = 1$ represents the standard preparation and $i = 2$ the test preparation, and that $J_i$ is the number of doses of the $i$th preparation. Now, let $Y_{ijk}$ be the $k$th biological response ($k = 1, 2, \ldots, K_{ij}$) measured at dose $x_{ij}$. Historically, the most common statistical model for quantitative bioassays is the parallel line model (Finney, 1978), given by

$$Y_{ijk} = \alpha_i + \beta \log x_{ij} + \varepsilon_{ijk}, \tag{1.1}$$

where $\alpha_i$ is the intercept of the $i$th preparation, $\beta$ is the common slope, and $\varepsilon_{ijk}$ is the residual, assumed to be independent and normally distributed, $\varepsilon_{ijk} \sim N(0, \sigma^2)$. A graphical display of the linear dose-response relationship is shown in Figure 1.4.

Figure 1.4: A depiction of a linear dose-response relationship for a standard preparation (solid line) and a test preparation (dashed line)

For the gonadotrophin assay example given in the previous section (Figure 1.2), a parallel line model of the form in (1.1) was applied. In that assay, the biological response was the ovarian weight. The standard preparation was FSH and the test preparation was hCG.

If both lines fall on top of each other, the test preparation provides exactly the same biological response as the standard preparation. Thus, both preparations must be identical over the applied dose range, which implies that the relative bioactivity of the test preparation with respect to the standard is equal to one. If the test line lies to the right of the standard line, then the test preparation is less potent (weaker) than the standard preparation, because the test preparation needs higher doses to obtain the same biological response. If the test line lies to the left of the standard line, the test preparation is more potent than the standard. For the parallel line model in (1.1), the potency of the test preparation with respect to the standard preparation is defined as $\rho = \exp\{(\alpha_2 - \alpha_1)/\beta\}$.

To estimate the bioactivity, Model (1.1) is fitted to the bioassay data to estimate its parameters. The estimation of these parameters is typically performed with ordinary least squares. The estimate of the bioactivity of the test sample is then obtained by substituting the parameter estimates in $\rho$. On the log scale it is given by

$$\log\rho = \frac{\alpha_2 - \alpha_1}{\beta}. \tag{1.2}$$
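The least-squares fit of (1.1) and the quantities that enter (1.2) and (1.3) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in this thesis: the function name `fit_parallel_line` is hypothetical, preparations are coded 1 (standard) and 2 (test), and doses are strictly positive. It uses the closed-form least-squares solution for a common slope with separate intercepts. The design constants are derived for this particular model: $\nu_{11}$ and $\nu_{22}$ are the variances of $\hat\alpha_2 - \hat\alpha_1$ and $\hat\beta$ divided by $\sigma^2$, and $\nu_{12}$ is their covariance divided by $\sigma^2$.

```python
import math
from collections import defaultdict


def fit_parallel_line(doses, responses, prep):
    """Ordinary least-squares fit of model (1.1): Y = alpha_i + beta*log(x) + eps.

    prep[n] is 1 for the standard and 2 for the test preparation.
    Returns the estimates, sigma^2 with its degrees of freedom, log(rho)
    from (1.2), and the design constants nu11, nu22, nu12.
    """
    groups = defaultdict(list)
    for x, y, p in zip(doses, responses, prep):
        groups[p].append((math.log(x), y))
    if set(groups) != {1, 2}:
        raise ValueError("need observations for preparations 1 and 2")

    zbar = {p: sum(z for z, _ in g) / len(g) for p, g in groups.items()}
    ybar = {p: sum(y for _, y in g) / len(g) for p, g in groups.items()}

    # Common slope: pooled within-preparation cross-products over sums of squares.
    szz = sum((z - zbar[p]) ** 2 for p, g in groups.items() for z, _ in g)
    szy = sum((z - zbar[p]) * (y - ybar[p]) for p, g in groups.items() for z, y in g)
    beta = szy / szz
    alpha = {p: ybar[p] - beta * zbar[p] for p in groups}

    n_obs = sum(len(g) for g in groups.values())
    df = n_obs - 3                                    # alpha1, alpha2 and beta estimated
    sse = sum((y - alpha[p] - beta * z) ** 2 for p, g in groups.items() for z, y in g)

    dz = zbar[2] - zbar[1]
    n1, n2 = len(groups[1]), len(groups[2])
    return {
        "alpha1": alpha[1], "alpha2": alpha[2], "beta": beta,
        "sigma2": sse / df, "df": df,
        "log_rho": (alpha[2] - alpha[1]) / beta,      # equation (1.2)
        "nu11": 1.0 / n1 + 1.0 / n2 + dz ** 2 / szz,  # Var(alpha2 - alpha1) / sigma^2
        "nu22": 1.0 / szz,                            # Var(beta) / sigma^2
        "nu12": -dz / szz,                            # Cov(alpha2 - alpha1, beta) / sigma^2
    }
```

When both preparations use the same log doses (a symmetric design), $\bar z_2 = \bar z_1$, so $\nu_{12} = 0$ and $\nu_{11} = 1/n_1 + 1/n_2$, which simplifies the variance expression that follows.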

The calculation of the variance of the log bioactivity in (1.2) requires the variances of the estimators of $\alpha_2 - \alpha_1$ and $\beta$ and their covariance. They are given by $\sigma^2\nu_{11}$, $\sigma^2\nu_{22}$, and $\sigma^2\nu_{12}$, respectively, where $\sigma^2$ is the residual variance. The constants $\nu_{11}$, $\nu_{22}$, $\nu_{12}$ are known constants determined by the design of the parallel line bioassay, and they depend on the log doses (Finney, 1978). Using the delta method (Cramér, 1946), the variance of the log bioactivity in (1.2) is approximated by

$$\tau^2 = \frac{\sigma^2}{\beta^2}\left(\nu_{11} - 2\log\rho\,\nu_{12} + (\log\rho)^2\nu_{22}\right). \tag{1.3}$$

It can be estimated by $S^2$, obtained by substituting the parameter estimates. The number of degrees of freedom ($df$) is often taken as the number that corresponds to the estimate of $\sigma^2$ from the regression analysis. A confidence interval on the log bioactivity ($\log\rho$) is then calculated as $\log\rho \pm t^{-1}_{df}(1-\alpha/2)\,S$, with $t^{-1}_{n}(q)$ the $q$th quantile of the $t$-distribution with $n$ degrees of freedom.
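Given the estimates from a fit of (1.1), the delta-method standard error $S$ and the interval on $\log\rho$ follow directly from (1.3). In this sketch the function name is hypothetical and the $t$ quantile is passed in as `t_crit` so that only the standard library is needed (with SciPy it could be obtained as `scipy.stats.t.ppf(1 - alpha/2, df)`). Exponentiating the limits gives an interval for $\rho$ itself.

```python
import math


def delta_method_ci(log_rho, beta, sigma2, nu11, nu12, nu22, t_crit):
    """Delta-method interval for log(rho), following (1.3).

    t_crit is the quantile t_df^{-1}(1 - alpha/2), supplied by the caller.
    Returns S, the interval for log(rho), and the interval for rho.
    """
    # S = sqrt(tau^2), tau^2 = sigma^2 / beta^2 * (nu11 - 2 log_rho nu12 + log_rho^2 nu22)
    s = math.sqrt(sigma2) / abs(beta) * math.sqrt(
        nu11 - 2.0 * log_rho * nu12 + log_rho ** 2 * nu22)
    lower, upper = log_rho - t_crit * s, log_rho + t_crit * s
    return s, (lower, upper), (math.exp(lower), math.exp(upper))
```

For example, with $\log\rho = 0.25$, $\beta = 2$, $\sigma^2 = 0.04$, $\nu_{11} = 0.5$, $\nu_{12} = -0.1$, $\nu_{22} = 0.2$ and a quantile of 2, the sketch gives $S = 0.075$ and the interval $(0.10, 0.40)$ on the log scale.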

An alternative approach to obtain confidence intervals is that of Fieller (Fieller, 1944). It is given by

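$$X_L, X_U = \left[\log\rho - \frac{g\,\nu_{12}}{\nu_{22}} \pm \frac{t^{-1}_{df}(1-\alpha/2)\,\sigma}{\beta}\sqrt{\nu_{11} - 2\log\rho\,\nu_{12} + (\log\rho)^2\nu_{22} - g\left(\nu_{11} - \frac{\nu_{12}^2}{\nu_{22}}\right)}\,\right] \Big/ (1-g), \tag{1.4}$$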

where $g = \left(t^{-1}_{df}(1-\alpha/2)\right)^2\sigma^2\nu_{22}/\beta^2$, $X_L$ and $X_U$ are the lower and upper confidence limits of the log bioactivity, respectively, and all other parameters are as previously defined. If $g$ is close to zero (i.e., $|\beta| \gg t^{-1}_{df}(1-\alpha/2)\,\sigma\sqrt{\nu_{22}}$, so that the slope is large relative to its standard error), then the confidence interval in (1.4) simplifies to the confidence interval obtained with the delta method, that is,

$$X_L, X_U = \left[\log\rho \pm \frac{t^{-1}_{df}(1-\alpha/2)\,\sigma}{\beta}\sqrt{\nu_{11} - 2\log\rho\,\nu_{12} + (\log\rho)^2\nu_{22}}\,\right]. \tag{1.5}$$
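Fieller's interval (1.4) can be coded in the same way. In this sketch (function name hypothetical, $t$ quantile supplied by the caller), the case $g \ge 1$ is rejected, because the limits then no longer form a bounded interval; this happens when $\hat\beta$ is not significantly different from zero at the chosen level.

```python
import math


def fieller_ci(log_rho, beta, sigma2, nu11, nu12, nu22, t_crit):
    """Fieller limits (1.4) for log(rho) = (alpha2 - alpha1) / beta.

    t_crit is t_df^{-1}(1 - alpha/2), supplied by the caller.
    Returns (g, (X_L, X_U)).
    """
    g = t_crit ** 2 * sigma2 * nu22 / beta ** 2
    if g >= 1.0:
        # beta is not significantly different from zero: no bounded interval.
        raise ValueError(f"g = {g:.3f} >= 1; the Fieller interval is unbounded")
    centre = log_rho - g * nu12 / nu22
    radicand = (nu11 - 2.0 * log_rho * nu12 + log_rho ** 2 * nu22
                - g * (nu11 - nu12 ** 2 / nu22))
    # abs(beta) keeps X_L <= X_U whatever the sign of the slope.
    half = t_crit * math.sqrt(sigma2) / abs(beta) * math.sqrt(radicand)
    return g, ((centre - half) / (1.0 - g), (centre + half) / (1.0 - g))
```

Unlike (1.5), the resulting interval is not symmetric about $\log\rho$. Its limits are the two roots $L$ of $(\hat\alpha_2 - \hat\alpha_1 - L\hat\beta)^2 = (t^{-1}_{df}(1-\alpha/2))^2\hat\sigma^2(\nu_{11} - 2L\nu_{12} + L^2\nu_{22})$, and as $g \to 0$ they approach the delta-method limits.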

1.4.2 Probit analysis for quantal bioassays

In quantal bioassays the response $Y_{ijk}$ is binary. One way of describing the dose-response relationship is to apply

$$\mathbb{E}\,Y_{ijk} = \Phi(\alpha_i + \beta\log x_{ij}) \tag{1.6}$$

with $\Phi$ the standard normal distribution function and the remaining parameters as defined in Section 1.4.1. If at each dose $x_{ij}$ the proportion $p_{ij} = \sum_{k=1}^{K_{ij}} Y_{ijk}/K_{ij}$ lies within $(0, 1)$, a parallel line model can be applied to $\Phi^{-1}(p_{ij})$. In this case Model (1.1) may be fitted with $k = 1$, and all calculations in Section 1.4.1 may be applied. Although the parallel line model may be used as an approximate model, as was done in the early days of bioassays, the method of maximum likelihood for the observations $Y_{ijk}$ with form (1.6) may be a better approach.
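The approximate approach of fitting the parallel line model to $\Phi^{-1}(p_{ij})$ can be sketched with the standard library's `statistics.NormalDist`. Two assumptions are made and flagged in the code: proportions are adjusted as $(r + 0.5)/(K + 1)$, a hypothetical choice among several possible adjustments, so that groups with $p = 0$ or $p = 1$ stay finite; and the fit is unweighted, even though probits of proportions near 0.5 are more precise than those near 0 or 1. Function names are illustrative.

```python
import math
from statistics import NormalDist

PHI_INV = NormalDist().inv_cdf        # standard normal quantile function


def empirical_probits(successes, totals, adjust=True):
    """Phi^{-1}(p_ij) per dose group.

    adjust=True applies the (hypothetical) correction p = (r + 0.5) / (K + 1),
    which keeps groups with 0 or K responders finite.
    """
    probits = []
    for r, k in zip(successes, totals):
        p = (r + 0.5) / (k + 1.0) if adjust else r / k
        if not 0.0 < p < 1.0:
            raise ValueError("proportion is 0 or 1, so its probit is infinite")
        probits.append(PHI_INV(p))
    return probits


def probit_parallel_lines(doses, successes, totals, prep):
    """Unweighted least-squares fit of (1.1) to the empirical probits (k = 1).

    prep[n] is 1 (standard) or 2 (test). Returns (alpha1, alpha2, beta, log_rho).
    """
    y = empirical_probits(successes, totals)
    z = [math.log(x) for x in doses]
    zbar, ybar = {}, {}
    for p in (1, 2):
        idx = [n for n, q in enumerate(prep) if q == p]
        zbar[p] = sum(z[n] for n in idx) / len(idx)
        ybar[p] = sum(y[n] for n in idx) / len(idx)
    # Common slope from pooled within-preparation sums of squares.
    szz = sum((z[n] - zbar[prep[n]]) ** 2 for n in range(len(z)))
    szy = sum((z[n] - zbar[prep[n]]) * (y[n] - ybar[prep[n]]) for n in range(len(z)))
    beta = szy / szz
    alpha1 = ybar[1] - beta * zbar[1]
    alpha2 = ybar[2] - beta * zbar[2]
    return alpha1, alpha2, beta, (alpha2 - alpha1) / beta
```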

Figure 1.5: A depiction of a parallel probit dose-response relationship for a standard preparation (solid curve) and a test preparation (dashed curve)

One reason is that proportions $p = 0$ or $p = 1$ can still be part of the maximum likelihood (ML) estimation method but not of an analysis of $\Phi^{-1}(p_{ij})$, because $\Phi^{-1}(0) = -\infty$ and $\Phi^{-1}(1) = \infty$. However, if the proportions are somewhat adjusted, then it is possible. Another reason is that the residuals in (1.1) do not follow a normal distribution when the model is applied to $\Phi^{-1}(p_{ij})$, although this would only be a serious issue when $K_{ij}$ is relatively small.

The parallel line model applied to $\Phi^{-1}(p_{ij})$ does demonstrate that the potency $\rho$ can still be calculated as $\rho = \exp\{(\alpha_2 - \alpha_1)/\beta\}$ for quantal bioassays of the dose-response form in (1.6). Substituting the ML estimates for $\alpha_1$, $\alpha_2$, and $\beta$ provides an ML potency estimate. The confidence interval on $\log\rho$ is typically calculated with the delta method, where $t^{-1}_{df}(1-\alpha/2)$ is replaced by the normal quantile $z_{1-\alpha/2} = \Phi^{-1}(1-\alpha/2)$. A graphical representation of the parallel probit models is given in Figure 1.5.
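Maximum likelihood for (1.6) can be computed by Fisher scoring, which keeps groups with 0 or $K_{ij}$ responders in the fit. The sketch below is an illustration under stated assumptions, not the procedure used in this thesis: the data are grouped binomial counts, the starting values $(0, 0, 1)$ are a hypothetical default that may need changing for other data, and the delta-method interval uses the inverse expected information as the covariance matrix of $(\hat\alpha_1, \hat\alpha_2, \hat\beta)$ together with the normal quantile $z_{1-\alpha/2}$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def _solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def probit_ml(doses, successes, totals, prep, start=(0.0, 0.0, 1.0), tol=1e-8, max_iter=200):
    """Fisher scoring for model (1.6), E Y = Phi(alpha_i + beta * log x).

    Groups with 0 or K responders are used as they are. Returns (theta, cov) with
    theta = [alpha1, alpha2, beta] and cov the inverse expected information.
    """
    theta = list(start)
    rows = [([1.0 if p == 1 else 0.0, 1.0 if p == 2 else 0.0, math.log(x)], r, k)
            for x, r, k, p in zip(doses, successes, totals, prep)]
    for _ in range(max_iter):
        info = [[0.0] * 3 for _ in range(3)]
        score = [0.0] * 3
        for xrow, r, k in rows:
            eta = sum(a * b for a, b in zip(xrow, theta))
            p = min(max(STD.cdf(eta), 1e-12), 1.0 - 1e-12)   # guard against p = 0 or 1
            dens = STD.pdf(eta)
            u = dens * (r - k * p) / (p * (1.0 - p))         # d loglik / d eta
            w = k * dens ** 2 / (p * (1.0 - p))              # expected information in eta
            for a in range(3):
                score[a] += u * xrow[a]
                for b in range(3):
                    info[a][b] += w * xrow[a] * xrow[b]
        step = _solve(info, score)
        theta = [t + s for t, s in zip(theta, step)]
        if max(abs(s) for s in step) < tol:
            break
    else:
        raise RuntimeError("Fisher scoring did not converge")
    # Columns of the inverse information matrix (symmetric, so rows equal columns).
    cov = [_solve(info, [1.0 if i == j else 0.0 for i in range(3)]) for j in range(3)]
    return theta, cov


def probit_log_potency(theta, cov, alpha_level=0.05):
    """ML log potency and its delta-method interval with a normal quantile."""
    a1, a2, b = theta
    m = (a2 - a1) / b
    var_d = cov[0][0] + cov[1][1] - 2.0 * cov[0][1]   # Var(alpha2 - alpha1)
    cov_db = cov[1][2] - cov[0][2]                     # Cov(alpha2 - alpha1, beta)
    se = math.sqrt(var_d - 2.0 * m * cov_db + m * m * cov[2][2]) / abs(b)
    z = STD.inv_cdf(1.0 - alpha_level / 2.0)           # z_{1 - alpha/2}
    return m, se, (m - z * se, m + z * se)
```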

1.4.3 Goodness-of-fit and parallelism

Prior to estimating the bioactivity, several assumptions about the dose-response relationships in (1.1) and (1.6) have to be met. For the quantitative bioassays these are the assumptions commonly made in linear regression: linearity, normality of the residuals, and constant variance (homogeneity assumption) (Kutner et al., 2005). For quantal bioassays the focus is mostly on the selected dose-response relationship. If any of these assumptions are not met, then remedial measures have to be taken.

In linear regression, a transformation of the biological response can be considered so that Model (1.1) can still be used for potency estimation. Alternatively, a completely different relationship can be selected; the same applies to quantal bioassays. A general formulation for the response outcome $Y_{ijk}$ (or a transformation thereof) is

$$\mathbb{E}\,Y_{ijk} = F_{\eta_i}\left(\alpha_i + \beta_i\,\psi(x_{ij})\right), \tag{1.7}$$

with $F_{\eta_i}$ a sigmoid function depending on the parameter $\eta_i$, $\psi$ a transformation of the dose (such as the logarithmic transformation), and $\alpha_i$ and $\beta_i$ an intercept and slope for preparation $i$. In general, the relationship (1.7) provides enough flexibility to generate a reasonable dose-response relationship.
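Formulation (1.7) can be read as a family of mean functions indexed by $F$ and $\psi$. The small sketch below makes this concrete: with $\psi = \log$, taking $F$ as the identity gives the mean of (1.1) and taking $F = \Phi$ gives (1.6). The four-parameter logistic with $\eta$ = (lower asymptote, upper asymptote) is only a hypothetical example of a sigmoid $F_\eta$, not a choice prescribed in the text, and all names are illustrative.

```python
import math
from statistics import NormalDist


def mean_response(x, alpha, beta, F, psi=math.log):
    """E Y = F(alpha + beta * psi(x)), the general form (1.7) for one preparation."""
    return F(alpha + beta * psi(x))


def identity(u):
    return u                          # F = identity: the linear model (1.1)


probit = NormalDist().cdf             # F = Phi: the probit model (1.6)


def four_pl(lower, upper):
    """Hypothetical sigmoid F_eta with eta = (lower, upper) asymptotes."""
    return lambda u: lower + (upper - lower) / (1.0 + math.exp(-u))
```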

Calculation of the bioactivity also requires that both preparations are parallel, or similar. For Models (1.1) and (1.6) this is guaranteed by the slope $\beta$ that is common to both preparations. In formulation (1.7) this can also be guaranteed, but the set of restrictions on the parameters depends on the choice of $\psi$ (see Chapter 2 for more details). Parallelism implies that the test preparation is a dilution of the standard, or the other way round (with respect to their biological response). This means that the two preparations are biologically the same and differ only in strength. Parallelism is an important characteristic in bioassay analysis (USP <1030>; USP <1034>).
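The dilution interpretation can be checked numerically. Assuming $\psi = \log$ and a common $\beta$ and $\eta$ for both preparations, $\alpha_1 + \beta\log(\rho x) = \alpha_2 + \beta\log x$ with $\log\rho = (\alpha_2 - \alpha_1)/\beta$, so the test preparation at dose $x$ gives the same mean response as the standard at dose $\rho x$. The sigmoid and the parameter values below are hypothetical and chosen only for illustration.

```python
import math


def curve(x, alpha, beta, F, psi=math.log):
    """E Y = F(alpha + beta * psi(x)) as in (1.7)."""
    return F(alpha + beta * psi(x))


def log_potency(alpha1, alpha2, beta):
    """Horizontal shift between parallel curves on the log-dose scale."""
    return (alpha2 - alpha1) / beta


def F(u, lower=10.0, upper=110.0):
    """Hypothetical sigmoid shared by both preparations (common eta)."""
    return lower + (upper - lower) / (1.0 + math.exp(-u))


alpha1, alpha2, beta = 0.2, 0.9, 1.4      # common beta and eta: parallel curves
rho = math.exp(log_potency(alpha1, alpha2, beta))

for x in (0.5, 1.0, 2.0, 8.0):
    test_at_x = curve(x, alpha2, beta, F)
    standard_at_rho_x = curve(rho * x, alpha1, beta, F)
    print(f"x={x:4.1f}  test={test_at_x:8.4f}  standard at rho*x={standard_at_rho_x:8.4f}")
```

Here $\alpha_2 > \alpha_1$ with $\beta > 0$, so $\rho > 1$: the test curve lies to the left of the standard curve and the test preparation is the more potent one.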

1.5 Motivation

Bioassays play a crucial role in the development, licensing, and marketing of biological products in the pharmaceutical industry. For instance, they are part of clinical trials, process optimisation (process analytical technology), the setting of product specifications, and the quantification of product shelf life. All these areas often encounter product-specific challenges that require tailor-made statistical methodology for bioassays to work optimally, that is, for the bioassay to have the highest precision and to be reliable.

Statistical methods for bioassays are unique (Jeffcoate, 1996). One reason is the elaborate experimentation that is often needed to obtain just one relative bioactivity, and the statistical effort that goes into providing an appropriate bioactivity. Given the uniqueness of bioassay data, some of the currently available statistical methodology may not be suitable or has not been adapted for this application. Another reason is that bioassays are mostly indirect, in contrast to microbiological methods that directly measure the bacterial quantity. The information provided by a bioassay experiment, whether the biological response is continuous (e.g., ovarian weight) or binary (e.g., dead/alive), is a relative bioactivity with respect to a standard. The outcome of the bioassay is an estimate of the bioactivity, its precision, and a number of degrees of freedom for the precision. Most measurement systems only provide a value, without a standard error or degrees of freedom.

This necessitates more research to fully evaluate, understand, and develop new methodology appropriate for the bioassay application at hand. This thesis focuses on two strongly interrelated parts of statistical methodology for biological assays. The first part is on the estimation of the bioactivity and the second part is on the estimation of product quality. The equations described in Section 1.4 are based on one bioassay experiment, or one bioassay run. However, it is common practice to perform more than one bioassay run to improve the estimation of the relative bioactivity. The outcome of each bioassay run is a triplet of the relative potency, its standard error, and its degrees of freedom. If $H$ bioassay runs are performed, then there will be $H$ sets of estimates $(X_1, S_1, df_1), (X_2, S_2, df_2), \ldots, (X_H, S_H, df_H)$. These sets of estimates are used to obtain a pooled or combined bioactivity estimate that takes into account the variability between estimates ($X_h$) and within estimates ($S_h$). For the estimation of the bioactivity, it is still not known which statistical method is the most precise for combining bioassay estimates. Another issue in bioassay analysis is that official guidelines require the similarity assumption to be assessed using an equivalence hypothesis, but the feasibility of this approach has not been appropriately evaluated in the context of bioassays.

The second part of the thesis focuses on several topics of product quality, where the quality of the product is mainly measured using the bioactivity. Statistical methodology for monitoring the bioactivity of a known in-house preparation with respect to an international reference standard, or for monitoring the bioactivity of a known in-house preparation when an international reference standard is not available, is very limited or nonexistent. This means that new methodology is required to enable the monitoring of the bioactivity, and this methodology should take into account the bioassay data structure. Monitoring the standard preparations helps guarantee the quality of drug products that are released to the market. Another quality-related aspect of a drug product is the determination of specification limits.

The characteristics of drug products manufactured by pharmaceutical companies are expected to lie within a certain range, commonly known as the specification limits. These limits are often set using tolerance limits of the bioactivity. Currently available statistical methods were developed for specific settings and, as a result, are not applicable to the more general settings or more complex designs that are common for bioassays. The last topic on product quality is the estimation of shelf life, the time during which the product remains biologically active. The estimation of the shelf life of a product is well documented by the ICH. However, the method proposed there does not take into account the design of the bioassay. For example, neither the standard error and degrees of freedom nor the design structure imposed by bioassay runs are accounted for in the shelf life estimation. This implies a higher likelihood of imprecisely estimating the shelf life, especially when there is significant variability between bioactivity estimates.

1.6 Aim of the thesis

The specific aims of the thesis with more detailed explanation on the issues regarding bioactivityestimation and product quality are given in Section 1.6.1 and Section 1.6.2, respectively.

1.6.1 Estimation of bioactivity

For the relative bioactivity to be fit for use, well-designed and well-executed bioassay runs must be conducted. The relative bioactivity is estimated with an appropriate model fitted to the data. Regulatory recommendations require that the test preparation (or test sample) behaves similarly to the standard preparation (or known sample). The test preparation is then considered as a dilution of the standard preparation (see Section 1.4.3). In practice, similarity is evaluated either by testing the traditional null hypothesis of similarity or by testing a hypothesis of equivalence (Callahan and Sajjadi, 2003; Gottschalk and Dunn, 2005; Hauck et al., 2005; Yellowlees et al., 2013). Regulators recommend the equivalence hypothesis over the traditional hypothesis. The equivalence approach may accept a small but statistically significant violation of the traditional null hypothesis of similarity, whereas under the traditional approach similarity between the two preparations is concluded only when the null hypothesis is not rejected. In Chapter 2, an equivalence hypothesis formulated on the relative bioactivity instead of on the model parameters is proposed. The consequences of this hypothesis are critically evaluated and compared with those of the traditional hypothesis.

Two practical situations may lead to pooled or combined bioactivities from multiple individual bioactivities. The first situation is when a bioactivity intended as a reportable value is not precise enough; multiple bioassay runs are then conducted to improve precision. The second situation is when a bioassay run consists of multiple bioactivities that together determine a reportable value. This is often the case in in vitro bioassays with multiple well-plates, where each plate provides one bioactivity. Combining or pooling information from different experiments is referred to as meta-analysis (Glass, 1976). Meta-analysis is strongly associated with the medical sciences and much less with bioassays, although the pooling of estimates has a longer history and originated in bioassays and the social sciences. In meta-analysis, a parameter of interest is typically taken from a (linear or logistic) regression model, such as the overall effect of a drug, and combined across studies to improve the estimate of the effect size. For bioassays, the relative bioactivity may not be just one parameter but a function of several model parameters, as in the parallel line model (see (1.2)).

Combining estimates from different studies is often complicated by the fact that estimates tend to be highly heterogeneous; that is, the variability between estimates is larger than the precision of the individual estimates would suggest (Cochran, 1954; Finney, 1978). Thus, statistical models for pooling estimates must account for sources of variation between studies. Failing to account for these sources of variation may result in underestimation of the standard error of the pooled estimate. For bioassays, such underestimation may lead to false acceptance of a drug product. Between-bioassay variability is due to differences in conditions when conducting several experiments, and these conditions affect the bioactivities.

Many approaches for combining bioactivities were developed long before meta-analysis became popular. These include the method of moments, simple averages, and likelihood approaches (Bliss, 1952; Cochran, 1954; Jeffcoate, 1996; Laska and Meisner, 1987; Meisner et al., 1986; Morse and Bickle, 1967). In meta-analysis other approaches have been developed, including mixed effects (Searle et al., 1992; Searle, 1971) and profile likelihood (Hardy and Thompson, 1996) approaches. However, it is unknown how these methods perform when pooling bioactivities, and even the older approaches have never been fully compared. In Chapter 3, all these approaches are assessed to determine the most efficient approach to employ when combining bioactivity estimates, and the effects of heterogeneity and sample size are assessed. The results have been published in the journal Pharmaceutical Statistics (Mzolo et al., 2013).
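The underestimation of the pooled standard error can be illustrated by contrasting a fixed-effect (inverse-variance) pooled estimate with a random-effects one. As an illustration only, the sketch uses the DerSimonian-Laird moment estimator of the between-run variance, one member of the method-of-moments family and not necessarily the approach recommended in Chapter 3. It assumes the run estimates $X_h$ are on the log scale and, unlike the methods studied in this thesis, ignores the degrees of freedom $df_h$. The function name is hypothetical.

```python
import math


def pool_bioactivities(estimates, std_errors):
    """Combine H run-level log bioactivities X_h with standard errors S_h.

    Returns the fixed-effect (inverse-variance) mean and the random-effects mean
    based on the DerSimonian-Laird moment estimator of the between-run variance.
    Degrees of freedom df_h are ignored in this sketch.
    """
    h = len(estimates)
    if h < 2:
        raise ValueError("need at least two runs")
    w = [1.0 / s ** 2 for s in std_errors]
    sw = sum(w)
    mean_fe = sum(wi * x for wi, x in zip(w, estimates)) / sw
    se_fe = math.sqrt(1.0 / sw)

    # Cochran's Q and the moment estimator of the between-run variance tau^2.
    q = sum(wi * (x - mean_fe) ** 2 for wi, x in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (h - 1)) / c)

    w_re = [1.0 / (s ** 2 + tau2) for s in std_errors]
    mean_re = sum(wi * x for wi, x in zip(w_re, estimates)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {"fe": (mean_fe, se_fe), "re": (mean_re, se_re), "tau2": tau2, "Q": q}
```

With four runs estimated at 0.10, 0.35, 0.05, and 0.40, each with standard error 0.05, the fixed-effect standard error is 0.025 while the random-effects one is roughly 0.088; ignoring the between-run variance would overstate the precision of the pooled bioactivity by a factor of more than three.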

1.6.2 Estimation of product quality

The relative bioactivity of a product is estimated relative to a known standard preparation. The international reference standard tends to be very expensive for routine purposes, and to minimise costs, pharmaceutical companies may develop their own in-house reference standard. The in-house reference standard is tested against the international reference standard to quantify its bioactivity. The consequence of using an in-house reference is that the pharmaceutical company is obliged to assess the biological stability of this in-house reference standard by monitoring its bioactivity during the period of its use. For example, this period could be five years, and the tests to evaluate the stability of this bioactivity could be performed annually. If the bioactivity of the in-house reference standard is found not to be stable, then a new in-house reference standard should be created and tested against the international reference standard.

Several statistical methods are available for monitoring the quality of products, including the exponentially weighted moving average (EWMA) (Roberts, 1959) and cumulative sum (CUSUM) (Page, 1954) control charts and Q charts; these are commonly used in engineering applications. Monitoring bioactivities of an in-house standard using these control charts has two complications. Firstly, the frequency and duration of use of in-house standards lead to a limited number of follow-up points, which implies that average run lengths, the usual performance measure for control charts, are not useful here. Secondly, the data at each follow-up point may consist of a set of heterogeneous bioactivities accompanied by standard errors and degrees of freedom, and it is unknown how to incorporate this additional information in the monitoring. Chapter 4 of this thesis addresses these two issues for monitoring the in-house reference standard by adapting the existing methods to this application. Different follow-up schemes are also investigated to assess how they influence the power of detecting changes in the bioactivity. The chapter has been published in the journal Statistics in Biopharmaceutical Research (Mzolo et al., 2015).
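For reference, the two classical charts can be sketched as follows. Each follow-up value is treated as a single observation with a known in-control mean `mu0` and standard deviation `sigma`; the tuning constants (`lam`, `L`, `k`, `h`) are illustrative defaults, not values from this thesis, and the function names are hypothetical. These textbook versions ignore the standard errors and degrees of freedom attached to each bioactivity, which is exactly the second complication described above.

```python
import math


def ewma_chart(values, mu0, sigma, lam=0.2, L=3.0):
    """EWMA chart (Roberts, 1959) with exact time-varying control limits.

    Returns a list of (statistic, lower, upper, signal) per follow-up point.
    """
    z = mu0
    out = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        width = L * sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        out.append((z, mu0 - width, mu0 + width, abs(z - mu0) > width))
    return out


def cusum_chart(values, mu0, sigma, k=0.5, h=4.0):
    """Two-sided tabular CUSUM (Page, 1954); k and h are in units of sigma."""
    c_hi = c_lo = 0.0
    signals = []
    for x in values:
        c_hi = max(0.0, c_hi + (x - mu0) / sigma - k)   # accumulates upward shifts
        c_lo = max(0.0, c_lo - (x - mu0) / sigma - k)   # accumulates downward shifts
        signals.append(c_hi > h or c_lo > h)
    return signals
```

For a reference with in-control bioactivity 1.0 and $\sigma = 0.05$, a drop to 0.8 from the third follow-up point onwards is flagged by both charts at the fourth point, which shows how few follow-up points may be available before a decision is needed.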

For some products an international reference standard does not exist, and in such cases a new approach is needed to create an in-house reference standard and monitor its bioactivity. In Chapter 5 of this thesis, a strategy together with a testing scheme is devised to enable the pharmaceutical company to create reference standards and monitor the stability of relative bioactivities. This strategy involves creating primary and secondary reference standards, where the former replaces the international reference standard and the latter replaces an in-house reference standard. The stability of the primary reference is now the responsibility of the pharmaceutical company, and it is of great significance because this reference is used to qualify secondary references. Thus, both the primary and the secondary standard are to be controlled and monitored. The proposed strategy is used to guarantee stability of the standards and to assess the longevity of the primary reference. In this thesis, optimal parameters of an EWMA control chart used for monitoring the bioactivity of the primary reference are determined.

The bioactivity of a drug product is expected to lie within a particular range called the specification limits. The specification limits can be determined by estimating tolerance limits of the relative bioactivity. Currently, a number of statistical approaches are available for estimating both one- and two-sided tolerance limits. However, these methodologies were developed for simple statistical models and are not easily generalisable to the more complex designs of higher-order analysis of variance (ANOVA) models. The bioactivity is typically affected by a number of sources of variation, which need to be included in the tolerance limits if specifications are to be set realistically. In Chapter 6, a flexible approach for determining one-sided tolerance limits applicable to any variance component model is proposed.
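For the simple model that existing methods target, independent normal bioactivities with a single variance component, a one-sided lower tolerance limit has the form $\bar{X} - kS$. The sketch below uses a standard closed-form approximation to the factor $k$ (exact factors come from the noncentral $t$ distribution). The approximation, the function names, and the default coverage and confidence levels are assumptions for illustration; this is the starting point that Chapter 6 generalises, not the method proposed there.

```python
import math
from statistics import NormalDist, mean, stdev


def one_sided_k(n, coverage=0.90, confidence=0.95):
    """Approximate factor k for a one-sided normal tolerance limit.

    Closed-form approximation; exact factors use the noncentral t distribution.
    """
    zp = NormalDist().inv_cdf(coverage)
    zg = NormalDist().inv_cdf(confidence)
    a = 1.0 - zg ** 2 / (2.0 * (n - 1))
    b = zp ** 2 - zg ** 2 / n
    if a <= 0 or zp ** 2 - a * b < 0:
        raise ValueError("sample too small for this approximation")
    return (zp + math.sqrt(zp ** 2 - a * b)) / a


def lower_tolerance_limit(values, coverage=0.90, confidence=0.95):
    """X_bar - k*S for i.i.d. normal data (a single variance component only)."""
    return mean(values) - one_sided_k(len(values), coverage, confidence) * stdev(values)
```

For $n = 10$, 90% coverage and 95% confidence, the approximation gives $k \approx 2.32$, slightly below the exact factor of about 2.355. With several variance components (for example, between runs and within runs), a single sample standard deviation no longer describes the variability, which is why more general approaches are needed.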

The final research chapter of this thesis focuses on the estimation of the shelf life of drug products using bioassays. In order to estimate the shelf life of a drug product, batches of the product are subjected to different storage conditions. Stability studies enable the determination of the shelf life of the drug product under these conditions. Any product will degrade, but the degradation rate may vary with storage conditions over time. In the literature so far, the effect of bioassay runs has been ignored in the estimation of shelf lives. Consequently, the bioassay run variability, bioactivity precision, and degrees of freedom are not accounted for. In Chapter 7 of this thesis, enhanced statistical analyses for two experimental designs are compared for estimating shelf life.
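A minimal sketch of the classical single-batch calculation: regress the bioactivity on storage time, then take as shelf life the earliest time at which the one-sided lower confidence bound of the mean line reaches the lower specification limit. This is the kind of analysis that ignores bioassay runs, standard errors, and degrees of freedom. The function name, the bisection search, and the caller-supplied one-sided quantile `t_crit` (with $n - 2$ degrees of freedom) are assumptions for illustration.

```python
import math


def shelf_life(times, potency, lower_spec, t_crit, t_max=None, tol=1e-8):
    """Earliest time at which the one-sided lower confidence bound of the
    fitted mean line reaches lower_spec (single batch, simple regression).

    t_crit is the one-sided t quantile with n - 2 degrees of freedom.
    """
    n = len(times)
    tbar = sum(times) / n
    ybar = sum(potency) / n
    sxx = sum((t - tbar) ** 2 for t in times)
    b1 = sum((t - tbar) * (y - ybar) for t, y in zip(times, potency)) / sxx
    b0 = ybar - b1 * tbar
    s = math.sqrt(sum((y - b0 - b1 * t) ** 2 for t, y in zip(times, potency)) / (n - 2))

    def lower_bound(t):
        return b0 + b1 * t - t_crit * s * math.sqrt(1.0 / n + (t - tbar) ** 2 / sxx)

    lo = 0.0
    hi = t_max if t_max is not None else 10.0 * max(times)
    if lower_bound(lo) < lower_spec:
        return 0.0
    if lower_bound(hi) >= lower_spec:
        return hi                       # no crossing within the search window
    # The bound is concave in t, so there is a single crossing in [lo, hi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lower_bound(mid) >= lower_spec:
            lo = mid
        else:
            hi = mid
    return lo
```

Setting `t_crit` to 0 returns the time at which the fitted mean line itself reaches the limit; any residual variability then shortens the estimated shelf life, and variability between bioassay runs would shorten it further if it were modelled.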

1.7 Scope of the thesis

The thesis is organised as follows. The parallelism of the test and standard sample is introduced in Chapter 2, where two types of testing procedures are introduced and compared using case studies and simulations. Methods for estimating a relative bioactivity are introduced in Chapter 3, and the efficiency and precision of these methods are fully examined by means of simulations. In Chapter 4, methods for monitoring an in-house reference standard are discussed. These include methods that are used in dose-finding studies, and their applicability to biological assays is assessed. However, some products do not have an international reference standard that enables the estimation of bioactivities. Therefore, a scheme and testing strategy is introduced in which both the primary (standard) and the secondary (test) references are monitored through time; this is covered in Chapter 5. Chapter 6 introduces a method for setting specification limits using tolerance limits. The designs of stability degradation studies, which enable a precise estimation of the shelf life, are introduced in Chapter 7. The last chapter of the thesis summarises the conclusions and the prospects for further research.

References

Bell, E. T., Loraine, J. A., Jennings, S., and Weaver, A. D. (1967), “Serum and Urinary Gonadotrophin Levels in Pregnant Ponies and Donkeys,” Quarterly Journal of Experimental Physiology, 52, 68–75.

Bliss, C. I. (1952), The Statistics of Bioassay, New York: Academic Press Inc.

Bliss, C. I. and Cattell, M. (1943), “Biological Assay,” Annual Review of Physiology, 5, 479–539.

Brils, J., Stronkhorst, J., and Van De Guchte, K. (2000), “The Status and Use of Bioassays for the Assessment of Contaminated Sediments in the Netherlands,” in Workshop Report, 12–16.

Callahan, J. and Sajjadi, N. (2003), “Testing the Null Hypothesis for a Specified Difference: The Right Way to Test for Parallelism,” Bioprocessing Journal, 2, 71–78.

Cochran, W. G. (1954), “The Combination of Estimates from Different Experiments,” Biometrics, 10, 101–129.

Cramér, H. (1946), Mathematical Methods of Statistics, Princeton, NJ: Princeton University Press.

DeLuca, H. F. (2014), “History of the Discovery of Vitamin D and Its Active Metabolites,” BoneKEy Reports, 3, 479.

Fernández, M. D., Babín, M., and Tarazona, J. V. (2010), “Application of Bioassays for the Ecotoxicity Assessment of Contaminated Soils,” Methods in Molecular Biology (Clifton, N.J.), 599, 235–262.

Fieller, E. C. (1944), “A Fundamental Formula in the Statistics of Biological Assay, and Some Applications,” Quarterly Journal of Pharmacy and Pharmacology, 17, 117–123.

Finney, D. J. (1947), “The Principles of Biological Assay,” Supplement to the Journal of the Royal Statistical Society, 9, 46–91.

Finney, D. J. (1978), Statistical Method in Biological Assay, London: Charles Griffin & Co. Ltd.

Glass, G. V. (1976), “Primary, Secondary and Meta-analysis Research,” Educational Researcher, 10, 3–8.

González, V., Díez-Ortiz, M., Simón, M., and van Gestel, C. A. M. (2011), “Application of Bioassays with Enchytraeus Crypticus and Folsomia Candida to Evaluate the Toxicity of a Metal-contaminated Soil, Before and After Remediation,” Journal of Soils and Sediments, 11, 1199–1208.

Gottschalk, P. G. and Dunn, J. R. (2005), “Measuring Parallelism, Linearity, and Relative Potency in Bioassay and Immunoassay Data,” Journal of Biopharmaceutical Statistics, 15, 437–463.

Govindarajulu, Z. (2001), Statistical Techniques in Bioassay, Basel: Karger, 2nd ed.

Hardy, R. J. and Thompson, S. G. (1996), “A Likelihood Approach to Meta-analysis With Random Effects,” Statistics in Medicine, 15, 619–629.

Hauck, W., Capen, R., Callahan, J., De Muth, J. E., Hsu, H., Lansky, D., Sajjadi, N., Seaver, S., Singer, R. R., and Weisman, D. (2005), “Assessing Parallelism Prior to Determining Relative Potency,” PDA Journal of Pharmaceutical Science and Technology, 59, 127–137.

Hess, A. (1929), “The History of Rickets,” in Rickets, Including Osteomalacia and Tetany, Philadelphia: Lea & Febiger, pp. 22–37.

Irwin, J. O. (1937), “Statistical Method Applied to Biological Assays,” Royal Statistical Society, 4, 1–60.

Irwin, J. O. (1950), “Biological Assays with Special Reference to Biological Standards,” The Journal of Hygiene, 48, 215–238.

Jeffcoate, S. (1996), “The Role of Bioassays in the Development, Licensing and Batch Control of Biotherapeutics,” Trends in Biotechnology, 14, 121–124.

Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2005), Applied Linear Statistical Models, New York: McGraw-Hill/Irwin, 5th ed.

Laska, E. M. and Meisner, M. J. (1987), “Statistical Methods and Applications of Bioassay,” Annual Review of Pharmacology and Toxicology, 27, 385–397.

Lunenfeld, B. (2004), “Historical Perspectives in Gonadotrophin Therapy,” Human Reproduction Update, 10, 453–467.

Mackay, D. W., Holmes, P. J., and Redshaw, C. J. (1989), “The Application of Bioassay Techniques to Water Pollution Problems - The United Kingdom Experience,” Hydrobiologia, 188–189, 77–86.

Markel, H. (2007), Long Ago Against Diphtheria, the Heroes were Horses (accessed 2015/11/26), vol. 1, New York: The New York Times.

McCollum, E. V. and Davis, M. (1913), “The Necessity of Certain Lipins in the Diet During Growth,” Journal of Biological Chemistry, 25, 167–175.

McCollum, E. V., Simmonds, N., Becker, J. E., and Shipley, P. G. (1922), “An Experimental Demonstration of the Existence of a Vitamin Which Promotes Calcium Deposition,” Journal of Biological Chemistry, 53, 293–298.

Meisner, M., Kushner, H. B., and Laska, E. M. (1986), “Multivariate Combining Bioassays,”Biometrics, 42, 421–427.

Mellanby, E. (1919), “An Experimental Investigation on Rickets,” Lancet, 1, 407–412.

Morse, P. M. and Bickle, A. (1967), “The Combination of Estimates from Similar Experiments, Allowing for Inter-experiment,” Journal of the American Statistical Association, 62, 241–250.

Müller, K. M., Gempeler, M. R., Scheiwe, M. W., and Zeugin, B. T. (1996), “Quality Assurance for Biopharmaceuticals: An Overview of Regulations, Methods and Problems,” Pharmaceutica Acta Helvetiae, 71, 421–438.

Mzolo, T., Goris, G., Talens, E., Di Bucchianico, A., and Van den Heuvel, E. (2015), “Statistical Process Control Methods for Monitoring In-house Reference Standards,” Statistics in Biopharmaceutical Research, 7, 55–65.

Mzolo, T., Hendriks, M., and Van den Heuvel, E. (2013), “A Comparison of Statistical Methods for Combining Relative Bioactivities from Parallel Line Bioassays,” Pharmaceutical Statistics, 12, 375–384.

Nastoupil, L. J., Rose, A. C., and Flowers, C. R. (2012), “Diffuse Large B-cell Lymphoma: Current Treatment Approaches,” Oncology, 26, 488–495.

Page, E. (1954), “Continuous Inspection Schemes,” Biometrika, 41, 100–115.

Rampling, R., James, A., and Papanastassiou, V. (2004), “The Present and Future Managementof Malignant Brain Tumours: Surgery, Radiotherapy, Chemotherapy,” Journal of Neurology,Neurosurgery and Psychiatry, 75, ii24–ii30.

Roberts, S. (1959), “Control Chart Tests Based on Geometric Moving Averages,” Technometrics, 1, 239–250.

Salvador, J.-P., Adrian, J., Galve, R., Pinacho, D. G., Kreuzer, M., Sánchez-Baeza, F., and Marco, M.-P. (2007), “Chapter 2.8 Application of Bioassays/Biosensors for the Analysis of Pharmaceuticals in Environmental Samples,” Comprehensive Analytical Chemistry, 50, 279–334.

Searle, S., Casella, G., and McCulloch, C. (1992), Variance Components, New Jersey: John Wiley & Sons, Inc.


Searle, S. R. (1971), Linear Models, New York: John Wiley & Sons.

Sidwell, R. W. and Smee, D. F. (2000), “In Vitro and In Vivo Assay Systems for Study of Influenza Virus Inhibitors,” Antiviral Research, 48, 1–16.

Takigami, H., Suzuki, G., and Sakai, S. (2008), “Application of Bioassays for the Detection of Dioxins and Dioxin-like Compounds in Wastes and the Environment,” Interdisciplinary Studies on Environmental Chemistry - Biological Responses to Chemical Pollutants, 87–94.

USP <1030> (2010), “Biological Assay Chapters - Overview and Glossary,” Tech. rep., United States Pharmacopeia.

USP <1032> (2010), “Design and Development of Biological Assays,” Tech. rep., United States Pharmacopeia.

USP <1033> (2010), “Biological Assay Validation,” Tech. rep., United States Pharmacopeia.

USP <1034> (2010), “Analysis of Biological Assays,” Tech. rep., United States Pharmacopeia.

Van Kessel, G., Geels, M. J., De Weerd, S., Buijs, L. J., De Bruijni, M. A. M., Glansbeek, H. L., Van den Bosch, J. F., Heldens, J. G., and Van den Heuvel, E. R. (2012), “Development and Qualification of the Parallel Line Model for the Estimation of Human Influenza Haemagglutinin Content Using the Single Radial Immunodiffusion Assay,” Vaccine, 30, 201–209.

Yellowlees, A., Lebutt, C. S., Hirst, K. J., Fusco, P. C., and Fleetwood, K. J. (2013), “Efficient Analysis of Dose-time-Response Assays,” BioScience, 63, 490–498.

Youngdahl, K. (2010), Early Uses of Diphtheria Antitoxin in the United States (accessed 2015/11/26), vol. 1, Philadelphia: History of vaccines.


CHAPTER 2

Equivalence testing for similarity in bioassays: A critical note


Abstract

Similarity in bioassays means that the test preparation behaves as a dilution of the standard preparation with respect to their biological effect. Thus, similarity must be investigated to confirm this biological property. Historically, this was typically done with traditional hypothesis testing, but this approach has received substantial criticism. Failing to reject similarity does not imply that the two preparations are similar. Also, rejecting similarity when the bioassay variability is small might simply demonstrate a non-relevant deviation from similarity. To remedy these concerns, equivalence testing has been proposed as an alternative to traditional hypothesis testing, and it has found its way into the official guidelines. However, the consequences of equivalence testing for similarity on the relative bioactivity of the test preparation have not been fully investigated. This chapter provides a general framework for equivalence that is directly related to the relative bioactivity. It is demonstrated that non-similarity can never imply equivalence on the relative bioactivity in general, but only on a finite interval for the dose range. Additionally, several case studies show that reasonable finite dose ranges lead to unrealistic numbers of test units required to demonstrate bioequivalence of a test preparation in the bioassay. Although our general framework is theoretically appropriate for equivalence testing for similarity, we argue that it might be too impractical to execute.

Keywords: relative bioactivity; slope-ratio; S-shaped response curve; quantal bioassays; quantitative bioassays


2.1 Introduction

The main objective in bioassay analysis is to estimate the relative potency or bioactivity of a test preparation with respect to a reference or standard preparation (Finney, 1978). The relative bioactivity is the ratio $x_T/x_S$ of the dose $x_T$ of the test preparation to the dose $x_S$ of the standard preparation that generate the same predefined biological effect $y$ (e.g., $y = 10\%$ adverse events in mice). This ratio $\rho(x_S)$ may vary with the predefined biological effect $y$, or equivalently with the dose $x_S$, since the two are linked through the inverse dose-response relationship for the standard preparation. In the special case that the relative bioactivity is independent of dose $x_S$, that is, $\rho(x_S) \equiv \rho_0$ is constant, the test preparation is or behaves as a dilution of the standard preparation, or the other way around. This means that the standard and test preparations are biologically similar, since they differ only in concentration and not in their biological response (Finney, 1978). This biological condition is referred to as parallelism or similarity.

The assumption of similarity is almost always tested statistically in bioassay analysis, since its violation could imply that the biological effect of the test preparation administered at doses different from those used in the bioassay cannot be predicted from the standard preparation. Medicinal products are often tested at substantially lower doses in the bioassay than the intended doses administered to human subjects. Thus, in the most extreme case, non-similarity observed in the bioassay analysis could imply that the test preparation is biologically harmful or not effective at all at the intended doses. We use the words “could imply” because violation of similarity may also be caused by artifacts or design issues in the bioassay. On the other hand, failing to reject similarity in a bioassay analysis does not prove that the test preparation is similar to the standard preparation. Large assay variation tends to mask non-similarity, while small assay variation may detect irrelevant non-similarity. Callahan and Sajjadi (2003) and Hauck et al. (2005) used these issues as arguments to introduce equivalence testing for similarity in bioassay analysis as a replacement for the commonly used traditional hypothesis testing.

The equivalence testing approach to similarity has been adopted by the United States Pharmacopeia for bioassay development and validation (USP <1032>; USP <1033>) and for bioassay analysis (USP <1034>). It requires a predefined criterion on one or several parameters of the dose-response relationships for the test and standard preparations. For instance, in a parallel line assay, an equivalence criterion is needed either on the difference or on the ratio of the slopes of the two preparations (Hauck et al., 2005). This procedure resembles equivalence testing on the risk difference or risk ratio in a logistic regression analysis of a binary clinical outcome in clinical trials (Dann and Koch, 2008; Julious and Owen, 2011).

Although equivalence testing for similarity is part of the official guidelines, we believe that the concept has not been discussed critically enough in the literature. For instance, equivalence testing for similarity is not the same as equivalence testing for treatment effects in clinical trials. The latter investigates equivalence of a clinical outcome for two treatments, each administered at just one dose, while the former essentially investigates equivalence of a clinical outcome for two treatments at all doses. More specifically, the consequences of equivalence testing for similarity on the size of the relative bioactivity of the test preparation with respect to the standard at the intended doses have never been investigated, nor is the predefined criterion for the dose-response parameters based on any clinical relevance. Thus, the purpose of this chapter is to formulate equivalence testing for similarity in terms of the relative bioactivity and then discuss its feasibility in quantal and quantitative bioassays.

The chapter is organised as follows. In the next section, we start with a general formulation of dose-response curves for bioassays and provide expressions for the relative bioactivity for parallel line, slope-ratio, and two-, three-, four-, and five-parameter logistic dose-response curves. The third section describes traditional hypothesis testing and equivalence testing for similarity. The fourth section illustrates the testing methods on four different case studies and discusses sample sizes for equivalence testing. The final section is a critical discussion of the results and the practical limitations. A manuscript based on this chapter has been submitted for publication.

2.2 Statistical methodology

Let $Y_{ijk}$ be the $k$th (possibly transformed) biological outcome at the $j$th dose for preparation $i$ in a bioassay analysis, with $i = 1, 2$, $j = 1, 2, \ldots, m_i$, and $k = 1, 2, \ldots, n_{ij}$. Preparation $i = 1$ represents the standard preparation and preparation $i = 2$ represents the test preparation. The outcome $Y_{ijk}$ can be continuous (quantitative bioassays), discrete (count bioassays), or binary (quantal bioassays). The design of the bioassay is fully determined by the selection of doses and by how test units (e.g., animals or wells on a well-plate) are allocated to treatments (e.g., the doses of preparations). The number of doses will typically be the same for both preparations ($m_i = m$) and the number of replicates the same for each dose ($n_{ij} = n$), but this is not a requirement. We also assume that test units are randomly assigned to treatments.

A general formulation for the expected biological outcome of any of the three types of bioassays is

$$\mathrm{E}\,Y_{ijk} = \delta_i + (\gamma_i - \delta_i)\, F_{\eta_i}\big(\alpha_i + \beta_i \psi(x_{ij})\big), \qquad (2.1)$$

with $F_\eta$ a known monotone increasing or decreasing function, parameterised by $\eta = (\eta_1, \eta_2, \ldots, \eta_p)$, $z_{ij} = \psi(x_{ij})$ the selected dose metameter (i.e., a transformation of dose $x_{ij}$), and $\theta_i = (\alpha_i, \beta_i, \gamma_i, \delta_i, \eta_{1i}, \eta_{2i}, \ldots, \eta_{pi})$ an unknown set of $p + 4$ parameters for preparation $i$. The slope $\beta_i$ can be taken positive, since whether the dose-response relationship is increasing or decreasing is fully determined by the choice of the function $F_\eta$. Note that the function $F_\eta$ can also be free of parameters (i.e., $p = 0$). If $F_\eta$ is a distribution function, the parameters $\delta_i$ and $\gamma_i$ represent the lower and upper asymptote, respectively. Finally, the dose-response relationship (2.1) is a very general formulation, since it contains the dose-response relationships that are commonly used in bioassay analysis.
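As a minimal Python sketch (not the authors' SAS implementation), the mean function (2.1) can be written with $F_\eta$ and $\psi$ passed in as ordinary callables; the name `expected_response` is hypothetical. The loop checks numerically that the parallel line choices ($\psi = \log$, $\delta_i = 0$, $\gamma_i = 1$, $F$ the identity) turn (2.1) into a straight line in $\log x$, using the standard-preparation estimates of Section 2.4.1 purely as illustrative inputs.

```python
import math
from typing import Callable


def expected_response(x: float, alpha: float, beta: float, gamma: float,
                      delta: float, F: Callable[[float], float],
                      psi: Callable[[float], float]) -> float:
    """Mean response of (2.1): delta + (gamma - delta) * F(alpha + beta * psi(x))."""
    return delta + (gamma - delta) * F(alpha + beta * psi(x))


def identity(z: float) -> float:
    return z


# Parallel line choices: psi = natural log, delta = 0, gamma = 1, F = identity.
alpha, beta = -102.47, 75.13  # illustrative inputs only
for dose in (5.76, 9.6, 16.0):
    general = expected_response(dose, alpha, beta, gamma=1.0, delta=0.0,
                                F=identity, psi=math.log)
    straight_line = alpha + beta * math.log(dose)
    print(f"dose={dose:5.2f}  (2.1): {general:7.3f}  alpha + beta*log(x): {straight_line:7.3f}")
```

Keeping $F_\eta$ and $\psi$ as arguments mirrors the point of (2.1): the model family is fixed by these two choices, while $\theta_i$ carries the preparation-specific parameters.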

2.2.1 Dose-response relationships

The parallel line model is obtained by taking the dose metameter $z_{ij} = \psi(x_{ij}) = \log x_{ij}$, with log the natural logarithm, the parameters $\delta_i = 1 - \gamma_i = 0$, and $F_\eta(z) = z$ the identity function. Relation (2.1) then reduces to

$$\mathrm{E}\,Y_{ijk} = \alpha_i + \beta_i z_{ij}. \qquad (2.2)$$

The slope-ratio model has the same form as (2.2), but is obtained by choosing the dose metameter $z_{ij} = x_{ij}$, $\alpha_i = 1 - \beta_i = 0$, and the function $F_\eta(z) = z^\eta$. The intercept $\alpha_i$ and slope $\beta_i$ in (2.2) are then replaced by the parameters $\delta_i$ and $\gamma_i - \delta_i$ in (2.1), respectively. Note that these choices for the slope-ratio assay allow a dose power $\eta_i$ that depends on preparation $i$, while the choices $z_{ij} = x_{ij}^\eta$, $\delta_i = 1 - \gamma_i = 0$, and $F_\eta(z) = z$ lead to a power $\eta$ that must be known and independent of preparation, since the dose metameter is a known transformation that is applied to all doses, irrespective of preparation.
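The two slope-ratio parameterisations give the same mean curve $a + b\,x^\eta$ when $\eta$ is known and shared; the only difference is whether $\eta$ sits inside $F_\eta$ (so it may vary by preparation) or inside the metameter $\psi$ (so it must be fixed in advance). A small Python check, with hypothetical function names and parameter values:

```python
def slope_ratio_via_F(x, delta, gamma, eta):
    """(2.1) with z = x, alpha = 0, beta = 1 and F_eta(z) = z**eta."""
    z = x
    return delta + (gamma - delta) * (0.0 + 1.0 * z) ** eta


def slope_ratio_via_metameter(x, a, b, eta):
    """(2.2) with the known metameter z = x**eta: intercept a, slope b."""
    z = x ** eta
    return a + b * z


# Same curve when a = delta and b = gamma - delta (hypothetical values).
delta, gamma, eta = 2.0, 5.0, 0.5
for x in (0.5, 1.0, 4.0, 9.0):
    print(x, slope_ratio_via_F(x, delta, gamma, eta),
          slope_ratio_via_metameter(x, delta, gamma - delta, eta))
```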

In many bioassays, the linear relationship (2.2) in the dose metameter $z_{ij} = \psi(x_{ij})$ is only approximately true (Finney, 1978; Volund, 1978). On the whole domain $\mathbb{R}_{\geq 0}$, the dose-response relationship often has a sigmoid shape. For quantal bioassays, the two-parameter probit dose-response relationship is given by

$$\mathrm{E}\,Y_{ijk} = \Phi\big(\alpha_i + \beta_i \log x_{ij}\big), \qquad (2.3)$$

with $\Phi$ the standard normal distribution function. The intercept $\alpha_i$ is related to the well-known ED50, the dose that gives an event probability of 50%: since $\Phi(0) = 0.5$, the ED50 solves $\alpha_i + \beta_i \log x = 0$, that is, $\mathrm{ED50} = \exp(-\alpha_i/\beta_i)$. An alternative model for quantal bioassays replaces the normal distribution function $\Phi$ in (2.3) by the logistic distribution function $F(z) = \exp(z)/(1 + \exp(z))$.
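A quick numerical check of the ED50 expression, as a Python sketch with hypothetical parameter values ($\alpha = -3$, $\beta = 1.5$, so $\mathrm{ED50} = e^2$). It uses only the standard library (`statistics.NormalDist` for $\Phi$) and evaluates both the probit model (2.3) and its logistic alternative at $\exp(-\alpha/\beta)$, where both give probability 0.5.

```python
import math
from statistics import NormalDist


def probit_mean(x, alpha, beta):
    """Event probability under the probit model (2.3)."""
    return NormalDist().cdf(alpha + beta * math.log(x))


def logit_mean(x, alpha, beta):
    """Logistic alternative: Phi replaced by exp(z) / (1 + exp(z)), written as 1 / (1 + exp(-z))."""
    z = alpha + beta * math.log(x)
    return 1.0 / (1.0 + math.exp(-z))


def ed50(alpha, beta):
    """Dose at which alpha + beta * log(x) = 0, i.e. event probability 0.5."""
    return math.exp(-alpha / beta)


alpha, beta = -3.0, 1.5   # hypothetical parameters
d = ed50(alpha, beta)
print(d, probit_mean(d, alpha, beta), logit_mean(d, alpha, beta))
```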

The relationship in (2.3) implies no biological events for a blank dose (when $\beta_i > 0$), but this is not guaranteed in all quantal bioassays. In the context of virus bioassays (Ridout et al., 1993) and microbiology (IJzerman-Boon and Van den Heuvel, 2015), the following dose-response relationship has been proposed

$$\mathrm{E}\,Y_{ijk} = 1 - (1 - \delta_i)\exp(-\beta_i x_{ij}), \qquad (2.4)$$

with $\delta_i \in [0, 1)$. The relationship in (2.4) for quantal bioassays is often referred to as the complementary log-log dose-response curve. It fits the formulation in (2.1) by choosing either $z_{ij} = \psi(x_{ij}) = x_{ij}$, $\alpha_i = 1 - \gamma_i = 0$, and $F_\eta(z) = 1 - \exp(-z)$, the exponential distribution function, or $z_{ij} = \psi(x_{ij}) = \log x_{ij}$, $\beta_i = \gamma_i = 1$, and $F_\eta(z) = 1 - \exp(-\exp(z))$, the double exponential distribution function. For the second formulation, $\beta_i$ in (2.4) is equal to $\exp(\alpha_i)$ and differs from the $\beta_i$ defined in (2.1).
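The two ways of embedding (2.4) in (2.1) can be checked against the direct formula with a short Python sketch; the function names and the values $\delta = 0.1$, $\beta = 0.3$ are hypothetical, and the second formulation uses $\alpha = \log\beta$ so that $\beta = \exp(\alpha)$.

```python
import math


def cloglog_direct(x, delta, beta):
    """Model (2.4): 1 - (1 - delta) * exp(-beta * x)."""
    return 1.0 - (1.0 - delta) * math.exp(-beta * x)


def cloglog_exponential(x, delta, beta):
    """(2.1) with psi(x) = x, alpha = 0, gamma = 1 and F(z) = 1 - exp(-z)."""
    return delta + (1.0 - delta) * (1.0 - math.exp(-(0.0 + beta * x)))


def cloglog_double_exponential(x, delta, alpha):
    """(2.1) with psi(x) = log(x), beta = gamma = 1 and F(z) = 1 - exp(-exp(z))."""
    return delta + (1.0 - delta) * (1.0 - math.exp(-math.exp(alpha + math.log(x))))


delta, beta = 0.1, 0.3          # hypothetical parameters of (2.4)
alpha = math.log(beta)          # second formulation: beta = exp(alpha)
for x in (0.5, 2.0, 10.0):
    print(x, cloglog_direct(x, delta, beta),
          cloglog_exponential(x, delta, beta),
          cloglog_double_exponential(x, delta, alpha))
```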

The range of expected biological responses in quantitative bioassays typically differs from the range $[0, 1]$ of quantal bioassays. To obtain flexibility in the range of expected outcomes, the parameters $\gamma_i$ and $\delta_i$ can therefore no longer be restricted. However, the choices of $\psi$ and $F_\eta$ in quantitative bioassays may be similar to those in quantal bioassays. One particular dose-response relationship that is applied more to quantitative than to quantal bioassays is the five-parameter logistic curve

$$\mathrm{E}\,Y_{ijk} = \delta_i + (\gamma_i - \delta_i)\big[1 + \exp(-\alpha_i - \beta_i \log x_{ij})\big]^{-\eta_i}, \qquad (2.5)$$

with $\gamma_i, \delta_i \in \mathbb{R}$ and $\eta_i > 0$. The power parameter $\eta_i$ induces an asymmetric logistic dose-response relationship, which means that it models the curve below and above the ED50 differently (Gottschalk and Dunn, 2005). If the power is equal to one ($\eta_i = 1$), relationship (2.5) reduces to the symmetric four-parameter logistic curve. Note that Ricketts and Head (1999) discussed another type of five-parameter curve, which falls outside our general formulation (2.1) of dose-response relationships and is outside the scope of this thesis.
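The role of $\eta_i$ can be made concrete with a Python sketch (hypothetical names and parameter values). On the log-dose scale, the four-parameter curve is point-symmetric around the dose $m = \exp(-\alpha/\beta)$, so $f(mt) + f(m/t) = \gamma + \delta$ for every $t > 0$; for $\eta \neq 1$ this identity fails, which is the asymmetry the power parameter introduces.

```python
import math


def five_pl(x, alpha, beta, gamma, delta, eta):
    """Mean response of the five-parameter logistic curve (2.5)."""
    return delta + (gamma - delta) * (1.0 + math.exp(-alpha - beta * math.log(x))) ** (-eta)


def symmetry_gap(alpha, beta, gamma, delta, eta, t):
    """f(m * t) + f(m / t) - (gamma + delta), with m = exp(-alpha / beta) the dose where
    alpha + beta * log(x) = 0. It is zero for every t > 0 exactly when the curve is
    point-symmetric around m on the log-dose scale, which holds for eta = 1."""
    m = math.exp(-alpha / beta)
    return (five_pl(m * t, alpha, beta, gamma, delta, eta)
            + five_pl(m / t, alpha, beta, gamma, delta, eta) - (gamma + delta))


theta = dict(alpha=-3.0, beta=1.5, gamma=100.0, delta=5.0)  # hypothetical values
print(symmetry_gap(**theta, eta=1.0, t=3.0))   # four-parameter logistic: ~0
print(symmetry_gap(**theta, eta=2.5, t=3.0))   # asymmetric five-parameter curve: non-zero
```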

2.2.2 Relative bioactivities

The relative bioactivity is defined as the ratio $\rho(x_S)$ of doses that makes the expected biological response of the test preparation at dose $x_T = \rho(x_S)\,x_S$ equal to the expected biological response of the standard preparation at dose $x_S$ (Finney, 1978). Note that in our formulation, a relative bioactivity larger than one ($\rho(x_S) > 1$) makes the standard more potent than the test preparation, because the test preparation requires higher doses than the standard preparation to obtain the same biological response. In terms of our general formulation (2.1), the relative bioactivity $\rho(x_S)$ should satisfy the equality

$$\delta_1 + (\gamma_1 - \delta_1)\, F_{\eta_1}\big(\alpha_1 + \beta_1 \psi(x_S)\big) = \delta_2 + (\gamma_2 - \delta_2)\, F_{\eta_2}\big(\alpha_2 + \beta_2 \psi(\rho(x_S)\, x_S)\big).$$


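Under the assumption that both the dose metameter $\psi$ and the function $F_\eta$ have an inverse, the solution for the relative bioactivity $\rho(x_S)$ at dose $x_S$ is

$$\rho(x_S) = x_S^{-1}\, \psi^{-1}\!\left( \beta_2^{-1} \left[ F_{\eta_2}^{-1}\!\left( \frac{\delta_1 - \delta_2}{\gamma_2 - \delta_2} + \frac{\gamma_1 - \delta_1}{\gamma_2 - \delta_2}\, F_{\eta_1}\big(\alpha_1 + \beta_1 \psi(x_S)\big) \right) - \alpha_2 \right] \right). \qquad (2.6)$$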
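Because (2.6) only requires $F_{\eta}^{-1}$ and $\psi^{-1}$, it can be evaluated numerically for any invertible choice. The Python sketch below does this for the generalised logistic link of (2.5) with $\psi = \log$; all names and parameter values are hypothetical and were chosen so that the argument of $F_{\eta_2}^{-1}$ stays in $(0, 1)$. Since the two curves have different asymptotes and powers, $\rho(x_S)$ changes with $x_S$, and the printout confirms at each dose that the test preparation at $\rho(x_S)\,x_S$ reproduces the standard's response at $x_S$.

```python
import math


def F(z, eta):
    """Generalised logistic F_eta(z) = (1 + exp(-z))**(-eta), the link in (2.5)."""
    return (1.0 + math.exp(-z)) ** (-eta)


def F_inv(u, eta):
    """Inverse of F_eta, valid for 0 < u < 1."""
    return -math.log(u ** (-1.0 / eta) - 1.0)


def mean_response(x, theta):
    """Model (2.5) with theta = (alpha, beta, gamma, delta, eta) and psi = log."""
    alpha, beta, gamma, delta, eta = theta
    return delta + (gamma - delta) * F(alpha + beta * math.log(x), eta)


def rho_general(x_s, theta1, theta2):
    """Relative bioactivity (2.6) with psi = log, hence psi^{-1} = exp."""
    a1, b1, g1, d1, e1 = theta1
    a2, b2, g2, d2, e2 = theta2
    u = (d1 - d2) / (g2 - d2) + (g1 - d1) / (g2 - d2) * F(a1 + b1 * math.log(x_s), e1)
    return math.exp((F_inv(u, e2) - a2) / b2) / x_s


standard = (-2.0, 1.2, 1.9, 0.1, 1.5)   # hypothetical (alpha, beta, gamma, delta, eta)
test = (-2.5, 1.0, 2.0, 0.0, 1.2)       # hypothetical; asymptotes differ from the standard
for x_s in (0.5, 2.0, 8.0):
    r = rho_general(x_s, standard, test)
    print(f"x_S={x_s:4.1f}  rho={r:.4f}  "
          f"standard={mean_response(x_s, standard):.6f}  test={mean_response(r * x_s, test):.6f}")
```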

This relative bioactivity simplifies when the following parameter restrictions are imposed: $\gamma_1 = \gamma_2$, $\delta_1 = \delta_2$, and $\eta_1 = \eta_2$. These restrictions make the minimum and maximum expected biological responses the same for the two preparations and make the function $F_\eta$ identical for both preparations. The relative bioactivity is then

$$\rho(x_S) = x_S^{-1}\, \psi^{-1}\big([\alpha_1 - \alpha_2 + \beta_1 \psi(x_S)]/\beta_2\big). \qquad (2.7)$$

Thus, the relative bioactivity is now independent of the function $F_\eta$ and the parameters $\gamma_1, \gamma_2, \delta_1, \delta_2, \eta_1$, and $\eta_2$, but it still depends on the dose metameter $\psi$ and the parameters $\alpha_1, \alpha_2, \beta_1$, and $\beta_2$.

The relative bioactivities $\rho(x_S)$ for the parallel line model (2.2), the quantal bioassay (2.3), and the quantitative bioassay (2.5) are all equal to

$$\rho(x_S) = \exp\{(\beta_1/\beta_2 - 1)\log x_S\}\, \exp\{(\alpha_1 - \alpha_2)/\beta_2\},$$

when $\gamma_1 = \gamma_2$, $\delta_1 = \delta_2$, and $\eta_1 = \eta_2$. This expression always holds for the parallel line model, and for the two-parameter probit and logit models when the same sigmoid dose-response relationship is fitted to both preparations ($\eta_1 = \eta_2$). For the slope-ratio model (2.2), the relative bioactivity $\rho(x_S)$ is different because of the different dose metameter. In terms of the parameters in (2.2), it is given by

$$\rho(x_S) = \left[\frac{\alpha_1 - \alpha_2 + \beta_1 x_S^\eta}{\beta_2 x_S^\eta}\right]^{1/\eta},$$

when $\eta_1 = \eta_2 = \eta$, with the dose metameter in (2.2) taken as $\psi(x) = x^\eta$. The quantal bioassay (2.4) can be obtained in two different ways from formulation (2.1), using either the dose metameter $\psi(x) = \log x$ or $\psi(x) = x$, and, when $\delta_1 = \delta_2$, the relative bioactivity (2.7) depends only on the dose metameter and on $\alpha_1, \alpha_2, \beta_1$, and $\beta_2$. In both cases, the parameters in (2.1) that generate (2.4) satisfy a constraint ($\beta_i = 1$ for $\psi(x) = \log x$, $\alpha_i = 0$ for $\psi(x) = x$) that makes the relative bioactivity dose-independent. Indeed, the formulation with the exponential distribution has relative bioactivity $\rho(x_S) = \beta_1/\beta_2$, while the formulation with the double exponential distribution has relative bioactivity $\rho(x_S) = \exp\{\alpha_1 - \alpha_2\}$. These relative bioactivities are of course identical, since the two formulations are connected by $\beta_i = \exp\{\alpha_i\}$.
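The closed forms above can be cross-checked against the generic expression (2.7) with a short Python sketch; the function names and all parameter values are hypothetical, and for the power metameter the values keep $\alpha_1 - \alpha_2 + \beta_1 x_S^\eta$ positive. The last two lines evaluate (2.7) for the two complementary log-log parameterisations and return the same dose-free value $\beta_1/\beta_2 = \exp\{\alpha_1 - \alpha_2\}$ at every dose.

```python
import math


def rho_27(x_s, a1, b1, a2, b2, psi, psi_inv):
    """Relative bioactivity (2.7) for an invertible dose metameter psi."""
    return psi_inv((a1 - a2 + b1 * psi(x_s)) / b2) / x_s


def rho_log_closed(x_s, a1, b1, a2, b2):
    """Closed form for psi = log (parallel line, probit/logit, 4PL/5PL)."""
    return math.exp((b1 / b2 - 1.0) * math.log(x_s)) * math.exp((a1 - a2) / b2)


def rho_power_closed(x_s, a1, b1, a2, b2, eta):
    """Closed form for the slope-ratio metameter psi(x) = x**eta."""
    return ((a1 - a2 + b1 * x_s ** eta) / (b2 * x_s ** eta)) ** (1.0 / eta)


eta = 0.5
for x_s in (0.25, 1.0, 4.0):   # hypothetical parameter values throughout
    print(rho_27(x_s, 1.0, 2.0, 0.5, 1.5, math.log, math.exp),
          rho_log_closed(x_s, 1.0, 2.0, 0.5, 1.5),
          rho_27(x_s, 1.0, 2.0, 0.5, 1.5, lambda x: x ** eta, lambda u: u ** (1.0 / eta)),
          rho_power_closed(x_s, 1.0, 2.0, 0.5, 1.5, eta))

# Complementary log-log (2.4): both parameterisations give a dose-free ratio.
b1, b2 = 0.3, 0.2
print([rho_27(x, 0.0, b1, 0.0, b2, lambda v: v, lambda v: v) for x in (0.5, 5.0)])
print([rho_27(x, math.log(b1), 1.0, math.log(b2), 1.0, math.log, math.exp) for x in (0.5, 5.0)])
```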


2.3 Hypothesis testing

In mathematical terms, similarity means that the relative bioactivity is independent of dose, that is, $\rho(x_S) = \rho_0$. It is an important aspect of bioassay analysis (USP <1030>), since we frequently do not know at which dose $x_S$ in the bioassay we need to calculate the relative bioactivity $\rho(x_S)$, although we might know the biological properties of the standard preparation at a certain nominal dose when it is applied to subjects. Dose levels used in the bioassay might be substantially lower than dose levels used in practice, in particular in the pharmaceutical industry, where medications are tested on cell material or on animals while the medication itself is intended for humans. Translating a nominal dose of the standard preparation for subjects in practice to a single dose in the bioassay is impossible without a study that links biological responses in the bioassay to biological responses of subjects in practice. If such a translation exists, the dose $x_S$ in the bioassay is known and the relative bioactivity of the test preparation with respect to the standard preparation in the bioassay can be calculated with $\rho(x_S)$ in (2.6).

In such situations, there is no need for either traditional (see Section 2.3.1) or equivalence (see Section 2.3.2) hypothesis testing, since we are only interested in $\rho(x_S)$. In all other cases, however, we need to demonstrate similarity in the bioassay, since it is the minimal requirement for the test preparation to behave biologically like the standard preparation at all doses. Moreover, similarity in the bioassay may suggest that only the constituent or active ingredient in the standard and test preparations drives the biological response, and that other factors, such as impurities in the test preparation, do not influence the biological effect (Finney, 1978).

2.3.1 Traditional hypothesis testing

The relative bioactivities of the quantal and quantitative bioassays in (2.2)–(2.5) can all be made dose-independent by imposing constraints on the parameters $\alpha_1, \alpha_2, \beta_1$, and $\beta_2$, in addition to the earlier restrictions $\gamma_1 = \gamma_2$, $\delta_1 = \delta_2$, and $\eta_1 = \eta_2$. The additional restrictions are $\beta_1 = \beta_2$ for the log dose metameter ($\psi(x) = \log x$) and $\alpha_1 = \alpha_2$ for the power dose metameter ($\psi(x) = x^\eta$). The dose-independent relative bioactivities are then $\rho_0 = \exp\{(\alpha_1 - \alpha_2)/\beta_2\}$ and $\rho_0 = (\beta_1/\beta_2)^{1/\eta}$ for these two dose metameters, respectively. Given these dose-independent relative bioactivities, the dose-dependent relative bioactivities can be written in general terms as $\rho(x_S) = g_\theta(x_S)\,\rho_0$, with $\theta = (\theta_1, \theta_2)$ and $\theta_i$ as defined at the beginning of Section 2.2. For the log dose metameter, $g_\theta(x_S) = \exp\{(\beta_1/\beta_2 - 1)\log x_S\}$, and for the power dose metameter

$$g_\theta(x_S) = \left[\frac{\alpha_1 - \alpha_2 + \beta_1 x_S^\eta}{\beta_1 x_S^\eta}\right]^{1/\eta}$$

(when $\gamma_1 = \gamma_2$, $\delta_1 = \delta_2$, and $\eta_1 = \eta_2 = \eta$ hold). The traditional null and alternative hypotheses for similarity can now be formulated as

$$H_0: \forall x,\ g_\theta(x) = 1, \qquad H_1: \exists x,\ g_\theta(x) \neq 1. \qquad (2.8)$$

For the parallel line bioassay (2.2) and the quantal bioassay (2.3), the null hypothesis in (2.8) reduces to $H_0: \beta_1 = \beta_2$. For the slope-ratio model (2.2), the null hypothesis is $H_0: \alpha_1 = \alpha_2 \wedge \eta_1 = \eta_2$. The null hypothesis for the five-parameter quantitative bioassay (2.5) is $H_0: \beta_1 = \beta_2 \wedge \gamma_1 = \gamma_2 \wedge \delta_1 = \delta_2 \wedge \eta_1 = \eta_2$, with the last equation dropped for the four-parameter dose-response relationship (i.e., $H_0: \beta_1 = \beta_2 \wedge \gamma_1 = \gamma_2 \wedge \delta_1 = \delta_2$). Finally, the quantal bioassay (2.4) does not need a test for similarity, since its relative bioactivity is always dose-independent and similarity is guaranteed automatically.
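The factorisation $\rho(x_S) = g_\theta(x_S)\,\rho_0$ for the two dose metameters can be sketched in Python as follows (hypothetical function names and values). Under the restrictions of the null hypothesis in (2.8), $\beta_1 = \beta_2$ or $\alpha_1 = \alpha_2$, the function $g_\theta$ is identically one, so that $\rho(x_S) = \rho_0$ at every dose.

```python
import math


def g_log(x, b1, b2):
    """g_theta for the log metameter: exp{(beta1/beta2 - 1) log x}."""
    return math.exp((b1 / b2 - 1.0) * math.log(x))


def rho0_log(a1, a2, b2):
    """Dose-independent part for the log metameter: exp{(alpha1 - alpha2)/beta2}."""
    return math.exp((a1 - a2) / b2)


def g_power(x, a1, a2, b1, eta):
    """g_theta for the power metameter psi(x) = x**eta."""
    return ((a1 - a2 + b1 * x ** eta) / (b1 * x ** eta)) ** (1.0 / eta)


def rho0_power(b1, b2, eta):
    """Dose-independent part for the power metameter: (beta1/beta2)**(1/eta)."""
    return (b1 / b2) ** (1.0 / eta)


# Under the null hypothesis of (2.8), g_theta is identically 1 (hypothetical values):
print([g_log(x, 2.0, 2.0) for x in (0.1, 1.0, 10.0)])              # beta1 = beta2
print([g_power(x, 0.7, 0.7, 2.0, 0.5) for x in (0.1, 1.0, 10.0)])  # alpha1 = alpha2
```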

2.3.2 Equivalence testing

The form $\rho(x_S) = g_\theta(x_S)\,\rho_0$, with $\rho_0$ the dose-independent relative bioactivity, and the formulation of the traditional null hypothesis in (2.8) both suggest that equivalence might be formulated as

$$H_0: \exists x,\ |\log g_\theta(x)| > \Delta, \qquad H_1: \forall x,\ |\log g_\theta(x)| \leq \Delta, \qquad (2.9)$$

with $\Delta$ the predefined acceptance criterion. If the null hypothesis in (2.9) were rejected, we would have demonstrated (with high confidence) that the relative bioactivity $\rho(x_S)$ of the test preparation with respect to the standard preparation falls within the interval $[\rho_0 \exp\{-\Delta\}, \rho_0 \exp\{\Delta\}]$ for every dose $x$. One reasonable choice for the acceptance criterion is the usual bioequivalence limit $\Delta = \log(1.25) = -\log(0.8) \approx 0.22314$, which corresponds to the interval $[0.8\,\rho_0, 1.25\,\rho_0]$. In terms of (2.9), equivalence on similarity would imply that the test preparation is, at each dose $x$, bioequivalent to a standard preparation with bioactivity $\rho_0$ (referred to below simply as bioequivalent with the standard preparation).

The problem with equivalence testing on similarity in (2.9) is that the function $g_\theta(x_S)$ is typically unbounded. Depending on the parameter values $\theta = (\theta_1, \theta_2)$, $|\log g_\theta(x_S)|$ grows without bound as the dose increases or as it decreases. For example, the function $g_\theta(x_S) = \exp\{(\beta_1/\beta_2 - 1)\log x_S\}$, applicable to the parallel line model and to the probit and logit dose-response relationships in quantal or quantitative bioassays, converges to infinity as the dose increases to infinity whenever $\beta_1 > \beta_2$ (and as the dose decreases to zero whenever $\beta_1 < \beta_2$). Thus, we cannot guarantee in general that the relative bioactivity $\rho(x_S)$ remains within the interval $[\rho_0 \exp\{-\Delta\}, \rho_0 \exp\{\Delta\}]$ for all doses $x_S$.
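For the log metameter, $|\log g_\theta(x)| = |\beta_1/\beta_2 - 1|\,|\log x|$, so the doses at which the bioequivalence bound still holds form a finite window as soon as $\beta_1 \neq \beta_2$. A Python sketch with hypothetical names and slopes ($\beta_2 = 1$) shows that even a tiny slope difference gives a finite window, which widens only as $\beta_1/\beta_2$ approaches one.

```python
import math


def abs_log_g(x, b1, b2):
    """|log g_theta(x)| = |beta1/beta2 - 1| * |log x| for the log dose metameter."""
    return abs((b1 / b2 - 1.0) * math.log(x))


def equivalence_dose_window(b1, b2, delta):
    """Doses (x_low, x_high) between which |log g_theta(x)| <= delta.

    The window is finite whenever beta1 != beta2; it is (0, inf) only under exact similarity."""
    c = abs(b1 / b2 - 1.0)
    if c == 0.0:
        return 0.0, math.inf
    half_width = delta / c          # allowed range for |log x|
    return math.exp(-half_width), math.exp(half_width)


delta = math.log(1.25)
for b1 in (1.20, 1.02, 1.002):      # hypothetical slopes, beta2 = 1
    print(b1, equivalence_dose_window(b1, 1.0, delta))
```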

A possible solution to this issue is to select a window or interval $I_B$ of doses with finite lower and upper boundaries $x_L$ and $x_U$, respectively, that is, $-M < \log x_L < \log x_U < M$ for some finite constant $M$. The interval $I_B$ can be open $(x_L, x_U)$, half-open $(x_L, x_U]$ or $[x_L, x_U)$, or closed $[x_L, x_U]$, but in all cases the function $g_\theta(x)$ will be bounded on $I_B$. Equivalence testing for similarity is then still formulated by (2.9), but now with the doses restricted to a finite interval:

$$H_0: \exists x \in I_B,\ |\log g_\theta(x)| > \Delta, \qquad H_1: \forall x \in I_B,\ |\log g_\theta(x)| \leq \Delta. \qquad (2.10)$$

The null hypothesis in (2.10) can now be tested with the two one-sided tests procedure: a 90% confidence interval is constructed for $\log g_\theta(x_B)$, with $x_B$ the lower or upper boundary of $I_B$, whichever gives the largest $|\log g_\theta|$, and $H_0$ is rejected if this interval is fully contained in $[-\Delta, \Delta]$. It is common practice to use 90% confidence intervals for bioequivalence testing, since equivalence testing is essentially a one-sided hypothesis test executed twice, once for each boundary (FDA, 2001). This choice corresponds to a type I error rate of 5%.
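The decision rule can be sketched in Python, assuming a normal approximation for the estimate of $\log g_\theta(x_B)$ and a standard error obtained from the delta method (as NLMIXED would provide); the function name and the numbers in the usage lines are hypothetical. Two one-sided tests at level $\alpha$ are carried out by checking whether the $(1 - 2\alpha)$ confidence interval lies inside $[-\Delta, \Delta]$.

```python
from statistics import NormalDist


def tost_via_ci(estimate, se, delta, alpha=0.05):
    """Equivalence test of (2.10) at a single boundary dose x_B.

    `estimate` and `se` are an estimate of log g_theta(x_B) and its (delta-method)
    standard error; a normal approximation is assumed. Two one-sided tests at level
    alpha are equivalent to checking that the (1 - 2 * alpha) confidence interval
    lies inside [-delta, delta]."""
    z = NormalDist().inv_cdf(1.0 - alpha)      # about 1.645 for alpha = 0.05
    lower, upper = estimate - z * se, estimate + z * se
    return (lower, upper), (-delta <= lower and upper <= delta)


delta = 0.22314                                 # log(1.25)
print(tost_via_ci(0.05, 0.05, delta))           # narrow interval: equivalence shown
print(tost_via_ci(0.05, 0.20, delta))           # wide interval: equivalence not shown
```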

Choosing an appropriate interval may not be straightforward, but two practical solutions may be considered. One option is to choose an interval $I_B$ equal to the dose range $I_H = [x_{11}, x_{1m_1}]$ on the horizontal axis that is used in the bioassay for the standard preparation. Equivalence testing (2.10) on the interval $I_H$ would then imply that the test preparation is (bio)equivalent to the standard preparation in the bioassay. This might be a suitable solution for the parallel line model and the slope-ratio model, but not for sigmoid dose-response functions. For sigmoid dose-response functions, the function $g_\theta(x_S)$ will typically deviate strongly from one at doses $x_S$ for which the expected biological response of the standard preparation is close to the lower or upper asymptote of the sigmoid dose-response relationship. This is illustrated in Figure 2.1. In this example, the relative bioactivity is equal to one at the ED50, since the probability of 0.5 is attained at the same dose for both preparations. Around this dose, the relative bioactivity is still close to one, but when the biological outcome increases to the upper asymptote (or decreases to the lower asymptote), the relative bioactivity deviates strongly from one.

Instead of using a dose interval $I_H$ on the horizontal axis, one may define a range of doses $I_V$ through constraints on the expected biological outcome on the vertical axis, by choosing responses that lie well within the asymptotes. One may select an expected biological range $[y_L, y_U]$ defined by the standard through $[y_L, y_U] = [\delta_1 + (\gamma_1 - \delta_1)\,p,\ \delta_1 + (\gamma_1 - \delta_1)(1 - p)]$, with $p \in (0, 1/2)$. Through the inverse dose-response relationship, this results in a dose range $I_V$ that depends on the parameters $\alpha_1$, $\beta_1$, and $\eta_1$:

$$I_V = \left[\psi^{-1}\!\left(\frac{F_{\eta_1}^{-1}(p) - \alpha_1}{\beta_1}\right),\ \psi^{-1}\!\left(\frac{F_{\eta_1}^{-1}(1 - p) - \alpha_1}{\beta_1}\right)\right].$$

The choice of $p$ is somewhat arbitrary, but we believe that $p = 0.05$ or $p = 0.01$ might be a reasonable choice. Figure 2.1 shows a dose range $I_V$ (between the vertical lines) for a quantal bioassay with a logistic dose-response relationship and $p = 0.05$.
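For a logistic curve with $\psi = \log$ and $\beta_1 > 0$, the interval $I_V$ has a closed form, since $F^{-1}(p) = \log\{p/(1 - p)\}$. The Python sketch below computes it for hypothetical standard-preparation parameters; the function name is an assumption. A smaller $p$ pushes the limits closer to the asymptotes and therefore widens the dose range, and the two limits are symmetric around the ED50 on the log-dose scale.

```python
import math


def logit(p):
    """Inverse of the logistic distribution function F(z) = exp(z) / (1 + exp(z))."""
    return math.log(p / (1.0 - p))


def dose_range_iv(alpha1, beta1, p):
    """I_V for a logistic dose-response curve with psi = log and beta1 > 0."""
    if not 0.0 < p < 0.5:
        raise ValueError("p must lie in (0, 1/2) so that y_L < y_U")
    lower = math.exp((logit(p) - alpha1) / beta1)
    upper = math.exp((logit(1.0 - p) - alpha1) / beta1)
    return lower, upper


alpha1, beta1 = -3.0, 1.5                 # hypothetical standard-preparation parameters
for p in (0.05, 0.01):
    print(p, dose_range_iv(alpha1, beta1, p))
```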

Figure 2.1: A dose range for equivalence testing derived from the standard preparation for a range of expected biological outcomes away from the asymptotes.

2.4 Case studies

Different data sets are used to illustrate the concept of equivalence testing (2.10) on the relative bioactivity. Data from parallel line, slope-ratio, and quantal bioassays were selected from Finney (1978), while data for a quantitative sigmoid cell-based bioassay were obtained from Merck Sharp & Dohme (MSD), Oss, The Netherlands. These cases provide a good mix of classical, well-accepted bioassays and a modern type of in vitro bioassay used for the release of medicinal products. The bioassay data are analysed with maximum likelihood estimation using the NLMIXED procedure of the SAS software, version 9.4. This procedure can fit nonlinear mixed effects models (Vonesh, 2012) and can directly estimate $g_\theta(x)$, or other functions of the parameters derived from (2.10), together with confidence intervals based on the delta method. The results for the data from Finney (1978) may differ slightly from those reported by Finney (1978), who mostly used least squares.

2.4.1 Parallel line bioassays

Using the interval $I_B = I_H$ for the parallel line assay, equivalence testing (2.10) results in a direct criterion on the ratio of the slopes of the two regression lines. The alternative hypothesis for equivalence testing then becomes

$$1 - \Delta/\log x_0 \leq \beta_1/\beta_2 \leq 1 + \Delta/\log x_0, \qquad (2.11)$$

with $x_0$ equal to the upper bound $x_0 = x_{1m_1}$ of $I_H$ (this assumes $\log x_0 > 0$ and that $|\log x|$ on $I_H$ is largest at the upper bound, as in the example below). Equivalence testing on the ratio of the slopes has been suggested before (Hauck et al., 2005), but the alternative suggestion to apply equivalence testing to the difference in slopes $|\beta_1 - \beta_2|$ (Hauck et al., 2005; USP <1032>, 2010) seems less natural in this scenario. Furthermore, alternative hypothesis (2.11), combined with a bioequivalence acceptance criterion, immediately shows the biological relevance of equivalence testing in the analysis of parallel line assays. Implementing equivalence testing in parallel line models is now straightforward through a confidence interval on the ratio $\beta_1/\beta_2$, using either Fieller's theorem or the delta method.
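A Python sketch of the acceptance interval (2.11) and of the two confidence intervals for $\beta_1/\beta_2$ mentioned above, assuming independent slope estimators and a normal quantile of 1.645. The function names are hypothetical, and the standard errors in the usage lines are made up for illustration; they are not taken from the analysis. The Fieller interval is asymmetric around the point estimate, while the delta-method interval is symmetric; the two nearly coincide when the standard errors are small.

```python
import math


def acceptance_interval(x0, delta=math.log(1.25)):
    """Bounds for beta1/beta2 in (2.11); assumes log(x0) > 0."""
    half_width = delta / math.log(x0)
    return 1.0 - half_width, 1.0 + half_width


def ratio_ci_delta(b1, b2, se1, se2, z=1.645):
    """Delta-method interval for b1/b2 with independent slope estimates."""
    r = b1 / b2
    se_r = math.sqrt(se1 ** 2 / b2 ** 2 + se2 ** 2 * b1 ** 2 / b2 ** 4)
    return r - z * se_r, r + z * se_r


def ratio_ci_fieller(b1, b2, se1, se2, z=1.645):
    """Fieller interval for b1/b2 with independent estimates: the set of r with
    (b1 - r*b2)^2 <= z^2 (se1^2 + r^2 se2^2), i.e. between the roots of
    (b2^2 - z^2 se2^2) r^2 - 2 b1 b2 r + (b1^2 - z^2 se1^2) = 0."""
    a = b2 ** 2 - (z * se2) ** 2
    if a <= 0.0:
        raise ValueError("b2 is not clearly non-zero; the Fieller set is unbounded")
    b = b1 * b2
    c = b1 ** 2 - (z * se1) ** 2
    root = math.sqrt(b * b - a * c)   # positive whenever a > 0
    return (b - root) / a, (b + root) / a


print(acceptance_interval(16.0))                  # x0 = 16, the highest standard dose
se1, se2 = 10.0, 7.0                              # hypothetical standard errors
print(ratio_ci_delta(75.13, 73.37, se1, se2))
print(ratio_ci_fieller(75.13, 73.37, se1, se2))
```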

The data in Table 4.2.1 on page 70 of Finney (1978) give the percentage bone ash of 55 chickens in an assay of cod-liver oil for vitamin D3. The standard preparation is administered at three doses (5.76, 9.6, and 16 BSI units/100 g food) and the test preparation at four doses (32.4, 54, 90, and 150 mg oil/100 g food). Fitting Model (2.2) with dose metameter $z_{ij} = \log x_{ij}$ resulted in the maximum likelihood estimates $\hat\alpha_1 = -102.47$ [−148.98, −55.97], $\hat\alpha_2 = -239.63$ [−300.76, −178.51], $\hat\beta_1 = 75.13$ [54.73, 95.54], and $\hat\beta_2 = 73.37$ [59.15, 87.58]. The residual variance is estimated as 458.91 [283.53, 634.28], and the likelihood ratio test for equality of slopes does not reject parallelism (P = 0.887). The delta method gives the estimate 1.024 for $\beta_1/\beta_2$, with 90% confidence interval [0.739, 1.309]. The bioequivalence acceptance criterion at the maximum dose of 16 BSI units/100 g for the standard preparation gives the acceptance interval [0.919, 1.081] for equivalence testing (2.11). Although similarity is not rejected, the parallel line assay cannot demonstrate bioequivalence of the relative potency of the test preparation with respect to the standard preparation in the bioassay.

A minimal number of test units needed to confirm bioequivalence in the current design can be obtained from the confidence interval on β1/β2. The asymptotic standard error τ1i of the maximum likelihood estimator of βi in linear Model (2.2) satisfies

$$\tau_{1i}^2 = \frac{n_{i\cdot}\,\sigma^2}{n_{i\cdot}\sum_{j=1}^{m_i} n_{ij} z_{ij}^2 - \left(\sum_{j=1}^{m_i} n_{ij} z_{ij}\right)^2},$$

with zij = log(xij) and ni· the total number of chickens used for preparation i. The maximum likelihood estimators of β1 and β2 are uncorrelated, since a linear regression model is fitted for each preparation separately and the estimate of the standard deviation is uncorrelated with these estimates. Applying the delta method to the estimate of β1/β2 gives a standard error τR with

$$\tau_R^2 = \frac{\tau_{11}^2}{\beta_2^2} + \frac{\tau_{12}^2\,\beta_1^2}{\beta_2^4},$$

and an approximate 90% confidence interval [β1/β2 − 1.65τR, β1/β2 + 1.65τR], where τR is replaced by its maximum likelihood estimate. Using a balanced design (nij = n) with the same set of doses (m1 = 3, m2 = 4) and the observed estimates from the analysis, the number of chickens per dose must be at least n = 98. Thus, showing bioequivalence of the test preparation with the standard in this bioassay requires at least 686 chickens (98 per dose over seven doses), more than twelve times the 55 chickens in the current assay.
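The sample-size reasoning above can be reproduced with a short sketch. It assumes a balanced design, treats σ² and the slope estimates as known, and assumes similarity holds, so the interval is centred at a ratio of 1 and 1.65τR must not exceed Δ/log(x0) with x0 = 16. Under these assumptions the sketch returns n = 98 per dose and 686 chickens in total; the helper names are hypothetical.

```python
import math

DELTA = math.log(1.25)   # bioequivalence margin for [0.8, 1.25]
Z = 1.65                 # normal quantile used for the 90% interval in the text


def sxx_per_unit(doses):
    """Sum of squared deviations of z = log(dose) for one unit per dose.

    With n_ij = n, the denominator n_i. * sum n z^2 - (sum n z)^2 equals
    n^2 * m_i * Sxx, so tau_1i^2 = sigma^2 / (n * Sxx).
    """
    z = [math.log(x) for x in doses]
    zbar = sum(z) / len(z)
    return sum((zi - zbar) ** 2 for zi in z)


def tau_r_squared(n, sigma2, b1, b2, doses1, doses2):
    """Delta-method variance of b1/b2 for n units per dose (uncorrelated slopes)."""
    tau11_sq = sigma2 / (n * sxx_per_unit(doses1))
    tau12_sq = sigma2 / (n * sxx_per_unit(doses2))
    return tau11_sq / b2 ** 2 + tau12_sq * b1 ** 2 / b2 ** 4


def minimal_n_per_dose(sigma2, b1, b2, doses1, doses2, delta=DELTA, z=Z):
    """Smallest n with z * tau_R <= delta / log(x0), x0 = largest standard dose."""
    half_width = delta / math.log(max(doses1))
    c = tau_r_squared(1, sigma2, b1, b2, doses1, doses2)  # tau_R^2 scales as 1/n
    return math.ceil(c * (z / half_width) ** 2)


standard = [5.76, 9.6, 16.0]        # BSI units/100 g food
test = [32.4, 54.0, 90.0, 150.0]    # mg oil/100 g food
n = minimal_n_per_dose(458.91, 75.13, 73.37, standard, test)
total = n * (len(standard) + len(test))
print(n, total)  # 98 686
```

Because τR² is proportional to 1/n, the minimal n follows in closed form rather than by searching.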

Page 43: Statistical methods for the analysis of bioassay dataThis work was financially supported by the pharmaceutical company MSD based in Oss, The Netherlands. c ThembileVirginiaMzolo,2016


Note that this sample size is truly a minimum and may not lead to a demonstration of equivalence with high probability in practice. The reason is that the sample size was calculated under the assumption that similarity holds, and it ignores the uncertainty in the estimated standard error used in the confidence interval. A small simulation study of the parallel line model with the estimated parameters (rounded to integers) suggests a sample size of 1100 chickens per dose to have an approximately 80% probability of demonstrating equivalence.

2.4.2 Slope-ratio bioassays

The function

$$g_\theta(x) = \left[\frac{\alpha_1-\alpha_2+\beta_1 x^{\eta}}{\beta_1 x^{\eta}}\right]^{1/\eta},$$

with x a dose of the standard preparation, does not in general lead to a simple criterion on just a few parameters for the slope-ratio bioassay, as the ratio of slopes does for the parallel line bioassay, unless the power η of the dose metameter is known (typically η = 1). In this particular case, equivalence testing for the slope-ratio bioassay is determined by criteria on the ratio (α1 − α2)/β1. As a result, the alternative hypothesis in (2.10) for dose range IH is given by

$$x_0^{\eta}\left(e^{-\Delta\eta}-1\right) \le \frac{\alpha_1-\alpha_2}{\beta_1} \le x_0^{\eta}\left(e^{\Delta\eta}-1\right), \qquad (2.12)$$

with x0 equal to the lower bound x0 = x11 of IH. Note that slope-ratio assays may include blank doses, but a blank dose cannot be the lower bound of IH, since a blank dose in (2.12) would lead to the traditional null hypothesis of similarity in a slope-ratio assay. Thus x0 is the minimum non-blank dose of the standard preparation used in the bioassay.

Data on a slope-ratio bioassay of nicotinic acid in meat are given by Finney (1978) in Table 7.3.1 on page 150. Besides the blank dose, five doses were administered for the standard preparation (ranging from 0.05 to 0.25 µg/tube) and three doses for the test preparation. Duplicate assay tubes were prepared for each dose. Details of the assay can be found in Finney (1978). A linear regression model with dose as the predictor (η = 1) was suggested. With the blank dose ignored, the maximum likelihood estimates were α1 = 1.77 [1.56, 1.98], α2 = 2.04 [1.61, 2.48], β1 = 30.40 [29.15, 31.65], and β2 = 2.85 [2.57, 3.13]. The residual variance is estimated as 0.017 [0.004, 0.030], and the likelihood ratio test for equality of intercepts does not reject similarity (P = 0.242). The delta method gives (α1 − α2)/β1 = −0.009 with 90% confidence interval [−0.022, 0.004]. The acceptance range for bioequivalence in (2.12) at the minimum dose x0 = 0.05 is determined as [−0.0125, 0.0125]. The confidence interval is not contained in this range, so bioequivalence of the test preparation with the standard preparation cannot be claimed in this slope-ratio bioassay.
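For η known, gθ can also be evaluated directly. The sketch below (hypothetical function name; rounded point estimates from the nicotinic acid case study, η = 1) shows the point estimate of gθ moving further from 1 as the dose decreases towards the smallest non-blank dose, which is why x0 = x11 drives criterion (2.12). The equivalence decision itself still rests on the confidence interval for (α1 − α2)/β1, not on these point values.

```python
def g_slope_ratio(x, a1, a2, b1, eta=1.0):
    """g_theta(x) = [(a1 - a2 + b1 x^eta) / (b1 x^eta)]^(1/eta) for a standard dose x > 0."""
    xe = x ** eta
    inner = (a1 - a2 + b1 * xe) / (b1 * xe)
    if inner <= 0:
        raise ValueError("g_theta is not defined at this dose")
    return inner ** (1.0 / eta)


# Rounded nicotinic acid estimates (eta = 1, blank dose ignored).
a1, a2, b1 = 1.77, 2.04, 30.40
for x in (0.05, 0.10, 0.25):
    print(x, round(g_slope_ratio(x, a1, a2, b1), 3))
# 0.05 0.822
# 0.1 0.911
# 0.25 0.964
```

As x approaches the blank dose, the term (α1 − α2)/(β1 x) grows without bound whenever α1 ≠ α2, so no finite acceptance range can hold down to the blank.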

When the power η is known, the delta-method confidence interval for the ratio (α1 − α2)/β1 can be constructed in the same way as for the parallel line assay, although it is somewhat more elaborate. The asymptotic standard error τ0i of the maximum likelihood estimator of αi satisfies

$$\tau_{0i}^2 = \frac{\sigma^2\sum_{j=1}^{m_i} n_{ij} z_{ij}^2}{n_{i\cdot}\sum_{j=1}^{m_i} n_{ij} z_{ij}^2 - \left(\sum_{j=1}^{m_i} n_{ij} z_{ij}\right)^2},$$

with zij = xij^η and ni· the total number of test tubes used for preparation i. The asymptotic standard error τ1i of the maximum likelihood estimator of βi again satisfies

$$\tau_{1i}^2 = \frac{n_{i\cdot}\,\sigma^2}{n_{i\cdot}\sum_{j=1}^{m_i} n_{ij} z_{ij}^2 - \left(\sum_{j=1}^{m_i} n_{ij} z_{ij}\right)^2}.$$

Since α1 and β1 both enter (α1 − α2)/β1, we also need the covariance between the estimates of αi and βi, which is asymptotically

$$\tau_{01i} = -\frac{\sigma^2\sum_{j=1}^{m_i} n_{ij} z_{ij}}{n_{i\cdot}\sum_{j=1}^{m_i} n_{ij} z_{ij}^2 - \left(\sum_{j=1}^{m_i} n_{ij} z_{ij}\right)^2}.$$

The delta-method standard error τR of the maximum likelihood estimator of (α1 − α2)/β1 is then given by

$$\tau_R^2 = \frac{\tau_{01}^2 + \tau_{02}^2}{\beta_1^2} - \frac{2\,\tau_{011}\,(\alpha_1-\alpha_2)}{\beta_1^3} + \frac{\tau_{11}^2\,(\alpha_1-\alpha_2)^2}{\beta_1^4}.$$

For a balanced design (nij = n) with the same set of doses (m1 = 5, m2 = 3), excluding the blank dose, the number of test tubes per dose is calculated as n = 16 for acceptance criterion [−0.0125, 0.0125] when the estimates of the case study are used. Demonstrating bioequivalence of the test preparation with the standard preparation in the bioassay therefore requires at least 128 test tubes, eight times the 16 non-blank tubes of the current assay. Again, this underestimates the sample size needed for equivalence, because the variance is treated as fixed.
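The delta-method variance above can be written out as a small Python sketch. Assumptions: the five standard doses are taken as equally spaced from 0.05 to 0.25, and because the three test-preparation doses are not listed in this section, the test doses in the code are placeholders, so the resulting n is illustrative only and not a reproduction of n = 16. Function names are hypothetical; the interval is assumed to be centred at 0.

```python
import math


def ols_cov(doses, n, sigma2, eta=1.0):
    """Asymptotic (var_alpha, var_beta, cov_alpha_beta) for n tubes per dose, z = x^eta."""
    z = [x ** eta for x in doses]
    s0 = n * len(z)                    # n_i.
    s1 = n * sum(z)                    # sum_j n_ij z_ij
    s2 = n * sum(v * v for v in z)     # sum_j n_ij z_ij^2
    den = s0 * s2 - s1 ** 2
    return sigma2 * s2 / den, sigma2 * s0 / den, -sigma2 * s1 / den


def tau_r_squared_slope_ratio(n, sigma2, a1, a2, b1, doses1, doses2, eta=1.0):
    """Delta-method variance of (a1 - a2) / b1, preparations fitted independently."""
    tau01_sq, tau11_sq, tau011 = ols_cov(doses1, n, sigma2, eta)
    tau02_sq, _, _ = ols_cov(doses2, n, sigma2, eta)
    d = a1 - a2
    return ((tau01_sq + tau02_sq) / b1 ** 2
            - 2.0 * tau011 * d / b1 ** 3
            + tau11_sq * d ** 2 / b1 ** 4)


def minimal_n(half_width, z=1.65, **kwargs):
    """Smallest n per dose with z * tau_R <= half_width (variance treated as known)."""
    c = tau_r_squared_slope_ratio(1, **kwargs)   # tau_R^2 is proportional to 1/n
    return max(1, math.ceil(c * (z / half_width) ** 2))


standard = [0.05, 0.10, 0.15, 0.20, 0.25]   # assumed equally spaced, micrograms/tube
test_doses = [1.0, 2.0, 3.0]                # hypothetical placeholder doses
est = dict(sigma2=0.017, a1=1.77, a2=2.04, b1=30.40,
           doses1=standard, doses2=test_doses)
print(minimal_n(0.0125, **est))
```

The closed form is the quadratic form g'Σg with gradient g = (1/β1, −1/β1, −(α1 − α2)/β1²) for (α1, α2, β1); α2 does not covary with α1 or β1 because the preparations are fitted separately.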

2.4.3 Quantal bioassays

Equivalence testing for quantal bioassays of the form (2.3) leads to exactly the same equivalence criterion (2.11) as for the parallel line assay when the interval IB is determined by the dose range of the standard. If instead the dose range is determined by a restriction on the biological outcome, the alternative hypothesis for equivalence testing becomes

$$\frac{\Delta}{F^{-1}(p)} \le \beta_2^{-1}-\beta_1^{-1} \le \frac{\Delta}{F^{-1}(1-p)}, \qquad (2.13)$$

with F the standard normal or logistic distribution function (or any other parameter-free symmetric distribution function). The criterion for equivalence testing is now a difference of the inverse slopes.

The insulin assay by the mouse convulsion method described on page 375 of Finney (1978) is a quantal bioassay; the data are presented in Table 18.2.1 on page 376. The standard preparation is administered at nine doses (ranging from 0.0034 IU to 0.0280 IU) and the test preparation at five doses. The number of mice tested at each dose varied from 31 to 40, with an average of 35.5 mice per dose. Fitting a two-parameter logistic curve instead of (2.3), with dose multiplied by a factor 1000 in the statistical analysis, gives the maximum likelihood estimates α1 = −5.79 [−7.26, −4.32], α2 = −6.01 [−8.14, −3.87], β1 = 2.40 [1.80, 3.00], and β2 = 2.12 [1.36, 2.89]. The null hypothesis of similarity (i.e., H0: β1 = β2) was not rejected by the likelihood ratio test (P = 0.551). The ratio of slopes was estimated as β1/β2 = 1.13, and the delta method provided a 90% confidence interval of [0.72, 1.54]. The bioequivalence acceptance criterion for (2.11) at the maximum dose of 28 (= 1000 × 0.0280 IU) is [0.933, 1.067], which is substantially narrower than the confidence interval on the ratio of slopes. The difference in inverse slopes β2⁻¹ − β1⁻¹ was estimated as 0.053 [−0.111, 0.218], and equivalence test (2.13) with proportion p = 0.05 and bioequivalence criterion ∆ = 0.22314 (= log 1.25) provides an acceptance range of [−0.076, 0.076]. Thus bioequivalence of the relative potency of the test preparation with respect to the standard could not be established in the insulin quantal bioassay, even though similarity was not rejected.
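Criterion (2.13) and the insulin numbers can be checked with a few lines of Python. This is a sketch with hypothetical function names. With the logistic F and p = 0.05 it recovers the acceptance range [−0.076, 0.076], and the normal (probit) version is shown for comparison. The point estimate from the rounded slopes is 0.055, slightly different from the reported 0.053, presumably because the text uses unrounded estimates.

```python
import math
from statistics import NormalDist

DELTA = math.log(1.25)  # = 0.22314, the bioequivalence criterion used in the text


def logistic_ppf(p):
    """Inverse of the standard logistic distribution function."""
    return math.log(p / (1.0 - p))


def acceptance_interval_2_13(delta, p, ppf=logistic_ppf):
    """[delta / F^-1(p), delta / F^-1(1 - p)] for 1/beta2 - 1/beta1, with p < 0.5."""
    return delta / ppf(p), delta / ppf(1.0 - p)


lo, hi = acceptance_interval_2_13(DELTA, 0.05)
print(round(lo, 3), round(hi, 3))                                   # -0.076 0.076
print(acceptance_interval_2_13(DELTA, 0.05, NormalDist().inv_cdf))  # probit: about +/-0.136
diff = 1 / 2.12 - 1 / 2.40          # rounded insulin slopes
print(round(diff, 3))               # 0.055
ci = (-0.111, 0.218)                # reported 90% interval for the difference
print(lo <= ci[0] and ci[1] <= hi)  # False
```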

The minimal number of mice needed in this quantal bioassay to demonstrate equivalence can be derived from its confidence interval. The delta-method standard error τR of the maximum likelihood estimator of β2⁻¹ − β1⁻¹ is given by

$$\tau_R^2 = \frac{\tau_{11}^2}{\beta_1^4} + \frac{\tau_{12}^2}{\beta_2^4},$$

with τ1i the asymptotic standard error of the estimator of βi, given by

$$\tau_{1i}^2 = \frac{SS_i(0)}{SS_i(0)\,SS_i(2) - SS_i(1)^2}, \qquad SS_i(q) = \sum_{j=1}^{m_i} \frac{n_{ij}\, z_{ij}^{q}\, f_{ij}^2}{F_{ij}(1-F_{ij})},$$

where Fij = F(αi + βi zij), fij = f(αi + βi zij), and zij is the log dose metameter. The function F is the parameter-free (logistic or normal) distribution function and f is its density. Assuming a balanced design (nij = n) with the same set of doses (m1 = 9, m2 = 5), the number of mice per dose is calculated as n = 148 for acceptance criterion [−0.076, 0.076] when the estimates of the case study are used. Demonstrating bioequivalence of the test preparation with the standard preparation on the range [0.05, 0.95] of event rates in the bioassay therefore requires at least 2072 mice (148 per dose over 14 doses).
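The sample-size calculation above can be sketched for the logistic case, where f = F(1 − F) simplifies the weights in SSi(q). Assumptions, labelled loudly: the nine standard doses are taken as log-equally spaced between 3.4 and 28 (after multiplication by 1000), the five test doses are hypothetical because they are not listed in this section, and the difference in inverse slopes is assumed to be 0. The resulting n is therefore illustrative and not a reproduction of n = 148; the function names are hypothetical.

```python
import math


def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))


def tau1_squared(n, alpha, beta, doses):
    """tau_1i^2 = SS(0) / [SS(0) SS(2) - SS(1)^2] for a logistic quantal model.

    SS(q) = sum_j n z_j^q f_j^2 / (F_j (1 - F_j)) with z_j = log(dose_j). For the
    logistic distribution f = F (1 - F), so each weight reduces to n F (1 - F).
    """
    ss = [0.0, 0.0, 0.0]
    for x in doses:
        z = math.log(x)
        F = logistic_cdf(alpha + beta * z)
        w = n * F * (1.0 - F)
        for q in range(3):
            ss[q] += w * z ** q
    return ss[0] / (ss[0] * ss[2] - ss[1] ** 2)


def tau_r_squared(n, a1, b1, doses1, a2, b2, doses2):
    """Delta-method variance of 1/b2 - 1/b1 for independently fitted preparations."""
    return (tau1_squared(n, a1, b1, doses1) / b1 ** 4
            + tau1_squared(n, a2, b2, doses2) / b2 ** 4)


def minimal_n(half_width, z=1.65, **kwargs):
    """Smallest n per dose with z * tau_R <= half_width, assuming the difference is 0."""
    c = tau_r_squared(1, **kwargs)      # tau_R^2 is proportional to 1/n
    return math.ceil(c * (z / half_width) ** 2)


# Doses multiplied by 1000. Standard: nine log-equally spaced doses from 3.4 to 28
# (assumed spacing). Test: five hypothetical doses around its estimated ED50.
standard = [3.4 * (28 / 3.4) ** (j / 8) for j in range(9)]
test_doses = [6.0, 10.0, 17.0, 28.0, 47.0]
est = dict(a1=-5.79, b1=2.40, doses1=standard, a2=-6.01, b2=2.12, doses2=test_doses)
n = minimal_n(0.076, **est)
print(n, n * (len(standard) + len(test_doses)))
```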

2.4.4 Quantitative sigmoid bioassays

When the dose metameter is restricted to the log-transformed dose (ψ(xij) = log(xij)) and the dose range IV is determined by a restricted range of expected biological outcomes [δ1 + (γ1 − δ1)p, δ1 + (γ1 − δ1)(1 − p)], the alternative hypothesis for equivalence testing in (2.10) has the general form

$$-\Delta \le \frac{1}{\beta_2}\,F_{\eta_2}^{-1}\!\left(\frac{\gamma_1-\delta_2}{\gamma_2-\delta_2}-\frac{\gamma_1-\delta_1}{\gamma_2-\delta_2}\,q\right)-\frac{1}{\beta_1}\,F_{\eta_2}^{-1}(1-q) \le \Delta, \qquad (2.14)$$

with q equal to either p or 1 − p, whichever maximises the absolute value of the expression in (2.14). If the lower and upper asymptotes of the standard and test preparations are identical, equivalence reduces to

$$-\Delta \le \left(\beta_2^{-1}-\beta_1^{-1}\right)F_{\eta_2}^{-1}(1-q) \le \Delta, \qquad (2.15)$$

which is almost identical to (2.13) and equals (2.13) only when Fη is symmetric around its median. However, we may not move the inverse distribution function F⁻¹η2(1 − q) to the left- and right-hand sides of the inequalities, as in (2.13), because the distribution function still contains one or more parameters that must be estimated. Thus for the five-parameter logistic curve, (2.15) differs from (2.13), even when the lower and upper asymptotes of the two preparations are identical. The difference lies not only in the formulation of the equivalence criterion but also in the construction of the confidence interval.

Table 2.1: Bioassay response of a standard and test preparation in a luciferase bioassay at different doses on four well-plates

          Standard preparation                            Test preparation
Dose  Plate 1  Plate 2  Plate 3  Plate 4       Dose  Plate 1  Plate 2  Plate 3  Plate 4
   1     3980     4124     4228     3996          4     3988     4540     4076     4052
   3     3904     4276     4268     4128         11     4388     4376     4668     4240
   8     4668     5516     5084     5004         26     4984     5116     4528     4652
  12     9236     9992     8240     7548         42     6904     7780     6864     5156
  20    14208    15924    12008    11896         67    13552    13320    11100    10304
  31    23252    25712    22868    17660        108    22644    21540    19688    15844
  50    29524    35840    26620    24996        173    27812    30304    25348    22548
  80    34220    42944    34000    34872        276    35512    40536    33796    34548
 129    38616    42436    40690    38512        442    38848    40972    37408    38124
 515    48044    53020    44512    36860       1767    44632    47252    41672    36272
2058    48196    52808    45508    40876       7067    45440    49639    44664    39944

A confidence interval on (β2⁻¹ − β1⁻¹)F⁻¹η2(1 − q) is more elaborate than one on (β2⁻¹ − β1⁻¹), because the power parameter η2 must be estimated. However, for four-parameter logistic or probit quantitative bioassays, equivalence testing can be conducted with (2.13) when the asymptotes of the preparations are identical.

An in vitro luciferase cell-based assay on 96-well plates (eight rows and twelve columns) was developed for a hormonal treatment product. Both the standard and test preparations were tested at eleven non-blank doses on only one row of the plate, and this was repeated on four plates to obtain one relative potency. Although multiple test preparations can be tested in one bioassay run, Table 2.1 shows data for only one test preparation. To estimate the relative bioactivity of the test preparation with respect to the standard preparation, we applied a non-linear mixed effects model to the logarithm of the bioassay response, that is,

$$\log(y_{ijk}) = \delta_i + (\gamma_i-\delta_i)\,\frac{\exp\{\alpha_i+\beta_i\log(x_{ij})\}}{1+\exp\{\alpha_i+\beta_i\log(x_{ij})\}} + u_k + e_{ijk}, \qquad (2.16)$$

with yijk the observed bioassay response of preparation i at dose j on plate k, uk ∼ N(0, σ²P) a normally distributed random plate effect, eijk ∼ N(0, σ²R) a normally distributed within-plate error term, and the remaining parameters as defined before. We restricted the lower and upper asymptotes to be identical across preparations (γ1 = γ2 and δ1 = δ2), since observed differences are most likely not a biological issue, but rather a design effect of the way the preparations are allocated to fixed rows on the plate.

The maximum likelihood estimates of the four-parameter logistic dose-response relationship are α1 = −5.58 [−6.09, −5.07], α2 = −8.30 [−9.08, −7.53], β1 = 1.88 [1.71, 2.05], β2 = 1.91 [1.73, 2.09], δ = 8.28 [8.20, 8.37], and γ = 10.68 [10.60, 10.76]. The within-plate variance σ²R was estimated as 0.0069 [0.0048, 0.0091] and the between-plate variance σ²P as 0.0054 [0, 0.0135], which is almost the same as the within-plate variance. The number of degrees of freedom was set equal to the number of observations minus the number of estimated parameters (df = 88 − 8 = 80). The fitted dose-response curves and the observed data are shown in Figure 2.2.

Figure 2.2: Observed responses and estimated dose-response curves for the quantitative sigmoid bioassay.

Equivalence testing for the selected model can be performed with (2.13), since we assumed identical asymptotes for the preparations and chose the logistic curve. The difference in inverse slopes (β2⁻¹ − β1⁻¹) was estimated at −0.009 with a 90% confidence interval of [−0.056, 0.038]. This confidence interval lies completely within the acceptance range [−0.076, 0.076] of (2.13) when the bioequivalence criterion is based on proportion p = 0.05. Thus the standard and test preparations are now considered bioequivalent on the dose range IV = [4.1, 93.8], which is determined by the range of log-transformed bioassay outcomes [log(yL), log(yU)] = [8.4, 10.6]. However, bioequivalence cannot be guaranteed on the whole dose range [1, 2058] of the standard preparation, nor on the dose range IV = [1.7, 225] that corresponds to bioequivalence with p = 0.01, since the acceptance criteria in (2.11) and (2.13), respectively, are not met on these dose ranges.
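To make the link between p, the response range, and the dose range IV explicit, the sketch below inverts the standard curve of (2.16) using the rounded estimates (function names are hypothetical). It gives approximately [4.1, 93] and [1.7, 224] for p = 0.05 and p = 0.01, close to the reported [4.1, 93.8] and [1.7, 225] (the text presumably uses unrounded estimates), and it confirms that [−0.056, 0.038] lies within ±0.076 but not within ±0.049.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def dose_range_iv(alpha, beta, p):
    """Standard doses whose expected fraction of the response range lies in [p, 1 - p].

    Under the four-parameter logistic part of (2.16) the fraction reached at dose x
    is expit(alpha + beta * log x), so the bounds are exp((logit(q) - alpha) / beta).
    """
    return (math.exp((logit(p) - alpha) / beta),
            math.exp((logit(1.0 - p) - alpha) / beta))


def response_range(delta, gamma, p):
    """[delta + (gamma - delta) p, delta + (gamma - delta)(1 - p)] on the log-response scale."""
    return delta + (gamma - delta) * p, delta + (gamma - delta) * (1.0 - p)


def inverse_slope_margin(p, delta_be=math.log(1.25)):
    """Half-width Delta / F^-1(1 - p) of the symmetric logistic range in (2.13)."""
    return delta_be / logit(1.0 - p)


alpha1, beta1, delta, gamma = -5.58, 1.88, 8.28, 10.68   # rounded estimates
ci = (-0.056, 0.038)            # reported 90% CI for 1/beta2 - 1/beta1
for p in (0.05, 0.01):
    lo, hi = dose_range_iv(alpha1, beta1, p)
    ylo, yhi = response_range(delta, gamma, p)
    m = inverse_slope_margin(p)
    inside = -m <= ci[0] and ci[1] <= m
    print(p, round(lo, 2), round(hi, 1), round(ylo, 2), round(yhi, 2), inside)
```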

The estimated function gθ of the test preparation with respect to the standard preparation at dose 2058 is gθ(2058) = 0.880, which lies within [0.8, 1.25]. This suggests that bioequivalence of the two preparations might be attainable on the full dose range [1, 2058] if the number of plates, and possibly the number of repeats on a plate, is increased. To investigate this, we conducted a simulation study, since a direct sample-size formula based on the non-linear mixed effects model (2.16) is difficult to derive. Model (2.16) was simulated with twelve doses on an equidistant log-dose grid from 0 to 8.25. The simulation parameters were taken from the case study, with the fixed-effect estimates rounded to one decimal and the variance parameters to three decimals.

Table 2.2: Percentages of correctly claiming bioequivalence in a quantitative four-parameter logistic dose-response relationship

Sample sizes        Bioequivalence on dose range
Plates  Repeats    [4.0, 94]   [2.0, 225]   [1.0, 2058]
     1        4        46.4%           0%            0%
     2        4        88.0%        34.0%            0%
     4        1        46.4%           0%            0%
     4        4        99.4%        79.8%            0%
     8        1        88.0%        34.0%            0%
     8        4         100%        98.0%            0%
    16        1        99.4%        78.2%            0%
    32        1         100%        98.0%            0%
    32        4         100%         100%         65.6%
    64        4         100%         100%         96.6%
   100        4         100%         100%          100%

Model (2.16) was simulated 500 times, and in each simulation we estimated β1/β2 and β2⁻¹ − β1⁻¹ with their 90% confidence intervals. For the ratio β1/β2, we recorded the proportion of simulations in which the confidence interval was contained in [1 − ∆/log(2058), 1 + ∆/log(2058)]. For β2⁻¹ − β1⁻¹, we compared the confidence interval with both [∆/F⁻¹(0.05), ∆/F⁻¹(0.95)] and [∆/F⁻¹(0.01), ∆/F⁻¹(0.99)]. We varied the number of plates (1, 2, 4, 8, 16, 32, 64, and 100) and the number of repeats on a plate (1 or 4) in the simulations to determine a reasonable sample size. The results are presented in Table 2.2.
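The data-generating step of this simulation, together with the acceptance ranges it uses, can be sketched as follows. Assumptions: both preparations share the same twelve log doses, the rounded parameter values are as described above, and the nonlinear mixed-model fit that produces the confidence intervals in each replicate is omitted, since it requires dedicated software. Names are hypothetical.

```python
import math
import random

DELTA = math.log(1.25)

# Rounded case-study values (fixed effects to one decimal, variances to three);
# index 0 = standard, 1 = test. These values are an assumption of this sketch.
PARAMS = {"alpha": (-5.6, -8.3), "beta": (1.9, 1.9), "delta": 8.3, "gamma": 10.7,
          "var_plate": 0.005, "var_resid": 0.007}
LOG_DOSES = [0.75 * j for j in range(12)]   # twelve equidistant log doses, 0 to 8.25


def simulate_assay(n_plates, n_repeats, rng, params=PARAMS):
    """Simulate one data set from model (2.16) as (prep, log_dose, plate, log_y) rows."""
    sd_plate = math.sqrt(params["var_plate"])
    sd_resid = math.sqrt(params["var_resid"])
    rows = []
    for k in range(n_plates):
        u_k = rng.gauss(0.0, sd_plate)                       # random plate effect
        for i in (0, 1):
            a, b = params["alpha"][i], params["beta"][i]
            for z in LOG_DOSES:
                frac = 1.0 / (1.0 + math.exp(-(a + b * z)))  # logistic fraction
                mean = params["delta"] + (params["gamma"] - params["delta"]) * frac
                for _ in range(n_repeats):
                    rows.append((i, z, k, mean + u_k + rng.gauss(0.0, sd_resid)))
    return rows


def acceptance_ranges(top_dose=2058.0, delta=DELTA):
    """Ranges used in the simulation: ratio of slopes and difference of inverse slopes."""
    h = delta / math.log(top_dose)
    ratio = (1.0 - h, 1.0 + h)
    inverse = {p: (delta / math.log(p / (1 - p)), delta / math.log((1 - p) / p))
               for p in (0.05, 0.01)}
    return ratio, inverse


rng = random.Random(2016)
data = simulate_assay(n_plates=4, n_repeats=4, rng=rng)
print(len(data), acceptance_ranges()[0])
```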

The simulation study demonstrates that bioequivalence of the two preparations on the full dose range of the standard preparation is obtained with high probability only when 64 plates are included, with both the standard and the test preparation measured in four wells at each dose. On the smaller dose range [2.0, 225], which corresponds to the range of log-transformed expected biological outcomes [8.31, 10.66], we still need approximately eight plates with four repeats, or slightly fewer than 32 plates with just one repeat. For the smallest dose range [4.0, 94], on which we observed bioequivalence in the case study, approximately two plates with four repeats, or possibly four plates with two repeats, are needed to claim bioequivalence with high confidence.

2.5 Discussion and conclusion

When a test preparation is administered to subjects (e.g., humans) at a specific dose, it is important to know its expected biological effect at this dose. If the test preparation can be compared to a biologically well-characterised standard preparation at this specific dose in a bioassay, the relative bioactivity of the test preparation with respect to the standard can be estimated directly. There is then no need to investigate the biological property of similarity for the test preparation, since we are interested in only one dose. When the test preparation cannot be compared to the standard at the intended dose, because substantially lower doses are required in the bioassay, the test preparation must be similar to the standard preparation. Similarity makes the relative bioactivity dose independent and gives some guarantee that the test preparation will behave biologically like the standard at other, higher doses. If similarity is violated, the prediction of the biological activity at the intended dose is questionable. Thus a rejection of similarity in a bioassay should always be treated with the utmost care, since it might have devastating consequences for the dose levels used in practice, which could even result in the withdrawal of the drug.

Violation of similarity in bioassays can also be caused by other factors, for instance, imperfect bioassay designs, incorrect statistical analysis, and data artefacts. In vitro bioassays are often conducted on 96-well plates, and randomization of doses and repeats to individual wells is probably the most appropriate way to deal with systematic differences between rows and columns. However, this is not always possible, and rows are then fully dedicated to one treatment. Testing for similarity may detect row differences that are unrelated to biological differences, and this type of violation should not be considered a serious violation of similarity. Eliminating such systematic differences from traditional hypothesis testing and from equivalence testing seems acceptable. Furthermore, if the statistical analysis does not address all sources of variation, the standard errors might be underestimated and similarity may be rejected more frequently than expected from the selected significance level. For instance, repeats in bioassay analysis are not always replicates, but merely repeated measurements of the same test tube or sample. In that case, the pure residual error is underestimated and should not be used for testing similarity. Finally, outliers in bioassays are not uncommon, and they may distort the estimates dramatically and thus also the test statistics. Wells on a 96-well plate do not always "catch on" and may therefore show inactivated responses. Such outliers should always be eliminated in order to perform the correct calculations. Moreover, a bioassay design should include enough replicates to properly evaluate outliers and to correct for them. Non-biological issues with similarity do not imply that equivalence testing is preferred over traditional hypothesis testing; they are merely a sign of negligence.

A strong argument in favour of equivalence testing is that failure to reject similarity in a bioassay does not imply that the two preparations are similar. We proposed to investigate possible differences between the two preparations by calculating the relative bioactivity at all doses, and formulated equivalence directly on the relative bioactivity. However, it was shown that the relative bioactivity grows without bound when the dose tends either to infinity or to the blank concentration. This implies that equivalence testing in bioassays is theoretically impossible: however slightly similarity is violated, for any value M there is a dose xM that makes the relative bioactivity larger than M. Thus, even the smallest differences in dose-response functions between the test and standard preparation, which would be almost impossible to detect with traditional hypothesis testing, may theoretically cause biological issues at the intended dose of the test preparation. This is a disturbing conclusion, since equivalence testing of similarity has been strongly promoted and has been accepted in the pharmaceutical industry through its implementation in the chapters of the United States Pharmacopeia.

We circumvented this issue by restricting equivalence testing to a finite dose interval that contains the most relevant range of doses in the bioassay. We also implemented the bioequivalence criterion, since it fits naturally with the clinical interpretation of new medicinal products that are not similar to the original product but do provide an equivalent clinical bioactivity. By restricting the dose interval, we were able to claim bioequivalence of the test preparation with the standard preparation in the luciferase bioassay. Although our approach seems theoretically sound, the sample sizes required for bioequivalence in bioassays appear unrealistic in practice, as the four case studies clearly illustrated. Thus equivalence testing for similarity seems both theoretically and practically unrealistic. The alternative of traditional hypothesis testing might not be any better, which leaves the issue of similarity in bioassays in an uneasy position. We advise considering the issue of similarity on a case-by-case basis and then deciding which approach is most reasonable. This might not be just a statistical issue.


Appendix

In this appendix, we derive the standard errors used in the sample size calculations of the case studies for the parallel line, slope-ratio, and quantal bioassays. The log-likelihood and its derivatives are given. The variance of each parameter estimator is taken as the corresponding diagonal element of the inverse of the Fisher information matrix.

Parallel line bioassay

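Up to an additive constant, the log-likelihood is

$$\ell(\alpha_i,\beta_i \mid Y_{ijk}) = -\sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}} \frac{(Y_{ijk}-\alpha_i-\beta_i z_{ij})^2}{2\sigma^2}.$$

The score functions and the expectations of their squares and cross-product are

$$S_{\beta_i} = \sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}} \frac{z_{ij}\,(Y_{ijk}-\alpha_i-\beta_i z_{ij})}{\sigma^2}, \qquad S_{\alpha_i} = \sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}} \frac{Y_{ijk}-\alpha_i-\beta_i z_{ij}}{\sigma^2},$$

$$E\!\left[S_{\beta_i}^2\right] = \sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}} \frac{z_{ij}^2}{\sigma^2}, \qquad E\!\left[S_{\alpha_i}^2\right] = \frac{n_{i\cdot}}{\sigma^2}, \qquad E\!\left[S_{\alpha_i}S_{\beta_i}\right] = \sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}} \frac{z_{ij}}{\sigma^2}.$$

The Fisher information matrix consists of the elements E[S²βi], E[S²αi], and E[SαiSβi], and the variances are given by the inverse of this information matrix:

$$\sigma_{\beta_i}^2 = \frac{n_{i\cdot}\,\sigma^2}{n_{i\cdot}\sum_{j=1}^{m_i} n_{ij} z_{ij}^2 - \left(\sum_{j=1}^{m_i} n_{ij} z_{ij}\right)^2}, \qquad \sigma_{\alpha_i}^2 = \frac{\sigma^2\sum_{j=1}^{m_i} n_{ij} z_{ij}^2}{n_{i\cdot}\sum_{j=1}^{m_i} n_{ij} z_{ij}^2 - \left(\sum_{j=1}^{m_i} n_{ij} z_{ij}\right)^2}.$$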
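The inversion step can be checked numerically. The sketch below (hypothetical helper names and hypothetical unbalanced group sizes, using the log doses of the vitamin D3 standard) builds the 2 x 2 expected information for (αi, βi) and confirms that its inverse reproduces the closed forms for σ²αi and σ²βi.

```python
import math


def fisher_information(z, n, sigma2):
    """Expected information for (alpha_i, beta_i): (1/sigma^2) [[n_i., S1], [S1, S2]]."""
    s0 = sum(n)
    s1 = sum(nj * zj for nj, zj in zip(n, z))
    s2 = sum(nj * zj * zj for nj, zj in zip(n, z))
    return [[s0 / sigma2, s1 / sigma2], [s1 / sigma2, s2 / sigma2]]


def invert_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def closed_form_variances(z, n, sigma2):
    """sigma^2_alpha and sigma^2_beta as given in the appendix."""
    s0 = sum(n)
    s1 = sum(nj * zj for nj, zj in zip(n, z))
    s2 = sum(nj * zj * zj for nj, zj in zip(n, z))
    den = s0 * s2 - s1 ** 2
    return sigma2 * s2 / den, s0 * sigma2 / den


z = [math.log(x) for x in (5.76, 9.6, 16.0)]   # log doses of the vitamin D3 standard
n = [7, 8, 9]                                   # hypothetical unbalanced group sizes
inv = invert_2x2(fisher_information(z, n, 458.91))
print(inv[0][0], inv[1][1], closed_form_variances(z, n, 458.91))
```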

Slope-ratio bioassay

For the slope-ratio bioassay, the variance of the estimator of αi has the same form as for the parallel line bioassay, now with zij = xij^η:

$$\sigma_{\alpha_i}^2 = \frac{\sigma^2\sum_{j=1}^{m_i} n_{ij} z_{ij}^2}{n_{i\cdot}\sum_{j=1}^{m_i} n_{ij} z_{ij}^2 - \left(\sum_{j=1}^{m_i} n_{ij} z_{ij}\right)^2}.$$

Quantal bioassay

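For this bioassay Yijk follows a Bernoulli distribution with success probability Φij = Φ(αi + βi zij), where zij = log xij and Φ is the logistic or normal distribution function, and the log-likelihood is

$$\ell(\alpha_i,\beta_i \mid Y_{ijk}) = \sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}} \left[Y_{ijk}\log\Phi_{ij} + (1-Y_{ijk})\log(1-\Phi_{ij})\right].$$

With φij = φ(αi + βi zij) the density corresponding to Φ, the score functions are

$$S_{\beta_i} = \sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}} \frac{(Y_{ijk}-\Phi_{ij})\,\phi_{ij}}{\Phi_{ij}(1-\Phi_{ij})}\, z_{ij}, \qquad S_{\alpha_i} = \sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}} \frac{(Y_{ijk}-\Phi_{ij})\,\phi_{ij}}{\Phi_{ij}(1-\Phi_{ij})},$$

with expected squares and cross-product

$$E\!\left[S_{\beta_i}^2\right] = \sum_{j=1}^{m_i} \frac{n_{ij}\, z_{ij}^2\, \phi_{ij}^2}{\Phi_{ij}(1-\Phi_{ij})} = SS_i(2), \qquad E\!\left[S_{\alpha_i}^2\right] = \sum_{j=1}^{m_i} \frac{n_{ij}\, \phi_{ij}^2}{\Phi_{ij}(1-\Phi_{ij})} = SS_i(0),$$

$$E\!\left[S_{\alpha_i}S_{\beta_i}\right] = \sum_{j=1}^{m_i} \frac{n_{ij}\, z_{ij}\, \phi_{ij}^2}{\Phi_{ij}(1-\Phi_{ij})} = SS_i(1).$$

The variance is then given by

$$\sigma_{\beta_i}^2 = \frac{SS_i(0)}{SS_i(0)\,SS_i(2) - SS_i(1)^2}.$$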
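The role of the density φij in the information can be checked by simulation. The Monte Carlo sketch below uses hypothetical doses and parameters and a probit link via Python's statistics.NormalDist. It simulates binomial counts at the true parameters and compares the empirical second moment of the β-score with SSi(2); the two should agree up to simulation error. Function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ss_terms(alpha, beta, z, n):
    """SS(0), SS(1), SS(2) with weights n phi^2 / (Phi (1 - Phi)) for a probit model."""
    ss = [0.0, 0.0, 0.0]
    for zj, nj in zip(z, n):
        eta = alpha + beta * zj
        Phi, phi = STD_NORMAL.cdf(eta), STD_NORMAL.pdf(eta)
        w = nj * phi * phi / (Phi * (1.0 - Phi))
        for q in range(3):
            ss[q] += w * zj ** q
    return ss


def var_beta(ss):
    """sigma^2_beta = SS(0) / [SS(0) SS(2) - SS(1)^2]."""
    return ss[0] / (ss[0] * ss[2] - ss[1] ** 2)


def mc_beta_score_second_moment(alpha, beta, z, n, reps, rng):
    """Monte Carlo E[S_beta^2] at the true parameters (should approach SS(2))."""
    terms = []
    for zj in z:
        eta = alpha + beta * zj
        Phi, phi = STD_NORMAL.cdf(eta), STD_NORMAL.pdf(eta)
        terms.append((Phi, phi * zj / (Phi * (1.0 - Phi))))
    total = 0.0
    for _ in range(reps):
        s = 0.0
        for (Phi, coef), nj in zip(terms, n):
            y = sum(rng.random() < Phi for _ in range(nj))   # binomial count at dose j
            s += (y - nj * Phi) * coef
        total += s * s
    return total / reps


z = [0.0, 0.5, 1.0, 1.5, 2.0]   # hypothetical log doses
n = [20] * 5
ss = ss_terms(-1.0, 1.0, z, n)
mc = mc_beta_score_second_moment(-1.0, 1.0, z, n, reps=4000, rng=random.Random(7))
print(ss[2], mc, var_beta(ss))
```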


References

Callahan, J. and Sajjadi, N. (2003), "Testing the Null Hypothesis for a Specified Difference: The Right Way to Test for Parallelism," Bioprocessing Journal, 2, 71–78.

Dann, R. S. and Koch, G. G. (2008), "Methods for One-sided Testing of the Difference Between Proportions and Sample Size Considerations Related to Non-Inferiority Clinical Trials," Pharmaceutical Statistics, 7, 130–141.

FDA (2001), "Guidance for Industry: Statistical Approaches to Establishing Bioequivalence," Tech. rep., Food and Drug Administration, CDER.

Finney, D. J. (1978), Statistical Method in Biological Assay, London: Charles Griffin & Co. Ltd.

Gottschalk, P. G. and Dunn, J. R. (2005), "Measuring Parallelism, Linearity, and Relative Potency in Bioassay and Immunoassay Data," Journal of Biopharmaceutical Statistics, 15, 437–463.

Hauck, W., Capen, R., Callahan, J., De Muth, J. E., Hsu, H., Lansky, D., Sajjadi, N., Seaver, S., Singer, R. R., and Weisman, D. (2005), "Assessing Parallelism Prior to Determining Relative Potency," PDA Journal of Pharmaceutical Science and Technology, 59, 127–137.

IJzerman-Boon, P. C. and Van den Heuvel, E. R. (2015), "Validation of Qualitative Microbiological Test Methods," Pharmaceutical Statistics, 14, 120–128.

Julious, S. A. and Owen, R. J. (2011), "A Comparison of Methods for Sample Size Estimation for Non-inferiority Studies With Binary Outcomes," Statistical Methods in Medical Research, 20, 595–612.

Ricketts, J. and Head, G. (1999), "A Five-parameter Logistic Equation for Investigating Asymmetry of Curvature in Baroreflex Studies," American Journal of Physiology, 277, 441–454.

Ridout, M. S., Fenlon, J. S., and Hughes, P. R. (1993), "A Generalized One-hit Model for Bioassays of Insect Viruses," Biometrics, 49, 1136–1141.

USP <1030> (2010), "Biological Assay Chapters-Overview and Glossary," Tech. rep., United States Pharmacopeia.

USP <1032> (2010), "Design and Development of Biological Assays," Tech. rep., United States Pharmacopeia.

USP <1033> (2010), "Biological Assay Validation," Tech. rep., United States Pharmacopeia.

USP <1034> (2010), "Analysis of Biological Assays," Tech. rep., United States Pharmacopeia.

Volund, A. (1978), "Application of the Four-parameter Logistic Model to Bioassay: Comparison With Slope Ratio and Parallel Line Models," Biometrics, 34, 357–365.

Vonesh, E. F. (2012), Generalized Linear and Nonlinear Models for Correlated Data: Theory and Applications Using SAS, Cary, NC: SAS Institute Inc.


CHAPTER 3

A comparison of statistical methods for combining relative bioactivities from parallel line bioassays

The original version appears in Pharmaceutical Statistics (2013); 12: 375-384


Abstract

This chapter compares the ordinary unweighted average, weighted average, and maximum likelihood methods for estimating a common bioactivity from multiple parallel line bioassays. Some of these or similar methods are also used in meta-analysis. In a simulation study, the methods are assessed by comparing the coverage probabilities of the true relative bioactivity and the lengths of the confidence intervals they produce. The ordinary unweighted average method outperforms all other methods by consistently giving the best coverage probability, albeit with somewhat wider confidence intervals. The weighted average methods give good coverage and shorter confidence intervals when combining homogeneous bioactivities. For heterogeneous bioactivities, these methods work well when a liberal significance level is used for testing homogeneity of bioactivities. The maximum likelihood methods gave good coverage when homogeneous bioactivities were considered. Overall, the preferred methods are the ordinary unweighted average and two weighted average methods that were specifically developed for bioassays.

Keywords: bioactivity; biological assays; heterogeneity; meta-analysis


3.1 Introduction

Biological assays, commonly known as bioassays, are designed to estimate the potency or bioactivity of an unknown test preparation against a known standard preparation (Finney, 1978; Laska and Meisner, 1987). Bioassays are frequently used in the pharmaceutical industry for the release of medicinal products to the market. Recently, the United States Pharmacopeia (USP) updated its general guidance on bioassay analysis (USP <1034>), which clearly indicates its importance and relevance. For some (hormonal) medicinal products, specific USP chapters require a maximum level on the precision of the bioactivity, expressed as half the length of the 95% confidence interval on the logarithmic scale. However, a single bioassay run may not achieve this precision, even when the estimate of the bioactivity is well within specifications. Thus, to improve the precision, a series of bioassay runs on the same preparation may be performed, after which the individual estimates need to be combined into one estimate (Laska and Meisner, 1987).

Combining estimates from a series of experiments has a long tradition in statistics. However, the methodology for combining bioassay data has received little attention in the last three decades. Meta-analysis, on the other hand, which became the general term for combining experiments about four decades ago (Glass, 1976), is currently a very active area of research. Meta-analysis typically focuses on combining parameter estimates from linear regression, logistic regression, or survival analyses. Under the assumption that the parameter estimates are normally distributed, a fixed or random effects model is applied to estimate a common parameter (Borenstein et al., 2009). In this context, the performance of many methods has been investigated in several papers (Brockwell and Gordon, 2001; Sánchez-Meca and Marín-Martínez, 2008; Sidik and Jonkman, 2005).

In one respect, combining bioactivities from parallel line models is more complex than a typical meta-analysis: it involves combining ratios of linear regression parameter estimates rather than the estimates themselves. The distribution of these ratios is not normal, even though the parameter estimates themselves are approximately normally distributed. Statistical methods that work well for typical meta-analyses may therefore perform worse when applied to relative bioactivities from parallel line bioassays.

In meta-analysis, weighted average methods are strongly recommended (Borenstein et al., 2009). One particularly promising method, which can handle heteroscedastic standard errors and uses (restricted) maximum likelihood estimation, is based on a random effects model for the individual estimates (Hardy and Thompson, 1996). Alternatively, there are weighted average methods developed specifically for combining bioactivities, namely those of Bliss (1952), Cochran (1954), and Morse and Bickle (1967). They differ in their approach towards
heterogeneity and heteroscedasticity of bioactivities, but they all use method-of-moments estimates instead of maximum likelihood. Three alternative maximum likelihood methods are those of Armitage et al. (1976), Williams (1978), and Meisner et al. (1986), which use a fixed effects model for the parallel line bioassays. They were developed for testing homogeneity of bioactivities across multiple bioassays. An alternative to these fixed effects models is to use random or mixed effects models (Laird and Ware, 1982) and fit a random coefficient model to the raw data of the parallel line bioassays. All of these methods may be applicable, but they are not the simplest methods to apply. The USP prefers the ordinary unweighted average method, although it describes alternative approaches as well. All of these approaches are described in detail in the following sections. Alternative approaches such as Bayesian analysis or bootstrapping (Chen et al., 1999; Chen, 2007) are beyond the scope of this chapter.

3.1.1 Motivating example

For the release of three batches (A, B, and C) of a hormonal medicinal product, six bioassay runs were performed with an in vivo bioassay. In each run, the biological responses to each of the three batches and to the standard preparation were observed in triplicate at three separate doses. The parallel line model (discussed in Section 3.2) was applied to estimate the logarithmically transformed relative bioactivities of the three batches against the standard (Table 3.1). These bioactivities were combined with the aforementioned methods (described in detail in Sections 3.3 and 3.4), and a 95% confidence interval was computed for each combined estimate (Table 3.2).

Table 3.1: Individual bioactivity estimates with their standard errors (SE)

| Bioassay run ($df_h$ = 12) | Batch A Estimate | Batch A SE | Batch B Estimate | Batch B SE | Batch C Estimate | Batch C SE |
|---|---|---|---|---|---|---|
| 1 | -0.027 | 0.060 | -0.089 | 0.069 | -0.104 | 0.025 |
| 2 | -0.021 | 0.026 | 0.105 | 0.072 | -0.004 | 0.051 |
| 3 | 0.001 | 0.080 | 0.004 | 0.066 | -0.018 | 0.033 |
| 4 | -0.040 | 0.091 | 0.025 | 0.061 | 0.033 | 0.035 |
| 5 | 0.007 | 0.061 | 0.046 | 0.068 | 0.077 | 0.025 |
| 6 | -0.105 | 0.037 | -0.032 | 0.050 | 0.020 | 0.044 |

None of the statistical methods rejected homogeneity of the relative bioactivities for batch A at the significance level α = 0.5. This means that the differences between the bioactivity estimates from different bioassay runs are not larger than can be expected from their standard errors. On the other hand, Cochran's (1954) test showed for batch A that the within-bioassay variances were heteroscedastic (p-value < 0.05); that is, the standard errors were not constant across bioassay runs. For batch B, the relative bioactivities and the within-bioassay variances were found to be homogeneous by all methods at the level α = 0.05. For batch C, though, all
methods indicated heterogeneous relative bioactivities at the significance level α = 0.05, and Cochran (1954) also gave evidence against homoscedastic within-bioassay variances at the same significance level.

The results in Table 3.2 indicate that, for each batch, the combined bioactivity estimates are approximately the same across all methods. For the Williams (1978) method, disjoint confidence intervals were obtained: one segment included the estimated bioactivity, whereas the other lay completely outside the range of all the estimates. For batch C, the confidence interval of the Meisner et al. (1986) method was an empty set, as a result of a highly significant test against homogeneity of the individual relative bioactivity estimates. Results from the Hardy and Thompson (1996) method are closer to those of the weighted average methods.

A comparison of the lengths of the 95% confidence intervals shows that some methods gave much wider intervals than others. For batch A, the method of Morse and Bickle (1967) gave the shortest interval, whereas that of Meisner et al. (1986) gave the longest. For batch B, the Morse and Bickle (1967) method again gave the shortest interval, but now the random coefficient model gave the longest. Finally, for batch C the Morse and Bickle (1967) method gave the longest interval, and the methods of Armitage et al. (1976) and Williams (1978) gave the shortest.

The example does not clearly show which method is preferable, and without additional information on their performance it is difficult to choose among them. Possibly the only conclusion that can be drawn from this example is that the maximum likelihood methods of Williams (1978) and Meisner et al. (1986) do not seem to provide appropriate confidence intervals, in particular when heterogeneous bioactivities are observed.

Table 3.2: Combined relative bioactivity estimates ($\hat{\mu}$) with their 95% confidence intervals (RCM: random coefficient model)

| Method | Batch A $\hat{\mu}$ | Batch A 95% CI | Batch B $\hat{\mu}$ | Batch B 95% CI | Batch C $\hat{\mu}$ | Batch C 95% CI |
|---|---|---|---|---|---|---|
| Ordinary | -0.031 | [-0.073; 0.012] | 0.010 | [-0.060; 0.080] | 0.001 | [-0.063; 0.065] |
| Bliss | -0.039 | [-0.082; 0.004] | 0.004 | [-0.055; 0.064] | -0.001 | [-0.064; 0.062] |
| Cochran | -0.039 | [-0.081; 0.004] | 0.010 | [-0.043; 0.063] | 0.001 | [-0.051; 0.052] |
| Morse | -0.039 | [-0.075; -0.002] | 0.004 | [-0.047; 0.056] | -0.000 | [-0.069; 0.069] |
| Armitage | -0.034 | [-0.081; 0.013] | 0.016 | [-0.037; 0.069] | 0.000 | [-0.030; 0.031] |
| Williams | -0.034 | [-0.081; 0.013] ∪ [8.56; 10.40] | 0.016 | [-0.037; 0.069] ∪ [-17.59; -10.82] | 0.000 | [-0.030; 0.031] ∪ [39.77; 100] |
| Meisner | -0.034 | [-0.111; 0.043] | 0.016 | [-0.060; 0.092] | 0.0004 | not applicable |
| Hardy | -0.039 | [-0.077; -0.001] | 0.004 | [-0.047; 0.056] | -0.0006 | [-0.068; 0.067] |
| RCM | -0.032 | [-0.076; 0.011] | 0.013 | [-0.089; 0.114] | 0.0005 | [-0.030; 0.031] |

3.1.2 Objectives

The main objective of this chapter is to assess, through extensive simulation studies, the coverage probabilities and the lengths of the confidence intervals for a common bioactivity produced by the selected methods. We believe that this is the first time that some of these approaches have been compared by simulation in the setting of combining parallel line bioassay data, and for a few methods it is the first time that they have been applied to this setting at all. The goal is to identify the most efficient method for estimating a common relative bioactivity with a short but appropriate confidence interval, and thereby to help practitioners select the best possible method for combining bioactivities.

3.2 Parallel line bioassays analysis

Suppose H bioassay runs are considered, where in each bioassay h the response is obtained from I = 2 preparations (the standard and the unknown/test preparation). The standard preparation is denoted by i = 1 and the unknown preparation by i = 2. Each preparation is diluted into $J_{hi}$ concentrations, and the total number of concentrations in assay h is denoted by $J_h$. The response $y_{hijk}$ at concentration j is measured $K_{hij}$ (≥ 1) times. The total number of observations in assay h is $K_{h\cdot\cdot} = K_{h1\cdot} + K_{h2\cdot}$, where $K_{hi\cdot} = K_{hi1} + K_{hi2} + \cdots + K_{hiJ_{hi}}$ is the number of observations for preparation i.

For each bioassay run h, a linear relationship between the logarithm of the concentration $x_{hij}$ and the response $y_{hijk}$ is assumed:

$$y_{hijk} = \alpha_{hi} + \beta_{hi}x_{hij} + \varepsilon_{hijk}, \qquad (3.1)$$

with $\alpha_{hi}$ the intercept for preparation i in bioassay h, $\beta_{hi}$ the slope for preparation i in bioassay h, and $\varepsilon_{hijk} \sim N(0, \sigma_{0h}^2)$ the residuals. The residual variance $\sigma_{0h}^2$ is the variability due to differences between measurements within each bioassay. The dose-response relationship is typically non-linear and S-shaped, but in routine analysis it is very common that only the linear part of this curve is assayed and approximated by a linear relationship.

Defining the relative bioactivity of a test preparation against a standard preparation requires parallelism (Finney, 1978): the slopes of the preparations in Model (3.1) must be equal within each bioassay, that is, $\beta_{h1} = \beta_{h2} = \beta_h$. As in any statistical analysis, the assumptions should be checked before inference begins, since violating some of them may invalidate the analysis (Kutner et al., 2005). In particular, linearity and parallelism should be verified to confirm that the statistical methodology used is appropriate (Finney, 1978). If these assumptions are not violated, the logarithmic relative bioactivity $\mu_h$ of the test preparation
against the standard preparation is estimated as

$$\hat{\mu}_h = \frac{\hat{\alpha}_{h2} - \hat{\alpha}_{h1}}{\hat{\beta}_h}, \qquad (3.2)$$

where $\hat{\alpha}_{hi}$ and $\hat{\beta}_h$ are the least squares estimates of $\alpha_{hi}$ and $\beta_h$. From here on, the logarithmic relative bioactivity is referred to simply as the bioactivity. Following Finney (1978), the approximate standard error of $\hat{\mu}_h$ is given by

$$\sigma_h = \sigma_{0h}\sqrt{\frac{\nu_{11(h)} - 2\mu_h\nu_{12(h)} + \mu_h^2\,\nu_{22(h)}}{\beta_h^2}}, \qquad (3.3)$$

with $\nu_{11(h)}$, $\nu_{12(h)}$, and $\nu_{22(h)}$ known constants determined by the design of the bioassay run, the number of repetitions, and the number of concentrations used (Finney, 1978). The standard error in (3.3) is typically estimated by $\hat{\sigma}_h$, obtained by substituting $\hat{\mu}_h$ for $\mu_h$, $\hat{\beta}_h$ for $\beta_h$, and $\hat{\sigma}_{0h}^2$ for $\sigma_{0h}^2$, where $\hat{\sigma}_{0h}^2$ is the unbiased estimator of $\sigma_{0h}^2$. At the aggregate level, each run therefore provides the information $(\hat{\mu}_h, \hat{\sigma}_h, df_h)$, with $df_h = K_{h\cdot\cdot} - J_h$ the appropriate number of degrees of freedom.
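To make (3.1) to (3.3) concrete, the Python sketch below fits the common-slope model to the raw data of a single run and returns the aggregate triple $(\hat{\mu}_h, \hat{\sigma}_h, df_h)$. It is an illustration under stated assumptions, not the software used in this chapter: the function name `parallel_line_fit` is hypothetical, $\hat{\sigma}_{0h}^2$ is assumed to be the pooled within-concentration (pure error) variance so that $df_h = K_{h\cdot\cdot} - J_h$, and $\nu_{11(h)}, \nu_{12(h)}, \nu_{22(h)}$ are derived from the usual least squares variances of $\hat{\alpha}_{h2} - \hat{\alpha}_{h1}$ and $\hat{\beta}_h$ rather than taken from Finney's tables.

```python
import math
from collections import defaultdict


def parallel_line_fit(x, prep, y):
    """Fit the parallel line model (3.1) with a common slope for one run.

    x    -- log-concentrations
    prep -- 1 for the standard, 2 for the test preparation
    y    -- responses
    Returns (mu_hat, se_mu, df) following (3.2) and (3.3).
    """
    by_prep = {1: [], 2: []}
    for xi, pi, yi in zip(x, prep, y):
        by_prep[pi].append((xi, yi))

    # Within-preparation sums of squares give the pooled (common) slope.
    means, sxx, sxy = {}, 0.0, 0.0
    for i, pts in by_prep.items():
        k = len(pts)
        xbar = sum(p[0] for p in pts) / k
        ybar = sum(p[1] for p in pts) / k
        means[i] = (xbar, ybar, k)
        sxx += sum((p[0] - xbar) ** 2 for p in pts)
        sxy += sum((p[0] - xbar) * (p[1] - ybar) for p in pts)
    beta = sxy / sxx
    (x1, y1, k1), (x2, y2, k2) = means[1], means[2]
    alpha1, alpha2 = y1 - beta * x1, y2 - beta * x2
    mu = (alpha2 - alpha1) / beta  # equation (3.2)

    # Assumed pure-error variance: pooled within (preparation, concentration) cells.
    cells = defaultdict(list)
    for xi, pi, yi in zip(x, prep, y):
        cells[(pi, xi)].append(yi)
    ss = sum(sum((v - sum(c) / len(c)) ** 2 for v in c) for c in cells.values())
    df = len(y) - len(cells)  # K_h.. - J_h
    sigma2_0 = ss / df

    # Var and Cov of (alpha2_hat - alpha1_hat, beta_hat), divided by sigma_0^2.
    dx = x2 - x1
    nu11 = 1 / k1 + 1 / k2 + dx * dx / sxx
    nu12 = -dx / sxx
    nu22 = 1 / sxx
    se = math.sqrt(sigma2_0 * (nu11 - 2 * mu * nu12 + mu * mu * nu22) / beta ** 2)  # (3.3)
    return mu, se, df


# Toy run: doses 0, 1, 2 in duplicate; the test preparation is shifted by 0.5.
x = [0, 0, 1, 1, 2, 2] * 2
prep = [1] * 6 + [2] * 6
y = [-0.1, 0.1, 0.9, 1.1, 1.9, 2.1, 0.4, 0.6, 1.4, 1.6, 2.4, 2.6]
mu_h, se_h, df_h = parallel_line_fit(x, prep, y)
```

Because the model has only two intercepts and one common slope, the least squares quantities have closed forms, so no matrix inversion is needed.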

3.3 (Un)weighted average methods

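The combined estimator $\hat{\mu}$ is a weighted average of the estimators $\hat{\mu}_1, \hat{\mu}_2, \ldots, \hat{\mu}_H$, given by

$$\hat{\mu} = \sum_{h=1}^{H}\hat{\omega}_h\hat{\mu}_h \Big/ \sum_{h=1}^{H}\hat{\omega}_h, \qquad (3.4)$$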

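where $\hat{\mu}_h$ is defined as in (3.2) and $\hat{\omega}_h$ is the estimated weight for the estimator $\hat{\mu}_h$ from bioassay h. A 100(1−α)% confidence interval for the weighted estimator $\hat{\mu}$ is in general of the form

$$[\hat{\mu}_L, \hat{\mu}_U] = \hat{\mu} \pm t^{-1}_{df_c}(1 - \alpha/2)\,\hat{\sigma}_{\mu}, \qquad (3.5)$$

with $\hat{\sigma}_{\mu}$ an estimator of the standard error of $\hat{\mu}$, $df_c$ the corresponding degrees of freedom, and $t^{-1}_{df_c}(p)$ the p-th quantile of the t-distribution with $df_c$ degrees of freedom. The simplest application of equations (3.4) and (3.5) is the ordinary unweighted average, with $\hat{\omega}_h = 1$, $df_c = H - 1$, and

$$\hat{\sigma}_{\mu} = \sqrt{\sum_{h=1}^{H}(\hat{\mu}_h - \bar{\mu})^2 \Big/ \big(H(H-1)\big)},$$

where $\bar{\mu} = \sum_{h=1}^{H}\hat{\mu}_h/H$ is the unweighted mean.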
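The ordinary unweighted average and its interval (3.5) can be checked against the batch A column of Table 3.1. The sketch below assumes that no statistics library is available, so `t_ppf` is a hypothetical helper that inverts a numerically integrated Student-t distribution function by bisection; in practice a library quantile function would replace it.

```python
import math


def t_ppf(p, df):
    """Student-t quantile for 0.5 < p < 1 (hypothetical helper).

    Bisection on a CDF obtained by Simpson integration of the t density.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    def cdf(x, n=2000):  # n must be even for Simpson's rule
        h = x / n
        s = pdf(0.0) + pdf(x) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, n))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ordinary_average(mu, alpha=0.05):
    """Ordinary unweighted average with interval (3.5), df_c = H - 1."""
    H = len(mu)
    mean = sum(mu) / H
    se = math.sqrt(sum((m - mean) ** 2 for m in mu) / (H * (H - 1)))
    t = t_ppf(1 - alpha / 2, H - 1)
    return mean, (mean - t * se, mean + t * se)


batch_a = [-0.027, -0.021, 0.001, -0.040, 0.007, -0.105]  # Table 3.1
est, (lower, upper) = ordinary_average(batch_a)
print(f"{est:.3f} [{lower:.3f}; {upper:.3f}]")  # prints -0.031 [-0.073; 0.012]
```

The printed interval coincides with the "Ordinary" entry for batch A in Table 3.2.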

It should be noted that the ordinary unweighted average arises naturally under the random effects model $\hat{\mu}_h = \mu + \alpha_h + \varepsilon_h$, with $\alpha_h \sim N(0, \sigma_B^2)$ and $\varepsilon_h \sim N(0, \sigma^2)$ all independently distributed. The methods of Bliss (1952), Cochran (1954), and Morse and Bickle (1967) are described in detail below.

3.3.1 Bliss

How individual bioactivities are combined depends on whether these estimates are homogeneous (Bliss, 1952). If the bioactivities are homogeneous, the confidence interval for the combined estimate depends on the within-bioassay variances $\hat{\sigma}_h^2$ only. If they are not homogeneous, additional variation between the bioactivity estimators exists, and this variation is also used to calculate the confidence interval for the combined bioactivity. The test statistic used by Bliss (1952) to test for homogeneity of bioactivities is

$$\chi^2 = (H-1) + \sqrt{\frac{\overline{df} - 4}{\overline{df} - 1}}\left(\frac{(\overline{df} - 2)\,WSS}{\overline{df}} - (H-1)\right), \qquad (3.6)$$

with

$$WSS = \sum_{h=1}^{H}\frac{\hat{\mu}_h^2}{\hat{\sigma}_h^2} - \left(\sum_{h=1}^{H}\frac{\hat{\mu}_h}{\hat{\sigma}_h^2}\right)^2 \Bigg/ \sum_{h=1}^{H}\frac{1}{\hat{\sigma}_h^2}$$

and $\overline{df} = \sum_{h=1}^{H} df_h/H$ the average number of degrees of freedom, with all other variables defined as before. The test statistic in (3.6) is approximately χ²-distributed with H − 1 degrees of freedom. The method of Bliss (1952) does not examine or test the homoscedasticity of the within-bioassay variances.
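A sketch of the Bliss (1952) homogeneity test (3.6) follows. The names `chi2_sf` and `bliss_homogeneity` are hypothetical; the χ² tail probability is computed from a series expansion of the incomplete gamma function instead of a library call, and the formula requires an average of more than four degrees of freedom per run.

```python
import math


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution (hypothetical helper).

    Uses the series for the lower regularized incomplete gamma function,
    which is adequate for moderate x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1.0 - total * math.exp(a * math.log(z) - z - math.lgamma(a))


def bliss_homogeneity(mu, se, df):
    """Bliss (1952) statistic (3.6); returns (WSS, chi2, p-value with H - 1 df)."""
    H = len(mu)
    w = [1 / s ** 2 for s in se]
    sw = sum(w)
    swm = sum(wi * m for wi, m in zip(w, mu))
    swm2 = sum(wi * m * m for wi, m in zip(w, mu))
    wss = swm2 - swm ** 2 / sw
    dbar = sum(df) / H  # average degrees of freedom, must exceed 4
    chi2 = (H - 1) + math.sqrt((dbar - 4) / (dbar - 1)) * ((dbar - 2) * wss / dbar - (H - 1))
    return wss, chi2, chi2_sf(chi2, H - 1)


# Batches A and C from Table 3.1.
a_mu = [-0.027, -0.021, 0.001, -0.040, 0.007, -0.105]
a_se = [0.060, 0.026, 0.080, 0.091, 0.061, 0.037]
c_mu = [-0.104, -0.004, -0.018, 0.033, 0.077, 0.020]
c_se = [0.025, 0.051, 0.033, 0.035, 0.025, 0.044]
wss_a, chi2_a, p_a = bliss_homogeneity(a_mu, a_se, [12] * 6)
wss_c, chi2_c, p_c = bliss_homogeneity(c_mu, c_se, [12] * 6)
print(chi2_a, p_a, chi2_c, p_c)
```

Applied to Table 3.1 with $df_h = 12$, batch A gives a p-value above 0.5 and batch C a p-value far below 0.05, in line with the homogeneity conclusions reported for these batches in Section 3.1.1.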

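Homogeneous bioactivity estimates: The common bioactivity is estimated as a weighted average with weights equal to the inverse of the estimated within-bioassay variances, that is, $\hat{\omega}_h = 1/\hat{\sigma}_h^2$. The estimated standard error of this weighted average is

$$\hat{\sigma}_{\mu} = \sqrt{\frac{\overline{df}\,[H(\overline{df} - 2) + 8]}{(\overline{df} - 2)\,[H(\overline{df} - 4) + 12]\,\hat{\omega}_{\cdot}}}, \qquad \hat{\omega}_{\cdot} = \sum_{h=1}^{H}\hat{\omega}_h,$$

and the corresponding number of degrees of freedom is $df_c = \overline{df}\left(\sum_{h=1}^{H}\hat{\sigma}_h^2\right)^2 \Big/ \sum_{h=1}^{H}\hat{\sigma}_h^4$, where $\overline{df}$ is as defined before.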

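Heterogeneous bioactivity estimates: When some between-bioassay variability ($\sigma_B^2$) exists and is detected by the test statistic in (3.6), it is estimated by

$$\hat{\sigma}_B^2 = \hat{\sigma}_T^2 - \hat{\sigma}_P^2, \qquad (3.7)$$

with $\hat{\sigma}_T^2 = \sum_{h=1}^{H}(\hat{\mu}_h - \bar{\mu})^2/(H-1)$, $\bar{\mu} = \sum_{h=1}^{H}\hat{\mu}_h/H$, and $\hat{\sigma}_P^2 = \sum_{h=1}^{H}\hat{\sigma}_h^2/H$. The estimator $\hat{\sigma}_P^2$ is the average within-bioassay variance based on $\hat{\sigma}_1^2, \ldots, \hat{\sigma}_H^2$. The common relative bioactivity $\hat{\mu}$ is calculated with weights $\hat{\omega}_h = (\hat{\sigma}_B^2 + \hat{\sigma}_h^2)^{-1}$, and its standard error is estimated as

$$\hat{\sigma}_{\mu} = 1\Big/\sqrt{\sum_{h=1}^{H}\hat{\omega}_h}, \qquad (3.8)$$

with $df_c = H - 1$ the corresponding number of degrees of freedom.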
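Both Bliss (1952) estimators can be sketched as one Python function whose `homogeneous` flag carries the outcome of the pretest (3.6). The names are hypothetical, `t_ppf` is a hand-rolled t quantile (bisection on a numerically integrated density) standing in for a library function, and truncating $\hat{\sigma}_B^2$ at zero is our assumption, since Bliss applies (3.7) only after heterogeneity has been detected.

```python
import math


def t_ppf(p, df):
    """Student-t quantile for 0.5 < p < 1 via bisection on a Simpson-rule CDF."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    def cdf(x, n=2000):  # n must be even for Simpson's rule
        h = x / n
        s = pdf(0.0) + pdf(x) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, n))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


def bliss_combine(mu, se, df, homogeneous, alpha=0.05):
    """Bliss (1952) combined bioactivity; returns (estimate, (lower, upper), df_c)."""
    H = len(mu)
    var = [s ** 2 for s in se]
    if homogeneous:
        w = [1 / v for v in var]  # inverse-variance weights
        w_sum = sum(w)
        dbar = sum(df) / H  # average degrees of freedom
        se_mu = math.sqrt(dbar * (H * (dbar - 2) + 8)
                          / ((dbar - 2) * (H * (dbar - 4) + 12) * w_sum))
        df_c = dbar * sum(var) ** 2 / sum(v * v for v in var)
    else:
        mbar = sum(mu) / H
        s2_t = sum((m - mbar) ** 2 for m in mu) / (H - 1)
        s2_p = sum(var) / H
        s2_b = max(s2_t - s2_p, 0.0)  # (3.7); truncation at zero is an assumption
        w = [1 / (s2_b + v) for v in var]
        w_sum = sum(w)
        se_mu = 1 / math.sqrt(w_sum)  # (3.8)
        df_c = H - 1
    est = sum(wi * m for wi, m in zip(w, mu)) / w_sum
    t = t_ppf(1 - alpha / 2, df_c)
    return est, (est - t * se_mu, est + t * se_mu), df_c


# Batches A and C from Table 3.1 (df_h = 12 in every run).
a_mu = [-0.027, -0.021, 0.001, -0.040, 0.007, -0.105]
a_se = [0.060, 0.026, 0.080, 0.091, 0.061, 0.037]
c_mu = [-0.104, -0.004, -0.018, 0.033, 0.077, 0.020]
c_se = [0.025, 0.051, 0.033, 0.035, 0.025, 0.044]
res_a = bliss_combine(a_mu, a_se, [12] * 6, homogeneous=True)
res_c = bliss_combine(c_mu, c_se, [12] * 6, homogeneous=False)
print(res_a, res_c)
```

With the Table 3.1 data, `homogeneous=True` for batch A and `homogeneous=False` for batch C reproduce the Bliss row of Table 3.2 to within about 0.001, the discrepancy coming from the rounded standard errors in Table 3.1.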

3.3.2 Cochran

Cochran (1954) distinguishes between homogeneous and heterogeneous bioactivity estimates and between homoscedastic and heteroscedastic within-bioassay variances. To test for homoscedasticity of the within-bioassay variances, Cochran (1954) uses Bartlett's test statistic, which under the null hypothesis of equal within-bioassay variances is approximately χ²-distributed with H − 1 degrees of freedom. This test is based on the error variances ($\hat{\sigma}_{0h}^2$) of the parallel line model (3.1).

If this test indicates that the within-bioassay variances are homoscedastic, the standard F-test, $F = \hat{\sigma}_T^2/\hat{\sigma}_{P*}^2$ with $\hat{\sigma}_{P*}^2 = \sum_{h=1}^{H} df_h\,\hat{\sigma}_h^2 \Big/ \sum_{h=1}^{H} df_h$, is used to investigate homogeneity of the bioactivity estimates $\hat{\mu}_h$. Under the null hypothesis of homogeneity, this test statistic is approximately F-distributed with H − 1 and $df_\cdot$ degrees of freedom, where $df_\cdot = H\cdot\overline{df} = \sum_{h=1}^{H} df_h$ is the total number of degrees of freedom of the H runs.

If homoscedasticity is rejected, another F-test is used to test for homogeneity of the bioactivity estimates, namely $F = \hat{\sigma}_T^2/\hat{\sigma}_P^2$, with $\hat{\sigma}_T^2$ and $\hat{\sigma}_P^2$ as defined in (3.7). Under the null hypothesis of homogeneous bioactivity estimates, this test statistic is also approximately F-distributed. However, because the precision varies between runs, the tabulated F-distribution cannot be used directly, and Cochran (1954) suggested an approximation with numerator and denominator degrees of freedom

$$\nu_1 = \frac{(H-1)^2\,\hat{\sigma}_P^4}{(H-2)H^{-1}\sum_{h=1}^{H}\hat{\sigma}_h^4 + \hat{\sigma}_P^4} \quad\text{and}\quad \nu_2 = \frac{\left(\sum_{h=1}^{H}\hat{\sigma}_h^2\right)^2}{\sum_{h=1}^{H}\hat{\sigma}_h^4/df_h}.$$
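Cochran's pretests can be sketched as follows: Bartlett's statistic for the residual variances $\hat{\sigma}_{0h}^2$, and the two F-ratios with their degrees of freedom. The function names are hypothetical; Bartlett's statistic is coded in its standard textbook form, and the F-ratios are returned without p-values because the Python standard library has no F-distribution, so they are assumed to be compared with an F quantile from statistical software.

```python
import math


def chi2_sf(x, df):
    """Chi-square upper tail via the incomplete gamma series (moderate x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return 1.0 - total * math.exp(a * math.log(z) - z - math.lgamma(a))


def bartlett(var0, df):
    """Bartlett's statistic for equal residual variances var0 with degrees of freedom df."""
    H = len(var0)
    n = sum(df)
    pooled = sum(d * v for d, v in zip(df, var0)) / n
    m = n * math.log(pooled) - sum(d * math.log(v) for d, v in zip(df, var0))
    c = 1 + (sum(1 / d for d in df) - 1 / n) / (3 * (H - 1))
    stat = m / c
    return stat, chi2_sf(stat, H - 1)


def cochran_f_ratios(mu, se, df):
    """Return (F, H - 1, df.) for the homoscedastic and (F, nu1, nu2) for the heteroscedastic case."""
    H = len(mu)
    var = [s ** 2 for s in se]
    mbar = sum(mu) / H
    s2_t = sum((m - mbar) ** 2 for m in mu) / (H - 1)
    s2_pstar = sum(d * v for d, v in zip(df, var)) / sum(df)
    s2_p = sum(var) / H
    nu1 = (H - 1) ** 2 * s2_p ** 2 / ((H - 2) / H * sum(v * v for v in var) + s2_p ** 2)
    nu2 = sum(var) ** 2 / sum(v * v / d for v, d in zip(var, df))
    return (s2_t / s2_pstar, H - 1, sum(df)), (s2_t / s2_p, nu1, nu2)


print(bartlett([0.01, 0.02, 0.04, 0.08], [12] * 4))
```

With equal standard errors and equal $df_h$, $\nu_1$ reduces to $H - 1$ and $\nu_2$ to $df_\cdot$, so both F-tests coincide.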

Homogeneous bioactivity estimates and homoscedasticity: In this case, the combined bioactivity is estimated as an unweighted average by setting $\hat{\omega}_h = 1$, and its standard error is

$$\hat{\sigma}_{\mu} = \sqrt{\frac{\sum_{h=1}^{H} df_h\,\hat{\sigma}_h^2 + (H-1)\,\hat{\sigma}_T^2}{H\left(\sum_{h=1}^{H} df_h + H - 1\right)}},$$

with corresponding degrees of freedom $df_c = \sum_{h=1}^{H} df_h + H - 1$.

Homogeneous bioactivity estimates and heteroscedasticity: The weights are the reciprocals of the variances, that is, $\hat{\omega}_h = 1/\hat{\sigma}_h^2$. The standard error of the combined bioactivity estimate is

$$\hat{\sigma}_{\mu} = \sqrt{\frac{1}{\hat{\omega}_\cdot}\left[1 + \frac{4(H-1)}{\hat{\omega}_\cdot^2}\sum_{h=1}^{H}\frac{\hat{\omega}_h(\hat{\omega}_\cdot - \hat{\omega}_h)}{H(df_h - 4) - (df_h - 8)}\right]}, \qquad \hat{\omega}_\cdot = \sum_{h=1}^{H}\hat{\omega}_h,$$

where $df_c = \hat{\omega}_\cdot^2 \Big/ \sum_{h=1}^{H}(\hat{\omega}_h^2/df_h)$ is the corresponding number of degrees of freedom. Cochran (1954) notes that when the number of degrees of freedom is small, the ordinary unweighted average with its corresponding standard error is preferred, but this recommendation is not followed further here.

Heterogeneous bioactivity estimates and homoscedasticity: The ordinary unweighted average is computed, with standard error $\hat{\sigma}_{\mu} = \sqrt{\hat{\sigma}_B^2/H}$ and corresponding degrees of freedom $df_c = H - 1$.

Heterogeneous bioactivity estimates and heteroscedasticity: The semi-weighted average of Bliss (1952) is computed. The weights are $\hat{\omega}_h = (\hat{\sigma}_B^2 + \hat{\sigma}_h^2)^{-1}$, with $\hat{\sigma}_B^2$ as defined in equation (3.7), and the corresponding standard error is computed as in equation (3.8).
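The four Cochran (1954) cases can be collected in one Python function whose two flags carry the pretest outcomes. This is a sketch, not the chapter's SAS implementation: the names are hypothetical, `t_ppf` is a hand-rolled t quantile standing in for a library function, and $\hat{\sigma}_B^2$ is truncated at zero by assumption.

```python
import math


def t_ppf(p, df):
    """Student-t quantile for 0.5 < p < 1 via bisection on a Simpson-rule CDF."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    def cdf(x, n=2000):  # n must be even for Simpson's rule
        h = x / n
        s = pdf(0.0) + pdf(x) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, n))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


def cochran_combine(mu, se, df, homogeneous, homoscedastic, alpha=0.05):
    """Cochran (1954) combined bioactivity; returns (estimate, (lower, upper))."""
    H = len(mu)
    var = [s ** 2 for s in se]
    mbar = sum(mu) / H
    s2_t = sum((m - mbar) ** 2 for m in mu) / (H - 1)
    s2_b = max(s2_t - sum(var) / H, 0.0)  # (3.7); truncation is an assumption
    if homogeneous and homoscedastic:  # unweighted average
        est = mbar
        df_c = sum(df) + H - 1
        se_mu = math.sqrt((sum(d * v for d, v in zip(df, var)) + (H - 1) * s2_t) / (H * df_c))
    elif homogeneous:  # inverse-variance weights with Cochran's correction
        w = [1 / v for v in var]
        w_sum = sum(w)
        est = sum(wi * m for wi, m in zip(w, mu)) / w_sum
        corr = sum(wi * (w_sum - wi) / (H * (d - 4) - (d - 8)) for wi, d in zip(w, df))
        se_mu = math.sqrt((1 + 4 * (H - 1) * corr / w_sum ** 2) / w_sum)
        df_c = w_sum ** 2 / sum(wi * wi / d for wi, d in zip(w, df))
    elif homoscedastic:  # ordinary average with sqrt(s2_b / H)
        est = mbar
        se_mu = math.sqrt(s2_b / H)
        df_c = H - 1
    else:  # semi-weighted average of Bliss (1952)
        w = [1 / (s2_b + v) for v in var]
        w_sum = sum(w)
        est = sum(wi * m for wi, m in zip(w, mu)) / w_sum
        se_mu = 1 / math.sqrt(w_sum)
        df_c = H - 1
    t = t_ppf(1 - alpha / 2, df_c)
    return est, (est - t * se_mu, est + t * se_mu)


# Table 3.1 (df_h = 12 in every run).
a_mu = [-0.027, -0.021, 0.001, -0.040, 0.007, -0.105]
a_se = [0.060, 0.026, 0.080, 0.091, 0.061, 0.037]
b_mu = [-0.089, 0.105, 0.004, 0.025, 0.046, -0.032]
b_se = [0.069, 0.072, 0.066, 0.061, 0.068, 0.050]
c_mu = [-0.104, -0.004, -0.018, 0.033, 0.077, 0.020]
c_se = [0.025, 0.051, 0.033, 0.035, 0.025, 0.044]
res_a = cochran_combine(a_mu, a_se, [12] * 6, homogeneous=True, homoscedastic=False)
res_b = cochran_combine(b_mu, b_se, [12] * 6, homogeneous=True, homoscedastic=True)
print(res_a, res_b)
```

For batch A (homogeneous, heteroscedastic) and batch B (homogeneous, homoscedastic), the sketch agrees with the Cochran row of Table 3.2 to within about 0.001.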

3.3.3 Morse and Bickle

Morse and Bickle (1967) proposed a different estimator of the between-bioassay variability:

$$\hat{\sigma}_{B0}^2 = \frac{H/(H-1)}{\sum_{h=1}^{H} 1/\hat{\sigma}_h^2}\left[\sum_{h=1}^{H}\frac{\hat{\mu}_h^2}{\hat{\sigma}_h^2} - \left(\sum_{h=1}^{H}\frac{\hat{\mu}_h}{\hat{\sigma}_h^2}\right)^2 \Bigg/ \sum_{h=1}^{H}\frac{1}{\hat{\sigma}_h^2}\right] - \hat{\sigma}_{P*}^2,$$

with $\hat{\sigma}_{P*}^2$ as defined before. If this estimate is less than or equal to zero, the relative bioactivity estimates are assumed to be homogeneous and $\hat{\sigma}_{B0}^2$ is truncated to zero. In general, the weights are then $\hat{\omega}_h = (\hat{\sigma}_{B0}^2 + \hat{\sigma}_h^2)^{-1}$, and the standard error is calculated as the reciprocal of the square root of the sum
of the weights, as in the method of Bliss (1952). This estimator replaces $\hat{\sigma}_B^2$ in the Bliss (1952) method. Morse and Bickle (1967) argued that it performs better, especially when the bioactivity estimates are extremely heterogeneous and their standard errors have varying degrees of freedom. The corresponding number of degrees of freedom for the standard error of the combined bioactivity estimator is

$$df_c = \frac{\left(\hat{\sigma}_{B0}^2 + \hat{\sigma}_P^2\right)^2}{\hat{\sigma}_{B0}^4/(1 + H) + \hat{\sigma}_P^4\Big/\left(2 + \sum_{h=1}^{H} df_h\right)} - 2.$$

Note that if $\hat{\sigma}_{B0}^2$ is set to zero, the degrees of freedom reduce to $df_c = \sum_{h=1}^{H} df_h$.
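The Morse and Bickle (1967) procedure, including the truncation of $\hat{\sigma}_{B0}^2$ and the degrees of freedom above, can be sketched in Python as follows. Function names are hypothetical and `t_ppf` is a hand-rolled t quantile standing in for a library function.

```python
import math


def t_ppf(p, df):
    """Student-t quantile for 0.5 < p < 1 via bisection on a Simpson-rule CDF."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(t):
        return c * (1 + t * t / df) ** (-(df + 1) / 2)

    def cdf(x, n=2000):  # n must be even for Simpson's rule
        h = x / n
        s = pdf(0.0) + pdf(x) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, n))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2


def morse_bickle(mu, se, df, alpha=0.05):
    """Returns (estimate, (lower, upper), sigma2_B0, df_c)."""
    H = len(mu)
    var = [s ** 2 for s in se]
    w0 = [1 / v for v in var]
    sw = sum(w0)
    wss = sum(w * m * m for w, m in zip(w0, mu)) - sum(w * m for w, m in zip(w0, mu)) ** 2 / sw
    s2_pstar = sum(d * v for d, v in zip(df, var)) / sum(df)
    s2_b0 = max(H / (H - 1) * wss / sw - s2_pstar, 0.0)  # truncated to zero
    w = [1 / (s2_b0 + v) for v in var]
    w_sum = sum(w)
    est = sum(wi * m for wi, m in zip(w, mu)) / w_sum
    se_mu = 1 / math.sqrt(w_sum)  # as in (3.8)
    s2_p = sum(var) / H
    df_c = (s2_b0 + s2_p) ** 2 / (s2_b0 ** 2 / (1 + H) + s2_p ** 2 / (2 + sum(df))) - 2
    t = t_ppf(1 - alpha / 2, df_c)
    return est, (est - t * se_mu, est + t * se_mu), s2_b0, df_c


# Batches A and C from Table 3.1 (df_h = 12 in every run).
a_mu = [-0.027, -0.021, 0.001, -0.040, 0.007, -0.105]
a_se = [0.060, 0.026, 0.080, 0.091, 0.061, 0.037]
c_mu = [-0.104, -0.004, -0.018, 0.033, 0.077, 0.020]
c_se = [0.025, 0.051, 0.033, 0.035, 0.025, 0.044]
res_a = morse_bickle(a_mu, a_se, [12] * 6)
res_c = morse_bickle(c_mu, c_se, [12] * 6)
print(res_a, res_c)
```

For batch A the untruncated estimate is negative, so $\hat{\sigma}_{B0}^2 = 0$ and $df_c = \sum_h df_h = 72$; for batch C it is positive. In both cases the output agrees with the Morse row of Table 3.2 to within about 0.001.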

3.4 Maximum likelihood methods

This section summarises the maximum likelihood methods: three of them are based on a fixed effects model and two on a random effects model.

3.4.1 Armitage, Bennett, and Finney

The likelihood function is formulated assuming a constant residual variance across bioassays. Armitage et al. (1976) consider the pairs $(\hat{\beta}_h, \bar{y}_{h2\cdot\cdot} - \bar{y}_{h1\cdot\cdot})$ together with one pooled residual variance

$$\hat{\sigma}_0^2 = \sum_{h=1}^{H} df_h\,\hat{\sigma}_{0h}^2 \Big/ \sum_{h=1}^{H} df_h,$$

where $\hat{\sigma}_{0h}^2$ is the estimated residual variance from the parallel line model. The pair $(\hat{\beta}_h, \bar{y}_{h2\cdot\cdot} - \bar{y}_{h1\cdot\cdot})$ is bivariate normally distributed, and the pooled residual variance is related to a χ²-distribution. If the parameters $\mu_1, \ldots, \mu_H$ are allowed to differ, the maximum likelihood estimators (MLEs) are the individual bioactivity estimates $\hat{\mu}_1, \ldots, \hat{\mu}_H$ from Section 3.2. Armitage et al. (1976) defined two composite hypotheses. The first is $H_H: \mu_1 = \mu_2 = \cdots = \mu_H = \mu_0$, where $\mu_0$ is a known value. The second is $H_{H-1}: \mu_1 = \mu_2 = \cdots = \mu_H = \mu$, where µ is an unknown parameter representing the true bioactivity when the individual bioactivities are homogeneous.

The test statistic for homogeneity across bioassays is based on a likelihood ratio test (LRT), and LRT/(H − 1), that is, the LRT divided by its degrees of freedom, is best approximated by an F-distribution. The maximum likelihood estimates of $\beta_h$ and µ are obtained by maximising the log-likelihood function under hypothesis $H_H$ (i.e., $\ell_H$). The maximum likelihood estimate of µ under hypothesis $H_{H-1}$ (i.e., $\ell_{H-1}$) is obtained by maximising the same log-likelihood function $\ell_H$ with respect to µ, with the known value $\mu_0$ replaced by µ. The resulting
expressions are solved iteratively to give the maximum likelihood estimates $\hat{\mu}$ and $\hat{\beta}$. This $\hat{\mu}$ minimises the function

$$J(\mu) = \sum_{h=1}^{H}\frac{\left(\hat{\beta}_h(\mu - z_h) - D_h\right)^2}{u_h(\mu - z_h)^2 - 2\omega_h(\mu - z_h) + \nu_h}, \qquad (3.9)$$

where $D_h = \bar{y}_{h2\cdot\cdot} - \bar{y}_{h1\cdot\cdot}$ is the difference between the mean responses of the two preparations in bioassay h, $z_h = \bar{x}_{h1\cdot\cdot} - \bar{x}_{h2\cdot\cdot}$ is the difference between the mean log-doses (log-concentrations) of the two preparations in bioassay h, and $\nu_h$, $\omega_h$, $u_h$ are known constants that depend on the design of the study (Armitage et al., 1976). The values of µ satisfying $J(\mu) \le J(\hat{\mu}) + \hat{\sigma}_0^2\,F^{-1}_{1,f}(1-\alpha)$ form an approximate confidence region for µ under hypothesis $H_{H-1}$. Failing to reject the null hypothesis of homogeneous estimates leads to disjoint segments of the confidence region.
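The confidence region of Armitage et al. (1976) can be approximated by evaluating (3.9) on a grid, as in the Python sketch below. The helper names are hypothetical; a grid search stands in for the iterative solution of the likelihood equations, the design constants $u_h$, $\omega_h$, $\nu_h$ and the critical value $F^{-1}_{1,f}(1-\alpha)$ must be supplied by the caller, and the search range is an assumption, so segments extending beyond it are cut off.

```python
def j_function(mu, beta, d, z, u, omega, nu):
    """J(mu) from (3.9); every argument except mu is a per-run sequence."""
    total = 0.0
    for b, dh, zh, uh, oh, nh in zip(beta, d, z, u, omega, nu):
        m = mu - zh
        total += (b * m - dh) ** 2 / (uh * m * m - 2 * oh * m + nh)
    return total


def segments(grid, inside):
    """Turn a boolean mask over an increasing grid into (start, end) segments."""
    out, start = [], None
    for k, ok in enumerate(inside):
        if ok and start is None:
            start = grid[k]
        elif not ok and start is not None:
            out.append((start, grid[k - 1]))
            start = None
    if start is not None:
        out.append((start, grid[-1]))
    return out


def armitage_region(beta, d, z, u, omega, nu, sigma2_0, f_crit,
                    lo=-1.0, hi=1.0, steps=20001):
    """Grid version of {mu : J(mu) <= J(mu_hat) + sigma2_0 * f_crit}.

    f_crit is the F_{1,f}(1 - alpha) quantile supplied by the caller.
    """
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    values = [j_function(m, beta, d, z, u, omega, nu) for m in grid]
    j_min = min(values)
    mu_hat = grid[values.index(j_min)]
    inside = [v <= j_min + sigma2_0 * f_crit for v in values]
    return mu_hat, segments(grid, inside)


# Toy input (hypothetical constants): two runs with D = 0.1 and 0.3.
mu_hat, region = armitage_region([1, 1], [0.1, 0.3], [0, 0], [0, 0], [0, 0], [1, 1],
                                 sigma2_0=1.0, f_crit=0.02)
print(mu_hat, region)
```

Because the denominator in (3.9) depends on µ, the returned list can contain more than one segment, which is how a disjoint region shows up in this representation.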

3.4.2 Williams

Williams (1978) provided an exact confidence region as an alternative to the approximate approach of Armitage et al. (1976), based on a regression analysis through the origin. This exact confidence region is given by the inequality (Williams, 1978, p. 660)

$$\frac{\left(\sum_{h=1}^{H} W_h S_h T_h\right)^2}{\sum_{h=1}^{H} W_h S_h^2} \le \hat{\sigma}_0^2\,F^{-1}_{1,f}(1-\alpha), \qquad (3.10)$$

where $W_h = \left(\nu_h - 2\omega_h(\mu - z_h) + u_h(\mu - z_h)^2\right)^{-1}$, $S_h = \left[(\nu_h - \omega_h(\mu - z_h))\hat{\beta}_h + (u_h(\mu - z_h) - \omega_h)D_h\right]W_h$, $T_h = D_h - \hat{\beta}_h(\mu - z_h)$, and the remaining parameters are defined as in Armitage et al. (1976).
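Inequality (3.10) can likewise be scanned on a grid. In this Python sketch the function names, the grid, and the toy constants are hypothetical, and the caller supplies $F^{-1}_{1,f}(1-\alpha)$; the left-hand side is undefined where every $S_h$ vanishes, which the sketch does not guard against.

```python
def williams_lhs(mu, beta, d, z, u, omega, nu):
    """Left-hand side of inequality (3.10) at a given mu."""
    num = den = 0.0
    for b, dh, zh, uh, oh, nh in zip(beta, d, z, u, omega, nu):
        m = mu - zh
        w = 1 / (nh - 2 * oh * m + uh * m * m)            # W_h
        s = ((nh - oh * m) * b + (uh * m - oh) * dh) * w  # S_h
        t = dh - b * m                                    # T_h
        num += w * s * t
        den += w * s * s
    return num ** 2 / den


def segments(grid, inside):
    """Turn a boolean mask over an increasing grid into (start, end) segments."""
    out, start = [], None
    for k, ok in enumerate(inside):
        if ok and start is None:
            start = grid[k]
        elif not ok and start is not None:
            out.append((start, grid[k - 1]))
            start = None
    if start is not None:
        out.append((start, grid[-1]))
    return out


def williams_region(beta, d, z, u, omega, nu, sigma2_0, f_crit,
                    lo=-1.0, hi=1.0, steps=20001):
    """Grid version of the exact region (3.10); f_crit = F_{1,f}(1 - alpha)."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    inside = [williams_lhs(m, beta, d, z, u, omega, nu) <= sigma2_0 * f_crit for m in grid]
    return segments(grid, inside)


region = williams_region([1, 1], [0.1, 0.3], [0, 0], [0, 0], [0, 0], [1, 1],
                         sigma2_0=1.0, f_crit=0.02)
print(region)
```

Scanning a wide range matters for this method: a narrow grid can miss remote segments such as those reported for the Williams method in Table 3.2.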

3.4.3 Meisner, Kushner, and Laska

Meisner et al. (1986) present an LRT based on Wilks' statistic for homogeneous bioactivities as an alternative to that of Armitage et al. (1976). It uses the statistic $J(\mu)/(H\hat{\sigma}_0^2)$, with $J(\mu)$ the function minimised in (3.9). Under the null hypothesis that the bioactivity estimates are homogeneous ($H_{H-1}: \alpha_{h2} - \alpha_{h1} = \mu\beta_h$), this test statistic is F-distributed with H and $df_\cdot$ degrees of freedom for the numerator and denominator, respectively, where $df_\cdot$ is again the total number of degrees of freedom from all bioassays (Section 3.2). When hypothesis $H_{H-1}$ is true, the exact interval for µ consists of all values of µ satisfying

$$J(\mu) < H\hat{\sigma}_0^2\,F^{-1}_{H,df_\cdot}(1-\alpha).$$

This inequality may lead to a finite or infinite interval or to disjoint intervals, but, if the null hypothesis
of homogeneous bioactivity estimates is rejected, the resulting confidence interval can be a set that does not include $\hat{\mu}$.
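A grid sketch of the Meisner et al. (1986) interval shows how it can become empty. As in the other grid sketches, the function names and toy constants are hypothetical and the caller supplies $F^{-1}_{H,df_\cdot}(1-\alpha)$.

```python
def j_function(mu, beta, d, z, u, omega, nu):
    """J(mu) from (3.9); every argument except mu is a per-run sequence."""
    total = 0.0
    for b, dh, zh, uh, oh, nh in zip(beta, d, z, u, omega, nu):
        m = mu - zh
        total += (b * m - dh) ** 2 / (uh * m * m - 2 * oh * m + nh)
    return total


def segments(grid, inside):
    """Turn a boolean mask over an increasing grid into (start, end) segments."""
    out, start = [], None
    for k, ok in enumerate(inside):
        if ok and start is None:
            start = grid[k]
        elif not ok and start is not None:
            out.append((start, grid[k - 1]))
            start = None
    if start is not None:
        out.append((start, grid[-1]))
    return out


def meisner_region(beta, d, z, u, omega, nu, sigma2_0, f_crit,
                   lo=-1.0, hi=1.0, steps=20001):
    """Grid version of {mu : J(mu) < H * sigma2_0 * f_crit}; may be empty.

    f_crit is the F_{H, df.}(1 - alpha) quantile supplied by the caller.
    """
    H = len(beta)
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    inside = [j_function(m, beta, d, z, u, omega, nu) < H * sigma2_0 * f_crit for m in grid]
    return segments(grid, inside)


toy = ([1, 1], [0.1, 0.3], [0, 0], [0, 0], [0, 0], [1, 1])
wide = meisner_region(*toy, sigma2_0=1.0, f_crit=0.02)
empty = meisner_region(*toy, sigma2_0=1.0, f_crit=0.005)
print(wide, empty)
```

Unlike the Armitage et al. (1976) region, the threshold does not depend on $J(\hat{\mu})$, so when $J(\hat{\mu})$ itself exceeds $H\hat{\sigma}_0^2 F^{-1}_{H,df_\cdot}(1-\alpha)$ the function returns an empty list, the situation reported for batch C in Table 3.2.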

3.4.4 Hardy and Thompson

This approach uses a random effects model for the bioactivity estimates, $\hat{\mu}_h = \mu + \alpha_h + \varepsilon_h$, where $\alpha_h \sim N(0, \sigma_B^2)$ and $\varepsilon_h \sim N(0, \sigma_h^2)$ are independently distributed and µ is the true bioactivity. Homogeneity of the bioactivities ($H_0: \sigma_B^2 = 0$) is tested with an LRT based on profile likelihoods, and the p-value is calculated from a mixture of χ² distributions (Verbeke and Molenberghs, 2000). A likelihood function is used to estimate µ and $\sigma_B^2$ from the bioactivities $\hat{\mu}_h$ and the within-bioassay variances $\hat{\sigma}_h^2$ with their degrees of freedom $df_h$. The estimates are obtained iteratively from

$$\hat{\mu} = \frac{\sum_{h=1}^{H}(\hat{\sigma}_h^2 + \hat{\sigma}_B^2)^{-1}\hat{\mu}_h}{\sum_{h=1}^{H}(\hat{\sigma}_h^2 + \hat{\sigma}_B^2)^{-1}} \quad\text{and}\quad \hat{\sigma}_B^2 = \frac{\sum_{h=1}^{H}(\hat{\sigma}_h^2 + \hat{\sigma}_B^2)^{-2}\left[(\hat{\mu}_h - \hat{\mu})^2 - \hat{\sigma}_h^2\right]}{\sum_{h=1}^{H}(\hat{\sigma}_h^2 + \hat{\sigma}_B^2)^{-2}}. \qquad (3.11)$$

The confidence region for the true bioactivity is based on a profile likelihood to account for the fact that the between-bioassay variability ($\sigma_B^2$) is unknown and estimated from the data (Hardy and Thompson, 1996). In this study, we are interested in confidence intervals for the bioactivity, which are given by all values of µ satisfying

$$\ell_1^*(\mu) > \ell_1^*(\hat{\mu}) - \chi^2_{1,1-\alpha}/2, \qquad (3.12)$$

where $\ell_1^*(\mu) = \ell(\mu, \hat{\sigma}_B^2(\mu))$ is the profile log-likelihood for µ, $\ell(\cdot,\cdot)$ is the full log-likelihood function, and $\chi^2_{1,1-\alpha}$ is the $(1-\alpha)$ quantile of the χ²-distribution with one degree of freedom. In addition, Wald confidence intervals can be computed for this method using a critical value of the t-distribution with the appropriate degrees of freedom. This method has not been used in the bioassay setting before, but it is frequently used in meta-analysis.
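The fixed-point form of (3.11) translates directly into a loop. In this Python sketch the function name is hypothetical, truncating $\hat{\sigma}_B^2$ at zero is our assumption, and the returned standard error $\left(\sum_h (\hat{\sigma}_h^2 + \hat{\sigma}_B^2)^{-1}\right)^{-1/2}$ is the inverse-variance one that we assume underlies the Wald interval; the t degrees of freedom are left to the caller. The chapter itself used a modified SAS macro, not this code.

```python
import math


def hardy_thompson(mu, se, tol=1e-12, max_iter=10_000):
    """Iterate (3.11); returns (mu_hat, sigma2_B_hat, Wald standard error)."""
    var = [s ** 2 for s in se]
    s2_b = 0.0
    for _ in range(max_iter):
        w = [1 / (v + s2_b) for v in var]
        est = sum(wi * m for wi, m in zip(w, mu)) / sum(w)
        w2 = [wi * wi for wi in w]
        new = sum(w2i * ((m - est) ** 2 - v) for w2i, m, v in zip(w2, mu, var)) / sum(w2)
        new = max(new, 0.0)  # keep the variance component non-negative
        converged = abs(new - s2_b) < tol
        s2_b = new
        if converged:
            break
    w = [1 / (v + s2_b) for v in var]
    est = sum(wi * m for wi, m in zip(w, mu)) / sum(w)
    return est, s2_b, 1 / math.sqrt(sum(w))


a_mu = [-0.027, -0.021, 0.001, -0.040, 0.007, -0.105]  # batch A, Table 3.1
a_se = [0.060, 0.026, 0.080, 0.091, 0.061, 0.037]
est_a, s2_b_a, se_a = hardy_thompson(a_mu, a_se)
print(est_a, s2_b_a, se_a)
```

For batch A of Table 3.1, the iteration settles at a small positive $\hat{\sigma}_B^2$ and an estimate that rounds to −0.039, the Hardy entry in Table 3.2.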

3.4.5 Random coefficient model

The previous methods combine bioactivities from multiple bioassays using aggregated data. Alternatively, the individual (raw) data can be analysed in one statistical model to estimate the bioactivity, using the proposed random coefficient model

$$y_{hijk} = \alpha_{hi} + \beta_h x_{hij} + \varepsilon_{hijk}, \qquad (3.13)$$

with $\alpha_{h1}$, $\alpha_{h2}$, and $\beta_h$ random and jointly normally distributed variables with mean values $\alpha_1$, $\alpha_2$, and β and some covariance matrix. In this model the error terms $\varepsilon_{hijk}$ are assumed to be independent and normally distributed with mean zero and variance $\sigma^2$. It is possible to
rewrite Model (3.13) with a random overall intercept ($\eta_h$), a random shift for the unknown preparation ($\delta_h$), and a random slope ($\beta_h$):

$$y_{hijk} = \eta_h + \delta_h 1_{\{2\}}(i) + \beta_h x_{hij} + \varepsilon_{hijk}. \qquad (3.14)$$

The indicator $1_{\{2\}}(i)$ equals one if i = 2 and zero otherwise. The vector $(\eta_h, \delta_h, \beta_h)$ is assumed to follow a multivariate normal distribution with mean $(\eta, \delta, \beta)^T$ and an unstructured covariance matrix Σ. Homogeneous bioactivities are more conveniently defined in Model (3.14): the bioactivities are homogeneous only if the overall intercept is the only random effect. The model then simplifies to

$$y_{hijk} = \eta_h + \delta\,1_{\{2\}}(i) + \beta x_{hij} + \varepsilon_{hijk}. \qquad (3.15)$$

The parameter $\eta_h$ is normally distributed with mean zero and variance $\tau_h^2$ and is independent of the residuals. The shift parameter δ and the slope parameter β are both fixed and determine the bioactivity δ/β, which is the same for all bioassay runs. The maximum likelihood estimators $\hat{\delta}$ and $\hat{\beta}$ are used to estimate the combined bioactivity $\hat{\mu} = \hat{\delta}/\hat{\beta}$. Since $\hat{\mu}$ is a ratio of two random variables, its variance is computed with the delta method (Rice, 1995). The degrees of freedom are estimated with a Satterthwaite approximation (Satterthwaite, 1946).
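The delta-method step can be written out explicitly. For $\hat{\mu} = \hat{\delta}/\hat{\beta}$ the gradient is $(1/\hat{\beta}, -\hat{\mu}/\hat{\beta})$, which gives a variance with the same structure as (3.3). In this Python sketch the estimates and their covariances are plain inputs that would come from the mixed-model fit, and the function name is hypothetical.

```python
import math


def ratio_delta(delta_hat, beta_hat, var_delta, var_beta, cov_db):
    """Delta-method estimate and standard error of mu = delta / beta."""
    mu = delta_hat / beta_hat
    var_mu = (var_delta - 2 * mu * cov_db + mu ** 2 * var_beta) / beta_hat ** 2
    return mu, math.sqrt(var_mu)


mu_hat, se_mu = ratio_delta(0.5, 1.0, var_delta=0.01, var_beta=0.0025, cov_db=0.0)
print(mu_hat, se_mu)
```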

3.5 Pretests for homogeneity

Most of the statistical methods for combining bioactivities used in this chapter first test for homogeneity of the bioactivities, and some also test for homoscedasticity of the within-bioassay variances. In meta-analysis, the test for homogeneity of studies is important because heterogeneity may indicate that the studies cannot be combined at all (Borenstein et al., 2009). For bioassays this is not an issue, because all runs are usually performed under well-defined and similar conditions, but the test outcome can change how the estimates are combined. Pretesting in an analysis of variance (ANOVA) setting has been discussed, and less conservative significance levels have been proposed (Bancroft, 1964). In addition to the commonly used significance level α = 0.05, this chapter uses the liberal significance levels α = 0.25 and α = 0.50 for the pretests.

3.6 A simulation study

To gain better insight into the appropriateness of the described statistical methods, a simulation study was performed. It had two parts: a preliminary simulation (pre-simulation) to identify relevant parameter settings with a relatively small number of simulations
(500), and a large-scale simulation (5000) to obtain precise results for only a few interesting parameter settings. Only the results of the large-scale simulation study are presented.

3.6.1 Design of the simulation study

The simulated individual data $y_{hijk}$ are based on Model (3.13), with the intercepts and the slope of each bioassay simulated from a multivariate normal distribution with mean $(\alpha_1, \alpha_2, \beta)'$. Different assumptions are made about $\alpha_1$ and $\alpha_2$, while β is kept fixed at one: first $\alpha_1 = \alpha_2 = \alpha = 0$; second $\alpha_1 = 0$, $\alpha_2 = 0.2$; third $\alpha_1 = 0$, $\alpha_2 = 0.5$; and fourth $\alpha_1 = 0.2$, $\alpha_2 = 0.5$. The variances of the intercepts of both preparations are assumed to be equal, that is, $\sigma_{\alpha_1}^2 = \sigma_{\alpha_2}^2 = \sigma_\alpha^2$. Four levels of $\sigma_\alpha^2$ and $\sigma_\beta^2$ are used: no variation ($\sigma_\alpha^2 = 0$, $\sigma_\beta^2 = 0$), small variation ($\sigma_\alpha^2 = 0.4\sigma^2$, $\sigma_\beta^2 = 0.4\sigma^2$), moderate variation ($\sigma_\alpha^2 = 0.8\sigma^2$, $\sigma_\beta^2 = 0.8\sigma^2$), and large but not extreme variation ($\sigma_\alpha^2 = 2\sigma^2$, $\sigma_\beta^2 = 2\sigma^2$).

The residual variance is set to $\sigma^2 = 0.01$, which is roughly the value obtained in the case study. For each individual response, the residual $\varepsilon_{hijk}$ is simulated from a normal distribution with mean zero and variance $\sigma^2$. Furthermore, the intercepts $\alpha_{h1}$ and $\alpha_{h2}$ are assumed to be either uncorrelated or positively correlated, with two levels of this correlation ($\rho_{\alpha_1\alpha_2}$): 0 and 0.6.

The number of concentrations (log-doses) is the same in all simulations, namely $J_{hi} = 3$ for both preparations in each bioassay run, and the response is measured five times at each concentration ($K_{hij} = 5$). The number of combined bioassay runs is varied from two to six. For each simulated set of runs, a 95% confidence interval for the true bioactivity is determined with all the described statistical methods. The performance of these methods is assessed by the coverage probability, that is, the probability that the true relative bioactivity falls within the 95% confidence interval.

The length of the confidence interval is used to assess the precision of these methods. Provided that a method has good coverage, a short interval indicates a small standard error and hence a more precise estimate than a long interval (large standard error).
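The coverage and length criteria can be illustrated for the ordinary unweighted average with a small Python simulation that follows the design above. This is a sketch, not the chapter's SAS code: the log-doses (0, 1, 2), the seed, and the helper names are hypothetical, only one method is evaluated, and the t critical values for $df_c = H - 1 \le 5$ are taken from standard tables. With 500 replicates, the Monte Carlo standard error of an estimated coverage near 95% is about one percentage point.

```python
import math
import random

# Two-sided 95% t critical values t_{0.975, df} from standard tables (df = H - 1).
T975 = {1: 12.7062, 2: 4.3027, 3: 3.1824, 4: 2.7764, 5: 2.5706}


def simulate_run(rng, alpha1, alpha2, beta, var_a, var_b, rho, sigma2,
                 doses=(0.0, 1.0, 2.0), reps=5):
    """Simulate one run from Model (3.13) and return its estimate (3.2).

    The log-doses are hypothetical; the design fixes only J_hi = 3 and K_hij = 5.
    """
    sd_a = math.sqrt(var_a)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a = {1: alpha1 + sd_a * z1,
         2: alpha2 + sd_a * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)}  # corr = rho
    b = beta + math.sqrt(var_b) * rng.gauss(0.0, 1.0)
    sd_e = math.sqrt(sigma2)
    xbar = sum(doses) / len(doses)
    sxx = reps * sum((x - xbar) ** 2 for x in doses)  # per preparation
    ybar, sxy = {}, 0.0
    for i in (1, 2):
        obs = [(x, a[i] + b * x + rng.gauss(0.0, sd_e)) for x in doses for _ in range(reps)]
        ybar[i] = sum(y for _, y in obs) / len(obs)
        sxy += sum((x - xbar) * (y - ybar[i]) for x, y in obs)
    slope = sxy / (2 * sxx)  # common slope pooled over both preparations
    # Both preparations use the same doses, so alpha2_hat - alpha1_hat = ybar2 - ybar1.
    return (ybar[2] - ybar[1]) / slope


def coverage_ordinary(n_sim, H, seed=1, **setting):
    """Coverage and mean length of the ordinary-average 95% CI."""
    rng = random.Random(seed)
    true_mu = (setting["alpha2"] - setting["alpha1"]) / setting["beta"]
    hits, total_len = 0, 0.0
    for _ in range(n_sim):
        mus = [simulate_run(rng, **setting) for _ in range(H)]
        mean = sum(mus) / H
        se = math.sqrt(sum((m - mean) ** 2 for m in mus) / (H * (H - 1)))
        half = T975[H - 1] * se
        hits += (mean - half <= true_mu <= mean + half)
        total_len += 2 * half
    return hits / n_sim, total_len / n_sim


sigma2 = 0.01
small = dict(alpha1=0.0, alpha2=0.0, beta=1.0, var_a=0.4 * sigma2,
             var_b=0.4 * sigma2, rho=0.6, sigma2=sigma2)
print(coverage_ordinary(500, 4, **small))
```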

All statistical analyses were performed in SAS® 9.2. For the Hardy and Thompson (1996) method, the SAS macro ‘marandom.sas’ of Senn et al. (2011) was used, modified to estimate bioactivities.

3.6.2 Simulation results

The preliminary simulation demonstrated that the coverage probability is mainly affected by the variance $\sigma^2_\alpha$ and the correlation $\rho_{\alpha_1\alpha_2}$. Furthermore, in the pre-simulation study, combining two to three bioassay runs using the random coefficients model resulted in computational difficulties. Thus, for the main simulation, this method was considered only when combining at least four bioassay runs. In the pre-simulation study, the three maximum likelihood methods (Armitage et al., 1976; Meisner et al., 1986; Williams, 1978) demonstrated the desired coverage probability of approximately 95% when there was no between-bioassay variability. Extremely low coverage was observed when substantial heterogeneity between bioactivities was present. Thus, in the full-scale simulation these methods were considered only when there is no between-bioassay variability and when there is small between-bioassay variability ($\sigma^2_\alpha = 0.4\sigma^2$).

The presented results from the Hardy and Thompson (1996) method are based on Wald confidence intervals instead of profile likelihood intervals. The coverage probability with the profile likelihood was lower than that of the Wald confidence intervals, and the profile likelihood confidence intervals involve a calculation of $\sigma^2_B(\mu)$ for every value of $\mu$, which requires substantially more computational effort. For the large-scale simulation, we present the results only for $\alpha_1 = \alpha_2 = 0$ and for the four settings of no, small, moderate and large between-bioassay variability. The results are provided for two to six bioassay runs, that is, $H = 2$ to $H = 6$. The coverage probabilities are presented in Tables 3.3, 3.4 and 3.5.

This work was financially supported by the pharmaceutical company MSD based in Oss, The Netherlands. © Thembile Virginia Mzolo, 2016.

Table 3.3: Estimated coverage probabilities (%) for homogeneous bioactivities

Runs  Ordinary  Bliss  Cochran  Morse  Armitage  Meisner  Williams  Hardy  RCM

Pretest α = 0.05
H=2   95.1      95.5   95.1     95.8   95.0      95.2     95.0      94.3   -
H=3   94.6      95.0   94.2     95.2   94.2      95.2     94.2      93.7   -
H=4   94.9      95.3   95.1     95.5   95.2      95.5     95.2      94.3   96.1
H=5   94.9      95.5   95.0     95.5   95.2      95.0     95.2      94.3   96.1
H=6   94.5      95.3   94.7     95.2   94.9      95.2     94.9      94.3   95.7

Pretest α = 0.25
H=2   95.1      96.6   96.3     95.8   95.4      97.8     95.4      94.4   -
H=3   94.6      96.1   95.8     95.2   94.5      98.8     94.5      93.8   -
H=4   94.9      96.4   96.4     95.5   95.2      99.4     95.2      94.5   96.9
H=5   94.9      96.7   96.5     95.5   95.3      99.4     95.3      94.8   97.1
H=6   94.5      96.4   96.1     95.2   94.9      99.5     94.9      93.9   96.5

Pretest α = 0.50
H=2   95.1      97.9   97.7     95.8   95.2      98.4     95.3      99.5   -
H=3   94.6      97.3   97.2     95.2   94.3      99.3     94.3      99.3   -
H=4   94.9      97.3   97.5     95.5   94.7      99.7     94.7      99.2   97.2
H=5   94.9      97.7   97.5     95.5   95.0      99.7     95.0      98.8   97.6
H=6   94.5      96.9   96.9     95.2   94.6      99.8     94.6      97.5   97.0

RCM: random coefficients model, applied only for H ≥ 4.

The first panel of Table 3.3 (pretest significance level α = 0.05) indicates that most statistical methods resulted in coverage approximately equal to the 95% nominal value. When the pretest significance level α is increased, the coverage of the weighted average methods (Bliss, 1952; Cochran, 1954; Hardy and Thompson, 1996), the Meisner et al. (1986) method, and the random coefficients model increases, while the remaining methods keep their coverage close to the nominal value. When the individual bioactivities are not homogeneous but a small between-bioassay variability exists among them ($\sigma^2_\alpha = 0.4\sigma^2$ and $\rho_{\alpha_1\alpha_2} = 0$), the ordinary unweighted average method outperforms all other statistical methods at significance level α = 0.05, because it gives the best coverage probability (Table 3.4).

Table 3.4: Estimated coverage probabilities (%) when there is some between-bioassay variability

Runs  Ordinary  Bliss  Cochran  Morse  Armitage  Meisner  Williams  Hardy  RCM

Pretest α = 0.05
H=2   94.9      74.1   74.7     79.6   52.8      35.8     52.9      71.7   -
H=3   95.1      83.4   84.2     86.5   55.3      23.2     55.5      82.4   -
H=4   95.3      88.6   88.7     89.5   55.3      15.9     55.3      87.8   91.2
H=5   95.2      91.6   91.7     90.9   54.8      9.8      54.9      90.4   91.8
H=6   94.6      92.2   92.6     91.4   54.0      6.7      54.1      91.8   92.9

Pretest α = 0.25
H=2   94.9      83.8   83.8     79.6   52.5      61.3     52.5      75.5   -
H=3   95.1      91.5   92.0     86.5   56.3      65.6     56.3      87.0   -
H=4   95.3      93.7   93.8     89.5   53.0      67.2     53.0      90.8   94.5
H=5   95.2      94.5   94.5     90.9   54.4      65.9     54.4      92.3   95.1
H=6   94.6      94.4   94.3     91.4   51.9      67.2     51.9      92.9   95.0

Pretest α = 0.50
H=2   94.9      90.6   90.4     79.6   53.7      62.3     53.7      99.9   -
H=3   95.1      94.6   94.6     86.5   55.4      69.5     55.4      96.8   -
H=4   95.3      95.3   95.3     89.5   55.0      70.4     55.0      94.6   95.4
H=5   95.2      95.1   95.1     90.9   50.5      72.0     50.5      93.8   95.8
H=6   94.6      94.6   94.6     91.4   52.5      71.9     52.5      93.5   95.4

The weighted average methods resulted in coverage probabilities in the range of 74%–93%, which is much lower than the nominal 95%. Even worse results were observed for the maximum likelihood methods based on a fixed effects model: their coverage probabilities were of the order of 55% for Armitage et al. (1976) and Williams (1978). The Meisner et al. (1986) method resulted in the lowest coverage, as low as 7% when six bioassays were combined. This is also true, but to a lesser extent, for the methods of Armitage et al. (1976) and Williams (1978). The coverage obtained from the Hardy and Thompson (1996) method and the random coefficients model is also below the nominal value (95%) and similar to that of the weighted average methods.

When the pretest significance level is increased to α = 0.25 or α = 0.50, the coverage probabilities improve for the weighted average methods (except for the Morse and Bickle (1967) method) and for the two random effects models. Note that the method of Morse and Bickle (1967) does not apply a pretest on the homogeneity of bioactivities and is thus unaffected by the significance level. When the number of bioassays is two, the coverage probabilities remain below the nominal value, but better coverage is observed for four or more bioassays at the significance level α = 0.50. However, the method of Hardy and Thompson (1996) remains somewhat below the nominal value. The same phenomenon is observed when the heterogeneity becomes larger ($\sigma^2_\alpha = 2\sigma^2$) and the correlation $\rho_{\alpha_1\alpha_2}$ is 0.6 (Table 3.5).
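The homogeneity pretest referred to above can be illustrated with an inverse-variance weighted mean and a Cochran-type chi-square statistic $Q = \sum_h w_h (X_h - \bar X_w)^2$ with $w_h = 1/S_h^2$, compared with a $\chi^2_{H-1}$ distribution. This is an illustrative sketch only: the exact test statistics of Bliss (1952) and Cochran (1954), and the estimators they use after homogeneity is rejected, are not reproduced, and the function names are hypothetical. The upper-tail probability uses the closed form for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(chi2_k > x) for a positive integer k (closed form)."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{j < k/2} (x/2)^j / j!
        term, total = 1.0, 0.0
        for j in range(k // 2):
            total += term
            term *= (x / 2.0) / (j + 1)
        return math.exp(-x / 2.0) * total
    # Odd k: 2Q(sqrt x) + 2 phi(sqrt x) * sum_j x^((2j-1)/2) / (1*3*...*(2j-1))
    r = math.sqrt(x)
    tail = math.erfc(r / math.sqrt(2.0))  # equals P(|Z| > r) = P(chi2_1 > x)
    term, total = r, 0.0
    for j in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    return tail + 2.0 * pdf * total


def weighted_mean_with_pretest(estimates, std_errors, alpha_pretest=0.05):
    """Inverse-variance weighted mean plus a chi-square homogeneity pretest."""
    if len(estimates) < 2 or len(estimates) != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    w = [1.0 / s ** 2 for s in std_errors]
    sw = sum(w)
    mean_w = sum(wi * xi for wi, xi in zip(w, estimates)) / sw
    q = sum(wi * (xi - mean_w) ** 2 for wi, xi in zip(w, estimates))
    p_value = chi2_sf(q, len(estimates) - 1)
    return {
        "weighted_mean": mean_w,
        "se": 1.0 / math.sqrt(sw),
        "Q": q,
        "p_value": p_value,
        # Homogeneity is rejected when p < alpha_pretest.
        "homogeneous": p_value >= alpha_pretest,
    }
```

Because homogeneity is rejected whenever the p-value falls below `alpha_pretest`, raising it from 0.05 to 0.25 or 0.50 rejects homogeneity more often.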

Table 3.5: Estimated coverage probabilities (%) when there is large between-bioassay variability

Runs  Ordinary  Bliss  Cochran  Morse  Hardy  RCM

Pretest α = 0.05
H=2   95.0      75.3   75.7     79.5   72.0   -
H=3   95.6      87.0   87.3     87.6   85.2   -
H=4   95.0      91.6   91.9     90.3   90.2   93.3
H=5   95.2      93.5   93.7     91.3   91.8   94.3
H=6   94.9      93.8   94.1     92.6   92.8   94.7

Pretest α = 0.25
H=2   95.0      84.9   85.0     79.5   76.9   -
H=3   95.6      93.0   93.2     87.6   88.7   -
H=4   95.0      94.2   94.4     90.3   92.0   95.2
H=5   95.2      95.0   95.1     91.3   92.6   95.5
H=6   94.9      94.7   94.7     92.6   93.1   95.4

Pretest α = 0.50
H=2   95.0      91.4   91.2     79.5   99.9   -
H=3   95.6      95.1   95.1     87.6   95.1   -
H=4   95.0      94.8   94.8     90.3   93.5   95.5
H=5   95.2      95.1   95.2     91.3   93.1   95.9
H=6   94.9      94.8   94.8     92.6   93.4   95.5

The coverage probabilities indicate how well each statistical method estimates the common bioactivity; when different methods have equal coverage probabilities, the method with the shortest interval is preferred. Because the maximum likelihood approaches for the fixed effects model and the method of Morse and Bickle (1967) did not provide appropriate coverage probabilities, these methods are not considered further in the comparison of interval lengths. Table 3.6 presents the length of the 95% confidence intervals when the bioactivity estimates are homogeneous and when there is slight heterogeneity. For homogeneous bioactivities, the weighted average methods gave shorter confidence intervals than the ordinary unweighted average method, in particular when α is set to 0.05 and the number of bioassays is small. For larger values of α and four or more bioassay runs, the difference is negligible. The Hardy and Thompson (1996) method shows shorter intervals than any of the other methods, in particular for α = 0.05, but its coverage is then also too low.

Table 3.6: Estimated length of the 95% confidence intervals

Runs  Ordinary  Bliss  Cochran  Hardy  RCM

Homogeneous bioactivities, pretest α = 0.05
H=2   0.514     0.162  0.168    0.114  -
H=3   0.160     0.097  0.096    0.090  -
H=4   0.107     0.080  0.078    0.077  0.121
H=5   0.086     0.071  0.070    0.069  0.085
H=6   0.073     0.064  0.062    0.062  0.070

Homogeneous bioactivities, pretest α = 0.25
H=2   0.514     0.320  0.331    0.154  -
H=3   0.160     0.128  0.131    0.100  -
H=4   0.107     0.096  0.096    0.083  0.148
H=5   0.086     0.081  0.081    0.073  0.103
H=6   0.073     0.071  0.071    0.065  0.082

Homogeneous bioactivities, pretest α = 0.50
H=2   0.514     0.455  0.453    0.671  -
H=3   0.160     0.154  0.154    0.188  -
H=4   0.107     0.107  0.108    0.121  0.158
H=5   0.086     0.088  0.088    0.095  0.110
H=6   0.073     0.075  0.075    0.080  0.087

Heterogeneous bioactivities, pretest α = 0.05
H=2   1.376     1.064  1.09     0.652  -
H=3   0.429     0.382  0.387    0.308  -
H=4   0.283     0.264  0.266    0.230  0.377
H=5   0.229     0.221  0.222    0.198  0.278
H=6   0.192     0.187  0.188    0.172  0.212

Heterogeneous bioactivities, pretest α = 0.25
H=2   1.376     1.279  1.284    0.801  -
H=3   0.429     0.418  0.419    0.332  -
H=4   0.283     0.279  0.279    0.239  0.396
H=5   0.229     0.227  0.227    0.202  0.288
H=6   0.192     0.191  0.191    0.174  0.217

Heterogeneous bioactivities, pretest α = 0.50
H=2   1.376     1.350  1.349    1.102  -
H=3   0.429     0.427  0.427    0.361  -
H=4   0.283     0.282  0.282    0.247  0.399
H=5   0.229     0.228  0.228    0.205  0.290
H=6   0.192     0.191  0.191    0.175  0.218


This pattern seems to hold for all of the simulation settings. When the coverage is approximately equal to the nominal value, the length of the confidence interval becomes similar to the length of the ordinary unweighted average method.

3.7 Discussion and conclusion

This chapter focuses on the performance of statistical methods in estimating a common bioactivity from parallel line bioassays. These methods are the ordinary unweighted average; the weighted averages of Bliss (1952), Cochran (1954), and Morse and Bickle (1967); the maximum likelihood methods for fixed effects models of Armitage et al. (1976), Williams (1978) and Meisner et al. (1986); the maximum likelihood method for a random effects model of Hardy and Thompson (1996); and the random coefficients model. For most of these methods, no simulation study assessing their performance in estimating a combined relative bioactivity has been published.

The simulation results indicated that the ordinary unweighted average method gave consistently good coverage probabilities for the 95% confidence intervals of a true common bioactivity, and it is by far the simplest of all methods considered in this chapter. It performed well even when only a few bioassay runs were combined, with or without heterogeneity. However, the confidence intervals obtained with this method tend to be wider than those of other methods when the bioactivities are truly homogeneous. The major drawback of the unweighted average method is that it ignores the most important part of the design of the study: it assumes that all bioassay estimates have the same precision, which in general may not be the case (Finney, 1978). To account for this drawback, some kind of weighting is vital, and in this chapter we focused on three weighted average methods. These weighted average methods gave good coverage when the bioactivity estimates were homogeneous, and much narrower confidence intervals than the ordinary unweighted average method. However, the presence of some between-bioassay variability among the bioactivities led to a substantial drop in coverage probability, which was more pronounced when at most three bioassay runs were combined. Using a more liberal pretest significance level improved the coverage, except when combining homogeneous relative bioactivities.
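For reference, the ordinary unweighted average and its interval can be computed as below. This is a minimal sketch assuming the usual Student-t interval with $H - 1$ degrees of freedom based on the sample standard deviation of the $H$ log relative bioactivities; the 97.5% t quantiles are hard-coded for $H = 2$ to $6$ (the range simulated here) to keep the example dependency-free, and the function name is hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical values t_{0.975, df} for df = H - 1 = 1..5.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def unweighted_average_ci(estimates):
    """Ordinary unweighted average of H log relative bioactivities with a 95% t-interval."""
    h = len(estimates)
    if h - 1 not in T_975:
        raise ValueError("this sketch only tabulates t quantiles for H = 2..6")
    mean = statistics.fmean(estimates)
    # Every run gets the same weight: the run-to-run spread alone drives the SE,
    # so any between-bioassay variability is automatically included.
    se = statistics.stdev(estimates) / math.sqrt(h)
    half_width = T_975[h - 1] * se
    return mean, (mean - half_width, mean + half_width)
```

The large multiplier for $H = 2$ (12.706) shows why intervals based on only two runs are wide.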

The three maximum likelihood methods based on a fixed effects model (Armitage et al., 1976; Meisner et al., 1986; Williams, 1978) resulted in good coverage when the bioactivities were homogeneous. Furthermore, these methods gave the shortest confidence intervals compared with the (un)weighted average methods. However, lower coverage was observed when heterogeneous bioactivities were considered. This is not surprising, because these methods were essentially developed for homogeneous bioactivities. As a result, the additional variation between bioassays is not taken into account when the confidence intervals are constructed, although the union of the disjoint intervals should have overcome this problem.

A maximum likelihood method based on a random effects model (Hardy and Thompson, 1996), which also estimates the common bioactivity as a weighted average, was considered as well. The main advantage of this method is that it takes into account that the between-bioassay variability is not known but estimated from the data. This method resulted in a coverage probability a little below 95% when the combined bioactivities were homogeneous. The coverage dropped further as soon as the combined relative bioactivities were heterogeneous. Increasing the pretest significance level did improve the coverage probability, but it remained somewhat below the nominal value (95%), except possibly when the number of bioassay runs is two. Note that the coverage probabilities presented in this chapter are based on Wald confidence intervals rather than profile likelihood confidence intervals, because the Wald intervals had somewhat larger coverage. The method of Hardy and Thompson (1996) was developed for meta-analysis, where it has been shown to perform better than the other statistical methods it was compared with (Brockwell and Gordon, 2001). Nevertheless, for bioassays, this method did not yield better results than those of Bliss (1952) or Cochran (1954).

Maximum likelihood estimation for the random coefficients model resulted in coverage above the nominal value when the combined bioactivities were homogeneous. Combining heterogeneous bioactivities resulted in a drop in coverage below the 95% nominal value. The use of liberal significance levels improved this coverage, but in most cases this method resulted in very wide confidence intervals compared with the other methods. The confidence intervals were based on the delta method, and future research may investigate other approximations for constructing these confidence intervals.

In conclusion, the ordinary unweighted average method consistently gave better coverage probabilities, but with wider confidence intervals for homogeneous bioactivities. The other statistical methods resulted in good coverage in some cases and poor coverage in others, and thus lacked consistency. Relaxing the pretest significance level improved the coverage probabilities; the weighted average methods, especially those of Bliss (1952) and Cochran (1954), then performed similarly to or better than the ordinary unweighted average method when three or more bioassays were combined. Most other statistical methods still resulted in too low coverage probabilities or gave much wider confidence intervals when a liberal pretest significance level was used. Thus, the methods of Bliss (1952) and Cochran (1954) are competitive with the ordinary unweighted average method when appropriate significance levels are used. Lastly, we observed that methods that are highly recommended in meta-analysis are not necessarily the best methods for combining bioactivities from parallel line bioassays.


References

Armitage, P., Bennett, B. M., and Finney, D. J. (1976), “Point and Interval Estimation in the Combination of Bioassay Results,” Tropical Medicine, 76, 147–162.

Bancroft, T. (1964), “Analysis and Inference for Incompletely Specified Models Involving the Use of Preliminary Test(s) of Significance,” Biometrics, 20, 427–442.

Bliss, C. I. (1952), The Statistics of Bioassay, New York: Academic Press Inc.

Borenstein, M., Hedges, L. V., Higgins, J. P. T., and Rothstein, H. (2009), Introduction to Meta-analysis, New York: John Wiley and Sons.

Brockwell, S. E. and Gordon, I. R. (2001), “A Comparison of Statistical Methods for Meta-analysis,” Statistics in Medicine, 20, 825–840.

Chen, D., Carter, E., Hubert, J., and Kim, P. (1999), “Empirical Bayes Estimation for Combinations of Multivariate Bioassays,” Biometrics, 55, 1038–1043.

Chen, D. G. (2007), “Bootstrapping Estimation for Estimating Relative Potency in Combinations of Bioassays,” Computational Statistics and Data Analysis, 51, 4597–4604.

Cochran, W. G. (1954), “The Combination of Estimates from Different Experiments,” Biometrics, 10, 101–129.

Finney, D. J. (1978), Statistical Method in Biological Assay, London: Charles Griffin & Co. Ltd.

Glass, G. V. (1976), “Primary, Secondary and Meta-analysis Research,” Educational Researcher, 10, 3–8.

Hardy, R. J. and Thompson, S. G. (1996), “A Likelihood Approach to Meta-analysis With Random Effects,” Statistics in Medicine, 15, 619–629.

Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2005), Applied Linear Statistical Models, New York: McGraw-Hill/Irwin, 5th ed.

Laird, N. M. and Ware, J. H. (1982), “Random-effects Models for Longitudinal Data,” Biometrics, 38, 963–974.

Laska, E. M. and Meisner, M. J. (1987), “Statistical Methods and Applications of Bioassay,” Annual Review of Pharmacology and Toxicology, 27, 385–397.

Meisner, M., Kushner, H. B., and Laska, E. M. (1986), “Multivariate Combining Bioassays,” Biometrics, 42, 421–427.


Morse, P. M. and Bickle, A. (1967), “The Combination of Estimates from Similar Experiments, Allowing for Inter-experiment,” Journal of the American Statistical Association, 62, 241–250.

Rice, J. A. (1995), Mathematical Statistics and Data Analysis, Belmont: Wadsworth Inc, 2nd ed.

Sánchez-Meca, J. and Marín-Martínez, F. (2008), “Confidence Intervals for the Overall Effect Size in Random-effects Meta-analysis,” Psychological Methods, 13, 31–48.

Satterthwaite, F. (1946), “An Approximate Distribution of Estimates of Variance Components,” Biometrics Bulletin, 2, 110–114.

Senn, S., Weir, J., Hua, T. A., Berlin, C., Branson, M., and Glimm, E. (2011), “Creating a Suite of Macros for Meta-analysis in SAS: A Case Study in Collaboration,” Statistics and Probability Letters, 81, 842–851.

Sidik, K. and Jonkman, J. N. (2005), “Simple Heterogeneity Variance Estimation for Meta-analysis,” Journal of the Royal Statistical Society, Series C, 54, 367–384.

USP <1034> (2010), “Analysis of Biological Assays,” Tech. rep., United States Pharmacopeia.

Verbeke, G. and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data, New York: Springer.

Williams, D. A. (1978), “An Exact Confidence Region for a Relative Potency Estimated from Combined Bioassays,” Biometrics, 34, 659–661.


CHAPTER 4

Statistical process control methods for monitoring in-house reference standards

The original version appears in Statistics in Biopharmaceutical Research (2015), 7(1), 55–65.


Abstract

For certain types of medicine, the biological strength or bioactivity is the main characteristic for the release of products to the market. A pharmaceutical company may decide to use its own in-house reference standard to test the drug instead of the expensive international reference standard. The company is then legally obliged to verify that the in-house reference standard remains stable over time with respect to the international one. This is a special problem within statistical process control (SPC), since the monitoring period is relatively short and bioassays are typically heterogeneous. The objective of this chapter is to apply methods from SPC and dose-finding studies to different study designs and to assess how well these methods detect a decline in bioactivity. The included methods are the exponentially weighted moving average (EWMA), the Shewhart chart, and analysis of variance (ANOVA)-type contrasts (linear, Helmert, and reverse-Helmert). An optimal α-spending function was first selected to avoid inflating the familywise error rate; the normal α-spending function generally performed best. The linear and reverse-Helmert contrasts and the EWMA (λ ≤ 0.6) resulted in high power when a change occurred at earlier time points, while the Helmert contrasts and the EWMA (λ > 0.6) performed better for later declines. Linear contrasts and the Shewhart chart performed well irrespective of the decline profile. Having more bioassay runs at the beginning of the stability study increased the probability of detecting a decline. If there is no prior information on the expected deterioration profile, linear contrasts or the Shewhart chart should be preferred; otherwise, reverse-Helmert or Helmert contrasts should be chosen for early or late deterioration, respectively.

Keywords: Bioassay; dose-finding; familywise error; quality control; sequential tests; stability


4.1 Introduction

In the pharmaceutical industry, batches of medicinal products are manufactured and released to the market, and for certain types of medicine the bioactivity is the main characteristic of the preparation. The bioactivity measures the biological strength of a drug by comparing it with an international reference standard. This international reference standard is quite costly for routine use. As a consequence, a pharmaceutical company may decide to develop its own in-house reference standard, tested against the international standard, to take its place for batch release. The pharmaceutical company is then legally obliged to verify that the in-house reference standard remains stable over time. Therefore, at regular time points (e.g., every year), experiments are conducted to re-evaluate the estimated bioactivity of the in-house reference standard with respect to the international reference standard. A small decline in the bioactivity of an in-house reference standard is an indication that the in-house reference is deteriorating.

Monitoring a process is a very active area of research, commonly referred to as statistical process control (SPC). In SPC, control charts with appropriate control limits are used to monitor process characteristics. SPC typically leads to improved processes when the special causes behind out-of-control signals can be identified and prevented from recurring. Different control charts are available for studying the stability of processes. For detecting small changes, the commonly used control charts are the cumulative sum (CUSUM) chart (Page, 1954) and the exponentially weighted moving average (EWMA) chart (Roberts, 1959). These two control charts are useful for monitoring a process mean (Aerne et al., 1991; Chang and Fricker Jr, 1999; Gan, 1991) and a process variance (Hawkins, 1991; Yeh et al., 2005, 2004, 2003) when the goal is to detect small changes. The EWMA is more flexible than the CUSUM, since it is robust against violation of the normality assumption (Montgomery, 2009) and its control limits are easier to calculate. Properties of the EWMA for monitoring a shift in the process mean have been evaluated (Lucas and Saccucci, 1990; Yashchin, 1987), and λ = 0.2 is the commonly used weighting parameter for detecting smaller changes in a mean (Lucas and Saccucci, 1990; Montgomery, 2009).
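The EWMA recursion and its control limits can be sketched as follows, using the standard textbook form (e.g., Montgomery, 2009): $z_t = \lambda x_t + (1-\lambda) z_{t-1}$ with $z_0 = \mu_0$, and limits $\mu_0 \pm L\sigma\sqrt{\lambda/(2-\lambda)\,[1-(1-\lambda)^{2t}]}$. The multiplier `L = 3`, the known in-control mean `mu0` and standard deviation `sigma`, and the function names are illustrative assumptions; the bioassay-specific adjustments of this chapter are not included.

```python
import math


def ewma_chart(x, mu0, sigma, lam=0.2, L=3.0):
    """Return (z_t, LCL_t, UCL_t) for each observation, with time-varying limits."""
    z_prev = mu0
    out = []
    for t, xt in enumerate(x, start=1):
        z = lam * xt + (1.0 - lam) * z_prev
        # Exact (non-asymptotic) half-width of the limits at time t.
        width = L * sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        out.append((z, mu0 - width, mu0 + width))
        z_prev = z
    return out


def first_signal(chart):
    """1-based index of the first EWMA value outside its limits, or None."""
    for t, (z, lcl, ucl) in enumerate(chart, start=1):
        if z < lcl or z > ucl:
            return t
    return None
```

With `lam=1.0` the EWMA equals the latest observation and the limits become $\mu_0 \pm L\sigma$, so the chart reduces to a Shewhart chart for individual values. For detecting a decline only, one would check the lower limit alone.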

Monitoring the stability of an in-house reference standard differs from general SPC, since the reference standard has a relatively short shelf life with respect to the number of tests conducted to verify its stability, whereas in standard SPC the process is generally assumed to be long term. Short production runs do occur, however, and specific statistical methods have been developed for short-run SPC, such as the Q-charts (Crowder, 1992; Del Castillo and Montgomery, 1994; Montgomery, 2009). Unfortunately, these charts do not perform better than the traditional SPC methods; for example, the EWMA has been shown to perform better than the Q-chart in short-run SPC (Del Castillo and Montgomery, 1994).

Monitoring a reference standard is typically done sequentially, and in this respect it closely resembles the way dose-finding studies are conducted. In such studies, the objective may be to find the minimally effective dose (MED), the optimal dose, the minimum detectable dose (MDD), or several other doses on the basis of smart escalation schemes (Collins et al., 1990). A sequential testing procedure is thus followed until a change in response with respect to placebo is detected. A similar process is carried out when monitoring the bioactivity, where the whole analysis is halted as soon as the null hypothesis of a stable bioactivity is rejected.

A vast literature of methodology applicable to these dose-finding studies exists. For example, analysis of variance (ANOVA), linear regression, and logistic regression can be applied, but for continuous outcomes ANOVA is often considered the most favourable since it does not assume a specific dose-response relationship (Ruberg, 1995). ANOVA can be combined with different types of contrasts, such as the linear (Ruberg, 1995), Helmert, step, basin (Ruberg, 1989), or reverse-Helmert (Tamhane et al., 1996) contrasts, to determine a change in the response with respect to the dose escalation schemes. Some of these contrasts are less suitable for monitoring bioactivities, since they require all data to be available at the time of analysis (e.g., step and basin), whereas the sequential approach used for testing the stability of reference standards accumulates data at each time point. In dose-finding studies, some approaches assume a response that increases with dose, typically of the form of the logistic link function. These approaches are usually implemented for binary outcomes; for continuous outcomes, like the one studied in this chapter, this is less common. The number of tests performed to assess the stability of the reference standard implies that some control over the familywise error (FWE) rate is necessary. Several approaches have been proposed to adjust the individual α-level to maintain a pre-specified FWE rate (Lan and DeMets, 1983; Nakamura and Douke, 2007).
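As a generic illustration of an ANOVA-type contrast, the sketch below computes a linear trend contrast over the means observed at successive time points. It assumes independent group means with a known common error standard deviation `sigma` (so it returns a z-statistic rather than the t-statistic an ANOVA would use) and coefficients $c_i = t_i - \bar t$; it does not include the adjustments for heterogeneity and reported standard errors made in this chapter, and the function name is hypothetical.

```python
import math


def linear_contrast_z(times, means, n_per_group, sigma):
    """z-statistic of a linear trend contrast over group means.

    Coefficients c_i = t_i - mean(t) sum to zero; for independent group
    means, var(sum c_i * mean_i) = sigma^2 * sum(c_i^2 / n_i).
    """
    tbar = sum(times) / len(times)
    c = [t - tbar for t in times]
    estimate = sum(ci * m for ci, m in zip(c, means))
    se = sigma * math.sqrt(sum(ci ** 2 / n for ci, n in zip(c, n_per_group)))
    return estimate / se  # strongly negative values point to a decline over time
```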

Despite the availability of all these methods (control charts, methods for dose-finding, and multiple testing methods), there is still a need to develop or evaluate a statistical methodology capable of detecting a reduction in the bioactivity of an in-house reference at different storage times. Besides the issue of a small predetermined monitoring window, bioassays have two aspects that are typically not addressed in these research areas. The first is that bioactivities are accompanied by a precision measure (i.e., a standard error) that indicates how precisely the bioactivity has been estimated. The second is that multiple estimates of the same bioactivity at one time point (but also over time points) typically exhibit heterogeneity (Mzolo et al., 2013): the variability between bioactivity estimates is larger than expected from the precision estimates. These two aspects need to be incorporated in the available methods to obtain an optimal approach for testing the stability of an in-house reference standard.

This chapter studies the application of the EWMA to monitoring the stability of an in-house reference standard over a prespecified and finite period, taking into account the characteristics of bioassay data. Motivated by the methodology commonly used in dose-finding studies, ANOVA-type contrasts are also included after making the appropriate adjustments to handle bioassay data. The selected contrasts are the linear, Helmert, and reverse-Helmert contrasts. Prior to studying the power of the methods, the optimal α-spending function that strongly controls the FWE rate is chosen. For this, the αt, log, normal (Lan and DeMets, 1983), and Bonferroni-Holm (Budde and Bauer, 1989) functions are compared. Furthermore, different designs, that is, different numbers of bioassay runs at each time point, are investigated. The main objective is to find the combination of method and design that gives the highest power for detecting a decline in the bioactivity of an in-house reference standard within the given monitoring period under a controlled type I error rate.
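The α-spending functions named above can be sketched as follows. This assumes the standard Lan and DeMets (1983) forms, interpreting "αt" as $\alpha t$, "log" as the Pocock-type $\alpha \ln(1 + (e-1)t)$ and "normal" as the O'Brien-Fleming-type $2 - 2\Phi(z_{1-\alpha/2}/\sqrt{t})$, where $t \in (0, 1]$ is the information fraction; the Bonferroni-Holm approach is not covered, and the function names are hypothetical. The sketch only shows how the total α is distributed over the looks; converting the increments into critical values for correlated sequential tests requires the recursive numerical integration of Lan and DeMets (1983), which is not shown.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def spend_linear(t, alpha=0.05):
    """'alpha t' spending: alpha is spent in proportion to the information fraction."""
    return alpha * t


def spend_log(t, alpha=0.05):
    """Pocock-type spending: alpha * ln(1 + (e - 1) t)."""
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def spend_normal(t, alpha=0.05):
    """O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{1 - alpha/2} / sqrt(t))."""
    if t <= 0:
        return 0.0
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * _STD_NORMAL.cdf(z / math.sqrt(t))


def increments(spend, fractions, alpha=0.05):
    """Alpha newly spent at each look, for increasing information fractions in (0, 1]."""
    out, prev = [], 0.0
    for t in fractions:
        cum = spend(t, alpha)
        out.append(cum - prev)
        prev = cum
    return out
```

For four equally spaced looks, `increments(spend_normal, [0.25, 0.5, 0.75, 1.0])` spends very little α early and most of it at the end, whereas `spend_log` front-loads it.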

The chapter is structured as follows. The statistical methodology is described in Section 4.2, the design of the simulation study and its settings are described in Section 4.3, the results from the simulation and the example are reported in Section 4.4, and Section 4.5 presents the discussion and conclusions.

4.2 Statistical methods

4.2.1 Bioassay experiments

The relative bioactivity of a sample measures its biological strength with respect to a reference standard. A relative bioactivity of one means that the sample and reference are equally potent. To obtain the relative bioactivity, a bioassay experiment is conducted. Both the sample and the reference are prepared at different concentrations, and their biological response is observed at each concentration. A dose-response function of the same form is fitted to the observed data for each test sample. When the dose-response functions coincide, the relative bioactivity is equal to one; more generally, the horizontal shift between them indicates the relative potency. Figure 4.1 illustrates this principle for a parallel line bioassay (Finney, 1978; Mzolo et al., 2013).

Here the mathematical form of the dose-response relation is linear within a certain range of concentrations. The bioassay experiment, which contains two test samples, provides just one relative bioactivity. For a parallel line bioassay, the relative bioactivity on the logarithmic scale is estimated by $X = (\alpha_S - \alpha_R)/\beta$, with $\alpha_S$ and $\alpha_R$ the intercepts of the regression lines for the sample and reference, respectively, and $\beta$ the common slope of both parallel regression lines. The experiment also provides a standard error of the estimate, given by

$$S = \sigma_0 \left( \frac{\eta_{11} - 2\eta_{12} X + \eta_{22} X^2}{\beta^2} \right)^{1/2},$$

with $\sigma_0$ the residual standard deviation and the $\eta_{ij}$ known constants from the bioassay experiment (Finney, 1978). Additionally, the number of degrees of freedom $df$ corresponding to $\sigma_0$ is provided. Thus, one bioassay experiment provides the triple $(X, S, df)$.
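The two formulas above can be evaluated directly once the parallel-line model has been fitted. The sketch below is a minimal illustration under stated assumptions: the function name `parallel_line_estimate` and the numbers in the usage line are hypothetical, and the constants η11, η12, η22 are taken as given by the assay layout rather than derived here.

```python
import math


def parallel_line_estimate(alpha_s, alpha_r, beta, sigma0, eta11, eta12, eta22):
    """Log relative bioactivity X and its standard error S for a parallel-line assay.

    X = (alpha_s - alpha_r) / beta
    S = sigma0 * sqrt((eta11 - 2*eta12*X + eta22*X**2) / beta**2)
    The eta constants depend on the assay layout and are assumed to be known.
    """
    if beta == 0:
        raise ValueError("the common slope must be non-zero")
    x = (alpha_s - alpha_r) / beta
    s = sigma0 * math.sqrt((eta11 - 2.0 * eta12 * x + eta22 * x ** 2) / beta ** 2)
    return x, s


# Hypothetical fit: equal intercepts, so sample and reference are equally potent (X = 0).
x, s = parallel_line_estimate(1.20, 1.20, 0.8, 0.02, 0.5, 0.0, 1.0)
```

Together with the residual degrees of freedom of the fit, `(x, s)` forms one triple $(X, S, df)$.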

Statistical methods for the analysis of bioassay data. This work was financially supported by the pharmaceutical company MSD based in Oss, The Netherlands. © Thembile Virginia Mzolo, 2016

72 4.2. Statistical methods

Figure 4.1: Graphical representation of the relative bioactivity from a parallel line bioassay for a known (solid line) preparation and a test (dashed line) preparation

In this chapter, the sample is the in-house reference and the reference standard is the international reference. The information that becomes available for evaluating the stability of the bioactivity of an in-house reference is thus the triple $(X_{ij}, S_{ij}, df_{ij})$, with $i = 1, 2, \ldots, I$ and $j = 1, 2, \ldots, J_i$. Here $X_{ij}$ is the $j$th bioactivity at time point $t_i$, $S_{ij}$ the corresponding standard error, and $df_{ij}$ the accompanying degrees of freedom. The first time point $t_1$ is typically the moment the in-house reference standard is introduced ($t_1 = 0$). This period represents Phase I in traditional SPC, while all follow-up measurements ($t > 0$) represent Phase II.

4.2.2 Statistical model and hypotheses

The mixed-effects model chosen to describe the observed bioactivities is

$$X_{ij}(t_i) = \mu(t_i) + \delta_{ij} + \varepsilon_{ij}, \tag{4.1}$$

with $\mu(t_i)$ a nonincreasing function of time, $t_i \in [0, T]$ the storage time at the $i$th evaluation, $T$ the time at which the in-house reference is replaced by another, $\delta_{ij}$ the effect of the $j$th bioassay run at the $i$th time point (heterogeneity of the bioactivities) with $\delta_{ij} \sim N(0, \sigma_R^2)$, and $\varepsilon_{ij}$ the residuals with $\varepsilon_{ij} \sim N(0, \sigma_0^2)$. It is assumed that $\delta_{ij}$ and $\varepsilon_{ij}$ are mutually independent. Furthermore, it is assumed that the $S_{ij}^2$ are estimates of the error variance component $\sigma_0^2$, with $df_{ij} S_{ij}^2 / \sigma_0^2 \sim \chi^2_{df_{ij}}$.

The evaluation of the stability of the bioactivity of an in-house reference is performed sequentially, and the hypotheses at the $k$th time point are formulated as

$$H_{0,k}:\ \mu(t_1) = \mu(t_2) = \cdots = \mu(t_k), \tag{4.2}$$

$$H_{a,k}:\ \exists\, l > 1 \text{ s.t. } \mu(t_l) \neq \mu(t_1) \;\wedge\; \mu(t_1) \ge \mu(t_2) \ge \cdots \ge \mu(t_k),$$

with $l \in \{2, 3, \ldots, k\}$. A decline typically occurs either as a sudden shift or as a smooth trend. It can happen that after the first time point the bioactivity undergoes a sudden shift and remains at that level for the rest of the time points. Another possibility is that the bioactivity remains stable at the first two time points but declines soon after the second time point, and so forth. Alternatively, the bioactivity can decline steadily from the moment it is stored (initial time point). When a sudden shift takes place, $\mu(t_i) = \mu - \Delta\, \mathbf{1}_{[t_0, \infty)}(t_i)$ for some $t_0$ in the period $(0, T)$, implying that from a certain time point onward the bioactivity is $\Delta$ below the initial bioactivity. If a trend occurs, $\mu(t_i)$ can take many forms, but it is assumed here to be linear, $\mu(t_i) = \alpha + \beta t_i$ with $\beta \le 0$, on the whole interval $[0, T]$. In this study two different cases are considered for the run-to-run variability:

(a) Homogeneous bioactivities ($\sigma_R^2 = 0$)

The estimate of the residual variance $\sigma_0^2$ at time point $k$ is

$$\hat{\sigma}_{0,k}^2 = \frac{SSW_k + \sum_{i=1}^{k} \sum_{j=1}^{J_i} df_{ij} S_{ij}^2}{(J_{.k} - k) + \sum_{i=1}^{k} \sum_{j=1}^{J_i} df_{ij}} \tag{4.3}$$

(Cochran, 1954), with $SSW_k$ the within sum of squares at the $k$th time point, $SSW_k = \sum_{i=1}^{k} \sum_{j=1}^{J_i} (X_{ij} - \bar{X}_{i.})^2$, $\bar{X}_{i.}$ the average bioactivity at time point $t_i$, and $J_{.k}$ the total number of bioactivities up to the $k$th time point, that is, $J_{.k} = \sum_{i=1}^{k} J_i$. The estimate of the heterogeneity at time point $k$ is then taken equal to zero ($\hat{\sigma}_{R,k}^2 = 0$).

(b) Heterogeneous bioactivities ($\sigma_R^2 > 0$)

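The estimates of the variance components $\sigma_0^2$ and $\sigma_R^2$ up to time point $k$ are (Bliss, 1952; Cochran, 1954)

$$\hat{\sigma}_{0,k}^2 = \sum_{i=1}^{k} \sum_{j=1}^{J_i} df_{ij} S_{ij}^2 \Big/ \sum_{i=1}^{k} \sum_{j=1}^{J_i} df_{ij}, \tag{4.4}$$

$$\hat{\sigma}_{R,k}^2 = \frac{SSW_k}{J_{.k} - k} - \hat{\sigma}_{0,k}^2. \tag{4.5}$$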

This means that at a particular time point the variance estimates are computed from all the data accumulated up to that time point, after eliminating any time effect. The significance of the heterogeneity ($\hat{\sigma}_{R,k}^2$) at the $k$th time point may be tested using the statistic of Cochran (1954)

$$F_k = \frac{SSW_k / (J_{.k} - k)}{\sum_{i=1}^{k} \sum_{j=1}^{J_i} df_{ij} S_{ij}^2 \Big/ \sum_{i=1}^{k} \sum_{j=1}^{J_i} df_{ij}}. \tag{4.6}$$

This statistic is approximately $F$-distributed with $J_{.k} - k$ and $n_F = \sum_{i=1}^{k} \sum_{j=1}^{J_i} df_{ij}$ degrees of freedom. A one-sided critical region $[c, \infty)$, with $c = F_{J_{.k} - k,\, n_F}(1 - \alpha)$, is selected. In this chapter, two scenarios are considered: no pretesting (NPT) and pretesting (PT) at significance level $\alpha = 0.25$ (Bancroft, 1964, and Chapter 3).
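The estimators (4.3) to (4.5) and Cochran's pretest (4.6) can be sketched in Python as follows. This is a minimal sketch, assuming the data up to time point $k$ are held as one list of runs per time point; the helper names and the tiny data set are hypothetical, and the $F$ quantile comes from `scipy.stats.f`.

```python
import numpy as np
from scipy import stats


def _pooled_pieces(x_by_time, s_by_time, df_by_time):
    """SSW_k, sum of df_ij*S_ij^2, n_F = sum of df_ij, J_.k and k for the data up to time point k."""
    k = len(x_by_time)
    j_total = sum(len(x) for x in x_by_time)                      # J_.k
    ssw = sum(float(np.sum((np.asarray(x, float) - np.mean(x)) ** 2)) for x in x_by_time)
    ss_assay = sum(float(np.sum(np.asarray(d, float) * np.asarray(s, float) ** 2))
                   for s, d in zip(s_by_time, df_by_time))
    n_f = sum(float(np.sum(d)) for d in df_by_time)
    return ssw, ss_assay, n_f, j_total, k


def variance_components(x_by_time, s_by_time, df_by_time, homogeneous):
    """(sigma0^2, sigmaR^2) estimates: (4.3) if homogeneous, otherwise (4.4) and (4.5)."""
    ssw, ss_assay, n_f, j_total, k = _pooled_pieces(x_by_time, s_by_time, df_by_time)
    if homogeneous:
        return (ssw + ss_assay) / ((j_total - k) + n_f), 0.0
    if j_total <= k:
        raise ValueError("need more than one run at some time point to estimate sigma_R^2")
    sigma0_sq = ss_assay / n_f
    # (4.5) is a moment estimate and can be negative; it is returned unchanged here.
    return sigma0_sq, ssw / (j_total - k) - sigma0_sq


def cochran_pretest(x_by_time, s_by_time, df_by_time, alpha=0.25):
    """F_k of (4.6) and whether it falls in the one-sided critical region [c, inf)."""
    ssw, ss_assay, n_f, j_total, k = _pooled_pieces(x_by_time, s_by_time, df_by_time)
    f_k = (ssw / (j_total - k)) / (ss_assay / n_f)
    c = stats.f.ppf(1 - alpha, j_total - k, n_f)
    return f_k, bool(f_k >= c)          # True: treat the runs as heterogeneous


# Hypothetical data: two time points with two runs each.
x = [[1.0, 1.2], [0.9, 1.1]]
s = [[0.1, 0.1], [0.1, 0.1]]
d = [[10, 10], [10, 10]]
```

With PT, the heterogeneous estimates would be used whenever `cochran_pretest` returns `True`; with NPT they are always used.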

4.2.3 Contrasts

Three different contrasts are adopted in this section: the linear, Helmert, and reverse-Helmert contrasts. In general, a contrast at time point $k$ is a linear combination of the average bioactivities of the form

$$l_k = \sum_{i=1}^{k} c_{ik} \bar{X}_{i.}, \tag{4.7}$$

where the $c_{ik}$ are the contrast coefficients at time point $k$ for the time points $i = 1, 2, \ldots, k$. The coefficients are chosen to sum to zero so that, under the null hypothesis, the expected value of $l_k$ is zero. Under the alternative, the expected value deviates from zero, typically falling below zero because of a decline in bioactivity. The selected contrasts are formulated as

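$$\text{Linear:}\quad c_{ik} = \frac{J_i t_i \sum_{m=1}^{k} J_m - J_i \sum_{m=1}^{k} J_m t_m}{\left( \sum_{m=1}^{k} J_m \right)^2}, \quad i = 1, \ldots, k,$$

$$\text{Helmert:}\quad c_{ik} = \begin{cases} -J_i \Big/ \sum_{m=1}^{k-1} J_m, & i = 1, \ldots, k - 1, \\ 1, & i = k, \end{cases}$$

$$\text{Reverse-Helmert:}\quad c_{ik} = \begin{cases} -1, & i = 1, \\ J_i \Big/ \sum_{m=2}^{k} J_m, & i = 2, \ldots, k, \end{cases}$$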

with $J_i$ and $t_i$ the number of bioactivities and the storage time at time point $i$, respectively. The variance of $l_k$ is given by

$$\operatorname{var}(l_k) = (\sigma_R^2 + \sigma_0^2) \sum_{i=1}^{k} \frac{c_{ik}^2}{J_i}, \tag{4.8}$$

and the test statistic is given by

$$T_k^{(C)} = \frac{\sum_{i=1}^{k} c_{ik} \bar{X}_{i.}}{\sqrt{(\hat{\sigma}_{R,k}^2 + \hat{\sigma}_{0,k}^2) \sum_{i=1}^{k} c_{ik}^2 / J_i}}, \tag{4.9}$$

with $\hat{\sigma}_{R,k}^2$ and $\hat{\sigma}_{0,k}^2$ defined as before. The corresponding number of degrees of freedom is $df = J_{.k} - k$ for heterogeneous bioactivities and $df = (J_{.k} - k) + \sum_{i=1}^{k} \sum_{j=1}^{J_i} df_{ij}$ for homogeneous bioactivities. The one-sided critical value is $-t_{df}(1 - \alpha)$, and the null hypothesis is rejected if $T_k^{(C)} \le -t_{df}(1 - \alpha)$. Note that the linear contrasts are equivalent to a test of the slope of a linear regression of the average bioactivities on time, with weights $J_i$. Consequently, the terms linear contrasts and linear regression will be used interchangeably.
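The three contrasts and the one-sided test (4.9) can be sketched as follows. This is an illustrative sketch, not the original implementation (the analyses were run in SAS): the function names are hypothetical, the coefficients are oriented so that a decline makes the expected contrast negative (matching the rejection rule $T_k^{(C)} \le -t_{df}(1-\alpha)$), and the $t$ quantile comes from `scipy.stats.t`.

```python
import numpy as np
from scipy import stats


def contrast_coefficients(kind, J, t):
    """c_ik for i = 1..k, oriented so that a decline gives a negative contrast."""
    J, t = np.asarray(J, float), np.asarray(t, float)
    if kind == "linear":
        return J * (t * J.sum() - np.sum(J * t)) / J.sum() ** 2
    c = np.zeros(len(J))
    if kind == "helmert":            # latest mean minus weighted mean of the earlier means
        c[:-1] = -J[:-1] / J[:-1].sum()
        c[-1] = 1.0
    elif kind == "reverse-helmert":  # weighted mean of the later means minus the first mean
        c[0] = -1.0
        c[1:] = J[1:] / J[1:].sum()
    else:
        raise ValueError(f"unknown contrast: {kind}")
    return c


def contrast_test(xbar, J, c, sigma0_sq, sigmaR_sq, df, alpha_k):
    """T_k^(C) of (4.9) and the one-sided decision T <= -t_df(1 - alpha_k)."""
    xbar, J, c = (np.asarray(a, float) for a in (xbar, J, c))
    se = np.sqrt((sigmaR_sq + sigma0_sq) * np.sum(c ** 2 / J))
    t_stat = float(np.sum(c * xbar) / se)
    return t_stat, bool(t_stat <= -stats.t.ppf(1 - alpha_k, df))


# Hypothetical example: three time points with three runs each and a steady decline.
c_lin = contrast_coefficients("linear", [3, 3, 3], [0, 1, 2])   # approximately [-1/3, 0, 1/3]
t_stat, reject = contrast_test([1.0, 0.98, 0.95], [3, 3, 3], c_lin,
                               sigma0_sq=0.0004, sigmaR_sq=0.0002, df=6, alpha_k=0.05)
```

Here `df=6` is $J_{.k} - k$ for the heterogeneous case, and `alpha_k` would in practice come from the chosen α-spending function rather than being fixed at 0.05.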

4.2.4 The Exponentially Weighted Moving Average

The EWMA statistic for the bioassay data is defined for time points beyond the first ($k \ge 2$) as

$$Z_k = \lambda \bar{X}_{k.} + (1 - \lambda) Z_{k-1}, \tag{4.10}$$

with $\lambda \in (0, 1]$ the weighting parameter and $\bar{X}_{k.}$ the average bioactivity at time point $k$. At the first time point ($k = 1$), $Z_1 = \bar{X}_{1.}$. The EWMA statistic can then be rewritten as

$$Z_k = (1 - \lambda)^{k-1} \bar{X}_{1.}(t_1) + \sum_{r=2}^{k} \lambda (1 - \lambda)^{k-r} \bar{X}_{r.}(t_r). \tag{4.11}$$
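The equivalence of the recursion (4.10) and the closed form (4.11) is easy to check numerically. The sketch below uses hypothetical time-point means and hypothetical function names; it only illustrates the algebra and makes no claim about the original software.

```python
import numpy as np


def ewma_recursive(xbar, lam):
    """Z_1..Z_k via (4.10), starting from Z_1 = mean at the first time point."""
    z = [float(xbar[0])]
    for x in xbar[1:]:
        z.append(lam * x + (1 - lam) * z[-1])
    return np.array(z)


def ewma_closed_form(xbar, lam, k):
    """Z_k via (4.11); k is the 1-based time point index."""
    xbar = np.asarray(xbar, float)
    r = np.arange(2, k + 1)
    return (1 - lam) ** (k - 1) * xbar[0] + np.sum(lam * (1 - lam) ** (k - r) * xbar[r - 1])


xbar = [1.0, 0.97, 0.99, 0.95, 0.96]   # hypothetical time-point means
for lam in (0.2, 0.6, 1.0):
    z = ewma_recursive(xbar, lam)
    assert all(np.isclose(z[k - 1], ewma_closed_form(xbar, lam, k))
               for k in range(1, len(xbar) + 1))
```

For $\lambda = 1$ both forms reduce to $Z_k = \bar{X}_{k.}$, the Shewhart case.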

Under Model (4.1), this statistic is normally distributed. To evaluate the stability of an in-house reference at the $k$th time point, the difference $Z_k - Z_1$ is used as the statistic of interest. After some algebra, the mean and variance of this statistic are, respectively,

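$$E(Z_k - Z_1) = \left( (1 - \lambda)^{k-1} - 1 \right) \mu(t_1) + \sum_{r=2}^{k} \lambda (1 - \lambda)^{k-r} \mu(t_r), \tag{4.12}$$

$$\operatorname{var}(Z_k - Z_1) = (\sigma_R^2 + \sigma_0^2) \left[ \frac{\left( 1 - (1 - \lambda)^{k-1} \right)^2}{J_1} + \sum_{r=2}^{k} \frac{\lambda^2 (1 - \lambda)^{2(k-r)}}{J_r} \right]. \tag{4.13}$$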

When $\lambda = 1$ the variance reduces to $\operatorname{var}(Z_k - Z_1) = (\sigma_R^2 + \sigma_0^2)(J_1 + J_k)/(J_1 J_k)$. The EWMA with $\lambda = 1$ is known as a Shewhart chart (Shewhart, 1931), and in this chapter it will be referred to as such. The EWMA test statistic at time point $k$ is given by

$$T_k^{(E)} = \frac{Z_k - Z_1}{\sqrt{\widehat{\operatorname{var}}(Z_k - Z_1)}}, \tag{4.14}$$

with $\widehat{\operatorname{var}}(Z_k - Z_1)$ the estimated variance obtained by substituting $\hat{\sigma}_{R,k}^2$ and $\hat{\sigma}_{0,k}^2$ into (4.13). Under the null hypothesis, this test statistic is approximately $t$-distributed with $(J_{.k} - k) + \sum_{i=1}^{k} \sum_{j=1}^{J_i} df_{ij}$ degrees of freedom for homogeneous bioactivities and $J_{.k} - k$ for heterogeneous bioactivities.


4.2.5 α-Spending functions

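The α-spending functions used to control the FWE rate in the strong sense are the αt, Bonferroni-Holms, log, and normal functions. They are given by

$$\alpha t:\quad \alpha_k = \frac{k\, \alpha}{K - 1},$$

$$\text{Bonferroni-Holms:}\quad \alpha_k = \frac{\alpha}{K - k},$$

$$\text{Log:}\quad \alpha_k = \alpha \log\left\{ 1 + (e - 1) \frac{k}{K - 1} \right\},$$

$$\text{Normal:}\quad \alpha_k = 2 \left\{ 1 - \Phi\left( z_{\alpha/2} \sqrt{\frac{K - 1}{k}} \right) \right\},$$

with $K$ the total number of time points, so that $k = 1, \ldots, K - 1$ indexes the tests and every function reaches $\alpha_{K-1} = \alpha$ at the last test, $\Phi$ the standard normal distribution function, and $z_{\alpha/2}$ the upper $\alpha/2$ quantile of the standard normal distribution. For each of these functions to be deemed effective, the FWE rate should be as close as possible to the nominal $\alpha$ under the null hypothesis. More detailed information about these α-spending functions can be found in, for example, Lan and DeMets (1983) and Nakamura and Douke (2007).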
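The four nominal levels can be tabulated with the Python standard library alone. This sketch assumes $k = 1, \ldots, K-1$ indexes the tests (the reading under which every function spends exactly $\alpha$ at the last test); the function and method names are hypothetical labels.

```python
import math
from statistics import NormalDist


def spending_levels(alpha, K, method):
    """Nominal one-sided levels alpha_k for the tests k = 1, ..., K - 1."""
    phi = NormalDist()
    levels = []
    for k in range(1, K):
        frac = k / (K - 1)
        if method == "alpha_t":
            a = alpha * frac
        elif method == "bonferroni_holms":
            a = alpha / (K - k)
        elif method == "log":
            a = alpha * math.log(1 + (math.e - 1) * frac)
        elif method == "normal":
            z = phi.inv_cdf(1 - alpha / 2)            # upper alpha/2 normal quantile
            a = 2 * (1 - phi.cdf(z / math.sqrt(frac)))
        else:
            raise ValueError(f"unknown method: {method}")
        levels.append(a)
    return levels


# Five time points (as in the simulation study) give four tests.
normal_levels = spending_levels(0.05, 5, "normal")
```

The normal function keeps the early levels very small, which is consistent with the low early detection probabilities reported later in this chapter.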

4.3 Simulation study

In order to compare the proposed methods, bioactivities are simulated assuming $X_{ij} \sim N(\mu(t_i), \sigma_R^2 + \sigma_0^2)$, with corresponding squared standard errors $S_{ij}^2 \sim \sigma_0^2 \chi^2_{df_{ij}} / df_{ij}$, where $\mu(t_i)$ is the assumed mean bioactivity. The within-bioassay variance is set to $\sigma_0^2 = 0.00047$, and the number of degrees of freedom of the estimated standard error is held constant at $df_{ij} = 20$. The between-run variability is varied from no heterogeneity ($\sigma_R^2 = 0$) through moderate heterogeneity ($\sigma_R^2 = 0.00023$) to slightly high heterogeneity ($\sigma_R^2 = 0.00047$). This between-run variability is the variability introduced by each bioassay run (experiment) conducted to estimate the bioactivity of the in-house reference standard, and it is represented by the random term $\delta_{ij}$ in Model (4.1). These heterogeneity values are based on assumed relative standard deviations (%RSD) of the bioactivities of 5.0%, 6.11%, and 7.11%. They indicate the amount of variability in the bioactivities; the highest variability, 7.11%, occurs when $\sigma_0^2 = \sigma_R^2 = 0.00047$. These parameter settings were taken from an existing bioassay used in release activities. They are therefore realistic values, and they are expressed as %RSD to state them in more general terms, since only relative standard deviations are of relevance.

The bioactivity of an in-house reference is typically examined annually for stability, and the reference will likely be used for at most five years. The time points are assumed to be $t_1 = 0$, $t_2 = 12$, $t_3 = 24$, $t_4 = 36$, and $t_5 = 48$ months. The initial bioactivity on the logarithmic scale is assumed to be $\mu(t_1) = \log_{10}(70)$. The number of simulations is set at $N_{sim} = 10000$, which gives a half-length of the confidence interval (precision) on the power values of 0.6%. In total, 15 bioassay runs are used, spread over the five time points. All analyses were performed in SAS® 9.2 (SAS Institute Inc, Cary, NC, USA).

4.3.1 Study designs

The first design places more bioactivities at the first time point and treats the remaining time points equally: Design 1 (D1) is 7, 2, 2, 2, 2. The advantage of this design is that a precise estimate of the bioactivity is obtained at the first time point. The second design has an equal number of estimated bioactivities at each time point: Design 2 (D2) is 3, 3, 3, 3, 3. In this design all time points are treated equally, and it is easy to use in practice. In the third design, the number of bioactivities decreases over the time points, putting more emphasis on the first few time points: Design 3 (D3) is 5, 4, 3, 2, 1. In the last design, a substantial number of bioactivities is estimated at both the first and the last time points: Design 4 (D4) is 5, 1, 2, 3, 4. This design anticipates that instability would occur at later time points and therefore puts more emphasis on the end of the period, while still using a substantial number of bioactivities at the beginning to obtain a reasonable initial estimate.
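One simulated study under the settings of this section can be generated as follows. This is a sketch under stated assumptions, not the original SAS code: the names `DESIGNS`, `shift_profile`, `trend_profile` and `simulate_study` are hypothetical, the random generator and seed are arbitrary, and the bioactivities and squared standard errors are drawn independently, as the distributional assumptions above suggest.

```python
import numpy as np

DESIGNS = {"D1": [7, 2, 2, 2, 2], "D2": [3, 3, 3, 3, 3],
           "D3": [5, 4, 3, 2, 1], "D4": [5, 1, 2, 3, 4]}
TIMES = np.array([0.0, 12.0, 24.0, 36.0, 48.0])   # months
MU1 = np.log10(70.0)                              # initial log10 bioactivity


def shift_profile(delta, after_point):
    """Mean log bioactivity that drops by delta just after time point `after_point` (1-based)."""
    return np.where(np.arange(1, len(TIMES) + 1) > after_point, MU1 - delta, MU1)


def trend_profile(delta):
    """Linear decline reaching MU1 - delta at the last time point (slope -delta / T)."""
    return MU1 - delta * TIMES / TIMES[-1]


def simulate_study(design, mu, sigma0_sq=0.00047, sigmaR_sq=0.0, df=20, rng=None):
    """One simulated study: per time point, arrays of X_ij, S_ij and df_ij."""
    rng = np.random.default_rng() if rng is None else rng
    xs, ss, dfs = [], [], []
    for n_runs, m in zip(DESIGNS[design], mu):
        xs.append(rng.normal(m, np.sqrt(sigmaR_sq + sigma0_sq), size=n_runs))
        ss.append(np.sqrt(sigma0_sq * rng.chisquare(df, size=n_runs) / df))
        dfs.append(np.full(n_runs, df))
    return xs, ss, dfs


rng = np.random.default_rng(2016)   # arbitrary seed
xs, ss, dfs = simulate_study("D4", shift_profile(-np.log10(0.90), 2), sigmaR_sq=0.00023, rng=rng)
```

Repeating `simulate_study` $N_{sim}$ times and applying the tests at each time point would give the FWE rates and power values; all four designs use the same 15 runs.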

4.3.2 Deterioration profiles

A shift profile of the bioactivity, $\mu(t_i) = \mu - \Delta\, \mathbf{1}_{[t_0, \infty)}(t_i)$, is assumed. The decline factor $\Delta$ corresponds to 5% and 10% of the initial bioactivity, that is, the size of the shift is $\Delta = -\log_{10}(0.95)$ or $\Delta = -\log_{10}(0.90)$. The sudden shift is implemented just after the measurements at time point one, two, three, or four (one scenario each). A linear trend profile of the form $\mu(t_i) = \alpha + \beta t_i$, $\beta \le 0$, is also assumed. The slope $\beta$ of the trend profile is chosen such that the reduction at the last time point equals the shift parameter $\Delta$, that is, $\beta = -\Delta / T$. Changes in bioactivity relative to the standard deviation are computed as $\Delta / \sqrt{\sigma_0^2 + \sigma_R^2}$. With a constant $\sigma_0^2 = 0.00047$ and $\sigma_R^2 = 0$, 0.00023, and 0.00047, these changes are 1.03, 0.84, and 0.73 for a 5% decline, and 2.11, 1.73, and 1.49 for a 10% decline.
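The standardized changes $\Delta / \sqrt{\sigma_0^2 + \sigma_R^2}$ and the trend slopes follow from a few lines of arithmetic. The sketch below assumes only the settings stated in this section ($\sigma_0^2 = 0.00047$, $T = 48$ months); for $\sigma_R^2 = 0.00047$ the formula gives 0.73 (5% decline) and 1.49 (10% decline).

```python
import math

SIGMA0_SQ = 0.00047
SIGMAR_SQ = (0.0, 0.00023, 0.00047)
T_MONTHS = 48.0

standardized = {}   # decline -> standardized change for each sigma_R^2
slopes = {}         # decline -> trend slope per month (negative)
for decline in (0.05, 0.10):
    delta = -math.log10(1.0 - decline)                 # shift size on the log10 scale
    standardized[decline] = [round(delta / math.sqrt(SIGMA0_SQ + s), 2) for s in SIGMAR_SQ]
    slopes[decline] = -delta / T_MONTHS                # reduction of delta reached at T
```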

4.3.3 Performance measure

The probabilities of detecting (and signaling) a change in the bioactivity of an in-house reference at each time point, taking into account the results from the previous time points, will be computed. These probabilities are calculated as follows:

(a) $P\left( T_2 < -t_{n_2}(1 - \alpha) \right)$,

(b) $P\left( T_2 > -t_{n_2}(1 - \alpha),\ T_3 < -t_{n_3}(1 - \alpha) \right)$,

(c) $P\left( T_2 > -t_{n_2}(1 - \alpha),\ T_3 > -t_{n_3}(1 - \alpha),\ T_4 < -t_{n_4}(1 - \alpha) \right)$,

(d) $P\left( T_2 > -t_{n_2}(1 - \alpha),\ T_3 > -t_{n_3}(1 - \alpha),\ T_4 > -t_{n_4}(1 - \alpha),\ T_5 < -t_{n_5}(1 - \alpha) \right)$,

with $T_k$ the test statistic $T_k^{(C)}$ for the contrasts or $T_k^{(E)}$ for the EWMA, $n_k$ the number of degrees of freedom (which depends on whether the bioactivities are homogeneous or heterogeneous), and $t_m(q)$ the $q$th quantile of the $t$-distribution with $m$ degrees of freedom. The probability that the system detects a reduction in the bioactivity of a reference standard somewhere during the monitoring interval $(0, T]$ is

$$1 - P\left( T_2 > -t_{n_2}(1 - \alpha),\ T_3 > -t_{n_3}(1 - \alpha),\ T_4 > -t_{n_4}(1 - \alpha),\ T_5 > -t_{n_5}(1 - \alpha) \right).$$
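Given a matrix of simulated test statistics, the probabilities (a) to (d) and the overall detection probability are simple frequencies. The sketch below is a hypothetical helper, not the original code; it assumes one row per simulated study and one column per test, and the tiny matrix in the usage lines is made up for illustration.

```python
import numpy as np


def detection_probabilities(T, crit):
    """Probabilities of a first signal at each test, and the overall detection probability.

    T: array of shape (n_sim, n_tests) with the statistics T_2, ..., T_K of each simulated study.
    crit: positive critical values t_{n_k}(1 - alpha_k), one per test.
    """
    signal = np.asarray(T, float) < -np.asarray(crit, float)
    no_signal_so_far = np.cumprod(~signal, axis=1).astype(bool)   # no signal up to each test
    first = signal.copy()
    first[:, 1:] &= no_signal_so_far[:, :-1]                      # signal now, none before
    return first.mean(axis=0), float(signal.any(axis=1).mean())


# Hypothetical statistics for four simulated studies and four tests.
T = [[-3, 0, 0, 0], [0, -3, -3, 0], [0, 0, 0, 0], [0, 0, 0, -3]]
p_first, p_any = detection_probabilities(T, [2, 2, 2, 2])
```

The first-signal probabilities add up to the overall detection probability, which is the quantity defined by the last display above.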

It is important to note that this performance measure is very different from the one used in standard SPC, where the average run length is commonly used (Montgomery, 2009). The probability measure is used here because an average run length would be less suitable for such a short monitoring window.

4.4 Statistical results

The results are reported in two subsections: the first focuses mainly on the simulation analysis and the second on the analysis of the example. The first subsection is itself divided into two parts: the first part concerns the selection of the appropriate α-spending function, and the second shows the power values for detecting changes in the bioactivity.

4.4.1 Simulation results

Familywise error

The FWE rates obtained from the analysis are shown in two tables. Table 4.1 shows the FWE rates when the pretest is not done and heterogeneity is assumed (NPT), and Table 4.2 shows the FWE rates when the pretest is done (PT). Only the minimum and maximum FWE rates over the four designs are presented for each α-spending function and statistical method.

All procedures tend to be slightly conservative when $\sigma_R^2 = 0$, with the exception of the Bonferroni-Holms function for the EWMA ($\lambda \ge 0.6$). In the presence of some heterogeneity ($\sigma_R^2 = 0.00023$), the minimum FWE rates are close to 0.05, while most of the maximum values are well above this nominal value, except for the normal method, which consistently kept the FWE rate close to 0.05. For $\sigma_R^2 = 0.00047$, only the normal method gave FWE rates that appeared to be consistently controlled, except with the Helmert contrasts and the Shewhart chart (EWMA with $\lambda = 1.0$), where the FWE rate is slightly larger than 0.05.


When the pretest is done, the α-spending functions result in substantially higher FWE rates. The only procedure that came reasonably close to keeping the FWE rate under control, though not consistently, was the normal method (Table 4.2).

Table 4.1: The familywise error rates (Min - Max) assuming heterogeneity (NPT).

| $\sigma_R^2$ | Procedure | Linear | Helmert | Reverse-Helmert | EWMA λ = 0.2 | EWMA λ = 0.6 | EWMA λ = 1.0 |
|---|---|---|---|---|---|---|---|
| 0 | αt | 0.040-0.046 | 0.046-0.053 | 0.031-0.034 | 0.032-0.038 | 0.036-0.049 | 0.042-0.054 |
| | Bon-Holms | 0.032-0.039 | 0.037-0.044 | 0.029-0.031 | 0.029-0.033 | 0.064-0.081 | 0.078-0.094 |
| | Log | 0.044-0.053 | 0.035-0.066 | 0.033-0.038 | 0.034-0.042 | 0.040-0.054 | 0.046-0.060 |
| | Normal | 0.030-0.035 | 0.034-0.038 | 0.029-0.030 | 0.029-0.031 | 0.030-0.036 | 0.033-0.038 |
| 0.00023 | αt | 0.058-0.065 | 0.073-0.089 | 0.045-0.052 | 0.046-0.068 | 0.054-0.071 | 0.065-0.079 |
| | Bon-Holms | 0.050-0.055 | 0.056-0.070 | 0.042-0.047 | 0.042-0.050 | 0.045-0.058 | 0.054-0.066 |
| | Log | 0.063-0.073 | 0.051-0.100 | 0.048-0.057 | 0.049-0.063 | 0.058-0.079 | 0.070-0.090 |
| | Normal | 0.044-0.047 | 0.051-0.058 | 0.039-0.044 | 0.040-0.045 | 0.042-0.051 | 0.049-0.056 |
| 0.00047 | αt | 0.072-0.077 | 0.090-0.102 | 0.053-0.060 | 0.055-0.068 | 0.064-0.081 | 0.078-0.094 |
| | Bon-Holms | 0.061-0.066 | 0.073-0.084 | 0.048-0.054 | 0.050-0.057 | 0.054-0.069 | 0.065-0.079 |
| | Log | 0.080-0.087 | 0.065-0.115 | 0.055-0.067 | 0.059-0.074 | 0.069-0.090 | 0.084-0.105 |
| | Normal | 0.051-0.055 | 0.065-0.069 | 0.046-0.049 | 0.048-0.051 | 0.051-0.059 | 0.059-0.066 |

Table 4.2: The familywise error rates (Min - Max) when the pre-test is conducted (PT).

| $\sigma_R^2$ | Procedure | Linear | Helmert | Reverse-Helmert | EWMA λ = 0.2 | EWMA λ = 0.6 | EWMA λ = 1.0 |
|---|---|---|---|---|---|---|---|
| 0 | αt | 0.074-0.079 | 0.093-0.097 | 0.046-0.063 | 0.058-0.065 | 0.061-0.081 | 0.080-0.092 |
| | Bon-Holms | 0.064-0.069 | 0.077-0.080 | 0.044-0.055 | 0.052-0.057 | 0.058-0.068 | 0.067-0.076 |
| | Log | 0.083-0.089 | 0.062-0.110 | 0.050-0.069 | 0.063-0.072 | 0.074-0.089 | 0.090-0.101 |
| | Normal | 0.050-0.055 | 0.060-0.064 | 0.040-0.047 | 0.045-0.047 | 0.048-0.057 | 0.056-0.063 |
| 0.00023 | αt | 0.089-0.100 | 0.118-0.127 | 0.061-0.080 | 0.073-0.081 | 0.086-0.097 | 0.105-0.114 |
| | Bon-Holms | 0.078-0.090 | 0.105-0.111 | 0.058-0.072 | 0.067-0.072 | 0.078-0.083 | 0.093-0.099 |
| | Log | 0.097-0.111 | 0.085-0.139 | 0.066-0.087 | 0.079-0.087 | 0.093-0.106 | 0.114-0.124 |
| | Normal | 0.061-0.071 | 0.080-0.086 | 0.052-0.058 | 0.055-0.060 | 0.063-0.069 | 0.073-0.079 |
| 0.00047 | αt | 0.088-0.113 | 0.126-0.138 | 0.064-0.091 | 0.076-0.084 | 0.095-0.102 | 0.108-0.121 |
| | Bon-Holms | 0.078-0.100 | 0.107-0.121 | 0.061-0.083 | 0.071-0.078 | 0.079-0.093 | 0.095-0.107 |
| | Log | 0.097-0.122 | 0.092-0.146 | 0.069-0.099 | 0.082-0.091 | 0.096-0.110 | 0.116-0.132 |
| | Normal | 0.061-0.076 | 0.084-0.092 | 0.053-0.066 | 0.059-0.063 | 0.064-0.074 | 0.077-0.084 |


Power values of detecting the change

The normal α-spending function was chosen because it kept the FWE rate closest to the nominal level, which makes the power comparisons meaningful. Furthermore, pretesting was avoided since it inflated the FWE rates. Only the results for a 10% decline in bioactivity are presented, since the results for a 5% decline show the same trends, although the power values were approximately 50% lower. As mentioned in the introduction, one of the objectives was to choose, for each method, the design that provides the highest power for detecting a decline in bioactivity.

The optimal designs for each deterioration profile are shown in Table 4.3. Design 1 (D1) and Design 4 (D4) are the dominant designs among the four. For the linear and reverse-Helmert contrasts, the differences in power between D1 and D4 were quite small when the shift occurred after time point 1 or 2. Thus, D4 was chosen as the optimal design for all methods except the EWMA with $\lambda = 0.2$, for which D1 was optimal.

Table 4.3: Optimal designs observed for each method under the different deterioration profiles.

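| Profile | Linear | Helmert | Reverse-Helmert | EWMA λ = 0.2 | EWMA λ = 0.6 | EWMA λ = 1.0 |
|---|---|---|---|---|---|---|
| Shift after time point 1 | D1 | D4 | D1 | D1 | D4 | D4 |
| Shift after time point 2 | D1/D4 | D4 | D4 | D1 | D4 | D4 |
| Shift after time point 3 | D4 | D4 | D4 | D1 | D4 | D4 |
| Shift after time point 4 | D4 | D4 | D4 | D1 | D4 | D4 |
| Linear trend | D4 | D4 | D4 | D1 | D4 | D4 |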

The power of detecting a 10% decline in the bioactivity of the in-house reference with the optimal designs is shown in Table 4.4. When the bioassay runs are homogeneous ($\sigma_R^2 = 0$), the linear contrasts, the reverse-Helmert contrasts, and the EWMA all retained high power for detecting a shift occurring after time point 1 or 2. If the shift occurred after time point 3 or 4, the linear contrasts, the Helmert contrasts, the EWMA ($\lambda = 0.6$), and the Shewhart chart showed high power. The reverse-Helmert contrasts and the EWMA ($\lambda = 0.2$) were the least powerful methods when the shift occurred after the fourth time point.

For a linear decline, the linear contrasts had the highest power of all methods, followed by the EWMA ($\lambda = 0.6$), the Shewhart chart, the reverse-Helmert contrasts, the EWMA ($\lambda = 0.2$), and the Helmert contrasts. A similar pattern is observed when there is heterogeneity between bioactivities, although the power values are lower than under homogeneity because of the additional variability. The probabilities of immediate detection, that is, the probability that a shift is detected at the earliest possible time point, are shown in Table 4.5. These values are useful for choosing between methods whose overall power is similar. The probability of detecting a decline is clearly low at early time points, owing to the small α-values that the normal α-spending function assigns to the early tests.


Table 4.4: Power of detecting a 10% decline, with D1 for the EWMA (λ = 0.2) and D4 for the other methods.

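| $\sigma_R^2$ | Profile | Linear | Helmert | Reverse-Helmert | EWMA λ = 0.2 | EWMA λ = 0.6 | EWMA λ = 1.0 |
|---|---|---|---|---|---|---|---|
| 0 | Shift after time point 1 | 0.93 | 0.55 | 0.97 | 0.98 | 0.96 | 0.91 |
| | Shift after time point 2 | 0.96 | 0.72 | 0.93 | 0.91 | 0.95 | 0.91 |
| | Shift after time point 3 | 0.95 | 0.91 | 0.77 | 0.69 | 0.89 | 0.91 |
| | Shift after time point 4 | 0.76 | 0.95 | 0.35 | 0.28 | 0.63 | 0.88 |
| | Linear trend | 0.91 | 0.76 | 0.82 | 0.79 | 0.89 | 0.89 |
| 0.00023 | Shift after time point 1 | 0.84 | 0.48 | 0.91 | 0.92 | 0.90 | 0.82 |
| | Shift after time point 2 | 0.89 | 0.62 | 0.84 | 0.81 | 0.86 | 0.82 |
| | Shift after time point 3 | 0.87 | 0.81 | 0.65 | 0.57 | 0.78 | 0.82 |
| | Shift after time point 4 | 0.64 | 0.86 | 0.30 | 0.25 | 0.53 | 0.76 |
| | Linear trend | 0.80 | 0.66 | 0.69 | 0.67 | 0.77 | 0.78 |
| 0.00047 | Shift after time point 1 | 0.75 | 0.43 | 0.82 | 0.85 | 0.81 | 0.73 |
| | Shift after time point 2 | 0.80 | 0.54 | 0.74 | 0.71 | 0.77 | 0.73 |
| | Shift after time point 3 | 0.79 | 0.72 | 0.55 | 0.49 | 0.68 | 0.73 |
| | Shift after time point 4 | 0.55 | 0.77 | 0.27 | 0.23 | 0.44 | 0.66 |
| | Linear trend | 0.71 | 0.57 | 0.60 | 0.58 | 0.68 | 0.69 |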

Table 4.5: Probability of immediate detection of a 10% deterioration under the selected designs.

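| $\sigma_R^2$ | Profile | Linear | Helmert | Reverse-Helmert | EWMA λ = 0.2 | EWMA λ = 0.6 | EWMA λ = 1.0 |
|---|---|---|---|---|---|---|---|
| 0 | Shift after time point 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Shift after time point 2 | 0.05 | 0.07 | 0.02 | 0.06 | 0.02 | 0.06 |
| | Shift after time point 3 | 0.48 | 0.68 | 0.20 | 0.18 | 0.34 | 0.60 |
| | Shift after time point 4 | 0.75 | 0.94 | 0.34 | 0.27 | 0.62 | 0.88 |
| 0.00023 | Shift after time point 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Shift after time point 2 | 0.05 | 0.07 | 0.03 | 0.06 | 0.03 | 0.07 |
| | Shift after time point 3 | 0.39 | 0.55 | 0.18 | 0.17 | 0.28 | 0.48 |
| | Shift after time point 4 | 0.63 | 0.84 | 0.29 | 0.24 | 0.51 | 0.75 |
| 0.00047 | Shift after time point 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Shift after time point 2 | 0.06 | 0.07 | 0.03 | 0.06 | 0.04 | 0.06 |
| | Shift after time point 3 | 0.32 | 0.45 | 0.16 | 0.15 | 0.24 | 0.40 |
| | Shift after time point 4 | 0.53 | 0.75 | 0.25 | 0.21 | 0.42 | 0.64 |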

More concerning, however, is that a shift occurring after the first time point is practically never detected at the very next time point (Table 4.5). At later time points, the α-value approaches the nominal value of 0.05, and the probability of detecting the shift at the earliest possible moment becomes substantially larger. The Shewhart chart seemed to detect a shift earliest (better than the linear contrasts), although its overall power is similar or slightly lower. This implies that the linear contrasts build up power at later moments after the shift has happened.

4.4.2 An example

A stability study conducted at Merck Sharp & Dohme, Oss, The Netherlands, is used for illustration in this chapter. The available information for this study is the date, bioactivity (on a logarithmic scale), standard errors, and degrees of freedom. The study ran from 2007 to 2012 ($K = 6$), giving a 6-year monitoring period. At the first time point (2007) six bioassay runs were used, followed by one bioassay run in each subsequent year, resulting in a total of 11 bioassay runs at the end of the monitoring period. The analysis follows the approach of the simulation study: the normal α-spending function is used to control the FWE rate, and pretesting is avoided. The results are presented in Table 4.6, where the mean, the test statistics ($T_k^{(C)}$ and $T_k^{(E)}$), and the critical $t$-value $t_{1-\alpha_k,\, df_k}$ are computed at each time point, with $\alpha_k$ given by the normal spending function and $df_k = 5$ ($df_k$ = number of bioassay runs $- k$).
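The critical values in the last column of Table 4.6 can, under the assumptions stated above ($K = 6$, $df_k = 5$, normal spending function with $\alpha = 0.05$), be recomputed with `scipy.stats`. This is a sketch with a hypothetical function name; it should approximately reproduce the tabulated values, decreasing to $t_{0.95,5} \approx 2.015$ at the last test.

```python
from scipy import stats


def normal_spending_critical_values(alpha=0.05, K=6, df=5):
    """Critical values t_{1-alpha_k, df} for the tests k = 1, ..., K - 1 (time points 2..K)."""
    z = stats.norm.isf(alpha / 2)                     # upper alpha/2 normal quantile
    crit = []
    for k in range(1, K):
        alpha_k = 2 * stats.norm.sf(z * ((K - 1) / k) ** 0.5)
        crit.append(float(stats.t.isf(alpha_k, df)))  # one-sided upper t quantile
    return crit


crit = normal_spending_critical_values()
```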

Comparing the calculated test statistics with the critical $t$-values at each time point shows that all calculated statistics ($T_k^{(C)}$ and $T_k^{(E)}$) are smaller than the critical values $t_{1-\alpha_k,\, df_k}$; moreover, since the statistics are all positive, none of them falls in the rejection region $T_k \le -t_{1-\alpha_k,\, df_k}$. There is therefore not enough evidence to conclude that the bioactivity is changing over the 6-year monitoring period, and this holds for all the methods applied. The findings are supported by the mean values in Table 4.6: although the mean shows some changes, these changes are too small to raise concern.

Table 4.6: Assessing the stability of a bioactivity over a six-year period.

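| Year | $\bar{X}_i$ | $\hat{\sigma}_{0,k}^2$ | $\hat{\sigma}_{R,k}^2$ | Linear $T_k^{(C)}$ | Helmert $T_k^{(C)}$ | Reverse-Helmert $T_k^{(C)}$ | EWMA λ = 0.2 $T_k^{(E)}$ | EWMA λ = 0.6 $T_k^{(E)}$ | EWMA λ = 1.0 $T_k^{(E)}$ | $t_{1-\alpha_k,\, df_k}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 2007 | 7.850 | | | | | | | | | |
| 2008 | 7.861 | 0.00076 | 0.0068 | 0.111 | 0.111 | 0.111 | 0.111 | 0.111 | 0.111 | 15.052 |
| 2009 | 7.941 | 0.00036 | 0.0072 | 0.906 | 0.967 | 0.717 | 0.777 | 0.901 | 0.973 | 5.065 |
| 2010 | 7.925 | 0.00039 | 0.0071 | 1.091 | 0.674 | 0.958 | 1.020 | 1.031 | 0.797 | 3.246 |
| 2011 | 7.956 | 0.00038 | 0.0072 | 1.437 | 0.941 | 1.259 | 1.361 | 1.362 | 1.128 | 2.465 |
| 2012 | 7.953 | 0.00039 | 0.0071 | 1.624 | 0.818 | 1.464 | 1.578 | 1.465 | 1.095 | 2.015 |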
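As a check on the formulas, the 2009 row of Table 4.6 ($k = 3$) can be recomputed from the rounded means and variance estimates in the table. The sketch assumes nothing beyond (4.9), (4.13), (4.14), the design of six runs in 2007 and one run per later year, and the contrasts oriented so that an increase gives a positive value; because the inputs are rounded, agreement to about two decimals is the most that can be expected.

```python
import numpy as np

# Rounded inputs read from Table 4.6 for 2007 to 2009 (time point k = 3).
xbar = np.array([7.850, 7.861, 7.941])
J = np.array([6.0, 1.0, 1.0])        # six runs in 2007, one in each later year
t = np.array([0.0, 1.0, 2.0])        # storage time in years (the time scale does not affect T)
var_total = 0.00036 + 0.0072         # estimated sigma0^2 + sigmaR^2 at k = 3


def contrast_stat(c):
    """T_k^(C) of (4.9) for coefficients c."""
    return float(np.sum(c * xbar) / np.sqrt(var_total * np.sum(c ** 2 / J)))


def ewma_stat(lam):
    """T_k^(E) of (4.14), with the variance (4.13)."""
    k = len(xbar)
    r = np.arange(2, k + 1)
    w = lam * (1 - lam) ** (k - r)
    diff = ((1 - lam) ** (k - 1) - 1) * xbar[0] + np.sum(w * xbar[r - 1])
    var = var_total * ((1 - (1 - lam) ** (k - 1)) ** 2 / J[0] + np.sum(w ** 2 / J[r - 1]))
    return float(diff / np.sqrt(var))


linear = J * (t * J.sum() - np.sum(J * t)) / J.sum() ** 2
helmert = np.append(-J[:-1] / J[:-1].sum(), 1.0)
reverse_helmert = np.insert(J[1:] / J[1:].sum(), 0, -1.0)

row_2009 = {
    "linear": contrast_stat(linear),
    "helmert": contrast_stat(helmert),
    "reverse-helmert": contrast_stat(reverse_helmert),
    "ewma 0.2": ewma_stat(0.2),
    "ewma 0.6": ewma_stat(0.6),
    "ewma 1.0": ewma_stat(1.0),
}
```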


4.5 Discussion and conclusion

The main goal of this chapter was to determine the most efficient statistical method for monitoring the bioactivity of an in-house reference standard with respect to an international reference standard. This method should detect a change (shift or trend) in the bioactivity as quickly as possible and with the highest power, while controlling the familywise error rate (FWE). The normal α-spending function was found to control the FWE best in most settings, compared with the αt, Bonferroni-Holms, and log functions. However, under homogeneous settings the FWE rates were slightly conservative. Pretesting for heterogeneity is not recommended, since it strongly inflated the FWE rates for all α-spending functions. The inflation seems to be related to the significance level chosen for testing $H_0: \sigma_R^2 = 0$. In the current work, the significance level used was $\alpha = 0.25$. For lower values of $\alpha$, the FWE rates increase further, whereas values of $\alpha$ of 0.5 or higher give results almost identical to no pretesting.

The power comparisons showed that designs with more bioactivities at the initial and last time points were more powerful than designs with an equal number of bioassay runs per time point or with more bioassay runs at intermediate time points. The contrasts were optimal with Design 4 (most runs at the beginning and the end), except for the linear and reverse-Helmert contrasts when a shift occurred after the first time point, in which case Design 1 (most runs at the beginning only) was optimal. However, the difference between Designs 1 and 4 was minimal, so Design 4 was chosen as the most powerful design for the contrast methods. For the EWMA, Design 4 was optimal for $\lambda = 0.6$ and $\lambda = 1.0$, and Design 1 for $\lambda = 0.2$.

The power analysis showed that the linear contrasts gave high power for detecting a change (shift or trend) in the bioactivity at any time point during the monitoring phase, especially at earlier time points. This approach is known to give high power when there is a clear linear trend (Phillips, 1998). In general, it also gave good results when a shift (step decline) occurred instead of a linear trend. The Helmert and reverse-Helmert contrasts complemented each other, in the sense that the former were more powerful for detecting later changes and the latter for detecting early declines. This agrees with Tamhane et al. (1996), who noted that these contrasts tend to switch roles depending on when the event of interest occurs.

The EWMA with $\lambda = 0.2$ showed high power for detecting a shift at earlier time points, but much lower power for a late shift. The EWMA ($\lambda = 0.6$) and the Shewhart chart consistently gave better power irrespective of when the change occurred, except when the shift occurred after the first time point and a decline of 5% was used. Overall, the EWMA did not perform as well as might have been expected from the literature. For example, it is widely documented that small changes, represented in the current work by the 5% decline, are more easily detected by an EWMA with a small value of $\lambda$. Here, however, these small changes were better detected by the EWMA with $\lambda = 0.6$ and by the Shewhart chart. Several reasons could explain this, one of them being the finite monitoring period. The EWMA builds up power over a long period of time, whereas in the bioassay application the monitoring period ends before the EWMA can do its job. Among the EWMA charts, the Shewhart chart was the most consistent under both deterioration profiles and almost as good as the linear regression (linear contrasts). The linear regression had a slightly higher overall power but, in most cases, a lower power for immediate detection after a shift. This results in a selection dilemma between the linear regression and the Shewhart chart.

Another method that could have been applied is weighted regression. Several articles have indicated the advantages of a weighted over an unweighted approach in the presence of heterogeneity between bioactivities (see Cochran, 1954; Mzolo et al., 2013), but a preliminary analysis showed that the weighted approach did not improve on the methods applied here. This is most likely due to the imposed assumption that the residual variance is constant over time points and bioassay runs, which makes weighting less appealing; more research in this direction is needed.

The example illustrated how these methods can be used to monitor the bioactivity. Although in this case the bioactivity appeared stable, it would have been interesting to have an example in which the bioactivity actually changes; it would then have been possible to determine which approach detected the change earliest.

In this work, the assumption was that the international reference standard is available. For some products such an international reference may not exist, which makes it difficult to monitor the in-house reference standards. Some approaches have been proposed; for example, Kirkwood (1977) used an accelerated degradation approach to monitor the bioactivity. This is done by storing the reference material at elevated temperatures and comparing its bioactivity with that of material stored at a much lower temperature. This approach may not be straightforward to apply in some cases, so further research should focus on developing new and easily applicable methods.

In summary, the normal method was found to be the most efficient α-spending function for controlling the FWE rate. The linear contrasts, the EWMA (λ = 0.6), and the Shewhart chart were more consistent and often had the highest power under a design that places as many bioassays as possible at the beginning and at the end of the monitoring window. The linear contrasts consistently resulted in high power, but the Shewhart chart was better at detecting shifts early. The Helmert contrasts were more powerful at detecting late declines, whereas the reverse-Helmert contrasts together with the EWMA (λ = 0.2) were best at detecting early declines.


References

Aerne, L., Champ, C., and Rigdon, S. (1991), “Evaluation of Control Charts under Linear Trend,” Communications in Statistics - Theory and Methods, 20, 3341–3349.

Bancroft, T. (1964), “Analysis and Inference for Incompletely Specified Models Involving the Use of Preliminary Test(s) of Significance,” Biometrics, 20, 427–442.

Bliss, C. I. (1952), The Statistics of Bioassay, New York: Academic Press Inc.

Budde, M. and Bauer, P. (1989), “Multiple Test Procedures in Clinical Dose Finding Studies,” Journal of the American Statistical Association, 84, 792–796.

Chang, J. T. and Fricker Jr, D. (1999), “Detecting When a Monotonically Increasing Mean has Crossed a Threshold,” Journal of Quality Technology, 31, 217–234.

Cochran, W. G. (1954), “The Combination of Estimates from Different Experiments,” Biometrics, 10, 101–129.

Collins, J., Grieshaber, C., and Chabner, B. (1990), “Pharmacologically Guided Phase I Clinical Trials Based upon Preclinical Drug Development,” Journal of the National Cancer Institute, 82, 1321–1326.

Crowder, S. (1992), “An SPC Model for Short Production Runs: Minimizing Expected Cost,” Technometrics, 34, 64–73.

Del Castillo, E. and Montgomery, D. C. (1994), “Short-run Statistical Process Control: Q-chart Enhancements and Alternative Methods,” Quality and Reliability Engineering International, 10, 87–97.

Finney, D. J. (1978), Statistical Method in Biological Assay, London: Charles Griffin & Co. Ltd.

Gan, F. (1991), “EWMA Control Chart Under Linear Drift,” Journal of Statistical Computation and Simulation, 38, 181–200.

Hawkins, D. M. (1991), “Multivariate Quality Control Based on Regression-adjusted Variables,” Technometrics, 33, 61–75.

Kirkwood, T. B. L. (1977), “Predicting the Stability of Biological Standards and Products,” Biometrics, 33, 736–742.

Lan, K. and DeMets, D. (1983), “Discrete Sequential Boundaries for Clinical Trials,” Biometrika, 70, 659–663.


Lucas, J. M. and Saccucci, M. S. (1990), “Exponentially Weighted Moving Average Control Schemes: Properties and Enhancements,” Technometrics, 32, 1–12.

Montgomery, D. C. (2009), Introduction to Statistical Quality Control, New Jersey: John Wiley & Sons, 6th ed.

Mzolo, T., Hendriks, M., and Van den Heuvel, E. (2013), “A Comparison of Statistical Methods for Combining Relative Bioactivities from Parallel Line Bioassays,” Pharmaceutical Statistics, 12, 375–384.

Nakamura, T. and Douke, H. (2007), “Development of Sequential Multiple Comparison Procedure for Dose Response Test,” Biometrical Journal, 49, 30–39.

Page, E. (1954), “Continuous Inspection Schemes,” Biometrika, 41, 100–115.

Phillips, A. (1998), “A Review of the Performance of Tests used to Establish Whether there is a Drug Effect in Dose-response Studies,” Therapeutic Innovation and Regulatory Science, 32, 683–692.

Roberts, S. (1959), “Control Chart Tests Based on Geometric Moving Averages,” Technometrics, 1, 239–250.

Ruberg, S. J. (1989), “Contrasts for Identifying the Minimum Effective Dose,” Journal of the American Statistical Association, 84, 816–822.

Ruberg, S. J. (1995), “Dose Response Studies II. Analysis and Interpretation,” Journal of Biopharmaceutical Statistics, 5, 15–42.

Shewhart, W. A. (1931), Economic Control of Quality of Manufactured Product, New York: Van Nostrand.

Tamhane, A. C., Hochberg, Y., and Dunnett, C. W. (1996), “Multiple Test Procedures for Dose Finding,” Biometrics, 52, 21–37.

Yashchin, E. (1987), “Some Aspects of the Theory of Statistical Control Schemes,” IBM Journal of Research and Development, 31, 199–205.

Yeh, A. B., Huwang, L., and Wu, C.-W. (2005), “A Multivariate EWMA Control Chart for Monitoring Process Variability with Individual Observations,” IIE Transactions, 37, 1023–1035.

Yeh, A. B., Huwang, L., and Wu, Y.-F. (2004), “A Likelihood Ratio Based EWMA Control Chart for Monitoring Variability of Multivariate Normal Processes,” IIE Transactions, 36, 865–879.


Yeh, A. B., Lin, D. K.-J., Zhou, H., and Venkataramani, C. (2003), “A Multivariate Exponentially Weighted Moving Average Control Chart for Monitoring Process Variability,” Journal of Applied Statistics, 30, 507–536.


CHAPTER 5

Monitoring the bioactivity in the absence of an international reference standard


Abstract

Routine bioassays for batch release in the pharmaceutical industry use an in-house reference standard instead of an international reference standard. The pharmaceutical company is then legally obligated to verify the stability of the bioactivity of the in-house reference standard with respect to the international standard over time. However, for some newly developed products an international reference may not exist. In that case, the pharmaceutical company creates its own primary reference, which takes the place of the international reference, and the company is then responsible for controlling and monitoring the bioactivity of both the in-house and the primary reference. In this chapter, a new strategy based on backup in-house reference standards is developed to enable the monitoring and qualification of standards. This set of in-house reference standards is used to release batches and to monitor the primary reference over time. The stability of the primary reference is assessed using an exponentially weighted moving average (EWMA) chart for an in vitro bioassay that uses 96-well plates and four-parameter logistic (4PL) curves to estimate bioactivities. The performance of this strategy is assessed using simulations, and optimal settings are discussed.

Keywords: EWMA, false alarms, in vitro assays, in-house reference, linear shift, linear trend


5.1 Introduction

Routine bioassays for batch release in the pharmaceutical industry use an in-house reference standard instead of an international reference standard. An in-house reference may replace the international reference in release testing only when it is strictly comparable to the international reference and when the pharmaceutical company can demonstrate its stability over time with respect to the international reference (Chapter 4). Statistical methods that can be used to assess the stability of this in-house reference were discussed thoroughly in Chapter 4. These methods include, among others, control charts such as the exponentially weighted moving average (Roberts, 1959) and specific contrast estimates within an analysis of variance (ANOVA) approach. However, for some newly developed products an international reference may not exist, and these approaches are then no longer possible. In that case, the pharmaceutical company has to employ other strategies to monitor the stability of the bioactivity of the in-house standards used to release batches.

Pharmaceutical companies then typically create their own reference standard, called a primary reference, which takes the role of the international reference. They may then apply the monitoring approaches that are used when an international reference does exist. However, pharmaceutical companies are also responsible for maintaining the stability of the bioactivity of the primary reference. Thus, without an international reference, the pharmaceutical company must monitor both the primary reference and the in-house reference standards.

One approach is to estimate the shelf life (ICH Q1A(R2), 2003) of the primary reference (Kirkwood, 1977). This approach combines an accelerated degradation test with the Arrhenius equation, which assumes a fixed relation between the degradation rates at different temperatures. The primary reference is stored at different temperatures, and relative bioactivities are calculated against the primary reference stored at the lowest temperature. By evaluating these relative bioactivities over time and applying the Arrhenius equation, a prediction of the bioactivity of the primary reference at the lowest temperature can be obtained. This approach makes two crucial assumptions: first, that the reference stored at the lowest temperature is stable over time, and second, that the Arrhenius equation, which was originally established for gases, holds for complex molecules in the solid phase over a wide range of temperatures. These assumptions do not generally hold for bioassays, hence more research is required on the methods that can be applied.
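The extrapolation step of this approach can be sketched as follows. This is a minimal illustration, not the procedure of Kirkwood (1977): it assumes a single first-order degradation rate constant k per storage temperature that follows the Arrhenius form ln k = ln A − Ea/(R·T), and it uses hypothetical temperatures and rates; the helper name `arrhenius_extrapolate` is our own. The prediction at −70 °C rests entirely on the two assumptions discussed above.

```python
import numpy as np

R_GAS = 8.314  # universal gas constant in J/(mol K)


def arrhenius_extrapolate(temps_celsius, rates, target_celsius):
    """Fit ln k = ln A - Ea / (R T) and predict the rate constant at target_celsius.

    temps_celsius: storage temperatures of a (hypothetical) accelerated study
    rates: estimated first-order degradation rate constants at those temperatures
    Returns (predicted rate at target_celsius, estimated activation energy Ea in J/mol).
    """
    inv_temp = 1.0 / (np.asarray(temps_celsius, dtype=float) + 273.15)  # 1/T in 1/K
    log_rate = np.log(np.asarray(rates, dtype=float))
    slope, intercept = np.polyfit(inv_temp, log_rate, deg=1)  # slope = -Ea / R
    k_target = np.exp(intercept + slope / (target_celsius + 273.15))
    return k_target, -slope * R_GAS


# Hypothetical rate constants generated exactly from Ea = 80 kJ/mol and ln A = 25.
temps = [4.0, 25.0, 37.0]
rates = [np.exp(25.0 - 80_000.0 / (R_GAS * (t + 273.15))) for t in temps]
k_storage, ea_hat = arrhenius_extrapolate(temps, rates, target_celsius=-70.0)
print(f"Ea = {ea_hat:.0f} J/mol, predicted rate at -70 C = {k_storage:.3e} per time unit")
```

With real data the fitted line would carry estimation error, and extrapolating far below the tested temperatures amplifies it.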

In this chapter we propose an alternative approach that enables the monitoring of the primary reference simultaneously with the monitoring of the in-house reference standards. Our approach is referred to as the “back-up system of monitoring”, and it allows the qualification of new reference standards when the existing primary reference is out of trend. The manuscript of this chapter is in preparation for submission for publication.


5.2 Statistical methods

Primary reference standards are denoted by Rp (p = 1, 2, . . . , P), where p refers to the period in time during which the standard is active. The primary reference is stored at −70 °C or −80 °C and is specifically prepared from high-quality, stable constituents so that it remains stable over a long time period. This primary reference then plays the role of an international reference. The secondary or in-house reference standards are denoted by Sm (m = 1, 2, . . . , M), where m refers to a subperiod within the period of the primary reference. A more precise notation would be m(p), but for convenience it is simplified to m, since we focus on a single period p. In-house reference standards are typically stored at −20 °C and are expected to have a shorter stability period than the primary reference because of their lower preparation quality and higher storage temperature. The secondary reference standards are commonly used to determine the bioactivity of batches in routine analysis. Both the primary and the secondary reference standards are prepared from the sponsored drug substance batches.

5.2.1 Strategy

A testing scheme for the creation and qualification of standards is presented in Figure 5.1. The tree is dedicated to the period of one primary reference, say Rp. Each vertex represents a time point at which the bioactivities of secondary reference standards are determined against the primary reference. Equidistant time points are considered, and the period between two consecutive time points is ∆t. This time interval is selected such that a secondary reference is guaranteed to be stable over a period of length ∆t. The first time point t1 corresponds to the first vertex on the left in Figure 5.1, where only the secondary reference S1 is qualified against the primary reference Rp. This means that both the primary and the secondary standard are produced and tested at time point t1.

At the second time point, represented by the second vertex from the left, the secondary reference S1 is measured and a new secondary reference S2 is created and tested against the primary reference. At the third vertex from the left of Figure 5.1, the secondary reference standards S1 and S2 are measured, and S3 is created and tested against the primary reference. The system continues until the primary reference fails the stability criterion (discussed later). In that case, the primary reference Rp is replaced by a new primary reference Rp+1.

At time point tm, there exist m − 1 consecutive comparisons between the activity of stable secondary reference standards against the primary reference Rp over time, that is, (Sm−1, tm) vs (Sm−1, tm−1), (Sm−2, tm−1) vs (Sm−2, tm−2), . . . , (S1, t2) vs (S1, t1). Thus, the difference Zk in bioactivity is defined as the activity of Sk−1 at time point tk−1 minus the activity of Sk−1 at time point tk. The differences Z2, Z3, . . . , Zm are all obtained against primary reference Rp within the first period of existence of each secondary standard, during which the secondary standards were assumed to be stable. Thus, when the primary reference Rp is stable within the period [t1, tm], the differences Z2, Z3, . . . , Zm fluctuate randomly around zero. On the other hand, if the activity of the primary reference Rp decreases over time, then the differences Z2, Z3, . . . , Zm deviate from zero by being predominantly negative (because the follow-up bioactivity is subtracted from the bioactivity at the previous time point). A control chart, for example the exponentially weighted moving average (EWMA) control chart, can be used to assess these differences.

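Figure 5.1: A testing scheme for stability evaluation of primary and secondary reference standards. [Diagram: a tree of equidistant time points spaced ∆t apart; the vertex labels list the standards tested at each time point: S1; S1, S2; (S1), S2, S3; (S2), S3, S4; (S3), S4, S5.]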

If at a particular time point tm the primary reference Rp is considered to be unstable, a new primary reference Rp+1 should be created immediately to replace Rp. The secondary reference Sm, which was just introduced at time tm, may act as S1 in the new period of primary reference Rp+1; its activity is then set by Sm−1 instead of Rp, and the tree in Figure 5.1 starts all over again.

The measurements of Sm−2 at tm can be used to assess the stability of that secondary reference over the last time period. If the secondary reference standards are also stable over a period of 2∆t, then the difference between Sm−2 at tm and Sm−1 at tm−1 can be used to lengthen the time period between replacements of secondary reference standards. Additionally, the extra measurement of Sm−2 at time tm can be used to confirm that a potential stability issue is caused by the primary reference and not by the current secondary reference (provided that Sm−2 is stable over two periods).

5.2.2 Statistical details on monitoring the primary and secondary references

At each time point tm, J bioassay runs are executed to obtain multiple bioactivities that can be pooled to accurately estimate the bioactivity of the secondary reference standards Sm−2, Sm−1, and Sm with respect to the primary reference. A bioassay run can be affected by different assay factors such as the analyst, testing day, and cell passage (USP <1034>). These factors should be included in the experiment to prevent spurious or false signals in the monitoring of reference standards. At least one of these factors should change from bioassay run to bioassay run so that the true variability in bioactivities is captured. Additionally, the bioactivity from one bioassay run may itself be a combination of multiple bioactivities, as is customary for in vitro bioassays that use multiple well-plates. In fact, a single well-plate may even produce multiple bioactivities. A nonlinear S-curve is fitted to the well-plate data for each reference to estimate the relative bioactivity between the secondary and the primary reference. In this study, we use a four-parameter logistic curve, characterised by the lower and upper asymptotes (αa, δa), the slope (βa), and the ED50 (γa), given by

$$\ln(Y_{aj}) = \delta_a + (\alpha_a - \delta_a)\left[1 + \exp\{\beta_a \ln(x_{aj}/\gamma_a)\}\right]^{-1} + \varepsilon_{aj}, \qquad (5.1)$$

where a = 0 for the primary reference and a = 1 for the secondary reference, xaj is the jth concentration for reference a, and the residuals εaj are assumed to be normally distributed with mean zero and variance $\sigma_e^2$. These nonlinear curves are fitted for each well-plate. The bioactivity is estimated as the ratio of the estimated ED50 (median effective dose) parameters, Xm = γ0/γ1, assuming that parallelism holds (i.e., α0 = α1, β0 = β1, δ0 = δ1). The bioactivities obtained with (5.1) are the bioactivities collected for the proposed monitoring strategy. Since the conditions under which the bioactivities are collected may vary, heterogeneity in bioactivities may be present (USP <1032>). A statistical model that accounts for this heterogeneity is the variance components model

$$X_{ijkl} = \eta_i + \beta_j + (\alpha\beta)_{ij} + \delta_{k(j)} + (\alpha\delta)_{ik(j)} + \varepsilon_{ijkl}, \qquad (5.2)$$

with Xijkl the lth (l = 1, 2, . . . , L) relative log bioactivity on the kth (k = 1, 2, . . . , K) well-plate in the jth (j = 1, 2, . . . , J) bioassay run for the ith (i = m − 2, m − 1, m) secondary reference at time point tm with respect to primary reference Rp.

The model parameters are defined as follows: $\eta_i$ is the true mean logarithmic bioactivity of secondary reference i; $\beta_j$ is the effect of the jth analytical run, $\beta_j \sim N(0, \sigma_R^2)$; $(\alpha\beta)_{ij}$ is the interaction effect of the ith secondary reference with the jth analytical run, $(\alpha\beta)_{ij} \sim N(0, \sigma_{SR}^2)$; $\delta_{k(j)}$ is the effect of the kth well-plate within the jth analytical run, $\delta_{k(j)} \sim N(0, \sigma_{P(R)}^2)$; $(\alpha\delta)_{ik(j)}$ is the interaction effect of the ith secondary reference with the kth well-plate within the jth analytical run, $(\alpha\delta)_{ik(j)} \sim N(0, \sigma_{SP(R)}^2)$; and $\varepsilon_{ijkl}$ is the residual error, $\varepsilon_{ijkl} \sim N(0, \sigma_E^2)$. The random effects and the residuals are assumed to be mutually independent. Model (5.2) is described for time point tm, which implies that the indices i = m − 2, i = m − 1, and i = m refer to the secondary reference standards Sm−2, Sm−1, and Sm, respectively. Note that Model (5.2) will only be approximately true, because the relative bioactivities are ratios of maximum likelihood estimates ($\hat\gamma_0/\hat\gamma_1$) from the fitted S-curves.
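To illustrate (5.1) and the ratio Xm = γ0/γ1, the sketch below evaluates the mean log response of the 4PL curve and checks what parallelism means: with shared α, β, and δ, a secondary standard with ED50 γ1 = γ0/ρ gives at dose x the same response as the primary at dose ρx, so γ0/γ1 = ρ is its relative bioactivity. The parameter values and doses are hypothetical, and the sketch does not fit curves to plate data (in this chapter γ0 and γ1 are maximum likelihood estimates per well-plate).

```python
import numpy as np


def log_response_4pl(x, alpha, beta, gamma, delta):
    """Mean log response of the 4PL model (5.1), without the residual term.

    alpha: response as x -> 0; delta: response as x -> infinity (for beta > 0);
    beta: slope; gamma: ED50, the dose halfway between the two asymptotes.
    """
    x = np.asarray(x, dtype=float)
    return delta + (alpha - delta) / (1.0 + np.exp(beta * np.log(x / gamma)))


doses = np.array([1, 2, 4, 8, 16, 32, 64, 128, 256], dtype=float)  # hypothetical dilution series
alpha, beta, delta = 3.7, 2.0, 5.0  # hypothetical shared (parallel) parameters
gamma_primary = 40.0                # hypothetical ED50 of R_p
rho = 1.25                          # hypothetical true relative bioactivity of the secondary standard
gamma_secondary = gamma_primary / rho

primary = log_response_4pl(doses, alpha, beta, gamma_primary, delta)
secondary = log_response_4pl(doses, alpha, beta, gamma_secondary, delta)
# Parallelism: the secondary curve at dose x coincides with the primary curve at dose rho * x.
shifted_primary = log_response_4pl(rho * doses, alpha, beta, gamma_primary, delta)
print(np.allclose(secondary, shifted_primary), gamma_primary / gamma_secondary)
```

On the log-dose axis the two curves are horizontal translations of each other, which is why a single ratio of ED50s summarises the relative bioactivity only when parallelism holds.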

The model parameters are estimated using a Type 3 method of moments approach, which is appropriate for variance components models with a small number of samples (Van den Heuvel, 2010). The estimates are denoted by $\hat\eta_i$, $\hat\sigma_R^2$, $\hat\sigma_{SR}^2$, $\hat\sigma_{P(R)}^2$, $\hat\sigma_{SP(R)}^2$, and $\hat\sigma_E^2$ for the log bioactivity $\eta_i$ and the variance components $\sigma_R^2$, $\sigma_{SR}^2$, $\sigma_{P(R)}^2$, $\sigma_{SP(R)}^2$, and $\sigma_E^2$, respectively. For monitoring the primary reference, the following quantities are obtained at each time point tm: the estimated bioactivities $\hat\eta_i(t_m)$ of the secondary reference standards i = m − 2, m − 1, m, their estimated standard errors $\hat\sigma_{\eta_i(t_m)}$ and corresponding numbers of degrees of freedom $n_i(m)$, and the estimated covariance $\hat\nu_{m-1,m}$ between the estimated log bioactivities $\hat\eta_{m-1}(t_m)$ and $\hat\eta_m(t_m)$ of the secondary references Sm−1 and Sm.
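Why estimates obtained at the same time point are correlated can be seen by simulating Model (5.2). The sketch below rests on assumptions of our own: a balanced design, ηi estimated by the grand mean of reference i instead of the Type 3 method of moments, and hypothetical standard deviations; the function names are ours. Because the run and plate effects βj and δk(j) are shared by all references, two estimated log bioactivities have covariance σ²R/J + σ²P(R)/(JK), the kind of quantity estimated by νm−1,m, while each has variance equal to that covariance plus σ²SR/J + σ²SP(R)/(JK) + σ²E/(JKL).

```python
import numpy as np


def simulate_log_bioactivities(eta, n_runs, n_plates, n_reps, sd, rng):
    """Draw X_ijkl from the variance components model (5.2) at one time point.

    eta: true mean log bioactivities of the secondary references (e.g. S_{m-2}, S_{m-1}, S_m)
    sd: standard deviations keyed 'R' (run), 'SR' (reference x run), 'P' (plate within run),
        'SP' (reference x plate within run) and 'E' (residual)
    Returns an array of shape (n_refs, n_runs, n_plates, n_reps).
    """
    eta = np.asarray(eta, dtype=float)[:, None, None, None]
    n_refs = eta.shape[0]
    run = rng.normal(0.0, sd["R"], size=(1, n_runs, 1, 1))                      # beta_j, shared
    ref_run = rng.normal(0.0, sd["SR"], size=(n_refs, n_runs, 1, 1))           # (alpha beta)_ij
    plate = rng.normal(0.0, sd["P"], size=(1, n_runs, n_plates, 1))            # delta_k(j), shared
    ref_plate = rng.normal(0.0, sd["SP"], size=(n_refs, n_runs, n_plates, 1))  # (alpha delta)_ik(j)
    resid = rng.normal(0.0, sd["E"], size=(n_refs, n_runs, n_plates, n_reps))  # epsilon_ijkl
    return eta + run + ref_run + plate + ref_plate + resid


def grand_mean_moments(n_runs, n_plates, n_reps, sd):
    """Variance of one reference's grand mean and covariance between two references' grand means."""
    shared = sd["R"] ** 2 / n_runs + sd["P"] ** 2 / (n_runs * n_plates)
    own = (sd["SR"] ** 2 / n_runs + sd["SP"] ** 2 / (n_runs * n_plates)
           + sd["E"] ** 2 / (n_runs * n_plates * n_reps))
    return shared + own, shared


rng = np.random.default_rng(2016)
sd = {"R": 0.05, "SR": 0.02, "P": 0.04, "SP": 0.02, "E": 0.05}  # hypothetical SDs on the log scale
J, K, L = 6, 4, 2  # runs, plates per run, bioactivities per plate
means = np.array([simulate_log_bioactivities([0.0, 0.0, 0.0], J, K, L, sd, rng).mean(axis=(1, 2, 3))
                  for _ in range(10_000)])
empirical = np.cov(means, rowvar=False)
var_theory, cov_theory = grand_mean_moments(J, K, L, sd)
print(empirical[0, 0], var_theory)  # variance of an estimated log bioactivity
print(empirical[1, 2], cov_theory)  # covariance between two references at the same time point
```

Estimates from different time points come from different runs and plates, so under this model they are independent; only same-time-point pairs carry this shared component.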

Change patterns of the primary reference standard

Before describing the methods for monitoring the primary and secondary reference standards, we introduce the change profiles in the bioactivity of the primary reference standard. It is not known in advance which change patterns will occur, but the most commonly studied patterns in statistical process control (SPC) are persistent shifts and linear trends. These profiles were also part of Chapter 4, where the stability of the in-house reference was assessed. A shift means that the bioactivity suddenly changes in level at one particular time and stays constant afterwards, whereas a linear trend is a gradual decrease in the bioactivity of the reference standard. Other patterns have been discussed by Gitlow et al. (1995), including wild and multi-universe patterns. There are two types of wild patterns: freak patterns, in which dramatic changes occur, and grouping patterns, in which a change affects a certain group of points but not others. In multi-universe patterns, several patterns occur that could be due to systematic or stratification variables. These patterns seem less realistic for biological reference standards, so we focus on two patterns only: the traditional shift and the linear trend profile.

When the bioactivity shifts, the parameter γ0(t) corresponding to the ED50 of the primary reference at time point t is given by

$$\log\gamma_0(t) = \log\gamma_0 + \log(1+p)\,\mathbf{1}_{[t_s,\infty)}(t), \qquad (5.3)$$

where ts > 2 is the time point at which the shift is introduced and p > 0 is the relative (percentage) increase in γ0. Note that when the ED50 becomes larger, the primary reference is biologically less active. Additionally, the bioactivity of the primary reference is assumed to be stable during the first time period (similar to the secondary standards). Since the primary reference is deteriorating, the bioactivity of the secondary reference relative to it is expected to increase. When a shift occurs at time ts, the relative bioactivity at tm ≥ ts is given by

$$\rho_i = \frac{\gamma_0(1+p)}{\gamma_i}, \qquad (5.4)$$

with i ∈ {m − 2, m − 1, m} and m ≥ 3. Note that γm−2 may also have changed in the period [∆t, 2∆t], but this is ignored here. The value of p is set to 0.10, so that the ED50 of the primary reference increases by 10% after the shift occurs, which corresponds to a decrease of roughly 10% (1 − 1/1.1 ≈ 9.1%) in its bioactivity. The second profile is a linear trend in the logarithmic ED50 of the primary reference, in which the bioactivity of the primary reference changes gradually over time. The linear trend profile is given by

$$\log\gamma_0(t) = \log\gamma_0 + \beta t\,\mathbf{1}_{[2,\infty)}(t), \qquad (5.5)$$

where β > 0 is the rate at which the bioactivity of the primary reference decreases after the second time period. The relative bioactivity for t ≥ 2 is given by

$$\rho_i = \frac{\gamma_0}{\gamma_i}\,e^{\beta t}, \qquad (5.6)$$

for i ∈ {m − 2, m − 1, m} and m ≥ 3. The rate β is chosen such that the ED50 changes by 10% every T years, where T can be taken as, for example, 5 years; that is, $e^{\beta T} = 1.1$, or $\beta = \ln(1.1)/T$. For T = 5 years this gives β = 0.019062.
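Both change profiles, and the value of β, can be checked numerically. The sketch below assumes that t is expressed in the same unit as T (years) and uses hypothetical ED50 values γ0 = 40 and γi = 32; the function names are ours. It reproduces β = ln(1.1)/5 and shows that both profiles raise the relative bioactivity ρi by the same factor 1.1, once the shift has occurred or after five years of trend.

```python
import math


def log_ed50_shift(t, log_gamma0, p=0.10, t_shift=3):
    """log gamma_0(t) under the shift profile (5.3): a jump of log(1 + p) from t_shift on."""
    return log_gamma0 + (math.log1p(p) if t >= t_shift else 0.0)


def log_ed50_trend(t, log_gamma0, beta):
    """log gamma_0(t) under the linear trend profile (5.5): beta * t added from t = 2 on."""
    return log_gamma0 + (beta * t if t >= 2 else 0.0)


def trend_rate(relative_change=0.10, years=5.0):
    """beta such that exp(beta * T) = 1 + relative_change, i.e. beta = ln(1.1) / T."""
    return math.log1p(relative_change) / years


def relative_bioactivity(log_ed50_primary, gamma_i):
    """rho_i = gamma_0(t) / gamma_i, as in (5.4) and (5.6)."""
    return math.exp(log_ed50_primary) / gamma_i


beta = trend_rate(0.10, 5.0)
print(round(beta, 6))  # 0.019062

log_g0, gamma_i = math.log(40.0), 32.0  # hypothetical ED50 values
rho_before = relative_bioactivity(log_ed50_shift(1, log_g0, t_shift=4), gamma_i)
rho_after = relative_bioactivity(log_ed50_shift(4, log_g0, t_shift=4), gamma_i)
rho_trend = relative_bioactivity(log_ed50_trend(5.0, log_g0, beta), gamma_i)
print(rho_before, rho_after, rho_trend)
```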

Stability of the primary reference standard

To evaluate the stability of the primary reference Rp, the EWMA chart (Roberts, 1959) is constructed for the time differences Z2, Z3, . . . , Zm of the secondary references S1, S2, . . . , Sm−1. Note that these differences were obtained while the secondary references were stable. The differences are defined as

$$Z_m = \hat\eta_{m-1}(t_{m-1}) - \hat\eta_{m-1}(t_m). \qquad (5.7)$$

If the bioactivity of the primary reference decreases, the differences Zm tend to be negative. The expected value of Zm is given by

$$\mathrm{E}Z_m = \log\gamma_0(t_{m-1}) - \log\gamma_0(t_m). \qquad (5.8)$$
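The bookkeeping behind (5.7) is easy to sketch. In the Python below, the estimated log bioactivities are stored in a dictionary keyed by (i, m), meaning Si measured at tm; this layout, the function name `time_differences`, and the noise-free drifting primary are illustrative assumptions. Because the true relative log bioactivity of Si at tm is log γ0(tm) − log γi, the secondary's own ED50 cancels and each Zm reduces to log γ0(tm−1) − log γ0(tm), which is negative when the ED50 of the primary rises.

```python
def time_differences(eta_hat, n_time_points):
    """Return {m: Z_m} for m = 2, ..., n_time_points, with Z_m as in (5.7).

    eta_hat: dict mapping (i, m) -> estimated log bioactivity of S_i at time point t_m.
    """
    return {m: eta_hat[(m - 1, m - 1)] - eta_hat[(m - 1, m)]
            for m in range(2, n_time_points + 1)}


M = 6
log_gamma0 = {m: 0.02 * (m - 1) for m in range(1, M + 1)}  # hypothetical: log ED50 of R_p rises 0.02 per time point
log_gamma_sec = {i: 0.1 * i for i in range(1, M + 1)}       # hypothetical, stable secondary standards
# Noise-free relative log bioactivity of S_i at t_m (each S_i is measured at t_i, t_{i+1}, t_{i+2}).
eta_hat = {(i, m): log_gamma0[m] - log_gamma_sec[i]
           for i in range(1, M + 1) for m in range(i, min(i + 2, M) + 1)}
Z = time_differences(eta_hat, M)
print(Z)  # every Z_m equals -0.02 up to floating-point rounding
```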

The estimated variance of Zm is given by

$$\hat\tau^2_{Z_m} = \hat\sigma^2_{\eta_{m-1}(t_{m-1})} + \hat\sigma^2_{\eta_{m-1}(t_m)}. \qquad (5.9)$$


However, the differences Z2, Z3, . . . , Zm are not independently distributed, since the secondary reference standards are not observed independently of each other. The secondary reference Sm−1 is observed for the difference Zm at time point tm, but Sm is also observed at this time point tm for the difference Zm+1. Thus, when the effects of analytical runs and well-plates cannot be neglected, consecutive differences are dependent: Zm is correlated with both Zm−1 and Zm+1, but with no other differences. If Model (5.2) describes the bioactivities well, the covariance between Zm and Zm+1 is given by

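$$\begin{aligned}
\mathrm{Cov}(Z_m, Z_{m+1}) &= \mathrm{Cov}\big(\hat\eta_{m-1}(t_{m-1}) - \hat\eta_{m-1}(t_m),\ \hat\eta_m(t_m) - \hat\eta_m(t_{m+1})\big)\\
&= -\mathrm{Cov}\big(\hat\eta_{m-1}(t_m), \hat\eta_m(t_m)\big)\\
&= -\nu_{m-1,m}.
\end{aligned}\qquad (5.10)$$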

Using the differences Z2, Z3, . . . , Zm, the EWMA statistic is defined as

$$\begin{aligned}
E_1 &= Z_1 = 0,\\
E_m &= \lambda Z_m + (1-\lambda)E_{m-1}\\
&= \lambda Z_m + \lambda\sum_{k=1}^{m-1}(1-\lambda)^k Z_{m-k},
\end{aligned}\qquad (5.11)$$

where λ ∈ (0, 1] is a weighting parameter chosen to help detect changes in the activity of the primary reference. Smaller values of λ put more weight on historical data, and larger values of λ put more emphasis on the current difference Zm. As a result, small changes in the bioactivity are more likely to be detected with a small λ, whereas large values of λ improve the detection of large changes from the target value. If the primary reference is stable over time, the expected value of the EWMA statistic is zero, which is the target value. The expected value of the EWMA statistic Em is equal to

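$$\begin{aligned}
\mathrm{E}E_m &= \mathrm{E}\Big(\lambda Z_m + \lambda\sum_{k=1}^{m-2}(1-\lambda)^k Z_{m-k}\Big)\\
&= \lambda\sum_{k=0}^{m-2}(1-\lambda)^k\big(\eta_{m-k-1}(t_{m-k-1}) - \eta_{m-k-1}(t_{m-k})\big)\\
&= \lambda\sum_{k=0}^{m-2}(1-\lambda)^k\big(\log\gamma_0(t_{m-k-1}) - \log\gamma_0(t_{m-k})\big).
\end{aligned}\qquad (5.12)$$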
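The recursive and the expanded forms of the EWMA statistic in (5.11), on which the first line of (5.12) relies, agree because the chart starts at E1 = Z1 = 0. The Python sketch below checks this on hypothetical differences; the function names are ours. Setting λ = 1 reduces the chart to a Shewhart-type chart on the individual differences.

```python
import numpy as np


def ewma(z, lam):
    """Recursive EWMA of (5.11): E_1 = 0 and E_m = lam * Z_m + (1 - lam) * E_{m-1}.

    z[0] plays the role of Z_1 and must be 0; z[1], z[2], ... are Z_2, Z_3, ...
    """
    e = np.zeros(len(z))
    for n in range(1, len(z)):
        e[n] = lam * z[n] + (1.0 - lam) * e[n - 1]
    return e


def ewma_expanded(z, lam):
    """Non-recursive form lam * sum_k (1 - lam)^k Z_{m-k} used in (5.11) and (5.12)."""
    z = np.asarray(z, dtype=float)
    out = np.empty(len(z))
    for n in range(len(z)):
        weights = (1.0 - lam) ** np.arange(n + 1)  # (1 - lam)^k for k = 0, ..., n
        out[n] = lam * np.dot(weights, z[n::-1])    # Z_m gets k = 0, Z_{m-1} gets k = 1, ...
    return out


z = np.array([0.0, 0.01, -0.02, -0.03, 0.005, -0.04])  # hypothetical Z_1 = 0, Z_2, ..., Z_6
for lam in (0.2, 0.6, 1.0):
    print(lam, np.allclose(ewma(z, lam), ewma_expanded(z, lam)))
```

Feeding the expected differences instead of observed ones into either function gives the expected EWMA path of (5.12).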

The estimated variance of the EWMA statistic is given by

$$\hat\tau^2_{E_m} = \lambda^2\left[\sum_{k=0}^{m-2}(1-\lambda)^{2k}\,\hat\tau^2_{Z_{m-k}} - 2\sum_{k=0}^{m-3}(1-\lambda)^{2k+1}\,\hat\nu_{m-k-2,\,m-k-1}\right]. \qquad (5.13)$$
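Equation (5.13) is the variance of a weighted sum of the differences Z2, . . . , Zm, with weight λ(1 − λ)^(m−j) on Zj and the tridiagonal covariance Cov(Zj, Zj+1) = −νj−1,j from (5.10). The sketch below, with hypothetical variances and covariances and function names of our own, evaluates (5.13) directly and as the quadratic form w′Σw; the two agree.

```python
import numpy as np


def ewma_variance_formula(tau2_z, nu, lam, m):
    """Equation (5.13): tau2_z[j] = Var(Z_j); nu[(j - 1, j)] = nu_{j-1,j}."""
    q = 1.0 - lam
    var_part = sum(q ** (2 * k) * tau2_z[m - k] for k in range(m - 1))                    # k = 0..m-2
    cov_part = sum(q ** (2 * k + 1) * nu[(m - k - 2, m - k - 1)] for k in range(m - 2))  # k = 0..m-3
    return lam ** 2 * (var_part - 2.0 * cov_part)


def ewma_variance_quadratic(tau2_z, nu, lam, m):
    """Same variance as w' Sigma w, with Sigma the tridiagonal covariance of (Z_2, ..., Z_m)."""
    idx = [int(j) for j in range(2, m + 1)]
    sigma = np.diag([tau2_z[j] for j in idx])
    for a in range(len(idx) - 1):
        j = idx[a]
        sigma[a, a + 1] = sigma[a + 1, a] = -nu[(j - 1, j)]  # Cov(Z_j, Z_{j+1}) = -nu_{j-1,j}
    w = lam * (1.0 - lam) ** (m - np.array(idx))              # EWMA weight of Z_j in E_m
    return float(w @ sigma @ w)


tau2 = {j: 0.002 + 0.0001 * j for j in range(2, 8)}  # hypothetical Var(Z_j)
nu = {(j - 1, j): 0.0004 for j in range(2, 8)}       # hypothetical nu_{j-1,j}
for lam in (0.2, 0.6, 1.0):
    print(lam, ewma_variance_formula(tau2, nu, lam, 7), ewma_variance_quadratic(tau2, nu, lam, 7))
```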


When the EWMA statistic Em drops below the lower control limit −L · τEm, where L determines the width of the control limits (Montgomery, 2009), a signal is given that indicates that the bioactivity of the primary reference has changed. This implies that a new primary reference should be created and that the monitoring process starts all over again. With narrow control limits (small L), the probability of signalling a change, including a false alarm, is much higher than with wide control limits. The parameters λ and L should be selected in advance.
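The decision rule can be combined with (5.11) and (5.13) in one loop. This is a minimal sketch under stated assumptions: the inputs are hypothetical (each Zm has standard deviation 0.02 and νm−1,m = 0.00005), λ = 0.6 and L = 3 are illustrative choices rather than the settings studied in this chapter, the function name `first_signal` is ours, and the noise-free shift example places the 10% ED50 increase of (5.3) at ts = 4, so that only Z4 = −ln(1.1) is nonzero.

```python
import math


def first_signal(z, tau2_z, nu, lam, limit_l):
    """Return the first m with E_m < -L * tau_{E_m}, or None if the chart never signals.

    z: dict m -> Z_m for consecutive m = 2, ..., M
    tau2_z: dict m -> estimated variance of Z_m, as in (5.9)
    nu: dict (m - 1, m) -> nu_{m-1,m}, so that Cov(Z_m, Z_{m+1}) = -nu_{m-1,m}
    """
    q = 1.0 - lam
    e_prev = 0.0  # E_1 = 0
    for m in sorted(z):
        e_m = lam * z[m] + q * e_prev  # recursion (5.11)
        var_part = sum(q ** (2 * k) * tau2_z[m - k] for k in range(m - 1))
        cov_part = sum(q ** (2 * k + 1) * nu[(m - k - 2, m - k - 1)] for k in range(m - 2))
        tau_e = math.sqrt(lam ** 2 * (var_part - 2.0 * cov_part))  # (5.13)
        if e_m < -limit_l * tau_e:
            return m
        e_prev = e_m
    return None


tau2 = {m: 0.02 ** 2 for m in range(2, 7)}       # hypothetical Var(Z_m)
nu = {(m - 1, m): 0.00005 for m in range(2, 7)}  # hypothetical nu_{m-1,m}
z_stable = {m: 0.0 for m in range(2, 7)}
z_shift = {**z_stable, 4: -math.log(1.1)}        # shift of (5.3) with p = 0.10 at t_s = 4
print(first_signal(z_stable, tau2, nu, lam=0.6, limit_l=3.0))  # None
print(first_signal(z_shift, tau2, nu, lam=0.6, limit_l=3.0))   # 4
```

In practice Zm contains assay noise, so the false-alarm rate and the detection delay for given λ and L have to be studied by simulation.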

Stability of secondary reference standards

This subsection describes statistical methods for investigating the stability of the bioactivity of the secondary reference Sm−2 with respect to the reference standard Sm−1 between the previous time point tm−1 and the current time point tm, for m ≥ 3. Since Sm−2 was stable up to time point tm−1 and Sm−1 is stable at time point tm, the stability of Sm−2 after tm−1 can be evaluated. The best secondary reference for this comparison is Sm−1, since it is stable over the period from tm−1 to tm. Because bioactivities are relative estimates, the difference in estimated bioactivities between Sm−2 and Sm−1 per well-plate does not depend on the primary reference. The EWMA control chart can therefore also be used to monitor the secondary reference standards. For the secondary references, the differences of interest are given by

$$T_m = \delta_1(m) - \delta_2(m-1), \qquad (5.14)$$

where $\delta_1(m) = \hat\eta_{m-2}(t_m) - \hat\eta_{m-1}(t_m)$ and $\delta_2(m-1) = \hat\eta_{m-2}(t_{m-1}) - \hat\eta_{m-1}(t_{m-1})$. The differences δ1(m) and δ2(m − 1) correspond to the difference between the two oldest secondary reference standards at time point tm and the difference between the two newest secondary reference standards at tm−1, respectively. If the secondary reference standards are stable, the two differences are expected to be equal. Note that this evaluation does not depend on the primary reference. The expected value of Tm is given by

$$\mathrm{E}T_m = \eta_{m-2}(t_m) - \eta_{m-1}(t_m) - \eta_{m-2}(t_{m-1}) + \eta_{m-1}(t_{m-1}). \qquad (5.15)$$

The variance of Tm is the sum of the variances of δ1(m) and δ2(m − 1), and it is estimated as

$$\hat\tau^2_{T_m} = \hat\sigma^2_{\delta_1(m)} + \hat\sigma^2_{\delta_2(m-1)}, \qquad (5.16)$$

where the variances of δ1(m) and δ2(m − 1) are estimated taking into account the covariance between Sm−2 and Sm−1 at each time point, while δ1(m) and δ2(m − 1) themselves are independent because they are obtained at different time points. Similar to the Zm differences, the Tm differences are not independently distributed, because Tm and Tm+1 share the time point tm: Tm contains the measurements of Sm−2 and Sm−1 at tm, and Tm+1 contains the measurements of Sm−1 and Sm at the same time point. The covariance between Tm and Tm+1 is given by

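$$\begin{aligned}
\mathrm{Cov}(T_m, T_{m+1}) &= \mathrm{Cov}\big(\hat\eta_{m-2}(t_m) - \hat\eta_{m-1}(t_m) - \hat\eta_{m-2}(t_{m-1}) + \hat\eta_{m-1}(t_{m-1}),\\
&\qquad\quad \hat\eta_{m-1}(t_{m+1}) - \hat\eta_m(t_{m+1}) - \hat\eta_{m-1}(t_m) + \hat\eta_m(t_m)\big)\\
&= \mathrm{Cov}\big(\hat\eta_{m-2}(t_m) - \hat\eta_{m-1}(t_m),\ \hat\eta_m(t_m) - \hat\eta_{m-1}(t_m)\big)\\
&= \omega_{m-2,m}.
\end{aligned}\qquad (5.17)$$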

The EWMA statistic for monitoring secondary references is defined as

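$$\begin{aligned}
N_2 &= T_2 = 0,\\
N_m &= \lambda T_m + (1-\lambda)N_{m-1}\\
&= \lambda T_m + \lambda\sum_{k=1}^{m-2}(1-\lambda)^k T_{m-k}, \qquad (m \ge 3)
\end{aligned}\qquad (5.18)$$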

where λ is the weighting parameter defined before. When the secondary reference is not changing, Nm fluctuates randomly around zero. When the secondary reference is declining, the EWMA statistic Nm tends to fall below the lower control limit −L · τNm, where τNm is the standard deviation of Nm, obtained analogously to (5.13) from (5.16) and (5.17).
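A minimal sketch of (5.14) and (5.18) follows. It assumes a dictionary keyed by (i, m) for the estimated log bioactivity of Si at tm, noise-free data, and function names of our own. The primary reference drifts, yet the Tm stay at zero because the primary cancels from the differences; only when S3 deteriorates in its second period (a hypothetical 0.05 increase in its log ED50 at t5) does T5 become negative, and the EWMA Nm follows.

```python
def secondary_differences(eta_hat, n_time_points):
    """T_m = delta_1(m) - delta_2(m - 1) from (5.14), for m = 3, ..., n_time_points.

    eta_hat: dict (i, m) -> estimated log bioactivity of S_i at time point t_m.
    """
    t_diff = {}
    for m in range(3, n_time_points + 1):
        delta1 = eta_hat[(m - 2, m)] - eta_hat[(m - 1, m)]          # two oldest standards at t_m
        delta2 = eta_hat[(m - 2, m - 1)] - eta_hat[(m - 1, m - 1)]  # two newest standards at t_{m-1}
        t_diff[m] = delta1 - delta2
    return t_diff


def ewma_secondary(t_diff, lam):
    """N_m from (5.18), starting from N_2 = 0."""
    n_stat, prev = {}, 0.0
    for m in sorted(t_diff):
        prev = lam * t_diff[m] + (1.0 - lam) * prev
        n_stat[m] = prev
    return n_stat


def log_ed50_secondary(i, m):
    """Hypothetical log ED50 of S_i at t_m: S_3 deteriorates (log ED50 + 0.05) from t_5 on."""
    return 0.1 * i + (0.05 if (i == 3 and m >= 5) else 0.0)


M = 6
log_gamma0 = {m: 0.02 * (m - 1) for m in range(1, M + 1)}  # drifting primary reference
eta_hat = {(i, m): log_gamma0[m] - log_ed50_secondary(i, m)
           for i in range(1, M + 1) for m in range(i, min(i + 2, M) + 1)}
T = secondary_differences(eta_hat, M)
N = ewma_secondary(T, lam=0.6)
print(T)
print(N)
```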

5.3 A simulation study

In the simulation study, raw data Yaj for an in vitro bioassay on 96-well plates were simulated using the four-parameter logistic curves in (5.1). From the raw data, the bioactivity of the secondary reference is calculated with respect to the primary reference. A plate layout for the 96-well plates that might be useful in practice is given in Table 5.1. The plate has 12 columns and 8 rows, but only rows B to E, which contain Rp, Sm−2, Sm−1, and Sm, respectively, are of interest, since only these data contribute to the monitoring of the standards. The remaining rows can be used for other purposes. Furthermore, we assume that there are no row or column effects; otherwise, different set-ups would be needed. The columns contain the different concentrations d1, . . . , d12 used in the experiment to generate sigmoid curves.

At each time point in Figure 5.1, six bioassay runs with four well-plates each are simulated. To simulate realistic curves, appropriate parameter settings for the 4PL curves in (5.1) are needed. Using routine data (six bioassay runs with four plates) from an in vitro bioassay for a hormonal medicinal product, we investigated the variability of the 4PL parameters between plates and between bioassay runs. We fitted a mixed effects model including the terms “parameter”, “runs”, “plate(run)”, “parameter*runs”, and “residual”.

Page 112: Statistical methods for the analysis of bioassay dataThis work was financially supported by the pharmaceutical company MSD based in Oss, The Netherlands. c ThembileVirginiaMzolo,2016


Table 5.1: An overview of the rows and columns of a 96-well plate used within an analytical run at time tm.

| Sample | Row | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| QC | A | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 |
| Rp | B | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 |
| Sm−2 | C | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 |
| Sm−1 | D | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 |
| Sm | E | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 |
| batch 1 | F | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 |
| batch 2 | G | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 |
| batch 3 | H | d1 | d2 | d3 | d4 | d5 | d6 | d7 | d8 | d9 | d10 | d11 | d12 |

QC = quality control, Rp = primary reference, Sm−2 = oldest secondary reference, Sm−1 = youngest secondary reference, and Sm = new secondary reference.

The factor "parameter" refers to the parameters α, β, γ, and δ of the 4PL curve and was treated as fixed. Only the terms "runs" and "parameter" remained significant. At time point tm, the parameters in (5.1) were therefore simulated according to the following model:

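$$
\begin{aligned}
\alpha_{ijk} &= \alpha_0 + \lambda_{mj} + \varepsilon^{(\alpha)}_{km(j)}\\
\beta_{ijk} &= \beta_0 + \lambda_{mj} + \varepsilon^{(\beta)}_{km(j)}\\
\delta_{ijk} &= \delta_0 + \lambda_{mj} + \varepsilon^{(\delta)}_{km(j)}\\
\gamma_{ijk} &= \alpha_{im} + \lambda_{mj} + \varepsilon^{(\gamma)}_{km(j)},
\end{aligned}
\tag{5.19}
$$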

with i = 0, m−2, m−1, m (i = 0 denoting the primary reference Rp), j = 1, 2, . . . , J (runs), and k = 1, 2, . . . , K (plates). The run effects λmj and the plate-level errors ε(s)km(j) for parameters s ∈ {α, β, γ, δ} are normally distributed, with λmj ∼ N(0, τR²) and ε(s)km(j) ∼ N(0, τE²). Here α0, β0, and δ0 are fixed values, and αim = γ0 + νm with νm ∼ N(0, τ0²). Finally, α0m is defined by (5.3) or (5.5). All random terms are independently distributed.

The parameters therefore change per plate within a bioassay run and between bioassay runs. The parameters α, β, and δ are equal for all standards on one plate. This guarantees that the dose–response relations for the primary standard Rp and the secondary standards Sm−2, Sm−1, and Sm are similar on that plate. The parameter γ differs between the standards, because each secondary reference has its own potency or bioactivity. The plate and run variability of γ is the same as that of the other parameters.

The case study of the routine in vitro bioassay provided the following values for the models in (5.19): α0 = 3.7256, β0 = 2.0250, δ0 = 5.0072, γ0 = 4.5725, τR² = 0.004907, τE² = 0.00239, and τ0² = 0.00228.
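A minimal Python sketch of (5.19) for one time point is given below. It uses the reported fixed effects and variance components and reads the equations literally: one run effect λmj is shared by all four parameters in run j, and one plate error per parameter is shared by all standards on a plate. Two choices are assumptions, not the author's code. First, each secondary reference receives its own draw of ν, which is our reading of "each secondary reference has its own potency". Second, the primary reference is placed at γ0 as if no change had occurred yet, because (5.3) and (5.5) are not reproduced here. The function names and standard labels are hypothetical.

```python
import math
import random

# Fixed effects and variance components reported for the routine in vitro bioassay (5.19).
ALPHA0, BETA0, DELTA0, GAMMA0 = 3.7256, 2.0250, 5.0072, 4.5725
TAU2_R, TAU2_E, TAU2_0 = 0.004907, 0.00239, 0.00228


def draw_secondary_location(rng):
    """alpha_im = gamma0 + nu with nu ~ N(0, tau0^2); one draw per secondary standard (assumption)."""
    return GAMMA0 + rng.gauss(0.0, math.sqrt(TAU2_0))


def simulate_plate_parameters(locations, J=6, K=4, rng=None):
    """Simulate the 4PL parameters of (5.19) for J runs x K plates at one time point.

    locations: dict mapping a standard label to its gamma location alpha_im.
    Returns one dict per plate with shared alpha, beta, delta and one gamma per standard.
    """
    if rng is None:
        rng = random.Random()
    sd_run, sd_plate = math.sqrt(TAU2_R), math.sqrt(TAU2_E)
    plates = []
    for j in range(J):
        run_effect = rng.gauss(0.0, sd_run)  # lambda_mj, common to all four parameters in run j
        for k in range(K):
            # eps^(s)_km(j): one error per parameter, shared by every standard on plate k
            eps = {s: rng.gauss(0.0, sd_plate) for s in ("alpha", "beta", "delta", "gamma")}
            plates.append({
                "run": j,
                "plate": k,
                "alpha": ALPHA0 + run_effect + eps["alpha"],
                "beta": BETA0 + run_effect + eps["beta"],
                "delta": DELTA0 + run_effect + eps["delta"],
                "gamma": {i: loc + run_effect + eps["gamma"] for i, loc in locations.items()},
            })
    return plates


rng = random.Random(1)
# Hypothetical locations: Rp at gamma0 (assumed unchanged), three secondaries drawn at random.
locations = {"Rp": GAMMA0, **{s: draw_secondary_location(rng) for s in ("Sm-2", "Sm-1", "Sm")}}
plates = simulate_plate_parameters(locations, J=6, K=4, rng=rng)
print(len(plates), round(plates[0]["gamma"]["Sm"] - plates[0]["gamma"]["Rp"], 4))
```

In (5.19), λmj and ε(γ)km(j) carry no index i. Under this literal reading they cancel in within-plate differences of γ, so the difference in γ between a secondary standard and Rp on any plate equals αim − α0m before residual error is added to the raw data.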


Once the parameters of the S-curves in (5.1) have been simulated for each plate within a bioassay run at each time point tm, the raw data on a plate can be simulated using (5.1) with a normally distributed error term εaj, that is, εaj ∼ N(0, τ²) with τ² = 0.00025. The concentrations in the S-curves were equidistant from 0 to 8.25 in increments of 0.75, which gives 12 concentrations. These concentrations were the same on all plates in the simulation. From the raw data on the S-curves, the bioactivities of the secondary reference standards relative to the primary reference were calculated separately per plate and per bioassay run. This was done with the NLMIXED procedure of SAS software version 9.4 (SAS Institute, NC), which uses a quasi-Newton technique to maximise the likelihood function. The log relative bioactivities were then analysed with the mixed model in (5.2) to determine the EWMA chart for detecting a change in the parameter γ0 of the primary reference. The mixed model in (5.2) is fitted at each time point tm using the MIXED procedure of SAS software version 9.4.
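The raw-data step can be sketched as follows. The sketch uses 12 concentrations from 0 to 8.25 in steps of 0.75, residual variance τ² = 0.00025, and one row of responses per standard (rows B to E of Table 5.1). Model (5.1) is not reproduced in this section, so the 4PL parameterisation below is an assumption: the curve rises from α to δ with slope β and inflection point γ. The γ values for rows C to E in the usage lines are hypothetical. The sketch only generates data; it does not reproduce the NLMIXED fitting step.

```python
import math
import random

# Twelve equidistant concentrations 0, 0.75, ..., 8.25, identical on every plate.
CONCENTRATIONS = [0.75 * i for i in range(12)]
# Residual standard deviation, from tau^2 = 0.00025.
TAU = math.sqrt(0.00025)


def four_pl(x, alpha, beta, gamma, delta):
    """Assumed 4PL form (not taken from (5.1)): rises from alpha to delta,
    with slope beta and inflection point gamma."""
    return alpha + (delta - alpha) / (1.0 + math.exp(-beta * (x - gamma)))


def simulate_plate(alpha, beta, delta, gammas, rng):
    """Simulate one plate: a row of 12 noisy responses per standard.

    gammas maps a row label (e.g. "B" for Rp) to that standard's gamma on this plate;
    alpha, beta and delta are shared by all standards on the plate.
    """
    return {
        row: [four_pl(x, alpha, beta, g, delta) + rng.gauss(0.0, TAU) for x in CONCENTRATIONS]
        for row, g in gammas.items()
    }


rng = random.Random(2016)
plate = simulate_plate(3.7256, 2.0250, 5.0072,
                       {"B": 4.5725, "C": 4.60, "D": 4.55, "E": 4.58}, rng)
for row, ys in plate.items():
    print(row, [round(y, 3) for y in ys[:4]], "...")
```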

5.3.1 Performance measures

At each time point tm, the EWMA statistic Em is estimated, and a signal is given if Em is less than −L · τEm. A signal means that either a change in the primary reference Rp has been detected at this time point or the chart has produced a false alarm. The stability of the bioactivity of the primary reference is assessed by calculating the probability of signal (POS). This probability has been used before as a performance measure for in-control and out-of-control processes (Mzolo et al., 2015; Zhu and Lin, 2010). For an out-of-control process, the POS is the power of detecting a change. For a stable process, it is the probability of a false alarm, that is, the probability of a signal when nothing has changed. The power of detecting a true change should be high, while the probability of a false signal should be as low as possible. The probability of detecting a change at tm is

$$
P(t_m \mid \Delta_{t_s}) = P\big(E_m < -L\,\tau_{E_m} \mid \Delta_{t_s}\big), \tag{5.20}
$$

where ∆ts represents the change profile implemented at time ts. The profiles are the shift and the linear trend discussed earlier. For example, suppose ∆ is given by (5.3) and the shift occurs after time point 2. The probability of detecting this shift at time point 3 is then P(t3|∆t2) = P(E3 < −L · τE3 | ∆t2). If the shift occurs just after time point 3, the probability of detecting it at time point 4 is P(t4|∆t3) = P(E4 < −L · τE4 | ∆t3), and so forth. In each simulation run, a signal at tm is recorded as 1 if Em < −L · τEm and as 0 otherwise. The POS at each time point is then the proportion of the 1 000 simulations that give a signal at that time point.
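The POS estimator described above is the share of simulations in which the signal indicator equals 1 at each time point. Here is a minimal Python sketch of that estimator. The function name and the toy input of four simulations over two time points are hypothetical.

```python
def probability_of_signal(E, tau, L):
    """POS per time point, as in (5.20): share of simulations with E_m < -L * tau_{E_m}.

    E: list of simulations, each a list of EWMA values E_m (one per monitored time point).
    tau: list of standard deviations tau_{E_m}, one per monitored time point.
    """
    n_sim = len(E)
    return [sum(1 for sim in E if sim[t] < -L * tau[t]) / n_sim for t in range(len(tau))]


# Toy input: 4 simulations, 2 time points (hypothetical numbers).
E = [[-2.0, 0.0], [0.0, -3.0], [-0.5, -0.5], [1.0, 0.0]]
print(probability_of_signal(E, tau=[1.0, 1.0], L=1.5))  # [0.25, 0.25]
```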

The second performance measure is the accumulative probability of signal (APOS). It indicates the power of detecting the decline up to the current time point. This accumulated probability is obtained after the shift has occurred and is calculated at each time point. It takes into account detection at the current time point and at all previous time points. If a decline occurs after the second time point, the APOS at each time point is given by

$$
\operatorname{APOS}(t_m \mid \Delta_{t_s}) = 1 - P\big(E_2 > -L\,\tau_{E_2},\ E_3 > -L\,\tau_{E_3},\ \ldots,\ E_m > -L\,\tau_{E_m} \mid \Delta_{t_s}\big). \tag{5.21}
$$

In this simulation, the shift profile in (5.3) is implemented in turn after each time point, and the POS and APOS are calculated for each setting. For instance, if a shift is implemented after t2, the POS and APOS are both calculated for each time point. The same is done when the shift occurs after time point 3, after time point 4, and so forth. For the trend profile in (5.5), the change is implemented only after the second time point, and the POS and APOS are calculated from time point 2 onwards. The POS is expected to be low at time point 2, because the primary reference is still stable at that time. The probability obtained there therefore corresponds to false signals.
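Equation (5.21) can be estimated from the same simulated EWMA paths. In each simulation, record whether a signal has occurred at any time point up to tm, then average over simulations. The Python sketch below does this. The function name and toy data are hypothetical.

```python
def accumulative_probability_of_signal(E, tau, L):
    """APOS per time point, as in (5.21): share of simulations that have signalled
    (E_t < -L * tau_{E_t}) at least once up to and including that time point."""
    signalled = [False] * len(E)
    apos = []
    for t in range(len(tau)):
        for s, sim in enumerate(E):
            if sim[t] < -L * tau[t]:
                signalled[s] = True
        apos.append(sum(signalled) / len(E))
    return apos


# Toy input: 4 simulations, 2 time points (hypothetical numbers).
E = [[-2.0, 0.0], [0.0, -3.0], [-0.5, -0.5], [1.0, 0.0]]
print(accumulative_probability_of_signal(E, tau=[1.0, 1.0], L=1.5))  # [0.25, 0.5]
```

For the same toy data, the POS at the second time point is 0.25, whereas the APOS is 0.5. The APOS cannot decrease over time because a simulation that has signalled once remains counted.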

Most engineering and production applications use the average run length (ARL) to investigate the performance of a control chart. The ARL is the average number of time points needed until an alarm is given. A large ARL is desired when the process is stationary, and a small ARL when the process is changing. In some cases there is a one-to-one relation between the POS and the ARL. For example, under a stationary process, the ARL of a Shewhart chart, which is the EWMA chart with λ = 1, is given by

$$
\operatorname{ARL}(t_m \mid \Delta_{t_s} = 0) = \frac{1}{P(t_m \mid \Delta_{t_s} = 0)}, \tag{5.22}
$$

where ARL(tm|∆ts) is the ARL at time point tm for change profile ∆ts (Göb et al., 2001). The POS, ARL, and APOS all depend on the weighting parameter λ and on the width L of the control limits. Both are chosen before stability monitoring starts.
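The sketch below makes (5.22) concrete under an idealised assumption that does not hold for the correlated statistics in this chapter: the charted statistics are independent and N(0, τ²). A one-sided lower Shewhart chart then signals with the same probability Φ(−L) at every time point. The run length is geometric, and the ARL is 1/Φ(−L). The helper names are hypothetical; the normal distribution function is computed from the standard `math.erfc`.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def shewhart_in_control_arl(L):
    """In-control ARL of a one-sided lower Shewhart chart (EWMA with lambda = 1),
    assuming independent N(0, tau^2) statistics, so that P(signal) = Phi(-L)
    at every time point and ARL = 1 / Phi(-L) as in (5.22)."""
    return 1.0 / normal_cdf(-L)


for L in (1.5, 2.0, 2.5, 3.0):
    print(L, round(normal_cdf(-L), 4), round(shewhart_in_control_arl(L), 1))
```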

We investigate the choice of optimal values for λ (the weighting parameter) and L (the width of the control limits) in the EWMA chart for the primary reference. The values assessed are λ = 0.1 and 0.2, and L = 1.5, 2.0, and 2.5. Each performance measure is based on 1 000 simulations and 10 time points.

5.3.2 Probability of false alarms

The results obtained under a stationary process are shown in Table 5.2. False alarms are slightly more frequent at the earlier time points and tend to decrease as monitoring progresses. The false alarm rate depends on the width of the control limits, with little difference between the values of the weighting parameter λ. For instance, the probability of a false signal ranges from 0.034 to 0.068 for narrow control limits (L = 1.5), from 0.006 to 0.018 for medium control limits (L = 2.0), and from 0 to 0.005 for wider control limits (L = 2.5). These ranges are almost independent of λ.

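Table 5.2: Probability of detection (POS) and accumulative probability (APOS) under a stationary process based on 6 bioassay runs

| | POS | | | | | | APOS | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L | 1.5 | 1.5 | 2.0 | 2.0 | 2.5 | 2.5 | 1.5 | 1.5 | 2.0 | 2.0 | 2.5 | 2.5 |
| Time \ λ | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 |
| 2 | 0.068 | 0.068 | 0.018 | 0.018 | 0.005 | 0.005 | 0.068 | 0.068 | 0.018 | 0.018 | 0.005 | 0.005 |
| 3 | 0.043 | 0.044 | 0.012 | 0.014 | 0.002 | 0.002 | 0.097 | 0.099 | 0.026 | 0.029 | 0.007 | 0.007 |
| 4 | 0.051 | 0.054 | 0.010 | 0.011 | 0.001 | 0.000 | 0.119 | 0.127 | 0.031 | 0.036 | 0.008 | 0.007 |
| 5 | 0.036 | 0.041 | 0.007 | 0.007 | 0.002 | 0.001 | 0.132 | 0.151 | 0.034 | 0.038 | 0.009 | 0.008 |
| 6 | 0.042 | 0.045 | 0.012 | 0.011 | 0.000 | 0.001 | 0.146 | 0.170 | 0.042 | 0.047 | 0.009 | 0.009 |
| 7 | 0.036 | 0.038 | 0.008 | 0.006 | 0.001 | 0.001 | 0.157 | 0.185 | 0.048 | 0.052 | 0.010 | 0.010 |
| 8 | 0.035 | 0.034 | 0.007 | 0.009 | 0.001 | 0.001 | 0.166 | 0.198 | 0.053 | 0.058 | 0.011 | 0.011 |
| 9 | 0.049 | 0.049 | 0.007 | 0.009 | 0.002 | 0.002 | 0.182 | 0.217 | 0.056 | 0.064 | 0.013 | 0.013 |
| 10 | 0.044 | 0.044 | 0.007 | 0.011 | 0.000 | 0.001 | 0.191 | 0.234 | 0.059 | 0.072 | 0.013 | 0.014 |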

5.3.3 Probability of detecting a shift

The bioactivity is assumed to be stable between the first and second time points, after which a jump of 10% (p = 0.10 in (5.3)) may occur. Such a shift is implemented at different time points. The probability and accumulative probability of detecting a shift at each time point are shown in Table 5.3 for narrow control limits (L = 1.5), Table 5.4 for medium control limits (L = 2.0), and Table 5.5 for wider control limits (L = 2.5). Each column heading gives the first time point at which the shifted bioactivity is present. For example, the column labelled 3 shows the probabilities when the shift occurred just after the second evaluation in Figure 5.1. The bioactivity then remained at the new level throughout the monitoring window.

A shift that occurs at an earlier time point is considerably more likely to be signalled than a shift occurring at a later time point. For example, if a jump occurred after the second time point, the probability of detecting it at time point 3 is 0.599 (λ = 0.1) and 0.638 (λ = 0.2). At subsequent time points, the probability of detecting this change decreases, to 0.128 (λ = 0.1) and 0.094 (λ = 0.2) at the last time point. If a shift occurred after time point 3 (the column labelled 4), the probability of detecting it at time point 4 is 0.477 (λ = 0.1) and 0.535 (λ = 0.2). The probabilities observed at time point 2 are false signals, because the primary reference is assumed to be stable between time points 1 and 2. The probability of a signal decreases faster over time when λ = 0.2 than when λ = 0.1. For example, a change that occurred after the second time point is detected at time point 4 with probability 0.412 when λ = 0.1 and 0.398 when λ = 0.2.

The accumulated probabilities indicate a higher chance of detecting a decline, especially for an early shift. For instance, when a shift occurred after the second time point, the accumulated probability of detecting it increases from 0.601 (third time point) to 0.735 (last time point) when λ = 0.1, and from 0.640 to 0.778 when λ = 0.2. The accumulative probability of signalling increases faster with λ = 0.2 than with λ = 0.1. Increasing the number of bioassay runs increases the probability of detecting a change in the bioactivity (Table 5.9 in the Appendix).


Table 5.3: Probability of detecting a 10% shift with narrow control limits (L = 1.5) based on 6 bioassay runs

| | λ = 0.1 | | | | | | | | λ = 0.2 | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Time \ Shift | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| POS | | | | | | | | | | | | | | | | |
| 2 | 0.052 | 0.053 | 0.054 | 0.056 | 0.058 | 0.062 | 0.063 | 0.067 | 0.052 | 0.053 | 0.054 | 0.056 | 0.058 | 0.062 | 0.063 | 0.067 |
| 3 | 0.599 | 0.032 | 0.034 | 0.036 | 0.038 | 0.040 | 0.040 | 0.040 | 0.638 | 0.034 | 0.036 | 0.037 | 0.039 | 0.040 | 0.041 | 0.043 |
| 4 | 0.412 | 0.477 | 0.041 | 0.042 | 0.045 | 0.046 | 0.047 | 0.050 | 0.398 | 0.535 | 0.044 | 0.047 | 0.047 | 0.047 | 0.049 | 0.051 |
| 5 | 0.332 | 0.388 | 0.446 | 0.024 | 0.025 | 0.028 | 0.031 | 0.034 | 0.283 | 0.393 | 0.539 | 0.030 | 0.032 | 0.033 | 0.037 | 0.040 |
| 6 | 0.256 | 0.296 | 0.336 | 0.398 | 0.033 | 0.034 | 0.038 | 0.041 | 0.197 | 0.261 | 0.361 | 0.504 | 0.036 | 0.039 | 0.042 | 0.043 |
| 7 | 0.196 | 0.237 | 0.277 | 0.334 | 0.384 | 0.031 | 0.033 | 0.035 | 0.140 | 0.190 | 0.260 | 0.365 | 0.492 | 0.034 | 0.035 | 0.035 |
| 8 | 0.152 | 0.184 | 0.215 | 0.257 | 0.309 | 0.371 | 0.034 | 0.035 | 0.114 | 0.141 | 0.187 | 0.250 | 0.359 | 0.502 | 0.030 | 0.033 |
| 9 | 0.151 | 0.173 | 0.198 | 0.229 | 0.255 | 0.302 | 0.352 | 0.046 | 0.111 | 0.127 | 0.156 | 0.203 | 0.263 | 0.340 | 0.493 | 0.049 |
| 10 | 0.128 | 0.149 | 0.166 | 0.195 | 0.225 | 0.262 | 0.305 | 0.361 | 0.094 | 0.107 | 0.126 | 0.159 | 0.202 | 0.269 | 0.375 | 0.512 |
| APOS | | | | | | | | | | | | | | | | |
| 2 | 0.052 | 0.053 | 0.054 | 0.056 | 0.058 | 0.062 | 0.063 | 0.067 | 0.052 | 0.053 | 0.054 | 0.056 | 0.058 | 0.062 | 0.063 | 0.067 |
| 3 | 0.601 | 0.074 | 0.077 | 0.080 | 0.084 | 0.089 | 0.090 | 0.093 | 0.640 | 0.078 | 0.081 | 0.083 | 0.086 | 0.091 | 0.092 | 0.097 |
| 4 | 0.671 | 0.485 | 0.097 | 0.100 | 0.105 | 0.110 | 0.112 | 0.116 | 0.704 | 0.544 | 0.105 | 0.109 | 0.111 | 0.116 | 0.117 | 0.122 |
| 5 | 0.691 | 0.561 | 0.458 | 0.108 | 0.113 | 0.120 | 0.124 | 0.129 | 0.731 | 0.621 | 0.550 | 0.124 | 0.128 | 0.134 | 0.138 | 0.145 |
| 6 | 0.703 | 0.584 | 0.505 | 0.421 | 0.127 | 0.134 | 0.136 | 0.143 | 0.740 | 0.642 | 0.599 | 0.530 | 0.143 | 0.151 | 0.156 | 0.163 |
| 7 | 0.711 | 0.615 | 0.542 | 0.489 | 0.418 | 0.145 | 0.147 | 0.154 | 0.751 | 0.668 | 0.638 | 0.608 | 0.529 | 0.165 | 0.169 | 0.176 |
| 8 | 0.718 | 0.628 | 0.563 | 0.522 | 0.484 | 0.412 | 0.158 | 0.163 | 0.760 | 0.688 | 0.658 | 0.633 | 0.607 | 0.544 | 0.179 | 0.188 |
| 9 | 0.725 | 0.639 | 0.582 | 0.552 | 0.521 | 0.467 | 0.400 | 0.179 | 0.769 | 0.700 | 0.672 | 0.654 | 0.634 | 0.593 | 0.545 | 0.208 |
| 10 | 0.735 | 0.653 | 0.598 | 0.573 | 0.544 | 0.501 | 0.468 | 0.425 | 0.778 | 0.713 | 0.684 | 0.674 | 0.651 | 0.622 | 0.611 | 0.574 |


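Table 5.4: Probability of detecting a 10% shift with medium control limits (L = 2.0) based on 6 bioassay runs

| | λ = 0.1 | | | | | | | | λ = 0.2 | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Time \ Shift | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| POS | | | | | | | | | | | | | | | | |
| 2 | 0.007 | 0.009 | 0.009 | 0.010 | 0.013 | 0.015 | 0.017 | 0.018 | 0.007 | 0.009 | 0.009 | 0.010 | 0.013 | 0.015 | 0.017 | 0.018 |
| 3 | 0.383 | 0.008 | 0.009 | 0.010 | 0.010 | 0.011 | 0.011 | 0.011 | 0.419 | 0.007 | 0.008 | 0.008 | 0.009 | 0.009 | 0.009 | 0.011 |
| 4 | 0.218 | 0.258 | 0.005 | 0.005 | 0.007 | 0.008 | 0.009 | 0.009 | 0.201 | 0.323 | 0.005 | 0.006 | 0.009 | 0.009 | 0.010 | 0.010 |
| 5 | 0.146 | 0.183 | 0.235 | 0.005 | 0.005 | 0.005 | 0.005 | 0.006 | 0.121 | 0.196 | 0.312 | 0.004 | 0.006 | 0.006 | 0.006 | 0.007 |
| 6 | 0.105 | 0.130 | 0.162 | 0.204 | 0.008 | 0.008 | 0.009 | 0.011 | 0.080 | 0.114 | 0.178 | 0.283 | 0.008 | 0.009 | 0.009 | 0.011 |
| 7 | 0.069 | 0.081 | 0.107 | 0.138 | 0.174 | 0.007 | 0.007 | 0.008 | 0.046 | 0.064 | 0.096 | 0.172 | 0.284 | 0.006 | 0.006 | 0.006 |
| 8 | 0.055 | 0.067 | 0.083 | 0.105 | 0.130 | 0.167 | 0.006 | 0.006 | 0.034 | 0.049 | 0.071 | 0.113 | 0.168 | 0.270 | 0.008 | 0.008 |
| 9 | 0.054 | 0.067 | 0.083 | 0.096 | 0.118 | 0.147 | 0.181 | 0.005 | 0.032 | 0.050 | 0.066 | 0.092 | 0.125 | 0.181 | 0.274 | 0.008 |
| 10 | 0.048 | 0.052 | 0.062 | 0.077 | 0.100 | 0.113 | 0.140 | 0.174 | 0.023 | 0.030 | 0.046 | 0.067 | 0.092 | 0.121 | 0.191 | 0.292 |
| APOS | | | | | | | | | | | | | | | | |
| 2 | 0.007 | 0.009 | 0.009 | 0.010 | 0.013 | 0.015 | 0.017 | 0.018 | 0.007 | 0.009 | 0.009 | 0.010 | 0.013 | 0.015 | 0.017 | 0.018 |
| 3 | 0.384 | 0.017 | 0.017 | 0.018 | 0.020 | 0.023 | 0.025 | 0.026 | 0.420 | 0.016 | 0.017 | 0.018 | 0.021 | 0.023 | 0.025 | 0.028 |
| 4 | 0.436 | 0.261 | 0.019 | 0.020 | 0.023 | 0.026 | 0.029 | 0.030 | 0.465 | 0.326 | 0.020 | 0.022 | 0.028 | 0.030 | 0.032 | 0.035 |
| 5 | 0.458 | 0.323 | 0.239 | 0.022 | 0.025 | 0.028 | 0.031 | 0.033 | 0.486 | 0.396 | 0.317 | 0.025 | 0.030 | 0.032 | 0.034 | 0.037 |
| 6 | 0.472 | 0.348 | 0.282 | 0.212 | 0.031 | 0.034 | 0.037 | 0.040 | 0.499 | 0.418 | 0.359 | 0.291 | 0.037 | 0.039 | 0.041 | 0.046 |
| 7 | 0.478 | 0.356 | 0.301 | 0.253 | 0.185 | 0.039 | 0.042 | 0.046 | 0.504 | 0.427 | 0.378 | 0.351 | 0.298 | 0.044 | 0.046 | 0.051 |
| 8 | 0.482 | 0.363 | 0.313 | 0.269 | 0.223 | 0.183 | 0.046 | 0.050 | 0.509 | 0.433 | 0.390 | 0.378 | 0.347 | 0.291 | 0.052 | 0.056 |
| 9 | 0.490 | 0.369 | 0.325 | 0.289 | 0.254 | 0.235 | 0.199 | 0.052 | 0.515 | 0.442 | 0.401 | 0.401 | 0.381 | 0.343 | 0.291 | 0.061 |
| 10 | 0.492 | 0.372 | 0.333 | 0.300 | 0.276 | 0.261 | 0.250 | 0.196 | 0.518 | 0.445 | 0.415 | 0.418 | 0.404 | 0.374 | 0.354 | 0.317 |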


Table 5.5: Probability of detecting a 10% shift with wider control limits (L = 2.5) based on 6 bioassay runs

| | λ = 0.1 | | | | | | | | λ = 0.2 | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Time \ Shift | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| POS | | | | | | | | | | | | | | | | |
| 2 | 0.002 | 0.003 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.002 | 0.003 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 |
| 3 | 0.184 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.207 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| 4 | 0.087 | 0.114 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.086 | 0.153 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| 5 | 0.040 | 0.059 | 0.079 | 0.001 | 0.001 | 0.001 | 0.002 | 0.002 | 0.037 | 0.072 | 0.132 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
| 6 | 0.035 | 0.046 | 0.056 | 0.074 | 0.000 | 0.000 | 0.000 | 0.000 | 0.022 | 0.040 | 0.070 | 0.126 | 0.001 | 0.001 | 0.001 | 0.001 |
| 7 | 0.016 | 0.026 | 0.035 | 0.044 | 0.063 | 0.000 | 0.000 | 0.001 | 0.008 | 0.019 | 0.034 | 0.056 | 0.106 | 0.000 | 0.000 | 0.001 |
| 8 | 0.015 | 0.022 | 0.028 | 0.034 | 0.042 | 0.058 | 0.001 | 0.001 | 0.008 | 0.016 | 0.020 | 0.033 | 0.065 | 0.116 | 0.001 | 0.001 |
| 9 | 0.013 | 0.023 | 0.026 | 0.032 | 0.043 | 0.053 | 0.068 | 0.002 | 0.005 | 0.010 | 0.014 | 0.029 | 0.047 | 0.086 | 0.134 | 0.002 |
| 10 | 0.010 | 0.011 | 0.014 | 0.019 | 0.031 | 0.043 | 0.049 | 0.063 | 0.005 | 0.007 | 0.010 | 0.017 | 0.024 | 0.047 | 0.086 | 0.132 |
| APOS | | | | | | | | | | | | | | | | |
| 2 | 0.002 | 0.003 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.002 | 0.003 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 |
| 3 | 0.184 | 0.005 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.207 | 0.005 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 |
| 4 | 0.209 | 0.116 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.235 | 0.155 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 |
| 5 | 0.221 | 0.147 | 0.081 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 | 0.250 | 0.198 | 0.134 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 |
| 6 | 0.226 | 0.159 | 0.100 | 0.075 | 0.007 | 0.007 | 0.007 | 0.007 | 0.252 | 0.207 | 0.164 | 0.127 | 0.008 | 0.008 | 0.008 | 0.008 |
| 7 | 0.228 | 0.167 | 0.114 | 0.099 | 0.068 | 0.007 | 0.007 | 0.008 | 0.253 | 0.214 | 0.177 | 0.154 | 0.112 | 0.008 | 0.008 | 0.009 |
| 8 | 0.233 | 0.172 | 0.122 | 0.113 | 0.085 | 0.064 | 0.008 | 0.009 | 0.256 | 0.221 | 0.184 | 0.164 | 0.138 | 0.123 | 0.009 | 0.010 |
| 9 | 0.235 | 0.179 | 0.130 | 0.121 | 0.098 | 0.084 | 0.073 | 0.011 | 0.257 | 0.224 | 0.189 | 0.174 | 0.154 | 0.164 | 0.140 | 0.012 |
| 10 | 0.239 | 0.182 | 0.134 | 0.126 | 0.105 | 0.097 | 0.090 | 0.069 | 0.260 | 0.228 | 0.192 | 0.179 | 0.161 | 0.181 | 0.178 | 0.138 |


5.3.4 Probability of detecting a linear trend

The linear trend in bioactivity followed (5.5) with β = 0.01906, that is, a 10% change in the bioactivity every 5 years. This gradual change is implemented immediately after the second time point. The probability of signalling a change at time point two (t2) should therefore be low, since any signal there is false. Table 5.6 presents the individual probability (POS) and the accumulative probability (APOS) of signalling a change at each time point. The probability of signalling a change is low at the early time points but gradually increases over time. This holds for both λ = 0.1 and λ = 0.2 and for all control limits. Irrespective of the limits, the probability of signalling a change early is higher for λ = 0.2, while the probability of signalling a change later on is higher for λ = 0.1. However, the cumulative probability over time is better for λ = 0.2 than for λ = 0.1. Detection improves when 12 bioassay runs are used (Table 5.8 in the Appendix).

Table 5.6: Probability of detecting a linear trend with a 10% decline every 5 years based on 6 bioassay runs

| | Individual probability | | | | | | Accumulative probability | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L | 1.5 | 1.5 | 2.0 | 2.0 | 2.5 | 2.5 | 1.5 | 1.5 | 2.0 | 2.0 | 2.5 | 2.5 |
| Time \ λ | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 |
| 2 | 0.047 | 0.047 | 0.006 | 0.006 | 0.002 | 0.002 | 0.047 | 0.047 | 0.006 | 0.006 | 0.002 | 0.002 |
| 3 | 0.270 | 0.302 | 0.105 | 0.117 | 0.028 | 0.038 | 0.279 | 0.312 | 0.108 | 0.120 | 0.029 | 0.039 |
| 4 | 0.277 | 0.292 | 0.119 | 0.127 | 0.045 | 0.050 | 0.387 | 0.437 | 0.175 | 0.193 | 0.060 | 0.074 |
| 5 | 0.346 | 0.339 | 0.148 | 0.139 | 0.040 | 0.044 | 0.481 | 0.534 | 0.245 | 0.265 | 0.081 | 0.100 |
| 6 | 0.366 | 0.341 | 0.166 | 0.162 | 0.061 | 0.064 | 0.544 | 0.599 | 0.300 | 0.328 | 0.111 | 0.134 |
| 7 | 0.407 | 0.371 | 0.189 | 0.177 | 0.065 | 0.056 | 0.621 | 0.677 | 0.354 | 0.392 | 0.143 | 0.163 |
| 8 | 0.435 | 0.394 | 0.217 | 0.195 | 0.080 | 0.070 | 0.680 | 0.735 | 0.411 | 0.454 | 0.174 | 0.196 |
| 9 | 0.459 | 0.392 | 0.253 | 0.215 | 0.115 | 0.101 | 0.721 | 0.772 | 0.470 | 0.506 | 0.220 | 0.241 |
| 10 | 0.506 | 0.447 | 0.287 | 0.239 | 0.130 | 0.109 | 0.766 | 0.818 | 0.521 | 0.559 | 0.259 | 0.285 |
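The stated slope β = 0.01906 can be related to "a 10% change every 5 years" with a short calculation. Form (5.5) is not reproduced in this section, so the sketch assumes the trend is linear on the natural-log scale. Under that assumption, the stated value equals ln(1.1)/5, a 10% relative increase over 5 years. A decline to 90% of the starting value would instead give ln(0.9)/5.

```python
import math

# Assumption: the trend in (5.5) is linear in log bioactivity (natural log),
# so that a 10% relative change over 5 years means beta * 5 = ln(1.1).
beta = math.log(1.1) / 5
decline = math.log(0.9) / 5  # slope for a drop to 90% over 5 years, for comparison
print(round(beta, 5))  # 0.01906
print(round(decline, 5))  # -0.02107
```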

5.4 Discussion and conclusion

The main focus of this study is on monitoring the bioactivity of a primary reference. The primary reference is created in the absence of an international reference. Both primary and secondary standards are created and stored at freezing temperatures to maintain their activity. These standards do not have to be of the same form; for example, a primary reference can be a liquid while the secondary reference is a solid. The secondary references are replaced frequently to guarantee stability of the bioactivity over time and constant quality of the medicinal product. These secondary reference standards are assumed to be stable for a period ∆t and are used to monitor the primary reference standard. However, the stability of the secondary references can also be assessed, to check whether the period ∆t can be extended to 2∆t.

The proposed strategy assesses the stability of the primary reference at each time point using the differences of the secondary reference standards between time points. In this study we showed how the data from the well plates can be used to assess the stability of the primary reference. First, the nonlinear curves are fitted, and the results are used to estimate the bioactivities. These log bioactivity estimates are then analysed with a variance component model to determine the average bioactivities while taking into account the variability due to bioassay runs and plates. The differences of the estimated average log bioactivities are used to determine the EWMA statistic and its variance. The main analysis considered three control limits, narrow (L = 1.5), medium (L = 2.0), and wide (L = 2.5), with the weighting parameter values λ = 0.1 and λ = 0.2. These settings were used to assess how well the system detects a shift and a linear trend. The performance can be improved by increasing the number of bioassay runs at each time point.

Several authors have discussed the effects of autocorrelation on process monitoring in statistical process control (Lu and Reynolds Jr., 1999a,b). One such effect is that autocorrelation may increase the false alarm rate, which may in turn affect decisions about the process. To reduce the impact of autocorrelation on the frequency of signals, it has been suggested to chart residuals instead of the original observations (Lu and Reynolds Jr., 1999a). However, a control chart based on residuals is not straightforward to interpret. Autocorrelation is present in the current work, since the differences in bioactivities Zm and Zm+1 are correlated. This correlation does not decay gradually over time; it vanishes beyond lag one, because Zm and Zm+2 are uncorrelated. Given this limited autocorrelation, and to keep the interpretation simple, we used the original EWMA chart. Our setting does, however, demonstrate that time-varying correlations can differ from autoregressive forms.
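The following Python sketch shows how a lag-one-only correlation, with Zm and Zm+1 correlated and Zm and Zm+2 uncorrelated, enters the variance of an EWMA. It computes Var(En) = w′Σw for a banded covariance matrix. The generic start E0 = 0, the unit variance, and the lag-one covariance ω = −0.4 are hypothetical choices for illustration, not the chapter's exact definition of Em or its estimated covariances.

```python
def ewma_weights(n, lam):
    """Weights of X_1, ..., X_n in E_n, where E_t = lam * X_t + (1 - lam) * E_{t-1}, E_0 = 0."""
    return [lam * (1.0 - lam) ** (n - i) for i in range(1, n + 1)]


def ewma_variance(n, lam, sigma2, omega):
    """Var(E_n) = w' Sigma w when Var(X_t) = sigma2, Cov(X_t, X_{t+1}) = omega,
    and Cov(X_t, X_{t+h}) = 0 for h >= 2 (correlation cut off after lag one)."""
    w = ewma_weights(n, lam)
    var = sigma2 * sum(wi * wi for wi in w)
    var += 2.0 * omega * sum(w[i] * w[i + 1] for i in range(n - 1))
    return var


# Hypothetical values: unit variance, with and without a lag-one covariance of -0.4.
for lam in (0.1, 0.2):
    print(lam, round(ewma_variance(9, lam, 1.0, 0.0), 4), round(ewma_variance(9, lam, 1.0, -0.4), 4))
```

With ω = 0, the result reduces to the familiar λ/(2 − λ)·(1 − (1 − λ)^(2n)) factor. A nonzero ω shifts the variance, and hence the control limit −L·τ, up or down according to its sign.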

When the primary reference was stable, false alarm rates were low only for the medium and wide control limits. Two change profiles were considered for the primary reference: a shift and a linear trend. The probability of signal (POS) was calculated at each time point for both profiles. For a 10% shift, the probabilities of detecting the change were high, especially with the narrow control limits. However, for the same control limit there was little difference between the POS obtained with λ = 0.1 and with λ = 0.2. For the linear trend, a 10% decline every five years was assumed, and the probabilities of detecting this change were high, particularly when L = 1.5.


For the linear trend, λ = 0.2 detects a change sooner than λ = 0.1. For both profiles, the cumulative probability of detecting a change is better with λ = 0.2 than with λ = 0.1 over the first 10 years of the primary reference.

In the literature, smaller values of λ are recommended for detecting small changes in a process; in our application, however, λ = 0.2 performed better than λ = 0.1. This chapter focused on a 10% change in the bioactivity, and λ = 0.1 may perform better than λ = 0.2 if the objective were to detect a change of 5% or less. The performance can be improved by increasing the number of bioactivity determinations at each time point. Balancing false alarms against power, λ = 0.2 and L = 2 seem to be the most realistic choices in practice.


Appendix

This appendix presents selected results based on 12 bioassay runs.

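Table 5.7: Probability of detection (POS) and accumulative probability (APOS) under a stationary process based on 12 bioassay runs

| | POS | | | | | | APOS | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L | 1.5 | 1.5 | 2.0 | 2.0 | 2.5 | 2.5 | 1.5 | 1.5 | 2.0 | 2.0 | 2.5 | 2.5 |
| Time \ λ | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 |
| 2 | 0.066 | 0.066 | 0.020 | 0.020 | 0.003 | 0.003 | 0.066 | 0.066 | 0.020 | 0.020 | 0.003 | 0.003 |
| 3 | 0.039 | 0.040 | 0.011 | 0.010 | 0.003 | 0.004 | 0.090 | 0.093 | 0.028 | 0.027 | 0.006 | 0.007 |
| 4 | 0.042 | 0.045 | 0.005 | 0.004 | 0.000 | 0.000 | 0.113 | 0.120 | 0.030 | 0.029 | 0.006 | 0.007 |
| 5 | 0.034 | 0.036 | 0.008 | 0.008 | 0.000 | 0.000 | 0.132 | 0.143 | 0.035 | 0.037 | 0.006 | 0.007 |
| 6 | 0.041 | 0.042 | 0.010 | 0.009 | 0.000 | 0.001 | 0.144 | 0.159 | 0.040 | 0.042 | 0.006 | 0.008 |
| 7 | 0.033 | 0.039 | 0.011 | 0.012 | 0.001 | 0.004 | 0.158 | 0.179 | 0.045 | 0.049 | 0.006 | 0.011 |
| 8 | 0.035 | 0.035 | 0.009 | 0.008 | 0.002 | 0.002 | 0.172 | 0.196 | 0.049 | 0.054 | 0.008 | 0.013 |
| 9 | 0.029 | 0.035 | 0.007 | 0.008 | 0.000 | 0.003 | 0.184 | 0.214 | 0.053 | 0.060 | 0.008 | 0.016 |
| 10 | 0.040 | 0.046 | 0.008 | 0.010 | 0.001 | 0.002 | 0.191 | 0.230 | 0.056 | 0.066 | 0.008 | 0.016 |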

Table 5.8: Probability of detecting a linear trend with a 10% decline every 5 years based on 12 bioassay runs

| | Individual probability | | | | | | Accumulative probability | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| L | 1.5 | 1.5 | 2.0 | 2.0 | 2.5 | 2.5 | 1.5 | 1.5 | 2.0 | 2.0 | 2.5 | 2.5 |
| Time \ λ | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 | 0.1 | 0.2 |
| 2 | 0.044 | 0.044 | 0.008 | 0.008 | 0.001 | 0.001 | 0.044 | 0.044 | 0.008 | 0.008 | 0.001 | 0.001 |
| 3 | 0.422 | 0.453 | 0.220 | 0.246 | 0.082 | 0.101 | 0.428 | 0.461 | 0.224 | 0.251 | 0.083 | 0.102 |
| 4 | 0.482 | 0.492 | 0.261 | 0.270 | 0.106 | 0.109 | 0.581 | 0.613 | 0.345 | 0.376 | 0.151 | 0.174 |
| 5 | 0.552 | 0.535 | 0.305 | 0.293 | 0.129 | 0.124 | 0.692 | 0.719 | 0.440 | 0.470 | 0.211 | 0.240 |
| 6 | 0.592 | 0.551 | 0.355 | 0.327 | 0.168 | 0.151 | 0.769 | 0.797 | 0.520 | 0.548 | 0.275 | 0.298 |
| 7 | 0.624 | 0.568 | 0.388 | 0.337 | 0.186 | 0.169 | 0.810 | 0.834 | 0.574 | 0.602 | 0.331 | 0.351 |
| 8 | 0.672 | 0.586 | 0.432 | 0.376 | 0.255 | 0.204 | 0.852 | 0.861 | 0.643 | 0.668 | 0.406 | 0.419 |
| 9 | 0.712 | 0.622 | 0.481 | 0.395 | 0.277 | 0.202 | 0.890 | 0.897 | 0.700 | 0.725 | 0.460 | 0.466 |
| 10 | 0.745 | 0.650 | 0.539 | 0.448 | 0.317 | 0.239 | 0.913 | 0.926 | 0.762 | 0.778 | 0.525 | 0.526 |


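Table 5.9: Probability of detecting a 10% shift with λ = 0.1 and narrow (L = 1.5) or medium (L = 2.0) control limits based on 12 bioassay runs

| | POS | | | | | | | | APOS | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Time \ Shift | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| L = 1.5 | | | | | | | | | | | | | | | | |
| 2 | 0.051 | 0.051 | 0.054 | 0.054 | 0.056 | 0.060 | 0.061 | 0.062 | 0.051 | 0.051 | 0.054 | 0.054 | 0.056 | 0.060 | 0.061 | 0.062 |
| 3 | 0.851 | 0.027 | 0.028 | 0.029 | 0.031 | 0.031 | 0.033 | 0.036 | 0.851 | 0.070 | 0.072 | 0.073 | 0.077 | 0.080 | 0.083 | 0.084 |
| 4 | 0.653 | 0.758 | 0.028 | 0.031 | 0.034 | 0.035 | 0.037 | 0.039 | 0.882 | 0.761 | 0.087 | 0.090 | 0.095 | 0.099 | 0.103 | 0.106 |
| 5 | 0.519 | 0.604 | 0.693 | 0.029 | 0.029 | 0.031 | 0.033 | 0.034 | 0.888 | 0.810 | 0.698 | 0.108 | 0.112 | 0.118 | 0.123 | 0.126 |
| 6 | 0.381 | 0.456 | 0.544 | 0.642 | 0.032 | 0.032 | 0.033 | 0.040 | 0.891 | 0.824 | 0.740 | 0.648 | 0.124 | 0.128 | 0.133 | 0.139 |
| 7 | 0.303 | 0.359 | 0.419 | 0.499 | 0.599 | 0.027 | 0.028 | 0.031 | 0.893 | 0.836 | 0.761 | 0.699 | 0.612 | 0.140 | 0.144 | 0.152 |
| 8 | 0.259 | 0.295 | 0.343 | 0.404 | 0.480 | 0.574 | 0.032 | 0.035 | 0.897 | 0.841 | 0.773 | 0.725 | 0.673 | 0.604 | 0.157 | 0.167 |
| 9 | 0.206 | 0.240 | 0.281 | 0.324 | 0.389 | 0.462 | 0.568 | 0.027 | 0.899 | 0.844 | 0.781 | 0.738 | 0.696 | 0.653 | 0.599 | 0.177 |
| 10 | 0.162 | 0.200 | 0.227 | 0.269 | 0.334 | 0.399 | 0.469 | 0.549 | 0.901 | 0.848 | 0.789 | 0.750 | 0.716 | 0.688 | 0.655 | 0.595 |
| L = 2.0 | | | | | | | | | | | | | | | | |
| 2 | 0.008 | 0.009 | 0.013 | 0.013 | 0.015 | 0.019 | 0.019 | 0.019 | 0.008 | 0.009 | 0.013 | 0.013 | 0.015 | 0.019 | 0.019 | 0.019 |
| 3 | 0.658 | 0.007 | 0.007 | 0.008 | 0.010 | 0.010 | 0.010 | 0.010 | 0.658 | 0.016 | 0.019 | 0.020 | 0.023 | 0.026 | 0.026 | 0.026 |
| 4 | 0.438 | 0.547 | 0.002 | 0.002 | 0.004 | 0.005 | 0.005 | 0.005 | 0.705 | 0.548 | 0.020 | 0.021 | 0.024 | 0.028 | 0.028 | 0.028 |
| 5 | 0.290 | 0.361 | 0.470 | 0.002 | 0.002 | 0.002 | 0.006 | 0.007 | 0.717 | 0.600 | 0.470 | 0.022 | 0.025 | 0.029 | 0.031 | 0.032 |
| 6 | 0.212 | 0.253 | 0.317 | 0.398 | 0.006 | 0.008 | 0.009 | 0.010 | 0.721 | 0.615 | 0.518 | 0.401 | 0.029 | 0.034 | 0.036 | 0.037 |
| 7 | 0.135 | 0.171 | 0.214 | 0.286 | 0.360 | 0.009 | 0.009 | 0.010 | 0.725 | 0.623 | 0.534 | 0.450 | 0.368 | 0.039 | 0.041 | 0.042 |
| 8 | 0.101 | 0.138 | 0.171 | 0.224 | 0.280 | 0.342 | 0.007 | 0.007 | 0.731 | 0.632 | 0.544 | 0.475 | 0.425 | 0.354 | 0.044 | 0.045 |
| 9 | 0.086 | 0.106 | 0.133 | 0.164 | 0.210 | 0.267 | 0.326 | 0.007 | 0.736 | 0.638 | 0.552 | 0.489 | 0.450 | 0.406 | 0.339 | 0.049 |
| 10 | 0.070 | 0.084 | 0.106 | 0.133 | 0.156 | 0.198 | 0.247 | 0.325 | 0.737 | 0.643 | 0.559 | 0.494 | 0.461 | 0.427 | 0.389 | 0.341 |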


References

Gitlow, H., Oppenheim, A., and Oppenheim, R. (1995), Quality Management: Tools and Methods for Improvement, Illinois: Richard D. Irwin, Inc, 2nd ed.

Göb, R., Del Castillo, E., and Ratz, M. (2001), "Run Length Comparisons of Shewhart Charts and Most Powerful Test Charts for the Detection of Trends and Shifts," Communications in Statistics - Simulation and Computation, 30, 355–376.

ICH Q1A(R2) (2003), "Stability Testing of New Drug Substances and Products," ICH Harmonised Tripartite Guideline, 24.

Kirkwood, T. B. L. (1977), "Predicting the Stability of Biological Standards and Products," Biometrics, 33, 736–742.

Lu, C.-W. and Reynolds Jr., M. R. (1999a), "Control Charts for Monitoring the Mean and Variance of Autocorrelated Processes," Journal of Quality Technology, 31, 259–274.

Lu, C.-W. and Reynolds Jr., M. R. (1999b), "EWMA Control Charts for Monitoring the Mean of Autocorrelated Processes," Journal of Quality Technology, 31, 166–188.

Montgomery, D. C. (2009), Introduction to Statistical Quality Control, New Jersey: John Wiley & Sons, 6th ed.

Mzolo, T., Goris, G., Talens, E., Di Bucchianico, A., and Van den Heuvel, E. (2015), "Statistical Process Control Methods for Monitoring In-house Reference Standards," Statistics in Biopharmaceutical Research, 7, 55–65.

Roberts, S. (1959), "Control Chart Tests Based on Geometric Moving Averages," Technometrics, 1, 239–250.

USP <1032> (2010), "Design and Development of Biological Assays," Tech. rep., United States Pharmacopeia.

USP <1034> (2010), "Analysis of Biological Assays," Tech. rep., United States Pharmacopeia.

Van den Heuvel, E. (2010), "A Comparison of Estimation Methods on the Coverage Probability of Satterthwaite Confidence Intervals for Assay Precision With Unbalanced Data," Communications in Statistics - Simulation and Computation, 39, 777–794.

Zhu, J. and Lin, D. K. J. (2010), "Monitoring the Slopes of Linear Profiles," Quality Engineering, 22, 1–12.


CHAPTER 6

A modified Satterthwaite approach for estimation of one-sided tolerance limits for general mixed effects models


Abstract

In the pharmaceutical industry, statistical tolerance intervals are commonly used to set specification limits for drug products. Most of the available statistical methods for estimating tolerance limits are model-specific, with the one-way random effects model being the focal point. The present study proposes a simple approach for computing tolerance limits that is applicable to any mixed or random effects model. A tolerance factor is derived that depends on the ratio of the variance of the mean to the variance of the observations and on the non-central Student distribution with the modified Satterthwaite degrees of freedom (Van den Heuvel, 2010). One of the advantages of this approach is that the parameter estimates can easily be obtained using widely available statistical software.

Keywords: generalised pivotal quantity, modified large sample, non-central Student distribution, specification limits, variance components


6.1 Introduction

Statistical tolerance limits are commonly used in the pharmaceutical industry to set specification limits for medicinal drugs (Hoffman, 2010). These limits play a significant role in drug development and release, so it is important that they are estimated as precisely as possible. There is a vast literature on methods for estimating tolerance limits or intervals. A tolerance interval can be estimated as a two-sided interval or as a one-sided lower or upper tolerance limit. Sharma and Mathew (2012), Krishnamoorthy and Mathew (2009), Hoffman (2010), and Hoffman and Kringle (2005) all give examples of settings where either a one-sided or a two-sided tolerance interval is preferred. One-sided tolerance intervals are typically estimated for specification limits.

Several approaches have been developed for tolerance limits for balanced or unbalanced data from random effects models (see Table 6.1). Mee and Owen (1983) proposed a one-sided tolerance interval for a balanced one-way random effects model involving a tolerance factor. Two tolerance factors were derived: one for when the ratio of the between-group and within-group variances is known and the other for when this ratio is unknown. Vangel (1992) argued that the Mee and Owen (1983) approach is conservative (especially when the variance ratio is small) and proposed a different approach for computing a one-sided tolerance interval for a balanced one-way random effects model. Bhaumik and Kulkarni (1991) developed an approach based on an approximate statistic of Thomas and Hultquist (1978) to compute tolerance intervals for both balanced and unbalanced one-way random effects models. A simple and exact method of constructing a tolerance interval for a one-way random effects model was later proposed by Bhaumik and Kulkarni (1996). The authors argued that this approach is preferable because it is exact and outperforms methods based on approximations. Again, different ways of calculating the tolerance limit were proposed depending on whether or not the ratio of variances is known.

A generalised Mee and Owen approach for one-sided tolerance intervals for an unbalanced one-way random effects model was proposed by Romero-Villafranca et al. (2008). This approach modified the estimator of the variance ratio proposed by Mee and Owen (1983) for balanced data and used it in the unbalanced setting. The authors concluded that their approach should be preferred to that of Bhaumik and Kulkarni (1991). Bagui et al. (1996) proposed an approach for estimating one-sided tolerance intervals for unbalanced m-way random effects models.

The generalised pivotal quantity (GPQ) approach has gained considerable popularity for tolerance limits. It stems from the theory of generalised p-values and generalised confidence intervals (Weerahandi, 1993), which is useful for problems where standard methods do not apply (Krishnamoorthy and Mathew, 2009). A method for estimating tolerance limits based on the GPQ approach was first introduced by Krishnamoorthy and Mathew (2004).


This approach was developed for the estimation of one-sided tolerance intervals from a one-way random effects model. The authors compared it with the methods of Bagui et al. (1996) and Bhaumik and Kulkarni (1996) and concluded that the new approach outperforms them. Li (2006) proposed a different version of the GPQ for estimating a one-sided tolerance interval from an unbalanced one-way random effects model. The author noted that, when the intraclass correlation and the number of groups are small, the method of Krishnamoorthy and Mathew (2004) gives somewhat liberal tolerance limits, whereas the new method tends to give conservative tolerance limits. The approach of Krishnamoorthy and Mathew (2004) was modified by Fonseca et al. (2007) to derive tolerance limits for a two-way nested random effects model. Fonseca et al. (2007) also proposed an approach for estimating tolerance intervals from a mixed model, that is, a model with both fixed and random effects.

A general approach for computing one-sided and two-sided tolerance intervals from a balanced general mixed model and an unbalanced one-way random effects model was proposed by Liao et al. (2005). This approach is also based on GPQ theory. For a balanced one-way random effects model, the Liao et al. (2005) approach is similar to that of Krishnamoorthy and Mathew (2004). For unbalanced data from the same model, however, the two approaches differ slightly in how they adjust for unequal numbers of repetitions. Liao et al. (2005) noted that the new method performs approximately as well as that of Krishnamoorthy and Mathew (2004) in most cases, albeit with shorter expected tolerance interval lengths when the intraclass correlation is small. For one-sided tolerance intervals constructed from a two-way nested random effects model, the Liao et al. (2005) approach is similar to that of Fonseca et al. (2007). The main advantage of the Liao et al. (2005) approach is that it gives a general expression for computing tolerance intervals from a general model with balanced data, although its performance for higher-order random effects models is not known.

An alternative, easy method based on modified large sample (MLS) theory (Graybill and Wang, 1980) was recently proposed by Krishnamoorthy and Mathew (2009). One advantage of this approach is that the tolerance intervals are in closed form and thus easier to compute than those based on the GPQ approach (Krishnamoorthy and Lian, 2012), which require data simulation. Another analytical approach, for general random effects models with balanced or unbalanced data, was proposed by Hoffman (2010). This approach is based on the MLS method for constructing confidence bounds on nonnegative functions of variance components. In addition to the analytical approach, Hoffman (2010) included bootstrap-adjusted tolerance limits. Comparing the Hoffman (2010) method with those of Bagui et al. (1996) and Krishnamoorthy and Mathew (2004), the author concluded that the analytical approach is not as conservative as the other two and may yield substantially shorter intervals, particularly for smaller sample sizes and larger confidence coefficients. The bootstrap-adjusted limits, on the other hand, are generally quite close to the nominal confidence level and can provide shorter intervals, but this approach generally yields liberal tolerance limits for small sample sizes.

Recently, Sharma and Mathew (2012) proposed an approach for computing both one- and two-sided tolerance intervals for general mixed and random effects models using small-sample asymptotics. However, computing these tolerance intervals is involved because of the amount of likelihood derivations required. The resulting tolerance intervals exhibit satisfactory performance regardless of the sample size. The authors also showed that the method of Krishnamoorthy and Mathew (2004) results in conservative tolerance limits, whereas the approach of Hoffman (2010) gives accurate results when compared with the new approach.

A summary of the above methods, including the comparisons that have been made thus far, is shown in Table 6.1. The second column lists the authors who proposed each approach, the third column indicates the model to which the proposed method is applicable, the fourth column lists the approaches with which the proposed method was compared, and the last column indicates which method was concluded to be better. An empty cell in the last column implies that the proposed method outperformed all comparative methods, and an empty fourth column means that the proposed approach was not compared with other approaches. For example, the approach of Sharma and Mathew (2012) was compared with those of Hoffman (2010) and Krishnamoorthy and Mathew (2004); it was as good as the former and better than the latter. Given the number of methods available, it is not easy to choose the best method for estimating tolerance intervals.

This difficulty arises because no extensive study comparing these methods has been conducted, and the gap is even more pronounced for higher-order models. It should be noted, however, that only the approaches of Hoffman (2010) and Sharma and Mathew (2012) are applicable to any random effects model with both balanced and unbalanced data. Unfortunately, the studies above suggest that the approach of Hoffman (2010) is conservative, while the approach of Sharma and Mathew (2012) is computationally intensive and highly complex to derive for higher-order models. The purpose of this study is therefore to fill this gap by proposing a new approach for estimating a one-sided tolerance interval that is easy to compute and applicable to any mixed or random effects model structure. The approach adapts the modified Satterthwaite approximation (Van den Heuvel, 2010) to obtain improved degrees of freedom and to correct for underestimation of the variance.

The performance of the new approach is assessed for two-way random effects models against the GPQ approach of Liao et al. (2005) and the analytical approach of Hoffman (2010). In addition, the example used later in this chapter comes from a bioassay experiment. A common feature of bioassay experiments is that the sample size is usually limited. This chapter therefore places particular emphasis on how well the proposed approach performs in small-sample settings. For further reading on bioassays, we refer to Finney (1978) and Chapter 3 of this thesis.


This chapter is organised as follows: the proposed method and a survey of other methods are introduced in Section 6.2, the simulation study results and an application to a real data set are presented in Section 6.3, and a summary and conclusions are given in Section 6.4. The manuscript of this chapter is being prepared for submission for publication.

Table 6.1: A summary of statistical methods for tolerance interval estimation

| Abbrev. | Method | Model type | Comparison method | Better method |
|---|---|---|---|---|
| MO1983 | Mee and Owen (1983) | 1-way | | |
| BK1991 | Bhaumik and Kulkarni (1991) | 1-way | | |
| V1992 | Vangel (1992) | 1-way | MO1983 | V1992 |
| BBP1996 | Bagui et al. (1996) | general | | |
| BK1996 | Bhaumik and Kulkarni (1996) | 1-way | MO1983 | BK1996 |
| KM2004 | Krishnamoorthy and Mathew (2004) | 1-way, 2-way | BBP1996, BK1996 | KM2004 |
| LLI2005 | Liao et al. (2005) | 2-way, general | KM2004 | LLI2005 |
| L2006 | Li (2006) | 1-way | KM2004 | L2006 |
| FMMZ2007 | Fonseca et al. (2007) | 2-way | | |
| RZRP2008 | Romero-Villafranca et al. (2008) | 1-way | BK1991, KM2004, LLI2005 | RZRP2008 |
| KM2009 | Krishnamoorthy and Mathew (2009) | 1-way, 2-way | KM2004 | KM2009 |
| H2010 | Hoffman (2010) | general | BK1996, KM2004 | H2010 |
| SM2012 | Sharma and Mathew (2012) | general | H2010, KM2004 | SM2012 |

6.2 Statistical methods

Suppose a general variance components model is considered, with $Y$ the vector of $N$ observations:

$$Y = \mathbf{X}\alpha + \mathbf{Z}\tau + \varepsilon = \mathbf{X}\alpha + \sum_{i=1}^{q}\mathbf{A}_i\tau_i + \varepsilon, \qquad (6.1)$$

with $\mathbf{X}$ an $N \times p$ matrix of known constants of rank $p$, $\alpha$ a vector of $p$ unknown fixed parameters, $\mathbf{A}_i$ an $N \times s_i$ matrix of known constants, $\tau_i$ a vector of $s_i$ independent normally distributed variables with mean zero and variance $\sigma_i^2$, and $\varepsilon$ a vector of $N$ independent normally distributed variables with mean zero and variance $\sigma_e^2$. Bold letters denote matrices and the other letters, for example $Y$, denote vectors. Model (6.1) is fitted to existing data to estimate the fixed effects $\alpha$, the variance components $\sigma_1^2, \sigma_2^2, \ldots, \sigma_q^2$ and $\sigma_e^2$, and possibly other statistics. In this work we are mainly interested in estimating tolerance intervals for the distribution of a new observation $Y_0 \sim N(X_0^T\alpha, \sigma_T^2)$, with $X_0$ a vector that corresponds to the fixed effects and $\sigma_T^2 = \sum_{i=1}^{q}\sigma_i^2 + \sigma_e^2$ the total variance. For specification settings, the mean $X_0^T\alpha$ is often a constant $\mu$ indicating the average value of a quality characteristic of the product.

Different types of tolerance intervals can be constructed. The first is the β-expectation tolerance interval, which is defined such that on average it contains a proportion β of the population (Krishnamoorthy and Mathew, 2009); this is equivalent to a prediction interval. The second type is formulated as a confidence interval on a quantile (an interpretation that holds only in the one-sided case), and this type is the focus of this chapter. To compute such a tolerance interval, two quantities have to be specified: the content $p$ and the confidence level $\gamma$. The $(p, \gamma)$ upper tolerance limit $B$, which is a function of $Y$, say $B = \varphi(Y)$, satisfies the condition

$$P[F(B) \ge p] = \gamma, \qquad (6.2)$$

where $F(\cdot)$ is the cumulative distribution function of $Y_0$ (Hoffman, 2010). This implies that at least a proportion $p$ of the population lies below the limit $B$ with confidence $\gamma$. The $(p, \gamma)$ lower tolerance limit is defined analogously. One general form of a $(p, \gamma)$ upper tolerance limit is

$$\bar{Y} + k\sqrt{\sum_{i=1}^{q}\hat\sigma_i^2 + \hat\sigma_e^2}, \qquad (6.3)$$

where $k$ is referred to as the tolerance factor. The term under the square root is an estimate $\hat\sigma_T^2$ of the total variance $\sigma_T^2$. This variance can be estimated using moment estimators, (restricted) maximum likelihood estimators or any other appropriate estimator. Several methods for estimating tolerance intervals centre on the determination of the tolerance factor $k$, particularly when the tolerance interval is estimated from a one-way random effects model. Different approaches have been developed for determining $k$, and some of these are elaborated on in subsequent sections.

Another general form of constructing tolerance intervals is based on the generalised pivotal quantity (GPQ) (Krishnamoorthy and Mathew, 2004). The tolerance interval is obtained by simulating an iid sample from a standard normal distribution and also generating $\chi^2$-distributed random variables associated with each variance component. This $\chi^2$ variable is given by $U_i^2 = SS_i/E(S_i^2) \sim \chi^2_{df_i}$, where $SS_i$, $S_i^2$, $E(S_i^2)$ and $df_i$ are the sum of squares, the mean square, the expected mean square, and the degrees of freedom corresponding to variance component $\sigma_i^2$, respectively. From these quantities a variable $G$, which acts as a generalised pivot for the $p$th quantile of the distribution of $Y_0$, is computed for each simulated data set. The $\gamma$th quantile $G_\gamma$ of the simulated $G$ values then gives a $(p, \gamma)$ upper tolerance bound. The algorithm for computing tolerance intervals based on the GPQ approach is described in Section 6.2.2.

In addition to the above forms, another approach that has been used to develop tolerance intervals is the modified large sample (MLS) method (Burdick and Graybill, 1992; Graybill and Wang, 1980). The MLS idea originates from the estimation of a confidence bound on a linear combination of variance components, where the confidence limits are obtained by modifying a large-sample confidence bound for, say, $\sigma_T^2$ (Krishnamoorthy and Mathew, 2009). These methods are introduced in the following sections.

6.2.1 Determination of the tolerance factor for tolerance intervals

If Formula (6.3) is taken as the $\gamma$ upper confidence limit for the $p$th quantile $\mu + z_p\sigma_e\sqrt{1 + \sum_{i=1}^{q} r_i} = \mu + z_p\sigma_e C_T$, with $r_i = \sigma_i^2/\sigma_e^2$ known ratios of variance components, the tolerance factor $k$ in (6.3) is equal to

$$k = (C_M/C_T)\, t_{df,\gamma}(z_p C_T/C_M), \qquad (6.4)$$

with $t_{df,\gamma}(\delta)$ the $\gamma$th quantile of the non-central $t$-distribution with $df$ degrees of freedom and non-centrality parameter $\delta$, and with $C_M$ the square root of the ratio of the variance of the overall average $\bar{Y}$ under Model (6.1) to the residual variance component, that is, $C_M^2 = \sigma_M^2/\sigma_e^2$ with $\sigma_M^2 = \mathrm{var}(\bar{Y})$. Since the matrices $\mathbf{X}$ and $\mathbf{Z}$ in Model (6.1) are known, $\sigma_e^2 C_M^2$ is a known linear function of all variance components, say $\sigma_e^2 C_M^2 = c_e\sigma_e^2 + \sum_{i=1}^{q} c_i\sigma_i^2$. Under the assumption of normality and for known ratios $r_i$, the upper tolerance limit in (6.3) with $k$ given by (6.4) is exact. Constructing one-sided tolerance intervals of the form (6.3) for normally distributed data is complicated when some or all of the $r_i$ are unknown. Note that the $r_i$ appear not only in $C_T$ but also in $C_M$.
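To make (6.4) concrete, the sketch below computes the exact tolerance factor for known variance ratios. It is a minimal illustration, not the implementation used in this thesis: the function names are hypothetical, and, to stay within the Python standard library, the non-central $t$ distribution function is evaluated by numerical integration over the $\chi$ distribution and inverted by bisection (a statistics package would normally supply this quantile). As a check, with no random effects ($q = 0$) we have $C_T = 1$, $C_M^2 = 1/n$ and $df = n - 1$, so (6.4) reduces to the classical one-sided normal tolerance factor $t_{n-1,\gamma}(z_p\sqrt{n})/\sqrt{n}$, which is about 2.355 for $n = 10$, $p = 0.90$ and $\gamma = 0.95$.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def _chi_pdf(x: float, df: float) -> float:
    """Density of X = sqrt(V) with V ~ chi-square(df); assumes df >= 1."""
    if x <= 0.0:
        # finite and non-zero at the origin only when df == 1
        return math.sqrt(2.0 / math.pi) if (x == 0.0 and df == 1.0) else 0.0
    return math.exp(math.log(2.0) + (df - 1.0) * math.log(x) - 0.5 * x * x
                    - 0.5 * df * math.log(2.0) - math.lgamma(0.5 * df))


def _simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def nct_cdf(t: float, df: float, delta: float) -> float:
    """P(T <= t) for T = (Z + delta) / sqrt(V / df), Z ~ N(0, 1), V ~ chi2(df)."""
    def integrand(x: float) -> float:
        # condition on X = sqrt(V): P(Z <= t * X / sqrt(df) - delta) times density of X
        return _STD_NORMAL.cdf(t * x / math.sqrt(df) - delta) * _chi_pdf(x, df)
    return _simpson(integrand, 0.0, math.sqrt(df) + 12.0)  # mass beyond is negligible


def nct_ppf(q: float, df: float, delta: float, tol: float = 1e-8) -> float:
    """q-quantile of the non-central t distribution, by bracketing and bisection."""
    lo, hi = -1.0, 1.0
    while nct_cdf(lo, df, delta) > q:
        lo *= 2.0
    while nct_cdf(hi, df, delta) < q:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nct_cdf(mid, df, delta) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tolerance_factor_known_ratios(c_m: float, c_t: float, df: float,
                                  p: float, gamma: float) -> float:
    """Exact tolerance factor (6.4): k = (C_M / C_T) * t_{df,gamma}(z_p * C_T / C_M)."""
    z_p = _STD_NORMAL.inv_cdf(p)
    return (c_m / c_t) * nct_ppf(gamma, df, z_p * c_t / c_m)


# i.i.d. check: q = 0, C_T = 1, C_M = 1/sqrt(n), df = n - 1 with n = 10.
k = tolerance_factor_known_ratios(1.0 / math.sqrt(10), 1.0, 9, 0.90, 0.95)
print(f"k = {k:.3f}")
```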

A number of researchers have proposed their own versions of the tolerance factor $k$ for one-way random effects models. One of these factors was proposed by Vangel (1992); it is based on the ratio of mean squares $R = S_1^2/S_e^2$ and is approximated by a cubic polynomial,

$$k = v_1 + v_2W + v_3W^2 + v_4W^3, \qquad (6.5)$$

where $W = 1/\sqrt{1 + (n-1)/R}$ ($n$ is the number of replicates) and $v_1, v_2, v_3, v_4$ are coefficients tabulated in Vangel (1992). These coefficients depend on the number of groups, the number of repetitions, the content ($p$), and the confidence ($\gamma$) used. Rigorous derivations of this tolerance factor can be found in Krishnamoorthy and Mathew (2009) and Vangel (1992).

A tolerance factor closely related to the one in (6.4), driven by the ratio of variances $R = \sigma_1^2/\sigma_e^2$, was proposed by Bhaumik and Kulkarni (1996). With this approach, the tolerance factor is given by

$$k = t_{N-1,\gamma}(\delta)\big/\sqrt{(1+R)\,\omega}, \qquad (6.6)$$

with non-centrality parameter $\delta = z_p\sqrt{(1+R)\,\omega}$. The parameter $\omega$ is the sum of the weights

$$\omega = \sum_{i=1}^{I}\omega_i = \sum_{i=1}^{I}\frac{n_i}{1 + n_iR}, \qquad (6.7)$$

where $I$ is the number of levels of the random effect and $n_i$ is the number of replicates in group $i$. Thus, the method applies to unbalanced one-way random effects models. The above tolerance factor assumes that $R$ is known. Since $R$ is rarely known in practice, Bhaumik and Kulkarni (1996) suggested replacing it by

$$\hat R = \frac{SS_1}{SS_e}(N - I - 2) - \lambda, \qquad (6.8)$$

where $SS_1$ is the sum of squares due to the random effects, $SS_e$ is the sum of squares due to errors (residuals), and $\lambda = (1/I)\sum_{i=1}^{I} n_i^{-1}$ is the average of the reciprocal group sizes (that is, the reciprocal of the harmonic mean of the $n_i$).
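The weights in (6.7) also link (6.6) to (6.4). Taking $\sigma_e^2 = 1$, the mean of group $i$ has variance $R + 1/n_i = 1/\omega_i$, so the inverse-variance weighted mean of the group means has variance $1/\omega$, and $(1 + R)\,\omega$ is the ratio of the total variance to the variance of that mean, i.e. $(C_T/C_M)^2$ when $\bar{Y}$ is taken to be this weighted mean. The short sketch below (hypothetical function names, not code from the original study) computes $\omega$, $\lambda$ and $(1 + R)\,\omega$ and checks the identity numerically for an arbitrary unbalanced design.

```python
from typing import List, Sequence


def bk_weights(group_sizes: Sequence[int], ratio: float) -> List[float]:
    """Weights omega_i = n_i / (1 + n_i * R) from (6.7)."""
    return [n / (1.0 + n * ratio) for n in group_sizes]


def mean_reciprocal_size(group_sizes: Sequence[int]) -> float:
    """lambda = (1/I) * sum(1/n_i), as used in (6.8)."""
    return sum(1.0 / n for n in group_sizes) / len(group_sizes)


def effective_sample_size(group_sizes: Sequence[int], ratio: float) -> float:
    """(1 + R) * omega, the quantity that enters (6.6)."""
    return (1.0 + ratio) * sum(bk_weights(group_sizes, ratio))


def variance_ratio_direct(group_sizes: Sequence[int], ratio: float) -> float:
    """sigma_T^2 / var(weighted mean) with sigma_e^2 = 1, from first principles."""
    group_mean_vars = [ratio + 1.0 / n for n in group_sizes]          # var of each group mean
    var_weighted_mean = 1.0 / sum(1.0 / v for v in group_mean_vars)   # inverse-variance weights
    return (1.0 + ratio) / var_weighted_mean


sizes = [2, 5, 9]  # hypothetical unbalanced design
print(effective_sample_size(sizes, 0.5), variance_ratio_direct(sizes, 0.5))
```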

A generalised Mee and Owen (1983) approach for one-sided tolerance intervals for an unbalanced one-way random effects model was proposed by Romero-Villafranca et al. (2008). This approach also makes use of the variance ratio $R$. The tolerance factor is approximated as

$$k = \frac{t_{\nu,\gamma}(\delta)}{\sqrt{n^*}}, \qquad (6.9)$$

where the non-centrality parameter is $\delta = z_p\sqrt{n^*}$, with $n^*$ the effective sample size $n^* = I\lambda^{-1}(R + 1)/(\lambda^{-1}R + 1)$, and $\nu$ is the number of degrees of freedom. If $R$ is unknown, Romero-Villafranca et al. (2008) recommended using the estimate

$$\hat R = \max\left\{0,\ \left(F_{\eta,N-I,I-1}\,\lambda^{-1}S_1^2/S_e^2 - 1\right)\big/\lambda^{-1}\right\}, \qquad (6.10)$$

where $F_{q,df_1,df_2}$ is the $q$-quantile of an $F$-distribution with $df_1$ and $df_2$ degrees of freedom. The value $\eta$ in (6.10) depends on the content ($p$) and confidence ($\gamma$) of the tolerance interval and is determined by

$$\eta = 7.8833 - 2.281p + 1.9444p^2 - 14.818\gamma + 9.3518\gamma^2 - 1.1791p\gamma. \qquad (6.11)$$

These coefficients were determined such that the absolute bias associated with $\gamma$ is minimised (Romero-Villafranca et al., 2008).
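The quantities in (6.9) to (6.11) are simple to evaluate. The sketch below, with hypothetical function names, computes $\eta$, the effective sample size $n^*$ and the truncated estimate $\hat R$; the quantile $F_{\eta,N-I,I-1}$ is passed in as an argument (for example from statistical software) rather than computed, to keep the example within the standard library. In the balanced case $\lambda^{-1} = n$ and $n^* = In(R+1)/(nR+1)$, which equals the ratio of the total variance to the variance of the overall mean, so (6.9) has the same structure as (6.4).

```python
def rzrp_eta(p: float, gamma: float) -> float:
    """Quantile level eta from (6.11) (Romero-Villafranca et al., 2008)."""
    return (7.8833 - 2.281 * p + 1.9444 * p ** 2
            - 14.818 * gamma + 9.3518 * gamma ** 2 - 1.1791 * p * gamma)


def rzrp_effective_n(n_groups: int, lam: float, ratio: float) -> float:
    """Effective sample size n* = I * lam^-1 * (R + 1) / (lam^-1 * R + 1) from (6.9)."""
    inv_lam = 1.0 / lam
    return n_groups * inv_lam * (ratio + 1.0) / (inv_lam * ratio + 1.0)


def rzrp_ratio_estimate(f_quantile: float, lam: float, s2_1: float, s2_e: float) -> float:
    """Truncated variance-ratio estimate (6.10); f_quantile is F_{eta, N-I, I-1}."""
    inv_lam = 1.0 / lam
    return max(0.0, (f_quantile * inv_lam * s2_1 / s2_e - 1.0) / inv_lam)


print(round(rzrp_eta(0.90, 0.95), 4))  # -> 0.7601
```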

A general observation about all the tolerance factors above is that they are only applicable to a one-way random effects model. A major concern is their lack of generalisability to higher-order mixed or random effects models. To propose a general tolerance factor that can be used irrespective of the model structure, for both balanced and unbalanced data, we follow the approach of Van den Heuvel (2010). The distribution of the estimator $\hat\sigma_T^2 = \hat\sigma_e^2 + \sum_{i=1}^{q}\hat\sigma_i^2$ in (6.3) can be approximated by a $\chi^2$ distribution, that is, $df_T\,\hat\sigma_T^2/\sigma_T^2 \sim \chi^2_{df_T}$, with the degrees of freedom $df_T$ calculated as

$$df_T = 2\left(\hat\sigma_e^2 + \sum_{i=1}^{q}\hat\sigma_i^2\right)^2 \Big/ \left(\varsigma_e^2 + \sum_{i=1}^{q}\varsigma_i^2\right), \qquad (6.12)$$

with $\varsigma_i$ and $\varsigma_e$ the estimated standard errors of $\hat\sigma_i^2$ and $\hat\sigma_e^2$, respectively. The main difference from the original Satterthwaite approximation is that the approximation is applied directly to the variance component estimators and the denominator excludes the covariances between the variance component estimators. The reason is that these covariances are usually negative, so including them would increase the degrees of freedom to unrealistic values. To further improve the estimation of these degrees of freedom, Van den Heuvel (2010) proposed bounding them between 1 and $N - 1$. Thus, in what follows $df_T$ denotes the truncated value $\max\{1, \min(N - 1, df_T)\}$. The factor $k$ is then approximated as

$$k^* = (\hat\sigma_M/\hat\sigma_T)\, t_{df_T,\gamma}(z_p\hat\sigma_T/\hat\sigma_M). \qquad (6.13)$$

The upper tolerance limit in (6.3) with the tolerance factor $k^*$ given by (6.13) can be applied to essentially any variance component model of the form (6.1), for both balanced and unbalanced data. However, estimating the factor $k$ introduces additional uncertainty that should be compensated for when calculating tolerance limits. To account for this uncertainty, $k^*$ is multiplied by the factor $\sqrt{df_T/\chi^2_{1-\gamma,df_T}}$, where $\chi^2_{1-\gamma,df_T}$ is the $(1-\gamma)$ quantile of the $\chi^2$ distribution with $df_T$ degrees of freedom, and the tolerance factor becomes

$$k = \left(\frac{\hat\sigma_M}{\hat\sigma_T}\sqrt{\frac{df_T}{\chi^2_{1-\gamma,df_T}}}\right) t_{df_T,\gamma}\!\left(\frac{z_p\hat\sigma_T}{\hat\sigma_M}\right). \qquad (6.14)$$

The factor $\sqrt{df_T/\chi^2_{1-\gamma,df_T}}$ is the lower confidence limit of the ratio $\sigma_M^2/\sigma_T^2$ (Burdick and Graybill, 1992), where we let $df_M \to \infty$ so that the $F$-distribution simplifies to the $\chi^2$ distribution.


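Consequently, the upper tolerance limit is given by

$$\bar{Y} + \left(\frac{\hat\sigma_M}{\hat\sigma_T}\right) t_{df_T,\gamma}\!\left(\frac{z_p\hat\sigma_T}{\hat\sigma_M}\right)\sqrt{\frac{df_T\,\hat\sigma_T^2}{\chi^2_{1-\gamma,df_T}}}. \qquad (6.15)$$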

The factor $\sqrt{df_T/\chi^2_{1-\gamma,df_T}}$ approaches one as the uncertainty in estimating $\sigma_M/\sigma_T$ decreases. All required estimates are obtained directly from Model (6.1), with the REML variance estimates used instead of the mean squares. The variance of the mean, $\sigma_M^2$, is estimated as the variance of $\mathbf{X}\hat\alpha$, which is obtained from the standard error reported for the fitted model. This way of estimating $\sigma_M^2$ was also followed by Hong et al. (2014) when estimating two-sided tolerance intervals. We assess how well our generic approach performs when estimating one-sided tolerance limits for higher-order random effects models and contrast its performance with other currently available analytical approaches.
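The steps (6.12) to (6.15) can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the SAS implementation used in this chapter: the REML estimates, their standard errors and $\hat\sigma_M$ are taken as inputs (in practice they come from the fitted mixed model), $\chi^2_{1-\gamma,df}$ is read as the $(1-\gamma)$ quantile, all function names are hypothetical, and, to stay within the Python standard library, the non-central $t$ and $\chi^2$ quantiles are obtained by numerical integration and bisection. As a sanity check, in the i.i.d. normal case ($\hat\sigma_M/\hat\sigma_T = 1/\sqrt{n}$, $df_T = n - 1$) the limit equals the classical factor (about 2.355 for $n = 10$) multiplied by $\sqrt{df_T/\chi^2_{1-\gamma,df_T}}$.

```python
import math
from statistics import NormalDist
from typing import Sequence

_STD_NORMAL = NormalDist()


def _chi_pdf(x: float, df: float) -> float:
    """Density of X = sqrt(V) with V ~ chi-square(df); assumes df >= 1."""
    if x <= 0.0:
        return math.sqrt(2.0 / math.pi) if (x == 0.0 and df == 1.0) else 0.0
    return math.exp(math.log(2.0) + (df - 1.0) * math.log(x) - 0.5 * x * x
                    - 0.5 * df * math.log(2.0) - math.lgamma(0.5 * df))


def _simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def _bisect(cdf, q: float, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Solve cdf(x) = q on [lo, hi] for a non-decreasing cdf."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def nct_ppf(q: float, df: float, delta: float) -> float:
    """q-quantile of the non-central t distribution (df, non-centrality delta)."""
    def cdf(t: float) -> float:
        def integrand(x: float) -> float:
            return _STD_NORMAL.cdf(t * x / math.sqrt(df) - delta) * _chi_pdf(x, df)
        return _simpson(integrand, 0.0, math.sqrt(df) + 12.0)
    lo, hi = -1.0, 1.0
    while cdf(lo) > q:
        lo *= 2.0
    while cdf(hi) < q:
        hi *= 2.0
    return _bisect(cdf, q, lo, hi)


def chi2_ppf(q: float, df: float) -> float:
    """q-quantile of the chi-square distribution with df degrees of freedom."""
    def cdf(c: float) -> float:
        return _simpson(lambda x: _chi_pdf(x, df), 0.0, math.sqrt(c)) if c > 0 else 0.0
    hi = df + 1.0
    while cdf(hi) < q:
        hi *= 2.0
    return _bisect(cdf, q, 0.0, hi)


def ms_degrees_of_freedom(estimates: Sequence[float], std_errors: Sequence[float],
                          n_obs: int) -> float:
    """Modified Satterthwaite df (6.12) from variance component estimates and their
    standard errors (covariances ignored), truncated to [1, N - 1]."""
    df = 2.0 * sum(estimates) ** 2 / sum(se ** 2 for se in std_errors)
    return max(1.0, min(n_obs - 1.0, df))


def ms_upper_limit(y_bar: float, sigma_m: float, sigma_t: float, df_t: float,
                   p: float, gamma: float) -> float:
    """(p, gamma) upper tolerance limit (6.15)."""
    z_p = _STD_NORMAL.inv_cdf(p)
    k_star = (sigma_m / sigma_t) * nct_ppf(gamma, df_t, z_p * sigma_t / sigma_m)  # (6.13)
    inflation = math.sqrt(df_t / chi2_ppf(1.0 - gamma, df_t))  # sqrt(df_T / chi2_{1-gamma})
    return y_bar + k_star * sigma_t * inflation


# Hypothetical inputs: i.i.d. case with n = 10, y_bar = 0, sigma_T = 1, sigma_M = 1/sqrt(10).
limit = ms_upper_limit(0.0, 1.0 / math.sqrt(10), 1.0, 9.0, 0.90, 0.95)
print(f"upper limit = {limit:.3f}")
```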

6.2.2 The generalised pivotal quantity tolerance intervals

To develop a GPQ for a $(p, \gamma)$ tolerance interval based on Model (6.1), the random variables of interest are $\bar{Y}, SS_1, \ldots, SS_{q+1}$. The GPQ used to construct tolerance intervals is

$$G = \bar{Y} - Z\sqrt{\max\left\{0, \sum_{i=1}^{q+1}\frac{b_i SS_i}{U_i^2}\right\}} + z_p\sqrt{\max\left\{0, \sum_{i=1}^{q+1}\frac{a_i SS_i}{U_i^2}\right\}}, \qquad (6.16)$$

where $a_i$ and $b_i$ are the coefficients in the linear combinations of mean squares used to estimate $\sigma_T^2$ and $\sigma_M^2$, respectively, and $Z$ and $U_i$ are independent random variables (Krishnamoorthy and Mathew, 2009). Let $G_\gamma$ denote the $\gamma$th quantile of $G$. Then $G_\gamma$ is a $\gamma$ generalised upper confidence limit for $\mu + z_p\sqrt{\sum_{i=1}^{q}\sigma_i^2 + \sigma_e^2}$ and hence also a $(p, \gamma)$ upper tolerance limit for $N(\mu, \sum_{i=1}^{q}\sigma_i^2 + \sigma_e^2)$. The GPQ is computed using the following algorithm:

1. Compute the values $\bar{Y}$ and $SS_i$ for $i = 1, \ldots, q + 1$.

2. Let $T$ denote the number of simulation runs. For $j = 1, 2, \ldots, T$ do the following:

3. Generate independent random variables $Z_j \sim N(0, 1)$ and $U_{ij}^2 \sim \chi^2_{m_i}$, where $m_i = df_i$ are the degrees of freedom of $SS_i$.

4. Compute $G_j$ from (6.16) with $Z$ replaced by $Z_j$ and $U_i^2$ replaced by $U_{ij}^2$.

Then the $(p, \gamma)$ upper tolerance limit is given by the $\gamma$ quantile of the $G_j$ values. For a one-sided tolerance interval based on a (balanced) one-way random effects model, this approach gives the intervals of Liao et al. (2005) and Krishnamoorthy and Mathew (2004), and for a two-way nested random effects model it gives the one-sided tolerance interval of Fonseca et al. (2007) and Liao et al. (2005).
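The algorithm can be sketched as a short Monte Carlo routine. This is a minimal illustration, assuming the sums of squares $SS_i$, their degrees of freedom $m_i$ and the coefficients $a_i$, $b_i$ have already been derived for the model at hand; the function name, the fixed seed and the number of runs are hypothetical choices. In the i.i.d. normal case there is a single sum of squares with $m = n - 1$, $a = 1$ and $b = 1/n$, and the distribution of $G$ then coincides with that of $\bar{Y} + s\,t_{n-1}(z_p\sqrt{n})/\sqrt{n}$, so its $\gamma$ quantile reproduces the classical limit $\bar{Y} + ks$ with $k \approx 2.355$ for $n = 10$, which provides a simple check.

```python
import math
import random
from statistics import NormalDist
from typing import Sequence


def gpq_upper_limit(y_bar: float, ss: Sequence[float], dfs: Sequence[float],
                    a: Sequence[float], b: Sequence[float], p: float, gamma: float,
                    runs: int = 2500, seed: int = 1) -> float:
    """(p, gamma) upper tolerance limit: gamma quantile of simulated G values (6.16)."""
    rng = random.Random(seed)
    z_p = NormalDist().inv_cdf(p)
    g_values = []
    for _ in range(runs):
        z = rng.gauss(0.0, 1.0)
        # SS_i / U_i^2 with U_i^2 ~ chi-square(m_i), drawn as Gamma(shape=m_i/2, scale=2)
        ems = [s / rng.gammavariate(m / 2.0, 2.0) for s, m in zip(ss, dfs)]
        var_m = max(0.0, sum(b_i * e for b_i, e in zip(b, ems)))
        var_t = max(0.0, sum(a_i * e for a_i, e in zip(a, ems)))
        g_values.append(y_bar - z * math.sqrt(var_m) + z_p * math.sqrt(var_t))
    g_values.sort()
    return g_values[math.ceil(gamma * runs) - 1]  # empirical gamma quantile


# i.i.d. check: n = 10 observations with sample mean 10 and sample standard deviation 2.
n, y_bar, s = 10, 10.0, 2.0
limit = gpq_upper_limit(y_bar, [(n - 1) * s ** 2], [n - 1], [1.0], [1.0 / n],
                        p=0.90, gamma=0.95, runs=40000)
print(f"GPQ upper limit = {limit:.3f}")
```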


6.2.3 The modified large sample tolerance intervals

Krishnamoorthy and Mathew (2009) proposed an approach for constructing tolerance intervals using the MLS ideas of Burdick and Graybill (1992) and Graybill and Wang (1980). The Krishnamoorthy and Mathew (2009) $(p, \gamma)$ upper tolerance limit of $Y_0$ is given by

$$\bar{Y} + z_p\sqrt{\sum_{i=1}^{q+1} c_i S_i^2 + \sqrt{\sum_{i=1}^{q+1} c_i^2 S_i^4\left(\frac{m_i}{u_i} - 1\right)^2}}, \qquad u_i = \begin{cases}\chi^2_{m_i;1-\gamma}, & c_i > 0,\\ \chi^2_{m_i;\gamma}, & c_i < 0,\end{cases} \qquad (6.17)$$

where $c_i = a_i + b_i$ is the sum of the coefficients of the linear combinations of mean squares used to estimate $\sigma_T^2$ and $\sigma_M^2$. Unlike our approach, this approach does not use the point estimates of the variance components but uses the mean squares associated with each variance component instead.

A simple analytical approach for computing one-sided tolerance intervals for general random effects models was proposed by Hoffman (2010). This approach was also developed using MLS principles and is applicable to both balanced and unbalanced data. The approximate one-sided $(p, \gamma)$ upper tolerance limit is given by

$$\bar{Y} + z_p\sqrt{\hat\sigma_Y^2 + \left(\sum_{i=1}^{q+1} H_i^2 a_i^2 S_i^4\right)^{1/2}} + z_\gamma\hat\sigma_M, \qquad (6.18)$$

where $H_i = 1/F_{1-\gamma,df_i,\infty} - 1$ and $F_{1-\gamma,df_i,\infty}$ is the $(1-\gamma)$ quantile of the $F$-distribution with $df_i$ numerator and $\infty$ denominator degrees of freedom.
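The MLS constants in (6.17) and (6.18) rest on the same chi-square quantile. Because an $F$ variable with $\infty$ denominator degrees of freedom is distributed as $\chi^2_{df}/df$, we have $F_{1-\gamma,df_i,\infty} = \chi^2_{1-\gamma,df_i}/df_i$ and hence $H_i = df_i/\chi^2_{1-\gamma,df_i} - 1$, which is the factor $m_i/u_i - 1$ in (6.17) for $c_i > 0$ when $m_i = df_i$. The sketch below (hypothetical names; a standard-library numerical quantile in place of a statistics package) computes $H_i$ and evaluates (6.18) with all estimates and mean squares supplied by the user. It is an illustration of the formula, not Hoffman's (2010) code.

```python
import math
from statistics import NormalDist
from typing import Sequence


def _chi2_cdf(c: float, df: float, n: int = 2000) -> float:
    """P(V <= c) for V ~ chi-square(df), df >= 1, via Simpson's rule in x = sqrt(v)."""
    if c <= 0.0:
        return 0.0

    def pdf(x: float) -> float:  # density of sqrt(V)
        if x == 0.0:
            return math.sqrt(2.0 / math.pi) if df == 1.0 else 0.0
        return math.exp(math.log(2.0) + (df - 1.0) * math.log(x) - 0.5 * x * x
                        - 0.5 * df * math.log(2.0) - math.lgamma(0.5 * df))

    b = math.sqrt(c)
    h = b / n
    total = pdf(0.0) + pdf(b) + sum((4.0 if i % 2 else 2.0) * pdf(i * h) for i in range(1, n))
    return total * h / 3.0


def chi2_ppf(q: float, df: float, tol: float = 1e-8) -> float:
    """q-quantile of chi-square(df) by bisection."""
    lo, hi = 0.0, df + 1.0
    while _chi2_cdf(hi, df) < q:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if _chi2_cdf(mid, df) < q else (lo, mid)
    return 0.5 * (lo + hi)


def hoffman_h(df: float, gamma: float) -> float:
    """H_i = 1 / F_{1-gamma, df, inf} - 1 = df / chi2_{1-gamma, df} - 1."""
    return df / chi2_ppf(1.0 - gamma, df) - 1.0


def hoffman_upper_limit(y_bar: float, sigma2_y: float, sigma_m: float,
                        a: Sequence[float], s2: Sequence[float], dfs: Sequence[float],
                        p: float, gamma: float) -> float:
    """Upper tolerance limit (6.18) from user-supplied estimates and mean squares S_i^2."""
    z_p, z_g = NormalDist().inv_cdf(p), NormalDist().inv_cdf(gamma)
    # (sum H_i^2 a_i^2 S_i^4)^(1/2), written with S_i^4 = (S_i^2)^2
    correction = math.sqrt(sum((hoffman_h(m, gamma) * a_i * s) ** 2
                               for a_i, s, m in zip(a, s2, dfs)))
    return y_bar + z_p * math.sqrt(sigma2_y + correction) + z_g * sigma_m


print(f"H for df = 9, gamma = 0.95: {hoffman_h(9, 0.95):.3f}")
```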

6.3 Simulation study

A simulation study is conducted to assess the performance of our proposed approach for constructing tolerance intervals. Data are simulated from two-way nested and two-way crossed (with interaction) random effects models. The number of simulated data sets is 10 000, and for each simulated data set a (0.90, 0.95) upper tolerance limit is estimated. The performance of the proposed approach is assessed by estimating the proportion of simulated data sets for which the upper tolerance limit is greater than or equal to the quantile $\mu + z_p\sqrt{\sum_i\sigma_i^2 + \sigma_e^2}$. This proportion should be close to the confidence coefficient $\gamma = 0.95$ if the approach used to estimate the tolerance limit works properly. In addition, the MLS approach of Hoffman (2010) and the GPQ approach of Liao et al. (2005) are used as comparative methods for the two-way nested random effects model, and only the MLS approach is included for the crossed random effects model. For the GPQ approach of Liao et al. (2005), the quantile is estimated from a simulation of $T = 2500$ runs. The location parameter is set to $\mu = 0$ and the residual variance to $\sigma_e^2 = 1.0$. The other variance parameters are varied as shown in the following sections. The analyses were performed using the statistical software SAS® version 9.4 (SAS Institute Inc, Cary, NC, USA).
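The coverage criterion can be illustrated on the simplest case. The sketch below (hypothetical names, i.i.d. normal data rather than the mixed models of this study, arbitrary seed and number of replications) estimates the proportion of simulated data sets whose upper limit $\bar{y} + ks$ reaches the true quantile $\mu + z_p\sigma$. With the classical factor $k \approx 2.355$ for $n = 10$, $p = 0.90$ and $\gamma = 0.95$ the estimate should be close to 0.95, whereas plugging in $k = z_p$ ignores the estimation uncertainty and falls far short. With 10 000 simulated data sets, as in this study, the Monte Carlo standard error of an estimated coverage near 0.95 is about $\sqrt{0.95 \times 0.05/10000} \approx 0.002$.

```python
import random
from statistics import NormalDist, fmean, stdev


def simulated_coverage(k: float, n: int, p: float, reps: int = 4000,
                       mu: float = 0.0, sigma: float = 1.0, seed: int = 2016) -> float:
    """Fraction of simulated samples whose limit y_bar + k * s covers the true p-quantile."""
    rng = random.Random(seed)
    true_quantile = mu + NormalDist().inv_cdf(p) * sigma
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if fmean(sample) + k * stdev(sample) >= true_quantile:
            hits += 1
    return hits / reps


print(simulated_coverage(2.355, n=10, p=0.90))
print(simulated_coverage(NormalDist().inv_cdf(0.90), n=10, p=0.90))
```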

6.3.1 Two-way nested random effects model

The data were simulated from a two-way nested random effects model

$$Y_{ijk} = \mu + \tau_i + \delta_{j(i)} + \varepsilon_{ijk}, \qquad i = 1,\dots,I;\; j = 1,\dots,J;\; k = 1,\dots,K, \tag{6.19}$$

with $\tau_i \sim N(0,\sigma_\tau^2)$, $\delta_{j(i)} \sim N(0,\sigma_\delta^2)$, and $\varepsilon_{ijk} \sim N(0,\sigma_\varepsilon^2)$. The marginal distribution of $Y_{ijk}$ is $Y_{ijk} \sim N(\mu, \sigma_\tau^2 + \sigma_\delta^2 + \sigma_\varepsilon^2)$. The assumed variance parameter values were $\sigma_\tau^2 \in \{1, 5, 10\}$ and $\sigma_\delta^2 \in \{0.1, 0.5\}$, similar to those used by Sharma and Mathew (2012). The levels (I, J, K) took the values (3, 3, 3), (3, 6, 3), (3, 3, 6), and (3, 6, 6), where the first, second, and third entries correspond to I, J, and K, respectively. In a bioassay experiment these could represent, for example, runs, batches, and repeats, respectively. Table 6.2 presents the simulated confidence coefficients (coverage probabilities) under the different settings. In this table, the variance parameters are represented by the intracluster correlation coefficient $\rho = \sigma_\tau^2/(\sigma_\tau^2 + \sigma_\delta^2 + \sigma_\varepsilon^2)$.
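Below is a minimal Python sketch of one simulated data set from Model (6.19), together with the ρ values implied by the six variance settings. It assumes $\sigma_\varepsilon^2 = 1.0$ as stated above, and the helper names are hypothetical. The rounded ρ values coincide with the row labels of Table 6.2.

```python
import random


def simulate_nested(I, J, K, var_tau, var_delta, var_eps=1.0, mu=0.0, rng=None):
    """Simulate one balanced data set from model (6.19).

    Returns (i, j, k, y) tuples with y = mu + tau_i + delta_j(i) + eps_ijk.
    """
    if rng is None:
        rng = random.Random()
    data = []
    for i in range(I):
        tau = rng.gauss(0.0, var_tau ** 0.5)          # one effect per level i
        for j in range(J):
            delta = rng.gauss(0.0, var_delta ** 0.5)  # fresh draw for every (i, j): j is nested in i
            for k in range(K):
                eps = rng.gauss(0.0, var_eps ** 0.5)
                data.append((i, j, k, mu + tau + delta + eps))
    return data


def intracluster_rho(var_tau, var_delta, var_eps=1.0):
    """rho = sigma_tau^2 / (sigma_tau^2 + sigma_delta^2 + sigma_eps^2)."""
    return var_tau / (var_tau + var_delta + var_eps)


# the six variance settings of Section 6.3.1
rhos = sorted({round(intracluster_rho(t, d), 2) for t in (1, 5, 10) for d in (0.1, 0.5)})
print(rhos)
```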

Table 6.2: Simulated confidence coefficient for a (p = 0.90, γ = 0.95) upper tolerance limit for a two-way nested random effects model (balanced data setting)

(I, J, K)  (3, 3, 3)          (3, 6, 3)          (3, 3, 6)          (3, 6, 6)
ρ          MS    GPQ   MLS    MS    GPQ   MLS    MS    GPQ   MLS    MS    GPQ   MLS
0.40       0.991 0.986 0.975  0.973 0.974 0.965  0.968 0.981 0.972  0.948 0.969 0.961
0.48       0.983 0.981 0.970  0.959 0.968 0.962  0.955 0.972 0.965  0.933 0.963 0.958
0.77       0.966 0.963 0.956  0.941 0.957 0.953  0.949 0.962 0.958  0.931 0.960 0.955
0.82       0.953 0.957 0.951  0.940 0.960 0.955  0.932 0.955 0.949  0.930 0.958 0.954
0.87       0.964 0.961 0.956  0.943 0.957 0.950  0.953 0.962 0.956  0.937 0.958 0.953
0.90       0.957 0.959 0.953  0.945 0.956 0.950  0.946 0.958 0.950  0.941 0.957 0.951

MS: modified Satterthwaite approach using (6.15); GPQ: generalised pivotal quantity using (6.16); MLS: modified large sample approach using (6.18).

A conservative confidence coefficient was obtained for smaller values of I, J, K, and ρ. This is not surprising, since it is well documented that smaller values of ρ tend to produce either conservative or liberal coverage probabilities (Hoffman, 2010; Krishnamoorthy and Mathew, 2004; Liao et al., 2005). For the (3, 3, 3) design, the confidence coefficients were conservative but improved as the intracluster correlation increased, that is, as $\sigma_\tau^2$ became more dominant. In this particular case, the GPQ approach was better than both our proposed method and the MLS. Our proposed method achieved better coverage with the (3, 6, 3) design, whilst the MLS approach still gave slightly conservative coverage. Hoffman (2010) already noted this, reporting that the analytical approach is at times conservative unless large values of I, J, and K are used.

For the (3, 3, 6) design, the GPQ approach and our proposed approach were quite comparable, and both gave good coverage, whilst the MLS showed conservative results. The GPQ approach is expected to outperform the two approximate approaches. Its limitation, however, is that it applies only to a limited set of model structures and, beyond the one-way model, only to balanced data. Overall, under the two-way nested random effects model, our proposed approach performs better than the MLS approach of Hoffman (2010) and comparably to the GPQ approach of Liao et al. (2005).

6.3.2 Two-way crossed random effects model with interaction

The two-way crossed random effects model with interaction is given by

$$Y_{ijk} = \mu + \tau_i + \beta_j + \delta_{ij} + \varepsilon_{ijk}, \qquad i = 1,\dots,I;\; j = 1,\dots,J;\; k = 1,\dots,K, \tag{6.20}$$

with $\tau_i \sim N(0,\sigma_\tau^2)$, $\beta_j \sim N(0,\sigma_\beta^2)$, $\delta_{ij} \sim N(0,\sigma_\delta^2)$, and $\varepsilon_{ijk} \sim N(0,\sigma_\varepsilon^2)$. $Y_{ijk}$ follows a normal distribution with mean µ and variance $\sigma_\tau^2 + \sigma_\beta^2 + \sigma_\delta^2 + \sigma_\varepsilon^2$. As before, the mean was set to µ = 0. The variances of the random effects were $\sigma_\tau^2 \in \{1, 5, 10\}$, $\sigma_\beta^2 \in \{0.1, 0.5\}$, $\sigma_\delta^2 \in \{0.1, 0.5\}$, and $\sigma_\varepsilon^2 = 1.0$, again similar to those used by Sharma and Mathew (2012). The same (I, J, K) designs were used as for the two-way nested random effects model.
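The crossed model differs from the nested one only in how effects are shared. Each $\beta_j$ is drawn once and reused for every level i, and the interaction $\delta_{ij}$ is drawn once per cell. Below is a Python sketch under the same assumptions as before (hypothetical names, $\sigma_\varepsilon^2 = 1.0$ by default):

```python
import random


def simulate_crossed(I, J, K, var_tau, var_beta, var_delta, var_eps=1.0, mu=0.0, rng=None):
    """Simulate one balanced data set from model (6.20).

    Returns (i, j, k, y) tuples with y = mu + tau_i + beta_j + delta_ij + eps_ijk.
    """
    if rng is None:
        rng = random.Random()
    tau = [rng.gauss(0.0, var_tau ** 0.5) for _ in range(I)]
    beta = [rng.gauss(0.0, var_beta ** 0.5) for _ in range(J)]  # drawn once: shared by every i (crossed)
    data = []
    for i in range(I):
        for j in range(J):
            delta = rng.gauss(0.0, var_delta ** 0.5)  # interaction: one draw per (i, j) cell
            for k in range(K):
                eps = rng.gauss(0.0, var_eps ** 0.5)
                data.append((i, j, k, mu + tau[i] + beta[j] + delta + eps))
    return data


def total_variance(var_tau, var_beta, var_delta, var_eps=1.0):
    """Marginal variance of Y_ijk under (6.20)."""
    return var_tau + var_beta + var_delta + var_eps
```

Under the settings above, the marginal variance ranges from 2.2 ($\sigma_\tau^2 = 1$, $\sigma_\beta^2 = \sigma_\delta^2 = 0.1$) to 12.0 ($\sigma_\tau^2 = 10$, $\sigma_\beta^2 = \sigma_\delta^2 = 0.5$).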

The results in Table 6.3 suggest that our proposed approach is comparable to the MLS approach of Hoffman (2010). Our approach gave good coverage, whereas the MLS gave slightly conservative coverage for lower values of the intracluster correlation. The proposed approach also works even better when I, J, and K are small. This is an added advantage, since bioassay analyses usually involve small numbers of batches, runs, or repetitions.

The (3, 3, 3) design gave rather conservative confidence coefficients for both our proposed approach and the MLS, and the values improved slightly for higher values of ρ. Both approaches achieved good confidence coefficients for the (3, 6, 3) and (3, 3, 6) designs, but our proposed approach tended to outperform the MLS approach, especially for small values of ρ. The MLS results improved considerably for the (3, 6, 6) design, with coverage approximately equal to the nominal value (γ = 0.95). For this design, on the other hand, the proposed approach gave slightly liberal coverage, especially when ρ < 0.71.

Table 6.3: Simulated confidence coefficient for a (p = 0.90, γ = 0.95) upper tolerance limit for a two-way crossed random effects model with an interaction (balanced data)

(I, J, K)  (3, 3, 3)    (3, 6, 3)    (3, 3, 6)    (3, 6, 6)
ρ          MS    MLS    MS    MLS    MS    MLS    MS    MLS
0.33       0.980 0.982  0.954 0.960  0.946 0.977  0.918 0.950
0.39       0.976 0.979  0.951 0.963  0.943 0.974  0.917 0.952
0.46       0.979 0.980  0.955 0.966  0.949 0.977  0.918 0.954
0.71       0.968 0.967  0.942 0.955  0.948 0.968  0.931 0.951
0.76       0.963 0.964  0.943 0.954  0.944 0.965  0.929 0.952
0.81       0.961 0.962  0.938 0.955  0.940 0.965  0.926 0.951
0.83       0.966 0.961  0.947 0.954  0.953 0.964  0.940 0.950
0.86       0.963 0.958  0.947 0.953  0.952 0.962  0.939 0.949
0.90       0.962 0.957  0.948 0.954  0.950 0.961  0.942 0.951

MS: modified Satterthwaite approach using (6.15); MLS: modified large sample approach using (6.18).

In addition, the data simulated from the crossed model were used to assess how well the proposed approach performs with unbalanced data. In each simulated data set, 80% or 70% of the observations were selected by simple random sampling (using the SAS procedure PROC SURVEYSELECT). Only a subset of the data is therefore used. Because no constraints were imposed on the selection, the retained data are unbalanced.
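The subsampling step can be sketched in Python with `random.sample`, which draws without replacement. This stands in for PROC SURVEYSELECT and is an assumption about the mechanics, not the original SAS code. Keeping 80% of a (3, 3, 3) layout retains 22 of the 27 observations. These cannot be spread equally over the nine (i, j) cells, so the retained data are necessarily unbalanced.

```python
import random
from collections import Counter


def srs_subsample(data, fraction, rng=None):
    """Simple random sample, without replacement, of round(fraction * n) records."""
    if rng is None:
        rng = random.Random()
    return rng.sample(data, round(fraction * len(data)))


# hypothetical balanced (3, 3, 3) layout: one record (i, j, k) per observation
balanced = [(i, j, k) for i in range(3) for j in range(3) for k in range(3)]
kept = srs_subsample(balanced, 0.80, rng=random.Random(2016))
cell_sizes = Counter((i, j) for i, j, _ in kept)  # observations left in each (i, j) cell
print(len(kept), sorted(cell_sizes.values()))
```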

Table 6.4: Simulated confidence coefficient for a (p = 0.90, γ = 0.95) upper tolerance limit for a two-way crossed random effects model with an interaction (unbalanced data)

% missing  20%                                30%
ρ          (3,3,3) (3,6,3) (3,3,6) (3,6,6)    (3,3,3) (3,6,3) (3,3,6) (3,6,6)
0.33       0.985   0.965   0.959   0.927      0.986   0.974   0.967   0.935
0.39       0.981   0.962   0.955   0.930      0.985   0.969   0.962   0.932
0.46       0.982   0.962   0.962   0.921      0.988   0.969   0.971   0.931
0.71       0.973   0.956   0.954   0.940      0.978   0.961   0.957   0.942
0.76       0.969   0.950   0.948   0.936      0.976   0.955   0.953   0.936
0.81       0.965   0.944   0.945   0.927      0.973   0.946   0.946   0.928
0.83       0.967   0.953   0.957   0.943      0.975   0.958   0.959   0.945
0.86       0.966   0.953   0.954   0.943      0.973   0.954   0.957   0.944
0.90       0.966   0.951   0.951   0.940      0.971   0.951   0.954   0.942


The retained data were then used to estimate the upper tolerance limit as before. The results in Table 6.4 show good coverage in general. For smaller values of ρ, however, the results were slightly more conservative than in the balanced data setting, and this was more pronounced when only 70% of the data were used.

6.3.3 Application to bioassay analysis data

Specification limits needed to be determined for a new medicinal product at the pharmaceutical company MSD, Oss, The Netherlands. Specification limits are a regulatory requirement for products manufactured and released by pharmaceutical companies. The main characteristic of this product is its relative bioactivity with respect to an accepted reference standard. The company decided to measure the bioactivity of five batches (I = 5) in six bioassay runs (J = 6), so that batch and assay variability could be estimated simultaneously. In each bioassay run, three measurements of the relative bioactivity were obtained for each batch. Because a crossed experimental design was used, a suitable statistical model for this study is a two-way crossed random effects model with an interaction between batch and bioassay run, similar to Model (6.20) in Section 6.3.2. The terms τi, βj, and δij represent the batches, the bioassay runs, and their interaction, respectively. The response Yijk is the kth measurement of the relative bioactivity in the jth bioassay run for the ith batch.

The objective is to estimate a (0.90, 0.95) upper tolerance limit for the distribution $Y_{ijk} \sim N(\mu, \sigma_T^2)$. Such a limit guarantees, with 95% confidence, that at least 90% of the relative bioactivities will not exceed it. Our proposed approach was used to estimate the upper tolerance limit. Following the procedure of Section 6.3.2, the parameter estimates were $\bar{Y} = 4.626$, $\hat\sigma_T^2 = 0.00943$, and $\hat\sigma_M^2 = 0.000260$, with the parameters defined as before. The (0.90, 0.95) upper tolerance limit was found to be utl = 4.818. As expected, this is slightly lower than the limit from the analytical approach of Hoffman (2010), utl = 4.823. Thus, with 95% confidence, at least 90% of the relative bioactivities measured with this bioassay are expected to fall below the upper tolerance limit of 4.818.
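To relate the two reported limits to the estimates, one can back out the factor k implied by utl = $\bar{Y} + k\,\hat\sigma_T$. This assumes the limit has that common form, which is an assumption made here. The sketch is a reading aid only and does not reproduce formula (6.15). For comparison, it also computes the plug-in estimate of the 90% quantile, which carries no confidence adjustment.

```python
import math
from statistics import NormalDist

y_bar, var_t, var_m = 4.626, 0.00943, 0.000260  # estimates reported in the text
sd_t = math.sqrt(var_t)

# factor k implied by utl = y_bar + k * sd_t (assumed form, used only as a reading aid)
implied_k = {name: (utl - y_bar) / sd_t
             for name, utl in (("proposed", 4.818), ("Hoffman MLS", 4.823))}
plug_in = y_bar + NormalDist().inv_cdf(0.90) * sd_t  # estimated 90% quantile, no confidence adjustment
ratio = var_m / var_t                                 # estimated sigma_M^2 / sigma_T^2

for name, k in implied_k.items():
    print(f"{name}: implied k = {k:.3f}")
print(f"plug-in 90% quantile = {plug_in:.3f}; variance ratio = {ratio:.4f}")
```

The implied factors, about 1.98 and 2.03, are both well above $z_{0.90} \approx 1.28$, as expected from the 95% confidence requirement. The MLS factor is roughly 2.6% larger.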

6.4 Discussion and conclusion

This study set out to assess the performance of a simple and appropriate approach for computing tolerance intervals for essentially any variance component model. Our approach estimates a Student t distribution based tolerance factor that accounts for the uncertainty in estimating the unknown ratio of the variance of the mean ($\sigma_M^2$) to the variance of the observations ($\sigma_T^2$). The estimate of this ratio is multiplied by the factor $df/\chi^2_{df,1-\gamma}$, which approaches one as the number of degrees of freedom increases ($df \to \infty$). Hence, the effect of this factor vanishes for large sample sizes. For small sample sizes, however, it is essential to accommodate the uncertainty in the estimation of the tolerance factor. The degrees of freedom used are the modified Satterthwaite degrees of freedom suggested by Van den Heuvel (2010).
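The behaviour of the factor $df/\chi^2_{df,1-\gamma}$ follows from the chi-square distribution itself. Below is a short sketch that uses the Wilson–Hilferty approximation, introduced here for illustration only, with $z_q$ denoting the standard normal $q$-quantile:

$$\frac{\chi^2_{df}}{df} \xrightarrow{\;p\;} 1 \quad (df \to \infty), \qquad \chi^2_{df,q} \approx df\left(1 - \frac{2}{9\,df} + z_q\sqrt{\frac{2}{9\,df}}\right)^{3},$$

$$\frac{df}{\chi^2_{df,1-\gamma}} \approx \left(1 - \frac{2}{9\,df} + z_{1-\gamma}\sqrt{\frac{2}{9\,df}}\right)^{-3} \longrightarrow 1 \quad (df \to \infty).$$

Because $z_{1-\gamma} < 0$ for γ > 0.5, the factor exceeds one and therefore inflates the estimated ratio. For γ = 0.95 ($z_{0.05} \approx -1.645$), it is approximately 2.54 for df = 10, 1.28 for df = 100, and 1.08 for df = 1000.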

The simulation study showed that our proposed method performs well. For the two-way nested random effects model, it was similar to, or slightly inferior to, the GPQ approach, and better than the conservative analytical approach of Hoffman (2010). The GPQ approach, however, cannot be applied to all random effects models: for unbalanced data, it is only applicable to the one-way random effects model (Krishnamoorthy and Mathew, 2004, 2009). The analytical approach of Hoffman (2010), which is based on the MLS, is applicable to any mixed or random effects model, but it tends to be overly conservative. Our approach is simple to use and was also shown to give good results for unbalanced data, which is a considerable advantage over the existing methods. Note also that the MLS and GPQ approaches express the variance as a linear combination of mean squares. When the data are unbalanced, it is not trivial to determine the coefficients $a_i$ and $b_i$ in equation (6.16) or the $c_i$ in equation (6.18). The proposed approach avoids this difficulty by using variance component estimates instead of mean squares.

Bootstrap sampling offers a different way to estimate tolerance intervals, and it is frequently used to estimate confidence intervals for iid random variables. Several authors have proposed bootstrap procedures for computing tolerance intervals, for example, Fernholz and Gillespie (2001), Hoffman (2010), Rebafka et al. (2007), and Shoung et al. (2005). These procedures have been criticised, mainly for higher-order random effects models, because the probability that at least one variance component has a negative estimate increases with the number of random effects. Bootstrap approaches for random effects models are also not straightforward in practice. The resulting total variance tends to be substantially underestimated, so a correction factor is needed, and this factor may not be trivial to determine. Selecting the unit of bootstrapping (the observation or the group level) is another issue. None of these bootstrap strategies appears to be uniformly best across all settings.

The current findings demonstrate that there are now three appropriate general methods for tolerance limits: Sharma and Mathew (2012), Hoffman (2010), and our approach. The method of Sharma and Mathew (2012) is highly complex to execute for higher-order and unbalanced mixed models. The Hoffman (2010) method is analytically easy to implement, but it is generally conservative. Our approach is simple and performs well, but further study is needed to understand its performance in other settings and to establish its theoretical foundation.


References

Bagui, S., Bhaumik, D., and Parnes, M. (1996), “One-sided Tolerance Limits for Unbalanced m-way Random Effects ANOVA Models,” Journal of Applied Statistical Science, 3, 135–148.

Bhaumik, D. and Kulkarni, P. (1991), “One-sided Tolerance Limits for Unbalanced One-way ANOVA Random Effects Model,” Communications in Statistics - Theory and Methods, 20, 1665–1675.

Bhaumik, D. and Kulkarni, P. (1996), “A Simple and Exact Method of Constructing Tolerance Intervals for the One-way ANOVA with Random Effects,” The American Statistician, 50, 319–323.

Burdick, R. and Graybill, F. (1992), Confidence Intervals on Variance Components, New York: Marcel Dekker.

Fernholz, L. and Gillespie, J. (2001), “Content-corrected Tolerance Limits Based on the Bootstrap,” Technometrics, 43, 147–155.

Finney, D. J. (1978), Statistical Method in Biological Assay, London: Charles Griffin & Co. Ltd.

Fonseca, M., Mathew, T., Mexia, T. J., and Zmyślony, R. (2007), “Tolerance Intervals in a Two-way Nested Model with Mixed or Random Effects,” Statistics, 41, 289–300.

Graybill, F. A. and Wang, C.-M. (1980), “Confidence Intervals on Nonnegative Linear Combinations of Variances,” Journal of the American Statistical Association, 75, 869–873.

Hoffman, D. (2010), “One-sided Tolerance Limits for Balanced and Unbalanced Random Effects Models,” Technometrics, 52, 303–312.

Hoffman, D. and Kringle, R. (2005), “Two-sided Tolerance Intervals for Balanced and Unbalanced Random Effects Models,” Journal of Biopharmaceutical Statistics, 15, 283–293.

Hong, B., Fisher, T., Sult, T., Maxwell, C., Mickelson, J., Kishino, H., and Locke, M. (2014), “Model-based Tolerance Intervals Derived from Cumulative Historical Composition Data: Application for Substantial Equivalence Assessment of a Genetically Modified Crop,” Journal of Agricultural and Food Chemistry, 62, 9916–9926.

Krishnamoorthy, K. and Lian, X. (2012), “Closed-form Approximate Tolerance Intervals for Some General Linear Models and Comparison Studies,” Journal of Statistical Computation and Simulation, 82, 547–563.


Krishnamoorthy, K. and Mathew, T. (2004), “One-Sided Tolerance Limits in Balanced and Unbalanced One-way Random Models Based on Generalized Confidence Intervals,” Technometrics, 46, 44–52.

Krishnamoorthy, K. and Mathew, T. (2009), Statistical Tolerance Regions: Theory, Applications, and Computation, New Jersey: John Wiley and Sons.

Li, X. (2006), “Comparison of One-sided Tolerance Limits in Unbalanced One-way Random Models,” Communications in Statistics - Simulation and Computation, 35, 321–329.

Liao, C., Lin, T., and Iyer, H. (2005), “One- and Two-sided Tolerance Intervals for General Balanced Mixed Models and Unbalanced One-way Random Models,” Technometrics, 47, 323–335.

Mee, W. and Owen, D. B. (1983), “Improved Factors for Balanced One-way ANOVA Random Model,” Journal of the American Statistical Association, 78, 901–905.

Rebafka, T., Clémençon, S., and Feinberg, M. (2007), “Bootstrap-based Tolerance Intervals for Application to Method Validation,” Chemometrics and Intelligent Laboratory Systems, 89, 69–81.

Romero-Villafranca, R., Zúnica, L., Romero-Zúnica, R., and Pagura, J. A. (2008), “One-sided Tolerance Limits for Unbalanced One-way Random Effects Models: A Generalized Mee and Owen Procedure,” Journal of Statistical Computation and Simulation, 78, 1215–1227.

Sharma, G. and Mathew, T. (2012), “One-sided and Two-sided Tolerance Intervals in General Mixed and Random Effects Models Using Small-sample Asymptotics,” Journal of the American Statistical Association, 107, 258–267.

Shoung, J.-M., Altan, S., and Cabrera, J. (2005), “Double Bootstrapping a Tolerance Limit,” Journal of Biopharmaceutical Statistics, 15, 367–373.

Thomas, J. D. and Hultquist, R. A. (1978), “Interval Estimation for the Unbalanced Case of the One-way Random Effects Model,” Annals of Statistics, 6, 582–587.

Van den Heuvel, E. (2010), “A Comparison of Estimation Methods on the Coverage Probability of Satterthwaite Confidence Intervals for Assay Precision With Unbalanced Data,” Communications in Statistics - Simulation and Computation, 39, 777–794.

Vangel, M. G. (1992), “New Methods for One-sided Tolerance Limits for a One-way Balanced Random-effects ANOVA Model,” Technometrics, 34, 176–185.

Weerahandi, S. (1993), “Generalized Confidence Intervals,” Journal of the American Statistical Association, 88, 899–905.


CHAPTER 7

Estimation of shelf life of a drug product in a stability degradation study


Abstract

The shelf life of a drug product is a very important characteristic in the pharmaceutical industry. The statistical analysis for shelf life estimation is described in the ICH guidelines, but the suggested approach has received a number of criticisms. One important issue is that the guidelines implicitly suggest conducting the statistical analysis per storage condition instead of a more efficient combined analysis. The guidelines have also been written mainly for analytical methods and much less for bioassays, which typically show more variability than chemical analyses. Efficient experimentation and statistical analysis are therefore essential for estimating the shelf life of the bioactivity of drug products. This manuscript compares two specific experimental designs and implements a statistical analysis that accounts for both the bioassay run structure used in the shelf life study and the within-bioassay-run precision. The experimental design that confounds bioassay runs with batches provides less biased shelf life estimates than the design that confounds bioassay runs with storage conditions. The optimal experimental design implies a simultaneous statistical analysis of all the bioactivities in a shelf life study, which yields realistic and meaningful shelf life estimates.

Keywords: Bioassay; measurement error; mixed models; stability; storage conditions


7.1 Introduction

The shelf life of drug products is a very important characteristic in the pharmaceutical industry. According to the International Conference on Harmonisation guideline ICH Q1A(R2), “The purpose of stability testing is to provide evidence on how the quality of a drug substance or drug product varies with time under the influence of a variety of environmental factors such as temperature, humidity, and light, and to establish a retest period for the drug substance or a shelf life for the drug product and recommended storage conditions”. The ICH guidelines also state how the data from a shelf life study should be analysed. The common approach treats the multiple batches under study as fixed effects and the different storage conditions as separate experiments, so that each storage condition is analysed separately. In addition, the significance level for testing a common slope across batches (poolability) in a zero-order degradation profile has been set at α = 0.25. This choice limits the type II error and improves the power to detect differences in degradation rates between batches. However, the suggested analysis is not without criticism.

Van den Heuvel et al. (2011) argued for estimating shelf life for multiple storage conditions simultaneously instead of separately for each storage condition. A simulation study in a subsequent article demonstrated the relevance of this approach (Almalik et al., 2014). Its results showed that the proposed approach yields more degrees of freedom and better precision than the approach preferred by the ICH. Quinlan et al. (2013) criticised treating batches as fixed effects, because this limits the generalisability of the results to other batches. The authors argue that treating batches as random improves the interpretation of the shelf life and eliminates poolability issues.

None of the above-mentioned studies takes bioassays into account when estimating the shelf life. This is unfortunate, since bioassays are often less precise than analytical methods, and the number of bioassay runs deserves serious consideration. Because bioassays may use living animals, efficient experimentation may also be an important aspect of shelf life studies. This raises the question of how to incorporate bioassay runs in the study design, since a single bioassay run can often test multiple samples simultaneously. Bioassay runs, however, introduce additional variation in the form of between-run variability. The variance in a shelf life study then no longer comes from the residuals alone. It is therefore important to know where to allocate this additional variation in the experiment to maximise the precision of the shelf life estimate. Bioassays also provide a tripartite outcome (Mzolo et al., 2015, 2013): a bioactivity, a standard error, and the degrees of freedom. This added information should not be ignored when optimising the shelf life study, although using it results in a more elaborate analysis.

This chapter explores and compares two design structures of bioassay runs in a shelf life study for the estimation of shelf life, including all bioassay information in the statistical analysis. The statistical models assume non-poolable degradation rates, but poolability is also included and investigated. The chapter extends the work of Van den Heuvel et al. (2011) and Almalik et al. (2014) for analytical methods: it does not ignore the run structure, and it extends the statistical analysis to bioassay data.

7.2 Methodology

In our setup for long-term stability studies, we assume that the required batches (m ≥ 3) can be analysed in one bioassay run, and likewise the necessary storage conditions. We also assume that the bioassay cannot test all batches from all storage conditions simultaneously. This leaves two options for the experimental design. The first option tests all batches simultaneously in one bioassay run, with separate bioassay runs for the storage conditions. The second option tests all storage conditions of a batch in one bioassay run, with separate bioassay runs for the batches. Thus, in the first design the bioassay runs are nested within storage conditions, and in the second design they are nested within batches.

7.2.1 Statistical models

Based on the two designs mentioned above, the statistical models for a zero-order degradation model of the log potency $Y_{hijk}$ for batch h, at storage condition i, at time point $t_j$, and in bioassay run k are

$$Y_{hijk} = \beta_{0h} + \beta_{1hi} t_j + \delta_{k(ij)} + \varepsilon_{hijk}, \tag{7.1a}$$

$$Y_{hijk} = \beta_{0h} + \beta_{1hi} t_j + \delta_{k(hj)} + \varepsilon_{hijk}, \tag{7.1b}$$

where $h = 1, \dots, H$; $i = 1, \dots, I$; $j = 1, \dots, J$; $k = 1, \dots, K$. Here $\beta_{0h}$ is the intercept of batch h and $\beta_{1hi}$ is the slope of batch h at condition i. The time points are $t_j \in \{0, 3, 6, 9, 12\}$, following the ICH recommendations (ICH Q1A(R2)), and $\varepsilon_{hijk} \sim N(0, \sigma_E^2)$ is the residual. For Model (7.1a), bioassay runs are nested within storage conditions and time, with $\delta_{k(ij)} \sim N(0, \sigma_R^2)$ independently and identically distributed and independent of the residuals. For Model (7.1b), bioassay runs are nested within batches and time, with $\delta_{k(hj)} \sim N(0, \sigma_R^2)$ independently and identically distributed and independent of the residuals. Models (7.1a) and (7.1b) are non-hierarchical: the intercept does not depend on the storage condition, while the slope always does, even when batches are poolable ($\beta_{1hi} = \beta_{1i}$ for all h).
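The two designs differ only in which samples share a run effect. Below is a Python sketch that simulates log potencies under either model. The helper is hypothetical, and the intercepts, slopes, and variances in the usage example are made-up illustration values, not data from the study. With K replicate runs per time point, design (7.1a) uses I·J·K runs of H samples each, and design (7.1b) uses H·J·K runs of I samples each.

```python
import random

TIMES = (0, 3, 6, 9, 12)  # time points t_j used in the text


def simulate_stability(design, b0, b1, var_run, var_res, K=1, rng=None):
    """Simulate log potencies under Model (7.1a) (design="a") or Model (7.1b) (design="b").

    b0[h] is the intercept of batch h; b1[h][i] is its slope at storage condition i.
    Design "a": run effect delta_k(ij), shared by all batches tested at condition i and time j.
    Design "b": run effect delta_k(hj), shared by all conditions of batch h tested at time j.
    Returns the records and the number of distinct bioassay runs.
    """
    if design not in ("a", "b"):
        raise ValueError('design must be "a" or "b"')
    if rng is None:
        rng = random.Random()
    run_effect = {}  # run key -> its single random effect
    records = []
    for h, slopes in enumerate(b1):
        for i, slope in enumerate(slopes):
            for j, t in enumerate(TIMES):
                for k in range(K):
                    key = (i, j, k) if design == "a" else (h, j, k)
                    if key not in run_effect:
                        run_effect[key] = rng.gauss(0.0, var_run ** 0.5)
                    y = b0[h] + slope * t + run_effect[key] + rng.gauss(0.0, var_res ** 0.5)
                    records.append({"h": h, "i": i, "t": t, "run": key, "y": y})
    return records, len(run_effect)


# hypothetical example: H = 3 batches, I = 2 storage conditions
b0 = [4.60, 4.62, 4.58]
b1 = [[-0.002, -0.010], [-0.003, -0.012], [-0.002, -0.011]]
recs_a, runs_a = simulate_stability("a", b0, b1, var_run=0.01, var_res=0.0, rng=random.Random(1))
recs_b, runs_b = simulate_stability("b", b0, b1, var_run=0.01, var_res=0.0, rng=random.Random(1))
print(runs_a, runs_b)
```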


For each bioactivity we also obtain an estimator $S_{hijk}$ of its standard error, with a corresponding number of degrees of freedom $df_{hijk}$. The residuals $\varepsilon_{hijk}$ are thus partitioned into two parts. The first part represents the imprecision of the bioactivity estimate. The second part represents variability from a possible misfit of the selected zero-order degradation model or from other sources of variation. If the regression model represents the true degradation profile, we may assume that $df_{hijk} S_{hijk}^2/\sigma_E^2 \sim \chi^2_{df_{hijk}}$. Under the homogeneity assumption (Bartlett, 1937), the variance $\sigma_E^2$ is best estimated by

$$S_P^2 = \frac{\sum_{h=1}^{H}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} df_{hijk}\, S_{hijk}^2}{\sum_{h=1}^{H}\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} df_{hijk}}. \tag{7.2}$$
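Equation (7.2) is a degrees-of-freedom-weighted average of the squared standard errors. Below is a minimal Python version, in which the function name is hypothetical and the inputs are the reported standard errors $S_{hijk}$ with their degrees of freedom.

```python
def pooled_variance(dfs, std_errors):
    """S_P^2 of (7.2): sum(df * S^2) / sum(df) over all bioassay results."""
    if not dfs or len(dfs) != len(std_errors):
        raise ValueError("need one degrees-of-freedom value per standard error")
    return sum(d * s * s for d, s in zip(dfs, std_errors)) / sum(dfs)


# hypothetical example: two bioassay results with S = 1.0 (df = 10) and S = 2.0 (df = 20)
print(pooled_variance([10, 20], [1.0, 2.0]))
```

Results with more degrees of freedom carry proportionally more weight, so a precise assay result influences $S_P^2$ more than an imprecise one.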

Alternatively, if the residuals capture more sources of variability, that is, $\varepsilon_{hijk} \sim N(0, \sigma^2 + \sigma_{0E}^2)$, then $S_{hijk}^2$ is an estimate of $\sigma^2$, and $\sigma_{0E}^2$ must still be estimated from the log potency data. The conditional density function of the log potency $Y_{hijk}$, given the run effect $\delta_{k(ij)} = z$ (or $\delta_{k(hj)} = z$) and the precision $S_{hijk} = s$, is now assumed to be

$$f\left(Y_{hijk} \mid \delta_{k(ij)} = z,\, S_{hijk} = s\right) = \frac{1}{\sqrt{s^2 + \sigma_{0E}^2}}\,\phi\!\left(\frac{Y_{hijk} - (\beta_{0h} + \beta_{1hi} t_j + z)}{\sqrt{s^2 + \sigma_{0E}^2}}\right), \tag{7.3}$$

with φ the standard normal density. This density is the same for Models (7.1a) and (7.1b). Since the run effect is not observed, it is integrated out of the conditional density, and the resulting marginal densities differ between the two models. For Model (7.1a), the observations $Y_{1ijk}, Y_{2ijk}, \dots, Y_{Hijk}$ are independent given the run effect, whereas for Model (7.1b) the observations $Y_{h1jk}, Y_{h2jk}, \dots, Y_{hIjk}$ are independent given the run effect. The marginal joint density of the log potencies in one run for Model (7.1a) is

$$f\left(Y_{1ijk}, \dots, Y_{Hijk} \mid S_{1ijk} = s_1, \dots, S_{Hijk} = s_H\right) = \int_{\mathbb{R}} \prod_{h=1}^{H} \frac{1}{\sqrt{s_h^2 + \sigma_{0E}^2}}\,\phi\!\left(\frac{Y_{hijk} - \beta_{0h} - \beta_{1hi} t_j - z}{\sqrt{s_h^2 + \sigma_{0E}^2}}\right) \frac{1}{\sigma_R}\,\phi\!\left(\frac{z}{\sigma_R}\right) dz, \tag{7.4}$$

and for Model (7.1b) it is

$$f\left(Y_{h1jk}, \dots, Y_{hIjk} \mid S_{h1jk} = s_1, \dots, S_{hIjk} = s_I\right) = \int_{\mathbb{R}} \prod_{i=1}^{I} \frac{1}{\sqrt{s_i^2 + \sigma_{0E}^2}}\,\phi\!\left(\frac{Y_{hijk} - \beta_{0h} - \beta_{1hi} t_j - z}{\sqrt{s_i^2 + \sigma_{0E}^2}}\right) \frac{1}{\sigma_R}\,\phi\!\left(\frac{z}{\sigma_R}\right) dz. \tag{7.5}$$
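Because the run effect enters additively, the integrals in (7.4) and (7.5) have a closed form. The vector of log potencies in one run is multivariate normal with mean $\beta_{0h} + \beta_{1hi} t_j$ and covariance $\mathrm{diag}(s^2 + \sigma_{0E}^2) + \sigma_R^2 \mathbf{1}\mathbf{1}^\top$. The Python sketch below evaluates that log density using the Sherman–Morrison formula and the matrix determinant lemma. It then checks the result against a direct trapezoidal integration over z. The function names and the example numbers are illustrative assumptions.

```python
import math


def run_log_density(y, means, s, var_0e, var_r):
    """Closed-form log of (7.4) or (7.5) for the observations of one bioassay run.

    y: log potencies in the run; means: their fixed parts beta_0h + beta_1hi * t_j;
    s: their within-run standard errors. Integrating out the shared run effect gives a
    normal vector with covariance diag(s^2 + var_0e) + var_r * (all-ones matrix).
    """
    d = [sh * sh + var_0e for sh in s]
    r = [yh - mh for yh, mh in zip(y, means)]
    denom = 1.0 + var_r * sum(1.0 / dh for dh in d)
    log_det = sum(math.log(dh) for dh in d) + math.log(denom)  # matrix determinant lemma
    weighted = sum(rh / dh for rh, dh in zip(r, d))
    quad = sum(rh * rh / dh for rh, dh in zip(r, d)) - var_r * weighted ** 2 / denom  # Sherman-Morrison
    return -0.5 * (len(y) * math.log(2.0 * math.pi) + log_det + quad)


def run_log_density_numeric(y, means, s, var_0e, var_r, n_grid=4001, width=10.0):
    """The same log density, integrating the run effect z out with the trapezoidal rule."""
    sd_r = math.sqrt(var_r)
    lo, step = -width * sd_r, 2.0 * width * sd_r / (n_grid - 1)
    total = 0.0
    for g in range(n_grid):
        z = lo + g * step
        log_f = -0.5 * (z / sd_r) ** 2 - math.log(sd_r * math.sqrt(2.0 * math.pi))  # run-effect density
        for yh, mh, sh in zip(y, means, s):
            v = sh * sh + var_0e
            log_f += -0.5 * (yh - mh - z) ** 2 / v - 0.5 * math.log(2.0 * math.pi * v)  # density (7.3)
        total += (0.5 if g in (0, n_grid - 1) else 1.0) * math.exp(log_f)
    return math.log(total * step)


y, m, s = [4.60, 4.65, 4.58], [4.62, 4.62, 4.62], [0.05, 0.04, 0.06]  # hypothetical run
print(run_log_density(y, m, s, 0.01, 0.04), run_log_density_numeric(y, m, s, 0.01, 0.04))
```

Summing `run_log_density` over all runs gives $\ell_1$ or $\ell_2$. The design enters only through which observations are grouped into one run.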

The full conditional likelihoods for the two models, given the within-bioassay variances, are

$$L_1(\beta; \sigma_R^2; \sigma_{0E}^2) = \prod_{i=1}^{I}\prod_{j=1}^{J}\prod_{k=1}^{K} f\left(Y_{1ijk}, \dots, Y_{Hijk} \mid S_{1ijk} = s_1, \dots, S_{Hijk} = s_H\right),$$

$$L_2(\beta; \sigma_R^2; \sigma_{0E}^2) = \prod_{h=1}^{H}\prod_{j=1}^{J}\prod_{k=1}^{K} f\left(Y_{h1jk}, \dots, Y_{hIjk} \mid S_{h1jk} = s_1, \dots, S_{hIjk} = s_I\right), \tag{7.6}$$

where $\beta = (\beta_{01}, \dots, \beta_{0H}, \beta_{111}, \dots, \beta_{11I}, \dots, \beta_{1H1}, \dots, \beta_{1HI})$ is the vector of all intercepts and degradation slopes.

The parameters in (7.1a) and (7.1b) are estimated by maximising the logarithm of $L_1(\beta;\sigma_R^2;\sigma_{0E}^2)$ and of $L_2(\beta;\sigma_R^2;\sigma_{0E}^2)$, respectively. These log likelihoods have been studied elsewhere (Verbeke and Molenberghs, 2000), at least when $s_{hijk}^2 + \sigma_{0E}^2$ is replaced by an unknown parameter $\sigma_E^2$. The likelihood functions are products of multivariate normal densities, since $(Y_{1ijk}, Y_{2ijk}, \dots, Y_{Hijk})$ and $(Y_{h1jk}, Y_{h2jk}, \dots, Y_{hIjk})$ are multivariate normally distributed. With $\sigma_E^2$ in place of $s_{hijk}^2 + \sigma_{0E}^2$, most statistical software can maximise the likelihood. Incorporating the $s_{hijk}^2$ in the likelihood, however, is not straightforward, and not every statistical software package supports it. A two-step approach could be used instead: estimate $\sigma_E^2$ first, then estimate $\sigma_{0E}^2$ by $\hat\sigma_{0E}^2 = \hat\sigma_E^2 - S_P^2$. This might not be optimal for the standard errors of the estimators of β. Those standard errors assume that $\sigma_E$ is estimated from the log potency data, whereas in fact only $\sigma_{0E}$ is estimated from them. To obtain the maximum likelihood estimator, we should therefore directly optimise the log likelihoods $\ell_1(\cdot) = \ln L_1(\cdot)$ and $\ell_2(\cdot) = \ln L_2(\cdot)$.

Because these log likelihoods cannot be maximised in closed form, the restricted maximum likelihood (REML) approach together with the Newton-Raphson algorithm is used to estimate the model parameters. REML is commonly preferred for estimating variance components because it gives less biased (and, in balanced designs, unbiased) variance estimates than, for example, maximum likelihood (Brown and Prescott, 2006; Verbeke and Molenberghs, 2000; West et al., 2007). The estimation has to take the $s^2_{hijk}$ values into account when estimating the variances. In SAS, this can be done with the LOCAL option in the MIXED procedure, which yields the estimate of $\sigma^2_{0E}$.
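To make explicit what the variance structure has to capture, here is a sketch under stated assumptions. Assume, as in Section 7.2.3, that each bioassay run contributes a random effect with variance $\sigma_R^2$ that is shared by all observations measured in that run. Assume also that observations from different runs are independent. For design (7.1a), the $H$ batches at storage condition $i$ share run $k$ at time $j$, so the implied covariances are

$$\operatorname{Cov}(Y_{hijk},Y_{h'ijk})=\begin{cases}\sigma_R^2+s^2_{hijk}+\sigma^2_{0E}, & h=h',\\ \sigma_R^2, & h\neq h'.\end{cases}$$

For design (7.1b) the same structure applies, with the roles of batches and storage conditions interchanged. If $s^2_{hijk}+\sigma^2_{0E}$ is replaced by a single $\sigma_E^2$, this becomes a compound-symmetric block with total variance $\sigma_T^2=\sigma_R^2+\sigma_E^2$ and correlation $\rho=\sigma_R^2/\sigma_T^2$. Keeping $s^2_{hijk}$ makes the diagonal observation-specific, which is why the estimation must use these values.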

7.2.2 Shelf life estimation

The time at which the lower confidence limit for a batch at a storage condition intersects the lower specification limit is the shelf life estimate of this batch at that storage condition (ICH Q1A(R2)). To obtain the lower confidence limit, suppose the predicted log potency at storage time $t$ for batch $h$ at storage condition $i$ is given by $\hat Y_{hi}(t)=\hat\beta_{0h}+\hat\beta_{1hi}t$. Here $\hat\beta_{0h}$ and $\hat\beta_{1hi}$ are the maximum likelihood estimates from Models (7.1a) and (7.1b).

Page 153: Statistical methods for the analysis of bioassay dataThis work was financially supported by the pharmaceutical company MSD based in Oss, The Netherlands. c ThembileVirginiaMzolo,2016

The variance of the prediction is given by

$$\sigma^2_{hi}(t)=\sigma^2_{\hat\beta_{0h}}+2C(\hat\beta_{0h},\hat\beta_{1hi})\,t+\sigma^2_{\hat\beta_{1hi}}\,t^2,\tag{7.7}$$

where:

- $C(\hat\beta_{0h},\hat\beta_{1hi})$ is the covariance of $\hat\beta_{0h}$ and $\hat\beta_{1hi}$;
- $t$ is the storage time;
- $\sigma_{\hat\beta_{0h}}$ is the standard error of $\hat\beta_{0h}$;
- $\sigma_{\hat\beta_{1hi}}$ is the standard error of $\hat\beta_{1hi}$.

The standard error of the prediction $\hat Y_{hi}(t)$ is the square root of the estimate of the variance in (7.7). The shelf life is then obtained by solving the equation $\hat Y_{hi}(t)-t^{-1}_{df}(1-\alpha)\,\hat\sigma_{hi}(t)=LSL$. Here $t^{-1}_{df}(1-\alpha)$ is the $(1-\alpha)$th quantile of the t-distribution with $df$ degrees of freedom, $\hat\sigma_{hi}(t)$ is an estimate of $\sigma_{hi}(t)$, and $LSL$ is the lower specification limit.

If this equation has no solution or only one, the shelf life is set to zero. If there are two solutions $t_1$ and $t_2$, the shelf life is $t^L_{hi}=\min(t_1,t_2)$, $t^U_{hi}=\max(t_1,t_2)$, or $\infty$. Which one applies depends on how the solutions relate to $t_{hi}=(LSL-\hat\beta_{0h})/\hat\beta_{1hi}$, the storage time at which the degradation profile intersects the specification limit. The shelf life estimate is given by

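$$t^S_{hi}=\begin{cases}\infty & \text{if } t^L_{hi}<t_{hi}<t^U_{hi}<0 \,\wedge\, \hat\beta_{1hi}>0,\\ t^U_{hi} & \text{if } t^L_{hi}<0<t^U_{hi}<t_{hi} \,\wedge\, \hat\beta_{1hi}<0, \text{ or if } t_{hi}<t^L_{hi}<0<t^U_{hi} \,\wedge\, \hat\beta_{1hi}>0,\\ t^L_{hi} & \text{if } 0<t^L_{hi}<t_{hi}<t^U_{hi} \,\wedge\, \hat\beta_{1hi}<0,\\ 0 & \text{otherwise.}\end{cases}\tag{7.8}$$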
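The shelf life rule in (7.7) and (7.8) can be sketched in Python. This is a minimal illustration, not the code used in this chapter (the analyses were run in SAS), and the function names are hypothetical. The Python standard library has no t-distribution quantile, so $t^{-1}_{df}(1-\alpha)$ is passed in as `q`. The two crossing times are taken as the roots of the squared equation $(\hat\beta_{0h}+\hat\beta_{1hi}t-LSL)^2=q^2\sigma^2_{hi}(t)$, which is quadratic in $t$ by (7.7).

```python
import math


def prediction_variance(t, var_b0, cov_b0b1, var_b1):
    """Prediction variance sigma^2_hi(t) of (7.7)."""
    return var_b0 + 2.0 * cov_b0b1 * t + var_b1 * t ** 2


def shelf_life(b0, b1, var_b0, cov_b0b1, var_b1, q, lsl):
    """Shelf life t^S_hi following the case analysis of (7.8).

    q plays the role of t^{-1}_df(1 - alpha) and is supplied by the caller.
    t_lo and t_up are the roots of the squared crossing equation
    (b0 + b1*t - lsl)^2 = q^2 * sigma^2_hi(t).
    """
    a = b0 - lsl
    qa = b1 ** 2 - q ** 2 * var_b1
    qb = 2.0 * (a * b1 - q ** 2 * cov_b0b1)
    qc = a ** 2 - q ** 2 * var_b0
    disc = qb ** 2 - 4.0 * qa * qc
    # no or one solution -> zero; a zero slope falls under "otherwise" in (7.8)
    if qa == 0.0 or disc <= 0.0 or b1 == 0.0:
        return 0.0
    root = math.sqrt(disc)
    t_lo, t_up = sorted(((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)))
    t_cross = (lsl - b0) / b1  # fitted degradation line meets the LSL here
    if b1 > 0 and t_lo < t_cross < t_up < 0:
        return math.inf
    if (b1 < 0 and t_lo < 0 < t_up < t_cross) or (b1 > 0 and t_cross < t_lo < 0 < t_up):
        return t_up
    if b1 < 0 and 0 < t_lo < t_cross < t_up:
        return t_lo
    return 0.0


# Hypothetical inputs: intercept 100, slope -0.1 per month, LSL 95, q = 2.
print(shelf_life(100.0, -0.1, 0.25, 0.0, 0.0001, q=2.0, lsl=95.0))  # approximately 37.5
```

In this example the lower bound $\hat Y(t)-2\sigma(t)$ equals $96.25-2(0.625)=95$ at $t=37.5$. Squaring can also produce a root where the upper rather than the lower bound meets the LSL. The comparison with $t_{hi}$ in (7.8) is what selects the relevant crossing.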

For more details, see Van den Heuvel et al. (2011). The standard errors of the maximum likelihood estimators $\hat\beta_{0h}$ and $\hat\beta_{1hi}$ enter the calculation of the shelf life, and the experimental setup may affect both the estimates and their standard errors. Hence shelf life estimates from Model (7.1a) can differ from those obtained with Model (7.1b).

7.2.3 A theoretical comparison of the standard errors used in the estimation of shelf life

In this section we derive an explicit expression for the variance function (7.7) under each of the two designs. The two functions indicate which design yields the more precise (smaller variance) shelf life estimate.

Higher theoretical precision does not necessarily lead to longer shelf life values, for three reasons:

- the estimates $\hat\beta_{0h}$ and $\hat\beta_{1hi}$ based on (7.1a) and (7.1b) may differ;
- the precision itself must be estimated;
- a test for poolability of batches is conducted first.

Nevertheless, we believe that a comparison of the theoretical precision gives useful insight. For simplicity we focus on models with poolable degradation rates, that is, $\beta_{1hi}$ is replaced by $\beta_{1i}$ in (7.1a) and (7.1b). We also assume that the precision of the bioactivity is not used in the estimation process, so that the error variance is $\sigma_E^2$.


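If we define the parameters $\beta=(\beta_{01},\beta_{02},\ldots,\beta_{0H},\beta_{11},\beta_{12},\ldots,\beta_{1I})^T$, the design matrix $X$ for Model (7.1a) can be formed as follows:

$$X_{jk}=\begin{pmatrix}\mathbf{I}_H & t_j\mathbf{1}_H & \mathbf{0} & \cdots & \mathbf{0}\\ \mathbf{I}_H & \mathbf{0} & t_j\mathbf{1}_H & \cdots & \mathbf{0}\\ \vdots & \vdots & & \ddots & \vdots\\ \mathbf{I}_H & \mathbf{0} & \mathbf{0} & \cdots & t_j\mathbf{1}_H\end{pmatrix},\qquad X=\begin{pmatrix}X_{11}\\ X_{12}\\ \vdots\\ X_{1K}\\ X_{21}\\ X_{22}\\ \vdots\\ X_{2K}\\ \vdots\\ X_{J1}\\ X_{J2}\\ \vdots\\ X_{JK}\end{pmatrix}\tag{7.9}$$

Here $\mathbf{I}_H$ denotes the $H\times H$ identity matrix, $\mathbf{1}_H$ an $H$-vector of ones and $\mathbf{0}$ an $H$-vector of zeros. Each row block of $X_{jk}$ corresponds to one storage condition.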

where $X_{jk}$ is an $HI\times(H+I)$ matrix. The $V$ matrix for observations ordered by batch, storage condition, run and time is

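$$V=\begin{pmatrix}V_0&0&\cdots&0\\0&V_0&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\cdots&0&V_0\end{pmatrix},\qquad V_0=\sigma_T^2\begin{pmatrix}1&\rho&\cdots&\rho\\\rho&1&\ddots&\vdots\\\vdots&\ddots&\ddots&\rho\\\rho&\cdots&\rho&1\end{pmatrix}\tag{7.10}$$

with $\sigma_T^2=\sigma_R^2+\sigma_E^2$ and $\rho=\sigma_R^2/(\sigma_R^2+\sigma_E^2)$. The inverse matrix $V^{-1}$ is

$$V^{-1}=\begin{pmatrix}V_0^{-1}&0&\cdots&0\\0&V_0^{-1}&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\cdots&0&V_0^{-1}\end{pmatrix},\qquad V_0^{-1}=c\begin{pmatrix}1&r&\cdots&r\\r&1&\ddots&\vdots\\\vdots&\ddots&\ddots&r\\r&\cdots&r&1\end{pmatrix}\tag{7.11}$$

with $c$ and $r$ given by

$$c=\frac{1+(H-2)\rho}{\sigma_T^2\left[1+(H-2)\rho-(H-1)\rho^2\right]}\tag{7.12}$$

$$r=-\frac{\rho}{1+(H-2)\rho}\tag{7.13}$$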
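Formulas (7.12) and (7.13) are easy to mistype, so a quick numerical check is useful. The sketch below uses plain Python with hypothetical helper names, and $n$ plays the role of $H$ as the block size. It builds $V_0$ from (7.10) and its closed-form inverse from (7.11) to (7.13), then confirms that their product is the identity up to rounding error.

```python
def compound_symmetry(n, sigma_R2, sigma_E2):
    """V0 = sigma_T^2 * [(1 - rho) I + rho J] as in (7.10)."""
    sT2 = sigma_R2 + sigma_E2
    rho = sigma_R2 / sT2
    return [[sT2 if a == b else sT2 * rho for b in range(n)] for a in range(n)]


def compound_symmetry_inverse(n, sigma_R2, sigma_E2):
    """V0^{-1}: diagonal c, off-diagonal c*r, with c and r from (7.12)-(7.13)."""
    sT2 = sigma_R2 + sigma_E2
    rho = sigma_R2 / sT2
    c = (1 + (n - 2) * rho) / (sT2 * (1 + (n - 2) * rho - (n - 1) * rho ** 2))
    r = -rho / (1 + (n - 2) * rho)
    return [[c if a == b else c * r for b in range(n)] for a in range(n)]


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def max_deviation_from_identity(n, sigma_R2, sigma_E2):
    """Largest absolute entry of V0 * V0^{-1} - I."""
    P = matmul(compound_symmetry(n, sigma_R2, sigma_E2),
               compound_symmetry_inverse(n, sigma_R2, sigma_E2))
    return max(abs(P[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n))


print(max_deviation_from_identity(3, 2.5, 1.0))  # close to zero (rounding error only)
```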


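The variance of $\hat\beta$ is given by $(X^TV^{-1}X)^{-1}$. To calculate $X^TV^{-1}X$, note that

$$V^{-1}X=\begin{pmatrix}V_1^{-1}X_{11}\\V_1^{-1}X_{12}\\\vdots\\V_1^{-1}X_{JK}\end{pmatrix},\qquad V_1^{-1}=\begin{pmatrix}V_0^{-1}&0&\cdots&0\\0&V_0^{-1}&\ddots&\vdots\\\vdots&\ddots&\ddots&0\\0&\cdots&0&V_0^{-1}\end{pmatrix}\tag{7.14}$$

and

$$X^TV^{-1}X=\left(X_{11}^T,X_{12}^T,\ldots,X_{JK}^T\right)\begin{pmatrix}V_1^{-1}X_{11}\\V_1^{-1}X_{12}\\\vdots\\V_1^{-1}X_{JK}\end{pmatrix}\tag{7.15}$$

$$=\sum_{j=1}^{J}\sum_{k=1}^{K}X_{jk}^TV_1^{-1}X_{jk}.\tag{7.16}$$

The result for $X_{jk}^TV_1^{-1}X_{jk}$ and its summation are given in the Appendix.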
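The link between (7.7) and $(X^TV^{-1}X)^{-1}$ can be made explicit. Write $\Sigma=(X^TV^{-1}X)^{-1}$, and let $e_m$ denote the $m$th unit vector of length $H+I$ (notation introduced here for illustration only). Under the pooled-slope parameterisation, the prediction for batch $h$ at storage condition $i$ is $x_{hi}(t)^T\hat\beta$ with $x_{hi}(t)=e_h+t\,e_{H+i}$. Hence

$$\sigma^2_{hi}(t)=x_{hi}(t)^T\,\Sigma\,x_{hi}(t)=\Sigma_{h,h}+2t\,\Sigma_{h,H+i}+t^2\,\Sigma_{H+i,H+i},$$

which is (7.7) with $\sigma^2_{\hat\beta_{0h}}=\Sigma_{h,h}$, $C(\hat\beta_{0h},\hat\beta_{1i})=\Sigma_{h,H+i}$ and $\sigma^2_{\hat\beta_{1i}}=\Sigma_{H+i,H+i}$.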

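The $X$ matrix corresponding to Model (7.1b) is given by

$$X_{jk}=\begin{pmatrix}\mathbf{1}_I&\mathbf{0}&\cdots&\mathbf{0}&t_j\mathbf{I}_I\\\mathbf{0}&\mathbf{1}_I&\cdots&\mathbf{0}&t_j\mathbf{I}_I\\\vdots&&\ddots&\vdots&\vdots\\\mathbf{0}&\mathbf{0}&\cdots&\mathbf{1}_I&t_j\mathbf{I}_I\end{pmatrix},\qquad X=\begin{pmatrix}X_{11}\\ X_{12}\\ \vdots\\ X_{1K}\\ X_{21}\\ X_{22}\\ \vdots\\ X_{2K}\\ \vdots\\ X_{J1}\\ X_{J2}\\ \vdots\\ X_{JK}\end{pmatrix}\tag{7.17}$$

Here $\mathbf{I}_I$ is the $I\times I$ identity matrix, $\mathbf{1}_I$ an $I$-vector of ones and $\mathbf{0}$ an $I$-vector of zeros. Each row block of $X_{jk}$ corresponds to one batch.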

The $V$ matrix of the previous model is used again here, but now for observations ordered by storage condition, batch, run and storage time. Each block $V_0$ then has dimension $I\times I$, so $H$ is replaced by $I$ in (7.12) and (7.13). The matrix $X^TV^{-1}X$ corresponding to Model (7.1b) can be found in the Appendix. The variances and covariances of the estimated batch intercepts and degradation rates are given by the diagonal and off-diagonal entries of the inverse of $X^TV^{-1}X$.

Suppose we consider a case study with three batches ($H=3$), three storage conditions ($I=3$), five time points ($J=5$), and six bioassay runs ($K=6$). The residual variance is fixed at one ($\sigma_E^2=1$), and the between-run variance takes the values $\sigma_R^2=0.5, 1.0, 2.5, 4.0$.

The inverse of the matrix $X^TV^{-1}X$ can be obtained with the SAS IML procedure (or any other software). For each setting, the prediction variance (7.7) was computed. Its square root, the standard error, is plotted against storage time in Figure 7.1 for the different values of $\sigma_R^2$. When there is no between-run variation ($\sigma_R^2=0$), the two designs are identical.
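The theoretical computation behind Figure 7.1 can be sketched in plain Python. This is a minimal illustration rather than the SAS IML code used here, and the function names are hypothetical. The five time points are not listed in this section, so `times=(0, 3, 6, 9, 12)` months is a hypothetical choice. The sketch accumulates $X^TV^{-1}X$ as in (7.16) using the closed-form $V_0^{-1}$ from (7.11) to (7.13). In design 7.1a a run holds the $H$ batches of one storage condition; in design 7.1b it holds the $I$ conditions of one batch. The sketch then evaluates the square root of (7.7) for one batch and storage condition.

```python
import math


def cs_inverse(n, sigma_R2, sigma_E2):
    """Inverse of the n x n compound-symmetric block V0, via (7.12)-(7.13)."""
    sT2 = sigma_R2 + sigma_E2
    rho = sigma_R2 / sT2
    c = (1 + (n - 2) * rho) / (sT2 * (1 + (n - 2) * rho - (n - 1) * rho ** 2))
    r = -rho / (1 + (n - 2) * rho)
    return [[c if a == b else c * r for b in range(n)] for a in range(n)]


def design_row(h, i, t, H, I):
    """Row of X_jk for batch h at storage condition i and time t (pooled slopes)."""
    return ([1.0 if m == h else 0.0 for m in range(H)]
            + [t if m == i else 0.0 for m in range(I)])


def information(design, H, I, times, K, sigma_R2, sigma_E2):
    """X^T V^{-1} X = sum_j sum_k X_jk^T V_1^{-1} X_jk, cf. (7.16)."""
    size = H if design == "7.1a" else I  # number of observations sharing one run
    W = cs_inverse(size, sigma_R2, sigma_E2)
    p = H + I
    M = [[0.0] * p for _ in range(p)]
    for t in times:
        if design == "7.1a":  # a run holds all H batches of one storage condition
            runs = [[design_row(h, i, t, H, I) for h in range(H)] for i in range(I)]
        else:                 # a run holds all I storage conditions of one batch
            runs = [[design_row(h, i, t, H, I) for i in range(I)] for h in range(H)]
        for G in runs:
            for a in range(p):
                for b in range(p):
                    # the K runs at time t have identical design blocks
                    M[a][b] += K * sum(G[u][a] * W[u][v] * G[v][b]
                                       for u in range(size) for v in range(size))
    return M


def solve(A, y):
    """Solve A z = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[k][:] + [y[k]] for k in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(n):
            if k != col:
                f = M[k][col] / M[col][col]
                for m in range(col, n + 1):
                    M[k][m] -= f * M[col][m]
    return [M[k][n] / M[k][k] for k in range(n)]


def prediction_se(design, t, h=0, i=0, H=3, I=3, times=(0, 3, 6, 9, 12),
                  K=6, sigma_R2=1.0, sigma_E2=1.0):
    """sqrt(x^T (X^T V^-1 X)^-1 x), the square root of (7.7)."""
    M = information(design, H, I, times, K, sigma_R2, sigma_E2)
    x = design_row(h, i, t, H, I)
    z = solve(M, x)
    return math.sqrt(sum(a * b for a, b in zip(x, z)))


for s2 in (0.5, 1.0, 2.5, 4.0):
    print(s2, prediction_se("7.1a", 24, sigma_R2=s2), prediction_se("7.1b", 24, sigma_R2=s2))
```

Two sanity checks follow from the algebra. With $\sigma_R^2=0$ the two designs only reorder the rows of $X$, so they give the same standard error. Doubling $K$ scales $X^TV^{-1}X$ by two and therefore the standard error by $1/\sqrt{2}$.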

Figure 7.1: Theoretical standard error of the predicted bioactivity at storage times based on Model 7.1a (solid line) and Model 7.1b (dashed line)

When there is little variability between the bioassay runs, both designs result in almost the same precision, with Model (7.1b) slightly better. When the variability between bioassay runs is moderate or large, the difference in precision between the models is more pronounced, and Model (7.1b) has higher precision than Model (7.1a). Model (7.1a) has better precision at the earlier storage time points, but this advantage subsides over time.

7.3 A simulation study

A simulation study is conducted to compare the shelf life values of Models (7.1a) and (7.1b). The simulation parameters for the fixed effects, that is, the intercepts and slopes, were taken from Almalik et al. (2014).

In the first simulation, the slopes are assumed to be β11 = −0.07, β12 = −0.09 and β13 = −0.11 for each batch at the three storage conditions, respectively. The degradation rates are therefore poolable across batches. In the second simulation the same parameters are used, but the slopes of the third batch are assumed to be 1.3 times larger (in absolute value) than those of the first two batches. The slopes of the third batch are then −0.091, −0.117 and −0.143 for the three storage conditions, respectively.

The batch intercepts are varied across simulations as well. The first case assumes equal intercepts for the three batches (β01 = β02 = β03). In the second case, the intercept of batch one is greater than that of batch two, which in turn is greater than that of batch three. In the third case, the intercept of batch two is greater than those of batches one and three, and the intercept of batch one is less than that of batch three. The values of these intercepts are given by

• Case 1: β01 = β02 = β03 = 100,

• Case 2: β01 = 103, β02 = 100, β03 = 99,

• Case 3: β01 = 99, β02 = 101, β03 = 100.
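The true shelf lives reported later in this section are the minimum over the three batches of $(LSL-\beta_{0h})/\beta_{1hi}$ at each storage condition. The lower specification limit is not stated in this section. A value of $LSL=95$ reproduces the True column of Table 7.1 exactly after rounding, so it is used below as an inferred assumption. With the steeper third-batch slopes, intercept Cases 1 to 3 match the True column of Table 7.2 (labelled Cases 7 to 9) to within about 0.05 month. The helper names are hypothetical.

```python
LSL = 95.0  # assumption: inferred from the True columns, not stated in the text

POOLED_SLOPES = (-0.07, -0.09, -0.11)        # storage conditions 1, 2, 3
INTERCEPTS = {1: (100.0, 100.0, 100.0),      # Case 1
              2: (103.0, 100.0, 99.0),       # Case 2
              3: (99.0, 101.0, 100.0)}       # Case 3


def slopes(poolable):
    """beta_1hi per batch h (rows) and storage condition i (columns)."""
    # non-poolable setting: batch 3 degrades 1.3 times faster
    factors = (1.0, 1.0, 1.0) if poolable else (1.0, 1.0, 1.3)
    return [[f * b for b in POOLED_SLOPES] for f in factors]


def true_shelf_life(case, poolable=True, lsl=LSL):
    """Per storage condition: minimum over batches of (LSL - beta_0h) / beta_1hi."""
    b0 = INTERCEPTS[case]
    b1 = slopes(poolable)
    return [min((lsl - b0[h]) / b1[h][i] for h in range(3)) for i in range(3)]


for case in (1, 2, 3):
    print(case,
          [round(v, 1) for v in true_shelf_life(case)],
          [round(v, 1) for v in true_shelf_life(case, poolable=False)])
```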

The residual variance is assumed to be $\sigma_E^2=1$, and the between-run variance is varied as $\sigma_R^2=0.5, 1.0, 2.5$ and $4.0$. The precision of the bioactivity in each bioassay run is assumed to be $S^2_{hijk}\sim 0.3\,\sigma_E^2\,\chi^2_{df}/df$ with $df=20$, which implies $\sigma^2_{0E}=0.7\,\sigma_E^2$ in (7.3). In other words, the precision $S^2_{hijk}$ accounts on average for 30% of the total residual variance. Six bioassay runs are used for both models, and each setting is simulated 10 000 times for both Models (7.1a) and (7.1b).

The analyses are conducted in SAS using the MIXED procedure, with the variance components estimated by restricted maximum likelihood (REML). To include $S^2_{hijk}$, the LOCAL option in the REPEATED statement of the MIXED procedure is used, and the variances $S^2_{hijk}$ are read in as a matrix. If the residuals were not partitioned, the residual covariance matrix would be $\sigma_E^2 I_n$, where $I_n$ is the $n\times n$ identity matrix and $n$ is the total sample size. With the $S^2_{hijk}$ this is no longer the case, and the LOCAL option incorporates them in the error variance structure. The output contains three estimates:

1. the variance of the bioassay runs;
2. the precision of the bioactivities (using the input $S^2_{hijk}$);
3. $\sigma^2_{0E}$.
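The distributional assumption for the bioassay precisions can be checked with a short simulation. The sketch is plain Python with a hypothetical function name and seed. It draws $S^2_{hijk}=0.3\,\sigma_E^2\,\chi^2_{20}/20$ using the fact that a $\chi^2_{df}$ variable is Gamma distributed with shape $df/2$ and scale 2. It then confirms that $E[S^2]=0.3\,\sigma_E^2$, leaving $0.7\,\sigma_E^2$ for $\sigma^2_{0E}$, and that $\operatorname{Var}(S^2)=(0.3\,\sigma_E^2)^2\cdot 2/df$.

```python
import random
import statistics


def simulate_precisions(n, sigma_E2=1.0, share=0.3, df=20, seed=2016):
    """Draw n values S^2 ~ share * sigma_E2 * chi2_df / df."""
    rng = random.Random(seed)
    # chi-square with df degrees of freedom = Gamma(shape=df/2, scale=2)
    return [share * sigma_E2 * rng.gammavariate(df / 2, 2.0) / df for _ in range(n)]


s2 = simulate_precisions(100_000)
print(statistics.mean(s2))        # near 0.3 = E[S^2]
print(1.0 - statistics.mean(s2))  # near 0.7, the share left for sigma_0E^2
print(statistics.variance(s2))    # near 0.3**2 * 2 / 20 = 0.009
```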

The performance measures used to compare the two models are similar to those used by Almalik et al. (2014). The first three are the 5th, 50th and 95th percentiles of the shelf life estimates, calculated according to (7.8). Before the shelf life is estimated, poolability of the degradation rates is tested with a Wald test at significance level α = 0.25 (according to the guidelines). The percentiles over the 10 000 simulations show how close the estimated shelf life is to the true shelf life for each storage condition. The true shelf life is defined as the minimum over the three batches at each storage condition. The fourth statistic is the percentage of simulation runs that result in a shelf life of at least 24 months.
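The four performance measures can be computed from a set of simulated shelf lives as in the sketch below. The function name is hypothetical. The sketch uses Python's `statistics.quantiles` with its default exclusive interpolation, which may differ from the percentile definition used in SAS for the tables. It also assumes finite shelf-life values; an infinite estimate from (7.8) would have to be capped first.

```python
import statistics


def summarise(shelf_lives, threshold=24.0):
    """P5, P50, P95 and percentage of shelf lives >= threshold (months)."""
    cuts = statistics.quantiles(shelf_lives, n=20)  # cut points at 5%, 10%, ..., 95%
    share = 100.0 * sum(1 for s in shelf_lives if s >= threshold) / len(shelf_lives)
    return {"P5": cuts[0], "P50": cuts[9], "P95": cuts[18], "T>=24": share}


print(summarise([float(m) for m in range(1, 101)]))
```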

For both statistical models, the shelf life estimates are presented in Table 7.1 for the simulation settings with poolable degradation rates (slopes) and in Table 7.2 for the settings without poolable degradation rates. The true shelf life is the same for both models, since only the design differs between (7.1a) and (7.1b), not the quality of the batches. Hence only one column with true shelf life values is displayed. These shelf life values cannot be directly compared with those obtained by Almalik et al. (2014), because they reported a design that excluded bioassay runs, whereas we varied the between-run variability.

Tables 7.1 and 7.2 show that the shelf life estimates for (7.1a) are closer to each other across simulations than those for (7.1b). In other words, the shelf life estimate is more precise under (7.1a). However, the median shelf life under (7.1b) is closer to the true shelf life, so (7.1b) is less biased than (7.1a).

Overall, (7.1b) seems to be better. Model (7.1a) has smaller percentages of simulations claiming a shelf life of at least 24 months, even though the true shelf life exceeds 24 months in every simulation setting. Model (7.1b) does substantially better than Model (7.1a) when the variability between bioassay runs is large. For example, in Case 1 with $\sigma_R^2=4$ at storage condition 1, 81.0% of the (7.1b) estimates reach 24 months, compared with 58.2% for (7.1a).


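Table 7.1: Estimated shelf life values (months) when poolable degradation slopes are simulated for Models 7.1a and 7.1b. P5, P50 and P95 are the 5th, 50th and 95th percentiles of the estimated shelf life. T ≥ 24 is the percentage of simulations with an estimated shelf life of at least 24 months.

| Case | σR² | Storage | True | 7.1a P5 | 7.1a P50 | 7.1a P95 | 7.1a T ≥ 24 | 7.1b P5 | 7.1b P50 | 7.1b P95 | 7.1b T ≥ 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.5 | 1 | 71.4 | 22.2 | 38.2 | 66.0 | 92.1 | 21.0 | 42.6 | 71.6 | 91.8 |
| | | 2 | 55.6 | 22.4 | 33.4 | 52.0 | 91.8 | 21.1 | 36.7 | 56.4 | 91.6 |
| | | 3 | 45.5 | 22.2 | 30.2 | 43.8 | 89.4 | 21.6 | 32.4 | 46.9 | 91.6 |
| | 1 | 1 | 71.4 | 20.9 | 34.5 | 65.3 | 88.4 | 18.9 | 40.4 | 76.3 | 90.4 |
| | | 2 | 55.6 | 20.9 | 30.7 | 51.7 | 85.6 | 19.1 | 35.0 | 59.6 | 90.1 |
| | | 3 | 45.5 | 20.6 | 28.1 | 43.4 | 78.6 | 19.6 | 31.0 | 48.8 | 88.2 |
| | 2.5 | 1 | 71.4 | 18.2 | 28.7 | 62.0 | 72.4 | 15.1 | 35.8 | 85.8 | 87.4 |
| | | 2 | 55.6 | 17.9 | 26.2 | 50.3 | 62.8 | 15.5 | 31.3 | 66.4 | 83.1 |
| | | 3 | 45.5 | 17.5 | 24.4 | 42.5 | 52.3 | 15.8 | 28.1 | 53.6 | 74.5 |
| | 4 | 1 | 71.4 | 16.4 | 25.5 | 59.2 | 58.2 | 12.8 | 32.5 | 92.8 | 81.0 |
| | | 2 | 55.6 | 16.1 | 23.7 | 49.2 | 48.1 | 13.2 | 28.9 | 70.8 | 73.3 |
| | | 3 | 45.5 | 15.8 | 22.1 | 41.9 | 38.7 | 13.5 | 26.0 | 57.0 | 61.7 |
| 2 | 0.5 | 1 | 57.1 | 18.6 | 31.8 | 54.3 | 84.0 | 17.5 | 35.6 | 59.4 | 88.6 |
| | | 2 | 44.4 | 18.6 | 27.9 | 43.0 | 76.6 | 17.2 | 30.6 | 47.2 | 85.2 |
| | | 3 | 36.4 | 19.0 | 25.1 | 36.3 | 59.7 | 19.0 | 26.9 | 38.9 | 74.4 |
| | 1 | 1 | 57.1 | 17.7 | 28.8 | 53.5 | 75.2 | 15.7 | 34.1 | 63.4 | 87.1 |
| | | 2 | 44.4 | 17.5 | 25.8 | 42.6 | 61.8 | 15.7 | 29.5 | 50.2 | 80.3 |
| | | 3 | 36.4 | 17.5 | 23.5 | 36.0 | 45.5 | 17.6 | 26.0 | 40.9 | 65.5 |
| | 2.5 | 1 | 57.1 | 15.5 | 24.2 | 51.1 | 51.0 | 12.4 | 30.6 | 73.8 | 77.6 |
| | | 2 | 44.4 | 15.1 | 22.1 | 41.4 | 38.7 | 12.8 | 26.9 | 56.5 | 65.6 |
| | | 3 | 36.4 | 14.9 | 20.5 | 35.4 | 27.8 | 15.1 | 23.9 | 44.5 | 49.3 |
| | 4 | 1 | 57.1 | 14.1 | 21.7 | 48.9 | 38.5 | 10.5 | 28.1 | 80.7 | 67.8 |
| | | 2 | 44.4 | 13.6 | 20.1 | 40.5 | 29.0 | 11.0 | 25.0 | 61.0 | 55.6 |
| | | 3 | 36.4 | 13.4 | 18.8 | 34.6 | 21.3 | 13.4 | 22.4 | 47.8 | 41.2 |
| 3 | 0.5 | 1 | 57.1 | 21.7 | 32.2 | 54.1 | 88.5 | 20.9 | 35.8 | 60.2 | 91.2 |
| | | 2 | 44.4 | 20.6 | 28.4 | 43.9 | 80.6 | 19.6 | 31.0 | 48.2 | 87.1 |
| | | 3 | 36.4 | 19.7 | 25.7 | 37.6 | 64.7 | 20.0 | 27.3 | 41.3 | 77.3 |
| | 1 | 1 | 57.1 | 20.0 | 29.3 | 53.7 | 78.8 | 19.0 | 34.2 | 65.2 | 88.7 |
| | | 2 | 44.4 | 18.9 | 26.2 | 43.5 | 65.8 | 17.6 | 29.8 | 51.7 | 81.6 |
| | | 3 | 36.4 | 18.0 | 24.0 | 36.9 | 49.8 | 18.1 | 26.3 | 43.5 | 67.8 |
| | 2.5 | 1 | 57.1 | 16.8 | 24.5 | 51.2 | 53.1 | 15.6 | 30.7 | 76.1 | 79.1 |
| | | 2 | 44.4 | 15.9 | 22.5 | 42.0 | 40.7 | 13.7 | 27.1 | 58.9 | 66.2 |
| | | 3 | 36.4 | 15.2 | 21.0 | 35.9 | 31.0 | 14.3 | 24.2 | 48.7 | 51.4 |
| | 4 | 1 | 57.1 | 14.9 | 21.9 | 49.2 | 39.3 | 13.4 | 28.2 | 81.9 | 68.7 |
| | | 2 | 44.4 | 14.1 | 20.4 | 40.7 | 30.3 | 11.5 | 25.2 | 63.1 | 56.1 |
| | | 3 | 36.4 | 13.6 | 19.2 | 35.3 | 23.0 | 12.1 | 22.7 | 51.6 | 42.8 |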


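Table 7.2: Estimated shelf life values (months) when non-poolable degradation slopes are simulated for Models 7.1a and 7.1b. P5, P50 and P95 are the 5th, 50th and 95th percentiles of the estimated shelf life. T ≥ 24 is the percentage of simulations with an estimated shelf life of at least 24 months.

| Case | σR² | Storage | True | 7.1a P5 | 7.1a P50 | 7.1a P95 | 7.1a T ≥ 24 | 7.1b P5 | 7.1b P50 | 7.1b P95 | 7.1b T ≥ 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 0.5 | 1 | 55.0 | 22.5 | 35.7 | 59.0 | 92.4 | 22.0 | 40.1 | 65.0 | 92.9 |
| | | 2 | 42.7 | 22.5 | 31.6 | 47.2 | 91.5 | 22.3 | 34.5 | 52.0 | 93.1 |
| | | 3 | 35.0 | 19.3 | 27.0 | 38.4 | 71.7 | 18.5 | 29.3 | 40.2 | 80.2 |
| | 1 | 1 | 55.0 | 21.1 | 32.5 | 58.2 | 87.6 | 19.7 | 38.3 | 68.5 | 91.6 |
| | | 2 | 42.7 | 20.8 | 29.2 | 46.9 | 82.9 | 20.3 | 33.2 | 55.2 | 90.9 |
| | | 3 | 35.0 | 18.2 | 25.2 | 38.2 | 58.9 | 17.0 | 28.1 | 41.3 | 76.5 |
| | 2.5 | 1 | 55.0 | 18.2 | 27.3 | 55.5 | 67.8 | 15.8 | 34.4 | 80.1 | 87.5 |
| | | 2 | 42.7 | 17.8 | 25.1 | 45.6 | 57.1 | 16.5 | 30.0 | 61.6 | 80.9 |
| | | 3 | 35.0 | 16.0 | 22.1 | 37.3 | 37.0 | 14.0 | 25.7 | 43.8 | 61.6 |
| | 4 | 1 | 55.0 | 16.3 | 24.4 | 53.1 | 52.5 | 13.2 | 31.4 | 85.4 | 79.6 |
| | | 2 | 42.7 | 16.0 | 22.7 | 44.3 | 42.2 | 13.9 | 27.7 | 64.8 | 69.4 |
| | | 3 | 35.0 | 14.5 | 20.2 | 36.8 | 28.1 | 12.1 | 24.0 | 45.6 | 50.2 |
| 8 | 0.5 | 1 | 44.0 | 19.2 | 29.4 | 47.9 | 80.6 | 18.7 | 32.9 | 53.1 | 88.6 |
| | | 2 | 34.2 | 18.8 | 26.1 | 38.9 | 66.8 | 18.7 | 28.3 | 43.4 | 80.2 |
| | | 3 | 28.0 | 16.3 | 22.0 | 30.8 | 32.7 | 16.1 | 23.7 | 32.5 | 47.2 |
| | 1 | 1 | 44.0 | 18.0 | 26.9 | 47.0 | 68.5 | 16.9 | 31.7 | 56.9 | 85.4 |
| | | 2 | 34.2 | 17.4 | 24.2 | 38.7 | 51.7 | 17.1 | 27.4 | 46.1 | 72.8 |
| | | 3 | 28.0 | 15.5 | 20.7 | 30.7 | 24.2 | 15.1 | 23.0 | 33.3 | 41.1 |
| | 2.5 | 1 | 44.0 | 15.5 | 22.9 | 45.4 | 43.3 | 13.5 | 28.7 | 67.7 | 72.3 |
| | | 2 | 34.2 | 14.8 | 21.1 | 37.3 | 31.6 | 14.2 | 25.1 | 51.7 | 56.8 |
| | | 3 | 28.0 | 13.5 | 18.3 | 30.1 | 15.5 | 13.2 | 21.3 | 36.2 | 31.8 |
| | 4 | 1 | 44.0 | 14.0 | 20.6 | 43.4 | 32.4 | 11.3 | 26.5 | 74.0 | 62.1 |
| | | 2 | 34.2 | 13.3 | 19.2 | 36.4 | 23.7 | 12.3 | 23.5 | 54.4 | 47.5 |
| | | 3 | 28.0 | 12.3 | 16.9 | 29.7 | 12.5 | 12.0 | 20.1 | 37.7 | 27.1 |
| 9 | 0.5 | 1 | 55.0 | 21.8 | 31.1 | 50.2 | 87.9 | 21.7 | 34.5 | 56.2 | 92.1 |
| | | 2 | 42.7 | 20.4 | 27.6 | 41.9 | 77.3 | 20.1 | 29.9 | 46.7 | 85.8 |
| | | 3 | 35.0 | 18.3 | 23.5 | 32.8 | 45.8 | 17.5 | 25.1 | 34.5 | 60.2 |
| | 1 | 1 | 55.0 | 20.1 | 28.5 | 49.5 | 77.0 | 19.9 | 33.1 | 61.0 | 89.1 |
| | | 2 | 42.7 | 18.7 | 25.6 | 41.2 | 62.1 | 18.0 | 28.7 | 49.8 | 79.1 |
| | | 3 | 35.0 | 16.9 | 22.1 | 32.4 | 34.1 | 15.8 | 24.2 | 35.7 | 52.1 |
| | 2.5 | 1 | 55.0 | 16.8 | 24.0 | 47.3 | 50.3 | 16.5 | 29.9 | 71.8 | 78.0 |
| | | 2 | 42.7 | 15.7 | 22.2 | 39.8 | 38.4 | 14.1 | 26.3 | 56.1 | 63.1 |
| | | 3 | 35.0 | 14.5 | 19.5 | 31.7 | 20.9 | 12.6 | 22.5 | 38.7 | 39.5 |
| | 4 | 1 | 55.0 | 14.9 | 21.6 | 45.5 | 37.0 | 14.0 | 27.6 | 77.4 | 67.1 |
| | | 2 | 42.7 | 14.0 | 20.1 | 38.5 | 28.3 | 11.7 | 24.5 | 58.7 | 52.9 |
| | | 3 | 35.0 | 13.1 | 18.0 | 31.2 | 15.7 | 10.7 | 21.2 | 41.0 | 33.0 |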


The standard errors based on the simulations are plotted in Figure 7.2. Model (7.1b) has higher precision than Model (7.1a), most clearly at the later storage times. These results are in line with the theoretical standard errors, although the simulated standard errors are slightly higher than the theoretical ones in Figure 7.1. The increase has two causes:

1. the standard error itself must be estimated;
2. the analysis first tests the poolability of slopes across batches (at α = 0.25) before the final model is fitted and the parameters are estimated.

Figure 7.2: The prediction standard errors of the predicted bioactivity at storage times based on the simulations with poolable slopes for Model 7.1a (solid line) and 7.1b (dashed line)

The simulation results make use of the precision estimates $S_{hijk}$ in the analysis, and ignoring these estimates would increase the standard errors further. For instance, for $\sigma_R^2=4$ the standard error of the prediction increases to $\hat\sigma_{hi}(24)=1.5181$ for Model (7.1a) and $\hat\sigma_{hi}(24)=1.2671$ for Model (7.1b). When the precision estimates $S_{hijk}$ are incorporated, the corresponding values are $\hat\sigma_{hi}(24)=1.5178$ for Model (7.1a) and $\hat\sigma_{hi}(24)=1.0941$ for Model (7.1b). Ignoring the standard errors of the bioactivity therefore substantially reduces the precision of the prediction interval for Model (7.1b), but has almost no effect for Model (7.1a).
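The size of these effects can be read directly from the four reported standard errors at 24 months:

$$\frac{1.2671}{1.0941}\approx 1.158\ \ \text{(Model 7.1b)},\qquad \frac{1.5181}{1.5178}\approx 1.0002\ \ \text{(Model 7.1a)}.$$

Ignoring the bioactivity standard errors thus raises the prediction standard error by roughly 16% under Model (7.1b) and by about 0.02% under Model (7.1a).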

7.4 Discussion and conclusion

This chapter set out to investigate two experimental designs that can be used to estimate the shelf life of the bioactivity of a drug product.

- In the first design, the batches at one storage condition are measured simultaneously within bioassay runs. This would typically be the choice when the data are analysed per storage condition.
- In the second design, the storage conditions for one batch are measured simultaneously within bioassay runs. To address this experimental structure, all data must be analysed together instead of per storage condition.

A combined analysis of all data is preferred over an analysis per storage condition. We therefore analysed both structures with mixed effects models in the SAS statistical software, accounting for the bioassay runs and the accompanying precision estimates of the bioactivities. The analysis was fully adapted to bioactivities, so that all available data from the bioassays were included.

The two designs capture the variation differently, as shown by their different estimates of the shelf life variance. The model with the smallest variance (highest precision) provides the most consistent estimates, but precision should be balanced against bias. The simulation results showed that (7.1a) gave more precise shelf life values than (7.1b), whereas (7.1b) showed less bias than (7.1a).

We also evaluated the two models theoretically and compared these results with the simulation results. Model (7.1a) has relatively poor precision whether or not the standard errors of the bioactivity are included in the model. In contrast, Model (7.1b) clearly benefits from this extra information: its precision improves substantially when the standard errors of the bioactivity are added.

The main conclusion of this chapter is that shelf life estimation of bioactivities works best when samples from a batch at different storage conditions are analysed in the same bioassay runs, and different batches are measured in different bioassay runs. This implies that the data should be processed together rather than per storage condition; otherwise the bioassay runs cannot be appropriately addressed.


Appendix

The SAS code used for the analysis is given below. The data set ‘S2’ in the PARMS statement contains the variance estimates for the bioactivities, including the initial value of the run-to-run variability. The letters b, c, and t represent batches, storage conditions, and time, respectively.

Model 7.1a

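proc mixed data=degrad noinfo noitprint noclprint method=reml;
  class b c time_cls;
  /* batch intercepts and batch-by-condition slopes; CovB for (7.7) */
  model x = b t*b*c / noint solution alpha=0.25 ddfm=bw notest CovB;
  /* a run holds all batches of one storage condition at one time */
  random intercept / sub=run(c*time_cls) type=vc;
  /* LOCAL: S2_hijk enter the residual structure */
  repeated / type=vc local;
  parms / parmsdata=S2;
  ods output CovParms=CovParms SolutionF=SolutionF CovB=CovB;
  by sim;
run;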

Model 7.1b

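proc mixed data=degrad noinfo noitprint noclprint method=reml;
  class b c time_cls;
  /* batch intercepts and batch-by-condition slopes; CovB for (7.7) */
  model x = b t*b*c / noint solution alpha=0.25 ddfm=bw notest CovB;
  /* a run holds all storage conditions of one batch at one time */
  random intercept / sub=run(b*time_cls) type=vc;
  /* LOCAL: S2_hijk enter the residual structure */
  repeated / type=vc local;
  parms / parmsdata=S2;
  ods output CovParms=CovParms SolutionF=SolutionF CovB=CovB;
  by sim;
run;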

The results for $X_{jk}^{T}V_1^{-1}X_{jk}$ and its summation for Models (7.1a) and (7.1b) are given below. Notation:

- $\mathbf{I}_n$ is the $n\times n$ identity matrix and $\mathbf{1}_n$ an $n$-vector of ones;
- $t_{\cdot}=\sum_{j=1}^{J}t_j$ and $t^2_{\cdot}=\sum_{j=1}^{J}t_j^2$;
- in each matrix, the top-left block corresponds to the $H$ batch intercepts and the bottom-right block to the $I$ degradation slopes.

Model (7.1a):

$$X_{jk}^{T}V_1^{-1}X_{jk}=c\begin{pmatrix} I\left[(1-r)\mathbf{I}_H+r\,\mathbf{1}_H\mathbf{1}_H^{T}\right] & (1+(H-1)r)\,t_j\,\mathbf{1}_H\mathbf{1}_I^{T}\\ (1+(H-1)r)\,t_j\,\mathbf{1}_I\mathbf{1}_H^{T} & H(1+(H-1)r)\,t_j^{2}\,\mathbf{I}_I\end{pmatrix}$$

and

$$\sum_{j=1}^{J}\sum_{k=1}^{K}X_{jk}^{T}V_1^{-1}X_{jk}=c\begin{pmatrix} IKJ\left[(1-r)\mathbf{I}_H+r\,\mathbf{1}_H\mathbf{1}_H^{T}\right] & K(1+(H-1)r)\,t_{\cdot}\,\mathbf{1}_H\mathbf{1}_I^{T}\\ K(1+(H-1)r)\,t_{\cdot}\,\mathbf{1}_I\mathbf{1}_H^{T} & HK(1+(H-1)r)\,t^2_{\cdot}\,\mathbf{I}_I\end{pmatrix}.$$

Model (7.1b), with $c$ and $r$ computed from (7.12) and (7.13) with $H$ replaced by $I$:

$$X_{jk}^{T}V_1^{-1}X_{jk}=c\begin{pmatrix} I(1+(I-1)r)\,\mathbf{I}_H & (1+(I-1)r)\,t_j\,\mathbf{1}_H\mathbf{1}_I^{T}\\ (1+(I-1)r)\,t_j\,\mathbf{1}_I\mathbf{1}_H^{T} & H\,t_j^{2}\left[(1-r)\mathbf{I}_I+r\,\mathbf{1}_I\mathbf{1}_I^{T}\right]\end{pmatrix}$$

and

$$\sum_{j=1}^{J}\sum_{k=1}^{K}X_{jk}^{T}V_1^{-1}X_{jk}=c\begin{pmatrix} IJK(1+(I-1)r)\,\mathbf{I}_H & K(1+(I-1)r)\,t_{\cdot}\,\mathbf{1}_H\mathbf{1}_I^{T}\\ K(1+(I-1)r)\,t_{\cdot}\,\mathbf{1}_I\mathbf{1}_H^{T} & KH\,t^2_{\cdot}\left[(1-r)\mathbf{I}_I+r\,\mathbf{1}_I\mathbf{1}_I^{T}\right]\end{pmatrix}.$$


References

Almalik, O., Nijhuis, M. B., and van den Heuvel, E. R. (2014), “Combined Statistical Analyses for Long-term Stability Data with Multiple Storage Conditions: A Simulation Study,” Journal of Biopharmaceutical Statistics, 24, 493–506.

Bartlett, M. S. (1937), “Properties of Sufficiency and Statistical Tests,” Proceedings of the Royal Society of London, Series A, 160, 268–282.

Brown, H. and Prescott, R. (2006), Applied Mixed Models in Medicine, Chichester, West Sussex: John Wiley & Sons, 2nd ed.

ICH Q1A(R2) (2003), “Stability Testing of New Drug Substances and Products,” ICH Harmonised Tripartite Guideline, 24.

Mzolo, T., Goris, G., Talens, E., Di Bucchianico, A., and Van den Heuvel, E. (2015), “Statistical Process Control Methods for Monitoring In-house Reference Standards,” Statistics in Biopharmaceutical Research, 7, 55–65.

Mzolo, T., Hendriks, M., and Van den Heuvel, E. (2013), “A Comparison of Statistical Methods for Combining Relative Bioactivities from Parallel Line Bioassays,” Pharmaceutical Statistics, 12, 375–384.

Quinlan, M., Stroup, W., Schwenke, J., and Christopher, D. (2013), “Evaluating the Performance of the ICH Guidelines for Shelf Life Estimation,” Journal of Biopharmaceutical Statistics, 23, 881–896.

Van den Heuvel, E. R., Almalik, O., Nijhuis, M. B., and Warner, E. I. (2011), “Statistical Analysis for Long-term Stability Studies With Multiple Storage Conditions,” Drug Information Journal, 45, 301–314.

Verbeke, G. and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data, New York: Springer.

West, B. T., Welch, K. B., and Galecki, A. T. (2007), Linear Mixed Models: A Practical Guide Using Statistical Software, vol. 27, Boca Raton, Florida: Chapman & Hall.


CHAPTER 8

Concluding remarks

This thesis set out to evaluate and improve the statistical methodology used to analyse data from bioassays. This type of methodology has received little attention, as evidenced by the limited literature on the subject over the last thirty years. The findings of the thesis and possible further research topics are summarised in the following sections.

8.1 Estimation of the bioactivity

Chapters 2–3 of this thesis focused on the estimation of the bioactivity. Chapter 2 proposed an equivalence testing approach for similarity, or parallelism, of the dose-response curves in a bioassay. The approach is based on the relative bioactivities of the samples involved and was compared with traditional hypothesis testing. Data from case studies based on parallel line, slope-ratio, quantal, and quantitative bioassays were used to demonstrate the appropriateness of the equivalence test for similarity.

The take-home message was that an equivalence test formulated directly on the bioactivities is theoretically feasible when the dose range is limited to certain intervals, but it requires a large sample size. Given a sufficiently large sample size, this equivalence testing approach is a better alternative to the traditional hypothesis test, because the equivalence limits are derived directly from the relative bioactivity.

In our approach, several assumptions were made about the nonlinear dose-response relationship. For instance, for the 4PL model 2PP response curves, the asymptotes of the response curves and the response curves were assumed equal for the test and known preparations. This resulted in a tractable function $g_\theta(\cdot)$ (as defined in Sections 2.3.1 and 2.3.2) for an equivalence testing approach. This function indicates whether the bioactivity is dose-independent ($g_\theta(\cdot)=1$) or not ($g_\theta(\cdot)\neq 1$). It would be of interest to learn how changes in the parameters α, γ, and δ affect the proposed equivalence approach.

In our research we assumed that the dose-response relationships have a certain form with only a set of parameters being unknown. Nonparametric estimation of the dose-response relationships was not investigated. Our theory does not require a parametric form, but the consequences of nonparametric estimation for similarity remain to be investigated.

Chapter 3 focused on the estimation of the bioactivity from parallel line bioassays when the similarity assumption is fulfilled. It assessed unweighted, weighted, and likelihood approaches for combining bioactivities. Most of the compared methods were developed for bioassays, but the random effects and profile likelihood approaches were also included. The latter approach was developed specifically for meta-analysis, and it seemed reasonable to include it to assess how well it performs in combining bioactivities from different bioassay runs.

The results showed that the unweighted average method is consistently better than the weighted average and likelihood approaches, although it tends to be less precise (wider confidence intervals) than the methods of Cochran (1954) and Bliss (1952). These two weighted average approaches performed better, especially when the pretesting significance level is relaxed, that is, when α > 0.05 is used. Surprisingly, the profile likelihood and random effects approaches performed poorly. This reinforces the notion that some methods that perform well in meta-analysis do not translate well to the combination of bioactivities.

Additional research is needed to investigate whether the methods for combining log bioactivities perform differently when applied to sigmoid dose-response curves (discussed in Chapter 2) instead of parallel line models (Chapter 3). The estimation of the bioactivity for sigmoidal curves differs from that for parallel line bioassays. For 4PL curves the log bioactivity is the difference in the log ED50 of the two preparations, which is closer to the calculation of treatment effects in clinical trials. Thus, for sigmoidal curves the approaches used in meta-analysis may outperform the other methods, but this has not yet been investigated for the specific settings of bioassays.

8.2 Monitoring the bioactivity

The objective of Chapter 4 was to evaluate statistical methods that can be used to monitor the in-house reference standard. This in-house reference standard is created for routine analysis to avoid using an expensive international reference standard, and the stability of its bioactivity should be assessed during the period of its use. In this chapter, we used ANOVA-type contrasts that are commonly used in dose-finding studies. In addition, we used EWMA and Shewhart control charts, which are commonly used in quality control to assess the stability of processes. Different α-spending functions were used to correct for the inflation of the type I error rate. The results indicated that pretesting, that is, assessing the significance of the run-to-run variability, tends to inflate the type I error rate. Shift and trend profiles were assessed, and the results did not show a clear winner in terms of detecting a change in bioactivity. However, the linear contrasts, the EWMA chart (λ = 0.6), and the Shewhart chart had a higher probability of signaling a change when more bioassay runs were used. The Shewhart chart had a higher probability of signaling early changes in bioactivity than the linear contrasts. The Helmert and reverse-Helmert contrasts were more powerful at detecting declines in bioactivity at later and earlier time points, respectively. For these approaches to be effective, more bioassay runs should be used at the initial stage of the monitoring period.

Page 169: Statistical methods for the analysis of bioassay data. This work was financially supported by the pharmaceutical company MSD based in Oss, The Netherlands. © Thembile Virginia Mzolo, 2016
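As an illustration of the ANOVA-type contrasts for monitoring, the sketch below builds linear-trend, Helmert, and reverse-Helmert coefficients over successive bioassay runs and forms a z statistic for a contrast of run means. It assumes a known standard deviation and independent run means, and it omits run-to-run random effects and the α-spending adjustments used in Chapter 4. Conventions for Helmert contrasts differ between software packages (some compare each level with the mean of the preceding levels), so the orientation used here is an assumption, as are all function names and data.

```python
import math


def linear_contrast(k):
    """Centred linear-trend coefficients for k equally spaced bioassay runs."""
    centre = (k - 1) / 2.0
    return [i - centre for i in range(k)]


def helmert_contrasts(k):
    """Each run compared with the mean of all later runs."""
    rows = []
    for i in range(k - 1):
        later = k - i - 1
        rows.append([0.0] * i + [1.0] + [-1.0 / later] * later)
    return rows


def reverse_helmert_contrasts(k):
    """Each run compared with the mean of all earlier runs."""
    rows = []
    for i in range(1, k):
        rows.append([-1.0 / i] * i + [1.0] + [0.0] * (k - i - 1))
    return rows


def contrast_z(run_means, coeffs, sigma, n_per_run):
    """z statistic for sum(c_i * mean_i), assuming a known SD sigma of a single
    bioactivity and n_per_run bioactivities averaged in each run mean."""
    estimate = sum(c * m for c, m in zip(coeffs, run_means))
    se = sigma * math.sqrt(sum(c * c for c in coeffs) / n_per_run)
    return estimate / se


# Hypothetical mean log bioactivities of the in-house standard over four runs.
means = [0.02, 0.01, -0.03, -0.08]
z_trend = contrast_z(means, linear_contrast(4), sigma=0.05, n_per_run=3)
z_helmert = [contrast_z(means, row, 0.05, 3) for row in helmert_contrasts(4)]
```

A strongly negative z for a contrast points to a decline of the type that contrast is built to detect.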

For some products, an international reference standard is not available. This means that the pharmaceutical company must create both a primary reference standard and in-house, or secondary, reference standards. The primary reference $R_p$ (taking the place of the international reference standard) is typically stored at −70 or −80 °C, and the secondary reference $S_m$ (taking the place of the in-house reference standard) is typically stored at −20 °C. At time point $t_m$, the secondary references $S_i$ are measured, where $i \in \{m-2, m-1, m\}$. New secondary reference standards are created continually to maintain a system of stable references. The primary reference standard is used to quantify the bioactivities of the secondary references; as a result, the stability of the primary reference standard can be monitored simultaneously with that of the secondary reference standards. At each time point, the most recently qualified secondary reference standard is used for routine analysis. In Chapter 5, the EWMA approach was proposed as the methodology for monitoring the stability of the primary and secondary reference standards. The simulation study focused on the stability of the primary reference standard, and the optimal values of the weighting parameter λ and the width of the control limits L were investigated using the POS and APOS. The stability of the secondary reference standards was not investigated because the newest standard is always used for routine analysis. However, the EWMA control chart could be used to lengthen the period of use of a secondary reference from ∆t to 2∆t or longer.
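A minimal EWMA chart with the weighting parameter λ and limit width L is sketched below. It assumes independent, standardised observations with a known in-control mean and standard deviation, so it ignores the correlation between sequential bioactivities that arises in the bioassay setting; the function name and the input values are hypothetical.

```python
import math


def ewma_chart(xs, mu0, sigma, lam=0.2, L=3.0):
    """EWMA statistics z_t = lam * x_t + (1 - lam) * z_{t-1}, started at z_0 = mu0,
    with time-varying limits
    mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 t))).

    Returns the statistics, the limits, and the first time point (1-based) at
    which z_t falls outside its limits, or None if there is no signal."""
    z = mu0
    stats, limits = [], []
    first_signal = None
    for t, x in enumerate(xs, start=1):
        z = lam * x + (1.0 - lam) * z
        width = L * sigma * math.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        stats.append(z)
        limits.append((mu0 - width, mu0 + width))
        if first_signal is None and abs(z - mu0) > width:
            first_signal = t
    return stats, limits, first_signal


# Hypothetical standardised log bioactivities of the primary reference.
observed = [0.0, 0.0, -5.0]
_, _, signal_at = ewma_chart(observed, mu0=0.0, sigma=1.0, lam=0.2, L=3.0)
# With lam = 1 the chart reduces to a Shewhart chart with L-sigma limits.
_, _, shewhart_signal = ewma_chart([0.0, 3.5], mu0=0.0, sigma=1.0, lam=1.0, L=3.0)
```

Setting λ = 1 makes each statistic equal to the latest observation and the limit width equal to Lσ, so the Shewhart chart is a special case of the same function.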

The analysis focused on one bioactivity for each reference standard, as shown in the plate design (Table 5.1). It would be interesting to perform the analysis for multiple bioactivities to understand the influence of repeats on the performance of the EWMA approach. This implies that a common relative bioactivity of each secondary reference would have to be estimated. This was not included in the current analysis because of the computation time that such a design would require. In addition, the simulation focused on response curves that are assumed to be equivalent. We intend to extend the analysis by considering multiple bioactivities and response curves with different biological responses at the lower and upper asymptotes. This means that equivalence tests, such as the one proposed in Chapter 2, will have to be performed prior to estimating the relative bioactivity.

In Chapters 4 and 5, the EWMA method and the dose-finding contrasts were used. As far as we know, these methods had not previously been used in bioassay applications. Both approaches performed well in monitoring the bioactivity, which shows that some methods that were not specifically developed for bioassays can still be used. Hence, more work on the evaluation of these and other methods in bioassay applications is necessary. In particular, the correlation between sequentially observed bioactivities in the EWMA chart should be studied in more detail. This correlation structure is specific to the bioassay setting, and it may lead to different EWMA charts or criteria for detecting changes.

8.3 A modified Satterthwaite approach for estimation of one-sided tolerance limits for general mixed effects models

In Chapter 6, we focused on the specification limits of drug products, which can be determined using tolerance intervals. In this chapter, we proposed a new, simple, and flexible method for estimating tolerance limits. When tolerance intervals are calculated from a mixed or random effects model, the variance components must be estimated. In most approaches, the variance of a linear combination of variance components is estimated from the mean squares. However, when an unbalanced design is used, some of the mean squares are not uniquely defined, which makes the estimation of tolerance limits difficult. The new approach uses the REML variance component estimates themselves instead of the mean squares. It performed well under the settings that were considered. It is, however, important to note that these settings are related to bioassays, and the approach may fail in other applications. More work is still needed to improve the performance of this approach in more general settings.

The performance of the proposed method was good for some of the settings that were investigated. However, in the setting where (I, J, K) = (3, 6, 6), the proposed approach resulted in markedly liberal coverage for the two-way crossed random effects model. This was more apparent when the intraclass correlation was small, which suggests that further investigation is required to fully understand the behaviour of this approach. The problems may be due to the tolerance factor used in the calculation of the tolerance limits or to the estimation of the variance component $\sigma_T^2$. In addition, the variance parameter values that were used are rather general, so some of the variance assumptions may not be appropriate for a bioassay application. For example, the batch variance was assumed to be lower than the residual variance, which renders the batch-to-batch variability almost nonexistent. Settings in which the batch-to-batch variability is more prominent, or settings that represent an actual bioassay design, should be investigated.
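One common way to apply Satterthwaite's moment matching to REML estimates is to approximate the degrees of freedom of the estimated total variance $\hat\sigma_T^2$ by $\nu = 2(\hat\sigma_T^2)^2 / \widehat{\mathrm{Var}}(\hat\sigma_T^2)$, using the asymptotic covariance matrix of the variance-component estimates. The Python sketch below shows that calculation and the resulting lower tolerance limit. It is not necessarily the exact modification proposed in Chapter 6; the function names and inputs are hypothetical, and the tolerance factor k is taken as given rather than computed, since its choice is one of the open issues discussed in this section.

```python
import math


def satterthwaite_total_variance(components, cov):
    """Total variance sigma_T^2 as the sum of REML variance-component estimates,
    with Satterthwaite degrees of freedom nu = 2 * (sigma_T^2)^2 / Var(sigma_T^2).

    cov is the (asymptotic) covariance matrix of the component estimates;
    Var(sigma_T^2) is the sum of all its entries."""
    total = sum(components)
    var_total = sum(sum(row) for row in cov)
    nu = 2.0 * total ** 2 / var_total
    return total, nu


def lower_tolerance_limit(mean, total_var, k):
    """One-sided lower tolerance limit mean - k * sigma_T for a given factor k."""
    return mean - k * math.sqrt(total_var)


# Hypothetical REML output for a batch component and a residual component.
components = [1.0, 3.0]
cov = [[0.5, 0.0],
       [0.0, 1.5]]
total_var, nu = satterthwaite_total_variance(components, cov)  # 4.0 and 16.0
limit = lower_tolerance_limit(100.0, total_var, k=2.0)  # 96.0; k is illustrative only
```

In practice k would be derived from ν, the required content p, and the confidence level; keeping it as an argument makes that dependence explicit.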

8.4 Estimation of the shelf life of a drug product

In Chapter 7, we considered the estimation of the shelf life of drug products when several experiments are performed, that is, when the stability study requires more than one bioassay run. Two different designs were investigated, and it was concluded that a design in which samples from a batch at different storage conditions are analysed in the same bioassay run, and different batches are measured in different bioassay runs, is more appropriate. The analysis of the shelf life is complicated by the presence of the precision estimates of the bioactivities. Hence, a mathematical comparison of the precision of the shelf life estimates from the two models is not straightforward because of the presence of $S_{hijk}^2$. In principle, the precision of the shelf life estimates could be obtained, since the log bioactivities follow a multivariate normal distribution, although the likelihoods are less tractable because of the precision estimates involved. For future research, it would be interesting to approximate the precision of the shelf life estimates for the two proposed experimental designs and to verify theoretically whether the conclusions of the simulations hold.
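For reference, the basic regression idea behind shelf life estimation (as in the ICH Q1E guideline) is sketched below for a single batch at one storage condition: the shelf life is the last time at which the one-sided lower confidence bound of the mean degradation line stays at or above the specification limit. The sketch uses ordinary least squares and ignores bioassay run effects and the precision estimates $S_{hijk}^2$, which are exactly what complicate the designs of Chapter 7. The function name, the grid search, and the data are hypothetical.

```python
import math


def shelf_life(times, log_bio, spec_limit, t_crit, t_max, step=0.01):
    """Largest grid time up to t_max at which the one-sided lower confidence
    bound of the fitted mean degradation line is still >= spec_limit.

    Single batch, single storage condition, ordinary least squares;
    t_crit is the one-sided t quantile with n - 2 df, supplied by the caller."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(log_bio) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, log_bio)) / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((y - intercept - slope * t) ** 2 for t, y in zip(times, log_bio))
    s = math.sqrt(rss / (n - 2))
    last_ok = None
    for i in range(int(round(t_max / step)) + 1):
        t = i * step
        se_mean = s * math.sqrt(1.0 / n + (t - t_bar) ** 2 / sxx)
        if intercept + slope * t - t_crit * se_mean < spec_limit:
            break
        last_ok = t
    return last_ok  # None if the bound is already below the limit at t = 0


# Hypothetical noise-free log bioactivities losing 0.01 per month; the
# log specification limit is -0.2, so the line crosses it at 20 months.
months = [0, 3, 6, 9, 12]
log_bioactivity = [-0.01 * m for m in months]
estimated = shelf_life(months, log_bioactivity, spec_limit=-0.2, t_crit=2.353, t_max=36)
```

With real, noisy data the confidence band widens away from the mean time, so the estimated shelf life falls earlier than the crossing point of the fitted line.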

Summary

Biostatistics plays a crucial role in the development, licensing, release, and marketing of biological products in the pharmaceutical industry. Statistical methods used in bioassays are unique, and some of them are outdated. Statistical challenges are often encountered that require tailor-made statistical methodology for better estimation of drug activity. This type of statistical research is necessary to improve the precision and reliability of bioassays and to increase the efficiency of testing for the release of medicinal and drug products. This thesis focused on two strongly interrelated aspects of statistical methods used for biological assays: the estimation of the bioactivity and the estimation of product quality using bioassays.

A relative bioactivity is estimated by fitting an appropriate model, for either a linear or a nonlinear dose-response relationship, to the bioassay data. The bioactivity of a test sample relative to a known or standard sample is then computed from the results of the fitted model. For the estimation of the bioactivity, a statistical method that is highly precise when combining bioassay estimates is still not known. Another issue in bioassay analysis is that official guidelines require the similarity assumption to be assessed using an equivalence hypothesis, but the feasibility of this approach had not been appropriately evaluated in the context of bioassays. Our results showed that although the equivalence approach is theoretically appropriate for testing similarity, it requires a large sample size for similarity to be demonstrated. In Chapter 3 we demonstrated that statistical methods that are commonly preferred in meta-analysis may not be suitable for combining bioassay runs. Furthermore, the weighted average approaches performed better, especially when the significance level was relaxed.
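The equivalence approach to similarity can be illustrated with the two one-sided tests (TOST) procedure, expressed as a confidence-interval check. The sketch is hypothetical: the estimate stands for a difference between a test and a standard curve parameter (for example a slope or an asymptote), and the margin and the normal-approximation critical value are illustrative. It shows why sample size matters: the same point estimate passes or fails depending only on its standard error.

```python
def tost_similar(estimate, std_error, margin, z_crit=1.645):
    """Two one-sided tests at level alpha, via the equivalent check that the
    (1 - 2*alpha) confidence interval lies entirely inside (-margin, margin).

    z_crit is the (1 - alpha) quantile; 1.645 gives alpha = 0.05 under a
    normal approximation (a t quantile would be used with few residual df)."""
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    return (-margin < lower and upper < margin), (lower, upper)


# Same hypothetical difference and margin; only the standard error changes,
# as it would with more or fewer replicates and runs.
similar_large_n, _ = tost_similar(0.05, 0.2, margin=0.5)
similar_small_n, _ = tost_similar(0.05, 0.4, margin=0.5)
```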

The second part of the thesis focused on several topics related to product quality, which is mainly measured using the bioactivity. Statistical methodology for monitoring the bioactivity of a known in-house preparation with respect to an international reference standard, or for monitoring the bioactivity of a known in-house preparation when an international reference standard is not available, is very limited or nonexistent. This means that new methodology is required to enable the monitoring of the bioactivity, and this methodology should take the bioassay data structure into account. Monitoring the standard preparations helps guarantee the quality of drug products that are released to the market. For the in-house preparation, we demonstrated that assessing the significance of the run-to-run variability inflates the familywise error rate, and that the ANOVA-type linear contrasts and the Shewhart chart performed better in detecting a change in the bioactivity, irrespective of the decline profile. In Chapter 5 we showed that the exponentially weighted moving average with λ = 0.2 has a higher probability of detecting a decline in the bioactivity of the primary reference. Another quality-related aspect of a drug product is the determination of specification limits. For these limits, we proposed a general approach that resulted in good coverage for the settings that we considered.

Batches of products produced by pharmaceutical companies are stored under different storage conditions to maintain their shelf life. These storage conditions may lead to different degradation rates of a drug product. Thus, it is important that the shelf life of each batch is estimated precisely. In this thesis, we considered two designs for stability studies. In the first design, bioassay runs are nested within storage conditions; in the second, bioassay runs are nested within batches. The performance of these designs in estimating shelf lives under different storage conditions for multiple batches was investigated. It was concluded that a design in which samples from a batch at different storage conditions are analysed in the same bioassay run, and different batches are measured in different bioassay runs, is more appropriate.

Page 175: Statistical methods for the analysis of bioassay dataThis work was financially supported by the pharmaceutical company MSD based in Oss, The Netherlands. c ThembileVirginiaMzolo,2016

Curriculum Vitae

Thembile Virginia Mzolo was born on August 22, 1983 in Othulini, South Africa. After finishing high school at Muden Combined School in 2002, she went on to study for a degree in Operations Research (2003-2005), followed by an honours degree in Statistics (2006) and a Master's degree in Statistics (2007-2008) at the University of KwaZulu-Natal in Pietermaritzburg, South Africa. In 2008 she started working as a statistician at the Human Sciences Research Council in Pretoria, South Africa. In October 2009 she was awarded a scholarship for a Master's in Statistics, specialising in Biostatistics, at Hasselt University, where she graduated with distinction in 2011.

In March 2012, she started her PhD research project at the University Medical Center Groningen, University of Groningen, under the supervision of Edwin van den Heuvel. The project focused on statistical methods for the analysis of bioassay data. In 2015 she moved to Eindhoven University of Technology to complete her PhD under the supervision of Edwin van den Heuvel and Alessandro Di Bucchianico. The research has led to several publications in international journals. This dissertation is a collection of the research conducted in the past four years.

During her PhD, Thembile was involved in teaching activities in Groningen and Eindhoven. She participated in various conferences, workshops, and meetings in Groningen, Rotterdam, Potsdam, Hasselt, London, and Brugge.

