ELIZABETH GARRETT-MAYER PROFESSOR OF BIOSTATISTICS DIRECTOR, BIOSTATISTICS SHARED RESOURCE HOLLINGS...


ELIZABETH GARRETT-MAYER

PROFESSOR OF BIOSTATISTICS

DIRECTOR, BIOSTATISTICS SHARED RESOURCE

HOLLINGS CANCER CENTER

Statistical Considerations for Studies Involving Mice (et al.)

What started this? ARRIVE

Improving Bioscience Research Reporting: The ARRIVE Guidelines for Reporting Animal Research*

The gist:
  Careful consideration should be given to planning experiments with animals.
  Reports of results should provide details of the experimental designs, including sample sizes, of mouse experiments.

“A wealth of evidence shows that across many areas, the reporting of biomedical research is often inadequate, leading to the view that even if the science is sound, in many cases the publications themselves are not ‘fit for purpose,’ meaning that incomplete reporting of relevant information effectively renders many publications of limited value as instruments to inform policy or clinical and scientific practice.”

*Kilkenny et al., PLOS Biology, June 2010.

Why are specifics not reported?

Lots of reasons:
  space
  lack of understanding of their importance
  lack of confidence in methods used or inability to articulate methods
  recognition that results may look less important if all details are included
  they are not required!
  peer reviewers don’t care or understand
  novel approaches often get panned because it isn’t a t-test, for example.

Why are specifics not reported?

If you don’t have good rationale for your planned experimental design, what are you going to say?

“We experimented on 3 mice, and the p-value was non-significant. So, we continued adding 3 mice to the experiment until we got a significant p-value.”

“We chose 6 mice per group because we always use 6 mice per group.”

“We chose 8 mice per group because that was all the budget allowed.”

“We chose to report the results of differences in tumor size at day 45 because differences at all other days were not statistically significant.”

ARRIVE guidelines: 20 item checklist

TITLE (1) Provide as accurate and concise a description of the content of the article as possible.

ABSTRACT (2) Provide an accurate summary of the background, research objectives (including details of the species or strain of animal used), key methods, principal findings, and conclusions of the study.

INTRODUCTION

Background (3)
  a. Include sufficient scientific background (including relevant references to previous work) to understand the motivation and context for the study, and explain the experimental approach and rationale.
  b. Explain how and why the animal species and model being used can address the scientific objectives and, where appropriate, the study’s relevance to human biology.

Objectives (4) Clearly describe the primary and any secondary objectives of the study, or specific hypotheses being tested.

METHODS

Ethical statement (5) Indicate the nature of the ethical review permissions, relevant licences (e.g. Animal [Scientific Procedures] Act 1986), and national or institutional guidelines for the care and use of animals, that cover the research.

Study design (6) For each experiment, give brief details of the study design, including:
  a. The number of experimental and control groups.
  b. Any steps taken to minimise the effects of subjective bias when allocating animals to treatment (e.g., randomisation procedure) and when assessing results (e.g., if done, describe who was blinded and when).
  c. The experimental unit (e.g. a single animal, group, or cage of animals).
  A time-line diagram or flow chart can be useful to illustrate how complex study designs were carried out.

Experimental procedures (7) For each experiment and each experimental group, including controls, provide precise details of all procedures carried out. For example:
  a. How (e.g., drug formulation and dose, site and route of administration, anaesthesia and analgesia used [including monitoring], surgical procedure, method of euthanasia). Provide details of any specialist equipment used, including supplier(s).
  b. When (e.g., time of day).
  c. Where (e.g., home cage, laboratory, water maze).
  d. Why (e.g., rationale for choice of specific anaesthetic, route of administration, drug dose used).

Experimental animals (8)
  a. Provide details of the animals used, including species, strain, sex, developmental stage (e.g., mean or median age plus age range), and weight (e.g., mean or median weight plus weight range).
  b. Provide further relevant information such as the source of animals, international strain nomenclature, genetic modification status (e.g. knock-out or transgenic), genotype, health/immune status, drug- or test-naïve, previous procedures, etc.

Housing and husbandry (9) Provide details of:
  a. Housing (e.g., type of facility, e.g., specific pathogen free (SPF); type of cage or housing; bedding material; number of cage companions; tank shape and material etc. for fish).
  b. Husbandry conditions (e.g., breeding programme, light/dark cycle, temperature, quality of water etc. for fish, type of food, access to food and water, environmental enrichment).
  c. Welfare-related assessments and interventions that were carried out before, during, or after the experiment.

Sample size (10)
  a. Specify the total number of animals used in each experiment and the number of animals in each experimental group.
  b. Explain how the number of animals was decided. Provide details of any sample size calculation used.
  c. Indicate the number of independent replications of each experiment, if relevant.

Allocating animals to experimental groups (11)
  a. Give full details of how animals were allocated to experimental groups, including randomisation or matching if done.
  b. Describe the order in which the animals in the different experimental groups were treated and assessed.

Experimental outcomes (12) Clearly define the primary and secondary experimental outcomes assessed (e.g., cell death, molecular markers, behavioural changes).

Statistical methods (13)
  a. Provide details of the statistical methods used for each analysis.
  b. Specify the unit of analysis for each dataset (e.g. single animal, group of animals, single neuron).
  c. Describe any methods used to assess whether the data met the assumptions of the statistical approach.

RESULTS

Baseline data (14) For each experimental group, report relevant characteristics and health status of animals (e.g., weight, microbiological status, and drug- or test-naïve) before treatment or testing (this information can often be tabulated).

Numbers analysed (15)
  a. Report the number of animals in each group included in each analysis. Report absolute numbers (e.g. 10/20, not 50%).
  b. If any animals or data were not included in the analysis, explain why.

Outcomes and estimation (16) Report the results for each analysis carried out, with a measure of precision (e.g., standard error or confidence interval).

Adverse events (17)
  a. Give details of all important adverse events in each experimental group.
  b. Describe any modifications to the experimental protocols made to reduce adverse events.

DISCUSSION

Interpretation/scientific implications (18)
  a. Interpret the results, taking into account the study objectives and hypotheses, current theory, and other relevant studies in the literature.
  b. Comment on the study limitations including any potential sources of bias, any limitations of the animal model, and the imprecision associated with the results.
  c. Describe any implications of your experimental methods or findings for the replacement, refinement, or reduction (the 3Rs) of the use of animals in research.

Generalisability/translation (19) Comment on whether, and how, the findings of this study are likely to translate to other species or systems, including any relevance to human biology.

Funding (20) List all funding sources (including grant number) and the role of the funder(s) in the study.

Manuscripts vs. grants

Similar principles, but very limited space in grants

Convince the reviewers that
  you have a clear question
  you have a method for gathering relevant data
  you can measure the outcomes of interest
  you know what to do with the data
  the results will answer the clear question

1. You have a clear question

Stated objective and rationale should make it clear what your scientific question is.

Stating your hypothesis can be very helpful.
  E.g., the knockout model will have faster tumor growth than the wildtype model.
  E.g., gene expression will be lower in the mice treated with inhibitor compared to untreated mice.

This is standard in clinical research but not as common in basic/translational research.

2. You have a method for gathering relevant data

Experimental design!!

Details… we need details.

If you cannot explain the design to the reviewer, the reviewer cannot understand the data that are generated.

Be very careful of biases in your design.

Example: evaluating metastases.
  The design says mice will be followed until death or until the primary tumor reaches xx mm³.
  Differential follow-up time.

Example: comparing tumor size.
  The design says you will compare tumor volumes at day 60.
  Preliminary data suggest that you will have had to sacrifice most of the animals in your control group by day 50.
  How can you compare tumors when the mouse died 10+ days ago?

Randomization?
  Important to consider “confounders”.
  Confounder: a ‘variable’ that might affect your outcome that is not related to the experimental conditions
    e.g., shipping batch
    e.g., diet/temperature/location of cage

Blinding?
  Inherent biases when you know group assignment!
  Important to NOT know if the mouse being evaluated is in the group you expect to do better/worse.
  Subconscious effects.
  In clinical research, taken for granted.

Sources of variation?
  Transgenic vs. xenograft models?
  How many cell lines? Or, are you using (multiple) primary tumors?
  Fresh vs. frozen tissue?
  Reference gene? Is it really ‘stable’?

Sampling
  Longitudinal? Same mouse measured repeatedly over time.
  Separate cohorts? Sacrificing different cohorts over time.

Measurement
  Same vs. different ‘raters’?
  Different measurement approaches?
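One way to guard against a confounder such as shipping batch is to randomize within each batch, so every batch is spread evenly across treatment groups. The sketch below is only an illustration: the function name, mouse IDs, batch labels, and seed are all hypothetical, and other allocation schemes may suit a given design better.

```python
import random
from collections import defaultdict

def randomize_within_batches(mice, groups, seed=2024):
    """Assign mice to groups so that each shipping batch is balanced across groups.

    mice   : list of (mouse_id, batch) tuples (hypothetical identifiers)
    groups : list of group labels
    seed   : fixed seed so the allocation can be reported and reproduced
    """
    rng = random.Random(seed)
    by_batch = defaultdict(list)
    for mouse_id, batch in mice:
        by_batch[batch].append(mouse_id)

    allocation = {}
    for batch, ids in sorted(by_batch.items()):
        rng.shuffle(ids)                      # random order within the batch
        offset = rng.randrange(len(groups))   # random starting group for leftovers
        for i, mouse_id in enumerate(ids):
            allocation[mouse_id] = groups[(i + offset) % len(groups)]
    return allocation

# Hypothetical example: 12 mice from two shipping batches, two groups
mice = [(f"m{i:02d}", "batch_A" if i < 6 else "batch_B") for i in range(12)]
allocation = randomize_within_batches(mice, ["control", "treated"])
for mouse_id, batch in mice:
    print(mouse_id, batch, allocation[mouse_id])
```

Recording the seed and the procedure also gives you something concrete to report under the ARRIVE item on allocating animals to experimental groups.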

3. How you measure the outcome

Very often see “we will evaluate differences in antitumor efficacy”
  tumor size at time t?
  tumor growth rate?
  presence/absence of metastases?
  tumor take rate?
  caliper vs. imaging measures

Other measures: gene expression, methylation, mutations, histology, etc.
  assays need to be included
  type of measure should be included
    continuous expression?
    positive vs. negative?
    IHC, for example, can be expressed in many ways.

4. You know what to do with data

Example: You have measured the tumor volume every 5 days for 100 days on 100 mice.
  21 measures per mouse x 100 mice = 2100 measures.
  Why would you look only at the measures at one time point?

Statistical analysis plan is very important.

There are different approaches for
  continuous outcomes (e.g., tumor volume at time t)
  binary/categorical outcomes (e.g., presence of metastases)
  time-to-event outcomes (e.g., time to death/sacrifice)

There are different approaches for
  longitudinal data
  comparing vs. estimating vs. dose finding

Example

Moussa O, Ashton AW, Fraig M, Garrett-Mayer E, Ghoneim MA, Halushka PV, Watson DK. Novel role of thromboxane receptors beta isoform in bladder cancer pathogenesis. Cancer Research, 2008, Jun 1; 68(11): 4097-4104.

Xenograft mouse model. TCC-SUP tumorigenic human bladder cancer cells were selected as they express TP-β receptor and were used for the drug combination studies. Immortalized nontransformed normal urothelial SV-HUC cells were selected because they express the TP-α. These cells were stably transfected with pcDNA3, TP-α, or TP-β for cell transformation studies. Both cell lines were used in a s.c. model in immunocompromised (nu/nu) mice. TCC-SUP cells (5 × 10^6) or SV-HUC cells (5 × 10^7) in Matrigel (BD Bioscience, Inc.) were injected s.c. into the right and left flanks of anesthetized mice. Tumor growth was monitored in these mice twice a week. For mice injected with TCC-SUP, GR32191 or vehicle control was administered daily (20 mg/kg) by gavage with treatment initiated 24 h after initial injection. Two cycles of cisplatin [single high dose (5 mg/kg) or single low dose (0.5 mg/kg)] were administered at day 4 and day 11 post-tumor cell injection.

The data

[Figure: x-axis: Time (days), 0 to 80; treatment groups: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, Control.]

What are our questions?

Is the time to tumor initiation different across treatment groups?
  Is onset later in the cisplatin groups than in the other groups?

Is the growth rate different across treatment groups?
  Is the growth rate for high cisplatin smaller or larger than for low cisplatin in the GR group?

These are questions that can be addressed statistically.

Vague questions:
  Is tumor size different? (when?)
  Which treatment is the most effective? (using what metric?)

Time to tumor initiation: antitumor effects of GR32191 and cisplatin treatment in immunocompromised mice. Subcutaneous tumors from TCC-SUP human bladder cancer cells were treated with vehicle control (12 mice), GR32191 (15 mice), 5 mg/kg cisplatin (cisplatin high; 12 mice), 5 mg/kg cisplatin in combination with GR32191 (13 mice), 0.5 mg/kg cisplatin (single low-dose cisplatin) alone (10 mice), or 0.5 mg/kg cisplatin in combination with GR32191 (10 mice). Tumor size was measured over time. Kaplan-Meier curves show time to tumor onset across the treatment groups.

[Figure: Kaplan-Meier curves; x-axis: Time to Tumor (Days), 0 to 80; treatment groups: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, no GR.]
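For readers who have not seen how a Kaplan-Meier curve for time to tumor onset is built, here is a minimal sketch. The function name and the data are hypothetical (they are not the study data); mice that never developed a tumor are treated as censored at their last observed day.

```python
def kaplan_meier(times, events):
    """Return [(day, S(day))], where S is the estimated probability of being tumor-free.

    times  : day of tumor onset, or last day observed if no tumor appeared
    events : 1 if a tumor appeared on that day, 0 if the mouse was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        onsets = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)   # onsets plus censored at day t
        if onsets > 0:
            survival *= 1 - onsets / at_risk          # step down at each onset day
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical data for one group of 8 mice (NOT from the study):
# five mice developed tumors; three were tumor-free when follow-up ended at day 60.
times = [10, 14, 14, 21, 30, 60, 60, 60]
events = [1, 1, 1, 1, 1, 0, 0, 0]
curve = kaplan_meier(times, events)
for day, s in curve:
    print(day, round(s, 3))
```

In practice a statistics package would also give confidence bands and the hazard ratios used to compare groups; this sketch covers only the curve itself.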

Comparison                 Hazard ratio   p-value    95% confidence interval for hazard ratio
No GR vs. GR               2.42           0.02       1.12, 5.22
No GR vs. Low              1.24           0.63       0.51, 2.99
No GR vs. Low + GR         29.1           <0.0001    8.70, 97.2
No GR vs. High             304.1          <0.0001    65.4, 1414.2
No GR vs. High + GR        556.5          <0.0001    113.7, 2722.7
GR vs. Low                 0.51           0.12       0.22, 1.20
GR vs. Low + GR            12.0           0.0001     3.29, 43.79
GR vs. High                125.6          <0.0001    25.2, 626.4
GR vs. High + GR           229.8          <0.0001    43.9, 1203.3
Low vs. Low + GR           23.5           <0.0001    5.88, 93.9
Low vs. High               245.7          <0.0001    45.7, 1320.5
Low vs. High + GR          449.6          <0.0001    79.8, 2531.5
Low + GR vs. High          10.5           <0.0001    3.36, 32.6
Low + GR vs. High + GR     19.1           <0.0001    5.77, 63.4
High vs. High + GR         1.83           0.19       0.73, 4.61

Table 2: Hazard ratios comparing time to tumor in treatment groups. A hazard ratio greater than 1.00 implies that the first treatment has shorter time to tumor than the second in the comparison. For example, the hazard ratio comparing Low Cis vs. Low Cis + GR is 23.5. This implies that, at any given time, among mice who haven’t yet developed a tumor, mice treated with Low Cis were about 23.5 times as likely to have tumor incidence as mice treated with Low Cis + GR. Hazard ratios less than 1 imply a protective effect. For example, the hazard ratio for GR vs. Low is 0.51. This implies that, at any given point in time, among mice who haven’t yet developed a tumor, those treated with GR are 0.51 times as likely to have tumor incidence as mice treated with Low Cis.

Tumor growth rate: how to compare?

[Figure: x-axis: Time (days), 0 to 80.]

[Figure: x-axis: Time from Tumor Initiation (days) or Day 60, 0 to 80.] Data used for tumor growth analysis: notice that mice with no tumor onset are included from day 60+. The remaining mice all have data shown for volumes > 0.

[Figure: x-axis: Time from Injection (days), 0 to 80; treatment groups: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, Control.] Fitted regression lines per mouse, by treatment group (stage 1 of two-stage analysis).

[Figure: x-axis: Time from Injection (days), 0 to 80; treatment groups: High Cis+GR, High Cis, Low Cis+GR, Low Cis, GR, Control.] Estimated regression lines per treatment group (result of stage 2 of two-stage analysis).

Comparisons of slopes

Group           GR       Low Cis   Low Cis + GR   High Cis   High Cis + GR
Control         0.004    0.0001    <0.0001        0.14       0.0006
GR              -        0.75      0.47           0.08       0.89
Low Cis         -        -         0.58           0.01       0.84
Low Cis + GR    -        -         -              0.003      0.49
High Cis        -        -         -              -          0.03

Table 3: P-values for comparing slopes of tumor growth
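The two-stage idea behind the slope comparisons can be sketched as follows: stage 1 fits a regression line to each mouse, and stage 2 treats each mouse's slope as a single observation when comparing groups. This is a simplified illustration with made-up volumes; the helper names are hypothetical, and the Welch t statistic in stage 2 is an assumption chosen for brevity, not necessarily the model used for Table 3.

```python
from math import sqrt
from statistics import mean, stdev

def ols_slope(days, volumes):
    """Stage 1: least-squares slope of tumor volume on day for one mouse."""
    d_bar, v_bar = mean(days), mean(volumes)
    sxy = sum((d - d_bar) * (v - v_bar) for d, v in zip(days, volumes))
    sxx = sum((d - d_bar) ** 2 for d in days)
    return sxy / sxx

def welch_t(a, b):
    """Stage 2 (assumed here): Welch t statistic comparing mean slopes of two groups."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical volumes (NOT study data): 3 mice per group, measured on 4 days
days = [0, 10, 20, 30]
control = [[0, 100, 210, 290], [0, 120, 230, 370], [0, 90, 200, 310]]
treated = [[0, 40, 90, 110], [0, 60, 100, 160], [0, 50, 110, 140]]

control_slopes = [ols_slope(days, v) for v in control]   # one number per mouse
treated_slopes = [ols_slope(days, v) for v in treated]

print("mean slope, control:", round(mean(control_slopes), 2))
print("mean slope, treated:", round(mean(treated_slopes), 2))
print("Welch t statistic:", round(welch_t(control_slopes, treated_slopes), 2))
```

Because each mouse contributes one slope, the mouse (not the individual measurement) is the unit of analysis in the group comparison.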

Simpler ways to deal with the data?

Simple comparisons across groups

Example: two groups of mice have tail vein injections to establish tumors. They are followed for a fixed amount of time and then are sacrificed.

Question 1: All mice get tumors. You want to compare tumor burden in the two groups of mice. What test would you use to compare them?
  a) t-test
  b) Wilcoxon rank sum test
  c) ANOVA
  d) Fisher’s exact test
  e) it depends

Question 3: SOME mice get metastases. You want to compare incidence of metastases. What test should you use?
  a) Chi-square test
  b) Fisher’s exact test
  c) ANOVA
  d) Kaplan-Meier
  e) Signed rank test
  f) it depends

Question 2: All mice get tumors. You want to compare tumor burden in the two groups of mice. What are you comparing if you use a t-test?
  a) the distribution of tumor volume
  b) the distribution of the log of tumor volume
  c) the mean of tumor volume
  d) the mean of log tumor volume
  e) it depends
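To make one of the options in Question 3 concrete, the sketch below computes a two-sided Fisher's exact p-value for a 2x2 table of metastasis counts using only the standard library. The function name and the counts are hypothetical, and this is not meant to settle which answer is right: as the options say, it can depend on the design.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are (metastases, no metastases).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x metastases in group 1 with margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Hypothetical counts: 7 of 8 control mice vs. 2 of 8 treated mice with metastases
p = fisher_exact_two_sided(7, 1, 2, 6)
print(round(p, 4))
```

The test conditions on the row and column totals, which is why it remains valid with the small group sizes typical of mouse experiments.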

Simpler way to deal with the data?

By using simple approaches, you might be oversimplifying.

This can hurt you.
Example: you average triplicate values.
  You are naively assuming that there is 1/3 as much data as truly exists.
  This will make your standard errors larger than they should be.

This can invalidate your results.
Example: repeated measures on the same mouse are assumed to be independent.
  You are naively assuming that there is more data than truly exists.
  This will make your standard errors smaller than they should be.
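The repeated-measures problem can be seen with a few made-up numbers. In this sketch (hypothetical data; variable names are illustrative), treating 12 measurements from 4 mice as 12 independent observations gives a much smaller standard error than treating the mouse as the unit.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical tumor volumes: 4 mice, 3 repeated measurements each
measurements = {
    "mouse1": [9, 10, 11],
    "mouse2": [19, 20, 21],
    "mouse3": [29, 30, 31],
    "mouse4": [39, 40, 41],
}

# Naive: pretend all 12 values are independent observations (n = 12)
all_values = [v for values in measurements.values() for v in values]
naive_se = stdev(all_values) / sqrt(len(all_values))

# Mouse as the experimental unit: summarize each mouse first (n = 4)
mouse_means = [mean(values) for values in measurements.values()]
unit_se = stdev(mouse_means) / sqrt(len(mouse_means))

print("naive SE (n = 12):", round(naive_se, 2))
print("per-mouse SE (n = 4):", round(unit_se, 2))
```

Most of the variation here is between mice, so the extra measurements on the same mouse add very little independent information.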

5. Lastly…how many mice should you use?

Stay tuned for next week’s talk by EKG.

Quick aside… interpreting p-values

Definition: the p-value is the probability of getting a result as or more extreme than you observed if the null hypothesis is true

Can anyone translate that?

Hypotheses

H0: mean gene expression is the same in two groups

H1: mean gene expression is different in two groups

Next: we do an experiment, we collect data (e.g. gene expression), we perform a test.

What will affect the p-value?
  which hypothesis is actually true
  the variance of the values in each group
  the SAMPLE SIZE!!!
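To see how strongly sample size drives the p-value, the sketch below holds the observed difference and the standard deviation fixed and changes only the number of animals per group. It uses a normal approximation with an assumed known standard deviation (a simplification; a t-test would be more usual with small samples), and the function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, normal approximation.

    Assumes equal group sizes and a known, common standard deviation.
    """
    se = sd * sqrt(2 / n_per_group)          # standard error of the difference
    z = abs(mean_diff) / se
    return 2 * (1 - NormalDist().cdf(z))

# Same effect (difference of 0.5 SD), different sample sizes
for n in (5, 20, 100):
    print(n, round(two_sample_z_pvalue(0.5, 1.0, n), 4))
```

The effect size is identical in every row; only the sample size changes, yet the p-value moves from clearly non-significant to very small.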

What if…?

What if we find a p-value of 0.02 comparing the mean gene expression in two groups?

Your conclusion is…

I need more information, including:
  the effect size! How different is the gene expression?
  the sample size! How many samples/animals were in each group?

And, it would be nice to ‘see’ the data, either in a figure or by knowing the means and standard deviations in the two groups.

Two scenarios

Scenario 1: 5 mice per group. Gene expression is 4 times higher on average in the KO group compared to the WT group. P-value is p=0.02.

Scenario 2: Tissue microarray. Gene expression is 1.2 times higher in late stage cancers (n=100) compared to the normals (n=50) (p=0.02). There is no significant difference (fold change = 1.05) between early stage cancers (n=100) and normals (p=0.45).

Take home points

1. Never interpret a p-value without additional information

2. Statisticians can help you. At a minimum, we help you clarify your experimental design and definition of outcomes.

3. Statisticians have many tools in our toolboxes. A t-test is not the hammer and every experiment is not a nail.

4. Sample size justification will be next: a very important (maybe the most important) piece of statistical considerations in your grant.
