
Multiple Comparisons Methods in Genetic Epidemiology Studies

Date post: 28-Mar-2019
Upload: ngothien
Multiple Comparisons Methods in Genetic Epidemiology Studies. Yi Ren Wang, MPH, Department of Epidemiology, UCLA School of Public Health
Transcript
Page 1

Multiple Comparisons Methods in Genetic Epidemiology Studies

Yi Ren Wang, MPH

Department of Epidemiology, UCLA School of Public Health

Page 2

Genetic Epidemiology Today

• Genetic association studies have become more ambitious:

Early studies focused on one or a few candidate SNPs

Recent studies target many SNPs and haplotypes using high-throughput platforms

Page 3

Genome-wide Association Study

Large number of genetic variations involved
• 1 test for each of 500,000 SNPs
• 25,000 expected to be significant at p<0.05 by chance alone

To make things worse
• Dominance (additive/dominant/recessive)
• Epistasis (multiple combinations of SNPs)
• Multiple phenotype definitions
• Subgroup analyses
• Multiple analytic methods

Page 4

Motivating Example

DNA-DSBR Pathway and Lung & UADT Cancer Study

Page 5

Goal of the study

This study intends to cover the genetic variations across the whole DNA-DSBR pathway, in order to systematically reveal a full picture of how genetic polymorphisms in the double-strand break repair pathway alter the risks of lung cancer and UADT cancer

The potential gene-gene and gene-environment interactions will be explored

Page 6

Study Design

Population-based case-control study in Los Angeles
611 new cases of lung cancer
601 new cases of UADT cancer
1040 cancer-free controls matched to cases by age (within 10-year categories) and gender

Page 7

Gene Selection

19 genes involved in the DNA-DSBR pathway were selected for evaluation based on evidence for their role in either the homologous recombination repair (HR) or the non-homologous end joining (NHEJ) pathways.

Page 8

SNPs Selection

Known functional SNPs within the DNA double-stranded break repair pathway were selected
As well as potential functional SNPs such as amino-acid-changing (nonsynonymous) SNPs (nsSNPs)
With a minor allele frequency (MAF) greater than 5%

Page 9

SNPs Selection

189 SNPs analyzed are in or near one of 19 DNA-DSBR genes.

Page 10

Study Design

SAS 9.1 software will be used for data analysis.
ORs and 95% CLs will be computed using unconditional logistic regression
Potential confounding factors adjusted: age, gender, ethnicity, educational level and tobacco smoking for lung cancer; age, gender, ethnicity, educational level, tobacco smoking, alcohol drinking and diet for UADT cancer
χ2 test is performed to evaluate Hardy-Weinberg equilibrium.

Page 11

Stratified Analyses

Lung Cancer:
Non-small cell lung carcinoma (NSCLC)
Small cell lung carcinoma (SCLC)

Head and Neck Cancer:
Oral cancer
Pharyngeal cancer
Laryngeal cancer
Esophageal cancer

Page 12

Stratified and Multivariate Analyses

Interaction between DSBR and smoking for lung cancer
Interaction between DSBR and smoking for UADT cancer
Interaction between DSBR and alcohol drinking for UADT cancer
Haplotype analysis

Page 13

What are the Genetic Epidemiology Issues?

Population stratification
• Variation of SNP frequency by ethnicity
• Genomic control parameter will be calculated to assess the validity of the results
High dimensional data
• Gene-environment interactions
Interaction of host genetics with environment
• Gene-gene interactions
Interaction of different SNPs
Multiple comparisons

Page 14

Multiple comparisons issue

Page 15

Hypothesis Testing

H0 : Null hypothesis vs. H1 : Alternative hypothesis

T : test statistic, C : critical value

If |T| > C, H0 is rejected. Otherwise H0 is retained

Ex) $H_0 : \mu_1 = \mu_2$ vs. $H_1 : \mu_1 \neq \mu_2$, $T = (\bar{x}_1 - \bar{x}_2) / \text{pooled SE}$

If $|T| > z_{1-\alpha/2}$, H0 is rejected at the significance level α
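The decision rule on this slide can be sketched in a few lines. This is an illustration only: the function name `two_sample_z_test` is hypothetical, and it assumes a known common standard deviation `sigma`, so that the pooled SE is `sigma * sqrt(1/n1 + 1/n2)`.

```python
import math
from statistics import NormalDist, mean


def two_sample_z_test(x1, x2, sigma, alpha=0.05):
    """Two-sided z-test of H0: mu1 = mu2.

    Assumption (not from the slides): both samples share a known
    standard deviation `sigma`.
    """
    se = sigma * math.sqrt(1 / len(x1) + 1 / len(x2))  # pooled standard error
    t = (mean(x1) - mean(x2)) / se                      # test statistic T
    c = NormalDist().inv_cdf(1 - alpha / 2)             # critical value z(1 - alpha/2)
    p = 2 * (1 - NormalDist().cdf(abs(t)))              # two-sided p-value
    return t, c, p, abs(t) > c                          # reject H0 when |T| > C


t, c, p, reject = two_sample_z_test([5.1, 4.9, 5.3, 5.2], [4.2, 4.4, 4.0, 4.3], sigma=0.3)
print(f"T = {t:.2f}, C = {c:.2f}, p = {p:.4g}, reject H0: {reject}")
```

Rejecting when |T| > C is the same decision as rejecting when p < α.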

Page 16

Hypothesis Testing

                 Hypothesis Result
                 Retained         Rejected
Truth   H0                        Type I error
        H1       Type II error

Type I error rate = false positives (α : significance level)
Type II error rate = false negatives
Power : 1 - Type II error rate

• P-values : $p = \inf\{\alpha \mid H_0 \text{ is rejected at the significance level } \alpha\}$

Page 17

Issues in Multiple Comparison

Q : Given n treatments, which two treatments are significantly different? (simultaneous testing)
cf) Is treatment A different from treatment B?
Ex) n treatment means : $\mu_1, \ldots, \mu_n$

$H_j : \mu_i = \mu_j$ where $i \neq j$, $T_j = (\bar{x}_i - \bar{x}_j) / \text{pooled SE}$

• Type I error when testing n independent comparisons, each at the 0.05 significance level, one by one : $1 - (0.95)^n$

• Inflated Type I error, ex) $\alpha = 1 - (0.95)^{10} = 0.401263$

• Remedies : Bonferroni Method
Per-comparison Type I error rate = α / number of comparisons
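The inflation and the Bonferroni remedy can be checked numerically. The sketch below assumes independent tests with every null hypothesis true; the function names are illustrative and do not come from the slides.

```python
def fwer_independent(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-comparison significance level under the Bonferroni method."""
    return alpha / n_tests


for n in (1, 10, 100):
    uncorrected = fwer_independent(n)
    corrected = fwer_independent(n, bonferroni_alpha(n))
    print(f"n = {n:3d}  uncorrected = {uncorrected:.6f}  Bonferroni = {corrected:.6f}")
```

With the Bonferroni level, the overall Type I error stays at or below 0.05 however many comparisons are made.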

Page 18

Multiple Comparisons

Probability of finding a false association by chance = $1 - 0.95^n$
• n = 10, p = 40%
• n = 100, p = 99.4%

Our data:
• 189 genotypes, 2 cancer sites, 10 subgroup analyses
• N = 189 × (2 + 10) = 2268, p > 99.99999%

Page 19

Type I Error Rates

                 Hypothesis Result
                 #retained   #rejected   Total
Truth   H0       U           V           m0
        H1       T           S           m1
Total            m-R         R           m

Per-comparison error rate (PCER) = E(V) / m
Per-family error rate (PFER) = E(V)
Family-wise error rate (FWER) = Pr(V ≥ 1)
False discovery rate (FDR) = E(Q), where Q = V/R if R > 0, and Q = 0 if R = 0
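A small simulation makes the four error rates concrete. It is a sketch under invented settings, not the study's analysis: `m` independent z-tests, of which the first `m1` have a true effect of size `effect`. All names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_error_rates(m=20, m1=5, effect=3.0, alpha=0.05, n_sim=2000, seed=0):
    """Estimate PCER, PFER, FWER and FDR for uncorrected two-sided z-tests."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    sum_v = 0.0   # running total of V (false rejections)
    any_v = 0.0   # number of simulations with V >= 1
    sum_q = 0.0   # running total of Q = V/R (0 when R = 0)
    for _ in range(n_sim):
        v = s = 0
        for j in range(m):
            shift = effect if j < m1 else 0.0   # first m1 hypotheses are true H1
            rejected = abs(rng.gauss(shift, 1.0)) > z_crit
            if rejected and j < m1:
                s += 1                          # true rejection (S)
            elif rejected:
                v += 1                          # false rejection (V)
        r = v + s
        sum_v += v
        any_v += 1 if v >= 1 else 0
        sum_q += v / r if r > 0 else 0.0
    return {
        "PCER": sum_v / (n_sim * m),
        "PFER": sum_v / n_sim,
        "FWER": any_v / n_sim,
        "FDR": sum_q / n_sim,
    }


rates = simulate_error_rates()
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

In every simulated data set V/m ≤ V/R ≤ 1{V ≥ 1} ≤ V, so the estimates always satisfy PCER ≤ FDR ≤ FWER ≤ PFER.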

Page 20

False Positives

In the absence of bias, three factors determine the probability that a statistically significant finding is actually a false-positive finding
the magnitude of the P value
statistical power
fraction of tested hypotheses that is true

Page 21

Multiple Comparisons

There is a lack of consensus regarding the optimal approach to address the false-positive probability of single nucleotide polymorphism (SNP) associations.

Page 22

Methods for Multiple Comparisons

Ignore it
Adjust p-values
• Familywise Error Rate (FWER)
Chance of any false positives
• False discovery rate (FDR) Benjamini et al 2001
Use Bayesian methods
• False positive report probability (FPRP) Wacholder et al 2004

Page 23

FWER controlling procedures

Bonferroni
• adj Pvalue = min(n*Pvalue, 1)
Holm (1979)
Hochberg (1988)
Westfall & Young (1993) maxT and minP

Page 24

Bonferroni correction

For testing 500,000 SNPs
• 5,000 expected to be significant at p<0.01
• 500 expected to be significant at p<0.001
• ……
• 0.05 expected to be significant at p<0.0000001

Suggests setting significance level to α = 10^-7*
Bonferroni correction for m tests
set significance level for p-values to α = 0.05 / m
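The counts on this slide follow from multiplying the number of tests by the threshold. The snippet assumes every null hypothesis is true; the variable and function names are illustrative only.

```python
def expected_false_positives(m, threshold):
    """Expected number of tests with p < threshold when all m nulls are true."""
    return m * threshold


m = 500_000
for threshold in (0.05, 0.01, 0.001, 1e-7):
    print(f"p < {threshold:g}: {expected_false_positives(m, threshold):g} expected by chance")

# Bonferroni significance level for m tests at an overall alpha of 0.05
bonferroni_threshold = 0.05 / m
print(f"Bonferroni threshold: {bonferroni_threshold:g}")
```

For m = 500,000 the Bonferroni threshold 0.05 / m equals the 10^-7 level quoted above.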

Page 25

Multiple Testing Procedures based on P-values that control the family-wise error rate

For a single hypothesis H1, $p_1 = \inf\{\alpha \mid H_1 \text{ is rejected at the significance level } \alpha\}$
If $p_1 < \alpha$, H1 is rejected. Otherwise H1 is retained

Adjusted p-values for multiple testing (p*)
$p_j^* = \inf\{\alpha \mid H_j \text{ is rejected at FWER} = \alpha\}$

If $p_j^* < \alpha$, Hj is rejected. Otherwise Hj is retained

Single-Step, Step-Down and Step-Up procedure

Page 26

Single-Step Procedure

For a strong control of FWER,
single-step Bonferroni adjusted p-values : $p_j^* = \min(m p_j, 1)$
single-step Sidak adjusted p-values : $p_j^* = 1 - (1 - p_j)^m$

For a weak control of FWER,
single-step minP adjusted p-values : $p_j^* = \Pr\left(\min_{1 \le k \le m} P_k \le p_j \mid \text{complete null}\right)$

single-step maxT adjusted p-values : $p_j^* = \Pr\left(\max_{1 \le k \le m} |T_k| \ge C_j \mid \text{complete null}\right)$

Under subset pivotal property, weak control = strong control
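The two closed-form single-step adjustments are short enough to write out. This is a minimal sketch with hypothetical function names, using only the Bonferroni and Sidak formulas from this slide.

```python
def bonferroni_adjust(pvals):
    """Single-step Bonferroni adjusted p-values: min(m * p_j, 1)."""
    m = len(pvals)
    return [min(m * p, 1.0) for p in pvals]


def sidak_adjust(pvals):
    """Single-step Sidak adjusted p-values: 1 - (1 - p_j)^m."""
    m = len(pvals)
    return [1 - (1 - p) ** m for p in pvals]


raw = [0.01, 0.04, 0.5]
print("Bonferroni:", bonferroni_adjust(raw))
print("Sidak:     ", sidak_adjust(raw))
```

The Sidak value is never larger than the Bonferroni value for the same raw p-value, so it is slightly less conservative.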

Page 27

Step-Down Procedure

Order the observed unadjusted p-values such that $p_{r_1} \le p_{r_2} \le \ldots \le p_{r_m}$
Accordingly, order the hypotheses $H_{r_1}, H_{r_2}, \ldots, H_{r_m}$

Holm's procedure
$j^* = \min\{ j \mid p_{r_j} > \alpha / (m-j+1) \}$, reject $H_{r_j}$ for $j = 1, \ldots, j^*-1$

Adjusted step-down p-values
Holm : $p_{r_j}^* = \max_{k=1,\ldots,j} \{ \min( (m-k+1)\, p_{r_k}, 1) \}$
Sidak : $p_{r_j}^* = \max_{k=1,\ldots,j} \{ 1 - (1 - p_{r_k})^{m-k+1} \}$
minP : $p_{r_j}^* = \max_{k=1,\ldots,j} \{ \Pr( \min_{l \in \{r_k, \ldots, r_m\}} P_l \le p_{r_k} \mid \text{complete null}) \}$
maxT : $p_{r_j}^* = \max_{k=1,\ldots,j} \{ \Pr( \max_{l \in \{r_k, \ldots, r_m\}} |T_l| \ge C_{r_k} \mid \text{complete null}) \}$
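Holm's step-down adjustment can be sketched as a running maximum over the sorted p-values. The function name `holm_adjust` is illustrative and the input p-values are made up.

```python
def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices r_1, ..., r_m
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):                 # rank = k - 1
        candidate = min((m - rank) * pvals[idx], 1.0)  # (m - k + 1) * p_(r_k), capped at 1
        running_max = max(running_max, candidate)      # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

A hypothesis is rejected at FWER α when its adjusted p-value falls below α, which is the same decision as the sequential rule on the slide.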

Page 28

Step-Up Procedure

Order the observed unadjusted p-values such that $p_{r_1} \le p_{r_2} \le \ldots \le p_{r_m}$
Accordingly, order the hypotheses $H_{r_1}, H_{r_2}, \ldots, H_{r_m}$

$j^* = \max\{ j \mid p_{r_j} \le \alpha / (m-j+1) \}$, reject $H_{r_j}$ for $j = 1, \ldots, j^*$

Adjusted step-up Hochberg p-values
$p_{r_j}^* = \min_{k=j,\ldots,m} \{ \min( (m-k+1)\, p_{r_k}, 1) \}$
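The step-up rule above corresponds to Hochberg's procedure, which uses the same multipliers as Holm but takes a running minimum starting from the largest p-value. A minimal sketch with a hypothetical function name:

```python
def hochberg_adjust(pvals):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])   # indices r_1, ..., r_m
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):                  # start from the largest p-value
        idx = order[rank]
        candidate = min((m - rank) * pvals[idx], 1.0)  # (m - k + 1) * p_(r_k), capped at 1
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


print(hochberg_adjust([0.01, 0.04, 0.03, 0.005]))
```

For these p-values the Hochberg result is never larger than the Holm result, which is why the step-up version rejects at least as many hypotheses.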

Page 29

Resampling Method

Bootstrap or permutation based method

For the bth permutation, b = 1, …, B, compute test statistics $t_{1,b}, \ldots, t_{m,b}$

$p_j^* = \sum_{b=1}^{B} I(|t_{j,b}| \ge C_j) / B$

ex) Golub et al. (1999)
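The permutation estimate can be sketched as follows. Several choices here are assumptions for illustration: the statistic is an absolute difference in means between cases and controls, `C_j` is taken to be the observed statistic for hypothesis j, and a second column (single-step maxT, comparing against the permutation maximum) is added for contrast. The data are invented.

```python
import random


def mean_diff(values, labels):
    """Difference in means between cases (label 1) and controls (label 0)."""
    cases = [v for v, y in zip(values, labels) if y == 1]
    controls = [v for v, y in zip(values, labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_pvalues(features, labels, n_perm=1000, seed=0):
    """Return (per-hypothesis permutation p-values, single-step maxT adjusted p-values)."""
    rng = random.Random(seed)
    observed = [abs(mean_diff(f, labels)) for f in features]     # C_j
    raw_hits = [0] * len(features)
    maxt_hits = [0] * len(features)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)                                    # permute case/control labels
        stats = [abs(mean_diff(f, shuffled)) for f in features]  # |t_{j,b}|
        max_stat = max(stats)
        for j, c_j in enumerate(observed):
            raw_hits[j] += 1 if stats[j] >= c_j else 0
            maxt_hits[j] += 1 if max_stat >= c_j else 0
    raw = [h / n_perm for h in raw_hits]
    maxt = [h / n_perm for h in maxt_hits]
    return raw, maxt


labels = [1] * 10 + [0] * 10
separated = [10 + 0.1 * i for i in range(10)] + [0.1 * i for i in range(10)]
balanced = [0, 1] * 10
raw, maxt = permutation_pvalues([separated, balanced], labels)
print("per-hypothesis:", raw)
print("maxT adjusted: ", maxt)
```

Because the permutation maximum is at least as large as any single statistic, the maxT column is never smaller than the per-hypothesis column.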

Page 30

Resampling Method
Efron et al. (2000) and Tusher et al. (2001)

Compute a test statistic $t_j$ for each gene j and define order statistics $t_{(j)}$ such that $t_{(1)} \ge t_{(2)} \ge \ldots \ge t_{(m)}$
For each permutation b, b = 1, …, B, compute the test statistics and define the order statistics $t_{(1),b} \ge t_{(2),b} \ge \ldots \ge t_{(m),b}$
From the permutations, estimate the expected value (under the complete null) of the order statistics by $t^*_{(j)} = \sum_{b=1}^{B} t_{(j),b} / B$
Form a Q-Q plot of the observed $t_{(j)}$ vs. the expected $t^*_{(j)}$

Efron et al. : for a fixed threshold Δ, genes with $|t_{(j)} - t^*_{(j)}| \ge \Delta$ are called significant
Tusher et al. : for a fixed threshold Δ, let $j^* = \max\{ j : t_{(j)} - t^*_{(j)} \ge \Delta,\ t^*_{(j)} > 0 \}$
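The expected order statistics and the fixed-threshold rule attributed to Efron et al. above can be sketched directly. The function names and the toy numbers are hypothetical, and the permutation statistics are passed in precomputed rather than generated.

```python
def expected_order_statistics(perm_stats):
    """Average the sorted statistics across permutations.

    perm_stats: list of B lists, each holding the m statistics from one permutation.
    Returns t*_(1) >= ... >= t*_(m).
    """
    n_perm = len(perm_stats)
    sorted_perms = [sorted(stats, reverse=True) for stats in perm_stats]
    m = len(sorted_perms[0])
    return [sum(s[j] for s in sorted_perms) / n_perm for j in range(m)]


def call_significant(observed, perm_stats, delta):
    """Indices of genes whose ordered statistic differs from its expected value by >= delta."""
    order = sorted(range(len(observed)), key=lambda i: observed[i], reverse=True)
    expected = expected_order_statistics(perm_stats)
    return [i for rank, i in enumerate(order)
            if abs(observed[i] - expected[rank]) >= delta]


observed = [5.0, 0.1, -0.2]
perm_stats = [[1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]]
print(expected_order_statistics(perm_stats))
print(call_significant(observed, perm_stats, delta=1.0))
```

Plotting the observed ordered statistics against the values returned by `expected_order_statistics` gives the Q-Q plot described on the slide.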

Page 31

Multiple Testing Correction Options
Without consideration of prior probability

Family-wise error rate (FWER)
• Very conservative and does not tolerate any false positives
False Discovery Rate (FDR)
• False positives as a percentage of called genes
No correction
• False positives as a percentage of genes being tested

Page 32

The False Discovery Rate (FDR)

FDR is the expected ratio of erroneous rejections of the null hypothesis to the total number of rejected hypotheses among the SNPs analyzed in this report.

Page 33

A Measure Attached to Each Individual Association----Q Value

Expected proportion of false positives incurred when calling that association significant.

Page 34

Comparison of p-value and q-value

p-value : P(a null feature being as or more extreme than the observed one)

q-value : Expected proportion of false positives among all features as or more extreme than the observed one

Page 35

Q-value Software

http://faculty.washington.edu/~jstorey/qvalue/

Page 36

In DSBR Study

Bootstrap estimation method will be used to provide for each hypothesis test a q-value, which estimates the minimum FDR that can be attained when all tests with lower or equal p-values are called significant
This statistical procedure is appropriate to adjust for multiple testing in large scale association studies
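To show what "minimum FDR attained when all tests with lower or equal p-values are called significant" looks like in practice, here is a simplified sketch. It is not the bootstrap method named above: it assumes the proportion of true nulls is 1, which reduces the q-value to the Benjamini-Hochberg adjusted p-value and is conservative. The function name is hypothetical.

```python
def bh_qvalues(pvals):
    """Conservative q-values (proportion of true nulls assumed to be 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):           # walk from the largest p-value down
        idx = order[rank]
        candidate = pvals[idx] * m / (rank + 1)  # estimated FDR if we stop at this p-value
        running_min = min(running_min, candidate)
        q[idx] = running_min                     # minimum FDR over all larger cut-offs
    return q


print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

A dedicated q-value implementation would additionally estimate the proportion of true nulls and scale these values down accordingly.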

Page 37

The False-Positive Report Probability (FPRP)

FPRP is the probability of no true association between a genetic variant and disease given a statistically significant finding

Page 38

Determinants of FPRP

1) prior probability of a true association
2) observed P value
3) statistical power to detect the odds ratio of the alternative hypothesis at the given level or P value

Page 39

In DSBR Study

FPRP will be applied on a range of prior probabilities (i.e. 0.01 to 0.25)
A FPRP criterion of 0.2 will be used to identify which, if any, findings should be considered noteworthy
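The FPRP of Wacholder et al. (2004) combines the prior probability, the observed P value and the power as FPRP = α(1 - π) / (α(1 - π) + (1 - β)π), where π is the prior probability of a true association, α the observed P value and 1 - β the power. The sketch below evaluates it over the prior range quoted above. The P value of 0.001 and power of 0.8 are made-up inputs, and the function name is illustrative.

```python
def fprp(p_value, power, prior):
    """False-positive report probability for one finding."""
    false_pos = p_value * (1 - prior)   # chance of a significant result with no true association
    true_pos = power * prior            # chance of a significant result with a true association
    return false_pos / (false_pos + true_pos)


FPRP_CRITERION = 0.2
for prior in (0.01, 0.05, 0.1, 0.25):
    value = fprp(p_value=0.001, power=0.8, prior=prior)
    verdict = "noteworthy" if value < FPRP_CRITERION else "not noteworthy"
    print(f"prior = {prior:<5} FPRP = {value:.3f}  {verdict}")
```

The same P value can be noteworthy under one prior and not under another, which is why a range of priors is examined.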

Page 40

Thank you!

