
RECOMB Satellite Workshop, 2007

Algorithms for Association Mapping of Complex Diseases With

Ancestral Recombination Graphs

Yufeng Wu

UC Davis

Association (or LD) Mapping

• Given a subset of SNPs from unrelated individuals, find unobserved genetic variations that strongly discriminate between individuals with the trait (cases) and those without it (controls)

• Complex diseases: difficult to map
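For contrast, the simplest association scan tests each typed SNP on its own. Below is a minimal sketch, assuming haplotypes are given as 0/1 strings and using a hypothetical helper name `allelic_chi_square`. It builds the 2x2 table of allele counts (cases vs. controls) at one site and returns Pearson's chi-square statistic. This is a standard single-marker baseline, not the method of this talk.

```python
def allelic_chi_square(case_haps: list[str], control_haps: list[str], site: int) -> float:
    """Pearson chi-square (1 d.f., no continuity correction) comparing how often
    allele '1' appears at `site` in case vs. control haplotypes."""
    a = sum(h[site] == "1" for h in case_haps)      # cases carrying allele 1
    b = len(case_haps) - a                          # cases carrying allele 0
    c = sum(h[site] == "1" for h in control_haps)   # controls carrying allele 1
    d = len(control_haps) - c                       # controls carrying allele 0
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # empty group or monomorphic site: no evidence of association
    # Closed form for a 2x2 table: n * (ad - bc)^2 / (product of the margins)
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
```

Scanning every site and reporting the largest statistic only examines the SNPs that were typed; the genealogy approach aims at the unobserved variants instead.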

Illustration (Zollner and Pritchard, Genetics, 2005)

[Figure: case and control haplotypes at six SNP markers]

1: 001101
2: 110000
3: 001110
4: 001000
5: 000010
6: 111101
7: 100011
8: 110001
9: 110010
10: 100011
11: 010000
12: 101101

Some Challenges in Association Mapping

The Genealogy Approach

• “…the best information that we could possibly get about association is to know the full coalescent genealogy…” – Zollner and Pritchard

• Goal: infer the genealogy from marker data with recombination
  – Approximation (e.g. in Zollner and Pritchard)

Ancestral Recombination Graph (ARG)

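Assumption: at most one mutation per site

• Mutations only: S1 = 00, S2 = 01, S3 = 10, S4 = 10
• Mutations and recombination: S1 = 00, S2 = 01, S3 = 10, S4 = 11, where 11 is a recombinant taking its first site from 10 and its second site from 01

[Figure: the corresponding ARGs over the nodes 00, 01, 10 and 11]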
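The need for recombination in the second example can be checked with the four-gamete test: under the assumption of at most one mutation per site, two sites whose columns contain all four combinations 00, 01, 10 and 11 cannot be explained by a tree alone. A small sketch (the function name is hypothetical; sites are 0-indexed):

```python
from itertools import combinations


def four_gamete_pairs(seqs: list[str]) -> list[tuple[int, int]]:
    """Site pairs (i, j), i < j, whose columns show all four gametes 00, 01, 10, 11.
    With at most one mutation per site, each such pair forces a recombination
    somewhere between sites i and j."""
    m = len(seqs[0])
    bad = []
    for i, j in combinations(range(m), 2):
        gametes = {s[i] + s[j] for s in seqs}
        if len(gametes) == 4:  # binary data: 4 distinct gametes means all four
            bad.append((i, j))
    return bad


print(four_gamete_pairs(["00", "01", "10", "10"]))  # [] -> a tree suffices
print(four_gamete_pairs(["00", "01", "10", "11"]))  # [(0, 1)] -> recombination needed
```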

Full-ARG Approaches

• First full-ARG mapping method (Minichiello and Durbin)
  – Uses full plausible ARGs, but heuristic
  – Less complex disease model

• Our results (Wu, 2007)
  – Sampling full ARGs with provable properties, and working with a more complex disease model
  – Focus on parsimonious histories:
    • minARGs: ARGs that use the minimum number of recombinations
    • Near-minimum ARGs
  – Uniform sampling of minARGs
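Finding the exact minimum number of recombinations is NP-hard in general, but simple lower bounds are easy to compute. As an illustration only (this is the classic Hudson-Kaplan bound, not the method of this talk), the sketch below collects site pairs that fail the four-gamete test and counts the largest set of pairwise non-overlapping intervals among them; every ARG for the data needs at least that many recombinations. The function name is hypothetical.

```python
from itertools import combinations


def hudson_kaplan_bound(seqs: list[str]) -> int:
    """Hudson-Kaplan lower bound on the number of recombinations in any ARG
    for binary sequences `seqs` (at most one mutation per site)."""
    m = len(seqs[0])
    # Each incompatible pair (i, j) needs a breakpoint between sites i and j.
    intervals = [
        (i, j)
        for i, j in combinations(range(m), 2)
        if len({s[i] + s[j] for s in seqs}) == 4
    ]
    # Greedy by right endpoint: maximum number of intervals sharing no breakpoint.
    count, last_end = 0, -1
    for i, j in sorted(intervals, key=lambda iv: iv[1]):
        if i >= last_end:
            count += 1
            last_end = j
    return count


data = ["00000", "01000", "01100", "01101", "11000", "00010", "11011", "00011"]
print(hudson_kaplan_bound(data))  # 2
```

The eight sequences used here are the input of the counting example for self-derived ARGs.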

Special Case: ARG with Only Input Sequences

• Self-derivability (SD) Problem: construct an ARG with only the input sequences

• In fact, such an ARG, if it exists, must be a minARG

• Runs in $O(2^n)$ time

• Heuristics to extend to non-self-derivable data
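To make the problem concrete, here is a minimal sketch of a decision procedure. It assumes the all-zero sequence is the root and is among the inputs, and that a sequence can be derived from earlier ones either by a single 0-to-1 mutation at a site that no earlier sequence carries, or by one crossover joining a prefix of one earlier sequence to a suffix of another. These rules are our reading of the model, not code from the talk, and `derivable` and `self_derivable` are hypothetical names. Memoizing over subsets of the input is what gives the exponential, roughly 2^n, number of states.

```python
from functools import lru_cache


def derivable(s: str, earlier: frozenset) -> bool:
    """Can s be obtained from `earlier` by one mutation or one recombination?"""
    m = len(s)
    for t in earlier:
        diff = [j for j in range(m) if s[j] != t[j]]
        # Mutation: a single 0 -> 1 change at a site no earlier sequence carries.
        if len(diff) == 1 and s[diff[0]] == "1" and all(u[diff[0]] == "0" for u in earlier):
            return True
    # Recombination: prefix of one earlier sequence + suffix of another (one crossover).
    return any(
        any(a[:k] == s[:k] for a in earlier) and any(b[k:] == s[k:] for b in earlier)
        for k in range(1, m)
    )


def self_derivable(seqs: list[str]) -> bool:
    root = "0" * len(seqs[0])
    if root not in seqs:
        return False  # this sketch requires the all-zero root among the inputs

    @lru_cache(maxsize=None)
    def ok(subset: frozenset) -> bool:
        # Try every choice of the sequence derived last; at most 2^n subsets are visited.
        if subset == frozenset([root]):
            return True
        return any(derivable(s, subset - {s}) and ok(subset - {s}) for s in subset - {root})

    return ok(frozenset(seqs))


print(self_derivable(["00", "01", "10", "11"]))  # True
print(self_derivable(["00", "11"]))              # False
```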

Counting Self-derived ARGs

Input (8 sequences, 5 sites): 00000, 01000, 01100, 01101, 11000, 00010, 11011, 00011

Choices for the last row to derive:
1. 11011 last: N1 = 164 for the remaining seven rows
2. 01101 last: N2 = 76 for the remaining seven rows

N = 164*1 + 76*2 = 316

Sampling one self-derived ARG, last row first:

Select 11011 with prob. = 164/316 = 0.52, and 01101 with prob. = 76*2/316 = 0.48

1. Random value Rnd = 0.3 < 0.52
2. Pick seq = 11011 as the last row to derive
3. Move to the reduced matrix
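The counting and sampling steps can be sketched as follows, under these assumed derivation rules: the all-zero root is among the inputs; a mutation is a single 0-to-1 change at a site no earlier sequence carries; a recombination joins a prefix of one earlier sequence to a suffix of another. Here `ways(s, earlier)` counts the mutations plus the distinct ordered (prefix parent, suffix parent) pairs that produce s; treating these as the distinct derivations is itself an assumption. With it, deriving 11011 last from the other seven sequences has 1 way and deriving 01101 last has 2 ways, consistent with the multipliers on the slide; the sketch has not been checked against the totals 164, 76 and 316. The sampler picks the last row with probability proportional to (ways) * (count for the reduced set), as in steps 1 to 3, and then repeats on the reduced matrix. All names are hypothetical.

```python
import random
from functools import lru_cache


def ways(s: str, earlier: frozenset) -> int:
    """Number of ways to derive s from the sequences in `earlier` (assumed rules)."""
    m = len(s)
    n_mut = 0
    for t in earlier:
        diff = [j for j in range(m) if s[j] != t[j]]
        # Single 0 -> 1 mutation at a site that no earlier sequence carries.
        if len(diff) == 1 and s[diff[0]] == "1" and all(u[diff[0]] == "0" for u in earlier):
            n_mut += 1
    # One-crossover recombinants: distinct ordered (prefix parent, suffix parent) pairs.
    pairs = {
        (a, b)
        for k in range(1, m)
        for a in earlier if a[:k] == s[:k]
        for b in earlier if b[k:] == s[k:]
    }
    return n_mut + len(pairs)


def make_counter(root: str):
    @lru_cache(maxsize=None)
    def count(subset: frozenset) -> int:
        """Number of derivations of `subset` starting from the root."""
        if subset == frozenset([root]):
            return 1
        total = 0
        for s in subset - {root}:
            w = ways(s, subset - {s})
            if w:
                total += w * count(subset - {s})
        return total
    return count


def count_derivations(seqs: list[str]) -> int:
    root = "0" * len(seqs[0])
    return make_counter(root)(frozenset(seqs)) if root in seqs else 0


def sample_order(seqs: list[str], rng: random.Random) -> list[str]:
    """Draw a derivation order by repeatedly choosing the last row to derive."""
    root = "0" * len(seqs[0])
    count = make_counter(root)
    subset = frozenset(seqs)
    if root not in subset or count(subset) == 0:
        raise ValueError("not self-derivable under the assumed rules")
    order = []
    while subset != frozenset([root]):
        candidates = sorted(subset - {root})
        # Weight of each candidate last row: (ways to derive it) * (count for the rest).
        weights = [ways(s, subset - {s}) * count(subset - {s}) for s in candidates]
        last = rng.choices(candidates, weights=weights)[0]
        order.append(last)
        subset = subset - {last}
    return [root] + order[::-1]


data = ["00000", "01000", "01100", "01101", "11000", "00010", "11011", "00011"]
print(count_derivations(data))
print(sample_order(data, random.Random(0)))
```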

An ARG Represents a Set of Marginal Trees

• Clear separation of cases/controls: NOT expected for complex diseases!
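Each position x has its own marginal tree: at a recombination node, x is inherited from one parent or the other depending on which side of the breakpoint it lies. A minimal sketch, assuming a hypothetical dictionary encoding of the ARG (described in the comments), extracts the marginal tree at x and lists its clades, so one can check whether the cases form a clade.

```python
# Hypothetical ARG encoding (an assumption for this sketch): every non-root node maps to
#   ("tree", parent)                        ordinary node with a single parent, or
#   ("rec", left_parent, right_parent, bp)  recombination node: positions x < bp are
#                                           inherited from left_parent, x >= bp from right_parent.
ARG = dict[str, tuple]


def marginal_parents(arg: ARG, x: float) -> dict[str, str]:
    """Parent map of the marginal tree at position x (one parent per node)."""
    parents = {}
    for node, info in arg.items():
        if info[0] == "tree":
            parents[node] = info[1]
        else:
            _, left, right, bp = info
            parents[node] = left if x < bp else right
    return parents


def leaf_clusters(parents: dict[str, str], leaves: list[str]) -> set[frozenset]:
    """Leaf sets (clades) of size >= 2 below the nodes of a marginal tree."""
    below: dict[str, set] = {}
    for leaf in leaves:
        node = leaf
        while node in parents:  # walk up to the root, which has no parent entry
            node = parents[node]
            below.setdefault(node, set()).add(leaf)
    return {frozenset(s) for s in below.values() if len(s) >= 2}


# Toy ARG: leaf c descends from recombination node X with breakpoint 0.5.
arg = {"P": ("tree", "R"), "Q": ("tree", "R"), "X": ("rec", "P", "Q", 0.5),
       "a": ("tree", "P"), "b": ("tree", "Q"), "c": ("tree", "X")}
print(leaf_clusters(marginal_parents(arg, 0.2), ["a", "b", "c"]))  # clades {a,b,c} and {a,c}
print(leaf_clusters(marginal_parents(arg, 0.8), ["a", "b", "c"]))  # clades {a,b,c} and {b,c}
```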

Disease Model (Zollner & Pritchard)

Disease mutations: Poisson Process

Two alleles: wild-type and mutant

[Figure: example tree with edge labels 0.05 and 0.1]
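A sketch of this model on a single tree, assuming mutations do not revert (once a lineage is mutant, all its descendants are mutant) and that the tree is given as hypothetical `parents` and `lengths` dictionaries. By the Poisson assumption, an edge of length t carries at least one disease mutation with probability 1 - exp(-λt), where λ (`rate`) is a free parameter of the sketch.

```python
import math
import random


def simulate_disease_alleles(parents, lengths, leaves, rate, rng):
    """Drop disease mutations on each edge as a Poisson process with intensity
    `rate`; a leaf is mutant (1) if any edge on its path to the root carries at
    least one mutation, else wild-type (0).

    parents: child -> parent; lengths: child -> length of the edge above it.
    Assumes no back mutation (mutant lineages stay mutant)."""
    mutated = {}
    for node, t in lengths.items():
        # For a Poisson process, P(at least one event on length t) = 1 - exp(-rate * t).
        mutated[node] = rng.random() < 1.0 - math.exp(-rate * t)
    alleles = {}
    for leaf in leaves:
        node, mutant = leaf, False
        while node in parents:
            mutant = mutant or mutated[node]
            node = parents[node]
        alleles[leaf] = int(mutant)
    return alleles


# Hypothetical tree: a and b share the internal node u; c hangs off the root r.
parents = {"u": "r", "c": "r", "a": "u", "b": "u"}
lengths = {"u": 0.05, "a": 0.1, "b": 0.1, "c": 0.1}
print(simulate_disease_alleles(parents, lengths, ["a", "b", "c"], rate=2.0, rng=random.Random(42)))
```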

Disease Penetrance (Zollner & Pritchard)

$P_{A,1}$: probability that a mutant sequence becomes a case; $P_{C,1} = 1.0 - P_{A,1}$

$P_{A,0}$: probability that a wild-type sequence becomes a case; $P_{C,0} = 1.0 - P_{A,0}$

[Figure: the same tree with edge labels 0.05 and 0.1, leaves marked as case or control]

Phenotype Likelihood (Zollner and Pritchard)

• Given a tree $T_x$ at position x and the case/control phenotypes of its leaves, what is the probability $\Pr(\text{phenotypes} \mid T_x)$ of observing these phenotypes on $T_x$? (Zollner & Pritchard)
  – Sum over all subsets of mutated edges

• Adopted in this work
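Summing over all subsets of mutated edges explicitly is exponential in the number of edges. Under these assumptions (each edge mutated independently with probability 1 - exp(-λt), no back mutation, wild-type root, haploid penetrances $P_{A,1}$ and $P_{A,0}$), the sum factorizes over subtrees, in the style of Felsenstein's pruning algorithm. This is our own sketch of that recursion, not code from Zollner and Pritchard; the names and the tree encoding are hypothetical.

```python
import math


def phenotype_likelihood(children, lengths, root, phenotype, rate, pa1, pa0):
    """Pr(case/control labels of the leaves | tree), summing over all subsets of
    mutated edges with a pruning recursion instead of explicit enumeration.

    children: node -> list of children; lengths: child -> edge length above it;
    phenotype: leaf -> "case" or "control"; pa1 / pa0: P(case) for a mutant /
    wild-type sequence. Assumes a wild-type root and no back mutation."""

    def leaf_prob(leaf, mutant):
        p_case = pa1 if mutant else pa0
        return p_case if phenotype[leaf] == "case" else 1.0 - p_case

    def below(node, mutant):
        # Likelihood of the labels in the subtree of `node`, given its allele.
        kids = children.get(node, [])
        if not kids:
            return leaf_prob(node, mutant)
        total = 1.0
        for kid in kids:
            if mutant:  # the whole subtree inherits the mutant allele
                total *= below(kid, True)
            else:
                p = 1.0 - math.exp(-rate * lengths[kid])  # edge gets >= 1 mutation
                total *= p * below(kid, True) + (1.0 - p) * below(kid, False)
        return total

    return below(root, False)


children = {"r": ["u", "c"], "u": ["a", "b"]}
lengths = {"u": 0.05, "a": 0.1, "b": 0.1, "c": 0.15}
labels = {"a": "case", "b": "case", "c": "control"}
print(phenotype_likelihood(children, lengths, "r", labels, rate=2.0, pa1=0.3, pa0=0.05))
```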

Expected Phenotype Likelihood

• Needed for assessing statistical significance.

• Null model: randomly permute the case/control labels.

• Our result: an $O(n^3)$ algorithm for computing the expected value of the phenotype likelihood.
  – Exact, fully deterministic method.
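Under random permutation of the labels, every assignment of the k case labels to the n leaves is equally likely, so the expected likelihood is the average of the phenotype likelihood over all C(n, k) such assignments. The sketch below computes that average exactly by tracking, for each subtree and each allele state at its top, the total likelihood of all labelings with j cases. It assumes independent Poisson mutations per edge (probability 1 - exp(-λt)), no back mutation, a wild-type root and haploid penetrances pa1 and pa0. It is our own polynomial-time dynamic program, not necessarily the $O(n^3)$ algorithm from the talk; names are hypothetical.

```python
import math


def expected_likelihood(children, lengths, root, n_cases, rate, pa1, pa0):
    """Exact mean of Pr(labels | tree) over all ways of assigning n_cases case
    labels to the leaves (the rest are controls), i.e. under label permutation."""

    def convolve(u, v):
        out = [0.0] * (len(u) + len(v) - 1)
        for i, x in enumerate(u):
            for j, y in enumerate(v):
                out[i + j] += x * y
        return out

    def table(node):
        # {allele at node: vec}, vec[j] = summed likelihood of all labelings of the
        # leaves below `node` that contain exactly j cases.
        kids = children.get(node, [])
        if not kids:
            return {s: [1.0 - (pa1 if s else pa0), pa1 if s else pa0] for s in (0, 1)}
        result = {0: [1.0], 1: [1.0]}
        for kid in kids:
            t = table(kid)
            p = 1.0 - math.exp(-rate * lengths[kid])  # P(edge above kid is mutated)
            from_wild = [p * x + (1.0 - p) * y for x, y in zip(t[1], t[0])]
            result[1] = convolve(result[1], t[1])  # mutant parent: child stays mutant
            result[0] = convolve(result[0], from_wild)
        return result

    vec = table(root)[0]  # wild-type root
    n_leaves = len(vec) - 1
    return vec[n_cases] / math.comb(n_leaves, n_cases)


children = {"r": ["u", "c"], "u": ["a", "b"]}
lengths = {"u": 0.05, "a": 0.1, "b": 0.1, "c": 0.15}
print(expected_likelihood(children, lengths, "r", n_cases=2, rate=2.0, pa1=0.3, pa0=0.05))
```

Because every labeling of the leaves is counted once across all values of k, the quantities C(n, k) times the expected likelihood sum to 1 over k, which is a convenient sanity check.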

Diploid Penetrance

Diploid: two sequences per individual

Diploid penetrance:

$P_{A,00}$: probability that an individual with two wild-type sequences becomes a case

$P_{A,01}$: …, $P_{A,11}$: …

[Figure: individuals marked as case or control]

Efficient computation of the phenotype likelihood in this setting: raised but left unresolved in Zollner and Pritchard

Our result (Wu, 2007): computing phenotype likelihood with diploid penetrance is NP-hard
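Intuitively, pairing two haplotypes into one individual couples leaves that may sit in distant parts of the tree, so the likelihood no longer factorizes over subtrees. A brute-force sketch that still sums over all subsets of mutated edges (exponential in the number of edges) is shown below. It assumes independent Poisson mutations per edge with probability 1 - exp(-λt), no back mutation, and a penetrance table indexed by the number of mutant haplotypes; all names are hypothetical.

```python
import math


def diploid_likelihood(parents, lengths, individuals, rate, pa):
    """Brute-force Pr(case/control labels | tree) with diploid penetrance.

    parents: child -> parent; lengths: child -> length of the edge above it.
    individuals: list of (leaf1, leaf2, is_case) for the two haplotypes of a person.
    pa: P(case) by number of mutant haplotypes, i.e. (P_A00, P_A01, P_A11).
    Enumerates all 2^|E| subsets of mutated edges, so it is exponential."""
    edges = list(lengths)
    total = 0.0
    for mask in range(2 ** len(edges)):
        mutated = {e: bool(mask >> i & 1) for i, e in enumerate(edges)}
        weight = 1.0
        for e in edges:
            p = 1.0 - math.exp(-rate * lengths[e])
            weight *= p if mutated[e] else 1.0 - p

        def is_mutant(leaf):
            node = leaf
            while node in parents:  # mutant if any edge above the leaf is mutated
                if mutated[node]:
                    return True
                node = parents[node]
            return False

        for leaf1, leaf2, is_case in individuals:
            p_case = pa[is_mutant(leaf1) + is_mutant(leaf2)]
            weight *= p_case if is_case else 1.0 - p_case
        total += weight
    return total


parents = {"u": "r", "v": "r", "a": "u", "b": "u", "c": "v", "d": "v"}
lengths = {node: 0.1 for node in parents}
individuals = [("a", "c", True), ("b", "d", False)]
print(diploid_likelihood(parents, lengths, individuals, rate=2.0, pa=(0.1, 0.4, 0.9)))
```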

Simulation Results

Comparison: TMARG (uniform), TMARG (pathway), LATAG, MARGARITA

50 ARGs per data set

[Figure: Uniform, Pathway, LATAG, MARGARITA; y-axis from 0.1 to 0.24]

50/5000 ARGs per data set

[Figure: n50, n5000, LATAG, MARGARITA; y-axis from 0.1 to 0.24]

Acknowledgement

• Software available at: http://wwwcsif.cs.ucdavis.edu/~wuyu

• I want to thank
  – Dan Gusfield
  – Dan Brown
  – Chuck Langley
  – Yun S. Song

