DNA Identification: Mixture Weight & Inference Cybergenetics © 2003-2010 Mark W Perlin, PhD, MD,...

Post on 26-Mar-2015

DNA Identification: Mixture Weight & Inference

Cybergenetics © 2003-2010

Mark W Perlin, PhD, MD, PhD
Cybergenetics, Pittsburgh, PA

TrueAllele® Lectures
Fall, 2010

Mixture Weight: Uncertain Quantity

Infer mixture weight from STR experiments:

• quantitative peak data
• contributor genotypes

Pr(W=w | data, G1=g1, G2=g2, …)

hierarchical Bayesian model

Perlin MW, Legler MM, Spencer CE, Smith JL, Allan WP, Belrose JL, Duceman BW. Validating TrueAllele® DNA mixture interpretation. Journal of Forensic Sciences. 2011;56(November):in press.

Mixture Weight Model

[Diagram: hierarchy of mixture weights for the kth contributor]

• prior probability: W0
• template weight: Wk
• locus weights: Wk,1, Wk,2, …, Wk,N
• experiment data: dk,1, dk,2, …, dk,N

Experiment Estimate

wk = (sum of peak heights from kth contributor) / (sum of peak heights from all contributors)

D16S539

Three Alleles

Allele    Quantity
11          500
12        6,750
13          250

Experiment Estimate

(500 + 250) / (500 + 6,750 + 250) = 750 / 7,500 = 10%
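The D16S539 estimate can be reproduced with a few lines of code. This is a minimal sketch, not the TrueAllele computation: the function name `mixture_weight` and its arguments are hypothetical, and it assumes the contributor's alleles (here 11 and 13, as in the slide arithmetic) are not shared with any other contributor.

```python
def mixture_weight(peak_heights, contributor_alleles):
    """Fraction of total peak height attributed to one contributor.

    Assumes (hypothetically) that none of the contributor's alleles
    overlap with alleles of other contributors.
    """
    total = sum(peak_heights.values())
    contributor = sum(peak_heights[a] for a in contributor_alleles)
    return contributor / total


# Peak quantities from the D16S539 slide: allele -> quantity
d16s539 = {11: 500, 12: 6750, 13: 250}

w_minor = mixture_weight(d16s539, contributor_alleles=[11, 13])
print(f"{w_minor:.1%}")  # 10.0%
```

When contributors share an allele, the shared peak cannot be assigned to one contributor this way, which is why a single-experiment ratio is only an estimate.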

D7S820

Four Alleles

Allele    Quantity
8         4,000
10          250
12        3,000
13          250

Experiment Estimate

(250 + 250) / (4,000 + 250 + 3,000 + 250) = 500 / 7,500 = 6.7%

Overlapping Alleles

Template Average

mean

variance

Template Mixture Weight Probability Distribution

mean = 6.7%

standard deviation = 0.9%

Central Limit Theorem

• more experiments on a template give greater mixture weight precision

• double the precision by doing four times the number of experiments

• combine evidence from multiple experiments to obtain a more informative result
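The "four times the experiments, double the precision" rule follows from the standard error of a mean shrinking as 1/√n. The sketch below checks this both analytically and by simulation. The true weight (10%) and per-experiment standard deviation (3%) are hypothetical values chosen for illustration, and independent normal experiment noise is an assumption of the sketch.

```python
import numpy as np


def standard_error(sd, n_experiments):
    """Standard deviation of the average of n independent estimates."""
    return sd / np.sqrt(n_experiments)


def simulated_standard_error(true_weight, sd, n_experiments,
                             n_trials=20000, seed=0):
    """Empirical spread of the averaged estimate over many simulated templates."""
    rng = np.random.default_rng(seed)
    estimates = rng.normal(true_weight, sd, size=(n_trials, n_experiments))
    return estimates.mean(axis=1).std(ddof=1)


sd = 0.03  # hypothetical per-experiment standard deviation
se_4 = standard_error(sd, 4)
se_16 = standard_error(sd, 16)       # four times as many experiments
sim_4 = simulated_standard_error(0.10, sd, 4)
sim_16 = simulated_standard_error(0.10, sd, 16)

print(se_4 / se_16)    # analytic ratio of standard errors
print(sim_4 / sim_16)  # simulated ratio, close to the analytic one
```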

Probability Solution

w | d, g1, g2, …

g1 | d, g2, w, …

g2 | d, g1, w, …

zi | d, g1, g2, w, …

interacting random variables

find probability distributions by iterative sampling

Gelfand, A. and Smith, A. (1990). Sampling based approaches to calculating marginal densities. J. American Statist. Assoc., 85:398-409.
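Iterative sampling from conditional distributions can be illustrated with a toy Gibbs sampler. The example below is not the DNA mixture model: it swaps in a standard bivariate normal with correlation `rho`, where each variable is drawn in turn given the current value of the other, in the same way that w, g1 and g2 are each drawn given the rest. All names and settings are illustrative assumptions.

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_samples=20000, burn_in=1000, seed=0):
    """Toy Gibbs sampler for a standard bivariate normal with correlation rho.

    Uses the exact conditionals:
        x | y ~ N(rho * y, 1 - rho^2)
        y | x ~ N(rho * x, 1 - rho^2)
    """
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    for t in range(burn_in + n_samples):
        x = rng.normal(rho * y, cond_sd)  # sample x given current y
        y = rng.normal(rho * x, cond_sd)  # sample y given updated x
        if t >= burn_in:                  # discard early, unconverged draws
            samples[t - burn_in] = (x, y)
    return samples


samples = gibbs_bivariate_normal(rho=0.8)
sample_corr = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
print(samples.mean(axis=0), sample_corr)
```

The retained draws approximate the joint distribution, so marginal summaries (means, standard deviations) can be read directly from the samples.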

Markov Chain Monte Carlo

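Prior Probability

genotype:
$$ g_{k,l} \sim \begin{cases} f_i^2, & i = j \\ 2 f_i f_j, & i \neq j \end{cases} $$

mixture weight:
$$ w \sim \mathrm{Dir}(1) $$

DNA quantity:
$$ m_l \sim N_+(5000,\ 5000^2) $$

variance parameters:
$$ \sigma^{-2} \sim \mathrm{Gam}(10, 20), \qquad \tau^{-2} \sim \mathrm{Gam}(10, 500), \qquad \psi^{-2} \sim \mathrm{Gam}(12, 1200) $$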
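The genotype prior assigns probability f_i² to a homozygote and 2·f_i·f_j to a heterozygote. A minimal sketch of that table follows; the function name `genotype_prior` and the allele frequencies are hypothetical and serve only to illustrate the formula.

```python
from itertools import combinations_with_replacement


def genotype_prior(allele_freqs):
    """Prior probability of each unordered allele pair (i, j).

    f_i^2 when i == j, and 2 * f_i * f_j when i != j.
    """
    prior = {}
    for i, j in combinations_with_replacement(sorted(allele_freqs), 2):
        fi, fj = allele_freqs[i], allele_freqs[j]
        prior[(i, j)] = fi * fi if i == j else 2.0 * fi * fj
    return prior


# Hypothetical allele frequencies at one locus (must sum to 1)
freqs = {11: 0.3, 12: 0.5, 13: 0.2}
prior = genotype_prior(freqs)

for genotype, p in prior.items():
    print(genotype, round(p, 4))
```

Because the frequencies sum to one, the genotype probabilities also sum to one.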

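Joint Likelihood Function

data:
$$ d_l \sim N_+(\mu_l,\ \Sigma_l) $$

pattern:
$$ \mu_l = m_l \cdot \sum_{k=1}^{K} w_{k,l} \cdot g_{k,l} $$

variation:
$$ \Sigma_l = \sigma^2 \cdot V_l + \tau^2 $$

locus weights:
$$ w_l \sim N_{[0,1]^{K-1}}\left(w,\ \psi^2 \cdot I\right) $$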
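The pattern term, DNA quantity times the weighted sum of contributor genotypes, can be computed directly. The sketch below applies it to the D16S539 alleles (11, 12, 13) under stated assumptions: each genotype is encoded as a vector of allele fractions summing to one, the major contributor is taken to be 12,12 and the minor 11,13, and the total quantity is set to the observed peak sum of 7,500. These encodings and the function name are illustrative, not taken from the lecture.

```python
import numpy as np


def expected_pattern(total_quantity, weights, genotypes):
    """Expected peak pattern: quantity * sum_k w_k * g_k.

    weights:   length-K mixture weights (summing to 1)
    genotypes: K x A array, one allele-fraction vector per contributor
               (hypothetical encoding: each row sums to 1)
    """
    weights = np.asarray(weights, dtype=float)
    genotypes = np.asarray(genotypes, dtype=float)
    return total_quantity * (weights @ genotypes)


# Alleles 11, 12, 13 at D16S539
g_major = [0.0, 1.0, 0.0]   # assumed genotype 12,12
g_minor = [0.5, 0.0, 0.5]   # assumed genotype 11,13

mu = expected_pattern(7500, weights=[0.9, 0.1], genotypes=[g_major, g_minor])
observed = np.array([500.0, 6750.0, 250.0])

print(mu)             # expected peaks, approximately 375, 6750, 375
print(observed - mu)  # residuals that the variance model must explain
```

The likelihood then scores how well the observed peaks match this pattern, given the data variation.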

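Posterior Probability

genotype:
$$ \Pr\{Q = x \mid d_{l,1}, d_{l,2}, \ldots, d_{l,i}, \ldots\} \propto \Pr\{Q = x\} \cdot \prod_{i=1}^{I} \Pr\{d_{l,i} \mid Q = x, \ldots\} $$

mixture weight:
$$ \Pr\{W = w \mid d_1, d_2, \ldots, d_J, \ldots\} \propto \Pr\{W = w\} \cdot \prod_{j=1}^{J} \Pr\{d_j \mid W = w, \ldots\} $$

data variation:
$$ \Pr\{\sigma^2 = s^2 \mid d_1, d_2, \ldots, d_j, \ldots\} \propto \Pr\{\sigma^2 = s^2\} \cdot \prod_{j=1}^{J} \Pr\{d_j \mid \sigma^2 = s^2, \ldots\} $$
$$ \Pr\{\tau^2 = t^2 \mid d_1, d_2, \ldots, d_j, \ldots\} \propto \Pr\{\tau^2 = t^2\} \cdot \prod_{j=1}^{J} \Pr\{d_j \mid \tau^2 = t^2, \ldots\} $$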
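The "posterior ∝ prior × likelihood" rule for mixture weight can be sketched on a grid. This is a deliberately simplified illustration, not the lecture's full model: it uses the single D16S539 experiment, holds the genotypes fixed (major 12,12 and minor 11,13, encoded as allele-fraction vectors), takes a flat prior on the minor weight, and replaces the variance model with one constant, hypothetical peak standard deviation `sd`.

```python
import numpy as np

data = np.array([500.0, 6750.0, 250.0])  # D16S539 peaks for alleles 11, 12, 13
g_major = np.array([0.0, 1.0, 0.0])      # assumed genotype 12,12
g_minor = np.array([0.5, 0.0, 0.5])      # assumed genotype 11,13
m = data.sum()                           # total quantity taken as the peak sum
sd = 200.0                               # hypothetical constant peak noise


def log_likelihood(w_minor):
    """Gaussian log-likelihood (up to a constant) of the peaks given the weight."""
    mu = m * ((1.0 - w_minor) * g_major + w_minor * g_minor)
    return -0.5 * np.sum(((data - mu) / sd) ** 2)


w_grid = np.linspace(0.0, 0.5, 501)      # candidate minor weights
log_prior = np.zeros_like(w_grid)        # flat prior over the grid
log_post = log_prior + np.array([log_likelihood(w) for w in w_grid])

posterior = np.exp(log_post - log_post.max())  # subtract max for stability
posterior /= posterior.sum()                   # normalize to sum to 1

w_mode = w_grid[np.argmax(posterior)]
w_mean = float(np.sum(w_grid * posterior))
print(w_mode, w_mean)
```

With several experiments, the per-experiment log-likelihoods would be summed before normalizing, which is the product over j in the mixture weight formula.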

Generally Accepted Method

genotype

mixture weight

data variation

James Curran. A MCMC method for resolving two person mixtures. Science & Justice. 2008;48(4):168-77.

Hierarchical Bayesian Model with MCMC Solution

• standard approach in modern science
• describes uncertainty using probability
• the "new calculus"
• replaces hard calculus with easy computing
• can solve virtually any problem
• well-suited to interpreting DNA evidence