Simulating and Modeling Genetically Informative Data

Posted on 12-Jan-2016


Simulating and Modeling Genetically Informative Data

Matthew C. Keller

Sarah E. Medland

Outline

The usefulness of simulation in behavioral genetics

Using GeneEvolve to simulate genetically informative data

Practical: simulating different designs
1. Classical Twin Design (CTD)
2. Nuclear Twin Family Design (NTFD)

Independent check of models. Especially important for complex (e.g., extended twin family) models.

Model verification: Check that your models work as they are supposed to.

Sensitivity analysis: Check the effect on parameter estimates when assumptions are violated (e.g., different modes of assortative mating, vertical transmission, or genetic action).

Method for predicting complex dynamics in population genetics

Simulation provides knowledge about processes that are difficult or impossible to figure out analytically

Using complex models without independent verification (e.g., simulation) is like…


Process of model verification

1. Simulate a dataset that has parameters that your model can estimate.
2. Run your model on the simulated dataset.
3. Obtain and store parameter estimates.
4. Repeat steps 1-3 many (e.g., 1000) times.
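The four steps above can be sketched in a few lines of Python. This is a minimal sketch, not GeneEvolve: it assumes a twins-only ACE model simulated from standardized variance components, and it swaps in Falconer's formulas for a full structural-equation fit. `simulate_pairs`, `fit_ace`, and `verify` are hypothetical helpers.

```python
import random


def simulate_pairs(n, a2, c2, mz, rng):
    """Return n (twin1, twin2) phenotypes with unit variance under an ACE model."""
    r_a = 1.0 if mz else 0.5                     # additive genetic correlation between twins
    e2 = 1.0 - a2 - c2
    pairs = []
    for _ in range(n):
        shared_a = rng.gauss(0, 1)               # part of A common to both twins
        c = rng.gauss(0, 1)                      # common environment, identical for both twins
        pair = []
        for _ in range(2):
            # sqrt(r_a)*shared + sqrt(1 - r_a)*own noise gives corr(A1, A2) = r_a
            a = r_a ** 0.5 * shared_a + (1 - r_a) ** 0.5 * rng.gauss(0, 1)
            pair.append(a2 ** 0.5 * a + c2 ** 0.5 * c + e2 ** 0.5 * rng.gauss(0, 1))
        pairs.append(tuple(pair))
    return pairs


def pearson(pairs):
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / (sxx * syy) ** 0.5


def fit_ace(mz, dz):
    """Falconer estimates, standing in for fitting the ACE structural model."""
    r_mz, r_dz = pearson(mz), pearson(dz)
    return {"A": 2 * (r_mz - r_dz), "C": 2 * r_dz - r_mz, "E": 1 - r_mz}


def verify(a2=0.4, c2=0.2, n=1000, reps=100, seed=1):
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):                                # step 4: repeat
        mz = simulate_pairs(n, a2, c2, True, rng)        # step 1: simulate
        dz = simulate_pairs(n, a2, c2, False, rng)
        estimates.append(fit_ace(mz, dz))                # steps 2-3: fit and store
    return {k: sum(e[k] for e in estimates) / reps for k in ("A", "C", "E")}


means = verify()
print(means)  # means should sit close to the simulated A=.4, C=.2, E=.4
```

If the mean estimates drift away from the simulated values by more than sampling error, something in the model (here, the estimator) is wrong.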

Results of model verification

If the mean parameter estimate equals the simulated parameter value, the estimate is unbiased. If your model has no mistakes, parameters should generally be unbiased (there are exceptions).

The standard deviation of an estimate corresponds to its standard error, and its distribution to its sampling distribution.

You can also easily study the multivariate sampling distribution and statistics, e.g., how correlated the parameter estimates are.
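Once estimates are stored, the quantities above (bias, standard error as the SD across replicates, and correlations between estimates) are simple summaries. A sketch, assuming each replicate's estimates are kept as a dict keyed by parameter name; `summarize` and `estimate_corr` are illustrative names, and the replicate values are made up.

```python
import statistics


def summarize(estimates, truth):
    """Mean, bias, and empirical SE of each parameter across replicates.

    estimates: list of dicts, one per replicate, e.g. {"A": 0.41, "C": 0.19}
    truth: dict of simulated values, e.g. {"A": 0.4, "C": 0.2}
    """
    out = {}
    for name, true_value in truth.items():
        values = [e[name] for e in estimates]
        mean = statistics.fmean(values)
        out[name] = {
            "mean": mean,
            "bias": mean - true_value,        # ~0 when the estimator is unbiased
            "se": statistics.stdev(values),   # SD across replicates = standard error
        }
    return out


def estimate_corr(estimates, p, q):
    """Pearson correlation between the estimates of parameters p and q."""
    x = [e[p] for e in estimates]
    y = [e[q] for e in estimates]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy replicate estimates (made-up numbers, for illustration only)
reps = [{"A": 0.42, "C": 0.18}, {"A": 0.38, "C": 0.22}, {"A": 0.40, "C": 0.20}]
print(summarize(reps, {"A": 0.4, "C": 0.2}))
print(estimate_corr(reps, "A", "C"))  # strongly negative: A and C trade off
```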

Process of sensitivity analysis

1. Simulate a dataset that has one or more parameters that your model cannot estimate.
2. Run your model on the simulated dataset.
3. Obtain and store parameter estimates.
4. Repeat steps 1-3 many (e.g., 1000) times.
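A sensitivity analysis uses the same loop, but the simulated truth contains something the fitted model cannot represent. In this sketch (assumptions: twins only, variance-component simulation rather than GeneEvolve's gene-level simulation, Falconer's formulas standing in for the ACE model), the data contain dominance (D) while the fitted model has only A, C, and E. All function names are hypothetical.

```python
import random


def simulate_pairs(n, a2, d2, mz, rng):
    """Twin pairs under a true A + D + E model (no C), unit phenotypic variance."""
    r_a, r_d = (1.0, 1.0) if mz else (0.5, 0.25)   # MZ/DZ correlations for A and D
    e2 = 1.0 - a2 - d2

    def component(shared, r):
        # sqrt(r)*shared + sqrt(1 - r)*own noise: unit variance, co-twin correlation r
        return r ** 0.5 * shared + (1 - r) ** 0.5 * rng.gauss(0, 1)

    pairs = []
    for _ in range(n):
        sa, sd = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append(tuple(
            a2 ** 0.5 * component(sa, r_a)
            + d2 ** 0.5 * component(sd, r_d)
            + e2 ** 0.5 * rng.gauss(0, 1)
            for _ in range(2)
        ))
    return pairs


def pearson(pairs):
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / (sxx * syy) ** 0.5


def fit_ace(mz, dz):
    """Falconer estimates for an ACE model; D is not part of the fitted model."""
    r_mz, r_dz = pearson(mz), pearson(dz)
    return {"A": 2 * (r_mz - r_dz), "C": 2 * r_dz - r_mz}


def sensitivity(a2=0.4, d2=0.2, n=1000, reps=50, seed=2):
    rng = random.Random(seed)
    fits = [
        fit_ace(simulate_pairs(n, a2, d2, True, rng), simulate_pairs(n, a2, d2, False, rng))
        for _ in range(reps)
    ]
    return {k: sum(f[k] for f in fits) / reps for k in ("A", "C")}


means = sensitivity()
print(means)  # A is pushed up and C below zero because D is ignored
```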

Results of sensitivity analysis

Because we are simulating violations of assumptions, we expect parameters to be biased. The question becomes: how biased? I.e., how big a deal are these violations? We should be able to quantify the answers to these questions.
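For the simplest case the size of the bias can also be worked out by hand, which gives a check on simulation output. Assuming standardized variance components, a true A + D + E model (no C), and Falconer's estimators for an ACE fit:

```latex
\begin{aligned}
r_{MZ} &= A + D, \qquad r_{DZ} = \tfrac{1}{2}A + \tfrac{1}{4}D \\
\hat{A} &= 2\,(r_{MZ} - r_{DZ}) = A + \tfrac{3}{2}D \\
\hat{C} &= 2\,r_{DZ} - r_{MZ} = -\tfrac{1}{2}D
\end{aligned}
```

With A = .4 and D = .2 this gives an expected Â = .7 and Ĉ = -.1: A is biased up and C is pushed below zero. For richer models such closed forms quickly become intractable, which is where simulation earns its keep.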

[Figure: parameter estimates when the simulated reality is A=.4, D=.15, S=.15. A, D, & F estimates are highly correlated in Stealth & Cascade models.]

Simulation is not a panacea

Simulation can be said to provide “knowledge without understanding.” It is a helpful tool for understanding, but doesn’t provide understanding in and of itself.

Simulations themselves rely on assumptions about how processes work. If these are wrong, our simulation results may not reflect reality.

Simulation program: GeneEvolve

Implemented in R, open-source, user modifiable
User specifies 31 basic parameters up front (and 17 advanced ones); no need to alter the script after that.
Fast (on AMD Opteron 3.2GHz dual 64-bit processor, 2GB RAM; OS = RHEL AS4): 10 genes, N=20,000 takes ~20 seconds/generation

GeneEvolve 0.73

Download: www.matthewckeller.com

How GeneEvolve works:

User specifies: population size, # generations for population to evolve, threshold effects, mechanisms of assortative mating, vertical transmission, etc.
3 types of genetic effects
5 types of environmental effects
13 types of moderator/covariate effects

[Diagram of GeneEvolve model: path diagram of parents (PM, PF) and twins (PT1, PT2), each with A, D, S, E, and F factors, plus A×A epistasis paths and a spousal path (µ).]

[Diagram: GeneEvolve Age-by-A interactions, Purcell model vs. our model. Path labels include βA + βint(age) and β0 in one panel, and AL, AS, βAL, βAS, βage, and β0 in the other.]

How GeneEvolve works (cont):

At adulthood, ~x% find mates such that the phenotypic correlation between mating phenotypes = AM
Pairs have children: rate determined by user-specified population growth
Process iterated n times
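The mating step can be illustrated with a rank-pairing trick. This is a hypothetical sketch, not GeneEvolve's actual algorithm: it assumes standardized phenotypes, draws a random fraction `frac` of each sex as the ones who find mates, gives each person a noisy mating score correlated √AM with their phenotype, and pairs the sexes by rank on that score. Because rank-matched scores are nearly identical, the spousal phenotypic correlation comes out close to AM. Reproduction and population growth are omitted.

```python
import random


def form_couples(males, females, am, frac, rng):
    """Pair a random `frac` of each sex so spousal phenotypic corr is about `am`.

    Assumes phenotypes are standardized (mean 0, variance 1).
    """
    k = int(frac * min(len(males), len(females)))
    males, females = rng.sample(males, k), rng.sample(females, k)  # who finds a mate

    def score(p):
        # unit-variance score with corr(p, score) = sqrt(am)
        return am ** 0.5 * p + (1 - am) ** 0.5 * rng.gauss(0, 1)

    # sorted() evaluates the key once per person; pairing by rank matches scores
    return list(zip(sorted(males, key=score), sorted(females, key=score)))


def spousal_corr(couples):
    n = len(couples)
    mx = sum(m for m, _ in couples) / n
    my = sum(f for _, f in couples) / n
    sxy = sum((m - mx) * (f - my) for m, f in couples)
    sxx = sum((m - mx) ** 2 for m, _ in couples)
    syy = sum((f - my) ** 2 for _, f in couples)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(3)
males = [rng.gauss(0, 1) for _ in range(10000)]
females = [rng.gauss(0, 1) for _ in range(10000)]
couples = form_couples(males, females, am=0.3, frac=0.8, rng=rng)
print(len(couples), round(spousal_corr(couples), 2))
```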

How GeneEvolve works (cont):

After n iterations, population split into two:
Parents of spouses
Parents of twins
Parents of twins have offspring (MZ/DZ twins & their sibs)
Twins mate with spousal population & have offspring

What you get:

3 generations of phenotypic data written out (one row per family), potentially across repeated measures
This data (& subsets of it) can be entered into structural models for model verification and sensitivity analysis
A summary PDF at the end shows:
Basic simulation statistics
Changes in variance components across time
Correlations between 10 relative types
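With one row per family, a relative-type correlation is the correlation between two columns, computed within a subgroup (e.g., MZ vs. DZ families). The sketch below assumes a CSV layout with hypothetical column names (`zyg`, `twin1`, `twin2`, `mother`) and `NA` for missing values; GeneEvolve's real output columns may differ.

```python
import csv
import io


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corr_by_group(rows, group_col, col1, col2, missing="NA"):
    """Correlation of col1 with col2 within each level of group_col."""
    groups = {}
    for row in rows:
        if row[col1] == missing or row[col2] == missing:
            continue                          # skip families lacking either relative
        x, y = groups.setdefault(row[group_col], ([], []))
        x.append(float(row[col1]))
        y.append(float(row[col2]))
    return {g: pearson(x, y) for g, (x, y) in groups.items()}


# Toy wide file; column names are hypothetical, values are made up.
text = """zyg,twin1,twin2,mother
MZ,1.0,1.1,0.3
MZ,-0.5,-0.4,NA
MZ,0.2,0.1,-0.2
DZ,0.8,-0.1,0.5
DZ,-1.2,0.4,-0.6
DZ,0.3,0.2,0.0
"""
rows = list(csv.DictReader(io.StringIO(text)))
print(corr_by_group(rows, "zyg", "twin1", "twin2"))
print(corr_by_group(rows, "zyg", "twin1", "mother"))
```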

Structural Equation Modeling (SEM) in BG

SEM is great because…
Directs focus to effect sizes, not “significance”
Forces consideration of causes and consequences
Explicit disclosure of assumptions

Potential weakness…
Parameter reification: “Using the CTD we found that 50% of variation is due to A and 20% to C.”

NO! Only true under strong assumptions that probably aren’t met (e.g., D=0) and usually go untested. To the degree the assumptions are wrong, estimates are biased.

Classical Twin Design (CTD)

[Path diagram: twin phenotypes PT1 and PT2, each with A, C, D, and E factors; C correlates 1 between twins and D correlates 1/.25 (MZ/DZ).]


Assumption               Biased up   Biased down
Either D or C is zero    A           C & D
No assortative mating    C           D
No A-C covariance        C           D & A
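Why the CTD needs the first assumption can be seen from the expected twin statistics. A sketch, assuming standardized variance components, no assortative mating, and no A-C covariance:

```latex
\begin{aligned}
\sigma^2_P &= A + C + D + E = 1 \\
\operatorname{Cov}_{MZ} &= A + C + D \\
\operatorname{Cov}_{DZ} &= \tfrac{1}{2}A + C + \tfrac{1}{4}D
\end{aligned}
```

Three observed statistics cannot identify four unknowns (A, C, D, E), so one of C or D has to be fixed to zero, and whichever is fixed pushes its variance into the remaining components.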

Adding parents gets us around all these assumptions

[Path diagram: the CTD extended with the twins' parents (PM, PF); each person has A, C, D, and E factors, with parent-to-offspring paths (m) and a spousal path (µ). Annotation: with parents, “we don’t have to make these” assumptions (either D or C is zero; no assortative mating; no A-C covariance).]

With parents, we can break “C” up into:

S = env. factors shared only between sibs

F = familial env factors passed from parents to offspring

[Path diagram: twins PT1 and PT2 with C split into S and F; each twin has A, D, S, E, and F factors.]

Parents also allow differentiation of S & F


Nuclear Twin Family Design (NTFD)

Assumptions:
Can only estimate 3 of 4: A, D, S, and F (bias is variable)
Assortative mating due to primary phenotypic assortment (bias is variable)

Note: m estimated and f fixed to 1

[Path diagram of the NTFD: parents (PM, PF) and twins (PT1, PT2), each with A, D, S, E, and F factors; parent-to-offspring (m) and F-to-phenotype (f) paths; spousal path (µ).]

Stealth

Include twins and their sibs, parents, spouses, and offspring…
Gives 17 unique covariances (MZ, DZ, Sib, P-O, Spousal, MZ avunc, DZ avunc, MZ cous, DZ cous, GP-GO, and 7 in-laws)
88 covariances with sex effects

can be estimated simultaneously

T = env. factors shared only between twins

[Path diagram: twins PT1 and PT2, each with A, D, S, E, F, and T factors; T correlates 1/0 and D 1/.25.]

Additional observed covariances with Stealth allow estimation of A, S, D, F, and T

(Remember: we’re not just estimating more effects. More importantly, we’re

reducing the bias in estimated effects!)
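The role of the twins' non-twin siblings in separating T can be shown with expected covariances. This is a simplified sketch that ignores F and assortative mating (both part of the full Stealth model), with A, D, S, and T as variance components:

```latex
\begin{aligned}
\operatorname{Cov}_{MZ} &= A + D + S + T \\
\operatorname{Cov}_{DZ} &= \tfrac{1}{2}A + \tfrac{1}{4}D + S + T \\
\operatorname{Cov}_{Sib} &= \tfrac{1}{2}A + \tfrac{1}{4}D + S \\
\Rightarrow\quad T &= \operatorname{Cov}_{DZ} - \operatorname{Cov}_{Sib}
\end{aligned}
```

DZ twins and ordinary siblings have the same expected genetic resemblance, so their covariance difference isolates the twin-specific environment.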


Stealth

[Path diagram of the Stealth model: the twins' parents (PM, PF), the twins (PT1, PT2), the twins' spouses, and the twins' children (PCh), each with A, D, S, T, E, and F factors; T correlates 1/0 and D 1/.25; spousal paths (µ) link each couple.]

Stealth

Assumption                   Biased up     Biased down
Primary assortative mating   A, D, or F    A, D, or F
No epistasis                 A, D          S
No AxAge                     D, S          A


Primary AM: mates choose each other based on phenotypic similarity

Social homogamy: mates choose each other due to environmental similarity (e.g., religion)

Convergence: mates become more similar to each other (e.g., becoming more conservative when dating a conservative)
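Primary AM and social homogamy can produce the same spousal phenotypic correlation while implying different spousal genetic correlations, which is one reason the mode of AM matters for bias. A hypothetical sketch, assuming a phenotype P = √h²·A + √(1−h²)·E with standardized A and E, treating all of E as the "social" environment, and creating assortment by rank-pairing on noisy scores (not necessarily how GeneEvolve does it):

```python
import random


def make_people(n, h2, rng):
    """Standardized A and E; P has unit variance when h2 is in [0, 1]."""
    people = []
    for _ in range(n):
        a, e = rng.gauss(0, 1), rng.gauss(0, 1)
        people.append({"A": a, "E": e, "P": h2 ** 0.5 * a + (1 - h2) ** 0.5 * e})
    return people


def mate(males, females, trait, r, rng):
    """Rank-pair the sexes on a noisy score correlated sqrt(r) with `trait`."""
    def score(person):
        return r ** 0.5 * person[trait] + (1 - r) ** 0.5 * rng.gauss(0, 1)
    return list(zip(sorted(males, key=score), sorted(females, key=score)))


def spouse_corr(couples, var):
    x = [m[var] for m, _ in couples]
    y = [f[var] for _, f in couples]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(4)
h2, target = 0.5, 0.3
males, females = make_people(10000, h2, rng), make_people(10000, h2, rng)
primary = mate(males, females, "P", target, rng)
# Social homogamy: sort on E, with corr(E_m, E_f) = q chosen so (1 - h2) * q = target
social = mate(males, females, "E", target / (1 - h2), rng)
for label, couples in (("primary AM", primary), ("social homogamy", social)):
    print(label, round(spouse_corr(couples, "P"), 2), round(spouse_corr(couples, "A"), 2))
```

Under primary AM the spousal A correlation is roughly h²·r = .15; under social homogamy it is roughly zero, even though both give a spousal phenotypic correlation near .3.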

[Path diagram of the Cascade model: parents (PM, PF), twins (PT1, PT2), twins' spouses (PSp), and twins' children (PCh), each with A, D, S, T, E, and F factors plus a parallel set of tilde-marked (~) latent factors; spousal paths (µ) link each couple.]

Cascade

[Figure panels, one per simulated reality:
Reality: A=.5, D=.2
Reality: A=.5, S=.2
Reality: A=.4, D=.15, S=.15
Reality: A=.35, D=.15, F=.2, S=.15, T=.15, AM=.3
Reality: A=.45, D=.15, F=.25, AM=.3 (Soc Hom)
Reality: A=.4, A*A=.15, S=.15
Reality: A=.4, A*Age=.15, S=.15]

Conclusions

All models require assumptions. Generally, more assumptions = more biased estimates.

For the first time, we have demonstrated independent assessments of the NTFD, Stealth, and Cascade models. These complicated models work as designed!

In all models, but especially the CTD, please don’t REIFY A, C, & D!

Acknowledgments

Those who conceived of these models originally: Jinks, Fulker, Eaves, Cloninger, Reich, Rice, Heath, Neale, Maes, etc.

And to Nick Martin: for his energy and enthusiasm, and for encouraging us to do this to begin with

Why use it? Modeling aid

Check bias & identification: Feed PE the parameters you are modeling, simulate data, & see if your model recovers the parameters

Check model’s sensitivity to assumptions: Simulate violations of assumptions & note their effects on estimates

Estimate power & multivariate sampling distributions of estimates under very general conditions: Run PE multiple times given whatever condition you want
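Power can be estimated the same way: simulate under a chosen condition many times and count how often a test rejects. A sketch, assuming an ACE truth (A = .4, C = .1), equal numbers of MZ and DZ pairs, and a Fisher-z test of the MZ-DZ correlation difference as a stand-in for a model-comparison test in SEM; the helpers are hypothetical.

```python
import math
import random


def simulate_pairs(n, a2, c2, r_a, rng):
    """n twin pairs under an ACE model; r_a = 1.0 for MZ, 0.5 for DZ."""
    e2 = 1.0 - a2 - c2
    pairs = []
    for _ in range(n):
        shared_a, c = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append(tuple(
            a2 ** 0.5 * (r_a ** 0.5 * shared_a + (1 - r_a) ** 0.5 * rng.gauss(0, 1))
            + c2 ** 0.5 * c
            + e2 ** 0.5 * rng.gauss(0, 1)
            for _ in range(2)
        ))
    return pairs


def pearson(pairs):
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / (sxx * syy) ** 0.5


def power(n, a2=0.4, c2=0.1, reps=100, seed=6, crit=1.96):
    """Share of replicates in which a two-sided Fisher-z test (alpha = .05)
    detects a difference between the MZ and DZ correlations."""
    rng = random.Random(seed)
    se = math.sqrt(2 / (n - 3))          # n MZ pairs and n DZ pairs
    hits = 0
    for _ in range(reps):
        z_mz = math.atanh(pearson(simulate_pairs(n, a2, c2, 1.0, rng)))
        z_dz = math.atanh(pearson(simulate_pairs(n, a2, c2, 0.5, rng)))
        hits += abs(z_mz - z_dz) / se > crit
    return hits / reps


results = {n: power(n) for n in (100, 500)}
print(results)  # power rises steeply with the number of pairs
```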


Why use it? Predictor of population / evolutionary genetics dynamics

Find changes in variance parameters & relative covariances under different modes of AM, VT, & genetic effects

Simulate random genetic drift by varying population size

Introduce selection (coming) to test theories on maintenance of genetic variation
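Drift can be illustrated with a Wright-Fisher model, a standard idealization that is not necessarily how GeneEvolve implements it: each generation draws 2N gene copies binomially from the current allele frequency. Expected heterozygosity then decays as H_t = H_0 (1 − 1/(2N))^t, so smaller populations lose variation faster. The function names are hypothetical.

```python
import random


def wright_fisher(n, p0, generations, rng):
    """Allele frequency after binomial sampling of 2n gene copies per generation."""
    p = p0
    for _ in range(generations):
        p = sum(rng.random() < p for _ in range(2 * n)) / (2 * n)
    return p


def mean_heterozygosity(n, p0=0.5, generations=30, reps=200, seed=5):
    """Average 2p(1 - p) across replicate populations of size n."""
    rng = random.Random(seed)
    finals = [wright_fisher(n, p0, generations, rng) for _ in range(reps)]
    return sum(2 * p * (1 - p) for p in finals) / reps


def expected_heterozygosity(n, p0=0.5, generations=30):
    """Classical expectation H_t = H_0 * (1 - 1/(2n))**t."""
    return 2 * p0 * (1 - p0) * (1 - 1 / (2 * n)) ** generations


results = {n: (mean_heterozygosity(n), expected_heterozygosity(n)) for n in (20, 500)}
print(results)  # (simulated, expected); the small population loses far more
```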
