
Simulating and Modeling Genetically Informative Data

Date post: 12-Jan-2016
Page 1: Simulating and Modeling Genetically Informative Data

Simulating and Modeling Genetically Informative Data

Matthew C. Keller

Sarah E. Medland

Page 2: Simulating and Modeling Genetically Informative Data

Outline

The usefulness of simulation in behavioral genetics

Using GeneEvolve to simulate genetically informative data

Practical: simulating different designs
1. Classical Twin Design (CTD)
2. Nuclear Twin Family Design (NTFD)

Page 3: Simulating and Modeling Genetically Informative Data

Simulation provides knowledge about processes that are difficult/impossible to figure out analytically

Independent check of models. Especially important for complex (e.g., extended twin family) models.
  - Model verification: Check that your models work as they are supposed to.
  - Sensitivity analysis: Check the effect on parameter estimates when assumptions are violated (e.g., different modes of assortative mating, vertical transmission, or genetic action).

Method for predicting complex dynamics in population genetics

Page 4: Simulating and Modeling Genetically Informative Data

Using complex models without independent verification (e.g., simulation) is like…

[Picture not available in this transcript.]

Page 5: Simulating and Modeling Genetically Informative Data

Process of model verification

1. Simulate a dataset that has parameters that your model can estimate.

2. Run your model on the simulated dataset

3. Obtain and store parameter estimates

4. Repeat steps 1-3 many (e.g., 1000) times
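
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GeneEvolve and not a structural equation model: the data come from a hand-rolled ACE simulator, and the "model" is the simple moment estimator A = 2(rMZ - rDZ), C = 2rDZ - rMZ. All function names, sample sizes, and the true values A = .5, C = .2 are hypothetical choices made for the example.

```python
import math
import random
import statistics


def simulate_pairs(n_pairs, a2, c2, mz, rng):
    """Step 1: simulate twin-pair phenotypes under an ACE model with unit variance."""
    e2 = 1.0 - a2 - c2
    r_a = 1.0 if mz else 0.5  # additive genetic correlation between co-twins
    pairs = []
    for _ in range(n_pairs):
        shared_a = rng.gauss(0, 1)  # part of A common to both twins
        c = rng.gauss(0, 1)         # common environment, fully shared
        twins = []
        for _ in range(2):
            a = math.sqrt(r_a) * shared_a + math.sqrt(1 - r_a) * rng.gauss(0, 1)
            e = rng.gauss(0, 1)     # unique environment
            twins.append(math.sqrt(a2) * a + math.sqrt(c2) * c + math.sqrt(e2) * e)
        pairs.append(tuple(twins))
    return pairs


def pearson(pairs):
    """Sample correlation between twin 1 and twin 2."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_ace(mz_pairs, dz_pairs):
    """Step 2: moment estimates of A and C (a stand-in for fitting an SEM)."""
    r_mz, r_dz = pearson(mz_pairs), pearson(dz_pairs)
    return 2 * (r_mz - r_dz), 2 * r_dz - r_mz


def verify(n_reps=100, n_pairs=400, a2=0.5, c2=0.2, seed=1):
    """Steps 3 and 4: store the estimates from many simulated datasets."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        mz = simulate_pairs(n_pairs, a2, c2, True, rng)
        dz = simulate_pairs(n_pairs, a2, c2, False, rng)
        estimates.append(fit_ace(mz, dz))
    return estimates


estimates = verify()
mean_a2 = statistics.fmean(e[0] for e in estimates)
mean_c2 = statistics.fmean(e[1] for e in estimates)
print(f"mean A estimate: {mean_a2:.3f} (simulated 0.5)")
print(f"mean C estimate: {mean_c2:.3f} (simulated 0.2)")
```

With a real model, `fit_ace` would be replaced by a call to whatever software fits the model, and the simulator by GeneEvolve output.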

Page 6: Simulating and Modeling Genetically Informative Data

Results of model verification

If the mean parameter estimate = the simulated parameter value, the estimate is unbiased. If your model has no mistakes, parameters should generally be unbiased (there are exceptions)

The standard deviation of an estimate corresponds to its standard error, and its distribution to its sampling distribution

You can also easily study the multivariate sampling distribution and statistics, e.g., how correlated parameters are.
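
As a rough sketch of how stored estimates turn into these summaries, the Python below computes bias, empirical standard error, and the correlation between two parameter estimates. It rests on assumptions that are not from the slides: twin pairs are drawn directly as bivariate-normal pairs with the correlations an ACE model implies (rMZ = A + C, rDZ = A/2 + C), the estimator is the simple moment estimator rather than an SEM, and every name and setting is hypothetical.

```python
import math
import random
import statistics


def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_r(n_pairs, rho, rng):
    """Sample correlation from n_pairs bivariate-normal pairs with true correlation rho."""
    xs, ys = [], []
    for _ in range(n_pairs):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return corr(xs, ys)


def replicate_estimates(n_reps, n_pairs, a2, c2, seed):
    """One (A, C) moment estimate per simulated dataset."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_reps):
        r_mz = sample_r(n_pairs, a2 + c2, rng)
        r_dz = sample_r(n_pairs, 0.5 * a2 + c2, rng)
        out.append((2 * (r_mz - r_dz), 2 * r_dz - r_mz))
    return out


def summarize(estimates, truth):
    """Bias and empirical SE per parameter, plus the correlation between estimates."""
    summary = {}
    for i, name in enumerate(("A", "C")):
        values = [e[i] for e in estimates]
        summary[name] = {
            "bias": statistics.fmean(values) - truth[i],  # mean estimate minus simulated value
            "se": statistics.stdev(values),               # SD across replicates
        }
    summary["corr_A_C"] = corr([e[0] for e in estimates], [e[1] for e in estimates])
    return summary


summary = summarize(replicate_estimates(200, 400, 0.5, 0.2, seed=7), truth=(0.5, 0.2))
print(summary)
```

The correlation entry is the simplest multivariate statistic; the full list of stored estimates can also be plotted to view the joint sampling distribution.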

Page 7: Simulating and Modeling Genetically Informative Data

Process of sensitivity analysis

1. Simulate a dataset that has one or more parameters that your model cannot estimate.

2. Run your model on the simulated dataset

3. Obtain and store parameter estimates

4. Repeat steps 1-3 many (e.g., 1000) times

Page 8: Simulating and Modeling Genetically Informative Data

Results of sensitivity analysis

Because we are simulating violations of assumptions, we expect parameters to be biased. The question becomes: how biased? I.e., how big of a deal are these violations? We should be able to quantify the answers to these questions.
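
A minimal sketch of quantifying such bias, under assumptions that are not from the slides: twin correlations are simulated with dominance (D) present, and the fitted "model" is an ACE moment estimator that assumes D = 0. The true values A = .4, D = .15 and shared environment = .15 mirror one of the "Reality" settings in this deck; treating that shared environment as C, the function names, and the sample sizes are all hypothetical.

```python
import math
import random
import statistics


def sample_r(n_pairs, rho, rng):
    """Sample correlation of n_pairs bivariate-normal pairs whose true correlation is rho."""
    xs, ys = [], []
    for _ in range(n_pairs):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(z1)
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_bias_when_d_ignored(a2, d2, c2, n_reps=200, n_pairs=400, seed=3):
    """Simulate twins with dominance present, then fit a model that assumes D = 0."""
    rng = random.Random(seed)
    r_mz_true = a2 + d2 + c2               # MZ twins share all of A, D and C
    r_dz_true = 0.5 * a2 + 0.25 * d2 + c2  # DZ twins share 1/2 of A and 1/4 of D
    a_hats, c_hats = [], []
    for _ in range(n_reps):
        r_mz = sample_r(n_pairs, r_mz_true, rng)
        r_dz = sample_r(n_pairs, r_dz_true, rng)
        a_hats.append(2 * (r_mz - r_dz))   # ACE moment estimate of A
        c_hats.append(2 * r_dz - r_mz)     # ACE moment estimate of C
    return statistics.fmean(a_hats) - a2, statistics.fmean(c_hats) - c2


bias_a, bias_c = mean_bias_when_d_ignored(a2=0.4, d2=0.15, c2=0.15)
print(f"bias in A: {bias_a:+.3f}, bias in C: {bias_c:+.3f}")
```

The sign and size of each bias is the quantity of interest: it says how much a given violation matters for a given parameter.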

Page 9: Simulating and Modeling Genetically Informative Data

Reality: A=.4, D=.15, S=.15

Page 10: Simulating and Modeling Genetically Informative Data

A, D, & F estimates are highly correlated in Stealth & Cascade models

Page 11: Simulating and Modeling Genetically Informative Data

Simulation is not a panacea

Simulation can be said to provide “knowledge without understanding.” It is a helpful tool for understanding, but doesn’t provide understanding in and of itself.

Simulations themselves rely on assumptions about how processes work. If these are wrong, our simulation results may not reflect reality.

Page 12: Simulating and Modeling Genetically Informative Data

Simulation program: GeneEvolve

Page 13: Simulating and Modeling Genetically Informative Data

GeneEvolve 0.73

Implemented in R, open-source, user modifiable

User specifies 31 basic parameters up front (and 17 advanced ones); no need to alter script after that.

Fast: 10 genes, N=20,000 takes ~20 seconds/generation (on AMD Opteron 3.2GHz dual 64-bit processor, 2GB RAM; OS = RHEL AS4)

Download: www.matthewckeller.com

Page 14: Simulating and Modeling Genetically Informative Data

How GeneEvolve works:

User specifies: population size, # generations for population to evolve, threshold effects, mechanisms of assortative mating, vertical transmission, etc.
  - 3 types of genetic effects
  - 5 types of environmental effects
  - 13 types of moderator/covariate effects

Page 15: Simulating and Modeling Genetically Informative Data

Diagram of GeneEvolve Model

[Path diagram not recoverable from this transcript. Labels that survive: PM, PF, PT1, PT2, A, AxA, D, S, E, F, a, aa, d, s, e, f, m, q, x, w, µ, zd, zs, zaa.]

Page 16: Simulating and Modeling Genetically Informative Data

Diagram: GeneEvolve Age-by-A Interactions

[Path diagrams not recoverable from this transcript. Two panels, "Purcell Model" and "Our Model". Labels that survive: A, AL, AS, P, age, r, β0, βage, βA + βint(age), βAL, βAS.]

open Purcell-vs-Ours.pdf; Purcell-vs-OursCorrelation.pdf

Page 17: Simulating and Modeling Genetically Informative Data

How GeneEvolve works (cont):

At adulthood, ~x% find mates such that the phenotypic correlation between mating phenotypes = AM

Pairs have children: rate determined by user-specified population growth

Process iterated n times
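
One way to pair mates so that their phenotypes correlate at a chosen value is sketched below in Python. This is not taken from GeneEvolve's source and is not claimed to be its algorithm. It is a generic rank-matching trick that assumes roughly normal phenotypes and equal numbers of males and females, and it pairs everyone passed in (restricting to the ~x% who mate would happen beforehand). The function and parameter names are hypothetical.

```python
import math
import random
import statistics


def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pair_mates(males, females, target_r, rng):
    """Pair phenotypes so that the mate correlation is approximately target_r."""
    mu, sd = statistics.fmean(females), statistics.stdev(females)
    # Noisy "mating score": correlates target_r with the female's own phenotype
    scores = [
        target_r * (f - mu) / sd + math.sqrt(1 - target_r ** 2) * rng.gauss(0, 1)
        for f in females
    ]
    females_by_score = [f for _, f in sorted(zip(scores, females))]
    males_by_phenotype = sorted(males)
    # Match the k-th ranked male to the female with the k-th ranked score
    return list(zip(males_by_phenotype, females_by_score))


rng = random.Random(5)
males = [rng.gauss(0, 1) for _ in range(5000)]
females = [rng.gauss(0, 1) for _ in range(5000)]
pairs = pair_mates(males, females, target_r=0.3, rng=rng)
mate_r = corr([m for m, _ in pairs], [f for _, f in pairs])
print(f"achieved mate correlation: {mate_r:.3f} (target 0.3)")
```

Iterating this pairing step, followed by reproduction, over n generations is what lets the effects of assortative mating on variance components build up.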

Page 18: Simulating and Modeling Genetically Informative Data

How GeneEvolve works (cont):

After n iterations, population split into two:
  - Parents of spouses
  - Parents of twins

Parents of twins have offspring (MZ/DZ twins & their sibs)

Twins mate with spousal population & have offspring

Page 19: Simulating and Modeling Genetically Informative Data

What you get:

3 generations of phenotypic data written out (one row per family), potentially across repeated measures

These data (& subsets of them) can be entered into structural models for model verification and sensitivity analysis

A summary PDF at end shows:
  - Basic simulation statistics
  - Changes in variance components across time
  - Correlations between 10 relative types

Page 20: Simulating and Modeling Genetically Informative Data
Page 21: Simulating and Modeling Genetically Informative Data
Page 22: Simulating and Modeling Genetically Informative Data

Structural Equation Modeling (SEM) in BG

SEM is great because…
  - Directs focus to effect sizes, not “significance”
  - Forces consideration of causes and consequences
  - Explicit disclosure of assumptions

Potential weakness…
  - Parameter reification: “Using the CTD we found that 50% of variation is due to A and 20% to C.”

Page 23: Simulating and Modeling Genetically Informative Data

Structural Equation Modeling (SEM) in BG

SEM is great because…
  - Directs focus to effect sizes, not “significance”
  - Forces consideration of causes and consequences
  - Explicit disclosure of assumptions

Potential weakness…
  - Parameter reification: “Using the CTD we found that 50% of variation is due to A and 20% to C.”

NO! Only true under strong assumptions that probably aren’t met (e.g., D=0) and usually go untested. To the degree that the assumptions are wrong, estimates are biased.

Page 24: Simulating and Modeling Genetically Informative Data

Classical Twin Design (CTD)

[Path diagram not recoverable from this transcript. Labels that survive: PT1, PT2, A, C, D, E, a, c, d, e, 1/.25, 1.]

Page 25: Simulating and Modeling Genetically Informative Data

Classical Twin Design (CTD)

[Path diagram not recoverable from this transcript. Labels that survive: PT1, PT2, A, C, D, E, a, c, d, e, 1/.25, 1.]

Assumption              Biased up   Biased down
Either D or C is zero   A           C & D
No assortative mating   C           D
No A-C covariance       C           D & A
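
The first row of the table can be checked with a short worked derivation. It is a sketch under standard textbook assumptions that are not stated on the slide: random mating, no epistasis, standardized variance components, DZ twins sharing 1/2 of A and 1/4 of D, and moment estimators in place of maximum likelihood. If the data contain A, D, and C but an ACE model (D fixed to zero) is fitted:

```latex
\begin{align*}
r_{MZ} &= a^2 + d^2 + c^2, &
r_{DZ} &= \tfrac{1}{2}a^2 + \tfrac{1}{4}d^2 + c^2 \\
\hat{a}^2 &= 2\,(r_{MZ} - r_{DZ}) = a^2 + \tfrac{3}{2}d^2, &
\hat{c}^2 &= 2\,r_{DZ} - r_{MZ} = c^2 - \tfrac{1}{2}d^2
\end{align*}
```

For example, with a² = .4, d² = .15, c² = .15 the correlations are r_MZ = .70 and r_DZ = .3875, so the ACE estimates come out at .625 for A and .075 for C: A is biased up and C is biased down, while D is not estimated at all.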

Page 26: Simulating and Modeling Genetically Informative Data

Adding parents gets us around all these assumptions

We don’t have to make these assumptions:
  - Either D or C is zero
  - No assortative mating
  - No A-C covariance

[Path diagram not recoverable from this transcript. Labels that survive: PM, PF, PT1, PT2, A, C, D, E, a, c, d, e, m, q, w, x, µ, 1/.25.]

Page 27: Simulating and Modeling Genetically Informative Data

Parents also allow differentiation of S & F

With parents, we can break “C” up into:
  S = env. factors shared only between sibs
  F = familial env. factors passed from parents to offspring

[Two path diagrams not recoverable from this transcript: one with a single latent C, one with S and F in its place. Labels that survive: PT1, PT2, A, C, D, E, S, F, a, c, d, e, s, f, 1/.25, 1.]

Page 28: Simulating and Modeling Genetically Informative Data

Nuclear Twin Family Design (NTFD)

Assumptions:
  - Can estimate only 3 of the 4 components A, D, S, and F (bias is variable)
  - Assortative mating is due to primary phenotypic assortment (bias is variable)

Note: m estimated and f fixed to 1

[Path diagram not recoverable from this transcript. Labels that survive: PM, PF, PT1, PT2, A, D, S, E, F, a, d, s, e, f, m, q, x, w, µ, zd, zs.]

Page 29: Simulating and Modeling Genetically Informative Data

Stealth

Include twins and their sibs, parents, spouses, and offspring…
  - Gives 17 unique covariances (MZ, DZ, Sib, P-O, Spousal, MZ avunc, DZ avunc, MZ cous, DZ cous, GP-GO, and 7 in-laws)
  - 88 covariances with sex effects

Page 30: Simulating and Modeling Genetically Informative Data

Additional obs. covs with Stealth allow estimation of A, S, D, F, T

T = env. factors shared only between twins

F, S, D, A, T can be estimated simultaneously

(Remember: we’re not just estimating more effects. More importantly, we’re reducing the bias in estimated effects!)

[Path diagram not recoverable from this transcript. Labels that survive: PT1, PT2, A, D, S, T, E, F, a, d, s, t, e, f, 1/.25, 1, 1/0.]

Page 31: Simulating and Modeling Genetically Informative Data

Stealth

[Path diagram not recoverable from this transcript. Labels that survive: PM, PF, PT1, PT2, PCh, A, D, S, T, E, F, a, d, s, t, e, f, m, q, x, w, µ, 1/0, 1/.25, 1.]

Page 32: Simulating and Modeling Genetically Informative Data

Stealth

Assumption                   Biased up    Biased down
Primary assortative mating   A, D, or F   A, D, or F
No epistasis                 A, D         S
No AxAge                     D, S         A

Page 33: Simulating and Modeling Genetically Informative Data

Stealth

Assumption                   Biased up    Biased down
Primary assortative mating   A, D, or F   A, D, or F
No epistasis                 A, D         S
No AxAge                     D, S         A

Primary AM: mates choose each other based on phenotypic similarity

Social homogamy: mates choose each other due to environmental similarity (e.g., religion)

Convergence: mates become more similar to each other (e.g., becoming more conservative when dating a conservative)

Page 34: Simulating and Modeling Genetically Informative Data

Cascade

[Path diagram not recoverable from this transcript. Labels that survive: PM, PF, PSp, PT1, PT2, PCh, A, D, S, T, E, F, a, d, s, t, e, f, m, q, x, w, µ, 1/0, 1/.25, 1; several path labels carry a tilde (d~, s~, t~).]

Page 35: Simulating and Modeling Genetically Informative Data

Reality: A=.5, D=.2

Page 36: Simulating and Modeling Genetically Informative Data

Reality: A=.5, S=.2

Page 37: Simulating and Modeling Genetically Informative Data

Reality: A=.4, D=.15, S=.15

Page 38: Simulating and Modeling Genetically Informative Data

Reality: A=.35, D=.15, F=.2, S=.15, T=.15, AM=.3

Page 39: Simulating and Modeling Genetically Informative Data

Reality: A=.45, D=.15, F=.25, AM=.3 (Soc Hom)

Page 40: Simulating and Modeling Genetically Informative Data

Reality: A=.4, A*A=.15, S=.15

Page 41: Simulating and Modeling Genetically Informative Data

Reality: A=.4, A*Age=.15, S=.15

Page 42: Simulating and Modeling Genetically Informative Data

Conclusions

All models require assumptions. Generally, more assumptions = more biased estimates

For the first time, we have demonstrated independent assessments of the NTFD, Stealth, and Cascade models

These complicated models work as designed!

In all models, but especially the CTD, please don’t REIFY A, C, & D!

Page 43: Simulating and Modeling Genetically Informative Data

Acknowledgments

Those who conceived of these models originally: Jinks, Fulker, Eaves, Cloninger, Reich, Rice, Heath, Neale, Maes, etc.

And to Nick Martin: for his energy and enthusiasm, and for encouraging us to do this to begin with

Page 44: Simulating and Modeling Genetically Informative Data

Why use it? Modeling aid

Check bias & identification:
  - Feed PE parameters you are modeling, simulate data, & see if your model recovers the parameters

Check model’s sensitivity to assumptions:
  - Simulate violations of assumptions & note their effects on estimates

Estimate power & multivariate sampling dist’s of estimates under very general conditions:
  - Run PE multiple times given whatever condition you want

Page 45: Simulating and Modeling Genetically Informative Data

Why use it? Predictor of population / evolutionary genetics dynamics

Find changes in variance parameters & relative covariances under different modes of AM, VT, & genetic effects

Simulate random genetic drift by varying population size

Introduce selection (coming) to test theories on maintenance of genetic variation
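
The drift point can be illustrated with a toy Wright-Fisher simulation in Python. It is a generic textbook sketch, not GeneEvolve code: it tracks one biallelic locus with no selection, mutation, or migration, and the population sizes, number of generations, and function names are hypothetical choices.

```python
import random
import statistics


def drift_final_frequency(pop_size, n_generations, p0, rng):
    """Allele frequency after n_generations of pure drift among 2 * pop_size gene copies."""
    n_copies = 2 * pop_size
    p = p0
    for _ in range(n_generations):
        # Each gene copy in the next generation is drawn at random from the current pool
        p = sum(rng.random() < p for _ in range(n_copies)) / n_copies
    return p


def drift_variance(pop_size, n_generations=20, p0=0.5, n_reps=100, seed=11):
    """Variance of the final allele frequency across independent replicate populations."""
    rng = random.Random(seed)
    finals = [
        drift_final_frequency(pop_size, n_generations, p0, rng) for _ in range(n_reps)
    ]
    return statistics.variance(finals)


for n in (25, 250):
    print(f"N = {n}: variance of final allele frequency = {drift_variance(n):.4f}")
```

Smaller populations should show a larger spread of final allele frequencies, which is the sense in which varying population size varies the strength of drift.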

