Genome Wide Association Study for Binomially Distributed Traits: A Case Study for Stalk Lodging in Maize

Esperanza Shenstone and Alexander E. Lipka
Department of Crop Sciences
University of Illinois at Urbana-Champaign

Unified Mixed Linear Model (MLM) in GWAS

Yu et al. (2006); adapted from A. Lipka

[Figure: annotated model equation; labeled terms:]
• Phenotype of the ith individual
• Grand mean
• Fixed effects: account for population structure
• Marker effect
• Observed SNP alleles of the ith individual
• Random effects: account for familial relatedness
• Random error term
• Measures relatedness between individuals
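For reference, Yu et al. (2006) write the unified MLM in matrix form as below. This is the paper's notation, not a recovery of the slide image, so the correspondence to the labels above (Qv for population structure, Sα for the marker effect, Zu for familial relatedness, K for relatedness between individuals, Xβ for the grand mean and any other fixed effects, e for the random error) is an interpretation. Both u and e are assumed to be normally distributed.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{S}\boldsymbol{\alpha}
           + \mathbf{Q}\mathbf{v} + \mathbf{Z}\mathbf{u} + \mathbf{e},
\qquad
\operatorname{Var}(\mathbf{u}) = 2\mathbf{K}V_g,
\qquad
\operatorname{Var}(\mathbf{e}) = \mathbf{R}V_R
```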


Assumptions of the Unified MLM

What do we do if these assumptions cannot be met? (Example: binomially distributed data)

Yu et al. (2006); adapted from A. Lipka


Binomial Distribution: # Successes in n Independent Success/Failure Trials

Mixed logistic regression does not require normality or equal variances. Conduct GWAS by fitting a logistic regression model at each SNP.

Logit link function: the natural log-odds of a success

The grand mean

Problem: fitting this model is extremely computationally intensive!
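Written out under illustrative notation (assumed here, not taken from the slide image), a mixed logistic model for plot counts could look like this: y_i lodged plants out of n_i, μ the grand mean, p_iq population-structure covariates with effects v_q, x_i the SNP genotype with effect α, and u_i a random effect for familial relatedness with relatedness matrix K. Dropping u_i gives an ordinary logistic regression that controls for population structure only.

```latex
y_i \sim \operatorname{Binomial}(n_i, \pi_i),
\qquad
\ln\!\left(\frac{\pi_i}{1-\pi_i}\right)
  = \mu + \sum_{q=1}^{Q} p_{iq} v_q + x_i \alpha + u_i,
\qquad
\mathbf{u} \sim N\!\left(\mathbf{0}, \sigma_u^2 \mathbf{K}\right)
```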


Purpose

Develop a multi-model GWAS approach that will allow mixed model GWAS to be conducted on binomially distributed traits


Stalk Lodging in Maize

Stalk strength

Disease/pests

Environmental factors

5-20% yield losses worldwide

Flint-Garcia et al., 2003


Data Collection - 2016

Two reps of the Goodman-Buckler diversity panel were planted using an incomplete block design

The entire experiment was inoculated with Goss’s wilt

In this experiment there was no correlation between disease and lodging

The Jamann Lab at UIUC


Lodging Phenotyping

Stand count (beginning of growing season): 23
Number of plants lodged (end of growing season): 6
Number of plants not lodged (end of growing season): 17
Lodging score (percent lodged): 26%

[Figure: diagram depicting one plot (rep) of one taxon in the field]


Treat Lodging Data as a Binomial

Setup of a binomial, and why we think the binomial is an appropriate approximation for lodging:

• The experiment consists of n repeated trials. Lodging: within each plot, each plant is a trial.
• Each trial has two outcomes: success or failure. Lodging: success = plant has lodged; failure = plant has not lodged.
• The probability of success, π, is the same on every trial. Lodging: the probability of a plant lodging, π, is the same within a plot.
• The trials are independent. Lodging: one plant lodging will not change the likelihood of another plant lodging.
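The binomial setup above can be illustrated with a short Python sketch. It reuses the example plot from the lodging phenotyping slide (23 plants, 6 lodged) and, purely as an assumption for illustration, takes that plot's observed lodging score as its lodging probability π. The function names are hypothetical.

```python
from math import comb


def lodging_score(lodged: int, stand_count: int) -> float:
    """Proportion of plants in a plot that lodged (the lodging score)."""
    return lodged / stand_count


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(Y = k) for Y ~ Binomial(n, p): k lodged plants out of n plants in a plot."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example plot: 23 plants, 6 lodged
score = lodging_score(6, 23)
print(f"lodging score: {score:.0%}")  # lodging score: 26%

# Under the binomial assumptions, P(0 lodged), ..., P(23 lodged) sum to 1
probs = [binomial_pmf(k, 23, score) for k in range(24)]
print(round(sum(probs), 10))  # 1.0
```

The expected number of lodged plants is n·π, here 23 × 6/23 = 6, matching the observed count by construction.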


Multi-Model Approach

Model 1: Fit logistic regression model
Controls for population structure only
Identify peak SNPs

Model 2: Fit a mixed linear model
Controls for population structure and relatedness
Identify peak SNPs

Model 3: Fit mixed logistic regression model
Using peak SNPs from Model 1 and Model 2
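One way to turn "peak SNPs from Model 1 and Model 2" into a concrete list for Model 3 is sketched below. The selection rule (alternate between the two p-value rankings, skip duplicates, stop at a fixed budget) is a hypothetical choice for illustration, not necessarily the rule used in this study, and the SNP names and p-values are made up.

```python
from itertools import zip_longest


def select_snps_for_model3(model1_pvalues, model2_pvalues, budget):
    """Choose up to `budget` SNPs to refit with the mixed logistic model (Model 3).

    Hypothetical rule: rank SNPs by p-value within each screening model
    (Model 1 and Model 2), then alternate between the two rankings, keeping
    each SNP only once, until the budget is filled.
    """
    if budget <= 0:
        return []
    rank1 = sorted(model1_pvalues, key=model1_pvalues.get)
    rank2 = sorted(model2_pvalues, key=model2_pvalues.get)
    chosen, seen = [], set()
    for pair in zip_longest(rank1, rank2):
        for snp in pair:
            if snp is None or snp in seen:
                continue  # one ranking is exhausted, or SNP already picked
            seen.add(snp)
            chosen.append(snp)
            if len(chosen) == budget:
                return chosen
    return chosen


# Toy p-values for four hypothetical SNPs
model1 = {"s1": 1e-8, "s2": 1e-3, "s3": 0.5, "s4": 1e-6}
model2 = {"s1": 1e-2, "s2": 0.9, "s3": 1e-7, "s4": 0.2}
print(select_snps_for_model3(model1, model2, 3))  # ['s1', 's3', 's4']
```

Alternating between the rankings ensures the strongest signals from both screening models make it into a small budget, rather than filling it entirely from one model.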


Logistic Regression Identified ~50% of Markers to Be Significant

The top 2,796 SNPs from this model were subset

[Figure: association results by chromosome, produced in RStudio; the peak SNP and possible SNPs of interest are marked]

Motivation: the mixed logistic regression model can fit 2,796 models in <1 day


Unified MLM Identified No Significant Signals

GAPIT (Lipka et al., 2012)

[Figure: association results by chromosome]


Mixed Logistic Regression Identifies 68% of SNPs Identified in Logistic Regression to Be Significant

SAS 9.4 PROC GLIMMIX

Accounting for familial relatedness helped refine the location of putative genomic regions

Signals coincide with those previously identified for traits related to lodging

[Figure: association results by chromosome]


Simulation Study in Goodman-Buckler Diversity Panel:

Determine which parameters of the binomial distribution contribute the most to the identification of genomic signals


Proposed Methodology for Simulation Study

• Assign SNP from 4K set to be QTN
• Simulate binomially distributed trait (parameters: QTN effect size, stand count per plot, grand mean)
• Simulate data: ~100 “traits”
• Fit logistic regression model at each of 55K SNPs

For each trait in each setting: assessed genomic positions of the “top 100” markers with the strongest associations
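The simulation steps above can be sketched in Python. Everything here is an assumption for illustration: QTN genotypes coded 0/2 (homozygous inbred lines), a trait model logit(π) = grand mean + effect × genotype, binomial draws built from independent Bernoulli trials, and a randomly generated stand-in for the 282-line panel rather than the real 4K/55K marker data.

```python
import math
import random


def expit(eta):
    """Inverse logit (numerically stable): log-odds -> probability."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def simulate_lodging_trait(genotypes, grand_mean, effect, stand_count, rng):
    """Simulate lodged-plant counts for one binomial trait.

    Assumed model (illustration only):
        logit(pi_i) = grand_mean + effect * g_i
        lodged_i ~ Binomial(stand_count, pi_i)
    where g_i is the QTN genotype of taxon i. Each binomial draw is built as
    a sum of independent Bernoulli trials, one per plant.
    """
    counts = []
    for g in genotypes:
        p = expit(grand_mean + effect * g)
        counts.append(sum(rng.random() < p for _ in range(stand_count)))
    return counts


rng = random.Random(2016)
# Hypothetical QTN genotypes for 282 inbred lines, coded 0/2 (homozygous)
genotypes = [rng.choice([0, 2]) for _ in range(282)]
# Setting 1 from the settings table: grand mean 0, stand count 10, effect 0.9
traits = [simulate_lodging_trait(genotypes, 0.0, 0.9, 10, rng) for _ in range(100)]
print(len(traits), len(traits[0]))  # 100 282
```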


How does the total number of plants in a plot affect QTN detection?

Stand count: 10

Proportion of times detected: 1.0

Model 1

Top 100 SNPs from each trait used to create this figure


How does the total number of plants in a plot affect QTN detection?

Stand count: 15

Proportion of times detected: 1.0

Model 1

Top 100 SNPs from each trait used to create this figure


How does the total number of plants in a plot affect QTN detection?

Stand count: 20

Proportion of times detected: 1.0

Model 1

Top 100 SNPs from each trait used to create this figure


How does the total number of plants in a plot affect QTN detection?

Stand count: 25

Proportion of times detected: 1.0

Model 1

Top 100 SNPs from each trait used to create this figure

Stand count does not appear to affect our ability to detect QTN
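The "proportion of times detected" figures depend on a rule for deciding whether a trait's top-ranked markers found the QTN. The slides do not state that rule, so the sketch below uses a hypothetical one: a trait counts as a detection if any of its top markers lies on the QTN's chromosome within a fixed window (250 kb here, also an assumption) of the QTN position.

```python
def detected(top_markers, qtn_chrom, qtn_pos, window_bp):
    """True if any top-ranked marker lies on the QTN's chromosome within window_bp of it.

    top_markers: list of (chromosome, position_bp) for one trait's top SNPs.
    The window criterion is a hypothetical choice for illustration.
    """
    return any(chrom == qtn_chrom and abs(pos - qtn_pos) <= window_bp
               for chrom, pos in top_markers)


def proportion_detected(top_markers_per_trait, qtn_chrom, qtn_pos, window_bp=250_000):
    """Fraction of simulated traits whose top markers include a hit near the QTN."""
    hits = sum(detected(markers, qtn_chrom, qtn_pos, window_bp)
               for markers in top_markers_per_trait)
    return hits / len(top_markers_per_trait)


# Four toy traits (only a couple of markers each, instead of the top 100)
toy = [[(8, 1_000_000)], [(8, 5_000_000)], [(3, 1_000_000)], [(8, 1_100_000), (1, 5)]]
print(proportion_detected(toy, qtn_chrom=8, qtn_pos=1_050_000))  # 0.5
```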


How does grand mean affect QTN detection?

Grand mean = 0; P{Success} = 0.5

Proportion of times detected: 1.0

Model 1

Top 100 SNPs from each trait used to create this figure


How does grand mean affect QTN detection?

Grand mean = 1; P{Success} = 0.73

Proportion of times detected: 1.0

Model 1

Top 100 SNPs from each trait used to create this figure


How does grand mean affect QTN detection?

Grand mean = 3; P{Success} = 0.95

Proportion of times detected: 0.82

Model 1

Top 100 SNPs from each trait used to create this figure


How does grand mean affect QTN detection?

Grand mean = 5; P{Success} = 0.99

Proportion of times detected: 0.10

Model 1

Top 100 SNPs from each trait used to create this figure

Grand mean values affect our ability to detect QTN
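The P{Success} values on these slides are the inverse logit of the grand mean. The sketch below reproduces them and also prints p(1 - p), the variance of a single plant's lodged/not-lodged outcome. That variance shrinks quickly as p approaches 1, which is one way to see why detection drops at large grand means (an interpretation, not a result stated on the slides).

```python
import math


def grand_mean_to_probability(mu):
    """Inverse logit: the baseline P{Success} implied by a grand mean on the log-odds scale."""
    return 1.0 / (1.0 + math.exp(-mu))


for mu in (0, 1, 3, 5):
    p = grand_mean_to_probability(mu)
    # p * (1 - p) is the per-plant Bernoulli variance; it shrinks as p -> 1
    print(mu, round(p, 2), round(p * (1 - p), 3))
# 0 0.5 0.25
# 1 0.73 0.197
# 3 0.95 0.045
# 5 0.99 0.007
```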


Future Directions

Any phenotype that measures # successes in a plot of n plants could theoretically use these approaches
- Try to design experiments that result in a baseline probability of success of 0.5

How can we fit mixed logistic regression models in a computationally efficient manner on a Windows/Mac computer?
- Temporary solution: the multi-model approach is reasonable
- Goal: write software that uses the score test
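A score test is cheaper because the null model (no SNP effect) is fitted once, and each SNP then needs only a few sums instead of its own iterative fit. A minimal sketch for binomial counts, assuming an intercept-only null model; real implementations (GMMAT, for example) also include covariates and the random relatedness effect, which this sketch omits. The function name is hypothetical.

```python
import math


def binomial_score_test(x, successes, trials):
    """Score test of H0: beta = 0 in logit(pi_i) = mu + beta * x_i for binomial counts.

    Only the null (intercept-only) model is fitted, once, so testing each SNP
    costs a few sums. Covariates and the random relatedness effect are omitted.
    Returns (statistic, p_value); the statistic is ~ chi-square(1) under H0.
    """
    n_total = sum(trials)
    p_hat = sum(successes) / n_total  # null-model MLE of pi, shared by all SNPs
    u = sum(xi * (yi - ni * p_hat) for xi, yi, ni in zip(x, successes, trials))
    sx = sum(ni * xi for xi, ni in zip(x, trials))
    sxx = sum(ni * xi * xi for xi, ni in zip(x, trials))
    # Variance of the score after adjusting for the estimated intercept
    v = p_hat * (1.0 - p_hat) * (sxx - sx * sx / n_total)
    if v <= 0.0:
        raise ValueError("score variance is zero (monomorphic SNP or no variation in response)")
    stat = u * u / v
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square(1)
    return stat, p_value


stat, p = binomial_score_test([0, 1], [5, 15], [20, 20])
print(round(stat, 6))  # 10.0
```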


Acknowledgements

Committee members: Dr. Alexander E. Lipka, Dr. Tiffany Jamann, Dr. Martin Bohn, Dr. Pat Brown

The Jamann Lab: Julian Cooper

The Lipka Lab: Brian Rice, Angela Chen

Graduate students: Amanda Owings

Department of Crop Sciences at UIUC


Model 1 vs. Model 2 Comparison


Varying Additive Effect Sizes (Same Assigned QTN)

Additive effect size 0.5 on chromosome 8
Proportion of times detected: 0.93

Additive effect size 0.1 on chromosome 8
Proportion of times detected: 0.07


Summary of Results

Able to identify two significant SNPs in the BP region of maize stalk strength QTL (Li et al., 2014; Flint-Garcia et al., 2003; Hu et al., 2012)

Peak SNPs on Chromosome 7 were in the same location as the most robust marker association with rind penetrometer resistance (RPR) (Peiffer et al., 2013)

A significant SNP on Chromosome 1 was in the same region as a candidate gene for Mediterranean corn borer stalk destruction susceptibility (Samayoa et al., 2015)


High LD Decay Observed Around Peak SNP on Chromosome Seven


Limiting Factors of This Study
• Stalk lodging is a putatively low-heritability trait
• No repeatability across replications
• Only one year of data included in this analysis
• Only one environment
• Missing data: various factors contributed


Summary of Project
• Logistic regression is computationally intensive: approximately 30 seconds to run 1 SNP in SAS, ~17.36 days to run 50,000 SNPs
• Model 1 and Model 2 are used to identify which SNPs are fit using the complete logistic regression model (Model 3). The number of SNPs to include depends on the computational power available.
• Stalk lodging data was used to test this approach. Some peak SNPs identified are in the same region as QTL associated with stalk strength, and as a candidate gene for Mediterranean corn borer (MCB) stalk damage
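The run-time figures above follow from simple arithmetic, which can also size the Model 3 SNP subset for a fixed time budget. A sketch assuming the ~30 s per SNP quoted above holds for every SNP and that SNPs are run one after another; the function names are hypothetical.

```python
SECONDS_PER_SNP = 30           # approximate SAS run time per SNP, from the slide
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_run(n_snps, seconds_per_snp=SECONDS_PER_SNP):
    """Wall-clock days to fit the mixed logistic model at n_snps SNPs, run sequentially."""
    return n_snps * seconds_per_snp / SECONDS_PER_DAY


def snps_within_budget(hours, seconds_per_snp=SECONDS_PER_SNP):
    """How many SNPs fit into a time budget (the Model 3 subset size)."""
    return int(hours * 3600 // seconds_per_snp)


print(round(days_to_run(50_000), 2))  # 17.36
print(snps_within_budget(24))         # 2880
```

At 30 s per SNP, a 24-hour budget allows about 2,880 SNPs, consistent with the 2,796 SNPs subset from Model 1 fitting in under a day.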


Data Collection - 2016

2 reps of the 282 diversity panel were planted using an incomplete block design

The entire experiment was inoculated with Goss’s wilt

In this experiment there was no correlation between disease and lodging

The Jamann Lab


Observed Lodging in the Field

Taxa classified as non-stiff stalk were lodged more often

Taxa classified as stiff stalk were lodged less often

All plots represented in this graph had at least 10 plants lodged


Lodging Score Residuals Follow a Non-Normal Distribution

The Box-Cox procedure was implemented, and λ = -0.6 was the suggested transformation

Transformation was unsuccessful

352 plots had no lodging

[Figure: distribution of lodging scores; x-axis: lodging score; y-axis: frequency]
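One plausible contributor to the unsuccessful transformation (the slides do not state the cause) is that the Box-Cox transform is defined only for strictly positive values, while 352 plots had a lodging score of 0. A minimal sketch of the transform makes the problem concrete; the function name is hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform, defined only for strictly positive y."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# A plot with 26% lodging transforms fine with the suggested lambda = -0.6 ...
print(box_cox(0.26, -0.6))
# ... but a plot with no lodging (score 0) cannot be transformed at all
try:
    box_cox(0.0, -0.6)
except ValueError as err:
    print(err)  # Box-Cox requires y > 0
```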


Genome-Wide Association Study (GWAS)

Search the genome for genetic markers significantly associated with your trait of interest

Allows for the identification of QTL: regions of the genome associated with the trait

Single Nucleotide Polymorphism (SNP): a type of genetic marker

http://knowgenetics.org/snps/


282 Diversity Panel

~75% of all allelic diversity in maize

Adapted from Flint-Garcia et al., 2005; Romay et al., 2013


Outline

Introduction

Genome-Wide Association on Stalk Lodging in Maize

Simulation Study

Conclusions


Unified Mixed Linear Model Controls for False Positives

Yu et al., 2006

[Figure: two panels, each with an element labeled “Simple”]
Flowering time of maize (high population structure)
Ear diameter of maize (low population structure)


Stalk Lodging in Maize
• Predicting lodging is challenging
• Most methods are destructive and/or use other traits as proxies
• Can phenotyping lodging still yield interesting results?

Stanger and Lauer, 2006


Binomial Data Allows for Logistic Regression


Methods

One SNP from the 4K marker set was assigned to be the QTN

Taxa from the 282 diversity panel were simulated to experience lodging

The 55K marker set was used to genotype the taxa used in the simulation


Objectives
• Evaluate the efficacy of the three-model approach to mixed logistic regression
• Evaluate the use of the diversity panel in logistic regression GWAS
• Examine how variables within the data set affect the ability to detect a QTN


Simulation Study Settings

Setting | Grand mean | Stand count | Additive effect size
1 | 0 | 10 | 0.9
2 | 1 | 10 | 0.9
3 | 3 | 10 | 0.9
4 | 5 | 10 | 0.9
5 | 0 | 15 | 0.9
6 | 0 | 20 | 0.9
7 | 0 | 25 | 0.9


Model 1 Identifies Peak SNPs While Accounting for Population Structure


What does changing the intercept do to our data?


Model 3 Failed to Converge in SAS PROC GLIMMIX

Possible reasons for this failure:
• “There was not enough variation in the response to attribute any variation to the random effect”
• Estimated G matrix is not positive definite: “procedure converged to a solution where the variance of the random effect is 0”

Alternative solution:
• Use the GMMAT package (Chen et al. 2015) (only runs on UNIX OS)
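In PROC GLIMMIX, G is the covariance matrix of the random effects. As an illustration, assuming the relatedness random effect is modeled with covariance σ_u²K for a relatedness matrix K (an assumption about the model setup), a variance estimate of 0 makes G the zero matrix, which cannot be positive definite. A Cholesky-based check, with a hypothetical 2 × 2 K, makes this concrete.

```python
import math


def is_positive_definite(matrix):
    """Return True if a symmetric matrix is positive definite.

    Attempts a Cholesky factorization; it succeeds only if every pivot is > 0.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                lower[i][i] = math.sqrt(s)
            else:
                lower[i][j] = s / lower[j][j]
    return True


# Hypothetical 2 x 2 relatedness matrix K and G = sigma_u^2 * K
K = [[1.0, 0.5], [0.5, 1.0]]
for sigma_u2 in (0.3, 0.0):
    G = [[sigma_u2 * k for k in row] for row in K]
    print(sigma_u2, is_positive_definite(G))
# 0.3 True
# 0.0 False
```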


Model 2

• Model 2 may have had enough power to successfully detect QTN despite model assumptions being violated
• Previous studies have shown that logistic regression models can sometimes be approximated by linear models


Conclusion
➢ Traditional (MLM-based) GWAS assumes normally distributed residuals
➢ Logistic regression has the potential to analyze non-normally distributed traits
➢ The biggest limitation of using logistic regression is the computational power required
➢ The simulation study shows the need for increased variability of phenotypic data; this is especially hard to achieve in a binary trait


Binomial Data Allows for Logistic Regression

Logistic regression does not require normality or equal variances. Conduct GWAS by fitting a logistic regression model at each SNP.

Logit link function: the natural log-odds that a plant is lodged (versus not lodged)

The grand mean
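Fitting a logistic regression model at each SNP can be sketched as Newton-Raphson (equivalently, iteratively reweighted least squares) for binomial counts with one SNP covariate. This is a dependency-free sketch with hypothetical function names, not the authors' code; it omits the population-structure covariates that Model 1 includes.

```python
import math


def expit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _score_and_information(b0, b1, x, successes, trials):
    """Score vector and Fisher information for logit(pi_i) = b0 + b1 * x_i."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi, ni in zip(x, successes, trials):
        p = expit(b0 + b1 * xi)
        resid = yi - ni * p          # observed minus expected lodged plants
        w = ni * p * (1.0 - p)       # binomial variance of the count
        g0 += resid
        g1 += resid * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)


def fit_snp_logistic(x, successes, trials, max_iter=50, tol=1e-10):
    """Fit one SNP's binomial logistic regression; returns (intercept, snp_effect, wald_p)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = _score_and_information(b0, b1, x, successes, trials)
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ValueError("singular information matrix (e.g. monomorphic SNP)")
        step0 = (h11 * g0 - h01 * g1) / det   # inverse(2x2 information) @ score
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    _, (h00, h01, h11) = _score_and_information(b0, b1, x, successes, trials)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return b0, b1, math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))


# Two genotype classes: 5 of 20 plants lodged for x = 0, 15 of 20 for x = 1
b0, b1, p = fit_snp_logistic([0, 1], [5, 15], [20, 20])
print(round(b0, 4), round(b1, 4))  # -1.0986 2.1972
```

A genome-wide scan would call `fit_snp_logistic` once per SNP with that SNP's genotype codes as `x`, which is why the per-SNP cost matters so much.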


Model 2 Identifies Peak SNPs While Controlling for Population Structure and Relatedness

[Figure: annotated model equation; labeled terms:]
• Phenotype of the ith individual
• Grand mean
• Fixed effects: account for population structure
• Marker effect
• Observed SNP alleles of the ith individual
• Random effects: account for familial relatedness
• Random error term
• Measures relatedness between individuals

Yu et al. (2006); adapted from A. Lipka


Model 3 Is Fit Using a Subset of Peak SNPs

SAS 9.4 PROC GLIMMIX

Model 3 is fit using top SNPs from Model 1

Recommendation: the number of SNPs that can be run in approximately 24 hours


Results of Simulation Study in Context of Stalk Lodging Data

• It is possible that our model’s ability to accurately detect QTL was compromised because of an observed low rate of lodging
• Can we control
• If this baseline probability occurs, then the inability of our model to detect QTL may have been exacerbated by an intercept value that is far removed from 0.


Peak SNPs that Coincide with Signals Associated with Related Traits

Type of region identified | Chr | Location in literature | Location in Model 3 | Notes
Marker | 7 | 159.4 Mb | 161.9 Mb, 155.8 Mb, 164.9 Mb | Three most significant SNPs on Chr 7
qRPR2 QTL | 2 | 236.4-237.0 Mb | 236.8 Mb | 14th most significant SNP on Chr 2
qRPR3-1 QTL | 3 | 181.1-184.7 Mb | 181.7 Mb, 182.0 Mb | 92nd and 98th most significant SNPs on Chr 3

