Introduction to PLINK
GBIO0009
Kridsadakorn Chaichoompu
University of Liege
19/10/16 GBIO0015

PLINK: Why PLINK?

•  PLINK is a whole genome association analysis software, and it is FREE! http://pngu.mgh.harvard.edu/~purcell/plink/
•  PLINK has a well-documented manual that explains all features
•  PLINK is available for Linux, Mac OS, and MS-DOS
•  PLINK has 2 versions, the stable version (1.07) and the beta version (1.9)
   –  PLINK 1.9 works much faster than 1.07
   –  PLINK 1.9 has many new features
•  gPLINK is the other version of PLINK, which provides a graphical user interface. Please be aware that a whole genome analysis with PLINK usually takes a long time, so it is better to use a command-line version
•  We recommend using PLINK 1.07


PLINK: Let's get started

•  To download PLINK: http://pngu.mgh.harvard.edu/~purcell/plink/dist/plink-1.07-i686.zip
•  In plink-1.07-xxx.zip, there is an example set of input files, which is a good place to start exploring
   –  test.map contains the marker information
   –  test.ped contains genotype data and sample information
•  Check what is inside the example files!
    plink --file test


Example data

•  Download the example data from the course website
   –  TSI_JPT_chr20_case_control.bed
   –  TSI_JPT_chr20_case_control.bim
   –  TSI_JPT_chr20_case_control.fam
   –  TSI_JPT_chr20_pheno_header.txt
   –  TSI_JPT_chr20_pheno.txt


PLINK: File Formats

PLINK mainly supports 3 types of formats
•  Standard text format (PED and MAP). Note that all files must have the same base name; otherwise we need to name them explicitly by using --ped and --map
    plink --file test
•  Binary format (BED, BIM, and FAM)
    plink --bfile test
•  Transposed text format (TPED and TFAM). Note that all files must have the same base name; otherwise we need to name them explicitly by using --tped and --tfam
    plink --tfile test


Format conversion

•  To convert to, or write output in, text format (PED and MAP)
    plink --file test --recode --out test_ped
•  To convert to, or write output in, binary format (BED, BIM, and FAM)
    plink --file test --make-bed --out test_bin
•  To convert to, or write output in, transposed text format (TPED and TFAM)
    plink --file test --transpose --recode --out test_tp
•  Alternatively, it is possible to recode the data with 1/2 allele encoding
    plink --file test --recode12 --out test_12
•  To convert to additive encoding
    plink --file test --recodeAD --out test_12
•  It is possible to switch from A,C,G,T encoding to 1,2,3,4 encoding by using --allele1234, or vice versa by using --alleleACGT


Alternate phenotype files

To specify an alternate phenotype for analysis, i.e. other than the one in the *.ped file (or, if using a binary fileset, the *.fam file), use the --pheno option:

    plink --file mydata --pheno pheno.txt

where pheno.txt is a file that contains 3 columns (one row per individual):

    Family ID
    Individual ID
    Phenotype

The original PED file must still contain a phenotype in column 6, unless the --no-pheno flag is given. The order of the alternate phenotype file need not be the same as for the original file. If the phenotype file contains more than one phenotype, then use the --mpheno N option to specify that the Nth phenotype is the one to be used:

    plink --file mydata --pheno pheno2.txt --mpheno 4

where pheno2.txt contains 5 different phenotypes; this command will use the 4th for analysis (Phenotype D):

    Family ID
    Individual ID
    Phenotype A
    Phenotype B
    Phenotype C
    Phenotype D
    Phenotype E

If your file is coded 0/1 to represent unaffected/affected, then use the --1 flag:

    plink --file mydata --1
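To make the column selection concrete, here is a minimal Python sketch of what choosing the Nth phenotype amounts to. It only illustrates the file layout described above; the function name and the sample rows are hypothetical, and it is not how PLINK itself is implemented.

```python
def select_phenotype(lines, n):
    """Map (family ID, individual ID) to the n-th phenotype column (1-based)."""
    selected = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 + n:
            continue  # skip blank or too-short rows
        family_id, individual_id = fields[0], fields[1]
        selected[(family_id, individual_id)] = fields[2 + n - 1]
    return selected

# Hypothetical pheno2.txt rows: FID, IID, then phenotypes A to E
pheno2 = [
    "F1 I1 1 2 1 2 1",
    "F2 I1 2 2 1 1 2",
]
print(select_phenotype(pheno2, 4))  # {('F1', 'I1'): '2', ('F2', 'I1'): '1'}
```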


Data manipulation: SNPs (1/3)

To get a set of SNPs, you can specify a single SNP and, optionally, also ask for all SNPs in the surrounding region, with the --window option:

    plink --bfile mydata --snp rs652423 --window 20

which extracts only SNPs within +/- 20 kb of rs652423.

Based on multiple SNPs and ranges (--snps)
The --snps command will accept a comma-delimited list of SNPs, including ranges based on physical position. For example,

    plink --bfile mydata --snps rs273744-rs89883,rs12345-rs67890,rs999,rs222

Based on physical position (--from-kb, etc)

    plink --bfile mydata --chr 2 --from-kb 5000 --to-kb 10000

to select all SNPs within this 5000 kb region on chromosome 2.
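The position-based selection can be pictured with a small Python sketch. It assumes inclusive bounds and uses made-up SNP IDs and positions; the helper name is hypothetical and the code is an illustration, not PLINK's implementation.

```python
def in_region(chrom, bp, target_chrom, from_kb, to_kb):
    """True if a SNP at base-pair position bp lies in [from_kb, to_kb] on target_chrom.

    Inclusive bounds are an assumption made for this illustration.
    """
    return chrom == target_chrom and from_kb * 1000 <= bp <= to_kb * 1000

# Hypothetical (chromosome, SNP ID, base-pair position) entries, as found in a MAP/BIM file
snps = [
    (2, "rsA", 4_999_999),
    (2, "rsB", 5_000_000),
    (2, "rsC", 7_500_000),
    (3, "rsD", 6_000_000),
]
kept = [name for chrom, name, bp in snps if in_region(chrom, bp, 2, 5000, 10000)]
print(kept)  # ['rsB', 'rsC']
```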


Data manipulation: SNPs (2/3)

To merge more than two standard and/or binary filesets, it is often more convenient to specify a single file that contains a list of PED/MAP and/or binary filesets. For example, suppose we had 4 PED/MAP filesets (labelled fA.* through fD.*) and 4 binary filesets (labelled fE.* through fH.*). Then use the command:

    plink --file fA --merge-list allfiles.txt --make-bed --out mynewdata


Data manipulation: SNPs (3/3)

To exclude some sets of SNPs

    plink --file data --exclude mysnps.txt

where the file mysnps.txt is, as for the --extract command, just a list of SNPs, one per line.


Data manipulation: individuals (1/3)

To get a set of individuals

    plink --file data --keep mylist.txt

where the file mylist.txt is, as for the --remove command, just a list of Family ID / Individual ID pairs, one set per line, i.e. one person per line. Fields can occur after the 2nd column but they will be ignored, i.e. you could use a FAM file as the parameter of the --keep command, or have comments in the file. For example

    F101 1
    F1001 2_B
    F3033 1_A    Drop this individual because of consent issues
    F4442 22


Data manipulation: individuals (2/3)

To exclude a set of individuals

    plink --file data --remove mylist.txt

where the file mylist.txt is, as for the --keep command, just a list of Family ID / Individual ID pairs, one set per line, i.e. one person per line (although, as for --keep, fields after the 2nd column are allowed but they will be ignored).


Data manipulation: individuals (3/3)

Filter some individuals

    plink --file data --filter myfile.raw 1 --freq

implies a file myfile.raw exists which has a similar format to phenotype and cluster files: that is, the first two columns are family and individual IDs; the third column is expected to be a numeric value (although the file can have more than 3 columns), and only individuals who have a value of 1 for this would be included in any subsequent analysis or file generation procedure. For example, if myfile.raw were

    F1 I1 2
    F2 I1 7
    F3 I1 1
    F3 I2 1
    F3 I3 3

then only individuals F3 I1 and F3 I2 would be included.

Because filtering on cases or controls, or on sex, or on position within the family, will be common operations, there are some shortcut options that can be used instead of --filter. These are:

    --filter-cases
    --filter-controls
    --filter-males
    --filter-females
    --filter-founders
    --filter-nonfounders
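The effect of the --filter example on the five-row myfile.raw shown earlier on this slide can be sketched in Python as follows. The function name is hypothetical, and comparing the third column as a string is a simplification made for this illustration.

```python
def filter_individuals(lines, value):
    """Return (family ID, individual ID) pairs whose third column equals value."""
    kept = []
    for line in lines:
        fields = line.split()
        # Columns after the third are allowed and ignored
        if len(fields) >= 3 and fields[2] == value:
            kept.append((fields[0], fields[1]))
    return kept

# Rows of myfile.raw from the slide
myfile_raw = ["F1 I1 2", "F2 I1 7", "F3 I1 1", "F3 I2 1", "F3 I3 3"]
print(filter_individuals(myfile_raw, "1"))  # [('F3', 'I1'), ('F3', 'I2')]
```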


Quality control processes

•  Missing genotype
•  Hardy-Weinberg Equilibrium
•  Minor Allele frequency
•  Linkage disequilibrium pruning
•  Mendel errors


Missing genotype

To generate a list of genotyping/missingness rate statistics:

    plink --file data --missing

This option creates two files:

    plink.imiss
    plink.lmiss

which detail missingness by individual and by SNP (locus), respectively. For individuals, the format is:

    FID          Family ID
    IID          Individual ID
    MISS_PHENO   Missing phenotype? (Y/N)
    N_MISS       Number of missing SNPs
    N_GENO       Number of non-obligatory missing genotypes
    F_MISS       Proportion of missing SNPs

For each SNP, the format is:

    SNP          SNP identifier
    CHR          Chromosome number
    N_MISS       Number of individuals missing this SNP
    N_GENO       Number of non-obligatory missing genotypes
    F_MISS       Proportion of sample missing for this SNP
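As a rough illustration of the two F_MISS columns, the sketch below computes the proportion of missing calls per individual and per SNP from a small made-up genotype matrix. It ignores obligatory missingness (the N_GENO column), and the function name and data are hypothetical.

```python
def missing_rates(genotypes):
    """genotypes: one row per individual, one entry per SNP; None marks a missing call."""
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    per_individual = [sum(g is None for g in row) / n_snp for row in genotypes]
    per_snp = [sum(row[j] is None for row in genotypes) / n_ind for j in range(n_snp)]
    return per_individual, per_snp

# Hypothetical data: 4 individuals x 4 SNPs
genotypes = [
    ["AA", "AG", None, "GG"],
    ["AA", None, None, "GG"],
    ["AG", "AG", "TT", "GG"],
    ["AA", "GG", "TT", None],
]
imiss, lmiss = missing_rates(genotypes)
print(imiss)  # [0.25, 0.5, 0.0, 0.25]
print(lmiss)  # [0.0, 0.25, 0.5, 0.25]
```

With these numbers, a per-individual threshold of 0.1 would flag every individual except the third.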


Clustering based on missing genotypes

Systematic batch effects that induce missingness in parts of the sample will induce correlation between the patterns of missing data that different individuals display. One approach to detecting correlation in these patterns, which might identify such biases, is to cluster individuals based on their identity-by-missingness (IBM).

    plink --file data --cluster-missing

which creates the files:

    plink.matrix.missing
    plink.cluster3.missing

which have similar formats to the corresponding IBS clustering files.


Missing rate per person

The initial step in all data analysis is to exclude individuals with too much missing genotype data. This option is set as follows:

    plink --file mydata --mind 0.1

which means exclude individuals with more than 10% missing genotypes. A line in the terminal output will appear, indicating how many individuals were removed due to low genotyping. If any individuals were removed, a file called

    plink.irem

will be created, listing the Family and Individual IDs of these removed individuals. Any subsequent analysis also specified on the same command line will be performed without these individuals.


Missing rate per SNP

Subsequent analyses can be set to automatically exclude SNPs on the basis of missing genotype rate, with the --geno option: the default is to include all SNPs (i.e. --geno 1). To include only SNPs with at least a 90% genotyping rate (at most 10% missing) use

    plink --file mydata --geno 0.1

As with the --maf option, these counts are calculated after removing individuals with high missing genotype rates.


Hardy-Weinberg Equilibrium (1/2)

To generate a list of genotype counts and Hardy-Weinberg test statistics for each SNP, use the option:

    plink --file data --hardy

which creates a file:

    plink.hwe

This file has the following format

    SNP      SNP identifier
    TEST     Code indicating sample
    A1       Minor allele code
    A2       Major allele code
    GENO     Genotype counts: 11/12/22
    O(HET)   Observed heterozygosity
    E(HET)   Expected heterozygosity
    P        H-W p-value


Hardy-Weinberg Equilibrium (2/2)

To exclude markers that fail the Hardy-Weinberg test at a specified significance threshold, use the option:

    plink --file mydata --hwe 0.001

By default this filter uses an exact test. The standard asymptotic test (1 df genotypic chi-squared test) can be requested with the --hwe2 option instead of --hwe. The following output will appear in the console window and in plink.log, detailing how many SNPs failed the Hardy-Weinberg test, for the sample as a whole, and (when PLINK has detected a disease phenotype) for cases and controls separately:

    Writing Hardy-Weinberg tests (founders-only) to [ plink.hwe ]
    30 markers failed HWE test ( p <= 0.05 ) and have been excluded
    34 markers failed HWE test in cases
    30 markers failed HWE test in controls

This test will only be based on founders (if family-based data are being analysed) unless the --nonfounders option is also specified.
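For intuition about the asymptotic 1 df test and the O(HET) and E(HET) columns, here is a minimal Python sketch computed from genotype counts. It covers only the chi-squared variant, not the exact test that --hwe uses by default; the function name and the counts are hypothetical, and the SNP is assumed to be polymorphic so that no expected count is zero.

```python
import math

def hwe_chisq(n11, n12, n22):
    """Return (O(HET), E(HET), chi-square, p-value) from genotype counts 11/12/22."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)   # frequency of allele 1
    q = 1 - p
    observed = [n11, n12, n22]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chisq / 2))
    return n12 / n, 2 * p * q, chisq, p_value

# Hypothetical genotype counts 10/40/50
o_het, e_het, chisq, p_value = hwe_chisq(10, 40, 50)
print(round(o_het, 2), round(e_het, 2), round(chisq, 4))  # 0.4 0.42 0.2268
print(p_value > 0.001)  # True: this SNP would pass a 0.001 threshold
```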


Allele frequency

To generate a list of minor allele frequencies (MAF) for each SNP, based on all founders in the sample:

    plink --file data --freq

will create a file:

    plink.frq

with the following columns:

    CHR       Chromosome
    SNP       SNP identifier
    A1        Allele 1 code (minor allele)
    A2        Allele 2 code (major allele)
    MAF       Minor allele frequency
    NCHROBS   Non-missing allele count


Minor Allele frequency

Once individuals with too much missing genotype data have been excluded, subsequent analyses can be set to automatically exclude SNPs on the basis of MAF (minor allele frequency):

    plink --file mydata --maf 0.05

means only include SNPs with MAF >= 0.05. The default value is 0.01. This quantity is based only on founders (i.e. individuals for whom the paternal and maternal individual codes are both 0). This option appropriately counts alleles for X and Y chromosome SNPs.
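The founders-only allele counting can be sketched as below for a single autosomal SNP. The special handling of X and Y chromosome SNPs is not covered; the function name, the genotype coding ('00' for missing) and the data are assumptions made for this illustration.

```python
def minor_allele_frequency(genotypes, founders):
    """genotypes: 2-character strings such as 'AG' ('00' = missing);
    founders: parallel list of booleans.
    Returns (minor allele, MAF, non-missing allele count)."""
    counts = {}
    for geno, is_founder in zip(genotypes, founders):
        if not is_founder or geno == "00":
            continue  # non-founders and missing calls are not counted
        for allele in geno:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    minor = min(counts, key=counts.get)
    return minor, counts[minor] / total, total

# Hypothetical SNP: the last individual is a non-founder, the fifth has a missing call
genos = ["AA", "AG", "AA", "AG", "00", "GG"]
is_founder = [True, True, True, True, True, False]
allele, maf, n_obs = minor_allele_frequency(genos, is_founder)
print(allele, maf, n_obs)  # G 0.25 8
print(maf >= 0.05)         # True: kept at a 0.05 threshold
```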


Linkage disequilibrium pruning (1/2)

Sometimes it is useful to generate a pruned subset of SNPs that are in approximate linkage equilibrium with each other. This can be achieved via two commands: --indep, which prunes based on the variance inflation factor (VIF) by recursively removing SNPs within a sliding window; and --indep-pairwise, which is similar, except it is based only on pairwise genotypic correlation. The VIF pruning routine is performed with:

    plink --file data --indep 50 5 2

which will create the files

    plink.prune.in
    plink.prune.out

Each is a simple list of SNP IDs; both these files can subsequently be specified as the argument for a --extract or --exclude command. The parameters for --indep are: the window size in SNPs (e.g. 50), the number of SNPs to shift the window at each step (e.g. 5), and the VIF threshold (e.g. 2). The VIF is 1/(1-R^2) where R^2 is the multiple correlation coefficient for a SNP being regressed on all other SNPs simultaneously. That is, this considers the correlations between SNPs but also between linear combinations of SNPs.


Linkage disequilibrium pruning (2/2)

The second procedure is performed with:

    plink --file data --indep-pairwise 50 5 0.5

This generates the same output files as the first option; the only difference is that a simple pairwise threshold is used. The first two parameters (50 and 5) are the same as above (window size and step); the third parameter represents the r^2 threshold. To give a concrete example: the command above that specifies 50 5 0.5 would a) consider a window of 50 SNPs, b) calculate LD between each pair of SNPs in the window, c) remove one of a pair of SNPs if the LD is greater than 0.5, d) shift the window 5 SNPs forward and repeat the procedure. To make a new, pruned file, use something like the following (in this example, we also convert the standard PED fileset to a binary one):

    plink --file data --extract plink.prune.in --make-bed --out pruneddata
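The window, pairwise r^2, remove, shift procedure described above can be sketched in plain Python. This is a simplified illustration under stated assumptions: genotypes are coded as 0/1/2 allele dosages, r^2 is the squared Pearson correlation of those dosages, and the later SNP of a correlated pair is the one dropped (an arbitrary choice; PLINK's own rule for which SNP to remove may differ). Function names and data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as 0
    return sxy * sxy / (sxx * syy)

def prune_pairwise(snps, window, step, threshold):
    """snps: list of (snp_id, dosages). Returns (kept ids, removed ids)."""
    removed = set()
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        for i in range(start, end):
            for j in range(i + 1, end):
                if i in removed or j in removed:
                    continue
                if r_squared(snps[i][1], snps[j][1]) > threshold:
                    removed.add(j)  # arbitrary choice: drop the later SNP
        start += step
    kept = [sid for k, (sid, _) in enumerate(snps) if k not in removed]
    return kept, [snps[k][0] for k in sorted(removed)]

# Hypothetical dosages for 4 SNPs in 6 individuals
snps = [
    ("rs1", [0, 1, 2, 0, 1, 2]),
    ("rs2", [0, 1, 2, 0, 1, 1]),
    ("rs3", [2, 0, 1, 1, 0, 2]),
    ("rs4", [2, 0, 1, 1, 0, 2]),
]
prune_in, prune_out = prune_pairwise(snps, window=50, step=5, threshold=0.5)
print(prune_in)   # ['rs1', 'rs3']
print(prune_out)  # ['rs2', 'rs4']
```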


Mendel errors

To generate a list of Mendel errors for SNPs and families, use the option:

    plink --file data --mendel

which will create the files:

    plink.mendel
    plink.imendel
    plink.fmendel
    plink.lmendel

The *.mendel file contains all Mendel errors (i.e. one line per error); the *.imendel file contains a summary of per-individual error rates; the *.fmendel file contains a summary of per-family error rates; the *.lmendel file contains a summary of per-SNP error rates. The *.mendel file has the following columns:

    FID     Family ID
    KID     Child individual ID
    CHR     Chromosome
    SNP     SNP ID
    CODE    A numerical code indicating the type of error (see below)
    ERROR   Description of the actual error


Association Analysis

•  Case/control
•  Fisher's exact
•  Full model
•  Quantitative trait
•  Linear and logistic models
•  Multiple-test correction


Manhattan plot using GWASTools

    manhattanPlot(assoc$P, chromosome=assoc$CHR)


QQ plot using GWASTools

    qqPlot(pval=assoc$P, truncate=TRUE, main="QQ Plot of P-values")


Basic case/control association test

To perform a standard case/control association analysis, use the option:

    plink --file mydata --assoc

which generates a file

    plink.assoc

which contains the fields:

    CHR     Chromosome
    SNP     SNP ID
    BP      Physical position (base-pair)
    A1      Minor allele name (based on whole sample)
    F_A     Frequency of this allele in cases
    F_U     Frequency of this allele in controls
    A2      Major allele name
    CHISQ   Basic allelic test chi-square (1 df)
    P       Asymptotic p-value for this test
    OR      Estimated odds ratio (for A1, i.e. A2 is reference)
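To show how the F_A, F_U, CHISQ, P and OR columns relate to each other, here is a minimal Python sketch of a 1 df allelic chi-square test on a 2x2 table of allele counts. It applies no continuity correction (an assumption of this illustration), and the function name and counts are hypothetical.

```python
import math

def allelic_test(case_a1, case_a2, ctrl_a1, ctrl_a2):
    """Chi-square (1 df), p-value and odds ratio from a 2x2 table of allele counts."""
    n = case_a1 + case_a2 + ctrl_a1 + ctrl_a2
    row = [case_a1 + case_a2, ctrl_a1 + ctrl_a2]
    col = [case_a1 + ctrl_a1, case_a2 + ctrl_a2]
    observed = [[case_a1, case_a2], [ctrl_a1, ctrl_a2]]
    chisq = sum(
        (observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
        for i in range(2)
        for j in range(2)
    )
    p_value = math.erfc(math.sqrt(chisq / 2))  # upper tail of chi-square, 1 df
    odds_ratio = (case_a1 * ctrl_a2) / (case_a2 * ctrl_a1)  # A2 is the reference
    return chisq, p_value, odds_ratio

# Hypothetical counts: 100 cases (60 A1, 140 A2 alleles), 100 controls (40 A1, 160 A2)
chisq, p_value, odds_ratio = allelic_test(60, 140, 40, 160)
print(60 / 200, 40 / 200)                   # F_A = 0.3, F_U = 0.2
print(round(chisq, 3), round(odds_ratio, 3))  # 5.333 1.714
```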


Fisher's Exact test (allelic association)

To perform a standard case/control association analysis using Fisher's exact test to generate significance, use the option:

    plink --file mydata --fisher

which generates a file

    plink.fisher

which contains the fields:

    CHR   Chromosome
    SNP   SNP ID
    BP    Physical position (base-pair)
    A1    Minor allele name (based on whole sample)
    F_A   Frequency of this allele in cases
    F_U   Frequency of this allele in controls
    A2    Major allele name
    P     Exact p-value for this test
    OR    Estimated odds ratio (for A1)

As described below, if --fisher is specified with --model as well, PLINK will perform genotypic tests using Fisher's exact test.


Alternate/full model association tests

It is possible to perform tests of association between a disease and a variant other than the basic allelic test (which compares frequencies of alleles in cases versus controls), by using the --model option. The tests offered here are (in addition to the basic allelic test):

    Cochran-Armitage trend test
    Genotypic (2 df) test
    Dominant gene action (1 df) test
    Recessive gene action (1 df) test

The genotypic test provides a general test of association in the 2-by-3 table of disease-by-genotype. The dominant and recessive models are tests for the minor allele (which allele is the minor allele can be found in the output of either the --assoc or the --freq commands). That is, if D is the minor allele (and d is the major allele):

    Allelic:     D versus d
    Dominant:    (DD, Dd) versus dd
    Recessive:   DD versus (Dd, dd)
    Genotypic:   DD versus Dd versus dd

As mentioned above, these tests are generated with the option:

    plink --file mydata --model

which generates a file

    plink.model

which contains the following fields:

    CHR     Chromosome number
    SNP     SNP identifier
    TEST    Type of test
    AFF     Genotypes/alleles in cases
    UNAFF   Genotypes/alleles in controls
    CHISQ   Chi-squared statistic
    DF      Degrees of freedom for test
    P       Asymptotic p-value
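The four groupings above differ only in how the genotype counts are collapsed before testing. The sketch below builds the case and control counts for each grouping from (DD, Dd, dd) counts; the trend test is not covered. The function name, the dictionary keys and the counts are hypothetical labels chosen for this illustration.

```python
def model_tables(case_counts, control_counts):
    """Counts are (DD, Dd, dd) with D the minor allele.
    Returns, per test, the collapsed counts for cases and controls."""
    tables = {"ALLELIC": {}, "DOM": {}, "REC": {}, "GENO": {}}
    groups = (("cases", case_counts), ("controls", control_counts))
    for label, (hom_minor, het, hom_major) in groups:
        tables["ALLELIC"][label] = (2 * hom_minor + het, het + 2 * hom_major)  # D vs d
        tables["DOM"][label] = (hom_minor + het, hom_major)                    # (DD, Dd) vs dd
        tables["REC"][label] = (hom_minor, het + hom_major)                    # DD vs (Dd, dd)
        tables["GENO"][label] = (hom_minor, het, hom_major)                    # DD vs Dd vs dd
    return tables

# Hypothetical genotype counts (DD, Dd, dd)
tables = model_tables(case_counts=(10, 40, 50), control_counts=(5, 30, 65))
print(tables["ALLELIC"])  # {'cases': (60, 140), 'controls': (40, 160)}
print(tables["DOM"])      # {'cases': (50, 50), 'controls': (35, 65)}
print(tables["REC"])      # {'cases': (10, 90), 'controls': (5, 95)}
```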


Quantitative trait association

Quantitative traits can be tested for association also, using either asymptotic or empirical significance values. If the phenotype (column 6 of the PED file or the phenotype as specified with the --pheno option) is quantitative, then PLINK will automatically treat the analysis as a quantitative trait analysis.

    plink --file mydata --assoc

will generate the file

    plink.qassoc

with fields as follows:

    CHR     Chromosome number
    SNP     SNP identifier
    BP      Physical position (base-pair)
    NMISS   Number of non-missing genotypes
    BETA    Regression coefficient
    SE      Standard error
    R2      Regression r-squared
    T       Wald test (based on t-distribution)
    P       Wald test asymptotic p-value

If permutations were also requested, then an extra file, either plink.assoc.perm or plink.assoc.mperm will be generated, depending on whether adaptive or max(T) permutation was used (see the next section for more details). The empirical p-values are based on the Wald statistic.
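The BETA, SE, R2 and T columns correspond to an ordinary least-squares regression of the phenotype on the allele dosage. A minimal sketch, assuming genotypes coded as 0/1/2 copies of the tested allele and made-up phenotype values, follows; the p-value is omitted because the t-distribution tail is not available in the Python standard library. The function name is hypothetical.

```python
import math

def qassoc(dosages, phenotypes):
    """Simple regression of phenotype on allele dosage. Returns (BETA, SE, R2, T)."""
    n = len(dosages)
    mx, my = sum(dosages) / n, sum(phenotypes) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    syy = sum((y - my) ** 2 for y in phenotypes)
    beta = sxy / sxx
    residual_ss = syy - beta * sxy           # residual sum of squares
    se = math.sqrt(residual_ss / (n - 2) / sxx)
    return beta, se, sxy * sxy / (sxx * syy), beta / se

# Hypothetical data for 6 individuals
beta, se, r2, t = qassoc([0, 0, 1, 1, 2, 2], [1.0, 2.0, 2.0, 3.0, 3.0, 5.0])
print(round(beta, 3), round(se, 3), round(r2, 4), round(t, 3))  # 1.25 0.439 0.6696 2.847
```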


Linear and logistic models

These two features allow for multiple covariates when testing for both quantitative trait and disease trait SNP association, and for interactions with those covariates. The covariates can either be continuous or binary (i.e. for categorical covariates, you must first make a set of binary dummy variables). In this section we consider:

    Basic usage
    Covariates and interactions
    Flexibly specifying the precise model
    Flexibly specifying joint tests

Basic usage

For quantitative traits, use

    plink --bfile mydata --linear

For disease traits, specify logistic regression with

    plink --bfile mydata --logistic

These commands will generate either the output file plink.assoc.linear or plink.assoc.logistic, depending on the phenotype/command used. The basic format is:

    CHR       Chromosome
    SNP       SNP identifier
    BP        Physical position (base-pair)
    A1        Tested allele (minor allele by default)
    TEST      Code for the test (see below)
    NMISS     Number of non-missing individuals included in analysis
    BETA/OR   Regression coefficient (--linear) or odds ratio (--logistic)
    STAT      Coefficient t-statistic
    P         Asymptotic p-value for t-statistic


Adjustment for multiple testing

To generate a file of adjusted significance values that correct for all tests performed and other metrics, use the option:

    plink --file mydata --assoc --adjust

which generates the file

    plink.adjust

which contains the fields

    CHR        Chromosome number
    SNP        SNP identifier
    UNADJ      Unadjusted p-value
    GC         Genomic-control corrected p-values
    BONF       Bonferroni single-step adjusted p-values
    HOLM       Holm (1979) step-down adjusted p-values
    SIDAK_SS   Sidak single-step adjusted p-values
    SIDAK_SD   Sidak step-down adjusted p-values
    FDR_BH     Benjamini & Hochberg (1995) step-up FDR control
    FDR_BY     Benjamini & Yekutieli (2001) step-up FDR control

This file is sorted by significance value rather than genomic location, the most significant results being at the top.
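Four of these adjustments can be written down in a few lines, which may help when reading the columns. The sketch below uses the textbook definitions of Bonferroni, Holm, Sidak single-step and Benjamini-Hochberg; genomic control, Sidak step-down and Benjamini-Yekutieli are left out. The function name and the p-values are hypothetical.

```python
def adjust(pvalues):
    """Return (Bonferroni, Holm, Sidak single-step, Benjamini-Hochberg)
    adjusted p-values, each in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, most significant first
    bonf = [min(1.0, p * m) for p in pvalues]
    sidak_ss = [1 - (1 - p) ** m for p in pvalues]

    holm = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):            # step-down: enforce monotone increase
        running = max(running, min(1.0, (m - rank) * pvalues[i]))
        holm[i] = running

    fdr_bh = [0.0] * m
    running = 1.0
    for rank in range(m - 1, -1, -1):           # step-up: enforce monotone decrease
        i = order[rank]
        running = min(running, pvalues[i] * m / (rank + 1))
        fdr_bh[i] = running
    return bonf, holm, sidak_ss, fdr_bh

# Hypothetical unadjusted p-values for 4 SNPs
bonf, holm, sidak_ss, fdr_bh = adjust([0.01, 0.04, 0.03, 0.005])
print([round(v, 3) for v in bonf])    # [0.04, 0.16, 0.12, 0.02]
print([round(v, 3) for v in holm])    # [0.03, 0.06, 0.06, 0.02]
print([round(v, 3) for v in fdr_bh])  # [0.02, 0.04, 0.04, 0.02]
```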
