Genomic simulation of complex
traits using AlphaDrop
Gorjanc G. & Hickey J. M.
COST RGB-Net
Rodica – Domžale, Slovenia, 15th October 2012
Introduction
• Whole-genome technologies → rich data
• In complex traits (e.g., body height, weight, …)
gene discovery is still very limited
• Rich genome-wide data can be used for
prediction (classically based on phenotype and pedigree data)
• AIM: Show the simulation of different types of
information for prediction in complex traits
Different sources of information (simplistic scheme)
[Figure: phenotype; DNA sequence of several haplotypes (both strands) at positions k-1 to k+6; QTL and SNP positions at +0, +1 and +2 cM; haplotypes; SNP genotypes such as AB, AA, BB]
SNP data – long format
[Header]
BSGT Version 3.3.7
Content BovineSNP50_A.bpm
Num SNPs 54001
Total SNPs 54001
Num Samples 191
Total Samples 191
[Data]
SNP Name Sample ID Allel1 Allel2
ARS-BFGL-BAC-10172 SLO110973 B B
ARS-BFGL-BAC-1020 SLO110973 - -
ARS-BFGL-BAC-10245 SLO110973 - -
ARS-BFGL-BAC-10345 SLO110973 A B
ARS-BFGL-BAC-10365 SLO110973 A A
ARS-BFGL-BAC-10375 SLO110973 B A
…
AB format!!!
SNP data – wide format
SLO110973 2 - - 1 0 1 …
SLO110974 0 0 - 1 0 1 …
SLO110975 1 2 - 1 0 1 …
…
0, 1, 2 format!!!
Missing genotypes in reality?
„Computer eye“
Manual correction?
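Converting the long AB format into the wide 0, 1, 2 format can be sketched in Python. This is a minimal sketch, assuming whitespace-separated data rows with the columns SNP name, sample ID, allele 1 and allele 2 (header lines already stripped). Each genotype is coded as the number of B alleles, and "-" is kept as a missing genotype (None). The function names `ab_to_code` and `long_to_wide` are hypothetical.

```python
# Hypothetical helpers: a sketch of AB (long) -> 0/1/2 (wide) conversion.

def ab_to_code(allele1, allele2):
    """Code an AB genotype as the number of B alleles (AA -> 0, AB -> 1, BB -> 2).

    Returns None when either allele is missing ("-")."""
    if allele1 == "-" or allele2 == "-":
        return None
    return (allele1 == "B") + (allele2 == "B")


def long_to_wide(rows):
    """Turn long-format rows 'SNP sample allele1 allele2' into {sample: [codes]}.

    SNPs keep the order in which they first appear; a SNP absent for a sample is None."""
    snp_order = {}  # dict used as an insertion-ordered set
    genotypes = {}
    for row in rows:
        snp, sample, allele1, allele2 = row.split()
        snp_order.setdefault(snp)
        genotypes.setdefault(sample, {})[snp] = ab_to_code(allele1, allele2)
    return {sample: [codes.get(snp) for snp in snp_order]
            for sample, codes in genotypes.items()}


long_rows = [
    "ARS-BFGL-BAC-10172 SLO110973 B B",
    "ARS-BFGL-BAC-1020 SLO110973 - -",
    "ARS-BFGL-BAC-10245 SLO110973 - -",
    "ARS-BFGL-BAC-10345 SLO110973 A B",
    "ARS-BFGL-BAC-10365 SLO110973 A A",
    "ARS-BFGL-BAC-10375 SLO110973 B A",
]
wide = long_to_wide(long_rows)
for sample, codes in wide.items():
    print(sample, " ".join("-" if c is None else str(c) for c in codes))
# prints: SLO110973 2 - - 1 0 1
```

Missing genotypes stay missing here. Whether to impute them or correct the calls manually is a separate decision.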
Methods – Idea
1. Simulate individuals’ genomes
2. Allocate QTLs and SNPs to positions in genome
3. Sample QTL effects and compute (total) additive genetic value (AGV = sum of all QTL effects)
4. Sample phenotypic value based on additive genetic value and heritability (h²)
5. Statistical analysis of co-variation between phenotypic and genotypic data
6. Correlate estimated AGV with true AGV
MAJOR
„MINOR“
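Steps 3, 4 and 6 of this idea can be sketched in a few lines of Python. This is a toy built on assumptions that are not in the talk. Loci are independent with allele frequency 0.5, so there is no LD and no pedigree, and steps 1 and 2 are skipped. QTL effects are Gaussian. The residual variance is chosen so that the realised heritability equals `h2`. The phenotype itself serves as the "estimated" AGV in step 6. Under these assumptions the correlation between phenotype and true AGV should be close to the square root of h². All names are hypothetical.

```python
import math
import random


def correlation(u, v):
    """Pearson correlation between two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def simulate_phenotypes(genotypes, qtl_effects, h2, rng):
    """Step 3: AGV = sum of QTL effects times allele counts (0, 1, 2).
    Step 4: phenotype = AGV + Gaussian residual with variance set by h2."""
    agv = [sum(x * b for x, b in zip(g, qtl_effects)) for g in genotypes]
    mean_a = sum(agv) / len(agv)
    var_a = sum((a - mean_a) ** 2 for a in agv) / len(agv)
    var_e = var_a * (1.0 - h2) / h2  # gives var_a / (var_a + var_e) = h2
    phenotypes = [a + rng.gauss(0.0, math.sqrt(var_e)) for a in agv]
    return agv, phenotypes


rng = random.Random(2012)
n_animals, n_qtl, h2 = 2000, 100, 0.25
# Hypothetical stand-in for steps 1-2: independent loci, allele frequency 0.5
genotypes = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_qtl)]
             for _ in range(n_animals)]
qtl_effects = [rng.gauss(0.0, 1.0) for _ in range(n_qtl)]
agv, phenotypes = simulate_phenotypes(genotypes, qtl_effects, h2, rng)
accuracy = correlation(phenotypes, agv)  # step 6 with the phenotype as predictor
print(round(accuracy, 2), "vs sqrt(h2) =", math.sqrt(h2))
```

In the full scheme, step 5 replaces the phenotype with an estimate from a statistical model that uses the genotypic data.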
1. Simulate individuals’ genomes?
• A hard task!
• What is „going on under the hood“?
– „Mutation“ → SNP, MS (microsatellites), …
– Insertions
– Deletions
– Duplications → copy number variation
– Translocations
– Methylation → epigenetics
– …
– Linkage + Recombination
– Segregation
– …
[Figure: structural variation; whole-genome sequence haplotypes (human trios), Chr1]
A compromise
• DNA is a double helix (linear structure)
→ simulate two „strings“ (haplotypes)
• Linkage + Recombination
→ simulate „chunkular strings“ (structured haplotypes
manifesting linkage disequilibrium, LD); there are
many programs to do this (e.g., ms, MaCS, Freegene, …)
→ sample the chunk and the position within the chunk where
crossing-over occurs (flip haplotypes between gametes)
… B B A B A A …
… A B A A B A …
… B B A A B A …
… A B A B A A …
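Sampling where crossing-over occurs and flipping between the two parental haplotypes can be sketched as follows. The sketch assumes Haldane's model: crossovers form a Poisson process with on average one crossover per Morgan and no interference. It also assumes marker positions in cM, sorted from the start of the chromosome. It illustrates the idea and is not AlphaDrop's implementation. `make_gamete` and the marker map are hypothetical. The two haplotypes are the ones drawn above.

```python
import random


def make_gamete(hap1, hap2, positions_cm, rng):
    """Return one gamete from a parent's two haplotypes (hypothetical helper).

    Crossover positions follow a Poisson process with rate 1 per Morgan
    (Haldane's no-interference model); at each crossover the gamete flips
    to the other parental haplotype."""
    length_cm = positions_cm[-1]
    crossovers = []
    x = rng.expovariate(1.0) * 100.0  # distance to the next crossover, in cM
    while x < length_cm:
        crossovers.append(x)
        x += rng.expovariate(1.0) * 100.0
    parental = (hap1, hap2)
    current = rng.randrange(2)  # which haplotype the gamete starts on
    gamete, k = [], 0
    for locus, pos in enumerate(positions_cm):
        while k < len(crossovers) and crossovers[k] <= pos:
            current = 1 - current  # flip haplotypes between gametes
            k += 1
        gamete.append(parental[current][locus])
    return gamete


rng = random.Random(1)
hap1 = list("BBABAA")
hap2 = list("ABAABA")
positions = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]  # hypothetical marker map (cM)
print("".join(make_gamete(hap1, hap2, positions, rng)))
```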
A compromise II
• Segregation
→ base: take gametes at random from a „soup“
→ non-base: take one gamete at random from each parent
… B B A A B A …
… A B A B A A …
… B B B B B B …
… A B A B A A …
… B B B B B B …
… A B A B A A …
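Segregation through a pedigree can be sketched as follows: base animals draw two gametes from a haplotype „soup“, and all other animals receive one gamete from each parent. The sketch makes several assumptions. The pedigree is sorted so that parents come before offspring. Haplotypes are strings. Meiosis is simplified to at most one crossover at a uniformly chosen point. The pedigree is the one from the worked example later in the talk. The soup reuses haplotypes from the diagrams above, and the function names are hypothetical.

```python
import random


def gamete(parent_haplotypes, rng):
    """One gamete with at most one crossover at a uniformly chosen point (hypothetical helper)."""
    first, second = rng.sample(parent_haplotypes, 2)  # random starting haplotype
    cut = rng.randint(0, len(first))  # cut == len(first) means no crossover
    return first[:cut] + second[cut:]


def drop_through_pedigree(pedigree, soup, rng):
    """pedigree: (animal, sire, dam) tuples, parents listed before offspring;
    sire and dam are None for base animals."""
    haplotypes = {}
    for animal, sire, dam in pedigree:
        if sire is None or dam is None:
            # base: take gametes at random from the "soup"
            haplotypes[animal] = (rng.choice(soup), rng.choice(soup))
        else:
            # non-base: one gamete from the sire and one from the dam
            haplotypes[animal] = (gamete(haplotypes[sire], rng),
                                  gamete(haplotypes[dam], rng))
    return haplotypes


pedigree = [(1, None, None), (2, None, None), (3, None, None),
            (4, 2, 1), (5, 2, 1), (6, 2, 3), (7, 2, 3)]
soup = ["BBAABA", "ABABAA", "BBBBBB", "ABAABA"]  # toy base haplotypes
haps = drop_through_pedigree(pedigree, soup, random.Random(42))
for animal, (paternal, maternal) in haps.items():
    print(animal, paternal, maternal)
```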
2. Allocate genes (QTL) and markers
• Sample positions within the genome separately for QTL
and markers
• How many?
… B B A A B A …
… A B A B A A …
… B B B B B B …
… A B A B A A …
… B B B B B B …
… A B A B A A …
… B B A B A A …
… A B A A B A …
…
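Allocating QTL and markers separately can be as simple as sampling two disjoint sets of positions from the segregating sites of the simulated haplotypes. This is a minimal sketch, assuming that a QTL and a SNP must not share a site. How many of each to sample is left to the user. The names and the site positions are hypothetical.

```python
import random


def allocate_qtl_and_snps(segregating_sites, n_qtl, n_snp, rng):
    """Sample QTL and SNP positions without overlap (hypothetical helper)."""
    if n_qtl + n_snp > len(segregating_sites):
        raise ValueError("not enough segregating sites for the requested QTL and SNPs")
    chosen = rng.sample(list(segregating_sites), n_qtl + n_snp)
    return sorted(chosen[:n_qtl]), sorted(chosen[n_qtl:])


rng = random.Random(3)
sites = list(range(0, 1_000_000, 97))  # hypothetical base-pair positions of segregating sites
qtl, snps = allocate_qtl_and_snps(sites, n_qtl=10, n_snp=100, rng=rng)
print(len(qtl), len(snps), qtl[:3])
```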
Ascertainment bias
• SNP chips are designed from a few animals (often not
equally distributed among breeds, countries, …) and
such that markers are polymorphic
• Variation on a SNP chip does not necessarily represent the
variation at QTL closely!!!
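Ascertainment can be mimicked by keeping only loci that are polymorphic in a small discovery panel. The sketch below uses hypothetical names and an assumed minor allele frequency threshold. It takes the 0, 1, 2 genotypes of the seven animals from the worked example that follows. With animal 1 as the only discovery animal, SNP5 is dropped even though it segregates in the whole population.

```python
# Hypothetical helpers: ascertainment of SNPs in a small discovery panel.

def minor_allele_frequency(codes):
    """MAF from genotype codes counting B alleles (0, 1, 2)."""
    p = sum(codes) / (2 * len(codes))
    return min(p, 1.0 - p)


def ascertain(genotypes, discovery_panel, min_maf=0.05):
    """Indices of loci whose MAF in the discovery panel is at least min_maf."""
    n_loci = len(genotypes[0])
    return [j for j in range(n_loci)
            if minor_allele_frequency([genotypes[i][j] for i in discovery_panel]) >= min_maf]


# SNP1..SNP10 genotypes of animals 1..7 (0, 1, 2 coding from the worked example)
genotypes = [
    [2, 1, 1, 1, 0, 1, 1, 0, 1, 2],
    [1, 1, 0, 1, 1, 0, 0, 1, 2, 1],
    [0, 1, 2, 0, 2, 1, 1, 1, 0, 1],
    [2, 1, 1, 0, 0, 1, 1, 1, 2, 1],
    [1, 1, 0, 1, 1, 0, 0, 0, 2, 2],
    [0, 2, 1, 0, 1, 0, 0, 2, 1, 0],
    [1, 1, 1, 1, 2, 0, 0, 2, 1, 1],
]
on_chip = ascertain(genotypes, discovery_panel=[0])  # animal 1 only
print(["SNP%d" % (j + 1) for j in on_chip])
# prints: ['SNP2', 'SNP3', 'SNP4', 'SNP6', 'SNP7', 'SNP9']
```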
For example
B + B B A A + B B A B B +
B + A A B A + A A A A B +
A - B A A A + A A B B A -
B + A A B B - A A A B B +
A - B B A B - A A B A A -
A - A B A B - B B A A B +
B + B B A A + B B A B B +
B + A A A A + A A B B A -
B + A A B A + A A A B B +
A - B A A B - A A A B B +
A - B B A B - A A B A A -
A - B A A A + A A B B A -
A - B B B B - A A B A A +
B + A A B B + A A B B A -
[Figure: pedigree of animals 1 to 7]
For example II
ID DAD MUM SNP1 QTL1 SNP2 SNP3 SNP4 SNP5 QTL2 SNP6 SNP7 SNP8 SNP9 SNP10 QTL3 a y
1 / / B + B B A A + B B A B B + 2,2 172,0
      B + A A B A + A A A A B +
2 / / A - B A A A + A A B B A - -0,9 167,5
      B + A A B B - A A A B B +
3 / / A - B B A B - A A B A A - -3,9 168,6
      A - A B A B - B B A A B +
4 2 1 B + B B A A + B B A B B + 2,1 169,1
      B + A A A A + A A B B A -
5 2 1 B + A A B A + A A A B B + -0,8 170,3
      A - B A A B - A A A B B +
6 2 3 A - B B A B - A A B A A - -2,0 165,9
      A - B A A A + A A B B A -
7 2 3 A - B B A B - A A B A B + -0,9 168,1
      B + A A B B + A A B B A -
ID DAD MUM SNP1 QTL1 SNP2 SNP3 SNP4 SNP5 QTL2 SNP6 SNP7 SNP8 SNP9 SNP10 QTL3 a y
1 / / BB / AB AB AB AA / AB AB AA AB BB / / 172,0
2 / / AB / AB AA AB AB / AA AA AB BB AB / / 167,5
3 / / AA / AB BB AA BB / AB AB AB AA AB / / 168,6
4 2 1 BB / AB AB AA AA / AB AB AB BB AB / / 169,1
5 2 1 AB / AB AA AB AB / AA AA AA BB BB / / 170,3
6 2 3 AA / BB AB AA AB / AA AA BB AB AA / / 165,9
7 2 3 AB / AB AB AB BB / AA AA BB AB AB / / 168,1
ID DAD MUM SNP1 QTL1 SNP2 SNP3 SNP4 SNP5 QTL2 SNP6 SNP7 SNP8 SNP9 SNP10 QTL3 a y
1 / / 2 / 1 1 1 0 / 1 1 0 1 2 / / 172,0
2 / / 1 / 1 0 1 1 / 0 0 1 2 1 / / 167,5
3 / / 0 / 1 2 0 2 / 1 1 1 0 1 / / 168,6
4 2 1 2 / 1 1 0 0 / 1 1 1 2 1 / / 169,1
5 2 1 1 / 1 0 1 1 / 0 0 0 2 2 / / 170,3
6 2 3 0 / 2 1 0 1 / 0 0 2 1 0 / / 165,9
7 2 3 1 / 1 1 1 2 / 0 0 2 1 1 / / 168,1
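The tables above follow from one another by a simple rule: drop the QTL columns from each haplotype row and count B alleles per SNP over the two haplotypes of an animal. The sketch below has hypothetical names. The column layout (SNP1, QTL1, SNP2 to SNP5, QTL2, SNP6 to SNP10, QTL3) is taken from the example table.

```python
# Hypothetical helpers: phased haplotype rows -> 0/1/2 genotype codes.
QTL_COLUMNS = {1, 6, 12}  # positions of QTL1, QTL2 and QTL3 in a haplotype row


def snp_alleles(row):
    """Keep only SNP alleles from a row such as 'B + B B A A + B B A B B +'."""
    return [allele for k, allele in enumerate(row.split()) if k not in QTL_COLUMNS]


def genotype_codes(hap_row1, hap_row2):
    """Count B alleles over both haplotypes: AA -> 0, AB -> 1, BB -> 2."""
    return [(a == "B") + (b == "B")
            for a, b in zip(snp_alleles(hap_row1), snp_alleles(hap_row2))]


# Animal 1 from the example table
codes = genotype_codes("B + B B A A + B B A B B +", "B + A A B A + A A A A B +")
print(codes)  # [2, 1, 1, 1, 0, 1, 1, 0, 1, 2]
```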
For example III
AA → 0, AB → 1, BB → 2
3. Sample QTL effects
• What is the true state of nature? → Nobody knows
• Additive
→ simple, i.e., linear regression on 0, 1, 2
• Dominance
→ interaction between alleles at one locus
• Epistasis
→ interaction between alleles at different loci
• Imprinting (parent-of-origin effect)
→ interaction between parental origin and allele
• … ???
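The first three types of gene action listed above can be written as small functions of the 0, 1, 2 genotype code. This sketch uses a common textbook parametrisation: genotypic values −a, d and +a for AA, AB and BB, and an additive-by-additive product for epistasis. Imprinting is shown as a heterozygote value whose sign depends on the parent of origin. These are illustrative choices, not the models used in the talk, and all names are hypothetical.

```python
# Hypothetical helpers: single-locus and two-locus genetic values.

def additive_dominance(x, a, d=0.0):
    """Genotypic value of one locus with x B alleles: AA -> -a, AB -> d, BB -> +a.
    With d = 0 this is purely additive, i.e. linear in x."""
    return a * (x - 1) + (d if x == 1 else 0.0)


def additive_by_additive(x1, x2, aa):
    """Epistasis: interaction between the genotype codes of two loci."""
    return aa * (x1 - 1) * (x2 - 1)


def imprinting(paternal_allele, maternal_allele, i):
    """Parent-of-origin effect: +i if the B allele came from the sire,
    -i if it came from the dam, 0 for homozygotes."""
    if paternal_allele == maternal_allele:
        return 0.0
    return i if paternal_allele == "B" else -i


print([additive_dominance(x, a=1.0, d=0.5) for x in (0, 1, 2)])  # [-1.0, 0.5, 1.0]
```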
QTL effect size – popular choices
• Gaussian distribution
→ most complex traits support the infinitesimal model,
with very many loci having very small effects
• Gamma distribution
→ can accommodate a few genes with very large effects
(major genes), for example DGAT for fat percentage in milk
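The two popular choices can be compared by sampling effects and checking how much of the total squared effect the largest QTL carry. The sketch assumes standard normal effects for the Gaussian case. For the gamma case it assumes a gamma distribution with shape 0.4 and a random sign; the shape value is our assumption, chosen to give a heavy tail. The sum of squared effects is only a rough proxy for genetic variance because it ignores allele frequencies. Function names are hypothetical.

```python
import random


def sample_qtl_effects(n_qtl, distribution, rng, gamma_shape=0.4, gamma_scale=1.0):
    """Sample QTL effects from a Gaussian or from a gamma with random sign (hypothetical helper)."""
    if distribution == "gaussian":
        return [rng.gauss(0.0, 1.0) for _ in range(n_qtl)]
    if distribution == "gamma":
        return [rng.choice((-1.0, 1.0)) * rng.gammavariate(gamma_shape, gamma_scale)
                for _ in range(n_qtl)]
    raise ValueError("unknown distribution: " + distribution)


def share_of_top(effects, k):
    """Share of the sum of squared effects carried by the k largest effects."""
    squares = sorted((e * e for e in effects), reverse=True)
    return sum(squares[:k]) / sum(squares)


rng = random.Random(7)
shares = {dist: share_of_top(sample_qtl_effects(1000, dist, rng), k=10)
          for dist in ("gaussian", "gamma")}
print({dist: round(s, 2) for dist, s in shares.items()})
```

With the gamma distribution the top 10 of 1000 QTL typically carry a much larger share than with the Gaussian, which is the "few major genes" pattern.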
AlphaDrop in a nutshell
• Simple program
• Coalescent simulation of haplotypes with MaCS (Chen et al., 2009)
– mutation, recombination
• Dropping haplotypes through the pedigree
– recombination, segregation, selection
• Result
– SNP, haplotype, and sequence data (efficient internal compression: 01010110 → long integer)
– QTL positions and effects, and genetic values
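The compression idea, storing a run of 0/1 alleles such as 01010110 as a single integer, can be illustrated with bit shifts. This sketches the general technique in Python, whose integers have no fixed width; a program using 64-bit integers would split long haplotypes into 64-locus words. It is not AlphaDrop's code, and the function names are hypothetical. XOR plus a bit count then gives the number of loci at which two packed haplotypes differ.

```python
# Hypothetical helpers: bit-packing of 0/1 haplotypes.

def pack(alleles):
    """Pack 0/1 alleles into one integer, first allele in the most significant bit."""
    value = 0
    for allele in alleles:
        value = (value << 1) | allele
    return value


def unpack(value, n_loci):
    """Recover the n_loci alleles from a packed integer."""
    return [(value >> (n_loci - 1 - k)) & 1 for k in range(n_loci)]


def n_differences(packed1, packed2):
    """Number of loci at which two packed haplotypes of equal length differ."""
    return bin(packed1 ^ packed2).count("1")


haplotype = [0, 1, 0, 1, 0, 1, 1, 0]
packed = pack(haplotype)
print(packed)             # 86
print(unpack(packed, 8))  # [0, 1, 0, 1, 0, 1, 1, 0]
print(n_differences(packed, pack([0, 1, 1, 1, 0, 1, 0, 0])))  # 2
```

The explicit `n_loci` in `unpack` is needed because leading 0 alleles do not show up in the integer value.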
EXEMPLAR APPLICATION
Methods - Simulation
1. Coalescent simulation of haplotypes (4000 haplotypes, 30 chromosomes)
– mutation, recombination, drop in Ne
2. Dropping haplotypes through the pedigree (animal breeding scenario)
– recombination, segregation
– 50 sires × 10 dams × 2 progeny → 1000 animals / generation
– 10 generations
– QTL effects from Gaussian or gamma distribution
– phenotypes from Gaussian distribution with heritability h²
– 50K SNP genotypes
• AlphaDrop software (Hickey & Gorjanc, 2012)
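The mating design (50 sires × 10 dams × 2 progeny → 1000 animals per generation) can be turned into a pedigree as below. The sketch rests on assumptions the slide does not state. Sires are chosen at random from the previous generation, with no selection. Each dam has one male and one female offspring. The 10 generations follow an unrelated base generation 0. `make_pedigree` is a hypothetical name.

```python
import random


def make_pedigree(n_generations, n_sires=50, dams_per_sire=10, rng=None):
    """Return (animal, sire, dam, generation) tuples; base animals have unknown parents.

    Hypothetical helper. Each dam gets two progeny, one male and one female, so
    every generation has n_sires * dams_per_sire males and as many females."""
    rng = rng or random.Random()
    n_dams = n_sires * dams_per_sire
    pedigree, males, females = [], [], []
    next_id = 1
    for _ in range(n_dams):  # base generation 0: n_dams males and n_dams females
        for sex_list in (males, females):
            sex_list.append(next_id)
            pedigree.append((next_id, None, None, 0))
            next_id += 1
    for generation in range(1, n_generations + 1):
        sires = rng.sample(males, n_sires)  # random sires, no selection
        dams = rng.sample(females, n_dams)  # all females, in random order
        males, females = [], []
        for d, dam in enumerate(dams):
            sire = sires[d // dams_per_sire]  # each sire mated to dams_per_sire dams
            for sex_list in (males, females):  # two progeny per dam
                sex_list.append(next_id)
                pedigree.append((next_id, sire, dam, generation))
                next_id += 1
    return pedigree


pedigree = make_pedigree(10, rng=random.Random(2012))
print(len(pedigree), sum(1 for row in pedigree if row[3] == 10))  # 11000 1000
```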
Methods – Simulated data
[Figure: generations 1 to 10 with pedigree, phenotype and genotype data, split into calibration and validation generations]
Methods – Statistical analysis
• Statistical analysis of co-variation between phenotypic and genotypic data with a linear mixed model
$y \sim N(Xb + Za, I\sigma^2_e)$
• … accounting for relationships between individuals (G)
$a \sim N(0, G\sigma^2_a)$
• G based on:
– pedigree,
– SNP or haplotype data („proxies“), or
– QTL data → „genetic variation that matters“
$h^2 = \sigma^2_a / (\sigma^2_e + \sigma^2_a)$
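One common way to build a relationship matrix G from SNP data is the construction described by VanRaden (2008): G = ZZ' / (2 Σ p_j (1 − p_j)), where Z holds the 0, 1, 2 genotype codes centred by twice the allele frequency p_j. The talk does not say which SNP-based construction was used, so this is an illustration only. It is applied to the seven animals of the worked example earlier in the talk, and the function name is hypothetical.

```python
# Hypothetical helper: genomic relationship matrix from 0/1/2 codes.

def genomic_relationships(codes):
    """G = Z Z' / (2 * sum_j p_j (1 - p_j)), where Z = M - 2p and M holds 0/1/2 codes."""
    n, m = len(codes), len(codes[0])
    p = [sum(row[j] for row in codes) / (2 * n) for j in range(m)]  # B allele frequencies
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in codes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(zi[k] * zj[k] for k in range(m)) / scale for zj in z] for zi in z]


# 0, 1, 2 codes of SNP1..SNP10 for animals 1..7 from the worked example
codes = [
    [2, 1, 1, 1, 0, 1, 1, 0, 1, 2],
    [1, 1, 0, 1, 1, 0, 0, 1, 2, 1],
    [0, 1, 2, 0, 2, 1, 1, 1, 0, 1],
    [2, 1, 1, 0, 0, 1, 1, 1, 2, 1],
    [1, 1, 0, 1, 1, 0, 0, 0, 2, 2],
    [0, 2, 1, 0, 1, 0, 0, 2, 1, 0],
    [1, 1, 1, 1, 2, 0, 0, 2, 1, 1],
]
G = genomic_relationships(codes)
for row in G:
    print(" ".join(f"{g:6.2f}" for g in row))
```

Because Z is centred per SNP, the entries of G sum to zero. With pedigree, haplotype or QTL data, a different matrix would take the place of this G.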
GWAS vs. relationship modelling
• GWAS
• Relationships → use the same underlying information
(phenotype and genotype data) to infer the sum of all
GWAS estimates
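A GWAS estimates effects one SNP at a time. A minimal sketch of that step is a least-squares regression of the phenotype on each SNP's 0, 1, 2 code. It ignores fixed effects and relationships, which a real GWAS would include. It uses the seven animals and phenotypes y of the worked example purely as toy data, and the function names are hypothetical.

```python
# Hypothetical helpers: single-SNP regression for every SNP.

def snp_effect(x, y):
    """Least-squares slope of y on the genotype code x; None for a monomorphic SNP."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return None
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def gwas(codes, y):
    """One regression per SNP (columns of codes)."""
    return [snp_effect([row[j] for row in codes], y) for j in range(len(codes[0]))]


codes = [
    [2, 1, 1, 1, 0, 1, 1, 0, 1, 2],
    [1, 1, 0, 1, 1, 0, 0, 1, 2, 1],
    [0, 1, 2, 0, 2, 1, 1, 1, 0, 1],
    [2, 1, 1, 0, 0, 1, 1, 1, 2, 1],
    [1, 1, 0, 1, 1, 0, 0, 0, 2, 2],
    [0, 2, 1, 0, 1, 0, 0, 2, 1, 0],
    [1, 1, 1, 1, 2, 0, 0, 2, 1, 1],
]
y = [172.0, 167.5, 168.6, 169.1, 170.3, 165.9, 168.1]
effects = gwas(codes, y)
print(round(effects[0], 2))  # 1.65 (SNP1)
```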
Haplotype similarity
• Long haplotypes → „explosion“ in #haplotypes
• But parts of haplotypes are similar
→ effective number of haplotypes is smaller
• Similarities (several variations
tested)
k-1 k k+1 k+2 k+3 k+4 k+5 k+6
… C C T A G A …
… G G A T C T …
… C T C A G A …
… G A G T C T …
… C T C A T A …
… G A G T A T …
              Haplotype 1  Haplotype 2  Haplotype 3
Haplotype 1   6/6          4/6          3/6
Haplotype 2                6/6          5/6
Haplotype 3                             6/6
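The similarities in this table count the positions at which two haplotypes carry the same base. The sketch below implements only that measure (the talk tested several). It uses the upper strands of the three haplotypes shown above, and `n_identical` is a hypothetical name.

```python
# Hypothetical helper: count identical positions between two haplotypes.

def n_identical(h1, h2):
    """Number of positions with identical alleles in two equally long haplotypes."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return sum(a == b for a, b in zip(h1, h2))


haplotypes = ["CCTAGA", "CTCAGA", "CTCATA"]
for i, hi in enumerate(haplotypes):
    row = [f"{n_identical(hi, hj)}/{len(hi)}" for hj in haplotypes[i:]]
    print(f"Haplotype {i + 1}:", " ".join(row))
# Haplotype 1: 6/6 4/6 3/6
# Haplotype 2: 6/6 5/6
# Haplotype 3: 6/6
```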
Results – Gaussian QTL
[Figure panels: QTL, Pedigree, SNP V, SNP Y, Haplotypes – no similarity, Haplotypes – similarity 1, Haplotypes – similarity 2]
Results – Gamma QTL
[Figure panels: QTL, Pedigree, SNP V, SNP Y, Haplotypes – no similarity, Haplotypes – similarity 1, Haplotypes – similarity 2]
Conclusions
• Genome-wide information increases accuracy in comparison to classic methods using pedigrees and phenotypes only
• Long haplotypes → large #haplotypes
– low accuracies
– similarities help
– no advantage over SNP data (perhaps due to large #haplotypes)
• Accuracies drop in later generations (not so much with Gamma QTL data)
→ cannot predict distant relatives or unrelated individuals accurately!!!
• Even with QTL data accuracies are not perfect!!!