Statistical Bioinformatics
• QTL mapping
• Analysis of DNA sequence alignments
• Postgenomic data integration
• Systems biology
Mixed models for QTL by environment analysis
Mixed models represent correlations over sites and model differences in environmental variance, which allows tests for QTL by environment interactions
E.g. marker Rub2a1 on LG3 shows a consistent effect on raspberry total anthocyanins over 7 environments
[Figure: total anthocyanin effect (axis from -0.8 to 0.4) by Rub2a1 genotype (ac, ad, bc, bd), with average s.e.d., for the environments P2007Mean, PT2007Mean, F2006Mean, F2007Mean, Antho_poly_08, Antho_PT_08 and Antho_M24_08]
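One common way to write such a model is sketched below. This is a generic multi-environment formulation under stated assumptions, not necessarily the model fitted for the raspberry data: the symbols, the unstructured covariance matrix and the single-marker parameterisation are all illustrative choices.

```latex
% y_{ij}: trait value of offspring i in environment j (j = 1, ..., 7)
% k(i):   marker genotype class of offspring i, one of {ac, ad, bc, bd}
\begin{align*}
  y_{ij} &= \mu_j + \alpha_{k(i)} + \beta_{k(i)j} + e_{ij}, \\
  (e_{i1}, \ldots, e_{i7})^{\top} &\sim N(\mathbf{0}, \Sigma),
\end{align*}
% \mu_j:        mean of environment j
% \alpha_k:     main effect of genotype class k (the QTL effect)
% \beta_{kj}:   QTL by environment interaction
% \Sigma:       7 x 7 covariance matrix; its diagonal holds the
%               environment-specific variances and its off-diagonal
%               elements the correlations over sites
% Test for QTL by environment interaction:
%   H_0: \beta_{kj} = 0 for all k and j.
```

Under this sketch, a marker with a consistent effect over environments is one for which the main effects are clearly non-zero while the interaction terms are not needed.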
eQTL analysis using pairs of barley DHs on a two-colour microarray
[Figure: marker map of chromosome 2H with the axis "Rust ratio" (0 to 100); series labelled U2453, U7845 and U6615]
A distant pair design gives more informative pairs than a random design (horizontal line)
Significant (p < 0.001) QTLs were detected for 9557 out of 15208 genes
The most significant QTL for rust resistance mapped to 2H; 23 genes with highly correlated expression also mapped to the same region
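The idea of a distant pair design can be illustrated with a small sketch. It assumes that a pair of DH lines on one two-colour array is informative for a marker only when the two lines carry different parental alleles there, and it uses a simple greedy pairing on Hamming distance. The function name `greedy_distant_pairs` and the toy genotypes are hypothetical, and the greedy rule is an illustration rather than the design algorithm used in the study.

```python
import numpy as np


def greedy_distant_pairs(genotypes):
    """Pair lines greedily so that paired lines differ at many markers.

    genotypes: (n_lines, n_markers) array of 0/1 parental allele codes.
    Returns a list of (i, j, n_differing_markers) tuples.
    """
    g = np.asarray(genotypes)
    n = g.shape[0]
    # Hamming distance (number of differing markers) for every pair of lines.
    dist = (g[:, None, :] != g[None, :, :]).sum(axis=2)
    unused = set(range(n))
    pairs = []
    while len(unused) >= 2:
        # Pick the most distant pair among the unpaired lines;
        # ties are broken in favour of the smaller line indices.
        i, j = max(
            ((a, b) for a in unused for b in unused if a < b),
            key=lambda ab: (dist[ab], -ab[0], -ab[1]),
        )
        pairs.append((i, j, int(dist[i, j])))
        unused -= {i, j}
    return pairs


# Toy data: 4 lines scored at 4 markers (made-up values).
toy_genotypes = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]
pairs = greedy_distant_pairs(toy_genotypes)
print(pairs)
```

A random design would pair lines without looking at `dist`, so on average each pair would differ at fewer markers than the pairs chosen here.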
Taking QTL analysis further
• Analysis of more complex populations – moving from a single biparental cross through multiple related crosses to general association mapping populations.
• Analysis of high-dimensional phenotypic trait data (expression data, metabolomic data etc.), including network-based approaches
• QTL analysis of processes (raspberry ripening, water use, the process of biofuel production?)
• Linkage analysis: review the statistical methods, especially clustering, behind some marker technologies. Analyses of blackcurrant (454 sequencing) and sugarcane (DArT) show that more information can be obtained by working directly on the continuous underlying data (intensities).
Molecular Sequence Analysis
• Intragenic recombination detection - method: various methods developed at BioSS (DSS, PDM, HMM)
• TOPALi - software: user-friendly access to statistical phylogenetic methods
• Molecular sequence alignment - analysis automation: phylogenetic tree/model selection
• Positive (diversifying) selection - methods applied: use of state-of-the-art methodology for detection of functionally significant amino acid sites in proteins
• Comparative genomics analysis - growth area: phylogenetic tree estimation using many loci
• Population genetic structure analysis - growth area
• Optimal use of Next Generation Sequence data - development
Example: Human nutrigenomics study
10 volunteers observed over 10 time points
Various body fluids (blood, urine, saliva) collected
Samples analyzed by various ‘omics’ techniques
Can we learn the signalling pathway from data?
[Figure, from Sachs et al., Science 2005: signalling pathway showing the cell membrane, receptor molecules and phosphorylated proteins; interactions in the pathway are marked as activation or inhibition]
Circadian rhythms in Arabidopsis thaliana
Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group)
[Figure panels: T28 and T20]
Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5 and PRR3
Two gene expression time series measured with Affymetrix arrays under constant light conditions at 13 time points: 0h, 2h, …, 24h, 26h
Plants entrained to different light:dark cycles: 10h:10h (T20) and 14h:14h (T28)
Circadian genes in Arabidopsis thaliana, network learned from two time series over 13 time points
[Figure: learned network over the nodes CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5 and PRR3, with a legend for “false positives” and “false negatives”]
Overview of the plant clock model
[Figure, from Locke et al., Mol. Syst. Biol. 2006: clock model with the components LHY/CCA1, TOC1, X, Y (GI) and PRR9/PRR7, labelled Morning and Evening]
Sensitivity = TP/[TP+FN] = 62%
Specificity = TN/[TN+FP] = 81%
[Figure annotations: “Correct sign: Yes”, shown four times]
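The sensitivity and specificity formulas quoted above can be applied to a learned network by comparing its edges with those of a reference network. The sketch below is a minimal illustration: the function name `edge_metrics`, the node labels and the edge lists are hypothetical, and it assumes that edges are directed and that self-loops are not counted, which may differ from the convention used for the figures reported on the slide.

```python
def edge_metrics(true_edges, predicted_edges, nodes):
    """Return (sensitivity, specificity) of a predicted directed network."""
    # Every ordered pair of distinct nodes is a candidate edge
    # (self-loops excluded: an assumption of this sketch).
    candidates = {(a, b) for a in nodes for b in nodes if a != b}
    true_set = set(true_edges) & candidates
    pred_set = set(predicted_edges) & candidates

    tp = len(true_set & pred_set)          # true edges that were recovered
    fn = len(true_set - pred_set)          # true edges that were missed
    fp = len(pred_set - true_set)          # predicted edges that are not true
    tn = len(candidates) - tp - fn - fp    # absent edges correctly left out

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Toy example with made-up nodes and edges.
nodes = ["A", "B", "C"]
true_edges = [("A", "B"), ("B", "C")]
predicted_edges = [("A", "B"), ("C", "A")]
sens, spec = edge_metrics(true_edges, predicted_edges, nodes)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In the toy example there are 6 candidate edges, of which 1 is a true positive, 1 a false negative, 1 a false positive and 3 are true negatives.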
Future work
• Integration of mechanistic and machine learning models
• Latent variable models for post-translational modifications
• Network inferences from eQTL type data
• Allowing for heterogeneity and non-stationarity
Can we learn the protein signalling pathway from protein concentrations?
Raf pathway
Flow cytometry data from 100 cells
Sachs et al., Science 2005