Date post: | 01-Jul-2015 |
Category: | Science |
Upload: | ctitusbrown |
Exploring Marek’s Disease Resistance with RNAseq
C. Titus Brown, Michigan State University
Genetic resistance to Marek’s Disease
• MHC (B) locus has a major influence on MD resistance
• Several haplotypes of the B locus have been found to correlate with resistance
  – B21: most resistant
  – B19: susceptible
• Lines 6 and 7 (ADOL*) are B2 homozygous, but line 6 is resistant and line 7 is susceptible to MD
• Relatively few non-MHC genes have been identified
*Avian Disease and Oncology Laboratory, East Lansing
Research Goal
• Identify non-MHC genes influencing MD resistance from a genome-wide gene and isoform expression analysis based on RNA-Seq data
• Generate hypotheses for studying the mechanism controlling MD resistance
Collaboration with Hans Cheng (ADOL) and Jerry Dodgson (MSU)
Dr. Likit Preeyanon
Research Plan
• Samples: Lines 6 and 7, control and infected (4 dpi)
• Illumina sequencing: single-end and paired-end
• Analyses: differential gene expression, pathway analysis, differential exon usage
[Figure: short-read sequences and an exon diagram (A B C D) illustrating the workflow]
RNA-Seq Method
[Figure: polyadenylated transcripts are fragmented and sequenced into short reads (<200 bp)]
Adapted from Shirley et al., Nat Methods 2009
Gene models and isoforms are woefully incomplete, e.g. ENSEMBL is missing many exon-exon junctions.
De novo reconstruction
Ab initio reconstruction
GIMME: Software for Merging Gene Models
Assembly-based
Local Assembly
GIMME
Reference-guided
Merged Models
In-house software
Merged Gene Models
Global Assembly
Local Assembly
Reference-guided
Merged (consensus) Model
Newly predicted isoform
Merged models connect fragmented gene models & provide new isoforms
Merged models can glue fragmented gene models and include unannotated isoforms.
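To make the "gluing" idea concrete, here is a minimal sketch that takes exon intervals from several gene-model sets and unions any that overlap. This is not GIMME's algorithm, which the slides do not describe; the function name `merge_exons` and all coordinates are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive coordinates on one strand


def merge_exons(model_sets: List[List[Exon]]) -> List[Exon]:
    """Union overlapping exon intervals pooled from several gene-model sets."""
    exons = sorted(e for models in model_sets for e in models)
    merged: List[Exon] = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical exon coordinates from three model sources.
global_assembly = [(100, 200), (300, 400)]
local_assembly = [(150, 250), (500, 600)]
reference_guided = [(380, 450)]

consensus = merge_exons([global_assembly, local_assembly, reference_guided])
print(consensus)
```

A real merger would also have to track splice junctions and strand, so that distinct isoforms are preserved rather than flattened into one interval set.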
[Figure: Gene A and Gene B shown in the reference-guided and merged model tracks]
IDH3A Gene – now with both UTRs!
[Figure tracks: Merged, RefSeq, ENSEMBL; UTRs labeled]
IDH3A: different models, different predicted expression…
SE: single-end, PE: paired-end
Not significant
Significant
Differentially Expressed Genes from Different Gene Model Sets… Differ.
DE genes by EBSeq, FDR < 0.05
Ref-guided
In addition, many of the differentially expressed genes are not annotated in KEGG
Ref-guided
GOseq FDR < 0.05
Chicken + Human KEGG Pathway
40 pathways
Must merge in human KEGG annotations
Enriched KEGG Pathways by GOseq
GOseq FDR < 0.05
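For readers unfamiliar with pathway enrichment, the sketch below computes a plain hypergeometric over-representation p-value. It is a simplification: GOseq additionally corrects for transcript-length bias, which this example ignores. The function name and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, pathway_genes: int,
                           de_genes: int, de_in_pathway: int) -> float:
    """P(X >= de_in_pathway) when de_genes are drawn without replacement
    from total_genes, of which pathway_genes belong to the pathway."""
    upper = min(pathway_genes, de_genes)
    numerator = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, de_genes - k)
        for k in range(de_in_pathway, upper + 1)
    )
    return numerator / comb(total_genes, de_genes)


# Hypothetical toy numbers: 20 genes, 5 in the pathway, 4 DE genes, 3 of them in the pathway.
p = hypergeom_enrichment_p(total_genes=20, pathway_genes=5, de_genes=4, de_in_pathway=3)
print(f"enrichment p-value: {p:.4f}")
```

A gene with no pathway annotation can never contribute to `de_in_pathway`, which is why missing KEGG annotations directly limit what an enrichment test can detect.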
Biological Processes (BP) categories involved in Adaptive Immune Responses are Enriched in Line 7 (susceptible)
GO ID   | Description                            | Adjusted p-value
0009615 | Response to virus                      | 0.00023
0050670 | Regulation of lymphocyte proliferation | 0.00048
0002252 | Immune effector process                | 0.00068
0051249 | Regulation of lymphocyte activation    | 0.0027
0042129 | Regulation of T cell proliferation     | 0.0032
0002250 | Adaptive immune response               | 0.0106
At an early stage of infection, elicitation of the adaptive immune responses appears to be delayed in line 6.
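The slides do not say how the adjusted p-values for the GO categories were computed. As a hedged illustration, here is a sketch of the Benjamini-Hochberg adjustment, a common choice for FDR control; whether it was the method used here is an assumption. The raw p-values below are made up and are not the data behind the GO table.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four GO categories.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```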
Isoform Expression Estimation
Sample A: Gene Expression = 400x; isoform shares 20% and 80%
Sample B: Gene Expression = 405x; isoform shares 2% and 98%
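The arithmetic behind this slide can be sketched as follows, assuming the 400x total with a 20%/80% split belongs to Sample A and the 405x total with a 2%/98% split belongs to Sample B (the pairing is inferred from the slide layout). The helper name is hypothetical.

```python
from typing import List


def isoform_expression(gene_expression: float, fractions: List[float]) -> List[float]:
    """Split a gene-level expression value into per-isoform values."""
    return [gene_expression * f for f in fractions]


sample_a = isoform_expression(400, [0.20, 0.80])  # assumed Sample A
sample_b = isoform_expression(405, [0.02, 0.98])  # assumed Sample B

gene_level_ratio = 405 / 400
minor_isoform_ratio = sample_a[0] / sample_b[0]

print(f"gene-level ratio (B/A): {gene_level_ratio:.4f}")
print(f"minor isoform ratio (A/B): {minor_isoform_ratio:.2f}")
```

At the gene level the two samples differ by just over 1%, while the minor isoform differs roughly tenfold (about 80x versus 8.1x), so a gene-level test would miss the change entirely.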
How to Estimate Isoform Expression
Spliced reads
Differential Exon Usage of ITGB2 Gene from MISO
Spliced reads
Percent Spliced In (Ψ)
Read coverage
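As a rough illustration of Percent Spliced In, Ψ can be approximated as the fraction of informative spliced reads that support exon inclusion. This is a naive point estimate; MISO itself fits a probabilistic model rather than taking a simple ratio, and this sketch also ignores length normalization. The function name and read counts are hypothetical.

```python
from typing import Optional


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Naive Ψ estimate: share of junction reads supporting exon inclusion."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No informative reads: Ψ is undefined rather than zero.
        return None
    return inclusion_reads / total


# Hypothetical junction read counts for one exon in one sample.
psi = percent_spliced_in(inclusion_reads=30, exclusion_reads=10)
print(psi)
```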
Genes with predicted differential splicing can be categorized into four groups
Cutoff = 0.2
Panels (Ψ axis from 0 to 1): 6 Ctrl, 6 Inf, 7 Ctrl, 7 Inf
Group I: 11 genes
Group II: 19 genes
Group III: 20 genes
Group IV: 1 gene
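The slide does not state what the 0.2 cutoff applies to or how the four groups are defined. One plausible reading, used here purely as an assumption, is a minimum absolute difference in Ψ between two samples. The sketch below flags sample pairs whose Ψ differs by more than that cutoff; the Ψ values are invented.

```python
from itertools import combinations
from typing import Dict, List, Tuple

CUTOFF = 0.2  # taken from the slide; its exact meaning is assumed


def changed_pairs(psi_by_sample: Dict[str, float],
                  cutoff: float = CUTOFF) -> List[Tuple[str, str]]:
    """Return sample pairs whose Ψ values differ by more than the cutoff."""
    return [
        (a, b)
        for a, b in combinations(psi_by_sample, 2)
        if abs(psi_by_sample[a] - psi_by_sample[b]) > cutoff
    ]


# Hypothetical Ψ values for one gene across the four sample groups.
psi_values = {"6 Ctrl": 0.90, "6 Inf": 0.85, "7 Ctrl": 0.90, "7 Inf": 0.40}
flagged = changed_pairs(psi_values)
print(flagged)
```

In this made-up case every flagged pair involves the infected line 7 sample, the kind of pattern a grouping scheme could use to separate line-specific from infection-specific splicing changes.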
The main point
• We are completely at the mercy of annotations to interpret our large-scale data.
• Need more experimental information!
• But also, better methods => better signal
Concluding thoughts (I)
• Computational analysis of high-throughput sequencing data can help refine hypotheses, but cannot conclusively resolve mechanism.
• Don’t knock “refining hypotheses”, though! Complex biological phenomena like disease are refractory to simplifying assumptions.
Concluding thoughts (II)
• Much of the -omic data being gathered by all of you has utility far beyond your specific research question.
• This is particularly true in “semi-model” organisms where annotations are generally poor and not species-specific, and where there may be significant intra-species variation.
• How can we better share this data, to make faster and better progress?
Where should we spend our -omics money?
• Improving genomes is still expensive and requires significant technical expertise.
• mRNAseq is inexpensive, broadly useful and wonderful for building better gene models.
• Proteomics and metabolomics?
• Better tools, annotation, and data sharing and exploration portals are critically important to the future of (agricultural) genomics.
Thanks!