2014 wcgalp

Exploring Marek’s Disease Resistance with RNAseq

C. Titus Brown, Michigan State University

Genetic resistance to Marek’s Disease

• The MHC (B) locus has a major influence on MD resistance

• Several haplotypes of the B locus have been found to correlate with resistance:
  – B21: most resistant
  – B19: susceptible

• Lines 6 and 7 (ADOL*) are both B2 homozygous, but line 6 is resistant and line 7 is susceptible to MD

• Relatively few non-MHC genes have been identified

*Avian Disease and Oncology Laboratory, East Lansing

Research Goal

• Identify non-MHC genes influencing MD resistance from a genome-wide gene and isoform expression analysis based on RNA-Seq data

• Generate hypotheses for studying the mechanism controlling MD resistance

Collaboration with Hans Cheng (ADOL) and Jerry Dodgson (MSU)

Dr. Likit Preeyanon

Research Plan

• Lines 6 and 7, control and infected (4 dpi)

• Illumina sequencing: single-end and paired-end

• Differential gene expression and pathway analysis

• Differential exon usage


RNA-Seq Method

[Figure: mRNA is fragmented and sequenced, producing short reads (<200 bp). Adapted from Shirley et al., Nat Methods 2009.]
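The fragment-and-sequence step, and the difference between the single-end and paired-end libraries used here, can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions (uniform random fragment positions, fixed fragment and read lengths, error-free reads, a made-up transcript); it is not a model of Illumina chemistry.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_fragments(transcript: str, n: int, frag_len: int, rng: random.Random) -> list[str]:
    """Draw n fixed-length fragments from uniformly random start positions."""
    max_start = len(transcript) - frag_len
    starts = (rng.randint(0, max_start) for _ in range(n))
    return [transcript[s:s + frag_len] for s in starts]


def single_end_read(fragment: str, read_len: int) -> str:
    """A single-end read covers only the first read_len bases of a fragment."""
    return fragment[:read_len]


def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Paired-end reads cover both ends; mate 2 is read from the opposite strand."""
    return fragment[:read_len], reverse_complement(fragment[-read_len:])


if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up transcript with a poly(A) tail, for illustration only.
    transcript = "ATGGCGTTACCGGATCCGTTAGCAGGCTTAACCGGTA" * 5 + "A" * 20
    for fragment in sample_fragments(transcript, n=3, frag_len=60, rng=rng):
        print("SE read: ", single_end_read(fragment, 25))
        print("PE reads:", paired_end_reads(fragment, 25))
```

A single-end read sees one end of each fragment; a paired-end library adds the other end, which generally gives more information for placing a read on the correct isoform.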

Gene models and isoforms are woefully incomplete: e.g., ENSEMBL is missing many exon-exon junctions.

• De novo reconstruction

• Ab initio reconstruction
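The claim that annotations miss many exon-exon junctions can be checked by comparing junctions observed in spliced reads against the junctions implied by annotated transcripts. This sketch assumes junctions have already been extracted from alignments as (chromosome, intron start, intron end) tuples, that exons use 1-based inclusive coordinates, and that a minimum read support of 2 is a reasonable filter; all coordinates and counts are hypothetical.

```python
from collections import Counter

Junction = tuple[str, int, int]  # (chromosome, intron start, intron end), 1-based inclusive


def annotated_junctions(transcripts: dict[str, tuple[str, list[tuple[int, int]]]]) -> set[Junction]:
    """Collect the introns implied by consecutive exons of each annotated transcript."""
    junctions: set[Junction] = set()
    for chrom, exons in transcripts.values():
        exons = sorted(exons)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
            junctions.add((chrom, prev_end + 1, next_start - 1))
    return junctions


def novel_junctions(observed: Counter, annotated: set[Junction], min_reads: int = 2) -> dict[Junction, int]:
    """Return observed junctions with at least min_reads support that the annotation lacks."""
    return {j: n for j, n in observed.items() if n >= min_reads and j not in annotated}


if __name__ == "__main__":
    # Hypothetical annotation: one transcript with three exons.
    annotation = {"tx1": ("chr1", [(100, 200), (300, 400), (500, 600)])}
    # Hypothetical junction read counts from spliced alignments.
    observed = Counter({
        ("chr1", 201, 299): 10,  # annotated
        ("chr1", 201, 499): 5,   # skips the middle exon: not annotated
        ("chr1", 401, 499): 1,   # annotated
        ("chr2", 50, 80): 1,     # unannotated, but below min_reads
    })
    print(novel_junctions(observed, annotated_junctions(annotation)))
```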

GIMME: Software for Merging Gene Models

[Figure: assembly-based (including local assembly) and reference-guided gene models are combined by GIMME, in-house software, into merged models.]
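One way to picture combining models from several sources is a splice graph: exons from every input model become nodes, and each pair of consecutive exons in any input model becomes an edge. Paths through the graph include the input isoforms and can also include combinations that no single source reported. This is a generic sketch of that idea with made-up exon coordinates; it is not GIMME's actual implementation, and real tools must also filter implausible paths.

```python
from collections import defaultdict

Exon = tuple[int, int]  # (start, end); hypothetical coordinates on one strand


def build_splice_graph(models: list[list[Exon]]) -> dict[Exon, set[Exon]]:
    """Nodes are exons; an edge joins consecutive exons of any input model."""
    graph: dict[Exon, set[Exon]] = defaultdict(set)
    for model in models:
        exons = sorted(model)
        for exon in exons:
            graph.setdefault(exon, set())  # register every exon as a node
        for a, b in zip(exons, exons[1:]):
            graph[a].add(b)
    return dict(graph)


def enumerate_isoforms(graph: dict[Exon, set[Exon]]) -> list[list[Exon]]:
    """Return every path from a first exon (no incoming edge) to a last exon (no outgoing edge)."""
    has_incoming = {b for targets in graph.values() for b in targets}
    paths: list[list[Exon]] = []

    def extend(path: list[Exon]) -> None:
        successors = sorted(graph[path[-1]])
        if not successors:
            paths.append(path)
        for nxt in successors:
            extend(path + [nxt])

    for start in sorted(e for e in graph if e not in has_incoming):
        extend([start])
    return paths


if __name__ == "__main__":
    # Two hypothetical input models that differ at their second and last exons.
    model_1 = [(1, 10), (20, 30), (50, 60), (80, 90)]
    model_2 = [(1, 10), (35, 45), (50, 60), (70, 75)]
    for isoform in enumerate_isoforms(build_splice_graph([model_1, model_2])):
        print(isoform)
```

With these two inputs the graph yields four paths: the two input models plus two combinations that neither source contains. Path counts grow combinatorially with the number of alternative exons, which is why practical tools prune candidates.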


Merged Gene Models

[Figure: models from global assembly, local assembly, and reference-guided reconstruction are combined into a merged (consensus) model that includes a newly predicted isoform. A second panel shows reference-guided models split into Gene A and Gene B, joined by the merged model into a single Gene A.]

Merged models connect fragmented gene models and include new, unannotated isoforms.
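Gluing fragmented models can be illustrated as a connected-components problem: if a transcript from one source shares an exon with models that another annotation placed in separate genes, those genes collapse into one. The sketch below uses union-find over hypothetical model records and requires identical exons to link models (real tools typically use overlap); it illustrates the idea only and is not how GIMME itself is implemented.

```python
Exon = tuple[str, int, int]  # (chromosome, start, end)


class UnionFind:
    """Disjoint-set structure keyed by model ID."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def glue_models(models: dict[str, list[Exon]]) -> list[set[str]]:
    """Group model IDs that are linked, directly or transitively, by an identical exon."""
    uf = UnionFind()
    owner: dict[Exon, str] = {}
    for model_id, exons in models.items():
        uf.find(model_id)  # register models that share no exons
        for exon in exons:
            if exon in owner:
                uf.union(owner[exon], model_id)
            else:
                owner[exon] = model_id
    groups: dict[str, set[str]] = {}
    for model_id in models:
        groups.setdefault(uf.find(model_id), set()).add(model_id)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical models: refA and refB are fragments of one gene that an
    # assembled transcript (asm1) spans; refC is unrelated.
    models = {
        "refA": [("chr1", 100, 200), ("chr1", 300, 400)],
        "refB": [("chr1", 900, 1000), ("chr1", 1200, 1300)],
        "asm1": [("chr1", 300, 400), ("chr1", 600, 700), ("chr1", 900, 1000)],
        "refC": [("chr2", 5, 50)],
    }
    for group in glue_models(models):
        print(sorted(group))
```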

IDH3A Gene: now with both UTRs!

[Figure: IDH3A gene models from the merged set, RefSeq, and ENSEMBL, with UTRs marked.]

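IDH3A: different models, different predicted expression…

[Figure: IDH3A expression predicted with each gene model set from single-end (SE) and paired-end (PE) data; the difference is significant with some models and not significant with others.]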
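Part of why different models give different absolute expression estimates is normalization by model length: adding UTRs changes both the number of reads assigned to a gene and the length it is divided by. The sketch below uses the standard RPKM formula, RPKM = 10^9 × reads / (total mapped reads × transcript length in bp), with made-up read counts and lengths; whether RPKM was the measure used in this analysis is not stated here.

```python
def rpkm(gene_reads: int, total_mapped_reads: int, length_bp: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return 1e9 * gene_reads / (total_mapped_reads * length_bp)


# Hypothetical example: the same locus under two models. The model with both
# UTRs collects extra reads from the UTRs but is also more than twice as long.
total = 20_000_000
cds_only = rpkm(gene_reads=800, total_mapped_reads=total, length_bp=1_100)
with_utrs = rpkm(gene_reads=1_500, total_mapped_reads=total, length_bp=2_600)
print(f"Model without UTRs: {cds_only:.1f} RPKM")
print(f"Model with UTRs:    {with_utrs:.1f} RPKM")
```

Here the longer model captures nearly twice the reads yet gives a lower RPKM (about 28.8 versus 36.4), so the "expression" of the same locus depends on which model is used.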

Differentially Expressed Genes from Different Gene Model Sets… Differ.

[Figure: DE genes called by EBSeq (FDR < 0.05) for each gene model set, including reference-guided models.]
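A quick way to summarize how much DE calls depend on the gene model set is to compare the called gene sets directly. The sketch below uses hypothetical gene identifiers under the set names shown on these slides; it reports genes called DE by every set, genes unique to each set, and pairwise Jaccard similarity.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_de_sets(de_sets: dict[str, set[str]]):
    """Return (shared genes, genes unique to each set, pairwise Jaccard similarities)."""
    shared = set.intersection(*de_sets.values())
    unique = {
        name: genes - set().union(*(g for n, g in de_sets.items() if n != name))
        for name, genes in de_sets.items()
    }
    pairwise = {
        (n1, n2): jaccard(s1, s2)
        for (n1, s1), (n2, s2) in combinations(de_sets.items(), 2)
    }
    return shared, unique, pairwise


if __name__ == "__main__":
    # Hypothetical DE calls; real sets would come from the EBSeq results.
    de_sets = {
        "ENSEMBL": {"G1", "G2", "G3"},
        "RefSeq": {"G1", "G2", "G4"},
        "Merged": {"G1", "G3", "G5", "G6"},
    }
    shared, unique, pairwise = compare_de_sets(de_sets)
    print("DE in all sets:", sorted(shared))
    for name, genes in unique.items():
        print(f"Only in {name}:", sorted(genes))
    for pair, score in pairwise.items():
        print(pair, f"{score:.2f}")
```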

In addition, many of the differentially expressed genes are not annotated in KEGG.

Enriched KEGG Pathways by GOseq

[Figure: KEGG pathways enriched among DE genes from the reference-guided models (GOseq FDR < 0.05), using chicken + human KEGG pathway annotations: 40 pathways.]

• Must merge in human KEGG annotations
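Merging in human KEGG annotations amounts to transferring pathway labels across orthologs, so that chicken genes with missing or partial annotation pick up the pathways of their human counterparts. The sketch below assumes a chicken-to-human ortholog table is already available (how orthologs are assigned is outside its scope) and takes the union of both annotation sources; all gene and pathway identifiers are hypothetical placeholders.

```python
def merge_pathway_annotations(
    chicken_pathways: dict[str, set[str]],
    human_pathways: dict[str, set[str]],
    orthologs: dict[str, str],
) -> dict[str, set[str]]:
    """Union each chicken gene's own pathways with those of its human ortholog.

    Genes that end up with no pathway from either source are left out.
    """
    merged: dict[str, set[str]] = {}
    for gene in set(chicken_pathways) | set(orthologs):
        own = chicken_pathways.get(gene, set())
        human_gene = orthologs.get(gene)
        transferred = human_pathways.get(human_gene, set()) if human_gene else set()
        pathways = own | transferred
        if pathways:
            merged[gene] = pathways
    return merged


if __name__ == "__main__":
    # Hypothetical annotation tables and ortholog map.
    chicken = {"gga_gene1": {"pathway_X"}}
    human = {"hsa_gene1": {"pathway_X", "pathway_Y"}, "hsa_gene2": {"pathway_Z"}}
    orthologs = {"gga_gene1": "hsa_gene1", "gga_gene2": "hsa_gene2", "gga_gene3": "hsa_gene9"}
    for gene, pathways in sorted(merge_pathway_annotations(chicken, human, orthologs).items()):
        print(gene, sorted(pathways))
```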

Biological Process (BP) categories involved in adaptive immune responses are enriched in line 7 (susceptible)

GO ID     Description                               Adjusted p-value
0009615   Response to virus                         0.00023
0050670   Regulation of lymphocyte proliferation    0.00048
0002252   Immune effector process                   0.00068
0051249   Regulation of lymphocyte activation       0.0027
0042129   Regulation of T cell proliferation        0.0032
0002250   Adaptive immune response                  0.0106
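The enrichment analyses in this talk use GOseq, which corrects for the tendency of longer genes to be detected as differentially expressed. As a simpler point of reference, the sketch below computes an uncorrected one-sided hypergeometric over-representation p-value for each category and then applies the Benjamini-Hochberg adjustment. It omits GOseq's length-bias correction, so it is not a reimplementation of GOseq, and the category counts are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) when N DE genes are drawn from M tested genes, n of which are in the category."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts: 10,000 tested genes, 300 of them DE.
    total_genes, de_genes = 10_000, 300
    # category -> (genes in category, DE genes in category)
    categories = {"category_A": (150, 20), "category_B": (400, 15), "category_C": (60, 2)}
    raw = [hypergeom_sf(k, total_genes, n, de_genes) for n, k in categories.values()]
    for name, p, q in zip(categories, raw, benjamini_hochberg(raw)):
        print(f"{name}: p = {p:.2e}, BH-adjusted = {q:.2e}")
```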

At an early stage of infection, elicitation of the adaptive immune response appears to be delayed in line 6.

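Isoform Expression Estimation

[Figure: two samples with nearly the same gene-level expression but different isoform proportions. Sample A: gene expression = 400x, isoforms at 20% and 80%. Sample B: gene expression = 405x, isoforms at 2% and 98%.]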
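The numbers on this slide can be worked through directly: nearly identical gene-level totals hide a large shift in isoform usage. The sketch assumes the percentages are fractions of the gene total, as the figure suggests; the isoform names are placeholders.

```python
def isoform_expression(gene_total: float, fractions: dict[str, float]) -> dict[str, float]:
    """Split a gene-level expression estimate across isoforms by their fractions."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("isoform fractions must sum to 1")
    return {iso: gene_total * f for iso, f in fractions.items()}


sample_a = isoform_expression(400, {"isoform_1": 0.20, "isoform_2": 0.80})
sample_b = isoform_expression(405, {"isoform_1": 0.02, "isoform_2": 0.98})
for iso in sample_a:
    fold = sample_b[iso] / sample_a[iso]
    print(f"{iso}: A = {sample_a[iso]:.1f}x, B = {sample_b[iso]:.1f}x, B/A = {fold:.2f}")
```

Gene-level expression differs by only about 1% (405x vs 400x), while isoform_1 drops roughly tenfold (80x to about 8x) and isoform_2 rises by about a quarter (320x to about 397x). A gene-level test would miss this change entirely.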

How to Estimate Isoform Expression

[Figure: spliced reads.]

Differential Exon Usage of ITGB2 Gene from MISO

[Figure: read coverage, spliced reads, and Percent Spliced In (Ψ) for ITGB2.]
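MISO estimates Ψ with a Bayesian model over all reads. A much simpler point estimate, often used for intuition, counts junction reads for a cassette exon: reads supporting inclusion (the two junctions flanking the exon) and reads supporting skipping (the junction that jumps over it). The sketch below is that naive estimate, not MISO's method, and the read counts are hypothetical. Inclusion reads are averaged over the two flanking junctions because an included exon offers two junctions that can produce inclusion reads, while skipping offers only one.

```python
from typing import Optional


def naive_psi(upstream_inclusion: int, downstream_inclusion: int, skipping: int) -> Optional[float]:
    """Junction-count estimate of Percent Spliced In (Psi) for a cassette exon."""
    # Average the two inclusion junctions so they are comparable to the
    # single skipping junction.
    inclusion = (upstream_inclusion + downstream_inclusion) / 2
    total = inclusion + skipping
    if total == 0:
        return None  # no junction reads: Psi is undefined
    return inclusion / total


if __name__ == "__main__":
    # Hypothetical junction read counts for one exon in two conditions.
    psi_ctrl = naive_psi(upstream_inclusion=30, downstream_inclusion=34, skipping=8)
    psi_inf = naive_psi(upstream_inclusion=5, downstream_inclusion=3, skipping=36)
    print(f"Psi control = {psi_ctrl:.2f}, Psi infected = {psi_inf:.2f}, "
          f"delta = {psi_inf - psi_ctrl:+.2f}")
```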

Genes with predicted differential splicing can be categorized into four groups

[Figure: Ψ (0 to 1) in 6 Ctrl, 6 Inf, 7 Ctrl, and 7 Inf for each group; cutoff = 0.2. Group I: 11 genes. Group II: 19 genes. Group III: 20 genes. Group IV: 1 gene.]
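With a cutoff of 0.2, a gene can be flagged by the comparisons in which its Ψ changes by at least that much. The criteria that define Groups I to IV are not spelled out on the slide, so this sketch only shows how a ΔΨ cutoff could be applied to the four conditions shown (6 Ctrl, 6 Inf, 7 Ctrl, 7 Inf). The set of comparisons, the use of "at least" rather than "more than", and the Ψ values are all assumptions.

```python
CUTOFF = 0.2

# Assumed comparisons between the four conditions shown on the slide.
COMPARISONS = [
    ("6 Ctrl", "6 Inf"),   # infection effect in line 6
    ("7 Ctrl", "7 Inf"),   # infection effect in line 7
    ("6 Ctrl", "7 Ctrl"),  # line effect without infection
    ("6 Inf", "7 Inf"),    # line effect after infection
]


def changed_comparisons(psi: dict[str, float], cutoff: float = CUTOFF) -> list[tuple[str, str]]:
    """Return the comparisons where |delta Psi| is at least the cutoff."""
    return [(a, b) for a, b in COMPARISONS if abs(psi[b] - psi[a]) >= cutoff]


if __name__ == "__main__":
    # Hypothetical Psi values for one gene: exon skipping only in infected line 7.
    psi = {"6 Ctrl": 0.85, "6 Inf": 0.80, "7 Ctrl": 0.82, "7 Inf": 0.35}
    print(changed_comparisons(psi))
```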

The main point

• We are completely at the mercy of annotations to interpret our large-scale data.

• Need more experimental information!

• But also, better methods => better signal

Concluding thoughts (I)

• Computational analysis of high-throughput sequencing data can help refine hypotheses, but cannot conclusively resolve mechanism.

• Don’t knock “refining hypotheses”, though! Complex biological phenomena like disease are refractory to simplifying assumptions.

Concluding thoughts (II)

• Much of the -omic data being gathered by all of you has utility far beyond your specific research question.

• This is particularly true in “semi-model” organisms where annotations are generally poor and not species-specific, and where there may be significant intra-species variation.

• How can we better share this data, to make faster and better progress?

Where should we spend our -omics money?

• Improving genomes is still expensive and requires significant technical expertise.

• mRNAseq is inexpensive, broadly useful and wonderful for building better gene models.

• Proteomics and metabolomics?

• Better tools, annotation, and data sharing and exploration portals are critically important to the future of (agricultural) genomics.

Thanks!

Date post:	01-Jul-2015
Upload:	ctitusbrown