RNA surveillance and degradation: the Yin Yang of RNA

[Figure: RNA production versus destruction. Labels: RNA Pol II, poly(A) tail, RNA, ribosome.]

MODEL: Degradation of hypomodified tRNAiMet

[Figure: hypomodified tRNAiMet, polyadenylation by Trf4p, and the exosome. Labels: Trf4p, Mtr4, Rrp6p. Exosome subunits shown: Mtr3p, Rrp41p, Rrp45p, Rrp40p, Rrp46p, Rrp42p, Rrp4p, Rrp43p, Rrp44p, Csl4p.]

* Hypothetical diagram of the exosome

Workflow

[Figure: workflow. Labels: TRAMP complex (Mtr4, Papd5, ZCCHC7), siRNA knockdown, library creation for NGS, next-generation sequencing (PolyA-Seq).]

Map paired end reads to genome

• BWA (Burrows-Wheeler Aligner) algorithm used to map each pair of reads to the genome
• Report each pair of reads as a single nucleotide position within the genome where polyadenylation was detected in an RNA sample
• Average insert size 300
– Read size ~45

[Figure: read pair schematic. Labels: TTTp-5’, AAAA-3’, 3’-A.]

Raw reads vs Mapped reads

Data type / KD type    Raw reads     Mapped reads    Positions
Replicate data
  Mtr4                 15,135,078    10,853,534      651,551
  Ctrl                 16,348,780    11,708,310      652,128
  Rrp6                 15,971,926    12,388,266      705,173
Original data
  Mtr4                 ND            34,204,534      1,124,968
  Ctrl                 ND            7,195,942       582,256
  Rrp6                 ND            8,241,505       597,672

Normalization of data: reads per million (rpm)

Analysis

• Starting with the RefSeq database
– Raw read counts converted to reads per million
• Reads at position / total reads in sample
– Remove all non-coding RNAs
– From each sample, collect normalized reads mapping at the 3’ end +/- 50 bases of each RefSeq entry encoding a protein
– Dot plot of normalized reads on a log scale, X axis = control and Y axis = mMtr4 KD
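A minimal sketch of the normalization steps above, assuming each sample is held as a mapping from (chromosome, strand, position) to a raw read count. The function names, the toy counts and the use of the summed counts as "total reads in sample" are assumptions made for illustration, not the pipeline actually used.

```python
def reads_per_million(position_counts):
    """Convert raw read counts per position to reads per million (rpm)."""
    total = sum(position_counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {pos: count * 1_000_000 / total for pos, count in position_counts.items()}


def signal_near_3prime_end(rpm, chrom, strand, end, flank=50):
    """Sum rpm over positions within +/- flank bases of an annotated 3' end."""
    return sum(
        value
        for (c, s, p), value in rpm.items()
        if c == chrom and s == strand and abs(p - end) <= flank
    )


# Toy counts (hypothetical, not taken from the data set).
raw = {
    ("chr12", "+", 1000): 30,
    ("chr12", "+", 1040): 10,
    ("chr12", "+", 5000): 60,
}
rpm = reads_per_million(raw)
near_end = signal_near_3prime_end(rpm, "chr12", "+", end=1010)
```

Normalizing before collecting the +/- 50 base window keeps samples with very different mapped read totals comparable on the same dot plot.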

mRNA polyadenylation does not change between Mtr4 and control KD

R² = 0.95141
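One way such an R² could be computed, as a sketch only: it assumes the value is the squared Pearson correlation of log10-transformed normalized reads (control vs. knockdown), and it adds a pseudocount so that zero values can be logged. Both choices are assumptions; the slide does not state how the value was derived.

```python
import math


def r_squared_log10(x_values, y_values, pseudocount=1.0):
    """Squared Pearson correlation of log10(x + pseudocount) vs log10(y + pseudocount)."""
    if len(x_values) != len(y_values) or len(x_values) < 2:
        raise ValueError("need two equally sized samples with at least two points")
    xs = [math.log10(x + pseudocount) for x in x_values]
    ys = [math.log10(y + pseudocount) for y in y_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one sample has no variance")
    return (sxy * sxy) / (sxx * syy)


# Toy values (hypothetical): knockdown is exactly tenfold the control on the log scale.
control = [9, 99, 999]
knockdown = [99, 999, 9999]
r2 = r_squared_log10(control, knockdown)
```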

Problems encountered

• Sequencing read depth very different in the original data
– 34 million mapped reads in one sample, 8 million in another
• Lack of 3 replicates for robust statistical analysis of data
• Removal of internal A
– Seq reads that map to an oligoadenylate tract in the genome
– Algorithm developed misses many
– Manual removal takes too much time

Remove Internal A

[Figure: oligo(T) stretch (TTTT...) aligned against an internal oligo(A) tract (AAAA...)]
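A sketch of one common heuristic for flagging internal A positions: inspect the genomic sequence immediately downstream of the mapped pA position and flag it when that window is A-rich or contains a long A run. This is not the algorithm referred to on the slide. The function name, window size and cutoffs are hypothetical, and only the plus strand is handled (a minus-strand site would need the reverse-complement check).

```python
def is_internal_a(genome_seq, position, window=20, min_a_fraction=0.7, min_run=6):
    """Return True if the plus-strand sequence just downstream of `position` is A-rich.

    `position` is the 0-based index of the last base before the poly(A) signal.
    """
    downstream = genome_seq[position + 1 : position + 1 + window].upper()
    if not downstream:
        return False
    a_fraction = downstream.count("A") / len(downstream)
    has_long_run = "A" * min_run in downstream
    return a_fraction >= min_a_fraction or has_long_run


# Toy sequences (hypothetical).
a_rich_genome = "GCGTACGT" + "A" * 15 + "GCGCGC"
mixed_genome = "GCGTACGT" + "GCTAGCTAGGCTAGCTAGCT"
flag_a_rich = is_internal_a(a_rich_genome, 7)
flag_mixed = is_internal_a(mixed_genome, 7)
```

Positions flagged this way would still need review, since genuine polyadenylation sites can also sit next to A-rich sequence.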

How to mine the data based on a hypothesis

• Hypothesis: PolyA+ RNAs of unknown identity will accumulate upon depletion of mMtr4 vs. the control.
– How can the transcriptome be queried?
– How detailed should a query be?

• Every pA position, or only those exhibiting greater than x number of raw/normalized reads?

• How do we find significant differences with one sample, or possibly two?

• How can repetitive elements be accounted for in the data?
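One possible way to frame such a query, as a sketch: keep only pA positions above a minimum normalized read count in at least one sample, then rank them by log2 fold change (knockdown over control). The cutoff, the pseudocount and the function name are hypothetical. With a single sample per condition this gives a ranking only, not a significance test.

```python
import math


def rank_accumulating_positions(kd_rpm, ctrl_rpm, min_rpm=1.0, pseudocount=0.5):
    """Rank pA positions by log2((KD + pseudocount) / (control + pseudocount))."""
    ranked = []
    for pos in set(kd_rpm) | set(ctrl_rpm):
        kd = kd_rpm.get(pos, 0.0)
        ctrl = ctrl_rpm.get(pos, 0.0)
        if max(kd, ctrl) < min_rpm:
            continue  # too few reads in both samples to interpret
        log2_fc = math.log2((kd + pseudocount) / (ctrl + pseudocount))
        ranked.append((pos, log2_fc))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy rpm values (hypothetical).
kd = {("chr12", 100): 7.5, ("chr12", 200): 1.5, ("chr12", 300): 0.2}
ctrl = {("chr12", 100): 0.5, ("chr12", 200): 1.5, ("chr12", 300): 0.1}
ranked = rank_accumulating_positions(kd, ctrl)
```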

Custom annotation to remove bias from existing annotations

• Data mapped with Bowtie to the mouse genome, mm10 build
• Mapped data from KD and control compared using Cufflinks to explore gene expression differences using a custom annotation
• Custom annotation
– 1000 base pair genes with 500 base pair overlap with the next gene
• This did not work well
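The custom annotation described above amounts to tiling each chromosome with overlapping windows. A minimal sketch, assuming 0-based half-open coordinates and a hypothetical naming scheme for the artificial genes (the actual annotation file format used is not shown here):

```python
def tile_windows(chrom, chrom_length, size=1000, step=500):
    """Tile a chromosome with `size`-bp windows that start every `step` bp."""
    windows = []
    start = 0
    while start < chrom_length:
        end = min(start + size, chrom_length)
        name = f"{chrom}_w{len(windows)}"  # hypothetical gene identifier
        windows.append((chrom, start, end, name))
        if end == chrom_length:
            break
        start += step
    return windows


# Toy chromosome length (hypothetical), small enough to inspect by eye.
windows = tile_windows("chr12", 2500)
```

Because every base falls into two windows, a read is usually compatible with two artificial genes, which is where the read assignment problem arises.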

Problems with using custom annotation

• The first real problem was that no computing resource could handle more than 5000 genes of the custom annotation at a time
– One chromosome had 147K genes
• There was a problem with assignment when the reads overlapped
– Cuffdiff would randomly assign the reads to only one of the genes.
• Overlaps were split into two fasta files, but we could not capture differences in the data that we knew exist.
– Cuffdiff collects data from the entire 1000 bp gene and compares between 2 samples
– This method leads to false negatives for pA data, where the focus is on one or a few positions as a pA event.

What next?

F-Seq

• Tags to identify specific sequence features for different library preparations (ChIP-seq, DNase-seq and pA-seq).
• Will summarize and display individual sequence data as an accurate and interpretable signal, by generating a continuous tag sequence density estimation.

Generating Peaks with F-Seq

• 1. Estimate kernel density to estimate the pdf
• 2. Compute threshold
– n_w = nw/L
– x_c
– Repeat step 2 k times
– s SDs above the mean
• 2.1 Threshold output module is modifiable
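A rough sketch of the two steps, not F-Seq's own code. It assumes a Gaussian kernel, and it reads the threshold bullets as follows: n is the total number of reads, w a window size, L the genome length, n_w = nw/L the expected reads per window, and x_c a fixed point in the window at which the density of n_w uniformly placed reads is evaluated k times. The threshold is then s standard deviations above the mean of those k values. All names, the bandwidth and the toy data are hypothetical.

```python
import math
import random
import statistics


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def density_at(x, positions, bandwidth, n_total):
    """Kernel density estimate at x, normalized by the total read count."""
    total = sum(gaussian_kernel((x - p) / bandwidth) for p in positions)
    return total / (n_total * bandwidth)


def simulate_threshold(n_total, genome_length, window, bandwidth, k=1000, s=4.0, seed=0):
    """Background threshold: mean + s SDs of the density at x_c under uniform placement."""
    rng = random.Random(seed)
    n_w = max(1, round(n_total * window / genome_length))  # expected reads per window
    x_c = window / 2.0  # fixed evaluation point inside the window
    samples = []
    for _ in range(k):
        background = [rng.uniform(0.0, window) for _ in range(n_w)]
        samples.append(density_at(x_c, background, bandwidth, n_total))
    return statistics.mean(samples) + s * statistics.pstdev(samples)


# Toy data (hypothetical): one cluster of 40 reads plus evenly spread background reads.
cluster = [5000 + offset for offset in range(-20, 20)]
background_reads = list(range(0, 100_000, 2000))
reads = cluster + background_reads
threshold = simulate_threshold(len(reads), 100_000, window=1000, bandwidth=50.0)
peak_density = density_at(5000, reads, 50.0, len(reads))
quiet_density = density_at(1000, reads, 50.0, len(reads))
```

Positions where the estimated density exceeds the threshold would be reported as part of a peak. Unlike a fixed 1000 bp window, this keeps a sharp signal at one or a few pA positions from being averaged away.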

Magnitude of data: one sample both strands

51 million bases of Chromosome 12

12 thousand bases of Chromosome 12

Chromosome 12 is 121 million base pairs long

rRNA workflow

[Figure labels: 18S, 5.8S, 28S]

pA reads intersecting 45S pre-rRNA

[Figure: pA reads intersecting the 45S pre-rRNA. Labels: 18S, 5.8S, 28S.]

Accumulation of micro RNA processed 5’ leader upon depletion of Mtr4

• Comparison of Mtr4 vs. control KD
• Abundant polyA found near 5’ end of annotated Mir322
• Confirmed using molecular technique

Date post:	18-Jan-2016