
Improving miRNA Target Genes Prediction

Rikky Wenang Purbojati

miRNA

MicroRNA (miRNA) is a class of RNA that is believed to play important roles in gene regulation. miRNAs are short (21- to 23-nt) RNAs that bind to the 3′ untranslated regions (3′ UTRs) of target genes.

miRNA Functions

miRNA plays a major role in the RNA-Induced Silencing Complex (RISC). miRNAs control the expression of large numbers of genes by:
- mRNA degradation
- Translational repression

Recent studies indicate that miRNA plays a role in cancer development:
- A surplus of miRNA might inhibit the cell apoptosis process
- A deficit of miRNA might lead to overexpression of certain oncogenes

RNA-Induced Silencing Complex

- mRNA degradation: breaks the structural integrity of an mRNA.
- Translational repression: prevents the mRNA from being translated.

Characteristics of miRNA

- Short (22-25 nt)
- Transcribed from a miRNA gene:
  - Intragenic: the miRNA gene is located inside a host gene (usually in an intron)
  - Intergenic: the miRNA gene is located outside gene bodies
- Consistent 5′ and 3′ boundaries: transcription start site, 5′ cap, poly(A) tail

Development of miRNA

miRNA General Research Question

Much attention has been directed at miRNA processing and targeting. Computationally, one basic challenge is: given a miRNA sequence, what are its target genes?

miRNA sequence target prediction

Predict target genes by matching the complement of the miRNA sequence. Two types of complement:
- Perfect complement
- Imperfect complement: find a perfect match for the seed (nt 2-8)
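A minimal sketch of seed matching, assuming RNA strings over A, C, G, U and a 7-nt seed (miRNA positions 2-8). Because the miRNA pairs antiparallel with its target, the site searched for in the 3′ UTR is the reverse complement of the seed. The function names are hypothetical and the search ignores G:U wobble pairs and site-type variants used by real tools.

```python
from typing import List

# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, utr: str, start: int = 2, end: int = 8) -> List[int]:
    """Return 0-based UTR offsets that pair perfectly with the miRNA seed.

    The seed is miRNA positions start..end (1-based, inclusive).
    """
    seed = mirna.upper()[start - 1:end]
    site = reverse_complement(seed)
    utr = utr.upper()
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]
```

For example, the seed GAGGUAG of UGAGGUAGUAGGUUGUAUAGUU corresponds to the UTR site CUACCUC, so `seed_sites("UGAGGUAGUAGGUUGUAUAGUU", "AAACUACCUCAAA")` returns `[3]`.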

miRNA sequence target prediction

Several requirements for matching:
- Strong Watson-Crick base pairing of the 5′ seed (nt 2-8)
- Conservation of the miRNA binding site across species
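The conservation requirement can be sketched as a presence check over orthologous 3′ UTRs. This is a simplification under stated assumptions: real tools compare aligned site positions, whereas this hypothetical `is_conserved` helper only asks whether the seed site (pairing with miRNA nt 2-8) occurs somewhere in each species' UTR.

```python
from typing import Dict, Optional

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Target-site sequence that pairs with miRNA nucleotides 2-8."""
    seed = mirna.upper()[1:8]
    return "".join(COMPLEMENT[b] for b in reversed(seed))


def is_conserved(mirna: str, utrs_by_species: Dict[str, str],
                 min_species: Optional[int] = None) -> bool:
    """True if the seed site occurs in at least `min_species` UTRs.

    By default, every species must contain the site.
    """
    required = len(utrs_by_species) if min_species is None else min_species
    site = seed_site(mirna)
    hits = sum(site in utr.upper() for utr in utrs_by_species.values())
    return hits >= required
```

Relaxing `min_species` trades specificity for sensitivity, which is the same tension the following slides try to resolve with expression data.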

Another approach: thermodynamic rule

- Local miRNA-mRNA interaction with a favorable balance of minimum free energy

Problems and Opportunities

Problem:
- Purely computational target gene prediction produces many candidates
- There is no unifying theory for target gene prediction yet
- Most candidates have not been validated yet
- The common assumption is that most of them are false positives

Can we shorten the list to include only the strong candidates?

Problems and Opportunities

Opportunity:
- Many publicly available experimental datasets, e.g. cDNA microarray, miRNA microarray, etc.
- Use these datasets to computationally validate some of the target genes

Current Research

Preliminary research tries to utilize the abundance of publicly available microarray data.

Assumptions

- miRNA works by silencing target genes, so the expression of a miRNA gene and its target genes should be anti-correlated
- Intragenic miRNA is expressed along with its host gene, so a host gene should be anti-correlated with a target gene
- Intergenic miRNA does not have a host gene, but we might be able to use available composite (miRNA microarray + cDNA microarray) datasets: if a miRNA is up-regulated in the miRNA microarray, then its target genes should be down-regulated in the cDNA microarray
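The composite-dataset rule for intergenic miRNA can be sketched as a direction check on log2 fold changes. The function name and the `min_change` cutoff for calling a gene up- or down-regulated are hypothetical choices for illustration; the rule itself (miRNA up, target down, and vice versa) follows the assumption stated above.

```python
from typing import Dict, List


def opposite_direction_targets(mirna_lfc: float,
                               target_lfc: Dict[str, float],
                               min_change: float = 1.0) -> List[str]:
    """Return candidate targets whose change opposes the miRNA's change.

    mirna_lfc: log2 fold change of the miRNA (miRNA microarray).
    target_lfc: gene -> log2 fold change (cDNA microarray).
    """
    if abs(mirna_lfc) < min_change:
        return []  # miRNA not clearly changed: no evidence either way
    return sorted(gene for gene, lfc in target_lfc.items()
                  if abs(lfc) >= min_change and lfc * mirna_lfc < 0)
```

With `{"GENE_A": -1.5, "GENE_B": 0.2, "GENE_C": 2.0, "GENE_D": -1.0}` and a miRNA log fold change of 1.8, only GENE_A and GENE_D are kept.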

Current Work

There has been some prior work related to this idea (e.g. HOCTAR). However, we can improve on it by:
- Using stricter criteria across the microarray data
- Using more diverse data

We expect to achieve much better specificity than the previous method.

HOCTAR Method

- Get a list of target genes from 3 different tools (PicTar, TargetScan, miRanda)
- Use Pearson correlation to compute the correlation coefficient between 2 genes (host gene and candidate target)
- Include target genes whose correlation is below some negative threshold
- Only works for intragenic miRNA
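A minimal sketch of the HOCTAR-style filter as summarized above: correlate the host gene's expression profile with each predicted target's profile and keep targets below a negative threshold. The threshold of -0.5, the input layout, and the function names are assumptions for illustration, not HOCTAR's published parameters.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return cov / (sx * sy)


def anticorrelated_targets(host: Sequence[float],
                           targets: Dict[str, Sequence[float]],
                           threshold: float = -0.5) -> List[str]:
    """Predicted targets whose correlation with the host gene is below threshold."""
    return sorted(gene for gene, profile in targets.items()
                  if pearson(host, profile) < threshold)
```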


Shortcomings of HOCTAR

- Uses data from all probes even when they are not consistent
- Uses only one approach to target gene prediction
- Depends on Pearson correlation, which is sensitive to outliers

Improvement Idea (1)

- Use only the subset of data in which all probes are consistent
- Treat each probe as a different experiment
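The slides do not define "consistent", so the sketch below assumes one possible reading: a sample is kept only if every probe for the gene changes in the same direction (same sign of log ratio, with zero counted as disagreement). The kept samples of each probe are then returned as separate profiles, so each probe can be treated as its own experiment. Both function names are hypothetical.

```python
from typing import List, Sequence


def consistent_samples(probes: Sequence[Sequence[float]]) -> List[int]:
    """Indices of samples where all probes of a gene agree in sign."""
    keep = []
    for j in range(len(probes[0])):
        values = [probe[j] for probe in probes]
        if all(v > 0 for v in values) or all(v < 0 for v in values):
            keep.append(j)
    return keep


def per_probe_profiles(probes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Restrict every probe to the consistent samples; one profile per probe."""
    keep = consistent_samples(probes)
    return [[probe[j] for j in keep] for probe in probes]
```

For two probes `[[1.2, -0.5, 0.8, -1.1], [0.9, 0.4, 1.0, -0.7]]`, sample 1 is dropped because the probes disagree, leaving samples 0, 2 and 3.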

Improvement Idea (2)

Pearson correlation is very sensitive to outliers. Alternative solutions:
- Use rank correlation coefficients instead of Pearson correlation coefficients
- Normalize the dataset to a normal distribution
- Ignore outliers
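The following sketch illustrates why rank correlation helps: a single outlier can flip the sign of the Pearson coefficient, while Spearman correlation (Pearson computed on ranks) is barely affected. A rank-based inverse normal transform is included as one common way to normalize data to a normal distribution; it is an assumed choice, not necessarily the one intended here.

```python
import math
from statistics import NormalDist
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(ranks(x), ranks(y))


def inverse_normal(values: Sequence[float]) -> List[float]:
    """Map each value's rank r to the standard normal quantile of (r - 0.5) / n."""
    n = len(values)
    dist = NormalDist()
    return [dist.inv_cdf((r - 0.5) / n) for r in ranks(values)]
```

With host `[1, 2, 3, 4, 5, 6]` and target `[6, 5, 4, 3, 2, 100]`, the target falls steadily except for one extreme value: Pearson is positive, whereas Spearman is -1/7.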

Improvement Idea (3)

In addition to probe consistency and rank correlation, we might use an entropy rule to eliminate candidate target genes.

Assumptions:
- Transcript level can be approximated from expression level data
- One miRNA transcript can only degrade one mRNA transcript
- Thus, miRNA expression changes should not be very different in magnitude from mRNA expression changes
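The "entropy rule" itself is not specified here, so the following is only a hypothetical filter built on the stated assumptions: if one miRNA transcript degrades one mRNA transcript, a change of d miRNA transcripts should be matched by a change of roughly d target transcripts in the opposite direction. Levels are assumed to be approximated from expression values, and `max_ratio` is an arbitrary illustrative tolerance.

```python
def magnitude_consistent(mirna_before: float, mirna_after: float,
                         mrna_before: float, mrna_after: float,
                         max_ratio: float = 2.0) -> bool:
    """True if the miRNA and mRNA change in opposite directions by similar amounts."""
    mirna_change = mirna_after - mirna_before
    mrna_change = mrna_after - mrna_before
    if mirna_change * mrna_change >= 0:
        return False  # same direction, or no change: not a silencing pattern
    big = max(abs(mirna_change), abs(mrna_change))
    small = min(abs(mirna_change), abs(mrna_change))
    return big / small <= max_ratio
```

A miRNA rising by 200 units against a target falling by 180 passes; a target falling by only 10 does not.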

Improvement Idea (4)

- Use a larger amount of microarray data
- We might be able to include miRNA microarray data to further refine the target gene lists for several miRNAs
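One simple way to combine many datasets is a vote: keep a candidate target only if it is supported (for example, anti-correlated with its host gene or miRNA) in at least `min_support` datasets. The function name, the dataset identifiers and the default of 2 are hypothetical; the per-dataset support calls are assumed to be computed beforehand.

```python
from typing import Dict, List


def consensus_targets(calls: Dict[str, Dict[str, bool]],
                      min_support: int = 2) -> List[str]:
    """`calls` maps dataset id -> {target gene -> supported in that dataset}."""
    votes: Dict[str, int] = {}
    for per_dataset in calls.values():
        for gene, supported in per_dataset.items():
            votes[gene] = votes.get(gene, 0) + int(supported)
    return sorted(g for g, v in votes.items() if v >= min_support)
```

Requiring support in several independent datasets is one way to pursue the higher specificity the project aims for, at the cost of discarding targets measured in few datasets.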

Preliminary Result

GSE9234 dataset (hypoxia/normoxia), using only the consistency criterion

miRNA      Host Gene  Known Target Gene  HOCTAR  Refined
miR-103-2  PANK3      GPD1               YES     YES
miR-103-2  PANK3      FBW1B              NO      YES
miR-140    WWP2       HDAC4              YES     YES
miR-224    GABRE      API5               NO      NO

Refining Intergenic miRNA prediction

Refining intergenic miRNA prediction using microarray datasets is not a trivial task:
- A cDNA microarray can only be used to measure the expression of target genes, not of the miRNA gene
- We might have to rely on additional data: proxy measurements or miRNA microarrays

Intergenic miRNA proxy measurement

Method 1: Putative target gene approximation
- Use the expression level of known target genes of that specific intergenic miRNA
- If its target genes are consistently down-regulated, then we can assume that the intergenic miRNA gene is up-regulated
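A minimal sketch of the putative-target proxy: infer the direction of an intergenic miRNA's change from the log2 fold changes of its known targets. "Consistently" is interpreted here as "at least `min_fraction` of the targets"; that fraction, the `min_change` cutoff and the function name are hypothetical parameters.

```python
from typing import Sequence


def infer_mirna_direction(target_lfc: Sequence[float],
                          min_fraction: float = 0.8,
                          min_change: float = 1.0) -> int:
    """Return +1 (miRNA up), -1 (miRNA down) or 0 (no consistent signal)."""
    if not target_lfc:
        return 0
    n = len(target_lfc)
    down = sum(lfc <= -min_change for lfc in target_lfc)
    up = sum(lfc >= min_change for lfc in target_lfc)
    if down / n >= min_fraction:
        return 1   # targets mostly down, so the miRNA is assumed up
    if up / n >= min_fraction:
        return -1  # targets mostly up, so the miRNA is assumed down
    return 0
```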

Method 2: Cluster miRNA approximation
- Some intergenic miRNAs are clustered with each other; according to Saini et al. (2007), most of these clusters share the same pri-miRNA transcript
- Use method 1 on the neighboring miRNAs in the cluster to approximate the expression of the intergenic miRNA
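The cluster approximation can be sketched in two steps: group miRNAs on the same chromosome and strand whose start positions lie within `max_gap` bases of each other (assumed to share a pri-miRNA transcript), then combine the directions already estimated for the neighbors by majority vote. The 10,000-base gap, the locus layout and all names are illustrative assumptions rather than the cluster definition used by Saini et al.

```python
from typing import Dict, List, Tuple

Locus = Tuple[str, str, int]  # (chromosome, strand, start position)


def clusters(loci: Dict[str, Locus], max_gap: int = 10_000) -> List[List[str]]:
    """Group miRNA names into clusters of nearby loci on the same strand."""
    ordered = sorted(loci, key=lambda m: loci[m])
    groups: List[List[str]] = []
    for name in ordered:
        chrom, strand, pos = loci[name]
        if groups:
            prev_chrom, prev_strand, prev_pos = loci[groups[-1][-1]]
            if (chrom, strand) == (prev_chrom, prev_strand) and pos - prev_pos <= max_gap:
                groups[-1].append(name)
                continue
        groups.append([name])
    return groups


def estimate_from_cluster(target: str, loci: Dict[str, Locus],
                          neighbor_direction: Dict[str, int],
                          max_gap: int = 10_000) -> int:
    """Majority direction (+1, -1 or 0) of the target's cluster neighbors."""
    for group in clusters(loci, max_gap):
        if target in group:
            votes = sum(neighbor_direction.get(m, 0) for m in group if m != target)
            return (votes > 0) - (votes < 0)
    return 0
```

A miRNA with no clustered neighbors gets 0, meaning this method gives no estimate and another data source (e.g. a miRNA microarray) is needed.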

Further Work

- Implementation and evaluation
- Standardizing a composite dataset repository