
Nucl. Acids Res.-2005-Qiu-1834-47

Date post: 03-Jun-2018
  • 8/12/2019 Nucl. Acids Res.-2005-Qiu-1834-47


    A computational study of off-target effects of RNA interference

    Shibin Qiu, Coen M. Adema¹ and Terran Lane*

    Department of Computer Science and ¹Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA

    Received December 13, 2004; Revised February 19, 2005; Accepted March 7, 2005

    ABSTRACT

    RNA interference (RNAi) is an intracellular mechanism for post-transcriptional gene silencing that is frequently used to study gene function. RNAi is initiated by short interfering RNA (siRNA) of 21 nt in length, either generated from the double-stranded RNA (dsRNA) by using the enzyme Dicer or introduced experimentally. Following association with an RNAi silencing complex, siRNA targets mRNA transcripts that have sequence identity for destruction. A phenotype resulting from this knockdown of expression may inform about the function of the targeted gene. However, off-target effects compromise the specificity of RNAi if sequence identity between siRNA and random mRNA transcripts causes RNAi to knock down expression of non-targeted genes. The complete off-target effects must be investigated systematically on each gene in a genome by adjusting a group of parameters, which is too expensive to conduct experimentally and motivates a study in silico. This computational study examined the potential for off-target effects of RNAi, employing the genome and transcriptome sequence data of Homo sapiens, Caenorhabditis elegans and Schizosaccharomyces pombe. The chance for RNAi off-target effects proved considerable, ranging from 5 to 80% for each of the organisms, when using as parameter the exact identity between any possible siRNA sequences (arbitrary length ranging from 17 to 28 nt) derived from a dsRNA (range 100–400 nt) representing the coding sequences of target genes and all other siRNAs within the genome. Remarkably, high sequence specificity and low probability for off-target reactivity were optimally balanced for siRNA of 21 nt, the length observed mostly in vivo. The chance for off-target RNAi increased (although not always significantly) with greater length of the initial dsRNA sequence, inclusion into the analysis of available untranslated region sequences and allowing for mismatches between siRNA and target sequences. siRNA sequences from within 100 nt of the 5′ termini of coding sequences had low chances for off-target reactivity. This may be owing to coding constraints for signal peptide-encoding regions of genes relative to regions that encode for mature proteins. Off-target distribution varied along the chromosomes of C. elegans, apparently owing to the use of more unique sequences in gene-dense regions. Finally, biological and thermodynamical descriptors of effective siRNA reduced the number of potential siRNAs compared with those identified by sequence identity alone, but off-target RNAi remained likely, with an off-target error rate of ~10%. These results also suggest a direction for future in vivo studies that could both help in calibrating true off-target rates in living organisms and also in contributing evidence toward the debate of whether siRNA efficacy is correlated with, or independent of, the target molecule. In summary, off-target effects present a real but not prohibitive concern that should be considered for RNAi experiments.

    INTRODUCTION

    RNA interference (RNAi) (1) is an intracellular mechanism for post-transcriptional gene silencing that most probably functions in the regulation of gene expression and defense against transposable DNA elements and viruses. RNAi is triggered by double-stranded RNA (dsRNA). Dicer, an enzyme with RNase activity, cleaves dsRNA into fragments of ~21 nt, termed short interfering RNA (siRNA). The siRNA associates with several proteins to form an RNAi silencing complex (RISC). The sequence of the minus-strand of the siRNA then

    *To whom correspondence should be addressed at Department of Computer Science, University of New Mexico, Farris Engineering Building Room 325, Albuquerque, NM 87131-1386, USA. Tel: +1 505 277 9609; Fax: +1 505 277 9627; Email: [email protected]

    © The Author 2005. Published by Oxford University Press. All rights reserved.

    The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]

    Nucleic Acids Research, 2005, Vol. 33, No. 6, 1834–1847. doi:10.1093/nar/gki324

    Published online March 30, 2005


    targets mRNA molecules that have sequence identity for cleavage by RISC. This sequence-directed removal of a particular mRNA transcript yields a knockdown of expression of the affected gene. Extensive investigations are ongoing to gain a more detailed understanding of RNAi. RNAi has been widely used as an experimental tool for the study of gene function and can be applied for large-scale analyses (2–4). RNAi has aroused a great deal of excitement in both therapeutic and genomic experimental communities because of its potential for the treatment of a wide spectrum of diseases, such as HIV (5,6), spinocerebellar ataxia type 1 and Huntington's disease (7), certain classes of cancers (8–10) and hypercholesterolemia (11,12), as well as its demonstrated use in functional genomic studies via controlled gene knockdown (13–15).

    Both dsRNA and siRNA have been used to knock down the expression of genes of interest. Resulting phenotypes are then used to infer gene function. Unfortunately, RNAi is not without some complications. Empirically, RNAi was shown to function in many different organisms. However, some organisms (Saccharomyces cerevisiae, Trypanosoma cruzi and Leishmania major) are considered to be RNAi-negative, based on the lack of experimental observations for specific knockdown of targeted genes and on the absence of components, such as Dicer and RISC, in the genes of these organisms that are critical for effective RNAi (16,17). More importantly, concern has arisen that the specificity of RNAi, targeted by the sequence of siRNA, may not be perfect. Initially, RNAi was regarded as a highly specific means of gene repression. Several studies dealing with various model systems supported this idea (13,18–20). However, siRNA can still direct RNAi to target mRNA sequences that lack complete sequence identity (21). Agrawal et al. (4) raised concerns over the specificity of gene repression in RNAi. Saxena et al. (22) demonstrated the effect of siRNA mismatches on target specificity in mammalian tissue culture cells and reported off-target gene knockdown. Sequence identity of as few as 11 contiguous nucleotides to siRNA caused direct silencing of non-target genes in experiments conducted on specificity of siRNA in cultured human cells (23). Scacheri et al. (24) pointed out that mismatches between siRNA and target sequences could have caused off-target RNAi in mammalian cells but that such effects are difficult to detect. Combined, the above examinations of RNAi off-target effects have yielded mixed results. Perhaps as a consequence, RNAi studies do not explicitly control for off-target effects on a routine basis.

    Of course, a lack of specificity resulting in knockdown of unknown or unintended genes has considerable negative implications for functional genomics. Target specificity is also of paramount importance when considering applications of RNAi in therapeutics (3,4). For clarification of these uncertainties regarding RNAi, the off-target effect should be evaluated for each gene expressed by the organism under study, by considering multiple possible factors affecting off-target silencing. Such comprehensive studies are most probably expensive and cumbersome to conduct experimentally. A computational approach is less expensive to implement and permits the extension of real parameters into wider ranges for fully observing the trends and effects upon RNAi specificity. This work represents a systematic computational study of RNAi-related off-target effects in several organisms.

    Current guidelines for the design of siRNA and dsRNA for RNAi experiments recommend BLAST similarity searches (25) against sequence databases to identify potential off-target genes to improve the likelihood that only the intended single gene is targeted (26). However, the BLAST algorithm was not specifically designed to assess RNAi off-target effects. Therefore, dedicated computational methods were developed for improved detection of sequence identity to accurately and systematically evaluate RNAi off-target effects between siRNA sequences and target genes on a transcriptome-wide scale. In this computational study, three organisms, Schizosaccharomyces pombe (fission yeast), Caenorhabditis elegans and Homo sapiens (human), were examined. The likelihood of off-target effects for all known genes in each of these organisms was evaluated, including factors that may impact the target specificity and efficiency of RNAi. These factors included the length of siRNA, the length of dsRNA, the length of siRNA-target sequence mismatch, the position of mismatch within the siRNA sequence, the position of dsRNA within its target, coding sequences (CDSs) and untranslated regions (UTRs) as targets for RNAi, the chromosomal location and density of genes, and the effect of siRNA selection by rational siRNA design (27). These analyses aimed to gain insights toward improving specificity of RNAi for functional genomics and potential future therapeutic application by facilitating a better understanding of off-target effects of RNAi. It would also be desirable to include effects such as RNAi directed against promoter regions, concentration dependences and the non-linear silencing effects of siRNA pools. Unfortunately, published empirical data on such effects are currently so sparse that we cannot construct a reasonable computational model for them; hence these classes of interactions are omitted from this study.

    MATERIALS AND METHODS

    Sequence data

    The sequence data used in this study were collected from S. pombe, C. elegans and H. sapiens. RNAi has been observed in each of these organisms and extensive sequence data, including full genome sequences, were available for analysis. These three organisms represent a wide phylogenetic range. We used the cDNA sequences of 5401 genes of S. pombe available at the Sanger Institute (ftp://ftp.sanger.ac.uk/pub/yeast/pombe). The cDNA sequences from 22 168 genes of C. elegans (release WS110) were obtained from the Wormbase at Sanger Institute. The collective sequence data considered to represent 3′-UTR sequences from C. elegans consisted of 1000 UTRs that were present in the expressed sequence tag database combined with sequences that resulted from the UTR prediction method of Hajarnavis et al. (28). The dataset of human genes representing 27 852 mRNAs with 3′-UTRs was taken from the RefSeq database at NCBI (http://www.ncbi.nlm.nih.gov).

    Modeling RNAi and off-target effects

    Although computational methods exist to model aspects of mechanisms that employ short RNA sequences to regulate gene expression, such as microRNA (miRNA) genes (29,30), miRNA targets (31,32) and siRNA efficacy (33–35), none was available to study RNAi off-target effects. Thus,


    dedicated computational methods were developed for improved detection of sequence identity to accurately and systematically evaluate RNAi off-target effects based on sequence identity between siRNA sequences and target genes on a transcriptome-wide scale.

    RNAi is guided by complete and near complete sequence identity of siRNA and the target mRNA transcript (21–23). siRNA sequences are generated by the activity of Dicer, an enzyme that cleaves long dsRNA into fragments of ~21 bp (19). To model RNAi, we determined the incidence of sequence identity (exact and allowing for some mismatch) of each of all possible siRNA sequences (arbitrary length range of 17–29 nt) derived from the length of dsRNA (100, 200, 300 and 400 nt starting at the first coding nucleotide, and the sequence region from 100 to 200 nt) representing any of the CDSs relative to all possible siRNA sequences predicted from the CDSs of each of the organisms studied. Sequence identity between an siRNA derived from a given gene and another gene was considered to signify a potential off-target RNAi. To mimic RNAi that is directed through siRNA with sequence identity to the UTRs of mRNA transcripts, both upstream and downstream UTR sequences (if available) were included for the analysis of off-target effects. With these variables, the effect of length of both siRNA and initial dsRNA upon the chance of off-target effects was investigated. This was implemented as follows.

    The similarity between two oligonucleotides is computed as the inner product in the feature space using the n-gram feature map, as described previously (36). The use of an inverted file and a red-black tree (RBT) for calculating the inner products in the feature space achieved efficient computational performance.
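    The modeling procedure described above (take a dsRNA window of length l at position pos within a CDS, derive every possible siRNA of length n from it, and flag any other gene that contains one of those siRNAs) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `dice` and `off_targets`, the 0-based `pos`, the toy sequences and the brute-force scan are hypothetical, and only exact identity (m = 0) is handled.

```python
from __future__ import annotations


def dice(cds: str, n: int = 21, l: int = 100, pos: int = 0) -> list[str]:
    """Return every contiguous n-mer of the dsRNA window cds[pos:pos + l].

    Hypothetical helper: pos is 0-based here, which is an assumption.
    """
    ds_rna = cds[pos:pos + l]
    return [ds_rna[i:i + n] for i in range(len(ds_rna) - n + 1)]


def off_targets(target: str, transcriptome: dict[str, str],
                n: int = 21, l: int = 100, pos: int = 0) -> set[str]:
    """Names of genes other than `target` that share at least one exact
    n-mer with the siRNA pool derived from the target's dsRNA window."""
    pool = set(dice(transcriptome[target], n, l, pos))
    hits = set()
    for name, seq in transcriptome.items():
        if name == target:
            continue  # identity with the intended target is not an error
        if any(seq[i:i + n] in pool for i in range(len(seq) - n + 1)):
            hits.add(name)
    return hits


# Toy data (invented for illustration); real runs would use n around 21.
toy = {"g1": "aacgacttg", "g2": "ttaacgt", "g3": "ccccccc"}
print(off_targets("g1", toy, n=3, l=6))
```

    A brute-force scan like this is only practical for toy inputs; the inverted file described below is what makes a genome-wide comparison feasible.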

    Computational representation of siRNA-target binding

    We describe each gene by its possible contiguous sub-sequences of length n (typically 21; Table 1 and Figure 1 explain the parameters used in our computational model), called n-grams or n-mers. We consider each gene to be a document described by its coordinates that are indexed into the n-mer space. More formally, each gene $g_x$ in the input space $X$, consisting of a sequence of characters drawn from the alphabet

    Table 1. List of algorithm parameters

    Parameter   Description
    n           siRNA length (nt)
    l           dsRNA length (nt)
    pos         Position of dsRNA within target (nt), starting from 5′ end of CDS
    m           Length of mismatch permitted (nt)
    mpos        Position of mismatch within siRNA
    u3          3′-UTR, true when 3′-UTR is included, false otherwise
    u5          5′-UTR, true when 5′-UTR is included, false otherwise
    r           Application of rational design rules, true when applied, false otherwise

    Figure 1. Graphic depiction of variables tested computationally to investigate chance of off-target effect in each of the three organisms (H. sapiens, C. elegans and S. pombe). (A) General considerations: a target sequence (representing one particular expressed mRNA) is used as the source of dsRNA from which a pool of all possible siRNA is derived (mimicking the action of Dicer). Each sequence within the siRNA pool was compared for sequence identity (exact: m = 0; with mismatch: m > 0) to all possible siRNA sequences in the transcriptome through the feature map $\Phi(\cdot)$ to determine chance of off-target errors. The parameters tested are as follows: (B) length of siRNA (n); (C) length of dsRNA (l); (D) addition of available UTR data in the target sequences (u3 and u5); and (E) position of the dsRNA along the target sequence (pos).


    $A = \{a, c, g, t\}$, $|A| = 4$, is mapped onto an n-gram feature space, $\mathbb{N}^{4^n}$, by the feature map of exact match

    $$\Phi^{ex}_n(g_x) = \left(\phi_a(g_x)\right)_{a \in A^n} \tag{1}$$

    where $\phi_a(g_x)$ is the number of times n-gram $a$ occurs in $g_x$. Therefore, the image of a gene $g_x$ consists of its coordinates in the feature space indexed by the number of occurrences of each of its constituent n-mers. A gene $g_y$ is said to match gene $g_x$ if the following condition is satisfied

    $$K(g_x, g_y) = \langle \Phi^{ex}_n(g_x), \Phi^{ex}_n(g_y) \rangle \geq T \tag{2}$$

    for a predefined threshold $T$. Then, the similarity measure defined using the inner product in the feature space, $K(g_x, g_y) = \langle \Phi^{ex}_n(g_x), \Phi^{ex}_n(g_y) \rangle$ in Equation 2, actually defines a kernel function that can be used in a support vector machine classifier (37). Here, we use the kernel to match two sequences instead of classification. For modeling RNAi, we choose $T = 1$, since any match between an siRNA and its target mRNA will cause the target to be knocked down. There is evidence that such similarity measures are appropriate models of RNAi off-target effects (23). However, using the feature map in Equation 1 with $T = 1$ establishes a lower bound on cross-reactivity. When more complicated RNAi-binding functions, such as mismatch, wobble and bulge, are modeled, different feature maps could be used than in Equation 1. Changing the value of $T$ can make the similarity measure stronger or weaker.

    A simple example helps to explain how Equation 2 works. To compute the similarity measure on short sequences $O_1$ = aacgac and $O_2$ = aacgtgg using 3-mer ($n = 3$) exact match, they are mapped onto the feature space as $\Phi^{ex}_n(O_1) = \{aac, acg, cga, gac\}$ and $\Phi^{ex}_n(O_2) = \{aac, acg, cgt, gtg, tgg\}$. Since 3-mers aac and acg occur in both of them, $\langle \Phi^{ex}_n(O_1), \Phi^{ex}_n(O_2) \rangle = 1 + 1 = 2$. Therefore, these two sequences match each other given the parameter and the criterion. For additional details on the kernel function and the computation of the similarity measure, we refer the interested reader to Ref. (38).
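    The worked example for Equation 2 can be reproduced with a few lines of Python. The sketch below is illustrative only: the names `feature_map`, `kernel` and `matches` are hypothetical, a `Counter` stands in for the sparse count vector of Equation 1, and the threshold test is assumed to be inclusive (a kernel value of at least T), which is what the choice T = 1 ("any match") implies.

```python
from collections import Counter


def feature_map(seq: str, n: int) -> Counter:
    """Exact-match n-gram feature map: n-mer -> number of occurrences."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def kernel(x: str, y: str, n: int) -> int:
    """Inner product of the two n-gram count vectors (Equation 2)."""
    fx, fy = feature_map(x, n), feature_map(y, n)
    # A Counter returns 0 for n-mers it has never seen.
    return sum(count * fy[ngram] for ngram, count in fx.items())


def matches(x: str, y: str, n: int, threshold: int = 1) -> bool:
    """Assumed inclusive threshold test; threshold = 1 means any shared n-mer."""
    return kernel(x, y, n) >= threshold


print(kernel("aacgac", "aacgtgg", 3))   # the 3-mers aac and acg are shared
print(matches("aacgac", "aacgtgg", 3))
```

    Because only the n-mers that actually occur are stored, the cost of one kernel evaluation depends on sequence length rather than on the 4^n dimensions of the feature space.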

    Computing the similarity of Equation 2 to find the off-target error in the genome using the vector space model directly requires $O(DF4^n)$ time, where $F$ ($40 \times 10^6$ for C. elegans and $60 \times 10^6$ for human) is the number of n-grams in the genome that may include UTR sequence and $D$ (close to $F$) is the number of n-grams to be compared in the CDSs. For genome-wide scanning, this computing time is prohibitive and can be improved by using the sparsity of the feature vectors. We use an inverted file where the n-grams serve as identifiers and their gene names and positions within the genes serve as attributes (the positions are used for mismatches later). If we ignore n-mers having zero occurrence and allow for the duplication of n-mers, a gene $g_x$ can be represented in the feature space compactly

    $$\Phi^{ex}_n(g_x) = \{(a_1, p_1), (a_2, p_2), \ldots, (a_{k_x}, p_{k_x})\} \tag{3}$$

    where $a_j$, $1 \leq j \leq k_x$, is the j-th n-gram, $p_j$ is its position on $g_x$ and $k_x$ is the number of n-mers in gene $g_x$ with UTRs being optionally included. If the length of $g_x$ is $L_x$, then $k_x = L_x - n + 1$. In the inverted file, the records for gene $g_x$ contain the triples $\langle a_1, g_x, p_1 \rangle, \langle a_2, g_x, p_2 \rangle, \langle a_3, g_x, p_3 \rangle, \ldots, \langle a_{k_x}, g_x, p_{k_x} \rangle$. The inverted file for the genome of an organism is the collection of the triples of its genes. To speed up computation, we sort the inverted file on the n-mer fields using an RBT, which is a balanced binary search tree and guarantees logarithmic search performance.
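    The compact representation of Equation 3 and the inverted file built from it might look as follows in Python. This is a sketch under assumptions rather than the published implementation: the function names are hypothetical, positions are taken to be 1-based, and a sorted Python list is substituted for the red-black tree (it offers the same logarithmic lookup for a file that is built once and then only searched).

```python
from __future__ import annotations


def gene_triples(name: str, seq: str, n: int) -> list[tuple[str, str, int]]:
    """Triples (n-mer, gene name, position) for one gene.

    A gene of length L_x yields L_x - n + 1 triples; positions are 1-based
    here, which is an assumption made for this sketch.
    """
    return [(seq[i:i + n], name, i + 1) for i in range(len(seq) - n + 1)]


def build_inverted_file(genes: dict[str, str], n: int) -> list[tuple[str, str, int]]:
    """Collect the triples of every gene and sort them on the n-mer field.

    A sorted list stands in for the red-black tree of the text.
    """
    records = [t for name, seq in genes.items()
               for t in gene_triples(name, seq, n)]
    records.sort()
    return records


toy_genes = {"gx": "aacgac", "gy": "aacgtgg"}   # invented toy sequences
for record in build_inverted_file(toy_genes, 3):
    print(record)
```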

    $K(g_x, g_y)$ in Equation 2 is computed by searching each n-mer of $g_x$ for $g_y$ in the inverted file. $K(g_x, g_y)$ is the number of occurrences of $g_y$ among the matched genes. Each search in the RBT takes $O(\log F)$ time, resulting in a time of $O(k_x \log F)$ for computing $K(g_x, g_y)$.
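    The lookup step can be illustrated with binary search over a sorted index. In this hedged sketch the standard-library `bisect` module replaces the red-black tree, and the function name `kernel_from_index` and the two parallel lists (`keys` for the sorted n-mers, `owners` for the gene each record belongs to) are hypothetical choices made for the example.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right


def kernel_from_index(ngrams_of_gx: list[str], keys: list[str],
                      owners: list[str], gy: str) -> int:
    """Count how often gene gy appears among the records matched by the
    n-mers of gx. keys must be sorted; owners[i] is the gene of keys[i]."""
    total = 0
    for a in ngrams_of_gx:
        lo = bisect_left(keys, a)    # first record whose n-mer equals a
        hi = bisect_right(keys, a)   # one past the last such record
        total += sum(1 for owner in owners[lo:hi] if owner == gy)
    return total


# Toy index over two invented genes, using 3-mers.
genes = {"gy": "aacgtgg", "gz": "ttaacc"}
records = sorted((seq[i:i + 3], name)
                 for name, seq in genes.items()
                 for i in range(len(seq) - 2))
keys = [r[0] for r in records]
owners = [r[1] for r in records]

gx_ngrams = ["aac", "acg", "cga", "gac"]     # the 3-mers of aacgac
print(kernel_from_index(gx_ngrams, keys, owners, "gy"))
print(kernel_from_index(gx_ngrams, keys, owners, "gz"))
```

    Each n-mer of the query costs two binary searches, so the total work grows with the number of query n-mers times the logarithm of the index size, in line with the bound stated above.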

    Definition of off-target error rate

    We define the off-target error using the exact match feature map. However, it is the same for the mismatch feature map defined later. To simulate Dicer's cleavage of dsRNA into siRNAs, we take an oligonucleotide, $o_x$, as dsRNA from gene $g_x$ and map it onto the feature space, expressed compactly as

    $$\Phi^{ex}_n(o_x) = \{(s_1, p_1), (s_2, p_2), \ldots, (s_{l_x}, p_{l_x})\} \tag{4}$$

    where $s_j$, $1 \leq j \leq l_x$, is the j-th n-mer in $o_x$ and $l_x$ is the maximum number of n-grams in $o_x$. To obtain the matched genes based on Equation 2, we compute the inner product $\langle \Phi^{ex}_n(o_x), \Phi^{ex}_n(g_y) \rangle$ for each gene $g_y$ in the genome, 1

