
Published online 27 November 2007. Nucleic Acids Research, 2008, Vol. 36, Database issue D141–D148. doi:10.1093/nar/gkm982

GRSDB2 and GRS_UTRdb: databases of quadruplex forming G-rich sequences in pre-mRNAs and mRNAs

Oleg Kikin², Zachary Zappala¹, Lawrence D’Antonio² and Paramjeet S. Bagga²

¹Bergen County Academies, Hackensack and ²Bioinformatics, School of Theoretical and Applied Science, Ramapo College of New Jersey, Mahwah, NJ, USA

Received September 16, 2007; Accepted October 18, 2007

ABSTRACT

G-quadruplex motifs in the RNA play significant roles in key cellular processes and human disease. While sequences capable of forming G-quadruplexes in the pre-mRNA are involved in regulation of polyadenylation and splicing events in mammalian transcripts, the G-quadruplex motifs in the UTRs may help regulate mRNA expression. GRSDB2 is a second-generation database containing information on the composition and distribution of putative Quadruplex-forming G-Rich Sequences (QGRS) mapped in about 29 000 eukaryotic pre-mRNA sequences, many of which are alternatively processed. The data stored in GRSDB2 is based on computational analysis of NCBI Entrez Gene entries with the help of an improved version of the QGRS Mapper program. The database allows complex queries with a wide variety of parameters, including Gene Ontology terms. The data is displayed in a variety of formats with several additional computational capabilities. We have also developed a new database, GRS_UTRdb, containing information on the composition and distribution patterns of putative QGRS in the 5′- and 3′-UTRs of eukaryotic mRNA sequences. The goal of these experiments has been to build freely accessible resources for exploring the role of G-quadruplex structure in regulation of gene expression at post-transcriptional level. The databases can be accessed at the G-Quadruplex Resource Site at http://bioinformatics.ramapo.edu/GQRS/.

INTRODUCTION

The G-rich polynucleotide molecule can repeatedly fold on itself to form a unimolecular quadruplex structure consisting of stacked G-tetrads, which are square co-planar arrays of four guanine bases each (1). Although G-quadruplexes can also be formed by association of two or four molecules, the present work focuses only on the unimolecular quadruplexes, which are more likely to be encountered in physiological conditions (2,3).

G-quadruplexes have come into the limelight in recent years, especially because of increasing indication for their diverse roles in key cellular processes, human disease and as targets for therapy (4–10). Production of G-quadruplexes has been shown to occur cotranscriptionally in the G-rich complementary DNA strands (11). Formation of RNA G-quadruplex structures in vivo has also been demonstrated (12). In fact, RNA is more likely to form stable G-quadruplexes than DNA in vivo (13,14). G-quadruplex motifs in the RNA have been shown to play significant roles in mRNA turnover (4) and FMRP binding (15). Genes containing FMRP-binding sites may be regulated by a common pathway. At least two such genes have been suggested to be involved in autism (16). We have previously shown that interaction of a G-rich Sequence (GRS) with hnRNP H/H′ can modulate 3′-end processing of mammalian pre-mRNAs (17–19). Recently, Furger and coworkers have also found that 3′-end processing of melanocortin receptor 1 is regulated by interaction of two G-rich elements with hnRNP H/H′ (20). We have determined, using the QGRS Mapper software program that we had developed earlier (21), that GRS in the above studies are potentially capable of forming stable G-quadruplexes. The hnRNPs H/H′ and F that bind to GRS are known to regulate polyadenylation and splicing events in mammalian transcripts (22–24). The hnRNP A1, which is found in alternative splicing reactions, also has a demonstrated affinity for the G-quadruplex structure (25). G-rich motifs that may fold into quadruplexes in the vicinity of RNA-processing sites act as regulators by interacting with hnRNP A1, H/H′ or F proteins (17–19,24,26,27). The majority of human genes are known to undergo alternative polyadenylation (28) or alternative splicing (29). The role of quadruplex structure in regulating RNA processing, which is an essential component of differential gene expression, needs to be explored.

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint Authors.

To whom correspondence should be addressed. Tel: +1 201 684 7722; Fax: +1 201 684 7637; Email: pbagga@ramapo.edu

© 2007 The Author(s)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution and reproduction in any medium, provided the original work is properly cited.

Prevalence of G-quadruplexes in the human genome has been established (30,31). In a recent study, gene function was found to be associated with potential for G-quadruplex formation (32). However, there is a paucity of systematic studies focusing on the analysis of G-quadruplex motifs near RNA processing sites of genes, especially those that are alternatively processed. Genes that contain G-quadruplex forming sequences are likely to be regulated via special mechanisms (32). Our group has been interested in studying the role of G-quadruplexes in regulation of gene expression at post-transcriptional level. We have adopted a bioinformatics approach to study composition and patterns of G-quadruplexes in pre-mRNA and mRNA sequences.

We had previously built GRSDB, a database of mapped G-quadruplex sequences in selected alternatively processed human and mouse genes (33). GRSDB2 is a second-generation database and contains information on composition and distribution of putative Quadruplex-forming G-Rich Sequences (QGRS) mapped in a large number of eukaryotic pre-mRNA sequences, many of which are alternatively processed (alternatively spliced or alternatively polyadenylated). The data stored in GRSDB2 is based on computational analysis of NCBI Entrez Gene entries and their corresponding annotated genomic nucleotide sequences of RefSeq/GenBank. GRSDB2 has been built with a new and much improved version of the QGRS Mapper program (21). It contains data from about 29 000 eukaryotic genes, including genes from other organisms in addition to human and mouse. The data model of GRSDB2 differs from that of the first version in that it is centered around Entrez Gene rather than solely GenBank/RefSeq nucleotide entries. The search module has been greatly enhanced, making it possible to generate complex queries to search the database with a wide variety of parameters, including Gene Ontology terms. The user may select subsets of genes from a query and perform further computations on these genes through a ‘Workbench’. It is also possible to define the composition and size of G-quadruplexes to be displayed by applying a variety of filters through the ‘options’ menu. The ‘Gene View’, ‘Data View’ and a highly interactive ‘Graphic View’ for individual database entries have been significantly enhanced with several additional computational capabilities and links. The data can now be exported into Excel for further analysis. In addition, we have added a ‘Sequence View’, which displays mapped G-quadruplexes in the context of the pre-mRNA sequence.

G-quadruplexes in the mRNA can influence translation initiation (34) as well as repression (35). Recently, a G-quadruplex in the 5′-UTR (untranslated region) of NRAS proto-oncogene mRNA was found to inhibit its translation (14). This study also found G-quadruplexes in the 5′-UTR of many other genes. The UTRs of mRNAs contain motifs that are vital for regulation of post-transcriptional gene expression. Much attention has been paid to studying the composition of regulatory RNA motifs and the mechanism of their interactions with the cellular machinery (36). Our preliminary bioinformatics studies have found notable frequencies of G-quadruplex motifs in the 5′- as well as 3′-UTRs of mammalian mRNAs.

More detailed studies are needed to investigate the role of UTR G-quadruplex structure in regulating post-transcriptional gene expression. We have developed a new database, GRS_UTRdb, which contains information on the composition and distribution patterns of putative Quadruplex forming GRS in the 5′- and 3′-UTRs of eukaryotic mRNA sequences. The data stored in GRS_UTRdb is based on computational analysis of NCBI Entrez Gene entries and their corresponding annotated nucleotide sequences of RefSeq/GenBank. The computations were performed with the help of an extension of the existing QGRS Mapper program (21).

Both the GRSDB2 and GRS_UTRdb databases can be accessed at the G-Quadruplex Resource Site at http://bioinformatics.ramapo.edu/GQRS/. The goal of these experiments has been to build resources for exploring the role of G-quadruplex structure in regulation of gene expression at post-transcriptional level. Researchers will find both websites to be user-friendly, with comprehensive help sections as well as context-specific help where it is needed. Investigators interested in the functional relevance of G-quadruplex structure, in particular its role in regulating gene expression at post-transcriptional level, will find both databases to be of great value. While GRSDB2 is useful for studying G-quadruplexes near RNA-processing sites, particularly in alternatively processed pre-mRNAs, GRS_UTRdb offers a resource for investigating G-quadruplexes in the untranslated regions of mRNA. Both websites allow a comprehensive large-scale analysis as well as detailed studies in individual genes.

G-QUADRUPLEX MOTIF

The G-quadruplex motif may be written $G_{x}N_{a}G_{x}N_{b}G_{x}N_{c}G_{x}$, namely four guanine groups of equal size (which we call G-groups) interspersed by three arbitrary nucleotide sequences called loops. The size of each G-group corresponds to the number of stacked G-tetrads forming the quadruplex structure. We have previously described the G-quadruplex motif in more detail (21).
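As an illustration only, the motif definition above can be turned into a small exhaustive search. The Python sketch below is not the QGRS Mapper implementation: the function name `find_qgrs`, the 0-based end-exclusive coordinates and the requirement that every loop be at least `min_loop` nucleotides long are assumptions made here, and the example sequence is invented.

```python
def find_qgrs(seq, x=2, max_len=30, min_loop=1):
    """List every G_x N_a G_x N_b G_x N_c G_x candidate in seq.

    Returns (start, end, (a, b, c)) tuples with 0-based, end-exclusive
    coordinates. All overlapping candidates are reported.
    """
    seq = seq.upper()
    g_group = "G" * x
    # Every position at which a run of x consecutive G's begins.
    starts = [i for i in range(len(seq) - x + 1) if seq[i:i + x] == g_group]
    start_set = set(starts)
    hits = []
    for s1 in starts:
        # Latest start of the fourth G-group that keeps the total length <= max_len.
        last = s1 + max_len - x
        for s2 in range(s1 + x + min_loop, last + 1):
            if s2 not in start_set:
                continue
            for s3 in range(s2 + x + min_loop, last + 1):
                if s3 not in start_set:
                    continue
                for s4 in range(s3 + x + min_loop, last + 1):
                    if s4 in start_set:
                        loops = (s2 - s1 - x, s3 - s2 - x, s4 - s3 - x)
                        hits.append((s1, s4 + x, loops))
    return hits


example = "AAGGAGGAGGAGGAA"  # made-up sequence, not taken from the databases
for start, end, loops in find_qgrs(example, x=2, max_len=30):
    print(start, end, example[start:end], loops)
```

Calling the function once per G-group size (x = 2, 3, ...) covers motifs with "at least" a given number of G's per group; loops are allowed to contain guanines in this sketch.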

The potential of G-quadruplex to influence gene expression relies on the stability of the structure. Stability of the G-quadruplex is considered to be linked to its loop lengths and the number of G-tetrads in the folded structure (37–39). While quadruplexes with at least three G-tetrads have been accepted as stable structures, two G-tetrad quadruplexes are not uncommon (40,41). In fact, a stable two G-tetrad RNA G-quadruplex that is capable of significantly influencing gene expression in vivo has recently been reported (12). Lower stability, in fact, may allow more sensitive control of gene expression (12). Two G-tetrads, although relatively lower in stability, are expected to be far more prevalent in the genomes as compared to the three G-tetrads.

METHODS

GRSDB2 and GRS_UTRdb are relational databases developed with MySQL and store non-redundant data. Interfaces for the databases were built using PHP and Java. The databases have been populated with the help of custom software developed previously by us to analyze NCBI Entrez Gene entries (21). The QGRS are mapped within the relevant gene sequence and assigned a computed value called a G-score, which rewards those sequences deemed more likely to form a stable complex (21).

Structure and features of GRSDB2

GRSDB2 contains information on the composition and distribution of QGRS mapped in the eukaryotic pre-mRNA sequences.

We have made an effort to include all possible G-quadruplexes. Users may search for QGRS containing G-groups of 2, 3 or more. Also, the length of QGRS and loop size are search parameters that the user may set. There are two categories of QGRS that are regularly used in GRSDB2: (30,2) refers to QGRS at most 30 nt long and having at least 2 G’s per G-group, while (45,3) refers to QGRS at most 45 nt long and having at least 3 G’s per G-group. The overall statistics for the database are shown in Table 1.

Queries may be performed using a variety of search fields. Several fields for gene identifiers are provided: GeneID, Gene Symbol, Gene Name, Aliases, GI and Accession Number. Since alternatively processed genes are a focus of the database, the user may look for genes with a specified number of products and poly(A) signals. The user may also specify which organism(s) to consider.

A significant feature of GRSDB2 is the ability it affords the user to look for correlations of occurrences of G-quadruplexes with gene ontology terms. Queries can specify gene ontology function, process or component. The GO terms are an example of a search field for which the user is not required to exactly match the database entry. Instead, searches may be done for which one or more fields start with, end with or contain the query value.

The columns of the query results page consist of the Gene Name, GeneID, Organism, Gene Size, Accession Number, Number of Products and Number of Poly(A) Signals (Figure 1). The results may be sorted on any of these columns, with Gene Name the default sort field. The RefSeq status of each entry is also listed in the table.

The user may analyze sets of genes by putting them into a ‘Workbench’, where a variety of programs are available to study QGRS distribution patterns for that particular gene set. There are four controls at the bottom of the results page for working with the ‘Workbench’. The user may add marked genes, add all genes from the query, clear all genes from the ‘Workbench’ or analyze the genes on the ‘Workbench’.

Figure 1. GRSDB2 Query Results Page. Results of a query for alternatively spliced human or rat genes involved in apoptosis. The results may be sorted by clicking the header of any column. At the bottom of the screen, there are four controls allowing the user to add and clear genes from a ‘Workbench’. Several programs are provided on the ‘Workbench’ for further analysis of QGRS from any set of genes in the database.

Table 1. Statistics for GRSDB2

Organism | Number of genes | Alternatively spliced (%) | Average gene size | Average number of products | Number of (30,2) QGRS | Number of (45,3) QGRS
Homo sapiens | 10 475 | 3197 (30.5) | 61 401 | 1.54 | 2 391 014 | 196 949
Mus musculus | 2008 | 421 (21) | 52 715 | 1.33 | 371 809 | 31 108
Drosophila melanogaster | 12 223 | 3018 (24.7) | 5361 | 1.46 | 190 325 | 7830
Rattus norvegicus | 1477 | 37 (2.5) | 7907 | 1.04 | 39 650 | 2948
Caenorhabditis elegans | 3054 | 1085 (35.5) | 4478 | 1.54 | 20 194 | 309
Gallus gallus | 41 | 0 (0) | 19 878 | 1 | 2349 | 211
Bos taurus | 3 | 0 (0) | 8410 | 1 | 201 | 34
Danio rerio | 7 | 0 (0) | 15 912 | 1 | 141 | 3
Total | 29 288 | 7758 | | | 3 015 683 | 239 392
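The flexible matching offered for GO terms in this section (start with, end with, contain), combined with organism and product-count filters, maps naturally onto SQL `LIKE` patterns. The sketch below is purely hypothetical: it uses Python's `sqlite3` with an invented two-table schema (`gene`, `go_term`) and placeholder gene names, since the actual GRSDB2 MySQL schema and PHP/Java query code are not described in the text.

```python
import sqlite3

# How each matching mode wraps the (escaped) query value for LIKE.
LIKE_TEMPLATES = {"starts with": "{}%", "ends with": "%{}", "contains": "%{}%"}


def like_pattern(value, mode):
    """Escape LIKE wildcards in value, then wrap it for the chosen mode."""
    escaped = value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return LIKE_TEMPLATES[mode].format(escaped)


def search_by_go(conn, value, mode="contains", organisms=None, min_products=1):
    """Return gene symbols whose GO term matches value (hypothetical schema)."""
    sql = ("SELECT DISTINCT g.symbol FROM gene g "
           "JOIN go_term t ON t.gene_id = g.gene_id "
           "WHERE t.term LIKE ? ESCAPE '\\' AND g.n_products >= ?")
    params = [like_pattern(value, mode), min_products]
    if organisms:
        sql += " AND g.organism IN (%s)" % ",".join("?" * len(organisms))
        params.extend(organisms)
    sql += " ORDER BY g.symbol"
    return [row[0] for row in conn.execute(sql, params)]


def demo_connection():
    """Build an in-memory database with placeholder rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT,
                           organism TEXT, n_products INTEGER);
        CREATE TABLE go_term (gene_id INTEGER, category TEXT, term TEXT);
    """)
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", [
        (1, "GENE_A", "Homo sapiens", 3),
        (2, "GENE_B", "Rattus norvegicus", 1),
        (3, "GENE_C", "Mus musculus", 2),
    ])
    conn.executemany("INSERT INTO go_term VALUES (?, ?, ?)", [
        (1, "process", "induction of apoptosis"),
        (2, "process", "apoptosis"),
        (3, "function", "RNA binding"),
    ])
    return conn


conn = demo_connection()
# Alternatively processed (>= 2 products) human or rat genes with an apoptosis term.
print(search_by_go(conn, "apoptosis", "contains",
                   organisms=["Homo sapiens", "Rattus norvegicus"], min_products=2))
```

Passing the user's value as a bound parameter, and escaping `%` and `_`, keeps the three matching modes from being confused with wildcards typed by the user.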

There are five programs available on the ‘Workbench’. One program reports various statistics of the selected genes, similar to the statistics page for the entire database. There are two programs [one for (30,2) QGRS, the other for (45,3) QGRS] summarizing the distribution of QGRS with respect to location in exons, introns and near poly(A) signals (which is defined to be within 200 nt). Additionally, there are two programs showing the distribution of G-scores for the selected genes. The user is given the opportunity to export the output of any of these programs to Excel for further analysis.
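A location summary of the kind described above could be computed along the following lines. This Python sketch is an assumption-laden illustration, not the Workbench code: the function names, the 0-based end-exclusive coordinates, the separate label for QGRS that straddle an exon/intron boundary and the example coordinates are all invented here; only the 200 nt window comes from the text.

```python
from collections import Counter


def classify_location(start, end, exons, polya_positions, window=200):
    """Return (region, near_polya) for a QGRS occupying [start, end)."""
    # Number of QGRS bases that fall inside exons (exons assumed non-overlapping).
    exonic = sum(max(0, min(end, e_end) - max(start, e_start))
                 for e_start, e_end in exons)
    if exonic == end - start:
        region = "exon"
    elif exonic == 0:
        region = "intron"
    else:
        region = "exon/intron boundary"
    # Gap between the QGRS and each poly(A) signal position (0 if it lies inside).
    near = any(max(start - p, p - (end - 1), 0) <= window
               for p in polya_positions)
    return region, near


def summarize(qgrs_list, exons, polya_positions, window=200):
    """Count QGRS per region and how many lie near a poly(A) signal."""
    counts = Counter()
    for start, end in qgrs_list:
        region, near = classify_location(start, end, exons, polya_positions, window)
        counts[region] += 1
        if near:
            counts["near poly(A) signal"] += 1
    return counts


# Invented coordinates for a two-exon gene with one poly(A) signal.
exons = [(0, 100), (500, 600)]
polya = [580]
qgrs = [(10, 30), (200, 225), (390, 410), (95, 110), (550, 570)]
print(summarize(qgrs, exons, polya))
```

Note that "near a poly(A) signal" is counted independently of the exon/intron label, so one QGRS can contribute to both tallies.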

On the results page, the user may select a particular gene for analysis by clicking on an entry in the Gene Name column. GRSDB2 has five interfaces for viewing information about QGRS: Gene View, Data View (no overlaps), Data View (with overlaps), Sequence View and Graphic View.

Figure 2. GRSDB2 CCRK Gene View. Provides basic gene information, including the number of products and poly(A) signals, and gene ontology terms for that gene. QGRS counts are displayed for the (30,2) and (45,3) categories, together with non-overlapping versus overlapping QGRS. Additional QGRS information, together with an exon/intron map, is provided for each mRNA product. There are controls to navigate to any of the other views.


The Gene View has a table presenting basic information on the properties of the gene, the number of QGRS found and the gene ontology terms associated with the gene (Figure 2). For each alternatively spliced product, a map of the exon and intron structure is given, together with QGRS information for that product.

In the Gene View, the user has several options available to filter which QGRS will be displayed in the Data and Sequence Views. For example, the G-score, loop size, minimum G-group size and maximum QGRS length may be set by the user.
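The display filters listed above, together with the (30,2) and (45,3) categories defined earlier, amount to simple predicates on each QGRS record. The following Python sketch shows one way to express them; the `QGRS` record type, the function names and the example G-score values are assumptions for illustration (the G-scores are arbitrary placeholders, since the scoring formula is given in ref. 21 and not reproduced here).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QGRS:
    sequence: str
    g_group: int                 # number of G's in each G-group
    loops: Tuple[int, int, int]  # lengths of the three loops
    g_score: float               # placeholder values in the examples below


def in_category(q: QGRS, max_len: int, min_g: int) -> bool:
    """True if q belongs to category (max_len, min_g), e.g. (30, 2) or (45, 3)."""
    return len(q.sequence) <= max_len and q.g_group >= min_g


def passes_filters(q: QGRS,
                   score_range: Optional[Tuple[float, float]] = None,
                   loop_range: Optional[Tuple[int, int]] = None,
                   min_g: int = 2,
                   max_len: int = 45) -> bool:
    """Apply the user-settable filters; a range of None means 'no restriction'."""
    if not in_category(q, max_len, min_g):
        return False
    if score_range is not None and not (score_range[0] <= q.g_score <= score_range[1]):
        return False
    if loop_range is not None and not all(loop_range[0] <= n <= loop_range[1]
                                          for n in q.loops):
        return False
    return True


# Invented records; the G-scores are arbitrary and not computed by QGRS Mapper.
two_tetrad = QGRS("GGAGGAGGAGG", 2, (1, 1, 1), 10.0)
three_tetrad = QGRS("GGGACGGGACGGGACGGG", 3, (2, 2, 2), 40.0)

for q in (two_tetrad, three_tetrad):
    print(q.sequence, in_category(q, 30, 2), in_category(q, 45, 3),
          passes_filters(q, score_range=(0, 25), loop_range=(1, 7)))
```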

From the Gene View, the user may choose any of the other interfaces. The Data View displays the actual nucleotide sequence for each QGRS and locates its position in an exon, intron or near a poly(A) signal (Figure 3). The results of this page may be exported to Excel. There are two versions of the Data View. One view shows only non-overlapping QGRS; the other view displays all QGRS, overlapping or not.
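The text does not state how the non-overlapping set is chosen from all candidate QGRS. One plausible rule, shown here only as an assumption, is a greedy selection that keeps the highest-scoring candidate first and discards anything overlapping it. In this Python sketch each QGRS is a hypothetical `(start, end, score)` tuple with end-exclusive coordinates, and the scores are arbitrary.

```python
def non_overlapping(qgrs):
    """Greedy pick: highest score first, skipping candidates that overlap a pick.

    qgrs is an iterable of (start, end, score) tuples, end-exclusive.
    This selection rule is an assumption, not the documented GRSDB2 behaviour.
    """
    chosen = []
    # Sort by descending score; ties are broken by leftmost start.
    for cand in sorted(qgrs, key=lambda q: (-q[2], q[0])):
        if all(cand[1] <= c[0] or c[1] <= cand[0] for c in chosen):
            chosen.append(cand)
    return sorted(chosen)


candidates = [(0, 20, 10), (15, 30, 21), (28, 40, 5), (50, 60, 7)]  # invented
print(non_overlapping(candidates))
```

The "with overlaps" view corresponds to showing `candidates` unchanged, while the "no overlaps" view corresponds to the filtered list.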

In the Sequence View, the nucleotide sequence for the entire gene is displayed. Exons are listed in purple and each QGRS is shown in yellow. The Graphic View gives the user a highly interactive visual tool to zoom in on any portion of the gene and analyze QGRS located in that section.

Structure and features of GRS_UTRdb

GRS_UTRdb contains information on the composition and distribution of QGRS in the UTRs of eukaryotic mRNA sequences.

Database users have 14 search fields to define queries. Fields such as GeneID, Gene Name/Symbol, GI and mRNA Accession Number allow the user to look for specific genes and mRNA products. There are fields for gene ontology function, process or component. The user may select which organism to search on. Also, ranges for the lengths of mRNA, 5′ UTR, 3′ UTR or CDS may be specified.

Query results are summarized in a table from which the user may select a product for further analysis. GRS_UTRdb has six ways of viewing mRNA data: mRNA Map, Data View (no overlaps), Data View (with overlaps), Sequence View, Gene View and Alternate Products.

The mRNA map contains a table showing an overview of information about the product, including the lengths of the 5′ UTR, CDS and 3′ UTR (Figure 4). The number of QGRS found in each region is displayed. There is a visual map of the locations of QGRS within each region of the product.

The Data View shows the actual sequence of every QGRS in the product. The view gives the location of each QGRS in the 5′ UTR, CDS, 3′ UTR or near a poly(A) signal. Also, the G-score and the distance of QGRS from the nearest region boundaries are shown. There are two versions of the Data View: one that displays only non-overlapping QGRS and a view that shows all QGRS.

The sequence for the entire product may be found in the Sequence View, and separate displays are shown for each region in the product. The QGRS are shown in a box with each G-group in purple (Figure 5).
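As a rough illustration of how a QGRS might be assigned to a region of an mRNA product and its distances to the region boundaries obtained, consider the Python sketch below. The function name, the 0-based end-exclusive coordinates, the definition of distance (nucleotides between the QGRS and each end of its region) and the treatment of QGRS that cross a boundary are assumptions made here; the example coordinates are invented.

```python
def describe_qgrs(start, end, cds_start, cds_end, mrna_len):
    """Return (region, dist_to_region_start, dist_to_region_end) for [start, end).

    The CDS occupies [cds_start, cds_end); everything before it is the
    5' UTR and everything after it is the 3' UTR.
    """
    regions = [
        ("5' UTR", 0, cds_start),
        ("CDS", cds_start, cds_end),
        ("3' UTR", cds_end, mrna_len),
    ]
    for name, r_start, r_end in regions:
        if r_start <= start and end <= r_end:
            return name, start - r_start, r_end - end
    # The QGRS is not contained in a single region.
    return "spans a region boundary", None, None


# Invented product: 600 nt long with the CDS at positions 100..399.
for qgrs in [(20, 45), (150, 170), (450, 480), (90, 110)]:
    print(qgrs, describe_qgrs(*qgrs, cds_start=100, cds_end=400, mrna_len=600))
```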

Figure 3. GRSDB2 CCRK Data View. A listing of the nucleotide sequences for all QGRS and the location of QGRS in exons, introns and near poly(A) signals. Results are shown for each product of the gene. What is shown is a truncated image of the view. The QGRS displayed satisfy the conditions that G-scores are in the range from 0 to 25 (which in effect restricts the output to QGRS with G-groups of size 2) and loop size is in the range from 1 to 7.


Figure 5. GRS_UTRdb NM_198709.1 Sequence View. Shows the nucleotide sequence for the entire product in separate displays for the 5′ UTR, CDS and 3′ UTR. QGRS are enclosed in a box with each G-group shown in purple.

Figure 4. GRS_UTRdb NM_198709.1 mRNA (ARSB Gene) Map. Displays basic information about the 5′ UTR, CDS and 3′ UTR and the QGRS frequency in each region. A visual map of each region is displayed with the relative positions of QGRS depicted in the map.


The Gene View gives basic information about genes, including gene ontology terms and the location of poly(A) signals and sites. Each mRNA product is listed together with the distribution of QGRS there. The ‘Alternate Products’ tab takes the user directly to the QGRS information for each product associated with the gene.

CONCLUSIONS

GRSDB2 and GRS_UTRdb provide curated data on the composition and distribution of putative QGRS in the transcribed regions of a large number of alternatively processed eukaryotic genes. GRSDB2 is useful for studying G-quadruplexes near RNA-processing sites, particularly those that are differentially processed. At present, it contains data for 29 288 genes encompassing 42 932 products from several eukaryotic organisms. More than 3 million QGRS have been mapped to these genes. The availability of a large number of pre-mRNAs with mapped QGRS makes it possible to perform a variety of bioinformatics studies. The database website already offers a range of computational tools to aid large-scale as well as individual gene analysis. The ‘Workbench’ can be used to perform computations on sets and sub-sets of genes in the database. The ‘Gene View’, ‘Data View’ and ‘Sequence View’ are useful for studying individual genes and their multiple products. The highly interactive ‘Graphic View’ is particularly useful for working with parts of the genes.

The new GRS_UTRdb offers a valuable resource for investigating G-quadruplexes in the UTRs of mRNA. Currently, it contains data for more than 16 000 eukaryotic mRNAs, including 27 000 QGRS which have been mapped to the 5′ UTRs. Like GRSDB2, GRS_UTRdb also displays QGRS data in a variety of modes with computational capabilities and links. At this point, it does not have a ‘Workbench’ facility. We are constantly adding new genes and new computational tools to the website. Since genes containing G-quadruplex motifs could be regulated through special mechanisms, one can expect gene function to correlate with G-quadruplex formation (32). We have classified the gene entries in our databases according to the gene ontology categories, which allows for queries with relevant terms.

Researchers interested in the functional relevance of G-quadruplex structure, in particular its role in regulating gene expression at post-transcriptional level, will find both databases to be of great value.

ACKNOWLEDGEMENTS

We thank Manuel Viotti for assistance with the testing and uploading of data in the initial stages of this project. The authors would like to acknowledge Rumen Kostadinov, who developed the first version of GRSDB. We would also like to acknowledge Marcelo Halpern for technical assistance with the web server. This project was funded in part by a grant from the Provost Office of Ramapo College of New Jersey. Funding for the Open Access publication charges for this article was provided in part by the Divisions of Student Affairs and Academic Affairs of Ramapo College of New Jersey and the Bergen County Academies, Hackensack, New Jersey.

Conflict of interest statement. None declared.

REFERENCES

1. Gellert, M., Lipsett, M.N. and Davies, D.R. (1962) Helix formation by guanylic acid. Proc. Natl Acad. Sci. USA, 48, 2013–2018.

2. Schaffitzel, C., Berger, I., Postberg, J., Hanes, J., Lipps, H.J. and Plückthun, A. (2001) In vitro generated antibodies specific for telomeric guanine-quadruplex DNA react with Stylonychia lemnae macronuclei. Proc. Natl Acad. Sci. USA, 98, 8572–8577.

3. Halder, K. and Chowdhury, S. (2005) Kinetic resolution of bimolecular hybridization versus intramolecular folding in nucleic acids by surface plasmon resonance: application to G-quadruplex/duplex competition in human c-myc promoter. Nucleic Acids Res., 33, 4466–4474.

4. Simonsson, T. (2001) G-quadruplex DNA structures – variations on a theme. Biol. Chem., 382, 621–628.

5. Davis, J.T. (2004) G-quartets 40 years later: from 5′-GMP to molecular biology and supramolecular chemistry. Angew. Chem. Int. Ed. Engl., 43, 668–698.

6. Kelland, L.R. (2005) Overcoming the immortality of tumour cells by telomere and telomerase based cancer therapeutics – current status and future prospects. Eur. J. Cancer, 41, 971–979.

7. Burge, S., Parkinson, G.N., Hazel, P., Todd, A.K. and Neidle, S. (2006) Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res., 34, 5402–5415.

8. Maizels, N. (2006) Dynamic roles for G4 DNA in the biology of eukaryotic cells. Nat. Struct. Mol. Biol., 13, 1055–1059.

9. Paeschke, K., Simonsson, T., Postberg, J., Rhodes, D. and Lipps, H.J. (2005) Telomere end-binding proteins control the formation of G-quadruplex DNA structures in vivo. Nat. Struct. Mol. Biol., 12, 847–854.

10. Todd, A.K., Haider, S.M., Parkinson, G.N. and Neidle, S. (2007) Sequence occurrence and structural uniqueness of a G-quadruplex in the human c-kit promoter. Nucleic Acids Res., 35, 5799–5808.

11. Duquette, M.L., Handa, P., Vincent, J.A., Taylor, A.F. and Maizels, N. (2004) Intracellular transcription of G-rich DNAs induces formation of G-loops, novel structures containing G4 DNA. Genes Dev., 18, 1618–1629.

12. Wieland, M. and Hartig, J.S. (2007) RNA quadruplex-based modulation of gene expression. Chem. Biol., 14, 757–763.

13. Sacca, B., Lacroix, L. and Mergny, J.L. (2005) The effect of chemical modifications on the thermal stability of different G-quadruplex-forming oligonucleotides. Nucleic Acids Res., 33, 1182–1192.

14. Kumari, S., Bugaut, A., Huppert, J.L. and Balasubramanian, S. (2007) An RNA G-quadruplex in the 5′ UTR of the NRAS proto-oncogene modulates translation. Nat. Chem. Biol., 3, 218–221.

15. Bashkirov, V.I., Scherthan, H., Solinger, J.A., Buerstedde, J.M. and Heyer, W.D. (1997) A mouse cytoplasmic exoribonuclease (mXRN1p) with preference for G4 tetraplex substrates. J. Cell Biol., 136, 761–773.

16. Nishimura, Y., Martin, C.L., Vazquez-Lopez, A., Spence, S.J., Alvarez-Retuerto, A.I., Sigman, M., Steindler, C., Pellegrini, S., Schanen, N.C. et al. (2007) Genome-wide expression profiling of lymphoblastoid cell lines distinguishes different forms of autism and reveals shared pathways. Hum. Mol. Genet., 16, 1682–1698.

17. Bagga, P.S., Ford, L.P., Chen, F. and Wilusz, J. (1995) The G-rich auxiliary downstream element has distinct sequence and position requirements and mediates efficient 3′ end pre-mRNA processing through a trans-acting factor. Nucleic Acids Res., 23, 1625–1631.

18. Bagga, P.S., Arhin, G.K. and Wilusz, J. (1998) DSEF-1 is a member of the hnRNP H family of RNA-binding proteins and stimulates pre-mRNA cleavage and polyadenylation in vitro. Nucleic Acids Res., 26, 5343–5350.

19. Arhin, G.K., Boots, M., Bagga, P.S., Milcarek, C. and Wilusz, J. (2002) Downstream sequence elements with different affinities for the hnRNP H/H′ protein influence the processing efficiency of mammalian polyadenylation signals. Nucleic Acids Res., 30, 1842–1850.


20. Dalziel, M., Nunes, N.M. and Furger, A. (2007) Two G-rich regulatory elements located adjacent to and 440 nucleotides downstream of the core poly(A) site of the intronless melanocortin receptor 1 gene are critical for efficient 3′ end processing. Mol. Cell. Biol., 27, 1568–1580.

21. Kikin, O., D’Antonio, L. and Bagga, P.S. (2006) QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res., 34, W676–W682.

22. Veraldi, K.L., Arhin, G.K., Martincic, K., Chung-Ganster, L.H., Wilusz, J. and Milcarek, C. (2001) hnRNP F influences binding of a 64-kilodalton subunit of cleavage stimulation factor to mRNA precursors in mouse B cells. Mol. Cell. Biol., 21, 1228–1238.

23. Bruce, S.R., Dingle, R.W. and Peterson, M.L. (2003) B-cell and plasma-cell splicing differences: a potential role in regulated immunoglobulin RNA processing. RNA, 9, 1264–1273.

24. Garneau, D., Revil, T., Fisette, J.F. and Chabot, B. (2005) Heterogeneous nuclear ribonucleoprotein F/H proteins modulate the alternative splicing of the apoptotic mediator Bcl-x. J. Biol. Chem., 280, 22641–22650.

25. Zhang, Q.S., Manche, L., Xu, R.M. and Krainer, A.R. (2006) hnRNP A1 associates with telomere ends and stimulates telomerase activity. RNA, 12, 1116–1128.

26. Han, K., Yeo, G., An, P., Burge, C.B. and Grabowski, P.J. (2005) A combinatorial code for splicing silencing: UAGG and GGGG motifs. PLoS Biol., 3, e158.

27. Wang, E., Dimova, N. and Cambi, F. (2007) PLP/DM20 ratio is regulated by hnRNPH and F and a novel G-rich enhancer in oligodendrocytes. Nucleic Acids Res., 35, 4164–4178.

28. Tian, B., Hu, J., Zhang, H. and Lutz, C.S. (2005) A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res., 33, 201–212.

29. Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R. et al. (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science, 302, 2141–2144.

30. Huppert, J.L. and Balasubramanian, S. (2005) Prevalence of quadruplexes in the human genome. Nucleic Acids Res., 33, 2908–2916.

31. Todd, A.K., Johnston, M. and Neidle, S. (2005) Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res., 33, 2901–2907.

32. Eddy, J. and Maizels, N. (2006) Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res., 34, 3887–3896.

33. Kostadinov, R., Malhotra, N., Viotti, M., Shine, R., D’Antonio, L. and Bagga, P. (2006) GRSDB: a database of quadruplex forming G-rich sequences in alternatively processed mammalian pre-mRNA sequences. Nucleic Acids Res., 34, D119–D124.

34. Bonnal, S., Schaeffer, C., Creancier, L., Clamens, S., Moine, H., Prats, A.C. and Vagner, S. (2003) A single internal ribosome entry site containing a G quartet RNA structure drives fibroblast growth factor 2 gene expression at four alternative translation initiation codons. J. Biol. Chem., 278, 39330–39336.

35. Oliver, A.W., Bogdarina, I., Schroeder, E., Taylor, I.A. and Kneale, G.G. (2000) Preferential binding of fd gene 5 protein to tetraplex nucleic acid structures. J. Mol. Biol., 301, 575–584.

36. Mignone, F., Gissi, C., Liuni, S. and Pesole, G. (2002) Untranslated regions of mRNAs. Genome Biol., 3, reviews0004.1–reviews0004.10.

37. Crnugelj, M., Sket, P. and Plavec, J. (2003) Small change in a G-rich sequence, a dramatic change in topology: new dimeric G-quadruplex folding motif with unique loop orientations. J. Am. Chem. Soc., 125, 7866–7871.

38. Hazel, P., Huppert, J., Balasubramanian, S. and Neidle, S. (2004) Loop-length-dependent folding of G-quadruplexes. J. Am. Chem. Soc., 126, 16405–16415.

39. Risitano, A. and Fox, K.R. (2004) Influence of loop size on the stability of intramolecular DNA quadruplexes. Nucleic Acids Res., 32, 2598–2606.

40. Zarudnaya, M.I., Kolomiets, I.M., Potyahaylo, A.L. and Hovorun, D.M. (2003) Downstream elements of mammalian pre-mRNA polyadenylation signals: primary, secondary and higher-order structures. Nucleic Acids Res., 31, 1375–1386.

41. Kankia, B.I., Barany, G. and Musier-Forsyth, K. (2005) Unfolding of DNA quadruplexes induced by HIV-1 nucleocapsid protein. Nucleic Acids Res., 33, 4395–4403.

D148 Nucleic Acids Research 2008 Vol 36 Database issue

The prevalence of G-quadruplexes in the human genome has been established (30,31). In a recent study, gene function was found to be associated with the potential for G-quadruplex formation (32). However, there is a paucity of systematic studies focusing on the analysis of G-quadruplex motifs near RNA processing sites of genes, especially those that are alternatively processed. Genes that contain G-quadruplex forming sequences are likely to be regulated via special mechanisms (32). Our group has been interested in studying the role of G-quadruplexes in the regulation of gene expression at the post-transcriptional level. We have adopted a bioinformatics approach to study the composition and patterns of G-quadruplexes in pre-mRNA and mRNA sequences.

We had previously built GRSDB, a database of mapped G-quadruplex sequences in selected alternatively processed human and mouse genes (33). GRSDB2 is a second-generation database and contains information on the composition and distribution of putative Quadruplex-forming G-Rich Sequences (QGRS) mapped in a large number of eukaryotic pre-mRNA sequences, many of which are alternatively processed (alternatively spliced or alternatively polyadenylated). The data stored in GRSDB2 are based on computational analysis of NCBI Entrez Gene entries and their corresponding annotated genomic nucleotide sequences from RefSeq/GenBank. GRSDB2 has been built with a new and much improved version of the QGRS Mapper program (21). It contains data from 29 000 eukaryotic genes, including genes from other organisms in addition to human and mouse. The data model of GRSDB2 differs from that of the first version in that it is centered around Entrez Gene rather than solely GenBank/RefSeq nucleotide entries. The search module has been greatly enhanced, making it possible to generate complex queries that search the database with a wide variety of parameters, including Gene Ontology terms. The user may select subsets of genes from a query and perform further computations on these genes through a 'Workbench'. It is also possible to define the composition and size of the G-quadruplexes to be displayed by applying a variety of filters through the 'options' menu. The 'Gene View', 'Data View' and a highly interactive 'Graphic View' for individual database entries have been significantly enhanced with several additional computational capabilities and links. The data can now be exported into Excel for further analysis. In addition, we have added a 'Sequence View', which displays mapped G-quadruplexes in the context of the pre-mRNA sequence.

G-quadruplexes in the mRNA can influence translation initiation (34) as well as repression (35). Recently, a G-quadruplex in the 5'-UTR (untranslated region) of the NRAS proto-oncogene mRNA was found to inhibit its translation (14). This study also found G-quadruplexes in the 5'-UTRs of many other genes. The UTRs of mRNAs contain motifs that are vital for the regulation of post-transcriptional gene expression. Much attention has been paid to studying the composition of regulatory RNA motifs and the mechanism of their interactions with the cellular machinery (36). Our preliminary bioinformatics studies have found notable frequencies of G-quadruplex motifs in the 5'- as well as 3'-UTRs of mammalian mRNAs.

More detailed studies are needed to investigate the role of UTR G-quadruplex structure in regulating post-transcriptional gene expression. We have developed a new database, GRS_UTRdb, which contains information on the composition and distribution patterns of putative Quadruplex-forming GRS in the 5'- and 3'-UTRs of eukaryotic mRNA sequences. The data stored in GRS_UTRdb are based on computational analysis of NCBI Entrez Gene entries and their corresponding annotated nucleotide sequences from RefSeq/GenBank. The computations were performed with the help of an extension of the existing QGRS Mapper program (21).

Both the GRSDB2 and GRS_UTRdb databases can be accessed at the G-Quadruplex Resource Site at http://bioinformatics.ramapo.edu/GQRS/. The goal of these experiments has been to build resources for exploring the role of G-quadruplex structure in the regulation of gene expression at the post-transcriptional level. Researchers will find both websites to be user-friendly, with comprehensive help sections as well as context-specific help where it is needed. Investigators interested in the functional relevance of G-quadruplex structure, in particular its role in regulating gene expression at the post-transcriptional level, will find both databases to be of great value. While GRSDB2 is useful for studying G-quadruplexes near RNA-processing sites, particularly in alternatively processed pre-mRNAs, GRS_UTRdb offers a resource for investigating G-quadruplexes in the untranslated regions of mRNA. Both websites allow comprehensive large-scale analysis as well as detailed studies of individual genes.

G-QUADRUPLEX MOTIF

The G-quadruplex motif may be written $G_xN_aG_xN_bG_xN_cG_x$, namely four guanine groups of equal size x (which we call G-groups) interspersed by three arbitrary nucleotide sequences called loops. The size of each G-group corresponds to the number of stacked G-tetrads forming the quadruplex structure. We have previously described the G-quadruplex motif in more detail (21).
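As an illustration of the motif notation only, the following minimal Python sketch scans a sequence for four equal-sized G-groups separated by three loops. It is not the QGRS Mapper algorithm: the function name `find_qgrs`, the requirement that every loop be at least 1 nt long, and the choice to report only the first lazy match at each start position are assumptions made for this example.

```python
import re

def find_qgrs(seq, g_size=2, max_len=30):
    """Return (0-based start, motif) pairs matching G_x N_a G_x N_b G_x N_c G_x.

    Assumptions for this sketch: every loop is at least 1 nt long, and only
    the first lazy match found at each start position is reported.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA input alike
    g_group = "G{%d}" % g_size           # one G-group of exactly x guanines
    loop = "[ACGT]+?"                    # one loop, matched lazily
    body = g_group + loop + g_group + loop + g_group + loop + g_group
    # The lookahead makes the scan restart at every position, so
    # overlapping candidates are not skipped.
    pattern = re.compile("(?=(" + body + "))")
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(motif) <= max_len:        # enforce the maximum QGRS length
            hits.append((match.start(), motif))
    return hits
```

For example, `find_qgrs("AAGGTGGAGGCGGTT")` returns `[(2, 'GGTGGAGGCGG')]`, a candidate with G-groups of size 2 and three single-nucleotide loops.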

The potential of a G-quadruplex to influence gene expression relies on the stability of the structure. The stability of a G-quadruplex is considered to be linked to its loop lengths and the number of G-tetrads in the folded structure (37–39). While quadruplexes with at least three G-tetrads have been accepted as stable structures, two G-tetrad quadruplexes are not uncommon (40,41). In fact, a stable two G-tetrad RNA G-quadruplex that is capable of significantly influencing gene expression in vivo has recently been reported (12). Lower stability may in fact allow more sensitive control of gene expression (12). Quadruplexes with two G-tetrads, although relatively lower in stability, are expected to be far more prevalent in genomes than those with three G-tetrads.

METHODS

GRSDB2 and GRS_UTRdb are relational databases developed with MySQL and store non-redundant data.

Interfaces for the databases were built using PHP and Java. The databases have been populated with the help of custom software developed previously by us to analyze NCBI Entrez Gene entries (21). The QGRS are mapped within the relevant gene sequence and assigned a computed value called a G-score, which rewards those sequences deemed more likely to form a stable complex (21).

Structure and features of GRSDB2

GRSDB2 contains information on the composition and distribution of QGRS mapped in eukaryotic pre-mRNA sequences.

We have made an effort to include all possible G-quadruplexes. Users may search for QGRS containing G-groups of 2, 3 or more. The length of the QGRS and the loop size are also search parameters that the user may set.

There are two categories of QGRS that are regularly used in GRSDB2: (30,2) refers to QGRS at most 30 nt long and having at least 2 G's per G-group, while (45,3) refers to QGRS at most 45 nt long and having at least 3 G's per G-group.

The overall statistics for the database are shown in Table 1.

Table 1. Statistics for GRSDB2

| Organism | Number of genes | Alternatively spliced | Average gene size | Average number of products | Number of (30,2) QGRS | Number of (45,3) QGRS |
| --- | --- | --- | --- | --- | --- | --- |
| Homo sapiens | 10 475 | 3197 (30.5%) | 61 401 | 1.54 | 2 391 014 | 196 949 |
| Mus musculus | 2008 | 421 (21%) | 52 715 | 1.33 | 371 809 | 31 108 |
| Drosophila melanogaster | 12 223 | 3018 (24.7%) | 5361 | 1.46 | 190 325 | 7830 |
| Rattus norvegicus | 1477 | 37 (2.5%) | 7907 | 1.04 | 39 650 | 2948 |
| Caenorhabditis elegans | 3054 | 1085 (35.5%) | 4478 | 1.54 | 20 194 | 309 |
| Gallus gallus | 41 | 0 (0%) | 19 878 | 1 | 2349 | 211 |
| Bos taurus | 3 | 0 (0%) | 8410 | 1 | 201 | 34 |
| Danio rerio | 7 | 0 (0%) | 15 912 | 1 | 141 | 3 |
| Total | 29 288 | 7758 | | | 3 015 683 | 239 392 |

Queries may be performed using a variety of search fields. Several fields for gene identifiers are provided: GeneID, Gene Symbol, Gene Name, Aliases, GI and Accession Number. Since alternatively processed genes are a focus of the database, the user may look for genes with a specified number of products and poly(A) signals. The user may also specify which organism(s) to consider.

A significant feature of GRSDB2 is the ability it affords the user to look for correlations between occurrences of G-quadruplexes and gene ontology (GO) terms. Queries can specify gene ontology function, process or component. The GO terms are an example of a search field for which the user is not required to match the database entry exactly. Instead, searches may be done in which one or more fields start with, end with or contain the query value.

The columns of the query results page consist of the Gene Name, GeneID, Organism, Gene Size, Accession Number, Number of Products and Number of Poly(A) Signals (Figure 1). The results may be sorted on any of these columns, with Gene Name the default sort field. The RefSeq status of each entry is also listed in the table.

Figure 1. GRSDB2 Query Results Page. Results of a query for alternatively spliced human or rat genes involved in apoptosis. The results may be sorted by clicking the header of any column. At the bottom of the screen there are four controls allowing the user to add and clear genes from a 'Workbench'. Several programs are provided on the 'Workbench' for further analysis of QGRS from any set of genes in the database.

The user may analyze sets of genes by putting them into a 'Workbench', where a variety of programs are available to study QGRS distribution patterns for that particular gene set. There are four controls at the bottom of the results page for working with the 'Workbench'. The user may add marked genes, add all genes from the query, clear all genes from the 'Workbench' or analyze the genes on the 'Workbench'.
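The two QGRS categories used in GRSDB2, (30,2) and (45,3), each combine a maximum length with a minimum G-group size. A minimal Python sketch of that membership test is given below; the `QGRS` record, the `CATEGORIES` table and the function `in_category` are hypothetical names introduced for this example and do not reflect the database schema.

```python
from typing import NamedTuple

class QGRS(NamedTuple):
    """Hypothetical record holding the two properties the categories use."""
    length: int        # total length of the QGRS in nt
    g_group_size: int  # number of G's in each G-group

# Category name -> (maximum length in nt, minimum G's per G-group),
# as defined in the text.
CATEGORIES = {"(30,2)": (30, 2), "(45,3)": (45, 3)}

def in_category(qgrs, name):
    """Return True if the QGRS satisfies both limits of the named category."""
    max_len, min_g = CATEGORIES[name]
    return qgrs.length <= max_len and qgrs.g_group_size >= min_g
```

Under these definitions the categories overlap: a 20 nt QGRS with three G's per G-group belongs to both, whereas a 40 nt QGRS with three G's per G-group belongs only to (45,3).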

There are five programs available on the 'Workbench'. One program reports various statistics of the selected genes, similar to the statistics page for the entire database. There are two programs [one for (30,2) QGRS, the other for (45,3) QGRS] summarizing the distribution of QGRS with respect to location in exons, introns and near poly(A) signals (which is defined to be within 200 nt). Additionally, there are two programs showing the distribution of G-scores for the selected genes. The user is given the opportunity to export the output of any of these programs to Excel for further analysis.
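The kind of location summary described above can be sketched as follows. This is an illustrative Python example, not the 'Workbench' code: the function names, the use of the QGRS start coordinate to decide its location, and the inclusive treatment of the 200 nt window are all assumptions.

```python
from collections import Counter

POLYA_WINDOW = 200  # nt; "near a poly(A) signal" as defined in the text

def locate(qgrs_start, exons, polya_signals, window=POLYA_WINDOW):
    """Label one QGRS by its start coordinate (an assumption of this sketch).

    exons is a list of inclusive (start, end) pairs; any position outside
    all exons is treated as intronic.
    """
    in_exon = any(start <= qgrs_start <= end for start, end in exons)
    labels = ["exon" if in_exon else "intron"]
    if any(abs(qgrs_start - signal) <= window for signal in polya_signals):
        labels.append("near poly(A)")
    return labels

def distribution(qgrs_starts, exons, polya_signals):
    """Tally location labels over a set of QGRS start positions."""
    counts = Counter()
    for position in qgrs_starts:
        counts.update(locate(position, exons, polya_signals))
    return counts
```

A QGRS can carry two labels at once, for instance when it lies in a terminal exon within 200 nt of a poly(A) signal, so the tallies need not sum to the number of QGRS.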

On the results page, the user may select a particular gene for analysis by clicking on an entry in the Gene Name column. GRSDB2 has five interfaces for viewing information about QGRS: Gene View, Data View (no overlaps), Data View (with overlaps), Sequence View and Graphic View.

Figure 2. GRSDB2 CCRK Gene View. Provides basic gene information, including the number of products and poly(A) signals and gene ontology terms for that gene. QGRS counts are displayed for the (30,2) and (45,3) categories, together with non-overlapping versus overlapping QGRS. Additional QGRS information, together with an exon/intron map, is provided for each mRNA product. There are controls to navigate to any of the other views.

The Gene View has a table presenting basic information on the properties of the gene, the number of QGRS found and the gene ontology terms associated with the gene (Figure 2). For each alternatively spliced product, a map of the exon and intron structure is given together with QGRS information for that product.

In the Gene View, the user has several options available to filter which QGRS will be displayed in the Data and Sequence Views. For example, the G-score, loop size, minimum G-group size and maximum QGRS length may be set by the user.

From the Gene View, the user may choose any of the other interfaces. The Data View displays the actual nucleotide sequence for each QGRS and locates its position in an exon, in an intron or near a poly(A) signal (Figure 3). The results of this page may be exported to Excel. There are two versions of the Data View: one shows only non-overlapping QGRS, the other displays all QGRS, overlapping or not.

In the Sequence View, the nucleotide sequence for the entire gene is displayed. Exons are shown in purple and each QGRS is shown in yellow. The Graphic View gives the user a highly interactive visual tool to zoom in on any portion of the gene and analyze the QGRS located in that section.

Structure and features of GRS_UTRdb

GRS_UTRdb contains information on the composition and distribution of QGRS in the UTRs of eukaryotic mRNA sequences.

Database users have 14 search fields to define queries. Fields such as GeneID, Gene Name/Symbol, GI and mRNA Accession Number allow the user to look for specific genes and mRNA products. There are fields for gene ontology function, process or component. The user may select which organism to search on. Ranges for the lengths of the mRNA, 5' UTR, 3' UTR or CDS may also be specified.

Query results are summarized in a table from which the user may select a product for further analysis. GRS_UTRdb has six ways of viewing mRNA data: mRNA Map, Data View (no overlaps), Data View (with overlaps), Sequence View, Gene View and Alternate Products.

The mRNA Map contains a table showing an overview of information about the product, including the lengths of the 5' UTR, CDS and 3' UTR (Figure 4). The number of QGRS found in each region is displayed. There is a visual map of the locations of QGRS within each region of the product.

The Data View shows the actual sequence of every QGRS in the product. The view gives the location of each QGRS in the 5' UTR, CDS, 3' UTR or near a poly(A) signal. The G-score and the distance of the QGRS from the nearest region boundaries are also shown. There are two versions of the Data View: one that displays only non-overlapping QGRS and one that shows all QGRS.

The sequence for the entire product may be found in the Sequence View, and separate displays are shown for each region in the product. The QGRS are shown in a box with each G-group in purple (Figure 5).
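The region assignment and boundary distance reported in the Data View can be illustrated with a short Python sketch. The function `region_of`, the 1-based inclusive coordinates and the use of the QGRS start position are assumptions of this example; the text does not state how GRS_UTRdb computes these values.

```python
def region_of(pos, cds_start, cds_end, mrna_len):
    """Return (region, distance to the nearest boundary of that region).

    Coordinates are 1-based and inclusive; the CDS spans cds_start..cds_end.
    pos is taken to be the QGRS start position (an assumption of this sketch).
    """
    if not 1 <= pos <= mrna_len:
        raise ValueError("position lies outside the mRNA")
    if pos < cds_start:
        region, low, high = "5' UTR", 1, cds_start - 1
    elif pos <= cds_end:
        region, low, high = "CDS", cds_start, cds_end
    else:
        region, low, high = "3' UTR", cds_end + 1, mrna_len
    # Distance to whichever end of the containing region is closer.
    return region, min(pos - low, high - pos)
```

For a hypothetical 1000 nt mRNA with a CDS at positions 101 to 700, `region_of(90, 101, 700, 1000)` gives `("5' UTR", 10)`, because position 90 is 10 nt from the end of the 5' UTR.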

Figure 3. GRSDB2 CCRK Data View. A listing of the nucleotide sequences for all QGRS and the location of QGRS in exons, introns and near poly(A) signals. Results are shown for each product of the gene. What is shown is a truncated image of the view. The QGRS displayed satisfy the conditions that G-scores are in the range from 0 to 25 (which in effect restricts the output to QGRS with G-groups of size 2) and loop size is in the range from 1 to 7.

Figure 4. GRS_UTRdb NM_198709.1 mRNA (ARSB Gene) Map. Displays basic information about the 5' UTR, CDS and 3' UTR and the QGRS frequency in each region. A visual map of each region is displayed with the relative positions of QGRS depicted in the map.

Figure 5. GRS_UTRdb NM_198709.1 Sequence View. Shows the nucleotide sequence for the entire product in separate displays for the 5' UTR, CDS and 3' UTR. QGRS are enclosed in a box with each G-group shown in purple.

The Gene View gives basic information about genes, including gene ontology terms and the location of poly(A) signals and sites. Each mRNA product is listed together with the distribution of QGRS there. The 'Alternate Products' tab takes the user directly to the QGRS information for each product associated with the gene.

CONCLUSIONS

GRSDB2 and GRS_UTRdb provide curated data on the composition and distribution of putative QGRS in the transcribed regions of a large number of alternatively processed eukaryotic genes. GRSDB2 is useful for studying G-quadruplexes near RNA-processing sites, particularly those that are differentially processed. At present, it contains data for 29 288 genes encompassing 42 932 products from several eukaryotic organisms. More than 3 million QGRS have been mapped to these genes. The availability of a large number of pre-mRNAs with mapped QGRS makes it possible to perform a variety of bioinformatics studies. The database website already offers a range of computational tools to aid large-scale as well as individual gene analysis. The 'Workbench' can be used to perform computations on sets and subsets of genes in the database. The 'Gene View', 'Data View' and 'Sequence View' are useful for studying individual genes and their multiple products. The highly interactive 'Graphic View' is particularly useful for working with parts of genes.

The new GRS_UTRdb offers a valuable resource for investigating G-quadruplexes in the UTRs of mRNA. Currently, it contains data for more than 16 000 eukaryotic mRNAs, including 27 000 QGRS that have been mapped to the 5' UTRs. Like GRSDB2, GRS_UTRdb displays QGRS data in a variety of modes with computational capabilities and links. At this point, it does not have a 'Workbench' facility. We are constantly adding new genes and new computational tools to the website. Since genes containing G-quadruplex motifs could be regulated through special mechanisms, one can expect gene function to correlate with G-quadruplex formation (32). We have classified the gene entries in our databases according to gene ontology categories, which allows for queries with relevant terms.

Researchers interested in the functional relevance of G-quadruplex structure, in particular its role in regulating gene expression at the post-transcriptional level, will find both databases to be of great value.

ACKNOWLEDGEMENTS

We thank Manuel Viotti for assistance with testing and uploading data in the initial stages of this project. The authors would like to acknowledge Rumen Kostadinov, who developed the first version of GRSDB. We would also like to acknowledge Marcelo Halpern for technical assistance with the web server. This project was funded in part by a grant from the Provost Office of Ramapo College of New Jersey. Funding for the Open Access publication charges for this article was provided in part by the Divisions of Student Affairs and Academic Affairs of Ramapo College of New Jersey and the Bergen County Academies, Hackensack, New Jersey.

Conflict of interest statement. None declared.

REFERENCES

1. Gellert,M., Lipsett,M.N. and Davies,D.R. (1962) Helix formation by guanylic acid. Proc. Natl Acad. Sci. USA, 48, 2013–2018.

2. Schaffitzel,C., Berger,I., Postberg,J., Hanes,J., Lipps,H.J. and Plückthun,A. (2001) In vitro generated antibodies specific for telomeric guanine-quadruplex DNA react with Stylonychia lemnae macronuclei. Proc. Natl Acad. Sci. USA, 98, 8572–8577.

3. Halder,K. and Chowdhury,S. (2005) Kinetic resolution of bimolecular hybridization versus intramolecular folding in nucleic acids by surface plasmon resonance: application to G-quadruplex/duplex competition in human c-myc promoter. Nucleic Acids Res., 33, 4466–4474.

4. Simonsson,T. (2001) G-quadruplex DNA structures – variations on a theme. Biol. Chem., 382, 621–628.

5. Davis,J.T. (2004) G-quartets 40 years later: from 5'-GMP to molecular biology and supramolecular chemistry. Angew. Chem. Int. Ed. Engl., 43, 668–698.

6. Kelland,L.R. (2005) Overcoming the immortality of tumour cells by telomere and telomerase based cancer therapeutics – current status and future prospects. Eur. J. Cancer, 41, 971–979.

7. Burge,S., Parkinson,G.N., Hazel,P., Todd,A.K. and Neidle,S. (2006) Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res., 34, 5402–5415.

8. Maizels,N. (2006) Dynamic roles for G4 DNA in the biology of eukaryotic cells. Nat. Struct. Mol. Biol., 13, 1055–1059.

9. Paeschke,K., Simonsson,T., Postberg,J., Rhodes,D. and Lipps,H.J. (2005) Telomere end-binding proteins control the formation of G-quadruplex DNA structures in vivo. Nat. Struct. Mol. Biol., 12, 847–854.

10. Todd,A.K., Haider,S.M., Parkinson,G.N. and Neidle,S. (2007) Sequence occurrence and structural uniqueness of a G-quadruplex in the human c-kit promoter. Nucleic Acids Res., 35, 5799–5808.

11. Duquette,M.L., Handa,P., Vincent,J.A., Taylor,A.F. and Maizels,N. (2004) Intracellular transcription of G-rich DNAs induces formation of G-loops, novel structures containing G4 DNA. Genes Dev., 18, 1618–1629.

12. Wieland,M. and Hartig,J.S. (2007) RNA quadruplex-based modulation of gene expression. Chem. Biol., 14, 757–763.

13. Sacca,B., Lacroix,L. and Mergny,J.L. (2005) The effect of chemical modifications on the thermal stability of different G-quadruplex-forming oligonucleotides. Nucleic Acids Res., 33, 1182–1192.

14. Kumari,S., Bugaut,A., Huppert,J.L. and Balasubramanian,S. (2007) An RNA G-quadruplex in the 5' UTR of the NRAS proto-oncogene modulates translation. Nat. Chem. Biol., 3, 218–221.

15. Bashkirov,V.I., Scherthan,H., Solinger,J.A., Buerstedde,J.M. and Heyer,W.D. (1997) A mouse cytoplasmic exoribonuclease (mXRN1p) with preference for G4 tetraplex substrates. J. Cell Biol., 136, 761–773.

16. Nishimura,Y., Martin,C.L., Vazquez-Lopez,A., Spence,S.J., Alvarez-Retuerto,A.I., Sigman,M., Steindler,C., Pellegrini,S., Schanen,N.C. et al. (2007) Genome-wide expression profiling of lymphoblastoid cell lines distinguishes different forms of autism and reveals shared pathways. Hum. Mol. Genet., 16, 1682–1698.

17. Bagga,P.S., Ford,L.P., Chen,F. and Wilusz,J. (1995) The G-rich auxiliary downstream element has distinct sequence and position requirements and mediates efficient 3' end pre-mRNA processing through a trans-acting factor. Nucleic Acids Res., 23, 1625–1631.

18. Bagga,P.S., Arhin,G.K. and Wilusz,J. (1998) DSEF-1 is a member of the hnRNP H family of RNA-binding proteins and stimulates pre-mRNA cleavage and polyadenylation in vitro. Nucleic Acids Res., 26, 5343–5350.

19. Arhin,G.K., Boots,M., Bagga,P.S., Milcarek,C. and Wilusz,J. (2002) Downstream sequence elements with different affinities for the hnRNP H/H' protein influence the processing efficiency of mammalian polyadenylation signals. Nucleic Acids Res., 30, 1842–1850.

20. Dalziel,M., Nunes,N.M. and Furger,A. (2007) Two G-rich regulatory elements located adjacent to and 440 nucleotides downstream of the core poly(A) site of the intronless melanocortin receptor 1 gene are critical for efficient 3' end processing. Mol. Cell. Biol., 27, 1568–1580.

21. Kikin,O., D'Antonio,L. and Bagga,P.S. (2006) QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res., 34, W676–W682.

22. Veraldi,K.L., Arhin,G.K., Martincic,K., Chung-Ganster,L.H., Wilusz,J. and Milcarek,C. (2001) hnRNP F influences binding of a 64-kilodalton subunit of cleavage stimulation factor to mRNA precursors in mouse B cells. Mol. Cell. Biol., 21, 1228–1238.

23. Bruce,S.R., Dingle,R.W. and Peterson,M.L. (2003) B-cell and plasma-cell splicing differences: a potential role in regulated immunoglobulin RNA processing. RNA, 9, 1264–1273.

24. Garneau,D., Revil,T., Fisette,J.F. and Chabot,B. (2005) Heterogeneous nuclear ribonucleoprotein F/H proteins modulate the alternative splicing of the apoptotic mediator Bcl-x. J. Biol. Chem., 280, 22641–22650.

25. Zhang,Q.S., Manche,L., Xu,R.M. and Krainer,A.R. (2006) hnRNP A1 associates with telomere ends and stimulates telomerase activity. RNA, 12, 1116–1128.

26. Han,K., Yeo,G., An,P., Burge,C.B. and Grabowski,P.J. (2005) A combinatorial code for splicing silencing: UAGG and GGGG motifs. PLoS Biol., 3, e158.

27. Wang,E., Dimova,N. and Cambi,F. (2007) PLP/DM20 ratio is regulated by hnRNPH and F and a novel G-rich enhancer in oligodendrocytes. Nucleic Acids Res., 35, 4164–4178.

28. Tian,B., Hu,J., Zhang,H. and Lutz,C.S. (2005) A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res., 33, 201–212.

29. Johnson,J.M., Castle,J., Garrett-Engele,P., Kan,Z., Loerch,P.M., Armour,C.D., Santos,R., Schadt,E.E., Stoughton,R. et al. (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science, 302, 2141–2144.

30. Huppert,J.L. and Balasubramanian,S. (2005) Prevalence of quadruplexes in the human genome. Nucleic Acids Res., 33, 2908–2916.

31. Todd,A.K., Johnston,M. and Neidle,S. (2005) Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res., 33, 2901–2907.

32. Eddy,J. and Maizels,N. (2006) Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res., 34, 3887–3896.

33. Kostadinov,R., Malhotra,N., Viotti,M., Shine,R., D'Antonio,L. and Bagga,P. (2006) GRSDB: a database of quadruplex forming G-rich sequences in alternatively processed mammalian pre-mRNA sequences. Nucleic Acids Res., 34, D119–D124.

34. Bonnal,S., Schaeffer,C., Creancier,L., Clamens,S., Moine,H., Prats,A.C. and Vagner,S. (2003) A single internal ribosome entry site containing a G quartet RNA structure drives fibroblast growth factor 2 gene expression at four alternative translation initiation codons. J. Biol. Chem., 278, 39330–39336.

35. Oliver,A.W., Bogdarina,I., Schroeder,E., Taylor,I.A. and Kneale,G.G. (2000) Preferential binding of fd gene 5 protein to tetraplex nucleic acid structures. J. Mol. Biol., 301, 575–584.

36. Mignone,F., Gissi,C., Liuni,S. and Pesole,G. (2002) Untranslated regions of mRNAs. Genome Biol., 3, reviews0004.1–reviews0004.10.

37. Crnugelj,M., Sket,P. and Plavec,J. (2003) Small change in a G-rich sequence, a dramatic change in topology: new dimeric G-quadruplex folding motif with unique loop orientations. J. Am. Chem. Soc., 125, 7866–7871.

38. Hazel,P., Huppert,J., Balasubramanian,S. and Neidle,S. (2004) Loop-length-dependent folding of G-quadruplexes. J. Am. Chem. Soc., 126, 16405–16415.

39. Risitano,A. and Fox,K.R. (2004) Influence of loop size on the stability of intramolecular DNA quadruplexes. Nucleic Acids Res., 32, 2598–2606.

40. Zarudnaya,M.I., Kolomiets,I.M., Potyahaylo,A.L. and Hovorun,D.M. (2003) Downstream elements of mammalian pre-mRNA polyadenylation signals: primary, secondary and higher-order structures. Nucleic Acids Res., 31, 1375–1386.

41. Kankia,B.I., Barany,G. and Musier-Forsyth,K. (2005) Unfolding of DNA quadruplexes induced by HIV-1 nucleocapsid protein. Nucleic Acids Res., 33, 4395–4403.

D148 Nucleic Acids Research 2008 Vol 36 Database issue

Interfaces for the databases were built using PHP andJava The databases have been populated with the help ofcustom software developed previously by us to analyzeNCBI Entrez Gene entries (21) The QGRS are mappedwithin the relevant gene sequence and assigned a computedvalue called a G-score which rewards those sequencesdeemed more likely to form a stable complex (21)

Structure and features of GRSDB2

GRSDB2 contains information on the compositionand distribution of QGRS mapped in the eukaryoticpre-mRNA sequences

We have made an effort to include all possibleG-quadruplexes Users may search for QGRS containingG-groups of 2 3 or more Also the length of QGRS andloop size are search parameters that the user may setThere are two categories of QGRS that are regularly

used in GRSDB2 (302) refers to QGRS at most 30 ntlong and having at least 2 Grsquos per G-group while (453)refers to QGRS at most 45 nt long and having at least3 Grsquos per G-groupThe overall statistics for the database are shown in

Table 1Queries may be performed using a variety of search

fields Several fields for gene identifiers are

Figure 1 GRSDB2 Query Results Page Results of a query for alternatively spliced human or rat genes involved in apoptosis The results may besorted by clicking the header of any column At the bottom of the screen there are four controls allowing the user to add and clear genes froma lsquoWorkbenchrsquo Several programs are provided on the lsquoWorkbenchrsquo for further analysis of QGRS from any set of genes in the database

Table 1 Statistics for GRSDB2

Organism Numberof genes

Alternativelyspliced

Averagegenesize

Average numberof products

Number of(302) QGRS

Number of(453) QGRS

Homo sapiens 10 475 3197 (305) 61 401 154 2 391 014 196 949Mus musculus 2008 421 (21) 52 715 133 371 809 31 108Drosophila melanogaster 12 223 3018 (247) 5361 146 190 325 7830Rattus norvegicus 1477 37 (25) 7907 104 39 650 2948Caenorhabditis elegans 3054 1085 (355) 4478 154 20 194 309Gallus gallus 41 0 (0) 19 878 1 2349 211Bos taurus 3 0 (0) 8410 1 201 34Danio rerio 7 0 (0) 15 912 1 141 3Total 29 288 7758 3 015 683 239 392

Nucleic Acids Research 2008 Vol 36 Database issue D143

provided GeneID Gene Symbol Gene Name AliasesGI and Accession Number Since alternatively processedgenes are a focus of the database the user may look forgenes with a specified number of products and poly(A)signals The user may also specify which organism(s) toconsiderA significant feature of GRSDB2 is the ability it

affords the user to look for correlations of occurrencesof G-quadruplexes with gene ontology terms Queries canspecify gene ontology function process or componentThe GO terms are an example of a search field for whichthe user is not required to exactly match the databaseentry Instead searches may be done for which one ormore fields start with end with or contain the query valueThe columns of the query results page consist of the

Gene Name GeneID Organism Gene Size AccessionNumber Number of Products and Number of Poly(A)Signals (Figure 1) The results may be sorted on any ofthese columns with Gene Name the default sort fieldThe RefSeq status of each entry is also listed in the tableThe user may analyze sets of genes by putting them into

a lsquoWorkbenchrsquo where a variety of programs are available

to study QGRS distribution patterns for that particulargene set There are four controls at the bottom of theresults page for working with the lsquoWorkbenchrsquo The usermay add marked genes add all genes from the queryclear all genes from the lsquoWorkbenchrsquo or analyze the geneson the lsquoWorkbenchrsquo

There are five programs available on the lsquoWorkbenchrsquoOne program reports various statistics of the selectedgenes similar to the statistics page for the entire databaseThere are two programs [one for (302) QGRS the otherfor (453) QGRS] summarizing the distribution of QGRSwith respect to location in exons introns and near poly(A)signals (which is defined to be within 200 nt) Additionallythere are two programs showing the distribution ofG-scores for the selected genes The user is given theopportunity to export the output of any of these programsto Excel for further analysis

On the results page the user may select a particular genefor analysis by clicking on an entry in the Gene Namecolumn GRSDB2 has five interfaces for viewing informa-tion about QGRS Gene View Data View (no overlaps)Data View (with overlaps) Sequence View Graphic View

Figure 2 GRSDB2 CCRK Gene View Provides basic gene information including the number of products and poly(A) signals and gene ontologyterms for that gene QGRS counts are displayed for the (302) and (453) categories together with non-overlapping versus overlapping QGRSAdditional QGRS information together with an exonintron map is provided for each mRNA product There are controls to navigate to any of theother views

D144 Nucleic Acids Research 2008 Vol 36 Database issue

The Gene View has a table presenting basic informationon the properties of the gene the number of QGRS foundand the gene ontology terms associated with the gene(Figure 2) For each alternatively spliced product a mapof the exon and intron structure is given together withQGRS information for that product

In the Gene View the user has several options availableto filter which QGRS will be displayed in the Data andSequence Views For example the G-score loop sizeminimum size G-group and maximum QGRS length maybe set by the user

From the Gene View the user may choose any of theother interfaces The Data View displays the actual nucleo-tide sequence for each QGRS and locates its position in anexon intron or near a poly(A) signal (Figure 3) The resultsof this page may be exported to Excel There are twoversions of the Data View One view shows only non-overlapping QGRS the other view displays all QGRSoverlapping or not

In the Sequence View the nucleotide sequence for theentire gene is displayed Exons are listed in purple and eachQGRS is shown in yellow The Graphic View gives the usera highly interactive visual tool to zoom in on any portion ofthe gene and analyze QGRS located in that section

Structure and features of GRS_UTRdb

GRS_UTRdb contains information on the compositionand distribution of QGRS in the UTRs of eukaryoticmRNA sequences

Database users have 14 search fields to define queriesFields such as GeneID Gene NameSymbol GI andmRNA Accession Number allow the user to look forspecific genes and mRNA products There are fields forgene ontology function process or component The usermay select which organism to search on Also ranges forthe lengths of mRNA 50 UTR 30UTR or CDS may bespecifiedQuery results are summarized in a table from which

the user may select a product for further analysisGRS_UTRdb has six ways of viewing mRNA datamRNA Map Data View (no overlaps) Data View (withoverlaps) Sequence View Gene View Alternate ProductsThe mRNA map contains a table showing an overview

of information about the product including the lengths ofthe 50 UTR CDS and 30 UTR (Figure 4) The numberof QGRS found in each region is displayed There isa visual map of the locations of QGRS within each regionof the productThe Data View shows the actual sequence of every

QGRS in the product The view gives the location of eachQGRS in the 50 UTR CDS 30 UTR or near poly(A)signal Also G-score and the distance of QGRS from thenearest region boundaries are shown There are twoversions of the Data View one that displays only non-overlapping QGRS and a view that shows all QGRSThe sequence for the entire product may be found in

the Sequence View and separate displays are shown foreach region in the product The QGRS are shown ina box with each G-group in purple (Figure 5)

Figure 3 GRSDB2 CCRK Data View A listing of the nucleotide sequences for all QGRS the location of QGRS in exons introns and nearpoly(A) signals Results are shown for each product of the gene What is shown is a truncated image of the view The QGRS displayed satisfy theconditions that G-scores are in the range from 0 to 25 (which in effect restricts the output to QGRS with G-groups of size 2) and loop size is inthe range from 1 to 7

Nucleic Acids Research 2008 Vol 36 Database issue D145

Figure 5 GRS_UTRdb NM_1987091 Sequence View Shows the nucleotide sequence for the entire product in separate displays for the 50 UTRCDS and 30 UTR QGRS are enclosed in a box with each G-group shown in purple

Figure 4 GRS_UTRdb NM_1987091 mRNA (ARSB Gene) Map Displays basic information about the 50 UTR CDS and 30 UTR and the QGRSfrequency in each region A visual map of each region is displayed with the relative positions of QGRS depicted in the map

D146 Nucleic Acids Research 2008 Vol 36 Database issue

The Gene View gives basic information about genesincluding gene ontology terms and the location of poly(A)signals and sites Each mRNA product is listed togetherwith the distribution of QGRS there The lsquoAlternateProductsrsquo tab takes the user directly to the QGRSinformation for each product associated with the gene

CONCLUSIONS

GRSDB2 and GRS_UTRdb provide curated data on thecomposition and distribution of putative QGRS in thetranscribed regions of a large number of alternativelyprocessed eukaryotic genes GRSDB2 is useful for studyingG-quadruplexes near RNA-processing sites particularlythose that are differentially processed At present itcontains data for 29 288 genes encompassing 42 932products from several eukaryotic organisms More than 3million QGRS have been mapped to these genes Theavailability of large number of pre-mRNAs with mappedQGRS makes it possible to perform a variety of bioinfor-matics studies The database website already offers a rangeof computational tools to aid large scale as well asindividual gene analysis The lsquoWorkbenchrsquo can be used toperform computations on sets and sub-sets of genes in thedatabase The lsquoGene Viewrsquo lsquoData Viewrsquo and lsquoSequenceViewrsquo are useful for studying individual genes and theirmultiple products The highly interactive lsquoGraphic Viewrsquo isparticularly useful for working with parts of the genes

The new GRS_UTRdb offers a valuable resource for investigating G-quadruplexes in the UTRs of mRNA. Currently, it contains data for more than 16 000 eukaryotic mRNAs, including 27 000 QGRS which have been mapped to the 5′ UTRs. Like GRSDB2, GRS_UTRdb also displays QGRS data in a variety of modes with computational capabilities and links. At this point, it does not have a ‘Workbench’ facility. We are constantly adding new genes and new computational tools to the website. Since genes containing G-quadruplex motifs could be regulated through special mechanisms, one can expect gene function to correlate with G-quadruplex formation (32). We have classified the gene entries in our databases according to the gene ontology categories, which allows for queries with relevant terms.

Researchers interested in the functional relevance of G-quadruplex structure, in particular its role in regulating gene expression at the post-transcriptional level, will find both databases to be of great value.

ACKNOWLEDGEMENTS

We thank Manuel Viotti for assistance with testing and uploading data in the initial stages of this project. The authors would like to acknowledge Rumen Kostadinov, who developed the first version of GRSDB. We would also like to acknowledge Marcelo Halpern for technical assistance with the web server. This project was funded in part by a grant from the Provost Office of Ramapo College of New Jersey. Funding for the Open Access publication charges for this article was provided in part by the Divisions of Student Affairs and Academic Affairs of Ramapo College of New Jersey and the Bergen County Academies, Hackensack, New Jersey.

Conflict of interest statement. None declared.

REFERENCES

1. Gellert,M., Lipsett,M.N. and Davies,D.R. (1962) Helix formation by guanylic acid. Proc. Natl Acad. Sci. USA, 48, 2013–2018.

2. Schaffitzel,C., Berger,I., Postberg,J., Hanes,J., Lipps,H.J. and Pluckthun,A. (2001) In vitro generated antibodies specific for telomeric guanine-quadruplex DNA react with Stylonychia lemnae macronuclei. Proc. Natl Acad. Sci. USA, 98, 8572–8577.

3. Halder,K. and Chowdhury,S. (2005) Kinetic resolution of bimolecular hybridization versus intramolecular folding in nucleic acids by surface plasmon resonance: application to G-quadruplex/duplex competition in human c-myc promoter. Nucleic Acids Res., 33, 4466–4474.

4. Simonsson,T. (2001) G-quadruplex DNA structures – variations on a theme. Biol. Chem., 382, 621–628.

5. Davis,J.T. (2004) G-quartets 40 years later: from 5′-GMP to molecular biology and supramolecular chemistry. Angew. Chem. Int. Ed. Engl., 43, 668–698.

6. Kelland,L.R. (2005) Overcoming the immortality of tumour cells by telomere and telomerase based cancer therapeutics – current status and future prospects. Eur. J. Cancer, 41, 971–979.

7. Burge,S., Parkinson,G.N., Hazel,P., Todd,A.K. and Neidle,S. (2006) Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res., 34, 5402–5415.

8. Maizels,N. (2006) Dynamic roles for G4 DNA in the biology of eukaryotic cells. Nat. Struct. Mol. Biol., 13, 1055–1059.

9. Paeschke,K., Simonsson,T., Postberg,J., Rhodes,D. and Lipps,H.J. (2005) Telomere end-binding proteins control the formation of G-quadruplex DNA structures in vivo. Nat. Struct. Mol. Biol., 12, 847–854.

10. Todd,A.K., Haider,S.M., Parkinson,G.N. and Neidle,S. (2007) Sequence occurrence and structural uniqueness of a G-quadruplex in the human c-kit promoter. Nucleic Acids Res., 35, 5799–5808.

11. Duquette,M.L., Handa,P., Vincent,J.A., Taylor,A.F. and Maizels,N. (2004) Intracellular transcription of G-rich DNAs induces formation of G-loops, novel structures containing G4 DNA. Genes Dev., 18, 1618–1629.

12. Wieland,M. and Hartig,J.S. (2007) RNA quadruplex-based modulation of gene expression. Chem. Biol., 14, 757–763.

13. Sacca,B., Lacroix,L. and Mergny,J.L. (2005) The effect of chemical modifications on the thermal stability of different G-quadruplex-forming oligonucleotides. Nucleic Acids Res., 33, 1182–1192.

14. Kumari,S., Bugaut,A., Huppert,J.L. and Balasubramanian,S. (2007) An RNA G-quadruplex in the 5′ UTR of the NRAS proto-oncogene modulates translation. Nat. Chem. Biol., 3, 218–221.

15. Bashkirov,V.I., Scherthan,H., Solinger,J.A., Buerstedde,J.M. and Heyer,W.D. (1997) A mouse cytoplasmic exoribonuclease (mXRN1p) with preference for G4 tetraplex substrates. J. Cell Biol., 136, 761–773.

16. Nishimura,Y., Martin,C.L., Vazquez-Lopez,A., Spence,S.J., Alvarez-Retuerto,A.I., Sigman,M., Steindler,C., Pellegrini,S., Schanen,N.C. et al. (2007) Genome-wide expression profiling of lymphoblastoid cell lines distinguishes different forms of autism and reveals shared pathways. Hum. Mol. Genet., 16, 1682–1698.

17. Bagga,P.S., Ford,L.P., Chen,F. and Wilusz,J. (1995) The G-rich auxiliary downstream element has distinct sequence and position requirements and mediates efficient 3′ end pre-mRNA processing through a trans-acting factor. Nucleic Acids Res., 23, 1625–1631.

18. Bagga,P.S., Arhin,G.K. and Wilusz,J. (1998) DSEF-1 is a member of the hnRNP H family of RNA-binding proteins and stimulates pre-mRNA cleavage and polyadenylation in vitro. Nucleic Acids Res., 26, 5343–5350.

19. Arhin,G.K., Boots,M., Bagga,P.S., Milcarek,C. and Wilusz,J. (2002) Downstream sequence elements with different affinities for the hnRNP H/H′ protein influence the processing efficiency of mammalian polyadenylation signals. Nucleic Acids Res., 30, 1842–1850.


20. Dalziel,M., Nunes,N.M. and Furger,A. (2007) Two G-rich regulatory elements located adjacent to and 440 nucleotides downstream of the core poly(A) site of the intronless melanocortin receptor 1 gene are critical for efficient 3′ end processing. Mol. Cell. Biol., 27, 1568–1580.

21. Kikin,O., D'Antonio,L. and Bagga,P.S. (2006) QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res., 34, W676–W682.

22. Veraldi,K.L., Arhin,G.K., Martincic,K., Chung-Ganster,L.H., Wilusz,J. and Milcarek,C. (2001) hnRNP F influences binding of a 64-kilodalton subunit of cleavage stimulation factor to mRNA precursors in mouse B cells. Mol. Cell. Biol., 21, 1228–1238.

23. Bruce,S.R., Dingle,R.W. and Peterson,M.L. (2003) B-cell and plasma-cell splicing differences: a potential role in regulated immunoglobulin RNA processing. RNA, 9, 1264–1273.

24. Garneau,D., Revil,T., Fisette,J.F. and Chabot,B. (2005) Heterogeneous nuclear ribonucleoprotein F/H proteins modulate the alternative splicing of the apoptotic mediator Bcl-x. J. Biol. Chem., 280, 22641–22650.

25. Zhang,Q.S., Manche,L., Xu,R.M. and Krainer,A.R. (2006) hnRNP A1 associates with telomere ends and stimulates telomerase activity. RNA, 12, 1116–1128.

26. Han,K., Yeo,G., An,P., Burge,C.B. and Grabowski,P.J. (2005) A combinatorial code for splicing silencing: UAGG and GGGG motifs. PLoS Biol., 3, e158.

27. Wang,E., Dimova,N. and Cambi,F. (2007) PLP/DM20 ratio is regulated by hnRNPH and F and a novel G-rich enhancer in oligodendrocytes. Nucleic Acids Res., 35, 4164–4178.

28. Tian,B., Hu,J., Zhang,H. and Lutz,C.S. (2005) A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res., 33, 201–212.

29. Johnson,J.M., Castle,J., Garrett-Engele,P., Kan,Z., Loerch,P.M., Armour,C.D., Santos,R., Schadt,E.E., Stoughton,R. et al. (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science, 302, 2141–2144.

30. Huppert,J.L. and Balasubramanian,S. (2005) Prevalence of quadruplexes in the human genome. Nucleic Acids Res., 33, 2908–2916.

31. Todd,A.K., Johnston,M. and Neidle,S. (2005) Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res., 33, 2901–2907.

32. Eddy,J. and Maizels,N. (2006) Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res., 34, 3887–3896.

33. Kostadinov,R., Malhotra,N., Viotti,M., Shine,R., D'Antonio,L. and Bagga,P. (2006) GRSDB: a database of quadruplex forming G-rich sequences in alternatively processed mammalian pre-mRNA sequences. Nucleic Acids Res., 34, D119–D124.

34. Bonnal,S., Schaeffer,C., Creancier,L., Clamens,S., Moine,H., Prats,A.C. and Vagner,S. (2003) A single internal ribosome entry site containing a G quartet RNA structure drives fibroblast growth factor 2 gene expression at four alternative translation initiation codons. J. Biol. Chem., 278, 39330–39336.

35. Oliver,A.W., Bogdarina,I., Schroeder,E., Taylor,I.A. and Kneale,G.G. (2000) Preferential binding of fd gene 5 protein to tetraplex nucleic acid structures. J. Mol. Biol., 301, 575–584.

36. Mignone,F., Gissi,C., Liuni,S. and Pesole,G. (2002) Untranslated regions of mRNAs. Genome Biol., 3, reviews0004.1–reviews0004.10.

37. Crnugelj,M., Sket,P. and Plavec,J. (2003) Small change in a G-rich sequence, a dramatic change in topology: new dimeric G-quadruplex folding motif with unique loop orientations. J. Am. Chem. Soc., 125, 7866–7871.

38. Hazel,P., Huppert,J., Balasubramanian,S. and Neidle,S. (2004) Loop-length-dependent folding of G-quadruplexes. J. Am. Chem. Soc., 126, 16405–16415.

39. Risitano,A. and Fox,K.R. (2004) Influence of loop size on the stability of intramolecular DNA quadruplexes. Nucleic Acids Res., 32, 2598–2606.

40. Zarudnaya,M.I., Kolomiets,I.M., Potyahaylo,A.L. and Hovorun,D.M. (2003) Downstream elements of mammalian pre-mRNA polyadenylation signals: primary, secondary and higher-order structures. Nucleic Acids Res., 31, 1375–1386.

41. Kankia,B.I., Barany,G. and Musier-Forsyth,K. (2005) Unfolding of DNA quadruplexes induced by HIV-1 nucleocapsid protein. Nucleic Acids Res., 33, 4395–4403.


provided GeneID, Gene Symbol, Gene Name, Aliases, GI and Accession Number. Since alternatively processed genes are a focus of the database, the user may look for genes with a specified number of products and poly(A) signals. The user may also specify which organism(s) to consider.

A significant feature of GRSDB2 is the ability it affords the user to look for correlations of occurrences of G-quadruplexes with gene ontology terms. Queries can specify gene ontology function, process or component. The GO terms are an example of a search field for which the user is not required to exactly match the database entry. Instead, searches may be done for which one or more fields start with, end with or contain the query value.

The columns of the query results page consist of the Gene Name, GeneID, Organism, Gene Size, Accession Number, Number of Products and Number of Poly(A) Signals (Figure 1). The results may be sorted on any of these columns, with Gene Name the default sort field. The RefSeq status of each entry is also listed in the table.

The user may analyze sets of genes by putting them into a ‘Workbench’, where a variety of programs are available to study QGRS distribution patterns for that particular gene set. There are four controls at the bottom of the results page for working with the ‘Workbench’. The user may add marked genes, add all genes from the query, clear all genes from the ‘Workbench’ or analyze the genes on the ‘Workbench’.
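The query behaviour described above (fields matched exactly or by "starts with", "ends with" or "contains", with results sorted on a chosen column) can be sketched as follows. This is only an illustration: the record layout, the field names, the GO annotations and the gene `EXAMPLE1` are made up for the example and are not the actual GRSDB2 schema or data.

```python
from operator import itemgetter

# Hypothetical records; field names and GO terms are illustrative only.
GENES = [
    {"gene_name": "EXAMPLE1", "go_process": "regulation of cell cycle", "products": 2},
    {"gene_name": "CCRK", "go_process": "cell cycle", "products": 3},
    {"gene_name": "ARSB", "go_process": "lysosomal transport", "products": 2},
]

# One predicate per matching mode named in the text.
MATCHERS = {
    "exact": lambda value, query: value == query,
    "starts_with": lambda value, query: value.startswith(query),
    "ends_with": lambda value, query: value.endswith(query),
    "contains": lambda value, query: query in value,
}


def search(records, field, query, mode="contains", sort_by="gene_name"):
    """Return records whose field matches the query, sorted on one column.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    match = MATCHERS[mode]
    needle = query.lower()
    hits = [r for r in records if match(str(r[field]).lower(), needle)]
    return sorted(hits, key=itemgetter(sort_by))


if __name__ == "__main__":
    for record in search(GENES, "go_process", "cell cycle", mode="ends_with"):
        print(record["gene_name"], "-", record["go_process"])
```

Keeping the matching modes in a dictionary means a new mode could be added without touching `search`; the default sort on `gene_name` mirrors the default sort field mentioned in the text.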

There are five programs available on the ‘Workbench’. One program reports various statistics of the selected genes, similar to the statistics page for the entire database. There are two programs [one for (302) QGRS, the other for (453) QGRS] summarizing the distribution of QGRS with respect to location in exons, introns and near poly(A) signals (which is defined to be within 200 nt). Additionally, there are two programs showing the distribution of G-scores for the selected genes. The user is given the opportunity to export the output of any of these programs to Excel for further analysis.
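A minimal sketch of the kind of location summary described above is given below. It assumes 0-based, half-open coordinates, assigns a QGRS to an exon or intron by its start position, and measures the distance to a poly(A) signal from the nearest end of the QGRS. None of these conventions is stated in the text, so they should be read as assumptions, and the function names are hypothetical.

```python
from collections import Counter

NEAR_POLY_A_NT = 200  # "near" a poly(A) signal means within 200 nt (from the text)


def distance_to_site(start, end, site):
    """Distance in nt from the interval [start, end) to a single position."""
    if start <= site < end:
        return 0
    return start - site if site < start else site - (end - 1)


def classify(start, end, exons, poly_a_signals, window=NEAR_POLY_A_NT):
    """Label one QGRS as exon or intron, plus near_poly_a when applicable."""
    # Assumption: the start position decides exon versus intron.
    in_exon = any(ex_start <= start < ex_end for ex_start, ex_end in exons)
    labels = ["exon" if in_exon else "intron"]
    if any(distance_to_site(start, end, s) <= window for s in poly_a_signals):
        labels.append("near_poly_a")
    return labels


def summarize(qgrs_intervals, exons, poly_a_signals):
    """Count QGRS per location category for one product."""
    counts = Counter()
    for start, end in qgrs_intervals:
        counts.update(classify(start, end, exons, poly_a_signals))
    return counts


if __name__ == "__main__":
    exons = [(0, 100), (300, 400)]      # made-up exon coordinates
    poly_a_signals = [650]              # made-up poly(A) signal position
    qgrs = [(50, 70), (150, 175), (460, 480)]
    print(dict(summarize(qgrs, exons, poly_a_signals)))
```

A QGRS can carry two labels at once (for example intron and near_poly_a), so the category counts need not add up to the number of QGRS.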

On the results page, the user may select a particular gene for analysis by clicking on an entry in the Gene Name column. GRSDB2 has five interfaces for viewing information about QGRS: Gene View, Data View (no overlaps), Data View (with overlaps), Sequence View and Graphic View.

Figure 2. GRSDB2 CCRK Gene View. Provides basic gene information, including the number of products and poly(A) signals, and gene ontology terms for that gene. QGRS counts are displayed for the (302) and (453) categories, together with non-overlapping versus overlapping QGRS. Additional QGRS information, together with an exon/intron map, is provided for each mRNA product. There are controls to navigate to any of the other views.


The Gene View has a table presenting basic information on the properties of the gene, the number of QGRS found and the gene ontology terms associated with the gene (Figure 2). For each alternatively spliced product, a map of the exon and intron structure is given, together with QGRS information for that product.

In the Gene View, the user has several options available to filter which QGRS will be displayed in the Data and Sequence Views. For example, the G-score, loop size, minimum size G-group and maximum QGRS length may be set by the user.
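The filter options listed above can be pictured as a simple predicate over QGRS records. The sketch below is an illustration only: the `QGRS` and `Filter` classes, their field names and defaults, and the G-scores in the example are all hypothetical and are not taken from the database.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class QGRS:
    sequence: str
    g_group_size: int
    loops: Tuple[int, int, int]  # lengths of the three loops
    g_score: int                 # illustrative value, not a computed score


@dataclass(frozen=True)
class Filter:
    min_g_score: int = 0
    max_g_score: Optional[int] = None
    min_loop: int = 0
    max_loop: Optional[int] = None
    min_g_group: int = 2
    max_length: Optional[int] = None

    def accepts(self, q: QGRS) -> bool:
        """True when the QGRS satisfies every configured bound."""
        if q.g_score < self.min_g_score:
            return False
        if self.max_g_score is not None and q.g_score > self.max_g_score:
            return False
        if min(q.loops) < self.min_loop:
            return False
        if self.max_loop is not None and max(q.loops) > self.max_loop:
            return False
        if q.g_group_size < self.min_g_group:
            return False
        if self.max_length is not None and len(q.sequence) > self.max_length:
            return False
        return True


if __name__ == "__main__":
    records = [
        QGRS("GGAGGTGGCGG", 2, (1, 1, 1), 20),       # made-up score
        QGRS("GGGAGGGTTGGGCGGG", 3, (1, 2, 1), 40),  # made-up score
    ]
    # Settings analogous to the Figure 3 view: G-score 0 to 25, loop size 1 to 7.
    view = Filter(min_g_score=0, max_g_score=25, min_loop=1, max_loop=7)
    print([q.sequence for q in records if view.accepts(q)])
```

Bounds left as `None` are simply not applied, which keeps each option independent of the others.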

From the Gene View, the user may choose any of the other interfaces. The Data View displays the actual nucleotide sequence for each QGRS and locates its position in an exon, intron or near a poly(A) signal (Figure 3). The results of this page may be exported to Excel. There are two versions of the Data View. One view shows only non-overlapping QGRS; the other view displays all QGRS, overlapping or not.
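To make the difference between the two Data Views concrete, the sketch below derives a non-overlapping subset from a list of possibly overlapping QGRS. The text does not say how the database chooses among overlapping candidates; the greedy rule used here (prefer the higher G-score, then the earlier start) is an assumption of this example, and the records are made up.

```python
def overlaps(a, b):
    """True when two half-open intervals share at least one position."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def non_overlapping(qgrs_list):
    """Greedy selection: highest G-score first, skipping anything that overlaps.

    This selection rule is an assumption made for illustration.
    """
    chosen = []
    ranked = sorted(qgrs_list, key=lambda q: (-q["g_score"], q["start"]))
    for candidate in ranked:
        if not any(overlaps(candidate, kept) for kept in chosen):
            chosen.append(candidate)
    return sorted(chosen, key=lambda q: q["start"])


if __name__ == "__main__":
    all_qgrs = [  # made-up coordinates and scores
        {"start": 0, "end": 15, "g_score": 20},
        {"start": 5, "end": 25, "g_score": 35},
        {"start": 30, "end": 45, "g_score": 18},
    ]
    print("with overlaps:", [(q["start"], q["end"]) for q in all_qgrs])
    print("no overlaps:  ", [(q["start"], q["end"]) for q in non_overlapping(all_qgrs)])
```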

In the Sequence View, the nucleotide sequence for the entire gene is displayed. Exons are listed in purple and each QGRS is shown in yellow. The Graphic View gives the user a highly interactive visual tool to zoom in on any portion of the gene and analyze QGRS located in that section.

Structure and features of GRS_UTRdb

GRS_UTRdb contains information on the composition and distribution of QGRS in the UTRs of eukaryotic mRNA sequences.

Database users have 14 search fields to define queries. Fields such as GeneID, Gene Name/Symbol, GI and mRNA Accession Number allow the user to look for specific genes and mRNA products. There are fields for gene ontology function, process or component. The user may select which organism to search on. Also, ranges for the lengths of the mRNA, 5′ UTR, 3′ UTR or CDS may be specified.

Query results are summarized in a table from which the user may select a product for further analysis. GRS_UTRdb has six ways of viewing mRNA data: mRNA Map, Data View (no overlaps), Data View (with overlaps), Sequence View, Gene View and Alternate Products.

The mRNA Map contains a table showing an overview of information about the product, including the lengths of the 5′ UTR, CDS and 3′ UTR (Figure 4). The number of QGRS found in each region is displayed. There is a visual map of the locations of QGRS within each region of the product.

The Data View shows the actual sequence of every QGRS in the product. The view gives the location of each QGRS in the 5′ UTR, CDS, 3′ UTR or near a poly(A) signal. Also, the G-score and the distance of the QGRS from the nearest region boundaries are shown. There are two versions of the Data View: one that displays only non-overlapping QGRS and one that shows all QGRS.

The sequence for the entire product may be found in the Sequence View, and separate displays are shown for each region in the product. The QGRS are shown in a box, with each G-group in purple (Figure 5).
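The region assignment and boundary distances reported in the Data View can be sketched from the three region lengths alone. The example below assumes 0-based, half-open mRNA coordinates and assigns a QGRS to the region containing its start; the function name and the coordinates are hypothetical, and the database may use a different convention.

```python
def locate(qgrs_start, qgrs_end, utr5_len, cds_len, utr3_len):
    """Return (region, nt from region start, nt to region end) for one QGRS.

    The second distance is negative if the QGRS runs past the end of its region.
    """
    cds_start = utr5_len
    utr3_start = utr5_len + cds_len
    mrna_end = utr3_start + utr3_len
    regions = [
        ("5' UTR", 0, cds_start),
        ("CDS", cds_start, utr3_start),
        ("3' UTR", utr3_start, mrna_end),
    ]
    for name, region_start, region_end in regions:
        if region_start <= qgrs_start < region_end:
            return name, qgrs_start - region_start, region_end - qgrs_end
    raise ValueError("QGRS start lies outside the mRNA")


if __name__ == "__main__":
    # Made-up product: 100 nt 5' UTR, 300 nt CDS, 200 nt 3' UTR.
    for start, end in [(40, 60), (120, 150), (420, 445)]:
        print((start, end), locate(start, end, 100, 300, 200))
```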


The Gene View has a table presenting basic informationon the properties of the gene the number of QGRS foundand the gene ontology terms associated with the gene(Figure 2) For each alternatively spliced product a mapof the exon and intron structure is given together withQGRS information for that product

In the Gene View the user has several options availableto filter which QGRS will be displayed in the Data andSequence Views For example the G-score loop sizeminimum size G-group and maximum QGRS length maybe set by the user

From the Gene View the user may choose any of theother interfaces The Data View displays the actual nucleo-tide sequence for each QGRS and locates its position in anexon intron or near a poly(A) signal (Figure 3) The resultsof this page may be exported to Excel There are twoversions of the Data View One view shows only non-overlapping QGRS the other view displays all QGRSoverlapping or not

In the Sequence View the nucleotide sequence for theentire gene is displayed Exons are listed in purple and eachQGRS is shown in yellow The Graphic View gives the usera highly interactive visual tool to zoom in on any portion ofthe gene and analyze QGRS located in that section

Structure and features of GRS_UTRdb

GRS_UTRdb contains information on the compositionand distribution of QGRS in the UTRs of eukaryoticmRNA sequences

Database users have 14 search fields to define queriesFields such as GeneID Gene NameSymbol GI andmRNA Accession Number allow the user to look forspecific genes and mRNA products There are fields forgene ontology function process or component The usermay select which organism to search on Also ranges forthe lengths of mRNA 50 UTR 30UTR or CDS may bespecifiedQuery results are summarized in a table from which

the user may select a product for further analysisGRS_UTRdb has six ways of viewing mRNA datamRNA Map Data View (no overlaps) Data View (withoverlaps) Sequence View Gene View Alternate ProductsThe mRNA map contains a table showing an overview

of information about the product including the lengths ofthe 50 UTR CDS and 30 UTR (Figure 4) The numberof QGRS found in each region is displayed There isa visual map of the locations of QGRS within each regionof the productThe Data View shows the actual sequence of every

QGRS in the product The view gives the location of eachQGRS in the 50 UTR CDS 30 UTR or near poly(A)signal Also G-score and the distance of QGRS from thenearest region boundaries are shown There are twoversions of the Data View one that displays only non-overlapping QGRS and a view that shows all QGRSThe sequence for the entire product may be found in

the Sequence View and separate displays are shown foreach region in the product The QGRS are shown ina box with each G-group in purple (Figure 5)

Figure 3 GRSDB2 CCRK Data View A listing of the nucleotide sequences for all QGRS the location of QGRS in exons introns and nearpoly(A) signals Results are shown for each product of the gene What is shown is a truncated image of the view The QGRS displayed satisfy theconditions that G-scores are in the range from 0 to 25 (which in effect restricts the output to QGRS with G-groups of size 2) and loop size is inthe range from 1 to 7

Nucleic Acids Research 2008 Vol 36 Database issue D145

Figure 5 GRS_UTRdb NM_1987091 Sequence View Shows the nucleotide sequence for the entire product in separate displays for the 50 UTRCDS and 30 UTR QGRS are enclosed in a box with each G-group shown in purple

Figure 4 GRS_UTRdb NM_1987091 mRNA (ARSB Gene) Map Displays basic information about the 50 UTR CDS and 30 UTR and the QGRSfrequency in each region A visual map of each region is displayed with the relative positions of QGRS depicted in the map

D146 Nucleic Acids Research 2008 Vol 36 Database issue

The Gene View gives basic information about genesincluding gene ontology terms and the location of poly(A)signals and sites Each mRNA product is listed togetherwith the distribution of QGRS there The lsquoAlternateProductsrsquo tab takes the user directly to the QGRSinformation for each product associated with the gene

CONCLUSIONS

GRSDB2 and GRS_UTRdb provide curated data on thecomposition and distribution of putative QGRS in thetranscribed regions of a large number of alternativelyprocessed eukaryotic genes GRSDB2 is useful for studyingG-quadruplexes near RNA-processing sites particularlythose that are differentially processed At present itcontains data for 29 288 genes encompassing 42 932products from several eukaryotic organisms More than 3million QGRS have been mapped to these genes Theavailability of large number of pre-mRNAs with mappedQGRS makes it possible to perform a variety of bioinfor-matics studies The database website already offers a rangeof computational tools to aid large scale as well asindividual gene analysis The lsquoWorkbenchrsquo can be used toperform computations on sets and sub-sets of genes in thedatabase The lsquoGene Viewrsquo lsquoData Viewrsquo and lsquoSequenceViewrsquo are useful for studying individual genes and theirmultiple products The highly interactive lsquoGraphic Viewrsquo isparticularly useful for working with parts of the genes

The new GRS_UTRdb offers a valuable resource forinvestigating G-quadruplexes in the UTRs of mRNACurrently it contains data for more than 16000eukaryotic mRNAs including 27000 QGRS whichhave been mapped to the 50 UTRs Like GRSDB2GRS_UTRdb also displays QGRS data in a variety ofmodes with computational capabilities and links At thispoint it does not have a lsquoWorkbenchrsquo facility We areconstantly adding new genes and new computational toolsto the website Since genes containing G-quadruplexmotifs could be regulated through special mechanismsone can expect for the gene function to correlate withG-quadruplex formation (32) We have classified the geneentries in our databases according to the gene ontologycategories which allows for queries with relevant terms

Researchers interested in the functional relevance of G-quadruplex structure, in particular its role in regulating gene expression at the post-transcriptional level, will find both databases to be of great value.

ACKNOWLEDGEMENTS

We thank Manuel Viotti for assistance with testing and uploading data in the initial stages of this project. The authors would like to acknowledge Rumen Kostadinov, who developed the first version of GRSDB. We would also like to acknowledge Marcelo Halpern for technical assistance with the web server. This project was funded in part by a grant from the Provost Office of Ramapo College of New Jersey. Funding for the Open Access publication charges for this article was provided in part by the Divisions of Student Affairs and Academic Affairs of Ramapo College of New Jersey and the Bergen County Academies, Hackensack, New Jersey.

Conflict of interest statement. None declared.

REFERENCES

1. Gellert, M., Lipsett, M.N. and Davies, D.R. (1962) Helix formation by guanylic acid. Proc. Natl Acad. Sci. USA, 48, 2013–2018.

2. Schaffitzel, C., Berger, I., Postberg, J., Hanes, J., Lipps, H.J. and Plückthun, A. (2001) In vitro generated antibodies specific for telomeric guanine-quadruplex DNA react with Stylonychia lemnae macronuclei. Proc. Natl Acad. Sci. USA, 98, 8572–8577.

3. Halder, K. and Chowdhury, S. (2005) Kinetic resolution of bimolecular hybridization versus intramolecular folding in nucleic acids by surface plasmon resonance: application to G-quadruplex/duplex competition in human c-myc promoter. Nucleic Acids Res., 33, 4466–4474.

4. Simonsson, T. (2001) G-quadruplex DNA structures – variations on a theme. Biol. Chem., 382, 621–628.

5. Davis, J.T. (2004) G-quartets 40 years later: from 5′-GMP to molecular biology and supramolecular chemistry. Angew. Chem. Int. Ed. Engl., 43, 668–698.

6. Kelland, L.R. (2005) Overcoming the immortality of tumour cells by telomere and telomerase based cancer therapeutics – current status and future prospects. Eur. J. Cancer, 41, 971–979.

7. Burge, S., Parkinson, G.N., Hazel, P., Todd, A.K. and Neidle, S. (2006) Quadruplex DNA: sequence, topology and structure. Nucleic Acids Res., 34, 5402–5415.

8. Maizels, N. (2006) Dynamic roles for G4 DNA in the biology of eukaryotic cells. Nat. Struct. Mol. Biol., 13, 1055–1059.

9. Paeschke, K., Simonsson, T., Postberg, J., Rhodes, D. and Lipps, H.J. (2005) Telomere end-binding proteins control the formation of G-quadruplex DNA structures in vivo. Nat. Struct. Mol. Biol., 12, 847–854.

10. Todd, A.K., Haider, S.M., Parkinson, G.N. and Neidle, S. (2007) Sequence occurrence and structural uniqueness of a G-quadruplex in the human c-kit promoter. Nucleic Acids Res., 35, 5799–5808.

11. Duquette, M.L., Handa, P., Vincent, J.A., Taylor, A.F. and Maizels, N. (2004) Intracellular transcription of G-rich DNAs induces formation of G-loops, novel structures containing G4 DNA. Genes Dev., 18, 1618–1629.

12. Wieland, M. and Hartig, J.S. (2007) RNA quadruplex-based modulation of gene expression. Chem. Biol., 14, 757–763.

13. Saccà, B., Lacroix, L. and Mergny, J.L. (2005) The effect of chemical modifications on the thermal stability of different G-quadruplex-forming oligonucleotides. Nucleic Acids Res., 33, 1182–1192.

14. Kumari, S., Bugaut, A., Huppert, J.L. and Balasubramanian, S. (2007) An RNA G-quadruplex in the 5′ UTR of the NRAS proto-oncogene modulates translation. Nat. Chem. Biol., 3, 218–221.

15. Bashkirov, V.I., Scherthan, H., Solinger, J.A., Buerstedde, J.M. and Heyer, W.D. (1997) A mouse cytoplasmic exoribonuclease (mXRN1p) with preference for G4 tetraplex substrates. J. Cell Biol., 136, 761–773.

16. Nishimura, Y., Martin, C.L., Vazquez-Lopez, A., Spence, S.J., Alvarez-Retuerto, A.I., Sigman, M., Steindler, C., Pellegrini, S., Schanen, N.C. et al. (2007) Genome-wide expression profiling of lymphoblastoid cell lines distinguishes different forms of autism and reveals shared pathways. Hum. Mol. Genet., 16, 1682–1698.

17. Bagga, P.S., Ford, L.P., Chen, F. and Wilusz, J. (1995) The G-rich auxiliary downstream element has distinct sequence and position requirements and mediates efficient 3′ end pre-mRNA processing through a trans-acting factor. Nucleic Acids Res., 23, 1625–1631.

18. Bagga, P.S., Arhin, G.K. and Wilusz, J. (1998) DSEF-1 is a member of the hnRNP H family of RNA-binding proteins and stimulates pre-mRNA cleavage and polyadenylation in vitro. Nucleic Acids Res., 26, 5343–5350.

19. Arhin, G.K., Boots, M., Bagga, P.S., Milcarek, C. and Wilusz, J. (2002) Downstream sequence elements with different affinities for the hnRNP H/H′ protein influence the processing efficiency of mammalian polyadenylation signals. Nucleic Acids Res., 30, 1842–1850.


20. Dalziel, M., Nunes, N.M. and Furger, A. (2007) Two G-rich regulatory elements located adjacent to and 440 nucleotides downstream of the core poly(A) site of the intronless melanocortin receptor 1 gene are critical for efficient 3′ end processing. Mol. Cell. Biol., 27, 1568–1580.

21. Kikin, O., D’Antonio, L. and Bagga, P.S. (2006) QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences. Nucleic Acids Res., 34, W676–W682.

22. Veraldi, K.L., Arhin, G.K., Martincic, K., Chung-Ganster, L.H., Wilusz, J. and Milcarek, C. (2001) hnRNP F influences binding of a 64-kilodalton subunit of cleavage stimulation factor to mRNA precursors in mouse B cells. Mol. Cell. Biol., 21, 1228–1238.

23. Bruce, S.R., Dingle, R.W. and Peterson, M.L. (2003) B-cell and plasma-cell splicing differences: a potential role in regulated immunoglobulin RNA processing. RNA, 9, 1264–1273.

24. Garneau, D., Revil, T., Fisette, J.F. and Chabot, B. (2005) Heterogeneous nuclear ribonucleoprotein F/H proteins modulate the alternative splicing of the apoptotic mediator Bcl-x. J. Biol. Chem., 280, 22641–22650.

25. Zhang, Q.S., Manche, L., Xu, R.M. and Krainer, A.R. (2006) hnRNP A1 associates with telomere ends and stimulates telomerase activity. RNA, 12, 1116–1128.

26. Han, K., Yeo, G., An, P., Burge, C.B. and Grabowski, P.J. (2005) A combinatorial code for splicing silencing: UAGG and GGGG motifs. PLoS Biol., 3, e158.

27. Wang, E., Dimova, N. and Cambi, F. (2007) PLP/DM20 ratio is regulated by hnRNPH and F and a novel G-rich enhancer in oligodendrocytes. Nucleic Acids Res., 35, 4164–4178.

28. Tian, B., Hu, J., Zhang, H. and Lutz, C.S. (2005) A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res., 33, 201–212.

29. Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R. et al. (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science, 302, 2141–2144.

30. Huppert, J.L. and Balasubramanian, S. (2005) Prevalence of quadruplexes in the human genome. Nucleic Acids Res., 33, 2908–2916.

31. Todd, A.K., Johnston, M. and Neidle, S. (2005) Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res., 33, 2901–2907.

32. Eddy, J. and Maizels, N. (2006) Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res., 34, 3887–3896.

33. Kostadinov, R., Malhotra, N., Viotti, M., Shine, R., D’Antonio, L. and Bagga, P. (2006) GRSDB: a database of quadruplex forming G-rich sequences in alternatively processed mammalian pre-mRNA sequences. Nucleic Acids Res., 34, D119–D124.

34. Bonnal, S., Schaeffer, C., Creancier, L., Clamens, S., Moine, H., Prats, A.C. and Vagner, S. (2003) A single internal ribosome entry site containing a G quartet RNA structure drives fibroblast growth factor 2 gene expression at four alternative translation initiation codons. J. Biol. Chem., 278, 39330–39336.

35. Oliver, A.W., Bogdarina, I., Schroeder, E., Taylor, I.A. and Kneale, G.G. (2000) Preferential binding of fd gene 5 protein to tetraplex nucleic acid structures. J. Mol. Biol., 301, 575–584.

36. Mignone, F., Gissi, C., Liuni, S. and Pesole, G. (2002) Untranslated regions of mRNAs. Genome Biol., 3, reviews0004.1–reviews0004.10.

37. Crnugelj, M., Sket, P. and Plavec, J. (2003) Small change in a G-rich sequence, a dramatic change in topology: new dimeric G-quadruplex folding motif with unique loop orientations. J. Am. Chem. Soc., 125, 7866–7871.

38. Hazel, P., Huppert, J., Balasubramanian, S. and Neidle, S. (2004) Loop-length-dependent folding of G-quadruplexes. J. Am. Chem. Soc., 126, 16405–16415.

39. Risitano, A. and Fox, K.R. (2004) Influence of loop size on the stability of intramolecular DNA quadruplexes. Nucleic Acids Res., 32, 2598–2606.

40. Zarudnaya, M.I., Kolomiets, I.M., Potyahaylo, A.L. and Hovorun, D.M. (2003) Downstream elements of mammalian pre-mRNA polyadenylation signals: primary, secondary and higher-order structures. Nucleic Acids Res., 31, 1375–1386.

41. Kankia, B.I., Barany, G. and Musier-Forsyth, K. (2005) Unfolding of DNA quadruplexes induced by HIV-1 nucleocapsid protein. Nucleic Acids Res., 33, 4395–4403.


Figure 4. GRS_UTRdb: NM_198709.1 mRNA (ARSB Gene) Map. Displays basic information about the 5′ UTR, CDS and 3′ UTR, and the QGRS frequency in each region. A visual map of each region is displayed, with the relative positions of QGRS depicted in the map.

Figure 5. GRS_UTRdb: NM_198709.1 Sequence View. Shows the nucleotide sequence for the entire product in separate displays for the 5′ UTR, CDS and 3′ UTR. QGRS are enclosed in a box, with each G-group shown in purple.
