Available online at www.sciencedirect.com

Genomics 91 (2008) 232–242
www.elsevier.com/locate/ygeno

Review

The current status of cDNA cloning

Matthias Harbers

DNAFORM, Inc., Leading Venture Plaza 2, 75-1 Ono-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0046, Japan

Received 17 August 2007; accepted 17 November 2007; available online 28 January 2008

Abstract

The cloning of cDNAs, copies of cellular RNA, is one of the classical technologies in molecular biology. Over the past 30 years cDNA cloning technologies have been improved to enable the cloning of large cDNA collections, which are fundamental to today's understanding of the utilization of genetic information. With the discovery of noncoding RNAs, additional new approaches to the cloning of short RNAs have been developed. However, with the realization that much larger portions of genomes are transcribed than anticipated from genome annotations, cDNA cloning faces new challenges to uncover rare transcripts and to make the corresponding cDNAs available for functional studies. This review provides an overview on the current status of cDNA cloning and possibilities for the discovery and characterization of new RNA families.
© 2007 Elsevier Inc. All rights reserved.

Keywords: cDNA cloning; cDNA library; mRNA; Small RNA; Non-coding RNA; Expression cloning

Contents

Introduction
Reverse transcriptases and first-strand cDNA synthesis
Second-strand cDNA synthesis
Cloning vectors
Single-cell cDNA library preparation
Approaches to full-length cDNA cloning
Normalized and subtracted cDNA libraries
Addressing RNA splicing
Large-scale cDNA cloning projects and clone collections
Linking cloning to functional analysis
The emerging new RNA world
Cloning small RNA
Cloning long ncRNA
Future perspectives and developments
Acknowledgments
References

Introduction

cDNA cloning is one of the fundamental technologies in molecular biology, and most of our knowledge about transcripts and proteins is derived from the ability to prepare cDNA copies from RNA and to clone them into cDNA libraries. Starting with the discovery of reverse transcriptases, different protocols for cDNA library construction have been developed over time. Improvements in library preparation have been instrumental to gene discovery and the creation of large genomic resources. Recent discoveries of new classes of RNA and transcripts expressed at very low levels demand new cDNA cloning approaches to make such RNAs available for functional analysis. Although it is beyond the scope of this article to review all the technical developments of the past 30 years, key steps in cDNA library preparation are addressed to highlight general principles of cDNA cloning (Fig. 1) and to give an overview on the current status of cDNA cloning and future directions.

0888-7543/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2007.11.004

Reverse transcriptases and first-strand cDNA synthesis

The enzymatic conversion of RNA into double-stranded cDNA has become routine since the discovery of reverse transcriptases in 1970 [1,2], and improved conditions for cDNA synthesis became available by 1975 [3]. Reverse transcriptases are RNA- and DNA-dependent DNA polymerases that can use either RNA or DNA to prime DNA synthesis. Commercial preparations of the avian leukemia virus and Moloney strain murine leukemia virus (Mo-MLV) reverse transcriptase are commonly used today, and removal of the RNase H activity from Mo-MLV reverse transcriptases further improved cDNA yields [4]. Moreover, addition of T4 bacteriophage gene 32 protein (T4gp32) can boost the synthesis of long cDNAs [5] and trehalose increases enzyme fidelity and enables cDNA synthesis at higher temperatures [6,7].

cDNA library preparation has mostly focused on cloning of mRNAs, for which generally oligo(dT) primers (dT12–18) are used to initiate cDNA synthesis from poly(A) tails at the 3′ end. Although posttranscriptional addition of poly(A) tails is restricted to RNA polymerase II-derived mRNA transcripts, oligo(dT) priming can also occur at internal A-rich sequences, including RNA polymerase III-transcribed Alu repeats [8]. It has been estimated that some 10 to 15% of the cDNA clones within oligo(dT)-primed libraries could have truncated 3′ ends due to misannealing of oligo(dT) primers [9]; this is particularly a problem for cDNAs derived from very long messages [10,11]. Alternatively, random primers of 6 to 9 nucleotides can be used to drive reverse transcription reactions [12,13]. While this is sometimes useful to reach the 5′ end of very long transcripts and frequently used for analytical purposes, this approach does not allow full-length cDNA cloning. For later digestion and directional cloning of cDNAs, recognition sites for restriction endonucleases can be introduced at the 3′ end of cDNA using bifunctional primers comprising oligo(dT) and linker regions [14]. During first-strand synthesis cDNA can be further protected against digestion by methylation-sensitive enzymes by introducing 5-methylcytosine to create hemimethylated DNA [15].

Fig. 1. Classical cDNA library cloning. Key steps for the preparation of a cDNA library are shown. For further details on each step, refer to the text.

Second-strand cDNA synthesis

Synthesis of the second cDNA strand requires a priming site at the 5′ end of cDNA. Originally, hairpin structures at the 3′ end of single-stranded cDNAs were used in self-priming reactions, and second strands were synthesized by the DNA polymerase activity of the reverse transcriptase followed by S1 nuclease digestion to remove the hairpin structures. However, self-priming is a poorly controlled reaction and the required S1 nuclease treatment removes 5′ end sequences from double-stranded cDNA. Various alternative approaches for priming second-strand synthesis have been developed over time such as the addition of homopolymers to the 3′ end of single-stranded cDNA [16,17], replacement synthesis after nicking the RNA in RNA/cDNA hybrids with RNase H [18], ligation of an RNA oligonucleotide to the 5′ end of RNA prior to the reverse transcription reaction [19,20], or ligation of double-stranded adaptors to the 3′ end of single-stranded cDNA [21]. For full-length cDNA cloning, the addition of a priming site should not affect the 5′ end of double-stranded cDNA and allows for directional cloning by introducing a restriction endonuclease recognition site at the 5′ end that is distinct from the recognition site used at the 3′ end. Second-strand cDNA synthesis can be combined with PCR amplification to clone cDNAs from small amounts of RNA. However, PCR amplification is biased against longer cDNAs and templates present at low concentrations [22] and is not recommended for general library preparation.
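The oligo(dT) misannealing problem described for first-strand synthesis can be screened for computationally before a transcript is trusted as 3′-complete. The following is a minimal sketch, not a published protocol: the function name `internal_a_runs` is hypothetical, and the default threshold of 12 consecutive A residues is an assumption taken from the lower bound of the dT12–18 primer length. It reports only perfect A-runs, so A-rich stretches with interruptions (which may also prime) would need a more permissive scoring scheme.

```python
import re


def internal_a_runs(mrna: str, min_run: int = 12):
    """Return (start, end) spans of internal A-runs of at least `min_run` nt.

    The genuine 3' poly(A) tail (the trailing run of A residues) is excluded,
    so only candidate *internal* priming sites are reported.
    Coordinates are 0-based, end-exclusive.
    """
    seq = mrna.upper().replace("U", "T")
    body_len = len(seq.rstrip("A"))  # length of the message before the tail
    pattern = re.compile("A{%d,}" % min_run)
    return [
        (m.start(), m.end())
        for m in pattern.finditer(seq)
        if m.end() <= body_len  # drop the run that forms the poly(A) tail
    ]


if __name__ == "__main__":
    # Toy message: a 15-nt internal A-run followed by a 30-nt poly(A) tail.
    toy = "GGCC" + "A" * 15 + "GCGTTC" + "A" * 30
    print(internal_a_runs(toy))  # [(4, 19)]
```

A transcript with one or more reported spans is a candidate for 3′ truncation in an oligo(dT)-primed library; whether priming actually occurs there depends on reaction conditions that this sketch does not model.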

Cloning vectors

For propagation (commonly in Escherichia coli [23]), double-stranded cDNAs are cloned into a plasmid [24] or bacteriophage vector. For longer cDNAs plasmid libraries are difficult to preserve, whereas in vitro packaging into bacteriophage λ, like the classical expression vector λgt11 [25], Lambda ZAP [26], or Lambda-FLC [27], allows for a wider cloning range, higher titers, and safe long-term storage. Due to limitations in preparing λ DNA, phages with automatic subcloning vectors have been developed [26,27]. Different vectors contain asymmetric cloning sites for directional cloning, unique restriction or recombination sites for releasing entire cDNA inserts, background cutting to reduce the number of empty vectors in the library, and special promoter features for protein expression. A wide range of dedicated cloning vectors [28] is commercially available along with approaches to library screening, e.g., by random clone picking, use of antibodies, hybridization, or PCR.

Single-cell cDNA library preparation

Important advances have been made in the cloning of cDNAs from very small amounts of RNA or even a single cell [29,30] to address, for instance, the zonal expression of transcripts (a single mammalian cell contains about 20 to 40 pg total RNA including 0.5 to 1.0 pg mRNA [31]). These developments and new technologies for the isolation of individual cells by laser capture microdissection [32,33] or cell aspiration after microinjection enable the analysis of genes expressed in specific cells from heterogeneous tissues [34]. Apart from PCR amplification, novel approaches have been developed to amplify RNA directly in cells [35] by preparing antisense RNA [36]. cDNA synthesis from whole RNA is primed by an oligonucleotide containing a T7 RNA polymerase promoter, and after second-strand cDNA synthesis, T7 RNA polymerase can be used to generate antisense RNA from the cDNA. Since multiple RNA copies are obtained from a single cDNA template, the method allows for linear amplification of RNA. Modifications of the procedure have been published to enable full-length cDNA cloning after cDNA tailing by a terminal transferase [37]. An alternative approach to single-cell cDNA library preparation makes use of oligo(dT) primers linked to magnetic beads to perform reverse transcription reactions and PCR [22] and to handle small amounts of cDNA with a reduced risk of losing material. Recently a protocol for generating cDNA libraries from 1 ng of total RNA that performs all reactions on oligo(dT) magnetic beads was published [38]. It introduces a T7 RNA polymerase promoter sequence for amplification and generates double-stranded DNA by a modified switching mechanism at the 5′ end of RNA to enable PCR amplification and cloning into a vector. However, single-cell amplification reactions are limited in their reproducibility, where PCR amplification may be more reliable than linear amplification [39]. Although the necessary amplification makes such libraries very biased, these approaches still open up new prospects in tissue- or cell-specific gene regulation, such as tumor marker discovery in difficult-to-classify, poorly differentiated cancers [40].
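The per-cell RNA masses quoted above make it easy to see why amplification is unavoidable at this scale. The short calculation below is only an illustrative sketch: the function names are hypothetical, and the 1 µg amplification target used in the example is an assumed figure chosen for illustration, not a value from the text.

```python
# Per-cell RNA content as quoted in the text (picograms, low and high estimate).
TOTAL_RNA_PG_PER_CELL = (20.0, 40.0)
MRNA_PG_PER_CELL = (0.5, 1.0)


def cell_equivalents(input_pg: float):
    """Range (min, max) of cells whose total RNA adds up to `input_pg`."""
    low_content, high_content = TOTAL_RNA_PG_PER_CELL
    return input_pg / high_content, input_pg / low_content


def fold_amplification_needed(target_pg: float, cells: int = 1):
    """Range (min, max) of fold amplification needed to reach `target_pg`
    of product when starting from the mRNA of `cells` cells."""
    low_mrna, high_mrna = MRNA_PG_PER_CELL
    return target_pg / (high_mrna * cells), target_pg / (low_mrna * cells)


if __name__ == "__main__":
    # 1 ng (1000 pg) of total RNA, the input of the bead-based protocol [38]:
    print(cell_equivalents(1000.0))  # (25.0, 50.0)
    # Assumed target of 1 microgram (1e6 pg) of product from a single cell:
    print(fold_amplification_needed(1_000_000.0))  # (1000000.0, 2000000.0)
```

Under these figures, 1 ng of total RNA corresponds to roughly 25 to 50 cells, and a single cell would need on the order of a million-fold amplification to reach the assumed microgram target, which is consistent with the bias and reproducibility concerns raised in the text.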

Approaches to full-length cDNA cloning

Most important for effective cloning approaches is the production of full-length cDNAs (cDNAs having an open reading frame or ORF) at a high rate for functional analysis of encoded proteins and for information on true 5′ ends of cDNAs to identify promoter regions in the genome. Various approaches make use of the 5′-end-specific cap structure of mRNA to enrich for full-length cDNAs, achieving full-length rates in the range of 90% or above [41–43]. The largest cDNA collections made so far used the cap-trapper [44–47] and oligo-capping [19,20,48] methods. In the cap-trapping method the cap structure is chemically biotinylated prior to selection of full-length mRNA/cDNA hybrids on streptavidin-coated beads, while in the oligo-capping process the cap structure is replaced by an RNA oligonucleotide prior to first-strand cDNA synthesis. Other approaches include a cap-binding protein [49], an antibody against the cap structure [50], and adding an oligonucleotide to the cap structure (U.S. Patent 6,022,715) or are based on a cap-switch mechanism [51].
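Because full-length status is tied above to the presence of an ORF, a first computational check on a sequenced clone is to look for its longest ATG-to-stop reading frame. The sketch below assumes the standard stop codons and scans the sense strand only; the function name `longest_orf` is hypothetical. Finding an ORF does not by itself prove that the true 5′ end was captured, which is what the cap-based methods address experimentally.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(cdna: str):
    """Return (start, end) of the longest ATG...stop ORF on the given strand.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Returns (0, 0) when no complete ORF is present.
    """
    seq = cdna.upper().replace("U", "T")
    best = (0, 0)
    for frame in range(3):
        start = None  # position of the first ATG of the ORF currently open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # close this ORF and look for the next ATG
    return best


if __name__ == "__main__":
    toy = "GG" + "ATG" + "GCT" * 5 + "TAA" + "CC"
    print(longest_orf(toy))  # (2, 23)
```

An ORF that is still open at the 3′ end of the clone is deliberately not reported, since such a clone lacks a stop codon and is likely truncated.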

Normalized and subtracted cDNA libraries

In addition to the full-length cDNAs, large gene discovery programs require special cloning strategies to reduce redundancy within libraries and final clone collections and to avoid overrepresentation of housekeeping genes [52,53]. Moreover, to reduce costs, projects have focused on the cloning of one representative full-length cDNA clone per gene. Although selection criteria for full-length cDNA sequencing varied between different projects, the enrichment for new cDNA clones was preferably done at the level of cDNA library construction, keeping the cost of sequencing randomly isolated clones to a minimum [52]. The high variation in cDNA abundance within libraries can be reduced (normalized) using time-limited reassociation kinetics [54,55]. Since the most abundant cDNA species hybridize faster than rare ones, the double-stranded hybrids formed by abundant transcripts can be removed from the remaining single-stranded cDNAs of less abundant transcripts. Alternative approaches use a double-strand DNA-specific endonuclease (duplex-specific nuclease) to digest DNA/DNA hybrids or the DNA portion within RNA/DNA hybrids [56], or RNase H to destroy the RNA portion in RNA/DNA hybrids (U.S. Patent 6,544,741 and [57]). In addition to normalization, known cDNAs can be removed from cDNA libraries in a subtraction step for higher discovery rates [9,52,58]. Libraries can also be enriched for short or long cDNAs by size fractionation or by removing undesired sequences by subtraction to clone differentially expressed genes [59].

Table 1

Clone resource | Home page | Reference/comment
IMAGE Consortium | http://image.llnl.gov/ | [67]
Mammalian Gene Collection (MGC) | http://mgc.nci.nih.gov/ | [68]
FANTOM–mouse cDNA collection | http://fantom3.gsc.riken.go.jp/ | [45]
Rat EST project | http://ratest.eng.uiowa.edu/ | [53]
Xenopus Gene Collection | http://xgc.nci.nih.gov/ | Subproject to MGC
Zebrafish Gene Collection | http://zgc.nci.nih.gov/ | Subproject to MGC
Drosophila Gene Collection (Berkeley) | http://www.fruitfly.org/DGC/index.html | [69]
Rice full-length cDNA consortium | http://cdna01.dna.affrc.go.jp/cDNA/ and http://www.rgrc.dna.affrc.go.jp/index.html.en | [46]
Full-length Arabidopsis cDNA collection | http://www.brc.riken.jp/lab/epd/catalog/cdnaclone.html | [47]
ORFeome Collaboration (human) | http://www.orfeomecollaboration.org/ | [72]
C. elegans ORFeome | http://worfdb.dfci.harvard.edu/ | [73]
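The time-limited reassociation step used for normalization can be illustrated numerically. The sketch below assumes ideal second-order kinetics in which each cDNA species reanneals only with its own complement, so that the fraction still single-stranded after time t is 1 / (1 + k·C0·t). The rate constant `k`, the time `t`, and the starting abundances are arbitrary illustrative values, and the function names are hypothetical; real libraries deviate from this idealization (cross-hybridization, length effects).

```python
def single_stranded_fraction(c0: float, k: float, t: float) -> float:
    """Fraction of a species still single-stranded after time t,
    assuming ideal second-order reassociation: 1 / (1 + k * c0 * t)."""
    return 1.0 / (1.0 + k * c0 * t)


def normalize(abundances: dict, k: float, t: float) -> dict:
    """Amount of each species left in the single-stranded fraction,
    i.e. what remains after double-stranded hybrids are removed."""
    return {
        name: c0 * single_stranded_fraction(c0, k, t)
        for name, c0 in abundances.items()
    }


if __name__ == "__main__":
    # Arbitrary units: one abundant and one rare transcript, 1000:1.
    library = {"abundant": 1000.0, "rare": 1.0}
    remaining = normalize(library, k=1.0, t=0.01)
    ratio_before = library["abundant"] / library["rare"]
    ratio_after = remaining["abundant"] / remaining["rare"]
    print(f"abundance ratio before: {ratio_before:.0f}, after: {ratio_after:.1f}")
```

In this toy case the abundant species loses about 91% of its single-stranded molecules while the rare one loses about 1%, so the abundance ratio drops from 1000 to roughly 92. Letting the reaction run longer flattens the distribution further at the cost of total yield.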

Addressing RNA splicing

Most cDNA cloning projects have largely ignored alternative splice variants and focused on representative clones. This is important to note as it has been estimated that 65% of all mammalian transcripts might be alternatively spliced [45], including important regulators related directly to human disease [60]. Splicing could explain the increased complexity of higher organisms, although our present knowledge on splice variants is insufficient to show an increase in splicing with developmental complexity. New approaches have been developed to monitor alternative exon usage in different samples (U.S. Patent 6,251,590 and [61,62]). The methods of Watahiki and Thill form DNA–DNA hybrids, in which alternative exons loop out as regions of single-stranded DNA surrounded by regions of double-stranded DNA. Molecules with single-stranded DNA regions are then isolated by a single-strand DNA binding molecule and cloned for sequence analysis. This identifies individual exons, but a modified approach allows the isolation of full-length splice variants (Patent Application WO2005108608). Selective cloning of tissue-specific splice variants will be important in the future use of cDNA libraries along with progress in full-length sequencing by new sequencing technologies (see below). The majority of mammalian genes probably use both alternative splicing and transcription from multiple start sites [63,64], and studies will further explore relationships between splicing and alternative promoter usage [65,66].

Large-scale cDNA cloning projects and clone collections

Current achievements in cDNA library preparation are marked by the success of large-scale cDNA cloning projects such as the IMAGE Consortium [67]; the Mammalian Gene Collection [68]; Drosophila melanogaster [69], human [48], rice [46], and Arabidopsis [47]; and the RIKEN mouse FANTOM projects [45]. Initially cDNA libraries were prepared to sequence expressed sequence tags (ESTs) for cataloging transcripts on a genome-wide scale [70], followed by large-scale projects at Washington University and commercial entities like Incyte and Human Genome Sciences (a list of EST projects can be found at http://image.llnl.gov/image/html/projects.shtml). The cDNA libraries produced in these efforts focused on high-throughput EST sequencing, commonly from 3′ ends, but were insufficient for preparing large clone collections due to the small insert sizes and low full-length rates [71]. Progress in full-length cDNA cloning in combination with normalization and subtraction techniques allowed for the preparation of comprehensive cDNA collections, though large cDNA collections are still available only for some model organisms due to the high cost of full-length cDNA sequencing; refer to Table 1 for more information on cDNA collections in the public domain.

Large cDNA clone collections are also one of the starting points for establishing ORF clone resources from human [72] or Caenorhabditis elegans [73]. These clone collections comprise sequence-validated master clones in entry vectors that allow for direct transfer of ORFs to a broad range of expression vectors [74,75]. Most commonly, ORF collections make use of site-specific recombination cloning systems for easier large-scale manipulation of cDNA inserts [76]. It is hoped that ORF resources will support functional studies on protein-coding genes using arrayed clone sets in highly parallel experiments under controlled conditions.

Linking cloning to functional analysis

Use of ORF clone collections and other genomic resources in functional studies requires further progress in the development of new screening platforms based on biochemical assays suitable to match large cDNA clone collections [77,78]. With an increasing number of functional assays available, genetic screens can now be performed in mammalian cell cultures by selecting against cellular activities like apoptosis, senescence, differentiation, or oncogenic transformation [79]. Screening assays have also benefited from the development of RNAi libraries for gene inactivation [80–82], where protocols for converting cDNAs into siDNAs by MmeI digestion and cloning into expression vectors are available [83,84]; approaches for generation of RNAi libraries have recently been reviewed [85]. Although much attention is now focusing on loss-of-function studies using RNAi resources [86–88], these must be complemented by gain-of-function studies based on cDNA and/or ORF resources to reduce the inherent rates of false positive and negative results in screening assays [89].

Expression libraries can be used as an alternative to cDNA/ORF collections as a starting point [90–93]. Expression cloning has thus far focused on protein coding transcripts to identify specific cDNAs by their biological activity in a screening assay. Such assays can include changes in cell behavior like cell death or cell survival, changes in the expression of endogenous or reporter proteins, or direct binding to exposed polypeptides in phage display [94,95], baculovirus display [96], and ribosome display experiments [97]. Phage display was originally limited in directly screening cDNA libraries due to the need to fuse cDNAs to the N-terminus of pIII and pVIII phage proteins and the lack of posttranslational protein modification. In part, these problems have been addressed in the pJuFo system linking the protein in question to a phage protein via the high-affinity interaction of Jun and Fos leucine zippers [98] and the use of baculovirus-infected insect cells [96]. However, expression libraries often suffer from 5′ and 3′ untranslated regions in cDNAs hampering their expression or translation, wrong orientation of cDNA inserts, undefined reading frames, or the use of partial cDNA fragments, e.g., in two-hybrid screens to enable production of fusion proteins [99–102]. These limitations emphasize the need for ORF cloning, where ORFs currently have to be cloned individually by PCR. Direct cloning of cDNA libraries comprising only ORF regions has not yet been achieved, although first efforts have been made by selecting expressing cDNA clones in a yeast system [103]. Such a selection system could be of great interest to distinguish experimentally between coding and noncoding RNAs based on their ability to translate in vivo or in vitro (see below).

The emerging new RNA world

Although conducted for some 15 years, large-scale cDNA cloning projects have not yet revealed all transcripts [43,104]. On the contrary, recent publications indicate that much larger portions of genomes are actively transcribed than previously estimated from whole genome annotations [105–108]. This challenges the classical view of “isolated genes” surrounded by nontranscribed regions; in particular, overlapping sense–antisense pairs seem to be a common feature of complex genomes [109–111]. It has even been suggested that all nonrepeat portions of the human genome could be transcribed [112]; similarly, 85% of the yeast genome is expressed [113]. Sometimes referred to as “TUF” (transcripts of unknown function [107]) or “dark matter” [114], there is widespread low-level expression of potentially noncoding transcripts [115–119]. It remains to distinguish between “meaningful transcripts” and “transcriptional noise” [120]. However, more and more studies suggest that noncoding RNAs (ncRNAs) play central roles in gene expression and genome organization [121]. The lower conservation found for many noncoding transcripts argues for their evolutionary importance since such transcripts can change faster during evolution than coding transcripts.
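The overlapping sense–antisense pairs mentioned above are typically recognized once cDNA sequences have been mapped to the genome. The following sketch shows one simple way to flag them from mapped coordinates. The `Transcript` record, its field names, and the example coordinates are all hypothetical and do not correspond to any particular annotation format; the pairwise comparison is quadratic and is meant for illustration rather than genome-scale use.

```python
from itertools import combinations
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def sense_antisense_pairs(transcripts):
    """Return (name_a, name_b, overlap_bp) for every pair of transcripts that
    lie on the same chromosome, on opposite strands, and overlap by >= 1 bp."""
    pairs = []
    for a, b in combinations(transcripts, 2):
        if a.chrom != b.chrom or a.strand == b.strand:
            continue
        overlap = min(a.end, b.end) - max(a.start, b.start)
        if overlap > 0:
            pairs.append((a.name, b.name, overlap))
    return pairs


if __name__ == "__main__":
    mapped = [
        Transcript("t1", "chr1", 100, 500, "+"),
        Transcript("t2", "chr1", 400, 900, "-"),
        Transcript("t3", "chr1", 450, 600, "+"),
        Transcript("t4", "chr2", 100, 500, "-"),
    ]
    print(sense_antisense_pairs(mapped))
    # [('t1', 't2', 100), ('t2', 't3', 150)]
```

This treats each transcript as a single genomic interval; an exon-aware version would compare exon blocks so that an antisense transcript lying entirely within an intron is reported separately from a true exon overlap.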

Cloning small RNA

Recently short noncoding RNAs have become a key focus in research, emphasizing the great importance of new RNA families. Such RNA molecules include miRNAs and their precursors [122,123], snoRNAs [124], rasiRNAs [125,126], piRNAs [127], and small regulatory RNAs. The discovery of this new “RNA world” revealed that standard cDNA libraries missed many transcripts, notably those of short length (commonly under 500 bp) or lacking poly(A) tails [107,108,128,129]. New approaches are being developed for targeted cloning of short ncRNAs [130–132], also referred to as experimental RNomics (Fig. 2).

Fig. 2. Short RNA cloning and universal cDNA libraries. Key steps for the preparation of a short RNA library are shown. This concept can be extended to cloning universal cDNA libraries that in principle could comprise copies of all RNA. For further details on each step, refer to the text.

Tailor-made cDNA libraries for systematic searches on ncRNAs have concentrated on specific RNA classes, where target groups were selected by RNA size fractionation, the ability to ligate a linker to phosphorylated 5′ ends of RNA, structural signature sequences, protein and RNA target binding, or subcellular location. All these approaches require that specific priming sites for first-strand cDNA synthesis be introduced at the 3′ end of RNA molecules (short RNAs commonly do not have poly(A) tails for oligo(dT) priming). These priming sites can be introduced by ligating an RNA adaptor to the open 3′ end of RNA using an RNA ligase [132] or by extending the open 3′ end of RNA by adding homopolymers using a poly(A) polymerase [133]. The poly(A) polymerase can also be used for C-tailing, which can be very useful to reduce priming from polyadenylated mRNAs [133]. Specific priming sequences for second-strand cDNA synthesis can be introduced by different approaches as outlined above for the cloning of standard cDNA libraries, such as using poly(C) overhangs in the 5′-adaptor ligation step [134] or the addition of an RNA oligonucleotide to the 5′ end of RNA [135]. As mentioned for standard cDNA cloning protocols, approaches for full-length cloning are again preferable, although this is not a major issue for analytical applications. RNA or cDNA fragments having adaptors at both ends can be rapidly amplified by PCR for further analysis and/or cloning. Usually small RNAs are enriched by size fractionation prior to cloning into a library. Since size fractionation alone is not specific enough for targeted cloning approaches, such libraries usually have low discovery rates. Additional selection steps can improve discovery rates: in the case of H/ACA snoRNAs, for example, the process can be made more specific by using anchored primers for conserved triple nucleotides in the H/ACA box [136]. Mostly short RNA libraries are prepared for new ncRNA discovery and expression profiling [137], which does not require ncRNA cloning but rather relies on the power of new high-throughput sequencing methods [138–140] in brute-force deep sequencing experiments. Although well proven for the discovery of new RNAs, sequences alone will not be sufficient to elucidate the function of newly discovered transcripts. Therefore sequencing approaches should be coupled to cDNA preparation. Here the limit may be set by new high-throughput sequencing approaches like the 454 Genome Sequencer FLX System, which can obtain about 200 bp per read. This is sufficient for full-length sequencing of short RNAs at high throughput followed by in vitro synthesis of the corresponding cDNAs. Certainly the rapidly increasing power of high-throughput sequencing in combination with gene synthesis will make such approaches feasible. DNA fragments of some 500 to 800 bp can rapidly be prepared from 40-bp oligonucleotides and automated PCR [141], arguing for an important role for gene synthesis in future transcript analyses.
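The gene synthesis route mentioned at the end of this paragraph starts from a set of short overlapping oligonucleotides. As a rough sketch of how a sequenced transcript could be split into 40-nt oligonucleotides for PCR-based assembly, the code below tiles the target with alternating sense and antisense oligos. The 20-nt overlap between neighbors is an assumption (the text specifies only the 40-bp oligo length), the function names are hypothetical, and practical designs would also balance melting temperatures and avoid repeats, which is not attempted here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(target: str, oligo_len: int = 40, overlap: int = 20):
    """Split `target` into oligos of up to `oligo_len` nt whose neighbors
    overlap by `overlap` nt. Even-numbered oligos are sense strand,
    odd-numbered oligos are antisense (reverse complement)."""
    target = target.upper()
    step = oligo_len - overlap
    oligos = []
    for n, start in enumerate(range(0, max(len(target) - overlap, 1), step)):
        window = target[start:start + oligo_len]
        oligos.append(window if n % 2 == 0 else reverse_complement(window))
    return oligos


def reassemble(oligos, overlap: int = 20) -> str:
    """In-silico check: rebuild the sense strand from the tiled oligos."""
    windows = [
        o if n % 2 == 0 else reverse_complement(o) for n, o in enumerate(oligos)
    ]
    return windows[0] + "".join(w[overlap:] for w in windows[1:])


if __name__ == "__main__":
    target = "ACGTTGCA" * 75  # 600-nt toy sequence (highly repetitive)
    oligos = tile_oligos(target)
    print(len(oligos), "oligos;", "round trip ok:", reassemble(oligos) == target)
```

With these assumed settings a 600-bp fragment needs 29 oligonucleotides. The toy target is repetitive only to keep the example short; such a sequence would be a poor candidate for real assembly because the overlaps are not unique.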

Cloning long ncRNA

Small RNA libraries do not capture all ncRNAs, and short RNA cloning often captures only mature RNA products. For instance miRNAs are expressed as pre-mRNAs that are processed by endonucleolytic cleavage in the nucleus and cytoplasm to yield mature miRNAs [142]. Those pre-mRNAs are not found in short RNA libraries, but have to be cloned by the standard library approaches outlined above. Cell fractionation in combination with inactivation of enzymes involved in the maturation process can open up interesting possibilities such as the recent global identification of noncoding RNAs in yeast [143]. Cloning such pre-mRNAs is important to understand the regulation of maturation processes and how they affect the cellular localization and function of mature RNAs in the cell. For example, ncRNAs associated with SC35 splicing domains have been identified by their nuclear localization and the fact that they are not exported into the cytoplasm [144]. These long ncRNAs, NEAT1 and NEAT2, are abundant in human and mouse tissues, and other long ncRNAs may be discovered in a similar way. Classical cloning approaches have already identified many long ncRNAs in mouse [45], including longer RNA transcripts (denoted as “macro-ncRNA” [11] or “macroRNA” [145]) often expressed in a sense–antisense orientation to other transcripts [109–111]. However, other than well-known long ncRNAs such as Xist [146] or Air [147], long ncRNAs have not yet attracted much attention, although they seem to be regulated functional transcripts [10,11,145]. Our present knowledge about their features limits the design of dedicated cloning approaches for long ncRNAs.

Future perspectives and developments

Looking back at the development of cDNA cloning technologies over the past 30 years, there is a strong basis for developing novel approaches to the cloning and characterization of new RNA species (Table 2). Only cDNA cloning will provide the necessary resources for functional studies on those new transcripts. This makes cDNA cloning a fundamental technology for future directions in gene discovery and transcriptome analysis.


Table 2

| Targeted RNA | Comment | Needs |
| --- | --- | --- |
| Poly(A)+ mRNA | Established for 500 to 15,000 bp | Libraries with wider cloning range. |
| Poly(A)− mRNA | Full-length cloning not established | Requires alternative approaches to priming reverse transcription reaction from 3′ end (Patent Application WO2006003721). |
| Long mRNA | >15,000 bp | Better reverse transcription reactions, new cloning vectors, e.g., BAC vectors. |
| Short mRNA | <500 bp | Better methods to remove adaptors. |
| Full-length mRNA | Established methods available | Full-length cloning is preferable wherever possible. Methods to capture capped mRNA should be used in all mRNA cloning approaches. |
| Coding mRNA | Selection of ORF clones | Experimental selection for ability to translate [103]. |
| Noncoding mRNA | Selection of clones lacking any ORF | Experimental selection against ability to translate [103]. |
| Splice variants of mRNA | Methods for selective cloning of splice variants available | Requires better methods to characterize splice variants and exon usage, e.g., by tiling arrays or full-length shotgun sequencing by new high-throughput sequencing. |
| Sense–antisense pairs | Very common feature for many genes | Selection by hybridization of “sense driver” to “sense tester” (U.S. Patents 6,528,262 and 6,986,988). |
| Short RNA | Commonly selected based on short length | Use of conserved structures for more selective cloning; may be hard, as conserved structures do not necessarily reflect on conserved sequences. Effective protocols to select RNA by binding to proteins and/or DNA/RNA. |
| Precursor RNA for short RNA | Most short RNAs go through maturation process | Cloning of long cDNAs using RNA prepared from conditioned cells. |
| All RNA | Full-length cDNA sequencing | New approaches to high-throughput full-length cDNA sequencing combining new sequencing technologies and shotgun sequencing. |
| All RNA | Cloning of rare transcripts | Linking cDNA cloning to tiling arrays and tag-based approaches. |
| All RNA | Target at “universal library” for transcriptome analysis | Modification of 5′ and 3′ ends for unbiased cloning of all RNA transcripts. |
| All RNA | Target at expression cloning for functional screens | Limiting factor for coding transcripts is ORF cloning. Direct cloning of ncRNAs into expression vectors may be suitable approach. Effective expression systems with inducible promoters are welcome. Resources should enable “gain-of-function” and “loss-of-function” experiments at the same time. Progress in the development of screening assays. |
| All RNA | Small-scale or single-cell libraries | Improvements in amplification methods. |

238 M. Harbers / Genomics 91 (2008) 232–242

Until now computational sequence analysis alone has largely missed many transcripts [148,149], whereas unsupervised approaches like tiling arrays [150] and tag-sequencing have pushed forward the borders in gene discovery [151]. New strategies combining tiling arrays and cDNA library screening can be envisioned, in which tiling arrays could be used to analyze the complexity of RNA samples or cDNA libraries and at the same time provide the sequence information needed to isolate clones for novel transcripts. The success of such strategies would largely depend on preparing highly complex cDNA libraries with high titers and the sensitivity of the tiling arrays to identify rare transcripts/cDNAs. For tag-based approaches a link between expression profiling and cloning has already been achieved: ditags or paired-end tags comprising the end sequences of cDNAs are derived from full-length cDNA libraries, and this provides sufficient sequence information for primer design and PCR cloning of new transcripts [152]. Similar approaches are also possible for other 5′ end tag-based approaches like CAGE [153] or 5′-SAGE [154]. Consequently classical cDNA libraries/library screening may be challenged by large-scale PCR cloning [155] and gene synthesis utilizing partial or complete sequence information from high-throughput sequencing and tiling array projects. PCR amplification has been widely used in cloning new cDNAs, although PCR amplification requires information from both ends of the transcripts and suitable templates, has an inherent error rate (see above), and may lead to multiple amplicons covering different splice variants. Ditag methods have identified very long transcripts [45], and it is expected that additional long transcripts will be discovered. Present approaches in cDNA library construction enable the cloning of cDNAs of up to 15 kb at best. Improved conditions for reverse transcription reactions are needed, as are new cloning vectors, e.g., BACs, to uncover long RNAs, including macro ncRNAs predicted from cDNA fragments [11].
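The link between paired-end tags and PCR cloning described above can be sketched in a few lines of Python. The example assumes that both tags are reported in the sense orientation of the transcript, so that the forward primer equals the 5′ tag and the reverse primer is the reverse complement of the 3′ tag. The function names and the simple exact-match "in silico PCR" are hypothetical simplifications for illustration and do not reproduce the procedure in [152].

```python
from typing import Optional, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def primers_from_ditag(tag5: str, tag3: str) -> Tuple[str, str]:
    """Derive a primer pair from the 5' and 3' end tags of one cDNA.

    Assumes both tags are given in the sense orientation of the transcript.
    """
    return tag5.upper(), reverse_complement(tag3)


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the predicted sense-strand amplicon, or None if a primer has no exact match."""
    template = template.upper()
    start = template.find(forward)
    site = reverse_complement(reverse)  # binding site of the reverse primer on the sense strand
    end = template.rfind(site)
    if start == -1 or end == -1 or end < start:
        return None
    return template[start:end + len(site)]
```

In practice, primer pairs derived this way can yield several amplicons when splice variants share both ends, which is one of the limitations of PCR cloning noted above.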

Especially tag-based approaches will greatly benefit from the fast development of new high-throughput sequencing methods [138–140] that allow deep sequencing of transcriptomes at low cost. These approaches not only will be important for tag-based expression profiling but also will further facilitate important functions in full-length cDNA sequencing by shotgun methods. For example, one can imagine preparing an individual shotgun library per cDNA clone, ligating the resulting DNA fragments to adaptors having clone-specific barcode sequences, and then performing a highly multiplexed sequencing reaction by pooling the barcoded DNA fragments derived from many different cDNA clones. The barcode sequences will guide clone-specific assembly of full-length cDNA sequences (for multiplexing and 454 sequencing refer to [156,157]). Such strategies are of particular interest for analyzing more splice variants and creating more sequence-verified cDNA resources.
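The sorting step in such a multiplexed experiment can be sketched as follows. The example assumes that every read starts with a barcode of one fixed length, that barcodes are supplied in upper case, and that only exact matches are accepted. The function name and the toy barcodes in the usage note are hypothetical; a production pipeline would additionally tolerate sequencing errors within the barcode.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str]) -> Dict[Optional[str], List[str]]:
    """Assign reads to clones by their leading barcode and trim the barcode.

    `barcodes` maps an upper-case barcode sequence to a clone identifier.
    Reads without a known barcode are collected, untrimmed, under the key None.
    """
    lengths = {len(barcode) for barcode in barcodes}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes barcodes of one fixed length")
    size = lengths.pop()
    bins = defaultdict(list)
    for read in reads:
        clone = barcodes.get(read[:size].upper())
        bins[clone].append(read[size:] if clone is not None else read)
    return dict(bins)
```

For instance, with the barcodes `{"ACGT": "clone_A", "TGCA": "clone_B"}`, the read `ACGTGGGAAA` would be assigned to `clone_A` as `GGGAAA`, after which the reads of each clone can be assembled separately.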

Using new strategies, future studies may shift from the large-scale cloning projects of the past to more focused applications driving the discovery of new transcripts and RNA classes. Knowledge-driven approaches will make targeted isolation and cloning of new RNA classes possible, and this process will benefit from a better understanding of structural features of different RNA groups or their cellular localization. For instance, naturally occurring sense–antisense pairs can be isolated by hybridizing cDNAs from sense and antisense RNA obtained from the same sample (U.S. Patents 6,528,262 and 6,986,988), and poly(A)− mRNA could be cloned by 3′ end adaptor ligation to total RNA followed by an mRNA-specific cap selection. It was suggested to use double-stranded adaptors with oligo(dT) overhangs to block the 3′ ends from polyadenylated mRNA prior to adding an adaptor to poly(A)− RNA for cloning specifically poly(A)− RNA (Patent Application WO2006003721). Such approaches will be important in understanding the large portion of nonpolyadenylated mRNAs in the cell not yet covered by any cDNA collection [107,108,128,129].

Alternatively, new high-throughput sequencing technologies may drive the development of “universal cloning strategies.” Today cDNA library strategies follow certain assumptions to direct the cloning to preferred RNA groups, preferentially by size fractionation of mRNAs with a size of over 500 bp and much shorter ncRNAs of about 25 bp. However, cloning strategies developed for short RNA detection making use of the addition of poly(C) or poly(A) tails, or adaptor ligation at 5′ ends and/or 3′ ends can be extended to develop a universal cloning strategy. Basically every RNA molecule that can be modified at its 5′ and 3′ ends to enable priming of the reverse transcription reaction and the preparation of a second cDNA strand can be converted into a cDNA comprising the entire RNA sequence and contained in a universal cDNA library. Short RNAs derived from endonucleolytic cleavage of precursor RNAs commonly have a 5′ phosphate group and open 3′ ends as needed in library preparation. However, modifications to the 5′ and 3′ ends of RNA have been described that may prevent full-length cloning by standard protocols of RNAs derived from different or thus far unknown maturation processes. For instance, precursor RNAs derived from RNA polymerase I contain an unmodified triphosphate group at their 5′ ends, whereas in transcripts derived from RNA polymerase II the 5′ triphosphate group is rapidly modified by addition of methylated guanosine triphosphate (refer to [158] for studies on 5′ ends of RNA). Also modifications at the 3′ ends of RNA have been described, such as a 2′,3′-cyclic phosphate as an intermediate for circularization of RNA [159,160]. Hence, further manipulation of the ends of modified RNAs may be required to ensure equal cloning of different RNA species into universal libraries.

Until now universal cloning strategies have not been attractive because the resulting libraries would be dominated by a few RNA species, mostly tRNA and rRNA. Therefore any universal cloning strategy should include additional steps to remove effectively undesired RNA species that are of no interest and would hamper library analysis. New reagents containing beads presenting oligonucleotides complementary to rRNA or enzymatic digestion using DNA fragments or oligonucleotides and RNase H (see above) can selectively remove RNAs. In combination with computational prediction of RNA structures [161,162], ORFs, or other features, RNase H-mediated digestion can remove specific RNA species using oligonucleotides hybridizing to conserved RNA motifs or cleaving off priming sites from selected RNAs. Universal cDNA libraries should enable a more unbiased transcriptome analysis. Since they would cover much larger fractions of transcriptomes than classical libraries, there is a lower risk of losing RNA groups that do not match set parameters during library preparation (see above on size ranges commonly used in library preparation). Moreover, profiles from all RNA groups within one sample could be obtained rather than focusing on a few RNA groups only. This aspect is important because many small RNAs are involved in processing of other larger RNAs coexpressed within the same cell.
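As a rough illustration of how oligonucleotides for RNase H-mediated removal of an abundant RNA might be derived, the Python sketch below returns antisense DNA oligos complementary to regularly spaced windows of a target RNA. The function name, the 20-nt oligo length, and the 50-nt spacing are assumptions made for this example only; they are not parameters given in the text, and a real design would take RNA structure predictions and off-target hybridization into account.

```python
from typing import List


def antisense_dna_oligos(rna: str, oligo_len: int = 20, spacing: int = 50) -> List[str]:
    """Return DNA oligos (5'->3') complementary to windows of `rna`.

    One window of `oligo_len` nucleotides is taken every `spacing` nucleotides.
    Hypothetical helper: window positions are not optimized in any way.
    """
    pairs = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}
    rna = rna.upper()
    oligos = []
    for start in range(0, len(rna) - oligo_len + 1, spacing):
        window = rna[start:start + oligo_len]
        # The DNA oligo is the reverse complement of the RNA window.
        oligos.append("".join(pairs[base] for base in reversed(window)))
    return oligos
```

Each oligo forms a DNA/RNA hybrid with its target window, which is the substrate that RNase H cleaves on the RNA strand.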

For functional studies on novel coding and noncoding transcripts, cDNA clones are necessary for performing in vitro and in vivo experiments. Building global cDNA collections comprising all new RNA species, including unknown splice variants, is a major challenge demonstrated by the enormous number of new transcripts identified recently by tag-based approaches, tiling arrays, and short RNA libraries. The focus of cDNA cloning projects could shift toward functional screens in biological models to identify RNA classes by function. Here I can see new applications for expression cloning, for example, where expression libraries for ncRNAs in combination with effective lentiviral expression cloning systems [163] could play an important role in elucidating ncRNA function. Using cDNA libraries in functional screens may also help to characterize ncRNAs that could not be identified in classical mutation-driven genetic screens. The lack of an ORF may even allow for easier design of such expression libraries; for example, an miRNA expression library has been prepared directly from genomic DNA fragments [164]. Although expression cloning is a powerful approach, it is still unclear how well expression cloning will work for ncRNAs. Their largely unknown functions could make it difficult to select the “right” biological context for testing, and redundancy between RNAs may further reduce the yields of screening assays. However, successful genetic screens on miRNAs [164] and the functional cloning of Shirin in an expression cloning system demonstrate that expression cloning approaches can indeed work for identifying ncRNAs [165]. The 3′ untranslated region of Shirin can bind directly to the RNA-binding protein Vg1RBP and is sufficient to induce insulin expression in Xenopus embryos. Its cloning not only highlights a new embryological activity of Vg1RBP, but could stand at the beginning of finding many more RNA–protein interactions in functional screens.

Now that we have realized how little we know about the utilization of genomic information, cDNA cloning and cDNA library preparation have a long way to go in driving discoveries in the RNA world. Many of these new approaches could be “biology-driven” to link phenotypes directly to genotypes.

Acknowledgments

It was not my intention to review here the entire literature regarding cDNA cloning. Therefore I am sorry for those whose publications could not be cited. I thank M. Dushay and P. Einat for their critical comments and suggestions for improving the manuscript. My special thanks are given to the past and present members of DNAFORM's cDNA Library Team: T. Hayashi, Y. Hodoyama, M. Kamiya, C. Kato, F. Kobayashi, C. Kurihara, A. Lezhava, Y. Shimatani-Shibata, M. Suzuki, S. Takaku, C. Tanaka, and T. Tanaka, as well as P. Carninci and Y. Hayashizaki at RIKEN and Elizabeth Bosch at FivePrime Therapeutics for our close collaboration.


References

[1] D. Baltimore, RNA-dependent DNA polymerase in virions of RNA tumour viruses, Nature 226 (1970) 1209–1211.

[2] H.M. Temin, S. Mizutani, RNA-dependent DNA polymerase in virions of Rous sarcoma virus, Nature 226 (1970) 1211–1213.

[3] A. Efstratiadis, T. Maniatis, F.C. Kafatos, A. Jeffrey, J.N. Vournakis, Full length and discrete partial reverse transcripts of globin and chorion mRNAs, Cell 4 (1975) 367–378.

[4] M.L. Kotewicz, C.M. Sampson, J.M. D'Alessio, G.F. Gerard, Isolation of cloned Moloney murine leukemia virus reverse transcriptase lacking ribonuclease H activity, Nucleic Acids Res. 16 (1988) 265–277.

[5] C. Piche, J.P. Schernthaner, Optimization of in vitro transcription and full-length cDNA synthesis using the T4 bacteriophage gene 32 protein, J. Biomol. Tech. 16 (2005) 239–247.

[6] P. Carninci, et al., Thermostabilization and thermoactivation of thermolabile enzymes by trehalose and its application for the synthesis of full length cDNA, Proc. Natl. Acad. Sci. U. S. A. 95 (1998) 520–524.

[7] A.N. Spiess, R. Ivell, A highly efficient method for long-chain cDNA synthesis using trehalose and betaine, Anal. Biochem. 301 (2002) 168–174.

[8] M. Dewannieux, C. Esnault, T. Heidmann, LINE-mediated retrotransposition of marked Alu sequences, Nat. Genet. 35 (2003) 41–48.

[9] M.F. Bonaldo, G. Lennon, M.B. Soares, Normalization and subtraction: two approaches to facilitate gene discovery, Genome Res. 6 (1996) 791–806.

[10] T. Ravasi, et al., Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome, Genome Res. 16 (2006) 11–19.

[11] M. Furuno, et al., Clusters of internally primed transcripts reveal novel long noncoding RNAs, PLoS Genet. 2 (2006) e37.

[12] K.E. Noonan, I.B. Roninson, mRNA phenotyping by enzymatic amplification of randomly primed cDNA, Nucleic Acids Res. 16 (1988) 10366.

[13] P. Goelet, et al., Nucleotide sequence of tobacco mosaic virus RNA, Proc. Natl. Acad. Sci. U. S. A. 79 (1982) 5818–5822.

[14] C. Coleclough, F.L. Erlitz, Use of primer-restriction-end adapters in a novel cDNA cloning strategy, Gene 34 (1985) 305–314.

[15] J.H. Han, W.J. Rutter, Lambda gt22S, a phage expression vector for the directional cloning of cDNA by the use of a single restriction enzyme SfiI, Nucleic Acids Res. 16 (1988) 11837.

[16] F. Rougeon, P. Kourilsky, B. Mach, Insertion of a rabbit beta-globin gene sequence into an E. coli plasmid, Nucleic Acids Res. 2 (1975) 2365–2378.

[17] H. Okayama, P. Berg, High-efficiency cloning of full-length cDNA, Mol. Cell. Biol. 2 (1982) 161–170.

[18] U. Gubler, B.J. Hoffman, A simple and very efficient method for generating cDNA libraries, Gene 25 (1983) 263–269.

[19] K. Maruyama, S. Sugano, Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides, Gene 138 (1994) 171–174.

[20] S. Kato, et al., Construction of a human full-length cDNA bank, Gene 150 (1994) 243–250.

[21] Y. Shibata, et al., Cloning full-length, cap-trapper-selected cDNAs by using the single-strand linker ligation method, BioTechniques 30 (2001) 1250–1254.

[22] E.E. Karrer, et al., In situ isolation of mRNA from individual plant cells: creation of cell-specific cDNA libraries, Proc. Natl. Acad. Sci. U. S. A. 92 (1995) 3814–3818.

[23] N. Casali, Escherichia coli host strains, Methods Mol. Biol. 235 (2003) 27–48.

[24] F. Hayes, The function and organization of plasmids, Methods Mol. Biol. 235 (2003) 1–17.

[25] R.A. Young, R.W. Davis, Efficient isolation of genes by using antibody probes, Proc. Natl. Acad. Sci. U. S. A. 80 (1983) 1194–1198.

[26] J.M. Short, J.M. Fernandez, J.A. Sorge, W.D. Huse, Lambda ZAP: a bacteriophage lambda expression vector with in vivo excision properties, Nucleic Acids Res. 16 (1988) 7583–7600.

[27] P. Carninci, et al., Balanced-size and long-size cloning of full-length, cap-trapped cDNAs into vectors of the novel lambda-FLC family allows enhanced gene discovery rate and functional analysis, Genomics 77 (2001) 79–90.

[28] A. Preston, Choosing a cloning vector, Methods Mol. Biol. 235 (2003) 19–26.

[29] J. Eberwine, C. Spencer, K. Miyashiro, S. Mackler, R. Finnell, Complementary DNA synthesis in situ: methods and applications, Methods Enzymol. 216 (1992) 80–100.

[30] J. Eberwine, et al., Analysis of gene expression in single live neurons, Proc. Natl. Acad. Sci. U. S. A. 89 (1992) 3010–3014.

[31] R.C. Roozemond, Ultramicrochemical determination of nucleic acids in individual cells using the Zeiss UMSP-I microspectrophotometer: application to isolated rat hepatocytes of different ploidy classes, Histochem. J. 8 (1976) 625–638.

[32] N.L. Simone, R.F. Bonner, J.W. Gillespie, M.R. Emmert-Buck, L.A. Liotta, Laser-capture microdissection: opening the microscopic frontier to molecular analysis, Trends Genet. 14 (1998) 272–276.

[33] N.L. Simone, C.P. Paweletz, L. Charboneau, E.F. Petricoin III, L.A. Liotta, Laser capture microdissection: beyond functional genomics to proteomics, Mol. Diagn. 5 (2000) 301–307.

[34] S.P. Brandt, Microgenomics: gene expression analysis at the tissue-specific and single-cell levels, J. Exp. Bot. 56 (2005) 495–505.

[35] S.D. Ginsberg, RNA amplification strategies for small sample populations, Methods 37 (2005) 229–237.

[36] R.N. Van Gelder, et al., Amplified RNA synthesized from limited quantities of heterogeneous cDNA, Proc. Natl. Acad. Sci. U. S. A. 87 (1990) 1663–1667.

[37] S.L. Lin, Single-cell cDNA library construction using cycling aRNA amplification, Methods Mol. Biol. 221 (2003) 117–127.

[38] J. Adjaye, Generation of amplified RNAs and cDNA libraries from single mammalian cells, Methods Mol. Med. 132 (2007) 117–124.

[39] T. Subkhankulova, F.J. Livesey, Comparative evaluation of linear and exponential amplification techniques for expression profiling at the single-cell level, Genome Biol. 7 (2006) R18.

[40] S. Ramaswamy, et al., Multiclass cancer diagnosis using tumor gene expression signatures, Proc. Natl. Acad. Sci. U. S. A. 98 (2001) 15149–15154.

[41] M. Das, I. Harvey, L.L. Chu, M. Sinha, J. Pelletier, Full-length cDNAs: more than just reaching the ends, Physiol. Genomics 6 (2001) 57–80.

[42] M. Harbers, P. Carninci, CAGE (cap-analysis-gene-expression): a novel approach for rapid gene discovery and gene network identification, in: S.M. Wang (Ed.), SAGE: Current Technologies and Applications, Norwich, Horizon Biosci, 2005, pp. 29–76.

[43] P. Carninci, Constructing the landscape of the mammalian transcriptome, J. Exp. Biol. 210 (2007) 1497–1506.

[44] P. Carninci, et al., High-efficiency full-length cDNA cloning by biotinylated CAP trapper, Genomics 37 (1996) 327–336.

[45] P. Carninci, et al., The transcriptional landscape of the mammalian genome, Science 309 (2005) 1559–1563.

[46] S. Kikuchi, et al., Collection, mapping, and annotation of over 28,000 cDNA clones from japonica rice, Science 301 (2003) 376–379.

[47] M. Seki, et al., Functional annotation of a full-length Arabidopsis cDNA collection, Science 296 (2002) 141–145.

[48] T. Ota, et al., Complete sequencing and characterization of 21,243 full-length human cDNAs, Nat. Genet. 36 (2004) 40–45.

[49] I. Edery, L.L. Chu, N. Sonenberg, J. Pelletier, An efficient strategy to isolate full-length cDNAs based on an mRNA cap retention procedure (CAPture), Mol. Cell. Biol. 15 (1995) 3363–3371.

[50] H. Theissen, et al., Cloning of the human cDNA for the U1 RNA-associated 70K protein, EMBO J. 5 (1986) 3209–3217.

[51] Y.Y. Zhu, E.M. Machleder, A. Chenchik, R. Li, P.D. Siebert, Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction, BioTechniques 30 (2001) 892–897.

[52] P. Carninci, et al., Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia, Genome Res. 13 (2003) 1273–1289.

[53] T.E. Scheetz, et al., High-throughput gene discovery in the rat, Genome Res. 14 (2004) 733–741.

[54] S.R. Patanjali, S. Parimoo, S.M. Weissman, Construction of a uniform-abundance (normalized) cDNA library, Proc. Natl. Acad. Sci. U. S. A. 88 (1991) 1943–1947.

[55] M.B. Soares, et al., Construction and characterization of a normalized cDNA library, Proc. Natl. Acad. Sci. U. S. A. 91 (1994) 9228–9232.


[56] P.A. Zhulidov, et al., Simple cDNA normalization using Kamchatka crab duplex-specific nuclease, Nucleic Acids Res. 32 (2004) e37.

[57] P. Laveder, C. De Pitta, S. Toppo, G. Valle, G. Lanfranchi, A two-step strategy for constructing specifically self-subtracted cDNA libraries, Nucleic Acids Res. 30 (2002) e38.

[58] P. Carninci, et al., Normalization and subtraction of cap-trapper-selected cDNAs to prepare full-length cDNA libraries for rapid discovery of new genes, Genome Res. 10 (2000) 1617–1630.

[59] C.G. Sagerstrom, B.I. Sun, H.L. Sive, Subtractive cloning: past, present, and future, Ann. Rev. Biochem. 66 (1997) 751–783.

[60] N.A. Faustino, T.A. Cooper, Pre-mRNA splicing and human disease, Genes Dev. 17 (2003) 419–437.

[61] A. Watahiki, et al., Libraries enriched for alternatively spliced exons reveal splicing patterns in melanocytes and melanomas, Nat. Methods 1 (2004) 233–239.

[62] G. Thill, et al., ASEtrap: a biological method for speeding up the exploration of spliceomes, Genome Res. 16 (2006) 776–786.

[63] P. Carninci, et al., Genome-wide analysis of mammalian promoter architecture and evolution, Nat. Genet. 38 (2006) 626–635.

[64] F. Denoeud, et al., Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions, Genome Res. 17 (2007) 746–759.

[65] M. Zavolan, et al., Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome, Genome Res. 13 (2003) 1290–1300.

[66] A.R. Kornblihtt, Promoter usage and alternative splicing, Curr. Opin. Cell Biol. 17 (2005) 262–268.

[67] G. Lennon, C. Auffray, M. Polymeropoulos, M.B. Soares, The I.M.A.G.E. Consortium: an integrated molecular analysis of genomes and their expression, Genomics 33 (1996) 151–152.

[68] D.S. Gerhard, et al., The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC), Genome Res. 14 (2004) 2121–2127.

[69] M. Stapleton, et al., The Drosophila gene collection: identification of putative full-length cDNAs for 70% of D. melanogaster genes, Genome Res. 12 (2002) 1294–1300.

[70] M.D. Adams, et al., Complementary DNA sequencing: expressed sequence tags and human genome project, Science 252 (1991) 1651–1656.

[71] M. Marra, et al., An encyclopedia of mouse genes, Nat. Genet. 21 (1999) 191–194.

[72] G. Temple, et al., From genome to proteome: developing expression clone resources for the human genome, Hum. Mol. Genet. 15 (Spec. No. 1) (2006) R31–R43.

[73] P. Lamesch, et al., C. elegans ORFeome version 3.1: increasing the coverage of ORFeome resources with improved gene predictions, Genome Res. 14 (2004) 2064–2069.

[74] S. Wiemann, et al., From ORFeome to biology: a functional genomics pipeline, Genome Res. 14 (2004) 2136–2144.

[75] J.F. Rual, D.E. Hill, M. Vidal, ORFeome projects: gateway between genomics and omics, Curr. Opin. Chem. Biol. 8 (2004) 20–25.

[76] G. Marsischky, J. LaBaer, Many paths to many clones: a comparative look at high-throughput cloning methods, Genome Res. 14 (2004) 2020–2028.

[77] A.E. Carpenter, D.M. Sabatini, Systematic genome-wide screens of gene function, Nat. Rev., Genet. 5 (2004) 11–22.

[78] G. Wu, S.K. Doberstein, HTS technologies in biopharmaceutical discovery, Drug Discov. Today 11 (2006) 718–724.

[79] S. Grimm, The art and design of genetic screens: mammalian culture cells, Nat. Rev., Genet. 5 (2004) 179–189.

[80] P.J. Paddison, et al., A resource for large-scale RNA-interference-based screens in mammals, Nature 428 (2004) 427–431.

[81] J.M. Silva, et al., Second-generation shRNA libraries covering the mouse and human genomes, Nat. Genet. 37 (2005) 1281–1288.

[82] K. Chang, S.J. Elledge, G.J. Hannon, Lessons from Nature: microRNA-based shRNA libraries, Nat. Methods 3 (2006) 707–714.

[83] D. Shirane, et al., Enzymatic production of RNAi libraries from cDNAs, Nat. Genet. 36 (2004) 190–196.

[84] C. Du, et al., PCR-based generation of shRNA libraries from cDNAs, BMC Biotechnol. 6 (2006) 28.

[85] J. Clark, S. Ding, Generation of RNAi libraries for high-throughput screens, J. Biomed. Biotechnol. 2006 (2006) 45716.

[86] D.E. Root, N. Hacohen, W.C. Hahn, E.S. Lander, D.M. Sabatini, Genome-scale loss-of-function screening with a lentiviral RNAi library, Nat. Methods 3 (2006) 715–719.

[87] J. Moffat, D.M. Sabatini, Building mammalian signalling pathways with RNAi screens, Nat. Rev., Mol. Cell Biol. 7 (2006) 177–187.

[88] M. Chatterjee-Kishore, From genome to phenome—RNAi library screening and hit characterization using signaling pathway analysis, Curr. Opin. Drug Discov. Dev. 9 (2006) 231–239.

[89] C.J. Echeverri, et al., Minimizing the risk of reporting false positives in large-scale RNAi screens, Nat. Methods 3 (2006) 777–779.

[90] A. Aruffo, Expression cloning systems, Curr. Opin. Biotechnol. 2 (1991) 735–741.

[91] M.L. Matter, M.H. Ginsberg, J.W. Ramos, Identification of cell signaling molecules by expression cloning, Sci. STKE 2001 (2001) PL9.

[92] M.L. Matter, J.W. Ramos, Expression cloning of signaling proteins regulated by cell adhesion, Methods Mol. Biol. 341 (2006) 155–165.

[93] B. Seed, Developments in expression cloning, Curr. Opin. Biotechnol. 6 (1995) 567–573.

[94] G.P. Smith, Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface, Science 228 (1985) 1315–1317.

[95] M. Paschke, Phage display systems and their applications, Appl. Microbiol. Biotechnol. 70 (2006) 2–11.

[96] A.R. Makela, C. Oker-Blom, Baculovirus display: a multifunctional technology for gene delivery and eukaryotic library development, Adv. Virus Res. 68 (2006) 91–112.

[97] M. He, M.J. Taussig, Ribosome display: cell-free protein display technology, Brief Funct. Genomics Proteomics 1 (2002) 204–212.

[98] R. Crameri, M. Suter, Display of biologically active proteins on the surface of filamentous phages: a cDNA cloning system for selection of functional gene products linked to the genetic information responsible for their production, Gene 137 (1993) 69–75.

[99] S. Fields, High-throughput two-hybrid analysis: the promise and the peril, FEBS J. 272 (2005) 5391–5399.

[100] S. Fields, O. Song, A novel genetic system to detect protein–protein interactions, Nature 340 (1989) 245–246.

[101] S. Fields, R. Sternglanz, The two-hybrid system: an assay for protein–protein interactions, Trends Genet. 10 (1994) 286–292.

[102] M. Vidal, P. Legrain, Yeast forward and reverse ‘n’-hybrid systems, Nucleic Acids Res. 27 (1999) 919–929.

[103] C. Holz, et al., A human cDNA expression library in yeast enriched for open reading frames, Genome Res. 11 (2001) 1730–1735.

[104] P. Kapranov, A.T. Willingham, T.R. Gingeras, Genome-wide transcription and the implications for genomic organization, Nat. Rev., Genet. 8 (2007) 413–423.

[105] P. Carninci, Tagging mammalian transcription complexity, Trends Genet. 22 (2006) 501–510.

[106] P. Kapranov, et al., Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays, Genome Res. 15 (2005) 987–997.

[107] J. Cheng, et al., Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution, Science 308 (2005) 1149–1154.

[108] P. Kapranov, et al., RNA maps reveal new RNA classes and a possible function for pervasive transcription, Science 316 (2007) 1484–1488.

[109] S. Katayama, et al., Antisense transcription in the mammalian transcriptome, Science 309 (2005) 1564–1566.

[110] C.F. Hongay, P.L. Grisafi, T. Galitski, G.R. Fink, Antisense transcription controls cell fate in Saccharomyces cerevisiae, Cell 127 (2006) 735–745.

[111] P.A. Galante, D.O. Vidal, J.E. de Souza, A.A. Camargo, S.J. de Souza, Sense–antisense pairs in mammals: functional and evolutionary considerations, Genome Biol. 8 (2007) R40.

[112] A.T. Willingham, T.R. Gingeras, TUF love for “junk” DNA, Cell 125 (2006) 1215–1220.

[113] L. David, et al., A high-resolution map of transcription in the yeast genome, Proc. Natl. Acad. Sci. U. S. A. 103 (2006) 5320–5325.


[114] J.M. Johnson, S. Edwards, D. Shoemaker, E.E. Schadt, Dark matter in the genome: evidence of widespread transcription detected by microarray tiling experiments, Trends Genet. 21 (2005) 93–102.

[115] C. Dennis, The brave new world of RNA, Nature 418 (2002) 122–124.

[116] J. Brosius, Waste not, want not—transcript excess in multicellular eukaryotes, Trends Genet. 21 (2005) 287–288.

[117] J.M. Claverie, Fewer genes, more noncoding RNA, Science 309 (2005) 1529–1530.

[118] J.S. Mattick, I.V. Makunin, Non-coding RNA, Hum. Mol. Genet. 15 (Spec. No. 1) (2006) R17–R29.

[119] P. Carninci, Y. Hayashizaki, Noncoding RNA transcription beyond annotated genes, Curr. Opin. Genet. Dev. 17 (2007) 139–144.

[120] K. Struhl, Transcriptional noise and the fidelity of initiation by RNA polymerase II, Nat. Struct. Mol. Biol. 14 (2007) 103–105.

[121] K.V. Prasanth, D.L. Spector, Eukaryotic regulatory RNAs: an answer to the ‘genome complexity’ conundrum, Genes Dev. 21 (2007) 11–42.

[122] D.P. Bartel, MicroRNAs: genomics, biogenesis, mechanism, and function, Cell 116 (2004) 281–297.

[123] Y. Wang, H.M. Stricker, D. Gou, L. Liu, MicroRNA: past and present, Front. Biosci. 12 (2007) 2316–2329.

[124] T. Kiss, Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions, Cell 109 (2002) 145–148.

[125] L.S. Gunawardane, et al., A slicer-mediated mechanism for repeat-associated siRNA 5′ end formation in Drosophila, Science 315 (2007) 1587–1590.

[126] V.V. Vagin, et al., A distinct small RNA pathway silences selfish genetic elements in the germline, Science 313 (2006) 320–324.

[127] N.C. Lau, et al., Characterization of the piRNA complex from rat testes, Science 313 (2006) 363–367.

[128] L.J. Grady, A.B. North, W.P. Campbell, Complexity of poly(A+) and poly(A−) polysomal RNA in mouse liver and cultured mouse fibroblasts, Nucleic Acids Res. 5 (1978) 697–712.

[129] J. Van Ness, I.H. Maxwell, W.E. Hahn, Complex population of nonpolyadenylated messenger RNA in mouse brain, Cell 18 (1979) 1341–1349.

[130] A. Huttenhofer, J. Vogel, Experimental approaches to identify non-coding RNAs, Nucleic Acids Res. 34 (2006) 635–646.

[131] E. Berezikov, E. Cuppen, R.H. Plasterk, Approaches to microRNA discovery, Nat. Genet. 38 Suppl (2006) S2–S7.

[132] M.Z. Michael, Cloning microRNAs from mammalian tissues, Methods Mol. Biol. 342 (2006) 189–207.

[133] A. Huttenhofer, J. Cavaille, J.P. Bachellerie, Experimental RNomics: a global approach to identifying small nuclear RNAs and their targets in different model organisms, Methods Mol. Biol. 265 (2004) 409–428.

[134] S. Takada, et al., Mouse microRNA profiles determined with a new and sensitive cloning method, Nucleic Acids Res. 34 (2006) e115.

[135] C. Lu, et al., Elucidation of the small RNA component of the transcriptome, Science 309 (2005) 1567–1569.

[136] A.D. Gu, H. Zhou, C.H. Yu, L.H. Qu, A novel experimental approach for systematic identification of box H/ACA snoRNAs from eukaryotes, Nucleic Acids Res. 33 (2005) e194.

[137] P. Einat, Methodologies for high-throughput expression profiling of microRNAs, Methods Mol. Biol. 342 (2006) 139–157.

[138] M.L. Metzker, Emerging technologies in DNA sequencing, Genome Res. 15 (2005) 1767–1776.

[139] J. Shendure, R.D. Mitra, C. Varma, G.M. Church, Advanced sequencing technologies: methods and goals, Nat. Rev., Genet. 5 (2004) 335–344.

[140] N. Hall, Advanced sequencing technologies and their wider impact in microbiology, J. Exp. Biol. 210 (2007) 1518–1525.

[141] S.J. Kodumal, et al., Total synthesis of long DNA sequences: synthesis of a contiguous 32-kb polyketide synthase gene cluster, Proc. Natl. Acad. Sci. U. S. A. 101 (2004) 15573–15578.

[142] Y. Zeng, Principles of micro-RNA production and maturation, Oncogene 25 (2006) 6156–6162.

[143] M.P. Samanta, W. Tongprasit, H. Sethi, C.S. Chin, V. Stolc, Global identification of noncoding RNAs in Saccharomyces cerevisiae by modulating an essential RNA processing pathway, Proc. Natl. Acad. Sci. U. S. A. 103 (2006) 4192–4197.

[144] J.N. Hutchinson, et al., A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains, BMC Genomics 8 (2007) 39.

[145] J. Ponjavic, C.P. Ponting, G. Lunter, Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs, Genome Res. 17 (2007) 556–565.

[146] N. Brockdorff, et al., The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus, Cell 71 (1992) 515–526.

[147] F. Sleutels, R. Zwart, D.P. Barlow, The non-coding Air RNA is required for silencing autosomal imprinted genes, Nature 415 (2002) 810–813.

[148] C. Mathe, M.F. Sagot, T. Schiex, P. Rouze, Current methods of gene prediction, their strengths and weaknesses, Nucleic Acids Res. 30 (2002) 4103–4117.

[149] M.R. Brent, Genome annotation past, present, and future: how to define an ORF at each locus, Genome Res. 15 (2005) 1777–1786.

[150] T.C. Mockler, et al., Applications of DNA tiling arrays for whole-genome analysis, Genomics 85 (2005) 1–15.

[151] M. Harbers, P. Carninci, Tag-based approaches for transcriptome research and genome annotation, Nat. Methods 2 (2005) 495–502.

[152] P. Ng, et al., Gene identification signature (GIS) analysis for transcriptome characterization and genome annotation, Nat. Methods 2 (2005) 105–111.

[153] T. Shiraki, et al., Cap analysis gene expression for high-throughput analysis of transcriptional starting point and identification of promoter usage, Proc. Natl. Acad. Sci. U. S. A. 100 (2003) 15776–15781.

[154] S. Hashimoto, et al., 5′-end SAGE for the analysis of transcriptional start sites, Nat. Biotechnol. 22 (2004) 1146–1149.

[155] A. Baross, et al., Systematic recovery and analysis of full-ORF human cDNA clones, Genome Res. 14 (2004) 2083–2092.

[156] K.L. Nielsen, A.L. Hogh, J. Emmersen, DeepSAGE—digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples, Nucleic Acids Res. 34 (2006) e133.

[157] J. Binladen, et al., The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing, PLoS One 2 (2007) e197.

[158] T. Bruderer, L.C. Tu, M.G. Lee, The 5′ end structure of transcripts derived from the rRNA gene and the RNA polymerase I transcribed protein coding genes in Trypanosoma brucei, Mol. Biochem. Parasitol. 129 (2003) 69–77.

[159] W. Filipowicz, M. Konarska, H.J. Gross, A.J. Shatkin, RNA 3′-terminal phosphate cyclase activity and RNA ligation in HeLa cell extract, Nucleic Acids Res. 11 (1983) 1405–1418.

[160] W. Filipowicz, K. Strugala, M. Konarska, A.J. Shatkin, Cyclization of RNA 3′-terminal phosphate by cyclase from HeLa cells proceeds via formation of N(3′)pp(5′)A activated intermediate, Proc. Natl. Acad. Sci. U. S. A. 82 (1985) 1316–1320.

[161] R. Backofen, et al., RNAs everywhere: genome-wide annotation of structured RNAs, J. Exp. Zool. B Mol. Dev. Evol. 308 (2007) 1–25.

[162] S. Washietl, et al., Structured RNAs in the ENCODE selected regions of the human genome, Genome Res. 17 (2007) 852–864.

[163] D. Chilov, C. Fux, H. Joch, M. Fussenegger, Identification of a novel proliferation-inducing determinant using lentiviral expression cloning, Nucleic Acids Res. 31 (2003) e113.

[164] P.M. Voorhoeve, et al., A genetic screen implicates miRNA-372 and miRNA-373 as oncogenes in testicular germ cell tumors, Cell 124 (2006) 1169–1181.

[165] F.M. Spagnoli, A.H. Brivanlou, The RNA-binding protein, Vg1RBP, is required for pancreatic fate specification, Dev. Biol. 292 (2006) 442–456.

