Genome Analysis

Sequencing and Analysis of Common Bean ESTs. Building a Foundation for Functional Genomics 1[w]

Mario Ramírez 2, Michelle A. Graham 2, Lourdes Blanco-López, Sonia Silvente, Arturo Medrano-Soto, Matthew W. Blair, Georgina Hernández, Carroll P. Vance, and Miguel Lara*

Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Apartado 66210 Cuernavaca, Morelos, Mexico (M.R., L.B.-L., S.S., A.M.-S., G.H., M.L.); Agronomy and Plant Genetics (M.R., C.P.V.) and Plant Biology (M.A.G.), University of Minnesota, St. Paul, Minnesota 55108; Plant Science Research Unit, U.S. Department of Agriculture, Agricultural Research Service, St. Paul, Minnesota 55108 (C.P.V.); and International Center for Tropical Agriculture, Cali, Colombia (M.W.B.)

Although common bean (Phaseolus vulgaris) is the most important grain legume in the developing world for human consumption, few genomic resources exist for this species. The objectives of this research were to develop expressed sequence tag (EST) resources for common bean and assess nodule gene expression through high-density macroarrays. We sequenced a total of 21,026 ESTs derived from 5 different cDNA libraries, including nitrogen-fixing root nodules, phosphorus-deficient roots, developing pods, and leaves of the Mesoamerican genotype, Negro Jamapa 81. The fifth source of ESTs was a leaf cDNA library derived from the Andean genotype, G19833. Of the total high-quality sequences, 5,703 ESTs were classified as singletons, while 10,078 were assembled into 2,266 contigs, producing a nonredundant set of 7,969 different transcripts. Sequences were grouped according to 4 main categories, metabolism (34%), cell cycle and plant development (11%), interaction with the environment (19%), and unknown function (36%), and further subdivided into 15 subcategories. Comparisons to other legume EST projects suggest that an entirely different repertoire of genes is expressed in common bean nodules. Phaseolus-specific contigs, gene families, and single nucleotide polymorphisms were also identified from the EST collection. Functional aspects of individual bean organs were reflected by the 20 contigs from each library composed of the most redundant ESTs. The abundance of transcripts corresponding to selected contigs was evaluated by RNA blots to determine whether gene expression determined by laboratory methods correlated with in silico expression. Evaluation of root nodule gene expression by macroarrays and RNA blots showed that genes related to nitrogen and carbon metabolism are integrated for ureide production. Resources developed in this project provide genetic and genomic tools for an international consortium devoted to bean improvement.

Common bean (Phaseolus vulgaris) is the most important grain legume for direct human consumption; it comprises 50% of the grain legumes consumed worldwide (McClean et al., 2004). Total production exceeds 23 million metric tons, of which 7 million metric tons are produced in Latin America and Africa (Food and Agriculture Organization of the United Nations, 2001). Diets in countries from Latin America and eastern Africa often contain sufficient carbohydrates (through cereals such as maize, rice, and wheat), but are poor in proteins. Dietary proteins can be found in scarce animal products but are usually derived from legumes. In several countries, such as Mexico and Brazil, common bean is important as a primary source of dietary protein (Broughton et al., 2003). Common bean is one of the most ancient crops in the Americas. A nucleus of diversity of common bean is located in Ecuador and northern Peru, from where beans were dispersed into South and Central America, where domestication led to their separation and the formation of two distinct gene pools, the Andean and the Mesoamerican (Gepts, 1998).

Partial sequencing of cDNA inserts or expressed sequence tags (ESTs) obtained from many plant tissues and organs has been used as an effective method of gene discovery, molecular marker generation, and transcript pattern characterization. It is an efficient approach for identifying a large number of plant genes expressed during different developmental stages and in response to a variety of environmental conditions. In addition, once ESTs are generated, they provide a resource for transcript-profiling experiments. Currently, only the grasses surpass the legumes (Fabaceae family) for the number of publicly available ESTs. There are nearly 986,000 nucleotide sequences representing the Fabaceae family available from the National Center for Biotechnology Information (NCBI) taxonomy browser (October, 2004; http://www.ncbi.nlm.nih.gov/Taxonomy). Over 92% of the ESTs deposited for the Fabaceae family are derived from the model legumes Medicago truncatula and Lotus japonicus and the crop legume soybean (Glycine max). Despite the importance of common beans as a crop legume, very little EST information is currently publicly available. Only 575 ESTs from common bean and 20,120 ESTs from the related species, runner bean (Phaseolus coccineus), have been deposited in GenBank's EST database. For this reason, we have undertaken a survey of the bean transcriptome by analyzing ESTs from diverse organs. Our research has been performed within the framework of Phaseomics, the international consortium for Phaseolus genomics (Broughton et al., 2003), developed to establish the necessary framework of knowledge and materials for the advancement of bean genomics, transcriptomics, and proteomics. A major goal of Phaseomics is to help generate new common bean varieties that are suitable and desired by farmers and consumers.

1 This work was supported in part by Consejo Nacional de Ciencia y Tecnología, Mexico (grant no. G31751–B at CCG), U.S. Department of Agriculture, Agricultural Research Service, Current Research Information System (project no. 3640–21000–019–00D at the University of Minnesota), and by U.S. Agency for International Development at International Center for Tropical Agriculture. M.R. received a postdoctoral fellowship from Consejo Nacional de Ciencia y Tecnología, Mexico.

2 These authors contributed equally to the paper.

[w] The online version of this article contains Web-only data.

* Corresponding author; e-mail [email protected]; fax 52–777–317–4357.

www.plantphysiol.org/cgi/doi/10.1104/pp.104.054999.

Plant Physiology, April 2005, Vol. 137, pp. 1211–1227, www.plantphysiol.org © 2005 American Society of Plant Biologists

Copyright © 2005 American Society of Plant Biologists. All rights reserved.

Nitrogen (N) and phosphorus (P) are critical macronutrients required for plant growth. In the bean-growing regions of the developing world, soils are frequently depleted in N and P (Graham and Vance, 2003). Moreover, N and P fertilizer use is limited due to high costs and poor infrastructure. Understanding and improving mechanisms that lead to improved N and P nutrition are critical to food production and security. While root nodule symbiosis and P nutrition have been research objectives in the genomics of model legumes M. truncatula and L. japonicus, these species are forage crops and indigenous to temperate regions (Handberg and Stougaard, 1992; Cook, 1999). In addition, N and P nutrition have not been the major focus of soybean genomics research. In this article, we document the sequencing and contig assembly of more than 15,000 ESTs from organs of common bean. Our common bean EST project was originally initiated to develop EST profiles of N2-fixing root nodules and P-deficient roots. However, during the course of the project, it became apparent that EST resources also needed to be developed for common bean pods and leaves. We also report macroarray transcriptome analysis of root nodule contigs.

RESULTS

Features of Generated ESTs

In an effort to develop an EST platform for common bean, 5 cDNA libraries were constructed, 4 from the Mesoamerican cultivar Negro Jamapa 81 and 1 from the Andean cultivar G19833. The sources of RNA to construct each library were pods, leaves, P-deficient roots, and nodules for the Mesoamerican genotype and leaves for the Andean genotype. Single-pass 5′ sequencing resulted in 3,400 to 4,900 ESTs from each of the Mesoamerican and Andean libraries (sequences deposited in GenBank, accession nos. CV528971–CV544303). In addition, single-pass 3′ sequencing of the Andean genotype yielded an additional 854 sequences. In total, 21,026 ESTs were sequenced (Table I). This number includes the 575 common bean ESTs already present in GenBank. Between 19% and 33% of the sequenced ESTs from the 5 libraries were discarded and not considered for contig assembly due to low-quality sequence or the absence of insert in the clone. In addition, clones identified as chimeric or alternative splice products were not included in contig assembly. Redundant ESTs were grouped into contigs using the program Phrap (http://www.phrap.org/phredphrap/phrap.html). Of the total 15,781 EST sequences considered acceptable for contig assembly, 5,703 were classified as singletons and the remaining 10,078 assembled into 2,266 contigs ranging in EST redundancy from 2 to 264 (Table II). Library-specific contigs ranged from 44 to 228, depending upon the organ. Together, the contigs and singletons comprised a nonredundant gene set of 7,969 different transcripts. All EST sequences, contig images, single-nucleotide polymorphism (SNP), and gene family data analyses are available (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online).
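The counts in this paragraph can be cross-checked against Table I with a few lines of arithmetic. The sketch below is an illustration only and is not part of the authors' pipeline; the names `TABLE_I`, `N_CONTIGS`, and `column_totals` are hypothetical, and the figures are transcribed from Table I and the text above.

```python
# Per-library counts transcribed from Table I:
# (total sequenced, good-quality ESTs used for contigging,
#  ESTs in contigs, EST singletons)
TABLE_I = {
    "Mesoamerican nodules": (4636, 3745, 2441, 1304),
    "Mesoamerican pods": (3667, 2951, 1929, 1022),
    "Mesoamerican roots": (4329, 3165, 1774, 1391),
    "Mesoamerican leaves": (3456, 2677, 1983, 694),
    "Andean leaves (5' and 3')": (4938, 3243, 1951, 1292),
}

# Contig count as reported in the text and in Table II.
N_CONTIGS = 2266


def column_totals(table):
    """Sum each column of the per-library table."""
    return tuple(sum(column) for column in zip(*table.values()))


total, good, in_contigs, singletons = column_totals(TABLE_I)

# Every good-quality EST is either part of a contig or a singleton.
for name, (_, lib_good, lib_contig, lib_single) in TABLE_I.items():
    assert lib_good == lib_contig + lib_single, name

# The nonredundant set counts one transcript per contig and per singleton.
nonredundant = N_CONTIGS + singletons

print(total, good, in_contigs, singletons)  # 21026 15781 10078 5703
print(nonredundant)                         # 7969
```

The per-library check also makes explicit that the "good-quality" column is the sum of the last two columns of Table I.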

Functional Annotation

To identify putative functions for the genes represented by the ESTs, BLASTX analysis was used to compare the common bean contigs and singletons to the Uniref 100 protein database (Apweiler et al., 2004). The 2,266 contigs were initially grouped into 4 main categories: metabolism (34%), cell cycle and plant development (11%), interaction with the environment (19%), and unknown function (36%). These were further subdivided into 15 subcategories, shown in Figure 1. The metabolism category was subdivided into genes from carbon (C)/energy, amino acid/protein, nucleic acid/nucleotide, fatty acid/lipid, and secondary metabolism, as well as nutrient assimilation and possible functions in other metabolic areas; the first two subcategories were most abundant. The cell cycle and development category was subdivided into genes for cell structure, differentiation, cell cycle, apoptosis, plant development, nodulation, and senescence. The category of interaction with the environment was subdivided into genes involved in transport/membrane proteins, stress/defense, and signal transduction/regulation. In this category, genes involved in signal transduction/regulation were the most abundant. The unknown function category included genes with unknown function in plants, genes with homology to DNA or proteins with unknown function, and those with no hit found.

Table I. Sequencing and contigging statistics of common bean ESTs

| Tissues | Total No. of ESTs Sequenced | Sequencing Success Percentage (a) | Good-Quality ESTs Used for Contigging (b) | ESTs in Contigs | EST Singletons |
|---|---|---|---|---|---|
| Mesoamerican nodules | 4,636 | 81.6 | 3,745 | 2,441 | 1,304 |
| Mesoamerican pods | 3,667 | 82.7 | 2,951 | 1,929 | 1,022 |
| Mesoamerican roots | 4,329 | 74.2 | 3,165 | 1,774 | 1,391 |
| Mesoamerican leaves | 3,456 | 78.2 | 2,677 | 1,983 | 694 |
| Andean leaves (5′ and 3′) | 4,938 | 67.0 | 3,243 | 1,951 | 1,292 |
| Total ESTs | 21,026 | 76.3 | 15,781 | 10,078 | 5,703 |

(a) Sequencing success was determined by removing low-quality or no-insert ESTs from the total number of ESTs sequenced.
(b) Clones that were identified as chimeric or alternative splice products were not included in contigging.

Contigs Composed of Most Abundant ESTs

Analysis of the frequency (abundance) of the ESTs that make up a contig, together with the library source of the contig, can provide insights with respect to gene expression levels and biochemical functions occurring in an organ or tissue. Therefore, to identify genes that were highly expressed in certain tissues, we identified the contigs that were most abundantly represented in pods, leaves, P-deficient roots, and root nodules (Table III). The 20

Figure 1. Based on homology (E-values ≤ 10), the 2,266 contigs were grouped in 4 main categories, metabolism (34%), cell cycle and plant development (11%), interaction with the environment (19%), and unknown function (36%), and subdivided into 15 subcategories that are shown in the figure.

Table II. Identification of tissue-specific contigs from common bean ESTs

| Tissue-Specific Contigs | No. of Contigs >1 | Average ESTs/(Contig >1) | Average Length of Contigs >1 | No. of ESTs in Largest Contig |
|---|---|---|---|---|
| Mesoamerican nodule specific | 228 | 2.5 | 784.9 | 10 |
| Mesoamerican pod specific | 110 | 3.6 | 725.8 | 64 |
| Mesoamerican root specific | 227 | 2.5 | 731.9 | 11 |
| Mesoamerican leaves specific | 44 | 3.0 | 766.8 | 16 |
| Andean leaves specific | 149 | 3.0 | 824.3 | 22 |
| Mixed-tissue contigs | 1,508 | 5.3 | 921.6 | 264 |
| All contigs | 2,266 | 4.5 | 869.6 | 264 |
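As a consistency check on Table II, the category rows can be summed and compared with the "All contigs" row. The snippet below is a minimal sketch with hypothetical names (`TABLE_II`, `total_contigs`, `approx_ests`, `specific_fraction`); because the per-row averages are rounded to one decimal, the reconstructed EST total is only approximate.

```python
# (number of contigs, average ESTs per contig) transcribed from Table II
TABLE_II = {
    "Mesoamerican nodule specific": (228, 2.5),
    "Mesoamerican pod specific": (110, 3.6),
    "Mesoamerican root specific": (227, 2.5),
    "Mesoamerican leaves specific": (44, 3.0),
    "Andean leaves specific": (149, 3.0),
    "Mixed-tissue contigs": (1508, 5.3),
}

# The category rows should add up to the "All contigs" row (2,266).
total_contigs = sum(n for n, _ in TABLE_II.values())

# Rebuild the number of ESTs in contigs from the rounded averages;
# this should land close to the 10,078 ESTs reported in Table I.
approx_ests = sum(n * avg for n, avg in TABLE_II.values())

# Share of contigs whose ESTs all come from a single library.
mixed = TABLE_II["Mixed-tissue contigs"][0]
specific_fraction = (total_contigs - mixed) / total_contigs

print(total_contigs)
print(round(approx_ests))
print(f"{specific_fraction:.1%}")
```

Under these figures, roughly one-third of the contigs are library specific, and those contigs are on average smaller than the mixed-tissue ones.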


Table III. Common bean contigs composed of the most redundant ESTs

The 20 contigs from each organ containing the greatest number of ESTs were sorted according to library source. Library sources (organs) are pods collected at various times after anthesis, leaves from a Mesoamerican and Andean cultivar, roots from P-stressed plants, and effective root nodules. Asterisks denote contigs having ESTs from a single organ. Top Uniref 100 hits use a cutoff of E < 10^-4.

| Pv Contig | Contig Length (bp) | Top Uniref 100 Hit | Top E-Value | Total ESTs in Contig | Nod ESTs | Pod ESTs | RTS ESTs | LVS ESTs | ALV ESTs |
|---|---|---|---|---|---|---|---|---|---|
| Pod Contigs | | | | | | | | | |
| 2,677 | 668 | Albumin 1 precursor (Glycine soja [Wild soybean])* | 5.00E-52 | 64 | 0 | 64 | 0 | 0 | 0 |
| 2,671 | 983 | Pod storage protein (Phaseolus vulgaris) | 1.00E-146 | 45 | 0 | 44 | 0 | 1 | 0 |
| 2,662 | 1,490 | Alcohol dehydrogenase 1 (Glycine max)* | 0 | 36 | 0 | 36 | 0 | 0 | 0 |
| 2,670 | 1,103 | Nodulin-26 (Glycine max) | 1.00E-124 | 43 | 0 | 20 | 6 | 11 | 6 |
| 2,680 | 919 | PSII type I chlorophyll a/b-binding protein precursor (Glycine max) | 1.00E-148 | 69 | 0 | 20 | 0 | 49 | 0 |
| 2,628 | 2,000 | Lipoxygenase (Phaseolus vulgaris) | 0 | 19 | 0 | 16 | 3 | 0 | 0 |
| 2,632 | 1,456 | Annexin (Medicago truncatula) | 1.00E-146 | 20 | 3 | 15 | 1 | 1 | 0 |
| 2,598 | 549 | Albumin 1 precursor (Glycine max)* | 6.00E-51 | 14 | 0 | 14 | 0 | 0 | 0 |
| 2,600 | 2,039 | Lipoxygenase (Phaseolus vulgaris) | 0 | 15 | 0 | 14 | 0 | 0 | 1 |
| 2,675 | 998 | Chlorophyll a/b-binding protein (Glycine max) | 1.00E-149 | 59 | 0 | 14 | 0 | 45 | 0 |
| 2,685 | 950 | Ribulose bisphosphate carboxylase precursor (Phaseolus vulgaris) | 1.00E-104 | 161 | 0 | 14 | 0 | 113 | 34 |
| 2,575 | 1,479 | β-Glucosidase precursor (Polygonum tinctorium)* | 1.00E-137 | 12 | 0 | 12 | 0 | 0 | 0 |
| 2,652 | 1,047 | LHCII type III chlorophyll a/b-binding protein (Phaseolus aureus) | 1.00E-140 | 28 | 0 | 12 | 0 | 6 | 10 |
| 2,559 | 921 | Lectin precursor (Vigna linearis var. latifolia)* | 1.00E-136 | 11 | 0 | 11 | 0 | 0 | 0 |
| 2,560 | 760 | Acid phosphatase (Glycine max) | 4.00E-64 | 11 | 0 | 10 | 1 | 0 | 0 |
| 2,584 | 763 | SAH7 protein (Arabidopsis thaliana) | 2.00E-41 | 13 | 1 | 10 | 1 | 1 | 0 |
| 2,656 | 982 | LHCII type II chlorophyll a/b-binding protein (Phaseolus aureus) | 1.00E-151 | 31 | 0 | 10 | 0 | 21 | 0 |
| 2,660 | 815 | Translationally controlled tumor protein homolog (Glycine max) | 1.00E-77 | 32 | 6 | 10 | 8 | 7 | 1 |
| 2,515 | 681 | Nonspecific lipid transfer protein PvLTP-24 (Phaseolus vulgaris)* | 6.00E-59 | 9 | 0 | 9 | 0 | 0 | 0 |
| 2,639 | 1,294 | Cationic peroxidase 2 (Glycine max) | 1.00E-180 | 21 | 0 | 9 | 1 | 5 | 6 |
| Andean Leaf Contigs | | | | | | | | | |
| 2,678 | 2,680 | Transketolase 1 (Capsicum annuum) | 0 | 64 | 0 | 2 | 0 | 4 | 58 |
| 2,681 | 1,710 | Rubisco activase, chloroplast precursor (Phaseolus vulgaris) | 0 | 72 | 0 | 0 | 0 | 20 | 52 |
| 2,672 | 911 | PSII type I chlorophyll a/b-binding protein precursor (Glycine max) | 1.00E-148 | 52 | 0 | 0 | 0 | 10 | 42 |
| 2,685 | 950 | Ribulose bisphosphate carboxylase precursor (Phaseolus vulgaris) | 1.00E-104 | 161 | 0 | 14 | 0 | 113 | 34 |
| 2,673 | 1,449 | Glyceraldehyde-3-P dehydrogenase A, chloroplast precursor (Pisum sativum) | 0 | 54 | 0 | 2 | 0 | 26 | 26 |
| 2,651 | 982 | Chlorophyll a/b-binding protein (Glycine max) | 1.00E-148 | 27 | 0 | 0 | 0 | 1 | 26 |
| 2,643 | 1,541 | Carbonic anhydrase (Phaseolus aureus)* | 1.00E-176 | 22 | 0 | 0 | 0 | 0 | 22 |
| 2,676 | 1,100 | Chlorophyll a/b-binding protein CP26, chloroplast precursor (Arabidopsis thaliana) | 1.00E-139 | 60 | 0 | 7 | 1 | 31 | 21 |
| 2,668 | 1,523 | Plastidic aldolase (Trifolium pratense) | 0 | 42 | 0 | 4 | 0 | 17 | 21 |
| 2,659 | 982 | LHCII type I chlorophyll a/b-binding protein (Phaseolus aureus) | 1.00E-149 | 32 | 0 | 5 | 0 | 7 | 20 |
| 2,633 | 840 | Ribulose bisphosphate carboxylase precursor (Phaseolus vulgaris)* | 1.00E-104 | 20 | 0 | 0 | 0 | 0 | 20 |
| 2,618 | 911 | Chlorophyll a/b-binding protein 7, chloroplast precursor (Lycopersicon esculentum)* | 1.00E-140 | 18 | 0 | 0 | 0 | 0 | 18 |
| 2,641 | 954 | LHCII type II chlorophyll a/b-binding protein (Phaseolus aureus) | 1.00E-151 | 22 | 0 | 1 | 0 | 4 | 17 |

(Table continues.)


Table III. (Continued.)

| Pv Contig | Contig Length (bp) | Top Uniref 100 Hit | Top E-Value | Total ESTs in Contig | Nod ESTs | Pod ESTs | RTS ESTs | LVS ESTs | ALV ESTs |
|---|---|---|---|---|---|---|---|---|---|
| 2,634 | 867 | PSI reaction center subunit III (Phaseolus aureus) | 1.00E-117 | 20 | 0 | 0 | 0 | 3 | 17 |
| 2,607 | 1,080 | Chlorophyll a/b-binding protein 8, chloroplast precursor (Lycopersicon esculentum)* | 1.00E-138 | 16 | 0 | 0 | 0 | 0 | 16 |
| 2,621 | 2,460 | ATP synthase β-subunit (Hedycarya arborea) | 0 | 18 | 0 | 0 | 0 | 3 | 15 |
| 2,655 | 918 | Oxygen-evolving enhancer protein 3 (Pisum sativum) | 1.00E-77 | 30 | 0 | 5 | 0 | 11 | 14 |
| 2,661 | 1,235 | Chlorophyll a/b-binding protein CP29 (Phaseolus aureus) | 1.00E-161 | 32 | 0 | 1 | 1 | 16 | 14 |
| 2,581 | 1,513 | Phosphoribulokinase (Oryza sativa [japonica cultivar group])* | 0 | 12 | 0 | 0 | 0 | 0 | 12 |
| 2,627 | 1,573 | Triose phosphate/phosphate translocator, chloroplast precursor (Pisum sativum) | 0 | 19 | 0 | 4 | 0 | 4 | 11 |
| Mesoamerican Leaf Contigs | | | | | | | | | |
| 2,685 | 950 | Ribulose bisphosphate carboxylase precursor (Phaseolus vulgaris) | 1.00E-104 | 161 | 0 | 14 | 0 | 113 | 34 |
| 2,682 | 931 | Ribulose bisphosphate carboxylase precursor (Phaseolus vulgaris) | 1.00E-104 | 76 | 0 | 2 | 0 | 63 | 11 |
| 2,686 | 779 | Kidney bean leghemoglobin (Phaseolus vulgaris) | 2.00E-75 | 264 | 211 | 0 | 0 | 53 | 0 |
| 2,680 | 919 | PSII type I chlorophyll a/b-binding protein precursor (Glycine max) | 1.00E-148 | 69 | 0 | 20 | 0 | 49 | 0 |
| 2,675 | 998 | Chlorophyll a/b-binding protein (Glycine max) | 1.00E-149 | 59 | 0 | 14 | 0 | 45 | 0 |
| 2,676 | 1,100 | Chlorophyll a/b-binding protein CP26, chloroplast precursor (Arabidopsis thaliana) | 1.00E-139 | 60 | 0 | 7 | 1 | 31 | 21 |
| 2,667 | 746 | Pro-rich protein (Glycine max) | 2.00E-54 | 41 | 0 | 7 | 0 | 27 | 7 |
| 2,657 | 1,160 | Chlorophyll a/b-binding protein 8, chloroplast precursor (Lycopersicon esculentum) | 1.00E-138 | 31 | 0 | 4 | 0 | 27 | 0 |
| 2,673 | 1,449 | Glyceraldehyde-3-P dehydrogenase A, chloroplast precursor (Pisum sativum) | 0 | 54 | 0 | 2 | 0 | 26 | 26 |
| 2,648 | 852 | Ribulose bisphosphate carboxylase precursor (Phaseolus vulgaris) | 1.00E-103 | 25 | 0 | 1 | 0 | 24 | 0 |
| 2,684 | 988 | Nodulin-30 precursor (Phaseolus vulgaris) | 1.00E-116 | 144 | 120 | 0 | 0 | 24 | 0 |
| 2,663 | 883 | PSI light-harvesting chlorophyll a/b-binding protein (Nicotiana tabacum) | 1.00E-119 | 36 | 0 | 5 | 0 | 22 | 9 |
| 2,656 | 982 | LHCII type II chlorophyll a/b-binding protein (Phaseolus aureus) | 1.00E-151 | 31 | 0 | 10 | 0 | 21 | 0 |
| 2,681 | 1,710 | Rubisco activase, chloroplast precursor (Phaseolus vulgaris) | 0 | 72 | 0 | 0 | 0 | 20 | 52 |
| 2,668 | 1,523 | Plastidic aldolase (Trifolium pratense) | 0 | 42 | 0 | 4 | 0 | 17 | 21 |
| 2,661 | 1,235 | Chlorophyll a/b-binding protein CP29 (Phaseolus aureus) | 1.00E-161 | 32 | 0 | 1 | 1 | 16 | 14 |
| 2,608 | 544 | Hypothetical protein (Glycine max)* | 5.00E-15 | 16 | 0 | 0 | 0 | 16 | 0 |
| 2,658 | 1,022 | PSI light-harvesting antenna chlorophyll a/b-binding protein (Pisum sativum) | 1.00E-129 | 32 | 0 | 8 | 0 | 15 | 9 |
| 2,630 | 900 | Plastocyanin, chloroplast precursor (Pisum sativum) | 3.00E-64 | 20 | 0 | 2 | 0 | 15 | 3 |
| 2,679 | 940 | Nodulin-30 precursor (Phaseolus vulgaris) | 1.00E-118 | 68 | 54 | 0 | 0 | 14 | 0 |
| P-Deficient Roots | | | | | | | | | |
| 2,665 | 795 | Pathogenesis-related protein 1 (Phaseolus vulgaris) | 8.00E-83 | 40 | 1 | 0 | 38 | 1 | 0 |
| 2,616 | 1,154 | Extensin class 1 protein precursor (Vigna unguiculata) | 1.00E-142 | 17 | 0 | 2 | 15 | 0 | 0 |
| 2,615 | 1,297 | Putative plasma membrane intrinsic protein (Pisum sativum) | 1.00E-148 | 17 | 0 | 1 | 12 | 3 | 1 |

(Table continues.)


Table III. (Continued.)

| Pv Contig | Contig Length (bp) | Top Uniref 100 Hit | Top E-Value | Total ESTs in Contig | Nod ESTs | Pod ESTs | RTS ESTs | LVS ESTs | ALV ESTs |
|---|---|---|---|---|---|---|---|---|---|
| 2,620 | 1,819 | S-Adenosyl methionine decarboxylase (Phaseolus lunatus) | 0 | 18 | 0 | 0 | 11 | 1 | 6 |
| 2,563 | 575 | Type 1 metallothionein (Dolichos lablab)* | 6.00E-35 | 11 | 0 | 0 | 11 | 0 | 0 |
| 2,660 | 815 | Translationally controlled tumor protein homolog (Glycine max) | 1.00E-77 | 32 | 6 | 10 | 8 | 7 | 1 |
| 2,644 | 791 | Histone H3 (Oryza sativa) | 3.00E-70 | 22 | 3 | 4 | 8 | 6 | 1 |
| 2,629 | 1,730 | No BLASTX hit to Uniref 100 | | 20 | 5 | 4 | 8 | 1 | 2 |
| 2,562 | 1,268 | Glyceraldehyde-3-P dehydrogenase, cytosolic (Dianthus caryophyllus) | 1.00E-175 | 11 | 1 | 2 | 8 | 0 | 0 |
| 2,496 | 729 | No BLASTX hit to Uniref 100* | | 8 | 0 | 0 | 8 | 0 | 0 |
| 2,512 | 1,091 | Putative elongation factor 2 (Oryza sativa [japonica cultivar group]) | 1.00E-158 | 9 | 0 | 2 | 7 | 0 | 0 |
| 2,479 | 727 | Pro-rich 14-kD protein (Phaseolus vulgaris)* | 8.00E-68 | 7 | 0 | 0 | 7 | 0 | 0 |
| 2,670 | 1,103 | Nodulin-26 (Glycine max) | 1.00E-124 | 43 | 0 | 20 | 6 | 11 | 6 |
| 2,441 | 1,455 | Putative WRKY4 transcription factor (Vitis aestivalis) | 1.00E-113 | 7 | 0 | 0 | 6 | 1 | 0 |
| 2,525 | 1,671 | CYP81E8 (Medicago truncatula) | 0 | 9 | 2 | 0 | 6 | 1 | 0 |
| 2,573 | 1,645 | T6D22.2 (Arabidopsis thaliana) | 0 | 12 | 4 | 1 | 6 | 0 | 1 |
| 2,376 | 586 | No BLASTX hit to Uniref 100* | | 6 | 0 | 0 | 6 | 0 | 0 |
| 2,409 | 943 | Hypothetical protein upa10 (Capsicum annuum)* | 2.00E-61 | 6 | 0 | 0 | 6 | 0 | 0 |
| 2,669 | 1,723 | T6D22.2 (Arabidopsis thaliana) | 0 | 42 | 12 | 9 | 5 | 8 | 8 |
| 2,664 | 1,059 | Cytosolic ascorbate peroxidase (Vigna unguiculata) | 1.00E-140 | 38 | 16 | 7 | 5 | 7 | 3 |
| Nodules | | | | | | | | | |
| 2,686 | 779 | Kidney bean leghemoglobin (Phaseolus vulgaris) | 2.00E-75 | 264 | 211 | 0 | 0 | 53 | 0 |
| 2,684 | 988 | Nodulin-30 precursor (Phaseolus vulgaris) | 1.00E-116 | 144 | 120 | 0 | 0 | 24 | 0 |
| 2,683 | 1,522 | CDR1 (Arabidopsis thaliana) | 2.00E-87 | 83 | 77 | 0 | 0 | 6 | 0 |
| 2,679 | 940 | Nodulin-30 precursor (Phaseolus vulgaris) | 1.00E-118 | 68 | 54 | 0 | 0 | 14 | 0 |
| 2,674 | 864 | Phaseolus vulgaris nodulin-30 (Phaseolus vulgaris) | 1.00E-123 | 54 | 49 | 0 | 0 | 5 | 0 |
| 2,666 | 667 | Kidney bean leghemoglobin (Phaseolus vulgaris) | 2.00E-74 | 40 | 31 | 0 | 0 | 9 | 0 |
| 2,645 | 1,182 | Uricase II (Phaseolus vulgaris) | 1.00E-177 | 23 | 22 | 0 | 1 | 0 | 0 |
| 2,654 | 2,827 | Sucrose synthase (Phaseolus vulgaris) | 0 | 30 | 21 | 3 | 3 | 3 | 0 |
| 2,653 | 974 | Phaseolus vulgaris nodulin-30 (Phaseolus vulgaris) | 1.00E-124 | 29 | 18 | 0 | 0 | 11 | 0 |
| 2,638 | 947 | Nodulin 30 precursor (Phaseolus vulgaris) | 1.00E-118 | 21 | 18 | 0 | 0 | 3 | 0 |
| 2,646 | 1,462 | Glutamine synthetase N-1 (Phaseolus vulgaris) | 0 | 23 | 17 | 2 | 0 | 4 | 0 |
| 2,664 | 1,059 | Cytosolic ascorbate peroxidase (Vigna unguiculata) | 1.00E-140 | 38 | 16 | 7 | 5 | 7 | 3 |
| 2,642 | 1,467 | Glutamine synthetase PR-1 (Phaseolus vulgaris) | 0 | 22 | 14 | 0 | 4 | 3 | 1 |
| 2,636 | 1,462 | Fructose-bisphosphate aldolase, cytoplasmic isozyme (Cicer arietinum) | 1.00E-177 | 20 | 14 | 3 | 3 | 0 | 0 |
| 2,669 | 1,723 | T6D22.2 (Arabidopsis thaliana) | 0 | 42 | 12 | 9 | 5 | 8 | 8 |
| 2,601 | 1,017 | MtN24 protein (Medicago truncatula) | 7.00E-43 | 15 | 12 | 0 | 0 | 3 | 0 |
| 2,626 | 1,527 | S-Adenosyl-L-methionine synthetase (Dendrobium crumenatum) | 0 | 19 | 11 | 2 | 2 | 2 | 2 |
| 2,612 | 1,222 | Malate dehydrogenase (Nicotiana tabacum) | 1.00E-177 | 17 | 11 | 4 | 0 | 2 | 0 |
| 2,614 | 1,052 | Putative phosphatase (Phaseolus vulgaris) | 1.00E-142 | 17 | 10 | 3 | 2 | 2 | 0 |
| 2,589 | 822 | Early nodulin 55-2 precursor (Glycine max) | 1.00E-71 | 13 | 10 | 0 | 0 | 3 | 0 |


contigs from each library composed of the most redundant ESTs are shown in Table III. Those contigs having ESTs from a single organ source are noted as specific. Given our methodology, contigs may appear in the top 20 of multiple tissues. A larger version of Table III, including the UniProt accession numbers, is available at our Web site (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online).
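The ranking described here (per-library EST counts, a top-N cut, and an asterisk for contigs drawn from a single library) can be sketched as follows. This is a hedged illustration rather than the authors' code: the names `Contig`, `make`, `top_contigs`, and `is_single_library` are hypothetical, and only a handful of Table III rows are included as sample data.

```python
from typing import NamedTuple

# Library codes follow the Table III columns: nodules, pods,
# P-deficient roots, Mesoamerican leaves, Andean leaves.
LIBS = ("nod", "pod", "rts", "lvs", "alv")


class Contig(NamedTuple):
    contig_id: int
    hit: str
    counts: dict  # ESTs contributed by each library


def make(contig_id, hit, *counts):
    """Build a Contig from counts given in Table III column order."""
    return Contig(contig_id, hit, dict(zip(LIBS, counts)))


# A few rows transcribed from Table III.
CONTIGS = [
    make(2677, "Albumin 1 precursor", 0, 64, 0, 0, 0),
    make(2671, "Pod storage protein", 0, 44, 0, 1, 0),
    make(2670, "Nodulin-26", 0, 20, 6, 11, 6),
    make(2686, "Kidney bean leghemoglobin", 211, 0, 0, 53, 0),
    make(2685, "Ribulose bisphosphate carboxylase precursor", 0, 14, 0, 113, 34),
    make(2665, "Pathogenesis-related protein 1", 1, 0, 38, 1, 0),
    make(2643, "Carbonic anhydrase", 0, 0, 0, 0, 22),
]


def top_contigs(contigs, lib, n=20):
    """Contigs with at least one EST from `lib`, most abundant first."""
    present = [c for c in contigs if c.counts[lib] > 0]
    return sorted(present, key=lambda c: c.counts[lib], reverse=True)[:n]


def is_single_library(contig):
    """True when every EST in the contig comes from one library."""
    return sum(1 for v in contig.counts.values() if v > 0) == 1


pod_top3 = [c.contig_id for c in top_contigs(CONTIGS, "pod", n=3)]
print(pod_top3)  # [2677, 2671, 2670]

# A contig can rank in several libraries at once, as the text notes.
libs_with_2685 = [
    lib for lib in LIBS
    if any(c.contig_id == 2685 for c in top_contigs(CONTIGS, lib))
]
print(libs_with_2685)  # ['pod', 'lvs', 'alv']
```

Because ranking is done independently per library, a highly redundant mixed-tissue contig such as 2,685 is listed under pods and both leaf libraries, while `is_single_library` reproduces the asterisk for rows such as 2,677 and 2,643.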

Since pods were collected over a range of maturity dates, contigs composed of abundant ESTs reflect genes involved in both pod and seed growth and development. Contigs related to seed traits have homology to albumins (UniProt accessions Q39837 and Q9ZQX0), lectin (Q8L683), lipoxygenases (O24320, P27481, and Q9FQF9), acid phosphatases (O49855), β-glucosidase (Q9XJ67), and lipid transfer proteins (O24440 and Q8W539). By comparison, contigs related to pod function included photosynthetic proteins such as chlorophyll a/b-binding proteins (Q39831, Q40512, Q43437, Q9LKI0, Q9LKI1, Q9SQL2, Q9XF89, and Q9XQB1), PSI reaction center protein (Q9S7N7), and storage protein (O23808). Unexpectedly, a contig annotated as nodulin 26 (contig 2,670, Q39882), which corresponds to a membrane transporter, contained 20 pod-derived ESTs. Nodulin 26 ESTs were also found in leaves and roots, but not in nodules. A contig containing numerous ESTs for alcohol dehydrogenase (contig 2,662, Q8LJR2) was also found in pods. Of the 20 contigs noted as those containing numerous ESTs in pods, 6 were pod specific.

The leaf contigs composed of the most abundant ESTs from both the Mesoamerican and Andean cultivars are shown in Table III. As expected, many contigs from leaf ESTs of both cultivars related to photosynthesis and similar processes. Among the contigs from the Andean cultivar are several involved in amino acid metabolism. These were not evident in the Mesoamerican sequences. Conversely, there are 9 contigs in the Mesoamerican group that had no comparable sequences in the Andean leaf group, including 2 nodulin 30s (Q39882 and Q41121), 1 leghemoglobin (Q03972), and a carbonic anhydrase (Q9XQB0), which is not represented in the Andean cultivar. Thus, the complement of contigs between the two germplasm sources was quite distinct. These differences in contigs may represent genotypic, growth condition, and/or developmental stage variables.

Because root ESTs were derived from P-stressed plants, contigs composed of abundant root ESTs reflect not only root function but also functions that may be related to stress. This is exemplified in the five root contigs containing the most abundant ESTs, which have homology to a stress-related pathogenesis protein (P25985), an extensin (Q41707), a plasma membrane intrinsic protein (Q9XGG8), a metallothionein (Q75NH5), and an S-adenosyl-methionine (SAM) decarboxylase (Q8W3Y2), all of which are related to biotic/abiotic stress. Notably, several other contigs encode putative transport/membrane, oxidative stress, transcription factor, and phosphatase proteins. Five of the most abundant root contigs were found only in the root library.

Nodule contigs composed of the most abundant ESTs have homology to putative proteins involved in core functions related to N fixation, including oxygen control (leghemoglobin [Q03972] and ascorbate peroxidase [Q41712]), C metabolism (Suc synthase [Q8GTA3], Suc nonfermenting protein 1 [SNF1; Q9XIW0], aldolase [O65735], malate dehydrogenase [MDH; Q9FSF0]), amino acid synthesis (Gln synthetase N-1 [P00965]), and ureide synthesis (uricase [P53763] and inosine dehydrogenase [Q84XA3]). Interestingly, several of the nodule contigs encode putative proteins functioning in plant-microbe interactions, for example, CDR-1 (Q6XBF8), 2-on-2 hemoglobin (Q6QDC2), epoxide hydrolase (Q9ZP87), hypersensitive-induced response protein (Q6L4S3), and polygalacturonase (O81798). Putative membrane-trafficking and transport proteins (Q7XJQ3), nodulins 24 (P04145) and 55 (Q02917), and annexin (O65848) were also highly represented. Surprisingly, of the 20 nodule contigs shown in Table III, none were found only in nodules. Several other contigs composed of 2 to 10 ESTs were nodule specific. A complete list of the contigs containing a higher number of ESTs is available (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online).

Comparisons to Other Legume EST Projects

In recent years, considerable effort has focused on the identification of nodule-enhanced or nodule-specific genes. To allow comparisons between projects, the 340 nodule-specific M. truncatula EST contigs identified by Fedorova et al. (2002) were compared to the 228 nodule-specific common bean contigs. Surprisingly, only 17 of the 340 contigs identified by Fedorova et al. had homology to nodule-specific common bean contigs. To determine whether this was due to differences in gene expression between M. truncatula and common bean, the 340 tentative consensus sequences identified by Fedorova et al. were BLASTed against all common bean EST sequences. Of the 340 contigs, only 25% had a homolog (E < 10^-12) in common bean. This suggests that an entirely different repertoire of genes is expressed in common bean nodules. While further sequencing of nodule ESTs is necessary to confirm this observation, some support comes from the work of Lee et al. (2004). Comparison of the 20 most abundant contigs in soybean nodules to those in common bean revealed that only leghemoglobins and Suc synthase were shared.
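The homolog count in this comparison can be illustrated with a minimal sketch. It assumes BLAST results have already been parsed into (query, subject, E-value) tuples; the function name, sequence identifiers, and toy data are hypothetical and do not come from the study.

```python
def fraction_with_homolog(query_ids, hits, evalue_cutoff=1e-12):
    """Return the fraction of query contigs with at least one hit below the cutoff.

    query_ids: iterable of query contig identifiers (hypothetical names).
    hits: iterable of (query_id, subject_id, evalue) tuples from a BLAST report.
    """
    queries = set(query_ids)
    # A query counts once, no matter how many subjects it matches.
    matched = {q for q, _subject, evalue in hits if evalue < evalue_cutoff}
    return len(matched & queries) / len(queries)


# Toy data: four query contigs, only TC1 has hits that pass the cutoff.
queries = ["TC1", "TC2", "TC3", "TC4"]
hits = [
    ("TC1", "PvContig10", 1e-40),
    ("TC1", "PvContig12", 1e-15),
    ("TC2", "PvContig11", 1e-5),   # too weak, ignored
]
print(fraction_with_homolog(queries, hits))
```

Counting each query once, rather than counting hits, keeps the result comparable to a statement such as "25% of the 340 contigs had a homolog".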

Identification of Phaseolus-Specific Contigs

Ten contigs (477; 616; 642; 825; 917; 1,041; 1,067; 1,372; 1,843; and 2,376) were identified with no or Phaseolus-only BLASTX hits to the Uniref 100 protein database or to non-Phaseolus sequences in the database of legume sequences. To verify that these contigs were indeed Phaseolus specific, TBLASTX was used to compare them to the EST_others database and the Arabidopsis (Arabidopsis thaliana) genome. Comparisons to the EST_others database would detect homology to genes expressed in a variety of conditions. Comparisons to the Arabidopsis genome allowed identification of sequences whose expression had not been detected in other species and could also be used to find homology to genes that have not yet been predicted. These additional analyses confirmed that 9 of the 10 contig sequences were indeed Phaseolus specific. Full-length sequencing of the ESTs in these contigs and RNA-blot expression studies may provide further insight into the function of these genes.

Identification of Gene Families

Single-linkage clustering, as described by Graham et al. (2004), was used to assign common bean contigs and singletons to putative gene families. The common bean contig and singleton sequences were combined into a single dataset. This dataset was then compared to itself using TBLASTX and an E-value cutoff of 10^-12. Any sequences with overlapping BLAST reports were assigned to a putative gene family. Using this technique, we were able to identify 944 gene families ranging in size from 2 to 109 members. A full description of these data is available at our Web site (http://www.ccg.unam.mx/phaseolusest/Data_download.htm; see also supplemental data online). This type of analysis had two important benefits for our research. First, we could identify sequences that were likely to cross-hybridize in future northern and macroarray experiments. For example, group 8 was composed of 109 sequences mostly with homology to protein receptor kinases. While some members of this group show quite distant homology, others are very similar. A second advantage of this approach is that sequences that had no homology to known proteins often clustered into gene families with known proteins. In the case of group 8, 19 of the 109 sequences were annotated as hypothetical proteins and 2 had no BLAST homology to the UniProt database. By comparing sequence alignments of these sequences with representative members of group 8, we can determine whether they really are protein kinases.
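Single-linkage clustering of this kind can be sketched with a union-find structure: any two sequences connected by a chain of significant hits end up in the same family. This is an illustrative sketch only, not the procedure of Graham et al. (2004); it assumes the all-against-all TBLASTX report has already been filtered at the E-value cutoff and reduced to pairs of sequence identifiers, and all names below are hypothetical.

```python
from collections import defaultdict


def single_linkage_families(seq_ids, hit_pairs):
    """Group sequences into putative families by single linkage.

    seq_ids: all contig and singleton identifiers in the combined dataset.
    hit_pairs: (query, subject) pairs that passed the E-value cutoff;
               both identifiers are assumed to be present in seq_ids.
    Returns a list of sets, one per family with two or more members.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hit_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(set)
    for s in seq_ids:
        groups[find(s)].add(s)
    # Sequences that only hit themselves are not families.
    return [members for members in groups.values() if len(members) >= 2]


seqs = ["c1", "c2", "c3", "c4", "c5"]
pairs = [("c1", "c2"), ("c2", "c3"), ("c4", "c4")]  # c4 hits only itself
families = single_linkage_families(seqs, pairs)
print(sorted(sorted(f) for f in families))
```

In the toy data, c1 and c3 share a family although they never hit each other directly, which is why a large group can contain members with quite distant homology.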

Analysis of SNPs

SNPs were identified between the Andean and the Mesoamerican genotypes by comparing the Andean leaf ESTs against all other ESTs from all other tissue libraries of the Mesoamerican genotype. A total of 645 contigs (29% of the total) contained at least 1 sequence from both genotypes and could be mined for potential SNPs. Two different criteria were used to identify SNPs. High-quality SNPs were confirmed by two or more sequences from each genotype showing the same base change. A total of 138 high-quality SNPs were found in 72 contigs. Lower quality SNPs were confirmed by one sequence in one genotype and at least two sequences in the other. A total of 421 SNPs, representing 196 contigs, were identified in this class. Together, these 559 SNPs corresponded to 199 contigs, giving an average of 2.8 SNPs per contig. As expected, the majority of the SNPs were due to base pair mutations (94.9%) compared to insertion-deletion events (5.1%). Among the base pair mutations, transversions (34.5%) were less common than transitions (65.6%) and, among the transitions, cytosine-to-thymine mutations (65.1%) were more common than adenine-to-guanine mutations (34.9%).
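The two SNP confidence classes and the mutation categories described above can be expressed as a short sketch. It assumes each candidate SNP has been reduced to the two alleles and the number of supporting reads per genotype; the function names and input representation are hypothetical and are not taken from the authors' pipeline.

```python
VALID_SYMBOLS = set("ACGT-")   # "-" marks a gap (insertion-deletion)
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def mutation_type(allele_1, allele_2):
    """Classify a difference between two alleles."""
    a, b = allele_1.upper(), allele_2.upper()
    if a not in VALID_SYMBOLS or b not in VALID_SYMBOLS or a == b:
        raise ValueError("expected two different symbols from A, C, G, T, -")
    if "-" in (a, b):
        return "indel"
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def snp_quality(andean_reads, mesoamerican_reads):
    """Apply the two read-support criteria stated in the text."""
    fewer, more = sorted((andean_reads, mesoamerican_reads))
    if fewer >= 2:
        return "high"   # two or more sequences from each genotype
    if fewer == 1 and more >= 2:
        return "low"    # one sequence in one genotype, at least two in the other
    return None         # not enough support to call a SNP


print(mutation_type("C", "T"), snp_quality(2, 3))
print(mutation_type("A", "C"), snp_quality(1, 2))
print(round(559 / 199, 1))  # average SNPs per contig from the reported totals
```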

SNPs were found in a range of contigs. Due to the nature of the comparison between EST libraries conducted here, where Andean ESTs were all from leaf tissue, many of the SNPs were found in contigs representing highly expressed leaf genes involved in the structure of the PSI and PSII, and in the CO2 assimilation process. Confirming their high level of expression in leaf tissue, the photosynthesis-related genes were homologous to the contigs with the greatest number of ESTs, ranging from >20 up to 161 individual sequences in the case of contig 2,685 with homology to the ribulose bisphosphate carboxylase precursor (Q43874).

Expression Analysis for Selected ESTs by RNA Gel Blots

Tissue-specific or tissue-enhanced ESTs were chosen from nodules, pods, leaves, and P-deficient root cDNA libraries to verify transcript abundance in different plant tissues by RNA blots. Five ESTs were selected to verify nodule-specific and/or nodule-enhanced expression (Fig. 2A). All were highly expressed in nodules, with a sulfate transporter (contig 2,167), SNF (contig 2,434), and leghemoglobin appearing to be expressed only in nodules. Two different-size RNAs were detected with the SNF-like cDNA probe. Most of the pod ESTs selected for RNA-blot analysis (Fig. 2B) are expressed in a pod-enhanced manner, independent of the EST redundancy, since pod storage protein (contig 2,671) is represented by 44 pod ESTs and myoinositol-1-P synthase (contig 2,532) is represented by 5 pod ESTs in this cDNA library. Lipoxygenase (contig 2,628) transcript is detected in pods, but also in leaves, with the greatest expression in stems.

Figure 2C shows that, with the exception of a hypothetical protein (contig 2,608), most of the ESTs selected from the 2 leaf cDNA libraries are expressed in a leaf-enhanced manner and leaf-specific expression was detected for a carbonic anhydrase (contig 2,534) transcript. Interestingly, a transcript of a lower Mr for plastidic aldolase (contig 2,668) was detected in nodules as compared to other organs. The unexpected hybridization of Rubisco (contig 2,682) to nodule RNA is puzzling. However, the different size of the transcript detected in nodules could reflect a chimeric clone or the presence of a very abundant transcript in nodules with high homology with Rubisco. Nodule ESTs annotated as Rubisco can be found in M. truncatula and L. japonicus databases.

Transcript abundance analysis of 7 selected ESTs from the P-deficient root library (Fig. 2D) shows that only 2 (pathogenesis-related protein [contig 2,665] and aquaporin [contig 2,522]) were more highly expressed in roots than in other tissue. The pathogenesis-related protein contig is composed of 38 root ESTs and the aquaporin contig is composed of 4 root ESTs. Independent of the number of ESTs, transcript levels of aquaporin are higher than those of pathogenesis-related protein in roots. The other ESTs in Figure 2D were selected as specific sequences of a P-deficient root cDNA library, but none show root-enhanced expression. The transcript of a putative phosphatase (contig 2,286) EST was clearly detected in P-deficient roots, but was not detected in any other tissue, including roots grown in the presence of P, suggesting that this phosphatase plays a specific role in phosphate release processes that take place in roots under P deprivation.

High-Density Macroarrays for Nodule ESTs

Macroarray approaches, as described previously (Fedorova et al., 2002; Uhde-Stone et al., 2003; Colebatch et al., 2004), were used to evaluate global expression of the nodule-isolated ESTs. Nylon filter arrays of 2,007 ESTs from the nodule cDNA library were prepared to evaluate nodule gene expression in comparison with other bean organs, such as root, leaf, stem, and pod, from which we used 2 experimentally independent sources of RNA isolated from plants grown under similar conditions. The spotted ESTs included 1,486 singletons and 300 contigs, representing a 1,786-unigene set.

From the 3 to 5 independent nylon filter arrays hybridized with first-strand cDNA from nodules, roots, leaves, stems, and pods, only those replicates (2 or 3) with a high determination coefficient (r² ≥ 0.8) were chosen to identify genes with reliable expression levels: those showing signal intensity values higher than 1.5-fold the local background through all selected test hybridizations. A total of 565 genes were obtained and subsequently used to calculate normalized expression ratios of the nodules relative to the other organs (see "Materials and Methods"). As expected, these genes exhibited significantly different expression levels across all organs, applying both the Student's t test for paired observations (P < 0.001) and the nonparametric Wilcoxon signed-rank test (P < 0.001). Figure 3, A to D, shows a graphic representation of the 565 EST expression ratios for nodule versus root, leaf, stem, and pod, respectively. Expression ratio values higher than 1 (top horizontal line at y = 1) represent ESTs with increased expression in nodules versus other organs (Fig. 3). Whenever the ratio was lower than 1, we estimated the inverse of that ratio and changed the sign such that these values appear below the line at y = -1 (Fig. 3). By definition, there are no values between 1 and -1.
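The signed ratio convention used for Figure 3 can be made concrete with a minimal sketch. It assumes two normalized, strictly positive signal intensities per EST; the function name and the example values are hypothetical.

```python
def signed_expression_ratio(nodule_signal, other_signal):
    """Return the nodule/other ratio, or minus its inverse when the ratio is below 1.

    Both inputs are assumed to be normalized, strictly positive intensities.
    """
    if nodule_signal <= 0 or other_signal <= 0:
        raise ValueError("signal intensities must be positive")
    if nodule_signal >= other_signal:
        return nodule_signal / other_signal      # plotted at or above y = 1
    return -(other_signal / nodule_signal)       # plotted at or below y = -1


examples = [
    signed_expression_ratio(8.0, 1.0),   # 8-fold higher in nodules
    signed_expression_ratio(1.0, 12.0),  # 12-fold higher in the other organ
    signed_expression_ratio(3.0, 3.0),   # equal expression
]
print(examples)
```

Because every result has an absolute value of at least 1, no point can fall between the two horizontal lines, which matches the gap described in the text.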

Figure 3A shows that expression ratios for nodules versus roots were lower than those for nodules versus the other organs. This might be because either (1) the roots used for RNA isolation and macroarray hybridization were obtained from nodulated bean plants after nodules were removed; or (2) nodules are derived from root cortical cells. The data shown in Figure 3A revealed that 31 ESTs had 5-fold or higher nodule-root expression ratios. Of these, 2 ESTs identified as villin 2 (NOD_247_F07) and Suc synthase (contig 2,654) showed the highest expression ratio (8; Table IV). Forty-nine ESTs had a higher expression in roots as compared to nodules (Fig. 3A). Of these, an EST identified as ring-H2 finger protein (contig 905) has the highest expression in roots versus nodules (expression ratio = -12).

Figure 2. RNA blots for ESTs identified as highly expressed from nodule (A), pod (B), leaf (C), and P-deficient root (D) libraries. Total RNA (15 μg) from each organ was separated by electrophoresis, transferred to nylon membranes, and probed with each (32P)-labeled EST. Ethidium-stained gel shows the amount of RNA in each lane is equivalent. Lanes: N, nodule; R, root; S, stem; L, leaf; P, pod. Leghem, Leghemoglobin; ADH, alcohol dehydrogenase.

Greater differences in ratios of gene expression were observed when comparing nodules with leaves and stems; these large ratios reflect the very different functions of nodules and those source organs (Fig. 3, B and C). In nodules versus leaves and stems, 188 and 294 ESTs had expression ratios of 10 or higher, respectively (Fig. 3, B and C). Of these, 99 and 138 ESTs were expressed 20-fold or more in nodules than in leaves and stems, respectively. In the comparisons of nodules versus leaves and nodules versus stems, totals of 6 and 26 ESTs, respectively, were found with expression ratios higher than 50. As shown in Table IV, at least 15 ESTs showed very high expression ratios (ranging from 52 to 135) in the nodule-leaf and nodule-stem comparisons. The functional categories of these ESTs were identified as proteins for nodulation or nodulins, such as leghemoglobin (contig 2,686), nodulin 30 (contig 2,679), and early nodulin 55-2 (contig 2,589), as well as proteins involved in C metabolism, defense, or regulation. Data from Figure 3, B and C, show that 61 and 44 ESTs, respectively, were more expressed in leaves and in stems than in nodules. The most highly expressed ESTs in leaves versus nodules were identified as VirF-interacting protein (NOD_225_E10; expression ratio = -9), ring-H2 finger protein (contig 905; expression ratio = -8), and one without homology to known genes (contig 2,009; expression ratio = -6); the first 2 were also highly expressed in roots and pods as compared to nodules.

Pods, as well as nodules, can be considered as sink organs; pods receive photosynthate from the leaves and mobilize N for pod development and seed formation. In general, expression ratios found in nodules versus pods were not as high as those found when comparing nodules with leaves and stems (Fig. 3). A total of 197 ESTs had nodule-pod expression ratios higher than 10. Of these, 65 had 20-fold or higher expression ratios. Only 3 ESTs (nodulin 30, an unknown protein, and a hypothetical protein) had nodule-pod expression ratios higher than 50. Forty-three ESTs were more highly expressed in pods than in nodules (Fig. 3D); VirF-interacting protein (NOD_225_E10; expression ratio = -9) and a nonidentified EST (expression ratio = -6) showed the highest expression in pods versus nodules.

Figure 3. Macroarray expression ratios of common bean ESTs. Polyubiquitin-normalized expression ratios of nodules (N) versus root (R), leaf (L), stem (S), and pod (P) were obtained for 565 selected ESTs as explained in "Materials and Methods." Expression ratios: (A) N/R; (B) N/L; (C) N/S; and (D) N/P. ESTs more expressed in nodules than in other organs are plotted as ratios higher than 1 (values above the top horizontal line at y = 1). Whenever the ratio was lower than 1, the inverse of that ratio was estimated, and the sign was changed (values below the bottom horizontal line at y = -1). Black circles indicate expression ratios ≥30.


Transcriptomic Analysis of Nodule C and N Metabolism

Although transcriptome studies of genes related to nodule N and C metabolism have been reported for the model legumes M. truncatula (Gyorgyey et al., 2000) and L. japonicus (Colebatch et al., 2002, 2004), temperate species that assimilate and transport fixed N as amides, comparable information for soybean, which, like common bean, assimilates and transports fixed N as ureides, was published only recently (Lee et al., 2004). Analysis of common bean root nodule contigs and ESTs by macroarray experiments showed that numerous genes encoding enzymes of N and C metabolism had enhanced expression in nodules and showed a high ratio of expression compared to other tissues.

At least 11 enzymes of C metabolism appeared to have enhanced expression as evidenced by either the expression ratio of nodule-root in macroarrays or abundant ESTs (Table V). Notably, Suc synthase (contig 2,654) and phosphoenolpyruvate (PEP) carboxylase (PEPC; contig 2,265), enzymes that contribute to sugar use, had high expression (Figs. 2 and 4). Several enzymes involved in general glycolysis (triose phosphate isomerase [contig 2,550], phosphoglycerate kinase [contig 2,537], and enolase [contig 2,622]) also had enhanced transcript levels (Fig. 4), as well as Glc-6-P dehydrogenase (Table V), a key source of NADPH for nodules.

With respect to N metabolism, four enzymes related to initial assimilation of fixed N into Gln had enhanced expression. In addition, another two enzymes related to ureide metabolism had enhanced expression.

Confirmation of macroarray results and contig analysis for common bean root nodule genes involved in C and N metabolism was obtained through RNA blots (Fig. 4). Even though expression of most genes involved in C metabolism that we tested was not nodule specific, the greatest transcript abundance was usually found in nodules. Reflecting nodule function, those genes involved in N metabolism are most clearly expressed preferentially in nodules (Fig. 4). In contrast with soybean (Lee et al., 2004), in bean the abundance of those ESTs involved in ammonia assimilation and ureide synthesis clearly reflects the nodule metabolic profile. Consistent with other studies of nodule N metabolism, NADH-dependent Glu synthase (GOGAT) transcript levels were quite low. Moreover, no NADH-GOGAT ESTs were found in the root nodule sequences. Interestingly, pods have abundant transcript expression for many of the genes involved in nodule N and C metabolism.

Table IV. Macroarray expression ratios of P. vulgaris ESTs identified as abundant in the root nodule library

Expression ratios are derived from hybridization of macroarrays spotted with root nodule ESTs and probed with (32P) cDNA synthesized from nodule (N), root (R), leaf (L), stem (S), and pod (P) RNA.

| Annotation | Functional Category | N/R (a) | N/L (a) | N/S (a) | N/P (a) |
|---|---|---|---|---|---|
| Leghemoglobin 2 | Nodulation | 2 | 30 | 135 | 37 |
| Acyl-ACP thioesterase | Fatty acid metabolism | 2 | 27 | 102 | 30 |
| D-Isomer-specific 2-hydroxyacid dehydrogenase | C metabolism | 4 | 32 | 99 | 37 |
| Nodulin-30 | Nodulation | 5 | 80 | 85 | 52 |
| TMV resistance protein N | Defense | 4 | 47 | 75 | 20 |
| Trehalose-6-P phosphatase | C metabolism | 2 | 27 | 71 | 22 |
| Genomic DNA chromosome 3 | Unknown | 4 | 14 | 68 | 9 |
| Early nodulin 55-2 | Nodulation | 3 | 30 | 68 | 18 |
| Unknown protein | Unknown | 6 | 32 | 68 | 54 |
| Ring finger protein | Regulation | 2 | 27 | 65 | 15 |
| Lipoxygenase | Defense | 2 | 20 | 62 | 17 |
| Enolase | C metabolism | 3 | 22 | 60 | 13 |
| Actin-depolymerizing factor 2 | Cell structure | 3 | 38 | 60 | 38 |
| Embryo-specific protein | Plant development | 3 | 55 | 13 | 18 |
| Pyruvate dehydrogenase E1 component | C metabolism | 2 | 21 | 52 | 40 |
| Protein kinase | Regulation | 2 | 51 | 34 | 20 |
| Elicitor-inducible protein | Defense | 4 | 41 | 49 | 16 |
| Dormancy-associated clone AFD1 | Regulation | 2 | 34 | 47 | 20 |
| Diaminopimelate decarboxylase | Protein metabolism | 2 | 21 | 47 | 19 |
| Villin 2 | Plant development | 8 | 16 | 44 | 26 |
| Delta-aminolevulinic acid dehydratase | Energy metabolism | 6 | 42 | 24 | 36 |
| Carboxylesterase | Unknown | 3 | 42 | 29 | 15 |
| Methylmalonate semialdehyde dehydrogenase | Energy metabolism | 2 | 27 | 31 | 13 |
| Alcohol dehydrogenase class III | C metabolism | 2 | 21 | 21 | 30 |
| Hypothetical protein | Unknown | 6 | 21 | 22 | 26 |
| Zinc finger protein | Regulation | 3 | 17 | 24 | 12 |
| Nitrogen fixation-like protein | Nutrient assimilation | 6 | 11 | 14 | 24 |
| Adenylsuccinate lyase-like protein | C metabolism | 7 | 22 | 15 | 12 |
| Suc synthase | C metabolism | 8 | 18 | 14 | 10 |
| Ankyrin protein | Plant development | 2 | 9 | 18 | 11 |
| No apical meristem protein | Plant development | 6 | 9 | 10 | 8 |

(a) Expression ratio; average of two or three independent membrane hybridizations. The highest expression ratio for each EST is indicated by bold numbers.

DISCUSSION

In this article, we provide an initial platform for functional genomics of common bean by the identification of almost 8,000 unique genes assembled from more than 20,000 ESTs sequenced from various plant organs. These sequences enrich the collection of ESTs in this important crop and provide new understanding of bean metabolism, development, and adaptation to stress. Roughly 3,400 to 4,900 ESTs were sequenced from each of 5 cDNA libraries of different bean tissues, and we identified 2,226 contigs (with 2 or more ESTs each), which were classified into 15 functional subgroups. Of these contigs, 36% represented sequences of unknown function or had no homology to previously identified proteins in the UniProt database (Apweiler et al., 2004). Another 34% corresponded to genes involved in C and N metabolism. These subgroup percentages are similar to those noted for nodules of M. truncatula (Gyorgyey et al., 2000) and L. japonicus (Colebatch et al., 2004) and proteoid roots of white lupin (Lupinus albus; Uhde-Stone et al., 2003). The third most abundant common bean functional subgroup was composed of contigs involved in signal transduction (8.7%). Transcripts involved in signal transduction were also large components of the ESTs noted in M. truncatula and L. japonicus (Fedorova et al., 2002; Colebatch et al., 2004). Some 5.7% of the bean contigs corresponded to genes implicated in biotic/abiotic stress. An abundance of contigs related to stress may be due to our selection of libraries, which included root nodules and P-deficient roots. Uhde-Stone et al. (2003) reported that 10.7% of the ESTs sequenced from P-deficient cluster roots of white lupin were related to stress. Although there are some 986,000 nucleotide sequences deposited in GenBank that are derived from the Fabaceae family, prior to this report only 575 came from common bean. We have extended that number by over 25-fold. These bean EST sequences provide the foundation for genome-wide transcript studies through either macro- or microarrays. In addition, they are a source of defined molecular markers for mapping bean linkage groups and anchoring physical maps.

A comparison of EST redundancy in contigs having sequences derived from multiple organs can provide a broad overview of gene expression and biochemical functions occurring within an organ (Colebatch et al., 2002, 2004; Fedorova et al., 2002; Journet et al., 2002; Uhde-Stone et al., 2003). Both common and unique features of plant organs may be identified by extracting and comparing a limited number of contigs containing redundant EST sequences, as evidenced by our identification of the 20 primary contigs from each library having the most EST sequences. Cursory examination of the top 20 contigs showed, not unexpectedly, that 82% of those from leaves encode proteins related to a single function, photosynthesis/light harvesting. Some 45% of the most prominent pod contigs encode proteins related to three functions: protein storage, lipid metabolism, and photosynthesis. Similarly, 75% of the 20 most prominent nodule contigs encode proteins related to nodulins, C and N assimilation, and oxygen control. Although there is more diversity of function among the 20 most prominent contigs from P-deficient roots, at least 13 encode proteins related to stress. For simplicity, we have noted the contigs involved in the primary functions of bean organs; more detailed analysis of the entire EST profile from each organ will reveal other features that may prove important to bean biology and improvement.

Table V. Genes encoding enzymes of C and N metabolism having enhanced expression in nodules as evidenced by macroarrays and/or contigs composed of abundant ESTs

| Annotation | Pathway | Contig | Expression Ratio N/R | No. of ESTs in Contig |
|---|---|---|---|---|
| Carbon | | | | |
| Suc synthase | Suc cleavage | 2,654 | 8 | 21 |
| Aldolase | Glycolysis | – | 4 | 14 |
| Triosephosphate isomerase | Glycolysis | 2,550 | 3 | 4 |
| Phosphoglycerate kinase | Glycolysis | 2,537 | 3 | 3 |
| Enolase | Glycolysis | 2,622 | 4 | 8 |
| Pyruvate kinase | Glycolysis | 648 | 3 | 3 |
| PEPC | Organic/amino acids | 2,265 | 6 | 3 |
| MDH | Organic acid | 2,612 | 2 | 11 |
| G-6-PDH | Pentose phosphate | 843 | 3 | 2 |
| D-Ribulose-5-P 3-epimerase | Pentose phosphate | – | 2 | 2 |
| Nitrogen | | | | |
| GS-β | NH4 assimilation | 2,642 | 2 | 14 |
| GS-γ | NH4 assimilation | 2,646 | 4 | 17 |
| Asparaginase | Amino acid synthesis | 2,546 | 1 | 4 |
| Phosphoribosylformylglycinamide amidotransferase | Purine synthesis | 2,033 | 3 | 4 |
| Uricase | Ureide synthesis | 2,645 | 5 | 22 |

Transcript expression evaluated by macroarrays provides a detailed picture of nodule biology, particularly C and N metabolism. While whole-nodule transcript studies of L. japonicus and M. truncatula (Colebatch et al., 2002, 2004; Fedorova et al., 2002; Journet et al., 2002) show significant gene induction of enzymes related to amide production, particularly 4-C organic acids and Asn, common bean nodule C and N enzymes favor ureide synthesis. Enhanced transcript expression for nodule Glc-6-P dehydrogenase, the first committed step in the oxidative branch of the pentose phosphate pathway, supports an interpretation that a portion of bean nodule metabolism favors production of NADPH and ribulose-5-P, the component sugar, for de novo purine synthesis (Table V). Induction of the nonoxidative branch of the pentose phosphate pathway is shown by increased expression of ribulose-5-P 3-epimerase, which can provide both 3- and 6-C intermediates for PEP and glycolysis, respectively (Fig. 4). Metabolism favoring ureide production is also reflected in the fact that all of the enzymes required for de novo purine synthesis can be found as nodule ESTs, with some being nodule specific. Direct production of ureides from purines is demonstrated by increased transcript abundance for uricase and xanthine dehydrogenase along with a phosphoribosylformylglycinamide amidotransferase contig that is composed mainly of nodule ESTs (Fig. 4; Table V).

We found several enzymes related to sugar use and glycolysis to be up-regulated in nodules, and these are also reflected in the contigs containing abundant ESTs from nodules. Suc synthase, the initial enzyme in Suc cleavage, which is critical for N fixation, is highly expressed in bean nodules (Fig. 4) and the corresponding contig (2,654) has 21 ESTs from nodules. Interestingly, the five enzymes of glycolysis (Table V; Fig. 4) that we find enhanced in bean nodules are involved in the synthesis of 3-C intermediates that ultimately lead to PEP, which is the fundamental backbone for both malate and Asn synthesis (Deroche and Carrayol, 1988).

Figure 4. RNA blots for ESTs involved in nodule C and N metabolism identified as highly expressed in root nodules. Total RNA (15 μg) from each organ was separated by electrophoresis, transferred to nylon membranes, and probed with each (32P)-labeled EST. Ethidium-stained gel shows the amount of RNA in each lane is equivalent. Lanes: N, nodule; R, root; S, stem; L, leaf; P, pod. R5PE, ribulose 5-P epimerase; TPI, triose phosphate isomerase; PGK, phosphoglycerate kinase; PK, pyruvate kinase; PDH, pyruvate dehydrogenase; CS, citrate synthase; SS, Suc synthase; GS, Gln synthetase β and γ isoenzymes; AAT-2, Asp aminotransferase isoform 2; PRFGA, phosphoribosylformylglycinamide amidotransferase; XDH, xanthine dehydrogenase.


Malate is considered the primary C source in nodules used by bacteroids for energy to reduce N (Appels and Haaker, 1991). It is interesting that the two pivotal enzymes required for malate synthesis, PEPC and MDH, have enhanced expression in bean nodules and are represented in the most abundant contigs from nodules.

The initial assimilation of fixed N into the nodule amino acid pool is catalyzed by glutamine synthetase (GS) and NADH-GOGAT in concert with Asp aminotransferase (AAT; Gantt et al., 1992). Macroarray and RNA blots show that both GS isoforms, β and γ (Lara et al., 1984), as well as AAT have enhanced expression in bean nodules and have numerous ESTs in the nodule (Fig. 4). By contrast, we did not find any NADH-GOGAT ESTs, but RNA blots show the transcript is enhanced in nodules but expressed at a low level (Fig. 4). This is consistent with the suggestion that NADH-GOGAT may be the rate-limiting step in nodule N assimilation (Temple et al., 1998). We have isolated two distinct NADH-GOGAT cDNAs from nodules and one appears to be related to N2 fixation (M. Lara, L. Blanco-Lopez, and C. Vance, unpublished data).

During the review of our submission, Lee et al. (2004) reported an analysis of the soybean root nodule-enhanced transcriptome. Surprisingly, a comparison of the 20 most abundant contigs in soybean nodules to those in common bean revealed that only leghemoglobins and Suc synthase were present in both. The remaining complement of the 20 most abundant nodule contigs was quite diverse. Comparisons to the nodule-specific ESTs identified by Fedorova et al. (2002) also demonstrated little overlap. Another noteworthy difference between the nodule contigs of various legumes is the absence in common bean and soybean of contigs encoding Cys cluster proteins and calmodulin-like proteins, which are highly abundant in M. truncatula nodules (Graham et al., 2004). These striking differences in nodule contigs between species illustrate not only the diversity of legume nodule genes but also the importance of transcriptome analysis of the same organ from different species.

Although the abundance of ESTs within a contig derived from an organ or tissue can frequently correlate with transcript expression within the tissue, conclusions drawn from in silico analyses can be misleading. We chose 20 ESTs that were specific to or highly enhanced in a particular organ and evaluated their expression in various organs by RNA blot (Fig. 2). Although many of the ESTs gave expression patterns similar to that expected from in silico data, several had abundant expression in organs other than the one from which they were selected. For example, lipoxygenase (contig 2,628), a hypothetical protein (contig 2,632), and zinc finger protein (contig 2,266) derived from pods, leaves, and roots, respectively, have quite high expression in other organs. These results could be due to several factors, including mRNA stability, growth conditions, and developmental stage. An added complexity in correlating in silico results with RNA blots is the occurrence of contigs as multigene families. In fact, of the 2,226 contigs we identified, 943 belonged to gene families. At this stage of limited sequencing, most (557) of the gene families are composed of 2 sequences, while 36 gene families contain 10 sequences, and 3 gene families contain 60 or more sequences. From this analysis, we can conclude that 21% (3,358) of the ESTs used for contig assembly are members of gene families. Taken together, our findings show the necessity of verifying in silico EST expression data by RNA blots and/or quantitative reverse transcription-PCR.
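
The 21% figure can be checked with a few lines of Python. This is only a sketch, and the denominator is an assumption: it takes "ESTs used for contig assembly" to mean the 5,703 singletons plus the 10,078 ESTs placed in contigs that are reported for this dataset.

```python
# Counts reported in this paper. Treating their sum as the number of ESTs
# that entered contig assembly is an assumption made for this check.
SINGLETONS = 5_703
ESTS_IN_CONTIGS = 10_078
ESTS_IN_GENE_FAMILIES = 3_358


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


total_ests = SINGLETONS + ESTS_IN_CONTIGS
family_share = percent(ESTS_IN_GENE_FAMILIES, total_ests)
print(f"{total_ests} ESTs, {family_share:.1f}% in gene families")
```

Under that assumption the share comes to about 21%, in line with the value quoted in the text.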

This study also showed the utility of mining EST collections in common bean for SNPs. To reduce errors caused by single-pass sequencing and low base quality values, we used two different criteria for identifying SNPs. Lower quality SNPs were supported by one sequence in one genotype and at least two sequences in the other. Using these criteria, a SNP could be found every 508 bp. High-quality SNPs were supported by at least two sequences from each genotype. Similarly, these criteria identified a SNP every 601 bp. By combining these data, we identified 529 SNPs in 214 kb of SNP-containing contigs, giving a SNP every 387 bp. These values are similar to those found for equivalent comparisons made in other inbreeding species of plants, but less frequent than in maize (Tenaillon et al., 2001). It was promising to find this frequency of SNPs in coding regions of common bean and perhaps was not unexpected due to the large genetic differences between the source genotypes whereby each represented a different gene pool of the species (Andean and Mesoamerican, respectively; Broughton et al., 2003). It would be necessary to confirm SNP frequency in further analysis of a greater number of ESTs or whole-gene sequences and in a panel of representative common bean genotypes that could be screened by PCR amplification and resequencing as has been done in other crop species (Tenaillon et al., 2001; Zhu et al., 2003; Russell et al., 2004). Our discovery of a large number of SNPs in expressed sequences should allow the genetic mapping of many of the genes underlying agricultural characteristics in common bean and, given the leaf library source of ESTs evaluated in this study, it would be possible to begin with the genetic mapping of several genes important in photosynthetic efficiency. SNP marker development, however, will depend on the establishment of technology and experimental protocols that allow their routine use in plant breeding programs (Morales et al., 2004).

Because of our overriding interests in bean root nodule development and function, this project was initiated to focus on global characterization of bean nodule transcripts. This priority is evidenced by our in-depth analysis of bean nodule metabolism. As the Phaseomics consortium coalesced and defined its goals, it became apparent that the bean community needed EST profiles of additional bean organs. Thus, we sequenced ESTs from pods, leaves, and P-deficient roots. Future reports will concentrate on more detailed characterization of and research with ESTs from other bean organs and development of SNP-based molecular markers from the current set of EST sequences.

MATERIALS AND METHODS

Plant Material

Two genotypes of common bean (Phaseolus vulgaris) were used for library construction. The first was the Mesoamerican cultivar Negro Jamapa 81, plants of which were grown in greenhouses at Centro de Investigacion sobre Fijacion de Nitrogeno (CIFN)/Universidad Nacional Autonoma de Mexico (Cuernavaca, Mexico) and at University of Minnesota (St. Paul), as previously reported (Ortega et al., 1992). Negro Jamapa 81 is a black-seeded variety that was selected by F. Cardenas at the Experimental Station in Cotaxtla, Veracruz, from a landrace collection. Plants of Negro Jamapa were inoculated with Rhizobium tropici CIAT 899 and watered with N-free nutrient solution, as reported by Silvente et al. (2003); mature nodules were collected 15 d postinoculation (dpi). Leaves were collected from inoculated plants 15 dpi; plants were at a vegetative developmental stage, prior to flowering. Pods at different stages of development were collected from inoculated plants. For P-deficiency stress conditions, seedlings of Negro Jamapa were germinated for 3 d, the cotyledons were cut, and the plantlets were watered for 3 weeks with nutrient solution deprived of P, by which time they showed evident symptoms of P deprivation. The second genotype used was the Andean cultivar G19833, which was grown in a greenhouse at CIAT in Cali, Colombia, under a 12-h photoperiod, average relative humidity of 74.7%, and night/day temperatures of 20°C/28°C. The cultivar G19833 is a landrace from Peru, with yellow and black mottled seed, that is one parent of a genetic-mapping population that has been used at CIAT to map microsatellite markers (Blair et al., 2003) and to study low-P adaptation quantitative trait loci (Liao et al., 2004; Yan et al., 2005).

Preparation of cDNA Libraries

A total of 5 cDNA libraries were made, 4 from Negro Jamapa 81 and 1 from G19833. In the case of Negro Jamapa 81, total RNA was isolated from different plant organs: (1) young (1.5–5 cm) and mature (15 cm) pods from inoculated plants; (2) leaves from 15-d-old nodulated plants; (3) roots from P-deficient plants; and (4) mature effective nodules harvested after 15 dpi with R. tropici CIAT 899. For all the libraries made from Negro Jamapa 81, poly(A+) RNA was obtained from total RNA using oligo(dT) cellulose. The poly(A+) RNA used for the pod library was obtained from total RNA combined from young and mature pods in a 1:1 (w/v) ratio. Conversion of polyadenylated RNA to cDNA was performed in the phage Uni-ZAP XR with a Stratagene (La Jolla, CA) synthesis and cloning kit. The cDNA synthesis of poly(A+) mRNA was primed by oligo(dT)-XhoI adapter primer with MMLV reverse transcriptase, while the second strand was synthesized via polymerase I ribonuclease H coincubation. EcoRI adapter was added to the blunted double-stranded cDNA followed by XhoI digestion. Recovered cDNA was directionally cloned into the EcoRI-XhoI Uni-ZAP XR vector, according to the manufacturer’s instructions. The cDNA from all libraries was size selected via Sephacryl S-500 spin columns as part of the procedure described by the manufacturer (Stratagene).

The fifth cDNA library, made for the genotype G19833, was prepared from total RNA isolated from leaves and vegetative meristems of 3-week-old plants. For this library, poly(A+) RNA was purified and reverse transcribed, and cDNAs were directionally cloned into the NotI/SalI sites of the pCMVSport6.0 vector (Invitrogen, Carlsbad, CA).

Generation of ESTs

For conversion of the 4 Negro Jamapa 81 cDNA phage libraries (ZAP XR vector) into the plasmid form (pBluescript), mass excision was performed, according to the procedure described by the manufacturer (Stratagene). Single colonies of Escherichia coli strain SOLR carrying the excised phagemid were replicated, and glycerol stocks were stored in microtiter plates at −80°C. Plasmid DNA from the nodule cDNA library was isolated using the QIAprep 96 Turbo Miniprep kit, according to the manufacturer’s instructions (Qiagen, Valencia, CA). Plasmid DNA from the other three libraries was isolated by a modified alkaline lysis method. Sequencing of the plasmid cDNA was performed by the Advanced Genetic Analysis Center (St. Paul) for the pod, root, and nodule libraries and at the CCG (Cuernavaca, Mexico) for the leaf library. Standard T3 sequencing primer was used for 5′ single-stranded sequencing. For the G19833 library, the clones were transformed into E. coli EMDH12S cells, which were plated on Q plates with carbenicillin (100 mg L⁻¹). A Q-Bot was used to pick and array colonies into plates and filters. Plasmid DNA was isolated using a modified alkaline lysis method and the individual cDNAs were sequenced either from the 5′ end, using an SP6 primer, or from the 3′ end with a T7 primer at the Clemson University Genomics Institute (Clemson, SC) and at CIAT.

EST Processing and Contig Assembly

Common bean EST sequences were analyzed using a processing pipeline developed by the Center for Computational Genomics and Bioinformatics (CCGB) at the University of Minnesota (Lamblin et al., 2003). Sequence base calls were made using Phred (Ewing et al., 1998) with a quality cutoff of 10. Vector filtering was performed using the CCGB program gstvf4 (Lamblin et al., 2003). Processed ESTs 100 bases or longer were assembled into contigs using Phrap (http://www.phrap.org/phredphrap/phrap.html) with a minimum match of 50 and a minimum score of 100. Once contig assembly was completed, visualization software developed by CCGB was used to assess contig quality. Contigs were individually inspected for low-quality sequence, chimeras, and splice variants. Following the trimming of low-quality sequence and the removal of chimeras and splice variants, the ESTs were reassembled. This procedure was repeated three times or until no new ESTs were added to contigs. The final assembly constituted the common bean gene index.
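
To make the quality cutoff concrete: a Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so a cutoff of 10 admits bases with an estimated error rate of up to 10%. The Python sketch below illustrates that relation together with the 100-base length filter. The function names and example reads are hypothetical and are not taken from the CCGB pipeline.

```python
MIN_EST_LENGTH = 100  # processed ESTs shorter than this were not assembled
PHRED_CUTOFF = 10     # quality cutoff reported for base calling


def phred_error_probability(q: float) -> float:
    """Estimated probability that a base with Phred score q is miscalled."""
    return 10 ** (-q / 10)


def long_enough(ests: dict, min_len: int = MIN_EST_LENGTH) -> dict:
    """Keep only ESTs whose processed sequence has at least min_len bases."""
    return {name: seq for name, seq in ests.items() if len(seq) >= min_len}


# Hypothetical processed reads, for illustration only.
reads = {"est_a": "ACGT" * 30, "est_b": "ACGT" * 10}
kept = long_enough(reads)
print(phred_error_probability(PHRED_CUTOFF), sorted(kept))
```

Here `est_a` (120 bases) passes the length filter and `est_b` (40 bases) does not.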

BLAST Analyses

BLASTX (Altschul et al., 1997) comparisons against the Uniref 100 protein database (August, 2004; Apweiler et al., 2004) were used to assign putative function to common bean contigs and singletons. In addition, TBLASTX was used to compare the common bean sequences to a database of legume sequences. This database included the Lotus japonicus, Medicago truncatula, and Glycine max/soja gene indices from The Institute for Genomic Research (TIGR; Quackenbush et al., 2001) and all publicly available sequences from the genera Arachis, Lupinus, Phaseolus, Robinia, and Pisum (available from the NCBI taxonomy browser [http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html]). For both BLAST searches, an E-value cutoff of 10⁻⁴ was used and Perl scripts were used to parse the results.

Contigs with no or Phaseolus-only BLASTX (Altschul et al., 1997) hits (E < 10⁻⁴) to the Uniref 100 protein database and the database of legume sequences were identified as candidate Phaseolus-specific contigs. For further verification, TBLASTX was used to compare these contigs to GenBank’s EST_others database (August, 2004) and the Arabidopsis (Arabidopsis thaliana) genome (The Arabidopsis Information Resource [TAIR] version 5). Contigs with no hits more significant than 10⁻⁴ to non-Phaseolus sequences in these databases were considered Phaseolus specific.
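
The selection logic for candidate Phaseolus-specific contigs can be sketched as follows. The original parsing was done with Perl scripts that are not shown here, so this Python version is only an illustration: it assumes the BLAST reports have already been reduced to (query, subject genus, E-value) tuples, and all identifiers and example hits are hypothetical.

```python
E_VALUE_CUTOFF = 1e-4  # cutoff reported for the BLASTX/TBLASTX searches


def candidate_specific(contigs, hits, genus="Phaseolus", cutoff=E_VALUE_CUTOFF):
    """Return contigs that have no significant hit outside `genus`.

    `hits` is an iterable of (query_id, subject_genus, e_value) tuples,
    a simplified stand-in for parsed BLAST reports.
    """
    has_foreign_hit = {
        query
        for query, subject_genus, e_value in hits
        if e_value < cutoff and subject_genus != genus
    }
    return sorted(c for c in contigs if c not in has_foreign_hit)


contigs = ["contig_1", "contig_2", "contig_3"]
hits = [
    ("contig_1", "Glycine", 1e-30),    # significant non-Phaseolus hit
    ("contig_2", "Phaseolus", 1e-50),  # Phaseolus-only hit
    ("contig_3", "Medicago", 0.5),     # too weak to count as a hit
]
print(candidate_specific(contigs, hits))
```

Contigs with no hits at all and contigs with Phaseolus-only hits are both retained, matching the "no or Phaseolus-only" criterion described above.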

To allow comparisons between EST projects, the nodule-specific M. truncatula EST contigs identified by Fedorova et al. (2002) were compared to all common bean sequences. For this analysis, the program TBLASTX (Altschul et al., 1997) was used with an E-value cutoff of 10⁻¹².

RNA Gel Blots

For northern analysis, RNA was extracted from 0.2 g of frozen nodule, root, stem, and leaf using an RNA extraction kit (BIO-101, Irvine, CA). The RNA (10 μg) was denatured in 50% formamide, 17% formaldehyde, and 10% MOPS buffer (200 mM MOPS, pH 7.0, 50 mM Na-acetate, and 1 mM EDTA) at 65°C for 5 min. Twenty micrograms of total RNA were separated on 1.2% agarose gel containing 2.2 M formaldehyde in MOPS buffer and transferred to positively charged nylon membranes (Hybond-N+; Amersham, Buckinghamshire, UK) by downward capillary transfer in 20× SSC. After a 30-min prehybridization (300 mM Na2HPO4, pH 7.2, 7% SDS), the blot was hybridized for 24 h at 65°C with [32P]-labeled specific probes. After stringent washing, radioactive membranes were exposed to x-ray film (Kodak, Rochester, NY) overnight at −70°C. Three repetitions were done for each probe and similar results were obtained. The blots shown are representative of the three repetitions.

Nylon Filter Arrays

The cDNA portion of each nodule EST was amplified by PCR, using standard T3 and T7 primers. Before spotting, the quality of each PCR product was evaluated by gel electrophoresis. The PCR products were spotted in replicate onto Gene Screen Plus membranes (NEN Life Science Products, Boston) using the Q-bot (Genetix, Boston) automated spotting system with a 96-pin gravity gridding head with 0.4-mm pin diameter.

Total RNA was isolated from mature nodules elicited by R. tropici CIAT 899 and nodule-deprived roots, leaves, and stems from inoculated Negro Jamapa 81 bean plants at 18 dpi. Pod RNA was obtained from a mixture of young developing and mature pods taken from two independent sources. In two independent experiments, RNA was isolated from the organs of plants grown under similar conditions. Total RNA was also isolated from P-deficient roots. Radiolabeled cDNA probes were synthesized by reverse transcription of 30 μg of total RNA for 1 h in the presence of 50 μCi γ-[32P]dATP using SuperScriptII reverse transcriptase, according to the manufacturer’s instructions (Stratagene), at 42°C. To complete cDNA synthesis, the reaction was prolonged for 30 min with 1 μL of 5 mM cold ATP. Unincorporated γ-[32P]dATP was removed by purification with a Sephadex G50 column and labeling efficiency was measured by scintillation counting. The final concentration of each probe was adjusted to 10⁶ cpm mL⁻¹ hybridization solution. Hybridizations were performed in 50% (w/v) formamide, 0.5 M Na2HPO4, 0.25 M NaCl, 7% (w/v) SDS, and 1 mM EDTA at 42°C. Blots were washed with three successive washes (2× SSC/0.1% SDS; 0.5× SSC/0.1% SDS; 0.1× SSC/0.1% SDS) at 42°C in 200 mL of wash buffer. Four to seven independent nylon filter arrays were hybridized with cDNA from each organ.

Data Analysis of Nylon Filter Arrays

Radioactivity of each spot was quantified using a Phosphor Screen imaging system (Molecular Dynamics, Sunnyvale, CA). The signal intensity of each spot was determined automatically using the software Array-Pro Analyzer (Media Cybernetics, Carlsbad, CA). This program allows the normalization of quantified signals against the background. The normalized intensities were reported in Excel (Microsoft, Redmond, WA) files and linked to the corresponding cDNA clone. In order to work with highly reproducible experiments, linear regression analysis was performed for each pair of membrane replicas; only those replicas for which the linear model could explain at least 80% of the variation (determination coefficient r² ≥ 0.8) were further taken into consideration. This process yielded a total of 3, 3, 2, 2, and 2 well-correlated replicas for nodule, root, leaf, stem, and pod, respectively.
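
The replicate filter can be illustrated with a short Python sketch. For a simple linear regression of one replica on another, r² equals the squared Pearson correlation, which is what the code computes. The replica names and intensities are hypothetical, and how the authors went from pairwise r² values to the final replica sets is not stated, so the sketch only reports which pairs meet the cutoff.

```python
from itertools import combinations

R2_CUTOFF = 0.8  # minimum determination coefficient reported in the text


def r_squared(x, y):
    """Determination coefficient of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def well_correlated_pairs(replicas, cutoff=R2_CUTOFF):
    """Return the replica name pairs whose spot intensities reach the cutoff."""
    return [
        (a, b)
        for a, b in combinations(sorted(replicas), 2)
        if r_squared(replicas[a], replicas[b]) >= cutoff
    ]


# Hypothetical spot intensities for three membrane replicas.
replicas = {
    "rep1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rep2": [1.1, 2.1, 2.9, 4.2, 4.9],
    "rep3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(well_correlated_pairs(replicas))
```

In this toy data only `rep1` and `rep2` agree closely enough to be kept; `rep3` would be discarded.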

Genes were considered as reliably expressed if they showed intensity/background ratios greater than 1.5 through all related parallel hybridizations. A final gene set was obtained by joining the genes expressed in each organ and removing all duplications. Single expression values per organ were then calculated as the gene average expression in the sets of correlated replicas. Given that the expression differences between any two organs follow a bell-shaped distribution (data not shown), the t test for paired observations was applied to determine whether genes show significantly different expression values from organ to organ. Nevertheless, we also applied the nonparametric Wilcoxon signed-rank test for matched pairs, which does not rely upon the assumption of normality. Both tests strongly supported the hypothesis of differential expression (P < 0.001).
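
The expression filter and per-organ averaging described above might look like the following in Python. This is a simplified sketch on hypothetical data: each organ maps gene names to one intensity/background ratio per correlated replica, and the same ratios are averaged as the expression value, which is a simplification of the published procedure.

```python
RATIO_CUTOFF = 1.5  # intensity/background threshold reported in the text


def reliably_expressed(ratios_by_gene, cutoff=RATIO_CUTOFF):
    """Genes whose ratio exceeds the cutoff in every replica of one organ."""
    return {
        gene
        for gene, ratios in ratios_by_gene.items()
        if all(r > cutoff for r in ratios)
    }


def mean_expression(values_by_gene, genes):
    """Average each selected gene's values across the organ's replicas."""
    return {g: sum(values_by_gene[g]) / len(values_by_gene[g]) for g in sorted(genes)}


# Hypothetical ratios: three nodule replicas and two root replicas.
nodule = {"geneA": [3.0, 2.5, 4.0], "geneB": [1.2, 2.0, 1.8]}
root = {"geneA": [1.0, 1.1], "geneB": [2.0, 3.0]}

# Union of the per-organ sets; using a set removes duplicates automatically.
final_set = reliably_expressed(nodule) | reliably_expressed(root)
print(sorted(final_set), mean_expression(root, final_set))
```

A gene such as `geneB` that fails the threshold in nodules still enters the final set because it passes in roots, so it receives an averaged value in every organ.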

The housekeeping gene polyubiquitin (EST NOD_206_B07) served as an internal normalization control for calculating expression ratios between pairs of organs. The signal intensity value of each gene was divided by the signal value of the polyubiquitin EST in the respective organ. Normalized expression ratios were estimated by dividing the polyubiquitin-normalized signal intensities in nodules by the polyubiquitin-normalized signal intensities in the other organs. Original signal intensities and transformed data of all experiments are available from our Web site (http://www.ccg.unam/phaseolusest/Data_download.htm; see also supplemental data online).
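
As a worked illustration of this normalization, the Python sketch below divides each gene's signal by the polyubiquitin signal within an organ and then takes the nodule-to-organ ratio. Only the control EST name comes from the text; the function names, the other gene name, and all intensity values are hypothetical.

```python
CONTROL_EST = "NOD_206_B07"  # polyubiquitin EST used as the internal control


def normalize_to_control(intensities, control_id=CONTROL_EST):
    """Divide every signal in one organ by that organ's control signal."""
    control = intensities[control_id]
    return {gene: value / control for gene, value in intensities.items()}


def expression_ratios(nodule, other, control_id=CONTROL_EST):
    """Control-normalized nodule signal divided by the other organ's."""
    nod_norm = normalize_to_control(nodule, control_id)
    other_norm = normalize_to_control(other, control_id)
    return {
        gene: nod_norm[gene] / other_norm[gene]
        for gene in nod_norm
        if gene in other_norm and gene != control_id
    }


# Hypothetical averaged signal intensities for two organs.
nodule = {"NOD_206_B07": 200.0, "est_x": 800.0}
root = {"NOD_206_B07": 100.0, "est_x": 50.0}
print(expression_ratios(nodule, root))
```

With these numbers `est_x` is 4.0 times the control in nodules and 0.5 times the control in roots, giving a normalized nodule/root ratio of 8.0.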

Identification of Gene Families Using Single-Linkage Clustering

In order to identify gene families, the common bean contigs and singletons were combined into a single dataset. TBLASTX (E-value cutoff of 10⁻¹²) was used to compare the dataset against itself. As described by Graham et al. (2004), any sequences with at least one sequence in common in their BLAST reports were combined into a putative gene family.
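
Single-linkage clustering of this kind amounts to finding connected components in a graph whose edges are significant BLAST hits. The Python sketch below uses a union-find structure for that purpose. It is an illustration rather than the implementation of Graham et al. (2004): the sequence names are hypothetical, and `hit_pairs` is assumed to contain only pairs that already passed the E-value cutoff.

```python
def single_linkage_families(sequences, hit_pairs):
    """Group sequences connected by any chain of pairwise hits."""
    parent = {s: s for s in sequences}

    def find(s):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in hit_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for s in sequences:
        clusters.setdefault(find(s), []).append(s)
    # A family needs at least two members; lone sequences are dropped.
    return sorted(sorted(m) for m in clusters.values() if len(m) > 1)


sequences = ["contig_1", "contig_2", "contig_3", "single_1", "single_2"]
hit_pairs = [("contig_1", "contig_2"), ("contig_2", "single_1")]
print(single_linkage_families(sequences, hit_pairs))
```

Note that `contig_1` and `single_1` end up in the same family even though they never hit each other directly, because both are linked through `contig_2`. That chaining is the defining property of single linkage.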

Identification of SNPs

The ace file output of Phrap was used as input to the PolyBayes SNP detection program along with the base quality values assigned by Phred for each of the contigged sequences. Perl scripts were used to parse the PolyBayes output file and identify SNPs in two categories. High-probability SNPs had SNP probability values >0.5 and the specific SNP was found in two EST sequences from each genotype. Lower probability SNPs had SNP probability values >0.5 and the SNP was found in one EST from one genotype and at least two ESTs from the other. Perl scripts were used to identify and store 50 bp of sequence on either side of the SNP.
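
The two SNP categories and the flanking-sequence step can be sketched in Python as shown below. The original work used Perl scripts on PolyBayes output, neither of which is reproduced here. The function names and inputs are hypothetical, and "two EST sequences from each genotype" is read as "at least two", following the wording used in the discussion.

```python
PROBABILITY_CUTOFF = 0.5  # SNP probability threshold reported in the text
FLANK = 50                # bases stored on either side of each SNP


def classify_snp(probability, support_a, support_b):
    """Return 'high', 'lower', or None for a candidate SNP.

    support_a and support_b are the numbers of ESTs from each genotype
    that carry that genotype's allele.
    """
    if probability <= PROBABILITY_CUTOFF:
        return None
    low, high = sorted((support_a, support_b))
    if low >= 2:
        return "high"
    if low == 1 and high >= 2:
        return "lower"
    return None


def flanking(sequence, position, flank=FLANK):
    """Return up to `flank` bases left and right of a 0-based SNP position."""
    left = sequence[max(0, position - flank):position]
    right = sequence[position + 1:position + 1 + flank]
    return left, right


# Hypothetical contig consensus with a SNP at index 60.
consensus = "A" * 60 + "G" + "T" * 60
print(classify_snp(0.9, 2, 3), classify_snp(0.9, 1, 2), flanking(consensus, 60)[0][:5])
```

A SNP supported by a single EST in each genotype falls into neither category and is discarded, as is any SNP at or below the probability threshold.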

Upon request, all novel materials described in this publication will be made available in a timely manner for noncommercial research purposes, subject to the requisite permission from any third-party owners of all or parts of the material. Obtaining any permission will be the responsibility of the requester.

Sequence data from this article have been deposited with the EMBL/GenBank data libraries under accession numbers CV528971 through CV544303.

ACKNOWLEDGMENTS

We acknowledge the technical assistance provided by Mike Atkins, Mike Palmer, and Jeff Tomkins at the Clemson University Genomics Institute and help from Monica C. Munoz, Eliana Gaitan, and Joe Tohme at CIAT. We also gratefully acknowledge Guillermo Davila and Rosa I. Santamaria for providing the facility and technical assistance for DNA sequencing at CCG, Universidad Nacional Autonoma de Mexico, and Eric Verdorn for assistance in bioinformatics at the University of Minnesota.

Received October 20, 2004; returned for revision January 21, 2005; accepted January 30, 2005.

LITERATURE CITED

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–3402

Appels MA, Haaker H (1991) Glutamate oxalacetate transaminase in pea root nodules. Plant Physiol 95: 740–747

Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, et al (2004) UniProt: the universal protein knowledgebase. Nucleic Acids Res 32: D115–D119

Blair MW, Pedraza F, Buendia HF, Gaitan-Solís E, Beebe SE, Gepts P, Tohme J (2003) Development of a genome-wide anchored microsatellite map for common bean (Phaseolus vulgaris L.). Theor Appl Genet 107: 1362–1374

Broughton WJ, Hernandez G, Blair M, Beebe S, Gepts P, Vanderleyden J (2003) Beans (Phaseolus spp.)—model food legume. Plant Soil 252: 55–128

Colebatch G, Desbrosses G, Ott T, Krusell L, Montanari O, Kloska S, Kopka J, Udvardi MK (2004) Global changes in transcription orchestrate metabolic differentiation during symbiotic nitrogen fixation in Lotus japonicus. Plant J 39: 487–512

Colebatch G, Sebastian K, Ben T, Susanne F, Thomas A, Udvardi MK (2002) Novel aspects of symbiotic nitrogen fixation uncovered by transcript profiling with cDNA arrays. Mol Plant Microbe Interact 15: 411–420

Cook DR (1999) Medicago truncatula a model in the making. Curr Opin Plant Biol 2: 301–304

Deroche ME, Carrayol E (1988) Nodule phosphoenolpyruvate carboxylase: a review. Physiol Plant 74: 775–782

Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res 8: 175–185

Fedorova M, Van De Mortel J, Matsumoto PA, Cho J, Town CD, VandenBosch KA, Gantt JS, Vance CP (2002) Genome-wide identification of nodule-specific transcripts in the model legume Medicago truncatula. Plant Physiol 130: 519–537

Food and Agriculture Organization of the United Nations (2001) FAOSTAT Agriculture Data. http://www.fao.org/Statistics

Gantt JS, Larson RJ, Farnham MW, Pathirana SM, Miller SS, Vance CP (1992) Aspartate aminotransferase in effective and ineffective alfalfa nodules. Plant Physiol 98: 868–878


Gepts P (1998) Origin and evolution of common bean: past events and recent trends. Hort Sci 33: 1124–1130

Graham MA, Silverstein KAT, Cannon SB, VandenBosch KA (2004) Computational identification and characterization of novel genes from legumes. Plant Physiol 135: 1179–1197

Graham PH, Vance CP (2003) Legumes: importance and constraints to greater use. Plant Physiol 131: 872–877

Gyorgyey J, Vaubert D, Jimenez-Zurdo JI, Charon C, Troussard L, Kondorosi A, Kondorosi E (2000) Analysis of Medicago truncatula nodule expressed sequence tags. Mol Plant Microbe Interact 13: 62–71

Handberg K, Stougaard J (1992) Lotus japonicus, an autogamous, diploid legume species for classical and molecular genetics. Plant J 2: 487–496

Journet EP, van Tuinen D, Gouzy J, Crespeau H, Carreau V, Farmer MJ, Niebel A, Schiex T, Jaillon O, Chatagnier O, et al (2002) Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis. Nucleic Acids Res 30: 5579–5592

Lamblin AF, Crow JA, Johnson JE, Silverstein KA, Kunau TM, Kilian A, Benz D, Stromvik M, Endre G, VandenBosch KA, et al (2003) MtDB: a database for personalized data mining of the model legume Medicago truncatula transcriptome. Nucleic Acids Res 31: 196–201

Lara M, Porta H, Padilla J, Folch J, Sanchez F (1984) Heterogeneity of glutamine synthetase polypeptides in Phaseolus vulgaris L. Plant Physiol 76: 1019–1023

Lee HL, Hur CG, Oh CJ, Kim HB, Park SY, An CS (2004) Analysis of the root nodule-enhanced transcriptome in soybean. Mol Cells 18: 53–62

Liao H, Yan X, Rubio G, Beebe SE, Blair MW, Lynch JP (2004) Basal root gravitropism and phosphorus acquisition efficiency in common bean. Funct Plant Biol 31: 959–970

McClean P, Kami J, Gepts P (2004) Genomic and genetic diversity in common bean. In RF Wilson, HT Stalker, EC Brummer, eds, Legume Crop Genomics. AOCS Press, Champaign, IL, pp 60–82

Morales M, Roig E, Monforte AJ, Arus P, Garcia-Mas J (2004) Single-nucleotide polymorphisms detected in expressed sequence tags of melon (Cucumis melo L.). Genome 47: 352–360

Ortega JL, Sanchez F, Soberon M, Lara M (1992) Regulation of nodule glutamine synthetase by CO2 levels in bean (Phaseolus vulgaris L.). Plant Physiol 98: 584–587

Quackenbush J, Cho J, Lee D, Liang F, Holt I, Karamycheva S, Parvizi B, Pertea G, Sultana R, White J (2001) The TIGR gene indices: analysis of gene transcript sequence in highly sampled eukaryotic species. Nucleic Acids Res 29: 159–164

Russell J, Booth A, Fuller J, Harrower B, Hedley P, Machray G, Powell W (2004) A comparison of sequence-based polymorphism and haplotype content in transcribed and anonymous regions of the barley genome. Genome 47: 389–398

Silvente S, Camas A, Lara M (2003) Molecular cloning of the cDNA encoding aspartate aminotransferase from bean root nodules and determination of its role in nodule nitrogen metabolism. J Exp Bot 54: 1545–1551

Temple SJ, Vance CP, Gantt JS (1998) Glutamate synthase and nitrogen assimilation. Trends Plant Sci 3: 51–56

Tenaillon MI, Sawkins MC, Long AD, Gaut RL, Doebley JF, Gaut BS (2001) Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc Natl Acad Sci USA 98: 9161–9166

Uhde-Stone C, Zinn KE, Ramírez-Yanez M, Li A, Vance CP, Allan DL (2003) Nylon filters array reveal different gene expression in proteoid roots of white lupin in response to phosphorus deficiency. Plant Physiol 131: 1064–1079

Yan X, Liao H, Beebe SE, Blair MW, Lynch JP (2005) Molecular mapping of QTLs associated with root hairs and acid exudation as related to phosphorus uptake in common bean. Plant Soil (in press)

Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK, Grimm DR, Hyatt SM, Fickus EW, Young ND, Cregan PB (2003) Single-nucleotide polymorphisms in soybean. Genetics 163: 1123–1134
