
Extensive Functional Diversification of the Populus Glutathione S-Transferase Supergene Family

Ting Lan,a,b,1 Zhi-Ling Yang,a,b,1 Xue Yang,a,b Yan-Jing Liu,a Xiao-Ru Wang,a,c and Qing-Yin Zeng a,2

a State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China

b Graduate School, Chinese Academy of Sciences, Beijing 100049, China

c Department of Ecology and Environmental Science, Umeå Plant Science Centre, Umeå University, SE-901 87 Umeå, Sweden

Identifying how genes and their functions evolve after duplication is central to understanding gene family radiation. In this

study, we systematically examined the functional diversification of the glutathione S-transferase (GST) gene family in

Populus trichocarpa by integrating phylogeny, expression, substrate specificity, and enzyme kinetic data. GSTs are

ubiquitous proteins in plants that play important roles in stress tolerance and detoxification metabolism. Genome

annotation identified 81 GST genes in Populus that were divided into eight classes with distinct divergence in their

evolutionary rate, gene structure, expression responses to abiotic stressors, and enzymatic properties of encoded proteins.

In addition, when all the functional parameters were examined, clear divergence was observed within tandem clusters and

between paralogous gene pairs, suggesting that subfunctionalization has taken place among duplicate genes. The two

domains of GST proteins appear to have evolved under differential selective pressures. The C-terminal domain seems to

have been subject to more relaxed functional constraints or divergent directional selection, which may have allowed rapid

changes in substrate specificity, affinity, and activity, while maintaining the primary function of the enzyme. Our findings

shed light on mechanisms that facilitate the retention of duplicate genes, which can result in a large gene family with a

broad substrate spectrum and a wide range of reactivity toward different substrates.

INTRODUCTION

In eukaryotes, most structural and regulatory genes are mem-

bers of gene families that vary in size and genomic organization.

In plant genomes, high frequencies of gene duplication are

observed due to the frequent occurrence of genomic segmental

duplications and polyploidization (Arabidopsis Genome Initiative,

2000; Tuskan et al., 2006). Duplication of individual genes,

chromosomal segments, or whole genomes has long been

thought to supply raw genetic material, allowing functional

divergence and rapid biological evolution (Ohno, 1970; Lynch

and Conery, 2000). Hence, understanding the genomic and

functional evolution of gene families is essential for understand-

ing the phenotypic diversification of organisms and their genetic

systems.

Various functional fates have been proposed for duplicated

genes (Ohno, 1970; Hughes, 1994; Force et al., 1999; Moore and

Purugganan, 2005), including (1) retention of the original gene

function; (2) loss of gene function by pseudogenization (non-

functionalization); (3) acquisition of a novel function through

neofunctionalization; or (4) partitioning of the ancestral gene

function by subfunctionalization. Theoretical and experimental

studies have advanced our understanding of the possible reten-

tion mechanisms of duplicate genes, but large amounts of

comparative biochemical or physiological data are required to

reconstruct the evolutionary steps that have resulted in the

functional diversification of gene families. Investigating the func-

tional divergence of a whole family of genes is technically

challenging even for model organisms. Hence, sufficient information

has been acquired for only a few gene families to date, and the

patterns of functional diversification and the genetic factors governing

the evolution of most classes of gene families in plants remain

largely unknown.

Glutathione S-transferases (GSTs; EC 2.5.1.18) are multifunc-

tional proteins encoded by a large gene family found in all cellular

organisms. In plants, GSTs are divided into seven classes: theta,

zeta, phi, tau, lambda, glutathione-dependent dehydroascor-

bate reductase (DHAR), and tetrachlorohydroquinone dehalo-

genase (Smith et al., 2004; Basantani and Srivastava, 2007).

Recently, based on structural similarities, the γ-subunit of the

eukaryotic translation elongation factor 1B (EF1Bγ) has also

come to be regarded as a member of the GST family (Jeppesen

et al., 2003; Oakley, 2005). In Arabidopsis thaliana, the GST gene

family consists of 53 members (Dixon et al., 2002). Detoxification

of xenobiotics is considered to be the main function of plant

GSTs, but other functions include protecting cells from a

wide range of biotic and abiotic stressors, including pathogen

attack, heavy metal toxins, oxidative stress, and UV radiation

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Qing-Yin Zeng ([email protected]).

1 These authors contributed equally to this work.

2 Address correspondence to [email protected]

[...] figures in this article are displayed in color online but in black and white in the print edition.

Online version contains Web-only data.

www.plantcell.org/cgi/doi/10.1105/tpc.109.070219

This article is a Plant Cell Advance Online Publication. The date of its first appearance online is the official date of publication. The article has been

edited and the authors have corrected proofs, but minor changes could be made before the final version is published. Posting this version online

reduces the time to publication by several weeks.

The Plant Cell Preview, www.aspb.org © 2009 American Society of Plant Biologists

(Kampranis et al., 2000; Loyall et al., 2000; Mueller et al., 2000;

Agrawal et al., 2002). Since they are enzymatic proteins, studies

of GST functional evolution can exploit the power afforded by

combining genomic structure, gene expression, and enzyme

biochemical analyses of GST variants in attempts to elucidate

the effects of molecular changes on gene function.

Analysis of the Populus trichocarpa genome suggests a whole-genome

duplication event occurred recently (in evolutionary

terms) on the stem lineage of the Salicaceae, ~60 to 65 million

years ago, in addition to another, much more ancient large-scale

duplication event shared by Populus and Arabidopsis (Tuskan

et al., 2006). The complex history of genome duplications and

chromosomal rearrangements in Populus provides an opportu-

nity to study the patterns of gene family expansion in the course

of genome evolution. In this study, we conducted a genome-

wide annotation of the GST gene family in Populus to identify the

genetic events responsible for its expansion and organization.

Functional diversification of the gene family was characterized

by examining the gene expression responses to abiotic stresses

and enzymatic properties of the encoded proteins. By combining

phylogenetic reconstruction and functional assays, we quanti-

tatively addressed the complex patterns of diversification of

the gene family at three levels: among GST classes within the

family, among members within tandem-arrayed GST clusters,

and between pairs of paralogous duplicate genes. The genome-

wide, multifaceted approach we employed provides new

insights into the mechanisms of gene family expansion and

functional evolution.

RESULTS

Large GST Gene Family in Populus

A total of 81 full-length genes encoding putative GST proteins

were identified in the P. trichocarpa genome (see Supplemental

Table 1 online), among which nine sequences were considered

to be putative pseudogenes based on the presence of a frame

shift disrupting the coding region or a stop codon occurring

prematurely, resulting in a truncated protein. After removing

these stop codons or revising the frame shifts by deleting one or

two nucleotides, these nine full-length sequences were included

in the phylogenetic and gene expression analyses. The predicted

proteins encoded by these 81 genes were initially classified

based on the National Center for Biotechnology Information’s

(NCBI) conserved domain analysis, which divided them into eight

classes. The tau and phi GSTs were the most numerous, being

represented by 58 and nine members, respectively. The lambda,

DHAR, and EF1Bγ GST classes were each represented by three

members, both the zeta and theta classes by two members, and

the tetrachlorohydroquinone dehalogenase class by just one

member.
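As a quick consistency check on the class sizes quoted above, the tally below sums the per-class counts and reports the share contributed by the two largest classes. The dictionary keys are illustrative labels chosen here (for example `EF1Bgamma` and `TCHQD`), not identifiers from the study; the counts are the ones stated in the text.

```python
# Class sizes as reported in the text (full-length genes only).
# Keys are illustrative labels, not identifiers used by the authors.
class_counts = {
    "tau": 58,
    "phi": 9,
    "lambda": 3,
    "DHAR": 3,
    "EF1Bgamma": 3,
    "zeta": 2,
    "theta": 2,
    "TCHQD": 1,  # tetrachlorohydroquinone dehalogenase
}

total = sum(class_counts.values())
tau_phi_share = (class_counts["tau"] + class_counts["phi"]) / total

print(f"classes: {len(class_counts)}, full-length genes: {total}")
print(f"tau + phi share of the family: {tau_phi_share:.3f}")
```

The tally reproduces the stated family size and shows that tau and phi together account for roughly four-fifths of the full-length genes.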

Phylogenetic relationships among the Populus GSTs were

reconstructed using a maximum likelihood (ML) procedure. On

the ML tree, the 81 GSTs were grouped into eight distinct groups

with high bootstrap support (Figure 1A). These groups corre-

sponded to the classes identified by domain structures. Striking

gene structure conservation was found within each GST class.

All 58 tau GST genes contain a single intron at a conserved

position, except for GSTU23, 47, and 48, all of which have an

insertion that disrupts their N-terminal domain, making them

appear to be pseudogenes (Figure 1C). Intron length varies from

76 bp in GSTU17 to 1106 bp in GSTU8. All of the nine phi GST

genes have a two-intron/three-exon structure (Figure 1C) with

exons of similar length and a highly conserved first intron

position, except for the putative pseudogene GSTF9, which

has a shorter sequence. In contrast with the tau and phi GSTs,

the gene structures of the other minor GST classes are more

variable (Figure 1C); members of the lambda and zeta classes

contain nine exons, while members of the theta and DHAR have

seven and six exons, respectively. The EF1Bγ GST genes consist

of two parts: a GST domain and an EF1Bγ domain. In their GST

domain, five introns were observed. The class-specific gene

structure further supports the subfamily designations among the

81 GSTs.

In addition to full-length GST genes, 50 partial GST fragments

(43 tau type, three phi type, two DHAR type, one theta type, and

one lambda type) were identified in the Populus genome (see

Supplemental Table 2 online); these were considered to be

pseudogenes. The length of these fragments ranged from 36 to

175 amino acid residues. Domain structure analysis identified

nine fragments that contained both partial N- and C-terminal

domain sequences; 19 had only a partial N-terminal domain and

22 had only a partial C-terminal domain. We were unable to

analyze the phylogenetic relationships of these fragments reli-

ably because of their small size. These short and seemingly

random remnants likely reflect more ancient pseudogenization

events.

Genomic Organization of the Populus GST Gene Family

The physical locations of 66 of the 81 full-length GSTs were

assigned to 15 of the 19 Populus chromosomes (Figure 2A), while

the other 15 were assigned to 14 as-of-yet unattributed scaffold

fragments (see Supplemental Table 1 online). Of the 50 GST

fragments, 32 were assigned to one of nine chromosomes and

the other 18 to 16 scaffolds (see Supplemental Table 2 online).

The distribution of the GST genes among the chromosomes

appears to be uneven: chromosomes 7, 9, 17, and 18 harbor no

GST genes or GST fragments, while relatively high densities of

GSTs were discovered in some locations on chromosomes 1, 8,

10, 11, 14, and 19, where GSTs are arranged in clusters (Figure

2A). These clusters consist mainly of tau and phi GSTs; 37 of the

full-length tau GSTs and 20 tau-type fragments are arranged in

six clusters (clusters I to V and VII) on five chromosomes (1, 8, 10,

11, and 19), and four phi GSTs are organized in one cluster

(cluster VI) on chromosome 14. Members of the minor GST

classes are sparsely distributed at single loci on different chro-

mosomes.

Previous analysis of the Populus genome has identified

paralogous segments created by the whole-genome duplication

event in the Salicaceae (salicoid duplication), ~60 to 65 million

years ago (Figure 2A) (Tuskan et al., 2006). The distributions of

GST genes relative to the duplicate genomic blocks are illus-

trated in Figure 2A. Of the 66 mapped GSTs, only 15 (clusters IV

and VI, GSTU22, 34, 35, 41, 45, GSTL2, and EF1Bγ1) are located


Figure 1. Phylogenetic Relationships among Populus GSTs, Their Expression Patterns, and Gene Structure.


outside any duplicate blocks. Four duplicate pairs (GSTU18/24,

GSTU46/26, DHAR2/3, and GSTT1/2) are each located in a pair

of paralogous blocks and can be considered to be direct results

of the duplication event (Figure 2A). Similarly, tau class cluster

pairs I/V and II/III correspond to paralogous blocks created by

the salicoid duplication. The other two smaller clusters, IV and

VII, lack corresponding duplicates (Figure 2A). In contrast with

cluster IV, which is located in a single copy genome block on

chromosome 11, cluster VII is located in a region that is thought

to have been duplicated. However, no GSTs were found in the

corresponding duplicate block, suggesting that the correspond-

ing homolog may have been deleted after the duplication event.

This observation corroborates the trend that the most abundant

gene losses in eukaryotes occur following whole-genome dupli-

cation (Lynch, 2007). In contrast with the tau class, the expansion

of the phi class seems not to have been affected by the salicoid

duplication event. The only phi cluster, containing GSTF4, 5, 6,

and 7 (cluster VI), is located in a nonduplicated region on

chromosome 14.

Among the seven clusters, cluster I is the largest, consisting of

11 full-length tau and eight tau-type fragments arranged in

tandem in a 119-kb region on chromosome 1. There are complex

gene orientations among members of this cluster (Figure 2B).

Cluster II, with four tau genes and four tau-type fragments, spans

a region of 40 kb on chromosome 8. Nine tau and seven tau-type

fragments are organized into cluster III, located in a 57-kb region

on chromosome 10. On chromosome 11, there are two tau

clusters (IV and V), one with four genes arranged in a head-to-

head tandem and the other with six genes arranged in the same

orientation in a nearby region. Cluster VII on chromosome 19 is

small and contains GSTU15, 16, and 17. The phi cluster (VI), with

four members tandemly arranged in the same orientation, spans

a 34-kb region on chromosome 14 (Figure 2B).

We attempted to reconstruct the expansion history in each

cluster by reconciliation of both the gene tree and the positions of

genes within clusters. The most parsimonious scenario for gene

duplication, loss, and rearrangement is presented in Figure 3.

The events that led to the expansion of the four large clusters (I, II,

III, and V) appear to have been complex, likely involving seg-

mental duplication followed by a series of tandem duplications

and rearrangements. Cluster I is located in a transposon-rich

region (Figure 2B); hence, transposable elements could have

driven gene duplications by inadvertently carrying copies of

genes during transposition events and/or by facilitating unequal

crossovers (Hancock, 2005). The large number of pseudogene

fragments in this cluster also makes it difficult to delimit the

detailed evolutionary steps leading to its present composition

and structure. The expansion history of cluster V, on the other

hand, is simpler and seems to have involved four rounds of

tandem duplications. Judging from the gene tree and the struc-

ture of cluster II and III, it is likely that a three-gene cluster was

duplicated by the whole-genome duplication event. After that,

four rounds of tandem duplications and a possible rearrange-

ment took place in cluster III, and three rounds of tandem

duplications probably occurred in cluster II (Figure 3A). The other

two smaller tau clusters, IV and VII, formed two well-supported

clades (Figure 3A). Each of these two clusters may have been created by

two rounds of localized duplication events. Phylogenetic relationships

among members of the phi cluster suggest an ancestral

copy gave rise to the progenitors of GSTF3/7 and GSTF4, 5, and

6. GSTF4, 5, and 6 formed an array with the same orientation

(Figure 2B) generated by tandem duplications (Figure 3B). The

grouping of GSTF3 and 7 into one well-supported branch sug-

gests that either a segmental duplication between the two loci or

a transposition or retroposition event occurred.

Molecular Evolution

GST proteins consist of two well-defined domains, the

N-terminal domain that binds the primary substrate glutathione

(GSH) and the C-terminal domain that binds the secondary

substrate (Edwards et al., 2000). To test for deviation in the

substitution pattern of the two domains, we partitioned the 58

tau GST sequences into N- and C-terminal regions (the linker

between the two domains was not included in this analysis).

The ratio of nonsynonymous versus synonymous substitutions

(ω = dN/dS) is an indicator of the history of selection acting on a

gene or gene region. Ratios significantly <1 are suggestive of

purifying selection, whereas ratios >1 suggest directional selec-

tion. A plot of dN/dS for the N- versus C-terminal domains is

shown in Figure 4A. The results suggest that both domains

have evolved mainly under the influence of purifying selection,

but the selective constraint on the C-terminal domain was more

relaxed than on the N-terminal domain (t tests, P < 0.001). This

pattern was also evident in the phi GST class (Figure 4B) (t tests,

P < 0.003).
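The reading of ω and the domain comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the ω values are made-up placeholders rather than data from the study, the comparison is assumed to be a paired t test (the text says only "t tests"), and `classify_omega` reports the direction of the ratio without testing whether it differs significantly from 1.

```python
import math
from statistics import mean, stdev

def classify_omega(omega, tol=1e-9):
    """Qualitative reading of a dN/dS ratio (no significance test)."""
    if omega < 1 - tol:
        return "purifying"
    if omega > 1 + tol:
        return "directional"
    return "neutral"

def paired_t(xs, ys):
    """Paired t statistic for xs - ys and its degrees of freedom (n - 1)."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical per-comparison omega values, NOT taken from the study.
n_domain = [0.10, 0.15, 0.12, 0.20, 0.18]
c_domain = [0.25, 0.30, 0.22, 0.41, 0.35]

t_stat, df = paired_t(c_domain, n_domain)
labels = {classify_omega(w) for w in n_domain + c_domain}

print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print(f"selection regimes observed: {sorted(labels)}")
```

With placeholder values like these, every ratio falls below 1 (purifying selection in both domains) while the positive t statistic reflects a consistently higher ω in the C-terminal domain, which is the pattern the text describes.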

To infer the influence of selection on the expansion of the tau

and phi classes, we estimated ω (= dN/dS) values for all branches

using ML codon models. Two assumptions were tested: the fixed

one-ratio model that assumes the same ω ratio for all branches

and the free-ratio model that assumes an independent ω for each

branch in the gene tree. The log-likelihood values under the one-ratio

model were ln L = −20,771.8458 and −6239.9762, with

estimates of ω = 0.27562 and 0.24328 for the tau and phi

GST classes, respectively. Under the free-ratio model, the log-likelihood

values were ln L = −20,613.6203 and −6217.2478 for

Figure 1. (continued).

An ML procedure using the JTT model and 1000 bootstrap replicates was used in the phylogeny reconstruction; putative pseudogenes are indicated

with asterisks; numbers at each node in the ML tree signify bootstrap values. GST genes designated as GSTU, F, T, Z, and L correspond to tau, phi,

theta, zeta, and lambda class GSTs, respectively. The unmasked and masked sequence alignments used to reconstruct the ML tree are available as

Supplemental Data Sets 2A and 2B online, respectively. In (B), the green box indicates positive detection of gene expression in the corresponding tissue

under normal growth conditions (NC) and following CDNB (CD), H2O2 (HO), and atrazine (AT) treatments. In (C), the GST N-terminal domain, C-terminal

domain, and EF1Bγ domain are highlighted by the blue, purple, and gray boxes, respectively, while introns are indicated as lines.


the tau and phi GST genes, respectively. Likelihood ratio tests

(LRTs) of the two assumptions indicated that the free-ratio model

was significantly more likely (P < 0.01) for both the tau and phi

GST classes, suggesting that selective pressure varied among

branches. The estimated ω values of all the branches of the tau

and phi GST gene trees were <1, suggesting that the tau and phi

GSTs have been under purifying selection.
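The likelihood ratio test between the one-ratio and free-ratio models can be reproduced from the log-likelihoods reported above (entered here as negative numbers). The sketch below makes one assumption that is not stated in the text: the gene trees are taken to be unrooted and fully bifurcating, so a tree of n sequences has 2n − 3 branches and the free-ratio model has 2n − 4 more parameters than the one-ratio model. The helper names are illustrative, and the chi-square tail uses a closed form that holds only for even degrees of freedom.

```python
import math

def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood gain of the richer model."""
    return 2.0 * (lnl_alt - lnl_null)

def free_ratio_df(n_seqs):
    # Assumption: unrooted bifurcating tree with 2n - 3 branches, so the
    # free-ratio model estimates (2n - 3) - 1 extra omega parameters.
    return (2 * n_seqs - 3) - 1

tau_stat = lrt_statistic(-20771.8458, -20613.6203)
phi_stat = lrt_statistic(-6239.9762, -6217.2478)
tau_p = chi2_sf_even(tau_stat, free_ratio_df(58))
phi_p = chi2_sf_even(phi_stat, free_ratio_df(9))

print(f"tau: 2*dlnL = {tau_stat:.3f}, df = {free_ratio_df(58)}, P = {tau_p:.2e}")
print(f"phi: 2*dlnL = {phi_stat:.3f}, df = {free_ratio_df(9)}, P = {phi_p:.2e}")
```

Under these assumptions both P values fall below 0.01, in line with the conclusion that selective pressure varies among branches. If the analyzed alignments contained a different number of sequences, the degrees of freedom would change accordingly.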

Within the tau class, the 58 members were divided into two

clades, each with high bootstrap support (see Supplemental

Figure 1 online). To determine if there was significant change in

Figure 2. Genomic Localization of Full-Length Populus GST Genes.

(A) Schematic diagram of chromosome-level reorganization of the most recent genome-wide duplication in Populus (adapted from Tuskan et al., 2006).

Regions that are assumed to correspond to homologous genome blocks are shaded gray and connected with lines. Paralogous GST genes and clusters

are indicated by dashed lines within the gray-filled trapezoids.

(B) Detailed distribution of the GST genes and pseudogene fragments in clusters I to VII.


Figure 3. Hypothetical Evolutionary Histories of the Populus Tau and Phi GSTs in Each Cluster.

The most parsimonious scenario for gene duplication, loss, and rearrangement was deduced by reconciliation of both the gene tree and the physical

positions of genes within clusters. The tree was reconstructed using an ML procedure with the JTT model and 1000 bootstrap replicates. Numbers at

each node in the ML tree signify bootstrap values. The unmasked and masked sequence alignments used to reconstruct the ML trees are available as

Supplemental Data Sets 3A, 3B, 4A, and 4B online. The letters T, W, and R in the schematic diagram of hypothetical origins of GST genes indicate

putative tandem duplication, whole-genome duplication, and rearrangements, respectively.


the rate of evolution between the two clades, a global clock

model and a local clock two-rate model, one for each clade,

were tested. The log likelihood values under the global clock

model and two-rate model were ln L = −21,062.213421 and ln

L = −21,058.829828, respectively. LRTs of the two assumptions

indicated that a global clock should be rejected by the two-rate

clock (P < 0.01), suggesting that evolutionary rate is different

between the two clades: clade II evolved 1.5 times faster than

clade I. To test if there were changes in selective pressure

between the two clades, a two-ratio model in which the two

clades were each assigned to a different ratio (Table 1) was

explored. LRTs indicated that the null model, where the two

clades have the same ratio, could not be rejected, thus suggest-

ing that selective pressure has been similar between the two

clades after the ancestral gene duplication.
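Both tests in this paragraph compare nested models that differ by a single parameter, so the test statistic can be referred to a chi-square distribution with one degree of freedom. The sketch below recomputes the clock test from the two log-likelihoods (entered as negative numbers) and evaluates the two-ratio branch models from the χ² values listed in Table 1. The 0.05 threshold used for the branch models is an assumption made here; the text does not state the significance level.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Global clock (null) versus local two-rate clock (alternative).
lnl_global_clock = -21062.213421
lnl_two_rate = -21058.829828
clock_stat = 2.0 * (lnl_two_rate - lnl_global_clock)
clock_p = chi2_sf_df1(clock_stat)

# Chi-square statistics of the two-ratio branch models, as listed in Table 1.
two_ratio_stats = (3.14, 1.18)
two_ratio_p = [chi2_sf_df1(s) for s in two_ratio_stats]

print(f"clock test: 2*dlnL = {clock_stat:.3f}, P = {clock_p:.4f}")
for stat, p in zip(two_ratio_stats, two_ratio_p):
    print(f"two-ratio test: chi2 = {stat:.2f}, P = {p:.3f}")
```

The clock statistic gives a P value just under 0.01, so the global clock is rejected, whereas neither two-ratio model improves significantly on the one-ratio model.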

The site model analyses indicated that log-likelihood values

were significantly higher under the M8 model than under the M7

model for the tau GST genes, but not the phi GST genes (Table 1).

In the tau class, 10 amino acids were identified by Bayes

Empirical Bayes analysis (Yang et al., 2005) as candidates for

positively selected sites (Table 1), including two with posterior

probabilities >0.95 (alignment positions 234 and 237 in Supple-

mental Figure 2 online). With one exception (alignment position

43 in Supplemental Figure 2 online), the amino acids were

located in the C-terminal domain of the protein.
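The M7 versus M8 comparison can be checked directly from the −ln L values in Table 1. M8 adds two parameters to M7 (a proportion and an ω for the extra site class), so the conventional reference distribution is chi-square with two degrees of freedom, whose upper tail is simply exp(−x/2). The dictionary layout and function names below are illustrative choices, not part of the original analysis.

```python
import math

# -ln L values for the site models, copied from Table 1.
neg_lnl = {
    "tau": {"M7": 18801.59, "M8": 18779.36},
    "phi": {"M7": 4754.71, "M8": 4754.71},
}

def lrt_from_neg_lnl(null, alt):
    """2 * (lnL_alt - lnL_null), written in terms of -ln L values."""
    return 2.0 * (null - alt)

def chi2_sf_df2(x):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)

m8_vs_m7 = {cls: lrt_from_neg_lnl(v["M7"], v["M8"]) for cls, v in neg_lnl.items()}
p_values = {cls: chi2_sf_df2(stat) for cls, stat in m8_vs_m7.items()}

for cls in neg_lnl:
    print(f"{cls}: chi2 = {m8_vs_m7[cls]:.2f}, P = {p_values[cls]:.2e}")
```

The tau statistic matches the 44.46 listed in Table 1 and is highly significant, while the phi models have identical likelihoods, so M8 offers no improvement for that class.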

In Arabidopsis, the GST gene family consists of 53 members,

of which 28 are tau class (Dixon et al., 2002). Joint analysis of tau

GSTs from Populus and Arabidopsis revealed three major

Populus-specific and Arabidopsis-specific clades, respectively

(Figure 5). This suggests that independent expansion of the tau

class has occurred in the two lineages and at least 51 novel tau

GSTs in Populus have been acquired, compared with 21 in

Arabidopsis, after the most recent common ancestor of Populus

and Arabidopsis 100 to 120 million years ago. Such a gene birth

rate is much higher than the average rate of one gene per 100

million years reported for eukaryotes (Lynch and Conery, 2000).

In addition, 43 tau-type pseudogene fragments were discovered

in Populus, in contrast with very few in Arabidopsis. These results

suggest that tau GSTs have been subject to faster rates of

duplication in Populus than in Arabidopsis. This high duplication

rate is partly attributable to the salicoid whole-genome duplica-

tion event followed by localized tandem duplications, since 69%

of the tau GST genes and 56% of the tau fragments are located in

the duplicated genome blocks. The existence of more fragments

and pseudogenes in Populus could be indicative of a lower rate

of complete deletion or divergence beyond the threshold for

detection of homology. Arabidopsis and close relatives with

small genomes have been suggested to possess as yet uniden-

tified mechanisms impeding amplification, and/or they have an

efficient mechanism of continuous removal of amplified se-

quences to counteract their proliferation (Lysak et al., 2009).
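The comparison of gene birth rates made in this paragraph can be put on a rough numerical footing. The calculation below is only a back-of-the-envelope sketch with strong assumptions introduced here: the number of ancestral tau genes is taken as the total minus the number of novel genes, gene loss is ignored, births are averaged linearly over the whole 100 to 120 million year interval, and the reference rate is read as one new gene per existing gene per 100 million years. None of these simplifications come from the original analysis.

```python
def births_per_gene_per_100my(total, novel, age_my):
    """Novel genes per ancestral gene, scaled to a 100-million-year interval.

    Assumes ancestral copies = total - novel and ignores gene loss.
    """
    ancestral = total - novel
    return (novel / ancestral) * (100.0 / age_my)

# (total tau GSTs, novel tau GSTs) as stated in the text.
species = {"Populus": (58, 51), "Arabidopsis": (28, 21)}
divergence_my = (120, 100)  # range given for the common ancestor

rates = {
    name: [births_per_gene_per_100my(total, novel, age) for age in divergence_my]
    for name, (total, novel) in species.items()
}

for name, (low, high) in rates.items():
    print(f"{name}: {low:.1f} to {high:.1f} births per gene per 100 My")
```

On this crude scale both lineages exceed a baseline of one birth per gene per 100 million years, and the Populus estimate is more than twice the Arabidopsis one.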

Structure Modeling

Protein three-dimensional structure modeling illustrated that, in

general, the structures of Populus tau and phi GSTs are con-

served, especially in the N-terminal domain (Figure 6). The

simulated structures of 55 tau and eight phi GSTs were super-

imposed to evaluate the goodness of fit of the overall topologies

(Figure 6). This examination showed that all members within each

class shared the same conformation of the structural elements of

α-helices and β-sheets, but structural modifications are present

in loop regions of the C-terminal domain and in the linker

between the N- and C-terminal domains, where the structural

conservation is relatively weak. The protein sequence differen-

tiation in these loop regions is high among members of both the

tau and phi classes (see Supplemental Figures 2 and 3 online)

due to either relaxed functional constraints or divergent direc-

tional selection. In fact, the majority of the putative positively

selected sites (7 out of 10) are in the loops of the C-terminal

domains (Figure 6).

Expression of GST Genes under Normal Growth Conditions

and Abiotic Stress

The expression patterns of GST gene family members were

examined by RT-PCR and quantitative RT-PCR under nor-

mal growth conditions and in response to stress treatments

(1-chloro-2,4-dinitrobenzene [CDNB], H2O2, and atrazine appli-

cations). Substantially more variation in expression patterns was

found among the tau class members than among members of

the other minor GST classes (Figure 1B). Of the 58 tau GSTs,

23 were expressed in all tissues under all growth conditions,

while nine (GSTU4, 5, 10, 23, 41, 47, 48, 53, and 57) were neither

expressed in any tissue nor in response to any treatment applied

in this study. Thus, these nine genes are expressed at subde-

tectable levels, or they are only induced in response to treat-

ments and/or in tissues not examined in our study, or they are

pseudogenes. The other 26 tau members were selectively ex-

pressed either in response to a specific treatment and/or in a

specific tissue. For instance, under normal growth conditions

GSTU6, 25, and 37 showed root-specific expression, while

under the stress treatments, they were expressed in all five

examined tissues. GSTU7, 9, 32, 33, 42, 43, and 44 expression

was not detected under normal growth conditions, but they were

expressed in response to treatment with CDNB, H2O2, or atra-

zine, indicating that they are expressed in stress responses. The

expression patterns of phi and other minor GST classes were

more homogeneous: except for GSTF3 and 9, all of these genes

Figure 4. dN/dS Plots for the C-Terminal Domain versus the N-Terminal

Domain of 58 Tau and Nine Phi GSTs.

(A) Tau GSTs.

(B) Phi GSTs.


were expressed in all tissues under all growth conditions (Fig-

ure 1B).

For each of the seven tandem-arrayed GST clusters, marked

expression divergence was found among the members. In clus-

ter I, which consists of 11 tau GSTs, three genes (GSTU1, 3, and

8) were expressed in all tissues, no transcripts of three others

(GSTU4, 5, and 10) were detected in any of the tested tissues,

and the other five (GSTU2, 6, 49, 7, and 9) showed restricted

tissue-specific expression patterns under normal and/or stress

conditions (Figure 1B). Furthermore, although GSTU1, 3, and 8

were expressed in all tested tissues under all conditions, their

expression levels in response to the stress treatments differed

significantly (Figure 7): GSTU1 was upregulated 15- to 38-fold in

shoots in response to all three treatments; GSTU8 expression

increased 77- and 105-fold in leaves in response to the H2O2 and

CDNB treatments, respectively; GSTU3 expression increased

25-fold in leaves in response to atrazine and 9- to 15-fold in

shoots in response to H2O2 and atrazine.

Similarly, the four phi GSTs (GSTF4, 5, 6, and 7) in cluster VI,

which were expressed in all tissues under all growth conditions,

differed significantly in expression levels in response to the

different treatments (Figure 7). In general, most of these genes

were upregulated in the tested tissues. For example, in leaves,

expression of GSTF4, 5, and 6 increased 69- to 141-fold in response

to CDNB, relative to the control, while GSTF7 expression

increased only fivefold. In response to H2O2 and atrazine treat-

ments, expression levels of these genes increased by 24- to

85-fold and 16- to 88-fold, respectively. In shoots, GSTF4

showed the strongest response to the treatments, while in roots,

GSTF6 expression changed only slightly, but GSTF5 and 7 were

upregulated 20- to 86-fold. These results indicate that rapid

divergence has occurred in the regulatory regions of the GST

genes in the same clusters.

For duplicate gene pairs, four categories of expression pattern

were observed. In the first category, found in 12 gene pairs, both

of the duplicates were expressed in all tissues under normal

growth and stress treatments (AA model in Table 2). In the

second category, occurring in five gene pairs, one copy was

expressed, while the other was not detected in any tissue type

under any growth conditions (AN model in Table 2), suggesting

that one duplicate gene may have become a pseudogene or

evolved a new function not identified in our study. In the third

category, found in seven gene pairs, one copy of each duplicate

pair was expressed in all tissues under all growth conditions,

Table 1. Summary Statistics for Detection of Selection Using Branch and Site Models of PAML

Model                     Estimates of Parameters (a)                   -ln L       χ2         P        Positively Selected Sites (b)

Site model

Tau GSTs
  M0                      ω = 0.28513                                   19,313.35
  M1a (nearly neutral)    p0 = 0.68910, p1 = 0.31090                    18,918.64
                          ω0 = 0.20084, ω1 = 1.00000
  M2a (positive           p0 = 0.68910, p1 = 0.15705, p2 = 0.15385      18,918.64   0
  selection)              ω0 = 0.20084, ω1 = 1.00000, ω2 = 1.00000
  M3 (discrete)           p0 = 0.32505, p1 = 0.46802, p2 = 0.20692      18,801.71   1,023.28   <0.001
                          ω0 = 0.08436, ω1 = 0.30996, ω2 = 0.91017
  M7 (β)                  p = 0.89944, q = 1.62442                      18,801.59
  M8 (β & ω)              p0 = 0.90574, p = 1.24556, q = 3.15446        18,779.36   44.46      <0.001   43, 114, 115, 116, 122, 123,
                          (p1 = 0.09426) ω = 1.28108                                                    233, 234*, 236, 237*

Phi GSTs
  M0                      ω = 0.22213                                   4,832.43
  M1a (nearly neutral)    p0 = 0.59667, p1 = 0.40333                    4,788.14
                          ω0 = 0.15845, ω1 = 1.00000
  M2a (positive           p0 = 0.59667, p1 = 0.23067, p2 = 0.17266      4,788.14    0
  selection)              ω0 = 0.15845, ω1 = 1.00000, ω2 = 1.00000
  M3 (discrete)           p0 = 0.16766, p1 = 0.41839, p2 = 0.41395      4,750.95    162.96     <0.001
                          ω0 = 0.00759, ω1 = 0.17187, ω2 = 0.48703
  M7 (β)                  p = 0.81835, q = 1.97496                      4,754.71
  M8 (β & ω)              p0 = 0.99999, p = 0.81836, q = 1.97502        4,754.71    0
                          (p1 = 0.00001) ω = 3.98485

Branch model
  One ratio               ω = 0.28513 for all branches                  19,313.35
  Two ratios              ω0 = 0.26811 for clade II                     19,311.78   3.14
                          ω1 = 0.30691 for clade I
  Two ratios              ω0 = 0.49690 for clade I                      19,312.76   1.18
                          ω1 = 0.28427 for clade II

* Posterior probability >95% of having ω > 1.
(a) The proportion of sites (p0, p1, etc.) estimated to have ω0, ω1, etc.
(b) The numbering of residues identified by Bayes Empirical Bayes analysis (Yang et al., 2005) corresponds to their alignment positions in Supplemental Figure 2 online. Clade I and clade II in the branch model refer to the two clades of the tau class as shown in Supplemental Figure 1.


while the other was expressed only following a specific treatment

in a specific tissue (AI model in Table 2). In the fourth category,

found in four gene pairs, both duplicates showed selective

expression under different tissue treatment combinations (SE

model in Table 2). Interestingly, duplicate gene pairs created by

the whole-genome duplication event or segmental duplication all

fell within the first category, with both of the duplicates showing a

similar expression pattern. Duplicate pairs created by tandem

duplication, on the other hand, spanned all four categories of

expression pattern (e.g., GSTU20/21, 9/4, 2/8, and 28/29).
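The relationship between duplication mode and expression category can be checked directly against Table 2. The following is a minimal sketch that tallies the expression categories (AA, AN, AI, SE) by duplication type (T, W, O); the list is transcribed from Table 2 in row order, and the names `PAIRS` and `tally` are illustrative only.

```python
from collections import defaultdict

# (duplication type, expression category) for gene pairs 1 to 28, transcribed from Table 2.
# T = tandem duplication, W = whole-genome duplication, O = other events.
PAIRS = [
    ("T", "AA"), ("T", "SE"), ("T", "AN"), ("T", "SE"), ("T", "AI"), ("T", "AI"), ("O", "AI"),
    ("T", "SE"), ("O", "AA"), ("O", "AN"), ("T", "AN"), ("T", "AN"), ("T", "SE"), ("T", "AA"),
    ("T", "AI"), ("O", "AA"), ("T", "AI"), ("W", "AA"), ("O", "AA"), ("T", "AI"), ("O", "AI"),
    ("W", "AA"), ("W", "AA"), ("W", "AA"), ("T", "AA"), ("O", "AN"), ("O", "AA"), ("O", "AA"),
]


def tally(pairs):
    """Count expression categories separately for each duplication type."""
    counts = defaultdict(lambda: defaultdict(int))
    for dup_type, category in pairs:
        counts[dup_type][category] += 1
    return {dup_type: dict(cats) for dup_type, cats in counts.items()}


counts = tally(PAIRS)
for dup_type in ("T", "W", "O"):
    print(dup_type, counts[dup_type])
```

In this tally the whole-genome duplicates fall only in the AA category, whereas the tandem duplicates occupy all four categories, including all four SE pairs.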

Substrate Specificity and Activity of GST Enzymes

Forty-four Populus GSTs with differing evolutionary relationships and histories, including 30 tau, seven phi, three DHAR, two lambda, one theta, and one EF1Bγ GST, were selected for protein expression and purification (Table 3). All of these GSTs were expressed as soluble proteins in Escherichia coli, except for three tau genes (GSTU7, 18, and 32), two lambda genes (GSTL1 and 2), and the GSTT2 gene, which were expressed as inclusion bodies.

The substrate specificity of the purified Populus GSTs was

investigated to identify catalytic activities that may be related

to their biological function, in assays with seven substrates:

CDNB, 7-chloro-4-nitrobenzo-2-oxa-1,3-diazole (NBD-Cl), 1,2-

dichloro-4-nitrobenzene (DCNB), 4-nitrobenzyl chloride (NBC),

ethacrynic acid (ECA), 4-nitrophenyl acetate (4-NPA), and dehy-

droascorbate (DHA). Of the 30 purified tau GSTs listed in Table 3,

26 showed specific activity toward CDNB, 25 toward NBD-Cl, 13

toward DCNB, 12 toward NBC, and three toward 4-NPA, in

accordance with the frequent use of CDNB and NBD-Cl for the

detection and determination of plant tau GST activity. Two tau GSTs had enzymatic activity toward five substrates, six toward four substrates, 12 toward three substrates, and 10 toward only one or two substrates. Of the seven purified phi

GSTs, all had activity toward CDNB and NBD-Cl, three toward

both NBC and ECA, and none toward either DCNB or 4-NPA.

The DHAR GSTs showed distinct diversification in enzyme

specificity from the other GST classes; they showed high activity

toward DHA, but no activity toward any of the other six substrates used in this study (Table 3). The GST domain in EF1Bγ1

showed GSH-conjugating activity with CDNB and DCNB. The

lambda and theta GSTs were almost inactive toward all the

substrates used in our study.

The enzyme assays revealed large variations in specific activ-

ities toward different substrates among the tandem-arrayed

Figure 5. Phylogenetic Relationships of the Populus and Arabidopsis

Tau GST Proteins.

Accession numbers for the Arabidopsis tau GSTs are presented in

Supplemental Table 6 online. Pinus taeda lambda GSTs (GenBank

accession numbers CV034086 and DR019281) and a Picea glauca

lambda GST (GenBank accession number EX306134) were used as

outgroups. The tree was reconstructed using an ML procedure with the

JTT model and 1000 bootstrap replicates. Numbers at each node in the

ML tree signify bootstrap values. The sequence alignments used to

reconstruct the ML tree are available as Supplemental Data Sets 5A and

5B online.

[See online article for color version of this figure.]


members in the clusters. Among the seven examined members

of cluster I, the tau GST enzyme activity toward CDNB and NBD-Cl varied from 0 to 12.9 μmol/min per mg and from 0 to 6.93 μmol/min per mg, respectively. Four members of this

cluster (GSTU1, 2, 8, and 9) each showed activity toward four

of the substrates, two members (GSTU3 and 4) showed activity

toward three substrates, and GSTU7 showed activity toward

only one substrate. GSTU8 and 9 had a similar substrate spec-

trum, but their specific activity toward each substrate varied

twofold to ninefold. Thus, diversification in enzyme specificity

and activity toward different substrates has apparently evolved

among GSTs in the tandem arrays. This pattern was also observed in tau clusters III and V and phi cluster VI. Among the phi GSTs, although all were reactive with CDNB and NBD-Cl, their enzymatic activities varied 24- and 38-fold, from 0.15 to 3.62 μmol/min per mg protein for CDNB and from 0.06 to 12.95 μmol/min per mg protein for NBD-Cl (Table 3).

For duplicate gene pairs, differentiation in enzyme specificity

could be categorized into three groups. In the first, the two

duplicates showed a similar substrate spectrum, but differed by up to 56-fold (GSTU18/24 toward CDNB) in their enzymatic activity toward each substrate (SS model in Table 2). In the second, which was the most frequently encountered, the two duplicates showed a partially overlapping substrate spectrum (PS model in Table 2),

suggesting that partial subfunctionalization occurred after gene

duplication. In the third, the two duplicates have evolved a

nonoverlapping substrate spectrum (NS model in Table 2),

indicating that significant functional diversification in enzyme

substrate specificity has occurred. Just one duplicate pair

(GSTU26/46) showed this last pattern; GSTU26 was only reac-

tive with CDNB, while GSTU46 was only reactive with ECA.
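The three substrate-spectrum categories can be expressed as a simple set comparison. The sketch below is illustrative only: it assumes that a "similar" spectrum (SS) means both duplicates have detectable activity toward exactly the same substrates, which is an interpretation rather than a definition stated in the text, and the function name `classify_spectrum` is hypothetical. The substrate sets are read from Table 3.

```python
def classify_spectrum(active_a, active_b):
    """Classify a duplicate pair by the substrates each enzyme is active toward.

    SS: same set of substrates (assumed reading of "similar spectrum")
    PS: partially overlapping sets
    NS: no substrate in common
    """
    a, b = set(active_a), set(active_b)
    if a == b:
        return "SS"
    if a & b:
        return "PS"
    return "NS"


# Substrates with detectable activity, read from Table 3.
GSTU2 = {"CDNB", "NBD-Cl", "DCNB", "NBC"}
GSTU8 = {"CDNB", "NBD-Cl", "DCNB", "NBC"}
GSTU4 = {"CDNB", "ECA", "4-NPA"}
GSTU9 = {"CDNB", "NBD-Cl", "DCNB", "NBC"}
GSTU26 = {"CDNB"}
GSTU46 = {"ECA"}

print(classify_spectrum(GSTU2, GSTU8))    # SS
print(classify_spectrum(GSTU4, GSTU9))    # PS
print(classify_spectrum(GSTU26, GSTU46))  # NS
```

A set-based comparison ignores the magnitude of activity, so the up to 56-fold differences seen within SS pairs would need to be assessed separately.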

Kinetics of the Conjugation Reaction

By definition, all GST enzymes catalyze reactions in which GSH is

conjugated to a second substrate. The catalytic properties of 29

selected GST enzymes examined here were characterized in

kinetic studies using CDNB, NBD-Cl, and DHA as the second

substrate for tau, phi, and DHAR GSTs, respectively, and the

resulting kinetic parameters are listed in Table 4. For all tau and

phi GSTs, except GSTU4, GSTU16, and GSTF4, the apparent

KmGSH values fell within the range of 0.1 to 0.8 mM, indicating that

they have similar, high affinities for GSH. This is in accordance

with the highly conserved nature of the N-terminal domain of the

GSTs (see Supplemental Figures 2 and 3 online), where the

enzyme conjugates GSH. The catalytic efficiency (kcat/Km) for

GSH, however, varied significantly among all GSTs (e.g.,

1849-fold among tau and 1227-fold among phi GSTs) (Table 4).

The C-terminal domain of GSTs, where the enzyme binds the second, hydrophobic substrate, is less conserved in plants (see Supplemental Figures 2 and 3 online). Accordingly, pronounced variations in substrate affinities (KmCDNB and KmNBD-Cl) and catalytic efficiency (kcat/Km) were observed among the tau GSTs (KmCDNB 0.15 to 123.6 mM and kcat/KmCDNB 0.46 to 4152.87 mM⁻¹ s⁻¹) and phi GSTs (KmNBD-Cl 0.31 to 2.10 mM, kcat/KmNBD-Cl 4.39 to 1361.25 mM⁻¹ s⁻¹), suggesting that their C-terminal domains have different conformations.

The tandem-arrayed GSTs varied in both their affinity and

catalytic efficiency toward different substrates. For instance,

among the six tau GSTs (GSTU1, 2, 3, 4, 8, and 9) examined in

cluster I, KmCDNB varied ~42-fold, and their catalytic efficiency (kcat/Km) for GSH and CDNB varied ~224-fold and 488-fold,

respectively. Similar diversification patterns in enzyme kinetic

parameters were observed for GSTU11, 12, and 14 in cluster III

and GSTU15, 16, and 17 in cluster VII and phi cluster VI.
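The fold differences quoted for cluster I follow from the kinetic constants in Table 4. The sketch below recomputes them as the ratio of the largest to the smallest value; the helper names `catalytic_efficiency` and `fold_range` are illustrative, and the values are copied from Table 4.

```python
def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat/Km (mM^-1 s^-1 when kcat is in s^-1 and Km in mM)."""
    return kcat / km


def fold_range(values):
    """Ratio of the largest to the smallest value in a collection."""
    values = list(values)
    return max(values) / min(values)


# Cluster I tau GSTs from Table 4:
# (Km for CDNB in mM, kcat/Km for GSH, kcat/Km for CDNB; efficiencies in mM^-1 s^-1)
CLUSTER_I = {
    "GSTU1": (0.21, 10.35, 10.76),
    "GSTU2": (0.22, 396.52, 1054.18),
    "GSTU3": (0.29, 509.16, 524.27),
    "GSTU4": (8.75, 2.27, 2.16),
    "GSTU8": (4.29, 290.37, 29.12),
    "GSTU9": (0.59, 48.83, 31.29),
}

km_fold = fold_range(v[0] for v in CLUSTER_I.values())
gsh_fold = fold_range(v[1] for v in CLUSTER_I.values())
cdnb_fold = fold_range(v[2] for v in CLUSTER_I.values())
print(round(km_fold), round(gsh_fold), round(cdnb_fold))

# Consistency check for one entry: GSTU2 with CDNB (kcat = 231.92 s^-1, Km = 0.22 mM).
print(round(catalytic_efficiency(231.92, 0.22), 2))
```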

For duplicate gene pairs, three divergent patterns in kinetic

properties for hydrophobic (2nd) substrates were observed. In

the first, the two duplicates had similar Km values but differed

markedly in their catalytic efficiency (kcat/Km) due to pronounced

differences in their kcat values. For example, GSTF1/2 showed

similar affinity (KmNBD-Cl) to NBD-Cl, but the catalytic efficiency

of GSTF1 was 156-fold higher than that of GSTF2; similar

patterns applied to the duplicate pair GSTU3/54. In the second

pattern, there was differentiation in the affinity for a substrate

between the two duplicates, which correlated positively with

their catalytic efficiency. Duplicate pairs GSTU51/22, GSTU4/9,

GSTU2/8, and GSTU11/12 fell within this category of kinetic

properties. In the third pattern, the two duplicates differed in

their affinity for a substrate, and their catalytic efficiencies

showed an inverse relationship with affinity; i.e., one had lower

affinity (higher Km) but higher catalytic efficiency (higher kcat/Km)

due to compensation by a higher turnover rate (kcat). Duplicate

pairs GSTU40/55, GSTU15/17, GSTF5/6, and DHAR2/3 dis-

played this pattern.
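The three kinetic divergence patterns can be written as a small decision rule. This is a sketch under stated assumptions: the 1.5-fold cutoff for "similar" Km values is a hypothetical threshold chosen for illustration and is not given in the text, and the function name `kinetic_pattern` is likewise illustrative. Km and kcat/Km values for the second substrate are taken from Table 4.

```python
def kinetic_pattern(km_a, eff_a, km_b, eff_b, similar_km_fold=1.5):
    """Assign a duplicate pair to kinetic pattern 1, 2, or 3.

    1: similar Km (within the assumed fold cutoff), efficiencies may differ
    2: the duplicate with higher affinity (lower Km) also has higher kcat/Km
    3: the duplicate with lower affinity (higher Km) has higher kcat/Km
    """
    km_fold = max(km_a, km_b) / min(km_a, km_b)
    if km_fold <= similar_km_fold:
        return 1
    a_has_higher_affinity = km_a < km_b
    a_has_higher_efficiency = eff_a > eff_b
    return 2 if a_has_higher_affinity == a_has_higher_efficiency else 3


# (Km in mM, kcat/Km in mM^-1 s^-1) for the second substrate, from Table 4.
print(kinetic_pattern(0.32, 1361.25, 0.31, 8.71))      # GSTF1/2   -> 1
print(kinetic_pattern(0.15, 4152.87, 1.72, 334.07))    # GSTU51/22 -> 2
print(kinetic_pattern(3.08, 157.86, 1.82, 4.33))       # GSTU40/55 -> 3
print(kinetic_pattern(0.45, 19330.56, 0.16, 7909.69))  # DHAR2/3   -> 3
```

With this cutoff, the rule reproduces the assignments given in the text for all ten pairs listed; a different cutoff could move borderline pairs such as GSTU40/55 (about 1.7-fold difference in Km) between patterns.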

Figure 6. Structural Analysis of Populus GSTs.

(A) Structural superposition of the 28 tau GSTs in clade I (except for

GSTU32 and 49) from Supplemental Figure 1 online.

(B) The 27 tau GSTs in clade II (except for GSTU23) from Supplemental

Figure 1 online.

(C) All 55 tau GSTs in (A) and (B).

(D) Eight phi GSTs (GSTF1 to 8). In each case, the N- and C-terminal domains are illustrated in green and red, respectively, and the linker

between the two domains is shown in white. The positions of the

putative positively selected sites listed in Table 1 are illustrated in yellow.


DISCUSSION

Functional diversification among gene family members is viewed

as an important source of evolutionary innovation in complex

organisms, and various theoretical models have been proposed

to explain the mechanisms involved (Ohno, 1970; Hughes, 1994;

Force et al., 1999; Walsh, 2003; Moore and Purugganan, 2005).

The most plausible models proposed for the retention of dupli-

cated genes invoke sub- or neofunctionalization. Nevertheless,

some previous studies have found these models to be inade-

quate because many duplicate genes have been found to have

little divergence in their sequence and expression (Barakat et al.,

2009; Matsumura et al., 2009) or substrate specificity (Zhang

et al., 1998; Dean et al., 2008). This has led to the assumption that

the duplicate genes could be redundant in some cases. How-

ever, these studies have not examined all axes of potential

subfunctionalization due to the difficulties associated with ac-

quiring sufficient systematic biochemical data. Relatively few

studies have empirically analyzed the functional diversification of

a gene family at different levels of genomic organization: among

subfamily classes, within tandem clusters, and in paralogous

gene pairs. Our study systematically explored the radiation of a

gene family while integrating phylogenetic, expression pattern,

substrate specificity, and enzyme kinetic data. Through this

integrated approach, we demonstrated that all GST genes

appeared to have diverged along at least one of these axes.

When all the functional parameters were examined, very few of

Figure 7. Quantitative RT-PCR Analysis of Relative Expression Levels of Populus GST Genes under Abiotic Stress.

The expression levels of the assayed genes were normalized to the expression level under normal growth conditions, which was set to 1.0. CD, HO, and

AT indicate the CDNB, H2O2, and atrazine stress treatments, respectively.


the GSTs had identical properties. Even the most recently

diverged paralogs differed in their expression, substrate spectrum, and/or catalytic efficiency, suggesting that duplicates diverge rapidly in function and that (partial) subfunctionalization has indeed taken place. This may explain

why so many GST duplicates have been retained in the poplar

genome.

Clear divergence in expression patterns was observed among

the Populus GSTs in response to different stress treatments.

Within the large tau class, 45% of the members showed distinct

selective expression under a specific treatment and/or in a

specific tissue. Most of the duplicate GSTs created by the

whole-genome duplication appear to have maintained similar

expression patterns, while GSTs from genes created by tandem

duplications have diverged rapidly. A similar pattern has been

found in Arabidopsis (Casneuf et al., 2006; Ganko et al., 2007). A

possible explanation for this pattern is that large-scale segmen-

tal duplication results in the duplication of multiple genes with

their promoter and/or enhancer elements (Casneuf et al., 2006;

Kim et al., 2006), while tandem duplication may disrupt the

regulatory regions of target genes, resulting in considerable

variation in expression responses. Divergence in expression has

been reported for various functional categories of genes. Indeed,

>60% of rice gene families exhibit higher expression diversity

between members than randomly selected gene pairs (Yim et al.,

2009), and >50% of duplicated gene pairs formed by the most

recent polyploidy event in Arabidopsis have divergent expres-

sion profiles (Blanc and Wolfe, 2004). These findings clearly

support the assertion that expression divergence is often the first

step in the functional divergence between duplicate genes and

thereby increases the chance of duplicate genes being retained

in a genome (Ohno, 1970).

Lineage-specific expansions and contractions of gene

families have been shown to be driven by diversifying selection

(Mondragon-Palomino et al., 2002; Shiu et al., 2004, 2006;

Gingerich et al., 2007). Gene families involved in immune de-

fense, stress responses, metabolism, cell signaling, chemore-

ception, and reproduction are well represented among those

shown to have diversified under positive selection (Lespinet

et al., 2002; Demuth and Hahn, 2009). However, evidence from

Table 2. Divergence between Paralogous GST Gene Pairs in Populus

No. Gene 1 Gene 2 Ks Ka Ka/Ks Gene Expression Substrate Specificities

1 T GSTU20 GSTU21 0.000 0.000 – AA –

2 T GSTU34 GSTU35 0.009 0.008 0.889 SE –

3 T GSTU7 GSTU10 0.015 0.028 1.867 AN –

4 T GSTU15 GSTU17 0.015 0.012 0.800 SE PS

5 T GSTU36 GSTU38 0.023 0.000 0.000 AI PS

6 T GSTU39 GSTU42 0.023 0.021 0.913 AI –

7 O GSTU22 GSTU51 0.024 0.015 0.625 AI PS

8 T GSTU28 GSTU29 0.030 0.006 0.200 SE –

9 O GSTU3 GSTU54 0.034 0.010 0.294 AA PS

10 O GSTU50 GSTU53 0.035 0.006 0.171 AN –

11 T GSTU4 GSTU9 0.044 0.019 0.432 AN PS

12 T GSTU5 GSTU6 0.050 0.030 0.600 AN –

13 T GSTU32 GSTU33 0.050 0.046 0.920 SE –

14 T GSTF5 GSTF6 0.058 0.004 0.069 AA PS

15 T GSTU11 GSTU12 0.062 0.040 0.645 AI PS

16 O GSTU40 GSTU55 0.063 0.029 0.460 AA PS

17 T GSTU2 GSTU8 0.126 0.018 0.143 AI SS

18 W GSTU18 GSTU24 0.170 0.067 0.394 AA SS

19 O EF1Bγ2 EF1Bγ3 0.180 0.050 0.278 AA –

20 T GSTU1 GSTU49 0.183 0.078 0.426 AI –

21 O GSTF7 GSTF3 0.197 0.037 0.188 AI PS

22 W GSTU26 GSTU46 0.212 0.083 0.391 AA NS

23 W DHAR2 DHAR3 0.247 0.072 0.291 AA SS

24 W GSTT1 GSTT2 0.259 0.104 0.401 AA –

25 T GSTF1 GSTF2 0.277 0.078 0.282 AA SS

26 O GSTU23 GSTU52 0.322 0.057 0.177 AN –

27 O GSTZ1 GSTZ2 0.649 0.095 0.146 AA –

28 O GSTL2 GSTL3 0.762 0.078 0.102 AA –

These gene pairs were identified at the terminal nodes of the gene tree shown in Figure 1. Gene pairs created by tandem duplication (T), whole-

genome duplication (W), or other (O) events are indicated in the first column of the table. Synonymous (ks) and nonsynonymous substitution (ka) rates

are presented for each pair. Observed gene expression patterns are categorized into four classes: AA, both duplicates were expressed in all tissues

under all growth conditions; AN, one duplicate was expressed, while the other was not detected in any tissues under any growth conditions; AI, one

duplicate was expressed in all tissues under all growth conditions, while the other was induced-expressed in response to a specific treatment in a

specific tissue; SE, both duplicates were selectively expressed either in response to a specific treatment and/or in a specific tissue. The encoded

enzyme activity patterns are categorized as follows: SS, both duplicates displayed a similar substrate spectrum; PS, the two duplicates displayed a

partially overlapping substrate spectrum; NS, the two duplicates displayed a nonoverlapping substrate spectrum.


genome-wide nucleotide substitution analyses suggests that

retention of paralogs driven by positive selection accounts for

only a fraction of the expansion of gene families (Bergthorsson

et al., 2007; Demuth and Hahn, 2009). In this study, we found

evidence that purifying selection has predominated across the

tau and phi GST classes, with episodic positive selection. A

theory that may partly explain why many duplicate genes bear

the signature of continued purifying selection after duplication is

that deleterious mutations may occur in different domains in

duplicates of genes with multiple independent domain subfunc-

tions, allowing the conservation of both duplicates since they

retain different permutations of subfunctions (Force et al., 1999).

Purifying selection against deleterious loss-of-function muta-

tions increases the fixation probability of a new duplicate gene

and enhances the preservation of functional alleles at both

duplicate loci (Tanaka et al., 2009). Mutations that affect modular

domains or molecular surfaces that control distinct subfunctions

have the potential to alter one aspect of a protein’s function

Table 3. Specific Activities of the Populus GSTs toward Seven Substrates (Means ± SD Obtained from at Least Three Independent Determinations)

Specific Activity (μmol/min per mg) to Each Substrate

Cluster | Gene Pair | GST | CDNB | NBD-Cl | DCNB | NBC | ECA | 4-NPA | DHA
I | T | GSTU2 | 12.90 ± 0.04 | 6.93 ± 0.03 | 0.18 ± 0.09 | 1.51 ± 0.08 | n.d. | n.d. | n.t.
I | | GSTU8 | 6.02 ± 0.36 | 3.59 ± 0.02 | 0.01 ± 0.01 | 2.09 ± 1.40 | n.d. | n.d. | n.t.
I | T | GSTU4 | 0.76 ± 0.02 | n.d. | n.d. | n.d. | 0.72 ± 0.04 | 0.01 ± 0.01 | n.t.
I | | GSTU9 | 3.92 ± 0.05 | 1.42 ± 0.06 | 0.09 ± 0.01 | 0.63 ± 0.34 | n.d. | n.d. | n.t.
I | O | GSTU3 | 8.49 ± 0.31 | 6.14 ± 0.16 | 0.26 ± 0.03 | n.d. | n.d. | n.d. | n.t.
– | | GSTU54 | 20.99 ± 0.01 | 13.53 ± 0.13 | n.d. | n.d. | n.d. | n.d. | n.t.
I | | GSTU1 | 0.99 ± 0.01 | 0.08 ± 0.01 | n.d. | 0.25 ± 0.01 | 0.63 ± 0.03 | n.d. | n.t.
I | | GSTU7 | n.d. | 0.01 ± 0.01 | n.d. | n.d. | n.d. | n.d. | n.d.
II | | GSTU19 | 0.04 ± 0.01 | n.d. | n.d. | 0.15 ± 0.02 | n.d. | n.d. | n.t.
III | T | GSTU11 | 1.67 ± 0.01 | 1.70 ± 0.02 | n.d. | n.d. | 0.02 ± 0.01 | n.d. | n.d.
III | | GSTU12 | 7.38 ± 0.17 | 3.59 ± 0.06 | 0.12 ± 0.01 | n.d. | n.d. | n.t. | n.d.
III | T | GSTU36 | 0.02 ± 0.01 | n.d. | n.d. | n.d. | 0.14 ± 0.03 | n.d. | n.t.
III | | GSTU38 | n.d. | 0.04 ± 0.01 | n.d. | n.d. | 0.77 ± 0.04 | n.d. | n.t.
III | | GSTU14 | 2.47 ± 0.04 | 1.38 ± 0.10 | n.d. | n.d. | n.d. | n.d. | n.t.
IV | T | GSTU40 | 15.64 ± 0.16 | 7.30 ± 0.51 | 0.29 ± 0.03 | 0.26 ± 0.01 | n.d. | n.d. | n.t.
– | | GSTU55 | 3.00 ± 0.07 | 5.18 ± 0.02 | n.d. | n.d. | 0.15 ± 0.01 | n.d. | n.t.
V | T | GSTU32 | n.d. | 0.03 ± 0.01 | n.d. | n.d. | n.d. | n.d. | n.t.
V | | GSTU33 | 2.62 ± 0.01 | 3.14 ± 0.03 | 0.98 ± 0.04 | 1.61 ± 0.08 | n.d. | n.d. | n.t.
V | | GSTU30 | 15.63 ± 0.03 | 6.92 ± 0.02 | 0.22 ± 0.01 | n.d. | n.d. | n.d. | n.t.
V | | GSTU31 | 0.62 ± 0.01 | 1.46 ± 0.21 | n.d. | 0.88 ± 0.46 | n.d. | n.d. | n.t.
VI | T | GSTF5 | 0.48 ± 0.04 | 1.79 ± 0.03 | n.d. | 0.63 ± 0.13 | 0.38 ± 0.08 | n.d. | n.t.
VI | | GSTF6 | 0.60 ± 0.02 | 12.95 ± 0.11 | n.d. | n.d. | n.d. | n.d. | n.t.
– | O | GSTF3 | 0.32 ± 0.01 | 0.06 ± 0.01 | n.d. | n.d. | 0.58 ± 0.02 | n.d. | n.d.
VI | | GSTF7 | 0.15 ± 0.01 | 0.34 ± 0.01 | n.d. | n.d. | n.d. | n.d. | n.t.
VI | | GSTF4 | 3.62 ± 0.02 | 1.10 ± 0.01 | n.d. | 2.94 ± 1.03 | n.d. | n.d. | n.t.
VII | T | GSTU15 | 0.62 ± 0.01 | 0.63 ± 0.01 | n.d. | 1.38 ± 0.20 | n.d. | n.d. | n.t.
VII | | GSTU17 | 0.77 ± 0.01 | 0.86 ± 0.02 | n.d. | n.d. | n.d. | n.d. | n.t.
VII | | GSTU16 | 0.93 ± 0.02 | 0.49 ± 0.02 | n.d. | 0.10 ± 0.01 | n.d. | n.d. | n.t.
– | W | GSTU18 | 0.01 ± 0.01 | 0.01 ± 0.01 | 0.03 ± 0.01 | n.d. | n.d. | n.t. | n.t.
– | | GSTU24 | 0.56 ± 0.01 | 0.05 ± 0.01 | 0.05 ± 0.01 | n.d. | n.d. | n.d. | n.t.
– | O | GSTU22 | 28.05 ± 0.64 | 1.11 ± 0.38 | 0.11 ± 0.07 | n.d. | n.d. | n.d. | n.t.
– | | GSTU51 | 43.83 ± 1.23 | 5.24 ± 0.43 | 0.01 ± 0.01 | 4.34 ± 1.81 | n.d. | 0.02 ± 0.02 | n.t.
– | W | GSTU26 | 0.05 ± 0.01 | n.d. | n.d. | n.d. | n.d. | n.d. | n.t.
– | | GSTU46 | n.d. | n.d. | n.d. | n.d. | 0.06 ± 0.01 | n.d. | n.t.
– | T | GSTF1 | 1.94 ± 0.02 | 10.18 ± 0.03 | n.d. | 4.49 ± 0.02 | 0.88 ± 0.02 | n.d. | n.t.
– | | GSTF2 | 0.75 ± 0.01 | 0.94 ± 0.03 | n.d. | 0.41 ± 0.01 | 0.13 ± 0.01 | n.d. | n.t.
– | W | DHAR2 | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | 40.24 ± 0.12
– | | DHAR3 | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | 16.79 ± 0.08
– | | DHAR1 | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | 7.84 ± 0.14
– | | GSTU35 | 8.01 ± 0.41 | 3.37 ± 0.06 | 0.14 ± 0.01 | 1.17 ± 0.74 | n.d. | 0.01 ± 0.01 | n.t.
– | | GSTL1 | n.d. | 0.03 ± 0.01 | n.d. | n.d. | n.d. | n.d. | n.d.
– | | GSTL2 | n.d. | 0.03 ± 0.01 | n.d. | n.d. | n.d. | n.d. | 0.01 ± 0.01
– | | EF1Bγ1 | 0.03 ± 0.01 | n.d. | 0.13 ± 0.04 | n.d. | n.d. | n.d. | 0.01 ± 0.01
– | | GSTT2 | 0.04 ± 0.01 | 0.02 ± 0.01 | n.d. | n.d. | n.d. | n.d. | 0.08 ± 0.04

Dispersed GSTs not grouped into any cluster are indicated with “–”; gene pairs created by tandem duplication, whole-genome duplication, and other events are indicated as T, W, and O, respectively. n.d., no activity detected; n.t., not tested.


without disrupting its interactions with at least some molecular

partners. It is therefore likely that purifying selection would be

partially relaxed after duplication (Bridgham et al., 2008).

Plant GSTs perform diverse catalytic and noncatalytic func-

tions in the detoxification of xenobiotics, prevention of oxidative

damage, and endogenous metabolism (Frova, 2003; Basantani

and Srivastava, 2007), some of which are fulfilled by conjugating

electrophilic substrates to glutathione. Most GSTs are active as

dimers, composed of either homogeneous (the most prevalent

form) or heterogeneous subunits (Dixon et al., 1999; Edwards

et al., 2000). Both within and among the GST classes, the relatively small thioredoxin-like N-terminal domain that binds GSH is conserved (Dixon et al., 2002). By contrast,

the C-terminal domain that provides structural elements for the

recognition of xenobiotic substrates harbors much more diver-

sity within and among classes (Edwards et al., 2000; Basantani

and Srivastava, 2007). Among Populus tau GSTs, the C-terminal

domain appears to be under more relaxed functional constraints

than the N-terminal domain, which could lead to diversification in

substrate selectivity and specificity among the members, while

preserving the enzymes’ primary function. Given the role of the

C-terminal domain in substrate recognition, the overrepresenta-

tion of putative positively selected sites in the tau class suggests

pressure for diversification, possibly linked to selective advan-

tages conferred by the ability to recognize evolving targets and

enhance the metabolism of substances encountered in the

environment. In addition, our study revealed that a subset of

the Populus tau GSTs appears to have evolved significantly

faster than the others, suggesting that the substrate recognition

module in this subset is diversifying in response to a set of

changing substrates, while the other subset recognizes targets

defined by the ancestral functions of the enzyme. This pattern of

gene family diversification has been previously observed in

proteins that function as substrate recognition factors (Gingerich

et al., 2007).

The potential of a gene to evolve new function upon duplica-

tion may also depend on its ability to accept mutations without

losing thermodynamic stability of the protein domain that it

Table 4. Steady State Kinetic Constants of the Populus GSTs for CDNB, NBD-Cl, and DHA Conjugation Reactions (Means ± SD Obtained from at Least Three Independent Determinations)

Cluster | Gene Pair | GST | KmGSH (mM) | kcatGSH (s⁻¹) | (kcat/Km)GSH (mM⁻¹ s⁻¹) | KmCDNB (mM) | kcatCDNB (s⁻¹) | (kcat/Km)CDNB (mM⁻¹ s⁻¹)
I | T | GSTU2 | 0.33 ± 0.02 | 130.85 | 396.52 | 0.22 ± 0.01 | 231.92 | 1054.18
I | | GSTU8 | 0.08 ± 0.01 | 23.23 | 290.37 | 4.29 ± 1.06 | 124.93 | 29.12
I | T | GSTU4 | 4.16 ± 0.77 | 9.47 | 2.27 | 8.75 ± 2.75 | 18.87 | 2.16
I | | GSTU9 | 0.30 ± 0.03 | 14.65 | 48.83 | 0.59 ± 0.03 | 18.46 | 31.29
I | O | GSTU3 | 0.25 ± 0.03 | 127.29 | 509.16 | 0.29 ± 0.22 | 152.04 | 524.27
– | | GSTU54 | 0.48 ± 0.01 | 490.55 | 1021.98 | 0.24 ± 0.01 | 412.24 | 1717.67
I | | GSTU1 | 0.17 ± 0.06 | 1.76 | 10.35 | 0.21 ± 0.01 | 2.26 | 10.76
III | T | GSTU11 | 0.17 ± 0.01 | 1.88 | 11.06 | 123.6 ± 6.18 | 174.36 | 1.41
III | | GSTU12 | 0.20 ± 0.01 | 36.96 | 184.80 | 18.76 ± 9.14 | 562.23 | 29.97
III | | GSTU14 | 0.16 ± 0.01 | 66.72 | 417.00 | 12.57 ± 4.53 | 269.09 | 21.41
V | | GSTU30 | 0.72 ± 0.01 | 543.56 | 754.94 | 0.21 ± 0.01 | 341.21 | 1624.81
VII | T | GSTU15 | 0.35 ± 0.05 | 0.48 | 1.37 | 3.39 ± 0.42 | 2.16 | 0.64
VII | | GSTU17 | 0.27 ± 0.01 | 0.82 | 3.04 | 9.31 ± 3.49 | 6.89 | 0.74
VII | | GSTU16 | 2.45 ± 0.57 | 2.64 | 1.08 | 2.49 ± 0.74 | 2.94 | 1.18
– | O | GSTU22 | 0.56 ± 0.10 | 1118.39 | 1997.12 | 1.72 ± 0.22 | 574.6 | 334.07
– | | GSTU51 | 0.43 ± 0.03 | 710.15 | 1651.51 | 0.15 ± 0.02 | 622.93 | 4152.87
– | O | GSTU40 | 0.18 ± 0.02 | 112.42 | 624.56 | 3.08 ± 0.70 | 486.21 | 157.86
– | | GSTU55 | 0.21 ± 0.02 | 3.61 | 17.19 | 1.82 ± 0.16 | 7.89 | 4.33
– | | GSTU24 | 0.33 ± 0.02 | 0.54 | 1.64 | 3.90 ± 0.32 | 1.81 | 0.46
– | | GSTU35 | 0.45 ± 0.02 | 48.311 | 107.35 | 0.41 ± 0.03 | 33.39 | 81.44

Cluster | Gene Pair | GST | KmGSH (mM) | kcatGSH (s⁻¹) | (kcat/Km)GSH (mM⁻¹ s⁻¹) | KmNBD-Cl (mM) | kcatNBD-Cl (s⁻¹) | (kcat/Km)NBD-Cl (mM⁻¹ s⁻¹)
VI | T | GSTF5 | 0.09 ± 0.01 | 25.27 | 280.78 | 0.62 ± 0.29 | 14.64 | 23.61
VI | | GSTF6 | 0.10 ± 0.01 | 147.73 | 1448.31 | 1.62 ± 0.32 | 1383.73 | 854.15
VI | | GSTF4 | 2.02 ± 0.56 | 16.83 | 8.33 | 2.10 ± 0.21 | 12.64 | 6.02
VI | | GSTF7 | 0.71 ± 0.14 | 0.84 | 1.18 | 0.43 ± 0.07 | 1.89 | 4.39
– | T | GSTF1 | 0.51 ± 0.07 | 246.54 | 483.42 | 0.32 ± 0.03 | 435.60 | 1361.25
– | | GSTF2 | 0.82 ± 0.04 | 1.82 | 2.22 | 0.31 ± 0.03 | 2.70 | 8.71

Cluster | Gene Pair | GST | KmGSH (mM) | kcatGSH (s⁻¹) | (kcat/Km)GSH (mM⁻¹ s⁻¹) | KmDHA (mM) | kcatDHA (s⁻¹) | (kcat/Km)DHA (mM⁻¹ s⁻¹)
– | W | DHAR2 | 1.39 ± 0.28 | 10349.44 | 7445.64 | 0.45 ± 0.09 | 8698.75 | 19330.56
– | | DHAR3 | 2.94 ± 0.19 | 5300.45 | 1802.87 | 0.16 ± 0.04 | 1265.55 | 7909.69
– | | DHAR1 | 1.72 ± 0.51 | 172.43 | 100.25 | 0.30 ± 0.01 | 115.05 | 383.50

Dispersed GSTs not grouped into any cluster are indicated with “–”; gene pairs created by tandem duplication, whole-genome duplication, and other events are indicated as T, W, and O, respectively.


encodes. This protein structure-constrained potential has an

impact, at least in part, on the sequence variability and functional

diversity and the size of a gene family (Shakhnovich et al., 2005).

Proteins often display a certain degree of structural flexibility

that allows conformational changes, and it is widely assumed

that protein functional plasticity correlates with structural flexi-

bility (Hou et al., 2007; Kobilka and Deupi, 2007). Mutations in

the core structures can disrupt catalytic functions or dramatically

alter the shape of the binding pocket, while mutations outside

the active sites that subtly alter the substrate binding conforma-

tion might be more effective for achieving functional flexibility.

Structure modeling of tau and phi GSTs illustrated that diver-

gence is particularly visible in the C-terminal domain. Interest-

ingly, 7 out of 10 of the putative positively selected sites are in

the loop regions of this domain. The mutations accumulated in

loop regions may not have a dramatic effect on substrate

binding, but they could result in subtle conformational changes

to enzyme structure and, thus, substrate selectivity and kinetics,

resulting in a broad range of biochemical properties among the

gene family members.

How genes and their functions evolve after duplication is a

central, long-standing question in evolutionary biology. One

limitation for understanding gene family evolution is a lack of

understanding of the mode and tempo of functional diversifi-

cation in different functional categories of genes. As we gain

insights into divergence in gene sequences, structures, and

functions, the patterns of evolutionary dynamics of gene families

are just beginning to emerge. Our findings provide evidence for

the evolutionary partitioning of ancestral functions among dupli-

cated genes and protein domains, accompanied by specializa-

tion and only partially overlapping enzymatic properties. Both

purifying and directional selection played a role in the observed

functional diversification. This mechanism could facilitate the

retention of the duplicate genes and result in a large gene family

that has a broad substrate spectrum and a wide range of

reactivity toward different substrates. Thus, when gene families

generally, and the GST family in particular, have functions that

are potentially subspecialized, and the structure of the proteins

they encode allows rapid changes in specificity, affinity, and

activity, they are likely to expand in response to environmental

shifts at various time scales.

METHODS

Genomic Data Mining, GST Gene Identification, and Nomenclature

To identify GST genes in Populus trichocarpa, TBLASTN searches of the

P. trichocarpa genome database were performed using 53 full-length

GST protein sequences of Arabidopsis thaliana (Dixon et al., 2002), 61 of

rice (Oryza sativa; Soranzo et al., 2004), and 575 of other plants, animals,

fungi, and bacteria (see Supplemental Data Set 1 online). These 689 full-length GSTs represent 35 classes defined by the NCBI conserved domain

database (Marchler-Bauer et al., 2005). Analysis of the collected P.

trichocarpa GST candidates indicated that some sequences were par-

tially misannotated during the automated genome annotation process.

Thus, manual reannotation was performed to rectify incorrect start codon

predictions, splicing errors, missed or extra exons, fused genes, split

genes, and incorrectly predicted pseudogenes. The reannotated se-

quences were further analyzed using an NCBI conserved domain search

(http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) to confirm the

presence of typical GST N- and C-terminal domains in their protein

structure. The predicted GST genes were then amplified from genomic

DNA and mRNA, cloned into the pGEM-T Easy Vector (Promega), and

sequenced in both directions to verify the gene sequence and structure.

For genes that RT-PCR did not detect (nine out of 81 in this study), their

structure was assumed to be identical to that of their closest relative on

the phylogenetic tree; this approach was adapted from other studies

(Meyers et al., 2003). Noncoding exons in the 5′ and 3′ untranslated

regions were not considered in this study. Primers used in gene ampli-

fications are listed in Supplemental Table 3 online.

The nomenclature for Populus GSTs followed the system suggested by Edwards et al. (2000) for plant GSTs; a unique name was assigned to each Populus GST gene (see Supplemental Table 1 online), consisting of a letter for the subfamily class (e.g., GSTU, F, T, Z, and L corresponding to

tau, phi, theta, zeta and lambda classes, respectively) and a progressive

number for each gene (e.g., GSTU1).

Phylogenetic and Molecular Evolution Analyses

GST protein sequences were aligned using a BLOSUM 30 matrix by the

Clustal X 1.83 program (Thompson et al., 1997), with an open gap

penalty of 10 and an extend gap penalty of 0.1 in pairwise alignments,

an extend gap penalty of 0.05 in the multiple alignment, and a delay

divergent setting of 40%. The protein alignment was further adjusted

manually using BioEdit (Hall, 1999). Phylogenetic relationships among

the Populus GSTs were reconstructed using an ML procedure by

PHYML (Guindon and Gascuel, 2003) with the JTT (Jones, Taylor, and

Thornton) amino acid substitution model. Cytosolic GSTs are regarded

as having been derived from the GRX2 protein (Holm et al., 2006); thus,

GRX2 was used as an outgroup in the phylogenetic analysis of the

whole GST family. For analysis of each GST class, members of the sister

class were used as an outgroup. One thousand bootstrap replicates

were performed in each analysis to obtain the confidence support. The

synonymous (ks) and nonsynonymous substitution (ka) rates between

paralogous gene pairs were calculated by the K-Estimator program

(Comeron, 1999).
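For reference, the Ka/Ks column of Table 2 is simply the ratio of the two rates. The sketch below recomputes it from the tabulated Ka and Ks values; it does not reproduce the K-Estimator calculation itself, the function name `ka_ks` is illustrative, and because the tabulated rates are rounded to three decimals, recomputed ratios may differ slightly from the published ones for some pairs.

```python
def ka_ks(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


# (Ka, Ks) values copied from Table 2.
print(ka_ks(0.000, 0.000))            # GSTU20/21: undefined, shown as a dash in Table 2
print(round(ka_ks(0.008, 0.009), 3))  # GSTU34/35
print(round(ka_ks(0.028, 0.015), 3))  # GSTU7/10, the only pair with Ka/Ks > 1
print(round(ka_ks(0.078, 0.762), 3))  # GSTL2/3
```

A ratio below 1 is conventionally read as a signature of purifying selection and a ratio above 1 as a signature of positive selection, although ratios based on very small Ks values, as for the most recent duplicates, are estimated with little precision.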

Thev values (v = dN/dS) among all pairwise comparisonswithin each of

the tau and phi GST classes were calculated by the YN00 program in the

PAML 4.3 package (Yang, 2007). The calculations were performed

separately using theN-terminal domain (from the start codon to alignment

position 83 in Supplemental Figure 2 for the tau GSTs and to alignment

position 79 in Supplemental Figure 3 for the phi GSTs), and the C-terminal

domain sequences (from alignment position 94 to the end in Supplemen-

tal Figure 2 for the tau GSTs and from alignment position 96 to the end

in Supplemental Figure 3 for the phi GSTs). Two-sample t tests were

performed to determine whether the v values of the two domains are

significantly different. To evaluate variation in selective pressure over a

phylogeny, the branch models of CODEML in PAML were used to estimate

ω under different assumptions. Analyses were conducted under three a

priori assumptions: a one-ratio model in which one ω value was assumed

for the entire tree, a free-ratio model that allows ω to vary over all branches

of the tree, and a two-ratio model for the tau class, in which ω values were

allowed to vary between the two major clades of the class (see Supple-

mental Figure 1 online), by selecting each clade as foreground, respec-

tively. To determine whether positive selection had acted at specific sites

in the GST sequences, six models in the PAML package were explored:

the one-ratio model (M0), the discrete model (M3), the nearly neutral

model (M1a), the positive-selection model (M2a), the beta model (M7),

and the beta and ω model (M8). To evaluate rate variation between the

two major clades of tau class (see Supplemental Figure 1 online), a local

clock model was compared with a global clock model using PAML. Two

different rates were assigned to each clade in the local clock model, while

the same rate was assumed for both clades in the global clock model.


To verify which of the models best fitted the data, LRTs were performed

by comparing twice the difference in log likelihood values between pairs

of the models using a χ2 distribution, with the degrees of freedom equal to

the differences in the number of parameters between the models (Yang

et al., 2000).
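
The likelihood ratio test described above can be illustrated with a minimal sketch. It assumes only that each model fit reports a log likelihood and a parameter count; the function names and the numerical values are hypothetical and are not taken from the PAML output of this study. The χ2 tail probability is computed from closed-form expressions for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in x^i / (1*3*...*(2i+1))
    term, total = 1.0, 0.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total

def likelihood_ratio_test(lnl_null, lnl_alt, n_params_null, n_params_alt):
    """Return (2*delta lnL, degrees of freedom, P value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = n_params_alt - n_params_null
    return stat, df, chi2_sf(max(stat, 0.0), df)

# Hypothetical log likelihoods: a one-ratio model against a two-ratio model
# that has one additional omega parameter.
stat, df, p = likelihood_ratio_test(-1000.0, -995.0, 1, 2)
print(stat, df, p)
```

The same comparison applies to any nested pair mentioned above (for example, M1a against M2a, M7 against M8, or the global clock against the local clock), with the degrees of freedom taken from the difference in parameter counts.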

Homology Modeling

The crystal structures of Glycine max Gm GSTU4, wheat (Triticum

aestivum) Ta GSTU4, and rice Os GSTU1 (Protein Data Bank code

numbers 2VO4, 1GWC, and 1OYJ, respectively) were used as templates

for constructing the structure models of Populus tau GSTs. The crystal

structures of two Zea mays phi GSTs (PDB: 1AW9 and 1BYE) and an

Arabidopsis phi GST (PDB: 1GNW) were used as templates for structure

modeling of Populus phi GSTs. Sequences were aligned by the Align 2D

structure alignment program (homology module, InsightII; Accelrys).

Structures were automatically built by the MODELER module of InsightII.

MODELER uses a spatial restraint method to build a three-dimensional

image of protein structure and is capable of generating a reliable

predicted structure using probability density functions derived from

homologous structures and general features of known proteins (Fiser

and Sali, 2003). All optimized structures were evaluated by the Profile-3D

program of InsightII to select the best model for a given protein.

Expression of GST Genes under Different Treatments

To investigate the expression patterns of GSTs under normal growth

conditions and abiotic stress, cuttings of P. trichocarpa were cultivated in

water for ~2 months and then three chemical treatments were applied:

1.0 mM CDNB, 5.0% H2O2, and 1.5% atrazine as cultivation solutions and

sprays. Cultivation in water was used as a control. Each treatment

consisted of five replicates. Twelve hours after the chemical treatments,

total RNA was isolated from leaf, shoot, bud, phloem, and root tissues

using an Aurum Total RNA Kit (Bio-Rad Laboratories). Total RNA was

treated with RNase-free DNase I (Promega) and reverse transcribed

into cDNA using an RNA PCR Kit (AMV) version 3.0 (TaKaRa). Based on

the multiple sequence alignment of all Populus GST sequences, 80

specific primer pairs were designed (see Supplemental Table 4 online). In

the RT-PCR analysis, the Populus Actin gene (GenBank number

XM_002316253) was used as an internal control. After RT-PCR, PCR

products from each sample were validated by DNA sequencing. Quan-

titative RT-PCR (qRT-PCR) was performed using an Mx3000P real-time

PCR system (Stratagene). In qRT-PCR analysis, the reverse transcription

products were used as templates, and Brilliant SYBR Green QPCR

Master Mix (Stratagene) was used in all qPCR reactions. The Populus

Actin gene (GenBank number XM_002316253) was used as an internal

control in the qRT-PCR analyses, with the forward primer

5′-GAGACCTTCAACACTCCTGCTATG-3′ and reverse primer

5′-CAGGTCAAGACGAAGAATGGC-3′. Specific qRT-PCR primers for Populus GSTs are listed in

Supplemental Table 4 online. The qPCR conditions, following optimization,

consisted of an initial denaturation step of 10 min at 95°C followed by

40 cycles of 95°C for 30 s, 60°C for 30 s, and 72°C for 30 s. A melt-curve

analysis immediately followed the final amplification to observe the

melting characteristics of each amplicon and confirm that a specific

product had been amplified, consisting of incubation at 95°C for 60 s,

cooling to 60°C for 30 s, and a slow rise in temperature to 95°C with

continuous measurement of the decline in fluorescence. The relative

expression of specific genes was quantified by 2^(−ΔΔCt), where ΔCt is the

difference in threshold cycles between the target and housekeeping gene

Actin, and ΔΔCt is the difference between the ΔCt of the samples

exposed to abiotic stress and the ΔCt of the control plants. The mean

threshold cycle values for each GST were obtained from three indepen-

dent PCR experiments.
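
The relative quantification described above can be sketched as follows. The function names and all threshold cycle (Ct) values are hypothetical and serve only to show the arithmetic; they are not measurements from this study. Mean Ct values are taken over three replicate PCR runs, as in the text.

```python
def mean(values):
    """Arithmetic mean of replicate Ct values."""
    return sum(values) / len(values)

def relative_expression(ct_target_stress, ct_actin_stress,
                        ct_target_control, ct_actin_control):
    """Relative expression by 2^(-ddCt) from mean Ct values.

    dCt  = Ct(target GST) - Ct(Actin), computed within each condition
    ddCt = dCt(stress-treated sample) - dCt(water control)
    """
    d_ct_stress = ct_target_stress - ct_actin_stress
    d_ct_control = ct_target_control - ct_actin_control
    dd_ct = d_ct_stress - d_ct_control
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values from three independent PCR experiments.
target_stress = mean([24.0, 24.2, 23.8])
actin_stress = mean([18.0, 18.1, 17.9])
target_control = mean([26.0, 25.9, 26.1])
actin_control = mean([18.0, 18.0, 18.0])

fold = relative_expression(target_stress, actin_stress,
                           target_control, actin_control)
print(fold)  # about 4-fold induction for these made-up values
```

A value above 1 indicates higher transcript abundance in the treated sample than in the control, and a value below 1 indicates lower abundance.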

Molecular Cloning and Purification of Recombinant Populus

GST Proteins

To investigate the enzymatic functions of Populus GSTs, 30 tau, seven

phi, three DHAR, two lambda, one theta, and one EF1Bγ GST were

selected for protein expression analysis and purification (Table 3). These

GSTs represent different evolutionary relationships and histories. Except

for EF1Bγ1, the full-length cDNA of each GST was subcloned into a

pET30a expression vector (Novagen), which provides the correct reading

frame and a 6×His-tag at the N terminus. EF1Bγ1 consists of two parts: a

GST domain and an EF1Bγ domain (Figure 1C). In this study, only the GST

domain of EF1Bγ1 was used to construct the expression vector. Primers

used to construct GST expression vectors are listed in Supplemental

Table 5 online. Colonies containing the appropriate insert were identified

by sequencing.

Overnight cultures of E. coli BL21, transformed with each GST gene

expression plasmid, were diluted 1:100 and grown until the optical

density (A600) reached 0.5. Isopropyl-β-D-thiogalactopyranoside was

then added to each culture at a final concentration of 0.1 mM, and the

cultures were incubated at 37°C overnight. The bacteria were harvested

by centrifugation at 8000g for 3 min at 4°C, resuspended in binding buffer

(20 mM sodium phosphate, 0.5 M NaCl, and 20 mM imidazole, pH 7.4),

and disrupted by cold sonication. In each case, the homogenate was then

subjected to centrifugation at 10,000g for 10 min at 4°C. The resultant

particulate material and a small portion of the supernatant were analyzed

by SDS-PAGE. The rest of the supernatant was loaded onto a Ni

Sepharose High Performance column (GE Healthcare Bio-Sciences)

that had been preequilibrated with binding buffer. The overexpressed

protein that bound to the Ni Sepharose High Performance column was

eluted with elution buffer (20 mM sodium phosphate, 0.5 M NaCl, and

500 mM imidazole, pH 7.4). The purified recombinant protein was

desalted using a PD-10 column (GE Healthcare Bio-Sciences) in 10 mM

Tris-HCl buffer, pH 7.5. In this study, six GSTs (GSTU7, 18, and 32, GSTL1

and 2, and GSTT2) were expressed as inclusion bodies. Inclusion bodies

were dissolved in 10 mM Tris-HCl, pH 8.0, 10 mM DTT, and 8 M urea and

then refolded by serial dilutions in 10 mM Tris-HCl, pH 8.0, and 10 mM

DTT. The refolded proteins were further purified by Ni Sepharose High

Performance columns.

Specific Activity and Kinetics of GST Enzymes

GST activities (at 25°C) toward the substrates CDNB, DCNB, ECA, NBC,

and 4-NPA were measured using the method described by Habig et al.

(1974), while activity toward NBD-Cl was measured using the method

described by Ricci et al. (1994), and activity toward DHA was measured

as described by Edwards and Dixon (2005). Protein concentrations in

the enzyme preparations were determined by measuring the absorbance

at 280 nm. The apparent Km values for GSH were determined using GSH

concentrations ranging from 0.02 to 1.0 mM and a fixed CDNB, NBD-Cl,

or DHA concentration of 1.0 mM. The apparent Km values for CDNB,

NBD-Cl, or DHA were determined using concentrations of CDNB, NBD-

Cl, or DHA, respectively, ranging from 0.04 to 1.0 mM and a fixed GSH

concentration of 1.0 mM. The kinetic parameters were derived from

nonlinear regression analysis by the Hyper32 program available at http://

www.liv.ac.uk/~jse/software.html.
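
As an illustration of how apparent Km and Vmax can be obtained by nonlinear regression, the sketch below fits the Michaelis-Menten equation v = Vmax·[S]/(Km + [S]) by least squares. It is not the algorithm of the Hyper32 program, whose internals are not described here; the function name, the search bounds, and the synthetic rate data are assumptions made for the example. For any trial Km the best Vmax has a closed form, so the fit reduces to a one-dimensional search over Km.

```python
import math

def fit_michaelis_menten(s, v, km_lo=1e-3, km_hi=100.0, grid=400, iters=80):
    """Least-squares fit of v = Vmax * S / (Km + S). Returns (vmax, km)."""
    def best_vmax(km):
        # For fixed Km the model is linear in Vmax, so solve it directly.
        x = [si / (km + si) for si in s]
        return sum(xi * vi for xi, vi in zip(x, v)) / sum(xi * xi for xi in x)

    def sse(km):
        vmax = best_vmax(km)
        return sum((vi - vmax * si / (km + si)) ** 2 for si, vi in zip(s, v))

    # Coarse log-spaced scan to bracket the minimum of the residual sum of squares.
    ratio = (km_hi / km_lo) ** (1.0 / (grid - 1))
    kms = [km_lo * ratio ** i for i in range(grid)]
    i_best = min(range(grid), key=lambda i: sse(kms[i]))
    a = kms[max(i_best - 1, 0)]
    b = kms[min(i_best + 1, grid - 1)]

    # Golden-section search refines Km inside the bracket.
    g = (math.sqrt(5) - 1) / 2
    c = b - g * (b - a)
    d = a + g * (b - a)
    for _ in range(iters):
        if sse(c) < sse(d):
            b = d
        else:
            a = c
        c = b - g * (b - a)
        d = a + g * (b - a)
    km = (a + b) / 2
    return best_vmax(km), km

# Synthetic, noise-free rates over the 0.04 to 1.0 mM substrate range,
# generated with Vmax = 2.0 and Km = 0.25 (arbitrary units).
substrate_mM = [0.04, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0]
rates = [2.0 * c / (0.25 + c) for c in substrate_mM]
vmax_fit, km_fit = fit_michaelis_menten(substrate_mM, rates)
print(vmax_fit, km_fit)
```

With real assay data the measured initial rates would replace the synthetic list, and the fitted Km is an apparent value because the second substrate is held at a fixed 1.0 mM.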

Accession Numbers

Sequence data from this article can be found in the Arabidopsis Genome

Initiative or GenBank/EMBL databases under the accession numbers

listed in Supplemental Tables 1 and 6 online for P. trichocarpa and

Arabidopsis, respectively. Other sequences used are Populus Actin gene

(GenBank number XM_002316253), two Pinus taeda lambda GSTs

(GenBank numbers CV034086 and DR019281), Picea glauca lambda


GST (GenBank number EX306134), and GRX2 (GenBank number

NP_287198).

Supplemental Data

The following materials are available in the online version of this article.

Supplemental Figure 1. Phylogenetic Tree of the 58 Full-Length

Populus Tau GST Proteins.

Supplemental Figure 2. Sequence Alignment of Populus Tau GSTs

and the Predicted Secondary Structure Elements.

Supplemental Figure 3. Sequence Alignment of Populus Phi GSTs

and the Predicted Secondary Structure Elements.

Supplemental Table 1. Full-Length GST Genes Identified from the

Populus trichocarpa Genome.

Supplemental Table 2. GST Fragments Identified from the Populus

trichocarpa Genome.

Supplemental Table 3. Primers Used to Amplify Populus GST Genes.

Supplemental Table 4. RT-PCR Primers Used to Detect the Expres-

sion of Populus GST Genes.

Supplemental Table 5. Primers Used to Construct the Populus GST

Protein Expression Vector.

Supplemental Table 6. Tau GSTs Identified from Arabidopsis

thaliana Genome.

Supplemental Data Set 1. GSTs Used in the Homology Search of

Populus GST Candidates.

Supplemental Data Set 2A. Text File of Unmasked Alignment

Corresponding to the Phylogenetic Tree in Figure 1.

Supplemental Data Set 2B. Text File of Masked Alignment Corre-

sponding to the Phylogenetic Tree in Figure 1.


Corresponding to the Phylogenetic Tree in Figure 3A.


sponding to the Phylogenetic Tree in Figure 3A.


Corresponding to the Phylogenetic Tree in Figure 3B.


sponding to the Phylogenetic Tree in Figure 3B.


Corresponding to the Phylogenetic Tree in Figure 5.


sponding to the Phylogenetic Tree in Figure 5.


Corresponding to the Phylogenetic Tree in Supplemental Figure 1.


sponding to the Phylogenetic Tree in Supplemental Figure 1.

ACKNOWLEDGMENTS

We thank Bo Zheng of Umea Plant Science Centre, Umea University for

providing plant material and Song Ge, Xiao-Quan Wang, and Daming

Zhang of the Institute of Botany, Chinese Academy of Sciences, Tao

Sang of Michigan State University, Hong Ma of Pennsylvania State

University, and Folmer Bokma of Umea University for valuable com-

ments on an earlier draft of this work. This study was supported by

grants from the Natural Science Foundation of China (NSFC 30830010

and 30770149), the National Basic Research Program of China

(2009CB119104), and the Chinese Academy of Sciences (KZCX2-YW-

414).

Received July 24, 2009; revised October 31, 2009; accepted November

16, 2009; published December 8, 2009.

REFERENCES

Agrawal, G.K., Jwa, N.S., and Rakwal, R. (2002). A pathogen-induced

novel rice (Oryza sativa L.) gene encodes a putative protein homol-

ogous to type II glutathione S-transferases. Plant Sci. 163: 1153–1160.

Arabidopsis Genome Initiative (2000). Analysis of the genome sequence

of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.

Barakat, A., Bagniewska-Zadworna, A., Choi, A., Plakkat, U., DiLoreto,

D.S., Yellanki, P., and Carlson, J.E. (2009). The cinnamyl alcohol

dehydrogenase gene family in Populus: Phylogeny, organization, and

expression. BMC Plant Biol. 9: 26.

Basantani, M., and Srivastava, A. (2007). Plant glutathione transfer-

ases - A decade falls short. Can. J. Bot. 85: 443–456.

Bergthorsson, U., Andersson, D.I., and Roth, J.R. (2007). Ohno’s

dilemma: Evolution of new genes under continuous selection. Proc.

Natl. Acad. Sci. USA 104: 17004–17009.

Blanc, G., and Wolfe, K.H. (2004). Functional divergence of duplicated

genes formed by polyploidy during Arabidopsis evolution. Plant Cell

16: 1679–1691.

Bridgham, J.T., Brown, J.E., Rodriguez-Mari, A., Catchen, J.M., and

Thornton, J.W. (2008). Evolution of a new function by degenerative

mutation in cephalochordate steroid receptors. PLoS Genet. 4:

e1000191.

Casneuf, T., De Bodt, S., Raes, J., Maere, S., and Van de Peer, Y.

(2006). Nonrandom divergence of gene expression following gene and

genome duplications in the flowering plant Arabidopsis thaliana.

Genome Biol. 7: R13.

Comeron, J.M. (1999). K-Estimator: Calculation of the number of

nucleotide substitutions per site and the confidence intervals. Bio-

informatics 15: 763–764.

Dean, E.J., Davis, J.C., Davis, R.W., and Petrov, D.A. (2008). Perva-

sive and persistent redundancy among duplicated genes in yeast.

PLoS Genet. 4: e1000113.

Demuth, J.P., and Hahn, M.W. (2009). The life and death of gene

families. Bioessays 31: 29–39.

Dixon, D.P., Cole, D.J., and Edwards, R. (1999). Dimerisation of maize

glutathione transferases in recombinant bacteria. Plant Mol. Biol. 40:

997–1008.

Dixon, D.P., Lapthorn, A., and Edwards, R. (2002). Plant glutathione

transferases. Genome Biol. 3: reviews 3004.3001–3004.3010.

Edwards, R., and Dixon, D.P. (2005). Plant glutathione transferases.

Methods Enzymol. 401: 169–186.

Edwards, R., Dixon, D.P., and Walbot, V. (2000). Plant glutathione

S-transferases: Enzymes with multiple functions in sickness and in

health. Trends Plant Sci. 5: 193–198.

Fiser, A., and Sali, A. (2003). Modeller: Generation and refinement of

homology-based protein structure models. Methods Enzymol. 374:

461–491.

Force, A., Lynch, M., Pickett, F.B., Amores, A., Yan, Y.L., and

Postlethwait, J. (1999). Preservation of duplicate genes by comple-

mentary, degenerative mutations. Genetics 151: 1531–1545.

Frova, C. (2003). The plant glutathione transferase gene family: Ge-

nomic structure, functions, expression and evolution. Physiol. Plant.

119: 469–479.


Ganko, E.W., Meyers, B.C., and Vision, T.J. (2007). Divergence in

expression between duplicated genes in Arabidopsis. Mol. Biol. Evol.

24: 2298–2309.

Gingerich, D.J., Hanada, K., Shiu, S.H., and Vierstra, R.D. (2007).

Large-scale, lineage-specific expansion of a bric-a-brac/tramtrack/

broad complex ubiquitin-ligase gene family in rice. Plant Cell 19:

2329–2348.

Guindon, S., and Gascuel, O. (2003). A simple, fast, and accurate

algorithm to estimate large phylogenies by maximum likelihood. Syst.

Biol. 52: 696–704.

Habig, W.H., Pabst, M.J., and Jakoby, W.B. (1974). Glutathione

S-transferases. The first enzymatic step in mercapturic acid forma-

tion. J. Biol. Chem. 249: 7130–7139.

Hall, T.A. (1999). BioEdit: A user-friendly biological sequence alignment

editor and analysis program for Windows 95/98/NT. Nucleic Acids

Symp. Ser. 41: 95–98.

Hancock, J.M. (2005). Gene factories, microfunctionalization and the

evolution of gene families. Trends Genet. 21: 591–595.

Holm, P.J., Bhakat, P., Jegerschold, C., Gyobu, N., Mitsuoka, K.,

Fujiyoshi, Y., Morgenstern, R., and Hebert, H. (2006). Structural

basis for detoxification and oxidative stress protection in membranes.

J. Mol. Biol. 360: 934–945.

Hou, L., Honaker, M.T., Shireman, L.M., Balogh, L.M., Roberts, A.G.,

Ng, K.-c., Nath, A., and Atkins, W.M. (2007). Functional promiscuity

correlates with conformational heterogeneity in A-class glutathione

S-transferases. J. Biol. Chem. 282: 23264–23274.

Hughes, A.L. (1994). The evolution of functionally novel proteins after

gene duplication. Proc. Biol. Sci. 256: 119–124.

Jeppesen, M.G., Ortiz, P., Shepard, W., Kinzy, T.G., Nyborg, J., and

Andersen, G.R. (2003). The crystal structure of the glutathione

S-transferase-like domain of elongation factor 1B gamma from Sac-

charomyces cerevisiae. J. Biol. Chem. 278: 47190–47198.

Kampranis, S.C., Damianova, R., Atallah, M., Toby, G., Kondi, G.,

Tsichlis, P.N., and Makris, A.M. (2000). A novel plant glutathione

S-transferase/peroxidase suppresses Bax lethality in yeast. J. Biol.

Chem. 275: 29207–29216.

Kim, J., Shiu, S.H., Thoma, S., Li, W.H., and Patterson, S.E. (2006).

Patterns of expansion and expression divergence in the plant poly-

galacturonase gene family. Genome Biol. 7: R87.

Kobilka, B.K., and Deupi, X. (2007). Conformational complexity of

G-protein-coupled receptors. Trends Pharmacol. Sci. 28: 397–406.

Lespinet, O., Wolf, Y.I., Koonin, E.V., and Aravind, L. (2002). The role

of lineage-specific gene family expansion in the evolution of eukary-

otes. Genome Res. 12: 1048–1059.

Loyall, L., Uchida, K., Braun, S., Furuya, M., and Frohnmeyer, H.

(2000). Glutathione and a UV light-induced glutathione S-transferase

are involved in signaling to chalcone synthase in cell cultures. Plant

Cell 12: 1939–1950.

Lynch, M. (2007). The Origins of Genome Architecture. (Sunderland,

MA: Sinauer Associates).

Lynch, M., and Conery, J.S. (2000). The evolutionary fate and conse-

quences of duplicate genes. Science 290: 1151–1155.

Lysak, M.A., Koch, M.A., Beaulieu, J.M., Meister, A., and Leitch, I.J.

(2009). The dynamic ups and downs of genome size evolution in

Brassicaceae. Mol. Biol. Evol. 26: 85–98.

Marchler-Bauer, A., et al. (2005). CDD: A Conserved Domain Database

for protein classification. Nucleic Acids Res. 33: D192–D196.

Matsumura, Y., Iwakawa, H., Machida, Y., and Machida, C. (2009).

Characterization of genes in the ASYMMETRIC LEAVES2/LATERAL

ORGAN BOUNDARIES (AS2/LOB) family in Arabidopsis thaliana, and

functional and molecular comparisons between AS2 and other family

members. Plant J. 58: 525–537.

Meyers, B.C., Kozik, A., Griego, A., Kuang, H., and Michelmore, R.

W. (2003). Genome-wide analysis of NBS-LRR-encoding genes in

Arabidopsis. Plant Cell 15: 809–834.

Mondragon-Palomino, M., Meyers, B.C., Michelmore, R.W., and

Gaut, B.S. (2002). Patterns of positive selection in the complete

NBS-LRR gene family of Arabidopsis thaliana. Genome Res. 12:

1305–1315.

Moore, R.C., and Purugganan, M.D. (2005). The evolutionary dynam-

ics of plant duplicate genes. Curr. Opin. Plant Biol. 8: 122–128.

Mueller, L.A., Goodman, C.D., Silady, R.A., and Walbot, V. (2000).

AN9, a petunia glutathione S-transferase required for anthocyanin

sequestration, is a flavonoid-binding protein. Plant Physiol. 123:

1561–1570.

Oakley, A.J. (2005). Glutathione transferases: New functions. Curr.

Opin. Struct. Biol. 15: 716–723.

Ohno, S. (1970). Evolution by Gene Duplication. (Heidelberg, Germany:

Springer-Verlag).

Ricci, G., Caccuri, A.M., Lo Bello, M., Pastore, A., Piemonte, F., and

Federici, G. (1994). Colorimetric and fluorometric assays of glutathi-

one transferase based on 7-chloro-4-nitrobenzo-2-oxa-1,3-diazole.

Anal. Biochem. 218: 463–465.

Shakhnovich, B.E., Deeds, E., Delisi, C., and Shakhnovich, E. (2005).

Protein structure and evolutionary history determine sequence space

topology. Genome Res. 15: 385–392.

Shiu, S.H., Byrnes, J.K., Pan, R., Zhang, P., and Li, W.H. (2006). Role

of positive selection in the retention of duplicate genes in mammalian

genomes. Proc. Natl. Acad. Sci. USA 103: 2232–2236.

Shiu, S.H., Karlowski, W.M., Pan, R., Tzeng, Y.H., Mayer, K.F., and

Li, W.H. (2004). Comparative analysis of the receptor-like kinase

family in Arabidopsis and rice. Plant Cell 16: 1220–1234.

Smith, A.P., DeRidder, B.P., Guo, W.J., Seeley, E.H., Regnier, F.E.,

and Goldsbrough, P.B. (2004). Proteomic analysis of Arabidopsis

glutathione S-transferases from benoxacor- and copper-treated

seedlings. J. Biol. Chem. 279: 26098–26104.

Soranzo, N., Sari Gorla, M., Mizzi, L., De Toma, G., and Frova, C.

(2004). Organisation and structural evolution of the rice glutathione

S-transferase gene family. Mol. Genet. Genomics 271: 511–521.

Tanaka, K.M., Takahasi, K.R., and Takano-Shimizu, T. (2009). En-

hanced fixation and preservation of a newly arisen duplicate gene

by masking deleterious loss-of-function mutations. Genet. Res. 91:

267–280.

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and

Higgins, D.G. (1997). The Clustal X windows interface: Flexible

strategies for multiple sequence alignment aided by quality analysis

tools. Nucleic Acids Res. 25: 4876–4882.

Tuskan, G.A., et al. (2006). The genome of black cottonwood, Populus

trichocarpa (Torr. & Gray). Science 313: 1596–1604.

Walsh, B. (2003). Population-genetic models of the fates of duplicate

genes. Genetica 118: 279–294.

Yang, Z. (2007). PAML 4: Phylogenetic analysis by maximum likelihood.

Mol. Biol. Evol. 24: 1586–1591.

Yang, Z., Nielsen, R., Goldman, N., and Pedersen, A.M. (2000).

Codon-substitution models for heterogeneous selection pressure at

amino acid sites. Genetics 155: 431–449.

Yang, Z., Wong, W.S., and Nielsen, R. (2005). Bayes empirical bayes

inference of amino acid sites under positive selection. Mol. Biol. Evol.

22: 1107–1118.

Yim, W.C., Lee, B.M., and Jang, C.S. (2009). Expression diversity and

evolutionary dynamics of rice duplicate genes. Mol. Genet. Genomics

281: 483–493.

Zhang, W., Bone, J.R., Edmondson, D.G., Turner, B.M., and Roth, S.

Y. (1998). Essential and redundant functions of histone acetylation

revealed by mutation of target lysines and loss of the Gcn5p acetyl-

transferase. EMBO J. 17: 3155–3167.


Ting Lan, Zhi-Ling Yang, Xue Yang, Yan-Jing Liu, Xiao-Ru Wang, and Qing-Yin Zeng. Extensive Functional Diversification of the Populus Glutathione S-Transferase Supergene Family.

Plant Cell; originally published online December 8, 2009; DOI 10.1105/tpc.109.070219
