Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice)

Yupeng Wang1,2, Xiyin Wang1,3, Tae-Ho Lee1, Shahid Mansoor1 and Andrew H. Paterson1

1Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA; 2Computational Biology Service Unit, Cornell University, Ithaca, NY 14853, USA; 3Center for Genomics and Computational Biology, School of Life Sciences, School of Sciences, Hebei United University, Tangshan, Hebei, 063000, China

Author for correspondence: Andrew H. Paterson
Tel: +1 706 583 0162
Email: [email protected]

Received: 31 October 2012
Accepted: 6 December 2012

New Phytologist (2013)
doi: 10.1111/nph.12137

Key words: correlation analysis, DNA methylation, gene body, gene duplication, gene origin, Ks, rice (Oryza sativa).

Summary

• Whole-genome duplication (WGD) has been recurring and single-gene duplication is also widespread in angiosperms. Recent whole-genome DNA methylation maps indicate that gene body methylation (i.e. of coding regions) has a functional role. However, whether gene body methylation is related to gene origins and duplication modes has yet to be reported.

• In rice (Oryza sativa), we computed a body methylation level (proportion of methylated CpG within coding regions) for each gene in five tissues.

• Body methylation levels follow a bimodal distribution, but show distinct patterns associated with transposable element-related genes; WGD, tandem, proximal and transposed duplicates; and singleton genes. For pairs of duplicated genes, divergence in body methylation levels increases with physical distance and synonymous (Ks) substitution rates, and WGDs show lower divergence than single-gene duplications of similar Ks levels. Intermediate body methylation tends to be associated with high levels of gene expression, whereas heavy body methylation is associated with lower levels of gene expression.

• The biological trends revealed here are consistent across five rice tissues, indicating that genes of different origins and duplication modes have distinct body methylation patterns, and body methylation has a heterogeneous relationship with gene expression and may be related to survivorship of duplicated genes.

Introduction

Gene duplication is a primary mechanism for the evolution of novelty and complexity in higher organisms (Ohno, 1970; Flagel & Wendel, 2009; Innan & Kondrashov, 2010). It is now known that genes may be duplicated by various modes, generally referred to as large-scale and small-scale duplications (Maere et al., 2005; Casneuf et al., 2006; Ganko et al., 2007; Freeling, 2009; Wang et al., 2012). The most frequent consequence of gene duplication is reversion to single-copy (singleton) status (Freeling & Thomas, 2006; Freeling, 2009); however, genes retained in duplicate offer the potential for the evolution of novelty (Ohno, 1970; Flagel & Wendel, 2009; Innan & Kondrashov, 2010). Thus, the study of mechanisms for gene retention and evolution in view of different gene duplication modes is very important (Wang et al., 2012). Oryza sativa (rice) is a good model to elucidate the genetic mechanisms and evolutionary features of different gene duplication modes (Wang et al., 2007, 2011; Li et al., 2009).

Rice has experienced at least two whole-genome duplications (WGDs), one shared with most if not all cereals (ρ), and another more ancient event (σ) (Paterson et al., 2004; Tang et al., 2010). In angiosperm species, most duplicated chromosomal segments are thought to arise from WGDs (Tang et al., 2008a,b). Small-scale gene duplications, often referred to as single-gene duplications, are also widespread in rice (Wang et al., 2007, 2011; Li et al., 2009). According to the physical distance between duplicates, single-gene duplications can be further classified into local and transposed gene duplications (Ganko et al., 2007; Wang et al., 2011, 2012). Local duplications may occur as tandem duplications (i.e. duplicated genes are consecutive in the genome), which may be caused by illegitimate chromosomal recombination (Freeling, 2009), or proximal duplications (i.e. separated by one or more genes), which may be caused by localized transposon activities (Zhao et al., 1998; Wang et al., 2011, 2012). Transposable element (TE)-related genes comprise a significant portion of rice protein-coding genes (Yuan et al., 2005; Jiao & Deng, 2007). TE-related genes have normal gene structures with coding capacity and transcriptional activity, but share significant sequence similarity with known TEs (Jiao & Deng, 2007). Transposed duplications that create two gene copies far away from each other are widespread in plants (Freeling et al., 2008; Freeling, 2009; Woodhouse et al., 2010, 2011; Wang et al., 2011, 2012), suggesting that many non-TE-related genes are also mobile, via either DNA- or RNA-mediated transposition (Cusack & Wolfe, 2007). Transposed duplicates may also occur by intrachromosomal recombination (Woodhouse et al., 2011).

© 2013 The Authors. New Phytologist © 2013 New Phytologist Trust. www.newphytologist.com

Divergence between duplicated genes increases with time, but the rate and extent of divergence are affected by gene duplication modes (Casneuf et al., 2006; Arabidopsis Interactome Mapping Consortium, 2011; Wang et al., 2011). Generally, WGD duplicates are less divergent than other duplicates (Casneuf et al., 2006; Ganko et al., 2007; Li et al., 2009; Wang et al., 2011). Moreover, singletons show higher interspecies conservation than duplicates based on cross-species comparison of genomic and expression data (Ha et al., 2009; Wang et al., 2011). Indeed, the distinct evolutionary effects of gene duplication modes may, in turn, affect the rates of gene retention, depending on functional category-specific selection pressures on neo-functionalization, functional buffering or high expression (Freeling, 2009; Innan & Kondrashov, 2010; Wang et al., 2012).

Under-explored and controversial in the current literature are the roles of epigenetic marks in gene duplication, evolution and retention. DNA methylation is one of the most important epigenetic marks, and high-resolution whole-genome DNA methylation maps based on bisulfite sequencing have been made for rice (Feng et al., 2010; Zemach et al., 2010a,b). Previous analyses of whole-genome DNA methylation data have suggested that rice DNA methylation occurs predominantly at cytosine followed by guanine, that is, ‘CpG’ dinucleotides (Feng et al., 2010; Zemach et al., 2010b). Gene body methylation (DNA methylation of coding regions) is conserved across eukaryotic lineages (Lee et al., 2010; Su et al., 2011). Although it is broadly accepted that promoter methylation is generally associated with the repression of plant gene expression (Zhang et al., 2006; Su et al., 2011), the functional roles of gene body methylation are controversial (Lee et al., 2010; Su et al., 2011). To date, gene body methylation has been suggested to enhance accurate splicing of primary transcripts (Lorincz et al., 2004; Kolasinska-Zwierz et al., 2009; Schwartz et al., 2009; Luco et al., 2010) and/or prevent ‘leaky’ expression from intragenic cryptic promoters (Zilberman et al., 2007; Maunakea et al., 2010). In Arabidopsis and rice, association of gene body methylation with active transcription has been proposed (Zhang et al., 2006; Zilberman et al., 2007; Zemach et al., 2010b; Takuno & Gaut, 2012). By contrast, several studies in rice have suggested that the major effect of body methylation on gene expression is repression (Li et al., 2008; He et al., 2010). From the point of view of evolution, body-methylated genes have been suggested to be functionally important and to evolve slowly (Sarda et al., 2012; Takuno & Gaut, 2012). However, the interplay between gene body methylation and gene duplication, as well as the evolution of duplicate genes, has been little explored.

Study of the potential interplay between gene body methylation and gene origins and duplications may help us to understand the roles of epigenetic factors in shaping current genomes, as well as the mechanisms underlying gene duplications and evolution. In rice, we analyzed single-base resolution, whole-genome DNA methylation maps of five tissues (Zemach et al., 2010a,b). For each gene, we computed a body methylation level (proportion of methylated CpG dinucleotides within coding regions) in each tissue. We classified rice genes into different origins and duplication modes, including TE-related genes, singletons, and WGD, tandem, proximal and transposed duplicates, and compared the body methylation levels among different categories of genes. For duplicated genes, we examined divergence in body methylation levels and its relationship with coding sequence divergence. Furthermore, we studied the potential relationships between body methylation and duplicate gene retention. Finally, we investigated the complicated relationships between body methylation and gene expression levels.

Materials and Methods

Sequence sources

The rice gene set was retrieved from the Rice Genome Annotation Project (TIGR5, http://rice.plantbiology.msu.edu/). The gene sets of outgroups, including Sorghum bicolor, Brachypodium and Zea mays, were retrieved from Phytozome (http://www.phytozome.net/). For each gene, only the first transcript in the genome annotation (transcript name suffixed by ‘.1’) was used for analysis.

Identification of genes of different origins

Rice genes were first divided into TE-related and non-TE-related genes, according to TIGR5. The non-TE-related genes were further classified into WGD duplicates, singletons, tandem, proximal, transposed and dispersed duplicates. To this end, the population of potential gene duplications in rice was identified using BLASTP (Altschul et al., 1990) (TE-related genes were not considered for BLASTP). For each gene, only the top five nonself BLASTP matches that met a threshold of E < 10⁻¹⁰ were considered as potential gene duplication relationships. The genes without any BLASTP hit were deemed singletons. WGD duplicates were obtained from a previous study (Tang et al., 2010). We then derived single-gene duplications by excluding pairs of WGD duplicates from the population of gene duplications. Tandem duplicates were adjacent homologs, and proximal duplicates were not adjacent, but within 10 annotated genes of each other on the same chromosome and without any paralog between them. The remaining single-gene duplications, that is, after deduction of the tandem and proximal duplications, were searched for transposed duplications. To accomplish this aim, genes at ancestral (i.e. interspecies collinear) chromosomal positions were discerned by aligning syntenic blocks within rice and between rice and its outgroups, including Sorghum bicolor, Brachypodium and Zea mays. For a pair of transposed duplicates, we required that one duplicate was at its ancestral locus and the other was at a nonancestral locus, named the parental duplicate and transposed duplicate, respectively. For a transposed duplicate, there may be multiple ancestral paralogs, and we regarded the ancestral paralog with highest sequence identity as its parental duplicate. The remaining duplicates, which do not belong to any of the WGD, tandem, proximal and transposed duplicates, were simply denoted as dispersed duplicates.
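The positional rule for tandem and proximal duplicates can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each gene is described by its chromosome and its ordinal rank among annotated genes on that chromosome, interprets "within 10 annotated genes" as a rank difference of at most 10, and takes the "no paralog between them" condition as a precomputed flag. The function name and parameters are hypothetical.

```python
from typing import Optional


def classify_local_duplicate(chrom_a: str, rank_a: int,
                             chrom_b: str, rank_b: int,
                             paralog_between: bool = False,
                             max_gap: int = 10) -> Optional[str]:
    """Label a single-gene duplicate pair as 'tandem', 'proximal' or None.

    rank_a and rank_b are ordinal gene positions along the chromosome
    (an assumed input format). None means the pair is not a local
    duplication and would be passed on to the transposed/dispersed steps.
    """
    if chrom_a != chrom_b:
        return None
    gap = abs(rank_a - rank_b)
    if gap == 0:
        return None  # same gene, not a pair
    if gap == 1:
        return "tandem"  # adjacent homologs
    # Assumption: "within 10 annotated genes" means a rank gap of at most 10.
    if gap <= max_gap and not paralog_between:
        return "proximal"
    return None
```

A pair at ranks 5 and 6 on the same chromosome would be labelled tandem under these assumptions, and a pair at ranks 5 and 9 with no intervening paralog would be labelled proximal.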

Rice whole-genome DNA methylation data

Rice single-base resolution DNA methylation data of embryo, endosperm, leaf, root and shoot tissues, generated by bisulfite sequencing technology, were obtained from two previous studies (Zemach et al., 2010a,b). We used the processed data provided by the authors, available at the Gene Expression Omnibus database (accession numbers: GSM497260, GSM560562, GSM560563, GSM560564 and GSM560565). In the processed data, the likelihood of methylation was shown for each CpG, CHG and CHH site, whose chromosomal position was annotated according to TIGR5. Only CpG methylation was considered in this study. The likelihood of CpG methylation showed a strong bimodal distribution, and we regarded a value of > 0.5 as methylation of CpG dinucleotides.
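As a minimal sketch of how the body methylation level follows from this threshold, the Python function below takes the per-site methylation likelihoods of the CpG sites in one gene's coding region and returns the proportion called methylated. The function name and input format are hypothetical, and returning None for a gene with no CpG sites is an assumption, since the text does not say how such genes were handled.

```python
from typing import Iterable, Optional


def body_methylation_level(cpg_likelihoods: Iterable[float],
                           threshold: float = 0.5) -> Optional[float]:
    """Proportion of CpG sites in a coding region called methylated.

    A site is called methylated when its likelihood is strictly greater
    than the threshold (0.5 in the text).
    """
    values = list(cpg_likelihoods)
    if not values:
        return None  # assumption: level is undefined without CpG sites
    methylated = sum(1 for value in values if value > threshold)
    return methylated / len(values)
```

For example, likelihoods of 0.9, 0.1, 0.6 and 0.5 give two methylated sites out of four, because a value of exactly 0.5 does not exceed the threshold.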

Comparing the distributions of body methylation levels

As body methylation levels tend to be bimodally distributed, it is not reasonable to compute a single mean and standard deviation of body methylation levels for a gene group. To compare the distributions of body methylation levels of different gene groups, we used both parametric and nonparametric tests: (1) parametric test: we counted the numbers of genes with low methylation (body methylation level < 0.1), intermediate methylation (0.1 ≤ body methylation level ≤ 0.9) and high methylation (body methylation level > 0.9) for each gene group, and then compared these counts between gene groups using a χ² test; and (2) nonparametric test: the comparison of the distributions of body methylation levels between two gene groups was modeled as testing whether one gene group had more outliers (highly body-methylated genes) than the other group. The Outlier-Sum statistic (Tibshirani & Hastie, 2007) was adopted. P values were assessed based on 10⁴ permutations of the pooled body methylation levels of the two gene groups for comparison.
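The binning step and the χ² statistic of test (1) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, only the test statistic is computed, and the P value would still have to be read from a χ² distribution with (rows − 1) × (columns − 1) degrees of freedom using a statistics package. The Outlier-Sum test is not reproduced here.

```python
from collections import Counter
from typing import List, Sequence

BINS = ("low", "intermediate", "high")


def methylation_bin(level: float) -> str:
    """Assign a body methylation level to the bins defined in the text."""
    if level < 0.1:
        return "low"
    if level > 0.9:
        return "high"
    return "intermediate"  # 0.1 <= level <= 0.9


def bin_counts(levels: Sequence[float]) -> List[int]:
    """Gene counts per bin, in the order low, intermediate, high."""
    counts = Counter(methylation_bin(level) for level in levels)
    return [counts[name] for name in BINS]


def chi_square_statistic(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square statistic for a contingency table of counts.

    Rows are gene groups and columns are methylation bins.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip empty bins to avoid division by zero
                statistic += (observed - expected) ** 2 / expected
    return statistic
```

A table for two gene groups could then be built as `[bin_counts(group_a), bin_counts(group_b)]` and passed to `chi_square_statistic`.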

Ks calculation

Protein sequences of duplicated genes were aligned using ClustalW (Thompson et al., 1994) with default parameters. Then, the protein alignment was converted to a coding sequence alignment using the ‘Bio::Align::Utilities’ module in the BioPerl package (http://www.bioperl.org/). Ks was calculated using the methods of Nei & Gojobori (1986) and Yang & Nielsen (2000), via the ‘Bio::Align::DNAStatistics’ and ‘Bio::Tools::Run::Phylo::PAML::Yn00’ modules, respectively, in the BioPerl package. It should be noted that extremely high levels of sequence divergence between duplicated genes may cause the ‘Bio::Align::DNAStatistics’ module to generate invalid Ks values, which were then excluded from the related analysis. Following a previous study in rice (Tang et al., 2010), we excluded Ks values for gene pairs with average third-codon-position GC content (GC3) > 75% from related statistical analyses because there are two distinct groups of genes with significantly different GC3. Ks values > 3.0 were also excluded because of saturated substitutions at synonymous positions.
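The exclusion rules for Ks values can be summarized in a short Python sketch. The function name is hypothetical, GC3 is expressed as a fraction rather than a percentage, and treating missing, NaN or negative values as "invalid" is an assumption, because the text does not specify what an invalid Ks value looks like.

```python
import math
from typing import Optional


def keep_ks(ks: Optional[float], mean_gc3: float,
            max_ks: float = 3.0, max_gc3: float = 0.75) -> bool:
    """Return True if a gene pair's Ks value passes the filters in the text.

    mean_gc3 is the pair's average third-codon-position GC content as a
    fraction between 0 and 1.
    """
    # Assumption: invalid Ks values appear as None, NaN or negative numbers.
    if ks is None or math.isnan(ks) or ks < 0:
        return False
    if ks > max_ks:
        return False  # saturated synonymous substitutions
    if mean_gc3 > max_gc3:
        return False  # high-GC3 gene pairs are excluded
    return True
```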

Gene expression data

Processed rice expression data over 508 tissues and physiological conditions, generated by the Affymetrix GeneChip Rice Genome Array, were obtained from previous studies (Ficklin et al., 2010; Wang et al., 2011). In the data, the numbers of columns that sampled embryo, endosperm, leaf, root and shoot were 3, 4, 50, 99 and 84, respectively. For some genes, there are multiple probe sets on the array to measure their expression. Inclusion or exclusion of ‘suboptimal’ probe sets with suffix ‘_s_at’ or ‘_x_at’, which were suspected of potential cross-hybridization, has been shown previously to have only trivial effects (Wang et al., 2011). In this study, all types of probe sets were considered and, for a gene with multiple probe sets, the first probe set according to alphabetic sorting was used to represent its expression profile.
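The rule for choosing one probe set per gene can be sketched as below. The mapping format and the function name are hypothetical, and Python's default string ordering (by code point) is assumed to stand in for the "alphabetic sorting" mentioned in the text, which may differ in how it treats case or punctuation.

```python
from typing import Dict


def representative_probe_sets(probe_to_gene: Dict[str, str]) -> Dict[str, str]:
    """Pick, for each gene, the first probe set in sorted order.

    probe_to_gene maps a probe set identifier to the gene it measures.
    The result maps each gene to its chosen probe set identifier.
    """
    chosen: Dict[str, str] = {}
    for probe in sorted(probe_to_gene):
        gene = probe_to_gene[probe]
        # setdefault keeps the first (smallest) probe set seen for the gene.
        chosen.setdefault(gene, probe)
    return chosen
```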

Correlation analysis and smoothing spline regression

In this study, correlations were measured by Spearman’s correlation coefficients. Smoothing spline regression was performed via the ‘smooth.spline’ function of the R language. To avoid overfitting in smoothing spline regression, three settings of the degrees of freedom (2, 4 and 6) were tested.
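For readers unfamiliar with Spearman's coefficient, the sketch below computes it in plain Python as the Pearson correlation of average ranks, which handles ties. It is an illustration only: the function names are hypothetical, no P value is computed, and inputs with zero variance are not handled. The smoothing spline fit is not reproduced here.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```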

Results

Gene origins in rice

Like many other eukaryotic species, the rice genome has been shaped and dynamically reconstructed by multiple evolutionary forces and events, which render its genes to have different origins (International Rice Genome Sequencing Project, 2005). TE-related genes are classified on the basis of sharing significant sequence similarity with TEs (Jiao & Deng, 2007). Among non-TE-related genes, those present in only single copies were deemed to be singletons, whereas others were deemed to be duplicated. Duplicated genes were further classified in terms of duplication modes, with those at collinear positions of intraspecies syntenic blocks deemed to be WGD duplicates (Tang et al., 2010). All other duplicates were assumed to have occurred by single-gene duplications, further classified into tandem, proximal and dispersed, as described above. The mechanisms underlying dispersed duplications are very complicated (Wang et al., 2012). However, if one member of a pair of dispersed duplications was at its ancestral locus and the other was at a nonancestral locus, such gene duplications were deemed to be transposed (Wang et al., 2011, 2012). Summary statistics on rice gene origins are shown in Table 1, and the classification of duplicated genes is shown in Supporting Information Table S1.

Body methylation levels show different distributions associated with gene origins and duplication modes

To investigate the patterns of gene body methylation in view of different gene origins and duplication modes, we computed the body methylation level for each gene, defined as the proportion of methylated CpG dinucleotides relative to all CpG dinucleotides within its coding region, in embryo, endosperm, leaf, root and shoot. To test the consistency of body methylation levels across tissues, we visualized the body methylation levels of all genes between all pairs of tissues via scatter plots (Fig. S1). Although endosperm tissue shows higher variations than other tissues, body methylation levels are much more likely to be consistent (rather than different) across tissues, that is, points (genes) are densely distributed along the ‘y = x’ diagonal line in the scatter plots. This analysis indicates that it is feasible to study the evolutionary characteristics of body methylation for large groups of genes with the acknowledgement of the existence of tissue-specific body methylation for specific genes.

A recent study has suggested that gene bodies cluster into two groups corresponding to high and low levels of DNA methylation, respectively, in honeybee, silkworm, sea squirt and sea anemone (Sarda et al., 2012). We plotted the distribution of body methylation levels for all rice genes (Fig. 1a), finding a clear bimodal distribution peaking at ‘0’ or ‘1’, suggesting that gene bodies tend to be either highly methylated or little methylated in rice.

We found that different gene origins differ in the distributions of body methylation levels. First, we compared the distributions of body methylation levels between TE-related and non-TE-related genes, and found that the two distributions were significantly different (P < 2.2 × 10⁻¹⁶, χ²; P < 10⁻⁴, Outlier-Sum statistic; see the Materials and Methods section) (Fig. 1b). Specifically, most TE-related genes are highly body-methylated (body methylation level > 0.9), consistent with previous studies (Zilberman et al., 2007; Li et al., 2008; Feng et al., 2010; He et al., 2010; Zemach et al., 2010b), whereas non-TE-related genes are bimodally distributed, with more genes little body-methylated (body methylation level < 0.1). As noted previously, TE-related genes exhibit much lower transcriptional activities than non-TE-related genes (Jiao & Deng, 2007), suggesting that high levels of body methylation may be associated with reduced transcription, and conflicting with the hypothesis that body methylation has only minor, but positive, effects on the levels of gene expression (Zhang et al., 2006; Zilberman et al., 2007; Zemach et al., 2010b; Takuno & Gaut, 2012).

We compared the distributions of body methylation levels between different origins within non-TE-related genes. Singletons show a higher frequency of high body methylation than do duplicates (Fig. 1c; P < 2.2 × 10⁻¹⁶, χ²; P < 10⁻⁴, Outlier-Sum statistic; see the Materials and Methods section). Tandem, proximal and transposed duplicates show an obvious frequency peak of high body methylation (Fig. 1d), whereas WGD duplicates do not (P < 2.2 × 10⁻¹⁶, χ²; P < 10⁻⁴, Outlier-Sum statistic; see the Materials and Methods section). Moreover, the likelihood of a duplicated gene being highly body-methylated follows the tendency: transposed > proximal > tandem > WGD (P < 2.2 × 10⁻¹⁶, χ²; P < 10⁻⁴, Outlier-Sum statistic; see the Materials and Methods section). In partial summary, body methylation levels show different distributions associated with gene origins and duplication modes, suggesting that genes of different origins tend to have distinct epigenetic features.

Divergence in body methylation levels between duplicated genes

Genes duplicated by different modes differ in the extent of expression divergence and the rewiring of protein–protein networks (De Smet & Van de Peer, 2012; Wang et al., 2012). Here, we examined whether duplicated genes of different modes also differ significantly in divergence in body methylation levels. Divergence in body methylation levels among gene pairs duplicated by different modes (Fig. 2a) showed the following trend: random gene pairs > transposed duplicates > proximal duplicates > tandem duplicates ≥ WGD duplicates (both an ANOVA model involving all duplication modes and Tukey’s honestly significant difference (HSD) test between adjacent duplication modes were significant at α = 0.05), indicating that different modes of gene duplication tend to result in different extents of divergence in body methylation levels. The physical distance between single-gene duplicates (in terms of number of genes apart) also followed a trend: transposed duplicates > proximal duplicates > tandem duplicates. We hypothesized that there may be position effects that affect body methylation levels, for example, genes that are closer to each other on chromosomes tend to have more similar body methylation levels. To this end, we randomly selected 20 000 gene pairs on the same chromosomes and computed the correlations between divergence in body methylation levels and physical distance. These correlations ranged from 0.053 to 0.061 (P < 4.2 × 10⁻¹⁴), indicating that there exist weak position effects that affect body methylation levels for all rice genes. For single-gene duplicates, these correlations ranged from 0.111 to 0.137 (P < 2.2 × 10⁻¹⁶), indicating that the position effects increase slightly for single-gene duplicate pairs relative to random gene pairs. At the same physical distance, single-gene duplicates diverge less in body methylation levels than do random gene pairs (Fig. 2b), suggesting that body methylation patterns are either copied or recapitulated following gene duplication.

Table 1 Statistics on rice (Oryza sativa) genes of different origins and duplication modes

Gene origin        Number of gene pairs    Number of distinct genes
Non-TE-related     N/A                     41 046
Singletons         N/A                     12 618
Duplicates         N/A                     28 428
WGD                3087                    5061
Tandem             2008                    3529
Proximal           2484                    3728
Transposed         6269                    6269
Dispersed          N/A                     12 957
TE-related         N/A                     15 232

N/A, not applicable; TE, transposable element.
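The random-pair analysis of position effects can be sketched as below. This is a simplified illustration, not the authors' procedure: divergence is assumed to be the absolute difference between the two body methylation levels, physical distance is assumed to be the difference in gene rank along the chromosome, and a chromosome is drawn uniformly before a pair is drawn within it, which is not identical to sampling uniformly over all same-chromosome pairs. The function name and input format are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Gene = Tuple[str, int, float]  # (chromosome, rank on chromosome, methylation level)


def sample_pair_divergence(genes: Sequence[Gene], n_pairs: int,
                           seed: int = 0) -> Tuple[List[int], List[float]]:
    """Sample same-chromosome gene pairs; return distances and divergences."""
    rng = random.Random(seed)
    by_chrom: Dict[str, List[Tuple[int, float]]] = {}
    for chrom, rank, level in genes:
        by_chrom.setdefault(chrom, []).append((rank, level))
    eligible = [c for c, members in by_chrom.items() if len(members) >= 2]
    if not eligible:
        raise ValueError("need at least two genes on one chromosome")
    distances: List[int] = []
    divergences: List[float] = []
    for _ in range(n_pairs):
        chrom = rng.choice(eligible)
        (rank_a, level_a), (rank_b, level_b) = rng.sample(by_chrom[chrom], 2)
        distances.append(abs(rank_a - rank_b))      # genes apart (assumed metric)
        divergences.append(abs(level_a - level_b))  # assumed divergence measure
    return distances, divergences
```

The two returned lists could then be passed to a rank correlation routine to estimate the strength of the position effect.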

Relationship between body methylation patterns and Ks for pairs of duplicated genes

To understand how gene body methylation evolves following gene duplication, it may be helpful to relate patterns of body methylation of duplicated genes to the divergence of their coding sequence. Synonymous (Ks) substitution rates largely reflect the neutral mutation rates of coding sequences, suggested to increase approximately linearly with time for relatively low levels of sequence divergence (Li, 1997). We first related divergence in body methylation levels between duplicated genes to Ks using linear regression (Fig. 3a). Positive correlations were found for all duplication modes (0.113 ≤ r ≤ 0.175, P < 2.2 × 10⁻¹⁶). For single-gene duplicates, these correlations ranged from 0.112 to 0.185 (P ≤ 1.081 × 10⁻⁹). However, as we have shown that, for single-gene duplicates, there is a weak correlation between divergence in body methylation levels and physical distance, the position effects could be a nuisance factor for the correlation between divergence in body methylation levels and Ks. To remove the effect of physical distance on these correlations for single-gene duplicates, we computed the partial correlations between divergence in body methylation levels and Ks. These partial correlations ranged from 0.101 to 0.159 (P ≤ 3.794 × 10⁻⁸), declining by 0.01–0.03 from their corresponding correlations, indicating that physical distance has a very weak effect on the correlation between divergence in body methylation levels and Ks. Thus, divergence in body methylation levels between duplicated genes tends to increase with Ks. Moreover, at similar Ks levels, WGDs tend to have smaller divergence in body methylation levels between duplicates than do tandem, proximal or transposed duplications. The different extent of divergence in body methylation levels between gene duplication modes may be explained by the hypothesis that WGDs generate duplicated chromosomal segments in which collinear duplicates are more likely to have similar chromatin environments, whereas single-gene, especially transposed, duplications re-locate to new chromosomal positions which often have different chromatin environments.

Fig. 1 Gene body methylation shows different patterns associated with gene origins and duplication modes. Each column represents one tissue. (a) Distribution of body methylation levels for all rice genes. (b) Comparison of distributions of body methylation levels between transposable element (TE)-related and non-TE-related genes. (c) Comparison of distributions of body methylation levels between singleton and duplicate genes. (d) Comparison of distributions of body methylation levels among whole-genome duplication (WGD), tandem, proximal and transposed duplicates.
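The partial correlation used to discount physical distance can be illustrated with the standard first-order formula, which combines the three pairwise correlations among divergence in body methylation (x), Ks (y) and physical distance (z). The text does not state which formula or software was used, so the Python sketch below is an assumption, and the function name is hypothetical.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z.

    r_xy, r_xz and r_yz are pairwise correlation coefficients. The
    denominator is zero if x or y is perfectly correlated with z.
    """
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator
```

When both correlations with z are weak, the product `r_xz * r_yz` is small and the partial correlation stays close to `r_xy`, which matches the small declines reported above.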

Next, we related the body methylation levels of duplicated genes to Ks using linear regression (Fig. 3b). The direction of the correlations differs among different modes of gene duplication: body methylation of WGD duplicates is positively correlated with Ks (0.051 ≤ r ≤ 0.084, P < 0.05), whereas body methylation of single-gene duplicates decreases with Ks (−0.212 ≤ r ≤ −0.082, P < 9.4 × 10⁻⁴). Some duplicated genes are highly methylated, particularly those generated by single-gene duplications. It is well known that single-gene duplicates have a shorter half-life than WGD-generated duplicates (Lynch & Conery, 2000). Different rates of nonrandom gene loss shortly after WGD and single-gene duplication may contribute to the contrasting directions of the correlations between body methylation levels and Ks. In the first few million years following single-gene duplication, many duplicates become nonfunctionalized and are lost (Innan & Kondrashov, 2010). Biases among these genes may mitigate the long-term tendency towards increased body methylation, as in WGD duplicates, for example if highly body-methylated duplicates are preferentially lost. Thus, there could be links between body methylation patterns and the probability of long-term survival of duplicated genes.

Relationship between gene body methylation and gene expression

The observation that TE-related genes are highly body-methylated, but little expressed, appears to conflict with the observation that body methylation has a positive effect on the levels of gene expression (Zhang et al., 2006; Zilberman et al., 2007; Zemach et al., 2010b; Takuno & Gaut, 2012). However, these two observations might be reconciled if gene body methylation has heterogeneous effects on gene expression, that is, gene body methylation affects gene expression in different ways under different conditions. We plotted the regression lines between gene expression levels and body methylation levels for all non-TE-related genes based on each tissue, using smooth splines with different degrees of freedom (Fig. 4); this showed that intermediate body methylation tends to be associated with higher gene expression levels than both low and high body methylation. To test this observation statistically, we computed the correlations between body methylation levels and expression levels for the genes with body methylation levels of < 0.5 and ≥ 0.5. These correlations ranged from 0.223 to 0.284 (P < 2.29 × 10⁻¹⁶) when the body methylation level was < 0.5, and from −0.182 to −0.101 (P ≤ 1.6489 × 10⁻⁹) when the body methylation level was ≥ 0.5. This result suggests that intermediate body methylation may indeed have positive effects on transcription, possibly through the enhancement of accurate splicing of primary transcripts, whereas high body methylation is more likely to repress gene expression, which may lead to pseudofunctionalization or gene losses.
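The threshold test described above can be sketched as follows. This is an illustration only: Pearson's r is assumed as the correlation measure, the helper names `pearson_r` and `split_correlations` are hypothetical, and the toy values are invented to show a rise-then-fall relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def split_correlations(meth, expr, threshold=0.5):
    """Correlate expression with methylation separately below and at/above threshold."""
    low = [(m, e) for m, e in zip(meth, expr) if m < threshold]
    high = [(m, e) for m, e in zip(meth, expr) if m >= threshold]
    r_low = pearson_r([m for m, _ in low], [e for _, e in low])
    r_high = pearson_r([m for m, _ in high], [e for _, e in high])
    return r_low, r_high

# Toy, made-up values: expression rises up to a methylation level of 0.5, then falls.
toy_meth = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
toy_expr = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 4.0, 3.0, 2.0, 1.0]
r_low, r_high = split_correlations(toy_meth, toy_expr)
```

A single correlation over the whole toy range would be close to zero, whereas splitting at 0.5 gives a positive coefficient in the lower half and a negative one in the upper half. This is the reason a split analysis can expose a heterogeneous relationship that an overall correlation hides.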

We related gene expression to variances of body methylation levels across tissues. Based on Fig. S1, we inferred that TE-related

Fig. 2 Divergence in body methylation levels between duplicated genes. Each column represents one tissue. (a) Comparison of divergence in body methylation levels among different modes of gene duplication. Whiskers correspond to the minimum and maximum values in the data. (b) Linear regressions between divergence in body methylation levels and physical distance for random gene pairs and single-gene duplicate pairs.


genes tend to have more uniform body methylation levels (closer to the ‘y = x’ diagonal line) than do non-TE-related genes, which was then confirmed statistically by a two-sample t-test on the variances of body methylation levels between TE-related and non-TE-related genes (P < 2.29 × 10⁻¹⁶). This observation indicates that the ‘repressive’ TE-related body methylation tends to be uniform across tissues. For non-TE-related genes, we found that there is a significant positive correlation (r = 0.173, P < 2.29 × 10⁻¹⁶) between the average expression levels and variances of body methylation levels, indicating that non-TE-related genes with high expression tend to vary in body methylation across tissues.
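A rough sketch of the variance comparison is given below. The text states only that a two-sample t-test was applied to the variances, so two choices here are assumptions: the sample variance (n − 1 denominator) across tissues, and Welch's unequal-variance form of the t statistic. The names `per_gene_variance` and `welch_t`, the gene labels and all values are hypothetical, and no P value is computed.

```python
from math import sqrt
from statistics import mean, variance

def per_gene_variance(levels_by_gene):
    """Map gene -> sample variance of its body methylation levels across tissues."""
    return {gene: variance(levels) for gene, levels in levels_by_gene.items()}

def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed test form)."""
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

# Toy, made-up body methylation levels in five tissues per gene.
te_genes = {
    "te1": [0.95, 0.96, 0.95, 0.94, 0.95],
    "te2": [0.90, 0.90, 0.91, 0.90, 0.89],
    "te3": [0.99, 0.98, 0.99, 0.99, 0.99],
}
non_te_genes = {
    "g1": [0.1, 0.5, 0.3, 0.7, 0.2],
    "g2": [0.4, 0.4, 0.6, 0.2, 0.5],
    "g3": [0.0, 0.3, 0.1, 0.2, 0.6],
}

te_vars = list(per_gene_variance(te_genes).values())
non_te_vars = list(per_gene_variance(non_te_genes).values())
t_stat = welch_t(te_vars, non_te_vars)
```

A negative `t_stat` here means that the toy TE-related genes have smaller across-tissue variances than the toy non-TE-related genes, which is the direction described in the text.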

Discussion

We have related gene body methylation to gene origins and duplication modes in rice. Our results suggest that genes of different origins and duplication modes are associated with different patterns of gene body methylation, and highly body-methylated genes are preferentially lost following gene duplication. Although it is known that natural variations in DNA methylation exist among individuals of a species (Becker et al., 2011; Bell et al., 2011; Fraser et al., 2012) and that, within an individual, many cytosines may be differentially methylated among different tissues (Zemach et al., 2010a; Zhang et al., 2011; Vining et al., 2012) or

Fig. 3 Relationships between patterns of body methylation and Ks for duplicated genes. Each column represents one tissue. (a) Linear regressions between divergence in body methylation levels and Ks for different modes of gene duplication. (b) Linear regressions between body methylation levels and Ks for different modes of gene duplication.


developmental stages (Alisch et al., 2012), or between normal and stress conditions (Chinnusamy & Zhu, 2009), our analyses of body methylation patterns based on five different tissues reveal highly consistent evolutionary trends. We summarized a body methylation level for each gene that may involve hundreds of CpG dinucleotides. Further, we compared body methylation levels among large groups of genes with each group consisting of several thousand genes. Thus, our computational procedure, through mitigation of the effect of dynamic changes of methylation status that may occur at some cytosine nucleotides, is reliable for large-scale evolutionary analyses.
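The per-gene summary mentioned above (the proportion of methylated CpG within coding regions) can be sketched as follows. This is a simplified illustration: it assumes that each CpG site has already been called as methylated or unmethylated, considers only the given strand, and uses a hypothetical function name, toy sequence and toy position set.

```python
def body_methylation_level(cds, methylated_positions):
    """Fraction of CpG dinucleotides in `cds` whose cytosine index is marked methylated.

    `methylated_positions` holds 0-based indices of cytosines called methylated
    (an assumed input format). Returns None when the sequence has no CpG.
    """
    seq = cds.upper()
    cpg_sites = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]
    if not cpg_sites:
        return None
    methylated = sum(1 for i in cpg_sites if i in methylated_positions)
    return methylated / len(cpg_sites)

# Toy coding sequence with CpG cytosines at indices 3, 7 and 9.
toy_cds = "ATGCGTACGCGA"
level = body_methylation_level(toy_cds, {3, 9})
```

Because the level averages over every CpG in the coding region, a change in status at a single cytosine shifts the value by only one part in the number of CpG sites, which is the sense in which a per-gene summary dampens site-level dynamics.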

DNA methylation is an important epigenetic mark and can affect the nucleotide composition of DNA sequences. DNA methylation can trigger the spontaneous deamination of methylcytosine to thymine (Bird, 1980; Jones et al., 1987; Pfeifer, 2006), which makes DNA methylation levels and GC levels interdependent. The data of this study showed strong negative correlations (−0.514 ≤ r ≤ −0.458, P < 2.29 × 10⁻¹⁶) between

Fig. 4 Gene body methylation has heterogeneous effects on gene expression. Smooth spline curves are fitted between gene expression levels and body methylation levels for all non-transposable element (TE)-related genes, based on different degrees of freedom. A body methylation level of 0.5 appears to be a point dividing the up- and down-regulation of gene expression levels.


body methylation levels and the GC content at the third codon position (GC3) for rice genes. The evolution of DNA methylation patterns and DNA sequences can be intermingled, and the study of DNA methylation evolution may facilitate the understanding of mechanisms for DNA sequence evolution.
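For readers unfamiliar with GC3, a minimal sketch of how it can be computed from a coding sequence is shown below. The function name `gc3` and the toy sequence are hypothetical, and whether the stop codon is counted is an assumption (it is counted here).

```python
def gc3(cds):
    """GC content at third codon positions of an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    Returns None when there is no complete codon.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    third_positions = seq[2:usable:3]
    if not third_positions:
        return None
    gc_count = sum(1 for base in third_positions if base in ("G", "C"))
    return gc_count / len(third_positions)

# Toy sequence: codons ATG GCC AAA TAA have third bases G, C, A, A.
toy_gc3 = gc3("ATGGCCAAATAA")
```

Since deamination of methylcytosine converts C to T, genes that are heavily methylated over long periods would be expected to lose GC at such sites, which is consistent with the negative correlation reported above.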

In eukaryotic genomes, there are multiple epigenetic marks, including DNA methylation, histone modifications, nucleosome positioning and others, all of which may contribute to the regulation of gene expression (Henderson & Jacobsen, 2007). Among these epigenetic marks, DNA methylation has been studied extensively for its role in the regulation of gene expression. In rice, Li et al. (2008) showed an interplay between DNA methylation, histone methylation and gene expression, and that gene expression appeared to be repressed by DNA methylation, but to be rescued by the concurrence of DNA and H3K4 methylation. He et al. (2010) found a weak negative correlation between DNA methylation and transcript levels, and that TE-related genes are highly methylated and little transcribed. In Populus trichocarpa, gene body methylation is suggested to have a more repressive effect than promoter methylation on transcription (Vining et al., 2012). By contrast, in Arabidopsis, many studies have suggested that gene body methylation is associated with active transcription (Zhang et al., 2006; Zilberman et al., 2007; Takuno & Gaut, 2012). The conflicting conclusions on the direction of the relationship between body methylation and gene expression in previous studies may be because an overall correlation pattern has often been sought, overlooking the possibility that body methylation may have heterogeneous effects on gene expression.

In conclusion, in rice, using the proportion of methylated CpG dinucleotides within coding regions to measure the level of gene body methylation, we found that body methylation levels follow a bimodal distribution peaking at ‘0’ or ‘1’, and display distinct patterns associated with different gene origins and duplication modes. For pairs of duplicated genes, divergence in body methylation levels increases with physical distance and Ks, and WGDs show lower divergence than single-gene duplications at similar Ks levels. Body methylation of WGD duplicates tends to increase with Ks, whereas the body methylation levels of single-gene duplicates decrease with Ks, indicating that highly body-methylated genes are preferentially lost following gene duplication. Moderate body methylation tends to enhance gene expression, whereas light or heavy body methylation tends to repress gene expression. This study suggests that genes of different origins and duplication modes have distinct body methylation patterns, and that body methylation evolves with DNA sequence evolution, has heterogeneous effects on gene expression and might be related to survivorship of duplicated genes.

Acknowledgements

We thank Barry Marler for IT support, Xinyu Liu for statistical consulting and Haibao Tang for providing Python scripts. A.H.P. appreciates funding from the National Science Foundation (NSF: DBI 0849896, MCB 0821096, MCB 1021718). This study was supported in part by resources and technical expertise from the Georgia Advanced Computing Resource Center, a partnership between the Office of the Vice President for Research and the Office of the Chief Information Officer.

References

Alisch RS, Barwick BG, Chopra P, Myrick LK, Satten GA, Conneely KN, Warren ST. 2012. Age-associated DNA methylation in pediatric populations. Genome Research 22: 623–632.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. Journal of Molecular Biology 215: 403–410.

Arabidopsis Interactome Mapping Consortium. 2011. Evidence for network evolution in an Arabidopsis interactome map. Science 333: 601–607.

Becker C, Hagmann J, Muller J, Koenig D, Stegle O, Borgwardt K, Weigel D. 2011. Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature 480: 245–249.

Bell JT, Pai AA, Pickrell JK, Gaffney DJ, Pique-Regi R, Degner JF, Gilad Y, Pritchard JK. 2011. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biology 12: R10.

Bird AP. 1980. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Research 8: 1499–1504.

Casneuf T, De Bodt S, Raes J, Maere S, Van de Peer Y. 2006. Nonrandom divergence of gene expression following gene and genome duplications in the flowering plant Arabidopsis thaliana. Genome Biology 7: R13.

Chinnusamy V, Zhu JK. 2009. Epigenetic regulation of stress responses in plants. Current Opinion in Plant Biology 12: 133–139.

Cusack BP, Wolfe KH. 2007. Not born equal: increased rate asymmetry in relocated and retrotransposed rodent gene duplicates. Molecular Biology and Evolution 24: 679–686.

De Smet R, Van de Peer Y. 2012. Redundancy and rewiring of genetic networks following genome-wide duplication events. Current Opinion in Plant Biology 15: 168–176.

Feng S, Cokus SJ, Zhang X, Chen PY, Bostick M, Goll MG, Hetzel J, Jain J, Strauss SH, Halpern ME et al. 2010. Conservation and divergence of methylation patterning in plants and animals. Proceedings of the National Academy of Sciences, USA 107: 8689–8694.

Ficklin SP, Luo F, Feltus FA. 2010. The association of multiple interacting genes with specific phenotypes in rice using gene coexpression networks. Plant Physiology 154: 13–24.

Flagel LE, Wendel JF. 2009. Gene duplication and evolutionary novelty in plants. New Phytologist 183: 557–564.

Fraser HB, Lam LL, Neumann SM, Kobor MS. 2012. Population-specificity of human DNA methylation. Genome Biology 13: R8.

Freeling M. 2009. Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annual Review of Plant Biology 60: 433–453.

Freeling M, Lyons E, Pedersen B, Alam M, Ming R, Lisch D. 2008. Many or most genes in Arabidopsis transposed after the origin of the order Brassicales. Genome Research 18: 1924–1937.

Freeling M, Thomas BC. 2006. Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Research 16: 805–814.

Ganko EW, Meyers BC, Vision TJ. 2007. Divergence in expression between duplicated genes in Arabidopsis. Molecular Biology and Evolution 24: 2298–2309.

Ha M, Kim ED, Chen ZJ. 2009. Duplicate genes increase expression diversity in closely related species and allopolyploids. Proceedings of the National Academy of Sciences, USA 106: 2295–2300.

He G, Zhu X, Elling AA, Chen L, Wang X, Guo L, Liang M, He H, Zhang H, Chen F et al. 2010. Global epigenetic and transcriptional trends among two rice subspecies and their reciprocal hybrids. Plant Cell 22: 17–33.

Henderson IR, Jacobsen SE. 2007. Epigenetic inheritance in plants. Nature 447: 418–424.

Innan H, Kondrashov F. 2010. The evolution of gene duplications: classifying and distinguishing between models. Nature Reviews Genetics 11: 97–108.

International Rice Genome Sequencing Project. 2005. The map-based sequence of the rice genome. Nature 436: 793–800.


Jiao Y, Deng XW. 2007. A genome-wide transcriptional activity survey of rice transposable element-related genes. Genome Biology 8: R28.

Jones M, Wagner R, Radman M. 1987. Mismatch repair of deaminated 5-methyl-cytosine. Journal of Molecular Biology 194: 155–159.

Kolasinska-Zwierz P, Down T, Latorre I, Liu T, Liu XS, Ahringer J. 2009. Differential chromatin marking of introns and expressed exons by H3K36me3. Nature Genetics 41: 376–381.

Lee TF, Zhai J, Meyers BC. 2010. Conservation and divergence in eukaryotic DNA methylation. Proceedings of the National Academy of Sciences, USA 107: 9027–9028.

Li WH. 1997. Molecular evolution. Sunderland, MA, USA: Sinauer Associates.

Li X, Wang X, He K, Ma Y, Su N, He H, Stolc V, Tongprasit W, Jin W, Jiang J et al. 2008. High-resolution mapping of epigenetic modifications of the rice genome uncovers interplay between DNA methylation, histone methylation, and gene expression. Plant Cell 20: 259–276.

Li Z, Zhang H, Ge S, Gu X, Gao G, Luo J. 2009. Expression pattern divergence of duplicated genes in rice. BMC Bioinformatics 10(Suppl 6): S8.

Lorincz MC, Dickerson DR, Schmitt M, Groudine M. 2004. Intragenic DNA methylation alters chromatin structure and elongation efficiency in mammalian cells. Nature Structural & Molecular Biology 11: 1068–1075.

Luco RF, Pan Q, Tominaga K, Blencowe BJ, Pereira-Smith OM, Misteli T. 2010. Regulation of alternative splicing by histone modifications. Science 327: 996–1000.

Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155.

Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y. 2005. Modeling gene and genome duplications in eukaryotes. Proceedings of the National Academy of Sciences, USA 102: 5454–5459.

Maunakea AK, Nagarajan RP, Bilenky M, Ballinger TJ, D’Souza C, Fouse SD, Johnson BE, Hong C, Nielsen C, Zhao Y et al. 2010. Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466: 253–257.

Nei M, Gojobori T. 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Molecular Biology and Evolution 3: 418–426.

Ohno S. 1970. Evolution by gene duplication. New York, NY, USA: Springer.

Paterson AH, Bowers JE, Chapman BA. 2004. Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proceedings of the National Academy of Sciences, USA 101: 9903–9908.

Pfeifer GP. 2006. Mutagenesis at methylated CpG sequences. DNA Methylation: Basic Mechanisms 301: 259–281.

Sarda S, Zeng J, Hunt BG, Yi SV. 2012. The evolution of invertebrate gene body methylation. Molecular Biology and Evolution 29: 1907–1916.

Schwartz S, Meshorer E, Ast G. 2009. Chromatin organization marks exon–intron structure. Nature Structural & Molecular Biology 16: 990–995.

Su Z, Han L, Zhao Z. 2011. Conservation and divergence of DNA methylation in eukaryotes: new insights from single base-resolution DNA methylomes. Epigenetics 6: 134–140.

Takuno S, Gaut BS. 2012. Body-methylated genes in Arabidopsis thaliana are functionally important and evolve slowly. Molecular Biology and Evolution 29: 219–227.

Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH. 2008a. Synteny and collinearity in plant genomes. Science 320: 486–488.

Tang H, Bowers JE, Wang X, Paterson AH. 2010. Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proceedings of the National Academy of Sciences, USA 107: 472–477.

Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH. 2008b. Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Research 18: 1944–1954.

Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research 22: 4673–4680.

Tibshirani R, Hastie T. 2007. Outlier sums for differential gene expression analysis. Biostatistics 8: 2–8.

Vining KJ, Pomraning KR, Wilhelm LJ, Priest HD, Pellegrini M, Mockler TC, Freitag M, Strauss SH. 2012. Dynamic DNA cytosine methylation in the Populus trichocarpa genome: tissue-level variation and relationship to gene expression. BMC Genomics 13: 27.

Wang X, Tang H, Bowers JE, Feltus FA, Paterson AH. 2007. Extensive concerted evolution of rice paralogs and the road to regaining independence. Genetics 177: 1753–1763.

Wang Y, Wang X, Paterson AH. 2012. Genome and gene duplications and gene expression divergence: a view from plants. Annals of the New York Academy of Sciences 1256: 1–14.

Wang Y, Wang X, Tang H, Tan X, Ficklin SP, Feltus FA, Paterson AH. 2011. Modes of gene duplication contribute differently to genetic novelty and redundancy, but show parallels across divergent angiosperms. PLoS ONE 6: e28150.

Woodhouse MR, Pedersen B, Freeling M. 2010. Transposed genes in Arabidopsis are often associated with flanking repeats. PLoS Genetics 6: e1000949.

Woodhouse MR, Tang H, Freeling M. 2011. Different gene families in Arabidopsis thaliana transposed in different epochs and at different frequencies throughout the rosids. Plant Cell 23: 4241–4253.

Yang Z, Nielsen R. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution 17: 32–43.

Yuan Q, Ouyang S, Wang A, Zhu W, Maiti R, Lin H, Hamilton J, Haas B, Sultana R, Cheung F et al. 2005. The institute for genomic research Osa1 rice genome annotation database. Plant Physiology 138: 18–26.

Zemach A, Kim MY, Silva P, Rodrigues JA, Dotson B, Brooks MD, Zilberman D. 2010a. Local DNA hypomethylation activates genes in rice endosperm. Proceedings of the National Academy of Sciences, USA 107: 18729–18734.

Zemach A, McDaniel IE, Silva P, Zilberman D. 2010b. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328: 916–919.

Zhang M, Xu C, von Wettstein D, Liu B. 2011. Tissue-specific differences in cytosine methylation and their association with differential gene expression in sorghum. Plant Physiology 156: 1955–1966.

Zhang X, Yazaki J, Sundaresan A, Cokus S, Chan SW, Chen H, Henderson IR, Shinn P, Pellegrini M, Jacobsen SE et al. 2006. Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell 126: 1189–1201.

Zhao XP, Si Y, Hanson RE, Crane CF, Price HJ, Stelly DM, Wendel JF, Paterson AH. 1998. Dispersed repetitive DNA has spread to new genomes since polyploid formation in cotton. Genome Research 8: 479–492.

Zilberman D, Gehring M, Tran RK, Ballinger T, Henikoff S. 2007. Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nature Genetics 39: 61–69.

Supporting Information

Additional supporting information may be found in the online version of this article.

Fig. S1 Comparison of body methylation levels of all genes between all pairs of tissues.

Table S1 Classification of rice duplicated genes

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
