
Near Homogeneity of PR2-Bias Fingerprints in the Human Genome and Their Implications in Phylogenetic Analyses

Noboru Sueoka

University of Colorado, Department of Molecular, Cellular, and Developmental Biology, Boulder, CO 80309-0347, USA

Received: 21 December 2000 / Accepted: 16 February 2001

Abstract. Genes of a multicellular organism are heterogeneous in G+C content, particularly at the third codon position. The extent of deviation from the intra-strand equality rule of A = T and G = C (Parity Rule 2, or PR2) is specific for individual amino acids and has been expressed as the PR2-bias fingerprint. Previous results suggested that the PR2-bias fingerprints tend to be similar among the genes of an organism, and that the fingerprint of the organism is specific for different taxa, reflecting phylogenetic relationships of organisms. In this study, using coding sequences of a large number of human genes, we examined the intragenomic heterogeneity of their PR2-bias fingerprints in relation to the G+C content of the third codon position (P3). The results show that the PR2-bias fingerprint is similar over a wide range of the G+C content at the third codon position (0.30–0.80). This range covers approximately 89% of the genes, and further analysis of the high G+C range (0.80–1.00), where genes with normal PR2-bias fingerprints and those with anomalous fingerprints are mixed, shows that a total of 95% of genes have similar fingerprints. The result indicates that the PR2-bias fingerprint is a unique property of an organism and represents the overall characteristics of the genome. Combined with the previous finding that evolutionary change of the PR2-bias fingerprint is a slow process, PR2-bias fingerprints may be used in phylogenetic analyses to supplement and augment the conventional methods that use the differences between the sequences of orthologous proteins and nucleic acids. Potential advantages and disadvantages of the PR2-bias fingerprint analysis are discussed.

Key words: PR2-bias fingerprints; DNA G+C contents; Phylogenetic analyses

Introduction

Parity Rule 2 (PR2) is a rule of DNA base composition that dictates A = T and G = C within a strand when there are no biases in mutation and selection pressures between the two strands of DNA (Sueoka 1995). (Here, A, T, G, and C are the fractional contents of the four bases A, T, G, and C, where A + T + G + C = 1.) The intra-strand relationship, A = T and G = C, was first observed experimentally by Chargaff and his collaborators in long stretches of contiguous single strands of Bacillus subtilis DNA (Karkas et al. 1968; Rudner et al. 1968, 1969). Their observation was true for the average intra-strand composition of long stretches of nucleotides. However, the cause of the rule was unknown. PR2 was theoretically proven in terms of intra-strand rates for the case where mutation and selection are equally effective (or random) in both strands of DNA (Sueoka, 1995; see also Wu and Maeda, 1987; Wu, 1991; Furusawa and Doi, 1992; Lobry, 1995). In reality, this rule has been observed only in inter-genic regions of bacterial genomes (unpublished data) and not in coded regions. Intra-strand biases from A = T and G = C are commonly observed in coded regions in amino acid- and species-specific manners (Sueoka, 1995). Deviations from PR2 have been visualized by plotting G3/(G3 + C3) as the abscissa and A3/(A3 + T3) as the ordinate for the eight four-codon amino acids (alanine, arginine4, glycine, leucine4, proline, serine4, threonine, and valine) (Sueoka, 1995). (Here, A3, T3, G3, and C3 are the frequencies of A, T, G, and C nucleotides at the third codon position.) The amino acid-specific pattern of the PR2-bias plot is unique to taxa and reflects phylogenetic relationships of organisms, and it has been termed the “PR2-bias fingerprint” (Sueoka, 1995, 1999a).

J Mol Evol (2001) 53:469–476
DOI: 10.1007/s002390010237

© Springer-Verlag New York Inc. 2001

The previous studies (Sueoka, 1995, 1999b) showed that the PR2-bias fingerprint is an evolutionarily conservative characteristic in bacteria as well as in vertebrates. The PR2-bias fingerprints have a definite similarity among bacterial species that are related but have different DNA G+C contents, for example, among Escherichia coli (0.55), Serratia marcescens (0.71), and Pseudomonas aeruginosa (0.82). Here, the value in parentheses is the average G+C content of the third codon position (GC3). PR2-bias fingerprints are also similar among the vertebrates examined so far, except for fishes, whose fingerprints have basic similarities with those of other vertebrates but with some clear differences (Sueoka 1995 and unpublished results).

In human, the PR2-bias fingerprints of genes have been analyzed for 846 genes (Sueoka, 1999b) and subsequently for 14,026 genes (Sueoka and Kawanishi 2000). The third codon position of individual genes has widely heterogeneous G+C contents. The G+C contents of 87% of the analyzed genes were between 0.25 and 0.75, and these genes had a similar PR2-bias fingerprint. The G+C content of the rest of the genes (13%) was either lower than 0.25 or higher than 0.75, and these genes had anomalous fingerprints. Further analyses indicated that the anomalous fingerprints in these regions were most likely due to mixtures of genes with normal and anomalous fingerprints. After this correction, a more likely estimate of the fraction of normal genes was approximately 95%. These results indicate that the PR2-bias fingerprint is a unique property of the genome, independent of the G+C content of individual genes.

The purpose of the present work was to examine further how intra-genomic heterogeneity in human influences the PR2-bias fingerprint by investigating the 36,533 nonredundant sequenced genes currently available in GenBank. The PR2-bias fingerprint was assessed for its value as a tool for the analysis of phylogenetic relations that could supplement the conventional methods and, in some cases, may provide critical information for definitive phylogenetic relationships. The present results support the previous conclusions that the PR2-bias fingerprints are independent of the G+C content in the majority of genes and can be applied between widely distant taxa. It is also noted that the PR2-bias fingerprint provides detailed information on the codon usage for the eight four-codon amino acids.

Materials and Methods

Dataset. The codon usage tables of human, mouse, and chicken genes were extracted from the NCBI server (ftp://ncbi.nlm.nih.gov/genbank/genomes/) through the ACNUC system developed at the Université de Lyon 1 (Gouy et al. 1985). Only sequences with more than 100 codons were used for the analyses, and redundant genes that have the same frequencies for all the codons were eliminated. At the author’s request, J. R. Lobry developed the computer programs for data delivery and listed the filtered genes through ACNUC. Numerical analyses and plotting of the data were carried out using MS Excel 2000 and SigmaPlot 2000.
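The filtering step described above can be sketched as follows. This is a minimal illustration and not the ACNUC-based programs used in the study: the function names and the (name, sequence) input format are hypothetical, and redundancy is assumed here to mean identical codon counts for every codon.

```python
from collections import Counter


def codon_counts(cds):
    """Count complete codons in a coding sequence (a trailing partial codon is ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def filter_genes(genes, min_codons=100):
    """Return the names of genes that pass the length and redundancy filters.

    genes: iterable of (name, coding_sequence) pairs (hypothetical input format).
    """
    kept = []
    seen_profiles = set()
    for name, cds in genes:
        counts = codon_counts(cds)
        if sum(counts.values()) <= min_codons:
            continue  # the text keeps only sequences with more than 100 codons
        # Assumption: two genes are redundant when all codon counts are identical.
        profile = frozenset(counts.items())
        if profile in seen_profiles:
            continue
        seen_profiles.add(profile)
        kept.append(name)
    return kept
```

The first sequence with a given codon profile is kept and later copies are dropped; which copy the original programs retained is not stated in the text.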

In the human genetic code, the third codon position of synonymous codons includes A/T (W) and G/C (S) nucleotides in equal number (symmetric) for most amino acids, except for tryptophan (TGG), methionine (ATG), and one (ATA) of the three isoleucine codons. P3 is a measure of the G+C content of the third codon position, defined as the G+C content of the third codon position for the total codons from which ATG, TGG, ATA, and the termination codons (TAA, TAG, and TGA) have been removed. As in the previous report (Sueoka, 1999a), these six codons were also removed from the calculation of the G+C content of the first codon position (P1) and that of the second codon position (P2). P12 is the average of P1 and P2. The removal of these six codons from the analysis eliminates odd-numbered synonymous codon sets and therefore avoids an extra cause of potential bias from PR2. In practice, the parameter P3 is only slightly different from the G+C content of all third codon positions (GC3).
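As an illustration of these definitions, the following sketch computes P1, P2, P3, and P12 for one coding sequence after removing the six excluded codons. The function name is hypothetical, and skipping codons that contain ambiguous bases is an added assumption not stated in the text.

```python
# Codons removed before computing P1, P2, and P3, as listed in the text.
EXCLUDED = {"ATG", "TGG", "ATA", "TAA", "TAG", "TGA"}


def position_gc(cds):
    """Return P1, P2, P3, and P12 for one coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Assumption: codons with non-ACGT symbols are ignored.
    codons = [c for c in codons if c not in EXCLUDED and set(c) <= set("ACGT")]
    if not codons:
        raise ValueError("no usable codons")
    n = len(codons)
    p1, p2, p3 = (sum(c[i] in "GC" for c in codons) / n for i in range(3))
    return {"P1": p1, "P2": p2, "P3": p3, "P12": (p1 + p2) / 2}
```

For example, in the sequence ATG GCT GCC AAA TGA only GCT, GCC, and AAA are retained, so P1 and P2 are both 2/3 and P3 is 1/3.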

Relative Neutrality Plots. In the plot of P12 against P3 (Fig. 1), each point represents a gene. If P12 values were as neutral as P3 against selection, the data points would fit the diagonal line. Thus, a slope smaller than 1 indicates that the neutrality of P12 is less than that of P3. Accordingly, the slope provides a measure of the relative neutrality of P (e.g., P1, P2, or P12; distribution on the ordinate) to that of P3. The rationale for using P3 as the neutrality standard, where the neutrality of P3 is assumed to be 1, has been discussed previously in detail (Sueoka 1988, 1999a). The merit of this plot is threefold: (1) the P3-P plot shows the intra-genomic or inter-species distribution of P as well as of P3 (distribution on the abscissa); (2) the regression coefficient (slope) provides an estimate of the degree of neutrality of P relative to P3; (3) the intersection (OP) of the regression line of P with the diagonal line represents the G+C content that is optimal for P, where restrictive selection (1 − neutrality) is not effective on P (Sueoka 1988, 1993). In this study, to analyze the PR2-bias fingerprints of genes in individual P3 ranges, genes were subdivided into subclasses by P3 intervals of 0.05 (Fig. 1).
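The quantities read from the neutrality plot can be sketched with an ordinary least-squares fit of P12 on P3. This is only an illustration: the study used Excel and SigmaPlot, the function names are hypothetical, and the binning helper assumes P3 values rounded to three decimals and half-open intervals [0.05k, 0.05(k + 1)).

```python
def neutrality_fit(p3_values, p12_values):
    """Least-squares regression of P12 on P3.

    Returns (slope, intercept, optimum), where optimum is the P3 value at which
    the regression line crosses the diagonal P12 = P3 (None if the slope is 1).
    """
    n = len(p3_values)
    mean_x = sum(p3_values) / n
    mean_y = sum(p12_values) / n
    sxx = sum((x - mean_x) ** 2 for x in p3_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(p3_values, p12_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    optimum = intercept / (1 - slope) if slope != 1 else None
    return slope, intercept, optimum


def p3_interval(p3):
    """Index k of the 0.05-wide P3 interval [0.05k, 0.05(k + 1)); 1.00 joins the last one."""
    # Integer arithmetic on thousandths avoids floating-point edge effects at 0.30, 0.80, etc.
    return min(round(p3 * 1000) // 50, 19)
```

With this convention, a gene with P3 = 0.52 falls in interval 10 (0.50–0.55), and the slope returned by `neutrality_fit` corresponds to the relative neutrality discussed above.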

PR2-bias Plot. Plotting AT-bias [A/(A + T)] as the ordinate and GC-bias [G/(G + C)] as the abscissa shows PR2 biases as a unique pattern (Sueoka 1995). The center of the plot, where both coordinates are 0.5, is the place where A = T and G = C (PR2). A vector from the center represents the extent and direction of biases from PR2. PR2-bias plots are particularly informative when PR2 biases at the third codon position of the four-codon amino acids of individual genes are plotted. In this case, “A3/(A3 + T3) | 4” and “G3/(G3 + C3) | 4” are plotted as the ordinate and abscissa, respectively. Here “| 4” denotes the four-codon amino acids. The four-codon amino acids are alanine, arginine4 (CGA, CGT, CGG, CGC), glycine, leucine4 (CTA, CTT, CTG, CTC), proline, serine4 (TCA, TCT, TCG, TCC), threonine, and valine.
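The two coordinates of the plot can be computed per amino acid for a single gene as in the sketch below. The family prefixes follow the standard genetic code; the function and dictionary names are hypothetical, and returning None when a ratio is undefined is an assumption made for illustration.

```python
from collections import Counter

# First two bases of each four-codon family (standard genetic code).
FOUR_CODON_FAMILIES = {
    "Ala": "GC", "Arg4": "CG", "Gly": "GG", "Leu4": "CT",
    "Pro": "CC", "Ser4": "TC", "Thr": "AC", "Val": "GT",
}
PREFIX_TO_AA = {prefix: aa for aa, prefix in FOUR_CODON_FAMILIES.items()}


def pr2_bias(cds):
    """Return {amino acid: (G3/(G3+C3), A3/(A3+T3))} for the eight four-codon families."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third = {aa: Counter() for aa in FOUR_CODON_FAMILIES}
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        aa = PREFIX_TO_AA.get(codon[:2])
        if aa is not None and codon[2] in "ACGT":
            third[aa][codon[2]] += 1
    result = {}
    for aa, n in third.items():
        gc = n["G"] + n["C"]
        at = n["A"] + n["T"]
        # None marks a ratio that is undefined because the denominator is zero.
        result[aa] = (n["G"] / gc if gc else None, n["A"] / at if at else None)
    return result
```

Each tuple is (abscissa, ordinate); a value of (0.5, 0.5) corresponds to the PR2 center of the plot.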

Results

PR2-bias Fingerprints and the DNA G+C Content

In the present study, PR2-bias fingerprints were analyzed for 36,533 nonredundant human genes whose P3 range is from 0.124 to 0.995 (Fig. 1). Because only three genes are registered in the lowest P3 range (0.10–0.20) (Table 1), analyses using P3-P12 plotting and PR2-bias fingerprinting were performed on the genes in P3 ranges between 0.20 and 1.00 (36,530 genes). The results are shown in Figs. 1 and 2. Fig. 1 shows the P3-P12 plot and the distribution of genes in each 0.05 P3 interval. The regression coefficient (0.196) shows that the average neutrality of P12 is 20% relative to the neutrality of P3 (assumed to be 1). A typical neutrality value in mammals is 20%.

Fig. 2A presents the PR2-bias fingerprint that consists of the average PR2 biases in the third codon position for the eight four-codon amino acids of all genes. Fig. 2B presents the fingerprint of the four-codon amino acids of the genes within the P3 range from 0.30 to 0.80, which excludes the anomalous fingerprints in both extreme G+C ranges. Fig. 2C includes the two lowest G+C ranges (0.20–0.25 and 0.25–0.30), which show anomalous fingerprints. Fig. 2D includes the four contiguous ranges covering 0.80–1.00, where anomalous fingerprints are observed. The PR2-bias fingerprints in individual P3 ranges between 0.30 and 0.80 were similar to the combined fingerprint shown in Fig. 2B and are therefore not shown in this article. Fingerprints for individual ranges were presented in a previous article with 14,026 genes (Sueoka and Kawanishi, 2000), where the fingerprint patterns were similar. The present result, therefore, solidifies the previous conclusions based on smaller samples of human genes [846 genes (Sueoka 1999b) and 14,026 genes (Sueoka and Kawanishi 2000)] on the similarity of the PR2 fingerprints in wide consecutive ranges of the P3 values (0.25–0.75).

Anomalous Features of PR2 Biases in Extreme Ranges of P3

Anomalous features of PR2 biases were first found in human genes in the high G+C regions (Sueoka 1999b; Sueoka and Kawanishi 2000) and confirmed in the present study (Figs. 2A,B). Whereas the PR2-bias fingerprint is similar in the G+C range of 0.30–0.80 (32,678 genes), it is different in both extreme regions, one in the range of 0.20–0.30 (408 genes or 1.1%) and the other in 0.80–1.00 (3,824 genes or 11%). Based on the gradual change, the anomalous PR2-bias fingerprints in the high G+C regions were interpreted as a mixture of genes with normal fingerprints and genes with anomalous fingerprints. The relative frequencies of the genes with the two types of fingerprints were estimated from the G3/(G3 + C3) values of alanine and glycine in the high G+C ranges (Sueoka and Kawanishi 2000). The frequency of anomalous genes increased toward the higher ranges within the high G+C region, and the average frequency was approximately 1/3 of the genes in the range. It was concluded that genes with anomalous fingerprints make up approximately 4% of the total number of genes analyzed. When the same analysis has been applied to the present data, the

Table 1. Human sequences in the low G+C range (0.10–0.30) of the third codon position (P3)

                                                         PR2 biases
Gene name  Sequence ID  Codons  GC-cont  GC3    P3     G3/(G3 + C3)  A3/(A3 + T3)  Definitions
YA61       AAF34183     411     0.331    0.119  0.124  0.133         0.523         Gastric-associated differentially expressed
***        AAG03013     525     0.314    0.168  0.160  0.760         0.476         Uncharacterized gastric protein YA42P
***        CAA37886     1131    0.368    0.184  0.218  0.383         0.479         Succinate dehydrogenase flavoprotein subunit
GART       AAB70813     309     0.422    0.227  0.226  0.273         0.387         Glycinamide ribonucleotide formyltransferase
***        AAF68845     438     0.409    0.177  0.226  0.261         0.393         hnRNP 2H9D
gag        AAD51794     537     0.346    0.203  0.229  0.656         0.611         Gag protein/protein_id=AAD51794.1
***        CAA33387     3234    0.608    0.233  0.241  0.359         0.414         COL3A1 mRNA for pro alpha-1 (III) collagen
***        CAA25821     396     0.629    0.233  0.242  0.167         0.424         mRNA for alpha 1 (III) collagen fragment (aa 892-1023)
PBI        BAA13971     405     0.541    0.219  0.244  0.357         0.450         Similar to salivary proline-rich protein P-B
YRRM       AAC16921     529     0.436    0.228  0.246  0.514         0.496         YRRM, expressed specifically in the testis
COL5A2     AAA52007     675     0.610    0.243  0.249  0.389         0.476         Collagen type V alpha-2 chain, partial cds

Of 31 human gene sequences in the lowest P3 range of 0.10–0.30, 11 with some functional information that may be useful for further compositional studies appear here. The first three sequences are in the range of 0.10–0.20; they are not included in the PR2-bias analyses (Fig. 2).

Fig. 1. P3-P12 plot of 36,533 human gene sequences. Each data point represents one gene sequence. Note that P3 covers the range from 0.10–0.15 to 0.95–1.00. For the exact P3 values of the sequences with extreme P3 values, see Tables 1 and 2.


Fig. 2. PR2-bias fingerprints of human gene sequences. (A) represents the PR2-bias fingerprint that was constructed for 36,533 genes (three genes in the range of 0.15–0.20 are included). (B) represents the fingerprint that includes genes in the P3 range from 0.30 to 0.80, in which both extreme ranges of P3 are excluded. Note that the two fingerprints (A and B) are almost identical. (C and D) The low P3 range (0.2–0.3) and the high P3 range (0.80–1.00), respectively. (E–F) The PR2-bias fingerprints for the two low P3 ranges (0.20–0.25 and 0.25–0.30). (G) The fingerprint of the next range (0.30–0.35) that is close to the normal fingerprint. The next eight P3 ranges corresponding to the 0.30–0.75 range are not shown because the previous report (Sueoka and Kawanishi, 2000) with fewer human sequences shows that in this range the fingerprints are very similar. (H) The last P3 range with a normal fingerprint. (I–L) present fingerprints for the high P3 ranges (0.80–1.00) that show anomalous fingerprints.


high G+C range (0.80–1.00) contains approximately 1,300 anomalous genes, which corresponds to 4%–5% of the total genes analyzed (36,530). The anomalous genes do not appreciably change the fingerprint constructed using the total number of genes, as shown in Figs. 2A and 2B. Further information on the genes in the ranges of 0.10–0.25 and 0.95–1.00 is listed in Tables 1 and 2, respectively. Genes in the extreme P3 ranges may not be limited to special functional groups.

Anomalous fingerprints in the low G+C range (P3: 0.20–0.35) are shown in Figs. 2E–2G, where the fingerprint in the last range is normal (Fig. 2G). Anomalous fingerprints in the high G+C range (P3: 0.75–1.00) are presented in Figs. 2H–2L, where the fingerprint in 2H is close to the normal one. The fingerprints for the range of 0.35–0.75 are not shown because of their similarity with the normal fingerprint (Figs. 2A,B), as previously shown (Sueoka and Kawanishi 2000).

Discussion

The species- and amino acid-specificities of PR2-bias fingerprints have been shown in both bacteria and vertebrates (Sueoka 1995, 1999a, b). In addition, multicellular eukaryotes are generally known to have a wide range of intra-genomic heterogeneity in G+C content. It is therefore important to examine whether intra-genomic heterogeneity of PR2-bias fingerprints exists. The present results show that PR2-bias fingerprints are similar in the major G+C ranges (0.30–0.80) that cover 88% of the genes analyzed. This coverage is apparently an underestimate because the range between 0.80 and 1.00 likely contains both types of genes: those with the normal fingerprint and those with the anomalous fingerprint. As the present result indicates, a more reasonable estimate is close to 95%.

The uniformity of the PR2-bias fingerprint in 95% of genes in this study and in previous studies (Sueoka 1999; Sueoka and Kawanishi 2000) indicates that the PR2-bias fingerprint constructed with a large number of randomly sampled genes (>500) is a reliable property of the genome of an organism. This result suggests that the PR2-bias fingerprint may provide a method that supplements and, in some cases, augments the conventional methods of phylogenetic analysis that compare orthologous amino acid and nucleotide sequences.

Unique features of the PR2-bias analysis, as well as advantages and disadvantages in its use for phylogenetic analyses, are discussed below. We will also discuss the possible origin of the species- and amino acid-specificities of the PR2-bias fingerprints.

Conservative Nature of PR2 Biases

The conservative nature of the PR2-bias fingerprints described in the Introduction suggests that the synonymous codon-usage biases of the four-codon amino acids reflect the optimization of the translation efficiency of mRNA through matching codon frequencies with the relative frequencies of the corresponding tRNAs, as originally proposed in Escherichia coli by Ikemura (1981). The uniform

Table 2. Human sequences in the high G+C range (0.95–1.00) of the third codon position (P3)

                                                         PR2 biases
Gene name  Sequence ID  Codons  GC-cont  GC3    P3     G3/(G3 + C3)  A3/(A3 + T3)  Gene properties
***        CAA36794     1038    0.734    0.949  0.951  0.455         0.471         Nuclear factor NF-IL6 (AA 1–345)
H2B.1      AAA63192     306     0.628    0.957  0.951  0.386         0.750         Histone H2B.1
coseg      CAA44681     496     0.727    0.956  0.952  0.386         0.429         Vasopressin-neurophysin precursor
AVP        AAA61291     495     0.725    0.956  0.952  0.386         0.429         Vasopressin/neurophysin
freac-8    AAB48856     318     0.682    0.949  0.953  0.326         0.400         Forkhead protein (freac-8) gene
ARL7       CAB44355     579     0.611    0.959  0.953  0.521         0.286         ADP-ribosylation factor-like protein 7
***        CAB06877     645     0.715    0.951  0.954  0.354         0.400         Lactase-phlorizin hydrolase LPH
***        AAC50596     1098    0.649    0.950  0.954  0.373         0.235         G protein-coupled receptor OGR1
***        AAA79060     1098    0.649    0.950  0.954  0.376         0.235         G protein-coupled receptor
sreb1      BAA96645     1128    0.738    0.958  0.955  0.423         0.533         mRNA for SREB1
Kv6.2      CAB56834     1401    0.735    0.957  0.955  0.460         0.579         mRNA for cardiac potassium channel subunit (Kv6.2)
ARC        AAF07185     1191    0.685    0.954  0.957  0.527         0.353         Similar to rat arc/arg3.1
SOD3       AAA62278     723     0.733    0.961  0.959  0.429         0.444         SOD3, superoxide dismutase
***        AAB70165     375     0.651    0.964  0.960  0.519         0.500         Centromere protein B mRNA
ZNF        AAA61332     326     0.633    0.961  0.963  0.424         1.000         DNA-binding protein (ZNF) gene
HBF-3      CAA52241     672     0.652    0.961  0.964  0.372         0.375         HHBF-3 mRNA
TASK       AAC51777     1185    0.673    0.967  0.967  0.414         0.583         Pore-forming K+ channel subunit
***        AAA92039     318     0.632    0.968  0.972  0.424         0.000         Forkhead protein FREAC-4 mRNA
SOX1       CAA73847     1164    0.763    0.972  0.974  0.467         0.600         SOX1, Sry-related Box 1 protein

Of the 36 human gene sequences in the highest P3 range of 0.95 to 1.00, 19 with some functional information that may be useful for further compositional studies are listed.


PR2-bias fingerprints in 95% of human genes in this study suggest that the tRNA spectrum in human may be similar in all or almost all cell types. This implies that the adaptation of codon usage frequencies to those of the corresponding tRNAs is likely to be established by selection for the general efficiency of translation, and that their relationship, once established, becomes strongly conserved (Sueoka, 1995). However, the evolutionarily conserved nature of the PR2-bias fingerprint is apparently not absolute, because the PR2-bias fingerprint does show wide differences among distant taxa (Sueoka, 1995).

Ubiquity of PR2 Biases in Coded Areas

The amino acid-specific PR2 biases (translation-coupled PR2 biases) exist in all organisms examined so far, and their patterns, expressed as PR2-bias fingerprints, are unique to a group of organisms that are phylogenetically related (Sueoka, 1995). The PR2-bias fingerprint is a property of the genome that covers the codon usage frequencies of the eight four-codon amino acids of genes. The four-codon amino acids represent about half of the total amino acid residues in protein. The present result of uniform PR2-bias fingerprints in human implies that the taxa-specific fingerprint represents the average of a uniform distribution of the fingerprint for the majority of genes. Thus, the PR2-bias fingerprint is a robust genome property.

Independence of PR2 Biases from P3

An important feature of the PR2-bias fingerprint for phylogenetic analyses is its independence from the G+C content in the majority of genes (>95%) in the range of G+C content of the third codon position from 30% to 80% in human (Fig. 2B). This immunity of the PR2 bias from the G+C contents of most genes gives reproducibility and reliability to the PR2-bias fingerprint. This point is significant because the conventional methods for the construction of phylogenetic trees are based on comparisons of amino acid sequences among a few orthologous genes. The orthologous genes may or may not be located in isochores. An isochore is a chromatin segment in which the G+C content is approximately uniform (Bernardi et al. 1985). In this sense, the PR2-bias fingerprint could represent almost all the genes of an organism. This point is also important for organisms in which the G+C content of genes is highly heterogeneous, such as multicellular eukaryotes and some unicellular eukaryotes.

Anomalous PR2-Bias Fingerprints in the Extreme G+C Ranges

Anomalous fingerprints in the low and high G+C contents are interesting, but they are found only in small specific fractions of the human genome (approximately 5%); thus they do not generate serious problems for phylogenetic analyses. In this connection, preliminary analyses of PR2-bias fingerprints and the intra-genomic heterogeneity of the G+C content have been performed for chickens and mice. Constancy of the fingerprints in the major G+C range, similar to that in humans, has been observed in chickens, and a similar tendency was found in mice (data not shown; to be reported elsewhere with a variety of eukaryotes).

Caveats for Applications of the PR2-Bias Fingerprint Analysis

The accuracy of the PR2-bias fingerprints depends on the number of sequenced genes used. Assuming a binomial distribution for the center of a bubble (x and y), the standard errors of the bubble coordinates are estimated as

$$\sqrt{\frac{x(1 - x)}{n_c \cdot n_g}} \quad \text{and} \quad \sqrt{\frac{y(1 - y)}{n_c \cdot n_g}}.$$

Here, x and y are the averages of G3/(G3 + C3) and A3/(A3 + T3) for individual genes, respectively; n_c denotes the average number of codons per gene (harmonic mean) for the amino acid in question, and n_g the number of genes analyzed. Actual errors are usually larger than the theoretical estimates because of non-random sources of error. However, the error is proportional to 1/√n_g. In the author’s experience, when n_g is close to 100, 500, and 1000, the fingerprints provide an approximate and preliminary configuration (not reliable for serious comparisons), a mostly reliable configuration, and a sufficiently accurate configuration, respectively. Obviously, larger bubbles have relatively more reliable estimates of the average xy coordinates than smaller bubbles.
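The binomial error estimate above can be evaluated directly, as in this minimal sketch. The function name and argument names are hypothetical; the formula is the square root of x(1 − x) divided by the product of the average codons per gene and the number of genes, and likewise for y.

```python
import math


def bubble_standard_error(x, y, n_c, n_g):
    """Binomial standard errors of a bubble center (x, y).

    x, y: average G3/(G3+C3) and A3/(A3+T3) for the amino acid in question.
    n_c:  average number of codons per gene for that amino acid.
    n_g:  number of genes analyzed.
    """
    n = n_c * n_g  # total number of third-position observations
    return math.sqrt(x * (1 - x) / n), math.sqrt(y * (1 - y) / n)
```

For a bubble at the PR2 center (0.5, 0.5) with n_c = 4 and n_g = 100, both standard errors are 0.025; quadrupling n_g to 400 halves them, in line with the 1/√n_g dependence noted in the text.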

The conservative nature of the PR2-bias fingerprint is both advantageous and disadvantageous for phylogenetic analyses using the base composition of nucleic acids. The first advantage of the PR2-bias fingerprint is its independence from the G+C content, and the second advantage is its slow evolution, which allows us to examine larger taxa. Because of its slow evolution, the fingerprint can potentially be used for organisms that are remotely related, such as different orders of vertebrates, where, because of extensive sequence diversity, sequence comparisons may give ambiguous results. The third advantage is that the fingerprint provides detailed, specific information about evolutionary changes of codon usage. This visual demonstration of the time courses of fingerprint changes offers unique insights into the processes of evolution. A disadvantage is that the slow evolution makes the method less sensitive to minor changes and useless for analyses of very closely related organisms. Another disadvantage may be that, unlike sequence comparisons, the difference between PR2-bias fingerprints does not provide a simple proportionality with time. On the other hand, these advantages and disadvantages make the two methods complementary rather than mutually exclusive.

As an alternative to the fingerprint, we could define a function to replace a fingerprint using one or more parameters combining the distances and angles of the bias coordinates for all four-codon amino acids. However, such methods lose the detailed information of fingerprints that is essential for the recognition of specific changes in terms of which amino acids and codons are involved in the diversification of a taxon from its mother lineage.
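One way to picture such a distance-and-angle summary is to express each amino acid's bias as a vector from the PR2 center (0.5, 0.5). The sketch below is only an illustration of that idea; the function name and the choice of degrees measured counterclockwise from the positive GC-bias axis are assumptions, not a method defined in this article.

```python
import math


def bias_vector(gc_bias, at_bias):
    """Distance and angle (degrees) of a PR2-bias point from the center (0.5, 0.5).

    gc_bias: G3/(G3+C3), the abscissa; at_bias: A3/(A3+T3), the ordinate.
    """
    dx = gc_bias - 0.5
    dy = at_bias - 0.5
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx))
```

A point at (0.6, 0.5) gives a distance of about 0.1 at 0 degrees, and a point at (0.5, 0.6) gives the same distance at 90 degrees. As the text notes, collapsing a fingerprint into such parameters discards which amino acids and codons changed.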

On the Possible Origin of PR2-bias Fingerprints

The codon usage biases reflected in the amino acid-specific PR2 biases likely originate from adaptation to tRNA frequencies for efficient translation. The best clue currently available for this scenario is the positive correlation between synonymous codon usage biases and the relative frequencies of different tRNAs in bacteria (Ikemura, 1981; Kanaya et al., 1999) and yeast (Bennetzen and Hall, 1982; Ikemura, 1982). It is likely that the quantitative relationship between the codon usage frequencies and the relative abundance of the corresponding tRNAs in higher eukaryotes is the basis of the pattern of PR2-bias fingerprints.

As to the frequencies of iso-acceptor tRNAs, mutations are most likely the source of variation. Duplications and deletions of tRNA genes, as well as directional mutation pressures affecting the regulation of tRNA levels, likely change the relative frequencies of iso-acceptor tRNAs. Subsequently, codon usage frequencies will be changed by selection to match the tRNA frequencies for better translation efficiency. The anomalous PR2-bias patterns in the extreme G+C ranges in human and chicken suggest that strong mutation pressure in those ranges may force the use of minor tRNAs (Sueoka and Kawanishi, 2000).

Mutator mutations may also change the efficiency of translation through changes in the aminoacylation step of tRNA. Four copies of the leucine tRNA gene for the CUG codon are known to exist in E. coli, which increases both the frequency of the leucine tRNA with the GAC anticodon and the usage of the CUG codon in concert (Ikemura and Ozeki, 1983). This correlation between the copy number of an iso-acceptor tRNA gene and the frequency of synonymous codon usage seems to be a general phenomenon in bacteria (Komine et al., 1990). Based on the widely found multiple copies of iso-accepting tRNA genes, it seems reasonable to assume that the PR2 biases for individual four-codon amino acids reflect the gene numbers of iso-acceptor tRNAs. It is also likely that, once the optimal correspondence is established between tRNAs and codon usage, it will remain conserved in evolution until a sudden increase or decrease in the copy number of tRNA genes due to mutations (duplication and deletion).

Under the above model (tRNA-gene mutation model), relative frequencies of iso-acceptor tRNAs are changed directly through duplications and deletions of tRNA genes, and in turn, synonymous codon frequencies are adapted to the tRNA frequencies by selection for translation efficiency. Note that this selection at the codon level is not generated by selection for specific functions of individual genes. The tRNA-gene mutation model in multicellular eukaryotes is based on two assumptions: (1) isochores with different G+C contents have evolved by locally different directional mutation pressures, probably due to differences in the extent of mutation/repair arising from differences in chromatin structure (Filipski, 1987; Sueoka, 1988); and (2) the abundance spectrum of tRNAs is similar in different cell types (Sueoka and Kawanishi, 2000; this article). The alternative model has been based on the assumption that isochores evolved through selection for the functional advantage of high or low G+C contents of genes in different isochores (Bernardi et al., 1985; Bernardi, 2000).

Conclusions

The present results show that when a large number of genes are analyzed in humans, the PR2-bias fingerprints are similar in the majority of genes in the DNA G+C content range between 0.25 and 0.80. The similarity of the fingerprints of different vertebrates (with some difference in fishes) reported previously indicates that the PR2 biases are well conserved in vertebrates. PR2 biases may be used as a new tool for phylogenetic analysis to supplement and augment conventional methods. The principle of the PR2-bias fingerprint analysis is entirely different from that of the currently used methods, which are based on the direct comparison of the amino acid sequences of proteins and the base sequences of orthologous genes. Furthermore, the fingerprint method should provide some specific information on the nature of the changes that must have occurred in the branching of taxa. The PR2-bias fingerprint, therefore, is likely to provide additional information for phylogenetic analyses.

Acknowledgment This article is dedicated to the memory of the late Tom Jukes for his love of science and life. This work was supported originally by NSF (DIR8820806) and the Yamamoto Foundation for Promotion of Genetic Research to N.S. The author is grateful to Dr. J. R. Lobry for his help in the data retrieval through the ACNUC system, and to Dr. Ruri Miyata for her technical assistance in the data processing in the early stage of this project.

References

Bennetzen JL, Hall B (1982) Codon selection in yeast. J Biol Chem 257:3026–3031

Bernardi G (2000) Isochores and the evolutionary genomics of vertebrates. Gene 241:3–17

Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F (1985) The mosaic genome of the vertebrates. Science 228:953–958

Filipski J (1987) Correlation between molecular clock ticking, codon usage, fidelity of DNA repair, chromosome banding and chromatin compactness in germline cells. FEBS Lett 217:184–186

Furusawa M, Doi H (1992) Promotion of evolution: disparity in the frequency of strand-specific misreading between the lagging and leading DNA strands enhances disproportionate accumulation of mutations. J Theor Biol 157:127–133

Gouy M, Gautier C, Attimonelli M, Lanave C, di Paola G (1985) ACNUC - a portable retrieval system for nucleic acid sequence databases: logical and physical designs and usage. Comput Appl Biosci 1:167–172

Ikemura T (1981) Correlation between the abundance of E. coli tRNA and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol 151:389–404

Ikemura T (1982) Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes. Differences in synonymous codon choice patterns of yeast and Escherichia coli with reference to the abundance of isoaccepting tRNAs. J Mol Biol 158:573–597

Ikemura T, Ozeki H (1983) Codon usage and transfer RNA content: organism-specific codon-choice patterns in reference to the isoacceptor contents. Cold Spring Harb Symp Quant Biol 47:1087–1097

Kanaya S, Yamada Y, Kudo Y, Ikemura T (1999) Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 234:143–155

Karkas JD, Rudner R, Chargaff E (1968) Separation of B. subtilis DNA into complementary strands II. Template functions and composition as determined by transcription with RNA polymerase. Proc Natl Acad Sci USA 60:915–920

Komine Y, Adachi T, Inokuchi H, Ozeki H (1990) Genomic organization and physical mapping of the transfer RNA genes in Escherichia coli K12. J Mol Biol 212:579–598

Lobry JR (1995) Properties of a general model of DNA evolution under no-strand-bias condition. J Mol Evol 40:326–330

Rudner R, Karkas JD, Chargaff E (1968) Separation of B. subtilis DNA into complementary strands III. Direct analysis. Proc Natl Acad Sci USA 60:921–922

Rudner R, Karkas JD, Chargaff E (1969) Separation of microbial deoxyribonucleic acids into complementary strands. Proc Natl Acad Sci USA 63:152–159

Sueoka N (1988) Directional mutation pressure and neutral molecular evolution. Proc Natl Acad Sci USA 85:2653–2657

Sueoka N (1992) Directional mutation pressure, selective constraints, and genetic equilibria. J Mol Evol 34:95–114

Sueoka N (1995) Intra-strand parity rules of DNA base composition and usage biases of synonymous codons. J Mol Evol 40:318–325; Erratum (1996) J Mol Evol 42:323

Sueoka N (1999a) Two aspects of DNA base composition: G+C content and translation-coupled deviation from intra-strand rule of A = T and G = C. J Mol Evol 49:49–62

Sueoka N (1999b) Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 238:53–58

Sueoka N, Kawanishi Y (2000) DNA G+C content of the third codon position and codon usage biases of human genes. Gene 261:53–62

Wu C-I (1991) DNA strand asymmetry. Nature 352:114

Wu C-I, Maeda N (1987) Inequality in mutation rates of the two strands of DNA. Nature 327:169–170
