Conserved features of TBE1 transposons in ciliated protozoa

Genetica 101: 75–86, 1997. © 1997 Kluwer Academic Publishers. Printed in the Netherlands.

Thomas G. Doak(1), David J. Witherspoon(1), F. Paul Doerder(1,2), Kevin Williams(1) & Glenn Herrick(1,*)

(1) Department of Oncological Science, University of Utah, Salt Lake City, UT, 84132, USA; (2) Department of Biology, Cleveland State University, Cleveland, OH 44115, USA; (*) Author for correspondence

Received 20 December 1996 Accepted 16 June 1997

Key words: Oxytricha, protein kinase, sequence repeats, telomere, zinc finger

Abstract

The complete sequences of four TBE1 transposons from Oxytricha fallax and O. trifallax are presented and analyzed. Although two TBE1s are 98% identical to each other at the nucleotide level, the remaining two TBE1s are only 90% identical both to each other and to the other two. This large evolutionary divergence allows us to identify conserved TBE1 features. TBE1 transposons are 4.1 kbp long and are flanked by 3 bp target-site repeats. The elements consist of 78 bp inverted terminal repeats, of which the 17 terminal base pairs are Oxytricha telomere repeats; a central conserved section of 550 bp that includes a set of nested direct and inverted sequence repeats; and 3 open reading frames conserved for encoded amino acid sequence. The three open reading frames encode a 22 kDa basic protein of unknown function, a 42 kDa ‘D,D35E’ transposase, and a 57 kDa chimeric C2H2 zinc finger/protein kinase. The protein kinase domain of the 57 kDa protein is unusual, lacking a conserved ATP-binding motif.

Introduction

The sister ciliated protozoa species Oxytricha fallax and O. trifallax each carry ~4000 TBE1 elements in their diploid germline nuclei (Witherspoon et al., 1997). TBE1 elements are repetitive 4.1 kbp sequences delimited by blocks of Oxytricha telomere sequence repeats ((G4T4)n), leading to their designation as Telomere Bearing Elements, type 1 (Herrick et al., 1985). TBE1s are limited to the host’s germline nucleus (see below).

Four lines of evidence suggest that TBE1s are active ‘cut and paste’ transposons (transposing through a DNA intermediate; Berg & Howe, 1989). First, TBE1s are flanked by 3 bp target site duplications (Williams, Doak & Herrick, 1993), characteristic of most transposons. Second, TBE1s have 78 bp inverted terminal repeats (ITRs; Herrick et al., 1985), characteristic of cut and paste transposons. Third, TBE1s code for a 42 kDa protein homologous to a superfamily of transposases and integrases (Doak et al., 1994) characterized by a D,D35E motif of acidic residues essential for catalysis (van Luenen, Colloms & Plasterk, 1994; Kulkosky et al., 1992; Vincent et al., 1993; van Gent, Groeneger & Plasterk, 1992). Fourth, alternative alleles of a particular host gene were found to be either TBE1-interrupted or ‘empty’, and phylogenetic analyses of these alleles indicate a recent origin for the insertions (Seegmiller et al., 1996). Consistent with this, most TBE1s are of a similar length (Williams, Doak & Herrick, 1993), analyses of short segments of several TBE1 genes have failed to identify any gene interrupted by stop codons or frame shifts, and divergence analyses show TBE1 genes are evolving under selection for a function of their encoded proteins (Witherspoon et al., 1997; this work). From these collective data we conclude that most TBE1s are fully functional.

Ciliated protozoa are single-cell eukaryotes that carry two types of nuclei: a transcriptionally silent, germline micronucleus (MIC), and a transcriptionally active, somatic macronucleus (MAC). The MAC is a postzygotic copy of the micronucleus that has been processed by DNA elimination, rearrangement, and amplification (Herrick, 1994; Prescott, 1994; Klobutcher & Herrick, 1997). During MAC development most, apparently all, TBE1s are removed (Herrick et al., 1985; Williams, Doak & Herrick, 1993). Comparison of MIC loci carrying TBE1 insertions with the resulting MAC sequences shows that TBE1 excision from developing MAC DNA precisely removes the TBE1 and one target site repeat (Williams, Doak & Herrick, 1993). This reaction regenerates the original unmutated gene in the MAC, where the gene is expressed, making TBE1 insertions essentially phenotypically silent. This benefits both the TBE1 transposon and its ciliate host and has led us to propose that TBE1s contribute protein factors necessary for their own excision (Williams, Doak & Herrick, 1993; Witherspoon et al., 1997).

We report here the complete sequences of four TBE1s, three from O. fallax and one from O. trifallax. Comparison of these four sequences allows the identification of shared sequence features that have been conserved during divergence of the elements. These conserved features (protein-coding genes and apparent cis-acting sites) represent aspects of the TBE1 transposon that have been maintained by selection during the evolution of TBE1 transposons in Oxytricha fallax, O. trifallax, and the immediate ancestor of the two sister species (Seegmiller et al., 1996).

Materials and methods

Three TBE1s (fal1, fal2, and fal4) were isolated from Oxytricha fallax strain 9D1 (Cartinhour & Herrick, 1984). TBE1 fal1 and TBE1 fal2 were retrieved from a λL47 library of O. fallax micronuclear DNA (Herrick et al., 1985). The TBE1 fal1 sequence and analyses of its transposase gene were reported previously (Doak et al., 1994). A clone containing fal4 was also retrieved from that λL47 library (A. Lee, unpublished). TBE1 tri1 was amplified by polymerase chain reactions (PCR) from O. trifallax strain JRB 310 DNA (Williams, Doak & Herrick, 1993). Sequencing of these PCR products or cloned DNA was performed either manually (Williams & Herrick, 1991) or with an ABI automated sequencer at the University of Utah Health Sciences Sequencing Facility. TBE1 sequences have been submitted to GenBank under the following accession numbers: fal1 and fal2, L39908; fal4, U85403; tri1, L39906.

The assembly and editing of sequence alignments, and most subsequent analyses, were performed with the GCG package of programs (Genetics Computer Group, 1994; GCG program names are indicated with all uppercase letters). Default settings were used unless otherwise noted. Analyses were performed either on the individual ungapped sequences or on the ‘manually’ aligned sequences that were gapped where necessary to align homologous blocks. The most divergent elements are ~90% identical and offer little problem in alignment. PROFILEMAKE and PROFILEGAP (Gribskov, Luthy & Eisenberg, 1990) were used to construct profiles that were used to search sequences for matches.

Table 1. Nucleotide sequence identities between TBE1s

        fal1     fal2     fal4     tri1
fal1    –
fal2    97.7%    –
fal4    91.0%    90.8%    –
tri1    89.7%    90.2%    89.6%    –
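Pairwise identities of the kind listed in Table 1 can be computed from a gapped alignment with a few lines of code. The following is a minimal sketch, not the GCG procedure: the function names are hypothetical, and it assumes that columns containing a gap in either sequence are skipped rather than scored as mismatches (the source does not say how gaps were treated).

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are ignored, not counted as mismatches
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Lower-triangle identities keyed as (later_name, earlier_name), as in Table 1."""
    names = list(aligned)
    return {
        (names[i], names[j]): percent_identity(aligned[names[i]], aligned[names[j]])
        for i in range(len(names))
        for j in range(i)
    }


# Toy alignment (made-up sequences, not TBE1 data).
toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "AC-TACGTAC"}
matrix = identity_matrix(toy)
```

Divergence is simply 100 minus the identity, so a pair that is 90% identical is 10% diverged under this definition.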

Calculation of synonymous and nonsynonymous divergences (ds and dn, respectively) was by the method of Nei and Gojobori (1986), as implemented by Ina et al. (1994) and modified by Witherspoon et al. (1997) to use the Oxytricha genetic code (Williams & Herrick, 1991). Smith-Waterman searches were conducted with MpSrch, 3.0D-� (John F. Collins, Biocomputing Research Unit, 1995, University of Edinburgh, UK). The four TBE1 sequences were examined for evidence of gene conversion events using the method of Sawyer (1989), modified to use the Oxytricha genetic code (Williams & Herrick, 1991; Witherspoon et al., 1997).
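The reason the genetic code matters for ds and dn can be illustrated with a small sketch. It is not the Nei and Gojobori method, which also counts synonymous and nonsynonymous sites, averages over mutational pathways for codons with multiple differences, and applies a distance correction. It only classifies codon pairs that differ at exactly one base. The function names are hypothetical, and the code table assumes the ciliate nuclear code in which TAA and TAG encode glutamine and TGA is the only stop.

```python
from itertools import product

BASES = "TCAG"
# Standard code in TCAG order (first base varies slowest).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_code(reassign=None):
    """Return a codon -> amino acid table, optionally with reassigned codons."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    table = dict(zip(codons, STANDARD_AA))
    table.update(reassign or {})
    return table


# Assumption: TAA and TAG read as glutamine (Q); TGA remains a stop codon.
OXYTRICHA_CODE = build_code({"TAA": "Q", "TAG": "Q"})


def count_single_step_changes(cds_a, cds_b, code=OXYTRICHA_CODE):
    """Return (synonymous, nonsynonymous) counts for codons differing at one base."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("need two in-frame, equal-length coding sequences")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        ca, cb = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        if sum(x != y for x, y in zip(ca, cb)) != 1:
            continue  # identical codons, or multi-hit codons that need pathway averaging
        if code[ca] == code[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Made-up example: the CAA/TAA difference is synonymous only under the ciliate code.
pair = ("ATGTTTCAAGGA", "ATATTCTAAGGA")
```

With the toy pair above, the same nucleotide difference is scored as synonymous under the ciliate table and as nonsynonymous (a stop codon) under the standard table, which is why an unmodified implementation would misestimate both divergences.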

Results

To identify conserved features of TBE1 transposons, we determined the nucleotide sequence of four complete TBE1s, three from O. fallax (fal1, fal2, and fal4; 4073 bp, 4072 bp, and 4076 bp, respectively) and one from O. trifallax (tri1, 4076 bp). TBE1 fal1 and fal2 are adjacent insertions, separated by 935 bp, in the vC allele of the CR-MSC gene of the 81-locus (Herrick et al., 1985; Seegmiller et al., 1996). TBE1 tri1 is an insertion in that same gene in O. trifallax strain JRB 310 (Williams, Doak & Herrick, 1993). TBE1 fal4 was picked for complete sequencing from a larger set of cloned O. fallax TBE1s, because partial sequence showed it to be particularly divergent from fal1, fal2, and tri1 (Witherspoon et al., 1997), and thus valuable for comparisons with the other three TBE1s (Table 1).

Figure 1. TBE1 fal1 sequence features and similarities among the four TBE1s, fal1, fal2, fal4, and tri1. A. Features of fal1. The five largest open reading frames (ORFs) of fal1 are shown as large arrows (translational start to stop) above and below the central axis. The two faintly shaded arrows indicate ORFs (28 and 20 kDa) not under selection for encoded proteins (see text; Figure 4). The coordinates of the ORFs in the fal1 sequence are: 42 kDa, 81–1145 bp; 28 kDa, 1016–312 bp; 22 kDa, 1943–1362 bp; 20 kDa, 2064–2591 bp; 57 kDa, 2499–3954 bp. The 42 kDa ORF encodes a protein homologous to a family of transposases characterized by a ‘D,D35E’ motif; the locations of segments of this motif are marked as boxes on the 42 kDa ORF arrow. The 57 kDa protein is homologous to both C2H2 zinc fingers (gray boxes) and protein kinases (protein kinase domains VI–XI; striped box; see Figure 5). Inverted terminal repeats are indicated as thickened arrow heads on the central axis; an enlarged ITR is labeled to indicate terminal telomere repeats and a G/C strand bias internal to the telomere repeats. A central set of nucleotide repeats, the ‘repeats region’, is indicated by a striped box below the line (see Figure 2). B. Similarity plot of fal1 vs. fal2. The average similarity of the aligned fal1 and fal2 sequences is calculated and graphed across the alignment, using PLOTSIMILARITY, and aligned under the features map of fal1. A sliding window of 1 bp was used, emphasizing large regions of complete identity between these two TBE1s. The horizontal dashed line marks the average similarity for the entire plot. C. Similarity plot for the four TBE1s (sliding window = 50 bp). Long conserved and diverged regions are marked by bars ‘a’ and ‘b’, respectively.
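The sliding-window similarity plots described in the Figure 1 legend can be approximated as follows. This is a simplified sketch rather than a reimplementation of PLOTSIMILARITY: it scores each alignment column as the fraction of sequence pairs that agree (the GCG program may use a scoring matrix), treats a gap as an ordinary character, and uses hypothetical function names.

```python
def column_similarity(column: str) -> float:
    """Fraction of sequence pairs in one alignment column that carry the same character."""
    pairs = same = 0
    for i in range(len(column)):
        for j in range(i):
            pairs += 1
            same += column[i] == column[j]
    return same / pairs


def sliding_similarity(aligned: list[str], window: int = 50) -> list[float]:
    """Mean column similarity in each window; one value per window start position."""
    if len(aligned) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    if not 1 <= window <= length:
        raise ValueError("window must be between 1 and the alignment length")
    per_col = [column_similarity("".join(s[k] for s in aligned)) for k in range(length)]
    total = sum(per_col[:window])
    out = [total / window]
    for k in range(window, length):
        total += per_col[k] - per_col[k - window]  # slide the window by one column
        out.append(total / window)
    return out


# Toy alignment (made-up): only the last column differs.
profile = sliding_similarity(["AAAA", "AAAT"], window=2)
```

A window of 1 reports every single-base difference, as in panel B, while a wider window such as 50 smooths the trace so that long conserved and diverged stretches stand out, as in panel C.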

Conservation of DNA features

Of the four elements, fal1 and fal2 are the most closely related, 98% identical overall (Table 1). Indeed, one element could have directly given rise to the other by transposition. Consistent with this, these two TBE1s reside close to one another (935 bp); other cut and paste transposons are known to transpose to nearby positions (Greenblatt, 1984; Athma, Grotewold & Peterson, 1992; Zhang & Spradling, 1993).

However, fal1 and fal2 have diverged from each other, and the differences are not randomly distributed across their lengths. Figures 1A and 1B compare their aligned sequences in a plot of similarity below a map of fal1 features. In particular there seems to be a dearth of differences in the region between 1800 and 3800 bp. This region of high similarity may be due to a gene conversion, subsequent to the transposition, that replaced the ~1800–3800 bp region of one TBE1 with that of the other, or of a close relative. Statistical tests designed to detect gene conversion (Sawyer, 1989) find strong evidence (not shown) for conversion in the four-TBE1 data set (p < 0.0001); pairwise comparisons of the TBE1s using this method show that this result is largely due to an apparent gene conversion between fal1 and fal2 involving a region from 2020 to 3738 bp.

Fal4 and tri1 are quite diverged from each other and from fal1 and fal2 (~90% identical to, or ~10% diverged from, each other and fal1 and fal2; Table 1). Figure 1C shows a similarity plot comparing the four TBE1s. Regions of high similarity are interspersed with regions of lower similarity, for which there are two classes of explanations. First, different extents of similarity may reflect different histories (lineages) of the various regions. Above and elsewhere (Witherspoon et al., 1997), we present evidence of recombinational exchange among TBE1s; thus any TBE1 copy may represent a historic mosaic. When comparing two TBE1s, different regions will be related to different extents due to their different times of divergence from common ancestors. Second, differences can result from different selective constraints operating on different regions of the TBE1s. Protein-coding regions and cis-acting regulatory sites may be conserved and thus show high similarity among different TBE1 copies, while intergenic regions may be relatively unconstrained.

The region of highest divergence (Figure 1C, region b) lies within the 57 kDa ORF and is a region we believe encodes a flexible segment between two protein domains (see below). Selection maintains the reading frame through this region (no stops or indels are present), but the amino acid sequence, and thus the nucleotide sequence, is less constrained than in other parts of the protein. We see no evidence for the action of diversifying selection.

A 550 bp region (from 1900–2550 bp) is nearly invariant, more so than even the flanking conserved ORFs (Figure 1C, region a; see below), suggesting a selection for conservation of specific cis-acting nucleotide sequences. This central region includes a conspicuous set of short repeats. Two major direct repeats (type 1 and type 2; Figure 2A) are reiterated in the region four and three times, respectively. When the repeated units are compared within and among TBE1s, repeat type 1 has a nearly perfectly conserved core of 13 contiguous base pairs and a total extent of 21 bp (Figure 2C). Repeat type 2 is a perfectly conserved 8 bp sequence (Figure 2B). Other smaller repeats and a number of small palindromic sequences are also found in this region, several of which are interlocked with repeats 1b and 1c in a complex pattern (Figure 2B). Copies of the small repeats can be found elsewhere in the TBE1s (not shown), although at a lower density, but repeats type 1 and 2 are unique to this region.

The repeats account for only ~150 bp of the 550 bp central conserved region. The remainder includes sequences associated with the 5′ ends of the conserved 22 kDa and 57 kDa ORFs (see below). Figure 3 juxtaposes the recognized sequence features in the region with a similarity plot of the corresponding region. The conserved region 5′ to the 57 kDa ORF is separated from the main conserved region by a region of decreased similarity (from 2300–2430 bp); the longer, conserved section 5′ of the 22 kDa ORF overlaps the ORF’s start. These conserved segments 5′ of the two ORFs might function as cis-acting signals for the expression of these two ORFs.

The TBE1 78 bp inverted terminal repeats (ITRs) have been described previously (Herrick et al., 1985). They are highly, though not perfectly, conserved among the four TBE1s examined here. Fal1 and fal2 share a single base pair deletion in their left ITRs relative to their right ITRs and to both ITRs of fal4 and tri1, further supporting their close relationship. The terminal 17 bp of TBE1 ITRs consist of Oxytricha telomere repeats, G4T4G4T4G1, and are conserved among large numbers of TBE1s (Figure 1A; Williams, Doak & Herrick, 1993). As previously observed for fal1 and fal2 (Herrick et al., 1985), the ITR sequences of fal4 and tri1 internal to the telomere sequence show a GC strand bias devoid of G’s and thus opposite to the telomere bias. This bias continues 42 bp internal to the right ITR for each of the four TBE1s, giving a total of 103 nt lacking Gs.
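The ITR description above translates into two simple checks on an element's sequence: the ends should be reverse complements of one another, and the outermost 17 bp should read as the telomere block. The sketch below is illustrative only. The function names are hypothetical, and because the text does not state which strand carries the G-rich repeat at each end, the telomere check accepts either orientation.

```python
TELOMERE_BLOCK = "GGGGTTTTGGGGTTTTG"  # G4T4G4T4G, the 17 terminal bp described in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_repeat_matches(element: str, itr_len: int = 78) -> int:
    """Number of positions at which the left end equals the reverse-complemented right end."""
    left = element[:itr_len].upper()
    right_rc = reverse_complement(element[-itr_len:])
    return sum(a == b for a, b in zip(left, right_rc))


def has_telomere_ends(element: str) -> bool:
    """True if both termini carry the telomere block in inverted orientation.

    Assumption: either strand orientation of the block is accepted at the left end.
    """
    n = len(TELOMERE_BLOCK)
    left = element[:n].upper()
    right_rc = reverse_complement(element[-n:])
    return left in (TELOMERE_BLOCK, reverse_complement(TELOMERE_BLOCK)) and right_rc == left


# Synthetic element (made-up, with a shortened 22 bp ITR) used only to exercise the helpers.
itr = TELOMERE_BLOCK + "CACCA"
toy_element = itr + "ATATAT" + reverse_complement(itr)

# Arithmetic from the text: G-free ITR portion plus the 42 bp internal extension.
g_free_total = (78 - 17) + 42
```

The last line reproduces the 103 nt figure: 61 bp of the 78 bp ITR lie internal to the 17 bp telomere block, and the bias extends a further 42 bp.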

Fal1, fal2, and tri1 are each flanked by 3 bp direct repeats of ATT, matching the inferred AnT target site duplication of TBE1s (Williams, Doak & Herrick, 1993). Fal4 is likewise flanked by a 3 bp direct repeat, AAT, which also fits the AnT pattern.

Conservation of TBE1 ORFs

The fal1 sequence contains five large ORFs (Figure 1A) encoding predicted protein products of 20 kDa, 22 kDa, 28 kDa, 42 kDa, and 57 kDa (each ORF is named for the size of the protein it would encode). The 28 kDa and 42 kDa ORFs overlap on opposite strands with their codons aligned, whereas the 3′ end of the 20 kDa ORF overlaps the 5′ end of the 57 kDa ORF in a different reading frame. The 28 kDa ORF is intact in fal4 and tri1, but is interrupted by 2 premature stop codons in fal2. This suggests that the 28 kDa ORF is not conserved for function, although the stops in the fal2 28 kDa ORF could be recent mutations. However, the 20, 22, 42, and 57 kDa ORFs of fal1 are represented intact in the other three TBE1s.

Figure 2. Repeats cluster. A. Organization of two repeat types in the center of the fal1 sequence (2150–2310): type 1, black arrows, are repeated four times; type 2, gray arrows, are repeated three times. B. The fal1 sequence, 2094–2313, is shown with a variety of direct and inverted repeated units marked. Type 1 and 2 repeats are marked as in panel A, except that the invariant 13 bp of the type 1 repeats are marked with a solid arrow, whereas the somewhat variable 4 bp extensions to either side of the core 13 bp are marked with a dashed line. For both repeat types 1 and 2, more extensive matches can be found among various pairs of repeats (not indicated). Other direct repeated units have unique arrows above each type. Inverted repeats are marked below the sequence, with various styles of arrows. C. The four aligned fal1 type 1 repeats are ordered to emphasize the shared features of the variable 4 bp on either side of the invariant core. Two sites not completely conserved among the four elements are indicated in lower case. In repeat 1c nucleotide 11 is a T in fal1 and fal2, and an A in fal4 and tri1. In repeat 1b, nucleotide 21 is an A in fal1, fal2, and fal4, and a T in tri1.

Synonymous and nonsynonymous divergences of fiveTBE1 ORFs

Elsewhere we reported (Witherspoon et al., 1997) that sub-regions of the 42 kDa and 57 kDa ORFs are conserved for protein function. This was determined by comparing the extents of synonymous and nonsynonymous divergence (ds and dn as defined by Nei and Gojobori [1986]; ds reflects nucleotide changes that do not alter amino acid sequence, whereas dn reflects nucleotide changes that alter amino acid sequence) between pairs of TBE1 sequences. We found a significant lack of nonsynonymous divergence relative to synonymous divergence, resulting in a ds/dn ratio greater than 1. We have extended these analyses to the full length 42 kDa and 57 kDa ORFs and again find that ds/dn is significantly greater than 1 (Figure 4). We also find that the 22 kDa ORF is conserved for protein function. The pairwise comparisons for the remaining two ORFs (20 kDa and 28 kDa) yield ds/dn ratios of ~1 in all six pairwise comparisons. Thus TBE1s carry three genes coding for 22 kDa, 42 kDa, and 57 kDa proteins; mutations in these genes that affect protein sequence have been selectively eliminated from the population.

Figure 3. Sequence features of the central conserved region. A similarity plot is shown that compares four TBE1s across positions 1800–2800, which include an especially conserved region (region ‘a’, Figure 1C). A window of 1 bp was used to show the exact positions of single base differences among the four TBE1s. Below the plot is a features map of the region. The 20 kDa ORF, which is not under selection for protein function (see text), is shown as a dashed box. The 22 kDa and 57 kDa ORFs, both conserved for protein function, are shown as solid boxes, and continue beyond the extent of the graph as indicated. The extent of the repeats cluster (Figure 2) is indicated by a box.

Figure 4. Divergence analyses of five TBE1 ORFs. The ratio of synonymous divergence (ds) to nonsynonymous divergence (dn) was calculated for the six pairwise comparisons among the four TBE1s (after Witherspoon et al., 1997) for each ORF. The averages of these six values are graphed. Each ORF is referred to only by the size (in kDa) of the encoded protein. A ds/dn value of 1 indicates an absence of purifying selection for protein function, while values greater than 1 indicate selection for an encoded protein function. ds does not significantly exceed dn in any pairwise comparison for 20 kDa and 28 kDa ORFs, yielding ds/dn ratios ~1. For the 22 kDa, 42 kDa and 57 kDa ORFs ds significantly exceeds dn in all pairwise comparisons (p < 0.01, 1-tailed Z-test, no correction for non-independence; analyses not shown), yielding ds/dn ratios considerably larger than 1.
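The Figure 4 legend cites a one-tailed Z-test of ds against dn without showing the analysis. One common way to set up such a test is sketched below. It assumes that each divergence estimate comes with a standard error and that the two estimates can be treated as independent and approximately normal; the function name and the numbers are hypothetical and are not taken from the paper.

```python
import math


def one_tailed_z(ds: float, se_ds: float, dn: float, se_dn: float) -> tuple[float, float]:
    """Return (z, p) for the alternative ds > dn.

    Assumption: ds and dn are independent, normally distributed estimates
    with standard errors se_ds and se_dn.
    """
    z = (ds - dn) / math.sqrt(se_ds ** 2 + se_dn ** 2)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of the standard normal
    return z, p


# Hypothetical values for one pairwise comparison.
z, p = one_tailed_z(ds=0.30, se_ds=0.04, dn=0.05, se_dn=0.03)
ratio = 0.30 / 0.05  # the corresponding ds/dn ratio
```

A ratio well above 1 with a small p value corresponds to the pattern reported for the 22 kDa, 42 kDa, and 57 kDa ORFs, whereas equal ds and dn give z = 0 and p = 0.5.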

Features of the proteins encoded by the three TBE1genes

The 22 kDa protein. This 193 amino acid protein is highly charged and moderately basic (12% D+E and 16% H+K+R; predicted pI = 9.13). The gene is well conserved for a specific protein sequence (a minimum of 92% identical at the protein level in pairwise comparisons of the 22 kDa proteins from the four TBE1s). We have not identified homologs in the public databases.
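The charge composition quoted for the 22 kDa protein (percent D+E and percent H+K+R) is a simple residue count. A minimal sketch follows; the function name is hypothetical and the test peptide is made up, since the protein sequence itself is not reproduced here.

```python
def charge_composition(protein: str) -> dict[str, float]:
    """Percent acidic (D, E) and basic (H, K, R) residues in a protein sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    acidic = sum(seq.count(r) for r in "DE")
    basic = sum(seq.count(r) for r in "HKR")
    return {
        "acidic_pct": 100.0 * acidic / len(seq),
        "basic_pct": 100.0 * basic / len(seq),
    }


# Made-up ten-residue peptide: 2 acidic and 3 basic residues.
toy = charge_composition("DEHKRAAAAA")
```

An excess of basic over acidic residues, as reported for this protein, is consistent with a predicted pI above neutrality, although computing the pI itself requires residue pKa values that this sketch does not model.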

The 42 kDa transposase. We have shown previously that the 354 amino acid 42 kDa protein is a member of the D,D35E superfamily of transposases and integrases and is most closely similar to the Tc1/Mariner and Euplotes crassus Tec transposases (Doak et al., 1994). The three acidic residues of the D,D35E motif are conserved in all four TBE1s; indeed, most of the protein is highly conserved (a minimum of 91% identical at the protein level in pairwise comparisons). The three acidic residues are thought to coordinate one or two divalent metal ions in the active site in a structure analogous to that of RNase H (reviews: Yang & Steitz, 1995; Grindley & Leschziner, 1995). The 42 kDa protein is basic (predicted pI = 10.42), a characteristic of transposases (Berg & Howe, 1989).

The 57 kDa zinc finger/protein kinase chimera. This 481 amino acid (fal1 and fal2) or 482 amino acid (fal4 and tri1) protein can be divided into four regions. First, amino acids 1–84 are well conserved among the four TBE1s, but do not show significant similarities to proteins in the public databases. Second, amino acids 85–155 are well conserved and contain two predicted C2H2-type zinc fingers (ZFs); these will be discussed below. Third, amino acids 156–195 are poorly conserved among the four TBE1s. This can be seen at the nucleotide level as a trough of low similarity centered on base pair 3080 (Figure 1C). We suggest that this region is a tether between the ZF and the final protein kinase domain with little requirement for a particular amino acid sequence. Fourth, the C-terminal half of the protein (amino acids 196–481) is well conserved and contains sequence homologous to protein kinases, as discussed below.

C2H2-type ZFs impart sequence-specific DNA binding ability to many proteins (Pabo & Sauer, 1992). In Figure 5, the fal1 ZFs are aligned to the consensus sequence of a ZF profile (a site-specific scoring matrix derived from, in this case, aligned ZFs; Gribskov, Luthy & Eisenberg, 1990) to illustrate the conserved sites. Each finger has the four zinc ligands characteristic of a C2H2 ZF, and each finger has residues consistent with a site-specific, DNA-binding function in positions known to contact DNA bases and phosphates in well-studied ZFs (Figure 5; Choo & Klug, 1994; Rebar & Pabo, 1994; Jamieson, Kim & Wells, 1994; Pabo & Sauer, 1992). However, specific C2H2-type ZFs bind either DNA duplexes, RNA duplexes, or DNA-RNA heteroduplexes (reviewed in Berg & Shi, 1996), and the 57 kDa protein might not bind to DNA but to some other nucleic acid duplex. The spacing between the two fingers is unusually long (17 residues, rather than the usual 4–6 residues, from the last histidine of the first ZF to the first conserved hydrophobic residue of the second ZF), raising the possibility that the two fingers do not bind immediately adjacent 3-bp sites (pers. comm., R.N. Dutnall).
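The four zinc ligands and their spacing give C2H2 fingers a recognizable signature that can be screened for with a regular expression. The sketch below is a much cruder stand-in for the profile search described above: a profile scores every position, whereas a pattern only accepts or rejects. The pattern is a PROSITE-style C2H2 signature written from memory and should be treated as an assumption; the function name and the test sequence are hypothetical.

```python
import re

# Assumed C2H2 signature: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2(protein: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for each non-overlapping match, 1-based inclusive."""
    return [
        (m.start() + 1, m.end(), m.group())
        for m in C2H2_PATTERN.finditer(protein.upper())
    ]


# Synthetic sequence (made-up) carrying two fingers with the expected ligand spacing.
finger = "CAAC" + "AAA" + "F" + "AAAAAAAA" + "H" + "AAA" + "H"
hits = find_c2h2("MK" + finger + "GG" + finger)
```

Exact-pattern matching misses fingers whose spacing deviates from the signature, which is one reason a position-specific profile is the more sensitive choice for divergent sequences.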

The C-terminal half of the 57 kDa protein sequence shows strong similarities to protein kinase catalytic domains (Z values of 10–14 in a Smith-Waterman database search). Figure 5 shows an alignment of the fal1 57 kDa protein sequence to a profile of protein kinases and to cyclic AMP-dependent protein kinase (cAPK). Protein kinases have two domains directly involved in catalysis, a small, N-terminal domain and a large C-terminal domain; ATP is bound at the interface of the two domains (Taylor & Radzio-Andzelm, 1994). The conserved subdomains of protein kinase identified by Hanks and Hunter (1995) are indicated below the alignment.

The match of the 57 kDa protein sequence to the profile and to cAPK is particularly strong in subdomains VIA-IX, which fall in the large C-terminal domain. The invariant or nearly invariant residues identified by Hanks and Hunter (1995) are all present in the 57 kDa protein. The presence of sequences in subdomain VIA (HRDLKPEN) and VIII (GTPGYY-PE) diagnostic of serine/threonine protein kinases (underlined above; Hanks, Quinn & Hunter, 1988; Taylor, Radzio-Andzelm & Hunter, 1995) suggests that the 57 kDa protein phosphorylates serine or threonine rather than tyrosine.

The weak 57 kDa match in subdomains I-V, which constitute the small N-terminal domain of protein kinases, is problematic (Taylor & Radzio-Andzelm, 1994; Hanks & Hunter, 1995). This domain forms one side of the nucleotide binding cleft that anchors the non-transferred ATP phosphates and contains a highly conserved GxGxxG motif in subdomain I (Taylor & Radzio-Andzelm, 1994; Hanks & Hunter, 1995).

Although Figure 5 shows an alignment of the 57 kDa protein to subdomains I-V of the profile, we find no evidence of homology to any of the first five subdomains of protein kinases. Specifically, the GxGxxG motif is missing in the 57 kDa protein and is not found in other TBE1 encoded proteins (not shown). As a more sensitive test for the presence of the GxGxxG-containing domain, a profile of subdomain I was constructed from a representative set of all known protein kinases (Hanks & Hunter, 1995) and from a set of protein kinases that match the first profile poorly. These profiles were used to search all the ORFs of the four TBE1s. Matches were assessed for their quality, position relative to the catalytic domain, and the sequence conservation among the four TBE1s. The profiles failed to find convincing matches (not shown). Several proteins with known protein kinase activity have poor GxGxxG motif matches (e.g., ninaC, Montell & Rubin, 1988; VPS15, Herman et al., 1991; summarized by Hanks & Hunter, 1995); however, our subdomain I profiles do identify their GxGxxG sites.

Figure 5. Homology of the 57 kDa protein to C2H2 zinc fingers and to protein kinases. The amino acid sequence of the fal1 57 kDa peptide is shown, aligned to apparently homologous sequences. The symbol ‘)’ indicates that the 57 kDa sequence continues uninterrupted from one line to the next. Zinc fingers: The three lines labeled ‘ZF’ show two regions of the 57 kDa protein (57k ZF) aligned to, above and below, a consensus sequence for a profile of C2H2 ZFs. The profile was made with PROFILEMAKE using the first 100 ZFs in the Prosite v13.0 alignment of C2H2 ZFs (URL: http://www.ebi.ac.uk/searches/prosite doc.html). Alignment gaps are represented by dashes. Matches of the 57 kDa ZF residues to the profile are marked as given by PROFILEGAP, with strength of match indicated by ‘|’ > ‘:’ > ‘.’. Aligned residues are marked in bold when there is an exact match between the 57 kDa and profile consensus sequence or between the two 57 kDa ZFs. Expected structural sites are marked as: z = zinc ligands; � = hydrophobic sites. Positions expected to make specific nucleic acid contacts are indicated as: � = backbone contact; * = specific major groove nucleotide contacts (after Pabo & Sauer, 1992 and Berg & Shi, 1996). Protein kinase: Following the ZF alignment, the C-terminal residues of the 57 kDa protein (57 kDa) are aligned to and above the consensus sequence for a protein kinase profile (from the GCG package of profiles); the sequence of the catalytic subunit of the human cyclic-AMP dependent protein kinase (cAPK; accession no. 125205) is aligned with and below the profile. The two alignments were generated by PROFILEGAP. Gaps are indicated by dashes; a single one-residue gap was inserted into the 57 kDa sequence before the PROFILEGAP alignment, to improve the alignment in domain VIII, and is indicated as ‘�’. Matches of cAPK and the 57 kDa proteins to the profile are as marked by PROFILEGAP (see above). Residues of cAPK and the 57 kDa protein that agree exactly with the profile consensus line, or with each other, are marked in bold as is the corresponding position in the profile. Eleven conserved subdomains of the protein kinases (Subd), as defined by Hanks and Hunter (1995), are labeled below the cAPK sequence and delimited by parentheses. In subdomain I, the phosphate guide motif, GxGxxG, is indicated below the cAPK and profile sequence; its absence in the 57 kDa sequence is discussed in the text.

Discussion

Analysis of four diverged TBE1s has allowed us to identify their conserved features. In addition to the already identified ITRs (Herrick et al., 1985; Williams, Doak & Herrick, 1993) and conserved portions of the 42 kDa and 57 kDa ORFs (Witherspoon et al., 1997), we have shown that the entire 22 kDa, 42 kDa, and 57 kDa ORFs have evolved under selection for protein function (Figure 4). Also, a 550 bp region between the divergent 22 kDa and 57 kDa ORFs is conserved for a specific nucleotide sequence (Figure 3). Analyses of large sets of genomic TBE1s have shown that most of the 4000 TBE1s in the genome share a common size and similarly sized internal segments and that their 42 kDa and 57 kDa ORFs are under selection for function (Williams, Doak & Herrick, 1993; Witherspoon et al., 1997); the four TBE1s analyzed here individually share these features, and we accept them as representative of most TBE1s.

Before discussing the conserved units, it is useful to consider by what selection pressures they might have been maintained. We have considered two possible sources of conservative selection on TBE1s (Witherspoon et al., 1997). First, the fitness of the host is compromised if TBE1s are not efficiently excised from its developing MAC. Cis-acting sequences necessary for excision will be maintained, and we have suggested that TBE1-encoded proteins contribute to the removal of TBE1s from the developing MAC, and thus are selected for function at the level of the host (Williams, Doak & Herrick, 1993; Klobutcher & Herrick, 1997; Witherspoon et al., 1997). Second, there might be a selection at transposition for TBE1s capable of proliferation by transposition. Again, there is a selection to maintain the cis-acting sites in elements necessary for transposition. However, the selection is less obvious for the TBE1 encoded proteins and cis-acting sites responsible for their expression. This is because TBE1 proteins are expected to act in trans; in eukaryotes, defective elements can be propagated by the transposase provided by still-intact elements. This scenario predicts a rapid accumulation of defective elements (Kaplan, Darden & Langley, 1985), as is seen for many cut and paste elements (e.g., P, O’Hare et al., 1992; Ac, MacRae & Clegg, 1992 and references therein; mariner, Robertson & Lampe, 1995; Uhu, Brezinsky, Humphreys & Hunt, 1992; Tc1, Harris & Rose, 1989). Because we do not observe an accumulation of TBE1s with defective genes, we must postulate that TBE1-encoded proteins are necessary for excision, or effectively act in cis during transposition (Witherspoon et al., 1997).

Whatever the nature of the selection, it has acted on the 22 kDa, transposase, and ZF/protein kinase genes. Thus all three must be necessary either for excision or in cis for transposition, although different proteins and nucleotide features could be under selection for different reasons.

It is curious that both TBE1s and the Tec transposons of the ciliate Euplotes crassus (Jahn et al., 1993) have three genes: the recognized transposase and two additional genes, although the two additional Tec ORFs are not obviously homologous to the two additional TBE1 ORFs (T.G.D., unpublished results). Most eukaryotic cut-and-paste transposons (such as Tc1 and Mariner, fairly close relatives of TBE1s) carry only a single gene for transposase (Berg & Howe, 1989). The multiple genes of TBE1s and Tecs may eventually be explained by these transposons’ unique roles, current or past, in their ciliate hosts’ MAC development, because both TBE1s and Tecs are precisely excised from the developing MAC (Klobutcher & Herrick, 1997).

The activities of the 42 kDa and 57 kDa proteins are indicated by homology, but the 22 kDa protein’s function remains a mystery. The 42 kDa protein has previously been shown to be a transposase related to Tc1/mariner transposases (Doak et al., 1994), and we have suggested that it functions as TBE1 excisase during MAC development (Williams, Doak & Herrick, 1993; Witherspoon et al., 1997). The 57 kDa protein is chimeric, having two C2H2 ZFs joined by a poorly conserved linker region to a domain homologous to the catalytic subunit of protein kinases.

The odd structure of the 57 kDa protein raises two questions. First, is it a functional protein kinase? And, second, what function might this chimeric protein serve? Because the kinase catalytic domain has been conserved, including essential catalytic residues, we suspect it is functional. The function of the missing small domain, containing the GxGxxG motif, must then be provided by either an independent host protein (since we failed to find the GxGxxG motif in any of the TBE1 ORFs) or by a non-homologous domain of the 57 kDa protein. Consistent with the latter possibility, we have identified a candidate region that is well conserved among TBE1s and which corresponds in position to the small, N-terminal domain of kinases (unpublished results).
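The kind of motif search mentioned above can be illustrated with a short pattern scan. This is only a minimal sketch: the text does not say how the search of the TBE1 ORFs was actually performed, the function name `find_gxgxxg` is hypothetical, and the two sequences are invented toy strings, not TBE1 data. It assumes the motif is read literally as G-x-G-x-x-G, with x standing for any residue.

```python
import re

# Glycine-rich loop consensus G-x-G-x-x-G, where x is any residue.
# The lookahead lets overlapping occurrences be reported as well.
GXGXXG = re.compile(r"(?=(G.G..G))")


def find_gxgxxg(protein):
    """Return (0-based start, matched residues) for every GxGxxG occurrence."""
    return [(m.start(), m.group(1)) for m in GXGXXG.finditer(protein.upper())]


# Hypothetical toy sequences for illustration only (not TBE1 ORFs).
toy_orfs = {
    "with_motif": "MKLGAGSFGTVYK",
    "without_motif": "MKLSAPSFQTVYK",
}

for name, seq in toy_orfs.items():
    print(name, find_gxgxxg(seq))
```

An empty list for every translated ORF would correspond to the negative result described in the text. A literal scan like this cannot detect a diverged or non-homologous substitute for the motif, which is the alternative the authors raise.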

To our knowledge the 57 kDa protein represents the only reported protein kinase homolog joined to a C2H2 ZF domain. Might the 57 kDa protein be a member of an emerging class of protein kinases that function in association with DNA (Hartley et al., 1995; Hunter, 1995), in a manner similar to DNA-dependent kinase complexes? In mammalian immunoglobulin V(D)J recombination, the initial chromosome breaks are catalyzed by the transposition-like trans-esterification activity of RAG1 (reviewed in van Gent, Mizuuchi & Gellert, 1996), but joining of the broken chromosome depends on the ‘KU’ or scid DNA-dependent protein kinase complex (reviewed by Weaver, 1995). In Drosophila, a homolog of the mammalian DNA-dependent protein kinase complex binds to the tip of P element ITRs and plays a role in the healing of breaks created during transposition (Beall, Admon & Rio, 1994; Beall & Rio, 1996). These examples suggest that the 57 kDa protein could activate adjacently-bound host DNA repair enzymes following TBE1 transpositional and/or developmental excision. Alternatively, the 57 kDa protein might convert transposase to excisase during MAC development and specifically act on transposase already bound to TBE1 sites. However, the analogy of 57 kDa protein to DNA-dependent protein kinases is weak for two reasons. First, unlike the 57 kDa protein, the DNA binding and protein kinase activities of the DNA-dependent protein kinase reside on separate subunits. Second, while the 57 kDa kinase is by homology a conventional protein kinase, DNA-dependent kinases are lipid-kinase-related (Hartley et al., 1995; Hunter, 1995).

The ITR sequences of all four TBE1s are well conserved, suggesting that they interact critically with trans-acting factors. The telomere sequence in the ITRs suggests the involvement of host duplex telomere binding proteins (reviewed in Fang & Cech, 1995; Lundblad & Wright, 1996) in excision or transposition. It is intriguing that some members of the DNA-dependent protein kinase family are involved in telomere maintenance (reviewed in Zakian, 1995; Morrow et al., 1995; Greenwell et al., 1995).

A central region of 550 bp is strongly conserved among all four TBE1s (Figure 1C). This region adjoins the 5′ ends of the 22 kDa and 57 kDa genes, and includes a set of small repeats that span ~150 bp. These repeats could function as a transcriptional enhancer, providing multiple binding sites for transcriptional activators. They also could serve a role in the assembly of transposase-DNA complexes, analogous to the role of the Internal Activating Sequence (IAS) of transposon Mu (reviewed in Mizuuchi, 1992). The IAS provides multiple binding sites for transposase in the center of the element and is essential for formation of the active transposition complex (Watson & Chaconas, 1996). Mizuuchi (1992) suggests that Mu needs an IAS because many elements are simultaneously active during a Mu lytic infection and the IAS serves to ensure that the two interacting Mu ends belong to the same element. This case seems relevant to TBE1 excision, where 4000 TBE1s are simultaneously active in excision. For example, fal1 and fal2 reside only 935 bp apart and are in the same orientation; their excision as a single unit would remove almost 1 kbp of a conserved protein-coding MAC gene (Williams & Herrick, 1991).

In summary, we have identified five regions of strong conservation in the ciliate transposon TBE1: three conserved protein-encoding genes and two regions conserved for nucleotide sequence (the ITRs and a 550 bp central region). Such strong conservation is unexpected for a eukaryotic cut-and-paste transposon, as is the presence of three genes.

Acknowledgements

We thank Robert N. Dutnall for helpful discussion of zinc finger structure, Tom Hanks for providing a protein kinase alignment and helpful discussion, Jon Seger and H.L. Ley III for many helpful discussions and insightful reading of the manuscript, Dan Wall for producing phage DNA representing fal4, and Alice Lee, Kumar Pandya, Newton Saunders and Greg Chase for technical assistance. This work was supported by the National Institutes of Health through grant GM25203 to G.H. and Genetics Training Grant (GM07464) Fellowships to T.G.D. and D.J.W.

References

Athma, P., E. Grotewold & T. Peterson, 1992. Insertional mutagenesis of the maize P gene by intragenic transposition of Ac. Genetics 131: 199–209.

Baker, T.A. & L. Luo, 1994. Identification of residues in the Mu transposase essential for catalysis. Proc. Natl. Acad. Sci. USA 91: 6654–6658.

Beall, E.L. & D.C. Rio, 1996. Drosophila IRBP/Ku p70 corresponds to the mutagen-sensitive mus309 gene and is involved in P-element excision in vivo. Genes Dev. 10: 921–933.

Beall, E.L., A. Admon & D.C. Rio, 1994. A Drosophila protein homologous to the human p70 Ku autoantigen interacts with the P transposable element inverted repeats. Proc. Natl. Acad. Sci. USA 91: 12681–12685.

Berg, D.E. & M.M. Howe, eds., 1989. Mobile DNA. American Society for Microbiology, Washington, D.C.

Berg, J.M. & Y. Shi, 1996. The galvanization of biology: a growing appreciation for the roles of zinc. Science 271: 1081–1085.

Brezinsky, L., T.D. Humphreys & J.A. Hunt, 1992. Evolution of the transposable element Uhu in five species of Hawaiian Drosophila. Genetica 86: 21–35.

Cartinhour, S.W. & G. Herrick, 1984. Three different macronuclear DNAs in Oxytricha fallax share a common sequence block. Mol. Cell. Biol. 4: 931–938.

Choo, Y. & A. Klug, 1994. Toward a code for the interactions of zinc fingers with DNA: selection of randomized fingers displayed on phage. Proc. Natl. Acad. Sci. USA 91: 11163–11167.

Doak, T.G., F.P. Doerder, C.L. Jahn & G. Herrick, 1994. A proposed superfamily of transposase genes: transposon-like elements in ciliated protozoa and a common ‘D35E’ motif. Proc. Natl. Acad. Sci. USA 91: 942–946.

Engelman, A. & R. Craigie, 1992. Identification of conserved residues critical for human immunodeficiency virus type 1 integrase function in vitro. J. Virol. 66: 6361–6369.

Fang, G. & T.R. Cech, 1995. Telomere Proteins, pp. 69–105 in Telomeres. Cold Spring Harbor Laboratory Press.

Genetics Computer Group. Program Manual for the Wisconsin Package, Version 8, September, 1994, Genetics Computer Group, 575 Science Drive, Madison, Wisconsin, USA 53711.

Greenblatt, I.M., 1984. A chromosome replication pattern deduced from pericarp phenotypes resulting from movements of the transposable element, modulator, in maize. Genetics 108: 471–485.

Greenwell, P.W., S.L. Kronmal, S.E. Porter, J. Gassenhuber, B. Obermaier & T.D. Petes, 1995. TEL1, a gene involved in controlling telomere length in Saccharomyces cerevisiae, is homologous to the human ataxia telangiectasia gene. Cell 82: 823–829.

Gribskov, M., R. Luthy & D. Eisenberg, 1990. Profile analysis. Methods in Enzymology 183: 146–159.

Grindley, N.D. & A.E. Leschziner, 1995. DNA transposition: from a black box to a color monitor. Cell 83: 1063–1066.

Hanks, S.K. & T. Hunter, 1995. Protein kinases 6. The eukaryotic protein kinase superfamily: kinase (catalytic) domain structure and classification. FASEB Journal 9: 576–596.

Hanks, S.K., A.M. Quinn & T. Hunter, 1988. The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 241: 42–52.

Harris, L.J. & A.M. Rose, 1989. Structural analysis of Tc1 elements in Caenorhabditis elegans var. Bristol (strain N2). Plasmid 22: 10–21.

Hartley, K.O., D. Gell, G.C. Smith, H. Zhang, N. Divecha, M.A. Connelly, A. Admon, S.P. Lees-Miller, C.W. Anderson & S.P. Jackson, 1995. DNA-dependent protein kinase catalytic subunit: a relative of phosphatidylinositol 3-kinase and the ataxia telangiectasia gene product. Cell 82: 849–856.

Herman, P.K., J.H. Stack, J.A. DeModena & S.D. Emr, 1991. A novel protein kinase homolog essential for protein sorting to the yeast lysosome-like vacuole. Cell 64: 425–437.

Herrick, G., 1994. Germline-soma relationships in ciliated protozoa: the inception and evolution of nuclear dimorphism in one-celled animals. Sem. Dev. Biol. 5: 3–12.

Herrick, G., S. Cartinhour, D. Dawson, D. Ang, R. Sheets, A. Lee & K. Williams, 1985. Mobile elements bounded by C4A4 telomeric repeats in Oxytricha fallax. Cell 43: 759–768.

Hunter, T., 1995. When is a lipid kinase not a lipid kinase? When it is a protein kinase. Cell 83: 1–4.

Ina, Y., M. Mizokami, K. Ohba & T. Gojobori, 1994. Reduction of synonymous substitutions in the core protein gene of hepatitis C virus. J. Mol. Evol. 38: 50–56.

Jahn, C.L., S.Z. Doktor, J.S. Frels, J.W. Jaraczewski & M.F. Krikau, 1993. Structures of the Euplotes crassus Tec1 and Tec2 elements: identification of putative transposase coding regions. Gene 133: 71–78.

Jamieson, A.C., S.H. Kim & J.A. Wells, 1994. In vitro selection of zinc fingers with altered DNA-binding specificity. Biochemistry 33: 5689–5695.

Kaplan, N., T. Darden & C.H. Langley, 1985. Evolution and extinction of transposable elements in Mendelian populations. Genetics 109: 459–480.

Kim, K., S.Y. Namgoong, M. Jayaram & R.M. Harshey, 1995. Step-arrest mutants of phage Mu transposase: implications in DNA-protein assembly, Mu end cleavage, and strand transfer. J. Biol. Chem. 270: 1472–1479.

Klobutcher, L.A. & G. Herrick, 1997. Developmental genome reorganization in ciliated protozoa: the transposon link. Progress in Nucleic Acid Research and Mol. Bio. 56: 1–62.

Kulkosky, J., K.S. Jones, R.A. Katz, J.P.G. Mack & A.M. Skalka, 1992. Residues critical for retroviral integrative recombination in a region that is highly conserved among retroviral/retrotransposon integrases and bacterial insertion sequence transposases. Mol. Cell. Biol. 12: 2331–2338.

Lundblad, V. & W.E. Wright, 1996. Telomeres and telomerase: a simple picture becomes complex. Cell 87: 369–375.

MacRae, A.F. & M.T. Clegg, 1992. Evolution of Ac and Ds1 elements in select grasses (Poaceae). Genetica 86: 55–66.

Mizuuchi, K., 1992. Transpositional recombination: mechanistic insights from studies of Mu and other elements. Ann. Rev. Biochem. 61: 1011–1051.

Mizuuchi, M., T.A. Baker & K. Mizuuchi, 1995. Assembly of phage Mu transpososomes: cooperative transitions assisted by protein and DNA scaffolds. Cell 83: 375–385.

Montell, C. & G.M. Rubin, 1988. The Drosophila ninaC locus encodes two photoreceptor cell specific proteins with domains homologous to protein kinases and the myosin heavy chain head. Cell 52: 757–772.

Morrow, D.M., D.A. Tagle, Y. Shiloh, F.S. Collins & P. Hieter, 1995. TEL1, an Saccharomyces cerevisiae homolog of the human gene mutated in ataxia telangiectasia, is functionally related to the yeast checkpoint gene MEC1. Cell 82: 831–840.

Nei, M. & T. Gojobori, 1986. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3: 418–426.

O’Hare, K., A. Driver, S. McGrath & D.M. Johnson-Schiltz, 1992. Distribution and structure of cloned P elements from the Drosophila melanogaster P strain π2. Genet. Res. 60: 33–41.

Pabo, C.O. & R.T. Sauer, 1992. Transcription factors: structural families and principles of DNA recognition. Annu. Rev. Biochem. 61: 1053–1095.

Prescott, D.M., 1994. The DNA of ciliated protozoa. Microbiol. Rev. 58: 233–267.

Radstrom, P., O. Skold, G. Swedberg, J. Flensburg, P.H. Roy & L. Sundstrom, 1994. Transposon Tn5090 of plasmid R751, which carries an integron, is related to Tn7, Mu, and the retroelements. J. Bacteriol. 176: 3257–3268.

Rebar, E.J. & C.O. Pabo, 1994. Zinc finger phage: affinity selection of fingers with new DNA-binding specificities. Science 263: 671–673.

Rezsohazy, R., B. Hallet, J. Delcour & J. Mahillon, 1993. The IS4 family of insertion sequences: evidence for a conserved transposase motif. Mol. Microbiol. 9: 1283–1295.

Rice, P. & K. Mizuuchi, 1995. Structure of the bacteriophage Mu transposase core: a common structural motif for DNA transposition and retroviral integration. Cell 82: 209–220.

Robertson, H.M. & D.J. Lampe, 1995. Recent horizontal transfer of a mariner transposable element among and between Diptera and Neuroptera. Mol. Biol. Evol. 12: 850–862.

Robertson, H.M., 1995. The Tc1-mariner superfamily of transposons in animals. J. Insect Physiol. 41: 99–105.

Sawyer, S., 1989. Statistical tests for detecting gene conversion. Mol. Biol. Evol. 6: 526–538.

Seegmiller, A., K.R. Williams, R.L. Hammersmith, T.G. Doak, T. Messick, D.J. Witherspoon, L.L. Storjohann & G. Herrick, 1996. Internal Eliminated Sequences of Oxytricha: Allelic Fixation, Divergence, Conservation and Conversions. Mol. Biol. Evol. 13: 1351–1362.

Smith, T.F. & M.S. Waterman, 1981. Identification of common molecular subsequences. J. Mol. Biol. 147: 195–197.

Taylor, S.S. & E. Radzio-Andzelm, 1994. Three protein kinase structures define a common motif. Structure 2: 345–355.

Taylor, S.S., E. Radzio-Andzelm & T. Hunter, 1995. How do protein kinases discriminate between serine/threonine and tyrosine? Structural insights from the insulin receptor protein-tyrosine kinase. FASEB Journal 9: 1255–1266.

van Gent, D.C., A.A. Groeneger & R.H. Plasterk, 1992. Mutational analysis of the integrase protein of human immunodeficiency virus type 2. Proc. Natl. Acad. Sci. USA 89: 9598–9602.

van Gent, D.C., K. Mizuuchi & M. Gellert, 1996. Similarities between initiation of V(D)J recombination and retroviral integration. Science 271: 1592–1594.

van Luenen, H.G., S.D. Colloms & R.H. Plasterk, 1994. The mechanism of transposition of Tc3 in Caenorhabditis elegans. Cell 79: 293–301.

Vincent, K.A., V. Ellison, S.A. Chow & P.O. Brown, 1993. Characterization of human immunodeficiency virus type 1 integrase expressed in Escherichia coli and analysis of variants with amino-terminal mutations. J. Virol. 67: 425–437.

Watson, M.A. & G. Chaconas, 1996. Three-site synapsis during Mu DNA transposition: a critical intermediate preceding engagement of the active site. Cell 85: 435–445.

Weaver, D.T., 1995. What to do at an end: DNA double-strand-break repair. Trends Genet. 11: 388–392.

Williams, K., T.G. Doak & G. Herrick, 1993. Precise Excision of Oxytricha trifallax Telomere-Bearing Elements and formation of Circles Closed by a Copy of the Flanking Target Duplication. The EMBO Journal 12: 4593–4601.

Williams, K.R. & G. Herrick, 1991. Expression of the gene encoded by a family of macronuclear chromosomes generated by alternative DNA processing in Oxytricha fallax. Nucleic Acids Res. 19: 4717–4724.

Witherspoon, D.J., T.G. Doak, K. Williams, J. Seger & G. Herrick, 1997. Selection on the protein-coding genes of the TBE1 family of transposable elements in the ciliates Oxytricha fallax and O. trifallax. Mol. Biol. Evol. 14: 696–706.

Yang, W. & T.A. Steitz, 1995. Recombining the structures of HIV integrase, RuvC and RNase H. Structure 3: 131–134.

Zakian, V.A., 1995. ATM-related genes: what do they tell us about functions of the human gene? Cell 82: 685–687.

Zhang, P. & A.C. Spradling, 1993. Efficient and dispersed local P element transposition from Drosophila females. Genetics 133: 361–373.