Date post: 20-Jan-2016
Upload: brice-roberts
Genome Projects and Gene Hunting

Wen-chang Lin
Institute of Biomedical Sciences, Academia Sinica
Taipei, Taiwan, R.O.C.
E-mail: [email protected]
http://www.ibms.sinica.edu.tw/~wenlin

The Human Genome Project is an ambitious effort to understand the hereditary instructions that make each of us unique. The goal of this effort is to find the location of the 100,000 or so human genes and to read the entire genetic script, all 3 billion bits of information, by the year 2005.

What is the Human Genome Project?

The Human Genome Project (HGP) is an international research program designed to construct detailed genetic and physical maps of the human genome, to determine the complete nucleotide sequence of human DNA, to localize the estimated 50,000-100,000 genes within the human genome, and to perform similar analyses on the genomes of several other organisms used extensively in research laboratories as model systems. The scientific products of the HGP will comprise a resource of detailed information about the structure, organization and function of human DNA, information that constitutes the basic set of inherited "instructions" for the development and functioning of a human being.

Successfully accomplishing these ambitious goals will demand the development of a variety of new technologies. It will also necessitate advanced means of making the information widely available to scientists, physicians, and others in order that the results may be rapidly used for the public good. Improved technology for biomedical research will thus be another important product of the HGP.

From the inception of the HGP, it was clearly recognized that acquisition and use of such genetic knowledge would have momentous implications for both individuals and society and would pose a number of policy choices for public and professional deliberation. Analysis of the ethical, legal, and social implications of genetic knowledge, and the development of policy options for public consideration are therefore yet another major component of the human genome research effort.

Specific Goals (Phase I 1993-1998)

Genetic Map
• Complete the 2-5 cM map by 1995
• Develop technology for rapid genotyping
• Develop markers that are easier to use
• Develop new mapping technologies

Physical Map
• Complete an STS map of the human genome at a resolution of 100 kb

DNA Sequencing
• Develop efficient approaches to sequencing one- to several-megabase regions of DNA of high biological interest.
• Develop technology for high throughput sequencing, focusing on systems integration of all steps from template preparation to data analysis.
• Build up sequencing capacity to a collective rate of 50 Mb per year by the end of the period. This rate should result in an aggregate of 80 Mb of DNA sequence completed by the end of FY 1998.

Gene Identification
• Develop efficient methods of identifying genes and for placement of known genes on physical maps or sequenced DNA.

Technology Development
• Substantially expand support of innovative technological developments as well as improvements in current technology for DNA sequencing and to meet the needs of the Human Genome Project as a whole.

Model Organisms
• Finish an STS map of the mouse at 300 kb resolution
• Finish the sequence of the E. coli and S. cerevisiae genomes by 1998 or earlier
• Continue sequencing C. elegans and Drosophila genomes, with the aim of bringing C. elegans to near completion by 1998
• Sequence selected segments of mouse DNA side by side with corresponding human DNA in areas of high biological interest

Informatics
• Continue to create, develop and operate databases and database tools for easy access to data, including effective tools and standards for data exchange and links among databases
• Consolidate, distribute and continue to develop effective software for large-scale genome projects
• Continue to develop tools for comparing and interpreting genome information

Ethical, Legal and Social Implications (ELSI)
• Continue to identify and define issues and develop policy options to address them
• Develop and disseminate policy options regarding genetic testing services with widespread potential use
• Foster greater acceptance of human genetic variation
• Enhance and expand public and professional education that is sensitive to sociocultural and psychological issues

Training
• Continue to encourage training of scientists in interdisciplinary sciences related to genome research

Technology Transfer
• Encourage and enhance technology transfer both into and out of centers of genome research

Outreach
• Cooperate with those who would set up distribution centers for genome materials.
• Share all information and materials within 6 months of their development. This should be accomplished by submission to public databases or repositories, or both, where appropriate.


Specific Goals (Phase II 1998-2003)


Goal 1--The Human DNA Sequence

a) Finish the complete human genome sequence by the end of 2003.

b) Finish one-third of the human DNA sequence by the end of 2001.

c) Achieve coverage of at least 90% of the genome in a working draft based on mapped clones by the end of 2001.

d) Make the sequence totally and freely accessible.


Goal 2--Sequencing Technology

a) Continue to increase the throughput and reduce the cost of current sequencing technology.

b) Support research on novel technologies that can lead to significant improvements in sequencing technology.

c) Develop effective methods for the advanced development and introduction of new sequencing technologies into the sequencing process.

Goal 3--Human Genome Sequence Variation

a) Develop technologies for rapid, large-scale identification or scoring, or both, of SNPs and other DNA sequence variants.

b) Identify common variants in the coding regions of the majority of identified genes during this 5-year period.

c) Create an SNP map of at least 100,000 markers.

d) Develop the intellectual foundations for studies of sequence variation.

e) Create public resources of DNA samples and cell lines.


Goal 4--Technology for Functional Genomics

a) Develop cDNA resources.

b) Support research on methods for studying functions of non-protein-coding sequences.

c) Develop technology for comprehensive analysis of gene expression.

d) Improve methods for genome-wide mutagenesis.

e) Develop technology for global protein analysis.


Goal 5--Comparative Genomics

a) Complete the sequence of the C. elegans genome in 1998.

b) Complete the sequence of the Drosophila genome by 2002.

c) The mouse genome.
   1) Develop physical and genetic mapping resources.
   2) Develop additional cDNA resources.
   3) Complete the sequence of the mouse genome by 2005.

d) Identify other model organisms that can make major contributions to the understanding of the human genome and support appropriate genomic studies.


Goal 6--Ethical, Legal, and Social Implications (ELSI)

U.S. Human Genome Project Funding ($ Millions)

FY      DOE     NIH*    U.S. Total
1988    10.7    17.2    27.9
1989    18.5    28.2    46.7
1990    27.2    59.5    86.7
1991    47.4    87.4    134.8
1992    59.4    104.8   164.2
1993    63.0    106.1   169.1
1994    63.3    127.0   190.3
1995    68.7    153.8   222.5
1996    73.9    169.3   243.2
1997    77.9    188.9   266.8
1998    85.5    217.7   303.2
1999    89.8    225.7   315.5 (NT$9,780,500,000)

Mar. 24, 2000
Finished sequence: 561,973 kb (17.5% of genome)
Draft sequence: 2,020,129 kb (62.9% of genome)

Current Progress
Breakdown by Chromosome

Chr     Effective    Sequence     Percent     Number of    Longest
        size (kb)    done (kb)    finished    contigs      contig (kb)
1       263000       26571        10.1%       154          928
2       255000       23193        9.1%        109          695
3       214000       10417        4.9%        59           746
4       203000       12521        6.2%        99           393
5       194000       15679        8.1%        94           739
6       183000       45668        25.0%       305          3926
7       171000       81476        47.6%       298          2094
8       155000       8730         5.6%        42           1902
9       145000       4839         3.3%        30           1010
10      144000       6091         4.2%        36           469
11      144000       8398         5.8%        63           817
12      143000       24509        17.1%       99           1526
13      98000        2143         2.2%        7            1416
14      93000        29775        32.0%       106          1450
15      89000        2196         2.5%        17           297
16      98000        19372        19.8%       118          512
17      92000        28861        31.4%       129          1101
18      85000        3734         4.4%        20           349
19      67000        15021        22.4%       144          1008
20      72000        25825        35.9%       137          1187
21      39000        25851        66.3%       72           7223
22      34491        33620        97.5%       12           23051
X       164000       65513        39.9%       347          949
Y       35000        6934         19.8%       27           1104
total   3180491      528043       16.6%       2532


The completed sequence covers 33.4 Mb of 22q with 11 gaps and has been estimated to be accurate to less than 1 error in 50,000 bases, by internal and external checking exercises. The largest contiguous segment stretches over 23 Mb. From our gap-size estimates, we calculate that we have completed 33,464 kb of a total region spanning 34,491 kb and that therefore the sequence is complete to 97% coverage of 22q.


545 genes;

134 pseudogenes.


http://www.ornl.gov/hgmis/


http://www.ncbi.nlm.nih.gov/disease/

3,000 ~ 4,000 genes

Completed Genomes

Organism                                 Genome size (Mb)   Estimated genes
Caenorhabditis elegans                   100
Saccharomyces cerevisiae                 12.1               6034
Escherichia coli                         4.6                4288
Bacillus subtilis                        4.2                ~4000
Synechocystis sp.                        3.6                3168
*Archaeoglobus fulgidus                  2.2                2471
*Pyrobaculum aerophilum                  2.2                N.A.
Haemophilus influenzae                   1.8                1740
*Methanobacterium thermoautotrophicum    1.8                1855
Helicobacter pylori                      1.7                1590
*Methanococcus jannaschii                1.7                1692
*Aquifex aeolicus                        1.5                1508
Borrelia burgdorferi                     1.3                863
Treponema pallidum                       1.1                1234
Mycoplasma pneumoniae                    0.8                677
*Mycoplasma genitalium                   0.6                470

Organism                                 Genome size (Mb)
Treponema pallidum                       1.14
Chlamydia trachomatis                    1.05
Plasmodium falciparum Chr2               1
Rickettsia prowazekii                    1.1
Helicobacter pylori                      1.64
Leishmania major Chr1                    0.27

The TIGR Microbial Database provides links to world-wide genome sequencing projects completed and underway, including the completed TIGR genomes: Archaeoglobus fulgidus, Borrelia burgdorferi, Deinococcus radiodurans, Haemophilus influenzae, Helicobacter pylori, Methanococcus jannaschii, Mycobacterium tuberculosis, Mycoplasma genitalium, Thermotoga maritima, and Treponema pallidum.


In the last few decades, advances in molecular biology and the equipment available for research in this field have allowed the increasingly rapid sequencing of large portions of the genomes of several species. In fact, to date, several bacterial genomes, as well as those of some simple eukaryotes (e.g., Saccharomyces cerevisiae, or baker's yeast) have been sequenced in full. The Human Genome Project, designed to sequence all 24 of the human chromosomes, is also progressing. Popular sequence databases, such as GenBank and EMBL, have been growing at exponential rates. This deluge of information has necessitated the careful storage, organization and indexing of sequence information. Information science has been applied to biology to produce the field called Bioinformatics.

The most pressing tasks in bioinformatics involve the analysis of sequence information. Computational Biology is the name given to this process, and it involves the following:

• Finding the genes in the DNA sequences of various organisms.
• Developing methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences.
• Clustering protein sequences into families of related sequences and the development of protein models.
• Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.

Simple Mathematics:

Human Genome: 3 x 10^9 bp

Human Genes (5% of the genome): 100,000 genes

In a given cell type at a certain stage, it is estimated that around 20% of the genes are transcribed or expressed: 20,000 genes
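The arithmetic on this slide can be written out as a short Python sketch. The variable names are illustrative, and the inputs are the slide's own estimates rather than measured values. The average genic span per gene is not stated on the slide; it is only what those estimates imply when divided out.

```python
# Estimates taken from the slide; variable names are illustrative.
GENOME_SIZE_BP = 3 * 10**9   # human genome, base pairs
GENE_PERCENT = 5             # percent of the genome occupied by genes
TOTAL_GENES = 100_000        # estimated number of human genes
EXPRESSED_PERCENT = 20       # percent of genes expressed in one cell type

# Integer arithmetic keeps the results exact.
genic_bp = GENOME_SIZE_BP * GENE_PERCENT // 100
expressed_genes = TOTAL_GENES * EXPRESSED_PERCENT // 100
avg_bp_per_gene = genic_bp // TOTAL_GENES   # implied by the estimates, not stated

print(f"Genic sequence:   {genic_bp:,} bp")
print(f"Expressed genes:  {expressed_genes:,}")
print(f"Implied average:  {avg_bp_per_gene:,} bp per gene")
```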


Automatic sequencer

The Growth of GenBank sequence database in the past 10 years.

Release   Year   Base pairs       Entries
58        88     24,690,876       21,248
62        89     37,183,950       31,229
66        90     51,306,092       41,057
70        91     77,337,678       58,952
74        92     120,242,234      97,084
80        93     163,802,597      150,744
86        94     230,485,928      237,775
92        95     425,860,958      620,765
98        96     730,552,938      1,114,581
104       97     1,258,290,513    1,891,953
110       98     2,162,067,871    3,043,729
115       99     4,653,932,745    5,354,511
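To put a number on the growth shown in the table, the sketch below estimates an average doubling time from its first and last rows. It assumes the two-digit years mean 1988 and 1999 and that the two releases are exactly eleven years apart, which is only approximately true, so the result is a rough figure.

```python
import math

# (year, base pairs) from the first and last rows of the table above.
first_year, first_bp = 1988, 24_690_876
last_year, last_bp = 1999, 4_653_932_745

years = last_year - first_year
fold_increase = last_bp / first_bp

# For exponential growth N(t) = N0 * 2**(t / T), so T = t / log2(N(t) / N0).
doubling_time = years / math.log2(fold_increase)

print(f"{fold_increase:.0f}-fold increase over {years} years")
print(f"approximate doubling time: {doubling_time:.2f} years")
```

The same calculation applied to the Entries column would give a separate estimate for the number of records.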


Gene Expression Studies

GenBank Overview

What is GenBank?
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research 1998 Jan 1;26(1):1-7). There are approximately 2,162,000,000 bases in 3,044,000 sequence records as of December 1998. As an example, you may view the record for the neurofibromatosis gene. The complete release notes for the current version of GenBank are available. A new release is made every two months. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis.

Submissions to GenBank
Many journals require submission of sequence information to a database prior to publication so that an accession number may appear in the paper. NCBI has a WWW form, called BankIt, for convenient and quick submission of sequence data. The beta-test version of Sequin, NCBI's new stand-alone submission software for Mac, PC, and UNIX platforms, is available by FTP. When using Sequin, the output files for direct submission should be sent to GenBank by electronic mail. Alternatively, the data files may be copied to a floppy disk and mailed to NCBI. Authorin, an older stand-alone program for Macs and PCs, can still be used to format your submission, although submitters are encouraged to switch to either BankIt or Sequin.

Searching GenBank

Text and Similarity searching

Entrez Browser: GenBank (nucleotides and proteins), PubMed (MEDLINE), 3D structures, genomes, and taxonomy databases.

BLAST Sequence Similarity Searching: Nucleotide or protein query sequences against the specified database using the BLAST suite of algorithms.

dbEST: Searching dbEST (Database of Expressed Sequence Tags).

GenBank nr database: FASTA format

>gi|216185|dbj|D00635|ABCADHCC Acetobacter polyoxogenes genes for alcohol dehydrogenase, cytochrome c, complete cds
GAATTCCGAACTATCCGTTTCATTGCTTATGCGACAGCATGTTCACTTTTTAGTGAGGCTGAACACTAAAATGTCAGGAGACGAGCGTGCTAGCCTCAGTATGTTGCCATGAAACGGACCACCTGCTTTGTCTTTCCTGCCTGAAGCCGGTTTCTGTCTGGCCGGAAAAGAAGCGCTAGCGCGTTTTTTTGCCGGATACATTCAGAAAGCTGCTCCGGGCAGAAAGTTGCAGCGGCGGCATCCTGAATTCGAAACCGTTAGTTTTCTGAGGACATCACATATGATTTCTGCCGTTTTCGGAAAAAGACGTTCTCTGAGCAGAACGCTTACAGCCGGAACGATATGTGCGGCTCTCATCTCCGGGTATGCCACCATGGCATCCGCAGATGACGGGCAGGGCGCCACGGGGGAAGCGATCATCCATGCCGATGATCACCCCGGTAACTGGATGACCTATGGCCGCACCTATTCTGACCAGCGCTACAGCCCGCTGGATCAGATCAACCGTTCCAATGTCGGTAACCTGAAGCTGGCCTGGTATCTGGACCTTGATACCAACCGTGGCCAGGAAGGCACGCCCCTGGTTATTGATGGCGTCATGTACGCCACCACCAACTGGAGCATGATGAAAGCCGTCGACGCCGCAACCGGCAAGCTGCTGTGGTCCTATGACCCGCGCGTGCCCGGCAACATTGCCGACAAGGGCTGCTGTGACACGGTCAACCGTGGCGCGGCATACTGGAATGGCAAGGTCTATTTCGGCACGTTCGACGGTCGCCTGATCGCGCTGGACGCCAAGACCGGCAAGCTGGTCTGGAGCGTCAACACCATTCCGCCCGAAGCGGAACTGGGCAAGCAGCGTTCCTATACGGTTGACGGCGCGCCCCGTATCGCCAAGGGCCGCGTGA
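A FASTA record is a ">" header line followed by one or more sequence lines. The sketch below shows one way to read such records and split the old-style NCBI header used on this slide into its fields. The function names `read_fasta` and `parse_header` are illustrative, not part of any NCBI tool, and the example sequence is truncated to the first 90 bases of the record shown above.

```python
from typing import Dict, Iterator, Tuple

def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)          # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)

def parse_header(header: str) -> Dict[str, str]:
    """Split a header of the form gi|<gi>|<db>|<accession>|<locus> <description>."""
    ids, _, description = header.partition(" ")
    fields = ids.split("|")
    return {
        "gi": fields[1],
        "database": fields[2],
        "accession": fields[3],
        "locus": fields[4],
        "description": description,
    }

# Header from the slide; sequence truncated to its first 90 bases.
example = """>gi|216185|dbj|D00635|ABCADHCC Acetobacter polyoxogenes genes for alcohol dehydrogenase, cytochrome c, complete cds
GAATTCCGAACTATCCGTTTCATTGCTTATGCGACAGCATGTTCACTTTTTAGTGAGGCT
GAACACTAAAATGTCAGGAGACGAGCGTGC
"""

records = list(read_fasta(example))
for header, sequence in records:
    info = parse_header(header)
    print(info["accession"], info["database"], len(sequence))
```

Here "dbj" marks a record that originated at DDBJ, and D00635 is its accession number.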


Medline searches: Academia Sinica Library (local)

http://igm.nlm.nih.gov/

Given COX-1 and COX-2, can a putative COX-3 be identified?

Text search for COX-3 (and suitable alternative forms)

Acquire human COX-1 and COX-2 sequences

Search for sequence similarities in a full-length sequence database

Search for sequence similarities in an EST database

Merge the results of the full-length and EST searches

ESTs virtually identical to COX-1 and COX-2

May provide tissue localization information

Strong similarities with other genes indicate close relationship of COX family to another gene family - probably with a different function

ESTs similar, but not identical to COX-1/-2

Search ESTs back against full-length databases

Is it highly similar to COX-1, COX-2 or both?
Is it only weakly similar?
If so, might it be more similar to something else, a putative COX-3?

In silico cloning:

In order to perform an electronic cDNA library screen, the EST sequences retrieved in this way can be used as queries in a BLASTN search of dbEST to identify overlapping ESTs. This procedure can be repeated with the newly identified ESTs until no additional hits are found. The ESTs isolated can then be assembled into sequence contigs using computer software.

[Figure: Query sequence with overlapping EST 1, EST 2 and EST 3]
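The iterative search described here can be sketched as a loop that keeps re-querying with every newly found EST until a round produces no new hits. This is only a toy model: a shared exact k-mer stands in for a BLASTN hit, the EST "database" is a small in-memory dictionary, and all sequences and names (`shares_kmer`, `walk_ests`, EST1 to EST4) are hypothetical.

```python
from typing import Dict, Set

def shares_kmer(a: str, b: str, k: int = 8) -> bool:
    """Crude stand-in for a BLASTN hit: True if a and b share any exact k-mer."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))

def walk_ests(query: str, est_db: Dict[str, str], k: int = 8) -> Set[str]:
    """Collect the names of ESTs reachable from the query through chains of overlaps."""
    found: Set[str] = set()
    frontier = [query]
    while frontier:                      # stops once no new hits remain
        probe = frontier.pop()
        for name, seq in est_db.items():
            if name not in found and shares_kmer(probe, seq, k):
                found.add(name)
                frontier.append(seq)     # each new EST becomes a query in turn
    return found

# Hypothetical toy data: EST1-EST3 tile one transcript, EST4 is unrelated.
transcript = "ATGGCGTACGTTAGCCTGAAGTCCGATTACGGCATCAGTTGACC"
est_db = {
    "EST1": transcript[0:18],
    "EST2": transcript[10:30],
    "EST3": transcript[22:44],
    "EST4": "TTTTTTTTTTCCCCCCCCCCGGGGGGGGGG",
}
query = transcript[0:12]
hits = walk_ests(query, est_db)
print(sorted(hits))
```

In this toy data the query overlaps only EST1 directly; EST2 is reached through EST1 and EST3 through EST2, which is the point of repeating the search. Assembling the collected ESTs into a contig is a separate step that this sketch does not attempt.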

  1 mdltkmgmiq lqnpshptgl lckanqmrla gtlcdvvimv dsqefhahrt vlactskmfe
 61 ilfhrnsqhy tldflspktf qqileyayta tlqakaedld dllyaaeile ieyleeqclk
121 mletiqasdd ndteatmadg gaeeeedrka rylknifisk hsseesgyas vagqslpgpm
181 vdqspsvsts fglsamsptk aavdslmtig qsllqgtlqp pagpeeptla gggrhpgvae
241 vktemmqvde vpsqdspgaa essisggmgd kveergkegp gtptrssvit sarelhygre
301 esaeqvpppa eagqaptgrp ehpapppekh lgiysvlpnh kadavlsmps svtsglhvqp
361 alavsmdfst yggllpqgfi qrelfsklge lavgmksesr tigeqcsvcg velpdneave
421 qhrklhsgmk tygcelcgkr fldslrlrmh llahsagaka fvcdqcgaqf skedalethr
481 qthtgtdmav fcllcgkrfq aqsalqqhme vhagvrsyic secnrtfpsh talkrhlrsh
541 tgdhpyecef cgscfrdest lkshkrihtg ekpyecngcd kkfslkhqle thyrvhtgek
601 pfecklchqr srdysamikh lrthngaspy qcticteycp slssmqkhmk ghkpeeippd
661 wriektylyl cyv

Sequence Alignment and Similarity Search:

One goal of sequence alignment is to enable the researcher to determine whether two sequences display sufficient similarity to justify the inference of homology. Similarity is an observable quantity that might be expressed as, say, percent identity or some other suitable measure. Homology, on the other hand, refers to a conclusion drawn from these data that two genes share a common evolutionary history. While it is presumed that homologous sequences have diverged from a common ancestral sequence through iterative changes, we do not actually know what the ancestral sequence was (barring the possibility that DNA could be recovered from a fossil); all we have to observe are the sequences from extant organisms. In a residue-by-residue alignment it is often apparent that certain regions of a protein, or perhaps specific amino acids, are more highly conserved than others. This information may be suggestive of which residues are most crucial for maintaining a protein's structure or function.
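Percent identity, the similarity measure mentioned above, can be computed directly from two aligned sequences. The sketch below uses one common convention, assumed here: columns containing a gap are skipped and identity is the fraction of the remaining columns that match. Other conventions divide by the full alignment length or by the shorter sequence and give different numbers. The two strings are a 26-residue stretch of the pLZF/TZFP alignment shown in these slides (pLZF residues 573-598).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue                     # skip columns containing a gap
        compared += 1
        matches += (x == y)
    return 100.0 * matches / compared if compared else 0.0

plzf = "PYECNGCDKKFSLKHQLETHYRVHTG"
tzfp = "PYACSVCGKRFSLKHQMETHYRVHTG"
identity = percent_identity(plzf, tzfp)
print(f"{identity:.1f}% identity over {len(plzf)} columns")
```

A high value such as this one is an observation about similarity; whether it justifies inferring homology is the separate judgment the text describes.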

hum pLZF p   1 MDLTKMGMIQLQNPSHPTGLLCKANQMRLAGTLCDVVIMVDSQEFHAHRTVLACTSKMFE 60
hum TZFP p   1 MSLPPIRLPSPYGSDRLVQLAARLRPA--LCDTLITVGSQEFPAHSLVLAGVSQQLG 55
               : I:L P L: A ::R A LCD :I V SQEF AH VLA S:

hum pLZF p  61 ILFHRNSQHYTLDFLSPKTFQQILEYAYTATLQAKAEDLDDLLYAAEILEIEYLEEQCLK 120
hum TZFP p  56 ----RRGQWALGEGISPSTFAQLLNFVYGESVELQPGELRPLQEAARALGVQSLEEACWR 111
               R Q : :SP TF Q:L : Y ::: : :L L AA L :: LEE C :

hum pLZF p 121 MLETIQASDDNDTEATMADGGAEEEEDRKARYLKNIFISKHSSEESGYASVAGQSLPGPM 180
hum TZFP p 112 ARGDRAKKPDP--------G-----------------LKKHQEEPEKPSRNPERELGDPG 146
               D G : KH E : : L P

hum pLZF p 181 VDQSP-SVSTSFGLSAMSPTKAAVDSLMTIGQSLLQGTLQPPAGPEEPTLAGGGRHPGVA 239
hum TZFP p 147 EKQKPEQVSRTGGR-----------------EQEMLHKHSPPRG--RPEMAG-------- 179
               Q P VS : G : : PP G P :AG

hum pLZF p 240 EVKTEMMQVDEVPSQDSPGAAESSISGGMGDKVEERGKEGPGTPTRSSVITSARELHYGR 299
hum TZFP p 180 --ATQEAQQEQTRSK------EKRLQAPVG----QRGADG-----KHGVLTWLRENPGGS 222
               T: Q :: S: E : :G :RG :G : V:T RE G

hum pLZF p 300 EESAEQVPPPAEAGQAPTGRPEHPAPP-PEKHLGIYSVLPNHKADAVLSMPSSVTSGLHV 358
hum TZFP p 223 EESLRKLPGPLP----PAGSLQTSVTPRPSWAEAPWLVGGQPALWSILLMPP-------- 270
               EES ::P P P:G : P P V : ::L MP

hum pLZF p 359 QPALAVSMDFSTYGGLLPQGFIQRELFSKLGELAVGMKSESRTIGEQCSVCGVELPDNEA 418
hum TZFP p 271 RYGIPFYHSTPTTGAWQEVWREQRIPLSLNAPKGLWSQNQ---L-ASSSPTPGSLP---- 322
               : : T G QR S : : : : :S : LP

hum pLZF p 419 VEQHRKLHSGMKTYGCELCGKRFLDSLRLRMHLLAHSAGAKAFVCDQCGAQFSKEDALET 478
hum TZFP p 323 ----------------------------------------------QGPAQLS-PGEMEE 335
               Q AQ S :E

hum pLZF p 479 HRQTHTGTDMAVFCLLCGKRFQAQSALQQHMEVHAGVRSYICSECNRTFPSHTALKRHLR 538
hum TZFP p 336 SDQGHTG---------------ALATCAGHEDKAG------CPPRPHPPPAPPARSR--- 371
               Q HTG A :: H : C : P: A R

hum pLZF p 539 SHTGDHPYECEFCGSCFRDESTLKSHKRIHTGEKPYECNGCDKKFSLKHQLETHYRVHTG 598
hum TZFP p 372 ----------------------------------PYACSVCGKRFSLKHQMETHYRVHTG 397
               PY C C K:FSLKHQ:ETHYRVHTG

hum pLZF p 599 EKPFECKLCHQRSRDYSAMIKHLRTHNGASPYQCTICTEYCPSLSSMQKHMKGHKPEEIP 658
hum TZFP p 398 EKPFSCSLCPQRSRDFSAMTKHLRTH-GAAPYRCSLCGAGCPSLASMQAHMRGHSPSQLP 456
               EKPF C LC QRSRD:SAM KHLRTH GA:PY:C::C CPSL:SMQ HM:GH P ::P

hum pLZF p 659 PDWRIEKTYLY------------LCYV 673
hum TZFP p 457 PGWTIRSTFLYSSSRPSRPSTSPCCPSSSTT 487
               P W I T:LY C

Page 65: Genome Projects and Gene Hunting Genome Projects and Gene Hunting Institute of Biomedical Sciences, Academia Sinica Taipei, Taiwan R. O. C. Wen-chang.

Sequence Alignment and Similarity Search:

Database similarity searching allows us to determine which of the hundreds of thousands of sequences present in the database are potentially related to a particular sequence of interest. In database searching, the basic operation is to sequentially align a query sequence to each subject sequence in the database. The results are reported as a ranked hit list followed by a series of individual sequence alignments, plus various scores and statistics. Current sequence databases are already immense and continue to grow at an exponential rate, making straightforward application of dynamic programming methods impractical for database searching. One solution is to use massively parallel computers. There are several frequently used programs available on the Internet:

FastA
BLITZ
BLAST
Smith-Waterman based system (GenWeb of NHRI)
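The basic operation described above, aligning one query against every subject and reporting a ranked hit list, can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the scoring values (match +2, mismatch -1, gap -2), the function names, and the three-sequence toy database are invented here, and real tools such as FastA and BLAST add heuristics and statistics that this sketch omits.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)          # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def search_database(query, database):
    """Align the query to each subject in turn and return a ranked hit list."""
    hits = [(smith_waterman_score(query, seq), name) for name, seq in database.items()]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))   # best score first
    return hits


# Hypothetical toy database; the sequences are invented for illustration.
toy_database = {
    "subject_A": "TTGACGGGAGGGCACAGG",
    "subject_B": "CCCCCCCCCCCC",
    "subject_C": "AAAGGGCACTTT",
}

for score, name in search_database("GGGCACAGG", toy_database):
    print(name, score)
```

Because every subject is aligned in full, the running time grows with the query length times the total database length, which is why exhaustive dynamic programming becomes impractical for large databases.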


Blast Family of Programs

The BLAST family of programs allows all combinations of DNA or protein query sequences with searches against DNA or protein databases:

blastp compares an amino acid query sequence against a protein sequence database.

blastn compares a nucleotide query sequence against a nucleotide sequence database.

blastx compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database.

tblastn compares a protein query sequence against a nucleotide sequence database dynamically translated in all six reading frames (both strands).

tblastx compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.

The default matrix for all protein-protein comparisons is BLOSUM62.
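As a rough illustration of the translated searches, the sketch below builds the six conceptual translations of a nucleotide sequence (three reading frames on each strand) and picks a program name from the query and database types. The helper names are hypothetical and the standard genetic code is assumed; the real BLAST programs do far more than this.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return the translations of frames +1..+3 and -1..-3."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]     # reverse complement strand
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def choose_blast_program(query_type, database_type, translate_both=False):
    """Map query/database types ('protein' or 'nucleotide') to a program name."""
    if query_type == "protein" and database_type == "protein":
        return "blastp"
    if query_type == "nucleotide" and database_type == "protein":
        return "blastx"
    if query_type == "protein" and database_type == "nucleotide":
        return "tblastn"
    if query_type == "nucleotide" and database_type == "nucleotide":
        return "tblastx" if translate_both else "blastn"
    raise ValueError("types must be 'protein' or 'nucleotide'")


# First 24 bases of the TZFP coding sequence shown later in these slides.
frames = six_frame_translations("ATGTCCCTGCCCCCCATAAGACTG")
print(frames["+1"])
print(choose_blast_program("protein", "nucleotide"))
```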


Databases available for BLAST search

Protein Sequence Databases
nr          All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
month       All new or revised GenBank CDS translation+PDB+SwissProt+PIR+PRF released in the last 30 days.
swissprot   the last major release of the SWISS-PROT protein sequence database (no updates)
yeast       Yeast (Saccharomyces cerevisiae) protein sequences.
E. coli     E. coli genomic CDS translations
pdb         Sequences derived from the 3-dimensional structure Brookhaven Protein Data Bank

Nucleotide Sequence Databases
nr          All Non-redundant GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or HTGS sequences)
month       All new or revised GenBank+EMBL+DDBJ+PDB sequences released in the last 30 days.
dbest       Non-redundant Database of GenBank+EMBL+DDBJ EST Divisions
dbsts       Non-redundant Database of GenBank+EMBL+DDBJ STS Divisions
yeast       Yeast (Saccharomyces cerevisiae) genomic nucleotide sequences
E. coli     E. coli genomic nucleotide sequences


CLUSTAL W

One of the most widely used multiple sequence alignment programs. Based on the idea of progressive alignment, this program takes an input set of sequences and calculates a series of pairwise alignments, comparing each sequence to every other sequence, one at a time.
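The all-against-all pairwise step can be sketched as follows. This is not the CLUSTAL W implementation: it is a minimal sketch that assumes a simple global alignment score (match +1, mismatch -1, gap -2) and hypothetical labels for three zinc finger segments taken from these slides. It stops at the table of pairwise scores and does not go on to build an alignment.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch style global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def pairwise_scores(sequences):
    """Compare each sequence to every other sequence, one pair at a time."""
    return {
        (x, y): global_alignment_score(sequences[x], sequences[y])
        for x, y in combinations(sorted(sequences), 2)
    }


# Zinc finger segments from the slides; the labels are hypothetical.
fingers = {
    "PLZF_finger1": "CGVELPDNEAVEQH",
    "PLZF_finger7": "CDKKFSLKHQLETH",
    "TZFP_finger": "CGKRFSLKHQMETH",
}
scores = pairwise_scores(fingers)
closest_pair = max(scores, key=scores.get)   # the most similar pair
print(closest_pair, scores[closest_pair])
```

For n input sequences this step performs n(n-1)/2 pairwise comparisons, so its cost grows quadratically with the number of sequences.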


Human PLZF: C2H2 zinc finger motif

Position  Motif  Part #1      Part #2                    Part #3
406       ZINC1  406 1/1 C    409 14/14 CGVELPDNEAVEQH   426 1/1 H
434       ZINC1  434 1/1 C    437 14/14 CGKRFLDSLRLRMH   454 1/1 H
463       ZINC1  463 1/1 C    466 14/14 CGAQFSKEDALETH   483 1/1 H
492       ZINC1  492 1/1 C    495 14/14 CGKRFQAQSALQQH   512 1/1 H
520       ZINC1  520 1/1 C    523 14/14 CNRTFPSHTALKRH   540 1/1 H
548       ZINC1  548 1/1 C    551 14/14 CGSCFRDESTLKSH   568 1/1 H
576       ZINC1  576 1/1 C    579 14/14 CDKKFSLKHQLETH   596 1/1 H
604       ZINC1  604 1/1 C    607 14/14 CHQRSRDYSAMIKH   624 1/1 H
632       ZINC1  632 1/1 C    635 14/14 CTEYCPSLSSMQKH   652 1/1 H


BLOCK

ID   ZINC_FINGER_C2H2; BLOCK
AC   BL00028; distance from previous block=(7,2235)
DE   Zinc finger, C2H2 type, domain proteins.
BL   CHP motif; width=29; seqs=135; 99.5%=1594; strength=1246
ADR1_YEAST ( 106) CEVCTRAFARQEHLKRHYRSHTNEKPYPC 10
AEF1_DROME ( 214) CNVCDKTFRQSSTLTNHLKIHTGEKPYNC 10
AZF1_YEAST ( 623) CDYCGKRFTQGGNLRTHERLHTGEKPYSC 10
BASO_HUMAN ( 358) CTACEKTFYDKGTLKIHYNAVHLKIKHKC 39
BRC1_DROME ( 669) CNICKRVYSSLNSLRNHKSIYHRNLKQPK 37
BRC2_DROME ( 471) CAICERVYCSRNSLMTHIYTYHKSRPGEM 27
BRC3_DROME ( 467) GSLAAAVYSLHSHAHGHVLGHATSPPRPG 87
BRLA_EMENI ( 324) EPGCNGRFKRQEHLKRHMKSHSKEKPHVC 22
BTEB_RAT   ( 147) YSGCGKVYGKSSHLKAHYRVHTGERPFPC 11
CF23_DROME ( 368) CPDCPKTFKTPGTLAMHRKIHTGEAEREA 24
CF2_DROME  ( 403) CSYCGKSFTQSNTLKQHTRIHTGEKPFRC 11

Prosite

ID   ZINC_FINGER_C2H2; PATTERN.
AC   PS00028;
DT   APR-1990 (CREATED); JUN-1994 (DATA UPDATE); NOV-1995 (INFO UPDATE).
DE   Zinc finger, C2H2 type, domain.
PA   C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.
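The Prosite pattern PS00028 can be applied to a protein sequence by converting it to a regular expression. The sketch below is a minimal converter that assumes only the syntax elements used in this pattern (literal residues, bracketed residue classes, and x, x(n), x(m,n) wildcards); the function names are hypothetical. The test sequence is residues 401-500 of human PLZF as shown in the alignment in these slides.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple Prosite pattern into a Python regular expression."""
    parts = []
    for token in pattern.rstrip(".").split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", token)
        if wildcard is None:
            parts.append(token)                     # literal residue or [...] class
        elif wildcard.group(1) is None:
            parts.append(".")                       # x
        elif wildcard.group(2) is None:
            parts.append(".{%s}" % wildcard.group(1))                         # x(n)
        else:
            parts.append(".{%s,%s}" % (wildcard.group(1), wildcard.group(2)))  # x(m,n)
    return "".join(parts)


def find_motifs(sequence, pattern, first_position=1):
    """Return (start, end, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [
        (m.start() + first_position, m.end() + first_position - 1, m.group())
        for m in regex.finditer(sequence)
    ]


C2H2 = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
plzf_401_500 = (
    "TIGEQCSVCGVELPDNEAVEQHRKLHSGMKTYGCELCGKRFLDSLRLRMH"
    "LLAHSAGAKAFVCDQCGAQFSKEDALETHRQTHTGTDMAVFCLLCGKRFQ"
)
for start, end, text in find_motifs(plzf_401_500, C2H2, first_position=401):
    print(start, end, text)
```

On this fragment the scan reports motifs starting at residues 406, 434 and 463, which agrees with the first three ZINC1 matches listed for human PLZF; the finger starting at 492 is cut off by the end of the fragment.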


Phylogenetic Analysis:

Phylogenetics is the study of evolutionary relationships. Phylogenetic analysis is the means of inferring or estimating these relationships. The evolutionary history inferred from phylogenetic analysis is usually depicted as branching (treelike) diagrams, which represent a sort of pedigree of the inherited relationships among molecules ("gene trees"), organisms, or both. The four steps in phylogenetic analysis of DNA sequences are alignment, determining the substitution model, tree building, and tree evaluation. While other scientific analyses generally have empirical bases, phylogenetic analyses do not. The physical events yielding a phylogeny happened in the past, and can only be inferred or estimated. The three major tree-building criteria are distance, maximum parsimony, and maximum likelihood.
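To make the "substitution model" and "distance" steps more concrete, the sketch below computes the observed proportion of differing sites (p-distance) between two aligned DNA sequences and corrects it with the Jukes-Cantor model, d = -(3/4) ln(1 - 4p/3). The choice of Jukes-Cantor, the function names, and the two toy sequences are assumptions made for illustration; the slides do not say which model or program was used.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    differences = sum(x != y for x, y in pairs)
    return differences / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; defined only for p < 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


# Invented 10-base alignment with one differing site.
p = p_distance("ACGTACGTAC", "ACGTACGTAA")
d = jukes_cantor(p)
print(p, round(d, 4))
```

The corrected distance is slightly larger than the observed proportion because the model allows for multiple substitutions at the same site. A distance-based tree-building method would start from a matrix of such values for all sequence pairs.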


Over 130 packages available for various platforms

Tree display styles (figures): Radial, Slanted Cladogram, Phylogram

http://www2.ebi.ac.uk/clustalw/


Ortholog: Homologous genes that have diverged from each other after speciation events (e.g., human beta- and chimp beta-globin)

Paralog: Homologous genes that have diverged from each other after gene duplication events (e.g., human beta- and gamma-globin)

Xenolog: Homologous genes that have diverged from each other after lateral gene transfer events (e.g., antibiotic resistance genes in bacteria)

Homolog: Genes that are descended from a common ancestor (e.g., all globins)


COG0568 K DNA-dependent RNA polymerase sigma70/sigma32 subunits


EST: Expressed Sequence Tags

dbEST is a division of GenBank that contains sequence data and other information on "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. As of Feb. 18, 2000, dbEST holds 3,643,273 sequence entries in total, including 1,775,721 human EST entries and 918,414 mouse EST entries.


EST projects have their roots in the early 1980s, when it was recognized that short stretches of DNA sequence from cDNAs could be used to identify genes. The Institute for Genomic Research (TIGR) was established to generate EST data on a massive scale. The largest projects conducted entirely in the public domain include an effort funded by Merck and Co., which has deposited more than 500,000 human ESTs into dbEST. A hallmark of these endeavours, carried out as a collaboration between the Washington University Genome Sequencing Center and members of the IMAGE (Integrated Molecular Analysis of Gene Expression) consortium, has been the rapid deposition of the sequences into the public domain and the concomitant availability of the sequence-tagged clones.


dbEST release 021800
Summary by Organism - February 18, 2000
Number of public entries: 3,643,273

Homo sapiens (human)                  1,775,721
Mus musculus + domesticus (mouse)       918,414
Rattus sp. (rat)                        134,685
Caenorhabditis elegans (nematode)       101,232
Drosophila melanogaster (fruit fly)      86,121
Danio rerio (zebrafish)                  61,893
Lycopersicon esculentum (tomato)         53,603
Zea mays (maize)                         51,883
Glycine max (soybean)                    50,656
Oryza sativa (rice)                      47,939
Arabidopsis thaliana (thale cress)       45,757


Search: AA927876


dbEST Id: 1659486

IDENTIFIERS
EST name: om18b09.s1
GenBank Acc: AA927876
GenBank gi: 3076620

CLONE INFO
Clone Id: IMAGE:1541369 (3')
Source: NCI
Insert length: 1074
DNA type: cDNA

PRIMERS
Sequencing: -40m13 fwd. ET from Amersham

SEQUENCE
TTTGACGGGAGGGCACAGGAAACTCTTTATTATGGTGATGAGATCGACAATCTCCCCTAC
TGTTAACCTTCGCTCCTGCACACTTCAGTGTCCTCACTCTGTAGGGCTCGCTGGCCTGGG
CTTCTGCGACCCGCGATCGTCCAGGAGAGGGCACTCGGCGCCCTTCCTGGGGTNNTCTGG
GGCGGAATTTGCTAGGCCGCCGTAGCAGCTGTGCCAGGTCAGAAGCCGAGCCGGNCCGCT
TTTCGTTCTTTAATTGGACTCTTGGCTAAGACGCTACCGACACCCCGTCAGTGGTGGAGG
AAGAAGGACAACAGGGAGAGGTCGAGG
Quality: High quality sequence stops at base: 318
Entry Created: Apr 17 1998
Last Updated: Jun 10 1998

COMMENTS
This clone is available royalty-free through LLNL; contact the IMAGE Consortium ([email protected]) for further information.

LIBRARY
dbEST lib id: 1042
Lib Name: Soares_NFL_T_GBC_S1
Organism: Homo sapiens
Organ: pooled
Lab host: DH10B
Vector: pT7T3D-Pac (Pharmacia) with a modified polylinker
R. Site 1: Not I
R. Site 2: Eco RI
Description: Equal amounts of plasmid DNA from three normalized libraries (fetal lung NbHL19W, testis NHT, and B-cell NCI_CGAP_GCB1) were mixed, and ss circles were made in vitro. Following HAP purification, this DNA was used as tracer in a subtractive hybridization reaction. The driver was PCR-amplified cDNAs from pools of 5,000 clones made from the same 3 libraries. The pools consisted of I.M.A.G.E. clones 297480-302087, 682632-687239, 726408-728711, and 729096-731399. Subtraction by Bento Soares and M. Fatima Bonaldo.


Simple Mathematics:

Human genes: about 100,000 genes

Human ESTs in dbEST (Summary by Organism, February 18, 2000): 1,775,721

More than 10-fold coverage!!
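The arithmetic behind this claim is a single division. The short sketch below uses the two figures from the slide; the variable names are illustrative.

```python
human_ests = 1_775_721        # human entries in dbEST, February 18, 2000 (from the slide)
estimated_genes = 100_000     # gene count assumed on the slide

fold = human_ests / estimated_genes
print(f"{fold:.1f} ESTs per gene on average")   # prints "17.8 ESTs per gene on average"
```

This is an average only. ESTs are redundant and sample highly expressed genes more often than rare ones, which is one reason the ESTs are clustered before they are used to count or identify genes.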


Clustering is the process of finding subsets of sequences which belong together within a larger set. This is done by converting discrete similarity scores to boolean links between sequences. That is, two sequences are considered linked if their similarity exceeds a threshold. UniGene clustering proceeds in several stages, with each stage adding less reliable data to the results of the preceding stage. This staged clustering affords greater control than a more egalitarian treatment of all links between sequences.

UniGene (Feb. 19, 2000)           | TIGR Gene Indices (Jul. 3, 1999)
Unigene_HUMAN: 92,571 clusters    | HGI: 299,412 clusters
Unigene_MOUSE: 75,963 clusters    | MGI: 104,927 clusters
Unigene_RAT:   28,680 clusters    | RGI:  35,875 clusters
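The idea of turning similarity scores into boolean links and then collecting linked sequences into clusters can be sketched with a union-find structure. This is a single-stage illustration, not the staged UniGene procedure; the EST names, similarity values, and the 0.95 threshold are invented for the example.

```python
def cluster_by_threshold(names, similarities, threshold):
    """Group sequences that are connected by links whose similarity exceeds threshold."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent pointers to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), score in similarities.items():
        if score > threshold:              # boolean link between a and b
            parent[find(a)] = find(b)      # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise similarities between five ESTs.
toy_similarities = {
    ("est1", "est2"): 0.98,
    ("est2", "est3"): 0.97,
    ("est3", "est4"): 0.40,
    ("est4", "est5"): 0.99,
}
toy_clusters = cluster_by_threshold(
    ["est1", "est2", "est3", "est4", "est5"], toy_similarities, threshold=0.95
)
print(toy_clusters)
```

Note that est1 and est3 end up in the same cluster without being compared directly, because both are linked to est2. This transitive behaviour is why a single spurious link can merge unrelated clusters, and why adding less reliable data in later stages gives more control.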


How does TIGR's Human Gene Index compare with UniGene?

The HGI assemblies (and all of TIGR's Gene Index assemblies) are made by first clustering the EST sequences and then assembling these clusters into consensus sequences, or THCs (TCs for non-human data). EST sequences are compared and clustered together if they meet the following criteria:

a minimum 40 base pair match
greater than 95% similarity in the overlap region
a maximum unmatched overhang of 20 base pairs

These clusters are then assembled into consensus sequences using TIGR's in-house assembly program.

UniGene links ESTs in a cluster if the sequences have a 50 base pair overlap in the 3' untranslated region (UTR) with 100% identity. These clusters are not run through the more stringent assembly process and consensus sequences are not made. For this reason you will often find several TIGR THCs contained within one UniGene cluster.

THCs, "Tentative Human Consensus" sequences, are assemblies of human ESTs.
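The two linking rules described above can be written as simple predicates. The sketch below encodes the thresholds exactly as stated on this slide; the function and parameter names are hypothetical, and the overlap statistics are assumed to come from a separate pairwise comparison step that is not shown.

```python
def tigr_link(match_length, percent_identity, overhang):
    """TIGR criteria as stated on the slide: >= 40 bp match, > 95% similarity, <= 20 bp overhang."""
    return match_length >= 40 and percent_identity > 95.0 and overhang <= 20


def unigene_link(utr_overlap_length, percent_identity):
    """UniGene criterion as stated on the slide: 50 bp overlap in the 3' UTR at 100% identity."""
    return utr_overlap_length >= 50 and percent_identity == 100.0


# Invented overlap statistics for three EST pairs.
print(tigr_link(match_length=320, percent_identity=97.0, overhang=0))
print(tigr_link(match_length=35, percent_identity=100.0, overhang=0))
print(unigene_link(utr_overlap_length=50, percent_identity=100.0))
```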


UniGene Human Release Statistics
Statistics for UniGene build uploaded on: Sat Feb 19 2000
UniGene Build #108

Sequences Included in UniGene
=============================
Known genes are from GenBank 114 (1-Dec-1999)
ESTs are from dbEST through 13-Feb-2000

     30044  mRNAs + gene CDSs
    938584  EST, 3'reads
    347845  EST, 5'reads
 +  157255  EST, other/unknown
----------
   1473728  total sequences in clusters

Final Number of Clusters (sets)
===============================
     92571  sets total
     10797  sets contain at least one known gene
     91523  sets contain at least one EST
      9749  sets contain both genes and ESTs


HGI Release 4.5 - Nov. 15, 1999

Total sequences
          in THCs      singletons   total
ESTs      1,066,183    241,110      1,307,293
HTs       5,949        1,165        7,114
Totals    1,072,132    242,275      1,314,407

Total unique sequences
THCs              84,837
singleton ESTs    241,110
singleton HTs     1,165
Total             327,112


Database: Unigene_HUMAN 58,791 sequences; 43,055,747 total letters

                                                                     Score     E
Sequences producing significant alignments:                          (bits)  Value

gnl|UG|Hs#S971963 ak43b04.s1 Homo sapiens cDNA, 3' end /clone=IM...   599    e-171
gnl|UG|Hs#S510257 70F12 Homo sapiens cDNA /clone=(not-directiona...    36    0.17

gnl|UG|Hs#S971963 ak43b04.s1 Homo sapiens cDNA, 3' end /clone=IMAGE:1408687 /clone_end=3' /gb=AA868505 /ug=Hs.99430 /len=627
Length = 627
Score = 599 bits (302), Expect = e-171
Identities = 321/327 (98%), Positives = 321/327 (98%)

AA927876 as query (318 bps)

Hs. 99430


Hs.99430 Homo sapiens

EXPRESSION INFORMATION
cDNA sources: Blood, Ovary, Testis

EST SEQUENCES (8)
AI150041  cDNA clone IMAGE:1751830  Testis  3' read  1.1 kb
AA927876  cDNA clone IMAGE:1541369          3' read  1.1 kb
AI223414  cDNA clone IMAGE:1838461  Testis  3' read  1.0 kb
AI150330  cDNA clone IMAGE:1751988  Testis  3' read  0.6 kb
AA868505  cDNA clone IMAGE:1408687  Testis  3' read
AA476210  cDNA clone IMAGE:771312   Ovary   3' read
AA456628  cDNA clone IMAGE:809583   Ovary   3' read
AI361709  cDNA clone IMAGE:2021901  Blood   3' read


Hs.434 Homo sapiens

Human heregulin-beta1 gene, complete cds

MAPPING INFORMATION

Chromosome: 8

Gene Map 98: stSG4083, Chr.8, D8S1820-D8S505
Gene Map 98: WI-18803, Chr.8, D8S1820-D8S505
Gene Map 98: SHGC-12780, Chr.8, D8S1820-D8S505

EXPRESSION INFORMATION

cDNA sources: Brain, Breast, Liver, Testis


Database: HGI-HUMAN 234,459 sequences; 111,134,950 total letters

                                              Score     E
Sequences producing significant alignments:   (bits)  Value

lcl|THC226049                                  581    e-165
lcl|R47793                                      40    0.027
                                                34    1.7

lcl|THC226049
Length = 436
Score = 581 bits (293), Expect = e-165
Identities = 313/320 (97%), Positives = 313/320 (97%), Gaps = 1/320 (0%)

AA927876 as query (318 bps)

THC226049

>THC226049
TGAGGGCACAGGAAACTCTTTATTATGGTGATGAGATCGACAATCTCCCCTACTGTTAACCTTCGCTCCTGCACACTTCAGTGTCCTCACTCTGTAGGGCTCGCTGGCCTGGGCTTCTGCGACCCGCGATCGTCCAGGAGAGGGCACTCGGCGCCCTTCCTGGGGCgTTcTGGGGCGGAATTTGCTAGGCCGCCGTAGCAGCGGTGCCAGGTCAGAAGCCGAGCCGGCyCGCTTTTCGTTCTTTAATTGGACTCTTGGCTAAGACGCTACCGACACCCCGTCaGgTGGTGGAGGAAGAAGGACAACAGGGAGAGGTCGAGGGCCGAGACGGCTCGAGGGAGGAGTAGAGGAAGGTGGAGCGGATGGTCCATCCGGGCGGGAGTTGGCTGGGCGAGTGACCGCGCATGTGCCGCTGCATGGAGGGCAAGCTGTTACA

1=================================THC226049================================436
----------------------------1--------------------------->
--------------------------------------2-------------------------------------->

#  EST Id          GB#        ATCC#  left  right  library
--------------------------------------------------------------------------------
1  F zw35g01.s1    AA476210          1     317    ovary tumor NbHOT, Soares
2  F zx75d08.s1    AA456628          1     436    ovary tumor NbHOT, Soares

Sequence source codes: F = WashU/Merck
There are no hits for THC226049.


In silico cloning:

In order to perform an electronic cDNA library screen, the EST sequences retrieved in this way can be used as queries in a BLASTN search of dbEST to identify overlapping ESTs. This procedure can be reiterated with the newly identified ESTs until no additional hits are found. The ESTs isolated can then be assembled into sequence contigs using assembly software.
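The iterative screen described above can be sketched as a loop that keeps searching with newly found ESTs until nothing new turns up. In this minimal sketch the BLASTN search is replaced by a toy stand-in that reports any database sequence sharing an exact substring of a given length with the query; that stand-in, the function names, and the toy database are assumptions for illustration only. The overlapping toy ESTs are cut from the first 60 bases of AA927876 shown earlier in these slides, and only one strand is considered.

```python
def find_overlapping(query, database, min_overlap):
    """Toy stand-in for a BLASTN search: names of sequences sharing an exact min_overlap-mer."""
    seeds = {query[i:i + min_overlap] for i in range(len(query) - min_overlap + 1)}
    return {
        name
        for name, seq in database.items()
        if any(seq[i:i + min_overlap] in seeds for i in range(len(seq) - min_overlap + 1))
    }


def in_silico_walk(start_name, database, min_overlap=20):
    """Repeat the search with each newly identified EST until no additional hits are found."""
    collected = {start_name}
    to_search = [start_name]
    while to_search:
        current = to_search.pop()
        new_hits = find_overlapping(database[current], database, min_overlap) - collected
        collected |= new_hits
        to_search.extend(sorted(new_hits))
    return collected


transcript = "TTTGACGGGAGGGCACAGGAAACTCTTTATTATGGTGATGAGATCGACAATCTCCCCTAC"
toy_dbest = {
    "est_a": transcript[0:25],
    "est_b": transcript[15:40],
    "est_c": transcript[30:60],
    "est_x": "CCCCCCCCCCGGGGGGGGGGCCCCC",   # unrelated sequence
}
contig_members = in_silico_walk("est_a", toy_dbest, min_overlap=8)
print(sorted(contig_members))
```

Starting from est_a, the first round finds est_b, and the second round, seeded with est_b, finds est_c. The third round finds nothing new, so the walk stops with the three ESTs that would be passed on to an assembly program.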

emb|AJ003623|HSJ003623 H.sapiens DNA for EST MPIpl10-4B1
Length = 556
Score = 46.9 bits (109), Expect = 1e-04
Identities = 29/83 (34%), Positives = 42/83 (49%), Gaps = 8/83 (9%)

Query: 23  RLRPALCDTLITVGSQEFPAHSLVLAGVSQQLGRRGQWALGEG--------ISPSTFAQL 74
           RL+  LCD L+ VG Q+F AH  VLA  S+           E           P  F  +
Sbjct: 164 RLKGQLCDVLLIVGDQKFRAHKNVLAASSEYFQSLFTNKENESQTVFQLDFCEPDAFDNV 343

Query: 75  LNFVYGESVELQPGELRPLQEAARALGVQSL 105
           LN++Y  S+ ++   L  +QE   +LG+  L
Sbjct: 344 LNYIYSSSLFVEKSSLAAVQELGYSLGISFL 436

How to start? TBLASTN


TTGANNNCCTTTGAANNNCCNNTTNNTCATAGATCTCTCGAGTTTTTTTTTTTTTTTTTTTCTGAAGGGAGGGCACAGGAAACTCTTTATTATGGTGATGAGATCGACAATCTCCCCTACTGTTAACCTTCGCTCCTGCACACTTCAGTGTCCTCACTCTGTAGGGCTCGCTGGCCTGGGCTTCTGCGACCCGCGATCGTCCAGGAGAGGGCACTCGGCGCCCTTCCTGGGGCGCTTCTGGGGCGGAATTTGCTAGGCCGCCGTAGCAGCGGTGCCAGGTCAGAAGCCGAGCCGGCCCGCTTTTCGTTCTTTAATTGGACTCTTGGCTAAGACGCTACCGACACCCCGTCAGGTGGTGGAGGAAGAAGGACAACAGGGAGAGGTCGAGGGCCGAGACGGCCTCGAGGAGGAGTAGAGGAAGGTGGAGCGGATGGTCCATCCGGGCGGGAGTTGGCTGGGCGAGTGACCGCGCATGTGCGCCTGCATGGAGGCCAGGCTGGGACAGCCGGCCCCGCACAGGGAGCAGCGGTACGGAGCGGCCCCGTGTGTCCGCAGGTGCTTGGTCATGGCCGAGAAGTCCCGGGAGCGCTGAGGACAAAGGCTACAGGAGAAGGGCTTCTCTCCTGTGTGGACTCGGTAGTGCGTCTCCATCTGATGCTTGAGTGAAAACCTCTTTCACAGACAGAGCACGCATAGGGGCCCAGACCGAGCANGGTCGACGCGGCCCGCGAAATTCGGATCCCCGGGGCCTTCATGGGCCATATGACCCCCCAAGCTAGCGTAAATCTGGGAACATCGTATGGGTAAAGCCNTNANAGAATCTCTTTTTTTTTGGGTTTGGGGNGGGGGTNATCTTTCATTNATCGAATTAGANTAGTTATNTNCCATTAATCCATTGNANNGGNNTTTAAACATTCCCTTGAAGGGATTCCNAAACCCTTTTACCNCAATTTTGGGTCCCGTCCAAACCCAGGTTGACAAGNGGGTTTTTGGAAATTNTTTNCCCNTNATTCAATTTTTCCT

Experimental results:

Yeast two-hybrid experiment; Differential Display; Library screening; etc.


BLASTN search to GenBank


Cosmid from chromosome 19; it is a novel gene.


BLASTN search to dbEST; Unigene; TIGR-HGI


cDNA and genomic DNA alignment and matrix analysis:


Gene prediction programs:


http://CCR-081.mit.edu/GENSCAN.html


GRAIL 2
10138 - 11018 +
12608 - 12748 x
13530 - 13923 x

GENSCAN
10138 - 11018 +
11268 - 11341 +
11450 - 11518 +
11644 - 11808 +
11989 - 12144 +
12360 - 12454 x
12608 - 12748 x

FGENES
1880 - 1908 x
5061 - 5175 x
5900 - 6049 x
8317 - 8544 +
10357 - 11018 +
11268 - 11341 +
11450 - 11518 +
11644 - 11864 +
polyA: 12521 +
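One simple way to compare the three programs is to ask which exons are predicted with identical coordinates by at least two of them. The sketch below does this with the exon coordinates listed above. The function name is hypothetical, and exact-coordinate voting is an assumption made for illustration; it is not how the "+" and "x" marks on the slide were assigned.

```python
from collections import Counter

# Exon coordinates as listed on the slide (the polyA site is left out).
predictions = {
    "GRAIL 2": [(10138, 11018), (12608, 12748), (13530, 13923)],
    "GENSCAN": [(10138, 11018), (11268, 11341), (11450, 11518), (11644, 11808),
                (11989, 12144), (12360, 12454), (12608, 12748)],
    "FGENES": [(1880, 1908), (5061, 5175), (5900, 6049), (8317, 8544),
               (10357, 11018), (11268, 11341), (11450, 11518), (11644, 11864)],
}


def exons_with_support(predictions, min_programs=2):
    """Return exons predicted with identical start and end by at least min_programs programs."""
    counts = Counter(exon for exons in predictions.values() for exon in set(exons))
    return sorted(exon for exon, n in counts.items() if n >= min_programs)


shared_exons = exons_with_support(predictions)
print(shared_exons)
```

Exact matching is strict: predictions such as 11644 - 11808 and 11644 - 11864 share a start but differ at the end, so they are not counted as agreeing here. Comparing by overlap instead would be a reasonable alternative.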


(Start)
ATGTCCCTGCCCCCCATAAGACTGCCCAGCCCCTATGGCTCTGATCGGCTGGTACAGCTAGCAGCCAGGCTCCGGCCAGCACTCTGTGATACTCTGATCACCGTAGGGAGCCAGGAGTTC
M  S  L  P  P  I  R  L  P  S  P  Y  G  S  D  R  L  V  Q  L  A  A  R  L  R  P  A  L  C  D  T  L  I  T  V  G  S  Q  E  F>
CCCGCCCACAGCCTGGTGCTAGCAGGTGTCAGCCAGCAGCTGGGCCGCAGGGGCCAGTGGGCTCTGGGAGAAGGCATCAGCCCTTCTACCTTTGCCCAGCTCCTGAACTTTGTGTATGGG
P  A  H  S  L  V  L  A  G  V  S  Q  Q  L  G  R  R  G  Q  W  A  L  G  E  G  I  S  P  S  T  F  A  Q  L  L  N  F  V  Y  G>
GAGAGTGTAGAGCTGCAGCCTGGAGAGCTAAGGCCCCTTCAGGAGGCGGCCAGGGCCTTGGGAGTGCAGTCCCTGGAAGAGGCATGCTGGAGGGCTCGAGGGGACAGGGCTAAAAAGCCA
E  S  V  E  L  Q  P  G  E  L  R  P  L  Q  E  A  A  R  A  L  G  V  Q  S  L  E  E  A  C  W  R  A  R  G  D  R  A  K  K  P>
GATCCAGGCCTGAAGAAACATCAGGAGGAGCCAGAGAAACCCTCAAGGAATCCTGAGAGAGAACTGGGGGACCCTGGAGAGAAGCAGAAACCAGAACAGGTTTCTAGAACTGGTGGGAGA
D  P  G  L  K  K  H  Q  E  E  P  E  K  P  S  R  N  P  E  R  E  L  G  D  P  G  E  K  Q  K  P  E  Q  V  S  R  T  G  G  R>
GAACAGGAGATGTTGCACAAGCACTCGCCACCAAGAGGCAGACCCGAGATGGCAGGAGCAACGCAGGAGGCTCAGCAGGAACAGACCAGGTCAAAGGAGAAACGCCTCCAAGCCCCTGTT
E  Q  E  M  L  H  K  H  S  P  P  R  G  R  P  E  M  A  G  A  T  Q  E  A  Q  Q  E  Q  T  R  S  K  E  K  R  L  Q  A  P  V>
GGCCAAAGGGGAGCAGATGGGAAGCATGGAGTGCTCACGTGGTTGAGGGAAAATCCAGGGGGCTCTGAGGAAAGTCTGCGCAAGCTCCCTGGCCCCCTTCCCCCAGCAGGCTCCCTGCAA
G  Q  R  G  A  D  G  K  H  G  V  L  T  W  L  R  E  N  P  G  G  S  E  E  S  L  R  K  L  P  G  P  L  P  P  A  G  S  L  Q>
ACCAGCGTCACCCCTAGGCCCTCGTGGGCTGAGGCCCCTTGGTTGGTGGGGGGCCAGCCTGCCCTGTGGAGCATCCTGCTGATGCCGCCCAGATATGGCATTCCCTTCTACCATAGCACC
T  S  V  T  P  R  P  S  W  A  E  A  P  W  L  V  G  G  Q  P  A  L  W  S  I  L  L  M  P  P  R  Y  G  I  P  F  Y  H  S  T>
CCCACCACTGGAGCCTGGCAGGAGGTCTGGCGGGAACAGAGGATCCCACTGTCCCTAAATGCCCCCAAAGGGCTCTGGAGCCAGAACCAGTTGGCCTCCTCCAGCCCTACCCCAGGTTCC
P  T  T  G  A  W  Q  E  V  W  R  E  Q  R  I  P  L  S  L  N  A  P  K  G  L  W  S  Q  N  Q  L  A  S  S  S  P  T  P  G  S>
CTCCCCCAGGGCCCCGCACAGCTCAGCCCTGGGGAGATGGAAGAGTCTGATCAGGGGCACACAGGCGCACTTGCAACCTGTGCGGGTCATGAGGACAAGGCAGGCTGCCCACCTCGCCCG
L  P  Q  G  P  A  Q  L  S  P  G  E  M  E  E  S  D  Q  G  H  T  G  A  L  A  T  C  A  G  H  E  D  K  A  G  C  P  P  R  P>
CACCCTCCCCCGGCCCCTCCTGCTCGGTCTCGGCCCTATGCGTGCTCTGTCTGTGGAAAGAGGTTTTCACTCAAGCATCAGATGGAGACGCACTACCGAGTCCACACAGGAGAGAAGCCC
H  P  P  P  A  P  P  A  R  S  R  P  Y  A  C  S  V  C  G  K  R  F  S  L  K  H  Q  M  E  T  H  Y  R  V  H  T  G  E  K  P>
TTCTCCTGTAGCCTTTGTCCTCAGCGCTCCCGGGACTTCTCGGCCATGACCAAGCACCTGCGGACACACGGGGCCGCTCCGTACCGCTGCTCCCTGTGCGGGGCCGGCTGTCCCAGCCTG
F  S  C  S  L  C  P  Q  R  S  R  D  F  S  A  M  T  K  H  L  R  T  H  G  A  A  P  Y  R  C  S  L  C  G  A  G  C  P  S  L>
GCCTCCATGCAGGCGCACATGCGCGGTCACTCGCCCAGCCAACTCCCGCCCGGATGGACCATCCGCTCCACCTTCCTCTACTCCTCCTCGAGGCCGTCTCGGCCCTCGACCTCTCCCTGT
A  S  M  Q  A  H  M  R  G  H  S  P  S  Q  L  P  P  G  W  T  I  R  S  T  F  L  Y  S  S  S  R  P  S  R  P  S  T  S  P  C>
TGTCCTTCTTCCTCCACCACCTGACGGGGTGTCGGTAGCGTCTTAGCCAAGAGTCCAATTAAAGAACGAAAAGCGGGCCGGCTCGGCTTCTGACCTGGCACCGCTGCTACGGCGGCCTAG
C  P  S  S  S  T  T  *


hum TZF  p   1 MSLPPIRLPSPYGSDRLVQLAARLRPALCDTLITVGSQEFPAHSLVLAGVSQQLG----RRGQWALGEGISPSTFAQLLNFVYGESVELQPGELR 91
hum pLZF p   1 MDLTKMGMIQLQNPSHPTGLLCKANQMRLAGTLCDVVIMVDSQEFHAHRTVLACTSKMFEILFHRNSQHYTLDFLSPKTFQQILEYAYTATLQAKAEDLD 100
mus pLZF p   1 MDLTKMGMIQLQNPSHPTGLLCKANQMRLAGTLCDVVIMVDSQEFHAHRTVLACTSKMFEILFHRNSQHYTLDFLSPKTFQQILEYAYTATLQAKAEDLD 100
               M : :: PS RL :LCD :I V SQEF AH VLA S: R Q : :SP TF Q:L : Y ::: : :L

hum TZF  p  92 PLQEAARALGVQSLEEACW------RARGD---RAKKPDPG----------------LKKHQEEPEKPSRNPERELGDPGEKQKP--------------- 151
hum pLZF p 101 DLLYAAEILEIEYLEEQCLKMLETIQASDDNDTEATMADGGAEEEEDRKARYLKNIFISKHSSEESGYASVAGQSLPGPMVDQSPSVSTSFGLSAMSPTK 200
mus pLZF p 101 DLLYAAEILEIEYLEEQCLKILETIQASDDNDTEATMADGGGEEEDDRKARYLKNIFISKHSSEESGYASVAGQSLPGPMVDQSPSVSTSFGLSAMSPTK 200
               L AA L :: LEE C :A D A D G : KH E : : L P Q P

hum TZF  p 152 EQVSRTGGREQEMLH-KHSPPRG--RPEMAG-----ATQEAQQEQTRSKEKRLQ-AP------VG--------QRGADG-----KHGVLTWLRENPGGSE 223
hum pLZF p 201 AAVDSLMTIGQSLLQGTLQPPAGPEEPTLAGGGRHPGVAEVKTEMMQVDEVPSQDSPGAAESSISGGMGDKVEERGKEGPGTPTRSSVITSARELHYGRE 300
mus pLZF p 201 AAVDSLMSIGQSLLQGTLQPPAGPEEPTLAGGGRHPGVAEVKMEMMQVDEAPCQDSPGAAESSISGGMGDKFEERSKEGPGTPTRRSVITSARELHYGRE 300
               V Q :L: PP G P :AG E : E : E Q :P : :R :G : V:T RE G E

hum TZF  p 224 ESLRKLPGPLP----PAGSLQTSVTP--RP--SWAEAP----WLVGGQP-ALWSILLMPPRYGIPFYHST-----PTTGAWQEVWR-----------EQR 294
hum pLZF p 301 ESAEQVPPPAEAGQAPTGRPEHPAPPPEKHLGIYSVLPNHKADAVLSMPSSVTSGLHVQPALAVSMDFSTYGGLLPQGFIQRELFSKLGELAVGMKSESR 400
mus pLZF p 301 ESGEQLSPPVEAGQGPPGRQEPLAPPVEKHLGIYSVLPNHKADAVLSMPSSVTSGLHVQPALAVSMDFSTYGGLLPQGFIQRELFSKLGELAVGMKAESR 400
               ES :: P P G : P : : P V P :: S L : P : ST P :E: E R

hum TZF  p 295 ----------IPLSLN--------APKGLWSQ----------N-----Q--LASSSPTPGSLP-QGPAQLSP-GEMEESDQGHTGALAT-----CAG--- 349
hum pLZF p 401 TIGEQCSVCGVELPDNEAVEQHRKLHSGMKTYGCELCGKRFLDSLRLRMHLLAHSAGAKAFVCDQCGAQFSKEDALETHRQTHTGTDMAVFCLLCGKRFQ 500
mus pLZF p 401 PLGEQCSVCGVELPDNEAVEQHRKLHSGMKTYGCELCGKRFLDSLRLRMHLLAHSAGAKAFVCDQCGAQFSKEDALETHRQTHTGTDMAVFCLLCGKRFQ 500
               : L N G: : LA S: : : Q AQ S :E Q HTG: : C

hum TZF  p 350 --------HEDKAG--------CP---P---------RPHPPPAPPARS------R----------------PYACSVCGKRFSLKHQMETHYRVHTGEK 399
hum pLZF p 501 AQSALQQHMEVHAGVRSYICSECNRTFPSHTALKRHLRSHTGDHPYECEFCGSCFRDESTLKSHKRIHTGEKPYECNGCDKKFSLKHQLETHYRVHTGEK 600
mus pLZF p 501 AQSALQQHMEVHAGVRSYICSECNRTFPSHTALKRHLRSHTGDHPYECEFCGSCFRDESTLKSHKRIHTGEKPYECNGCGKKFSLKHQLETHYRVHTGEK 600
               E :AG C P R H P R PY C C K:FSLKHQ:ETHYRVHTGEK

hum TZF  p 400 PFSCSLCPQRSRDFSAMTKHLRTH-GAAPYRCSLCGAGCPSLASMQAHMRGHSPSQLPPGWTIRSTFLYSSSRPSRPSTSPCCPSSSTT 487
hum pLZF p 601 PFECKLCHQRSRDYSAMIKHLRTHNGASPYQCTICTEYCPSLSSMQKHMKGHKPEEIPPDWRIEKTYLYLCY-V 673
mus pLZF p 601 PFECKLCHQRSRDYSAMIKHLRTHNGASPYQCTICTEYCPSLSSMQKHMKGHKPEEIPPDWRIEKTYLYLCYV 673
               PF C LC QRSRD:SAM KHLRTH GA:PY:C::C CPSL:SMQ HM:GH P ::PP W I T:LY :


Hs.99430 Homo sapiens
EXPRESSION INFORMATION
cDNA sources: Blood, Ovary, Testis
EST SEQUENCES (8)
AI150041 cDNA clone IMAGE:1751830 Testis 3' read 1.1 kb
AA927876 cDNA clone IMAGE:1541369 3' read 1.1 kb
AI223414 cDNA clone IMAGE:1838461 Testis 3' read 1.0 kb
AI150330 cDNA clone IMAGE:1751988 Testis 3' read 0.6 kb
AA868505 cDNA clone IMAGE:1408687 Testis 3' read
AA476210 cDNA clone IMAGE:771312 Ovary 3' read
AA456628 cDNA clone IMAGE:809583 Ovary 3' read
AI361709 cDNA clone IMAGE:2021901 Blood 3' read

Northern Blotting


LOCUS AF130255 1960 bp mRNA PRI 22-FEB-1999
DEFINITION Homo sapiens testis zinc finger protein (TZFP) mRNA, complete cds.
ACCESSION AF130255
KEYWORDS .
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 1960)
AUTHORS Tang,Tang K., Lai,Chun-Hung, Tang,Chieh-Ju C., Huang,Chang-Jen and Lin,Wen-chang.
TITLE Identification and gene structure of a novel human PLZF related transcription factor gene, TZFP
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 1960)
AUTHORS Tang,T. K., Tang,C.-J. C. and Lin,W.-c.
TITLE Direct Submission
JOURNAL Submitted (22-FEB-1999) Institute of Biomedical Sciences, Academia Sinica, No. 128, Sec. 2, Academia Road, Taipei, Taiwan 11529, TAIWAN


Search: AA927876


dbEST Id: 1659486

IDENTIFIERS
EST name: om18b09.s1
GenBank Acc: AA927876
GenBank gi: 3076620

CLONE INFO
Clone Id: IMAGE:1541369 (3')
Source: NCI
Insert length: 1074
DNA type: cDNA

PRIMERS
Sequencing: -40m13 fwd. ET from Amersham

SEQUENCE
TTTGACGGGAGGGCACAGGAAACTCTTTATTATGGTGATGAGATCGACAATCTCCCCTAC
TGTTAACCTTCGCTCCTGCACACTTCAGTGTCCTCACTCTGTAGGGCTCGCTGGCCTGGG
CTTCTGCGACCCGCGATCGTCCAGGAGAGGGCACTCGGCGCCCTTCCTGGGGTNNTCTGG
GGCGGAATTTGCTAGGCCGCCGTAGCAGCTGTGCCAGGTCAGAAGCCGAGCCGGNCCGCT
TTTCGTTCTTTAATTGGACTCTTGGCTAAGACGCTACCGACACCCCGTCAGTGGTGGAGG
AAGAAGGACAACAGGGAGAGGTCGAGG
Quality: High quality sequence stops at base: 318
Entry Created: Apr 17 1998
Last Updated: Jun 10 1998


COMMENTS
This clone is available royalty-free through LLNL; contact the IMAGE Consortium ([email protected]) for further information.

LIBRARY
dbEST lib id: 1042
Lib Name: Soares_NFL_T_GBC_S1
Organism: Homo sapiens
Organ: pooled
Lab host: DH10B
Vector: pT7T3D-Pac (Pharmacia) with a modified polylinker
R. Site 1: Not I
R. Site 2: Eco RI
Description: Equal amounts of plasmid DNA from three normalized libraries (fetal lung NbHL19W, testis NHT, and B-cell NCI_CGAP_GCB1) were mixed, and ss circles were made in vitro. Following HAP purification, this DNA was used as tracer in a subtractive hybridization reaction. The driver was PCR-amplified cDNAs from pools of 5,000 clones made from the same 3 libraries. The pools consisted of I.M.A.G.E. clones 297480-302087, 682632-687239, 726408-728711, and 729096-731399. Subtraction by Bento Soares and M. Fatima Bonaldo.


Human cDNA Library Details: 470 different libraries so far, covering more than 40 tissues

Stomach
202. NCI_CGAP_Gas1 gastric tumor
203. NCI_CGAP_Gas4 gastric tumor
Testis
204. Barstead HPL-RB5 testis
205. Soares testis NHT
206. Life Tech. testis (10426-013)
Thymus
207. NCI_CGAP_Thym1 thymoma
Thyroid
208. NCI_CGAP_Thy1 invasive thyroid tumor
Uterus
209. NCI_CGAP_Ut1 uterine tumor
210. NCI_CGAP_Ut2 uterine tumor
211. NCI_CGAP_Ut3 uterine tumor
212. NCI_CGAP_Ut4 uterine tumor
213. Soares pregnant uterus NbHPU

Q & A

CGAP


CGAP: Cancer Genome Anatomy Project


Why CGAP? In the last two decades we have learned that genetic changes lie at the root of all cancers. In response, the Cancer Genome Anatomy Project (CGAP) will unite the newest technologies, along with those both cost-effective and capable of high-throughput, to identify all the genes responsible for the establishment and growth of cancer.

Project Goals To achieve a comprehensive molecular characterization of normal, precancerous, and malignant cells.


Normal Cells

Cancer Cells

Comparing the fingerprints of a normal cell and a cancer cell will highlight genes that, by their suspicious absence or presence (such as Gene H), deserve further scientific scrutiny to determine whether such suspects play a role in cancer or can be exploited in a test for early detection.


Identifying the genetic differences among normal cells, precancerous cells, and cancer cells will contribute to our understanding of cancer as it:

fosters the discovery of genes that directly cause cancer
provides us with a way to identify early precancerous cells and thus enhances our methods for early detection
improves our ability to match patients with appropriate treatment

Time line

Malignant Tumor
Pre-cancer


The research results displayed in this graph demonstrate that for patients suffering from the cancer neuroblastoma, the presence or absence of a specific set of genes found on Chromosome 1 strongly correlates with patient outcome. Therefore, in the future this characteristic of the tumor can be used to identify those patients that would benefit from more aggressive treatment, and those best served by the current treatment protocol.


Laser Capture Microdissection

(LCM)


2000
CGAP sequences: 925,746
CGAP genes: 79,844

1999
CGAP sequences: 473,746
CGAP genes: 20,665


Not in all others


Not in all others


Not in all others


Sequencing of Expressed Sequence Tags (ESTs)
Serial Analysis of Gene Expression
Differential Display Approaches
Hybridization Analysis


Digital Differential Display

The foundation of DDD is UniGene. UniGene employs a conservative method to assign all the human EST sequences that meet minimal standards of quality to distinct "clusters", each representing a unique human expressed gene. DDD takes advantage of UniGene by comparing the number of times sequences from different libraries were assigned to a particular UniGene cluster. This has the advantage that DDD will only report on sequences that we have confidence represent bona fide human expressed genes. There will of course be many differences in the number of sequences contained in each library that are assigned to a particular UniGene cluster, but only some of these differences are likely to reflect biological reality. Therefore DDD employs a statistical method of comparison, the Fisher Exact Test, to identify only those differences that are likely to be real. One important factor in determining statistical relevance is the absolute number of sequences in each library that have been successfully assigned to a UniGene cluster. In many cases there are not enough sequences in dbEST libraries to meet the threshold of significance employed in the Fisher Exact Test. Since DDD will only yield a report if there are differences that exceed this threshold, it is expected that many comparisons will yield nothing.
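As a rough illustration of the kind of comparison described above, the sketch below computes a two-sided Fisher Exact Test p-value for one UniGene cluster counted in two libraries. It is not the DDD implementation: the function name, the 2x2 table layout, and the example counts are all assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher Exact Test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
        a = ESTs from library A assigned to the cluster
        b = ESTs from library A assigned to other clusters
        c = ESTs from library B assigned to the cluster
        d = ESTs from library B assigned to other clusters
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of seeing x cluster ESTs in library A
        # when the row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the
    # observed one; the small tolerance absorbs floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: 12 of 1,000 ESTs in library A fall in the cluster,
# versus 2 of 1,200 ESTs in library B.
p_value = fisher_exact_two_sided(12, 988, 2, 1198)
print(f"p = {p_value:.4g}")
```

With small libraries the same proportions give much larger p-values, which matches the remark that many comparisons will yield nothing.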


the fraction of sequences within the pool

visual aid that reflects the numerical values

statistically significant pairwise comparison


THREE PRINCIPLES UNDERLIE THE SAGE TECHNOLOGY:

One short oligonucleotide sequence from a defined location within a transcript ("tag") allows accurate quantitation.

Tag size (10-14bp) is optimal for high throughput while maintaining accurate gene identification and quantitation.

The combined power of serial and parallel processing increases data throughput by orders of magnitude when compared to conventional approaches.
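The first two principles can be sketched in a few lines: take a short tag from a defined position in each transcript, then count how often each tag occurs. This is only a toy model. The anchoring site `CATG` (the NlaIII recognition site commonly used in SAGE protocols) and the 10 bp tag length are assumptions not stated on the slide, and the function names are hypothetical.

```python
from collections import Counter

ANCHOR_SITE = "CATG"  # assumed anchoring-enzyme site (NlaIII); not given on the slide
TAG_LENGTH = 10       # assumed; the slide gives a 10-14 bp range


def extract_tag(transcript, anchor=ANCHOR_SITE, tag_length=TAG_LENGTH):
    """Return the tag that follows the 3'-most anchor site, or None."""
    position = transcript.rfind(anchor)
    if position == -1:
        return None  # no anchor site, so no tag can be defined
    start = position + len(anchor)
    tag = transcript[start:start + tag_length]
    # Discard tags that run off the end of the transcript.
    return tag if len(tag) == tag_length else None


def count_tags(transcripts):
    """Count tag occurrences across a collection of transcript sequences."""
    tags = (extract_tag(t) for t in transcripts)
    return Counter(tag for tag in tags if tag is not None)


# Hypothetical transcripts: two copies of one gene and one copy of another.
example = ["GGCATGAAAAAAAAAA", "TTCATGAAAAAAAAAA", "CATGCCCCCCCCCC"]
for tag, count in count_tags(example).items():
    print(tag, count)
```

Because every transcript from the same gene yields the same tag, the tag counts act as a digital estimate of relative expression.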


Ortholog: Homologous genes that have diverged from each other after speciation events (e.g., human beta- and chimp beta-globin)

Paralog: Homologous genes that have diverged from each other after gene duplication events (e.g., human beta- and gamma-globin)

Xenolog: Homologous genes that have diverged from each other after lateral gene transfer events (e.g., antibiotic resistance genes in bacteria)

Homolog: Genes that are descended from a common ancestor (e.g., all globins)


Dec. 11, 1998:

C. elegans: Sequence to Biology

-Jonathan Hodgkin, H. Robert Horvitz, Barbara R. Jasny, Judith Kimble*

This special issue of Science celebrates a landmark in biology: determination of the essentially complete DNA sequence of an animal genome. The animal is a small invertebrate, the nematode (or roundworm) Caenorhabditis elegans, and the sequence consists of about 97 million base pairs of DNA, approximately one-thirtieth the number in the human genome. Nonetheless, the information content is enormous--eight times that of the budding yeast Saccharomyces cerevisiae, the only other eukaryote with a sequenced genome.


Genomic sequence of the Nematode C. elegans: A platform for investigating biology

The C. elegans Sequencing Consortium

97 Mb
257 YACs (20% only in YAC)
2527 cosmids
113 fosmids
44 PCR
19,099 predicted genes
18,891 proteins here (16,260 reviewed)

EST: 67,815 EST from 40,379 clones

7432 genes

A multicellular organism genome


Genefinder program: ** trans-splicing **

40% of predicted genes have EST matches

16,260/19,099 genes have been interactively reviewed.
Average of one gene per 5 kb.
Average of five introns per gene.
27% of genome resides in exons.
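The headline figures on these slides can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted in the slides; treating the "7432 genes" figure as the count of genes with EST matches is an assumption, and the function name is hypothetical.

```python
GENOME_BP = 97_000_000   # "97 Mb" genome size quoted on the slides
PREDICTED_GENES = 19_099
EXON_FRACTION = 0.27     # "27% of genome resides in exons"
GENES_WITH_EST = 7_432   # assumed to be the EST-matched gene count


def genome_summary(genome_bp, genes, exon_fraction, genes_with_est):
    """Derive per-gene averages from the genome-wide totals."""
    return {
        "bp_per_gene": genome_bp / genes,
        "exon_bp_per_gene": exon_fraction * genome_bp / genes,
        "est_matched_fraction": genes_with_est / genes,
    }


summary = genome_summary(GENOME_BP, PREDICTED_GENES, EXON_FRACTION, GENES_WITH_EST)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The gene spacing comes out close to the quoted one gene per 5 kb, and the EST-matched fraction is close to the quoted 40%.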

Pfam protein family search:
Intracellular communication
Transcriptional regulation


Table 1. The 20 most common protein domains in C. elegans (41). RRM, RNA recognition motif; RBD, RNA binding domain; RNP, ribonuclear protein motif; UDP, uridine 5'-diphosphate.

-------------------------------------------------------------------
Number Description
-------------------------------------------------------------------
650 7 TM chemoreceptor
410 Eukaryotic protein kinase domain
240 Zinc finger, C4 type (two domains)
170 Collagen
140 7 TM receptor (rhodopsin family)
130 Zinc finger, C2H2 type
120 Lectin C-type domain short and long forms
100 RNA recognition motif (RRM, RBD, or RNP domain)
90 Zinc finger, C3HC4 type (RING finger)
90 Protein-tyrosine phosphatase
90 Ankyrin repeat
90 WD domain, G-beta repeats
80 Homeobox domain
80 Neurotransmitter-gated ion channel
80 Cytochrome P450
80 Helicases conserved C-terminal domain
80 Alcohol/other dehydrogenases, short-chain type
70 UDP-glucoronosyl and UDP-glucosyl transferases
70 EGF-like domain
70 Immunoglobulin superfamily


Worming secrets from the C. elegans genome: Dec. 11, 1998, Science

Washington University Genome Sequencing Center
Sanger Centre

8-year effort: Sydney Brenner started it all.
By 1992, they were doing a million bases per year.
~$200 M
High-throughput sequencing.
Human genome project.

“We will be doing a lot of jumping back and forth between species” - F. Collins

Ping-Pong homology search


In silico cloning: To perform an electronic cDNA library screen, the EST sequences retrieved in this way can be used as queries in a BLASTN search of dbEST to identify overlapping ESTs. This procedure can be repeated with the newly identified ESTs until no additional hits are found. The ESTs isolated can then be assembled into sequence contigs using computer software.

[Figure: overlapping EST 1, EST 2, and EST 3]

There are many sequencing-related errors in dbEST.
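The iterate-until-no-new-hits idea can be sketched with a toy assembler that keeps extending a seed sequence with any EST that overlaps either end. This is a simplified stand-in, not the BLASTN-based procedure itself: it requires exact overlaps (real ESTs contain sequencing errors, so real tools tolerate mismatches), and the function names, the `min_len` threshold, and the example sequences are hypothetical.

```python
def overlap(left, right, min_len):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge(contig, est, min_len):
    """Merge `est` into `contig` if they overlap exactly; otherwise None."""
    if est in contig:
        return contig  # EST adds no new sequence
    k = overlap(contig, est, min_len)
    if k:
        return contig + est[k:]  # EST extends the 3' end
    k = overlap(est, contig, min_len)
    if k:
        return est[:len(est) - k] + contig  # EST extends the 5' end
    return None


def extend_contig(seed, ests, min_len=5):
    """Repeat the search until no remaining EST overlaps the contig."""
    contig = seed
    remaining = list(ests)
    progress = True
    while progress and remaining:
        progress = False
        for est in remaining:
            merged = merge(contig, est, min_len)
            if merged is not None:
                contig = merged
                remaining.remove(est)
                progress = True
                break  # restart the scan with the longer contig
    return contig, remaining


# Hypothetical toy ESTs: two overlap the seed, one is unrelated.
contig, unused = extend_contig("ACGTACGGA", ["ACGGATTTC", "GGGTACGTAC", "CCCCCCCC"])
print(contig, unused)
```

Each successful merge lengthens the contig, which can expose overlaps with ESTs that did not match the original seed; that is why the scan restarts after every extension.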


C. elegans a.a. sequences

Human EST sequences

Comparative Gene Identification


Query= (597 letters)
Sequences producing significant alignments: (bits) Value
lcl|THC200240 224 4e-58
lcl|THC151579 181 3e-45
lcl|AA099787 127 8e-29

lcl|THC200240 Length = 764
Score = 224 bits (565), Expect = 4e-58
Identities = 106/187 (56%), Positives = 136/187 (72%)

Query: 248 SGMKKNKYGNIEDLVVHLNFVCPKGIIQKQCQVPRMSSGPDIHQIILGSEGTLGVVSEVT 307
SGMKKN YGNIEDLVVH+ V P+GII+K CQ PRMS+GPDIH I+GSEGTLGV++E T
Sbjct: 3 SGMKKNIYGNIEDLVVHIKXVTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT 182

lcl|THC151579 Length = 698
Score = 181 bits (455), Expect = 3e-45
Identities = 81/142 (57%), Positives = 106/142 (74%)

Query: 446 LGMNHGVLGESFETSVPWDKVLSLCRNVKELMKREAKAQGVTHPVLANCRVTQVYDAGAC 505
L + + VLGESFETS PWD+V+ LCRNVKE + RE K +GV + CRVTQ YDAGAC
Sbjct: 41 LALEYXVLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC 220


sp|O00116|ADAS_HUMAN ALKYLDIHYDROXYACETONEPHOSPHATE SYNTHASE PRECURSOR (ALKYL-DHAP SYNTHASE) (ALKYLGLYCERONE-PHOSPHATE SYNTHASE) Length = 658
Score = 124 bits (309), Expect = 5e-29
Identities = 59/60 (98%), Positives = 59/60 (98%)

248
Query: 1 SGMKKNIYGNIEDLVVHIKXVTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT 60
SGMKKNIYGNIEDLVVHIK VTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT
Sbjct: 319 SGMKKNIYGNIEDLVVHIKMVTPRGIIEKSCQGPRMSTGPDIHHFIMGSEGTLGVITEAT 378

THC200240

sp|O00116|ADAS_HUMAN ALKYLDIHYDROXYACETONEPHOSPHATE SYNTHASE PRECURSOR (ALKYL-DHAP SYNTHASE) (ALKYLGLYCERONE-PHOSPHATE SYNTHASE) Length = 658
Score = 127 bits (315), Expect = 1e-29
Identities = 59/60 (98%), Positives = 59/60 (98%)

446
Query: 1 LALEYXVLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC 60
LALEY VLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC
Sbjct: 517 LALEYYVLGESFETSAPWDRVVDLCRNVKERITRECKEKGVQFAPFSTCRVTQTYDAGAC 576

THC151579

Query offset: 446 - 248 = 198

ADAS_HUMAN offset: 517 - 319 = 198


[THC195737---------------------------------------------

MTRHGKNSTAASVYTYHERRRDAKASGYGTLHARLGADSIKEFHCCSLTLQPCRNPVISPTGYI

F

--------]

DREAILENILAQKKAYAKKLKEYEKQVAEESAAAKIAEGQAETFTKRTQFSAIESTPSRTGAVA

T

[THC195737--------------------

PRPEVGSLKRQGGVMSTEIAAKVKAHGEEGVMSNMKGDKSTSLPSFWIPELNPTAVATKLEKPS

S

----------------------------------------------------]

KVLCPVSGKPIKLKELLEVKFTPMPGTETAAHRKFLCPVTRDELTNTTRCAYLKKSKSVVKYDV

V

[THC195737----------------------]

EKLIKGDGIDPINGEPMSEDDIIELQRGGTGYSATNETKAKLIRPQLELQ*

U58746


Translation of 1 MTRHGKNCTAGAVYTYHEKKKDTAASGYGTQNIRLSRDAVKDFDCCCLSLQPCHD 55
U58746 1 MTRHGKNSTAASVYTYHERRRDAKASGYGTLHARLGADSIKEFHCCSLTLQPCRN 55
*******.** .******...*. ****** . ** *..*.* **.*.****.

Translation of 56 PVVTPDGYLYEREAILEYILHQKKEIARQMKAYEKQRGTRREEQKELQRAASQDH 110
U58746 56 PVISPTGYIFDREAILENILAQKKAYAKKLKEYEKQVAEESAAAKIAEGQAETFT 110
**..* **...****** ** *** *...* **** * . *

Translation of 111 VRGFLEKESAIVSRPLNPFTAKALSGTSPD-----------DVQPGPSVGPPSKD 154
U58746 111 KRTQFSAIESTPSRTGAVATPRPEVGSLKRQGGVMSTEIAAKVKAHGEEGVMSNM 165
* . ** * . *. *. * *

Translation of 155 K-DK--VLPSFWIPSLTPEAKATKLEKPSRTVTCPMSGKPLRMSDLTPVHFTPLD 206
U58746 166 KGDKSTSLPSFWIPELNPTAVATKLEKPSSKVLCPVSGKPIKLKELLEVKFTPMP 220
* ** ******* *.* * ******** * **.****... .* *.***.

Translation of 207 SSVDRVGLITRSER-YVCAVTRDSLSNATPCAVLRPSGAVVTLECVEKLIRKDMV 260
U58746 221 ------GTETAAHRKFLCPVTRDELTNTTRCAYLKKSKSVVKYDVVEKLIKGDGI 269
* * . * ..* **** *.*.* ** *. * .** . *****. * .

Translation of 261 DPVTGDKLTDRDIIVLQRGGTGFAGSGVKLQAEKSRPVMQA 301
U58746 270 DPINGEPMSEDDIIELQRGGTGYSAT-NETKAKLIRPQLELQ 310
**..*. ... *** *******.. . .* ** ..

(44%/59%)
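The two percentages appear to summarize the alignment as identities and positives. As a minimal sketch of how the first figure is derived, the function below counts identical columns in a gapped pairwise alignment. The function name is hypothetical, and dividing by the number of alignment columns (gaps included) is one common convention, assumed here; computing positives would additionally need a substitution matrix, which is omitted.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b)
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# First 29 columns of the fourth alignment block shown above.
human = "K-DK--VLPSFWIPSLTPEAKATKLEKPS"
worm = "KGDKSTSLPSFWIPELNPTAVATKLEKPS"
print(f"{percent_identity(human, worm):.1f}% identity over {len(human)} columns")
```

A short, well-conserved window like this one scores higher than the full-length alignment, which is why the overall figure should be computed across every block.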


[THC171302--MVFGENQDLIRTHFQKEADKVRAMKTNWGLFTRTRMIAQSDYDFIVTYQQAENEAERSTVLSVFKEK-------------------------------------------------------------------AVYAFVHLMSQISKDDYVRYTLTLIDDMLREDVTRTIIFEDVAVLLKRSPFSFFMGLLHRQDQYIVH-------------------------------------------------------------------ITFSILTKMAVFGNIKLSGDELDYCMGSLKEAMNRGTNNDYIVTAVRCMQTLFRFDPYRVSFVNING-------------------------------------------------------------------YDSLTHALYSTRKCGFQIQYQIIFCMWLLTFNGHAAEVALSGNLIQTISGILGNCQKEKVIRIVVST-----------------] [THC177150--------------------------------------------LRNLITSNQDVYMKKQAALQMIQNRIPTKLDHLENRKFTDVDLVEDMVYLQTELKKVVQVLTSFDEY-------------------------------------------------------------------ENELRQGSLHWSPAHKCEVFWNENAHRLNDNRQELLKLLVAMLEKSNDPLVLCVAAHDIGEFVRYYP------------------------------------------------]RGKLKVEQLGGKEAMMRLLTVKDPNVRYHALLAAQKLMINNWKDLGLEI

U50199


gi|2895578 (AF041338) vacuolar proton pump subunit SFD alpha is... 927 0.0
gi|2895576 (AF041337) vacuolar proton pump subunit SFD beta iso... 885 0.0
gi|1213557 (U50199) coded for by C. elegans cDNA yk89e9.5; code... 468 e-131
gi|1086810 (U41109) similar to S. cerevisiae vacular H(+)-ATPas... 335 5e-91
gnl|PID|e351278 (Z99532) hypothetical protein [Schizosaccharomy... 185 5e-46
sp|P41807|VM13_YEAST VACUOLAR ATP SYNTHASE 54 KD SUBUNIT (V-ATP... 123 2e-27

gi|1213557 (U50199) coded for by C. elegans cDNA yk89e9.5; coded for by C. elegans cDNA cm7g5; coded for by C. elegans cDNA cm14b9; coded for by C. elegans cDNA yk52g5.5; coded for by C. elegans cDNA yk76e5.5; coded for by C. elegans cDNA yk131f11.5; c... Length = 470
Score = 468 bits (1192), Expect = e-131
Identities = 243/477 (50%), Positives = 314/477 (64%), Gaps = 20/477 (4%)

Human gene: 483 aa


gi|2895578 (AF041338) vacuolar proton pump subunit SFD alpha isoform [Bos taurus] Length = 483
Score = 927 bits (2369), Expect = 0.0
Identities = 460/483 (95%), Positives = 465/483 (96%)

Query: 1 MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMISAEDCEFIQRFEMKRSPE 60
MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMIS+EDCEFIQRFEMKRSPE
Sbjct: 1 MTKMDIRGAVDAAVPTNIIAAKAAEVRANKVNWQSYLQGQMISSEDCEFIQRFEMKRSPE 60

Query: 61 EKQEMLQTEGSQCAKTFINLMTHICKEQTVQYILTMVDDMLQENHQRVSIFFDYARCSKN 120
EKQEMLQTEGSQ AKTFINLMTHI KEQTVQYILT+VDD LQENHQRVSIFFDYA+ SKN
Sbjct: 61 EKQEMLQTEGSQRAKTFINLMTHISKEQTVQYILTLVDDTLQENHQRVSIFFDYAKRSKN 120

Query: 121 TAWPYFLPILNRQDPFTVHMAARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS 180
TAW YFLP+LNRQD FTVHM ARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS
Sbjct: 121 TAWSYFLPMLNRQDLFTVHMTARIIAKLAAWGKELMEGSDLNYYFNWIKTQLSSQKLRGS 180

Query: 181 GVAVETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ 240
GV ETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ
Sbjct: 181 GVTAETGTVSSSDSSQYVQCVAGCLQLMLRVNEYRFAWVEADGVNCIMGVLSNKCGFQLQ 240


Query: 241 YQMIFSIWLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKSTERE 300
YQMIFS+WLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKS ERE
Sbjct: 241 YQMIFSVWLLAFSPQMCEHLRRYNIIPVLSDILQESVKEKVTRIILAAFRNFLEKSVERE 300

Query: 301 TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK 360
TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK
Sbjct: 301 TRQEYALAMIQCKVLKQLENLEQQKYDDEDISEDIKFLLEKLGESVQDLSSFDEYSSELK 360

Query: 361 SGRLEWSPVHKSEKFWRENAVRLNEKNYELLKILTKLLEVSDDPQXLAVAAHDVGXYVRX 420
SGRLEWSPVHKSEKFWREN RLNEKNYELLKILTKLLEVSDDPQ LAVAAHDVG YVR
Sbjct: 361 SGRLEWSPVHKSEKFWRENPARLNEKNYELLKILTKLLEVSDDPQVLAVAAHDVGEYVRH 420

Query: 421 YPRGKRVIEQXGGKQLVMNHMHHEXQQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQTXA 480
YPRGKRVIEQ GGKQLVMNHMHHE QQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQT A
Sbjct: 421 YPRGKRVIEQLGGKQLVMNHMHHEDQQVRYNALLAVQKLMVHNWEYLGKQLQSEQPQTAA 480

Query: 481 ARS 483
ARS
Sbjct: 481 ARS 483


[AA134689-----------------------------------------------MSLNGFGEHTRSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGYSYCGETAAYAF--------------------------]KQVVSSAVERVFILGPSHVVALNGCAITTCSKYRTPLGDLIVDHKINEELRATRHFDLMDRRDEES [THC196496-------------------------------------EHSIEMQLPFIAKVMGSKRYTIVPVLVGSLPGSRQQTYGNIFAHYMEDPRNLFVISSDFCHWGERF------------------------------------------------------------------SFSPYDRHSSIPIYEQITNMDKQGMSAIETLNPAAFNDYLKKTQNTICGRNPILIMLQAAEHFRIS-----------------------------------]NNHTHEFRFLHYTQSNKVRSSVDSSVSYASGVLFVHPN

U64857


Translation of 1 MSNR---VVCREASHAGSWYTASGPQLNAQLEGWLSQVQSTKRPARAIIAPHAGY 52
U64857 1 MSLNGFGEHTRSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGY 55
** .* ********.* * ** ** . ***.*.*****

Translation of 53 TYCGSCAAHAYKQVDPSITRRIFILGPSHHVPLSRCALSSVDIYRTPLYDLRIDQ 107
U64857 56 SYCGETAAYAFKQVVSSAVERVFILGPSHVVALNGCAITTCSKYRTPLGDLIVDH 110
.*** .** *.*** * *.******* * * **... ***** ** .*.

Translation of 108 KIYGELWKTGMFERMSLQTDEDEHSIEMHLPYTAKAMESHKDEFTIIPVLVGALS 162
U64857 111 KINEELRATRHFDLMDRRDEESEHSIEMQLPFIAKVMGSKR--YTIVPVLVGSLP 163
** ** * *. * . .* ******.**. ** * *.. .**.*****.*

Translation of 163 ESKEQEFGKLFSKYLADPSNLFVVSSDFCHWGQRFRYSYYD-ESQGEIYRSIEHL 216
U64857 164 GSRQQTYGNIFAHYMEDPRNLFVISSDFCHWGERFSFSPYDRHSSIPIYEQITNM 218
*..* .* .*..*. ** ****.********.** .* ** * ** * ..

Translation of 217 DKMGMSIIEQLDPVSFSNYLKKYHNTICGRHPIGVLLNAITELQK-NGMNMSFSF 270
U64857 219 DKQGMSAIETLNPAAFNDYLKKTQNTICGRNPILIMLQAAEHFRISNNHTHEFRF 273
** *** ** * * .* **** .******.** ..*.* . *. . * *

Translation of 271 LNYAQSSQCRNWQDSSVSYAAGALTVH 297
U64857 274 LHYTQSNKVRSSVDSSVSYASGVLFVHPN 302
*.*.** . * *******.* * **

Page 213:

gi|1465834 (U64857) No definition line found [Caenorhabditis el...  300  1e-80
sp|Q10212|YAY4_SCHPO HYPOTHETICAL 34.8 KD PROTEIN C4H3.04C IN C...  215  3e-55
sp|P47085|YJX8_YEAST HYPOTHETICAL 38.5 KD PROTEIN IN SUI2-TDH2 ...  195  3e-49
gi|2425141 (AF020286) similar to C. elegans CEESS08F encoded by...  155  4e-37
gnl|PID|d1031681 (AP000006) 294aa long hypothetical protein [Py...  87  1e-16
gi|2983422 (AE000712) hypothetical protein [Aquifex aeolicus]  85  7e-16
gi|2621080 (AE000796) conserved protein [Methanobacterium therm...  79  4e-14
gnl|PID|e283857 (Y08257) orf c05005 [Sulfolobus solfataricus]  78  9e-14
sp|Q57846|Y403_METJA HYPOTHETICAL PROTEIN MJ0403 >gi|2129073|pi...  77  2e-13
gi|2983762 (AE000735) hypothetical protein [Aquifex aeolicus]  68  1e-10

gi|1465834 (U64857) No definition line found [Caenorhabditis elegans]
Length = 302
Score = 300 bits (759), Expect = 1e-80
Identities = 153/292 (52%), Positives = 198/292 (67%), Gaps = 4/292 (1%)

Query: 8  REASHAGSWYTASGPQLNAQLEGWLSQVQSTKRPARAIIAPHAGYTYCGSCAAHAYKQVD 67
          R ASHAGSWY A+   L+ QL  WL         ARA+I+PHAGY+YCG  AA+A+KQV
Sbjct: 11 RSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGYSYCGETAAYAFKQVV 70

BLASTP (Jan. 10, 1999)
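The "Identities = 153/292 (52%)" figure in the BLASTP report above is the number of columns where query and subject carry the same residue, divided by the alignment length. The following is a minimal sketch of that count for two already-aligned rows; the function name `percent_identity` and the choice to round the percentage down are assumptions made for illustration (rounding down is at least consistent with the slide, where 198/292 = 67.8% is shown as 67%). It does not compute "Positives", which would need a substitution matrix.

```python
def percent_identity(query: str, subject: str, gap: str = "-") -> tuple[int, int, int]:
    """Return (identities, alignment length, percent rounded down).

    Both rows must already be aligned, i.e. have equal length, with
    `gap` marking inserted positions. Gap-vs-gap columns never count.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    length = len(query)
    percent = identities * 100 // length if length else 0
    return identities, length, percent


# The 60-column block shown on the slide (query 8-67 vs. U64857 11-70).
QUERY = "REASHAGSWYTASGPQLNAQLEGWLSQVQSTKRPARAIIAPHAGYTYCGSCAAHAYKQVD"
SBJCT = "RSASHAGSWYNANQRDLDRQLTKWLDNAGPRIGTARALISPHAGYSYCGETAAYAFKQVV"

if __name__ == "__main__":
    ident, length, pct = percent_identity(QUERY, SBJCT)
    print(f"Identities = {ident}/{length} ({pct}%)")
```

For this single block the sketch reports 33 identical columns out of 60 (55%); the 153/292 on the slide covers the full alignment, of which only the first block is reproduced here.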

Page 214:

[THC132858-------------------]MKQFKRGIERDGTGFVVLMAEEAEDMWHIYNLIRIGDIIKASTIRKVVSETSTGTTSSQRVHTM

LTVSVESIDFDPGAQELHLKGRNIEENDIVKLGAYHTIDLEPNRKFTLQKTEWDSIDLERLNLA

[THC85433------------------------------------------LDPAQAADVAAVVLHEGLANVCLITPAMTLTRAKIDMTIPRKRKGFTSQHEKGLEKFYEAVSTA--------------------------------------------] {AA938998*****************FMRHVNLQVVKCVIVASRGFVKDAFMQHLIAHADANGKKFTTEQRAKFMLTHSSSGFKHALKEV*******} [THC200182----------------------------------------------------LETPQVALRLADTKAQGEVKALNQFLELMSTEPDRAFYGFNHVNRANQELAIETLLVADSLFRA-----------------------------------------------]QDIETRRKYVRLVESVREQNGKVHIFSSMHVSGEQLAQLTGCAAILRFPMPDLDDEPMDEN

Z36238

Page 215:

Translation of   1 MKLVRKNIEKDNAGQVTLVPEEPEDMWHTYNLVQVGDSLRASTIRKVQTESSTGS 55
Z36238           1 MKQFKRGIERDGTGFVVLMAEEAEDMWHIYNLIRIGDIIKASTIRKVVSETSTGT 55
** ...**.*..* * *. ** ***** ***...** ..******* .*.***.

Translation of  56 VGSNRVRTTLTLCVEAIDFDSQACQLRVKGTNIQENEYVKMGAYHTIELEPNRQF 110
Z36238          56 TSSQRVHTMLTVSVESIDFDPGAQELHLKGRNIEENDIVKLGAYHTIDLEPNRKF 110
*.**.* **..**.**** * .*..** **.**. **.******.*****.*

Translation of 111 TLAKKQWDSVVLERIEQACDPAWSADVAAVVMQEGLAHICLVTPSMTLTRAKVEV 165
Z36238         111 TLQKTEWDSIDLERLNLALDPAQAADVAAVVLHEGLANVCLITPAMTLTRAKIDM 165
** * .***. ***. * *** .*******..****..**.**.*******...

Translation of 166 NIPRKRKGNCSQHDRALERFYEQVVQAIQRHIHFDVVKCILVASPGFVREQFCDY 220
Z36238         166 TIPRKRKGFTSQHEKGLEKFYEAVSTAFMRHVNLQVVKCVIVASRGFVKDAFMQH 220
.******* .***.. **.*** * * **.. ****..*** ***.. *

Translation of 221 MFQQAVKTDNKLLLGNRSKFLQVHASSGHKYSLKEALCDPTVLARLSDTKAAGEV 275
Z36238         221 LIAHADANGKKFTTEQRAKFMLTHSSSGFKHALKEVLETPQVALRLADTKAQGEV 275
. .* . * .*.**. *.*** * .*** * * * **.**** ***

Page 216:

Translation of 276 KALDDSYKMLQHEPDRAFYGLKQVEKANEAMAIDTLLISDELFRHQDVATRSRYV 330
Z36238         276 KALNQFLELMSTEPDRAFYGFNHVNRANQELAIETLLVADSLFRAQDIETRRKYV 330
*** .. ******** .* .**. .**.***..* *** **. ** .**

Translation of 331 RLVDSVKENAGTVRIFSSLHVSGEQLSQLTGVAAILRFPVPELSDQEGDS-SSEE 384
Z36238         331 RLVESVREQNGKVHIFSSMHVSGEQLAQLTGCAAILRFPMPDLDDEPMDEN 381
***.**.*. * *.****.*******.**** *******.*.* *. *

Translation of 385 D 385
Z36238         382 381

Page 217:

sp|P48612|PELO_DROME PELOTA PROTEIN >gi|973224 (U27197) pelota ...  520  e-147
sp|P50444|YNU6_CAEEL HYPOTHETICAL 42.9 KD PROTEIN R74.6 IN CHRO...  446  e-125
gi|3941543 (AF069497) pelota [Arabidopsis thaliana]  385  e-106
pir||S45456 DOM34 protein - yeast (Saccharomyces cerevisiae) >g...  236  2e-61
sp|P33309|DO34_YEAST DOM34 PROTEIN >gi|295608 (L11277) DOM34 [S...  212  2e-54
gnl|PID|e304505 (Z86109) unknown [Saccharomyces pastorianus]  199  3e-50
gi|2622770 (AE000923) cell division protein [Methanobacterium t...  155  4e-37
gnl|PID|d1031529 (AP000006) 356aa long hypothetical protein [Py...  146  3e-34
sp|Q57638|Y174_METJA HYPOTHETICAL PROTEIN MJ0174 >gi|2127805|pi...  145  6e-34
gi|2649765 (AE001046) cell division protein pelota (pelA) [Arch...  116  3e-25

sp|P50444|YNU6_CAEEL HYPOTHETICAL 42.9 KD PROTEIN R74.6 IN CHROMOSOME III >gi|3879163|gnl|PID|e1348805 (Z36238) Similar to the DOM34 protein of saccharomyces cerevisiae (Swiss Prot accession number P33309) [Caenorhabditis elegans]
Length = 381
Score = 446 bits (1136), Expect = e-125
Identities = 215/371 (57%), Positives = 282/371 (75%)

BLASTP (Jan. 10, 1999)
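Each line of the BLASTP hit list above ends with two columns: the bit score and the E-value, where a bare exponent such as "e-147" stands for 1e-147. As a rough sketch of how such lines could be read into records for sorting or filtering, the snippet below uses a regular expression; the `Hit` type, the `parse_hit` function, and the pattern itself are hypothetical helpers written for this illustration and are not part of any BLAST tool.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    description: str
    score: int
    evalue: float


# Description, then an integer bit score, then an E-value at end of line.
# The E-value may be "2e-61", a bare exponent "e-147", or a plain number.
_HIT_RE = re.compile(
    r"^(?P<desc>.+?)\s+(?P<score>\d+)\s+"
    r"(?P<evalue>(?:\d+(?:\.\d+)?)?e-\d+|\d+(?:\.\d+)?)$"
)


def parse_hit(line: str) -> Hit:
    """Parse one summary line of a BLAST hit list (assumed layout)."""
    match = _HIT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised hit line: {line!r}")
    raw = match.group("evalue")
    if raw.startswith("e"):
        raw = "1" + raw  # "e-147" is shorthand for 1e-147
    return Hit(match.group("desc"), int(match.group("score")), float(raw))


if __name__ == "__main__":
    lines = [
        "gi|3941543 (AF069497) pelota [Arabidopsis thaliana] 385 e-106",
        "pir||S45456 DOM34 protein - yeast (Saccharomyces cerevisiae) >g... 236 2e-61",
    ]
    for hit in sorted(map(parse_hit, lines), key=lambda h: h.evalue):
        print(hit.score, hit.evalue, hit.description)
```

Sorting by `evalue` puts the most significant hits first, which matches the order BLAST itself uses in the list on the slide.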

Page 218:
Page 219:
Page 220:
Page 221:
Page 222:
Page 223:
Page 224:
Page 225:

Figure: plot against CGI protein length (x-axis 0 to 1000; y-axis 0 to 1200).

Page 226:

Figure: plot against CGI protein length (x-axis 0 to 1000; y-axis 0 to 800).

Page 227:

Figure: plot against CGI protein length (x-axis 0 to 1000; y-axis 30 to 100).

Page 228:

C. elegans from WormPept: 18,452 entries

HGI searches (5 days for TBLASTN analysis)

*Families 3,934
*Known Gene 7,954
*New Contig 3,456
*Undetermined 2,070

<100 aa 1,038

*150 full length genes so far, more expected following GAP closure and 5' RACE.

83% between Human & C. elegans
11% C. elegans specific

Page 229:

C. elegans from WormPept: 18,452 entries

MGI searches (5 days for TBLASTN analysis)

*Families 5,602
*Known Gene 4,151
*New Contig 5,805
*Undetermined 1,856

<100 aa 1,038

84% between Mouse & C. elegans
10% C. elegans specific
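The percentages on the HGI and MGI summary slides can be checked against the category counts. The sketch below assumes, as an interpretation not stated on the slides, that the "between Human/Mouse & C. elegans" figure is (Families + Known Gene + New Contig) divided by the 18,452 WormPept entries, and that "C. elegans specific" is Undetermined divided by the same total. The dictionary keys and the `summarize` function are names made up for this example.

```python
TOTAL_WORMPEPT = 18_452  # WormPept entries searched, from the slides

# Category counts copied from the two summary slides.
SEARCHES = {
    "HGI (human)": {
        "Families": 3934, "Known Gene": 7954, "New Contig": 3456,
        "Undetermined": 2070, "<100 aa": 1038,
    },
    "MGI (mouse)": {
        "Families": 5602, "Known Gene": 4151, "New Contig": 5805,
        "Undetermined": 1856, "<100 aa": 1038,
    },
}


def summarize(counts: dict[str, int], total: int = TOTAL_WORMPEPT) -> tuple[int, int]:
    """Return (percent matched, percent undetermined), rounded to integers.

    "Matched" is assumed to mean Families + Known Gene + New Contig.
    """
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the total")
    matched = counts["Families"] + counts["Known Gene"] + counts["New Contig"]
    return round(100 * matched / total), round(100 * counts["Undetermined"] / total)


if __name__ == "__main__":
    for name, counts in SEARCHES.items():
        shared, specific = summarize(counts)
        print(f"{name}: {shared}% shared, {specific}% C. elegans specific")
```

Under these assumptions the counts reproduce the slide figures: 83% and 11% for the human search, 84% and 10% for the mouse search, and in both cases the five categories add up to exactly 18,452 entries.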

Page 230:

http://www.ibms.sinica.edu.tw/~wenlin/

