ELSI abstracts


DOE/ER-0713 (Part 1)

Date Published: November 1997

Prepared for the
U.S. Department of Energy
Office of Energy Research
Office of Biological and Environmental Research
Germantown, MD 20874-1290

Prepared by the
Human Genome Management Information System
Oak Ridge National Laboratory
Oak Ridge, TN 37830-6480
managed by
Lockheed Martin Energy Research Corporation
for the
U.S. Department of Energy

Under Contract DE-AC05-96OR22464

Part 1, Overview and Progress

LANL and LLNL begin production of DNA clone (cosmid) libraries representing single chromosomes.

DOE OHER and ICPEMC cosponsor Alta, Utah, conference highlighting the growing role of recombinant DNA technologies. OTA incorporates Alta proceedings into a 1986 report acknowledging value of human genome reference sequence.

Robert Sinsheimer holds meeting on human genome sequencing at University of California, Santa Cruz.

At OHER, Charles DeLisi and David A. Smith commission the first Santa Fe conference to assess the feasibility of a Human Genome Initiative.

Following the Santa Fe conference, DOE OHER announces Human Genome Initiative. With $5.3 million, pilot projects begin at DOE national laboratories to develop critical resources and technologies.

DOE advisory committee, HERAC, recommends a 15-year, multidisciplinary, scientific, and technological undertaking to map and sequence the human genome. DOE designates multidisciplinary human genome centers.

NIH NIGMS begins funding of genome projects.

Reports by OTA and NAS NRC recommend concerted genome research program.

HUGO founded by scientists to coordinate efforts internationally.

First annual Cold Spring Harbor Laboratory meeting held on human genome mapping and sequencing.

DOE and NIH sign MOU outlining plans for cooperation on genome research.

Telomere (chromosome end) sequence having implications for aging and cancer research is identified at LANL.

DOE and NIH present joint 5-year U.S. HGP plan to Congress. The 15-year project formally begins.

Projects begun to mark genes on chromosome maps as sites of mRNA expression.

R&D begun for efficient production of more stable, large-insert BACs.

DNA STSs recommended to correlate diverse types of DNA clones.

DOE and NIH establish Joint ELSI Working Group.

Human chromosome mapping data repository, GDB, established.

International IMAGE Consortium established to coordinate efficient mapping and sequencing of gene-representing cDNAs.

DOE-NIH Joint ELSI Working Group’s Task Force on Genetic Information and Insurance releases recommendations.

DOE and NIH revise 5-year goals [Science 262, 43–46 (Oct. 1, 1993)].

French Généthon provides mega-YACs to the genome community.

IOM releases U.S. HGP-funded report, “Assessing Genetic Risks.”

GRAIL sequence interpretation service with Internet access initiated at ORNL.


Low-resolution genetic linkage map of entire human genome published.

Guidelines for data release and resource sharing announced by DOE and NIH.

ADA  Americans with Disabilities Act
ANL  Argonne National Laboratory
BAC  bacterial artificial chromosome
cDNA  complementary deoxyribonucleic acid
CGAP  Cancer Genome Anatomy Project
DNA  deoxyribonucleic acid
DHHS  Department of Health and Human Services (NIH)
DOE  Department of Energy
EEOC  Equal Employment Opportunity Commission
ELSI  ethical, legal, and social issues
GDB  Genome Database
GRAIL  Gene Recognition and Analysis Internet Link
HERAC  Health and Environmental Research Advisory Committee
HGP  Human Genome Project, Human Genome Program
HUGO  Human Genome Organisation
ICPEMC  International Commission for Protection Against Environmental Mutagens and Carcinogens
IMAGE  Integrated Molecular Analysis of Gene Expression
IOM  Institute of Medicine (NAS)


Genetic-mapping 5-year goal achieved 1 year ahead of schedule.

Completion of second-generation DNA clone libraries representing each human chromosome by LLNL and LBNL.

Genetic Privacy Act, first U.S. HGP legislative product, proposed to regulate collection, analysis, storage, and use of DNA samples and genetic information obtained from them; endorsed by DOE-NIH Joint ELSI Working Group.

DOE Microbial Genome Program launched; spin-off of HGP.

LLNL chromosome paints commercialized.

SBH technologies from ANL commercialized.

DOE HGP Information Web site activated for public and researchers.

LANL and LLNL announce high-resolution physical maps of chromosome 16 and chromosome 19, respectively.

Moderate-resolution maps of chromosomes 3, 11, 12, and 22 published.

First (nonviral) whole genome sequenced (for the bacterium Haemophilus influenzae).

Sequence of smallest bacterium, Mycoplasma genitalium, completed, displaying the minimum number of genes needed for independent existence.

EEOC guidelines extend ADA employment protection to cover discrimination based on genetic information related to illness, disease, or other conditions.

Methanococcus jannaschii genome sequenced; confirms existence of third major branch of life, the Archaea.

DOE-NIH Task Force on Genetic Testing releases interim principles.

Integrated STS-based detailed human physical map with 30,000 STSs achieves an HGP goal.

Health Insurance Portability and Accountability Act prohibits use of genetic information in certain health-insurance eligibility decisions and requires DHHS to enforce health-information privacy provisions.

DOE-NIH Joint ELSI Working Group releases guidelines on informed consent for large-scale sequencing projects.

DOE and NCHGR issue guidelines on use of human subjects for large-scale sequencing projects.

Saccharomyces cerevisiae (yeast) genome sequence completed by international consortium.

Sequence of the human T-cell receptor region completed.

Wellcome Trust sponsors large-scale sequencing strategy meeting in Bermuda for international coordination of human genome sequencing.

DOE forms Joint Genome Institute for implementing high-throughput sequencing at DOE HGP centers.

NIH NCHGR becomes NHGRI.

Escherichia coli genome sequence completed.

Second large-scale sequencing strategy meeting held in Bermuda.

High-resolution physical maps of chromosomes X and 7 completed.

Methanobacterium thermoautotrophicum genome sequence completed.

Archaeoglobus fulgidus genome sequence completed.

NCI CGAP begins.


* DOE had limited or no involvement in this event.

LANL  Los Alamos National Laboratory
LBNL  Lawrence Berkeley National Laboratory
LLNL  Lawrence Livermore National Laboratory
MGP  Microbial Genome Project
MOU  Memorandum of Understanding
mRNA  messenger ribonucleic acid
NAS  National Academy of Sciences
NCHGR  National Center for Human Genome Research (NIH)
NCI  National Cancer Institute (NIH)
NHGRI  National Human Genome Research Institute (NIH)
NIGMS  National Institute of General Medical Sciences (NIH)
NIH  National Institutes of Health
NRC  National Research Council
OHER  Office of Health and Environmental Research
ORNL  Oak Ridge National Laboratory
OTA  Office of Technology Assessment
R&D  Research and Development
SBH  sequencing by hybridization
STS  sequence tagged site
YAC  yeast artificial chromosome

Preface

More than a decade ago, the Office of Health and Environmental Research (OHER) of the U.S. Department of Energy (DOE) struck a bold course in launching its Human Genome Initiative, convinced that its mission would be well served by a comprehensive picture of the human genome. Organizers recognized that the information the project would generate, both technological and genetic, would contribute not only to a new understanding of human biology and the effects of energy technologies but also to a host of practical applications in the biotechnology industry and in the arenas of agriculture and environmental protection.

Today, the project’s value appears beyond doubt as worldwide participation contributes toward the goals of determining the human genome’s complete sequence by 2005 and elucidating the genome structure of several model organisms as well. This report summarizes the content and progress of the DOE Human Genome Program (HGP). Descriptive research summaries, along with information on program history, goals, management, and current research highlights, provide a comprehensive view of the DOE program.

Last year marked an early transition to the third and final phase of the U.S. Human Genome Project as pilot programs to refine large-scale sequencing strategies and resources were funded by DOE and the National Institutes of Health, the two sponsoring U.S. agencies. The human genome centers at Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory had been serving as the core of DOE multidisciplinary HGP research, which requires extensive contributions from biologists, engineers, chemists, computer scientists, and mathematicians. These team efforts were complemented by those at other DOE-supported laboratories and about 60 universities, research organizations, companies, and foreign institutions. Now, to focus DOE’s considerable resources on meeting the challenges of large-scale sequencing, the sequencing efforts of the three genome centers have been integrated into the Joint Genome Institute. The institute will continue to bring together research from other DOE-supported laboratories. Work in other critical areas continues to develop the resources and technologies needed for production sequencing; computational approaches to data management and interpretation (called informatics); and an exploration of the important ethical, legal, and social issues arising from use of the generated data, particularly regarding the privacy and confidentiality of genetic information.

Insights, technologies, and infrastructure emerging from the Human Genome Project are catalyzing a biological revolution. Health-related biotechnology is already a success story and is still far from reaching its potential. Other applications are likely to beget similar successes in coming decades; among these are several of great importance to DOE. We can look to improvements in waste control and an exciting era of environmental bioremediation, we will see new approaches to improving energy efficiency, and we can hope for dramatic strides toward meeting the fuel demands of the future.

In 1997 OHER, renamed the Office of Biological and Environmental Research (OBER), is celebrating 50 years of conducting research to exploit the boundless promise of energy technologies while exploring their consequences to the public’s health and the environment. The DOE Human Genome Program and a related spin-off project, the Microbial Genome Program, are major components of the Biological and Environmental Research Program of OBER.

DOE OBER is proud of its contributions to the Human Genome Project and welcomes general or scientific inquiries concerning its genome programs. Announcements soliciting research applications appear in Federal Register, Science, Human Genome News, and other publications. The deadline for formal applications is generally midsummer for awards to be made the next year, and submission of preproposals in areas of potential interest is strongly encouraged. Further information may be obtained by contacting the program office or visiting the DOE home page (301/903-6488, Fax: -8521, [email protected], URL: http://www.er.doe.gov/production/ober/hug_top.html).

Aristides Patrinos, Associate Director
Office of Biological and Environmental Research
U.S. Department of Energy
November 3, 1997

Contents

Introduction, p. 1
  Project Origins, p. 1
  Anticipated Benefits of Genome Research, p. 2
  Coordinated Efforts, p. 2
  DOE Genome Program, p. 3
  Five-Year Research Goals, p. 5
  Evolution of a Vision, p. 6

Highlights of Research Progress, p. 9
  Clone Resources for Mapping, Sequencing, and Gene Hunting, p. 9
  Of Mice and Humans: The Value of Comparative Analyses, p. 13
  DNA Sequencing, p. 14
  Informatics: Data Collection and Analysis, p. 16
  Ethical, Legal, and Social Issues (ELSI), p. 18

Technology Transfer, p. 21
  Collaborations, p. 21
  Patenting and Licensing Highlights, FY 1994–96, p. 22
  SBIR and STTR, p. 23
  Technology Transfer Award, p. 24
  1997 R&D 100 Awards, p. 24

Research Narratives, p. 25
  Joint Genome Institute, p. 26
  Lawrence Livermore National Laboratory Human Genome Center, p. 27
  Los Alamos National Laboratory Center for Human Genome Studies, p. 35
  Lawrence Berkeley National Laboratory Human Genome Center, p. 41
  University of Washington Genome Center, p. 47
  Genome Database, p. 49
  National Center for Genome Resources, p. 55

Program Management, p. 59
  DOE OBER Mission, p. 59
  Human Genome Program, p. 62

Coordination with Other Genome Programs, p. 67
  U.S. Human Genome Project: DOE and NIH, p. 67
  Other U.S. Programs, p. 68
  International Collaborations, p. 68

Appendices, p. 71
  A: Early History, Enabling Legislation (1984–90), p. 73
  B: DOE-NIH Sharing Guidelines (1992), p. 75
  C: Human Subjects Guidelines (1996), p. 77
  D: Genetics on the World Wide Web (1997), p. 83
  E: 1996 Human Genome Research Projects (1996), p. 89
  F: DOE BER Program (1997), p. 95

Glossary, p. 101

Acronym List, inside back cover


Introduction

Now completing its first decade, the Human Genome Program of the U.S. Department of Energy (DOE) is the longest-running federally funded program to analyze the genetic material, the genome, that determines an individual’s characteristics at the most fundamental level. Part of the Biological and Environmental Research (BER) Program sponsored by the DOE Office of Biological and Environmental Research (OBER*), the genome program is a major component of the larger U.S. Human Genome Project.

Since October 1990, the project has been supported jointly by DOE and the National Institutes of Health (NIH) National Human Genome Research Institute (formerly National Center for Human Genome Research). Together, the DOE and NIH components make up the world’s largest centrally coordinated biology research project ever undertaken.

The U.S. Human Genome Project is a 15-year endeavor to characterize the human genome by improving existing human genetic maps, constructing physical maps of entire chromosomes, and ultimately determining a complete sequence of the deoxyribonucleic acid (DNA) subunits. Parallel studies are being carried out on selected model organisms to facilitate interpretation of human gene function.

The ultimate goal of the U.S. project is to identify the estimated 70,000 to 100,000 human genes and render them accessible for future biological study. A complete human DNA sequence will provide physicians and researchers in many biological disciplines with an extraordinary resource: an “encyclopedia” of human biology obtainable by computer and available to all.

Obtaining the complete sequence by 2005 will require a highly coordinated and focused international effort generating advances in biological methodology; instrumentation (particularly automation); and computer-based methods for collecting, storing, managing, and analyzing the rapidly growing body of data.

For 50 years, programs in the DOE Office of Biological and Environmental Research have crossed traditional research boundaries in seeking new solutions to energy-related biological and environmental challenges (see Appendix F, p. 95, and http://www.er.doe.gov/production/ober/ober.html).

Project Origins

The potential value of detailed genetic information was recognized early; until recently, however, obtaining this information was far beyond the capabilities of biomedical research. DOE OBER and its two predecessor agencies, the Atomic Energy Commission and the Energy Research and Development Administration, had long sponsored genetic research in both microbial and higher systems. These studies included explorations into population genetics; genome structure, maintenance, replication, damage, and repair; and the consequences of genetic mutations. These traditional DOE activities evolved naturally into the Human Genome Program.

*In 1997 the Office of Health and Environmental Research (OHER) was renamed Office of Biological and Environmental Research (OBER).

Scientific and technical terms are defined in the Glossary, p. 101. More historical details and other information appear in the Appendices beginning on p. 71.

genome (je′nom), n. all the genetic material in the chromosomes of an organism.

DOE Human Genome Program Report


By 1985, progress in genetic and DNA technologies led to serious discussions in the scientific community about initiating a major project to analyze the structure of the human genome. After concluding that a DNA sequence would offer the most useful approach for detecting inherited mutations, DOE in 1986 announced its Human Genome Initiative. The initiative emphasized development of resources and technologies for genome mapping, sequencing, computation, and infrastructure support that would culminate in a complete sequence of the human genome.

The National Research Council issued a report in 1988 recommending a dedicated research budget of $200 million annually for 15 years to determine the sequence of the 3 billion chemical subunits (base pairs) in the human genome and to map and identify all human genes.
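As a rough illustration of the scale implied by the National Research Council figures above, the following Python sketch multiplies out the recommended budget and divides it by the genome size. The constant and function names are illustrative and do not come from the report, and the per-base-pair ratio is only a back-of-the-envelope figure, since the recommended budget was also meant to cover mapping and gene identification.

```python
# Figures quoted from the 1988 National Research Council recommendation.
ANNUAL_BUDGET_USD = 200_000_000      # $200 million per year
PROJECT_YEARS = 15                   # recommended duration
GENOME_BASE_PAIRS = 3_000_000_000    # 3 billion base pairs


def total_budget(annual_usd: int, years: int) -> int:
    """Return the cumulative budget over the whole project."""
    return annual_usd * years


def cost_per_base_pair(total_usd: int, base_pairs: int) -> float:
    """Return the naive ratio of total budget to genome size (illustrative only)."""
    return total_usd / base_pairs


total = total_budget(ANNUAL_BUDGET_USD, PROJECT_YEARS)
ratio = cost_per_base_pair(total, GENOME_BASE_PAIRS)

print(f"Total recommended budget: ${total:,}")
print(f"Budget divided by genome size: ${ratio:.2f} per base pair")
```

Under these figures the recommendation amounts to $3 billion in total, or about one dollar for each base pair of the genome.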

To launch the nation’s Human Genome Project, Congress appropriated funds to DOE and also to NIH, which had long supported research in genetics and molecular biology as an integral part of its mission to improve the health of all Americans. Other federal agencies and foundations outside the Human Genome Project also contribute to genome research, and many other countries are making important contributions through their own genome research projects.

Coordinated Efforts

In 1988 DOE and NIH signed a Memorandum of Understanding in which the agencies agreed to work together, coordinate technical research and activities, and share results. The two agencies assumed a joint systematic approach toward establishing goals to satisfy both short- and long-term project needs.

Early guidelines projected three 5-year phases, for which the first plan was presented to Congress in 1990. The 1990 plan emphasized the creation of chromosome maps, software, and automated technologies to enable sequencing.

By 1993, unexpectedly rapid progress in chromosome mapping required updating the goals [Science 262, 43–46 (October 1, 1993)], which now project through 1998 (see p. 5). This plan is being revised again in anticipation of the approaching high-throughput sequencing phase of the project. Last year marked an early transition to this phase as many more genome sequencing projects were funded. The second and third phases of the project will optimize resources, refine sequencing strategies, and, finally, completely determine the sequence of all base pairs in the genome.

Another area of DOE and NIH cooperation is in exploring the ethical, legal, and social issues (ELSI) arising from increased availability of genetic data and growing genetic-testing capabilities. The two agencies established a joint working group to confront these ELSI challenges and have cosponsored joint projects and workshops.

OBER’s mission is described more fully in the Program Management section (p. 59) of this report.

Anticipated Benefits of Genome Research

Predictions of biology as “the science of the 21st century” have been made by observers as diverse as Microsoft’s Bill Gates and U.S. President Bill Clinton. Already revolutionizing biology, genome research has spawned a burgeoning biotechnology industry and is providing a vital thrust to the increasing productivity and pervasiveness of the life sciences.

Technology and resources promoted by the Human Genome Project already have had profound impacts on biomedical research and promise to revolutionize biological research and clinical medicine. Increasingly detailed genome maps have aided researchers seeking genes associated with dozens of genetic conditions, including myotonic dystrophy, fragile X syndrome, neurofibromatosis types 1 and 2, a kind of inherited colon cancer, Alzheimer’s disease, and familial breast cancer.

Current and potential applications of genome research will address national needs in molecular medicine, waste control and environmental cleanup, biotechnology, energy sources, and risk assessment.

Molecular Medicine

On the horizon is a new era of molecular medicine characterized less by treating symptoms and more by looking to the most fundamental causes of disease. Rapid and more specific diagnostic tests will make possible earlier treatment of countless maladies. Medical researchers also will be able to devise novel therapeutic regimens based on new classes of drugs, immunotherapy techniques, avoidance of environmental conditions that may trigger disease, and possible augmentation or even replacement of defective genes through gene therapy.

Microbial Genomes

In 1994, taking advantage of new capabilities developed by the genome project, DOE formulated the Microbial Genome Initiative to sequence the genomes of bacteria useful in the areas of energy production, environmental remediation, toxic waste reduction, and industrial processing. In the resulting Microbial Genome Project, six microbes that live under extreme conditions of temperature and pressure have been sequenced completely as of August 1997. Structural studies are under way to learn what is unique about the proteins of these organisms, the ultimate aim being to use the microbes and their enzymes for such practical purposes as waste control and environmental cleanup.

Biotechnology

The potential for commercial development presents U.S. industry with a wealth of opportunities. Sales of biotechnology products are projected to exceed $20 billion by the year 2000. The genome project already has stimulated significant investment by large corporations and prompted the creation of new biotechnology companies hoping to capitalize on the far-reaching implications of its research.

Energy Sources

Biotechnology, fueled by insights reaped from the genome project, will play a significant role in improving the use of fossil-based resources. Increased energy demands, projected over the next 50 years, require strategies to circumvent the many problems associated with today’s dominant energy technologies. Biotechnology promises to help address these needs by providing cleaner means for the bioconversion of raw materials to refined products. In addition, there is the possibility of developing entirely new biomass-based energy sources. Having the genomic sequence of the methane-producing microorganism Methanococcus jannaschii, for example, will enable researchers to explore the process of methanogenesis in more detail and could lead to cheaper production of fuel-grade methane.

Risk Assessment

Understanding the human genome will have an enormous impact on the ability to assess risks posed to individuals by environmental exposure to toxic agents. Scientists know that genetic differences make some people more susceptible, and others more resistant, to such agents. Far more work must be done to determine the genetic basis of such variability. This knowledge will directly address DOE’s long-term mission to understand the effects of low-level exposures to radiation and other energy-related agents, especially in terms of cancer risk.

DOE Genome Program

A general overview follows of recent progress made in the DOE Human Genome Program. Refer to the timeline (pp. ii–iii) for other achievements toward U.S. goals, including contributions made outside DOE.

Physical maps

For DOE, an early goal was to develop chromosome physical maps, which involves reconstructing the order of cloned DNA fragments to represent their specific originating chromosomes. (A set of such cloned fragments is called a library.) Critical to this effort were the libraries of individual human chromosomes produced at Los Alamos National Laboratory (LANL) and Lawrence Livermore National Laboratory (LLNL). These libraries allowed the huge task of mapping and sequencing the entire 3 billion bases in the human genome to be broken down into 24 much smaller single-chromosome units. Availability of the libraries has enabled the participation of many laboratories worldwide. Some three generations of clone libraries with improving characteristics have been produced and widely distributed. In the DOE-supported projects, DNA clones representing chromosomes 16, 19, and 22 have been ordered (mapped) and are now providing material needed for large-scale sequencing.

Sequencing

Toward the goal of greatly increasing the speed and decreasing the cost of DNA sequencing, DOE has supported improvements in standard technologies and has pioneered support for revolutionary sequencing systems. Marked improvements have been made in reagents, enzymes, and raw data quality. Such novel approaches as sequencing by hybridization (using DNA “chips”) and mass spectrometry have already found important, previously unanticipated applications outside the Human Genome Project.

Joint Genome Institute

In early 1997, the human genome centers at Lawrence Berkeley National Laboratory, LANL, and LLNL began collaborating in the Joint Genome Institute (JGI), within which high-throughput sequencing will be implemented [see p. 26 and Human Genome News 8(2), 1–2]. The initial JGI focus will be on sequencing areas of high biological interest on several chromosomes, including human chromosomes 5, 16, and 19. Establishment of JGI represents a major transition in the DOE Human Genome Program.

Previously, most goals were pursued by small- to medium-sized teams, with modest multisite collaborations. The JGI will house high-throughput implementations of successful technologies that will be run with increasingly stringent process- and quality-control systems.

In addition, a small component aimed at understanding how genes function in the body, a field known as functional genomics, has been established and will grow as sequencing targets are met. High-throughput functional genomics represents a new era in human biology, one which will have profound implications for solving biological problems.

Informatics

In preparation for the production-sequencing phase, many algorithms for interpreting DNA sequence have been developed, and an increasing number have become available as services over the Internet. Last year, the GRAIL (for Gene Recognition and Analysis Internet Link) and GenQuest servers, developed and maintained at Oak Ridge National Laboratory, processed an average of almost 40 million bases of sequence each month.

As technology improves and data accumulates exponentially, continued progress in the Human Genome Project will depend increasingly on the development of sophisticated computational tools and resources to manage and interpret the information. The ease with which researchers can access and use the data will provide a measure of the project’s success. Critical to this success is the creation of interoperable databases and other computing and informatics tools to collect, organize, and interpret thousands of DNA clones.

For additional information on the DOE genome programs, refer to Research Highlights, p. 9; Research Narratives, p. 25; this report’s Part 2, 1996 Research Abstracts; and the Web site (http://www.ornl.gov/hgmis).



Genetic Mapping

• Complete the 2- to 5-cM map by 1995.

• Develop technology for rapid genotyping.

• Develop markers that are easier to use.

• Develop new mapping technologies.

Physical Mapping

• Complete a sequence tagged site (STS) map of the human genome at a resolution of 100 kb.
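The 100-kb resolution goal above implies a marker count that can be estimated with simple arithmetic. The Python sketch below assumes markers spaced evenly along a genome of 3 billion base pairs (the size given in the Introduction); real STS maps are not evenly spaced, and the names used here are illustrative only.

```python
# Assumed inputs: genome size from the Introduction, spacing from the goal above.
GENOME_BASE_PAIRS = 3_000_000_000   # about 3 billion base pairs
STS_SPACING_BP = 100_000            # one marker per 100 kb


def markers_needed(genome_bp: int, spacing_bp: int) -> int:
    """Return the number of evenly spaced markers needed to cover the genome.

    Uses ceiling division so that a partial final interval still gets a marker.
    """
    if spacing_bp <= 0:
        raise ValueError("spacing_bp must be positive")
    return -(-genome_bp // spacing_bp)


count = markers_needed(GENOME_BASE_PAIRS, STS_SPACING_BP)
print(f"Markers needed at {STS_SPACING_BP:,}-bp spacing: {count:,}")
```

Under these assumptions the estimate is 30,000 markers, which agrees with the 30,000-STS physical map noted in the timeline.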

DNA Sequencing

• Develop efficient approaches to sequencing one- to several-megabase regions of DNA of high biological interest.

• Develop technology for high-throughput sequencing, focusing on systems integration of all steps from template preparation to data analysis.

• Build up a sequencing capacity to allow sequencing at a collective rate of 50 Mb per year by the end of the period. This rate should result in an aggregate of 80 Mb of DNA sequence completed by the end of FY 1998.

Gene Identification

• Develop efficient methods for identifying genes and for placement of known genes on physical maps or sequenced DNA.

Technology Development

• Substantially expand support of innovative technological developments as well as improvements in current technology for DNA sequencing and for meeting the needs of the Human Genome Project as a whole.

Model Organisms

• Finish an STS map of the mouse genome at a 300-kb resolution.

• Finish the sequence of the Escherichia coli and Saccharomyces cerevisiae genomes by 1998 or earlier.

• Continue sequencing Caenorhabditis elegans and Drosophila melanogaster genomes with the aim of bringing C. elegans to near completion by 1998.

• Sequence selected segments of mouse DNA side by side with corresponding human DNA in areas of high biological interest.

Informatics

• Continue to create, develop, and operate databases and database tools for easy access to data, including effective tools and standards for data exchange and links among databases.

• Consolidate, distribute, and continue to develop effective software for large-scale genome projects.

• Continue to develop tools for comparing and interpreting genome information.

Five-Year Research Goals of the U.S. Human Genome Project

October 1, 1993, to September 30, 1998 (FY 1994 through FY 1998)*

*Original 1990 goals were revised in 1993 due to rapid progress. A second revision was being developed at press time.

Ethical, Legal, and Social Implications

• Continue to identify and define issues and develop policy options to address them.

• Develop and disseminate policy options regarding genetic testing services with potential widespread use.

• Foster greater acceptance of human genetic variation.

• Enhance and expand public and professional education that is sensitive to sociocultural and psychological issues.

Training

• Continue to encourage training of scientists in interdisciplinary sciences related to genome research.

Technology Transfer

• Encourage and enhance technology transfer both into and out of centers of genome research.

Outreach

• Cooperate with those who would establish distribution centers for genome materials.

• Share all information and materials within 6 months of their development. This should be accomplished by submission of information to public databases or repositories, or both, where appropriate.

Major events in the U.S. Human Genome Project, including progress made toward these goals, are charted in a timeline on pp. ii–iii.



Evolution of a Vision: Genome Project Origins, Present and Future Challenges, Far-Reaching Benefits

In an interview at a DNA sequencing conference in Hilton Head, South Carolina,* David Smith, a founder and former Director of the DOE Human Genome Program, recalled the establishment of this country’s first human genome project. The impressive early achievements and spin-off benefits, he noted, offer more than mere vindication for project founders. They also provide a tantalizing glimpse into the future where, he observed, “scientists will be empowered to study biology and make connections in ways undreamt of before.”

*The Seventh International Genome Sequencing and Analysis Conference, September 1995.

The DOE Human Genome Program began as a natural outgrowth of the agency’s long-term mission to develop better technologies for measuring health effects, particularly induced mutations. As Smith explained it, “DOE had been supporting mutation studies in Japan, where no heritable mutations could be detected in the offspring of populations exposed to the atomic blasts at Hiroshima and Nagasaki. The program really grew out of a need to characterize DNA differences between parents and children more efficiently. DOE led the development of many mutation tests, and we were interested in developing even more sensitive detection methods. Mortimer Mendelsohn of Lawrence Livermore National Laboratory, a member of the International Commission for Protection Against Environmental Mutagens and Carcinogens, and I decided to hold a workshop to discuss DNA-based methods (see Human Genome Project chronology, p. ii).

“Ray White (University of Utah) organized the meeting, which took place in Alta, Utah, in December 1984. It was a small meeting but very stimulating intellectually. We concluded the obvious: that if you really wanted to use DNA-based technologies, you had to come up with more efficient ways to characterize the DNA of much larger regions of the genome. And the ultimate sensitivity would be the capability to compare the complete DNA sequences of parents and their offspring.”

Project Begins

Smith recalled reaction to the first public statement that DOE was starting a program with the aim of sequencing the human genome. “I announced it at the Cold Spring Harbor meeting in May 1986, and there was a big hullabaloo.” After a year-long review, a National Academy of Sciences National Research Council panel endorsed the project and the basic strategy proposed. Smith pointed out that NIH and others were also having discussions on the feasibility of sequencing the human genome. “Once NIH got interested, many more people became involved. DOE and NIH signed a Memorandum of Understanding in October 1988 to coordinate our activities aimed at characterizing the human genome.” But, he observed, it wasn’t all smooth sailing. The nascent project had many detractors.

Responding to Critics

Many scientists, prominent biologists among them, thought having the sequence would be a misuse of scarce resources. Smith, laughing now, recalls one scientist complaining, “Even if I had the sequence, I wouldn’t know what to do with it.” Other critics worried that the genome project would siphon shrinking research funds away from individual investigator-initiated research projects. Smith takes the opposite view. “In fact, individual investigators can do things they would never be able to do otherwise. We’re beginning to see that demonstrated at this meeting. For the first time, we’re finding people exploring systematic ways of looking at gene function in organisms. The genome project opens up enormous new research fields to be mined. Cottage-industry biologists won’t need a lot of robots, but they will have to be computer literate to put the information all together.”

The genome project also is providing enabling technologies essential to the future of the emerging biotechnology industry, catalyzing its tremendous growth. According to Smith, the technologies are capable of more than elucidating the human genome. “We’re developing an infrastructure for future research. These technologies will allow us to efficiently characterize any of the organisms out there that pertain to various DOE missions, with such applications as better fuels from biomass, bioremediation, and waste control. They also will lead to a greater understanding of global cycles, such as the carbon cycle, and the identification of potential biological interventions. Look at the ocean; an amazing number of microbes are in there, but we don’t know how to use them to influence cycles to control some of the harmful things that might be happening. Up to now, biotechnology has been nearly all health oriented, but applications of genome research to modern biology really go beyond health. That’s one of the things motivating our program to try to develop some of these other biotechnological applications.”

Responding to criticism about not researching gene function early in the project, Smith reasserted that the purpose of the Human Genome Project is to build technologies and resources that will enable researchers to learn about biology in a much more efficient way. “The genome budget is devoted to very specific goals, and we make sure that projects contribute toward reaching them.”

International Scope

Smith credited the international community with contributing to many project successes. “The initial planning was for a U.S. project, but the outcome, of course, is that it is truly international, and we would not be nearly as far as we are today without those contributions. Also, there’s been a fair amount of money from private companies, and support from the Muscular Dystrophy Association in France and The Wellcome Trust in the United Kingdom has been extremely important.”

Technology Advances

While noting enormous advances across the board, Smith cited automation progress and observed that tremendously powerful robots and automated processes are changing the way molecular biology is done. “A lot of novel technologies probably won’t be useful for initial sequencing but will be very valuable for comparing sequences of different people and for polymorphism studies. One of the most gratifying recent successes is the DNA polymerase engineering project. Researchers made a fairly simple change, but it resulted in a thermosequenase that may answer a lot of problems, reduce the cost of sequencing, and give us better data.”

Progress in genome research requires the use of maturing technologies in other fields. “The combination of technologies that are coming together has been fortuitous; for example, advances in informatics and data-handling technologies have had a tremendous impact on the genome project. We would be in deep trouble if they were at a less-mature stage of development. They have been an important DOE focus.”

ELSI

Smith described tangible progress toward goals associated with programs on the ethical, legal, and social issues (ELSI) related to data produced by the genome project. “ELSI programs have done a lot to educate the thinkers, and this has produced a higher level of discourse in the country about these issues. DOE is spending a large fraction of its ELSI money on informing special populations who can reach others. Educating judges has been especially well received because they realize the potential impact of DNA technology on the courts.”

According to Smith, more people and groups need to be involved in ELSI matters. “We have some ELSI products: the DOE-NIH Joint ELSI Working Group has an insurance task force report, and a DOE ELSI grantee has produced draft privacy legislation. Now it’s time for others to come and translate ELSI efforts into policy. Perhaps the new National Bioethics Advisory Commission can do some of this.”

New Model for Biological Research

Smith spoke of a changing paradigm guiding DOE-supported biology. “Some years ago, the central idea or dogma in molecular biology research was that information in DNA directs RNA, and RNA directs proteins. Today, I think there is a new paradigm to guide us: Sequence implies structure, and structure implies function. The word ‘implies’ in our new paradigm means there are rules,” continued Smith, “but these are rules we don’t understand today. With the aid of structural information, algorithms, and computers, we will be able to relate sequence to structure and eventually relate structure to function. Our effort focuses on developing the technologies and tools that will allow us to do this efficiently.”

“That’s how I think about what we do at DOE,” he said. “We’re working a lot on technology and projects aimed at human and microbial genome sequencing. For understanding sequence implications, we are making major, increasing investments in synchrotrons, synchrotron user facilities, neutron user facilities, and big nuclear magnetic resonance machines. These are all aimed at rapid structure determination.” Smith explained that now we are seeing the beginnings of the biotechnology revolution implied by the sequence-to-structure-to-function paradigm. “If you really understand the relationship between sequence and function, you can begin to design sequences for particular purposes. We don’t yet know that much about the world around us, but there are capabilities out there in the biological world, and if we can understand them, we can put those capabilities to use.”

“Comparative genomics,” he continued, “will teach us a tremendous amount about human evolution. The current phylogenetic tree is based on ribosomal RNA sequences, but when we have determined whole genomic sequences of different microbes, they will probably give us different ideas about relationships among archaebacteria, eukaryotes, and prokaryotes.”

Feeling good about progress over the previous 5 years, Smith summed it up succinctly: “Genomics has come of age, and it is opening the door to entirely new approaches to biology.”

David Smith retired at the end of January 1996. Taking responsibility for the DOE Human Genome Program is Aristides Patrinos, who is also Associate Director of the DOE Office of Biological and Environmental Research. Marvin Frazier is Director of the Health Effects and Life Sciences Research Division, which manages the Human Genome Program.



Looking to the Future

Insights, technologies, and resources already emerging from the genome project, together with advances in such fields as computational and structural biology, will provide biologists and other researchers with important tools for the 21st century.



Highlights of Research Progress

Transitioning to large-scale sequencing

The early years of the Human Genome Program have been remarkably successful. Critical resources and infrastructures have been established, and technologies have been developed for producing several useful types of chromosomal maps. These gains are supporting the project’s transition to the large-scale sequencing phase. Some highlights and trends in the U.S. Department of Energy’s (DOE) Human Genome Program after FY 1993 are presented in this section.

Research Narratives

Separate narratives, beginning on p. 25, contain detailed descriptions of research programs and accomplishments at these major DOE genome research facilities.

• Lawrence Livermore National Laboratory

• Los Alamos National Laboratory

• Lawrence Berkeley National Laboratory

• University of Washington Genome Sequencing Laboratory

• Genome Database

• National Center for Genome Resources

Research Abstracts

Descriptions of individual research projects at other institutions are given in Part 2, 1996 Research Abstracts.

DOE Genome Research Web Site

http://www.ornl.gov/hgmis/research.html

Clone Resources for Mapping, Sequencing, and Gene Hunting

The demands of large chromosomal mapping and sequencing efforts have necessitated the development of several different types of clone collections (called libraries) carrying human DNA. Three generations of DOE-developed libraries are being distributed to research teams in the United States and abroad. In these libraries, human DNA segments of various lengths are maintained in bacterial cells.

NLGLP Libraries

The first two generations are chromosome-specific libraries carrying small inserts of human DNA (15,000 to 40,000 base pairs). As part of the National Laboratory Gene Library Project (NLGLP) begun in 1983, these libraries were prepared at Los Alamos National Laboratory (LANL) and Lawrence Livermore National Laboratory (LLNL) using DOE flow-sorting technology to separate individual chromosomes. Library availability has allowed the very difficult whole-genome tasks to be divided into 24 more manageable single-chromosome projects that could be pursued at separate research centers. Completed in 1994, NLGLP libraries have provided critical resources to genome researchers worldwide (http://www-bio.llnl.gov/genome/html/cosmid.html). Very high resolution chromosome maps based principally on NLGLP libraries were published in 1995 for chromosomes 16 and 19. These are described in detail in the Research Narratives section of this report (see LLNL, p. 27, and LANL, p. 35).

PACs and BACs

The third generation of clone resources supporting chromosome mapping is composed of P1 artificial chromosome (PAC) and bacterial artificial chromosome (BAC) libraries. A prototype PAC library was produced by the team of Leon Rosner (then at DuPont) many years ago, but more efficient production began with improvements introduced by the DOE-supported teams headed by Melvin Simon at Caltech (BACs) and Pieter de Jong at Roswell Park (PACs).

In contrast to cosmids, BACs and PACs provide a more uniform representation of the human genome, and the greater length of their inserts (90,000 to 300,000 base pairs) facilitates both mapping and sequencing. Their usefulness was illustrated dramatically in 1993 when the first breast cancer–susceptibility gene (BRCA1) was found in a BAC clone after other types of resources had failed. The next year, with major support from NIH, de Jong’s PACs contributed to the isolation of the second human breast cancer–susceptibility gene (BRCA2).

Mapping

The assembly of ordered, overlapping sets (contigs) of high-quality clones has long been considered an essential step toward human genome sequencing. Because the clones have been mapped to precise genomic locations, DNA sequences obtained from them can be located on the chromosomes with minimal uncertainty.

The large insert size of BACs and PACs allows researchers to visually map them on chromosomes by using fluorescence in situ hybridization (FISH) technology (see photomicrograph below). These mapped BACs and PACs represent very valuable resources for the cytogeneticist exploring chromosomal abnormalities. Two major medical genetics resources have been developed: (1) The Resource for Molecular Cytogenetics at the University of California, San Francisco, in collaboration with the Lawrence Berkeley National Laboratory (LBNL) team led by Joe Gray (http://rmc-www.lbl.gov) and (2) The Total Human Genome BAC-PAC Resource at Cedars-Sinai Medical Center, Los Angeles, developed by Julie Korenberg’s laboratory (see map, p. 12, and Web site, http://www.csmc.edu/genetics/korenberg/korenberg.html).

FISH Mapping on DNA Fibers. The fluorescence microscope reveals several individual cloned DNA fibers from yeast artificial chromosomes (YACs, in blue) after molecular combing to attach and stretch the DNA molecules across a glass microscope slide. Also shown are the locations of two P1 clones, labeled green and red, mapped onto the YAC fibers using FISH. Digital imaging technology can be used to assemble physical maps of chromosomes with a resolution of about 3 to 5 kilobases. [Source: Joe Gray, University of California, San Francisco]

DOE Human Genome Program Report, Highlights


Coordinated Mapping and Sequencing

A simple strategy was proposed in 1996 for choosing BACs or PACs to elongate sequenced regions most efficiently [Nature 381, 364–66 (1996)]. The first step is to develop a BAC end sequence database, with each entry having the BAC clone name and the sequences of its human insert ends. In toto, the source BACs should represent a 15- to 20-fold coverage of the human genome. Then for any BAC or chromosomal region sequenced, a comparison against the database will return a list of BACs (or PACs) that overlap it. Optimal choices for the next BACs (or PACs) to be sequenced can then be made, entailing minimal overlap (and therefore minimal redundancy of sequencing).
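The selection step described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `overlapping_bacs`, `next_bac`, `GENOME`, and `CLONES` are hypothetical, the "genome" is a 60-base string, end reads are 6 bases long, and matching is exact and on one strand only. A real end-sequence database would need alignment that tolerates sequencing errors, both strand orientations, and repeats.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy database: BAC clone name -> (left-end read, right-end read).
BacEnds = Dict[str, Tuple[str, str]]


def overlapping_bacs(region: str, db: BacEnds) -> List[Tuple[str, int]]:
    """List (clone name, overlap in bases) for clones that extend the region.

    A clone qualifies when its left-end read is found inside the sequenced
    region and its right-end read is not, so it reaches past the region's
    right edge. Overlap is counted from the left-end match to the region end.
    """
    hits = []
    for name, (left_end, right_end) in db.items():
        pos = region.find(left_end)
        if pos == -1:
            continue  # left end not found: no overlap detected
        if right_end in region:
            continue  # clone lies entirely inside the region: nothing new
        hits.append((name, len(region) - pos))
    return hits


def next_bac(region: str, db: BacEnds) -> Optional[str]:
    """Pick the extending clone with the smallest overlap, or None."""
    hits = overlapping_bacs(region, db)
    if not hits:
        return None
    return min(hits, key=lambda hit: hit[1])[0]


# Toy data (assumed): clone inserts given as (start, end) slices of GENOME.
GENOME = "ATGCGTACCTGAAGTCCATGGTTCAAGCTAGGCTTACGGATCCATTGCAGTCAGGTACCA"
END_LEN = 6
CLONES = {"bacA": (5, 35), "bacB": (20, 50), "bacC": (12, 40),
          "bacD": (2, 24), "bacE": (36, 60)}
DB = {name: (GENOME[s:s + END_LEN], GENOME[e - END_LEN:e])
      for name, (s, e) in CLONES.items()}
REGION = GENOME[:30]  # the part already sequenced

print(next_bac(REGION, DB))  # prints bacB
```

In this toy case three clones extend the region, and `bacB` is chosen because it shares only 10 bases with what is already sequenced, so the least sequencing effort is spent on redundant bases.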

Two pilot BAC-PAC end-sequencing projects were initiated in September of 1996 to explore feasibility, optimize technologies, establish quality controls, and design the necessary informatics infrastructure. Particular benefits are anticipated for small laboratories that will not have to maintain large libraries of clones and can avoid preliminary contig mapping (see abstracts of Glen Evans; Julie Korenberg; Mark Adams, Leroy Hood, and Melvin Simon; and Pieter de Jong in Part 2 of this report).

Updated information on BAC-PAC resources can be found on the Web (http://www.ornl.gov/meetings/bacpac/95bac.html). [See Appendix C: Human Subjects Guidelines, p. 77 or http://www.ornl.gov/hgmis/archive/nchgrdoe.html for DOE-NIH guidelines on using DNA from human subjects for large-scale sequencing.]

cDNA Libraries

In 1990, DOE initiated projects to enrich the developing chromosome contig maps with markers for genes. Although the protein-encoding messenger RNAs are good representatives of their source genes, they are unstable and must be converted to complementary DNAs (cDNAs) for practical applications. These conversions are tricky, and artifacts are introduced easily. The team led by Bento Soares (University of Iowa) has optimized the steps and continues to produce cDNA libraries of the highest quality. At LLNL, individual cDNA clones are put into standard arrays and then distributed worldwide for characterization by the international IMAGE (for Integrated Molecular Analysis of Gene Expression) Consortium (see box, p. 13).

Initially supported under a DOE cDNA initiative, Craig Venter’s team (now at The Institute for Genomic Research) greatly improved technologies for reading sequences from cDNA ends (expressed sequence tags, called ESTs). Together with complementary analysis software, ESTs were shown to be a valuable resource for categorizing cDNAs and providing the first clues to the functions of the genes from which they are derived. This fast EST approach has attracted millions of dollars in commercial investment. Mapping the cDNA onto a chromosome can identify the location of its corresponding gene. Many laboratories worldwide are contributing to the continuing task of mapping the estimated 70,000 to 100,000 human genes.

HAECs

All the previously described DNA clones are maintained in bacterial host cells. However, for unknown reasons, some regions of the human genome appear to be unclonable or unstable in bacteria. The team led by Jean-Michel Vos (University of North Carolina, Chapel Hill) has developed a human artificial episomal chromosome (HAEC) system based on the Epstein-Barr virus that may be useful for coverage of these especially difficult regions. In the broader biomedical community, HAECs also show promise for use in gene therapy.



BAC-PAC Map. The Total Human Genome BAC-PAC Resource represents an important tool for understanding the genes responsible for human development and disease (http://www.csmc.edu/genetics/korenberg/korenberg.html). The Resource, consisting of more than 5000 BAC and PAC clones, covers every human chromosome band and 25% of the entire human genome. Each color dot represents a single BAC or PAC clone mapped by FISH to a specific chromosome band represented in black and white. The clones, which are stable and useful for sequencing, have been integrated with the genetic and physical chromosome maps. [Source: Julie Korenberg, Cedars-Sinai Medical Center]



Resources for Gene Discovery

Hunting for disease genes is not a specific goal of the DOE Human Genome Program. However, DOE-supported libraries sent to researchers worldwide have facilitated gene hunts by many research teams. DOE libraries have played a role in the discovery of genes for cystic fibrosis, the most common lethal inherited disease in Caucasians; Huntington’s disease, a progressive lethal neurological disorder; Batten’s disease, the most prevalent neurodegenerative childhood disease; two forms of dwarfism; Fanconi anemia, a rare disease characterized by skeletal abnormalities and a predisposition to cancer; myotonic dystrophy, the most common adult form of muscular dystrophy; a rare inherited form of breast cancer; and polycystic kidney disease, which affects an estimated 500,000 people in the United States at a healthcare cost of over $1 billion per year.

The team led by Fa-Ten Kao (Eleanor Roosevelt Institute) has microdissected several chromosomes and made derivative clone libraries broadly available to disease-gene hunters. This resource played a critical role in isolating the gene responsible for some 15% of colon cancers.

Of Mice and Humans: The Value of Comparative Analyses

A remaining challenge is to recognize and discriminate all the functional constituents of a gene, particularly regulatory components not represented within cDNAs, and to predict what each gene may actually do in human biology. Comparing human and mouse sequences is an exceptionally powerful way to identify homologous genes and regulatory elements that have been substantially conserved during evolution.

Researchers led by Leroy Hood (University of Washington, Seattle) have analyzed more than 1 million bases of sequence from T-cell receptor (TCR) chromosome regions of both human and mouse genomes. Many subtle functional elements can be recognized only by comparing human and mouse sequences. TCRs play a major role in immunity and autoimmune disease, and insights into their mechanisms may one day help treat or even prevent such diseases as arthritis, diabetes, and multiple sclerosis (possibly even AIDS).

Comparative analysis is also used to model human genetic diseases. Given sequence information, researchers can produce targeted mutations in the mouse as a rapid and economical route to elucidating gene function. Such studies continue to be used effectively at Oak Ridge National Laboratory (ORNL).

To IMAGE the Human Gene Map

Since 1993, the Integrated Molecular Analysis of Gene Expression (IMAGE) Consortium has played a major role in the development of a human gene map. Founding members of the IMAGE Consortium are Bento Soares (Columbia University, now at University of Iowa), Gregory Lennon (LLNL), Mihael Polymeropoulos (National Institutes of Health’s National Institute of Mental Health), and Charles Auffray (Généthon, in France). Because cDNA molecules represent coding (expressed-gene) areas of the genome, sets of cloned cDNAs are a valuable resource to the gene-mapping community. The cDNA libraries representing different tissues have many members in common. Thus, good coordination among participating laboratories can minimize redundant work. The international IMAGE Consortium laboratories fulfill this role by developing and arraying cDNA clones for worldwide use. [http://www-bio.llnl.gov/bbrp/image/image.html]

From the IMAGE cDNA clones, researchers at the Washington University (St. Louis) Sequencing Center determine ESTs with support from Merck, Inc. The data, which are used in gene localization, are then entered into public databases. More than 10,000 chromosomal assignments have been entered into Genome Database (http://www.gdb.org). Including replica copies, over 3 million clones have been distributed, probably representing about 50,000 distinct human genes.

The IMAGE infrastructure is being used in two additional programs. At LLNL, the IMAGE laboratory arrays mouse cDNA libraries produced by Soares for the Washington University Mouse EST project (http://genome.wustl.edu/est/mouse_esthmpg.html) with sequencing sponsored by the Howard Hughes Medical Institute. Additional clone libraries are being used in a collaborative sequencing project sponsored by the NIH National Cancer Institute as part of the Cancer Genome Anatomy Project to identify and fully sequence genes implicated in major cancers (http://www.ncbi.nlm.nih.gov/ncicgap).

DNA Sequencing

From the beginning of the genome project, DOE’s DNA sequencing-technology program has supported both improvements to established methodologies and innovative higher-risk strategies. The first major sequencing project, a test bed for incremental improvements, culminated with elucidation of the highly complex TCR region (described above) by a team led by Hood.

A novel “directed” sequencing strategy initiated at LBNL in 1993 provides a potential alternative approach that can include automation as a core design feature. In this approach, every sequencing template is first mapped to its original position on a chromosome (resolution, 30 bases). The advantages of this method include a large reduction in the number of sequencing reactions needed and in the sequence-assembly steps that follow. To date, this directed strategy has achieved significant results with simpler, less repetitive nonhuman sequences, particularly in the NIH-funded Drosophila genome program. The system also is in use at the Stanford Human Genome Center and Mercator Genetics, Inc.

The preparation of DNA clones for sequencing involves several biochemical processing steps that require different solution environments. At the Whitehead Institute, Trevor Hawkins has improved systems for reversible binding of DNA molecules to magnetic beads that are compatible with complete robotic management. The second-generation Sequatron fits on a tabletop with a single robotic arm moving sample trays between servicing stations. This very compact system, supported by sophisticated software, may be ideal for laboratories with limited or costly floor space.

Fluorescent tags are critical components of conventional automated sequencing approaches. The team of Richard Mathies and Alexander Glazer (University of California, Berkeley) has made a series of improvements in fluorescence systems that have decreased DNA input needs and markedly increased the quality of raw data, thereby supporting longer useful reads of DNA sequence.

Complementary improvements in enzymology have been achieved by the team of Charles Richardson and Stanley Tabor (Harvard Medical School). Current widely used procedures for automated DNA sequencing involve cycling between high and low temperatures. The Harvard researchers used information about the three-dimensional structure of polymerases (enzymes needed for DNA replication) and how they function to engineer an improved Taq polymerase. The resulting enzyme, ThermoSequenase, which is now produced commercially as part of the ThermoSequenase kit, reduces the amount of expensive sequencing reagents required and supports popular cycle-sequencing protocols.

The application of higher electrical fields in gel electrophoresis separation of DNA fragments can increase sequencing speed and efficiency. Conventional thick gels cannot adequately dissipate the additional heat produced, however. Two promising routes to “thinness” are ultrathin slab gels and capillary systems. An ultrathin gel system was developed by Lloyd Smith (University of Wisconsin, Madison) and licensed for commercial development.

The replacement of gels by pumpable solutions of long polymers is making capillary array electrophoresis (CAE) potentially practical for DNA sequencing. The first CAE system for DNA was demonstrated by the team of Barry Karger (Northeastern University). In 1995, Karger and Norman Dovichi (University of Alberta, Canada) separately identified CAE conditions under which DNA sequencing reads could be extended usefully up to the 1000-base range. Another CAE system, developed by Edward Yeung (Iowa State University), has been licensed for commercial production (see box, p. 23). Mathies has developed a system in which a confocal microscope displays DNA bands. Application of this system to the sizing of larger DNA fragments binding multiple fluors allows single-molecule detection.

Replacing the gel-separation step with mass spectroscopy (MS) is another promising approach for rapid DNA sequencing. MS uses differences in mass-to-charge ratios to separate ionized atoms or molecules. Early efforts at MS sequencing were plagued by chemical reactivity during the “launching” phase of matrix-assisted laser desorption ionization (MALDI). MALDI badly degraded the DNA sample input. However, the degradation chemistry was elucidated in Lloyd Smith’s laboratory, leading to improvements. At ORNL, the team of Chung-Hsuan Chen has performed extensive trials of alternative matrices and has achieved significant improvements that now support sequence reads up to 100 DNA bases. The system is undergoing trials for DNA diagnostic applications.

The most revolutionary sequencing tech-nology is being pursued by the team ofRichard Keller and James Jett at LANL.Their goal is to read out sequence fromsingle DNA molecules, work that builds

on LANL’s expertise in flow cytometry.The strand to be sequenced is labeledfirst with fluors that distinguish thefour DNA subunits and is then sus-pended in a flow stream. An exonu-clease cleaves the subunits, which flowpast an interrogating laser system thatreports the subunits’ identities. All sys-tem constituents are operational butlimited by the low subunit release ratesof commercially available exonu-cleases. A current developmental focusis on identifying more active exonu-cleases.

Synthetic DNA strands in the 15- to 30-base range (oligomers) play essentialroles in DNA sequencing; in sample-preparation steps for the polymerasechain reaction, which copies DNAstrands millions of times; and in DNA-based diagnostics. The cost of customoligomer synthesis once was a limitingfactor in many research projects. Amore economical, highly parallel oligo-mer synthesis technology was devel-oped by Thomas Brennan at StanfordUniversity (see last bullet, p. 22, forfurther details).

The sequencing by hybridization (SBH) technology provides information only on short stretches of DNA in a single trial (interrogation), but thousands of low-cost interrogations can be performed in parallel. SBH is very useful for rapid classification of short DNAs such as cDNAs, very low cost DNA resequencing, and detection of DNA sequence differences (polymorphisms) over short regions. The team of Radomir Crkvenjakov and Radoje Drmanac invented one format of SBH while in Yugoslavia, made substantial improvements at Argonne National Laboratory (ANL), and later started Hyseq Inc. to commercialize these technologies. At ANL, another implementation, SBH on matrices (SHOM) of gels, holds promise for high-accuracy sequence proofreading and diverse DNA diagnostics. The ANL team, led by Andrei Mirzabekov, collaborates with the Engelhardt Institute in Moscow, where SHOM was demonstrated initially.
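The idea behind SBH, in which many short interrogations are combined into a longer read, can be illustrated with a minimal Python sketch. It is not the Hyseq or ANL implementation. It assumes an idealized, error-free experiment in which every overlapping k-base word of the target is detected, each word occurs only once, and the first word is known. The function names and the toy sequence are hypothetical.

```python
def spectrum(seq, k):
    """Return the set of all overlapping k-base words in seq (an idealized SBH readout)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(probes, start):
    """Rebuild a sequence by chaining probes that overlap by k-1 bases.

    Assumes every probe occurs once and each extension is unambiguous;
    raises ValueError when that assumption fails.
    """
    remaining = set(probes)
    remaining.discard(start)
    seq = start
    k = len(start)
    while remaining:
        suffix = seq[-(k - 1):]  # last k-1 bases of the sequence built so far
        candidates = [p for p in remaining if p.startswith(suffix)]
        if len(candidates) != 1:
            raise ValueError("ambiguous or missing extension at " + suffix)
        seq += candidates[0][-1]  # each probe contributes one new base
        remaining.remove(candidates[0])
    return seq


target = "ATGCGTTACG"  # invented example sequence
probes = spectrum(target, 4)
rebuilt = reconstruct(probes, start=target[:4])
```

Real targets contain repeated words and hybridization errors, so practical SBH needs more robust reconstruction than this greedy chaining; the sketch only shows why short overlapping probes can determine a longer sequence.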

Informatics: Data Collection and Analysis

Explosive growth of information and the challenges of acquiring, representing, and providing access to data pose continuing monumental tasks for the large public databases. Over the last 3 years, the Genome Database (GDB), the major international repository of human genome mapping data, has made extensive changes culminating in the enhanced representation of genomic maps and gene information in GDB V6.0. Major issues for the Genome Sequence DataBase (GSDB), established in 1994, are to capture and annotate the sequence data and to represent it in a form capable of supporting complex, ad hoc queries. Both GDB and GSDB have been restructured recently to handle the increasing flood of data and make it more useful for downstream biology (see Research Narratives, GDB, p. 49, and GSDB, p. 55). [http://www.gdb.org and http://www.ncgr.org/gsdb]

Victor Markowitz, formerly of LBNL, has developed a suite of database tools allowing substantial modifications of underlying data structures while the biologists’ query tools remain stable. [http://gizmo.lbl.gov/DM_TOOLS/DMTools.html]

The Genome Annotation Consortium (based at ORNL) was initiated in 1997 to be a modular, distributed informatics facility for analyzing and processing (e.g., annotating) genome-scale sequence data.

The many improvements in World Wide Web software now enable maps to be downloaded simply by using a browser with accessory software provided by GDB. Computers sift stretches of DNA sequence for patterns that identify such biologically important features as protein-coding regions (exons), regulatory areas, and RNA splice sites. Other computer tools are used to compare a new sequence (i.e., a putative gene) against all other database entries, retrieve any homologous sequences that already have been entered, and indicate the degree of similarity.
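The database comparison described above can be sketched in a few lines of Python. This is a deliberately simplified illustration that ranks entries by the fraction of short words they share with the query (Jaccard similarity). It is not the algorithm used by BLAST, FASTA, or any other tool named in this report, and the entry names and sequences are invented.

```python
def kmers(seq, k=3):
    """Return the set of overlapping k-base words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Fraction of k-base words shared by two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query, database, k=3):
    """Score every database entry against the query, best match first."""
    hits = [(name, similarity(query, seq, k)) for name, seq in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy database; real repositories hold far longer entries.
database = {
    "entry_A": "ATGGCGTACGTTAGC",
    "entry_B": "TTTTTTTTTTTTTTT",
    "entry_C": "ATGGCGTACGTAAGC",
}
results = search("ATGGCGTACGTTAGC", database)
```

Here `results` lists the identical entry first and the unrelated one last. Production tools use alignment-based scores with statistical significance estimates rather than a raw word-overlap fraction.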

The Gene Recognition and Analysis Internet Link (GRAIL) at ORNL localizes genes and other biologically important sequence features (see box, p. 17).

Another analytical service that returns informative, annotated data is MAGPIE, provided through ANL by Terry Gaasterland. MAGPIE is designed to reside locally at the site of a genome project and actively carry out analysis of genome sequence data as it is generated, with automated continued reevaluation as search databases grow (http://www.mcs.anl.gov/home/gaasterl/magpie.html). Once an automated functional overview has been established, it remains to pinpoint the organisms’ exact metabolic pathways and establish how they interact. To this end, the WIT (What is There) system, which succeeds PUMA, supports the construction of metabolic pathways. Such constructions or models are based on sequence data, the clearly established biochemistry of specific organisms, and an understanding of the interdependencies of biochemical mechanisms. WIT, which was developed by Evgenij Selkov and Ross Overbeek at ANL, offers a particularly valuable tool for testing current hypotheses about microbial biology. [http://www.cme.msu.edu/WIT]

Researchers at the University of Colorado have developed another approach for predicting coding regions in genomic DNA, combining multiple types of evidence into a single scoring function, and returning both optimal and ranked suboptimal solutions. The approach is robust to substitution errors but sensitive to frameshift errors. The group is now exploring methods for predicting other classes of sequence regions, especially promoters. [software and information: http://beagle.colorado.edu/~eesnyder/GeneParser.html]

The Baylor College of Medicine (BCM) Search Launcher improves user access to the wide variety of database-search tools available on the Web. Search Launcher features a single point of entry for related searches, the addition of hypertext links to results returned by remote servers, and a batch client. [http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html]

FASTA-SWAP, also from the BCM group, is a new pattern-search tool for databases that improves sensitivity and specificity to help detect related sequences. BEAUTY, an enhanced version of the BLAST database-search program, improves access to information about the functions of matched sequences and incorporates additional hypertext links. Graphical displays allow correlation of hit positions with annotated domain positions. Future plans include providing access to information from and direct links to other databases, including organism-specific databases.

PROCRUSTES uses comparisons of the same gene of different species to delimit gene structure much more accurately. The product of a collaboration between Pavel Pevzner (University of Southern California) and two Russian researchers, PROCRUSTES is based on the spliced-alignment algorithm, which explores all possible exon assemblies and finds the multiexon structure that best fits a related protein. [http://www-hto.usc.edu/software/procrustes]

GRAIL and GenQuest

In 1996 the Gene Recognition and Analysis Internet Link (GRAIL) processed nearly 40 million bases of sequence per month, making it the most widely used “gene-finding” system available. Developed at Oak Ridge National Laboratory (ORNL) by a team led by Ed Uberbacher, GRAIL uses artificial intelligence and machine learning to discover complex relationships in sequence data. The genQuest server, also at ORNL, compares information generated by GRAIL with data in protein, DNA, and motif databases to add further value to annotation of DNA sequences.

GRAIL’s latest version (1.3) combines a Motif Graphical Client with improved sensitivity and splice-site recognition, better performance in AT-rich regions, new analysis systems for model organisms, and frameshift detection. This system can be used on a wide variety of UNIX platforms, including Sun, DEC, and SGI. The many ways to access GRAIL include a command line sockets client that permits remote program calls to all basic GRAIL-genQuest analysis services, thus allowing convenient integration of GRAIL results into automated analysis pipelines.

Contact GRAIL staff through the Web site at http://compbio.ornl.gov or at [email protected] e-mail and ftp access.

The figure above shows the GRAIL analysis of part of the human major histocompatibility locus, which carries genes responsible for cellular immunity. Included in this analysis are potential exons (gene-coding regions), gene models, CpG islands (areas rich in bases C and G found in most mammalian genes), and repetitive DNA elements. [Source: Richard Mural, ORNL]
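The GRAIL figure caption mentions CpG islands, regions rich in the bases C and G. One common way to flag candidate regions is to compute, for a window of sequence, the G+C fraction and the ratio of observed to expected CG dinucleotides. The Python sketch below illustrates that calculation only. It is not GRAIL's method, the function names are hypothetical, and no particular threshold is implied.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Ratio of observed CG dinucleotides to the number expected by chance.

    Expected count is (number of C) * (number of G) / length, which is the
    usual approximation when bases are assumed to occur independently.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")
    expected = c * g / len(seq)
    return observed / expected


window = "CGCGCGCG"  # invented, extremely CpG-rich window
rich = (gc_fraction(window), cpg_obs_exp(window))
```

A scan over a long sequence would apply both functions to sliding windows and report runs of windows in which both values are high.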



Ethical, Legal, and Social Issues (ELSI)

From the outset of the Human Genome Project, researchers recognized that the resulting increase in knowledge about human biology and personal genetic information would raise complex ethical and policy issues for individuals and society. Rapid worldwide progress in the project has heightened the urgency of this challenge.

Most observers agree that personal knowledge of genetic susceptibility can be expected to serve humankind well, opening the door to more accurate diagnoses, preventive intervention, intensified screening, lifestyle changes, and early and effective treatment. But such knowledge has another side, too: risk of anxiety, unwelcome changes in personal relationships, and the danger of stigmatization. Often, genetic tests can indicate possible future medical conditions far in advance of any symptoms or available therapies or treatments. If handled carelessly, genetic information could threaten an individual with discrimination by potential employers and insurers.

Other issues are perhaps less immediate than these personal concerns but no less challenging. How, for example, are products of the Human Genome Project to be patented and commercialized? How are the judicial, medical, and educational communities, not to mention the public at large, to be educated effectively about genetic research and its implications?

To confront these issues, the DOE and NIH ELSI programs jointly established an ELSI working group to coordinate policy and research between the two agencies. [An FY 1997 report evaluating the joint ELSI group is available on the Web (http://www.ornl.gov/hgmis/archive/elsirept.html).]

The DOE Human Genome Program has focused its ELSI efforts on education, privacy, and the fair use of genetic information (including ownership and commercialization); workplace issues, especially screening for susceptibilities to environmental agents; and implications of research findings regarding interactions among multiple genes and environmental influences.

A few highlights from the DOE ELSI portfolio for FY 1994 through FY 1997 are outlined below.

• Three high school curriculum modules developed by the Biological Sciences Curriculum Study (BSCS). [http://www.bscs.org]

• An educational program in Los Angeles to develop a culturally and linguistically appropriate genetics curriculum based on a BSCS module (see above) for Hispanic students and their families. [http://vflylab.calstatela.edu/hgp]

• A series of workshops to educate a core group of 1000 judges around the nation and a handbook with companion videotape to assist federal and state judges in understanding and assessing genetic evidence in an increasing number of civil and criminal cases (see photo above).

The Ethical, Legal, and Social Issues component of the DOE Human Genome Program supports projects to help judges understand the scientific validity of the genetics-based claims that are poised to flood the nation’s courtrooms. Robert F. Orr (left) of the North Carolina Supreme Court and Francis X. Spina of the Massachusetts Appeals Court at the New England Regional Conference on the Courts and Genetics (July 1997) participate in a hands-on laboratory session. As a prelude to learning the fundamentals of DNA science and genetic testing, the judges are precipitating DNA (seen as streaks on the glass rod in the tube) from a solution containing the bacterium Escherichia coli. [Courts and Science On-Line Magazine: http://www.ornl.gov/courts]



• Educational materials developed by the Science+Literacy for Health Project of the American Association for the Advancement of Science (AAAS) and targeted at or above the 6th- to 8th-grade reading levels. [AAAS: 202/326-6453; Your Genes, Your Choices booklet: http://www.nextwave.org/ehr/books/index.html]

• A program at the University of Chicago aimed at developing a knowledge base for physicians and nurses who will train other practitioners to introduce new genetic services.

• A series of radio programs (see photo at right) on the science and ethical issues of the genome project and a TV documentary program on ELSI issues. [http://www.pbs.org]

• The Gene Letter, a monthly online newsletter on ELSI issues for healthcare professionals and consumers. [http://www.geneletter.org]

• A congressional fellowship program in human genetics, administered through AAAS, offering one annual fellowship for a mid-career geneticist. [[email protected]]

• The draft Genetic Privacy Act, prepared as a model for privacy legislation and covering the collection, analysis, storage, and use of DNA samples and the genetic information derived from them. [http://www.ornl.gov/hgmis/resource/privacy/privacy1.html]

• Privacy studies at the Center for Social and Legal Research, including an analysis of the effects of new genetic technologies on individuals and institutions.

For details on these and other projects, see ELSI Abstracts, p. 45, in Part 2 of this report. In addition to the specific projects listed in Part 2, the DOE program sponsors a number of conferences and workshops on ELSI topics.

Leroy Hood (left) of the University of Washington, Seattle, talks with Bari Scott at the 1996 DOE Human Genome Program Contractor-Grantee Workshop. Scott represented the Genome Radio Project (see text at left), which is supported by the Ethical, Legal, and Social Issues Program of the DOE Human Genome Program. (See the project’s abstract in Part 2 of this report for more information.)

Protection of Human Research Subjects

In 1996, President Clinton appointed the National Bioethics Advisory Commission to provide guidance on the ethical conduct of current and future biological and behavioral research, especially that related to genetics and the rights and welfare of human research subjects (http://www.nih.gov/nbac/nbac.htm).

Also in 1996, DOE and NIH issued a document providing investigators with guidance in the use of DNA from human subjects for large-scale sequencing projects (see Appendix C: Human Subjects Guidelines, p. 77). [http://www.ornl.gov/hgmis/archive/nchgrdoe.html]

DOE ELSI Web Site
http://www.ornl.gov/hgmis/resource/elsi.html



Lawrence Livermore National Laboratory researcher Maria de Jesus, who designed software to automate DNA isolation. [Source: Linda Ashworth, LLNL]


Technology Transfer

Converting scientific knowledge into commercially useful products

Transferring technology to the private sector, a primary mission of DOE, is strongly encouraged in the Human Genome Program to enhance the nation’s investment in research and technological competitiveness. Human genome centers at Lawrence Berkeley National Laboratory (LBNL), Lawrence Livermore National Laboratory (LLNL), and Los Alamos National Laboratory (LANL) provide opportunities for private companies to collaborate on joint projects or use laboratory resources. These opportunities include access to information (including databases), personnel, and special facilities; informal research collaborations; Cooperative Research and Development Agreements (CRADAs); and patent and software licensing. For information on recently developed resources, contact individual genome research centers or see Research Highlights, beginning on p. 9. Many universities have their own licensing and technology transfer offices.

Some collaborations and technology-transfer highlights from FY 1994 through FY 1996 are described below.

Collaborations

Involvement of the private sector in research and development can facilitate successful transfer of technology to the marketplace, and collaborations can speed production of essential tools for genome research. A number of interactive projects are now under way, and others are in preliminary stages.

CRADAs

One technology-transfer mechanism used by DOE national laboratories is the CRADA, a legal agreement with a nongovernmental organization to collaborate on a defined research project. Under a CRADA, the two entities share scientific and technological expertise, with the governmental organization providing personnel, services, facilities, equipment, or other resources. Funds must come from the nongovernmental partner. A benefit to participating companies is the opportunity to negotiate exclusive licenses for inventions arising from these collaborations. For periods through 1996, the CRADAs in place in the DOE Human Genome Program included the following:

• LLNL with Applied Biosystems Division of Perkin-Elmer Corporation to develop analytical instrumentation for faster DNA sequencing;

• LANL with Amgen, Inc., to develop bioassays for cell growth factors;

• Oak Ridge National Laboratory (ORNL) with Darwin Molecular, Inc., for mouse models of human immunologic disease;

• ORNL with Procter & Gamble, Inc., for analyses of liver regeneration in a mouse model; and

• Brookhaven National Laboratory with U.S. Biochemical Corporation to identify proteins useful for primer-walking methods and large-scale sequencing.

Work for Others

In other collaborations, the LBNL genome center is participating in a Work for Others agreement with Amgen to automate the isolation and characterization of large numbers of mouse cDNAs. The center group is focusing on adapting LBNL’s automated colony-picking system to cDNA protocols and applying methods to generate large numbers of filter replicas for colony filter hybridization and subsequent analysis. [“Work for Others” projects supported by an agency or organization other than DOE (e.g., NIH, National Cancer Institute, or a private company) can be conducted at a DOE installation because this work is complementary to DOE research missions and usually requires multidisciplinary DOE facilities and technologies.]

Technology Transfer Legislation

Technology transfer involves converting scientific knowledge into commercially useful products. Through the 1980s, a series of laws was enacted to encourage the development of commercial applications of federally funded research at universities and federal laboratories. Such laws [chiefly the Bayh-Dole Act of 1980, Stevenson-Wydler Act of 1980, and Federal Technology Transfer Act of 1986 (Public Laws 96-517, 96-480, and 99-502, respectively)] were not aimed specifically at genome or even biomedical research. However, such research and the surrounding commercial biotechnology enterprises clearly have benefited from them. The biotechnology sector’s success owes much to federal policies on technology transfer and intellectual property. [Source: U.S. Congress, Office of Technology Assessment, Federal Technology Transfer and the Human Genome Project, OTA-BP-EHR-162 (Washington, D.C.: U.S. Government Printing Office, September 1995)]

The Resource for Molecular Cytogenetics was established at LBNL and the University of California (UC), San Francisco, with the support of the Office of Biological and Environmental Research and Vysis, Inc. (formerly Imagenetics). The Resource aims to apply fluorescent in situ hybridization (FISH) techniques to genetic analysis of human tissue samples; produce probe reagents; design and develop digital-imaging microscopy; distribute probes, analysis technology, and educational materials in the molecular cytogenetic community; and transfer useful reagents, processes, and instruments to the private sector for commercialization.

Patenting and Licensing Highlights, FY 1994–96

• A development license for single-molecule DNA sequencing replaced the 1991–94 CRADA (the first CRADA to be established in the U.S. Human Genome Project) between LANL and Life Technologies, Inc. (LTI).

• In 1995, a broad patent was awarded to UC for chromosome painting. This technology uses FISH to stain specific locations in cells and chromosomes for diagnosing, imaging, and studying chromosomal abnormalities and cancer. Resulting from a 1989 CRADA between LLNL and UC, FISH was licensed exclusively to Vysis.

• Hyseq, Inc., was founded in 1993 by former Argonne National Laboratory researchers Radoje Drmanac and Radomir Crkvenjakov to commercialize the sequencing by hybridization (SBH) technology. Hyseq has exclusive patent rights to a variation known as format 3 of SBH or the “super chip.” Hyseq later won an Advanced Technology Program award from the U.S. National Institute of Standards and Technology to develop the technology further.

• Oligomers (short, single-stranded DNAs) are crucial reagents for genome research and biomedical diagnostics. ProtoGene Laboratories, Inc., was founded to commercialize new DNA synthesis technology (developed initially at LBNL with completed prototypes at Stanford University) and to offer the first lower-cost custom oligomer synthesis. The Parallel Array Synthesis system, which independently synthesizes 96 oligomers per run in a standard 96-well microtiter plate format, shows great promise for significant cost reductions. ProtoGene first licensed sales and distribution to LTI and, later, production rights as well. LTI operates production centers in the United States, Europe, and Japan.

• The GRAIL-genQuest sequence-analysis software developed at ORNL was licensed by Martin Marietta Energy Systems (now Lockheed Martin Energy Research) to ApoCom, Inc., for pharmaceutical and biotechnology company researchers who cannot use the Internet because of data-security concerns. The public GRAIL-genQuest service remains freely available on the Internet (see box, p. 17).

• In 1995, an exclusive license was granted to U.S. Biochemical Corporation for a genetically engineered, heat-stable, DNA-replicating enzyme with much-improved sequencing properties. The enzyme was developed by Stanley Tabor at Harvard University Medical School.

• In 1995, an advanced capillary array electrophoresis system for sequencing DNA was patented by Iowa State University. The system was licensed to Premier American Technologies Corporation for commercialization (see graphic at right and R&D 100 Awards, next page).

• In 1996, a patent was granted to LANL researchers for DNA fragment sizing and sorting by laser-induced fluorescence. An exclusive license was awarded to Molecular Technology, Inc., for commercialization of the single-molecule detection capability related to DNA sizing (see R&D 100 Awards, next page).

NIST Advanced Technology Program

Several commercial applications of research sponsored by the U.S. Human Genome Project have been furthered by the Advanced Technology Program (ATP) of the U.S. National Institute of Standards and Technology. ATP’s mission is to stimulate economic growth and industrial competitiveness by encouraging high-risk but powerful new technologies. Its Tools for DNA Diagnostics program uses collaborations among researchers and industry to develop (1) cost-effective methods for determining, analyzing, and storing DNA sequences for a wide variety of diagnostic applications ranging from healthcare to agriculture to the environment and (2) a new and potentially very large market for DNA diagnostic systems.

Awardees have included companies developing DNA diagnostic chips, more powerful cytogenetic diagnostic techniques based on comparative genomic hybridization, DNA sequencing instrumentation, and DNA analysis technology. Eventually, commercialization of these underlying technologies is expected to generate hundreds of thousands of jobs. [800/287-3863, Fax: 301/926-9524, [email protected], http://www.atp.nist.gov]

SBIR and STTR

Small Business Innovation Research (SBIR) Program awards are designed to stimulate commercialization of new technology for the benefit of both the private and public sectors. The highly competitive program emphasizes cutting-edge, high-risk research with potential for high payoff in different areas, including human genome research. Small business firms with fewer than 500 employees are invited to submit applications. SBIR human genome topics concentrate on innovative and experimental approaches for carrying out the goals of the Human Genome Project (see SBIR, p. 63, in Part 2 of this report). The Small Business Technology Transfer (STTR) Program fosters transfers between research institutions and small businesses. [DOE SBIR and STTR contact: Kay Etzler (301/903-5867, Fax: -5488, [email protected]), http://sbir.er.doe.gov/sbir, http://sttr.er.doe.gov/sttr]

Capillary Array Electrophoresis (CAE). CAE systems promise dramatically faster and higher-resolution fragment separation for DNA sequencing. A multiplexed CAE system designed by Edward Yeung (Iowa State University) has been developed for commercial production by Premier American Technologies Corporation (PATCO). In the PATCO ESY9600 model, DNA samples are introduced into the 96-capillary array; as the separated fragments pass through the capillaries, they are irradiated all at once with laser light. Fluorescence is measured by a charge-coupled device that acts as a simultaneous multichannel detector. (Inset circle at upper left: Closeup view of individual capillary lanes with separated samples.) Because every fragment length exists in the sample, bases are identified in order according to the time required for them to reach the laser-detector region. [Source: Thomas Kane, PATCO] [Figure labels: CCD camera, laser, – and +.]
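The CAE figure caption notes that bases are identified in order according to the time each fragment takes to reach the laser-detector region. The Python sketch below illustrates that ordering step for one capillary. It assumes a four-dye scheme in which each fragment's fluorescence color indicates its terminal base; the dye-to-base mapping, the peak times, and the function name are hypothetical and are not taken from the PATCO instrument.

```python
# Hypothetical mapping from detected dye color to the terminal base of a fragment.
DYE_TO_BASE = {"blue": "C", "green": "A", "yellow": "G", "red": "T"}


def call_bases(peaks):
    """Turn detector peaks from one capillary into a base sequence.

    peaks: iterable of (arrival_time, dye) pairs, in any order.
    Shorter fragments migrate faster, so sorting by arrival time
    orders the fragments by length and therefore by base position.
    """
    ordered = sorted(peaks, key=lambda peak: peak[0])
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


# Invented peak list (times in arbitrary units), deliberately out of order.
peaks = [(12.4, "green"), (10.1, "red"), (11.2, "blue"), (13.9, "yellow")]
read = call_bases(peaks)
```

In a 96-capillary array the same step would run once per capillary. Real base callers must also resolve overlapping peaks and uneven spacing, which this sketch ignores.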

Technology Transfer Award

A Federal Laboratory Consortium Award for Excellence in Technology Transfer was presented to Edward Yeung and a research team at Iowa State University’s Ames Laboratory in 1993. Their laser-based method for indirect fluorescence of biological samples may have applications for routine high-speed DNA sequencing (see graphic, p. 23). Yeung also won the 1994 American Chemical Society Award for Analytical Chemistry.

1997 R&D 100 Awards

DOE researchers in 12 facilities across the country won 36 of the R&D 100 Awards given by Research and Development Magazine for 1996 work. DOE award-winning research ranged from advances in supercomputing to the biological recycling of tires. Announced in July 1997, these awards bring DOE’s R&D 100 total to 453, the most of any single organization and twice as many as all other government agencies combined.

Two DOE genome-related research projects received 1997 R&D 100 Awards. One was to Yeung (see text at left and graphic, p. 23) for “ESY9600 Multiplexed Capillary Electrophoresis DNA Sequencer.”

The other award was to Richard Keller and James Jett (LANL) with Amy Gardner (Molecular Technologies, Inc.) for “Rapid-Size Analysis of Individual DNA Fragments.” This technology speeds determination of DNA fragment sizes, making DNA fingerprinting applications in biotechnology and other fields more reliable and practical.

R&D Magazine began making annual awards in 1963 to recognize the 100 most significant new technologies, products, processes, and materials developed throughout the world during the previous year (http://www.rdmag.com/rd100/100award.htm). Winners are chosen by the magazine’s editors and a panel of 75 respected scientific experts in a variety of disciplines. Previous winners of R&D 100 Awards include such well-known products as the flashcube (1965), antilock brakes (1969), automated teller machine (1973), fax machine (1975), digital compact cassette (1993), and Taxol anticancer drug (1993).



Research Narratives

Readout from an automated DNA sequencing machine depicts the order of the four DNA bases (A, T, C, and G) in a DNA fragment of more than 500 bases. [Source: Linda Ashworth, LLNL]

Joint Genome Institute, p. 26

Lawrence Livermore National Laboratory, p. 27

Los Alamos National Laboratory, p. 35

Lawrence Berkeley National Laboratory, p. 41

University of Washington Genome Center, p. 47

Genome Database, p. 49

National Center for Genome Resources, p. 55

DOE Merges Sequencing Efforts of Genome Centers

In a major restructuring of its Human Genome Program, on October 23, 1996, the DOE Office of Biological and Environmental Research established the Joint Genome Institute (JGI) to integrate work based at its three major human genome centers.

The JGI merger represents a shift toward large-scale sequencing via intensified collaborations for more effective use of the unique expertise and resources at Lawrence Berkeley National Laboratory (LBNL), Lawrence Livermore National Laboratory (LLNL), and Los Alamos National Laboratory (see Research Narratives, beginning on p. 27 in this report). Elbert Branscomb (LLNL) serves as JGI’s Scientific Director. Capital equipment has been ordered, and operational support of about $30 million is projected for the 1998 fiscal year.

With easy access to both LBNL and LLNL, a building in Walnut Creek, California, is being modified. Here, starting in late FY 1998, production DNA sequencing will be carried out for JGI. Until that time, large-scale sequencing will continue at LANL, LBNL, and LLNL. Expectations are that within 3 to 4 years the Production Sequencing Facility will house some 200 researchers and technicians working on high-throughput DNA sequencing using state-of-the-art robotics.

Initial plans are to target gene-rich regions of around 1 to 10 megabases for sequencing. Considerations include gene density, gene families (especially clustered families), correlations to model organism results, technical capabilities, and relevance to the DOE mission (e.g., DNA repair, cancer susceptibility, and impact of genotoxins). The JGI program is subject to regular peer review.

Sequence data will be posted daily on the Web; as the information progresses to finished quality, it will be submitted to public databases.

As JGI and other investigators involved in the Human Genome Project are beginning to reveal the DNA sequence of the 3 billion base pairs in a reference human genome, the data already are becoming valuable reagents for explorations of DNA sequence function in the body, sometimes called “functional genomics.” Although large-scale sequencing is JGI’s major focus, another important goal will be to enrich the sequence data with information about its biological function. One measure of JGI’s progress will be its success at working with other DOE laboratories, genome centers, and non-DOE academic and industrial collaborators. In this way, JGI’s evolving capabilities can both serve and benefit from the widest array of partners.

Production DNA Sequencing Begun Worldwide

Elbert BranscombJGI Scientific Dir ectorLawr ence Livermore National Laboratory7000 East Avenue, L-452Livermor e, CA 94551510/[email protected] [email protected]

http://www.jgi.doe.gov

I. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

The year 1996 marked a transition to the final and most challenging phase of the U.S. Human Genome Project, as pilot programs aimed at refining large-scale sequencing strategies and resources were funded by DOE and NIH (see Research Highlights, DNA Sequencing, p. 14). Internationally, large-scale human genome sequencing was kicked off in late 1995 when The Wellcome Trust announced a 7-year, $75-million grant to the private Sanger Centre to scale up its sequencing capabilities. French investigators also have announced intentions to begin production sequencing.

Funding agencies worldwide agree that rapid and free release of data is critical. Other issues include sequence accuracy, types of annotation that will be most useful to biologists, and how to sustain the reference sequence.

The international Human Genome Organisation maintains a Web page to provide information on current and future sequencing projects and links to sites of participating groups (http://hugo.gdb.org). The site also links to reports and resources developed at the February 1996 and 1997 Bermuda meetings on large-scale human genome sequencing, which were sponsored by The Wellcome Trust.



Research Narratives


Lawrence Livermore National Laboratory Human Genome Center

The Human Genome Center at Lawrence Livermore National Laboratory (LLNL) was established by DOE in 1991. The center operates as a multidisciplinary team whose broad goal is understanding human genetic material. It brings together chemists, biologists, molecular biologists, physicists, mathematicians, computer scientists, and engineers in an interactive research environment focused on mapping, DNA sequencing, and characterizing the human genome.

Goals and Priorities

In the past 2 years, the center’s goals have undergone an exciting evolution. This change is the result of several factors, both intrinsic and extrinsic to the Human Genome Project. They include: (1) successful completion of the center’s first-phase goal, namely a high-resolution, sequence-ready map of human chromosome 19; (2) advances in DNA sequencing that allow accelerated scaleup of this operation; and (3) development of a strategic plan for LLNL’s Biology and Biotechnology Research Program that will integrate the center’s resources and strengths in genomics with programs in structural biology, individual susceptibility, medical biotechnology, and microbial biotechnology.

The primary goal of LLNL’s Human Genome Center is to characterize the mammalian genome at optimal resolution and to provide information and material resources to other in-house or collaborative projects that allow exploitation of genomic biology in a synergistic manner. DNA sequence information provides the biological driver for the center’s priorities:

• Generation of highly accurate sequence for chromosome 19.

• Generation of highly accurate sequence for genomic regions of high biological interest to the mission of the DOE Office of Biological and Environmental Research (e.g., genes involved in DNA repair, replication, recombination, xenobiotic metabolism, and cell-cycle control).

• Isolation and sequence of the full insert of cDNA clones associated with genomic regions being sequenced.

• Sequence of selected corresponding regions of the mouse genome in parallel with the human.

• Annotation and position of the sequenced clones with physical landmarks such as linkage markers and sequence tagged sites (STSs).

• Generation of mapped chromosome 19 and other genomic clones [cosmids, bacterial artificial chromosomes (BACs), and P1 artificial chromosomes (PACs)] for collaborating groups.

• Sharing of technology with other groups to minimize duplication of effort.

• Support of downstream biology projects, for example, structural biology, functional studies, human variation, transgenics, medical biotechnology, and microbial biotechnology with know-how, technology, and material resources.

Center Organization and Activities

Completion and publication of the metric physical map of human chromosome 19 (see p. 28) in 1995 has led to consolidation of many functions associated with physical mapping, with increased emphasis on DNA sequencing. The center is organized into five broad areas of research and support: sequencing, resources, functional genomics, informatics and analytical genomics, and instrumentation. Each area consists of multiple projects, and extensive interaction occurs both within and among projects.


Update

In 1997 Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory began collaborating in a Joint Genome Institute to implement high-throughput sequencing [see p. 26 and Human Genome News 8(2), 1–2].

In lieu of individual abstracts, research projects and investigators at the LLNL Human Genome Center are represented in this narrative. More information can be found on the center’s Web site (see the URL listed with the center’s contact information).

Human Genome Center
Lawrence Livermore National Laboratory
Biology and Biotechnology Research Program
7000 East Avenue, L-452
Livermore, CA 94551

Anthony V. Carrano, Director
510/422-5698, Fax: /[email protected]

Linda Ashworth, Assistant to Center Director
510/422-5665, Fax: [email protected]

http://www-bio.llnl.gov/bbrp/genome/genome.html



Sequencing

The sequencing group is divided into several subprojects. The core team is responsible for the construction of sequence libraries, sequencing reactions, and data collection for all templates in the random phase of sequencing. The finishing team works with data produced by the core team to produce highly redundant, highly accurate “finish” sequence on targets of interest. Finally, a team of researchers focuses specifically on development, testing, and implementation of new protocols for the entire group, with an emphasis on improving the efficiency and cost basis of the sequencing operation.

Resources

The resources group provides mapped clonal resources to the sequencing teams. This group performs physical mapping as needed for the DNA sequencing group by using fingerprinting, restriction mapping, fluorescence in situ hybridization, and other techniques. A small mapping effort is under way to identify, isolate, and characterize BAC clones (from anywhere in the human genome) that relate to susceptibility genes, for example, DNA repair. These clones will be characterized and provided for sequencing and at the same time contribute to understanding the biology of the chromosome, the genome, and susceptibility factors. The mapping team also collaborates with others using the chromosome 19 map as a resource for gene hunting.

Chromosome 19 Map. In the current map of the first 2 million bases at the p-telomere end of chromosome 19, the EcoR I restriction-mapped contigs (represented by red lines) provide the starting material for genomic sequencing across a region.

Construction of the human chromosome 19 physical map was based on a similar strategy for mapping the roundworm Caenorhabditis elegans. View the complete map on the World Wide Web (http://www-bio.llnl.gov/genome/html/chrom_map.html). [Source: Adapted from figure provided by Linda Ashworth, LLNL]

[Figure: map of the first 2.0 Mb from the p telomere (P-TEL, band p13.3) of chromosome 19, drawn against a metric scale from 0.0 to 2.0 Mb, with columns for cosmid clones, genes, and markers. Labeled genes include MAdCAM-1, CDC34, BSG, AZU1, PRTN3, ELA2, POLR2E, GPX4, RPS15, PCSK4, OLFR, and TCF3; labeled markers include D19S313, D19S814, D19S466, D19S373, and D19S347; the labeled disease is acute lymphoblastic leukemia.]

Legend. In the column labeled cosmid clones, black indicates a FISH-ordered clone where distance between clones has been measured. Other cosmids are shown in red. Genes are in red to the left of the metric scale. Other markers are labeled in black. A disease associated with a specific gene is shown in blue to the right of the metric scale. Symbols in the figure distinguish the following: restriction-mapped contig; BAC, PAC, or P1 clone; YAC with known and concordant size; YAC with unknown or discordant size; sequence tagged site (STS); STS and/or hybridization results; and polymorphic marker (§).

DOE Human Genome Program Report, LLNL

Functional Genomics

The functional genomics team is responsible for assembling and characterizing clones for the Integrated Molecular Analysis of Gene Expression (called IMAGE) Consortium and cDNA sequencing, as well as for work on gene expression and comparative mouse genomics. The effort emphasizes genes involved in DNA repair and links strongly to LLNL’s gene-expression and structural biology efforts. In addition, this team is working closely with Oak Ridge National Laboratory (ORNL) to develop a comparative map and the sequence data for mouse regions syntenic to human chromosome 19 (see p. 32).

Informatics and Analytical Genomics

The informatics and analytical genomics group provides computer science support to biologists. The sequencing informatics team works directly with the DNA sequencing group to facilitate and automate sample handling, data acquisition and storage, and DNA sequence analysis and annotation. The analytical genomics team provides statistical and advanced algorithmic expertise. Tasks include development of model-based methods for data capture, signal processing, and feature extraction for DNA sequence and fingerprinting data and analysis of the effectiveness of newly proposed methods for sequencing and mapping.

Putative-Gene Classification. The figure depicts the functional classification of putative genes identified in a 1.02-Mb region on the long arm of human chromosome 19. Analysis of the completed sequence between markers D19S208 and COX7A1 revealed 43 open reading frames (ORFs) or putative genes. (An ORF is a DNA region containing specific sequences that signal the beginning and ending of a gene.)

Thirty of these putative genes were found to have sequence similarities to a wide variety of known genes or proteins, including some involved in transcription, cell adhesion and signaling, and metabolism. Many appear to be related functionally to such known proteins as the GTP-ase activating proteins or the ETS family of transcription factors. Others seem to be new members of existing gene families, for example, the mRNA splicing factor, or of such pseudogenes as the elongation factor Tu.

In addition to those that could be classified, 13 novel genes were identified, including one with high similarity to a predicted ORF of unknown function in the roundworm Caenorhabditis elegans. [Source: Adapted from graph provided by Linda Ashworth, LLNL]

[Figure: bar chart of number of genes (axis 0 to 14) by functional classification: cell surface; energy metabolism; transcription or translational machinery; transcription factors; receptors; signal transduction; developmental control; structural or cytoskeletal; transporters; proteases; pseudogenes; novel.]
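The ORF definition given in the Putative-Gene Classification caption can be illustrated with a minimal sketch. It is not the center's gene-finding software; it assumes the simplest possible reading of the definition (an ATG start codon followed by the first in-frame stop codon, forward strand only), and the function name and example sequence are hypothetical. Human genes are interrupted by introns, so a scan of this kind only suggests candidate regions.

```python
# Minimal forward-strand ORF scan (illustrative assumption: an ORF runs from
# an ATG start codon to the first in-frame stop codon; introns are ignored).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return sorted (start, end) index pairs, end exclusive, for each ORF."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons including the stop codon itself.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


example = "CCATGAAATTTTAGGGATGCCCTGA"  # hypothetical toy sequence
for start, end in find_orfs(example):
    print(start, end, example[start:end])
```

On the toy sequence the scan reports two candidates, one in each of two reading frames. A production pipeline would also scan the reverse strand and compare candidates against known genes and proteins, as the caption describes.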

Instrumentation

The instrumentation group also has multiple components. Group members provide expertise in instrumentation and automation in high-throughput electrophoresis, preparation of high-density replicate DNA and colony filters, fluorescence labeling technologies, and automated sample handling for DNA sequencing. To facilitate seamless integration of new technologies into production use, this group is coupled tightly to the biologist user groups and the informatics group.

Collaborations

The center interacts extensively with other efforts within the LLNL Biology and Biotechnology Research Program and with other programs at LLNL, the academic community, other research institutes, and industry. More than 250 collaborations range from simple probe and clone sharing to detailed gene family studies. The following list reflects some major collaborations.

• Integration of the genetic map of human chromosome 19 with corresponding mouse chromosomes (ORNL).

• Miniaturized polymerase chain reaction instrumentation (LLNL).

• Sequencing of IMAGE Consortium cDNA clones (Washington University, St. Louis).

• Mapping and sequencing of a gene associated with Finnish congenital nephrotic syndrome (University of Oulu, Finland).

Accomplishments

The LLNL Human Genome Center has excelled in several areas, including comparative genomic sequencing of DNA repair genes in human and rodent species, construction of a metric physical map of human chromosome 19, and development and application of new biochemical and mathematical approaches for constructing ordered clone maps. These and other major accomplishments are highlighted below.

• Completion of highly accurate sequencing totaling 1.6 million bases of DNA, including regions spanning human DNA repair genes, the candidate region for a congenital kidney disease gene, and other regions of biological interest on chromosome 19.

• Completion of comparative sequence analysis of 107,500 bases of genomic DNA encompassing the human DNA repair gene ERCC2 and the corresponding regions in mouse and hamster (p. 32). In addition to ERCC2, analysis revealed the presence of two previously undescribed genes in all three species. One of these genes is a new member of the kinesin motor protein family. These proteins play a wide variety of roles in the cell, including movement of chromosomes before cell division.

• Complete sequencing of human genomic regions containing two additional DNA repair genes. One of these, XRCC3, maps to human chromosome 14 and encodes a protein that may be required for chromosome stability. Analysis of the genomic sequence identified another kinesin motor protein gene physically linked to XRCC3. The second human repair gene, HHR23A, maps to 19p13.2. Sequence analysis of 110,000 bases containing HHR23A identified six other genes, five of which are new genes with similarity to proteins from mouse, human, yeast, and Caenorhabditis elegans.

• Complete sequencing of full-length cDNAs for three new DNA repair genes (XRCC2, XRCC3, and XRCC9) in collaboration with the LLNL DNA repair group.

• Generation of a metric physical map of chromosome 19 spanning at least 95% of the chromosome. This unique map incorporates a metric scale to estimate the distance between genes or other markers of interest to the genetics community.

• Assembly of nearly 45 million bases of EcoR I restriction-mapped cosmid contigs for human chromosome 19 using a combination of fingerprinting and cosmid walking. Small gaps in cosmid continuity have been spanned by BAC, PAC, and P1 clones, which are then integrated into the restriction maps. The high depth of coverage of these maps (average redundancy, 4.3-fold) permits selection of a minimum overlapping set of clones for DNA sequencing.

• Placement of more than 400 genes, genetic markers, and other loci on the chromosome 19 cosmid map. Also, 165 new STSs associated with premapped cosmid contigs were generated and added to the physical map.

• Collaborations to identify the gene (COMP) responsible for two allelic genetic diseases, pseudoachondroplasia and multiple epiphyseal dysplasia, and the identification of specific mutations causing each condition.

• Through sequence analysis of the 2A subfamily of the human cytochrome P450 enzymes, identification of a new variant that exists in 10% to 20% of individuals and results in reduced ability to metabolize nicotine and the anti-blood-clotting drug Coumadin.

• Location of a zinc finger gene that encodes a transcription factor regulating blood-cell development adjacent to telomere repeat sequences, possibly the gene nearest one end of chromosome 19.

• Completion of the genomic and cDNA sequence of the gene for the human Rieske Fe-S protein involved in mitochondrial respiration.

• Expansion of the mouse-human comparative genomics collaboration with ORNL to include study of new groups of clustered transcription factors found on human chromosome 19q and as syntenic homologs on mouse chromosome 7 (p. 32).

• Numerous collaborations (in particular, with Washington University and Merck) continuing to expand the LLNL-based IMAGE Consortium, an effort to characterize the transcribed human genome. The IMAGE clone collection is now the largest public collection of sequenced cDNA clones, with more than one million arrayed clones, 800,000 sequences in public databases, and 10,000 mapped cDNAs.

• Development and deployment of a comprehensive system to handle sample tracking needs of production DNA sequencing. The system combines databases and graphical interfaces running on both Mac and Sun platforms and scales easily to handle large-scale production sequencing.

• Expansion of the LLNL genome center’s World Wide Web site to include tables that link to each gene being sequenced, to the quality scores and assembled bases collected each night during the sequencing process, and to the submitted GenBank sequence when a clone is completed. [http://bbrp.llnl.gov/test-bin/projqcsummary]



Human-Mouse Homologies. LLNL researcher Lisa Stubbs is shown in the Mouse Genetics Research Facility at ORNL. [ORNL photo]

The chromosome-homology figure demonstrates the genetic similarity (homology) of the superficially dissimilar mouse and human species. The similarity is such that human chromosomes can be cut (schematically at least) into about 150 pieces (only about 100 are large enough to appear in the figure), then reassembled into a reasonable approximation of the mouse genome. The colors and corresponding numbers on the mouse chromosomes indicate the human chromosomes containing homologous segments. [Source: Lisa Stubbs, LLNL]

Comparative sequencing of homologous regions in human and mouse at LLNL has enhanced the ability to identify protein-coding (exon) and noncoding DNA regions that have remained unchanged over the course of evolution. Colors in the ERCC2 Region figure depict similarities in mouse and human genes involved in DNA repair, a research interest rooted in DOE’s mission to develop better technologies for measuring health effects, particularly mutations. [Source: Linda Ashworth, LLNL]

[Figure: ERCC2 Region. Parallel human and mouse tracks showing Gene C, ERCC2, and KLC2, with a 5-kb scale bar. Legend: exons from “Gene C”; exons of ERCC2 gene; exons of KLC2 gene; non-coding conserved element.]
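The idea of locating regions that "have remained unchanged over the course of evolution" can be sketched as a sliding-window percent-identity scan over two sequences that have already been aligned. This is only an illustration under stated assumptions: the function names, window size, threshold, and toy sequences are hypothetical, and the report does not describe the actual comparison software used at LLNL.

```python
def windowed_identity(human, mouse, window=10, step=5):
    """Percent identity per window for two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as matches.
    """
    if len(human) != len(mouse):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(human) - window + 1, step):
        pairs = zip(human[start:start + window], mouse[start:start + window])
        matches = sum(1 for h, m in pairs if h == m and h != "-")
        results.append((start, 100.0 * matches / window))
    return results


def conserved_windows(human, mouse, threshold=80.0, window=10, step=5):
    """Start positions of windows whose identity meets the threshold."""
    return [start
            for start, pct in windowed_identity(human, mouse, window, step)
            if pct >= threshold]


# Hypothetical toy alignment: identical at the start, diverged in the middle.
human_seq = "ACGTACGTACGGTTAACCGG"
mouse_seq = "ACGTACGTACTTAAGGCCGG"
print(windowed_identity(human_seq, mouse_seq))
print(conserved_windows(human_seq, mouse_seq))
```

Windows that stay above the threshold outside annotated exons would be candidates for the non-coding conserved elements named in the figure legend; real analyses use much longer windows and a proper alignment step first.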



• Implementation of a new database to support sequencing and mapping work on multiple chromosomes and species. Web-based automated tools were developed to facilitate construction of this database, the loading of over 100 million bytes of chromosome 19 data from the existing LLNL database, and automated generation of Web-based input interfaces.

• Significant enhancement of the LLNL Genome Graphical Database Browser software to display and link information obtained at a subcosmid resolution from both restriction map hybridization and sequence feature data. Features, such as genes linked to diseases, allow tracking to fragments as small as 500 base pairs of DNA.

• Development of advanced microfabrication technologies to produce electrophoresis microchannels in large glass substrates for use in DNA sequencing.

• Installation of a new filter-spotting robot that routinely produces 6 × 6 × 384 filters. A 16 × 16 × 384 pattern has been achieved.

• Upgrade of the Lawrence Berkeley National Laboratory colony picker using a second computer so that imaging and picking can occur simultaneously.
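The accomplishments above mention that the 4.3-fold redundancy of the cosmid maps permits selection of a minimum overlapping set of clones for sequencing. One common way to pose that selection is as an interval-cover problem solved greedily; the sketch below shows that approach under stated assumptions. It is not the center's actual clone-selection procedure, and the clone names and coordinates (in kilobases) are hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover.

    clones: list of (name, start, end) tuples, end exclusive.
    Returns the names of a smallest set of clones covering
    [region_start, region_end), or raises ValueError if there is a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting at or before the covered edge,
        # keep the one that extends farthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical cosmid contig; coordinates in kb along a 120-kb region.
contig = [
    ("c1", 0, 40), ("c2", 10, 50), ("c3", 35, 75),
    ("c4", 45, 85), ("c5", 70, 110), ("c6", 80, 120),
]
print(minimal_tiling_path(contig, 0, 120))
```

In this toy contig, four of the six clones suffice. The sketch lets adjacent clones merely abut; a practical selection would demand a minimum overlap between neighbors so that assembled sequences can be joined, and would also weigh clone stability.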

Future Plans

Genomic sequencing currently is the dominant function of Livermore’s Human Genome Center. The physical mapping effort will ensure an ample supply of sequence-ready clones. For sequencing targets on chromosome 19, this includes ensuring that the most stable clones (cosmids, BACs, and PACs) are available for sequencing and that regions with such known physical landmarks as STSs and expressed sequence tags (ESTs) are annotated to facilitate sequence assembly and analysis. The following targets are emphasized for DNA sequencing:

• Regions of high gene density, including regions containing gene families.

• Chromosome 19, of which at least 42 million bases are sequence ready.

• Selected BAC and PAC clones representing regions of about 0.2 million to 1 million bases throughout the human genome; clones would be selected based on such high-priority biological targets as genes involved in DNA repair, replication, recombination, xenobiotic metabolism, cell-cycle checkpoints, or other specific targets of interest.

• Selected BAC and PAC clones from mouse regions syntenic with the genes indicated above.

• Full-insert cDNAs corresponding to the genomic DNA being sequenced.

The informatics team is continuing to deploy broader-based supporting databases for both mapping and sequencing. Where appropriate, Web- and Java-based tools are being developed to enable biologists to interact with data. Recent reorganization within this group enables better direct support to the sequencing group, including evaluating and interfacing sequence-assembly algorithms and analysis tools, data and process tracking, and other informatics functions that will streamline the sequencing process.

The instrumentation effort has three major thrusts: (1) continued development or implementation of laboratory automation to support high-throughput sequencing; (2) development of the next-generation DNA sequencer; and (3) development of robotics to support high-density BAC clone screening. The last two goals warrant further explanation.

The new DNA sequencer being developed under a grant from the National Institutes of Health, with minor support through the DOE genome center, is designed to run 384 lanes simultaneously with a low-viscosity sieving medium. The entire system would be loaded automatically, run, and set up for the next run at 3-hour intervals. If successful, it should provide a 20- to 40-fold increase in throughput over existing machines.
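The throughput figures in this paragraph can be turned into a back-of-the-envelope calculation. The sketch below uses only the 384 lanes, the 3-hour cycle, and the 20- to 40-fold range from the text; it assumes round-the-clock operation and equal read length per lane on old and new machines, neither of which the report states, so the implied baseline is an inference rather than a reported value.

```python
LANES_PER_RUN = 384    # from the text
HOURS_PER_CYCLE = 3    # load, run, and reset interval from the text

# Assumption: the instrument cycles continuously, 24 hours a day.
runs_per_day = 24 // HOURS_PER_CYCLE
lanes_per_day = LANES_PER_RUN * runs_per_day


def implied_baseline(new_lanes_per_day, fold_increase):
    """Lanes per day an existing machine would run if the new design
    delivers the stated fold increase (assumes equal read length per lane)."""
    return new_lanes_per_day / fold_increase


print(runs_per_day, "runs per day")
print(lanes_per_day, "lanes per day")
for fold in (20, 40):
    print(f"{fold}-fold implies about {implied_baseline(lanes_per_day, fold):.0f} lanes per day today")
```

Any change in read length or run success rate would shift these numbers, so the calculation is best read as a consistency check on the stated range.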

An LLNL-designed high-precision spotting robot, which should allow a density of 98,304 spots in 96 cm², is now operating. The goal of this effort is to create high-density filters representing a 10× BAC coverage of both human and mouse genomes (30,000 clones = 1× coverage). Thus each filter would provide ~3× coverage, and eight such filters would provide the desired coverage for both genomes. The filters would be hybridized with amplicons from individual or region-specific cDNAs and ESTs; given the density of the BAC libraries, clones that hybridize should represent a binned set of BACs for a region of interest. These BACs could be the initial substrate for a BAC sequencing strategy. Performing hybridizations in parallel in mouse and human DNA facilitates the development of the mouse map (with ORNL involvement), and sequencing BACs from both species identifies evolutionarily conserved and, perhaps, regulatory regions.
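The filter arithmetic in this paragraph can be checked with a short calculation. The sketch assumes that every spot holds a distinct BAC clone and that the 98,304-spot density corresponds to a 16 × 16 × 384 spotting pattern (the product matches, but the report does not explicitly connect the two); the variable names are illustrative.

```python
import math

SPOTS_PER_FILTER = 16 * 16 * 384   # assumed pattern; equals 98,304
FILTER_AREA_CM2 = 96               # from the text
CLONES_PER_1X = 30_000             # from the text: 30,000 clones = 1x coverage
TARGET_COVERAGE = 10               # desired 10x coverage per genome
GENOMES = 2                        # human and mouse

# Assumption: each spot is a different clone (no duplicate spotting).
spots_per_cm2 = SPOTS_PER_FILTER / FILTER_AREA_CM2
coverage_per_filter = SPOTS_PER_FILTER / CLONES_PER_1X
filters_per_genome = math.ceil(TARGET_COVERAGE / coverage_per_filter)
total_filters = filters_per_genome * GENOMES

print(f"{SPOTS_PER_FILTER} spots per filter, {spots_per_cm2:.0f} spots per cm2")
print(f"about {coverage_per_filter:.1f}x coverage per filter")
print(f"{filters_per_genome} filters per genome, {total_filters} filters in total")
```

Under these assumptions each filter carries a little over 3× coverage, four filters reach 10× for one genome, and eight filters cover both genomes, which agrees with the figures in the text.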

Information generated by sequencing human and mouse DNA in parallel is expected to expand LLNL efforts in functional genomics. Comparative sequence data will be used to develop a high-resolution synteny map of conserved mouse-human domains and incorporate automated northern expression analysis of newly identified genes. Long range, the center hopes to take advantage of a variety of forms of expression analysis, including site-directed mutation analysis in the mouse.

Summary

The Livermore Human Genome Center has undergone a dramatic shift in emphasis toward commitment to large-scale, high-accuracy sequencing of chromosome 19, other chromosomes, and targeted genomic regions in the human and mouse. The center also is committed to exploiting sequence information for functional genomics studies and for other programs, both in house and collaboratively.



Los Alamos National Laboratory Center for Human Genome Studies


Biological research was initiated at Los Alamos National Laboratory (LANL) in the 1940s, when the laboratory began to investigate the physiological and genetic consequences of radiation exposure. Eventual establishment of the national genetic sequence databank called GenBank, the National Flow Cytometry Resource, numerous related individual research projects, and fulfillment of a key role in the National Laboratory Gene Library Project all contributed to LANL’s selection as the site for the Center for Human Genome Studies in 1988.

Center Organization and Activities

The LANL genome center is organized into four broad areas of research and support: Physical Mapping, DNA Sequencing, Technology Development, and Biological Interfaces. Each area consists of a variety of projects, and work is distributed among five LANL Divisions (Life Sciences; Theoretical; Computing, Information, and Communications; Chemical Science and Technology; and Engineering Sciences and Applications). Extensive interdisciplinary interactions are encouraged.

Physical Mapping

The construction of chromosome- and region-specific cosmid, bacterial artificial chromosome (BAC), and yeast artificial chromosome (YAC) recombinant DNA libraries is a primary focus of physical mapping activities at LANL. Specific work includes the construction of high-resolution maps of human chromosomes 5 and 16 and associated informatics and gene discovery tasks.

Accomplishments

• Completion of an integrated physical map of human chromosome 16 consisting of both a low-resolution YAC contig map and a high-resolution cosmid contig map (pp. 37–39). With sequence tagged site (STS) markers provided on average every 125,000 bases, the YAC-STS map provides almost-complete coverage of the chromosome’s euchromatic arms. All available loci continue to be incorporated into the map.

• Construction of a low-resolution STS map of human chromosome 5 consisting of 517 STS markers regionally assigned by somatic-cell hybrid approaches. Around 95% mega-YAC–STS coverage (50 million bases) of 5p has been achieved. Additionally, about 40 million bases of 5q mega-YAC–STS coverage have been obtained collaboratively.

• Refinement of BAC cloning procedures for future production of chromosome-specific libraries. Successful partial digestion and cloning of microgram quantities of chromosomal DNA embedded in agarose plugs. Efforts continue to increase the average insert size to about 100,000 bases.

DNA Sequencing

DNA sequencing at the LANL center focuses on low-pass sample sequencing (SASE) of large genomic regions. SASE data is deposited in publicly available databases to allow for wide distribution. Finished sequencing is prioritized from initial SASE analysis and pursued by parallel primer walking. Informatics development includes data tracking, gene-discovery integration with the Sequence Comparison ANalysis (SCAN) program, and functional genomics interaction.
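To give a feel for what low-pass sample sequencing yields, the sketch below applies the widely used idealized random-sampling model (often associated with Lander and Waterman), in which reads land uniformly at random and the expected fraction of a target covered at redundancy c is 1 − e^(−c). The report does not say that LANL used this model, and the read counts and lengths shown are hypothetical.

```python
import math


def redundancy(num_reads, read_length, target_length):
    """Average sequence redundancy c = (reads x read length) / target length."""
    return num_reads * read_length / target_length


def expected_fraction_covered(c):
    """Expected fraction of the target touched by at least one read,
    under the idealized assumption of uniformly random read placement."""
    return 1.0 - math.exp(-c)


# Hypothetical example: a 40,000-base cosmid sampled with 400-base reads.
for reads in (50, 100, 200):
    c = redundancy(reads, 400, 40_000)
    print(f"{reads} reads: {c:.1f}x redundancy, "
          f"about {100 * expected_fraction_covered(c):.0f}% of bases sampled")
```

Under this model a single pass (1× redundancy) samples roughly 63% of the bases, which is why low-pass data are useful for finding genes and prioritizing regions while finished sequence still requires directed work such as primer walking.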

Accomplishments

• SASE sequencing of 1.5 million bases from the p13 region of human chromosome 16.

• Discovery of more than 100 genes in SASE sequences.

http://www-ls.lanl.gov/masterhgp.html

Center for Human Genome Studies
Los Alamos National Laboratory
P.O. Box 1663
Los Alamos, NM 87545

Larry L. Deaven, Acting Director
505/667-3912, Fax: [email protected]

Lynn Clark, Technical Coordinator
505/667-9376, Fax: [email protected]

Robert K. Moyzis, Director, 1989–97*
*Now at University of California, Irvine

In lieu of individual abstracts, research projects and investigators at the LANL Center for Human Genome Studies are represented in this narrative. More information can be found on the center’s Web site (see URL above).



• Generation of finished sequence for a 240,000-base telomeric region of human chromosome 7q. From initial sequences generated by SASE, oligonucleotides were synthesized and used for primer walking directly from cosmids comprising the contig map. Complete sequencing was performed to determine what genes, if any, are near the 7q terminus. This intriguing region lacks significant blocks of subtelomeric repeat DNA typically present near eukaryotic telomeres.

• Complete single-pass sequencing of 2018 exon clones generated from LANL’s flow-sorted human chromosome 16 cosmid library. About 950 discrete sequences were identified by sequence analysis. Nearly 800 appear to represent expressed sequences from chromosome 16.

• Development of Sequence Viewer to display ABI sequences with trace data on any computer having an Internet connection and a Netscape World Wide Web browser.

• Sequencing and analysis of a novel pericentromeric duplication of a gene-rich cluster between 16p11.1 and Xq28 (in collaboration with Baylor College of Medicine).

Technology Development

Technology development encompasses a variety of activities, both short and long term, including novel vectors for library construction and physical mapping; automation and robotics tools for physical mapping and sequencing; novel approaches to DNA sequencing involving single-molecule detection; and novel approaches to informatics tools for gene identification.

Accomplishments

• Development of SCAN program for large-scale sequence analysis and annotation, including a translator converting SCAN data to GIO format for submission to Genome Sequence DataBase.

• Application of flow-cytometric approach to DNA sizing of P1 artificial chromosome (PAC) clones. Less than one picogram of linear or supercoiled DNA is analyzed in under 3 minutes. Sizing range has been extended down to 287 base pairs. Efforts continue to extend the upper limit beyond 167,000 bases.

• Characterization of the detection of single, fluorescently tagged nucleotides cleaved from multiple DNA fragments suspended in the flow stream of a flow cytometer (see picture, p. 70). The cleavage rate for Exo III at 37°C was measured to be about 5 base pairs per second per M13 DNA fragment. To achieve a single-color sequencing demonstration, either the background burst rate (currently about 5 bursts per second) must be reduced or the exonuclease cleavage rate must be increased significantly. Techniques to achieve both are being explored.

• Construction of a simple and compact apparatus, based on a diode-pumped Nd:YAG laser, for routine DNA fragment sizing.

• Development of a new approach to detect coding sequences in DNA. This complete spectral analysis of coding and noncoding sequences is as sensitive in its first implementations as the best existing techniques.

• Use of phylogenetic relationships to generate new profiles of amino acid usage in conserved domains. The profiles are particularly useful for classification of distantly related sequences.
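The accomplishments above mention a spectral approach to detecting coding sequences but do not describe the algorithm. As a loosely related illustration only, the sketch below computes a well-known Fourier heuristic: coding regions tend to show extra spectral power at a period of three bases because of codon structure. This is a swapped-in textbook technique, not the LANL method, and the function name and test sequences are hypothetical.

```python
import cmath


def period3_power(seq):
    """Summed spectral power at frequency 1/3, normalized by length.

    Each base (A, C, G, T) is treated as a 0/1 indicator sequence, and the
    squared magnitude of its Fourier component at period 3 is accumulated.
    """
    seq = seq.upper()
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        component = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, ch in enumerate(seq)
            if ch == base
        )
        total += abs(component) ** 2
    return total / n


# Hypothetical extremes: a perfectly codon-periodic repeat versus a
# period-4 repeat, which has no period-3 component over 60 bases.
print(period3_power("GCA" * 20))
print(period3_power("ACGT" * 15))
```

In practice the statistic is computed in a sliding window along genomic sequence and compared with a background level, so peaks mark candidate coding regions; the two toy sequences simply show the extremes of the measure.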

DOE Human Genome Program Report, LANL


Biological Interfaces

The Biological Interfaces effort targets genes and chromosome regions associated with DNA damage and repair, mitotic stability, and chromosome structure and function as primary subjects for physical mapping and sequencing. Specific disease-associated genes on human chromosome 5 (e.g., Cri-du-Chat syndrome) and on 16 (e.g., Batten’s disease and Fanconi anemia) are the subjects of collaborative biological projects.

Accomplishments

• Identification of two human 7q exons having 99% homology to the cDNA of a known human gene, vasoactive intestinal peptide receptor 2A. Preliminary data suggests that the VIPR2A gene is expressed.

• Identification of numerous expressed sequence tags (ESTs) localized to the 7q region. Since three of the ESTs contain at least two regions with high confidence of homology (~90%), genes in addition to VIPR2A may exist in the terminal region of 7q.

• Generation of high-resolution cosmid coverage on human chromosome 5p for the larynx and critical regions identified with Cri-du-Chat syndrome, the most common human terminal-deletion syndrome (in collaboration with Thomas Jefferson University).

• Refinement of the Wolf-Hirschhorn syndrome (WHS) critical region on human chromosome 4p. Using the SCAN program to identify genes likely to contribute to WHS, the project serves as a model for defining the interaction between genomic sequencing and clinical research.

• Collaborative construction of contigs for human chromosome 16, including 1.05 million bases in cosmids through the familial Mediterranean fever (FMF) gene region (with members of the FMF Consortium) and 700,000 bases in P1 clones encompassing the polycystic kidney disease gene (with Integrated Genetics, Inc.).

• Collaborative identification and determination of the complete genomic structure of the Batten’s disease gene (with members of the BDG Consortium), the gamma subunit of the human amiloride-sensitive epithelial channel (Liddle’s syndrome, with University of Iowa), and the polycystic kidney disease gene (with Integrated Genetics).

• Participation in an international collaborative research consortium that successfully identified the gene responsible for Fanconi anemia type A.

Chromosome 16 Physical Map (pp. 38–39). A condensed chromosome 16 physical map constructed at Los Alamos National Laboratory (LANL) is shown in two parts on the following pages. Besides facilitating the isolation and characterization of disease genes, the map provides the framework for a large-scale sequencing effort by LANL, The Institute for Genomic Research, and the Sanger Centre.

Distinct types of maps and data are shown as levels or tiers on the integrated map. At the top of each page is a view of the banded human chromosome to which the map is aligned. A somatic-cell hybrid breakpoint map, which divides the chromosome into 90 intervals, was used as a backbone for much of the map integration.

The physical map consists of both a low-resolution yeast artificial chromosome (YAC) contig map localized to and ordered within the breakpoint intervals with sequence tagged sites (STSs) and a high-resolution bacteria-based clone map. The YAC-STS map provides almost complete coverage of the chromosome’s euchromatic arm, with STS markers on average every 100,000 bases.

A high-resolution, sequence-ready cosmid contig map is anchored to the YAC and breakpoint maps via STSs developed from cosmid contigs and by hybridizations between YACs and cosmids.

As part of the ongoing effort to incorporate all available loci onto a single map of this chromosome, the integrated map also features genes, expressed sequence tags, exons (gene-coding regions), and genetic markers.

The mouse chromosome segments at the bottom of the map contain groups that correspond to human genes mapped to the regions shown above them. [Source: Norman Doggett, LANL]


[Figure: Human Chromosome 16 physical map, shown in two parts]

Patents, Licenses, and CRADAs

• Rhett L. Affleck, James N. Demas, Peter M. Goodwin, Jay A. Schecker, Ming Wu, and Richard A. Keller, “Reduction of Diffusional Defocusing in Hydrodynamically Focused Flows by Complexing with a High Molecular Weight Adduct,” United States Patent, filed December 1996.

• R.L. Affleck, W.P. Ambrose, J.D. Demas, P.M. Goodwin, M.E. Johnson, R.A. Keller, J.T. Petty, J.A. Schecker, and M. Wu, “Photobleaching to Reduce or Eliminate Luminescent Impurities for Ultrasensitive Luminescence Analysis,” United States Patent, S-87,208, accepted September 1997.

• J.H. Jett, M.L. Hammond, R.A. Keller, B.L. Marrone, and J.C. Martin, “DNA Fragment Sizing and Sorting by Laser-Induced Fluorescence,” United States Patent, S.N. 75,001, allowed May 1996.

• James H. Jett, “Method for Rapid Base Sequencing in DNA and RNA with Three Base Labeling,” in preparation.

• Development license and exclusive license to LANL’s DNA sizing patent obtained by Molecular Technology, Inc., for commercialization of single-molecule detection capability to DNA sizing.

The exhibit “Understanding Our Genetic Inheritance” at the Bradbury Science Museum in Los Alamos, New Mexico, describes the LANL Center for Human Genome Studies’ contributions to the Human Genome Project. The exhibit’s centerpiece is a 16-foot-long version of LANL’s map of human chromosome 16. [Source: LANL Center for Human Genome Studies]

Future Plans

LANL has joined a collaboration with California Institute of Technology and The Institute for Genomic Research to construct a BAC map of the p arm of human chromosome 16 and to complete the sequence of a 20-million–base region of this map.

In its evolving role as part of the new DOE Joint Genome Institute, LANL will continue scaleup activities focused on high-throughput DNA sequencing. Initial targets include genes and DNA regions associated with chromosome structure and function, syntenic breakpoints, and relevant disease-gene loci.

A joint DNA sequencing center was established recently by LANL at the University of New Mexico. This facility is responsible for determining the DNA sequence of clones constructed at LANL, then returning the data to LANL for analysis and archiving.



Research Narratives


Lawrence Berkeley National Laboratory Human Genome Center

Human Genome Center
Lawrence Berkeley National Laboratory
1 Cyclotron Road
Berkeley, CA 94720

Contact: Mohandas Narla
510/486-7029, Fax: [email protected]

Joyce Pfeiffer
Administrative Assistant

Michael Palazzolo*
Director, 1996–97

http://www-hgc.lbl.gov/GenomeHome.html


Since 1937 the Ernest Orlando Lawrence Berkeley National Laboratory (LBNL) has been a major contributor to knowledge about human health effects resulting from energy production and use. That was the year John Lawrence went to Berkeley to use his brother Ernest’s cyclotrons to launch the application of radioactive isotopes in biological and medical research. Fifty years later, Berkeley Lab’s Human Genome Center was established.

Now, after another decade, an expansion of biological research relevant to Human Genome Project goals is being carried out within the Life Sciences Division, with support from the Information and Computing Sciences and Engineering divisions. Individuals in these research projects are making important new contributions to the key fields of molecular, cellular, and structural biology; physical chemistry; data management; and scientific instrumentation. Additionally, industry involvement in this growing venture is stimulated by Berkeley Lab’s location in the San Francisco Bay area, home to the largest congregation of biotechnology research facilities in the world.

In July 1997 the Berkeley genome center became part of the Joint Genome Institute (see p. 26).

Sequencing

Large-scale genomic sequencing has been a central, ongoing activity at Berkeley Lab since 1991. It has been funded jointly by DOE (for human genome production sequencing and technology development) and the NIH National Human Genome Research Institute [for sequencing the Drosophila melanogaster model system, which is carried out in partnership with the University of California, Berkeley (UCB)]. The human genome sequencing area at Berkeley Lab consists of five groups: Bioinstrumentation, Automation, Informatics, Biology, and Development. Complementing these activities is a group in Life Sciences Division devoted to functional genomics, including the transgenics program.

The directed DNA sequencing strategy at Berkeley Lab was designed and implemented to increase the efficiency of genomic sequencing (see figure, p. 45). A key element of the directed approach is maintaining information about the relative positions of potential sequencing templates throughout the entire sequencing process. Thus, intelligent choices can be made about which templates to sequence, and the number of selected templates can be kept to a minimum. More important, knowledge of the interrelationship of sequencing runs guides the assembly process, making it more resistant to difficulties imposed by repeated sequences. As of July 3, 1997, Berkeley Lab had generated 4.4 megabases of human sequence and, in collaboration with UCB, had tallied 7.6 megabases of Drosophila sequence.
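The idea of choosing a minimal set of templates from position information can be illustrated with a small sketch. It is not LBNL's software; it assumes each candidate template has already been mapped to a start and end coordinate, and it uses a standard greedy interval cover to pick a small set that spans a region. The names `Clone` and `minimal_spanning_set` and the coordinates in the example are hypothetical.

```python
from typing import List, Tuple

# (name, start, end) in base pairs; end is exclusive. Hypothetical record type.
Clone = Tuple[str, int, int]


def minimal_spanning_set(clones: List[Clone], region_start: int,
                         region_end: int) -> List[Clone]:
    """Greedy interval cover.

    Repeatedly pick, among the clones that start at or before the position
    covered so far, the one that reaches furthest to the right.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    chosen: List[Clone] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Scan every clone that overlaps or abuts the covered prefix.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in coverage at position {covered}")
        chosen.append(best)
        covered = best[2]
    return chosen


if __name__ == "__main__":
    subclones = [
        ("a", 0, 3000), ("b", 1000, 3500), ("c", 2500, 6000),
        ("d", 3000, 5000), ("e", 5500, 9000),
    ]
    picked = minimal_spanning_set(subclones, 0, 9000)
    print([name for name, _, _ in picked])  # ['a', 'c', 'e']
```

Because the greedy choice only needs sorted coordinates, the selection can be rerun whenever new mapping information arrives.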

Instrumentation and Automation

The instrumentation and automation program encompasses the design and fabrication of custom apparatus to facilitate experiments, the programming of laboratory robots to automate repetitive procedures, and the development of (1) improved hardware to extend the applicability range of existing commercial robots and (2) an integrated operating system to control and monitor experiments. Although some discrete instrumentation modules used in the integrated protocols are obtained commercially, LBNL designs its own custom instruments when existing capabilities are inadequate. The instrumentation modules are then integrated into a large system to facilitate large-scale production sequencing. In addition, a significant effort is devoted to improving fluorescence-assay methods, including DNA sequence analysis and mass spectrometry for molecular sizing.

In lieu of individual abstracts, research projects and investigators at the LBNL Human Genome Center are represented in this narrative. More information can be found on the center’s Web site (see URL above).

*Now at Amgen, Inc.

Recent advances in the instrumentation group include DNA Prep machine and Prep Track. These instruments are designed to automate completely the highly repetitive and labor-intensive DNA-preparation procedure to provide higher daily throughput and DNA of consistent quality for sequencing (see photos, p. 43, and Web pages: http://hgighub.lbl.gov/esd/DNAPrep/TitlePage.html and http://hgighub.lbl.gov/esd/prepTrackWebpage/preptrack.htm).

Berkeley Lab’s near-term needs are for 960 samples per day of DNA extracted from overnight bacteria growths. The DNA protocol is a modified boil prep prepared in a 96-well format. Overnight bacteria growths are lysed, and samples are separated from cell debris by centrifugation. The DNA is recovered by ethanol precipitation.
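As a rough check on these numbers, the sketch below combines the 960-samples-per-day target with the DNA Prep machine figures quoted in this report (192 samples per run, about 2.5 to 4 hours per run). It assumes runs are performed back to back on a single machine, which the report does not state, and the function names are illustrative.

```python
def runs_needed(samples_per_day: int, samples_per_run: int) -> int:
    """Number of machine runs required, rounding up partial runs."""
    return -(-samples_per_day // samples_per_run)  # integer ceiling division


def machine_hours_per_day(samples_per_day: int, samples_per_run: int,
                          hours_per_run: float) -> float:
    """Total machine time, assuming sequential runs on one machine."""
    return runs_needed(samples_per_day, samples_per_run) * hours_per_run


if __name__ == "__main__":
    print(runs_needed(960, 192))                  # 5
    print(machine_hours_per_day(960, 192, 2.5))   # 12.5
    print(machine_hours_per_day(960, 192, 4.0))   # 20.0
```

Under these assumptions, one machine would need roughly 12.5 to 20 hours per day to meet the target, depending on the protocol.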

Informatics

The informatics group is focused on hardware and software support and system administration, software development for end sequencing, transposon mapping and sequence template selection, data-flow automation, gene finding, and sequence analysis. Data-flow automation is the main emphasis. Six key steps have been identified in this process, and software is being written and tested to automate all six. The first step involves controlling gel quality, trimming vector sequence, and storing the sequences in a database. A program module called Move-Track-Trim, which is now used in production, was written to handle these steps. The second through fourth steps in this process involve assembling, editing, and reconstructing P1 clones of 80,000 base pairs from 400-base traces. The fifth step is sequence annotation, and the sixth is data submission.
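A minimal sketch of the first data-flow step (quality check, vector trimming, storage) follows. It is not the Move-Track-Trim module; the quality rule (fraction of ambiguous `N` base calls), the exact-match trimming, the in-memory dictionary standing in for a database, and all names and sequences are assumptions made for illustration.

```python
from typing import Dict, Optional

MAX_N_FRACTION = 0.05  # hypothetical quality threshold, not from the report


def passes_quality(bases: str, max_n_fraction: float = MAX_N_FRACTION) -> bool:
    """Accept a read only if few of its base calls are ambiguous (N)."""
    if not bases:
        return False
    return bases.upper().count("N") / len(bases) <= max_n_fraction


def trim_vector(bases: str, left_vector: str, right_vector: str) -> str:
    """Remove flanking vector sequence when it matches exactly at the ends."""
    seq = bases.upper()
    left, right = left_vector.upper(), right_vector.upper()
    if left and seq.startswith(left):
        seq = seq[len(left):]
    if right and seq.endswith(right):
        seq = seq[:-len(right)]
    return seq


def process_trace(name: str, bases: str, left_vector: str, right_vector: str,
                  store: Dict[str, str]) -> Optional[str]:
    """Quality-check, trim, and store one read; return None if rejected."""
    if not passes_quality(bases):
        return None
    insert = trim_vector(bases, left_vector, right_vector)
    store[name] = insert  # stand-in for a database write
    return insert


if __name__ == "__main__":
    db: Dict[str, str] = {}
    print(process_trace("r1", "ggatccACGTACGTgaattc", "GGATCC", "GAATTC", db))
    # ACGTACGT
    print(process_trace("r2", "NNNNACGT", "GGATCC", "GAATTC", db))  # None
```

Real trimming would tolerate sequencing errors in the vector match; exact matching is used here only to keep the sketch short.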

Annotation can greatly enhance the biological value of these sequences. Useful annotations include homologies to known genes, possible gene locations, and gene signals such as promoters. LBNL is developing a workbench for automatic sequence annotation and annotation viewing and editing. The goal is to run a series of sequence-analysis tools and display the results to compare the various predictions. Researchers then will be able to examine all the annotations (for example, genes predicted by various gene-finding methods) and select the ones that look best.

Nomi Harris developed Genotator, an annotation workbench consisting of a stand-alone annotation browser and several sequence-analysis functions. The back end runs several gene finders, homology searches (using BLAST), and signal searches and saves the results in “.ace” format. Genotator thus automates the tedious process of operating a dozen different sequence-analysis programs with many different input and output formats. Genotator can function via command-line arguments or with the graphical user interface (http://www-hgc.lbl.gov/inf/annotation.html).

DNA Prep Machine. The DNA Prep machine (above) was designed by Berkeley Lab’s Martin Pollard to perform plasmid preparation on 192 samples (2 microtiter plates) in about 2.5 to 4 hours, depending on the protocol. Controlled by a personal computer running a Visual Basic Control program, the instrument includes a gantry robot equipped with pipettors, reagent dispensers, hot and cold temperature stations, and a pneumatic gripper. [Source: LBNL]

DOE Human Genome Program Report, LBNL


Prep Track. Developed at the Berkeley Lab, Prep Track is a high-throughput, microtiter-plate, liquid-handling robotic system for automating DNA preparation procedures. Microtiter plates are fetched from cassettes, moved to one of two conveyor belts, and transported to protocol-defined modules. Plates are moved continuously and automatically through the system as each module simultaneously processes plates in the module lift stations. The plates exit the system and are stored in microtiter-plate cassettes.

Modules include a station capable of dispensing liquids in volumes from as low as 5 microliters to several milliliters, four 96-channel pipettors, and the plate-fetching module. Each module is controlled independently by programmable logic controllers (PLCs). The overall system is controlled by a personal computer and a Visual Basic Control master that determines the order in which plates are processed. The actions of each lift station and dispenser or pipettor are determined locally by programs resident in each module’s PLC. The Visual Basic Control program moves the plates through the system based on the predefined protocol and on module status reports as monitored by PLCs.

The current belt length on the Prep Track supports eight standard modules, which can be reconfigured to any order. Standardization of mechanical, electrical, and communication components allows new modules to be designed and manufactured easily. The current standard module footprint is 250 mm wide, 600 mm deep, and 250 mm to the conveyor belt deck. The first protocol to be implemented on Prep Track will be polymerase chain reaction setups, with sequence-reaction setups to follow. [Source: LBNL]



Progress to Date

Chromosome 5

Over the last year, the center has focused its production genomic sequencing on the distal 40 megabases of the human chromosome 5 long arm. This region was chosen because it contains a cluster of growth factor and receptor genes and is likely to yield new and functionally related genes through long-range sequence analysis. Results to date include:

• 40-megabase nonchimeric map containing 82 yeast artificial chromosomes (YACs) in the chromosome 5 distal long arm.

• 20-megabase contig map in the region of 5q23-q33 that contains 198 P1s, 60 P1 artificial chromosomes, and 495 bacterial artificial chromosomes (BACs) linked by 563 sequence tagged sites (STSs) to form contigs.

• 20-megabase bins containing 370 BACs in 74 bins in the region of 5q33-q35.

Chromosome 21

An early project in the study of Down syndrome (DS), which is characterized by chromosome 21 trisomy, constructed a high-resolution clone map in the chromosome 21 DS region to be used as a pilot study in generating a contiguous gene map for all of chromosome 21. This project has integrated P1 mapping efforts with transgenic studies in the Life Sciences Division. P1 maps provide a suitable form of genomic DNA for isolating and mapping cDNA.

• 186 clones isolated in the major DS region of chromosome 21 comprising about 3 megabases of genomic DNA extending from D21S17 to ETS2. Through cross-hybridization, overlapping P1s were identified, as well as gaps between two P1 contigs, and transgenic mice were created from P1 clones in the DS region for use in phenotypic studies.

Transgenic Mice

One of the approaches for determining the biological function of newly identified genes uses YAC transgenic mice. Human sequence harbored by YACs in transgenic mice has been shown to be correctly regulated both temporally and spatially. A set of nonchimeric overlapping YACs identified from the 5q31 region has been used to create transgenic mice. This set of transgenic mice, which together harbor 1.5 megabases of human sequence, will be used to assess the expression pattern and potential function of putative genes discovered in the 5q31 region. Additional mapping and sequencing are under way in a region of human chromosome 20 amplified in certain breast tumor cell lines.

Resource for Molecular Cytogenetics

Divining landmarks for human disease amid the enormous plain of the human genetic map is the mission of an ambitious partnership among the Berkeley Lab; University of California, San Francisco; and a diagnostics company. The collaborative Resource for Molecular Cytogenetics is charting a course toward important sites of biological interest on the 23 pairs of human chromosomes (http://rmc-www.lbl.gov).

The Resource employs the many tools of molecular cytogenetics. The most basic of these tools, and the cornerstone of the Resource’s portfolio of proprietary technology, is a method generally known as “chromosome painting,” which uses a technique referred to as fluorescence in situ hybridization or FISH. This technology was invented by LBNL Resource leaders Joe Gray and Dan Pinkel.

A technology to emerge recently from the Resource is known as “Quantitative DNA Fiber Mapping (QDFM).” High-resolution human genome maps in a form suitable for DNA sequencing traditionally have been constructed by various methods of fingerprinting, hybridization, and identification of overlapping STSs. However, these techniques do not readily yield information about sequence orientation, the extent of overlap of these elements, or the size of gaps in the map. Ulli Weier of the Resource developed the QDFM method of physical map assembly that enables the mapping of cloned DNA directly onto linear, fully extended DNA molecules. QDFM allows unambiguous assembly of critical elements leading to high-resolution physical maps. This task now can be accomplished in less than 2 days, as compared with weeks by conventional methods. QDFM also enables detection and characterization of gaps in existing physical maps, a crucial step toward completing a definitive human genome map.

Sequencing Strategy. The directed sequencing strategy used at LBNL involves four steps: (1) generate a P1-based physical map (using STS-content mapping) to provide a set of minimally overlapping clones, (2) shear and subclone each P1 clone into 3-kilobase fragments and identify a minimally overlapping subclone set, (3) generate and map transposon inserts in each subclone, and (4) sequence using commercial primer-binding sites engineered into the transposon. Subclone sequences are then assembled and edited, and the gaps are identified. P1 clones are reconstructed, and the resulting composite data is analyzed, annotated, and finally submitted to the databases. The production sequencing effort has generated 12 megabases of finished, double-stranded genomic DNA sequence from both Drosophila and human templates. [Source: Adapted from figure provided by LBNL]

[Figure: directed sequencing strategy. Labels, top to bottom: genomic region (sts 1, sts 2, sts 3, sts 4); physical mapping clones; single mapping clone; shear and subclone physical mapping clone; generate spanning set using end sequencing; minimal spanning set of 3-kb subclones; generate set of transposon insertions in each 3-kb subclone in the spanning set; mapped transposons (subset to be sequenced shown in solid color); 2 sequencing runs from each selected transposon. Axis scales: 0 to 300 kb, 0 to 80 kb, 0 to 3 kb, and –400 to +400 bp.]


Lawrence Livermore National Laboratory scientist Stephanie Stilwagen loads a sample into an automated DNA sequencing system. [Source: Linda Ashworth, LLNL]

Research Narratives

University of Washington Genome Center

The Human Genome Project soon will need to increase rapidly the scale at which human DNA is analyzed. The ultimate goal is to determine the order of the 3 billion bases that encode all heritable information. During the 20 years since effective methods were introduced to carry out DNA sequencing by biochemical analysis of recombinant-DNA molecules, these techniques have improved dramatically. In the late 1970s, segments of DNA spanning a few thousand bases challenged the capacity of world-class sequencing laboratories. Now, a few million base pairs per year represent state-of-the-art output for a single sequencing center.

However, the Human Genome Project is directed toward completing the human sequence in 5 to 10 years, so the data must be acquired with technology available now. This goal, while clearly feasible, poses substantial organizational and technical challenges. Organizationally, genome centers must begin building data-production units capable of sustained, cost-effective operation. Technically, many incremental refinements of current technology must be introduced, particularly those that remove impediments to increasing the scale of DNA sequencing. The University of Washington (UW) Genome Center is active in both areas.

Production Sequencing

Both to gain experience in the production of high-quality, low-cost DNA sequence and to generate data of immediate biological interest, the center is sequencing several regions of human and mouse DNA at a current throughput of 2 million bases per year. This “production sequencing” has three major targets: the human leukocyte antigen (HLA) locus on human chromosome 6, the mouse locus encoding the alpha subunit of T-cell receptors, and an “anonymous” region of human chromosome 7.

The HLA locus encodes genes that must be closely matched between organ donors and organ recipients. This sequence data is expected to lead to long-term improvements in the ability to achieve good matches between unrelated organ donors and recipients.

The mouse locus that encodes components of the T-cell–receptor family is of interest for several reasons. The locus specifies a set of proteins that play a critical role in cell-mediated immune responses. It provides sequence data that will help in the design of new experimental approaches to the study of immunity in mice, one of the most important experimental animals for immunological research. In addition, the locus will provide one of the first large blocks of DNA sequence for which both human and mouse versions are known.

Human-mouse sequence comparisons provide a powerful means of identifying the most important biological features of DNA sequence because these features are often highly conserved, even between such biologically different organisms as human and mouse. Finally, sequencing an “anonymous” region of human chromosome 7, a region about which little was known previously, provides experience in carrying out large-scale sequencing under the conditions that will prevail throughout most of the Human Genome Project.

University of Washington Genome Center
Department of Medicine
Box 352145
Seattle, WA 98195

Maynard Olson
Director
206/685-7366, Fax: [email protected]

http://www.genome.washington.edu

For more information on research projects and investigators at the University of Washington Genome Center, see abstracts in Part 2 of this report and the center’s Web site (see URL above).

Technology for Large-Scale Sequencing

In addition to these pilot projects, the UW Genome Center is developing incremental improvements in current sequencing technology. A particular focus is on enhanced computer software to process raw data acquired with automated laboratory instruments that are used in DNA mapping and sequencing. Advanced instrumentation is commercially available for determining DNA sequence via the “four-color–fluorescence method,” and this instrumentation is expected to carry the main experimental load of the Human Genome Project. Raw data produced by these instruments, however, require extensive processing before they are ready for biological analysis.

Large-scale sequencing involves a “divide-and-conquer” strategy in which the huge DNA molecules present in human cells are broken into smaller pieces that can be propagated by recombinant-DNA methods. Individual analyses ultimately are carried out on segments of less than 1000 bases. Many such analyses, each of which still contains numerous errors, must be melded together to obtain finished sequence. During the melding, errors in individual analyses must be recognized and corrected. In typical large-scale sequencing projects, the results of thousands of analyses are melded to produce highly accurate sequence (less than one error in 10,000 bases) that is continuous in blocks of 100,000 or more bases. The UW Genome Center is playing a major role in developing software that allows this process to be carried out automatically with little need for expert intervention. Software developed in the UW center is used in more than 50 sequencing laboratories around the world, including most of the large-scale sequencing centers producing data for the Human Genome Project.
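The melding step can be pictured with a simplified consensus calculation. The sketch below is not the UW center's software; it assumes the reads have already been aligned into columns of equal length, uses `-` to mark positions a read does not cover, and resolves each column by majority vote, reporting `N` when there is no clear majority. Production assemblers also weigh base quality, which is omitted here.

```python
from collections import Counter
from typing import List


def consensus(aligned_reads: List[str], gap: str = "-") -> str:
    """Majority-vote consensus over pre-aligned, equal-length reads."""
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned reads must all have the same length")

    result = []
    for column in zip(*aligned_reads):
        calls = [base for base in column if base != gap]
        if not calls:
            result.append("N")  # no read covers this position
            continue
        ranked = Counter(calls).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append("N")  # tie: leave the base ambiguous
        else:
            result.append(ranked[0][0])
    return "".join(result)


if __name__ == "__main__":
    reads = [
        "ACGTAC--",
        "ACGAACGT",  # contains one miscalled base at position 3
        "--GTACGT",
    ]
    print(consensus(reads))  # ACGTACGT
```

The single miscalled base is outvoted by the two other reads covering that column, which is the basic reason overlapping analyses yield a more accurate finished sequence than any one read.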

High-Resolution Physical Mapping

The UW Genome Center also is developing improved software that addresses a higher-level problem in large-scale sequencing. The starting point for large-scale sequencing typically is a recombinant-DNA molecule that allows propagation of a particular human genomic segment spanning 50,000 to 200,000 bases. Much effort during the last decade has gone into the physical mapping of such molecules, a process that allows huge regions of chromosomes to be defined in terms of sets of overlapping recombinant-DNA molecules whose precise positions along the chromosome are known. However, the precision required for knowing relationships of recombinant-DNA molecules derived from neighboring chromosomal portions increases as the Human Genome Project shifts its emphasis from mapping to sequencing.

High-resolution maps both guide the orderly sequencing of chromosomes and play a critical role in quality control. Only by mapping recombinant-DNA molecules at high resolution can subtle defects in particular molecules be recognized. Such defective human DNA sources, which are not faithful replicas of the human genome, must be weeded out before sequencing can begin. The UW Genome Center has a major program in high-resolution physical mapping which, like the work on sequencing itself, uses advanced computing tools. The center is producing maps of regions targeted for sequencing on a just-in-time basis. These highly detailed maps are proving extremely valuable in facilitating the production of high-quality sequence.

Ultimate Goal

Although many challenges currently posed by the Human Genome Project are highly technical, the ultimate goal is biological. The project will deliver immense amounts of high-quality, continuous DNA sequence into publicly accessible databases. These data will be annotated so that biologists who use them will know the most likely positions of genes and have convenient access to the best available clues about the probable function of these genes. The better the technical solutions to current challenges, the better the center will be able to serve future users of the human genome sequence.

DOE Human Genome Program Report, University of Washington

Research Narratives


Genome Database

Genome Database
Johns Hopkins University
2024 E. Monument Street
Baltimore, MD 21205-2236

Stanley Letovsky
Informatics Director

Robert Cottingham
Operations Director

Telephone for both: 410/955-9705
Fax for both: 410/614-0434

David Kingsbury
Director, 1993–97*

http://www.gdb.org

*Now at Chiron Pharmaceuticals, Emeryville, California

In lieu of individual abstracts, research projects and investigators at GDB are represented in this narrative. More information can be found on GDB’s Web site (see URL above).

The release of Version 6 of the Genome Database (GDB) in January 1996 signaled a major change for both the scientific community and GDB staff. GDB 6.0 introduced a number of significant improvements over previous versions of GDB, most notably a revised data representation for genes and genomic maps and a new curatorial model for the database. These new features, along with a remodeled database structure and new schema and user interface, provide a resource with the potential to integrate all scientific information currently available on human genomics. GDB rapidly is becoming the international biomedical research community’s central source for information about genomic structure, content, diversity, and evolution.

A New Data Model

Inherent in the underlying organization of information in GDB is an improved model for genes, maps, and other classes of data. In particular, genomic segments (any named region of the genome) and maps are being expanded regularly. New segment types have been added to support the integration of mapping and sequencing data (for example, gene elements and repeats) and the construction of comparative maps (syntenic regions). New map types include comparative maps for representing conserved syntenies between species and comprehensive maps that combine data from all the various submitted maps within GDB to provide a single integrated view of the genome. Experimental observations such as order, size, distance, and chimerism are also available.

Through the World Wide Web, GDB links its stored data with many other biological resources on the Internet. GDB’s External Link category is a growing collection of cross-references established between GDB entities and related information in other databases. By providing a place for these cross-references, GDB can serve as a central point of inquiry into technical data regarding human genomics.

Direct Community Data Submission and Curation

Two methods for data submission are in use. For individuals submitting small amounts of data, interactive editing of the database through the Web became available in April 1996, and the process has undergone several simplifications since that time. This continues to be an area of development for GDB because all editing must take place at the Baltimore site, and Internet connections from outside North America may be too slow for interactive editing to be practical. Until these difficulties are resolved, GDB encourages scientists with limited connectivity to Baltimore to submit their data via more traditional means (e-mail, fax, mail, phone) or to prepare electronic submissions for entry by the data group on site.

For centers submitting large quantities of data, GDB developed an electronic data submission (EDS) tool, which provides the means to specify login password validation and commands for inserting and updating data in GDB. The EDS syntax includes a mechanism for relating a center’s local naming conventions to GDB objects. Data submitted to GDB may be stored privately for up to 6 months before it automatically becomes public. The database is programmed to enforce this Human Genome Project policy. Detailed specifications of GDB’s EDS syntax and other submission instructions are available (EDS prototype, http://www.gdb.org/eds).
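The 6-month rule can be expressed as a simple date check. The sketch below is an illustration only, not GDB's implementation; it assumes calendar months, clamps to the last day of shorter months, and uses hypothetical function names and dates.

```python
import calendar
from datetime import date

MAX_HOLD_MONTHS = 6  # private-storage limit stated in the report


def add_months(d: date, months: int) -> date:
    """Return d shifted forward by whole calendar months, clamping the day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_date(submitted: date, hold_months: int = MAX_HOLD_MONTHS) -> date:
    """Date on which a private submission automatically becomes public."""
    if not 0 <= hold_months <= MAX_HOLD_MONTHS:
        raise ValueError("hold period must be between 0 and 6 months")
    return add_months(submitted, hold_months)


def is_public(submitted: date, today: date,
              hold_months: int = MAX_HOLD_MONTHS) -> bool:
    """True once the hold period has elapsed."""
    return today >= release_date(submitted, hold_months)


if __name__ == "__main__":
    print(release_date(date(1996, 4, 15)))                   # 1996-10-15
    print(is_public(date(1996, 4, 15), date(1996, 10, 14)))  # False
    print(is_public(date(1996, 4, 15), date(1996, 10, 15)))  # True
```

Rejecting hold periods longer than six months mirrors the statement that the database itself enforces the policy rather than relying on submitters.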

Since the EDS system was implemented, GDB has put forth an aggressive effort to increase the amount of data stored in the database. Consequently, the database has grown tremendously. During 1996 it grew from 1.8 to 6.7 gigabytes.

To provide accountability regarding data quality, the shift to community curation introduced the idea that individuals and laboratories own the data they submit to GDB and that other researchers cannot modify it. However, others should be able to add information and comments, so an additional feature is the community’s ability to conduct electronic online public discussions by annotating the database submissions of fellow researchers. GDB is the first database of its kind to offer this feature, and the number of third-party annotations is increasing in the form of editorial commentary, links to literature citations, and links to other databases external to GDB. These links are an important part of the curatorial process because they make other data collections available to GDB users in an appropriate context.

Improved Map Representation and Querying

Accompanying the release of GDB 6.0, the program Mapview creates graphical displays of maps. Mapview was developed at GDB to display a number of map types (cytogenetic, radiation hybrid, contig, and linkage) using common graphical conventions found in the literature. Mapview is designed to stand alone or to be used in conjunction with a Web browser such as Netscape, thereby creating an interactive graphical display system. When used with Netscape, Mapview allows the user to retrieve details about any displayed map object.

Maps are accessed through the query form for genomic segment and its subclasses via a special program that allows the user to select whole maps or slices of maps from specific regions of interest and to query by map type. The ability to browse maps stored in GDB or download them in the background was also incorporated into GDB 6.0.

GDB stores many maps of each chromosome, generated by a variety of mapping methods. Users who are interested in a region, such as the neighborhood of a gene or marker, will be able to see all maps that have data in that region, whether or not they contain the desired marker. To support database querying by region of interest, integrated maps have been developed that combine data from all the maps for each chromosome. These are called Comprehensive Maps.

Queries for all loci in a region of interest are processed against the comprehensive maps, thereby searching all relevant maps. Comprehensive maps are also useful for display purposes because they organize the content of a region by class of locus (e.g., gene, marker, clone) rather than by data source. This approach yields a much less complex presentation than an alignment of numerous primary maps. Because such information as detailed orders, order discrepancies between maps, and nonlinear metric relations between maps is not always captured in the comprehensive maps, GDB continues to provide access to aligned displays of primary maps.

A Variety of Searching Strategies

Recognizing the eclectic user community’s need to search data and formulate queries, GDB offers a spectrum of simple to complex search strategies. In addition, direct programming access is available using either GDB’s object query language to the Object Broker software layer or Structured Query Language (SQL) to the underlying Sybase relational database.

Querying by Object Directly from GDB’s Home Page

The simplest methods search for objects according to known GDB accession numbers; sequence database accession numbers; specified names, including wildcard symbols that will automatically match synonyms and primary names; and keywords contained anywhere in the text.
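A wildcard name search that matches both primary names and synonyms can be sketched in a few lines of Python. This is only an illustration of the idea, not GDB’s query engine: the record layout, the accession strings, and the function `find_by_name` are hypothetical, and the shell-style `*` wildcard from the standard `fnmatch` module stands in for whatever wildcard rules GDB applies.

```python
from fnmatch import fnmatchcase

# Illustrative records only; the accession strings are made up.
LOCI = [
    {"accession": "EX:0001", "name": "PAX3", "synonyms": ["HUP2", "WS1"]},
    {"accession": "EX:0002", "name": "PAX6", "synonyms": ["AN2"]},
    {"accession": "EX:0003", "name": "CFTR", "synonyms": ["CF"]},
]


def find_by_name(pattern, records=LOCI):
    """Return records whose primary name or any synonym matches the
    wildcard pattern (case-insensitive, '*' matches any run of characters)."""
    wanted = pattern.upper()
    hits = []
    for record in records:
        candidates = [record["name"], *record["synonyms"]]
        if any(fnmatchcase(name.upper(), wanted) for name in candidates):
            hits.append(record)
    return hits
```

Calling `find_by_name("PAX*")` returns the two PAX records, and `find_by_name("ws1")` finds PAX3 through its synonym.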

DOE Human Genome Program Report, GDB


Querying by Region of Interest

A region of interest can be specified using a pair of flanking markers, which can be cytogenetic bands, genes, amplimers (sequence tagged sites), or any other mapped objects. Given a region of interest, the comprehensive maps are searched to find all loci that fall within it. These loci can be displayed in a table, graphically as a slice through a comprehensive map, or as slices through a chosen set of primary maps. A comprehensive map slice shows all loci in the region, including genes, expressed sequence tags (ESTs), amplimers, and clones. A region also can be specified as a neighborhood around a single marker of interest.
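The two ways of specifying a region (a pair of flanking markers, or a neighborhood around one marker) can be illustrated with a toy comprehensive map. The sketch below is an assumption-laden simplification: it treats the map as a list of loci with a single numeric position each, and the locus names, positions, and function names are invented for the example rather than taken from GDB.

```python
from typing import List, NamedTuple


class Locus(NamedTuple):
    name: str
    locus_class: str   # e.g. gene, EST, amplimer, clone
    position: float    # arbitrary map units, assumed for illustration


COMPREHENSIVE_MAP = [
    Locus("M1", "amplimer", 10.0),
    Locus("G1", "gene", 12.5),
    Locus("E1", "EST", 14.0),
    Locus("C1", "clone", 15.5),
    Locus("M2", "amplimer", 18.0),
    Locus("G2", "gene", 25.0),
]


def position_of(name: str, loci: List[Locus]) -> float:
    """Look up the map position of a named locus."""
    for locus in loci:
        if locus.name == name:
            return locus.position
    raise KeyError(name)


def loci_between(flank_a: str, flank_b: str, loci: List[Locus]) -> List[Locus]:
    """All loci lying between two flanking markers, inclusive, in map order."""
    a, b = position_of(flank_a, loci), position_of(flank_b, loci)
    low, high = min(a, b), max(a, b)
    inside = [locus for locus in loci if low <= locus.position <= high]
    return sorted(inside, key=lambda locus: locus.position)


def neighborhood(marker: str, radius: float, loci: List[Locus]) -> List[Locus]:
    """All loci within `radius` map units of a single marker, in map order."""
    center = position_of(marker, loci)
    near = [locus for locus in loci if abs(locus.position - center) <= radius]
    return sorted(near, key=lambda locus: locus.position)
```

The order in which the flanking markers are given does not matter, since the function sorts the two positions before filtering.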

Results of queries for genes, amplimers, ESTs, or clones can be displayed on a GDB comprehensive map. Results are spread across several chromosomes displayed in Mapview (see figure, p. 52). A query for all the PAX genes (specified as symbol = PAX* on the gene query form) retrieves genes on multiple chromosomes. Double-clicking on one of these genes brings up detailed gene information via the Web browser.

Querying by Polymorphism

GDB contains a large number of polymorphisms associated with genes and other markers. Queries can be constructed for a particular type of marker (e.g., gene, amplimer, clone), polymorphism (e.g., dinucleotide repeat), or level of heterozygosity. These queries can be combined with positional queries to find, for example, polymorphic amplimers in a region bounded by flanking markers or in a particular chromosomal band. If desired, the retrieved markers can be viewed on a comprehensive map.

Work in Progress

Mapview 2.3

Mapview 2.1, the next generation of the GDB map viewer, was released in March 1997. The latest version, Mapview 2.3, is available in all common computing environments because it is written in the Java programming language. Most important, the new viewer can display multiple aligned maps side by side in the window, with alignment lines indicating common markers in neighboring maps. As before, users can select individual markers to retrieve more information about them from the database.

GDB developers have entered into a collaborative relationship with other members of the bioWidget Consortium so the Java-based alignment viewer will become part of a collection of freely available software tools for displaying biological data (http://goodman.jax.org/projects/biowidgets/consortium).

Future plans for Mapview include providing or enhancing the ability to generate manuscript-ready Postscript map images, highlight or modify the display of particular classes of map objects based on attribute values, and requery for additional information.

Variation

Since its inception, GDB has been a repository for polymorphism data, with more than 18,000 polymorphisms now in GDB. A collaboration has been initiated with the Human Gene Mutation Database (HGMD) based in Cardiff, Wales, and headed by David Cooper and Michael Krawczak. HGMD’s extensive collection of human mutation data, covering many disease-causing loci, includes sequence-level mutation characterizations. This data set will be included in GDB and updated from HGMD on an ongoing basis. The HGMD team also will provide advice on GDB’s representation of genetic variation, which is being enhanced to model mutations and polymorphisms at the sequence level. These modifications will allow GDB to act as a repository for single-nucleotide polymorphisms, which are expected to be a major source of information on human genetic variation in the near future.

Graphical Display of Results of Query for Genes with Names Matching “PAX*.” [Source: Robert Cottingham, GDB]

Mouse Synteny

Genomic relationships between mouse and man provide important clues regarding gene location, phenotype, and function (see figure, p. 53). One of GDB’s goals is to enable direct comparisons between these two organisms, in collaboration with the Mouse Genome Database at Jackson Laboratory. GDB is making additions to its schema to represent this information so that it can be displayed graphically with Mapview. In addition, algorithmic work is under way to use mapping data to automatically identify regions of conserved synteny between mouse and man. These algorithms will allow the synteny maps to be updated regularly. An important application of comparative mapping is the ability to predict the existence and location of unknown human homologs of known, mapped mouse genes. A set of such predictions is available in a report at the GDB Web site, and similar data will be available in the database itself in the spring of 1998.

Rearranged Mouse Map Aligned Against Human Chromosome (panel labels: Human Map, Mouse Maps, Syntenic Blocks). [Source: Robert Cottingham]
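The report does not describe the algorithms GDB is developing for finding conserved synteny, so the following Python sketch should be read as a deliberately simple stand-in: it walks along the human markers in map order and groups consecutive markers whose mouse homologs lie on the same mouse chromosome. The marker names, the mouse chromosome assignments, and the `min_markers` threshold are all invented for illustration.

```python
from itertools import groupby


def syntenic_blocks(ordered_homologs, min_markers=2):
    """Group consecutive human markers by the mouse chromosome of their homolog.

    ordered_homologs: list of (human_marker, mouse_chromosome) pairs,
                      already sorted by position on one human chromosome.
    Returns a list of (mouse_chromosome, [human_markers]) for every run
    containing at least `min_markers` markers.
    """
    blocks = []
    for mouse_chr, run in groupby(ordered_homologs, key=lambda pair: pair[1]):
        markers = [human_marker for human_marker, _ in run]
        if len(markers) >= min_markers:
            blocks.append((mouse_chr, markers))
    return blocks


# Made-up example: six human markers in map order with hypothetical
# mouse chromosome assignments for their homologs.
EXAMPLE = [
    ("geneA", "11"), ("geneB", "11"), ("geneC", "11"),
    ("geneD", "7"),
    ("geneE", "2"), ("geneF", "2"),
]
```

For `EXAMPLE`, the function reports one block on mouse chromosome 11 and one on chromosome 2; the single marker mapping to chromosome 7 is dropped because it falls below the threshold. A realistic method would also have to handle map order within the mouse and conflicting map data, which this sketch ignores.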

Collaborations

GDB is a participant in the Genome Annotation Consortium (GAC) project, whose goal is to produce high-quality, automatic annotation of genomic sequences (http://compbio.ornl.gov/CoLab). Currently, GDB is developing a prototype mechanism to transition from GDB’s Mapview display to the GAC sequence-level browser over common genome regions. GAC also will establish a human genome reference sequence that will be the base against which GDB will refer all polymorphisms and mutations. Ultimately, every genomic object in GDB should be related to an appropriate region of the reference sequence.

Sequencing Progress

The sequencing status of genomic regions now can be recorded in GDB. Based on submissions to sequence databases, GAC will determine genomic regions that have been completed. GDB also will be collaborating with the European Bioinformatics Institute, in conjunction with the international Human Genome Organisation (HUGO), to maintain a single shared Human Sequence Index that will record commitments and status for sequencing clones or regions. As a result, the sequencing status of any region can be displayed alongside other GDB mapping data.

Outreach

The Genome Database continues to seek direct community feedback and to interact with the broader science community through several channels:

• International Scientific Advisory Committee meets annually to offer input and advice.

• Quarterly Review Committee confers frequently with the staff to track GDB progress and suggest change.

• HUGO nomenclature, chromosome, and other editorial committees have specialized functions within GDB, providing official names and consensus maps and ensuring the high quality of GDB’s content.

Copies of GDB are available worldwide from ten mirror sites (nodes) that make the data more easily accessible to the international research community. GDB staff meet annually with node managers to facilitate interaction and to benefit from other user perspectives.



Research Narratives


National Center for Genome Resources

http://www.ncgr.org

This chart illustrates the taxonomic distribution of the 1,076,481,102 base pairs in the Genome Sequence DataBase. About 47% of the base pairs and 58% of the total database records represent human sequences (August 1997). [Source: Adapted from chart provided by Carol Harger, GSDB]

Human              47%
Other              32%
Microbial          10%
Plant               7%
Rodent + Primate    4%

In lieu of individual abstracts, research projects and investigators at NCGR are represented in this narrative. More information can be found on the center’s Web site (see URL above).



The National Center for Genome Resources (NCGR) is a not-for-profit organization created to design, develop, support, and deliver resources in support of public and private genome and genetic research. To accomplish these goals, NCGR is developing and publishing the Genome Sequence DataBase (GSDB) and the Genetics and Public Issues (GPI) program.

NCGR is a center to facilitate the flow of information and resources from genome projects into both public and private sectors. A broadly based board of governors provides direction and strategy for the center’s development.

NCGR opened in Santa Fe in July 1994, with its initial bioinformatics work being developed through a cooperative 5-year agreement with the Department of Energy funded in July 1995. Committed to serving as a resource for all genomic research, the center works collaboratively with researchers and seeks input from users to ensure that tools and projects under development meet their needs.

Genome Sequence DataBase

GSDB is a relational database that contains nucleotide sequence data (see pie chart) and its associated annotation from all known organisms (http://www.ncgr.org/gsdb). All data are freely available to the public. The major goals of GSDB are to provide the support structure for storing sequence data and to furnish useful data-retrieval services.

GSDB adheres to the philosophy that the database is a “community-owned” resource that should be simple to update to reflect new discoveries about sequences. A corollary to this is GSDB’s conviction that researchers know their areas of expertise much better than a database curator and, therefore, they should be given ownership and control over the data they submit to the database. The true role of the GSDB staff is to help researchers submit data to and retrieve data from the database.

GSDB Enhancements

During 1996, GSDB underwent a major renovation to support new data types and concepts that are important to genomic research. Tables within the database were restructured, and new tables and data fields were added. Some key additions to GSDB include the support of data ownership, sequence alignments, and discontiguous sequences.

The concept of data ownership is a cornerstone to the functioning of the new GSDB. Every piece of data (e.g., sequence or feature) within the database is owned by the submitting researcher, and changes can be made only by the data owner or GSDB staff. This implementation of data ownership provides GSDB with the ability to support community (third-party) annotation, that is, the addition of annotation to a sequence by other community researchers.
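The ownership rule can be pictured as a small access-control check: only the owner or staff may change a record, while anyone may attach an annotation. The Python sketch below is a minimal illustration under that reading and says nothing about how GSDB’s relational schema actually enforces it; the class, the `STAFF` identifier, and the user names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

STAFF = "gsdb_staff"  # hypothetical identifier for database staff


@dataclass
class SequenceRecord:
    owner: str
    sequence: str
    # Third-party annotations are kept separately as (author, text) pairs,
    # so adding one never alters the owner's data.
    annotations: List[Tuple[str, str]] = field(default_factory=list)

    def update_sequence(self, user: str, new_sequence: str) -> None:
        """Only the data owner or staff may change the submitted data."""
        if user not in (self.owner, STAFF):
            raise PermissionError(f"{user} does not own this record")
        self.sequence = new_sequence

    def annotate(self, user: str, text: str) -> None:
        """Any community researcher may add an annotation."""
        self.annotations.append((user, text))
```

Keeping annotations in a separate list, each tagged with its author, is one way to let third parties contribute while leaving the owner’s submission untouched.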

Genome Sequence DataBase
1800 Old Pecos Trail, Suite A
Santa Fe, NM 87505

Peter Schad
Vice-President, Bioinformatics and Biotechnology
505/995-4447, Fax: [email protected]

Carol Harger
GSDB Manager
505/982-7840, Fax: [email protected]

DOE Human Genome Program Report, NCGR

A second enhancement of GSDB is the ability to store and represent sequence alignments. GSDB staff has been constructing alignments to several key sequences including the env and pol (reverse transcriptase) genes of the HIV genome, the complete chromosome VIII of Saccharomyces cerevisiae, and the complete genome of Haemophilus influenzae. These alignments are useful for locating possible sites of biological interest and for rapidly identifying differences between sequences.

A third key GSDB enhancement is the ability to represent known relationships of order and distance between separate individual pieces of sequence. These sets of sequences and their relative positions are grouped together as a single discontiguous sequence. Such a sequence may be as simple as two primers that define the ends of a sequence tagged site (STS), it may comprise all exons that are part of a single gene, or it may be as complex as the STS map for an entire chromosome.

GSDB staff has constructed discontiguous sequences for human chromosomes 1 through 22 and X that include markers from Massachusetts Institute of Technology–Whitehead Institute STS maps and from the Stanford Human Genome Center. The set of 2000 STS markers for chromosome X, which were mapped recently by Washington University at St. Louis, also has been added to chromosome X. About 50 genomic sequences have been added to the chromosome 22 map by determining their overlap with STS markers. Genomic sequences are being added to all the chromosomes as their overlap with the STS markers is determined. These discontiguous sequences can be retrieved easily and viewed via their sequence names using the GSDB Annotator. Sequence names follow the format of HUMCHR#MP, where # equals 1 through 22 or X.
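As a rough illustration of the discontiguous-sequence idea and of the HUMCHR#MP naming pattern, the Python sketch below models a set of sequence pieces with known relative offsets and builds the chromosome map names. It is a toy data structure, not GSDB’s schema: the class and field names, the 0-based offsets, and the example primer sequences are assumptions.

```python
from dataclasses import dataclass
from typing import List

CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X"]


def map_sequence_name(chromosome) -> str:
    """Build a name in the HUMCHR#MP format, where # is 1 through 22 or X."""
    label = str(chromosome).upper()
    if label not in CHROMOSOMES:
        raise ValueError(f"no map sequence for chromosome {chromosome!r}")
    return f"HUMCHR{label}MP"


@dataclass(frozen=True)
class Piece:
    label: str
    offset: int  # start of the piece relative to the whole, 0-based (assumed)
    bases: str


@dataclass
class DiscontiguousSequence:
    name: str
    pieces: List[Piece]

    def ordered(self) -> List[Piece]:
        """Pieces sorted by their relative position."""
        return sorted(self.pieces, key=lambda piece: piece.offset)

    def gaps(self) -> List[int]:
        """Unsequenced distance between each pair of neighboring pieces."""
        ordered = self.ordered()
        return [
            right.offset - (left.offset + len(left.bases))
            for left, right in zip(ordered, ordered[1:])
        ]


# Simplest case from the text: two primers defining the ends of an STS.
# The primer sequences and the 200-base span are invented.
sts = DiscontiguousSequence(
    name="example_STS",
    pieces=[
        Piece("reverse primer", 190, "TTGACCGTAA"),
        Piece("forward primer", 0, "ACGTACGTAC"),
    ],
)
```

Here `sts.gaps()` reports a single gap of 180 bases between the two 10-base primers, and `map_sequence_name(22)` yields `HUMCHR22MP`.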

GSDB staff also has utilized discontiguous sequences to construct maps for maize and rice. The maize discontiguous sequences were constructed using markers from the University of Missouri, Columbia. Markers for the rice discontiguous sequence were obtained from the Rice Genome Database at Cornell University and the Rice Genome Research Project in Japan.

New Tools

As a result of the major GSDB renovation, new tools were needed for submitting and accessing database data. Annotator was developed as a graphical interface that can be used to view, update, and submit sequence data (http://www.ncgr.org/gsdb/beta.html). Maestro, a Web-based interface, was developed to assist researchers in data retrieval (http://www.ncgr.org/gsdb/maestrobeta.html). Although both these tools currently are available to researchers, GSDB is continuing development to add increased capabilities.

Annotator displays a sequence and its associated biological information as an image, with the scale of the image adjustable by the user. Additional information about the sequence or an associated biological feature can be obtained in a pop-up window. Annotator also allows a user to retrieve a sequence for review, edit existing data, or add annotation to the record. Sequences can be created using Annotator, and any sequences created or edited can be saved either to a local file for later review and further editing or directly to the database.

Correct database structures are important for storing data and providing the research community with tools for searching and retrieving data. GSDB is making a concerted effort to expand and improve these services. The first generation of the Maestro query tool is available from the GSDB Web pages. Maestro allows researchers to perform queries on 18 different fields, some of which are queryable only through GSDB, for example, D segment numbers from the Genome Database at Johns Hopkins University in Baltimore.


Additionally, Maestro allows queries with mixed Boolean operators for a more refined search. For example, a user may wish to compare relatively long mouse and human sequences that do not contain identified coding regions. To obtain all sequences meeting these criteria, the scientific name field would be searched first for “Mus musculus” and then for “Homo sapiens” using the Boolean term “OR.” Then the sequence-length filter could be used to refine the search to sequences longer than 10,000 base pairs. To exclude sequences containing identified coding-region features, the “BUT NOT” term can be used with the Feature query field set equal to “coding region.”
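The logic of this mixed Boolean query can be written out explicitly. The Python sketch below applies the same three conditions to a handful of invented records; it does not reproduce Maestro’s interface or query syntax, and the field names (`organism`, `length`, `features`) and record identifiers are assumptions chosen for the example.

```python
# Invented records standing in for database entries.
RECORDS = [
    {"id": "r1", "organism": "Mus musculus", "length": 15000, "features": ["repeat region"]},
    {"id": "r2", "organism": "Homo sapiens", "length": 22000, "features": ["coding region"]},
    {"id": "r3", "organism": "Homo sapiens", "length": 12000, "features": []},
    {"id": "r4", "organism": "Rattus norvegicus", "length": 30000, "features": []},
    {"id": "r5", "organism": "Mus musculus", "length": 800, "features": []},
]


def matches(record) -> bool:
    """(mouse OR human) AND longer than 10,000 bp BUT NOT coding region."""
    organism_ok = (
        record["organism"] == "Mus musculus"
        or record["organism"] == "Homo sapiens"
    )
    long_enough = record["length"] > 10_000
    has_coding_region = "coding region" in record["features"]
    return organism_ok and long_enough and not has_coding_region


selected = [record["id"] for record in RECORDS if matches(record)]
```

With these records, `selected` contains `r1` and `r3`: `r2` is excluded by the BUT NOT condition, `r4` by the organism test, and `r5` by the length filter.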

With Maestro, users can view the list of search matches a few at a time and retrieve more of the list as needed. From the list, users can select one or several sequences according to their short descriptions and review or download the sequence information in GIO, FASTA, or GSDB flatfile format.

Future Plans

Although most pieces necessary for operation are now in place, GSDB is still improving functionality and adding enhancements. During the next year GSDB, in collaboration with other researchers, anticipates creating more discontiguous sequence maps for several model organisms, adding more functionality to its tools, and providing a Web-based submission tool and a tool kit for creating GIO files.

Microbial Genome Web Pages

NCGR also maintains informational Web pages on microbial genomes. These pages, created as a community reference, contain a list of current or completed eubacterial, archaeal, and eukaryotic genome sequencing projects. Each main page includes the name of the organism being sequenced, sequencing groups involved, background information on the organism, and its current location on the Carl Woese Tree of Life. As the Microbial Genome Project progresses, the pages will be updated as appropriate.

Genetics and Public Issues Program

GPI serves as a crucial resource for people seeking information and making decisions about genetics or genomics (http://www.ncgr.org/gpi). GPI develops and provides information that explains the ethical, legal, policy, and social relevance of genetic discoveries and applications.

To achieve its mission, GPI has set forth three goals: (1) preparation and development of resources, including careful delineation of ethical, legal, policy, and social issues in genetics and genomics; (2) dissemination of genetic information targeted to the public, legal and health professionals, policymakers, and decision makers; and (3) creation of an information network to facilitate interaction among groups.

GPI delivers information through four primary vehicles: online resources, conferences, publications, and educational programs. The GPI program maintains a continually evolving World Wide Web site containing a range of material freely accessible over the Internet.


Los Alamos National Laboratory researcher David Bruce uses an automated system for gridding chromosome library clones in preparation of very dense filter arrays for hybridization experiments. [Source: Lynn Clark, LANL]



Program Management

The Human Genome Program was conceived in 1986 as an initiative within the DOE Office of Health and Environmental Research, which has been renamed Office of Biological and Environmental Research (OBER) (see chart below). The program is administered primarily through the OBER Health Effects and Life Sciences Research Division (HELSRD), both directed by David A. Smith until his retirement in January 1996. Marvin Frazier is now Director of HELSRD, and OBER is led by Associate Director Aristides Patrinos, who also serves as Human Genome Program manager. Previous directors and managers are listed in the table below. OBER is within the Office of Energy Research, directed by Martha Krebs.

See Appendix A, p. 73, for information on Human Genome Project history, including enabling legislation.

DOE OBER Mission

Based on mandates from Congress, DOE OBER’s principal missions are to (1) develop the knowledge necessary to identify, understand, and anticipate long-term health and environmental consequences of energy use and development and (2) employ DOE’s unique scientific and technological capabilities in solving major scientific problems in medicine, biology, and the environment.

Genome integrity and radiation biology have been long-term concerns of OBER at DOE and its predecessors, the Atomic Energy Commission (AEC) and the Energy Research and Development Administration (ERDA). In the United States, the first federal support for genetic research was through AEC. In the early days of nuclear energy development, the focus was on radiation effects; it broadened later under ERDA and DOE to include health implications of all energy technologies and their by-products.

OHER Associate or Acting Directors
Charles DeLisi         1985
Robert W. Wood         1987
David J. Galas         1990
Aristides Patrinos     1993

Human Genome Program Managers
Benjamin J. Barnhart   1988
David A. Smith         1991
Aristides Patrinos     1996

Institutions Conducting DOE-Sponsored Genome Research
DOE national laboratories                                   7
Academic institutions                                      28
Private-sector institutions                                10
Companies, including Small Business Innovation Research    11
Foreign institutions (Russia, Canada, Israel)               7

http://www.er.doe.gov/production/ober/hug_top.html

Organization chart (box labels): Office of Energy Research; Office of Biological and Environmental Research; Biological and Environmental Research Advisory Committee; Health Effects and Life Sciences Research Division; Human Genome Task Group; Merit Panel Reviews; Biotechnology Consortium; Projects at Universities, National Laboratories, and Industrial Institutions; Other Funding Agencies.

Today, extensive OBER-sponsored research programs on genomic structure, maintenance, damage, and repair continue at the national laboratories and universities. These and other OBER efforts support a DOE shift toward a preventive approach to health, environment, and safety concerns. World-class scientists in top facilities working on leading-edge problems spawn the knowledge to revolutionize the technology, drive the future, and add value to the U.S. economy. Major OBER research includes characterization of DNA repair genes and improvement of methodologies and resources for quantifying and characterizing genetic polymorphisms and their relationship to genetic susceptibilities.

To carry out its national research and development obligations, OBER conducts the following activities:

• Sponsors peer-reviewed research and development projects at universities, in the private sector, and at DOE national laboratories (see box, p. 59).

• Considers novel, beneficial initiatives with input from the scientific community and governmental sectors.

• Provides expertise to various governmental working groups.

• Supports the capabilities of multidisciplinary DOE national laboratories and their unique user facilities for the nation’s benefit (p. 61).

Human Genome Program resources and technologies are focused on sequencing the human genome and related informatics and supportive infrastructure (see chart and tables, p. 62). The genomes of selected microorganisms are analyzed under the separate Microbial Genome Program.

DOE Human Genome Task Group

Chair: Aristides Patrinos    DOE Office of Biological and Environmental Research

Charles Arntzen*             Cornell University
Elbert Branscomb             Lawrence Livermore National Laboratory
Charles Cantor               Boston University
Anthony Carrano              Lawrence Livermore National Laboratory
Thomas Caskey                Merck Research Laboratories
David Eisenberg              University of California, Los Angeles
Chris Fields†                National Center for Genome Resources
David Galas                  Darwin Molecular, Inc.
Raymond Gesteland            University of Utah
Keith Hodgson                Stanford University
Leroy Hood                   University of Washington, Seattle
David Kingsbury†             Chiron Pharmaceuticals
Robert Moyzis†               University of California, Irvine
Mohandas Narla*              Lawrence Berkeley National Laboratory
Michael Palazzolo            Amgen, Inc.
Melvin Simon*                California Institute of Technology
Hamilton Smith*              Johns Hopkins University School of Medicine
Lloyd Smith                  University of Wisconsin, Madison
Lisa Stubbs                  Lawrence Livermore National Laboratory
Edward Uberbacher*           Oak Ridge National Laboratory
Marc Van Montagu*            Ghent University, Belgium

Executive Officer:
Sylvia Spengler              Lawrence Berkeley National Laboratory

*Appointed after October 1996.
†Resigned, 1997.
Note: All members of the DOE Human Genome Task Group are ex-officio members of the Biotechnology Consortium.

Biotechnology Consortium

Member                       Specialty

Chair: Aristides Patrinos    Physical sciences
Benjamin J. Barnhart         Genetics, Radiation biology
Elbert Branscomb             Scientific Director, Joint Genome Institute
Daniel W. Drell              Biology, ELSI, Informatics, Microbial genome
Ludwig Feinendegen           Medicine, Radiation biology
Marvin Frazier               Molecular and cellular biology
Gerald Goldstein†            Physical science, Instrumentation
D. Jay Grimes†               Microbiology
Roland Hirsch                Structural biology, Instrumentation
Arthur Katz*                 Physical sciences
Anna Palmisano*†             Microbiology, Microbial genome
Michael Riches               Physical sciences
Jay Snoddy†                  Molecular biology, Informatics
Marvin Stodolsky             Molecular biology, Biophysics
David G. Thomassen           Cell and molecular biology
John C. Wooley               Computational biology

*Joined, 1997.
†Left OBER, 1997.

DOE Human Genome Program Report, Program Management


Major DOE User Facilities and Resources Relevant to Molecular Biology Research

Although the genome program is contributing fundamental information about the structure of chromosomes and genes, other types of knowledge are required to understand how genes and their products function. Three-dimensional protein structure studies are still essential because a protein’s structure cannot be predicted fully from the DNA sequence that encodes it.

To enhance these and other studies, DOE builds and maintains structural biology user facilities that enable scientists to gain an understanding of relationships between biological structures and their functions, study disease processes, develop new pharmaceuticals, and conduct basic research in molecular biology and environmental processes. These resources are used heavily by both academic and private-sector scientists.

Other important resources available to the research community include the clone libraries developed in the National Laboratory Gene Library Project and distributed worldwide, the GRAIL Online Sequence Interpretation Service, and the Mouse Genetics Research Facility.

Argonne National Laboratory
  Advanced Photon Source

Brookhaven National Laboratory
  High-Flux Beam Reactor
  National Synchrotron Light Source
  Protein Structure Data Bank
  Scanning Transmission Electron Microscope

Lawrence Berkeley National Laboratory
  Advanced Light Source
  Center for X-Ray Optics
  National Energy Research Scientific Computing Center

Lawrence Livermore National Laboratory
  National Laboratory Gene Library Project

Los Alamos National Laboratory
  National Flow-Cytometry Resource
  National Laboratory Gene Library Project
  Neutron-Scattering Center

Oak Ridge National Laboratory
  GRAIL, Online Sequence Interpretation Service
  Mouse Genetics Research Facility

Pacific Northwest National Laboratory
  Environmental Molecular Sciences Laboratory

Stanford University
  Synchrotron Radiation Laboratory



Human Genome Program

Coordination and Resources

Program coordination is the responsibility of the Human Genome Task Group (see box, p. 60), which, beginning in 1997, includes Elbert Branscomb, the Joint Genome Institute’s Scientific Director. The task group is aided by the Biotechnology Consortium (which succeeded the former Human Genome Coordination Committee; see box, p. 60) to foster information exchange and dissemination. The task group administers the DOE Human Genome Program and its evolving needs and reports to the Associate Director for Biological and Environmental Research (currently Aristides Patrinos). The task group arranges periodic workshops and coordinates site reviews for genome centers, the Joint Genome Institute, databases, and other large projects. It also coordinates peer review of research proposals, administration of awards, and collaboration with all concerned agencies and organizations.

The Biotechnology Consortium provides the OBER Associate Director with external expertise in all aspects of genomics and informatics and a mechanism by which OBER can keep track of the latest developments in the field. It facilitates development and dissemination of novel genome technologies throughout the DOE system, ensures appropriate management and sharing of data and resources by all DOE contractors and grantees, and promotes interactions with other national and international genomic entities.

Operating Expenditures and FY 1998 Projected Budget for the DOE Human Genome Program (chart; vertical axis: Dollars in Millions, 0 to 100; horizontal axis: Fiscal Year, 87 to 98)

Human Genome Program Operating Funds Distribution in FY 1996 ($K)

FY 1996            Mapping  Sequencing  Sequencing  Informatics   ELSI  Administration*  Totals      %
                                        Technology
DOE Laboratories     8,980      11,015      11,128        6,840    313            2,783  41,059   60.1
Academic             6,671       4,368       3,257        6,178    642                4  21,120   30.9
Nonprofit              563           0         467        2,783  1,311               38   5,162    7.5
Federal**                0           0           0            0      0            1,000   1,000    1.5
Total               16,214      15,383      14,852       15,801  2,266            3,825  68,341
% of Total            23.8        22.5        21.7         23.1    3.3              5.6     100

*Includes DOE laboratories' nonresearch costs but not U.S. government administration or SBIR.
**DOE contribution to the international Human Frontiers Neurosciences Program.
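The totals and percentages in the FY 1996 operating funds table can be checked with a few lines of arithmetic. The Python sketch below recomputes them from the dollar figures in the table; the column labels, and in particular the order of the two sequencing columns, are read from the flattened table header and should be treated as an assumption, as should the variable and function names.

```python
# FY 1996 operating funds in $K, copied from the table.
# Column order (assumed from the table header):
COLUMNS = ["Mapping", "Sequencing", "Sequencing Technology",
           "Informatics", "ELSI", "Administration"]

ROWS = {
    "DOE Laboratories": [8980, 11015, 11128, 6840, 313, 2783],
    "Academic":         [6671, 4368, 3257, 6178, 642, 4],
    "Nonprofit":        [563, 0, 467, 2783, 1311, 38],
    "Federal":          [0, 0, 0, 0, 0, 1000],
}

# Sum across each row and down each column.
row_totals = {name: sum(values) for name, values in ROWS.items()}
column_totals = [sum(column) for column in zip(*ROWS.values())]
grand_total = sum(row_totals.values())


def share(amount: int) -> float:
    """Percentage of the grand total, rounded to one decimal place."""
    return round(100 * amount / grand_total, 1)


row_shares = {name: share(total) for name, total in row_totals.items()}
column_shares = dict(zip(COLUMNS, (share(total) for total in column_totals)))
```

The recomputed row and column sums reproduce the printed totals, including the grand total of 68,341. The recomputed percentages agree with the printed ones to within 0.1 percentage point; the two that differ (Mapping and Nonprofit) may reflect adjustments made so that the printed percentages add up to 100, although the report does not say so.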

Human Genome Program Fiscal Year Expenditures ($M)

Year    Operating  Capital Equipment  Construction  Total
1996         68.3                5.6           5.7   79.6
1997         73.9                6.0           1.0   80.9
1998*        79.9                5.2           0.0   85.1

*Projected expenses.



Communication

The DOE Human Genome Program communicates information in a variety of ways. These communication systems include the Human Genome Management Information System (HGMIS), projects in the Ethical, Legal, and Social Issues (ELSI) Program, electronic resources, meetings, and fellowships. Some of these mechanisms are described below. For more details, see Research Highlights, ELSI projects, p. 18.

HGMIS

HGMIS provides technical communication and information services for the DOE OBER Human Genome Program Task Group. HGMIS is charged with (1) helping to communicate genome-related matters and research to contractors, grantees, other (nongenome project) researchers, and other multipliers of information pertaining to genetic research; (2) serving as a clearinghouse for inquiries about the U.S. genome project; and (3) reducing research duplication by providing a forum for interdisciplinary information exchange (including resources developed) among genetic investigators worldwide.

HGMIS publishes the newsletter Human Genome News, sponsored by OBER. Over 14,000 HGN subscribers include genome and basic researchers at national laboratories, universities, and other research institutions; professors and teachers; industry representatives; legal personnel; ethicists; students; genetic counselors; physicians; science writers; and other interested individuals.

HGMIS also produces the DOE Primer on Molecular Genetics; a compilation of ELSI abstracts; and reports on the DOE Human Genome and Microbial Genome Programs, contractor-grantee workshops, and other related subjects.

Electronic versions of the primer andother HGMIS publications are availablevia the World Wide Web. HGMIS also

initiates and maintains other relatedWeb sites (see DOE Electronic GenomeResources section below and DOE WebSites at right).

In addition to their print and online pub-lishing efforts, HGMIS staff membersanswer questions generated via Websites, telephone, fax, and e-mail. Theyalso furnish customized informationabout the genome project for multipliersof information (contact: Betty Mansfieldat 423/576-6669, Fax: /574-9888,[email protected]).

DOE Electronic GenomeResources

Web Sites. The DOE Human GenomeProgram Home Page displays pointersto other programs within OBER and theOffice of Energy Research. Links aremade to additional biological and envi-ronmental information and to HGMIS,Genome Database, and other sites.

HGMIS initiates and maintains thesearchable Human Genome Project In-formation Web site. This site containsmore than 1700 text files of informationfor multidisciplinary technical audiencesas well as for lay persons interested inlearning about the science, goals,progress, and history of the project. Us-ers include almost all levels of students;education, medical, and legal profes-sionals; genetic society and supportgroup members; biotechnology andpharmaceutical industry personnel; ad-ministrators; policymakers; and the press.

The site also houses a section of frequently asked questions, a quick fact finder, Primer on Molecular Genetics, all issues of Human Genome News, DOE Human Genome Program and contractor-grantee workshop reports, To Know Ourselves, historical documents, research abstracts, calendars of genome events, and hundreds of links to genome research and educational sites. More than 1000 other Web pages link to this site, resulting in more than 100,000 text file transfers each month. This HGMIS site has received a Four-Star designation from the Magellan Group and the Editor’s Choice Award from LookSmart.

Genome-project and related meetings are listed at a Web site (see box, p. 63), through which users can register and submit research abstracts. Another listed related site discusses issues at the critical intersection of genetics and the court system. This Web page is part of a project to educate and prepare the judiciary for the coming onslaught of cases involving genetic issues and data.

Newsgroup. The Human Genome Program Newsgroup operates through the BIOSCI electronic bulletin board network to allow researchers worldwide to communicate, share ideas, and find solutions to problems. Genome-related information is distributed through the newsgroup, including requests for grant applications, reports from recent scientific and advisory meetings, announcements of future events, and listings of free software and services ([email protected] or http://www.bio.net).

DOE Web Sites

DOE Human Genome Program
http://www.er.doe.gov/production/ober/hug_top.html

OBER
http://www.er.doe.gov/production/ober/ober_top.html

Office of Energy Research
http://www.er.doe.gov

Human Genome Project Information
http://www.ornl.gov/hgmis

HGP and Related Meetings
http://www.ornl.gov/meetings

Courts
http://www.ornl.gov/courts

The DOE Human Genome Program and Human Genome Project Information Web sites offer both general and scientific audiences thousands of text files and links for comprehensive coverage of all aspects of genome research worldwide. See text (pp. 63 and 65) for further details.

Postdoctoral Fellowships

OBER established the Human Genome Distinguished Postdoctoral Research Program in 1990 to support research on projects related to the DOE Human Genome Program. Beginning in FY 1996, the Human Genome Distinguished Postdoctoral Fellowships were merged with the Alexander Hollaender Distinguished Postdoctoral Fellowships, which provide support in all areas of OBER-sponsored research. Postdoctoral programs are administered by the Oak Ridge Institute for Science and Education, a university consortium and DOE contractor. For additional information, contact Linda Holmes (423/576-3192, [email protected]) or see the Web site (http://www.orau.gov/ober/hollaend.htm).

Human Genome Distinguished Postdoctoral Fellows

Names of past and current fellows in genome topics are given below with their research institutions and titles of proposed research. For 1996 research abstracts, refer to Index of Principal and Coinvestigators on p. 71 in Part 2 of this report.

1994 Mark Graves (Baylor College of Medicine): Graph Data Models for Genome Mapping

William Hawe (Duke University): Synthesis of Peptide Nucleic Acids for DNA Sequencing by Hybridization

Jingyue Ju (University of California, Berkeley): Design, Synthesis, and Use of Oligonucleotide Primers Labeled with Energy Transfer–Coupled Dyes

Mark Shannon (Oak Ridge National Laboratory): Comparative Study of a Conserved Zinc Finger Gene Region

1995 Evan Eichler (Lawrence Livermore National Laboratory): Identification, Organization, and Characterization of Zinc Finger Genes in a 2-Mb Cluster on 19p12

Kelly Ann Frazer (Lawrence Berkeley National Laboratory): In Vivo Complementation of the Murine Mutations Grizzled, Mocha, and Jittery

Soo-in Hwang (Lawrence Berkeley National Laboratory): Positional Cloning of Oncogenes on 20q13.2

James Labrenz (University of Washington, Seattle): Error Analysis of Principal Sequencing Data and Its Role in Process Optimization for Genome-Scale Sequencing Projects

Marie Ruiz-Martinez (Northeastern University): Multiplex Purification Schemes for DNA Sequencing–Reaction Products: Application to Gel-Filled Capillary Electrophoresis

Todd Smith (University of Washington, Seattle): Managing the Flow of Large-Scale DNA Sequence Information

Alexander Hollaender Distinguished Postdoctoral Fellows in Genome Research

1996 Cymbeline Culiat (Oak Ridge National Laboratory): Cloning of a Mouse Gene Causing Severe Deafness and Balance Defects

Tau-Mu Yi (Laboratory of Structural Biology and Molecular Medicine, Los Angeles): Structure-Function Analysis of Alpha-Factor Receptor

1997 Jeffrey Koshi (Los Alamos National Laboratory): Construction, Analysis, and Use of Optimal DNA Mutation Matrices

Sandra McCutchen-Maloney (Lawrence Livermore National Laboratory): Structure and Function of a Damage-Specific Endonuclease Complex


The laser-based flow cytometer developed at DOE national laboratories enables researchers to separate human chromosomes for analysis. [Source: Los Alamos National Laboratory]

Coordination with Other Genome Programs

The U.S. Human Genome Project is supported jointly by the Department of Energy (DOE) and the National Institutes of Health (NIH), each of which emphasizes different facets. The two agencies coordinate their efforts through development of common project goals and joint support of some programs addressing ethical, legal, and social issues (ELSI) arising from new genome tools, technology, and data.

Extraordinary advances in genome research are due to contributions by many investigators in this country and abroad. In the United States, such research (including nonhuman) also is funded by other federal agencies and private foundations and groups. Many countries are major contributors to the project through international collaborations and their own focused programs. Coordinating and facilitating these diverse research efforts around the world is the aim of the nongovernmental international Human Genome Organisation.

Some details of U.S. and worldwide coordination are provided below.

U.S. Human Genome Project: DOE and NIH

In 1988 DOE and NIH developed a Memorandum of Understanding that formalized the coordination of their efforts to decipher the human genome and thus “enhance the human genome research capabilities of both agencies.” In early 1990 they presented Congress with a joint plan, Understanding Our Genetic Inheritance, The U.S. Human Genome Project: The First Five Years (1991–1995). Referred to as the Five-Year Plan, it contained short-term scientific goals for the coordinated, multiyear research project and a comprehensive spending plan. Unexpectedly rapid progress in mapping prompted early revision of the original 5-year goals in the fall of 1993 [Science 262, 43–46 (October 1, 1993)]. Current goals, which run through September 30, 1998, are listed on page 5; text of both 5-year plans is accessible via the Web (http://www.ornl.gov/hgmis/project/hgp.html).

DOE and NIH have adopted a joint policy to promote sharing of genome data and resources for facilitating progress and reducing duplicated work. (See Appendix B: DOE-NIH Sharing Guidelines, p. 75.)

ELSI Considerations

NIH and DOE devote at least 3% of their respective genome program budgets to identifying, analyzing, and addressing the ELSI considerations surrounding genome technology and the data it produces. The DOE ELSI component focuses on research into the privacy and confidentiality of personal genetic information, genetics relevant to the workplace, commercialization (including patenting) of genome research data, and genetic education for the general public and targeted communities. The NIH ELSI component supports studies on a range of ethical issues surrounding the conduct of genetic research and responsible clinical integration of new genetic technologies, especially in testing for mutations associated with cystic fibrosis and heritable breast, ovarian, and colon cancers.

In 1990, the DOE-NIH Joint ELSI Working Group was established to identify, address, and develop policy options; stimulate bioethics research; promote education of professional and lay groups; and collaborate with such international groups as the Human Genome Organisation (HUGO); United Nations Educational, Scientific, and Cultural Organization; and the European Community. Research funded by the U.S. Human Genome Project through the joint working group has produced policy recommendations in various areas. In May 1993, for example, the DOE-NIH Joint ELSI Working Group Task Force on Genetic Information and Insurance issued a report with recommendations for managing the impact of advances in human genetics on the current system of healthcare coverage. In 1996, the working group released guidelines for investigators on using DNA from human subjects for large-scale sequencing projects. The guidance emphasizes numerous ways to preserve donor anonymity [see Appendix C, p. 77, and the World Wide Web (http://www.ornl.gov/hgmis/archive/nchgrdoe.html)].

In 1997, following an evaluation, the two agencies reorganized the ELSI working group as the ELSI Research and Program Evaluation Group (ERPEG). ERPEG will focus more specifically on research activities supported by DOE and NIH ELSI programs.

Other U.S. Programs

The potential impact of genome research on society and the rapid growth of the biotechnology industry have spurred the initiation of other genome research projects in this country and worldwide. These projects aim to create maps of the human genome and the genomes of model organisms and several economically important microbes, plants, and animals.

• The DOE Microbial Genome Program, begun in 1994, is producing complete genome sequence data on industrially important microorganisms, including those that live under extreme environmental conditions. The sequences of several microbial genomes have been completed. [http://www.er.doe.gov/production/ober/EPR/mig_top.html]

• In 1990, the National Science Foundation, DOE, and the U.S. Department of Agriculture (USDA) initiated a project to map and sequence the genome of the model plant Arabidopsis thaliana. The goal of this project is to enhance fundamental understanding of plant processes. In 1996, the three agencies began funding systematic, large-scale genomic sequencing of the 120-megabase Arabidopsis genome, with the goal of completing it by 2004. DOE support is provided through the Office of Basic Energy Sciences. [http://pgec-genome.pw.usda.gov/agi.html]

• USDA also funds plant and animal genome research projects designed to obtain genome maps for economically important species (e.g., corn, soybeans, poultry, cattle, swine, and sheep) to enable genetic modifications that will increase resistance to diseases and pests, improve nutrient value, and increase productivity.

• The Advanced Technology Program (ATP) of the U.S. National Institute of Standards and Technology promotes industry-government partnerships in DNA sequencing and biotechnology through the Tools for DNA Diagnostics component. DOE staff participates in the ATP review process (see box, p. 22). [http://www.atp.nist.gov]

• In 1997 the NIH National Cancer Institute established the Cancer Genome Anatomy Project (CGAP) to develop new diagnostic tools for understanding molecular changes that underlie all cancers (http://www.ncbi.nlm.nih.gov/ncicgap). DOE researchers are generating clone libraries to support this effort.

International Collaborations

The current DOE-NIH Five-Year Plan commends the “spirit of international cooperation and sharing” that has characterized the Human Genome Project and played a major role in its success. Cooperation includes collaborations among laboratories in the United States and abroad as well as extensive sharing of materials and information among genome researchers around the world. The DOE Human Genome Program supports many international collaborations as well as grantees in several foreign institutions.

DOE Human Genome Program Report, Coordination

Collaborations involving the DOE human genome centers include mapping chromosomes 16 and 19, developing resources, and constructing the human gene map from shared cDNA libraries. These libraries were generated by the Integrated Molecular Analysis of Gene Expression (called IMAGE) Consortium initiated by groups at Lawrence Livermore National Laboratory, Columbia University, NIH National Institute of Mental Health, and Généthon (France).

Investigators from almost every major sequencing center in the world met in Bermuda in February 1996 and again in 1997 to discuss issues related to large-scale sequencing. These meetings were designed to help researchers coordinate, compare, and evaluate human genome mapping and sequencing strategies; consider new sequencing and informatics technologies; and discuss release of data.

Human Genome Organisation

Founded by scientists in 1989, HUGO is a nongovernmental international organization providing coordination functions for worldwide genome efforts. HUGO activities range from support of data collation for constructing genome maps to organizing workshops. HUGO also fosters exchange of data and biomaterials, encourages technology sharing, and serves as a coordinating agency for building relationships among various government funding agencies and the genome community.

HUGO offers short-term (2- to 10-week) travel awards up to $1500 for investigators under age 40 to visit another country to learn new methods or techniques and to facilitate collaborative research between the laboratories.

HUGO has worked closely with international funding agencies to sponsor single-chromosome workshops (SCWs) and other genome meetings. Due to the success of these workshops as well as the shift in emphasis from mapping to sequencing, DOE and NIH began to phase out their funding for international SCWs in FY 1996 but encouraged applications for individual SCWs as needed. In 1996, HUGO partially funded an international strategy meeting in Bermuda on large-scale sequencing. Principles regarding data release and a resources list developed at the meeting are available on the HUGO Web site (http://hugo.gdb.org/hugo.html).

Membership in HUGO (over 1000 people in more than 50 countries) is extended to persons concerned with human genome research and related scientific subjects. Its current president is Grant R. Sutherland (Adelaide Women and Children’s Hospital, Australia). Directed by an 18-member international council, HUGO is supported by grants from the Howard Hughes Medical Institute and The Wellcome Trust.

Countries with Genome Programs

Countries with genome programs or strong programs in human genetics include Australia, Brazil, Canada, China, Denmark, European Union, France, Germany, Israel, Italy, Japan, Korea, Mexico, Netherlands, Russia, Sweden, United Kingdom, and United States.

Los Alamos National Laboratory researchers Peter Goodwin and Rhett Affleck load a sample of fluorescently labeled DNA into an ultrasensitive flow cytometer used to detect single cleaved nucleotides. [Source: Lynn Clark, LANL]


DOE Human Genome Program Report, Appendices

Appendices

Appendix A: Early History, Enabling Legislation (1984–90)

Appendix B: DOE-NIH Sharing Guidelines (1992)

Appendix C: Human Subjects Guidelines (1996)

Appendix D: Genetics on the World Wide Web (1997)

Appendix E: 1996 Human Genome Research Projects (1996)

Appendix F: DOE BER Program (1997)



Appendix A


DOE Human Genome Program: Early History, Enabling Legislation

A brief history of the U.S. Department of Energy (DOE) Human Genome Program will be useful in a discussion of the objectives of the DOE program as well as those of the collaborative U.S. Human Genome Project. The DOE Office of Biological and Environmental Research (OBER) and its predecessor agencies (the Atomic Energy Commission and the Energy Research and Development Administration) have long sponsored research into genetics, both in microbial systems and in mammals, including basic studies on genome structure, replication, damage, and repair and the consequences of genetic mutations. (See Appendix F for a discussion of the DOE Biological and Environmental Research Program.)

In 1984, OBER [then named Office of Health and Environmental Research (OHER)] and the International Commission on Protection Against Environmental Mutagens and Carcinogens cosponsored a conference in Alta, Utah, which highlighted the growing roles of recombinant DNA technologies. Substantial portions of the meeting’s proceedings were incorporated into the Congressional Office of Technology Assessment report, Technologies for Detecting Heritable Mutations in Humans, in which the value of a reference sequence of the human genome was recognized.

Acquisition of such a reference sequence was, however, far beyond the capabilities of biomedical research resources and infrastructure existing at that time. Although the small genomes of several microbes had been mapped or partially sequenced, the detailed mapping and eventual sequencing of 24 distinct human chromosomes (22 autosomes and the sex chromosomes X and Y) that together comprise an estimated 3 billion subunits was a task some thousandfold larger.

DOE OHER was already engaged in several multidisciplinary projects contributing to the nation’s biomedical capabilities, including the GenBank DNA sequence repository, which was initiated and sustained by DOE computer and data-management expertise. Several major user facilities supporting microstructure research were developed and are maintained by DOE. Unique chromosome-processing resources and capabilities were in place at Los Alamos National Laboratory and Lawrence Livermore National Laboratory. Among these were the fluorescence-activated cell sorter (called FACS) systems to purify human chromosomes within the National Laboratory Gene Library Project for the production of libraries of DNA clones. The availability of these monochromosomal libraries opened an important path: a practical means of subdividing the huge total genome into 24 much more manageable components.

With these capabilities, OHER began in 1986 to consider the feasibility of a dedicated human genome program. Leading scientists were invited to the March 1986 international conference at Santa Fe, New Mexico, to assess the desirability and feasibility of implementing such a project. With virtual unanimity, participants agreed that ordering and eventually sequencing DNA clones representing the human genome were desirable and feasible goals. With the receipt of this enthusiastic response, OHER initiated several pilot projects. Program guidance was further sought from the DOE Health Effects Research Advisory Committee (HERAC).

Enabling Legislation

In the United States, the first federal support for genetics research came through the Atomic Energy Commission (AEC). In the early days of nuclear energy development, the focus was on radiation effects; it later broadened under the Energy Research and Development Administration (ERDA) and the Department of Energy to include the health implications of all energy technologies and their by-products. Major enabling legislation follows.

Atomic Energy Act of 1946 (P.L. 79-585): Provided the initial charter for a comprehensive program of research and development related to the utilization of fissionable and radioactive materials for medical, biological, and health purposes.

Atomic Energy Act of 1954 (P.L. 83-703): Further authorized AEC “to conduct research on the biologic effects of ionizing radiation.”

Energy Reorganization Act of 1974 (P.L. 93-438): Provided that responsibilities of ERDA should include “engaging in and supporting environmental, biomedical, physical, and safety research related to the development of energy resources and utilization technologies.”

Federal Non-Nuclear Energy Research and Development Act of 1974 (P.L. 93-577): Authorized ERDA to conduct a comprehensive non-nuclear energy research, development, and demonstration program to include the environmental and social consequences of the various technologies.

DOE Organization Act of 1977 (P.L. 95-91): Instructed the department “to assure incorporation of national environmental protection goals in the formulation and implementation of energy programs; and to advance the goal of restoring, protecting, and enhancing environmental quality, and assuring public health and safety,” and to conduct “a comprehensive program of research and development on the environmental effects of energy technology and programs.”

HERAC Recommendation

The April 1987 HERAC report recommended that DOE and the nation commit to a large, multidisciplinary scientific and technological undertaking to map and sequence the human genome. DOE was particularly well suited to focus on resource and technology development, the report noted; HERAC further recommended a leadership role for DOE because of its demonstrated expertise in managing complex and long-term multidisciplinary projects involving both the development of new technologies and the coordination of efforts in industries, universities, and its own laboratories.

Evolution of the nation’s Human Genome Project further benefited from a 1988 study by the National Research Council (NRC) entitled Mapping and Sequencing the Human Genome, which recommended that the United States support this research effort and presented an outline for a multiphase plan.

DOE and NIH Coordination

The National Institutes of Health (NIH) was a necessary participant in the large-scale effort to map and sequence the human genome because of its long history of support for biomedical research and its vast community of scientists. This was confirmed by the NRC report, which recommended a major role for NIH. In 1987, under the leadership of Director James Wyngaarden, NIH established the Office of Genome Research in the Director’s Office. In 1988, DOE and NIH signed a Memorandum of Understanding in which the agencies agreed to work together, coordinate technical research and activities, and share results. In 1990, DOE and NIH submitted a joint research plan outlining short- and long-term goals of the project.


Appendix B

DOE-NIH Guidelines for Sharing Data and Resources

At its December 7, 1992, meeting, the DOE-NIH Joint Subcommittee on the Human Genome approved the following sharing guidelines, developed from the DOE draft of September 1991.*

The information and resources generated by the Human Genome Project have become substantial, and the interest in having access to them is widespread. It is therefore desirable to have a statement of philosophy concerning the sharing of these resources that can guide investigators who generate the resources as well as those who wish to use them.

A key issue for the Human Genome Project is how to promote and encourage the rapid sharing of materials and data that are produced, especially information that has not yet been published or may never be published in its entirety. Such sharing is essential for progress toward the goals of the program and to avoid unnecessary duplication. It is also desirable to make the fruits of genome research available to the scientific community as a whole as soon as possible to expedite research in other areas.

Although it is the policy of the Human Genome Project to maximize outreach to the scientific community, it is also necessary to give investigators time to verify the accuracy of their data and to gain some scientific advantage from the effort they have invested. Furthermore, in order to assure that novel ideas and inventions are rapidly developed to the benefit of the public, intellectual property protection may be needed for some of the data and materials.

After extensive discussion with the community of genome researchers, the advisors of the NIH and DOE genome programs have determined that consensus is developing around the concept that a 6-month period from the time the data or materials are generated to the time they are made available publicly is a reasonable maximum in almost all cases. More rapid sharing is encouraged.

Whenever possible, data should be deposited in public databases and materials in public repositories. Where appropriate repositories do not exist or are unable to accept the data or materials, investigators should accommodate requests to the extent possible.

The NIH and DOE genome programs have decided to require all applicants expecting to generate significant amounts of genome data or materials to describe in their application how and when they plan to make such data and materials available to the community. Grant solicitations will specify this requirement. These plans in each application will be reviewed in the course of peer review and by staff to assure they are reasonable and in conformity with program philosophy. If a grant is made, the applicant’s sharing plans will become a condition of the award and compliance will be reviewed before continuation funding is provided. Progress reports will be asked to address the issue.

*Reprinted from Human Genome News 4(5), 4 (1993).



Appendix C

NIH-DOE Guidance on Human Subjects Issues in Large-Scale DNA Sequencing

Date issued: August 9, 1996

Introduction

The Human Genome Project (HGP) is now entering into large-scale DNA sequencing. To meet its complete sequencing goal, it will be necessary to recruit volunteers willing to contribute their DNA for this purpose. The guidance provided in this document is intended to address ethical issues that must be considered in designing strategies for recruitment and protection of DNA donors for large-scale sequencing.

Nothing in this document should be construed to differ from, or substitute for, the policies described in the Federal Regulations for the Protection of Human Subjects [45 CFR 46 (NIH) and 10 CFR 745 (DOE)]. Rather, it is intended to supplement those policies by focusing on the particular issues raised by large-scale human DNA sequencing. This statement addresses six topics: (1) benefits and risks of genomic DNA sequencing; (2) privacy and confidentiality; (3) recruitment of DNA donors as sources for library construction; (4) informed consent; (5) IRB approval; and (6) use of existing libraries.

The guidance provided in this statement is intended to afford maximum protection to DNA donors and is based on the belief that protection can best be achieved by a combination of approaches including:

• ensuring that the initial version of the complete human DNA sequence is derived from multiple donors;

• providing donors with the opportunity to make an informed decision about whether to contribute their DNA to this project; and

• taking effective steps to ensure the privacy and confidentiality of donors.

1. Benefits and Risks of Genomic DNA Sequencing

The HGP offers great promise for the improvement of human health. As a consequence of the HGP, there will be a more thorough understanding of the genetic bases of human biology and of many diseases. This, in turn, will lead to better therapies and, perhaps more importantly, prevention strategies for many of those diseases. Similarly, as the technology developed by the HGP is applied to understanding the biology of other organisms, many other human activities will be affected including agriculture, environmental management, and biologically based industrial processes.

While the HGP offers great promise to humanity, there will be no direct benefit, in either clinical or financial terms, to any of the individuals who choose to donate DNA for large-scale sequencing. Rather, the motivation for donation is likely to be an altruistic willingness to contribute to this historic research effort.

However, individuals who donate DNA to this effort may face certain risks. Information derived from the donors will become available in public databases. Such information may reveal, for example, DNA sequence-based information about disease susceptibility. If the donor becomes aware of such information, it could lead to emotional distress on her/his part. If such health-related information becomes known to others, discrimination against the donor (e.g., in insurance or in employment) could result. Unwanted notoriety is another potential risk to donors. Therefore, those engaged in large-scale sequencing must be sensitive to the unique features of this type of research and ensure that both the protections normally afforded research subjects and the special issues associated with human genomic DNA sequencing are thoroughly addressed.

While some risks to donors can already be identified, the probability of adverse events materializing appears to be low. However, the risks of harm to individuals will increase if confidentiality is not maintained and/or the number of donors is limited to a very few individuals. Either, or both, of these situations would increase the possibility of a donor’s identity being revealed without his/her knowledge or permission.

A final issue to consider is characterized in a statement taken from the OPRR Guidebook¹ which points out that “some areas [of genetic research] present issues for which no clear guidance can be given at this point, either because enough is not known about the risks presented by the research, or because no consensus on the appropriate resolution of the problem exists.” It is anticipated that the DNA sequence information produced by the Human Genome Project will be used in the future for types of research which cannot now be predicted and the risks of which cannot be assessed or disclosed.

2. Privacy and Confidentiality

In general, one of the most effective ways of protecting volunteers from the unexpected, unwelcome or unauthorized use of information about them is to ensure that there are no opportunities for linking an individual donor with information about him/her that is revealed by the research. By not collecting information about the identity of a research subject and any biological material or records developed in the course of the research, or by subsequently removing all identifiers (“anonymizing” the sample), the possibility of risk to the subject stemming from the results of the research is greatly reduced. Large-scale DNA sequence determination represents an exception because each person’s DNA sequence is unique and, ultimately, there is enough information in any individual’s DNA sequence to absolutely identify her/him. However, the technology that would allow the unambiguous identification of an individual from his/her DNA sequence is not yet mature. Thus, for the foreseeable future, establishing effective confidentiality, rather than relying on anonymity, will be a very useful approach to protecting donors.

Investigators should introduce as many disconnects between the identity of donors and the publicly available information and materials as possible. There should not be any way for anyone to establish that a specific DNA sequence came from a particular individual, other than resampling an individual’s DNA and comparing it to the sequence information in the public database. In particular, no phenotypic or demographic information about donors should be linked to the DNA to be sequenced.² For the purposes of the HGP such information will rarely be useful, and recording such information could result in possible misuse and compromise donor confidentiality.

Confidentiality should be “two way.” Not only should others be unable to link a DNA sequence to a particular individual, but no individual who donates DNA should be able to confirm directly that a particular DNA sequence was obtained from their DNA sample.³ This degree of confidentiality will preclude the possibility of recontacting DNA donors, providing another degree of protection for them. It should be clear to both investigators and to donors that the contact involved in obtaining the initial specimen will be the only contact.⁴

Another approach for protecting all DNA donors is to reduce the incentive for wanting to know the identities of particular donors. If the initial human sequence is a “mosaic” or “patchwork” of sequenced regions derived from a number of different individuals, rather than that of a single individual, there would be considerably less interest in who the specific donors were. Although there may be scientific justification that each clone library used for sequencing should be derived from one person, there is no scientific reason that the entire initial human DNA sequence should be that of a single individual. As approximately 99.9% of the human DNA sequence is common between any two individuals, most of the fundamental biological information contained in the human DNA sequence is common to all people.

To increase the likelihood that the first human DNA sequence will be an amalgam of regions sequenced from different sources, a number of clone libraries must be made available. Although a number of large insert libraries have been made, most do not meet all of the standards set in this document; therefore, these libraries should be used as substrates for large-scale sequencing only under circumscribed conditions (see section 6, p. 79). Starting immediately, new libraries will be developed that have the advantage of being constructed in accordance with the ethical principles discussed in this document; they may also confer some additional scientific benefit. Such libraries are critical for the long-range needs of the HGP.

3. Source/Recruitment of DNA Donors for Library Construction

Another implication of the fact that 99.9% of the human DNA sequence is shared by any two individuals is that the backgrounds of the individuals who donate DNA for the first human sequence will make no scientific difference in terms of the usefulness and applicability of the information that results from sequencing the human genome. At the same time, there will undoubtedly be some sensitivity about the choice of DNA sources. There are no scientific reasons why DNA donors should not be selected from diverse pools of potential donors.⁵

There are two additional issues that have arisen in considering donor selection. These warrant particular discussion:

• It is recognized that women have historically been underrepresented in research, so it can be anticipated that concerns might arise if males (sperm DNA) were used exclusively as the source of DNA for large-scale sequencing. Although there would be no scientific basis for concern, because even in the case of a male source, half of the donor’s DNA would have come from his mother and half from his father, nevertheless perceptions are not to be dismissed. While the choice of donors will not be dictated to investigators, it is expected that, because multiple libraries will be produced, a number of them will be made from female sources while others will be made from male sources.

• Staff of laboratories involved in library construction and DNA sequencing may be eager to volunteer to be donors because of their interest and belief in the HGP. However, proximity to the research may create some special vulnerabilities for laboratory staff members. It is also possible that they will feel pressure to donate and there may be an increased likelihood that confidentiality would be breached. Finally, there is a potential that the choice of persons so closely involved in the research may be interpreted as elitist. For all of these reasons, it is recommended that donors should not be recruited from laboratory staff, including the principal investigator.


4. Informed Consent

Obtaining informed consent specifically for the purpose of donating DNA for large-scale sequencing raises some unique concerns. Because anonymity cannot be guaranteed and confidentiality protections are not absolute, the disclosure process to potential donors must clearly specify what the process of DNA donation involves, what may make it different from other types of research, and what the implications are of one’s DNA sequence information being a public scientific resource.

Federal regulations (45 CFR 46 and 10 CFR 745) require the disclosure of a number of issues in any informed consent document. They include such issues as potential benefits of the research, potential risks to the donor, control and ownership of donated material, long-term retention of donated material for future use, and the procedures that will be followed. In addition, there are several other disclosures that are of special importance for donors of DNA for large-scale sequencing. These include:

• the meaning of confidentiality and privacy of information in the context of large-scale DNA sequencing, and how these issues will be addressed;

• the lack of opportunity for the donor to later withdraw the libraries made from his/her DNA or his/her DNA sequence information from public use;

• the absence of opportunity for information of clinical relevance to be provided to the donor or her/his family;

• the possibility of unforeseen risks; and

• the possible extension of risk to family members of the donor or to any group or community of interest (e.g., gender, race, ethnicity) to which a donor might belong.

Many academic human genetics units have considerable experience in dealing with research subjects and obtaining informed consent, while the laboratories that are likely to be involved in making the libraries for sequencing have, in general, much less experience of this type. Therefore, library makers are encouraged to establish a collaboration with one or more human genetics units, with the latter being responsible for recruiting donors, obtaining informed consent, obtaining the necessary biological samples, and providing a blinded sample to the library maker. Collaboration with tissue banks may be considered as long as these banks are collecting tissues in accordance with this guidance. The library maker should have no contact with the donor and no opportunity to obtain any information about the donor’s identity.

5. IRB Approval

Effective immediately, projects to construct libraries for large-scale DNA sequencing must obtain Institutional Review Board (IRB) approval before work is initiated. IRBs should carefully consider the unique aspects of large-scale sequencing projects. Some of the informed consent provisions outlined may be somewhat at odds with the usual and customary disclosures found in most protocols involving human subjects and which IRBs usually consider. For example, research subjects usually are given the opportunity to withdraw from a research project if they change their minds about participating. In the case of donors for large-scale sequencing, it will not be possible to withdraw either the libraries made from their DNA or the DNA sequence information obtained using those libraries once the information is in the public domain. By the time a significant amount of DNA sequence data has been collected, the libraries, as well as individual clones from them, will have been widely distributed and the sequence information will have been deposited in and distributed from public databases. In addition, there will be no possibility of returning information of clinical relevance to the donor or his/her family.

6. Use of Existing Libraries for Large-Scale Sequencing

Many of the existing libraries (including those derived from anonymous donors) were not made in complete conformity with the principles elaborated above. The potential risks that may result from their use will be minimized by the rapid introduction of several new libraries constructed in accordance with this guidance, which NCHGR and DOE are taking steps to initiate. This will ensure that the existing libraries will only contribute small amounts to the first complete human DNA sequence. In the interim, existing libraries can continue to be used for large-scale sequencing, only if IRB approval and consent for “continued use” are obtained⁶ and approval by the funding agency is granted.

It is important that in obtaining consent for continued use of existing libraries, no coercion of the DNA donor occur. It is therefore recommended that consideration be given to whether it is appropriate for the individual who previously recruited the donor to recontact him/her to obtain this consent. In some cases an IRB may determine that the recontact should be made by a third party to assure that the donors are fully informed and allowed to choose freely whether their DNA can continue to be used for this purpose.


Conclusion

This document is intended to provide guidance to investigators and IRBs who are involved in large-scale sequencing efforts. It is designed to alert them to special ethical concerns that may arise in such projects. In particular, it provides guidance for the use of existing and the construction of new DNA libraries. Adhering to this guidance will ensure that the initial version of the complete human sequence is derived from multiple, diverse donors; that donors will have the opportunity to make an informed decision about whether to contribute their DNA to this project; and that effective steps will be taken by investigators to ensure the privacy and confidentiality of donors.

Investigators funded by NCHGR and DOE to develop new libraries for large-scale human DNA sequencing will be required to have their plans for the recruitment of DNA donors, including the informed consent documents, reviewed and approved by the funding agency before donors are recruited. Investigators involved in large-scale human sequencing will also be asked to observe those aspects of this guidance that pertain to them.

Approved August 17, 1996, by:

Francis S. Collins, M.D., Ph.D., Director, National Center for Human Genome Research, National Institutes of Health
Aristides N. Patrinos, Ph.D., Associate Director, Office of Health and Environmental Research, U.S. Department of Energy

Footnotes

1. Office of Protection from Research Risks, Protecting Human Research Subjects: Institutional Review Board Guidebook (OPRR: U.S. Government Printing Office, 1993).

2. It is recognized that it will be trivially easy to determine the sex of the donor of the library, by assaying for the presence or absence of Y chromosome in the library.

3. There are a number of approaches to preventing a DNA donor from knowing that his/her DNA was actually sequenced as part of the HGP. For example, each time a clone library is to be made, an appropriately diverse pool of between five and ten volunteers can be chosen in such a way that none of them knows the identity of anyone else in the pool. Samples for DNA preparation and for preparation of a cell line can be collected from all of the volunteers (who have been told that their specimen may or may not eventually be used for DNA sequencing) and one of those samples is randomly and blindly selected as the source actually used for library construction. In this way, not only will the identity of the individual whose DNA is chosen not be known to the investigators, but that individual will also not be sure that s/he is the actual source.

4. Although recontacting donors should not be possible, investigators will potentially want to be able to resample a donor’s genome. Thus, at the time the initial specimen is obtained, in addition to making a clone library representing the donor’s genome, it should also be used to prepare an additional aliquot of high molecular weight DNA for storage and a permanent cell line. Either resource could then be used as a source of the donor’s genome in case additional DNA were needed or comparison with the results of the analysis of the cloned DNA were desired.

5. There has been discussion in the scientific community about the sex of DNA donors. A library prepared from a female donor will contain DNA from the X chromosome in an amount equivalent to the autosomes, but will completely lack Y chromosomal DNA. Conversely, a library prepared from a male donor will contain Y DNA, but both X and Y DNA will only be present at half the frequency of the DNA from the other chromosomes. Scientifically, then, there are both advantages and disadvantages inherent in the use of either a male or a female donor. The question of the sex of the donor also involves the question of the use of somatic or germ line DNA to make libraries. For making libraries, useful amounts of germ line DNA can only be obtained from a male source (i.e., from sperm); it is not possible to obtain enough ova from a female donor to isolate germ line DNA for this purpose. Opinion is divided in the scientific community about whether germ line or somatic DNA should be used for large-scale sequencing. Somatic DNA is known to be rearranged, relative to germ line DNA, in certain regions (e.g., the immunoglobulin genes) and the possibility has been raised that other developmentally based rearrangements may occur, although no example of the latter has been offered. While some believe that the sequence product should not contain any rearrangements of this sort, others consider this potential advantage of germ line DNA to be relatively minor in comparison to the need to have the X chromosome fully represented in sequencing efforts and prefer the use of somatic DNA.

6. Individuals whose DNA was used for library construction (with the exception of those created from deceased or anonymous individuals) should be fully informed about the risks and benefits described above, should freely choose whether they would like their DNA to continue to be used for this purpose, and their decision should be documented.


Executive Summary of Joint NIH-DOE Human Subjects Guidance

1. Those engaged in large-scale sequencing must be sensitive to the unique features of this type of research and ensure that both the protections normally afforded research subjects and the special issues associated with human genomic DNA sequencing are thoroughly addressed.

2. For the foreseeable future, establishing effective confidentiality, rather than relying on anonymity, will be a very useful approach to protecting donors.

3. Investigators should introduce as many disconnects between the identity of donors and the publicly available information and materials as possible.

4. No phenotypic or demographic information about donors should be linked to the DNA to be sequenced.

5. There are no scientific reasons why DNA donors should not be selected from diverse pools of potential donors.

6. While the choice of donors will not be dictated to investigators, it is expected that, because multiple libraries will be produced, a number of them will be made from female sources while others will be made from male sources.

7. It is recommended that donors should not be recruited from laboratory staff, including the principal investigator.

8. The disclosure process to potential donors must clearly specify what the process of DNA donation involves, what may make it different from other types of research, and what the implications are of one’s DNA sequence information being a public scientific resource.

9. Library makers are encouraged to establish a collaboration with one or more human genetics units [or tissue banks].

10. The library maker should have no contact with the donor and no opportunity to obtain any information about the donor’s identity.

11. Effective immediately, projects to construct libraries for large-scale DNA sequencing must obtain Institutional Review Board (IRB) approval before work is initiated.

12. Existing libraries can continue to be used for large-scale sequencing, only if IRB approval and consent for continued use are obtained and approval by the funding agency is granted.

13. It is important that in obtaining informed consent for continued use of existing libraries, no coercion of the DNA donor occur.



Appendix D


Human Genome Project and Genetics on the World Wide Web

The World Wide Web offers the easiest path to information about the Human Genome Project and related genetics topics. Some useful sites to visit are included in the list below.

Human Genome Project

DOE Human Genome Program
http://www.er.doe.gov/production/ober/hug_top.html

Devoted to the DOE component of the U.S. Human Genome Project and to the DOE Microbial Genome Program. Links to many other sites.

Human Genome Project Information
http://www.ornl.gov/hgmis

Comprehensive site covering topics related to the U.S. and worldwide Human Genome Projects. Useful for updating scientists and providing educational material for nonscientists, in support of DOE’s commitment to public education. Developed and maintained for DOE by the Human Genome Management Information System (HGMIS) at Oak Ridge National Laboratory.

NIH National Human Genome Research Institute
http://www.nhgri.nih.gov

Site of the NIH sector of the U.S. Human Genome Project.

DOE Human Genome Program Publications

*Human Genome News
http://www.ornl.gov/hgmis/publicat/publications.html

Quarterly newsletter reporting on the worldwide Human Genome Project.

Biological Sciences Curriculum Study (BSCS) Teaching Modules

Online versions in preparation; hardcopies available from 719/531-5550

• “Genes, Environment, and Human Behavior,” tentative title, in preparation

• “Mapping and Sequencing the Human Genome: Science, Ethics, and Public Policy” (1992)

• “The Human Genome Project: Biology, Computers, and Privacy” (1996)

• “The Puzzle of Inheritance: Genetics and the Methods of Science” (1997)

*Primer on Molecular Genetics, 1992
http://www.ornl.gov/hgmis/publicat/publications.html#primer

Explains the science behind the genome research.

*To Know Ourselves, 1996
http://www.ornl.gov/hgmis/tko

Booklet reviewing DOE’s role, history, and achievements in the Human Genome Project and introducing the science and other aspects of the project.

*Print copy available from HGMIS (see p. 87 or inside front cover for contact information).

Ethical, Legal, and Social Issues Related to Genetics Research

HGMIS Gateways Web page
http://www.ornl.gov/hgmis/links.html

Choose “Ethical, Legal, and Social Issues.”

Center for Bioethics, University of Pennsylvania
http://www.med.upenn.edu/~bioethic

Full-text articles about such ethical issues as human cloning; includes a primer on bioethics.

Courts and Science On-Line Magazine (CASOLM)
http://www.ornl.gov/courts

Coverage of genetic issues affecting the courts.

ELSI in Science
http://www.lbl.gov/Education/ELSI/ELSI.html

Teaching modules designed to stimulate discussion on implications of scientific research.

Eubios Ethics Institute
http://www.biol.tsukuba.ac.jp/~macer/index.html

Site includes newsletter summarizing literature in bioethics and biotechnology.

Genetic Privacy Act
http://www.ornl.gov/hgmis/resource/elsi.html

Model legislation written with support of the DOE Human Genome Program.

MCET—The Human Genome Project
http://phoenix.mcet.edu/humangenome/index.html

ELSI issues for high school students.

August 1997


National Bioethics Advisory Committee
http://www.nih.gov/nbac/nbac.htm

The bioethics committee offers advice to the National Science and Technology Council and others on bioethical issues arising from research related to human biology and behavior.

National Center for Genomic Resources
http://www.ncgr.org

Comprehensive Genetics and Public Issues page; includes congressional bills related to genetic privacy.

The Gene Letter
http://www.geneletter.org/genetalk.html

Bimonthly newsletter to inform consumers and professionals about advances in genetics and encourage discussion about emerging policy dilemmas.

Your Genes, Your Choices
http://www.nextwave.org/ehr/books/index.html

Booklet written in simple English, describing the Human Genome Project; the science behind it; and how ethical, legal, and social issues raised by the project may affect people’s everyday lives.

General Genetics and Biotechnology

Many of the following sites contain links to both educational and technical material.

HGMIS Community Education and Outreach Gateways Web Page
http://www.ornl.gov/hgmis/links.html

Access Excellence
http://outcast.gene.com/ae/index.html

Extensive genetic and biotechnology resources for teachers and nonscientists.

BIO Online (Biotechnology Industry Organization)
http://www.bio.com

Comprehensive directory of biotechnology sites on the Internet.

Biospace
http://www.biospace.com

Biotech industry site; profiles biotech companies by region.

BioTech
http://biotech.chem.indiana.edu

An interactive educational resource and biotech reference tool; includes a dictionary of 6000 life science terms.

Biotechnology Information Center, USDA National Agricultural Library
http://www.nal.usda.gov/bic

Comprehensive agricultural biotechnology resource; includes a bibliography on patenting biotechnology products and processes (http://www.nal.usda.gov/bic/Biblios/patentag.htm).

Bugs ’N Stuff
http://www.ncgr.org/microbe

List of microbial genomes being sequenced, research groups, genome sizes, and facts about selected organisms. Links to related sites.

Careers in Genetics
http://www.faseb.org/genetics/gsa/careers/bro-menu.htm

Online booklet from the Genetics Society of America, including several profiles of geneticists. See also career sections of sites specified above, such as Access Excellence.

Carolina Biological Supply Company
http://www.carosci.com/Tips.htm

Teaching materials for all levels. Includes mini-lessons on selected scientific topics, two online magazines, What’s New, software, catalogs, and publications.

Cell & Molecular Biology Online
http://www.tiac.net/users/pmgannon

Links to electronic publications, current research, educational and career resources, and more.

CERN Virtual Library, Genetics section, Biosciences Division
http://www.ornl.gov/TechResources/Human_Genome/genetics.html

Includes an organism index linking to other pertinent databases, information on the U.S. and international Human Genome Projects, and links to research sites.

Classic Papers in Genetics
http://www.esp.org

Covers the early years, with introductory notes. See also Access Excellence site above for genetics history.


Community of Science Web Server
http://cos.gdb.org/best.html

Links to Medline, U.S. Patent Citation Database, Commerce Business Daily, The Federal Register, and other resources.

Database of Genome Sizes
http://www.cbs.dtu.dk/databases/DOGS/index.html

Lists numerous organisms with genome sizes, scientific and common names, classifications, and references.

Genetic and biological resources links
http://www.er.doe.gov/production/ober/bioinfo_center.html

Genetics Education Center, University of Kansas Medical Center
http://www.kumc.edu/instruction/medicine/genetics/homepage.html

Educational information on human genetics, career resources.

Genetics Glossary
http://www.ornl.gov/hgmis/publicat/glossary.html

Glossary of terms related to genetics.

Genetics Webliography
http://www.dml.georgetown.edu/%7Edavidsol/len.html

Extensive links for researchers and nonscientists from Georgetown University Library.

Genomics: A Global Resource
http://www.phrma.org/genomics/index.html

Many links. Website a joint project of the Pharmaceutical Research and Manufacturers of America and the American Institute of Biological Sciences; includes Genomics Today, a daily update on the latest news in the field.

Hispanic Educational Genome Project
http://vflylab.calstatela.edu/hgp

Designed to educate high school students and their families about genetics and the Human Genome Project. Links to other projects.

Howard Hughes Medical Institute
http://www.hhmi.org

Home page of major U.S. philanthropic organization that supports research in genetics, cell biology, immunology, structural biology, and neuroscience. Excellent introductory information on these topics.

Library of Congress
http://lcweb.loc.gov/homepage/lchp.html

Microbial Database
http://www.tigr.org/tdb/mdb/mdb.html

Lists completed and in-progress microbial genomes, with funding sources.

MIT Biology Hypertextbook
http://esg-www.mit.edu:8001/esgbio/7001main.html

All the basics.

Science and Mathematics Resources
http://www-sci.lib.uci.edu

More than 2000 Web references, including Frank Potter’s Science Gems and Martindale’s Health Science Guide. For teachers at all levels.

Virtual Courses on the Web
http://lenti.med.umn.edu/~mwd/courses.html

Links to Web tutorials in biology, genetics, and more.

Welch Web
http://www.welch.jhu.edu

Links to many Internet biomedical resources, dictionaries, encyclopedias, government sites, libraries, and more, from the Johns Hopkins University Welch Library.

Why Files
http://whyfiles.news.wisc.edu

Illustrated explanations of the science behind the news.

Images on the Web

Biochemistry Online
http://biochem.arach-net.com

Essays, courses, 3-D images of biomolecules, modeling, software.

Bugs in the News!
http://falcon.cc.ukans.edu/~jbrown/bugs.html

Microbiology information and a nice collection of images of biological molecules.

Cells Alive!
http://www.cellsalive.com

Images (some moving) of different types of cells.


Cn3D (See in 3-D)
http://www3.ncbi.nlm.nih.gov/Entrez/Structure/cn3d.html

3-D molecular structure viewer allowing the user to visualize and rotate structure data entries from Entrez. Highly technical, for researchers.

Cytogenetics Gallery
http://www.pathology.washington.edu:80/Cytogallery

Photos (karyotypes) of normal and abnormal chromosomes.

DNA Learning Center, Cold Spring Harbor Laboratory
http://darwin.cshl.org/index.html

Animated images of PCR and Southern Blotting techniques.

Gene Map from the 1996 Genome Issue of Science
http://www.ncbi.nlm.nih.gov/SCIENCE96

Click on particular areas of chromosomes and find genes.

Images of Biological Molecules
http://www.cc.ukans.edu/~micro/picts.html

3-D structures of proteins and nucleic acids obtained from Brookhaven National Laboratory Protein Database and others.

Lawrence Livermore National Laboratory Chromosome 19 Physical Map
http://www-bio.llnl.gov/bbrp/genome/genome.html

Los Alamos National Laboratory Chromosome 16 Physical Map
http://www-ls.lanl.gov/DBqueries/QueryPage.html

Journals and Magazines

HGMIS Journals Gateways Web page
http://www.ornl.gov/hgmis/links.html

Choose “Journals, Books, Periodicals.”

Biochemistry and Molecular Biology Journals
http://biochem.arach-net.com/beasley/journals.html

Comprehensive list.

Nature, Nature Genetics, and Nature Biotechnology
http://www.nature.com

Abstracts of articles, full text of letters and editorials.

Science Magazine
http://www.sciencemag.org

Abstracts and some full-text articles.

Science Magazine Genome Issue (10/96)
http://www.sciencemag.org/science/content/vol274/issue5287

Full text includes a “clickable” gene map.

Science News
http://www.sciencenews.org

Online version of weekly popular science magazine with full text of selected articles.

Medical Genetics

Blazing a Genetic Trail
http://www.hhmi.org/GeneticTrail

Illustrated booklet from the Howard Hughes Medical Institute on hunting for disease genes.

Directory of National Genetic Voluntary Organizations and Related Resources
http://medhlp.netusa.net/agsg/agsgsup.htm

Support groups for people with genetic diseases and their families.

GeneCards
http://bioinformatics.weizmann.ac.il/cards

A database of more than 6000 genes; describes their functions, products, and biomedical applications.

Gene Therapy
http://www.mc.vanderbilt.edu/gcrc/gene/index.html

Web course covering the basics, with links to other sites.

Inherited-Disease Genes Found by Positional Cloning
http://www.ncbi.nlm.nih.gov/Baxevani/CLONE/index.html

Links to OMIM.

NIH Office of Recombinant DNA Activities
http://www.nih.gov/od/orda

Includes a database of human gene therapy protocols.

Online Mendelian Inheritance in Man (OMIM)
http://www.ncbi.nlm.nih.gov/Omim

A comprehensive, authoritative, and up-to-date human gene and genetic disorder catalog that supports medical genetics and the Human Genome Project.


Promoting Safe and Effective Genetic Testing in the United States (1997)
http://www.med.jhu.edu/tfgtelsi

Principles and recommendations by a joint NIH-DOE Human Genome Project group that examined the development and provision of gene tests in the United States.

Understanding Gene Testing
http://www.gene.com/ae/AE/AEPC/NIH/index.html

Illustrated brochure from the National Cancer Institute.

Science in the News

EurekAlert! http://www.eurekalert.org

InScight: http://www.apnet.com/inscight

SciWeb: http://www.sciweb.com/news.html

Short summaries of major stories, some with links to related articles in other sources.

HMS Beagle
http://biomednet.com/hmsbeagle

Biweekly electronic journal featuring major science stories, profiles, book reviews, and other items of interest.

Science Daily
http://www.sciencedaily.com

Headline stories, articles, and links to news services, newspapers, magazines, broadcast sources, journals, and organizations. Also offers weekly bulletins for updates by e-mail.

Science Guide
http://www.scienceguide.com

Daily news and information service and free science news e-mailer. Also contains directories of newsgroups, grant and funding resources, employment, and online journals.

ScienceNow
http://www.sciencenow.org

Daily online news service from Science magazine offers articles on major science news.

Web Search Tools

Biosciences Index to WWW Virtual Library
http://golgi.harvard.edu/htbin/biopages

Metacrawler
http://www.metacrawler.com

“Search the Net”
http://metro.turnpike.net/adorn/search.html

Comprehensive list of search tools, libraries, world factbooks, and other useful information.

Search.com
http://www.search.com

Yahoo!
http://www.yahoo.com

Prepared August 1997 by
Human Genome Management Information System
Oak Ridge National Laboratory
1060 Commerce Park, MS 6480
Oak Ridge, TN 37830
423/576-6669, [email protected]
http://www.ornl.gov/hgmis



Sequencing

Advanced Detectors for Mass Spectrometry
W.H. Benner and J.M. Jaklevic
Lawrence Berkeley National Laboratory, Berkeley, California

Mass Spectrometer for Human Genome Sequencing
Chung-Hsuan Chen
Oak Ridge National Laboratory, Oak Ridge, Tennessee

Genomic Sequence Comparisons
George Church
Harvard Medical School, Boston, Massachusetts

A PAC/BAC End-Sequence Data Resource for Sequencing the Human Genome: A 2-Year Pilot Study
Pieter de Jong
Roswell Park Cancer Institute, Buffalo, New York

Multiple-Column Capillary Gel Electrophoresis
Norman Dovichi
University of Alberta, Edmonton, Canada

DNA Sequencing with Primer Libraries
John J. Dunn and F. William Studier
Brookhaven National Laboratory, Upton, New York

Rapid Preparation of DNA for Automated Sequencing
John J. Dunn and F. William Studier
Brookhaven National Laboratory, Upton, New York

A PAC/BAC End-Sequence Database for Human Genomic Sequencing
Glen A. Evans
University of Texas Southwestern Medical Center, Dallas, Texas

Automated DNA Sequencing by Parallel Primer Walking
Glen A. Evans
University of Texas Southwestern Medical Center, Dallas, Texas

*Parallel Triplex Formation as Possible Approach for Suppression of DNA-Viruses Reproduction
V.L. Florentiev
Russian Academy of Sciences, Moscow, Russia

Advanced Automated Sequencing Technology: Fluorescent Detection for Multiplex DNA Sequencing
Raymond F. Gesteland
University of Utah, Salt Lake City, Utah

Resource for Molecular Cytogenetics
Joe Gray and Daniel Pinkel
University of California, San Francisco

DNA Sample Manipulation and Automation
Trevor Hawkins
Whitehead Institute and Massachusetts Institute of Technology, Cambridge, Massachusetts

Construction of a Genome-Wide Characterized Clone Resource for Genome Sequencing
Leroy Hood, Mark D. Adams,1 and Melvin Simon2
University of Washington, Seattle
1The Institute for Genomic Research, Rockville, Maryland
2California Institute of Technology, Pasadena, California

DNA Sequencing Using Capillary Electrophoresis
Barry L. Karger
Northeastern University, Boston, Massachusetts

Ultrasensitive Fluorescence Detection of DNA
Richard A. Mathies and Alexander N. Glazer
University of California, Berkeley

Joint Human Genome Program Between Argonne National Laboratory and the Engelhardt Institute of Molecular Biology
Andrei Mirzabekov
Argonne National Laboratory, Argonne, Illinois, and Engelhardt Institute of Molecular Biology, Moscow, Russia

High-Throughput DNA Sequencing: SAmple SEquencing (SASE) Analysis as a Framework for Identifying Genes and Complete Large-Scale Genomic Sequencing
Robert K. Moyzis
Los Alamos National Laboratory, Los Alamos, New Mexico

One-Step PCR Sequencing
Barbara Ramsay Shaw
Duke University, Durham, North Carolina

Appendix E


1996 Human Genome Research Projects

*Projects designated by an asterisk were funded through small emergency grants to Russian scientists following December 1992 site reviews by David Galas (formerly of OHER, renamed OBER in 1997), Raymond Gesteland (University of Utah), and Elbert Branscomb (LLNL).

Research abstracts of these projects appear in Part 2 of this report.


Automation of the Front End of DNA Sequencing
Lloyd M. Smith and Richard A. Guilfoyle
University of Wisconsin, Madison

High-Speed DNA Sequence Analysis by Matrix-Assisted Laser Desorption Mass Spectrometry
Lloyd M. Smith
University of Wisconsin, Madison

Analysis of Oligonucleotide Mixtures by Electrospray Ionization-Mass Spectrometry
Richard D. Smith
Pacific Northwest National Laboratory, Richland, Washington

High-Speed Sequencing of Single DNA Molecules in the Gas Phase by FTICR-MS
Richard D. Smith
Pacific Northwest National Laboratory, Richland, Washington

Characterization and Modification of DNA Polymerases for Use in DNA Sequencing
Stanley Tabor
Harvard University, Boston, Massachusetts

Modular Primers for DNA Sequencing
Levy Ulanovsky1,2
1Argonne National Laboratory, Argonne, Illinois
2Weizmann Institute of Science, Rehovot, Israel

Time-of-Flight Mass Spectroscopy of DNA for Rapid Sequence
Peter Williams
Arizona State University, Tempe, Arizona

Development of Instrumentation for DNA Sequencing at a Rate of 40 Million Bases Per Day
Edward S. Yeung
Iowa State University, Ames, Iowa

Mapping

Resolving Proteins Bound to Individual DNA Molecules
David Allison and Bruce Warmack
Oak Ridge National Laboratory, Oak Ridge, Tennessee

*Improved Cell Electrotransformation by Macromolecules
Alexandre S. Boitsov
St. Petersburg State Technical University, St. Petersburg, Russia

Overcoming Genome Mapping Bottlenecks
Charles R. Cantor
Boston University, Boston, Massachusetts

Preparation of PAC Libraries
Pieter J. de Jong
Roswell Park Cancer Institute, Buffalo, New York

Chromosomes by Third-Strand Binding
Jacques R. Fresco
Princeton University, Princeton, New Jersey

Chromosome Region-Specific Libraries for Human Genome Analysis
Fa-Ten Kao
Eleanor Roosevelt Institute for Cancer Research, Denver, Colorado

*Identification and Mapping of DNA-Binding Proteins Along Genomic DNA by DNA-Protein Crosslinking
V.L. Karpov
Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia

A PAC/BAC Data Resource for Sequencing Complex Regions of the Human Genome: A 2-Year Pilot Study
Julie R. Korenberg
Cedars Sinai Medical Center, Los Angeles, California

Mapping and Sequencing of the Human X Chromosome
D. L. Nelson
Baylor College of Medicine, Houston, Texas

*Sequence-Specific Proteins Binding to the Repetitive Sequences of High Eukaryotic Genome
Olga Podgornaya
Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russia

*Protein-Binding DNA Sequences
O.L. Polanovsky
Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia

*Development of Intracellular Flow Karyotype Analysis
A.I. Poletaev
Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia

Mapping and Sequencing with BACs and Fosmids
Melvin I. Simon
California Institute of Technology, Pasadena, California

Towards a Globally Integrated, Sequence-Ready BAC Map of the Human Genome
Melvin I. Simon
California Institute of Technology, Pasadena, California

Generation of Normalized and Subtracted cDNA Libraries to Facilitate Gene Discovery
Marcelo Bento Soares
Columbia University, New York, New York

Mapping in Man-Mouse Homology Regions
Lisa Stubbs
Oak Ridge National Laboratory, Oak Ridge, Tennessee

Positional Cloning of Murine Genes
Lisa Stubbs
Oak Ridge National Laboratory, Oak Ridge, Tennessee

Human Artificial Episomal Chromosomes (HAECS) for Building Large Genomic Libraries
Jean-Michel H. Vos
University of North Carolina, Chapel Hill

*Cosmid and cDNA Map of a Human Chromosome 13q14 Region Frequently Lost at B Cell Chronic Lymphocytic Leukemia
N.K. Yankovsky
N.I. Vavilov Institute of General Genetics, Moscow, Russia

Informatics

BCM Server Core
Daniel Davison
Baylor College of Medicine, Houston, Texas

A Freely Sharable Database-Management System Designed for Use in Component-Based, Modular Genome Informatics Systems
Nathan Goodman
The Jackson Laboratory, Bar Harbor, Maine

A Software Environment for Large-Scale Sequencing
Mark Graves
Baylor College of Medicine, Houston, Texas

Generalized Hidden Markov Models for Genomic Sequence Analysis
David Haussler
University of California, Santa Cruz

Identification, Organization, and Analysis of Mammalian Repetitive DNA Information
Jerzy Jurka
Genetic Information Research Institute, Palo Alto, California

*TRRD, GERD and COMPEL: Databases on Gene-Expression Regulation as a Tool for Analysis of Functional Genomic Sequences
N.A. Kolchanov
Institute of Cytology and Genetics, Novosibirsk, Russia

Data-Management Tools for Genomic Databases
Victor M. Markowitz and I-Min A. Chen
Lawrence Berkeley National Laboratory, Berkeley, California

The Genome Topographer: System Design
T. Marr
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York

A Flexible Sequence Reconstructor for Large-Scale DNA Sequencing: A Customizable Software System for Fragment Assembly
Gene Myers
University of Arizona, Tucson

The Role of Integrated Software and Databases in Genome Sequence Interpretation and Metabolic Reconstruction
Ross Overbeek
Argonne National Laboratory, Argonne, Illinois

Database Transformations for Biological Applications
G. Christian Overton, Susan B. Davidson, and Peter Buneman
University of Pennsylvania, Philadelphia

Las Vegas Algorithm for Gene Recognition: Suboptimal and Error-Tolerant Spliced Alignment
Pavel A. Pevzner
University of Southern California, Los Angeles, California

Foundations for a Syntactic Pattern-Recognition System for Genomic DNA Sequences: Languages, Automata, Interfaces, and Macromolecules
David B. Searls
SmithKline Beecham Pharmaceuticals, King of Prussia, Pennsylvania

Analysis and Annotation of Nucleic Acid Sequence
David J. States
Washington University, St. Louis, Missouri

Gene Recognition, Modeling, and Homology Search in GRAIL and genQuest
Edward C. Uberbacher
Oak Ridge National Laboratory, Oak Ridge, Tennessee

Informatics Support for Mapping in Mouse-Human Homology Regions
Edward Uberbacher
Oak Ridge National Laboratory, Oak Ridge, Tennessee

SubmitData: Data Submission to Public Genomic Databases
Manfred D. Zorn
Lawrence Berkeley National Laboratory, University of California, Berkeley

ELSI

The Human Genome: Science and the Social Consequences; Interactive Exhibits and Programs on Genetics and the Human Genome
Charles C. Carlson
The Exploratorium, San Francisco, California

Documentary Series for Public Broadcasting
Graham Chedd and Noel Schwerin
Chedd-Angier Production Company, Watertown, Massachusetts

Human Genome Teacher Networking Project
Debra L. Collins and R. Neil Schimke
University of Kansas Medical Center, Kansas City, Kansas

Human Genome Education Program
Lane Conn
Stanford Human Genome Center, Palo Alto, California

Your World/Our World–Biotechnology & You: Special Issue on the Human Genome Project
Jeff Davidson and Laurence Weinberger
Pennsylvania Biotechnology Association, State College, Pennsylvania

The Human Genome Project and Mental Retardation: An Educational Program
Sharon Davis
The Arc of the United States, Arlington, Texas

Pathways to Genetic Screening: Molecular Genetics Meets the High-Risk Family
Troy Duster
University of California, Berkeley

Intellectual Property Issues in Genomics
Rebecca S. Eisenberg
University of Michigan Law School, Ann Arbor, Michigan

AAAS Congressional Fellowship Program
Stephen Goodman
The American Society of Human Genetics, Bethesda, Maryland

A Hispanic Educational Program for Scientific, Ethical, Legal, and Social Aspects of the Human Genome Project
Margaret C. Jefferson and Mary Ann Sesma1
California State University and 1Los Angeles Unified School District, Los Angeles, California

Implications of the Geneticization of Health Care for Primary Care Practitioners
Mary B. Mahowald
University of Chicago, Chicago, Illinois

Nontraditional Inheritance: Genetics and the Nature of Science; Instructional Materials for High School Biology
Joseph D. McInerney and B. Ellen Friedman
Biological Sciences Curriculum Study, Colorado Springs, Colorado

The Human Genome Project: Biology, Computers, and Privacy: Development of Educational Materials for High School Biology
Joseph D. McInerney and Lynda B. Micikas
Biological Sciences Curriculum Study, Colorado Springs, Colorado

Involvement of High School Students in Sequencing the Human Genome
Maureen M. Munn, Maynard V. Olson, and Leroy Hood
University of Washington, Seattle

The Gene Letter: A Newsletter on Ethical, Legal, and Social Issues in Genetics for Interested Professionals and Consumers
Philip J. Reilly, Dorothy C. Wertz, and Robin J.R. Blatt
The Shriver Center for Mental Retardation, Waltham, Massachusetts

The DNA Files: A Nationally Syndicated Series of Radio Programs on the Social Implications of Human Genome Research and Its Applications
Bari Scott
Genome Radio Project, KPFA-FM, Berkeley, California

Communicating Science in Plain Language: The Science+ Literacy for Health: Human Genome Project
Maria Sosa, Judy Kass, and Tracy Gath
American Association for the Advancement of Science, Washington, D.C.

The Community College Initiative
Sylvia J. Spengler and Laurel Egenberger
Lawrence Berkeley National Laboratory, Berkeley, California

Genome Educators
Sylvia Spengler and Janice Mann
Lawrence Berkeley National Laboratory, Berkeley, California

Getting the Word Out on the Human Genome Project: A Course for Physicians
Sara L. Tobin and Ann Boughton1
Stanford University, Palo Alto, California
1Thumbnail Graphics, Oklahoma City, Oklahoma

The Genetics Adjudication Resource Project
Franklin M. Zweig
Einstein Institute for Science, Health, and the Courts, Bethesda, Maryland

Infrastructure

Alexander Hollaender Distinguished Postdoctoral Fellowships
Linda Holmes and Eugene Spejewski
Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee

Human Genome Management Information System
Betty K. Mansfield and John S. Wassom
Oak Ridge National Laboratory, Oak Ridge, Tennessee

Human Genome Program Coordination
Sylvia J. Spengler
Lawrence Berkeley National Laboratory, Berkeley, California

Support of Human Genome Program Proposal Reviews
Walter Williams
Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee

Former Soviet Union Office of Health and Environmental Research Program
James Wright
Oak Ridge Institute for Science and Education, Oak Ridge, Tennessee

SBIR

1996 Phase I

An Engineered RNA/DNA Polymerase to Increase Speed and Economy of DNA Sequencing
Mark W. Knuth
Promega Corporation, Madison, Wisconsin

Directed Multiple DNA Sequencing and Expression Analysis by Hybridization
Gualberto Ruano
BIOS Laboratories, Inc., New Haven, Connecticut

1996 Phase II

A Graphical Ad Hoc Query Interface Capable of Accessing Heterogeneous Public Genome Databases
Joseph Leone
CyberConnect Corporation, Storrs, Connecticut

Low-Cost Automated Preparation of Plasmid, Cosmid, and Yeast DNA
William P. MacConnell
MacConnell Research Corporation, San Diego, California

GRAIL-GenQuest: A Comprehensive Computational Framework for DNA Sequence Analysis
Ruth Ann Manning
ApoCom, Inc., Oak Ridge, Tennessee



Appendix F: DOE BER Program

Text and photos in this appendix first appeared in a brochure prepared by the Human Genome Management Information System for the DOE Office of Biological and Environmental Research to announce a symposium celebrating 50 years of achievements in the Biological and Environmental Research Program. “Serving Science and Society into the New Millennium” was held on May 21–22, 1997, at the National Academy of Sciences in Washington, D.C. The color brochure and other recent publications related to BER research, including the historically comprehensive A Vital Legacy, may be obtained from HGMIS at the address on the inside front cover.


Biological and Environmental Research Program
Aristides Patrinos, Ph.D.
Associate Director for Energy Research for the Office of Biological and Environmental Research
U.S. Department of Energy
301/903-3251, Fax: 301/903-5051
http://www.er.doe.gov/production/ober/ober_top.html

William R. Wiley Environmental Molecular Sciences Laboratory (EMSL) is a national collaborative user facility for providing innovative approaches to meet the needs of DOE’s environmental missions.

National User Facilities

Dedicated biomedical resources, such as those maintained by BER at several DOE laboratories, are available at little or no charge. These resources enable scientists to gain an understanding of relationships between biological structures and their functions, study disease processes, develop new pharmaceuticals, and conduct basic research in molecular biology and environmental processes.

DOE Biological and Environmental Research Program

An Extraordinary Legacy

To exploit the boundless promise of energy technologies and shed light on their consequences to public health and the environment, the Biological and Environmental Research program of the U.S. Department of Energy’s (DOE) Office of Health and Environmental Research (OHER) has engaged in a variety of multidisciplinary research activities:

• Establishing the world’s first Human Genome Program.

• Developing advanced medical diagnostic tools and treatments for human disease.

• Assessing the health effects of radiation.


An Enduring Mandate

DOE is carrying forward Congressional mandates that began with its predecessors, the Atomic Energy Commission and the Energy Research and Development Administration:

Contribute to a Healthy Citizenry

• Develop innovative technologies for tomorrow’s biomedical sciences.

• Provide the basis for individual risk assessments by determining the human genome’s fine structure by the year 2005.

• Conduct research into advanced medical technologies and radiopharmaceuticals.

• Build and support national user facilities for determining biological structure, and ultimately function, at the molecular and cellular level.

DOE user facilities are revealing the molecular details of life. Knowing the 3-D structure of the ras protein (above), an important molecular switch governing human cell growth, will enable interventions to shut off this switch in cancer cells.

Understand Global Climate Change

Predict the effects of energy production and its use on the regional and global environment by acquiring data and developing the necessary understanding of environmental processes.

Determining the fine structure (DNA sequence) of the microorganism Methanococcus jannaschii (pictured at right, top) and other minimal life forms in DOE’s Microbial Genome Program will benefit medicine, agriculture, industrial and energy production, and environmental bioremediation. The circular representation of the single M. jannaschii chromosome, which was fully sequenced in 1996, illustrates the location of genes and other important features. (Vertical bar represents a portion of a sequencing experiment.)

Contribute to Environmental Cleanup

Conduct fundamental research to establish a better scientific basis for remediating contaminated sites.


Genome Projects

A legacy of DOE research on genetic effects paved the way for the world’s first Human Genome Program. Now new genomic technologies are being applied to environmental cleanup through the DOE Natural and Accelerated Bioremediation Research and Microbial Genome programs, healthcare and risk assessment, and such other national priorities as industrial processes and agriculture.

Discover the breadth of current activities and recent accomplishments via the BER Web Site:

http://www.er.doe.gov/production/ober/ober_top.html

The laser-based flow cytometer developed at DOE national laboratories enables researchers to separate human chromosomes for analysis.

Fifty Years of Achievements . . . Leading to Innovative Solutions

Tools for Medicine and Research

Radioisotopes developed for medicine and medical imaging are being merged with current knowledge in biology and genetics to discover new ways of diagnosing and treating cancer and other disorders, detecting genes in action, and understanding normal development and function of human organ systems.

One-quarter of all patients in U.S. hospitals undergo tests using descendants of cameras developed by BER to follow radioactive tracers in the body. PET scanning has been key to a generation of brain metabolism studies as well as diagnostic tests for heart disease and cancer. PET studies above reveal brain metabolism differences in recovering alcoholics (left, 10 days, and right, 30 days, after withdrawal from alcohol).

• Radioactive molecules used in medical imaging for positron emission tomography (PET) and magnetic resonance imaging (MRI) allow noninvasive diagnosis, monitoring, and exploration of human disorders and their treatments.

• Isotopes and other tracers of brain activity are being used to explore drug addiction, the effects of smoking, Alzheimer's disease, Parkinson's disease, and schizophrenia.

• Technetium-99m is used to diagnose diseases of the kidney, liver, heart, brain, and other organs in about 13 million patients per year.

• Striking successes have been achieved using charged atomic particles to treat thyroid diseases, pituitary tumors, and eye cancer, among other disorders.


Creating a New Science of Ecology

BER achievements in using radioactive tracers to follow the movements of animals, routes of chemicals through food chains, and decomposition of forest detritus, together with the program's introduction of computer simulations, created the new field of radioecology.

High-performance computing is promoting faster and more realistic solutions to long-term climate change.

The Unmanned Aerospace Vehicle (above) conducts measurements to quantify the fate of solar radiation falling on the earth.

Understanding Global Change

Important achievements in environmental research have led to enhanced capabilities in studying global change, including more accurate predictions of global and regional climate changes induced by increasing atmospheric concentrations of greenhouse gases.

Tracking the Regional and Global Movement of Pollutants

BER research helped to establish the earliest and most authoritative monitoring network in the world to detect airborne radioisotopes. The use of atmospheric tracers has led to the improved ability to predict the dispersion of pollutants.

Radiation Risks and Protection Guidelines

BER studies have become the foundation for laws and standards that protect the population, including workers exposed to radiological sources:

• Guidelines for the safe use of diagnostic X rays and radiopharmaceuticals.

• Safety standards for the presence of radionuclides in food and drinking water.

• Radiation-detection systems and dosimetry techniques.

Human chromosomes “painted” by fluorescent dyes to detect abnormal exchange of genetic material frequently present in cancer. Chromosome paints also serve as valuable resources for other clinical and research applications.

Finding a Link Between DNA Damage and Cancers

Studies of DNA damage have uncovered similar mechanisms at work in damage caused by radiation exposure, X rays, ultraviolet light, and cancer-causing chemicals. A screening test for such chemicals is now one of the first hurdles a new compound must clear on its way to regulatory and public acceptance.

“. . . (it’s) not so much where we stand as in what direction we are moving.” [Oliver Wendell Holmes, Sr.]




Glossary

A

Adenine (A): A nitrogenous base, one member of the base pair A-T (adenine-thymine).

Allele: Alternative form of a genetic locus; a single allele for each locus is inherited separately from each parent (e.g., at a locus for eye color the allele might result in blue or brown eyes).

Amino acid: Any of a class of 20 molecules that are combined to form proteins in living things. The sequence of amino acids in a protein and hence protein function are determined by the genetic code.

Amplification: An increase in the number of copies of a specific DNA fragment; can be in vivo or in vitro. See cloning, polymerase chain reaction.

Arrayed library: Individual primary recombinant clones (hosted in phage, cosmid, YAC, or other vector) that are placed in two-dimensional arrays in microtiter dishes. Each primary clone can be identified by the identity of the plate and the clone location (row and column) on that plate. Arrayed libraries of clones can be used for many applications, including screening for a specific gene or genomic region of interest as well as for physical mapping. Information gathered on individual clones from various genetic linkage and physical map analyses is entered into a relational database and used to construct physical and genetic linkage maps simultaneously; clone identifiers serve to interrelate the multilevel maps. Compare library, genomic library.
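As an illustration of how a plate identity plus a row and column can serve as a clone identifier, the sketch below builds and parses such identifiers in Python. The 96-well layout (rows A to H, columns 1 to 12) and the "P012-C07" naming format are assumptions made for this example only; the glossary does not specify a plate size or a naming scheme.

```python
import re
from typing import Tuple

ROWS = "ABCDEFGH"        # assumed: 8 rows of a 96-well microtiter plate
COLUMNS = range(1, 13)   # assumed: 12 columns per plate


def clone_id(plate: int, row: str, column: int) -> str:
    """Build a clone identifier from the plate number and well position."""
    if plate < 1:
        raise ValueError("plate numbers start at 1")
    if len(row) != 1 or row not in ROWS or column not in COLUMNS:
        raise ValueError(f"no such well: {row}{column}")
    # Hypothetical format: P<plate>-<row><column>, e.g. P012-C07
    return f"P{plate:03d}-{row}{column:02d}"


def parse_clone_id(identifier: str) -> Tuple[int, str, int]:
    """Recover (plate, row, column) from an identifier made by clone_id."""
    match = re.fullmatch(r"P(\d+)-([A-H])(\d{2})", identifier)
    if match is None:
        raise ValueError(f"not a clone identifier: {identifier!r}")
    return int(match.group(1)), match.group(2), int(match.group(3))
```

Because the identifier round-trips to a unique plate and well, it could serve as the key that links a clone's records across separate mapping tables.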

Autoradiography: A technique that uses X-ray film to visualize radioactively labeled molecules or fragments of molecules; used in analyzing length and number of DNA fragments after they are separated by gel electrophoresis.

Autosome: A chromosome not involved in sex determination. The diploid human genome consists of 46 chromosomes: 22 pairs of autosomes and 1 pair of sex chromosomes (the X and Y chromosomes).

B

BAC: See bacterial artificial chromosome.

Bacterial artificial chromosome (BAC): A vector used to clone DNA fragments (100- to 300-kb insert size; average, 150 kb) in Escherichia coli cells. Based on naturally occurring F-factor plasmid found in the bacterium E. coli. Compare cloning vector.

Bacteriophage: See phage.

Base pair (bp): Two nitrogenous bases (adenine and thymine or guanine and cytosine) held together by weak bonds. Two strands of DNA are held together in the shape of a double helix by the bonds between base pairs.

Base sequence: The order of nucleotide bases in a DNA molecule.

Base sequence analysis: A method, sometimes automated, for determining the base sequence.

Biotechnology: A set of biological techniques developed through basic research and now applied to research and product development. In particular, the use by industry of recombinant DNA, cell fusion, and new bioprocessing techniques.

bp: See base pair.

C

cDNA: See complementary DNA.

Centimorgan (cM): A unit of measure of recombination frequency. One centimorgan is equal to a 1% chance that a marker at one genetic locus will be separated from a marker at a second locus due to crossing over in a single generation. In human beings, 1 centimorgan is equivalent, on average, to 1 million base pairs.
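The arithmetic in this definition can be sketched in a few lines of Python. The function names are hypothetical, and the conversion uses only the glossary's human average of 1 million base pairs per centimorgan; actual recombination rates vary along the genome, and treating centimorgans as a simple percentage is reasonable only for short genetic distances.

```python
BP_PER_CM = 1_000_000  # human average stated in the glossary; varies by region


def cm_to_bp(distance_cm: float) -> float:
    """Approximate physical distance (base pairs) for a genetic distance."""
    return distance_cm * BP_PER_CM


def bp_to_cm(distance_bp: float) -> float:
    """Approximate genetic distance (cM) for a physical distance."""
    return distance_bp / BP_PER_CM


def separation_chance(distance_cm: float) -> float:
    """Chance per generation that two markers are separated by crossing over.

    Uses the glossary's rule of 1 cM = 1% and is meant for small distances.
    """
    return distance_cm / 100
```

Under these assumptions, two markers 2.5 cM apart would lie roughly 2.5 million base pairs apart and be separated in about 2.5% of meioses.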

Centromere: A specialized chromosome region to which spindle fibers attach during cell division.

Chromosome: The self-replicating genetic structure of cells containing the cellular DNA that bears in its nucleotide sequence the linear array of genes. In prokaryotes, chromosomal DNA is circular, and the entire genome is carried on one chromosome. Eukaryotic genomes consist of a number of chromosomes whose DNA is associated with different kinds of proteins.

Clone bank: See genomic library.

Clone: A group of cells derived from a single ancestor.

Cloning: The process of asexually producing a group of cells (clones), all genetically identical, from a single ancestor. In recombinant DNA technology, the use of DNA manipulation procedures to produce multiple copies of a single gene or segment of DNA is referred to as cloning DNA.

This glossary was adapted from definitions in the DOE Primer on Molecular Genetics (1992).

http://www.ornl.gov/hgmis/publicat/primer/intro.html

DOE Human Genome Program Report, Glossary


Cloning vector: DNA molecule originating from a virus, a plasmid, or the cell of a higher organism into which another DNA fragment of appropriate size can be integrated without loss of the vector’s capacity for self-replication; vectors introduce foreign DNA into host cells, where it can be reproduced in large quantities. Examples are plasmids, cosmids, and yeast artificial chromosomes; vectors are often recombinant molecules containing DNA sequences from several sources.

cM: See centimorgan.

Code: See genetic code.

Codon: See genetic code.

Complementary DNA (cDNA): DNA that is synthesized from a messenger RNA template; the single-stranded form is often used as a probe in physical mapping.

Complementary sequence: Nucleic acid base sequence that can form a double-stranded structure by matching base pairs with another sequence; the complementary sequence to G-T-A-C is C-A-T-G.
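The pairing rule in this definition (A with T, G with C) can be sketched as a small Python function. The function names are illustrative only. The first function pairs bases position by position, as in the glossary's G-T-A-C example; the second also reverses the result, which gives the partner strand read in its own conventional direction.

```python
# Watson-Crick pairing for DNA bases
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(sequence: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in sequence.upper())


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand read in the opposite direction."""
    return complement(sequence)[::-1]
```

For the glossary's example, `complement("GTAC")` gives `"CATG"`. A base outside A, T, G, and C raises a `KeyError`, which keeps the sketch strict rather than silently passing ambiguous codes.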

Conserved sequence: A base sequence in a DNA molecule (or an amino acid sequence in a protein) that has remained essentially unchanged throughout evolution.

Contig: Group of clones representing overlapping regions of a genome.

Contig map: A map depicting the relative order of a linked library of small overlapping clones representing a complete chromosomal segment.
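To make the idea of grouping overlapping clones concrete, the following Python sketch merges clones into contigs. It assumes, purely for illustration, that each clone already has known start and end coordinates on the chromosome; in practice overlaps are inferred from experimental data such as fingerprints or hybridization, not from known positions. The clone names and coordinates are hypothetical.

```python
from typing import List, Tuple

# (name, start, end): hypothetical coordinates in base pairs
Clone = Tuple[str, int, int]


def build_contigs(clones: List[Clone]) -> List[List[str]]:
    """Group clones into contigs; each contig lists clone names in map order."""
    contigs: List[List[str]] = []
    current: List[str] = []
    current_end = 0
    for name, start, end in sorted(clones, key=lambda c: (c[1], c[2])):
        if current and start < current_end:
            # Clone overlaps the contig built so far: extend it
            current.append(name)
            current_end = max(current_end, end)
        else:
            # Gap before this clone: close the old contig, start a new one
            if current:
                contigs.append(current)
            current = [name]
            current_end = end
    if current:
        contigs.append(current)
    return contigs
```

With clones spanning 0-40, 30-80, 70-120, and 200-240, the first three form one contig and the last stands alone, since nothing bridges the gap between 120 and 200.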

Cosmid: Artificially constructed cloning vector containing the cos gene of phage lambda. Cosmids can be packaged in lambda phage particles for infection into E. coli; this permits cloning of larger DNA fragments (up to 45 kb) than can be introduced into bacterial hosts in plasmid vectors.

Crossing over: The breaking during meiosis of one maternal and one paternal chromosome, the exchange of corresponding sections of DNA, and the rejoining of the chromosomes. This process can result in an exchange of alleles between chromosomes. Compare recombination.

Cytosine (C): A nitrogenous base, one member of the base pair G-C (guanine and cytosine).

D

Deoxyribonucleotide: See nucleotide.

Diploid: A full set of genetic material, consisting of paired chromosomes, one chromosome from each parental set. Most animal cells except the gametes have a diploid set of chromosomes. The diploid human genome has 46 chromosomes. Compare haploid.

DNA (deoxyribonucleic acid): The molecule that encodes genetic information. DNA is a double-stranded molecule held together by weak bonds between base pairs of nucleotides. The four nucleotides in DNA contain the bases: adenine (A), guanine (G), cytosine (C), and thymine (T). In nature, base pairs form only between A and T and between G and C; thus the base sequence of each single strand can be deduced from that of its partner.

DNA probe: See probe.

DNA replication: The use of existing DNA as a template for the synthesis of new DNA strands. In humans and other eukaryotes, replication occurs in the cell nucleus.

DNA sequence: The relative order of base pairs, whether in a fragment of DNA, a gene, a chromosome, or an entire genome. See base sequence analysis.

Domain: A discrete portion of a protein with its own function. The combination of domains in a single protein determines its overall function.

Double helix: The shape that two linear strands of DNA assume when bonded together.

E

E. coli: Common bacterium that has been studied intensively by geneticists because of its small genome size, normal lack of pathogenicity, and ease of growth in the laboratory.

Electrophoresis: A method of separating large molecules (such as DNA fragments or proteins) from a mixture of similar molecules. An electric current is passed through a medium containing the mixture, and each kind of molecule travels through the medium at a different rate, depending on its electrical charge and size. Separation is based on these differences. Agarose and acrylamide gels are the media commonly used for electrophoresis of proteins and nucleic acids.

Endonuclease: An enzyme that cleaves its nucleic acid substrate at internal sites in the nucleotide sequence.

Enzyme: A protein that acts as a catalyst, speeding the rate at which a biochemical reaction proceeds but not altering the direction or nature of the reaction.



EST: Expressed sequence tag. See sequence tagged site.

Eukaryote: Cell or organism with membrane-bound, struc-turally discrete nucleus and other well-developed subcellularcompartments. Eukaryotes include all organisms exceptviruses, bacteria, and blue-green algae. Compare prokaryote.See chromosome.

Evolutionarily conserved: See conserved sequence.

Exogenous DNA: DNA originating outside an organism.

Exon: The protein-coding DNA sequence of a gene. Com-pare intron.

Exonuclease: An enzyme that cleaves nucleotides sequen-tially from free ends of a linear nucleic acid substrate.

Expressed gene: See gene expression.

FFISH (fluorescence in situ hybridization): A physical map-ping approach that uses fluorescein tags to detect hybridiza-tion of probes with metaphase chromosomes and with theless-condensed somatic interphase chromatin.

Flow cytometry: Analysis of biological material by detec-tion of the light-absorbing or fluorescing properties of cellsor subcellular fractions (i.e., chromosomes) passing in a nar-row stream through a laser beam. An absorbance or fluores-cence profile of the sample is produced. Automated sortingdevices, used to fractionate samples, sort successive dropletsof the analyzed stream into different fractions depending onthe fluorescence emitted by each droplet.

Flow karyotyping: Use of flow cytometry to analyze andseparate chromosomes on the basis of their DNA content.

GGamete: Mature male or female reproductive cell (sperm orovum) with a haploid set of chromosomes (23 for humans).

Gene: The fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product (i.e., a protein or RNA molecule). See gene expression.

Gene expression: The process by which a gene’s coded information is converted into the structures present and operating in the cell. Expressed genes include those that are transcribed into mRNA and then translated into protein and those that are transcribed into RNA but not translated into protein (e.g., transfer and ribosomal RNAs).

Gene family: Group of closely related genes that make similar products.

Gene library: See genomic library.

Gene mapping: Determination of the relative positions of genes on a DNA molecule (chromosome or plasmid) and of the distance, in linkage units or physical units, between them.

Gene product: The biochemical material, either RNA or protein, resulting from expression of a gene. The amount of gene product is used to measure how active a gene is; abnormal amounts can be correlated with disease-causing alleles.

Genetic code: The sequence of nucleotides, coded in triplets (codons) along the mRNA, that determines the sequence of amino acids in protein synthesis. The DNA sequence of a gene can be used to predict the mRNA sequence, and the genetic code can in turn be used to predict the amino acid sequence.
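As an illustration of the genetic code entry above, the following Python sketch predicts an mRNA sequence from a coding-strand DNA sequence and then reads the mRNA codon by codon. The function names and the sample sequence are hypothetical; the codon table is the standard genetic code, and the sketch assumes an intron-free sequence that starts in frame.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna):
    """Predict the mRNA from the coding (sense) DNA strand: T becomes U."""
    return coding_dna.upper().replace("T", "U")

def translate(mrna):
    """Read the mRNA in triplets and stop at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate(transcribe("ATGGCCTGGTAA")))  # MAW
```

The one-letter output (M, A, W) stands for methionine, alanine, and tryptophan; the final codon, UAA, is a stop signal and contributes no amino acid.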

Genetic engineering technology: See recombinant DNA technology.

Genetic map: See linkage map.

Genetic material: See genome.

Genetics: The study of the patterns of inheritance of specific traits.

Genome: All the genetic material in the chromosomes of a particular organism; its size is generally given as its total number of base pairs.

Genome project: Research and technology development effort aimed at mapping and sequencing some or all of the genome of human beings and other organisms.

Genomic library: A collection of clones made from a set of randomly generated overlapping DNA fragments representing the entire genome of an organism. Compare library, arrayed library.

Guanine (G): A nitrogenous base, one member of the base pair G-C (guanine and cytosine).



H

Haploid: A single set of chromosomes (half the full set of genetic material), present in the egg and sperm cells of animals and in the egg and pollen cells of plants. Human beings have 23 chromosomes in their reproductive cells. Compare diploid.

Heterozygosity: The presence of different alleles at one or more loci on homologous chromosomes.

Homeobox: A short stretch of nucleotides whose base sequence is virtually identical in all the genes that contain it. It has been found in many organisms from fruit flies to human beings. In the fruit fly, a homeobox appears to determine when particular groups of genes are expressed during development.

Homology: Similarity in DNA or protein sequences between individuals of the same species or among different species.

Homologous chromosome: Chromosome containing the same linear gene sequences as another, each derived from one parent.

Human gene therapy: Insertion of normal DNA directly into cells to correct a genetic defect.

Human Genome Initiative: Collective name for several projects begun in 1986 by DOE to (1) create an ordered set of DNA segments from known chromosomal locations, (2) develop new computational methods for analyzing genetic map and DNA sequence data, and (3) develop new techniques and instruments for detecting and analyzing DNA. This DOE initiative is now known as the Human Genome Program. The national effort, led by DOE and NIH, is known as the Human Genome Project.

Hybridization: The process of joining two complementary strands of DNA or one each of DNA and RNA to form a double-stranded molecule.
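Complementarity, on which hybridization depends, can be checked computationally. The sketch below is a simplified illustration with hypothetical function names: it treats two DNA strands as able to hybridize only when one is the exact reverse complement of the other, and it ignores mismatches, temperature, and salt conditions, which matter in the laboratory.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))

def can_hybridize(strand_a, strand_b):
    """True if strand_b is the perfect reverse complement of strand_a."""
    return strand_b.upper() == reverse_complement(strand_a)

print(reverse_complement("ATGC"))     # GCAT
print(can_hybridize("ATGC", "GCAT"))  # True
```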

I

Informatics: The study of the application of computer and statistical techniques to the management of information. In genome projects, informatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.

In situ hybridization: Use of a DNA or RNA probe to detect the presence of the complementary DNA sequence in cloned bacterial or cultured eukaryotic cells.

Interphase: The period in the cell cycle when DNA is replicated in the nucleus; followed by mitosis.

Intron: The DNA base sequence interrupting the protein-coding sequence of a gene; this sequence is transcribed into RNA but is cut out of the message before it is translated into protein. Compare exon.

In vitro: Outside a living organism.

K

Karyotype: A photomicrograph of an individual’s chromosomes arranged in a standard format showing the number, size, and shape of each chromosome type; used in low-resolution physical mapping to correlate gross chromosomal abnormalities with the characteristics of specific diseases.

kb: See kilobase.

Kilobase (kb): Unit of length for DNA fragments equal to 1000 nucleotides.

L

Library: An unordered collection of clones (i.e., cloned DNA from a particular organism), whose relationship to each other can be established by physical mapping. Compare genomic library, arrayed library.

Linkage: The proximity of two or more markers (e.g., genes, RFLP markers) on a chromosome; the closer together the markers are, the lower the probability that they will be separated during DNA repair or replication processes (binary fission in prokaryotes, mitosis or meiosis in eukaryotes), and hence the greater the probability that they will be inherited together.

Linkage map: A map of the relative positions of genetic loci on a chromosome, determined on the basis of how often the loci are inherited together. Distance is measured in centimorgans (cM).
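As a rough illustration of how linkage-map distance relates to inheritance data, the sketch below converts a count of recombinant offspring into centimorgans, using the common convention that 1 cM corresponds to a 1% recombination frequency. The function name and the counts are hypothetical, and the approximation is reasonable only for closely linked loci, because multiple crossovers cause observed recombination to underestimate larger distances.

```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Estimate map distance in centimorgans from recombination frequency."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must lie between 0 and the total")
    # 1 cM is taken as a 1% chance that the two loci are separated.
    return 100.0 * recombinant_offspring / total_offspring

# Hypothetical cross: 12 recombinants among 200 offspring.
print(map_distance_cm(12, 200))  # 6.0
```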

Localize: Determination of the original position (locus) of a gene or other marker on a chromosome.

Locus (pl. loci): The position on a chromosome of a gene or other chromosome marker; also, the DNA at that position. The use of locus is sometimes restricted to mean regions of DNA that are expressed. See gene expression.

M

Macrorestriction map: Map depicting the order of and distance between sites at which restriction enzymes cleave chromosomes.

Mapping: See gene mapping, linkage map, physical map.

Marker: An identifiable physical location on a chromosome (e.g., restriction enzyme cutting site, gene) whose inheritance can be monitored. Markers can be expressed regions of DNA (genes) or some segment of DNA with no known coding function but whose pattern of inheritance can be determined. See RFLP, restriction fragment length polymorphism.

Mb: See megabase.

Megabase (Mb): Unit of length for DNA fragments equal to 1 million nucleotides and roughly equal to 1 cM.

Meiosis: The process of two consecutive cell divisions in the diploid progenitors of sex cells. Meiosis results in four rather than two daughter cells, each with a haploid set of chromosomes.

Messenger RNA (mRNA): RNA that serves as a template for protein synthesis. See genetic code.

Metaphase: A stage in mitosis or meiosis during which the chromosomes are aligned along the equatorial plane of the cell.

Mitosis: The process of nuclear division in cells that produces daughter cells that are genetically identical to each other and to the parent cell.

mRNA: See messenger RNA.

Multifactorial or multigenic disorder: See polygenic disorder.

Multiplexing: A sequencing approach that uses several pooled samples simultaneously, greatly increasing sequencing speed.

Mutation: Any heritable change in DNA sequence. Compare polymorphism.

N

Nitrogenous base: A nitrogen-containing molecule having the chemical properties of a base.

Nucleic acid: A large molecule composed of nucleotide subunits.

Nucleotide: A subunit of DNA or RNA consisting of a nitrogenous base (adenine, guanine, thymine, or cytosine in DNA; adenine, guanine, uracil, or cytosine in RNA), a phosphate molecule, and a sugar molecule (deoxyribose in DNA and ribose in RNA). Thousands of nucleotides are linked to form a DNA or RNA molecule. See DNA, base pair, RNA.

Nucleus: The cellular organelle in eukaryotes that contains the genetic material.

O

Oncogene: A gene, one or more forms of which is associated with cancer. Many oncogenes are involved, directly or indirectly, in controlling the rate of cell growth.

Overlapping clones: See genomic library.

P

P1-derived artificial chromosome (PAC): A vector used to clone DNA fragments (100- to 300-kb insert size; average, 150 kb) in Escherichia coli cells. Based on bacteriophage (a virus) P1 genome. Compare cloning vector.

PAC: See P1-derived artificial chromosome.

PCR: See polymerase chain reaction.

Phage: A virus for which the natural host is a bacterial cell.

Physical map: A map of the locations of identifiable landmarks on DNA (e.g., restriction enzyme cutting sites, genes), regardless of inheritance. Distance is measured in base pairs. For the human genome, the lowest-resolution physical map is the banding patterns on the 24 different chromosomes; the highest-resolution map would be the complete nucleotide sequence of the chromosomes.

Plasmid: Autonomously replicating, extrachromosomal circular DNA molecules, distinct from the normal bacterial genome and nonessential for cell survival under nonselective conditions. Some plasmids are capable of integrating into the host genome. A number of artificially constructed plasmids are used as cloning vectors.

Polygenic disorder: Genetic disorder resulting from the combined action of alleles of more than one gene (e.g., heart disease, diabetes, and some cancers). Although such disorders are inherited, they depend on the simultaneous presence of several alleles; thus the hereditary patterns are usually more complex than those of single-gene disorders. Compare single-gene disorders.

Polymerase chain reaction (PCR): A method for amplifying a DNA base sequence using a heat-stable polymerase and two 20-base primers, one complementary to the (+)-strand at one end of the sequence to be amplified and the other complementary to the (-)-strand at the other end. Because the newly synthesized DNA strands can subsequently serve as additional templates for the same primer sequences, successive rounds of primer annealing, strand elongation, and dissociation produce rapid and highly specific amplification of the desired sequence. PCR also can be used to detect the existence of the defined sequence in a DNA sample.
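The PCR entry above can be sketched in code. This simplified Python model is illustrative only: the function names, the 26-base template, and the 6-base primers (shortened from the 20 bases mentioned above) are all hypothetical, primers must match exactly, and the copy count assumes every cycle doubles the product, which real reactions approach only in early cycles.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def amplicon(template, forward_primer, reverse_primer):
    """Return the region of the (+)-strand bounded by the two primers.

    The forward primer matches the (+)-strand directly; the reverse primer
    is complementary to it, so its binding site on the (+)-strand is the
    primer's reverse complement. Returns None if either site is missing.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    site = reverse_complement(reverse_primer)
    hit = template.find(site, start + len(forward_primer))
    if hit == -1:
        return None
    return template[start:hit + len(site)]

def ideal_copies(initial_copies, cycles):
    """Idealized yield: the number of copies doubles on every cycle."""
    return initial_copies * 2 ** cycles

template = "TTACGATGCCGTAGGCTAACGTTAGC"
print(amplicon(template, "ACGATG", "GTTAGC"))  # ACGATGCCGTAGGCTAAC
print(ideal_copies(1, 30))                     # 1073741824
```

The same search also models the detection use of PCR: if either primer site is absent from the sample sequence, the function returns None and nothing is amplified.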

Polymerase, DNA or RNA: Enzymes that catalyze the synthesis of nucleic acids on preexisting nucleic acid templates, assembling RNA from ribonucleotides or DNA from deoxyribonucleotides.

Polymorphism: Difference in DNA sequence among individuals. Genetic variations occurring in more than 1% of a population would be considered useful polymorphisms for genetic linkage analysis. Compare mutation.

Primer: Short preexisting polynucleotide chain to which new deoxyribonucleotides can be added by DNA polymerase.

Probe: Single-stranded DNA or RNA molecules of specific base sequence, labeled either radioactively or immunologically, that are used to detect the complementary base sequence by hybridization.

Prokaryote: Cell or organism lacking a membrane-bound, structurally discrete nucleus and other subcellular compartments. Bacteria are prokaryotes. Compare eukaryote. See chromosome.

Promoter: A site on DNA to which RNA polymerase will bind and initiate transcription.

Protein: A large molecule composed of one or more chains of amino acids in a specific order; the order is determined by the base sequence of nucleotides in the gene coding for the protein. Proteins are required for the structure, function, and regulation of the body’s cells, tissues, and organs, and each protein has unique functions. Examples are hormones, enzymes, and antibodies.

Purine: A nitrogen-containing, double-ring, basic compound that occurs in nucleic acids. The purines in DNA and RNA are adenine and guanine.

Pyrimidine: A nitrogen-containing, single-ring, basic compound that occurs in nucleic acids. The pyrimidines in DNA are cytosine and thymine; in RNA, cytosine and uracil.

R

Rare-cutter enzyme: See restriction enzyme cutting site.

Recombinant clone: Clone containing recombinant DNA molecules. See recombinant DNA technology.

Recombinant DNA molecules: A combination of DNA molecules of different origin that are joined using recombinant DNA technologies.

Recombinant DNA technology: Procedure used to join together DNA segments in a cell-free system (an environment outside a cell or organism). Under appropriate conditions, a recombinant DNA molecule can enter a cell and replicate there, either autonomously or after it has become integrated into a cellular chromosome.

Recombination: The process by which progeny derive a combination of genes different from that of either parent. In higher organisms, this can occur by crossing over.

Regulatory region or sequence: A DNA base sequence that controls gene expression.

Resolution: Degree of molecular detail on a physical map of DNA, ranging from low to high.

Restriction enzyme, endonuclease: A protein that recognizes specific, short nucleotide sequences and cuts DNA at those sites. Bacteria contain over 400 such enzymes that recognize and cut over 100 different DNA sequences. See restriction enzyme cutting site.



Restriction enzyme cutting site: A specific nucleotide sequence of DNA at which a particular restriction enzyme cuts the DNA. Some sites occur frequently in DNA (e.g., every several hundred base pairs), others much less frequently (rare-cutter; e.g., every 10,000 base pairs).

Restriction fragment length polymorphism (RFLP): Variation between individuals in DNA fragment sizes cut by specific restriction enzymes; polymorphic sequences that result in RFLPs are used as markers on both physical maps and genetic linkage maps. RFLPs are usually caused by mutation at a cutting site. See marker.
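The two entries above can be illustrated with a small in-silico digest. The sketch below uses the recognition sequence of the enzyme EcoRI (GAATTC, cut after the first G) on two hypothetical 20-base alleles that differ by one base inside a cutting site. The function names and sequences are invented for illustration, and the model follows a single strand of a linear molecule only.

```python
def cut_positions(dna, site, cut_offset):
    """Positions at which the enzyme cuts, scanning one strand left to right."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = dna.find(site, start + 1)
    return positions

def fragment_lengths(dna, site, cut_offset):
    """Lengths of the fragments left after cutting a linear molecule."""
    bounds = [0] + cut_positions(dna, site, cut_offset) + [len(dna)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

ECORI_SITE = "GAATTC"  # EcoRI cuts between G and AATTC
ECORI_OFFSET = 1

allele_a = "AAGAATTCTTTTGAATTCGG"  # two intact sites
allele_b = "AAGAATTCTTTTGAGTTCGG"  # one base changed in the second site

print(fragment_lengths(allele_a, ECORI_SITE, ECORI_OFFSET))  # [3, 10, 7]
print(fragment_lengths(allele_b, ECORI_SITE, ECORI_OFFSET))  # [3, 17]
```

The single-base difference removes one cutting site, so the two alleles yield different fragment sizes; that difference is what is detected as an RFLP.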

RFLP: See restriction fragment length polymorphism.

Ribonucleic acid (RNA): A chemical found in the nucleus and cytoplasm of cells; it plays an important role in protein synthesis and other chemical activities of the cell. The structure of RNA is similar to that of DNA. There are several classes of RNA molecules, including messenger RNA, transfer RNA, ribosomal RNA, and other small RNAs, each serving a different purpose.

Ribonucleotide: See nucleotide.

Ribosomal RNA (rRNA): A class of RNA found in the ribosomes of cells.

Ribosomes: Small cellular components composed of specialized ribosomal RNA and protein; site of protein synthesis. See ribonucleic acid (RNA).

RNA: See ribonucleic acid.

S

Sequence: See base sequence.

Sequence tagged site (STS): Short (200 to 500 base pairs) DNA sequence that has a single occurrence in the human genome and whose location and base sequence are known. Detectable by polymerase chain reaction, STSs are useful for localizing and orienting the mapping and sequence data reported from many different laboratories and serve as landmarks on the developing physical map of the human genome. Expressed sequence tags (ESTs) are STSs derived from cDNAs.
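The defining property of an STS, a single occurrence in the genome, can be checked on sequence data. The sketch below is a toy illustration with hypothetical names and a made-up 20-base "genome"; it counts exact matches on both strands and is not a substitute for the laboratory PCR assay described above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def is_single_copy(genome, candidate):
    """True if the candidate occurs exactly once, counting both strands."""
    total = count_overlapping(genome, candidate)
    rc = reverse_complement(candidate)
    if rc != candidate:  # a palindromic site would otherwise be counted twice
        total += count_overlapping(genome, rc)
    return total == 1

toy_genome = "ACGTTGCAAGGCTTACCGGA"
print(is_single_copy(toy_genome, "GGCTTA"))  # True
print(is_single_copy(toy_genome, "GG"))      # False
```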

Sequencing: Determination of the order of nucleotides (base sequences) in a DNA or RNA molecule or the order of amino acids in a protein.

Sex chromosome: The X or Y chromosome in human beings that determines the sex of an individual. Females have two X chromosomes in diploid cells; males have an X and a Y chromosome. The sex chromosomes comprise the 23rd chromosome pair in a karyotype. Compare autosome.

Shotgun method: Cloning of DNA fragments randomly generated from a genome. See library, genomic library.

Single-gene disorder: Hereditary disorder caused by a mutant allele of a single gene (e.g., Duchenne muscular dystrophy, retinoblastoma, sickle cell disease). Compare polygenic disorders.

Somatic cell: Any cell in the body except gametes and their precursors.

Southern blotting: Transfer by absorption of DNA fragments separated in electrophoretic gels to membrane filters for detection of specific base sequences by radiolabeled complementary probes.

STS: See sequence tagged site.

T

Tandem repeat sequences: Multiple copies of the same base sequence on a chromosome; used as a marker in physical mapping.

Technology transfer: The process of converting scientific findings from research laboratories into useful products by the commercial sector.

Telomere: The end of a chromosome. This specialized structure is involved in the replication and stability of linear DNA molecules. See DNA replication.

Thymine (T): A nitrogenous base, one member of the base pair A-T (adenine-thymine).

Transcription: The synthesis of an RNA copy from a sequence of DNA (a gene); the first step in gene expression. Compare translation.

Transfer RNA (tRNA): A class of RNA having structures with triplet nucleotide sequences that are complementary to the triplet nucleotide coding sequences of mRNA. The role of tRNAs in protein synthesis is to bond with amino acids and transfer them to the ribosomes, where proteins are assembled according to the genetic code carried by mRNA.



Transformation: A process by which the genetic material carried by an individual cell is altered by incorporation of exogenous DNA into its genome.

Translation: The process in which the genetic code carried by mRNA directs the synthesis of proteins from amino acids. Compare transcription.

tRNA: See transfer RNA.

U

Uracil: A nitrogenous base normally found in RNA but not DNA; uracil is capable of forming a base pair with adenine.

V

Vector: See cloning vector.

Virus: A noncellular biological entity that can reproduce only within a host cell. Viruses consist of nucleic acid covered by protein; some animal viruses are also surrounded by membrane. Inside the infected cell, the virus uses the synthetic capability of the host to produce progeny virus.

VLSI: Very large scale integration allowing more than 100,000 transistors on a chip.

Y

YAC: See yeast artificial chromosome.

Yeast artificial chromosome (YAC): A vector used to clone DNA fragments (up to 400 kb); it is constructed from the telomeric, centromeric, and replication origin sequences needed for replication in yeast cells. Compare cloning vector.


DOE/ER-0713 (Part 2)

Date Published: November 1997

Prepared for the
U.S. Department of Energy
Office of Energy Research
Office of Biological and Environmental Research
Germantown, MD 20874-1290

Prepared by the
Human Genome Management Information System
Oak Ridge National Laboratory
Oak Ridge, TN 37830-6480
managed by
Lockheed Martin Energy Research Corporation
for the
U.S. Department of Energy
Under Contract DE-AC05-96OR22464

Part 2, 1996 Research Abstracts

Preface

More than a decade ago, the Office of Health and Environmental Research (OHER) of the U.S. Department of Energy (DOE) struck a bold course in launching its Human Genome Initiative, convinced that its mission would be well served by a comprehensive picture of the human genome. Organizers recognized that the information the project would generate, both technological and genetic, would contribute not only to a new understanding of human biology and the effects of energy technologies but also to a host of practical applications in the biotechnology industry and in the arenas of agriculture and environmental protection.

Today, the project’s value appears beyond doubt as worldwide participation contributes toward the goals of determining the human genome’s complete sequence by 2005 and elucidating the genome structure of several model organisms as well. This report summarizes the content and progress of the DOE Human Genome Program (HGP). Descriptive research summaries, along with information on program history, goals, management, and current research highlights, provide a comprehensive view of the DOE program.

Last year marked an early transition to the third and final phase of the U.S. Human Genome Project as pilot programs to refine large-scale sequencing strategies and resources were funded by DOE and the National Institutes of Health, the two sponsoring U.S. agencies. The human genome centers at Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory had been serving as the core of DOE multidisciplinary HGP research, which requires extensive contributions from biologists, engineers, chemists, computer scientists, and mathematicians. These team efforts were complemented by those at other DOE-supported laboratories and about 60 universities, research organizations, companies, and foreign institutions. Now, to focus DOE’s considerable resources on meeting the challenges of large-scale sequencing, the sequencing efforts of the three genome centers have been integrated into the Joint Genome Institute. The institute will continue to bring together research from other DOE-supported laboratories. Work in other critical areas continues to develop the resources and technologies needed for production sequencing; computational approaches to data management and interpretation (called informatics); and an exploration of the important ethical, legal, and social issues arising from use of the generated data, particularly regarding the privacy and confidentiality of genetic information.

Insights, technologies, and infrastructure emerging from the Human Genome Project are catalyzing a biological revolution. Health-related biotechnology is already a success story and is still far from reaching its potential. Other applications are likely to beget similar successes in coming decades; among these are several of great importance to DOE. We can look to improvements in waste control and an exciting era of environmental bioremediation; we will see new approaches to improving energy efficiency; and we can hope for dramatic strides toward meeting the fuel demands of the future.

In 1997 OHER, renamed the Office of Biological and Environmental Research (OBER), is celebrating 50 years of conducting research to exploit the boundless promise of energy technologies while exploring their consequences to the public’s health and the environment. The DOE Human Genome Program and a related spin-off project, the Microbial Genome Program, are major components of the Biological and Environmental Research Program of OBER.

DOE OBER is proud of its contributions to the Human Genome Project and welcomes general or scientific inquiries concerning its genome programs. Announcements soliciting research applications appear in Federal Register, Science, Human Genome News, and other publications. The deadline for formal applications is generally midsummer for awards to be made the next year, and submission of preproposals in areas of potential interest is strongly encouraged. Further information may be obtained by contacting the program office or visiting the DOE home page (301/903-6488, Fax: -8521, [email protected], URL: http://www.er.doe.gov/production/ober/hug_top.html).

Aristides Patrinos, Associate Director
Office of Biological and Environmental Research
U.S. Department of Energy
November 3, 1997


Foreword

The research abstracts in this section were funded in FY 1996 by the DOE Office of Health and Environmental Research, which was renamed Office of Biological and Environmental Research in 1997.

These unedited abstracts were contributed by DOE Human Genome Program grantees and contractors. Names of principal investigators are in bold print. Contact information, submitted in 1996, is for the first person named unless another investigator is designated as contact person. Principal investigators of research projects described by abstracts in this section are listed under their respective subject categories, and an index of all investigators named in the abstracts is given at the end of this report.

Part 1 of this report contains narratives that represent DOE Human Genome Program research in large, multidisciplinary projects. As a convenience to the reader, these narratives are reprinted (without graphics) as an appendix to this volume, Part 2. The projects represent work at the Joint Genome Institute (p. 72), Lawrence Livermore National Laboratory Human Genome Center (p. 73), Los Alamos National Laboratory Center for Human Genome Studies (p. 77), Lawrence Berkeley National Laboratory Human Genome Center (p. 81), University of Washington Genome Center (p. 85), Genome Database (p. 87), and National Center for Genome Resources (p. 91). Only the contact persons for these organizations are listed in the Index to Principal and Coinvestigators. More information on research carried out in these projects can be found on their listed Web sites.


Contents

1996 Research Abstracts

Sequencing

Mapping

Informatics

Ethical, Legal, and Social Issues

Infrastructure

Small Business Innovative Research

Projects Completed FY 1994–95

Appendix: Narratives from Large, Multidisciplinary Research Projects

(Text reprinted from Human Genome Program Report: Part 1, Overview and Progress)

Index to Principal and Coinvestigators

Acronym List


1996 Research Abstracts

Project Categories and Principal Investigators

Sequencing

W.H. Benner and J.M. Jaklevic

Chung-Hsuan Chen

George Church

Pieter de Jong

Norman Dovichi

John J. Dunn and F. William Studier (two abstracts)

Glen A. Evans (two abstracts)

*V.L. Florentiev

Raymond F. Gesteland

Joe Gray and Daniel Pinkel

Trevor Hawkins

Leroy Hood, Mark D. Adams, and Melvin Simon

Barry L. Karger

Richard A. Mathies and Alexander N. Glazer

Andrei Mirzabekov

Robert K. Moyzis

Barbara Ramsay Shaw

Lloyd M. Smith and Richard A. Guilfoyle

Lloyd M. Smith

Richard D. Smith (two abstracts)

Stanley Tabor

Levy Ulanovsky

Peter Williams

Edward S. Yeung

Mapping

David Allison and Bruce Warmack

*Alexandre S. Boitsov

Charles R. Cantor

Pieter J. de Jong

Jacques R. Fresco

Fa-Ten Kao

*V.L. Karpov

*Russian scientists designated by an asterisk received small emergency grants following December 1992 site reviews by David Galas (formerly DOE Office of Health and Environmental Research, which was renamed Office of Biological and Environmental Research in 1997), Raymond Gesteland (University of Utah), and Elbert Branscomb (Lawrence Livermore National Laboratory).

Julie R. Korenberg

D. L. Nelson

*Olga Podgornaya

*O.L. Polanovsky

*A.I. Poletaev

Melvin I. Simon

Melvin I. Simon

Marcelo Bento Soares

Lisa Stubbs

Lisa Stubbs

Jean-Michel H. Vos

*N.K. Yankovsky

Informatics

Daniel Davison

Nathan Goodman

Mark Graves

David Haussler

Jerzy Jurka

*N.A. Kolchanov

Victor M. Markowitz and I-Min A. Chen

T. Marr

Gene Myers

Ross Overbeek

G. Christian Overton, Susan B. Davidson, and Peter Buneman

Pavel A. Pevzner

David B. Searls

David J. States

Edward C. Uberbacher

Edward Uberbacher

Manfred D. Zorn

Ethical, Legal, and Social Issues

Charles C. Carlson

Graham Chedd and Noel Schwerin

Debra L. Collins and R. Neil Schimke

Lane Conn

Jeff Davidson and Laurence Weinberger

Sharon Davis

Troy Duster

Rebecca S. Eisenberg

Stephen Goodman

Margaret C. Jefferson and Mary Ann Sesma

Mary B. Mahowald

Joseph D. McInerney and B. Ellen Friedman

Joseph D. McInerney, Lynda B. Micikas

Maureen M. Munn, Maynard V. Olson, and Leroy Hood

Philip J. Reilly, Dorothy C. Wertz, and Robin J.R. Blatt

Bari Scott

Maria Sosa

Sylvia J. Spengler

Sylvia Spengler and Janice Mann

Sara L. Tobin and Ann Boughton

Franklin M. Zweig

Infrastructure

Linda Holmes and Eugene Spejewski

Betty K. Mansfield and John S. Wassom

Sylvia J. Spengler

Walter Williams

James Wright

Small Business Innovation Research

Mark W. Knuth

Gualberto Ruano

Joseph Leone

William P. MacConnell

Ruth Ann Manning

DOE Human Genome Program Report, Part 2, 1996 Research Abstracts

Advanced Detectors for Mass Spectrometry

W.H. Benner and J.M. Jaklevic
Human Genome Group; Engineering Science Department; Lawrence Berkeley National Laboratory; University of California; Berkeley, CA 94720
510/486-7194, Fax: -5857, [email protected]
http://www-hgc.lbl.gov

Mass spectrometry is an instrumental method capable of producing rapid analyses with high mass accuracy. When applied to genome research, it is an attractive alternative to gel electrophoresis. At present, routine DNA analysis by mass spectrometry is limited to small DNA fragments. In contrast to other mass spectrometry facilities, which emphasize the development of ladder sequencing, we are exploring the application of mass spectrometry to procedures that identify short sequences. This approach helps the molecular biologists associated with LBL's Human Genome Center to identify redundant sequences and vector contamination in clones rapidly, thereby improving sequencing efficiency. We are also attempting to implement a rapid mass spectrometry-based screening procedure for PCR products.

Implementing these applications requires improved performance from matrix-assisted laser desorption ionization (MALDI) and electrospray mass spectrometry. Our focus is the development of new ion detectors that will advance the state of the art of each of these two types of spectrometers. One limitation in applying mass spectrometry to DNA analysis is the poor efficiency with which conventional electron multipliers detect large ions, a problem most apparent in MALDI-TOF-MS. To solve this problem, we are developing alternative detection schemes that rely on heat-pulse detection. The kinetic energy of impacting ions is converted into heat when ions strike a detector, and we are attempting to measure such heat pulses indirectly. We are developing a type of cryogenic detector called a superconducting tunnel junction device, which responds to the phonons produced when ions strike the detector. This detector does not rely on the formation of secondary electrons. We have demonstrated this type of detector to be at least two orders of magnitude more sensitive, on an area-normalized basis, than microchannel plate ion detectors. This development could extend the upper mass limit of MALDI-TOF-MS and increase sensitivity.

Electrospray ion sources generate ions of mega-Dalton DNA with minimal fragmentation, but the mass spectrometric analyses of these large ions usually lead only to a mass-to-charge distribution. If the ion charge were known, the actual mass could be determined. To address this problem, we are developing a detector that will simultaneously measure the charge and velocity of individual ions. We have been able to mass analyze DNA molecules in the 1 to 10 MDa range using charge-detection mass spectrometry. In this technique, individual electrospray ions are directed to fly through a metal tube, which detects their image charge. Simultaneous measurement of their velocity provides a way to measure their mass when ions of known energy are sampled. Several thousand ions can be analyzed in a few minutes, thus generating statistically significant mass values for the ions in a sample population. We are attempting to apply this technology to the analysis of PCR products.
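The relationship between charge, velocity, and mass described above can be sketched in a few lines of Python. This is only an illustration: it assumes each ion carries a known energy per charge V, so that (1/2)mv^2 = zeV, and the function names, tube length, transit time, and example values are hypothetical rather than taken from the instrument described here.

```python
E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms


def ion_velocity(tube_length_m, transit_time_s):
    """Velocity of an ion from its transit time through the detector tube."""
    if transit_time_s <= 0:
        raise ValueError("transit time must be positive")
    return tube_length_m / transit_time_s


def ion_mass_da(n_charges, velocity_m_s, energy_per_charge_v):
    """Mass in daltons from charge count, velocity, and energy per charge.

    Uses (1/2) m v^2 = z e V, so m = 2 z e V / v^2.
    """
    if velocity_m_s <= 0:
        raise ValueError("velocity must be positive")
    mass_kg = 2.0 * n_charges * E_CHARGE * energy_per_charge_v / velocity_m_s ** 2
    return mass_kg / DALTON


# Hypothetical ion: 1,000 charges, 200 eV per charge, 5 cm tube crossed in 20 microseconds.
v = ion_velocity(tube_length_m=0.05, transit_time_s=2.0e-5)
print(f"{ion_mass_da(1000, v, 200.0) / 1e6:.2f} MDa")
```

Averaging such single-ion values over several thousand ions would give the population-level mass estimate mentioned in the abstract.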

DOE Contract No. DE-AC03-76SF00098.

Mass Spectrometer for Human Genome Sequencing

Chung-Hsuan Chen, Steve L. Allman, and K. Bruce Jacobson
Oak Ridge National Laboratory; Oak Ridge, TN 37831
423/574-5895, Fax: -2115, [email protected]

The objective of this program is to develop an innovative fast DNA sequencing technology for the Human Genome Project. The technology can also be applied to fast screening for genetic and contagious diseases, DNA fingerprinting, and environmental impact analysis.

Our approach is to replace conventional gel electrophoresis sequencing methods with lasers and mass spectrometry. The present gel sequencing method usually takes hours to days to complete a DNA analysis or sequencing run, since DNA segments of different lengths must be separated in a dense gel. With the laser desorption mass spectrometry (LDMS) approach, DNA segments of various sizes are separated in the vacuum chamber of a mass spectrometer. Thus, the time taken to separate DNA of various sizes is less than one second, compared to hours using other methods.
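To give a sense of why separation in a vacuum chamber takes well under a second, the sketch below estimates flight times in an idealized linear time-of-flight analyzer. The 1 m drift length, 20 kV accelerating voltage, single charge, and rough average of 330 Da per nucleotide are illustrative assumptions, not parameters reported for this instrument.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms
AVG_NUCLEOTIDE_DA = 330.0    # assumed rough average mass per ssDNA nucleotide


def flight_time_s(n_bases, drift_m=1.0, accel_volts=20000.0, n_charges=1):
    """Idealized drift time: the ion gains z*e*V of kinetic energy, then coasts."""
    mass_kg = n_bases * AVG_NUCLEOTIDE_DA * DALTON
    velocity = math.sqrt(2.0 * n_charges * E_CHARGE * accel_volts / mass_kg)
    return drift_m / velocity


# Longer fragments are heavier, so they arrive later.
for n in (20, 50, 100):
    print(n, f"{flight_time_s(n) * 1e6:.1f} microseconds")
```

Under these assumptions fragments differing in length arrive at distinct times on a microsecond scale, which is consistent with the sub-second separation time stated above.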

Recently, we successfully demonstrated sequencing of short DNA segments with this approach. We have also succeeded in using LDMS for fast screening for cystic fibrosis, identifying both point mutations and deletions associated with the disease. In addition, we have had preliminary success in using LDMS for DNA fingerprinting. We therefore expect LDMS to emerge as a new and important biotechnological tool for DNA analysis.

DOE Contract No. DE-AC05-84OR21400.

Sequencing

*Projects designated by an asterisk received small emergency grants following December 1992 site reviews by David Galas (formerly DOE Office of Health and Environmental Research, which was renamed Office of Biological and Environmental Research in 1997), Raymond Gesteland (University of Utah), and Elbert Branscomb (Lawrence Livermore National Laboratory).

Genomic Sequence Comparisons

George Church
Harvard Medical School; Boston, MA 02115
617/432-0503 or -7562, Fax: -7266
http://arep.med.harvard.edu

The first objective of this project is completion of an automated system to sequence DNA using electrophore mass-tag (EMT) primers for dideoxy sequencing. The prototype machine will contain a 60-capillary array with 400 EMT-labeled sequence ladders per capillary. The system is designed to use 100-fold less reagent and have 500-fold higher speed (1000 bases per sec per instrument) than current sequencing technology. EMTs are cleaved and laser-desorbed from membranes for subsequent detection by EC-TOF mass spectrometry. The second objective is to overcome the limitations of purely hypothetical annotation of the growing number of reading frames in new genome sequences. We measure gene product levels and interactions using DNA microarrays, whole-genome in vivo footprinting, and crosslinking.

Our approach integrates instrumentation, organic chemistry, molecular biology, electrophoresis, and software to increase sequencing accuracy and efficiency. Likewise, we integrate such instruments and others with the needs of acquiring and annotating large-scale microbial and human genomic sequence and population polymorphisms.

To establish functions for new genes, we use large-scale phenotyping by multiplexed growth competition assays, both by targeted deletion and by saturation insertional mutagenesis. We will continue to develop a system to sequence DNA using electrophore mass-tags (EMTs). We will establish genome-scale experimental methods for sequence annotation.

The most significant findings in 1995-1996 were (1) demonstration of the use of electrophore mass-tags in dideoxy sequencing; (2) development of an IR-laser desorption method and model; (3) a novel dsDNA microarray synthesis strategy; (4) a new amplifiable differential display for whole-genome in vivo DNA-protein interactions; and (5) establishment and application of a microbial DNA-protein interaction database.

DOE Grant No. DE-FG02-87ER60565.

A PAC/BAC End-Sequence Data Resource for Sequencing the Human Genome: A 2-Year Pilot Study

Pieter de Jong
Roswell Park Cancer Institute; Buffalo, NY 14263
716/845-3168, Fax: -8849, [email protected]
http://bacpac.med.buffalo.edu

Large-scale sequencing of the human genome requires the availability of high-fidelity clones with large genomic inserts and a mechanism to find clones with minimal overlaps within the clone collections. The first need can be satisfied with the bacterial artificial chromosome libraries (PACs and BACs) that already exist and with further such libraries now being developed. However, a cost-effective way to establish high-resolution contig maps for the human genome has not yet been found. Recently, a new approach of virtual screening for overlapping clones has been proposed by several research groups and has been discussed eloquently in a manuscript by Venter et al., 1996 (Nature). We will implement this approach for use with our human PAC and BAC libraries and use the first year as a pilot stage. The goal of the one-year pilot is to prove the feasibility of large-scale end sequencing and to demonstrate its usefulness.

The first goal will be met by sequencing the ends of 40,000 clones from our existing PAC library and from BAC libraries currently being developed under NIH funding within our laboratory. The end sequencing will be based on our new DOP-vector PCR procedure (Chen et al., 1996, Nucleic Acids Research 24, 2614-2616). All sequence data will be made available through public databases (GSDB, GDB, GenBank) and will also become BLAST searchable through the UTSW WWW site of our collaborator, Glen Evans. In view of our currently underdeveloped informatics structure, we do not expect to provide BLAST search access through our own web site during the pilot phase.

To prove the usefulness of available end sequences, we will prepare a chromosome 14-enriched clone collection from our current 20-fold deep PAC library. To detect the chromosome 14 clones, we will use as hybridization probes a set of 1,000 mapped STS markers available from Paul Dear (MRC, Cambridge, UK), the approximately 600 markers present in the Whitehead map, and the in situ mapped BAC and PAC clones available from Julie Korenberg. We will hybridize with these existing markers in probe pools specific for regions of chromosome 14. Thus we will isolate region-enriched PAC clone collections.

Assuming that the clone collections will be at least 50% specific for chromosome 14 (that is, at most 50% false positives) and will include most of the chromosome 14 PACs from our library, a collection of about 35,000 clones is expected.

Hence, the bulk of the end sequences obtained during the first year will be derived from the chromosome 14-enriched set and should result in a sequence-ready clone collection covering about 100 Mbp of the human genome. The purity of the chromosome 14 PAC collection will be characterized in a number of different ways, including testing with independent markers not used as probes and FISH analysis of a representative set of PAC clones. To test the usefulness of the end-sequence resource, the Sanger Centre will sequence chromosome 14 PACs from our collection and identify overlapping clones by virtual screening, using our end-sequence database.

If overlapping clones cannot be found with the expected level of redundancy in the end-sequence database, we will screen the original PAC library with probes or STS markers derived from the sequenced PAC clones.
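The estimate of about 35,000 clones can be reproduced with simple arithmetic. The sketch below takes the 20-fold library depth, the roughly 100 Mbp target, and the 50% specificity from the text; the mean PAC insert size of 115 kbp is an assumption introduced only for illustration, since the abstract does not state it, and the function name is hypothetical.

```python
def enriched_collection_size(region_bp, library_depth, mean_insert_bp, specificity):
    """Expected clones picked when enriching a deep library for one region.

    true_clones = depth * region / insert gives the clones that really derive
    from the region; dividing by the specificity adds the false positives.
    """
    if not 0 < specificity <= 1:
        raise ValueError("specificity must be in (0, 1]")
    true_clones = library_depth * region_bp / mean_insert_bp
    return round(true_clones / specificity)


# 100 Mbp region, 20-fold deep library, assumed 115 kbp inserts, 50% specificity.
print(enriched_collection_size(100_000_000, 20, 115_000, 0.5))
```

With these inputs the result lands close to 35,000, which suggests the stated figure is consistent with inserts on the order of 100 kbp or slightly more.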

Subcontract under Glen Evans’ DOE Grant No. DE-FC03-96ER62294.

Multiple-Column Capillary Gel Electrophoresis

Norman Dovichi
Department of Chemistry; University of Alberta; Edmonton, Alberta, Canada T6G 2G2
403/492-2845, Fax: -8231, [email protected]
http://hobbes.chem.ualberta.ca

The objective of this project is to develop high-throughput DNA sequencing instrumentation. A two-dimensional arrayed capillary electrophoresis instrument is under development.

We have developed multiple capillary DNA sequencers. These instruments have several important attributes. First, by operating at electric fields greater than 100 V/cm, we are able to separate DNA sequencing fragments rapidly and efficiently. Second, the separation is performed with 3%T 0%C polyacrylamide. This low-viscosity, non-crosslinked matrix can be pumped from the capillary and replaced with fresh material when required. Third, we operate the capillary at elevated temperature. High-temperature operation eliminates compressions, speeds the separation, and increases the read length. Fourth, our fluorescence detection cuvette is manufactured locally by means of microlithography technology. These detection cuvettes provide robust and precise alignment of the optical system. Currently, 5-, 16-, and 90-capillary instruments are in operation in our lab; 32- and 576-capillary devices are under development. Fifth, we use both avalanche photodiode photodetectors and CCD cameras for high-sensitivity detection. We have obtained detection limits of 120 fluorescein molecules injected onto the capillaries. High sensitivity is important in detecting the low-concentration fragments generated in long sequencing reads. This combination of low-concentration acrylamide, high-temperature operation, and high-sensitivity detection allows separation of fragments over 800 bases in length in 90 minutes.
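As a rough illustration of what these figures imply for throughput, the sketch below multiplies the read length and run time quoted above by the number of capillaries. It is an upper bound under the simplifying assumption that every capillary yields a full 800-base read and that matrix replacement and sample loading take no time; the function name is hypothetical.

```python
def bases_per_hour(n_capillaries, read_length=800, run_minutes=90):
    """Idealized raw throughput: every capillary yields one full read per run."""
    return n_capillaries * read_length * 60.0 / run_minutes


# Instruments in operation (5, 16, 90) and under development (32, 576).
for n in (5, 16, 90, 32, 576):
    print(f"{n:>3} capillaries: {bases_per_hour(n):,.0f} bases/hour")
```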


DNA Sequencing with Primer Libraries

John J. Dunn, Laura-Li Butler-Loffredo, and F. William Studier
Biology Department; Brookhaven National Laboratory; Upton, NY 11973
516/344-3012, Fax: -3407, [email protected]
http://genome5.bio.bnl.gov

Primer walking using oligonucleotides selected from a library is an attractive strategy for large-scale DNA sequencing. Strings of three adjacent hexamers can prime DNA sequencing reactions specifically and efficiently when the template is saturated with a single-stranded DNA-binding protein (1), and a library of all 4,096 hexamers is manageable. We would like to be able to sequence directly on 35-kbp fesmid templates, but the signal from a single round of synthesis is relatively weak and triple-hexamer priming has not yet been adapted for cycle sequencing. We reasoned that a hexamer library might be used for cycle sequencing if combinations of hexamers could be selectively ligated by using other hexamers as the template for alignment. In this way, the longer primers needed for cycle sequencing could be generated easily and economically without the need for complex machines for de novo synthesis.

We found that ordered ligation of 3 hexamers to form an 18-mer occurs readily on a template of the 3 complementary hexamers (offset by three base pairs) that can base pair unambiguously to form a double-stranded complex of indefinite length (2). Each hexamer forms three complementary base pairs with two other hexamers, generating complementary chains of contiguous hexamers with strand breaks staggered by three bases. Two adjacent hexamers in the chain to be ligated contain 5' phosphate groups and the others are unphosphorylated. Both T4 and T7 DNA ligase can ligate the phosphorylated hexamers to their neighbors in such a complex at hexamer concentrations in the 50-100 µM range, producing an 18-mer and leaving three unphosphorylated hexamers. The products of these ligation reactions can be used directly for fluorescent cycle sequencing of 35-kbp templates.
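The staggered arrangement described above can be made concrete with a short sketch. Given a desired 18-mer, it lists the three hexamers to be ligated and the three template hexamers that bridge the junctions with a three-base offset. This is one reading of the geometry in the text, in which the third template hexamer pairs with the last three bases of the 18-mer and the first three bases of the next copy so that the complex can extend indefinitely. The function names and the example primer are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hexamer_library():
    """All 4**6 = 4,096 possible hexamers."""
    return ["".join(p) for p in product("ACGT", repeat=6)]


def ligation_design(primer18):
    """Return (hexamers to ligate, template hexamers) for a desired 18-mer."""
    if len(primer18) != 18 or set(primer18) - set("ACGT"):
        raise ValueError("expected an 18-base sequence over A, C, G, T")
    # The three contiguous hexamers that make up the 18-mer; the second and
    # third would carry the 5' phosphates needed for ligation.
    ligated = [primer18[i:i + 6] for i in (0, 6, 12)]
    # Template hexamers straddle the junctions, offset by three bases; the
    # last one wraps around to the start of the next copy of the 18-mer.
    cyclic = primer18 + primer18[:3]
    template = [reverse_complement(cyclic[i:i + 6]) for i in (3, 9, 15)]
    return ligated, template


ligated, template = ligation_design("ACGTTGCAGGATCCATGC")
print(ligated)
print(template)
```

The sketch does not check the unambiguity condition; doing so would require testing whether the same six hexamers can form any alternative perfectly paired complex.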

Unambiguous ligation requires that alternative complexes with perfect base pairing not be possible with the combination of hexamers used. Since the combination of hexamers is dictated by the sequence of the desired ligation product, some oligonucleotides cannot be produced unambiguously by this method. However, 82.5% of all possible 18-mers could potentially be generated starting with a library of all 4,096 hexamers, more than adequate for high-throughput DNA sequencing by primer walking.

DOE Grant No. DE-AC02-76CH00016.

References

(1) Kieleczawa, J., Dunn, J. J., and Studier, F. W. DNA sequencing by primer walking with strings of contiguous hexamers. Science 258, 1787-1791 (1992).

(2) Dunn, J. J., Butler-Loffredo, L., and Studier, F. W. Ligation of hexamers on hexamer templates to produce primers for cycle sequencing or the polymerase chain reaction. Anal. Biochem. 228, 91-100 (1995).

Rapid Preparation of DNA for Automated Sequencing

John J. Dunn, Matthew Randesi, and F. William Studier
Biology Department; Brookhaven National Laboratory; Upton, NY 11973
516/344-3012, Fax: -3407, [email protected]
http://genome5.bio.bnl.gov

We have developed a vector, referred to as a fesmid, for making libraries of approximately 35-kbp DNAs for mapping and sequencing. The high-efficiency lambda packaging system is used to generate libraries of clones. These clones are propagated at very low copy number under control of the replication and partitioning functions of the F factor, which helps to stabilize potentially toxic clones. A P1 lytic replicon under control of the lac repressor allows amplification simply by adding IPTG. The cloned DNA fragment is flanked by packaging signals for bacteriophage T7, and infection with an appropriate T7 mutant packages the cloned sequence into T7 phage particles, leaving most of the vector sequence behind. The size of the vector portion is such that genomic fragments packageable in lambda (normal capacity 48.5 kbp) should also be packaged in T7 (normal capacity 40 kbp).

We have made fesmid libraries of several bacterial DNAs, including Borrelia burgdorferi (the cause of Lyme disease), Bartonella henselae (the cause of cat scratch fever), E. coli, B. subtilis, H. influenzae, and S. pneumoniae, some of which have been reported to be difficult to clone in cosmid vectors. Human DNA is also readily cloned in these vectors. Brief amplification followed by infection with a gene 3 and 17.5 double mutant of T7, which is defective in replicating its own DNA, produces lysates in which essentially all of the phage particles contain the cloned DNA fragment. Simple techniques yield high-quality DNA from these phage particles. Primers for direct sequencing from the ends of fesmid clones have been made.

Primer walking from the ends of fesmid clones could be an efficient way to sequence bacterial genomes, YACs, or other large DNAs without the need for prior mapping of clones. The ends of fesmids from a random library provide multiple sites to initiate primer walking. Merging of the elongating sequences from different clones will simultaneously generate the sequence of the original DNA and determine the order of the clones. The packaged fesmid DNAs are a convenient size for multiple restriction analyses to confirm the accuracy of the nucleotide sequence.

DOE Grant No. DE-AC02-76CH00016.

A PAC/BAC End-Sequence Database for Human Genomic Sequencing

Glen A. Evans, Dave Burbee, Chris Davies, Trey Fondon, Tammy Oliver, Terry Franklin, Lisa Hahner, Shane Probst, and Harold R. (Skip) Garner
Genome Science and Technology Center and McDermott Center for Human Growth and Development; University of Texas Southwestern Medical Center at Dallas; Dallas, TX 75235-8591
214/648-1660, Fax: -1666, [email protected]
http://mcdermott.swmed.edu

While current plans call for completing the human genome sequence in 2003, major obstacles remain in achieving the speed and efficiency necessary to complete the task of mapping and sequencing. To address this problem, we proposed a novel approach to large-scale construction of sequence-ready physical clone maps of the human genome utilizing end-specific sequence sampling. An earlier pilot project was carried out to develop a GSS (genomic sequence sampled) map of human chromosome 11 by sequencing the ends of 17,952 chromosome 11-specific cosmids. This chromosome 11-specific end-sequence database allows rapid and sensitive detection of clone overlaps for chromosome 11 sequencing.

In this project, we propose to evaluate the utility of PAC and BAC end sequences representing the entire human genome as a tool for complete, high-accuracy mapping and sequencing. In this approach, we utilize total genomic PAC/BAC libraries (constructed by P. de Jong, RPCI), followed by sequencing of both ends of each clone in the library and limited regional mapping of a subset of clones as sequencing nucleation points by FISH (fluorescence in situ hybridization).

To initiate regional analysis, a single clone would be sequenced by shotgun or primer-directed sequencing, the entire sequence used to search the end database for overlapping clones, and the minimally overlapping clones selected for extending the sequence. This approach would allow rational and efficient simultaneous mapping and sequencing, as well as expediting the coordination and exchange of information between large and small groups participating in the human genome project.
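The search step described above can be illustrated with a toy model. The sketch below uses exact substring matching as a stand-in for a real similarity search such as BLAST, stores each clone's two end reads in a dictionary, and ranks hits by how much of the sequenced clone would be re-covered if the sequence were extended from that hit. The data, the function names, and the assumption that every matching clone extends toward the right-hand end are simplifications for illustration only.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_overlaps(seed, end_db):
    """Return (overlap_length, clone_id) pairs, smallest overlap first.

    A clone is a hit when one of its end reads, on either strand, occurs in
    the finished seed sequence. The overlap is measured from the match
    position to the right-hand end of the seed.
    """
    hits = []
    for clone_id, end_reads in end_db.items():
        for read in end_reads:
            for probe in (read, reverse_complement(read)):
                pos = seed.find(probe)
                if pos != -1:
                    hits.append((len(seed) - pos, clone_id))
    return sorted(hits)


def pick_extension_clone(seed, end_db):
    """Choose the hit with the least overlap, or None if nothing matches."""
    hits = find_overlaps(seed, end_db)
    return hits[0][1] if hits else None


# Made-up seed sequence and end-read database.
seed = "ATGGCGTACCTTGAGCATTCGGATCCAAGT"
end_db = {
    "c1": ("GTACCTTG", "TTTTTTTT"),
    "c2": ("GGATCCAA", "CCCCCCCC"),
    "c3": ("ACACACAC", "GTGTGTGT"),
}
print(find_overlaps(seed, end_db))
print(pick_extension_clone(seed, end_db))
```

A real implementation would also need to tolerate sequencing errors and repeats and to use the orientation of each end read to decide in which direction a clone extends.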


In this pilot project proposal we are carrying out automated end-sequencing of approximately 40,000 PAC and BAC clones representing the entire human genome, as well as about 500 PAC clones localized to human chromosomes 11 and 15. The clones and resulting end-sequence database will be used to 1) nucleate regions of interest for large-scale sequencing, concentrating on regions of chromosomes 11 and 15; 2) correspond with regions mapped by other methods to confirm the mapping accuracy; and 3) evaluate the use of random clone end-sequence libraries. DNA sequencing is being carried out in an entirely automated fashion using a Beckman/Sagian robotic system, ABI 377 automated sequencers, and automated sequence data processing, annotation, and publication using a Hewlett Packard/Convex superparallel computer located at the UTSW genome center. FISH analysis of a sample of PAC clones has been carried out and defines the potential chimera rate in existing PAC libraries as less than 1.2%. This effort will be coordinated with efforts of other groups carrying out PAC and BAC library construction, PAC and BAC end-sequencing, and FISH analysis to avoid duplication of effort and provide a comprehensive end-sequence library and data set for use by the international human genome sequencing effort.

DOE Grant No. DE-FC03-96ER62294.

Automated DNA Sequencing by Parallel Primer Walking

Glen A. Evans, Dave Burbee, Chris Davies, Jeff Schageman, Shane Probst, Terry Franklin, Ken Kupfer, and Harold R. (Skip) Garner
Genome Science and Technology Center and McDermott Center for Human Growth and Development; University of Texas Southwestern Medical Center at Dallas; Dallas, TX 75235-8591
214/648-1660, Fax: -1666, [email protected]://mcdermott.swmed.edu

The development of efficient mapping approaches coupled with high-throughput, automated DNA sequencing remains one of the key challenges of the Human Genome Project. Over the past few years, a number of strategies to expedite clone-by-clone DNA sequencing have been developed, including efficient shotgun sequencing, sequencing of nested deletions, and transposon-mediated primer insertion. We have developed a novel sequencing strategy applicable to high-throughput, large-scale genomic analysis based upon DNA sequencing primed directly on cosmid templates using custom-designed, automatically synthesized oligonucleotide primers. This approach of directed primer “walking” would allow the number of sequencing reactions and the efficiency of sequencing to be vastly improved over traditional shotgun sequencing.

Custom primer design has been carried out using software we developed for prediction of “walking” primers directly from the output of ABI 377 automated DNA sequencers, and the output is used to automatically program synthesis of the custom primers using 96- or 192-channel oligonucleotide synthesizers constructed at UTSW. Automated operation of the sequencing system is thus possible, where the results of each sequencing reaction are used to predict, synthesize, and carry out appropriate extension reactions for downstream “walking”. An automated prototype system has been assembled in which dye-terminator DNA sequencing can be carried out from 96 cosmid templates simultaneously, followed by prediction of oligonucleotide “walking” primers for extending the sequence of each fragment and programming of an attached 96-channel oligonucleotide synthesizer to initiate a second round of sequencing. Using a set of nested cosmids covering 800 kb at 5X redundancy, primer-directed sequencing should allow completion of 800 kb of finished, high-accuracy DNA sequence in 8 to 16 cycles. Furthermore, coupling of automated DNA sequencing instrumentation to DNA sequence analysis programs and multichannel oligonucleotide synthesizers will allow almost complete automation of the sequencing process and the development of instrumentation for completely unattended DNA sequencing.
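The primer-prediction step can be sketched as follows. This is not the UTSW software, whose selection rules the abstract does not describe. The function names, the 20-base primer length, the 40-60% GC window, the homopolymer limit, and the number of trailing bases trimmed as unreliable are all assumptions chosen for illustration.

```python
from typing import Optional, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_long_run(seq: str, max_run: int = 4) -> bool:
    """True if any single base repeats more than max_run times in a row."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if prev == cur else 1
        if run > max_run:
            return True
    return False


def pick_walking_primer(
    read: str,
    primer_len: int = 20,   # assumed primer length
    trim: int = 30,         # assumed number of unreliable 3' bases to discard
    window: int = 100,      # assumed search window back from the trimmed end
    gc_range: Tuple[float, float] = (0.4, 0.6),  # assumed acceptable GC content
) -> Optional[Tuple[int, str]]:
    """Pick the 3'-most acceptable primer from a read for the next walking step."""
    usable = read.upper()[: max(0, len(read) - trim)]
    start_min = max(0, len(usable) - window)
    # Scan backwards so the chosen primer sits as far downstream as possible.
    for start in range(len(usable) - primer_len, start_min - 1, -1):
        candidate = usable[start:start + primer_len]
        if "N" in candidate:
            continue
        if not gc_range[0] <= gc_fraction(candidate) <= gc_range[1]:
            continue
        if has_long_run(candidate):
            continue
        return start, candidate
    return None


# Toy read (hypothetical): a low-complexity stretch followed by a usable 20-mer.
example_read = "A" * 30 + "GCTAGCTTGACCTAGGATCA"
example_primer = pick_walking_primer(example_read, trim=0)
```

In a loop of the kind the abstract describes, each returned primer would be sent to the synthesizer, and the read it produces would be passed back through the same function for the next cycle.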


*Parallel Triplex Formation as Possible Approach for Suppression of DNA-Viruses Reproduction

V.L. Florentiev, A.K. Shchyolkina, I.A. Il’icheva, E.N. Timofeev, and S. Yu Tsybenko
Engelhardt Institute of Molecular Biology; Russian Academy of Sciences; Moscow 117984, Russia
Fax: +7-095/135-1405, [email protected]

It is well known that homopurine or homopyrimidine single-stranded oligonucleotides can bind to homopurine-homopyrimidine sequences of two-stranded DNA to form stable three-stranded helices. In such triplexes two identical strands have antiparallel orientation. We denote these triplexes as “antiparallel” or “classical” triplexes.

Investigators' particular interest in triplexes arose from the elegant idea of using them as sequence-specific tools for deliberately influencing DNA duplexes. Triplex-forming oligonucleotides were shown to be potentially useful as regulators of gene expression and subsequently as therapeutic (antiviral) agents.

A significant limitation to the practical application of antiparallel triplexes is the requirement for homopurine tracts in target DNA sequences. Numerous investigations slightly expanded the repertoire of triplex-forming sequences but did not completely remove this limitation.

It was recently shown that a triple-stranded DNA intermediate is formed during homologous recombination promoted by RecA. Such a structure is a new form of the triple helix. In sharp contrast with the “classical” triplexes, the third strand is parallel to the identical strand of the Watson-Crick duplex. We denote this structure as the “parallel” triplex. Until recently, the parallel triplex had been obtained only by deproteinization of joint molecules generated by recombination proteins.

We first obtained experimental results (chemical probing, melting curves, and fluorescent dye binding) that provide convincing evidence for protein-independent formation of the parallel triplex [1] and then confirmed this finding with FTIR data [2]. Because the parallel triplex can be formed for any sequence, it might be an “ideal” tool for sequence-specific recognition of DNA. Unfortunately, the low stability of parallel triplexes prohibits practical application of these structures.

Earlier we found that propidium iodide selectively stabilizes parallel triplexes [3]. This finding is the basis of a new approach to stabilizing parallel triplexes that we are now developing. The approach uses a targeting oligonucleotide that contains, in an internucleotide linkage, an alkyl insert coupled through a linker to an intercalating ligand. The linker length was chosen by molecular dynamics calculations to allow the ligand to intercalate in the same stacking contact.

A preliminary study showed that the presence of intercalating inserts considerably increases the stability of DNA duplexes [4]. We are now investigating in detail the effect of such modification of targeting oligonucleotides on the stability of parallel triplexes.

DOE Grant No. OR00033-93CIS005.

References
1. Shchyolkina, A. K., Timofeev, E. N., Borisova, O. F., Il’icheva, I. A., Minyat, E. E., Khomyakova, E. B. and Florentiev, V. L. (1994) The R-form DNA does exist. FEBS Letters, 339, 113-118.

2. Dagneaux, C., Gousset, H., Shchyolkina, A. K., Ouali, M., Lettelier, R., Liquier, J., Florentiev, V. L. and Taillander, E. (1996) Parallel and antiparallel AA-T intramolecular triple helices. Nucleic Acids Res., 24, 4506-4512.

3. Borisova, O. F., Shchyolkina, A. K., Timofeev, E. N., Tsybenko, S. Yu., Mirzabekov, A. and Florentiev, V. L. (1995) Stabilization of parallel triplex with propidium iodide. J. Biomol. Struct. Dynam., 13, 15-27.

4. Timofeev, E. N., Smirnov, I. P., Haff, L. A., Tishchenko, E. I., Mirzabekov, A. D. and Florentiev, V. L. (1996) Methidium intercalator inserted into synthetic oligonucleotides. Tetrahedron Lett., 37, 8467-8470.

Advanced Automated Sequencing Technology: Fluorescent Detection for Multiplex DNA Sequencing

Andy Marks, Tony Schurtz, F. Mark Ferguson, Leonard Di Sera, Alvin Kimball, Diane Dunn, Doug Adamson, Peter Cartwright, Robert B. Weiss,1 and Raymond F. Gesteland1
Department of Human Genetics and 1Howard Hughes Medical Institute; University of Utah; Salt Lake City, UT 84112
Gesteland: 801/581-5190, Fax: /[email protected]

Automation of a large-scale sequencing process based on instrumentation for automated DNA hybridization and detection is a focal point of our research. Recently, we have devised a method for amplifying fluorescent light output on nylon membranes by using an alkaline phosphatase-conjugated probe system combined with a fluorogenic alkaline phosphatase substrate [1]. The amplified signal allows sensitive detection of DNA hybrids in the sub-femtomole/band range.

On the basis of this detection chemistry, automated devices for detecting DNA on blotted microporous membranes using enzyme-linked fluorescence, termed Probe Chambers, have been built. The fluorescent signal is collected by a CCD camera operating in a Time Delay and Integration mode. Concentrated solutions of probes and enzymes are stored in Peltier-cooled, septum-sealed vials and delivered by syringe pumps residing in a gantry-style pipetting robot. Fluorescence excitation is generated by a mercury arc lamp acting through a fiber optic “light line”. Three 30 x 63 centimeter sequencing membranes can be processed simultaneously, currently revealing up to 108 lane sets per multiplex cycle. A probing cycle is completed approximately every eight hours.

Integration of the Probe Chamber into the production pipeline is accomplished through connections to the laboratory database. A critical component of a high-throughput sequencing laboratory is the software for interfacing to instrumentation and managing workflow. The Informatics Group of the Utah Genome Center has designed and implemented an innovative system for automating and managing laboratory processes. This software allows the model of workflow to be easily defined. Given such a model, the system allows the user to direct and track the flow of laboratory information. The core of the system is a generic, client-server process management engine that allows users to define new processes without the need for custom programming. Based on these definitions, the software will then route information to the next process, track the progress of each task, perform any automated operations, and provide reports on these processes. To further increase the usefulness of our laboratory information system, we have augmented it with hand-held mobile computing devices (Apple Newtons) that link to the database through RF networking cards.

Base-calling software has been developed to support our automated, large-scale sequencing effort. First-stage sequence calling identifies putative bands; however, depending on the number of reader indel errors (2-6%), merging first-stage sequence without the aid of cutoff information can be difficult. To improve our base calling we have employed fuzzy logic to establish confidence metrics. The logic produces a confidence metric for each band using band height, width, uniqueness, shape, and the gaps to adjacent bands. The confidence metric is then used to identify the largest block of highest-quality sequence to be merged.
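A minimal sketch of the two ideas in this paragraph follows: combining band features into a fuzzy confidence value, and then selecting the longest contiguous block of high-confidence bands. The membership functions, their breakpoints, the use of only two features (height and gap), and the 0.7 threshold are hypothetical; the abstract does not give the actual rules used by the Utah software.

```python
from typing import Sequence, Tuple


def ramp(x: float, lo: float, hi: float) -> float:
    """Fuzzy membership that rises linearly from 0 at lo to 1 at hi."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def band_confidence(height: float, gap: float) -> float:
    """Fuzzy AND (minimum) of 'tall enough' and 'well separated'.

    Breakpoints are illustrative assumptions on normalized units.
    """
    return min(ramp(height, 0.2, 0.8), ramp(gap, 0.5, 1.0))


def best_block(confidences: Sequence[float], threshold: float = 0.7) -> Tuple[int, int]:
    """Return (start, end) of the longest run with confidence >= threshold.

    The end index is exclusive; (0, 0) means no band passed the threshold.
    """
    best = (0, 0)
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, conf in enumerate(list(confidences) + [float("-inf")]):
        if conf >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


# Toy per-band confidences (hypothetical values).
example_conf = [0.9, 0.8, 0.3, 0.9, 0.95, 0.8, 0.75, 0.2, 0.9]
example_block = best_block(example_conf)
```

Taking the minimum of the memberships means a band is only as trustworthy as its weakest feature, which is one common way to express a fuzzy conjunction; other combination rules would fit the same structure.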


Reference
[1] Cherry, J.L., Young, H., Di Sera, L.J., Ferguson, F.M., Kimball, A.W., Dunn, D.M., Gesteland, R.F., and Weiss, R.B. (1994). Enzyme-linked fluorescent detection for automated multiplex DNA sequencing, Genomics 20, 68-74.

Resource for Molecular Cytogenetics

Donna Albertson, Colin Collins, Joe Gray,1 Steven Lockett, Daniel Pinkel,1 Damir Sudar, Heinz-Ulrich Weier, and Manfred Zorn
Lawrence Berkeley National Laboratory; Berkeley, CA 94720 and 1University of California; San Francisco, CA 94143
Gray: 415/476-3461, Fax: -8218, [email protected]: 415/476-3659, Fax: -8218, [email protected]
http://rmc-www.lbl.gov

The purpose of the Resource for Molecular Cytogenetics is to develop molecular cytogenetic techniques, instruments, and reagents needed to facilitate large-scale genomic DNA sequencing and to assist in identification and functional characterization of genes involved in disease susceptibility, genesis, and progression. This work is closely coordinated with the LBNL Human Genome Program and directly supports research in the LBNL Life Sciences Division and the UCSF Cancer Center. Work currently is in four areas: a) genome analysis technology, b) probe development and physical map assembly, c) digital imaging microscopy, and d) informatics. The Resource acts as a catalyst for research in several areas, so some support comes from industry, the NIH, and NIST.

Probe development and physical map assembly: The Resource maintains a list of over a thousand publicly available probes suitable for molecular cytogenetic studies. These include approximately 600 probes, each selected by the Resource to contain a known STS or EST. Probes selected by the Resource can be requested through our web page.

The Resource also participates in the development of low- and high-resolution physical maps to facilitate analysis and characterization of genetic abnormalities associated with human disease. Low-resolution mapping panels with probes distributed at intervals of a few megabases have been completed this year for chromosomes 1, 2, 3, 7, 8, 10, and 20. The mapped STSs associated with these probes facilitate movement from low- to high-resolution physical maps. STS content mapping and DNA fingerprinting have been applied to develop a high-resolution, sequence-ready map composed of BAC and P1 clones for the ~1 Mb region of chromosome 20 between WI9227 and D20S902. This region is amplified in ~10% of human breast cancers. Approximately 300 kb of this region has been sequenced by the LBNL Human Genome Program.

Quantitative DNA fiber mapping (QDFM) has been developed this year to facilitate high-resolution analysis of genomic overlap between cloned probes. In this approach, cloned DNA molecules are uniformly stretched during drying by the hydrodynamic action of a receding meniscus. The position of specific sequences along the stretched DNA molecules is visualized by fluorescence in situ hybridization (FISH) and measured by digital image analysis. QDFM has been used to map gamma alpha transposons, plasmid or cosmid probes along P1 molecules, and P1 or PAC clones along straightened YAC molecules with a resolution of a few kilobases. QDFM is now being studied to determine its utility in the assembly of minimally overlapping, sequence-ready contigs, assessment of the integrity of cloned BACs, and mapping of subclones prepared for directed DNA sequencing along the clone from which they were derived.

Genome analysis technology: The Resource has participated in the development of comparative genomic hybridization (CGH) as a tool for detection and mapping of changes in relative DNA sequence copy number in humans and mice. This year, CGH to arrays of cloned probes (CGHa) has been demonstrated. This is advantageous because it allows aberrations to be mapped with resolution determined by the genomic spacing of probes on the array. CGHa also is attractive since it appears to be linear over a relative copy number range of at least 10^4 between the two nucleic acid samples being compared.

The Resource has participated in the development of FISH approaches to analysis of relative gene expression in normal and aberrant tissues. FISH with cloned or predicted expressed sequences, previously developed in C. elegans, is now being applied to the assessment of expression of human genes. The C. elegans work suggests a throughput of several dozen sequences per month. Information from this approach will be important in assessment of the function of newly discovered genes, including those predicted from DNA sequencing.


Digital imaging microscopy: The Resource supports work in microscopy, image processing, and analysis methods needed for CGH and CGHa, 3D FISH, tissue analysis, rare event detection, multi-color image acquisition, aberration scoring for biodosimetry, and analysis of FISH to DNA fibers. Developments this year include an improved package for CGH and prototype systems for analysis of DNA fibers, CGHa arrays, and semiautomatic segmentation of nuclei in three dimensions.

Informatics: The Resource maintains a web site at http://rmc-www.lbl.gov that summarizes information about mapped probes. Probes developed by the Resource can be requested directly through this page. In addition, the Resource has developed a Web page for exchange of genomic, genetic, and biologic information between geographically dispersed collaborators. The page, under password control, carries information about physical maps, genomic sequence, sequence annotation, and gene expression images.

DOE Contract No. DE-AC03-76SF00098.

DNA Sample Manipulation and Automation

Trevor Hawkins
Center for Genome Research; Whitehead Institute/Massachusetts Institute of Technology; Cambridge, MA 02139
617/252-1910, Fax: -1902, [email protected]://www-genome.wi.mit.edu

The objective of this project is to develop a high-throughput, fully automated robotic device for the complete automation of the sequencing process. We also aim to further develop DNA sequencing electrophoresis systems and to integrate these devices with our robotics.

We have built the Sequatron, an integrated robotic device that automates the tasks of DNA purification and setup of thermal cycle sequencing reactions. The major component of our system is an articulated, track-mounted CRS 255A robotic arm. The deck of the robot contains several new or modified XYZ robotic workstations, a novel thermal cycler with automated heated lids, carousels, and custom-built plate feeders.

Biochemically, we have employed our solid-phase reversible immobilization (SPRI) technique to isolate and manipulate the DNA throughout the process.

Specifically, we have set up the Sequatron to isolate DNA from M13 phage or crude PCR products using the same protocol and procedures. From M13 phage we obtain approximately 1 µg of DNA per well, which is sufficient for multiple sequencing reactions.

The current throughput of the system is 80 microtiter plates of samples, from M13 phage supernatants or crude PCR products to sequence-ready samples, every 24 hours. Recently, new enzymes, new energy transfer primers, and higher-density microtiter plates have opened up possible increases to in excess of 25,000 samples per 24-hour period.


Relevant Publication
DeAngelis, M., Wang, D., & Hawkins, T. (1995) Nucl. Acids Res. 23, 4742-4743.

Construction of a Genome-Wide Characterized Clone Resource for Genome Sequencing

Leroy Hood, Mark D. Adams,1 and Melvin Simon2

University of Washington; Seattle, WA 98195-7730
206/616-5014, Fax: /685-7301, [email protected] Institute for Genomic Research; Rockville, MD 20850; [email protected] Institute of Technology; Pasadena, CA 91125; [email protected]

Bacterial artificial chromosomes (BACs) represent the state-of-the-art cloning system for human DNA because of their stability and ease of manipulation. Venter, Smith and Hood (Nature 381:364-366, 1996) have proposed a strategy based on the use of sequences from the ends of all clones in a deep-coverage BAC library to produce a sequence-ready set of clones for the human genome. We propose to demonstrate the effectiveness of this strategy by performing a directed test, initially on chromosomes 16 and 22, and continuing on to chromosome 1. All available markers on chromosome 16 (including the large number of soon-to-be-available radiation hybrid markers) will be used to screen the existing 8x BAC library at CalTech. This will serve to evaluate the quality of the library in terms of representation of broad chromosomal regions. A similar procedure will be used for chromosome 22, except that the existing BAC map will be used to select more evenly spaced markers for screening, including use of end-sequence markers from the current chromosome 22 BAC map constructed in the Simon lab. Each identified clone will be rearrayed from the library and end sequenced. This information will dovetail with ongoing sequencing projects at TIGR and the Sanger Centre, which will in turn provide additional information on the average degree of BAC overlap detectable by this method, the degree of interference from genome-wide repeats, and the appropriate use of fingerprinting as an early or late addition to the end-sequencing information. In addition, we will develop and implement cost-effective, high-throughput methods of preparing and end-sequencing BAC DNA that are suitable for scaling to characterization of the full 400,000 clones necessary for characterization of a 15x human BAC library.
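The relation between clone count and library depth can be checked with simple arithmetic: coverage equals the number of clones times the mean insert size divided by the genome size. The sketch below assumes a haploid genome of 3 billion base pairs, a figure not stated in the abstract, and the function names are hypothetical.

```python
import math

GENOME_BP = 3_000_000_000  # assumed haploid human genome size; not given in the abstract


def clones_needed(coverage: float, genome_bp: int, mean_insert_bp: int) -> int:
    """Number of clones required to reach a given fold coverage."""
    return math.ceil(coverage * genome_bp / mean_insert_bp)


def implied_mean_insert(coverage: float, genome_bp: int, n_clones: int) -> float:
    """Mean insert size implied by a clone count and a fold coverage."""
    return coverage * genome_bp / n_clones


# Figures from the abstract: 400,000 clones for a 15x library.
insert_bp = implied_mean_insert(15, GENOME_BP, 400_000)
end_reads = 2 * 400_000  # one read from each end of every clone
```

Under that genome-size assumption, 400,000 clones at 15x coverage imply a mean insert of about 112.5 kb, and end-sequencing both ends of every clone would require 800,000 reads.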


DNA Sequencing Using Capillary Electrophoresis

Barry L. Karger
Barnett Institute; Northeastern University; Boston, MA 02115
617/373-2867 or -2868, Fax: [email protected]

During the past year, we have made major progress in the design of a replaceable polymer matrix for DNA sequencing and the development of the first-generation multiple capillary array of 12 capillaries. We also implemented ultrafast separation of dsDNA (e.g., 30 sec for complete resolution of the standard φX174-HaeIII restriction fragments).

In the separation of sequencing reaction products, we completed a study on the role of polymer molecular weight and concentration. Using linear polyacrylamide (LPA), the polymer with which we have had the most success, we have achieved 1000-base read lengths in 1 1/2 hr. Optimization of column length, electric field, and column temperature (50°C) was required. Using emulsion polymerization, we are now able to produce LPA powders with MW of ~10^4 kDa. The fully replaceable matrix is very powerful for rapid sequencing of long reads.

We have successfully implemented a 12-capillary array instrument and are using it to study issues of ruggedness in routine sequencing. As part of this, we have developed a sample clean-up procedure that reduces all reactions to a similar state in terms of sample solution prior to injection. The results of this work have led to the design of a 96-capillary array that we will implement over the next year.

We have also achieved very fast separations of ss- and dsDNA using short capillaries and very high yields. For example, sequencing 300 bases in 3–4 min has been shown, as well as very rapid mutational analysis. Implementation of such speeds on a capillary array will create an instrument for high-throughput automated analysis.


Ultrasensitive Fluorescence Detection of DNA

Richard A. Mathies and Alexander N. Glazer
Departments of Chemistry and Molecular and Cell Biology; University of California; Berkeley, CA 94720
510/642-4192, Fax: -3599, [email protected]

The overall goal of this project is to develop new fluorescence labeling methods, separation methods, and detection technologies for DNA sequencing and genomic analysis.

Highlights along with representative publications are given below.

Energy Transfer Primers. Families of sequencing and PCR primers have been developed that contain both fluorescence donor and acceptor chromophores [1]. These labeled primers with optimized excitation and emission properties provide from 2- to 20-fold enhanced signal intensities in automated DNA sequencing with slab gels and with capillary arrays [2]. The reduced spectral cross talk of these ET primers also makes them valuable in PCR product and STR analyses [3].

New Intercalation Dye Labels. A new family of heterodimeric bis-intercalation dyes has been synthesized exploiting the concept of fluorescence energy transfer between two different cyanine intercalators [4]. By tailoring the spectroscopic properties of the dyes, labels with intense emission above 650 nm following 488 nm excitation have been fabricated. By adjusting the spacing linker between the two dyes, the binding affinity has also been optimized. These molecules are useful for noncovalent multiplex labeling of ds-DNA in a wide variety of multicolor analyses [5].

Capillary Electrophoresis Chips. Capillary and capillary array electrophoresis systems have been photolithographically fabricated on 2 x 3" glass substrates [6]. These devices provide high-quality electrophoretic separations of ds-DNA fragments and DNA sequencing reactions with a 10-fold increase in speed [7]. Arrays of up to 32 capillaries on a single chip have been fabricated.

Single DNA Molecule Fluorescence Burst Detection. A confocal fluorescence system has been used to demonstrate that single-molecule fluorescence burst counting can be used to detect CE separations of ds-DNA fragments. Fragments as small as 50 bp can be counted, and mass sensitivities as low as 100 molecules per electrophoresis band are possible. This technology should be valuable in incipient cancer and trace pathogen detection [8].




References
1. Ju, J., Ruan, C., Fuller, C. W., Glazer, A. N. and Mathies, R. A. Fluorescence Energy Transfer Dye-Labeled Primers for DNA Sequencing and Analysis, Proc. Natl. Acad. Sci. U.S.A. 92, 4347-4351 (1995).

2. Ju, J., Glazer, A. N. and Mathies, R. A. Energy Transfer Primers: A New Fluorescence Labeling Paradigm for DNA Sequencing and Analysis, Nature Medicine 2, 180-182 (1996).

3. Wang, Y., Ju, J., Carpenter, B., Atherton, J. M., Sensabaugh, G. F. and Mathies, R. A. High-Speed, High-Throughput THO1 Allelic Sizing Using Energy Transfer Fluorescent Primers and Capillary Array Electrophoresis, Analytical Chemistry 67, 1197-1203 (1995).

4. Benson, S. C., Zeng, Z., and Glazer, A. N. Fluorescence Energy Transfer Cyanine Heterodimers with High Affinity for Double-Stranded DNA. I. Synthesis and Spectroscopic Properties, Anal. Biochem. 231, 247-255 (1995).

5. Zeng, Z., Benson, S. C., and Glazer, A. N. Fluorescence Energy Transfer Cyanine Heterodimers with High Affinity for Double-Stranded DNA. II. Applications to Multiplex Restriction Fragment Sizing, Anal. Biochem. 231, 256-260 (1995).

6. Woolley, A. T. and Mathies, R. A. Ultra-High-Speed DNA Fragment Separations Using Microfabricated Capillary Array Electrophoresis Chips, Proc. Natl. Acad. Sci. U.S.A. 91, 11348-11352 (1994).

7. Woolley, A. T. and Mathies, R. A. Ultra-High-Speed DNA Sequencing Using Capillary Array Electrophoresis Chips, Analytical Chemistry 67, 3676-3680 (1995).

8. Haab, B. B. and Mathies, R. A. Single Molecule Fluorescence Burst Detection of DNA Fragments Separated by Capillary Electrophoresis, Analytical Chemistry 67, 3253-3260 (1995).

Joint Human Genome Program Between Argonne National Laboratory and the Engelhardt Institute of Molecular Biology

Andrei Mirzabekov,1,2 G. Yershov,1,2 Y. Lysov,2 V. Barsky,2 V. Shick,2 and S. Bavikin1
1Argonne National Laboratory; Argonne, IL 60439
630/252-3161 or -3361, Fax: /[email protected] Institute of Molecular Biology; 117984 Moscow, Russia

In 1996, more than thirty U.S. and Russian researchers participated in the joint Human Genome Program between Argonne National Laboratory and the Engelhardt Institute of Molecular Biology on the development of sequencing by hybridization with oligonucleotide microchips (SHOM).

During this year, about twenty Russian scientists have worked at ANL for periods of 3 months to 1 year. In this period, 3 papers have been published, 5 papers have been accepted for publication, and 3 more have been submitted.

The main research efforts of the group have been concentrated in three directions:
I. Improvement of SHOM technology.
II. Development of SHOM for the needs of Human Genome Program.
III. Development of new approaches based on SHOM technology.

I. Improvement of SHOM technology

As a major result of the work in this direction, simple, reliable, and effective methods of microchip manufacturing, sample preparation, and quantitative hybridization analysis by fluorescence microscopy have been developed or improved.

1. The photopolymerization technique for production of micromatrices of polyacrylamide gel pads on a hydrophobized glass surface was improved to become a simple, highly reproducible, and inexpensive procedure (7).

2. A new and cheaper chemistry for oligonucleotide immobilization has been developed and introduced for production of more durable microchips. It is based on the use of amino-oligonucleotides and aldehyde gels instead of 3-methyluridine-oligonucleotides and hydrazide gels (3).

3. A four-pin robot has been constructed with computer control of the production of every microchip element. High-quality microchips with 4100 immobilized oligonucleotides have been manufactured, and the complexity of the microchips can easily be scaled up to a few tens of thousands of elements.

4. A two-color fluorescence microscope has been equipped for regular use with proper mechanics and software. It allows investigators to routinely use automatic quantitative monitoring of hybridization on the whole microchip and to measure the kinetics of hybridization as well as the melting curves of duplexes formed with all microchip oligonucleotides (1,2,8).

5. A four-color fluorescence microscope was manufactured, and four suitable fluorescent dyes are currently being selected.

6. Chemical methods for introducing several fluorescent dyes into DNA and RNA, with or without fragmentation, have been developed and are regularly used in SHOM experiments (4).

7. A theory describing the kinetics of hybridization with gel-immobilized oligonucleotides has been developed (5).

8. Simple and relatively inexpensive equipment (around $10,000 per set) has been produced for manual manufacturing of microchips and fluorescence measurement of hybridization, which will enable any laboratory to produce and use microchips containing up to 100 immobilized oligonucleotides or other compounds.

II. Application of SHOM

Although the main goal of our SHOM development is to produce a simple de novo sequencing procedure, a number of other SHOM applications have been tested as intermediate steps in the SHOM research.

1. Sequence analysis and sequencing

A number of technical problems must be solved for de novo sequencing, although they are much less stringent for comparative sequence analysis. Among these:

a) Reliable discrimination of perfect and mismatched duplexes. We have significantly improved the discrimination by decreasing the length of hybridized oligonucleotides to 6- and 8-mers (1,7) and by using 5-mers in “contiguous stacking” hybridization (1,2). Essential improvement was also achieved by automatically measuring the melting curves for duplexes formed in each microchip element, calculating their thermodynamic parameters (free energy, enthalpy, and entropy) for different regions of the melting curves, and comparing them with the same parameters for perfect duplexes. In addition, highly reliable discrimination was achieved by using two-color fluorescence microscopy and by quantitative comparison of the hybridization patterns of a known DNA or synthetic oligonucleotides and the DNA under study, labeled with different fluorophores (8).

b) Dependence of hybridization efficiency on the GC content and the length of the duplex. We have equalized the efficiency by choosing the proper concentration for each immobilized oligonucleotide (6,7) and also by increasing the effective length of immobilized oligonucleotides by adding, at one or both of their ends, 5-nitroindole as a universal base or a mixture of four bases (2).

c) Interference of hairpins and other structures in DNA with the less stable duplexes formed upon DNA hybridization with the comparatively short immobilized oligonucleotides of the microchip. This interference was decreased by fragmentation of the analyzed sample of DNA and RNA in the course of incorporation of a fluorescent label (4). We have also tested chemical attachment of an intercalator to immobilized oligonucleotides, which stabilized their base pairing with DNA over hairpin formation (10).

d) Necessity to increase the microchip complexity for sequencing long DNA stretches. As an alternative, further development of so-called contiguous stacking hybridization was shown to improve the efficiency of an 8-mer microchip up to that of a 13-mer microchip, so that DNA several kilobases in length could be sequenced by SHOM (2).

e) 6-mer microchips for sequencing and sequence analysis. We have now come to the stage of manufacturing microchips containing 4,096 (i.e., all possible) 6-mers. The control tests partly described above have shown that these microchips can be effectively used for sequence analysis, mutation diagnostics, and detection of sequencing mistakes made by conventional gel-sequencing methods. We hope that after demonstrating the efficiency of 6-mer microchips, we shall be able to get sufficient financial support for production of the microchip with all 65,536 8-mers.
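
The probe counts quoted above follow from the four-letter DNA alphabet: a complete set of k-mers has 4^k members. A minimal Python sketch (the function name `all_kmers` is ours, for illustration only) enumerates such a set and confirms the two figures:

```python
from itertools import product

BASES = "ACGT"

def all_kmers(k):
    """Return every DNA oligonucleotide of length k, in lexicographic order."""
    return ["".join(p) for p in product(BASES, repeat=k)]

six_mers = all_kmers(6)
print(len(six_mers))   # 4**6 = 4096 probes on a complete 6-mer chip
print(4 ** 8)          # 65536 probes needed for a complete 8-mer chip
```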

2. Mutation diagnostics and gene polymorphism analysis

The improvements described above have been introduced for reliable (“Yes” or “No” mode) identification of single-base changes in human genomic DNA. The efficiency of SHOM has been demonstrated for identification of a number of β-thalassemia mutations (1, 2, 8) and HLA allele variations in the human genome.

3. Identification of microorganisms and gene expression monitoring

Bacterial microchips have been manufactured and tested. Their ability to reliably identify a number of bacterial strains in a sample has been demonstrated (6). The chips, containing oligonucleotides complementary to specific regions of 16S ribosomal RNA, were hybridized with samples of rRNA, total RNA, DNA, and RNA transcripts of PCR-amplified genomic rDNA. Similar preliminary experiments demonstrated the efficiency of SHOM for monitoring gene expression.

III. Development of new approaches based on the SHOM technology

1. Enzymatic modification of nucleic acids on selected elements of the oligonucleotide chip. The gel pads of the oligonucleotide chip are separated from each other by a hydrophobic glass surface, which prevents cross-talk between chip elements when a drop of solution is applied to specified elements. At the same time, the high porosity of the gel allows diffusion of large proteins into the gel. We have demonstrated that immobilized oligonucleotides can be enzymatically phosphorylated and ligated with a contiguously stacked 5-mer after hybridization with DNA. A walking sequencing procedure using stacked pentanucleotides was proposed that is based on enzymatic ligation and phosphorylation on oligonucleotide chips (9).

2. DNA fractionation on oligonucleotide chips. Owing to the same properties, the oligonucleotide chips can be used for fractionation of DNA after its hybridization with complementary oligonucleotides of the chip. A new procedure for sequencing long DNA pieces was proposed that is based on fractionation of DNA on fractionating oligonucleotide chips followed by sequencing of the isolated DNA by SHOM on sequencing microchips. The procedure allows the investigator to skip cloning and mapping of long DNA pieces (9).

Conclusions

It appears that the major technical problems of SHOM have for the most part been solved, and this technology can already be applied to sequence analysis and to checking the accuracy of conventional sequencing methods. A number of other applications in the Human Genome Program are within the reach of SHOM, such as mutation screening, gene polymorphism studies, detection of microorganisms, and gene expression studies. Application of SHOM to de novo DNA sequencing requires manufacturing more complicated microchips and improving some other, already available methods.

DOE Contract No. W-31-109-Eng-38.

References

1. Yershov G., Barsky V., Belgovsky A., Kirillov Eu., Kreindlin E., Ivanov I., Parinov S., Guschin D., Drobishev A., Dubiley S., Mirzabekov A. DNA analysis and diagnostics on oligonucleotide microchips. // Proc. Natl. Acad. Sci. 1996. Vol. 93. P. 4913-4918.

2. Parinov S., Barsky V., Yershov G., Kirillov Eu., Timofeev E., Belgovskiy A., Mirzabekov A. DNA sequencing by hybridization to microchip octa- and decanucleotides extended by stacked pentanucleotides. // Nucl. Acids Res. 1996. Vol. 24. N 15. P. 2998-3004.

3. Timofeev E., Kochetkova S., A., Mirzabekov A. Regioselective immobilization of short oligonucleotides to acrylic copolymer gels. // Nucl. Acids Res. 1996. Vol. 24. N 16. P. 3142-3148.

4. Prudnikov D., Mirzabekov A. Chemical methods of DNA and RNA fluorescent labelling. // Nucl. Acids Res. 1996, in press.

5. Livshits M., Mirzabekov A. Theoretical analysis of the kinetics of DNA hybridization with gel-immobilized oligonucleotides. // Biophys. J. 1996. Vol. 71, in print.

6. Guschin D., Mobarry B., Proudnikov D., Stahl D., Rittmann B., Mirzabekov A. Oligonucleotide microchips as genosensors for determinative and environmental studies in microbiology. // Applied and Environmental Microbiology, in print.

7. Guschin D., Yershov G., Zaslavsky A., Gemmell A., Shick V., Lysov Yu., Mirzabekov A. A simple method of oligonucleotide microchip manufacturing and properties of the microchips. // Submitted for publication.

8. Drobyshev A., Mologina N., Shik V., Pobedimskaya D., Yershov G., Mirzabekov A. Sequence analysis by hybridization with oligonucleotide microchip: identification of beta-thalassemia mutations. // Gene (in print).

9. Dubiley S., Kirillov Eu., Lysov Yu., Mirzabekov A. DNA fractionation, sequence analysis and ligation of immobilized oligomers on oligonucleotide chips. // Submitted for publication.

10. Timofeev E., Smirnov I.P., Haff L.A., Tishchenko E.I., Mirzabekov A.D., Florentiev V.L. Methidium Intercalator Inserted into Synthetic Oligonucleotides. // Tetrahedron Letters 1996, v. 37, N 47, p. 8467.

Relevant Publication

Methods of DNA sequencing by hybridization based on optimizing concentration of matrix-bound oligonucleotide and device for carrying out same, by Khrapko K., Khorlin A., Ivanov I., Ershov G., Lysov Yu., Florentiev V., Mirzabekov A. US Patent 5,552,270, Sep. 3, 1996. PCT/RU92/00052, filed Mar. 18, 1992.

High-Throughput DNA Sequencing: SAmple SEquencing (SASE) Analysis as a Framework for Identifying Genes and Complete Large-Scale Genomic Sequencing

Robert K. Moyzis and Jeffrey K. Griffith1

Center for Human Genome Studies; Los Alamos National Laboratory; Los Alamos, NM 87545
505/667-3912, Fax: -2891, [email protected]
1University of New Mexico; Albuquerque, NM 87131

The human chromosome 5 and 16 physical maps (Doggett et al., Nature 377:Suppl:335-365, 1995; Grady et al., Genomics 32:91-96, 1996) provide the ideal framework for initiating large-scale DNA sequencing. These physical mapping studies have shown clearly that gene density in humans will vary greatly. For example, band 16q21, consisting of 8 Mb of DNA, has no genes or trapped exons assigned to it as yet. In contrast, band 16p13.3 has an extremely high density of coding regions in the DNA examined to date (i.e., multiple genes/cosmid). Given this wide variation in gene density and current sequencing costs, we propose that newly targeted genomic regions should be analyzed first by a “Lewis and Clark” exploratory approach before committing to full-length DNA sequencing. We are using a SAmple SEquencing (SASE) approach to rapidly generate aligned sequences along the chromosome 5 and 16 physical maps. SASE analysis is a method for rapidly “scanning” large genomic regions at minimal cost, identifying and localizing most genes. Briefly, individual cosmids are partially digested with Sau3A, and 3-kb fragments are recloned into double-strand sequencing vectors. By sequencing both ends of a 1X sampling of these recloned fragments along with end sequences of the cosmid, 70% sequence coverage is achieved with 98% clone coverage. The majority of this clone coverage is ordered by the relationship between the subclone end sequences. These ordered sequences are ideal substrates for directed sequencing strategies (for example, primer walking or transposon sequencing). SASE analysis has been initiated on the 40-Mb short arm of chromosome 16 and the 45-Mb short arm of chromosome 5. We propose to make SASE sequences, along with feature annotation, publicly available through GSDB. Such data are sufficient to allow PCR amplification of the sequenced region from GSDB submissions alone, eliminating the need for extensive clone archiving and distribution. This will allow for the effective “democratization” of the genome, enabling numerous laboratories to share and contribute to the growing genome databases.
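
As a rough plausibility check on these coverage figures, the sketch below applies the standard Poisson (Lander-Waterman) approximation, in which randomly placed pieces at depth c cover a fraction 1 - e^(-c) of the target. The read length of 375 bases and the reading of "1X sampling" as 1X of raw sequence are our assumptions, not values stated in the abstract, and the cosmid end sequences are ignored.

```python
import math

def poisson_coverage(depth):
    """Expected fraction of a target covered by randomly placed pieces
    at the given depth (Lander-Waterman approximation, end effects ignored)."""
    return 1.0 - math.exp(-depth)

insert_len = 3000      # recloned fragment size from the abstract (3 kb)
read_len = 375         # ASSUMED length of each end read
sequence_depth = 1.0   # ASSUMED meaning of "1X sampling": 1X of raw sequence

# Each subclone contributes two end reads, so the clone depth is the
# sequence depth scaled by insert_len / (2 * read_len).
clone_depth = sequence_depth * insert_len / (2 * read_len)

print(f"sequence coverage ~ {poisson_coverage(sequence_depth):.0%}")
print(f"clone coverage    ~ {poisson_coverage(clone_depth):.0%}")
```

Under these assumptions the model gives roughly 63% sequence coverage and 98% clone coverage, in the same range as the reported figures; longer reads and the cosmid end sequences would raise the sequence figure somewhat.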




One-Step PCR Sequencing

Kenneth W. Porter, J. David Briley, and Barbara Ramsay Shaw
Department of Chemistry; Duke University; Durham, NC 27708
919/660-1553, Fax: -1605, [email protected]

A method is described to simultaneously amplify and sequence DNA using a new class of nucleotides containing boron. During the polymerase chain reaction, boron-modified nucleotides, i.e., 2'-deoxynucleoside 5'-α-[P-borano]-triphosphates [1,2], are incorporated into the product DNA. The boranophosphate linkages are resistant to nucleases, and thus the positions of the boranophosphates can be revealed by exonuclease digestion, thereby generating a set of fragments that defines the DNA sequence. The boranophosphate method offers an alternative to current PCR sequencing methods.

Single-sided primer extension with dideoxynucleotide chain terminators is avoided, with the consequence that the sequencing fragments are derived directly from the original PCR products. Boranophosphate sequencing is demonstrated with the Pharmacia and the Applied Biosystems 373A automatic sequencers, producing data that are comparable to those from cycle sequencing.

DOE Grant No. DE-FG02-97ER62376 and NIH Grant No. HG00782.
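
The readout logic can be illustrated with a toy model. The sketch assumes, as a simplification not spelled out in the abstract, four separate reactions in which one boronated nucleotide each is partially incorporated, and an exonuclease that trims each product from its 3' end until it reaches a boranophosphate linkage. The surviving fragment lengths in each reaction then mark the positions of that base, and merging the four sets recovers the sequence. All function names and the example sequence are hypothetical.

```python
def resistant_fragment_lengths(product_seq, base):
    """Lengths of fragments that could survive digestion when `base` is the
    boronated nucleotide: one length per position where that base occurs."""
    return {i + 1 for i, b in enumerate(product_seq) if b == base}

def read_sequence(ladders):
    """Merge four per-base ladders ({base: set of lengths}) into a sequence."""
    by_length = {n: base for base, lengths in ladders.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))

product_seq = "GATTACAGC"  # made-up PCR product, written 5' to 3'
ladders = {b: resistant_fragment_lengths(product_seq, b) for b in "ACGT"}
print(sorted(ladders["A"]))     # fragment lengths ending in A
print(read_sequence(ladders))   # reconstructs the made-up product
```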

References

[1] Sood, A., Shaw, B. R., and Spielvogel, B. F. (1990) J. Amer. Chem. Soc. 112, 9000-9001.

[2] Tomasz, J., Shaw, B. R., Porter, K., Spielvogel, B. F., and Sood, A. (1992) Angew. Chem. Int. Ed. Engl. 31, 1373-1375.

Automation of the Front End of DNA Sequencing

Lloyd M. Smith and Richard A. Guilfoyle
University of Wisconsin; Madison, WI 53706
Guilfoyle: 608/265-6138, Fax: [email protected]

The objective of this project is to continue developing more efficient tools and methods addressing the “front-end” processes of large-scale DNA sequencing. Our specific aims are high-throughput purification and mapping of cosmid inserts, controlled fragmentation of random inserts, direct selection vectors for cloning and sequencing, high-throughput M13 clone isolations, and high-throughput template purifications.

An approach to multi-cosmid purifications was developed using a cell harvester and binding to GF/C glass fiber filter-bottom microtiter plates. This method proved inadequate because the yields were low and the DNA was easily fragmented. In the last year we have started examining the use of triplex-affinity capture (TAC) for this purpose as applied to BACs, based on our previous success with TAC purification and restriction mapping of cosmids (1, 2).

We initially proposed to control random fragmentation for shotgun cloning using CviJ1 and its methyltransferase. Instead, we are now exploring automating it by scaled-down nebulization and parallel processing.

We have made a vector, M13-102 (3, 4; patented), for facilitating construction and improving the quality of M13 shotgun libraries. It allows direct selection of recombinants and dephosphorylation of inserts to reduce chimeras, and it contains universal primers for fluorescent sequencing and a triplex sequence for easy TAC purification of linearized RF DNA. We also made a version of this vector, M13-100Z, which expresses the alpha-peptide of β-gal. Its utility is in flow-cytometry-based clone isolation. We continue to develop these vectors for multiple cloning sites and for insert flipping for use in the closing steps of large-scale sequencing projects.

We continue to develop high-throughput clone isolations by flow cytometric cell sorting. M13 or plasmid clones can be isolated into microtiter wells at rates of up to 2 per second using our present FacStar-Plus cytometer and collection assembly; theoretical rates are much higher. This bypasses plating onto solid media and any need for plaque or colony picking. We initially tried isolations after microencapsulation of cells in agarose gel microbeads, but with hardware and software improvements we can now distinguish positively selected transfected cells from background. Efficiency of sorting is very sensitive to detection efficiency. We continue to investigate different methods of fluorescence detection for various plasmid and M13 vector systems, including fluorogenic substrates for β-gal, fluorescent-tagged antibodies to M13 or cell surface proteins, and green fluorescent protein as a reporter.

We have been developing a solid-phase filter plate method for M13 template purifications using carboxylated polystyrene beads (Bangs Labs, IN) for automation on the Hamilton 2200. It should process 96 samples in under 30 minutes and deliver 1-2 micrograms per sample for cycle sequencing. This approach has proven superior to others we have tried with respect to amenability to automation (5, 6).

Ancillary projects. We reported a method for direct fluorescence analysis of genetic polymorphisms using oligonucleotide arrays on glass supports (7), which spun off other projects including (a) enhanced discrimination by artificial mismatch hybridization (8), (b) restriction hybridization ordering of shotgun clones, and (c) restriction site indexing-PCR (RSI-PCR) (9, patent applied for). RSI-PCR is an alternative strategy to extra-long PCR that has application in large gap filling (>45 kb), differential gene expression analysis, RFLP and EST marker production, end-sequencing, and others.

Our most significant findings are the following:
1. Improved direct selection M13 cloning vector
2. Rapid restriction mapping of cosmids using triple-helix affinity capture
3. High-throughput M13 template production using carboxylated beads
4. Sequencing of a cosmid encoding the Drosophila GABA receptor
5. Improved detection of sequencing clones by flow cytometry
6. RSI-PCR, a strategy to obtain mapped and sequence-ready DNA directly from up to 0.5 kb regions of a complex genome using palindromic class II restriction enzymes; bypasses conventional cloning methodology (see previous section for applications).


References

1. Ji, H., Smith, L.M., and Guilfoyle, R.A. (1994) GATA 11, 43-47.

2. Ji, H., Francisco, T., Smith, L.M. and Guilfoyle, R.A. (1996) Genomics 31, 185-192.

3. Guilfoyle, R. and Smith, L.M. (1994) Nucleic Acids Res. 22, 100-107.

4. Chen, D., Johnson, A.F., Severin, J.M., Rank, D.R., Smith, L.M. and Guilfoyle, R.A. (1996) Gene 172, 53-57.

5. Kolner, D.E., Guilfoyle, R.A., and Smith, L. (1994) DNA Sequence 4, 253-257.

6. Johnson, A.F., Wang, R., Ji, H., Chen, D., Guilfoyle, R.A. and Smith, L.M. (1996) Anal Biochem 234, 83-95.

7. Guo, Z., Guilfoyle, R.A., Thiel, A.J., Wang, R. and Smith, L.M. (1994) Nucleic Acids Res. 22, 5456-5465.

8. Guo, Z., Liu, Q., and Smith, L.M. (submitted).

9. Guilfoyle, R.A., Guo, Z., Kroening, D., Leeck, C. and Smith, L.M. (submitted).

High-Speed DNA Sequence Analysis by Matrix-Assisted Laser Desorption Mass Spectrometry

Lloyd M. Smith and Brian Chait1

Department of Chemistry; University of Wisconsin; Madison, WI 53706
608/263-2594, Fax: /265-6780, [email protected]
1Rockefeller University; New York, NY 10021

Our mass spectrometry research has focused primarily on the possibility of utilizing Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) as an alternative to conventional gel electrophoresis for DNA sequence analysis. In this approach, extension fragments generated by the Sanger sequencing reactions are separated by size and detected in the mass spectrometer in one step.
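
In principle the readout reduces to arithmetic on peak masses: adjacent fragments in a sequencing ladder differ by one nucleotide, so each mass difference identifies a base. A minimal sketch, assuming ideal peaks and using approximate average residue masses for the four deoxynucleotides (the tolerance, primer mass, and example ladder are illustrative choices, not values from this project):

```python
# Approximate average masses (Da) of deoxynucleotide residues within a DNA chain.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}

def call_bases(peak_masses, tol=2.0):
    """Call one base per gap between consecutive ladder peaks.
    Returns '?' where no residue mass lies within `tol` Da of the gap."""
    peaks = sorted(peak_masses)
    calls = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        gap = heavier - lighter
        base = min(RESIDUE_MASS, key=lambda b: abs(RESIDUE_MASS[b] - gap))
        calls.append(base if abs(RESIDUE_MASS[base] - gap) <= tol else "?")
    return "".join(calls)

# Build an ideal ladder for a made-up extension from an arbitrary primer mass.
ladder = [5000.0]
for b in "GATC":
    ladder.append(ladder[-1] + RESIDUE_MASS[b])

print(call_bases(ladder))
```

The closest pair of residues, A and T, differ by only about 9 Da, which is why mass resolution and fragmentation are central concerns for this approach.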

Our group has shown fragmentation to be a major factor limiting accessible mass range, sensitivity, and mass resolution in the analysis of DNA by MALDI-MS. This DNA fragmentation was shown to be strongly dependent on both the MALDI matrix and the nucleic acid sequence employed. Fragmentation is proposed to follow a pathway in which nucleobase protonation leads to cleavage of the N-glycosidic bond with base loss, followed by cleavage of the phosphodiester backbone. Modifications of the deoxyribose sugar ring by replacing the 2' hydrogen with more electron-withdrawing groups such as the hydroxyl or fluoro group were shown to stabilize the N-glycosidic bond, partially or completely blocking fragmentation at the modified nucleosides. The stabilization provided by these chemical modifications was also shown to expand the range of matrices useful for nucleic acid analysis, yielding in some cases greatly improved performance.


Relevant Publication

Zhu, L.; Parr, G. P.; Fitzgerald, M. C.; Nelson, C. M.; Smith, L. M. Oligodeoxynucleotide fragmentation in MALDI/TOF mass spectrometry using 355 nm radiation. J. Am. Chem. Soc. 1995, 117, 6048-6056.

Analysis of Oligonucleotide Mixtures by Electrospray Ionization-Mass Spectrometry

Richard D. Smith, David C. Muddiman, James E. Bruce, and Harold R. Udseth
Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratory; Richland, WA 99352
509/376-0723, Fax: -5824, [email protected]
http://www.emsl.pnl.gov:2080/docs/msd/fticr/advmasspec.html

This project aims to develop electrospray ionization mass spectrometry (ESI-MS) methods for high-speed DNA sequencing of oligonucleotide mixtures that can be integrated into an effective overall sequencing strategy. A second goal is to develop mass spectrometric methods that can be effectively utilized in postgenomic research in broad areas of DNA characterization, such as with polymerase chain reaction to rapidly and accurately identify single-base polymorphisms. ESI produces intact molecular ions from DNA fragments of different size and sequence with high efficiency [1]. Our aim is to determine ESI mass spectrometry conditions that are compatible with biological sample preparation, to allow efficient ionization of DNA and the analysis of complex mixtures (e.g., a Sanger sequencing ladder). We have developed a novel on-line microdialysis method at PNNL to remove salts, detergents, and buffers from such biological preparations as PCR and dideoxy sequencing mixtures. This has allowed rapid and efficient desalting (e.g., of samples having 0.25 M NaCl), permitting ESI mass spectral analysis without the typically problematic Na-adducts. Oligonucleotide ions are typically produced from ESI with a broad distribution of net charge states for each molecular species, leading to difficulties in the analysis of complex mixtures [1]. To make identification of each component in a sequencing mixture possible, the charge states of molecular ions can be reduced using gas-phase reactions. The charge-state reduction methods being examined include (1) reactions with organic acids and bases (in the solution to be electrosprayed and the ESI-MS interface or the gas phase); (2) the labeling of the oligonucleotides with a designed functional group for production of molecular ions of very low charge states; and (3) the shielding of potential charge sites on the oligonucleotide phosphate/phosphodiester groups with polyamines (and the subsequent gas-phase removal of the neutral amines). In initial studies two methods for charge-state reduction of gas-phase oligonucleotide negative ions have been tested: (1) the addition of acids and bases to the oligonucleotide solution and (2) the formation of diamine adducts followed by dissociation in the interface region [2,3]. Several methods show promise for charge-state reduction, and results have been demonstrated for a series of smaller oligonucleotides. We have recently demonstrated for the first time that PCR products can be rapidly detected using ESI-MS, with significant improvements projected [4,5]. Finally, new mass spectrometric methods have been developed to provide the dynamic range expansion necessary for addressing DNA sequencing mixtures [6]. Our overall aim is to provide a foundation for the development of an overall approach to high-speed sequencing (including rapid and precise PCR product characterization) using cost-effective, high-throughput instrumentation.
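
The analytical difficulty can be seen with a short calculation. For a negative ion formed by loss of z protons from a neutral molecule of mass M, the observed value is m/z = (M - z·m_p)/z, so a single species gives one peak per charge state. The sketch below uses a made-up 6,000-Da oligonucleotide and arbitrary charge ranges to show how charge-state reduction leaves fewer, more widely spaced peaks; none of the numbers come from the project.

```python
PROTON_MASS = 1.00728  # Da

def mz_negative(neutral_mass, z):
    """m/z of an [M - zH]^z- ion."""
    return (neutral_mass - z * PROTON_MASS) / z

def peaks(neutral_mass, charge_states):
    """Map each charge state to its m/z value, rounded to 0.01."""
    return {z: round(mz_negative(neutral_mass, z), 2) for z in charge_states}

mass = 6000.0  # hypothetical oligonucleotide mass in Da
print(peaks(mass, range(4, 11)))  # broad distribution: seven peaks per species
print(peaks(mass, range(1, 3)))   # after charge reduction: two peaks per species
```

In a mixture of many ladder fragments, the seven-peak envelopes of neighboring species interleave, whereas singly or doubly charged ions give peaks that can be assigned directly.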

DOE Contract No. DE-AC06-76RLO-1830.

References

[1] “New Developments in Biochemical Mass Spectrometry: Electrospray Ionization”, R. D. Smith, J. A. Loo, C. G. Edmonds, C. J. Barinaga, and H. R. Udseth, Anal. Chem., 62, 882-889 (1990).

[2] “Charge State Reduction of Oligonucleotide Negative Ions from Electrospray Ionization”, X. Cheng, D. C. Gale, H. R. Udseth, and R. D. Smith, Anal. Chem., 67, 586-593 (1995).

[3] “Charge-State Reduction with Improved Signal Intensity of Oligonucleotides in Electrospray Ionization Mass Spectrometry”, D. C. Muddiman, X. Cheng, H. R. Udseth, and R. D. Smith, J. Am. Soc. Mass Spectrom., 7 (8), 697-706 (1996).

[4] “Analysis of Double-stranded Polymerase Chain Reaction Products from the Bacillus cereus Group by Electrospray Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectrometry”, D. S. Wunschel, K. F. Fox, A. Fox, J. E. Bruce, D. C. Muddiman, and R. D. Smith, Rapid Commun. in Mass Spectrom., 10, 29-35 (1996).

[5] “Characterization of PCR Products From Bacilli Using Electrospray Ionization FTICR Mass Spectrometry”, D. C. Muddiman, D. S. Wunschel, C. Liu, L. Pasa-Tolic, K. F. Fox, A. Fox, G. A. Anderson, and R. D. Smith, Anal. Chem., 68, 3705-3712 (1996).

[6] “Colored Noise Waveforms and Quadrupole Excitation for the Dynamic Range Expansion in Fourier Transform Ion Cyclotron Resonance Mass Spectrometry”, J. E. Bruce, G. A. Anderson, and R. D. Smith, Anal. Chem., 68, 534-541 (1996).

High-Speed Sequencing of Single DNA Molecules in the Gas Phase by FTICR-MS

Richard D. Smith, David C. Muddiman, S. A. Hofstadler, and J. E. Bruce
Environmental Molecular Sciences Laboratory; Pacific Northwest National Laboratory; Richland, WA 99352
509/376-0723, Fax: -5824, [email protected]
http://www.emsl.pnl.gov:2080/docs/msd/fticr/advmasspec.html

This project is aimed at the development of a totally new concept for high-speed DNA sequencing based upon the analysis of single (i.e., individual) large DNA fragments using electrospray ionization (ESI) combined with Fourier transform ion cyclotron resonance (FTICR) mass spectrometry. In our approach, large single-stranded DNA segments extending to as much as 25 kilobases (and possibly much larger) are transferred to the gas phase using ESI. The multiply charged molecular ions are trapped in the cell of an FTICR mass spectrometer, where one or more single ions are then selected for analysis and their mass-to-charge ratios (m/z) are measured both rapidly and nondestructively. Single-ion detection is achievable because of the high charge state of the electrosprayed ions and the unique sensitivity of new FTICR detection methodologies.

Initial efforts under this project have demonstrated the capability for the formation, extended trapping, isolation, and monitoring of sequential reactions of highly charged DNA molecular ions with molecular weights well into the megadalton range [1-6]. We have shown that large, multiply charged individual ions of both single- and double-stranded DNA can also be efficiently trapped in an FTICR cell and their mass-to-charge ratios measured with very high accuracy. Thus, it is feasible to quickly determine the mass of each lost unit as the DNA is subjected to rapid reactive degradation steps. One approach is to develop methods based upon the use of ion-molecule or photochemical processes that can promote a stepwise reactive degradation of gas-phase DNA anions. Successful development of one of these approaches could greatly reduce the cost and enhance the speed of DNA sequencing, potentially allowing for sequencing DNA segments of more than 25 kilobases in length on a time scale of minutes with negligible error rates, with the added potential for conducting many such measurements in parallel. Instrumentation optimized for these purposes is currently being introduced and promises to greatly advance the methodology. The techniques being developed promise to lead to a host of new methods for DNA characterization, potentially extending to the size of much larger DNA restriction fragments (>500 kilobases).

DOE Contract No. DE-AC06-76RLO-1830.




References

[1] “Trapping Detection and Reaction of Very Large Single Molecular Ions by Mass Spectrometry,” R. D. Smith, X. Cheng, J. E. Bruce, S. A. Hofstadler, and G. A. Anderson, Nature, 369, 137-139 (1994).

[2] “Charge State Shifting of Individual Multiply-Charged Ions of Bovine Albumin Dimer and Molecular Weight Determination Using an Individual-Ion Approach,” X. Cheng, R. Bakhtiar, S. Van Orden, and R. D. Smith, Anal. Chem., 66, 2084-2087 (1994).

[3] “Trapping, Detection, and Mass Measurement of Individual Ions in a Fourier Transform Ion Cyclotron Resonance Mass Spectrometer,” J. E. Bruce, X. Cheng, R. Bakhtiar, Q. Wu, S. A. Hofstadler, G. A. Anderson, and R. D. Smith, J. Amer. Chem. Soc., 116, 7839-7847 (1994).

[4] “Direct Charge Number and Molecular Weight Determination of Large Individual Ions by Electrospray Ionization-Fourier Transform Ion Cyclotron Resonance Mass Spectrometry”, R. Chen, Q. Wu, D. W. Mitchell, S. A. Hofstadler, A. L. Rockwood, and R. D. Smith, Anal. Chem., 66, 3964-3969 (1994).

[5] “Trapping, Detection and Mass Determination of Coliphage T4 (108 MDa) Ions by Electrospray Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectrometry”, R. Chen, X. Cheng, D. W. Mitchell, S. A. Hofstadler, A. L. Rockwood, Q. Wu, M. G. Sherman, and R. D. Smith, Anal. Chem., 67, 1159-1163 (1995).

[6] “Accurate Molecular Weight Determination of Plasmid DNA Using Mass Spectrometry”, X. Cheng, D. G. Camp II, Q. Wu, R. Bakhtiar, D. L. Springer, B. J. Morris, J. E. Bruce, G. A. Anderson, C. G. Edmonds, and R. D. Smith, Nucleic Acid Res., 24, 2183-2189 (1996).

Characterization and Modification of DNA Polymerases for Use in DNA Sequencing

Stanley Tabor
Harvard University; Boston, MA 02115-5730
617/432-3128, Fax: -3362, [email protected]
http://sbweb.med.harvard.edu/~bcmp
http://sbweb.med.harvard.edu/~bcmp/tabor.html

Our studies are directed towards improving the properties of DNA polymerases for use in DNA sequencing. The primary focus is understanding the mechanism by which DNA polymerases discriminate against nucleotide analogs and the mechanism by which they incorporate nucleotides processively without dissociating from the DNA template.

We are comparing three DNA polymerases that have been used extensively for DNA sequencing: E. coli DNA polymerase I, T7 DNA polymerase, and Taq DNA polymerase. These are related to one another, and this homology has been exploited to construct active-site hybrids that have been used to determine the structural basis for differences in their activities. Specifically, the hybrids have been used (1) to determine why E. coli DNA polymerase I and Taq DNA polymerase discriminate strongly against dideoxynucleotides and (2) to understand how T7 DNA polymerase interacts with its processivity factor, thioredoxin, to confer high processivity.

Based on these studies, we have been able to modify Taq DNA polymerase and E. coli DNA polymerase I to make them incorporate dideoxynucleotides much more efficiently and to have increased processivity in the presence of thioredoxin. The ability to incorporate dideoxynucleotides efficiently greatly improves the uniformity of band intensities on a DNA sequencing gel, thereby increasing the accuracy of the DNA sequence obtained. In addition, the efficient use of dideoxynucleotides reduces the amount of these analogs required for DNA sequencing, an important issue when using fluorescently modified dideoxy terminators. In an approach that complements these studies, we, in collaboration with Dr. Thomas Ellenberger (Harvard Medical School), are determining the crystal structure of T7 DNA polymerase in a complex with thioredoxin and a primer-template. Knowledge of this structure will allow the rational design of specific mutations that will enable DNA polymerases to incorporate more efficiently other analogs useful for DNA sequencing, such as those with fluorescent moieties on the bases.


Relevant Publications

Tabor, S., and Richardson, C. C. (1995). A single residue in DNA polymerases of the Escherichia coli DNA polymerase I family is critical for distinguishing between deoxy- and dideoxyribonucleotides. Proc. Natl. Acad. Sci. U.S.A. 92, 6339-6343.

Bedford, E., Tabor, S. and Richardson, C. C. (1997). The thioredoxin binding domain of bacteriophage T7 DNA polymerase confers processivity on Escherichia coli DNA polymerase I. Proc. Natl. Acad. Sci. U.S.A. 94, 479-484.

Modular Primers for DNA Sequencing

Mugasimangalam Raja,1,2 Dina Sonkin,2 Lev Lvovsky,2 and Levy Ulanovsky1,2

1Center for Mechanistic Biology and Biotechnology; Argonne National Laboratory, Argonne, IL 60439-4833
Ulanovsky: 630/252-3940; Fax: -3387, [email protected]
2Dept. of Structural Biology; Weizmann Institute of Science; Rehovot 76100, Israel

We are developing molecular approaches to DNA sequencing that enable primer walking without the step of chemical synthesis of oligonucleotide primers between the walks. One such approach involves “modular primers,” described earlier, consisting of 5-mers, 6-mers, or 7-mers (selected from a presynthesized library) annealing to the template contiguously with each other. Another approach, which we have termed DENS (Differential Extension with Nucleotide Subsets), works by selectively extending a short primer, making it a long one at the intended site only. DENS starts with a limited initial extension of the primer (at 20-30°C) in the presence of only 2 of the 4 possible dNTPs. The primer is extended by 6-9 bases or longer at the intended priming site, which is deliberately selected (as is the two-dNTP set) to maximize the extension length. The subsequent sequencing/termination reaction at 60-65°C then accepts the extended primer at the intended site but not at alternative sites, where the initial extension (if any) is generally much shorter. DENS allows the use of primers as long as 8-mers (degenerate in 2 positions), which prime much more strongly than modular primers involving 5- to 7-mers and which (unlike the latter) can be used with thermostable polymerases, thus allowing cycle sequencing with dye terminators for Taq, as well as making double-stranded DNA sequencing more robust.
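
The site-selection logic of DENS can be sketched in a few lines. With only two dNTPs present, the polymerase extends the primer until the template calls for a nucleotide outside the subset, so the extension length at any site is the length of the leading run of the newly synthesized strand that uses only those two bases. The function names and the example sequence below are hypothetical, and the sketch ignores misincorporation and the degenerate primer positions.

```python
from itertools import combinations

def extension_length(downstream, dntps):
    """Number of bases added before a nucleotide outside `dntps` is required.
    `downstream` is the strand to be synthesized, 5'->3', starting just
    past the primer's 3' end."""
    n = 0
    for base in downstream:
        if base not in dntps:
            break
        n += 1
    return n

def best_two_dntp_set(downstream):
    """Pick the pair of dNTPs giving the longest initial extension."""
    return max(combinations("ACGT", 2),
               key=lambda pair: extension_length(downstream, pair))

site = "GAAGGAGAGCTTACG"   # made-up sequence following the priming site
pair = best_two_dntp_set(site)
print(pair, extension_length(site, pair))
```

For this made-up site the best subset is dATP plus dGTP, which extends the primer by 9 bases; an alternative site where the same subset allows only one or two bases would leave the primer too short to be accepted at the higher temperature.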

These technologies are expected to speed up genome sequencing in more than one way:

a) Reduction in redundancy would result from more efficient and rapid closure of even long gaps, which are currently avoided at the price of 7- to 9-fold redundancy in shotgun sequencing. Instantly available primers would also improve the quality of sequencing. Stretches of sequence that have too low a confidence level (high suspected error rate) can be resequenced without synthesizing new oligos and without growing any new subclones.

b) Further down the road, the completion of the automation of the closed cycle of primer walking will be made possible by eliminating the need to synthesize the walking primers. Combined with capillary sequencers, the instant availability of the walking primers should reduce the time per walking cycle from 2-3 days now to about 1.5-2.0 hours, an improvement in speed by a factor of 20-50.

c) The closed-end automation would minimize both the labor cost and human errors. As primer walking has minimal, if any, of the front-end and back-end bottlenecks inherent to shotgun sequencing, the cost of sequencing would be essentially that of reagents, 5 cents/base or less.
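
The speedup quoted in item (b) can be checked with simple arithmetic. The short sketch below (the helper name is ours) converts the cycle times to hours and takes the most and least favorable ratios:

```python
HOURS_PER_DAY = 24

def speedup_range(old_days, new_hours):
    """Return (min, max) speedup for cycle times given as (low, high) tuples."""
    old_low, old_high = (d * HOURS_PER_DAY for d in old_days)
    new_low, new_high = new_hours
    return old_low / new_high, old_high / new_low

low, high = speedup_range((2, 3), (1.5, 2.0))
print(low, high)  # 24.0 48.0
```

The resulting range of 24- to 48-fold is consistent with the stated factor of 20-50.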


Time-of-Flight Mass Spectroscopy of DNA for Rapid Sequence

Peter Williams, Chau-Wen Chou, David Dogruel, Jennifer Krone, Kathy Lewis, and Randall Nelson
Department of Chemistry and Biochemistry; Arizona State University; Tempe, AZ 85287
602/965-4107, Fax: -2747, [email protected]

There are three potential roles for mass spectrometry relevant to the Human Genome Project:

a) The most obvious role is that on which all groups have been focusing: development of an alternative, faster sequence ladder readout method to speed up large-scale sequencing. Progress here has been difficult and slow because the mass spectrometry requirements exceed the current capabilities of mass spectrometry even for proteins, and DNA presents significantly more difficulty than proteins. We have shown previously that pulsed laser ablation of DNA from frozen aqueous films has the potential to yield sequence-quality mass spectra, but that ionization in this approach is erratic and uncontrollable. We are focusing on developing ionization methods using ion (or electron) attachment to vapor-phase DNA (ablated from ice films) in an electric-field-free environment; results of this approach will be reported.

b) Mass spectrometry may not ultimately compete favorably in speed with large-scale multiplexing of conventional or near-term technologies such as capillary electrophoresis. However, as the Genome project nears completion there will be an increasing need for rapid small-scale DNA analysis, where the multiplex advantage will not be so great and mass spectrometry could play a more significant role. With this in mind we are looking at ways to speed up the overall mass spectrometric analysis, e.g., simple rapid cleanup of sequence mixtures, and at generation of short sequence ladders by exopeptidase digestion.

c) Given the genome database(s) at the completion of the project, with rapid search capability, a need will arise for comparably rapid generation of search input data to identify often very small quantities of proteins isolated from biochemical investigations. With this in mind we have developed extremely rapid enzyme digestion techniques optimized for mass spectrometric readout, using endopeptidases covalently coupled directly to the mass spectrometer probe tip. The elimination of autolysis and transfer losses allows rapid (few-minute) endopeptidase digestion and mass analysis of as little as 1 picomole of protein, leading to an unambiguous database identification. An alternative search procedure uses partial amino-acid sequence information. With the added use of exopeptidases to generate a peptide ladder sequence in the mass spectrum of the endopeptidase digest, on the order of a dozen residues of internal sequence can be generated in a total analysis time of 20 minutes or less, again using only picomoles of sample.
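The peptide ladder readout mentioned in (c) can be pictured as follows: successive peaks in the ladder differ by the mass of one removed residue, so each mass difference names a residue. The sketch below is only an illustration of that idea, not the authors' software. The table holds standard monoisotopic residue masses (rounded), and the tolerance and function name are assumptions of this sketch. Leucine and isoleucine have the same mass and are reported together.

```python
# Standard monoisotopic residue masses in daltons (rounded); an assumption of this sketch.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peak_masses, tolerance=0.01):
    """Turn ladder peak masses into residues, heaviest peak first.

    Each adjacent pair of peaks differs by one residue; a difference that
    matches no residue within the tolerance is reported as "?".
    """
    peaks = sorted(peak_masses, reverse=True)
    residues = []
    for heavier, lighter in zip(peaks, peaks[1:]):
        delta = heavier - lighter
        name, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - delta))
        residues.append(name if abs(mass - delta) <= tolerance else "?")
    return residues


# Hypothetical ladder: an arbitrary 1000.0 Da peptide losing G, then A, then F.
ladder = [1000.0, 942.97854, 871.94143, 724.87302]
print(read_ladder(ladder))  # ['G', 'A', 'F']
```

Which terminus the exopeptidase attacks, and therefore the reading direction of the sequence, is not modeled here.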


Development of Instrumentation for DNA Sequencing at a Rate of 40 Million Bases Per Day

Edward S. Yeung, Huan-Tsung Chang, Qingbo Li, Xiandan Lu, and Eliza Fung
Ames Laboratory and Department of Chemistry; Iowa State University; Ames, IA 50011
515/294-8062, Fax: -0266, [email protected]

We have developed novel separation, detection, and imaging techniques for real-time monitoring in capillary electrophoresis. These techniques will be used to substantially increase the speed, throughput, reliability, and sensitivity in DNA sequencing applications in highly multiplexed capillary arrays. We estimate that it should be possible to eventually achieve a raw sequencing rate of 40 million bases per day in one instrument based on the standard Sanger protocol. We have reached a stage where an actual sequencing instrument with 100 capillaries can be built to replace the Applied Biosystems 373 or 377 instruments, with a net gain in speed and throughput of 100-fold and 24-fold, respectively.

The substantial increase in sequencing rate is a result of several technical advances in our laboratory. (1) The use of commercial linear polymers for sieving allows replaceable yet reproducible matrices to be prepared that have lower viscosity (thus faster migration rates) compared to polyacrylamide. (2) The use of a charge-injection device camera allows random data acquisition to decrease data storage and data transfer time. (3) The use of distinct excitation wavelengths and cut-off emission filters allows maximum light throughput for efficient excitation and sensitive detection employing the standard 4-dye coding. (4) The use of index matching and 1:1 imaging reduces stray light without sacrificing the convenience of on-column detection.

Continuing efforts include further optimization of the separation matrix, development of new column conditioning protocols, refinement of the excitation/emission optics, design of a pressure injection system for 96-well titer plates, validation of a new 2-color base-calling scheme, simplification of software to allow essentially real-time data processing, implementation of voltage programming to shorten the total run times, and scale-up of the technology to allow parallel sequencing in up to 1,000 capillaries.

Relevant Publications

K. Ueno and E. S. Yeung, “Simultaneous Monitoring of DNA Fragments Separated by Capillary Electrophoresis in a Multiplexed Array of 100 Channels”, Anal. Chem. 66, 1424-1431 (1994).

X. Lu and E. S. Yeung, “Optimization of Excitation and Detection Geometry for Multiplexed Capillary Array Electrophoresis of DNA Fragments”, Appl. Spectrosc. 49, 605-609 (1995).

Q. Li and E. S. Yeung, “Evaluation of the Potential of a Charge Injection Device for DNA Sequencing by Multiplexed Capillary Electrophoresis”, Appl. Spectrosc. 49, 825-833 (1995).

E. N. Fung and E. S. Yeung, “High-Speed DNA Sequencing by Using Mixed Poly(ethyleneoxide) Solutions in Uncoated Capillary Columns”, Anal. Chem. 67, 1913-1919 (1995).

Q. Li and E. S. Yeung, “Simple Two-Color Base-Calling Schemes for DNA Sequencing Based on Standard 4-Label Sanger Chemistry”, Appl. Spectrosc. 49, 1528-1533 (1995).


Resolving Proteins Bound to Individual DNA Molecules

David Allison, Bruce Warmack, Mitch Doktycz, Tom Thundat, and Peter Hoyt
Molecular Imaging Group; Health Sciences Research Division; Oak Ridge National Laboratory; Oak Ridge, TN 37831-6123
Allison: 423/574-6199, Fax: -6210, [email protected]: 423/574-6202, Fax: -6210, [email protected]

We have precisely located sequence-specific proteins bound to individual DNA molecules by direct AFM imaging. Using a mutant EcoR I endonuclease that site-specifically binds but doesn’t cleave DNA, bound enzyme has been imaged and located, with an accuracy of ±1%, on well-characterized plasmids and bacteriophage lambda DNA (48 kb). Cosmids have been mapped and, by incorporating methods for anchoring molecules to surfaces and straightening to prevent molecular entanglement, BAC-sized clones could be analyzed.

This direct imaging approach could be rapidly developed to locate other sequence-specific proteins on genomic clones. Enzymatic proteins, involved in identifying and repairing damaged or mutated regions on DNA molecules, could be imaged bound to lesion sites. Transcription factor proteins that identify gene-start regions and other regulatory proteins that modulate the expression of genes by binding to specific control sequences on DNA molecules could be precisely located on intact cloned DNAs.

Conventional gel-based techniques for identifying site-specific protein binding sites must rely upon fragment analysis for identifying restriction enzyme sites, or, for non-cutting proteins, upon gel-shift methods that can only address small DNA fragments. In contrast, AFM imaging is a general approach that is applicable to the analysis of all site-specific DNA-protein interactions on large-insert clones. This technique could be developed for high-throughput analysis, can be accomplished by technicians, uses readily available, relatively inexpensive instrumentation, and should be a technology fully transferable to most laboratories.

DOE Contract No. DE-AC05-84OR21400.

*Improved Cell Electrotransformation by Macromolecules

Alexandre S. Boitsov, Boris V. Oskin, Anton O. Reshetin, and Stepan A. Boitsov
Department of Biophysics; St. Petersburg State Technical University; 195251 St. Petersburg, Russia
+7-812/277-5959, Fax: /247-2088 or /534-3314, [email protected]

Our work for 1996 and 1997 will include the following:

1. Comparative study of the kinetics of entry of DNA of different molecular forms into E. coli cells DH10B/r and DH5α during electrotransformation. Study of the optimal regimes of cell-wall permeabilization for the DH10B/r cells.

2. Study of the efficiency of BAC cloning in DH10B/r cells using the new electrotransformation method. Optimization of the procedure for DH10B/r cells.

3. Modernization of the electronic equipment in accordance with results of the biological experiments. To expand the studies, we need to extend the capability of the instrumentation to increase its flexibility and to improve the accuracy and reproducibility of the electric fields we generate by incorporating electronic components with higher tolerances.


Overcoming Genome Mapping Bottlenecks

Charles R. Cantor
Center for Advanced Biotechnology; Boston University; Boston, MA 02215
617/353-8500, Fax: -8501, [email protected]://eng.bu.edu/CAB

Most traditional DNA analysis is done based on fractionation of DNA by length. We have, instead, begun to explore the use of DNA sequences as capture and detection methods to expedite a number of procedures in genome analysis.

Triplet repeats like (GGC)n are an important class of human genetic markers, and they are also responsible for a number of inherited diseases involving the central nervous system. For both of these reasons it would be very useful to have a way to monitor the status of large numbers of triplet repeats simultaneously. We are developing methods to isolate and profile classes of such repeats.

In one method, genomic DNA is cut with one or more restriction nucleases, and splints are ligated onto the ends of the fragments. Then fragments containing a specific class of repeats are isolated by capture on magnetic microbeads containing an immobilized simple repeating sequence. The desired material is then released, and, if necessary, a selective PCR is done to reduce the complexity of the sample. Otherwise the entire captured sample is amplified by PCR. The spectrum of repeats is then examined by electrophoresis on an automated fluorescent gel reader. In our case the Pharmacia ALF is used, because of its excellent quantitative signal accuracy. A very complex spectrum of bands is seen, representing hundreds of DNA fragments. We have shown that this spectrum is dramatically different with DNAs from unrelated individuals, and the spectrum is markedly dependent on the choice of restriction enzyme, as expected. Repeated measurements on the same sample are highly reproducible. The ability of the method to detect a specific altered repeat length in a complex DNA sample has been validated by examining several individuals with normal or expanded repeat sequences in the Huntington’s disease gene. One very powerful application of this method may be the analysis of potential DNA differences in monozygotic twins discordant for a genetic disease. This method can be used to capture genome subsets containing any interspersed repeat. It will also detect insertions and deletions near such repeats. Methylation differences between samples are also detectable when methylation-sensitive restriction enzymes are used.
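The digest, capture, and profile steps described above can be mimicked on a toy sequence. The following is a minimal sketch under stated assumptions, not the laboratory protocol: the recognition site `GATC`, the placement of the cut immediately before the site, the minimum repeat count, and all function names are illustrative choices. Splint ligation and PCR are omitted, and bead capture is modeled as a simple pattern match.

```python
import re


def digest(sequence, site):
    """Cut a sequence immediately before every occurrence of a recognition site."""
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def capture(fragments, unit="GGC", min_repeats=5):
    """Keep fragments carrying at least min_repeats tandem copies of the unit."""
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_repeats},}}")
    return [f for f in fragments if pattern.search(f)]


def profile(fragments):
    """Return the sorted fragment lengths, a stand-in for the band spectrum."""
    return sorted(len(f) for f in fragments)


# Toy genome: one fragment with six GGC repeats, one with only two.
genome = "AAAT" + "GATC" + "TT" + "GGC" * 6 + "TT" + "GATC" + "ACGTACGT" + "GATC" + "GGC" * 2 + "A"
fragments = digest(genome, "GATC")
captured = capture(fragments)
print(profile(captured))  # [26]
```

Changing the recognition site changes which fragments carry the repeat and how long they are, which is the toy analogue of the enzyme dependence of the spectrum noted above.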

Conventional analysis of triplet repeats is very laborious since individual repeats must be analyzed by electrophoresis on DNA sequencing gels. The decrease in effort for such analyses will scale linearly with the number of repeats that can be analyzed simultaneously, so we are potentially looking at something like a factor of 100 improvement if the above scheme under development can be effectively realized.

As an alternative approach, we are developing chip-based methods that can detect the length of a tandemly repeating sequence without any need for gel electrophoresis. Here the goal is to build an array of all possible repeat sequence lengths flanked by single-copy DNA. When an actual sample is hybridized to such an array, the specific alleles in the sample will produce perfect duplexes at their corresponding points in the array and mismatched duplexes elsewhere. Thus, the task of scoring the repeat lengths is reduced to the task of distinguishing perfect and imperfect duplexes. Currently we are exploring a number of different enzymatic protocols that offer the promise of making such distinctions reliably.
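The logic of the array can be sketched in a few lines. This is a toy model under stated assumptions: array elements are written in the same strand sense as the sample (a real probe would be the complement), a perfect duplex is modeled as exact string equality, and the flank sequences and function names are hypothetical. The enzymatic discrimination step itself is not modeled.

```python
def build_array(left_flank, right_flank, unit, max_repeats):
    """One array element per repeat count: flank + unit * n + flank."""
    return {n: left_flank + unit * n + right_flank for n in range(1, max_repeats + 1)}


def score_alleles(array, sample_alleles):
    """Return the repeat counts whose element pairs perfectly with a sample allele."""
    alleles = set(sample_alleles)
    return sorted(n for n, element in array.items() if element in alleles)


# Hypothetical single-copy flanks around a (GGC)n repeat.
LEFT, RIGHT = "ACCTGA", "TTGCAG"
array = build_array(LEFT, RIGHT, "GGC", max_repeats=60)

# A heterozygous sample carrying 18 and 42 repeats.
sample = [LEFT + "GGC" * 18 + RIGHT, LEFT + "GGC" * 42 + RIGHT]
print(score_alleles(array, sample))  # [18, 42]
```

Every other element of the array would form a duplex with a looped-out or unpaired repeat segment, which is why the scoring problem reduces to telling perfect from imperfect duplexes.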

In other work we are using enzyme-enhanced sequencing by hybridization (SBH) as a device for the rapid preparation of DNA samples for mass spectrometry. For example, partially duplex DNA probes can capture and generate sequence ladders from any arbitrary DNA sequence. Current MALDI protocols allow sequence to be read to lengths of 50 to 60 bases. While this is probably insufficient for most de novo DNA sequencing, it is an extremely promising approach for comparative or diagnostic DNA sequencing.


Preparation of PAC Libraries

Joe Catanese, Baohui Zhao, Eirik Frengen, Chenyan Wu, Xiaoping Guan, Chira Chen, Eugenia Pietrzak, Panayotis A. Ioannou,1 Julie Korenberg,2 Joel Jessee,3 and Pieter J. de Jong
Department of Human Genetics; Roswell Park Cancer Institute; Buffalo, NY 14263
de Jong: 716/845-3168, Fax: [email protected]://bacpac.med.buffalo.edu
1The Cyprus Institute of Neurology and Genetics; Nicosia, Cyprus
2Cedars Sinai Medical Center; Los Angeles, CA 90048
3Life Technologies, Gaithersburg, MD 20898

Recently, we have developed procedures for the cloning of large DNA fragments using a bacteriophage P1 derived vector, pCYPAC1 (Ioannou et al. (1994), Nature Genetics 6: 84-89). A slightly modified vector (pCYPAC2) has now been used to create a 15-fold redundant PAC library of the human genome, arrayed in more than 1,000 384-well dishes. DNA was obtained from blood lymphocytes from a male donor. The library was prepared in four distinct sections designated as RPCI-1, RPCI-3, RPCI-4 and RPCI-5, respectively, each having 120 kbp average inserts. The RPCI-1 segment of the library (3X; 120,000 clones, including 25% non-recombinant) has been distributed to over 40 genome centers worldwide and has been used in many physical mapping studies, positional cloning efforts and in various large-scale DNA sequencing enterprises. Screening of the RPCI-1 library by numerous markers results in an average of 3 positive PACs per autosome-derived probe or STS marker. In situ hybridization results with 250 PAC clones indicate that chimerism is low or nonexistent. Distribution of RPCI-3 (3X, 78,000 clones, less than 1% non-recombinants, 4% empty wells) is now underway and the further RPCI-4 and -5 segments (< 5% empty wells) will be distributed upon request. To facilitate screening of the PAC library, we have provided the RPCI-1 PAC library to several screening companies and noncommercial resource centers. In addition, we are now distributing high-density colony membranes at cost-recovery price, mainly to groups having a copy of the PAC library. The combined RPCI-1 and -3 segments (6X) can be represented on 11 colony filters of 22x22 cm, using duplicate colonies for each clone. We are currently generating a similar PAC library from the 129 mouse strain.
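The redundancy figures above can be related to clone counts with a back-of-the-envelope calculation. The sketch below is illustrative: the haploid genome size of 3.0e9 bp and the Poisson model for the chance that a locus is represented are assumptions of this sketch, not values taken from the abstract, and the function names are hypothetical.

```python
import math


def coverage(num_clones, insert_bp, genome_bp, recombinant_fraction=1.0):
    """Genome equivalents held in the recombinant clones of a library."""
    return num_clones * recombinant_fraction * insert_bp / genome_bp


def prob_locus_present(cov):
    """Poisson estimate of the chance that a given locus is in at least one clone."""
    return 1.0 - math.exp(-cov)


GENOME_BP = 3.0e9  # assumed haploid human genome size

# RPCI-1: 120,000 clones, 25% non-recombinant, 120 kbp average insert.
rpci1 = coverage(120_000, 120_000, GENOME_BP, recombinant_fraction=0.75)
print(f"RPCI-1 coverage: {rpci1:.1f}X, locus present with p = {prob_locus_present(rpci1):.3f}")
```

With these assumptions the RPCI-1 segment comes out at about 3.6-fold coverage, the same order as the 3X quoted and in line with the average of 3 positive PACs per marker; the exact figure depends on the genome size and insert size assumed.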

To facilitate the additional use of large-insert bacterial clones for functional studies, we have prepared new PAC & BAC vectors with a dominant selectable marker gene (the blasticidin gene under control of the beta-actin promoter), an EBV replicon and an “update feature”. This feature utilizes the specificity of Transposon Tn7 for the Tn7att sequence (in the new PAC and BAC vectors) to transpose marker genes, other replicons and other sequences into PACs or BACs. Hence, it facilitates retrofitting existing PAC/BAC clones (made with the new vectors) with desirable sequences without affecting the inserts. The new vector(s) are being applied to generate second generation libraries for human (female donor), mouse and rat.

DOE Grant No. DE-FG02-94ER61883 and NIH Grant No. 1R01RG01165.

Development of Affinity Technology for Isolating Individual Human Chromosomes by Third-Strand Binding

Jacques R. Fresco and Marion D. Johnson III
Department of Molecular Biology; Princeton University; Princeton, NJ 08544-1011
609/258-3927, Fax: [email protected]://molbiol.princeton.edu

Prior to the onset of this grant, solution conditions had been developed for binding a 17-residue third strand oligodeoxyribonucleotide probe to a specific human chromosome (HC) 17 multicopy alpha satellite target sequence cloned into DNA vectors of varying size up to 50 kb. Binding was shown to be both highly efficient and specific. Moreover, initial experiments with fluorescent-labeled third strands and human lymphocyte metaphase spreads and interphase nuclei proved similarly successful. During the current research period, the technology for such third strand-based cytogenetic examination, i.e., Triplex In Situ Hybridization or TISH, of such spreads was perfected, so that it is now a highly reproducible method. Comparison of spreads of different individuals by TISH and FISH analysis has provided a new basis for detecting alpha satellite DNA polymorphisms, the basis of which requires further investigation.

This year work also commenced on the development of comparable probes specific for alpha satellite sequences in HC-X, 11, and 16. The work with HC-X has reached the stage where we are ready to test the probe for TISH-based cytogenetic analysis. Solution studies of the interaction of the probes designed for HC-11 and HC-16 alpha satellite targets are following the well-established path we employed for HC-17 and HC-X. With the expectation of success in these cases during the coming year, the way should be clear for the development and application of comparable probes for alpha satellite sequences of any other human chromosomes that may be of interest, and possibly of other eukaryotic species.

Meanwhile, we have begun to turn our attention to two other goals, one being the exploitation of our probes for the isolation of individual human chromosomes by affinity purification, as we originally proposed. The other goal is to exploit our probes as aids in flow sorting human chromosomes, a direction of work we expect to pursue in collaboration with the Los Alamos National Laboratory, just as soon as they indicate a readiness to do so. Finally, we have begun to evaluate the possibility of using third-strand binding fluorescent probes for detection of single copy genes by means of photon counting, a goal which we plan to undertake with our colleague Robert Austin of our Physics Department.


Chromosome Region-Specific Libraries for Human Genome Analysis

Fa-Ten Kao
Eleanor Roosevelt Institute for Cancer Research; Denver, CO 80206
303/333-4515, Fax: -8423, [email protected]

The objective of this project is to construct and characterize chromosome region-specific libraries as resources for genome analysis. We have used our chromosome microdissection and MboI linker-adaptor technique (PNAS 88, 1844, 1991) to construct region-specific libraries for human chromosome 2 and other chromosomes. The libraries have been critically evaluated for high quality, including insert size, proportion of unique vs repetitive sequence microclones, percentage of microclones derived from dissected region, etc.

We have constructed and characterized 11 region-specific libraries for the entire human chromosome 2 (the second largest human chromosome with 243 Mb of DNA), including 4 libraries for the short arm and 6 libraries for the long arm, plus a library for the centromere region. The libraries are large, containing hundreds of thousands of microclones in plasmid vector pUC19, with a mean insert size of 200 bp. About 40-60% of the microclones contain unique sequences, and 70-90% of the microclones were derived from the dissected region. In addition, we have isolated and characterized many unique sequence microclones from each library that can be readily sequenced as STSs, or used in isolating other clones with large inserts (like YAC, BAC, PAC, P1 or cosmid) for contig assembly. These libraries have been used successfully for high resolution physical mapping and for positional cloning of disease-related genes assigned to these regions, e.g., the cloning of the gene for hereditary nonpolyposis colorectal cancer (Cell 75, 1215, 1993).

For each library, we have established a plasmid sub-library containing at least 20,000 independent microclones. These sub-libraries have been deposited to ATCC for permanent maintenance and general distribution. The ATCC Repository numbers for these libraries are: #87188 for 2P1 library (region 2p23-p25, comprising 25 Mb); #87189 for 2P2 library (2p21-p23, 28 Mb); #87103 for 2P3 library (2p14-p16, 22 Mb); #87104 for 2P4 library (2p11-p13, 28 Mb); #77419 for 2Q1 library (2q35-q37, 28 Mb); #87308 for 2Q2 library (2q33-q35, 24 Mb); #87309 for 2Q3 library (2q31-q32, 26 Mb); #87310 for 2Q4 library (2q23-q24, 19 Mb); #87409 for 2Q5 library (2q21-q22, 23 Mb); #87410 for 2Q6 library (2q11-q14, 31 Mb); and #87411 for 2CEN library (2p11.1-q11.1, 4 Mb). Details of these libraries have been described: Hum. Genet. 93, 557, 1994 (for 2P1 library); Cytogenet. Cell Genet. 68, 17, 1995 (for 2P2 library); Somat. Cell Mol. Genet. 20, 353, 1994 (for 2P3 library); Somat. Cell Mol. Genet. 20, 133, 1994 (for 2P4 library); Genomics 14, 769, 1992 (for 2Q1 library); Somat. Cell Mol. Genet. 21, 335, 1995 (for 2Q2, 2Q3 & 2Q4 libraries); Somat. Cell Mol. Genet. 22, 57, 1996 (for 2Q5, 2Q6 & 2CEN libraries).

Region-specific libraries and short insert microclones for chromosome 2 are particularly useful resources for its eventual sequencing because this chromosome is less exploited and detailed mapping information is lacking. We have also constructed 3 region-specific libraries for the entire chromosome 18 using similar methodologies, including 18P library (18p11.32-p11.1, 22 Mb); 18Q1 library (18q11.1-q12.3, 25 Mb); and 18Q2 library (18q21.1-q23, 34 Mb). Details of these libraries have been described (Somat. Cell Mol. Genet. 22, 191-199, 1996).


*Identification and Mapping of DNA-Binding Proteins Along Genomic DNA by DNA-Protein Crosslinking

V.L. Karpov, O.V. Preobrazhenskaya, S.V. Belikov, and D.E. Kamashev
Engelhardt Institute of Molecular Biology; Russian Academy of Sciences; Moscow 17984, Russia
Fax: +7-095/135-1405, [email protected]

In 1995-1996 we continued to map and identify nonhistone proteins binding at loci along the yeast chromosome. Using DNA-protein crosslinking in vivo, we detected two polypeptides that probably correspond to core subunits of yeast RNA-polymerase II in the coding region of the transketolase gene (TKL2). Several nonhistone proteins were detected that bind to the upstream region of TKL2 and to an intergenic spacer between calmodulin (CMD1) and mannosyl transferase (ALG1) genes. The apparent molecular weight of these proteins was estimated. We also developed a new method to synthesize strand-specific probes.

Using DNA-protein crosslinking in vitro, we found the amino acid residues of the Lac-repressor that interact with DNA. Only Lys-33 crosslinks with the Lac-operator in the specific complex.

In addition to Lys-33, the N-terminal end of the protein also crosslinks in a nonspecific complex. Our results demonstrate that, in the presence of an inducer, the repressor’s N-termini crosslink to the operator’s outermost nucleotides. We suggest that binding of an inducer changes the orientation of the DNA-binding domain of the Lac repressor to the opposite of that found for the specific complex.

We plan to use a new method to increase resolution and thus identify amino acids and nucleotides that participate in DNA-protein recognition. The mechanisms of transcription regulation of some yeast genes will thus be further elucidated. Our approaches are based on DNA-protein crosslinking. Detailed analysis will be done for specific and nonspecific complexes, in the presence and absence of inducers. This will allow us to make some conclusions about possible conformational rearrangements in DNA-protein complexes during gene activation at the protein’s DNA-binding domains.

DOE Grant No. OR00033-93C1S007.

References

1. Papatsenko D.A., Belikov S.V., Preobrazhenskaya O.V., and Karpov V.L. Two-dimensional gels and hybridization for studying DNA-protein contacts by crosslinking // Methods in Molecular and Cellular Biology. 1995. V. 5, No 3. P. 171-177.

2. Kamashev D., Esipova N.G., Ebralidse K., and Mirzabekov A.D. Mechanism of lac repressor switch-off: Orientation of lac repressor DNA-binding domain is reversed upon inducer binding // FEBS Lett. 1995. V. 375. P. 27-30.

3. Papatsenko D.A., Priporova I.V., Belikov S.V., and Karpov V.L. Mapping of DNA-binding proteins along yeast genome by UV-induced DNA-protein crosslinking // FEBS Letters, 1996, 381, 103-105.

4. Belikov S.V., Papatsenko D.A., and Karpov V.L. A method to synthesize strand-specific probes // Anal. Biochemistry, 1996, 240, 152-154.

A PAC/BAC Data Resource for Sequencing Complex Regions of the Human Genome: A 2-Year Pilot Study

Julie R. Korenberg
Cedars Sinai Medical Center; University of California; Los Angeles, CA 90048-1869
310/855-7627, Fax: /[email protected]

While the complete sequencing of the human genome at 99.99% accuracy is an immediate goal of the Human Genome Project, a serious technical deficiency remains in the ability to rapidly and efficiently construct sequence-ready maps as sequencing templates. This is particularly problematic in regions with unusual genome structure. An understanding of these troublesome regions prior to genome-wide sequencing will provide quality assurance as well as reliable sequencing strategies in these regions.

This proposal will generate a “whole genome” data resource to enable rapid and reliable sequencing of genomic DNA by the definition and characterization of the more than 52 regions of high homology now known to be distributed within unrelated genomic regions and cloned in BACs and PACs. To do this, we will:

1. Define regions of true homology in the human genome by characterizing subsets of the 4,700 BAC/PACs that generate multiple hybridization signals using fluorescence in situ hybridization (FISH). Of the 1,200 sites of multiple signals, more than 52 regions contain repeats as defined by 600 BAC/PACs. The chimerism rate, multiple clone wells, and chromosome of origin will be defined by re-streaking each clone, followed by fingerprint, FISH and PCR-based end-sequence analyses on hybrid panels and radiation hybrids.

Data will be shared with large sequencing efforts, deposited in the 4D database, and made available with annotation on an ftp server and through GDB.

2. Generate contigs of BACs and PACs in regions of complex genome organization. Using STS, EST analyses, fingerprinting, BAC/PAC to BAC/PAC Southerns, end sequence walking in 3.5-20X libraries, and metaphase/interphase FISH, contigs will be seeded in 2-5 of the regions of known genome complexity, each of which is estimated as 2-5 Mb. These data will be used to evaluate and provide independent quality assurance of the STS, radiation hybrid, and genetic maps in these regions. The most significant of these include 1p36/1q; 2p/q; multiple sites; 8p23 and 8 further sites; 9p/q.

3. Define additional regions of complex genomic structure. Library screening using known members of multiple-member retrotransposon and other known repeated sequences defined by the NCBI database, followed by FISH analyses to determine structure and potential large regions of associated homologies.

Collaboration with other genome and sequencing centers will provide quality control in the generation of sequence-ready maps for sequencing templates.

We believe that this effort is important since 1) it will provide a critical mapping tool necessary for the generation of sequence-ready maps; 2) if initiated now, the problem areas could be delineated before scale-ups to full production occur in major genome centers; 3) it represents a modest cost, such that the cost of these data would comprise only a small fraction of the cost of the entire genome sequence and would vastly decrease the cost of sequencing errors; and 4) it could be completed in a short time (2 to 3 years) so as to be of maximum benefit to sequencing centers. The Principal Investigator in this project is ideally suited for this effort because the group has developed the technology and initiated FISH and genome analyses of over 4000 clones.

We believe that this project represents a critical and timely effort to enable rapid and cost-effective human genome sequencing.

Subcontract under Glen Evans’ DOE Grant No. DE-FC03-96ER62294.

Mapping and Sequencing of the Human X Chromosome

D. L. Nelson, E.E. Eichler, B.A. Firulli, Y. Gu, J. Wu, E. Brundage, A.C. Chinault, M. Graves, A. Arenson, R. Smith, E.J. Roth, H.Y. Zoghbi, Y. Shen, M.A. Wentland, D.M. Muzny, J. Lu, K. Timms, M. Metzger, and R.A. Gibbs
Department of Molecular and Human Genetics and Human Genome Center; Baylor College of Medicine; Houston, TX 77030
713/798-4787, Fax: -6370 or -5386, [email protected]://www.bcm.tmc.edu/molgen

The human X chromosome is significant from both medical and evolutionary perspectives. It is the location of several hundred genes involved in human genetic disease, and has maintained synteny among mammals; both of these aspects are due to its role in sex determination and the haploid nature of the chromosome in males. We have addressed the mapping of this chromosome through a number of efforts, ranging from long-range YAC-based mapping to genomic sequence determination.

YAC mapping. The YAC-based map of the X is essentially complete. We have constructed a 40 Mb physical map of the Xp22.3-Xp21.3 region, spanning an interval from the pseudoautosomal boundary (PABX) to the Duchenne muscular dystrophy gene. This region is highly annotated, with 85 breakpoints defining 53 deletion intervals, 175 STSs (20 of which are highly polymorphic), and 19 genes.

Cosmid binning. The YAC-based physical map is being used in a systematic effort to identify and sort cosmids prepared at LLNL from flow-sorted X chromosomes into intervals. Gene identification through use of a common database for cDNA pool hybridization data is continuing. Over 50 YACs have been utilized as probes to the gridded cosmid arrays. These have identified over 9000 cosmids from the 24,000-member library. An additional 4000 cosmids have been identified using a variety of probes, with the bulk coming from cDNA pool probes. More recent emphasis has been placed on BAC clones as their identity for sequencing has been established. These have been identified using the usual methods.

Cosmid contig construction. Creation of long-range continuity in cosmids and BACs proceeds from clones identified by the YAC-based binning experiments. Identification of STS-carrying clones is carried out by a combined PCR/hybridization protocol, and adds to the specificity of the overlap data. Cosmids are grown and DNA is prepared by an Autogen robot. DNAs are digested and analyzed by the AB362 GeneScanner for collection of fingerprint data. The use of novel fluorescent dyes (BODIPY) in this application has increased signal strength markedly. End fragment detection is currently carried out with traditional Southern hybridization; however, additional dyes will permit detection without hybridization in the GeneScanner protocol. Data are transferred to a Sybase database and analyzed with ODS (J. Arnold, U. Georgia) software for overlap. ODS output is ported to GRAM (LANL) for map construction. A fully automated approach has yet to be achieved, but this goal is increasingly in reach.
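The idea behind fingerprint-based overlap detection can be illustrated with a small band-matching routine. This is a generic sketch, not the algorithm used by ODS or GRAM: the relative size tolerance, the greedy one-to-one matching, and the function names are assumptions made for illustration.

```python
def shared_bands(sizes_a, sizes_b, tolerance=0.02):
    """Count fragments of clone A that match an unused fragment of clone B.

    Two fragments match when their sizes differ by no more than the given
    fraction of the clone A fragment size. Each B fragment is used once.
    """
    remaining = sorted(sizes_b)
    shared = 0
    for size in sorted(sizes_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * size:
                shared += 1
                del remaining[i]
                break
    return shared


def overlap_score(sizes_a, sizes_b, tolerance=0.02):
    """Shared bands as a fraction of the smaller fingerprint."""
    smaller = min(len(sizes_a), len(sizes_b))
    return shared_bands(sizes_a, sizes_b, tolerance) / smaller if smaller else 0.0


# Hypothetical fragment sizes (bp) for two cosmids.
clone_a = [1200, 2500, 3100, 4800]
clone_b = [1210, 3090, 5200, 7000]
print(shared_bands(clone_a, clone_b), overlap_score(clone_a, clone_b))  # 2 0.5
```

A pair of clones with a high score would be a candidate overlap, to be confirmed by STS content or end-fragment data as described above.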

Sequencing. An independently funded project awarded to RAG seeks to develop long-range genomic sequence for ~2 Mb of the human X chromosome. In support of this project, cosmids have been constructed and isolated for the 1.6 Mb region between FRAXA and FRAXF in Xq27.3-Xq28. To date, the complete sequences of the regions surrounding the FMR1 and IDS genes have been determined (180 and 130 kb, respectively), along with an additional ~700 kb of the interval. This sequence has led to identification of the gene involved in FRAXE mental retardation. Additional sequence in Xq28 has been determined, including that of a cosmid containing the two genes, DXS1357E and a creatine transporter. This sequence has been duplicated to chromosome 16p11 in recent evolutionary history. Comparative sequence analysis reveals 94% sequence identity over 25 kb, and the presence of pentameric repeats which are likely to have mediated the duplication event. A number of technical advances in sequencing have been developed, including the use of BODIPY dyes in AB373 sequencing protocols, which has offered enhanced base calling due to reduced mobility shifting, improved single strand template protocols for much reduced cost, and streamlined informatics processes for assembly and annotation.

DOE Grant Nos. DE-FG05-92ER61401 and DE-FG03-94ER61830 and NIH Grant No. 5P30HG00210.

*Sequence-Specific Proteins Binding to the Repetitive Sequences of High Eukaryotic Genome

Olga Podgornaya, Ivan Lobov, Ivan Matveev, Dmitry Lukjanov, Natella Enukashvily, and Elena Bugaeva
Institute of Cytology; Russian Academy of Sciences; St. Petersburg 194064, Russia
Telephone and Fax: +7-812/[email protected]

Repetitive sequences make up most of the eukaryotic genome, but until the last few years there was little interest in their role. The situation changed when alpha-satellites in human and minor satellites in mouse became candidates for centromere function. A number of centromere-specific proteins are under investigation, but none seems to distinguish the centromeric functions of exact sequences among long arrays of tandemly repeated satellites. The proteins associated with those arrays are poorly known. We are trying to find out which proteins are involved in maintaining the heterochromatin structure of different types of repetitive sequences.

The major proportion of total genomic satellite DNA remains attached to the nuclear matrix (NM) after DNase I and high-salt treatment. We followed this association through the steps of NM preparation by in situ hybridization with the mouse satellite probe. Two mouse species were used: M. musculus and M. spretus. Both contain the same repertoire of satellite DNAs but in different amounts. In M. musculus the centromeric heterochromatin contains major satellite (MA) as the principal component. In M. spretus the minor satellite (MI) is predominant. To test the DNA-binding activity of the proteins after chromatography of the soluble NM proteins on cationic and anionic ion-exchange columns, gel shift assays were performed with a cloned dimer of MA and a trimer of MI. To produce antibodies, the DNA-protein complexes obtained from large-scale gel-shift assays were isolated and injected into a guinea pig.

The gel shift assay with column fractions from M. musculus NM and MA shows a ladder of complexes. The complexes could be competed out with an excess of MA DNA but not with the same amount of E. coli DNA. Antibodies from the immune serum caused a hypershift of the MA/NM protein complexes. Preimmune serum at the same dilution did not alter the mobility of the complexes. A combination of western and Southern blots allows us to conclude that a protein with a molecular weight of about 80 kD and some similarity to the intermediate filaments is responsible for the MA/NM interaction.

Specific DNA-binding activity to the MI has been tested after column fractionation of the M. spretus NM extract. A ladder of complexes can be competed out with an excess of unlabeled MI but not E. coli or MA DNA. MI contains the CENPB-box sequence, which is the binding site for the protein CENPB, one of the centromeric proteins. Fractions from the NM extract with MI-specific binding activity do not contain CENPB, as shown by western blotting with anti-CENPB antibodies.

The same kind of work is going on with human analogs of MA and MI sequences, using large clones of satellite and alpha-satellite DNA and nuclear matrices.

○ ○ ○ ○ ○ ○ ○

Mapping


Few satellite DNA-binding proteins have been isolated, none of them directly from the NM. Our long-term aim is to understand the role of these proteins in heterochromatin formation and in heterochromatin association with NM.

Extracts from hand-isolated nuclear envelopes from frog oocytes were tested for specific DNA-binding activity to (T2G4)116. A fragment of Tetrahymena telomere from a YAC plasmid was used as a labelled probe in a gel-shift assay. The DNA-protein complexes from the assay were cut out and injected into a guinea pig. The antibodies (AB) obtained stained one protein with an m.w. of about 70 kD in the nuclear envelope of the oocyte, nothing in the inner part of the oocyte, and 70 kD and 120 kD in the frog liver nuclei. The immunofluorescent AB stained fine patches on the oocyte nuclear envelope and a number of intranuclear spots in the frog blood cells.

The electron-microscope immuno-gold technique showed that the protein is localized on the outer surface of the oocyte nuclear envelope in cup-like structures. DNA-binding activity to the same sequence has been tested and found in the mouse nuclear matrix extracts. The activity could be eluted from the DEAE52 ion exchange column in 0.15 NaCl. The activity could be competed out with the fragment itself but not with E. coli DNA in the same amounts. AB stained a 70-kD protein in active fractions after ion exchange chromatography. In nuclear matrix preparations, the AB recognized a 120-kD protein as well. The AB caused a hypershift of the complexes in the gel shift assay. The AB has some affinity to the keratins. In the mouse 3T3 cell culture line the staining is intranuclear, with fine dots forming chains surrounding dark areas, which do not correspond to the nucleoli.

Similar results were observed when a mouse cell line was transformed with head- and tail-less human keratin constructs (Bader et al., 1991, J Cell Biol 115:1293). These results suggest that the nuclear proteins detected with the AB may be natural analogs of this artificial keratin construct. The pattern of staining did not resemble the picture of telomere-specific staining. Possibly the protein recognized the intragenomic (T2G4)2 sequence, which is present in 25% of murine GenBank sequences, rather than the telomere. We are going to do immunocytochemical investigations of frog and mouse development in order to determine the point when transcription of the 120-kD protein is initiated and the staining becomes intranuclear.

As a continuation of the previous project, the multiple alignment of all the Alu sequences from GenBank is in progress. We are also trying to obtain antibodies to the main Alu-binding proteins to find out how many proteins can bind to the Alu sequence.

DOE Grant No. OR00033-93C1S014.

*Protein-Binding DNA Sequences

O.L. Polanovsky, A.G. Stepchenko, and N.N. Luchina
Engelhardt Institute of Molecular Biology; Russian Academy of Sciences; Moscow 117984, Russia
Fax: +7-095/135-1405, [email protected]

The POU domain of the Oct-2 transcription factor binds the octamer sequence ATGCAAAT and a number of degenerate sequences. It has been shown that the POUs and POUh domains recognize the left and right parts of the oct-sequence, respectively. The recognized sequences partly overlap in the native octamer. In the degenerate recognition sites these core sequences may be separated by a spacer of up to four nucleotides. The data obtained changed our view of the number and structure of potential targets recognized on DNA by POU proteins.
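The spacer rule described above can be illustrated with a small pattern search. This is only a sketch under stated assumptions: the abstract does not give the exact core sequences recognized by POUs and POUh (and notes that they partly overlap in the native octamer), so the half-sites `ATGC` and `AAAT` used here are hypothetical placeholders obtained by simply splitting ATGCAAAT in two. The function name and parameters are likewise illustrative.

```python
import re

def find_split_sites(seq, left="ATGC", right="AAAT", max_spacer=4):
    """Find left/right half-sites separated by 0..max_spacer nucleotides.

    `left` and `right` are hypothetical half-sites (assumption, not from
    the abstract). Returns (start, spacer_length, matched_site) tuples,
    one per start position, preferring the shortest spacer.
    """
    # Lookahead lets sites at overlapping positions be reported;
    # the lazy quantifier picks the shortest spacer for each start.
    pattern = re.compile(
        rf"(?=({re.escape(left)})([ACGT]{{0,{max_spacer}}}?)({re.escape(right)}))"
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        site = m.group(1) + m.group(2) + m.group(3)
        hits.append((m.start(), len(m.group(2)), site))
    return hits

if __name__ == "__main__":
    print(find_split_sites("GGATGCAAATCC"))   # native octamer, spacer 0
    print(find_split_sites("ATGCGGAAAT"))     # two-nucleotide spacer
    print(find_split_sites("ATGCGGGGGAAAT"))  # five-nucleotide spacer: no hit
```

Passing different `left` and `right` strings is the intended way to try other candidate core sequences; the defaults carry no experimental weight.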

Protein-DNA binding results from the interaction of conserved amino acid residues with a DNA target. In POU proteins the amino acid residues at positions 47 (Val), 50 (Cys) and 51 (Asn) of the POUh domain are absolutely conserved. To examine a possible role of Val47, we substituted this residue with each of the 19 other amino acid residues and investigated the interaction of the mutant proteins with the homeospecific site and its variants (ATAANNN) and with the oct sequence. The Ile47 mutant retains the affinity and specificity. Replacement of Val by Ser, Thr or His partially reduces the affinity. The Asn47 mutant sharply relaxes the specificity of protein-DNA recognition. Mutations at position 47 have much stronger effects on binding to homeospecific sites than to octamer motifs. Our data indicate that there is no simple mono-letter code of protein/DNA recognition. This recognition is determined not only by the nature of the radicals involved in the contact but also by the structure of the DNA-binding domain as a whole and probably by cooperative interaction of the POUs and POUh domains.

Proposals for 1997. The role of Cys50 in POU domain/DNA recognition will be investigated. This residue is absolutely conserved in POU proteins but is variable in related homeo-proteins. Our preliminary data suggest that the residue at position 50 of the POU homeodomain has a key role in discrimination between TAAT-like and octamer sequences. The role of the nucleotides flanking the DNA target will also be investigated.


Relevant Publications
1. Stepchenko A.G. (1994) Noncanonical oct-sequences are targets for mouse Oct-2B transcription factor. FEBS Letters, V.337, P.175-178.
2. Stepchenko A.G., Polanovsky O.L. (1996) Interaction of Oct proteins with DNA. Molecular Biology, V.30, P.296-302.
3. Stepchenko A.G., Luchina N.N., Polanovsky O.L. The role of conservative Val47 for POU homeodomain/DNA recognition. FEBS Letters, in press.



*Development of Intracellular Flow Karyotype Analysis

V.V. Zenin,1 N.D. Aksenov,1 A.N. Shatrova,1 N.V. Klopov,2 L.S. Cram,3 and A.I. Poletaev
Engelhardt Institute of Molecular Biology; Russian Academy of Sciences; Moscow 117984, Russia
Poletaev: +7-095/135-9824, Fax: [email protected]
1Institute of Cytology; Russian Academy of Sciences; St. Petersburg, Russia
2St. Petersburg Institute of Nuclear Physics; Gatchina, Russia
3Los Alamos National Laboratory; Los Alamos, NM 87545

Instrumentation for univariate fluorescent flow analysis of chromosome sets has been developed for human cells. A new method of cell preparation and intracellular staining of chromosomes with different dyes was developed and improved. The cell suspension for flow analysis must satisfy the following requirements: a minimal amount of free chromosomes and debris (dead cells, cell fragments, etc.); chromosome structure must be stabilized inside mitotic cells; chromosomes must be stained inside the cells to saturation with the dyes used; and chromosomes must be releasable from cells with the least possible mechanical treatment. The method includes enzyme treatment (chymotrypsin), incubation with saponin, and separation of prestained cells from debris on a sucrose gradient. The protocol was tested and improved over several months of work and allows us to obtain a well-stained sample with a minimal amount of contaminants [2].

A special magnetic mixing/stirring device was constructed to break cell membranes. It was placed inside the flow chamber of a serial flow cytometer ATC-3000 equipped with an additional electronic card for time-gated data acquisition [1]. The rupturing of prestained mitotic cells is performed by means of a small magnetic rod vibrating in an alternating magnetic field. The efficiency of mitotic cell breaking with the electromagnetic cell-breaking device was tested using different human cell lines [2,3].

The device works in a stepwise mode: a defined volume of sample is delivered to the breaking chamber for rupturing a mitotic cell (or cells) for a defined time period, followed by a buffer wash to move the released chromosomes from the breaking chamber to the point of analysis. The information about the chromosomes appearing at the point of analysis is accumulated in list mode files, making it possible to resolve chromosome sets arising from single cells on the basis of time gating. The concentration of cells in the sample must be kept low to ensure that only one cell at a time enters the breaking device.

The software developed classifies chromosome sets according to different criteria: total number of chromosomes, overall DNA content in the set, and the number of chromosomes of a certain type [2,3]. In addition, it is possible to determine the presence of extra chromosomes or the loss of chromosome types. Thus this approach combines the high performance of flow cytometry (quantitation and high throughput) with the advantages of image analysis (cell-to-cell karyotype analysis and the skills of a trained cytogeneticist). The data analysis capabilities offer extensive flexibility in determining important features of the karyotypes under study. This development offers the potential to duplicate most of what is determined by clinical cytogeneticists. The results obtained so far are in good accordance with the goals of the project formulated earlier [4].
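The time-gating and classification steps described above can be sketched as follows. This is a minimal illustration, not the authors' software: the event format (arrival time, fluorescence signal), the gap threshold `max_gap`, and the function names are all assumptions introduced here. Events that arrive close together in time are taken to come from one ruptured cell, and each resulting set is summarized by chromosome count and total signal (a stand-in for overall DNA content).

```python
from typing import Dict, List, Tuple

# (arrival time, fluorescence signal); units are arbitrary in this sketch
Event = Tuple[float, float]

def gate_by_time(events: List[Event], max_gap: float) -> List[List[Event]]:
    """Group list-mode events into sets separated by gaps longer than max_gap."""
    sets: List[List[Event]] = []
    for t, signal in sorted(events):
        if sets and t - sets[-1][-1][0] <= max_gap:
            sets[-1].append((t, signal))   # same burst: same cell
        else:
            sets.append([(t, signal)])     # long pause: a new cell
    return sets

def summarize(chrom_set: List[Event], expected_count: int = 46) -> Dict[str, object]:
    """Report chromosome count, total signal, and whether the count is as expected."""
    count = len(chrom_set)
    total = sum(signal for _, signal in chrom_set)
    return {
        "count": count,
        "total_signal": total,
        "count_matches": count == expected_count,
    }

if __name__ == "__main__":
    demo = [(0.0, 1.0), (0.5, 2.0), (10.0, 1.5), (10.2, 1.0), (10.4, 0.5)]
    for s in gate_by_time(demo, max_gap=1.0):
        print(summarize(s, expected_count=3))
```

The default `expected_count` of 46 reflects a normal human diploid set; a set with a different count would be flagged for closer inspection rather than classified automatically.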


References
[1]. V.V. Zenin, N.D. Aksenov, A.N. Shatrova, Y.V. Kravatsky, A. Kuznetzova, L.S. Cram, A.I. Poletaev. “Time-gated human chromosome flow analysis” XVII Congress of the International Society for Analytical Cytology, 1994, Lake Placid, USA, Cytometry Supplement 7, p. 68.

[2]. V.V. Zenin, N.D. Aksenov, A.N. Shatrova, Y.V. Kravatsky, A. Kuznetsova, L.S. Cram, A.I. Poletaev: “Time-gated flow analysis of human chromosomes”; DOE Human Genome Program, Contractor-Grantee Workshop IV, November 13-17, 1994; Santa Fe, New Mexico, p. 13.

[3]. V.V. Zenin, N.D. Aksenov, A.N. Shatrova, N.V. Klopov, L.S. Cram, A.I. Poletaev: “Cell by cell flow analysis of human chromosome sets”; DOE Human Genome Program, Contractor-Grantee Workshop V, January 28-February 1, 1996; Santa Fe, New Mexico, p. 112.

[4]. Andrei I. Poletaev, Sergei I. Stepanov, Valeri V. Zenin, Nikolay Aksenov, Tatijana V. Nasedkina and Yuri V. Kravazky: “Development of Intracellular Flow Karyotype Analysis”; DOE Human Genome, 1993 Program Report, p. 34-35.

Mapping and Sequencing with BACs and Fosmids

Ung-Jin Kim, Hiroaki Shizuya, and Melvin I. Simon
Division of Biology; California Institute of Technology; Pasadena, CA 91125
Kim: 818/395-4901, Fax: /796-7066, [email protected]: 818/395-3944, Fax: /[email protected]://www.tree.caltech.edu

BACs and fosmids are stable, nonchimeric, and highly representative cloning systems. BACs maintain large-fragment genomic inserts (100 to 300 kb) that are easily prepared for most types of experiments, including DNA sequencing.

We have improved the methods for generating BACs and developed extensive BAC libraries. We have constructed human BAC libraries with more than 175,000 clones from male fibroblast and sperm, and a mouse BAC library with more than 200,000 clones. We are currently expanding the human library, using sperm samples from anonymous donors, with the aim of achieving a total of 50X coverage of the human genome.



The BAC libraries provide resources to bridge the gap between genetic-cytogenetic information and detailed physical characteristics of genomic regions that include DNA sequence information. They also provide reliable tools for generating a high-resolution, integrated map on which a variety of information and resources are correlated. Using primarily the human BAC library constructed from fibroblasts, we have assembled a physical contig map of chromosome 22 [1]. First, the entire library was screened with most of the known chromosome 22-specific markers, which include cDNA, anonymous STS markers, FISH-mapped cosmids and fosmids, YAC-Alu PCR products, FISH-mapped BACs, and flow-sorted chromosome 22 DNA. The positive clones have been assembled into contigs by means of the STS content or other markers assigned to BAC clones. Most of the contigs were confirmed by using a restriction fingerprinting scheme originally developed by Sulston and Coulson and modified in our laboratory. Currently, the contigs cover over 80% of the chromosome arm. Various physical or genetic landmarks on this chromosome can now be precisely localized simply by assigning them to BACs or contigs on the map. Using BAC end sequence information from each of the chromosome 22-specific BACs, it is now possible to close the gaps efficiently by screening deeper BAC libraries with new probes specific to the ends of contigs.

The resulting BAC contig map is now serving as a road map for sequencing the chromosome. Chromosome 22-specific BAC clones have been distributed to our collaborators, including The Sanger Center and Dr. Bruce Roe at the University of Oklahoma, and many of the clones have already been sequenced. The BAC end sequencing scheme [2] will play a crucial role in the complete sequencing of chromosome 22, and we are currently sequencing the ends of these BACs directly, using miniprepped BAC DNA as templates.


References
[1] Kim et al. (1996) A Bacterial Artificial Chromosome-based framework contig map of human chromosome 22q. Proc. Natl. Acad. Sci. USA v93 (13): pp6297-6301.

[2] Venter, C., Smith, H.O., and Hood, L. (1996) Nature 381: pp364-366.

Towards a Globally Integrated, Sequence-Ready BAC Map of the Human Genome

Ung-Jin Kim, Hiroaki Shizuya, and Melvin I. Simon
Division of Biology; California Institute of Technology; Pasadena, CA 91125
Kim: 818/395-4901, Fax: /796-7066, [email protected]: 818/395-3944, Fax: /[email protected]://www.tree.caltech.edu

BAC clones are ideal for genome analysis since they are non-chimeric, stably maintain large-fragment genomic inserts (100-300 kb) [1], and make it easy to prepare DNA samples for most types of experiments, including DNA sequencing [2]. We have improved the BAC cloning technique in the past years and constructed >20X human BAC libraries. As BACs are proving to be the most efficient reagents for large-scale genomic sequencing, we intend to increase the depth of the library to 50X genomic equivalence. Using the ESTs, especially the Unigenes that have been chromosomally assigned by other means such as Radiation Hybrid mapping and YAC-based STS content mapping, we plan to organize the BAC library into a mapped resource. The resulting BAC-EST framework map will provide a high-resolution EST (or gene) map and instant entry points for gene finding and large-scale genomic sequencing. We also intend to determine the end sequences of the BAC inserts from a significant number of the clones (at least 350,000 clones or 15X genomic equivalence) within two years [3]. All the BAC-EST mapping data and BAC end sequences will be made available via public databases and Web servers. The mapping data and end sequence information will dramatically facilitate the process of finding clones that extend the sequenced regions with minimal overlaps. Thus, the tagged BAC libraries will serve as a reliable and facile sequence-ready resource and an organizing tool to support and coordinate multiple simultaneous sequencing projects all over the genome.
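The relationship between clone number and genomic equivalence quoted above can be checked with simple arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the abstract: a haploid genome size of 3 x 10^9 bp and a mean insert size of 130 kb (chosen from within the 100-300 kb range given for BACs). The last function uses the standard Poisson approximation for the chance that a given locus is absent from a random library; all function names are hypothetical.

```python
import math

GENOME_BP = 3.0e9  # assumed haploid human genome size (not from the abstract)

def genomic_equivalents(n_clones, mean_insert_bp, genome_bp=GENOME_BP):
    """Library depth: total cloned bases divided by genome size."""
    return n_clones * mean_insert_bp / genome_bp

def clones_needed(coverage, mean_insert_bp, genome_bp=GENOME_BP):
    """Smallest number of clones that reaches the requested depth."""
    return math.ceil(coverage * genome_bp / mean_insert_bp)

def prob_locus_missed(coverage):
    """Poisson estimate of the probability that a locus is in no clone."""
    return math.exp(-coverage)

if __name__ == "__main__":
    insert_bp = 130_000  # assumed mean insert size
    print(round(genomic_equivalents(350_000, insert_bp), 2))
    print(clones_needed(50, insert_bp))
    print(prob_locus_missed(15))
```

Under these assumptions, 350,000 clones correspond to roughly 15 genomic equivalents, which is consistent with the figure quoted in the abstract; a different mean insert size would shift the result proportionally.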


References
[1] Shizuya, H., Birren, B., Kim, U.-J., Mancino, V., Slepak, T., Tachiiri, Y., and Simon, M.I. (1992) Proc. Natl. Acad. Sci. USA 89, 8794-8797.

[2] Kim, U.-J., Birren, B.W., Yu-Ling Sheng, Tatiana Slepak, Valena Mancino, Cecilie Boysen, Hyung-Lyun Kang, Melvin I. Simon, and Hiroaki Shizuya. (1996) Genomics 34, 213-218.

[3] Venter, C., Smith, H.O., and Hood, L. (1996) Nature 381: pp364-366.

Generation of Normalized and Subtracted cDNA Libraries to Facilitate Gene Discovery

Marcelo Bento Soares, Maria de Fatima Bonaldo, Pierre Jelenc, and Susan Baumes
Department of Psychiatry; Columbia University; and The New York State Psychiatric Institute; New York, NY 10032
212/960-2313, Fax: /781-3577, [email protected]

Large-scale single-pass sequencing of cDNA clones randomly picked from libraries has proven quite powerful for identifying genes, and the use of normalized libraries, in which the frequency of all cDNAs is within a narrow range, has been shown to expedite the process by minimizing the redundant identification of the most prevalent mRNAs. In an attempt to contribute to the ongoing gene discovery efforts, we have further optimized our original procedure for construction of normalized directionally cloned cDNA libraries [1] and we have successfully applied it to generate a number of human cDNA libraries from a variety of adult and fetal tissues [2]. To date we have constructed libraries from infant brain, fetal brain, adult brain, fetal liver-spleen, full-term and 8-9 week placentae, adult breast, retina, ovary tumor, melanocytes, parathyroid tumor, senescent fibroblasts, pineal glands, multiple sclerosis plaques, testis, B cells, fetal heart, fetal lung, 8-9 week fetuses and pregnant uterus. Several additional libraries are currently in preparation. All libraries have been contributed to the IMAGE consortium, and they are being widely used for sequencing and mapping.

However, given the large-scale nature of the ongoing sequencing efforts and the fact that a significant fraction of the human genes has been identified already, the discovery of novel cDNAs is becoming increasingly challenging. In an effort to expedite this process further, in collaboration with Greg Lennon (LLNL) we have developed and applied subtractive hybridization strategies to eliminate pools of sequenced cDNAs from libraries yet to be surveyed. Briefly, single-stranded DNA obtained from pools of arrayed and sequenced I.M.A.G.E. clones is used as template for PCR amplification of cDNA inserts with flanking T7 and T3 primers. PCR amplification products are then used as drivers in hybridizations with normalized libraries in the form of single-stranded circles. The remaining single-stranded circles (subtracted library) are purified by hydroxyapatite chromatography, converted to double-stranded circles and electroporated into bacteria. Preliminary characterization of a subtracted fetal liver-spleen library indicates that the procedure is effective in enhancing the representation of novel cDNAs.

In an effort to enhance the representation of full-length cDNAs in our libraries, as we strive towards our final objective of generating full-length normalized cDNA libraries, we have adapted our normalization protocol to take advantage of the fact that it is now possible to produce single-stranded circles in vitro by sequentially digesting supercoiled plasmids with Gene II protein and Exonuclease III (Life Technologies). This has proven significant because it circumvents the biases introduced by differential growth of clones containing small and large cDNA inserts when single strands are produced in vivo upon superinfection with a helper phage.


References
[1] Soares, M.B., Bonaldo, M.F., Su, L., Lawton, L. & Efstratiadis, A. (1994). Construction and characterization of a normalized cDNA library. Proc. Natl. Acad. Sci. USA 91(20), 9228-9232.

[2] Bonaldo, M.F., Lennon, G. and Soares, M.B. (1996). Normalization and subtraction: Two approaches to facilitate gene discovery. Genome Research 6, 791-806.

Mapping in Man-Mouse Homology Regions

Lisa Stubbs, Johannah Doyle, Ethan Carver, Mark Shannon, Joomyeong Kim, Linda Ashworth,1 and Elbert Branscomb1
Biology Division; Oak Ridge National Laboratory; Oak Ridge, TN 37831
423/574-0854, Fax: -1283, [email protected] [email protected]
1Human Genome Center; Lawrence Livermore National Laboratory; Livermore, CA 94550

Numerous studies have confirmed the notion that mouse and human chromosomes resemble each other closely within blocks of syntenic homology that vary widely in size, containing from just a few to several hundred related genes. Within the best-mapped of these homologous regions, the presence and location of specific genes can be accurately predicted in one species, based upon the mapping results obtained in the other. In addition, information regarding gene function derived from the analysis of human hereditary traits or mapped murine mutations can also be extrapolated from one species to another. However, syntenic relationships are still not established for many human regions, and local rearrangements, including apparent deletions, inversions, insertions, and transposition events, complicate most of the syntenically homologous regions that appear simple on the gross genetic level. Because of these complications, the power of prediction afforded in any homology region increases tremendously with the level of resolution and degree of internal consistency associated with a particular set of comparative mapping data. Our groups have been interested in further defining the borders of syntenic linkage groups in human and mouse, in elucidating mechanisms behind evolutionary rearrangements that distinguish chromosomes of mammalian species, and in devising means of exploiting the relationships between the two genomes for the discovery and analysis of new genes and other functional units in mouse and man.

One of the larger contiguous blocks of mouse-human genomic homology includes the proximal portion of mouse chromosome 7 (Mmu7). Detailed analysis of this large region of mouse-human homology has served as the initial focus of these collaborative studies. Our results have shown that gene content, order and spacing are remarkably well-conserved throughout the length of this approximately 23 cM/29 Mb region of mouse-human homology, except for six internal rearrangements of gene sequence in mouse relative to man. One of these differences involves a small segment of H19q13.4 genes whose murine counterparts have been transposed out of the large Mmu7/H19q conserved synteny region into a separate linkage group located on mouse chromosome 17. The six internal rearrangements, including two transpositions and four local inversions, are clustered together at two sites; our data suggest that the rearrangements occurred in a coincident fashion, or were commonly associated with unstable DNA sequences at those sites. Interestingly, both rearranged regions are occupied by large tandemly clustered gene families, suggesting that these locally repeated sequences may have contributed to their evolutionary instability. The structure and conserved functions of genes within these and other clustered gene families located on H19 also represent an active line of interest to our group. More recently, we have extended mapping studies to include clustered gene families located in other chromosomal regions, and are working to define the borders of mouse-human syntenic segments on a broader, genome-wide scale.

DOE Contract No. DE-AC05-96OR22464 and Contract No. W-7405-ENG-48 with Lawrence Livermore National Laboratory.

Positional Cloning of Murine Genes

Lisa Stubbs, Cymbeline Culiat, Ethan Carver, Johannah Doyle, Laura Chittenden, Mitchell Walkowicz, Nestor Cacheiro, Greg Lennon,1 Gary Wright,2 Joe Rutledge,3 Robert Nicholls,4 and Walderico Generoso
Biology Division; Oak Ridge National Laboratory; Oak Ridge, TN 37831-8077
423/574-0854, Fax: -1283, [email protected] [email protected]
1Human Genome Center; Lawrence Livermore National Laboratory; Livermore, CA 94550
2University of Texas Southwestern Medical Center at Dallas; Dallas, TX 75235
3Children’s Hospital and Medical Center; University of Washington School of Medicine; Seattle, WA 98105
4Department of Genetics; Case Western Reserve University; Cleveland, Ohio

Chromosome rearrangements, notably deletions and translocations, have proved invaluable as tools in the mapping and molecular cloning of acquired and inherited human diseases. Because balanced translocations are cytologically visible, and generally produce profound disturbances in both gene expression and DNA structure without necessarily disturbing the structure of multiple genes, this type of mutation provides an especially valuable “tag” that greatly simplifies mapping, cloning, and assessment of candidate genes associated with a disease. Although balanced translocations are relatively rare in human populations, they are readily induced in the mouse. Using various mutagenesis protocols, we have generated numerous translocation-bearing mutant mouse strains that display an impressive variety of health-related anomalies, including obesity, polycystic kidneys, gastrointestinal disorders, limb and skeletal deformities, neural tube defects, ataxias, tremors, hereditary deafness and blindness, reproductive dysfunction, and complex behavioral defects. The ability to map the genes associated with translocation breakpoints cytogenetically, first crudely through straightforward banding techniques and then to a higher level of resolution using fluorescence in situ hybridization methods, allows us to avoid the costly and time-consuming crosses that are required for the mapping of most mutant genes. With this rapidly obtained, crude-level mapping information available, we can readily assess possible relationships between newly arising mutant phenotypes and linked candidate genes or related diseases that map to homologous regions of the human genome. Using this approach, we have recently begun to define the map positions of several mutations. Mapping results have led us to the identification of candidate genes for two mutations: one associated with congenital deafness and predisposition to severe gastric ulcers, and another associated with late-onset obesity. So far, we have characterized in detail only a fraction of the mouse strains that comprise this valuable, recently generated mutant collection. As an integral part of this program, we are actively exploring new strategies and integrating information, technology and resources derived from the Human Genome research effort that promise to increase the efficiency of breakpoint mapping and cloning dramatically. The mutations are scattered widely throughout the mouse genome, corresponding to a broad selection of human homology regions. As new breakpoints are mapped, and large numbers of newly sequenced cDNA clones are assigned to the mouse and human maps, the potential for rapid association between cloned gene and mapped mutation will increase dramatically. This large collection of murine translocation mutants therefore represents a powerful resource for linking mapped cDNA clones to health-related phenotypes throughout the genome.

In addition to the analysis of translocation mutants, we have also characterized other types of mouse mutations, including: (1) tottering and leaner, allelic mutations associated with ataxia and epilepsy in mice, and representing murine models for the human diseases familial hemiplegic migraine and episodic ataxia, respectively; and (2) jdf2, a locus associated with mutations causing runting, neuromuscular tremors and male sterility, which is located in a mouse region related to the Prader-Willi/Angelman syndrome gene interval of human 15q11-q13. Both sets of mutations affect large, complex, and highly conserved genes, and provide important animal models for the exploration of the diverse roles their human counterparts may play in human disease. In concert with these gene cloning studies, we have been involved in exploring new means of exploiting mouse-human genomic conservation in the isolation of functionally significant sequences from large cloned regions of human DNA. The methods we have developed hold great promise as an efficient tool for gene discovery in cloned genomic regions.




Human Artificial Episomal Chromosomes (HAECS) for Building Large Genomic Libraries

Min Wang, Panayotis A. Ioannou,2 Michael Grosz, Subrata Banerjee, Evy Bashiardes,2 Michelle Rider, Tian-Qiang Sun,1 and Jean-Michel H. Vos1
Lineberger Comprehensive Cancer Center and 1Department of Biochemistry and Biophysics; University of North Carolina; Chapel Hill, NC 27599
Vos: 919/966-3036, Fax: -3015, [email protected]
2The Cyprus Institute of Neurology and Genetics; Nicosia, Cyprus

Of some 100,000 human genes, only a few thousand have been cloned, mapped or sequenced so far. Much less is known about other chromosomal regions such as those involved in DNA replication, chromatin packaging, and chromosome segregation. Construction of detailed physical maps is only the first step in localizing, identifying and determining the function of genetic units in human cells. Studying human gene function and regulation of other critical genomic regions that span hundreds of kilobase pairs of DNA requires the ability to clone an entire functional unit as a single DNA fragment and transfer it stably into human cells.

We have developed a human artificial episomal chromosome (HAEC) system based on the latent replication origin of the large herpes Epstein-Barr virus (EBV) for the propagation and stable maintenance of DNA as circular minichromosomes in human cells.[1,2] Individual HAECS carried human genomic inserts ranging from 60 to 330 kb and appeared genetically stable. An HAEC library of 1500 independent clones carrying random human genomic fragments with average sizes of 150 to 200 kb was established and allowed recovery of the HAEC DNA. This autologous HAEC system, with human DNA segments directly cloned in human cells, provides an important tool for functional study of large mammalian DNA regions and gene therapy.[3,4]

Current efforts are focused on (a) shuttling large BAC/PAC genomic inserts in human and rodent cells, (b) packaging BAC/PAC/HAEC clones as large infectious herpes viruses for shuttling genomic inserts between mammalian cells, and (c) constructing bacterial-based human and rodent HAEC libraries. (a) We have designed a “pop-in” vector, which can be inserted into current BAC- or PAC-based clones via site-specific integration. This “CRE-LOXP”-mediated system has been used to establish BACs/PACs up to 250 kb in size in human cells as HAECs. (b) We have obtained packaging of 160-180 kb exogenous DNA into infectious virions using the human lymphotropic Epstein-Barr virus. After delivery into human beta-lymphoblast cells, the HAEC DNA was stably established as 160-180 kb functional, autonomously replicating episomes.[5,7] We have also generated a hybrid BAC/HAEC vector, which can shuttle large DNA inserts, i.e., at least up to 260 kb, between bacteria and human cells. Such a system is being used to develop large-insert libraries, whose clones can be directly transferred into human or rodent cells for functional analysis. These HAEC-derived systems will provide useful molecular tools to study large genetic units in humans and rodents, and complement the functional interpretation of current sequencing efforts.

DOE Contract No. DE-FG05-91ER61135.

References

[1] Sun, T.-Q., Fenstermacher, D. & Vos, J.-M.H. Human artificial episomal chromosomes for cloning large DNA in human cells. Nature Genet. 8, 33-41 (1994).

[2] Sun, T.-Q. & Vos, J.-M.H. Engineering of 100-300 kb of DNA as persisting extrachromosomal elements in human cells using the HAEC system in Methods Molec. Genet. (ed. Adolph, K.W.) (Academic Press, San Diego, CA, 1995).

[3] Vos, J.-M.H. Herpes viruses as Genetic Vectors in Viruses in Human Gene Therapy (ed. Vos, J.-M.H.) 109-140 (Carolina Academic Press & Chapman & Hall, Durham N.C., USA & London, UK, 1995).

[4] Kelleher, Z. & Vos, J.-M. Long-Term Episomal Gene Delivery in Human Lymphoid Cells using Human and Avian Adenoviral-assisted Transfection. Biotechniques 17, 1110-1117 (1994).

[5] Banerjee, S., Livanos, E. & Vos, J.-M.H. Therapeutic Gene Delivery in Human beta-lymphocytes with Engineered Epstein-Barr Virus. Nature Medicine 1, 1303–1308 (1995).

[6] Sun, T.-Q., Livanos, E., & Vos, J.-M.H. Engineering a mini-herpesvirus as a general strategy to transduce up to 180 kb of functional self-replicating human mini-chromosomes. Gene Therapy 3, 1081–1088 (1996).

[7] Wang, S. & Vos, J.-M.H. An HSV/EBV based vector for High Efficient Gene Transfer to Human Cells in vitro/in vivo. J. Virol. 70, 8422–8430 (1996).

*Cosmid and cDNA Map of a Human Chromosome 13q14 Region Frequently Lost at B Cell Chronic Lymphocytic Leukemia

N.K. Yankovsky, B.I. Kapanadze, A.B. Semov, A.V. Baranova, and G.E. Sulimova
N.I. Vavilov Institute of General Genetics; Moscow 117809, Russia
+7-095/135-5363, Fax: -1289, [email protected] [email protected] (send to both addresses)

We are mapping a human chromosome 13q14 region frequently lost in the human blood malignancy called B-cell chronic lymphocytic leukemia (BCLL). The final goal of the project is to find a putative oncosuppressor gene lost in the region in BCLL. We have constructed a cosmid contig between the D13S1168 and D13S25 loci in the region. The interval had been shown to be in the center of the BCLL-associated deletions. The contig consists of more than 100 cosmids from the LANL human chromosome 13-specific library (LA13NC01). We estimated the distance between the D13S1168 and D13S25 loci as about 540 kb. We are constructing a transcriptional map of the region. Seven different cDNA clones were found with two of the cosmid clones. All cosmids corresponding to the minimal tiling path between D13S1168 and D13S25 are being used as probes for screening new cDNA clones. I.M.A.G.E. Consortium (LLNL) cDNA clones assigned to 13q14 will be mapped against the cosmid contig. Mapped cDNA clones will be checked as candidate oncosuppressor genes for BCLL.
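Selecting a minimal tiling path can be viewed as an interval-cover problem. The following is a minimal Python sketch, assuming each cosmid has already been placed on the contig as a (name, start, end) interval in kb. The clone names and coordinates are hypothetical and are not taken from the LA13NC01 library, and the greedy strategy shown is a standard technique rather than the procedure used in this project.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover: fewest clones spanning [region_start, region_end].

    clones: list of (name, start, end) tuples in kb (hypothetical data).
    Returns the chosen clone names in map order; raises ValueError on a gap.
    """
    ordered = sorted(clones, key=lambda c: c[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones that start at or before the covered point,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in contig at {covered_to} kb")
        path.append(best[0])
        covered_to = best[2]
    return path


# Hypothetical cosmid placements (kb) across a 130-kb interval.
example_clones = [("A", 0, 40), ("B", 30, 70), ("C", 35, 100), ("D", 90, 130)]
print(minimal_tiling_path(example_clones, 0, 130))
```

In this toy data set clone B is redundant because clone C overlaps A and extends further, so only A, C, and D would be needed as probes.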



BCM Server Core

Daniel Davison and Randall Smith
Baylor College of Medicine; Houston, TX 77030
713/798-3738, Fax: -3759, [email protected]
http://www.bcm.tmc.edu

We are providing a variety of molecular biology-related search and analysis services to Genome Program investigators to improve the identification of new genes and their functions. These services are available via the BCM Search Launcher World Wide Web (WWW) pages, which are organized by function and provide a single point of entry for related searches. Pages are included for 1) protein sequence searches, 2) nucleic acid sequence searches, 3) multiple sequence alignments, 4) pairwise sequence alignments, 5) gene feature searches, 6) sequence utilities, and 7) protein secondary structure prediction. The Protein Sequence Search Page, for example, provides a single form for submitting sequences to WWW servers that provide remote access to a variety of different protein sequence search tools, including BLAST, FASTA, Smith-Waterman, BEAUTY, BLASTPAT, FASTAPAT, PROSITE, and BLOCKS searches. The BCM Search Launcher extends the functionality of other WWW services by adding hypertext links to results returned by remote servers. For example, links to the NCBI’s Entrez database and to the Sequence Retrieval System (SRS) are added to search results returned by the NCBI’s WWW BLAST server. These links provide easy access to Medline abstracts, links to related sequences, and additional information that can be extremely helpful when analyzing database search results. For novice or infrequent users of sequence database search tools, we have preset the parameter values to provide the most informative first-pass sequence analysis possible.

A batch client interface to the BCM Search Launcher for Unix and Macintosh computers has also been developed to allow multiple input sequences to be automatically searched as a background task, with the results returned as individual HTML documents directly on the user’s system. The BCM Search Launcher as well as the batch client are available on the WWW at URL http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html.
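The batch workflow described above (many input sequences, one HTML result document per sequence) can be sketched as follows. This is an illustration only: it does not contact the BCM Search Launcher, and the function names and the `placeholder_search` stand-in are assumptions introduced here, not part of the actual batch client.

```python
import html
import re
from pathlib import Path


def parse_fasta(text):
    """Split multi-FASTA text into (identifier, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else f"seq{len(records) + 1}"
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def placeholder_search(sequence):
    """Hypothetical stand-in for a remote search; returns a text report."""
    return f"length={len(sequence)}"


def run_batch(fasta_text, out_dir, search=placeholder_search):
    """Run `search` on every sequence and write one HTML file per sequence."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, seq in parse_fasta(fasta_text):
        report = search(seq)
        page = (
            f"<html><body><h1>{html.escape(name)}</h1>"
            f"<pre>{html.escape(report)}</pre></body></html>"
        )
        # Keep file names safe regardless of what the FASTA header contains.
        safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
        path = out_dir / f"{safe_name}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

A real client would replace `placeholder_search` with a call to the remote service and could run the loop in the background, as the abstract describes.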

The BCM/UH Server Core provides the necessary computational resources and continuing support infrastructure for the BCM Search Launcher. The BCM/UH Server Core is composed of three network servers and currently supports electronic mail and WWW-based access; ultimately, specialized client-server access will also be provided. The hardware used includes a 2048-processor MasPar massively parallel MIMD computer, a DEC Alpha AXP/OSF1, a Sun 2-processor SparcCenter 1000 server, and several Sun Sparc workstations.

In addition to grouping services available elsewhere on the WWW and providing access to services developed at BCM and UH, the BCM/UH Server Core will also provide access to services from developers who are unwilling or unable to provide their own Internet network servers.

Grant Nos.: DOE, DE-FG03-9SER62097/A000; National Library of Medicine, R01-LM05792; National Science Foundation, BIR 91-11695; National Research Service Award, F32-HG00133-01; NIH, P30-HG00210 and R01-HG00973-01.

A Freely Sharable Database-Management System Designed for Use in Component-Based, Modular Genome Informatics Systems†

Steve Rozen,1 Lincoln Stein,1 and Nathan Goodman
The Jackson Laboratory; Bar Harbor, ME 04609
Goodman: 207/288-6158, Fax: -6078, [email protected]
1Whitehead Institute for Biomedical Research; Cambridge, MA 02139
http://goodman.jax.org
http://www.genome.wi.mit.edu/informatics/workflow

We are constructing a data-management component, built on top of commercial data-management products, tuned to the requirements of genome applications. The core of this genome data manager is designed to:

• support the semantic and object-oriented data models that have been widely embraced for representing genome data,

• provide domain-specific built-in types and operations for storing and querying biomolecular sequences,

• provide built-in support for tracking laboratory workflows, and admit further extensions for other special-purpose types,

• allow core facilities to be readily extended to meet the diverse needs of biological applications.

The core data manager is being constructed on top of Sybase, Oracle, and Informix Universal Server. The software is available free of charge and is freely redistributable.
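The idea of layering domain-specific sequence operations on a general-purpose relational engine can be illustrated with a small sketch. It uses Python's built-in sqlite3 module rather than Sybase, Oracle, or Informix, and the table name, column names, and SQL function names (`revcomp`, `gc_fraction`) are hypothetical choices made for this example, not part of the data manager described above.

```python
import sqlite3

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C residues in a DNA string."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def open_sequence_db():
    """In-memory database with sequence-aware SQL functions registered."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("revcomp", 1, reverse_complement)
    conn.create_function("gc_fraction", 1, gc_fraction)
    conn.execute(
        "CREATE TABLE dna_sequence (name TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    return conn


conn = open_sequence_db()
conn.executemany(
    "INSERT INTO dna_sequence VALUES (?, ?)",
    [("clone1", "ATGCGC"), ("clone2", "ATATAT")],
)
# Query with the domain-specific operations directly in SQL.
rows = conn.execute(
    "SELECT name, revcomp(residues) FROM dna_sequence "
    "WHERE gc_fraction(residues) > 0.5"
).fetchall()
print(rows)
```

The design point is that applications query sequences through the database's own query language instead of fetching raw strings and processing them externally.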

We will be reporting progress on the core data manager’s architecture and interface at the URLs above, and we solicit comments on its design.


†Originally called Database Management Research for the Human Genome Project, this project was initiated in 1995 at the Massachusetts Institute of Technology–Whitehead Institute.

○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○ ○

Informatics



A Software Environment for Large-Scale Sequencing

Mark Graves
Department of Cell Biology; Baylor College of Medicine; Houston, TX 77030
713/798-8271, Fax: -3759; [email protected]
http://www.bcm.tmc.edu
http://stork.bcm.tmc.edu/gfp

Our approach is to implement software systems that manage primary laboratory sequence data and explore and annotate functional information in genome sequence and gene products.

Three software systems have been developed and are being used: two sequence data managers that use different sequence assembly packages, FAK and Phrap, and a series of analysis and annotation tools that are available via the Internet. In addition, we have developed a prototype application for data mining of sequence data as it relates to metabolic pathways.

Products of this project are the following:

1. GRM: a sequence reconstruction manager using the FAK assembly engine (available since October 1995).

2. GFP: a sequence finishing support tool using the Phrap assembly engine (available since March 1996).

3. A series of gene recognition tools (available since early 1996).

4. A tool for visualizing metabolic pathways data and exploring sequence data related to metabolic pathways (prototype available since August 1996).


Generalized Hidden Markov Models for Genomic Sequence Analysis

David Haussler, Kevin Karplus,1 and Richard Hughey1

Computer Science Department and 1Computer Engineering Department; University of California; Santa Cruz, CA 95064
408/459-2105, Fax: -4829, [email protected]
http://www.cse.ucsc.edu/research/compbio
http://www-hgc.lbl.gov/projects/genie.html

We have developed an integrated probabilistic method for locating genes in human DNA based on a generalized hidden Markov model (HMM). Each state of a generalized HMM represents a particular kind of region in DNA, such as an initial exon for a gene. The states are connected by transitions that model sites in DNA between adjacent regions, e.g., splice sites. In the full HMM, parametric statistical models are estimated for each of the states and transitions. Generalized HMMs allow a variety of choices for these models, such as neural networks, high-order Markov models, etc. All that is required is that each model return a likelihood for the kind of region or transition it is supposed to model. These likelihoods are then combined by a dynamic programming method to compute the most likely annotation for a given DNA contig. Here the annotation simply consists of the locations of the transitions identified in the DNA, and the labeling of the regions between transitions with their corresponding states.
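The dynamic programming step described above can be sketched as a segment-based Viterbi recursion. The code below is a toy illustration under stated assumptions, not Genie's implementation: the two states, their emission tables, and the constant transition score are invented for the example, whereas a real generalized HMM would plug in trained region and site models at the `content_score` and `transition_score` hooks.

```python
import math


def ghmm_decode(seq, states, content_score, transition_score, start_state, max_len=None):
    """Most likely segmentation of seq under a toy generalized HMM.

    content_score(state, seq, i, j): log-likelihood that seq[i:j] is one region of `state`.
    transition_score(prev, nxt, seq, pos): log-likelihood of a prev->nxt boundary at pos.
    Returns (best log score, [(state, start, end), ...]).
    """
    n = len(seq)
    max_len = max_len or n
    neg_inf = -math.inf
    # best[j][s]: best score of a parse of seq[:j] whose last region has state s.
    best = [{s: neg_inf for s in states} for _ in range(n + 1)]
    back = [{s: None for s in states} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for s in states:
            for i in range(max(0, j - max_len), j):
                emit = content_score(s, seq, i, j)
                if emit == neg_inf:
                    continue
                if i == 0:
                    # The first region must be in the designated start state.
                    candidates = [(emit, None)] if s == start_state else []
                else:
                    candidates = [
                        (best[i][p] + transition_score(p, s, seq, i) + emit, p)
                        for p in states
                        if p != s and best[i][p] > neg_inf
                    ]
                for score, prev in candidates:
                    if score > best[j][s]:
                        best[j][s] = score
                        back[j][s] = (i, prev)
    end_state = max(states, key=lambda s: best[n][s])
    if best[n][end_state] == neg_inf:
        raise ValueError("no valid parse")
    segments, j, s = [], n, end_state
    while j > 0:
        i, prev = back[j][s]
        segments.append((s, i, j))
        j, s = i, prev
    segments.reverse()
    return best[n][end_state], segments


# Toy models (assumed for illustration): exons are GC-rich, intergenic DNA is AT-rich.
EMIT = {
    "intergenic": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "exon": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def toy_content(state, seq, i, j):
    return sum(math.log(EMIT[state][base]) for base in seq[i:j])


def toy_transition(prev, nxt, seq, pos):
    return math.log(0.1)


score, annotation = ghmm_decode(
    "AAAAGCGCGCAAAA", ["intergenic", "exon"], toy_content, toy_transition, "intergenic"
)
print(annotation)
```

Because each region is scored as a whole segment, the content models are free to be arbitrary likelihood functions, which is the property that distinguishes a generalized HMM from a standard one.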

This method has been implemented in the genefinding program Genie, in collaboration with Frank Eeckman, Martin Reese, and Nomi Harris at Lawrence Berkeley Labs. David Kulp, at UCSC, has been responsible for the core implementation. Martin Reese developed the splice site models, promoter models, and datasets. You can access Genie at the second WWW address given above, submit sequences, and have them annotated. Nomi Harris has written a display tool called Genotater that displays Genie’s annotation along with the annotation of other genefinders, as well as the location of repetitive DNA, BLAST hits to the protein database, and other useful information. Papers and further information about Genie can be found at the first WWW address above. Since the ISMB ’96 paper, Genie’s exon models have been extended to explicitly incorporate BLAST and BLOCKS database hits into their probabilistic framework. This results in a substantial increase in gene prediction accuracy. Experimental results in tests using a standard set of annotated genes showed that Genie identified 95% of coding nucleotides correctly with a specificity of 88%, and 76% of exons were identified exactly.
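Accuracy figures of this kind can be computed as in the following sketch. It assumes the definitions commonly used when evaluating gene predictions (sensitivity = TP / (TP + FN); specificity = TP / (TP + FP), i.e., the fraction of predicted coding nucleotides that are truly coding); the abstract does not state which definitions were used, and the exon coordinates below are invented.

```python
def coding_positions(exons):
    """Expand half-open (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def prediction_accuracy(true_exons, predicted_exons):
    """Return (nucleotide sensitivity, nucleotide specificity, exact-exon fraction)."""
    truth = coding_positions(true_exons)
    predicted = coding_positions(predicted_exons)
    true_positive = len(truth & predicted)
    sensitivity = true_positive / len(truth) if truth else 0.0
    specificity = true_positive / len(predicted) if predicted else 0.0
    true_set = set(true_exons)
    exact = len(true_set & set(predicted_exons))
    exact_fraction = exact / len(true_set) if true_set else 0.0
    return sensitivity, specificity, exact_fraction


# Invented example: one exon predicted exactly, one with shifted boundaries.
sn, sp, exact = prediction_accuracy(
    true_exons=[(100, 200), (300, 400)],
    predicted_exons=[(100, 200), (310, 420)],
)
print(f"sensitivity={sn:.2f} specificity={sp:.2f} exact exons={exact:.2f}")
```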


Identification, Organization, and Analysis of Mammalian Repetitive DNA Information

Jerzy Jurka
Genetic Information Research Institute; Palo Alto, CA 94306
415/326-5588, Fax: -2001, [email protected]
http://charon.lpi.org

There are three major objectives in this project: organization of databases of mammalian repetitive sequences, development of specialized software for analysis of repetitive DNA, and sequence studies of new mammalian repeats.

Our approach is based on extensive use of computer tools to investigate and organize publicly available sequence information. We also pursue collaborative research with experimental laboratories. The results are widely disseminated via the Internet, peer-reviewed scientific publications, and personal interactions. Our most recent research concentrates on mechanisms of retroposon integration in mammals (Jurka, J., PNAS, in press; Jurka, J. and Klonowski, P., J. Mol. Evol. 43:685-689).

We continue to develop reference collections of mammalian repeats, which have become a worldwide resource for annotation and study of newly sequenced DNA. The reference collections are revised annually as part of a larger database of repetitive DNA, called Repbase. The recent influx of sequence data to public databases has created an unprecedented need for automatic annotation of known repetitive elements. We have designed and implemented a program for identification and elimination of repetitive DNA known as CENSOR.

Reference collections of mammalian repeats and the CENSOR program are available electronically (via anonymous ftp to ncbi.nih.gov; directory repository/repbase). CENSOR can also be run via electronic mail (mail “help” message to [email protected]).
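The general idea of identifying known repeats against a reference collection and eliminating (masking) them can be sketched as below. This is a deliberately simplified illustration that finds only exact, forward-strand copies; it is not how CENSOR works internally, which the abstract does not describe, and the repeat name and sequences are invented.

```python
def mask_repeats(sequence, repeat_library, mask_char="N"):
    """Mask exact copies of library repeats; return (masked sequence, hits).

    repeat_library: {name: repeat sequence} (hypothetical reference collection).
    hits: list of (name, start, end) with half-open coordinates.
    """
    seq = sequence.upper()
    masked = list(seq)
    hits = []
    for name, repeat in repeat_library.items():
        repeat = repeat.upper()
        if not repeat:
            continue
        start = seq.find(repeat)
        while start != -1:
            end = start + len(repeat)
            hits.append((name, start, end))
            masked[start:end] = mask_char * len(repeat)
            start = seq.find(repeat, start + 1)
    return "".join(masked), sorted(hits, key=lambda hit: hit[1])


masked_seq, repeat_hits = mask_repeats("TTACGGCCGGTTA", {"toy_repeat": "GGCCGG"})
print(masked_seq, repeat_hits)
```

A practical tool would use similarity searches so that diverged copies and both strands are detected, but the annotate-then-mask output has the same shape.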


*TRRD, GERD and COMPEL: Databases on Gene-Expression Regulation as a Tool for Analysis of Functional Genomic Sequences

A.E. Kel, O.A. Podkolodnaya, O.V. Kel, A.G. Romaschenko, E. Wingender,1 G.C. Overton,2 and N.A. Kolchanov
Institute of Cytology and Genetics; Novosibirsk, Russia
Kolchanov: +7-3832/353-335, Fax: -336 or /356-558, [email protected]
http://transfac.gbf-braunschweig.de
1Gesellschaft für Biotechnologische Forschung; Braunschweig, Germany
2Department of Genetics; University of Pennsylvania School of Medicine; Philadelphia, PA 19104-6145

The database on transcription regulatory regions in eukaryotic genomes (TRRD) has been developed [1] (http://www.bionet.nsk.su/TRRD.html; ftp://ftp.bionet.nsk.su/pub/trrd/). The main principle of data representation in TRRD is the modular structure and hierarchy of transcription regulatory regions. Each TRRD entry corresponds to a gene as an entire unit. Information on gene regulation is provided (cell-cycle and cell-type specificity, developmental stage specificity, and the influence of various molecular signals on gene expression). TRRD contains information about the structural organization of gene transcription regulatory regions, including descriptions of known promoters and enhancers in 5' and 3' regions and in introns. The description of binding sites for transcription factors includes the nucleotide sequence and precise location, the names of factors that bind to the site, and the experimental evidence by which the binding site was revealed. We provide cross-references to the TRANSFAC database [2] for sites and factors as well as for genes. The TRRD 3.3 release includes 340 vertebrate genes.

The Gene Expression Regulation Database (GERD) collects information on features of gene expression as well as information about gene transcription regulation. The current release of GERD contains 75 entries with information on expression regulation of genes expressed in hematopoietic tissues in the course of ontogenesis and blood cell differentiation. The COMPEL database contains information about composite elements, which are functional units essential for highly specific transcription regulation [3]. Direct interactions between transcription factors binding to their target sites within composite elements result in convergence of different signal transduction pathways. Nucleotide sequences and positions of composite elements, binding factors and types of their DNA-binding domains, and experimental evidence confirming synergistic or antagonistic action of factors are registered in COMPEL. Cross-references to the TRANSFAC factors table are given. TRRD and COMPEL are cross-referenced to each other. The COMPEL 2.1 release includes 140 composite elements.

We have developed software for analysis of transcription regulatory region structure. The CompSearch program is based on the oligonucleotide weight matrix method. To collect sets of binding sites for matrix construction, we have used the TRANSFAC and TRRD databases. The CompSearch program takes into account the fine structure of experimentally confirmed NFATp/AP-1 composite elements collected in COMPEL (distances between binding sites in composite elements and their mutual orientation). By means of the program we have found potential composite elements of the NFATp/AP-1 type in the regulatory regions of various cytokine genes. Analysis of composite elements could be a first approach to revealing specific patterns of transcription signals encoding the regulatory potential of eukaryotic promoters.
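A weight-matrix search for composite elements can be sketched in a few steps: build a log-odds matrix from aligned binding sites, scan a sequence for high-scoring windows, and then pair hits for two factors that lie within an allowed distance. The sketch below is a generic illustration, not the CompSearch code. The training sites, thresholds, and gap limits are invented, only the forward strand is scanned, and only arrangements with the second site downstream of the first are reported, so mutual orientation is not modeled.

```python
import math

BASES = "ACGT"


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix from equal-length aligned sites."""
    matrix = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: math.log((counts[b] / total) / background) for b in BASES})
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at least `threshold`."""
    width = len(matrix)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue
        score = sum(matrix[k][base] for k, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits


def composite_elements(hits_a, width_a, hits_b, min_gap, max_gap):
    """Pair site A with a downstream site B separated by min_gap..max_gap bases."""
    pairs = []
    for start_a, _ in hits_a:
        end_a = start_a + width_a
        for start_b, _ in hits_b:
            gap = start_b - end_a
            if min_gap <= gap <= max_gap:
                pairs.append((start_a, start_b, gap))
    return pairs


# Invented toy training sets standing in for curated binding-site collections.
matrix_a = build_weight_matrix(["GGAAA", "GGAAA", "GGAAA"])
matrix_b = build_weight_matrix(["TGAGTCA", "TGACTCA", "TGAGTCA"])
promoter = "CCGGAAACTTGAGTCACC"
hits_a = scan(promoter, matrix_a, threshold=4.0)
hits_b = scan(promoter, matrix_b, threshold=5.0)
print(composite_elements(hits_a, len(matrix_a), hits_b, min_gap=0, max_gap=10))
```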

References

1. Kel O.V., Romaschenko A.G., Kel A.E., Naumochkin A.N., Kolchanov N.A. Proceedings of the 28th Annual Hawaii International Conference on System Sciences [HICSS]. (1995), v. 5, Biotechnology Computing, IEEE Computer Society Press, Los Alamitos, California, p. 42-51.

2. Wingender E., Dietze P., Karas H., and Knuppel R. TRANSFAC: a database on transcription factors and their DNA binding sites (1996). Nucl. Acids Res., v. 24, pp. 238-241.

3. Kel O.V., Romaschenko A.G., Kel A.E., Wingender E., Kolchanov N.A. A compilation of composite regulatory elements affecting gene transcription in vertebrates (1995). Nucl. Acids Res., v. 23, pp. 4097-4103.

Recent Publications

Kel, A., Kel, O., Ischenko, I., Kolchanov, N., Karas, H., Wingender, E. and Sklenar, H. (1996). TRRD and COMPEL databases on transcription linked to TRANSFAC as tools for analysis and recognition of regulatory sequences. Computer Science and Biology. Proceedings of the German Conference on Bioinformatics (GCB’96), R. Hofestadt, T. Lengauer, M. Löffler, D. Schomburg (eds.). University of Leipzig, Leipzig 1996, pp. 113-117.

Wingender, E., Kel, A. E., Kel, O. V., Karas, H., Heinemeyer, T., Dietze, P., Knueppel, R., Romaschenko, A. G. and Kolchanov, N. A. (1997). TRANSFAC, TRRD and COMPEL: Towards a federated database system on transcriptional regulation. Nucleic Acids Res., in press.

Ananko E.A., Ignatieva E.V., Kel A.E., Kolchanov N.A. (1996). WWWTRRD: Hypertext information system on transcription regulation. Computer Science and Biology. Proceedings of the German Conference on Bioinformatics (GCB’96), R. Hofestadt, T. Lengauer, M. Löffler, D. Schomburg (eds.). University of Leipzig, Leipzig 1996, pp. 153-155.

A.E. Kel, O.V. Kel, O.V. Vishnevsky, M.P. Ponomarenko, I.V. Ischenko, H. Karas, N.A. Kolchanov, H. Sklenar, E. Wingender (1997). TRRD and COMPEL databases on transcription linked to TRANSFAC as tools for analysis and recognition of regulatory sequences. Lecture Notes in Computer Science, in press.

Holger Karas, Alexander Kel, Olga Kel, Nikolay Kolchanov, and Edgar Wingender (1997). Integrating knowledge on gene regulation by a federated database approach: TRANSFAC, TRRD and COMPEL. Jurnal Molekularnoy Biologii (Russian), in press.

Kel A.E., Kolchanov N.A., Kel O.V., Romaschenko A.G., Ananko E.A., Ignatyeva E.V., Merkulova T.I., Podkolodnaya O.A., Stepanenko I.L., Kochetov A.V., Kolpakov F.A., Podkolodniy N.L., Naumochkin A.A. (1997). TRRD: A database on transcription regulatory regions of eukaryotic genes. Jurnal Molekularnoy Biologii (Russian), in press.

O.V. Kel, A.E. Kel, A.G. Romaschenko, E. Wingender, N.A. Kolchanov (1997). Composite regulatory elements: classification and description in the COMPEL data base. Jurnal Molekularnoy Biologii (Russian), in press.

Data-Management Tools for Genomic Databases

Victor M. Markowitz and I-Min A. Chen
Information and Computing Sciences Division; Lawrence Berkeley National Laboratory; Berkeley, CA 94720
510/486-6835, Fax: -4004, [email protected]
http://gizmo.lbl.gov/opm.html

The Object-Protocol Model (OPM) data management tools provide facilities for efficiently constructing, maintaining, and exploring molecular biology databases. Molecular biology data are currently maintained in numerous molecular biology databases (MBDs), including large archival MBDs such as the Genome Database (GDB) at Johns Hopkins School of Medicine, the Genome Sequence Data Base (GSDB) at the National Center for Genome Resources, and the Protein Data Bank (PDB) at Brookhaven National Laboratory. Constructing, maintaining, and exploring MBDs entail complex and time-consuming processes.

The goal of the Object-Protocol Model (OPM) data management tools is to provide facilities for efficiently constructing, maintaining, and exploring MBDs, using application-specific constructs on top of commercial database management systems (DBMSs). The OPM tools will also provide facilities for reorganizing MBDs and for seamlessly exploring heterogeneous MBDs. The OPM tools and documentation are available on the Web and are developed in close collaboration with groups maintaining MBDs, such as GDB, GSDB, and PDB.

Current work focuses on providing new facilities for constructing and exploring MBDs. The specific aims of this work are:

(1) Extend the OPM query language with additional constructs for expressing complex conditions, and enhance the OPM query optimizer for generating more efficient query plans.

(2) Develop enhanced OPM query interfaces supporting MBD-specific data types (e.g., protein data type) and operations (e.g., protein data display and 3D search), and assisting users in specifying queries and interpreting query results.

(3) Provide support for customizing MBD interfaces.

(4) Extend the OPM tools with facilities for managing permissions (object ownership) in MBDs, and for physical database design of relational MBDs, including specification of indexes, allocation of segments, and handling of redundant (denormalized) data.

(5) Develop OPM tools for constructing and maintaining multiple OPM views for both relational and non-relational (e.g., ASN.1, AceDB) MBDs. For a given MBD, these tools will allow customizing different OPM views for different groups of scientists. For heterogeneous MBDs, these tools will allow exploring them using common OPM interfaces.

(6) Develop tools for constructing OPM-based multidatabase systems of heterogeneous MBDs and for exploring and manipulating data in these MBDs via OPM interfaces. As part of this effort, the OPM-based multidatabase system, which currently consists of GDB 6.0 and GSDB 2.0, will be extended to include additional MBDs, primarily GSDB 2.2 (when it becomes available), PDB, and GenBank.

(7) Develop facilities for reorganizing OPM-based MBDs. The database reorganization tools will support automatic generation of procedures for reorganizing MBDs following restructuring (revision) of MBD schemas.

In the past year, the OPM data management tools have been extended in order to address specific requirements of developing MBDs such as GDB 6 and the new version of PDB.

The current version of the OPM data management tools (4.1) was released in June 1996 for Sun/OS, Sun/Solaris, and SGI. The following OPM tools are available on the Web at http://gizmo.lbl.gov/opm.html:

(1) an editor for specifying OPM schemas;

(2) a translator of OPM schemas into relational database specifications and procedures;

(3) utilities for publishing OPM schemas in text (LaTeX), diagram (PostScript), and HTML formats;

(4) a translator of OPM queries into SQL queries;

(5) a retrofitting tool for constructing OPM schemas (views) for existing relational genomic databases;

(6) a tool for constructing Web-based form interfaces to MBDs that have an OPM schema; this tool was developed by Stan Letovsky at Johns Hopkins School of Medicine, as part of a collaboration.
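The kind of translation performed by a schema translator (item 2 above) can be illustrated with a toy object-to-relational mapping. The input format below is a plain Python dictionary invented for this example; it is not OPM syntax, and the mapping rules (one table per class, a side table per multi-valued attribute) are a common textbook approach rather than a description of what the OPM translator generates.

```python
def schema_to_sql(classes):
    """Translate a toy class description into CREATE TABLE statements.

    classes: {class_name: {attribute: (sql_type, is_multi_valued)}}
    (hypothetical input format, not OPM syntax).
    """
    statements = []
    for cls, attributes in classes.items():
        columns = [f"{cls}_id INTEGER PRIMARY KEY"]
        side_tables = []
        for attr, (sql_type, multi_valued) in attributes.items():
            if multi_valued:
                # Multi-valued attributes get their own table keyed by the owner.
                side_tables.append(
                    f"CREATE TABLE {cls}_{attr} "
                    f"({cls}_id INTEGER REFERENCES {cls}({cls}_id), {attr} {sql_type});"
                )
            else:
                columns.append(f"{attr} {sql_type}")
        statements.append(f"CREATE TABLE {cls} ({', '.join(columns)});")
        statements.extend(side_tables)
    return statements


toy_schema = {"clone": {"name": ("TEXT", False), "alias": ("TEXT", True)}}
for statement in schema_to_sql(toy_schema):
    print(statement)
```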

The OPM data management tools have been highly successful in developing new genomic databases, such as GDB 6 (released in January 1996; http://gdbgeneral.gdb.org/gdb/) and the relational version of PDB (http://terminator.pdb.bnl.gov:4148), and in constructing OPM views and interfaces for existing genomic databases such as GSDB 2.0. The OPM data management tools are currently used by over ten groups in the USA and Europe. The research underlying these tools is described in several papers published in scientific journals and presented at database and genome conferences.

In the past year the OPM tools have been presented at database and bioinformatics conferences, including the International Symposium on Theoretical and Computational Genome Research, Heidelberg, Germany, March 1996; the Workshop on Structuring Biological Information, Heidelberg, Germany, March 1996; the Meeting on Genome Mapping and Sequencing, Cold Spring Harbor, May 1996; the International Sybase User Group Conference, May 1996; the Bioinformatics - Structure Conference, Jerusalem, November 1996; and the Pacific Symposium on Bioinformatics, January 1997.

The results of the research and development underlying the OPM tools work have been presented in papers published in proceedings of database and bioinformatics conferences; these papers are available at http://gizmo.lbl.gov/opm.html#Publications.


The Genome Topographer: System Design

S. Cozza, D. Cuddihy, R. Iwasaki, M. Mallison, C. Reed, J. Salit, A. Tracy, and T. Marr
Cold Spring Harbor Laboratory; Cold Spring Harbor, NY 11724
Marr: 516/367-8393, Fax: -8461, [email protected] [email protected]

Genome Topographer (GT) is an advanced genome informatics system that has received joint funding from DOE and NIH over a number of years. DOE funding has focused on GT tools supporting computational genome analysis, principally sequence analysis. GT is scheduled for public release next spring under the auspices of the Cold Spring Harbor Human Genome Informatics Research Resource.

GT has 17 major existing frameworks: 1. Views, including printing, 2. Default manager, 3. Graphical User Interface, 4. Query, 5. Project Manager, 6. Workspace Manager, 7. Asynchronous Process Manager, 8. Study Manager, 9. Help, 10. Application, 11. Notification, 12. Security, 13. World Wide Web Interface, 14. NCBI, 15. Reader, 16. Writer, 17. External Database Interface. GT frameworks are independent sets of VisualWorks (client) or SmallTalkDB (GemStone) classes that interact to carry out the responsibilities of the specific framework. Each framework is clearly defined and has a well-defined interface. These frameworks are used over and over in GT to perform similar duties in different places.

GT has basic tools and special tools. Basic tools are used many times in different applications, while special tools tend to be special purpose, designed to do fairly limited things, although the distinction is somewhat arbitrary. Tools typically use several frameworks when they are assembled. Basic Tools: 1. Project Browser, 2. Editor/Viewer, 3. Query, 4. NCBI Entrez, 5. File reader/writer, 6. Map comparison, 7. Database Administrator, 8. Login, 9. Default, 10. Help. Special Tools: 1. Study Manager, 2. Compute Server, 3. Sequence Analysis, 4. Genetic Analysis. These frameworks and tools are combined with a comprehensive database schema of very rich biological expression linked with pluggable computational tools.

Taken together, these features allow users to construct, with relative ease, on-line databases of the primary data needed to study a genetic disease (or genes and phenotypes in general) from the stage of family collection and diagnostic ascertainment through cloning and functional analysis of candidate genes, including mutational analysis, expression information, and screening for biochemical interactions with candidate molecules. GT was designed on the premise that a highly informative, visual presentation of comprehensive data to a knowledgeable user is essential to their understanding. The advanced software engineering techniques promoted by relatively new object-oriented products have allowed GT to become a highly interactive and visually oriented system that allows the user to concentrate on the problem rather than on the computer. Using the rich data representational features characteristic of this technology, the GT software enables users to construct models of real-world, complex biological phenomena. These unique features of GT are key to the thesis that such a system will allow users to discover otherwise intractable networks of interactions exhibited by complex genetic diseases.

The VisualWorks development environment allows the development of code that runs unchanged across all major workstations and personal computers, including PCs, Macintoshes, and most Unix workstations.


A Flexible Sequence Reconstructor for Large-Scale DNA Sequencing: A Customizable Software System for Fragment Assembly

Gene Myers and Susan Larson
Department of Computer Science; University of Arizona; Tucson, AZ 85721
602/621-6612, Fax: -4246, [email protected]
http://www.cs.arizona.edu/faktory

We have completed the design and begun construction of a software environment in support of DNA sequencing called the “FAKtory”. The environment consists of (1) our previously described software library, FAK, for the core combinatorial problem of assembling fragments, (2) a Tcl/Tk-based interface, and (3) a software suite supporting a modest database of fragments and a processing pipeline that includes clipping and vector prescreening modules. A key feature of our system is that it is highly customizable: the structure of the fragment database, the processing pipeline, and the operation of each phase of the pipeline are specifiable by the user. Such customization need only be established once at a given location; subsequently, users see a relatively simple system tailored to their needs. Indeed, one may direct the system to input a raw dataset of, say, ABI trace files, pass them through a customized pipeline, and view the resulting assembly with two button clicks.

The system is built on top of our FAK software library, and as a consequence one receives (a) high-sensitivity overlap detection, (b) correct resolution of large high-fidelity repeats, (c) near-perfect multi-alignments, and (d) support of constraints that must be satisfied by the resulting assemblies. The FAKtory assumes a processing pipeline for fragments that consists of an INPUT phase, any number and sequence of CLIP, PRESCREEN, and TAG phases, followed by an OVERLAP and then an ASSEMBLY phase. The sequence of clip, prescreen, and tag phases is customizable, and every phase is controlled by a panel of user-settable preferences, each of which permits setting the phase’s mode to AUTO, SUPERVISED, or MANUAL. This setting determines the level of interaction required of the user when the phase is run, ranging from none to hands-on. Any diagnostic situations detected during pipeline processing are organized into a log that permits one to confirm, correct, or undo decisions that might have been made automatically.
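The pipeline structure described above (a fixed INPUT phase, a user-chosen sequence of CLIP, PRESCREEN, and TAG phases, then OVERLAP and ASSEMBLY, each with a mode) can be sketched as a small configuration-driven runner. This is an illustration under assumptions: the function names, the handler interface, and the exact behavior attached to each mode are invented here and are not taken from the FAKtory software.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "auto"
    SUPERVISED = "supervised"
    MANUAL = "manual"


MIDDLE_PHASES = {"CLIP", "PRESCREEN", "TAG"}


def build_pipeline(middle):
    """Full phase order: INPUT, the chosen middle phases, OVERLAP, ASSEMBLY."""
    unknown = [phase for phase in middle if phase not in MIDDLE_PHASES]
    if unknown:
        raise ValueError(f"unsupported phases: {unknown}")
    return ["INPUT", *middle, "OVERLAP", "ASSEMBLY"]


def run_pipeline(phases, modes, handlers, fragments, confirm=lambda phase, result: True):
    """Run each phase according to its mode and keep a log of decisions.

    Assumed semantics: AUTO applies the handler result directly, SUPERVISED
    applies it only if `confirm` approves, MANUAL leaves the data untouched.
    """
    log = []
    for phase in phases:
        mode = modes.get(phase, Mode.AUTO)
        if mode is Mode.MANUAL:
            log.append((phase, "skipped: awaiting manual action"))
            continue
        handler = handlers.get(phase, lambda frags: frags)
        result = handler(fragments)
        if mode is Mode.SUPERVISED and not confirm(phase, result):
            log.append((phase, "rejected by user"))
            continue
        fragments = result
        log.append((phase, "applied"))
    return fragments, log


phases = build_pipeline(["CLIP", "TAG"])
clipped, log = run_pipeline(
    phases,
    modes={"TAG": Mode.MANUAL},
    handlers={"CLIP": lambda frags: [f.strip("N") for f in frags]},
    fragments=["NNACGTNN", "ACGT"],
)
print(clipped)
print(log)
```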

The customized fragment database contains fields whose type may be chosen from TIME, TEXT, NUMBER, and WAVEFORM. One can associate default values for fields unspecified on input and specify a controlled vocabulary limiting the range of acceptable values for a given field (e.g., John, Joe, or Mary for the field Technician, and [1, 36] for the field Lane). This database may be queried with SQL-like predicates that further permit approximate matching over text fields. Common queries and/or sets of fragments selected by them may be named and referred to later by name. The pipeline status of a fragment may be part of a query.
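Field defaults, controlled vocabularies, and approximate text matching of the kind described above can be sketched as follows. The schema layout and function names are assumptions made for illustration, the Technician default is an arbitrary choice, and the approximate matching uses Python's difflib similarity ratio rather than whatever matching method the FAKtory actually employs.

```python
import difflib

# Hypothetical field definitions modeled on the examples in the text.
SCHEMA = {
    "Technician": {"type": str, "default": "John", "vocabulary": {"John", "Joe", "Mary"}},
    "Lane": {"type": int, "default": 1, "range": (1, 36)},
}


def make_record(schema, **values):
    """Build a record, filling defaults and enforcing vocabulary and range limits."""
    record = {}
    for field, spec in schema.items():
        value = values.get(field, spec["default"])
        if not isinstance(value, spec["type"]):
            raise TypeError(f"{field}: expected {spec['type'].__name__}")
        if "vocabulary" in spec and value not in spec["vocabulary"]:
            raise ValueError(f"{field}: {value!r} not in controlled vocabulary")
        if "range" in spec:
            low, high = spec["range"]
            if not low <= value <= high:
                raise ValueError(f"{field}: {value} outside [{low}, {high}]")
        record[field] = value
    return record


def approx_select(records, field, pattern, cutoff=0.8):
    """Select records whose text field approximately matches `pattern`."""
    pattern = pattern.lower()
    return [
        record
        for record in records
        if difflib.SequenceMatcher(None, str(record[field]).lower(), pattern).ratio() >= cutoff
    ]


records = [
    make_record(SCHEMA, Technician="Mary", Lane=12),
    make_record(SCHEMA, Technician="Joe", Lane=3),
    make_record(SCHEMA, Lane=7),
]
print(approx_select(records, "Technician", "Marry"))
```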

The system permits one to maintain a collection of alternative assemblies, to compare them to see how they differ, and to directly manipulate assemblies in a fashion consistent with sequence overlaps. The system can be customized so that a priori constraints reflecting a given sequencing protocol (e.g., double-barreled or transposon-mapped) are automatically produced according to the syntax of the names of fragments (e.g., X.f and X.r for any X are mates for double-barreled sequencing). The system presents visualizations of the constraints applied to an assembly, and one may experiment with an assembly by adding and/or removing constraints. Finally, one may edit the multi-alignment of an assembly while consulting the raw waveforms. Special attention was given to optimizing the ergonomics of this time-intensive task.
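The naming rule for double-barreled mates (X.f and X.r) lends itself to a short sketch. The function `mate_pairs` is a hypothetical helper, not part of FAKtory, and it only derives the pairs; how such pairs become assembly constraints is not shown.

```python
from collections import defaultdict


def mate_pairs(fragment_names):
    """Return sorted (forward, reverse) name pairs that share a template X."""
    reads = defaultdict(dict)
    for name in fragment_names:
        template, dot, suffix = name.rpartition(".")
        if dot and suffix in ("f", "r"):
            reads[template][suffix] = name
    # Keep only templates for which both the .f and the .r read are present.
    return sorted((ends["f"], ends["r"])
                  for ends in reads.values() if len(ends) == 2)
```

For example, `mate_pairs(["a1.f", "a1.r", "b2.f", "c3"])` pairs only the two `a1` reads, because `b2` lacks a reverse read and `c3` does not follow the naming syntax.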


The Role of Integrated Software and Databases in Genome Sequence Interpretation and Metabolic Reconstruction

Terry Gaasterland, Natalia Maltsev, Ross Overbeek, and Evgeni Selkov
Mathematics and Computer Science Division; Argonne National Laboratory; Argonne, IL 60439
630/252-4171, Fax: -5986, [email protected]
MAGPIE: http://www.mcs.anl.gov/home/gaasterl/magpie.html
WIT: http://www.cme.msu.edu/WIT

As scientists successfully sequence complete genomes, the issue of how to organize the large quantities of evolving sequence data becomes paramount. Through our work in comparative whole genome analysis (MAGPIE, Gaasterland) and metabolic reconstruction algorithms (WIT, Overbeek, Maltsev, and Selkov), we carry genome interpretation beyond the identification of gene products to customized views of an organism’s functional properties.

MAGPIE is a system designed to reside locally at the site of a genome project and actively carry out analysis of genome sequence data as it is generated [1,2]. DNA sequences produced in a sequencing project mature through a series of stages that each require different analysis activities. Even after DNA has been assembled into contiguous fragments and eventually into a single genome, it must be regularly reanalyzed. Any new data in public sequence databases may provide clues to the identity of genes. Over a year, for 2 megabases with 4-fold coverage, MAGPIE will request on the order of 100,000 outputs from remote analysis software, manipulate and manage the output, update the current analysis of the sequence data, and monitor the project sequence data for changes that initiate reanalysis.

In collaboration with Canada’s Institute for Marine Biosciences and the Canadian Institute for Advanced Research, MAGPIE is being used to maintain and study comparative views of all open reading frames (ORFs) across fully sequenced genomes (currently 5), nearly completed genomes (currently 2), and 1 genome in progress (Sulfolobus solfataricus). Together, these genomes represent multiple archaeal and bacterial genomes and one eukaryotic genome. This analysis provides the necessary data to assign phylogenetic classifications to each ORF (e.g., “AE” for archaeal and eukaryotic). This data in turn provides the basis for validating and assessing functional annotations according to phylogenetic neighborhood (e.g., selecting the eukaryotic form of a biochemical function over a bacterial form for an “AE” ORF) [3].
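A minimal sketch of the classification idea follows. Only the “AE” label appears in the text; the letter “B” for bacteria, the domain names, and both function names are assumptions made for illustration, not MAGPIE's actual scheme.

```python
# Assumed one-letter codes; the source text only shows "AE".
DOMAIN_CODES = {"archaea": "A", "bacteria": "B", "eukarya": "E"}


def phylo_class(hit_domains):
    """Combine the domains in which an ORF has similar sequences into a label."""
    codes = {DOMAIN_CODES[d] for d in hit_domains}
    return "".join(sorted(codes))


def preferred_form(orf_class, forms):
    """Pick the annotation whose source domain lies in the ORF's class."""
    for domain, annotation in forms.items():
        if DOMAIN_CODES[domain] in orf_class:
            return annotation
    return None
```

With these definitions, an ORF with archaeal and eukaryotic hits is labeled `"AE"`, and `preferred_form("AE", {"bacteria": "bacterial form", "eukarya": "eukaryotic form"})` selects the eukaryotic form.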

Once an automated functional overview has been established, it remains to pinpoint the organisms’ exact metabolic pathways and establish how they interact. To this end, the WIT (What Is There) system supports efforts to develop metabolic reconstructions. Such reconstructions, or models, are based on sequence data, clearly established biochemistry of specific organisms, and an understanding of the interdependencies of biochemical mechanisms. WIT thus offers a valuable tool for testing current hypotheses about microbial behavior. For example, a reconstruction may begin with a set of established enzymes (enzymes with strong similarities in identified coding regions to existing sequences for which the enzymatic function is known) and putative enzymes (enzymes with weak similarity to sequences of known function). From these initial “hits,” within a phylogenetic perspective, we identify an initial set of pathways. This set can be used to generate a set of expected enzymes (enzymes that have not been clearly detected, but that would be expected given the set of hypothesized pathways) and missing enzymes (enzymes that occur in the pathways but for which no sequence has yet been biochemically identified for any organism). Further reasoning identifies tentative connective pathways.
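The enzyme categories above can be expressed as set operations. This is a simplified sketch rather than WIT's procedure: the rule that one hit is enough to hypothesize a pathway, the treatment of putative hits as detected, and all identifiers (`classify_enzymes`, `E1` through `E9`, the pathway names) are assumptions.

```python
def classify_enzymes(established, putative, pathways, sequenced_anywhere):
    """pathways maps a pathway name to the set of enzyme IDs it requires."""
    hits = established | putative
    # Assumption: a pathway is hypothesized when at least one enzyme was hit.
    hypothesized = {name for name, enzymes in pathways.items() if enzymes & hits}
    required = set().union(*(pathways[name] for name in hypothesized))
    undetected = required - hits
    # Missing: no sequence known for any organism. Expected: the remainder.
    missing = undetected - sequenced_anywhere
    expected = undetected - missing
    return hypothesized, expected, missing


pathways = {"glycolysis": {"E1", "E2", "E3", "E4"}, "other": {"E9"}}
hypothesized, expected, missing = classify_enzymes(
    established={"E1"},
    putative={"E2"},
    pathways=pathways,
    sequenced_anywhere={"E1", "E2", "E3", "E9"},
)
```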

In addition to helping curators develop metabolic reconstructions, WIT lets users examine models curated by experts, follow connections between more than two thousand metabolic diagrams, and compare models (e.g., which of certain genes that are conserved among bacterial genomes are found in higher life). The objective is to set the stage for meaningful simulations of microbial behavior and thus to advance our understanding of microbial biochemistry and genetics.

DOE Contract No. W-31-109-Eng-38 (ANL FWP No. 60427).

References

[1] T. Gaasterland and C. Sensen, Fully Automated Genome Analysis that Reflects User Needs and Preferences -- a Detailed Introduction to the MAGPIE System Architecture, Biochimie, 78(4), (accepted).

[2] T. Gaasterland, J. Lobo, N. Maltsev, and G. Chen. Assigning Function to CDS Through Qualified Query Answering. In Proc. 2nd Int. Conf. Intell. Syst. for Mol. Bio., Stanford U. (1994).

[3] T. Gaasterland and E. Selkov. Automatic Reconstruction of Metabolic Structure from Incomplete Genome Sequence Data. In Proc. Int. Conf. Intell. Syst. for Mol. Bio., Cambridge, England (1995).

Database Transformations for Biological Applications

G. Christian Overton, Susan B. Davidson,1 and Peter Buneman1

Department of Genetics and 1Department of Computer and Information Science; University of Pennsylvania; Philadelphia, PA 19104
Overton: 215/573-3105, Fax: -3111, [email protected]
Davidson: 215/898-3490, Fax: -0587, [email protected]
Buneman: 215/898-7703, Fax: -0587, [email protected]
http://agave.humgen.upenn.edu/cpl/cplhome.html
http://sdmc.iss.nus.sg/kleisli-stuff/MoreInfo.html

We have implemented a general-purpose query system, Kleisli, that provides access to a variety of “non-standard” data sources (e.g., ACeDB, ASN.1, BLAST), as well as to “standard” relational databases. The system represents a major advance in the ability to integrate the growing number and diversity of biology data sources conveniently and efficiently. It features a uniform query interface, the CPL query language, across heterogeneous data sources; a modular and extensible architecture; and, most significantly for dealing with the Internet environment, a programmable optimizer. We have demonstrated the utility of the system in composing and executing queries that were considered difficult, if not unanswerable, without first either building a monolithic database or writing highly application-specific integration code (details and examples available at URL above).

In conjunction with other software developed in our group, we have assembled a toolset that supports a range of data integration strategies as well as the ability to create specialized data warehouses initialized from community databases. Our integration strategy is based upon the concept of “mediators”, which serve a group of related applications by providing a uniform structural interface to the relevant data sources. This approach is cost-effective in terms of query development time and maintenance. We have examined in detail methods for optimizing queries such as “retrieve all known human sequence containing an Alu repeat in an intragenic region” where the data sources are heterogeneous and distributed across the Internet.

Transformation of data resources, that is, the structural reorganization of a data resource from one form to another, arises frequently in genome informatics. Examples include the creation of data warehouses and database evolution. Implementing such transformations by hand on a case-by-case basis is time consuming and error prone. Consequently there is a need for a method of specifying, implementing, and formally verifying transformations in a uniform way across a wide variety of different data models. Morphase is a prototype system for specifying transformations between data sources and targets in an intuitively appealing, declarative language based on Horn clause logic. Transformation specifications in Morphase are translated into CPL and executed in the Kleisli system. The data types underlying Morphase include arbitrarily nested records, sets, variants, lists, and object identity, thus capturing the types common to most data formats relevant to genome informatics, including ASN.1 and ACE. Morphase can be connected to a wide variety of data sources simultaneously through Kleisli. In this way, data can be read from multiple heterogeneous data sources, transformed using Morphase according to the desired output format, and inserted into the target data source.

We have tested Morphase by applying it to a variety of different transformation problems involving Sybase, ACE, and ASN.1. For example, we used it to specify a transformation between the Sanger Center’s Chromosome 22 ACE database (ACE22DB) and a Chromosome 22 Sybase database (Chr22DB), as well as between a portion of GDB and Chr22DB. Some of these transformations had already been hand-coded without our tools, forming a basis for comparison.

Once the semantic correspondences between objects in the various databases were understood, writing the transformation program in Morphase was easy, even for a non-expert in the system. Furthermore, it was easy to find conceptual errors in the transformation specification. In contrast, the hand-coded programs were obscure, difficult to understand, and even more difficult to debug.


Relevant Publications

P. Buneman, S.B. Davidson, K. Hart, C. Overton and L. Wong, “A Data Transformation System for Biological Data Sources,” in Proceedings of VLDB, Sept. 1995 (Zurich, Switzerland). Also available as Technical Report MS-CIS-95-10, University of Pennsylvania, March 1995.

S.B. Davidson, C. Overton and P. Buneman, “Challenges in Integrating Biological Data Sources,” J. Computational Biology 2 (1995), pp. 557-572.

A. Kosky, “Transforming Databases with Recursive Data Structures,” PhD Thesis, December 1995.

S.B. Davidson and A. Kosky, “Effecting Database Transformations Using Morphase,” Technical Report MS-CIS-96-05, University of Pennsylvania.

A. Kosky, S.B. Davidson and P. Buneman, “Semantics of Database Transformations,” Technical Report MS-CIS-95-25, University of Pennsylvania, 1995.

K. Hart and L. Wong, “Pruning Nested Data Values Using Branch Expressions With Wildcards,” in Abstracts of MIMBD, Cambridge, England, July 1995.

Las Vegas Algorithm for Gene Recognition: Suboptimal and Error-Tolerant Spliced Alignment

Sing Hoi Sze and Pavel A. Pevzner1

Departments of Computer Science and 1Mathematics; University of Southern California; Los Angeles, CA 90089
Pevzner: 213/740-2407, Fax: [email protected]
http://www-hto.usc.edu/software/procrustes

Recently, Gelfand, Mironov, and Pevzner (Proc. Natl. Acad. Sci. USA, 1996, 9061-9066) proposed a spliced alignment approach to gene recognition that provides 99% accurate recognition of human genes if a related mammalian protein is available. However, even 99% accurate gene predictions are insufficient for automated sequence annotation in large-scale sequencing projects and therefore have to be complemented by experimental gene verification. 100% accurate gene predictions would lead to a substantial reduction of experimental work on gene identification. Our goal is to develop an algorithm that either predicts an exon assembly with accuracy sufficient for sequence annotation or warns a biologist that the accuracy of a prediction is insufficient and further experimental work is required. We study suboptimal and error-tolerant spliced alignment problems as the first steps towards such an algorithm, and report an algorithm which provides 100% accurate recognition of human genes in 37% of cases (if a related mammalian protein is available). For 52% of genes, the algorithm predicts at least one exon with 100% accuracy.


Foundations for a Syntactic Pattern-Recognition System for Genomic DNA Sequences: Languages, Automata, Interfaces, and Macromolecules

David B. Searls and G. Christian Overton1

SmithKline Beecham Pharmaceuticals; King of Prussia, PA 19406
610/270-4551, Fax: -5580, [email protected]
1Department of Genetics; University of Pennsylvania; Philadelphia, PA 19104

Viewed as strings of symbols, biological macromolecules can be modelled as elements of formal languages. Generative grammars have been useful in molecular biology for purposes of syntactic pattern recognition, for example in the author’s work on the GenLang pattern matching system, which is able to describe and detect patterns that are probably beyond the capability of a regular expression specification. More recently, grammars have been used to capture intramolecular interactions or long-distance dependencies between residues, such as those arising in folded structures. In the work of Haussler and colleagues, for example, stochastic context-free grammars have been used as a framework for “learning” folded RNA structures such as tRNAs, capturing both primary sequence information and secondary structural covariation. Such advances make the study of the formal status of the language of biological macromolecules highly relevant, and in particular the finding that DNA is beyond context-free has already created challenges in algorithm design.
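A stem-loop is a simple example of the long-distance dependency mentioned above: the second arm must be the reverse complement of the first, whatever the first arm happens to be. The sketch below is a hand-coded search, not the GenLang system or a grammar formalism; the function names and the stem and loop length parameters are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem=4, min_loop=3, max_loop=8):
    """Return (start, loop_length) pairs where a stem, a loop, and the
    stem's reverse complement occur in that order."""
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop          # start of the candidate right arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == reverse_complement(left):
                hits.append((i, loop))
    return hits
```

In the sequence `AAGGACTTTTGTCCAA`, the arm `GGAC` at position 2 pairs with `GTCC` across a four-base loop, so the function reports `(2, 4)`.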

Moreover, to date, such methods have not been able to capture relationships between strings in a collection, such as those that arise via intermolecular interactions, or evolutionary relationships implicit in alignments. Recently we have attempted to remedy this by showing (1) how formal grammars can be extended to describe interacting collections of molecules, such as hybridization products and, potentially, multimeric or physiological protein interactions, and (2) how simple automata can be used to model evolutionary relationships in such a way that complex model-based alignment algorithms can be automatically generated by means of visual programming. These results allow for a useful generalization of the language-theoretic methods now applied to single molecules.

In addition, we describe a new software package, bioWidget, for the rapid development and deployment of graphical user interfaces (GUIs) designed for the scientific visualization of molecular, cellular, and genomics information. The overarching philosophy behind bioWidgets is componentry: that is, the creation of adaptable, reusable software, deployed in modules that are easily incorporated in a variety of applications and in such a way as to promote interaction between those applications. This is in sharp distinction to the common practice of developing dedicated applications. The bioWidgets project additionally focuses on the development of specific applications based on bioWidget componentry, including chromosomes, maps, and nucleic acid and peptide sequences.

The current set of bioWidgets has been implemented in Java with the goal in mind of delivering local applications and distributed applets via Intranet/Internet environments as required. The immediate focus is on developing interfaces for information stored in distributed heterogeneous databases such as GDB, GSDB, Entry, and ACeDB. The issues we are addressing are database access, reflecting database schemas in bioWidgets, and performance. We are also directing our efforts into creating a consortium of bioWidget developers and end-users. This organization will create standards for and encourage the development of bioWidget components. Primary participants in the consortium include Gerry Rubin (UC Berkeley) and Nat Goodman (Jackson Labs).


Relevant Publications

D.B. Searls, “String Variable Grammar: A Logic Grammar Formalism for DNA Sequences,” Journal of Logic Programming 24 (1,2):73-102 (1995).

D.B. Searls, “Formal Grammars for Intermolecular Structure,” First International Symposium on Intelligence in Neural and Biological Systems, 30-37 (1995).

D.B. Searls and K.P. Murphy, “Automata-Theoretic Models of Mutation and Alignment,” Third International Conference on Intelligent Systems for Molecular Biology, 341-349 (1995).

D.B. Searls, “bioTk: Componentry for Genome Informatics Graphical User Interfaces,” Gene 163 (2):GC1-16 (1995).

Analysis and Annotation of Nucleic Acid Sequence

David J. States, Ron Cytron, Pankaj Agarwal, and Hugh Chou
Institute for Biomedical Computing; Washington University; St. Louis, MO 63108
314/362-2134, Fax: -0234, [email protected]
http://www.ibc.wustl.edu

Bayesian estimates for sequence similarity: There is an inherent relationship between the process of pairwise sequence alignment and the estimation of evolutionary distance. This relationship is explored and made explicit. Assuming an evolutionary model and given a specific pattern of observed base mismatches, the relative probabilities of evolution at each evolutionary distance are computed using a Bayesian framework. The mean or the median of this probability distribution provides a robust estimate of the central value. Bayesian estimates of the evolutionary distance incorporate arbitrary prior information about variable mutation rates both over time and along sequence position, thus requiring only a weak form of the molecular-clock hypothesis.
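To make the idea concrete, the following sketch computes a posterior over evolutionary distance on a grid and reports its mean and median. It assumes the Jukes-Cantor substitution model, a flat prior by default, and a simple mismatch count for an ungapped alignment; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
import math


def mismatch_prob(d):
    """Jukes-Cantor probability that a site differs at distance d
    (expected substitutions per site)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def posterior(n_sites, n_mismatch, grid, prior=None):
    """Posterior over the distances in grid, given the observed mismatches."""
    prior = prior or [1.0] * len(grid)
    weights = []
    for d, pr in zip(grid, prior):
        p = mismatch_prob(d)
        # Binomial likelihood up to a constant factor, computed in log space.
        log_like = (n_mismatch * math.log(p)
                    + (n_sites - n_mismatch) * math.log(1.0 - p))
        weights.append(pr * math.exp(log_like))
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_median(grid, post):
    mean = sum(d * w for d, w in zip(grid, post))
    cumulative = 0.0
    for d, w in zip(grid, post):
        cumulative += w
        if cumulative >= 0.5:
            return mean, d
    return mean, grid[-1]


grid = [0.01 * k for k in range(1, 201)]   # distances 0.01 to 2.00
post = posterior(n_sites=100, n_mismatch=20, grid=grid)
mean, median = mean_and_median(grid, post)
```

A non-uniform `prior` list can stand in for prior information about mutation rates; variation along sequence position would need a richer model than this one.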

The endpoints of the similarity between genomic DNA sequences are often ambiguous. The probability of evolution at each evolutionary distance can be estimated over the entire set of alignments by choosing the best alignment at each distance and the corresponding probability of duplication at that evolutionary distance. A central value of this distribution provides a robust evolutionary distance estimate. We provide an efficient algorithm for computing the parametric alignment, considering evolutionary distance as the only parameter.

These techniques and estimates are used to infer the duplication history of the genomic sequence in C. elegans and in S. cerevisiae. Our results indicate that repeats discovered using a single scoring matrix show a considerable bias in subsequent evolutionary distance estimates.

Model based sequence scoring metrics: The PAM-based DNA comparison metric has been extended to incorporate biases in nucleotide composition and mutation rates, extending earlier work (States, Gish and Altschul, 1993). A codon-based scoring system has been developed that incorporates the effects of biased codon utilization frequencies.

A dynamic programming algorithm has been developed that will optimally align sequences using a choice of comparison measures (non-coding vs. coding, etc.). We are in the process of evaluating this approach as a means for identifying likely coding regions in cDNA sequences.

Efficient sequence similarity search tools: Most sequence search tools have been designed for use with protein sequence queries a few hundred residues long. The analysis of genomic DNA sequence necessitates the use of queries hundreds of kilobases or even megabases in length. A memory and computationally efficient search tool has been developed for the identification of repeats and sequence similarity in very large segments of nucleic acid sequence. The tool implements optimal encoding of the word table, repeat filters, flexible scoring systems, and analytically parameterized search sensitivity. Output formats are designed for the presentation of genomic sequence searches.
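The word-table idea can be sketched in a few lines. This version uses a plain Python dictionary, so it does not reproduce the optimal encoding or memory efficiency described above; the function names and the `max_copies` cutoff used as a crude repeat filter are assumptions.

```python
from collections import defaultdict


def build_word_table(seq, k):
    """Map every length-k word to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def repeated_words(seq, k, max_copies=None):
    """Words occurring at least twice; optionally drop very frequent words."""
    table = build_word_table(seq, k)
    return {word: positions for word, positions in table.items()
            if len(positions) > 1
            and (max_copies is None or len(positions) <= max_copies)}
```

For the sequence `ACGTTACGT` and `k=4`, the only repeated word is `ACGT`, found at positions 0 and 5. Repeated words would then serve as seeds to extend into longer matches, a step not shown here.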

Federated databases: A Sybase server and mirror for GSDB are being developed to facilitate the annotation of repeat sequence elements in public data repositories.


Gene Recognition, Modeling, and Homology Search in GRAIL and genQuest

Ying Xu, Manesh Shah, J. Ralph Einstein, Sherri Matis, Xiaojun Guan, Sergey Petrov, Loren Hauser,1 Richard J. Mural,1 and Edward C. Uberbacher
Computer Science and Mathematics and 1Biology Divisions; Oak Ridge National Laboratory; Oak Ridge, TN 37831
Uberbacher: 423/574-6134, Fax: -7860, [email protected]
http://compbio.ornl.gov

GRAIL is a modular expert system for the analysis and characterization of DNA sequences which facilitates the recognition of gene features and gene modeling. A new version of the system has been created with greater sensitivity for exon prediction (especially in AT-rich regions), more accurate splice site prediction, and robust indel error detection capability. GRAIL 1.3 is available to the user in a Motif graphical client-server system (XGRAIL), through WWW-Netscape, by e-mail server, or callable from other analysis programs using Unix sockets.

In addition to the positions of protein coding regions and gene models, the user can view the positions of a number of other features including poly-A addition sites, potential Pol II promoters, CpG islands, and both complex and simple repetitive DNA elements using algorithms developed at ORNL. XGRAIL also has a direct link to the genQuest server, allowing characterization of newly obtained sequences by homology-based methods using a number of protein, DNA, and motif databases and comparison methods such as FastA, BLAST, parallel Smith-Waterman, and special algorithms which consider potential frameshifts during sequence comparison.

Following an analysis session, the user can use an annotation tool which is part of the XGRAIL 1.3 system to generate a “feature table” report describing the current sequence and its properties. Links to the GSDB sequence database have been established to record computer-based analysis of sequences during submission to the database or as third party annotation.

Gene Modeling and Client-Server GRAIL: In addition to the current coding region recognition capabilities based on a multiple sensor-neural network and rule base, modules for the recognition of features such as splice junctions, transcription and translation start and stop, and other control regions have been constructed and incorporated into an expert system (GAP III) for reliable computer-based modeling of genes. Heuristic methods and dynamic programming are used to construct first pass gene models which include the potential for modification of initially predicted exons. These actions result in a net improvement in gene characterization, particularly in the recognition of very short coding regions. Translation of gene models and database searches are also supported through access to the genQuest server (described below).

Model Organism Systems: A number of model organism systems have been designed and implemented and can be accessed within the XGRAIL 1.3 client, including Escherichia coli, Drosophila melanogaster, and Arabidopsis thaliana. The performance of these systems is basically equivalent to the Human GRAIL 1.3 system. Additional model organism systems, including several important microorganisms, are in progress.

Error Detection in Coding Sequences: Single-pass DNA sequencing is becoming a widely used technique for gene identification from both cDNA and genomic DNA sequences. An appreciably higher rate of base insertion and deletion errors (indels) in this type of sequence can cause serious problems in the recognition of coding regions, homology search, and other aspects of sequence interpretation. We have developed two error detection and “correction” strategies and systems which make low-redundancy sequence data more informative for gene identification and characterization purposes. The first algorithm detects sequencing errors by finding changes in the statistically preferred reading frame within a possible coding region and then rectifies the frame at the transition point to make the potential exon candidate frame-consistent. We have incorporated this system in GRAIL 1.3 to provide analysis which is very error tolerant. Currently the system can detect about 70% of the indels with an indel rate of 1%, and GRAIL identifies 89% of the coding nucleotides compared to 69% for the system without error correction. The algorithm uses dynamic programming and runs in time and space linear in the size of the input sequence.
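The notion of tracking the preferred reading frame with dynamic programming can be sketched as a three-state Viterbi pass. This is an illustration, not the GRAIL algorithm: the per-window frame scores are taken as given, and the fixed `switch_penalty` and both function names are assumptions.

```python
def best_frame_path(scores, switch_penalty):
    """scores[i][f] is a coding-preference score for frame f (0, 1, 2)
    at window i. Returns the highest-scoring sequence of frames."""
    best = [list(scores[0])]
    back = []
    for i in range(1, len(scores)):
        row, ptr = [], []
        for f in range(3):
            # Stay in frame f for free, or pay a penalty to switch into it.
            options = [best[-1][g] - (0 if g == f else switch_penalty)
                       for g in range(3)]
            g = max(range(3), key=options.__getitem__)
            row.append(options[g] + scores[i][f])
            ptr.append(g)
        best.append(row)
        back.append(ptr)
    frame = max(range(3), key=best[-1].__getitem__)
    path = [frame]
    for ptr in reversed(back):       # trace back from the last window
        frame = ptr[frame]
        path.append(frame)
    path.reverse()
    return path


def frame_transitions(path):
    """Window indices where the preferred frame changes (candidate indels)."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

Each window is visited once with a constant amount of work, so the pass is linear in the number of windows, in keeping with the linear bound quoted above.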

In the second method, a Smith-Waterman type comparison is facilitated in which the frame of DNA translation to protein sequence can change within the sequence. The transition points in the translation frame are determined during the comparison process and a best match to potential protein homologs is obtained with sections of translations from more than one frame. The algorithm can detect homologies with a sensitivity equivalent to Smith-Waterman in the presence of 5% indel errors.

Detection of Regulatory Regions: An initial Polymerase II promoter detection system has been implemented which combines individual detectors for TATA, CAAT, GC, cap, and translation start elements and distance information using a neural network. This system finds about 67% of TATA-containing promoters with a false positive rate of one per 35 kilobases. Additionally, systems to detect potential polyA addition sites and CpG islands have been incorporated into GRAIL.

The GenQuest Sequence Comparison Server: The genQuest server is an integrated sequence comparison server which can be accessed via e-mail, using Unix sockets from other applications, Netscape, and through a Motif graphical client-server system. The basic purpose of the server system is to facilitate rapid and sensitive comparison of DNA and protein sequences to existing DNA, protein, and motif databases. Databases accessed by this system include the daily updated GSDB DNA sequence database, SwissProt, the dbEST expressed sequence tag database, protein motif libraries and motif analysis systems (Prosite, BLOCKS), a repetitive DNA library (from J. Jurka), Genpept, and sequences in the PDB protein structural database. These options can also be accessed from the XGRAIL graphical client tool.

The genQuest server supports a variety of sequence query types. For searching protein databases, queries may be sent as amino acid or DNA sequence. DNA sequence can be translated in a user-specified frame or in all 6 frames. DNA-DNA searches are also supported. User-selectable methods for comparison include the Smith-Waterman dynamic programming algorithm, FastA, versions of BLAST, and the IBM dFLASH protein sequence comparison algorithm. A variety of options for search can be specified including gap penalties and option switches for Smith-Waterman, FastA, and BLAST, the number of alignments and scores to be reported, desired target databases for query, choice of PAM and Blosum matrices, and an option for masking out repetitive elements. Multiple target databases can be accessed within a single query.
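Translation of a DNA query in all six frames can be sketched as follows. The code uses the standard genetic code and is independent of the genQuest implementation; the frame labels `+1` to `-3` and the function names are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna, frame=0):
    """Translate from offset frame (0, 1, or 2); unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(dna):
    """Return translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {f"+{f + 1}": translate(dna, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(reverse, f) for f in range(3)})
    return frames
```

Each of the six protein strings could then be compared against a protein database; the comparison step itself is not shown.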

Additional Interfaces and Access: Batch GRAIL 1.3 is a new “batch” GRAIL client that allows users to analyze groups of short (300-400 bp) sequences for coding character and automates a wide choice of database searches for homology and motifs. A Command Line Sockets Client has been constructed which allows remote programs to call all the basic analysis services provided by the GRAIL-genQuest system without the need to use the XGRAIL interface. This allows convenient integration of selected GRAIL analyses into automated analysis pipelines being constructed at some genome centers. An XGRAIL Motif Graphical Client for the GRAIL release 1.3 has been constructed using Motif with versions for a wide variety of UNIX platforms including Sun, Dec, and SGI. The e-mail version of GRAIL can be accessed at [email protected], and the e-mail version of genQuest can be accessed at [email protected]. Instructions can be obtained by sending the word “help” to either address. The Motif or Sun versions of XGRAIL, batch GRAIL, and XgenQuest client software are available by anonymous ftp from grailsrv.lsd.ornl.gov (124.167.140.21). Both GRAIL and genQuest are accessible over the World Wide Web (URL http://compbio.ornl.gov). Communications with the GRAIL staff should be addressed to [email protected].


Informatics Support for Mapping in Mouse-Human Homology Regions

Edward Uberbacher, Richard Mural,1 Manesh Shah, Loren Hauser,1 and Sergey Petrov
Computer Science and Mathematics Division and 1Biology Division; Oak Ridge National Laboratory; Oak Ridge, TN 37831
423/574-6134, Fax: -7860, [email protected]

The purpose of this project is to develop databases and tools for the Oak Ridge National Laboratory (ORNL) Mouse-Human Mapping Project, including the construction of a mapping database for the project; tools for managing and archiving cDNAs and other probes used in the laboratory; and analysis tools for mapping, interspecific backcross, and other needs. Our initial effort involved installing and developing a relational SYBASE database for tracking samples and probes, experimental results, and analyses. Recent work has focused on a corresponding ACeDB implementation containing mouse mapping data and providing numerous graphical views of this data. The initial relational database was constructed with SYBASE using a schema modeled on one implemented at the Lawrence Livermore National Laboratory (LLNL) center; this choice was made because of the documentation available for the LLNL system and the opportunity to maximize compatibility with human chromosome 19 mapping. (Major homologies exist between human chromosome 19 and mouse chromosome 7, the initial focus of the ORNL work.)

Our ACeDB implementation was modeled, with some modification, on the Lawrence Berkeley National Laboratory (LBNL) chromosome 21 ACeDB system and designed to contain genetic and physical mouse map data as well as homologous human chromosome data. The usefulness of exchanging map information with LLNL (human chromosome 19) and potentially with other centers has led to the implementation of procedures for data export and the import of human mapping data into ORNL databases.

User access to the system is being provided by workstation forms-based data entry and ACeDB graphical data browsing. We have also implemented the LLNL database browser to view human chromosome 19 data maintained at LLNL, and arrangements are being made to incorporate mouse mapping information into the browser. Other applications such as the Encyclopedia of the Mouse, specific tools for archiving and tracking cDNAs and other mapping probes, and analysis of interspecific backcross data and YAC restriction mapping have been implemented.

We would like to acknowledge use of ideas from the LLNL and LBNL Human Genome Centers.


SubmitData: Data Submission to Public Genomic Databases

Manfred D. Zorn
Software Technologies and Applications Group; Information and Computing Sciences Division; Lawrence Berkeley National Laboratory; University of California; Berkeley, CA 94720
510/486-5041, Fax: -4004, [email protected]
http://www-hgc.lbl.gov/submitr.html

Making information generated by the various genome projects available to the community is very important for the researcher submitting data and for the overall project to justify the expenses and resources. Public genome databases generally provide a protocol that defines the required data formats and details how they accept data, e.g., sequences, mapping information. These protocols have to strike a balance between ease of use for the user and operational considerations of the database provider, but are in most cases rather complex and subject to change to accommodate modifications in the database.

SubmitData is a user interface that formats data for submission to GSDB or GDB. The user interface serves data entry purposes, checking each field for data types, allowed ranges, and controlled values, and gives the user feedback on any problems. Besides one-time submissions, templates can be created that can later be merged with TAB-delimited data files, e.g., as produced by common spreadsheet programs. Variables in the template are then replaced by values in defined columns of the input data file. Thus submitting large amounts of related data becomes as easy as selecting a format and supplying an input filename. This allows easy integration of data submission into the data generation process.
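The template merge described above can be illustrated with Python's standard library. The `$name` placeholder syntax, the column names, and the record layout are assumptions for the example and do not reflect the actual SubmitData, GSDB, or GDB formats.

```python
import csv
import io
from string import Template


def merge_template(template_text, tsv_text):
    """Fill one copy of the template per row of a TAB-delimited file
    whose first row names the columns."""
    template = Template(template_text)
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [template.substitute(row) for row in rows]


# Hypothetical template and spreadsheet export.
template_text = "clone=$clone; length=$length"
tsv_text = "clone\tlength\nc101\t350\nc102\t412\n"
records = merge_template(template_text, tsv_text)
```

Here `Template.substitute` raises `KeyError` when a template variable has no matching column, which gives a simple form of the feedback on problems that the text mentions.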

The interface is generated directly from the protocol specifications. A specific parser/compiler interprets the protocol definitions and creates internal objects that form the basis of the user interface. Thus a working user interface, i.e., static layout of buttons and fields, data validation, is automatically generated from the protocol definitions. Protocol modifications are propagated by simply regenerating the interface.
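To make the idea of generating validation from protocol definitions concrete, here is a minimal Python sketch. The one-line-per-field definition syntax, the field names, and the functions `parse_protocol` and `validate` are assumptions invented for this example; they do not reproduce the GSDB or GDB protocol formats or the SubmitData parser/compiler.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    name: str
    kind: str    # "int", "enum", or "text" in this toy syntax
    args: tuple  # range bounds for "int", allowed values for "enum"


def parse_protocol(text):
    """Turn a toy protocol definition into field objects."""
    specs = []
    for line in text.strip().splitlines():
        name, kind, *args = line.split()
        specs.append(FieldSpec(name, kind, tuple(args)))
    return specs


def validate(spec, value):
    """Return an error message, or None when the value is acceptable."""
    if spec.kind == "int":
        try:
            number = int(value)
        except ValueError:
            return f"{spec.name}: expected an integer"
        low, high = int(spec.args[0]), int(spec.args[1])
        if not low <= number <= high:
            return f"{spec.name}: {number} outside {low}..{high}"
    elif spec.kind == "enum":
        if value not in spec.args:
            return f"{spec.name}: {value!r} not in {list(spec.args)}"
    return None  # "text" fields accept any value


# Hypothetical protocol definition: field name, type, constraints.
PROTOCOL = """
chromosome int 1 22
strand enum + -
comment text
"""

fields = parse_protocol(PROTOCOL)
errors = [validate(f, v) for f, v in zip(fields, ["19", "x", "free text"])]
print(errors)
```

Because the checks are derived from the parsed definitions rather than written by hand, a change to the protocol text changes the validation the next time it is parsed, which is the propagation property the abstract describes.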

The program has been developed using ParcPlace VisualWorks and currently supports GSDB, GDB, and RHdb data submissions. The program has been updated to use VisualWorks 2.0.


Ethical, Legal, and Social Issues


The Human Genome: Science and the Social Consequences; Interactive Exhibits and Programs on Genetics and the Human Genome

Charles C. Carlson
The Exploratorium; San Francisco, CA 94123
415/561-0319, Fax: -0307; [email protected]

From April through September 1995, the Exploratorium mounted a special exhibition called Diving into the Gene Pool consisting of 26 interactive exhibits developed over the course of three years. The exhibits introduce the science of genetics and increase public awareness of the Human Genome Project and its implications for society. Building on the success of exhibits developed for the 1992 genetics and biotechnology symposium “Winding Your Way Through DNA” (co-hosted with the University of California, San Francisco), the 1995 exhibition aimed to create an engaging and accessible presentation of specific information about genetic science and our understanding of the structure and function of the human genome, genetic technology, and ethical issues surrounding current genetic science.

In addition to creating a unique collection of exhibits, the project developed a range of supplemental public programming to provide a public forum for discussion and interaction about genetics and bioethics. A lecture series entitled “Bioethics and the Human Genome Project” featured such key thinkers as Mary-Claire King, Leroy Hood, David Martin, Troy Duster, Michael Yesley, William Atchley, and Joan Hamilton (among others). A weekend event program focused on biodiversity in animal and plant life with events such as “Seedy Science,” “Blooming Genes,” and “Dog Diversity.” A Biotech Weekend offered access to new technologies through demonstrations by local biotech firms and genetic counselors. And a specially commissioned theatre piece, “Dog Tails,” provided an instructive and comic look for kids into the foundations of genetics and issues of diversity.

In the 5-month exhibition period, approximately 300,000 visitors had the opportunity to visit the exhibition, and well over 5,000 participated in the special programming. Following the exhibition’s close, the new exhibits will become a permanent part of the Exploratorium’s collection of over 650 interactive exhibits.

Additional funding for 1995-96 will support formal outside evaluation of the effectiveness of the exhibits and exhibit remediation based on the evaluation findings. This activity will both strengthen the Exploratorium’s permanent collection of genetics exhibits and help to develop a feasibility study for a travelling version of the genetics exhibition for other museums around the country and the world.


Documentary Series for Public Broadcasting

Graham Chedd and Noel Schwerin
Chedd-Angier Production Company; Watertown, MA 02172
617/926-8300, Fax: -2710

Designed as a 4-hour documentary series for Public Broadcasting, Genetics in Society (working title) will explore the ethical, legal, and social implications of genetic technology. The first program, a 90-minute special (Testing Family Ties) that is currently funded and in production, profiles several individuals and families as they confront genetic tests and the information they generate. One high-risk cancer family struggles to make sense of its genetic legacy as it debates prophylactic surgery and whether or not to test for BRCA1 and BRCA2. In a family without that family risk, news of the Ashkenazi BRCA1 finding pushes an anxious Jewish woman to demand testing for herself and her young daughter. In another, a woman chooses to carry to term her twins, prenatally diagnosed with cystic fibrosis, despite social and personal pressures. In a third, a scientist researching the so-called “obesity gene” at a biotech company debates the proper “marketing” of his research and confronts the larger questions it raises about what should be considered “normal” and what constitutes therapy vs. enhancement.

Testing Family Ties will explore not only what genetic technology does (in testing, drug development, and potential therapy) but what it means to our sense of self, family, and future and to our concepts of health and normality.

Depending on outstanding funding requests, Genetics in Society will be broadcast in the Fall of 1996 or the Winter of 1997 on PBS. Noel Schwerin is Producer/Director. Graham Chedd is Executive Producer.


Human Genome Teacher Networking Project

Debra L. Collins and R. Neil Schimke
Genetics Education Center; Division of Endocrinology and Genetics; University of Kansas Medical Center; Kansas City, KS 66160-7318
913/588-6043, Fax: -4060, [email protected]
http://www.kumc.edu/GEC

This project links over 150 middle and secondary teachers from throughout the United States with genetic and public policy professionals, as well as families who are knowledgeable about the ethical, legal, and social implications (ELSI) of the Human Genome Project. Teachers network with peers and professionals, and acquire new sources of information during four phases: 1) the first one-week summer workshop to update teachers on human genetics concepts and new sources for classroom curricula including online resources; 2) classroom use of new materials and information; 3) the second one-week summer workshop where teachers return to exchange successful teaching ideas and plan peer teaching sessions and mentor networking; 4) dissemination of genetic information through in-services and workshops for colleagues, and collaboration with genetic professionals participating in our Mentor Network.

The applications of Human Genome Project technology are emphasized. Individuals who have contact and experience with patients, including clinical geneticists, genetic counselors, attorneys, laboratory geneticists, and families, take part in didactic sessions with teachers. Throughout the workshop, family panels provide an opportunity for participants to compare their textbook-based knowledge of genetic conditions with the personal experiences of families who discuss their condition, including diagnosis, treatment, genetic risk, decisions, insurance, employment, family planning, and confidentiality.

Because of this project, teachers feel more prepared and confident teaching about human genetics, the Human Genome Project, and ELSI topics. The teachers are effective in disseminating knowledge of genetics to their students, who show a significant increase in human genome knowledge compared to students whose teachers have not participated in this project.

Teacher dissemination activities extend the project beyond participation at summer workshops. To date, 55 workshop participants have completed all four project phases by organizing more than 200 local, regional, and national teacher education programs to disseminate knowledge and resources. More than 1500 colleagues and members of the general public have participated in teacher workshops, and over 56,000 students have been reached through project participants and their peers.

The project participants organize interdisciplinary peer teaching sessions including bioethical decision making sessions combining debate and biology classes; sessions for social studies teachers; human genetics and multi-cultural collaborations; cooperative learning activities; and curricular development sessions. Students were involved in sessions on ethics, politics, economics, and law. Teachers organize bioethics curriculum writing sessions, laboratory activities using electrophoresis as well as other biotechnology, and sessions on genetic databases.

A World Wide Web home page for Genetics Education assists teachers in remaining current on genetic information and helps them find answers to student inquiries. The home page has links to numerous genome sites, sources of information on genetic conditions, networking opportunities with other genetics education programs, teaching resources, lesson plan ideas, and the Mentor Network of genetic professionals and a network of family support groups willing to work with teachers and their students.


Human Genome Education Program

Lane Conn
Human Genome Education Program; Stanford Human Genome Center; Palo Alto, CA 94304
415/812-2003, Fax: -1916, [email protected]

The Human Genome Education Program (HGEP) operates within the Stanford Human Genome Center. It is a collaborative effort among HGEP staff, Genome Center scientists, collaborating staff from other education programs, experienced high school teachers, and an Advisory Panel in the fields of science, education, social science, assessment, and ethics.

The Human Genome Project will have a profound impact on society with its applications in testing for and improving treatment of genetic disease and the many uses of DNA profiling. The goal of HGEP is to help prepare high school students and community members to be able to make educated decisions on the personal, ethical, social, and policy questions raised by the application of genome information and technology in their lives.

The primary objectives for HGEP are to (1) develop a human genome curriculum for high school science and (2) conduct education outreach to schools and community groups in the San Francisco Bay Area. To achieve Objective 1, the HGEP is working to develop, field test, and prepare for national dissemination two laboratory-based curriculum units for high school students. Unit 1, “Dealing With Genetic Disorders,” explores the variety of treatment options potentially available for a genetic disorder, including gene therapy. Unit 2, “DNA Snapshots, Peeking at Your DNA,” explores human relatedness through examining the student’s own DNA polymorphisms using PCR.

Each unit is centered around a societal or ethical problem raised by these important applications of genome information and technology. Students use modeling exercises and inquiry laboratory experiments to learn about the science behind a given application. Students then combine the science they have learned with other relevant information to choose a solution to the societal/ethical problem posed in the unit. As a culminating activity, the students work in groups to present and defend their solution.

To achieve Objective 2, the HGEP provides Genome Center tours for teacher, student, and community groups that involve pre-tour lectures; tour exploration of genome mapping, sequencing, and informatics; and post-tour lecture and discussion on genome applications and their social and ethical implications. Also, the education program continues to work to establish and sustain local science education partnerships among schools, industry, universities, and national laboratories.


Your World/Our World–Biotechnology & You: Special Issue on the Human Genome Project

Jeff Davidson and Laurence Weinberger
Pennsylvania Biotechnology Association; State College, PA 16801
814/238-4080, Fax: -4081, [email protected]

Your World/Our World is a biotechnology science magazine published semi-annually by the non-profit Pennsylvania Biotechnology Association (PBA) describing for seventh to tenth grade students the excitement and achievements of contemporary biotechnology. This is the only continuing source of biotechnology education specifically directed to this age group, an age at which students too frequently are turned off from science. The special Spring 1996 issue will be devoted to the presentation of the science behind the HGP, the HGP itself, and the ethical, legal, and social issues generated by the project. The strong emphasis on attractive graphic presentation and age-appropriate text that has been the hallmark of the earlier issues, which have been highly acclaimed and well received by the educational, scientific, and business community, will be continued.

PBA believes that increased educational opportunities to learn about biotechnology are most effective if presented at the seventh to tenth grade levels for the following reasons:

• Full semester life science and biology classes often occur for the first time in these grades;

• Across the nation, textbooks are typically 10 to 14 years old, and even the most recent textbooks are quickly dated by the rapid development in the biological sciences;

• Curricula at this level are more flexible than high school curricula, allowing the addition of information about exciting biological developments; and

• Science at this level is generally not elective, and, therefore, a very comprehensive student population is addressed rather than the more selective populations available later in the educational program.

In creating Your World/Our World, the PBA defined the following educational goals to guide the development of the magazine:

• Contribute to general science literacy and an educated electorate;

• Contribute to biological and technological literacy; and

• Motivate students to pursue additional science study and careers in science, particularly among women and minority populations.

PBA recognizes that it has been a point of pride that biotechnologists have been uniquely concerned with the impact of their technology on society and have been the first to raise and encourage responsible public debate without being forced to do so by others. To do less now for the children would be a breach of this responsible history. Accordingly, this special HGP issue will address the ethical, legal, and social issues raised by the new genomic technologies. Special ethics advisors have been recruited to aid in the development of these aspects.

A complimentary copy of the special issue and its teacher’s guide will be mailed to every public and private school seventh to tenth grade science teacher (approximately 40,000) in the United States. A cover announcement will explain the origin and development of the magazine and of the special edition. Teachers will be invited to purchase full classroom packets (30 copies and teacher’s guide) from the PBA, but, if they are not able to afford the packets, they will be asked to respond by postcard indicating their interest. The cost of the packets will probably be in the $20 range. The PBA is actively seeking additional support so that the issue may be distributed for free or at a reduced cost. In addition, parts of the special issue will be available over the Internet via a World Wide Web page.

PBA believes this is a unique opportunity to educate America’s youth about the HGP and ensure that accurate, non-sensational information will be made available to our country’s children.


The Human Genome Project and Mental Retardation: An Educational Program

Sharon Davis
Department of Research and Program Services; The Arc of the United States; Arlington, TX 76010
817/261-6003, Fax: /277-3491, [email protected]
http://TheArc.org/welcome.html

The Arc of the United States, a national organization on mental retardation with 140,000 members and more than 1000 affiliated chapters, proposes to educate its general membership and volunteer leaders about the Human Genome Project as it relates to mental retardation. A large number of identified causes of mental retardation are genetic, and many family members of The Arc deal with issues related to a genetic condition on a daily basis. We believe it is critical for our members and leaders to be educated about the scientific and ethical, legal, and social aspects of the HGP, so that the association can evaluate and discuss the issues and develop positions based on adequate knowledge.

The major objectives of the proposed three-year project are to develop and disseminate educational materials for members/leaders of The Arc to inform them about the Human Genome Project and mental retardation and to conduct training on the scientific and ethical, legal, and social aspects of the Human Genome Project and mental retardation using The Arc’s existing training vehicles.

The Arc will develop and disseminate educational materials oriented toward families and conduct training at its national and state conventions, local chapter meetings, and board of directors’ meetings. The American Association of University Affiliated Programs for Persons with Developmental Disabilities (AAUAP) will assist with the project by providing needed expertise. The AAUAP membership includes university faculty who are experts on the genetic causes of mental retardation and on related ethical, legal, and social issues. An advisory panel of university scientists and leaders of The Arc will guide the project.


Pathways to Genetic Screening: Molecular Genetics Meets the High-Risk Family

Troy Duster and Diane Beeson¹
Institute for the Study of Social Change; University of California; Berkeley, CA 94705
510/642-0813, Fax: /8674, [email protected]
¹Department of Sociology; California State University; Hayward, CA 94542

The proliferation of genetic screening and testing is requiring increasing numbers of Americans to integrate genetic knowledge and interventions into their family life and personal experience. This study examines the social processes that occur as families at risk for two of the most common autosomal recessive diseases, sickle cell disease (SC) and cystic fibrosis (CF), encounter genetic testing. Since each of these diseases is found primarily in a different ethnic/racial group (CF in European Americans and SC in African Americans), this research will clarify the role of culture in integrating genetic testing into family life and reproductive planning. A third type of genetic disorder, the thalassemias, has recently been added to our sample in order to extend our comparative frame to include other ethnic and racial groups. In California, the thalassemias primarily affect Southeast Asian immigrants, although another risk group is from the Mediterranean region. The thalassemias have a pattern of inheritance similar to that of cystic fibrosis and sickle cell disease and raise similarly serious bio-medical challenges and issues of information management.

Data are drawn from interviews with members of families in which a gene for CF, SC, or thalassemia has been identified. Data collection consists primarily of focused interviews with approximately 400 individuals from families in which at least one member has been identified as having a genetic disorder (or trait). In the most recent phase of the research, we are conducting focus groups selected to achieve stratified homogeneity around key social dimensions such as gender and relationship to disease. This is clarifying the social processes that facilitate and inhibit genetic testing.

We are currently assessing the concerns expressed by respondents about the potential uses of genetic information. We find strong patterns of concern, often based on personal experience, that genetic information may be used in ways that family members perceive as dangerous and/or discriminatory. First among these concerns is fear of losing access to health care. Additional concerns include fear of genetic discrimination in employment and other types of insurance, particularly life insurance. Similar patterns of concern exist among members of each ethnic group, and are frequently the focus of attention among family members, but take somewhat different form within each cultural group. These concerns constitute a growing obstacle to widespread use of genetic testing.


Intellectual Property Issues in Genomics

Rebecca S. Eisenberg
University of Michigan Law School; Ann Arbor, MI 48109
313/763-1372, Fax: -9375, [email protected]

Intellectual property issues have been uncommonly salient in the recent history of advances in genomics. Beginning with the filing of patent applications by NIH on the first batch of expressed sequence tags (ESTs) from the laboratory of Dr. Craig Venter, each new development has been met with speculation about its strategic significance from an intellectual property perspective. Are ESTs of unknown function patentable, or is further work necessary before they satisfy patent law standards? Will patents on such fragments promote commercial investment in product development, or will they interfere with scientific communication and collaboration and retard the overall research effort? Without patent rights, how may the owners of private cDNA sequence databases earn a return on their investment while still permitting other investigators to obtain access to the information on reasonable terms? What are the rights of those who contribute resources such as cDNA libraries that are used to create the databases, and of those who identify sequences of interest out of the morass of information in the databases by formulating appropriate queries? Will the disclosure of ESTs in the public domain preclude patenting of subsequently characterized full-length genes and gene products? And why would a commercial firm invest its own resources in generating an EST database for the public domain?

Two factors have contributed to the fascination with intellectual property in this setting. First is a perception that some pioneers in genomics have sought to claim intellectual property rights that reach beyond their actual achievements to cover future discoveries yet to be made by others. For example, the controversial NIH patent applications claimed rights not only in the ESTs that were actually set forth in the specifications, but also in the full-length cDNAs that might be obtained by using the ESTs as probes, as well as in other, undisclosed fragments of those genes. More recently, private owners of cDNA sequence databases have required, as a condition for access, that users agree to offer the database owners licenses to any resulting intellectual property. These efforts to claim rights to the future discoveries of others raise issues about the fairness and efficiency of the law in allocating rewards and incentives along the path of cumulative innovation.

Second is the counterintuitive alignment of interests in the debate. It was a public institution, NIH, that initially favored patenting discoveries that some representatives of industry thought should remain unpatented, and it was a major pharmaceutical firm, Merck & Co., that ultimately took upon itself the quasi-governmental function of sponsoring a university-based effort to place comparable information in the public domain. These topsy-turvy positions in the public and private sectors raise intriguing questions about the proper roles of government and industry in genomics research, and about who stands to benefit (and who stands to lose) from the private appropriation of genomic information.


AAAS Congressional Fellowship Program

Stephen Goodman
The American Society of Human Genetics; Bethesda, MD 20814-3998
301/571-1825, Fax: /530-7079, [email protected]

Few individuals in the genetics community are conversant with federal mechanisms for developing and implementing policy on human genetics research. In 1995 the American Society of Human Genetics (ASHG), in conjunction with DOE, initiated an American Association for the Advancement of Science (AAAS) Congressional Fellowship Program to strengthen the dialogue between the professional genetics community and federal policymakers. The fellowship will allow genetics professionals to spend a year as special legislative assistants on the staff of members of Congress or on congressional committees. Directed toward productive scientists, the program is intended to attract independent investigators.

In addition to educating the scientific community about the public policy process, the fellowship is expected to demonstrate the value of science-government interactions and make practical contributions to the effective use of scientific and technical knowledge in government. The program includes an orientation to legislative and executive operations and a year-long weekly seminar on issues involving science and public policy.

Unlike similar government programs, this fellowship is aimed primarily at scientists outside government. It emphasizes policy-oriented public service rather than observational learning and designates its fellows as free agents rather than representatives of their sponsoring societies.

One of the goals of DOE and ASHG is to develop a group of nongovernmental professionals who will be equipped to deal with issues concerning human genetics policy development and implementation, particularly in the current environment of health-care reform and managed care. Graduates of this program will serve as a resource for consultation in the development of public-health policy concerning genetic disease.

Fellowship candidates must demonstrate exceptional basic understanding of and competence in human genetics; hold an earned degree in genetics, biology, life sciences, or a similar field; have a well-grounded and appropriately documented scientific and technical background; have a broad professional background in the practice of human genetics as demonstrated by national or international reputation; be cognizant of related nonscientific matters that impact on human genetics; exhibit sensitivity toward political and social issues; have a strong interest and some experience in applying personal knowledge toward the solution of social problems; be a member of ASHG; be articulate, literate, adaptable, and interested in working on long-range public policy problems; be able to work with a variety of people of diverse professional backgrounds; and function well during periods of intense pressure.

The first fellow is working in the office of Senator Wellstone, Democrat from Minnesota, and devoting most of his time to studying and commenting on health-care and science issues.


A Hispanic Educational Program for Scientific, Ethical, Legal, and Social Aspects of the Human Genome Project

Margaret C. Jefferson and Mary Ann Sesma¹
Department of Biology and Microbiology; California State University; Los Angeles, CA 90032
213/343-2059, Fax: -2095, [email protected]
http://vflylab.calstatela.edu/hgp
¹Los Angeles Unified School District

The primary objectives of this grant are to develop, implement, and distribute culturally competent, linguistically appropriate, and relevant curriculum that leads to Hispanic student and family interactions regarding the scientific, ethical, legal, and social issues of the Human Genome Project. By opening up channels of familial dialogue between parents and their high school students, entire families can be exposed to genetic health and educational information and opportunities. In addition, greater interaction is anticipated between students and teachers, and parents and teachers. In the Los Angeles Unified School District alone, over 65% of the approximately 850,000 enrolled students are bilingual Hispanics. The 1990 census data revealed that the U.S.A. had a total population of 248,709,873, of which 22,354,059 were Hispanics, and thus there is a need for materials to be disseminated throughout the U.S.A. that are relevant and understandable to this population.

Student curriculum consists of BSCS HGP-ELSI curriculum available in both English and Spanish; supplemental lesson plans developed and utilized by high school teachers in predominantly Hispanic classrooms that will be available via the World Wide Web; student-developed surveys that ascertain knowledge and perceptions of genetics and HGP-ELSI in Hispanic and other ethnic communities in the greater Los Angeles area; the University of Washington High School Human Genome Program exercises on DNA synthesis and sequencing; and career ladders and opportunities in genetics. The supplemental lesson plans are focused on four major units: the Cell; Mendelian Genetics and its Extensions; Molecular Genetics; and the Human Genome Project and ELSI. The concise concepts underlying each unit are being utilized in two ways. (a) First, the student activities emphasize logical, problem-solving exercises; tools or technologies applicable to that concept; when and where appropriate, a focus on the Hispanic population; and an understanding of the problems of, and compassion for, families learning of genetic diseases. (b) Second, the concepts serve as the springboard for the topics that the students include in science newsletters to their parents. In addition to on-campus activities, we intend to arrange field trips and/or classroom demonstrations of genetic and molecular biology techniques by scientists and other experts. The speakers would also be asked to discuss career opportunities and the educational requirements needed to enter the specific careers presented.

The parent curriculum consists of two major activities. First, the student-parent newsletter is designed to draw the parents into the curriculum. Students write newsletters on a biweekly basis. Each newsletter relates to a student curriculum subunit and the specific subunit concepts. English, Spanish, and social science teachers, as well as biology and chemistry teachers, assist the students in its production. The other major activity that involves the parents is the parent focus groups. Parents from each participating school are invited to monthly focus groups at their specific campus. The focus groups discuss issues related to genetics and health, legal and social issues, and science issues that stem from the student newsletters. The discussions are in both English and Spanish with translators available. Links with other programs have been established.


Implications of the Geneticization of Health Care for Primary Care Practitioners

Mary B. Mahowald, John Lantos, Mira Lessick, Robert Moss, Lainie Friedman Ross, Greg Sachs, and Marion Verp
Department of Obstetrics and Gynecology and MacLean Center for Clinical Medical Ethics; University of Chicago; Chicago, IL 60637
312/702-9300, Fax: -0840, [email protected]
http://ccme-mac4.bsd.uchicago.edu/CCMEHomePage.html

“Geneticization” refers to the process by which advances in genetic research are increasingly applicable to all areas of health care.¹ Studies show that primary caregivers are often deficient in their knowledge of genetics and genetic tests, and the ethical, legal, and social implications of this knowledge.²⁻⁶ Accordingly, this project prepares primary caregivers who have no special training in genetics or genetic counseling to deal with the implications of the Human Genome Project for their practice.

Phase I (fall 1995): Generic topics will be addressed by PI and Co-PIs with Robert Wood Johnson clinical scholars and clinical ethics fellows, led by visiting or internal experts.

Topics: Goals, Methods, & Achievements of the HGP; Typology of Genetic Conditions; Scientific, Clinical, Ethical, and Legal Aspects of Gene Therapy; Concepts of Disease; Genetic Disabilities; Gender and Socio-economic Differences; Cultural and Ethnic Differences; Directive or Nondirective Genetic Counseling.

Speakers: Jeff Leiden; Julie Palmer; Dan Brock; Anita Silvers; Abby Lippman; James Bowman; Beth Fine

Phase II (Jan.–Mar. 1996): Teams of individuals, all trained in the same area of primary care, will identify and address issues specific to their area, developing course outlines, bibliography, and methodology based on grand rounds given by a national expert.

Primary Care Area

Pediatrics: Genetics expert: Stephen Friend; Ethics expert: Lainie F. Ross + fellow
Obstetrics/Gynecology: Genetics expert: Joe Leigh Simpson; Ethics expert: Marion Verp + fellow
Medicine: Genetics expert: Tom Caskey; Ethics expert: Greg Sachs + fellow
Family medicine: Genetics expert: Noralane Lindor; Ethics expert: Robert Moss + fellow
Nursing: Genetics expert: Mira Lessick; Ethics expert: Colleen Scanlon + fellow

Phase III (Apr.–May 1996): Policy issues will be identified and addressed as above for all areas of primary care, based on grand rounds given by a national expert.

Policy team: Genetics expert: Sherman Elias; Ethics expert: John Lantos + trainee

Phase IV (Oct.–Dec. 1996): Presentation of the content developed to a new group of fellows and scholars by each of the above teams, followed by evaluation and revision.

Phase V (spring 1997): NATIONAL CONFERENCE and CME/CNE WORKSHOPS for primary caregivers, keynoted by Victor McKusick.


References

1. Lippman, A., Prenatal genetic testing and screening, Amer J Law & Med XVII, 15-50 (1991).
2. Hofman, K.J., Tambor, E.S., Chase, G.A., Geller, G., Faden, R.R., and Holtzman, N.A., Physicians’ knowledge of genetics and genetic tests, Acad Med 68, 625-32 (1993).
3. Holtzman, N.A., The paradoxical effect of medical training, J Clin Ethics 2, 241-42 (1992).
4. Forsman, I., Education of nurses in genetics, Amer J of Hum Genetics 552-58 (1988).
5. Williams, J.D., Pediatric nurse practitioners’ knowledge of genetic disease, Ped Nursing 9, 119-21 (1983).
6. George, J.B., Genetics: Challenges for nursing education, J Ped Nursing 7, 5-8 (1992).

Nontraditional Inheritance: Genetics and the Nature of Science; Instructional Materials for High School Biology

Joseph D. McInerney and B. Ellen Friedman
Biological Sciences Curriculum Study; Colorado Springs, CO 80918
719/531-5550, Fax: -9104, [email protected]

There often is a gap between the public’s and scientists’ views of new research findings, particularly if the public’s understanding of the nature of science is not sound. Large quantities of new evidence and consequent changes in scientific explanations, such as those associated with the Human Genome Project and related genetics research, can accentuate those different views. Yet an appealing secondary effect of the unusually fast acquisition of data is that our view of genetics is changing rapidly during a brief time period, a relatively recent phenomenon in the field of biological sciences. This situation provides an outstanding opportunity to communicate the nature and methods of science to teachers and students, and indirectly to the public at large. The immediacy of new explanations of genetic mechanisms lets nontechnical audiences actually experience a changing view of various aspects of genetics, and in so doing, gain an appreciation of the nature of science that rarely is felt outside of the research laboratory.

The Biological Sciences Curriculum Study (BSCS) is developing a curriculum module that brings this active view of the nature and methods of science into the classroom via examples from recent discoveries in genetics. We will distribute this print module free of charge to interested high school biology teachers in the United States.

The examples selected for classroom activities include the instability of trinucleotide repeats as an explanation of genetic anticipation in Huntington disease and myotonic dystrophy, and the more widespread genetic mechanism of extranuclear inheritance, illustrated by mitochondrial inheritance. Background materials for teachers discuss a wider range of phenomena that require nontraditional views of inheritance, including RNA editing, genomic imprinting, transposable elements, and uniparental disomy. The genetics topics in the module share the common characteristic that they are not adequately explained by the traditional, Mendelian concepts that are taught in introductory biology at the high school level. In addition to updating the genetics curriculum and communicating the nature of science, the module devotes one activity to the ethical and social aspects of new genetics discoveries by challenging students to consider the current reluctance to test asymptomatic minors for the presence of the HD gene.

The major challenge we have faced in this project is to make relatively technical genetics information accessible to high school teachers and students and to turn the often passive treatment of scientific processes into an active experience that helps students develop an understanding and appreciation of the nature and methods of science. The module is being field tested in classrooms across the country. Evaluation data from the field test will guide final revision of the module prior to distribution.


The Human Genome Project: Biology, Computers, and Privacy: Development of Educational Materials for High School Biology

Joseph D. McInerney, Lynda B. Micikas, and B. Ellen Friedman
Biological Sciences Curriculum Study; Colorado Springs, CO 80918
719/531-5550, Fax: -9104, [email protected]

One of the challenges faced by the Human Genome Project (HGP) is to handle effectively the enormous quantities and types of data that emerge as a result of progress in the project. The informatics aspect of the HGP offers an excellent example of the interdependence of science and technology. In addition, the electronic storage of genomic information raises important questions of ethics and public policy, many revolving around privacy.

The Biological Sciences Curriculum Study (BSCS) addresses the scientific, technological, ethical, and policy aspects of genome informatics in the instructional program titled The Human Genome Project: Biology, Computers, and Privacy. The program, intended for use in high school and college biology, consists of software and a 150-page print module. The software includes two model databases: a research database housing anonymous data (map data, sequence data, and biological/clinical information) and a registry that attaches names of 52 fictitious individuals (three kindreds) to genomic data. Students manipulate the database software as they work through seven classroom inquiries described in the print material. Also included are 50 pages of background material for teachers.
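The split between an anonymous research database and a name-bearing registry can be pictured with a small sketch. This is only an illustration under stated assumptions, not the BSCS software: the field names, record identifiers, locus labels, and the two example individuals are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResearchRecord:
    """One anonymous entry in the research database (hypothetical fields)."""
    record_id: str      # anonymous key; no personal name is stored here
    map_locus: str      # map data
    sequence: str       # sequence data
    clinical_note: str  # biological/clinical information


# Research database: anonymous data only (made-up placeholder values).
research_db: Dict[str, ResearchRecord] = {
    "R001": ResearchRecord("R001", "locus-A", "CAGCAGCAGCAG", "note 1"),
    "R002": ResearchRecord("R002", "locus-A", "CAGCAG", "note 2"),
}

# Registry: the only place where a name is attached to a genomic record.
registry: Dict[str, str] = {
    "Pat Example": "R001",
    "Lee Sample": "R002",
}


def lookup_by_name(name: str) -> Optional[ResearchRecord]:
    """Return the genomic record for a named person, or None if unknown.

    The lookup has to pass through the registry, so access to the
    research database alone never reveals whose data a record holds.
    """
    record_id = registry.get(name)
    if record_id is None:
        return None
    return research_db.get(record_id)
```

Keeping the name-to-record mapping in a separate structure is what makes the privacy question concrete: who may read the registry becomes a policy choice that is independent of who may read the research data.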

An introductory activity lets students become familiar with the software and dramatically demonstrates the advantages of technology in analysis of sequence data. In activities 1 and 2, students use the database to construct pedigrees and make initial choices about privacy with regard to genetic tests for their fictitious person. Activity 3 expands genetic anticipation, and in activities 4 and 5, students deal in depth with decision-making, ethics, and public policy, revisiting their earlier decision about testing and data accessibility. A final extension activity shows how comparisons with genomic data can be used to test hypotheses about the biological relationships between individual humans and about the evolutionary significance of DNA sequence similarities between different species.

External reviews and evaluation data from a field test involving 1,000 students in schools across the United States were used to guide final revision of the materials. BSCS will distribute the module free of charge to more than 10,000 high school and college biology teachers.


Involvement of High School Students in Sequencing the Human Genome

Maureen M. Munn, Maynard V. Olson, and Leroy Hood
Department of Molecular Biotechnology; University of Washington; Seattle, WA 98195
206/616-4538, Fax: /685-7344, [email protected]

For the past two years, we have been developing a program that involves high school students in the excitement of genetic research by enabling them to participate in sequencing the human genome. This program provides high school teachers with the proper training, equipment, and support to lead their students through the exercise of sequencing small portions of DNA. The participating classrooms carry out two experimental modules, DNA synthesis (an introduction to DNA replication and the techniques used to study it) and DNA sequencing. Both of these experiments consist of three parts: synthesizing DNA fragments using Sequenase and a biotin-labeled primer, benchtop electrophoresis using denaturing polyacrylamide gels, and colorimetric DNA detection that is specific for the biotinylated primer. Students analyze their sequencing data and enter the results into a DNA assembly program. This year, in collaboration with Eric Lynch and Mary-Claire King from the Department of Genetics at the University of Washington, the students will be sequencing a region of chromosome 5q that may be involved in a form of hereditary deafness.
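The core idea behind a DNA assembly program, joining reads that share overlapping ends, can be sketched in a few lines. This is a minimal illustration, not the program used in the classrooms: the function names, the `min_overlap` threshold, and the example reads are hypothetical, and real assemblers also handle sequencing errors and reverse complements.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches.
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads on their exact overlap, or return None if none exists."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep all of `left`, then append the part of `right` past the overlap.
    return left + right[size:]


if __name__ == "__main__":
    # Two made-up reads whose ends share the bases GTAC.
    print(merge_reads("ATGCGTAC", "GTACCTGA"))
```

Requiring a minimum overlap guards against joining reads on a chance match of one or two bases; the value 3 is only suitable for toy examples this short.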

Students also consider the ethical, legal and social issues (ELSI) of genome research in a unit that explores the topic of presymptomatic testing for Huntington’s disease (HD). This module was developed by Sharon Durfy and Robert Hansen from the Department of Medical History and Ethics at the University of Washington. It provides a scenario about a family that carries the HD allele, descriptions of the clinical and genetic aspects of the disorder, an exercise in drawing pedigrees and an autoradiograph showing the PCR assay used to detect HD. Students use an ethical decision-making model to decide whether, as a character from the scenario, they would be tested presymptomatically for the HD allele. Through this experience, they develop the skills to define ethical issues, ask and research the relevant questions about a particular topic and make justifiable ethical decisions.


In the first two years of this program, our focus was on the development of robust, classroom-friendly modules that can be presented in up to six classes at one time. This year we will focus on disseminating this program to local, regional, and national sites. During a week-long workshop in July 1995, we trained an additional thirteen high school teachers, bringing our current number to twenty teachers at thirteen schools. We have recruited local scientists to act as mentors to each of the schools and provide classroom support. On the regional level, four of our teachers are from outside the greater Seattle area and will be supported during the classroom experiments by scientists in their region. We have presented this program at national meetings and workshops, including the Human Genome Teacher Networking Project Workshop in Kansas City, KS (June 1995) and the meeting of the National Association of Biology Teachers in Phoenix, AZ (October 1995). We have also distributed our modules to teachers and scientists throughout the nation to encourage the development of similar programs. This year we will also develop and pilot a module using automated sequencing. This will enable distant schools to participate in the program by providing them with the option of sending their DNA samples to the UW genome center for electrophoresis.

While we hope the human genome sequencing experience will interest some students in science careers, a broader goal is to encourage high school students to think constructively and creatively about the implications of scientific findings so that the coming generation of adults will make judicious decisions affecting public policies.


The Gene Letter: A Newsletter on Ethical, Legal, and Social Issues in Genetics for Interested Professionals and Consumers

Philip J. Reilly, Dorothy C. Wertz, and Robin J.R. Blatt¹
The Shriver Center for Mental Retardation; Division of Social Science, Ethics and Law; Waltham, MA 02254
617/642-0230, Fax: /893-5340, [email protected] at Massachusetts Department of Public Health, Boston, MA
http://www.shriver.org

We propose to develop a newsletter on ELSI-related issues for dissemination to a broad general audience of professionals and consumers. No such focused public newsletter currently exists. Entitled The Gene Letter, the newsletter will be distributed monthly on-line, through the Internet. Updated weekly on the Internet, it will be poised to react in a timely fashion to new developments in science, law, medicine, ethics, and culture. The newsletter does not propose to provide comprehensive education in genetics for the American public, but rather to begin an information network that interested people can use for further information. It will be the most widely distributed newsletter on ELSI genetics in the world, with the largest consumer readership. Features will be largely informational and will include new scientific/medical developments and attendant ELSI issues, new court decisions, legislation, and regulations, balanced responses to new concerns in the media, and new developments related to health that may be of interest to health care providers and consumers. Features will present balanced opinions. An editorial board will review each issue, prior to publication, for cultural sensitivity, emphasis, balance, and concerns of persons with disabilities. The Gene Letter will also include factual information on upcoming events, new ELSI research, where to find genetics on the Internet, new publications (annotated), and where to find further information about each feature. Readers will be invited to send letters, queries, news, bibliography, comments, and consumer concerns either on The Gene Letter Internet chatroom or in hard copy. A hard copy of the first on-line issue will be used to assess readers’ needs and interests. It will be distributed to 500 community college students representing blue-collar ethnic groups, and to 2000 members of a broad general audience.

A special evaluation of readers’ knowledge and ethical/social concerns raised by The Gene Letter will take place at the end of the second year in order to assess outcome. It is our intention that The Gene Letter become self-supporting after two years.


The DNA Files: A Nationally Syndicated Series of Radio Programs on the Social Implications of Human Genome Research and Its Applications

Bari Scott, Matt Binder, and Jude Thilman
Genome Radio Project; KPFA-FM; Berkeley, CA 94704
510/848-6767 ext 235, Fax: /883-0311, [email protected]

The DNA Files is a series of nationally distributed public radio programs furthering public education on developments in genetic science. Program content is guided by a distinguished body of advisors and will include the voices of prominent genetic researchers, people affected by advances in the clinical application of genetic medicine, members of the biotech industry, and others from related fields. They will provide real-life examples of the complex social and ethical issues associated with new discoveries in genetics. In addition to the general public radio audience, the series will target educators, scientists, and involved professionals. Ancillary educational materials will be distributed in paper and digital form through more than two dozen collaborative organizations and in fulfillment of listener requests.

“DNA and Behavior: Is Our Fate Written in Our Genes?” is the pilot documentary for the series, scheduled for release in early 1996. The show will help the lay person understand and evaluate recent research in the area of behavioral genetics. Recently, we’ve seen news media reports on newly discovered genetic factors being related to behaviors such as alcoholism, mental illness, sexual orientation and aggression. This program will look at several examples of these “genetic factors,” evaluate the strengths and weaknesses of various methodologies involved in the research, and introduce such controversial issues as the re-emergence of a eugenics movement based on theoretical suppositions drawn from recent work in behavioral genetics.

With information linking major diseases such as breast cancer, colon cancer, and arteriosclerosis to genetic factors, new dangers in public perception emerge. Many people who hear about them mistakenly conclude that these diseases can now be easily diagnosed and even cured. On the other end of the public perception spectrum, unfounded fears of extreme, and highly unlikely, consequences also appear. Will society now genetically engineer whole generations of people with “designer genes” offering more “desirable physical qualities”? The DNA Files will ground public understanding of these issues in reality.

“DNA and the Law” reviews the scientific basis for genetic fingerprinting and looks at cases of alleged genetic discrimination by insurance companies, employers and others. This program also looks at disputes over paternity, intellectual property rights, the commercialization of genetic information, informed consent and privacy issues. Other shows include “The Search for a Breast Cancer Gene,” “Prenatal Genetic Testing and Treatment,” “Evolution and Genetic Diversity,” “Sickle-Cell Disease and Thalassemia: Hope for a Cure,” and “Theology, Mythology and Human Genetic Research.”


Communicating Science in Plain Language: The Science + Literacy for Health: Human Genome Project

Maria Sosa, Judy Kass, and Tracy Gath
American Association for the Advancement of Science; Washington, DC 20005
202/326-6453, Fax: /371-9849, [email protected]

Recent literacy surveys have found that a large number of adults lack the skills to bring meaning to much of what is written about science. This, in effect, denies them access to vital information about their health and well-being. To address this need, the American Association for the Advancement of Science (AAAS) is developing a 2-year project to provide low-literate adults with the background knowledge necessary to address the social, ethical, and legal implications of the Human Genome Project.

With its Science + Literacy for Health: Human Genome Project, AAAS is using its existing network of adult education providers and volunteer science and health professionals to pursue the following overall objectives: (1) to develop new materials for adult literacy classes, including a high-interest reading book and accompanying curriculum, an implementation framework, a short video providing background information on genetics, a database of resources, and fact sheets that will assist other organizations and researchers in preparing easy-to-read materials about the human genome project, and (2) to develop and conduct a campaign to disseminate project materials to libraries and community organizations carrying out literacy programs throughout the United States.

Because not every low-literate adult is enrolled in a literacy class, our model for helping scientists communicate in simple language will have impact beyond classrooms and learning centers. In preliminary contacts, community groups providing health services have indicated that the proposed materials are not only desirable but needed; indeed such groups often receive requests for information on heredity and genetics. The module developed by AAAS should enable other medical and scientific organizations to communicate more effectively with economically disadvantaged populations, which often include a large number of low-literate individuals.


The Community College Initiative

Sylvia J. Spengler and Laurel Egenberger
Lawrence Berkeley National Laboratory; Berkeley, CA 94720
510/486-4879, Fax: -5717, [email protected]
http://csee.lbl.gov/cup/ccibiotech/Index.html

The Community College Initiative prepares community college students for work in biotechnology. A combined effort of Lawrence Berkeley National Laboratory (LBNL) and the California Community Colleges, the initiative aims to develop mechanisms to encourage students to pursue science studies, to participate in forefront laboratory research, and to gain work experience. It is structured to upgrade the skills of students and their instructors through four components.

Summer Student Workshops: Four-week summer residential programs for students who have completed the first year of the biotechnology academic program. Ethical, legal and social concerns are integrated into the laboratory exercises and students learn to identify commonly shared values of the scientific community as well as increase their understanding of issues of personal and public concern.

Teacher Workshop Training: Seminars for biotechnology instructors to improve, upgrade, and update their understanding of current technology and laboratory practices, with emphasis on curriculum development in current topics in ethical, legal, and social issues in science.

Sabbatical Fellowships: Fellowships that provide community college instructors with investigative and field experience in research laboratories. During the fellowship, teachers also assist in development of student summer research activities.

Summer Faculty-Student Teams: Post-fellowship faculty and biotechnology students who have finished their second year of study team up on a research project.

Genome Educators

Sylvia Spengler and Janice Mann
Human Genome Program; Life Sciences Division; Lawrence Berkeley National Laboratory; Berkeley, CA 94720
510/486-4879, Fax: -5717, [email protected] [email protected]
http://www.lbl.gov/Education/Genome

Genome Educators is an informal network of educational professionals who have an active interest in all aspects of genetics research and education. This national group includes scientists, researchers, educational curriculum developers, ethicists, health professionals, high school teachers and instructors at college and graduate levels, and others in occupations affected by genetic research.

Genome Educators is a unique collaborative effort dedicated to sharing information and resources to further understanding of current advances in the field of genetics. Seminars, workshops, and special events are sponsored at frequent intervals. Genome Educators maintains an active World Wide Web site (URL: http://www.lbl.gov/Education/Genome). This site contains a calendar of events, directory of participating genome educators, and information about educational resources and reference tools. Participating genome educators may publish articles and talks of interest at this site. In addition, a monitored discussion group is maintained to facilitate dialog and resource sharing among participants.

Getting the Word Out on the Human Genome Project: A Course for Physicians

Sara L. Tobin and Ann Boughton¹
Department of Biochemistry and Molecular Biology; Center for Biomedical Ethics; Stanford University; Palo Alto, CA 94304-1709
415/725-2663, Fax: -6131, [email protected] Graphics; Oklahoma City, OK 73118

Progressive identification of new genes and implications for medical treatment of genetic diseases appear almost daily in the scientific and medical literature, as well as in public media reports. However, most individuals do not understand the power or the promise of the current explosion in knowledge of the human genome. This is also true of physicians, most of whom completed their medical training prior to the application of recombinant DNA technology to medical diagnosis and treatment. This lack of training prevents physicians from appreciating many of the recent advances in molecular genetics and may delay their acceptance of new treatment regimens. In particular, physicians practicing in rural communities are often limited in their access to resources that would bring them into the mainstream of current molecular developments. This project is designed to fill two important functions: first, to provide solid training for physicians in the field of molecular medical genetics, including the impact, implications, and potential of this field for the treatment of human disease; second, to utilize physicians as informed community resources who can educate both their patients and community groups about the new genetics.

We propose to develop a flexible, user-friendly, interactive multimedia CD-ROM designed for continuing education of physicians in applications of molecular medical genetics. To initiate these objectives, we will develop the design of the CD and will produce a prototype providing a detailed presentation of one of the four training areas. These areas are (1) Genetics, including DNA as a molecular blueprint, chromosomes as vehicles for genetic information, and patterns of inheritance; (2) Recombinant techniques, stressing cloning and analytical tools and techniques applied to medical case studies; (3) Current and future clinical applications, encompassing the human genome project, technical advances, and disease diagnosis and prognosis; and (4) Societal implications, focusing on approaches to patient counseling, genetic dilemmas faced by patients and practitioners, and societal values and development of an ethical consensus. Area (2) will be presented in the prototype.

The CD format will permit the use of animation, video, and audio, in addition to graphic illustrations and photographs. We will build on our existing base of computer-generated illustrations. A hypertext glossary, user notes, practice tests, and customized settings will be utilized to tailor the CD to the needs of the user. Brief, multiple-choice examinations will be evaluated for continuing medical education credits by the Office of Continuing Medical Education. The CD will be programmed to permit updates of scientific and medical advances either by downloading from the Internet or from a disc available by subscription.

This is a cooperative project involving individuals with documented expertise in teaching of molecular medical genetics, continuing medical education, graphic design, and CD-ROM production. The content of the CD will be supervised by a scientific board of directors. We present mechanisms for the evaluation of the CD by rural Oklahoma physicians. Arrangements have been made for distribution of the CD by a national publisher of medical and scientific materials. This CD will provide a powerful tool to educate physicians and the public about the power and potential of the human genome project for the benefit of human health.


The Genetics Adjudication Resource Project

Franklin M. Zweig
Einstein Institute for Science, Health, and the Courts; Bethesda, MD 20814
301/961-1949, Fax: /913-0448, [email protected]
http://www.ornl.gov/courts

The Einstein Institute for Science, Health, and the Courts is preparing the foundation for a new utility needed to prepare the nation’s 21,000 courts to adjudicate the genetics and ELSI-related issues that foreseeably will rush into the courtroom as the Human Genome Project completes its genomic mapping and sequencing mission during the next ten years. This project initiates practical collaboration among courts, legal and policy-making institutions, and science centers leading to modalities for understanding the scientific validity of claims, and for the resolution of ethical, legal, and social disputes arising within the genetic testing and gene therapy contexts. Our objective over the ensuing decade is to facilitate genetic testing and gene therapy dispute management, and to avoid to the extent possible the confusion that characterized adjudication of forensic DNA technologies during the decade just ended.

The outlines of a genetics adjudication utility were given form by the 1995 Working Conversation on Genetics, Evolution, and the Courts, involving 37 federal and state judges and others in science and policymaking leadership positions from across the nation. The courts are becoming aware of genetics, molecular biology, and their applications, and judges want public confidence to be maintained as the profound and complex issues set in motion by the HGP begin the long course of litigation. Modalities for understanding the underpinning science are needed, as well as instrumentalities to assure that the best cases are actually filed and pursued. Because the courts are the front-line for resolving disputes, creative lawyering will assure an abundance of lawsuits. Many such lawsuits will request the courts to make policy judgments, perhaps best undertaken by state legislatures and Congress. Accordingly, a new adjudication utility should provide forums for judicial/legislative exchange, preparatory deliberations in anticipation of pressure to make rushed policies under conditions of great social uncertainty in the wake of human genetics progress.

EINSHAC (the Einstein Institute for Science, Health, and the Courts) will provide a design, planning, communications, and implementation center for a multipurpose resource project available to the courts. Over an 18-month period it will undertake the following tasks, pilot-testing each and assessing the best organizational locales for those that exhibit promise:

1. Judicial Education in Genetics & ELSI-Related Issues for six Judicial Branch leadership associations and nine metropolitan courts (aimed at 1,000 judges), in conjunction with scientific faculty and coaches mobilized by DOE/national laboratories and the American Society of Human Genetics.

2. Judicial Digital Electronic Collegium: technological modernization of the courts community by providing access to ELSI and genetics information through Internet resources.

3. Amicus Brief Development Trust Fund: a process and resources to support law development at the state and federal appeals courts level.

4. Genetics Indigent Party Trust Fund: a process and resources at the state and federal trial level to sustain meritorious civil cases holding promise of effective law development.

5. Establishment of a Pro-Bono Legal Services Clearinghouse: a personal and on-line referral resource for persons seeking representation for genetics and ELSI-related cases.

6. Access to Neutral Expert Witnesses: advisors to courts encountering particularly complex cases deemed right for the judicial exercise of Federal Rule of Evidence 706 and its State counterparts.

7. Pilot of Judicial/Legislative ELSI Policy Forums: provision of neutral staff and coordination in three mid-Atlantic states considering legislation related to health care, insurance, privacy, medical records.


8. National Training Center for Minority Justice Personnel: facilitating a leadership preparation program for the nation’s minority court-related personnel in a consortium arrangement with the Ruffin Society of Massachusetts, the College of Criminal Justice at Northeastern University, and the Flaschner Judicial Institute.

The Project actively involves judges, scientists, and prominent lawyers. It will report to the EINSHAC Board of Directors, which includes prominent judges, justices and scientists, several of whom participated in the 1995 Working Conversation on Genetics, Evolution and the Courts. As a continuing guidance forum, EINSHAC will conduct a Working Conversation follow-up in Orleans, Cape Cod, in July 1996.




Alexander Hollaender Distinguished Postdoctoral Fellowships

Linda Holmes and Eugene Spejewski
Oak Ridge Institute for Science and Education; Oak Ridge, TN 37831-0117
423/576-3192, Fax: /241-5220, [email protected] [email protected]
http://www.orau.gov/oher/hollaend.htm

The Alexander Hollaender Distinguished Postdoctoral Fellowships, sponsored by the Department of Energy (DOE), Office of Health and Environmental Research (OHER), support research in the fields of life, biomedical, and environmental sciences. Since the DOE Human Genome Distinguished Postdoctoral Fellowships and DOE Global Change Distinguished Postdoctoral Fellowships both had their last application cycles in FY 1995, the Hollaender program is now open to recent PhD graduates in the fields of human genome and global change, as well.

Fellowships of up to 2 years are tenable at any DOE, university, or private laboratory, provided that the proposed adviser at that laboratory receives at least $150,000 per year in support from OHER. Fellows earn stipends of $37,500 the first year and $40,500 the second. To be eligible, applicants must be U.S. citizens or permanent residents at the time of application, and must have received their doctoral degrees within two years of the earliest possible starting date, which is May 1 of the appointment year.

The Oak Ridge Institute for Science and Education (ORISE), administrator of the fellowships, prepares and distributes program literature to universities and laboratories across the country, accepts applications, convenes a panel to make award recommendations, and issues stipend checks to fellows. The review panel identifies finalists from which DOE selects the award winners. Deadline for the FY 1999 fellowship cycle is January 15, 1998. For more information or an application packet, contact Linda Holmes at the Oak Ridge Institute for Science and Education, P. O. Box 117, Oak Ridge, TN 37831-0117 (423/576-9975, Fax: /241-5220).


Human Genome Management Information System

Betty K. Mansfield, Anne E. Adamson, Denise K. Casey, Sheryl A. Martin, John S. Wassom, Judy M. Wyrick, Laura N. Yust, Murray Browne, and Marissa D. Mills
Life Sciences Division; Oak Ridge National Laboratory; Oak Ridge, TN 37830
423/576-6669, Fax: /574-9888, [email protected]://www.ornl.gov/hgmis

The Human Genome Management Information System (HGMIS), established in 1989, provides information about the international Human Genome Project in print and World Wide Web formats to both technical and general audiences. HGMIS is sponsored by the Human Genome Program Task Group of the DOE Office of Biological and Environmental Research to help fulfill DOE’s commitment to informing scientists, policymakers, and the public about the program’s funded research and the context in which the research is conducted. Several HGMIS products, including the Web sites and newsletter, have won technical and electronic communication awards.

HGMIS goals center on facilitating research at the interface of genomics and other biological disciplines that seek revolutionary solutions to biological, environmental, and biomedical challenges. By communicating information about the Human Genome Project and its impact, HGMIS increases the use of project-generated resources, reduces duplicative research efforts, and fosters collaborations and contributions to biology from other research disciplines.

Furthermore, communicating scientific and societal issues to nonscientist audiences contributes to increased science literacy, thus laying a foundation for more informed decision making and public-policy development. For example, since 1995 HGMIS has been participating in a project to educate the judiciary about the basics of genetics and gene testing. The aim is to prepare judges for the flood of cases involving genetic evidence that soon will enter the nation’s courtrooms.

Information Resources

In keeping with its goals, HGMIS produces the following information resources in print and on the Web:

Human Genome News (HGN). A quarterly forum for interdisciplinary information exchange, HGN uniquely presents a broad spectrum of topics related to the Human Genome Project in a single publication. Articles feature topics that include project goals, progress, and direction; available resources; applications of project data and resources to provide a better understanding of biological processes; related or spinoff programs; medical uses of genome data; ethical, legal, and social considerations; legislative updates; other publications; meeting calendars; and funding information. Most HGN articles also contain sources of additional information. In May 1997, DOE acknowledged the newsletter’s value by presenting an exceptional service award to HGN’s managing editor at a symposium celebrating 50 years of biological and environmental research.

Among 14,000 domestic and foreign HGN subscribers are genome and basic researchers at universities, national laboratories, nonprofit organizations, and industrial facilities; educators; industry representatives; legal personnel; ethicists; students; genetic counselors; medical professionals; science writers; and other interested individuals. All 41 issues of HGN, indexed and searchable, are accessible via the HGMIS Web site.

Other Publications. HGMIS also produces the DOE Primer on Molecular Genetics, progress reports on the DOE Human Genome Program, Santa Fe contractor-grantee workshop proceedings, 1-page topical handouts, and other related resource documents. Expanded and revised by HGMIS from an earlier DOE document, the DOE Primer on Molecular Genetics continues to be in demand. It is used as a handout for genome centers; a resource for new staff training by companies that make products for genome scientists; and an educational tool for teachers, genetic counselors, and such organizations as high schools, universities, and medical schools for student and continuing-education curricula. More than 35,000 hard copies have been distributed. The primer also is available in several formats at the HGMIS Web site, including an Adobe Acrobat version that can be used to print “originals” from users’ printers.

Distribution of Documents. HGMIS has distributed more than 65,000 copies of items requested by subscribers, meeting attendees, and managers of genetics meetings and educational events. These items include HGN, program and workshop reports, DOE-NIH 5-year plans, DOE Primer on Molecular Genetics, and To Know Ourselves. On request, HGMIS supplies multiple copies of publications for meetings and educational purposes.

Electronic Communications. In November 1994, HGMIS began producing a comprehensive, text-based Web server called Human Genome Project Information, which is devoted to topics relating to the science and societal issues surrounding the genome project. In July 1997, this site was divided to better serve the two diverse audience categories that represent the majority of users: scientists and the public. The sites contain more than 1700 text files that are accessed over 1.2 million times each year. Each month, about 10,000 host computers connect to the HGMIS sites directly and through more than 1000 other Web sites. In addition, HGMIS links to the National Institutes of Health and international Human Genome Organisation sites, as well as to sites dedicated to education and to the ethical, legal, and social implications of the Human Genome Project.

All HGMIS publications are published on the Web site, along with such DOE-sponsored documents as Your Genes, Your Choices; the Genetic Privacy Act; and historical and other documents pertaining to the Human Genome Project. HGMIS collaborates with the Einstein Institute for Science, Health, and the Courts to produce CASOLM, the online magazine for judicial education in genetics and biomedical issues. HGMIS also maintains the Genetics section of the Virtual Library from CERN (Switzerland) and the DOE Human Genome Program pages and moderates the BioSci Human Genome Newsgroup.

Information Source

HGMIS answers individual questions and supplies general information about the Human Genome Project by telephone, fax, and e-mail and, as appropriate, refers scientists with questions to the relevant Human Genome Project contacts. HGMIS staff exchange ideas and suggestions with investigators, industry representatives, and others when attending occasional scientific conferences and genome-related meetings and displaying the DOE Human Genome Project traveling exhibit. HGMIS staff also make presentations on the Human Genome Project to educational, judicial, and other groups.

HGMIS resources serve as a primary source for the popular media and for discipline-specific publications that broaden the distribution of genome project information by extracting and reprinting from HGMIS resources and by linking to various parts of the HGMIS Web site.

HGMIS continuously monitors changes in the direction of the international Human Genome Project and searches for ways to strengthen the content relevancy of the newsletter, the Web site, and other services.


Human Genome Program Coordination

Sylvia J. Spengler
Lawrence Berkeley National Laboratory; Berkeley, CA 94720
510/486-4879, Fax: -5717, [email protected]://www.lbl.gov/Education/ELSI

The DOE Human Genome Program of the Office of Health and Environmental Research (OHER) has developed a number of tools for management of the Program. Among these was the Human Genome Coordinating Committee (HGCC), established in 1988. In 1996, the HGCC was expanded to a broader vision of the role of genomic technologies in OHER programs, and the name was changed to reflect this broadening. The HGCC is now the Biotechnology Forum. The Forum is chaired by the Associate Director, OHER. Members of the Human Genome Program Management Task Group are ex officio members, as are members of the Health and Environmental Research Advisory Committee’s subcommittee on the Human Genome. Responsibilities of the Forum include assisting OHER in overall coordination of DOE-funded genome research; facilitating the development and dissemination of novel genome technologies; recommending establishment of ad hoc task groups in specific areas, such as informatics, technologies, and model organisms; and evaluating progress and considering long-term goals. Members also serve on the Joint DOE-NIH Subcommittee on the Human Genome for interagency coordination. The coordination group also participates in interface programs with other facilities and provides scientific support for development of other OHER goals, as requested.

Support of Human Genome Program Proposal Reviews

Walter Williams
Education/Training Division; Oak Ridge Institute for Science and Education; Oak Ridge, TN 37831-0117
423/576-4811, Fax: /241-2727, [email protected]

The Oak Ridge Institute for Science and Education (ORISE), operated by Oak Ridge Associated Universities, provides assistance to the DOE Office of Health and Environmental Research in the technical review of proposals submitted in response to solicitations by the DOE Human Genome Program. ORISE staff members create and maintain a database of all proposal information, including abstracts, relevant names and addresses, and budget data. This information is compiled and presented to proposal reviewers. Before review meetings, ORISE staff members make appropriate hotel and meeting arrangements, provide each reviewer with proposal copies and evaluation guidelines, and coordinate reviewer travel and honoraria payment. Onsite meeting support includes collecting all reviewer evaluation forms and scores, entering reviewer scores into the database, preparing appropriate reports, providing onsite computer support, and handling all logistical issues. Other support includes assistance with program advertising and preparation of reviewer comments following each review. ORISE may also assist with pre- and post-review activities related to conferences, seminars, and site visits.


Former Soviet Union Office of Health and Environmental Research Program

James Wright
Education/Training Division; Oak Ridge Institute for Science and Education; Oak Ridge, TN 37831-0117
423/576-1716, Fax: /241-2727, [email protected]

The Former Soviet Union Office of Health and Environmental Research Program, sponsored by the U.S. Department of Energy, Office of Health and Environmental Research, recognizes outstanding scientists in the field of health and environmental research from the independent states of the former Soviet Union. The program fosters the international exchange of new ideas and innovative approaches in health and environmental research; strengthens ties and encourages continuing collaboration among Russian and U.S. scientists; and establishes and maintains environmental research capability in the former Soviet Union. The program has supported more than 23 Russian principal investigators and approximately 110 other research associates in Moscow, St. Petersburg, and Novosibirsk. More importantly, the program has enabled many high-quality Russian biological, genome informatics, physical mapping and mutagenesis detection, human genetics, biochemistry, DNA sequencing technology, protein analysis, molecular genetics, and other related research infrastructures to continue operating in an uncertain economic environment.





An Engineered RNA/DNA Polymerase to Increase Speed and Economy of DNA Sequencing

Mark W. Knuth
Promega Corporation; Madison, WI 53711-5399
608/274-4330, Fax: /277-2601

DNA sequence information is the cornerstone for considerable experimental design and analysis in the biological sciences. The proposed studies will focus on advancing DNA sequencing by creating a new enzyme that eliminates the need for an oligonucleotide primer to initiate DNA synthesis at a defined site and that can use dideoxy nucleotides for chain termination. The new method should reduce the time and cost required to obtain DNA sequences and enhance the speed and cost effectiveness of current DNA sequencing technologies. Phase I studies will focus on purifying mutant T7 RNA polymerases known to incorporate dNTPs into DNA chains, developing protocols for rapid small-scale mutant enzyme purification, evaluating the purified mutants for properties relevant to DNA sequencing, developing facile mutagenesis schemes, and producing mutant RNA/DNA polymerases with altered promoter recognition. The results from Phase I will provide the foundation for Phase II research, which will focus on refining properties of the mutant by (1) expanding the number of mutations examined using the purification protocols, assays, and mutagenesis screening methods developed in Phase I; (2) examining the effect of each mutation on enzymatic properties important to DNA sequencing applications; and (3) optimizing conditions for sequencing performance. In Phase III, Promega will commercialize the new mutant enzymes through its own extensive distribution network and by collaborating with major instrumentation firms to adapt the technology to automated DNA sequencing systems.


Directed Multiple DNA Sequencing and Expression Analysis by Hybridization

Gualberto Ruano
BIOS Laboratories, Inc.; New Haven, CT 06511
800/678-9487 or 203/773-1450, Fax: 800/315-7435 or 203/562-9377

The overall goal of this project is to develop molecular resources with direct applications to either DNA sequence analysis or gene expression analysis in multiplexed formats using sequential hybridization of Peptide Nucleic Acid (PNA) oligomer probes. PNA oligomers hybridize more stably and specifically to cognate DNA targets than conventional DNA oligonucleotides. The Phase I project discussed here is concerned with development of PNA probe technology having direct application either to the directed sequencing process or to gene expression profiling. With regard to directed sequencing, we seek improvements in the three multiply repeated steps associated with this process, namely (1) probe assembly, (2) sequencing reactions, and (3) gel electrophoresis. In PNA hybridization sequencing, sequences are generated directly from the template by multiplex DNA sequencing using anchor primers known to have frequent annealing sites. Electrophoresis is performed en masse for each anchor primer reaction, the products are blotted to nylon membranes, and individual sequences are selectively exposed by iterative hybridization to specific 8-mer PNA probes derived from sequences statistically over-represented in expressed DNA and obtained from a pre-synthesized library. Additionally, the same PNA library can be used as a source of hybridization probes for querying expression patterns of specific genes in any cell line or tissue. Specific gene expression can be monitored by coupling gene-specific RT-PCR with hybridization when cDNA products are separated by gel electrophoresis and blotted to nylon membranes. Patterns of gene expression are then resolved by hybridization using PNA oligomers. Bands corresponding to specific genes can be deconvoluted using sequence information from RT-PCR primers and PNA probes. Higher-throughput expression analysis can be achieved by multiplexed gel electrophoresis, blotting, and iterative probing of RT-PCR reactions with individual PNA probes.
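The idea of choosing 8-mer probes from sequences that are over-represented in expressed DNA can be illustrated with a simple k-mer count. The following is a minimal sketch and not the project's method: the function names and the toy sequences are hypothetical, and ranking by raw frequency is an assumption made for illustration. A realistic selection would compare observed counts against a background model of the genome.

```python
from collections import Counter


def count_kmers(sequences, k=8):
    """Count every overlapping k-mer across a collection of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows that contain ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def top_probe_candidates(sequences, k=8, n=3):
    """Return the n most frequent k-mers as candidate probe targets."""
    return [kmer for kmer, _ in count_kmers(sequences, k).most_common(n)]


# Toy "expressed" sequences (hypothetical data, for illustration only).
expressed = [
    "ATGGCCAAGCTGACCGGT",
    "TTGGCCAAGCTGAAATTT",
    "CCGGCCAAGCTGTTTAAA",
]
print(top_probe_candidates(expressed, k=8, n=3))
```

In this toy input the three sequences share a 10-base stretch, so the 8-mers inside that stretch rise to the top of the ranking.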



Small Business Innovation Research

1996 Phase I


1996 Phase II

A Graphical Ad Hoc Query Interface Capable of Accessing Heterogeneous Public Genome Databases

Joseph Leone
CyberConnect Corporation; Storrs, CT 06268
860/486-2783, Fax: /429-2372

The interoperability of public genome databases is expected to be crucial in making the Human Genome Project a success. This project will develop software tools with which users in the genome community can learn or examine public genome database schemas in a relatively short time and can easily produce a correct Structured Query Language (SQL) expression. In Phase I, a concept system was constructed and the effectiveness of formulating ad hoc queries graphically was demonstrated. Phase II will focus on transforming the concept system into a product that is robust and portable. Two types of computer programs will be developed. One is a client program to be distributed to community users who intend to access public genomic databases and link them with local databases. The other is a server program and a suite of software tools designed to be used by those genome centers that intend to make their databases publicly accessible.
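To make the idea of producing SQL from a graphical selection concrete, the sketch below turns a point-and-click style choice of columns, joins, and filters into a parameterized SELECT statement and runs it against an in-memory SQLite database. It is an illustration only, not the CyberConnect software: the function `build_query`, the `clone` and `marker` tables, and their columns are all hypothetical.

```python
import sqlite3


def build_query(columns, tables, joins=(), filters=()):
    """Assemble a parameterized SELECT from a point-and-click style selection.

    columns: list of "table.column" strings chosen by the user
    tables:  list of table names involved
    joins:   list of (left, right) "table.column" pairs to equate
    filters: list of ("table.column", operator, value) triples
    """
    allowed_ops = {"=", "<", ">", "<=", ">=", "LIKE"}
    conditions = [f"{left} = {right}" for left, right in joins]
    params = []
    for column, op, value in filters:
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        # Values are bound as parameters; a real tool would also check
        # table and column names against the database schema.
        conditions.append(f"{column} {op} ?")
        params.append(value)
    sql = f"SELECT {', '.join(columns)} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# Hypothetical two-table schema standing in for a public genome database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clone (id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
    CREATE TABLE marker (id INTEGER PRIMARY KEY, clone_id INTEGER, sts_name TEXT);
    INSERT INTO clone VALUES (1, 'c19-001', '19'), (2, 'c21-007', '21');
    INSERT INTO marker VALUES (1, 1, 'STS-A'), (2, 2, 'STS-B');
""")

sql, params = build_query(
    columns=["clone.name", "marker.sts_name"],
    tables=["clone", "marker"],
    joins=[("clone.id", "marker.clone_id")],
    filters=[("clone.chromosome", "=", "19")],
)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The user never types SQL in this arrangement. The interface supplies the selection, and the generated statement can be shown alongside the results so that users learn the schema as they go.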


Low-Cost Automated Preparation of Plasmid, Cosmid, and Yeast DNA

Tuyen Nguyen, Randy F. Sivila, Joshua P. Dyer, and William P. MacConnell
MacConnell Research Corporation; San Diego, CA 92121
619/452-2603, Fax: -6753

MacConnell Research currently manufactures and sells a low-cost automated bench-top instrument that can purify up to 24 samples of plasmid DNA simultaneously in one hour at a cost of $0.65 per sample and under $8000 for the instrument. The patented instrument uses a form of agarose gel electrophoresis to purify the plasmid DNA and electroelutes it into a volume of approximately 20 µl. The instrument has many advantages over other robotic and manual methods: it is two times faster, at least six times less expensive, much smaller, easier to operate, and lower in cost per sample, and it yields DNA pure enough for direct use in fluorescent automated sequencing. The instrument process begins with bacterial culture, which is loaded directly into a disposable cassette in the machine.

In Phase II work we are developing an instrument that simultaneously purifies plasmid DNA from up to 192 (2 × 96) bacterial samples in 1.5 hours. Prototypes of this instrument constructed thus far have allowed the purification of 3–7 micrograms of high-purity plasmid DNA per lane from 1.5 ml of bacterial culture. We have attempted to optimize all of the following: instrument electrophoretic run parameters, lysis chemistry, lysis reagent delivery devices, reagent storage at room temperature, desalting processes, and overall instrument mechanical and electronic control. Instrument prototypes have also been used to prepare cosmid or yeast DNA in quantities of 1–5 micrograms per cassette lane. Trials thus far have yielded plasmid DNA of sufficient purity for direct use in automated fluorescent and manual sequencing as well as other molecular biology protocols. We have studied the purity of the resulting DNA when directly sequenced on Licor 4000 Long Reader and ABI 373A automated DNA sequencers. Results from the Licor 4000 instrument give routine read lengths of >850 base pairs with 98% accuracy, while ABI 373A reads generally exceed 400 base pairs with similar accuracy.

The proposed 2 × 96-channel instrument will purify up to 1200 plasmid DNA preps per eight-hour day. It will significantly reduce the cost and technician labor of high-throughput plasmid DNA purification for automated sequencing and mapping.

DOE Grant No. DE-FG03-94ER81802/A000.

GRAIL-GenQuest: A Comprehensive Computational Framework for DNA Sequence Analysis

Ruth Ann Manning
ApoCom, Inc.; Oak Ridge, TN 37830
423/482-2500, Fax: /220-2030

Although DNA sequencing in the Human Genome Project is occurring fairly systematically, biotechnology companies have focused on sequencing regions thought to contain particular disease genes. The client-server DNA sequence analysis system GRAIL is the most accurate and widely used computer-based system for locating and characterizing genes in DNA sequences, but it is not accessible to many biotechnology environments. The GRAIL client software and graphical displays have been developed for high-end UNIX-based computer workstations. Such workstations are standard equipment in universities and large companies, but personal computers (PCs) and Macintosh computers are the prevalent technology within the biotechnology community. This Phase I project will design Macintosh- and Windows-based client graphical user interface prototypes for GRAIL.



The growth of DNA databases is expected to continue at a fast pace in the attempt to sequence the human genome completely by the year 2005. Parallel processing is a viable solution to handle searching through the ever-increasing volume of data. During Phase I, genQuest, the sequence comparison server portion of the GRAIL system, will be parallelized for shared-memory platforms and will use PVM¹ for the development of genQuest servers on networks of PCs and workstations and other innovative, high-performance computer architectures.

Prototype graphical interface systems for Macintosh, Windows NT, and Windows 95 that mimic the function and operation of the current GRAIL-genQuest clients will enable a larger portion of biotechnology companies to make use of the GRAIL suite of analysis tools. Parallel genQuest servers will improve response time for searches and increase user capacity per server. Such fast shared- and distributed-memory computing solutions will improve the cost-performance ratio and make parallel searches more affordable to the biotechnology community using general multipurpose hardware.
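The general shape of a parallel database search (split the database into slices, score each slice concurrently, merge the hits) can be sketched in a few lines. This is not genQuest or PVM code: it uses a thread pool from Python's standard library as a stand-in for the workers, and the shared-k-mer score, the function names, and the toy database are all assumptions made for illustration. Python threads show the structure only; they do not speed up CPU-bound scoring the way separate processors would.

```python
from concurrent.futures import ThreadPoolExecutor


def shared_kmers(query, target, k=4):
    """Crude similarity score: number of distinct k-mers the two sequences share."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    t = {target[i:i + k] for i in range(len(target) - k + 1)}
    return len(q & t)


def search_chunk(query, chunk):
    """Score one slice of the database; each worker handles one slice."""
    return [(name, shared_kmers(query, seq)) for name, seq in chunk]


def parallel_search(query, database, workers=2):
    """Split the database into slices, score them concurrently, merge the hits."""
    items = list(database.items())
    size = max(1, -(-len(items) // workers))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: search_chunk(query, c), chunks))
    hits = [hit for part in results for hit in part]
    # Best score first; ties broken by name so the order is deterministic.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Toy database (hypothetical sequences, for illustration only).
database = {
    "seqA": "ACGTACGTGG",
    "seqB": "TTTTTTTTTT",
    "seqC": "ACGTTTGGCC",
}
print(parallel_search("ACGTACGT", database, workers=2))
```

Because each slice is scored independently, the merged ranking does not depend on how many workers are used, which is what makes this kind of search straightforward to distribute.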


¹ The Parallel Virtual Machine (PVM) message-passing library allows a collection of UNIX-based computers to function as a single multiple-processor supercomputer.



Sequencing

Sequencing by Hybridization: Methods to Generate Large Arrays of Oligonucleotides

Thomas M. Brennan

Sequencing by Hybridization: Development of an Efficient Large-Scale Methodology

Radomir Crkvenjakov

Genomic Instrumentation Development: Detection Systems for Film and High-Speed Gel-Less Methods

Jack B. Davidson and Robert S. Foote

Single-Molecule Detection Using Charge-Coupled Device Array Technology

M. Bonner Denton, Richard Keller, Mark E. Baker, Colin W. Earle, and David A. Radspinner

Coupling Sequencing by Hybridization with Gel Sequencing for Inexpensive Analysis of Genes and Genomes

Radoje Drmanac, Snezana Drmanac, and Ivan Labat

Physical Structure and DNA Sequence of Human Chromosomes

Glen A. Evans

Using Scanning Tunneling Microscopy to Sequence the Human Genome

Thomas L. Ferrell, Robert J. Warmack, David P. Allison, K. Bruce Jacobson, Gilbert M. Brown, and Thomas G. Thundat

DNA Sequence Analysis by Solid-Phase Hybridization

Robert S. Foote, Richard A. Sachleben, and K. Bruce Jacobson

DNA Sequencing Using Stable Isotopes

K. Bruce Jacobson, Heinrich F. Arlinghaus, Gilbert M. Brown, Robert S. Foote, Frank W. Larimer, Richard A. Sachleben, Norbert Thonnard, and Richard P. Woychik

Preparation of Oligonucleotide Arrays for Hybridization Studies

Michael C. Pirrung, Steven W. Shuey, David C. Lever, Lara Fallon, J.-C. Bradley, and William P. Hawe

Improvement and Automation of Ligation-Mediated Genomic Sequencing

Arthur D. Riggs and Gerd P. Pfeifer

*Analysis of a 53-Kb Nucleotide Sequence from the Right Genome Terminus of the Variola Major Virus Strain India-1967

Sergei N. Shchelkunov, Vladimir M. Blinov, Sergei M. Resenchuk, Alexei V. Totmenin, Viktor N. Krasnykh, Ludmilla V. Olenina, Oleg I. Serpinsky, and Lev S. Sandakhchiev

A High-Speed Automated DNA Sequencer

Lloyd M. Smith

Characterization and Modification of DNA Polymerases for Use in DNA Sequencing

Stanley Tabor

Mapping

*Toward Cloning Human Chromosome 19 in Yeast Artificial Chromosomes

Inga P. Arman, Alexander B. Devin, Svetlana P. Legchilina, Irina G. Efimenko, Marina E. Smirnova, and Dina V. Glazkova

A Panel of Mouse-Human Monochromosomal Hybrid Cell Lines, Each Containing a Single Different Tagged Human Chromosome

Arbansjit K. Sandhu, G. Pal Kaur, and Raghbir S. Athwal

*Preparation of a Set of Molecular Markers for Human Chromosome 5 Using G+C–Rich and Functional Site-Specific Oligonucleotides

M.L. Filipenko, A.I. Muravlev, E.I. Jantsen, V.V. Smirnova, N.A. Chikaev, V.P. Mishin, and M.A. Ivanovich

An Improved Method for Producing Radiation Hybrids Applied to Human Chromosome 19

Cynthia L. Jackson and Hon Fong L. Mark

Projects Completed FY 1994–95

Projects in this section have been completed or did not receive support through the DOE Human Genome Program in FY 1996.

DOE Human Genome Program Report, Part 2, Completed Projects


Construction of a Human Genome Library Composed of Multimegabase Acentric Chromosome Fragments

Michael J. Lane, Peter Hahn, and John Hozier

Reagents for Understanding and Sequencing the Human Genome

J.R. Korenberg, X-N. Chen, S. Mitchell, S. Gerwehr, Z. Sun, D. Noya, R. Hubert, U-J. Kim, H. Shizuya, X. Wu, J. Silva, B. Birren, T.J. Hudson, P. de Jong, E. Lander, and M. Simon

Development of Diallelic Marker Maps Using PCR/OLA

Deborah A. Nickerson and Pui-Yan Kwok

Multiplex Mapping of Human cDNAs

William C. Nierman, Donna R. Maglott, and Scott Durkin

Physical Mapping in Preparation for DNA Sequencing

Andreas Gnirke, Regina Lim, Gane Wong, Jun Yu, Roger Bumgarner, and Maynard Olson

Construction of a Genetic Map Across Chromosome 21

Elaine A. Ostrander

Integrated Physical Mapping of Human cDNAs

Mihael H. Polymeropoulos

Sequence-Tagged Sites for Human Chromosome 19 cDNAs

Michael J. Siciliano and Anthony V. Carrano

cDNA/STS Map of the Human Genome: Methods Development and Applications Using Brain cDNAs

James M. Sikela, Akbar S. Khan, Arto K. Orpana, Andrea S. Wilcox, Janet A. Hopkins, and Tamara J. Stevens

Physical Structure of Human Chromosome 21

Cassandra L. Smith, Denan Wang, Kaoru Yoshida, Jesus Sainz, Carita Fockler, and Meire Bremer

Physical Mapping of Human Chromosome 16

David F. Callen, Sinoula Apostolou, Elizabeth Baker, Helen Kozman, Sharon A. Lane, Julie Nancarrow, Hilary A. Phillips, Scott A. Whitmore, Norman A. Doggett, John C. Mulley, Robert I. Richards, and Grant R. Sutherland

Chromosome Mapping by FISH to Interphase Nuclei

Barbara J. Trask

Flow Karyotyping and Flow Instrumentation Development

Ger van den Engh and Barbara Trask

Isolation of Specific Human Telomeric Clones by Homologous Recombination and YAC Rescue

Geoffrey Wahl and Linnea Brody

Informatics

*A Method for Direct Sequencing of Diploid Genomes on Oligonucleotide Arrays: Theoretical Analysis and Computer Modeling

Alexander B. Chetverin

Sampling-Based Methods for the Estimation of DNA Sequence Accuracy

Gary Churchill and Betty Lazareva

Computer-Aided Genome Map Assembly with SIGMA (System for Integrated Genome Map Assembly)

Michael J. Cinkosky, Michael A. Bridgers, William M. Barber, Mohamad Ijadi, and James W. Fickett

Informatics for the Sequencing by Hybridization Project

Aleksandar Milosavljevic and Radomir Crkvenjakov

Sequencing by Hybridization Algorithms and Computational Tools

Radoje Drmanac, Ivan Labat, and Nick Stavropoulos

HGIR: Information Management for a Growing Map

James W. Fickett, Michael J. Cinkosky, Michael A. Bridgers, Henry T. Brown, Christian Burks, Philip E. Hempfner, Tran N. Lai, Debra Nelson, Robert M. Pecherer, Doug Sorenson, Peichen H. Sgro, Robert D. Sutherland, Charles D. Troup, and Bonnie C. Yantis




Identification of Genes in Anonymous DNA Sequences

Christopher A. Fields and Carol A. Soderlund

Algorithms in Support of the Human Genome Project

Dan Gusfield, Jim Knight, Kevin Murphy, Paul Stelling, Lushen Wang, Archie Cobbs, Paul Horton, Richard Karp, and Gene Lawler

BISP: VLSI Solutions to Sequence-Comparison Problems

Tim Hunkapiller, Leroy Hood, Ed Chen, and Michael Waterman

Physical Mapping of DNA Molecules

Richard M. Karp

BIOSCI Electronic Newsgroup Network for the Biological Sciences

David Kristofferson

Multiple Alignment and Homolog Sequence Database Compilation

Hwa A. Lim

Applying Machine Learning Techniques to DNA Sequence Analysis

Jude W. Shavlik, Michiel O. Noordewier, Geoffrey Towell, Mark Craven, Andrew Whitsitt, Kevin Cherkauer, and Lorien Pratt

New Approaches to Recognizing Functional Domains in Biological Sequences

Gary D. Stormo

ELSI

Protecting Genetic Privacy by Regulating the Collection, Analysis, Use, and Storage of DNA and Information Obtained from DNA Analysis

George J. Annas, Leonard H. Glantz, and Patricia A. Roche

“The Secret of Life”

Paula Apsell and Graham Chedd

Genome Technology and Its Implications: A Hands-On Workshop for Educators

Diane Baker and Paula Gregory

Predicting Future Disease: Issues in the Development, Application, and Use of Tests for Genetic Disorders

Ruth E. Bulger and Jane E. Fullarton

HUGO International Yearbook: Genetics, Ethics, Law, and Society (GELS)

Alex Capron and Bartha Knoppers

The Human Genome: Science and the Social Consequences; Interactive Exhibits and Programs on Genetics and the Human Genome

Charles C. Carlson

International Conference Working Group: The Social Costs and Medical Benefits of Human Genetic Information

Betsy Fader

“Medicine at the Crossroads”

George Page and Stefan Moore

Pilot Senior Research Fellowship Program: Bioethical Issues in Molecular Genetics

Declan Murphy and Claudette Cyr Friedman

Studies of Genetic Discrimination

Marvin Natowicz

DNA Banking and DNA Data Banking: Legal, Ethical, and Public Policy Issues

Philip Reilly

Mechanical Interactive Exhibits on Biotechnology

Elizabeth Sharpe

Impact of Technology Derived from the Human Genome Project on Genetic Testing, Screening, and Counseling: Cultural, Ethical, and Legal Issues

Ralph W. Trottier, Lee A. Crandall, David Phoenix, Mwalimu Imara, and Ray E. Mosley

Social Science Concepts and Studies of Privacy: A Comprehensive Inventory and Analysis for Considering Privacy, Confidentiality, and Access Issues in the Use of Genetic Tests and Applications of Genetic Data

Alan F. Westin




Human Genetics and Genome Analysis: A Practical Workshop for Public Policymakers and Opinion Leaders

Jan Witkowski, David A. Micklos, and Margaret Henderson

SBIR Phase I

A Graphical Ad Hoc Query Interface Capable of Accessing Heterogenous Public Genome Databases

J. Clarke Anderson

Techniques for Screening Large-Insert Libraries

Saika Aytay

Interactive DNA Sequence Processing for a Microcomputer

Wayne Dettloff and Holt Anderson

High-Performance Searching and Pattern Recognition for Human Genome Databases

Douglas J. Eadline

Estimating, Encoding, and Using Uncertainties in Sequence Data

John R. Hartman

Low-Cost Massively Parallel Neurocomputing for Pattern Recognition in Macromolecular Sequences

John R. Hartman

Electrophoretic Separation of DNA Fragments in Ultrathin Planar-Format Linear Polyacrylamide

Michael T. MacDonell and Darlene B. Roszak

An Acoustic Plate Mode DNA Biosensor

Douglas J. McAllister

Piezoelectric Biosensor Using Peptide Nucleic Acids for Triplex Capture

Douglas McAllister

Pedigree Software for the Presentation of Human Genome Information for Genetic Education and Counseling

Charles L. Manske

A High-Spatial-Resolution Spectrograph for DNA Sequencing

Cathy D. Newman

Nonradioactive Detection Systems Based on Enzyme-Fragment Complementation

Peter Richterich

Separation Media for DNA Sequencing

David S. Soane and Herbert H. Hooper

SBIR Phase II

Increased Speed in DNA Sequencing by Utilizing LARIS and SIRIS to Localize Multiple Stable Isotope-Labeled Fragments

Heinrich F. Arlinghaus

Rapid, High-Throughput DNA Sequencing Using Confocal Fluorescence Imaging of Capillary Arrays

David L. Barker and Jay Flatley

Spatially Defined Oligonucleotide Arrays

Stephen P. A. Fodor

Site-Specific Endonucleases for Human Genome Mapping

George Golumbeski, Kimberly Knoche, Susanne Selman, im Hartnett, Lydia Hung, and Peter Bayne

High-Performance DNA and Protein Sequence Analysis on a Low-Cost Parallel-Processor Array

John R. Hartman and David L. Solomon

Chemiluminescent Multiprimed DNA Sequencing

Chris S. Martin, Corinne E. M. Olesen, and Irena Bronstein




Appendix


Narratives from Large, Multidisciplinary Research Projects

Part 1 of this report contains narratives that represent DOE Human Genome Program research in large, multidisciplinary projects. As a convenience to the reader, these narratives are reprinted without graphics in this appendix. Only the contact persons for these organizations are listed in the Index to Principal and Coinvestigators. To obtain more information on research carried out in these projects, see their contact information or visit the Web sites listed with the narratives.

Joint Genome Institute

Elbert Branscomb

Lawrence Livermore National Laboratory Human Genome Center

Anthony V. Carrano

Los Alamos National Laboratory Center for Human Genome Studies

Larry L. Deaven

Lawrence Berkeley National Laboratory Human Genome Center

Mohandas Narla

University of Washington Genome Center

Maynard Olson

Genome Database

Stanley Letovsky and Robert Cottingham

National Center for Genome Resources

Peter Schad

DOE Human Genome Program Report, Part 2, Appendix


Genome Center Sequencing Efforts Merge

In a major restructuring of its Human Genome Program, on October 23, 1996, the DOE Office of Biological and Environmental Research established the Joint Genome Institute (JGI) to integrate work based at its three major human genome centers.

The JGI merger represents a shift toward large-scale sequencing via intensified collaborations for more effective use of the unique expertise and resources at Lawrence Berkeley National Laboratory (LBNL), Lawrence Livermore National Laboratory (LLNL), and Los Alamos National Laboratory. Elbert Branscomb (LLNL) serves as JGI’s Scientific Director. Capital equipment has been ordered, and operational support of about $30 million is projected for the 1998 fiscal year.

With easy access to both LBNL and LLNL, a building in Walnut Creek, California, is being modified. Here, starting in late FY 1998, production DNA sequencing will be carried out for JGI. Until that time, large-scale sequencing will continue at LANL, LBNL, and LLNL. Expectations are that within 3 to 4 years the Production Sequencing Facility will house some 200 researchers and technicians working on high-throughput DNA sequencing using state-of-the-art robotics.

Initial plans are to target gene-rich regions of around 1 to 10 megabases for sequencing. Considerations include gene density, gene families (especially clustered families), correlations to model organism results, technical capabilities, and relevance to the DOE mission (e.g., DNA repair, cancer susceptibility, and impact of genotoxins). The JGI program is subject to regular peer review.

Sequence data will be posted daily on the Web; as the information progresses to finished quality, it will be submitted to public databases.

As JGI and other investigators involved in the Human Genome Project are beginning to reveal the DNA sequence of the 3 billion base pairs in a reference human genome, the data already are becoming valuable reagents for explorations of DNA sequence function in the body, sometimes called “functional genomics.” Although large-scale sequencing is JGI’s major focus, another important goal will be to enrich the sequence data with information about its biological function. One measure of JGI’s progress will be its success at working with other DOE laboratories, genome centers, and non-DOE academic and industrial collaborators. In this way, JGI’s evolving capabilities can both serve and benefit from the widest array of partners.

Lawrence Livermore National Laboratory
7000 East Avenue, L-452
Livermore, CA 94551

Elbert Branscomb, JGI Scientific Director
510/[email protected] or [email protected]

http://www.jgi.doe.gov

Production DNA Sequencing Begun Worldwide

The year 1996 marked a transition to the final and most challenging phase of the U.S. Human Genome Project, as pilot programs aimed at refining large-scale sequencing strategies and resources were funded by DOE and NIH (see Research Highlights, DNA Sequencing, p. 14). Internationally, large-scale human genome sequencing was kicked off in late 1995 when The Wellcome Trust announced a 7-year, $75-million grant to the private Sanger Centre to scale up its sequencing capabilities. French investigators also have announced intentions to begin production sequencing.

Funding agencies worldwide agree that rapid and free release of data is critical. Other issues include sequence accuracy, types of annotation that will be most useful to biologists, and how to sustain the reference sequence.

The international Human Genome Organisation maintains a Web page to provide information on current and future sequencing projects and links to sites of participating groups (http://hugo.gdb.org). The site also links to reports and resources developed at the February 1996 and 1997 Bermuda meetings on large-scale human genome sequencing, which were sponsored by The Wellcome Trust.


The Human Genome Center at Lawrence Livermore National Laboratory (LLNL) was established by DOE in 1991. The center operates as a multidisciplinary team whose broad goal is understanding human genetic material. It brings together chemists, biologists, molecular biologists, physicists, mathematicians, computer scientists, and engineers in an interactive research environment focused on mapping, DNA sequencing, and characterizing the human genome.

Goals and Priorities

In the past 2 years, the center’s goals have undergone an exciting evolution. This change is the result of several factors, both intrinsic and extrinsic to the Human Genome Project. They include: (1) successful completion of the center’s first-phase goal, namely a high-resolution, sequence-ready map of human chromosome 19; (2) advances in DNA sequencing that allow accelerated scaleup of this operation; and (3) development of a strategic plan for LLNL’s Biology and Biotechnology Research Program that will integrate the center’s resources and strengths in genomics with programs in structural biology, individual susceptibility, medical biotechnology, and microbial biotechnology.

The primary goal of LLNL’s Human Genome Center is to characterize the mammalian genome at optimal resolution and to provide information and material resources to other in-house or collaborative projects that allow exploitation of genomic biology in a synergistic manner. DNA sequence information provides the biological driver for the center’s priorities:

• Generation of highly accurate sequence for chromosome 19.

• Generation of highly accurate sequence for genomic regions of high biological interest to the mission of the DOE Office of Biological and Environmental Research (e.g., genes involved in DNA repair, replication, recombination, xenobiotic metabolism, and cell-cycle control).

• Isolation and sequence of the full insert of cDNA clones associated with genomic regions being sequenced.

• Sequence of selected corresponding regions of the mouse genome in parallel with the human.

• Annotation and position of the sequenced clones with physical landmarks such as linkage markers and sequence tagged sites (STSs).

• Generation of mapped chromosome 19 and other genomic clones [cosmids, bacterial artificial chromosomes (BACs), and P1 artificial chromosomes (PACs)] for collaborating groups.

• Sharing of technology with other groups to minimize duplication of effort.

• Support of downstream biology projects, for example, structural biology, functional studies, human variation, transgenics, medical biotechnology, and microbial biotechnology with know-how, technology, and material resources.

Center Organization and Activities

Completion and publication of the metric physical map of human chromosome 19 in 1995 has led to consolidation of many functions associated with physical mapping, with increased emphasis on DNA sequencing. The center is organized into five broad areas of research and support: sequencing, resources, functional genomics, informatics and analytical genomics, and instrumentation. Each area consists of multiple projects, and extensive interaction occurs both within and among projects.

Sequencing

The sequencing group is divided into several subprojects. The core team is responsible for the construction of sequence libraries, sequencing reactions, and data collection for all templates in the random phase of sequencing. The finishing team works with data produced by the core team to produce highly redundant, highly accurate “finish” sequence on targets of interest. Finally, a team of researchers focuses specifically on development, testing, and implementation of new protocols for the entire group, with an emphasis on improving the efficiency and cost basis of the sequencing operation.

Research Narratives


Lawrence Livermore National Laboratory Human Genome Center

Human Genome Center
Lawrence Livermore National Laboratory
Biology and Biotechnology Research Program
7000 East Avenue, L-452
Livermore, CA 94551

Anthony V. Carrano, Director
510/422-5698, Fax: /423-3110, [email protected]

Linda Ashworth, Assistant to Center Director
510/422-5665, Fax: -2282, [email protected]


http://www-bio.llnl.gov/bbrp/genome/genome.html


Resources

The resources group provides mapped clonal resources to the sequencing teams. This group performs physical mapping as needed for the DNA sequencing group by using fingerprinting, restriction mapping, fluorescence in situ hybridization, and other techniques. A small mapping effort is under way to identify, isolate, and characterize BAC clones (from anywhere in the human genome) that relate to susceptibility genes, for example, DNA repair. These clones will be characterized and provided for sequencing and at the same time contribute to understanding the biology of the chromosome, the genome, and susceptibility factors. The mapping team also collaborates with others using the chromosome 19 map as a resource for gene hunting.

Functional Genomics

The functional genomics team is responsible for assembling and characterizing clones for the Integrated Molecular Analysis of Gene Expression (called IMAGE) Consortium and cDNA sequencing, as well as for work on gene expression and comparative mouse genomics. The effort emphasizes genes involved in DNA repair and links strongly to LLNL’s gene-expression and structural biology efforts. In addition, this team is working closely with Oak Ridge National Laboratory (ORNL) to develop a comparative map and the sequence data for mouse regions syntenic to human chromosome 19.

Informatics and Analytical Genomics

The informatics and analytical genomics group provides computer science support to biologists. The sequencing informatics team works directly with the DNA sequencing group to facilitate and automate sample handling, data acquisition and storage, and DNA sequence analysis and annotation. The analytical genomics team provides statistical and advanced algorithmic expertise. Tasks include development of model-based methods for data capture, signal processing, and feature extraction for DNA sequence and fingerprinting data and analysis of the effectiveness of newly proposed methods for sequencing and mapping.

Instrumentation

The instrumentation group also has multiple components. Group members provide expertise in instrumentation and automation in high-throughput electrophoresis, preparation of high-density replicate DNA and colony filters, fluorescence labeling technologies, and automated sample handling for DNA sequencing. To facilitate seamless integration of new technologies into production use, this group is coupled tightly to the biologist user groups and the informatics group.

Collaborations

The center interacts extensively with other efforts within the LLNL Biology and Biotechnology Research Program and with other programs at LLNL, the academic community, other research institutes, and industry. More than 250 collaborations range from simple probe and clone sharing to detailed gene family studies. The following list reflects some major collaborations.

• Integration of the genetic map of human chromosome 19 with corresponding mouse chromosomes (ORNL).

• Miniaturized polymerase chain reaction instrumentation (LLNL).

• Sequencing of IMAGE Consortium cDNA clones (Washington University, St. Louis).

• Mapping and sequencing of a gene associated with Finnish congenital nephrotic syndrome (University of Oulu, Finland).

Accomplishments

The LLNL Human Genome Center has excelled in several areas, including comparative genomic sequencing of DNA repair genes in human and rodent species, construction of a metric physical map of human chromosome 19, and development and application of new biochemical and mathematical approaches for constructing ordered clone maps. These and other major accomplishments are highlighted below.

• Completion of highly accurate sequencing totaling 1.6 million bases of DNA, including regions spanning human DNA repair genes, the candidate region for a congenital kidney disease gene, and other regions of biological interest on chromosome 19.

• Completion of comparative sequence analysis of 107,500 bases of genomic DNA encompassing the human DNA repair gene ERCC2 and the corresponding regions in mouse and hamster. In addition to ERCC2, analysis revealed the presence of two previously undescribed genes in all three species. One of these genes is a new member of the kinesin motor protein family. These proteins play a wide variety of roles in the cell, including movement of chromosomes before cell division.

• Complete sequencing of human genomic regions containing two additional DNA repair genes. One of these, XRCC3, maps to human chromosome 14 and encodes a protein that may be required for chromosome stability. Analysis of the genomic sequence identified another kinesin motor protein gene physically linked to XRCC3. The second human repair gene, HHR23A, maps to 19p13.2. Sequence analysis of 110,000 bases containing HHR23A identified six other genes, five of which are new genes with similarity to proteins from mouse, human, yeast, and Caenorhabditis elegans.

• Complete sequencing of full-length cDNAs for three new DNA repair genes (XRCC2, XRCC3, and XRCC9) in collaboration with the LLNL DNA repair group.

• Generation of a metric physical map of chromosome 19 spanning at least 95% of the chromosome. This unique map incorporates a metric scale to estimate the distance between genes or other markers of interest to the genetics community.

• Assembly of nearly 45 million bases of EcoR I restriction-mapped cosmid contigs for human chromosome 19 using a combination of fingerprinting and cosmid walking. Small gaps in cosmid continuity have been spanned by BAC, PAC, and P1 clones, which are then integrated into the restriction maps. The high depth of coverage of these maps (average redundancy, 4.3-fold) permits selection of a minimum overlapping set of clones for DNA sequencing.

• Placement of more than 400 genes, genetic markers, and other loci on the chromosome 19 cosmid map. Also, 165 new STSs associated with premapped cosmid contigs were generated and added to the physical map.

• Collaborations to identify the gene (COMP) responsible for two allelic genetic diseases, pseudoachondroplasia and multiple epiphyseal dysplasia, and the identification of specific mutations causing each condition.

• Through sequence analysis of the 2A subfamily of the human cytochrome P450 enzymes, identification of a new variant that exists in 10% to 20% of individuals and results in reduced ability to metabolize nicotine and the antiblood-clotting drug Coumadin.

• Location of a zinc finger gene that encodes a transcription factor regulating blood-cell development adjacent to telomere repeat sequences, possibly the gene nearest one end of chromosome 19.

• Completion of the genomic and cDNA sequence of the gene for the human Rieske Fe-S protein involved in mitochondrial respiration.

• Expansion of the mouse-human comparative genomics collaboration with ORNL to include study of new groups of clustered transcription factors found on human chromosome 19q and as syntenic homologs on mouse chromosome 7.

• Numerous collaborations (in particular, with Washington University and Merck) continuing to expand the LLNL-based IMAGE Consortium, an effort to characterize the transcribed human genome. The IMAGE clone collection is now the largest public collection of sequenced cDNA clones, with more than 500,000 arrayed clones, 500,000 sequences in public databases, and 10,000 mapped cDNAs.

• Development and deployment of a comprehensive system to handle sample tracking needs of production DNA sequencing. The system combines databases and graphical interfaces running on both Mac and Sun platforms and scales easily to handle large-scale production sequencing.

• Expansion of the LLNL genome center’s World Wide Web site to include tables that link to each gene being sequenced, to the quality scores and assembled bases collected each night during the sequencing process, and to the submitted GenBank sequence when a clone is completed. [http://bbrp.llnl.gov/test-bin/projqcsummary]

• Implementation of a new database to support sequencing and mapping work on multiple chromosomes and species. Web-based automated tools were developed to facilitate construction of this database, the loading of over 100 million bytes of chromosome 19 data from the existing LLNL database, and automated generation of Web-based input interfaces.

• Significant enhancement of the LLNL Genome Graphical Database Browser software to display and link information obtained at a subcosmid resolution from both restriction map hybridization and sequence feature data. Features, such as genes linked to diseases, allow tracking to fragments as small as 500 base pairs of DNA.

• Development of advanced microfabrication technologies to produce electrophoresis microchannels in large glass substrates for use in DNA sequencing.

• Installation of a new filter-spotting robot that routinely produces 6 × 6 × 384 filters. A 16 × 16 × 384 pattern has been achieved.

• Upgrade of the Lawrence Berkeley National Laboratory colony picker using a second computer so that imaging and picking can occur simultaneously.
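The cosmid-contig item in the list above notes that deep coverage (4.3-fold average redundancy) permits selection of a minimum overlapping set of clones for sequencing. One common way to pick such a set is a greedy interval cover; the sketch below illustrates the idea only and is not the center's software. The clone names, the map coordinates, and the function `minimal_tiling_path` are hypothetical, and clones are allowed to abut rather than required to overlap by a minimum amount.

```python
from typing import List, Tuple

# (name, start, end) in map coordinates; a hypothetical layout for illustration
Clone = Tuple[str, int, int]


def minimal_tiling_path(clones: List[Clone], region_start: int,
                        region_end: int) -> List[Clone]:
    """Greedy interval cover: repeatedly choose, among clones that start at or
    before the covered position, the one that extends furthest to the right."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Scan every clone that begins inside the already-covered stretch.
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


if __name__ == "__main__":
    example = [("c1", 0, 40), ("c2", 10, 45), ("c3", 30, 80),
               ("c4", 42, 85), ("c5", 70, 120)]
    print([name for name, _, _ in minimal_tiling_path(example, 0, 120)])
```

With the five hypothetical clones shown, the greedy pass keeps c1, c3, and c5 and drops the two redundant clones. A production version would likely also enforce a minimum overlap between neighbors and weigh clone stability.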

Future Plans

Genomic sequencing currently is the dominant function of Livermore’s Human Genome Center. The physical mapping effort will ensure an ample supply of sequence-ready clones. For sequencing targets on chromosome 19, this includes ensuring that the most stable clones (cosmids, BACs, and PACs) are available for sequencing and that regions with such known physical landmarks as STSs and expressed sequence tags (ESTs) are annotated to facilitate sequence assembly and analysis. The following targets are emphasized for DNA sequencing:

• Regions of high gene density, including regions containing gene families.

• Chromosome 19, of which at least 42 million bases are sequence ready.

• Selected BAC and PAC clones representing regions of about 0.2 million to 1 million bases throughout the human genome; clones would be selected based on such high-priority biological targets as genes involved in DNA repair, replication, recombination, xenobiotic metabolism, cell-cycle checkpoints, or other specific targets of interest.

• Selected BAC and PAC clones from mouse regions syntenic with the genes indicated above.

• Full-insert cDNAs corresponding to the genomic DNA being sequenced.

The informatics team is continuing to deploy broader-based supporting databases for both mapping and sequencing. Where appropriate, Web- and Java-based tools are being developed to enable biologists to interact with data. Recent reorganization within this group enables better direct support to the sequencing group, including evaluating and interfacing sequence-assembly algorithms and analysis tools, data and process tracking, and other informatics functions that will streamline the sequencing process.

The instrumentation effort has three major thrusts: (1) continued development or implementation of laboratory automation to support high-throughput sequencing; (2) development of the next-generation DNA sequencer; and (3) development of robotics to support high-density BAC clone screening. The last two goals warrant further explanation.

The new DNA sequencer being developed under a grant from the National Institutes of Health, with minor support through the DOE genome center, is designed to run 384 lanes simultaneously with a low-viscosity sieving medium. The entire system would be loaded automatically, run, and set up for the next run at 3-hour intervals. If successful, it should provide a 20- to 40-fold increase in throughput over existing machines.

An LLNL-designed high-precision spotting robot, which should allow a density of 98,304 spots in 96 cm², is now operating. The goal of this effort is to create high-density filters representing a 10× BAC coverage of both human and mouse genomes (30,000 clones = 1× coverage). Thus each filter would provide ~3× coverage, and eight such filters would provide the desired coverage for both genomes. The filters would be hybridized with amplicons from individual or region-specific cDNAs and ESTs; given the density of the BAC libraries, clones that hybridize should represent a binned set of BACs for a region of interest. These BACs could be the initial substrate for a BAC sequencing strategy. Performing hybridizations in parallel in mouse and human DNA facilitates the development of the mouse map (with ORNL involvement), and sequencing BACs from both species identifies evolutionarily conserved and, perhaps, regulatory regions.
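The filter arithmetic in the paragraph above can be checked with a few lines of Python. The spot count, the 30,000-clones-per-1× figure, and the 10× target come from the text; 98,304 also equals the 16 × 16 × 384 spotting pattern reported under Accomplishments. The sketch assumes one clone per spot and whole filters dedicated to a single genome, neither of which the narrative states explicitly, and the function names are illustrative.

```python
import math

SPOTS_PER_FILTER = 16 * 16 * 384   # 98,304 spots, as quoted in the text
CLONES_PER_1X = 30_000             # clones giving 1x coverage of one genome
TARGET_COVERAGE = 10               # desired fold coverage per genome
GENOMES = 2                        # human and mouse


def coverage_per_filter(spots: int = SPOTS_PER_FILTER,
                        clones_per_1x: int = CLONES_PER_1X) -> float:
    """Fold coverage held on one filter, assuming one clone per spot."""
    return spots / clones_per_1x


def filters_per_genome(target: float = TARGET_COVERAGE) -> int:
    """Whole filters needed to reach at least the target coverage."""
    return math.ceil(target / coverage_per_filter())


if __name__ == "__main__":
    print(f"coverage per filter: {coverage_per_filter():.2f}x")
    print(f"filters per genome:  {filters_per_genome()}")
    print(f"filters in total:    {filters_per_genome() * GENOMES}")
```

Under these assumptions each filter holds about 3.3× coverage, three filters fall just short of 10×, and four per genome (eight in total) meet the target, which agrees with the figures quoted.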

Information generated by sequencing human and mouse DNA in parallel is expected to expand LLNL efforts in functional genomics. Comparative sequence data will be used to develop a high-resolution synteny map of conserved mouse-human domains and incorporate automated northern expression analysis of newly identified genes. Long range, the center hopes to take advantage of a variety of forms of expression analysis, including site-directed mutation analysis in the mouse.

Summary

The Livermore Human Genome Center has undergone a dramatic shift in emphasis toward commitment to large-scale, high-accuracy sequencing of chromosome 19, other chromosomes, and targeted genomic regions in the human and mouse. The center also is committed to exploiting sequence information for functional genomics studies and for other programs, both in house and collaboratively.




Biological research was initiated at Los Alamos National Laboratory (LANL) in the 1940s, when the laboratory began to investigate the physiological and genetic consequences of radiation exposure. Eventual establishment of the national genetic sequence databank called GenBank, the National Flow Cytometry Resource, numerous related individual research projects, and fulfillment of a key role in the National Laboratory Gene Library Project all contributed to LANL’s selection as the site for the Center for Human Genome Studies in 1988.

Center Organization and Activities

The LANL genome center is organized into four broad areas of research and support: Physical Mapping, DNA Sequencing, Technology Development, and Biological Interfaces. Each area consists of a variety of projects, and work is distributed among five LANL Divisions (Life Sciences; Theoretical; Computing, Information, and Communications; Chemical Science and Technology; and Engineering Sciences and Applications). Extensive interdisciplinary interactions are encouraged.

Physical Mapping

The construction of chromosome- and region-specific cosmid, bacterial artificial chromosome (BAC), and yeast artificial chromosome (YAC) recombinant DNA libraries is a primary focus of physical mapping activities at LANL. Specific work includes the construction of high-resolution maps of human chromosomes 5 and 16 and associated informatics and gene discovery tasks.

Accomplishments

• Completion of an integrated physical map of human chromosome 16 consisting of both a low-resolution YAC contig map and a high-resolution cosmid contig map. With sequence tagged site (STS) markers provided on average every 125,000 bases, the YAC-STS map provides almost-complete coverage of the chromosome’s euchromatic arms. All available loci continue to be incorporated into the map.

• Construction of a low-resolution STS map of human chromosome 5 consisting of 517 STS markers regionally assigned by somatic-cell hybrid approaches. Around 95% mega-YAC–STS coverage (50 million bases) of 5p has been achieved. Additionally, about 40 million bases of 5q mega-YAC–STS coverage have been obtained collaboratively.

• Refinement of BAC cloning procedures for future production of chromosome-specific libraries. Successful partial digestion and cloning of microgram quantities of chromosomal DNA embedded in agarose plugs. Efforts continue to increase the average insert size to about 100,000 bases.

DNA Sequencing

DNA sequencing at the LANL center focuses on low-pass sample sequencing (SASE) of large genomic regions. SASE data are deposited in publicly available databases to allow for wide distribution. Finished sequencing is prioritized from initial SASE analysis and pursued by parallel primer walking. Informatics development includes data tracking, gene-discovery integration with the Sequence Comparison ANalysis (SCAN) program, and functional genomics interaction.
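As a rough guide to what a low-pass sampling pass yields, the idealized Lander-Waterman model predicts that reads placed uniformly at random at redundancy c cover about 1 − e^(−c) of the target. The model is brought in here as an assumption; the narrative does not say how LANL estimates SASE coverage, and the read counts and lengths in the example are hypothetical.

```python
import math


def redundancy(read_count: int, read_length: int, target_length: int) -> float:
    """Average fold coverage c = (number of reads x read length) / target length."""
    return read_count * read_length / target_length


def expected_fraction_covered(c: float) -> float:
    """Idealized Lander-Waterman estimate of the fraction of bases covered at
    least once when reads land uniformly at random (end effects ignored)."""
    return 1.0 - math.exp(-c)


if __name__ == "__main__":
    # Hypothetical pass: 1,000 reads of 400 bases over a 400,000-base region.
    c = redundancy(1_000, 400, 400_000)
    print(f"redundancy {c:.1f}x -> about {expected_fraction_covered(c):.0%} covered")
```

At 1× redundancy the model gives roughly 63% of bases sampled at least once, which is why a low-pass survey can reveal many genes in a region before any finishing work is scheduled.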

Accomplishments

• SASE sequencing of 1.5 million bases from the p13 region of human chromosome 16.

• Discovery of more than 100 genes in SASE sequences.

• Generation of finished sequence for a 240,000-base telomeric region of human chromosome 7q. From initial sequences generated by SASE, oligonucleotides were synthesized and used for primer walking directly from cosmids comprising the contig map. Complete sequencing was performed to determine what genes, if any, are near the 7q terminus. This intriguing region lacks significant blocks of subtelomeric repeat DNA typically present near eukaryotic telomeres.


Los Alamos National Laboratory Center for Human Genome Studies

Center for Human Genome Studies
Los Alamos National Laboratory
P.O. Box 1663
Los Alamos, NM 87545

Robert K. Moyzis, Director, 1989–97*

*Now at University of California, Irvine

Larry L. Deaven, Acting Director
505/667-3912, Fax: [email protected]

Lynn Clark, Technical Coordinator
505/667-9376, Fax: [email protected]

http://www-ls.lanl.gov/masterhgp.html



• Complete single-pass sequencing of 2018 exon clones generated from LANL’s flow-sorted human chromosome 16 cosmid library. About 950 discrete sequences were identified by sequence analysis. Nearly 800 appear to represent expressed sequences from chromosome 16.

• Development of Sequence Viewer to display ABI sequences with trace data on any computer having an Internet connection and a Netscape World Wide Web browser.

• Sequencing and analysis of a novel pericentromeric duplication of a gene-rich cluster between 16p11.1 and Xq28 (in collaboration with Baylor College of Medicine).

Technology Development

Technology development encompasses a variety of activities, both short and long term, including novel vectors for library construction and physical mapping; automation and robotics tools for physical mapping and sequencing; novel approaches to DNA sequencing involving single-molecule detection; and novel approaches to informatics tools for gene identification.

Accomplishments

• Development of SCAN program for large-scale sequence analysis and annotation, including a translator converting SCAN data to GIO format for submission to Genome Sequence DataBase.

• Application of flow-cytometric approach to DNA sizing of P1 artificial chromosome (PAC) clones. Less than one picogram of linear or supercoiled DNA is analyzed in under 3 minutes. Sizing range has been extended down to 287 base pairs. Efforts continue to extend the upper limit beyond 167,000 bases.

• Characterization of the detection of single, fluorescently tagged nucleotides cleaved from multiple DNA fragments suspended in the flow stream of a flow cytometer. The cleavage rate for Exo III at 37°C was measured to be about 5 base pairs per second per M13 DNA fragment. To achieve a single-color sequencing demonstration, either the background burst rate (currently about 5 bursts per second) must be reduced or the exonuclease cleavage rate must be increased significantly. Techniques to achieve both are being explored.

• Construction of a simple and compact apparatus, based on a diode-pumped Nd:YAG laser, for routine DNA fragment sizing.

• Development of a new approach to detect coding sequences in DNA. This complete spectral analysis of coding and noncoding sequences is as sensitive in its first implementations as the best existing techniques.

• Use of phylogenetic relationships to generate new profiles of amino acid usage in conserved domains. The profiles are particularly useful for classification of distantly related sequences.
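The list above mentions a spectral approach to detecting coding sequences but does not describe the method. A widely known spectral signal, used here only as a stand-in and not as the LANL technique, is the period-3 peak in the Fourier spectrum of base-indicator sequences, which tends to be stronger in protein-coding DNA because of codon structure. The function name and the toy sequences below are hypothetical.

```python
import cmath


def period3_power(seq: str) -> float:
    """Sum over A, C, G, T of the squared magnitude of the Fourier coefficient
    at frequency 1/3 of each base's 0/1 indicator sequence, divided by the
    squared sequence length so that values are comparable across lengths."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # Fourier coefficient at frequency 1/3 for this base's indicator.
        coeff = sum(cmath.exp(-2j * cmath.pi * k / 3)
                    for k, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total / (n * n)


if __name__ == "__main__":
    codon_like = "ATG" * 30      # toy sequence with perfect 3-base periodicity
    period_four = "ACGT" * 30    # toy sequence with no 3-base periodicity
    print(period3_power(codon_like) > period3_power(period_four))
```

In practice such a score would be computed in a sliding window along a genomic sequence and compared against a threshold calibrated on known coding and noncoding regions; the toy inputs here only show that the measure separates a 3-periodic signal from a non-3-periodic one.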

Biological Interfaces

The Biological Interfaces effort targets genes and chromosome regions associated with DNA damage and repair, mitotic stability, and chromosome structure and function as primary subjects for physical mapping and sequencing. Specific disease-associated genes on human chromosome 5 (e.g., Cri-du-Chat syndrome) and on 16 (e.g., Batten’s disease and Fanconi anemia) are the subjects of collaborative biological projects.

Accomplishments

• Identification of two human 7q exons having 99% homology to the cDNA of a known human gene, vasoactive intestinal peptide receptor 2A. Preliminary data suggest that the VIPR2A gene is expressed.

• Identification of numerous expressed sequence tags (ESTs) localized to the 7q region. Since three of the ESTs contain at least two regions with high confidence of homology (~90%), genes in addition to VIPR2A may exist in the terminal region of 7q.

• Generation of high-resolution cosmid coverage on human chromosome 5p for the larynx and critical regions identified with Cri-du-Chat syndrome, the most common human terminal-deletion syndrome (in collaboration with Thomas Jefferson University).

• Refinement of the Wolf-Hirschhorn syndrome (WHS) critical region on human chromosome 4p. Using the SCAN program to identify genes likely to contribute to WHS, the project serves as a model for defining the interaction between genomic sequencing and clinical research.

• Collaborative construction of contigs for human chromosome 16, including 1.05 million bases in cosmids through the familial Mediterranean fever (FMF) gene region (with members of the FMF Consortium) and 700,000 bases in P1 clones encompassing the polycystic kidney disease gene (with Integrated Genetics, Inc.).

• Collaborative identification and determination of the complete genomic structure of the Batten’s disease gene (with members of the BDG Consortium), the gamma subunit of the human amiloride-sensitive epithelial channel (Liddle’s syndrome, with University of Iowa), and the polycystic kidney disease gene (with Integrated Genetics).

• Participation in an international collaborative research consortium that successfully identified the gene responsible for Fanconi anemia type A.

Patents, Licenses, and CRADAs

• Rhett L. Affleck, James N. Demas, Peter M. Goodwin, Jay A. Schecker, Ming Wu, and Richard A. Keller, “Reduction of Diffusional Defocusing in Hydrodynamically Focused Flows by Complexing with a High Molecular Weight Adduct,” United States Patent, filed December 1996.

• R.L. Affleck, W.P. Ambrose, J.D. Demas, P.M. Goodwin, M.E. Johnson, R.A. Keller, J.T. Petty, J.A. Schecker, and M. Wu, “Photobleaching to Reduce or Eliminate Luminescent Impurities for Ultrasensitive Luminescence Analysis,” United States Patent, S-87,208, accepted September 1997.

• J.H. Jett, M.L. Hammond, R.A. Keller, B.L. Marrone, and J.C. Martin, “DNA Fragment Sizing and Sorting by Laser-Induced Fluorescence,” United States Patent, S.N. 75,001, allowed May 1996.

• James H. Jett, “Method for Rapid Base Sequencing in DNA and RNA with Three Base Labeling,” in preparation.

• Development license and exclusive license to LANL’s DNA sizing patent obtained by Molecular Technology, Inc., for commercial application of single-molecule detection capability to DNA sizing.

Future Plans

LANL has joined a collaboration with California Institute of Technology and The Institute for Genomic Research to construct a BAC map of the p arm of human chromosome 16 and to complete the sequence of a 20-million–base region of this map.

In its evolving role as part of the new DOE Joint Genome Institute, LANL will continue scaleup activities focused on high-throughput DNA sequencing. Initial targets include genes and DNA regions associated with chromosome structure and function, syntenic break-points, and relevant disease-gene loci.

A joint DNA sequencing center was established recently by LANL at the University of New Mexico. This facility is responsible for determining the DNA sequence of clones constructed at LANL, then returning the data to LANL for analysis and archiving.




Lawrence Berkeley National Laboratory Human Genome Center

Human Genome Center
Lawrence Berkeley National Laboratory
1 Cyclotron Road
Berkeley, CA 94720

Michael Palazzolo,* Director, 1996–97

*Now at Amgen, Inc.

Contact: Mohandas Narla
510/486-7029, Fax: [email protected]

Joyce Pfeiffer, Administrative Assistant

http://www-hgc.lbl.gov/GenomeHome.html

Since 1937 the Ernest Orlando Lawrence Berkeley National Laboratory (LBNL) has been a major contributor to knowledge about human health effects resulting from energy production and use. That was the year John Lawrence went to Berkeley to use his brother Ernest’s cyclotrons to launch the application of radioactive isotopes in biological and medical research. Fifty years later, Berkeley Lab’s Human Genome Center was established.

Now, after another decade, an expansion of biological research relevant to Human Genome Project goals is being carried out within the Life Sciences Division, with support from the Information and Computing Sciences and Engineering divisions. Individuals in these research projects are making important new contributions to the key fields of molecular, cellular, and structural biology; physical chemistry; data management; and scientific instrumentation. Additionally, industry involvement in this growing venture is stimulated by Berkeley Lab’s location in the San Francisco Bay area, home to the largest congregation of biotechnology research facilities in the world.

In July 1997 the Berkeley genome center became part of the Joint Genome Institute.

Sequencing

Large-scale genomic sequencing has been a central, ongoing activity at Berkeley Lab since 1991. It has been funded jointly by DOE (for human genome production sequencing and technology development) and the NIH National Human Genome Research Institute [for sequencing the Drosophila melanogaster model system, which is carried out in partnership with the University of California, Berkeley (UCB)]. The human genome sequencing area at Berkeley Lab consists of five groups: Bioinstrumentation, Automation, Informatics, Biology, and Development. Complementing these activities is a group in Life Sciences Division devoted to functional genomics, including the transgenics program.

The directed DNA sequencing strategy at Berkeley Lab was designed and implemented to increase the efficiency of genomic sequencing. A key element of the directed approach is maintaining information about the relative positions of potential sequencing templates throughout the entire sequencing process. Thus, intelligent choices can be made about which templates to sequence, and the number of selected templates can be kept to a minimum. More important, knowledge of the interrelationship of sequencing runs guides the assembly process, making it more resistant to difficulties imposed by repeated sequences. As of July 3, 1997, Berkeley Lab had generated 4.4 megabases of human sequence and, in collaboration with UCB, had tallied 7.6 megabases of Drosophila sequence.

Instrumentation and Automation

The instrumentation and automation program encompasses the design and fabrication of custom apparatus to facilitate experiments, the programming of laboratory robots to automate repetitive procedures, and the development of (1) improved hardware to extend the applicability range of existing commercial robots and (2) an integrated operating system to control and monitor experiments. Although some discrete instrumentation modules used in the integrated protocols are obtained commercially, LBNL designs its own custom instruments when existing capabilities are inadequate. The instrumentation modules are then integrated into a large system to facilitate large-scale production sequencing. In addition, a significant effort is devoted to improving fluorescence-assay methods, including DNA sequence analysis and mass spectrometry for molecular sizing.

Recent advances in the instrumentation group include DNA Prep machine and Prep Track. These instruments are designed to automate completely the highly repetitive and labor-intensive DNA-preparation procedure to provide higher daily throughput and DNA of consistent quality for sequencing (see Web pages: http://hgighub.lbl.gov/esd/DNAPrep/TitlePage.html and http://hgighub.lbl.gov/esd/repTrackWebpage/preptrack.htm).

Berkeley Lab’s near-term needs are for 960 samples per day of DNA extracted from overnight bacteria growths. The DNA protocol is a modified boil prep prepared in a 96-well format. Overnight bacteria growths are lysed, and samples are separated from cell debris by centrifugation. The DNA is recovered by ethanol precipitation.

Informatics

The informatics group is focused on hardware and software support and system administration, software development for end sequencing, transposon mapping and sequence template selection, data-flow automation, gene finding, and sequence analysis. Data-flow automation is the main emphasis. Six key steps have been identified in this process, and software is being written and tested to automate all six. The first step involves controlling gel quality, trimming vector sequence, and storing the sequences in a database. A program module called Move-Track-Trim, which is now used in production, was written to handle these steps. The second through fourth steps in this process involve assembling, editing, and reconstructing P1 clones of 80,000 base pairs from 400-base traces. The fifth step is sequence annotation, and the sixth is data submission.
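The first of the six steps (quality control, vector trimming, and storage) can be illustrated with a short sketch. This is not the Move-Track-Trim module itself, whose internals are not described here. The `Read` class, the `quality_ok` flag, the exact-match trimming of vector ends, and the dictionary standing in for a database are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    bases: str
    quality_ok: bool  # hypothetical flag set by an upstream gel-quality check


def trim_vector(bases: str, vector_prefix: str, vector_suffix: str) -> str:
    """Remove exact-match vector sequence from either end of a read."""
    if vector_prefix and bases.startswith(vector_prefix):
        bases = bases[len(vector_prefix):]
    if vector_suffix and bases.endswith(vector_suffix):
        bases = bases[:-len(vector_suffix)]
    return bases


def first_step(reads, vector_prefix, vector_suffix):
    """Drop reads that fail quality control, trim the rest, and store them.

    A plain dict stands in for the sequence database (an assumption).
    """
    store = {}
    for read in reads:
        if not read.quality_ok:
            continue  # rejected at the quality-control stage
        store[read.name] = trim_vector(read.bases, vector_prefix, vector_suffix)
    return store
```

A production trimmer would have to tolerate sequencing errors inside the vector region. Exact matching is used here only to keep the sketch short.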

Annotation can greatly enhance the biological value of these sequences. Useful annotations include homologies to known genes, possible gene locations, and gene signals such as promoters. LBNL is developing a workbench for automatic sequence annotation and annotation viewing and editing. The goal is to run a series of sequence-analysis tools and display the results to compare the various predictions. Researchers then will be able to examine all the annotations (for example, genes predicted by various gene-finding methods) and select the ones that look best.

Nomi Harris developed Genotator, an annotation workbench consisting of a stand-alone annotation browser and several sequence-analysis functions. The back end runs several gene finders, homology searches (using BLAST), and signal searches and saves the results in “.ace” format. Genotator thus automates the tedious process of operating a dozen different sequence-analysis programs with many different input and output formats. Genotator can function via command-line arguments or with the graphical user interface (http://www-hgc.lbl.gov/inf/annotation.html).

Progress to Date

Chromosome 5

Over the last year, the center has focused its production genomic sequencing on the distal 40 megabases of the human chromosome 5 long arm. This region was chosen because it contains a cluster of growth factor and receptor genes and is likely to yield new and functionally related genes through long-range sequence analysis. Results to date include:

• 40-megabase nonchimeric map containing 82 yeast artificial chromosomes (YACs) in the chromosome 5 distal long arm.

• 20-megabase contig map in the region of 5q23-q33 that contains 198 P1s, 60 P1 artificial chromosomes, and 495 bacterial artificial chromosomes (BACs) linked by 563 sequence tagged sites (STSs) to form contigs.

• 20-megabase bins containing 370 BACs in 74 bins in the region of 5q33-q35.

Chromosome 21

An early project in the study of Down syndrome (DS), which is characterized by chromosome 21 trisomy, constructed a high-resolution clone map in the chromosome 21 DS region to be used as a pilot study in generating a contiguous gene map for all of chromosome 21. This project has integrated P1 mapping efforts with transgenic studies in the Life Sciences Division. P1 maps provide a suitable form of genomic DNA for isolating and mapping cDNA.

• 186 clones isolated in the major DS region of chromosome 21 comprising about 3 megabases of genomic DNA extending from D21S17 to ETS2. Through cross-hybridization, overlapping P1s were identified, as well as gaps between two P1 contigs, and transgenic mice were created from P1 clones in the DS region for use in phenotypic studies.

Transgenic Mice

One of the approaches for determining the biological function of newly identified genes uses YAC transgenic mice. Human sequence harbored by YACs in transgenic mice has been shown to be correctly regulated both temporally and spatially. A set of nonchimeric overlapping YACs identified from the 5q31 region has been used to create transgenic mice. This set of transgenic mice, which together harbor 1.5 megabases of human sequence, will be used to assess the expression pattern and potential function of putative genes discovered in the 5q31 region. Additional mapping and sequencing are under way in a region of human chromosome 20 amplified in certain breast tumor cell lines.

Resource for Molecular Cytogenetics

Divining landmarks for human disease amid the enormous plain of the human genetic map is the mission of an ambitious partnership among the Berkeley Lab; University of California, San Francisco; and a diagnostics company. The collaborative Resource for Molecular Cytogenetics is charting a course toward important sites of biological interest on the 23 pairs of human chromosomes (http://rmc-www.lbl.gov).

The Resource employs the many tools of molecular cytogenetics. The most basic of these tools, and the cornerstone of the Resource’s portfolio of proprietary technology, is a method generally known as “chromosome painting,” which uses a technique referred to as fluorescence in situ hybridization or FISH. This technology was invented by LBNL Resource leaders Joe Gray and Dan Pinkel.

A technology to emerge recently from the Resource is known as “Quantitative DNA Fiber Mapping (QDFM).” High-resolution human genome maps in a form suitable for DNA sequencing traditionally have been constructed by various methods of fingerprinting, hybridization, and identification of overlapping STSs. However, these techniques do not readily yield information about sequence orientation, the extent of overlap of these elements, or the size of gaps in the map. Ulli Weier of the Resource developed the QDFM method of physical map assembly that enables the mapping of cloned DNA directly onto linear, fully extended DNA molecules. QDFM allows unambiguous assembly of critical elements leading to high-resolution physical maps. This task now can be accomplished in less than 2 days, as compared with weeks by conventional methods. QDFM also enables detection and characterization of gaps in existing physical maps, a crucial step toward completing a definitive human genome map.




University of Washington Genome Center

University of Washington Genome Center
Department of Medicine
Box 352145
Seattle, WA 98195

The Human Genome Project soon will need to increase rapidly the scale at which human DNA is analyzed. The ultimate goal is to determine the order of the 3 billion bases that encode all heritable information. During the 20 years since effective methods were introduced to carry out DNA sequencing by biochemical analysis of recombinant-DNA molecules, these techniques have improved dramatically. In the late 1970s, segments of DNA spanning a few thousand bases challenged the capacity of world-class sequencing laboratories. Now, a few million base pairs per year represent state-of-the-art output for a single sequencing center.

However, the Human Genome Project is directed toward completing the human sequence in 5 to 10 years, so the data must be acquired with technology available now. This goal, while clearly feasible, poses substantial organizational and technical challenges. Organizationally, genome centers must begin building data-production units capable of sustained, cost-effective operation. Technically, many incremental refinements of current technology must be introduced, particularly those that remove impediments to increasing the scale of DNA sequencing. The University of Washington (UW) Genome Center is active in both areas.

Production Sequencing

Both to gain experience in the production of high-quality, low-cost DNA sequence and to generate data of immediate biological interest, the center is sequencing several regions of human and mouse DNA at a current throughput of 2 million bases per year. This “production sequencing” has three major targets: the human leukocyte antigen (HLA) locus on human chromosome 6, the mouse locus encoding the alpha subunit of T-cell receptors, and an “anonymous” region of human chromosome 7.

The HLA locus encodes genes that must be closely matched between organ donors and organ recipients. This sequence data is expected to lead to long-term improvements in the ability to achieve good matches between unrelated organ donors and recipients.

The mouse locus that encodes components of the T-cell–receptor family is of interest for several reasons. The locus specifies a set of proteins that play a critical role in cell-mediated immune responses. It provides sequence data that will help in the design of new experimental approaches to the study of immunity in mice, one of the most important experimental animals for immunological research. In addition, the locus will provide one of the first large blocks of DNA sequence for which both human and mouse versions are known.

Human-mouse sequence comparisons provide a powerful means of identifying the most important biological features of DNA sequence because these features are often highly conserved, even between such biologically different organisms as human and mouse. Finally, sequencing an “anonymous” region of human chromosome 7, a region about which little was known previously, provides experience in carrying out large-scale sequencing under the conditions that will prevail throughout most of the Human Genome Project.

Technology for Large-Scale Sequencing

In addition to these pilot projects, the UW Genome Center is developing incremental improvements in current sequencing technology. A particular focus is on enhanced computer software to process raw data acquired with automated laboratory instruments that are used in DNA mapping and sequencing. Advanced instrumentation is commercially available for determining DNA sequence via the “four-color–fluorescence method,” and this instrumentation is expected to carry the main experimental load of the Human Genome Project. Raw data produced by these instruments, however, require extensive processing before they are ready for biological analysis.

Large-scale sequencing involves a “divide-and-conquer” strategy in which the huge DNA molecules present in human cells are broken into smaller pieces that can be propagated by recombinant-DNA methods. Individual analyses ultimately are carried out on segments of less than 1000 bases. Many such analyses, each of which still contains numerous errors, must be melded together to obtain finished sequence. During the melding, errors in individual analyses must be recognized and corrected. In typical large-scale sequencing projects, the results of thousands of analyses are melded to produce highly accurate sequence (less than one error in 10,000 bases) that is continuous in blocks of 100,000 or more bases. The UW Genome Center is playing a major role in developing software that allows this process to be carried out automatically with little need for expert intervention. Software developed in the UW center is used in more than 50 sequencing laboratories around the world, including most of the large-scale sequencing centers producing data for the Human Genome Project.
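To see why melding many error-prone analyses can yield an accurate result, consider a toy consensus calculation. The sketch below uses simple per-column majority voting over reads that are assumed to be already aligned. It is not the UW center's software, whose methods are not described in this narrative. The function name, the `-` padding convention, and the `N` placeholder are assumptions made for the example.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over pre-aligned reads.

    Each read is a string padded with '-' where it does not cover a column.
    Columns that no read covers are reported as 'N'.
    """
    length = max(len(read) for read in aligned_reads)
    called = []
    for i in range(length):
        column = [r[i] for r in aligned_reads if i < len(r) and r[i] != "-"]
        if not column:
            called.append("N")
            continue
        base, _count = Counter(column).most_common(1)[0]
        called.append(base)
    return "".join(called)
```

With three reads covering a position, one erroneous base call is outvoted by the other two. Real assembly software also has to compute the alignment and weigh base-call quality, both of which this sketch leaves out.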

Maynard Olson, Director
206/685-7366, Fax: [email protected]

http://www.genome.washington.edu



High-Resolution Physical Mapping

The UW Genome Center also is developing improved software that addresses a higher-level problem in large-scale sequencing. The starting point for large-scale sequencing typically is a recombinant-DNA molecule that allows propagation of a particular human genomic segment spanning 50,000 to 200,000 bases. Much effort during the last decade has gone into the physical mapping of such molecules, a process that allows huge regions of chromosomes to be defined in terms of sets of overlapping recombinant-DNA molecules whose precise positions along the chromosome are known. However, the precision required for knowing relationships of recombinant-DNA molecules derived from neighboring chromosomal portions increases as the Human Genome Project shifts its emphasis from mapping to sequencing.

High-resolution maps both guide the orderly sequencing of chromosomes and play a critical role in quality control. Only by mapping recombinant-DNA molecules at high resolution can subtle defects in particular molecules be recognized. Such defective human DNA sources, which are not faithful replicas of the human genome, must be weeded out before sequencing can begin. The UW Genome Center has a major program in high-resolution physical mapping which, like the work on sequencing itself, uses advanced computing tools. The center is producing maps of regions targeted for sequencing on a just-in-time basis. These highly detailed maps are proving extremely valuable in facilitating the production of high-quality sequence.

Ultimate Goal

Although many challenges currently posed by the Human Genome Project are highly technical, the ultimate goal is biological. The project will deliver immense amounts of high-quality, continuous DNA sequence into publicly accessible databases. These data will be annotated so that biologists who use them will know the most likely positions of genes and have convenient access to the best available clues about the probable function of these genes. The better the technical solutions to current challenges, the better the center will be able to serve future users of the human genome sequence.

Genome Database

Genome Database
Johns Hopkins University
2024 E. Monument Street
Baltimore, MD 21205-2236

David Kingsbury, Director, 1993–97*

*Now at Chiron Pharmaceuticals, Emeryville, California

The release of Version 6 of the Genome Database (GDB) in January 1996 signaled a major change for both the scientific community and GDB staff. GDB 6.0 introduced a number of significant improvements over previous versions of GDB, most notably a revised data representation for genes and genomic maps and a new curatorial model for the database. These new features, along with a remodeled database structure and new schema and user interface, provide a resource with the potential to integrate all scientific information currently available on human genomics. GDB rapidly is becoming the international biomedical research community’s central source for information about genomic structure, content, diversity, and evolution.

A New Data Model

Inherent in the underlying organization of information in GDB is an improved model for genes, maps, and other classes of data. In particular, genomic segments (any named region of the genome) and maps are being expanded regularly. New segment types have been added to support the integration of mapping and sequencing data (for example, gene elements and repeats) and the construction of comparative maps (syntenic regions). New map types include comparative maps for representing conserved syntenies between species and comprehensive maps that combine data from all the various submitted maps within GDB to provide a single integrated view of the genome. Experimental observations such as order, size, distance, and chimerism are also available.
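The relationship between segments, maps, and an integrated view can be pictured with a few small data structures. The sketch below is purely illustrative and does not reproduce GDB's schema. The class names, field names, and the `comprehensive_view` helper are hypothetical, and positions from different map types are kept side by side rather than converted to a common scale.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GenomicSegment:
    name: str
    segment_type: str  # e.g. "gene", "repeat", "syntenic region"


@dataclass
class GenomeMap:
    name: str
    map_type: str  # e.g. "linkage", "contig", "comparative"
    placements: dict = field(default_factory=dict)  # segment name -> position


def comprehensive_view(maps):
    """Collect, for each segment, every submitted map that places it."""
    view = {}
    for genome_map in maps:
        for segment_name, position in genome_map.placements.items():
            view.setdefault(segment_name, []).append((genome_map.name, position))
    return view
```

Keeping the source map with each placement means a combined view can still be traced back to the individual submissions it was built from.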

Through the World Wide Web, GDB links its stored data with many other biological resources on the Internet. GDB’s External Link category is a growing collection of cross-references established between GDB entities and related information in other databases. By providing a place for these cross-references, GDB can serve as a central point of inquiry into technical data regarding human genomics.

Direct Community Data Submission and Curation

Two methods for data submission are in use. For individuals submitting small amounts of data, interactive editing of the database through the Web became available in April 1996, and the process has undergone several simplifications since that time. This continues to be an area of development for GDB because all editing must take place at the Baltimore site, and Internet connections from outside North America may be too slow for interactive editing to be practical. Until these difficulties are resolved, GDB encourages scientists with limited connectivity to Baltimore to submit their data via more traditional means (e-mail, fax, mail, phone) or to prepare electronic submissions for entry by the data group on site.

For centers submitting large quantities of data, GDB developed an electronic data submission (EDS) tool, which provides the means to specify login password validation and commands for inserting and updating data in GDB. The EDS syntax includes a mechanism for relating a center’s local naming conventions to GDB objects. Data submitted to GDB may be stored privately for up to 6 months before it automatically becomes public. The database is programmed to enforce this Human Genome Project policy. Detailed specifications of GDB’s EDS syntax and other submission instructions are available (EDS prototype, http://www.gdb.org/eds).

Since the EDS system was implemented, GDB has put forth an aggressive effort to increase the amount of data stored in the database. Consequently, the database has grown tremendously. During 1996 it grew from 1.8 to 6.7 gigabytes.
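The six-month privacy limit amounts to a date comparison. A minimal sketch follows, assuming the hold is counted in calendar months from the submission date. The narrative does not say how GDB actually computes the deadline, and `add_months` and `is_public` are hypothetical helper names.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month, so 31 August
    plus six months gives the last day of February.
    """
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_public(submitted: date, today: date, hold_months: int = 6) -> bool:
    """True once the maximum private-storage period has elapsed."""
    return today >= add_months(submitted, hold_months)
```

Because data may be held for "up to" six months, a submitter could release it earlier. The check above models only the automatic release at the end of the period.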

To provide accountability regarding data quality, the shift to community curation introduced the idea that individuals and laboratories own the data they submit to GDB and that other researchers cannot modify it. However, others should be able to add information and comments, so an additional feature is the community’s ability to conduct electronic online public discussions by annotating the database submissions of fellow researchers. GDB is the first database of its kind to offer this feature, and the number of third-party annotations is increasing in the form of editorial commentary, links to literature citations, and links to other databases external to GDB. These links are an important part of the curatorial process because they make other data collections available to GDB users in an appropriate context.

Stanley Letovsky, Informatics [email protected]

Robert Cottingham, Operations [email protected]

Telephone for both: 410/955-9705
Fax for both: 410/614-0434

http://www.gdb.org

Improved Map Representation and Querying

Accompanying the release of GDB 6.0, the program Mapview creates graphical displays of maps. Mapview was developed at GDB to display a number of map types (cytogenetic, radiation hybrid, contig, and linkage) using common graphical conventions found in the literature. Mapview is designed to stand alone or to be used in conjunction with a Web browser such as Netscape, thereby creating an interactive graphical display system. When used with Netscape, Mapview allows the user to retrieve details about any displayed map object.

Maps are accessed through the query form for genomic segment and its subclasses via a special program that allows the user to select whole maps or slices of maps from specific regions of interest and to query by map type. The ability to browse maps stored in GDB or download them in the background was also incorporated into GDB 6.0.

GDB stores many maps of each chromosome, generated by a variety of mapping methods. Users who are interested in a region, such as the neighborhood of a gene or marker, will be able to see all maps that have data in that region, whether or not they contain the desired marker. To support database querying by region of interest, integrated maps have been developed that combine data from all the maps for each chromosome. These are called Comprehensive Maps.

Queries for all loci in a region of interest are processed against the comprehensive maps, thereby searching all relevant maps. Comprehensive maps are also useful for display purposes because they organize the content of a region by class of locus (e.g., gene, marker, clone) rather than by data source. This approach yields a much less complex presentation than an alignment of numerous primary maps. Because such information as detailed orders, order discrepancies between maps, and nonlinear metric relations between maps is not always captured in the comprehensive maps, GDB continues to provide access to aligned displays of primary maps.

A Variety of Searching Strategies

Recognizing the eclectic user community’s need to search data and formulate queries, GDB offers a spectrum of simple to complex search strategies. In addition, direct programming access is available using either GDB’s object query language to the Object Broker software layer or Structured Query Language (SQL) to the underlying Sybase relational database.

Querying by Object Directly from GDB’s Home Page

The simplest methods search for objects according to known GDB accession numbers; sequence database–accession numbers; specified names, including wildcard symbols that will automatically match synonyms and primary names; and keywords contained anywhere in the text.

Querying by Region of Interest

A region of interest can be specified using a pair of flanking markers, which can be cytogenetic bands, genes, amplimers (sequence tagged sites), or any other mapped objects. Given a region of interest, the comprehensive maps are searched to find all loci that fall within it. These loci can be displayed in a table, graphically as a slice through a comprehensive map, or as slices through a chosen set of primary maps. A comprehensive map slice shows all loci in the region, including genes, expressed sequence tags (ESTs), amplimers, and clones. A region also can be specified as a neighborhood around a single marker of interest.
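A region query of this kind reduces to selecting loci whose positions fall within an interval. The sketch below assumes a comprehensive map is available as a list of `(name, position, locus_class)` tuples on a single coordinate scale. That representation, the function names, and the `radius` parameter are assumptions made for illustration and do not describe GDB's query interface.

```python
def loci_between(comprehensive_map, low, high):
    """Return (name, locus_class) for loci with low <= position <= high."""
    ordered = sorted(comprehensive_map, key=lambda locus: locus[1])
    return [(name, cls) for name, pos, cls in ordered if low <= pos <= high]


def loci_in_region(comprehensive_map, left_marker, right_marker):
    """Region bounded by two flanking markers, given in either order."""
    positions = {name: pos for name, pos, _cls in comprehensive_map}
    low, high = sorted((positions[left_marker], positions[right_marker]))
    return loci_between(comprehensive_map, low, high)


def neighborhood(comprehensive_map, marker, radius):
    """Region defined as a neighborhood around a single marker."""
    positions = {name: pos for name, pos, _cls in comprehensive_map}
    center = positions[marker]
    return loci_between(comprehensive_map, center - radius, center + radius)
```

Returning the locus class with each name makes it easy to group the hits by gene, amplimer, or clone, which is how the comprehensive maps organize a region.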

Results of queries for genes, amplimers, ESTs, or clones can be displayed on a GDB comprehensive map. Results spread across several chromosomes are displayed together in Mapview. For example, a query for all the PAX genes (specified as symbol = PAX* on the gene query form) retrieves genes on multiple chromosomes. Double-clicking on one of these genes brings up detailed gene information via the Web browser.
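Wildcard matching against both primary names and synonyms, as in the PAX* query, can be sketched with Python's standard `fnmatch` module. The records below are illustrative and were not retrieved from GDB. `GENE1` and its synonym `PAXLIKE` are invented to show synonym matching, and the record layout is an assumption.

```python
from fnmatch import fnmatchcase

# Illustrative records only; GENE1 and PAXLIKE are made up for the example.
RECORDS = [
    {"symbol": "PAX3", "chromosome": "2", "synonyms": []},
    {"symbol": "PAX6", "chromosome": "11", "synonyms": []},
    {"symbol": "VIPR2", "chromosome": "7", "synonyms": []},
    {"symbol": "GENE1", "chromosome": "1", "synonyms": ["PAXLIKE"]},
]


def match_symbols(records, pattern):
    """Return (symbol, chromosome) for records whose primary name or any
    synonym matches the case-sensitive wildcard pattern."""
    hits = []
    for record in records:
        names = [record["symbol"], *record.get("synonyms", [])]
        if any(fnmatchcase(name, pattern) for name in names):
            hits.append((record["symbol"], record["chromosome"]))
    return hits
```

Calling `match_symbols(RECORDS, "PAX*")` returns hits on chromosomes 2, 11, and 1, the last through a synonym, which mirrors how a single symbol query can retrieve genes on several chromosomes.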

Querying by Polymorphism

GDB contains a large number of polymorphisms associated with genes and other markers. Queries can be constructed for a particular type of marker (e.g., gene, amplimer, clone), polymorphism (e.g., dinucleotide repeat), or level of heterozygosity. These queries can be combined with positional queries to find, for example, polymorphic amplimers in a region bounded by flanking markers or in a particular chromosomal band. If desired, the retrieved markers can be viewed on a comprehensive map.

Work in Progress

Mapview 2.3

Mapview 2.1, the next generation of the GDB map viewer, was released in March 1997. The latest version, Mapview 2.3, is available in all common computing environments because it is written in the Java programming language. Most important, the new viewer can display multiple aligned maps side by side in the window, with alignment lines indicating common markers in neighboring maps. As before, users can select individual markers to retrieve more information about them from the database.

GDB developers have entered into a collaborative relationship with other members of the bioWidget Consortium so the Java-based alignment viewer will become part of a collection of freely available software tools for displaying biological data (http://goodman.jax.org/projects/biowidgets/consortium).

Future plans for Mapview include providing or enhancing the ability to generate manuscript-ready Postscript map images, highlight or modify the display of particular classes of map objects based on attribute values, and requery for additional information.

Variation

Since its inception, GDB has been a repository for polymorphism data, with more than 18,000 polymorphisms now in GDB. A collaboration has been initiated with the Human Gene Mutation Database (HGMD) based in Cardiff, Wales, and headed by David Cooper and Michael Krawczak. HGMD’s extensive collection of human mutation data, covering many disease-causing loci, includes sequence-level mutation characterizations. This data set will be included in GDB and updated from HGMD on an ongoing basis. The HGMD team also will provide advice on GDB’s representation of genetic variation, which is being enhanced to model mutations and polymorphisms at the sequence level. These modifications will allow GDB to act as a repository for single-nucleotide polymorphisms, which are expected to be a major source of information on human genetic variation in the near future.

Mouse Synteny

Genomic relationships between mouse and man provide important clues regarding gene location, phenotype, and function. One of GDB’s goals is to enable direct comparisons between these two organisms, in collaboration with the Mouse Genome Database at Jackson Laboratory. GDB is making additions to its schema to represent this information so that it can be displayed graphically with Mapview. In addition, algorithmic work is under way to use mapping data to automatically identify regions of conserved synteny between mouse and man. These algorithms will allow the synteny maps to be updated regularly. An important application of comparative mapping is the ability to predict the existence and location of unknown human homologs of known, mapped mouse genes. A set of such predictions is available in a report at the GDB Web site, and similar data will be available in the database itself in the spring of 1998.
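The report does not describe the synteny algorithms themselves. As an illustration only, one simple heuristic is to walk along each human chromosome in map order and group consecutive genes whose mouse homologs lie on the same mouse chromosome. The sketch below assumes that heuristic. The function name, the tuple layout, and the gene names are hypothetical.

```python
from itertools import groupby


def synteny_blocks(pairs, min_genes=2):
    """Group mapped homolog pairs into candidate conserved-synteny blocks.

    `pairs` holds (gene, human_chr, human_pos, mouse_chr) tuples. A block is
    a run of genes that are consecutive on one human chromosome and whose
    mouse homologs all lie on the same mouse chromosome.
    """
    ordered = sorted(pairs, key=lambda p: (p[1], p[2]))
    blocks = []
    # groupby only merges adjacent items, so a gene mapping elsewhere
    # in mouse breaks the run and starts a new candidate block.
    for (h_chr, m_chr), run in groupby(ordered, key=lambda p: (p[1], p[3])):
        run = list(run)
        if len(run) >= min_genes:
            blocks.append({
                "human_chr": h_chr,
                "mouse_chr": m_chr,
                "start": run[0][2],
                "end": run[-1][2],
                "genes": [p[0] for p in run],
            })
    return blocks


# Hypothetical genes and positions, for illustration only.
example = [
    ("G1", "1", 10, "4"), ("G2", "1", 20, "4"), ("G3", "1", 30, "3"),
    ("G4", "1", 40, "4"), ("G5", "1", 50, "4"), ("G6", "1", 60, "4"),
]
for block in synteny_blocks(example):
    print(block)
```

Under this heuristic, a mouse gene that maps inside a block but has no known human homolog would be a natural candidate for the kind of homolog prediction the paragraph mentions. A production algorithm would also need to tolerate mapping errors and small rearrangements, which this sketch does not.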

Collaborations

GDB is a participant in the Genome Annotation Consortium (GAC) project, whose goal is to produce high-quality, automatic annotation of genomic sequences (http://compbio.ornl.gov/CoLab). Currently, GDB is developing a prototype mechanism to transition from GDB’s Mapview display to the GAC sequence-level browser over common genome regions. GAC also will establish a human genome reference sequence that will be the base against which GDB will refer all polymorphisms and mutations. Ultimately, every genomic object in GDB should be related to an appropriate region of the reference sequence.

Sequencing Progress

The sequencing status of genomic regions now can be recorded in GDB. Based on submissions to sequence databases, GAC will determine genomic regions that have been completed. GDB also will be collaborating with the European Bioinformatics Institute, in conjunction with the international Human Genome Organisation (HUGO), to maintain a single shared Human Sequence Index that will record commitments and status for sequencing clones or regions. As a result, the sequencing status of any region can be displayed alongside other GDB mapping data.

Outreach

The Genome Database continues to seek direct community feedback and to interact with the broader science community through several channels:

• International Scientific Advisory Committee meets annually to offer input and advice.

• Quarterly Review Committee confers frequently with the staff to track GDB progress and suggest changes.

• HUGO nomenclature, chromosome, and other editorial committees have specialized functions within GDB, providing official names and consensus maps and ensuring the high quality of GDB’s content.

Copies of GDB are available worldwide from ten mirror sites (nodes), and GDB staff members meet annually with node managers.





Research Narratives


National Center for Genome Resources

The National Center for Genome Resources (NCGR) is a not-for-profit organization created to design, develop, support, and deliver resources in support of public and private genome and genetic research. To accomplish these goals, NCGR is developing and publishing the Genome Sequence DataBase (GSDB) and the Genetics and Public Issues (GPI) program.

NCGR is a center to facilitate the flow of information and resources from genome projects into both public and private sectors. A broadly based board of governors provides direction and strategy for the center’s development.

NCGR opened in Santa Fe in July 1994, with its initial bioinformatics work being developed through a cooperative 5-year agreement with the Department of Energy funded in July 1995. Committed to serving as a resource for all genomic research, the center works collaboratively with researchers and seeks input from users to ensure that tools and projects under development meet their needs.

Genome Sequence DataBase

GSDB is a relational database that contains nucleotide sequence data and its associated annotation from all known organisms (http://www.ncgr.org/gsdb). All data are freely available to the public. The major goals of GSDB are to provide the support structure for storing sequence data and to furnish useful data-retrieval services.

GSDB adheres to the philosophy that the database is a “community-owned” resource that should be simple to update to reflect new discoveries about sequences. A corollary to this is GSDB’s conviction that researchers know their areas of expertise much better than a database curator and, therefore, they should be given ownership and control over the data they submit to the database. The true role of the GSDB staff is to help researchers submit data to and retrieve data from the database.

GSDB Enhancements

During 1996, GSDB underwent a major renovation to support new data types and concepts that are important to genomic research. Tables within the database were restructured, and new tables and data fields were added. Some key additions to GSDB include the support of data ownership, sequence alignments, and discontiguous sequences.

The concept of data ownership is a cornerstone of the new GSDB. Every piece of data (e.g., sequence or feature) within the database is owned by the submitting researcher, and changes can be made only by the data owner or GSDB staff. This implementation of data ownership gives GSDB the ability to support community (third-party) annotation: the addition of annotation to a sequence by other community researchers.
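The ownership rule above can be read as a simple access check: only the owner or staff may change a piece of data, while other researchers contribute by adding annotation that they own themselves. The sketch below illustrates that reading. It is not GSDB's schema or code, and the class names, the `STAFF` set, and the user names are hypothetical.

```python
from dataclasses import dataclass, field

STAFF = {"gsdb_staff"}  # hypothetical staff account names


@dataclass
class Annotation:
    owner: str  # the researcher who added this annotation
    note: str


@dataclass
class SequenceRecord:
    owner: str  # the submitting researcher
    sequence: str
    annotations: list = field(default_factory=list)


def can_modify(user, item):
    """Only the owner of a piece of data, or staff, may change it."""
    return user == item.owner or user in STAFF


def update_sequence(user, record, new_sequence):
    if not can_modify(user, record):
        raise PermissionError(f"{user} does not own this sequence")
    record.sequence = new_sequence


def add_annotation(user, record, note):
    """Third-party annotation: the submitter's data is left untouched,
    and the new annotation is owned by whoever added it."""
    annotation = Annotation(owner=user, note=note)
    record.annotations.append(annotation)
    return annotation


record = SequenceRecord(owner="alice", sequence="ACGT")
add_annotation("bob", record, "possible promoter region")
print(can_modify("bob", record))                 # bob cannot edit alice's sequence
print(can_modify("bob", record.annotations[0]))  # but bob owns his annotation
```

Because every item carries its own owner, the same check covers both sequences and annotations.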

A second enhancement of GSDB is the ability to store and represent sequence alignments. GSDB staff has been constructing alignments to several key sequences including the env and pol (reverse transcriptase) genes of the HIV genome, the complete chromosome VIII of Saccharomyces cerevisiae, and the complete genome of Haemophilus influenzae. These alignments are useful for highlighting possible sites of biological interest and for rapidly identifying differences between sequences.

A third key GSDB enhancement is the ability to represent known relationships of order and distance between separate individual pieces of sequence. These sets of sequences and their relative positions are grouped together as a single discontiguous sequence. Such a sequence may be as simple as two primers that define the ends of a sequence tagged site (STS), it may comprise all exons that are part of a single gene, or it may be as complex as the STS map for an entire chromosome.
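A discontiguous sequence, as described above, is a set of sequence pieces together with their order and the distances between them. The following sketch shows one way to model that in memory. It is an illustration under assumed names (`Piece`, `DiscontiguousSequence`, base-pair offsets), not GSDB's table design, and the primer sequences are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    name: str
    offset: int    # start position in base pairs, relative to the whole
    sequence: str


class DiscontiguousSequence:
    """Separate pieces of sequence with known order and distance."""

    def __init__(self, name, pieces):
        self.name = name
        self.pieces = sorted(pieces, key=lambda p: p.offset)

    def gaps(self):
        """Unsequenced distance (bp) between each pair of adjacent pieces."""
        return [
            (a.name, b.name, b.offset - (a.offset + len(a.sequence)))
            for a, b in zip(self.pieces, self.pieces[1:])
        ]

    def span(self):
        """Total length covered, from first piece start to last piece end."""
        last = self.pieces[-1]
        return last.offset + len(last.sequence) - self.pieces[0].offset


# The simplest case from the text: two primers defining the ends of an STS.
sts = DiscontiguousSequence("example_sts", [
    Piece("forward_primer", 0, "ACGT" * 5),
    Piece("reverse_primer", 180, "TGCA" * 5),
])
print(sts.gaps())
print(sts.span())
```

The same structure scales from a two-primer STS to the exons of a gene or the STS map of a chromosome, since only the number of pieces changes.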

Genome Sequence DataBase
1800 Old Pecos Trail, Suite A
Santa Fe, NM 87505

Peter Schad, Vice-President
Bioinformatics and Biotechnology
505/995-4447, Fax: [email protected]

Carol Harger
GSDB Manager
505/982-7840, Fax: [email protected]

http://www.ncgr.org

GSDB staff has constructed discontiguous sequences for human chromosomes 1 through 22 and X that include markers from Massachusetts Institute of Technology–Whitehead Institute STS maps and from the Stanford Human Genome Center. The set of 2000 STS markers for chromosome X, which were mapped recently by Washington University at St. Louis, also has been added to chromosome X. About 50 genomic sequences have been added to the chromosome 22 map by determining their overlap with STS markers. Genomic sequences are being added to all the chromosomes as their overlap with the STS markers is determined. These discontiguous sequences can be retrieved easily and viewed via their sequence names using the GSDB Annotator. Sequence names follow the format of HUMCHR#MP, where # equals 1 through 22 or X.
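The naming format above can be expanded mechanically into the 23 chromosome map names. The short sketch below assumes that `#` is replaced by the chromosome label with no zero padding, which the text does not state explicitly. The function name is hypothetical.

```python
def chromosome_map_names():
    """Expand HUMCHR#MP for chromosomes 1 through 22 and X.

    Assumption: the label is inserted as-is (e.g. "1", not "01").
    """
    labels = [str(n) for n in range(1, 23)] + ["X"]
    return [f"HUMCHR{label}MP" for label in labels]


names = chromosome_map_names()
print(names[0], names[21], names[-1])
```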

GSDB staff also has utilized discontiguous sequences to construct maps for maize and rice. The maize discontiguous sequences were constructed using markers from the University of Missouri, Columbia. Markers for the rice discontiguous sequence were obtained from the Rice Genome Database at Cornell University and the Rice Genome Research Project in Japan.

New Tools

As a result of the major GSDB renovation, new tools were needed for submitting and accessing database data. Annotator was developed as a graphical interface that can be used to view, update, and submit sequence data (http://www.ncgr.org/gsdb/beta.html). Maestro, a Web-based interface, was developed to assist researchers in data retrieval (http://www.ncgr.org/gsdb/maestrobeta.html). Although both these tools currently are available to researchers, GSDB is continuing development to add increased capabilities.

Annotator displays a sequence and its associated biological information as an image, with the scale of the image adjustable by the user. Additional information about the sequence or an associated biological feature can be obtained in a pop-up window. Annotator also allows a user to retrieve a sequence for review, edit existing data, or add annotation to the record. Sequences can be created using Annotator, and any sequences created or edited can be saved either to a local file for later review and further editing or directly to the database.

Correct database structures are important for storing data and providing the research community with tools for searching and retrieving data. GSDB is making a concerted effort to expand and improve these services. The first generation of the Maestro query tool is available from the GSDB Web pages. Maestro allows researchers to perform queries on 18 different fields, some of which are queryable only through GSDB, for example, D segment numbers from the Genome Database at Johns Hopkins University in Baltimore.

Additionally, Maestro allows queries with mixed Boolean operators for a more refined search. For example, a user may wish to compare relatively long mouse and human sequences that do not contain identified coding regions. To obtain all sequences meeting these criteria, the scientific name field would be searched first for “Mus musculus” and then for “Homo sapiens” using the Boolean term “OR.” Then the sequence-length filter could be used to refine the search to sequences longer than 10,000 base pairs. To exclude sequences containing identified coding-region features, the “BUT NOT” term can be used with the Feature query field set equal to “coding region.”
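The example query above combines three conditions: an OR on the scientific name, a length threshold, and a BUT NOT on the feature field. The sketch below expresses the same logic as an in-memory filter. It does not reproduce Maestro's interface or query syntax, and the `Entry` class, field names, and accession strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    accession: str
    scientific_name: str
    length: int                      # sequence length in base pairs
    features: set = field(default_factory=set)


def long_noncoding_mouse_or_human(entries, min_length=10_000):
    """(name = Mus musculus OR name = Homo sapiens)
    AND length > min_length
    BUT NOT feature = coding region."""
    return [
        e for e in entries
        if (e.scientific_name == "Mus musculus"
            or e.scientific_name == "Homo sapiens")
        and e.length > min_length
        and "coding region" not in e.features
    ]


# Hypothetical entries, for illustration only.
entries = [
    Entry("EX0001", "Mus musculus", 25_000),
    Entry("EX0002", "Homo sapiens", 40_000, {"coding region"}),
    Entry("EX0003", "Homo sapiens", 8_000),
    Entry("EX0004", "Rattus norvegicus", 50_000),
    Entry("EX0005", "Homo sapiens", 12_000, {"repeat region"}),
]
for e in long_noncoding_mouse_or_human(entries):
    print(e.accession)
```

The grouping of the OR inside parentheses matters: without it, the length and feature conditions would apply to only one of the two organisms.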

With Maestro, users can view the list of search matches a few at a time and retrieve more of the list as needed. From the list, users can select one or several sequences according to their short descriptions and review or download the sequence information in GIO, FASTA, or GSDB flatfile format.
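Two ideas in the paragraph above lend themselves to a short illustration: delivering matches a few at a time, and writing a selected sequence in FASTA format (a `>` header line followed by the sequence wrapped to a fixed width). The sketch below covers only those two. The GIO and GSDB flatfile formats are not described in this report, so they are left out. The function names, page size, and line width are assumptions.

```python
import textwrap
from itertools import islice


def pages(matches, page_size=10):
    """Yield search matches a few at a time, fetching more only on demand."""
    iterator = iter(matches)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


def to_fasta(name, description, sequence, width=60):
    """Format one sequence as a FASTA record with wrapped sequence lines."""
    header = f">{name} {description}".rstrip()
    return "\n".join([header, *textwrap.wrap(sequence, width)]) + "\n"


# Hypothetical match list and sequence, for illustration only.
for page in pages(["match1", "match2", "match3"], page_size=2):
    print(page)

print(to_fasta("example1", "hypothetical record", "ACGT" * 20), end="")
```

Because `pages` is a generator, later pages are never built unless the caller asks for them, which matches the "retrieve more of the list as needed" behavior.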

Future Plans

Although most pieces necessary for operation are now in place, GSDB is still improving functionality and adding enhancements. During the next year GSDB, in collaboration with other researchers, anticipates creating more discontiguous sequence maps for several model organisms, adding more functionality, and providing a Web-based submission tool and a tool kit for creating GIO files.

Microbial Genome Web Page

NCGR also maintains informational Web pages on microbial genomes. These pages, created as a community reference, contain a list of current or completed eubacterial, archaeal, and eukaryotic genome sequencing projects. Each main page includes the name of the organism being sequenced, sequencing groups involved, background information on the organism, and its current location on the Carl Woese Tree of Life. As the Microbial Genome Project progresses, the pages will be updated as appropriate.

Genetics and Public Issues Program

GPI serves as a crucial resource for people seeking information and making decisions about genetics or genomics (http://www.ncgr.org/gpi). GPI develops and provides information that explains the ethical, legal, policy, and social relevance of genetic discoveries and applications.

To achieve its mission, GPI has set forth three goals: (1) preparation and development of resources, including careful delineation of ethical, legal, policy, and social issues in genetics and genomics; (2) dissemination of genetic information targeted to the public, legal and health professionals, policymakers, and decision makers; and (3) creation of an information network to facilitate interaction among groups.

GPI delivers information through four primary vehicles: online resources, conferences, publications, and educational programs. The GPI program maintains a continually evolving World Wide Web site containing a range of material freely accessible over the Internet.




Index to Principal and Coinvestigators Listed in Abstracts

DOE Human Genome Program Report, Part 2, Index

A

Adams, Mark D. 8
Adamson, Anne E. 59
Adamson, Doug 6
Agarwal, Pankaj 41
Aksenov, N.D. 26
Albertson, Donna 7
Allison, David 19
Allman, Steve L. 1
Anderson, Holt 70
Anderson, J. Clarke 70
Annas, George J. 69
Apostolou, Sinoula 68
Apsell, Paula 69
Arenson, A. 23
Arlinghaus, Heinrich F. 67, 70
Arman, Inga P. 67
Ashworth, Linda 28
Athwal, Raghbir S. 67
Aytay, Saika 70

B

Baker, Diane 69
Baker, Elizabeth 68
Baker, Mark E. 67
Banerjee, Subrata 30
Baranova, A.V. 30
Barber, William M. 68
Barker, David L. 70
Barsky, V. 10
Bashiardes, Evy 30
Baumes, Susan 27
Bavikin, S. 10
Bayne, Peter 70
Beeson, Diane 48
Belikov, S.V. 22
Benner, W.H. 1
Binder, Matt 53
Birren, B. 68
Blatt, Robin J.R. 53
Blinov, Vladimir M. 67
Boitsov, Alexandre S. 19
Boitsov, Stepan A. 19
Bonaldo, Maria de Fatima 27
Boughton, Ann 55
Bradley, J.-C. 67
Branscomb, Elbert 28
Bremer, Meire 68
Brennan, Thomas M. 67
Bridgers, Michael A. 68
Briley, J. David 13
Brody, Linnea 68
Bronstein, Irena 70
Brown, Gilbert M. 67
Brown, Henry T. 68
Browne, Murray 59
Bruce, J. E. 15
Bruce, James E. 14
Bugaeva, Elena 24
Bulger, Ruth E. 69
Bumgarner, Roger 68
Buneman, Peter 39
Burbee, Dave 4, 5
Burks, Christian 68
Butler-Loffredo, Laura-Li 3

C

Cacheiro, Nestor 29
Callen, David F. 68
Cantor, Charles R. 19
Capron, Alex 69
Carlson, Charles C. 45, 69
Carrano, Anthony V. 68
Cartwright, Peter 6
Carver, Ethan 28, 29
Casey, Denise K. 59
Catanese, Joe 20
Chait, Brian 14
Chang, Huan-Tsung 17
Chedd, Graham 45, 69
Chen, Chira 20
Chen, Chung-Hsuan 1
Chen, Ed 69
Chen, I-Min A. 36
Chen, X-N. 68
Cherkauer, Kevin 69
Chetverin, Alexander B. 68
Chikaev, N.A. 67
Chinault, A.C. 23
Chittenden, Laura 29
Chou, Chau-Wen 17
Chou, Hugh 41
Church, George 2
Churchill, Gary 68
Cinkosky, Michael J. 68
Cobbs, Archie 69
Collins, Colin 7
Collins, Debra L. 45
Conn, Lane 46
Cozza, S. 37
Cram, L.S. 26
Crandall, Lee A. 69
Craven, Mark 69
Crkvenjakov, Radomir 67, 68
Cuddihy, D. 37
Culiat, Cymbeline 29
Cytron, Ron 41


D

Davidson, Jack B. 67
Davidson, Jeff 47
Davidson, Susan B. 39
Davies, Chris 4, 5
Davis, Sharon 47
Davison, Daniel 33
de Jong, P. 68
de Jong, Pieter 2
de Jong, Pieter J. 20
Denton, M. Bonner 67
Dettloff, Wayne 70
Devin, Alexander B. 67
Di Sera, Leonard 6
Doggett, Norman A. 68
Dogruel, David 17
Doktycz, Mitch 19
Dovichi, Norman 3
Doyle, Johannah 28, 29
Drmanac, Radoje 67, 68
Drmanac, Snezana 67
Dunn, Diane 6
Dunn, John J. 3, 4
Durkin, Scott 68
Duster, Troy 48
Dyer, Joshua P. 64

E

Eadline, Douglas J. 70
Earle, Colin W. 67
Efimenko, Irina G. 67
Egenberger, Laurel 54
Eichler, E.E. 23
Einstein, J. Ralph 42
Eisenberg, Rebecca S. 48
Enukashvily, Natella 24
Evans, Glen A. 4, 5, 67

F

Fader, Betsy 69
Fallon, Lara 67
Ferguson, F. Mark 6
Ferrell, Thomas L. 67
Fickett, James W. 68
Fields, Christopher A. 69
Filipenko, M.L. 67
Firulli, B.A. 23
Flatley, Jay 70
Florentiev, V.L. 5
Fockler, Carita 68
Fodor, Stephen P. A. 70
Fondon, Trey 4
Foote, Robert S. 67
Franklin, Terry 4, 5
Frengen, Eirik 20
Fresco, Jacques R. 21
Friedman, B. Ellen 51, 52
Friedman, Claudette Cyr 69
Fullarton, Jane E. 69
Fung, Eliza 17

G

Gaasterland, Terry 38
Garner, Harold R. (Skip) 4, 5
Gath, Tracy 54
Generoso, Walderico 29
Gerwehr, S. 68
Gesteland, Raymond F. 6
Gibbs, R.A. 23
Glantz, Leonard H. 69
Glazer, Alexander N. 9
Glazkova, Dina V. 67
Gnirke, Andreas 68
Golumbeski, George 70
Goodman, Nathan 33
Goodman, Stephen 49
Graves, M. 23
Graves, Mark 34
Gray, Joe 7
Gregory, Paula 69
Griffith, Jeffrey K. 12
Grosz, Michael 30
Gu, Y. 23
Guan, Xiaojun 42
Guan, Xiaoping 20
Guilfoyle, Richard A. 13
Gusfield, Dan 69

H

Hahn, Peter 68
Hahner, Lisa 4
Hartman, John R. 70
Hartnett, Jim 70
Hauser, Loren 42, 44
Haussler, David 34
Hawe, William P. 67
Hawkins, Trevor 8
Hempfner, Philip E. 68
Henderson, Margaret 70
Hofstadler, S. A. 15
Holmes, Linda 59
Hood, Leroy 8, 52, 69
Hooper, Herbert H. 70
Hopkins, Janet A. 68
Horton, Paul 69
Hoyt, Peter 19
Hozier, John 68
Hubert, R. 68
Hughey, Richard 34
Hung, Lydia 70
Hunkapiller, Tim 69



I

Ijadi, Mohamad 68
Il’icheva, I.A. 5
Imara, Mwalimu 69
Ioannou, Panayotis A. 20, 30
Ivanovich, M.A. 67
Iwasaki, R. 37

J

Jackson, Cynthia L. 67
Jacobson, K. Bruce 1, 67
Jaklevic, J.M. 1
Jantsen, E.I. 67
Jefferson, Margaret C. 50
Jelenc, Pierre 27
Jessee, Joel 20
Johnson, Marion D., III 21
Jurka, Jerzy 34

K

Kamashev, D.E. 22
Kao, Fa-Ten 21
Kapanadze, B.I. 30
Karger, Barry L. 9
Karp, Richard 69
Karp, Richard M. 69
Karplus, Kevin 34
Karpov, V.L. 22
Kass, Judy 54
Kaur, G. Pal 67
Kel, A.E. 35
Kel, O.V. 35
Keller, Richard 67
Khan, Akbar S. 68
Kim, Joomyeong 28
Kim, U-J. 68
Kim, Ung-Jin 26, 27
Kimball, Alvin 6
Klopov, N.V. 26
Knight, Jim 69
Knoche, Kimberly 70
Knoppers, Bartha 69
Knuth, Mark W. 63
Kolchanov, N.A. 35
Korenberg, J.R. 68
Korenberg, Julie 20
Korenberg, Julie R. 22
Kozman, Helen 68
Krasnykh, Viktor N. 67
Krone, Jennifer 17
Kupfer, Ken 5
Kwok, Pui-Yan 68

L

Labat, Ivan 67, 68
Lai, Tran N. 68
Lander, E. 68
Lane, Michael J. 68
Lane, Sharon A. 68
Lantos, John 50
Larimer, Frank W. 67
Larson, Susan 38
Lawler, Gene 69
Lazareva, Betty 68
Legchilina, Svetlana P. 67
Lennon, Greg 29
Leone, Joseph 64
Lessick, Mira 50
Lever, David C. 67
Lewis, Kathy 17
Li, Qingbo 17
Lim, Hwa A. 69
Lim, Regina 68
Lobov, Ivan 24
Lockett, Steven 7
Lu, J. 23
Lu, Xiandan 17
Luchina, N.N. 25
Lukjanov, Dmitry 24
Lvovsky, Lev 16
Lysov, Y. 10

M

MacConnell, William P. 64
MacDonell, Michael T. 70
Maglott, Donna R. 68
Mahowald, Mary B. 50
Mallison, M. 37
Maltsev, Natalia 38
Mann, Janice 55
Manning, Ruth Ann 64
Mansfield, Betty K. 59
Manske, Charles L. 70
Mark, Hon Fong L. 67
Markowitz, Victor M. 36
Marks, Andy 6
Marr, T. 37
Martin, Sheryl A. 59
Martin, Chris S. 70
Mathies, Richard A. 9
Matis, Sherri 42
Matveev, Ivan 24
McAllister, Douglas 70
McAllister, Douglas J. 70
McInerney, Joseph D. 51, 52
Metzger, M. 23
Micikas, Lynda B. 52
Micklos, David A. 70
Mills, Marissa D. 59
Milosavljevic, Aleksandar 68
Mirzabekov, Andrei 10
Mishin, V.P. 67
Mitchell, S. 68
Moore, Stefan 69
Mosley, Ray E. 69
Moss, Robert 50
Moyzis, Robert K. 12
Muddiman, David C. 14, 15
Mulley, John C. 68
Munn, Maureen M. 52
Mural, Richard 44
Mural, Richard J. 42
Muravlev, A.I. 67
Murphy, Declan 69
Murphy, Kevin 69
Muzny, D.M. 23
Myers, Gene 38

N

Nancarrow, Julie 68
Natowicz, Marvin 69
Nelson, D. L. 23
Nelson, Debra 68
Nelson, Randall 17
Newman, Cathy D. 70
Nguyen, Tuyen 64
Nicholls, Robert 29
Nickerson, Deborah A. 68
Nierman, William C. 68
Noordewier, Michiel O. 69
Noya, D. 68

O

Olenina, Ludmilla V. 67
Olesen, Corinne E. M. 70
Oliver, Tammy 4
Olson, Maynard 68
Olson, Maynard V. 52
Orpana, Arto K. 68
Oskin, Boris V. 19
Ostrander, Elaine A. 68
Overbeek, Ross 38
Overton, G. Christian 39, 41
Overton, G.C. 35

P

Page, George 69
Pecherer, Robert M. 68
Petrov, Sergey 42, 44
Pevzner, Pavel A. 40
Pfeifer, Gerd P. 67
Phillips, Hilary A. 68
Phoenix, David 69
Pietrzak, Eugenia 20
Pinkel, Daniel 7
Pirrung, Michael C. 67
Podgornaya, Olga 24
Podkolodnaya, O.A. 35
Polanovsky, O.L. 25
Poletaev, A.I. 26
Polymeropoulos, Mihael H. 68
Porter, Kenneth W. 13
Pratt, Lorien 69
Preobrazhenskaya, O.V. 22
Probst, Shane 4, 5

R

Radspinner, David A. 67
Raja, Mugasimangalam 16
Randesi, Matthew 4
Reed, C. 37
Reilly, Philip 69
Reilly, Philip J. 53
Resenchuk, Sergei M. 67
Reshetin, Anton O. 19
Richards, Robert I. 68
Richterich, Peter 70
Rider, Michelle 30
Riggs, Arthur D. 67
Roche, Patricia A. 69
Romaschenko, A.G. 35
Ross, Lainie Friedman 50
Roszak, Darlene B. 70
Roth, E.J. 23
Rozen, Steve 33
Ruano, Gualberto 63
Rutledge, Joe 29

S

Sachleben, Richard A. 67
Sachs, Greg 50
Sainz, Jesus 68
Salit, J. 37
Sandakhchiev, Lev S. 67
Sandhu, Arbansjit K. 67
Schageman, Jeff 5
Schimke, R. Neil 45
Schurtz, Tony 6
Schwerin, Noel 45
Scott, Bari 53
Searls, David B. 41
Selkov, Evgeni 38
Selman, Susanne 70
Semov, A.B. 30
Serpinsky, Oleg I. 67
Sesma, Mary Ann 50
Sgro, Peichen H. 68
Shah, Manesh 42, 44
Shannon, Mark 28
Sharpe, Elizabeth 69
Shatrova, A.N. 26
Shavlik, Jude W. 69
Shaw, Barbara Ramsay 13
Shchelkunov, Sergei N. 67
Shchyolkina, A.K. 5
Shen, Y. 23
Shick, V. 10
Shizuya, H. 68
Shizuya, Hiroaki 26, 27
Shuey, Steven W. 67
Siciliano, Michael J. 68
Sikela, James M. 68
Silva, J. 68
Simon, M. 68
Simon, Melvin 8
Simon, Melvin I. 26, 27
Sivila, Randy F. 64
Smirnova, Marina E. 67
Smirnova, V.V. 67
Smith, Cassandra L. 68
Smith, Lloyd M. 13, 14, 67
Smith, Randall 33
Smith, Richard D. 14, 15
Soane, David S. 70
Soares, Marcelo Bento 27
Soderlund, Carol A. 69
Solomon, David L. 70
Sonkin, Dina 16
Sorenson, Doug 68
Sosa, Maria 54
Spejewski, Eugene 59
Spengler, Sylvia 55
Spengler, Sylvia J. 54, 60
States, David J. 41
Stavropoulos, Nick 68
Stein, Lincoln 33
Stelling, Paul 69
Stepchenko, A.G. 25
Stevens, Tamara J. 68
Stormo, Gary D. 69
Stubbs, Lisa 28, 29
Studier, F. William 3, 4
Sudar, Damir 7
Sulimova, G.E. 30
Sun, Tian-Qiang 30
Sun, Z. 68
Sutherland, Grant R. 68
Sutherland, Robert D. 68
Sze, Sing Hoi 40

T

Tabor, Stanley 16, 67
Thilman, Jude 53
Thonnard, Norbert 67
Thundat, Thomas G. 67
Thundat, Tom 19
Timms, K. 23
Timofeev, E.N. 5
Tobin, Sara L. 55
Totmenin, Alexei V. 67
Towell, Geoffrey 69
Tracy, A. 37
Trask, Barbara 68
Trask, Barbara J. 68
Trottier, Ralph W. 69
Troup, Charles D. 68
Tsybenko, S. Yu 5

U

Uberbacher, Edward 44
Uberbacher, Edward C. 42
Udseth, Harold R. 14
Ulanovsky, Levy 16

V

van den Engh, Ger 68
Verp, Marion 50
Vos, Jean-Michel H. 30

W

Wahl, Geoffrey 68
Walkowicz, Mitchell 29
Wang, Denan 68
Wang, Lushen 69
Wang, Min 30
Warmack, Bruce 19
Warmack, Robert J. 67
Wassom, John S. 59
Waterman, Michael 69
Weier, Heinz-Ulrich 7
Weinberger, Laurence 47
Weiss, Robert B. 6
Wentland, M.A. 23
Wertz, Dorothy C. 53
Westin, Alan F. 69
Whitmore, Scott A. 68
Whitsitt, Andrew 69
Wilcox, Andrea S. 68
Williams, Peter 17
Williams, Walter 61
Wingender, E. 35
Witkowski, Jan 70
Wong, Gane 68
Woychik, Richard P. 67
Wright, Gary 29
Wright, James 61
Wu, Chenyan 20
Wu, J. 23
Wu, X. 68
Wyrick, Judy M. 59



X

Xu, Ying 42

Y

Yankovsky, N.K. 30
Yantis, Bonnie C. 68
Yershov, G. 10
Yeung, Edward S. 17
Yoshida, Kaoru 68
Yu, Jun 68
Yust, Laura N. 59


Z

Zenin, V.V. 26
Zhao, Baohui 20
Zoghbi, H.Y. 23
Zorn, Manfred 7
Zorn, Manfred D. 44
Zweig, Franklin M. 56
