Bioinformatics – Discovering the “Bio-Logic” of Nature
Robert Cormia, Foothill College
Transducing the Genome
• 50 years after Watson and Crick deduced the structure of DNA…
• The information molecules of nature now reside as data bits inside computers
– But what does it all mean?
• We have ~15 GBytes of genomic data
– And only just beginning to unravel it
‘Energy Systems’ Before ‘Life’
• “Life” arose on earth almost 4 billion years ago, 1 billion years before cells
• Long chains of molecules harvesting energy, probably deep below the sea
– Before DNA, RNA and the sophisticated proteins that we know today
• There were plenty of sources of energy, but no “choreographed metabolism”
Energy Metabolism
“In the Beginning”
• Rock, heat, and some water
• Early molecules of life
• Energy moved from rock into sea
• Molecular networks played in the path
• Capturing a memory of that process was probably the key to life today
Life on the Sea Floor?
RNA Busy Before Cellular Life
The RNA World
• There is no way to know how the molecules of life really formed…
• Amino acids and ribonucleotides have formed in “pre-biotic” experiments
• RNA molecules, which appear to be both catalysts and templates, are thought to have formed energy networks
RNA Codons and Catalysts
RNA and DNA
• A, T, C, G, and U
• A = Adenine
• T = Thymine
• C = Cytosine
• G = Guanine
• U = Uracil
• A-T and C-G in DNA
• A-U and C-G in RNA
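The pairing rules above can be expressed as simple lookup tables. The following is a minimal Python sketch, not code from the original slides: the function names are illustrative, and `transcribe` assumes its input is the coding (sense) strand, so the mRNA is the same sequence with U in place of T.

```python
# Watson-Crick pairing in DNA: A-T and C-G.
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(dna):
    """Return the base-by-base complement of a DNA strand."""
    return "".join(DNA_PAIRS[base] for base in dna.upper())

def reverse_complement(dna):
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return complement(dna)[::-1]

def transcribe(coding_strand):
    """Return the mRNA for a coding (sense) strand: T is replaced by U."""
    return coding_strand.upper().replace("T", "U")

print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
print(transcribe("ATGC"))          # AUGC
```

A dictionary lookup raises `KeyError` on any character outside A, T, C, G, which is a convenient way to catch malformed input in a sketch like this.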
Central Dogma of Life
The Genome
• DNA (deoxyribonucleic acid) is the principal molecule of the genome
• Genes are formed of lengths of DNA polymers which code for proteins
• Exons and introns exist in DNA
• Regulatory regions control transcription and the formation of every protein and enzyme. It is the key to metabolism.
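To make the idea of genes coding for proteins concrete, here is a small Python sketch that joins exons (dropping the intron between them) and translates the result with the standard genetic code. The toy gene, its exon coordinates, and the function names are hypothetical and chosen only for illustration; real splicing and regulation are far more involved.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def splice(gene, exons):
    """Join exon segments; `exons` holds (start, end) pairs, end exclusive."""
    return "".join(gene[start:end] for start, end in exons)

def translate(coding_dna):
    """Translate coding DNA to one-letter amino acids, stopping at a stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        aa = CODON_TABLE[coding_dna[i:i + 3]]
        if aa == "*":  # '*' marks a stop codon in the table above
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical toy gene: two exons separated by one intron.
gene = "ATGGCC" + "GTAAGT" + "AAATGA"
mature = splice(gene, [(0, 6), (12, 18)])
print(mature)             # ATGGCCAAATGA
print(translate(mature))  # MAK
```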
DNA at Transcription
The Proteome
• Proteins form cellular structure and enzymes, which function in metabolism
• Over 100,000 proteins exist in humans
• DNA is not enough to run metabolism
• Proteins have a “run-time” knowledge
• Proteins control the transcription of DNA and DNA controls formation of proteins
Rubisco Protein – Photosynthesis
RAD Protein Complex
Number of Genes vs. Time
What is Bioinformatics?
• Molecular biology
– Ability to sequence DNA
• Internet databases
– To store and transmit data
• Mathematical algorithms
– To model and solve biological problems
Analysis Using the I2I Technology Model
[Diagram: Internet Technologies (CPU, Networking, Data Storage; Data Mining, Grid Computing, Storage Area Networks)]
[Diagram: Bioinformatics Technologies (Informatics, IT / Networking, Molecular Biology; Data Modeling, Computational Biology, Genomic Databases)]
A Tool for Biotechnology
• Bioinformatics creates a set of tools for understanding the mountain of new data
• In biotechnology, these tools are used to discover how genes and proteins work
• Computers are used to both analyze and “mine” new data for hidden relationships
Discovering the “bio-logic” of nature
From Data to Knowledge
DNA Sequencing
DNA Sequencing
• Chemical sequencing
• Molecular sequencing
• Now about $0.01 per base
• Human Genome took 10 years
– Celera sequenced in 3 years
• Moore’s law applies to biotechnology too
– In 2010 a single human genome in ~7 days
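As a rough worked example of what the per-base price implies, the Python sketch below multiplies it by the ~3 billion base pairs quoted for the human genome later in this deck. It assumes every base is read exactly once; real projects read each base several times, so this is a lower bound and not a figure from the slides.

```python
COST_PER_BASE_CENTS = 1        # about $0.01 per base, from the slide
GENOME_BASES = 3_000_000_000   # ~3 billion base pairs, as quoted later in the deck

def single_pass_cost_dollars(bases, cents_per_base):
    """Cost in whole dollars of reading every base once (no repeat coverage)."""
    return bases * cents_per_base // 100

print(single_pass_cost_dollars(GENOME_BASES, COST_PER_BASE_CENTS))  # 30000000
```

Working in integer cents avoids floating-point rounding in the multiplication.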
DNA Sequencing
http://www.accessexcellence.org
Gel Enhanced Staining
DNA Micro Arrays
• Used to monitor gene expression
– Which genes are active?
– What are the “co-expressed patterns”?
• Compare healthy and diseased tissue
– Extract “expressed” mRNA in cytoplasm
– Convert mRNA to cDNA
• Discover relationships of proteins to disease states, and function / location of genes
• Microarray analysis is becoming the first step in drug discovery
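One common way to compare healthy and diseased tissue is the log2 ratio of the two signals for each gene. The Python sketch below is only an illustration of that idea: the gene names and intensities are made up, the pseudocount and the 2-fold threshold are assumptions, and a real analysis would also normalize the arrays and test for statistical significance.

```python
import math

def log2_fold_change(diseased, healthy, pseudocount=1.0):
    """log2 ratio of diseased to healthy signal; the pseudocount avoids division by zero."""
    return math.log2((diseased + pseudocount) / (healthy + pseudocount))

def differentially_expressed(healthy, diseased, threshold=1.0):
    """Return genes whose |log2 fold change| meets the threshold (1.0 means 2-fold)."""
    hits = {}
    for gene in healthy:
        lfc = log2_fold_change(diseased[gene], healthy[gene])
        if abs(lfc) >= threshold:
            hits[gene] = lfc
    return hits

# Hypothetical spot intensities in arbitrary units; gene names are placeholders.
healthy = {"geneA": 99.0, "geneB": 49.0, "geneC": 199.0}
diseased = {"geneA": 399.0, "geneB": 51.0, "geneC": 49.0}

for gene, lfc in differentially_expressed(healthy, diseased).items():
    print(gene, round(lfc, 2))
```

With these inputs, geneA is up 4-fold, geneC is down 4-fold, and geneB is essentially unchanged and is filtered out.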
Microarray Output Screen
Microarray Output
Partnering with Pharma
• Bioinformatics is an industry of tools
– Biotech is a consumer / user of these tools
• Pharma needs more “innovation engines”
– Fewer than 2 drugs per firm in the ‘pipeline’
– Drug discovery creates a new value chain
bioinformatics > biotech > ‘big pharma’
Convergence is the modality of innovation
Pharma and Biotech
Drug Discovery
• Target discovery
• Target validation
• Protein interactions
• Rapid screening
• The long haul…
– $800 million / year is spent on drug discovery
– Over 75% of drug compounds will never work
Drug Development Process
Drug Discovery
“Pharmaco Genomics”
• Individualized medicine
• Looking at SNPs along drug targets
– What makes each of us – us?
– 1 million SNPs, about one per intron
• In the future, each of us will have our genome “in silico” (genome on a chip)
• Data mining against 6 billion genomes!
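A SNP (single nucleotide polymorphism) is a position where an individual's sequence differs from the reference by one base. The Python sketch below lists such positions for two sequences that are assumed to be already aligned and of equal length; the sequences and the function name `find_snps` are hypothetical, and real SNP calling works from many sequencing reads with quality scores.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each single-base mismatch.

    Positions are 1-based. Both sequences must already be aligned to equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, sample))
        if ref != alt
    ]

# Hypothetical sequences, not real genomic data.
reference = "ATGGCCATTGTA"
sample = "ATGGCTATTGCA"
print(find_snps(reference, sample))  # [(6, 'C', 'T'), (11, 'T', 'C')]
```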
Pharmaco Genomics
One Genome
• There are three very different ways to look at genomic diversity – and all are equally valid!
• A “collective” human genome
– 3 billion base pairs – called the ‘golden path’
• Each one of us is a unique genome
– “I am a genome of one”, my SNPs make me ‘me’
• The Genome on planet earth
– A collective metabolic evolution and speciation
Terra Genoma
Molecular Networks
• Genome or Proteome?
• Proteome of Genome?
• Wait a minute…
• What if it’s both?
• Now what would that look like?
Gene Regulatory Networks
Pathway Kinetics
Gene Regulatory Network
Bioinformatics Tools
• NCBI
– BLAST, 12 million records, SNP databases
• ExPASy
– Swiss-Prot, EMBL, Swiss-Model
• PIR – Protein Information Resource
• PDB – Protein Data Bank
• Pfam – Protein families
NCBI
• National Center for Biotechnology Information, part of NIH and NLM
• Funded by the US government – open to all
• GenBank and GenPept
– 13 million entries, 12 billion base pairs
– Resources include oncology, retroviruses, SNP databases, and much more
• Sequin submission of raw sequence data
NCBI Resources
Retroviruses
BLAST
• Basic Local Alignment Search Tool
• Used as a “genomic search engine”
• Compare your target sequence to the “non-redundant” database of 13B bps
• Can search the genomes of species
– Human, mouse, fly, E. coli etc.
• ‘Hits’ return links to GenBank and GenPept
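The core idea behind a local alignment search is finding the best-matching region shared by two sequences. The Python sketch below uses the exact Smith-Waterman dynamic programming method to show that idea; it is not how BLAST itself is implemented (BLAST uses a faster seeded heuristic to scan large databases), and the scoring values here are arbitrary assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below zero, which keeps the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical query and subject sharing the region ACGTACG.
print(smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG"))  # (14, 'ACGTACG', 'ACGTACG')
```

The full matrix costs time and memory proportional to the product of the two lengths, which is why database-scale tools rely on heuristics instead.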
Swiss-Prot
• Swiss-Prot is an annotated protein database
• Protein resource
– Minimal redundancy, reasonably current
– Annotated / integrated protein database
– Links to protein structures and properties
• Links back into GenBank, EMBL, DDBJ
• Literature resources for submissions
ExPASy
• The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures
• Swiss-Prot and PROSITE
• Links to SWISS-MODEL
PROSITE - Database of Protein Families and Domains
Structure Analysis
Protein Data Bank
• SWISS-MODEL
• Protein Data Bank
• Archive of .pdb files
• Structures determined by X-ray, NMR
• Theoretical Structure Search
• Features a “Molecule of the Month”
• http://www.rcsb.org/pdb/
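The .pdb files in the archive are plain text with fixed-width columns, so atom coordinates can be pulled out by slicing each line. The Python sketch below reads a few fields from ATOM/HETATM records. The column positions follow my reading of the wwPDB fixed-column layout and should be checked against the format specification; the helper `format_atom` and the two sample atoms are hypothetical and exist only to produce well-aligned test lines.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal fixed-width ATOM record (columns 1-54 only)."""
    return (f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")

def parse_atom(line):
    """Extract selected fields from an ATOM/HETATM record by column position."""
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "name": line[12:16].strip(),       # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain": line[21],                 # column 22
        "res_seq": int(line[22:26]),       # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

def read_atoms(lines):
    """Yield parsed atoms from an iterable of PDB text lines."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            yield parse_atom(line)

# Hypothetical records, not taken from a real PDB entry.
records = [
    "HEADER    EXAMPLE",
    format_atom(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614),
    format_atom(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842),
    "END",
]
for atom in read_atoms(records):
    print(atom["name"], atom["res_name"], atom["res_seq"], atom["xyz"])
```

For real work, an established parser is a safer choice than hand-written slicing, since actual files contain many record types and edge cases this sketch ignores.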
PIR
• Protein Information Resource
• iProClass and PIR-NREF
– PIR-PSD, Swiss-Prot, TrEMBL, RefSeq, GenPept, and PDB
• http://pir.georgetown.edu/
• Integrated public resource of protein informatics
• Supports genomic and proteomic research and scientific discovery
Pfam
• Protein family comparisons
– Look at multiple alignments
– View protein domain architectures
– Examine species distribution
– Follow links to other databases
– View known protein structures
• Follow ‘conserved domains’ from BLASTp searches of protein databases
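Looking at a multiple alignment often starts with asking which columns are the same across all family members. The Python sketch below reports such conserved columns. The three aligned sequences are invented, and the function name and the `min_fraction` parameter are assumptions for illustration; Pfam itself describes families with profile hidden Markov models, which this does not attempt to reproduce.

```python
from collections import Counter

def conserved_columns(alignment, min_fraction=1.0):
    """Return 0-based column indices where the most common residue reaches min_fraction.

    Columns whose most common symbol is a gap ('-') are skipped.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        column = [row[col] for row in alignment]
        residue, count = Counter(column).most_common(1)[0]
        if residue != "-" and count / len(column) >= min_fraction:
            conserved.append(col)
    return conserved

# Hypothetical aligned protein fragments; '-' marks a gap.
alignment = [
    "MKV-LHG",
    "MRVALHG",
    "MKVGLHG",
]
print(conserved_columns(alignment))       # [0, 2, 4, 5, 6]
print(conserved_columns(alignment, 0.6))  # [0, 1, 2, 4, 5, 6]
```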
The Grand Challenge
The Technology Roadmap
• Genomics
– 1995 to 2005
• Proteomics
– 2000 to 2010
• Systems biology
– 2005 to 2015
• Genetic remodeling / re-engineering
– 2010 to 2020
• Generation Phi
– Children born in 2025 may never know disease
Convergence of Biotech & Pharma
• Genomics
• Proteomics
• Systems biology
• Pharmaco genomics
• Genetic engineering
Mouse Genome
Gene Therapy
• Somatic Gene Therapy
• Therapeutic Gene Therapy
– Incorporate “missing genes”
– Remove cells from host organism
– Amplify target cells
– Insert gene using (viral) vector
– Return target cells into host organism
• Insulin gene was one of the first trials
Labeling Active Genes Along Chromosomes
Transgenic Species
Designer Flies – Is Blue Cool?
Your Own Private Genome
Surfing the Genome
• Internet technologies
– Connecting users, tools, and data
• Molecular biology
– Racing forward atop Moore’s Law
• Informatics
– Mathematical interrogation of nature’s secrets
• Surfing the Genome!
– Discovering the “bio-logic” of Nature
http://www.SurfingTheGenome.us/ Spring 2003
Contact Information
• Robert D. Cormia
• Foothill College
• [email protected]
• http://www.informaticus.org/
• 650 747 1588
• Surfing the Genome – Spring 2003