Date post: | 16-Jul-2015 |
Category: |
Education |
Upload: | skuastkashmir |
Dr N A Ganai Professor
Centre of Animal Biotechnology
SKUAST-Kashmir
Contents
Introduction to Bioinformatics
Complexity of life
Size of genome
Exponential growth in information generation
Why and how to handle this information
Definition of Bioinformatics
Databases
Tools
Scope of Bioinformatics
Anticipated benefits
Ethical, Legal, and Social Issues
DNA is not merely a molecule with a pattern;
it is a code, a language, and an information
storage mechanism
Size of Human Genome
Each cell carries: 3.2 billion base pairs (haploid genome; about 6 billion in a diploid cell)
A code that would fill 500 books of 500 pages each
Length of DNA in an adult:
The total length of DNA present in one adult human is calculated as: (length of 1 bp)(number of bp per cell)(number of cells in the body)
(0.34 × 10⁻⁹ m)(6 × 10⁹)(10¹³)
≈ 2.0 × 10¹³ meters
That is the equivalent of nearly 70 trips from the earth to the sun and back.
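The arithmetic on this slide can be checked with a few lines of Python. This is only a rough sketch: the base-pair length, base pairs per cell and cell count are the slide's own round figures, while the Earth-Sun distance (about 1.496 × 10¹¹ m) is an assumption added here and does not appear in the slides.

```python
# Back-of-the-envelope check of the genome-size figures quoted above.
BP_LENGTH_M = 0.34e-9     # length of one base pair in metres (from the slide)
BP_PER_CELL = 6e9         # base pairs per cell used in the slide (diploid count)
CELLS_IN_BODY = 1e13      # rough number of cells in an adult (from the slide)
EARTH_SUN_M = 1.496e11    # mean Earth-Sun distance in metres (assumption, not in the slide)

dna_per_cell_m = BP_LENGTH_M * BP_PER_CELL
total_dna_m = dna_per_cell_m * CELLS_IN_BODY
round_trips = total_dna_m / (2 * EARTH_SUN_M)

# Book analogy: 3.2 billion letters spread over 500 books of 500 pages each.
GENOME_BP = 3.2e9
letters_per_page = GENOME_BP / (500 * 500)

print(f"DNA per cell:        {dna_per_cell_m:.2f} m")
print(f"DNA per adult:       {total_dna_m:.2e} m")
print(f"Earth-Sun round trips: {round_trips:.0f}")
print(f"Letters per page:    {letters_per_page:,.0f}")
```

With these inputs each cell holds about 2 m of DNA, and each page of the 500 books would need to carry 12,800 letters.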
Human Genome Project
• HGP: International research effort
• Began 1990, completed 2003
• Biggest ever project in life sciences
• 20 labs around the world participated
• Next steps for ~30,000 genes
– Function and regulation of all genes
– Significance of variations between people
– Cures, therapies, “genomic healthcare”
From DNA to Cell Function
DNA sequence (split into genes) → codes for → Amino Acid Sequence → folds into → Protein → has → 3D Structure → dictates → Protein Function → determines → Cell Activity
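The first step of this chain, a DNA sequence coding for an amino acid sequence, can be illustrated with a minimal sketch. It assumes the standard genetic code, a coding-strand sequence that is already in frame, and only the letters A, C, G and T. The function name `translate` and the example sequence are illustrative and not taken from the lecture.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate an in-frame coding-strand DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # raises KeyError on non-ACGT letters
        if amino_acid == "*":                   # '*' marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCTAA"))  # ATG -> M, GCC -> A, TAA -> stop
```

Folding, structure and function prediction are far harder problems and are what tools such as those listed later in the lecture try to address.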
Lecture 2
Genomics
Transcriptomics
Proteomics
Metabolomics
Year Base Pairs Sequences
1982 680,338 606
1983 2,274,029 2,427
1984 3,368,765 4,175
1985 5,204,420 5,700
1986 9,615,371 9,978
1987 15,514,776 14,584
1988 23,800,000 20,579
1989 34,762,585 28,791
1990 49,179,285 39,533
1991 71,947,426 55,627
1992 101,008,486 78,608
1993 157,152,442 143,492
1994 217,102,462 215,273
1995 384,939,485 555,694
1996 651,972,984 1,021,211
1997 1,160,300,687 1,765,847
1998 2,008,761,784 2,837,897
1999 3,841,163,011 4,864,570
2000 11,101,066,288 10,106,023
2001 15,849,921,438 14,976,310
2002 28,507,990,166 22,318,883
2003 36,553,368,485 30,968,418
2004 44,575,745,176 40,604,319
2005 56,037,734,462 52,016,762
2006 69,019,290,705 64,893,747
2007 83,874,179,730 80,388,382
2008 99,116,431,942 98,868,465
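Average growth in data generation:
Base pairs grew about 146,000-fold between 1982 and 2008. Divided evenly over the 27 years listed, that is about 5,400-fold per year; as compound growth it is roughly 58% per year, doubling about every 18 months.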
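A short sketch of how a growth rate can be derived from the first and last rows of the table above. The variable names are illustrative; the two data points are the 1982 and 2008 base-pair counts from the table, and the compound-growth formula is a standard way to summarize exponential growth, not necessarily the calculation used in the lecture.

```python
import math

# First and last rows of the table above (year, base pairs).
first_year, first_bp = 1982, 680_338
last_year, last_bp = 2008, 99_116_431_942

years_elapsed = last_year - first_year              # 26 intervals
fold_increase = last_bp / first_bp                  # total growth over the period
annual_factor = fold_increase ** (1 / years_elapsed)  # compound growth per year
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"Total fold increase: {fold_increase:,.0f}")
print(f"Compound growth:     {100 * (annual_factor - 1):.0f}% per year")
print(f"Doubling time:       {doubling_time_years:.1f} years")
```

A steady percentage increase per year is what "exponential growth" means, which is why the compound rate is more informative than the total fold change divided by the number of years.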
Exponential Growth in Biological Databases:
High throughput Technologies
PCR: invented in 1983 by Kary Mullis, then an employee of Cetus Corporation, a
biotechnology firm in California
Awarded the Nobel Prize in Chemistry in 1993 for the invention of PCR
Microarray Technology
Real-Time PCR
DNA Chips
Sequencing
Sanger method: 1977
Chain Termination Method
Maxam-Gilbert: 1977
Chemical Modification Method
Next Generation: 1994 High Throughput
Parallel sequencing
Entire genome can be sequenced
in a matter of weeks
History of DNA Sequencing
1870 Miescher: Discovers DNA
1940 Avery: Proposes DNA as ‘Genetic Material’
1953 Watson & Crick: Double Helix Structure of DNA
1965 Holley: Sequences Yeast tRNA-Ala
1970 Wu: Sequences Cohesive End DNA
1977 Sanger: Dideoxy Chain Termination; Gilbert: Chemical Degradation
1980 Messing: M13 Cloning
1986 Hood et al.: Partial Automation
1990 • Cycle Sequencing • Improved Sequencing Enzymes • Improved Fluorescent Detection Schemes
2002 • Next Generation Sequencing • Improved enzymes and chemistry • Improved image processing
Efficiency (bp/person/year), rising along the timeline: 1; 15; 150; 1,500; 15,000; 25,000; 50,000; 200,000; 50,000,000; and 100,000,000,000 in 2008
Adapted from Eric Green, NIH; Adapted from Messing & Llaca, PNAS (1998)
The Genome Sequence is at hand…so?
“The good news is that we have the human genome.
The bad news is it’s just a parts list”
• Gene number, exact locations, and functions
• Gene regulation
• DNA sequence organization
• Noncoding DNA types, amount, distribution, information content, and functions
• Coordination of gene expression, protein synthesis, and post-translational events
• Interaction of proteins in complex molecular machines
• Predicted vs experimentally determined gene function
• Evolutionary conservation among organisms
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Correlation of SNPs (single-base DNA variations among individuals) with health and disease
• Disease-susceptibility prediction based on gene sequence variation
• Genes involved in complex traits and multigene diseases
• Complex systems biology including microbial consortia useful for environmental restoration
• Developmental genetics, genomics
What Next?
We need to know every part, its function, and its application
What is Bioinformatics?
The newest, fastest growing specialty in the life sciences, integrating biotechnology and computer science.
Computers help to collect, analyze, and interpret biological information at the molecular level.
Bioinformatics encompasses a set of software tools that aid in:
molecular sequence analysis,
structural analysis,
functional analysis
of genes and genomes and their corresponding products
Goal of Bioinformatics
Understand a living cell and how it functions at the molecular level
Develop databases and computational tools
Use the tools to mine (analyze) databases to generate knowledge to better understand living systems
Biological Databases: Why?
Store all the data (information) related to Genomics, Transcriptomics, Proteomics, and Metabolomics in databases
Make biological data available to scientists
Make biological data available in computer-readable form
Types of Databases
Primary Databases: Store raw DNA/RNA and protein data submitted by scientists
GenBank: by NCBI USA www.ncbi.nlm.nih.gov/genbank/
EMBL: European : www.ebi.ac.uk/embl/
DDBJ: Japan www.ddbj.nig.ac.jp/
PDB: Protein Data bank http://www.rcsb.org/pdb/home/home.do
Databases (continued)
Secondary databases: Contain computationally processed or manually curated information based on primary databases.
SWISS-Prot: Curated protein database www.ebi.ac.uk/swissprot
TrEMBL: Translated nucleic acid sequences in EMBL
PIR: Annotated protein sequences
UniProt: Combined database of SWISS-Prot, TrEMBL, PIR
PROSITE, PRINTS, BLOCKS, Pfam
Specialized databases: Cater to a particular research interest
FlyBase, HIV Sequence Database, Ribosome Database, OMIM, Microarray Gene Expression Database, ExPASy, etc.
We need Bioinformatics Tools…
To mine (analyze) databases to generate knowledge to
better understand the living systems
Search/compare databases
Sequence Analysis
Genomics
Phylogenetics
Structure Prediction
Molecular Modelling
Microarrays
Packages, Misc Apps, Graphics, Scripts
Examples of Bioinformatics Tools
Database interfaces (search tools): GenBank/EMBL/DDBJ, Medline, SwissProt, PDB, …
Sequence alignment: BLAST, FASTA (Fast All)
Multiple sequence alignment: Clustal, MultAlin, DiAlign
Gene finding: Genscan, GenomeScan, GeneMark, GRAIL
Protein domain analysis and identification: Pfam, BLOCKS, ProDom
Pattern identification/characterization: Gibbs Sampler, AlignACE, MEME
Protein folding prediction: PredictProtein, SWISS-MODEL
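To give a feel for what a sequence alignment tool computes, here is a minimal sketch of local alignment scoring by dynamic programming (the Smith-Waterman approach). This is not how BLAST or FASTA are implemented: those tools use fast heuristics to approximate this kind of search across whole databases. The function name and the scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# The shared stretch "ACGT" gives 4 matches x 2 = 8.
print(local_alignment_score("TTACGTTT", "GGACGTGG"))
```

Keeping only two rows of the matrix is enough when just the score is needed; recovering the aligned residues themselves would require storing the full matrix and tracing back from the best cell.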
Five websites that all biologists should bookmark
NCBI (The National Center for Biotechnology Information) http://www.ncbi.nlm.nih.gov/
EBI (The European Bioinformatics Institute) http://www.ebi.ac.uk/
The Canadian Bioinformatics Resource http://www.cbr.nrc.ca/
SwissProt/ExPASy (Swiss Bioinformatics Resource) http://expasy.cbr.nrc.ca/sprot/
PDB (The Protein Databank) http://www.rcsb.org/PDB/
Anticipated Benefits of
Genome Research & Bioinformatics
Molecular Medicine: Gene Testing, Pharmacogenomics, Gene Therapy
improve diagnosis of disease
detect genetic predispositions to disease
create drugs based on molecular information
use gene therapy and control systems as drugs
design “custom drugs” (pharmacogenomics) based on individual genetic profiles
Microbial Genomics
rapidly detect and treat pathogens in clinical practice
develop new energy sources (biofuels)
monitor environments to detect pollutants
protect citizenry from biological and chemical warfare
clean up toxic waste safely and efficiently
DNA Identification (Forensics)
identify potential suspects whose DNA may
match evidence left at crime scenes
exonerate persons wrongly accused of
crimes
establish paternity and other family
relationships
identify endangered and protected species
as an aid to wildlife officials (could be
detect bacteria and other organisms that
may pollute air, water, soil, and food
match organ donors with recipients in
transplant programs
determine pedigree for seed or livestock
breeds
Benefits (continued)
Agriculture, Livestock Breeding, and
Bioprocessing
grow disease-, insect-, and drought-resistant crops
breed healthier, more productive, disease-resistant
farm animals
grow more nutritious produce
develop biopesticides
incorporate edible vaccines into food products
develop new environmental cleanup uses for
plants like tobacco
ELSI: Ethical, Legal, and Social Issues
• Privacy and confidentiality of genetic information.
• Fairness in the use of genetic information by insurers, employers, courts, schools, adoption agencies, and the military, among others.
• Psychological impact, stigmatization, and discrimination due to an individual’s genetic differences.
• Reproductive issues including adequate and informed consent and use of genetic information in reproductive decision making.
• Clinical issues including the education of doctors and other health-service providers, people identified with genetic conditions, and the general public about capabilities, limitations, and social risks; and implementation of standards and quality-control measures.
• Health and environmental issues concerning genetically modified (GM) foods and microbes.
• Commercialization of products including property rights (patents, copyrights, and trade secrets) and accessibility of data and materials.
Common Questions of a Student of Biology