Date post: | 30-Dec-2015 |
Category: | Documents |
Upload: | roxanne-jacobs |
What is bioinformatics?
• Interface of biology and computers
• Analysis of genomes, genes, mRNA and proteins using computer algorithms and computer databases
On bioinformatics
“Science is about building causal relations between natural phenomena (for instance, between a mutation in a gene and a disease). The development of instruments to increase our capacity to observe natural phenomena has, therefore, played a crucial role in the development of science - the microscope being the paradigmatic example in biology. With the human genome, the natural world takes an unprecedented turn: it is better described as a sequence of symbols. Besides high-throughput machines such as sequencers and DNA chip readers, the computer and the associated software becomes the instrument to observe it, and the discipline of bioinformatics flourishes.”
Martin Reese and Roderic Guigó, Genome Biology 2006 7(Suppl 1):S1, introducing EGASP, the Encyclopedia of DNA Elements (ENCODE) Genome Annotation Assessment Project
Themes throughout the course: gene/protein families
Retinol-binding protein 4 (RBP4)
• member of the lipocalin family
• small, abundant carrier protein
We will study it in a variety of contexts, including:
– homologs in various species
– sequence alignment
– gene expression
– protein structure
– phylogeny
Tool-users
Tool-makers
bioinformatics
public health informatics
medical informatics
infrastructure
databases
algorithms
There are three major public DNA databases
• GenBank: housed at NCBI (National Center for Biotechnology Information)
• EMBL: housed at EBI (European Bioinformatics Institute)
• DDBJ: housed in Japan
Growth of GenBank
[Figure: base pairs of DNA (billions) and sequences (millions), plotted by year, 1982 to 2002]
Updated 8-12-04: >40b base pairs
Growth of GenBank + Whole Genome Shotgun (1982-November 2008)
[Figure: number of sequences in GenBank (millions, 0 to 250), base pairs of DNA in GenBank (billions), and base pairs in GenBank + WGS (billions), plotted by year, 1982 to 2007]
Taxonomy at NCBI: ~200,000 species are represented in GenBank
http://www.ncbi.nlm.nih.gov/Taxonomy/txstat.cgi (11/08)
The most sequenced organisms in GenBank
• Homo sapiens: 13.1 billion bases
• Mus musculus: 8.4b
• Rattus norvegicus: 6.1b
• Bos taurus: 5.2b
• Zea mays: 4.6b
• Sus scrofa: 3.6b
• Danio rerio: 3.0b
• Oryza sativa (japonica): 1.5b
• Strongylocentrotus purpuratus: 1.4b
• Nicotiana tabacum: 1.1b
Updated 11-6-08, GenBank release 168.0. Excluding WGS, organelles, metagenomics.
PubMed is…
• National Library of Medicine's search service
• 12 million citations in MEDLINE
• links to participating online journals
• PubMed tutorial (via “Education” on side bar)
Entrez integrates…
• the scientific literature
• DNA and protein sequence databases
• 3D protein structure data
• population study data sets
• assemblies of complete genomes
BLAST is…
• Basic Local Alignment Search Tool
• NCBI's sequence similarity search tool
• supports analysis of DNA and protein databases
• 80,000 searches per day
OMIM is…
• Online Mendelian Inheritance in Man
• catalog of human genes and genetic disorders
• edited by Dr. Victor McKusick, others at JHU
TaxBrowser is…
• browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)
• taxonomy information such as genetic codes
• molecular data on extinct organisms
Structure site includes…
• Molecular Modeling Database (MMDB)
• biopolymer structures obtained from the Protein Data Bank (PDB)
• Cn3D (a 3D-structure viewer)
• Vector Alignment Search Tool (VAST)
Synonymous vs. nonsynonymous changes
• Proline: a four-fold degenerate amino acid
– CCT, CCC, CCA, CCG
– Synonymous changes: any change at the third position still encodes proline
• Nonsynonymous changes
– Arginine: CGT (a change at the second position of CCT)
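The proline and arginine codons above can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE` and `classify_change` are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code (assumption: nuclear, DNA alphabet).
# Amino acids are listed with codon positions 1, 2, 3 each cycling through T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_change(codon_before, codon_after):
    """Return 'synonymous' if both codons encode the same amino acid,
    otherwise 'nonsynonymous'."""
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"

# Third-position changes within the proline codon family.
for codon in ("CCC", "CCA", "CCG"):
    print("CCT ->", codon, classify_change("CCT", codon))

# Second-position change from proline (CCT) to arginine (CGT).
print("CCT -> CGT", classify_change("CCT", "CGT"))
```

All four CCN codons map to proline, which is what "four-fold degenerate" means: the third position can hold any base without changing the amino acid.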
Protein structures
• Determined by X-ray crystallography and nuclear magnetic resonance (NMR)
• Primary structure
– linear amino acid sequence
• Secondary structure
– alpha helix and beta sheet
• Tertiary structure
– 3-D fold that exposes binding domains, etc.
Linkage maps
• YAC (yeast artificial chromosome) and BAC (bacterial artificial chromosome)
– used to clone large pieces of DNA
– overlapping clones
• Are genes linked?
Organization of genomes
• Groups of genes within a species
– Comparative genomics
• Plastid genomes and mitochondrial (mt) genomes
How do we determine functions of genes?
• Expression patterns
– Northerns
– RT-PCR
– SAGE
– Microarrays
• Transgenics
– insert genes: what results?
• Mutants
– classical genetics
– molecular genetics
• Functional protein assays
Charles Darwin
• Descent with modification
– species change through time and are related to a common ancestor
• Natural selection is the process by which this change occurs
Understanding natural selection
• Acts on individuals, though the consequences occur in populations
– An individual's phenotype is the reason it survived and reproduced
– After a time this will change the distribution in the population
– What ultimately changes?
• Gene pool
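How selection on individuals shifts the gene pool can be illustrated with a small calculation. This is a minimal sketch, assuming a simple one-locus, two-allele haploid model with constant relative fitnesses; the function names and the fitness values are hypothetical and not taken from the slides.

```python
def next_frequency(p, w_a, w_b):
    """One generation of selection in a haploid two-allele model.

    p   : current frequency of allele A (allele B has frequency 1 - p)
    w_a : relative fitness of individuals carrying A (assumed constant)
    w_b : relative fitness of individuals carrying B (assumed constant)
    """
    mean_fitness = p * w_a + (1 - p) * w_b
    return p * w_a / mean_fitness

def simulate(p0, w_a, w_b, generations):
    """Return the frequency of allele A at generation 0, 1, ..., generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_a, w_b))
    return freqs

# Hypothetical example: allele A starts rare and has a 10% fitness advantage.
trajectory = simulate(p0=0.1, w_a=1.1, w_b=1.0, generations=50)
for generation in (0, 10, 25, 50):
    print(generation, round(trajectory[generation], 3))
```

Each individual either reproduces more or less; no individual changes. What changes over generations is the frequency of the allele in the population, that is, the gene pool.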
New alleles
• A point change is all that is needed
– not always a "big deal"
• neutral change
– but it can be, as in sickle cell anemia
Gene duplication
• creates an additional copy of a gene
– unequal cross-over
– X-rays
• Are these duplicates maintained in populations?
– Pseudogenes
Polyploidy
• additional set of chromosomes
– Found in plants
– Amphibians, invertebrates
• Through a type of parthenogenesis
– Triploid
• Poor fertility
• Hybridization or meiosis malfunction
Homology
• study of likeness (literal)
• Similarity between species (or genes) that results from inheritance of traits from a common ancestor
– Unless a common ancestor is known, be careful when using this word.