Genomics, Proteomics, and Bioinformatics

Biology 224

Instructor: Tom Peavy

August 31, 2009

• Interface of biology and computers

• Analysis of genomes, genes, mRNA and proteins using computer algorithms and computer databases

What is bioinformatics?

What is Genomics?

What is Proteomics?

What is the Transcriptome?

On bioinformatics

“Science is about building causal relations between natural phenomena (for instance, between a mutation in a gene and a disease). The development of instruments to increase our capacity to observe natural phenomena has, therefore, played a crucial role in the development of science - the microscope being the paradigmatic example in biology. With the human genome, the natural world takes an unprecedented turn: it is better described as a sequence of symbols. Besides high-throughput machines such as sequencers and DNA chip readers, the computer and the associated software becomes the instrument to observe it, and the discipline of bioinformatics flourishes.”

Martin Reese and Roderic Guigó, Genome Biology 2006 7(Suppl I):S1, introducing EGASP, the Encyclopedia of DNA Elements (ENCODE) Genome Annotation Assessment Project

What do you want out of this course?

Themes throughout the course: gene/protein families

Retinol-binding protein 4 (RBP4)

member of the lipocalin family
small, abundant carrier protein

We will study it in a variety of contexts including
-- homologs in various species
-- sequence alignment
-- gene expression
-- protein structure
-- phylogeny

[Figure: bioinformatics alongside medical informatics and public health informatics; tool-users and tool-makers; infrastructure: databases, algorithms]

[Figure: DNA, RNA, protein, phenotype, with associated resources: genomic DNA databases; cDNA, ESTs, UniGene, microarrays; protein sequence databases]

There are three major public DNA databases

• GenBank: housed at NCBI (National Center for Biotechnology Information)
• EMBL: housed at EBI (European Bioinformatics Institute)
• DDBJ: housed in Japan

Growth of GenBank

[Figure: base pairs of DNA (billions) and sequences (millions) by year, 1982-2002. Updated 8-12-04: >40 billion base pairs]

Growth of GenBank + Whole Genome Shotgun (1982-November 2008)

[Figure: number of sequences in GenBank (millions, 0-250), base pairs of DNA in GenBank (billions), and base pairs in GenBank + WGS (billions) by year, 1982-2007]

Taxonomy at NCBI: ~200,000 species are represented in GenBank

http://www.ncbi.nlm.nih.gov/Taxonomy/txstat.cgi (11/08)

The most sequenced organisms in GenBank

Homo sapiens: 13.1 billion bases
Mus musculus: 8.4 billion
Rattus norvegicus: 6.1 billion
Bos taurus: 5.2 billion
Zea mays: 4.6 billion
Sus scrofa: 3.6 billion
Danio rerio: 3.0 billion
Oryza sativa (japonica): 1.5 billion
Strongylocentrotus purpuratus: 1.4 billion
Nicotiana tabacum: 1.1 billion

Updated 11-6-08, GenBank release 168.0. Excluding WGS, organelles, metagenomics.

Go to NCBI website

http://www.ncbi.nlm.nih.gov/

PubMed is…

• National Library of Medicine's search service
• 12 million citations in MEDLINE
• links to participating online journals
• PubMed tutorial (via “Education” on side bar)

Entrez integrates…

• the scientific literature
• DNA and protein sequence databases
• 3D protein structure data
• population study data sets
• assemblies of complete genomes

Entrez is a search and retrieval system that integrates NCBI databases
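As a rough illustration of what "search and retrieval" can look like outside the web browser, the sketch below builds a search URL for NCBI's E-utilities `esearch` endpoint. The slides do not mention E-utilities; the helper name `build_esearch_url`, the choice of the `protein` database, and the example query for RBP4 are all assumptions made for illustration. The code only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities (assumption: not named in the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=5):
    """Return an esearch URL for one Entrez database (hypothetical helper).

    db     -- Entrez database name, e.g. "protein", "nucleotide", "pubmed"
    term   -- Entrez query string, optionally with field tags in brackets
    retmax -- maximum number of record IDs to request
    """
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# Example: look for human RBP4 entries in the protein database.
url = build_esearch_url("protein", "RBP4[Gene Name] AND human[Organism]")
print(url)
```

Pasting the printed URL into a browser should return a list of matching record IDs, which is the same kind of query the Entrez search box runs interactively.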

BLAST is…

• Basic Local Alignment Search Tool
• NCBI's sequence similarity search tool
• supports analysis of DNA and protein databases
• 80,000 searches per day

OMIM is…

• Online Mendelian Inheritance in Man
• catalog of human genes and genetic disorders
• edited by Dr. Victor McKusick, others at JHU

Books is…

• searchable resource of on-line books

TaxBrowser is…

• browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)
• taxonomy information such as genetic codes
• molecular data on extinct organisms

Structure site includes…

• Molecular Modelling Database (MMDB)
• biopolymer structures obtained from the Protein Data Bank (PDB)
• Cn3D (a 3D-structure viewer)
• vector alignment search tool (VAST)

Review of Genetics, Biochemistry & Evolution

Human Genome Project

What is a typical genomic structure for a eukaryotic gene?

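Synonymous vs. nonsynonymous changes

Proline: a fourfold degenerate amino acid

CCT, CCC, CCA, CCG: all four codons encode proline, so changes at the third position are synonymous changes.

CGT encodes arginine, so a second-position change such as CCT → CGT is a nonsynonymous change.

Synonymous substitution: the encoded amino acid is unchanged.

Non-synonymous substitution: the encoded amino acid changes.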
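The proline and arginine example can be checked with a few lines of Python. This is a minimal sketch, assuming the standard genetic code and DNA codons written with T; the function name `classify_change` is made up for illustration.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_before, codon_after):
    """Classify a codon change as synonymous or nonsynonymous.

    Returns (kind, amino_acid_before, amino_acid_after) using
    one-letter amino acid codes ("*" marks a stop codon).
    """
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    kind = "synonymous" if aa_before == aa_after else "nonsynonymous"
    return kind, aa_before, aa_after


# Third-position changes in a proline codon leave the amino acid unchanged.
for third_base in "CAG":
    print("CCT -> CC" + third_base, classify_change("CCT", "CC" + third_base))

# A second-position change turns proline (P) into arginine (R).
print("CCT -> CGT", classify_change("CCT", "CGT"))
```

The three third-position changes are reported as synonymous (P to P), while CCT → CGT is reported as nonsynonymous (P to R).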

Central Dogma

• DNA → RNA → protein

• sequence structure function evolution
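The flow of information from DNA to RNA to protein can be sketched in code. This is a simplified illustration, not a model of the cell: it assumes the input is the coding (sense) strand, uses the standard genetic code, ignores introns and RNA processing, and the function names `transcribe` and `translate` and the example sequence are made up.

```python
from itertools import product

# Standard genetic code for RNA codons, listed in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """DNA -> RNA: the mRNA matches the coding strand with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """RNA -> protein: read codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGCCTCGTTAA"          # hypothetical 4-codon open reading frame
mrna = transcribe(dna)        # AUGCCUCGUUAA
print(mrna, translate(mrna))  # Met-Pro-Arg, then stop
```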

What kinds of modifications are made to eukaryotic mRNAs?

RNA Modifications

What are cDNAs?

Protein structures

• X-ray crystallography and Nuclear magnetic resonance (NMR)

• Primary structure
– linear amino acid (AA) sequence

• Secondary structure
– alpha helix and beta sheet

• Tertiary structure
– 3-D fold that exposes binding domains, etc.

Linkage maps

• YAC: yeast artificial chromosome
• BAC: bacterial artificial chromosome
– used to clone large pieces of DNA
– overlapping clones

• Are genes linked?

Organization of genomes

• Groups of genes within a species

– Comparative genomics

• plastid genomes and mt genomes

How do we determine functions of genes?

• Expression patterns
– Northerns
– RT-PCR
– SAGE
– Microarrays

• Transgenics
– insert genes: what results?

• Mutants
– classical genetics
– molecular genetics

• And functional protein assays

Charles Darwin

• Descent with modification
– species change through time and are related to a common ancestor

• Natural Selection is the process by which this change occurs

Understanding Natural selection

• acts on individuals, though the consequences occur in populations
– an individual's phenotype is the reason it survived and reproduced
– after a time this will change the distribution in the population
– what ultimately changes?

• Gene pool

New alleles

• A point change is all that is needed
– not always a "big deal"

• neutral change
– but can be, as in sickle cell anemia

Gene duplication

• creates an additional copy of a gene
– unequal cross-over
– X-rays

• Are these duplicates maintained in populations?
– Pseudogenes

Polyploidy

• additional set of chromosomes
– Found in plants
– Amphibians, invertebrates

• Through a type of parthenogenesis
– Triploid
• Poor fertility

• Hybridization or meiosis malfunction

Homology

• study of likeness (literal)

• Similarity between species (or genes) that results from inheritance of traits from a common ancestor
– Unless a common ancestor is known, be careful when using this word.

Orthologous vs Paralogous Genes

[Figure: gene duplication and speciation events leading to genes in Species 1 and Species 2]

Species

• All organisms alive today can trace their ancestry back to the origin of life some 3.8 billion years ago
– Since then millions if not billions of branching events have occurred

• Mechanisms have to be in place for change to occur
– genetic drift and natural selection

