Date post: 22-Dec-2015

Introduction to Bioinformatics 236523/234525

Lecturer: Prof. Yael Mandel-Gutfreund

Teaching Assistants:

Shai Ben-Elazar

Idit Kosti

Course web site: http://webcourse.cs.technion.ac.il/236523

2

What is Bioinformatics?

3

Course Objectives

• To introduce the bioinformatics discipline
• To make the students familiar with the major biological questions which can be addressed by bioinformatics tools
• To introduce the major tools used for sequence and structure analysis and explain in general how they work (limitations etc.)

4

Course Structure and Requirements

1. Class structure
   1. 2-hour lecture
   2. 1-hour tutorial

2. Homework
   • Homework assignments will be given every second week.
   • The homework will be done in pairs.
   • 5/5 homework assignments will be submitted.

3. A final project will be conducted in pairs.
   * The project will be presented as a poster (poster day: 14.3).

5

Grading

• 20 % Homework assignments

• 80 % final project

6

Literature list

• Gibas, C., Jambeck, P. Developing Bioinformatics Computer Skills. O'Reilly, 2001.
• Lesk, A. M. Introduction to Bioinformatics. Oxford University Press, 2002.
• Mount, D. W. Bioinformatics: Sequence and Genome Analysis. 2nd ed., Cold Spring Harbor Laboratory Press, 2004.

Advanced Reading

• Jones, N. C., Pevzner, P. A. An Introduction to Bioinformatics Algorithms. MIT Press, 2004.

7

What is Bioinformatics?

8

What is Bioinformatics?

“The field of science in which biology, computer science, and information technology merge to form a single discipline”

Ultimate goal: to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

9

Central Paradigm in Molecular Biology

Gene (DNA) → mRNA → Protein

21st century

Genome → Transcriptome → Proteome

10

From DNA to Genome

[Timeline figure, axis 1955 to 1985: Watson and Crick DNA model]

11

[Timeline figure, axis 1990 to 2000: first genome (Haemophilus influenzae), yeast genome, first human genome draft (2000)]

12

Complete Genomes

              2010   2005
Total         1379    294
Eukaryotes     133     39
Bacteria      1152    235
Archaea         94     23

1,000 Genomes Project: Expanding the Map of Human Genetics

Researchers hope the effort will speed up the discovery of many diseases' genetic roots

13

14

Main Goal:

To understand the living cell

Annotation

Comparative genomics

Functional genomics

Systems Biology

25000 genomes… What’s Next?

The “post-genomics” era

From… 25000 genomes

To… understanding living cells

16

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG

CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA

CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC

AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA

AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA

TAT GGA CAA TTG GTT TCT TCT CTG AAT ......

.............. TGAAAAACGTA

Annotation

17

Annotation

Identify the genes within a given sequence of DNA

Identify the sites which regulate the gene

Predict the function

18

How do we identify a gene in a genome?

A gene is characterized by several features (promoter, ORF…); some are easier and some harder to detect…
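One of the easier features to detect is the open reading frame. The following is a minimal sketch, not a real gene finder: it scans only the forward strand, assumes ATG as the only start codon, and ignores promoters, introns and the reverse strand. The function name `find_orfs`, the `min_codons` parameter and the demo sequence are illustrative assumptions, not part of the course material.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the ATG, end is exclusive and includes the
    stop codon, and min_codons counts codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF in this frame
    return orfs

# Hypothetical toy sequence, chosen only to demonstrate the scan.
demo = "CCATGAAATTTGGGTAACC"
for start, end, frame in find_orfs(demo):
    print(frame, start, end, demo[start:end])
```

A scan like this is roughly adequate for compact bacterial genes; it says nothing about promoters or regulatory sites, which need other signals.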

19

CCTGACAAATTCGACGTGCGGCATTGCATGCAGACGTGCATG

CGTGCAAATAATCAATGTGGACTTTTCTGCGATTATGGAAGAA

CTTTGTTACGCGTTTTTGTCATGGCTTTGGTCCCGCTTTGTTC

AGAATGCTTTTAATAAGCGGGGTTACCGGTTTGGTTAGCGAGA

AGAGCCAGTAAAAGACGCAGTGACGGAGATGTCTGATG CAA

TAT GGA CAA TTG GTT TCT TCT CTG

AAT .................................

.............. TGAAAAACGTA

TF binding site

Promoter

Transcription Start Site

Ribosome binding site

ORF = Open Reading Frame

CDS = Coding Sequence

20

Using Bioinformatics approaches for Gene hunting

Relatively easy in simple organisms (e.g. bacteria)

VERY HARD for higher organisms (e.g. humans)

21

Comparative genomics

22

How human are chimps?

Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%.

Perhaps not surprising!!!

So where are we different??

23

Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGG--GGATGCGGGCCCTATACCC
Mouse ATAGCG---GGATGCGGCGC-TATACC-A

Human ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp ATAGGGGGGATGCGGGCCCTATACCC
Mouse ATAGCGGGATGCGGCGCTATACCA
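Once sequences are aligned, the differences between species can be counted column by column. The sketch below is one simple way to do that for two aligned rows; the function names `column_stats` and `percent_identity` are illustrative, and computing identity over ungapped columns only is an assumption (other conventions exist). A short toy alignment like this one will not reproduce the genome-wide 1.23% figure.

```python
from itertools import zip_longest

def column_stats(row_a, row_b, gap="-"):
    """Count (matches, mismatches, gapped columns) for two aligned rows.

    If one row is shorter, the missing columns are treated as gaps.
    """
    matches = mismatches = gaps = 0
    for a, b in zip_longest(row_a, row_b, fillvalue=gap):
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps

def percent_identity(row_a, row_b):
    """Percent of identical residues among columns without a gap."""
    matches, mismatches, _ = column_stats(row_a, row_b)
    compared = matches + mismatches
    return 100.0 * matches / compared if compared else 0.0

# Aligned human and chimp rows copied from the alignment above.
human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
print(column_stats(human, chimp))
print(f"{percent_identity(human, chimp):.1f}% identity over ungapped columns")
```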

24

And where are we similar ???

VERY SIMILAR: conserved between many organisms

VERY DIFFERENT

25

Functional genomics

26

TO BE IS NOT ENOUGH

At any time point a gene can be functional or not

27

From the gene expression pattern we can learn:

• What does the gene do?
• When is it needed?
• What other genes or proteins interact with it?
• …

What's wrong??
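One common way to approach the question of which genes act together is to look for genes whose expression levels rise and fall together across conditions. The following is a minimal sketch using Pearson correlation; the gene names, expression values and the 0.9 threshold are all hypothetical and chosen only for illustration, and correlation alone does not prove an interaction.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)

def co_expressed(gene, profiles, threshold=0.9):
    """Names of other genes whose profile correlates with `gene` above threshold."""
    return sorted(
        other for other, values in profiles.items()
        if other != gene and pearson(profiles[gene], values) >= threshold
    )

# Hypothetical expression levels at four time points (not real measurements).
profiles = {
    "geneA": [1.0, 2.0, 4.0, 8.0],
    "geneB": [1.1, 2.1, 3.9, 8.2],
    "geneC": [8.0, 4.0, 2.0, 1.0],
}
print(co_expressed("geneA", profiles))
```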

28

Systems Biology

Jeong et al. Nature 411, 41 - 42 (2001)

Biological networks

What can we learn from a network?

What can we learn from Biological Networks?

What can we learn about this protein?

• Is the protein essential for the organism?
• Is it a good drug target?
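Jeong et al. (2001) reported that, in the yeast protein interaction network, highly connected proteins are more likely to be essential. A first step toward that kind of analysis is counting how many partners each protein has. The sketch below does this for an edge list; the protein names and interactions are hypothetical placeholders, and the degree cutoff for calling a protein a "hub" is an arbitrary assumption.

```python
from collections import defaultdict

def degrees(edges):
    """Map each protein to its number of distinct interaction partners.

    Self-interactions are ignored.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}

def hubs(edges, min_degree):
    """Proteins with at least min_degree partners, sorted by name."""
    return sorted(node for node, d in degrees(edges).items() if d >= min_degree)

# Hypothetical interaction list, not taken from any database.
interactions = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P1", "P5"), ("P2", "P3"), ("P5", "P6"),
]
print(degrees(interactions))
print(hubs(interactions, min_degree=3))
```

Degree is only one of several network measures; whether a hub is in fact essential or a useful drug target still has to be tested experimentally.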

What of all this will we learn in the course?

32

The course will concentrate on the bioinformatics tools and databases which are used to:

• Annotate genes
• Compare genes and genomes
• Infer the function of the genes and proteins
• Analyze the interactions between genes and proteins
• Etc.

33

Biological Databases

The different types of data are collected in databases

– Sequence databases
– Structural databases
– Databases of Experimental Results

All databases are connected

34

Sequence databases

• Gene database

• Genome database

• Disease related mutation database

• ………….

35

Genome Browsers

Easy “walk” through the genome

UCSC Genome Browser http://genome.ucsc.edu/

36

Disease related database

37

Sickle Cell Anemia

• Due to a single-nucleotide substitution (an A swapped for a T), which causes valine to be inserted instead of glutamic acid in hemoglobin

Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

38

Healthy Individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA

GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN

ALAHKYH

39

Diseased Individual

>gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGT

GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATGCTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]

MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG

AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN

ALAHKYH
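The difference between the healthy and diseased protein records can be located programmatically. The sketch below compares the first line of each beta globin protein record shown above and reports differing positions; the function names are illustrative, positions are 1-based and count the initial methionine, and the two-entry codon table is a deliberately tiny assumption covering only the codons involved (GAG for glutamic acid, GTG for valine).

```python
def substitutions(ref, alt):
    """Return (1-based position, reference residue, altered residue) for each difference."""
    return [
        (pos, r, a)
        for pos, (r, a) in enumerate(zip(ref, alt), start=1)
        if r != a
    ]

def point_mutate(codon, index, base):
    """Replace the base at 0-based `index` of a codon."""
    return codon[:index] + base + codon[index + 1:]

# Only the two codons needed for this example, not a full genetic code.
CODONS = {"GAG": "E", "GTG": "V"}

# First line of each protein record from the slides above.
healthy = "MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG"
sickle = "MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG"

print(substitutions(healthy, sickle))

# Swapping the middle A of GAG for a T changes the encoded residue.
mutant_codon = point_mutate("GAG", 1, "T")
print(CODONS["GAG"], "->", CODONS[mutant_codon])
```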

40

Structure Databases

• 3-dimensional structures of proteins, nucleic acids, molecular complexes etc

• 3-d data is available due to techniques such as NMR and X-Ray crystallography

41

42

Databases of Experimental Results

• Data such as experimental microarray images (gene expression data)

• Proteomic data (protein expression data)

• Metabolic pathways, protein-protein interaction data, regulatory networks

• Etc.

43

Literature Databases

PubMed

Service of the National Library of Medicine

http://www.ncbi.nlm.nih.gov/pubmed/

44

Putting it all Together

• Each Database contains specific information

• Like biological systems, these databases are also interrelated

45

GENOMIC DATA: GenBank, DDBJ, EMBL

ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR

PROTEIN: PIR, SWISS-PROT

STRUCTURE: PDB, MMDB, SCOP

LITERATURE: PubMed

PATHWAY: KEGG, COG

DISEASE: LocusLink, OMIM, OMIA

GENES: RefSeq, AllGenes, GDB

SNPs: dbSNP

ESTs: dbEST, UniGene

MOTIFS: BLOCKS, Pfam, Prosite

GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress

