
Lecture 1 BNFO 135 Usman Roshan. Course overview Perl programming language (and some Unix basics)...

Date post: 20-Dec-2015
Page 1: Lecture 1 BNFO 135 Usman Roshan. Course overview Perl programming language (and some Unix basics) –Unix basics –Intro Perl exercises –Programs for comparing.

Lecture 1

BNFO 135

Usman Roshan

Page 2

Course overview

• Perl programming language (and some Unix basics)
  – Unix basics
  – Intro Perl exercises
  – Programs for comparing DNA and protein sequences

• Sequence analysis
  – Pairwise and multiple sequence comparison
  – Sequence alignments
  – Application of alignments
  – Heuristic alignment (BLAST)

Page 3

Overview (contd)

• Grade: 40% programming assignments, 30% mid-term and 30% final exam

• Recommended Texts:
  – Perl for Bioinformatics by Arun Jagota
  – Introduction to Bioinformatics by Arthur Lesk

Page 4

Nothing in biology makes sense, except in the light of evolution

[Figure: evolutionary tree with a timeline running from -3 mil yrs through -2 mil yrs and -1 mil yrs to today. The ancestral sequence AAGACTT (-3 mil yrs) diverges through the intermediate sequences AAGGCTT and T_GACTT, then _GGGCTT, TAGACCTT, and A_CACTT, into the present-day sequences.]

Present-day sequences:
  – ACCTT (Cat)
  – ACACTTC (Lion)
  – TAGCCCTTA (Monkey)
  – TAGGCCTT (Human)
  – GGCTT (Mouse)

The same sequences written with gaps (_):
  – TAGGCCTT (Human)
  – TAGCCCTTA (Monkey)
  – A_C_CTT (Cat)
  – A_CACTTC (Lion)
  – _G_GCTT (Mouse)
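The gapped and ungapped sequences in the figure differ only by the gap character. A minimal Python sketch of that relationship, assuming the underscore `_` is the gap symbol as in the figure; the function name `strip_gaps` is made up for illustration.

```python
def strip_gaps(aligned, gap="_"):
    """Return the sequence with every gap character removed."""
    return aligned.replace(gap, "")

# Gapped sequences as they appear in the figure.
aligned = {
    "Human": "TAGGCCTT",
    "Monkey": "TAGCCCTTA",
    "Cat": "A_C_CTT",
    "Lion": "A_CACTTC",
    "Mouse": "_G_GCTT",
}

# Removing the gaps recovers the present-day sequences.
raw = {species: strip_gaps(seq) for species, seq in aligned.items()}
for species, seq in raw.items():
    print(species, seq)
```

Going the other way, deciding where the gaps belong, is the sequence alignment problem listed in the course overview.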

Page 5

Representing DNA in a format computers can manipulate

• DNA is a double-helix molecule made up of four nucleotides, named for their bases:
  – Adenine (A)
  – Cytosine (C)
  – Thymine (T)
  – Guanine (G)

• Since A (adenine) always pairs with T (thymine) and C (cytosine) always pairs with G (guanine), knowing only one side of the ladder is enough.

• We represent DNA as a sequence of letters, where each letter is A, C, G, or T.

• For example, we would represent the helix shown here as CAGT.
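Because of the pairing rule, the other side of the ladder can be computed from the one we store. The following is a minimal Python sketch, not course material; the names `PAIR`, `complement`, and `reverse_complement` are chosen here for illustration. The reverse complement is included because the two strands of the helix run in opposite directions.

```python
# Base pairing: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base-by-base complement of a DNA string."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand):
    """Return the complement read in the opposite direction."""
    return complement(strand)[::-1]

print(complement("CAGT"))          # GTCA
print(reverse_complement("CAGT"))  # ACTG
```

A letter other than A, C, G, or T raises a `KeyError` in this sketch, which doubles as a crude input check.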

Page 6

Transcription and translation

Page 7

Amino acids

Proteins are chains of amino acids. There are twenty different amino acids that chain in different ways to form different proteins.

For example, FLLVALCCRFGH (this is how we could store it in a file).

This sequence of amino acids folds to form a 3-D structure.
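As a rough illustration of storing such a sequence in a file, here is a Python sketch that writes the example string to a plain text file, reads it back, and checks that it uses only the one-letter codes of the twenty standard amino acids. The file name `example_protein.txt` and the helper names are assumptions made for this example.

```python
import os
import tempfile

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def is_protein(seq):
    """True if seq is non-empty and uses only the 20 standard letters."""
    return bool(seq) and set(seq) <= AMINO_ACIDS

def save_sequence(seq, path):
    """Write the sequence as a single line of text."""
    with open(path, "w") as handle:
        handle.write(seq + "\n")

def load_sequence(path):
    """Read the sequence back, dropping the trailing newline."""
    with open(path) as handle:
        return handle.read().strip()

# Hypothetical file location in the system's temporary directory.
path = os.path.join(tempfile.gettempdir(), "example_protein.txt")
save_sequence("FLLVALCCRFGH", path)
protein = load_sequence(path)
print(protein, is_protein(protein))
```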

Page 8

Protein folding

Page 9

Protein folding

• The protein folding problem is to determine the 3-D protein structure from the sequence.
• Experimental techniques are very expensive.
• Computational techniques are cheap, but the problem is difficult to solve.
• By comparing sequences we can deduce the evolutionarily conserved portions, which are also functional (most of the time).
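One very simple way to look for conserved portions is to take sequences that are already aligned to the same length and report the columns where every sequence has the same letter. The Python sketch below does only that; the three sequences are invented for illustration, and the function name `conserved_columns` is hypothetical. Exact identity is a crude stand-in for conservation.

```python
def conserved_columns(aligned):
    """Return indices of columns where every sequence has the same letter."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [i for i, column in enumerate(zip(*aligned))
            if len(set(column)) == 1]

# Made-up aligned sequences, not real data.
sequences = [
    "ACGTTGCA",
    "ACGATGCA",
    "ACGTTGGA",
]

print(conserved_columns(sequences))  # [0, 1, 2, 4, 5, 7]
```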

Page 10

Protein structure

• Primary structure: the sequence of amino acids.
• Secondary structure: parts of the chain organize themselves into alpha helices, beta sheets, and coils. Helices and sheets are usually evolutionarily conserved and can aid sequence alignment.
• Tertiary structure: the 3-D structure of the entire chain.
• Quaternary structure: a complex of several chains.

Page 11

Key points

• DNA can be represented as strings over four letters: A, C, G, and T. These strings can be very long, e.g. thousands or even millions of letters.

• Proteins are also represented as strings, over an alphabet of 20 letters (each letter is an amino acid). Their 3-D structure determines their function to a large extent.

