Bioinformatics Lecture 5

Date post: 18-Jul-2015
Upload: hamid-ur-rahman
Bioinformatics Lecture# 5 Dr. Naeem Ud Din Khattak Professor Department of Zoology Islamia College Peshawar (Chartered University)
Bioinformatics Lecture# 5

Dr. Naeem Ud Din Khattak

Professor

Department of Zoology

Islamia College Peshawar (Chartered University)

Phylogenetic Tree Construction

• The mutation distance: the minimal number of nucleotides that would need to be altered in order for the gene for one protein to code for the other.

• Example: ACTGAT and TCTATC, aligned:

A C T G A T -
T C T - A T C
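
As a rough illustration of how differences are read off an alignment like the one above, the sketch below counts the columns in which the two rows differ. Treating each gap column as one change is an assumption made here for simplicity, and the function name `count_differences` is hypothetical; the slide itself defines mutation distance as the minimal number of altered nucleotides.

```python
def count_differences(aligned_a: str, aligned_b: str) -> int:
    """Count alignment columns where the two rows differ.

    Assumption: a gap ('-') opposite a nucleotide counts as one difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(aligned_a, aligned_b) if x != y)


# The two rows of the alignment from the slide, '-' marking a gap.
row_a = "ACTGAT-"
row_b = "TCT-ATC"
print(count_differences(row_a, row_b))  # 3 differing columns: A/T, G/-, -/C
```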

The construction of the tree

• Assume proteins A, B and C, and their mutation distances.

• There are two questions:

1. Which pair does one join together first?

2. What are the lengths of edges a, b, and c?

      B    C
A    24   28
B         32

Which pair does one join together first?

• Simply choose the pair with the smallest mutation distance.

      B    C
A    24   28
B         32

• Here the smallest distance is 24, between A and B, so A and B are joined first.
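
The pair-selection rule can be sketched in a few lines. The dictionary layout and the name `closest_pair` are illustrative choices, not something taken from the lecture; only the three distances come from the table.

```python
# Mutation distances from the table (A-B = 24, A-C = 28, B-C = 32).
distances = {("A", "B"): 24, ("A", "C"): 28, ("B", "C"): 32}


def closest_pair(dist):
    """Return the pair of taxa with the smallest mutation distance."""
    return min(dist, key=dist.get)


print(closest_pair(distances))  # ('A', 'B')
```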

What are the lengths of legs a, b, and c?

      B    C
A    24   28
B         32

[Figure: tree with leaves A, B and C on legs a, b and c]

• i. a + b = 24   ii. a + c = 28   iii. b + c = 32

• From i: a = 24 - b. Put this value of a in ii: 24 - b + c = 28, so c - b = 4 and c = 4 + b.

• Put this value of c in iii: b + 4 + b = 32, so 2b = 28 and b = 14.

• Now put the value of b in i: a = 24 - 14 = 10. Then c = 4 + 14 = 18.

• Result: a = 10, b = 14, c = 18.

• Note that this analysis assumes that there are no multiple substitutions (when a single site undergoes two or more changes, e.g. the ancestral sequence … ATGT … gives … AGGT … and … ACGT …).
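
The three leg-length equations (a + b = 24, a + c = 28, b + c = 32) can also be solved in closed form: adding two equations and subtracting the third isolates one leg. The sketch below does this for any three pairwise distances; the function name `leg_lengths` is hypothetical.

```python
def leg_lengths(d_ab: float, d_ac: float, d_bc: float):
    """Solve a+b=d_ab, a+c=d_ac, b+c=d_bc for the legs a, b, c."""
    a = (d_ab + d_ac - d_bc) / 2
    b = (d_ab + d_bc - d_ac) / 2
    c = (d_ac + d_bc - d_ab) / 2
    return a, b, c


print(leg_lengths(24, 28, 32))  # (10.0, 14.0, 18.0)
```

Like the hand calculation, this assumes the distances are additive along the tree, i.e. no multiple substitutions at a site.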

Based on lectures by C-B Stewart, and by Tal Pupko

Phylogenetic Tree Terminology

• Ancestral node, or ROOT of the tree

• Internal nodes, or divergence points (represent hypothetical ancestors of the taxa)

• Branches, or lineages

• Terminal nodes (A, B, C, D, E): represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny

Phylogenetic trees diagram the evolutionary relationships between the taxa

((A,(B,C)),(D,E)) = The above phylogeny as nested parentheses

[Figure: the tree ((A,(B,C)),(D,E)) drawn with terminal nodes Taxon A, Taxon B, Taxon C, Taxon D and Taxon E]

• B and C are more closely related to each other than either is to A.

• A, B, and C form a clade that is a sister group to the clade composed of D and E.

• If the tree has a time scale, then D and E are the most closely related.
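
To make the nested-parentheses notation concrete, here is a minimal sketch that parses a label-only string such as ((A,(B,C)),(D,E)) into nested tuples and lists the clade under each internal node. It is an illustration only: it handles neither branch lengths nor the trailing semicolon of full Newick files, and the names `parse_newick`, `leaves` and `clades` are hypothetical.

```python
def parse_newick(text: str):
    """Parse a label-only nested-parentheses tree into nested tuples."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else ""

    def parse_node():
        nonlocal pos
        if peek() == "(":
            pos += 1  # skip '('
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # skip ','
                children.append(parse_node())
            if peek() != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1  # skip ')'
            return tuple(children)
        start = pos
        while peek() and peek() not in "(),":
            pos += 1
        if start == pos:
            raise ValueError("expected a taxon label at position %d" % pos)
        return text[start:pos]

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected trailing text")
    return tree


def leaves(node):
    """Return the taxon labels below a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def clades(node):
    """Return the leaf list under every internal node, root first."""
    if isinstance(node, str):
        return []
    result = [leaves(node)]
    for child in node:
        result.extend(clades(child))
    return result


tree = parse_newick("((A,(B,C)),(D,E))")
for clade in clades(tree):
    print(clade)
```

The printed lists include ['A', 'B', 'C'], ['B', 'C'] and ['D', 'E'], matching the clades described in the bullets above.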

Sequence Comparisons

• Nature acts conservatively, i.e., it does not develop a new kind of biology for every life form but continuously changes and adapts a proven general concept.

• Novel functionalities do not appear because a new gene has suddenly arisen but are developed and modified during evolution.

• Thus, alleles of a gene found in a population arise from a common ancestor gene: they are HOMOLOGOUS.

• Homology is not a measure of similarity; rather, it means that sequences have a shared evolutionary history and, therefore, possess a common ancestral sequence (Tatusov et al. 1997).

• It is an all-or-none phenomenon.

Orthologs

• Homologous proteins from different species that possess the same function (e.g., corresponding kinases in a signal transduction pathway in humans and mice) are called orthologs.

Paralogs

• Homologous proteins that have different functions in the same species (e.g., two kinases in different signal transduction pathways of humans) are termed paralogs.

• A visual representation of orthologs (and some other commonly confused terms, paralogs and homologs)

Orthologs: "genes that have diverged after a speciation event... [that] tend to have similar function" (Fulton et al. 2006). Thus, orthologs are genes whose encoded proteins fulfill similar roles in different species.

• Homology is not quantifiable.

• The similarity and identity of two sequences, however, ARE.

Identity

• The ratio of the number of identical amino acids or nucleotides to the total number of amino acids or nucleotides.

• Example: 4 identical positions out of 20 gives 4/20 = 0.2.
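
The identity ratio is straightforward to compute for two aligned sequences of equal length. In the sketch below the two 20-residue sequences are made up purely so that exactly 4 positions match, reproducing the 4/20 = 0.2 figure; they are not from the lecture. How gapped positions should count toward the total is a convention the slide does not specify, so gaps are left out here.

```python
def identity(seq_a: str, seq_b: str) -> float:
    """Fraction of positions at which two equal-length sequences are identical."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return matches / len(seq_a)


# Hypothetical sequences: only positions 1, 6, 11 and 16 are identical.
seq_a = "ACDEFGHIKLMNPQRSTVWY"
seq_b = "AWWWWGWWWWMWWWWSWWYW"
print(identity(seq_a, seq_b))  # 0.2
```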

Similarity

• Unlike identity, similarity is not as simple to calculate. Before similarity can be determined, it must first be defined how similar the building blocks of sequences are to each other.

• This is done with the help of similarity matrices, which specify the probability at which a sequence transforms into another sequence over time.

• These probabilities depend on the time elapsed and the mutational rate of nucleotides.

• For nucleotide sequences the simplest solution is an identity matrix (Fig. 4.2a).

• For protein sequences, an identity matrix is not sufficient to describe biological and evolutionary processes.

• Amino acids are not exchanged with the same probability as might be conceived theoretically.

• YOU CAN RECALL THE SYNONYMOUS AND NON-SYNONYMOUS MUTATIONS

• For example, an exchange of aspartic acid for glutamic acid is frequently observed, whereas an exchange of aspartic acid for tryptophan is seen rarely.

• A second reason for the aspartic acid-to-glutamic acid mutation to occur more often is that both amino acids have similar properties.

• In contrast aspartic acid and tryptophan are chemically different – the hydrophobic tryptophan is frequently found in the center of proteins, whereas the hydrophilic aspartic acid occurs more often at the surface.

• Amino acid substitution matrices, therefore, describe the probability at which amino acids are exchanged in the course of evolution.

• The most commonly used amino acid scoring matrices are the PAM (Point Accepted Mutation; Dayhoff et al. 1978) and BLOSUM (Blocks Substitution Matrix; Henikoff and Henikoff 1992) groups.

Amino acid      3-letter   1-letter   Property
Tryptophan      Trp        W          Hydrophobic
Aspartic acid   Asp        D          Hydrophilic, electrically charged (negative)
Glutamic acid   Glu        E          Hydrophilic, electrically charged (negative)
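
The difference between an identity matrix and a similarity matrix can be seen on a tiny example. The scores in `TOY_SCORES` below are invented for illustration and are NOT taken from PAM or BLOSUM; they only encode the idea that an Asp/Glu exchange is penalised far less than an Asp/Trp exchange. All function names are hypothetical.

```python
# Hypothetical toy similarity scores (not real PAM or BLOSUM values).
TOY_SCORES = {
    ("D", "D"): 5, ("E", "E"): 5, ("W", "W"): 5,
    ("D", "E"): 2,   # similar, negatively charged residues
    ("D", "W"): -4,  # chemically very different residues
    ("E", "W"): -4,
}


def identity_score(x: str, y: str) -> int:
    """Identity matrix: 1 for identical residues, 0 otherwise."""
    return 1 if x == y else 0


def toy_similarity_score(x: str, y: str) -> int:
    """Look up a residue pair in the symmetric toy matrix."""
    return TOY_SCORES[(x, y)] if (x, y) in TOY_SCORES else TOY_SCORES[(y, x)]


def alignment_score(seq_a: str, seq_b: str, score_fn) -> int:
    """Sum the per-column scores of two gap-free, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(score_fn(x, y) for x, y in zip(seq_a, seq_b))


# Identity scoring cannot tell the two substitutions apart ...
print(alignment_score("DDW", "DEW", identity_score))  # 2
print(alignment_score("DDW", "DWW", identity_score))  # 2
# ... while the toy similarity matrix can.
print(alignment_score("DDW", "DEW", toy_similarity_score))  # 12
print(alignment_score("DDW", "DWW", toy_similarity_score))  # 6
```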

NUCLEOTIDE AND AMINO ACID SEQUENCES ARE EVOLUTIONARILY DIFFERENT, SO WE NEED DIFFERENT CRITERIA AND MATRICES TO ANALYZE THEM

• (Fig. 4.2a) For nucleotide sequences the simplest solution is an identity matrix.

• (Fig. 4.2b) For amino acid sequences we need similarity matrices.

[Figure: Calculation of a global alignment of two similar protein sequences. Scores shown: 65 and 19]
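
The slides do not say which method was used to calculate the global alignment. One standard approach is Needleman-Wunsch dynamic programming, sketched below for the score only. The match, mismatch and gap values are assumptions chosen for illustration; a real protein alignment would take its per-pair scores from a substitution matrix instead.

```python
def global_alignment_score(seq_a: str, seq_b: str,
                           match: int = 1, mismatch: int = -1,
                           gap: int = -1) -> int:
    """Best global alignment score by Needleman-Wunsch dynamic programming."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    return score[-1][-1]


print(global_alignment_score("ACTGAT", "TCTATC"))
```

Recovering the alignment itself, rather than only its score, would additionally require a traceback through the table.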

• Using MEGA to Calculate Mutation Distance

Outgroup to root a phylogenetic tree

• The tree of human, chimpanzee, gorilla and orangutan genes is rooted with a baboon gene because we know from the fossil record that the common ancestor of the four species split away from the baboon lineage earlier in geological time.

• Let's see members of this group

[Figure: tree of Chimp, Human, Gorilla and Orangutan with Baboon labelled as the outgroup; scale bar 0.02]

[Figure: tree of Chimp, Human, Gorilla and Orangutan; scale bar 0.01]

Kiwi, Ostrich, Swan, Ring-necked Pheasant, Silver Pheasant, Song Sparrow, Parrot, Lizard

[Figure: tree with tips labelled Outgroup, Kiwi, Struthio camelus, Swan, Song sparrow, Ring-necked pheasant, Silver pheasant and Parrot]

The design of the phylogenetic TREE does not change the evolutionary distance among the various taxa represented.

Types of Trees

• Rooted tree: shows the common ancestor (the root node).

• Unrooted tree: represents the same phylogeny without the root node.

Fig. 4.6. Phylogenetic tree of dopamine receptor sequences.

Gene trees are not the same as species trees

Examples of what can be inferred from phylogenetic trees (DNA, protein)

1. Which species are the closest living relatives of modern humans?

2. Did the infamous Florida Dentist infect his patients with HIV?

3. What is the relation between HIV and SIV?

1. Relatives of modern humans?

[Figure: two trees of Humans, Chimpanzees, Bonobos, Gorillas and Orangutans on a time axis in MYA. Panel titles: "The pre-molecular view" and "Mitochondrial DNA, most nuclear DNA-encoded genes, and DNA/DNA hybridization"]

2. Did the Florida Dentist infect his patients with HIV?

Phylogenetic tree of HIV sequences from the DENTIST, his patients, and local HIV-infected people. From Ou et al. (1992) and Page & Holmes (1998).

[Figure: tips labelled DENTIST (twice), Patients A to G (Patient A twice), and Local controls 2, 3 (twice), 9 and 35. Annotations: "Yes" on one group of patients and "No" on two others]

• Yes: The HIV sequences from these patients fall within the clade of HIV sequences found in the dentist.

3. Relating Human HIV to Simian SIV retroviruses

• Human immunodeficiency virus 1 (HIV-1) is pathogenic.

• SIVs are not pathogenic in their normal hosts.

IMAGE FROM: Medical Art Service, Munich / Wellcome Images.

The structure of HIV

• CD4 proteins on surface

• Phospholipid membrane

• Matrix

• Viral RNA

• Viral enzymes: reverse transcriptase, integrase, protease

• Capsid

HIV's replication cycle

• HIV attaches to CD4 receptors on T-cell

• Viral core of enzymes and RNA injected into cell

• DNA transcribed from viral RNA

• Double-stranded DNA produced

• DNA integrates with host chromosome (viral integrase)

• Transcription: viral RNA, viral proteins

• New virus assembled

• Viral protease cuts up proteins

• New virus leaves cell

Retrovirus genomes accumulate mutations relatively quickly

• Reverse transcriptase lacks an efficient proofreading activity, so it makes errors when it carries out RNA-dependent DNA synthesis.

• The molecular clock therefore runs rapidly in retroviruses.

• Genomes that diverged quite recently display sufficient nucleotide dissimilarity for a phylogenetic analysis to be carried out.

• Although they diverged less than 100 years ago, HIV and SIV genomes contain sufficient data.

RT-PCR

Reverse transcription polymerase chain reaction (RT-PCR) is a variant of polymerase chain reaction (PCR). It is a laboratory technique commonly used in molecular biology in which an RNA strand is reverse transcribed into its DNA complement (complementary DNA, or cDNA) using the enzyme reverse transcriptase, and the resulting cDNA is amplified using PCR.

• This tree has a number of interesting features. First, it shows that different samples of HIV-1 have slightly different sequences, the samples as a whole forming a tight cluster, almost a star-like pattern, that radiates from one end of the unrooted tree.

• This star-like topology implies that the global AIDS epidemic began with a very small number of viruses, perhaps just one, which have spread and diversified since entering the human population.

• The closest relative to HIV-1 among primates is the SIV of chimpanzees, the implication being that this virus jumped across the species barrier between chimps and humans and initiated the AIDS epidemic.

• However, this epidemic did not begin immediately: a relatively long uninterrupted branch links the center of the HIV-1 radiation with the internal node leading to the relevant SIV sequence, suggesting that after transmission to humans, HIV-1 underwent a latent period when it remained restricted to a small part of the global human population, presumably in Africa, before beginning its rapid spread to other parts of the world.

• Other primate SIVs are less closely related to HIV-1, but one, the SIV from sooty mangabey, clusters in the tree with the second human immunodeficiency virus, HIV-2.

• It appears that HIV-2 was transferred to the human population independently of HIV-1, and from a different simian host. HIV-2 is also able to cause AIDS, but has not, as yet, become globally epidemic.
