Multiple alignment using hidden Markov models November 21, 2001 Kim Hye Jin Intelligent Multimedia...


Multiple alignment using hidden Markov models

November 21, 2001

Kim Hye Jin

Intelligent Multimedia Lab

marisan@postech.ac.kr

Outline

• Introduction

• Methods and algorithm

• Results

• Discussion

IM lab

Introduction

• Why HMMs?

– A mathematically consistent description of insertions and deletions

– Theoretical insight into the difficulties of combining disparate forms of information (e.g., sequences and 3D structures)

– Models can be trained from initially unaligned sequences


Methods and algorithms

• State transition

– The state sequence is a first-order Markov chain

– Each state is hidden

– Match, insert, and delete states

• Symbol emission
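The generative view above can be sketched in a few lines of Python. The hidden states form a first-order Markov chain, and each visited state emits one symbol. This is a minimal illustration, not the model from the slides. It assumes a hypothetical two-state model over the alphabet {A, C} with an explicit start distribution. It also has no silent delete states, unlike the match/insert/delete architecture described above. `STATES`, `START`, `TRANS`, `EMIT`, `draw` and `sample` are illustrative names.

```python
import random

# Hypothetical toy model: two emitting states, two-letter alphabet,
# no silent (delete) states.
STATES = ["S1", "S2"]
START = {"S1": 0.6, "S2": 0.4}                 # assumed initial state probabilities
TRANS = {"S1": {"S1": 0.7, "S2": 0.3},        # aij: P(next = Sj | current = Si)
         "S2": {"S1": 0.4, "S2": 0.6}}
EMIT = {"S1": {"A": 0.9, "C": 0.1},           # bj(x): P(symbol x | state Sj)
        "S2": {"A": 0.2, "C": 0.8}}

def draw(dist, rng):
    """Pick one key from a {outcome: probability} dict."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]

def sample(length, seed=0):
    """Generate (hidden state path, observed symbol string) of the given length."""
    rng = random.Random(seed)
    state = draw(START, rng)
    path, symbols = [], []
    for _ in range(length):
        path.append(state)
        symbols.append(draw(EMIT[state], rng))   # symbol emission
        # First-order Markov property: the next state depends only on this one.
        state = draw(TRANS[state], rng)
    return path, "".join(symbols)

path, seq = sample(10)
print(seq)   # only the symbols are observed; `path` stays hidden
```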

[Figure: HMM state transitions and symbol emission]

[Figure: deletion, match, and insertion states]

• Arbitrary scores are replaced with probabilities relative to a consensus

• A model M consists of N states S1 … SN

• An observed sequence O consists of T symbols O1 … OT drawn from an alphabet of symbols x

• aij: the probability of a transition from state Si to state Sj

• bj(x): the probability of emitting symbol x from state Sj
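With the notation above, the probability of a sequence O1 … OT together with one particular state path is a product of transition terms aij and emission terms bj(x). The sketch below computes this product in log space for a hypothetical two-state model. The initial distribution `pi` is an assumption, because the slide defines only aij and bj(x). Real profile HMMs usually use a begin state for this purpose.

```python
import math

# Hypothetical two-state model in the slide's notation:
# a[i][j] = probability of a transition Si -> Sj, b[j][x] = emission probability.
pi = [0.6, 0.4]                       # assumed initial-state distribution
a = [[0.7, 0.3],
     [0.4, 0.6]]
b = [{"A": 0.9, "C": 0.1},
     {"A": 0.2, "C": 0.8}]

def path_log_prob(obs, path):
    """log P(O, path | M) for symbols O1..OT and state indices q1..qT."""
    logp = math.log(pi[path[0]]) + math.log(b[path[0]][obs[0]])
    for t in range(1, len(obs)):
        logp += math.log(a[path[t - 1]][path[t]])   # transition q(t-1) -> q(t)
        logp += math.log(b[path[t]][obs[t]])        # emission of Ot from q(t)
    return logp

print(math.exp(path_log_prob("AC", [0, 1])))  # approximately 0.6*0.9 * 0.3*0.8 = 0.1296
```

Summing in log space avoids numerical underflow on long sequences, where the raw product of many small probabilities would round to zero.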


• Model of an HMM: example with the sequence ACCY


• Forward algorithm

– Takes a sum over all paths rather than a maximum
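The forward recursion replaces the maximum over predecessor states with a sum. As a result, it yields P(O | M), summed over every state path. The sketch below assumes a hypothetical two-state model with an assumed start distribution `pi` and no silent delete states. It checks the result against brute-force enumeration of all paths, which is exponential and useful only for tiny examples. The tables are unscaled, so real implementations would add scaling or work in log space for long sequences.

```python
import itertools

# Hypothetical two-state model (no silent delete states).
pi = [0.6, 0.4]                       # assumed initial-state distribution
a = [[0.7, 0.3], [0.4, 0.6]]          # aij
b = [{"A": 0.9, "C": 0.1},            # bj(x)
     {"A": 0.2, "C": 0.8}]

def forward(obs):
    """P(O | M): sum over all state paths, computed by dynamic programming."""
    n = len(pi)
    # alpha[j] = P(O1..Ot, q_t = Sj)
    alpha = [pi[j] * b[j][obs[0]] for j in range(n)]
    for x in obs[1:]:
        # Sum (not max) over the predecessor state i.
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][x]
                 for j in range(n)]
    return sum(alpha)

def brute_force(obs):
    """Same quantity by enumerating every state path (for checking only)."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * b[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= a[path[t - 1]][path[t]] * b[path[t]][obs[t]]
        total += p
    return total

print(forward("ACCA"), brute_force("ACCA"))  # the two values agree up to rounding
```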


• Viterbi algorithm

– Finds the most likely path through the model

– The path is recovered by following the back pointers
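The Viterbi algorithm has the same structure as the forward recursion. The difference is that it keeps the maximum over predecessor states and records which predecessor achieved it. Following these back pointers from the best final state recovers the most likely path. The sketch below uses a hypothetical two-state model with an assumed start distribution and no silent delete states. It works in log space to avoid underflow.

```python
import math

# Hypothetical two-state model (no silent delete states).
pi = [0.6, 0.4]                       # assumed initial-state distribution
a = [[0.7, 0.3], [0.4, 0.6]]          # aij
b = [{"A": 0.9, "C": 0.1},            # bj(x)
     {"A": 0.2, "C": 0.8}]

def viterbi(obs):
    """Most likely state path and its log probability."""
    n = len(pi)
    delta = [math.log(pi[j]) + math.log(b[j][obs[0]]) for j in range(n)]
    back = []                          # back[t][j] = best predecessor of state j
    for x in obs[1:]:
        ptr, new = [], []
        for j in range(n):
            # Maximum (not sum, as in the forward algorithm) over predecessors.
            i_best = max(range(n), key=lambda i: delta[i] + math.log(a[i][j]))
            ptr.append(i_best)
            new.append(delta[i_best] + math.log(a[i_best][j]) + math.log(b[j][x]))
        back.append(ptr)
        delta = new
    # Start from the best final state and follow the back pointers.
    state = max(range(n), key=lambda j: delta[j])
    best = delta[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1], best

print(viterbi("AACCC"))   # path [0, 0, 1, 1, 1] for this toy model
```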


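• Baum-Welch algorithm

– A variation of the forward algorithm

– Starts from a reasonable guess for the initial model, then calculates a score for each sequence in the training set and re-estimates the model using the EM algorithm

• Local optima problem

– The forward and Viterbi algorithms compute exact results for a given model

– The Baum-Welch algorithm is only guaranteed to converge to a local optimum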
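A single Baum-Welch (EM) iteration can be sketched with the textbook forward-backward formulation. In the E-step, the forward and backward tables give the expected number of times each transition and emission is used for every training sequence. In the M-step, those expected counts are renormalized into new probabilities. This is a minimal sketch under several assumptions. It uses a plain fully connected HMM with a start distribution, with no profile match/insert/delete architecture, no pseudocounts or priors, and unscaled tables that only suit short toy sequences. The training set and initial guess are hypothetical. EM never decreases the training log-likelihood, but it can stop at a local optimum, as the slide notes.

```python
import math

def forward_backward(obs, pi, a, b):
    """Unscaled forward (alpha) and backward (beta) tables plus P(O | model)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for j in range(n):
        alpha[0][j] = pi[j] * b[j][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * a[i][j] for i in range(n)) * b[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    return alpha, beta, sum(alpha[T - 1])

def baum_welch_step(seqs, pi, a, b, alphabet):
    """One EM iteration. Returns updated (pi, a, b) and the log-likelihood
    of the training set under the *input* model."""
    n = len(pi)
    pi_c = [0.0] * n
    a_c = [[0.0] * n for _ in range(n)]
    b_c = [{x: 0.0 for x in alphabet} for _ in range(n)]
    loglik = 0.0
    for obs in seqs:
        alpha, beta, p = forward_backward(obs, pi, a, b)
        loglik += math.log(p)                        # score of this sequence
        for t, x in enumerate(obs):
            for j in range(n):
                gamma = alpha[t][j] * beta[t][j] / p  # P(q_t = Sj | O)
                b_c[j][x] += gamma
                if t == 0:
                    pi_c[j] += gamma
        for t in range(len(obs) - 1):
            for i in range(n):
                for j in range(n):
                    # Expected use of transition Si -> Sj between t and t+1.
                    a_c[i][j] += alpha[t][i] * a[i][j] * b[j][obs[t + 1]] * beta[t + 1][j] / p
    # M-step: renormalize expected counts into probabilities.
    new_pi = [c / sum(pi_c) for c in pi_c]
    new_a = [[c / sum(row) for c in row] for row in a_c]
    new_b = [{x: c / sum(row.values()) for x, c in row.items()} for row in b_c]
    return new_pi, new_a, new_b, loglik

# Hypothetical training set and a deliberately asymmetric initial guess.
seqs = ["AACCC", "ACCCA", "AAACC", "CCAAC"]
pi = [0.5, 0.5]
a = [[0.6, 0.4], [0.3, 0.7]]
b = [{"A": 0.6, "C": 0.4}, {"A": 0.4, "C": 0.6}]
history = []
for _ in range(20):
    pi, a, b, ll = baum_welch_step(seqs, pi, a, b, "AC")
    history.append(ll)
print(history[0], history[-1])   # the log-likelihood does not decrease
```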


• Simulated annealing

– Helps training escape suboptimal local optima

– kT = 0: the standard Viterbi training procedure

– kT is lowered gradually during training
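One way to realize the temperature idea in the slide is to replace the deterministic Viterbi traceback with a stochastic one. In this variant, a state path is sampled with probability proportional to P(O, path)^(1/kT). At kT = 1 this samples paths from the posterior, and as kT approaches 0 the sample converges to the Viterbi path. The sketch shows only this sampling step, not the surrounding training loop or any cooling schedule. The details of the original method may differ, and the two-state model, its start distribution `pi`, and the helper names are hypothetical.

```python
import math
import random

# Hypothetical two-state model (no silent delete states).
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [{"A": 0.9, "C": 0.1}, {"A": 0.2, "C": 0.8}]

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def sample_index(logw, rng):
    """Draw index i with probability proportional to exp(logw[i])."""
    m = max(logw)
    return rng.choices(range(len(logw)), weights=[math.exp(w - m) for w in logw])[0]

def annealed_path(obs, kT, rng):
    """Sample a state path with probability proportional to P(O, path)^(1/kT).
    Requires kT > 0; the limit kT -> 0 corresponds to plain Viterbi."""
    n, inv = len(pi), 1.0 / kT
    # Tempered forward pass in log space: every log probability is scaled by 1/kT.
    f = [[inv * (math.log(pi[j]) + math.log(b[j][obs[0]])) for j in range(n)]]
    for x in obs[1:]:
        prev = f[-1]
        f.append([logsumexp([prev[i] + inv * math.log(a[i][j]) for i in range(n)])
                  + inv * math.log(b[j][x]) for j in range(n)])
    # Stochastic traceback replaces Viterbi's deterministic back pointers.
    state = sample_index(f[-1], rng)
    path = [state]
    for t in range(len(obs) - 2, -1, -1):
        state = sample_index([f[t][i] + inv * math.log(a[i][state]) for i in range(n)], rng)
        path.append(state)
    return path[::-1]

rng = random.Random(0)
for kT in (2.0, 1.0, 0.5, 0.1, 0.01):
    print(kT, annealed_path("AACCC", kT, rng))
```

At high kT the sampled paths vary from draw to draw. At kT = 0.01, for this toy model, they settle on the Viterbi path [0, 0, 1, 1, 1].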


ClustalW


ClustalX

Results


• len: the consensus length of the alignment

• ali: the number of structurally aligned sequences

• %id: the percentage sequence identity of the structurally aligned sequences

• Homo: the number of homologues identified in and extracted from SwissProt 30

• %id: the average percentage sequence identity in the set of homologues
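The %id columns report percentage sequence identity, and conventions for that measure differ. The sketch below assumes one common definition: identical residues divided by the number of columns where neither sequence has a gap. It averages this value over all pairs in an alignment. Other definitions divide by the alignment length or the shorter sequence length, and the slides do not say which one was used. The alignment `aln` and the function names are hypothetical.

```python
from itertools import combinations

def percent_identity(s1, s2, gap="-"):
    """Identical residues / columns where neither sequence is gapped, x100.
    (Assumed definition; other conventions use a different denominator.)"""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(s1, s2) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)

def mean_pairwise_identity(aligned):
    """Average percent identity over all pairs of aligned sequences."""
    scores = [percent_identity(s, t) for s, t in combinations(aligned, 2)]
    return sum(scores) / len(scores)

aln = ["ACCY-", "ACGYA", "A-CYA"]
print(mean_pairwise_identity(aln))   # (75 + 100 + 75) / 3, about 83.3
```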


Discussion


• HMM

– A consistent theory for insertion and deletion penalties

– EGF: fairly difficult alignments are handled well

• ClustalW

– Progressive alignment

– Disparities between the sequence identity of the structures and the sequence identity of the homologues

– Alignment score correlates poorly with alignment quality


• The ability of HMMs to perform sensitive fold recognition is apparent
