Date post: | 27-Mar-2015 |
Upload: | paige-johnston |
Speeding Up Database Searches and Sequence Alignments with Multi-Motif PHI-BLAST
Nitin Bhardwaj,
Dept. of Chemical Engineering,
IIT Bombay.
National Center for Biological Sciences (NCBS),
Bangalore
A unit of Tata Institute of Fundamental Research (TIFR)
What is Sequence Alignment?
The process of lining up two or more sequences to achieve maximal levels of similarity.
Why make Sequence Alignments?
To detect:
Structural & Functional Relationship
Evolutionary Relationship
Some Basic Terms
Global Alignment: Entire sequence
Local Alignment: Restricted to regions of identity and strong similarity
Query Sequence: The sequence of interest
Subject Sequence: The other one
And….
Scoring Matrices: to score a match/mismatch
True Positive
True Negative
False Positive
False Negative
Motif: A short conserved region of a sequence
Hits: Sequences picked up from the database
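The four outcome terms above can be made concrete by comparing the hits of a search against a set of sequences known to be related to the query. The following is a minimal sketch; the function name, the identifiers and the toy data are all hypothetical and only illustrate the definitions.

```python
def classify_hits(hits, family, database_ids):
    """Count search outcomes.

    hits         -- ids reported by the search
    family       -- ids known to be truly related to the query (assumed known)
    database_ids -- every id in the searched database
    """
    hits = set(hits)
    family = set(family)
    database_ids = set(database_ids)
    return {
        "TP": len(hits & family),                  # reported and related
        "FP": len(hits - family),                  # reported but unrelated
        "FN": len(family - hits),                  # related but missed
        "TN": len(database_ids - hits - family),   # unrelated and not reported
    }


# Toy example with made-up ids
counts = classify_hits(
    hits=["a", "b", "x"],
    family={"a", "b", "c"},
    database_ids={"a", "b", "c", "x", "y"},
)
print(counts)
```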
What comes after alignment?
Calculate the score of the alignment
Sort the aligned sequences in the order of their decreasing scores
Go ahead with your analysis to find out the relationships/similarities
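The scoring and sorting steps above can be sketched as follows. The scoring scheme here (+2 for a match, -1 for a mismatch, -2 for a gap) is an assumption made for brevity; a real search would use a scoring matrix such as those mentioned earlier. The function names are hypothetical.

```python
def alignment_score(aligned_a, aligned_b, match=2, mismatch=-1, gap=-2):
    """Score two already-aligned strings of equal length ('-' marks a gap).

    The match/mismatch/gap values are illustrative assumptions,
    not a published scoring matrix.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def rank_hits(alignments):
    """alignments: list of (name, aligned_query, aligned_subject).

    Returns (name, score) pairs sorted by decreasing score.
    """
    scored = [(name, alignment_score(q, s)) for name, q, s in alignments]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy alignments (made up)
ranked = rank_hits([
    ("s1", "AC-E", "ACDE"),
    ("s2", "ACDE", "ACDE"),
    ("s3", "ACDE", "AGDE"),
])
print(ranked)
```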
Pattern-Hit Initiated Basic Local Alignment Search Tool (PHI-BLAST)
Takes a query sequence, a motif, and a database to search
Aligns the query sequence with all the database sequences that contain the motif
Computes a score for each sequence
Reports all the sequences scoring above a given threshold, sorted by score
A Typical PHI-BLAST Output
1 occurrence(s) of pattern in query
pattern [RA][C][ACDEFGHIKLMNPQRSTVWY][C]
at position 3 of query sequence
Significant matches for pattern occurrence 1 at position 3
Score (bits)   E Value
pdb|1ILP|A Chain A, Cxcr-1 N-Terminal Peptide Bound To Interleuk... 128 2e-37
pdb|1QE6|D Chain D, Interleukin-8 With An Added Disulfide Betwee... 121 2e-35
pdb|1ICW|A Chain A, Interleukin-8, Mutant With Glu 38 Replaced B... 121 3e-35
pdb|1ROD|A Chain A, Chimeric Protein Of Interleukin 8 And Human ... 98 2e-28
pdb|1TVX|B Chain B, Neutrophil Activating Peptide-2 Variant Form... 50 6e-14
pdb|1NAP|A Chain A, Mol_id: 1; Molecule: Neutrophil Activating P... 50 6e-14
pdb|1MSG|A Chain A, Human Melanoma Growth Stimulatory Activity (... 48 3e-13
pdb|1MGS|A Chain A, Human Melanoma Growth Stimulating Activity (... 48 3e-13
pdb|1QNK|A Chain A, Truncated Human Grob[5-73], Nmr, 20 Structur... 47 5e-13
pdb|1MI2|A Chain A, Solution Structure Of Murine Macrophage Infl... 46 1e-12
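The pattern line in the output above is written in a bracket form that also happens to be a valid regular expression, so locating it in a sequence can be illustrated with Python's `re` module. This is only a sketch of the pattern-location step, not of the tool itself; the query string and the two-entry database below are made up, and the function names are hypothetical.

```python
import re

# Pattern copied from the output above; each bracket lists the residues allowed at that position.
PATTERN = "[RA][C][ACDEFGHIKLMNPQRSTVWY][C]"


def find_pattern(sequence, pattern=PATTERN):
    """Return (1-based position, matched text) for every occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    lookahead = re.compile("(?=(" + pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(sequence)]


def sequences_with_pattern(database, pattern=PATTERN):
    """Names of database sequences containing the pattern at least once."""
    return [name for name, seq in database.items() if find_pattern(seq, pattern)]


toy_query = "ELRCQCIKTY"                      # hypothetical query
print(find_pattern(toy_query))               # [(3, 'RCQC')]

toy_db = {"s1": "MKACDCA", "s2": "MKLLV"}    # hypothetical database
print(sequences_with_pattern(toy_db))        # ['s1']
```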
Strategy behind PHI-BLAST
Location of motifs in the seqs
Motif (Query)
Motif (Subject)
Extension in both directions with local alignment
Calculate the score for the alignment
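The strategy above (anchor on the motif, then extend in both directions) can be sketched with a simple ungapped extension that stops once the running score falls a fixed amount below the best score seen. This stopping rule, the absence of gaps and the +2/-1 scoring are simplifying assumptions chosen to keep the example short; the slide only says "local alignment", and all names and sequences below are hypothetical.

```python
def score_pair(a, b, match=2, mismatch=-1):
    """Illustrative residue score (an assumption, not a real scoring matrix)."""
    return match if a == b else mismatch


def extend(query, subject, qi, si, step, xdrop=3):
    """Extend without gaps from (qi, si); step is +1 (right) or -1 (left).

    Returns (best score, number of residue pairs kept to reach it).
    """
    best = score = 0
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += score_pair(query[qi], subject[si])
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= xdrop:
            break  # score has dropped too far below the best: stop extending
        qi += step
        si += step
    return best, best_len


def seed_and_extend(query, subject, q_start, s_start, motif_len):
    """Score the motif region, then extend left and right from it.

    q_start and s_start are 0-based motif starts in query and subject.
    Returns (total score, (query start, query end)) with an exclusive end.
    """
    core = sum(score_pair(query[q_start + k], subject[s_start + k])
               for k in range(motif_len))
    left, left_len = extend(query, subject, q_start - 1, s_start - 1, -1)
    right, right_len = extend(query, subject,
                              q_start + motif_len, s_start + motif_len, +1)
    return core + left + right, (q_start - left_len, q_start + motif_len + right_len)


# Toy sequences (made up); the 4-residue motif starts at index 2 in both
total, (q_from, q_to) = seed_and_extend("ELRCQCIKTY", "GLRCECIKAW", 2, 2, 4)
print(total, q_from, q_to)
```

Anchoring on the motif means the extension starts from a position already known to be shared, which is why only sequences containing the motif need to be aligned at all.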
Problems with PHI-BLAST
Only one motif per run, so several runs are needed for several motifs, which increases the time
Consequently, no way to attach a weight to any motif
No parallel comparison possible
No control over the specificity of the program
The Solution(s) !!!
MULTI – MOTIF PHI-BLAST (MMPB)
RANKED MOTIF PHI-BLAST (RMPB)
Multi-Motif PHI-BLAST
Takes a query sequence, any number of motifs, and a database to search
Aligns the query sequence with all the database sequences that contain at least a minimum number of the motifs
Computes a score for each sequence
Reports all the sequences scoring above a given threshold, sorted by score
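The selection step described above, keeping only database sequences that contain at least a minimum number of the input motifs, might look like the sketch below. Motifs are written as regular expressions for convenience; the motifs, sequences and function names are hypothetical.

```python
import re


def motifs_present(sequence, motifs):
    """Return the motifs (regular expressions) found in the sequence."""
    return [m for m in motifs if re.search(m, sequence)]


def select_candidates(database, motifs, min_motifs):
    """Names of sequences containing at least min_motifs of the motifs."""
    return [name for name, seq in database.items()
            if len(motifs_present(seq, motifs)) >= min_motifs]


# Toy data (made up)
motifs = ["[RA]C.C", "G.HC", "W[VI]Q"]
database = {
    "s1": "ELRCQCGPHCWVQ",   # contains all three motifs
    "s2": "AARCQCKKKK",      # contains only the first
    "s3": "MMGPHCLLWIQ",     # contains the second and third
}
print(select_candidates(database, motifs, min_motifs=2))   # ['s1', 's3']
```

Raising `min_motifs` makes the search more specific and lowering it makes it more permissive, which is one way to read the "control over specificity" that single-motif searching lacks.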
Strategy behind MMPB
Location of motifs in the two seqs
Extension in both directions with local alignment and the part in between with global alignment
Calculate the score for the alignment
[Diagram: Query and Subject sequences, each with Motif 1 and Motif 2 marked; segments labelled (Local) and (Global)]
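For the part between two motifs, the strategy above calls for a global alignment, since both ends of that segment are pinned by the motifs. A minimal sketch of that piece is a Needleman-Wunsch score over the two in-between segments. The +2/-1/-2 scoring, the linear gap penalty, the function names and the toy sequences are all assumptions for illustration; the flanks outside the motifs would be handled by local extension as stated on the slide.

```python
def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Only the score is computed (two rows of the table are kept).
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[len(b)]


def between_motifs_score(query, subject, q_m1, q_m2, s_m1, s_m2):
    """Globally align the segments lying between motif 1 and motif 2.

    Each motif is given as (start, length), 0-based, with motif 1 before motif 2.
    """
    q_mid = query[q_m1[0] + q_m1[1]: q_m2[0]]
    s_mid = subject[s_m1[0] + s_m1[1]: s_m2[0]]
    return global_score(q_mid, s_mid)


# Toy sequences (made up): the in-between segments are "IKTY" and "IKY"
score = between_motifs_score("RCQCIKTYGPHC", "RCECIKYGAHC",
                             (0, 4), (8, 4), (0, 4), (7, 4))
print(score)
```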
Comparison of Results
[Figure: bar charts of True Positives and False Positives for each search method, one panel per query. Panels and the methods compared:]
il8 Macrophage Inflammatory 1beta: PHI-BLAST (e=10), PHI-BLAST (e=1) in the middle columns, MMPB in the last columns
il8 (1ikl) Interleukin-8: PHI-BLAST (e=10), MMPB
4helud (1bbh) Cytochrome c′ (c prime): PHI-BLAST (e=10), MMPB
4helud (256b) Cytochrome b562: PHI-BLAST (e=100), MMPB
Flav (1ord) Ornithine Decarboxylase: PHI-BLAST (e=10), MMPB
Flav (1cus) Cutinase: PHI-BLAST (e=10), MMPB
Ranked Motif PHI-BLAST
Takes a query sequence, a number of motifs in the order of their ranks, and a database to search
Aligns the query sequence with all the database sequences that contain at least a minimum number of the highest-ranked motifs
Computes a score for each sequence
Reports all the sequences scoring above a given threshold, sorted by score
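One possible reading of the ranked selection above is that a sequence qualifies only if it contains the top-ranked motifs, up to the chosen minimum number, rather than any motifs from the list. The sketch below follows that reading, which is an assumption; the motifs, sequences and function names are hypothetical.

```python
import re


def has_top_ranked(sequence, ranked_motifs, min_top):
    """True if the sequence contains each of the min_top highest-ranked motifs.

    ranked_motifs is ordered from highest rank to lowest.
    """
    return all(re.search(m, sequence) for m in ranked_motifs[:min_top])


def select_ranked(database, ranked_motifs, min_top):
    """Names of sequences that contain the min_top highest-ranked motifs."""
    return [name for name, seq in database.items()
            if has_top_ranked(seq, ranked_motifs, min_top)]


# Toy data (made up); motifs listed from highest rank to lowest
ranked_motifs = ["[RA]C.C", "G.HC", "W[VI]Q"]
database = {
    "s1": "ELRCQCGPHCWVQ",   # contains all three motifs
    "s2": "AARCQCKKKK",      # contains only the top-ranked motif
    "s3": "MMGPHCLLWIQ",     # contains the two lower-ranked motifs only
}
print(select_ranked(database, ranked_motifs, min_top=2))   # ['s1']
```

Under this reading, a sequence such as `s3` that holds two motifs but lacks the top-ranked one is rejected, which is how the ranks act as weights.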
Comparison of Results
[Figure: bar charts of True Positives and False Positives, one panel per query; the marked columns are MMPB with at least 3 and MMPB with at least 2, and the unmarked columns correspond to RMPB with at least 3 and 2. Panels:]
il8 (1hum) Macrophage Inflammatory 1beta
il8 (1ikl) Interleukin-8
The problems are solved!
Multiple motifs can be given as input
Weights can be attached to the motifs via their ranks
Only one run is required for any number of motifs, so less time is needed
A deeper analysis possible
That’s All &
Thanks to All of You