
BioInfoSummer07 ICE-EM Summer Symposium in BioInformatics

Page 1

CLePAPS : Fast Pair Alignment of Protein Structures

Based on Conformational Letters

BioInfoSummer07 ICE-EM Summer Symposium in BioInformatics, Dec 10-14, Australian National University, Canberra, Australia

Sheng WANG, Wei-Mou ZHENG*

Institute of Theoretical Physics, CAS

[email protected] *To whom correspondence should be addressed

Page 2

Outline

• [1] Introduction

• [2] The flow chart of CLePAPS Algorithm

[2-1] Find SFPs by CLeSUM

[2-2] Construct ‘Star-Tree’

[2-3] The ‘Zoom-In’ Strategy

• [3] Result & Discussion

Page 3

Structure alignment: a self-consistent problem linking the correspondence and the rigid transformation

Given the correspondence, the optimal rigid transformation can be computed; given the transformation, the correspondence can be found. However, when aligning two protein structures, at the beginning we know neither the transformation nor the correspondence.

Existing methods: DALI, CE, VAST, STRUCTAL, ProSup

CLePAPS: Conformational Letters based Pairwise Alignment of Protein Structures

Initialization + iteration
• Similar Fragment Pairs (SFPs)
• Anchor-based
• Alignment = As many consistent SFPs as possible

Chapter[1]: Introduction

Page 4

Anchor-based superposition

[Figure: SFPs, one of which is the anchor SFP; the other SFPs are marked as consistent or inconsistent with the anchor]

Alignment = Collect as many consistent SFPs as possible

Page 5

Structure Alignment => a self-consistent problem

[Figure: aligning Protein A with Protein B as an iterative loop]
1. Initial correspondence (anchor SFP)
2. Optimal transformation for the correspondence
3. Correspondence update (adding consistent SFPs)
4. Convergence? Yes: end. No: return to step 2.
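The "optimal transformation for the correspondence" step of this loop has a closed-form least-squares solution. The sketch below uses the Kabsch (SVD) method. The slides do not say which superposition routine CLePAPS uses, so the method and the helper names `superpose` and `rmsd` are our own assumptions.

```python
from typing import Tuple

import numpy as np


def superpose(P: np.ndarray, Q: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Rigid transformation (R, t) minimizing sum_i ||R p_i + t - q_i||^2.

    P, Q: (n, 3) arrays of matched C-alpha coordinates, i.e. the current
    correspondence (row k of P is aligned to row k of Q), with n >= 3.
    """
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)         # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # -1 signals a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation (det = +1)
    t = q_mean - R @ p_mean
    return R, t


def rmsd(P: np.ndarray, Q: np.ndarray, R: np.ndarray, t: np.ndarray) -> float:
    """Root-mean-square deviation of the matched pairs after applying (R, t)."""
    diff = P @ R.T + t - Q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

In the loop, `superpose` is applied to the residues of the current correspondence. The correspondence is then updated under the new (R, t), and the two steps repeat until the correspondence stops changing.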

Page 6

Four Main Problems

[1] How can we find SFPs as fast as possible?

[2] How can we balance the specificity and sensitivity of the found SFPs?

[3] How can we avoid a start that leads into a LOCAL TRAP?

[4] How can we speed up convergence without getting caught in a LOCAL TRAP?

Page 7

[Figure: an example of a LOCAL TRAP]

Page 8

Chapter[2]: The flow chart of CLePAPS Algorithm

[Figure: flow chart of CLePAPS]

Part_I: SFP. Find SFPs by CLeSUM and build two SFP lists: width 20 (specificity) and width 8 (sensitivity).

Part_II: ‘Star-Tree’. Initial correspondence: take the Top K SFPs of the width-20 list as anchor candidates and the Top J as neighbors, construct the star-tree, and select the optimal anchor SFP.

Part_III: ‘Zoom-In’. Correspondence update: d1, d2 and d3 blank-filling from the width-8 list produce the first, second and third updates and then the final alignment. This adds consistent SFPs without falling into a LOCAL TRAP and speeds up convergence.

Page 9

Chapter[2-1]: Find SFPs by CLeSUM

Part_I: SFP

Hint:
SFP (Similar Fragment Pair)
CLeSUM (Conformational Letter SUbstitution Matrix)

Page 10

The main difference between CLePAPS and other existing structure-alignment algorithms is its use of conformational letters. Conformational letters = discretized states of 3D segmental conformations. A letter = a cluster of combinations of the three angles formed by the Cα pseudobonds of four contiguous residues (obtained by clustering according to the probability distribution).

Fig.1 Centers of 17 conformational letters
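A letter is defined on the three angles of four contiguous Cα atoms. The minimal sketch below computes them. It assumes θ and θ′ are the pseudobond angles at the second and third Cα, and that τ is the signed dihedral about the middle pseudobond in the usual IUPAC sign convention. The slides do not state the exact conventions (for example, whether bending angles are measured as supplements), and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bend_angle(a: Vec, b: Vec, c: Vec) -> float:
    """Pseudobond angle at b (degrees) between b->a and b->c."""
    u, v = _sub(a, b), _sub(c, b)
    cos_ang = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_ang))))


def torsion_angle(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral (degrees) of a-b-c-d about the b-c pseudobond."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def letter_angles(ca: Sequence[Vec]) -> Tuple[float, float, float]:
    """(theta, theta_prime, tau) for four consecutive C-alpha positions."""
    a, b, c, d = ca
    return bend_angle(a, b, c), bend_angle(b, c, d), torsion_angle(a, b, c, d)
```

Sliding this window along the chain, one residue at a time, gives one angle triple, and hence one letter, per four-residue segment.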


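Page 11

Similarity between conformational letters

CLeSUM: Conformational Letter SUbstitution Matrix

$M_{ij} = 20 \log_2 \dfrac{P_{ij}}{P_i P_j}$ (comparable to BLOSUM83; $H \approx 1.05$)

where $P_{ij}$ is the frequency with which letters $i$ and $j$ are aligned, and $P_i$, $P_j$ are the background frequencies of the two letters. The matrix was constructed using FSSP representatives.

[Figure: CLeSUM, with the letters of a typical helix and a typical sheet marked; the scores combine evolutionary and geometric information]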
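The formula above is a standard log-odds construction. The minimal sketch below builds such a matrix from aligned letter pairs. It assumes the pairs have already been collected from structure alignments of FSSP representatives, a step that is not shown. Counting each pair in both orders and omitting unobserved pairs are our own choices. Pseudocounts and integer rounding, as used for BLOSUM, are also left out. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def log_odds_matrix(pairs: Iterable[Pair], scale: float = 20.0) -> Dict[Pair, float]:
    """M[i, j] = scale * log2(P_ij / (P_i * P_j)) from observed aligned letter pairs.

    Every observed pair is counted in both orders, so the matrix is symmetric.
    Letter pairs never observed get no entry (their log-odds would be -infinity).
    """
    counts: Counter = Counter()
    for x, y in pairs:
        counts[(x, y)] += 1
        counts[(y, x)] += 1
    total = sum(counts.values())
    marginal: Counter = Counter()
    for (x, _), n in counts.items():
        marginal[x] += n                      # row sums give background frequencies
    return {
        (x, y): scale * math.log2((n / total) / ((marginal[x] / total) * (marginal[y] / total)))
        for (x, y), n in counts.items()
    }
```

With the factor 20, a pair seen twice as often as chance scores +20 and a pair seen half as often scores -20.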

Page 12

SFP => highly scored string pair

• Fast search for SFPs by string comparison

• CLeSUM similarity score ranks the importance of SFPs

Guided by CLeSUM scores, only the top few SFPs need to be examined to determine the superposition for alignment, and hence a reliable greedy strategy becomes possible.

[Figure: example of a similar seed fragment pair between Protein A and the smaller Protein B]

Page 13

An example of finding SFPs

>1molA
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR

>1cewI
RRCECECAJGBIHHHHHHHHIIHHHIIGPGBLDFFCPLDPLEEFEDPOLCEEEEEEDEFDEAGCAKLAJGKHHIIMNGKLQQQDEEEDEEEEEBPKKOGEEDPLEEER

Similar Fragment Pairs (SFPs) of width 8 (1molA fragment / 1cewI fragment):

HHHHHHHH / AJGKHHII

FEDECCGA / OLCEEEEE

FEDPLDEQ / EEDPLEEE

PLDDEEED / PLEEFEDP

CEDEEEEE / EEDEEEEE

Score rank (as labeled in the figure): 5 1 2 3 4

To find SFPs, we take the shorter letter sequence as the template and, for fragments of a given length, record every pair of positions whose score is higher than the threshold.

[Figure: the seed SFP used to align 1cewI with 1molA]
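The search just described is plain string comparison. The brute-force sketch below follows it: the shorter string is the template, every window pair is scored with a substitution function, and pairs above a threshold are kept, best first. `toy_score` is a hypothetical stand-in for a CLeSUM lookup, not the real matrix. The function names are also our own.

```python
from typing import Callable, List, Tuple

Hit = Tuple[float, int, int]  # (score, start in a, start in b)


def find_sfps(a: str, b: str, width: int, threshold: float,
              score: Callable[[str, str], float]) -> List[Hit]:
    """All fragment pairs of the given width scoring above threshold, best first.

    The shorter letter string is used as the template, but the returned
    positions always refer to (a, b). Brute force: O(len(a) * len(b) * width).
    """
    swapped = len(a) > len(b)
    tmpl, other = (b, a) if swapped else (a, b)
    hits: List[Hit] = []
    for i in range(len(tmpl) - width + 1):
        for j in range(len(other) - width + 1):
            s = sum(score(tmpl[i + k], other[j + k]) for k in range(width))
            if s > threshold:
                hits.append((s, j, i) if swapped else (s, i, j))
    hits.sort(key=lambda h: h[0], reverse=True)  # stable: ties keep scan order
    return hits


def toy_score(x: str, y: str) -> float:
    """Hypothetical stand-in for a CLeSUM lookup: +2 for identity, -1 otherwise."""
    return 2.0 if x == y else -1.0
```

Scores along one diagonal could also be accumulated with a running window sum, which removes the `width` factor. The nested loop is kept here for clarity.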

Page 14

Chapter[2-2]: Construct ‘Star-Tree’

SFP List (width 20) => We create a list of SFPs of length 20 and sort them by CLeSUM score

Top_K & Top_J (J > K) => We select only the Top_K SFPs of the list as anchor SFPs and check their consistency using the Top_J SFPs as neighbors
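The anchor selection can be sketched as follows. Each of the Top_K SFPs is tried as the anchor, and the Top_J SFPs consistent with it are counted. The anchor with the most support wins. In CLePAPS the consistency test is geometric (a neighbor must fit the superposition the anchor defines). Here it is left as a pluggable predicate, and the toy usage substitutes a hypothetical diagonal-offset rule. `select_anchor` is our own name.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # an SFP record of any shape


def select_anchor(sfps: Sequence[S], top_k: int, top_j: int,
                  consistent: Callable[[S, S], bool]) -> Tuple[int, List[int]]:
    """Try each of the Top_K SFPs as anchor; count the Top_J SFPs that support it.

    `sfps` must already be sorted by CLeSUM score (best first), with top_j > top_k.
    Returns (index of the optimal anchor, indices of the SFPs consistent with it,
    the anchor itself included).
    """
    if top_j <= top_k:
        raise ValueError("expected J > K")
    k, j = min(top_k, len(sfps)), min(top_j, len(sfps))
    best_anchor, best_support = -1, []
    for a in range(k):
        support = [n for n in range(j) if n == a or consistent(sfps[a], sfps[n])]
        # strict '>' keeps the higher-scoring anchor when support counts tie
        if len(support) > len(best_support):
            best_anchor, best_support = a, support
    return best_anchor, best_support


# Toy usage with K = 2, J = 5. Each SFP is represented only by a hypothetical
# diagonal offset j - i, and "consistent" means offsets differ by at most 2;
# this stand-in replaces the real geometric check.
offsets = [10, 40, 11, 9, 12]
print(select_anchor(offsets, 2, 5, lambda x, y: abs(x - y) <= 2))  # (0, [0, 2, 3, 4])
```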


Page 15

Selection of Optimal Anchor SFP

Example: Top K, K = 2; Top J, J = 5

[Figure: the top five SFPs, with Top_1 and Top_2 each tried as the anchor; # of consistent SFPs = 4 for Top_1 and 1 for Top_2]

Top_1 SFP is globally supported by three other SFPs, while Top_2 SFP is supported only by itself.

Page 16

An example of ‘Star-Tree’ construction

[Figure: ‘Star-Tree’ view for aligning 1cewI with 1molA. With the Top_1 SFP as anchor, # of consistent SFPs = 4; with the Top_2 SFP as anchor, # of consistent SFPs = 1]

Page 17

[Figure: the CLePAPS flow chart (Chapter[2]) repeated, now annotated with the ‘Star-Tree’ result Top 1 (4), Top 2 (1); next step: Part_III ‘Zoom-In’ correspondence update]

Page 18

Chapter[2-3]: The ‘Zoom-In’ Strategy

SFP List (width 8) => We create a list of SFPs of length 8 and sort them by CLeSUM score (descending order)

blank-filling => We add consistent SFPs one by one from the SFP List (width 8) to update the correspondence

Page 19

Example (Chapter[2-3]: The ‘Zoom-In’ Strategy)

[Figure: successive cutoffs d1 > d2 > d3, e.g. 8 Å, 6 Å, 5 Å, ...]

[1] The first transformation is determined by the optimal anchor SFP alone, so we use a large cutoff d1 to avoid a LOCAL TRAP.

[2] Later transformations are determined by a set of globally consistent SFPs, so we use lower cutoffs to add new consistent SFPs.
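One round of blank-filling per cutoff can be sketched as below. Each round superposes on the current correspondence, then greedily adds width-8 SFPs, best first, that fit within the cutoff. Several details are our own assumptions, not confirmed CLePAPS behavior:

* an SFP is accepted only if every residue pair lies within the cutoff;
* accepted SFPs may not reuse an aligned residue;
* the correspondence is rebuilt at each cutoff, so it can both elongate and shrink.

The Kabsch superposition and all function names are also assumptions. The default cutoffs are the 8, 6 and 5 Å shown on the slide.

```python
from typing import List, Sequence, Tuple

import numpy as np

Pair = Tuple[int, int]


def superpose(P: np.ndarray, Q: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Kabsch least-squares rotation R and translation t mapping P onto Q."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - p_mean).T @ (Q - q_mean))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q_mean - R @ p_mean


def blank_fill(ca_a, ca_b, R, t, sfps: Sequence[Pair], width: int,
               cutoff: float) -> List[Pair]:
    """Greedily add SFPs (best first) that fit under (R, t) within `cutoff`.

    An SFP (i, j) pairs residues i..i+width-1 of A with j..j+width-1 of B. It is
    accepted if every pair is closer than `cutoff` after superposition and none
    of its residues is already aligned.
    """
    moved = ca_a @ R.T + t                      # A expressed in B's frame
    used_a, used_b, pairs = set(), set(), []
    for i, j in sfps:
        idx_a, idx_b = range(i, i + width), range(j, j + width)
        if used_a.intersection(idx_a) or used_b.intersection(idx_b):
            continue                            # would overlap the current alignment
        dist = np.linalg.norm(moved[i:i + width] - ca_b[j:j + width], axis=1)
        if np.all(dist < cutoff):
            used_a.update(idx_a)
            used_b.update(idx_b)
            pairs.extend(zip(idx_a, idx_b))
    return sorted(pairs)


def zoom_in(ca_a, ca_b, anchor_pairs: Sequence[Pair], sfps: Sequence[Pair],
            width: int = 8, cutoffs=(8.0, 6.0, 5.0)) -> List[Pair]:
    """Three blank-filling rounds with shrinking cutoffs d1 > d2 > d3 (Angstrom)."""
    pairs = sorted(anchor_pairs)
    for cutoff in cutoffs:
        ia = [i for i, _ in pairs]
        ib = [j for _, j in pairs]
        R, t = superpose(ca_a[ia], ca_b[ib])    # transformation from current pairs
        updated = blank_fill(ca_a, ca_b, R, t, sfps, width, cutoff)
        if len(updated) >= 3:                   # keep old pairs if too few survive
            pairs = updated
    return pairs
```

The first round trusts only the anchor, so the loose d1 lets slightly misplaced but correct SFPs in. Later rounds superpose on many SFPs and can afford the tighter d2 and d3.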

Page 20

An example of the ‘Zoom-In’ strategy

[Figure: first, second and third updates with cutoffs d1 > d2 > d3 (8 Å, 6 Å, 5 Å, ...), with steps labeled ‘Elongation’ and ‘Shrink’, leading to the final alignment]

Page 21

[Figure: the complete CLePAPS flow chart (Chapter[2]) as a summary: Part_I SFP, Part_II ‘Star-Tree’ (Top 1 (4), Top 2 (1)), Part_III ‘Zoom-In’]

Page 22

Four Main Problems

[1] How can we find SFPs as fast as possible?

[2] How can we balance the specificity and sensitivity of the found SFPs?

[3] How can we avoid a start that leads into a LOCAL TRAP?

[4] How can we speed up convergence without getting caught in a LOCAL TRAP?

CLePAPS's Solution

[1] Fast search for SFPs by mere string comparison

[2] Width 20 for specificity and width 8 for sensitivity, both sorted by CLeSUM score

[3] Optimal anchor SFP selected through the ‘Star-Tree’

[4] Fast ‘Zoom-In’ strategy that converges within only three updates

Chapter[3]: Result & Conclusion

Page 23

• The Fischer benchmark test
• Database search with CLePAPS
• Multi-Solution of alignments: symmetry, domain move, repeats
• Non-topological alignment and domain shuffling

[pdb:1ihwA] [pdb:1ssoA]

Page 24

Multi-Solution[1]: Symmetry

[pdb:4fgf] [pdb:8i1b]

Red structure fixed

Solution [A] Solution [B] Solution [C]

[pdb:4fgf] [OGCCFEFAHOGEED] [OGDCEDFAIOGEED] [KGFCEDDAJOGCCC]

Page 25

Multi-Solution[2]: Domain Move

Blue structure fixed

[pdb:2gbp] [pdb:2liv]

Solution [A] Solution [B]

Domain_1

Domain_2

Page 26

Multi-Solution[3]: Repeats

Blue structure fixed

[pdb:4cpv] [pdb:1osa]

Solution [A] Solution [B]

Repeat_1

Repeat_2

Page 27

Conclusion

CLePAPS distinguishes itself from other existing algorithms for pairwise structure alignment in its use of conformational letters.

• Conformational letters: aptly balance precision with simplicity
• CLeSUM: a proper measure of similarity between states
• CLeSUM, extracted from the FSSP database, contains structure-database statistics, which reduces the chance of accidentally matching two unrelated helices (evolutionary + geometric = specificity gain). For example, two frequent helices are geometrically very similar, but their CLeSUM score is relatively low.
• The CLeSUM similarity score can be used to rank SFPs by importance for a greedy algorithm: only the top few SFPs need to be examined.

Page 28

1, Fast search for SFPs by mere string comparison

2, Width 20 for specificity + width 8 for sensitivity

3, Optimal anchor SFP selected by checking consistency

4, Avoid a LOCAL TRAP by ‘Zoom-In’

The running time for the 68 pairs of the Fischer benchmark is less than 2% of that of a downloaded local version of CE.

Next steps

1, BLOMAPS: fast multiple structure alignment;

SFPs → Highly Similar Fragment Blocks (HSFBs)

2, Include biochemical information in CLeSUM by amino acid clustering.

Entropic clustering: AVCFIWLMY (h) + DEGHKNPQRST (p)
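The two-group clustering above amounts to a reduced amino acid alphabet. The sketch below only shows what the reduction does to a sequence. How the h/p classes would be folded into CLeSUM is not specified on the slide, and `to_hp` is a hypothetical helper.

```python
HYDROPHOBIC = set("AVCFIWLMY")    # group (h) from the slide
POLAR = set("DEGHKNPQRST")        # group (p) from the slide


def to_hp(seq: str) -> str:
    """Map a one-letter amino acid sequence to the two-letter h/p alphabet."""
    out = []
    for aa in seq.upper():
        if aa in HYDROPHOBIC:
            out.append("h")
        elif aa in POLAR:
            out.append("p")
        else:
            raise ValueError(f"unexpected residue {aa!r}")
    return "".join(out)
```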


Page 29

Thank you

Page 30

>1molA
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR

[Figure: the letter string of 1molA, read from the N-terminal to the C-terminal]

Step 1: take four consecutive Cα atoms.
Step 2: compute two bending angles θ and θ′ and one torsion angle τ.
Step 3: select the most similar of the 17 states.
Step 4: assign the corresponding letter code.
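Steps 3 and 4 can be sketched as a nearest-center lookup in (θ, θ′, τ) space, with the periodicity of τ handled explicitly. The nearest-center rule and the Euclidean angle distance are our assumptions; the slides say only "the most similar one". The 17 real centers of Fig. 1 are not reproduced here, so the caller supplies them, and the centers used in the example are placeholders.

```python
import math
from typing import Dict, Iterable, Tuple

Angles = Tuple[float, float, float]  # (theta, theta_prime, tau) in degrees


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, respecting the 360-degree wrap."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def assign_letter(angles: Angles, centers: Dict[str, Angles]) -> str:
    """Steps 3-4: return the letter whose center is nearest to `angles`."""
    def distance(center: Angles) -> float:
        return math.sqrt((angles[0] - center[0]) ** 2
                         + (angles[1] - center[1]) ** 2
                         + angular_diff(angles[2], center[2]) ** 2)
    return min(centers, key=lambda letter: distance(centers[letter]))


def encode(angle_triples: Iterable[Angles], centers: Dict[str, Angles]) -> str:
    """Map the angle triple of every 4-residue window to a letter string."""
    return "".join(assign_letter(a, centers) for a in angle_triples)


# Placeholder centers for illustration only (not the CLePAPS letter centers).
demo_centers = {"X": (90.0, 90.0, 50.0), "Y": (120.0, 120.0, -170.0)}
print(encode([(92.0, 88.0, 55.0), (118.0, 121.0, 175.0)], demo_centers))  # XY
```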

Page 31

[Figure: the angles θ, θ′ and τ defined by four consecutive Cα atoms]

