
Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Date post: 11-Jan-2016
Page 1: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Exploring 3D Molecular Structures Using NCBI Tools

A Field Guide

June 17, 2004

Page 2: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

NCBI Structure Resources

• Overview of Structural Informatics at NCBI
• How 3D Macromolecular Structures are Determined
• Indexing Structural Data at NCBI
• Finding Homologous Structures
  – By Sequence Similarity: BLAST
  – By Structure Similarity: VAST
  – By Conserved Function: RPS-BLAST and CDD
• Finding a Structural Template for a Query Protein

Page 3: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

The National Center for Biotechnology Information

• Created as a part of NLM in 1988
  – Establish public databases
  – Perform research in computational biology
  – Develop software tools for sequence analysis
  – Disseminate biomedical information

Page 4: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Structural Informatics

Chemical Formula: ARKLMPQSCSW…, Modifications, Ions, Ligands

3D Conformation: Structure, Dynamics, Active States, Folding

Function: Binding Sites, Catalytic Residues, Kinetics, Thermodynamics, Substrates, Intermediates

Page 5: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Structural Informatics

Chemical Formula: GenPept, NCBI RefSeq, SWISS-PROT, PIR, PRF

3D Conformation: PDB

Function: Multiple Sequence Alignments: Pfam, SMART, COGs, CDD

Page 6: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Structural Informatics at NCBI

Chemical Formula: GenPept, NCBI RefSeq, SWISS-PROT, PIR, PRF (Entrez Protein)

3D Conformation: PDB (Entrez Structure, Entrez 3D Domains)

Function: Multiple Sequence Alignments: Pfam, SMART, COGs, CDD (Entrez Domains)

4,818,495 25,003

11,382

103,820

Page 7: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

The Entrez System

Entrez

Nucleotide

PubMed

Protein

Taxonomy

Structure

Domains

3D Domains

Books

Journals

PMC

OMIM

UniSTS

PopSet

Genome

SNP

UniGene

Gene

GEO

GEO Datasets

MeSH

Page 8: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Solving Structures: X-Ray Crystallography

Bond    r (Å)
C-S     1.82
C-C     1.54
C-N     1.47
C-O     1.43
S-H     1.34
C=O     1.20
C-H     1.09
N-H     1.01
O-H     0.96

Electron Density Map

P F I

Resolution

5 Å 3 Å 1 Å T or V?

Challenges

Disorder

Cn3D

Page 9: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

More About Resolution

1EJG: Crambin at 0.54 Å (protons!!)

2TMA: Tropomyosin at 15 Å (only alpha carbons!!)

3 Å

“Temperature”

Page 10: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Solving Structures: Nuclear Magnetic Resonance Spectroscopy

B₀

Constraint List

Distances, Dihedral Angles, Orientation

Models consistent with constraints

RMSD (Å)

Cn3D
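The slide uses RMSD (Å) to describe how closely the NMR models agree with one another. The following is a minimal sketch of that calculation, assuming the two models list the same atoms in the same order and are already superimposed; the function name `rmsd` and the coordinates are illustrative, not taken from any NCBI tool.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the matching atoms of the two models.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: model_2 is model_1 shifted by 1 Å along z.
model_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (3.0, 0.0, 1.0)]
print(rmsd(model_1, model_2))
```

A low RMSD across the ensemble means the constraints define the conformation tightly; a real comparison would superimpose the models first.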

Page 11: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

PDB

Page 12: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

PDB File: Header

HEADER ISOMERASE/DNA 01-MAR-00 1EJ9
TITLE CRYSTAL STRUCTURE OF HUMAN TOPOISOMERASE I DNA COMPLEX
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: DNA TOPOISOMERASE I;
COMPND 3 CHAIN: A;
COMPND 4 FRAGMENT: C-TERMINAL DOMAIN, RESIDUES 203-765;
COMPND 5 EC: 5.99.1.2;
COMPND 6 ENGINEERED: YES;
COMPND 7 MUTATION: YES;
COMPND 8 MOL_ID: 2;
COMPND 9 MOLECULE: DNA (5'-
COMPND 10 D(*C*AP*AP*AP*AP*AP*GP*AP*CP*TP*CP*AP*GP*AP*AP*AP*AP*AP*TP*
COMPND 11 TP*TP*TP*T)-3');
COMPND 12 CHAIN: C;
COMPND 13 ENGINEERED: YES;
COMPND 14 MOL_ID: 3;
COMPND 15 MOLECULE: DNA (5'-
COMPND 16 D(*C*AP*AP*AP*AP*AP*TP*TP*TP*TP*TP*CP*TP*GP*AP*GP*TP*CP*TP*
COMPND 17 TP*TP*TP*T)-3');
COMPND 18 CHAIN: D;
COMPND 19 ENGINEERED: YES
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE 3 EXPRESSION_SYSTEM_COMMON: BACULOVIRUS EXPRESSION SYSTEM;
SOURCE 4 EXPRESSION_SYSTEM_CELL: SF9 INSECT CELLS;
SOURCE 5 MOL_ID: 2;
SOURCE 6 SYNTHETIC: YES;
SOURCE 7 MOL_ID: 3;
SOURCE 8 SYNTHETIC: YES
KEYWDS PROTEIN-DNA COMPLEX, TYPE I TOPOISOMERASE, HUMAN

REMARK 1
REMARK 2
REMARK 2 RESOLUTION. 2.60 ANGSTROMS.
REMARK 3
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : X-PLOR 3.1
REMARK 3 AUTHORS : BRUNGER
…
REMARK 280
REMARK 280 CRYSTALLIZATION CONDITIONS: 27% PEG 400, 145 MM MGCL2, 20
REMARK 280 MM MES PH 6.8, 5 MM TRIS PH 8.0, 30 MM DTT
REMARK 290 ...

Page 13: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

PDB File: Data

ATOM      1  N   TRP A 203      30.156  -4.908  37.767  1.00 50.81           N
ATOM      2  CA  TRP A 203      30.797  -4.667  36.431  1.00 49.96           C
ATOM      3  C   TRP A 203      30.369  -3.337  35.766  1.00 49.18           C
ATOM      4  O   TRP A 203      29.315  -3.238  35.147  1.00 49.27           O
ATOM      5  CB  TRP A 203      30.518  -5.863  35.513  1.00 46.77           C
ATOM      6  CG  TRP A 203      30.847  -5.651  34.081  1.00 44.60           C
ATOM      7  CD1 TRP A 203      32.028  -5.234  33.553  1.00 49.72           C
ATOM      8  CD2 TRP A 203      29.980  -5.876  32.984  1.00 43.73           C
ATOM      9  NE1 TRP A 203      31.956  -5.191  32.177  1.00 45.45           N
ATOM     10  CE2 TRP A 203      30.704  -5.582  31.805  1.00 45.23           C
ATOM     11  CE3 TRP A 203      28.657  -6.305  32.877  1.00 46.48           C
ATOM     12  CZ2 TRP A 203      30.149  -5.705  30.539  1.00 46.06           C
ATOM     13  CZ3 TRP A 203      28.101  -6.431  31.622  1.00 43.08           C
ATOM     14  CH2 TRP A 203      28.849  -6.131  30.463  1.00 45.77           C
…

Fields, left to right: Name, Atom Number, Atom Name, Residue Name, Chain ID, Residue Number, X, Y, Z, Occupancy, Temperature Factor

Issues: Justification, Nomenclature

ATOM 1 N TRP A 203 30.156 -4.908 37.767 1.00 50.81
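The "Justification" issue arises because ATOM records are fixed-width: each field occupies set columns, so splitting on whitespace can fail when fields touch. The sketch below reads the fields labelled on this slide by column position, following the standard PDB column layout; the helper names `format_atom_record` and `parse_atom_record` are illustrative only.

```python
def format_atom_record(serial, name, res_name, chain, res_seq, x, y, z, occ, temp):
    """Build a fixed-width ATOM line (columns follow the PDB file format)."""
    return (
        f"{'ATOM':<6}{serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{temp:6.2f}"
    )


def parse_atom_record(line):
    """Slice an ATOM line by column position instead of splitting on spaces."""
    return {
        "record": line[0:6].strip(),         # columns 1-6
        "serial": int(line[6:11]),           # columns 7-11, atom number
        "atom_name": line[12:16].strip(),    # columns 13-16
        "res_name": line[17:20].strip(),     # columns 18-20
        "chain_id": line[21],                # column 22
        "res_seq": int(line[22:26]),         # columns 23-26, residue number
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
        "occupancy": float(line[54:60]),     # columns 55-60
        "temp_factor": float(line[60:66]),   # columns 61-66
    }


# First atom of the slide's example: the backbone N of TRP 203, chain A.
line = format_atom_record(1, " N", "TRP", "A", 203, 30.156, -4.908, 37.767, 1.00, 50.81)
atom = parse_atom_record(line)
print(atom["atom_name"], atom["res_name"], atom["chain_id"], atom["res_seq"])
```

Slicing by column keeps working when a coordinate such as -123.456 fills its whole field and leaves no space before the next value.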

Page 14: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

From PDB to Entrez

Structure

3D Domains

Protein

Domains

Page 15: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

From Coordinates to Models

1EJ9: Human topoisomerase I

Page 16: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Building the Structure Summary

Taxonomy

Pubmed

Protein 3D Domains

Domains

Nucleotide

Page 17: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Indexing into MMDB

Structure

• Import only experimentally determined structures
• Convert to ASN.1
• Verify sequences

inter-residue-bonds { { atom-id-1 { molecule-id 1 , residue-id 1 , atom-id 1 } , atom-id-2 { molecule-id 1 , residue-id 2 , atom-id 9 } } ,

id 1 , name "helix 1" , type helix , location subgraph residues interval { { molecule-id 1 , from 49 , to 61 } } } ,

Add secondary structure

Add chemical bonds

• Create “backbone” model (Cα, P only)
• Create single-conformer model

Page 18: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Structure Indexing

Entrez
• MMDB-ID
• MMDB entry date
• EC number
• Organism

PDB
• Accession
• Release date
• Class
• Source
• Description
• Comment

Ligands
• PDB code
• PDB name
• PDB description

Literature
• Article title
• Author
• Journal
• Publication date

Experimental
• Method
• Resolution

Counters
• Ligand types
• Modified amino acids
• Modified nucleotides
• Modified ribonucleotides
• Protein chains
• DNA chains
• RNA chains

topoisomerase AND 2[dnachaincount] AND human[organism]
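The query above combines a free-text term with two indexed fields using the `value[field]` syntax. A small sketch of composing such a query and placing it in an E-utilities ESearch URL follows. The helper names are hypothetical, the field names are copied from the slide and may have changed since 2004, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def fielded(value, field=None):
    """Format one clause: 'value[field]' when a field is given, otherwise plain text."""
    return f"{value}[{field}]" if field else value


def build_query(*clauses):
    """Join clauses with AND; each clause is a string or a (value, field) tuple."""
    parts = [fielded(*c) if isinstance(c, tuple) else fielded(c) for c in clauses]
    return " AND ".join(parts)


query = build_query("topoisomerase", ("2", "dnachaincount"), ("human", "organism"))
print(query)

# Assumed target: the ESearch endpoint with db=structure. Only the URL is built here.
url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + urlencode(
    {"db": "structure", "term": query}
)
print(url)
```

Building the clause list in code makes it easy to swap a counter (for example the number of DNA chains) without retyping the whole expression.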

Page 19: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Creating Sequence Records

Protein Nucleotide Nucleotide

1EJ9A 1EJ9C 1EJ9D

One record per chain

Page 20: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Building the Structure Summary

Page 21: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Building the Structure Summary

Page 22: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Annotating Secondary Structure

1EJ9: Human topoisomerase I

α-Helices

β-strands

coils/loops

Page 23: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Creating 3D Domains

3D Domain 0: 1EJ9A0 = entire polypeptide

Page 24: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Creating 3D Domains

3D Domains

1EJ9A1

1EJ9A3

1EJ9A2

1EJ9A4

1EJ9A5

< 3 Secondary Structure Elements

Page 25: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Building the Structure Summary

Page 26: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Building the Structure Summary

Page 27: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

3D Domain Indexing

Entrez
• SDI
• MMDB-ID
• Accession
• MMDB entry date
• Organism
• Domain number
• Cumulative number

PDB
• Accession
• Release date
• Class
• Source
• Description
• Comment

Literature
• Article title
• Author
• Publication date

Counters
• Modified amino acids
• α-Helices
• β-Strands
• Residues
• Molecular weight

REMEMBER: 3D Domain 0 is the entire polypeptide chain!

4[helixcount] AND 0[strandcount] AND 0[domainno] AND viruses[organism]

Find all viral four helix bundles

Page 28: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Conserved Domains

Weakly conserved serine Active site serine

Sequences Aligned by Function

Page 29: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Linking Sequence to Function

The PSSM: Position Specific Score Matrix

         A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V
206 D   0  -2   0   2  -4   2   4  -4  -3  -5  -4   0  -2  -6   1   0  -1  -6  -4  -1
207 G  -2  -1   0  -2  -4  -3  -3   6  -4  -5  -5   0  -2  -3  -2  -2  -1   0  -6  -5
208 V  -1   1  -3  -3  -5  -1  -2   6  -1  -4  -5   1  -5  -6  -4   0  -2  -6  -4  -2
209 I  -3   3  -3  -4  -6   0  -1  -4  -1   2  -4   6  -2  -5  -5  -3   0  -1  -4   0
210 S  -2  -5   0   8  -5  -3  -2  -1  -4  -7  -6  -4  -6  -7  -5   1  -3  -7  -5  -6
211 S   4  -4  -4  -4  -4  -1  -4  -2  -3  -3  -5  -4  -4  -5  -1   4   3  -6  -5  -3
212 C  -4  -7  -6  -7  12  -7  -7  -5  -6  -5  -5  -7  -5   0  -7  -4  -4  -5   0  -4
213 N  -2   0   2  -1  -6   7   0  -2   0  -6  -4   2   0  -2  -5  -1  -3  -3  -4  -3
214 G  -2  -3  -3  -4  -4  -4  -5   7  -4  -7  -7  -5  -4  -4  -6  -3  -5  -6  -6  -6
215 D  -5  -5  -2   9  -7  -4  -1  -5  -5  -7  -7  -4  -7  -7  -5  -4  -4  -8  -7  -7
216 S  -2  -4  -2  -4  -4  -3  -3  -3  -4  -6  -6  -3  -5  -6  -4   7  -2  -6  -5  -5
217 G  -3  -6  -4  -5  -6  -5  -6   8  -6  -8  -7  -5  -6  -7  -6  -4  -5  -6  -7  -7
218 G  -3  -6  -4  -5  -6  -5  -6   8  -6  -7  -7  -5  -6  -7  -6  -2  -4  -6  -7  -7
219 P  -2  -6  -6  -5  -6  -5  -5  -6  -6  -6  -7  -4  -6  -7   9  -4  -4  -7  -7  -6
220 L  -4  -6  -7  -7  -5  -5  -6  -7   0  -1   6  -6   1   0  -6  -6  -5  -5  -4   0
221 N  -1  -6   0  -6  -4  -4  -6  -6  -1   3   0  -5   4  -3  -6  -2  -1  -6  -1   6
222 C   0  -4  -5  -5  10  -2  -5  -5   1  -1  -1  -5   0  -1  -4  -1   0  -5   0   0
223 Q   0   1   4   2  -5   2   0   0   0  -4  -2   1   0   0   0  -1  -1  -3  -3  -4
224 A  -1  -1   1   3  -4  -1   1   4  -3  -4  -3  -1  -2  -2  -3   0  -2  -2  -2  -3

Serine scored differently in these two positions

Active site nucleophile
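A PSSM lookup is indexed by alignment position as well as by residue, which is why the same amino acid can score differently along the domain. The sketch below copies three rows from the matrix on this slide (positions 210, 211 and 216) and reads off the serine score in each; the dictionary layout and the function name `score` are illustrative choices, not the format used by CDD.

```python
# Column order used by the matrix on the slide.
COLUMNS = "ARNDCQEGHILKMFPSTWYV"

# Rows copied from the slide, keyed by alignment position.
PSSM = {
    210: [-2, -5, 0, 8, -5, -3, -2, -1, -4, -7, -6, -4, -6, -7, -5, 1, -3, -7, -5, -6],
    211: [4, -4, -4, -4, -4, -1, -4, -2, -3, -3, -5, -4, -4, -5, -1, 4, 3, -6, -5, -3],
    216: [-2, -4, -2, -4, -4, -3, -3, -3, -4, -6, -6, -3, -5, -6, -4, 7, -2, -6, -5, -5],
}


def score(position, residue):
    """Score for placing `residue` at alignment `position`."""
    return PSSM[position][COLUMNS.index(residue)]


for position in sorted(PSSM):
    print(position, "S ->", score(position, "S"))
```

In these rows serine scores 1, 4 and 7 respectively, so a serine aligned to position 216 contributes far more to a hit than one aligned to position 210.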

Page 30: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Entrez Domains (CDD v2.00)

A database of Position Specific Score Matrices (PSSMs)

Single Domains

• Pfam (Sanger), accessions like pfam01234. Pfam-A seeds: HMM based models representing a wide variety of functional domains derived from SWISS-PROT
• SMART (EMBL), accessions like smart00123. HMM based models originally concentrating on eukaryotic signaling domains, now expanding
• CD (NCBI), accessions like cd01234. NCBI curated domains based on sequence and structural alignments

Protein Families

• COG (NCBI), accessions like COG0123. BLAST based alignments derived from complete proteomes of prokaryotes

Page 31: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

CD-Search Output

CD

SMART

Pfam

COG

Click on a colored bar to align your sequence to the CD

Page 32: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

CD Summary

Alignment view controls

Cn3D launch

PSSM created

Aligned query

Page 33: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Building the Structure Summary

Page 34: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Building the Structure Summary

Cn3D

Page 35: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Creating Entrez Links

NCBI Taxonomy

Literature from PDB

Sequences

Full Chain

Entrez Structure

Entrez 3D Domains

Page 36: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Links to CDs: CD-Search / RPS-BLAST

1EJ9A

Query: protein sequence
Database: PSSMs

pre-computed in Entrez Protein

Enter accession, GI, or FASTA sequence into RPS-BLAST

Page 37: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Finding Homologous Structures

• By sequence similarity: BLAST

• By structural similarity: VAST

• By conserved function: CD-Search

Entrez Protein

Entrez Structure

Entrez 3D Domains

Entrez Domains

Page 38: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

BLAST: Sequence Neighbors

BLAST Related Structures

Displays a graphical and text alignment between a query sequence and a similar sequence with structure

Accessed from
• BLink
• Any protein BLAST search

?GVKWKYLEHKGPVFAPPYDPLP

GIKWKFLEHKGPVFAPPYEPLP
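The two fragments above are aligned without gaps, so their similarity can be summarised as percent identity. The sketch below counts identical positions in the two strings from the slide; the function name `percent_identity` is illustrative, and BLAST itself reports more than this single figure.

```python
def percent_identity(seq_a, seq_b):
    """Return (identical positions, percent identity) for two ungapped, equal-length strings."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches, 100.0 * matches / len(seq_a)


query = "GVKWKYLEHKGPVFAPPYDPLP"    # sequence of unknown structure
subject = "GIKWKFLEHKGPVFAPPYEPLP"  # similar sequence with a structure
matches, pct = percent_identity(query, subject)
print(f"{matches}/{len(query)} identical ({pct:.1f}%)")
```

The three mismatches (V/I, Y/F, D/E) are all conservative substitutions, which is typical of a pair close enough for the structure to serve as a model.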

Page 39: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

BLink Neighbors

EAA05377: ENSANGP00000011118 from A. gambiae

Related Structures

Page 40: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Related Structures from BLASTp

Page 41: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Related Structures Cn3D

Page 42: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Searching by Structure

Why search for similar structures?

• To find homologs that sequence searches cannot: distant protein homologs often conserve structure more strongly than sequence

• To explore protein evolution: similar protein folds can be used to support different functions

• To identify conserved core elements of a protein fold that can be used to model related proteins of unknown structure

Page 43: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Structure Neighbors

Vector Alignment Search Tool

For each protein chain, locate SSEs (secondary structure elements), and represent them as individual vectors.

[Figure: SSEs numbered 1 to 6]

Human IL-4

Page 44: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Calculate ij

For both the query and target structures:

Calculate the midpoint of each SSE.

For each SSE k, align k along z and project midpoints onto the xy plane.

Then calculate [ij]k for i ≠ k, j ≠ k.

Vector position about the z axis

[Figure: SSEs 1 to 6 with their midpoints projected onto the xy plane]
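The steps above can be sketched with plain vector arithmetic. The slide's symbol for the quantity was lost in transcription, so this example assumes it is the angle about the z axis between the projected midpoints of SSEs i and j, measured in a frame whose z axis runs along SSE k and whose origin is the midpoint of k. All function names and coordinates are hypothetical, and this is not VAST's own code.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    length = math.sqrt(dot(a, a))
    return (a[0] / length, a[1] / length, a[2] / length)


def midpoint(a, b):
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)


def angle_about_k(k_start, k_end, mid_i, mid_j):
    """Angle in degrees, in [0, 360), from midpoint i to midpoint j about the axis of SSE k."""
    z = unit(sub(k_end, k_start))
    # Any vector not parallel to z can seed the x axis of the frame.
    helper = (1.0, 0.0, 0.0) if abs(z[0]) < 0.9 else (0.0, 1.0, 0.0)
    x = unit(sub(helper, tuple(dot(helper, z) * c for c in z)))
    y = cross(z, x)
    origin = midpoint(k_start, k_end)
    pi, pj = sub(mid_i, origin), sub(mid_j, origin)
    # Dropping the z component is the projection onto the xy plane.
    angle_i = math.atan2(dot(pi, y), dot(pi, x))
    angle_j = math.atan2(dot(pj, y), dot(pj, x))
    return math.degrees(angle_j - angle_i) % 360.0


# SSE k runs along the z axis; i and j sit a quarter turn apart around it.
print(angle_about_k((0, 0, -1), (0, 0, 1), (1, 0, 5), (0, 1, -3)))
```

Because only the difference between the two projected angles is used, the arbitrary choice of x axis does not affect the result.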

Page 45: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Calculate (rik, zik)

For both the query and target structures:

For each SSE k, set the origin at the midpoint of k.

Then calculate rik and zik for the endpoints of SSEs i ≠ k.

Vector position relative to the xy plane

[Figure: SSEs 1 and 3 on x, y, z axes, with r13 and z13 marked]
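The pair (rik, zik) reads naturally as cylindrical coordinates: zik is the height of an endpoint of SSE i along the axis of SSE k, and rik is its distance from that axis, both measured from the midpoint of k. The sketch below computes the pair for one endpoint under that reading; the function name `r_and_z` and the coordinates are hypothetical.

```python
import math


def r_and_z(k_start, k_end, point):
    """Return (r, z) of `point` in a frame centred on the midpoint of SSE k, z along k."""
    axis = [e - s for s, e in zip(k_start, k_end)]
    length = math.sqrt(sum(c * c for c in axis))
    axis = [c / length for c in axis]
    origin = [(s + e) / 2 for s, e in zip(k_start, k_end)]
    rel = [p - o for p, o in zip(point, origin)]
    # Height along the axis of k.
    z = sum(r * a for r, a in zip(rel, axis))
    # What remains after removing the axial part is the radial offset.
    radial = [r - z * a for r, a in zip(rel, axis)]
    return math.sqrt(sum(c * c for c in radial)), z


# SSE k lies along z; the endpoint is 5 Å from the axis and 2 Å above the midpoint.
r, z = r_and_z((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), (3.0, 4.0, 2.0))
print(r, z)
```

Together with the angle about z, these two numbers locate an SSE endpoint relative to k without reference to the original coordinate frame.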

Page 46: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Create Comparison Graph

IL-4 vs IL-6

[Figure: comparison graph between the SSEs of IL-4 and IL-6, with N and C termini marked]

Nodes: r13<>r12, z13<>z12

Arcs: 16<>15

must follow sequence order

Select path with highest “weights”

Page 47: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Refinement

Aligned residues are red

Alignment extended to the end of this strand

Cα atoms are added to the aligned SSEs

Alignments are allowed to extend beyond SSE boundaries

All atoms are added to the models, and the detailed backbone and sidechain positions are refined

Page 48: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Alignment of Sequence

• Aligned blocks represent structural core elements
• Aligned blocks have no internal gaps
• Aligned residues occupy the same position in space
• Aligned residues are shown in CAPITAL letters

Helix 1

Helix 2 Helix 3

Helix 4

Page 49: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Scoring

p = d × P(s > s0, n) × c(n, P1, P2)

p: The probability that the VAST alignment occurred by chance.

P(s > s0, n): Probability of observing an alignment of n SSEs with a score greater than s0 by chance.

c(n, P1, P2): Search space. Number of possible alignments of n SSEs between vector sets P1 and P2.

d: Number of structures searched (set to 500)
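The scoring formula is a product of three terms, so its behaviour is easy to check numerically. The sketch below evaluates p for made-up inputs; only d = 500 comes from the slide, and the function name and the values of P and c are hypothetical.

```python
def vast_p_value(p_chance, search_space, structures_searched=500):
    """p = d * P(s > s0, n) * c(n, P1, P2), as written on the scoring slide."""
    return structures_searched * p_chance * search_space


# Hypothetical inputs: the same chance probability with two search-space sizes.
p_chance = 1e-9
whole_chain = vast_p_value(p_chance, search_space=2000)
single_domain = vast_p_value(p_chance, search_space=50)
print(whole_chain, single_domain)
```

With the other terms held fixed, a smaller search space c gives a proportionally smaller p, so the same set of superimposed SSEs is more significant when fewer alternative alignments were possible.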

Page 50: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Summary• Secondary structure elements are represented as vectorsand are aligned based on their relative orientations

• VAST ignores loops and tolerates variation in SSE length• The initial alignment is wholly ignorant of atomic coordinates

• Pathways through aligned SSEs respect sequence order• VAST is sensitive to topology

NN N

C C

C

• Alignments are extended and optimized using all-atom models• Aligned blocks may extend across or into loops or other SSEs

Page 51: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Query by Chain vs 3D Domain

Query by whole chain

Query by domain 5

Not found using whole chain query!

c(n, P1, P2) is smaller for a 3D domain!

Page 52: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

VAST: Multiple Alignments Cn3D

Page 53: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

nr-PDB Sets

Entrez Structure

Choose criteria for inclusion in a set

Non-redundant set of sequence similar clusters

VAST reports one representative from each cluster

Page 54: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Submitting a PDB File to VAST

• Pick the correct file format
• Remove all records except ATOM
• This is the best way to convert PDB into MMDB format!
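Stripping a PDB file down to its ATOM records before submission can be done with a short filter on the record name in the first six columns. This is a minimal sketch that takes the slide's instruction literally, so HETATM lines are dropped as well; the function name and the sample text are hypothetical.

```python
def keep_atom_records(pdb_text):
    """Return only the lines whose record name (columns 1-6) is exactly ATOM."""
    kept = [line for line in pdb_text.splitlines() if line[:6].strip() == "ATOM"]
    return "\n".join(kept) + "\n" if kept else ""


# Abbreviated, made-up input with several record types.
sample = "\n".join([
    "HEADER    ISOMERASE/DNA",
    "REMARK   2 RESOLUTION. 2.60 ANGSTROMS.",
    "ATOM      1  N   TRP A 203      30.156  -4.908  37.767  1.00 50.81",
    "ATOM      2  CA  TRP A 203      30.797  -4.667  36.431  1.00 49.96",
    "HETATM 9999  O   HOH A 900      10.000  10.000  10.000  1.00 30.00",
    "END",
])
filtered = keep_atom_records(sample)
print(filtered, end="")
```

Matching on the full six-column record name avoids keeping lines that merely begin with the same letters.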

Page 55: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Blocks in CD Alignments

Alignment view controls

Aligned query

Cn3D launch

Block 1 Block 2 Block 3

Consensus sequence created

PSSM created

Page 56: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Curating CD Alignments

smart00235

VAST

cd00203

Cn3D

Page 57: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Curated CD Summary

List of annotated features

Customized view of the selected feature in Cn3D

Residues comprising the selected feature

Cn3D

Page 58: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

CD-Curation: Effect on model alignment accuracy

[Figure: three scatter plots of model alignment RMS (y axis, 0 to 12) against %id in structure alignment (x axis, 0 to 100). Panels: VAST; RPS-BLAST before curation; RPS-BLAST after curation.]

A. Marchler-Bauer

Page 59: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

CDART

Only available for single domain records: cd, pfam, smart

Page 60: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

Finding a Structural Template

Overall Strategy: For a query protein sequence, construct a block alignment representing conserved core SSEs of the most sequence similar structures to the query, and then align the query sequence to this template.

1. Construct the block alignment
   A. Curated CD: Locate using CD-Search and use the sequences most similar to the query
   B. VAST: Find the most sequence similar structure and find its VAST neighbors

2. Align the query to the template: Use Cn3D
   A. PSI-BLAST: Aligns sequence using PSSM of current alignment
   B. BLOCKER: Aligns sequence to an existing block alignment: use where sequence similarity is high
   C. Threader: Aligns sequence to a structure and a block alignment: use where sequence similarity is low

Page 61: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

BLOCKER: The Block Aligner

PSSM

• Creates alignments that match the existing block structure
• Matches are scored from a PSSM generated from the block alignment
• An entire block must be matched with no internal gaps
• There are no penalties for gaps between blocks up to a set gap length
• Can perform both local and global alignments
• Generally used after BLAST or PSI-BLAST

The Block Aligner tests the existing block structure
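The core constraint of block alignment, that a whole block is matched with no internal gaps and scored from a PSSM, can be illustrated with a single block slid along a query. The sketch below is not the BLOCKER algorithm itself: it places only one block, ignores gap-length limits between blocks, and borrows rows 214 to 218 of the PSSM shown earlier in this deck as a stand-in block. The query string and function names are hypothetical.

```python
COLUMNS = "ARNDCQEGHILKMFPSTWYV"

# One five-column block: rows 214-218 of the PSSM slide (consensus G D S G G).
BLOCK_PSSM = [
    [-2, -3, -3, -4, -4, -4, -5, 7, -4, -7, -7, -5, -4, -4, -6, -3, -5, -6, -6, -6],
    [-5, -5, -2, 9, -7, -4, -1, -5, -5, -7, -7, -4, -7, -7, -5, -4, -4, -8, -7, -7],
    [-2, -4, -2, -4, -4, -3, -3, -3, -4, -6, -6, -3, -5, -6, -4, 7, -2, -6, -5, -5],
    [-3, -6, -4, -5, -6, -5, -6, 8, -6, -8, -7, -5, -6, -7, -6, -4, -5, -6, -7, -7],
    [-3, -6, -4, -5, -6, -5, -6, 8, -6, -7, -7, -5, -6, -7, -6, -2, -4, -6, -7, -7],
]


def score_block(pssm, segment):
    """Sum the PSSM scores of an ungapped segment aligned column by column."""
    return sum(row[COLUMNS.index(res)] for row, res in zip(pssm, segment))


def best_ungapped_placement(pssm, sequence):
    """Try every offset of the block along the sequence; return (offset, score) of the best."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the block")
    scored = [
        (score_block(pssm, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    ]
    best_score, best_start = max(scored)
    return best_start, best_score


print(best_ungapped_placement(BLOCK_PSSM, "AAGDSGGPL"))
```

A full block aligner would chain several such placements in sequence order while limiting the unaligned stretch between consecutive blocks.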

Page 62: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

BLAST/PSSM vs BLOCKER

BLAST/PSSM

BLOCKER

Alignment

Import and align GI 1470115

Page 63: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

The NCBI Threader

LRLSLEQLQVIAIAN

Input
• Structure
• Block alignment
• Sequence

Attempts to find matches based on chemical contacts, mainly buried hydrophobic interactions

Useful on blocks for which sequence alignment methods fail

Should be iterated with varying block structures

Cn3D

Page 64: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

The Future

• More curated CDs: they keep coming…
• Pre-computed Related Structures for all sequences in Entrez Protein
• CD “children”: subfamilies of large CD records based on sequence and structure similarity
• Improved mapping of SNP data onto 3D structures
• Further linking of structural and genomic biology

Page 65: Exploring 3D Molecular Structures Using NCBI Tools A Field Guide June 17, 2004.

What comes next…

• Workshop I
  – Working with Structures
• Workshop II
  – Working with Alignments
• All exercises and other resources will remain on the course web pages
• [email protected]
• NCBI Handbook, Ch. 3

