Pharm 201 Lecture 09, 2009 1

Understanding Sequence, Structure and Function Relationships and the Resulting Redundancy

Pharm 201/Bioinformatics I

Philip E. Bourne
Department of Pharmacology, UCSD

Agenda

• Understand the relationship between sequence, structure and function. Consider specifically:
– sequence-structure
– structure-structure
– structure-function

• Take-home message: a non-redundant set of sequences is different from a non-redundant set of structures, which in turn is different from a non-redundant set of functions

Why Bother?

• Biology:
– A full understanding of a molecular system comes from careful examination of the sequence-structure-function triad
– Each triad is then a component in a biological process

• Method:
– Bioinformatics studies invariably start from a non-redundant set of data to achieve appropriate statistical significance

Background – RMSD Defined

[Figure: Protein A with Calpha atoms a1, a2, a3, a4, …, aN and Protein B with Calpha atoms b1, b2, b3, b4, …, bN; d1, d2, d3, d4, … are the distances between corresponding atoms]

$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} |d_i|^2}$

Represents the overall distance between two proteins, usually averaged over their Calpha atoms, denoted here a and b

Thus RMSD is the square root of the mean of the squared distances between all corresponding Calpha atoms

Rule of thumb:
1-2 Å RMSD: the proteins are close
<6 Å RMSD: they are likely related

Note: Assumes you know the residue correspondences
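
The RMSD definition on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the two chains are already superimposed and that residue correspondences are known (the i-th coordinate in one list pairs with the i-th in the other). The function name `rmsd` and the toy coordinates are illustrative and not from the lecture.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) Calpha coordinates.

    Assumes the structures are already superimposed and that
    coords_a[i] corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for atom_a, atom_b in zip(coords_a, coords_b):
        # |d_i|^2: squared Euclidean distance between the paired atoms
        squared_sum += sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
    # Mean of the squared distances, then the square root
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates in Å (hypothetical, for illustration only)
protein_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
protein_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(protein_a, protein_b))
```

In practice the optimal superposition has to be found first, and the RMSD value depends on which residues are paired, which is why the note about residue correspondences matters.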

Some Useful Observations

• Below 30% protein sequence identity, detection of a homologous relationship is not guaranteed by sequence alone
• Structure is much more conserved than sequence
• Distinguishing between divergent and convergent evolution is an issue
• Structure is limited relative to sequence, on the order of 1:100 – 1:10000 (depending on how you count)
• Structure follows a power law with respect to function – each structural template has from 1 to n functions
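
Thresholds such as 30% sequence identity presuppose a way of computing identity from an alignment. The following is a minimal sketch, assuming two pre-aligned sequences of equal length with `-` or `.` marking gaps, and counting identity over aligned (non-gap) columns only. Other conventions for the denominator exist, so the choice here is an assumption, as are the function name and the toy sequences.

```python
def percent_identity(aligned_a, aligned_b, gap_chars="-."):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    identical_columns = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a in gap_chars or res_b in gap_chars:
            continue  # skip columns containing a gap
        aligned_columns += 1
        if res_a.upper() == res_b.upper():
            identical_columns += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical_columns / aligned_columns


# Hypothetical toy alignment: 4 identical residues in 5 aligned columns
print(percent_identity("ACDE-G", "ACQEFG"))  # 80.0
```

Because the result depends on alignment length as well as on the number of matches, identity values are best read together with the length of the aligned region.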

Relationship Between Sequence and Structure

The classic HSSP curve from Sander and Schneider (1991) Proteins 9:56-68

This Analysis was Updated by Rost in 1999

http://peds.oupjournals.org/cgi/content/full/12/2/85

Sequence vs Structure – Another Perspective

[Figure: Random 1000 structurally similar PDB polypeptide chains from CE with z > 4.5 (% sequence identity vs alignment length). Axes: Alignment Length, % Seq. Id. Marked regions: Twilight Zone, Midnight Zone]

There Are No Absolute Rules - Similar Sequences – Different Structures

[Figure: 1HMP:A (Glycosyltransferase) and 1PIV:1 (Viral Capsid Protein), with an 80-residue stretch (yellow) with over 40% sequence identity]

Given This Complex Relationship a Non-redundant Set of Sequences Does not Imply a Non-redundant Set of Structures

Structure vs Structure

Structure Alignments using CE with z > 4.0

Homology modeling is used here

The Russian Doll Effect

Structure Is Highly Redundant

We will be revisiting this in the next couple of lectures

• Specifically:
– How do we capture this redundancy?
– What systems are commonly used to express this redundancy and what do they bring to our understanding of biology?

• For now consider what this means using the most popular structure classification scheme - SCOP

Nature’s Reductionism

There are ~20^300 possible proteins >>>> all the atoms in the Universe

5.8M protein sequences from 5513 species (source RefSeq)

34,494 protein structures yield 1086 domain folds (SCOP 1.73)
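
As a rough check on the scale in the first line of this slide: if each position of a chain can hold any of 20 amino acids, a chain of length L has 20^L possible sequences. The sketch below assumes L = 300 (reading the slide's figure as 20 to the power 300) and uses the commonly quoted order-of-magnitude estimate of about 10^80 atoms in the observable universe, which is an assumption not stated in the lecture.

```python
import math

AMINO_ACIDS = 20          # standard amino acid alphabet
CHAIN_LENGTH = 300        # assumed chain length (the slide's exponent)
ATOMS_LOG10 = 80          # assumed rough estimate: ~10^80 atoms in the universe


def log10_sequence_count(alphabet_size, length):
    """log10 of the number of possible sequences, alphabet_size ** length."""
    return length * math.log10(alphabet_size)


exponent = log10_sequence_count(AMINO_ACIDS, CHAIN_LENGTH)
print(f"20^300 is about 10^{exponent:.0f}")
print(f"larger than 10^{ATOMS_LOG10} by about {exponent - ATOMS_LOG10:.0f} orders of magnitude")
```

Set against that number, the 5.8M known sequences and 1086 folds quoted on the slide show how small a part of sequence space nature appears to use.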

The SCOP Hierarchy v1.73, Based on 34494 Structures

Classes: 7
Folds: 1086
Superfamilies: 1777
Families: 3464
Domains: 97178

This is remarkable! Explains the one fold many functions

Specific Examples From the SCOP Hierarchy

Protein Domains

• Definition
– Compact, spatially distinct
– Fold in isolation
– Recurrence

Structure vs Function

Some Basic Rules Governing Structure-Function Relationships …

• The golden rule is there are no golden rules – George Bernard Shaw

• Above 40% sequence identity, sequences tend to have the same structure and function
– But there are exceptions

• Structure and function tend to diverge at the same level of sequence identity

Structure vs Function

This is even more complicated than the relationship between sequence and structure and not as well understood

Complication Comes from One Structure Multiple Functions

• We saw this from GO already

• Phosphoglucose isomerase acts as a neuroleukin, cytokine and differentiation mediator as a monomer in the extracellular space, and as a dimer inside the cell it is involved in glucose metabolism

Consider an Example Relative to SCOP

• Lysozyme and alpha-lactalbumin:
– Same class – alpha+beta
– Same superfamily – lysozyme-like
– Same family – C-type lysozyme
– Same fold – lysozyme-like
– Different function at 40% sequence identity

• Lysozyme – hydrolase EC 3.2.1.17
• Alpha-lactalbumin – Ca binding, lactose biosynthesis

More Details…

Lysozyme is an O-glycosyl hydrolase, but α-lactalbumin does not have this catalytic activity. Instead it regulates the substrate specificity of galactosyl transferase through its sugar binding site, which is common to both α-lactalbumin and lysozyme. Both the sugar binding site and catalytic residues have been retained by lysozyme during evolution, but in α-lactalbumin, the catalytic residues have changed and it is no longer an enzyme.

Why is It Not so Well Understood?

1. Function is often ill-defined, e.g., biochemical, biological, phenotypical

2. The PDB is biased – it does not have a balanced repertoire of functions and those functions are ill-defined

3. There are a number of functional classifications, e.g., EC and GO, that have differing coverage and depth

Point 2: PDB Bias – PDB vs Human Genome

EC – Hydrolases – Begins to Illustrate the Bias in the PDB

[Figure: panels labeled PDB and Ensembl Human Genome Annotation]

• 2.5 Transferring alkyl or aryl groups: over-represented in PDB
• 2.4 Glycosyltransferases: under-represented in PDB

Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31 http://sg.rcsb.org

Structure vs Function Follows a Power Law Distribution

• Some folds are promiscuous and adopt many different functions - superfolds

Qian J, Luscombe NM, Gerstein M. JMB 2001 313(4):673-81
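
A power law means the number of folds with k functions falls off roughly as a · k^(-b), which appears as a straight line on a log-log plot. The sketch below estimates b by least-squares regression of log(count) on log(k). The data are synthetic values generated from an exact power law purely to exercise the code. They are not taken from the cited paper, and the function name is illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-b) by linear regression in log-log space.

    Returns (b, a). Requires all xs and ys to be positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    slope = sxy / sxx                    # slope of the log-log line is -b
    intercept = mean_y - slope * mean_x  # intercept is log(a)
    return -slope, math.exp(intercept)


# Synthetic example only: counts generated from 1000 * k**-2
functions_per_fold = [1, 2, 4, 8, 16]
fold_counts = [1000.0 * k ** -2.0 for k in functions_per_fold]
b, a = fit_power_law(functions_per_fold, fold_counts)
print(f"estimated exponent b = {b:.2f}, prefactor a = {a:.0f}")
```

Log-log regression is the simplest estimator and is used here only for illustration. With real, noisy counts it can be biased, and likelihood-based fitting is generally considered more reliable.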

Examples of Superfolds

[Figures: 1TIM, 3ADK, 1FXI]

Specific Examples of the Relationship Between Structure and Function

Same Structure and Function Low Sequence Identity

The globin fold is resilient to amino acid changes. V. stercoraria (bacterial) hemoglobin (left) and P. marinus (eukaryotic) hemoglobin (right) share just 8% sequence identity, but their overall fold and function are identical.

Same Structure Different Function - Alpha/beta proteins characterized as different superfamilies

[Figure: 1fla, 1ymv, 1pdo]

Example – Same Structure Different Function

• 1fla – Flavodoxin – Electron Transport
• 1pdo – Mannose Transporter
• 1ymv – CheY – Signal Transduction

Less than 15% sequence identity

Convergent Evolution

Subtilisin and chymotrypsin are both serine endopeptidases. They share no sequence identity, and their folds are unrelated. However, they have an identical, three-dimensionally conserved Ser-His-Asp catalytic triad, which catalyses peptide bond hydrolysis. These two enzymes are a classic example of convergent evolution.

             150                                                200
Ilk____PSS   .......... .......... ........CC ....CEEEHH HHCCCCCCEE
Ilk____Seq   .......... .......... ........FK ....QLNFLT KLNENHSGEL
------------                               -+     +L-+++ KL-+---GE-
1fmk--_Seq   KHADGLCHRL TTVCPTSKPQ TQGLAKDAWE IPRESLRLEV KLGQGCFGEV
1fmk--_SS    HCCCCCCCCC CEECCCCCCC CCCCCCCCCE CCHHHEEEEE EEEECCCEEE
* * *

             200                                                250
Ilk____PSS   EEEECCCCE. EEEEEEECCC CCCCCHHHHH HHHHHHHHHC CCCEEEEEEE
Ilk____Seq   WKGRWQGND. IVVKVLKVRD WSTRKSRDFN EECPRLRIFS HPNVLPVLGA
------------ W+G+W-G+-  +-+K+LK-    +T+++-+F- +E---++-++ H++++-++++
1fmk--_Seq   WMGTWNGTTR VAIKTLKP.. .GTMSPEAFL QEAQVMKKLR HEKLVQLYAV
1fmk--_SS    EEEEECCCEE EEEEEECC.. .CCCCHHHHH HHHHHHHHCC CCCECCEEEE
* *

             250                                                300
Ilk____PSS   EECCCCEEEE EEHHHHCCCC HHHHHHCCCC CCCCHHHHHH HHHHHHHHHH
Ilk____Seq   CQSPPAPHPT LITHWMPYGS LYNVLHEGTN FVVDQSQAVK FALDMARGMA
------------ ++++P   -- ++T--M++GS L-++L-+-T+ --+--+Q-V+ +A+++A+GMA
1fmk--_Seq   VSEEP...IY IVTEYMSKGS LLDFLKGETG KYLRLPQLVD MAAQIASGMA
1fmk--_SS    ECCCC...EE EEEECCCCCE HHHHHCCCCC CCCCHHHHHH HHHHHHHHHH

             300                                                350
Ilk____PSS   HHHCCCCCEE CCCCCCCCEE ECCCCEEEEC CCCCEEECCC CCCCCCCCCC
Ilk____Seq   FLHTLEPLIP RHALNSRSVM IDEDMTARIS MADVKFSFQC PGRMYAPAWV
------------ ++++---  - ---L-+++++ ++E+-+++++ ---+--          +---W-
1fmk--_Seq   YVERMNY..V HRDLRAANIL VGENLVCKVA DFGLAR.... ....FPIKWT
1fmk--_SS    HHHHHCC..C CCCCCHHHEE EECCCEEEEC CCCCCC.... ....CCHHHC
* * * Cat. Loop

             350                                                400
Ilk____PSS   HHHHHHCCCC CCCCEEEEEE EEHHHHHHHH H.CCCCCCCC CHHHHHHHHH
Ilk____Seq   APEALQKKPE DTNRRSADMW SFAVLLWELV T.REVPFADL SNMEIGMKVA
------------ APEA++++-    ---++D+W SF++LL+EL+ T -+VP+-++ +N-E+-++V
1fmk--_Seq   APEAALYGR. ..FTIKSDVW SFGILLTELT TKGRVPYPGM VNREVLDQV.
1fmk--_SS    CHHHHHHCC. ..CCHHHHHH HHHHHHHHHH CCCCCCCCCC CHHHHHHHH.
***

Example: Same Fold but Not Function

• “Integrin-linked kinase” (Ilk) is a novel protein kinase fold with strong sequence similarity to known structures (Hannigan et al. 1996 Nature 379, 91-96)

• Aligns to Src kinases with BLAST e-value of 10^-19 and 27% identity (alignment shown is to a known Src kinase structure)

• Several key residues are conserved, but residues important to catalysis, including catalytic Asp, are missing

• Recent experimental evidence suggests that Ilk lacks kinase activity (Lynch et al. 1999 Oncogene 18, 8024-8032)

Non-Redundant Sets: Sequences

• NR dataset (NCBI) - All non-redundant GenBank CDS translations + RefSeq Proteins + PDB + SwissProt + PIR + PRF

• RefSeq (NCBI) – Annotated

• CD-HIT http://bioinformatics.org/cd-hit/ - popular algorithm for fast clustering of sequences
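
To make the idea of a non-redundant set concrete, here is a toy sketch of greedy incremental clustering, the general strategy that CD-HIT is usually described as following: process sequences longest first, and let each one either join the first cluster whose representative it matches above an identity threshold or found a new cluster. This is not CD-HIT's implementation. The ungapped position-by-position identity used here is a crude stand-in for a real alignment, and all function names and sequences are hypothetical.

```python
def ungapped_identity(seq_a, seq_b):
    """Crude stand-in for alignment: fraction of matching positions,
    compared left to right without gaps, over the shorter sequence."""
    shorter = min(len(seq_a), len(seq_b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return matches / shorter


def greedy_cluster(sequences, threshold=0.9, identity=ungapped_identity):
    """Greedy incremental clustering; returns a list of (representative, members)."""
    clusters = []
    # Longest sequences are considered first and become representatives
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if identity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:
            # No existing representative is similar enough: start a new cluster
            clusters.append((seq, [seq]))
    return clusters


def non_redundant(sequences, threshold=0.9):
    """One representative per cluster."""
    return [rep for rep, _ in greedy_cluster(sequences, threshold)]


# Hypothetical toy sequences
toy = ["ACDEFGHIKL", "ACDEFGHIKV", "WWWWWWWW", "ACDEFGHIK"]
print(non_redundant(toy, threshold=0.8))
```

The resulting set depends on the threshold and on how identity is measured, which is one reason a set that is non-redundant by sequence need not be non-redundant by structure or function.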

Non-Redundant Sets: Sequences with Structure

• PDBselect - http://bioinfo.tg.fh-giessen.de/pdbselect/

• Astral http://astral.berkeley.edu/

• Pisces http://dunbrack.fccc.edu/Guoli/PISCES_OptionPage.php

• RCSB PDB queries

• RCSB Sequence Similarity
