
BIOINFORMATICS

MAPÚA School of Chemical Engineering and Chemistry

Excellence at the Highest Level

MODULE IN BIOINFORMATICS

Databases for the Storage and “Mining” of Genome Sequences

Using Databases to Compare and Identify Related Protein Sequences

Visualizing Three-Dimensional Protein Structures

Enzyme Inhibitors and Rational Drug Design

Metabolic Enzymes, Microarrays, and Proteomics



MODULE 1: Databases for the Storage and “Mining” of Genome Sequences

This is an introduction to nucleotides, nucleic acids (DNA and RNA), and the processes of transcription and translation. The exercises below are designed to introduce you to some of the relevant databases and the tools they contain for examining and comparing different bits of information (see Sections 3-4C and 3-4D). Biological databases are an important resource for the study of biochemistry at all levels. These databases contain huge amounts of information about the sequences and structures of nucleic acids (DNA and RNA) and proteins. They also contain software tools that can be used to analyze the data. Some of the software, called web applications, can be used directly from a web browser. Other software, called freestanding applications, must be downloaded and installed on your local computer.

1. Finding Databases. We'll start with finding databases.

a. What major online databases contain DNA and protein sequences?

b. Which databases contain entire genomes?

c. Using your textbook and online resources (http://www.google.com), make sure you understand the meaning of the following terms: BLAST, taxonomy, gene ontology, phylogenetic trees, and multiple sequence alignment. Once you have defined these terms, find resources on the Internet that enable you to study them.

2. TIGR (The Institute for Genomic Research). Open the TIGR site (http://www.tigr.org/db.shtml) and find the Comprehensive Microbial Resource.

a. What 2001 publication describes the Comprehensive Microbial Resource at TIGR?

b. How many completed genomes from Pseudomonas species have been deposited at TIGR?

c. Which Pseudomonas species are these?

d. Identify the primary reference for Pseudomonas putida KT2440.

e. Find the link on the Comprehensive Microbial Resource home page for restriction digests. Perform a computer-generated restriction digest on Pseudomonas putida KT2440 with BamHI. How many fragments form and what is the average fragment size? (See Section 3-4A for a discussion of restriction endonucleases.)
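To see what a computer-generated digest does, here is a minimal Python sketch. It assumes a circular molecule (as for a bacterial chromosome or plasmid), a palindromic recognition site such as BamHI's GGATCC (so scanning one strand finds every site), and a cut position of 1 within the site (BamHI cuts G^GATCC). The function name `circular_digest` is hypothetical and is not part of the TIGR tool.

```python
def circular_digest(seq: str, site: str, cut_offset: int) -> list:
    """Fragment sizes (bp) after cutting a circular molecule at every site.

    Assumes `site` is palindromic, so scanning one strand finds every site.
    `cut_offset` is where the enzyme cuts inside the site on the top strand
    (1 for BamHI, G^GATCC).
    """
    seq, site = seq.upper(), site.upper()
    length = len(seq)
    # Append the first len(site) - 1 bases so a site spanning the origin is found.
    extended = seq + seq[: len(site) - 1]
    cuts = sorted(
        {(i + cut_offset) % length for i in range(length) if extended.startswith(site, i)}
    )
    if not cuts:
        return [length]  # no site: the circle stays intact
    # On a circle, n cuts give n fragments; the last one wraps past the origin.
    # A single cut gives one linear fragment of full length (difference 0 -> length).
    return [
        (cuts[(k + 1) % len(cuts)] - cuts[k]) % length or length
        for k in range(len(cuts))
    ]
```

For a circular genome, the number of fragments equals the number of cut sites, and the average fragment size is simply the genome length divided by that number: `sum(sizes) / len(sizes)`.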

f. In addition to microbial genomes, TIGR also contains the genomes of many higher organisms. Identify five eukaryotic genomes that are available at TIGR.

3. Analyzing a DNA Sequence. Using high-throughput methods, scientists are now able to sequence entire genomes in a very short period of time. Sequencing a genome is quite an accomplishment in itself, but it is really only the beginning of the study of an organism. Further study can be done both at the wet lab bench and on the computer. In this problem, you will use a computer to help you identify an open reading frame, determine the protein that it will express, and find the bacterial source for that protein. Here is the DNA sequence: Click here for text version (http://higheredbcs.wiley.com/legacy/college/voet/0471214957/bioinf_stud/Bioinfo_3_3.rtf).



a. First, try to find an open reading frame in this segment of DNA. What is an open reading frame (ORF)? You can find the answer in your textbook (Section 3-4D) or online with a simple Internet search (http://www.google.com). You may also wish to try the Bookshelf at PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books). In bacteria, an open reading frame on a piece of mRNA almost always begins with AUG, which corresponds to ATG in the DNA segment that codes for the mRNA. According to the standard genetic code (Table 26-1), there are three Stop codons on mRNA: UAA, UAG, and UGA, which correspond to TAA, TAG, and TGA in the parent DNA segment. Here are the rules for finding an open reading frame in this piece of bacterial DNA:

1. It must start with ATG. In this exercise, the first ATG is the Start codon. In a real gene search, you would not have this information.

2. It must end with TAA, TAG, or TGA.

3. It must be at least 300 nucleotides long (coding for 100 amino acids).

4. The ATG Start codon and the Stop codon must be in frame. This means that the total number of bases in the sequence from the Start codon through the Stop codon must be evenly divisible by 3 (see Section 26-1A).

Hints: Try this search by pasting the DNA sequence into a word processing program, then searching for the Start and Stop codons. Once you have found a pair, highlight the text of the proposed ORF and use the program's Word Count function to count the number of characters from the Start codon through the Stop codon. This number must be evenly divisible by 3. You can also use a fixed-width font such as Courier, enlarge the size of the text, and adjust the margins so that each line holds just three characters (one codon). Once you find the first ATG, delete the characters that precede it. Then search for a Stop codon that fits entirely on one line (that is, one in the same reading frame as the Start codon).
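The same four rules can be written as a short program. This is a minimal Python sketch that follows the exercise's simplification that the first ATG is the Start codon, scans only the strand you give it, and (as an assumption about rule 3) counts the Start and Stop codons toward the 300-nucleotide minimum. The name `find_orf` is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf(dna: str, min_length: int = 300):
    """Return (start, end) of the ORF beginning at the first ATG, or None.

    Indices are 0-based and `end` is exclusive, so dna[start:end] runs from
    the Start codon through the Stop codon.
    """
    dna = "".join(dna.upper().split())  # drop line breaks and spaces
    start = dna.find("ATG")  # rule 1: the first ATG is the Start codon
    if start == -1:
        return None
    # Rule 4: step one codon (3 bases) at a time so any Stop found is in frame.
    for i in range(start, len(dna) - 2, 3):
        if dna[i : i + 3] in STOP_CODONS:  # rule 2
            end = i + 3
            # Rule 3: minimum length, Start and Stop codons included.
            return (start, end) if end - start >= min_length else None
    return None  # reached the end without an in-frame Stop codon
```

Because the loop advances in steps of three from the Start codon, `(end - start) % 3 == 0` holds for every ORF it returns, which is the divisibility check from the hints.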

b. Admittedly, Part (a) is a tedious approach. Here is an easier one: Highlight the entire DNA sequence again and copy it. Then go to the Translate tool on the ExPASy server (http://www.expasy.org/tools/dna.html). Paste the sequence into the box entitled “Please enter a DNA or RNA sequence in the box below (numbers and blanks are ignored).” Then select “Verbose (“Met”, “Stop”, spaces between residues)” as the Output format and click on “Translate Sequence.” The “Results of Translation” page that appears contains six different reading frames. What is a reading frame and why are there six? (Refer to Section 26-1A, the Internet, or the PubMed Bookshelf for an answer.) Identify the reading frame that contains a protein (more than 100 consecutive amino acids with no interruption by a Stop codon) and note its name. Now go back to the Translate tool page, leave the DNA sequence in the sequence box, but select “Compact (“M”, “-”, no spaces)” as the Output format. Go to the same reading frame as before and copy the protein sequence (in one-letter abbreviations) starting with “M” for methionine and ending in “-” for the Stop codon. Save this sequence to a separate text file.
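What the Translate tool does can be sketched in a few lines of Python. The sketch assumes the standard genetic code (the one in Table 26-1) and builds the codon table from the compact 64-letter string NCBI uses for translation table 1. Frames 1 to 3 read the given strand starting at offsets 0, 1, and 2; frames 4 to 6 read the reverse complement the same way. This is an illustration, not ExPASy's code, and its frame numbering may not match the tool's labels.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), listed in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG. '*' marks the three Stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate one reading frame; '-' marks a Stop codon, 'X' an unknown codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i : i + 3], "X")
        protein.append("-" if amino_acid == "*" else amino_acid)
    return "".join(protein)


def six_frames(dna: str) -> list:
    """Frames 1-3 read the given strand; frames 4-6 read its reverse complement."""
    dna = "".join(dna.upper().split())  # ignore case, spaces, and line breaks
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    forward = [translate(dna[offset:]) for offset in range(3)]
    reverse = [translate(reverse_complement[offset:]) for offset in range(3)]
    return forward + reverse
```

For example, `six_frames("ATGGCCTAA")[0]` gives "MA-", the same Compact-style notation the exercise asks you to copy. The frame that holds your protein is the one with a long run of letters between "-" characters.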

c. Now you will identify the protein and the bacterial source. Go to the NCBI BLAST page (http://www.ncbi.nlm.nih.gov/BLAST/). What does BLAST stand for? You will do a simple BLAST search using your protein sequence, but you can do much more with BLAST. You are encouraged to work through the Tutorials on the BLAST home page to learn more. On the BLAST page, select “Protein-protein BLAST.” Enter your protein sequence in the “Search” box. Use the default values for the rest of the page and click on the “BLAST!” button. You will be taken to the “formatting BLAST” page. Click on the “Format!” button. You may have to wait for the results. Your protein should be the first one listed in the BLAST output. What is the protein and what is its source?

Note to instructors: You can do this exercise with any DNA sequence. You can also start from a DNA sequence directly in BLAST (use blastn) and find the genes that way. It is probably best to choose a DNA segment that encodes only one protein.

4. Sequence Homology. You will use BLAST to look at sequences that are homologous to the protein that you identified in Problem 3.

a. First, some definitions: What do the terms “homolog,” “ortholog,” and “paralog” mean? Go to the NCBI BLAST page (http://www.ncbi.nlm.nih.gov/BLAST/) and choose “Protein-protein BLAST.” Paste your protein sequence into the “Search” box. Before clicking on the “BLAST!” button, narrow the search to a taxonomic group. As you look down the BLAST page, you'll see an Options section. Under “Limit by entrez query” (followed by an empty box) or “select from:” (followed by a drop-down menu), select “Eukaryota.” Now click on the “BLAST!” button. Click on the “Format!” button on the next page. Can you find a homologous sequence from yeast? (Hint: Use your browser's Find tool to search for the term “Saccharomyces.”) Note the Score and E value given at the right of the entry.

Can you find a homologous sequence from humans? (Hint: Search for the term “Homo.”) Note its Score and E value.

Most biochemists consider 25% identity the cutoff for sequence homology, meaning that if two proteins are less than 25% identical in sequence, more evidence is needed to determine whether they are homologs. Click on the Score values for the yeast and human proteins to see each sequence aligned with the Yersinia pestis sequence and to see the percent sequence identity. Are the yeast and human sequences homologous to the Yersinia pestis sequence?
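The percent identity on the alignment page is simple arithmetic on the aligned residues. This minimal Python sketch assumes two already-aligned sequences of equal length with "-" marking gaps and, following the convention BLAST uses for its Identities line, counts gap columns in the alignment length; other programs sometimes divide by the length of the shorter sequence instead, so values can differ between tools. The function name is illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences have the same residue.

    Gap characters ('-') count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)
```

Applying the 25% rule of thumb is then a single comparison, for example `percent_identity(query_row, subject_row) >= 25`, where the two row names stand for aligned strings you copy from the BLAST output.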

b. Use the BLAST online tutorial (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html) to discover the meaning of the Score and E value for each sequence that is reported. What is the difference between an identity and a conservative substitution? Provide an example of each from the comparison of your sequence and a homologous sequence obtained from BLAST (see Section 5-4A for a discussion of conservative substitution).
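The link between the Score (in bits) and the E value can be worked out with a little arithmetic. In the Karlin-Altschul statistics that BLAST uses, the expected number of chance hits with a bit score of at least S′, for a query of length m searched against a database of total length n, is approximately E = m · n · 2^(−S′). The Python sketch below applies that formula; it ignores BLAST's effective-length corrections, so its numbers only approximate those on the results page, and both function names are illustrative.

```python
import math


def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """Approximate E value: E = m * n * 2**(-S'), with S' the bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


def bits_for_evalue(e_value: float, query_length: int, database_length: int) -> float:
    """Bit score needed to reach a given E value: S' = log2(m * n / E)."""
    return math.log2(query_length * database_length / e_value)
```

Each additional bit of score halves the E value, which is why a strong hit can have an E value as small as e-138 while a marginal one is close to 1.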



c. BLAST uses a substitution matrix to assign values in the alignment process, based on the analysis of amino acid substitutions in a wide variety of protein sequences. Be sure you understand the meaning of the term “substitution matrix.” What is the default substitution matrix on the BLAST page? What other matrices are available? What is the source of the names for these substitution matrices? Repeat the BLAST search in Problem 4(a) using a different substitution matrix. Do you find different answers?
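To see how a substitution matrix turns an alignment into a score, and how an identity differs from a conservative substitution, here is a minimal Python sketch for an ungapped alignment. It holds only a handful of entries copied from BLOSUM62, one of the matrices you will meet on the BLAST page; a real search uses the full matrix (210 distinct pairs) plus gap penalties. The helper names are illustrative. Pairs with a positive score are what BLAST counts as "Positives," and identities are a subset of them.

```python
# A few BLOSUM62 entries; the full matrix covers all 210 amino acid pairs.
BLOSUM62_EXCERPT = {
    ("K", "K"): 5, ("R", "R"): 5, ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("D", "D"): 6, ("E", "E"): 5, ("W", "W"): 11,
    ("K", "R"): 2, ("I", "L"): 2, ("I", "V"): 3, ("D", "E"): 2, ("F", "Y"): 3,
    ("D", "K"): -1,
}


def pair_score(a: str, b: str) -> int:
    """Look up a pair in either order (the matrix is symmetric)."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    if (b, a) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this excerpt")


def score_ungapped(seq_a: str, seq_b: str) -> tuple:
    """Return (total score, identities, positives) for an ungapped alignment."""
    total, identities, positives = 0, 0, 0
    for a, b in zip(seq_a, seq_b):
        score = pair_score(a, b)
        total += score
        identities += a == b
        positives += score > 0  # identities and conservative substitutions
    return total, identities, positives
```

For example, aligning KIVD with RLVE gives one identity (V/V) and three conservative substitutions (K/R, I/L, D/E), all scoring positive.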

5. Plasmids and Cloning

a. REBASE is the Restriction Enzyme Database (http://rebase.neb.com/rebase/rebase.html), which is supported by a number of commercial restriction enzyme suppliers (restriction enzymes are described in Section 3-4A). Go to the REBASE Enzymes page (http://rebase.neb.com/rebase/rebase.enz.html) and find a restriction enzyme from Rhodothermus marinus (it starts with the letters Rma). What is the abbreviation for this enzyme?

Click on the enzyme's abbreviation to be taken to the page for this enzyme. Follow the links there to answer the following two questions. What is the recognition sequence for this enzyme? What are the expected and actual frequencies of restriction enzyme recognition sites for this enzyme in Bacillus halodurans C-125?
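An "expected" frequency comes from probability: if each base occurs independently, a specific n-base site appears about once every 4^n bases when all four bases are equally common (once per 256 bp for a 4-base site, once per 4096 bp for a 6-base site). This minimal Python sketch generalizes that to a chosen GC content. It treats the molecule as circular and counts a palindromic site once, since it reads the same on both strands. REBASE may compute its expected values differently (for example, from the genome's actual base or dinucleotide composition), so treat this as an illustration, and the function name as hypothetical.

```python
def expected_sites(sequence_length: int, site: str, gc_content: float = 0.5) -> float:
    """Expected number of occurrences of `site` in a random sequence.

    Assumes independent bases, with G and C each at gc_content / 2 and A and T
    each at (1 - gc_content) / 2, and one possible start per base (circular).
    """
    base_prob = {
        "G": gc_content / 2, "C": gc_content / 2,
        "A": (1 - gc_content) / 2, "T": (1 - gc_content) / 2,
    }
    probability = 1.0
    for base in site.upper():
        probability *= base_prob[base]
    return sequence_length * probability
```

The same reasoning explains why raising the minimum recognition site length in a digest program lowers the number of fragments: every extra base in a site divides its expected frequency by about four.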

b. What is a plasmid? pBR322 was one of the first plasmids to be developed for experimental work. Go to the Entrez site (http://www.ncbi.nlm.nih.gov/Entrez) and find the sequence of pBR322 by searching for the terms “pBR322, complete genome.” You must select Nucleotide as your search option on the Entrez main page.

Look through the Entrez description of pBR322, identify one gene encoded by pBR322, and name the antibiotic to which it confers resistance.

You can get Entrez to display your sequence in FASTA format by selecting this option next to the “Display” button. (Here are two of many sites that describe the FASTA format: http://ngfnblast.gbf.de/docs/fasta.html; http://bioinformatics.ubc.ca/resources/faq/?faq_id=1.) Save the pBR322 sequence in FASTA format.
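FASTA is simple enough to read with a few lines of code, which also helps later when you combine several sequences into one file. This minimal Python sketch assumes only the basic conventions described on the pages above: a header line beginning with ">" followed by one or more sequence lines. The function name `parse_fasta` is illustrative.

```python
def parse_fasta(text: str) -> list:
    """Split FASTA text into (header, sequence) pairs.

    A header line starts with '>'; the sequence is every following line up to
    the next header, joined with whitespace removed. Blank lines are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:  # ignore any text before the first header
            chunks.append("".join(line.split()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Reading a saved file is then `parse_fasta(open("pBR322.fasta").read())`, where the file name is a placeholder for whatever you saved; `len(seq)` on the returned sequence gives the plasmid length in base pairs.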

c. Go to PubMed Central and search for a 1978 article in Nucleic Acids Research about restriction mapping of pBR322. Download the article in PDF format (use Adobe Acrobat to read it; you can get this program at http://www.adobe.com). What is the size of the pBR322 plasmid in base pairs?

How many cut sites are there for the restriction enzyme HaeIII on pBR322?

d. Some restriction enzymes generate “blunt ends,” and some generate “sticky ends.” Explain the meaning of those terms and provide an example of each.

e. Go to the RESTRICT site at the Pasteur Institute (http://bioweb.pasteur.fr/seqanal/interfaces/restrict.html). Enter your email address at the top, then input the pBR322 sequence file. Scroll down to the “Required section” and note that the minimum recognition site length is set to four nucleotides and that all the enzymes available in REBASE are selected to digest pBR322 at the same time. Click on the “Run Restrict” button.

On the output screen, click on the “outfile.out” link. This takes you to a simple text page that lists all the cuts that were made in the pBR322 plasmid. How many pBR322 fragments did “all” the enzymes generate? (Look for the “HitCount” number on the outfile.out page.)

What happens to the number of fragments when the minimum recognition site length is changed to six nucleotides? Why did the number change?

f. Now change the enzyme name from “all” to “BamHI” in the enzymes box under the Required section on the RESTRICT page. How many fragments are generated? How many fragments are obtained using AvaI? What is the size of the restriction site for AvaI? How many fragments are obtained using Eco47III? What is the size of the restriction site for Eco47III?

g. How many pBR322 fragments are produced when the three different enzymes are combined (separate the enzyme names by commas)? How large are the fragments?

h. Use a mixture of the restriction enzymes BamHI, AvaI, and PstI to construct a restriction map of pUC18 similar to the one shown in Fig. 3-25. How does this procedure for restriction mapping differ from that used in Problem 10 at the end of Chapter 3?

i. For the adventurous: Find an enzyme or combination of enzymes that will produce 10 fragments from pUC18. Draw a restriction map of your results.

---END OF MODULE 1---



MODULE 2: Using Databases to Compare and Identify Related Protein Sequences

1. Obtaining Sequences from BLAST. Triose phosphate isomerase is an enzyme that occurs in a central metabolic pathway called glycolysis (see Chapter 14). It is also known as an enzyme that demonstrates catalytic perfection (see Section 12-1B). For this problem, you'll start with the sequence of triose phosphate isomerase from rabbit muscle and look for related proteins in the online databases. Here is the sequence of rabbit muscle triose phosphate isomerase in FASTA format: Click here for text version

a. Go to http://www.ncbi.nlm.nih.gov/BLAST and follow the link to Protein-protein BLAST (blastp) under Protein. Perform a BLAST search using the triose phosphate isomerase sequence by copying and pasting it into the Search box. Find a human homolog of rabbit muscle triose phosphate isomerase.

The first item in this record (gi|4507645|ref|NP_000356.1|) is a link to another database where this protein is described in more detail (items that begin with “gi” lead to GenBank records). The next item (triosephosphate isomerase 1 [Ho) is a description of the protein. Next is the score (493) followed by the E value (e-138). In bioinformatics, two proteins are called “homologs” if they arose from a common ancestor; the two proteins are called “orthologs” if they arose from a common ancestor and perform the same function in two different species. Does the NP_000356.1 entry represent a human ortholog of rabbit muscle triose phosphate isomerase? What is the percent identity between the two enzymes?

Find another human homolog to the rabbit muscle enzyme. Click on the link on the left side of the record to bring up its GenBank entry. Select “FASTA” as the display format and click on the “Display” button. Copy the FASTA text and save it to a text file (if you are using a word processor, be sure to save the file in “text only” format). Save the text file (suggested name: TIM_FASTA.txt) for later use.

b. Instead of trying to look through the entire BLAST output to find triose phosphate isomerase homologs from plants, bacteria, and archaea, you can use some options in BLAST to narrow your search. Return to the protein-protein BLAST page and paste the rabbit muscle sequence into the “Search” box. This time, look down the BLAST page for an option to select “Archaea” and then perform the BLAST search. Select one of the resulting sequences and save it in FASTA format. Repeat this process to get FASTA-formatted sequences for triose phosphate isomerases from a bacterial and a plant (Viridiplantae) source. Combine the five FASTA-formatted sequences (rabbit, human, archaeal, bacterial, and plant) in a single file (suggested name: TIM_5_FASTA.txt). This must be a simple text file with individual sequences separated by a blank line.

2. Multiple Sequence Alignment. Multiple sequence alignment is a tool to identify highly conserved residues in homologous proteins. A program called CLUSTALW will perform multiple sequence alignments on protein sets that are submitted in FASTA format. CLUSTALW is available as a command line program to be executed in a UNIX environment (not very user-friendly). Fortunately, the European Bioinformatics Institute has a web interface that performs CLUSTALW alignments: http://www.ebi.ac.uk/clustalw/.



[Note to instructors: The EBI asks that you alert them via email if you are using this resource for your course. Respond to http://www.ebi.ac.uk/support/.]

a. Go to the EBI site and submit your text file containing the five triose phosphate isomerase sequences in FASTA format on the input form page. There are many options for refining the alignment, but for now, use the default values. Be sure to enter your email address. The output of CLUSTALW can be accessed in many ways. The simplest version will be described here, but you are encouraged to explore other options (especially Jalview). In the simple text output, the sequences are optimally aligned and annotated: Residues that are identical in all chains are marked with an asterisk (*), those that are highly conserved are marked with a colon (:), and those that are semiconserved are marked with a period (.). From your multiple sequence alignment, how many identical residues did you find? Identify the residues, using the single-letter amino acid abbreviations found in Table 4-1. Classify these “identity” sites as polar, nonpolar, acidic, or basic amino acids. Do most of the “identities” fall into a single class of amino acids? If you plan to continue to Part (b), keep your browser open or bookmark the results page. You can learn more about CLUSTALW at a tutorial provided by EBI (http://www.ebi.ac.uk/2can/tutorials/protein/clustalw.html).
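Counting asterisks by hand is error-prone, so here is a minimal Python sketch that finds the same fully identical columns in aligned sequences (equal-length strings with "-" for gaps, for example copied out of the CLUSTALW output and joined across its blocks). The residue classes follow one common textbook grouping, which is an assumption: Table 4-1 may place some residues, such as G, C, Y, or H, in a different class. The function names are illustrative.

```python
from collections import Counter

# One common grouping (an assumption; adjust to match Table 4-1).
RESIDUE_CLASS = {
    **dict.fromkeys("AVLIMFWPG", "nonpolar"),
    **dict.fromkeys("STCNQY", "polar"),
    **dict.fromkeys("DE", "acidic"),
    **dict.fromkeys("KRH", "basic"),
}


def identical_columns(aligned: list) -> list:
    """(column index, residue) for every column identical in all sequences (the '*' columns)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for i, column in enumerate(zip(*aligned)):
        if column[0] != "-" and all(residue == column[0] for residue in column):
            columns.append((i, column[0]))
    return columns


def classify_identities(aligned: list) -> Counter:
    """Count the identical residues in each class."""
    return Counter(RESIDUE_CLASS[residue] for _, residue in identical_columns(aligned))
```

The number of entries from `identical_columns` should match the number of asterisks in the CLUSTALW output, and `classify_identities` answers the question of whether one class dominates.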

b. Figure 5-23 of the textbook shows a phylogenetic tree, which is described as “a diagram that indicates the ancestral relationships among organisms that produce the protein.” There are useful tutorials on phylogenetic trees at the Los Alamos National Laboratory web site (http://www.hiv.lanl.gov/content/hiv-db/TREE_TUTORIAL/Tree-tutorial.html), at the EBI help page (http://www.ebi.ac.uk/clustalw/tree_frame.html), and at the NCBI site (http://www.ncbi.nlm.nih.gov/About/primer/phylo.html). Complete one or all of these tutorials.

Scroll down the output page from the CLUSTALW program at EBI to the tree representations of the alignments. What is the difference between a cladogram and a phylogram? What do these trees tell you about triose phosphate isomerase from the five different species? The tree image on the EBI site is a dynamic image, meaning that you can't just cut and paste it. If you would like to capture this image, you can use the PrintScreen button on your computer and paste the image into a simple Paint program (with Mac OS X, use the program Grab for screen capture).

3. One-Dimensional Electrophoresis. Electrophoresis is a laboratory technique that is used to separate proteins and DNA molecules on the basis of size and charge. The principles of one-dimensional electrophoresis (1DE) are explained in Sections 3-4B and 5-2D. Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) is the most common form of 1DE. In this technique, proteins are mixed with a reducing agent (usually dithiothreitol or 2-mercaptoethanol) and a detergent (SDS), heated for 5 minutes, then separated on a polyacrylamide gel. You can explore 1DE at the Electrophoresis Simulation site (http://www.rit.edu/~pac8612/electro/Electro_Sim.html), which contains a Java applet that enables you to compare the migration of an unknown protein (you can choose from seven unknowns) with a series of standards. Visit the site and report the molecular weights you determine for each of the unknowns. Also, experiment with the controls (voltage, % acrylamide, animation speed). You can learn how to use the applet by clicking on the “How To” button.
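Estimating a molecular weight from standards rests on the observation that, over a gel's useful range, log10(MW) falls roughly linearly with relative mobility (Rf, the distance a band migrates divided by the distance the dye front migrates). This minimal Python sketch fits that line to the standards by least squares and reads an unknown off it. The function names are illustrative, and the simulation's own calculation may differ.

```python
import math


def fit_standard_curve(rf_values: list, molecular_weights: list) -> tuple:
    """Least-squares fit of log10(MW) = slope * Rf + intercept; returns (slope, intercept)."""
    ys = [math.log10(mw) for mw in molecular_weights]
    n = len(rf_values)
    mean_x = sum(rf_values) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in rf_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rf_values, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(rf: float, slope: float, intercept: float) -> float:
    """Read an unknown's molecular weight off the fitted standard curve."""
    return 10 ** (slope * rf + intercept)
```

Record the Rf and known molecular weight of each standard band, fit the curve once, then call `estimate_mw` with each unknown's Rf. The slope is negative because larger proteins migrate less far.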

4. Two-Dimensional Electrophoresis. Two-dimensional electrophoresis (2DE) is also described in Section 5-2D. In the first dimension, proteins are separated by isoelectric focusing; that is, they move to a position in a pH gradient according to their isoelectric point (pI, the pH at which the net charge of the protein is 0). Then they are separated according to molecular weight by SDS-PAGE in the second dimension, as described in Problem 3. For this part of the exercise you will need to retrieve the text file containing the triose phosphate isomerase sequences from five different species (Problem 1; TIM_5_FASTA.txt).

a. The ExPASy Proteomics Server contains many tools for analyzing data from two-dimensional electrophoresis gels, as well as a catalog of gels themselves. Go to the Primary structure analysis tools section of the ExPASy Proteomics server at http://us.expasy.org/tools/#primary. These tools will compute and predict values for a protein based only on its primary structure.

Select the Compute pI/Mw tool (http://us.expasy.org/tools/pi_tool.html). Enter the sequence for one of the five triose phosphate isomerase proteins in the data entry box. Be sure to enter only one amino acid sequence and do not include its FASTA header (e.g., >gi|17389815|gb|AAH17917.1| Triosephosphate isomerase 1 [Homo sapiens]), because the program will attempt to calculate pI and Mw values for each term entered. Record the predicted pI and molecular weight for the protein. Repeat these steps for the other four protein sequences. How similar are the pI and Mw values for the triose phosphate isomerases from the five different organisms?
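Both numbers come from the primary structure alone. The molecular weight is the sum of average residue masses plus one water, and the pI is the pH at which the summed partial charges of the ionizable groups reach zero, found here by bisection. This minimal Python sketch uses standard average residue masses but an illustrative set of pKa values that is an assumption; the ExPASy tool uses a different, published pKa set, so expect its pI values to differ somewhat from this sketch. Function names are illustrative.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524

# Illustrative pKa values (an assumption, not ExPASy's set).
PKA_BASIC = {"Nterm": 8.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def average_mass(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of the net charge at a given pH."""
    seq = seq.upper()
    basic = ["Nterm"] + [aa for aa in seq if aa in "KRH"]
    acidic = ["Cterm"] + [aa for aa in seq if aa in "DECY"]
    positive = sum(1 / (1 + 10 ** (ph - PKA_BASIC[g])) for g in basic)
    negative = sum(1 / (1 + 10 ** (PKA_ACIDIC[g] - ph)) for g in acidic)
    return positive - negative


def isoelectric_point(seq: str, tolerance: float = 1e-4) -> float:
    """Bisect for the pH where the net charge is zero (charge falls as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive: the pI is at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Running both functions on each of the five sequences gives a quick cross-check of the tool's output; differences in the second decimal place of the pI are expected from the pKa assumption.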

b. Now try to find triose phosphate isomerase on a published gel. Go to the SWISS-2DPAGE search page (http://us.expasy.org/cgi-bin/ch2d-search-de) and search for triose phosphate isomerase. Select the entry for the human enzyme. How do the reported values for pI and Mw compare with the theoretical values you obtained in Part (a)? If the values are different, can you suggest an explanation?

c. The best way to identify a protein spot on a 2DE gel is to use mass spectrometry (see Section 5-3D). The ExPASy Proteomics server has tools to predict fragmentation patterns based on the primary sequence of a protein and also to identify proteins based on fragmentation patterns from actual mass spectra. Go to the Protein identification and characterization tool section at the ExPASy Proteomics server (http://us.expasy.org/tools/#proteome). Select the “PeptideMass” tool. Paste in one of your triose phosphate isomerase sequences, verify that “trypsin” is selected as the enzyme, and leave the other options at their default settings. Click on the “Perform” button and record the four largest fragments that you would obtain if you digested the protein with trypsin. Do the same for the four other sequences. Are any of the fragmentation patterns identical between the species?
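A predicted tryptic digest follows a simple cleavage rule. This minimal Python sketch assumes the common textbook rule that trypsin cuts on the C-terminal side of lysine (K) or arginine (R) unless the next residue is proline (P); PeptideMass documents its own rules and offers options (missed cleavages, modifications) that this sketch ignores. It ranks peptides by length as a quick stand-in for "largest," whereas PeptideMass reports masses. The function names are illustrative.

```python
import re


def trypsin_digest(protein: str) -> list:
    """Cleave after K or R, except when the next residue is P."""
    # Zero-width split: after a K/R that is not followed by P (Python 3.7+).
    return [peptide for peptide in re.split(r"(?<=[KR])(?!P)", protein.upper()) if peptide]


def largest_peptides(protein: str, count: int = 4) -> list:
    """The `count` longest tryptic peptides, longest first."""
    return sorted(trypsin_digest(protein), key=len, reverse=True)[:count]
```

Comparing `trypsin_digest` output for the five sequences shows why fragmentation patterns differ between species: a single K or R substitution adds or removes a cleavage site.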



MODULE 3: Visualizing Three-Dimensional Protein Structures

There are a number of useful free visualization tools available on the Internet. Each has strengths and weaknesses. For this exercise you will use a tool called Rasmol that you can download from http://www.bernstein-plus-sons.com/software/rasmol/README.html. Rasmol is available for Macintosh, Windows, and Linux/UNIX operating systems. Install Rasmol on your computer according to the instructions on the Rasmol site (http://www.bernstein-plus-sons.com/software/rasmol/INSTALL.html). As you go through the exercises below, you are encouraged to visit any of a number of excellent Rasmol tutorials on the Internet: Gale Rhodes's tutorial at the University of Southern Maine (http://www.usm.maine.edu/~rhodes/RasTut/), Eric Martz's tutorial at the University of Massachusetts Amherst (http://www.umass.edu/microbio/rasmol/rasquick.htm), and David Hackney's tutorial (adapted to HTML by Will McClure) at Carnegie Mellon University (http://www.bio.cmu.edu/Courses/BiochemMols/RasMolTutorial/RasTut.html). If you are interested in exploring additional visualization tools, you can obtain free software via the Internet for Protein Explorer, KING (Kinemage), DeepView, Cn3D, Chime, Jmol, and BioEditor.

1. Obtaining Structural Information. Review the discussion of protein secondary structure (Section 6-1). Secondary structures in proteins include alpha helices, beta sheets, and beta turns.

a. Many programs have been written to predict secondary structures based only on the primary structure (amino acid sequence) of a protein. Here is a list of such programs that are available online:

1. PredictProtein (http://www.embl-heidelberg.de/predictprotein/predictprotein.html). You can use this site to request secondary structure predictions from seven different web servers. If this site is available, it will enable you to complete this problem by clicking on two or more of the optional services.

2. JPred (http://www.compbio.dundee.ac.uk/~www-jpred/). If you use the JPred server, be certain to check the box under #4 to avoid comparison to known PDB structure files.

3. NNPredict (http://www.cmpharm.ucsf.edu/~nomi/nnpredict.html)

For this exercise, you will use the sequence of rabbit muscle triose phosphate isomerase, given here: Click here for text version

Submit this sequence to two of the servers listed above. You may have to wait several minutes for the results. Compare the results you receive from the different servers. Can you identify segments where the predictions are not consistent between servers?

b. The structure of rabbit muscle triose phosphate isomerase has been determined by X-ray crystallography (Section 6-2A). Go to the Protein Data Bank web server (http://www.rcsb.org/pdb) and search for 1R2R (the PDB ID for this protein). Once you reach the Structure Explorer page for 1R2R, click on the link for Sequence Details. Scroll down the page to the section entitled “Sequence and Secondary Structure.” The results shown here for the secondary structure are based on an analysis of the actual (not predicted) three-dimensional structure, using the principles developed by Kabsch and Sander [see http://www.rcsb.org/pdb/help-results.html#sequence_details and Biopolymers 22, 2577–2637 (1983)]. The secondary structure assignments are H = alpha helix; B = residue in isolated beta bridge; E = extended beta strand; G = 3₁₀ helix; I = pi helix; T = hydrogen-bonded turn; S = bend. Compare your predicted secondary structure results from Part (a) with the results presented on the PDB site.
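One way to make the comparison quantitative is the three-state accuracy (Q3): reduce both the observed assignments and the prediction to helix (H), strand (E), or coil (C), then count matching residues. This minimal Python sketch uses one common reduction (H, G, I to helix; E, B to strand; everything else, including T, S, and blanks, to coil), which is an assumption since some studies map B or G differently. It also assumes the predicted and observed strings cover exactly the same residues. The function names are illustrative.

```python
# One common three-state reduction of the DSSP codes listed above.
DSSP_TO_THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def three_state(ss: str) -> str:
    """Map a per-residue string (DSSP or predictor output) onto H/E/C."""
    return "".join(DSSP_TO_THREE_STATE.get(c.upper(), "C") for c in ss)


def q3(predicted: str, observed_dssp: str) -> float:
    """Percent of residues whose predicted state matches the observed state."""
    pred, obs = three_state(predicted), three_state(observed_dssp)
    if len(pred) != len(obs):
        raise ValueError("strings must cover the same residues")
    return 100.0 * sum(p == o for p, o in zip(pred, obs)) / len(obs)
```

Predictor outputs often mark coil with "-", "L", or a blank; the mapping treats all of these as C, so you can paste them in directly. Low-scoring stretches are the segments where the servers disagree with the crystal structure.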

Note that the Protein Data Bank web site is undergoing revision, so some of the web addresses and specific instructions provided here may vary somewhat. For example, in the new site, you can go to the 1R2R main page and access the secondary structure information by clicking on the “Sequence Details” link on the right side of the screen under the image of the structure.

c. Follow the link on the PDB site for “Download File.” You can download the file in a number of formats, but it is best to download the file in PDB format for use with Rasmol. Save the structure file as 1R2R.pdb on your computer (suggested folder: My Documents/PDB Files). Open the Rasmol program, then use the drop-down menu File > Open to open 1R2R.pdb. You will initially see a wireframe model that simply displays all the bonds in the structure as lines. Perform the following steps to get a more informative view:

Select Display > Cartoons from the drop-down menu.

Select Colours > Structure.

Now you should be able to see the alpha helix and beta sheet structures in rabbit muscle triose phosphate isomerase. How many chains are shown in the structure? What is the dominant structural feature of this protein? How does your structure compare with Fig. 6-30c? Take time to experiment with the other drop-down menu options in Rasmol.

In addition to drop-down menus, Rasmol also has a “command line” window that enables you to select specific atoms or parts of a structure (amino acid residues, for example) and change the way they appear. There are more details in the tutorials listed above. Eric Martz has also prepared a helpful command list (http://www.umass.edu/microbio/rasmol/distrib/rasman.htm#chcomref).

Bring up the command line window and note the effects of entering the following commands:

1. select hetero and not water (selects nonprotein parts of the structure excluding water)

2. spacefill (a van der Waals radius representation)

3. color cpk (standard chemistry color scheme)

What heteroatoms do you see in this structure? Hint: Click on them to see their identities in the command line. You can also find the heteroatoms in a structure by looking at the summary information page of the PDB file. Are any substrates or inhibitors represented in this structure? Now try more commands:

4. select protein

5. cartoon off

6. select sheet

7. wireframe 30

8. spacefill 100 (these combined commands yield a ball-and-stick structure)

Can you see the sheet structure now? If not, type the command “cartoon.” What do you see?
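You can also list the hetero groups directly from the downloaded file, because the PDB format stores them in HETATM records with fixed columns: residue name in columns 18 to 20, chain identifier in column 22, and residue number in columns 23 to 26. This minimal Python sketch reads those columns and, like the Rasmol command "select hetero and not water," skips water (HOH). The function name is illustrative, and it makes no claim about which groups 1R2R contains.

```python
def hetero_groups(pdb_text: str, skip: tuple = ("HOH",)) -> list:
    """Distinct (residue name, chain, residue number) for HETATM records.

    Uses the fixed PDB columns (1-based): residue name 18-20, chain ID 22,
    residue sequence number 23-26. Residue names in `skip` are ignored.
    """
    groups = set()
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()
        if res_name in skip:
            continue
        groups.add((res_name, line[21:22], line[22:26].strip()))
    return sorted(groups)
```

With the file saved earlier, `hetero_groups(open("1R2R.pdb").read())` prints one entry per ligand or ion, which you can compare with what Rasmol shows after the first three commands.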

2. Exploring the Protein Data Bank. In the first problem, you visited the Protein Data Bank (PDB). You can explore that site in more detail now (see also Section 6-6). The Protein Data Bank web site (http://www.rcsb.org/pdb) is undergoing revision, so some of the web addresses provided in this set of exercises may become outdated. The new site incorporates a “Site Search” button that enables you to search the PDB site for teaching materials and tutorials, in addition to the standard “Search” box that can be used to find specific structures in the PDB. You may be able to find any of the materials described below using the “Site Search” button. You can also use the extensive Help files that are accessible from any page on the new PDB site.

The PDB is a repository of macromolecular structures. Perhaps the most important skill for a PDB site user is the ability to find a particular structure. There is a query tutorial at http://www.rcsb.org/pdb/query_tut.html that provides instructions on finding structures in the PDB. On the new PDB site, the query tutorial is contained in the Help files.

Each structure in the PDB is assigned a PDB ID (or PDBid), a four-character alphanumeric code that uniquely identifies that structure. For example, 4HHB is a hemoglobin structure and 8GCH is a chymotrypsin structure. If you know the PDB ID, you can use it to search the PDB. You can obtain PDB IDs from research publications. Most scientists who determine macromolecular structures are highly motivated to publish their findings in journals such as Science, Nature, Journal of Biological Chemistry, Journal of Molecular Biology, and Protein Science. These journals have an agreement with the PDB that requires authors to submit their structures to the PDB before the journal will publish the results. Most of the figures in the text that contain a molecular structure include the PDB ID (PDBid) for that structure in the figure legend.
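Before searching, it can help to check that an ID has the expected shape. The Python sketch below assumes the classic four-character format described above (taken here to be a digit from 1 to 9 followed by three letters or digits) and assumes the present-day RCSB download address https://files.rcsb.org/download/<ID>.pdb, which is not one of the addresses given in these exercises. The function names are hypothetical.

```python
import re
import urllib.request

# Assumed classic PDB ID format: a digit 1-9 followed by three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_valid_pdb_id(pdb_id: str) -> bool:
    """Return True if pdb_id looks like a classic four-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(pdb_id) is not None


def pdb_download_url(pdb_id: str) -> str:
    """Build the (assumed) RCSB download URL for a structure in PDB format."""
    if not is_valid_pdb_id(pdb_id):
        raise ValueError(f"not a valid PDB ID: {pdb_id!r}")
    # PDB IDs are not case-sensitive; upper case is used here for consistency.
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id: str, path: str) -> None:
    """Save the structure file for pdb_id to path (needs network access)."""
    urllib.request.urlretrieve(pdb_download_url(pdb_id), path)
```

For example, download_pdb("1o5x", "1o5x.pdb") would save the file used in Problem 3, provided the computer is online and the assumed address is still served.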

For your first PDB search, you will find a PDB ID in a journal article, then find that structure on the PDB site. Go to the Journal of Biological Chemistry web site (http://www.jbc.org) and search for this paper:

Parthasarathy, S., Eaazhisai, K., Balaram, H., Balaram, P., and Murthy, M.R.N., Structure of Plasmodium falciparum triose-phosphate isomerase-2-phosphoglycerate complex at 1.1-Å resolution. J. Biol. Chem. 278, 52461–52470 (2003).

Go to the footnotes section and find the four-character PDB ID code for the Plasmodium protein. Next, go to the Protein Data Bank main page. Type the PDB ID in the search box and click on the “Search” button to find the “Structure Explorer” page for this enzyme. You can investigate the linked resources on this page by completing the following exercises:

a. Download the PDB (structure) file for this protein to your computer (suggested folder: My Documents/PDB Files; suggested name: 1o5x.pdb). You will need this file for Problem 3, studying the protein's structure using Rasmol.

b. Download the protein sequence in FASTA format.

c. Find some still images of this protein on the PDB site. You can look under the “View Structure” link on the left side of the 1o5x “Structure Explorer” page. Scroll down the “View Structure” page until you come to “Still Images.” To save an image, right-click on it (Mac users: Control-click) and select the option that lets you save the file (in Internet Explorer, the command is “Download image to disk”; in Firefox, the command is “Save Image As...”).

d. Return to the “Structure Explorer” page for 1o5x. Click on the “Other Sources” link on the left side of the page. Follow the links for 1o5x to the sites at PDBSum and the IMB Jena Image Library. Collect still images from each of these sites, and be sure to keep a record of where you found each image. Suggest ways in which you can use such downloaded images.

3. Using Rasmol. In Problem 2, you saved the PDB file for 1o5x, entitled “Plasmodium falciparum TIM complexed to 2-phosphoglycerate.” You can use Rasmol to explore this structure, focusing on identifying secondary structures and looking at the active site.

a. Open the Rasmol program on your computer. (If you have not installed it already, please see the opening paragraph of the exercises for Chapter 6.) Open the file 1o5x.pdb. You will see a wireframe image: All the bonds in the PDB structure file are shown as thin wires, colored according to Corey-Pauling-Koltun (CPK) coloring rules (oxygen is red, nitrogen is blue, hydrogen is white, and carbon is gray). There are seven drop-down menus in Rasmol: File, Edit, Display, Colours, Options, Export, and Help. Spend a few minutes trying each command in each of the menus. Perform the following operations:

1. File > Information. This identifies the protein structure by name and PDB ID.

2. Display > Backbone. This shows the protein backbone; the bonds actually connect alpha carbons.

3. Display > Cartoon. This shows an image of the protein that clearly displays helices and sheets. Leave your image in cartoon format and move to the Colours menu.

4. Colours > Structure. This shows alpha helices in magenta, beta sheets in yellow, and turns in pale blue.

5. Options > Labels. This command labels all selected atoms. The view will not look good right now because all atoms are selected. If you select a single atom and then use the label command, you can attach text to that atom (call it by its name, or give it another label such as “inhibitor”).

6. Export > GIF. This function enables you to export a still image of the structural view you just created. Export your image as 1o5x.gif (store it somewhere accessible, such as the Desktop), then view it in a simple image viewer (Paint in Windows; Preview in Mac OS X).

7. Help > User Manual. This is a critical tool for using Rasmol. For this to work, the help file (Rasmol.hlp in Windows) must be stored in the same directory as the Rasmol program. Even then, Rasmol may ask you to find it on your computer system. The Help file is searchable. The Table of Contents has links to major features of the program, including frequently used items such as Command Reference, Atom Expressions, and Colour Schemes. The manual is also available at http://info.bio.cmu.edu/Courses/BiochemMols/RasFrames/TOC.HTM.

b. Open the Rasmol command line window, if it is not already visible. You will use this window to enter specific commands for viewing the structure, including highlighting the small molecules (3-hydroxypyruvic acid and 2-phosphoglyceric acid) that are bound to triose phosphate isomerase in the 1o5x file. But first you'll need to learn a little about viewing a PDB file.

Go to the Structure Explorer page for 1o5x at the PDB website (http://www.rcsb.org/pdb/cgi/explore.cgi?job=summary&pdbId=1O5X&page=&pid=190431099321055). Click on “Download/Display File” on the left side of the page. Under “Display the Structure File,” select the “HTML” option. This shows you the complete PDB file. There is a lot of information in this file, but you'll only look at a few items. For more details, you can go to the PDB Format Description page (http://www.rcsb.org/pdb/docs/format/pdbguide2.2/guide2.2_frame.html) or click on any links you see on the HTML page for 1o5x.

Each line in a PDB file is called a “record,” and the first six characters on that line tell what kind of record it is. In your browser, search for SEQRES. As explained in the PDB Format Description, SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the macromolecule, so you can read the sequence of your protein there. For 1o5x, the first few lines of the SEQRES section are given in Table I (below). Each line contains 13 amino acid residues listed by their three-letter abbreviations. Because lines 1 and 2 hold residues 1–26, residue 27 in chain A is the first residue on line 3: PHE (phenylalanine). The 12th character (counting spaces) in each record is a chain identifier. If a protein contains more than one polypeptide chain, the chains are identified with a letter (in this file, there are two chains: A and B).

Table I.

SEQRES   1 A  248  MET ALA ARG LYS TYR PHE VAL ALA ALA ASN TRP LYS CYS
SEQRES   2 A  248  ASN GLY THR LEU GLU SER ILE LYS SER LEU THR ASN SER
SEQRES   3 A  248  PHE ASN ASN LEU ASP PHE ASP PRO SER LYS LEU ASP VAL
SEQRES   4 A  248  VAL VAL PHE PRO VAL SER VAL HIS TYR ASP HIS THR ARG
SEQRES   5 A  248  LYS LEU LEU GLN SER LYS PHE SER THR GLY ILE GLN ASN

Anything in a PDB file that is neither protein nor nucleic acid is considered a heterogen atom and is referred to with the prefix “het.” Thus HETNAM is the label for a record that contains the name of a nonprotein, non-nucleic acid group. Search the HTML version of the 1o5x file for “HETNAM.” What are the heterogen groups in this structure?

c. Now you can display the heterogen groups in 1o5x. Go to the command line window of Rasmol and enter the command “select hetero and not water.” This command selects all the heterogen atoms excluding water. The current view of 1o5x should be a cartoon diagram of the structure. To show the heterogen molecules differently, enter the command “spacefill on” and then “color cpk.” This creates a space-filling representation of the two molecules and colors them according to the CPK conventions.

d. As the last part of this exercise, you will display the active site residues, based on Figure 4 of the primary citation for this structure (see Problem 2). Figure 4a shows three residues that interact with the 2-phosphoglycerate: glutamate 165, lysine 12, and histidine 95. Select these residues by entering the command “select lys12,his95,glu165” in the Rasmol command line window. Then use the drop-down menu in the structure window to show the residues in a ball-and-stick format (Display > Ball & Stick). Finally, enter the command “color cpk” so that you can distinguish the atoms of the structure.

To get a better look at the intermolecular interactions, you can zoom in on the structure by holding down the Shift key as you move the mouse; to zoom in, drag the mouse up in the structure window. To move the image side-to-side or up-and-down, hold down the right mouse button (Mac users: hold down the Option key) and drag the image to where you want it. Using a combination of Shift-drag and right-drag (or Option-drag), you can get a close-up view of the binding site for 2-phosphoglycerate.

To complete this exercise, identify and display additional residues that interact with the 2-phosphoglycerate in 1o5x. A couple of hints: Use Figures 3 and 4 from the primary citation. Also, Rasmol will not select 2-phosphoglycerate with the command “select 2PG,” but you can select it with “select 4400,” the residue number that identifies 2PG in 1o5x.
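The record layout explored in part (b) can also be read programmatically. The following is a minimal Python sketch, assuming a locally saved PDB-format file such as 1o5x.pdb and the fixed column positions given in the PDB Format Description (record name in columns 1–6; for SEQRES, the chain identifier in column 12 and residue names in columns 20–70; for HETNAM, the heterogen identifier in columns 12–14 and the name text in columns 16–70). All function names are hypothetical, and continuation lines of long HETNAM names are simply joined with a space, which may not reproduce every chemical name exactly.

```python
from collections import Counter, defaultdict


def read_pdb_lines(path):
    """Read a locally saved PDB-format file into a list of lines."""
    with open(path) as handle:
        return handle.read().splitlines()


def record_counts(lines):
    """Count records by type; the record name is the first six characters."""
    return Counter(line[0:6].strip() for line in lines if line.strip())


def seqres_chains(lines):
    """Collect the three-letter residue names in SEQRES records, per chain."""
    chains = defaultdict(list)
    for line in lines:
        if line[0:6] == "SEQRES":
            chain_id = line[11]                           # column 12
            chains[chain_id].extend(line[19:70].split())  # columns 20-70
    return dict(chains)


def het_names(lines):
    """Map each heterogen ID in HETNAM records to its (joined) chemical name."""
    names = {}
    for line in lines:
        if line[0:6] == "HETNAM":
            het_id = line[11:14].strip()   # columns 12-14
            text = line[15:70].strip()     # columns 16-70
            names[het_id] = f"{names.get(het_id, '')} {text}".strip()
    return names
```

For example, seqres_chains(read_pdb_lines("1o5x.pdb"))["A"][26] should return the residue that Table I lists as number 27, and het_names(read_pdb_lines("1o5x.pdb")) lists the heterogen groups asked about in part (b).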

4. Protein Families. The goal of this exercise is to identify a protein that shares structural homology with triose phosphate isomerase (as seen in PDB ID 1o5x) but catalyzes a different reaction. Because 1o5x is a fairly recently described structure, it is not well documented in other structural databases. Therefore, you will use an earlier PDB entry for triose phosphate isomerase from Plasmodium falciparum, PDB ID 1ydv. You will explore two resources, CATH and SCOP (see Section 6-6).

a. CATH. You can access the CATH homepage at http://www.biochem.ucl.ac.uk/bsm/cath/, but perhaps the easiest way to get to these resources is through the “Other Sources” link on the Protein Data Bank Structure Explorer page for 1ydv. Click on the CATH link to open a new browser window containing the CATH main page. Review the introduction to CATH before proceeding.

CATH describes proteins in a hierarchical fashion. What information is found in CATH under the following headings?

1. Class

2. Architecture

3. Topology

4. Homologous superfamily

Return to the PDB “Other Sources” page for 1ydv. Click on the “CATH” link for 1ydv (right side of the screen). Click on the “1ydvA0” link to find a list of proteins that are members of this superfamily. Explore the page to find five enzymes in this superfamily that catalyze reactions different from the reaction catalyzed by triose phosphate isomerase.

b. SCOP. Once again, you can go directly to the SCOP homepage (http://scop.mrc-lmb.cam.ac.uk/scop/), but it may be easier to get to these resources in SCOP by going through the “Other Sources” link on the Protein Data Bank Structure Explorer page for 1ydv. Click on the “SCOP” link to open a new browser window containing the SCOP main page. Read the synopsis before continuing. Then return to the PDB page and click on the “SCOP” link for 1ydv.

SCOP provides a lineage for each protein that is classified. Follow the lineage links at the Fold level to identify five proteins that are related to triose phosphate isomerase. Are any of these proteins also in your list for Problem 4(a)?

c. The final part of this exercise is to identify other resources that help you find proteins related to triose phosphate isomerase from Plasmodium falciparum. You are encouraged to follow other links from the PDB Other Sources page for 1ydv. You may also be able to find other resources by searching the Internet using the PDB ID codes. List and summarize three other resources that you find.


MODULE 4: Enzyme Inhibitors and Rational Drug Design

1. Dihydrofolate Reductase. In Section 12-4, enzyme inhibitors are identified as the second largest class of drugs. The first exercise for this chapter is to find the structure of an enzyme that has a competitive inhibitor bound to its active site. First, look in the Protein Data Bank for the enzyme dihydrofolate reductase (DHFR; see Section 22-3A). How many structures do you find for DHFR? Because there are too many to analyze all at once, you can limit your search to DHFR complexed with an inhibitor. On the PDB results page, select “Refine your query” from the “Pull down to select option” menu. Enter the words “inhibitor and x-ray” and click on the option for a full text search. Go down the list to find DHFR complexed with NADPH (a substrate) and an inhibitor.

Download the PDB file and use Rasmol to visualize this structure (see the exercises for Chapter 6). Display the protein in Ribbons format, colored by structure. Then select NADPH and the inhibitor (using the command “select hetero and not water”) and show them as space-filling models with CPK coloring.

2. HIV Protease. There is a description of structure-based drug design in Section 12-4A, and Box 12-4 describes two drugs that are directed at the protease from human immunodeficiency virus (HIV protease). In fact, these were some of the very first drugs to be created using structure-based drug design. The first drugs directed against HIV targeted the reverse transcriptase (Boxes 12-4 and 24-2), but the virus quickly developed resistance to these drugs. Researchers in academia and at pharmaceutical companies began studying HIV protease when initial results indicated that it might be useful as an additional drug target to delay the onset of AIDS. You are encouraged to look for recent reviews on PubMed Central (http://www.pubmedcentral.gov) or at your local library (try journals such as Current Opinion in Chemical Biology and Trends in Biochemical Sciences) to find more information on structure-based drug design.

a. Go to GenBank (http://www.ncbi.nlm.nih.gov/) and search for the protein sequence of HIV protease. Many mutant forms of HIV protease have been sequenced, so you may find a mutant sequence. To get the native sequence, you'll have to reverse the mutation. For example, if you have an L90M mutant sequence, you'll simply need to replace the M (methionine) at position 90 with L (leucine). Save the sequence to a file in FASTA format.

b. Do a BLAST search with this sequence to find homologous proteins. HIV protease is a member of the aspartic protease family, which includes pepsin (Box 12-4). What other proteases also appear in your BLAST search?

c. Now search the Protein Data Bank for HIV protease structures. How many do you find? Rather than searching through these structures to find particular inhibitors, start a new search using the name of an inhibitor, such as saquinavir or ritonavir, both of which are mentioned in Box 12-4. Search for those terms. Does either appear in the Protein Data Bank?

d. To take a closer look at the HIV protease complex with ritonavir, download the PDB file 1HXW and open it in Rasmol. Use the drop-down menu to display the protein structure as a cartoon. Then color by Structure.

The drug ritonavir is identified on the PDB Structure Explorer site for 1HXW by the abbreviation “RIT.” Bring up the Rasmol command line window. Type in “select RIT” and press Return. Then type in “wireframe 100.” You should see the ritonavir in wireframe format with CPK coloring. Now you can get a better look at how ritonavir affects HIV protease. Look at the picture of ritonavir in Box 12-4. Identify the feature (in red) that mimics the geometry of the transition state. Find the tetrahedral carbon atom in Rasmol. Note that when you click on an atom in the Rasmol structure window, the atom is identified in the command line window. You may want to make the protein disappear temporarily by entering “select protein; cartoon off” in the command line window (you can make it reappear with the command “cartoon on”). When you click on the correct carbon atom (in the backbone between the two phenyl groups), it will be listed in the command line window:

Rasmol > Atom: C13 1865 Hetero: RIT 301.

Now identify the two aspartate residues in HIV protease that interact with ritonavir. You can use the within command to do this: Enter “select asp and within (5.0, atomno=1865).” This command selects all aspartate residues within 5.0 angstroms of atom number 1865 in the PDB structure file. Which aspartate residues are close to the carbon atom you identified above? Identify these residues by number and chain.

e. Find the names of additional HIV protease inhibitors (using Google, for example) and see whether they occur in structures in the PDB. Explore the structures using Rasmol and identify interactions between the hydrophobic side chains on the inhibitors and the surface of HIV protease.
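Two bookkeeping steps in this problem can be sketched in Python: undoing a point mutation before saving a FASTA file in part (a), and the distance test behind the Rasmol within command in part (d). This is a minimal sketch under stated assumptions: mutations use the usual wild-type/position/mutant notation (such as L90M) with 1-based positions, atom records follow the fixed PDB columns, and every function name is hypothetical. Unlike Rasmol, which selects individual atoms, residues_within reports each residue of the requested type that has at least one atom within the cutoff.

```python
import math
import re
from dataclasses import dataclass


# Part (a): undo a point mutation written as <wild type><position><mutant>, e.g. "L90M".
def revert_mutation(sequence: str, mutation: str) -> str:
    """Return the sequence with the mutant residue changed back to wild type."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    index = position - 1  # sequence positions are 1-based
    if sequence[index] != mutant:
        raise ValueError(f"expected {mutant} at {position}, found {sequence[index]}")
    return sequence[:index] + wild_type + sequence[index + 1:]


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as FASTA: a '>' header line, then wrapped residues."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


# Part (d): residues of one type with any atom within a cutoff of a chosen atom.
@dataclass(frozen=True)
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse an ATOM or HETATM record using the fixed PDB column positions."""
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
    )


def read_atoms(path: str) -> list:
    """Read all ATOM and HETATM records from a saved PDB file."""
    with open(path) as handle:
        return [parse_atom_line(line) for line in handle
                if line.startswith(("ATOM  ", "HETATM"))]


def residues_within(atoms, center_serial: int, cutoff: float, res_name: str) -> list:
    """Return sorted (res_name, res_seq, chain) tuples near atom center_serial."""
    center = next(a for a in atoms if a.serial == center_serial)
    origin = (center.x, center.y, center.z)
    hits = {
        (a.res_name, a.res_seq, a.chain)
        for a in atoms
        if a.res_name == res_name and math.dist((a.x, a.y, a.z), origin) <= cutoff
    }
    return sorted(hits, key=lambda residue: (residue[2], residue[1]))
```

For part (d), residues_within(read_atoms("1HXW.pdb"), 1865, 5.0, "ASP") should correspond to the aspartates that the Rasmol command highlights, which gives you an independent check on your answer.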

3. Pharmacogenomics and Single Nucleotide Polymorphisms. PubMed Central contains an excellent article that reviews the role of pharmacogenomics in medicine and drug discovery. Go to http://www.pubmedcentral.gov and search for “pharmacogenomics review.” One of the articles that you should find is Chiche, J.-D., Cariou, A., and Mira, J.-P., Bench-to-bedside review: Fulfilling promises of the Human Genome Project, Critical Care 6, 212–215 (2002). Oak Ridge National Laboratory (ORNL) also has an excellent site on pharmacogenomics at http://www.ornl.gov/sci/techresources/Human_Genome/medicine/pharma.shtml. Pharmacogenomics is a very broad and rapidly expanding field. This exercise is a general guide to introduce you to some relevant database and literature resources.

a. Using the article above and the ORNL site, define the following terms: pharmacogenomics, single nucleotide polymorphism (SNP), and cytochrome P450. What is the significance of SNPs in the function of cytochrome P450 and drug metabolism (see also Section 12-4D)?

b. Look in Entrez (http://www.ncbi.nlm.nih.gov) for cytochrome P450. You will have a number of options to explore. Explore PubMed, PubMed Central, Books, and OMIM to find documents relating to cytochrome P450 and single nucleotide polymorphisms.

c. Return to the “Entrez” page. This time, explore the databases for information on cytochrome P450 and single nucleotide polymorphisms. Suggested sites are the Protein sequence database and the SNP database. Describe the results of your exploration.


MODULE 5: Metabolic Enzymes, Microarrays, and Proteomics

Here is a list of a few useful and reliable online resources about metabolism:

The Biology Project at the University of Arizona: http://www.biology.arizona.edu/biochemistry/biochemistry.html

Metabolic Pathways of Biochemistry at George Washington University: http://www.gwu.edu/~mpb/

Chemistry Biology Information Center at ETH Zurich: http://www.infochembio.ethz.ch/links/en/biochem_metabolismus.html

Main Metabolic Pathways on the Internet: http://home.wxs.nl/~pvsanten/mmp/main.htm

Kyoto Encyclopedia of Genes and Genomes (KEGG): http://www.genome.ad.jp/kegg/metabolism.html

Enzyme Structures Database: http://www.ebi.ac.uk/thornton-srv/databases/enzymes/

1. Metabolic Enzymes. In Chapter 12, you looked at the role of enzyme inhibitors as drugs. In this exercise, you will use some online resources to learn more about the enzymes involved and the pathways that are affected.

a. Look in your textbook for dihydrofolate reductase (DHFR). Write out the reaction catalyzed and the pathway involved (see Section 22-3B and Box 22-1).

b. Go to the Enzyme search page at the KEGG site (http://www.genome.jp/dbget-bin/www_bfind?enzyme) and search for the enzyme (by name). Now look for links that lead to pathways that include DHFR. Where does DHFR appear in each pathway? Is this consistent with your findings in the textbook?

c. Go to the Enzyme Structures Database (http://www.ebi.ac.uk/thornton-srv/databases/enzymes/) and find dihydrofolate reductase. Every enzyme with a known reaction is classified in a hierarchy: EC #.#.#.#, where # represents a number (see Section 11-1A). What is the enzyme classification for dihydrofolate reductase? Explore the hierarchy for dihydrofolate reductase. What does each of the numbers in the hierarchy represent?
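The four-level structure of an EC number can be made explicit with a small parser. This is an illustrative Python sketch with hypothetical names; its example uses triose phosphate isomerase (EC 5.3.1.1) so that the dihydrofolate reductase question is left for you. The class names follow the IUBMB Enzyme Nomenclature; class 7 (translocases) was added in 2018 and may be missing from older databases. Codes with provisional serial numbers (containing letters) are rejected by this sketch.

```python
from typing import NamedTuple

# Top-level EC classes (IUBMB Enzyme Nomenclature).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


class ECNumber(NamedTuple):
    enzyme_class: int   # general type of reaction
    subclass: int       # refines the reaction type
    sub_subclass: int   # refines it further
    serial: int         # individual enzyme within the sub-subclass

    @property
    def class_name(self) -> str:
        return EC_CLASSES[self.enzyme_class]

    def levels(self) -> list:
        """Cumulative codes from the class down to the individual enzyme."""
        parts = [str(p) for p in self]
        return [".".join(parts[: i + 1]) for i in range(4)]


def parse_ec(code: str) -> ECNumber:
    """Parse a code such as 'EC 5.3.1.1' or '5.3.1.1'."""
    text = code.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a complete EC number: {code!r}")
    return ECNumber(*(int(p) for p in parts))
```

Calling parse_ec("EC 5.3.1.1").levels() walks down the hierarchy one level at a time, which mirrors how you browse it in the Enzyme Structures Database.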

2. Microarrays. Malcolm Campbell at Davidson College has done a remarkable job of making a high-end technology (microarrays; see Section 13-4C) available to researchers (students and faculty) at the undergraduate level.

a. Visit Malcolm Campbell's site at Davidson and go through the following web exercise: DNA Microarray Methodology (a Flash animation) at http://www.bio.davidson.edu/courses/genomics/chip/chipQ.html.

b. For more advanced background on microarrays, visit Manish Patel's microarray tutorial at http://www.ucl.ac.uk/oncology/MicroCore/HTML_resource/tut_frameset.htm.

c. Visit PubMed Central or PubMed and find a review article on the use of microarrays to study one of the following diseases: breast cancer, lymphoma, hypertension, atherosclerosis, or a disease that is of particular interest to you. Provide the citation for the article you found and explain how microarray technology was applied.

3. Proteomics. Proteomics (Section 13-4D) is the study of all the proteins expressed in an organism or tissue under a specific set of conditions. To gain a broader understanding of proteomics, read the following article: Graves, P.R. and Haystead, T.A., “Molecular biologist's guide to proteomics,” Microbiol. Mol. Biol. Rev. 66, 39–63 (2002), which is available at PubMed Central. After reading this article and reviewing other available resources, answer the following questions:

a. What analytical techniques are used most commonly to separate proteins in proteomics?

b. How can proteins be identified with certainty in proteomics?

c. What is meant by the phrase “the dynamic range of protein expression”? You will need to find an additional source to answer this question; it is not addressed in the “Molecular Biologist's Guide to Proteomics.” Can you find quantitative values in the literature to further define this term? Leigh Anderson has published some informative articles on the human plasma proteome; this would be a good place to look.

4. Two-Dimensional Gel Electrophoresis. A major proteomics tool is two-dimensional gel electrophoresis (2DE; see Problem 4 in the Bioinformatics exercises for Chapter 5). One of the techniques you encountered in Problem 3(a) was 2DE. One of the best bioinformatics sites on the web is the ExPASy server in Geneva, Switzerland (http://www.expasy.org), which has a database/tools site for 2DE called Swiss-2DPAGE.

a. Open the Swiss-2DPAGE site (http://www.expasy.org/ch2d/). Follow the link to search the site by description and find out how many human proteins are catalogued there (enter “human” as the search keyword).

b. Return to the search page and search for dihydrofolate reductase. How many listings do you find? Is a human version of DHFR catalogued here?



c. Start over and search for E. coli proteins (enter “E. coli” as the search keyword). Note that if you use “E. coli,” you'll get very different results than if you just use “coli.” Now look for dihydrofolate reductase in E. coli. (Use the find function in your browser to search the E. coli results.) Follow the links to E. coli DHFR to answer the following questions:

1. What is the theoretical molecular weight (Mw) and isoelectric point (pI) for E. coli DHFR?

2. What actual values were obtained by 2DE?

3. What peptide fragment was used to identify E. coli DHFR? Use the BLAST server to find out

where this peptide is located in the E. coli DHFR sequence.

d. Search the Protein Data Bank for structures of DHFR. Has the three-dimensional structure of this enzyme been determined? Keep searching to see whether there are three-dimensional structures of E. coli DHFR complexed with methotrexate. What is methotrexate, and how is it used in the treatment of disease? (Consult Box 22-1 and other resources for the answer.)
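Part (c) of this problem asks for theoretical Mw and pI values and for the position of a peptide in the E. coli DHFR sequence. The Python sketch below shows one common way such theoretical values are computed: Mw as the sum of average residue masses plus one water, and pI as the pH at which a Henderson-Hasselbalch estimate of net charge reaches zero, located by bisection. The pKa set and all function names are assumptions for illustration; ExPASy's own Compute pI/Mw tool uses different pKa values, and modifications such as removal of the initiator methionine are not modeled, so expect small differences from the Swiss-2DPAGE entry. A plain substring search only stands in for BLAST when the peptide matches the sequence exactly.

```python
# Average residue masses in daltons (free amino acid mass minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini

# Assumed, approximate pKa values for illustration only; other tools use other sets.
PKA_POSITIVE = {"Nterm": 8.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def average_mass(sequence: str) -> float:
    """Theoretical average molecular weight of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def net_charge(sequence: str, ph: float) -> float:
    """Estimate net charge at a given pH with the Henderson-Hasselbalch equation."""
    seq = sequence.upper()
    positive = 1 / (1 + 10 ** (ph - PKA_POSITIVE["Nterm"]))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa])) for aa in "KRH")
    negative = 1 / (1 + 10 ** (PKA_NEGATIVE["Cterm"] - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph)) for aa in "DECY")
    return positive - negative


def isoelectric_point(sequence: str, tolerance: float = 0.001) -> float:
    """Find the pH where the net charge is zero by bisection between pH 0 and 14."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        middle = (low + high) / 2
        if net_charge(sequence, middle) > 0:
            low = middle   # still positively charged: the pI lies at higher pH
        else:
            high = middle
    return (low + high) / 2


def peptide_positions(sequence: str, peptide: str) -> list:
    """1-based start positions of every exact occurrence of peptide in sequence."""
    positions = []
    start = sequence.find(peptide)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(peptide, start + 1)
    return positions
```

Comparing average_mass and isoelectric_point for the E. coli DHFR sequence with the values listed in Swiss-2DPAGE shows how much the choice of pKa set matters.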

