Bio bikepresentation

Date post: 02-Nov-2014
Description:
Biobike how to ?
BioBike: a web-based environment for integration and analysis of Biological knowledge Biniam Abebe
Transcript
Page 1: Bio bikepresentation

BioBike: a web-based environment for integration and analysis of Biological knowledge

Biniam Abebe

Page 2: Bio bikepresentation

• Scope

• How to use BioBike?

• Introduction

• What is BioBike ?

• How can we use BioBike to answer some of the questions we have in our research?

Page 3: Bio bikepresentation

globin

Highly filtered output
• Easy to grasp
• High-level insights

Unfiltered output
• Confusing
• Basic insights

PROGRAMMER

Page 4: Bio bikepresentation

Integration
- Information
- Resources

Page 5: Bio bikepresentation

We need…

Biologists . . .

. . . and Programmers

Page 7: Bio bikepresentation

We need…

Page 8: Bio bikepresentation

What is BioBike ?

Page 9: Bio bikepresentation

biobike.csbc.vcu.edu

Page 10: Bio bikepresentation

BioBIKE INSTANCES AND THEIR KNOWLEDGE AND DATA BASES

A BioBIKE instance provides a framework for all available information needed by a given research community, including:

1. Sets of genomic sequences
2. Gene annotations
3. Functional descriptions
4. Formal categories (e.g. COG)
5. Hierarchical groupings of metabolic reactions linked with genes (from KEGG)

More…

Page 11: Bio bikepresentation

Current BioBIKEs

• CyanoBIKE
  • 42 cyanobacteria
• Phantome/BioBIKE
  • 6 archaeal viruses, 758 bacteriophages, 754 eubacteria, 1 eukaryotic virus
• Streptobike
• Staphylobike
• ViroBIKE

Page 12: Bio bikepresentation

BioBIKE provides access to several commonly used programs:

• Blast, for sequence searches
• Clustal, for multiple sequence alignments
• Meme, for motif discovery
• RNAz, for discovery of conserved RNA sequences
• Phylip, for construction of phylogenetic trees

All are accessed through the same interface.

Page 13: Bio bikepresentation

Why BioBIKE?

• Intelligibility
• Computability of results and nesting
• Small working vocabulary
• Implied iteration
• Extensibility

Page 14: Bio bikepresentation

WELCOME TO BioBIKE !

Biological Integrated

Knowledge Environment

Page 15: Bio bikepresentation

The BioBIKE environment is divided into three areas as shown. You'll bring functions down from the function palette to the workspace, execute them, and note the results in the results window

Function palette

Workspace

Results window

Page 16: Bio bikepresentation

Construct the code you want to execute here!

For a visual guide to the VPL, click here

Two very important buttons on the function palette:

On-line help (general)

Something went wrong? Tell us!

HELP!

PROBLEM

Page 17: Bio bikepresentation

Construct the code you want to execute here!

For a visual guide to the VPL, click here

Two very important buttons in the workspace:

Undo (return to workspace before last action)

Redo (Get back the workspace you undid)

Page 18: Bio bikepresentation
Page 21: Bio bikepresentation

A COUNT-OF function box is now in the workspace.

Before continuing with the problem, let's consider what function boxes mean.

Page 23: Bio bikepresentation

General Syntax of BioBIKE

Function-name Argument(object)

Keyword object Flag

The basic unit of BioBIKE is the function box. It consists of the name of a function, perhaps one or more required arguments, and optional keywords and flags.

A function may be thought of as a black box: you feed it information, it produces a product.

Page 24: Bio bikepresentation

General Syntax of BioBIKE

Function-name Argument(object)

Keyword object Flag

Function boxes contain the following elements:

• Function-name (e.g. SEQUENCE-OF or LENGTH-OF)
• Argument: Required, acted on by function
• Keyword clause: Optional, more information
• Flag: Optional, more (yes/no) information

Page 25: Bio bikepresentation

General Syntax of BioBIKE

Function-name Argument(object)

Keyword object Flag

… and icons to help you work with functions:

• Option icon: Brings up a menu of keywords and flags

• Clear/Delete icon: Removes information you entered or removes box entirely

• Action icon: Brings up a menu enabling you to execute a function, copy and paste information, get help, etc.

Page 26: Bio bikepresentation

Functions

SinAngle

Sin (angle)

Page 27: Bio bikepresentation

Functions

Length

Entity

Page 28: Bio bikepresentation

Functions

Length (Entity)

Abraham Lincoln → 192

"Abraham Lincoln" → 14

variable vs literal

Page 29: Bio bikepresentation

Functions

Length (Entity)

Abraham Lincoln → 192

"Abraham Lincoln" → 14

US-presidents → 44

list vs single value

Page 30: Bio bikepresentation

Functions

Length (Entity)

Abraham Lincoln → 192

"Abraham Lincoln" → 14

US-presidents → 44

US-presidents → (188 170 189 163 …)

single application of a function vs iteration of a function
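
The three contrasts on these slides (variable vs literal, list vs single value, single application vs iteration) can be mimicked in Python. This is only an analogy, not BioBIKE code: the helpers `length` and `length_of_each` and all sample data are made up for illustration.

```python
def length(entity):
    """Length of one value: characters in a string, or items in a list."""
    return len(entity)


def length_of_each(entities):
    """Iteration: apply `length` to every element and collect the results."""
    return [length(e) for e in entities]


# A literal is the text itself; a variable is a name that stands for a value.
name_literal = "Ada Lovelace"            # literal string
genome = "ACGTACGTAC"                    # variable holding a made-up sequence
genes = ["ATG", "ATGAAA", "ATGAAATAG"]   # variable holding a list

print(length(name_literal))    # length of the literal text
print(length(genome))          # length of the value the variable stands for
print(length(genes))           # single application: number of items in the list
print(length_of_each(genes))   # iteration: one length per item
```

The same function name gives a different kind of answer depending on whether it is applied once to the whole list or once per element, which is the distinction the last slide draws.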

Page 31: Bio bikepresentation

Arcsin

Functions

SinAngle

Angle

Page 32: Bio bikepresentation

Arcsin

Functions

Angle

Sin (angle)

Nested functions

Evaluated from the inside out

A box is replaced by its value
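
A small Python sketch of the same idea, using the Sin and Arcsin functions from the slide. The value of `angle` is an arbitrary example; nothing here is BioBIKE syntax.

```python
import math

angle = 0.5  # radians; an arbitrary example value

# Step 1: the inner box is evaluated first and replaced by its value.
inner = math.sin(angle)
# Step 2: the outer function then acts on that value.
outer = math.asin(inner)

# Writing the boxes nested gives the same result in a single expression.
nested = math.asin(math.sin(angle))
print(nested == outer)
```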

Page 33: Bio bikepresentation

Gene (npf0076)

Functions

"transposase"

Page 34: Bio bikepresentation

Gene (npf0076)

Nested functions

Evaluated from the inside out

A box is replaced by its value

Page 35: Bio bikepresentation

Gene (npf0076)

Functions

Options

Modify the characteristics of the function they govern

Page 36: Bio bikepresentation

Gene (npf0076)

Pitfalls (the most common error in the language)

CLOSE BOXES BEFORE EXECUTING

White is incompatible with execution

Page 37: Bio bikepresentation

Distinction between a result and a display

result

display

Page 38: Bio bikepresentation
Page 39: Bio bikepresentation

• Speaks molecular biology

Page 40: Bio bikepresentation

• Speaks common bioinformatic tools

Page 43: Bio bikepresentation
Page 44: Bio bikepresentation

Demo

Page 45: Bio bikepresentation

BioBIKE

Page 46: Bio bikepresentation

Tour of BioBIKE: Integration of sequences across organisms & human insight

We are interested in a highly conserved hypothetical protein:

asr1156

Page 47: Bio bikepresentation
Page 48: Bio bikepresentation

Very strange: it starts at a different place in different cyanobacteria!

Is the start wrong?

A. Collect the NT sequence, including the upstream region. HOW???
B. Translate into AA sequence
C. Repeat X times
D. Make an alignment

Page 49: Bio bikepresentation

STEP I

A. Find orthologs in other cyanobacteria

Page 56: Bio bikepresentation

STEP II

A. Align the proteins of the previous result

Page 61: Bio bikepresentation
Page 62: Bio bikepresentation
Page 63: Bio bikepresentation
Page 64: Bio bikepresentation
Page 65: Bio bikepresentation
Page 66: Bio bikepresentation

STEP II

A. Align the proteins of the previous result
B. Align the protein sequences extended upstream

Page 69: Bio bikepresentation

STEP II

A. Align the proteins of the previous result
B. Align the protein sequences extended upstream

A function may be applied directly to another function

Page 79: Bio bikepresentation

The start is wrong !

Page 80: Bio bikepresentation
Page 81: Bio bikepresentation
Page 82: Bio bikepresentation

Tour of BioBIKE: integration of metabolism information, bioinformatic tools & human knowledge

How to find a regulatory motif?

Example: GlnA

Page 83: Bio bikepresentation

Mission impossible!!!

Page 84: Bio bikepresentation

A. Find GlnA in the cyanobacterial genomes

Page 85: Bio bikepresentation
Page 86: Bio bikepresentation
Page 87: Bio bikepresentation
Page 88: Bio bikepresentation

A. Find GlnA in the cyanobacterial genomes

B. Collect the sequences upstream

Page 91: Bio bikepresentation
Page 92: Bio bikepresentation

A. Find GlnA in the cyanobacterial genomes

B. Collect the sequences upstream

C. Search for a conserved motif among these sequences using MEME

Page 95: Bio bikepresentation
Page 96: Bio bikepresentation
Page 97: Bio bikepresentation
Page 98: Bio bikepresentation

OR

Page 99: Bio bikepresentation
Page 100: Bio bikepresentation
Page 101: Bio bikepresentation
Page 102: Bio bikepresentation
Page 103: Bio bikepresentation

We have found a potential NtcA binding site!

GT9NTAC

Page 104: Bio bikepresentation

Demo

Page 105: Bio bikepresentation

Tour of BioBIKE II

In this tour, you'll see how to:

• Find the number of contigs in a metagenome
• Find the average contig size in a metagenome
• Find the average GC content within a metagenome
• Visualize the distribution of GC content amongst the contigs of a metagenome
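
To make the four goals of this tour concrete, here is a minimal Python sketch of the underlying calculations on a toy list of contigs. It is not BioBIKE code: the function names, the choice of ten bins, and the sample sequences are all assumptions made for illustration, and "average GC content" is taken here to mean the unweighted mean of the per-contig GC fractions.

```python
def gc_fraction(seq):
    """Fraction of G and C letters in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def metagenome_summary(contigs, n_bins=10):
    """Summarize a list of contig sequences.

    Returns the contig count, the mean contig length, the mean per-contig
    GC fraction, and a histogram: a list of n_bins counts, where bin i
    covers GC fractions from i/n_bins up to (i+1)/n_bins.
    """
    if not contigs:
        raise ValueError("need at least one contig")
    fractions = [gc_fraction(c) for c in contigs]
    histogram = [0] * n_bins
    for f in fractions:
        # Clamp so that a GC fraction of exactly 1.0 falls in the last bin.
        histogram[min(int(f * n_bins), n_bins - 1)] += 1
    return {
        "count": len(contigs),
        "mean_length": sum(len(c) for c in contigs) / len(contigs),
        "mean_gc": sum(fractions) / len(fractions),
        "gc_histogram": histogram,
    }


toy_contigs = ["ATGC", "GGCC", "ATAT", "GCGCAT"]  # made-up data
summary = metagenome_summary(toy_contigs)
print(summary)
```

A length-weighted GC average (total G+C over total length) is an equally reasonable reading of the third goal; the slides do not say which one the tour computes.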

Page 106: Bio bikepresentation

Construct the code you want to execute here!

For a visual guide to the VPL, click here

Our Story

Suppose you have a special interest in a sequence, a contig, derived from the metagenome taken from the Arctic Ocean. The metagenome is called p-arct.

How many sequences does that metagenome contain?

Clicking on or hovering over any palette button brings down choices of functions or data to bring into the workspace. Click on the LISTS-TABLES button to reach a function that will count how many sequences are within the list of sequences that make up the metagenome p-Arct.

Page 109: Bio bikepresentation

A COUNT-OF function box is now in the workspace.

Before continuing with the problem, let's consider what function boxes mean.

Page 110: Bio bikepresentation

Back to our story… we wanted to count the number of contiguous sequences in our favorite metagenome p-Arct.

Click on the gray argument box to activate it for entry, either from the keyboard or by insertion.

Page 111: Bio bikepresentation
Page 112: Bio bikepresentation
Page 113: Bio bikepresentation
Page 114: Bio bikepresentation
Page 115: Bio bikepresentation
Page 116: Bio bikepresentation
Page 117: Bio bikepresentation
Page 118: Bio bikepresentation
Page 119: Bio bikepresentation
Page 120: Bio bikepresentation
Page 121: Bio bikepresentation
Page 122: Bio bikepresentation
Page 123: Bio bikepresentation
Page 124: Bio bikepresentation
Page 125: Bio bikepresentation
Page 126: Bio bikepresentation
Page 127: Bio bikepresentation
Page 128: Bio bikepresentation
Page 129: Bio bikepresentation
Page 130: Bio bikepresentation
Page 131: Bio bikepresentation
Page 132: Bio bikepresentation
Page 133: Bio bikepresentation
Page 134: Bio bikepresentation
Page 135: Bio bikepresentation
Page 136: Bio bikepresentation
Page 137: Bio bikepresentation
Page 138: Bio bikepresentation
Page 139: Bio bikepresentation
Page 140: Bio bikepresentation
Page 141: Bio bikepresentation
Page 142: Bio bikepresentation
Page 143: Bio bikepresentation
Page 144: Bio bikepresentation
Page 145: Bio bikepresentation
Page 146: Bio bikepresentation
Page 147: Bio bikepresentation
Page 148: Bio bikepresentation
Page 149: Bio bikepresentation
Page 150: Bio bikepresentation
Page 151: Bio bikepresentation
Page 152: Bio bikepresentation
Page 153: Bio bikepresentation
Page 154: Bio bikepresentation
Page 155: Bio bikepresentation
Page 156: Bio bikepresentation
Page 157: Bio bikepresentation
Page 158: Bio bikepresentation
Page 159: Bio bikepresentation
Page 160: Bio bikepresentation
Page 161: Bio bikepresentation

[Figure: histogram titled "GC fraction of contigs in p-Arct". X-axis: GC-fraction, 0 to 1 in steps of 0.1. Y-axis: Number of contigs, 0 to 10000.]

Page 162: Bio bikepresentation

Tour of BioBIKE III: Sequence comparison

In this tour, you'll see how to:

• Display the sequence of a metagenome contig
• Find similar sequences amongst metagenomes
• Find similar sequences amongst known viruses
• Find similar sequences amongst everything in GenBank
• Make a sequence alignment
• Make a phylogenetic tree
• Save your work session

Page 163: Bio bikepresentation

Our Story

Suppose you have a special interest in a sequence, a contig, derived from the metagenome taken from the Arctic Ocean.

The metagenome is called p-arct.

The sequence is called C60790.

What does the sequence look like?

Page 164: Bio bikepresentation

Clicking on any palette button brings down choices of functions or data to bring into the workspace.

Click the function DISPLAY-SEQUENCE-OF.

Page 165: Bio bikepresentation

A DISPLAY-SEQUENCE-OF function box is now in the workspace.

Before continuing with the problem, let's consider what function boxes mean.

Page 166: Bio bikepresentation

Back to our story… we were displaying the sequence of our favorite metagenome contig, C60790.

Click on the gray argument box to activate it for entry, either from the keyboard or by insertion.

Page 167: Bio bikepresentation

Now that the box is open, type in the name of the contig, C60790. Upper/lower case doesn't matter.

When you're done, close the box by pressing Enter or Tab. If you forget to close the box, the function will not work.

Page 168: Bio bikepresentation

Set the length of the lines to be displayed by mousing over the Options icon and clicking LINE-LENGTH.

Actually, the default line length is perfectly OK. I did this just to show you an option in action.

Page 169: Bio bikepresentation

Enter a value into the option entry box in the same way you entered a value into the argument box: Click on the box, type, then close the box by pressing Enter or Tab.

Page 170: Bio bikepresentation

The default format for sequences is lines preceded by coordinates. If you want the sequence in FastA format, mouse over the Options icon and click FastA.

(An example of a Flag in action)
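
As a rough analogy for readers who know a text-based language, the parts of this function box map onto a Python function signature: a required argument, a keyword option that carries a value (like LINE-LENGTH), and a yes/no flag (like FastA). The function `display_sequence`, its parameter names, and its default of 60 letters per line are hypothetical; this is not how BioBIKE implements DISPLAY-SEQUENCE-OF.

```python
def display_sequence(sequence, line_length=60, fasta=False, label="seq"):
    """Format a sequence as lines of text.

    sequence    -- required argument: the object the function acts on
    line_length -- keyword option: extra information that carries a value
    fasta       -- flag: a yes/no switch
    label       -- keyword option used only for the FASTA header line
    """
    chunks = [sequence[i:i + line_length]
              for i in range(0, len(sequence), line_length)]
    if fasta:
        return "\n".join([">" + label] + chunks)
    # Default: each line is preceded by the 1-based coordinate of its first letter.
    return "\n".join(f"{i * line_length + 1:>6} {chunk}"
                     for i, chunk in enumerate(chunks))


print(display_sequence("ACGTACGT", line_length=4))
print(display_sequence("ACGTACGT", line_length=4, fasta=True, label="toy"))
```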

Page 171: Bio bikepresentation

The function is now complete. To execute it, mouse over the Action icon and click Execute.

Page 172: Bio bikepresentation

Displayed results appear in popup windows, which you can copy or save. When you're done with a window, click the red X in the upper right-hand corner to close it.

Firefox has an upper limit on popup windows, so it's a good idea to clean up as you go.

Page 173: Bio bikepresentation

Is the DNA sequence similar to any other metagenome sequence?

To find out, mouse over the STRINGS-SEQUENCES menu and click SEQUENCE-SIMILAR-TO.

This function allows you to search for similarity by pattern, by mismatches, or by Blast (default).
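
Of the three search modes, search by mismatches is the simplest to illustrate. The sketch below is a plain Python stand-in, assuming an ungapped comparison at every position; the function name and parameters are made up, and it says nothing about how SEQUENCE-SIMILAR-TO is actually implemented.

```python
def find_with_mismatches(query, target, max_mismatches=1):
    """Return 0-based start positions where `query` matches `target`
    with at most `max_mismatches` differing letters (no gaps allowed)."""
    query, target = query.upper(), target.upper()
    hits = []
    for start in range(len(target) - len(query) + 1):
        window = target[start:start + len(query)]
        mismatches = sum(1 for a, b in zip(query, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


toy_target = "TTACGTTTACCTAA"  # made-up sequence
print(find_with_mismatches("ACGT", toy_target, max_mismatches=0))
print(find_with_mismatches("ACGT", toy_target, max_mismatches=1))
```

Blast, the default mode, is a different and far faster heuristic that also tolerates gaps; this loop is only meant to show what "similar with up to N mismatches" means.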

Page 174: Bio bikepresentation

The function asks for two arguments: the query sequence and the target sequences against which the query will be compared.

The query is c60790, of course. We could enter it by typing, as before, but it is more interesting to copy and paste what you already typed. To do this mouse over the Action icon of the box containing c60790.

Page 175: Bio bikepresentation

Click Copy.

Page 176: Bio bikepresentation

To paste, mouse over the Action icon of the box into which you're pasting and click Paste.

Page 177: Bio bikepresentation

Now to enter the target sequences – the set of all metagenome sequences. Click on the target box to open it for entry.

Once the box is open, you could specify by typing that you want to search metagenomic sequences… if you knew what to type.

Page 178: Bio bikepresentation

If you don't know, then mouse over the DATA button, then Organisms, then Metagenomes.

Clicking on Metagenomes transfers it to the open target box.

Page 179: Bio bikepresentation

Execute the completed function as before, mousing over the Action icon of the function and clicking Execute.

Doing so starts Blast, which may take several seconds to complete execution.

Page 180: Bio bikepresentation

You might expect that your sequence from P-Arct would find other sequences from the same metagenome. It does, but interestingly, after itself, the next 10 best hits are from the P-BBC metagenome.

Use browser controls to save the box, if you like, then X out of it.

Page 181: Bio bikepresentation

Of course the metagenome sequences are not annotated. Perhaps you can learn more about your sequence by comparing it to sequences from known viruses.

To do this, clear the target box, open it up again by clicking on it…

Page 182: Bio bikepresentation

…and bring down Known Viruses into the box.

Page 183: Bio bikepresentation

Protein searches will find more sequences, so mouse over the Options icon and specify that your DNA sequence is to be translated and compared to viral proteins.

Page 184: Bio bikepresentation

Execute the completed function. Again, execution may take several seconds.

Page 185: Bio bikepresentation

Only one hit, and a very poor one at that!

This is typical, because while ViroBIKE has virtually all known viral genomes, those that are known cover only a tiny fraction of viruses that exist in nature.

X out of the window and clear known viruses so that we can try another approach.

Page 186: Bio bikepresentation

There is a good deal more variety in organismal genomes than viral genomes, so let's search them.

ViroBIKE does not keep organismal genomes locally, so we need to go out to GenBank.

Click on the DATA button again.

Page 187: Bio bikepresentation

…and this time click GenBank.

Page 188: Bio bikepresentation

Execute the function as usual. This time we will be at the mercy of NCBI, and depending on the time of day and the phase of the moon, execution may take a minute or longer.

By default, ViroBIKE times out execution at 40 seconds. If this occurs, you'll get a message like…

Page 189: Bio bikepresentation

*** TIMEOUT ! TIMEOUT ! TIMEOUT ***
*** COMPUTATION ABORTED AFTER 40 SECONDS ***
*** YOU CAN:
*** - contact support for help: [email protected]
*** - use the TOOLS -> PREFS menu or the SET-TIMELIMIT function to extend your timeout up to 1 hour
*** - use RUNJOB to run your code in a separate process
*** - type (explain-timeout) at the weblistener for detailed info.

You can change the time limit, but let's say that fate is with us and you get your result.

Page 190: Bio bikepresentation

Interesting! Many highly significant hits from various bacteria…

Page 191: Bio bikepresentation

…at different regions of your sequence.

At NCBI, that would be the end of the story. In ViroBIKE, it's the beginning, since you can work with your Blast results.

First, we'll want to give the result a name.

Page 192: Bio bikepresentation

To name a result, mouse over the DEFINITION menu and click DEFINE.

Page 193: Bio bikepresentation

The DEFINE function asks for two arguments: the name of the variable and the value that will be assigned to it.

Click on the variable entry box.

Page 194: Bio bikepresentation

You can name the result anything you like, so long as the name does not contain spaces (hyphens and underscores are OK).

I chose c60790-vs-NR.

Press Tab after typing a name.

Page 195: Bio bikepresentation

Tabbing opens up the next argument, the value box.

The value to be assigned is the Blast table. There are many ways to retrieve that result. One way is to recognize that it is the result of the previous function.

Click the OTHER-COMMAND button...

Page 196: Bio bikepresentation

…and click Previous-Result.

Page 197: Bio bikepresentation

Executing the function will cause the variable you named to spring into existence, accessible through a new button. Watch for it!

Page 198: Bio bikepresentation

We'll be using that VARIABLES button in a moment. For now, mouse over STRINGS-SEQUENCES, then SEARCH/COMPARE, and…

Page 199: Bio bikepresentation

Click on BLAST-VALUE.

This function allows you to extract values from the Blast table.

Page 200: Bio bikepresentation

What values do we want to extract?

Recall…

Page 201: Bio bikepresentation

7 of the top 27 hits came from the same region of your sequence, from coordinates 15 to 503. Notice also that the reading frame is the same in all cases, negative, indicating that the match is on the complementary strand.

Let's extract the 7 sequences that matched. First specify the blast-table from which you'll extract data.

Page 202: Bio bikepresentation

After opening up the blast-table entry box, mouse over the VARIABLES button and click the name of the variable you just created.

Page 203: Bio bikepresentation

This brings the variable into the open box.

Now specify the cells you want, by row numbers (lines) and column.

Click to open the line box

Page 204: Bio bikepresentation

Type the lines you want into the open box as a set: (2 6 10 14 17 20 23)

In BioBIKE, elements of sets are separated by spaces, not commas.

After typing in the list in parentheses, press TAB to move to the column box.
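
For comparison with comma-separated lists in other languages, this short Python sketch reads the space-separated, parenthesized form shown above into a list of integers. The helper name `parse_set` is made up; BioBIKE itself needs no such step.

```python
def parse_set(text):
    """Parse a parenthesized, space-separated list such as "(2 6 10)" into integers."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("expected a list wrapped in parentheses")
    return [int(token) for token in text[1:-1].split()]


lines = parse_set("(2 6 10 14 17 20 23)")
print(lines)
```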

Page 205: Bio bikepresentation

You can enter any column shown in the Blast table plus several other fields that are normally not displayed. One of these fields is the sequence of the target ("T-SEQ"). Type this into the column box and press Enter.

Page 206: Bio bikepresentation

Executing the function will get you the seven bacterial target sequences matching the coordinate 15 – 503 region of your sequence.
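
Conceptually, the extraction picks one column from a chosen set of rows. A minimal Python sketch of that idea follows, assuming the table is a list of rows and that line numbers are 1-based as in the walkthrough. The function `table_values`, the toy table, and every column name except T-SEQ are invented for illustration.

```python
def table_values(table, lines, column):
    """Pick one column from selected rows of a table.

    table  -- list of rows, each row a dict mapping column name to value
    lines  -- 1-based row numbers, as in the walkthrough's (2 6 10 ...) list
    column -- name of the column to extract
    """
    return [table[line - 1][column] for line in lines]


# Made-up stand-in for a Blast result table.
hits = [
    {"TARGET": "hit-a", "E-VALUE": 1e-40, "T-SEQ": "MKTAY"},
    {"TARGET": "hit-b", "E-VALUE": 2e-35, "T-SEQ": "MKSAY"},
    {"TARGET": "hit-c", "E-VALUE": 5e-20, "T-SEQ": "MRTAF"},
]
selected = table_values(hits, lines=[1, 3], column="T-SEQ")
print(selected)
```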

Page 207: Bio bikepresentation

We'd like to compare these bacterial sequences with the region from your sequence.

But that region is a DNA sequence. We'll need to translate it.

To do this, click on the GENES-PROTEINS button

Page 208: Bio bikepresentation

Mouse over TRANSLATION and click the TRANSLATION-OF function.

Page 209: Bio bikepresentation

Open the argument box of TRANSLATION-OF for input.

We want to put into this box your sequence, but just the portion from 15 to 503, and on the complementary strand.

Mouse over the GENES-PROTEINS button to get a function that will extract what you want.

Page 210: Bio bikepresentation

Click the SEQUENCE-OF function.

Page 211: Bio bikepresentation

And paste it into the argument of SEQUENCE-OF.

Executing now will translate the entire sequence. But we want only part of the sequence.

Page 212: Bio bikepresentation

So mouse over the Options icon and click the FROM option.

Page 213: Bio bikepresentation

And do the same thing to get the TO option.

Page 214: Bio bikepresentation

Now type into the FROM entry box the beginning coordinate, 15, and press TAB.

Page 215: Bio bikepresentation

And type into the TO entry box the end coordinate, 503, and press ENTER.

Page 216: Bio bikepresentation

The sequence needs to be inverted (read from the complementary strand), so choose that option.

Page 217: Bio bikepresentation

And finally, we want to give the sequence a name so we can keep track of it during sequence comparisons. Uh-oh… The option, WITH-LABEL is off screen.

One way to handle this is to make space by clearing a now unnecessary box.

Page 218: Bio bikepresentation

Better. Now click on the Options icon

Page 219: Bio bikepresentation

And this time the WITH-LABEL option appears. Click on it.

Page 220: Bio bikepresentation

And fill in its entry box with a descriptive name. I chose "c60790-15-503R", indicating the contig, coordinates, and orientation.

Note that the name must be in quotes.

Page 221: Bio bikepresentation

Executing the function should give an amino acid sequence resulting from the translation of the desired region of your sequence.
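
The nested box built over the last several steps does three things: take a region by coordinates, read it from the complementary strand, and translate it. Below is a self-contained Python sketch of those three operations on a toy sequence. It assumes 1-based inclusive coordinates, the standard genetic code, and translation starting at the first letter of the extracted region; the function names are hypothetical and the slides do not show how BioBIKE handles these details internally.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def subsequence(seq, start, end, invert=False):
    """Return seq[start..end] using 1-based inclusive coordinates.

    With invert=True, return the reverse complement of that region.
    """
    region = seq.upper()[start - 1:end]
    return region.translate(COMPLEMENT)[::-1] if invert else region


def translate(dna):
    """Translate DNA codon by codon ('*' marks a stop codon).

    Unknown codons become 'X'; a trailing partial codon is ignored.
    """
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


toy = "GGCTATTTCATGG"  # made-up sequence, not contig C60790
protein = translate(subsequence(toy, 3, 11, invert=True))
print(protein)
```

As in the workspace, the inner call is evaluated first and its value is handed to the outer one.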

Page 222: Bio bikepresentation

We now have all the relevant sequences, ready to be joined together into a single list and compared.

To join the sequences, mouse over the LISTS-TABLES button, then LIST-PRODUCTION, and click on the JOIN function

Page 223: Bio bikepresentation

We could define names for the bacterial sequence and the translated sequence, but… too much bother. Instead, cut and paste.

Click on the Action icon of the function that produced the bacterial sequences…

Page 224: Bio bikepresentation

Cut the function box and paste it into the first argument box of JOIN.

Page 225: Bio bikepresentation

Then cut the TRANSLATION function…

Page 226: Bio bikepresentation

…and paste it into the second argument box of JOIN.

Page 227: Bio bikepresentation

Again, we could name the joined sequences and then align them, but it is easier simply to surround the JOIN function with the function that will do the aligning.

Click on Surround with, from the Action icon menu.

Page 228: Bio bikepresentation

Then select ALIGNMENT-OF from the STRINGS-SEQUENCES menu, BIOINFORMATI-TOOLS submenu.

Page 229: Bio bikepresentation

It was a bit of work, but we finally have what we want: a single list consisting of the region of your sequence that is similar to the collection of bacterial sequences, all ready to be aligned.

Go to the Action icon to execute.
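
The cut, paste, and "Surround with" steps amount to function composition: the joined list is never named, it is simply passed to the surrounding function. A Python sketch of that shape is below. Both functions are invented stand-ins; in particular, `pad_to_same_length` only pads with gap characters and is in no way a real alignment algorithm.

```python
def join(*items):
    """Flatten lists and single values into one list, keeping their order."""
    joined = []
    for item in items:
        if isinstance(item, (list, tuple)):
            joined.extend(item)
        else:
            joined.append(item)
    return joined


def pad_to_same_length(sequences, gap="-"):
    """Placeholder for a real aligner: right-pads sequences with gap characters."""
    width = max(len(s) for s in sequences)
    return [s.ljust(width, gap) for s in sequences]


bacterial = ["MKTAYIAK", "MKSAYIA"]  # made-up stand-ins for the extracted hits
translated = "MKTAY"                 # made-up stand-in for the translated region

# "Surround with": the inner call's value becomes the outer call's argument.
result = pad_to_same_length(join(bacterial, translated))
print(result)
```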

Page 230: Bio bikepresentation

This is another function that usually requires several seconds.

Page 231: Bio bikepresentation

The alignment in the popup window shows us which regions are conserved in the putative open reading frame in your sequence. By including more divergent proteins, we can assess whether the putative ORF retains motifs typical of this class of protein.

From the alignment we can also generate a phylogenetic tree. X out of the window.

Page 232: Bio bikepresentation

And to save space, collapse the alignment box into a stub.

Page 233: Bio bikepresentation

The full function is still there, but it occupies less space on the screen.

Now click on the Action icon of the ALIGNMENT-OF box to begin surrounding the function by a function that will create a phylogenetic tree.

Page 234: Bio bikepresentation

Click Surround with.

Page 235: Bio bikepresentation

…and go to STRINGS-SEQUENCES, PHYLOGENETIC-TREE, TREE-OF to surround the alignment with the tree function.

Page 236: Bio bikepresentation

The function will store much tree-related information on disk, in case you want to modify the tree later. It needs to know the name of a new directory in which to put the information. I chose "c60790-orf1".

Page 237: Bio bikepresentation

There are many ways of constructing trees. I chose PARSIMONY -- estimating phylogenetic proximity by the number of steps it takes to go from one sequence to another.

Page 238: Bio bikepresentation

Execute. After several seconds, the function will give you the same alignment you saw before and a few seconds after that a tree.

Page 239: Bio bikepresentation

The three Sphingomonas proteins cluster together, as do the Erythrobacter proteins. Then there's yours.

Page 240: Bio bikepresentation

If you want to return to this session or refer to it later, you can save it by mousing over the EDIT button and clicking Save user session.

